79732410

Date: 2025-08-11 19:11:13
Score: 1
Natty:
Report link

I recommend a different approach to your design.

A 30-second delay in 2025 is quite long, unless you're performing deep research that involves web crawling, compiling, and generating a report. For long-running tasks, it's advisable to put an intermediate system such as a Pub/Sub queue between the client and the model call. This introduces the overhead of setting up new queues, managing message reception, and handling retries on failure, but it decouples the client from the slow call and scales better.
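As a rough illustration of the queue pattern, here is a minimal sketch using Python's standard-library `queue` and a worker thread as a local stand-in for a real broker such as Cloud Pub/Sub; `slow_generate` is a hypothetical placeholder for the long-running model call, and in production the client would poll or receive a callback instead of blocking on `join()`:

```python
import queue
import threading
import uuid

# Local stand-in for a real message queue (e.g. Cloud Pub/Sub):
# the request handler enqueues a job and returns immediately,
# while a background worker drains the queue.
jobs = queue.Queue()
results = {}

def slow_generate(prompt):
    # Placeholder for the long-running model call (web crawl, report, ...)
    return f"report for: {prompt}"

def worker():
    while True:
        job_id, prompt = jobs.get()
        results[job_id] = slow_generate(prompt)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(prompt):
    """Enqueue the request and return a job id without blocking."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, prompt))
    return job_id

job = submit("summarize these pages")
jobs.join()  # demo only; a real client would poll for the result instead
print(results[job])
```

The point of the sketch is the shape, not the broker: the request handler returns in microseconds, and the 30-second work happens off the request path.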

If you prefer to maintain a simpler system and a certain degree of latency is acceptable, consider the following:

  1. Utilize asyncio to run calls concurrently, as suggested in https://stackoverflow.com/a/78884632/1686903.
  2. Improve latency and availability by using a Global Endpoint or Provisioned Throughput.
  3. Experiment with the Gemini Flash-Lite models to evaluate their trade-offs.
  4. Explore libraries such as https://pypi.org/project/backoff/, combined with the timeout-based approach described in https://stackoverflow.com/a/79722709/1686903.
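Points 1 and 4 can be sketched together: fan the calls out with asyncio, bound each one with a timeout, and retry transient failures with exponential backoff. This is a hand-rolled retry loop for the sake of a self-contained example; the `backoff` library would replace it with a decorator like `@backoff.on_exception(backoff.expo, ...)`. `fake_model_call` is a hypothetical stand-in for the real Gemini request:

```python
import asyncio

attempts = {}

async def fake_model_call(prompt):
    """Stand-in for the real model call; fails once per prompt
    to simulate a transient error."""
    await asyncio.sleep(0.01)
    attempts[prompt] = attempts.get(prompt, 0) + 1
    if attempts[prompt] == 1:
        raise RuntimeError("transient error")
    return f"answer to: {prompt}"

async def call_with_retry(prompt, timeout=5.0, max_tries=4):
    """Bound each attempt with a timeout and back off exponentially."""
    delay = 0.05
    for attempt in range(1, max_tries + 1):
        try:
            return await asyncio.wait_for(fake_model_call(prompt), timeout)
        except (RuntimeError, asyncio.TimeoutError):
            if attempt == max_tries:
                raise
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff between retries

async def main():
    prompts = ["a", "b", "c"]
    # gather runs the three calls concurrently instead of serially
    return await asyncio.gather(*(call_with_retry(p) for p in prompts))

print(asyncio.run(main()))
```

Concurrency hides per-call latency when you have several independent prompts, while the timeout plus backoff keeps one slow or flaky call from stalling the batch.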
Reasons:
  • Blacklisted phrase (1): stackoverflow
  • Long answer (-0.5):
  • No code block (0.5):
Posted by: sam