79362983

Date: 2025-01-16 20:29:15
Score: 0.5
Natty:

I think the issue isn’t about where the processing happens, but how the data is handled after processing. When you run client.query(query).to_dataframe(), BigQuery executes the query server-side, but to_dataframe() then transfers the entire result set over the network into your Colab instance’s memory. That transfer is most likely where the bottleneck occurs.
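
For illustration, here is a minimal sketch of that pattern (the project ID and table name are placeholders I've made up): the query itself runs entirely inside BigQuery, but the final call downloads every row locally.

    # Minimal sketch of the slow pattern; project and table names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project ID

    query = "SELECT * FROM `my-project.my_dataset.big_table`"  # placeholder table
    # The query runs in BigQuery; to_dataframe() then downloads every
    # result row into Colab's memory, which is the slow part.
    df = client.query(query).to_dataframe()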

What you can try is to perform as much of the processing as possible within BigQuery itself. Instead of pulling the entire result set into a DataFrame, export the results to a destination like Cloud Storage. From there, you can process the data in smaller chunks or use tools designed for large datasets within your Colab environment, as sketched below.
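
As a rough sketch of that approach (the bucket, dataset, and table names below are hypothetical), you could materialize the query into a destination table so the results stay in BigQuery, export that table to Cloud Storage as sharded Parquet files, and then iterate over the results in chunks rather than loading one giant DataFrame:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project ID

    # 1. Write the query results to a destination table; they stay in BigQuery.
    dest = bigquery.TableReference.from_string("my-project.my_dataset.query_results")
    job_config = bigquery.QueryJobConfig(
        destination=dest, write_disposition="WRITE_TRUNCATE"
    )
    client.query(
        "SELECT * FROM `my-project.my_dataset.big_table`", job_config=job_config
    ).result()  # blocks until the query finishes; nothing is downloaded yet

    # 2. Export the result table to Cloud Storage; the wildcard yields shards.
    extract_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.PARQUET
    )
    client.extract_table(
        dest, "gs://my-bucket/results/shard-*.parquet", job_config=extract_config
    ).result()

    # 3. Alternatively, stream the result table into Colab one chunk at a time
    #    instead of materializing a single huge DataFrame.
    for chunk in client.list_rows(dest).to_dataframe_iterable():
        print(f"processing {len(chunk)} rows")  # stand-in for real per-chunk work

With the export route, each Parquet shard can then be read individually in Colab (for example with pandas plus gcsfs), so peak memory use is bounded by the shard size rather than the full result set.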

Reasons:
  • Long answer (-0.5):
  • No code block (0.5):
  • Low reputation (0.5):
Posted by: jggp1094