First, let's diagnose the root cause: with the default threaded scheduler, Dask does not automatically spill intermediate results to disk during joins. Dask can handle larger-than-memory datasets, but it still needs enough memory to hold the computations and intermediate results of each task.
It is therefore likely that the merged dataframe (or the shuffle the merge triggers) exceeds available memory before it can be written to disk.
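If you can switch to the distributed scheduler, its workers are able to spill excess data to local disk once they approach their memory limit, which often helps with shuffle-heavy merges. A minimal sketch; the worker count and memory limit are illustrative values, not taken from your setup:

```python
from dask.distributed import Client, LocalCluster

# Local cluster whose workers spill to disk as they near their memory limit.
# n_workers, threads_per_worker and memory_limit are example values; tune
# them to your machine.
cluster = LocalCluster(n_workers=4, threads_per_worker=2, memory_limit="4GB")
client = Client(cluster)

# Any Dask computation run after this point uses the distributed scheduler.
```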
Optimizing the existing Dask code: repartitioning (df.repartition(npartitions=10)) can help, but the number of partitions should be chosen carefully. Too many partitions increase scheduling overhead, while too few leave partitions that are too large to fit in memory. A sketch follows below.
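A short sketch of repartitioning before the merge; the file name is hypothetical and npartitions=10 is just the value from the example above:

```python
import dask.dataframe as dd

# Hypothetical input file; replace with your own source.
df = dd.read_parquet("large_dataset.parquet")

# Repartition so each chunk fits comfortably in memory before the merge.
df = df.repartition(npartitions=10)

# In recent Dask versions you can target a partition size directly instead:
# df = df.repartition(partition_size="100MB")
```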
Early filtering also helps: filter the dataframes before the merge to reduce the overall size of the data being shuffled. For example, if you only need data for a specific date range, apply that filter before merging, as shown in the sketch below.
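A sketch of filtering before the merge; the file names, column names, and date range are hypothetical placeholders for your data:

```python
import dask.dataframe as dd

# Hypothetical inputs.
orders = dd.read_parquet("orders.parquet")
customers = dd.read_parquet("customers.parquet")

# Filter on the date range *before* merging so the join shuffles far less data.
orders = orders[(orders["order_date"] >= "2023-01-01") &
                (orders["order_date"] < "2023-04-01")]

result = orders.merge(customers, on="customer_id", how="inner")

# Write the result straight to disk instead of materializing it in memory.
result.to_parquet("filtered_join.parquet")
```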