Thank you for your insights. The idea behind the workers is parallel processing: for example, with 700k records, the service splits them into chunks, assigns the chunks to workers for the calculations, merges the results, and sends them back to the client as JSON. My API structure is: a request hits the API controller, the Excel file is parsed, the data goes to the service layer where it is split into chunks, the chunks are distributed to workers for the maths/calculations, the calculations are merged back together, and a streaming response is sent to the client (there's a rough sketch of that chunk-and-merge step after the diagram below).
My current approach is:
I will work on ditching pandas; my main issue was the impact of the workers once the project is deployed in prod. Tbh I am not familiar with using Redis and producer/consumer queues, but after your comment I read up on it and it may well be a better approach here, so I am going to explore it more (I've put a rough sketch of the queue idea below too). My focus is to reduce the API response time without adding complex moving parts, since I would be the one debugging them, heh.
Original DataFrame (700,000 rows)
│
├─ Chunk 1 [0 - 19,999] → 20,000 rows
├─ Chunk 2 [20,000 - 39,999] → 20,000 rows
├─ Chunk 3 [40,000 - 59,999] → 20,000 rows
├─ ...
└─ Chunk 35 [680,000 - 699,999] → 20,000 rows
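
Here's a minimal sketch of that chunk → workers → merge flow, assuming the per-chunk maths lives in a pure, picklable function and the rows are plain dicts. `calculate_chunk`, the record shape, and the `"value"` field are placeholders for illustration, not my actual code, and `ProcessPoolExecutor` is just one way to run the workers:

```python
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 20_000  # 700,000 rows -> 35 chunks, as in the diagram above

def calculate_chunk(records: list[dict]) -> list[dict]:
    # Placeholder for the per-chunk maths; must be a top-level
    # function so the process pool can pickle it.
    return [{**row, "result": row["value"] * 2} for row in records]

def process_records(records: list[dict]) -> list[dict]:
    # Split into fixed-size chunks.
    chunks = [records[i:i + CHUNK_SIZE]
              for i in range(0, len(records), CHUNK_SIZE)]
    with ProcessPoolExecutor() as pool:
        # map() preserves chunk order, so the merge is a simple flatten.
        results = pool.map(calculate_chunk, chunks)
    return [row for chunk in results for row in chunk]
```

If order doesn't matter, `concurrent.futures.as_completed` would let the streaming response start as soon as the first chunk finishes instead of waiting for all 35.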
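
And since I'm going to explore the Redis route, here's roughly what I understand the producer/consumer idea to look like, using redis-py with a plain list as the work queue. The queue names and payload shape are made up for the sketch; a real setup would probably reach for RQ/Celery or Redis Streams instead:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def produce(chunks: list[list[dict]]) -> None:
    # API process: push each chunk onto the queue as a JSON job.
    for i, chunk in enumerate(chunks):
        r.rpush("jobs", json.dumps({"chunk_id": i, "rows": chunk}))

def consume() -> None:
    # Worker process: block until a job arrives, compute, publish the result.
    while True:
        _, raw = r.blpop("jobs")  # blocks until a job is available
        job = json.loads(raw)
        result = [{**row, "result": row["value"] * 2} for row in job["rows"]]
        r.rpush("results", json.dumps({"chunk_id": job["chunk_id"],
                                       "rows": result}))
```

The appeal over in-process workers is that the consumers run as separate long-lived processes, so a heavy request can't take down the API process itself.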