Please elaborate on how you're using FastAPI's workers, and on your API implementation as a whole. If uploading a CSV, preprocessing, and all the math happen in a single endpoint, you won't really gain any performance by increasing the number of FastAPI workers. Your work is mostly CPU-bound, so it makes sense to separate the networking and the math into different entities.

The way I usually do this in my projects: have an API/producer (FastAPI) that processes incoming requests; if a request is composite, it splits it into separate jobs and hands them off to workers. Workers are replicated and run in parallel, each one processing its own slice of the workload. After the work completes, results are passed back to the producer for the response.

More technically speaking: your producer is FastAPI; for workers I usually go with Celery, which is a popular and solid choice (though there are many others); and you'll need a way for the producer and workers to communicate, for which Redis is a good choice. There's a rough sketch of this setup below.

Adding to that, I'd suggest ditching Pandas and going with Polars; in my experience the performance gain is really noticeable.

So your workflow will go like this: upload a CSV -> split it into chunks -> assign a separate task to process each chunk and execute them in parallel -> gather results and return a response.
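Here's a minimal sketch of that producer/worker split, assuming Redis on localhost, both modules shown in one snippet for brevity, and a placeholder `process_chunk` task standing in for your actual math. You'd start the worker side with something like `celery -A tasks worker --concurrency=4`:

```python
# --- tasks.py ---
import io

import polars as pl
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task
def process_chunk(csv_chunk: str) -> dict:
    # Each worker parses and crunches only its own slice of the CSV.
    df = pl.read_csv(io.StringIO(csv_chunk))
    return {"rows": df.height}  # stand-in for your actual preprocessing/math

# --- api.py ---
from celery import group
from fastapi import FastAPI, UploadFile

from tasks import process_chunk

api = FastAPI()
N_CHUNKS = 8  # roughly match your number of Celery worker processes

@api.post("/upload")
def upload(file: UploadFile):  # sync endpoint: FastAPI runs it in a threadpool
    header, *rows = file.file.read().decode().splitlines()
    size = max(1, len(rows) // N_CHUNKS)
    chunks = ["\n".join([header, *rows[i:i + size]])
              for i in range(0, len(rows), size)]
    # Fan out one task per chunk; workers execute them in parallel.
    job = group(process_chunk.s(c) for c in chunks)()
    return {"results": job.get(timeout=300)}
```

Blocking on `job.get()` keeps the example short; in a real deployment you'd usually return the job id immediately and let the client poll a second endpoint for results.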
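And for the Polars switch, the eager API maps almost one-to-one onto Pandas, but the lazy API is where most of the speedup comes from. A quick sketch (file name and column names are made up for illustration):

```python
import polars as pl

df = pl.read_csv("data.csv")   # eager, works much like pd.read_csv

out = (
    pl.scan_csv("data.csv")    # lazy: builds a query plan instead of loading
      .filter(pl.col("value") > 0)
      .group_by("category")
      .agg(pl.col("value").mean())
      .collect()               # optimizes and executes the plan in parallel
)
```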