The short answer: No, in general, a multiprocessing pool does not force the use of all cores.
But let's clear up some things:
The first argument to the `Pool` initializer, `processes`, is the number of processes to create within the pool, which is not the same thing as the number of cores that will be used. If you do not specify this argument, or pass a value of `None`, then the number of processes created is the number of CPUs (cores) you have, as given by `multiprocessing.cpu_count()`, and this can differ from the number of CPUs actually usable by your process, which is given by `os.process_cpu_count()`. So your specifying `n_cores = 20` and passing `n_cores` to the pool's initializer might confuse somebody who is unfamiliar with how multiprocessing works; this variable would make more sense if it were named `n_processes`.
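For example, here is a minimal sketch of that difference (note that `os.process_cpu_count()` requires Python 3.13+; on earlier versions, `len(os.sched_getaffinity(0))` gives a similar answer on platforms that support it):

```python
import os
import multiprocessing

if __name__ == "__main__":
    # Total number of CPUs on the machine:
    print("cpu_count:", multiprocessing.cpu_count())

    # Number of CPUs this process is actually allowed to use
    # (Python 3.13+); this can be smaller, e.g. under CPU
    # affinity restrictions or container limits:
    print("process_cpu_count:", os.process_cpu_count())

    # processes=None defaults to the CPU count (on 3.13+, to the
    # usable-CPU count); it does not force all cores to be used:
    with multiprocessing.Pool(processes=None) as pool:
        pass
```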
So how many cores (CPUs) will actually be used? Assuming you have N cores available to you, where N < 20 (the pool size), there are several cases:
1. `example_function` is long running and contains significant I/O or network activity, such that your process periodically relinquishes the core it is running on, allowing other processes to use that core until the I/O completes. In this case it is possible that all the cores will ultimately be used, but not necessarily concurrently. This is a situation where it can be useful to create a pool size greater than the number of cores, allowing more I/O activity to be overlapped.
2. `example_function` is long running and 100% CPU-bound (i.e. no I/O, network activity, etc.). In this case I would expect all cores eventually to be used by your worker function if you are submitting at least N tasks, although this is not guaranteed; it depends on what other processes are competing for CPU resources. In any case, it makes no sense to create a pool size larger than N: you cannot have more than N CPU-bound computations executing in parallel when you only have N CPUs.
3. `example_function` is extremely short running and 100% CPU-bound (for example, the function just returns the argument passed to it). Assuming the length of `args` (the number of tasks being submitted) is rather small, it is possible that after the first pool process is created, it processes all the submitted tasks before the rest of the pool processes have even been created. In this extreme case only one CPU would be used by your `map` call; the sketch after this list shows how to observe that.
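Here is a minimal sketch of that last case, reusing the `example_function` and `args` names from your code (the trivial function body is an assumption for illustration). Tagging each result with the handling worker's PID shows how few of the 20 pool processes end up doing any work:

```python
import os
from multiprocessing import Pool

def example_function(x):
    # Trivial, 100% CPU-bound "work": just return the argument,
    # along with the PID of the worker that handled it.
    return x, os.getpid()

if __name__ == "__main__":
    args = range(20)
    with Pool(processes=20) as pool:
        results = pool.map(example_function, args)
    pids = [pid for _, pid in results]
    # With tasks this short you will typically see only a few
    # distinct PIDs -- far fewer than the 20 workers in the pool:
    for pid in sorted(set(pids)):
        print(f"worker {pid} handled {pids.count(pid)} task(s)")
```

With that out of the way, you state: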
> I would like to know if there is a workaround for this issue. Because if there is no workaround, then we need to look at other methods such as MPI. If I'm missing something, please advise.
What is your issue? You never make this clear. And yes, I think you are missing something, which I hope the above explanation has cleared up.
In summary:
If you have a lot of long running tasks that are a combination of CPU and I/O, it could be profitable to create a pool size larger than the number of cores you have if maximum performance is what you seek. You should not, however, be concerned with which cores are being used to run these tasks. When your worker function needs the CPU to perform its work on a task, the operating system will choose a CPU for you, which may or may not be a CPU that has been previously assigned to any of your tasks.
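As a sketch of that first point (the worker below is hypothetical; the sleep stands in for I/O or network waits, and the 4x oversubscription factor is just for illustration, not a recommendation):

```python
import os
import time
from multiprocessing import Pool

def mixed_worker(task_id):
    # Hypothetical task: mostly waiting on I/O (simulated here
    # with a sleep), plus a small CPU-bound portion.
    time.sleep(1.0)  # stand-in for a network request
    return sum(i * i for i in range(10_000))

if __name__ == "__main__":
    n_cores = os.cpu_count() or 1
    # Oversubscribe: more workers than cores, so that cores are
    # not left idle while other workers are waiting on I/O.
    with Pool(processes=n_cores * 4) as pool:
        start = time.time()
        pool.map(mixed_worker, range(n_cores * 8))
        print(f"elapsed: {time.time() - start:.1f}s")
```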
If, however, you want to limit the resources you use, then do this by creating a pool with fewer processes. For example, suppose you are submitting 100 tasks but never want more than 4 tasks to be worked on in parallel. In this case create a pool size of 4. But, again, you should not care which CPUs are assigned to these 4 processes over their lifetimes, on the assumption that one CPU is as good as another.
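For instance, a sketch of that scenario (the worker body is just a stand-in):

```python
from multiprocessing import Pool

def work(task_id):
    # Stand-in for the real task.
    return task_id * task_id

if __name__ == "__main__":
    # 100 tasks, but never more than 4 being worked on in parallel:
    with Pool(processes=4) as pool:
        results = pool.map(work, range(100))
    print(len(results), "tasks completed")
```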