It would be nice to have some code so we can try to replicate the issue.
I have two possible guesses for you:
It could be an interaction between torch.multiprocessing and Python's concurrent.futures. This can happen because you have not passed any context (mp_context) when creating the pool, so it falls back to the default multiprocessing context, which might be breaking torch's spawning. Try passing torch's own context instead, e.g. the one returned by torch.multiprocessing.get_context("spawn").
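Something like this minimal sketch of what I mean (run_inference is just a hypothetical stand-in for whatever your workers actually do):

```python
from concurrent.futures import ProcessPoolExecutor

import torch
import torch.multiprocessing as mp


def run_inference(x):
    # Hypothetical worker; replace with your real task.
    return torch.tensor([x]).sum().item()


if __name__ == "__main__":
    # Hand the executor torch's spawn context instead of the default
    # (fork on Linux), so each worker starts with a clean interpreter.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        results = list(pool.map(run_inference, range(4)))
    print(results)
```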
At the cost of performance, try limiting the PyTorch threads with torch.set_num_threads to 1 or 2, and do the same with the pool's worker count. While doing this, monitor the memory usage. I think that either the copying Python's multiprocessing does on fork is leaving internal torch state unchanged in the children, so they keep forking until it effectively becomes a fork bomb, or the memory copied between forks is filling your RAM.
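Roughly like this (the counts of 1 are just a starting point, and the interop call is my addition; it has to run before torch does any parallel work). Watch RSS in top/htop while it runs:

```python
import torch

# Cap intra-op parallelism so each worker process does not spin up
# a full thread pool of its own.
torch.set_num_threads(1)

# Also cap inter-op parallelism; this must be called before any
# parallel torch work has started.
torch.set_num_interop_threads(1)
```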
As an addendum, you could also try seeing what happens if you change the start method of PyTorch to spawn instead of fork.
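That would look something like this; set_start_method has to run once, in the main module, before anything creates a process:

```python
import torch.multiprocessing as mp

if __name__ == "__main__":
    # Switch the global start method from fork (the Linux default)
    # to spawn; force=True overrides a method set earlier.
    mp.set_start_method("spawn", force=True)
    # ... build the pool / launch workers after this point ...
```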
Update with any results, and maybe some code?