The NCCL backend fails under Docker + WSL2 (unstable multi-GPU communication), possibly because the installed NCCL, Docker, and WSL2 versions are incompatible with one another. In addition, NCCL works best on native Linux.
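To check whether a failing run is affected by this setup, it can help to detect WSL and turn on NCCL's own logging before launching training. The snippet below is a minimal sketch, not a definitive diagnostic: the helper names `running_under_wsl` and `nccl_debug_env` are hypothetical, the WSL check is a heuristic that assumes the kernel release string contains "microsoft" (a container under WSL2 shares that kernel), and `NCCL_DEBUG=INFO` is the standard NCCL environment variable for verbose logs.

```python
import os
import platform


def running_under_wsl(kernel_release=None):
    """Heuristic check: WSL kernels report 'microsoft' in their release string.

    A Docker container started under WSL2 shares the WSL2 kernel, so the
    same check applies inside the container.
    """
    release = platform.release() if kernel_release is None else kernel_release
    return "microsoft" in release.lower()


def nccl_debug_env(base_env=None):
    """Return a copy of the environment with NCCL_DEBUG=INFO unless already set."""
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("NCCL_DEBUG", "INFO")
    return env


if __name__ == "__main__":
    if running_under_wsl():
        print("WSL kernel detected: NCCL multi-GPU communication may be unstable here.")
    # Pass this environment to the training process (e.g. via subprocess)
    # so that NCCL prints its initialization and transport details.
    env = nccl_debug_env()
    print("NCCL_DEBUG =", env["NCCL_DEBUG"])
```

The resulting NCCL log lines usually show which NCCL version was loaded and which transport was selected, which helps in narrowing down a version mismatch.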
The other alternative is to use a GPU with more memory, large enough to hold the model on a single device, so that multi-GPU communication is not needed.
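A rough way to judge whether a single GPU is large enough is to estimate the memory taken by the model weights alone. The sketch below is illustrative only: the function names are hypothetical, the default of 2 bytes per parameter assumes 16-bit weights, and the `overhead_factor` of 1.2 is an assumed safety margin for activations and runtime buffers, not a measured value.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Memory needed for the weights alone, in GiB.

    bytes_per_param: 4 for float32, 2 for float16/bfloat16, 1 for int8.
    """
    return num_params * bytes_per_param / 1024**3


def fits_on_gpu(num_params, gpu_memory_gib, bytes_per_param=2, overhead_factor=1.2):
    """Rough check: weights plus an assumed overhead margin fit in GPU memory.

    overhead_factor is an assumption; actual usage depends on batch size,
    sequence length, and whether the model is being trained or only served.
    """
    needed = weight_memory_gib(num_params, bytes_per_param) * overhead_factor
    return needed <= gpu_memory_gib


if __name__ == "__main__":
    params = 1024**3  # hypothetical model size, chosen for easy arithmetic
    print(f"weights: {weight_memory_gib(params):.1f} GiB")
    print("fits on a 4 GiB GPU:", fits_on_gpu(params, gpu_memory_gib=4))
```

Training needs considerably more memory than this estimate suggests, because gradients and optimizer state are stored in addition to the weights.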
https://medium.com/@devaru.ai/debugging-nccl-errors-in-distributed-training-a-comprehensive-guide-28df87512a34
https://ai.google.dev/gemma/docs/core