79403605

Date: 2025-01-31 17:52:16
Score: 1

After further investigation, I found why the kernels were running serially. As of CUDA 12.2, the CUDA_MODULE_LOADING setting defaults to LAZY. The CUDA C++ Programming Guide outlines the issues that lazy module loading causes for concurrent kernel execution:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#concurrent-execution

The guide describes relying on concurrent kernel execution as an anti-pattern, but a workaround is to set the environment variable CUDA_MODULE_LOADING=EAGER.
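The simplest way to apply this is at launch, e.g. `CUDA_MODULE_LOADING=EAGER ./my_app` (the binary name is a placeholder). If you would rather set it from inside the program, here is a minimal host-side sketch. `request_eager_module_loading` is a hypothetical helper, not part of CUDA. It assumes a POSIX system, and it assumes the driver reads the variable when it initializes, so it has to be called before the first CUDA call. I have not verified that against every driver version.

```cpp
#include <stdlib.h>   // setenv, getenv (POSIX)
#include <string>

// Hypothetical helper, not a CUDA API: ask for eager module loading in this
// process. Assumption: the CUDA driver reads CUDA_MODULE_LOADING during
// initialization, so call this before any CUDA runtime or driver function.
// By default a value already set by the user is left alone.
// Returns the value in effect after the call.
inline std::string request_eager_module_loading(bool override_existing = false) {
    // The third argument of setenv controls whether an existing value is replaced.
    setenv("CUDA_MODULE_LOADING", "EAGER", override_existing ? 1 : 0);
    const char* value = getenv("CUDA_MODULE_LOADING");
    return value ? std::string(value) : std::string();
}
```

Call it as the first statement in `main`, before creating streams or launching kernels. Leaving an existing value untouched by default means you can still switch back to LAZY from the shell when comparing the two modes.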

Posted by: rubikssolver4