The debugger stops at “weird” places because the compiler has inlined or optimized your template/device functions (and/or the build has no device debug symbols), so the mapping between source lines and the generated binary is no longer 1:1.
Quick steps to fix (do these in your Debug build)
Build with device debug info and disable device optimizations:
nvcc: add -G (--device-debug). It generates debug info for device code and, by default, also turns off optimizations for device code.
Only use -G for local debug builds (it drastically changes code/performance).
Build host with debug info and no optimizations:
MSVC: C/C++ → Optimization = Disabled (/Od) and Debug Info = /Zi.
Clean + full rebuild.
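One way to confirm that the rebuild really picked up these settings is to have the program report them. The following is a minimal sketch, not a definitive check: it assumes nvcc defines `__CUDACC_DEBUG__` when -G is in effect (as the nvcc documentation describes) and uses the standard `NDEBUG` macro as a rough proxy for a host Release configuration. The helper name `build_mode_summary` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: summarizes how this translation unit was compiled.
// Assumption: nvcc defines __CUDACC_DEBUG__ when device debug (-G) is on.
inline std::string build_mode_summary() {
    std::string s;
#if defined(__CUDACC__)
  #if defined(__CUDACC_DEBUG__)
    s += "device: debug (-G)";
  #else
    s += "device: optimized (no -G)";
  #endif
#else
    s += "device: not compiled by nvcc";
#endif
    // NDEBUG is usually defined by Release configurations; treat it as a hint only.
#if defined(NDEBUG)
    s += ", host: NDEBUG defined";
#else
    s += ", host: NDEBUG not defined";
#endif
    return s;
}
```

Printing this string once at startup (for example with `std::puts(build_mode_summary().c_str())`) shows quickly whether a stale or Release object file is still being linked.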
Force a non-inlined function where you want a reliable breakpoint:
Use noinline on the functions you debug:
GCC/Clang/nvcc: __attribute__((noinline))
MSVC: __declspec(noinline)
Example:
__device__ __host__ __attribute__((noinline)) TVector3<T> operator+(...) { ... }
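To keep the annotation portable across compilers, the attribute can be wrapped in a macro. The sketch below is illustrative only: `DBG_NOINLINE` and `DBG_HD` are hypothetical macro names, and the `TVector3` layout (three members `x`, `y`, `z`) is a stand-in, since the real class is not shown here. Under nvcc it uses CUDA's own `__noinline__` qualifier; elsewhere it falls back to the compiler-specific spellings listed above.

```cpp
// Hypothetical portability macros (names are illustrative, not from any library).
#if defined(__CUDACC__)
  #define DBG_NOINLINE __noinline__          // CUDA's no-inline qualifier
  #define DBG_HD __host__ __device__
#elif defined(_MSC_VER)
  #define DBG_NOINLINE __declspec(noinline)  // MSVC
  #define DBG_HD
#else
  #define DBG_NOINLINE __attribute__((noinline))  // GCC/Clang
  #define DBG_HD
#endif

// Stand-in vector type; the real TVector3 may differ.
template <typename T>
struct TVector3 {
    T x, y, z;
};

// The operator is kept out of line so a breakpoint inside it maps to one place.
template <typename T>
DBG_HD DBG_NOINLINE TVector3<T> operator+(const TVector3<T>& a, const TVector3<T>& b) {
    return TVector3<T>{a.x + b.x, a.y + b.y, a.z + b.z};
}
```

No-inline annotations are hints to the compiler, so it is worth confirming in the debugger that the breakpoint actually binds inside the function. Remove the annotation (or define the macro to nothing) in Release builds if the call overhead matters.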
Start the correct debug session in Nsight:
Verify symbols and sources:
Short-run kernel for easier stepping:
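As one possible way to do this, the launch configuration can be shrunk to a single small block while debugging, so that only a few threads hit the breakpoint. The sketch below is host-side only and makes several assumptions: a 1-D launch, a kernel that bounds-checks its index against `n`, and `n > 0` for the normal path. `LaunchDims` and `pick_launch_dims` are hypothetical names.

```cpp
#include <algorithm>

// Hypothetical description of a 1-D launch: the <<<grid, block>>> pair.
struct LaunchDims {
    unsigned grid;
    unsigned block;
};

// Returns the usual 1-D configuration for n elements, or one block of at
// most 32 threads when debug_small is true.
inline LaunchDims pick_launch_dims(unsigned n, bool debug_small, unsigned block = 256) {
    if (debug_small) {
        // At least 1 thread, at most 32, never more threads than elements.
        return LaunchDims{1u, std::max(1u, std::min(n, 32u))};
    }
    // Round up so every element is covered (assumes n > 0).
    return LaunchDims{(n + block - 1) / block, block};
}
```

In a .cu file the result would be passed to the launch, for example `kernel<<<d.grid, d.block>>>(...)` with `d = pick_launch_dims(n, true)`. With the reduced configuration only the first few elements are processed, which is usually enough for stepping through the logic.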