79750351

Date: 2025-08-29 13:44:11
Score: 1

With the help of Anthropic I found the issue. In the first kernel I declared the swap space as DenseXY, while in the second kernel the 3D matrix was declared as DenseZY. I did not expect this to make any difference beyond the number of cache misses I might encounter. However, once I changed all the declarations to DenseXY, the code compiled and ran.

For the record, I also learned that the stride naming means the opposite of what my intuition suggested:

Stride3D.DenseXY:

Memory order: X → Y → Z (X changes fastest, Z changes slowest)
For array[z][y][x]: consecutive X elements are adjacent in memory
Memory layout (indices written as [z, y, x]): [0,0,0], [0,0,1], [0,0,2], ..., [0,1,0], [0,1,1], ..., [1,0,0]

Stride3D.DenseZY:

Memory order: Z → Y → X (Z changes fastest, X changes slowest)
For array[x][y][z]: consecutive Z elements are adjacent in memory
Memory layout (indices written as [z, y, x]): [0,0,0], [1,0,0], [2,0,0], ..., [0,1,0], [0,1,1], ..., [0,0,1]
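To make the two layouts concrete, here is a minimal sketch of the index arithmetic in plain Python. It does not use the library itself: the function names `dense_xy_offset`, `dense_zy_offset` and `memory_order` are hypothetical, and the strides are assumed to come from a tightly packed extent, which is my reading of the layouts described above rather than a quote of the library source. Coordinates in the code are written as (x, y, z).

```python
def dense_xy_offset(x, y, z, extent):
    """Linear offset when X is contiguous (X fastest, Z slowest).

    Assumption: strides are derived from a tightly packed extent (ex, ey, ez).
    """
    ex, ey, _ez = extent
    return x + y * ex + z * ex * ey


def dense_zy_offset(x, y, z, extent):
    """Linear offset when Z is contiguous (Z fastest, X slowest).

    Assumption: strides are derived from a tightly packed extent (ex, ey, ez).
    """
    _ex, ey, ez = extent
    return z + y * ez + x * ez * ey


def memory_order(offset_fn, extent):
    """Return all (x, y, z) coordinates sorted by their linear offset."""
    ex, ey, ez = extent
    coords = [(x, y, z) for x in range(ex) for y in range(ey) for z in range(ez)]
    return sorted(coords, key=lambda c: offset_fn(*c, extent))


if __name__ == "__main__":
    extent = (3, 2, 2)  # hypothetical extent, chosen only for illustration
    print("X-contiguous:", memory_order(dense_xy_offset, extent)[:4])
    print("Z-contiguous:", memory_order(dense_zy_offset, extent)[:4])
```

The same coordinate maps to a different linear offset under each function, so a buffer written with one layout and read with the other would return the wrong elements even if both declarations were accepted.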

Reasons:
  • Long answer (-0.5):
  • No code block (0.5):
  • Self-answer (0.5):
  • Low reputation (0.5):
Posted by: AlessandroParma