79157272

Date: 2024-11-04 23:28:08
Score: 2

Credit given to user @paleonix

The line

A_s[threadIdx.y * tile_size + threadIdx.x] = a_d[row*n + tile*tile_size + threadIdx.x];

should be changed to

A_s[threadIdx.y * tile_size + threadIdx.x] = a_d[row*k + tile*tile_size + threadIdx.x];

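This is needed because the original line indexes the global-memory matrix A with the wrong row stride when copying it into the shared-memory tile A_s. As the fix implies, A is stored row-major with k columns, so consecutive rows are k elements apart, not n. With row*n the load picks up the wrong elements whenever n differs from k, resulting in an incorrect partial sum.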
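To make the stride issue concrete, here is a minimal host-side sketch in plain C++ (it also compiles under nvcc) that mimics the tile load for one thread. It assumes A is an m x k row-major matrix; the helper names `makeMatrix` and `loadTileElement`, and the sizes used in the usage note, are hypothetical and not from the original kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds an m x k row-major matrix with
// A(r, c) = 100 * r + c, so each value reveals which row and column it came from.
inline std::vector<float> makeMatrix(std::size_t m, std::size_t k) {
    std::vector<float> a(m * k);
    for (std::size_t r = 0; r < m; ++r) {
        for (std::size_t c = 0; c < k; ++c) {
            a[r * k + c] = static_cast<float>(100 * r + c);
        }
    }
    return a;
}

// Hypothetical host-side mimic of the global-memory read in the tile load:
//   a_d[row * rowStride + tile * tile_size + threadIdx.x]
// rowStride is the value under discussion (k is correct, n is the bug).
inline float loadTileElement(const std::vector<float>& a,
                             std::size_t row,
                             std::size_t tile,
                             std::size_t tileSize,
                             std::size_t tx,
                             std::size_t rowStride) {
    const std::size_t idx = row * rowStride + tile * tileSize + tx;
    assert(idx < a.size());  // guard against reading past the buffer
    return a[idx];
}
```

For example, with m = 4, k = 6, n = 3 and tile_size = 2, the thread with row = 1, tile = 1, threadIdx.x = 0 should read A(1, 2). `loadTileElement(makeMatrix(4, 6), 1, 1, 2, 0, 6)` uses stride k and returns 102, which is A(1, 2). Passing 3 (that is, n) as the stride returns 5, which is A(0, 5), an element from the wrong row.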

Reasons:
  • Has code block (-0.5):
  • User mentioned (1): @paleonix
  • Self-answer (0.5):
  • Low reputation (1):
Posted by: Maayan Israel