79232526

Date: 2024-11-28 03:46:12
Score: 2.5

This is because dim = 64 and head_num2 = 16, so the per-head dimension is 64 // 16 = 4, which is not a multiple of 8. PyTorch becomes inefficient in this case.

To avoid this, you also need to set dim = 128, since 128 // 16 = 8, which is a multiple of 8.
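
A minimal sketch to check the arithmetic for a given configuration. The helper names `head_dim` and `is_multiple_of_8` are hypothetical (they are not part of PyTorch), and the multiple-of-8 rule is simply the one stated in this answer:

```python
def head_dim(dim: int, num_heads: int) -> int:
    """Return the per-head dimension, i.e. dim // num_heads."""
    if dim % num_heads != 0:
        raise ValueError(f"dim={dim} is not evenly divisible by num_heads={num_heads}")
    return dim // num_heads


def is_multiple_of_8(dim: int, num_heads: int) -> bool:
    """Check the condition from this answer: per-head dimension is a multiple of 8."""
    return head_dim(dim, num_heads) % 8 == 0


# Configuration from the question: per-head dimension is 4
print(head_dim(64, 16), is_multiple_of_8(64, 16))

# Suggested configuration: per-head dimension is 8
print(head_dim(128, 16), is_multiple_of_8(128, 16))
```

Running a check like this before building the model makes it easy to spot a per-head dimension that falls outside the multiple-of-8 case.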

Reasons:
  • Low length (0.5):
  • No code block (0.5):
  • Self-answer (0.5):
  • Low reputation (1):
Posted by: Kerry Zhu