79331136

Date: 2025-01-05 16:58:02
Score: 1

Packing the matrices A and B is indeed necessary.

For a short outline, see the IBM Redbook on the PowerPC matrix instructions: https://www.redbooks.ibm.com/abstracts/redp5612.html (page 35). PowerPC has blocked matrix-multiply instructions similar to those of VNNI and Arm Neon.

I have written such a packing function as part of my matrix-multiply code. The packing did not hurt the throughput of the code.
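
To illustrate the idea, here is a minimal sketch of such a packing step. It is not the code mentioned above. It assumes a row-major K x N int8 matrix B, a group size of 4 along k (the shape that 4-way dot-product instructions typically consume), and K divisible by 4. The names `pack_b_k4` and `matmul_packed_b` are hypothetical, and the multiply is a scalar stand-in for a real SIMD kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: repack a row-major K x N int8 matrix B so that the
   4 consecutive k-values of each column are adjacent in memory.
   Output layout: packed[(k / 4) * N * 4 + n * 4 + (k % 4)]. */
void pack_b_k4(const int8_t *b, int8_t *packed, size_t K, size_t N)
{
    assert(K % 4 == 0); /* sketch only: a partial group is not zero-padded */
    for (size_t k = 0; k < K; k += 4)
        for (size_t n = 0; n < N; ++n)
            for (size_t i = 0; i < 4; ++i)
                packed[k * N + n * 4 + i] = b[(k + i) * N + n];
}

/* Scalar stand-in for a kernel built on a 4-way dot-product instruction:
   every inner step reads 4 contiguous bytes of A and 4 contiguous bytes
   of the packed B. A is row-major M x K, C is row-major M x N. */
void matmul_packed_b(const int8_t *a, const int8_t *packed, int32_t *c,
                     size_t M, size_t K, size_t N)
{
    assert(K % 4 == 0);
    for (size_t m = 0; m < M; ++m)
        for (size_t n = 0; n < N; ++n) {
            int32_t acc = 0;
            for (size_t k = 0; k < K; k += 4)
                for (size_t i = 0; i < 4; ++i)
                    acc += (int32_t)a[m * K + k + i]
                         * (int32_t)packed[k * N + n * 4 + i];
            c[m * N + n] = acc;
        }
}
```

The packing pass touches each element of B once, while the multiply reuses the packed data for every row of A, which is one plausible reason the cost of packing does not show up in the throughput.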

Reasons:
  • No code block (0.5):
  • Self-answer (0.5):
Posted by: fabian