79696738

Date: 2025-07-10 08:34:22
Score: 0.5

q, k, v, and o stand for the query, key, value, and output projections, respectively. The most common combination for memory efficiency is q and v. You can add k if you need key adaptation for finer control over the attention weights. If your downstream task is generation-heavy, you can also include the output projection, but if you are memory-limited, skip it.
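
To make the memory trade-off concrete, here is a rough sketch that counts the extra trainable parameters per attention layer for each combination. It assumes the adapters are LoRA-style low-rank pairs (the answer above does not name the method) and that all four projections are square, hidden_size x hidden_size, which does not hold for models with grouped-query attention. The function name `lora_param_count` and the values `hidden_size = 4096` and `rank = 8` are hypothetical, chosen only for illustration.

```python
def lora_param_count(hidden_size, rank, targets):
    """Extra trainable parameters per attention layer (LoRA-style assumption).

    Each adapted projection gets two low-rank matrices:
    A with shape (rank, hidden_size) and B with shape (hidden_size, rank).
    """
    per_projection = 2 * rank * hidden_size
    return per_projection * len(targets)


hidden_size = 4096  # hypothetical model width
rank = 8            # hypothetical adapter rank

for targets in (["q", "v"], ["q", "k", "v"], ["q", "k", "v", "o"]):
    count = lora_param_count(hidden_size, rank, targets)
    print(f"{'+'.join(targets):8s} -> {count} trainable params per layer")
```

Under these assumptions the cost grows linearly with the number of adapted projections, so going from q, v to q, k, v, o doubles the adapter parameters. If you are using Hugging Face PEFT, this choice is typically passed through the `target_modules` argument of `LoraConfig`; the exact module names (for example `q_proj`, `v_proj`) depend on the model architecture, so check them on your model first.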

Hope it helps.

Reasons:
  • Whitelisted phrase (-1): Hope it helps
  • No code block (0.5):
  • Low reputation (1):
Posted by: Guillaume Levene