79122694

Date: 2024-10-24 15:45:04
Score: 6.5 🚩
Natty: 5
Report link

I understand the answer from @artoby, but isn't the linear layer (or feed-forward / "thinking" layer) after the self-attention destroying this information flow, as it pulls information from other tokens into previous tokens?
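
For reference, a minimal sketch of the layer the comment refers to (assuming a standard transformer block as in "Attention Is All You Need"; PyTorch is used only for illustration). The feed-forward sub-layer after self-attention is position-wise: it is applied to each token's vector independently, so by itself it does not move information between token positions.

```python
# Sketch under the assumption of a standard position-wise feed-forward sub-layer.
import torch
import torch.nn as nn

d_model, d_ff, seq_len = 8, 32, 4

# Two linear layers with a nonlinearity, shared across all positions.
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)

x = torch.randn(seq_len, d_model)             # one vector per token (output of attention)
full = ffn(x)                                  # applied to the whole sequence at once
per_token = torch.stack([ffn(t) for t in x])   # applied to each token separately

# The two results match: the feed-forward layer mixes feature dimensions,
# not token positions.
print(torch.allclose(full, per_token))         # True
```
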

Reasons:
  • Low length (0.5):
  • No code block (0.5):
  • Ends in question mark (2):
  • User mentioned (1): @artoby
  • Single line (0.5):
  • Looks like a comment (1):
  • Low reputation (1):
Posted by: Flooo