I understand the answer from @artoby, but doesn't the linear layer (i.e. the feed-forward or "thinking" layer) that comes after self-attention destroy this information flow, since it would pull information from other tokens into the earlier tokens' representations?
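
To make the question concrete, here is a minimal sketch of the block structure I'm referring to: masked self-attention followed by the feed-forward layer. This is just an illustrative PyTorch example (the class name `Block` and the dimensions are made up), not anyone's actual implementation; note that `nn.Linear` in the feed-forward part acts only on the last (feature) dimension.

```python
import torch
import torch.nn as nn

# Minimal decoder-style block: masked self-attention followed by the
# feed-forward ("thinking") layer the question refers to.
class Block(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Feed-forward layer: two Linear layers acting on the feature
        # dimension, applied to each token position separately.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        # Causal mask so position i only attends to positions <= i.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        # The Linear layers operate on the last dimension only, so this step
        # transforms each position's vector without looking at other positions.
        return self.ff(attn_out)
```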