79829735

Date: 2025-11-25 13:08:48
Score: 5
Natty: 5

Dude, before applying the accumulated gradients, should we first divide the accumulated gradient by the size of the effective mini-batch (or the target accumulation_count)?
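To make the question concrete, here is a minimal sketch of why the division matters. It assumes a mean-reduced loss and equal-sized micro-batches; the toy model `y = w * x` and the helper names `mse_grad` and `accumulated_grad` are made up for illustration and are not from the original thread. Under those assumptions, summing the micro-batch gradients and dividing by `accumulation_count` should give the same gradient as one pass over the full batch.

```python
# Toy linear model y = w * x with a mean-squared-error loss.
# The gradient is written out by hand so no framework is needed.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accumulation_count):
    """Sum the per-micro-batch mean gradients, then divide by accumulation_count."""
    micro = len(xs) // accumulation_count  # assumes equal-sized micro-batches
    total = 0.0
    for i in range(accumulation_count):
        lo, hi = i * micro, (i + 1) * micro
        total += mse_grad(w, xs[lo:hi], ys[lo:hi])
    return total / accumulation_count


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = mse_grad(w, xs, ys)                                  # one pass over all 4 samples
accum = accumulated_grad(w, xs, ys, accumulation_count=2)   # 2 micro-batches of 2 samples
print(full, accum)
```

Without the final division the accumulated gradient would be `accumulation_count` times larger, which behaves like a silently scaled-up learning rate. If the loss is sum-reduced instead of mean-reduced, the right divisor would be the total number of samples rather than the accumulation count.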

Reasons:
  • Low length (1):
  • No code block (0.5):
  • Ends in question mark (2):
  • Single line (0.5):
  • Low reputation (1):
Posted by: shaurya1negi