In my case, the denominator of the Adam update step was getting close to zero. Increasing the optimizer's eps parameter is what stabilized training.
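To illustrate why eps matters, here is a minimal sketch of a single scalar Adam step in plain Python, following the standard update rule. It is not taken from any particular framework: the names `adam_step` and `first_step_size` and the values used (a gradient of 1e-7, eps of 1e-8 versus 1e-3) are assumptions chosen for illustration only.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update (standard rule). Returns (new_param, new_m, new_v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # eps keeps the denominator away from zero when v_hat is tiny.
    new_param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return new_param, m, v


def first_step_size(grad, eps, lr=1e-3):
    """Hypothetical helper: size of the very first update for a given gradient."""
    new_param, _, _ = adam_step(0.0, grad, m=0.0, v=0.0, t=1, lr=lr, eps=eps)
    return abs(new_param)


if __name__ == "__main__":
    tiny_grad = 1e-7  # illustrative value: a gradient that is mostly noise
    # With a small eps the step stays close to lr even for a tiny gradient.
    print(first_step_size(tiny_grad, eps=1e-8))
    # With a larger eps the same gradient produces a much smaller step.
    print(first_step_size(tiny_grad, eps=1e-3))
```

When sqrt(v_hat) is tiny, a small eps lets the update stay near the full learning rate no matter how small the gradient is, so noise gets amplified. A larger eps damps exactly those updates. In PyTorch the corresponding setting is the `eps` argument of `torch.optim.Adam` (default 1e-8); other frameworks expose a similar parameter, and the right value depends on your model and numeric precision.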