Since the model is actually changing between the odd epochs, it is more likely to be a hyperparameter issue than an architecture issue. I would try adjusting the learning rate and batch size and see if that helps.
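If it helps, here is a rough sketch of what that kind of sweep could look like. It is not your model: the `train` function is a hypothetical stand-in that fits a toy line `y = 3x + 1` with plain mini-batch SGD, and the grid values are arbitrary assumptions. The idea is only to record the per-epoch loss for each (learning rate, batch size) pair so you can compare the curves.

```python
import itertools
import random


def train(lr, batch_size, epochs=20, seed=0):
    """Toy stand-in for a real training loop (assumption, not the original model).

    Fits y = 3x + 1 with mini-batch SGD and returns the mean squared
    error over the full dataset after each epoch.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(256)]
    data = [(x, 3.0 * x + 1.0) for x in xs]
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Gradients of the mean squared error with respect to w and b.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
        history.append(sum((w * x + b - y) ** 2 for x, y in data) / len(data))
    return history


def sweep(lrs=(0.001, 0.01, 0.1), batch_sizes=(16, 64)):
    """Run train() for every (lr, batch_size) pair and collect the loss curves."""
    return {
        (lr, bs): train(lr, bs)
        for lr, bs in itertools.product(lrs, batch_sizes)
    }


results = sweep()
# Pick the configuration with the lowest final-epoch loss.
best = min(results, key=lambda k: results[k][-1])
print("best (lr, batch_size):", best)
```

In practice you would swap `train` for your own training loop and look at the whole loss curve for each setting, not just the final value, since the pattern across epochs is what you are trying to change.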