After some investigation, I found the problem. The model included two Dropout layers, which are active during training but disabled during evaluation. This affected the final accuracy reported by evaluate. During training, a random subset of units is dropped at each step. At inference, the model uses all of its connections, so it does not behave the way it did while training. After removing the Dropout layers, the model trained correctly.
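To illustrate why the two modes differ, here is a minimal sketch of inverted dropout in plain Python. It is not the Keras implementation, and the function `dropout` and its parameters are hypothetical names chosen for illustration. It assumes the common "inverted" scheme, which the Keras Dropout layer documents: during training each value is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), while at inference the input passes through unchanged.

```python
import random


def dropout(values, rate, training, rng=None):
    """Hypothetical inverted-dropout sketch (not the Keras source).

    Training: zero each value with probability `rate`, scale survivors
    by 1 / (1 - rate) so the expected value is unchanged.
    Inference: return the input untouched (all connections used).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0] * 10
rng = random.Random(0)  # fixed seed so the training mask is reproducible

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False)

print(train_out)  # a random mix of 0.0 and 2.0
print(eval_out)   # identical to the input
```

Because the training output depends on a random mask while the inference output does not, metrics computed during training (with dropout active) and metrics from evaluate (with dropout disabled) are measured on effectively different networks.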