As you said, you are using a DeiT model. The learning rate you are training with is relatively high for a model like DeiT, which leads the model to converge to a suboptimal solution, and that is why your model is favouring only one class.
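To make this concrete, here is a minimal sketch of one common way to keep the learning rate lower: a linear warmup followed by cosine decay. The function name `lr_at_step` and all default values (`warmup_steps`, `peak_lr`, `min_lr`) are illustrative assumptions, not settings taken from your setup or from the DeiT authors, so treat them as a starting point to tune.

```python
import math

def lr_at_step(step, total_steps, warmup_steps=500, peak_lr=5e-5, min_lr=1e-6):
    """Hypothetical schedule: linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Ramp up linearly so the earliest updates stay small.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to at most 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine decay from peak_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Print the learning rate at a few points in a 10,000-step run.
for step in (0, 499, 500, 5000, 10000):
    print(step, lr_at_step(step, total_steps=10000))
```

In a PyTorch training loop you could assign this value to `param_group["lr"]` for each entry in `optimizer.param_groups` before every optimizer step. If the predictions stop collapsing onto one class with a lower peak value, that supports the learning rate being the cause.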