I don't see an error here, other than what looks like a mix-up over which training tensor is used when calling the model in training mode.
In the statement below, trainTgt is passed to build the padding mask where the source data would be expected. That shouldn't really matter, since you are only using the output predictions for reference. Do you have an error message you can share, so I can understand the issue better?
tgt_padding_mask = generate_padding_mask(trainTgt, tokenizer.vocab['[PAD]']).cuda()
model.train()
trainPred: torch.Tensor = model(trainSrc, trainTgt, tgt_mask, tgt_padding_mask)
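For reference, here is a minimal sketch of how a padding mask relates to the tensor it is built from. The definition of generate_padding_mask is not shown in the question, so the version below is only an assumption (True marks a padded position), and PAD_ID and the toy tensors are made up for illustration. It runs on the CPU so it can be tried without a GPU.

```python
import torch


def generate_padding_mask(tokens: torch.Tensor, pad_id: int) -> torch.Tensor:
    """Assumed behaviour: True wherever a position holds the padding token."""
    return tokens == pad_id


PAD_ID = 0  # hypothetical id; the original code uses tokenizer.vocab['[PAD]']

# Toy batches of token ids, shape (batch, sequence length).
src = torch.tensor([[5, 6, 7, PAD_ID], [8, 9, PAD_ID, PAD_ID]])
tgt = torch.tensor([[1, 2, PAD_ID], [3, 4, 5]])

# Each mask has the same shape as the tensor it was built from,
# so a mask built from tgt describes padding in tgt, not in src.
src_padding_mask = generate_padding_mask(src, PAD_ID)
tgt_padding_mask = generate_padding_mask(tgt, PAD_ID)

print(src_padding_mask.shape, tgt_padding_mask.shape)
```

If the source and target sequences have different lengths, a mask built from one will not line up with the other, which is one thing worth checking if a shape error shows up.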
Thanks,
Ramakoti Reddy.