Your model may be suffering from an insufficient number of training samples.
I think the problem may also lie in how your dataset is structured for training. Double-check how you prepared the dataset; there appears to be an issue with your tokenize function.
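For reference, a minimal sketch of what that function usually looks like (the `"text"` column name and `max_length` value are assumptions on my part; adjust them to your dataset):

```python
def tokenize_function(examples):
    # Relies on a module-level `tokenizer` -- the pattern I'd avoid, see below
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=128,
    )
```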
Also, I don't think you should rely on a global tokenizer inside that function -- why not pass the tokenizer explicitly instead?
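Something like the sketch below, using `fn_kwargs` in `Dataset.map` to hand the tokenizer in as an argument (the model name and dataset here are just placeholders, assuming you're on Hugging Face `datasets`/`transformers`):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder model

def tokenize_function(examples, tokenizer):
    # The tokenizer is now an explicit parameter instead of a global
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=128,
    )

dataset = load_dataset("imdb", split="train")  # placeholder dataset
tokenized = dataset.map(
    tokenize_function,
    batched=True,
    fn_kwargs={"tokenizer": tokenizer},  # pass the tokenizer explicitly
)
```

Making the tokenizer an explicit argument keeps the function self-contained, which makes it easier to test and avoids surprises if the global is reassigned elsewhere in your script.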