To solve the issue, I replaced DataCollatorWithPadding with DataCollatorForLanguageModeling.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
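DataCollatorForLanguageModeling automatically creates the labels column, so you don’t need to generate it manually in your code. With mlm=False it pads the batch, copies input_ids into labels, and sets the padding positions to -100 so the loss ignores them. This change fixed the problem.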
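To make the labeling step concrete, here is a minimal pure-Python sketch of what the collator does with mlm=False. It is not the library's code. The function name collate_causal_lm and the plain-list representation are hypothetical, and it assumes right padding and a tokenizer that defines a pad token id.

```python
from typing import Dict, List

# Label value that PyTorch's cross-entropy loss skips by default (ignore_index=-100).
IGNORE_INDEX = -100


def collate_causal_lm(
    features: List[Dict[str, List[int]]], pad_token_id: int
) -> Dict[str, List[List[int]]]:
    """Hypothetical sketch of causal-LM collation (mlm=False); assumes right padding."""
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask, labels = [], [], []
    for f in features:
        ids = f["input_ids"]
        n_pad = max_len - len(ids)
        padded = ids + [pad_token_id] * n_pad
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels are a copy of input_ids with pad positions masked out of the loss.
        # Masking is by token value, so any real token equal to pad_token_id is masked too.
        labels.append([IGNORE_INDEX if t == pad_token_id else t for t in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch = collate_causal_lm([{"input_ids": [5, 6, 7]}, {"input_ids": [8, 9]}], pad_token_id=0)
print(batch["labels"])  # [[5, 6, 7], [8, 9, -100]]
```

Note that the labels are not shifted here. In Hugging Face causal LM models, the one-position shift for next-token prediction happens inside the model when it computes the loss. Also, because masking is by value, setting the pad token to the EOS token (a common workaround for GPT-2) also masks real EOS tokens in the labels.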