Set a proper pad_token (for example, `tokenizer.pad_token = tokenizer.eos_token`) and pass an attention_mask to model.chat() if it accepts one.
If .chat() doesn't take an attention_mask, tokenize your inputs yourself:

```python
encoding = tokenizer(text, return_tensors="pt", padding=True, return_attention_mask=True)
```

and then pass encoding["attention_mask"] to model.generate() manually.
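A minimal sketch of the full workaround, assuming a causal LM chat checkpoint loaded via `transformers` (the model name here is just a placeholder; substitute your own):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint for illustration only; use whatever model you're working with.
model_name = "Qwen/Qwen-7B-Chat"

# trust_remote_code is typically required for checkpoints that ship a custom .chat() method.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Many chat checkpoints ship without a pad token; reuse EOS so padding is well-defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

text = "Hello, how are you?"
encoding = tokenizer(text, return_tensors="pt", padding=True, return_attention_mask=True)

# Bypass .chat() and call generate() directly so the attention mask is actually honored.
output_ids = model.generate(
    input_ids=encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    max_new_tokens=128,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that calling generate() directly skips any prompt templating that .chat() does internally, so you may need to apply the model's chat template to `text` first (e.g. via `tokenizer.apply_chat_template`, if the tokenizer provides one).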