This error is almost always a KV-cache mismatch involving the Cache object introduced in recent versions. During training you don’t need the KV cache at all, so you can turn it off:
model.config.use_cache = False
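
If the same model object is later reused for generation, you will probably want the cache back on, since it speeds up decoding. Here is a minimal sketch of one way to scope the change, assuming the model exposes config.use_cache as in the line above. The name kv_cache_disabled is a hypothetical helper, not a library API, and the SimpleNamespace stand-in is only there so the snippet runs without loading a real model.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def kv_cache_disabled(model):
    """Hypothetical helper: set model.config.use_cache to False, restore it on exit."""
    previous = model.config.use_cache
    model.config.use_cache = False
    try:
        yield model
    finally:
        # Restore the original setting even if training raises
        model.config.use_cache = previous


# Stand-in for a real model (assumption: only config.use_cache matters here)
model = SimpleNamespace(config=SimpleNamespace(use_cache=True))

with kv_cache_disabled(model):
    cache_during_training = model.config.use_cache
    # your training loop would run here

cache_after_training = model.config.use_cache
```

With a real model you would pass the loaded model in place of the stand-in. If you never generate with that object afterwards, the single assignment above is enough.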