You can do the following:
Masking + Loss Adjustment:
Pad the targets to a common length with zeros (or any other value).
Use a mask during training so that padded positions are ignored in the loss calculation.
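Here is a minimal sketch of that idea in plain Python, assuming a regression task with a mean-squared-error loss. The names `pad_targets` and `masked_mse` are made up for illustration and are not from any library; in a real framework you would do the same thing with tensor operations.

```python
def pad_targets(sequences, pad_value=0.0):
    """Pad variable-length target lists to a common length and build a 0/1 mask."""
    max_len = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        # 1.0 marks a real target, 0.0 marks a padded position
        mask.append([1.0] * len(seq) + [0.0] * n_pad)
    return padded, mask


def masked_mse(predictions, targets, mask):
    """Mean squared error averaged over unmasked (real) positions only."""
    total, count = 0.0, 0.0
    for pred_row, target_row, mask_row in zip(predictions, targets, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            total += m * (p - t) ** 2  # padded positions contribute zero
            count += m                 # and are not counted in the average
    return total / count if count > 0 else 0.0


# Two target sequences of different lengths (illustrative values)
targets, mask = pad_targets([[1.0, 2.0, 3.0], [4.0]])
predictions = [[1.0, 2.0, 5.0], [4.0, 7.0, 7.0]]
loss = masked_mse(predictions, targets, mask)
```

Note that the loss is divided by the number of real positions, not by the full padded size. Otherwise batches with more padding would appear to have a smaller loss.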
Avoid Ambiguous Padding:
Instead of padding with 0.0 (which may be a valid target value), use a sentinel that cannot occur in your data, such as -9999.0.
Then build the mask from that sentinel so padded positions are ignored during both training and evaluation.
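A rough sketch of the sentinel approach, again in plain Python. `PAD_VALUE`, `mask_from_sentinel` and `masked_mae` are hypothetical names, and -9999.0 is only safe if it can never appear as a genuine target in your data.

```python
PAD_VALUE = -9999.0  # assumed sentinel; must never occur as a real target


def mask_from_sentinel(targets, pad_value=PAD_VALUE):
    """Derive a 0/1 mask directly from the padded targets."""
    return [[0.0 if t == pad_value else 1.0 for t in row] for row in targets]


def masked_mae(predictions, targets, pad_value=PAD_VALUE):
    """Mean absolute error over real positions; sentinel positions are skipped."""
    errors = [
        abs(p - t)
        for pred_row, target_row in zip(predictions, targets)
        for p, t in zip(pred_row, target_row)
        if t != pad_value
    ]
    return sum(errors) / len(errors) if errors else 0.0


# 0.0 is a genuine target here, so it must not be confused with padding
targets = [[0.0, 2.0, PAD_VALUE], [1.5, PAD_VALUE, PAD_VALUE]]
predictions = [[0.5, 2.0, 3.0], [1.0, 8.0, 8.0]]
mask = mask_from_sentinel(targets)
mae = masked_mae(predictions, targets)
```

The exact equality check works because the sentinel is written into the targets unchanged. If the targets are normalized or scaled after padding, build the mask before that step.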
Use Sequence Models:
Dynamic Output Generation:
The key: combine padding with masking, and never penalize the model for predictions on padded areas.