I am currently facing a similar issue with the Qwen2.5-3B model: full fine-tuning on a 5k-example dataset takes only 38 minutes, while inference on 1,319 examples takes 1 hour 24 minutes.
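To put the two numbers side by side, here is a rough back-of-the-envelope sketch of the time spent per example. It assumes a single pass over the 5k training set (`assumed_epochs = 1` is an assumption, not something measured); with more epochs, the training time per example seen would be even lower.

```python
# Timings reported in this post.
train_examples = 5_000
train_minutes = 38
infer_examples = 1_319
infer_minutes = 60 + 24  # 1 hour 24 minutes

# Assumption: training made one pass over the dataset.
# Adjust this if the run used more epochs.
assumed_epochs = 1

# Average wall-clock seconds per example for each phase.
train_sec_per_example = train_minutes * 60 / (train_examples * assumed_epochs)
infer_sec_per_example = infer_minutes * 60 / infer_examples

print(f"training:  {train_sec_per_example:.2f} s per example")
print(f"inference: {infer_sec_per_example:.2f} s per example")
print(f"ratio:     {infer_sec_per_example / train_sec_per_example:.1f}x")
```

Under that assumption, inference is several times slower per example than training, which is the gap I am trying to understand.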