I’ve worked on a project fine-tuning Whisper-Tiny for translation tasks, and I got good results. You can check out my repo for the steps I followed, which might help you fix the issue you’re facing.
For your problem, I suggest first checking that the target text in your training data is actually written in Arabic script. Also, make sure the fine-tuning settings are set up for translation (not just transcription), and double-check that your preprocessing preserves Arabic characters rather than stripping or altering them. Finally, test the model after fine-tuning on a few examples: that should help identify whether the issue is with the training or with how you're using the model during inference.
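The training-data check can be sketched in plain Python. This is only a minimal sketch, not code from my repo: the function names (`arabic_ratio`, `find_suspect_targets`) and the 0.8 threshold are made up for illustration, and it assumes your target texts are available as a list of strings.

```python
# Minimal sketch: flag training targets that are not mostly Arabic script.
# All names and the default threshold are hypothetical, chosen for illustration.

# Unicode blocks that contain Arabic-script characters.
ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
]


def is_arabic_char(ch: str) -> bool:
    """Return True if the character falls in one of the Arabic blocks."""
    code = ord(ch)
    return any(start <= code <= end for start, end in ARABIC_RANGES)


def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are Arabic script.

    Digits, punctuation, whitespace, and diacritics are ignored.
    Returns 0.0 when the text contains no letters at all.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if is_arabic_char(ch))
    return arabic / len(letters)


def find_suspect_targets(targets: list[str], threshold: float = 0.8) -> list[int]:
    """Return indices of targets whose Arabic ratio is below the threshold."""
    return [i for i, text in enumerate(targets) if arabic_ratio(text) < threshold]


if __name__ == "__main__":
    samples = ["مرحبا بالعالم", "marhaban bil-alam", "مرحبا ok", ""]
    for idx in find_suspect_targets(samples):
        print(f"target {idx} looks suspicious: {samples[idx]!r}")
```

If this flags a lot of your targets (transliterated text, English, or empty strings), the model is probably learning the wrong output, and the problem is in the data rather than in inference.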
Feel free to check out my repo for more details!