If you are using the model to transcribe streaming audio, try the streamingRecognize() function, since it is specialized for streaming audio transcription. If your audio files are longer than 60 seconds, I would recommend splitting them into 60-second chunks, transcribing each chunk, and joining the outputs into one transcript. I tried this approach with the chirp_2 model and it worked well. Audio quality also matters most of the time, so watch out for that as well.
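Here is a minimal sketch of the split-and-join approach, assuming the input is an uncompressed WAV file and using only the Python standard library. The names `split_wav`, `transcribe_long_audio`, and `transcribe_chunk` are hypothetical, not part of any Speech-to-Text client library. `transcribe_chunk` stands in for whatever call you make to the service for a single chunk.

```python
import io
import wave
from typing import Callable, Iterator


def split_wav(path_or_file, chunk_seconds: int = 60) -> Iterator[bytes]:
    """Yield complete WAV files (as bytes), each at most chunk_seconds long."""
    with wave.open(path_or_file, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected automatically when dst is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            yield buf.getvalue()


def transcribe_long_audio(
    path_or_file,
    transcribe_chunk: Callable[[bytes], str],  # hypothetical: one API call per chunk
    chunk_seconds: int = 60,
) -> str:
    """Transcribe each chunk in order and join the non-empty results."""
    parts = (transcribe_chunk(chunk) for chunk in split_wav(path_or_file, chunk_seconds))
    return " ".join(p.strip() for p in parts if p.strip())
```

You would pass your own function as `transcribe_chunk`, for example one that sends the chunk bytes to the chirp_2 model and returns the text. Keep in mind that cutting at fixed 60-second boundaries can split a word in half, so you may want to cut at a pause instead if you see errors around the joins.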