I've recently encountered an unexpected performance issue while running the Whisper Turbo V3 ASR model on NVIDIA GPUs. When running inference through Triton Inference Server, the model performs better on a V100 GPU than on an A100 GPU. This is surprising, since the A100 is significantly more powerful and is optimized for AI workloads.
Observations:
Latency and Throughput: Lower latency and higher throughput were observed on the V100 (a rough sketch of the kind of client-side measurement is included below).
Model and Environment: The Whisper Turbo V3 model is the same in both cases, and the Triton configurations are identical.
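For reference, here is a minimal sketch of how such latency/throughput numbers can be gathered from the client side. The model name (whisper_turbo_v3) and tensor names (AUDIO, TRANSCRIPT) are placeholders; the actual names depend on the model's config.pbtxt.

```python
# Placeholder sketch: whisper_turbo_v3, AUDIO, and TRANSCRIPT are hypothetical
# names -- substitute the ones from your model's config.pbtxt.
import time
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Dummy 30 s of 16 kHz audio; replace with real samples for a fair comparison.
audio = np.random.randn(1, 16000 * 30).astype(np.float32)

inp = grpcclient.InferInput("AUDIO", list(audio.shape), "FP32")
inp.set_data_from_numpy(audio)
out = grpcclient.InferRequestedOutput("TRANSCRIPT")

# Warm up once so one-time initialization does not skew the numbers.
client.infer(model_name="whisper_turbo_v3", inputs=[inp], outputs=[out])

n = 20
start = time.perf_counter()
for _ in range(n):
    client.infer(model_name="whisper_turbo_v3", inputs=[inp], outputs=[out])
elapsed = time.perf_counter() - start

print(f"avg latency: {elapsed / n * 1000:.1f} ms, throughput: {n / elapsed:.2f} req/s")
```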
Any suggestions as to why this might happen? Thanks in advance for any help!