79229061

Date: 2024-11-27 05:54:18
Score: 7.5 🚩
Natty: 4.5

I've recently encountered an unexpected performance issue while running the Whisper Turbo V3 ASR model on NVIDIA GPUs. When running inference through Triton Inference Server, the model performs better on a V100 GPU than on an A100 GPU. This is surprising, since the A100 is significantly more powerful and optimized for AI workloads.

Observations:

Latency and Throughput: Lower latency and higher throughput were observed on the V100.

Model and Environment: The Whisper Turbo V3 model is the same in both cases, and the Triton configurations are identical.

Any suggestion why this might happen? Thanks in advance for any help!
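Before comparing the two GPUs, it is worth making sure the measurements themselves are comparable: warm-up requests (which pay for CUDA context creation and kernel autotuning) should be excluded, and latency percentiles should be computed the same way on both machines. The sketch below is a minimal, hypothetical benchmarking harness; `infer` is a placeholder for whatever client call you use to hit the Triton endpoint (e.g. a `tritonclient` request), not part of any real Triton API.

```python
import statistics
import time

def benchmark(infer, n_warmup=10, n_iters=100):
    """Time repeated calls to `infer` and report latency and throughput.

    `infer` is a zero-argument callable that performs one inference
    request (a placeholder for the actual Triton client call).
    """
    # Warm up so one-time costs (CUDA context, autotuning, model load)
    # do not pollute the measurement on either GPU.
    for _ in range(n_warmup):
        infer()

    latencies_ms = []
    start = time.perf_counter()
    for _ in range(n_iters):
        t0 = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    total_s = time.perf_counter() - start

    return {
        "mean_ms": statistics.mean(latencies_ms),
        # 19th of 20 quantiles ~ the 95th-percentile latency.
        "p95_ms": statistics.quantiles(latencies_ms, n=20)[-1],
        "throughput_rps": n_iters / total_s,
    }
```

Running the same harness against both servers (or using Triton's own `perf_analyzer` tool with identical concurrency settings) rules out measurement differences before digging into GPU-specific causes such as TF32/FP16 settings, dynamic batching, or instance-group configuration.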

Reasons:
  • Blacklisted phrase (0.5): Thanks
  • Blacklisted phrase (1): any help
  • RegEx Blacklisted phrase (3): Thanks in advance
  • RegEx Blacklisted phrase (2): Any suggestion
  • Long answer (-0.5):
  • No code block (0.5):
  • Low reputation (1):
Posted by: Sneh Shah