Reports

We experienced a similar problem and eventually discovered that in Azure OpenAI you need to set the Asynchronous Content Filter option. It's buried in the Azure model deployment settings in the Azure AI Foundry portal.

Without that it is essentially internally buffering the streamed response to enable the system to scan and block / flag content before it's returned.

79411004