Best practice is to wrap the generator in proper error handling. For token limits, validate or truncate the input before it is sent to the API. For 429 rate limit errors, use exponential backoff, for example via a retry library like tenacity.
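A minimal sketch of both ideas, assuming the openai v1.x Python SDK, a recent tiktoken, and tenacity; the model name and token budget below are illustrative:

```python
import openai
import tiktoken
from openai import OpenAI
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

client = OpenAI()  # reads OPENAI_API_KEY from the environment
ENCODING = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o family
MAX_INPUT_TOKENS = 4000  # illustrative prompt budget

def truncate_to_budget(text: str) -> str:
    """Trim the input so it fits the token budget before it ever reaches the API."""
    tokens = ENCODING.encode(text)
    return ENCODING.decode(tokens[:MAX_INPUT_TOKENS])

@retry(
    retry=retry_if_exception_type(openai.RateLimitError),  # only retry 429s
    wait=wait_exponential(multiplier=1, max=30),           # 1s, 2s, 4s, ... capped at 30s
    stop=stop_after_attempt(5),                            # then re-raise to the caller
)
def start_stream(messages):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
```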
For other OpenAI errors (API errors, connection issues, timeouts), catch them explicitly and return clear HTTP error responses to the client. Always close the stream properly and yield partial tokens safely so clients don’t get cut off mid-response; keep in mind that once the response has started streaming you can no longer change the HTTP status code, so errors raised mid-stream must be handled inside the generator itself.
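Here is one way the serving side could look, assuming FastAPI; the route, payload shape, and `start_stream` helper are all illustrative (the retrying version of `start_stream` from the previous snippet slots in directly):

```python
import openai
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI()

def start_stream(messages):  # see the retrying version sketched above
    return client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, stream=True
    )

def token_stream(stream):
    # Yield text deltas as they arrive; the finally block closes the upstream
    # stream even if the client disconnects or an exception escapes, so the
    # connection is never leaked.
    try:
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content
    finally:
        stream.close()

@app.post("/chat")
def chat(payload: dict):
    try:
        stream = start_stream(payload["messages"])
    except openai.RateLimitError:
        # Retries are exhausted at this point; surface a 429 to the client.
        raise HTTPException(status_code=429, detail="Rate limited, try again later.")
    except (openai.APIConnectionError, openai.APITimeoutError):
        raise HTTPException(status_code=504, detail="Upstream connection failed or timed out.")
    except openai.APIError as exc:
        raise HTTPException(status_code=502, detail=f"OpenAI API error: {exc}")
    return StreamingResponse(token_stream(stream), media_type="text/plain")
```

Note the ordering of the except clauses: `RateLimitError`, `APIConnectionError`, and `APITimeoutError` are all subclasses of `APIError` in the v1 SDK, so the specific handlers must come before the general one.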