If you're okay using synchronous streaming:
from transformers import TextStreamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
TextStreamer prints the decoded text to stdout as it is generated, so you would then have to redirect stdout into a custom generator function. But since you already want async streaming with FastAPI, let's fix it properly.
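For reference, the stdout-redirection workaround mentioned above could look roughly like the sketch below. It uses only the standard library. The names QueueWriter, stream_stdout, and fake_generate are made up for illustration, and fake_generate is a stand-in for a call like model.generate(..., streamer=streamer), which would print pieces of text to stdout.

```python
import contextlib
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


class QueueWriter:
    """Hypothetical file-like object that forwards each write() to a queue."""

    def __init__(self, q):
        self.q = q

    def write(self, text):
        if text:  # print() may call write("") for an empty `end`
            self.q.put(text)
        return len(text)

    def flush(self):
        pass


def fake_generate():
    # Assumption: stands in for model.generate(..., streamer=streamer),
    # which prints decoded text to stdout piece by piece.
    for piece in ["Hello", " ", "world"]:
        print(piece, end="", flush=True)


def stream_stdout(producer):
    """Run `producer` in a thread and yield whatever it prints."""
    q = queue.Queue()

    def run():
        try:
            with contextlib.redirect_stdout(QueueWriter(q)):
                producer()
        finally:
            q.put(_DONE)  # always signal completion, even on error

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        yield item


if __name__ == "__main__":
    chunks = list(stream_stdout(fake_generate))
    print(chunks)
```

Note that contextlib.redirect_stdout swaps sys.stdout for the whole process, not just the worker thread, so anything else printed while generation is running ends up in the stream too. That is one reason this approach is a workaround rather than a proper fix.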