Thanks to Yurii for letting me know about the behavior change in FastAPI 0.106.0. It seems impossible to extend the lifespan of the dependency object (or at least it cannot be done cleanly).

After some research, I realized the best practice is still to serialize and cache the object before the endpoint returns, and to start a new context in the background task. So there has to be a different context manager that doesn't load the object from the cache, but instead uses an existing object, and still saves it to the cache at the end.

Here is my modification for reference: I just added an async context manager method `autosave` and used it in the async generator. In this example it's equivalent to saving manually, but the pattern extends to more complicated scenarios.
```python
@asynccontextmanager
async def autosave(self, cache: dict):
    yield
    await self.save(cache)

async def chat(self, msg: str, cache: dict):
    self.add_message(msg)

    async def get_chat():
        async with self.autosave(cache=cache):
            resp = []
            for i in range(random.randint(3, 5)):
                # simulate long network IO
                await asyncio.sleep(0.1)
                chunk = f"resp{i}:{random.randbytes(2).hex()};"
                resp.append(chunk)
                yield chunk
            self.add_message("".join(resp))

    return get_chat()
```
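To make the pattern concrete outside of FastAPI, here is a minimal, self-contained sketch. `ChatSession`, `save`, and `add_message` are hypothetical stand-ins for the real session class (not shown in this thread); the point is only how `autosave` wraps a block of work and persists the object when the block finishes. I also wrapped the `yield` in `try`/`finally` so the save runs even if the consumer abandons the generator early, which matters for streaming responses:

```python
import asyncio
from contextlib import asynccontextmanager


class ChatSession:
    """Hypothetical stand-in for the session object discussed above."""

    def __init__(self):
        self.messages = []

    def add_message(self, msg: str):
        self.messages.append(msg)

    async def save(self, cache: dict):
        # Serialize the session state into the cache (here: a plain dict).
        cache["messages"] = list(self.messages)

    @asynccontextmanager
    async def autosave(self, cache: dict):
        # finally ensures the save happens even if the wrapped block
        # is interrupted (e.g. the client disconnects mid-stream).
        try:
            yield
        finally:
            await self.save(cache)


async def main():
    cache = {}
    session = ChatSession()
    async with session.autosave(cache=cache):
        session.add_message("hello")
        await asyncio.sleep(0)  # simulate network IO
        session.add_message("world")
    return cache


cache = asyncio.run(main())
print(cache["messages"])  # ['hello', 'world']
```

Whether you want the `finally` depends on your semantics: with it, a partially built response is still cached; without it (as in my snippet above), an aborted stream leaves the cache untouched.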