I ended up checking whether the input was longer than 1000 characters. If it was, I first sent the input to an LLM with the prompt "Summarize this text to max 600 characters. The text will be used to retrieve documents from a vector storage" (adjust as needed).
I then used the returned summary as the query to fetch context from the vector store, and passed that context together with the original (unsummarized) input to the model to generate the answer.
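
Roughly, the flow looks like the sketch below. It is not tied to any particular library: `llm` and `retrieve` are hypothetical callables standing in for whatever model client and vector store you use, and the function names and final prompt layout are my own assumptions. Only the 1000-character threshold and the summarization prompt come from what I described above.

```python
from typing import Callable, List

# Threshold and prompt taken from the description above; tune both as needed.
MAX_QUERY_CHARS = 1000
SUMMARY_PROMPT = (
    "Summarize this text to max 600 characters. "
    "The text will be used to retrieve documents from a vector storage.\n\n"
)


def build_retrieval_query(user_input: str, llm: Callable[[str], str]) -> str:
    """Return a query for the vector store, summarizing only long inputs."""
    if len(user_input) > MAX_QUERY_CHARS:
        # Long input: ask the model for a compact summary to search with.
        return llm(SUMMARY_PROMPT + user_input)
    # Short input: use it directly as the query.
    return user_input


def answer(
    user_input: str,
    llm: Callable[[str], str],              # hypothetical: prompt -> completion
    retrieve: Callable[[str], List[str]],   # hypothetical: query -> documents
) -> str:
    """Retrieve with the (possibly summarized) query, answer with the original input."""
    query = build_retrieval_query(user_input, llm)
    context = retrieve(query)
    # The final prompt uses the full original input, not the summary.
    prompt = (
        "Context:\n" + "\n".join(context) + "\n\nQuestion:\n" + user_input
    )
    return llm(prompt)
```

The summary is only used for retrieval. The answer step still sees the full original input, so nothing is lost from the question itself.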