I'm facing exactly the same issue. The token counts don't add up, even following the logic of RAG, where the total should be roughly the system prompt + the user prompt + the retrieved chunks. In my case, I'm retrieving 3 chunks of 256 tokens each and sending a very short prompt, yet the request consumes ~3k tokens. More details here: https://learn.microsoft.com/en-us/answers/questions/2103832/high-token-consumption-in-azure-openai-with-your-d.
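For reference, here's the back-of-the-envelope math I'd expect to hold. The system-prompt and user-prompt sizes below are assumptions for illustration (I don't know the actual size of the built-in "On Your Data" system prompt); only the chunk count and chunk size are from my setup:

```python
# Rough token budget for a RAG request.
# system_prompt_tokens and user_prompt_tokens are assumed values, not measured.
system_prompt_tokens = 400   # assumption: size of the hidden system prompt
user_prompt_tokens = 50      # assumption: my prompt is very short
chunks = 3                   # my configuration: 3 retrieved chunks
chunk_size = 256             # my configuration: 256 tokens per chunk

expected = system_prompt_tokens + user_prompt_tokens + chunks * chunk_size
observed = 3000              # roughly what the usage report shows

print(f"expected ~ {expected} tokens")                  # ~1218
print(f"unexplained overhead ~ {observed - expected}")  # ~1782
```

Even with generous assumptions for the prompts, that leaves well over half the reported consumption unaccounted for.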
I'm wondering whether there are additional steps (e.g. query rewriting or intent extraction) running under the hood.