Hard to tell what your question is unless you post the error messages printed on your terminal. Besides, open-source LLMs on Hugging Face generally ship with a built-in KV cache implementation. I don't know about Qwen, but Llama definitely has one.
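If it helps, here is a minimal sketch of what a KV cache does during decoding — a toy single-head attention in plain NumPy, not the actual transformers implementation (all shapes and names here are made up for illustration). The point is that each step appends the new key/value to the cache instead of recomputing them for the whole prefix:

```python
import numpy as np

def attend(q, K, V):
    # scaled dot-product attention: one query against all cached keys
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 4  # toy head dimension
# hypothetical per-token key/value/query projections for 3 tokens
keys = rng.normal(size=(3, d))
vals = rng.normal(size=(3, d))
queries = rng.normal(size=(3, d))

# incremental decoding with a KV cache: append, never recompute the prefix
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
cached_out = []
for t in range(3):
    K_cache = np.vstack([K_cache, keys[t:t + 1]])
    V_cache = np.vstack([V_cache, vals[t:t + 1]])
    cached_out.append(attend(queries[t], K_cache, V_cache))

# recomputing attention from scratch at every step gives the same result
full_out = [attend(queries[t], keys[:t + 1], vals[:t + 1]) for t in range(3)]
assert np.allclose(cached_out, full_out)
```

With Hugging Face models you normally don't touch any of this yourself: `model.generate()` uses the cache by default (`use_cache=True`), so unless you disabled it explicitly, the cache is already on.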