llama.cpp's server doesn't expose a /v1/generate endpoint, so requests to it return a 404 error. Use the OpenAI-compatible /v1/chat/completions endpoint instead.
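A minimal sketch of calling that endpoint with Python's standard library, assuming the server is running locally on its default port (8080) with no API key configured:

```python
import json
import urllib.request

# Assumes llama-server is listening on localhost:8080; adjust to your setup.
url = "http://localhost:8080/v1/chat/completions"

payload = {
    # llama.cpp serves whatever model it was launched with; the "model"
    # field is accepted for OpenAI compatibility.
    "model": "local",
    "messages": [
        {"role": "user", "content": "Hello!"}
    ],
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

# The response follows the OpenAI chat completion schema.
print(body["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI schema, any OpenAI-compatible client should also work when pointed at the server's base URL.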