Your code is working as intended. You're using Llama-2-7b-chat-hf, a model fine-tuned for dialogue and aligned with human preferences (via RLHF). For a common, predictable prompt like yours, such a model concentrates most of its probability mass on a small set of likely next tokens, which is why you're seeing such high probabilities. If you ask a less common or more open-ended question, you should see the probabilities drop, reflecting the model's greater uncertainty.
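You can verify this yourself by comparing the top next-token probabilities for a highly constrained prompt against a more open-ended one. Here's a minimal sketch (assuming a standard transformers setup; the prompts and helper name are just illustrative, not from your code, and note the model is gated on the Hub so you'll need access):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

def top_next_token_probs(prompt: str, k: int = 5):
    """Return the top-k next-token probabilities for a prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Logits for the token that would come next, i.e. the last position
        logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tokenizer.decode(i), p.item()) for i, p in zip(top.indices, top.values)]

# Highly constrained prompt: expect one token to dominate (probability near 1)
print(top_next_token_probs("The capital of France is"))

# Open-ended prompt: expect the probability mass to spread across many tokens
print(top_next_token_probs("My favorite obscure hobby is"))
```

For the first prompt, the top token should carry almost all the probability; for the second, you should see it distributed much more evenly, which is the behavior your output reflects.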