I don't think there is a way in the API to extract this. It lets you control how sampling behaves via the softmax temperature, but it does not expose the token probabilities directly. I resort to plain empirical sampling; it is not efficient, but it works:
```python
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def estimate_probabilities(prompt, n_samples=100):
    responses = []
    for _ in range(n_samples):
        response = client.completions.create(
            model="<your model of choice here>",
            prompt=prompt,  # legacy completions API expects "\n\nHuman: ...\n\nAssistant:" formatting
            temperature=1.0,
            max_tokens_to_sample=1024,
        )
        responses.append(response.completion)
        time.sleep(0.1)  # crude rate limiting between requests
    unique_responses = set(responses)
    probabilities = {
        response: responses.count(response) / n_samples
        for response in unique_responses
    }
    # Sort so the most frequently sampled completions come first
    return dict(sorted(probabilities.items(), key=lambda x: x[1], reverse=True))
```
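A quick usage sketch (the question/prompt text is just an illustrative placeholder; the `HUMAN_PROMPT`/`AI_PROMPT` constants come from the `anthropic` package and are only needed for the legacy completions API):

```python
# Example call -- constrain the answer so repeated samples collide often,
# otherwise the empirical counts are too sparse to be meaningful.
prompt = (
    f"{anthropic.HUMAN_PROMPT} Answer with a single word: is the sky blue?"
    f"{anthropic.AI_PROMPT}"
)
probs = estimate_probabilities(prompt, n_samples=50)
for completion, p in probs.items():
    print(f"{p:.2f}  {completion!r}")
```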
If you need probabilities cascaded per token across the completion, the way OpenAI's logprobs work, you will need to extend this code accordingly.
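If you do go that route, here is a rough sketch of one way I would approach it (my own assumption, not anything the API documents): sample one token at a time with `max_tokens_to_sample=1`, record the empirical distribution at that position, greedily append the most likely token to the prompt, and repeat. It reuses the `client` from above, and the greedy commit and `n_tokens` parameter are my choices for illustration.

```python
# Rough sketch of per-token cascading via repeated single-token sampling.
def estimate_token_probabilities(prompt, n_tokens=5, n_samples=100):
    results = []
    prefix = prompt
    for _ in range(n_tokens):
        samples = []
        for _ in range(n_samples):
            response = client.completions.create(
                model="<your model of choice here>",
                prompt=prefix,
                temperature=1.0,
                max_tokens_to_sample=1,  # sample a single next token
            )
            samples.append(response.completion)
            time.sleep(0.1)
        counts = {tok: samples.count(tok) / n_samples for tok in set(samples)}
        counts = dict(sorted(counts.items(), key=lambda x: x[1], reverse=True))
        results.append(counts)
        prefix += next(iter(counts))  # greedily commit to the most probable token
    return results
```

Note that this multiplies the request count by `n_tokens`, so it gets expensive quickly.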