The problem was solved by wrapping the prompt in the chat template that Llama models are trained with during instruction tuning. Formatting the prompt with the expected special tokens (system, user, and assistant headers) steered the model toward the intended behavior much more reliably.
Here is a code block that demonstrates what worked:
from transformers import AutoTokenizer

# Load the tokenizer for an instruction-tuned Llama model
# (checkpoint name is illustrative; use the model you are actually working with)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Prepare a prompt for the email re-write task
original_text = "Hi guys, just checking in to see if you finally finished the slides for next week when we meet with Jack. Let me know asap. Cheers, John"

messages = [
    {"role": "system", "content": "You are an AI assistant that revises emails in a professional writing style."},
    {"role": "user", "content": f"Revise the following draft email in a professional voice, preserving meaning. Only provide the revised email.\n\n### Draft:\n{original_text}"},
]

# Apply the chat template (adds special tokens such as <|start_header_id|>)
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # We want the formatted string, not token IDs yet
    add_generation_prompt=True,  # End the prompt expecting the assistant's turn
)
print("--- Formatted Prompt ---")
print(prompt)
print("------------------------")