Sometimes I find that the behavior improves when you switch to a more capable model. Try gpt-4o instead, if you haven't already.