This isn't going to be the answer you were hoping for but hopefully it will have some guidance that is useful to you.
But I’m trying to make sure I’m asking the right questions upfront. What should I be looking for when it comes to system performance?
I really like that you are taking a moment to stop and think about what you are trying to achieve before "just doing stuff". This is multi-faceted:
What you should be looking for is to understand what the desired performance targets / non-functional requirements are. If your customer has specific performance requirements then fine, but if they don't then you have no idea what "success" looks like. If you haven't discussed performance targets with your customer then it's time to do so.
On performance optimization and motivations in general, this article is a must-read. I only found it through an SO post recently. It goes back to first principles about what you are actually trying to achieve and why.
What’s the best way to push the whole thing to its limits and really explore where it breaks?
I've always thought that performance testing is a specialist area, fraught with complexity. It depends on how much effort and time you want to invest in this, and how critical the results are. If it's critical maybe talk to a specialist performance tester/company.
Low-effort testing might be stubbing out the external systems in your dev environment and throwing some transactions through, with some kind of observability to measure performance; high-effort testing might be setting up a dedicated environment, working with the providers of the external systems, etc.
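As a rough illustration of the low-effort end of that spectrum, here is a minimal Python sketch. Everything in it is an assumption for illustration: `stub_external_system` is a hypothetical stand-in for whatever your real dependencies are, the 2 ms latency is made up, and the "observability" is nothing more than timing each call and summarising the results.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def stub_external_system(payload: bytes) -> bytes:
    """Hypothetical stand-in for a real external dependency."""
    time.sleep(0.002)  # assumed fixed latency; replace with something realistic
    return payload[:16]


def process_transaction(payload: bytes) -> float:
    """Run one transaction against the stub and return the elapsed seconds."""
    start = time.perf_counter()
    stub_external_system(payload)
    return time.perf_counter() - start


def run_load(n_transactions: int, concurrency: int, payload_size: int) -> dict:
    """Push n_transactions through the stub and summarise the latencies."""
    payload = b"x" * payload_size
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(process_transaction, [payload] * n_transactions))
    wall = time.perf_counter() - wall_start
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "count": len(latencies),
        "throughput_per_s": len(latencies) / wall,
        "p50_s": cuts[49],
        "p95_s": cuts[94],
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    print(run_load(n_transactions=200, concurrency=10, payload_size=700_000))
```

The numbers this produces only tell you about your own code plus the stub, not about the real external systems, which is exactly why the high-effort option exists.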
Questions to ask / aspects to consider:
What does real-world usage look like?
Transaction counts - what is "average" and what is "peak"? Both only mean something in the context of a timeframe, e.g. daily, weekly or monthly - only you will know which is the right timeframe to use based on the context of your solution. Monthly may be useful if you are using cloud services that charge per month.
Transaction sizes - average and max. E.g. is the average payload 700 KB +/- 10%, or is it 700 KB but up to 500% larger 20% of the time?
Authentication and authorization - how is this done? I.e. how much load will you be putting on the IDAM systems?
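If you already have logs from a comparable system, the count and size questions above can be answered with a few lines of Python. This is only a sketch under assumptions: the input is a hypothetical list of `(timestamp, payload_size_bytes)` pairs, the default bucket is one calendar day, and the 110% threshold simply mirrors the "+/- 10%" example.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def summarise(transactions, bucket_format="%Y-%m-%d"):
    """Summarise (datetime, payload_size_bytes) pairs per time bucket.

    bucket_format is a strftime pattern: "%Y-%m-%d" buckets by day,
    "%Y-%m" by month, and so on.
    """
    transactions = list(transactions)
    per_bucket = Counter(ts.strftime(bucket_format) for ts, _ in transactions)
    counts = list(per_bucket.values())
    sizes = [size for _, size in transactions]
    avg_size = mean(sizes)
    return {
        # Note: buckets with zero transactions are not counted in the average.
        "avg_per_bucket": mean(counts),
        "peak_per_bucket": max(counts),
        "avg_size_bytes": avg_size,
        "max_size_bytes": max(sizes),
        # Share of payloads more than 10% above the average size.
        "share_over_110pct_of_avg": sum(s > 1.1 * avg_size for s in sizes) / len(sizes),
    }


# Made-up sample data, purely to show the shape of the input.
sample = [
    (datetime(2024, 1, 1, 9), 700_000),
    (datetime(2024, 1, 1, 10), 700_000),
    (datetime(2024, 1, 1, 11), 4_200_000),
    (datetime(2024, 1, 2, 9), 700_000),
]
summary = summarise(sample)
```

Switching `bucket_format` between daily and monthly is a cheap way to see how much the choice of timeframe changes what "peak" looks like.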