Thanks for the replies! A few of the linked threads have solutions for detecting which CPU architecture we're on, but in this situation all of the nodes are x86_64. In fact, the sysadmin didn't realize that they were heterogeneous until we hit this issue.
I haven't actually used a ton of CPU optimizations, just -O3. But that does turn on a ton of other things, and dialing it back to -O2 or -O1 would probably partly solve this. The code I'm running takes weeks, though, so I'm reluctant to do that.
The workaround I've been using is to have a shell script attempt to run the code. If it fails, the script recompiles a local copy and uses that. This is partly in line with some of the suggestions above, although in some cases I know for sure that it's running at sub-optimal speed.
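For anyone who wants to try the same thing, here is a minimal sketch of that kind of wrapper, not my exact script. It assumes the failure shows up as an illegal-instruction crash (SIGILL, which a POSIX shell reports as exit status 132 on Linux), so that an ordinary non-zero exit doesn't trigger a pointless rebuild. The function name `run_or_rebuild` and the build command passed to it are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: run a shared binary and, if it dies with SIGILL on this node,
# rebuild a node-local copy and run that instead.
# Assumption: the incompatibility surfaces as exit status 132 (128 + SIGILL=4).

# Usage: run_or_rebuild SHARED_BIN LOCAL_BIN BUILD_CMD [ARGS...]
run_or_rebuild() {
    shared_bin=$1
    local_bin=$2
    build_cmd=$3
    shift 3

    # Try the shared binary first and remember how it exited.
    status=0
    "$shared_bin" "$@" || status=$?

    if [ "$status" -eq 132 ]; then
        echo "illegal instruction on $(uname -n); rebuilding locally" >&2
        # BUILD_CMD is expected to produce LOCAL_BIN on this node.
        sh -c "$build_cmd" || return 1
        status=0
        "$local_bin" "$@" || status=$?
    fi

    # Any other failure is passed through unchanged.
    return "$status"
}
```

A call might look like `run_or_rebuild /shared/bin/sim "$HOME/bin/sim" "make -C $HOME/src" input.dat`, where all of those paths are made up. The obvious weakness is that the crash may only happen once execution reaches the offending instruction, which could be well into the run.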
Pepijn Kramer's idea to use a shell script to query for the exact CPU, and then trigger the recompile if it doesn't match, might be an improvement.
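A rough sketch of how that check could look, assuming Linux nodes where `/proc/cpuinfo` has a `model name` line. The idea is to record the CPU model next to the binary at build time and compare it before each run. The function names and the stamp-file convention are my own invention for illustration, not something from the linked threads.

```shell
#!/bin/sh
# Sketch: decide whether to recompile by comparing this node's CPU model
# against the model recorded when the binary was built.
# Assumption: Linux, with a "model name" line in /proc/cpuinfo.

# cpu_signature [CPUINFO_FILE]
# Print the first "model name" value (defaults to /proc/cpuinfo).
cpu_signature() {
    sed -n '/^model name/{s/^model name[[:space:]]*: *//p;q;}' \
        "${1:-/proc/cpuinfo}"
}

# needs_rebuild STAMP_FILE [CPUINFO_FILE]
# Return 0 (true) if the stamp is missing or names a different CPU.
needs_rebuild() {
    stamp=$1
    [ -f "$stamp" ] || return 0
    [ "$(cat "$stamp")" != "$(cpu_signature "$2")" ]
}

# Example use (paths are hypothetical):
#   if needs_rebuild build/cpu.stamp; then
#       make && cpu_signature > build/cpu.stamp
#   fi
```

Matching on the model name is stricter than necessary, since two different models can support the same instructions, so this will sometimes recompile when the existing binary would have run fine. It does avoid waiting for a crash to find out.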