After tinkering a bit, I discovered that adding -fomit-frame-pointer was enough to make the codegen output the same on each compiler. Looking at the history of this optimization, it looks like there's been a push to disable it by default across distros, which is a good thing of course, since keeping the frame pointer makes profiling and stack unwinding much easier. But in this case I depend heavily on getting the instruction count down to the absolute minimum.
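
For anyone who wants to see the effect for themselves, here is a minimal sketch. The file name fp_demo.c and the function sum_array are made up for illustration and are not the code I was actually working on. The two flags in the comment are the standard GCC/Clang spellings; what exactly shows up in the assembly depends on the compiler version, target, and optimization level, so treat the described difference as typical rather than guaranteed.

```c
/* fp_demo.c (hypothetical file name, for illustration only).
 *
 * Compile twice and diff the assembly:
 *
 *   cc -O2 -fno-omit-frame-pointer -S fp_demo.c -o with_fp.s
 *   cc -O2 -fomit-frame-pointer    -S fp_demo.c -o without_fp.s
 *
 * On x86-64 the first build typically saves and sets up the frame
 * pointer register (rbp) in the function prologue and restores it in
 * the epilogue; the second build typically drops those instructions.
 */
#include <stddef.h>

/* Sum `count` values; small enough that the prologue/epilogue
 * overhead is easy to spot in the generated assembly. */
long sum_array(const long *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Passing the flag explicitly on every compiler, rather than relying on whatever the distro toolchain defaults to, is what keeps the output consistent.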