My surmise is that the compiler is vectorizing the code without the break, while the code with the break, whose exit depends on the data, is not vectorized and stays a scalar loop. Because the loop exit happens almost 90% of the way through the array, the cost of iterating through the last 10% of the array is small compared to the gains from vectorization.
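To make the distinction concrete, here is a minimal sketch of the two loop shapes. The original code is not shown here, so the function names and the search-for-a-value task are assumptions chosen only for illustration. The first loop has a data-dependent early exit; the second always runs over the whole array and folds each comparison into the result, which is the shape auto-vectorizers generally handle well.

```c
#include <stddef.h>

/* Hypothetical example: early exit on the first match.
   The trip count depends on the data, which many compilers
   will not auto-vectorize. */
int contains_with_break(const int *a, size_t n, int target) {
    int found = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] == target) {
            found = 1;
            break;
        }
    }
    return found;
}

/* Hypothetical example: no early exit.
   Every element is compared and the results are OR-ed together,
   so the trip count is known on entry and the loop is a simple
   reduction. */
int contains_no_break(const int *a, size_t n, int target) {
    int found = 0;
    for (size_t i = 0; i < n; i++) {
        found |= (a[i] == target);
    }
    return found;
}
```

Both functions return the same result; only the amount of work and the code the compiler can generate differ. Whether either loop is actually vectorized depends on the compiler, its version, the optimization flags, and the target, so it is worth confirming with the compiler's own report (for example `-fopt-info-vec` with GCC or `-Rpass=loop-vectorize` with Clang) or by inspecting the generated assembly.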