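The code runs in O(1) time. rep bsf is not a loop: the rep prefix only repeats string instructions, and here it is just part of the encoding of a single instruction whose cost does not depend on how many bits it has to skip. The bytes for rep bsf are the same as those for tzcnt, so CPUs that support BMI1 execute it as tzcnt, while older CPUs ignore the prefix and execute a plain bsf. The compiler chose this form because baseline x86 has no exact counterpart to std::countr_zero: bsf finds the index of the lowest set bit, but its result is undefined when the input is zero, so the zero case has to be handled separately.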