yes I agree with you, there are tricky corner cases where this overflows. I ran into the same bug, working on a 32-bit architecture one counter-example is v = (1, 2) (that is v = 2^32 + 2, v_1 = 1, v_0 = 2), the normalization step D1 computes d = floor((b-1)/v_1) that is 0xffffffff with b = 2^32. But then d * v = (2^32-1) * (2^32 + 2) = 2^64 + 2^32 - 2 = (1, 0, 0xfffffffe) with an overflow.
Would d = ceil(((b/2)/v_1)) work?