79716122

Date: 2025-07-27 03:49:34
Score: 0.5
Natty:

The basic problem is that floating point numbers don't actually have a natural way of representing zero.

A 32-bit (single-precision) float consists of a sign bit, an eight-bit exponent, and a twenty-three-bit mantissa. (A double is similar, just larger.) Let's use a smaller format: 4 bits. That has a sign bit, two exponent bits, and one mantissa bit (in that order). There are sixteen possible values.

Binary | Decimal (no denormalization) | Decimal (denormalized)
------ | ---------------------------- | ----------------------
0000   | 0.5                          | 0
0001   | 0.75                         | 0.5
0010   | 1                            | 1
0011   | 1.5                          | 1.5
0100   | 2                            | 2
0101   | 3                            | 3
0110   | 4                            | Inf
0111   | 6                            | NaN
1000   | -0.5                         | -0
1001   | -0.75                        | -0.5
1010   | -1                           | -1
1011   | -1.5                         | -1.5
1100   | -2                           | -2
1101   | -3                           | -3
1110   | -4                           | -Inf
1111   | -6                           | NaN

Common rule:

  • The value is (-1)^sign * significand * 2^(exponent - bias); the bias here is 1.

The rules (without denormalization):

  • The significand has an implicit leading one, so it is 1.0 (mantissa bit 0) or 1.5 (mantissa bit 1).
  • For example, 0001 is +1.5 * 2^(0 - 1) = 0.75.

The denormalization rules (for floats whose exponent is all zeroes):

  • The implicit leading one is dropped, so the significand is 0.0 or 0.5.
  • The exponent is fixed at the minimum normal exponent (2^0 here), so 0000 is 0 and 0001 is 0.5.

The denormalization rules (for floats whose exponent is all ones):

  • A zero mantissa means Inf (positive or negative, per the sign bit).
  • A non-zero mantissa means NaN.
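
To make those rules concrete, here is a minimal Python sketch (my own illustration, not code from the original question) that decodes this hypothetical 4-bit format with and without the special cases; printing all sixteen values reproduces the table above.

    def decode4(bits, denormalize=True):
        """Decode a 4-bit float: 1 sign bit, 2 exponent bits, 1 mantissa bit, bias 1."""
        sign = -1.0 if (bits >> 3) & 1 else 1.0
        exp = (bits >> 1) & 0b11   # stored exponent, 0 through 3
        man = bits & 0b1           # the single mantissa bit

        if denormalize and exp == 0b00:
            # Denormalization rule: no implicit leading one, exponent fixed at 1 - bias = 0.
            return sign * (man * 0.5) * 2.0 ** 0
        if denormalize and exp == 0b11:
            # Special values: zero mantissa is +/-Inf, non-zero mantissa is NaN.
            return sign * float("inf") if man == 0 else float("nan")
        # Common rule: implicit leading one, stored exponent minus the bias of 1.
        return sign * (1.0 + man * 0.5) * 2.0 ** (exp - 1)

    for bits in range(16):
        print(f"{bits:04b}  {decode4(bits, denormalize=False):>5}  {decode4(bits):>5}")

With denormalize=False the smallest positive value the loop prints is 0.5; with the special cases applied, the all-zero pattern comes out as 0.0.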

See the problem? If you never apply the denormalization rules, the smallest-magnitude positive and negative values are plus and minus one half, not zero. If you round positive one half up, you get one.

Others have explained how this is a bug in rounding, but I find it interesting how floating-point numbers are represented, because that is why the bug works the way it does. Zero was essentially hacked into a format that doesn't naturally support it. Without the denormalization of the subnormal numbers, "zero" would actually just be a very small number; round it up and it becomes one. Basically, the bug is that the denormalized numbers weren't special-cased properly.
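
As a tiny demonstration of that failure mode (again just a sketch, reusing the hypothetical decode4 from the example above): skip the subnormal special case and the all-zero bit pattern rounds up to one.

    import math

    fake_zero = decode4(0b0000, denormalize=False)  # 0.5 -- "zero" is really just a small number
    real_zero = decode4(0b0000)                     # 0.0 -- the subnormal rule special-cases it

    print(math.ceil(fake_zero))  # 1, the "round zero up and get one" behavior
    print(math.ceil(real_zero))  # 0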

Note: the exponent bias is 1 in the 4-bit float, so the stored exponents map to -1, 0, 1, and 2. In a regular 32-bit float the bias is 127 (exponents -127 through 128), and in a double it is 1023 (-1023 through 1024). There would be four denormalized bit patterns in the 4-bit float (the ones whose exponent bits are all zero); a 32-bit float has more than sixteen million.
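
A quick check of the 32-bit figures (my own arithmetic, not part of the original answer), counting every pattern whose exponent field is all zeroes, as above:

    import struct

    # Two signs times 2**23 mantissa patterns with an all-zero exponent field (including +/-0).
    print(2 * 2 ** 23)  # 16777216 -- "more than sixteen million"

    # The smallest positive subnormal: sign 0, exponent 00000000, mantissa 000...001.
    smallest = struct.unpack(">f", (1).to_bytes(4, "big"))[0]
    print(smallest)     # about 1.4e-45, i.e. 2**-149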

Reasons:
  • Long answer (-1):
  • No code block (0.5):
  • Contains question mark (0.5):
  • Low reputation (0.5):
Posted by: mdfst13