pdclib:printing_floating_point_numbers

Differences

This shows you the differences between two versions of the page.

pdclib:printing_floating_point_numbers [2025/08/10 21:41] solar
pdclib:printing_floating_point_numbers [2025/08/21 14:01] (current) – [Biased Exponent] solar
Line 25: Line 25:
 Instead of assuming two's complement to allow for positive and negative exponents, IEEE 754 uses //biased// exponents: The exponent bits are interpreted as an unsigned integer, but to get the "real" exponent value, you need to //subtract// the bias value, which is ''FLT_MAX_EXP - 1'', ''DBL_MAX_EXP - 1'', or ''LDBL_MAX_EXP - 1'', respectively.
  
 +=== Huh? ===
 +
 +Remember that IEEE 754 is a //floating point// standard. It makes //no// assumptions about the integer logic of the machine. How should the exponent be encoded? Two's complement? You don't know if the ALU supports that! So the exponent is stored unsigned. That means that the value ''1'' (1x10^0, or 1x2^0) is not stored with an exponent of all zeroes, but with an exponent halfway between all zeroes (signifying denormals) and all ones (signifying INF / NaN).
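
To make the bias tangible, here is a minimal sketch that pulls the stored exponent out of a ''float'' and subtracts the bias. It assumes that ''float'' is an IEEE 754 binary32 (1 sign bit, 8 exponent bits, 23 mantissa bits), which the C standard does not guarantee. The function names are made up for this illustration and are not part of PDCLib.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32. */

/* The exponent field exactly as stored, i.e. an unsigned value 0..255. */
static int biased_exponent(float f)
{
    uint32_t bits;

    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);      /* well-defined way to read the bits */
    return (int)((bits >> 23) & 0xFFu);  /* skip the 23 mantissa bits */
}

/* The "real" exponent: stored value minus the bias (127 for binary32). */
static int unbiased_exponent(float f)
{
    return biased_exponent(f) - (FLT_MAX_EXP - 1);
}
```

With these assumptions, ''biased_exponent(1.0f)'' is 127 (halfway between all zeroes and all ones), and ''unbiased_exponent(1.0f)'' is 0. Zero, denormals, INF, and NaN need separate handling and are not covered by this sketch.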
 ==== Infinity ====
  
Line 98: Line 101:
  
 [2]: There is a special case when the successor value would have a higher exponent, i.e. the successor would be twice as far away as the predecessor. You need to take this into account.
 +
 +==== Visualization ====
 +
 +Taking the hint from [[https://www.ryanjuckett.com/printing-floating-point-numbers/|Ryan Juckett's tutorial]] on the subject, let's visualize what we're doing with a 6-bit floating point format, with 1 sign bit, 3 exponent bits, and 2 mantissa bits.
 +
 +^  Binary    ^  Exponent  ^  Mantissa  ^  Value  ^  Margin  ^
 +|  0 000 00  | 0          | 0          | 0       | 0.0625   |
 +|  0 000 01  | 0          | 1          | 0.0625  | 0.0625   |
 +|  0 000 10  | 0          | 2          | 0.125   | 0.0625   |
 +|  0 000 11  | 0          | 3          | 0.1875  | 0.0625   |
 +|  0 001 00  | 1          | 0          | 0.25    | 0.0625   |
 +|  0 001 01  | 1          | 1          | 0.3125  | 0.0625   |
 +|  0 001 10  | 1          | 2          | 0.375   | 0.0625   |
 +|  0 001 11  | 1          | 3          | 0.4375  | 0.0625   |
 +|  0 010 00  | 2          | 0          | 0.5     | 0.125    |
 +|  0 010 01  | 2          | 1          | 0.625   | 0.125    |
 +|  0 010 10  | 2          | 2          | 0.75    | 0.125    |
 +|  0 010 11  | 2          | 3          | 0.875   | 0.125    |
 +|  0 011 00  | 3          | 0          | 1       | 0.25     |
 +|  0 011 01  | 3          | 1          | 1.25    | 0.25     |
 +|  0 011 10  | 3          | 2          | 1.5     | 0.25     |
 +|  0 011 11  | 3          | 3          | 1.75    | 0.25     |
 +|  0 100 00  | 4          | 0          | 2       | 0.5      |
 +|  0 100 01  | 4          | 1          | 2.5     | 0.5      |
 +|  0 100 10  | 4          | 2          | 3       | 0.5      |
 +|  0 100 11  | 4          | 3          | 3.5     | 0.5      |
 +|  0 101 00  | 5          | 0          | 4       | 1        |
 +|  0 101 01  | 5          | 1          | 5       | 1        |
 +|  0 101 10  | 5          | 2          | 6       | 1        |
 +|  0 101 11  | 5          | 3          | 7       | 1        |
 +|  0 110 00  | 6          | 0          | 8       | 2        |
 +|  0 110 01  | 6          | 1          | 10      | 2        |
 +|  0 110 10  | 6          | 2          | 12      | 2        |
 +|  0 110 11  | 6          | 3          | 14      | -        |
 +|  0 111 00  | 7          | 0          | INF     | -        |
 +|  0 111 01  | 7          | 1          | NaN(*)  | -        |
 +|  0 111 10  | 7          | 2          | NaN     | -        |
 +|  0 111 11  | 7          | 3          | NaN     | -        |
 +
 +*: Signalling NaN (highest mantissa bit zero)
 +
 +The "Margin" is the difference between that number and its next higher "neighbor". Half that distance is where a decimal representation would be "tied" between those two binary representations.
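
The "Value" and "Margin" columns can be recomputed from the bit patterns. The following is a sketch only: ''scale2()'', ''toy_value()'' and ''toy_margin()'' are hypothetical helpers for this 6-bit toy format, they handle only the non-negative patterns listed in the table (sign bit zero), and the bias of 3 is inferred from the table rather than stated in it.

```c
#include <assert.h>
#include <math.h>   /* only for the INFINITY and NAN macros */

/* Multiply v by 2^shift with plain arithmetic (exact for these small values). */
static double scale2(double v, int shift)
{
    while (shift > 0) { v *= 2.0; --shift; }
    while (shift < 0) { v /= 2.0; ++shift; }
    return v;
}

/* Value of a non-negative toy pattern (bits 0..31):
   3 exponent bits, 2 mantissa bits, assumed bias 3. */
static double toy_value(unsigned bits)
{
    unsigned exponent = (bits >> 2) & 7u;
    unsigned mantissa = bits & 3u;

    assert(bits < 32u);
    if (exponent == 7u)                      /* all ones: INF or NaN */
        return mantissa == 0u ? INFINITY : NAN;
    if (exponent == 0u)                      /* denormal: m/4 * 2^(1-3) */
        return scale2((double)mantissa, -4);
    /* normal: (1 + m/4) * 2^(e-3), written as (4 + m) * 2^(e-5) */
    return scale2(4.0 + mantissa, (int)exponent - 5);
}

/* Distance to the next higher neighbor, or -1.0 where the table shows "-". */
static double toy_margin(unsigned bits)
{
    assert(bits < 32u);
    if (bits >= 27u)                         /* successor is INF or NaN */
        return -1.0;
    return toy_value(bits + 1u) - toy_value(bits);
}
```

For example, pattern 0 001 00 is ''bits == 4'': ''toy_value(4)'' gives 0.25 and ''toy_margin(4)'' gives 0.0625, matching the table row.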
 +
 +Take 0.25 and 0.3125 for example, which have a margin of 0.0625. Half that margin would be 0.03125. The decimal number (0.25 + 0.03125) = 0.28125 would be tied between 0.25 and 0.3125. But 0.28124 would //unambiguously// identify the binary 0 001 00, because that's closer. This is how strtod() and scanf() can make use of this margin number.
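
The reading direction can be sketched as a brute-force search over the toy format: pick the pattern whose value is closest to the decimal input. This is an illustration under assumptions, not how a real strtod() works. ''toy_finite()'' and ''toy_nearest()'' are hypothetical names, the input is taken as a ''double'' in the range 0 to 14, and an exact tie is reported as -1 instead of being resolved by a rounding rule.

```c
#include <assert.h>

/* Value of a finite, non-negative toy pattern (bits 0..27), assumed bias 3. */
static double toy_finite(unsigned bits)
{
    unsigned exponent = bits >> 2;
    unsigned mantissa = bits & 3u;
    double v = (exponent == 0u) ? (double)mantissa : 4.0 + mantissa;
    int shift = (exponent == 0u) ? -4 : (int)exponent - 5;

    assert(bits <= 27u);
    while (shift > 0) { v *= 2.0; --shift; }
    while (shift < 0) { v /= 2.0; ++shift; }
    return v;
}

/* Pattern whose value is nearest to x, or -1 if x is exactly tied
   between two neighbors (half the margin away from both). */
static int toy_nearest(double x)
{
    int best = 0;
    int tied = 0;
    double best_d = x;                 /* distance to pattern 0 (value 0) */
    unsigned bits;

    assert(x >= 0.0 && x <= 14.0);
    for (bits = 1u; bits <= 27u; ++bits) {
        double v = toy_finite(bits);
        double d = (x > v) ? x - v : v - x;

        if (d < best_d) {
            best = (int)bits;
            best_d = d;
            tied = 0;
        } else if (d == best_d) {
            tied = 1;                  /* two candidates equally close */
        }
    }
    return tied ? -1 : best;
}
```

With this sketch, ''toy_nearest(0.28124)'' yields 4 (binary 0 001 00), while ''toy_nearest(0.28125)'' reports the tie.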
 +
 +This works the other way around, too. Let's look at 0.4375. I don't need to //print// 0.4375 to unambiguously identify the binary 0 001 11, because either 0.43 or 0.44 would suffice (both being less than 0.03125 away from the "real" value). Just 0.4 wouldn't do, because (0.4375 - 0.4) > 0.03125, so 0.4 is closer to 0.375 (0 001 10). This is the way printf() is looking at the issue.
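
The printing direction can be sketched the same way: keep adding decimal places until the printed text reads back as the original pattern. This is a brute-force illustration for the toy format only, not the algorithm from Ryan Juckett's tutorial and not PDCLib code. All function names are hypothetical, it counts decimal places rather than significant digits, and it leans on the host's ''snprintf()'' and ''strtod()'' for the decimal conversion.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Value of a finite, non-negative toy pattern (bits 0..27), assumed bias 3. */
static double toy_finite(unsigned bits)
{
    unsigned exponent = bits >> 2;
    unsigned mantissa = bits & 3u;
    double v = (exponent == 0u) ? (double)mantissa : 4.0 + mantissa;
    int shift = (exponent == 0u) ? -4 : (int)exponent - 5;

    assert(bits <= 27u);
    while (shift > 0) { v *= 2.0; --shift; }
    while (shift < 0) { v /= 2.0; ++shift; }
    return v;
}

/* Nearest pattern to x, or -1 on an exact tie (see the margin discussion). */
static int toy_nearest(double x)
{
    int best = 0, tied = 0;
    double best_d = x;
    unsigned bits;

    assert(x >= 0.0 && x <= 14.0);
    for (bits = 1u; bits <= 27u; ++bits) {
        double v = toy_finite(bits);
        double d = (x > v) ? x - v : v - x;

        if (d < best_d) { best = (int)bits; best_d = d; tied = 0; }
        else if (d == best_d) { tied = 1; }
    }
    return tied ? -1 : best;
}

/* Writes the shortest fixed-point text that still identifies the pattern
   into buf and returns the number of decimal places used, or -1 on error.
   Four places always suffice: every toy value is a multiple of 0.0625. */
static int toy_shortest(unsigned bits, char *buf, size_t size)
{
    int places;

    assert(bits <= 27u);
    for (places = 0; places <= 4; ++places) {
        int n = snprintf(buf, size, "%.*f", places, toy_finite(bits));

        if (n < 0 || (size_t)n >= size)
            return -1;                       /* buffer too small */
        if (toy_nearest(strtod(buf, NULL)) == (int)bits)
            return places;                   /* reads back unambiguously */
    }
    return -1;
}
```

For 0 001 11 (''bits == 7'', value 0.4375) this stops at two decimal places with the text ''0.44'', in line with the reasoning above: one place (''0.4'') reads back as 0.375, two places are enough.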

Except where otherwise noted, content on this wiki is licensed under the following license: CC0 1.0 Universal