pdclib:printing_floating_point_numbers
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
pdclib:printing_floating_point_numbers [2025/08/10 19:20] – created solar | pdclib:printing_floating_point_numbers [2025/08/21 14:01] (current) – [Biased Exponent] solar | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Printing Floating Point Numbers ====== | ====== Printing Floating Point Numbers ====== | ||
- | To complete the ``printf()`` function of PDCLib -- which so far lacks support for ``%e``, ``%f``, and ``%g`` -- I needed to solve the problem of converting the binary, in-memory representation of ``double`` and ``long double`` to decimal. | + | To complete the '' |
Using the same algorithm as for integers (divide by ten, take quotient, recurse with remainder) is not an option. Not only would repeated floating point divisions be horrendously slow: Multiplying / dividing a (base 2) floating point by 10 repeatedly would also accumulate rounding errors. You would have slow, //wrong// results. | Using the same algorithm as for integers (divide by ten, take quotient, recurse with remainder) is not an option. Not only would repeated floating point divisions be horrendously slow: Multiplying / dividing a (base 2) floating point by 10 repeatedly would also accumulate rounding errors. You would have slow, //wrong// results. | ||
+ | |||
+ | ===== What is in a floating point number ===== | ||
+ | |||
+ | The lowest bit in an integer has the value 2⁰ (1). The next bit counts for 2¹ (2), the next for 2² (4), and so on. | ||
+ | |||
+ | The basic concept of a floating point number is that the first bit has the value 2⁰ (1), with the next having the value 2⁻¹ (0.5), then 2⁻² (0.25), and so on. With this // | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | The C type '' | ||
+ | |||
+ | There are a couple of special cases that need to be kept in mind when working with floating point numbers. | ||
+ | |||
+ | ==== Implicit Decimal ==== | ||
+ | |||
+ | The first bit of a floating point number, the one valued 2⁰, is //always set// (but see // | ||
+ | |||
+ | ==== Biased Exponent ==== | ||
+ | |||
+ | Instead of assuming two's complement to allow for positive and negative exponents, IEEE 754 uses //biased// exponents: The exponent bits are interpreted as unsigned integer, but to get the " | ||
+ | |||
+ | === Huh? === | ||
+ | |||
+ | Remember that IEEE 754 is a //floating point// standard. It makes //no// asumptions on the integer logic of the machine. What should the exponent be encoded at? Two's compliment? You don't know if the ALU supports that! So the exponent is stored unsigned. That means that the value '' | ||
+ | ==== Infinity ==== | ||
+ | |||
+ | If a value is too big for the exponent to represent, the value will become // | ||
+ | |||
+ | ==== Not a Number ==== | ||
+ | |||
+ | Some mathematical operations (like '' | ||
+ | |||
+ | ==== Subnormals ==== | ||
+ | |||
+ | When a floating point number becomes too close to zero to represent even with the smallest exponent, most platforms support // | ||
+ | |||
+ | ==== Unnormals ==== | ||
+ | |||
+ | The 80bit Intel x86 Extended Precision format has an // | ||
+ | |||
+ | While the explicit decimal bit still exists in the architecture, | ||
+ | |||
+ | ===== In PDCLib ===== | ||
+ | |||
+ | The general idea for PDCLib is as follows: | ||
+ | |||
+ | * When passed to '' | ||
+ | * There is no need for '' | ||
+ | * This deconstruction makes use of bit-twiddling macros defined in '' | ||
+ | * The remaining conversion code works on the '' | ||
+ | * For now, I limited myself to base-2 floating points. Other formats (like base-8, or base-16) could probably be added as needed. | ||
+ | |||
+ | ---- | ||
===== Dragon4 ===== | ===== Dragon4 ===== | ||
- | The seminal work in this area is a paper by [[https:// | + | Now that we have learned a bit about what a floating point number looks like, we will focus on how to convert one to a decimal representation. |
+ | |||
+ | The seminal work in this area is a paper by [[https:// | ||
+ | |||
+ | I am aware of later optimizations, | ||
+ | |||
+ | ==== Overview ==== | ||
+ | |||
+ | //Dragon 4// allows to find the shortest decimal number that //uniquely identifies// | ||
+ | |||
+ | Given the limited precision of the mantissa, each representable binary value has exactly one predecessor, | ||
+ | |||
+ | Let's have a look at such a triple of consecutive numbers, for the sake of brevity in single precision format: | ||
+ | |||
+ | * 0x4040 0001 -- 3.0000002384185791015625 | ||
+ | * 0x4040 0002 -- 3.000000476837158203125 | ||
+ | * 0x4040 0003 -- 3.0000007152557373046875 | ||
+ | |||
+ | You will see that everything after the seventh fractional digit is, effectively, | ||
+ | |||
+ | The initial reflex might be, "but what when I //need// those additional digits?" | ||
+ | |||
+ | ==== How does it do it? ==== | ||
+ | |||
+ | Looked at from low earth orbit, Dragon4 works something like this< | ||
+ | |||
+ | * Copy the mantissa part of the value into a big integer. | ||
+ | * If the value is not a subnormal, set the implied decimal. | ||
+ | * As you don't have a decimal point in there anywhere, correct the exponent by the number of mantissa bits (basically, move the decimal point out of the bit pattern). | ||
+ | * You set up a " | ||
+ | * //{ stuff not yet understood here }// | ||
+ | * The mantissa bigint now being the numerator / dividend, find the highest power of ten that is still less than the mantissa bigint, as denominator / divisor. | ||
+ | * Begin loop: | ||
+ | * Divide mantissa bigint numerator by power-of-ten denominator. Result is in the range 0..9 (1..9 for the first round), and your decimal digit. | ||
+ | * Substract denominator * quotient from mantissa. | ||
+ | * Multiply mantissa and mask value by 10. | ||
+ | * Repeat until a) mantissa is zero, b) specified precision is reached, c) mantissa < mask value | ||
+ | |||
+ | [1]: I am not done either understanding nor implementing the Dragon4 algorithm itself, so what I am writing here is very much work in progress. | ||
+ | |||
+ | [2]: There is a special case when the successor value would have a higher exponent, i.e. the successor would be twice as far away as the predecessor. You need to take this into account. | ||
+ | |||
+ | ==== Visualization ==== | ||
+ | |||
+ | Taking the hint from [[https:// | ||
+ | |||
+ | ^ | ||
+ | | 0 000 00 | 0 | 0 | 0 | 0.0625 | ||
+ | | 0 000 01 | 0 | 1 | 0.0625 | ||
+ | | 0 000 10 | 0 | 2 | 0.125 | 0.0625 | ||
+ | | 0 000 11 | 0 | 3 | 0.1875 | ||
+ | | 0 001 00 | 1 | 0 | 0.25 | 0.0625 | ||
+ | | 0 001 01 | 1 | 1 | 0.3125 | ||
+ | | 0 001 10 | 1 | 2 | 0.375 | 0.0625 | ||
+ | | 0 001 11 | 1 | 3 | 0.4375 | ||
+ | | 0 010 00 | 2 | 0 | 0.5 | 0.125 | | ||
+ | | 0 010 01 | 2 | 1 | 0.625 | 0.125 | | ||
+ | | 0 010 10 | 2 | 2 | 0.75 | 0.125 | | ||
+ | | 0 010 11 | 2 | 3 | 0.875 | 0.125 | | ||
+ | | 0 011 00 | 3 | 0 | 1 | 0.25 | | ||
+ | | 0 011 01 | 3 | 1 | 1.25 | 0.25 | | ||
+ | | 0 011 10 | 3 | 2 | 1.5 | 0.25 | | ||
+ | | 0 011 11 | 3 | 3 | 1.75 | 0.25 | | ||
+ | | 0 100 00 | 4 | 0 | 2 | 0.5 | | ||
+ | | 0 100 01 | 4 | 1 | 2.5 | 0.5 | | ||
+ | | 0 100 10 | 4 | 2 | 3 | 0.5 | | ||
+ | | 0 100 11 | 4 | 3 | 3.5 | 0.5 | | ||
+ | | 0 101 00 | 5 | 0 | 4 | 1 | | ||
+ | | 0 101 01 | 5 | 1 | 5 | 1 | | ||
+ | | 0 101 10 | 5 | 2 | 6 | 1 | | ||
+ | | 0 101 11 | 5 | 3 | 7 | 1 | | ||
+ | | 0 110 00 | 6 | 0 | 8 | 2 | | ||
+ | | 0 110 01 | 6 | 1 | 10 | 2 | | ||
+ | | 0 110 10 | 6 | 2 | 12 | 2 | | ||
+ | | 0 110 11 | 6 | 3 | 14 | - | | ||
+ | | 0 111 00 | 7 | 0 | INF | - | | ||
+ | | 0 111 01 | 7 | 1 | NaN(*) | ||
+ | | 0 111 10 | 7 | 2 | NaN | - | | ||
+ | | 0 111 11 | 7 | 3 | NaN | - | | ||
+ | |||
+ | *: Signalling NaN (highest mantissa bit zero) | ||
+ | |||
+ | The " | ||
+ | |||
+ | Take 0.25 and 0.3125 for example, which have a margin of 0.0625. Half that margin would be 0.03125. The decimal number (0.25 + 0.03125) = 0.28125 would be tied beween 0.25 and 0.3125. But 0.28124 would // | ||
+ | |||
+ | This works the other way around, too. Let's look at 0.4375. I don't need to //print// 0.4375 to unambiguously identify the binary 0 001 11, because either 0.43 or 0.44 would suffice (being less than 0.03125 away from the " |
pdclib/printing_floating_point_numbers.1754846447.txt.gz · Last modified: by solar