pdclib:floatingpoint

==== Decimal Notation ====
  
[[https://en.wikipedia.org/wiki/Decimal_notation | Decimal notation]] uses the digits 0 through 9, which carry value according to their position. Consider the decimal number <m>123.4</m>:
  
^ Digit ^ 1 ^ 2 ^ 3 ^ . ^ 4 ^
^ Value | <m>*10^2</m> (100) | <m>*10^1</m> (20) | <m>*10^0</m> (3) | | <m>*10^-1</m> (0.4) |
  
==== Scientific Notation ====
  
Decimal notation gets more and more unwieldy the further your significant digits are away from the decimal point -- very large, or very small numbers. For these, you would use [[https://en.wikipedia.org/wiki/Scientific_notation | scientific notation]], where a number is written as <m>m*10^n</m>.
  
-The ''m'' is the [[https://en.wikipedia.org/wiki/Significand | significand]], also called "mantissa" (which is the more common term when talking about floating point numbers in computing), while ''n'' is the [[https://en.wikipedia.org/wiki/Exponent | exponent]] and ''10'' is the [[https://en.wikipedia.org/wiki/Base_(exponentiation) | base]].+The <m>m</m> is the [[https://en.wikipedia.org/wiki/Significand | significand]], also called "mantissa" (which is the more common term when talking about floating point numbers in computing), while <m>n</m> is the [[https://en.wikipedia.org/wiki/Exponent | exponent]] and <m>10</m> is the [[https://en.wikipedia.org/wiki/Base_(exponentiation) | base]].
  
Let's return to our number <m>123.4</m>, which could be written as <m>123.4*10^0</m>. Any number to the zeroth power equals one, so multiplying by <m>10^0</m> is a no-op (and usually omitted, returning to normal //decimal notation//).
  
Increment or decrement the exponent to shift the decimal point of the significand one digit to the left or the right, respectively. So we could also write <m>12.34*10^1</m>, <m>1234*10^-1</m>, or <m>1.234*10^2</m> -- the decimal point is "floating" along the significand.
  
This allows us to write very large numbers like <m>1.234*10^78</m>, or very small numbers like <m>1.234*10^-34</m>, efficiently by scaling the exponent appropriately.
  
==== Normalized Number ====

==== Binary Encoding ====
  
First thing, the //base// used by a computer is (usually) 2 instead of 10, for obvious reasons. It is implied, and does not need to be stored.
  
One bit is used as //sign bit// for the mantissa (i.e. whether the whole number is positive or negative).

=== Sign ===
  
An unset sign bit indicates positive, a set one negative. Yes, this format has a [[https://en.wikipedia.org/wiki/Signed_zero | signed zero]].
  
=== Mantissa ===
  
The mantissa is assumed to be //normalized//, i.e. the highest (non-fractional) bit is assumed to be set. It has a value of <m>2^0</m>, i.e. 1, with each following fractional bit carrying the next smaller (negative) exponent. Under this assumption, the highest bit need not actually be stored, giving room for an additional //fractional// mantissa bit at the end.
  
Most people who have read this far can probably rattle down the values for //positive// base-2 exponents -- one, two, four, eight, sixteen and so on. //Negative// exponents usually give people a moment's pause. But it's quite easy: While each positive increment //doubles// the value, each decrement //halves// it -- <m>0.5</m>, <m>0.25</m>, <m>0.125</m>, <m>0.0625</m> and so on.
  
=== Exponent ===
  
The exponent is a funny bugger. IEEE 754 defines the exponent as an absolute (positive) offset to a (negative) bias. That bias is calculated as <m>1-2^(bits-1)</m>. "All bits clear" and "all bits set" are reserved (we will come to this later), so the smallest meaningful exponent is 1.
  
For our 3-bit exponent, we get a bias of <m>1-2^2 = -3</m>, with a practical range of <m>-2..3</m>:
  
^ Exponent ^ Value ^

=== Denormalized Numbers ===
  
When all exponent bits are cleared, the mantissa is considered to be [[https://en.wikipedia.org/wiki/Denormal_number | denormalized]], with an exponent of <m>1 + bias</m> -- the same as a stored exponent of 1.
  
In other words, the smallest representable exponent (with only the lowest bit set) is applied, //with the usually implied pre-fractional bit unset//. The purpose, here, is to allow a bit of additional precision in the very smallest numbers representable.

^ Exponent ^ Mantissa (stored) ^ Mantissa (implied) ^ Value ^
| 001      | 11                | 1.11                   | <m>1.75 * 2^(-2) = 0.4375</m> |
| 001      | 10                | 1.10                   | <m>1.50 * 2^(-2) = 0.375</m>  |
| 001      | 01                | 1.01                   | <m>1.25 * 2^(-2) = 0.3125</m> |
| 001      | 00                | 1.00                   | <m>1.00 * 2^(-2) = 0.25</m>   |
| 000      | 11                | **0**.11               | **0**<m>.75 * 2^(-2) = 0.1875</m> |
| 000      | 10                | **0**.10               | **0**<m>.50 * 2^(-2) = 0.125</m>  |
| 000      | 01                | **0**.01               | **0**<m>.25 * 2^(-2) = 0.0625</m> |
| 000      | 00                | **0**.00               | **0** |
  
The seminal works on the conversion of binary floating point to decimal string representation are:
  
  * Guy Steele, Jon White (1990): [[https://doi.org/10.1145/93548.93559 | How to Print Floating-Point Numbers Accurately]]
  * David Gay (1990): [[http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.4049 | Correctly Rounded Binary-Decimal and Decimal-Binary Conversions]]
  * Robert G. Burger, R. Kent Dybvig (1996): [[http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.2247 | Printing Floating-Point Numbers Quickly and Accurately]]
  * Florian Loitsch (2010): [[http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.1021.1906 | Printing Floating-Point Numbers Quickly and Accurately with Integers]]
  
Steele & White reference another seminal work on the opposite conversion, from decimal string to binary floating point representation:

  * William D. Clinger (1990): [[https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.4152 | How to read floating point numbers accurately]]

That first paper by Steele & White presented the "Dragon" algorithm in its fourth iteration, to which Gay and Burger / Dybvig submitted improvements. Loitsch presented a significant performance improvement he called "Grisu", of which there are three iterations.
  
(Thanks to [[https://www.ryanjuckett.com/programming/printing-floating-point-numbers/ | Ryan Juckett]] for writing a very nice introduction on the subject, which was at the root of what I assembled on this page.)

===== Dragon =====

Steele & White approach the presentation of their Dragon algorithm as a series of "lesser" algorithms building on each other.

==== Fixed Point Fraction Output ====

Given:

  * A positive value //f// less than 1, with //n// digits to (input) base //b// (usually 2).

Output:

  * A value //F// of //N// digits to (output) base //B// (usually 10).

Such that:

  - <m>delim{|}{F - f}{|} < { b^{-n} } / 2</m>
    * The difference between representations is less than half the positional value of the //n//th digit of //f//.
  - <m>N</m> is the smallest number of digits so that condition 1. can be true.
  - <m>delim{|}{F - f}{|} <= { B^{-N} } / 2</m>
    * The difference between representations is no more than half the positional value of the //N//th digit of //F//.
  - Each digit of //F// is output before the next is generated; no "back up" for corrections.
Algorithm <m>(FP)^3</m> (Finite-Precision Fixed-Point Fraction Printout):

  * <m>k = 0, R = f, M = { b ^ { -n } } / 2</m>
  * while ( 1 )
    * k++
    * U = floor( R * B )
    * R = ( R * B ) % 1
    * M *= B
    * if ( <m>R >= M</m> AND <m>R <= 1 - M</m> )
      * append( F, U )
    * else
      * break
  * if ( <m>R <= 1/2</m> )
    * append( F, U )
  * else
    * append( F, U + 1 )

At the end, <m>k</m> is <m>N</m>, i.e. the number of digits in <m>F</m>.

Example:

  * Given the base <m>b = 2</m> number <m>f = .10110</m>, with <m>n = 5</m>, to be converted to base <m>B = 10</m>
  * The //exact// decimal representation would be <m>.6875</m>
    * The next higher number (<m>f_{+1} = .10111</m>) would be decimal <m>0.71875</m>
    * The next lower number (<m>f_{-1} = .10101</m>) would be decimal <m>0.65625</m>
  * The mask would be <m>M = { b ^ {-n} } / 2 = { 2 ^ { -5 } } / 2 = 0.015625</m>

  * First (<m>k = 1</m>) loop
    * multiply <m>R = 0.6875</m> by <m>B = 10</m> for integral part <m>6</m>, fractional part <m>0.875</m>
    * multiply <m>M = 0.015625</m> by <m>B = 10</m> for new <m>M = 0.15625</m>
    * Fractional part <m>0.875</m> is larger than <m>1 - M = 0.84375</m>, so no digit is appended and the loop terminates
  * Post-loop
    * Fractional part <m>0.875</m> is not smaller than <m>1/2</m>, so <m>U + 1 = 7</m> becomes our only fractional digit
    * We have <m>N = k = 1</m> fractional digit in our result of <m>0.7</m>, which is the smallest <m>N</m> that uniquely identifies our original <m>f = .10110</m>

==== Floating-Point Printout ====

Given:

  * A base //b// floating-point number <m>v = f * b ^ (e - p)</m>, where //e//, //f//, and //p// are integers with <m>p >= 0</m> and <m>0 <= f < b ^ p</m>.

Output:

  * An approximate representation to base //B//, using exponential notation if the value is very large or very small.

Algorithm (Dragon2):

  * Compute <m>v prime = v * B ^ {-x}</m>, where <m>x</m> is chosen so that the result either is between 1 / //B// and 1, or between 1 and //B//
  * Print the integer part of <m>v prime</m>
  * Print a decimal point
  * Take the fractional part of <m>v prime</m>, let <m>n = p - floor ( log_ b v prime ) - 1</m>
  * Apply algorithm <m>(FP)^3</m> to that fractional part and //n//
  * If <m>x</m> was not zero, append the letter "E" and a representation of the scale factor <m>x</m>
pdclib/floatingpoint.txt · Last modified: 2019/10/21 10:13 by solar