pdclib:floatingpoint

==== Decimal Notation ====
  
[[https://en.wikipedia.org/wiki/Decimal_notation | Decimal notation]] uses the digits 0 through 9, which carry value according to their position. Consider the decimal number <m>123.4</m>:
  
^ Digit ^ 1 ^ 2 ^ 3 ^ . ^ 4 ^
^ Value | <m>*10^2</m> (100) | <m>*10^1</m> (20) | <m>*10^0</m> (3) | | <m>*10^-1</m> (0.4) |
  
==== Scientific Notation ====
  
Decimal notation gets more and more unwieldy the further your significant digits are away from the decimal point -- very large, or very small numbers. For these, you would use [[https://en.wikipedia.org/wiki/Scientific_notation | scientific notation]], where a number is written as <m>m*10^n</m>.
  
-The ''m'' is the [[https://en.wikipedia.org/wiki/Significand | significand]], also called "mantissa" (which is the more common term when talking about floating point numbers in computing), while ''n'' is the [[https://en.wikipedia.org/wiki/Exponent | exponent]] and ''10'' is the [[https://en.wikipedia.org/wiki/Base_(exponentiation) | base]].+The <m>m</m> is the [[https://en.wikipedia.org/wiki/Significand | significand]], also called "mantissa" (which is the more common term when talking about floating point numbers in computing), while <m>n</m> is the [[https://en.wikipedia.org/wiki/Exponent | exponent]] and <m>10</m> is the [[https://en.wikipedia.org/wiki/Base_(exponentiation) | base]].
  
Let's return to our number <m>123.4</m>, which could be written as <m>123.4*10^0</m>. Any number to the zeroth power equals one, so multiplying by <m>10^0</m> is a no-op (and usually omitted, returning to normal //decimal notation//).
  
Increment or decrement the exponent to shift the decimal point of the significand one digit to the left or the right, respectively. So we could also write <m>12.34*10^1</m>, <m>1234*10^-1</m>, or <m>1.234*10^2</m> -- the decimal point is "floating" along the significand.
  
This allows us to write very large numbers like <m>1.234*10^78</m>, or very small numbers like <m>1.234*10^-34</m>, efficiently by scaling the exponent appropriately.
  
==== Normalized Number ====
==== Binary Encoding ====
  
First of all, the //base// used by a computer is (usually) 2 instead of 10, for obvious reasons. It is implied, and does not need to be stored.
  
One bit is used as //sign bit// for the mantissa (i.e. whether the whole number is positive or negative).
=== Sign ===
  
An unset sign bit indicates positive, a set one negative. Yes, this format has a [[https://en.wikipedia.org/wiki/Signed_zero | signed zero]].
  
=== Mantissa ===
  
The mantissa is assumed to be //normalized//, i.e. the highest (non-fractional) bit is assumed to be set. It has a value of <m>2^0</m>, i.e. 1, with the following fractional bits having successively more negative exponents. Under this assumption, the highest bit need not actually be stored, giving room for an additional //fractional// mantissa bit at the end.
  
Most people who have read this far can probably rattle off the values for //positive// base-2 exponents -- one, two, four, eight, sixteen and so on. //Negative// exponents usually give people a moment's pause. But it's quite easy: While each positive increment //doubles// the value, each decrement //halves// it -- <m>0.5</m>, <m>0.25</m>, <m>0.125</m>, <m>0.0625</m> and so on.
  
=== Exponent ===
  
The exponent is a funny bugger. IEEE 754 defines the exponent as an absolute (positive) offset to a (negative) bias. That bias is calculated as <m>1-2^(bits-1)</m>. "All bits clear" and "all bits set" are reserved (we will come to this later), so the smallest meaningful exponent is 1.
  
For our 3-bit exponent, we get a bias of <m>1-2^2 = -3</m>, with a practical range of <m>-2..3</m>:
  
^ Exponent ^ Value ^
=== Denormalized Numbers ===
  
When all exponent bits are cleared, the mantissa is considered to be [[https://en.wikipedia.org/wiki/Denormal_number | denormalized]], with an exponent of <m>1+bias</m> (i.e. <m>-2</m> in our example format, the same as the smallest normalized exponent).
  
In other words, the smallest representable exponent (with only the lowest bit set) is applied, //with the usually implied pre-fractional bit unset//. The purpose here is to allow a bit of additional precision for the very smallest representable numbers.
  
^ Exponent ^ Mantissa (stored) ^ Mantissa (implied) ^ Value ^
| 001      | 11                | 1.11               | <m>1.75 * 2^(-2) = 0.4375</m> |
| 001      | 10                | 1.10               | <m>1.50 * 2^(-2) = 0.375</m>  |
| 001      | 01                | 1.01               | <m>1.25 * 2^(-2) = 0.3125</m> |
| 001      | 00                | 1.00               | <m>1.00 * 2^(-2) = 0.25</m>   |
| 000      | 11                | **0**.11           | <m>0.75 * 2^(-2) = 0.1875</m> |
| 000      | 10                | **0**.10           | <m>0.50 * 2^(-2) = 0.125</m>  |
| 000      | 01                | **0**.01           | <m>0.25 * 2^(-2) = 0.0625</m> |
| 000      | 00                | **0**.00           | **0** |
  
  * Florian Loitsch (2010): [[http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.1021.1906 | Printing Floating-Point Numbers Quickly and Accurately with Integers]]
  
Steele & White reference another seminal work on the opposite conversion, from decimal string to binary floating point representation:

  * William D. Clinger (1990): [[https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.4152 | How to read floating point numbers accurately]]

That first paper by Steele & White presented the "Dragon" algorithm in its fourth iteration, to which Gay and Burger / Dybvig submitted improvements. Loitsch presented a significant performance improvement he called "Grisu", of which there are three iterations.
  
(Thanks to [[https://www.ryanjuckett.com/programming/printing-floating-point-numbers/ | Ryan Juckett]] for writing a very nice introduction on the subject, which was at the root of what I assembled on this page.)

===== Dragon =====

Steele & White approach the presentation of their Dragon algorithm as a series of "lesser" algorithms building on each other.

==== Fixed Point Fraction Output ====

Given:

  * A positive value //f// less than 1, with //n// digits to (input) base //b// (usually 2).

Output:

  * A value //F// of //N// digits to (output) base //B// (usually 10).

Such that:

  - <m>delim{|}{F - f}{|} < { b^{-n} } / 2</m>
    * The difference between representations is less than half the positional value of the //n//th digit of //f//.
  - <m>N</m> is the smallest number of digits so that 1. can be true.
  - <m>delim{|}{F - f}{|} <= { B^{-N} } / 2</m>
    * The difference between representations is no more than half the positional value of the //N//th digit of //F//.
  - Each digit of //F// is output before the next is generated; no "back up" for corrections.

Algorithm <m>(FP)^3</m> (Finite-Precision Fixed-Point Fraction Printout):

  * <m>k = 0, R = f, M = { b ^ { -n } } / 2</m>
  * while ( 1 )
    * k++
    * U = floor( R * B )
    * R = ( R * B ) % 1
    * M *= B
    * if ( <m>R >= M</m> AND <m>R <= 1 - M</m> )
      * append( F, U )
    * else
      * break
  * if ( <m>R <= 1/2</m> )
    * append( F, U )
  * else
    * append( F, U + 1 )

At the end, <m>k</m> is <m>N</m>, i.e. the number of digits in <m>F</m>.

Example:

  * Given the base <m>b = 2</m> number <m>f = .10110</m>, with <m>n = 5</m>, to be converted to base <m>B = 10</m>
  * The //exact// decimal representation would be <m>.6875</m>
    * The next higher number (<m>f_{+1} = .10111</m>) would be decimal <m>0.71875</m>
    * The next lower number (<m>f_{-1} = .10101</m>) would be decimal <m>0.65625</m>
  * The Mask would be <m>M = { b ^ {-n} } / 2 = { 2 ^ { -5 } } / 2 = 0.015625</m>

  * First (<m>k = 1</m>) loop
    * multiply <m>R = 0.6875</m> by <m>B = 10</m> for integral part <m>6</m>, fractional part <m>0.875</m>
    * multiply <m>M = 0.015625</m> by <m>B = 10</m> for new <m>M = 0.15625</m>
    * Fractional part <m>0.875</m> is larger than <m>1 - 0.15625 = 0.84375</m>, so no digit is emitted and the loop terminates
  * Post-loop
    * Fractional part <m>0.875</m> is not smaller than <m>1/2</m>, so we round up and <m>7</m> becomes our first (and only) fractional digit
    * We have <m>N = k = 1</m> fractional digit in our result of <m>0.7</m>, which is the smallest <m>N</m> that uniquely identifies our original <m>f = .10110</m> (<m>delim{|}{0.7 - 0.6875}{|} = 0.0125 < 0.015625</m>)

==== Floating-Point Printout ====

Given:

  * A base //b// floating-point number <m>v = f * b ^ (e - p)</m>, where //e//, //f//, and //p// are integers with <m>p >= 0</m> and <m>0 <= f < b ^ p</m>.

Output:

  * An approximate representation to base //B//, using exponential notation if the value is very large or very small.

Algorithm (Dragon2):

  * Compute <m>v prime = v * B ^ {-x}</m>, where <m>x</m> is chosen so that the result either is between 1 / //B// and 1, or between 1 and //B// (normalization, either with leading zero or leading non-zero digit)
  * Print the integer part of <m>v prime</m>
  * Print a decimal point
  * Take the fractional part of <m>v prime</m>, let <m>n = p - floor ( log_b v prime ) - 1</m>
  * Apply algorithm <m>(FP)^3</m> to //f// and //n//
  * If <m>x</m> was not zero, append the letter "E" and a representation of the scale factor <m>x</m>
pdclib/floatingpoint.1570007139.txt.gz · Last modified: 2019/10/02 11:05 by solar