|
Floating-point addition |
|
Floating point numbers format |
The binary coding of floating-point real numbers is composed of three fields. The sign S
(1 bit), the exponent E (8 bits) and the mantissa M, or significand
(23 bits). The number value is (-1)S × 2(E - 127) × (1 + M / 8388608 ) . However if E = 0, the number value is (-1)S × 2(-126) × ( M / 8388608 ) and if E = 255, the value is infinite. |
|
Addition and subtraction |
Since floating-point numbers are coded as "sign/magnitude",
reversing the sign-bit inverses the sign. Consequently the same operator performs as well addition or subtraction
according to the two operand's signs.
The alignment step keeps a "guard bit" and a "round bit" and yields a "sticky bit" for the rounding step. The "sticky bit" is the "or" of all the bits discarded during the alignment shift. |
|
Adder/ subtractor |
A floating-point adder is made up of the following blocks: Block 1: outputs the larger of the two exponents (8 bits), outputs the absolute value of the exponents difference (5 bits), reveals the "hidden bit" of both mantissas. Block 2: output at left the smaller operand mantissa (23 bits), output at right the larger operand mantissa (23 bits). The "implicit bit" is added, totaling 24 bits. Shifter 1: shifts to the right the smaller operand mantissa, appends the "guard bit", "round bit" and "sticky bit", totaling 27 bits. Complementer: on request, does the logic complement for a subtraction. Adder 1: adds the two inputs and the carry in. Outputs the sum and a carry out totaling 28 bits, 2 of them before the decimal point, 5 bits serve for the rounding, of whom 2 dropped. Zero-leading-counter: the ZLC output is the number of leading '0'. If inhibited outputs 1. Shifter 2: shifts to the left ( ZLC – 2 ) positions (i.e. from 2 to the right up to 23 to the left). The fist bit is dropped ( implicit '1' ). Adder 2: subtracts ( ZLC – 1 ) from the larger of the two exponents. |
|
|
This adder/subtractor implements faithfully the IEEE 754 standard. Nevertheless some blocks that would clutter up the figure are not included, namely the rounding logic , the exceptions (infinite, NaN, denormal, zero) and the "flags". |
| Floating-point numbers addition requires integer additions/subtractions, parametrised shifts (to the right for alignment, to the left for renormalization) and a counting of the result leading zeroes . Addition/subtraction can be completed with delay log2(n). The parametrised shift's delay, below, is log2(n) as well. A circuit of "or" and "and" gates computes the "sticky" bit. |
|
Zero leading counter ( ZLC ) |
A binary tree counts up by dichotomy the number of '0' in the most significant positions. If the size of the sub-strings is a power of two, then there is no need for adders but multiplexers can be used instead. Actually only the size of the left substring has to be a power of two. The substring at right must simply be shorter than or of the same size as the left substring. |
| This cell combines the number of leading '0' of two 16-bit strings to obtain the number
of leading '0' of the concatenation of the two strings (32 bits). . if X < 16 then S = X else S = 16 + Y |
| From the mantissas A and B, one can construct in constant time a number P with the same number of leading zeroes, but for at most one, as the difference D = A – B without having to wait for the subtraction completion. When fed to a ZLC, this string predicts the number of positions required by the shifter. If the result of the shift still exhibits a leading zero, then a shift of one more position is necessary to normalize the result. Otherwise the shifted value is already normalized. |
| The prediction is valid if A is normalized and B less than or equal to A. This is the always
the case in a significand subtraction. The leading zeroes result from a carry string
'P'* 'G' 'K'* , made up with a number (possibly null) of 'P'
followed by a unique 'G' followed by a number (possibly null)
of 'K' . The predictor cell outputs a '0' for every pair of symbols in: 'P' 'P' ; 'P' 'G' ; 'G' 'K' et 'K' 'K'
and outputs a '1' for every other pair. This predictor does not take into account the carry propagation, which may lead to bits misspredicted. Nevertheless because only one bit in 'P'* 'G' 'K'* might be incorrectly predicted, the error is tolerable since its correction is easy. |
|
Predictor cell "Pred" |
The leading '0' prediction cell "Pred" output a '1' at the end of the string 'P'* 'G' 'K'* and '0' inside the string (and don't care neither inside nor at the end). |
| The prediction is incorrect only if the carry string starts with 'P'* 'G' 'K'* 'P' 'P'* 'K'. The following circuit output 'Y' whenever the prediction is incorrect, therefore too small by one. |
| Z indicates a string 'K'* 'P'* Q indicates a string 'P'* 'G' 'K'* 'P'* (including only one 'G') N indicates a string starting with 'P'* 'K' Y indicates a string starting with 'P'* 'G' 'K'* 'P' 'P'* 'K', i.e. Q followed by N. U indicates any other string. |
| The absolute value of the exponents difference is needed to control Shifter 1. With four slight modifications, Sklansky's adder with a late carry-in returns S = ½ A – B ½. |
| In a floating point addition, the adder output S is first normalized by shift and then rounded up by adding 0 or 1 ulp, to give the result mantissa. The 28-bit S is labeled "adder out" in the floating point adder above. Check whether you are acquainted with the normalization and rounding. |
|
The vertical arrow
|
| To avoid the extra delay due to the carry propagation when adding 1 ulp for the rounding, a three-output adder precalculates : S, S' = S + 1 and S" = S + 2. From this three outputs, all the possible final results can be obtained with a mere extra shift right: S, S/2, S' = S + 1, S"/2 = S/2 + 1. |