|
Multipliers |
|
Multiplier |
The multiplication comes second for frequency of use. |
|
AND gate
|
An "AND" gate multiplies two bits. To multiply two n-bit
numbers A and X, n2 "AND" gates are required. The weighted sum of the n2 gate outputs
has indeed the same value as P = A × X. However this set of bit is not a number, although
its value is computed as if it was a number. Since A < 2n et X < 2n, the product P < 22n and therefore P is written with 2n bits. |
|
Unsigned Multiplication |
A regular structure of "AND" gates and "FA"
cells with a "consistent" assembling first produces the partial products and then reduces them to a number
P. Since each "FA" cell reduces the number of bits by exactly one (while preserving the sum), the necessary
number of "FA" cells is n2 – 2n (i.e. number of input bits – number of output bits). Yet in
the following applet there are more "FA" than necessary since some '0' must
also be reduced, to be precise just as many '0' as bits of X. The vertical arrow |
|
Signed multiplication |
If the multiplicand A and the multiplier X are both in 2's complement, then the "AND" with the sign-bits an-1 and xn-1 must be inverted and a constant 2n added to get the opposed value of the products of X and A with those sign-bits. Besides the sign-bit of P must also be inverted. |
|
Overflow of the signed multiplication |
The two most significant bits of P have (almost) always the same value. Only one of them may be saved as the sign
of P. But then the product of the largest negative value by itself gives an incorrect truncated P. This overflow of P is detected by the "AND" of the 2 outputs of the leftmost "FA"
cell. The vertical arrow |
|
Fast Multipliers |
Many approaches lead to a speed improvement.
|
|
Booth recoding |
Using a larger radix automatically reduces the multiplier X digits number. |
|
Cell of the binary to "BC" converter |
Check whether you are acquainted with the logic of "B2BC" cell, which convert a "BC"
digit into "sign/magnitude" for the generation of partial products. The sum is preserved, i.e. : -2*x3 + x2 + x1 = (-1)s × ( 2*M2 + M1 ) ; |
|
Multiplication of A bits by one "BC" digit
|
The multiplication by one "BC" digit Î {-2, -1, -0, 0, 1, 2} adds 2
bits on top of the A bits: one at left to get either A or 2A, another for the input carry in case of subtraction.
Since A is signed, its sign may have to be extended to the bit added at left. The multiplication requires as many "CASS" cells as bits in A plus 1. |
| The multiplication first step generates from A and X a set of bits whose weights sum is the product P. For unsigned multiplication, P most significant bit weight is positive, while in 2's complement it is negative. |
| The multiplication second step reduces the partial products from the preceding step into two numbers while preserving
the weighted sum. The sough after product P is the sum of those two numbers. The two numbers will be added during
the third step. The "Wallace trees" synthesis follows the Dadda's algorithm, which assures of the minimum counter number. If on top of that we impose to reduce as late as (or as soon as) possible then the solution is unique. The two binary number to be added during the third step may also be seen a one number in "CS" notation (2 bits per digit). |
|
Example of reduction tree |
The following trees reduce 82 partial products (for example the product of two 8-bit unsigned integers). The "Wallace trees" reduce "as late as possible" (key "Late" on the preceding applet). The weighted sum of the 16 output bits equals the weighted sum of the 64 input bits. |
| The top check box allows the use of "CS" cell in the
reduction. The bottom vertical arrow |
|
"FA" cell delay |
A more subtle delay model for the "FA" cell permits to
reduce even more the partial products reduction trees delay. The input "x" is the time reference of the
"FA" cell delays; dxy and dxy are tolerable latencies of inputs "y" and "z". Finally
dxc and dxs are the computational delays of outputs "c" and "s". The chronogram displays the internal signals generation time; the output signals to the final adder are drawn in magenta and the carries in green. The vertical arrow |
|
Final adder with unequal arrival times |
The previous figure shows that the final adder inputs are not ready at the same time. Typically the least significant bits are early, the most significant ones are a bit delayed and the middle ones are very late. This different arrival times (shown by magenta dots in the figure below) can be taken into account to build the reduction "BK" cells trees that produce the final adder carries. To modify the arrival times profile, draw lines in the figure while maintaining the "ctrl" key down. |