The multiplication comes second for frequency of use.

AND gate

An "AND" gate multiplies two bits. To multiply two n-bit numbers A and X, n2 "AND" gates are required. The weighted sum of the n2 gate outputs has indeed the same value as P = A × X. However this set of bit is not a number, although its value is computed as if it was a number.
Since A < 2n et X < 2n, the product P < 22n and therefore P is written with 2n bits.

Unsigned Multiplication

A regular structure of "AND" gates and "FA" cells with a "consistent" assembling first produces the partial products and then reduces them to a number P. Since each "FA" cell reduces the number of bits by exactly one (while preserving the sum), the necessary number of "FA" cells is n2 2n (i.e. number of input bits number of output bits). Yet in the following applet there are more "FA" than necessary since some '0' must also be reduced, to be precise just as many '0' as bits of X. The vertical arrow labeled "view" hides or displays the wires distributing A and X. At each wires crossing is an "AND" gate.

Signed multiplication

If the multiplicand A and the multiplier X are both in 2's complement, then the "AND" with the sign-bits an-1 and xn-1 must be inverted and a constant 2n added to get the opposed value of the products of X and A with those sign-bits. Besides the sign-bit of P must also be inverted.

Overflow of the signed multiplication

The two most significant bits of P have (almost) always the same value. Only one of them may be saved as the sign of P. But then the product of the largest negative value by itself gives an incorrect truncated P. This overflow of P is detected by the "AND" of the 2 outputs of the leftmost "FA" cell. The vertical arrow "view" truncate/restore P.

Fast Multipliers

Many approaches lead to a speed improvement.

  • Minimize the number of partial product bits using a higher radix.
  • Use a tree structure for the "FA" cell reduction net.
  • Use "CS" cell, with a reduction power double the "FA" cell's one. Besides this cell allows balanced binary trees (with some difficulties).
  • Use an advanced cell delay model for the reduction trees.
  • Select a fast final adder taking into account unequal arrival times.

Booth recoding

Using a larger radix automatically reduces the multiplier X digits number.
Let have a look at radix 4, using about two times as less digit as radix 2 for the same range. The "Booth Code" ( "BC" for short) is the minimally redundant symmetric radix-4 code. Digits values Î{-2, -1, -0, 0, 1, 2}. The 3-bit code picked below, known as "sign/magnitude", has 2 notations for zero.

However the partial products are computed by a cell more complex than a simple "AND" gate.

Cell of the binary to "BC" converter

Check whether you are acquainted with the logic of "B2BC" cell, which convert a "BC" digit into "sign/magnitude" for the generation of partial products. The sum is preserved, i.e. :
-2*x3 + x2 + x1 = (-1)s × ( 2*M2 + M1 ) ;

Multiplication of A bits by one "BC" digit

The multiplication by one "BC" digit Î {-2, -1, -0, 0, 1, 2} adds 2 bits on top of the A bits: one at left to get either A or 2A, another for the input carry in case of subtraction.
Since A is signed, its sign may have to be extended to the bit added at left.
The multiplication requires as many "CASS" cells as bits in A plus 1.

Partial products generation

The multiplication first step generates from A and X a set of bits whose weights sum is the product P. For unsigned multiplication, P most significant bit weight is positive, while in 2's complement it is negative.

Partial products reduction

The multiplication second step reduces the partial products from the preceding step into two numbers while preserving the weighted sum. The sough after product P is the sum of those two numbers. The two numbers will be added during the third step.
The "Wallace trees" synthesis follows the Dadda's algorithm, which assures of the minimum counter number. If on top of that we impose to reduce as late as (or as soon as) possible then the solution is unique. The two binary number to be added during the third step may also be seen a one number in "CS" notation (2 bits per digit).

Example of reduction tree

The following trees reduce 82 partial products (for example the product of two 8-bit unsigned integers). The "Wallace trees" reduce "as late as possible" (key "Late" on the preceding applet). The weighted sum of the 16 output bits equals the weighted sum of the 64 input bits.

  The top check box allows the use of "CS" cell in the reduction. The bottom vertical arrow hides/shows the final 14-bit adder. The result is in binary with the adder and in carry-save notation "CS" without it. The final adder delay is usually comparable to the reduction tree's one.

"FA" cell delay

A more subtle delay model for the "FA" cell permits to reduce even more the partial products reduction trees delay. The input "x" is the time reference of the "FA" cell delays; dxy and dxy are tolerable latencies of inputs "y" and "z". Finally dxc and dxs are the computational delays of outputs "c" and "s".
The chronogram displays the internal signals generation time; the output signals to the final adder are drawn in magenta and the carries in green. The vertical arrow "view" appends a line downto the moment when the signal enters another cell (i.e. shows the lost time intervals or slacks) or displays the "FA" and "HA" cells in the reduction trees.

Final adder with unequal arrival times

The previous figure shows that the final adder inputs are not ready at the same time. Typically the least significant bits are early, the most significant ones are a bit delayed and the middle ones are very late. This different arrival times (shown by magenta dots in the figure below) can be taken into account to build the reduction "BK" cells trees that produce the final adder carries. To modify the arrival times profile, draw lines in the figure while maintaining the "ctrl" key down.