Multipliers 
Multiplier 
The multiplication comes second for frequency of use. 
AND gate

An "AND" gate multiplies two bits. To multiply two nbit
numbers A and X, n^{2} "AND" gates are required. The weighted sum of the n^{2} gate outputs
has indeed the same value as P = A × X. However this set of bit is not a number, although
its value is computed as if it was a number. Since A < 2^{n} et X < 2^{n}, the product P < 2^{2n }and therefore P is written with 2n bits. 
Unsigned Multiplication 
A regular structure of "AND" gates and "FA" cells with a "consistent" assembling first produces the partial products and then reduces them to a number P. Since each "FA" cell reduces the number of bits by exactly one (while preserving the sum), the necessary number of "FA" cells is n^{2} – 2n (i.e. number of input bits – number of output bits). Yet in the following applet there are more "FA" than necessary since some '0' must also be reduced, to be precise just as many '0' as bits of X. The vertical arrow labeled "view" hides or displays the wires distributing A and X. At each wires crossing is an "AND" gate. 
Signed multiplication 
If the multiplicand A and the multiplier X are both in 2's complement, then the "AND" with the signbits a_{n1} and x_{n1} must be inverted and a constant 2^{n} added to get the opposed value of the products of X and A with those signbits. Besides the signbit of P must also be inverted. 
Overflow of the signed multiplication 
The two most significant bits of P have (almost) always the same value. Only one of them may be saved as the sign of P. But then the product of the largest negative value by itself gives an incorrect truncated P. This overflow of P is detected by the "AND" of the 2 outputs of the leftmost "FA" cell. The vertical arrow "view" truncate/restore P. 
Fast Multipliers 
Many approaches lead to a speed improvement.

Booth recoding 
Using a larger radix automatically reduces the multiplier X digits number. 
Cell of the binary to "BC" converter 
Check whether you are acquainted with the logic of "B2BC" cell, which convert a "BC"
digit into "sign/magnitude" for the generation of partial products. The sum is preserved, i.e. : 2*x3 + x2 + x1 = (1)^{s} × ( 2*M2 + M1 ) ; 
Multiplication of A bits by one "BC" digit

The multiplication by one "BC" digit Î {2, 1, 0, 0, 1, 2} adds 2
bits on top of the A bits: one at left to get either A or 2A, another for the input carry in case of subtraction.
Since A is signed, its sign may have to be extended to the bit added at left. The multiplication requires as many "CASS" cells as bits in A plus 1. 
The multiplication first step generates from A and X a set of bits whose weights sum is the product P. For unsigned multiplication, P most significant bit weight is positive, while in 2's complement it is negative. 
The multiplication second step reduces the partial products from the preceding step into two numbers while preserving
the weighted sum. The sough after product P is the sum of those two numbers. The two numbers will be added during
the third step. The "Wallace trees" synthesis follows the Dadda's algorithm, which assures of the minimum counter number. If on top of that we impose to reduce as late as (or as soon as) possible then the solution is unique. The two binary number to be added during the third step may also be seen a one number in "CS" notation (2 bits per digit). 
Example of reduction tree 
The following trees reduce 8^{2} partial products (for example the product of two 8bit unsigned integers). The "Wallace trees" reduce "as late as possible" (key "Late" on the preceding applet). The weighted sum of the 16 output bits equals the weighted sum of the 64 input bits. 
The top check box allows the use of "CS" cell in the reduction. The bottom vertical arrow hides/shows the final 14bit adder. The result is in binary with the adder and in carrysave notation "CS" without it. The final adder delay is usually comparable to the reduction tree's one. 
"FA" cell delay 
A more subtle delay model for the "FA" cell permits to
reduce even more the partial products reduction trees delay. The input "x" is the time reference of the
"FA" cell delays; dxy and dxy are tolerable latencies of inputs "y" and "z". Finally
dxc and dxs are the computational delays of outputs "c" and "s". The chronogram displays the internal signals generation time; the output signals to the final adder are drawn in magenta and the carries in green. The vertical arrow "view" appends a line downto the moment when the signal enters another cell (i.e. shows the lost time intervals or slacks) or displays the "FA" and "HA" cells in the reduction trees. 
Final adder with unequal arrival times 
The previous figure shows that the final adder inputs are not ready at the same time. Typically the least significant bits are early, the most significant ones are a bit delayed and the middle ones are very late. This different arrival times (shown by magenta dots in the figure below) can be taken into account to build the reduction "BK" cells trees that produce the final adder carries. To modify the arrival times profile, draw lines in the figure while maintaining the "ctrl" key down. 