# CENG 3420 Lecture 05: Arithmetic and Logic Unit

Bei Yu

byu@cse.cuhk.edu.hk



香港中文大學 The Chinese University of Hong Kong

CENG3420 L05.1 Spring 2017

## **Outline**

- 1. Overview
- 2. Addition
- 3. Multiplication & Division
- 4. Shift
- □ 5. Floating Point Number

CENG3420 L05.2 Spring 2017

## **Outline**

- 1. Overview
- □ 2. Addition
- □ 3. Multiplication & Division
- 4. Shift
- □ 5. Floating Point Number

CENG3420 L05.3 Spring 2017

# **Abstract Implementation View**



CENG3420 L05.4 Spring 2017

#### **Arithmetic**

- Where we've been
  - Abstractions
    - Instruction Set Architecture (ISA)
    - Assembly and machine language
- What's up ahead
  - Implementing the ALU architecture



CENG3420 L05.5 Spring 2017

#### **Review: VHDL**

- Supports design, documentation, simulation & verification, and synthesis of hardware
- Allows integrated design at behavioral & structural levels



CENG3420 L05.6 Spring 2017

#### **Review: VHDL**

- Basic structure
  - Design entity-architecture descriptions
  - Time-based execution (discrete event simulation) model



CENG3420 L05.7 Spring 2017

## **Review: Entity-Architecture Features**

- Entity defines externally visible characteristics
  - Ports: channels of communication
    - signal names for inputs, outputs, clocks, control
  - Generic parameters: define class of components
    - timing characteristics, size (fan-in), fan-out
- □ Architecture defines the internal behavior or structure of the circuit
  - Declaration of internal signals
  - Description of behavior
    - collection of Concurrent Signal Assignment (CSA) statements (indicated by <=); can also model temporal behavior with the delay annotation
    - one or more processes containing CSAs and (sequential)
       variable assignment statements (indicated by :=)
  - Description of structure
    - interconnections of components; underlying behavioral models of each component must be specified

CENG3420 L05.8 Spring 2017

# **ALU VHDL Representation**

```
entity ALU is
  port(A, B: in std logic vector (31 downto 0);
          m: in std logic vector (3 downto 0);
          result: out std logic vector (31 downto 0);
          zero: out std logic;
          ovf: out std logic)
end ALU;
architecture process behavior of ALU is
begin
   ALU: process(A, B, m)
   begin
       result := A + B;
   end process ALU;
end process behavior;
```

CENG3420 L05.9 Spring 2017

## **Machine Number Representation**

- Bits are just bits (have no inherent meaning)
  - conventions define the relationships between bits and numbers
- □ Binary numbers (base 2) integers

  0000 -> 0001 -> 0010 -> 0011 -> 0100 -> 0101 -> . . . .
  - in decimal from 0 to 2<sup>n</sup>-1 for n bits
- Of course, it gets more complicated
  - storage locations (e.g., register file words) are finite, so have to worry about overflow (i.e., when the number is too big to fit into 32 bits)
  - have to be able to represent negative numbers, e.g., how do we specify -8 in

```
addi \$sp, \$sp, -8 \#\$sp = \$sp - 8
```

 in real systems have to provide for more than just integers, e.g., fractions and real numbers (and floating point) and alphanumeric (characters)

CENG3420 L05.10 Spring 2017

## **MIPS Representations**

□ 32-bit signed numbers (2's complement):

```
0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 = 0_{ten}
0000 0000 0000 0000 0000 0000 0001<sub>two</sub> = + 1_{ten}
                                                         maxint
1111 \ 1111_{\text{two}} = + 2,147,483,647_{\text{ten}}
    1111 1111 1111 1111 1111
               0000 0000 0000 0000 0000_{two} = -2,147,483,648_{ten}
1000
    0000 0000
1000 0000 0000 0000 0000 0000 0001<sub>two</sub> = -2,147, 483,647_{ten}
                          0000 \ 0000 \ 0010_{\text{two}} = -2,147,483,646_{\text{ten}}
1000 0000 0000 0000 0000
                                                         minint
1111 1111 1111 1111 1111 1111 1111 1101_{two} = -3_{ten}
                               1111 \ 1110_{\text{two}} = - 2_{\text{ten}}
          1111
                1111 1111 1111
               1111 \ 1111 \ 1111 \ 1111 \ 1111_{two} = -1_{ten}
```

- What if the bit string represented addresses?
  - need operations that also deal with only positive (unsigned) integers

CENG3420 L05.11 Spring 2017

## **Two's Complement Operations**

- Negating a two's complement number complement all the bits and then add a 1
  - remember: "negate" and "invert" are quite different!
- Converting n-bit numbers into numbers with more than n bits:
  - MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  - sign extend copy the most significant bit (the sign bit) into the other bits

```
0010 -> 0000 0010
1010 -> 1111 1010
```

sign extension versus zero extend (1b vs. 1bu)

CENG3420 L05.12 Spring 2017

## **Design the MIPS Arithmetic Logic Unit (ALU)**

zero ovf

ALU

Must support the Arithmetic/Logic operations of the ISA

```
add, addi, addiu, addu sub, subu mult, multu, div, divu sqrt
```

and, andi, nor, or, ori, xor, xori m (operation) beq, bne, slt, slti, sltiu, sltu

- With special handling for
  - sign extend addi, addiu, slti, sltiu
  - zero extend andi, ori, xori
  - Overflow detected add, addi, sub

CENG3420 L05.13 Spring 2017

# **MIPS Arithmetic and Logic Instructions**



| Type  | ор     | funct |
|-------|--------|-------|
| ADDI  | 001000 | XX    |
| ADDIU | 001001 | XX    |
| SLTI  | 001010 | XX    |
| SLTIU | 001011 | XX    |
| ANDI  | 001100 | XX    |
| ORI   | 001101 | XX    |
| XORI  | 001110 | XX    |
| LUI   | 001111 | XX    |

| Type | ор     | funct  |
|------|--------|--------|
| ADD  | 000000 | 100000 |
| ADDU | 000000 | 100001 |
| SUB  | 000000 | 100010 |
| SUBU | 000000 | 100011 |
| AND  | 000000 | 100100 |
| OR   | 000000 | 100101 |
| XOR  | 000000 | 100110 |
| NOR  | 000000 | 100111 |

| Type | ор     | funct  |  |
|------|--------|--------|--|
|      | 000000 | 101000 |  |
|      | 000000 | 101001 |  |
| SLT  | 000000 | 101010 |  |
| SLTU | 000000 | 101011 |  |
|      | 000000 | 101100 |  |

CENG3420 L05.14 Spring 2017

## **Design Trick: Divide & Conquer**

- Break the problem into simpler problems, solve them and glue together the solution
- Example: assume the immediates have been taken care of before the ALU
  - now down to 10 operations
  - can encode in 4 bits



| 0 | add  |
|---|------|
| 1 | addu |
| 2 | sub  |
| 3 | subu |
| 4 | and  |
| 5 | or   |
| 6 | xor  |
| 7 | nor  |
| а | slt  |
| b | sltu |

CENG3420 L05.15 Spring 2017

## **Outline**

- 1. Overview
- 2. Addition
- □ 3. Multiplication & Division
- 4. Shift
- □ 5. Floating Point Number

CENG3420 L05.16 Spring 2017

## **Addition & Subtraction**

Just like in grade school (carry/borrow 1s)

$$\begin{array}{r} 0111 \\ + 0110 \\ \hline 1101 \end{array}$$

- Two's complement operations are easy
  - do subtraction by negating and then adding

- Overflow (result too large for finite computer word)
  - e.g., adding two n-bit numbers does not yield an n-bit number

## **Building a 1-bit Binary Adder**



| Α | В | carry_in carry_o |   | S |
|---|---|------------------|---|---|
| 0 | 0 | 0                | 0 | 0 |
| 0 | 0 | 1                | 0 | 1 |
| 0 | 1 | 0                | 0 | 1 |
| 0 | 1 | 1                | 1 | 0 |
| 1 | 0 | 0                | 0 | 1 |
| 1 | 0 | 1                | 1 | 0 |
| 1 | 1 | 0                | 1 | 0 |
| 1 | 1 | 1                | 1 | 1 |

- How can we use it to build a 32-bit adder?
- How can we modify it easily to build an adder/subtractor?

CENG3420 L05.18 Spring 2017

## **Building 32-bit Adder**



■ Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect . . .

- □ Ripple Carry Adder (RCA)
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow and lots of glitching (so lots of energy consumption)

CENG3420 L05.19 Spring 2017

#### **Glitch**

- Glitch: invalid and unpredicted output that can be read by the next stage and result in a wrong action
- Example: Draw the propagation delay



CENG3420 L05.20 Spring 2017

# Glitch in RCA



| Α | В | carry_in | carry_out | S |
|---|---|----------|-----------|---|
| 0 | 0 | 0        | 0         | 0 |
| 0 | 0 | 1        | 0         | 1 |
| 0 | 1 | 0        | 0         | 1 |
| 0 | 1 | 1        | 1         | 0 |
| 1 | 0 | 0        | 0         | 1 |
| 1 | 0 | 1        | 1         | 0 |
| 1 | 1 | 0        | 1         | 0 |
| 1 | 1 | 1        | 1         | 1 |

CENG3420 L05.21 Spring 2017

### **But What about Performance?**

Critical path of n-bit ripple-carry adder is n\*CP



Design trick – throw hardware at it (Carry Lookahead)

CENG3420 L05.22 Spring 2017

## A 32-bit Ripple Carry Adder/Subtractor

- Remember 2's complement is just
  - complement all the bits

control  

$$(0=add,1=sub)$$
  $B_0$  if control = 0  
 $B_0$  if control = 1

 add a 1 in the least significant bit

A 0111 
$$\rightarrow$$
 0111  
B  $-$  0110  $\rightarrow$  + 1001  
0001  $\frac{1}{10001}$ 



CENG3420 L05.23 Spring 2017

## Minimal Implementation of a Full Adder

□ Gate library: inverters, 2-input nands, or-and-inverters

```
architecture concurrent behavior of full adder is
   signal t1, t2, t3, t4, t5: std logic;
begin
      t1 <= not A after 1 ns;
      t2 <= not cin after 1 ns;
      t4 <= not((A or cin) and B) after 2 ns;
      t3 \le not((t1 \text{ or } t2) \text{ and } (A \text{ or } cin)) \text{ after } 2 \text{ ns};
      t5 <= t3 nand B after 2 ns;
      S <= not((B or t3) and t5) after 2 ns;</pre>
      cout <= not((t1 or t2) and t4) after 2 ns;</pre>
end concurrent behavior;
```

[Optional] Can you create the equivalent schematic? Can you determine worst case delay (the worst case timing path through the circuit)?

CENG3420 L05.24 Spring 2017

## Tailoring the ALU to the MIPS ISA

- Also need to support the logic operations (and, nor, or, xor)
  - Bit wise operations (no carry operation involved)
  - Need a logic gate for each function and a mux to choose the output
- Also need to support the set-on-less-than instruction (slt)
  - Uses subtraction to determine if (a b) < 0 (implies a < b)
- Also need to support test for equality (bne, beq)
  - Again use subtraction: (a b) = 0 implies a = b
- Also need to add overflow detection hardware
  - overflow detection enabled only for add, addi, sub
- Immediates are sign extended outside the ALU with wiring (i.e., no logic needed)

CENG3420 L05.25 Spring 2017

# A Simple ALU Cell with Logic Op Support



CENG3420 L05.26 Spring 2017

# Modifying the ALU Cell for slt



CENG3420 L05.27 Spring 2017

# Modifying the ALU for slt

- First perform a subtraction
- Make the result 1 if the subtraction yields a negative result
- Make the result 0 if the subtraction yields a positive result
  - tie the most significant sum bit (sign bit) to the low order less input



#### **Overflow Detection**

- Overflow occurs when the result is too large to represent in the number of bits allocated
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtract a negative from a positive gives a negative
  - or, subtract a positive from a negative gives a positive
- On your own: Prove you can detect overflow by:
  - Carry into MSB xor Carry out of MSB





CENG3420 L05.29 Spring 2017



#### **Overflow Detection and Effects**

- On overflow, an exception (interrupt) occurs
  - Control jumps to predefined address for exception
  - Interrupted address (address of instruction causing the overflow) is saved for possible resumption
- Don't always want to detect (interrupt on) overflow

CENG3420 L05.31 Spring 2017

## **New MIPS Instructions**

| Category                  | Instr                         | Op Code  | Example               | Meaning                                 |
|---------------------------|-------------------------------|----------|-----------------------|-----------------------------------------|
| Arithmetic                | add unsigned                  | 0 and 21 | addu \$s1, \$s2, \$s3 | s   \$s1 = \$s2 + \$s3                  |
| (R & I                    | sub unsigned                  | 0 and 23 | subu \$s1, \$s2, \$s3 | \$s1 = \$s2 - \$s3                      |
| format)                   | add<br>imm.unsigned           | 9        | addiu \$s1, \$s2, 6   | \$s1 = \$s2 + 6                         |
| Data<br>Transfer          | ld byte unsigned              | 24       | lbu \$s1, 20(\$s2)    | \$s1 = Mem(\$s2+20)                     |
|                           | ld half unsigned              | 25       | lhu \$s1, 20(\$s2)    | \$s1 = Mem(\$s2+20)                     |
| Cond.<br>Branch<br>(I & R | set on less than unsigned     | 0 and 2b | sltu \$s1, \$s2, \$s3 | if (\$s2<\$s3) \$s1=1<br>else<br>\$s1=0 |
| format)                   | set on less than imm unsigned | b        | sltiu \$s1, \$s2, 6   | if (\$s2<6) \$s1=1<br>else \$s1=0       |

- □ Sign extend addi, addiu, slti
- □ Zero extend andi, ori, xori
- Overflow detected add, addi, sub

CENG3420 L05.32 Spring 2017

## **Outline**

- 1. Overview
- □ 2. Addition
- 3. Multiplication & Division
- 4. Shift
- □ 5. Floating Point Number

CENG3420 L05.33 Spring 2017

## **Multiplication**

- More complicated than addition
  - Can be accomplished via shifting and adding

```
0010 (multiplicand)
x_1011 (multiplier)
0010
0010 (partial product
array)
0010
00010110 (product)
```

- Double precision product produced
- More time and more area to compute

CENG3420 L05.34 Spring 2017

# First Version of Multiplication Hardware



CENG3420 L05.35 Spring 2017

# Add and Right Shift Multiplier Hardware



CENG3420 L05.36 Spring 2017

## Add and Right Shift Multiplier Hardware



CENG3420 L05.37 Spring 2017

## **MIPS Multiply Instruction**

Multiply (mult and multu) produces a double precision product

mult \$s0, \$s1 # hi||lo = \$s0 \* \$s1

0 16 17 0 0 0x18

- Low-order word of the product is left in processor register
   lo and the high-order word is left in register
- Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
- Multiplies are usually done by fast, dedicated hardware and are much more complex (and slower) than adders

CENG3420 L05.38 Spring 2017

#### **Division**

Division is just a bunch of quotient digit guesses and left shifts and subtracts



CENG3420 L05.39 Spring 2017

## **Example: Division**

Dividing 1001010 by 1000

CENG3420 L05.40 Spring 2017

#### **MIPS Divide Instruction**

Divide generates the reminder in hi and the quotient in lo

- Instructions mflo rd and mfhi rd are provided to move the quotient and reminder to (user accessible) registers in the register file
- As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.

CENG3420 L05.41 Spring 2017

## **Outline**

- 1. Overview
- □ 2. Addition
- □ 3. Multiplication & Division
- 4. Shift
- □ 5. Floating Point Number

CENG3420 L05.42 Spring 2017

### **Shift Operations**

Shifts move all the bits in a word left or right

```
sll $t2, $s0, 8 #$t2 = $s0 << 8 bits srl $t2, $s0, 8 #$t2 = $s0 >> 8 bits sra $t2, $s0, 8 #$t2 = $s0 >> 8 bits op rs rt rd shamt funct
```

- Notice that a 5-bit shamt field is enough to shift a 32-bit value 2<sup>5</sup> − 1 or 31 bit positions
- Logical shifts fill with zeros, arithmetic left shifts fill with the sign bit
- The shift operation is implemented by hardware separate from the ALU
  - using a barrel shifter (which would takes lots of gates in discrete logic, but is pretty easy to implement in VLSI)

CENG3420 L05.43 Spring 2017

# **A Simple Shifter**



CENG3420 L05.44 Spring 2017

## Parallel Programmable Shifters



CENG3420 L05.45 Spring 2017

## Logarithmic Shifter Structure



CENG3420 L05.46 Spring 2017

## **Logarithmic Shifter Structure**



CENG3420 L05.47 Spring 2017

## **Outline**

- 1. Overview
- □ 2. Addition
- □ 3. Multiplication & Division
- 4. Shift
- 5. Floating Point Number

CENG3420 L05.48 Spring 2017

### **Floating Point Number**

- □ Scientific notation: 6.6254 ×10<sup>-27</sup>
  - A normalized number of certain accuracy
     e.g. 6.6254 is called the mantissa
  - Scale factors to determine the position of the decimal point
     e.g. 10<sup>-27</sup> indicates position of decimal point and is called the exponent (the base is implied)
  - Sign bit

CENG3420 L05.49 Spring 2017

#### **Normalized Form**

□ Floating Point Numbers can have multiple forms, e.g.

```
0.232 \times 10^4 = 2.32 \times 10^3
= 23.2 x 10<sup>2</sup>
= 2320. x 10<sup>0</sup>
= 232000. x 10<sup>-2</sup>
```

- It is desirable for each number to have a unique representation => Normalized Form
- We normalize Mantissa's in the Range [ 1 .. R ) where R is the Base, e.g.:

[ 1 .. 2 ) for BINARY

[ 1 .. 10 ) for DECIMAL

CENG3420 L05.50 Spring 2017

# IEEE Standard 754 Single Precision (32-bit, float in C/ C++/ Java)



CENG3420 L05.51 Spring 2017

# IEEE Standard 754 Double Precision (64-bit, double in C/ C++/ Java)



(c) Double precision

CENG3420 L05.52 Spring 2017

#### **Example**

■ What is the IEEE single precision number 40C0 0000<sub>16</sub> in decimal?

- Binary:
   0100 0000 1100 0000 0000 0000 0000
- Sign: +
- Exponent: 129 127 = +2
- Mantissa:  $1.100\ 0000\ \dots_2 \rightarrow 1.5_{10}\ x\ 2^{+2}$  $\rightarrow +110.0000\ \dots_2$
- Decimal Answer = +6.0<sub>10</sub>

CENG3420 L05.53 Spring 2017

#### **Class Exercise**

■ What is –0.5<sub>10</sub> in IEEE single precision binary floating point format?

- $\bullet$  0.5 = 1.0... \*  $2^{-1}$  (in binary)
- Exponent bit= 127 + (-1) = 011111110
   Sign bit = 1
   Mantissa = 1.000 0000 0000 0000 0000
- binary representation =
- 1 01111110 000 0000 0000 0000 0000 0000

CENG3420 L05.54 Spring 2017

#### **Ref: IEEE Standard 754 Numbers**

- Normalized +/- 1.d...d x 2<sup>exp</sup>
- **Denormalized** +/-0.d...d x  $2^{min\_exp}$   $\rightarrow$  to represent <u>near-zero</u> numbers e.g. + 0.0000...0000001 x  $2^{-126}$  for Single Precision

| Format          | # bits | # significant bits | macheps                                  | # exponent bits | exponent range                                 |
|-----------------|--------|--------------------|------------------------------------------|-----------------|------------------------------------------------|
|                 |        |                    |                                          |                 |                                                |
| Single          | 32     | 1+23               | 2 <sup>-24</sup> (~10 <sup>-7</sup> )    | 8               | $2^{-126} - 2^{+127} (\sim 10^{\pm 38})$       |
| Double          | 64     | 1+52               | 2 <sup>-53</sup> (~10 <sup>-16</sup> )   | 11              | $2^{-1022} - 2^{+1023} (\sim 10^{\pm 308})$    |
| Double Extended | >=80   | >=64               | <=2 <sup>-64</sup> (~10 <sup>-19</sup> ) | >=15            | 2 <sup>-16382</sup> - 2 <sup>+16383</sup> (~10 |

(Double Extended is 80 bits on all Intel machines) macheps = Machine Epsilon =  $_{mach}$  = 2<sup>- (# significand bits)</sup>



CENG3420 L05.55 Spring 2017

#### **Other Features**

- +, -, x, /, sqrt, remainder, as well as conversion to and from integer are correctly rounded
  - As if computed with infinite precision and then rounded
  - Transcendental functions (that cannot be computed in a finite number of steps e.g., sine, cosine, logarithmic,  $\pi$ , e, etc. ) may not be correctly rounded
- Exceptions and Status Flags
  - Invalid Operation, Overflow, Division by zero, Underflow, Inexact
- Floating point numbers can be treated as "integer bitpatterns" for comparisons
  - If Exponent is all zeroes, it represents a denormalized, very small and near (or equal to) zero number
  - If Exponent is all ones, it represents a very large number and is considered infinity (see next slide.)

CENG3420 L05.56 Spring 2017

## **Special Values**

- Exponents of all 0's and all 1's have special meaning
  - E=0, M=0 represents 0 (sign bit still used so there is +/-0)
  - $\bullet$  E=0, M<>0 is a denormalized number  $\pm 0$ .M x  $2^{-127}$  (smaller than the smallest normalized number)
  - E=All 1's, M=0 represents ±Infinity, depending on Sign
  - E=All 1's, M<>0 represents NaN

CENG3420 L05.57 Spring 2017

#### **Other Features**

- Dual Zeroes: +0 (0x00000000) and -0 (0x80000000) (they are treated as the same)
- Infinity is like the mathematical one
  - Finite / Infinity  $\rightarrow 0$
  - Infinity x Infinity → Infinity
  - Non-zero / 0 → Infinity
  - Infinity ^ (Finite or Infinity) → Infinity
- NaN (Not-a-Number) is produced whenever a *limiting value* cannot be determined:
  - Infinity Infinity → NaN
  - Infinity / Infinity → NaN
  - 0 / 0 → NaN
  - Infinity x 0 → NaN
  - Many systems just store the result quietly as a NaN (all 1's in exponent) some systems will signal or raise an exception

☐ If x is a NaN, x != x

CENG3420 L05.58 Spring 2017

## **Inaccurate Floating Point Operations**

■ E.g. Find 1<sup>st</sup> root of a quadratic equation

$$r = (-b + sqrt(b*b - 4*a*c)) / (2*a)$$

Sparc processor, Solaris, gcc 3.3 (ANSI C),

Expected Answer 0.00023025562642476431 double 0.00023025562638524986 float 0.00024670246057212353

Problem is that if c is near zero,

$$sqrt(b*b - 4*a*c) \approx b$$

Rule of thumb: use the highest precision which does not give up too much speed

CENG3420 L05.59 Spring 2017

## **Catastrophic Cancellation**

- (a b) is inaccurate when a ≈ b
- Decimal Examples
  - Using 2 significant digits to compute mean of 5.1 and 5.2 using the formula (a+b) / 2:

```
a + b = 10 (with 2 sig. digits, 10.3 can only be stored as 10) 10 / 2 = 5.0 (the computed mean is less than both numbers!!!)
```

Using 8 significant digits to compute sum of three numbers:

```
(11111113 + (-11111111)) + 7.5111111 = 9.5111111

1111113 + ((-11111111)) + 7.5111111) = 10.000000
```

Catastrophic cancellation occurs when

$$|\frac{[round(x)"\bullet"round(y)] - round(x \bullet y)}{round(x \bullet y)}| >> \varepsilon_{mach}$$

CENG3420 L05.60 Spring 2017