# CENG 3420 Computer Organization & Design

Lecture 02: Arithmetic and Logic Unit

Bei Yu

(Latest update: January 20, 2021)

Spring 2021

### Overview



Overview

Addition

Multiplication & Division

Shift

Floating Point Number

#### Overview



#### Overview

Addition

Multiplication & Division

Shift

Floating Point Number

## **Abstract Implementation View**





#### **Arithmetic**



Where we've been: abstractions

- Instruction Set Architecture (ISA)
- Assembly and machine language

#### **Arithmetic**



Where we've been: abstractions

- ► Instruction Set Architecture (ISA)
- Assembly and machine language

#### What's up ahead: Implementing the ALU architecture



#### Review: VHDL



- Supports design, documentation, simulation & verification, and synthesis of hardware
- Allows integrated design at behavioral & structural levels



### Review: VHDL (cont.)



#### **Basic Structure**

- Design entity-architecture descriptions
- Time-based execution (discrete event simulation) model



## Review: Entity-Architecture Features



#### **Entity**

defines externally visible characteristics

- Ports: channels of communication
  - signal names for inputs, outputs, clocks, control
- Generic parameters: define class of components
  - timing characteristics, size (fan-in), fan-out

## Review: Entity-Architecture Features (cont.)



#### Architecture

defines the internal behavior or structure of the circuit

- Declaration of internal signals
- Description of behavior
  - collection of Concurrent Signal Assignment (CSA) statements (indicated by <=);</li>
  - can also model temporal behavior with the delay annotation
  - one or more processes containing CSAs and (sequential) variable assignment statements (indicated by :=)
- Description of structure
  - interconnections of components; underlying behavioral models of each component must be specified

## **ALU VHDL Representation**



```
entity ALU is
  port(A, B: in std logic vector (31 downto 0);
          m: in std logic vector (3 downto 0);
          result: out std logic vector (31 downto 0);
          zero: out std logic;
          ovf: out std logic)
end ALU;
architecture process behavior of ALU is
begin
   ALU: process(A, B, m)
   begin
       result := A + B;
   end process ALU;
end process behavior;
```

## Machine Number Representation



- Bits are just bits (have no inherent meaning)\*
- ► Binary numbers (base 2) integers

Of course, it gets more complicated:

- storage locations (e.g., register file words) are finite, so have to worry about overflow (i.e., when the number is too big to fit into 32 bits)
- have to be able to represent negative numbers, e.g., how do we specify -8 in

**addi** \$sp, \$sp, 
$$-8$$
 #\$sp = \$sp - 8

in real systems have to provide for more than just integers, e.g., fractions and real numbers (and floating point) and alphanumeric (characters)



<sup>\*</sup>conventions define the relationships between bits and numbers

## **RISC-V Representation**



#### 32-bit signed numbers (2's complement):

```
0000 0000 0000 0000 0000 0000 0000 0000_{two} = 0_{ten}
0000 0000 0000 0000 0000 0000 0000 0001_{two} = + 1_{ten}
0000 0000 0000 0000 0000 0000 0000 0010_{two} = + 2_{ten}
. . .
1000 0000 0000 0000 0000 0000 0000 0000_{two} = -2,147,483,648_{ten}
1000 0000 0000 0000 0000 0000 0000 0001_{two} = -2,147,483,647_{ten}
1000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0010_{two} = -2,147,483,646_{ten}
1111 1111 1111 1111 1111 1111 1111 1110_{two} = -2_{ten}
```

What if the bit string represented addresses?

need operations that also deal with only positive (unsigned) integers



## Two's Complement Operations



- Negating a two's complement number complement all the bits and then add a 1
  - remember: "negate" and "invert" are quite different!
- Converting n-bit numbers into numbers with more than n bits:
  - MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  - sign extend: copy the most significant bit (the sign bit) into the other bits

```
0010 -> 0000 0010
1010 -> 1111 1010
```

sign extension versus zero extend (lb vs. lbu)

# Design the RISC-V Arithmetic Logic Unit (ALU)



Must support the Arithmetic/Logic operations of the ISA

RV 32I: add, sub, mul, mulh, mulhu, mulhsu, div, divu, rem, li, addi, sll, srl, sra, or, xor, not, slt, sltu, slli, srli, srai, andi, ori, xori, slti, sltiu,

#### RV 64I:

addw, subw, remu, mulw, divw, divuw, remw, remuw, addiw, sllw, srlw, sraw, srliw, sraiw,

#### With special handling for:

- ▶ sign extend: addi, slti, sltiu
- zero extend: andi, xori
- Overflow detected: add, addi, sub



## RISC-V Arithmetic and Logic Instructions



| _       | 7       | 5   | 5   | 3      | 5  | 7      |   |
|---------|---------|-----|-----|--------|----|--------|---|
| R-type: | funct7  | rs2 | rs1 | funct3 | rd | opcode |   |
| _       | 12      |     | 5   | 3      | 5  | 7      | _ |
| I-type: | imm[11: | 0]  | rs1 | funct3 | rd | opcode |   |

| Type  | ор     | funct |
|-------|--------|-------|
| ADDI  | 001000 | xx    |
| ADDIU | 001001 | xx    |
| SLTI  | 001010 | xx    |
| SLTIU | 001011 | xx    |
| ANDI  | 001100 | xx    |
| ORI   | 001101 | xx    |
| XORI  | 001110 | xx    |
| LUI   | 001111 | XX    |

| Type | ор     | <u>funct</u> |
|------|--------|--------------|
| ADD  | 000000 | 100000       |
| ADDU | 000000 | 100001       |
| SUB  | 000000 | 100010       |
| SUBU | 000000 | 100011       |
| AND  | 000000 | 100100       |
| OR   | 000000 | 100101       |
| XOR  | 000000 | 100110       |
| NOR  | 000000 | 100111       |

| Type | ор     | funct  |
|------|--------|--------|
|      | 000000 | 101000 |
|      | 000000 | 101001 |
|      | 000000 |        |
| SLTU | 000000 | 101011 |
|      | 000000 | 101100 |

### Overview



Overviev

Addition

Multiplication & Division

Shift

Floating Point Number

#### Addition & Subtraction



Just like in grade school (carry/borrow 1s)

Two's complement operations are easy: do subtraction by negating and then adding

Overflow (result too large for finite computer word). E.g., adding two n-bit numbers does not yield an n-bit number

# Building a 1-bit Binary Adder





| Α | В | carry_in carry_out |   | S |
|---|---|--------------------|---|---|
| 0 | 0 | 0                  | 0 | 0 |
| 0 | 0 | 1                  | 0 | 1 |
| 0 | 1 | 0                  | 0 | 1 |
| 0 | 1 | 1                  | 1 | 0 |
| 1 | 0 | 0                  | 0 | 1 |
| 1 | 0 | 1                  | 1 | 0 |
| 1 | 1 | 0                  | 1 | 0 |
| 1 | 1 | 1                  | 1 | 1 |

```
S = A xor B xor carry_in
carry_out = A&B | A&carry_in | B&carry_in
(majority function)
```

- How can we use it to build a 32-bit adder?
- How can we modify it easily to build an adder/subtractor?



# Building 32-bit Adder





- Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect ...
- Ripple Carry Adder (RCA)
  - ©: simple logic, so small (low cost)
  - ©: slow and lots of glitching (so lots of energy consumption)

#### Glitch



#### Glitch

invalid and unpredicted output that can be read by the next stage and result in a wrong action

Example: Draw the propagation delay



## Glitch in RCA





| Α | В | carry_in | carry_out | S |
|---|---|----------|-----------|---|
| 0 | 0 | 0        | 0         | 0 |
| 0 | 0 | 1        | 0         | 1 |
| 0 | 1 | 0        | 0         | 1 |
| 0 | 1 | 1        | 1         | 0 |
| 1 | 0 | 0        | 0         | 1 |
| 1 | 0 | 1        | 1         | 0 |
| 1 | 1 | 0        | 1         | 0 |
| 1 | 1 | 1        | 1         | 1 |

### But What about Performance?



- ightharpoonup Critical path of n-bit ripple-carry adder is  $n \times CP$
- Design trick: throw hardware at it (Carry Lookahead)



# A 32-bit Ripple Carry Adder/Subtractor



add/sub

complement all the bits

control 
$$(0=add,1=sub)$$
  $B_0$  if control = 0  $B_0$  if control = 1

add a 1 in the least significant bit



## Minimal Implementation of a Full Adder



Gate library: inverters, 2-input NANDs, or-and-inverters

```
architecture concurrent_behavior of full_adder is
  signal t1, t2, t3, t4, t5: std_logic;
begin
  t1 <= not A after 1 ns;
  t2 <= not cin after 1 ns;
  t4 <= not((A or cin) and B) after 2 ns;
  t3 <= not((t1 or t2) and (A or cin)) after 2 ns;
  t5 <= t3 nand B after 2 ns;
  S <= not((B or t3) and t5) after 2 ns;
  cout <= not((t1 or t2) and t4) after 2 ns;
end concurrent_behavior;</pre>
```

## Tailoring the ALU to the MIPS ISA



- Also need to support the logic operations (and, nor, or, xor)
  - Bit wise operations (no carry operation involved)
  - Need a logic gate for each function and a mux to choose the output
- Also need to support the set-on-less-than instruction (slt)
  - ▶ Uses subtraction to determine if (a b) < 0 (implies a < b)
- Also need to support test for equality (bne, beq)
  - Again use subtraction: (a b) = 0 implies a = b
- Also need to add overflow detection hardware
  - overflow detection enabled only for add, addi, sub
- Immediates are sign extended outside the ALU with wiring (i.e., no logic needed)

# A Simple ALU Cell with Logic Op Support





## A Simple ALU Cell with Logic Op Support





Modifying the ALU Cell for  ${\tt slt}$ 



## Modifying the ALU for slt

- First perform a subtraction
- Make the result 1 if the subtraction yields a negative result
- Make the result 0 if the subtraction yields a positive result
- Tie the most significant sum bit (sign bit) to the low order less input



#### Overflow Detection



Overflow occurs when the result is too large to represent in the number of bits allocated

- adding two positives yields a negative
- or, adding two negatives gives a positive
- or, subtract a negative from a positive gives a negative
- or, subtract a positive from a negative gives a positive

#### Question: **prove** you can detect overflow by:

Carry into MSB xor Carry out of MSB





## Modifying the ALU for Overflow

- Modify the most significant cell to determine overflow output setting
- ► Enable overflow bit setting for signed arithmetic (add, addi, sub)



#### Overflow Detection and Effects



- On overflow, an exception (interrupt) occurs
- Control jumps to predefined address for exception
- Interrupted address (address of instruction causing the overflow) is saved for possible resumption
- Don't always want to detect (interrupt on) overflow

#### **New MIPS Instructions**



| Category                  | Instr                         | Op Code  |       | Example          | Meanii                  | ng               |
|---------------------------|-------------------------------|----------|-------|------------------|-------------------------|------------------|
| Arithmetic                | add unsigned                  | 0 and 21 | addu  | \$s1, \$s2, \$s3 | \$s1 = \$s2 +           | \$s3             |
| (R & I                    | sub unsigned                  | 0 and 23 | subu  | \$s1, \$s2, \$s3 | \$s1 = \$s2 - \$        | Ss3              |
| format)                   | add<br>imm.unsigned           | 9        | addiu | \$s1, \$s2, 6    | \$s1 = \$s2 +           | 6                |
| Data<br>Transfer          | ld byte<br>unsigned           | 24       | lbu   | \$s1, 20(\$s2)   | \$s1 = Mem(\$           | Ss2+20)          |
|                           | ld half unsigned              | 25       | lhu   | \$s1, 20(\$s2)   | \$s1 = Mem(\$           | \$s2+20)         |
| Cond.<br>Branch<br>(I & R | set on less than unsigned     | 0 and 2b | sltu  | \$s1, \$s2, \$s3 | if (\$s2<\$s3)<br>else  | \$s1=1<br>\$s1=0 |
| format)                   | set on less than imm unsigned | b        | sltiu | \$s1, \$s2, 6    | if (\$s2<6) \$s<br>else | 1=1<br>\$s1=0    |

▶ Sign extend: addi, addiu, slti

► Zero extend: andi, ori, xori

▶ Overflow detected: add, addi, sub





In the early days of computing, designers made computers express numbers using **unsigned binary**.

And they were content...

Until there were negative numbers.



#### To include negative numbers, designers came up with sign magnitude.

That took care of the negative numbers... But the computer had to count backwards for the negative numbers,



#### Then designers created one's complement.

#### Finally, designers developed **two's complement**. Now, there was only one zero... And they were content.







Hey guys! How do you negate numbers?



### Overview



Overview

Addition

Multiplication & Division

Shift

Floating Point Number

## Multiplication



- More complicated than addition
- Can be accomplished via shifting and adding

```
\begin{array}{c} 0010 \\ \times \underline{1011} \\ 0010 \\ 0010 \\ 0000 \\ \underline{0010} \\ 0001 \underline{0110} \end{array} \begin{array}{c} \text{(multiplicand)} \\ \text{(multiplier)} \\ \text{(partial product array)} \\ \text{(product)} \end{array}
```

- Double precision product produced
- More time and more area to compute

# First Version of Multiplication Hardware





## Add and Right Shift Multiplier Hardware





## **RISC-V Multiply Instruction**



Multiply (mult and multu) produces a double precision product

- ➤ Low-order word of the product is left in processor register 10 and the high-order word is left in register hi
- Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
- Multiplies are usually done by fast, dedicated hardware and are much more complex (and slower) than adders

#### Division



Division is just a bunch of quotient digit guesses and left shifts and subtracts



#### **Question:** Division



Dividing 1001010 by 1000

#### **RISC-V Divide Instruction**



Divide generates the reminder in hi and the quotient in lo

| op rs ri | rd | shamt | funct |
|----------|----|-------|-------|
|----------|----|-------|-------|

- ► Instructions mflo rd and mfhi rd are provided to move the quotient and reminder to (user accessible) registers in the register file
- As with multiply, divide ignores overflow so software must determine if the quotient is too large.
- Software must also check the divisor to avoid division by 0.

### Overview



Overviev

Addition

Multiplication & Divisior

Shift

Floating Point Number

### **Shift Operations**

positions



Shifts move all the bits in a word left or right

```
sll $t2, $s0, 8 #$t2 = $s0 << 8 bits
srl $t2, $s0, 8 #$t2 = $s0 >> 8 bits
sra $t2, $s0, 8 #$t2 = $s0 >> 8 bits
```

rd

shamt

funct

- Notice that a 5-bit shamt field is enough to shift a 32-bit value  $2^5-1$  or 31 bit
- Logical shifts fill with zeros, arithmetic left shifts fill with the sign bit

#### The shift operation is implemented by hardware separate from the ALU

Using a barrel shifter, which would takes lots of gates in discrete logic, but is pretty easy to implement in VLSI



# A Simple Shifter





# Parallel Programmable Shifters









0,1 shifts



























### Overview



Overviev

Addition

Multiplication & Division

Shift

Floating Point Number

# Floating Point Number



Scientific notation:  $6.6254 \times 10^{-27}$ 

- ► A normalized number of certain accuracy (e.g. 6.6254 is called the mantissa)
- Scale factors to determine the position of the decimal point (e.g.  $10^{-27}$  indicates position of decimal point and is called the exponent; the **base** is implied)
- Sign bit

### Normalized Form



Floating Point Numbers can have multiple forms, e.g.

$$0.232 \times 10^{4} = 2.32 \times 10^{3}$$

$$= 23.2 \times 10^{2}$$

$$= 2320. \times 10^{0}$$

$$= 232000. \times 10^{-2}$$

- ▶ It is desirable for each number to have a unique representation => Normalized Form
- $\blacktriangleright$  We normalize Mantissa's in the Range [1..R), where R is the Base, e.g.:
  - ► [1..2) for BINARY
  - ► [1..10) for DECIMAL

### IEEE Standard 754 Single Precision



32-bit, float in C / C++ / Java



### IEEE Standard 754 Double Precision



64-bit, float in C / C++ / Java





#### Question:

What is the IEEE single precision number 40C0  $0000_{16}$  in decimal?



#### Question:

What is  $-0.5_{10}$  in IEEE single precision binary floating point format?

### Ref: IEEE Standard 754 Numbers



- Normalized +/- 1.d...d x 2<sup>exp</sup>
- Denormalized +/-0.d...d x 2<sup>min\_exp</sup> → to represent <u>near-zero</u> numbers e.g. + 0.0000...0000001 x 2<sup>-126</sup> for Single Precision

| Format                                                                                                    | # bits | # significant bits | macheps                               | # exponent bits | exponent range                |  |  |
|-----------------------------------------------------------------------------------------------------------|--------|--------------------|---------------------------------------|-----------------|-------------------------------|--|--|
|                                                                                                           |        |                    |                                       |                 |                               |  |  |
| Single                                                                                                    | 32     | 1+23               | 2 <sup>-24</sup> (~10 <sup>-7</sup> ) | 8               | 2-126 - 2+127 (~10 ±38)       |  |  |
| Double                                                                                                    | 64     | 1+52               | 2-53 (~10-16)                         | 11              | 2-1022 - 2+1023 (~10 ±308)    |  |  |
| Double Extended                                                                                           | >=80   | >=64               | <=2-64(~10-19)                        | >=15            | 2-16382 - 2+16383 (~10 ±4932) |  |  |
| (Double Extended is 80 bits on all Intel machines) macheps = Machine Epsilon = = 2 - (# significand bits) |        |                    |                                       |                 |                               |  |  |



# **Special Values**



#### Exponents of all 0's and all 1's have special meaning

- $\blacktriangleright$  E=0, M=0 represents 0 (sign bit still used so there is  $\pm$ 0)
- ► E=0, M≠0 is a denormalized number  $\pm$ 0.M  $\times 2^{-127}$  (smaller than the smallest normalized number)
- ► E=All 1's, M=0 represents ±Infinity, depending on Sign
- ► E=All 1's, M≠0 represents NaN

### Other Features



# +, -, x, /, sqrt, remainder, as well as conversion to and from integer are correctly rounded

- As if computed with infinite precision and then rounded
- ➤ Transcendental functions (that cannot be computed in a finite number of steps e.g., sine, cosine, logarithmic, , e, etc. ) may not be correctly rounded

#### **Exceptions and Status Flags**

Invalid Operation, Overflow, Division by zero, Underflow, Inexact

#### Floating point numbers can be treated as "integer bit-patterns" for comparisons

- If Exponent is all zeroes, it represents a denormalized, very small and near (or equal to) zero number
- If Exponent is all ones, it represents a very large number and is considered infinity (see next slide.)

**Dual Zeroes:** +0 (0x00000000) and -0 (0x80000000): they are treated as the same



### Other Features



#### Infinity is like the mathematical one

- ▶ Finite/Infinity  $\rightarrow$  0
- ► Infinity × Infinity → Infinity
- ightharpoonup Non-zero/0 ightharpoonup Infinity
- $\blacktriangleright \ \, \text{Infinity} \, \, ^{\{\text{Finite or Infinity}\}} \rightarrow \text{Infinity}$

### NaN (Not-a-Number) is produced whenever a limiting value cannot be determined:

- ► Infinity Infinity → NaN
- ▶ Infinity/Infinity  $\rightarrow$  NaN
- ▶  $0/0 \rightarrow \text{NaN}$
- ▶ Infinity  $\times$  0  $\rightarrow$  NaN
- ▶ If x is a NaN,  $x \neq x$
- ► Many systems just store the result quietly as a NaN (all 1's in exponent), some systems will signal or raise an exception

### **Inaccurate Floating Point Operations**



- E.g. Find 1st root of a quadratic equation
  - r = (-b + sqrt(b\*b 4\*a\*c)) / (2\*a)

Sparc processor, Solaris, gcc 3.3 (ANSI C),

Expected Answer 0.00023025562642476431 double 0.00023025562638524986 float 0.00024670246057212353

• Problem is that if c is near zero,

$$\operatorname{sqrt}(b*b - 4*a*c) \approx b$$

• Rule of thumb: use the highest precision which does not give up too much speed

# Catastrophic Cancellation



- (a b) is inaccurate when a ≈ b
- Decimal Examples
  - Using 2 significant digits to compute mean of 5.1 and 5.2 using the formula (a+b) / 2:

```
a + b = 10 (with 2 sig. digits, 10.3 can only be stored as 10) 10/2 = 5.0 (the computed mean is less than both numbers!!!)
```

Using 8 significant digits to compute sum of three numbers:

Catastrophic cancellation occurs when

$$|\frac{[round(x)"\bullet"round(y)] - round(x \bullet y)}{round(x \bullet y)}| >> \varepsilon_{mach}$$

