## 5DV008 Computer Architecture

Umeå University, Department of Computing Science

Stephen J. Hegner

## **Topic 3: Arithmetic**

These slides are mostly taken verbatim, or with minor changes, from those prepared by Mary Jane Irwin (www.cse.psu.edu/~mji) of The Pennsylvania State University [Adapted from *Computer Organization and Design, 4<sup>th</sup> Edition,* Patterson & Hennessy, © 2008, MK]

11/16/10 5DV008 20101611 t:3 sl:1

Hegner UU

## Key to the Slides

- The source of each slide is coded in the footer on the right side:
  - Irwin CSE331 = slide by Mary Jane Irwin from the course CSE331 (Computer Organization and Design) at Pennsylvania State University.
  - Irwin CSE431 = slide by Mary Jane Irwin from the course CSE431 (Computer Architecture) at Pennsylvania State University.
  - Hegner UU = slide by Stephen J. Hegner at Umeå University.

11/16/10 5DV008 20101611 t:3 sl:2 Hegner UU


## Review: MIPS (RISC) Design Principles

#### Simplicity favors regularity

- fixed size instructions
- small number of instruction formats
- opcode always the first 6 bits

#### Smaller is faster

- limited instruction set
- limited number of registers in register file
- limited number of addressing modes

#### Make the common case fast

- arithmetic operands from the register file (load-store machine)
- allow instructions to contain immediate operands

#### Good design demands good compromises

- three instruction formats

5DV008 20101611 t:3 sl:3

Irwin CSE431 PSU











## Dealing with Overflow

- Overflow occurs when the result of an operation cannot be represented in 32 bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit
  - When adding operands with different signs or when subtracting operands with the same sign, overflow can *never* occur

| Operation | Operand A | Operand B | Result indicating<br>overflow |
|-----------|-----------|-----------|-------------------------------|
| A + B     | ≥ 0       | ≥ 0       | < 0                           |
| A + B     | < 0       | < 0       | ≥ 0                           |
| A - B     | ≥ 0       | < 0       | < 0                           |
| A - B     | < 0       | ≥ 0       | ≥ 0                           |
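The sign rules in the table can be checked with a small simulation of 32-bit two's complement arithmetic. This is only a sketch: the helper names `to_signed32`, `add_overflows`, and `sub_overflows` are hypothetical and not part of the slides, and Python integers stand in for MIPS registers.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def add_overflows(a, b):
    """Return (wrapped 32-bit sum, overflow flag) for signed a + b."""
    result = to_signed32(a + b)
    # Overflow needs operands of the same sign and a result of the other sign.
    overflow = ((a >= 0) == (b >= 0)) and ((result >= 0) != (a >= 0))
    return result, overflow


def sub_overflows(a, b):
    """Return (wrapped 32-bit difference, overflow flag) for signed a - b."""
    result = to_signed32(a - b)
    # Overflow needs operands of different signs and a result whose sign differs from a.
    overflow = ((a >= 0) != (b >= 0)) and ((result >= 0) != (a >= 0))
    return result, overflow


if __name__ == "__main__":
    print(add_overflows(2**31 - 1, 1))   # largest positive + 1 wraps negative
    print(sub_overflows(-2**31, 1))      # most negative - 1 wraps positive
    print(add_overflows(5, -3))          # different signs: never overflows
```

Each overflow case corresponds to one row of the table; mixed-sign addition and same-sign subtraction always report no overflow.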

- MIPS signals overflow with an exception (aka interrupt): an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception





## Multiply

- Multiply (`mult` and `multu`) produces a double precision product

  `mult $s0, $s1   # hi || lo = $s0 * $s1`

- Low-order word of the product is left in processor register `lo` and the high-order word is left in register `hi`
- Instructions `mfhi rd` and `mflo rd` are provided to move the product to (user accessible) registers in the register file
- Multiplies are usually done by fast, dedicated hardware and are much more complex (and slower) than adders

5DV008 20101611 t:3 sl:13

Irwin CSE431 PSU
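The way a 64-bit product is split across `hi` and `lo` can be illustrated with a few lines of Python. This is a sketch under assumptions: the functions `mult` and `multu` below merely model the register contents (they are named after the instructions for readability) and say nothing about how the hardware computes the product.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mult(s0, s1):
    """Signed 32x32 multiply; returns (hi, lo) as unsigned 32-bit words."""
    product = (s0 * s1) & MASK64          # 64-bit two's complement product
    return (product >> 32) & MASK32, product & MASK32


def multu(s0, s1):
    """Unsigned 32x32 multiply; returns (hi, lo) as unsigned 32-bit words."""
    product = (s0 & MASK32) * (s1 & MASK32)
    return (product >> 32) & MASK32, product & MASK32


if __name__ == "__main__":
    hi, lo = mult(0x10000, 0x10000)       # 2^16 * 2^16 = 2^32
    print(hex(hi), hex(lo))
    hi, lo = mult(-1, 1)                  # negative product sign-extends into hi
    print(hex(hi), hex(lo))
```

A product that fits in 32 bits leaves `hi` equal to the sign extension of `lo`; anything else needs both words.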







- Division is just a *bunch* of quotient digit guesses, left shifts, and subtracts

dividend = quotient × divisor + remainder
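One way to make the "guess, shift, subtract" description concrete is restoring division on unsigned integers. The sketch below is an assumption about how the idea might be coded, not the hardware design from the slides; `restoring_divide` and its `bits` parameter are hypothetical names.

```python
def restoring_divide(dividend, divisor, bits=32):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient, remainder = 0, 0
    for i in range(bits - 1, -1, -1):
        # Shift the partial remainder left and bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Guess that the quotient bit is 1 by trying the subtraction.
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial                 # guess was right
            quotient = (quotient << 1) | 1
        else:
            quotient <<= 1                    # guess was wrong: keep (restore) remainder
    return quotient, remainder


if __name__ == "__main__":
    q, r = restoring_divide(100, 7)
    print(q, r, q * 7 + r)   # the last value reproduces the dividend
```

The final print checks the relation dividend = quotient × divisor + remainder.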

















## Exception Events in Floating Point

- Overflow (floating point) happens when a positive exponent becomes too large to fit in the exponent field
- Underflow (floating point) happens when a negative exponent becomes too large to fit in the exponent field

[Figure: number line from -∞ to +∞ through 0, with the limits of the representable range labeled using largestE and largestF]

- One way to reduce the chance of underflow or overflow is to offer another format that has a larger exponent field
  - Double precision takes two MIPS words

| Word   | Field                  | Width   |
|--------|------------------------|---------|
| first  | s                      | 1 bit   |
| first  | E (exponent)           | 11 bits |
| first  | F (fraction)           | 20 bits |
| second | F (fraction continued) | 32 bits |

Irwin CSE431 PSU
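The two-word double precision layout and the effect of the larger exponent field can be explored with Python's standard `struct` module. This is a sketch only: `double_fields` and `to_single` are hypothetical helper names, and Python floats (which are IEEE 754 doubles on common platforms) are assumed.

```python
import struct


def double_fields(x):
    """Split a double into (sign, biased exponent, fraction) via its two 32-bit words."""
    first, second = struct.unpack(">II", struct.pack(">d", x))
    sign = first >> 31                               # 1 bit
    exponent = (first >> 20) & 0x7FF                 # 11 bits
    fraction = ((first & 0xFFFFF) << 32) | second    # 20 bits + 32 bits
    return sign, exponent, fraction


def to_single(x):
    """Round a double to single precision, mapping overflow to +/- infinity."""
    try:
        return struct.unpack(">f", struct.pack(">f", x))[0]
    except OverflowError:
        # struct refuses to pack a finite value that is too large for single precision.
        return float("inf") if x > 0 else float("-inf")


if __name__ == "__main__":
    print(double_fields(1.0))
    print(to_single(1e39))    # too large for the 8-bit exponent field
    print(to_single(1e-50))   # too small for single precision
```

Both 1e39 and 1e-50 are ordinary values in double precision, which is the point of offering the wider exponent field.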

### **IEEE 754 FP Standard**

- Most (all?) computers these days conform to the IEEE 754 floating point standard

  (-1)<sup>sign</sup> x (1+F) x 2<sup>E-bias</sup>

  - Formats for both single and double precision
  - F is stored in normalized format, where the msb of the significand is 1; this bit, called the hidden bit, does not need to be stored
  - To simplify sorting FP numbers, E comes before F in the word and E is represented in excess (biased) notation where the bias is 127 (1023 for double precision), so the most negative exponent is 00000001 = 2<sup>1-127</sup> = 2<sup>-126</sup> and the most positive is 11111110 = 2<sup>254-127</sup> = 2<sup>+127</sup>
- Examples (in normalized format)
  - Zero:
  - 1.0<sub>2</sub> x 2<sup>-1</sup> =
  - 0.75<sub>10</sub> x 2<sup>4</sup> =

5DV008 20101611 t:3 sl:20

5DV008 20101611 t:3 sl:21

Irwin CSE431 PSU
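The single precision bit patterns for the example values can be inspected with Python's standard `struct` module. The sketch below is illustrative only; `encode_single` and `decode_normalized` are hypothetical helper names, and the decoder applies the slide's formula with a bias of 127 to normalized numbers only.

```python
import struct


def encode_single(x):
    """Return the single precision encoding of x as 'sign exponent fraction' bit strings."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return f"{sign:01b} {exponent:08b} {fraction:023b}"


def decode_normalized(sign, exponent, fraction):
    """Apply (-1)^sign x (1+F) x 2^(E-127) to a normalized single precision number."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


if __name__ == "__main__":
    print(encode_single(0.0))            # zero
    print(encode_single(1.0 * 2**-1))    # 1.0 (binary) x 2^-1
    print(encode_single(0.75 * 2**4))    # 0.75 (decimal) x 2^4
```

Note that 0.75 x 2<sup>4</sup> is first renormalized to 1.1<sub>2</sub> x 2<sup>3</sup>, so its stored exponent is 3 + 127 = 130.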



#### IEEE 754 FP Standard Encoding

- Special encodings are used to represent unusual events
  - ± infinity for division by zero
  - NaN (not a number) for the results of invalid operations such as 0/0
  - True zero is the bit string all zero

| Single precision E (8 bits) | Single precision F (23 bits) | Double precision E (11 bits) | Double precision F (52 bits) | Object represented |
|-----------------------------|------------------------------|------------------------------|------------------------------|--------------------|
| 0000 0000 | 0 | 0000 … 0000 | 0 | true zero (0) |
| 0000 0000 | nonzero | 0000 … 0000 | nonzero | ± denormalized number |
| 0000 0001 to 1111 1110 (exponents -126 to +127) | anything | 0000 … 0001 to 1111 … 1110 (exponents -1022 to +1023) | anything | ± floating point number |
| 1111 1111 | 0 | 1111 … 1111 | 0 | ± infinity |
| 1111 1111 | nonzero | 1111 … 1111 | nonzero | not a number (NaN) |

5DV008 20101611 t:3 sl:23

Irwin CSE431 PSU
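The encoding table translates directly into a classifier over the E and F fields of a single precision word. This is a sketch; `classify_single` and `single_bits` are hypothetical names, and the standard `struct` module is used only to obtain bit patterns for testing.

```python
import struct


def classify_single(bits):
    """Classify a 32-bit single precision pattern using its E and F fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"


def single_bits(x):
    """Return the 32-bit single precision pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


if __name__ == "__main__":
    for value in (0.0, 1.0, float("inf"), float("nan")):
        print(value, classify_single(single_bits(value)))
    print(classify_single(0x00000001))   # smallest positive denormalized number
```

The sign bit is ignored on purpose: every class in the table exists in a positive and a negative variant.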


### **Support for Accurate Arithmetic**

- IEEE 754 FP rounding modes
  - Always round up (toward +∞)
  - Always round down (toward -∞)
  - Truncate
  - Round to nearest even (when the Guard || Round || Sticky are 100): always creates a 0 in the least significant (kept) bit of F
- Rounding (except for truncation) requires the hardware to include extra F bits during calculations
  - Guard bit: used to provide one F bit when shifting left to normalize a result (e.g., when normalizing F after division or subtraction)
  - Round bit: used to improve rounding accuracy
  - Sticky bit: used to support Round to nearest even; is set to a 1 whenever a 1 bit shifts (right) through it (e.g., when aligning F during addition/subtraction)

$F = 1.xxxxxxxxxxxxxxxxxxxx\ G\ R\ S$

5DV008 20101611 t:3 sl:24

Irwin CSE431 PSU
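How the guard, round, and sticky bits drive round to nearest even can be sketched on integers whose three lowest bits play the roles of G, R, and S. The function names `shift_right_sticky` and `round_nearest_even` are hypothetical, and the model ignores the renormalization that may follow a rounding carry.

```python
def shift_right_sticky(value, n):
    """Right shift by n, ORing every 1 that is shifted out into the lowest (sticky) bit."""
    if n == 0:
        return value
    lost = value & ((1 << n) - 1)
    return (value >> n) | (1 if lost else 0)


def round_nearest_even(sig_with_grs):
    """Round an integer significand whose 3 lowest bits are G, R, S."""
    kept = sig_with_grs >> 3
    grs = sig_with_grs & 0b111
    # Above the halfway point: round up. Exactly halfway (100): round to the even neighbor.
    if grs > 0b100 or (grs == 0b100 and kept & 1):
        kept += 1
    return kept


if __name__ == "__main__":
    print(bin(round_nearest_even(0b1011_100)))   # tie, kept part odd: rounds up
    print(bin(round_nearest_even(0b1010_100)))   # tie, kept part even: stays
    print(bin(shift_right_sticky(0b10001, 3)))   # a 1 was shifted out: sticky set
```

In the tie case the kept significand always ends in 0, which matches the slide's description of the 100 pattern.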

## **Floating Point Addition**

Addition (and subtraction)

 $(\pm F1 \times 2^{E1}) + (\pm F2 \times 2^{E2}) = \pm F3 \times 2^{E3}$ 

- Step 0: Restore the hidden bit in F1 and in F2
- Step 1: Align fractions by right shifting F2 by E1 - E2 positions (assuming E1  $\geq$  E2), keeping track of (three of) the bits shifted out in G, R, and S
- Step 2: Add the resulting F2 to F1 to form F3
- Step 3: Normalize F3 (so it is in the form 1.XXXXX ...)
  - If F1 and F2 have the same sign  $\to$  F3  $\in$  [1,4)  $\to$  1 bit right shift F3 and increment E3 (check for overflow)
  - If F1 and F2 have different signs → F3 may require *many* left shifts each time decrementing E3 (check for underflow)
- Step 4: Round F3 and possibly normalize F3 again
- Step 5: Rehide the most significant bit of F3 before storing the result
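Steps 1 to 4 can be traced with a small integer model. This is a sketch under assumptions, not the hardware datapath: `fp_add` and `fp_value` are hypothetical names, operands are tuples `(sign_bit, significand, exponent)` with the hidden bit already restored (Step 0), exponents are unbiased, only 4 fraction bits are kept by default, and exponent overflow and underflow are not checked.

```python
def fp_add(a, b, frac_bits=4):
    """Add two (sign_bit, significand, exponent) values; significand is 1.xxxx as an integer."""
    (s1, f1, e1), (s2, f2, e2) = a, b
    if e1 < e2:                                   # make operand 1 the larger exponent
        (s1, f1, e1), (s2, f2, e2) = (s2, f2, e2), (s1, f1, e1)
    f1 <<= 3                                      # append guard, round, sticky bits
    f2 <<= 3
    shift = e1 - e2
    lost = f2 & ((1 << shift) - 1)                # Step 1: align, keeping a sticky bit
    f2 = (f2 >> shift) | (1 if lost else 0)
    total = (-f1 if s1 else f1) + (-f2 if s2 else f2)   # Step 2: signed add
    if total == 0:
        return (0, 0, 0)
    s3, f3, e3 = (1 if total < 0 else 0), abs(total), e1
    one = 1 << (frac_bits + 3)                    # integer value of 1.0 with G R S appended
    while f3 >= 2 * one:                          # Step 3: same signs, at most one right shift
        f3 = (f3 >> 1) | (f3 & 1)
        e3 += 1
    while f3 < one:                               # Step 3: different signs, maybe many left shifts
        f3 <<= 1
        e3 -= 1
    grs, f3 = f3 & 0b111, f3 >> 3                 # Step 4: round to nearest even
    if grs > 0b100 or (grs == 0b100 and f3 & 1):
        f3 += 1
        if f3 == 2 << frac_bits:                  # rounding carried out: normalize again
            f3 >>= 1
            e3 += 1
    return (s3, f3, e3)


def fp_value(x, frac_bits=4):
    """Convert a (sign_bit, significand, exponent) tuple to a Python float."""
    sign, significand, exponent = x
    return (-1) ** sign * significand / 2**frac_bits * 2.0**exponent


if __name__ == "__main__":
    result = fp_add((0, 0b10000, -1), (1, 0b11100, -2))   # 0.5 + (-0.4375)
    print(result, fp_value(result))
```

Step 5 (rehiding the leading 1) would simply drop the top bit of the returned significand before storing.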
Irwin CSE431 PSU



## Floating Point Addition Example

- Add

  $(0.5 = 1.0000 \times 2^{-1}) + (-0.4375 = -1.1100 \times 2^{-2})$

- Step 0: Hidden bits restored in the representation above
- Step 1: Shift significand with the smaller exponent (1.1100) right until its exponent matches the larger exponent (so once)
- Step 2: Add significands

  $1.0000 + (-0.111) = 1.0000 - 0.111 = 0.001$

- Step 3: Normalize the sum, checking for exponent over/underflow

  $0.001 \times 2^{-1} = 0.010 \times 2^{-2} = \dots = 1.000 \times 2^{-4}$

- Step 4: The sum is already rounded, so we're done
- Step 5: Rehide the hidden bit before storing

5DV008 20101611 t:3 sl:27

## Floating Point Multiplication

#### Multiplication

 $(\pm F1 \times 2^{E1}) \times (\pm F2 \times 2^{E2}) = \pm F3 \times 2^{E3}$ 

- Step 0: Restore the hidden bit in F1 and in F2
- Step 1: Add the two (biased) exponents and subtract the bias from the sum, so E1 + E2 - 127 = E3; also determine the sign of the product (which depends on the signs of the operands (most significant bits))
- Step 2: Multiply F1 by F2 to form a double precision F3
- Step 3: Normalize F3 (so it is in the form 1.XXXXX ...)
  - Since F1 and F2 come in normalized  $\rightarrow$  F3  $\in$  [1,4)  $\rightarrow$  1 bit right shift F3 and increment E3
  - Check for overflow/underflow
- Step 4: Round F3 and possibly normalize F3 again
- Step 5: Rehide the most significant bit of F3 before storing the result

5DV008 20101611 t:3 sl:28

Irwin CSE431 PSU
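The multiplication steps can be traced with a similar integer model. Again this is a sketch under assumptions: `fp_mul` is a hypothetical name, operands are `(sign_bit, significand, biased_exponent)` tuples with the hidden bit restored, 4 fraction bits are kept by default, and the exponent range check uses the single precision limits 1 to 254.

```python
def fp_mul(a, b, frac_bits=4, bias=127):
    """Multiply two (sign_bit, significand, biased_exponent) values."""
    (s1, f1, e1), (s2, f2, e2) = a, b
    s3 = s1 ^ s2                          # Step 1: sign of the product
    e3 = e1 + e2 - bias                   # Step 1: add biased exponents, subtract one bias
    f3 = f1 * f2                          # Step 2: product has 2 * frac_bits fraction bits
    drop = frac_bits                      # low-order bits to round away
    if f3 >= 2 << (2 * frac_bits):        # Step 3: product in [2, 4) needs one right shift
        drop += 1
        e3 += 1
    kept = f3 >> drop                     # Step 4: round to nearest even
    rem = f3 & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
        if kept == 2 << frac_bits:        # rounding carried out: normalize again
            kept >>= 1
            e3 += 1
    if e3 >= 2 * bias + 1:                # biased exponent too large for the field
        raise OverflowError("exponent overflow")
    if e3 <= 0:                           # biased exponent too small for a normalized number
        raise ArithmeticError("exponent underflow")
    return (s3, kept, e3)


if __name__ == "__main__":
    # 0.5 = 1.0000 x 2^-1 (biased 126) times -0.4375 = -1.1100 x 2^-2 (biased 125)
    print(fp_mul((0, 0b10000, 126), (1, 0b11100, 125)))
```

The returned biased exponent 124 corresponds to 2<sup>-3</sup>, so the result is -1.1100 x 2<sup>-3</sup> = -0.21875.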



## Floating Point Multiplication Example

- Multiply

  $(0.5 = 1.0000 \times 2^{-1}) \times (-0.4375 = -1.1100 \times 2^{-2})$

- Step 0: Hidden bits restored in the representation above
- Step 1: Add the exponents (not in bias would be $-1 + (-2) = -3$ and in bias would be $(-1+127) + (-2+127) - 127 = (-1-2) + (127+127-127) = -3 + 127 = 124$); the operand signs differ, so the product is negative
- Step 2: Multiply the significands

  $1.0000 \times 1.110 = 1.110000$

- Step 3: Normalize the product, checking for exp over/underflow; $-1.110000 \times 2^{-3}$ is already normalized
- Step 4: The product is already rounded, so we're done
- Step 5: Rehide the hidden bit before storing

5DV008 20101611 t:3 sl:30

Irwin CSE431 PSU

## MIPS Floating Point Instructions

- MIPS has a separate Floating Point Register File (\$f0, \$f1, …, \$f31) (whose registers are used in *pairs* for double precision values) with special instructions to load to and store from them

  `lwc1 $f1,54($s2)   # $f1 = Memory[$s2+54]`

  `swc1 $f1,58($s4)   # Memory[$s4+58] = $f1`

- And supports IEEE 754 single

  `add.s $f2,$f4,$f6   # $f2 = $f4 + $f6`

  and double precision operations

  `add.d $f2,$f4,$f6   # $f2||$f3 = $f4||$f5 + $f6||$f7`

  similarly for sub.s, sub.d, mul.s, mul.d, div.s, div.d

5DV008 20101611 t:3 sl:31

Irwin CSE431 PSU


## MIPS Floating Point Instructions, Con't

- And floating point single precision comparison operations

  `c.x.s $f2,$f4   # if($f2 < $f4) cond=1; else cond=0`

  where x may be eq, neq, lt, le, gt, ge (the comment shows the lt case)
- and double precision comparison operations

  `c.x.d $f2,$f4   # if($f2||$f3 < $f4||$f5) cond=1; else cond=0`

- And floating point branch operations

  `bc1t 25   # if(cond==1) go to PC+4+100`

  `bc1f 25   # if(cond==0) go to PC+4+100`

  The branch offset counts words, so 25 corresponds to 100 bytes.

5DV008 20101611 t:3 sl:32

Irwin CSE431 PSU
## Frequency of Common MIPS Instructions

- Only included those with >3% and >1%

| Instruction | SPECint | SPECfp | Instruction | SPECint | SPECfp |
|-------------|---------|--------|-------------|---------|--------|
| addu        | 5.2%    | 3.5%   | add.d       | 0.0%    | 10.6%  |
| addiu       | 9.0%    | 7.2%   | sub.d       | 0.0%    | 4.9%   |
| or          | 4.0%    | 1.2%   | mul.d       | 0.0%    | 15.0%  |
| sll         | 4.4%    | 1.9%   | add.s       | 0.0%    | 1.5%   |
| lui         | 3.3%    | 0.5%   | sub.s       | 0.0%    | 1.8%   |
| lw          | 18.6%   | 5.8%   | mul.s       | 0.0%    | 2.4%   |
| sw          | 7.6%    | 2.0%   | l.d         | 0.0%    | 17.5%  |
| lbu         | 3.7%    | 0.1%   | s.d         | 0.0%    | 4.9%   |
| beq         | 8.6%    | 2.2%   | l.s         | 0.0%    | 4.2%   |
| bne         | 8.4%    | 1.4%   | s.s         | 0.0%    | 1.1%   |
| slt         | 9.9%    | 2.3%   | lhu         | 1.3%    | 0.0%   |
| slti        | 3.1%    | 0.3%   |             |         |        |
| sltu        | 3.4%    | 0.8%   |             |         |        |

5DV008 20101611 t:3 sl:33

Irwin CSE431 PSU

