## Umeå University Department of Computing Science 5DV008 — Computer Architecture Examination: April 09, 2010

| Name (printed):     |
|---------------------|
| Swedish ID number:  |
| Computer User-ID:   |
| Signature:          |
| Secret code number: |

## Instructions: / Instruktioner:

This examination will be graded anonymously. This page will be removed before the instructor receives the examination for grading. The secret code number given above must therefore be written on every answer page which you turn in to the examination proctor.

Denna skrivning rättas kodad. Detta blad kommer att avskiljas innan läraren får skrivningen för rättning. Ovanstående kod måste därför finnas på samtliga svarsblad när du lämnar skrivningen till skrivvakten.

## To the proctor of the examination: / Till skrivningsbevakaren:

Detach this cover sheet from the examination and put it in the envelope which is addressed to Yvonne Löwstedt, Department of Computing Science.

Avskilj detta försättsblad och stoppa i kuvert som skickas till Yvonne Löwstedt, Datavetenskap.

## Umeå University Department of Computing Science 5DV008 — Computer Architecture Examination: April 09, 2010

Secret code number:

- 1. Answers may be written in English or Swedish. However, all technical terms which do not have an absolutely standard representation in Swedish must be given in English.
- 2. A pocket engineering calculator (miniräknare) and English/X X/English dictionary may be used. No other help materials are allowed.
- 3. Answers must be written on the official university answer sheets which are provided the sheets in numerical order of the problems, and write on only one side of the paper. Write only the question number and your secret code number on these pages; do not write your name or ID.
- 4. Show your work. For questions which require that an answer be computed, answers without derivations will not receive credit. It is furthermore a good idea to show the symbolic formula used to obtain the answer, since that will result in more credit in the case that an error is made.
- 5. The examination has a total of 1000 points.
- 6. For problems with multiple parts, you have the choice, for each part, to do the problem, or to skip it for partial credit. In the table below, place an X in the position for any problem for which you have attempted a solution, and which you wish to have graded. It is extremely important that you fill in this table properly, because of the following option. For any box which is left blank, the associated question will not be graded, and you will instead be awarded 15% of the points for that question. Your decision to leave a box blank is definitive, so be very careful. For example, If you leave box 8(b) blank, your answer to that question will not be graded, even if it is completely correct. On the other hand, if you place an X in box 8(b), but provide no answer whatsoever to that question, you will not receive 15% of the points for that question. It is strongly recommended that you use a pencil, in case you change your mind!

| Prob | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|------|---|---|---|---|---|---|---|---|
| (a)  |   |   |   |   |   |   |   |   |
| (b)  |   |   |   |   |   |   |   |   |
| (c)  |   |   |   |   |   |   |   |   |

(1: 100 points total) MIPS instructions of the R-format (*register* format) have six fields. Identify each of these six fields by name, and indicate the size and purpose of each. It is not necessary to specify such details as the position of each field within the instruction word.

(2: 125 points total) Answer the following questions about the representation of floating-point numbers using IEEE 754 format.

- (a: 50 points) Identify the three fields which comprise this format and explain briefly the purpose of each.
- (b: 50 points) The significand is represented in *normalized* format. Explain what this means, and identify the advantage of employing this format.
- (c: 25 points) In floating-point arithmetic, explain what is meant by *underflow*.

(3: 125 points total) An implementation of the MIPS architecture following the model developed in the textbook has the following delays for each of the five main stages:

| Component                                      | Delay   |
|------------------------------------------------|---------|
| Instruction fetch (IF)                         | 100 ps. |
| Instruction decode and register file read (ID) | 90 ps.  |
| Execution or address calculation (EX)          | 210 ps. |
| Data memory access (MEM)                       | 115 ps. |
| Write back (WB)                                | 80 ps.  |

- (a: 60 points) Assuming a non-pipelined implementation, and assuming all delays not explicitly identified above to be negligible, compute the latency for each of the six instructions add, addi, bne, jr, lw, and sw.
- (b: 40 points) Repeat part (a), this time assuming a pipelined implementation with the five stages identified above.
- (c: 25 points) Suppose that the instruction mix of a program is as follows: add: 10%; addi: 10%; bne: 30%; jr; 10%; lw: 15%; sw: 25%. For the pipelined implementation, compute the percentage of total cycles which utilize data memory. Assume that there are no stalls or hazards.

(4: 150 points total) In a pipelined implementation, it is often the case that instruction A writes a register which a second instruction B later reads, thus creating a *read after write* dependency. In answering the questions below, assume that the read is from a register and the write is to a register; instructions which involve memory access need not be considered. Use the five-stage pipeline architecture which is developed in the course text and slides.

- (a: 50 points) Suppose that the write of instruction A and the read of instruction B occur during the same clock cycle. Explain the implementation mechanism for dealing with such a hazard.
- (b: 50 points) Repeat (a), this time assuming that the read of instruction B occurs one clock cycle later than the write of instruction A.
- (b: 50 points) Repeat (a), this time assuming that the read of instruction B occurs two clock cycles later than the write of instruction A.

(5: 125 points) A program contains two distinct branch instructions. For the first instruction, the pattern YYNYY repeats endlessly, where Y indicates that the branch was taken and N that it was not. For the second program, the pattern YYNNN repeats endlessly. For each of these two patterns, answer the following questions.

- (a: 40 points) Compute the success rate for the always-taken and never-taken branch predictors.
- (b: 40 points) Suppose that a one-bit predictor is employed, with the initial state 0 (previous branch not taken). Compute the accuracy of the prediction for the first k iterations of the loop, with k large enough to identify the repeating pattern.
- (c: 45 points) Now suppose that a two-bit predictor is employed, with the initial state 00 (previous two branches not taken). Compute the accuracy of the prediction for the first k iterations of the loop, with k large enough to identify the repeating pattern.

(6: 125 points total) A direct-mapped cache is to hold 64K bytes of data. Assuming that a one-bit valid field is required, answer the following questions for four-word data blocks. (A word is 32 bits.)

- (a: 40 points) Determine the size of the tag field.
- (b: 40 points) Compute the total number of bytes required for the cache.
- (c: 45 points) Compute the size of the index needed for the cache.

(7: 125 points) A memory system operates on a bus which is one-word (32-bits) wide. It is governed by the following parameters:

| Time to send the address to memory  | 1 clock cycle   |
|-------------------------------------|-----------------|
| Row cycle time                      | 20 clock cycles |
| Column access time                  | 7 clock cycles  |
| Time to return one word from memory | 1 clock cycle   |

- (a: 45 points) Compute the number of clock cycles necessary to fetch one word from memory.
- (b: 40 points) Compute the number of clock cycles necessary to fetch eight words from memory, assuming that the memory is not interleaved, and that there are two blocks of four words, with the words of each block in the same row, but the two blocks in different rows.
- (c: 40 points) Compute the number of clock cycles necessary to fetch eight words from memory, assuming that the memory is interleaved, and that each word of a four-block word is in a different bank of the memory. Assume further that the two blocks are in different rows.
- (8: 125 points total) A hard drive for a laptop computer has the parameters given in the table below.

| Parameter           | Value          |
|---------------------|----------------|
| Rotational speed    | 4200 rpm       |
| Average seek time   | 15 ms.         |
| Controller overhead | 1 ms.          |
| Transfer rate       | 25M bytes/sec. |

- (a: 75 points) Compute the average amount of time required to read or write 100K Bytes. Assume that the entire transfer requires only one seek and latency penalty.
- (b: 50 points) To increase performance, two alternatives are possible. The first alternative has a rotational speed of 5400 rpm, with all other parameters the same as in the table above. The second alternative has a transfer rate of 50M bytes/sec., with all other parameters the same as in the table above. Determine which one of these alternatives will give the greater improvement for the average amount of time required to read or write 100K Bytes over that which was computed in part (a).