55:132/22C:160 Spring 2010

High-Performance Computer Architecture

Sixth Homework Assignment

(Second Verilog Project)

REVISED—04/15/10

Due Date: Tuesday, April 27

Objective

To gain an understanding of CPU pipeline design issues, specifically data and control hazards, and the mitigation of these hazards through data forwarding and dynamic branch prediction.

Project Teams

Students will work in teams of two on this assignment. Each team should submit a single solution and report. Students may pick their own partners, if desired. The instructor will facilitate assignment of partners for any students who are unable to find a partner on their own. Any student desiring to be matched with a partner must send an e-mail request to the instructor at kuhl@engineering.uiowa.edu no later than 8:00 a.m. on Thursday, April 15. If, for some reason, you prefer to work alone on the project, you may do so. However, you must notify the instructor of this preference by the April 15 deadline. Otherwise you will be assigned to a team. Teams larger than two are possible, but additional project scope must be negotiated in advance with the instructor.

Specification

You will be given a Verilog specification of the simple five-stage pipelined CPU shown in this figure. The Verilog model actually has its instruction fetch stage slightly optimized, as shown here, to permit it to operate with a two-cycle branch delay (with branch target address generation and branch condition evaluation in the EX stage). The CPU executes the instruction set shown here. The pipeline implements delayed branching. Hence, two non-branch-dependent instructions (or nops) must be placed in the shadow of each conditional branch instruction. The Verilog model also includes a hazard detection module (dethazard.v). This hazard detection module identifies data hazards and introduces pipeline stalls as needed. The supplied Verilog model does not implement any forwarding paths for data hazards.

You are to extend this design in the following manner:

Implement a simple dynamic branch prediction scheme as follows:

Implement a Branch Target Buffer (BTB) in the fetch stage of the pipeline. This BTB should have 32 entries and should be direct-mapped, using bits 6:2 of the fetch address.
Each BTB entry should contain the Branch Instruction Address, the Branch Target Address, and a two-bit local branch history, based on a Smith (saturating counter) predictor. Following a BTB hit, the fetch stage should dynamically predict the branch outcome from the history bits and, if the prediction is “branch-taken”, should redirect the next instruction fetch to the branch target address specified in the BTB. If the branch outcome is later determined to be not-taken, the instructions fetched from the predicted target address must be cancelled. Note that for our simple instruction set, branch target addresses will be constant, since the only addressing mode is PC+Immed.
For branch instructions that hit the BTB, but are predicted as “not-taken”, and BTB misses, the next instruction fetch should continue with the instruction sequentially following the branch. If the actual branch outcome is later determined to be “taken”, the instructions fetched along the “not-taken” path must be cancelled.
For all branch instructions, BTB entries should be updated after the branch outcome and branch target address are computed in the EX (ALU) stage.
With the new branch prediction strategy described above, delayed branching semantics are no longer used so there are no “branch shadow” instructions.
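
The BTB behavior described above can be sketched as a small behavioral reference model. This is a Python model for checking your understanding against simulation traces, not Verilog RTL and not part of the supplied code. The names `BTB`, `BTBEntry`, `predict`, and `update` are illustrative. The valid bit, the counter encoding, the counter state chosen when an entry is first allocated, and the 4-byte sequential fetch increment are all assumptions that the specification leaves open.

```python
from dataclasses import dataclass

NUM_ENTRIES = 32  # direct-mapped, per the specification
# Assumed encoding of the two-bit saturating counter states.
STRONG_NT, WEAK_NT, WEAK_T, STRONG_T = 0, 1, 2, 3


@dataclass
class BTBEntry:
    valid: bool = False      # assumption: entries start out invalid
    branch_addr: int = 0     # Branch Instruction Address, used as the tag
    target_addr: int = 0     # Branch Target Address (constant for PC+Immed)
    counter: int = WEAK_NT   # two-bit local history (Smith counter)


class BTB:
    def __init__(self):
        self.entries = [BTBEntry() for _ in range(NUM_ENTRIES)]

    @staticmethod
    def index(addr):
        """Bits 6:2 of the address select one of the 32 entries."""
        return (addr >> 2) & 0x1F

    def predict(self, fetch_addr):
        """Fetch stage: return (hit, predicted_taken, next_fetch_addr)."""
        e = self.entries[self.index(fetch_addr)]
        hit = e.valid and e.branch_addr == fetch_addr
        taken = hit and e.counter >= WEAK_T
        # Assumption: instructions are 4 bytes, so sequential fetch is +4.
        next_addr = e.target_addr if taken else fetch_addr + 4
        return hit, taken, next_addr

    def update(self, branch_addr, target_addr, taken):
        """EX stage: record the resolved outcome and target address."""
        e = self.entries[self.index(branch_addr)]
        if not (e.valid and e.branch_addr == branch_addr):
            # Miss or index conflict: (re)allocate the entry.
            # The initial counter state here is an assumption.
            e.valid = True
            e.branch_addr = branch_addr
            e.counter = WEAK_T if taken else WEAK_NT
        elif taken:
            e.counter = min(e.counter + 1, STRONG_T)  # saturate at 3
        else:
            e.counter = max(e.counter - 1, STRONG_NT)  # saturate at 0
        e.target_addr = target_addr
```

For example, a branch at address 0x44 maps to entry 17. Its first fetch misses the BTB and falls through to 0x48; after `update(0x44, 0x20, True)`, the next `predict(0x44)` hits and redirects fetch to 0x20. A branch at 0xC4 maps to the same entry but fails the address comparison, so it is treated as a miss.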

Run the provided matrix multiplication code on the original pipeline and again after adding the dynamic branch prediction mechanism described above. Note the difference in performance due to the use of dynamic branch prediction. Make sure that you compute the BTB hit rate and the successful prediction rate.
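
One possible way to compute these figures from a log of executed branches is sketched below. The event format and the names `branch_stats` and `speedup` are hypothetical; how you collect the events (for example, from `$display` output in your testbench) is up to you. The sketch assumes the successful prediction rate is taken over all executed branches, with a BTB miss counted as an implicit "not-taken" prediction. If you define it over BTB hits only, say so in your report.

```python
def branch_stats(events):
    """Summarize branch behavior.

    events: iterable of (btb_hit, predicted_taken, actually_taken) tuples,
    one per executed branch instruction (hypothetical log format).
    """
    total = hits = correct = 0
    for btb_hit, predicted_taken, actually_taken in events:
        total += 1
        hits += bool(btb_hit)
        correct += bool(predicted_taken) == bool(actually_taken)
    if total == 0:
        return {"branches": 0, "btb_hit_rate": 0.0, "prediction_rate": 0.0}
    return {
        "branches": total,
        "btb_hit_rate": hits / total,
        "prediction_rate": correct / total,
    }


def speedup(cycles_original, cycles_predicted):
    """Ratio of total cycle counts: original pipeline over modified pipeline."""
    return cycles_original / cycles_predicted
```

A value of `speedup(...)` greater than 1 means the pipeline with dynamic branch prediction finished the program in fewer cycles than the original delayed-branch pipeline.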

To Run the Verilog Simulation

To run a simulation you need to initialize the memory for both instructions (Imem) and data (Dmem). The data for these memories are kept in text files and read in at startup. Edit the file "sdlx.v" to read the correct input files. In the supplied version this is near line 181. An example of a Dmem file (matrix.dat) is in the directory along with the source code for the matrix multiply program (matmult.s). Note that matmult6x6.s has NOP instructions in the shadows of its branch instructions. A modified version of the matrix multiply program, called matmultNoDelay.s, is also provided.
You can use the command "vlog -f pipeline.vc" to compile all of the source files at one time.

What to turn in:

Block diagrams showing the design. (You may hand these in separately on paper.)
All Verilog files, generously commented.
Transcript files showing the performance of the original design (with delayed branching) versus your modified design (with dynamic branch prediction) on the matrix multiply program. (Be sure to use matmult6x6.s in the former case and matmultNoDelay.s in the latter case.)
A discussion of simulation results. Be sure to discuss all issues relevant to your design. In particular, you should compare the observed performance of the pipeline in its original form and after the addition of dynamic branch prediction.

Your submission should be in the form of a tarred directory, where the name of the directory is the concatenation of your login names (hawkIDs) in alphabetical order. For example, if your hawkID is "smith" and your partner's is "jones", the name of your submission directory should be "jonessmith". Project documentation should be in a subdirectory called "Documentation". Instructions for packaging your submission into a tar file can be found here.

You should submit your tarred directory via e-mail to: hpca@engineering.uiowa.edu

Make sure you DON'T include the work directory (i.e., the directory created by the "vlib work" command) in the submission. To avoid losing points, the submission must be mailed before 11:59 p.m. on Tuesday, April 27.
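
As an alternative to packaging by hand, the naming and exclusion rules above could be scripted roughly as follows. This is only a sketch: the function names `submission_name` and `make_submission` and the output file name `<directory>.tar` are assumptions, and the linked packaging instructions remain the authoritative procedure.

```python
import os
import tarfile


def submission_name(hawkids):
    """Concatenate the team's hawkIDs in alphabetical order."""
    return "".join(sorted(hawkids))


def make_submission(src_dir, hawkids, out_dir="."):
    """Tar src_dir under the team directory name, skipping the top-level work directory."""
    name = submission_name(hawkids)
    tar_path = os.path.join(out_dir, name + ".tar")  # assumed file name

    def skip_work(info):
        # Archive member names look like "<name>/work/...".
        parts = info.name.split("/")
        if len(parts) > 1 and parts[1] == "work":
            return None  # returning None excludes this member and its contents
        return info

    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=name, filter=skip_work)
    return tar_path
```

Calling `make_submission("myproject", ["smith", "jones"])` would write `jonessmith.tar` in the current directory, with the contents of the hypothetical `myproject` directory stored under `jonessmith/` and the `work` directory left out.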

Due Date

Tuesday, April 27 by 11:59 p.m.

Source and Test Code

Source/Test Code