55:132/22C:160 Spring 2010
Sixth Homework Assignment (Second Verilog Project)
REVISED—04/15/10
Due Date: Tuesday, April 27
Objective
- To gain an understanding of CPU pipeline design issues,
specifically data and control hazards, and the mitigation of these hazards
through data forwarding and dynamic branch prediction.
Project Teams
Students will work in teams of two on this assignment. Each team
should submit a single solution and report. Students may pick their own
partners, if desired. The instructor will facilitate assignment of partners for
any students who are unable to find a partner on their own. Any student
desiring to be matched with a partner must send an e-mail request to the instructor
at the address: kuhl@engineering.uiowa.edu no
later than 8:00 a.m. on Thursday, April 15. If, for some reason, you
prefer to work alone on the project you may do so. However, you must
notify the instructor of this preference by the April 15 deadline. Otherwise
you will be assigned to a team. If students wish to work in teams larger than
two, this is possible. However, additional project scope must be negotiated in
advance with the instructor.
Specification
You will be given a Verilog
specification of the simple five-stage pipelined CPU shown
in this figure. The Verilog
model actually has its instruction fetch stage slightly optimized as shown here to permit it to operate with a two-cycle branch delay
(with branch target address generation and branch condition evaluation in the
EX stage). The CPU executes the instruction set shown here.
The pipeline implements delayed branching. Hence, two non-branch-dependent
instructions (or nops) must be placed in the shadow
of each conditional branch instruction. The Verilog model
also includes a hazard detection module (dethazard.v).
This hazard detection module identifies data hazards and introduces pipeline
stalls as needed. The supplied Verilog model does not
implement any forwarding paths for data hazards.
You are to extend this design in the following manner:
- Implement a simple dynamic branch prediction scheme as
follows:
- Implement a Branch
Target Buffer (BTB) in the fetch stage of the pipeline. This BTB
should have 32 entries and should be direct-mapped, using bits 6:2 of the
fetch address.
- Each BTB entry should
contain the Branch Instruction Address, the Branch Target Address, and a
two-bit local branch history, based on a Smith (saturating counter)
predictor. Following a BTB hit, the fetch stage should dynamically predict
the branch outcome from the history bits and, if the prediction is
“branch-taken”, redirect the next instruction fetch to the branch
target address specified in the BTB. If the branch outcome is later
determined to be not-taken, the instructions fetched from the
predicted target address must be cancelled. Note that for our simple
instruction set, branch target addresses will be constant since the only
addressing mode is PC+Immed.
- For branch instructions
that hit the BTB but are predicted as “not-taken”, and for BTB misses, the
next instruction fetch should continue with the instruction sequentially
following the branch. If the actual branch outcome is later determined to
be “taken”, the instructions fetched along the “not-taken” path must be
cancelled.
- For all branch
instructions, BTB entries should be updated after the branch outcome and
branch target address are computed in the ALU stage.
- With the new branch
prediction strategy described above, delayed branching semantics are no
longer used so there are no “branch shadow” instructions.
- Run the provided matrix multiplication code on the
original pipeline and again after adding the dynamic branch prediction
mechanism described above. Note the difference in performance due to the use of
dynamic branch prediction. Make sure that you compute the BTB hit rate
and the successful prediction rate.
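As a reference point for checking your Verilog against, the BTB behavior described above can be sketched as a small behavioral model. This is only an illustration in Python, not part of the supplied code and not a required structure. All names (`BTB`, `BTBEntry`, `btb_index`, `predict`, `update`) are hypothetical. The sketch assumes 4-byte instructions (so the sequential fetch address is PC+4), treats counter values 2 and 3 as "taken", initializes a newly allocated entry to a weak state matching the first observed outcome, and counts a BTB miss as a "not-taken" prediction when computing the prediction rate. None of these choices is dictated by the specification.

```python
from dataclasses import dataclass

BTB_ENTRIES = 32  # direct-mapped, indexed by fetch-address bits 6:2


@dataclass
class BTBEntry:
    valid: bool = False
    branch_addr: int = 0  # full branch instruction address, used as the tag
    target_addr: int = 0  # branch target address
    counter: int = 0      # two-bit saturating counter, 0..3


def btb_index(pc: int) -> int:
    """Extract bits 6:2 of the fetch address."""
    return (pc >> 2) & 0x1F


class BTB:
    def __init__(self) -> None:
        self.entries = [BTBEntry() for _ in range(BTB_ENTRIES)]
        self.branches = 0  # resolved branches
        self.hits = 0      # branches that hit the BTB
        self.correct = 0   # branches whose predicted direction was right

    def predict(self, pc: int):
        """Fetch-stage lookup: returns (hit, predicted_taken, next_fetch_pc)."""
        e = self.entries[btb_index(pc)]
        hit = e.valid and e.branch_addr == pc
        taken = hit and e.counter >= 2        # assumption: 2 and 3 mean "taken"
        next_pc = e.target_addr if taken else pc + 4  # assumption: 4-byte instructions
        return hit, taken, next_pc

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Update once the branch outcome and target are known."""
        # Simplification: the lookup is repeated here instead of carrying the
        # fetch-time prediction down the pipeline with the branch.
        hit, predicted, _ = self.predict(pc)
        self.branches += 1
        self.hits += hit
        self.correct += (predicted == taken)
        e = self.entries[btb_index(pc)]
        if hit:
            e.counter = min(3, e.counter + 1) if taken else max(0, e.counter - 1)
            e.target_addr = target
        else:
            # Allocate (or replace) the entry; the initial counter is an assumption.
            e.valid, e.branch_addr, e.target_addr = True, pc, target
            e.counter = 2 if taken else 1


# Hypothetical loop branch at 0x40 targeting 0x20: taken five times, then falls through.
demo = BTB()
for outcome in [True] * 5 + [False]:
    demo.update(0x40, outcome, 0x20)
print("BTB hit rate:", demo.hits / demo.branches)
print("prediction rate:", demo.correct / demo.branches)
```

Because only bits 6:2 select the entry, two branches whose addresses differ by a multiple of 128 bytes (for example 0x40 and 0xC0) map to the same entry. Comparing the stored Branch Instruction Address against the fetch address is what distinguishes them.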
To Run the Verilog Simulation
- To run a simulation you need to initialize the memory
for both instructions (Imem) and data (Dmem). The data for these memories are kept in text
files and read in at startup. Edit the file "sdlx.v"
to read the correct input files. In the supplied version this is near line
181. An example of a Dmem file (matrix.dat) is
in the directory along with the source code for the matrix multiply
program (matmult.s). Note that matmult6x6.s has NOP
instructions in the shadows of its branch instructions. A modified version of the matrix multiply program, called matmultNoDelay.s,
is also provided.
- You can use the command "vlog -f pipeline.vc"
to compile all of the source files at one time.
What to turn in:
- Block diagrams showing the design. (You may hand this
in separately on paper)
- All Verilog files, generously commented.
- Transcript files showing the performance of the
original design (with delayed branching) versus your modified design (with
dynamic branch prediction) on the matrix multiply program. (Be sure to use matmult6x6.s in the
former case and matmultNoDelay.s in the latter
case)
- A discussion of simulation results. Be sure to discuss
all issues relevant to your design. In particular, you should compare the
observed performance of the pipeline in its original form and after the addition
of dynamic branch prediction.
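When comparing the two designs, note that the two programs do not execute the same number of instructions (matmult6x6.s carries NOPs in its branch shadows), so total cycle count is a more direct basis for comparison than CPI. A minimal sketch of that comparison follows. The function names are hypothetical, and the cycle counts must come from your own transcripts.

```python
def speedup(cycles_original: int, cycles_predicted: int) -> float:
    """Ratio above 1.0 means the branch-prediction design finished in fewer cycles."""
    if cycles_original <= 0 or cycles_predicted <= 0:
        raise ValueError("cycle counts must be positive")
    return cycles_original / cycles_predicted


def percent_cycles_saved(cycles_original: int, cycles_predicted: int) -> float:
    """Cycles saved by the modified design, as a percentage of the original run."""
    if cycles_original <= 0 or cycles_predicted <= 0:
        raise ValueError("cycle counts must be positive")
    return 100.0 * (cycles_original - cycles_predicted) / cycles_original
```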
Your submission should be in the form of a tarred directory, where the name of the directory is the concatenation of your login names (hawkIDs) in alphabetical order. For example, if your hawkID is "smith" and your partner's is "jones", the name of your submission directory should be "jonessmith". Project documentation should be in a subdirectory called "Documentation". Instructions for packaging your submission into a tar file can be found here.
You should submit your tarred directory via e-mail to: hpca@engineering.uiowa.edu
Make sure you DON'T include the work directory (i.e., the directory created by the "vlib work" command) in the submission. To avoid losing points, the submission must be mailed before 11:59 p.m. on Tuesday, April 27.
Due Date
Tuesday, April 27 by 11:59 p.m.
Source and Test Code