Chapter 7: Intel's P6 Architecture (Modern Processor Design: Fundamentals of Superscalar Processors)

## Pentium Pro Case Study

- Microarchitecture
  - Order-3 Superscalar
  - Out-of-order execution
  - Speculative execution
  - In-order completion
- Design Methodology
- Performance Analysis

## Goals of P6 Microarchitecture

- IA-32 Compliant
- Performance (Frequency - IPC)
- Validation
- Die Size
- Schedule

## Instruction Completion

- Handles all exception/interrupt/trap conditions
- Handles branch recovery
  - OOO core drains out right-path instructions, commits to RRF
  - In parallel, front end starts fetching from target/fall-through
  - However, no renaming is allowed until OOO core is drained
  - After draining is done, RAT is reset to point to RRF
  - Avoids checkpointing the RAT and recovering to an intermediate RAT state
- Commits execution results to the architectural state in-order
  - Retirement Register File (RRF)
  - Must handle hazards to RRF (writes/reads in same cycle)
  - Must handle hazards to RAT (writes/reads in same cycle)
- "Atomic" IA-32 instruction completion
  - uops are marked as first or last in their instruction's sequence
  - These marks define the exception/interrupt/trap boundary
- 2-cycle retirement
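
The completion rules above can be sketched as a toy model. This is only an illustration under stated assumptions, not the P6 implementation: the class `RetireUnit`, its fields, the retirement width of 3, and the string tag `"RRF"` are all hypothetical names chosen for the example, and the 2-cycle retirement pipeline is not modeled. It shows in-order commit to the RRF, the RAT being redirected to the RRF when the entry it names retires, interrupts being allowed only after a uop marked last, and branch recovery by draining and then resetting the RAT.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Uop:
    dest: str           # architectural register this uop writes
    value: int          # result produced by execution
    first: bool         # first uop of its IA-32 instruction
    last: bool          # last uop of its IA-32 instruction
    done: bool = True   # True once execution has finished
    seq: int = -1       # ROB entry number, assigned at rename


class RetireUnit:
    """Toy model; names and width are assumptions for illustration."""

    def __init__(self, width=3):
        self.width = width
        self.rob = deque()       # in-flight uops, oldest on the left
        self.rrf = {}            # committed architectural state
        self.rat = {}            # register -> ROB entry number or "RRF"
        self.at_boundary = True  # True between IA-32 instructions
        self.next_seq = 0

    def rename(self, uop):
        """Allocate a ROB entry and point the RAT at it."""
        uop.seq = self.next_seq
        self.next_seq += 1
        self.rob.append(uop)
        self.rat[uop.dest] = uop.seq

    def retire_cycle(self):
        """Commit up to `width` finished uops, oldest first."""
        retired = 0
        while self.rob and retired < self.width and self.rob[0].done:
            uop = self.rob.popleft()
            self.rrf[uop.dest] = uop.value
            # If the RAT still names this entry, redirect it to the RRF.
            if self.rat.get(uop.dest) == uop.seq:
                self.rat[uop.dest] = "RRF"
            self.at_boundary = uop.last
            retired += 1
        return retired

    def can_take_interrupt(self):
        """Interrupts are recognized only at instruction boundaries."""
        return self.at_boundary

    def recover(self, wrong_path_count):
        """Branch recovery: drop wrong-path uops, drain, reset the RAT."""
        for _ in range(wrong_path_count):
            self.rob.pop()            # youngest entries are wrong-path
        while self.rob:
            self.rob[0].done = True   # stands in for waiting on execution
            self.retire_cycle()
        for reg in self.rat:          # no checkpoint needed: RRF is current
            self.rat[reg] = "RRF"
```

In this sketch, resetting every RAT entry to the RRF is safe only because the ROB is empty by then, which mirrors the rule that no renaming is allowed until the out-of-order core has drained.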

## Pentium Pro Performance Analysis

- Observability
  - On-chip event counters
  - Dynamic analysis
- Benchmark Suite
  - BAPCo SYSmark32: 32-bit Windows NT applications
  - Winstone97: 32-bit Windows NT applications
  - Some SPEC95 benchmarks
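
One metric that event counters make observable is IPC. The sketch below is a minimal illustration, assuming two counter readings (instructions retired and elapsed clock cycles) are available. The function names and all numeric values are hypothetical and are not measurements from the Pentium Pro. It also reads the "Frequency - IPC" goal as the product of the two, which is an interpretation rather than something the slides state.

```python
def ipc(instructions_retired: int, cycles: int) -> float:
    """Instructions per cycle from two counter readings."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions_retired / cycles


def instructions_per_second(frequency_hz: float, ipc_value: float) -> float:
    """Throughput as clock frequency times IPC."""
    return frequency_hz * ipc_value


# Illustrative values only (not measured data).
sample_ipc = ipc(instructions_retired=300, cycles=200)
sample_rate = instructions_per_second(200e6, sample_ipc)
```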

## Conclusions

- IA-32 Compliant
- Performance (Frequency - IPC)
  - 366.0 SPECint92
  - 283.2 SPECfp92
  - 8.09 SPECint95
  - 6.70 SPECfp95
- Validation
- Die Size - Fabable
- Schedule - 1 year late
- Power -