CSCE 432/832 High Performance Processor Architectures Scalar to Superscalar

CSCE 432/832 High Performance Processor ArchitecturesScalar to Superscalar Adopted from Lecture notes based in part on slides created by Mikko H. Lipasti, John Shen, Mark Hill, David Wood, Guri Sohi, and Jim Smith

Scalar to Superscalar • Forecast • Real pipelines • IBM RISC Experience • The case for superscalar • Instruction-level parallel machines • Superscalar pipeline organization • Superscalar pipeline design CSCE 432/832, Scalar to Superscalar

Instructions Cycles Time = X X Instruction Program Cycle (code size) (CPI) (cycle time) Processor Performance • In the 1980’s (decade of pipelining): • CPI: 5.0 => 1.15 • In the 1990’s (decade of superscalar): • CPI: 1.15 => 0.5 (best case) Time Processor Performance = --------------- Program CSCE 432/832, Scalar to Superscalar

Amdahl’s Law • h = fraction of time in serial code • f = fraction that is vectorizable • v = speedup for f • Overall speedup: N No. of Processors h 1 - h f 1 - f 1 Time CSCE 432/832, Scalar to Superscalar

Revisit Amdahl’s Law • Sequential bottleneck • Even if v is infinite • Performance limited by nonvectorizable portion (1-f) N No. of Processors h 1 - h f 1 - f 1 Time CSCE 432/832, Scalar to Superscalar

Pipelined Performance Model • g = fraction of time pipeline is filled • 1-g = fraction of time pipeline is not filled (stalled) CSCE 432/832, Scalar to Superscalar

Pipelined Performance Model N Pipeline Depth 1 g 1-g • g = fraction of time pipeline is filled • 1-g = fraction of time pipeline is not filled (stalled) CSCE 432/832, Scalar to Superscalar

Pipelined Performance Model • Tyranny of Amdahl’s Law [Bob Colwell] • When g is even slightly below 100%, a big performance hit will result • Stalled cycles are the key adversary and must be minimized as much as possible N Pipeline Depth 1 g 1-g CSCE 432/832, Scalar to Superscalar

Motivation for Superscalar[Agerwala and Cocke] Speedup jumps from 3 to 4.3 for N=6, f=0.8, but s =2 instead of s=1 (scalar) Typical Range CSCE 432/832, Scalar to Superscalar

Superscalar Proposal • Moderate tyranny of Amdahl’s Law • Ease sequential bottleneck • More generally applicable • Robust (less sensitive to f) • Revised Amdahl’s Law: CSCE 432/832, Scalar to Superscalar

Limits on Instruction Level Parallelism (ILP) CSCE 432/832, Scalar to Superscalar

Superscalar Proposal • Go beyond single instruction pipeline, achieve IPC > 1 • Dispatch multiple instructions per cycle • Provide more generally applicable form of concurrency (not just vectors) • Geared for sequential code that is hard to parallelize otherwise • Exploit fine-grained or instruction-level parallelism (ILP) CSCE 432/832, Scalar to Superscalar

Classifying ILP Machines [Jouppi, DECWRL 1991] • Baseline scalar RISC • Issue parallelism = IP = 1 • Operation latency = OP = 1 • Peak IPC = 1 CSCE 432/832, Scalar to Superscalar

Classifying ILP Machines [Jouppi, DECWRL 1991] • Superpipelined: cycle time = 1/m of baseline • Issue parallelism = IP = 1 inst / minor cycle • Operation latency = OP = m minor cycles • Peak IPC = m instr / major cycle (m x speedup?) CSCE 432/832, Scalar to Superscalar

Classifying ILP Machines [Jouppi, DECWRL 1991] • Superscalar: • Issue parallelism = IP = n inst / cycle • Operation latency = OP = 1 cycle • Peak IPC = n instr / cycle (n x speedup?) CSCE 432/832, Scalar to Superscalar

Classifying ILP Machines [Jouppi, DECWRL 1991] • VLIW: Very Long Instruction Word • Issue parallelism = IP = n inst / cycle • Operation latency = OP = 1 cycle • Peak IPC = n instr / cycle = 1 VLIW / cycle CSCE 432/832, Scalar to Superscalar

Classifying ILP Machines [Jouppi, DECWRL 1991] • Superpipelined-Superscalar • Issue parallelism = IP = n inst / minor cycle • Operation latency = OP = m minor cycles • Peak IPC = n x m instr / major cycle CSCE 432/832, Scalar to Superscalar

Superscalar vs. Superpipelined • Roughly equivalent performance • If n = m then both have about the same IPC • Parallelism exposed in space vs. time CSCE 432/832, Scalar to Superscalar

Superpipelining “Superpipelining is a new and special term meaning pipelining. The prefix is attached to increase the probability of funding for research proposals. There is no theoretical basis distinguishing superpipelining from pipelining. Etymology of the term is probably similar to the derivation of the now-common terms, methodology and functionality as pompous substitutes for method and function. The novelty of the term superpipelining lies in its reliance on a prefix rather than a suffix for the pompous extension of the root word.” - Nick Tredennick, 1991 CSCE 432/832, Scalar to Superscalar

Superpipelining: Hype vs. Reality CSCE 432/832, Scalar to Superscalar

Superscalar Challenges CSCE 432/832, Scalar to Superscalar

CSCE 432/832 High Performance Processor Architectures Scalar to Superscalar

CSCE 432/832 High Performance Processor Architectures Scalar to Superscalar

Presentation Transcript

CSCE 531 Compiler Construction Introduction

CSCE 212 Chapter 2: Instruction Set Architecture

Predicate Logic and Quantifies

CSCE 416 : Introduction to Computer Networks

Introduction to Logic

CSCE 580 Artificial Intelligence

CSCE 580 Artificial Intelligence Ch.4: Features and Constraints

CSCE 641 Computer Graphics: Image Processing

CSCE 641: Computer Graphics Rotation Representation and Interpolation

Moving Forward with (Coastal) Design and Management

CSCE 431: Requirements

CONSTRAINT-BASED SCHEDULING and PLANNING

CSCE 747 Software Testing and Quality Assurance

CSCE 641 Computer Graphics: Image-based Modeling

CSCE 531 Compiler Construction Ch.4: Lexical Analysis

CSCE 411 Design and Analysis of Algorithms

CSCE 580 Artificial Intelligence Ch.3: Uninformed (Blind) Search

Introduction to UML

Ruby and the tools 740Tools03RubyRegExpr

CSCE 580 Artificial Intelligence Ch.4: Informed (Heuristic) Search and Exploration

CSCE 441 Computer Graphics: Animation with Motion Capture

CSCE 552 Spring 2010