ENGR 330: Today’s Class

ENGR 330: Today’s Class • Notes • Networking/Telecom Course (QMCS 370) • CIGs • Pentium Instruction Set • Format overview • Evolution • Details, Address Modes • Pentium Architecture/Pipelining • Pentium, the first • Pentium Pro • Pentium 3 • Pentium 4 • Memory Management (if time) R. Smith - University of St Thomas - Minnesota

Pentium Instruction Format • Supports 8, 16, 32-bit operands • Officially 17 addressing modes, arguably more • Keyed off the opcode and prefixes • “Assembly language” carried forward from the old 8080 CPU (source-level, not binary, compatibility)

Chronology • 8080 (1974) • 8-bit registers, 16-bit RAM addresses (MITS Altair) • 8086 (1978), 8088 (1979) “IA-16” • 16-bit registers and 16-bit address offsets (IBM-PC) • 8088 hardware was “backwards compatible” with 8085 • “Assembler compatible” with 8080 - just reassemble • Segmentation allowed 1MB of RAM addressing • 80386 (1985) “IA-32” • 32-bit registers w/smaller ‘subsets’ for compatibility • 32-bit addresses made segments largely irrelevant • Pentium - the first (1993) • P6 Family introduced in 1995 • Pentium Pro, Pentium II, Pentium III, etc. • Pentium 4 introduced in 2000

Instruction Set Extensions • Each new processor brought new instructions • Specialized sets, too • 80x87 Math Co-Processor • Introduced floating point instructions and stack • Integrated into later processors • MMX (1997) • SIMD instructions, 8 integer registers @ 64 bits (reusing the FP registers) • 3DNow! (AMD in 1998) • MMX extended to support floating point operations • SSE (1999; SSE2 in 2000 for integers) • 8 giant 128-bit registers for SIMD operation

Pentium General Registers • Cut into halves/quarters for compatibility

Pentium Registers • Address Space • Segments with 32-bit addresses • Usually only 1 segment is used by a program • Standard general purpose registers • EAX, EBX, ECX, and EDX – 32 bits each, with lower half accessible separately and as separate high/low bytes • Each has special jobs in certain arithmetic instructions • Address Registers • ESI, EDI – point to strings in memory • EBP – points to base of the current stack frame (local memory) • ESP – the stack pointer
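
The "halves/quarters" idea can be sketched in a few lines of Python. This is only a toy model, not how the hardware is built: the class name `Reg32` and its attribute names (`e`, `x`, `h`, `l`, loosely mirroring EAX/AX/AH/AL) are made up for illustration.

```python
class Reg32:
    """Toy model of one 32-bit general register (e.g. EAX) and its sub-registers."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF          # full 32 bits (EAX)

    @property
    def x(self):                             # low 16 bits (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, value):
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self):                             # bits 8..15 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, value):
        self.e = (self.e & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def l(self):                             # bits 0..7 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, value):
        self.e = (self.e & 0xFFFFFF00) | (value & 0xFF)


eax = Reg32(0x12345678)
eax.l = 0xFF                                 # writing "AL" leaves the other 24 bits alone
print(hex(eax.e))                            # 0x123456ff
```

Writing the low byte or low half changes only those bits of the full register, which is what lets old 8-bit and 16-bit code keep working on the wider registers.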

Intel Assembly Language • Opcode destination, source • Format similar to LC-3 and MIPS • BUT, allows operands in memory, not just in registers • Operands may be reg/reg, reg/mem, or mem/reg; mem/mem only in special cases such as the string instructions • But it all depends on the opcode - many weird restrictions • Segment Registers – mostly obsolete • Provide the “upper” part of the address in 16bit-1MB days • RAM addresses traditionally included a segment register • MOV AL,DS:[7777h] • Move the byte at address 7777 hex (mapped by DS) to AL • DS segment is for “data” - the default segment
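
How a segment register supplied the "upper" part of an address in the 16-bit, 1MB days can be sketched as below. The function name `physical_address` is hypothetical, and the 20-bit wraparound assumes an original 8086/8088 with 20 address lines.

```python
def physical_address(segment, offset):
    """Real-mode style address: segment shifted left 4 bits, plus a 16-bit offset."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # Assumption: 20 address lines, so the sum wraps at 1MB.
    return ((segment << 4) + offset) & 0xFFFFF


# MOV AL,DS:[7777h] with DS = 1000h would read this physical byte:
print(hex(physical_address(0x1000, 0x7777)))   # 0x17777
```

Two 16-bit values combine into a 20-bit address, which is where the 1MB limit comes from.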

Addressing Modes • “Displacement only” - direct address • Traditionally uses a segment register • DS is the default • Register Indirect • MOV AL,[BX] - loads AL from RAM addressed by BX • MOV AL,ES:[DI] - addressed by DI with ES segment • The ‘BP’ register uses SS segment by default • Various Indexed modes • Combine 1 or 2 index registers, plus an optional offset

Memory Addressing Modes Summary • Pick zero or one from each column • Prefix an “E” for the 32-bit Pentium registers • In place of BX, use EAX, EBX, ECX, EDX, ESP, or EBP • Also add a “scale factor” of 1, 2, 4, or 8 on the index register (for 8/16/32/64-bit elements)
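
The "pick zero or one from each column" rule amounts to one sum. A minimal sketch, assuming the 32-bit form base + index × scale + displacement with scale factors 1, 2, 4, or 8 (for 8/16/32/64-bit elements); the function and parameter names are illustrative only.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """32-bit effective address: base + index*scale + displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


# Something like MOV EAX,[EBX + ESI*4 + 8] with EBX = 1000h and ESI = 3:
print(hex(effective_address(base=0x1000, index=3, scale=4, disp=8)))   # 0x1014
```

Leaving out a column just means leaving that term at its default, so "displacement only" and "register indirect" fall out as special cases.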

Pentium Instructions • Traditional Instructions • Add, sub, add w/carry, sub w/carry, mul, div • BCD arithmetic, booleans, shifts/rotates, string ops • Loops, conditionals, condition code setting, subroutines • MMX / SSE / XMM Extensions • Intended to better support image manipulation for multimedia • MMX: Eight 64-bit registers, plus special instructions • SSE/XMM: Eight 128-bit registers, each able to hold two 64-bit values • Parallel adds, shifts, multiplies of multiple values packed into one register • Example applications • Subtracting one image from another for overlaying • Unpacking a compressed image (JPEG, MPEG, etc)
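
The image-subtraction example can be imitated in plain Python to show what "packed" means. This sketch treats a 64-bit integer as eight unsigned 8-bit pixels and subtracts lane by lane, clamping at zero, in the spirit of the MMX unsigned saturating byte subtract (PSUBUSB). The helper names are invented, and the loop here runs one lane at a time where the hardware does all eight at once.

```python
def unpack_bytes(word64):
    """Split a 64-bit value into eight 8-bit lanes, lowest byte first."""
    return [(word64 >> (8 * i)) & 0xFF for i in range(8)]


def pack_bytes(lanes):
    """Pack up to eight 8-bit lanes into one 64-bit value, lowest byte first."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def packed_sub_saturate_u8(a, b):
    """Lane-wise a - b on unsigned bytes; results below 0 clamp to 0."""
    return pack_bytes(max(x - y, 0) for x, y in zip(unpack_bytes(a), unpack_bytes(b)))


image_a = pack_bytes([200, 100, 50, 0, 255, 10, 20, 30])
image_b = pack_bytes([50, 150, 50, 1, 255, 5, 25, 0])
print(unpack_bytes(packed_sub_saturate_u8(image_a, image_b)))
# [150, 0, 0, 0, 0, 5, 0, 30]
```

Saturation matters for pixels: a plain wraparound subtract would turn 100 - 150 into a bright 206 instead of black.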

Architecture of The First Pentium • Before SIMD and super-graphics, but still a major machine • Superscalar – faster than “linear” instruction execution • Separate caches for instructions and data

Pentium Details • Fixed Point: 5 stages • U-pipe and V-pipe are interchangeable except if the instruction needs the barrel shifter • U and V can run 2 instructions in parallel – cover vast majority of instructions used • Floating Point: 8 stages • Uses part of the fixed point pipeline

Pentium Pro Processor Overview • First P6 Pentium • 1995 • Pentium II in 1997 • Pentium III in 1999 • Reservation Station • Decoupled instruction fetching from execution • Leap in performance

P6 Architecture – Caches & Execution

Pentium Pro Details

Pentium III System Structure

Pentium III Processor

P III Instruction Execution Units

Pentium 4 System Structure

Pentium 4 microinstructions • Embeds a RISC architecture and pipelining within a CISC instruction set • Instructions fetched to CPU • Translated into internal RISC-style “microinstructions” • Microinstructions are stored in the level 0 instruction cache • CPU execution logic executes microinstructions in a pipelined fashion • Retains compatibility with old Pentium and x86 code while achieving RISC-like performance

Pentium 4 Processor Architecture

Memory Hierarchies • Temporal Locality • If I touched location X just now, I’ll likely touch it again soon. • Spatial Locality • If I touch location X, I’ll probably also touch X+1, X-1, etc. • Lesson: keep stuff you’re using nearby in the fastest RAM you can build • Lesson: if you’re not using it right now, it’s OK to stick it in slower storage till you need it • Lesson: the system can hide the hierarchy from your programs, most of the time

Storage Technologies (costs in 2004) • At the top: Hard Drives • Size: Terabytes. Cost/GB: $.50-$2 • Access time: 5 million to 20 million nsec • Flash • Size: Gigabytes. Cost/GB: $15 • Access time: 200 nsec • Dynamic RAM (typical computer RAM) • Size: Gigabytes, Cost/GB: $100-200 • Speed 50-70 nsec • Static RAM (cache, on-chip, registers) • Size: Megabytes, Cost/GB: $4K to $10K • Speed: .05 - 5 nsec

The driving force in computer design • Programs are hard to write • How do we get the most out of the programs we have already written? • Implications for memory • CPU mustn’t see cache operation in general • CPU mustn’t see oddities in RAM layout or availability • How do we hide these details?

Hiding the details • Cache implementation • We give the CPU an MAR/MDR interface • We make most RAM references as fast as possible • We NEVER make a mistake • Process swapping • Multiprocessor problems • RAM Management • We make RAM look identical to all programs • Programs can’t tell where they really reside in RAM • Give programs exactly as much RAM as they need at a given time, and give the rest away to other programs that are taking turns with the CPU

Direct Mapped Cache • The preferred design these days • A collection of high speed RAM locations • Broken into individually addressed “cache entries” • Part of RAM address chooses cache entry (“Direct mapping”) • A cache entry • “Index” is its address in the cache • Valid bit - true if the entry contains valid RAM data • “Tag” holds the RAM address bits not used to select the cache entry or the byte within it • Data area - where the stored data resides • Store multiple words (spatial locality)

Example • 32 bit RAM addresses • 64 cache entries, each contains 16 bytes • How do we resolve cache addresses? • How big is the tag field? • How much RAM does it need, in bits, per entry? • How much for the whole cache?
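
One way to work the example is to let Python do the arithmetic. The sketch below assumes each entry stores exactly a valid bit, a tag, and the data bytes (no dirty bit or other bookkeeping); the function names are illustrative.

```python
def cache_layout(address_bits=32, entries=64, bytes_per_entry=16):
    """Field widths and storage cost for a direct mapped cache (power-of-two sizes)."""
    offset_bits = (bytes_per_entry - 1).bit_length()   # picks a byte within the entry
    index_bits = (entries - 1).bit_length()            # picks the cache entry
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_entry = 1 + tag_bits + 8 * bytes_per_entry  # valid + tag + data
    return {
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "bits_per_entry": bits_per_entry,
        "total_bits": entries * bits_per_entry,
    }


def split_address(addr, index_bits=6, offset_bits=4):
    """Break a RAM address into (tag, index, offset) for the example cache."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


print(cache_layout())
# {'offset_bits': 4, 'index_bits': 6, 'tag_bits': 22, 'bits_per_entry': 151, 'total_bits': 9664}
```

So the low 4 address bits choose the byte, the next 6 choose the entry, the remaining 22 form the tag, each entry needs 151 bits, and the whole cache needs 9,664 bits.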

CPU and Cache Handling • What happens with a cache hit? • What happens with a cache miss? • A stall, like a pipeline stall, but simpler • We stall the whole CPU - inefficient but it’s the best approach • What happens when we write data? • “Write through” runs the write while CPU proceeds • Other CPU accesses get the cached, updated value • “Write miss” - obvious approach isn’t efficient • Use a “write buffer” to catch missed writes
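
Hits, misses, and write-through can be traced with a small simulation. This is a sketch under stated assumptions, not a description of any real Pentium cache: it uses the 64-entry, 16-byte geometry from the earlier example, models RAM as a `bytearray`, does not allocate an entry on a write miss, and leaves out the write buffer and all timing. The class name `DirectMappedCache` is made up.

```python
class DirectMappedCache:
    ENTRIES = 64   # number of cache entries
    BLOCK = 16     # bytes stored per entry

    def __init__(self, ram):
        self.ram = ram
        self.valid = [False] * self.ENTRIES
        self.tags = [0] * self.ENTRIES
        self.data = [bytearray(self.BLOCK) for _ in range(self.ENTRIES)]
        self.hits = 0
        self.misses = 0

    def _split(self, addr):
        block_number, offset = divmod(addr, self.BLOCK)
        tag, index = divmod(block_number, self.ENTRIES)
        return tag, index, offset

    def read(self, addr):
        tag, index, offset = self._split(addr)
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: the CPU would stall here while the whole block is fetched.
            self.misses += 1
            start = addr - offset
            self.data[index][:] = self.ram[start:start + self.BLOCK]
            self.tags[index] = tag
            self.valid[index] = True
        return self.data[index][offset]

    def write(self, addr, value):
        tag, index, offset = self._split(addr)
        self.ram[addr] = value                 # write through: RAM is always updated
        if self.valid[index] and self.tags[index] == tag:
            self.data[index][offset] = value   # keep the cached copy in step


ram = bytearray(range(256)) * 16               # 4096 bytes of made-up contents
cache = DirectMappedCache(ram)
cache.read(0)       # miss: loads bytes 0..15
cache.read(1)       # hit: same block (spatial locality)
cache.read(1024)    # miss: maps to entry 0 too, evicting the first block
cache.read(0)       # miss again: the two addresses conflict
print(cache.hits, cache.misses)                # 1 3
```

Addresses 0 and 1024 share an index but differ in tag, so in a direct mapped cache they keep evicting each other even though the other 63 entries sit empty.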

All done. • Questions? • Diagrams cribbed from random Internet sites
