This lecture provides an overview of the Pentium architecture and its evolution through the Pentium Pro, Pentium II, Pentium III, and Pentium 4. It examines the Pentium instruction format, which supports several operand sizes and addressing modes. It also covers memory management and the role of SIMD instruction set extensions such as MMX and SSE in multimedia applications. The lecture outlines how each new processor generation introduced specialized instructions, addressing modes, and memory management capabilities.
ENGR 330: Today’s Class • Notes • Networking/Telecom Course (QMCS 370) • CIGs • Pentium Instruction Set • Format overview • Evolution • Details, Address Modes • Pentium Architecture/Pipelining • Pentium, the first • Pentium Pro • Pentium III • Pentium 4 • Memory Management (if time) R. Smith - University of St Thomas - Minnesota
Pentium Instruction Format • Supports 8, 16, and 32-bit operands • Officially 17 addressing modes, arguably more • Operand size and addressing mode are keyed off the opcode and prefixes • Assembly language descends from the old 8080 CPU
Chronology • 8080 (1974) • 8-bit registers, 16-bit RAM addresses (MITS Altair) • 8086, 8088 (1978) “IA-16” • 16-bit registers and RAM addresses (IBM-PC) • 8088 hardware was “backwards compatible” with 8085 • “Assembler compatible” with 8080 - just reassemble • Segmentation allowed 1MB of RAM addressing • 80386 (1985) “IA-32” • 32-bit registers w/smaller ‘subsets’ for compatibility • 32-bit addresses made segments irrelevant • Pentium - the first (1993) • P6 Family introduced in 1995 • Pentium Pro, Pentium II, Pentium III, etc. • Pentium 4 introduced in 2000
Instruction Set Extensions • Each new processor brought new instructions • Specialized sets, too • 80x87 Math Co-Processor • Introduced floating point instructions and stack • Integrated into later processors • MMX (1997) • SIMD instructions, 8 integer registers @ 64 bits (reused FP) • 3DNow! (AMD in 1998) • MMX extended to support floating point operations • SSE (1999; SSE2 in 2000 added 128-bit integer and double-precision operations) • 8 giant 128-bit registers for SIMD operation
Pentium General Registers • Cut into halves/quarters for compatibility
Pentium Registers • Address Space • Segments with 32-bit addresses • Usually only 1 segment is used by a program • Standard general purpose registers • EAX, EBX, ECX, and EDX – 32 bits each; the lower 16 bits are accessible separately (AX, BX, CX, DX), and so are the high and low bytes of that half (AH/AL, BH/BL, CH/CL, DH/DL) • Each has special jobs in certain arithmetic instructions • Address Registers • ESI, EDI – point to strings in memory • EBP – points to the base of the current stack frame (local memory) • ESP – the stack pointer
Intel Assembly Language • Opcode destination, source • Format similar to LC-3 and MIPS • BUT, arithmetic and other instructions can operate on memory directly, not just through loads and stores • Operands may be reg/reg, reg/mem, or mem/reg; mem/mem only in special cases such as the string instructions (MOVS) • But it all depends on the opcode - many weird restrictions • Segment Registers – mostly obsolete • Provide the “upper” part of the address in the 16-bit, 1MB days • RAM addresses traditionally included a segment register • MOV AL,DS:[7777h] • Move contents of 7777 hex (mapped by DS) to AL • DS segment is for “data” - the default segment
Addressing Modes • “Displacement only” - direct address • Traditionally uses a segment register • DS is the default • Register Indirect • MOV AL,[BX] - loads AL from the RAM location addressed by BX • MOV AL,ES:[DI] - loads AL from the location addressed by DI in the ES segment • The ‘BP’ register uses the SS segment by default • Various Indexed modes • Combine a base register (BX or BP) and/or an index register (SI or DI) • May include a constant displacement (offset)
Memory Addressing Modes Summary • Pick zero or one from each column • Prefix an “E” for the Pentium registers • BX = EAX, EBX, ECX, EDX, ESP, or EBP • Also multiply the index register by a “scale factor” of 1, 2, 4, or 8, matching 8/16/32/64-bit data
Pentium Instructions • Traditional Instructions • Add, sub, add w/carry, sub w/carry, mul, div • BCD arithmetic, booleans, shifts/rotates, string ops • Loops, conditionals, condition code setting, subroutines • MMX / SSE / XMM Extensions • Intended to better support image manipulation for multimedia • MMX: Eight 64-bit registers, plus special instructions • SSE/XMM: Eight 128-bit registers, each holding several packed values (e.g., two 64-bit or four 32-bit values) • Parallel adds, shifts, multiplies of multiple values packed into MMX registers • Example applications • Subtracting one image from another for overlaying • Unpacking a compressed image (JPEG, MPEG, etc.)
Architecture of the First Pentium • Before SIMD and super-graphics, but still a major machine • Superscalar – can issue two instructions per clock cycle instead of one at a time • Separate caches for instructions and data
Pentium Details • Fixed point: 5 stages • U-pipe and V-pipe are interchangeable except when the instruction needs the barrel shifter • U and V can run 2 instructions in parallel, covering the vast majority of instructions used • Floating point: 8 stages • Uses part of the fixed-point pipeline
Pentium Pro Processor Overview • First P6 Pentium • 1995 • Pentium II in 1997 • Pentium III in 1999 • Reservation Station • Decoupled instruction fetching from execution • Leap in performance
P6 Architecture – Caches & Execution
Pentium Pro Details
Pentium III System Structure
Pentium III Processor
P III Instruction Execution Units
Pentium 4 System Structure
Pentium 4 microinstructions • Embeds a RISC architecture and pipelining within a CISC instruction set • Instructions fetched to CPU • Translated into internal RISC-style “microinstructions” • Microinstructions are stored in the execution trace cache, which takes the place of a conventional L1 instruction cache • CPU execution logic executes microinstructions in a pipelined fashion • Retains compatibility with old Pentium and x86 code while achieving RISC-like performance
Pentium 4 Processor Architecture
Memory Hierarchies • Temporal Locality • If I touched location X just now, I’ll likely touch it again soon. • Spatial Locality • If I touch location X, I’ll probably also touch X+1, X-1, etc. • Lesson: keep stuff you’re using nearby in the fastest RAM you can build • Lesson: if you’re not using it right now, it’s OK to stick it in slower storage till you need it • Lesson: the system can hide the hierarchy from your programs, most of the time
Storage Technologies (costs in 2004) • Largest and slowest: Hard Drives • Size: Terabytes. Cost/GB: $0.50-$2 • Access time: 5 million to 20 million nsec (5-20 msec) • Flash • Size: Gigabytes. Cost/GB: $15 • Access time: 200 nsec • Dynamic RAM (typical computer RAM) • Size: Gigabytes. Cost/GB: $100-200 • Access time: 50-70 nsec • Static RAM (cache, on-chip, registers) • Size: Megabytes. Cost/GB: $4K to $10K • Access time: 0.05-5 nsec
The driving force in computer design • Programs are hard to write • How do we get the most out of the programs we have already written? • Implications for memory • CPU mustn’t see cache operation in general • CPU mustn’t see oddities in RAM layout or availability • How do we hide these details?
Hiding the details • Cache implementation • We give the CPU an MAR/MDR interface • We make most RAM references as fast as possible • We NEVER make a mistake • Process swapping • Multiprocessor problems • RAM Management • We make RAM look identical to all programs • Programs can’t tell where they really reside in RAM • Give programs exactly as much RAM as they need at a given time, and give the rest away to other programs that are taking turns with the CPU
Direct Mapped Cache • The preferred design these days • A collection of high speed RAM locations • Broken into individually addressed “cache entries” • Part of RAM address chooses cache entry (“Direct mapping”) • A cache entry • “Index” is its address in the cache • Valid bit - true if the entry contains valid RAM data • “Tag” holds the upper address bits not used for the index or the byte offset, identifying which RAM block is cached there • Data area - where the stored data resides • Store multiple words (spatial locality)
Example • 32-bit RAM addresses • 64 cache entries, each containing 16 bytes • How do we resolve cache addresses? • How big is the tag field? • How much RAM does it need, in bits, per entry? • How much for the whole cache?
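One way to work the example, assuming byte-addressed RAM and exactly one valid bit per entry (no dirty or other status bits):

```latex
\begin{aligned}
\text{offset bits} &= \log_2 16 = 4 \\
\text{index bits}  &= \log_2 64 = 6 \\
\text{tag bits}    &= 32 - 6 - 4 = 22 \\
\text{bits per entry} &= \underbrace{1}_{\text{valid}} + \underbrace{22}_{\text{tag}} + \underbrace{16 \times 8}_{\text{data}} = 151 \\
\text{whole cache} &= 64 \times 151 = 9664 \text{ bits} = 1208 \text{ bytes}
\end{aligned}
```

Only 1024 of those bytes hold data; the rest is bookkeeping (tags and valid bits).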
CPU and Cache Handling • What happens with a cache hit? • What happens with a cache miss? • A stall, like a pipeline stall, but simpler • We stall the whole CPU - inefficient, but the simplest approach • What happens when we write data? • “Write through” updates the cache and also writes RAM while the CPU proceeds • Other CPU accesses get the cached, updated value • “Write miss” - obvious approach isn’t efficient • Use a “write buffer” to catch missed writes
All done. • Questions? • Diagrams cribbed from random Internet sites