This lecture covers current trends in computer architecture and introduces operating systems, highlighting key topics such as superscalar pipelines, out-of-order execution, longer pipelines, and speculation. It examines the significance of techniques like predicated execution and VLIW, and discusses how smarter compilers can enhance performance. The lecture also touches on advances in multiprocessor chips, simultaneous multithreading, and low-power CPUs, emphasizing the importance of time to market in technology development. It reviews concepts essential to modern architecture design, with the aim of informing students about current industry practice.
10/27: Lecture Topics
• Survey results
• Current Architectural Trends
• Operating Systems Intro
  • What is an OS?
  • Issues in operating systems
Superscalar Pipelines
• Superscalar pipelines can execute multiple instructions per clock cycle
  • 2+ instructions in any stage of the pipeline
  • Some processors allow 8 instructions to be issued at once
• Most programs can only take advantage of 1 or 2 issue slots
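Dependences limit how many issue slots a program can use. The minimal Python sketch below models an in-order superscalar issue stage to show this. It is a toy model under assumed conditions: every instruction has a 1-cycle latency, and issue width is the only structural limit. The register-tuple encoding and the `issue_in_order` name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical encoding: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def issue_in_order(program: List[Instr], width: int) -> List[List[int]]:
    """Group instruction indices into cycles for an in-order superscalar
    machine that issues up to `width` instructions per cycle.

    Assumes a 1-cycle latency for every instruction: a value written in
    cycle c can be read in cycle c + 1, but not by another instruction
    issued in the same cycle.
    """
    cycles: List[List[int]] = []
    i = 0
    while i < len(program):
        group: List[int] = []
        written_this_cycle = set()
        while i < len(program) and len(group) < width:
            dest, srcs = program[i]
            # Stop filling this cycle at the first instruction that needs a
            # result produced earlier in the same cycle (in-order issue).
            if any(src in written_this_cycle for src in srcs):
                break
            group.append(i)
            written_this_cycle.add(dest)
            i += 1
        cycles.append(group)
    return cycles


# A dependence chain uses one slot per cycle; independent work fills all four.
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
independent = [("r1", ("r0",)), ("r2", ("r0",)), ("r3", ("r0",)), ("r4", ("r0",))]
print(issue_in_order(chain, width=4))        # [[0], [1], [2], [3]]
print(issue_in_order(independent, width=4))  # [[0, 1, 2, 3]]
```

A chain of dependent instructions uses one slot per cycle no matter how wide the machine is. That is the situation the last bullet describes.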
Out-of-Order Execution
• Allows the processor to execute any instruction whose operands are ready, even if earlier instructions are still waiting
• Enables more issue slots to be filled
• Usually out-of-order execution, but in-order commit
  • that is, results are written back in the order sequential execution would have produced them
• Note: IA-64 is in-order
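A reorder buffer is one common way to combine out-of-order execution with in-order commit. The Python sketch below is a simplified model with no register renaming and no exceptions. `ReorderBuffer` and `simulate` are illustrative names, not a real API.

```python
from collections import deque
from typing import Deque, List, Set


class ReorderBuffer:
    """Minimal reorder buffer: instructions enter in program order, may
    finish executing in any order, but commit strictly in program order."""

    def __init__(self) -> None:
        self.entries: Deque[int] = deque()  # instruction ids, oldest first
        self.finished: Set[int] = set()     # ids whose execution is done

    def dispatch(self, instr_id: int) -> None:
        self.entries.append(instr_id)

    def complete(self, instr_id: int) -> List[int]:
        """Record that an instruction finished executing and return the ids
        that commit as a result (the finished prefix at the head)."""
        self.finished.add(instr_id)
        committed: List[int] = []
        while self.entries and self.entries[0] in self.finished:
            head = self.entries.popleft()
            self.finished.remove(head)
            committed.append(head)  # only now would its result be written back
        return committed


def simulate(num_instrs: int, finish_order: List[int]) -> List[List[int]]:
    """Dispatch instructions 0..num_instrs-1, then finish them in the given
    (out-of-order) sequence; return what commits after each completion."""
    rob = ReorderBuffer()
    for i in range(num_instrs):
        rob.dispatch(i)
    return [rob.complete(i) for i in finish_order]


print(simulate(4, [2, 0, 1, 3]))  # [[], [0], [1, 2], [3]]
```

Instruction 2 finishes first but cannot commit until instructions 0 and 1 have committed.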
Longer Pipelines
• Pipelines are getting longer
  • original RISC pipelines had 5 stages
  • pipelines now have up to 20 stages
• Allows a very short clock cycle (a high clock rate)
• Works well only as long as branches can be predicted accurately (or eliminated)
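One rough way to see why deep pipelines depend on prediction is to add the misprediction penalty to the cycles per instruction (CPI). The sketch below rests on several assumptions:

• an ideal CPI of 1
• a hypothetical 20% branch frequency
• penalties of 3 cycles (5 stages) and 18 cycles (20 stages), taken as roughly the number of stages before a branch resolves

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction once branch mispredictions are counted:
    each mispredicted branch wastes `penalty_cycles` cycles of fetched work."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% of instructions are branches, ideal CPI of 1.
for stages, penalty in [(5, 3), (20, 18)]:  # assumed penalties
    for accuracy in (0.95, 0.80):
        cpi = effective_cpi(1.0, 0.20, 1 - accuracy, penalty)
        print(f"{stages:2d} stages, {accuracy:.0%} accurate: CPI = {cpi:.2f}")
```

With these assumed numbers, the 20-stage pipeline loses 0.18 CPI at 95% accuracy but 0.72 CPI at 80%. Its faster clock only pays off when prediction is accurate.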
Speculation
• Prediction
  • better branch predictors (95% accurate)
  • predict many levels of branches
  • predict variable values
  • predict load addresses
• Simultaneously execute both paths of a branch
• Execute instructions even if there could be a dependency
  • a lw after a sw could read the same address, but probably doesn't
  • let the lw execute early, and fix things up if the guess was wrong
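A table of 2-bit saturating counters is one widely used building block for branch prediction. This Python sketch is illustrative only: it is not necessarily the predictor behind the 95% figure above. The table size, the initial counter state, and the loop trace are all assumptions.

```python
from typing import List, Tuple


class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address.
    Counter values 0-1 predict not taken, 2-3 predict taken, so one
    surprising outcome does not flip a strongly biased prediction."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # MIPS instructions are 4-byte aligned

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor: TwoBitPredictor, trace: List[Tuple[int, bool]]) -> float:
    """Fraction of (pc, taken) outcomes the predictor guessed correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)  # learn from the actual outcome
    return correct / len(trace)


# Hypothetical loop branch: taken 9 times, then falls through; run 10 times.
loop_trace = [(0x400, i % 10 != 9) for i in range(100)]
print(accuracy(TwoBitPredictor(), loop_trace))  # 0.89
```

On this loop branch the predictor mispredicts about once per loop exit. Predictors that also track branch history can do better on regular patterns like this one.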
Predicated Execution
• Predicated execution provides conditional moves and conditional adds, not just conditional branches
• Avoids branches, which are costly because pipelines are so long
• In IA-64, almost every instruction is predicated (many 1-bit predicate registers)
• The homework problem with movn and movz was an example of this
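The movn/movz idea can be modeled in Python by treating registers as a dictionary. This emulates the MIPS instruction semantics; it is not real assembly. The Python `if` inside each helper stands in for hardware that writes the destination register only when the condition holds. The emulated instruction sequence itself contains no branch. `branchless_max` and `branchless_min` are hypothetical examples.

```python
from typing import Dict

Regs = Dict[str, int]


def movz(regs: Regs, rd: str, rs: str, rt: str) -> None:
    """MIPS movz rd, rs, rt: if R[rt] == 0 then R[rd] = R[rs]."""
    if regs[rt] == 0:
        regs[rd] = regs[rs]


def movn(regs: Regs, rd: str, rs: str, rt: str) -> None:
    """MIPS movn rd, rs, rt: if R[rt] != 0 then R[rd] = R[rs]."""
    if regs[rt] != 0:
        regs[rd] = regs[rs]


def slt(regs: Regs, rd: str, rs: str, rt: str) -> None:
    """MIPS slt rd, rs, rt: R[rd] = 1 if R[rs] < R[rt] else 0."""
    regs[rd] = 1 if regs[rs] < regs[rt] else 0


def branchless_max(a: int, b: int) -> int:
    # move $v0, $a0 ; slt $t0, $a0, $a1 ; movn $v0, $a1, $t0
    regs: Regs = {"a0": a, "a1": b, "t0": 0, "v0": 0}
    regs["v0"] = regs["a0"]
    slt(regs, "t0", "a0", "a1")
    movn(regs, "v0", "a1", "t0")  # replace with b only if a < b
    return regs["v0"]


def branchless_min(a: int, b: int) -> int:
    # move $v0, $a0 ; slt $t0, $a0, $a1 ; movz $v0, $a1, $t0
    regs: Regs = {"a0": a, "a1": b, "t0": 0, "v0": 0}
    regs["v0"] = regs["a0"]
    slt(regs, "t0", "a0", "a1")
    movz(regs, "v0", "a1", "t0")  # replace with b only if not (a < b)
    return regs["v0"]


print(branchless_max(3, 7), branchless_min(3, 7))  # 7 3
```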
VLIW
• Long Instruction Words (LIW) and Very Long Instruction Words (VLIW)
  • each instruction contains multiple smaller instructions that execute in parallel
  • (V)LIW instructions can be 128 to 1024 bits long and contain 3 to 16 instructions
• It's the compiler's job to find independent instructions to execute together
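The compiler's bundling job can be sketched as greedy list scheduling. This Python version simplifies under stated assumptions:

• every operation has unit latency
• every operation writes a distinct register, so only true dependences matter
• each bundle has a fixed number of slots, and unused slots become no-ops

`pack_bundles` and the operation names are hypothetical.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, Tuple[str, ...]]  # (name, dest register, source registers)


def pack_bundles(ops: List[Op], slots: int) -> List[List[str]]:
    """Place each operation in the earliest bundle that comes after every
    bundle producing one of its sources and still has a free slot."""
    bundles: List[List[str]] = []
    ready_at: Dict[str, int] = {}  # register -> first bundle that may read it
    for name, dest, srcs in ops:
        b = max((ready_at.get(s, 0) for s in srcs), default=0)
        while b < len(bundles) and len(bundles[b]) >= slots:
            b += 1  # bundle full: try the next one
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(name)
        ready_at[dest] = b + 1  # unit latency
    return bundles


ops = [
    ("load_a", "r1", ("a",)),
    ("load_b", "r2", ("b",)),
    ("add", "r3", ("r1", "r2")),
    ("load_c", "r4", ("c",)),
    ("mul", "r5", ("r3", "r4")),
]
print(pack_bundles(ops, slots=3))  # [['load_a', 'load_b', 'load_c'], ['add'], ['mul']]
```

The dependent `add` and `mul` land in bundles of their own. A wide VLIW word is only as useful as the independent work the compiler can find to fill it.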
Register Windows
• Saving registers on the stack during procedure calls hurts performance
• Register windows use a stack of register sets: each procedure is allocated a window of registers when it is called
[Figure: stacked register windows for Foo(), Bar(), and Baz()]
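The benefit of register windows shows up as avoided memory traffic. The Python toy model below counts whole-window spills and reloads for a chain of nested calls. The class and its policy are simplifying assumptions: spill the oldest window on overflow, and reload one window on underflow.

```python
class RegisterWindows:
    """Toy model of register windows. The running procedure always owns
    one window; each call takes another. Only when every window is in use
    does a call spill the oldest window to the memory stack, and a return
    to a procedure whose window was spilled reloads it. Real designs
    (e.g. SPARC) also overlap neighboring windows to pass arguments;
    that detail is left out here."""

    def __init__(self, num_windows: int) -> None:
        self.num_windows = num_windows
        self.resident = 1   # windows held in registers (starts with the caller)
        self.spilled = 0    # windows saved on the memory stack
        self.transfers = 0  # whole-window saves + restores to/from memory

    def call(self) -> None:
        if self.resident == self.num_windows:  # overflow: spill the oldest
            self.resident -= 1
            self.spilled += 1
            self.transfers += 1
        self.resident += 1

    def ret(self) -> None:
        self.resident -= 1
        if self.resident == 0 and self.spilled > 0:  # underflow: reload caller
            self.spilled -= 1
            self.resident += 1
            self.transfers += 1


def call_chain_transfers(num_windows: int, depth: int) -> int:
    """Make `depth` nested calls, return from all of them, and report how
    many windows had to be moved to or from memory."""
    rw = RegisterWindows(num_windows)
    for _ in range(depth):
        rw.call()
    for _ in range(depth):
        rw.ret()
    return rw.transfers


print(call_chain_transfers(8, 3))  # Foo -> Bar -> Baz fits: 0
print(call_chain_transfers(4, 6))  # deeper than the window file: 6
```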
Smarter Compilers
• VLIW requires good compilers
• Predicated execution and speculation need help from the compiler
• Old architectures had instructions to emulate high-level constructs (bad)
• New architectures provide many general instructions and instruction options
• IA-64 will keep compiler writers busy for a decade
Multiple CPUs on a Chip
• Chip multiprocessors
  • multiple simple CPUs that share a cache
  • can run multiple programs simultaneously
  • single programs are no faster
  • like a multiprocessor machine, but cheaper
• Simultaneous Multithreading (SMT)
  • more complex CPUs
  • like chip multiprocessors + superscalar + out-of-order
  • also improves single-program performance
  • developed at UW
• Memory bandwidth is an issue for both
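Counting filled issue slots illustrates the difference between running one thread per core and SMT. In this Python toy model, each thread's per-cycle instruction-level parallelism is made up. Slots are handed out with a simple fixed priority. The model ignores cache and memory-bandwidth effects, which the slide notes matter for both designs.

```python
from typing import List


def slots_filled(ilp: List[List[int]], width: int, smt: bool) -> List[int]:
    """Issue slots used per cycle on a `width`-wide core.

    ilp[t][c] is how many independent instructions thread t could issue in
    cycle c (hypothetical numbers). Without SMT only thread 0 runs; with
    SMT, slots thread 0 leaves empty are offered to thread 1, and so on
    (a fixed-priority policy assumed for illustration)."""
    threads = ilp if smt else ilp[:1]
    used: List[int] = []
    for c in range(len(ilp[0])):
        free = width
        for per_cycle in threads:
            free -= min(free, per_cycle[c])
        used.append(width - free)
    return used


thread_a = [1, 2, 0, 1]
thread_b = [2, 1, 3, 1]
print(slots_filled([thread_a, thread_b], width=4, smt=False))  # [1, 2, 0, 1]
print(slots_filled([thread_a, thread_b], width=4, smt=True))   # [3, 3, 3, 2]
```

In this made-up case, a single thread fills 4 of 16 slots and two SMT threads fill 11.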
Funky Hardware on a Chip
• We can squeeze more and more transistors onto a chip
• What do we do with them?
  • Bigger caches (boring)
  • Put programmable hardware on the CPU
    • FPGAs can be (re)programmed quickly
    • custom hardware can run some tasks up to 1000X faster than software
  • Graphics-specific hardware
  • Instruction co-processors
  • Simultaneously run two copies of all programs to guard against hardware glitches
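The last idea, running two copies and comparing, can be illustrated in software. This Python sketch is only an analogy for duplicated hardware execution. `run_redundant` accepts a result when both copies agree and retries on a mismatch. `make_glitchy` is a hypothetical unit that corrupts one result.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_redundant(compute: Callable[[], T], max_attempts: int = 3) -> T:
    """Run `compute` twice and accept the result only if both copies agree.
    A mismatch is treated as a transient glitch and the pair is re-run."""
    for _ in range(max_attempts):
        first, second = compute(), compute()
        if first == second:
            return first
    raise RuntimeError("redundant copies disagreed on every attempt")


def make_glitchy(value: int, glitch_on_call: int) -> Callable[[], int]:
    """Hypothetical unit that returns `value` but flips its low bit once."""
    calls = 0

    def compute() -> int:
        nonlocal calls
        calls += 1
        return value ^ 1 if calls == glitch_on_call else value

    return compute


print(run_redundant(make_glitchy(42, glitch_on_call=1)))  # 42 (after one retry)
```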
Low Power
• CPUs are being put in everything, even devices with very small batteries (tiny sensors)
• Need to make CPUs that use very little power (only as much as they need)
  • reduce the CPU clock frequency
  • allow the OS to turn off parts of the chip
• Transmeta is building chips that emulate Intel x86, but with less power
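A lower clock frequency saves power because of the standard first-order model of CMOS dynamic power, which the lecture does not give. The symbols are:

• \alpha: the switching activity factor
• C: the switched capacitance
• V: the supply voltage
• f: the clock frequency
• N: the number of cycles a task needs

The second example below, which scales the voltage to 0.8 of its original value, is hypothetical.

```latex
P_{\text{dyn}} \approx \alpha C V^{2} f,
\qquad
E_{\text{task}} = P_{\text{dyn}} \cdot \frac{N}{f} \approx \alpha C V^{2} N

f \to \tfrac{f}{2},\ V \text{ fixed}:\quad
P_{\text{dyn}} \to 0.5\,P_{\text{dyn}},\quad E_{\text{task}} \text{ unchanged}

f \to \tfrac{f}{2},\ V \to 0.8\,V:\quad
P_{\text{dyn}} \to 0.5 \times 0.64\,P_{\text{dyn}} = 0.32\,P_{\text{dyn}},\quad
E_{\text{task}} \to 0.64\,E_{\text{task}}
```

Lowering f alone reduces power but not the energy a fixed task consumes. The larger savings come when the lower frequency also allows a lower supply voltage.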
Time to Market
• It used to be solely about being the fastest
• Now being adequate is enough
• Being the first technology to fill a need is the most important