Computer designers have focused predominantly on performance. Dave Patterson of the University of California, Berkeley, argues that the real challenges have shifted to software reliability and debugging. This proposal advocates integrating lightweight transaction support into CPUs. By building on existing out-of-order execution mechanisms, hardware could make software error recovery more reliable and debugging easier, addressing longstanding software problems and raising programming productivity.
Patterson Consulting
Radical Proposal: Let’s help real problems
Dave Patterson
University of California at Berkeley
Patterson@cs.berkeley.edu
April 2001
What have designers been doing?
• Performance, Performance, Performance; 2X/18 months
• Superscalar, 3 levels of cache, branch prediction, out-of-order execution, …
• If performance is the right goal, then > 1 GHz => sales jump
• ~ Year 2000 v. 1999 car sales
• What happened?
• US PC market shrank 3.5% in 1Q01; 1st shrink in 7 yrs!
• Performance no longer hard?
[Figure: Pentium III, with labeled blocks: Execution, Bus Intf, D cache, TLB, Out-Of-Order, branch, SS, Icache]
Time to help on other problems?
• Software quality?
• Fry’s Law: 2X programming productivity every 18 years
• Last architectural assist was virtual memory protection, 1970?
• SW Engineering perspective on SW bugs:
  • Bugs reproducible from inputs will be repaired
  • Transient errors very hard to fix
• Jim Gray hypothesis:
  • Most production software bugs are soft - Heisenbugs
  • Bohrbugs, like the Bohr atom, are solid, easily detected by standard techniques, and hence boring
  • Then can repair most SW bugs by restart
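The "repair by restart" idea on this slide can be sketched in software. The following is a minimal illustration, not anything from the talk: `run_with_restart` and `make_transient` are hypothetical names, and the transient fault is simulated with a counter so the behavior is repeatable. A Heisenbug-style failure disappears after a restart or two, while a Bohrbug-style failure keeps recurring and is eventually re-raised.

```python
def run_with_restart(routine, max_restarts=3):
    """Call routine(); on failure, restart it up to max_restarts times.

    Returns (result, attempts). Hypothetical helper for illustration only.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return routine(), attempts
        except Exception:
            # A solid (Bohrbug-like) failure recurs on every restart,
            # so give up once the restart budget is spent.
            if attempts > max_restarts:
                raise


def make_transient(failures):
    """Build a routine that fails `failures` times, then succeeds.

    Stands in for a soft (Heisenbug-like) fault that a restart clears.
    """
    remaining = [failures]

    def routine():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("transient fault")
        return "ok"

    return routine


def always_fails():
    """Stands in for a reproducible (Bohrbug-like) fault."""
    raise ValueError("deterministic bug")
```

Restart alone is only safe if the failed attempt left no partial side effects behind, which is the gap the transaction proposal on the next slide aims to close.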
What should HW designers do?
• Already have heavyweight transactions in databases, operating systems
  • Atomic event that can be completely undone if it fails midstream
  • Expensive, so done only on some disk operations
• Support lightweight transactions in CPU
  1) Help with restart of routines to fix Heisenbugs
  2) Make software error recovery more reliable
  • Start transaction
  • SW detects error
  • Back out all evidence of work to original place
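The three steps at the end of this slide (start transaction, SW detects error, back out all work) can be emulated in software to show the intended semantics. This is a sketch under stated assumptions: the `Transaction` class and the `transfer` example are hypothetical, the state is a plain dictionary, and the snapshot is taken by copying, which is exactly the cost that hardware support would be meant to avoid.

```python
import copy


class Transaction:
    """Software stand-in for a lightweight transaction over a dict.

    Hypothetical illustration: snapshot on entry, restore on error.
    """

    def __init__(self, state):
        self.state = state
        self._snapshot = None

    def __enter__(self):
        # Start transaction: remember the original state.
        self._snapshot = copy.deepcopy(self.state)
        return self.state

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # SW detected an error: back out all evidence of the work.
            self.state.clear()
            self.state.update(self._snapshot)
        # Returning False lets the error propagate to the caller.
        return False


def transfer(accounts, src, dst, amount):
    """Move `amount` between accounts; undone completely on failure."""
    with Transaction(accounts) as acc:
        acc[src] -= amount
        if acc[src] < 0:
            raise ValueError("insufficient funds")
        acc[dst] += amount
```

A failed `transfer` leaves `accounts` exactly as it was before the call, even though the source balance had already been modified when the error was detected.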
What will it take?
• Mechanisms in modern CPUs for performance speculation lay foundation
• Speculative execution via branch prediction, out-of-order execution, in-order completion allows “transactions” per branch, preserving interrupts: Reorder Buffer, Memory Buffer, Commit Table, …
• Transmeta Crusoe provides
  • software control of Write Buffer, allowing SW to discard results of speculative SW execution;
  • shadowed registers so can go back to old values
• Expand these mechanisms to support transactions to help with SW bugs
• Shadow registers, much bigger write buffer (100 KB?) under SW control
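To make the two mechanisms on this slide concrete, here is a toy model of shadow registers plus a software-controlled write buffer. It is an assumption-laden sketch, not a description of Crusoe or of any real CPU: the class `SpeculativeCore`, its method names, and the entry-count capacity are all invented for illustration. Stores are held in the buffer and become visible in memory only on commit; abort discards them and restores the shadowed register values.

```python
class SpeculativeCore:
    """Toy model: shadow registers + gated write buffer (hypothetical)."""

    def __init__(self, memory, registers, buffer_capacity=4):
        self.memory = memory                  # committed memory: addr -> value
        self.registers = dict(registers)      # working register file
        self.buffer_capacity = buffer_capacity  # max buffered addresses
        self._shadow = None                   # saved registers while in a transaction
        self._write_buffer = {}               # uncommitted stores: addr -> value

    def begin(self):
        # Copy the working registers into the shadow set.
        self._shadow = dict(self.registers)
        self._write_buffer.clear()

    def store(self, addr, value):
        # Stores are held back from memory until commit.
        is_new = addr not in self._write_buffer
        if is_new and len(self._write_buffer) >= self.buffer_capacity:
            raise OverflowError("write buffer full")
        self._write_buffer[addr] = value

    def load(self, addr):
        # The core sees its own uncommitted stores first.
        return self._write_buffer.get(addr, self.memory.get(addr, 0))

    def commit(self):
        # Drain the buffer to memory and drop the shadow copy.
        self.memory.update(self._write_buffer)
        self._write_buffer.clear()
        self._shadow = None

    def abort(self):
        # Discard buffered stores and go back to the old register values.
        self._write_buffer.clear()
        if self._shadow is not None:
            self.registers = self._shadow
            self._shadow = None
```

The capacity check shows why the slide asks for a much bigger write buffer: a transaction can only be undone while all of its stores still fit in the buffer.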
Summary
• Performance no longer the only problem
• Last 15 years: Processors 1000X faster, memories 1000X bigger allow SW 1000X bigger
• No help to SW in last 15 years
• 1 bug / 1000 lines of code, millions of lines of code
• Real problem is SW
• Like Simultaneous Multithreading, which uses existing OOO HW to improve throughput of threads, transaction support uses existing OOO HW and SW mechanisms to provide undo for SW bugs, SW repair
• Solve an important problem, not performance!