
PROCESSOR DESIGN Lan Jin Tsinghua University California State University-Fresno



  1. PROCESSOR DESIGN Lan Jin Tsinghua University California State University-Fresno • Computer Architecture Arena • Design Requirements and Constraints • Development of IC Technology • High-Performance Processor Architectures • Electronic Design Automation • Embedded Computing • Cellular Computing

  2. Computer Architecture Arena • Markets • Technology • Target applications

  3. Design Requirements and Constraints Requirements • High performance • Low cost • Low power Constraints fueled by increasing IC density and switching speed • Power dissipation • Wire-length barrier • Design and verification complexities

  4. Development of IC Technology • Moore’s Law ◊ Computing power becomes half as expensive every 18 to 24 months ◊ Number of transistors per chip doubles roughly every 18 months • IC predicted for 2005 by the SIA Roadmap ◊ 200 M transistors, 0.1 µm feature size ◊ 2.0-3.5 GHz ◊ 0.9-1.2 V; dynamic power could reach 150 W!
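
The doubling rule on this slide can be turned into a quick projection. The sketch below assumes a fixed 18-month doubling period; the function name `transistors_after` and the starting count are hypothetical, chosen only for illustration.

```python
def transistors_after(start_count: float, months: float,
                      doubling_months: float = 18.0) -> float:
    """Project a transistor count, assuming it doubles every `doubling_months`."""
    return start_count * 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Hypothetical starting point of 25 million transistors (not from the slides).
    for years in (0, 3, 6):
        print(years, "years:", round(transistors_after(25e6, years * 12)))
```

With an 18-month period, three years corresponds to two doublings, so the count grows fourfold.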

  5. Development of IC Technology (continued) The IC Design Paradigm • Transistor-component based design • Cell-based and RTL-based design • IP-based design for SOC development Key IC-Design Issues • Switching currents ◊ due to extremely high switching speeds • Optimization ◊ tighter constraints on speed, power, and cost • Asynchronization ◊ global, tightly skewed vs. local self-timed signals • More reuse • Design skills and design automation

  6. Development of IC Technology (continued) Cool-Chip Design Techniques • Dynamic power ∝ clock frequency × transistor switching activity × voltage² • Lower transistor threshold voltage • Multithreshold voltage to minimize leakage • Speed-adaptive variable threshold circuit • Software-selectable voltage matching the speed • Software-controlled clock frequency • Turning units on/off individually and dynamically • A wide variety of sleep modes • Power monitoring circuit
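
The relation on this slide, dynamic power proportional to clock frequency times switching activity times voltage squared, explains why voltage and frequency scaling are such effective cool-chip techniques. A minimal sketch follows, working only with ratios against a baseline; the function name `relative_dynamic_power` is an assumption made for illustration.

```python
def relative_dynamic_power(freq_scale: float, activity_scale: float,
                           voltage_scale: float) -> float:
    """Dynamic power relative to a baseline, from P ~ f * activity * V^2.

    Each argument is a ratio against the baseline (1.0 means unchanged).
    """
    return freq_scale * activity_scale * voltage_scale ** 2


if __name__ == "__main__":
    # Halving the supply voltage alone cuts dynamic power to a quarter.
    print(relative_dynamic_power(1.0, 1.0, 0.5))   # 0.25
    # Halving both voltage and clock frequency cuts it to an eighth.
    print(relative_dynamic_power(0.5, 1.0, 0.5))   # 0.125
```

Because voltage enters squared, lowering it saves more power than an equal reduction in clock frequency or switching activity.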

  7. High-Performance Processor Architectures How to use a billion transistors on a chip? • Straightforward approach: add more of ◊ on-chip multilevel cache and prefetch buffers ◊ hardware contexts and registers ◊ large distributed on-chip DRAM ◊ processors or processing elements • Extending traditional architectures ◊ a higher degree of ILP ◊ new possibilities for prediction and speculation ◊ the ability to overcome memory latencies ◊ on-chip multiprocessing and multithreading • Cooperating distributed system on a chip • Co-designed virtual machine

  8. High-Performance Processor Architectures Proposed New Processor Architectures • Advanced Superscalar: 16 or 32 instructions/cycle • Superspeculative Processor ◊ aggressive fine-grained speculation at every step • Simultaneous Multithreading Processor (SMT) • Trace (multiscalar) Processor ◊ coarse-grained traces on multiple distributed cores • Vector Intelligent RAM processor (V-IRAM) ◊ couples vector execution with large, high-bandwidth DRAM • On-chip Multiprocessor (CMP) ◊ the ability to overcome memory latencies • RAW (configurable) Processor ◊ compiler customizes the hardware to each application

  9. High-Performance Processor Architectures Extending ILP architecture • Deeper pipeline • Increasing use of prediction and speculation • Advanced superscalar processing • Wider instruction window • Highly intelligent optimizing compiler

  10. High-Performance Processor Architectures Trace Processor - metahardware approach • H/w & runtime s/w monitor pgm’s behavior. • Using aggressive prediction and speculative techniques to recast the pgm into traces. • Multiple PEs exploit trace-level parallelism. • Metahardware can be implemented as various helper engines.

  11. High-Performance Processor Architectures Instr.-Level Distributed Processing (ILDP) • Run distributed PUs at very high clock rate. • Simple logic of small PUs reduces critical paths and overall IC area and wire lengths. • Partition pgm to maximize local comm. • Asynchronous multiclock domains. • Monitored by co-designed virtual machine.

  12. High-Performance Processor Architectures Co-designed Virtual Machine for ILDP • Virtual machine monitor (VMM) is a hidden layer of s/w, codesigned with h/w. • VMM manages ILDP resources through tight interaction between h/w and low-level s/w. • VMM dynamically optimizes executing threads based on instr. dependencies and interinstr. comm.

  13. High-Performance Processor Architectures Clustered Dependence-Based Architecture • Organize PUs into clusters. • Steer dependent instructions to the same cluster. • Within a cluster, further divide into instruction, cache, integer, floating-point processing, etc. • Instructions within a cluster can be issued in order, while in different clusters out of order.
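
The steering idea on this slide (send dependent instructions to the same cluster) can be sketched in a few lines. This is only an illustrative heuristic under stated assumptions, not the mechanism of any particular design: instructions are modelled as a destination register plus source registers, an instruction follows the cluster of its first already-placed producer, and independent instructions go to the least-loaded cluster. The names `Instr` and `steer` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical instruction model: (destination register, source registers).
Instr = Tuple[str, Sequence[str]]


def steer(instrs: List[Instr], num_clusters: int) -> List[int]:
    """Assign each instruction to a cluster, keeping dependent ones together."""
    producer_cluster: Dict[str, int] = {}  # register -> cluster that last wrote it
    load = [0] * num_clusters              # instructions assigned to each cluster
    assignment: List[int] = []
    for dest, srcs in instrs:
        # Clusters that hold the producers of this instruction's operands.
        candidates = [producer_cluster[s] for s in srcs if s in producer_cluster]
        if candidates:
            cluster = candidates[0]          # follow the first producer found
        else:
            cluster = load.index(min(load))  # independent: least-loaded cluster
        producer_cluster[dest] = cluster
        load[cluster] += 1
        assignment.append(cluster)
    return assignment


if __name__ == "__main__":
    program = [("r1", []), ("r2", []), ("r3", ["r1"]),
               ("r4", ["r2"]), ("r5", ["r3", "r4"])]
    print(steer(program, 2))  # [0, 1, 0, 1, 0]
```

The two independent chains (r1 to r3 and r2 to r4) land in different clusters, so each chain can issue in order locally while the clusters proceed independently of each other.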

  14. Electronic Design Automation Why design automation? • EDA increases design productivity. • Time to market or time to prototype is crucial. • EDA reduces design cost, especially for low-volume custom-designed products. Automation Philosophy • Select a Pareto-optimal set from a design space. • Architectural framework and parameter range define the design space. • Select lower-level components from a library. • Space walker explores the design space. • Constructor, simulator, evaluator, …

  15. Electronic Design Automation PICO (Program In Chip Out) System • An architectural synthesis system written in C. • Emits VHDL for h/w and compiled s/w code. • Optimality defined by gate count and execution time.
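
Selecting a Pareto-optimal set over the two metrics named here, gate count and execution time, can be sketched as follows. This is a generic dominance filter and not PICO's actual algorithm; the `Design` tuple layout, the function names, and the sample design points are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical design point: (name, gate count, execution time); lower is better.
Design = Tuple[str, float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_set(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


if __name__ == "__main__":
    # Made-up design space for illustration only.
    space = [("A", 100, 9.0), ("B", 150, 6.0), ("C", 160, 7.0),
             ("D", 300, 4.0), ("E", 120, 9.5)]
    print([name for name, _, _ in pareto_set(space)])  # ['A', 'B', 'D']
```

Design C is dropped because B is both smaller and faster, and E is dropped because A is. The remaining points represent genuine trade-offs between area and speed.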

  16. Embedded Computing Three Features of Embedded Architecture • Specialization • Customization • Automation and verification Specialized architectures • SLI or SOC with a rich diversity of custom designs • Some GP architectures or OTS designs run special applications optimally, e.g., multimedia vector applications • Mostly domain- or application-specific systems ◊ relatively small, well-defined workloads ◊ irregular determinant system configuration ◊ minimized logic complexity and die size

  17. Embedded Computing Customization • A level of specialization beyond OTS • Better cost performance than OTS design • Incurs three nonrecurring expense (NRE) costs ◊ reducing architectural costs by reusing soft IP ◊ reducing physical design costs by reusing IP blocks ◊ reducing mask set costs by avoiding SLI design Automation and Verification • System-level simulation to start a project • EDA supports a sharp increase in IC complexity • EDA reduces design costs • EDA reduces time to market

  18. Cellular Computing Motivation for the CA Computing Paradigm • simple + vastly parallel + local = cellular computing • simple cell is the basic processor • parallelism on a much larger scale, measured by 10^x • local connectivity pattern carrying little information

  19. Cellular Computing Cellular (CP) vs. Parallel (PP) Processing • PP: a small number of powerful processors • CP: a vast number of small processing cells • PP: global scheduling and synchronization • CP: Cell computes its next state as a function of its neighboring values. No one cell has a global view of the entire system. • PP: Global communication is slow. • CP: Local communication can be much faster. • PP: Sequential-to-parallel partitioning is difficult. • CP: potential of addressing much larger problems based on local interaction rules. • PP: centralized • CP: naturally distributed
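
The cellular rule stated on this slide, where each cell computes its next state from its neighbors' values and no cell has a global view, can be illustrated with a one-dimensional binary cellular automaton. The sketch below assumes wrap-around edges and uses elementary Rule 30 as the default; both choices are made here for illustration and are not taken from the slides.

```python
from typing import List


def step(cells: List[int], rule: int = 30) -> List[int]:
    """Advance a 1-D binary cellular automaton by one step (wrap-around edges)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right  # value 0..7
        nxt.append((rule >> neighbourhood) & 1)  # that bit of the rule number
    return nxt


if __name__ == "__main__":
    row = [0, 0, 0, 1, 0, 0, 0]
    for _ in range(3):
        print("".join("#" if c else "." for c in row))
        row = step(row)
```

Every cell reads only three values, so all cells could be updated at the same time by independent hardware cells; the Python loop merely simulates that update sequentially.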

  20. Cellular Computing Specific Application Areas • Image processing • Pattern recognition • NP-complete problems from graph theory, network design, VLSI simulation, pgm optimization, etc. • Random number generation, cryptography, computational physics, chemistry, biology. • Environmental modeling, landslide simulation, social behavior, finance. • Molecular dynamics, molecular devices, nanoscale calculating machines.
