
EPIC Architecture (Explicitly Parallel Instruction Computing)

Yangyang Wen. CDA5160, Advanced Computer Architecture I. University of Central Florida.


Presentation Transcript


  1. EPIC Architecture (Explicitly Parallel Instruction Computing). Yangyang Wen, CDA5160: Advanced Computer Architecture I, University of Central Florida

  2. Outline • What is EPIC? • EPIC Philosophy • Architectural Features Supporting EPIC • Intel’s IA-64 Architectural Features • IA-64’s Key Technologies • Summary and Reference

  3. Traditional Architectures: Limited Parallelism. [Figure: original source code -> compiler -> sequential machine code -> hardware, which parallelizes the code across multiple functional units.] The available execution units are not used efficiently; today's processors are often 60% idle.

  4. EPIC Architecture: Explicit Parallelism. [Figure: original source code -> EPIC compiler -> parallel machine code -> hardware with multiple functional units.] The compiler views a wider scope and produces better parallel machine code, which makes more efficient use of execution resources and increases parallel execution.

  5. What is EPIC? EPIC stands for Explicitly Parallel Instruction Computing. An EPIC architecture provides features that allow the compiler to take a proactive role in enhancing instruction-level parallelism (ILP) without unacceptable hardware complexity.

  6. EPIC’s Performance

  7. EPIC Design Philosophy • EPIC gives the compiler advanced features for enhancing ILP, such as predication and speculation. • With EPIC, the plan of execution (POE) is designed at compile time and communicated to the hardware. • EPIC requires massive hardware resources for parallel execution.

  8. Introducing IA-64 • IA-64 is Intel's first 64-bit architecture. • It is the first commercially available instance of an EPIC ISA. • It is the first architecture to bring these ILP features to general-purpose microprocessors.

  9. IA-64’s Architectural Basics • Explicit Parallelism • Enhanced ILP • Compiler-oriented • Extremely large physical memory • A huge virtual address space for applications • 64-bit computation • Extremely large register files

  10. IA-64's Key Technologies • Instruction Bundling • Predication • Control Speculation • Data Speculation • Software Pipelining

  11. Instruction Bundling • Uses a form of VLIW architecture • Three 41-bit instructions are combined into one 128-bit bundle • Parallel instructions are executed in groups • Template bits help decode and route instructions and mark the end of groups of parallel instructions. [Figure: 128-bit bundle, bit 127 down to bit 0: Instruction 2 (41 bits) | Instruction 1 (41 bits) | Instruction 0 (41 bits) | Template (the remaining 5 bits)]
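The bundle layout on this slide can be sketched as simple bit packing. This is a minimal illustration, not an encoder for real IA-64 instructions: the function names `pack_bundle` and `unpack_bundle` are hypothetical, the slot values are arbitrary placeholder integers, and the 5-bit template width is taken from the arithmetic 128 - 3 × 41 = 5.

```python
TEMPLATE_BITS = 5            # 128 - 3 * 41 bits left over for the template
SLOT_BITS = 41               # width of each instruction slot
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    for slot in (slot0, slot1, slot2):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction slot must fit in 41 bits")
    # Template occupies the low-order bits; instruction 2 ends at bit 127.
    return (template
            | (slot0 << TEMPLATE_BITS)
            | (slot1 << (TEMPLATE_BITS + SLOT_BITS))
            | (slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS)))


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, (slot0, slot1, slot2))."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    )
    return template, slots
```

Packing and then unpacking returns the original fields, and a bundle with every field at its maximum value fills exactly 128 bits.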

  12. ILP Bottlenecks • Branches • Branches are addressed with predication. • Branch mispredictions cause a 20% to 30% loss in processor performance. • Memory latency • Latency is the time it takes to get data from memory. The longer it takes to fetch code and data from memory, the longer the CPU sits idle. • For memory latency, loads are the big problem, not stores.

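  13. Predication. Source: if A>B then S+=A else S+=B end if. (a) Traditional branch prediction: the processor predicts one path (here S+=A) and executes it; if the prediction is wrong, S+=A is thrown away and S+=B is executed before continuing with *P=S. (b) IA-64 predication: the compare A>B sets a predicate, S+=A and S+=B are both issued, only the one whose predicate is true takes effect, and execution continues with *P=S. Branching is a major cause of lost performance.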
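The if/else on this slide can be modeled in software to show what predication does. This is only a sketch of the idea: `predicated` is a hypothetical helper standing in for a predicated instruction, and Python itself still evaluates a conditional to select the result, whereas the hardware simply discards the result of an instruction whose predicate is false.

```python
def sum_with_branch(a, b, s):
    """Reference version: a conventional if/else branch."""
    if a > b:
        s += a
    else:
        s += b
    return s


def predicated(pred, old_value, compute):
    """Model '(pred) dest = compute()': the operation is always issued,
    but its result is committed only when the predicate is true."""
    result = compute()
    return result if pred else old_value


def sum_predicated(a, b, s):
    """If-converted version: one compare, two predicated adds, no branch."""
    p1 = a > b        # compare sets a predicate ...
    p2 = not p1       # ... and its complement
    s = predicated(p1, s, lambda: s + a)   # (p1) S += A
    s = predicated(p2, s, lambda: s + b)   # (p2) S += B
    return s
```

Both versions return the same value for any inputs; the predicated one has no control-flow dependence between the compare and the adds, which is what lets the two adds be scheduled in parallel.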

  14. EPIC Predication Process: branch candidate -> compiler finds which instructions to execute in parallel -> instructions are marked with an ID -> instructions are packed into bundles -> processor executes both paths in parallel -> processor checks the predicates and stores the correct results

  15. Predication Benefits • Reduces branches • Reduces misprediction penalties • Shortens critical paths

  16. Control Speculation. Traditional architectures: instr 1; instr 2; ...; br; load a[ ]; use. The branch is a barrier: elevating the load above a branch is not possible. IA-64: ld.s r8=a[ ]; instr 1; instr 2; ...; br; chk.s r8; use. This allows elevation of the load, even above a branch. Memory latency is a major performance bottleneck.

  17. Introducing the Token Bit. IA-64 sequence: ld.s r8=a[ ] (exception detection); instr 1; instr 2 (exception propagates); br; chk.s r8 (exception delivery); use. • When the load is elevated, exception detection happens at the ld.s. • If the load address is valid, the load completes normally. • If the load address is invalid, the exception is deferred: the token bit is set on the destination register and propagates through the instructions that use it. • If execution reaches chk.s and chk.s detects the token bit, it jumps to fix-up code, which re-executes the load.
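The token-bit mechanism can be sketched as a small simulation. Everything here is illustrative: memory is a plain dictionary, `NAT` is a hypothetical sentinel standing in for the token bit, and `ld`, `ld_s`, `add` and `speculative_use` are made-up names, not IA-64 semantics in full.

```python
NAT = object()   # sentinel standing in for the token bit on a register


def ld(memory, addr):
    """Non-speculative load: an invalid address faults immediately."""
    if addr not in memory:
        raise MemoryError(f"invalid address {addr}")
    return memory[addr]


def ld_s(memory, addr):
    """Speculative load: an invalid address sets the token, no fault."""
    return memory[addr] if addr in memory else NAT


def add(x, y):
    """Instructions that consume a token propagate it instead of faulting."""
    return NAT if x is NAT or y is NAT else x + y


def speculative_use(memory, addr, branch_to_use):
    r8 = ld_s(memory, addr)      # ld.s hoisted above the branch
    r9 = add(r8, 1)              # dependent work, also done early
    if not branch_to_use:        # br: this path never needed the load
        return None
    if r8 is NAT:                # chk.s r8: token found, run fix-up code
        r8 = ld(memory, addr)    # re-execute the load; may now fault
        r9 = add(r8, 1)
    return r9
```

With a valid address the early result is used directly. With an invalid address the fault is raised only if the path that actually uses the value is taken, which is the point of deferring the exception.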

  18. Data Speculation. Traditional architectures: instr 1; instr 2; ...; store; load; use. The store is a barrier: the load can't be elevated above it, which prevents reordering of instructions. IA-64: ld.a (recorded in the ALAT); instr 1; instr 2; ...; store; ld.c or chk.a; use. This allows the compiler to elevate the load even when it isn't sure whether the memory references overlap.

  19. Advanced Load Address Table (ALAT). [Figure: a table of (reg #, address) entries, accessed by ld.a reg# = ..., store, and chk.a reg#.] • An elevated ld.a inserts an entry (register number, address) into the ALAT. • A store removes ALAT entries whose address overlaps the store address. • chk.a looks up its register; if no entry is found, there was a conflict, and execution jumps to fix-up code that re-executes the load.
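The three ALAT rules can be sketched as a small table keyed by register. This is a simplified model under stated assumptions: the class and method names are hypothetical, memory is a dictionary, and "overlap" is reduced to an exact address match rather than a real byte-range comparison.

```python
class ALAT:
    """Toy Advanced Load Address Table: register name -> load address."""

    def __init__(self):
        self.entries = {}

    def advanced_load(self, reg, addr, memory):
        """ld.a: perform the load early and record (reg, addr)."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, addr, value, memory):
        """store: write memory and drop entries with a matching address."""
        memory[addr] = value
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check(self, reg):
        """chk.a: True if the entry survived, i.e. no conflicting store."""
        return reg in self.entries


def load_hoisted_above_store(memory, store_addr, load_addr, value):
    alat = ALAT()
    r8 = alat.advanced_load("r8", load_addr, memory)   # ld.a, moved up
    alat.store(store_addr, value, memory)              # possibly aliasing store
    if not alat.check("r8"):                           # chk.a r8
        r8 = memory[load_addr]                         # fix-up: reload
    return r8
```

When the store goes to a different address, the early value is kept. When it hits the same address, the entry is gone at the check and the reload picks up the stored value.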

  20. Speculation Benefits • Reduces impact of memory latency • Study demonstrates performance improvement of 80% when combined with predication • Greatest improvement to code with many cache accesses • Scheduling flexibility enables new levels of performance headroom

  21. Software Pipelining • Overlaps the execution of different loop iterations • Gets more iterations done in the same amount of time

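  22. Software Pipelining Example
      Source loop:
        for (i = 0; i < 1000; i++) x[i] = x[i] + s;
      Original loop:
        Loop: Ld   f0, 0(r1)     ; load x[i]
              Add  f0, f0, f1    ; add s (held in f1)
              Sd   f0, 0(r1)     ; store x[i]
              Add  r1, r1, 8     ; advance the element pointer
              Subi r2, r2, 1     ; decrement the loop counter
              Bnez r2, Loop      ; repeat while the counter is nonzero
      After software pipelining:
        Loop: Sd   f2, -4(r1)    ; store the result of an earlier iteration
              Add  f2, f0, f1    ; add for the following iteration
              Subi r2, r2, 1     ; decrement the loop counter
              Ld   f0, 4(r1)     ; load for a later iteration
              Bnez r2, Loop      ; repeat while the counter is nonzero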
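The same transformation can be sketched at source level to show how the pipelined kernel relates to the original loop. This is an illustrative model, not the slide's assembly: the function names are hypothetical, the variables `f0` and `f2` mirror the registers in the example, and the prologue and epilogue (which the slide does not show) are written out explicitly.

```python
def add_scalar_simple(x, s):
    """Original loop: load, add, store for one element per iteration."""
    out = list(x)
    for i in range(len(out)):
        f0 = out[i]        # load
        f2 = f0 + s        # add
        out[i] = f2        # store
    return out


def add_scalar_pipelined(x, s):
    """Software-pipelined loop: each kernel iteration stores element i,
    adds for element i+1 and loads element i+2."""
    out = list(x)
    n = len(out)
    if n < 2:                      # too short to fill the pipeline
        return add_scalar_simple(out, s)
    # Prologue: start the first two iterations.
    f0 = out[0]
    f2 = f0 + s
    f0 = out[1]
    # Kernel: three different iterations are in flight at once.
    for i in range(n - 2):
        out[i] = f2                # store for iteration i
        f2 = f0 + s                # add for iteration i + 1
        f0 = out[i + 2]            # load for iteration i + 2
    # Epilogue: drain the last two iterations.
    out[n - 2] = f2
    f2 = f0 + s
    out[n - 1] = f2
    return out
```

Within the kernel, the store, add and load belong to different iterations and do not depend on one another, so they can issue together; that independence is what the pipelined assembly loop exploits.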

  23. Software Pipelining Advantages • Traditionally performed through loop unrolling • Less code compared with loop unrolling, and increased regularity • Smaller code means fewer cache misses • Especially useful for integer code with a small number of loop iterations

  24. Software Pipelining Disadvantages • Requires many additional instructions to manage the loop • Without hardware support, the overhead may greatly increase code size • Typically only used in special technical computing applications

  25. IA-64 Features Supporting Software Pipelining • Full predication • Circular Buffer of General and FP Registers • Loop Branches Decrement RRBs (register rename bases)
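How a circular register buffer and a decrementing rename base work together can be sketched as follows. This is a toy model with hypothetical names: logical register numbers are zero-based offsets into the rotating region (on IA-64 that region starts at r32 for general registers), and `rotate` stands in for the effect of a loop branch on the RRB.

```python
class RotatingRegisters:
    """Toy circular register file addressed through a rename base (RRB)."""

    def __init__(self, size):
        self.size = size
        self.physical = [None] * size
        self.rrb = 0

    def _index(self, logical):
        # Logical register numbers are renamed by adding the RRB, modulo size.
        return (logical + self.rrb) % self.size

    def write(self, logical, value):
        self.physical[self._index(logical)] = value

    def read(self, logical):
        return self.physical[self._index(logical)]

    def rotate(self):
        """Model a loop branch decrementing the RRB by one."""
        self.rrb = (self.rrb - 1) % self.size
```

After one rotation, a value written to logical register 0 is read back as logical register 1, and after two rotations as register 2. Each loop iteration therefore gets a fresh register 0 without any copy instructions, which removes much of the loop-management overhead listed on the previous slide.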

  26. Summary • Predication removes branches • Parallel compares increase parallelism • Benefits complex control flow: large databases • Speculation reduces memory latency impact • IA-64 removes recovery from critical path • Benefits applications with poor cache locality: server applications, OS • S/W pipelining support with minimal overhead enables broad usage • Performance for small integer loops with unknown trip counts as well as monster FP loops

  27. Reference • M. S. Schlansker, "EPIC: Explicitly Parallel Instruction Computing", Computer, vol. ?, no. ?, pp. 37-45, 2000. • Jerry Huck et al., "Introducing the IA-64 Architecture", Sept.-Oct. 2000, pp. 12-23. • Carole Dulong, "The IA-64 Architecture at Work", Computing Practices.
