
UltraSPARC III



  1. UltraSPARC III Hari P. Ananthanarayanan Anand S. Rajan

  2. Presentation Outline • Background • Introduction to the UltraSPARC • Instruction Issue Unit • Integer Execute Unit • Floating Point Unit • Memory Subsystem

  3. Introduction • 3rd generation of Sun Microsystems’ 64-bit SPARC V9 architecture • Design targets: 600 MHz clock; 70 W power dissipation at 1.8 V; 0.25-micron process with 6 metal layers; transistor count of 12 million (RAM) and 4 million (logic); die size of 360 mm²

  4. A Tour of the UltraSPARC • 14-stage pipeline • Instruction Issue Unit occupies stages A through J • Integer Execution Unit occupies stages R through D • Data Cache Unit occupies stages E through W • Floating Point Unit occupies stages E through D

  5. Design Goals • Minimize latency of the integer execution path, which determines cycle time: limit each stage to approximately 8 logic gates • Minimize performance degradation due to clock overhead, e.g. on-chip caches are wave pipelined • Minimize branch misprediction latency through use of a miss queue

  6. Instruction Pipeline

  7. Instruction Issue Unit

  8. Instruction Issue Unit • UltraSPARC III is a static speculation machine: the compiler makes the speculated path sequential, which places fewer requirements on fetch • Stage A contains a small 32-byte buffer to support sequential prefetching into the instruction cache • I-cache access takes 2 cycles (stages P and F) and is wave pipelined

  9. Instruction Issue Unit – Contd. • ITLB access and branch prediction are overlapped with the I-cache access • The branch target address is generated only in stage B and redirected to stage A if the branch is taken • 20-entry instruction queue and 4-entry miss queue; the latter stores the alternate execution path to mitigate the effects of misprediction • Stages I and J decode and dispatch instructions; scoreboarding is used to check for operand dependencies
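
The interplay between the instruction queue and the miss queue can be sketched as follows. This is only an illustrative model, not the actual hardware behaviour: the class and method names are hypothetical, and it assumes for simplicity that the instruction queue holds only instructions fetched after the branch. Only the queue sizes (20 and 4) come from the slide.

```python
from collections import deque

IQ_SIZE = 20  # instruction queue entries (from the slide)
MQ_SIZE = 4   # miss queue entries (from the slide)


class IssueQueues:
    """Toy model: the miss queue keeps the start of the not-predicted path."""

    def __init__(self):
        self.iq = deque()  # instructions on the predicted path
        self.mq = deque()  # first few instructions of the alternate path

    def fetch(self, predicted_path, alternate_path):
        # Fill the instruction queue with the predicted path, up to capacity.
        for insn in predicted_path:
            if len(self.iq) < IQ_SIZE:
                self.iq.append(insn)
        # Stash the beginning of the other path in case the prediction is wrong.
        self.mq = deque(alternate_path[:MQ_SIZE])

    def mispredict(self):
        # Drop the wrong-path instructions and resume at once from the miss
        # queue instead of waiting for a fresh I-cache access.
        self.iq = self.mq
        self.mq = deque()


q = IssueQueues()
q.fetch(["t0", "t1"], ["s0", "s1", "s2", "s3", "s4"])
q.mispredict()
print(list(q.iq))
```

After the misprediction, the issue logic in this model already has up to four alternate-path instructions available, which is the latency saving the slide alludes to.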

  10. Branch Prediction Mechanism • Slightly modified gshare algorithm with 16K saturating 2-bit counters: the three low-order bits of the predictor index use PC information only • 8-cycle misprediction penalty, because the pipeline stages must be drained
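
A gshare-style predictor of this shape can be sketched as below. The table size (16K two-bit counters) and the rule that the three low-order index bits come from the PC alone follow the slide; the history length, the initial counter value, and all names are assumptions made for illustration.

```python
class GsharePredictor:
    """Gshare-style predictor sketch; only the table size is from the slide."""

    def __init__(self, entries=16 * 1024, history_bits=11):
        self.index_bits = entries.bit_length() - 1  # 14 bits for 16K entries
        self.history_bits = history_bits            # assumed, not given on the slide
        self.counters = [1] * entries               # 2-bit counters, start weakly not-taken
        self.history = 0                            # global branch history register

    def _index(self, pc):
        mask = (1 << self.index_bits) - 1
        word = (pc >> 2) & mask                     # drop the byte offset of 4-byte instructions
        # Shift the history left by 3 so that the three low-order index bits
        # are taken from the PC only.
        return word ^ ((self.history << 3) & mask)

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # upper half of the range means taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)  # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)  # saturate at 0
        hist_mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & hist_mask


p = GsharePredictor()
for _ in range(20):
    p.update(0x4000, True)
print(p.predict(0x4000))
```

Keeping the low index bits free of history means that neighbouring instructions always map to neighbouring counters, which is one plausible reason for the modification; the slide itself does not state the motivation.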

  11. Integer Execute Unit • Executes loads, stores, shift, arithmetic, logical and branch instructions • Up to 4 integer instructions per cycle: 2 arithmetic/logical/shift, 1 load/store and 1 branch • Entire data path uses dynamic precharge circuits – this is the E stage • Future file technique to handle exceptions, built on a working and architectural register file (WARF)

  12. Integer Execute Unit – Contd. • Integer execution reads data from the working register file (WRF) in the R stage and writes to it in the C stage • The architectural register file (ARF) is copied into the WRF in case of exceptions • Results are committed into the ARF at the end of the pipe • Integer multiply and divide are not pipelined and are executed in the ASU; the strategy is to decouple less frequently executed instructions
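
The future file idea from the two slides above can be sketched as a small model. The register count, the in-order list of pending results, and all names are assumptions; only the WRF/ARF roles (early write to the WRF, commit to the ARF at the end of the pipe, ARF copied back on an exception) come from the slides.

```python
from collections import deque


class FutureFile:
    """Toy model of a working (WRF) and architectural (ARF) register file."""

    def __init__(self, num_regs=32):
        self.wrf = [0] * num_regs   # speculative state, read early in the pipe
        self.arf = [0] * num_regs   # committed state, updated at the end of the pipe
        self.pending = deque()      # in-flight (reg, value) results, program order

    def read(self, reg):
        return self.wrf[reg]        # later instructions see results immediately

    def execute(self, reg, value):
        self.wrf[reg] = value       # early write to the working file
        self.pending.append((reg, value))

    def commit(self):
        reg, value = self.pending.popleft()
        self.arf[reg] = value       # oldest result becomes architectural state

    def exception(self):
        self.pending.clear()        # squash everything not yet committed
        self.wrf = list(self.arf)   # restore WRF from the precise ARF state


ff = FutureFile()
ff.execute(1, 10)
ff.commit()
ff.execute(2, 20)
ff.exception()
print(ff.read(1), ff.read(2))
```

In this model the WRF gives dependent instructions fast access to fresh results, while the ARF always holds a precise state to fall back on.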

  13. Floating Point Unit • Executes floating point and partitioned fixed point (graphics) instructions • 3 datapaths: 4-stage multiply; 4-stage add/subtract/compare; unpipelined divide/square root • FPU is pushed back by one stage to keep the integer unit compact (countering the effect of wire delays)

  14. Data Cache Unit

  15. Memory – L1 Data Cache • 64 KB, 4-way, 32-byte line • 2-cycle access time, wave pipelined • Sum-addressed memory (SAM): combines address addition and word line decode
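
The principle behind sum-addressed decoding can be illustrated with a carry-free equality test: each word line `k` checks whether `a + b == k` without waiting for a carry-propagate add. The sketch below shows the general technique under simplifying assumptions (plain n-bit operands, no carry-in from lower address bits); the function names are hypothetical and this is not a description of the actual UltraSPARC III circuit.

```python
def sam_match(a, b, k, n_bits):
    """Return True if (a + b) mod 2**n_bits == k, using only bitwise logic."""
    mask = (1 << n_bits) - 1
    # Carry-in that each bit position would need for the sum bit to equal k.
    required_cin = (a ^ b ^ k) & mask
    # Carry-out each bit position would then produce.
    cout = ((a & b) | ((a ^ b) & ~k)) & mask
    # The carry into bit i is the carry out of bit i-1; the carry into bit 0 is 0.
    produced_cin = (cout << 1) & mask
    return required_cin == produced_cin


def sam_decode(a, b, n_bits):
    """Model the decoder: every word line runs the test; one line matches."""
    return [k for k in range(1 << n_bits) if sam_match(a, b, k, n_bits)]


print(sam_decode(5, 3, 4))
```

Because every bit position is checked independently, the test has constant logic depth, which is what lets the addition be folded into the word line decode.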

  16. Memory – Prefetch Cache • 2 KB, 2-way, 64-byte line • Multi-ported SRAM • Streaming data possible (similar to stream buffers) • Detects striding loads: hardware prefetches are issued independently of software prefetches
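
Stride detection of the kind mentioned above can be sketched as follows. The slide does not say how the hardware tracks strides, so the per-load-PC table, the "same stride seen twice" trigger, and all names here are assumptions chosen for illustration.

```python
class StrideDetector:
    """Toy stride detector keyed by load PC (an assumed organisation)."""

    def __init__(self):
        self.table = {}  # load PC -> (last address, last observed stride)

    def observe(self, pc, addr):
        """Record a load; return an address to prefetch, or None."""
        prefetch = None
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = (addr, 0)        # first sighting, no stride yet
            return None
        last_addr, last_stride = entry
        stride = addr - last_addr
        if stride != 0 and stride == last_stride:
            prefetch = addr + stride          # stride confirmed, fetch ahead
        self.table[pc] = (addr, stride)
        return prefetch


d = StrideDetector()
for address in (1000, 1064, 1128):
    print(d.observe(0x100, address))
```

In this sketch the third access confirms a 64-byte stride, so the detector asks for the next line in the sequence without any software prefetch instruction.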

  17. Memory – Write Cache • 2 KB, 4-way, 64-byte line • Reduces bandwidth consumed by store traffic • Sole source of on-chip dirty data, which makes on-chip cache consistency easy to handle • Write-validate scheme: loads multiplex between L2 bytes and write-cache bytes
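
The write-validate scheme can be sketched with per-byte valid bits: a store marks only the bytes it writes, and a load picks each byte from the write cache if it is valid there and from the L2 line otherwise. The per-byte granularity and all names are assumptions; the 64-byte line size comes from the slide.

```python
LINE_SIZE = 64  # write-cache line size in bytes (from the slide)


class WriteCacheLine:
    """One write-cache line with per-byte valid bits (assumed granularity)."""

    def __init__(self):
        self.data = bytearray(LINE_SIZE)
        self.valid = [False] * LINE_SIZE

    def store(self, offset, payload):
        # A store allocates without fetching the line; only written bytes are valid.
        for i, byte in enumerate(payload):
            self.data[offset + i] = byte
            self.valid[offset + i] = True

    def load(self, offset, size, l2_line):
        # Per-byte multiplex: take the write-cache byte if valid, else the L2 byte.
        return bytes(
            self.data[i] if self.valid[i] else l2_line[i]
            for i in range(offset, offset + size)
        )


l2_line = bytes(range(LINE_SIZE))
line = WriteCacheLine()
line.store(2, b"\xaa\xbb")
print(line.load(0, 4, l2_line).hex())
```

Because a store never has to fetch the rest of the line first, store traffic to L2 is reduced, at the cost of the byte-level merge on loads.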

  18. External Memory Interface • L2 cache: direct-mapped, unified data and instruction, 12-cycle access time • Cache controller allows programmable support of 4 MB or 8 MB • On-chip main memory controller • On-chip tags allow an associative L2 cache design without a latency penalty

  19. Layout of UltraSPARC III
