
UltraSPARC T2 Sun Microsystems

CS 433 – Computer System Organization. Manish Agrawal, Brett Daniel, Josh Smith. UltraSPARC T2, Sun Microsystems. Overview of the UltraSPARC T2: multi-threaded (8), multi-core (8) CPU; frequency ranges from 900 MHz to 1.4 GHz; powered by less than 95 watts (nominal) with less than 2 watts per thread.


Presentation Transcript


  1. CS 433 – Computer System Organization
     Manish Agrawal, Brett Daniel, Josh Smith
     UltraSPARC T2, Sun Microsystems

  2. Overview of the UltraSPARC T2
     • Multi-threaded (8 threads per core), multi-core (8 cores) CPU
     • Frequency ranges from 900 MHz to 1.4 GHz
     • Powered by less than 95 watts (nominal), with less than 2 watts per thread
     • Integrated
       • 10 Gb Ethernet networking
       • PCI Express I/O expansion
       • FPU and cryptographic processing units per core
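The per-thread power figure follows from the thread count. A minimal sketch of that arithmetic, assuming the nominal 95 W is spread evenly over all hardware threads (the function names are illustrative only):

```python
# Figures taken from the overview slide.
CORES = 8
THREADS_PER_CORE = 8
NOMINAL_POWER_WATTS = 95  # the slide says "less than 95 watts", so this is an upper bound


def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Total number of hardware threads on the chip."""
    return cores * threads_per_core


def watts_per_thread(total_watts: float, threads: int) -> float:
    """Average power per hardware thread, assuming an even split."""
    return total_watts / threads


threads = hardware_threads(CORES, THREADS_PER_CORE)
print(threads)                                                   # 64
print(round(watts_per_thread(NOMINAL_POWER_WATTS, threads), 2))  # 1.48
```

With 64 threads, the 95 W upper bound works out to roughly 1.5 W per thread, consistent with the "less than 2 watts per thread" claim.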

  3. History
     • Codename Niagara2
     • Member of SPARC family
     • 2 previous multi-core processors
       • UltraSPARC IV
       • UltraSPARC IV+
     • UltraSPARC T1 (first multi-core and multi-threaded)
       • Released 14 November 2005
       • 4, 6, or 8 cores with 4 threads each
     • UltraSPARC T2 released 7 August 2007
       • Now 8 threads per core (instead of 4)

  4. Motivation
     • Instead of optimizing each core, the overall goal was to run as many concurrent threads as possible, keeping each core’s pipeline fully utilized
     • Each core is less complex than those of current high-end processors, allowing 8 cores to fit on the same die
     • Does not feature out-of-order execution or a sizable amount of cache
     • Each core is a barrel processor
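The barrel-processor idea can be illustrated with a small scheduling sketch. This is a simplified model, not the T2's actual selection logic: it assumes a round-robin rotation that skips threads that cannot issue, whereas a strict barrel processor rotates unconditionally. The `barrel_schedule` function and its input format are hypothetical.

```python
from typing import List, Optional


def barrel_schedule(ready: List[List[bool]]) -> List[Optional[int]]:
    """Pick one thread per cycle in round-robin order, skipping stalled threads.

    ready[cycle][thread] is True when that thread can issue in that cycle.
    Returns the chosen thread id for each cycle, or None if every thread is stalled.
    """
    n_threads = len(ready[0]) if ready else 0
    last = -1  # thread picked most recently; -1 means "none yet"
    picks: List[Optional[int]] = []
    for cycle_ready in ready:
        choice = None
        # Start searching from the thread after the one picked last.
        for offset in range(1, n_threads + 1):
            candidate = (last + offset) % n_threads
            if cycle_ready[candidate]:
                choice = candidate
                break
        if choice is not None:
            last = choice
        picks.append(choice)
    return picks


# Four threads; thread 1 is stalled (e.g. on a cache miss) in cycles 1 and 2.
ready = [
    [True, True, True, True],
    [True, False, True, True],
    [True, False, True, True],
    [True, True, True, True],
]
print(barrel_schedule(ready))  # [0, 2, 3, 0]
```

Even with one thread stalled, the pipeline issues an instruction every cycle, which is the point of trading per-core complexity for thread count.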

  5. Components
     • 8 fully pipelined FPUs
     • 8 SPUs (cryptographic processing units)
     • 2 integer ALUs per core, each one shared by a group of four threads
     • 4 MB L2 cache (8 banks, 16-way associative)
     • 8 KB data cache and 16 KB instruction cache
     • Two 10 Gb Ethernet ports and one PCIe port
     Source: http://www.sun.com/processors/UltraSPARC-T2/datasheet.pdf
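To make the L2 figures concrete, the following sketch derives the set counts implied by a 4 MB, 8-bank, 16-way cache. The 64-byte line size is an assumption that the slide does not state, and the `CacheGeometry` class is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    banks: int
    line_bytes: int  # ASSUMPTION: line size is not given on the slide

    @property
    def total_sets(self) -> int:
        """Number of sets across all banks: capacity / (ways * line size)."""
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def sets_per_bank(self) -> int:
        return self.total_sets // self.banks

    @property
    def bytes_per_bank(self) -> int:
        return self.size_bytes // self.banks


l2 = CacheGeometry(size_bytes=4 * 1024 * 1024, ways=16, banks=8, line_bytes=64)
print(l2.total_sets)      # 4096
print(l2.sets_per_bank)   # 512
print(l2.bytes_per_bank)  # 524288 (512 KB)
```

Splitting the cache into banks lets several cores access the L2 in the same cycle, as long as their requests fall in different banks.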

  6. Chip (die layout figure)
     Source: http://www.opensparc.net/images/stories/t2/ultrasparc-t2-layout.png

  7. For a single thread
     • Memory is THE bottleneck to improving performance
     • Commercial server workloads exhibit poor memory locality
     • Only a modest throughput speedup is possible by reducing compute time
     • Conventional single-thread processors optimized for ILP have low utilization
     With many threads
     • It’s possible to find something to execute every cycle
     • Significant throughput speedups are possible
     • Processor utilization is much higher
     Source: Golla R, “Niagara2: A Highly Threaded Server-on-a-Chip,” Oct. 2006, http://www.opensparc.net/pubs/preszo//06/04-Sun-Golla.pdf
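The single-thread versus many-thread argument can be sketched with a toy model in which each thread alternates between a burst of compute cycles and a memory stall. The cycle counts below are hypothetical, not measured T2 figures, and the model ignores contention between threads.

```python
def utilization(compute: float, stall: float, threads: int = 1) -> float:
    """Fraction of cycles the pipeline is busy.

    Each thread computes for `compute` cycles, then stalls on memory for
    `stall` cycles. Other threads can fill the stall, up to 100% utilization.
    """
    return min(1.0, threads * compute / (compute + stall))


def speedup_from_faster_compute(compute: float, stall: float, factor: float) -> float:
    """Single-thread speedup when only the compute part gets `factor` times faster."""
    return (compute + stall) / (compute / factor + stall)


# Hypothetical workload: 20 compute cycles per 100-cycle memory stall.
print(round(utilization(20, 100), 3))                     # 0.167
print(round(utilization(20, 100, threads=4), 3))          # 0.667
print(utilization(20, 100, threads=8))                    # 1.0
print(round(speedup_from_faster_compute(20, 100, 2), 3))  # 1.091
```

Under these assumptions, doubling compute speed gains only about 9%, while going from one thread to eight takes the pipeline from about 17% busy to fully busy.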

  8. Engineering Solutions
     • Goals of the T2 project were:
       • Double UltraSPARC T1's throughput and throughput/watt
       • Improve UltraSPARC T1's FP single-thread and throughput performance (T1 was unable to handle workloads with more than 1-3% FP instructions)
       • Minimize required area for these improvements
     • Considered doubling the number of UltraSPARC T1 cores
       • 16 cores of 4 threads each
       • Takes too much die area
       • No area left for improving FP performance

  9. Core Architecture (figure)
     Source: http://realworldtech.com/page.cfm?ArticleID=RWT090406012516&p=2

  10. Core Architecture (figure)
      Source: http://blogs.sun.com/sprack/resource/N2_Announce_Breakout_final.pdf

  11. Efficient in-order single issue pipeline
      Integer pipeline: Fetch, Cache, Pick, Decode, Execute, Mem, Bypass, W
      Floating-point pipeline: Fetch, Cache, Pick, Decode, Execute, Fx1 ... Fx5, FB, FW
      • Eight-stage integer pipeline
        • Pick selects 2 threads for execution (this stage was added for T2)
        • In the bypass stage, the load/store unit (LSU) forwards data to the integer register files (IRFs) with sufficient write timing margin. All integer operations pass through the bypass stage.
      • 12-stage floating point pipeline
        • 6-cycle latency for dependent FP ops
        • Integer multiplies are pipelined between different threads but block within the same thread
        • Integer divide is a long-latency operation and is not pipelined between different threads
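The pick stage can be sketched as two independent selectors, one per group of four threads, each choosing one ready thread per cycle. The least-recently-picked policy used here is an assumption for illustration (the slides only say that two threads are selected), and `ThreadGroup` and `pick_stage` are hypothetical names.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Set, Tuple


class ThreadGroup:
    """One group of hardware threads sharing an integer ALU."""

    def __init__(self, thread_ids: Iterable[int]) -> None:
        # Front of the deque is the least recently picked thread.
        self._order: Deque[int] = deque(thread_ids)

    def pick(self, ready: Set[int]) -> Optional[int]:
        """Return the least recently picked ready thread, or None if none is ready."""
        for tid in list(self._order):
            if tid in ready:
                # Move the picked thread to the back (most recently picked).
                self._order.remove(tid)
                self._order.append(tid)
                return tid
        return None


def pick_stage(groups: List[ThreadGroup], ready: Set[int]) -> Tuple[Optional[int], ...]:
    """Pick at most one thread per group for this cycle."""
    return tuple(group.pick(ready) for group in groups)


groups = [ThreadGroup(range(0, 4)), ThreadGroup(range(4, 8))]
print(pick_stage(groups, {0, 1, 5}))  # (0, 5)
print(pick_stage(groups, {0, 1, 5}))  # (1, 5)
print(pick_stage(groups, {4}))        # (None, 4)
```

In the second cycle the first group moves on to thread 1 even though thread 0 is still ready, so ready threads in a group take turns on their shared ALU.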

  12. “Server on a chip”
      • Two 10/1 Gigabit Ethernet ports
      • Integrated PCI-Express
      • Embedded cryptography
      Source: http://www.podtech.net/home/1293/niagara-2-server-on-a-chip/

  13. Comparison Against AMD Opteron
      • 4 cores max
      • Allows multiprocessors
      • “HyperTransport” between cores
      • Shared execution units

  14. Comparison Against Intel Core
      • 4 cores
        • 6 in development
        • 8+ in “Nehalem”
      • Allows multiprocessors
      • Shared FSB

  15. OpenSPARC
      • Open source release under GNU GPL
      • Verilog, verification/tests, simulation/modeling tools
      • ISA specification
      • http://www.opensparc.net/
      "We truly believe OpenSparc will blossom in the future because it is open." (Naxin Zhang, Polaris Micro)

  16. Future
      • Niagara III: “Victoria Falls”
      • "Pushing up threads and cores"
      • Retain simplicity: in-order processing
      • Target multiprocessor servers

  17. Video: http://www.sun.com/processors/UltraSPARC-T2/gallery/index.xml?p=1&s=1

  18. Sources
      • http://www.sun.com/processors/UltraSPARC-T2/
      • http://www.opensparc.net/
      • http://www.opensparc.net/pubs/preszo/06/04-Sun-Golla.pdf
      • http://realworldtech.com/page.cfm?ArticleID=RWT090406012516
      • http://www.news.com/2100-1006_3-6127137.html
      • http://www.news.com/2100-7344_3-6183562.html
