Cell Broadband Engine Interconnect and Memory Interface

Presentation Transcript

  1. Systems and Technology Group
     Cell Broadband Engine Interconnect and Memory Interface
     Scott Clark, Kent Haselhorst, Kerry Imming, John Irish, Dave Krolak, Tolga Ozguner
     IBM Systems and Technology Group, Rochester, Minnesota
     © IBM Corporation, 2005, All rights reserved

  2. Agenda
     - Cell Overview
     - Interconnect Challenges
     - Bus Interface Controller
     - Element Interconnect Bus
     - Memory Interface Controller
     - Summary

  3. Cell Broadband Engine Key Features
     The first-generation Cell processor consists of:
     - A Power Processor Element (PPE)
     - 8 Synergistic Processor Elements (SPE) with Synergistic Memory Flow Control (SMF)
     - A high-bandwidth Element Interconnect Bus (EIB)
     - A Bus Interface Controller (BIC) with two configurable I/O interfaces
     - A Memory Interface Controller (MIC)
     Design points:
     - Synergistic Processor Elements for high (Fl)ops / Watt
     - 64-bit Power Architecture with VMX for traditional computation
     [Figure: Cell BE block diagram. Eight SPEs (each an SPU with SXU, LS, and SMF), the PPE (PPU with PXU, L1, and L2), the MIC (to dual XDR™), and the BIC (to FlexIO™) attach to the EIB (up to 96B/cycle). Link labels: 16B/cycle, 16B/cycle (2x), 32B/cycle.]

  4. Cell BE Interconnect Challenges
     - High Bandwidth
       - Memory bandwidth
       - Internal element-to-element bandwidth
       - External I/O and SMP bandwidth
     - Flexibility and Scalability
       - Allow system configuration flexibility
       - Modular internal bus structure

  5. Cell Broadband Engine Die

  6. Bus Interface Controller (BIC)

  7. Bus Interface Controller Overview
     - Two configurable, scalable interfaces
       - 7 bytes total outbound / 5 bytes total inbound chip capacity
       - Rambus FlexIO™ physical layer
       - 60 GB/s raw bandwidth at 5 Gb/s per differential pair
     - BIF/IOIF0
       - Configurable protocol
         - Broadband Engine Interface (BIF) coherent protocol
         - I/O Interface (IOIF) non-coherent protocol
       - Scalable from 0 to 6 bytes outbound / 0 to 5 bytes inbound
       - Up to 30 GB/s outbound / 25 GB/s inbound in 5 GB/s increments
     - IOIF1
       - IOIF protocol
       - Scalable from 0 to 2 bytes outbound / 0 to 2 bytes inbound
       - Up to 10 GB/s outbound / 10 GB/s inbound in 5 GB/s increments
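The bandwidth figures on this slide follow from the interface width and the per-pair signalling rate. The sketch below reproduces that arithmetic. It assumes that one byte of interface width corresponds to eight differential pairs and that GB means 10^9 bytes; the function and constant names are illustrative, not part of any IBM or Rambus interface.

```python
GBITS_PER_PAIR = 5   # Gb/s per differential pair (from the slide)
PAIRS_PER_BYTE = 8   # assumption: one byte of width = eight differential pairs


def raw_bandwidth_gbytes(width_bytes: int) -> float:
    """Raw bandwidth in GB/s for an interface that is `width_bytes` wide."""
    return width_bytes * PAIRS_PER_BYTE * GBITS_PER_PAIR / 8


chip_total = raw_bandwidth_gbytes(7 + 5)   # 7 bytes outbound + 5 bytes inbound
bif_ioif0_out = raw_bandwidth_gbytes(6)    # widest BIF/IOIF0 outbound setting
bif_ioif0_in = raw_bandwidth_gbytes(5)     # widest BIF/IOIF0 inbound setting
ioif1_each_way = raw_bandwidth_gbytes(2)   # widest IOIF1 setting, each direction

print(chip_total, bif_ioif0_out, bif_ioif0_in, ioif1_each_way)
```

Each additional byte of width adds 5 GB/s, which matches the "5 GB/s increments" stated above.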

  8. Bus Interface Controller: IOIF Mode Additional Features
     - I/O Address Translation and Protection
       - Two-stage segment table / page table lookup with caching
       - 4KB, 64KB, 1MB, 16MB page size support per segment
       - Storage protection at page granularity by device ID
       - Command ordering attributes assigned at page granularity
       - HW and SW load of translation caches
     - Four Virtual Channels per IOIF
       - IOIF commands assigned to one of four virtual channels
       - Independent flow control, ordering, and resource management per virtual channel
     - Interrupt Controller
       - Interrupt presentation, routing, and status to PPEs
       - Interprocessor interrupt support
       - 16 priority levels
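A two-stage lookup with per-segment page sizes and per-page device protection can be sketched as follows. This is only an illustration of the idea on the slide: the segment size (`SEGMENT_BITS`), the table layout, and every class and function name are assumptions made for the example, not the actual Cell BE table formats.

```python
from dataclasses import dataclass, field

PAGE_SIZES = {4 << 10, 64 << 10, 1 << 20, 16 << 20}  # page sizes listed on the slide
SEGMENT_BITS = 28  # assumption: 256 MB segments, chosen only for illustration


@dataclass
class PageEntry:
    real_base: int              # real address of the start of the page
    allowed_devices: frozenset  # device IDs allowed to access this page


@dataclass
class Segment:
    page_size: int                             # one page size per segment
    pages: dict = field(default_factory=dict)  # page number -> PageEntry

    def __post_init__(self):
        if self.page_size not in PAGE_SIZES:
            raise ValueError("unsupported page size")


def translate(segments: dict, io_addr: int, device_id: int) -> int:
    """Stage 1: pick the segment. Stage 2: pick the page and check the device ID."""
    segment = segments[io_addr >> SEGMENT_BITS]
    offset_in_segment = io_addr & ((1 << SEGMENT_BITS) - 1)
    page_number, offset = divmod(offset_in_segment, segment.page_size)
    entry = segment.pages[page_number]
    if device_id not in entry.allowed_devices:
        raise PermissionError("device not allowed to access this page")
    return entry.real_base + offset


# Hypothetical setup: segment 0 uses 64KB pages; page 1 is open to device 3 only.
example = {0: Segment(64 << 10, {1: PageEntry(0x4000_0000, frozenset({3}))})}
print(hex(translate(example, 0x1_0010, device_id=3)))
```

A hardware implementation would cache recent segment and page entries, which is the "caching" the slide refers to; the dictionaries here stand in for both the tables and the caches.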

  9. Bus Interface Controller: Flexible Bandwidth and Protocol through Layered Architecture
     - Logical Layer
       - Selectable coherent or non-coherent protocol
       - Credit-based flow control
     - Transport Layer
       - Packet generation and parsing
       - Asynchronous to Data Link Layer
     - Data Link Layer
       - Packet transmission and reception
       - CRC and retry protocol
     - Physical Layer
       - Configurable interface widths in 1B granularity
       - Supports asymmetric Tx / Rx
     [Figure: BIC layer stack below the Element Interconnect Bus. BIF or IOIF0 has BIF and IOIF logical layers; IOIF1 has an IOIF logical layer. Each interface has its own transport and data link layers above a shared physical layer of RX/TX lanes. The logical layers sit in the core clock domain; the data link layers sit in the link clock domain.]
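Credit-based flow control, named in the logical layer above, can be illustrated with a small sender-side model. This is a generic sketch of the technique, not the BIC protocol itself: the class name, the credit counts, and the use of four independent channels (borrowed from the virtual-channel slide) are assumptions for the example.

```python
class CreditedChannel:
    """Sender-side view of one channel under credit-based flow control."""

    def __init__(self, initial_credits: int):
        # One credit stands for one free receive buffer at the far end.
        self.credits = initial_credits

    def try_send(self) -> bool:
        """Consume a credit and send; refuse if the receiver has no free buffer."""
        if self.credits == 0:
            return False
        self.credits -= 1
        return True

    def credit_returned(self, count: int = 1) -> None:
        """The receiver freed `count` buffers and handed the credits back."""
        self.credits += count


# Four independent channels: exhausting one does not block the others.
channels = [CreditedChannel(initial_credits=2) for _ in range(4)]
results = [channels[0].try_send() for _ in range(3)]
print(results, channels[1].try_send())
```

Because the sender never transmits without a credit, the receiver cannot be overrun and no packet has to be dropped for lack of buffer space.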

  10. Cell BE Processor Can Support Many Systems
     - Game console systems
     - Blades
     - HDTV
     - Home media servers
     - Supercomputers
     [Figure: example system configurations, each Cell BE processor with its own XDR™ memory: a single processor using IOIF0 and IOIF1; two processors joined by BIF, each with IOIF1; four processors joined by BIF through a switch (SW), each with IOIF1.]

  11. Element Interconnect Bus (EIB)

  12. Element Interconnect Bus Overview
     - Coherent SMP Bus
       - Supports over 100 outstanding requests
       - Address collision detection and prevention
     - High Bandwidth
       - Four 16-byte data rings
       - Operates at ½ processor core frequency
       - Up to 96 bytes / processor cycle (192 bytes / bus cycle)
       - Over 300 GB/s with a 3.2 GHz processor
       - 16 bytes / bus cycle source and 16 bytes / bus cycle sink per port
       - 12 element ports
     - Modular Design for Scalability
       - Physical modularity for flexibility
     - Independent Command/Address and Data Networks
       - Split command / data transactions
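The peak figures above are linked by the half-rate bus clock. The short calculation below reproduces them from the slide's own numbers, taking GB as 10^9 bytes; the variable names are illustrative.

```python
CORE_HZ = 3_200_000_000          # 3.2 GHz processor clock (from the slide)
BUS_HZ = CORE_HZ // 2            # the EIB runs at half the core frequency
PEAK_BYTES_PER_CORE_CYCLE = 96   # from the slide
PORT_BYTES_PER_BUS_CYCLE = 16    # per port, each direction (from the slide)

# One bus cycle spans two core cycles, so the per-bus-cycle figure doubles.
peak_bytes_per_bus_cycle = PEAK_BYTES_PER_CORE_CYCLE * (CORE_HZ // BUS_HZ)

peak_gbytes_per_s = PEAK_BYTES_PER_CORE_CYCLE * CORE_HZ / 1e9
port_gbytes_per_s = PORT_BYTES_PER_BUS_CYCLE * BUS_HZ / 1e9

print(peak_bytes_per_bus_cycle, peak_gbytes_per_s, port_gbytes_per_s)
```

The result is 307.2 GB/s aggregate, consistent with "over 300 GB/s", and 25.6 GB/s in each direction for a single port.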

  13. Element Interconnect Bus - Command Topology
     - "Address Concentrator" tree structure minimizes wiring resources
     - Single serial command reflection point (AC0)
     - Address collision detection and prevention
     - Fully pipelined
     - Round robin arbitration
     - Credit-based flow control
     [Figure: command tree. The twelve elements (SPE 0 to SPE 7, PPE, MIC, BIF/IOIF0, IOIF1) send commands (CMD) through address concentrators AC3, AC2, and AC1 to the root AC0.]

  14. Element Interconnect Bus - Coherent Connection
     - BIF coherent protocol
     - Dual-chip configuration without external switch chip
       - Master / Slave AC0
     - Multi-chip configuration possible with external switch chip
     - Local command processing
       - AC1 root bypass for non-global commands
     [Figure: two Cell BE chips joined through BIF. Labels in each chip: SPEs, IOIF1, BIF/IOIF0, CMD links, AC2, AC1, and AC0.]

  15. Element Interconnect Bus - Data Topology
     - Four 16B data rings connecting 12 bus elements
       - Two clockwise / two counter-clockwise
     - Physically overlaps all processor elements
     - Central arbiter supports up to three concurrent transfers per data ring
       - Two-stage, dual round robin arbiter
     - Each element port simultaneously supports a 16B in and a 16B out data path
       - Ring topology transparent to element data interface
     [Figure: data rings. The twelve elements (SPE 0 to SPE 7, PPE, MIC, BIF/IOIF0, IOIF1) surround a central data arbiter (Data Arb); each element has 16B links in both directions.]
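The ring figures on this slide account for the 192 bytes per bus cycle quoted in the overview: four rings, up to three transfers each, 16 bytes per transfer. The sketch below checks that product and models one plausible sharing rule, namely that transfers on the same ring must not need the same ring segment. The sharing rule, the port numbering (0 to 11 around the ring), and all function names are assumptions for illustration; the slides do not describe the arbiter at this level.

```python
RINGS = 4
MAX_TRANSFERS_PER_RING = 3
RING_WIDTH_BYTES = 16
PORTS = 12


def peak_bytes_per_bus_cycle() -> int:
    return RINGS * MAX_TRANSFERS_PER_RING * RING_WIDTH_BYTES


def segments(src: int, dst: int, clockwise: bool = True) -> set:
    """Ring segments a transfer occupies; segment i joins port i and port (i + 1) % PORTS."""
    step = 1 if clockwise else -1
    used, port = set(), src
    while port != dst:
        nxt = (port + step) % PORTS
        used.add(port if clockwise else nxt)
        port = nxt
    return used


def can_share_ring(transfers, clockwise: bool = True) -> bool:
    """True if the transfers fit on one ring without using the same segment twice."""
    if len(transfers) > MAX_TRANSFERS_PER_RING:
        return False
    occupied = set()
    for src, dst in transfers:
        needed = segments(src, dst, clockwise)
        if occupied & needed:
            return False
        occupied |= needed
    return True


print(peak_bytes_per_bus_cycle())
print(can_share_ring([(0, 2), (2, 5), (6, 9)]))  # disjoint paths
print(can_share_ring([(0, 3), (2, 4)]))          # both need segment 2
```

Under this model, short transfers between neighbouring ports are what allow a ring to carry its full three concurrent transfers.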

  16. Resource Allocation Management
     - Optional facility used to minimize over-allocation effects of critical resources
       - Independent but complementary function to the EIB
       - Critical (managed) resource's time is distributed among groups of requestors
     - Managed resources include:
       - Rambus XDR™ DRAM memory banks (0 to 15)
       - BIF/IOIF0 Inbound and BIF/IOIF0 Outbound
       - IOIF1 Inbound and IOIF1 Outbound
     - Requestors allocated to four Resource Allocation Groups (RAG)
       - 17 requestors: PPE, SPEs, I/O Inbound (4 VCs), I/O Outbound (4 VCs)
     - Central Token Manager controller
       - Requestors ask permission to issue EIB commands to managed resources
       - Tokens granted across RAGs allow a requestor to issue a command to the EIB
       - Round robin allocation within a RAG
       - Dynamic software configuration of the Token Manager to adjust token allocation rates for varying workloads
       - Multi-level hardware feedback from managed resource congestion to throttle token allocation
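Round robin allocation within a RAG can be sketched with a rotating queue of requestors. This is a minimal model of the stated policy only: it does not cover token rates across RAGs or the congestion feedback, and the class name, method name, and requestor labels are hypothetical.

```python
from collections import deque


class ResourceAllocationGroup:
    """Grants one token at a time to pending requestors in round-robin order."""

    def __init__(self, requestors):
        self._order = deque(requestors)

    def grant(self, pending):
        """Return the next requestor in rotation that has a request pending, or None."""
        for _ in range(len(self._order)):
            candidate = self._order[0]
            self._order.rotate(-1)  # the candidate moves to the back of the queue
            if candidate in pending:
                return candidate
        return None


# Requestor count from the slide: PPE + 8 SPEs + 4 inbound VCs + 4 outbound VCs.
assert 1 + 8 + 4 + 4 == 17

rag = ResourceAllocationGroup(["PPE", "SPE0", "SPE1"])
pending = {"SPE0", "SPE1"}
grants = [rag.grant(pending) for _ in range(3)]
print(grants)
```

Rotating past every candidate that is inspected, whether or not it is granted, keeps a busy requestor from starving the others in its group.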

  17. Memory Interface Controller (MIC)

  18. Memory Interface Controller (MIC)
     - Two independent 32b controllers
     - 25.6 GB/s @ 3.2 Gb/s
     - 32 read and 32 write queues for each channel
     - All accesses are closed page
     - SEC/DED ECC
     - Speculative read support
     - Remote connection to EIB
     [Figure: MIC command and data flow. Labels: EIB interface, command sort, dependency check, per-channel command queues (read queue, write queue, scrub, init, refresh), read and write buffers, command reorder and arbitration, ECC generate, ECC correct, DRAM control mux and distribution, and dataflow to XIO 0 and XIO 1. Clock domains: 1.6 GHz processor-synchronous, 1.6 GHz Rambus-synchronous, 400 MHz Rambus.]

  19. Memory Interface Controller Overview
     - Capacity
       - Dependent on XDR DRAM size and width
       - Minimum of one 32b interface fully connected
         - 2 x 16b 256 Mb XDR DRAMs = 64 MB
       - Maximum of 2 x 32b interfaces fully connected
         - Theoretical maximum: 64 x 1-bit 8 Gb XDR DRAMs = 64 GB
     - Clocking
       - The Rambus interface runs at 400 MHz, which is called a Pclk
       - Memory controller logic runs at ½ the processor frequency (1.6 GHz), with an asynchronous crossing to the Rambus clock domain, which also runs at 1.6 GHz (multiplied-up Pclk)
       - The MIC dataflow distributes the data to the Rambus interface at 1.6 GHz and widens the datapath in the last level of logic to interface with the Rambus macro
       - The Rambus macro then has an 8:1 clock ratio to drive the data out at 3.2 Gb/s per differential pair
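The capacity and clocking figures above can be checked with a few lines of integer arithmetic. The sketch uses binary units for capacity (1 GB = 1024 MB) and decimal units for bandwidth, which is an assumption about the slide's conventions; the names are illustrative.

```python
def capacity_mbytes(device_count: int, device_mbits: int) -> int:
    """Total capacity in MB for `device_count` DRAMs of `device_mbits` Mb each."""
    return device_count * device_mbits // 8


min_mbytes = capacity_mbytes(2, 256)                # two 16b devices fill one 32b channel
max_gbytes = capacity_mbytes(64, 8 * 1024) // 1024  # 64 1-bit devices fill 2 x 32b

PCLK_MHZ = 400      # Rambus Pclk
LOGIC_MHZ = 1600    # MIC logic and the multiplied-up Rambus domain
MACRO_RATIO = 8     # 8:1 ratio in the Rambus macro
CHANNELS = 2
CHANNEL_BITS = 32

pclk_multiplier = LOGIC_MHZ // PCLK_MHZ                           # 4
pin_rate_mbits = PCLK_MHZ * MACRO_RATIO                           # Mb/s per pair
peak_mbytes_per_s = CHANNELS * CHANNEL_BITS * pin_rate_mbits // 8

print(min_mbytes, max_gbytes, pin_rate_mbits, peak_mbytes_per_s)
```

The last value, 25600 MB/s, is the 25.6 GB/s quoted for the dual-channel controller.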

  20. Memory Interface Controller Overview
     - Logical to Physical Memory Map
       - Interleaves across channels
       - Interleaves across internal banks
       - Enables closed-bank memory controller to maximize random access bandwidth
     - Programmable Command Reordering to maximize bandwidth utilization
       - Command selection tends to group commands into bursts of 8 or 16 in a row before switching the bus
       - Bank conflicts, high-priority reads, dependencies, or lack of available commands can cause bus turnarounds
     - Memory Scrubbing to correct single-bit errors (optional)
       - Frequency: once every 20.6 ms
       - Duration: one read is performed; a write is performed if a single-bit error is found and corrected
     - Refreshes
       - Frequency: once every 0.49 µs
       - Duration: 1 command (4 Rambus Pclks)
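Interleaving across channels and then across banks can be sketched as a simple address split. The slide gives no bit positions, so the 128-byte granule, the eight banks per channel, and the function name are all assumptions chosen to make the idea concrete; only the two-channel count comes from the slides.

```python
LINE_BYTES = 128        # assumption: interleave granule of one 128-byte access
CHANNELS = 2            # two XIO channels (from the slides)
BANKS_PER_CHANNEL = 8   # assumption, for illustration only


def map_address(addr: int):
    """Split a physical address into (channel, bank, line index within the bank)."""
    line = addr // LINE_BYTES
    channel = line % CHANNELS
    bank = (line // CHANNELS) % BANKS_PER_CHANNEL
    index_in_bank = line // (CHANNELS * BANKS_PER_CHANNEL)
    return channel, bank, index_in_bank


# Sixteen consecutive lines land on sixteen distinct (channel, bank) pairs.
targets = {map_address(i * LINE_BYTES)[:2] for i in range(16)}
print(len(targets))
```

Spreading consecutive lines this way means back-to-back accesses rarely hit a bank that is still closing its page, which is what lets a closed-page controller sustain random-access bandwidth.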

  21. Design to Support XDR DRAM Controller
     - Power-on Initialization and Calibration
       - Timing calibration: receive/transmit, setup and hold
       - Current calibrations for drivers and receivers
       - Impedance calibrations for drivers and receivers
     - Periodic Calibration
       - Periodic timing calibration: receive/transmit, setup and hold
         - Frequency: once every 8 ms
         - Duration: 64 Rambus Pclks (4 separate operations)
       - Current and impedance calibrations for drivers and receivers
         - Frequency: once every 8 ms
         - Duration: 64 Rambus Pclks (4 separate operations)
     - Early Read
       - Used to minimize the read-to-write turnaround gap on the DRAM data buses
     - Write masking support for 16B to 128B writes
       - Data-dependent function
       - Requires hardware calculation of ECC before storage into the write data buffer
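The stated frequencies and durations give a rough sense of how much interface time maintenance consumes. The estimate below assumes the interface is unavailable for the full stated duration of each operation, which the slides do not confirm; the refresh figures (4 Pclks every 0.49 µs) come from the preceding MIC overview slide, and the names are illustrative.

```python
PCLK_MHZ = 400  # Rambus Pclk (from the slides)


def pclks_to_ns(pclks: int) -> float:
    return pclks * 1000 / PCLK_MHZ


def busy_fraction(duration_pclks: int, period_ns: float) -> float:
    """Fraction of time spent on an operation that recurs every `period_ns`."""
    return pclks_to_ns(duration_pclks) / period_ns


refresh = busy_fraction(4, 490)                     # 4 Pclks every 0.49 us
timing_cal = busy_fraction(64, 8_000_000)           # 64 Pclks every 8 ms
current_impedance_cal = busy_fraction(64, 8_000_000)

print(f"refresh: {refresh:.2%}")
print(f"periodic calibration: {timing_cal + current_impedance_cal:.4%}")
```

Under that assumption refresh costs about 2% of the time, while both periodic calibrations together stay well below 0.01%.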

  22. Summary
     - High Bandwidth Interconnect
       - Dual XDR™ memory controller (25.6 GB/s @ 3.2 Gb/s)
       - Two I/O interfaces (60 GB/s @ 5 Gb/s)
       - Internal Element Interconnect (peak bandwidth over 300 GB/s @ 3.2 GHz)
       - Resource Allocation Management
     - Flexible and Scalable
       - Multiple system configurations
       - Configurable I/O interface bandwidth
       - Coherent and non-coherent protocols
       - Configurable memory capacity and bandwidth
       - Internal modular interconnect bus
       - Memory and I/O interfaces asynchronous to processor core