
Combining Simulators and FPGAs “An Out-of-Body Experience”

Combining Simulators and FPGAs “An Out-of-Body Experience”. Eric S. Chung, Brian Gold, James C. Hoe, Babak Falsafi {echung, bgold, jhoe, babak}@ece.cmu.edu. SIMFLEX/PROTOFLEX. The RAMP full-system challenge. RAMP vision for studying systems w/ FPGAs





Presentation Transcript


  1. Combining Simulators and FPGAs “An Out-of-Body Experience”
     Eric S. Chung, Brian Gold, James C. Hoe, Babak Falsafi
     {echung, bgold, jhoe, babak}@ece.cmu.edu
     SIMFLEX/PROTOFLEX

  2. The RAMP full-system challenge
     • RAMP vision for studying systems w/ FPGAs
       • functional & cycle-accurate simulation
       • scalability, speed, & flexibility on FPGAs
       • full-system (run unmodified binaries & OS)
     [Figure: full-system target with CPUs, memory, IRQ controller, DMA controller, I/O MMU controller, terminal, PCI bus, Ethernet controller, SCSI controller, graphics card, and disks]
     ‘Full-sys’ RAMP will incur large effort, yet not all behaviors are frequently used (e.g., I/O)
     Eric S. Chung / RAMP 2006 Summer Retreat

  3. Combining simulators & FPGAs
     • Simulators already provide full-system → why not simulate infrequent behaviors (e.g., I/O devices)?
     [Figure: simulator and FPGA hosts, each drawn with CPUs, memory, Ethernet, SCSI, and disk]
     • Advantages
       • avoid impl. of infrequent behaviors → lowers full-sys FPGA development effort
       • low impact on scalability & perf. on FPGA

  4. Outline
     • Motivation
     • Migration
     • Implementation status
     • Conclusion

  5. Migration
     [Figure: a target design made of “target objects” (e.g., functional or timing CPU) mapped onto FPGA and simulator hosts]
     • 3 ways to map a target object to a host
       1. FPGA-only
       2. Simulation-only
       3. Migratable
     • Migratable objects
       • switch modes between FPGA & simulator hosts
       • target behavior need not be 100% implemented in FPGA mode, e.g., impl. 80% of target behavior in FPGA, 100% in simulator
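
The three mappings can be read as a small dispatch rule. The sketch below is only an illustration of that rule, not the authors' implementation: the names `Mapping`, `Host`, `choose_host`, and the `fpga_behaviors` set are hypothetical, and it assumes a migratable object runs on the FPGA whenever the FPGA implements the behavior it needs next.

```python
from enum import Enum


class Mapping(Enum):
    """How a target object is mapped to a host (the three options on the slide)."""
    FPGA_ONLY = "fpga-only"
    SIM_ONLY = "simulation-only"
    MIGRATABLE = "migratable"


class Host(Enum):
    FPGA = "fpga"
    SIMULATOR = "simulator"


def choose_host(mapping, behavior, fpga_behaviors):
    """Pick the host that executes `behavior` for an object with this mapping.

    `fpga_behaviors` is the (hypothetical) set of behaviors implemented in
    FPGA mode; the simulator is assumed to implement everything.
    """
    if mapping is Mapping.FPGA_ONLY:
        return Host.FPGA
    if mapping is Mapping.SIM_ONLY:
        return Host.SIMULATOR
    # Migratable: stay on the FPGA for implemented behaviors, otherwise
    # fall back to the simulator's complete model.
    return Host.FPGA if behavior in fpga_behaviors else Host.SIMULATOR


# Example: an FPGA CPU that implements only part of the target behavior.
implemented = {"load", "add", "multiply", "sub"}
host_for_add = choose_host(Mapping.MIGRATABLE, "add", implemented)
host_for_io = choose_host(Mapping.MIGRATABLE, "scsi_cmd", implemented)
```

Under this rule the FPGA model can stay partial (the slide's 80% example) because anything missing is handled by the simulator.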

  6. Migration example
     Target-to-host mappings:
     • CPU = migratable
     • Memory = FPGA-only
     • Devices = SW-only
     [Figure: example CPU instruction stream over time (load, add, multiply, I/O, add, sub, ..) with CPU state transfer between FPGA and simulator; the I/O instruction issues a SCSI cmd to the simulated SCSI device and disk]
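
The instruction stream in this example can be traced with a toy model. This is a minimal sketch under stated assumptions: `CpuState`, `transfer_state`, and `run_stream` are hypothetical names, the state holds only a register dictionary and a program counter, and the CPU is assumed to migrate back to the FPGA as soon as the next instruction is one the FPGA implements.

```python
from dataclasses import dataclass, field


@dataclass
class CpuState:
    """Toy architectural state; a real migratable CPU also carries TLBs etc."""
    regs: dict = field(default_factory=dict)
    pc: int = 0


def transfer_state(state):
    """Copy the state to the other host (stands in for the real transfer)."""
    return CpuState(regs=dict(state.regs), pc=state.pc)


def run_stream(stream, fpga_ops):
    """Execute `stream`, migrating the CPU whenever the required host changes.

    Returns the (instruction, host) trace and the number of migrations.
    """
    host = "fpga"
    state = CpuState()
    trace = []
    migrations = 0
    for op in stream:
        wanted = "fpga" if op in fpga_ops else "simulator"
        if wanted != host:
            state = transfer_state(state)  # CPU state transfer between hosts
            host = wanted
            migrations += 1
        state.pc += 1  # pretend the instruction executed
        trace.append((op, host))
    return trace, migrations


stream = ["load", "add", "multiply", "scsi_cmd", "add", "sub"]
trace, migrations = run_stream(stream, fpga_ops={"load", "add", "multiply", "sub"})
```

With this stream the only instruction that leaves the FPGA is the SCSI command, so the CPU migrates to the simulator once and back once.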

  7. Advantages
     • Lowers development effort
       • avoid bring-up of infrequent behaviors
       • migrate & validate ref. models from simulator
       • tailor impl. to workload (avoid rarely used instrs, good for CISC x86)
     • Fast & scalable
       • perf-critical objects on FPGA (e.g., CPU, memory)
       • scalable for MPs → add migratable CPUs
     [Figure: FPGA with multiple CPUs and memory; simulator with CPUs, memory, SCSI, and disk]

  8. Subtleties
     • Objects separated in simulator/FPGA interact
       • examples: interrupts, DMA
       • handle by forwarding messages between FPGA/simulator
       • FPGA-only & SW-only mapped objects easy to locate
       • migrated objects require tracking
     [Figure: forwarded DMA between simulator and FPGA hosts (CPU, SCSI, memory, disk)]

  9. Subtleties
     • Objects separated in simulator/FPGA interact
       • examples: interrupts, DMA
       • handle by forwarding messages between FPGA/simulator
       • FPGA-only & SW-only mapped objects easy to locate
       • migrated objects require tracking
     [Figure: interrupt crossing between simulator and FPGA hosts. Option 1: forwarded interrupt. Option 2: forced migration.]
     Cross-host interactions rare → low impact on FPGA perf.
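
One way to picture the tracking requirement and the two interrupt options is a location table consulted on every cross-object message. The sketch below is illustrative only: `LocationTable`, `deliver_interrupt`, and the policy strings are hypothetical, and real message formats and transport are not modeled.

```python
class LocationTable:
    """Tracks which host currently holds each target object."""

    def __init__(self, fixed, migratable):
        self.fixed = dict(fixed)            # FPGA-only / SW-only: never move
        self.current = dict(migratable)     # migratable: updated on migration

    def locate(self, name):
        if name in self.fixed:
            return self.fixed[name]
        return self.current[name]

    def migrate(self, name, host):
        if name in self.fixed:
            raise ValueError(f"{name} is not migratable")
        self.current[name] = host


def deliver_interrupt(table, device, cpu, policy="forward"):
    """Deliver an interrupt from `device` to `cpu`, possibly across hosts."""
    src = table.locate(device)
    dst = table.locate(cpu)
    if src == dst:
        return "local"
    if policy == "forward":
        # Option 1: send the interrupt as a message to the CPU's host.
        return f"forwarded {src}->{dst}"
    if policy == "force_migrate":
        # Option 2: pull the CPU over to the device's host first.
        table.migrate(cpu, src)
        return f"migrated {cpu} to {src}"
    raise ValueError(f"unknown policy: {policy}")


table = LocationTable(fixed={"scsi": "simulator", "memory": "fpga"},
                      migratable={"cpu0": "fpga"})
first = deliver_interrupt(table, "scsi", "cpu0", policy="forward")
second = deliver_interrupt(table, "scsi", "cpu0", policy="force_migrate")
third = deliver_interrupt(table, "scsi", "cpu0")
```

Fixed objects need only a static lookup, while the migratable CPU's entry has to be updated on every migration, which is the tracking cost the slide refers to.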

  10. Subtleties cont.
      • Migration cost
        • migrating an object requires a state copy, e.g., a migratable CPU has registers & TLBs
        • FPGA-to-simulator latency & sim. time limit # migrations/instr
      • FPGA & simulator asynchrony
        • simulated time “ticks” at different rates in FPGA & simulator
        • must synchronize for deterministic replay & accurate device timing

  11. Outline
      • Motivation
      • Migration
      • Implementation in progress
      • Conclusion

  12. Implementation status
      • Target system
        • Sun Fire[tm] 3800 Server (up to 24-way)
        • UltraSPARC III ISA
        • Solaris 8
      • Proof-of-concept software-to-software migration
        • run 2 instances of Virtutech Simics
        • migration designed & tested in 2 weeks
        • can migrate on arbitrary behavior (e.g., ADD instruction)

  13. BlueSPARC core (in progress)
      • In-order SPARC V9 core
        • supports 144 out of 170 integer instr behaviors
        • supports partial MMU w/ I- & D-TLBs
        • goal: 99.999% of instrs & behaviors in target workloads
          • SPEC (mostly user-level), OLTP/DB2 (high TLB misses, 40% time in priv-mode)
        • CPI ranges from 5 to 7 cycles
        • synth: 15k LUTs on Virtex-II Pro 30, 85 MHz, 12 MIPS worst case (85 MHz at CPI 7)
        • developed in Bluespec HDL, 6000 lines in 6 weeks
      • Core validation
        • run RTL in lockstep w/ Simics’s UltraSPARC simulation model
        • workload validation w/ SPEC, OLTP/DB2, OpenSPARC verif. suite
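
Lockstep validation amounts to stepping the design under test and a reference model one instruction at a time and comparing architectural state after each step. The sketch below shows that loop with toy step functions; it does not use the Simics API or any RTL simulator interface, and `lockstep`, `ref_step`, and `buggy_step` are hypothetical stand-ins.

```python
def lockstep(dut_step, ref_step, dut_state, ref_state, max_steps):
    """Step both models together; return the index of the first divergence.

    Returns None if the states agree for all `max_steps` steps.
    """
    for i in range(max_steps):
        dut_state = dut_step(dut_state)
        ref_state = ref_step(ref_state)
        if dut_state != ref_state:
            return i
    return None


def ref_step(state):
    """Toy reference model: state is (pc, acc); each step adds pc to acc."""
    pc, acc = state
    return (pc + 1, acc + pc)


def buggy_step(state):
    """Toy design under test with an injected bug at pc == 3."""
    pc, acc = state
    if pc == 3:
        return (pc + 1, acc)  # forgets to update the accumulator
    return ref_step(state)


clean = lockstep(ref_step, ref_step, (0, 0), (0, 0), max_steps=10)
diverged_at = lockstep(buggy_step, ref_step, (0, 0), (0, 0), max_steps=10)
```

Stopping at the first mismatch pins a bug to a single instruction, which is what makes incremental bring-up of a partial core practical.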

  14. Migration on FPGA (in progress)
      [Figure: Virtutech Simics (Simics UltraSPARC, simulated target devices) and Xilinx XUP Virtex-II Pro 30 (BlueSPARC, PowerPC, DDR memory), joined by a migration & message interface; also labeled: ethernet]
      • PowerPC functions
        • core & memory initialization from Simics checkpoints
        • facilitates migration for BlueSPARC
        • connects simulated devices to memory (e.g., SCSI DMA)

  15. Conclusion
      • Contributions
        • virtualizes infrequent behaviors using simulation
        • simplifies full-system FPGA emulator, still fast/scalable
        • incremental validation from reference system
      • Future work
        • support migration in RDL?
        • adding cores + scaling across multiple FPGAs
      • We are ready for BEE2
      • Thanks! Questions? echung@ece.cmu.edu
      • PROTOFLEX/SIMFLEX (http://www.ece.cmu.edu/~simflex)
