
Bringing Up Anton: Taking Co-Design into Production

Joseph A. Bank, D E Shaw Research. September 24, 2010.


Presentation Transcript


  1. Bringing Up Anton: Taking Co-Design into Production • Joseph A. Bank • September 24, 2010 • D E Shaw Research

  2. Talk Outline • Brief history of Anton • Bringup challenges of Anton • The bringup lessons

  3. A Brief History of Anton • Massively parallel special-purpose machine to accelerate Molecular Dynamics (MD) simulations • Custom-designed ASICs connected by a specialized toroidal network • First ASIC received Q1 2008 • 512-node machine operational Q4 2008 • 1-millisecond BPTI MD simulation Q2 2009 • 512-node machine achieves performance of ~17,000 ns/day for a DHFR (23,558 atoms) MD simulation

  4. Bringup Challenges of Anton
     • Application: MD itself is a bit hard to verify
       • Few simple metrics (energy drift, frms, folds/ms, …)
       • No one has simulated the time scales of Anton
     • Algorithm changes
       • Gaussian Split Ewald method for non-bonded far interactions
       • Neutral Territory method for non-bonded near interactions
     • Architecture
       • Massively parallel heterogeneous system
       • 512+ nodes, 13 cores per node, 3 types of cores
       • Custom communication primitives
       • Fixed point instead of floating point
       • Resource optimized => I/D caches, SRAMs are all tightly constrained
     • Software
       • From-scratch MD code base for Anton
       • Anton simulation preparation framework is complex
       • Dynamic code generation
       • Specialization to machine size, chemical system, etc.
     Summary => Application/Architecture co-design makes bringup uniquely challenging
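The "fixed point instead of floating point" item can be illustrated with a small sketch. It shows one general property of the two number systems: floating-point addition depends on the order of operations, while integer (fixed-point) addition does not. The 32-fractional-bit format and the `to_fixed` helper are assumptions made for illustration and are not Anton's actual number format.

```python
FRAC_BITS = 32  # hypothetical format: 32 fractional bits (not Anton's real layout)

def to_fixed(x: float) -> int:
    """Quantize a float to a fixed-point integer, rounding to nearest."""
    return round(x * (1 << FRAC_BITS))

a, b, c = 0.1, 0.2, 0.3

# Floating point: the grouping of the additions changes the rounded result.
float_left = (a + b) + c
float_right = a + (b + c)

# Fixed point: values are plain integers, and integer addition is associative.
# (Hardware integers wrap on overflow, but wrapping addition is associative too.)
fa, fb, fc = to_fixed(a), to_fixed(b), to_fixed(c)
fixed_left = (fa + fb) + fc
fixed_right = fa + (fb + fc)

print("float sums equal:", float_left == float_right)  # False
print("fixed sums equal:", fixed_left == fixed_right)  # True
```

On a parallel machine the order in which partial sums are combined can vary with how the work is partitioned, so an order-independent sum is one plausible ingredient for getting the same bits on different machine sizes.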

  5. Bringup Lesson Outline as Quips • “Do your homework” • “Where’s the chip?” • “Repeat yourself, over and over and over” • “Inspector gadget” • “Use your eyes” • “Target practice” • “Trust no one”

  6. “Do Your Homework”: Preparing for Bringup • Desmond: Verification of algorithms, develop experience with MD simulation • Pyrite: Verification of fixed point calculation kernels • Detailed architectural simulator • Interface compatible with ASIC design (allowing co-simulation with RTL) • Enabled earliest possible development and testing of complete software stack (embedded code, prep time, etc) • During bringup the simulator could rerun simulations with much higher visibility of the architectural and software state.

  7. “Where’s the chip?”: Dealing with Scarcity • Challenge: Anton’s primary designed mode of operation is “SRAM mode”, where all data fits in SRAM. This requires a configuration of at least 2x2x2 ASICs. During bringup, ASICs trickled in… • Solution: “DRAM mode” • We spent about 6 man-months of software development on a mode of operation that choreographs paging data into SRAM from DRAM and can perform large chemistry simulations on small Anton configurations (even single ASICs). • DRAM mode was used to test every ASIC individually and at each machine size we have built: 1, 2, 4, 8, 64, 128, 256, 512, 1024, 2048.

  8. “Repeat Yourself”: Bit-wise Reproducibility • Anton and its embedded SW were designed to provide application level bit-wise reproducibility independent of HW configuration. • Detection: Rerun entire simulations and compare trajectories. • Primary means of detecting HW/SW bugs during bringup • Used with “golden” trajectories for suite of tests on every ASIC • Periodically used to check machine status • Isolation with Force comparison: Online checking of redundant force calculation • Generalized isolation with redundancy checker infrastructure: Online piecewise rerunning of simulation with arbitrary logging of lightweight checksums
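The detection step on this slide (rerun, then compare against a "golden" trajectory) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Anton's tooling: `frame_checksum`, `first_divergence`, and `toy_run` are hypothetical names, the "simulation" is a trivial deterministic integer update, and the fault is an injected single-bit flip.

```python
import hashlib
import struct

def frame_checksum(positions):
    """Hash one frame of integer (fixed-point) coordinates to a short digest."""
    h = hashlib.sha256()
    for x, y, z in positions:
        h.update(struct.pack("<qqq", x, y, z))  # three signed 64-bit values
    return h.hexdigest()[:16]

def first_divergence(golden, candidate):
    """Return the first step whose checksum differs, or None if runs match."""
    for step, (g, c) in enumerate(zip(golden, candidate)):
        if g != c:
            return step
    if len(golden) != len(candidate):
        return min(len(golden), len(candidate))
    return None

def toy_run(n_steps, fault_at=None):
    """Deterministic stand-in for a simulation; optionally injects a bit flip."""
    pos = [(i, 2 * i, 3 * i) for i in range(4)]
    checksums = []
    for step in range(n_steps):
        pos = [(x + 1, y + 2, z + 3) for x, y, z in pos]
        if step == fault_at:
            x, y, z = pos[0]
            pos[0] = (x ^ 1, y, z)  # simulate a single-bit hardware error
        checksums.append(frame_checksum(pos))
    return checksums

golden = toy_run(100)
print(first_divergence(golden, toy_run(100)))               # None
print(first_divergence(golden, toy_run(100, fault_at=42)))  # 42
```

Because a bit-wise reproducible run has exactly one correct answer per step, any mismatch is a bug signal, and the first mismatching step narrows down where to look. Logging short checksums instead of full frames keeps the comparison cheap, in the spirit of the "lightweight checksums" mentioned on the slide.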

  9. “Inspector Gadget”: Anton’s Logic Analyzer • Anton ASICs include a built-in “logic analyzer” that can be configured to capture traces of various hardware signals without perturbing timing. • Extremely useful when it worked. • Only a limited number of signals could be traced in a single run, often requiring multiple runs • Traces can be “bumped” for other DRAM traffic, so it was often not useful in DRAM mode simulations • Provided key performance tuning data • Lesson: HW visibility tools are a great investment.

  10. “Use your eyes”: Visualization for Debugging and Optimization • Many of the most difficult bugs during bringup were initially tracked down by creating custom visualizations that provided key insights. • Favor quick and dirty over beautiful! • Example 1: Force mismatch blast patterns • Example 2: When ions attack • Example 3: Logic Analyzer for optimization/tuning

  11. “Use Your Eyes”: Blast Pattern

  12. “Use your eyes”: When ions attack

  13. “Use your eyes”: Profiling

  14. “Target Practice”: Have concrete milestones

  15. “Trust No One”: Paranoid Debugging • During Anton bringup, it was useful to be very paranoid. • Issues were found in both hardware and software at similar frequency and our initial guesses were often wrong. • Most engineers have little experience with this phase of a project; as a software developer it takes practice to learn to distrust the hardware. • Best example: SRAMs that would return bad results for some locations less than once an hour.

  16. Conclusions • Application/Architecture Co-design made bringing up Anton extremely challenging • Most important lessons from Anton’s successful bringup • Preparation • Repeatability • Paranoia

  17. End

  18. Molecular Dynamics Simulation (MD) • 10^4 to 10^5 atoms in a simulation • Millisecond-scale simulations • Each time step is ~2 fs (2×10^-15 seconds) • Need 5×10^11 time steps • Presently at ~10^8 time steps/day on a cluster with Desmond (Bowers et al, SC 2006) • Simulating 1 ms takes >10 years on a cluster • Needed an architectural jump forward: Anton (Shaw et al, ISCA 2007, CACM 2008, SC09)
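The arithmetic behind this slide can be checked with a few lines. The time step, target duration, and cluster rate come from the slide itself; the Anton rate is the ~17,000 ns/day figure quoted on slide 3 for a different chemical system, so the last number is only a rough indication. Variable names are illustrative.

```python
# Work in integer femtoseconds to keep the step count exact.
TIMESTEP_FS = 2                  # ~2 fs per time step (from the slide)
TARGET_FS = 10**12               # 1 millisecond = 10^12 fs
CLUSTER_STEPS_PER_DAY = 10**8    # Desmond on a cluster (from the slide)
ANTON_NS_PER_DAY = 17_000        # slide 3 figure, measured on another system

steps_needed = TARGET_FS // TIMESTEP_FS
cluster_days = steps_needed / CLUSTER_STEPS_PER_DAY
cluster_years = cluster_days / 365.25
anton_days = (TARGET_FS / 10**6) / ANTON_NS_PER_DAY  # 10^6 fs per ns

print(f"time steps needed: {steps_needed:.1e}")      # 5.0e+11
print(f"cluster: {cluster_days:.0f} days (~{cluster_years:.1f} years)")
print(f"Anton at 17,000 ns/day: ~{anton_days:.0f} days")
```

The cluster estimate comes out at 5,000 days, or a bit under 14 years, which matches the ">10 years" bullet, while the Anton estimate is on the order of two months.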

  19. Biomolecular Timescales (seconds) [Figure: simulation vs. experiment timescales, adapted from Suits (IBM), originally from Chan & Dill (1993). Annotations: hours/days on workstation; long MD run with Desmond on InfiniBand cluster (weeks to months); less than a day on Anton; a few months on Anton, longest MD simulation ever run.]

  20. Compute Interactions on Neutral Territory [Figure: Traditional Method vs. NT Method; panel labels: Tower, Plate.] D. E. Shaw, “A Fast, Scalable Method for the Parallel Evaluation of Distance-Limited Pairwise Particle Interactions”, J. Comput. Chem., 2005

  21. An Anton ASIC • Two computational subsystems connected by communication ring • Hardware datapaths compute over 25 billion interactions/sec • Software runs on 12 cores in the flexible subsystem • 6 links for the 3D Torus, each 42Gbps bandwidth, 50ns chip-chip latency • 1 Host Interface link for external I/O, 1Gbps. • 2 banks of DDR2-800 DRAM

  22. Anton’s Flexible Subsystem • General Purpose cores are 32-bit Tensilica LX • Remote Access Unit handles multiple parallel DMA to/from 32KB of local SRAM • Geometry Cores are custom-designed, dual-slot VLIW, quad-word fixed-point SIMD • Kuskin et al, HPCA 2008

  23. Anton New York Segment • Anton 512-node system in NY. 2 of 4 racks shown under construction. • Each rack contains 32 boards • Each board holds 4 Anton nodes • 512 nodes in an 8×8×8 3D torus; can be built out to 4096 nodes in a larger data center • D. E. Shaw Research
