Guang R. Gao Founder ET International Inc Newark, Delaware USA ggao@etinternational

HPC User Forum 2012 Panel on Potential Disruptive Technologies: Emerging Parallel Programming Approaches. Guang R. Gao, Founder, ET International Inc, Newark, Delaware, USA, ggao@etinternational.com. Who is ETI? From the “Cool Vendors” report by Gartner (April 17, 2012): [


Presentation Transcript


  1. HPC User Forum 2012 Panel on Potential Disruptive Technologies: Emerging Parallel Programming Approaches. Guang R. Gao, Founder, ET International Inc, Newark, Delaware, USA, ggao@etinternational.com

  2. Who is ETI? From the “Cool Vendors” report by Gartner (April 17, 2012): [ET International, Newark, Delaware (www.etinternational.com). Analysis by Carl Claunch. Why Cool: ET International delivers its dataflow-oriented ETI Swarm environment for garnering high efficiency from highly parallel software, based on the alternative ParalleX execution model. As highly parallel execution becomes essential to addressing the more substantial computing tasks that HPC users face today, progress is increasingly being stymied by the application's inability to keep all the parallel strands working productively. …]

  3. Motivation
     • Many-core is coming
     • Current paradigms don't have the expressive power to harness concurrency
     • Hardware is getting more heterogeneous
     • Current hybrid programming techniques (OpenMP+MPI+OpenCL) are not maintainable: too complicated
     • Caches are disappearing or becoming non-coherent
     • Distributed memory is everywhere, and at different levels
     • Fine-grained power management
     • Use what you need and turn off/down the rest
     • Failure is the norm
     • Resilience must be baked into the whole stack (application, compiler, runtime, hardware)
     • Increasing application computation/data irregularity
     • Static scheduling can no longer properly load balance

  4. ETI Vision
     • We need new “Execution Models”!
     • Leverage ETI’s deep and growing IP position based on 25+ years of applied R&D expertise and $20M+ in R&D software engineering and development (e.g. extensive system software base for Cyclops, CELL, SCC, Intel Runnemede, Intel x86-based machines, Adapteva, etc.)
     • Provide high-performance SWARM software solutions to our OEMs, partners and direct customers
     • Advance SWARM solutions to address optimization opportunities driven by heterogeneous multi-/many-core processing, including:
        • Big Compute (Private HPC Cloud) systems
        • Big Data HPC systems
        • HPC embedded appliances
        • etc.

  5. Execution Paradigm Comparisons
     [Figure: two timelines, one for MPI, OpenMP, OpenCL and one for SWARM; axis: Time; legend: Active threads, Waiting]
     • MPI, OpenMP, OpenCL: Communicating Sequential Processes; Bulk Synchronous; Message Passing
     • SWARM: Asynchronous Event-Driven Tasks; Dependencies; Resources; Active Messages; Control Migration
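
The contrast this slide draws between bulk-synchronous execution and asynchronous event-driven tasks can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not SWARM, MPI, or OpenMP code: the function names, the one-worker-per-chain setup, the rule that a task depends only on the same-numbered task of the previous phase, and the task durations are all hypothetical and chosen purely for illustration.

```python
def bulk_synchronous_finish(phases):
    """Finish time when every phase ends at a global barrier.

    phases is a list of phases; each phase is a list of task durations.
    A barrier makes each phase cost as much as its slowest task.
    """
    return sum(max(durations) for durations in phases)


def event_driven_finish(phases):
    """Finish time when a task starts as soon as its own dependency is done.

    Assumption: task i of a phase depends only on task i of the previous
    phase, and each chain has its own worker, so chains advance independently.
    """
    chains = zip(*phases)  # regroup durations by chain instead of by phase
    return max(sum(chain) for chain in chains)


# Hypothetical, irregular workload: a different task is slow in each phase.
phases = [
    [4, 1, 1],
    [1, 4, 1],
    [1, 1, 4],
]

bsp_time = bulk_synchronous_finish(phases)
event_time = event_driven_finish(phases)
print("bulk synchronous:", bsp_time, "event driven:", event_time)
```

In this model the barrier version always waits for the slowest task of every phase, while the dependency-driven version only waits for the slowest chain, which is one way to read the "Waiting" regions in the slide's timelines.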

  6. SWARM Execution Overview
     [Figure: cycle of tasks and resources managed by SWARM]
     • Task states: Tasks with Unsatisfied Dependencies; Enabled Tasks
     • Resource states: Available Resources; Resources in Use (CPUs and GPUs)
     • Transitions: Dependencies satisfied; Tasks enabled; Tasks mapped to resources; Resources allocated; Resources released
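
The cycle named on this slide (dependencies satisfied, tasks enabled, tasks mapped to resources, resources released) can be sketched as a small dependency-counting scheduler. This is a minimal illustration, not the SWARM runtime or its API: `run_tasks`, the discrete-step model, the `num_resources` parameter, and the example task graph are all assumptions introduced here.

```python
from collections import deque


def run_tasks(deps, num_resources=2):
    """Simulate dependency-driven execution in discrete steps.

    deps maps each task name to the list of tasks it waits on.
    Returns a list of steps; each step lists the tasks run in that step.
    """
    # Count unsatisfied dependencies and record which tasks wait on which.
    pending = {task: len(waits_on) for task, waits_on in deps.items()}
    dependents = {task: [] for task in deps}
    for task, waits_on in deps.items():
        for dep in waits_on:
            dependents[dep].append(task)

    # Tasks with no unsatisfied dependencies start out enabled.
    enabled = deque(sorted(t for t, n in pending.items() if n == 0))
    schedule = []
    while enabled:
        # Map as many enabled tasks as there are available resources.
        count = min(num_resources, len(enabled))
        running = [enabled.popleft() for _ in range(count)]
        schedule.append(running)
        # A finished task releases its resource and satisfies dependencies,
        # which may enable further tasks.
        for task in running:
            for child in dependents[task]:
                pending[child] -= 1
                if pending[child] == 0:
                    enabled.append(child)
    return schedule


# Hypothetical task graph used only for illustration.
deps = {
    "load": [],
    "parse": ["load"],
    "index": ["load"],
    "stats": ["parse"],
    "report": ["index", "stats"],
}
schedule = run_tasks(deps, num_resources=2)
print(schedule)
```

Counting unsatisfied dependencies per task keeps the scheduler event-driven: nothing polls, and a task moves to the enabled queue at the moment its last dependency completes.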

  7. Case Studies of Fine-Grain Execution Models
     • Static Dataflow Model (1970s - )
     • EARTH Model (1988 - )
     • TNT Model and Cyclops-64 (2003 - )
     • Codelet Model under Intel-led DARPA/UHPC

  8. DARPA/Intel Runnemede Program
     [Figure: program overview diagram, courtesy of The Intel DARPA UHPC Team; goal: 1000X energy reduction]
     • Diagram headings: Architecture; System Management & Concurrency; Assured Operation; Resiliency; Execution Model; HW/SW Co-Design; Interconnect Fabric; Productivity; Application Efficiency; Data Movement; Memory
     • Diagram labels: Heterogeneous, Tightly-Coupled, Simple; Event driven codelets; Self-aware introspection; Code and data motion; CPU <10% overhead; Checkpoint with Flash/CPM; Security Through Sandboxing; Model-based; Goal-oriented; Self-morphing; Heterogeneous & tapered; Large local memory; Overhauled DRAM µArch; Resilient memory
     • Our Collaborators: ET International, Inc.; University of Illinois

  9. Progress & Proof Points To-Date

  10. Barnes-Hut: SWARM vs OpenMP
     [Figure: Barnes-Hut plot; series: Ideal, SWARM, OpenMP]

  11. SWARM/MPI Performance Comparison Consistent Speed-up from 2X to 14.5X

  12. Cholesky Decomposition (SWARM vs MKL/ScaLAPACK)

  13. Summary and Acknowledgements
     • Summary (productivity observation)
        • N-Body: 1 man-day, 3x
        • G-500: 1 man-month, up to 14x
        • Cholesky: 2 man-weeks, 1.5x
        NOTE: the baseline is the performance of optimized code
     • Acknowledgements
        • Our Sponsors
        • Our Collaborators and Colleagues
        • My Host
        • Others

  14. Cholesky Profiles
     [Figure: execution profiles for SWARM and OpenMP]