
Center for Plasma Edge Simulation Framework

Presentation Transcript


  1. Center for Plasma Edge Simulation Framework. Scott Klasky and Ilkay Altintas, SDSC; Bertram Ludäscher, UC Davis; Mladen Vouk, NCSU. [add help from CPES team!] ORNL, June 7, 2005

  2. Outline of Talk • The Center for Plasma Edge Simulation FSP. • Computer Science Enabling Technologies. • Technologies necessary for FSP • Loose Code Coupling. • Adaptive Workflow Technology • Collaborative Code Monitoring. • Integrated Data Analysis and Visualization Environment. • Ubiquitous and Transparent Data Sharing.

  3. Center for Plasma Edge Simulation (figure: pedestal growth) • Develop a new integrated predictive plasma edge simulation framework applicable to existing magnetic fusion facilities and next-generation burning plasma experiments, including ITER, by • Creating a new edge kinetic code, XGC-ET, using the particle-in-cell (PIC) method. • Improving M3D for nonlinear ELM studies. • Developing an integrated simulation framework between XGC-ET and M3D. • Studying the L-H transition, pedestal buildup, ELM crash, etc. • Working with existing SciDAC centers.

  4. It's about the enabling technologies (diagram: Applications, Math, CS). Applications drive; enabling technologies respond.

  5. FSP computer science requirements • Coupling multiple codes/data • In-core and network-based • Analysis and visualization • Feature extraction, data juxtaposition for V&V • Dynamic monitoring and control • Parameter modification, snapshot generation, … • Data sharing among collaborators • Transparent and efficient data access • Our FSP will drive our framework requirements. • Physicists want an "easy-to-use", "easy-to-develop" framework. • We evaluated many frameworks before making a decision.

  6. Workflow Requirements: Loose coupling (monitoring, analysis, storing). (Workflow diagram; recoverable flow: Start (L-H) → M3D-L linear stability check ("Stable?"); while stable, XGC-ET evolves the edge through Mesh/Interpolation, writing to a TB-scale Distributed Store; once unstable, M3D and XGC-ET are coupled through Mesh/Interpolation (exchanging, e.g., conductivity) until "Stable?" and "B healed?" are satisfied, writing to a GB-scale Distributed Store; outputs feed Compute Puncture Plots, Noise Detection, Island Detection, Blob Detection, Feature Detection, and Out-of-core Isosurface methods (MBs), which drive the Portal (Elvis) and IDAVE; the outer loop repeats while "Need More Flights?" is true.) A control loop along these lines is sketched below.
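
The following is only an illustrative Python sketch of the loose-coupling control flow in the diagram above. Everything it calls (run_m3d_l, run_xgc_et, run_m3d, to_xgc, to_m3d, need_more_flights, and the archive object) is a hypothetical stand-in for launching the real codes and stores; it is not the CPES framework's actual interface.

# Illustrative only: the loose-coupling loop from the slide 6 diagram.
# The entries of `codes` and the `archive` object are hypothetical stand-ins.
def loose_coupling_loop(state, codes, archive):
    while codes["need_more_flights"](state):
        # Build the pedestal with the kinetic edge code until the M3D-L linear
        # stability check reports that the edge profile has gone unstable.
        while codes["run_m3d_l"](state).stable:
            state = codes["run_xgc_et"](codes["to_xgc"](state))
            archive.store("xgc-et", state)      # TB-scale distributed store
        # Follow the ELM crash with M3D, coupled back through mesh/interpolation,
        # until the plasma is stable again and the magnetic field is healed.
        while True:
            state = codes["run_m3d"](codes["to_m3d"](state))
            archive.store("m3d", state)         # GB-scale distributed store
            if state.stable and state.b_healed:
                break
    return state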

  7. Adaptive Workflow Automation Problem: Unique requirements of scientific WFs • Moving large data volumes between modules • Tightly coupled, efficient data movement • Wide-area, loosely coupled data movement • Specification of granularity-based iteration • e.g., in spatio-temporal simulations a time step is a "granule" (see the sketch below) • Support for data transformation • complex data types (including file formats, e.g., netCDF, HDF) • Dynamic steering of the workflow by the user • Dynamic user examination of results • Adaptive workflow automation • detect and dynamically respond to changing requirements, state, and execution in the workflow. • Paper in progress: "An Autonomic Service Architecture for Self-Managing Grid Applications".
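
One concrete reading of "a time step is a granule": iterate granule by granule and let a user-supplied policy decide how the workflow adapts when a step fails. The fragment below is a generic Python sketch, not Kepler code; process and on_failure are hypothetical callables.

def iterate_granules(time_steps, process, on_failure):
    """Push one granule (time step) at a time through `process`;
    `on_failure` decides how the workflow adapts (skip, retry, reroute, ...)."""
    for step in time_steps:
        try:
            yield step, process(step)
        except Exception as err:
            yield step, on_failure(step, err)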

  8. Elvis (Feibush) • Part of the Fusion Collaboratory SciDAC • Goals: • Harden Scivis and deploy it on the portal. • Integration with MDS+ for scopes. • http://w3.pppl.gov/transp/transpgrid_monitor • Used every day for monitoring TRANSP runs. • Contains the basic functionality of Scivis (Klasky). • Web-based and Java application. • Used by dozens of fusion scientists. • Elvis will be extended to be an actor in the Kepler system.

  9. Ubiquitous and Transparent Data Sharing (Diagram: a petabyte-scale tape archive, e.g. HPSS, feeding terabyte-scale disk caches at multiple sites, each accessed through IDAVE.)

  10. Ubiquitous and Transparent Data Sharing • Problem: • Simulations and collaborators in any FSP will be distributed across national and international networks • FSP simulations will produce massive amounts of data that will be stored permanently at national facilities and temporarily on collaborators' disk storage systems • Need to share large volumes of data among collaborators and the wider community. • Current fusion solutions are inadequate for the FSP data management challenges.

  11. Ubiquitous and Transparent Data Sharing • What technology is required • Metadata system • To map user concepts to datasets and files • e.g. find {ITER, shot_1174, Var=P(2D), Time=0-10} • e.g. yields: /iter/shot1174/mhd • Logical-to-physical data (file) mapping • e.g. lors://www.pppl.gov/fsp/shot1174.xml (a toy illustration of these two mappings follows below) • Support for multiple replicas based on access patterns • Technology to manage temporary space • Lifetime, garbage collection • Technology for fast access • Parallel streams, large transfer windows, data streaming • Robustness • If the mass store is unavailable, replicas can be used • Technology to recover from transient failures
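
The two lookups on this slide (user-level metadata to a logical dataset name, and logical name to physical replicas) can be pictured with the toy Python catalog below. The catalog entries reuse the slide's own examples; the dictionary-based lookup itself is just an assumption for illustration, not the actual metadata service.

# user-level metadata -> logical dataset name (example taken from the slide)
METADATA_CATALOG = {
    ("ITER", "shot_1174", "P(2D)"): "/iter/shot1174/mhd",
}

# logical dataset name -> physical replicas, ordered by preferred access
REPLICA_CATALOG = {
    "/iter/shot1174/mhd": ["lors://www.pppl.gov/fsp/shot1174.xml"],
}

def find(device, shot, variable):
    """Map user concepts to a logical dataset, then to its physical replicas."""
    logical = METADATA_CATALOG[(device, shot, variable)]
    return logical, REPLICA_CATALOG[logical]

# find("ITER", "shot_1174", "P(2D)") ->
#   ("/iter/shot1174/mhd", ["lors://www.pppl.gov/fsp/shot1174.xml"])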

  12. Ubiquitous and Transparent Data Sharing • Approach • Need logistical variants of standard libraries and tools (NetCDF, HDF5) for moving and accessing data across the network • Speed of transfer and control of placement are vital to performance and fault tolerance • Data staging, scheduling, and tracking based on common SDM tools and policies • Global namespace and placement policies to enable community collaboration around distributed postprocessing and visualization tasks • Experience • Logistical Networking: distributed depot system, maps logical to physical, parallel access, file staging • Storage Resource Management (SRM): disk and tape management systems; manages space, lifetimes, garbage collection (see the scratch-space sketch below) • No dependence on a single system: SRM is a middleware standard for multiple storage systems
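
The "space, lifetimes, garbage collection" part of SRM-style storage management can be pictured with the stand-alone toy below. ScratchSpace, pin, and garbage_collect are invented names for illustration; this is not the SRM interface.

import os
import time

class ScratchSpace:
    """Toy model of temporary staging space with per-file lifetimes."""
    def __init__(self, default_lifetime=3600.0):
        self.default_lifetime = default_lifetime
        self.expiry = {}                      # path -> absolute expiry time

    def pin(self, path, lifetime=None):
        """Register a staged file and give it a lifetime in seconds."""
        self.expiry[path] = time.time() + (lifetime or self.default_lifetime)

    def garbage_collect(self):
        """Delete staged files whose lifetime has run out."""
        now = time.time()
        for path, expires in list(self.expiry.items()):
            if now >= expires:
                if os.path.exists(path):
                    os.remove(path)
                del self.expiry[path]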

  13. The Big Picture: Supporting the Scientist in Going from Napkin Drawings to Executable Workflows. Conceptual SWF → Executable SWF. Here: John Blondin, NC State; Terascale Supernova Initiative (Astrophysics SciDAC, DOE)

  14. Another Example: External Job Management • Reuse existing job management infrastructure (e.g., NIMROD, CONDOR, etc.) • Here: 1000s of GAMESS jobs (quantum mechanics); a conceptual fan-out sketch follows below
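
Conceptually, the fan-out that Kepler hands off to an external job manager looks like the throttled sweep below. This is only an analogy in plain Python: submit_job is a hypothetical callable that would pass one input deck to the real job manager (NIMROD, Condor, ...), and the thread pool merely mimics keeping many jobs in flight at once.

from concurrent.futures import ThreadPoolExecutor

def run_parameter_sweep(input_decks, submit_job, max_in_flight=32):
    """Submit every input deck and return {deck: result}, limiting concurrency."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = {deck: pool.submit(submit_job, deck) for deck in input_decks}
        return {deck: fut.result() for deck, fut in futures.items()}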

  15. Promoter Identification Workflow in KEPLER • choice of “directors” (= models of computation) • nested sub-workflows • dynamic parameters • user interaction • smart pause and rerun infrastructure • smooth transition from design to execution, etc.

  16. CPES FSP Workflow

  17. PPPL/Fusion Workflow Palette • AMRMHD: 3 months • Elvis: 6 months (monitoring) • GTC: 4 months • M3D: 3 months • XGC-ET + M3D: 1.5 years

  18. KEPLER/CSP: Contributors, Sponsors, Projects. Ilkay Altintas (SDM, NLADR, Resurgence, EOL, …), Kim Baldridge (Resurgence, NMI), Chad Berkley (SEEK), Shawn Bowers (SEEK), Terence Critchlow (SDM), Tobin Fricke (ROADNet), Jeffrey Grethe (BIRN), Christopher H. Brooks (Ptolemy II), Zhengang Cheng (SDM), Dan Higgins (SEEK), Efrat Jaeger (GEON), Matt Jones (SEEK), Werner Krebs (EOL), Edward A. Lee (Ptolemy II), Kai Lin (GEON), Bertram Ludaescher (SDM, SEEK, GEON, BIRN, ROADNet), Mark Miller (EOL), Steve Mock (NMI), Steve Neuendorffer (Ptolemy II), Jing Tao (SEEK), Mladen Vouk (SDM), Xiaowen Xin (SDM), Yang Zhao (Ptolemy II), Bing Zhu (SEEK), … www.kepler-project.org. Sites: LLNL, NCSU, SDSC, UCB, UCD, UCSB, UCSD, U Man…, Utah, …, UTEP, …, Zurich. SPA collaboration tools: IRC, cvs, Skype, Wiki (hotTopics, FAQs, …).

  19. GEON Dataset Generation & Registration (co-development is the key in KEPLER/SPA). (Workflow screenshot; recoverable labels: Makefile, "ant run", SQL database access (JDBC); contributors: Matt et al. (SEEK), Efrat (GEON), Ilkay (SDM), Yang (Ptolemy), Xiaowen (SDM), Edward et al. (Ptolemy).)

  20. The Challenges: … Many … • How to manage complexity • How to enable data-intensive apps • How to enable compute-intensive apps • How to enable fault tolerance • How to enable user interactivity • Dynamic WF adaptation • Support for various transport mechanisms (main memory, shared file system, Logistical Networking, …)

  21. KEPLER as a Framework • Actor-oriented modeling and design • … maximizes component reusability • … by employing a data-centric, dataflow-oriented (and functional) view, and … • … by "separating out" orchestration issues • … from the "normal components" (actors) • … and centralizing them in a special "director" • … keeping actor interfaces extremely simple (a toy illustration of this separation follows below)
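
To make the actor/director separation concrete, here is a deliberately tiny toy in plain Python (not Ptolemy II/Kepler code): the actor's entire interface is fire(), while the "director" owns scheduling and token movement. Scale and SDFDirector are invented names for this illustration.

from collections import deque

class Scale:
    """An "actor": its whole interface is fire(token) -> token."""
    def __init__(self, factor):
        self.factor = factor

    def fire(self, token):
        return token * self.factor

class SDFDirector:
    """A "director": decides when actors fire and how tokens move between them."""
    def __init__(self, actors):
        self.actors = actors                 # a simple linear pipeline

    def run(self, source_tokens):
        queue, results = deque(source_tokens), []
        while queue:                         # static dataflow-style schedule
            token = queue.popleft()
            for actor in self.actors:
                token = actor.fire(token)
            results.append(token)
        return results

# SDFDirector([Scale(2), Scale(10)]).run([1, 2, 3]) -> [20, 40, 60]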

  22. KEPLER as a Scientific Workflow Methodology • From (actor-oriented) modeling and design … • … to deployment, execution, monitoring, reuse. • A framework to "glue together" different technologies (i.e., playing "nice" with external apps) • Local/remote legacy applications (ssh2, command line, …) • GridFTP, SRB (Sput, Sget, Sreplicate, …), scp, … • Globus, Nimrod, CONDOR, … • … • Moreover (Ptolemy heritage of 15+ years…): • … multiple (well-studied!) execution models • … nested sub-workflows • … a user-friendly GUI front-end

  23. The Competition: Commercial/Open Source Scientific Workflow and (Dataflow) Systems – we know 'em all … Kensington Discovery Edition from InforSense, Triana, SciRUN II, Taverna

  24. Process • Need to understand the specific requirements of our workflows • Typical process: Requirements, Abstraction & Parameterization, Virtualization • From "napkin drawings" to executable Kepler workflows • Descriptions of individual tasks & steps, data formats, overall dataflow, execution constraints, distribution aspects; deciding on the right parameters and encapsulation levels. • Deciding the level and granularity of control, synchronicity, execution environments, timings, data throughput, indirection level, flow persistence, etc. • It's all about the interfaces! • Overall goal: Scientific Process Automation (SPA!)

  25. Component Interaction and Behavioral Polymorphism in Ptolemy II / KEPLER Behavioral polymorphism is the idea that components can be defined to operate with multiple models of computation and multiple middleware frameworks. Polymorphic methods implement the communication semantics of a domain in Ptolemy II; the receiver instance used in communication is supplied by the director, not by the component (a toy sketch follows below). Source: Edward A. Lee, UC Berkeley
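
The "receiver supplied by the director" idea can be pictured as follows: the actor only ever calls put()/get(), and the director chooses which receiver class backs those calls, so the same actor behaves correctly under different models of computation. These classes are illustrative Python stand-ins, not the Ptolemy II Receiver API.

from collections import deque

class FIFOReceiver:
    """Process-network style: every token is queued and consumed in order."""
    def __init__(self):
        self._queue = deque()

    def put(self, token):
        self._queue.append(token)

    def get(self):
        return self._queue.popleft()

class LatestValueReceiver:
    """Sampling style: only the most recent token is kept."""
    def __init__(self):
        self._value = None

    def put(self, token):
        self._value = token

    def get(self):
        return self._value

def make_receiver(domain):
    """Stand-in for the director (not the actor) supplying the receiver."""
    return FIFOReceiver() if domain == "PN" else LatestValueReceiver()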

  26. Component Composition & Interaction. As in KEPLER: components are linked via ports, often dataflow (but also message/control flow). But: where is the component interaction semantics defined?? (Cf. WS composition, orchestration, …) Source: GRIST workshop, July 2004, Caltech

  27. Some KEPLER Actors (out of 160+ … and counting…)

  28. KEPLER Today • Coarse-grained scientific workflows, e.g., • web service actors, Grid actors, command-line actors, … • Fine-grained workflows and simulations, e.g., • Database access, XSLT transformations, … • Kepler Extensions • support for data- and compute-intensive workflows (SDM/SPA, SEEK) • real-time data streaming (ROADNet) • other special and generic extensions (e.g. GEON, SEEK) • Status • first release (alpha) was in May 2004 • nightly builds w/ version tests • “Link-Up Sister Project” w/ other SWF systems (myGrid/Taverna, Triana, …), SciRUN II (DOE SciDAC/SDM) • Participation in various workshops and conferences (GGF10, SSDBMs, eScience WF workshop, Ptolemy/Kepler Miniconf. Berkeley…)

  29. KEPLER Extensions • Data-intensive workflows • SRB (data access, movement, replication management, collection management, …), scp, GridFTP, Sabul, … • Compute-intensive workflows • NIMROD, LST, …, CONDOR, Pegasus, …, Globus, …, Griddles • Execution monitoring • custom actors • Fault tolerance • custom actors (e.g., retry-n, failover; sketched below)
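
The two fault-tolerance patterns named on this slide, retry-n and failover, might look roughly like the plain-Python wrappers below. This is a hedged sketch of the pattern only; these functions are not the actual Kepler actors.

import time

def retry_n(task, n, delay=5.0):
    """Run `task` up to n (>= 1) times, sleeping `delay` seconds between attempts."""
    for attempt in range(1, n + 1):
        try:
            return task()
        except Exception:
            if attempt == n:
                raise                         # give up after the n-th failure
            time.sleep(delay)

def failover(primary, alternates):
    """Try the primary task first, then each alternate until one succeeds."""
    errors = []
    for task in [primary, *alternates]:
        try:
            return task()
        except Exception as err:
            errors.append(err)
    raise RuntimeError("all %d alternatives failed" % len(errors))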

  30. KEPLER Tomorrow • Support for the complete SWF life cycle • Design, share, prototype, run, monitor, deploy, … • Application-driven extensions (here: SDM): • access to/integration with other SDM components • PnetCDF?, PVFS(2)?, MPI-IO?, parallel-R?, ASPECT?, FastBit, … • support for execution of new SWF domains • Astrophysics, Fusion, … • Further generic extensions: • additional support for data-intensive and compute-intensive workflows (all SRB Scommands, CCA support, …) • semantics-intensive workflows • (C-z; bg; fg)-ing ("detach" and reconnect) • workflow deployment models • distributed execution, monitoring, graceful degradation, fault tolerance, … • Additional "domain awareness" (esp. via new directors) • currently: "outsource" specific capabilities (job scheduling, data movement, etc.) to external "black box" components (NIMROD, CONDOR, SRB, Globus, …) • What can we achieve with "awareness" of job scheduling, time series, parameter sweeps, and a hybrid type system with semantic types ("Sparrow" extensions)? • Consolidation • More installers, regular releases, improved usability, documentation, …
