
Center for Plasma Edge Simulation Framework

Presentation Transcript


  1. Center for Plasma Edge Simulation Framework. Scott Klasky and Ilkay Altintas, SDSC; Bertram Ludäscher, UC Davis; Mladen Vouk, NCSU. [add help from CPES team!] ORNL, June 7, 2005

  2. Outline of Talk • The Center for Plasma Edge Simulation FSP. • Computer Science Enabling Technologies. • Technologies necessary for FSP • Loose Code Coupling. • Adaptive Workflow Technology • Collaborative Code Monitoring. • Integrated Data Analysis and Visualization Environment. • Ubiquitous and Transparent Data Sharing.

  3. Center for Plasma Edge Simulation (figure: pedestal growth) • Develop a new integrated predictive plasma edge simulation framework applicable to existing magnetic fusion facilities and next-generation burning plasma experiments, including ITER, by • Creating a new edge kinetic code, XGC-ET, using the particle-in-cell (PIC) method. • Improving M3D for nonlinear ELM studies. • Developing an integrated simulation framework between XGC-ET and M3D. • Studying the L-H transition, pedestal buildup, ELM crash, etc. • Working with existing SciDAC centers.

  4. It's about the enabling technologies (diagram: Applications, Math, CS). Applications drive; enabling technologies respond.

  5. FSP computer science requirements • Coupling multiple codes/data • In-core and network-based • Analysis and visualization • Feature extraction, data juxtaposition for V&V • Dynamic monitoring and control • Parameter modification, snapshot generation, … • Data sharing among collaborators • Transparent and efficient data access • Our FSP will drive our framework requirements. • Physicists want an "easy-to-use", "easy-to-develop" framework. • We evaluated many frameworks before making a decision.

  6. Workflow Requirements: Loose coupling (monitoring, analysis, storing). (Workflow diagram; recoverable flow: Start (L-H) → M3D-L linear stability check ("Stable?"); while stable, XGC-ET evolves the edge through Mesh/Interpolation, writing to a TB-scale Distributed Store; once unstable, M3D and XGC-ET are coupled through Mesh/Interpolation (exchanging, e.g., conductivity) until "Stable?" and "B healed?" are satisfied, writing to a GB-scale Distributed Store; outputs feed Compute Puncture Plots, Noise Detection, Island Detection, Blob Detection, Feature Detection, and Out-of-core Isosurface methods (MBs), which drive the Portal (Elvis) and IDAVE; the outer loop repeats while "Need More Flights?" is true.) A control loop along these lines is sketched below.
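
The following is only an illustrative Python sketch of the loose-coupling control flow in the diagram above. Everything it calls (run_m3d_l, run_xgc_et, run_m3d, to_xgc, to_m3d, need_more_flights, and the archive object) is a hypothetical stand-in for launching the real codes and stores; it is not the CPES framework's actual interface.

# Illustrative only: the loose-coupling loop from the slide 6 diagram.
# The entries of `codes` and the `archive` object are hypothetical stand-ins.
def loose_coupling_loop(state, codes, archive):
    while codes["need_more_flights"](state):
        # Build the pedestal with the kinetic edge code until the M3D-L linear
        # stability check reports that the edge profile has gone unstable.
        while codes["run_m3d_l"](state).stable:
            state = codes["run_xgc_et"](codes["to_xgc"](state))
            archive.store("xgc-et", state)      # TB-scale distributed store
        # Follow the ELM crash with M3D, coupled back through mesh/interpolation,
        # until the plasma is stable again and the magnetic field is healed.
        while True:
            state = codes["run_m3d"](codes["to_m3d"](state))
            archive.store("m3d", state)         # GB-scale distributed store
            if state.stable and state.b_healed:
                break
    return state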

  7. Adaptive Workflow Automation Problem: Unique requirements of scientific WFs • Moving large data volumes between modules • Tightly coupled, efficient data movement • Wide-area, loosely coupled data movement • Specification of granularity-based iteration • e.g., in spatio-temporal simulations a time step is a "granule" (see the sketch below) • Support for data transformation • complex data types (including file formats, e.g., netCDF, HDF) • Dynamic steering of the workflow by the user • Dynamic user examination of results • Adaptive workflow automation • detect and dynamically respond to changing requirements, state, and execution in the workflow. • Paper in progress: "An Autonomic Service Architecture for Self-Managing Grid Applications".
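
One concrete reading of "a time step is a granule": iterate granule by granule and let a user-supplied policy decide how the workflow adapts when a step fails. The fragment below is a generic Python sketch, not Kepler code; process and on_failure are hypothetical callables.

def iterate_granules(time_steps, process, on_failure):
    """Push one granule (time step) at a time through `process`;
    `on_failure` decides how the workflow adapts (skip, retry, reroute, ...)."""
    for step in time_steps:
        try:
            yield step, process(step)
        except Exception as err:
            yield step, on_failure(step, err)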

  8. Elvis (Feibush) • Part of the Fusion Collaboratory SciDAC • Goals: • Harden Scivis and deploy it on the portal. • Integration with MDS+ for scopes. • http://w3.pppl.gov/transp/transpgrid_monitor • Used every day for monitoring TRANSP runs. • Contains the basic functionality of Scivis (Klasky). • Web-based and Java application. • Used by dozens of fusion scientists. • Elvis will be extended to be an actor in the Kepler system.

  9. Ubiquitous and Transparent Data Sharing (Diagram: a petabyte-scale tape archive, e.g. HPSS, feeding terabyte-scale disk caches at multiple sites, each accessed through IDAVE.)

  10. Ubiquitous and Transparent Data Sharing • Problem: • Simulations and collaborators in any FSP will be distributed across national and international networks • FSP simulations will produce massive amounts of data that will be stored permanently at national facilities and temporarily on collaborators' disk storage systems • Need to share large volumes of data among collaborators and the wider community. • Current fusion solutions are inadequate for the FSP data management challenges.

  11. Ubiquitous and Transparent Data Sharing • What technology is required • Metadata system • To map user concepts to datasets and files • e.g. find {ITER, shot_1174, Var=P(2D), Time=0-10} • e.g. yields: /iter/shot1174/mhd • Logical-to-physical data (file) mapping • e.g. lors://www.pppl.gov/fsp/shot1174.xml (a toy illustration of these two mappings follows below) • Support for multiple replicas based on access patterns • Technology to manage temporary space • Lifetime, garbage collection • Technology for fast access • Parallel streams, large transfer windows, data streaming • Robustness • If the mass store is unavailable, replicas can be used • Technology to recover from transient failures
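
The two lookups on this slide (user-level metadata to a logical dataset name, and logical name to physical replicas) can be pictured with the toy Python catalog below. The catalog entries reuse the slide's own examples; the dictionary-based lookup itself is just an assumption for illustration, not the actual metadata service.

# user-level metadata -> logical dataset name (example taken from the slide)
METADATA_CATALOG = {
    ("ITER", "shot_1174", "P(2D)"): "/iter/shot1174/mhd",
}

# logical dataset name -> physical replicas, ordered by preferred access
REPLICA_CATALOG = {
    "/iter/shot1174/mhd": ["lors://www.pppl.gov/fsp/shot1174.xml"],
}

def find(device, shot, variable):
    """Map user concepts to a logical dataset, then to its physical replicas."""
    logical = METADATA_CATALOG[(device, shot, variable)]
    return logical, REPLICA_CATALOG[logical]

# find("ITER", "shot_1174", "P(2D)") ->
#   ("/iter/shot1174/mhd", ["lors://www.pppl.gov/fsp/shot1174.xml"])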

  12. Ubiquitous and Transparent Data Sharing • Approach • Need logistical variants of standard libraries and tools (NetCDF, HDF5) for moving and accessing data across the network • Speed of transfer and control of placement are vital to performance and fault tolerance • Data staging, scheduling, and tracking based on common SDM tools and policies • Global namespace and placement policies to enable community collaboration around distributed postprocessing and visualization tasks • Experience • Logistical Networking: distributed depot system, maps logical to physical, parallel access, file staging • Storage Resource Management (SRM): disk and tape management systems; manages space, lifetimes, garbage collection (see the scratch-space sketch below) • No dependence on a single system: SRM is a middleware standard for multiple storage systems
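
The "space, lifetimes, garbage collection" part of SRM-style storage management can be pictured with the stand-alone toy below. ScratchSpace, pin, and garbage_collect are invented names for illustration; this is not the SRM interface.

import os
import time

class ScratchSpace:
    """Toy model of temporary staging space with per-file lifetimes."""
    def __init__(self, default_lifetime=3600.0):
        self.default_lifetime = default_lifetime
        self.expiry = {}                      # path -> absolute expiry time

    def pin(self, path, lifetime=None):
        """Register a staged file and give it a lifetime in seconds."""
        self.expiry[path] = time.time() + (lifetime or self.default_lifetime)

    def garbage_collect(self):
        """Delete staged files whose lifetime has run out."""
        now = time.time()
        for path, expires in list(self.expiry.items()):
            if now >= expires:
                if os.path.exists(path):
                    os.remove(path)
                del self.expiry[path]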

  13. The Big Picture: Supporting the Scientist in Going from Napkin Drawings to Executable Workflows. Conceptual SWF → Executable SWF. Here: John Blondin, NC State; Terascale Supernova Initiative (Astrophysics SciDAC, DOE)

  14. Another Example: External Job Management • Reuse existing job management infrastructure (e.g., NIMROD, CONDOR, etc.) • Here: 1000s of GAMESS jobs (quantum mechanics); a conceptual fan-out sketch follows below
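
Conceptually, the fan-out that Kepler hands off to an external job manager looks like the throttled sweep below. This is only an analogy in plain Python: submit_job is a hypothetical callable that would pass one input deck to the real job manager (NIMROD, Condor, ...), and the thread pool merely mimics keeping many jobs in flight at once.

from concurrent.futures import ThreadPoolExecutor

def run_parameter_sweep(input_decks, submit_job, max_in_flight=32):
    """Submit every input deck and return {deck: result}, limiting concurrency."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = {deck: pool.submit(submit_job, deck) for deck in input_decks}
        return {deck: fut.result() for deck, fut in futures.items()}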

  15. Promoter Identification Workflow in KEPLER • choice of “directors” (= models of computation) • nested sub-workflows • dynamic parameters • user interaction • smart pause and rerun infrastructure • smooth transition from design to execution, etc.

  16. CPES FSP Workflow

  17. PPPL/Fusion Workflow Palette • AMRMHD: 3 months • Elvis: 6 months (monitoring) • GTC: 4 months • M3D: 3 months • XGC-ET + M3D: 1.5 years

  18. KEPLER/CSP: Contributors, Sponsors, Projects. Ilkay Altintas (SDM, NLADR, Resurgence, EOL, …), Kim Baldridge (Resurgence, NMI), Chad Berkley (SEEK), Shawn Bowers (SEEK), Terence Critchlow (SDM), Tobin Fricke (ROADNet), Jeffrey Grethe (BIRN), Christopher H. Brooks (Ptolemy II), Zhengang Cheng (SDM), Dan Higgins (SEEK), Efrat Jaeger (GEON), Matt Jones (SEEK), Werner Krebs (EOL), Edward A. Lee (Ptolemy II), Kai Lin (GEON), Bertram Ludaescher (SDM, SEEK, GEON, BIRN, ROADNet), Mark Miller (EOL), Steve Mock (NMI), Steve Neuendorffer (Ptolemy II), Jing Tao (SEEK), Mladen Vouk (SDM), Xiaowen Xin (SDM), Yang Zhao (Ptolemy II), Bing Zhu (SEEK), … www.kepler-project.org. Sites: LLNL, NCSU, SDSC, UCB, UCD, UCSB, UCSD, U Man…, Utah, …, UTEP, …, Zurich. SPA collaboration tools: IRC, cvs, Skype, Wiki (hotTopics, FAQs, …).

  19. GEON Dataset Generation & Registration (co-development is the key in KEPLER/SPA). (Workflow screenshot; recoverable labels: Makefile, "ant run", SQL database access (JDBC); contributors: Matt et al. (SEEK), Efrat (GEON), Ilkay (SDM), Yang (Ptolemy), Xiaowen (SDM), Edward et al. (Ptolemy).)

  20. The Challenges: … Many … • How to manage complexity • How to enable data-intensive apps • How to enable compute-intensive apps • How to enable fault tolerance • How to enable user interactivity • Dynamic WF adaptation • Support for various transport mechanisms (main memory, shared file system, Logistical Networking, …)

  21. KEPLER as a Framework • Actor-oriented modeling and design • … maximizes component reusability • … by employing a data-centric, dataflow-oriented (and functional) view, and … • … by "separating out" orchestration issues • … from the "normal components" (actors) • … and centralizing them in a special "director" • … keeping actor interfaces extremely simple (a toy illustration of this separation follows below)
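
To make the actor/director separation concrete, here is a deliberately tiny toy in plain Python (not Ptolemy II/Kepler code): the actor's entire interface is fire(), while the "director" owns scheduling and token movement. Scale and SDFDirector are invented names for this illustration.

from collections import deque

class Scale:
    """An "actor": its whole interface is fire(token) -> token."""
    def __init__(self, factor):
        self.factor = factor

    def fire(self, token):
        return token * self.factor

class SDFDirector:
    """A "director": decides when actors fire and how tokens move between them."""
    def __init__(self, actors):
        self.actors = actors                 # a simple linear pipeline

    def run(self, source_tokens):
        queue, results = deque(source_tokens), []
        while queue:                         # static dataflow-style schedule
            token = queue.popleft()
            for actor in self.actors:
                token = actor.fire(token)
            results.append(token)
        return results

# SDFDirector([Scale(2), Scale(10)]).run([1, 2, 3]) -> [20, 40, 60]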

  22. KEPLER as a Scientific Workflow Methodology • From (actor-oriented) modeling and design … • … to deployment, execution, monitoring, reuse. • A framework to "glue together" different technologies (i.e., playing "nice" with external apps) • Local/remote legacy applications (ssh2, command line, …) • GridFTP, SRB (Sput, Sget, Sreplicate, …), scp, … • Globus, Nimrod, CONDOR, … • … • Moreover (Ptolemy heritage of 15+ years…): • … multiple (well-studied!) execution models • … nested sub-workflows • … a user-friendly GUI front-end

  23. The Competition: Commercial/Open Source Scientific Workflow and (Dataflow) Systems – we know 'em all … Kensington Discovery Edition from InforSense, Triana, SciRUN II, Taverna

  24. Process • Need to understand the specific requirements of our workflows • Typical process: Requirements, Abstraction & Parameterization, Virtualization • From "napkin drawings" to executable Kepler workflows • Descriptions of individual tasks & steps, data formats, overall dataflow, execution constraints, distribution aspects; deciding on the right parameters and encapsulation levels. • Deciding the level and granularity of control, synchronicity, execution environments, timings, data throughput, indirection level, flow persistence, etc. • It's all about the interfaces! • Overall goal: Scientific Process Automation (SPA!)

  25. Component Interaction and Behavioral Polymorphism in Ptolemy II / KEPLER Behavioral polymorphism is the idea that components can be defined to operate with multiple models of computation and multiple middleware frameworks. Polymorphic methods implement the communication semantics of a domain in Ptolemy II; the receiver instance used in communication is supplied by the director, not by the component (a toy sketch follows below). Source: Edward A. Lee, UC Berkeley
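
The "receiver supplied by the director" idea can be pictured as follows: the actor only ever calls put()/get(), and the director chooses which receiver class backs those calls, so the same actor behaves correctly under different models of computation. These classes are illustrative Python stand-ins, not the Ptolemy II Receiver API.

from collections import deque

class FIFOReceiver:
    """Process-network style: every token is queued and consumed in order."""
    def __init__(self):
        self._queue = deque()

    def put(self, token):
        self._queue.append(token)

    def get(self):
        return self._queue.popleft()

class LatestValueReceiver:
    """Sampling style: only the most recent token is kept."""
    def __init__(self):
        self._value = None

    def put(self, token):
        self._value = token

    def get(self):
        return self._value

def make_receiver(domain):
    """Stand-in for the director (not the actor) supplying the receiver."""
    return FIFOReceiver() if domain == "PN" else LatestValueReceiver()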

  26. Component Composition & Interaction. As in KEPLER: components are linked via ports, often dataflow (but also message/control flow). But: where is the component interaction semantics defined?? (Cf. WS composition, orchestration, …) Source: GRIST workshop, July 2004, Caltech

  27. Some KEPLER Actors (out of 160+ … and counting…)

  28. KEPLER Today • Coarse-grained scientific workflows, e.g., • web service actors, Grid actors, command-line actors, … • Fine-grained workflows and simulations, e.g., • Database access, XSLT transformations, … • Kepler Extensions • support for data- and compute-intensive workflows (SDM/SPA, SEEK) • real-time data streaming (ROADNet) • other special and generic extensions (e.g. GEON, SEEK) • Status • first release (alpha) was in May 2004 • nightly builds w/ version tests • “Link-Up Sister Project” w/ other SWF systems (myGrid/Taverna, Triana, …), SciRUN II (DOE SciDAC/SDM) • Participation in various workshops and conferences (GGF10, SSDBMs, eScience WF workshop, Ptolemy/Kepler Miniconf. Berkeley…)

  29. KEPLER Extensions • Data-intensive workflows • SRB (data access, movement, replication management, collection management, …), scp, GridFTP, Sabul, … • Compute-intensive workflows • NIMROD, LST, …, CONDOR, Pegasus, …, Globus, …, Griddles • Execution monitoring • custom actors • Fault tolerance • custom actors (e.g., retry-n, failover; sketched below)
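
The two fault-tolerance patterns named on this slide, retry-n and failover, might look roughly like the plain-Python wrappers below. This is a hedged sketch of the pattern only; these functions are not the actual Kepler actors.

import time

def retry_n(task, n, delay=5.0):
    """Run `task` up to n (>= 1) times, sleeping `delay` seconds between attempts."""
    for attempt in range(1, n + 1):
        try:
            return task()
        except Exception:
            if attempt == n:
                raise                         # give up after the n-th failure
            time.sleep(delay)

def failover(primary, alternates):
    """Try the primary task first, then each alternate until one succeeds."""
    errors = []
    for task in [primary, *alternates]:
        try:
            return task()
        except Exception as err:
            errors.append(err)
    raise RuntimeError("all %d alternatives failed" % len(errors))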

  30. KEPLER Tomorrow • Support for the complete SWF life cycle • Design, share, prototype, run, monitor, deploy, … • Application-driven extensions (here: SDM): • access to/integration with other SDM components • PnetCDF?, PVFS(2)?, MPI-IO?, parallel-R?, ASPECT?, FastBit, … • support for execution of new SWF domains • Astrophysics, Fusion, … • Further generic extensions: • additional support for data-intensive and compute-intensive workflows (all SRB Scommands, CCA support, …) • semantics-intensive workflows • (C-z; bg; fg)-ing ("detach" and reconnect) • workflow deployment models • distributed execution, monitoring, graceful degradation, fault tolerance, … • Additional "domain awareness" (esp. via new directors) • currently: "outsource" specific capabilities (job scheduling, data movement, etc.) to external "black box" components (NIMROD, CONDOR, SRB, Globus, …) • What can we achieve with "awareness" of job scheduling, time series, parameter sweeps, and a hybrid type system with semantic types ("Sparrow" extensions)? • Consolidation • More installers, regular releases, improved usability, documentation, …
