
Presentation Transcript


  1. Software Engineering and Process for Scientific Computing: The FLASH Example Anshu Dubey Sept 9, 2013 AFRD Simulation and Modeling Meeting

  2. Code Architecture : Considerations • Identify logically separable functional units of computation • Encode the logical separation (modularity) into a framework • Separate what is exposed outside the module from what is private to the module • Define interfaces through which the modules can interact with each other • Devise control flow – the driver While these are good principles to start with, they don’t always work out easily. It may become difficult to untangle the data dependencies or modularity might dictate code replication. This is where design really becomes important.
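The considerations above can be sketched in a few lines: each module exposes a small public interface, keeps its state private, and a driver owns the control flow. This is a minimal illustration only, with made-up names and a toy update rule, not FLASH source code.

```python
# Toy sketch of modularity + driver control flow (illustrative names only).

class Hydro:
    """A logically separable functional unit: advance() is the public
    interface; everything prefixed with _ is private to the module."""

    def __init__(self):
        self._cfl = 0.8                       # private module state

    def advance(self, state, dt):             # exposed interface
        # Toy update: bump every field by dt * cfl (not real physics).
        return {k: v + dt * self._cfl for k, v in state.items()}


class Driver:
    """Devises the control flow; talks to units only through interfaces."""

    def __init__(self, units):
        self.units = units

    def evolve(self, state, dt, nsteps):
        for _ in range(nsteps):
            for unit in self.units:
                state = unit.advance(state, dt)
        return state


state = Driver([Hydro()]).evolve({"density": 1.0}, dt=0.1, nsteps=2)
```

The point of the separation is that the driver never reaches into `Hydro`'s private data; swapping in an alternate implementation with the same `advance()` signature requires no driver changes.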

  3. FLASH Architecture • Implemented by the setup script, which configures an application • Links together the needed physics and tools • Parses Config files to • Determine a self-consistent set of units to include • Choose which implementation to include when a unit has several • Collect the list of runtime parameters from the units • Determine solution data storage • Configures Makefiles properly • For a particular platform • For the included units • Implements inheritance with the Unix directory structure • Provides a mechanism for customization
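The unit-resolution step the setup script performs can be sketched as a transitive closure over per-unit requirements. The unit names and the `CONFIGS` table below are illustrative stand-ins for what a real Config parser would read from disk.

```python
# Sketch of resolving a self-consistent unit set from per-unit requirement
# metadata (hypothetical data; a real setup script parses Config files).

CONFIGS = {
    "Simulation/Sedov": ["physics/Hydro", "Grid"],   # REQUIRES lines
    "physics/Hydro":    ["physics/Eos"],
    "physics/Eos":      [],
    "Grid":             [],
}

def resolve_units(top):
    """Transitively include every required unit exactly once."""
    included, stack = [], [top]
    while stack:
        unit = stack.pop()
        if unit not in included:
            included.append(unit)
            stack.extend(CONFIGS[unit])      # follow this unit's requirements
    return sorted(included)

units = resolve_units("Simulation/Sedov")
```

A real setup script additionally arbitrates among alternative implementations and collects runtime parameters while walking this graph; the closure over requirements is the core of it.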

  4. Config file example • A unit's Config file can declare: • Required units • Alternate local IO routines • Runtime parameters and their documentation • Additional scratch grid variables • Geometry or other conditions to enforce
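As a rough illustration of the items listed above, a Config file fragment might look like the following. The keywords follow the general style of FLASH Config files, but the specific unit paths, parameter names, and values here are invented for illustration:

```text
# Required units
REQUIRES Driver
REQUIRES physics/Eos

# Alternate local IO implementation (illustrative path)
REQUESTS IO/IOMain/hdf5/serial/PM

# Runtime parameter with its documentation line
PARAMETER cfl REAL 0.8
D cfl Courant-Friedrichs-Lewy factor for the time step

# Additional scratch grid variable (illustrative name)
SCRATCHVAR otmp
```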

  5. Data Management • Defined constants for globally known quantities • Data ownership by individual units • Arbitration on data shared by two or more units • Definition of scope for groups of data • Unit scope data module, one per implementation of the unit • Subunit scope data module, one per implementation of the subunit • All other data modules follow the general FLASH inheritance • The directory in which the module exists, and all of its subdirectories have access to the data modules • Other units can access data through available accessor functions • For large scale manipulations of data residing in two or more units, runtime control transfers back and forth between units • Avoids lateral transfer of large amounts of data • Avoids performance degradation
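The data-ownership rules above can be sketched as follows: a unit-scope data module is visible only inside its unit, and other units see the data exclusively through accessor functions. Class and accessor names are hypothetical, not the FLASH API.

```python
# Sketch of unit-scope data ownership with accessor functions
# (illustrative names, not FLASH source).

class _GridData:
    """Unit-scope data module: by convention, visible only inside Grid."""
    def __init__(self):
        self.ndim = 2
        self.deltas = (0.1, 0.1)


class Grid:
    def __init__(self):
        self.__data = _GridData()     # name-mangled: private to Grid

    # Accessor function -- the only way other units see Grid's data,
    # avoiding lateral transfer of (potentially large) internal state.
    def get_geometry(self):
        return {"ndim": self.__data.ndim, "deltas": self.__data.deltas}


grid = Grid()
geom = grid.get_geometry()
```

For small queries an accessor like this is cheap; for large-scale manipulations, as the slide notes, control (not data) moves between units instead.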

  6. Unit Hierarchy • [Diagram] Unit — top-level API/stubs • UnitMain — common API implementation • UnitSomething — API implementation • Common Impl — shared by Impl_1, Impl_2, and Impl_3, each of which supplies the remaining API implementation and its own kernel

  7. Example of a Unit – Grid (simplified) • Grid — subunits GridMain, GridParticles, GridSolvers, GridBC, plus a local API • GridMain — UG, paramesh (Paramesh2, paramesh4: PM4_package, PM4dev_package, etc.) • GridParticles — GPMapToMesh, GPMove (Sieve, PttoPt) • Why a local API? Grid_init calls the init functions for all subunits; without stubs, if a subunit is not included the code won’t build

  8. Functional Component in Multiple Units • Example: Particles • Position initialization and time integration in the Particles unit • Data movement in the Grid unit • Mapping divided between Grid and Particles • Solve the problem by moving control back and forth between units • [Diagram: Driver (Init, Evolve) invokes Particles (Init, Map, Evolve) and Grid (Init, Map, Move)]
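The back-and-forth control transfer can be sketched with two toy units and a driver loop: Particles advances positions, then Grid migrates particles that left their block. All names, the constant velocity, and the wrap-around "migration" are invented for illustration.

```python
# Toy sketch of moving control (not data) between units: the driver's loop
# alternates between Particles (time integration) and Grid (data movement).

class Particles:
    def __init__(self, positions):
        self.positions = positions

    def evolve(self, dt, velocity=1.0):
        # Time integration lives in the Particles unit.
        self.positions = [x + velocity * dt for x in self.positions]


class Grid:
    def move(self, particles, block_end=1.0):
        # Data movement lives in the Grid unit: "migrate" particles that
        # crossed the block boundary (toy model: periodic wrap-around).
        particles.positions = [x % block_end for x in particles.positions]


p, g = Particles([0.2, 0.95]), Grid()
for _ in range(2):      # driver time loop: control alternates between units
    p.evolve(dt=0.1)    # Particles unit acts
    g.move(p)           # then control passes to the Grid unit
```

Neither unit reaches into the other's internals; the driver hands control over at well-defined points, which is what avoids the lateral data transfer the slide warns about.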

  9. FLASH Evolution : Version 1 • Goal from the beginning • Make the code public • Use the same code for many different applications • All target applications were for reactive flows • Diverging camps from the beginning • Camp 1: Produce a well-architected modular code • Camp 2: Yes, but also use it soon for science • Both goals hard to meet in the near term • Two parallel development paths started • Not enough resources to sustain both • Camp 2 won out • First release: FLASH1.6

  10. Version 1 • Smashed together from three distinct existing codes • PARAMESH for AMR • Prometheus for Hydro • EOS and nuclear burn from other research codes • F77 style of programming; common blocks for data sharing • Inconsistent data structures, divergent coding practices, and no coding standards • Concept of alternative implementations brought in, with a script for plugging in different EOS • Beginning of the inheriting directory structure • First release: FLASH 1.6

  11. Version 2 : Data Inventory • Centralized database • Common blocks eliminated • All data inventoried • Different types of variables identified • Testing got formalized • Test-suite version 1 • Run on multiple platforms • Policies about monitoring • Not much else changed in the architecture

  12. Central Database Disadvantages • Navigating the source tree became more confusing, and Config file dependencies became more verbose • No possibility of data scoping; every data item was equally accessible to every routine in the code • When reading a function, one could not tell the source of its data • Lateral dependencies were further hidden • The overhead of database querying slowed the code by about 10-15% • The queries caused a huge amount of code replication, and source files became ugly

  13. Version 3 : the Current Architecture • Kept the inheriting directory structure, inheritance, and customization mechanisms from earlier versions • Defined naming conventions • Differentiate between namespace and organizational directories • Differentiate between API and non-API functions in a unit • Prefixes indicating the source and scope of data items • Formalized the unit architecture • Defined an API for each unit, with null implementations at the top level • Resolved data ownership and scope • Resolved lateral dependencies for encapsulation • Introduced subunits and a built-in unit test framework

  14. Version 4 • Did not need any change in the architecture • Primarily a capabilities-addition exercise • Mesh replication was easily introduced for multigroup radiation • Expanded to other communities, such as fluid-structure interaction, because of the existing Lagrangian framework and elliptic solver • Has Chombo as an alternative mesh package, but for hydro-only applications

  15. Transition to Version 2 • The bias at the time: keep the scientists in control • Keep the development and production branches synchronized • Development continued in FLASH1.6, so changes had to be brought into FLASH2 simultaneously • Enforced backward compatibility in the interfaces • Precluded needed deep changes • Hugely increased developer effort • High barrier to entry for a new developer • The database caused a performance hit, and IPA could not be done, so the code was slower • Did not get adopted for production in the Center for more than two years

  16. Transition to Version 3 • Controlled by the developers • Sufficient time and resources made available to design and prototype • No attempt at backward compatibility • No attempt to keep development synchronized with production • All focus on a forward-looking modular, extensible, and maintainable code • Two very important factors to remember: • The scientists had a robust enough production code • The developers had internalized the vagaries of the solvers

  17. The Methodology • Build the framework in isolation from the production code base • Infrastructure units first, implemented with a homegrown Uniform Grid; this helped define the API and data ownership • Unit tests for the infrastructure built before any physics was brought over • Hydro and ideal-gas EOS were next, with the Sod problem • Next was PARAMESH: the Sod problem and the IO implementation were verified • A test-suite was started on multiple platforms with various configurations (1/2/3D, UG/PARAMESH, HDF5/PnetCDF) • This took about a year and a half; the framework was very well tested and robust by this time

  18. The Methodology Continued … • In the next stage, the mature solvers (ones unlikely to have incremental changes) were transitioned to the code • Once a code unit was designated for FLASH3, no users could change that unit in FLASH2 without consulting the code group • The next transition was the simplest production application (with a minimal amount of physics) • Scientists were in the loop for verification and for prioritizing the units to be transitioned • FLASH3 was in production in the Center long before its official 3.0 release • More trust between developers and scientists • More reliable code; unit tests provided more confidence, and it was easier to add capabilities

  19. Verification • Codes obviously need to be verified for correctness • There is no such thing as a bug-free code • A code is only as robust as the most rigorous test designed for it • Devising a good test is at least as important as a good algorithm design • Multi-component code testing needs • Unit test to verify a single functionality • May need to be done in more than one way • Other tests that combine components in many different ways • Combinations increase non-linearly with code components
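A unit test that "verifies a single functionality, in more than one way" can be as small as the sketch below. The ideal-gas pressure routine and the test names are invented for illustration; the point is checking one component in isolation against a known value and against a property it must satisfy.

```python
# Minimal unit-test sketch: one functionality, verified two ways.
import unittest

def pressure(density, energy, gamma=1.4):
    """Toy ideal-gas equation of state: p = (gamma - 1) * rho * e."""
    return (gamma - 1.0) * density * energy

class TestEos(unittest.TestCase):
    def test_known_value(self):
        # Check against a hand-computed value: 0.4 * 1.0 * 2.5 = 1.0
        self.assertAlmostEqual(pressure(1.0, 2.5), 1.0)

    def test_linearity_in_density(self):
        # Same functionality checked a second way: p scales with rho.
        self.assertAlmostEqual(pressure(2.0, 2.5), 2 * pressure(1.0, 2.5))

suite = unittest.TestLoader().loadTestsFromTestCase(TestEos)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a multi-component code, tests like these guard individual units; the combinatorial tests the slide mentions then exercise the units together.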

  20. What makes a good test-suite • Verifies the code in every possible meaningful configuration (again, impossible to achieve) • In the absence of comprehensive coverage, provides wide coverage with the available resources • Verifies the code on all supported hardware and software stacks • Is able to report detected errors in easy-to-interpret ways • Runs regularly and catches bugs introduced into the code base as early as possible

  21. Maintenance Practices • Repository management • For every development branch, if there is a production schedule there is a corresponding production branch • Stable revisions of the development branches are tagged and periodically merged into the production branch • Campaigns branch off from the production branch • No forward merges occur on these branches • Backward merges are rare, but they do happen • Usually very limited manual merges of individual files or directories • It all works only if all participants buy into the practice • Typical pitfall: someone not checking in their work regularly; their working copy diverges from the repo and updates become a headache

  22. Coding Standard Management • The code is F90-based, and compilers tend to be very tolerant of bad code • Extremely easy to let non-maintainable code proliferate • Example: you can violate variable scoping by simply putting a “use” statement anywhere; it is valid F90 code • Function prototypes (interfaces in F90) are not required, so you can eat arguments and not find out until the code is so old it has become hard to debug • A set of scripts runs nightly and flags violations of the coding and documentation standards • Periodically (most often just before releases) those violations get resolved

  23. Documentation : How much • A well-maintained code is likely to have 25-30% of its source as inline documentation • More is even better • Not doing that is the surest way for a code component to become unsupported (and eventually disappear from the code base) once its developer has moved on • Even otherwise, in a common code it is a requirement that others can read and make sense of your code • You might forget why you did what you did • The APIs should be really well documented in terms of their function, inputs and outputs, the correct range of values for inputs, and the expected outcomes for those values • Examples of use are even better

  24. Documentation : How much • If the code is public, other types of documentation become necessary • User’s guide • Online resources • FAQs or equivalent • If the code accepts contributions from external users, then even more documentation becomes necessary • Published coding standards • Coding examples • Developer’s guide

  25. Interdisciplinary Interactions • Prioritization: good long-term design or short-term science objectives? • Both have their place • Initial stages should be driven by science objectives • Too early for long-term software design • Quick-and-dirty solutions, with an eye to learning about code components and their interplay • Once there is usable code, long-term planning and design should occur • Willingness to make wholesale changes to the code at least once is necessary • At no stage should one lose sight of the science objectives

  26. Interdisciplinary Interactions • Partnership model • Science users who recognize the code as a research instrument • Even better if they are interested in the code; FLASH’s early scientists were • Developers and computer scientists interested in a product and in the science being done with the code • It helps to have people with multidisciplinary training • Comparable resources and autonomy for the code group • And recognition of their intellectual contribution to scientific discovery • Careful balance between long-term and short-term objectives

  27. Lessons Learned • Public releases – every 8-10 months – force discipline • Bring the code up to coding standards • Reconcile and refresh the test suite • Documentation – for a transient developer population • User-support documentation • Extensive inline documentation • Backward compatibility is overrated • Uncluttered infrastructure is the best • Supporting users is good; letting users drive the capability addition is even better • Testing the code on multiple platforms is indispensable • Allowing branches to diverge is a really bad idea

  28. Application Codes Now • Many successful codes provide an infrastructure backbone into which solvers plug in • Mesh, IO, runtime, etc. • Balancing act between performance and portability • Now a new concern: survival • Reducing the size of the code: a very limited option • Tunable parameters: re-factor the codes – but how? • Software process was applied to codes a decade and a half ago • Everybody went their own way, but arrived at remarkably similar solutions • Is there a lesson in it for the abstractions in the code infrastructures?

  29. Architecting for the Future • Requirements • Maintainable code; support a large user community • Reliable results within quantified limits • Retain code portability and performance • Measurable and predictable performance • The challenges in meeting the requirements; tension between • Modularity and performance • Readable/maintainable code and portability • Easy adaptability to new and heterogeneous architectures, and complex multiphysics capabilities • Regression-test-based verification and tolerance for non-reproducibility

  30. One Possibility: Foothold for Abstractions • Separation of concerns • Codes have different types of complexities • Physical model, and its numerical algorithms • Implementation – data structures and therefore memory access patterns • Parallelism • Expose parallelism opportunities • Spatial • Operational • Hardware oblivious solver

  31. Expose Parallelism : Spatial

  32. Mapping to Programming Abstractions • [Diagram: numerical, memory-access, and parallel complexity map onto micro-block computation, code transformation, dynamic scheduling, and a hardware-oblivious solver] • Write solvers in the form of interdependent tasks • Register dependencies with the abstraction layer • Expose data/operation fusion possibilities • Need programming languages with a richer collection of data structures and high-level constructs that allow expression of computations with much less detail

  33. Some useful links • http://flash.uchicago.edu/site/flashcode • http://flash.uchicago.edu/site/flashcode/user_support/ • http://flash.uchicago.edu/site/publications/flash_pubs.shtml • http://flash.uchicago.edu/site/testsuite/home.py

  34. Backup Slides

  35. Further Insights • Supporting multiple sets of projects from different branches is more recent at FLASH • A hierarchy of project and production branches • A stringent merge and test schedule is important • How we did it: • Turned one of the branches into the main development branch • Turned trunk into the merge area • Enforced a merge schedule • Enforced a policy of prioritizing the fixing of whatever broke in the merge

  36. The Present State • The code is large, extensible, and well architected • Just about managing to run well on some of the current architectures – Mira • Homogeneous architecture • Sufficient memory per core • Hybrid MPI-OpenMP parallelism • Threading at the solver level • The maintainable form: threading over blocks • The harder-to-maintain but better-performing form: threading the nested loops • The code as is will not be able to use Titan effectively, and quite possibly not MIC architectures

  37. Separate Complexity: Example • At present we separate unit complexity from parallel complexity (most good codes do) • A unit explicitly pulls the data it needs • We get a block, cell coordinates, and other relevant grid metadata explicitly • At the wrapper layer we separate some infrastructural complexity from the numerics, but not all • The solver has to make its receiving data structures conform to the mesh, so it has to know them • Because of the data structures, memory access patterns are deeply intertwined with the numerics • Getting the performance implies second-guessing the compiler • The solver should ideally be written without explicit knowledge of data structures, loop bounds, and nesting • Data structures as desired by the solver • Possibly the solver written as a stencil • Deepen the wrapper layer to assemble the data structure

  38. Expose Parallelism : Functional • FLASH Hydro • Update halo • Apply equation of state to halo • Get Riemann state • Compute face fluxes • Conserve fluxes • Update • Apply equation of state • Within “get Riemann state” • Normal state reconstruction using the characteristics • Transverse flux construction • Correct states • Lots of field variables and meta-data

  39. Expose functional parallelism • Rewrite solvers as a collection of somewhat independent operations • Define dependencies in the solvers (operation and data) • Apply operator fusing at build time (code transformations, pre-processing, or some combination) • Make it possible to operate on micro-tiles/blocks: stencil-based approaches are the extreme case • Data fusing at run and/or compile time • The abstraction layers should do the appropriate fusions and code transformations, and use dynamic runtime management to orchestrate the computation for performance

  40. Simple setup
hostname:Flash3> ./setup MySimulation -auto
The setup script automatically generates the object directory based on the MySimulation problem you specify. Sample Units file:
INCLUDE Driver/DriverMain/TimeDep
INCLUDE Grid/GridMain/paramesh/Paramesh3/PM3_package/headers
INCLUDE Grid/GridMain/paramesh/Paramesh3/PM3_package/mpi_source
INCLUDE Grid/GridMain/paramesh/Paramesh3/PM3_package/source
INCLUDE Grid/localAPI
INCLUDE IO/IOMain/hdf5/serial/PM
INCLUDE PhysicalConstants/PhysicalConstantsMain
INCLUDE RuntimeParameters/RuntimeParametersMain
INCLUDE Simulation/SimulationMain/Sedov
INCLUDE flashUtilities/general
INCLUDE physics/Eos/EosMain/Gamma
INCLUDE physics/Hydro/HydroMain/split/PPM/PPMKernel
INCLUDE physics/Hydro/HydroMain/utilities

  41. FLASH Example : Makefile • Each supported site has a specific Makefile.h • Variables defined for library locations • Variables for the compilers being used • Flags for use in “debug”, “test”, or “opt” mode • Other necessary flags • Every directory can have a makefile snippet • Exploits recursively expanded variables • Makes sure to include the source files defined at that level, unless they are inherited • Specifies local dependencies • The snippets are consolidated into a Makefile.Unit for every unit • Makefile.h and the Makefile.Unit files are “included” in the generated Makefile
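As a rough sketch of what a site-specific Makefile.h in this scheme might contain, consider the fragment below. All paths, compiler names, variable names, and flags are placeholders for illustration, not an actual supported site's configuration:

```makefile
# Hypothetical Makefile.h for one site: library locations, compilers,
# and per-mode flags. All values are placeholders.
HDF5_PATH  = /usr/local/hdf5
MPI_PATH   = /usr/local/mpich

FCOMP      = $(MPI_PATH)/bin/mpif90
CCOMP      = $(MPI_PATH)/bin/mpicc

# Flag sets selected by "opt", "debug", or "test" mode
FFLAGS_OPT   = -O3
FFLAGS_DEBUG = -g -fbounds-check
FFLAGS_TEST  = -O2 -g

LIB_HDF5   = -L$(HDF5_PATH)/lib -lhdf5
```

The generated top-level Makefile would then `include` this file plus the per-unit Makefile.Unit files assembled from the directory snippets.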

  42. Code Architecture : Important Questions • What are the essential data structures • State data, metadata, and scratch data • What are the different ways in which the data structures are manipulated • Solver operations, housekeeping, being moved around • How do the various data structures interact with each other • What metadata is needed to correctly change state data • How much scratch space is needed, and where can it be reused • What are the data dependencies • Where are the firewalls between who can use what data, and how • Which parts of the data can be modified by which solver • Which variables can only be modified by a global state change • How should the data be scoped

  43. FLASH Example • Requirements for infrastructure support: • AMR, and also preferably Uniform Grid • Input runtime parameters • IO • Support for multiple species, physical constants etc • Physics requirements • Shock hydrodynamics /MHD • Nuclear networks • Equation of state and other material properties • Time-stepping • Lagrangian particles

  44. Example of Unit Design • Non-trivial to design several of the physics units in ways that meet both modularity and performance constraints • The Eos (equation of state) unit is a good example • Individual mesh points are independent of each other • There are several reusable calculations • Other physics units demand great flexibility from it • A single grid point • Only the interior cells, or only the ghost cells • A row at a time, a column at a time, or the entire block at once • Different grid data structures, and different modes at different times • Implementations range from the simple ideal-gas law to table lookups and iteration for degenerate matter and plasma, with widely differing relative contributions to the overall execution time • The relative values of overall energy and internal energy play a role in the accuracy of results • Sometimes several derivative quantities are desired as output

  45. EOS Interface Design • Hierarchy in the complexity of interfaces • For single-point calculations, scalar input and output • For sections of a block or a full block, vectorized input and output • Wrappers to vectorize and configure the data • Returning derivative quantities if desired • Different levels in the hierarchy give different degrees of control to the client routines • Most of the complexity is completely hidden from casual users • More sophisticated users can bypass the wrappers for greater control • Done with elaborate machinery of masks and defined constants
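The interface hierarchy can be sketched as a simple scalar call for casual users plus a vectorized call whose mask selects optional derivative outputs. The function names, the gamma-law relation, and the toy "temperature" are all illustrative; this is not the actual FLASH Eos API.

```python
# Sketch of a hierarchical EOS interface with an output mask
# (illustrative names and physics, not the FLASH Eos API).

GAMMA = 1.4

def eos_point(density, energy):
    """Simple scalar interface: complexity hidden from casual users."""
    return (GAMMA - 1.0) * density * energy          # pressure

def eos_vector(densities, energies, mask=()):
    """Vectorized interface; the mask requests derivative quantities,
    so nothing is computed that the client did not ask for."""
    out = {"pres": [eos_point(d, e) for d, e in zip(densities, energies)]}
    if "temp" in mask:                                # optional extra output
        out["temp"] = [e / 1.5 for e in energies]     # toy relation
    return out

p = eos_point(1.0, 2.5)
res = eos_vector([1.0, 2.0], [2.5, 2.5], mask=("temp",))
```

A casual user calls `eos_point`; a sophisticated client hands a whole block's worth of cells to `eos_vector` and controls, via the mask, exactly which derivative quantities are computed.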

  46. Coding Standards • Absolutely essential for code maintainability • Consistent code is easier to maintain • Someone other than the developer can inspect and make sense out of the code segment • Data structures remain more consistent • Should always include documenting standards also • Critical when there is transient population of developers • Someone else can understand and maintain your code • Easier for users to customize and even contribute code • Typically involve • Naming conventions • Inheritance and Code organization

  47. FLASH Example: The Tests Collection

  48. Maintenance Practices • Repository management • Should you have a gatekeeper? • How far do you allow the branches to diverge? • How much access control do you apply? • Verification management • Monitoring the regression tests • Prioritization of effort: how long do you let a failing test go on failing? • Coding standards management • How do you verify that new code adheres to the coding and documentation standards? • Documentation • What fraction of developer time is reasonable?

  49. Variety of User Expertise • Novice users – execute one of the included applications • Change only the runtime parameters • Most users – generate new problems, analyze • Generate new Simulations with initial conditions and parameters • Write alternate API routines for specialized output • Advanced users – customize existing routines • Add small amounts of new code where their application resides • Experts – new research • Completely new algorithms and/or capabilities • Can contribute to core functionality

  50. Code Repositories • Centralized version control • CVS was the first to be heavily deployed • Subversion is the most commonly used • Distributed version control • The most popular are Git and Mercurial • Synchronization through the exchange of patches • One can maintain multiple local branches • Makes for a much easier co-existence of production and development • Gatekeeping can become challenging
