
Scalable System for Large Unstructured Mesh Simulation


Presentation Transcript


  1. Scalable System for Large Unstructured Mesh Simulation Miguel A. Pasenau, Pooyan Dadvand, Jordi Cotela, Abel Coll and Eugenio Oñate

  2. Overview • Introduction • Preparation and Simulation • More Efficient Partitioning • Parallel Element Splitting • Post Processing • Results Cache • Merging Many Partitions • Memory usage • Off-screen mode • Conclusions, Future lines and Acknowledgements

  3. Overview • Introduction • Preparation and Simulation • More Efficient Partitioning • Parallel Element Splitting • Post Processing • Results Cache • Merging Many Partitions • Memory usage • Off-screen mode • Conclusions, Future lines and Acknowledgements

  4. Introduction • Education: Masters in Numerical Methods, training courses, seminars, etc. • Publishing: magazines, books, etc. • Research: PhDs, congresses, projects, etc. • One of the International Centers of Excellence on Simulation-Based Engineering and Sciences [Glotzer et al., WTEC Panel Report on International Assessment of Research and Development in Simulation Based Engineering and Science. World Technology Evaluation Center (wtec.org), 2009].

  5. Introduction • Simulation: structures

  6. Introduction • CFD: Computational Fluid Dynamics

  7. Introduction • Geomechanics • Industrial forming processes • Electromagnetism • Acoustics • Bio-medical engineering • Coupled problems • Earth sciences

  8. Introduction • Simulation with GiD [Diagram: geometry description (provided by CAD or using GiD) → preparation of analysis data → computer analysis → visualization of results]

  9. Introduction • Analysis data generation: read in and correct CAD data, assignment of boundary conditions, definition of analysis parameters, assignment of material properties, etc., and generation of the analysis data

  10. Introduction • Visualization of Numerical Results • Deformed shapes, temperature distributions, pressures, etc. • Vector and contour plots, graphs • Line diagrams, result surfaces • Animated sequences • Particle line flow diagrams

  11. Introduction • Goal: run a CFD simulation with 100 Million elements using in-house tools • Hardware: cluster with • Master node: 2 x Intel Quad Core E5410, 32 GB RAM • 3 TB disk with a dedicated Gigabit link to the master node • 10 nodes: 2 x Intel Quad Core E5410 and 16 GB RAM • 2 nodes: 2 x AMD Opteron Quad Core 2356 and 32 GB RAM • Total of 96 cores, 224 GB RAM available • Infiniband 4x DDR, 20 Gbps

  12. Introduction • Airflow around an F1 car model

  13. Introduction • Kratos: • Multi-physics, open source framework • Parallelized for shared and distributed memory machines • GiD: • Geometry handling and data management • Generation of the first coarse mesh • Merging and post-processing of results

  14. Introduction [Workflow diagram: geometry → coarse mesh generation → partition (conditions, materials, communication plan) → distribution into parts 1…n → refinement → calculation → results 1…n → merge → visualize]

  15. Overview • Introduction • Preparation and Simulation • More Efficient Partitioning • Parallel Element Splitting • Post Processing • Results Cache • Merging Many Partitions • Memory usage • Off-screen mode • Conclusions, Future lines and Acknowledgements

  16. Meshing • Single workstation: limited memory and time • Three steps: • Single node: GiD generates a coarse mesh with 13 Million tetrahedrons • Single node: Kratos + Metis divide and distribute the mesh (sketched below) • In parallel: Kratos refines the mesh locally
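The "divide and distribute" step works on the element adjacency graph of the coarse mesh. Below is a minimal sketch of that idea, assuming the pymetis bindings to METIS and a toy six-element adjacency list; the real pipeline uses Kratos' own METIS interface and writes one partition per future MPI rank.

```python
# Sketch of METIS-based partitioning of a coarse mesh, assuming the
# pymetis bindings; the element graph below is a toy 6-element strip.
import pymetis

# adjacency[e] lists the elements sharing a face with element e
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

n_parts = 2
n_cuts, membership = pymetis.part_graph(n_parts, adjacency=adjacency)

# membership[e] is the partition (future MPI rank) of element e; each
# partition would then be written to its own input file.
for part in range(n_parts):
    owned = [e for e, p in enumerate(membership) if p == part]
    print(f"partition {part}: elements {owned}")
```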

  17. Preparation and simulation [Workflow diagram: geometry → coarse mesh generation → partition (conditions, materials, communication plan) → distribution into parts 1…n → refinement → calculation → results 1…n → merge → visualize]

  18. Efficient partitioning: before • Rank 0 reads the model, partitions it and sends the partitions to the other ranks [Diagram: Rank 0 → Rank 1, Rank 2, Rank 3]

  19. Efficient partitioning: before • Rank 0 reads the model, partitions it and sends the partitions to the other ranks [Diagram: Rank 0 → Rank 1, Rank 2, Rank 3]

  20. Efficient partitioning: before • Requires large memory on node 0 • Uses cluster time for partitioning, which could be done outside the cluster • Each rerun needs repartitioning • Same working procedure for OpenMP and MPI runs

  21. Efficient partitioning: now • Dividing and writing the partitions on another machine • Each rank reads its own data separately, as sketched below
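A minimal sketch of the "now" workflow, assuming mpi4py and a hypothetical file naming scheme part_&lt;rank&gt;.mdpa for the partitions written offline on the other machine; Kratos' actual distributed I/O is not shown.

```python
# Each rank reads only its own pre-written partition file, so no single
# node has to load the whole model or scatter it over the network.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Hypothetical naming scheme for the files produced by the offline
# partitioning step.
with open(f"part_{rank}.mdpa") as f:
    my_part = f.read()

print(f"rank {rank}: read {len(my_part)} bytes of partition data")
```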

  22. Preparation and simulation [Workflow diagram: geometry → coarse mesh generation → partition (conditions, materials, communication plan) → distribution into parts 1…n → refinement → calculation → results 1…n → merge → visualize]

  23. Local refinement: triangle [Diagram: the possible subdivision cases of a triangle (i, j, k) into child triangles, depending on which of its edges are split]

  24. Local refinement: triangle • The subdivision case is selected according to the node Ids (see the sketch below) • The decision is not driven by element quality • It is very well suited for parallelization: • OpenMP • MPI [Diagram: the Id-dependent subdivision cases]
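The case with two split edges is the one where a real choice exists: the remaining quadrilateral can be cut along either diagonal. The sketch below illustrates an Id-based rule of that kind (not the exact Kratos criterion): because the choice depends only on the global node Ids, every OpenMP thread or MPI rank sharing the edge reaches the same decision without any communication.

```python
def split_two_edges(i, j, k, mid_ij, mid_jk):
    """Split triangle (i, j, k) when edges (i, j) and (j, k) are marked.

    i, j, k        -- global node Ids of the triangle (counter-clockwise)
    mid_ij, mid_jk -- Ids of the new mid-side nodes on those edges

    Returns the three child triangles. The corner triangle around j is
    fixed; the quadrilateral (i, mid_ij, mid_jk, k) is cut along the
    diagonal chosen purely from the node Ids, so neighbouring elements,
    even in other partitions, take consistent decisions.
    """
    children = [(mid_ij, j, mid_jk)]              # corner triangle at j
    if i < k:                                     # Id-based, not quality-based
        children += [(i, mid_ij, mid_jk), (i, mid_jk, k)]
    else:
        children += [(i, mid_ij, k), (mid_ij, mid_jk, k)]
    return children


# Example: triangle with node Ids 10, 42, 7; edges (10, 42) and (42, 7)
# are split by new mid-side nodes 100 and 101.
print(split_two_edges(10, 42, 7, 100, 101))
```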

  25. Local refinement: tetrahedron [Diagram: father element split into child elements]

  26. Local refinement: examples

  27. Local refinement: examples

  28. Local refinement: examples

  29. Local refinement: uniform • A uniform refinement can be used to obtain a mesh with 8 times more elements • It does not improve the geometry representation
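This factor of 8 is also how the 100 Million element goal follows from the 13 Million element coarse mesh mentioned earlier; a back-of-the-envelope check, assuming one level of 1:8 tetrahedron splitting:

```python
# Element count after uniform 1:8 refinement of tetrahedra; one level of
# refinement of the 13 M coarse mesh reaches the ~100 M element target
# (the mesh reported later has 103.7 M elements).
coarse_elements = 13_000_000
levels = 1
refined_elements = coarse_elements * 8 ** levels
print(f"{refined_elements:,} elements after {levels} level(s)")  # 104,000,000
```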

  30. Introduction [Workflow diagram: geometry → coarse mesh generation → partition (conditions, materials, communication plan) → distribution into parts 1…n → refinement → calculation → results 1…n → merge → visualize]

  31. Parallel calculation • Calculated using 12 × 8 = 96 MPI processes • Less than 1 day for 400 time steps • About 180 GB memory usage • Single volume mesh of 103 Million tetrahedrons split into 96 files (each with its mesh portion and its results)

  32. Overview • Introduction • Preparation and Simulation • More Efficient Partitioning • Parallel Element Splitting • Post Processing • Results Cache • Merging Many Partitions • Memory usage • Off-screen mode • Conclusions, Future lines and Acknowledgements

  33. Post processing [Workflow diagram: geometry → coarse mesh generation → partition (conditions, materials, communication plan) → distribution into parts 1…n → refinement → calculation → results 1…n → merge → visualize]

  34. Post-process • Challenges to face: • Single node • Big files: tens or hundreds of GB • Merging: lots of files • Batch post-processing • Maintaining generality

  35. Big Files: results cache • Uses a user-definable memory pool to store results • Used to cache results stored in files [Diagram: memory pool holding results from files (single, multiple, merge), temporal results, mesh information and created results (cuts, extrusions, Tcl)]

  36. Big Files: results cache [Diagram: results cache table of RC entries, each with file, offset, type, timestamp, memory footprint and the result itself, plus an open files table of file handles and types; the cache granularity is a single result]

  37. Big Files: results cache • Verifies the result file(s) and gets each result's position in the file and its memory footprint • Results of the latest analysis step are kept in memory • Loaded on demand • Oldest results are unloaded if needed • Touched on use
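A minimal sketch of a cache with this behaviour, assuming a loader callback that reads a single result from its (file, offset) position; the names and the byte budget are illustrative and do not reflect GiD's internal data structures.

```python
from collections import OrderedDict

class ResultsCache:
    """Keep recently used results in a bounded memory pool.

    Results are loaded on demand from their (file, offset) position,
    touched on every use, and the least recently used ones are unloaded
    when the pool budget is exceeded.
    """

    def __init__(self, loader, budget_bytes=2 * 1024**3):
        self.loader = loader            # callable: (file, offset) -> bytes
        self.budget = budget_bytes      # e.g. the 2 GB pool used above
        self.used = 0
        self.entries = OrderedDict()    # (file, offset) -> result bytes

    def get(self, file, offset):
        key = (file, offset)
        if key in self.entries:
            self.entries.move_to_end(key)          # touch on use
            return self.entries[key]
        result = self.loader(file, offset)         # load on demand
        self.entries[key] = result
        self.used += len(result)
        while self.used > self.budget and len(self.entries) > 1:
            _, old = self.entries.popitem(last=False)   # unload oldest
            self.used -= len(old)
        return result
```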

  38. Big Files: results cache • Chinese harbour: 104 GB results file, 7.6 Million tetrahedrons, 2,292 time steps, 3.16 GB memory usage (2 GB results cache)

  39. Big Files: results cache • Chinese harbour: 104 GB results file, 7.6 Million tetrahedrons, 2,292 time steps, 3.16 GB memory usage (2 GB results cache)

  40. Merging many partitions • Before: 2, 4, ... 10 partitions • Now: 32, 64, 128, ... partitions of a single volume mesh • Postpone any calculation until after the merge (see the sketch below): • Skin extraction • Finding boundary edges • Smoothed normals • Neighbour information • Graphical objects creation
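A minimal sketch of the "postpone" idea, assuming the partitions arrive with consistent global node Ids: merging only concatenates connectivity, and derived data such as the skin is computed lazily, once, after the last partition has been added.

```python
from collections import Counter
from functools import cached_property

class MergedMesh:
    """Merged volume mesh whose derived data is computed lazily."""

    def __init__(self):
        self.tetrahedra = []            # tuples of 4 global node Ids

    def add_partition(self, tets):
        # Merging only concatenates connectivity; nothing is derived yet.
        self.tetrahedra.extend(tets)

    @cached_property
    def skin(self):
        # Postponed until first use: a face belongs to the skin when it
        # appears in exactly one tetrahedron of the merged mesh.
        faces = Counter()
        for a, b, c, d in self.tetrahedra:
            for face in ((a, b, c), (a, b, d), (a, c, d), (b, c, d)):
                faces[tuple(sorted(face))] += 1
        return [f for f, n in faces.items() if n == 1]

mesh = MergedMesh()
mesh.add_partition([(0, 1, 2, 3)])
mesh.add_partition([(1, 2, 3, 4)])
print(len(mesh.skin))                   # 6 skin triangles for two tets
```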

  41. Merging many partitions • Telescope example: 23,870,544 tetrahedrons • Before: 32 partitions in 24' 10" • After: 32 partitions in 4' 34", 128 partitions in 10' 43", single file in 2' 16"

  42. Merging many partitions

  43. Merging many partitions • Racing car example: 103,671,344 tetrahedrons • Before: 96 partitions in more than 5 hours • After: 96 partitions in 51' 21", single file in 13' 25"

  44. Memory usage • Around 12 GB of memory used, with a spike of 15 GB (MS Windows) or 17.5 GB (Linux), including: • Volume mesh (103 Million tetrahedrons) • Skin mesh (6 Million triangles) • Several surface and cut meshes • Stream line search tree • 2 GB of results cache • Animations

  45. Pictures

  46. Pictures

  47. Pictures

  48. Batch post-processing: off-screen • GiD with no interaction and no window • Command line: gid -offscreen [WxH] -b+gbatch_file_to_run • Useful to: • launch costly animations in the background or in a queue • use GiD as a template generator • use GiD behind a web server: Flash Video animation • Animation window: a button was added to generate the batch file, so the off-screen GiD run can be sent to a batch queue
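A small sketch of driving the off-screen mode from a script, reproducing the command line quoted above; the resolution and batch file name are placeholders, so check the exact flag spelling against the GiD documentation.

```python
import subprocess

# Placeholders: 1280x1024 resolution and a batch file "animation.bch";
# the flags mirror "gid -offscreen [WxH] -b+gbatch_file_to_run" as
# quoted above, so verify them against your GiD version.
subprocess.run(["gid", "-offscreen", "1280x1024", "-b+ganimation.bch"],
               check=True)
```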

  49. Animation
