10 likes | 106 Vues
MapReduce -enabled Model Reduction for Large Scale Simulation Data. SUPERCOMPUTER. DATA CENTER. ENGINEER. Joe Ruthruff Sandia National Labs. Jeremy Templeton Sandia National Labs. Paul G. Constantine Stanford University. David F. Gleich Purdue University.
E N D
MapReduce-enabled Model Reduction for Large Scale Simulation Data SUPERCOMPUTER DATA CENTER ENGINEER Joe Ruthruff Sandia National Labs Jeremy Templeton Sandia National Labs Paul G. Constantine Stanford University David F. Gleich Purdue University … enabling engineers to querysimulationdata for statistical studies anduncertainty quantification. A data cluster canhold hundreds or thousandsof simulations … Each HPCsimulation generatesgigabytes of data. Approximation Model Motivation Tuning the Approximation Interpolating the Parameter Basis SVD in MapReduce Simulation Informatics! Heat Conduction Example Ordered by Increasing Oscillations • We implement the SVD via a communication-avoiding QR factorization for tall, skinny matrices in MapReduce. input parameters time space solution CAQR (Demmel et al, 2008) • Construct parameter basis functions by interpolating the samples. • Can use any appropriate interpolation method (e.g., polynomials, radial basis functions, Gaussian process models, piecewise linear, splines, … ) • Given input parameters, a physical simulation approximates a space/time dependent solution. • Each solution is computationally expensive, so exhaustive parameter studies (e.g., UQ/SA/optimization) are infeasible. • Cheaper reduced order models enable guesstimates for parameter inquiries. http://github.com /dgleich/mrtsqr Which section would you rather try to interpolate, A or B? A B truncation fixed space/time discretization amplitudes Constantine and Gleich, Tall and Skinny QR Factorizations in MapReduce Architectures, 2011 • From the SVD, it is natural to expect that the parameter bases will become more oscillatory as the index increases. • Therefore, we expect the gradient between sample points to increase as the index increases. Truth ROM unknown parameter basis functions space-time basis vectors Variance For and experimental design • 80k nodes, 300 time steps • 104 basis runs • SVD of 24m x 104 data matrix (18G) • 500x reduction in wall clock time “predictable” “unpredictable” Space-time prediction variance: • The left singular vectors are the space-time basis. • The singular values give the amplitudes. • The truncation is determined by examining the decay of the singular values. • Treat each right singular vector as samples from the unknown basis functions. SVD The truncation is the largest such that Sandia National Laboratories is a multi-program laboratory operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin company, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.