Database Methods for Scientific Computing David R. O’Hallaron Associate Professor of CS and ECE Carnegie Mellon University (joint work with Tiankai Tu and Julio Lopez)
The Scientific Computing Process
[Figure: pipeline from physical model, through mesh generation, to the mesh; the solver turns the mesh into simulation results, which feed visualization.]
The Euclid Project
• Goal: Run large-scale physical simulations on PCs with limited physical memory.
• Approach: Index and store the input and output datasets in databases, and compute on the databases directly.
• Requires research at the intersection of scientific computing, algorithms, databases, and systems.
[Figure: the same pipeline (mesh generation, solver, visualization), with a physical model DB, mesh DBs, and a simulation results DB as the stored datasets.]
David O’Hallaron, Jacobo Bielak, Omar Ghattas (Carnegie Mellon) Jonathan Shewchuk (UC Berkeley) Steven Day (SD State)
Teora, Italy 1980
[Figure: map of the San Fernando Valley with the epicenter marked (x). Labeled coordinates: lat. 34.38, long. -118.16; lat. 34.32, long. -118.48; lat. 34.08, long. -118.75.]
San Fernando Valley (Top View) Hard rock epicenter x Soft soil
San Fernando Valley (Side View) Soft soil Hard rock
Partitioned Unstructured Mesh nodes element
Scientific Computing with Euclid
• Represent physical model, mesh, and simulation results on disk in spatial database structures called etrees (Euclid trees).
  • Linear octree indexed by standard Morton-based locational codes.
  • Disk pages indexed by standard B-tree indexing structure.
• Perform entire process out-of-core by querying and updating the etrees.
[Figure: mesh generation, solver, and visualization operating on the physical model etree, the mesh node and element etrees, and the simulation results etree.]
Octrees
• Octree mesh generation
• Balance requirement for meshes (2-to-1 constraint)
[Figure: an octree and the matching domain decomposition into octants a to m, with labels for an element/octant, a master node, and a slave node.]
Linear Octrees
[Figure: leaf octants a to m on an 8 x 8 domain (x and y axes from 0 to 8), with a B-tree index over the B-tree pages that hold them.]
Addressing Linear Octree Elements
• Morton code: maps n-dimensional points to one-dimensional scalars.
• Locational code: appends an octant’s level to the Morton code of its left-lower corner.
Example (octant d in the figure, on a domain with x and y axes from 0 to 8):
1. d’s left-lower corner: (2, 2)
2. Binary form: (010, 010)
3. Interleave the bits to obtain the Morton code: 00 11 00
4. Append the level of d to obtain the locational code: 001100_11
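The addressing steps on this slide can be sketched in a few lines of C. This is a minimal two-dimensional illustration, not the etree library's code: the function names `morton2d` and `locational_code`, the parameter `level_bits`, and the choice to put the x bit first in each interleaved pair are assumptions (the bit order was chosen because it reproduces the Morton keys listed on the mesh node etree slide).

```c
#include <assert.h>
#include <stdint.h>

/* Interleave the low `bits` bits of x and y into a 2D Morton code.
   Assumption: in each bit pair the x bit comes first. */
uint32_t morton2d(uint32_t x, uint32_t y, unsigned bits)
{
    assert(bits <= 16);
    uint32_t code = 0;
    for (unsigned i = bits; i-- > 0; ) {
        /* Take bit i of x and bit i of y, most significant pair first. */
        code = (code << 2) | (((x >> i) & 1u) << 1) | ((y >> i) & 1u);
    }
    return code;
}

/* Append the octant level, stored in `level_bits` bits, to the Morton
   code of the octant's left-lower corner. */
uint32_t locational_code(uint32_t x, uint32_t y, unsigned bits,
                         unsigned level, unsigned level_bits)
{
    assert(2 * bits + level_bits <= 32);
    assert(level < (1u << level_bits));
    return (morton2d(x, y, bits) << level_bits) | level;
}
```

For octant d, `morton2d(2, 2, 3)` yields binary 001100, and `locational_code(2, 2, 3, 3, 2)` yields binary 001100_11, matching the slide.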
Nice Properties of Linear Octrees
• An addressing scheme that clusters nearby octants.
• Finding an octant without knowing its locational code.
• The order imposed by the locational code is the same as the preorder traversal of the leaves in the octree.
[Figure: the example octree and its leaf octants a to m.]
Etree Mesh Generator
[Figure: application-specific input drives three steps, each built on the etree library: construct (produces the unbalanced octree), balance (produces the balanced octree), and transform (produces the element database and the node database).]
Etree Library: A Framework in C for Manipulating Etrees on Disk
• Etree API: Octant (insert) and octree (balance) level operations.
• Linear octree: Well-known coding scheme to assign keys to octants.
• Auto navigation: New algorithm for constructing octree automatically.
• Local balancing: New algorithm to speed up balancing operation.
• B-tree: Well-known DB indexing structure.
[Figure: layer diagram. The application (e.g., construct, balance) calls the etree API; the etree library beneath it comprises the linear octree, local balancing, auto navigation, and the B-tree.]
Mesh Element Etree
[Figure: an octree with leaf octants A to G, and the B-tree page that stores them.]
B-tree page (locational code keys):
0000_01 A
0100_10 B
0101_10 C
0110_10 D
0111_10 E
1000_01 F
1100_01 G
Queries: X: 0101_10 (exact hit, equal to C’s key); Y: 1010_10 (aggregate hit, falls inside F).
KEY FACT: Leaf nodes and aggregated nodes can be located within a B-tree page with a fast binary search, without traversing the edges of the octree.
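To make the key fact concrete, here is a hedged C sketch of a lookup inside one sorted page, using the seven keys from this slide. The names `entry_t`, `KEY`, `slide_page`, and `page_find` are hypothetical, and the key packing (a 4-bit Morton code followed by a 2-bit level) is an assumption read off the example; the real etree library may organize its pages differently. The search finds the last key that is not greater than the query, then checks that the candidate's Morton prefix matches, which covers both the exact hit and the aggregate hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MORTON_BITS 4u   /* two quadtree levels, 2 bits per level */
#define LEVEL_BITS  2u
#define KEY(morton, level) ((((uint32_t)(morton)) << LEVEL_BITS) | (uint32_t)(level))

typedef struct {
    uint32_t key;    /* Morton code followed by the level */
    char     label;  /* payload; a letter, as on the slide */
} entry_t;

/* The page from the slide, sorted by key. */
const entry_t slide_page[] = {
    { KEY(0x0, 1), 'A' }, { KEY(0x4, 2), 'B' }, { KEY(0x5, 2), 'C' },
    { KEY(0x6, 2), 'D' }, { KEY(0x7, 2), 'E' }, { KEY(0x8, 1), 'F' },
    { KEY(0xC, 1), 'G' },
};

/* Return the index of the stored octant that equals or contains the
   octant with key `query`, or -1 if there is none. */
int page_find(const entry_t *page, size_t n, uint32_t query)
{
    /* Binary search for the last entry with key <= query. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (page[mid].key <= query) lo = mid + 1; else hi = mid;
    }
    if (lo == 0) return -1;
    const entry_t *e = &page[lo - 1];

    /* The candidate contains the query only if the first 2*level bits
       of the two Morton codes agree. */
    unsigned level = e->key & ((1u << LEVEL_BITS) - 1u);
    if (2u * level > MORTON_BITS) return -1;
    unsigned shift = MORTON_BITS - 2u * level;
    uint32_t em = e->key >> LEVEL_BITS;
    uint32_t qm = query >> LEVEL_BITS;
    return ((em >> shift) == (qm >> shift)) ? (int)(lo - 1) : -1;
}
```

With this page, the query X (`KEY(0x5, 2)`) lands on C directly, while the query Y (`KEY(0xA, 2)`) lands on F, the level-1 octant that covers it. No tree edges are followed in either case.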
Mesh Node Etree
Nodes are keyed by the Morton codes of their coordinates and stored in key order across B-tree leaf pages 1 and 2:
000000 a(0,0)
000100 b(0,2)
000101 c(0,3)
000110 d(1,2)
000111 e(1,3)
001000 f(2,0)
001100 g(2,2)
001101 h(2,3)
010000 i(0,4)
010010 j(1,4)
011000 k(2,4)
100000 l(4,0)
100100 m(4,2)
110000 n(4,4)
Auto Navigation
Navigation octree:
• Guided by an application function.
• An in-memory pointer-based octree.
• Dynamically grows in depth-first fashion.
• Leaf octants are pruned and flushed to disk in preorder (in increasing locational code order).
• Appends the octants to the etree database to avoid database search.
[Figure legend: octants not yet processed (in memory); non-leaf octants being decomposed (in memory); leaf octants (flushed to database).]
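The reason appending works can be seen in a small C sketch: if the children of each octant are visited in Morton order, a depth-first traversal emits leaves in strictly increasing locational code order. The sketch is a simplification under stated assumptions, not the auto navigation algorithm itself: it is two-dimensional, plain recursion stands in for the in-memory pointer-based navigation octree, and an array append stands in for flushing to the etree database. All names (`refine_fn`, `leaf_log_t`, `loc_key`, `navigate`, `refine_near_origin`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LEVEL  3u    /* finest level: an 8 x 8 grid */
#define LEVEL_BITS 2u
#define MAX_LEAVES 64

/* Application function: nonzero means "decompose this quadrant". */
typedef int (*refine_fn)(uint32_t x, uint32_t y, unsigned level);

typedef struct {
    uint32_t keys[MAX_LEAVES];  /* stand-in for the on-disk database */
    size_t   count;
} leaf_log_t;

/* Locational key: Morton code of the left-lower corner, then the level. */
uint32_t loc_key(uint32_t x, uint32_t y, unsigned level)
{
    uint32_t code = 0;
    for (unsigned i = MAX_LEVEL; i-- > 0; )
        code = (code << 2) | (((x >> i) & 1u) << 1) | ((y >> i) & 1u);
    return (code << LEVEL_BITS) | level;
}

/* Depth-first traversal; leaves are appended as soon as they are found. */
void navigate(uint32_t x, uint32_t y, unsigned level,
              refine_fn refine, leaf_log_t *out)
{
    if (level < MAX_LEVEL && refine(x, y, level)) {
        uint32_t half = 1u << (MAX_LEVEL - level - 1u);
        /* Children in Morton order: (x bit, y bit) = 00, 01, 10, 11. */
        for (unsigned c = 0; c < 4; c++)
            navigate(x + (c >> 1) * half, y + (c & 1u) * half,
                     level + 1u, refine, out);
        return;
    }
    assert(out->count < MAX_LEAVES);
    out->keys[out->count++] = loc_key(x, y, level);
}

/* Example application function: keep refining the quadrant at the origin. */
int refine_near_origin(uint32_t x, uint32_t y, unsigned level)
{
    (void)level;
    return x == 0 && y == 0;
}
```

Calling `navigate(0, 0, 0, refine_near_origin, &leaves)` on a zero-initialized `leaf_log_t leaves` records ten leaves whose keys only ever increase, so each one can be appended without searching for its position.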
Local Balancing
Operational steps:
1. Partition the entire domain into equal-size blocks.
2. Perform internal balancing to enforce the 2-to-1 constraint within each block (in a memory-resident blocking array).
3. Perform boundary balancing to resolve interactions between adjacent blocks.
Key Fact: Interactions between adjacent blocks are always absorbed by boundary octants and will not be propagated into the blocks.
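The condition that both balancing phases enforce can be written as a brute-force check in C. This sketch is not the local balancing algorithm and does no balancing of its own; it only tests whether a set of leaves already satisfies the 2-to-1 constraint. It is two-dimensional, and it assumes the constraint applies to leaves that share part of an edge (some formulations also include corner neighbors). The names `quad_t`, `share_edge`, and `is_balanced` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t x, y;   /* left-lower corner, in finest-grid units */
    uint32_t size;   /* side length, a power of two */
} quad_t;

/* Nonzero if a and b share part of an edge; touching only at a corner
   does not count. */
int share_edge(const quad_t *a, const quad_t *b)
{
    int touch_x   = (a->x + a->size == b->x) || (b->x + b->size == a->x);
    int touch_y   = (a->y + a->size == b->y) || (b->y + b->size == a->y);
    int overlap_x = (a->x < b->x + b->size) && (b->x < a->x + a->size);
    int overlap_y = (a->y < b->y + b->size) && (b->y < a->y + a->size);
    return (touch_x && overlap_y) || (touch_y && overlap_x);
}

/* Nonzero if no two edge-adjacent leaves differ in side length by more
   than a factor of two, i.e., by more than one level. */
int is_balanced(const quad_t *leaves, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (!share_edge(&leaves[i], &leaves[j])) continue;
            uint32_t big   = leaves[i].size > leaves[j].size ? leaves[i].size : leaves[j].size;
            uint32_t small = leaves[i].size > leaves[j].size ? leaves[j].size : leaves[i].size;
            if (big > 2u * small) return 0;
        }
    }
    return 1;
}
```

A 4 x 4 leaf next to a 1 x 1 leaf fails the check, while a 4 x 4 leaf next to a 2 x 2 leaf passes. The pairwise scan is quadratic in the number of leaves, which is acceptable for an illustration but is exactly the cost that a block-wise, array-based method avoids.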
Some Evaluation Questions
• Is etree mesh generation feasible?
• How does running time vary with the physical memory size?
• What is the performance impact of auto navigation?
• What is the performance impact of local balancing?
Evaluation Methodology Used etree mesh generator to build family of finite element meshes for San Fernando Valley earthquake ground motion simulations. SFx : A mesh of the 50 km x 50 km x 12 km San Fernando Valley that resolves seismic waves with periods of at most x seconds.
Evaluation Setup All experiments conducted on a PIII 1GHz machine running Linux 2.4.17. Machine’s physical memory for the experiments ranged from 128 MB to 880 MB. Before each experiment, two 1.5 GB files were sequentially scanned to ensure that the operating system’s buffer cache was flushed.
Etree Feasibility
All experiments were performed with 128 MB of physical memory.
• Generating a 4.3 GB mesh with 13.6 million elements in 2.6 hours seems reasonable.
• The overall throughput increases with mesh size.
Impact of Physical Memory Size
• Memory size does not have a significant impact on the running time.
• The etree method does not rely on the operating system’s internal caching mechanism to achieve its performance.
Impact of Auto Navigation • Reducing B-tree buffer size does not increase the construction time • Auto navigation is not sensitive to B-tree buffer size
Impact of Local Balancing • Achieves speedups ranging from 8 (SF1) to 28 (SF10) • Benefits from the one-time scan of the database and the efficient array-based neighbor finding algorithm
Some Related Work
• General octree algorithms: Samet 90
• Octree mesh: Shephard & Georges 91, Bern et al. 90, Young et al. 91, Wang 99
• Out-of-core octree solver method: Salmon 97
• Linear quadtree: Gargantini 82, Morton 66
• Space filling curve: Orenstein 84, Orenstein 86, Faloutsos & Roseman 89
• Large dataset processing: Freitag & Loy 99, Seamons & Winslett 96, Ferreira et al. 99, Kurc et al. 01, Choudhary et al. 99, Parashar & Browne 97
Summary and Conclusions
• Euclid project aims to recast entire scientific computing process in terms of database ops.
• Incorporating existing database techniques (linear octree and B-tree) with new algorithms (auto navigation and local balancing) in a unified framework (the etree) can deliver new capabilities.
• On the horizon:
  • Caching and prefetching for etree solver
  • Remote access and derived value caching for visualization
  • Parallel visualization system based on etrees
  • Unstructured tetrahedral mesh generation using R-trees
Etree API
Unix file I/O style, three levels of abstraction:
• Initialization and cleanup, e.g., etree_t *etree_open(const char *path, int flag, …);
• Octant-level operations, e.g., int etree_insert(etree_t *ep, location_t loc, void *value);
• Octree-level operations, e.g., int etree_balance(etree_t *ep, decom_t *baldecom);