
Database Methods for Scientific Computing

David R. O’Hallaron, Associate Professor of CS and ECE, Carnegie Mellon University (joint work with Tiankai Tu and Julio Lopez).


Presentation Transcript


  1. Database Methods for Scientific Computing. David R. O’Hallaron, Associate Professor of CS and ECE, Carnegie Mellon University (joint work with Tiankai Tu and Julio Lopez)

  2. The Scientific Computing Process. Diagram: physical model → mesh generation → mesh → solver → simulation results → visualization.

  3. The Euclid Project
     • Goal: Run large-scale physical simulations on PCs with limited physical memory.
     • Approach: Index and store the input and output datasets in databases, and compute on the databases directly.
     • Requires research at the intersection of scientific computing, algorithms, databases, and systems.
     Diagram: physical model DB → mesh generation → mesh DBs → solver → simulation results DB → visualization.

  4. David O’Hallaron, Jacobo Bielak, Omar Ghattas (Carnegie Mellon) Jonathan Shewchuk (UC Berkeley) Steven Day (SD State)

  5. Teora, Italy 1980

  6. San Fernando Valley (map figure). Labels: epicenter (x); coordinate labels lat. 34.38, long. -118.16; lat. 34.32, long. -118.48; lat. 34.08, long. -118.75.

  7. San Fernando Valley (Top View). Figure labels: hard rock, soft soil, epicenter (x).

  8. San Fernando Valley (Side View). Figure labels: soft soil, hard rock.

  9. Node Distribution

  10. Partitioned Unstructured Mesh. Figure labels: nodes, element.

  11. Simulation and Visualization

  12. Scientific Computing with Euclid
      • Represent the physical model, mesh, and simulation results on disk in spatial database structures called etrees (Euclid trees).
      • Linear octree indexed by standard Morton-based locational codes.
      • Disk pages indexed by a standard B-tree indexing structure.
      • Perform the entire process out-of-core by querying and updating the etrees.
      Diagram: physical model etree → mesh generation → mesh node and element etrees → solver → simulation results etree → visualization.

  13. Octrees. Octree mesh generation. Balance requirement for meshes (2-to-1 constraint). Figure labels: element/octant, master node, slave node; octants a through m.

  14. Linear Octrees. Figure: octants a through m on a grid with x and y axes running from 0 to 8, stored in B-tree pages under a B-tree index.

  15. Addressing Linear Octree Elements
      • Morton code: maps n-dimensional points to one-dimensional scalars.
      • Locational code: appends an octant’s level to the Morton code of its left-lower corner.
      Example for octant d (grid with x and y axes from 0 to 8):
      • d’s left-lower corner: (2, 2)
      • Binary form: (010, 010)
      • Interleave the bits to obtain the Morton code: 001100
      • Append the level of d to obtain the locational code: 001100_11
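The encoding on slide 15 can be sketched in a few lines of C. This is an illustration only, not code from the etree library: the names `morton2d` and `locational_code` are hypothetical, the sketch is two-dimensional, and it assumes that the x bit is the more significant bit of each interleaved pair (which reproduces the node keys listed on slide 20) and that the level is stored in a fixed-width field.

```c
#include <assert.h>
#include <stdint.h>

/* Interleave the low `bits` bits of x and y into one Morton code.
 * Assumption: in each bit pair, the x bit is the more significant one. */
uint32_t morton2d(uint32_t x, uint32_t y, unsigned bits)
{
    assert(bits <= 16);                 /* result must fit in 32 bits */
    uint32_t code = 0;
    for (unsigned i = 0; i < bits; i++) {
        code |= ((x >> i) & 1u) << (2 * i + 1);
        code |= ((y >> i) & 1u) << (2 * i);
    }
    return code;
}

/* Append the octant's level to the Morton code of its left-lower corner.
 * Assumption: the level occupies a fixed field of `level_bits` bits. */
uint32_t locational_code(uint32_t x, uint32_t y, unsigned bits,
                         unsigned level, unsigned level_bits)
{
    assert(level_bits > 0 && 2 * bits + level_bits <= 32);
    assert(level < (1u << level_bits));
    return (morton2d(x, y, bits) << level_bits) | level;
}
```

For octant d, `morton2d(2, 2, 3)` gives binary 001100, and `locational_code(2, 2, 3, 3, 2)` gives binary 001100_11, matching the slide.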

  16. Nice Properties of Linear Octrees
      • An addressing scheme that clusters nearby octants.
      • An octant can be found without knowing its locational code.
      • The order imposed by the locational code is the same as the preorder traversal of the leaves of the octree.

  17. Etree Mesh Generator. Pipeline: application-specific input → construct → unbalanced octree → balance → balanced octree → transform → element database and node database. Each of the three steps (construct, balance, transform) runs on top of the etree library.

  18. Etree Library: A Framework in C for Manipulating Etrees on Disk
      • Etree API: Octant-level (insert) and octree-level (balance) operations.
      • Linear octree: Well-known coding scheme to assign keys to octants.
      • Auto navigation: New algorithm for constructing an octree automatically.
      • Local balancing: New algorithm to speed up the balancing operation.
      • B-tree: Well-known DB indexing structure.
      Layer diagram, top to bottom: Application (e.g., construct, balance); Etree API; Etree Library (Linear Octree, Local Balancing, Auto Navigation, B-Tree).

  19. Mesh Element Etree
      • Octree: the root has children 00 (leaf A), 01 (internal), 10 (leaf F), and 11 (leaf G); the internal child 01 has children 00 (B), 01 (C), 10 (D), and 11 (E).
      • B-tree page (locational code keys): 0000_01 A, 0100_10 B, 0101_10 C, 0110_10 D, 0111_10 E, 1000_01 F, 1100_01 G.
      • Example lookups: X: 0101_10 (exact hit); Y: 1010_10 (aggregate hit).
      KEY FACT: Leaf nodes and aggregated nodes can be located within a B-tree page with a fast binary search, without traversing the edges of the octree.
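The lookup described on slide 19 can be illustrated with a binary search over a sorted array standing in for one B-tree page. This is a sketch under stated assumptions, not the etree library's implementation: the names `octkey`, `hit_kind`, and `find_octant` are hypothetical, the tree is two-dimensional, and the keys are assumed to be non-overlapping leaf octants sorted by Morton code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t morton;  /* Morton code of the octant's left-lower corner */
    unsigned level;   /* depth of the octant; the root is level 0 */
} octkey;

typedef enum { MISS = 0, EXACT_HIT, AGGREGATE_HIT } hit_kind;

/* Search sorted leaf keys for `query`. On a hit, *index is the position
 * of the matching key (exact hit) or of the enclosing leaf (aggregate hit). */
hit_kind find_octant(const octkey *keys, size_t n, unsigned max_level,
                     octkey query, size_t *index)
{
    assert(max_level <= 15 && query.level <= max_level);

    /* Binary search for the first key whose Morton code exceeds the query's. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keys[mid].morton <= query.morton)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return MISS;                       /* every key is larger */

    const octkey *k = &keys[lo - 1];       /* predecessor of the query */
    assert(k->level <= max_level);
    if (k->morton == query.morton && k->level == query.level) {
        *index = lo - 1;
        return EXACT_HIT;
    }
    /* In 2D, a level-l leaf covers 4^(max_level - l) consecutive codes. */
    uint32_t span = 1u << (2 * (max_level - k->level));
    if (k->level < query.level && query.morton - k->morton < span) {
        *index = lo - 1;
        return AGGREGATE_HIT;
    }
    return MISS;
}
```

With the seven keys from the slide and `max_level` 2, query X (0101_10) matches C exactly, while query Y (1010_10) lands on its predecessor F (1000_01), whose level-1 extent covers codes 1000 through 1011. No octree edges are traversed in either case.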

  20. Mesh Node Etree. Nodes with their coordinates and Morton code keys, stored across B-tree leaf pages 1 and 2: a(0,0) 000000, b(0,2) 000100, c(0,3) 000101, d(1,2) 000110, e(1,3) 000111, f(2,0) 001000, g(2,2) 001100, h(2,3) 001101, i(0,4) 010000, j(1,4) 010010, k(2,4) 011000, l(4,0) 100000, m(4,2) 100100, n(4,4) 110000.

  21. Auto Navigation
      Navigation octree:
      • Guided by an application function.
      • An in-memory pointer-based octree.
      • Dynamically grows in depth-first fashion.
      • Leaf octants are pruned and flushed to disk in preorder (in increasing locational code order).
      • Appends the octants to the etree database to avoid database search.
      Figure legend: octants not yet processed (in memory); non-leaf octants being decomposed (in memory); leaf octants (flushed to database).
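The ordering property on slide 21 can be seen in a small depth-first sketch. It is a simplification and not the library's algorithm: plain recursion stands in for the in-memory pointer-based navigation octree, appending to an array stands in for appending to the etree database, and all names (`octant`, `leaf_log`, `navigate`, `refine_around_target`) are hypothetical. The example application function simply refines every octant that contains a chosen target point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t x, y; unsigned level; } octant;
typedef int (*refine_fn)(octant o, void *ctx);  /* nonzero: decompose further */
typedef struct { octant *leaves; size_t count, capacity; } leaf_log;
typedef struct { uint32_t px, py; unsigned max_level; } target;

/* Example application function (an assumption for illustration):
 * refine every octant that contains the target point. */
int refine_around_target(octant o, void *ctx)
{
    const target *t = ctx;
    uint32_t size = 1u << (t->max_level - o.level);
    return t->px >= o.x && t->px < o.x + size &&
           t->py >= o.y && t->py < o.y + size;
}

/* Depth-first decomposition. Children are visited in Morton order
 * (x bit more significant), so leaves are appended in increasing
 * locational code order and no search of the output is needed. */
void navigate(octant o, unsigned max_level, refine_fn refine, void *ctx,
              leaf_log *out)
{
    if (o.level == max_level || !refine(o, ctx)) {
        assert(out->count < out->capacity);
        out->leaves[out->count++] = o;   /* stands in for the etree append */
        return;
    }
    uint32_t half = 1u << (max_level - o.level - 1);  /* child edge length */
    for (unsigned c = 0; c < 4; c++) {
        octant child = { o.x + ((c >> 1) & 1u) * half,
                         o.y + (c & 1u) * half,
                         o.level + 1 };
        navigate(child, max_level, refine, ctx, out);
    }
}
```

Starting from the root octant with `max_level` 2 and target point (0, 3), the traversal emits seven leaves whose corners and levels correspond to the keys A through G on slide 19, already in sorted key order.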

  22. Local Balancing
      Operational steps:
      • Partition the entire domain into equal-size blocks.
      • Perform internal balancing to enforce the 2-to-1 constraint within each block (in a memory-resident blocking array).
      • Perform boundary balancing to resolve interactions between adjacent blocks.
      Key Fact: Interactions between adjacent blocks are always absorbed by boundary octants and will not be propagated into the blocks.
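The 2-to-1 constraint that both balancing phases enforce can be stated as a small check. The sketch below is not the local balancing algorithm itself. It is a brute-force verifier under two assumptions that the slides do not spell out: the tree is two-dimensional, and the constraint applies to leaf octants that share an edge, whose levels may differ by at most one. The names `octant`, `share_edge`, and `is_balanced` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t x, y; unsigned level; } octant;

/* Edge length of an octant in units of the finest grid cell. */
static uint32_t edge_len(octant o, unsigned max_level)
{
    assert(max_level < 32 && o.level <= max_level);
    return 1u << (max_level - o.level);
}

/* Nonzero if a and b touch along a boundary segment of positive length. */
int share_edge(octant a, octant b, unsigned max_level)
{
    uint32_t sa = edge_len(a, max_level), sb = edge_len(b, max_level);
    int x_touch = (a.x + sa == b.x) || (b.x + sb == a.x);
    int y_touch = (a.y + sa == b.y) || (b.y + sb == a.y);
    int x_overlap = a.x < b.x + sb && b.x < a.x + sa;
    int y_overlap = a.y < b.y + sb && b.y < a.y + sa;
    return (x_touch && y_overlap) || (y_touch && x_overlap);
}

/* Brute-force O(n^2) check: returns 1 if no two edge-sharing leaves
 * differ by more than one level, 0 otherwise. */
int is_balanced(const octant *leaves, size_t n, unsigned max_level)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            unsigned li = leaves[i].level, lj = leaves[j].level;
            unsigned diff = li > lj ? li - lj : lj - li;
            if (diff > 1 && share_edge(leaves[i], leaves[j], max_level))
                return 0;
        }
    }
    return 1;
}
```

A quadratic check like this is only practical for small test cases; the point of local balancing on the slide is to avoid this kind of global neighbor search by working block by block.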

  23. Some Evaluation Questions
      • Is etree mesh generation feasible?
      • How does running time vary with the physical memory size?
      • What is the performance impact of auto navigation?
      • What is the performance impact of local balancing?

  24. Evaluation Methodology
      • Used the etree mesh generator to build a family of finite element meshes for San Fernando Valley earthquake ground motion simulations.
      • SFx: A mesh of the 50 km x 50 km x 12 km San Fernando Valley that resolves seismic waves with periods of at most x seconds.

  25. Evaluation Setup All experiments conducted on a PIII 1GHz machine running Linux 2.4.17. Machine’s physical memory for the experiments ranged from 128 MB to 880 MB. Before each experiment, two 1.5 GB files were sequentially scanned to ensure that the operating system’s buffer cache was flushed.

  26. Etree Feasibility (all experiments performed with 128 MB physical memory)
      • Generating a 4.3 GB mesh with 13.6 million elements in 2.6 hours seems reasonable.
      • The overall throughput increases with mesh size.

  27. Impact of Physical Memory Size
      • Memory size does not have a significant impact on the running time.
      • The etree method does not rely on the operating system’s internal caching mechanism to achieve its performance.

  28. Impact of Auto Navigation • Reducing B-tree buffer size does not increase the construction time • Auto navigation is not sensitive to B-tree buffer size

  29. Impact of Local Balancing • Achieves speedups ranging from 8 (SF1) to 28 (SF10) • Benefits from the one-time scan of the database and the efficient array-based neighbor finding algorithm

  30. Some Related Work
      • General octree algorithms: Samet 90
      • Octree mesh: Shephard & Georges 91, Bern et al. 90, Young et al. 91, Wang 99
      • Out-of-core octree solver method: Salmon 97
      • Linear quadtree: Gargantini 82, Morton 66
      • Space filling curve: Orenstein 84, Orenstein 86, Faloutsos & Roseman 89
      • Large dataset processing: Freitag & Loy 99, Seamons & Winslett 96, Ferreira et al. 99, Kurc et al. 01, Choudhary et al. 99, Parashar & Browne 97

  31. Summary and Conclusions
      • The Euclid project aims to recast the entire scientific computing process in terms of database operations.
      • Incorporating existing database techniques (linear octree and B-tree) with new algorithms (auto navigation and local balancing) in a unified framework (the etree) can deliver new capabilities.
      • On the horizon:
      • Caching and prefetching for the etree solver
      • Remote access and derived value caching for visualization
      • Parallel visualization system based on etrees
      • Unstructured tetrahedral mesh generation using R-trees

  32. Etree API
      Unix file I/O style, three levels of abstraction:
      • Initialization and cleanup, e.g., etree_t *etree_open(const char *path, int flag, …);
      • Octant-level operations, e.g., int etree_insert(etree_t *ep, location_t loc, void *value);
      • Octree-level operations, e.g., int etree_balance(etree_t *ep, decom_t *baldecom);
