Efficient Handling of Massive (Terrain) Datasets

Presentation Transcript

  1. AARHUS UNIVERSITET Department of Computer Science Efficient Handling of Massive (Terrain) Datasets Lars Arge

  2. Massive Data Algorithmics • Massive data being acquired/used everywhere • Storage management software is a billion-$ industry • Science is increasingly about mining massive data (Nature 2/06) Examples (2002): • Phone: AT&T 20TB phone call database, wireless tracking • Consumer: WalMart 70TB database, buying patterns • Web: Google index of 8 billion web pages • Geography: NASA satellites generate terabytes each day

  3. Terrain Data • New technologies: Much easier/cheaper to collect detailed data • Previous ‘manual’ or radar-based methods • Often 30 meters between data points • Sometimes 10 meter data available • New laser scanning methods (LIDAR) • Less than 1 meter between data points • Centimeter accuracy (previously meter) Denmark: • ~2 million points at 30 meter (<<1GB) • ~18 billion points at 1 meter (>>1TB) • COWI (and others) now scanning DK • NC scanned after Hurricane Floyd in 1999

  4. Massive data = I/O-Bottleneck [Figure: disk drive with labels "read/write head", "read/write arm", "track", "magnetic surface"] • I/O is often the bottleneck when handling massive datasets • Disk access is 10^6 times slower than main memory access! • Disk systems try to amortize the large access time by transferring large contiguous blocks of data • Need to store and access data to take advantage of blocks! “The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)

  5. I/O-efficient Algorithms • Taking advantage of block access is important • Traditionally, algorithms were developed without block considerations • I/O-efficient algorithms lead to large runtime improvements [Figure: plot of running time against data size, with curves labeled "Normal algorithm" and "I/O-efficient algorithm" and a "Main memory size" marker]
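To make the block-access point concrete, here is a minimal sketch (not from the talk) that counts block transfers under a deliberately simplified model: memory holds a single block at a time, and every access to an item outside that block costs one transfer. The function name `count_block_reads` and the sizes `n` and `block_size` are illustrative assumptions.

```python
import random


def count_block_reads(access_order, block_size):
    """Count block transfers, assuming memory holds only one block at a time."""
    reads = 0
    current_block = None
    for index in access_order:
        block = index // block_size  # block that contains this item
        if block != current_block:   # item not in memory: fetch its block
            reads += 1
            current_block = block
    return reads


n = 100_000          # number of items (illustrative)
block_size = 1_000   # items per block (illustrative)

sequential = list(range(n))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # fixed seed keeps the run repeatable

sequential_reads = count_block_reads(sequential, block_size)
random_reads = count_block_reads(shuffled, block_size)

print("sequential scan:", sequential_reads, "block reads")
print("random order:   ", random_reads, "block reads")
```

In this model the sequential scan touches each block once (n / block_size transfers), while the shuffled order pays close to one transfer per item, which is the gap that block-aware algorithms aim to close.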

  6. Scalability: Hierarchical Memory • Block access not only important on disk level • Machines have complicated memory hierarchy • Levels get larger and slower • Block transfers on all levels [Figure: plot of running time against data size, with labels "L1", "L2", "RAM"]

  7. My Research Work • Theoretical work on I/O- (and cache-) efficient algorithms • Data structures, computational geometry, graph theory, … • Focus on Geographic Information Systems problems • Algorithm engineering work, e.g. • TPIE system for simple, efficient, and portable implementation of I/O-efficient algorithms • Software for terrain data processing • LIDAR data handling • Terrain flow computations

  8. Example: Terrain Flow • Terrain water flow has many important applications • Predict location of streams, areas susceptible to floods… • Conceptually flow is modeled using two basic attributes • Flow direction: The direction water flows at a point • Flow accumulation: Amount of water flowing through a point • Flow accumulation used to compute other hydrological attributes: drainage network, topographic convergence index… [Figure: panels labeled "7 am" and "3 pm"]
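The two attributes can be illustrated on a tiny in-memory grid. The sketch below assumes a single-flow-direction rule (each cell sends all its water to its steepest downslope neighbor among the eight surrounding cells) and one unit of water per cell; the talk does not say which flow model is used, and this version makes no attempt at I/O efficiency. The names `flow_directions`, `flow_accumulation`, and the sample `elev` grid are hypothetical.

```python
def flow_directions(elev):
    """Map each cell to its steepest downslope neighbor (None for a sink)."""
    rows, cols = len(elev), len(elev[0])
    direction = {}
    for r in range(rows):
        for c in range(cols):
            best, best_drop = None, 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        dist = (dr * dr + dc * dc) ** 0.5  # diagonal is longer
                        drop = (elev[r][c] - elev[nr][nc]) / dist
                        if drop > best_drop:
                            best, best_drop = (nr, nc), drop
            direction[(r, c)] = best
    return direction


def flow_accumulation(elev, direction):
    """Each cell starts with one unit of water and passes its total downhill."""
    acc = {cell: 1 for cell in direction}
    # Visit cells from highest to lowest so that every upstream cell is
    # finished before the cell it drains into is processed.
    by_height = sorted(direction, key=lambda rc: elev[rc[0]][rc[1]], reverse=True)
    for cell in by_height:
        target = direction[cell]
        if target is not None:
            acc[target] += acc[cell]
    return acc


# Small sample terrain sloping toward the bottom-right corner (made-up values).
elev = [
    [9, 8, 7],
    [8, 5, 4],
    [7, 4, 1],
]
direction = flow_directions(elev)
acc = flow_accumulation(elev, direction)
for r in range(3):
    print([acc[(r, c)] for c in range(3)])
```

Sorting by elevation is what makes a single pass sufficient here; on grids that do not fit in memory, that ordering step and the neighbor lookups are where block-aware techniques matter.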

  9. Terrain Flow Accumulation • Collaboration with environmental researchers at Duke University • Appalachian mountains dataset: • 800x800km at 100m resolution → a few gigabytes • On ½GB machine: • ArcGIS: • Performance somewhat unpredictable • Days on few gigabytes of data • Many gigabytes of data… • Appalachian dataset would be terabytes in size at 1m resolution • 14 days!!
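The jump from gigabytes to terabytes follows from the cell counts. The sketch below works through the arithmetic for an 800 km x 800 km grid; the 4 bytes per cell is an assumption made only for illustration (the slide does not give a storage format, and a flow computation typically keeps several values per cell, so real sizes are larger).

```python
def grid_cells(side_km, resolution_m):
    """Number of cells in a square grid of the given side length and spacing."""
    cells_per_side = side_km * 1000 // resolution_m
    return cells_per_side * cells_per_side


BYTES_PER_CELL = 4  # assumption: one 4-byte value per cell

coarse_cells = grid_cells(800, 100)  # 8,000 x 8,000 cells
fine_cells = grid_cells(800, 1)      # 800,000 x 800,000 cells

print(f"100 m grid: {coarse_cells:,} cells, "
      f"{coarse_cells * BYTES_PER_CELL / 1e9:.2f} GB per stored value")
print(f"  1 m grid: {fine_cells:,} cells, "
      f"{fine_cells * BYTES_PER_CELL / 1e12:.2f} TB per stored value")
```

Going from 100 m to 1 m spacing multiplies the cell count by 100 x 100 = 10,000, regardless of the per-cell storage assumed.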

  10. Terrain Flow Accumulation: TerraFlow • We developed theoretically I/O-optimal algorithms • TPIE implementation was very efficient • Appalachian Mountains flow accumulation in 3 hours! • Developed into comprehensive software package for flow computation on massive terrains: TerraFlow • Efficient: 2-1000 times faster than existing software • Scalable: >1 billion elements! • Flexible: Flexible flow modeling (direction) methods • Extension to ArcGIS

  11. LIDAR Terrain Data Work: TerraStream • Now TerraStream software “pipeline” for handling terrain data • Points to DEM (incl. breaklines) • DEM flow modeling (incl. “flooding”, “flat” routing, “noise” reduction) • DEM flow accumulation (incl. river extraction) • DEM hierarchical watershed computation • All work for both grid and TIN DEMs • Capable of handling massive datasets • Test dataset: 400M point Neuse river basin (1/3 NC) (>17GB)

  12. Examples of Ongoing Terrain Work • Terrain modeling, e.g. • “Raw” LIDAR to point conversion (LIDAR point classification) (incl. feature, e.g. bridge, detection/removal) • Further improved flow and erosion modeling (e.g. carving) • Contour line extraction (incl. smoothing and simplification) • Terrain (and other) data fusion (incl. format conversion) • Terrain analysis, e.g. • Choke point, navigation, visibility, change detection, … • Major grand goal: • Construction of hierarchical (simplified) DEM where derived features (water flow, drainage, choke points) are preserved/consistent

  13. Thanks Lars Arge Work supported in part by • US National Science Foundation ESS grant EIA–98070734, RI grant EIA–9972879, CAREER grant EIA–9984099, and ITR grant EIA–0112849 • US Army Research Office grants W911NF-04-1-0278 and DAAD19-03-1-0352 • Ole Rømer Scholarship from Danish Science Research Council • NABIIT grant from the Danish Strategic Research Council • Danish National Research Foundation (MADALGO center)