Advanced Techniques for Analyzing Massive Terrain Data Sets in Hydrology and Ecology
This study presents scalable techniques for modeling and analyzing high-resolution elevation data derived from various sources such as LIDAR and RTK-GPS. A collaborative effort aims to construct Digital Elevation Models (DEMs) that support hydrological and ecological analyses across extensive terrains like the North Carolina coastline and Neuse River basin. Emphasizing automated tracking of changing terrain features, this research addresses the challenges of massive datasets, focusing on I/O-efficient algorithms and interpolation techniques to enhance data processing and visualization.
Advanced Techniques for Analyzing Massive Terrain Data Sets in Hydrology and Ecology
E N D
Presentation Transcript
Modeling & Analyzing Massive Terrain Data Sets Pankaj K. Agarwal Duke University
STREAM Project Scalable Techniques for hi-Resolution Elevation data Analysis & Modeling In collaboration with: Lars Arge, Helena Mitasova Students: Andrew Danner,Thomas Molhave,Ke Yi
Photogrammetry0.76m v. accuracy (5ft contours) 1974 1995 1998 Lidar0.15m v. accuracy; altitude 700m and 2300m RTK-GPS0.10m v. accuracy 2001 2004 1999 Diverse elevation point data: density, distribution, accuracy
Constructing Digital Elevation Models Grid, TIN, Contour DEMs
Natural feature extraction Maps of topo parameters: computed simultaneously with interpolation using partial derivatives of the RST function, terrain features: combined thresholds of topo parameters Ma Extracted foredunes 1999-2004 using profile curvature and elevation threshold
slip faces: slope > 30deg dune crests: profile curvature threshold 1995 new slip face in 1999 2001 Tracking evolving features how to automatically track features that change some of their properties over time?
Hydrology Analysis • Flow Analysis • Watershed hierarchy
Challenge: Massive Data Sets • LIDAR • NC Coastline: 200 million points – over 7 GB • Neuse River basin (NC): 500 million points – over 17 GB • Output is also large • 10ft grid: 10GB • 5ft grid: 40GB
Approximation Algorithms • Exact computation expensive Many practical problems are intractable Multiple & often conflicting optimization criteria Suffices to find a near-optimal solution • Tunable Approximation algorithms • Tradeoff between accuracy and efficiency • User specifies tolerance
read/write head track read/write arm magnetic surface I/O-Bottleneck • Data resides in secondary memory • Disk access is 106 times slower than main memory access • Maximize useful data transferred with each access • Amortize large access time by transferring large contiguous blocks of data • I/O – not CPU – is often the bottleneck for processing massive data sets
Disk RAM CPU I/O-efficient Algorithms [AV88] • Traditional algorithms optimize CPU computation • Not sensitive to penalty of disk access • I/O model • Memory is finite • Data is transferred in blocks • Complexity measured in disk blocks transferred B M B ~ 2K-8K
Elevation Data to Watershed Hierarchies: A Pipeline Approach
Point Cloud to Grid DEM • Given a point cloud sampled from a bivariate function z=F(x,y) (elevation data) • Goal: construct a grid DEM for z • DEM Applications • Hydrology • Ecology • Viewsheds … • Many algorithms only work on grid data
Sub-meter resolution: improved structures representation 2004 lidar-based 30cm resolution DSM 1999 lidar 2m resolution DSM 2004 lidar: 0.5m resolution binning interpolated DSM Binning (within cell average) sufficient for ~1-3m DEMs but interpolation produces improved geometry representation and allows us to create higher resolution DEMs/DSMs
Interpolate O(N3) General Approach: Interpolation • Kriging, IDW, Splines, … z=F(x,y) N Points • Modern data sets can have millions of points • Direct interpolation does not scale
Improving Interpolation with Quad Trees • Segment initial point set using a quad tree • Each quad tree leaf has a <K points (10-100) • Interpolate each segment separately • N/K segments • O(K3) interpolation per segment • O(K2N) total running time • Linear in N • Boundary smoothness? • Use points from nearby segments
Overview of the Algorithm • Construct quad-tree on disk using bulk loading [AAPV 01] • For each quad-tree node, find points in neighboring segments • Arrange points on disk for sequential interpolation • Use the algorithm by Mitasova, Mitas, Harmon • Can use any other interpolation method
Block I/O Building an External Quad Tree K=2 • Build top few levels in memory • Buffer points in lower levels • Write buffers to disk when full
Building a TIN TIN: Triangulated Irregular Network Constrained DelaunayTriangulation Developed an I/O-efficient algorithm Builds TIN for NeuseRiver Basin (14GB) in7.5 hours
Future Work: Hierarchical DEMs • Considerable work in graphics and GIS communities on multiresolution hierarchies for terrains • Focuses on visualization • Most algorithms perform random memory access • Out-of-core simplification • I/O-efficient algorithms for terrain simplification (edge contraction method) • Feature preserving simplification • Preserves main river networks • Preserves visibility for most of the points
Coping with Noisy Data • Nearly all flow models assume [O’Callaghan and Mark 84, de Berg et al. 96] • water flows downwards • stops at a minimum
Denoising via Flooding[Jenson and Domingue 88, Arge et al. 03] • Pour water uniformly on terrain • Water level rises until steady-state
River Network After Flooding After flooding Sort(N) I/Os
Denoising via Flooding[Jenson and Domingue 88, Arge et al. 03] • Pour water uniformly on terrain • Water level rises until steady-state • Use topologicalpersistence (Edelsbrunner et al.) as importance measure to measure the importance of a minimum • Flood low-persistence minima, preserve high-persistence minima
The Union-Find Problem • A universe of N elements: x1, x2, …, xN • Initially N singleton sets: {x1}, {x2 }, …, {xN} • Each set has a representative • Maintain the partition under • Union(xi, xj) : Joins the sets containing xi and xj • Find(xi) : Returns the representative of the set containing xi
Formulated as Batched Union-Find • On a terrain represented as a TIN • When reach • A minimum or maximum: do nothing • A regular point u: Issue union(u,v) for a lower neighbor v • A saddle u: let v and w be nodes from u’s two connected pieces in its lower link Issue: find(v), find(w), union(u,v), union(u,w) • Sequence of union/find operations can be computed in advance lower link
Classical Union-Find Algorithm representatives d h i p b j a f l z s r c k e g m n Union(d, h) : Find(n) : h h d f l d f l m n b j a b j a m path compression link-by-rank e g e g n
Complexity • O(N α(N)) for a sequence of N union and find operations [Tarjan 75] • α(•) : Inverse Ackermann function (very slow!) • Optimal in the worst case [Tarjan79, Fredman and Saks 89] • Batched (Off-line) version • Entire sequence known in advance • Can be improved to linear on RAM [Gabow and Tarjan 85] • Not possible on a pointer machine [Tarjan79]
Our Results • An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O(N/B logM/B(N/B)) I/Os • Same as sorting • optimal in the worst case • A simpler algorithm using O(sort(N) log(N/M)) I/Os • Implemented • Apply to persistence computation, flooding • O(sort(N)) time • Other applications
I/O-Efficient Batched Union-Find • Two-stage algorithm • Convert to interval union-find • Compute an order on the elements s.t. each union joins two adjacent sets • Solve batched interval union-find • Each stage formulated as a geometric problem and solved using sort(N) I/Os
Experimental Results Traditional algorithm good as long as data fits in memory
Experimental Results:Neuse River Basin 0.5 billion points (14GB raw data), sampled to obtain smaller data sets On the entire data set: EM takes 5 hours, IM fails
Previous Results • Directly maintain contours • O(N log N) time [van Kreveld et al. 97] • Needs union-split-find for circular lists • Do not extend to higher dimensions • Two sweeps by maintaining components, then merge • O(N log N) time [Carr et al. 03] • Extend to arbitrary dimensions
Join Tree and Split Tree[Carr et al. 03] Qualified nodes 9 9 9 9 8 8 8 8 7 7 7 7 6 6 6 6 5 5 5 5 4 4 4 4 3 3 3 3 2 2 1 1 1 1 Join tree Split tree Join tree Split tree
Final Contour Tree Hard to BATCH! 9 9 9 8 8 8 7 7 7 6 6 6 5 5 5 4 4 4 3 3 3 2 2 2 1 1 1 Join tree Split tree Contour tree
9 9 9 Now can BATCH! 8 8 8 u 7 7 u 7 u 6 6 6 v v v 5 5 5 w w w 4 4 4 3 3 3 2 2 2 1 1 1 Join tree Split tree Contour tree Another Characterization Let w be the highest node that is a descendant of v in join tree and ancestor of u in split tree, (u, w) is a contour tree edge
Experimental Results On the Neuse River Basin data set
Coping with Noise: Future Work • Constructing hydrologically conditioned DEMs • Formulate as an optimization problem • Using persistent homology • Integrating topology and geometry • Handling flat areas • Scalability remains a big issue • Handling anthropogenic features
110 90 85 95 100 80 Sea From Elevation to River Networks • Where does water go? • From higher elevation to lower elevation • Single flow directions forms a tree
1 2 2 1 7 1 9 1 1 2 11 14 17 1 2 25 1 3 7 1 1 1 Drainage Area • How much area is upstream of each node? • Each node has initial drainage area (1) • Drainage area of internal nodes depends on drainage area of children 3 3 5
IFSARE and SRTM based stream analysis Stream and watershed boundary extraction for Panama for on-going water quality research to provide more detailed and up-to-date data. SRTM - 7400x3600 DSM at 90m resolution for entire Panama, IFSARE - 10800x11300 DEM at 10m resolution for the Panama canal section Using r.terraflow - officially released in GRASS6.2 in 2006 - streams can be extracted in 3-4 hours, tile-based approach takes days SRTM DEM for entire Panama with main rivers extracted from DEM in 3 hr
Watershed Hierarchies • Decompose a river network into a hierarchy of hydrological units • All water in HU flows to a common outlet • Hierarchy provides tunable level of detail • Method used: Pfafstetter [VV99] • Want a solution scalable to large modern hi-res terrains
1 2 1 8 3 9 1 7 2 7 3 1 4 5 11 2 9 14 1 3 17 2 1 1 3 25 7 2 6 1 1 1 1 5 Pfafstetter • Find main river • Find four largest tributaries • Label basins/interbasins • Recurse until single path
72 75 7 71 73 74 25 24 21 2 27 26 22 23 Recurse
96 9 97 999 98 998 94 6 994 992 993 99 4 95 93 997 8 92 2 91 996 7 991 5 3 995 1 Example Watershed Boundaries