Spatial Dependency Modeling Using Spatial Auto-Regression
Mete Celik (1,3), Baris M. Kazar (4), Shashi Shekhar (1,3), Daniel Boley (1), David J. Lilja (1,2)
1 CSE Department, University of Minnesota, Twin Cities
2 ECE Department, University of Minnesota, Twin Cities
3 Army High Performance Computing Research Center
4 Oracle USA
Outline of Today’s Talk
• Motivation & Background
• Problem Definition
• Related Work & Contributions
• Proposed Approach
• Experimental Evaluation
• Conclusion & Future Work
Motivation
• Widespread use of spatial databases
• Mining spatial patterns
  • The 1855 Asiatic cholera outbreak in London [Griffith]
  • Fair lending [NYT, R. Nader]: correlation of bank locations with loan activity in poor neighborhoods
  • Retail outlets [NYT, Walmart, McDonald’s, etc.]: determining store locations by relating neighborhood maps to customer databases
  • Crime hot-spot analysis [NYT, NIJ CML]: explaining clusters of sexual assaults by locating the addresses of sex offenders
  • Ecology [Uygar]: explaining the locations of bird nests based on structural environmental variables
Spatial Auto-correlation (SA)
• Randomly distributed data (no SA): the spatial distribution satisfies the assumptions of classical data analysis; pixel properties are independent and identically distributed (IID).
  [Figure: random nest locations]
• Cluster-distributed data: the spatial distribution does NOT satisfy the assumptions of classical data analysis; pixel properties exhibit spatial auto-correlation.
  [Figure: clustered nest locations]
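To make the contrast concrete, the sketch below simulates both cases on a small grid with the SAR model itself, $y = \rho W y + \varepsilon$ (no explanatory variables): ρ = 0 gives IID "random" data, while a large ρ gives clustered data. As a simple diagnostic it reports the correlation between each site's value and the average of its 4-neighbors (Wy). The grid size, seed, ρ values, and helper names (`grid_w`, `simulate_sar`, `neighbor_correlation`) are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def grid_w(rows, cols):
    """Row-normalized 4-neighbor matrix for a rows x cols grid (sites numbered row-major)."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate_sar(W, rho, rng):
    """Draw y from y = rho*W*y + eps, i.e. y = (I - rho*W)^(-1) eps with eps ~ N(0, I)."""
    n = W.shape[0]
    eps = rng.standard_normal(n)
    return np.linalg.solve(np.eye(n) - rho * W, eps)

def neighbor_correlation(W, y):
    """Pearson correlation between each site's value and its neighbor average Wy."""
    return np.corrcoef(y, W @ y)[0, 1]

rng = np.random.default_rng(0)
W = grid_w(20, 20)
r_random = neighbor_correlation(W, simulate_sar(W, 0.0, rng))   # no SA
r_cluster = neighbor_correlation(W, simulate_sar(W, 0.9, rng))  # strong SA
print(f"rho=0.0: {r_random:.2f}   rho=0.9: {r_cluster:.2f}")
```

With ρ = 0 the neighbor correlation hovers near zero; with ρ = 0.9 it is strongly positive, which is the clustering that violates the IID assumption.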
Execution Trace
• Given:
  • Spatial framework
  • Attributes
[Figure: space with 4-neighborhood, the binary W, and the row-normalized W; the 6th row is marked in both matrices]
• W allows other neighborhood definitions:
  • distance based
  • 8-neighbors
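A minimal sketch of how a binary neighborhood matrix W and its row-normalized version can be built for a regular grid, assuming sites are numbered row by row. The function names `binary_neighbors` and `row_normalize` are hypothetical; the `eight` flag switches from the 4-neighborhood to the 8-neighborhood mentioned above.

```python
import numpy as np

def binary_neighbors(rows, cols, eight=False):
    """Binary neighborhood matrix W for a rows x cols grid, sites numbered row-major."""
    n = rows * cols
    W = np.zeros((n, n))
    if eight:
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:   # skip neighbors off the grid
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W

def row_normalize(W):
    """Divide each row by its neighbor count so every row sums to 1 (assumes no isolated sites)."""
    return W / W.sum(axis=1, keepdims=True)

B = binary_neighbors(3, 3)
Wn = row_normalize(B)
print(Wn[4])  # center site of a 3x3 grid: weight 0.25 on each of its 4 neighbors
```

Row normalization keeps the sparsity pattern of the binary W; only the nonzero values change.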
SDM Provides a Better Model!
• Linear regression, $y = x\beta + \varepsilon$ → SAR, $y = \rho W y + x\beta + \varepsilon$
• The spatial auto-regression (SAR) model has higher accuracy and removes the IID assumption of linear regression.
Data Structures in SAR Model
$y = \rho W y + x\beta + \varepsilon$, where y is n-by-1, ρ is 1-by-1, W is n-by-n, x is n-by-k, β is k-by-1, and ε is n-by-1.
• Vectors: y, β, ε
• Matrices: W, x
• W is a large matrix
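Some arithmetic on why the size of W matters: for a 1000 x 1000 grid (n = 10^6 sites), a dense n-by-n W is prohibitive, while the 4-neighbor W has at most 4 nonzeros per row. The sketch assumes float64 values and a compressed sparse row (CSR) layout with 32-bit indices; the helper names are hypothetical.

```python
def grid_nnz_4(rows, cols):
    """Nonzeros of the binary 4-neighbor W on a rows x cols grid (each adjacency stored twice)."""
    return 2 * (rows * (cols - 1) + (rows - 1) * cols)

def dense_bytes(n, value_bytes=8):
    """Bytes for a dense n-by-n matrix of float64 values."""
    return value_bytes * n * n

def csr_bytes(n, nnz, value_bytes=8, index_bytes=4):
    """Bytes for a CSR matrix: nnz values + nnz column indices + (n + 1) row pointers."""
    return nnz * (value_bytes + index_bytes) + (n + 1) * index_bytes

rows = cols = 1000
n = rows * cols
nnz = grid_nnz_4(rows, cols)
print(n, nnz, dense_bytes(n), csr_bytes(n, nnz))
# 1000000 3996000 8000000000000 51952004
```

Dense storage needs about 8 TB, while the sparse layout needs about 52 MB, which is why the methods that follow work with sparse products of W rather than with an explicit dense matrix.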
Computational Challenge
• Maximum-likelihood estimation (MLE) = minimizing the negative log-likelihood function
• Solving the SAR model:
  • ρ = 0 → least-squares problem
  • β = 0, ε = 0 → eigenvalue problem
  • General case → computationally expensive due to the log-det term in the ML function
[Figure: the ML function with its log-det term and SSE term labeled; Theorem 1]
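For reference, a commonly used concentrated form of the SAR negative log-likelihood (β and σ² profiled out, constants dropped) is $\ell(\rho) = -\frac{2}{n}\ln|I - \rho W| + \ln\big(e(\rho)^T e(\rho)\big)$ with $e(\rho) = \big(I - x(x^T x)^{-1}x^T\big)(I - \rho W)y$; the first term is the log-det term and the second the SSE term. The talk's exact notation may differ. The sketch below is a baseline that evaluates the log-det exactly from all n eigenvalues of W (the O(n³) step the approximations avoid) and picks ρ by grid search. The simulated data, seed, and function names are assumptions for illustration.

```python
import numpy as np

def grid_w(rows, cols):
    """Row-normalized 4-neighbor matrix for a rows x cols grid (sites numbered row-major)."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def concentrated_nll(rho, y, x, W, eigs):
    """-(2/n) ln|I - rho W| + ln(SSE), with beta and sigma^2 profiled out."""
    n = len(y)
    logdet = np.sum(np.log(1.0 - rho * eigs))      # exact log-det term from all eigenvalues
    z = y - rho * (W @ y)                          # (I - rho W) y
    beta, *_ = np.linalg.lstsq(x, z, rcond=None)   # beta_hat(rho) by least squares
    resid = z - x @ beta
    return -2.0 / n * logdet + np.log(resid @ resid)  # log-det term + SSE term

rng = np.random.default_rng(1)
W = grid_w(20, 20)
n = W.shape[0]
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true, rho_true = np.array([1.0, 2.0]), 0.6
y = np.linalg.solve(np.eye(n) - rho_true * W, x @ beta_true + rng.standard_normal(n))

eigs = np.linalg.eigvals(W).real      # all n eigenvalues: the expensive step
grid = np.linspace(0.0, 0.99, 100)    # rho restricted to [0, 1) as in the problem statement
rho_hat = grid[np.argmin([concentrated_nll(r, y, x, W, eigs) for r in grid])]
print(f"rho_hat = {rho_hat:.2f}")
```

Every candidate ρ reuses the same eigenvalues, so the cost is dominated by the single eigen-decomposition, which is what becomes infeasible for large n.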
Problem Statement
Given:
• A spatial framework S consisting of sites {s1, …, sq} for an underlying geographic space G
• A collection of explanatory functions $f_{x_k}: S \to R_k$, k = 1, …, K, where $R_k$ is the range of possible values of the k-th explanatory function
• A dependent function $f_y: S \to R_y$
• A family F of learning model functions (the SAR equation) mapping $R_1 \times \dots \times R_K \to R_y$
• A neighborhood relationship (4- or 8-neighbor) on the spatial framework
Find:
• The SAR parameter ρ and the regression coefficient vector β, to a desired precision, while reducing the cost of log-det computations
Problem Statement – Cont’d
Objective:
• Algebraic error ranking of approximate SAR model solutions
Constraints:
• S is a multi-dimensional Euclidean space.
• The values of the explanatory variables x and of the dependent function (observed variable) y may not be independent of those at nearby spatial sites, i.e., spatial autocorrelation exists.
• The domains of x and y are the real numbers.
• The SAR parameter ρ varies in the range [0, 1).
• The error ε is IID normal with zero mean and unit standard deviation, i.e., $\varepsilon \sim N(0, \sigma^2 I)$ with σ = 1.
• The neighborhood matrix W is sparse.
Related Work
Contributions
• A new approximate SAR model solution: the Gauss-Lanczos (GL) approximation method
  • Key idea: avoid computing all of the eigenvalues of W
• Error ranking of approximate SAR model solutions
Gauss-Lanczos Approximation
• The log-det term is approximated by transforming the eigenvalue problem into a quadratic form.
• Gauss-type quadrature rules are then applied using the Lanczos procedure.
How Does the GL Method Work?
• GL (Algorithm 3.2) is repeated m times (m = 400 in our experiments).
• The parameter r varies between 5 and 8 in our experiments.
• For large problem sizes, solution quality is not very sensitive to the choice of m and r.
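The idea can be sketched as stochastic Lanczos quadrature in the spirit of Bai and Golub: $\ln|A| = \operatorname{tr}(\ln A)$ is estimated as the average of quadratic forms $z^T \ln(A)\, z$ over random ±1 vectors z, and each quadratic form is evaluated by an r-step Lanczos run whose tridiagonal matrix supplies the Gauss quadrature nodes and weights. This is a sketch, not the talk's Algorithm 3.2; we assume m is the number of random vectors and r the number of Lanczos steps. W is symmetrized as $D^{-1/2} B D^{-1/2}$, which has the same eigenvalues (and hence the same log-det) as the row-normalized W, so $A = I - \rho W$ is symmetric positive definite for ρ in [0, 1). Function names are hypothetical.

```python
import numpy as np

def grid_w_sym(rows, cols):
    """Symmetrized 4-neighbor matrix D^(-1/2) B D^(-1/2); same eigenvalues as row-normalized W."""
    n = rows * cols
    B = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    B[r * cols + c, rr * cols + cc] = 1.0
    d = B.sum(axis=1)
    return B / np.sqrt(np.outer(d, d))

def lanczos_quadrature(A, v, r, f):
    """Approximate v^T f(A) v for a unit vector v with an r-step Lanczos / Gauss quadrature rule."""
    alphas, betas = [], []
    v_prev, beta = np.zeros_like(v), 0.0
    for j in range(r):
        w = A @ v - beta * v_prev
        alpha = v @ w
        w = w - alpha * v
        alphas.append(alpha)
        if j == r - 1:
            break
        beta = np.linalg.norm(w)
        if beta < 1e-12:          # Krylov space exhausted: the rule is already exact
            break
        betas.append(beta)
        v_prev, v = v, w / beta
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    theta, U = np.linalg.eigh(T)                 # quadrature nodes = Ritz values of T
    return np.sum(U[0, :] ** 2 * f(theta))       # weights = squared first eigenvector components

def gl_logdet(W, rho, m=400, r=8, seed=0):
    """Estimate ln|I - rho W| = tr(ln(I - rho W)) by averaging m Lanczos quadratic forms."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    A = np.eye(n) - rho * W
    total = 0.0
    for _ in range(m):
        z = rng.choice([-1.0, 1.0], size=n)      # random +-1 probe, so z^T z = n
        total += n * lanczos_quadrature(A, z / np.sqrt(n), r, np.log)
    return total / m

W = grid_w_sym(10, 10)
exact = np.linalg.slogdet(np.eye(100) - 0.5 * W)[1]
approx = gl_logdet(W, 0.5)
print(f"exact {exact:.3f}  GL estimate {approx:.3f}")
```

Only matrix-vector products with A are needed, so no eigenvalues of the full W are computed; the Monte Carlo noise shrinks as m grows, and the quadrature error shrinks as r grows.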
Taylor’s Series Approximation
• The log-det term is expressed as a Taylor series.
• The trace is the sum of the eigenvalues, and W is the symmetrized neighborhood matrix.
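The series in question is $\ln|I - \rho W| = \sum_i \ln(1 - \rho\lambda_i) = -\sum_{k=1}^{\infty} \frac{\rho^k}{k}\operatorname{tr}(W^k)$, valid when ρ times the spectral radius of W is below 1. The sketch below truncates it after q terms and, for a small demo only, computes the traces exactly from dense powers of W; large-scale versions typically estimate the traces instead. The choice q = 20 and the function names are assumptions.

```python
import numpy as np

def grid_w_sym(rows, cols):
    """Symmetrized 4-neighbor matrix D^(-1/2) B D^(-1/2); same eigenvalues as row-normalized W."""
    n = rows * cols
    B = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    B[r * cols + c, rr * cols + cc] = 1.0
    d = B.sum(axis=1)
    return B / np.sqrt(np.outer(d, d))

def taylor_logdet(W, rho, q=20):
    """Truncated series: ln|I - rho W| ~ -sum_{k=1..q} rho^k tr(W^k) / k."""
    n = W.shape[0]
    Wk = np.eye(n)
    total = 0.0
    for k in range(1, q + 1):
        Wk = Wk @ W                            # W^k via dense products (small demo only)
        total -= rho ** k * np.trace(Wk) / k   # tr(W^k) = sum of the k-th powers of the eigenvalues
    return total

W = grid_w_sym(10, 10)
exact = np.linalg.slogdet(np.eye(100) - 0.5 * W)[1]
print(f"exact {exact:.6f}  Taylor {taylor_logdet(W, 0.5):.6f}")
```

Because the eigenvalues of W lie in [-1, 1], the truncation error per eigenvalue is bounded by the tail $\sum_{k>q} \rho^k / k$, which grows as ρ approaches 1.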
Chebyshev Polynomial Approximation
• The log-det term is expressed in terms of Chebyshev polynomials.
• The trace is the sum of the eigenvalues, the T’s are matrix polynomials, and the c’s are Chebyshev polynomial coefficients.
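A sketch of the Chebyshev approach, assuming the eigenvalues of W lie in [-1, 1]: the scalar function $f(x) = \ln(1 - \rho x)$ is approximated as $f(x) \approx \sum_{j=0}^{q} c_j T_j(x) - c_0/2$, with coefficients from the standard Chebyshev-node formula, so $\ln|I - \rho W| \approx \sum_j c_j \operatorname{tr}(T_j(W)) - c_0 n / 2$. The matrix polynomials use the recurrence $T_{j+1}(W) = 2W T_j(W) - T_{j-1}(W)$. Exact dense traces, q = 10, and the function names are demo assumptions.

```python
import numpy as np

def grid_w_sym(rows, cols):
    """Symmetrized 4-neighbor matrix D^(-1/2) B D^(-1/2); same eigenvalues as row-normalized W."""
    n = rows * cols
    B = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    B[r * cols + c, rr * cols + cc] = 1.0
    d = B.sum(axis=1)
    return B / np.sqrt(np.outer(d, d))

def chebyshev_logdet(W, rho, q=10):
    """ln|I - rho W| ~ sum_j c_j tr(T_j(W)) - c_0 n / 2, for eigenvalues of W in [-1, 1]."""
    n = W.shape[0]
    N = q + 1                                   # number of coefficients c_0 .. c_q
    k = np.arange(1, N + 1)
    angles = np.pi * (k - 0.5) / N
    fx = np.log(1.0 - rho * np.cos(angles))     # f evaluated at the Chebyshev nodes
    c = np.array([2.0 / N * np.sum(fx * np.cos(j * angles)) for j in range(N)])
    # traces of T_j(W) via T_0 = I, T_1 = W, T_{j+1} = 2 W T_j - T_{j-1}
    T_prev, T_cur = np.eye(n), W.copy()
    traces = [float(n), np.trace(W)]
    for _ in range(2, N):
        T_prev, T_cur = T_cur, 2.0 * W @ T_cur - T_prev
        traces.append(np.trace(T_cur))
    return float(np.dot(c, traces[:N])) - c[0] * n / 2.0

W = grid_w_sym(10, 10)
exact = np.linalg.slogdet(np.eye(100) - 0.5 * W)[1]
print(f"exact {exact:.6f}  Chebyshev {chebyshev_logdet(W, 0.5):.6f}")
```

The coefficients depend only on ρ and q, while the traces depend only on W, so the traces can be computed once and reused for every candidate ρ.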
Experiment Design
Exact and Approximate Values of Log-det
• GL gives a better approximation as spatial autocorrelation increases.
Absolute Relative Error of Approximations
• The absolute relative error of the approximations decreases as spatial autocorrelation increases (GL mean error 0.9%, GL max error 1.78%).
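The error metric on this slide, assuming the usual definition $|\text{approx} - \text{exact}| / |\text{exact}| \times 100\%$, can be summarized over a set of runs as a mean and a maximum. The helper names are hypothetical, and the numbers in the usage line are toy values, not results from the talk.

```python
def abs_relative_error_pct(approx, exact):
    """Absolute relative error |approx - exact| / |exact|, in percent."""
    return abs(approx - exact) / abs(exact) * 100.0

def error_summary(approx_values, exact_values):
    """Mean and max absolute relative error (%) over paired approximate/exact log-det values."""
    errs = [abs_relative_error_pct(a, e) for a, e in zip(approx_values, exact_values)]
    return sum(errs) / len(errs), max(errs)

# Toy numbers for illustration only.
mean_err, max_err = error_summary([-10.1, -20.0, -39.6], [-10.0, -20.0, -40.0])
print(f"mean {mean_err:.2f}%, max {max_err:.2f}%")
```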
Conclusions
• GL is slightly more expensive than the Taylor series and Chebyshev polynomial approximations.
• GL gives better approximations when spatial autocorrelation is high and the problem size is large.
• GL quality depends on the number of iterations, the initial Lanczos vector, and the random number generator.
• GL does not need to compute all eigenvalues of W.
Acknowledgments
• AHPCRC
• Minnesota Supercomputing Institute (MSI)
• Spatial Database Group members
• ARCTiC Labs group members
• Dr. Dan Boley, Dr. Sanjay Chawla, Dr. Vipin Kumar, Dr. James LeSage, Dr. Kelley Pace, Dr. Pen-Chung Yew
THANK YOU VERY MUCH
Q/A