Statistical Bases for Map Reconstructions and Comparisons. Jerry Platt, May 2005.
Outline
• Motivation
  • Do Different Maps “Differ”?
• Methods
  • Singular-Value Decomposition
  • Multidimensional Scaling and PCA
  • Mantel Permutation Test
  • Procrustean Fit and Permutation Test
  • Bidimensional Regression
• Working Example
  • Locational Attributes of Eight URSB Campuses
Motivation
• Comparing Maps Over Time
  • Accuracy of a 14th Century Map
  • Leader Image Change in Great Britain
  • Where IS Wall Street, post-9/11?
• Comparing Maps Among Sub-samples
  • Things People Fear, M v. F
  • Face-to-Face Comparisons
• Comparing Maps Across Attributes
  • Competitive Positioning of Firms
  • Chinese Provinces & Human Dev. Indices
Accuracy of a 14th Century Map http://www.geog.ucsb.edu/~tobler/publications/pdf_docs/geog_analysis/Bi_Dim_Reg.pdf
http://www.mori.com/pubinfo/rmw/two-triangulation-models.pdf
Things People Fear, F v. M http://www.analytictech.com/borgatti/papers/borgatti%2002%20-%20A%20statistical%20method%20for%20comparing.pdf
Face-to-Face Comparisons http://www.multid.se/references/Chem%20Intell%20Lab%20Syst%2072,%20123%20(2004).pdf
Methods
• Eigen-Analysis and Singular-Value Decomposition
• Multidimensional Scaling & Principal Components
• Mantel Permutation Test
• Procrustean Fit and Permutation Test
• Bidimensional Regression
Eigen-analysis
• C = an N×N variance-covariance matrix
• Find the N solutions to $C\nu = \lambda\nu$
• $\lambda$ = the N eigenvalues, with $\lambda_1 \ge \lambda_2 \ge \dots$
• $\nu$ = the N associated eigenvectors
• $C = LDL'$, where L = matrix of $\nu$s and D = diagonal matrix of $\lambda$s
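The eigen-decomposition on this slide can be checked numerically. The following is a minimal sketch using NumPy; the 3×3 matrix `C` is a made-up example, not data from the presentation.

```python
import numpy as np

# Hypothetical 3x3 variance-covariance matrix (illustrative values only).
C = np.array([[4.0, 2.0, 0.6],
              [2.0, 3.0, 0.4],
              [0.6, 0.4, 1.0]])

# eigh is intended for symmetric matrices; eigenvalues come back ascending.
eigvals, eigvecs = np.linalg.eigh(C)

# Reorder so the eigenvalues are in descending order.
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]          # the N eigenvalues
L = eigvecs[:, order]         # columns are the associated eigenvectors
D = np.diag(lam)

# Rebuild C from its factors: C = L D L'.
C_rebuilt = L @ D @ L.T
print(np.allclose(C_rebuilt, C))
```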
Singular Value Decomposition
• Every N×P matrix A has an SVD: A = U D V’
• Columns of U = eigenvectors of AA’
• Entries in diagonal matrix D = singular values = square roots of the (nonzero) eigenvalues of either AA’ or A’A
• Columns of V = eigenvectors of A’A
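The links between the SVD and the eigen-structure of A’A can be verified directly. This is a small sketch on a randomly generated matrix (an assumption for illustration), using NumPy's `svd` and `eigvalsh`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))            # hypothetical N x P matrix, N=6, P=3

# Thin SVD: U is 6x3, s holds the singular values, Vt is V'.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(s) @ Vt

# Eigenvalues of A'A, sorted descending to match the order of s.
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

print(np.allclose(A_rebuilt, A))       # A = U D V'
print(np.allclose(s ** 2, eig_AtA))    # singular values squared = eigenvalues
```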
Principal Component Analysis
• A is a column-centered data matrix
• A = U D V’
• V’ = row-wise principal components
• D² (squared singular values) ~ proportional to variance explained
• UD = principal component scores
• DV’ = principal axes
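PCA via the SVD can be sketched in a few lines. The data below are randomly generated for illustration; the share of variance per component is computed from the squared singular values, which is the usual convention.

```python
import numpy as np

rng = np.random.default_rng(1)
raw = rng.normal(size=(10, 3)) * np.array([3.0, 1.0, 0.5])  # hypothetical data
A = raw - raw.mean(axis=0)            # column-center the data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

scores = U * s                        # principal component scores, U D
explained = s ** 2 / np.sum(s ** 2)   # share of variance per component

print(np.round(explained, 3))
```

The scores equal the centered data projected onto the rows of V’, so `scores` and `A @ Vt.T` agree.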
Multidimensional Scaling
• A is a column-centered dissimilarity matrix
• B = the double-centered (inner-product) matrix computed from A
• B = U D V’
• B = XX’, where $X = UD^{1/2}$
• Limit X to 2 columns
• Coordinates for the 2d MDS solution
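A minimal sketch of classical (Torgerson) MDS follows. It assumes the standard double-centering of squared distances to form B, which the slide does not spell out, and uses made-up 2d points so the recovered configuration can be checked against known distances.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical MDS: coordinates in k dimensions from a distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J             # assumed double-centering step
    eigvals, eigvecs = np.linalg.eigh(B)       # B is symmetric
    order = np.argsort(eigvals)[::-1][:k]      # keep the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)   # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(lam)    # X = U D^(1/2), first k columns

# Hypothetical points; only their pairwise distances are given to MDS.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.0, 2.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

X = classical_mds(dist, k=2)
dist_mds = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
print(np.allclose(dist_mds, dist))
```

The recovered coordinates match the originals only up to rotation, reflection, and translation, which is why a Procrustean fit is needed before comparing maps.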
A Random Permutation Test
• Given dissimilarity matrices A and B
• N! permutations
  • 37! ≈ 1.4 × 10^43
  • 8! = 40,320
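A Mantel-style test for two dissimilarity matrices can be sketched as follows. The 5×5 matrices are invented for illustration, and with N = 5 all 5! = 120 permutations are enumerated; the same exhaustive approach is feasible for N = 8 (40,320 permutations) but not for N = 37, where random sampling of permutations would be used instead.

```python
from itertools import permutations
from math import sqrt

def upper(m, order):
    """Upper-triangle entries after permuting rows and columns together."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel_exact(A, B):
    """Exact one-sided Mantel test: enumerate every relabeling of B."""
    n = len(A)
    a = upper(A, tuple(range(n)))
    r_obs = pearson(a, upper(B, tuple(range(n))))
    count = total = 0
    for perm in permutations(range(n)):
        total += 1
        if pearson(a, upper(B, perm)) >= r_obs - 1e-12:
            count += 1
    return r_obs, count / total, total

# Hypothetical dissimilarity matrices (symmetric, zero diagonal).
A = [[0, 3, 5, 4, 2], [3, 0, 4, 5, 2.5], [5, 4, 0, 3, 3],
     [4, 5, 3, 0, 2.8], [2, 2.5, 3, 2.8, 0]]
B = [[0, 3.2, 4.7, 4.1, 2.2], [3.2, 0, 4.3, 5.2, 2.4], [4.7, 4.3, 0, 2.9, 3.1],
     [4.1, 5.2, 2.9, 0, 2.6], [2.2, 2.4, 3.1, 2.6, 0]]

r_obs, p_value, n_perm = mantel_exact(A, B)
print(r_obs, p_value, n_perm)
```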
Permutation Tests
• Observed test statistic: TS = 25 (# correct of 37 SB)
• Is 25 significantly > 18.5?
• H0: TS = 18.5; HA: TS > 18.5
• P = .069; P > .05, so do not reject H0
• Permute list & rerun
• Centering & scaling
• Rotation & dilation to min $\sum \epsilon^2$
• Mirror reflection
http://www.zoo.utoronto.ca/jackson/pro2.html
Procrustean Analysis
• Two N×P data configurations, X and Y
• X’Y = U D V’
• H = UV’
• OLS: min SSE = tr[(XH − Y)’(XH − Y)] = tr(XX’) + tr(YY’) − 2 tr(D) = tr(XX’) + tr(YY’) − 2 tr(VDV’)
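The rotation step can be sketched numerically. The two configurations below are synthetic (Y is a rotated, slightly noisy copy of X), and the code checks that the closed-form SSE matches the directly computed one.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 2))                 # hypothetical configuration
X -= X.mean(axis=0)                         # center

theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R + 0.05 * rng.normal(size=X.shape) # rotated copy plus small noise
Y -= Y.mean(axis=0)

# SVD of X'Y gives the optimal orthogonal matrix H = U V'.
U, d, Vt = np.linalg.svd(X.T @ Y)
H = U @ Vt

sse = np.sum((X @ H - Y) ** 2)
sse_formula = np.trace(X.T @ X) + np.trace(Y.T @ Y) - 2.0 * d.sum()
print(np.isclose(sse, sse_formula))
```

H is only constrained to be orthogonal, so it may include a mirror reflection as well as a rotation.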
OLS Regression
• $Y = X\beta + \varepsilon$
• Y = Xb + e
• X = UDV’
• $b = V_r D_r^{-1} U_r' Y$, where r = first r columns (N > P)
• $b = (X'X)^{-1} X'Y$
• $E[b] = V_r V_r' \beta$
• Estimated Y values = $U_r U_r' Y$
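The equivalence between the SVD form and the normal-equations form of the OLS estimate can be checked with a short sketch. The design matrix, coefficients, and noise level are invented for illustration, and X is assumed to have full column rank.

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 12, 3
X = rng.normal(size=(N, P))                  # hypothetical full-rank design
beta = np.array([1.5, -2.0, 0.5])            # hypothetical true coefficients
Y = X @ beta + 0.1 * rng.normal(size=N)

U, d, Vt = np.linalg.svd(X, full_matrices=False)

b_svd = Vt.T @ ((U.T @ Y) / d)               # b = V D^-1 U' Y
b_normal = np.linalg.solve(X.T @ X, X.T @ Y) # b = (X'X)^-1 X'Y
Y_hat = U @ (U.T @ Y)                        # fitted values = U U' Y

print(np.allclose(b_svd, b_normal))
```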
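Bidimensional Regression
• (X, Y) = coordinate pair in 2d Map 1; unidimensional case: $Y = \alpha_0 + \beta_0 X$
• (A, B) = coordinate pair in 2d Map 2
• $\begin{pmatrix} E[A] \\ E[B] \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \beta_1 & -\beta_2 \\ \beta_2 & \beta_1 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$
• $\alpha_1$ = horizontal translation
• $\alpha_2$ = vertical translation
• $\phi$ = scale transformation = $\sqrt{\beta_1^2 + \beta_2^2}$
• $\theta$ = angle transformation = $\tan^{-1}(\beta_2 / \beta_1)$, with $\theta + 180^\circ$ iff $\beta_1 < 0$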
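A minimal sketch of the Euclidean bidimensional regression fit follows, using the standard closed-form least-squares estimates for the translation, scale, and rotation parameters. The function name, the point sets, and the known transform used to generate Map 2 are all assumptions for illustration. `math.atan2` is used for the angle; it handles the negative-β₁ case that the slide treats with the +180° rule, and the two agree up to a multiple of 360°.

```python
import math

def bidimensional_regression(map1, map2):
    """Fit (A, B) = (a1, a2) + [[b1, -b2], [b2, b1]] (X, Y) by least squares."""
    n = len(map1)
    x_bar = sum(x for x, _ in map1) / n
    y_bar = sum(y for _, y in map1) / n
    a_bar = sum(a for a, _ in map2) / n
    b_bar = sum(b for _, b in map2) / n

    ss = s1 = s2 = 0.0
    for (x, y), (a, b) in zip(map1, map2):
        dx, dy, da, db = x - x_bar, y - y_bar, a - a_bar, b - b_bar
        ss += dx * dx + dy * dy
        s1 += dx * da + dy * db
        s2 += dx * db - dy * da

    beta1, beta2 = s1 / ss, s2 / ss
    return {
        "alpha1": a_bar - beta1 * x_bar + beta2 * y_bar,   # horizontal shift
        "alpha2": b_bar - beta2 * x_bar - beta1 * y_bar,   # vertical shift
        "scale": math.hypot(beta1, beta2),
        "angle_deg": math.degrees(math.atan2(beta2, beta1)),
    }

# Hypothetical Map 1 points and a known transform used to build Map 2:
# scale 2, rotation 30 degrees, translation (1, -3).
map1 = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
b1 = 2.0 * math.cos(math.radians(30.0))
b2 = 2.0 * math.sin(math.radians(30.0))
map2 = [(1.0 + b1 * x - b2 * y, -3.0 + b2 * x + b1 * y) for x, y in map1]

fit = bidimensional_regression(map1, map2)
print(fit)
```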
• Angle of rotation around origin (0,0)
• Horizontal & vertical translation
• Although r = 1, the maps differ in location, scale, and angle of rotation around origin (0,0)
• Scale transform $\phi$, with $\phi < 1$ if contraction and $\phi > 1$ if expansion
Working Example
• Eight URSB Campuses
  • RD, BK, TO, RC, SA, RV, SD, TA
• Data Sources
  • Locations
  • Housing Attributes
  • Tapestry Attributes
• Data Analyses
87.5 miles 88.1 miles
[Figure: plot labeled with the eight campus codes BK, RC, RD, RV, TO, SA, TA, SD]
… and if DISTANCES are available, but COORDINATES unavailable?
• Treat the distance matrix as a dissimilarity matrix
• Apply multidimensional scaling
• Apply the two-dimension solution “as if” it represents latitude and longitude coordinates
Distance Estimates Vary … But Not “Significantly”
Errors “appear” to be quite small … BUT is there a way to test whether the errors are statistically significant?
[Figure: plot labeled with the eight campus codes RD, RV, RC, TA, BK, SD, SA, TO]
Procrustean Test: MDS Map Recreation
CONCLUDE: Near-perfect Map Recreation
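A Procrustean permutation test of this kind (PROTEST) can be sketched as below. This is an illustration under stated assumptions, not the analysis behind the slide: the two eight-point configurations are synthetic, the function names are hypothetical, and the statistic is the sum of singular values of X’Y after both configurations are centered and scaled to unit sum of squares (so m² = 1 − r²). Rows of one configuration are shuffled at random to build the reference distribution.

```python
import numpy as np

def standardize(Z):
    """Center columns and scale to unit total sum of squares."""
    Z = np.asarray(Z, dtype=float)
    Z = Z - Z.mean(axis=0)
    return Z / np.sqrt(np.sum(Z ** 2))

def procrustes_corr(X, Y):
    """Procrustes correlation r; the residual statistic is m^2 = 1 - r^2."""
    s = np.linalg.svd(standardize(X).T @ standardize(Y), compute_uv=False)
    return s.sum()

def protest(X, Y, n_perm=999, seed=0):
    """One-sided permutation p-value for the observed Procrustes fit."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    r_obs = procrustes_corr(X, Y)
    hits = 0
    for _ in range(n_perm):
        shuffled = Y[rng.permutation(len(Y))]   # break the row pairing
        if procrustes_corr(X, shuffled) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Synthetic example: Y is a rotated, dilated, shifted, slightly noisy copy of X.
rng = np.random.default_rng(4)
X = rng.normal(size=(8, 2))
t = np.deg2rad(40.0)
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
Y = 2.0 * X @ R + np.array([5.0, -1.0]) + 0.05 * rng.normal(size=X.shape)

r_obs, p_value = protest(X, Y)
print(r_obs, p_value)
```

A small p-value here indicates that the match between the two configurations is far better than what random relabeling of the points would produce.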
Driving Distances
Do these differ “significantly” from linear distances?
• PRACTICAL
• STATISTICAL
DriveD = Driving Distances, Eight URSB Locations
Multidimensional Scaling, with 2-dimension solution
[Figure: plot labeled with the eight campus codes RD, RV, RC, TA, SA, BK, SD, TO]
PROTEST Comparison
• Bidimensional Regression
• Procrustean Rotation