Innovative Uses of Geographic Information Systems Lance A. Waller Department of Biostatistics Rollins School of Public Health Emory University firstname.lastname@example.org
Outline • Why does the geography of immunization matter? • What is GIS? • What does GIS do? • What data do I have? • What questions can I answer with my data?
Why geography? • Is immunization coverage constant? • If you know where coverage is low, can you do something? • If you know where coverage is high, can you learn something?
What is GIS? • A “geographic information system” (GIS) links: • Geographic features • Houses • Census tracts • Attribute measurements • Immunized (yes/no) • Age • Sociodemographics
Think of… • Map (locations) linked with Table (attributes) • Objects on the map are features. • Each cell of the table contains an attribute value.
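A minimal sketch of the map/table link, assuming hypothetical feature IDs, coordinates, and attribute values (none of them real data). The only idea taken from the slide is that each feature on the map and each row of the table share a key.

```python
# Hypothetical map layer: feature ID -> location (x, y), in arbitrary map units.
features = {
    "house_1": (2.0, 3.5),
    "house_2": (4.1, 1.2),
    "house_3": (0.5, 0.9),
}

# Hypothetical attribute table: one row per feature, keyed by the same ID.
attributes = {
    "house_1": {"immunized": True, "age": 4},
    "house_2": {"immunized": False, "age": 2},
    "house_3": {"immunized": True, "age": 6},
}

def join_features(features, attributes):
    """Link each map feature to its attribute row through the shared ID."""
    return {
        fid: {"location": loc, **attributes[fid]}
        for fid, loc in features.items()
        if fid in attributes
    }

linked = join_features(features, attributes)

# Query by attribute, answer by location: where are the non-immunized children?
not_immunized = [row["location"] for row in linked.values() if not row["immunized"]]
```

Asking a question of the table and getting the answer back as locations is the basic move a GIS automates.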
What does a GIS do? Basic GIS operation #1: • Layering • Example layers: non-compliers, health center catchment, compliers
Basic GIS operation #2: • Buffering • Find areas within a user-specified distance of: • points • lines • areas
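A rough sketch of the simplest case, a buffer around a single point, assuming planar (projected) coordinates and hypothetical locations. Buffers around lines and areas need real geometry routines, which a GIS supplies; `within_buffer` is an illustrative name, not a function from any GIS package.

```python
import math

def within_buffer(center, points, radius):
    """Return the points lying within `radius` of `center` (planar coordinates)."""
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]

# Hypothetical health center and residences, in kilometres on a projected map.
health_center = (0.0, 0.0)
residences = [(1.0, 1.0), (3.0, 3.0), (0.5, -0.2), (6.0, 0.0)]

# Residences inside a 5 km buffer around the health center.
inside = within_buffer(health_center, residences, radius=5.0)
```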
Famous public health map! Snow, J. (1949) Snow on Cholera. Oxford University Press: London.
Wow! Can we do that? • Many introductions to GIS and public health essentially say: • “If John Snow could do it with shoe leather, ink, and paper, just imagine what we can do with a computer!”
Basic take-home figure • The Whirling Vortex of GIS analysis (GIS at the center) • The question you want to answer • The data you need to answer that question • The data you can get • The question you can answer with those data Original source: Toxicologist, EPA Region IV
What kind of questions? • Where is coverage the lowest? • Where is coverage the highest? • Outbreak size starting in high coverage area? • Outbreak size starting in low coverage area? • How could coverages impact the course of an outbreak? • Best response to current outbreak?
What kind of attributes? • Compliers • Residence location • Census region counts • Sociodemographic data • Census summaries on age, race, sex, income of census region residents • Some information on compliers’ sociodemographics
Additional attributes • Noncompliers • Residence location • Regional counts • School data • School district • Health plan data • Billing provides residence address • ZIP codes?
Basic location types • Point data • Latitude and longitude • (Seems) precise • Distance calculations • Regional data • Counts (cases/controls) from census regions
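Distance calculations from latitude and longitude can be sketched with the haversine (great-circle) formula. This assumes a spherical Earth with a mean radius of 6371 km, so results are approximate; the coordinates in the usage line are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

# Distance between two hypothetical residences.
d = haversine_km(33.75, -84.39, 33.80, -84.32)
```

The "(seems) precise" caveat still applies: the arithmetic is exact to many digits, but the geocoded coordinates going in may not be.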
Any complications? • Maxcy (1926): Endemic typhus fever in Montgomery, AL • Where is “where”? • Which location for each case? Maxcy, K.F. (1926) “An epidemiological study of endemic typhus (Brill’s disease) in the Southeastern United States with special reference to its mode of transmission.” Public Health Reports 41, 2967-2995.
[Figure: two maps of the same cases, one located by residence and one by employment.] Lilienfeld, D.E. and Stolley, P.D. (1994) Foundations of Epidemiology, Third Edition. Oxford University Press: New York, pp. 136-140.
Complications with regions • Counts lose some resolution... [Figure: map of regions labeled with counts 4, 1, 2, 1, 1, 2.]
Modifiable Areal Unit Problem • Different aggregations can lead to different results. [Figure: the same events counted under different sets of regional boundaries.]
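A toy illustration of the effect, using a hypothetical strip of six cells and two hypothetical zonings (this does not reproduce the counts in the figure). The same underlying events look evenly spread under one set of boundaries and concentrated under another.

```python
def aggregate(cell_counts, zones):
    """Sum cell counts within each zone; `zones` lists the cell indices per zone."""
    return [sum(cell_counts[i] for i in zone) for zone in zones]

# Hypothetical event counts in six adjacent cells.
cell_counts = [0, 3, 3, 0, 1, 1]

# Two ways of drawing regional boundaries over the same cells.
zoning_a = [[0, 1], [2, 3], [4, 5]]
zoning_b = [[0], [1, 2], [3, 4], [5]]

counts_a = aggregate(cell_counts, zoning_a)  # [3, 3, 2]
counts_b = aggregate(cell_counts, zoning_b)  # [0, 6, 1, 1]
```

Under `zoning_a` no region stands out; under `zoning_b` one region holds six of the eight events.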
MAUP example: John Snow? Monmonier, M. (1991) How to Lie with Maps. University of Chicago Press: Chicago. p. 142.
What questions can I ask? • Point locations • Interesting/uninteresting clusters • Interesting: clusters of non-compliers away from clusters of compliers • Regional counts • Interesting/uninteresting raised counts • Interesting: Less coverage than “expected”
Point locations • Treat locations as spatial point process • Spatial “intensity” (average number of events per unit area) • Think of intensity as a surface • Compare intensity of compliers to intensity of non-compliers. • Peaks and valleys in same places?
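One common way to turn point locations into an intensity surface is a kernel estimate. The slide does not name an estimator, so the Gaussian kernel, the bandwidth value, and the point locations below are all assumptions chosen for illustration, and no edge correction is applied.

```python
import math

def kernel_intensity(x, y, events, bandwidth):
    """Gaussian-kernel estimate of intensity (events per unit area) at (x, y)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for ex, ey in events:
        d2 = (x - ex) ** 2 + (y - ey) ** 2
        total += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total

# Hypothetical residence locations.
compliers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)]
noncompliers = [(4.0, 4.0), (4.2, 3.9)]

# Compare the two surfaces at one location: do the peaks coincide?
here = (1.0, 1.0)
lam_comp = kernel_intensity(here[0], here[1], compliers, bandwidth=1.0)
lam_non = kernel_intensity(here[0], here[1], noncompliers, bandwidth=1.0)
```

Evaluating both surfaces over a grid and mapping their difference or ratio shows whether peaks and valleys fall in the same places.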
Monte Carlo simulation • Simulate data sets under the null hypothesis (e.g., constant coverage rate). • See if the observed data (actual compliers) appear “unusual”. • To compare intensities, split all locations into compliers and non-compliers at random, many times, and see how high the peaks and how low the valleys can get by chance. • Most GIS packages will not do this, but it is a very handy tool in spatial statistics.
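The random-labeling idea can be sketched as follows. The test statistic (the highest peak of the non-complier minus complier kernel intensity over a grid), the Gaussian kernel, the bandwidth, the grid, and all point locations are assumptions made for illustration. Other statistics, such as the lowest valley or a ratio of intensities, fit the same loop.

```python
import math
import random

def kernel_intensity(x, y, events, bandwidth):
    """Gaussian-kernel estimate of intensity at (x, y); no edge correction."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    return sum(
        norm * math.exp(-((x - ex) ** 2 + (y - ey) ** 2) / (2.0 * bandwidth ** 2))
        for ex, ey in events
    )

def peak_difference(group_a, group_b, grid, bandwidth):
    """Highest peak of (intensity of A minus intensity of B) over the grid."""
    return max(
        kernel_intensity(gx, gy, group_a, bandwidth)
        - kernel_intensity(gx, gy, group_b, bandwidth)
        for gx, gy in grid
    )

def random_labeling_test(compliers, noncompliers, grid, bandwidth,
                         n_sim=99, seed=0):
    """Monte Carlo p-value for the observed peak under random labeling."""
    rng = random.Random(seed)
    observed = peak_difference(noncompliers, compliers, grid, bandwidth)
    pooled = list(compliers) + list(noncompliers)
    n_non = len(noncompliers)
    exceed = 0
    for _ in range(n_sim):
        # Keep the locations fixed; reassign the labels at random.
        rng.shuffle(pooled)
        sim_non, sim_comp = pooled[:n_non], pooled[n_non:]
        if peak_difference(sim_non, sim_comp, grid, bandwidth) >= observed:
            exceed += 1
    # The observed data set counts as one of the n_sim + 1 labelings.
    return observed, (exceed + 1) / (n_sim + 1)

# Hypothetical data: non-compliers tightly clustered, compliers spread out.
noncompliers = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(2) for j in range(5)]
compliers = [(2.0 * i, 1.5 * j) for i in range(5) for j in range(2)]
grid = [(float(x), float(y)) for x in range(12) for y in range(12)]

observed, p_value = random_labeling_test(compliers, noncompliers, grid, bandwidth=1.0)
```

A small p-value says the observed peak of non-compliers is higher than random labeling of the same locations usually produces.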
Regions • Compare observed counts to “expected” counts. • Some basic point process results extend to counts (counts of points in regions). • Constant coverage rate (perhaps age-adjusted) again a common way of obtaining “expected” counts. • Monte Carlo simulation for significance.
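A minimal sketch of the regional version, assuming hypothetical populations and counts of non-immunized children, a single constant (unadjusted) rate, and a Pearson-type statistic. The slide does not specify a statistic, so that choice is an assumption. Simulated counts come from allocating the observed total to regions in proportion to population.

```python
import random

def expected_counts(populations, total):
    """Expected count per region under a constant rate: population share x total."""
    total_pop = sum(populations)
    return [total * p / total_pop for p in populations]

def pearson_stat(observed, expected):
    """Sum over regions of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def monte_carlo_p(observed, populations, n_sim=999, seed=0):
    """Monte Carlo p-value for the Pearson statistic under a constant rate."""
    rng = random.Random(seed)
    total = sum(observed)
    expected = expected_counts(populations, total)
    obs_stat = pearson_stat(observed, expected)
    regions = list(range(len(populations)))
    exceed = 0
    for _ in range(n_sim):
        # Allocate each of the `total` events to a region with
        # probability proportional to that region's population.
        sim = [0] * len(populations)
        for r in rng.choices(regions, weights=populations, k=total):
            sim[r] += 1
        if pearson_stat(sim, expected) >= obs_stat:
            exceed += 1
    return obs_stat, (exceed + 1) / (n_sim + 1)

# Hypothetical regional populations and counts of non-immunized children.
populations = [1000, 2000, 1500, 500]
not_immunized = [10, 20, 15, 30]  # the last region looks high for its size

stat, p_value = monte_carlo_p(not_immunized, populations)
```

An age-adjusted version would compute the expected counts within age groups and sum them, leaving the simulation loop unchanged.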
Related work • Cancer registries: North American Association of Central Cancer Registries (NAACCR) report on GIS (Wiggins 2002) • Birth outcome registries • Public Health/Bioterrorism/Syndromic Surveillance • Similarities: • Registry data • Differences: • Infectious vs. chronic outcome • Urgency of temporality
Conclusion • The best work is a collaboration among • Geographers • GISers • Epidemiologists • Statisticians • Get the best data you can to answer the questions you want to answer.
Handy references • Wiggins L (Ed). Using Geographic Information Systems Technology in the Collection, Analysis, and Presentation of Cancer Registry Data: A Handbook of Basic Practices. Springfield (IL): North American Association of Central Cancer Registries, October 2002, 68 pp. • Cromley, E.K. and McLafferty, S.L. (2002) GIS and Public Health. The Guilford Press. • Bailey and Gatrell (1995) Interactive Spatial Data Analysis. Longman. • Waller and Crawford (2004) Applied Spatial Statistics for Public Health Data. Wiley.
What kind of software? • Statistical software (SAS, S+ Spatial Stats): spatially and/or visually challenged • Subject-specific: SpaceStat/GeoDa, SaTScan, GS+, ClusterSeer, WinBUGS/GeoBUGS, XGobi/XGvis, R (many nice spatial modules, must write code, quality control?) • Link to GIS: S+/ArcView 3.x, SAS Bridge to ArcGIS 8.x • Commercial GIS software (ArcView, MapInfo): statistically challenged • Extensions (Analysts): $$$, limited capability • Packages by scientific user: good, but basic • Scripts and macros: user-contributed, often do not give numerical output