This overview explores the quality of geocoded data in the Florida Registry, including identifying errors and monitoring for problems. It covers the components of geocoding quality, geocoding precision and accuracy, and the impact on data analysis.
Assessing Quality of Geocoded Data The Florida Registry Experience
Overview • What is geocoding quality? • Florida’s geocoding experience • Identifying geocoding errors • Results • before and after improved geocoding • Monitoring for geocoding problems
What is Geocoding? • Spatially enable • Assign geocode • Latitude/Longitude • FIPS—Census Units • Match address to street file • Batch (automated) • Interactive (manual 5-10%)
Geocoding Quality Components • Match rate • Coverage, % with spatial location • Precision • Scale • County center versus census block • NAACCR Items #366 (GIS Coordinate Quality), #364 and #365 (Census Tract Certainty) • Accuracy • Correct location
Geocoding Match • Software • Deterministic, Probabilistic • Parsing algorithm, Assumptions (ties) • “Black box” • Underlying street files • Quality of address data • Batch versus manual • Example: 133 NE 2nd, Miami, FL • Did you mean: 133 NE 2nd St, Miami; 133 SE 2nd Ave, Miami; 133 NW 2nd Ave, Miami; 133 SW 2nd St, Miami; 133 SW 2nd Ave, Miami; 133 SE 2nd St, Miami
Geocoding Precision • Parcel match • “gold standard” • Match to building footprint • Street level match • Most common • Interpolate along street segment • Centroid • Center of polygon • Block, tract, zipcode, county • Population center, physical
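The two most common ways of placing a point, interpolating along a street segment and falling back to a polygon centroid, can be sketched as below. This is a minimal illustration rather than any vendor's actual method: the function names, the address range, and the coordinates are hypothetical, and planar geometry is assumed (reasonable over a single block, not over long distances).

```python
def interpolate_address(house_number, low, high, start, end):
    """Place a house number along a street segment by linear interpolation.

    low/high are the address range of the segment; start/end are the (x, y)
    coordinates of the segment ends that correspond to low and high.
    """
    if high == low:
        fraction = 0.5  # single-address segment: use the midpoint
    else:
        fraction = (house_number - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the segment
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return x, y


def polygon_centroid(vertices):
    """Geometric center of a simple polygon (shoelace formula).

    vertices is a list of (x, y) tuples in order around the boundary.
    This is the physical center, not the population center.
    """
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Hypothetical segment covering addresses 100-200
street_point = interpolate_address(150, 100, 200, (0.0, 0.0), (1.0, 2.0))
# Hypothetical square "block" used as a centroid fallback
block_center = polygon_centroid([(0, 0), (1, 0), (1, 1), (0, 1)])
```

A street-level match places the case partway along the block, while a centroid match places every case in the polygon at the same point, which is why precision drops as the polygon grows from block to tract to county.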
FCDS Geocoding • Proprietary, local vendor • Problems found via use • Reported county does not match geocoded county • Representativeness of cases • Cases assigned to invalid or zero population block groups • Problems found via scrutiny • Cases in nautical areas (not islands) • Vendor assumptions
Geocoding Project • Test file • Created “gold standard” files • FIPS (cancer cases) • Long/Lat (well locations) • Selected a vendor • Based on logistics rather than quality • New vendor re-geocoded entire registry • Compared Results – Before and After
Old versus Improved Vendor: Representativeness of Cases • Environmental Health • Re-geocoded our data • Census Data • 96% Black • Old Geocoding Vendor • 15% Black Cases • New Geocoding Vendor • 85% Black Cases
Old versus Improved Vendor: Nautical, Invalid, Zero pop • Cases assigned to the sea • 0 cases from new vendor • Cases assigned to invalid block groups (bg) • 0 cases from new vendor • Cases assigned to 0, 1, 10 population bgs • 5,765 cases from old vendor • 743 cases from new vendor (3+ more years of data) • SF1 vs. SF3; Overlay
Specificity? • Figure: Old Data versus Improved Data
Sensitivity? • Figure: Old Data versus Improved Data
Validity? • Oral Cancer by SES: Old Data versus New Data • First panel • Wealthy 37.3 ref • Mid High 40.1 RR 1.08 • Mid Low 45.4 RR 1.22 • Poorest 49.2 RR 1.32 • Second panel • Wealthy 34.0 ref • Mid High 36.6 RR 1.08 • Mid Low 39.1 RR 1.15 • Poorest 46.3 RR 1.36
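The rate ratios on this slide follow directly from the rates: each SES group's rate is divided by the rate in the reference (Wealthy) group. A minimal sketch that reproduces the slide's figures is below; the function name and the labels `first_panel` and `second_panel` are illustrative, and the panels are kept in slide order without assuming which one is the old geocoding and which the new.

```python
def rate_ratios(rates, reference="Wealthy"):
    """Return each group's rate divided by the reference group's rate."""
    ref_rate = rates[reference]
    return {group: round(rate / ref_rate, 2) for group, rate in rates.items()}


# Oral cancer rates by SES group, as listed on the slide
first_panel = {"Wealthy": 37.3, "Mid High": 40.1, "Mid Low": 45.4, "Poorest": 49.2}
second_panel = {"Wealthy": 34.0, "Mid High": 36.6, "Mid Low": 39.1, "Poorest": 46.3}

for name, panel in (("first panel", first_panel), ("second panel", second_panel)):
    print(name, rate_ratios(panel))
```

Both panels show the same direction of gradient, with the ratio for the poorest group at 1.32 in the first panel and 1.36 in the second.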
Monitoring Geocoding Quality • % County match • Florida zipcodes; military addresses geocoded to NJ • % Contiguous counties • Incorrect FIPS • Nautical FIPS • # Zero Pops • Representativeness
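Several of these monitoring measures can be computed routinely from the geocoded file. The sketch below is one hypothetical way to do so, not the registry's actual procedure: the record fields (`reported_county`, `geocoded_county`, `block_group`), the county adjacency table, the list of zero-population block groups, and all identifiers in the sample data are assumptions made for illustration.

```python
def monitor_geocoding(records, neighbors, zero_pop_block_groups):
    """Summarize basic geocoding quality checks for a batch of records.

    records: list of dicts with 5-digit county FIPS strings under
        "reported_county" and "geocoded_county", and a block group ID
        under "block_group".
    neighbors: dict mapping a county FIPS to the set of contiguous counties.
    zero_pop_block_groups: set of block group IDs with no census population.
    """
    total = len(records)
    if total == 0:
        raise ValueError("no records to monitor")
    match = contiguous = out_of_state = zero_pop = 0
    for rec in records:
        reported = rec["reported_county"]
        geocoded = rec["geocoded_county"]
        if reported == geocoded:
            match += 1
        elif geocoded in neighbors.get(reported, set()):
            contiguous += 1
        if geocoded[:2] != reported[:2]:  # first two FIPS digits are the state
            out_of_state += 1
        if rec["block_group"] in zero_pop_block_groups:
            zero_pop += 1
    return {
        "pct_county_match": 100.0 * match / total,
        "pct_contiguous_county": 100.0 * contiguous / total,
        "n_out_of_state": out_of_state,
        "n_zero_pop": zero_pop,
    }


# Illustrative inputs only; the block group IDs are made up
neighbors = {"12086": {"12011"}, "12011": {"12086"}}
zero_pop = {"120860001001"}
records = [
    {"reported_county": "12086", "geocoded_county": "12086", "block_group": "120860002001"},
    {"reported_county": "12086", "geocoded_county": "12011", "block_group": "120110003002"},
    {"reported_county": "12086", "geocoded_county": "34005", "block_group": "340050004001"},
    {"reported_county": "12086", "geocoded_county": "12086", "block_group": "120860001001"},
]
summary = monitor_geocoding(records, neighbors, zero_pop)
```

Tracking these figures for each geocoding batch makes a sudden drop in county match, or a jump in zero-population assignments, visible before the data reach analysis.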
Impact • Fewer, smaller, lower risk clusters • Greater % ungeocodable • More accurate • Less specific • Ungeocodable • Rural, Poor, Old • Potential bias • Manual geocoding
Addressing Ungeocodables • Address quality? • Implemented edits • Software development • Improve matching algorithm • Specific to our data • Link with administrative databases • DMV, Medicaid, Medicare • Geo-imputation • Kevin Henry • Requires institutional priority!
Acknowledgements • Dr. Greg Kearny • Environmental Health, FL DOH • N. Dean Powell • FCDS • Jackie Button • FCDS • Dr. Monique Hernandez • FCDS • We acknowledge the CDC for financial support under cooperative agreement U58/DP000844 • Contents are the responsibility of the authors and do not represent the views of CDC, FL DOH, or FCDS