Data Linkage Project
Florida’s Newborn Screening Program
Gary Sammet, Bureau of Vital Statistics

Outline
• Data Linkage Approach
• Start with Probabilistic Linking
• Data Linkage Automated Process Flow
• Data Processing Design: Linking Variables, Weights, Bonuses, Use of Jaro-Winkler
• Data Processing Sample Results

Data Linkage Approach
• VS (Vital Statistics) & LAB work closely together
• System can accommodate needs
• Reduce duplication of efforts
• Reconciliation
  • All births have a screening record
  • All screening records have a birth
• Most cost effective with best results

Start With Probabilistic Linking
• Identify linking variables and assign initial weights based on understanding of and experience with the data
• Run initial linking, then sort by weight and display linkage flags to see data patterns and anomalies
• Adjust weights as needed without changing code
• Define deterministic rules to ensure consistent linking in the automated process

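The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names and weight values are hypothetical (the slides do not list them), and exact equality stands in for whatever comparison each variable really uses. Keeping the weights in a plain table is one way to adjust them without touching the scoring logic.

```python
# Hypothetical linking variables and weights; the slides do not give actual values.
WEIGHTS = {
    "dob": 1.0,
    "time_of_birth": 0.5,
    "sex": 0.25,
    "facility": 0.5,
    "zipcode": 0.5,
    "mother_first": 0.5,
}


def score_pair(lab, vs, weights=WEIGHTS):
    """Return (total weight, linkage flags) for one LAB/VS candidate pair."""
    flags = {}
    total = 0.0
    for field, weight in weights.items():
        a, b = lab.get(field), vs.get(field)
        matched = a is not None and a == b  # missing values never match
        flags[field] = matched
        if matched:
            total += weight
    return total, flags


def rank_candidates(lab, vs_records, weights=WEIGHTS):
    """Score every VS candidate for one LAB record, highest weight first."""
    scored = []
    for vs in vs_records:
        total, flags = score_pair(lab, vs, weights)
        scored.append((total, flags, vs))
    # Sorting by weight puts the strongest candidates on top for review.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

Reviewing the sorted output together with the per-field flags is what makes patterns visible, for example a cluster of pairs that miss only on facility name.
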
Weight Bonuses
• DOB, Time of Birth, Sex, Facility + Zipcode (MFirst or MSSN): BONUS = .50
• DOB, Time of Birth, Sex, Facility-JW + Zipcode (MFirst or MSSN): BONUS = .40
• DOB, Time of Birth, Sex, Facility + Zipcode: BONUS = .20
• DOB, Time of Birth, Sex, Facility-JW + Zipcode: BONUS = .15

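One possible reading of these four rules as code is sketched below. The bonus values come from the slide; everything else is an assumption: the flag names are hypothetical, "Facility-JW" is read as a facility name that matched only under Jaro-Winkler, and "MFirst or MSSN" is read as a match on either mother's first name or mother's SSN.

```python
# Bonus values are from the slide; flag names and rule interpretation are assumptions.
CORE_FLAGS = ("dob", "time_of_birth", "sex", "zipcode")


def weight_bonus(flags):
    """Return the bonus for one candidate pair given its boolean match flags."""
    # Every rule requires DOB, time of birth, sex and zipcode to match.
    if not all(flags.get(name, False) for name in CORE_FLAGS):
        return 0.0
    mother = flags.get("mother_first", False) or flags.get("mother_ssn", False)
    if flags.get("facility_exact", False):
        return 0.50 if mother else 0.20
    if flags.get("facility_jw", False):
        return 0.40 if mother else 0.15
    return 0.0
```

The exact-facility rules are checked first so that a pair qualifying under both exact and Jaro-Winkler matching receives the larger bonus.
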
Linking With Jaro-Winkler
• With exact Facility + Zipcode match: 41% (Facility & Zipcode must match)
• With Jaro-Winkler Facility + Zipcode match: additional 36.84%
• Total match = 77.84% vs. just 41%
Examples:
LAB FACILITY NAME              VS FACILITY NAME
FLORIDA HOSP ORLANDO – LAB     FLORIDA HOSP ORLANDO
SHANDS AT THE UNIV OF FLA      SHANDS AT UF
BROWARD MED CTR                BROWARD MEDICAL CENTER
SHANDS AT JACKSONVILLE         SHANDS JACKSONVILLE
HOLLYWOOD BIRTH CENTER, INC    HOLLYWOOD BIRTH CENTER

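For readers unfamiliar with the measure, a minimal pure-Python sketch of Jaro-Winkler similarity follows, using the standard prefix scale of 0.1 over at most four characters. The `facility_match` helper and its 0.85 cutoff are hypothetical; the slides do not state which threshold or implementation the project used.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def facility_match(lab_name, vs_name, threshold=0.85):
    """Return 'exact', 'jw', or None. The 0.85 threshold is an assumed value."""
    a, b = lab_name.strip().upper(), vs_name.strip().upper()
    if a == b:
        return "exact"
    return "jw" if jaro_winkler(a, b) >= threshold else None
```

Because the measure rewards a shared prefix, it suits names such as FLORIDA HOSP ORLANDO – LAB versus FLORIDA HOSP ORLANDO, where the difference sits at the end of the string. Any threshold would need tuning against reviewed pairs.
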
Linking Mother Address & City
• Only 16% match on exact mother address & city
• Additional 56% match on mother address & city, using Jaro-Winkler
• Total match: 72% vs. just 16%
Examples:
LAB Mother Address     VS Mother Address      LAB City     VS City
2323 SAMSON ROAD       2323 SAMSON RD         ORLANDO      ORLANDO
5105 NE 75TH AVE       5105 NE 75 AVENUE      MIAMI        MIAMI
1001 MAIN ST APT A     1001 MAIN ST APT A     KEY WEST     KEY WEST
532 HORNET CT          532 HORNET COURT       PENSACOLA    PENSACOLA
101 MAGIC CIR          101 MAGIC CIRCLE       TAMPA        TAMPA

Data Processing Results
• LAB data with DOB 12/1 – 12/31/2010, unduplicated on OrigSpecID: 9,211 rows
• VS data with DOB 11/1 – 12/31/2010, unduplicated on State File Number: 37,741 rows
• 99% of unduplicated records linked, with weighted score > 2.5

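The two processing steps named here, unduplicating on a key and accepting links above a score cutoff, could look roughly like the sketch below. The 2.5 cutoff and the key name OrigSpecID come from the slide; keeping the first record per key and the helper names are assumptions made for illustration.

```python
THRESHOLD = 2.5  # from the slide: linked records have a weighted score > 2.5


def unduplicate(records, key):
    """Keep the first record seen for each key value (an assumed tie-break rule)."""
    seen = {}
    for rec in records:
        seen.setdefault(rec[key], rec)
    return list(seen.values())


def linked_rate(best_scores, threshold=THRESHOLD):
    """Fraction of records whose best candidate score exceeds the threshold."""
    if not best_scores:
        return 0.0
    linked = sum(1 for score in best_scores if score > threshold)
    return linked / len(best_scores)
```

Here `best_scores` would hold one value per unduplicated LAB record: the highest weighted score, bonuses included, among its VS candidates.
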
Overall Linkage Results
• 98-99% using back-end approach
• Still not good enough
• Follow Rhode Island front-end approach

Advantages of Front-end Linkage
• Provide real-time linkage at hospital with VS Birth Date & NBS demographic data
• Reduce data entry by hospital staff
• Provide daily report of unlinked/missing records
• Provide LAB with checklist of incoming blood specimens
• Reduce follow-up by state staff to hospitals
• Allow end-users (hospitals, MDs) to view electronic patient reports/results in real time

Acknowledgements
• Ken Jones, Bureau Chief/Deputy State Registrar, Bureau of Vital Statistics
• Sharon Dover, Operations Manager, Bureau of Vital Statistics
• Paula Stewart, Database Analyst, Health Statistics & Assessment