Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI). Laura O’Sullivan Statistics New Zealand laura.o’sullivan@stats.govt.nz. IAOS Vietnam October 2014. Outline. The Integrated Data Infrastructure (IDI) Terminology IDI linking
Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI). Laura O’Sullivan, Statistics New Zealand, laura.o’sullivan@stats.govt.nz. IAOS Vietnam, October 2014.
Outline • The Integrated Data Infrastructure (IDI) • Terminology • IDI linking • Near-exact and non-exact • Selecting cut-offs • Quality • Clerical review • Linking at Statistics New Zealand and at the Australian Bureau of Statistics
Integrated Data Infrastructure (IDI) • Person-centred data • Education • Student loans & allowances • Benefits • Migration & movements • Tax • Business data • Justice • Families & households • Health & safety
Terminology • Data integration (aka record linkage) • Deterministic linking • Probabilistic linking (Fellegi-Sunter theory) • Weights • Indicate how likely it is that two records are from the same person
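As a rough illustration of how weights are usually built under Fellegi-Sunter theory: each compared field contributes log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, where m is the probability that the field agrees for a true match and u is the probability that it agrees for a non-match. The sketch below is not the IDI implementation; the field names, the m and u values, and the cut-off are made-up assumptions for illustration only.

```python
import math

# Hypothetical (m, u) probabilities per field; illustrative assumptions only.
# m = P(field agrees | records are from the same person)
# u = P(field agrees | records are from different people)
FIELD_PARAMS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.98, 0.003),
}


def field_weight(m, u, agrees):
    """Fellegi-Sunter weight for one field: log2 of the likelihood ratio."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def link_weight(agreement):
    """Sum the field weights for a record pair.

    `agreement` maps a field name to True (agrees) or False (disagrees).
    """
    return sum(
        field_weight(*FIELD_PARAMS[field], agrees)
        for field, agrees in agreement.items()
    )


def classify(weight, cutoff):
    """Accept the pair as a link when its weight reaches the cut-off."""
    return "link" if weight >= cutoff else "non-link"


if __name__ == "__main__":
    pair = {"first_name": True, "last_name": False, "date_of_birth": True}
    w = link_weight(pair)
    # The cut-off of 5.0 is an arbitrary value chosen for this example.
    print(round(w, 2), classify(w, cutoff=5.0))
```

Raising the cut-off trades missed matches for fewer false links, which is why the choice of cut-off is normally checked against a clerical sample.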
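Quality • True matches that are linked (true positives) • Non-matches that are linked (false positives) • True matches left unlinked (missed matches) • Non-matches left unlinked (true negatives)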
Near-exact and non-exact • First name and Last name agreement • Date of birth agreement
Quality in the IDI • False positive rates • Sample from non-exact links • Assume near-exact links are true matches • Use proportional sampling • Non-exact rates • Monitoring
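The sampling approach above can be sketched as follows. This is a minimal sketch under stated assumptions, not the IDI procedure: the strata (for example, weight bands of non-exact links), the function names, and the way the sample error rate is scaled up to an overall false positive rate (false links divided by all links, with near-exact links assumed to be true matches) are all hypothetical choices made for illustration.

```python
import random


def proportional_sample(links_by_stratum, total_sample_size, seed=0):
    """Sample non-exact links so each stratum's share of the sample
    is proportional to its share of all non-exact links."""
    rng = random.Random(seed)
    total = sum(len(links) for links in links_by_stratum.values())
    sample = {}
    for stratum, links in links_by_stratum.items():
        if total == 0 or not links:
            sample[stratum] = []
            continue
        # At least one link per non-empty stratum, never more than it holds.
        n = max(1, round(total_sample_size * len(links) / total))
        sample[stratum] = rng.sample(links, min(n, len(links)))
    return sample


def estimated_false_positive_rate(n_near_exact, n_non_exact,
                                  n_reviewed, n_false_in_review):
    """Estimate the share of all links that are false.

    Assumes near-exact links are all true matches, so false links can
    only come from the non-exact links that were sampled for review.
    """
    if n_reviewed == 0:
        raise ValueError("at least one link must be clerically reviewed")
    non_exact_error_rate = n_false_in_review / n_reviewed
    estimated_false_links = non_exact_error_rate * n_non_exact
    return estimated_false_links / (n_near_exact + n_non_exact)


if __name__ == "__main__":
    # Made-up strata and counts, for illustration only.
    strata = {"high_weight": list(range(90)), "low_weight": list(range(10))}
    picked = proportional_sample(strata, total_sample_size=10)
    print({name: len(links) for name, links in picked.items()})
    print(estimated_false_positive_rate(9000, 1000, 100, 5))
```

Because the sample is proportional, the error rate found in clerical review can be applied directly to the full set of non-exact links without reweighting by stratum.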
Clerical review • A link with two first names matching and a different last name • A link with unique identifiers and missing name information in one dataset • A link with missing name information and without unique identifiers
Statistics New Zealand and the Australian Bureau of Statistics • Statistics New Zealand • Census to the Post-enumeration survey (PES) • Linking the longitudinal census • Australian Bureau of Statistics • Linking projects using name and address • Census data enhancement project
Thank you for listening • Questions • laura.o’sullivan@stats.govt.nz