
Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Laura O’Sullivan, Statistics New Zealand, laura.o’sullivan@stats.govt.nz. IAOS Vietnam, October 2014.


Presentation Transcript


  1. Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI) • Laura O’Sullivan • Statistics New Zealand • laura.o’sullivan@stats.govt.nz • IAOS Vietnam, October 2014

  2. Outline • The Integrated Data Infrastructure (IDI) • Terminology • IDI linking • Near-exact and non-exact • Selecting cut-offs • Quality • Clerical review • Linking at Statistics New Zealand and at the Australian Bureau of Statistics

  3. Integrated Data Infrastructure (IDI) • Education • Student loans & allowances • Benefits • Migration & movements • Person-centred data • Tax • Business data • Justice • Families & households • Health & safety

  4. Terminology • Data integration (aka Record linkage) • Deterministic linking • Probabilistic linking (Fellegi-Sunter theory) • Weights • Represent the probability that two records are from the same person
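The weights on this slide can be illustrated with a minimal sketch. In the Fellegi-Sunter framework, field weights are usually computed as log-likelihood ratios from an m-probability (the field agrees given a true match) and a u-probability (the field agrees given a non-match), so a higher total weight means the pair is more likely to be the same person. The field names and the m and u values below are hypothetical and are not taken from the IDI.

```python
import math

def field_weights(m: float, u: float) -> tuple[float, float]:
    """Return (agreement, disagreement) weights in bits for one field.

    m: probability the field agrees when the records are a true match.
    u: probability the field agrees when the records are not a match.
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree

# Hypothetical (m, u) values, chosen only for illustration.
FIELDS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.98, 0.001),
}

def pair_weight(agreement: dict[str, bool]) -> float:
    """Sum the field weights for one candidate record pair."""
    total = 0.0
    for name, (m, u) in FIELDS.items():
        agree_w, disagree_w = field_weights(m, u)
        total += agree_w if agreement[name] else disagree_w
    return total

full = pair_weight({"first_name": True, "last_name": True, "date_of_birth": True})
partial = pair_weight({"first_name": True, "last_name": False, "date_of_birth": True})
print(f"all fields agree: {full:.2f}, last name differs: {partial:.2f}")
```

A pair that agrees on every field gets the highest total weight; each disagreement pulls the total down by that field's disagreement weight.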

  5. Cut-offs
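As a rough illustration of how cut-offs are applied to pair weights, the sketch below uses the classic Fellegi-Sunter arrangement of a lower and an upper cut-off, with the band between them sent to clerical review. The function name and the cut-off values are hypothetical; the slide does not state the values used in the IDI.

```python
def classify(weight: float, lower: float, upper: float) -> str:
    """Classify a candidate pair by its total weight.

    Pairs at or above `upper` are accepted as links, pairs below `lower`
    are rejected, and pairs in between are sent to clerical review.
    """
    if lower > upper:
        raise ValueError("lower cut-off must not exceed upper cut-off")
    if weight >= upper:
        return "link"
    if weight >= lower:
        return "clerical review"
    return "non-link"

# Hypothetical cut-offs and weights, for illustration only.
for w in (22.4, 11.0, -3.5):
    print(w, classify(w, lower=5.0, upper=15.0))
```

Setting `lower` equal to `upper` collapses this to a single cut-off with no review band.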

  6. Quality • True matches • Non matches • Linked • Unlinked

  7. Near-exact and non-exact • First name and Last name agreement • Date of birth agreement

  8. Selecting the cut-off

  9. Quality in the IDI • False positive rates • Sample from non-exact links • Assume near-exact links are true matches • Use proportional sampling • Non-exact rates • Monitoring
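The false positive estimate outlined on this slide can be sketched as follows, assuming the non-exact links are split into strata (for example, weight bands), a clerical sample is allocated to each stratum in proportion to its size, and near-exact links contribute no false positives. The stratum definition, class name, and all counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    links: int            # non-exact links in this stratum
    sampled: int          # links sent to clerical review
    false_positives: int  # reviewed links judged to be wrong

def proportional_allocation(sizes: list[int], total_sample: int) -> list[int]:
    """Split a sample across strata in proportion to stratum size."""
    total = sum(sizes)
    return [round(total_sample * n / total) for n in sizes]

def estimated_false_positive_rate(strata: list[Stratum], near_exact_links: int) -> float:
    """Estimate the false positive rate over all links.

    Near-exact links are assumed to be true matches, so only the
    non-exact strata contribute estimated false positives.
    """
    estimated_false = 0.0
    for s in strata:
        if s.sampled <= 0:
            raise ValueError("every stratum needs at least one sampled link")
        estimated_false += s.links * s.false_positives / s.sampled
    total_links = near_exact_links + sum(s.links for s in strata)
    return estimated_false / total_links

# Hypothetical counts, for illustration only.
sizes = [1000, 3000]
samples = proportional_allocation(sizes, total_sample=400)
strata = [Stratum(1000, samples[0], 5), Stratum(3000, samples[1], 30)]
rate = estimated_false_positive_rate(strata, near_exact_links=6000)
print(f"estimated false positive rate: {rate:.3%}")
```

Dividing the estimated false positives by the non-exact links alone, rather than by all links, would give a rate for the non-exact group only.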

  10. Clerical review • A link with two first names matching and different last name • A link with unique identifiers and missing name information in one dataset • A link with missing name information and without unique identifiers

  11. Statistics New Zealand and the Australian Bureau of Statistics • Statistics New Zealand • Census to the Post-enumeration survey (PES) • Linking the longitudinal census • Australian Bureau of Statistics • Linking projects using name and address • Census data enhancement project

  12. Thank you for listening • Questions • laura.o’sullivan@stats.govt.nz
