FAMILY HISTORY TECHNOLOGY WORKSHOP February 3, 2012

Improving Indexing Efficiency & Quality: Comparing A-B-Arbitrate and Peer Review • Derek Hansen, Jake Gehring, Patrick Schone, and Matthew Reid

FamilySearch

FamilySearch indexing

A-B-Arbitrate process (A-B-ARB) • [Diagram: A and B → ARB]
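The A-B-ARB flow can be sketched in a few lines. This is a minimal illustration, not FamilySearch's implementation: it assumes each indexer's work is a dict mapping field names to keyed values, that fields where A and B agree are accepted as-is, and that only disagreements go to the arbitrator. The names `fields_needing_arbitration`, `a_b_arbitrate`, and the `arbitrate` callback are hypothetical.

```python
def fields_needing_arbitration(a_entry, b_entry):
    """Return the field names where indexers A and B keyed different values."""
    fields = sorted(set(a_entry) | set(b_entry))
    return [f for f in fields if a_entry.get(f) != b_entry.get(f)]


def a_b_arbitrate(a_entry, b_entry, arbitrate):
    """Accept fields where A and B agree; ask the arbitrator about the rest.

    `arbitrate` is a hypothetical callback: (field, a_value, b_value) -> final value.
    Assumption: agreement between A and B is treated as correct without review.
    """
    final = {}
    for field in sorted(set(a_entry) | set(b_entry)):
        a_val, b_val = a_entry.get(field), b_entry.get(field)
        if a_val == b_val:
            final[field] = a_val
        else:
            final[field] = arbitrate(field, a_val, b_val)
    return final
```

Under these assumptions, the arbitrator's workload is proportional to the A-B disagreement rate, which is why agreement is a natural measure to analyze.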

The problem

Our approach • Historical Data Analysis • Field Experiment comparing quality control models

Historical data analysis
• Quality (estimated based on A-B agreement)
  • Measures difficulty more than actual quality
  • Underestimates quality, since an experienced Arbitrator reviews all A-B disagreements
  • Good at capturing differences across people, fields, and projects
• Time (calculated using keystroke-logging data)
  • Idle time is tracked separately, making actual time measurements more accurate
  • Outliers removed
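The two measures on this slide could be computed roughly as follows. This is a sketch under stated assumptions: agreement is taken to be an exact match of the two keyed values, and idle time is approximated by skipping any gap between keystrokes longer than a threshold. The function names and the 30-second default are hypothetical, not values from the study.

```python
from collections import defaultdict


def agreement_by_field(pairs):
    """Compute the A-B agreement rate per field.

    `pairs` is an iterable of (field, a_value, b_value) tuples.
    Assumption: agreement means the two values are exactly equal.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for field, a_val, b_val in pairs:
        total[field] += 1
        if a_val == b_val:
            agree[field] += 1
    return {field: agree[field] / total[field] for field in total}


def active_seconds(timestamps, idle_threshold=30.0):
    """Sum the gaps between consecutive keystrokes, skipping idle gaps.

    `timestamps` are keystroke times in seconds. Any gap longer than
    `idle_threshold` (a hypothetical cutoff) is treated as idle time.
    """
    ts = sorted(timestamps)
    return sum(later - earlier
               for earlier, later in zip(ts, ts[1:])
               if later - earlier <= idle_threshold)
```

Grouping the same computation by project or by indexer instead of by field would give the other comparisons mentioned on the slide.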

A-B Agreement by field

A-B agreement by language • 1871 Canadian Census • English Language, French Language • Given Name: 62.7%, Surname: 48.8% • Given Name: 79.8%, Surname: 66.4%

A-B agreement by experience • Birth Place: All U.S. Censuses • [Chart axes: A (novice ↔ expert), B (novice ↔ expert)]

A-B agreement by experience • Given Name: All U.S. Censuses • [Chart axes: A (novice ↔ expert), B (novice ↔ expert)]

A-B agreement by experience • Surname: All U.S. Censuses • [Chart axes: A (novice ↔ expert), B (novice ↔ expert)]

A-B agreement by experience • Gender: All U.S. Censuses • [Chart axes: A (novice ↔ expert), B (novice ↔ expert)]

A-B agreement by experience • [Chart panels: Canada - English, U.S. - English, Mexico - Spanish, Canada - French]

Time & keystroke by experience

Time & Keystroke of ARB

A new approach? (A-R-ARB) • Peer review model • Efficiency ++ • Quality ?

Peer review process (A-R-ARB) • [Diagram: A → R → ARB, with labels "Already Filled In" and "Optional?"]

Field Experiment • Develop a truth set of 2,000 images from the 1930 Census • Use historical A-B-ARB data • Create a new A-R-ARB dataset by having new indexers review and arbitrate • Compare quality & efficiency • Qualitatively identify types of errors
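The quality comparison in this design amounts to scoring each model's final output against the truth set. A minimal sketch, assuming records are stored as nested dicts of the form `{image_id: {field: value}}` and that a field counts as correct only on an exact match; the `accuracy` function and this data layout are hypothetical.

```python
def accuracy(final_records, truth_records):
    """Return the fraction of truth-set fields matched exactly by the final records.

    Both arguments map image_id -> {field: value}. A field missing from
    `final_records` counts as incorrect.
    """
    correct = 0
    total = 0
    for image_id, truth in truth_records.items():
        final = final_records.get(image_id, {})
        for field, true_value in truth.items():
            total += 1
            if final.get(field) == true_value:
                correct += 1
    return correct / total if total else 0.0
```

Calling `accuracy` once on the A-B-ARB output and once on the A-R-ARB output, with the same truth set, gives two directly comparable quality figures.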

Discussion
IMPLICATIONS • Transition users from novice to expert • Recruit foreign language indexers • Intelligent matching based on expertise (in A-B-ARB and/or A-R-ARB)
FUTURE POSSIBILITIES • Peer review by algorithms? • Initial indexing by algorithms?
