Machine Learning Classification for Document Review
Tom Barnett, Svetlana Godjevac, Caroline Privault, Jean-Michel Renders, John Schneider, Robert Wickstrom
Time pressure. Information growth.
Basic assumptions
• Keyword search addresses recall
• Attorney review addresses precision
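For reference, recall is the share of truly responsive documents that were retrieved, and precision is the share of retrieved documents that are truly responsive. A minimal sketch of both measures follows; the function name and the sample document IDs are hypothetical and do not come from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 5 truly responsive, 3 in common.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6})
```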
Tacit Assumption • Attorney review is superior to any other form of review
Problems With Standard Process
• Scalability
• Speed
• Cost
• Quality: lower recall, lower consistency
CategoriX
Combines two facets of document analysis:
• Clustering (document identification based on similarity of text content)
• Manual categorization
Clustering Technology [figure: document clusters labeled red, green, blue]
Attorney Categorization [figure: documents labeled non-responsive or responsive]
CategoriX Process Test
CategoriX – Training [figure labels: training set; red, green, blue clusters; responsive / non-responsive; CategoriX; model]
CategoriX – Model application [figure labels: test set; CategoriX model; score; responsive / non-responsive. Example scores shown: 0.273, 0.987, 0.515, 0.641, 0.358, 0.735, 0.886, 0.106, 0.793, 0.074, 0.672, 0.439]
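One way to turn per-document scores like those on this slide into responsive / non-responsive calls is a simple cutoff. The sketch below assumes a threshold of 0.5 and assumes that higher scores mean "more likely responsive"; the slides state neither, so both are illustrative choices rather than the CategoriX method.

```python
THRESHOLD = 0.5  # assumed cutoff; the slides do not state one

# Example scores as they appear on the slide.
scores = [0.273, 0.987, 0.515, 0.641, 0.358, 0.735,
          0.886, 0.106, 0.793, 0.074, 0.672, 0.439]


def label(score, threshold=THRESHOLD):
    """Map a score to a label, assuming higher means more likely responsive."""
    return "responsive" if score >= threshold else "non-responsive"


labels = [label(s) for s in scores]
responsive_count = labels.count("responsive")
```

Raising the threshold would generally trade recall for precision, and lowering it would do the reverse.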
Experiment 1
Data:
• Population: 5000 emails
• 5 Review Groups: A1, A2, A3, A4, A5
• Training Sets: 1000 emails
• Test Sets: Population minus the training set
Goals:
• CategoriX retrieval for different review groups
• Comparison between manual and automated classification
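The data split described above (1000 training emails, with the rest of the 5000-email population as the test set) can be sketched as follows. Random sampling with a fixed seed is an assumption; the slides do not say how the training emails were chosen, and the integer IDs are placeholders.

```python
import random


def split_population(doc_ids, training_size, seed=0):
    """Split doc_ids into a training sample and the remaining test set."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    training = set(rng.sample(doc_ids, training_size))
    # Test set = population minus the training set, as on the slide.
    test = [d for d in doc_ids if d not in training]
    return sorted(training), test


population = list(range(5000))  # placeholder IDs for 5000 emails
training_set, test_set = split_population(population, 1000)
```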
Attorney Review Responsiveness Rates: A2 marked 42% more documents responsive than A1.
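A "42% more" figure of this kind is a relative difference between two counts. The sketch below shows the arithmetic; the counts are hypothetical, chosen only so the result comes out to 42%, and are not the actual numbers from the experiment.

```python
def relative_increase(baseline, other):
    """Return how much larger `other` is than `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (other - baseline) / baseline


# Hypothetical counts of documents marked responsive by each group.
a1_responsive = 200
a2_responsive = 284
increase = relative_increase(a1_responsive, a2_responsive)
```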
CategoriX Retrieval [chart values: 0.76, 0.83, 0.83, 0.80, 0.84]
Attorney Review vs. CategoriX [chart series: Attorney to Attorney; CategoriX to Attorney]. A5 = gold standard.
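A comparison against a gold-standard reviewer can be sketched with a simple agreement rate, the fraction of documents on which two sets of calls match. The slides do not state which measure was plotted, so percent agreement is an assumption here, and all label lists below are hypothetical.

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of documents on which two label lists agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        return 0.0
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical responsiveness calls (True = responsive) on five documents.
gold = [True, True, False, False, True]      # stands in for group A5
attorney = [True, False, False, True, True]  # another attorney group
model = [True, True, False, True, True]      # automated classification

attorney_vs_gold = agreement_rate(attorney, gold)
model_vs_gold = agreement_rate(model, gold)
```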
Results Summary
• Attorney responsiveness rates varied greatly
• CategoriX models achieved high recall and precision
• CategoriX was more consistent than attorneys
Conclusion
Our testing indicated that the combination of clustering with attorney coding (CategoriX) was at least as accurate as, and more consistent than, attorney review.