
Machine Learning Classification for Document Review


Presentation Transcript


  1. Machine Learning Classification for Document Review Tom Barnett, Svetlana Godjevac, Caroline Privault, Jean-Michel Renders, John Schneider, Robert Wickstrom

  2. Time pressure. Information growth.

  3. Basic assumptions • Keyword search addresses recall • Attorney review addresses precision
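
For reference, with TP, FP, and FN counting true positives, false positives, and false negatives: recall = TP / (TP + FN), the share of truly responsive documents that are found, and precision = TP / (TP + FP), the share of retrieved documents that are truly responsive. A broad keyword search drives FN down (recall up) at the cost of more FP; attorney review of the retrieved set drives FP down (precision up).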

  4. Tacit Assumption • Attorney review is superior to any other form of review

  5. Problems With Standard Process • Scalability • Speed • Cost • Quality (lower recall, lower consistency)

  6. CategoriX Combines two facets of document analysis: • Clustering (document identification based on similarity of text content) • Manual categorization

  7. Clustering Technology [Diagram: documents grouped automatically by similarity of text content into clusters, e.g. red, green, blue]
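
A minimal sketch of similarity-based clustering, assuming scikit-learn and a made-up corpus; this illustrates the idea only, not CategoriX's actual algorithm:

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Hypothetical email texts standing in for the review population.
    docs = [
        "quarterly earnings report attached",
        "earnings call transcript and figures",
        "weekend golf outing invitation",
        "golf clubs for the weekend outing",
        "litigation hold notice for the finance team",
        "preserve all finance documents pending litigation",
    ]

    # Represent each document by TF-IDF weights over its terms.
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)

    # Group the documents into three clusters (the red/green/blue of the slide).
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print(labels)  # three groups by topical similarity, e.g. [0, 0, 1, 1, 2, 2]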

  8. Attorney Categorization [Diagram: attorneys manually code documents as responsive or non-responsive]

  9. CategoriX Process [Diagram: overview of the train-then-test process]

  10. CategoriX – Training [Diagram: a training set of attorney-coded documents, responsive and non-responsive across the clusters, is fed to CategoriX to build a model]

  11. CategoriX – Model Application [Diagram: the model assigns each test-set document a responsiveness score between 0 and 1, e.g. 0.987, 0.886, 0.793 on the responsive side versus 0.273, 0.106, 0.074 on the non-responsive side]
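
A self-contained sketch of the training and scoring steps shown in slides 10 and 11, assuming scikit-learn; the classifier, features, and example emails are illustrative assumptions, not CategoriX's internals:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical attorney-coded training emails (1 = responsive, 0 = non-responsive).
    train_docs = [
        "draft earnings restatement, do not distribute",
        "board minutes discussing the disputed contract",
        "fantasy football picks for this week",
        "cafeteria menu for monday",
    ]
    train_labels = [1, 1, 0, 0]

    # Build a model from the attorney-coded training set.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(train_docs, train_labels)

    # Apply the model: every test-set document gets a responsiveness score in [0, 1].
    test_docs = ["contract amendment attached for review", "happy hour on thursday"]
    scores = model.predict_proba(test_docs)[:, 1]  # column 1 = P(responsive)
    for doc, score in zip(test_docs, scores):
        print(f"{score:.3f}  {doc}")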

  12. Experiment 1 • Data: population of 5,000 emails; 5 review groups (A1, A2, A3, A4, A5); training sets of 1,000 emails; test sets = population minus the training set • Goals: measure CategoriX retrieval for different review groups; compare manual and automated classification
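
A sketch of that split, with placeholder data standing in for the 5,000 coded emails (all names hypothetical):

    from sklearn.model_selection import train_test_split

    # Stand-ins for the population and its attorney codes.
    emails = [f"email body {i}" for i in range(5000)]
    codes = [i % 2 for i in range(5000)]  # placeholder responsive/non-responsive labels

    # 1,000 training emails; the remaining 4,000 form the test set.
    train_docs, test_docs, train_codes, test_codes = train_test_split(
        emails, codes, train_size=1000, random_state=0
    )
    print(len(train_docs), len(test_docs))  # 1000 4000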

  13. Attorney Review Responsiveness Rates A2 marked 42% more documents responsive than A1

  14. CategoriX Retrieval [Chart: retrieval scores for the five review groups' models: 0.76, 0.83, 0.83, 0.80, 0.84]
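
Retrieval figures like these are typically recall; a minimal sketch of computing recall and precision against the attorney codes, assuming scikit-learn and made-up labels:

    from sklearn.metrics import precision_score, recall_score

    # Hypothetical attorney codes (truth) and model predictions for six test documents.
    truth = [1, 1, 1, 0, 0, 1]
    predicted = [1, 1, 0, 0, 1, 1]

    print(recall_score(truth, predicted))     # 0.75: found 3 of the 4 responsive docs
    print(precision_score(truth, predicted))  # 0.75: 3 of 4 retrieved docs were responsive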

  15. Attorney Review vs. CategoriX [Chart: pairwise agreement, Attorney-to-Attorney and CategoriX-to-Attorney, with A5 as the gold standard]
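
The deck does not name its agreement measure; one common choice is Cohen's kappa, sketched here with hypothetical codes and A5 as the gold standard:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical codes for the same ten documents (1 = responsive).
    a5_gold = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # gold-standard review group
    a1_codes = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # another review group's coding
    cx_codes = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # CategoriX predictions

    print(cohen_kappa_score(a5_gold, a1_codes))  # attorney-to-attorney agreement
    print(cohen_kappa_score(a5_gold, cx_codes))  # CategoriX-to-attorney agreement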

  16. Results Summary • Attorney responsiveness rate varied greatly • CategoriX models achieved high recall and precision • CategoriX was more consistent than attorneys

  17. Gains

  18. Conclusion Our testing indicated that the combination of clustering with attorney coding (CategoriX) was at least as accurate as, and more consistent than, attorney review.
