Use of Active Learning for Selective Annotation of Training Data in a Supervised Classification System for Digitized Histology. Scott Doyle (1), Michael Feldman (2), John Tomaszewski (2), Anant Madabhushi (1). (1) Department of Biomedical Engineering, Rutgers, The State University of New Jersey; (2) Department of Surgical Pathology, University of Pennsylvania. http://lcib.rutgers.edu
Outline • Background • Digital Prostate Histopathology • Supervised Classification • Active Learning • Methodology • Active Learning • Data Description • Experimental Setup • Experimental Results • Concluding Remarks
Prostate Cancer Detection • ~1 million biopsies per year in USA • 10-12 tissue samples per biopsy • 80% benign diagnosis • Large amount of data to analyze
Computer-Aided Diagnosis • Supervised classification system • Identifies regions of interest / suspicion • Quantitative • Automated • Reduces variability • Reference: Doyle, S., Feldman, M., Tomaszewski, J., Madabhushi, A. “A Hierarchical Computer-aided Classification Scheme for Automated Detection of Prostatic Adenocarcinoma from Digitized Histology,” APIII 2006
Supervised Classification • Expert segmentation for training • Histopathology: Expensive, time-consuming to annotate • Cost per training sample is high
Supervised Classification • Random training inefficient • Possible redundancy with existing training • No guarantee of improved accuracy
Solution: Active Learning • Choose training samples intelligently, not randomly • Increased accuracy per training sample • Forced choice of training, maximized accuracy • Useful where: • Large amount of unlabeled data • Annotations are expensive • Ideally suited for histopathology data
Active Learning • Classifier Performance • [Figure: accuracy vs. number of training samples, with curves for Active Learning and Random Learning]
Previous Work • Liu [2004], Vogiatzis and Tsapatsoulis [2006] • Gene microarray data • Yao et al. [2008] • Content-based image retrieval • Little work done in histopathology with Active Learning
Active Learning Methodology • [Flowchart] Labeled training data (obtained from pathologist) • Build classifier • Classify unlabeled training data • Outcomes: Cancer, Non-cancer, or Uncertain classification
Active Learning Methodology • [Flowchart] Uncertain classification: informative samples • Identify informative regions • Obtain expert labels • Combine with original set • Certain classification: uninformative • Eliminate, labeling these adds no information
Active Learning Methodology • [Flowchart] New training set • Generate new classifier
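One round of the loop described on the methodology slides could be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names, the `margin` parameter, and the idea of measuring uncertainty as distance of a cancer score from 0.5 are all assumptions. The slides do not state how uncertainty is quantified.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
Labeled = Tuple[Sample, int]          # (feature vector, 1 = cancer / 0 = non-cancer)
Model = Callable[[Sample], float]     # returns a cancer score in [0, 1]


def active_learning_round(
    labeled: List[Labeled],
    unlabeled: List[Sample],
    train: Callable[[List[Labeled]], Model],   # hypothetical training routine
    ask_expert: Callable[[Sample], int],       # hypothetical pathologist query
    margin: float = 0.2,                       # assumed uncertainty band around 0.5
) -> Tuple[Model, List[Labeled]]:
    # Build a classifier from the initial expert-labeled data.
    model = train(labeled)

    # Classify the unlabeled pool; scores near 0.5 count as uncertain,
    # and therefore informative. Confident samples are dropped because
    # labeling them would add little new information.
    informative = [x for x in unlabeled if abs(model(x) - 0.5) <= margin]

    # Obtain expert labels only for the informative samples.
    newly_labeled = [(x, ask_expert(x)) for x in informative]

    # Combine with the original set and generate a new classifier.
    new_training_set = labeled + newly_labeled
    return train(new_training_set), new_training_set
```

The expert is queried only for samples inside the uncertainty band, which is where the annotation savings come from.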
Feature Extraction • [Figure: original image with cancer region, and the corresponding feature images]
Classification • Feature images • C4.5 Decision Tree • “Random Forest” [Breiman, 2001] • Majority voting determines classification • Reference: Doyle, S., Madabhushi, A., Feldman, M., Tomaszewski, J.: A Boosting Cascade for Automated Detection of Prostate Cancer from Digitized Histology, MICCAI, Lecture Notes in Computer Science, Vol. 4191, pp. 504-511, 2006.
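The majority-voting step can be illustrated with a short sketch. It assumes each tree in the ensemble emits a class label (1 = cancer, 0 = non-cancer). The function name and the returned vote fraction are illustrative additions, not something the slides specify.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def majority_vote(tree_votes: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Return the winning label and the fraction of trees that voted for it."""
    if not tree_votes:
        raise ValueError("at least one vote is required")
    counts = Counter(tree_votes)
    # On a tie, most_common keeps the label that was counted first.
    label, n_votes = counts.most_common(1)[0]
    return label, n_votes / len(tree_votes)


# Example: three trees, two of which vote "cancer" (1).
label, fraction = majority_vote([1, 0, 1])
```

A vote fraction close to 0.5 is one plausible way to flag an uncertain classification, although the slides do not say that this is the measure used.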
Image Data Description • 27 H&E stained digital biopsy samples • Data breakdown: • Initial Training Set • Unlabeled Training Set • Testing Set • Active Learning drawn from Unlabeled Training • Groups rotated so all images are tested
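The group rotation can be sketched as below. The slides give the number of images (27) but not the group sizes, so splitting into three equal groups is an assumption, as are the function and role names.

```python
from typing import Dict, Iterator, List, Sequence


def rotate_groups(image_ids: Sequence[int]) -> Iterator[Dict[str, List[int]]]:
    """Yield three role assignments so every image is tested exactly once."""
    roles = ("initial_training", "unlabeled_training", "testing")
    # Assumed: three roughly equal groups, assigned by striding over the ids.
    groups = [list(image_ids[i::3]) for i in range(3)]
    for shift in range(3):
        # Each rotation moves every group to the next role.
        yield {role: groups[(shift + k) % 3] for k, role in enumerate(roles)}


folds = list(rotate_groups(list(range(27))))
```

Across the three rotations each group serves once as the testing set, so all 27 images receive a test-time classification.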
Classification • Three training groups evaluated: • Initial set: Initial Training • Active Learning set: Initial Training + Active Learning • Random Learning set: Initial Training + Random Learning
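The three training groups could be assembled as in the sketch below. It assumes the random set adds the same number of samples as the active set so that the comparison is matched. The slides do not state this, and all names here are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def build_training_sets(
    initial: Sequence[int],
    active_picks: Sequence[int],      # samples chosen by active learning
    unlabeled_pool: Sequence[int],    # pool that both strategies draw from
    seed: int = 0,
) -> Dict[str, List[int]]:
    rng = random.Random(seed)
    # Assumed: the random baseline draws as many samples as active learning did.
    random_picks = rng.sample(list(unlabeled_pool), len(active_picks))
    return {
        "initial": list(initial),
        "active": list(initial) + list(active_picks),
        "random": list(initial) + random_picks,
    }


sets = build_training_sets([1, 2], [10, 11], [10, 11, 12, 13, 14])
```

Matching the number of added samples means any accuracy difference between the two enlarged sets reflects which samples were chosen, not how many.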
Results: Qualitative • [Figures: four slides comparing classification output from Random Learning and Active Learning; two of them also show the original image]
Concluding Remarks • Maximize classification accuracy by choosing training intelligently • Efficiently obtain annotations • Make the most use of “training budget” • Build Active Learning into clinical applications • Online training correction / modification • User feedback
Acknowledgements • The Coulter Foundation (WHCF 4-29368) • New Jersey Commission on Cancer Research • The National Cancer Institute (R21CA127186-01, R03CA128081-01) • The US Department of Defense (427327) • The Society for Medical Imaging and Informatics