Using Ensemble Models in the Histological Examination of Tissue Abnormalities

M. Coakley, G. Crocetti, P. Dressner, W. Kellum, T. Lamin
The Michael L. Gargano 12th Annual Research Day, Friday, May 2nd, 2014

Objective
The objective of this study is to:
• investigate whether abnormalities in tissue samples can be identified automatically by applying an ensemble model to data generated by histological examination
• minimize the number of false negative cases

Introduction
• As part of breast cancer screening, if a lump is found, a fine-needle aspiration biopsy (FNAB) is performed.
• Normally the sample is analyzed visually by a pathologist, who looks for cancerous tissue with abnormal characteristics.
• This procedure is time-consuming.
• Automatic procedures exist that evaluate cytological features derived from a digital scan of breast FNAB slides.
• These procedures achieve very high accuracy, better than the manual procedure, but still produce a certain number of false negatives.
• Our goal is to reduce the false negative rate.

The Data
• Wisconsin Breast Cancer Dataset
• 569 samples classified as “normal” or “abnormal”
• 12 attributes
• Dataset split:
  • Training set: 448 samples
  • Test set: 121 samples
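A minimal sketch of the 448/121 split, assuming a simple seeded random split. The slides do not say how samples were assigned to the two sets, and a stratified split would be an equally plausible choice. The helper `split_samples` and the seed value are hypothetical, for illustration only.

```python
import random


def split_samples(samples, n_train, seed=42):
    """Randomly split samples into a training list of size n_train and a test list.

    Hypothetical helper: a plain seeded random split is assumed here.
    """
    shuffled = list(samples)               # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    return shuffled[:n_train], shuffled[n_train:]


# 569 sample indices split into 448 training and 121 test samples
sample_ids = list(range(569))
train_ids, test_ids = split_samples(sample_ids, n_train=448)
print(len(train_ids), len(test_ids))  # 448 121
```

Seeding the generator makes the split reproducible, so every model is trained and evaluated on the same partition.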

The Data (Cont.)
• Table structure (12 fields):
  • Id
  • Diagnosis (A = Abnormal / N = Normal)
  • Radius (mean of distances from center to points on the perimeter)
  • Texture (standard deviation of gray-scale values)
  • Perimeter
  • Area
  • Smoothness (local variation in radius lengths)
  • Compactness (perimeter^2 / area - 1.0)
  • Concavity (severity of concave portions of the contour)
  • Concave points (number of concave portions of the contour)
  • Symmetry
  • Fractal dimension ("coastline approximation" - 1)

Exploratory Data Analysis
• The data set was of very good quality.
• No missing values.
• Outliers were detected using z-scores: a value falling outside the interval [-4, +4] was flagged as a possible outlier.
• We detected some outliers, but further investigation ruled out errors in the data.
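The z-score rule can be sketched as follows. This is a minimal illustration, assuming the z-score is computed per variable using the sample standard deviation; the slides specify only the [-4, +4] interval. `zscore_outliers` and the toy radius values are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=4.0):
    """Return (index, value, z) for every value whose z-score falls outside [-threshold, +threshold]."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    flagged = []
    for i, x in enumerate(values):
        z = (x - mean) / sd
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged


# Toy radius values: twenty typical measurements plus one extreme value
radius = [12.0, 13.0, 14.0, 15.0, 16.0] * 4 + [60.0]
flagged = zscore_outliers(radius)
print([(i, x) for i, x, _ in flagged])  # [(20, 60.0)]
```

Flagged values are candidates for investigation rather than automatic deletions, which fits the observation that the detected outliers turned out not to be data errors.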

Exploratory Data Analysis (Cont.)
• Normality assumption: the variables are approximately normally distributed, within acceptable variation:
  • Skewness within [-2, +2]
  • Kurtosis within [-2, +2]
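A minimal sketch of the skewness and kurtosis check, assuming moment-based estimators and excess kurtosis (kurtosis minus 3, so that a normal distribution scores 0). The slides do not say which estimators were used, and statistical packages often apply small-sample corrections that give slightly different values. All function names here are hypothetical.

```python
def _central_moment(values, order):
    """Population central moment of the given order."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal distribution)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


def within_normality_limits(values, limit=2.0):
    """True if both skewness and excess kurtosis lie within [-limit, +limit]."""
    return abs(skewness(values)) <= limit and abs(excess_kurtosis(values)) <= limit


print(within_normality_limits([1, 2, 3, 4, 5]))  # True  (skewness 0, excess kurtosis -1.3)
print(within_normality_limits([1] * 9 + [10]))   # False (skewness about 2.67)
```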

Exploratory Data Analysis (Cont.)
• Normalization: to prevent variables from influencing the model merely because of their scale, we normalized the data using the min-max transformation.
• All resulting variables lie within the interval [0, 1].
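The min-max transformation rescales each variable as x' = (x - min) / (max - min). A minimal sketch, assuming min and max are taken from the values being transformed; in a train/test setting they are often taken from the training set only, and the slides do not say which was done. `min_max_normalize` and the toy values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant variable: every value maps to 0.0 (a choice made here, not from the slides)
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


print(min_max_normalize([6.0, 10.0, 14.0]))  # [0.0, 0.5, 1.0]
```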

Exploratory Data Analysis (Cont.)
• Correlation: among the highly correlated variables, we kept radius and dropped the others.
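One way to carry out this correlation-based pruning is sketched below, assuming Pearson correlation and a hypothetical cut-off of |r| > 0.9. The slides name neither the correlation measure nor a threshold, nor exactly which variables were dropped. The toy perimeter and area columns are derived from radius only to make the correlation obvious.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, keep, threshold=0.9):
    """Keep `keep` plus every column whose |r| with it is at most `threshold`."""
    reference = columns[keep]
    return {
        name: col
        for name, col in columns.items()
        if name == keep or abs(pearson(reference, col)) <= threshold
    }


radius = [1.0, 2.0, 3.0, 4.0]
columns = {
    "radius": radius,
    "perimeter": [2 * math.pi * r for r in radius],  # perfectly correlated with radius
    "area": [math.pi * r ** 2 for r in radius],      # r is about 0.98 with radius
    "texture": [5.0, 3.0, 4.0, 6.0],                 # r = 0.4 with radius
}
print(list(drop_correlated(columns, keep="radius")))  # ['radius', 'texture']
```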

Clustering
• We derived a new “cluster” variable by applying the k-means algorithm with k = 2.
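A minimal sketch of k-means with k = 2 (Lloyd's algorithm) in plain Python, assuming Euclidean distance on the normalized variables and randomly chosen initial centroids; the slides do not give the implementation or its settings. The resulting label (0 or 1, with arbitrary numbering) is then appended to each sample as the new "cluster" variable. `kmeans` and the toy points are hypothetical.

```python
import math
import random


def kmeans(points, k=2, iters=100, seed=0):
    """Lloyd's k-means: return (labels, centroids) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k points drawn without replacement
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: leave its centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Toy samples with two normalized features each
points = [(0.10, 0.10), (0.20, 0.15), (0.15, 0.20),
          (0.90, 0.85), (0.80, 0.90), (0.85, 0.80)]
labels, centroids = kmeans(points, k=2)
# Append the cluster label to each sample as the new "cluster" variable
with_cluster = [p + (lab,) for p, lab in zip(points, labels)]
print(with_cluster)
```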

Modeling
• Given the characteristics of the data, we applied two algorithms:
  • CART (with misclassification costs)
  • Logistic regression
• Confusion matrices and error rates
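A minimal sketch of the confusion matrix and error rates for this two-class problem, treating "A" (abnormal) as the positive class, so that a false negative is an abnormal sample predicted as normal. Definitions of the false negative rate vary between textbooks and tools. Here it is FN / (FN + TP), the share of abnormal samples that are missed, which may not be the definition behind the figures in these slides. The function names and toy labels are hypothetical.

```python
def confusion_matrix(actual, predicted, positive="A"):
    """Count true/false positives and negatives, with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive and p != positive:
            fn += 1  # abnormal sample classified as normal
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


def error_rates(cm):
    """Overall error plus false negative and false positive rates."""
    total = sum(cm.values())
    return {
        "overall_error": (cm["FP"] + cm["FN"]) / total,
        "false_negative_rate": cm["FN"] / (cm["FN"] + cm["TP"]),  # share of abnormal samples missed
        "false_positive_rate": cm["FP"] / (cm["FP"] + cm["TN"]),  # share of normal samples flagged
    }


actual    = ["A", "A", "A", "N", "N", "N", "N", "A"]
predicted = ["A", "A", "N", "N", "N", "N", "N", "A"]
cm = confusion_matrix(actual, predicted)
rates = error_rates(cm)
print(cm)     # {'TP': 3, 'FP': 0, 'TN': 4, 'FN': 1}
print(rates)  # {'overall_error': 0.125, 'false_negative_rate': 0.25, 'false_positive_rate': 0.0}
```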

Ensemble Model
• We leveraged the confidence measures produced by these models.
• We applied a voting scheme in which the prediction with the highest confidence wins.
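The voting rule can be sketched as follows, assuming each model returns a predicted label together with a confidence in [0, 1] for that label (for example, the class proportion in a CART leaf or the predicted probability from logistic regression), and that the two confidences are comparable. The slides do not describe how confidences were computed or how ties were broken; here a tie goes to the first model listed. `Prediction` and `vote` are hypothetical names.

```python
from typing import NamedTuple, Sequence


class Prediction(NamedTuple):
    label: str         # "A" (abnormal) or "N" (normal)
    confidence: float  # the model's confidence in `label`, assumed to lie in [0, 1]


def vote(predictions: Sequence[Prediction]) -> str:
    """Return the label of the most confident prediction (ties go to the earliest)."""
    return max(predictions, key=lambda p: p.confidence).label


# CART and logistic regression disagree; the more confident model wins
cart = Prediction("N", 0.62)
logit = Prediction("A", 0.91)
print(vote([cart, logit]))  # A
```

Applied sample by sample over the test set, for example `[vote([c, l]) for c, l in zip(cart_preds, logit_preds)]` with hypothetical per-sample prediction lists, this yields the ensemble's predictions, which can then be scored with a confusion matrix like the individual models.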

Conclusions
• The voting-based ensemble model, which combines decision trees and logistic regression, proved to be a very effective way to improve the detection of abnormal biopsy samples.
• The very low false negative rate of 1% indicates that this problem can be addressed by building high-quality classification models, and it represents an improvement over classification systems developed in the past.

References
• E. D. Pisano, L. L. Fajardo, D. J. Caudry, N. Sneige, W. J. Frable, W. A. Berg, I. Tocino, S. J. Schnitt, J. L. Connolly, C. A. Gatsonis, and B. J. McNeil, "Fine-Needle Aspiration Biopsy of Nonpalpable Breast Lesions in a Multicenter Clinical Trial," Radiology, vol. 219, no. 3, pp. 785-792, 2001.
• W. H. Wolberg, W. N. Street, and O. L. Mangasarian, "Breast Cytology Diagnosis Via Digital Image Analysis," Dept. of Surgery, University of Wisconsin, 1993.
• W. Wolberg, W. N. Street, and O. L. Mangasarian, "Importance of nuclear morphology in breast cancer prognosis," Clinical Cancer Research, vol. 5, pp. 3542-3548, 1999.
• B. Lantz, Machine Learning with R, Packt Publishing, 2013.
• UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/
• D. Larose, Discovering Knowledge in Data, Wiley, 2005.
• G. Seni and J. F. Elder, Ensemble Methods in Data Mining, Morgan & Claypool Publishers, 2009.
• J. F. Elder and S. S. Lee, "Bundling Heterogeneous Classifiers with Advisor Perceptrons," Technical Report, University of Idaho, Oct. 1997.
