
Learning Classifiers for Computer Aided Diagnosis Using Local Correlations

Learning Classifiers for Computer Aided Diagnosis Using Local Correlations. Glenn Fung, Computer-Aided Diagnosis and Therapy Siemens Medical Solutions, Inc. Collaborators: Volkan Vural, Jennifer Dy [Northeastern University] Murat Dundar, Balaji Krishnapuram, Bharat Rao [Siemens]



Presentation Transcript


  1. Learning Classifiers for Computer Aided Diagnosis Using Local Correlations Glenn Fung, Computer-Aided Diagnosis and Therapy Siemens Medical Solutions, Inc. Collaborators: Volkan Vural, Jennifer Dy [Northeastern University] Murat Dundar, Balaji Krishnapuram, Bharat Rao [Siemens] Feb 13, 2008

  2. Outline • Brief overview of CAD systems • Assumptions in traditional classifier design are often not valid in CAD problems • Convex algorithms for Multiple Instance Learning (MIL) • Bayesian algorithms for batch-wise classification • Faster, approximate algorithms via mathematical programming • Summary / Conclusions

  3. Imaging Data: Growing Possibilities, Growing Challenges • 1D*: EKG (*signal acquired in time) • 2D: X-ray, Mammo, Pap... • 2D+Time: Echo • 3D: CT, MRI, PET... • 3D+Time: 4D Cardiac US/CT, Gated PET/CT, Dynamic MRI...

  4. Computer-Aided Intelligent Imaging Interpretation: The Goal • For the computer to “see” (or do) what medical experts see (or do) • To automate routine, mind-numbing, and time-consuming tasks • To improve consistency (by reducing intra- and inter-expert variability)

  5. Computer-Aided Intelligent Imaging Interpretation: The Goal • For the computer to “see” what doctors may miss • To improve sensitivity for disease detection and diagnosis • To perform quantitative assessment not achievable by “eyeballing” or “guesstimating” [Figure: receiver operating characteristic (ROC) curve, plotting true positive rate (= sensitivity) against false positive rate (= 1 - specificity); worked example: Sensitivity = 3/5 = 60%, Specificity = 3/4 = 75%]
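The sensitivity and specificity figures on the slide can be reproduced in a few lines; the counts (tp=3, fn=2, tn=3, fp=1) are taken directly from the slide's worked example:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity (true positive rate) and specificity."""
    sensitivity = tp / (tp + fn)  # fraction of actual positives detected
    specificity = tn / (tn + fp)  # fraction of actual negatives correctly rejected
    return sensitivity, specificity

# Values from the slide: 3 of 5 positives detected, 3 of 4 negatives rejected
sens, spec = sensitivity_specificity(tp=3, fn=2, tn=3, fp=1)
print(sens, spec)  # 0.6 0.75
```

Sweeping the classifier's decision threshold and recording (1 - specificity, sensitivity) pairs traces out the ROC curve shown on the slide.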

  6. Computer-Aided Intelligent Imaging Interpretation, Basic Tools and Approaches: Segmentation • “Segmentation is the partition of a digital image into multiple regions (sets of pixels), according to some criterion.” - wikipedia.org • At the low level, the criterion can be uniformity, determined according to pixel intensity, texture (repetitive patterns), etc. • At a semantic level, the criterion can be object(s) vs. background • In medical imaging, segmentation usually refers to the delineation of different tissues or organs
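As a minimal illustration of the low-level uniformity criterion mentioned above, here is a sketch of intensity thresholding; the image values and threshold are made up for the example:

```python
import numpy as np

def threshold_segment(image, threshold):
    """Partition an image into two regions (foreground vs. background)
    using pixel intensity as the uniformity criterion."""
    return (image >= threshold).astype(np.uint8)

# Tiny illustrative image: two bright pixels, two dark pixels
image = np.array([[10, 200],
                  [180, 30]])
mask = threshold_segment(image, threshold=128)
# mask marks the two high-intensity pixels as foreground
```

Real medical-image segmentation adds texture, shape, and anatomical priors on top of this, but the partition-by-criterion idea is the same.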

  7. Computer-Aided Intelligent Imaging Interpretation, Basic Tools and Approaches: Detection • Detection is the process of finding one or more objects or regions of interest. In medical imaging, detection of abnormalities is often a primary goal. Examples include the detection of lung nodules, colon polyps, or breast lesions, all of which can be precursors to cancer; or the detection of abnormalities of the brain (e.g., Alzheimer's disease) or pathological deformation of the heart (e.g., ventricular enlargement).

  8. Computer-Aided Intelligent Imaging Interpretation, Basic Tools and Approaches: Classification • Classification is the separation of objects into different classes. In medical imaging, classification is often performed on a tissue or organ to distinguish between its healthy and diseased state, or between different stages of a disease. A classifier is typically trained on a training set, where one or more experts have assigned labels to a set of objects.
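To make the train-then-classify setup concrete, here is a deliberately simple sketch of a classifier fit to an expert-labeled set; a nearest-centroid rule stands in for whatever classifier a real CAD system would use, and all feature values are illustrative:

```python
import numpy as np

def train_centroids(X, y):
    """Fit one centroid per class from an expert-labeled training set."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def classify(x, centroids):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Toy features: healthy tissue (label 0) near 0, diseased tissue (label 1) near 1
X = np.array([[0.1, 0.2], [0.0, 0.1], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])
model = train_centroids(X, y)
print(classify(np.array([0.95, 0.9]), model))  # 1
```

The slides that follow argue that the standard assumptions behind this kind of training (i.i.d. samples, per-sample accuracy) break down in CAD.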

  9. Computer-Aided Intelligent Imaging Interpretation: Challenges • More and more data is available • It is the prediction and early detection of diseases that saves the most lives; however, “early” usually means more subtle signs and weaker signals in the images. Doctors often use a complex set of features that are hard to formulate in computational form • If doctors miss them, who will teach the computer? • How do we know that we are doing better, if doctors do not agree among themselves? • Regulatory challenges

  10. CAD Algorithms

  11. CAD Workflow: Core Tasks • Collect individual patient's data • Feature extraction: from images (low-level image processing, segmentation & quantification, image registration), from free text, from omics data • Modeling / candidate generation: classification (for candidate pruning), predictive modeling, knowledge-based modeling, model optimization • Inference: causal probabilistic inference, evidential inference, temporal reasoning, fusion & classification; combine info from multiple sources • Decision support for physician

  12. General Detection Examples • Chest CT (Vol 1, Time 1) → Detect Nodules → Results • Colon CT → Detect Polyps → Results • Chest CT → Detect Emboli → Results

  13. Lung CAD

  14. Motivation • Lung cancer is the most commonly diagnosed cancer worldwide, accounting for 1.2 million new cases annually. It is an exceptionally deadly disease: 6 out of 10 people die within one year of being diagnosed • The expected 5-year survival rate for all patients with a diagnosis of lung cancer is merely 15% • In the United States, lung cancer is the leading cause of cancer death for both men and women, causes more deaths than the next three most common cancers combined, and costs $9.6 billion to treat annually • However, lung cancer prognosis varies greatly depending on how early the disease is diagnosed; as with all cancers, early detection provides the best prognosis

  15. The need for lung CAD • Every pulmonary nodule, independent of size and location, may be malignant and needs to be looked at (20-50% of resected nodules are malignant) • The smaller the nodule, the better the prognosis after nodule resection with respect to 5-year survival rate • There is a need for a screening method, as is already available for mammography

  16. Lung CAD: Introduction • CAD in plain words: • Find nodules in a large volume data set - solitary or attached to anatomical structures • Segment nodules correctly - remove structures like vessels, bronchi, and pleura consistently and anatomically correctly • Quantify nodules - volume, calcification, morphology, localization • Classify nodules as benign or malignant

  17. Detecting Lung Cancer is hard: Part of a Single CT Study of the Lung

  18. Where is the nodule?

  19. Where is the lung cancer?

  20. Where is the lung cancer?

  21. Where is the lung cancer?

  22. Computer-Aided Detection • The automatic detection scheme acts as a second reader

  23. CAD Viewing Modes • Fly around: interactive visualization of the nodule • Even fly-around movies are possible...

  24. Colon CAD

  25. Motivation • Colorectal cancer is the 3rd most commonly diagnosed cancer in the USA: - 135,000 new cases forecast for 2001 - 48,000 deaths forecast in 2001 - 95% 5-year mortality rate for patients whose colorectal cancer has spread to other body parts - 10% 5-year mortality rate if treated at an early stage • Source: American Cancer Society

  26. CT Colonography: An Exciting Opportunity • Invasive colonoscopy remains the gold standard • CT colonography is a promising non-invasive method: - 0.8 mm slices of the abdomen possible in a 9 sec breath-hold with a 16-slice CT - CT has been shown capable of visualizing polyps down to 6 mm - The CT exam is more acceptable and comfortable for patients

  27. Colon CAD Summary • Pipeline: CT Volume → Colon Segmentation (pre-processing) → Pre-processed Volume → Polyp Candidate Generation → Candidate List → Feature Extraction → Features for Candidate List → Pruning/Filtering → Final List • Candidate generation GOAL: high sensitivity (low specificity is acceptable) • Pruning/filtering GOAL: high sensitivity, high specificity

  28. Detection missed by physician • Example of a located polyp, shown in endo-view (bottom right). This polyp was prospectively missed by the physician.

  29. General paradigm for CAD systems • Image → Candidate Generation → Candidates → Feature Extraction → Numerical attributes for each candidate → Classification → Final marks on image

  30. Properties of the data used for designing classifiers for CAD systems • The training data is highly unbalanced • There is a form of stochastic dependence among the labeling errors of a group of candidates that are close to a radiologist mark • The features used to describe spatially close samples are highly correlated • The candidate generation (CG) algorithm tends to have varying levels of sensitivity to different types of structures • Some training images tend to contain far more false-positive candidates than the rest of the training dataset

  31. Shortcomings of standard classification algorithms • They tend to underestimate the minority class when problems are very unbalanced • They assume that the training examples or instances are drawn identically and independently from an underlying unknown distribution • They assume that the appropriate measure for evaluating the classifiers is based only on the accuracy of the system on a per-lesion basis • Correct classification of every candidate instance becomes the main goal, instead of the ability to detect at least one candidate that points to each malignant lesion
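One standard remedy for the first shortcoming (underestimating the minority class) is cost-sensitive reweighting. The sketch below uses inverse-class-frequency weights, a common choice offered as illustration, not necessarily the scheme the authors use; the candidate counts are made up:

```python
import numpy as np

def balanced_weights(y):
    """Per-sample weights inversely proportional to class frequency,
    so a loss weighted by them does not underestimate the minority class."""
    classes, counts = np.unique(y, return_counts=True)
    weight_per_class = {c: len(y) / (len(classes) * n)
                        for c, n in zip(classes, counts)}
    return np.array([weight_per_class[label] for label in y])

# Typical CAD imbalance in miniature: 1 true lesion candidate, 9 false positives
y = np.array([1] + [0] * 9)
w = balanced_weights(y)
# the lone positive gets weight 10/(2*1) = 5.0, each negative 10/(2*9)
```

Any classifier whose training loss accepts per-sample weights can consume `w` directly.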

  32. CAD: Correlations among candidate ROI

  33. Hierarchical Correlation Among Samples • Correlations among patients from the same hospital: scanner type, patient preparation, geographical location, etc. • Correlations among samples from the same patient: samples pointing to the same structure, samples from different orientations, image characteristics (e.g., contrast/artifacts/noise)

  34. Initial Idea: Additive Random Effect Models • The classification is treated as iid, but only if given both: • Fixed effects (unique to each sample) • Random effects (shared among samples) • Simple additive model to explain the correlations: • P(y_i | x_i, w, r_i, v) = 1 / (1 + exp(-wᵀx_i - vᵀr_i)) • P(y_i | x_i, w, r_i) = ∫ P(y_i | x_i, w, r_i, v) p(v|D) dv • Sharing vᵀr_i among many samples → correlated predictions • …But only small improvements in real-life applications
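The additive model above can be sketched directly: for a single draw of the random-effect coefficients v, the prediction is sigmoid(wᵀx_i + vᵀr_i). A full implementation would also marginalize over p(v|D), which is what couples the predictions; here all parameter and feature values are illustrative:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def predict_with_random_effect(x, w, r, v):
    """P(y=1 | x, w, r, v) = sigmoid(w^T x + v^T r):
    fixed effect w^T x is unique to the sample; random effect v^T r is
    shared by all samples with the same r (e.g., same patient)."""
    return sigmoid(w @ x + v @ r)

w = np.array([1.0, -0.5])        # fixed-effect weights (illustrative)
v = np.array([0.3])              # random-effect weights (illustrative)
r_patient = np.array([1.0])      # covariate shared by one patient's candidates

p1 = predict_with_random_effect(np.array([2.0, 1.0]), w, r_patient, v)
p2 = predict_with_random_effect(np.array([0.5, 0.5]), w, r_patient, v)
# both predictions share the v^T r term, so they shift up or down together
```

Averaging such predictions over samples of v from its posterior yields the integral on the slide.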

  35. Candidate-Specific Random Effects Model: Polyps [Figure: ROC curves, sensitivity vs. 1 - specificity]

  36. CAD algorithms: Other examples of correlations between samples • Multiple (correlated) views: one detection is sufficient • Systemic treatment of diseases: e.g., detecting one pulmonary embolism (PE) is sufficient • Modeling the data acquisition mechanism • Errors in labeling the training set

  37. The Multiple Instance Learning Problem (NIPS 2006): Motivation • A bag of candidates: 4 candidates pointing to the same polyp • Only ONE candidate needs to be correctly classified!

  38. The Multiple Instance Learning Problem (NIPS 2006) • A bag is a collection of many instances (samples) • The class label is provided for bags, not instances • Positive bag has at least one positive instance in it • Examples of “bag” definition for CAD applications: • Bag=samples from multiple views, for the same region • Bag=all candidates referring to same underlying structure • Bag=all candidates from a patient
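The bag-labeling semantics above ("a positive bag has at least one positive instance") reduce to a max over instance scores at prediction time; the scores and threshold below are illustrative:

```python
def bag_prediction(instance_scores, threshold=0.5):
    """MIL semantics: a bag is positive iff at least one of its
    instances scores as positive.
    instance_scores: classifier outputs for every candidate in the bag."""
    return max(instance_scores) >= threshold

# 4 candidates pointing to the same polyp: one confident detection suffices
scores = [0.1, 0.2, 0.9, 0.3]
print(bag_prediction(scores))  # True
```

During training, the same rule means the loss should only require one instance per positive bag to be classified correctly, which is what MIL algorithms such as CH-MIL optimize.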

  39. CH-MIL Algorithm: 2-D illustration

  40. CH-MIL Algorithm for Fisher's Discriminant • Easy implementation via alternating optimization • Scales well to very large datasets • Convex problem with a unique optimum

  41. Lung CAD: Computed Tomography Lung Nodules

  42. CH-MIL: Pulmonary Embolisms

  43. CH-MIL: Polyps in Colon

  44. Classifying a Correlated Batch of Samples (ECML 2006): Motivation • The candidates that belong to the same patient's medical images are highly correlated • There is no correlation between candidates from different patients • The level of correlation is a function of the pairwise distance between candidates • The samples (candidates) are collected naturally in batches • All the samples that belong to the same image constitute a batch

  45. Classifying a Correlated Batch of Samples (ECML 2006) • Let classification of individual samples x_i be based on u_i • E.g., linear: u_i = wᵀx_i ; or kernel predictor: u_i = Σ_{j=1..N} α_j k(x_i, x_j) • Instead of basing the classification on u_i, we base it on an unobserved (latent) random variable z_i • Prior: even before observing any features x_i (and thus before u_i), the z_i are known to be correlated a priori: p(z) = N(z | 0, Σ) • E.g., due to spatial adjacency: Σ = exp(-λD), where the matrix D holds the pairwise distances between samples

  46. Classifying a Correlated Batch of Samples • Prior: even before observing any features x_i (and thus before u_i), the z_i are known to be correlated a priori: p(z) = N(z | 0, Σ) • Likelihood: let us claim that u_i is really a noisy observation of the random variable z_i: p(u_i | z_i) = N(u_i | z_i, σ²) • Posterior: remains correlated, even after observing the features x_i: p(z | u) = N(z | (σ²Σ⁻¹ + I)⁻¹u, (Σ⁻¹ + σ⁻²I)⁻¹) • Intuition: E[z_i] = Σ_{j=1..N} A_ij u_j, with A = (σ²Σ⁻¹ + I)⁻¹
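The posterior-mean smoothing E[z] = (σ²Σ⁻¹ + I)⁻¹u can be checked numerically. The pairwise distances, decay rate, and noise variance below are made-up values for three candidates in one image:

```python
import numpy as np

# Hypothetical pairwise distances: candidates 0 and 1 are close, 2 is far away
D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 5.0],
              [5.0, 5.0, 0.0]])
lam, sigma2 = 1.0, 0.5           # assumed decay rate and noise variance
Sigma = np.exp(-lam * D)         # prior covariance: correlation decays with distance
u = np.array([2.0, -0.5, 1.0])   # raw per-candidate predictor outputs

# Posterior mean E[z] = (sigma^2 * Sigma^{-1} + I)^{-1} u :
# each latent score borrows strength from nearby, correlated candidates
A = np.linalg.inv(sigma2 * np.linalg.inv(Sigma) + np.eye(3))
z_mean = A @ u
# candidates 0 and 1 are pulled toward each other; the isolated
# candidate 2 is only shrunk toward the prior mean of 0
```

Classifying on `z_mean` instead of `u` is what makes the batch prediction respect the spatial correlations.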

  47. Related Work • Conditional Random Fields and Maximum Margin Markov Networks used for Natural Language Processing • Computationally expensive • Multiple Instance Learning (MIL)

  48. Support Vector Machines: Maximizing the Margin between Bounding Planes [Figure: two classes A+ and A- separated by bounding planes; the support vectors lie on the planes]

  49. Algebra of the Classification Problem: 2-Category Linearly Separable Case • Given m points in n-dimensional space, represented by an m-by-n matrix A • Membership of each point in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 and -1 entries • Separate by the two bounding planes xᵀw = γ + 1 and xᵀw = γ - 1 • More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones
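The condition D(Aw - eγ) ≥ e can be checked directly for a given (w, γ); the 1-D toy data below is illustrative:

```python
import numpy as np

def separates(A, D, w, gamma):
    """Check the bounding-plane condition D(Aw - e*gamma) >= e:
    class +1 points satisfy x^T w >= gamma + 1,
    class -1 points satisfy x^T w <= gamma - 1."""
    e = np.ones(A.shape[0])
    return np.all(D @ (A @ w - e * gamma) >= e)

# Toy 1-D example: positives at x = 3, 4 and negatives at x = -3, -4
A = np.array([[3.0], [4.0], [-3.0], [-4.0]])
D = np.diag([1.0, 1.0, -1.0, -1.0])
print(separates(A, D, w=np.array([1.0]), gamma=0.0))  # True
```

An SVM then chooses, among all (w, γ) satisfying this constraint, the one that maximizes the margin between the two bounding planes.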
