Computational Intelligence in Biomedical and Health Care Informatics
HCA 590 (Topics in Health Sciences)
Rohit Kate
Introduction to Machine Learning
A few slides have been adapted from Ray Mooney’s Machine Learning course at UT Austin.
Reading
• Chapter 11, Main textbook, page 192 onwards
What is Machine Learning?
[Diagram: a medical phenomenon maps Input to Output through an unknown process (???); a machine learning system maps Input to a predicted Output’]
Machine learning techniques are used to automatically predict output from input.
Wide Applications
• Predict stock prices
  • Input: Economic factors
  • Output: Stock price
• Detect fraud in credit card transactions
  • Input: Details of a transaction
  • Output: Fraud or no fraud
• Classify emails as spam or not spam
  • Input: Email text
  • Output: Spam or not spam
• Recognize spoken words
  • Input: Features from sound signals
  • Output: Spoken word
• Recognize handwritten letters
  • Input: Features from images
  • Output: Letter or number
• …
Applications in Medicine
• Medical diagnosis
  • Input: Patient symptoms, physiological data
  • Output: Disease
• Detect cancer in radiology images
  • Input: Features from images
  • Output: Cancer or no cancer
• …
How Does Machine Learning Work?
• Training: The system learns from examples of inputs paired with their corresponding outputs
• Testing: The trained system predicts outputs for novel inputs
[Diagram: training examples produce a trained machine learning system, which maps a novel input to a predicted output]
The computer (machine) learns from examples, hence the name machine learning. This contrasts with humans manually encoding knowledge into the computer.
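The training and testing phases can be sketched with one of the simplest possible learners, a 1-nearest-neighbor classifier. This is only an illustration: the two numeric features and all example values below are made up, and the names `train` and `predict` are hypothetical helpers, not part of any library.

```python
# Toy illustration of training and testing with a 1-nearest-neighbor learner.
# All feature values and labels are invented for illustration.

def train(examples):
    """'Training' here simply stores the (features, output) pairs."""
    return list(examples)

def predict(model, features):
    """Return the output of the stored example closest to the novel input."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, nearest_output = min(
        model, key=lambda example: squared_distance(example[0], features)
    )
    return nearest_output

# Training: inputs paired with their known outputs.
training_examples = [
    ((150.0, 35.0), "positive"),
    ((160.0, 40.0), "positive"),
    ((90.0, 22.0), "negative"),
    ((100.0, 25.0), "negative"),
]
model = train(training_examples)

# Testing: predict the output for an input the system has not seen.
prediction = predict(model, (155.0, 33.0))
print(prediction)
```

Real learners generalize in more sophisticated ways, but the division into a training step and a prediction step is the same.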
Features in Machine Learning
• Inputs are presented to a machine learning system in the form of features
• Each application has its own features; choosing the right features is critical for success
Example Machine Learning Task
• Medical task: Predict the presence of diabetes (positive or negative) in women [Smith et al. 1988]
  http://repository.seasr.org/Datasets/UCI/arff/diabetes.arff
• Input features:
  • Number of times pregnant (preg)
  • Plasma glucose concentration (plas)
  • Diastolic blood pressure (mm Hg) (pres)
  • Triceps skin fold thickness (mm) (skin)
  • 2-Hour serum insulin (mu U/ml) (insu)
  • Body mass index (weight in kg/(height in m)^2) (mass)
  • Diabetes pedigree function (pedi)
  • Age (years) (age)
• Training examples with the correct presence/absence of diabetes can be gathered from past data or from experts
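As a rough sketch of what "presenting inputs as features" means for this task, one patient can be turned into a fixed-order list of numbers. The eight feature names come from the slide; the patient values, the helper names `body_mass_index` and `to_feature_vector`, and the dictionary layout are assumptions made for illustration.

```python
# Feature names as abbreviated on the slide, in a fixed order.
FEATURE_NAMES = ["preg", "plas", "pres", "skin", "insu", "mass", "pedi", "age"]

def body_mass_index(weight_kg, height_m):
    """Body mass index: weight in kg / (height in m)^2."""
    return weight_kg / height_m ** 2

def to_feature_vector(record):
    """Convert a patient record (a dict) into an ordered list of floats."""
    return [float(record[name]) for name in FEATURE_NAMES]

# A hypothetical patient; these values are invented, not taken from the dataset.
patient = {
    "preg": 2, "plas": 120, "pres": 70, "skin": 30,
    "insu": 100, "pedi": 0.35, "age": 33,
}
patient["mass"] = body_mass_index(weight_kg=70.0, height_m=1.75)

vector = to_feature_vector(patient)
print(vector)
```

A training example would pair such a vector with the known outcome (positive or negative).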
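Example Machine Learning Task
[Table: training examples gathered from past data or from experts; the columns are the features and the cells hold the feature values]
Example Machine Learning Task
[Table: training examples, followed by test examples whose outputs are unknown]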
Benefits of Machine Learning
• Can work with thousands of features, which may be overwhelming for humans
• Can learn from millions of examples, more than a human could feasibly study
• May find relations between inputs and outputs that are not apparent on human inspection
Classification and Regression
• Most learning tasks fall under two categories
  • Classification: The value to be predicted is a nominal value, for example, positive or negative diagnosis
    • Many machine learning methods can also give a confidence (e.g. a probability) for their classification
  • Regression: The value to be predicted is a numerical value, for example, stock prices, energy expenditure, etc.
• Most machine learning techniques have both classification and regression versions
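The difference between the two kinds of output can be sketched as follows. The regression part fits a straight line by ordinary least squares and returns a number; the classification part turns a score into a probability with the logistic function and then into a nominal label. The function names, the 0.5 threshold, and the toy data are assumptions chosen for illustration.

```python
import math

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def classify(score, threshold=0.5):
    """Classification: map a real-valued score to a label plus a confidence."""
    probability = 1.0 / (1.0 + math.exp(-score))
    label = "positive" if probability >= threshold else "negative"
    return label, probability

slope, intercept = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
numeric_prediction = slope * 4.0 + intercept   # regression output: a number

label, probability = classify(-2.0)            # classification output: a label
print(numeric_prediction, label, round(probability, 3))
```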
Machine Learning Techniques
• A wide range of machine learning techniques has been developed, from statistical to rule-based
• The freely available software “Weka” implements most of them: http://www.cs.waikato.ac.nz/ml/weka/
• Different techniques offer different advantages and disadvantages
• Rule-based:
  • Human-interpretable
  • Generally less accurate than statistical techniques
• Statistical:
  • Less human-interpretable
  • Generally more accurate than rule-based techniques
Machine Learning Techniques
• Learning can be viewed as using experience to approximate a chosen target function.
• Function approximation can be viewed as a search through a space of hypotheses (representations of functions) for one that best fits a set of training data.
• Different machine learning techniques assume different hypothesis spaces (representation languages) and/or employ different search techniques.
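To make "search through a space of hypotheses" concrete, here is a deliberately tiny sketch. The hypothesis space is assumed to be all rules of the form "value >= t means positive", and the search is an exhaustive scan over candidate thresholds t, keeping the one that best fits the training data. The data values and function names are invented for illustration.

```python
def accuracy(threshold, examples):
    """Fraction of examples that the hypothesis 'value >= threshold' gets right."""
    correct = sum(
        (value >= threshold) == is_positive for value, is_positive in examples
    )
    return correct / len(examples)

def best_threshold(examples):
    """Search the hypothesis space: try midpoints between sorted feature values."""
    values = sorted({value for value, _ in examples})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: accuracy(t, examples))

# (feature value, is_positive) pairs; invented toy data.
examples = [
    (85, False), (99, False), (110, False),
    (140, True), (155, True), (170, True),
]

threshold = best_threshold(examples)
print(threshold, accuracy(threshold, examples))
```

Other techniques differ in what the hypotheses look like (trees, weight vectors, probability tables) and in how the search is carried out, but the overall picture is the same.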
Various Function Representations
• Numerical functions
  • Linear regression
  • Neural networks
  • Support vector machines
• Symbolic functions
  • Decision trees
  • Rules in propositional logic
  • Rules in first-order predicate logic
• Instance-based functions
  • Nearest-neighbor
  • Case-based
• Probabilistic Graphical Models
  • Naïve Bayes
  • Bayesian networks
  • Hidden Markov Models (HMMs)
  • Probabilistic Context-Free Grammars (PCFGs)
  • Markov networks
Various Search Algorithms
• Gradient descent
  • Perceptron
  • Backpropagation
• Dynamic Programming
  • HMM Learning
• Divide and Conquer
  • Decision tree induction
  • Rule learning
• Evolutionary Computation
  • Genetic Algorithms (GAs)
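Of the search algorithms listed, gradient descent is the easiest to sketch in a few lines. The example below fits a single weight w so that w * x approximates y, by repeatedly stepping against the gradient of the mean squared error. The learning rate, step count, and data are arbitrary choices for illustration, not recommended settings.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=100):
    """Fit y ~ w * x by minimizing mean squared error with gradient descent."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w.
        gradient = 2.0 * sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        # Move a small step in the direction that reduces the error.
        w -= learning_rate * gradient
    return w

# Toy data generated by y = 2 * x.
learned_w = gradient_descent([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(learned_w)
```

Backpropagation applies the same idea to all the weights of a neural network at once.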
Decision Trees (Rule-based)
Part of an automatically learned decision tree on the diabetes data:
[Tree diagram: the root tests plas at 139.5, with part of the tree elided ("…."); deeper nodes test plas at 166.5, preg at 6.5 and at 5.5, pedi at 0.33, mass at 29, and insu at 422.5; the leaves are labeled positive or negative]
Propositional Rule Learner (Rule-based)
Automatically learned rules from the diabetes data:
(plas >= 132) and (mass >= 30) => positive
(age >= 29) and (insu >= 125) and (preg <= 3) => positive
(age >= 31) and (pedi >= 0.529) and (preg >= 8) and (mass >= 25.9) => positive
Otherwise => negative
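Part of the appeal of rule-based models is that they translate directly into readable code. The sketch below encodes the learned rules from this slide as a Python function; the thresholds are those shown above, while the function name and the two example patients are invented for illustration.

```python
def classify_by_rules(p):
    """Apply the learned propositional rules to a patient record (a dict)."""
    if p["plas"] >= 132 and p["mass"] >= 30:
        return "positive"
    if p["age"] >= 29 and p["insu"] >= 125 and p["preg"] <= 3:
        return "positive"
    if (p["age"] >= 31 and p["pedi"] >= 0.529
            and p["preg"] >= 8 and p["mass"] >= 25.9):
        return "positive"
    # Default rule: none of the conditions above matched.
    return "negative"

# Hypothetical patients; the values are made up.
patient_a = {"preg": 1, "plas": 140, "insu": 0, "mass": 32.0, "pedi": 0.2, "age": 25}
patient_b = {"preg": 1, "plas": 100, "insu": 80, "mass": 22.0, "pedi": 0.2, "age": 25}

print(classify_by_rules(patient_a), classify_by_rules(patient_b))
```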
Support Vector Machines (Statistical)
Separating hyperplane for a linear support vector machine:
   1.3614 * (normalized) preg
 + 4.8764 * (normalized) plas
 - 0.8118 * (normalized) pres
 - 0.1158 * (normalized) skin
 - 0.1776 * (normalized) insu
 + 3.0745 * (normalized) mass
 + 1.4242 * (normalized) pedi
 + 0.2601 * (normalized) age
 - 5.1761
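The hyperplane above can be evaluated for a patient as a weighted sum plus a constant. The sketch below uses the weights and constant from the slide. It assumes the feature values have already been normalized in the same way as during training (the slide does not say how), and it reports only which side of the hyperplane a point falls on, since the slide does not state which side corresponds to which class.

```python
# Weights and constant term copied from the hyperplane on the slide.
WEIGHTS = {
    "preg": 1.3614, "plas": 4.8764, "pres": -0.8118, "skin": -0.1158,
    "insu": -0.1776, "mass": 3.0745, "pedi": 1.4242, "age": 0.2601,
}
BIAS = -5.1761

def decision_value(normalized):
    """Weighted sum of the (already normalized) features plus the constant."""
    return sum(WEIGHTS[name] * normalized[name] for name in WEIGHTS) + BIAS

def side_of_hyperplane(normalized):
    """+1 if the point lies on or above the hyperplane, otherwise -1."""
    return 1 if decision_value(normalized) >= 0 else -1

# Two extreme, purely illustrative inputs: all features at 0 and all at 1.
all_zeros = {name: 0.0 for name in WEIGHTS}
all_ones = {name: 1.0 for name in WEIGHTS}

print(decision_value(all_zeros), decision_value(all_ones))
```

The magnitudes of the weights also hint at which features matter most to this model: plas and mass carry the largest weights.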
Neural Networks (Statistical)
• Mathematically relate inputs to outputs through intermediate hidden nodes
• Biologically inspired by neurons
[Diagram: a layer of input nodes feeds a layer of hidden nodes, which feeds the output nodes]
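How inputs are related to outputs through hidden nodes can be sketched as a forward pass through a small network with one hidden layer. Each node computes a weighted sum of its inputs and passes it through a sigmoid. The network size, the weights, and the choice of sigmoid activation are assumptions for illustration; in practice the weights would be learned, for example by backpropagation.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each node applies sigmoid(weighted sum of inputs + bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input nodes -> hidden nodes -> output nodes."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Hypothetical weights: 2 input nodes, 2 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.1, -0.2]
output_w = [[1.2, -0.7]]
output_b = [0.05]

output = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
print(output)
```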
Feature Engineering
• Besides the machine learning method employed, performance depends largely on the features used
• Coming up with the best features is a skill, called feature engineering
• If the relevant features are not used, the machine learning method will never be able to learn to predict the correct output
• Extraneous features may confuse machine learning methods, although most methods are robust to them up to a point
• Feature selection methods automatically search the space of possible features to select the best ones
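One simple family of feature selection methods scores each feature independently and keeps the highest-scoring ones. The sketch below ranks features by the absolute Pearson correlation between the feature and a 0/1 label. This is only one possible scoring choice among many; the feature names and values are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

def rank_features(columns, labels):
    """Order feature names from most to least correlated with the label."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical data: one informative feature and one extraneous feature.
labels = [0, 0, 1, 1]
columns = {
    "noise": [5.0, 7.0, 5.0, 7.0],
    "glucose_like": [90.0, 100.0, 150.0, 160.0],
}

ranking = rank_features(columns, labels)
print(ranking)
```

Scoring features one at a time can miss features that are only useful in combination, which is why other selection methods search over subsets of features instead.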
Evaluation of Learning Systems
• Experimental
  • Conduct controlled cross-validation experiments to compare various methods on a variety of benchmark datasets
  • Gather data on their performance, e.g. test accuracy, training time, testing time
  • Analyze differences for statistical significance
• Theoretical
  • Analyze algorithms mathematically and prove theorems about their:
    • Computational complexity (how fast the algorithm runs)
    • Ability to fit training data
    • Sample complexity (number of training examples needed to learn an accurate function)
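A cross-validation experiment can be sketched as follows: the data is split into k folds, each fold is held out once for testing while the rest is used for training, and the test accuracies are averaged. The learner used here is a trivial majority-class baseline, chosen only to keep the example short; the fold assignment scheme, function names, and data are assumptions for illustration.

```python
from collections import Counter

def k_fold_indices(n, k):
    """Assign example indices 0..n-1 to k folds in round-robin fashion."""
    return [list(range(fold, n, k)) for fold in range(k)]

def train_majority(train_set):
    """Baseline learner: remember the most common output in the training set."""
    return Counter(output for _, output in train_set).most_common(1)[0][0]

def predict_majority(model, features):
    """The baseline ignores the features and always predicts the stored class."""
    return model

def cross_validate(examples, k, train_fn, predict_fn):
    """Average test accuracy over k train/test splits."""
    accuracies = []
    for test_indices in k_fold_indices(len(examples), k):
        held_out = set(test_indices)
        train_set = [ex for i, ex in enumerate(examples) if i not in held_out]
        test_set = [examples[i] for i in test_indices]
        model = train_fn(train_set)
        correct = sum(predict_fn(model, x) == y for x, y in test_set)
        accuracies.append(correct / len(test_set))
    return sum(accuracies) / k

# Invented toy data: (feature value, output) pairs.
outputs = ["pos", "neg", "neg", "neg", "pos", "neg", "neg", "neg", "pos"]
examples = [(float(i), output) for i, output in enumerate(outputs)]

mean_accuracy = cross_validate(examples, 3, train_majority, predict_majority)
print(mean_accuracy)
```

Comparing a real method against such a baseline on the same folds is a common sanity check before testing differences for statistical significance.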
Machine Learning Techniques in Medicine
• Must be accurate
• Must be able to work with missing or error-prone medical data
• Must be able to explain their decisions
• Should be able to assist medical professionals without changing their usual workflow
Data Mining
• Closely related to machine learning; often the same techniques are employed
• Instead of predicting outputs, the goal is to find interesting and useful patterns in data
• For example, a pattern mined from patient records: a patient suffering from diseases A and B and given treatment T has an 80% chance of having side effect S
• Computer techniques may reveal patterns that are not apparent to human experts
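The 80% figure in the example pattern is the kind of number that can be computed directly from records: among patients who have diseases A and B and received treatment T, what fraction experienced side effect S? The sketch below computes this fraction on a handful of invented records; representing each record as a set of tags is an assumption made for brevity.

```python
def rule_confidence(records, conditions, outcome):
    """Fraction of records satisfying all conditions that also show the outcome."""
    matching = [record for record in records if conditions <= record]
    if not matching:
        return None  # The pattern never occurs, so no estimate is possible.
    with_outcome = [record for record in matching if outcome in record]
    return len(with_outcome) / len(matching)

# Invented patient records; each is a set of diseases, treatments and side effects.
records = [
    {"A", "B", "T", "S"},
    {"A", "B", "T", "S"},
    {"A", "B", "T", "S"},
    {"A", "B", "T", "S"},
    {"A", "B", "T"},
    {"A", "T", "S"},
    {"B"},
]

confidence = rule_confidence(records, {"A", "B", "T"}, "S")
print(confidence)
```

A data mining system would search over many candidate condition sets and report those whose support and confidence are high enough to be interesting.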