A cursory glance at machine learning
This document provides a concise overview of machine learning, highlighting its foundations in statistics, linear algebra, calculus, and computer science. It differentiates between supervised learning methods, such as linear regression and support vector machines, which minimize error functions to predict outcomes, and unsupervised learning techniques like k-means clustering that organize data into groups without pre-labeled instances. The text also discusses Bayesian and frequentist approaches to probability, using real-world examples, including applications in smart grid technology.
Presentation Transcript
A cursory glance at machine learning • Ashwath Rajan
Overview, in brief • A marriage of statistics, linear algebra, calculus, and computer science • Machine learning: • Supervised learning • ex: linear regression • Unsupervised learning • ex: clustering • Discriminative methods • Learn borders in the feature space; the border is chosen to minimize an error function • Generative methods • Learn the probability distribution of each class, dependent on parameters; give a confidence for each classification
Supervised learning: linear regression • Regression fits a numerical model to data in a parameter space • To solve, minimize an error function • Can be solved analytically or iteratively (Figure: test set predictions)
Iterative linear regression • Repeatedly step each parameter down the gradient of the error function J(θ): θ_j := θ_j − α ∂J(θ)/∂θ_j • Turns out to be: θ_j := θ_j − (α/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾
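The iterative update above can be sketched in a few lines of NumPy. This is an illustrative batch gradient descent, not code from the slides; the toy data, step size, and iteration count are my own choices.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Fit theta for the model h(x) = X @ theta by iterating the update
    theta_j := theta_j - (alpha/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        residuals = X @ theta - y               # h_theta(x_i) - y_i for every i
        theta -= alpha * (X.T @ residuals) / m  # simultaneous update of all theta_j
    return theta

# Toy data lying exactly on y = 2x + 1, with an intercept column of ones.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 2 * np.arange(5.0) + 1
theta = gradient_descent(X, y)
print(theta)  # converges close to [1, 2]
```

The same theta could be obtained analytically via the normal equations; gradient descent is the iterative alternative the slide refers to.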
Supervised learning: support vector machines • Mechanism for both pattern recognition and regression • Finds the maximum-margin (n-1)-dimensional hyperplane that splits an n-dimensional parameter space • Data can be separable or non-separable (Figure: separable vs. non-separable data sets)
Unsupervised learning: k-means clustering Procedure:
0. Dictate the number of clusters (k)
1. Randomly select k starting points; these serve as the initial group centroids
2. Associate the remaining points with the nearest centroid
3. Move the centroids to the center of their respective clusters
4. Repeat steps 2 and 3 until the assignments stop changing
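The procedure above can be sketched directly in NumPy (an illustrative implementation; the toy blobs and seed are my own):

```python
import numpy as np

def k_means(points, k, iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 0-1: pick k random data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iterations):
        # Step 2: assign every point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of its cluster
        # (keep the old centroid if its cluster happens to be empty).
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  if (labels == j).any() else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):  # step 4: stop once stable
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])
centroids, labels = k_means(pts, k=2)
```

Note that the result depends on the random initialization; in practice the algorithm is often restarted several times and the lowest-error run kept.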
Unsupervised learning • If the number of clusters is unknown, different algorithms can be used where, instead of fixing the number of means, we set the size of a relative neighborhood
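One neighborhood-based algorithm of that kind is mean shift, sketched below as an illustration (not from the slides): every point is repeatedly moved to the mean of the data within a fixed radius, and the number of distinct convergence points becomes the number of clusters.

```python
import numpy as np

def mean_shift(points, radius, iterations=50):
    """Cluster by shifting each point to the mean of its neighborhood;
    the neighborhood radius, not the cluster count, is the parameter."""
    shifted = points.astype(float)
    for _ in range(iterations):
        for i, p in enumerate(shifted):
            # Neighbors are taken from the original data within the radius.
            neighbors = points[np.linalg.norm(points - p, axis=1) <= radius]
            shifted[i] = neighbors.mean(axis=0)
    # Distinct (rounded) endpoints are the discovered cluster centers.
    centers = np.unique(np.round(shifted, 3), axis=0)
    return centers

pts = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2]])
centers = mean_shift(pts, radius=1.0)
print(len(centers))  # 2 clusters found without specifying k
```

A flat window is used here for brevity; practical implementations usually weight neighbors with a kernel.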
Frequentist vs. Bayesian • Maximum likelihood • Treats the parameters as fixed values • Can often be solved analytically • Bayesian estimation • Treats the parameters as distributions • Uses evidence to update a prior distribution into a posterior
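A hypothetical coin-flip example (my own, not from the slides) makes the contrast concrete: maximum likelihood returns a single point estimate, while a Bayesian update turns an assumed Beta prior into a Beta posterior.

```python
# Observed data: 7 heads, 3 tails.
heads, tails = 7, 3

# Frequentist: the MLE for a Bernoulli parameter is the observed frequency.
p_mle = heads / (heads + tails)

# Bayesian: with a Beta(a, b) prior (conjugate to the Bernoulli likelihood),
# the posterior is Beta(a + heads, b + tails); its mean is pulled toward the
# prior, a pull that shrinks as more evidence accumulates.
a, b = 2, 2  # assumed prior pseudo-counts
posterior_mean = (a + heads) / (a + b + heads + tails)

print(p_mle)           # 0.7
print(posterior_mean)  # 9/14, roughly 0.643
```

With a flat Beta(1, 1) prior the two answers would coincide, which is one way to see maximum likelihood as a special case of the Bayesian view.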
Probability and Bayes rule • Bayesian probability estimates allow prior distributions to be updated with observed data.
Bayes rule – cancer example Say the probability of a rare cancer is P(cancer) = 0.01 • Then the probability of no cancer is P(no cancer) = 0.99 • Now say there is a blood test to detect the cancer • It's fairly accurate, as characterized by its sensitivity, P(positive | cancer) = 0.8, and its specificity
Bayes rule – cancer example • So, what happens if you get a positive result? • What is the chance you actually have cancer? Use Bayes' rule: P(cancer | positive) = P(positive | cancer) · P(cancer) / P(positive) = (0.8 × 0.01) / 0.0594 ≈ 0.13
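The calculation can be checked in a few lines. The prior (0.01) and sensitivity (0.8) come from the slides; the false-positive rate below is an assumption on my part, chosen to reproduce a total evidence P(positive) of about 0.0594, consistent with the stated answer of 0.13.

```python
prior = 0.01            # P(cancer), from the slide
sensitivity = 0.8       # P(positive | cancer), from the slide
false_positive = 0.052  # P(positive | no cancer) -- assumed; the slide's table was lost

# Total probability of a positive test, summed over both hypotheses.
evidence = sensitivity * prior + false_positive * (1 - prior)
# Bayes' rule: posterior probability of cancer given a positive result.
posterior = sensitivity * prior / evidence

print(round(evidence, 4))   # about 0.0595
print(round(posterior, 2))  # about 0.13
```

The striking part is that even a positive result from a fairly accurate test leaves only a roughly 13% chance of cancer, because the disease is so rare.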
My research at USC • Machine learning to help the smart grid • Take building sensor data and find models that connect different data streams to kWh usage • Both supervised and unsupervised techniques could be considered; however, supervised learning is often most apt
Online courses • Much of this material has been shamelessly reproduced/copied from online coursework: • Udacity: Statistics 101 and CS 373 AI • Coursera: Great 10 week machine learning course https://www.coursera.org/course/ml
Cited • Regression example: Andrew Ng, Stanford – Coursera • Cancer example: Sebastian Thrun, Udacity • "A Tutorial on Support Vector Machines for Pattern Recognition" – Christopher J.C. Burges, Bell Laboratories, Lucent Technologies, 1998 • Pattern Recognition Primer – David Doria, 2008