Machine Learning, Saarland University, SS 2007


Presentation Transcript


  1. Machine Learning, Saarland University, SS 2007
     Lecture 1, Friday, April 19th, 2007 (basics and example applications)
     Holger Bast, Marjan Celikik, Kevin Chang, Stefan Funke, Joachim Giesen
     Max-Planck-Institut für Informatik, Saarbrücken, Germany

  2. Overview of this Lecture
     • Machine Learning Basics
       • Classification
       • Objects as feature vectors
       • Regression
       • Clustering
     • Example applications
       • Surface reconstruction
       • Preference learning
       • Netflix challenge (how to earn $1,000,000)
       • Text search

  3. Classification
     • Given a set of points, each labeled + or –
     • learn something from them …
     • … in order to predict the label of new points
     [figure: points in the plane labeled + and –, plus a new unlabeled point “?”]
     This is an instance of supervised learning.
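
As a concrete sketch (not from the lecture, all coordinates made up): one simple classifier for this setting is 1-nearest-neighbor, shown here in plain numpy.

    import numpy as np

    # Labeled training points in the plane; +1 / -1 stand for the "+" and "-"
    # labels on the slide (all coordinates are made up).
    X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 2.5],    # "+" points
                  [5.0, 5.5], [6.0, 5.0], [5.5, 6.5]])   # "-" points
    y = np.array([+1, +1, +1, -1, -1, -1])

    def predict(x_new):
        """Predict the label of a new point by copying its nearest neighbor."""
        dists = np.linalg.norm(X - x_new, axis=1)   # Euclidean distance to each training point
        return y[np.argmin(dists)]

    print(predict(np.array([2.0, 2.0])))   # -> 1  (the "?" lands in the "+" region)
    print(predict(np.array([5.5, 5.8])))   # -> -1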

  4. Classification — Quality
     • Which classifier is better?
     • the answer requires a model of where the data comes from
     • and a measure of quality / accuracy
     [figure: the same labeled points + and – with a new point “?”]
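
One common way to make "which classifier is better" measurable is accuracy on held-out data; a minimal sketch, assuming the predict() function from the snippet above:

    import numpy as np

    def accuracy(predict, X_test, y_test):
        """Fraction of held-out points whose predicted label matches the truth."""
        y_pred = np.array([predict(x) for x in X_test])
        return float(np.mean(y_pred == y_test))

    # e.g., with two made-up held-out points:
    # accuracy(predict, np.array([[2.0, 2.0], [6.0, 6.0]]), np.array([+1, -1]))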

  5. Classification — Outliers and Overfitting
     • We have to find a balance between two extremes
       • oversimplification (large classification error)
       • overfitting (lack of regularity)
     • again: requires a model of the data
     [figure: labeled points + and –, including outliers of each class inside the other's region]
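
One concrete version of this balance, sketched under the assumption of a k-nearest-neighbor classifier (the slide names no method): k = 1 follows every outlier (overfitting), while a larger k smooths over it.

    import numpy as np

    def knn_predict(X, y, x_new, k):
        """Majority vote among the k nearest training points."""
        dists = np.linalg.norm(X - x_new, axis=1)
        nearest = np.argsort(dists)[:k]
        return +1 if np.sum(y[nearest]) >= 0 else -1

    # A single "-" outlier sitting inside the "+" region (made-up data):
    X = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [1.5, 1.5],
                  [8, 8], [9, 8], [8, 9]], dtype=float)
    y = np.array([+1, +1, +1, +1, -1, -1, -1, -1])

    print(knn_predict(X, y, np.array([1.4, 1.4]), k=1))   # -1: the outlier wins (overfitting)
    print(knn_predict(X, y, np.array([1.4, 1.4]), k=5))   # +1: neighborhood majority (more regular)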

  6. Classification — Point Transformation
     • If a classifier does not work for the original data
     • try it on a transformation of the data
     • typically: make points linearly separable by a suitable mapping to a higher-dimensional space
     [figure: 1-D points labeled + and – around 0; mapping x to (x, |x|) lifts them into the plane, where they become linearly separable]
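
The slide's mapping can be written out directly; a sketch with made-up 1-D data in which "+" points lie far from 0 and "–" points lie near 0:

    import numpy as np

    x = np.array([-5.0, -4.0, -1.0, 0.5, 1.0, 4.5, 6.0])   # 1-D points (made up)
    y = np.array([+1, +1, -1, -1, -1, +1, +1])              # not linearly separable on the line

    # Map x to (x, |x|): in the plane, a horizontal line such as |x| = 2
    # now separates the two classes.
    X = np.column_stack([x, np.abs(x)])
    print(np.where(X[:, 1] >= 2, +1, -1))   # reproduces y exactly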

  7. Classification — more labels
     • Typically:
       • first, basic technique for binary classification
       • then, extension to more labels
     [figure: points with three labels: +, –, and o]
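
One standard extension from binary to more labels (an assumption here, since the slide does not fix a technique) is one-vs-rest: one classifier per label, and the most confident one wins. The sketch below uses closeness to the class centroid as a stand-in for each per-label classifier's confidence; any binary learner could be plugged in instead.

    import numpy as np

    def fit_centroids(X, y):
        """One 'classifier' per label: here simply the class centroid."""
        labels = np.unique(y)
        return labels, np.array([X[y == c].mean(axis=0) for c in labels])

    def predict(x_new, labels, centroids):
        """Pick the label whose per-label classifier is most confident
        (here: whose centroid is closest)."""
        return labels[np.argmin(np.linalg.norm(centroids - x_new, axis=1))]

    # Three made-up classes, matching the +, -, o labels on the slide:
    X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
    y = np.array(['+', '+', '-', '-', 'o', 'o'])
    labels, centroids = fit_centroids(X, y)
    print(predict(np.array([0.2, 0.5]), labels, centroids))   # -> '+'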

  8. Objects as Feature Vectors
     • But why learn something about points?
     • General idea:
       • represent objects as points in a space of fixed dimension
       • each dimension corresponds to a so-called feature of the object
     • Crucial:
       • the selection of features
       • the normalization of the vectors

  9. Objects as Feature Vectors
     • Example: objects with attributes
     • features = attribute values
     • normalize by a reference value for each feature

                Person 1   Person 2   Person 3   Person 4
       height   190 cm     176 cm     188 cm     181 cm
       weight    55 kg      75 kg      90 kg      77 kg
       age       24         36         32         34
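
A minimal sketch of "normalize by a reference value for each feature", using the person table above and made-up reference values (180 cm, 75 kg, 30 years):

    import numpy as np

    # Rows = persons 1-4, columns = (height in cm, weight in kg, age in years)
    persons = np.array([[190.0, 55.0, 24.0],
                        [176.0, 75.0, 36.0],
                        [188.0, 90.0, 32.0],
                        [181.0, 77.0, 34.0]])

    # Hypothetical reference values; without this step, centimeters would
    # numerically dominate years in any distance computation.
    reference = np.array([180.0, 75.0, 30.0])
    print(persons / reference)   # dimensionless, comparable feature vectors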

  10. Objects as Feature Vectors
      • Example: images
      • features = pixels (with grey values)
      • often fine without further normalization
      [figure: two example images, Image 1 and Image 2]
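
As a sketch, assuming 8-bit grey values: an image becomes a feature vector simply by flattening its pixel grid.

    import numpy as np

    image = np.random.randint(0, 256, size=(16, 16))   # a made-up 16x16 grey-value image
    features = image.flatten()                         # one 256-dimensional feature vector
    print(features.shape)                              # (256,)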

  11. Objects as Feature Vectors
      • Example: text documents
      • features = words
      • normalize to unit norm
        Doc. 1: Machine Learning SS 2007
        Doc. 2: Statistical Learning Theory SS 2007
        Doc. 3: Statistical Learning Theory SS 2006
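
A minimal bag-of-words sketch for the three documents on the slide, normalized to unit Euclidean norm (the vocabulary handling is a simplification, not from the lecture):

    import numpy as np

    docs = ["machine learning ss 2007",
            "statistical learning theory ss 2007",
            "statistical learning theory ss 2006"]

    vocab = sorted({w for d in docs for w in d.split()})    # features = words
    counts = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)
    vectors = counts / np.linalg.norm(counts, axis=1, keepdims=True)   # unit norm per document

    print(vocab)
    print(np.round(vectors, 2))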

  13. Regression
      • Learn a function that maps objects to values
      • Similar trade-off as for classification:
        • risk of oversimplification vs. risk of overfitting
      [figure: data points (x) over a horizontal axis “given value (typically multi-dimensional)” and a vertical axis “value to learn (typically a real number)”, with a new point “?”]
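
A least-squares line fit is one standard instantiation of "learn a function that maps objects to values"; the polynomial degree is exactly the oversimplification-vs-overfitting knob (degree 1 may oversimplify, a high degree on few points overfits). Data below are made up:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 5.2])   # roughly linear, with noise

    coeffs = np.polyfit(x, y, deg=1)   # fit y ~ a*x + b by least squares
    predict = np.poly1d(coeffs)
    print(predict(2.5))                # predicted value for an unseen x (the "?")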

  15. Clustering
      • Partition a given set of points into clusters
      • Similar problems as for classification:
        • follow the data distribution, but not too closely
        • a transformation often helps (see slide 19)
      [figure: unlabeled points (x) falling into two groups]
      This is an instance of unsupervised learning.
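
One classic way to compute such a partition, given here as a sketch since the slide names no algorithm: Lloyd's k-means iteration in plain numpy, on made-up 2-D points.

    import numpy as np

    def kmeans(X, k, iters=20, seed=0):
        """Lloyd's algorithm: alternately assign points to their nearest center
        and move each center to the mean of its assigned points."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = np.argmin(dists, axis=1)
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])   # keep centers of empty clusters
        return labels, centers

    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),    # one made-up cloud around (0, 0)
                   rng.normal(5.0, 0.5, (20, 2))])   # another around (5, 5)
    labels, centers = kmeans(X, k=2)
    print(np.round(centers, 1))   # roughly [0, 0] and [5, 5], in some order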

  19. Clustering — Transformation
      • For clustering, dimension reduction typically helps
      • whereas in classification, embedding into a higher-dimensional space typically helps
      [figure: the document vectors for documents 2, 3, and 4 are equally dissimilar; after projecting to 2 dimensions, a 2-clustering would work fine]
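
Projection by truncated SVD is one standard choice for this dimension reduction (the slide does not fix the method); a sketch with a small made-up document-feature matrix:

    import numpy as np

    # Stand-in document-feature matrix: 4 documents over a made-up 5-word vocabulary.
    A = np.array([[1, 1, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 0, 1],
                  [0, 1, 1, 0, 0]], dtype=float)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A2 = U[:, :2] * s[:2]      # each document as a 2-D point (best rank-2 coordinates)
    print(np.round(A2, 2))     # documents sharing words land near each other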

  20. Application Example: Text Search
      • 676 abstracts from the Max-Planck-Institut für Informatik
      • for example: “We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved.”
      • 3283 words (stop words like and, or, this, … removed)
      • the abstracts come from 5 working groups: Algorithms, Logic, Graphics, CompBio, Databases
      • reduce to 10 concepts
      No dictionary, no training, only the plain text itself!
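
"Reduce to 10 concepts" suggests a spectral method in the spirit of latent semantic indexing; the sketch below (an assumption, with random stand-in data of the dimensions from the slide) truncates the SVD of the term-document matrix to rank 10:

    import numpy as np

    A = np.random.default_rng(0).random((3283, 676))   # term-document matrix (stand-in data)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 10                                             # number of concepts
    docs_in_concept_space = (s[:k, None] * Vt[:k]).T   # each abstract as a 10-D concept vector
    print(docs_in_concept_space.shape)                 # (676, 10)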
