
Presentation Transcript


  1. Short Introduction to Machine Learning Instructor: Rada Mihalcea

  2. Learning? • What can we learn from here? • If Sky=Sunny and Air Temperature = Warm → Enjoy Sport = Yes • If Sky=Sunny → Enjoy Sport = Yes • If Air Temperature = Warm → Enjoy Sport = Yes • If Sky=Sunny and Air Temperature = Warm and Wind = Strong → Enjoy Sport = Yes ??

  3. What is machine learning? • (H. Simon) • “Any process by which a system improves performance” • (T. Mitchell) • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” • Machine learning has to do with designing computer programs that improve their performance through experience

  4. Related areas • Artificial intelligence • Probability and statistics • Computational complexity theory • Information theory • Human language technology

  5. Applications of ML • Learning to recognize spoken words • SPHINX (Lee 1989) • Learning to drive an autonomous vehicle • ALVINN (Pomerleau 1989) • Learning to classify celestial objects • (Fayyad et al 1995) • Learning to play world-class backgammon • TD-GAMMON (Tesauro 1992) • Learning to translate between languages • Learning to classify texts into categories • Web directories

  6. Main directions in ML • Data mining • Finding patterns in data • Use “historical” data to make a decision • Predict weather based on current conditions • Self-customization • Automatic feedback integration • Adapt to user “behaviour” • Recommender systems • Writing applications that cannot be programmed by hand • In particular because they involve huge amounts of data • Speech recognition • Handwriting recognition • Text understanding

  7. Terminology • Learning is performed from EXAMPLES (or INSTANCES) • An example contains ATTRIBUTES or FEATURES • E.g. Sky, Air Temperature, Water • In concept learning, we want to learn the value of the TARGET ATTRIBUTE • Classification problems. Binary case: +/– → positive/negative • Attributes have VALUES: • A single value (e.g. Warm) • ? indicates that any value is possible for this attribute • ∅ indicates that no value is acceptable • All the features in an example are together referred to as the FEATURE VECTOR

  8. Terminology • Feature vector for our learning problem: • (Sky, Air Temp, Humidity, Wind, Water, Forecast), and the target attribute is EnjoySport • How to represent “Aldo enjoys sports only on cold days with high humidity”? • (?, Cold, High, ?, ?, ?) • How about “Emma enjoys sports regardless of the weather”? • Hypothesis = the set of vectors that cover the given examples • Most general hypothesis • (?, ?, ?, ?, ?, ?) • Most specific hypothesis • (∅, ∅, ∅, ∅, ∅, ∅) • How many hypotheses can be generated for our feature vector?
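
A hypothesis in this representation can be tested against an example in a few lines of code. The sketch below is illustrative and not from the slides; 'NONE' stands in for the ∅ symbol, and the attribute order follows the feature vector above:

    # A hypothesis is a tuple over (Sky, AirTemp, Humidity, Wind, Water, Forecast):
    # '?' accepts any value, 'NONE' (the empty symbol) accepts no value at all.
    def matches(hypothesis, example):
        """True if the hypothesis covers the example."""
        return all(h != 'NONE' and (h == '?' or h == e)
                   for h, e in zip(hypothesis, example))

    h = ('Sunny', '?', '?', '?', '?', '?')
    print(matches(h, ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')))  # True
    print(matches(h, ('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change')))  # False

As for the counting question: if Sky has 3 values and the other five attributes have 2 each (the value counts in Mitchell's EnjoySport data), there are 5·4·4·4·4·4 = 5120 syntactically distinct hypotheses, but only 1 + 4·3·3·3·3·3 = 973 semantically distinct ones, since every hypothesis containing ∅ classifies all examples as negative.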

  9. Task in machine learning • Given: • A set of examples X • A set of hypotheses H • A target concept c • Determine: • A hypothesis h in H such that h(x) = c(x) for all x in X • Practically, we want to determine those hypotheses that best fit our examples: • (Sunny, ?, ?, ?, ?, ?) → Yes • (?, Warm, ?, ?, ?, ?) → Yes • (Sunny, Warm, ?, ?, ?, ?) → Yes
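
One simple way to find such a hypothesis is Mitchell's Find-S, an instance of what slide 11 calls concept learning via searching on general-to-specific hypotheses: start from the most specific hypothesis and minimally generalize it on each positive example. A minimal sketch, reusing the 'NONE' convention from the previous snippet; the training data is Mitchell's standard EnjoySport set:

    def find_s(examples, n_attributes=6):
        """Find-S: minimally generalize the most specific hypothesis
        to cover every positive example; negative examples are ignored."""
        h = ['NONE'] * n_attributes
        for x, label in examples:
            if label != 'Yes':
                continue
            for i, value in enumerate(x):
                if h[i] == 'NONE':
                    h[i] = value       # first positive example: copy its values
                elif h[i] != value:
                    h[i] = '?'         # conflicting value: generalize to '?'
        return tuple(h)

    train = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
             (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 'Yes'),
             (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 'No'),
             (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 'Yes')]
    print(find_s(train))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')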

  10. Machine learning applications • Until now: a toy example, deciding whether X enjoys sport given the current conditions and the forecast • Practical problems: • Part-of-speech tagging. How? • Word sense disambiguation • Text categorization • Chunking • … • Any problem that can be modeled through examples can support learning

  11. Machine learning algorithms • Concept learning via searching on general-to-specific hypotheses • Decision tree learning • Instance-based learning • Rule-based learning • Neural networks • Bayesian learning • Genetic algorithms

  12. Basic elements of information theory • How to determine which attribute is the best classifier? • Measure the information gain of each attribute • Entropy characterizes the (im)purity of an arbitrary collection of examples • Given a collection S with a proportion p of positive and q = 1 - p of negative examples: • Entropy(S) = - p log2 p - q log2 q (base-2 logarithm, with the convention 0 log2 0 = 0) • Entropy is at its maximum (1) when p = q = ½ • Entropy is at its minimum (0) when p = 1 and q = 0 • Example: • S contains 14 examples: 9 positive and 5 negative • Entropy(S) = - (9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.94
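
The slide's numbers are easy to reproduce; a minimal sketch, not from the slides:

    from math import log2

    def entropy(p, q):
        """Entropy of a two-class collection; 0 * log2(0) is taken as 0."""
        return -sum(x * log2(x) for x in (p, q) if x > 0)

    print(entropy(9/14, 5/14))  # ~0.940, the value on the slide
    print(entropy(0.5, 0.5))    # 1.0, the maximum
    print(entropy(1.0, 0.0))    # 0.0, the minimum (Python may print -0.0)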

  13. Basic elements of information theory • Information gain • Measures the expected reduction in entropy caused by partitioning the examples according to an attribute A: • Gain(S, A) = Entropy(S) - Σv ∈ Values(A) (|Sv| / |S|) · Entropy(Sv), where Sv is the subset of S for which A has value v • Many learning algorithms make decisions based on information gain
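
A sketch of the gain computation for nominal attributes, matching the formula above (the function names are mine, not from the slides):

    from collections import Counter
    from math import log2

    def entropy_of(labels):
        """Entropy of a collection, given its list of class labels."""
        total = len(labels)
        return -sum((c / total) * log2(c / total)
                    for c in Counter(labels).values())

    def information_gain(examples, labels, attr):
        """Expected entropy reduction from partitioning on attribute `attr`
        (an index into each example's feature tuple)."""
        total = len(examples)
        gain = entropy_of(labels)
        for v in set(x[attr] for x in examples):
            subset = [lab for x, lab in zip(examples, labels) if x[attr] == v]
            gain -= (len(subset) / total) * entropy_of(subset)
        return gain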

  14. Basic elements of information theory

  15. Decision trees

  16. Decision trees

  17. Decision trees • Have the capability of generating rules: • IF outlook = sunny AND temperature = hot • THEN play tennis = no • Powerful: deriving such rules by hand would be very difficult • C4.5 (Quinlan) • ID3 • Integral part of MLC++ • Integral part of Weka (in Java)
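
Outside of Weka and the original C4.5 code, the same idea is easy to try in scikit-learn, which can print a trained tree as nested rules. A sketch using the golf data from slide 23; encoding outlook as 0/1/2 is a simplification of my own, since scikit-learn trees split numerically rather than on nominal values the way C4.5 does:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Golf data from slide 23: (outlook, temperature, humidity, windy),
    # with outlook encoded 0=sunny, 1=overcast, 2=rain and windy as 0/1.
    X = [[0, 85, 85, 0], [0, 80, 90, 1], [1, 83, 78, 0], [2, 70, 96, 0],
         [2, 68, 80, 0], [2, 65, 70, 1], [1, 64, 65, 1], [0, 72, 95, 0],
         [0, 69, 70, 0], [2, 75, 80, 0], [0, 75, 70, 1], [1, 72, 90, 1],
         [1, 81, 75, 0], [2, 71, 80, 1]]
    y = ["no", "no", "yes", "yes", "yes", "no", "yes",
         "no", "yes", "yes", "yes", "yes", "yes", "no"]

    tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
    print(export_text(
        tree, feature_names=["outlook", "temperature", "humidity", "windy"]))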

  18. Instance-based algorithms • Distance between examples • Remember the WSD algorithm? • K-nearest neighbour • Given a set of examples X, each represented as a feature vector (a1(x), a2(x), …, an(x)) • Classify a new instance based on the distance between the current example and all the examples in training
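
A minimal k-nearest-neighbour sketch (unweighted Euclidean distance and majority vote; the toy training set below is illustrative, reusing the temperature/humidity columns of the golf data):

    from collections import Counter

    def knn_classify(train, query, k=3):
        """Label `query` by majority vote among its k nearest training examples."""
        def dist2(a, b):  # squared Euclidean distance between feature vectors
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        nearest = sorted(train, key=lambda ex: dist2(ex[0], query))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    train = [((85, 85), 'no'), ((80, 90), 'no'), ((83, 78), 'yes'),
             ((70, 96), 'yes'), ((68, 80), 'yes'), ((65, 70), 'no')]
    print(knn_classify(train, (75, 80)))  # 'yes': 2 of its 3 nearest are positive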

  19. Instance-based algorithms • Take into account every single example: • Advantage? Disadvantage? • “Do not forget exceptions” • Very good for NLP tasks: • WSD • POS tagging

  20. Measuring learning performance • Error on test data • Sample error: wrong cases / total cases, measured on the test data • True error (generalization error): an error range estimated starting from the sample error • Cross-validation schemes, for more accurate evaluations • 10-fold cross-validation scheme: • Divide the training data into 10 sets • Use one set for testing, and the other 9 sets for training • Repeat 10 times, measure the average accuracy
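
The 10-fold scheme itself is only a few lines; `train_and_test` below is a hypothetical callback that trains on one list of examples, tests on another, and returns the accuracy:

    def cross_validate(examples, train_and_test, k=10):
        """k-fold cross-validation: hold out each fold once for testing,
        train on the rest, and average the resulting accuracies."""
        folds = [examples[i::k] for i in range(k)]  # round-robin split
        scores = []
        for i in range(k):
            test = folds[i]
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            scores.append(train_and_test(train, test))
        return sum(scores) / k

In practice the examples are shuffled (and often stratified by class) before splitting.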

  21. Practical issues – Using Weka • Weka: free, open-source software • Java implementation of many learning algorithms • + boosting • + capability of handling very large data sets • + automatic cross-validation • To run an experiment: • supply a training file.arff [test file optional: if not present, Weka evaluates through cross-validation]
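
With a Weka 3 distribution, the command-line form the slide describes looks roughly as follows; J48 (Weka's reimplementation of C4.5) and the file names are just examples:

    java -cp weka.jar weka.classifiers.trees.J48 -t train.arff              # no test set: 10-fold cross-validation
    java -cp weka.jar weka.classifiers.trees.J48 -t train.arff -T test.arff # evaluate on an explicit test set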

  22. Specify the feature types • Specify the feature types: • Discrete: value drawn from a set of nominal values • Continuous: numeric value • Example: Golf data • Play, Don't Play. | the target attribute • outlook: sunny, overcast, rain. | features • temperature: real. • humidity: real. • windy: true, false.

  23. Weather Data • sunny, 85, 85, false, Don't Play • sunny, 80, 90, true, Don't Play • overcast, 83, 78, false, Play • rain, 70, 96, false, Play • rain, 68, 80, false, Play • rain, 65, 70, true, Don't Play • overcast, 64, 65, true, Play • sunny, 72, 95, false, Don't Play • sunny, 69, 70, false, Play • rain, 75, 80, false, Play • sunny, 75, 70, true, Play • overcast, 72, 90, true, Play • overcast, 81, 75, false, Play • rain, 71, 80, true, Don't Play
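
In Weka's ARFF format, the golf data above would look roughly as follows (the relation name is illustrative, only the first rows are shown, and the class labels are renamed to yes/no to sidestep quoting the apostrophe in "Don't Play"):

    @relation golf
    @attribute outlook {sunny, overcast, rain}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {true, false}
    @attribute play {yes, no}
    @data
    sunny,85,85,false,no
    sunny,80,90,true,no
    overcast,83,78,false,yes
    rain,70,96,false,yes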

  24. Running Weka • Check “Short Intro to Weka”
