
Presentation Transcript


  1. Short Introduction to Machine Learning Instructor: Rada Mihalcea

  2. Learning? • What can we learn from here? • If Sky=Sunny and Air Temperature = Warm → Enjoy Sport = Yes • If Sky=Sunny → Enjoy Sport = Yes • If Air Temperature = Warm → Enjoy Sport = Yes • If Sky=Sunny and Air Temperature = Warm and Wind = Strong → Enjoy Sport = Yes ??

  3. What is machine learning? • (H. Simon) • “Any process by which a system improves performance” • (T. Mitchell) • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” • Machine learning has to do with designing computer programs that improve their performance through experience

  4. Related areas • Artificial intelligence • Probability and statistics • Computational complexity theory • Information theory • Human language technology

  5. Applications of ML • Learning to recognize spoken words • SPHINX (Lee 1989) • Learning to drive an autonomous vehicle • ALVINN (Pomerleau 1989) • Learning to classify celestial objects • (Fayyad et al 1995) • Learning to play world-class backgammon • TD-GAMMON (Tesauro 1992) • Learning to translate between languages • Learning to classify texts into categories • Web directories

  6. Main directions in ML • Data mining • Finding patterns in data • Use “historical” data to make a decision • Predict weather based on current conditions • Self-customization • Automatic feedback integration • Adapt to user “behaviour” • Recommender systems • Writing applications that cannot be programmed by hand • In particular because they involve huge amounts of data • Speech recognition • Handwriting recognition • Text understanding

  7. Terminology • Learning is performed from EXAMPLES (or INSTANCES) • An example contains ATTRIBUTES or FEATURES • E.g. Sky, Air Temperature, Water • In concept learning, we want to learn the value of the TARGET ATTRIBUTE • Classification problems. Binary case: +/– → positive/negative • Attributes have VALUES: • A single value (e.g. Warm) • ? indicates that any value is possible for this attribute • ∅ indicates that no value is acceptable • All the features in an example are together referred to as the FEATURE VECTOR

  8. Terminology • Feature vector for our learning problem: • (Sky, Air Temp, Humidity, Wind, Water, Forecast), and the target attribute is EnjoySport • How to represent “Aldo enjoys sports only on cold days with high humidity”? • (?, Cold, High, ?, ?, ?) • How about “Emma enjoys sports regardless of the weather”? • Hypothesis = the set of vectors that cover the given examples • Most general hypothesis • (?, ?, ?, ?, ?, ?) • Most specific hypothesis • (∅, ∅, ∅, ∅, ∅, ∅) • How many hypotheses can be generated for our feature vector?
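
A hypothesis in this representation can be tested against an example in a few lines of code. The sketch below is illustrative and not from the slides; 'NONE' stands in for the ∅ symbol, and the attribute order follows the feature vector above:

    # A hypothesis is a tuple over (Sky, AirTemp, Humidity, Wind, Water, Forecast):
    # '?' accepts any value, 'NONE' (the empty symbol) accepts no value at all.
    def matches(hypothesis, example):
        """True if the hypothesis covers the example."""
        return all(h != 'NONE' and (h == '?' or h == e)
                   for h, e in zip(hypothesis, example))

    h = ('Sunny', '?', '?', '?', '?', '?')
    print(matches(h, ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')))  # True
    print(matches(h, ('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change')))  # False

As for the counting question: if Sky has 3 values and the other five attributes have 2 each (the value counts in Mitchell's EnjoySport data), there are 5·4·4·4·4·4 = 5120 syntactically distinct hypotheses, but only 1 + 4·3·3·3·3·3 = 973 semantically distinct ones, since every hypothesis containing ∅ classifies all examples as negative.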

  9. Task in machine learning • Given: • A set of examples X • A set of hypotheses H • A target concept c • Determine: • A hypothesis h in H such that h(x) = c(x) for all x in X • Practically, we want to determine those hypotheses that best fit our examples: • (Sunny, ?, ?, ?, ?, ?) → Yes • (?, Warm, ?, ?, ?, ?) → Yes • (Sunny, Warm, ?, ?, ?, ?) → Yes
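
One simple way to find such a hypothesis is Mitchell's Find-S, an instance of what slide 11 calls concept learning via searching on general-to-specific hypotheses: start from the most specific hypothesis and minimally generalize it on each positive example. A minimal sketch, reusing the 'NONE' convention from the previous snippet; the training data is Mitchell's standard EnjoySport set:

    def find_s(examples, n_attributes=6):
        """Find-S: minimally generalize the most specific hypothesis
        to cover every positive example; negative examples are ignored."""
        h = ['NONE'] * n_attributes
        for x, label in examples:
            if label != 'Yes':
                continue
            for i, value in enumerate(x):
                if h[i] == 'NONE':
                    h[i] = value       # first positive example: copy its values
                elif h[i] != value:
                    h[i] = '?'         # conflicting value: generalize to '?'
        return tuple(h)

    train = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
             (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 'Yes'),
             (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 'No'),
             (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 'Yes')]
    print(find_s(train))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')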

  10. Machine learning applications • Until now: a toy example, deciding whether X enjoys sport given the current conditions and the forecast • Practical problems: • Part-of-speech tagging. How? • Word sense disambiguation • Text categorization • Chunking • … • Any problem that can be modeled through examples can support learning

  11. Machine learning algorithms • Concept learning via searching on general-to-specific hypotheses • Decision tree learning • Instance-based learning • Rule-based learning • Neural networks • Bayesian learning • Genetic algorithms

  12. Basic elements of information theory • How to determine which attribute is the best classifier? • Measure the information gain of each attribute • Entropy characterizes the (im)purity of an arbitrary collection of examples • Given a collection S with a proportion p of positive and q = 1 - p of negative examples: • Entropy(S) = - p log2 p - q log2 q (base-2 logarithm, with the convention 0 log2 0 = 0) • Entropy is at its maximum (1) when p = q = ½ • Entropy is at its minimum (0) when p = 1 and q = 0 • Example: • S contains 14 examples: 9 positive and 5 negative • Entropy(S) = - (9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.94
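
The slide's numbers are easy to reproduce; a minimal sketch, not from the slides:

    from math import log2

    def entropy(p, q):
        """Entropy of a two-class collection; 0 * log2(0) is taken as 0."""
        return -sum(x * log2(x) for x in (p, q) if x > 0)

    print(entropy(9/14, 5/14))  # ~0.940, the value on the slide
    print(entropy(0.5, 0.5))    # 1.0, the maximum
    print(entropy(1.0, 0.0))    # 0.0, the minimum (Python may print -0.0)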

  13. Basic elements of information theory • Information gain • Measures the expected reduction in entropy caused by partitioning the examples according to an attribute A: • Gain(S, A) = Entropy(S) - Σv ∈ Values(A) (|Sv| / |S|) · Entropy(Sv), where Sv is the subset of S for which A has value v • Many learning algorithms make decisions based on information gain
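
A sketch of the gain computation for nominal attributes, matching the formula above (the function names are mine, not from the slides):

    from collections import Counter
    from math import log2

    def entropy_of(labels):
        """Entropy of a collection, given its list of class labels."""
        total = len(labels)
        return -sum((c / total) * log2(c / total)
                    for c in Counter(labels).values())

    def information_gain(examples, labels, attr):
        """Expected entropy reduction from partitioning on attribute `attr`
        (an index into each example's feature tuple)."""
        total = len(examples)
        gain = entropy_of(labels)
        for v in set(x[attr] for x in examples):
            subset = [lab for x, lab in zip(examples, labels) if x[attr] == v]
            gain -= (len(subset) / total) * entropy_of(subset)
        return gain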

  14. Basic elements of information theory

  15. Decision trees

  16. Decision trees

  17. Decision trees • Have the capability of generating rules: • IF outlook = sunny AND temperature = hot • THEN play tennis = no • Powerful: deriving such rules by hand would be very difficult • C4.5 (Quinlan) • ID3 • Integral part of MLC++ • Integral part of Weka (in Java)
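
Outside of Weka and the original C4.5 code, the same idea is easy to try in scikit-learn, which can print a trained tree as nested rules. A sketch using the golf data from slide 23; encoding outlook as 0/1/2 is a simplification of my own, since scikit-learn trees split numerically rather than on nominal values the way C4.5 does:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Golf data from slide 23: (outlook, temperature, humidity, windy),
    # with outlook encoded 0=sunny, 1=overcast, 2=rain and windy as 0/1.
    X = [[0, 85, 85, 0], [0, 80, 90, 1], [1, 83, 78, 0], [2, 70, 96, 0],
         [2, 68, 80, 0], [2, 65, 70, 1], [1, 64, 65, 1], [0, 72, 95, 0],
         [0, 69, 70, 0], [2, 75, 80, 0], [0, 75, 70, 1], [1, 72, 90, 1],
         [1, 81, 75, 0], [2, 71, 80, 1]]
    y = ["no", "no", "yes", "yes", "yes", "no", "yes",
         "no", "yes", "yes", "yes", "yes", "yes", "no"]

    tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
    print(export_text(
        tree, feature_names=["outlook", "temperature", "humidity", "windy"]))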

  18. Instance-based algorithms • Distance between examples • Remember the WSD algorithm? • K-nearest neighbour • Given a set of examples X, each represented as a feature vector (a1(x), a2(x), …, an(x)) • Classify a new instance based on the distance between the current example and all the examples in training
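
A minimal k-nearest-neighbour sketch (unweighted Euclidean distance and majority vote; the toy training set below is illustrative, reusing the temperature/humidity columns of the golf data):

    from collections import Counter

    def knn_classify(train, query, k=3):
        """Label `query` by majority vote among its k nearest training examples."""
        def dist2(a, b):  # squared Euclidean distance between feature vectors
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        nearest = sorted(train, key=lambda ex: dist2(ex[0], query))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    train = [((85, 85), 'no'), ((80, 90), 'no'), ((83, 78), 'yes'),
             ((70, 96), 'yes'), ((68, 80), 'yes'), ((65, 70), 'no')]
    print(knn_classify(train, (75, 80)))  # 'yes': 2 of its 3 nearest are positive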

  19. Instance-based algorithms • Take into account every single example: • Advantage? Disadvantage? • “Do not forget exceptions” • Very good for NLP tasks: • WSD • POS tagging

  20. Measuring learning performance • Error on test data • Sample error: wrong cases / total cases, measured on the test data • True error (generalization error): an error range estimated starting from the sample error • Cross-validation schemes, for more accurate evaluations • 10-fold cross-validation scheme: • Divide the training data into 10 sets • Use one set for testing, and the other 9 sets for training • Repeat 10 times, measure the average accuracy
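
The 10-fold scheme itself is only a few lines; `train_and_test` below is a hypothetical callback that trains on one list of examples, tests on another, and returns the accuracy:

    def cross_validate(examples, train_and_test, k=10):
        """k-fold cross-validation: hold out each fold once for testing,
        train on the rest, and average the resulting accuracies."""
        folds = [examples[i::k] for i in range(k)]  # round-robin split
        scores = []
        for i in range(k):
            test = folds[i]
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            scores.append(train_and_test(train, test))
        return sum(scores) / k

In practice the examples are shuffled (and often stratified by class) before splitting.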

  21. Practical issues – Using Weka • Weka: free, open-source software • Java implementation of many learning algorithms • + boosting • + capability of handling very large data sets • + automatic cross-validation • To run an experiment: • supply a training file.arff [test file optional: if not present, Weka evaluates through cross-validation]
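
With a Weka 3 distribution, the command-line form the slide describes looks roughly as follows; J48 (Weka's reimplementation of C4.5) and the file names are just examples:

    java -cp weka.jar weka.classifiers.trees.J48 -t train.arff              # no test set: 10-fold cross-validation
    java -cp weka.jar weka.classifiers.trees.J48 -t train.arff -T test.arff # evaluate on an explicit test set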

  22. Specify the feature types • Specify the feature types: • Discrete: value drawn from a set of nominal values • Continuous: numeric value • Example: Golf data • Play, Don't Play. | the target attribute • outlook: sunny, overcast, rain. | features • temperature: real. • humidity: real. • windy: true, false.

  23. Weather Data • sunny, 85, 85, false, Don't Play • sunny, 80, 90, true, Don't Play • overcast, 83, 78, false, Play • rain, 70, 96, false, Play • rain, 68, 80, false, Play • rain, 65, 70, true, Don't Play • overcast, 64, 65, true, Play • sunny, 72, 95, false, Don't Play • sunny, 69, 70, false, Play • rain, 75, 80, false, Play • sunny, 75, 70, true, Play • overcast, 72, 90, true, Play • overcast, 81, 75, false, Play • rain, 71, 80, true, Don't Play
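
In Weka's ARFF format, the golf data above would look roughly as follows (the relation name is illustrative, only the first rows are shown, and the class labels are renamed to yes/no to sidestep quoting the apostrophe in "Don't Play"):

    @relation golf
    @attribute outlook {sunny, overcast, rain}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {true, false}
    @attribute play {yes, no}
    @data
    sunny,85,85,false,no
    sunny,80,90,true,no
    overcast,83,78,false,yes
    rain,70,96,false,yes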

  24. Running Weka • Check “Short Intro to Weka”
