
Introduction to Machine Learning


Presentation Transcript


  1. Introduction to Machine Learning

  2. Learning • Agent has made observations (data) • Now must make sense of them (hypotheses) • Hypotheses alone may be important (e.g., in basic science) • For inference (e.g., forecasting) • To take sensible actions (decision making) • A basic component of economics, the social and hard sciences, engineering, …

  3. Last time • Going from observed data to unknown hypothesis • 3 types of statistical learning techniques • Bayesian inference • Maximum likelihood • Maximum a posteriori • Applied to learning: • Candy bag example (5 discrete hypotheses) • Coin flip probability (infinite hypotheses from 0 to 1)

  4. Bayesian View of Learning • P(hi|d) = α P(d|hi) P(hi) is the posterior • (Recall, 1/α = P(d) = Σi P(d|hi) P(hi)) • P(d|hi) is the likelihood • P(hi) is the hypothesis prior • Candy bag hypotheses: h1: 100% cherry / 0% lime; h2: 75% cherry / 25% lime; h3: 50% cherry / 50% lime; h4: 25% cherry / 75% lime; h5: 0% cherry / 100% lime
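
A minimal sketch of this update for the five candy-bag hypotheses. The prior (0.1, 0.2, 0.4, 0.2, 0.1) is an assumption borrowed from the textbook version of the example; the slide does not state it.

```python
# Bayesian update over the five candy-bag hypotheses.
prior = [0.1, 0.2, 0.4, 0.2, 0.1]        # assumed prior, not stated on the slide
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]     # P(lime | h_i) for h1..h5

def posterior(observations, prior=prior):
    """P(h_i | d) = alpha * P(d | h_i) * P(h_i); observations are 'lime'/'cherry'."""
    post = list(prior)
    for obs in observations:
        for i, pl in enumerate(p_lime):
            post[i] *= pl if obs == "lime" else (1.0 - pl)
    alpha = 1.0 / sum(post)               # normalizing constant 1/P(d)
    return [alpha * p for p in post]

print(posterior(["lime"] * 5))            # posterior mass shifts toward h5 (all lime)
```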

  5. Bayesian vs. Maximum Likelihood vs. Maximum a Posteriori • Bayesian reasoning requires thinking about all hypotheses • ML and MAP just try to get the “best” • ML ignores prior information • MAP uses it • Smooths out the estimate for small datasets • All are asymptotically equivalent given large enough datasets • Prediction: compare P(X|d) (Bayesian), P(X|hML) (ML), and P(X|hMAP) (MAP)

  6. Learning Bernoulli Distributions • Example data • ML estimates
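
The slide's example data and figures are not recoverable; as a hedged sketch, the ML estimate of a Bernoulli parameter is just the observed fraction of successes:

```python
# ML estimate for a Bernoulli parameter: theta_ML = (# heads) / (# flips).
flips = [1, 0, 1, 1, 0, 1, 1, 0]           # illustrative data, not from the slide
theta_ml = sum(flips) / len(flips)
print(f"theta_ML = {theta_ml:.3f}")        # fraction of 1s in the sample
```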

  7. Maximum Likelihood for BN • For any BN, the ML parameters of any CPT can be derived as the fraction of observed values in the data • Example (Earthquake/Burglar/Alarm network, N = 1000 samples): 200 burglaries and 500 earthquakes, so P(B) = 0.2 and P(E) = 0.5 • Alarm CPT from counts: A|E,B: 19/20; A|B: 188/200; A|E: 170/500; A|¬E,¬B: 1/380
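
A hedged sketch of the counting recipe (the record format and field names are invented for illustration; this is not the slide's actual dataset):

```python
# ML estimation of a CPT row by counting, for the Alarm node.
from collections import Counter

def ml_cpt_alarm(records):
    """P(A=1 | E, B) = count(A=1, E, B) / count(E, B) for each (E, B) combination."""
    joint, cond = Counter(), Counter()
    for r in records:                       # r is a dict like {"E": 0, "B": 1, "A": 1}
        key = (r["E"], r["B"])
        cond[key] += 1
        if r["A"]:
            joint[key] += 1
    return {key: joint[key] / cond[key] for key in cond}

data = [{"E": 1, "B": 1, "A": 1}, {"E": 0, "B": 1, "A": 1}, {"E": 0, "B": 0, "A": 0}]
print(ml_cpt_alarm(data))                   # fraction of alarms for each (E, B) pair
```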

  8. Maximum A Posteriori with Beta Priors • Example data • MAP estimates assuming a Beta prior with a=b=3 • virtual counts 2H, 2T
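
As a hedged sketch, with a Beta(a, b) prior the MAP estimate is (heads + a − 1) / (N + a + b − 2); a = b = 3 contributes the slide's virtual counts of 2 heads and 2 tails:

```python
# MAP estimate for a Bernoulli parameter under a Beta(a, b) prior.
def map_bernoulli(heads, n, a=3, b=3):
    return (heads + a - 1) / (n + a + b - 2)

flips = [1, 0, 1, 1, 0]                          # illustrative data, not from the slide
print(map_bernoulli(sum(flips), len(flips)))     # (3 + 2) / (5 + 4) ≈ 0.556
```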

  9. Topics in Machine Learning • Techniques: Bayesian learning, decision trees, neural networks, support vector machines, boosting, case-based reasoning, dimensionality reduction, … • Tasks & settings: classification, ranking, clustering, regression, decision-making; supervised, unsupervised, semi-supervised, active, reinforcement learning • Applications: document retrieval, document classification, data mining, computer vision, scientific discovery, robotics, …

  10. What is Learning? • Mostly generalization from experience: “Our experience of the world is specific, yet we are able to formulate general theories that account for the past and predict the future” (M.R. Genesereth and N.J. Nilsson, Logical Foundations of AI, 1987) • Concepts, heuristics, policies • Supervised vs. unsupervised learning

  11. Inductive Learning • Basic form: learn a function from examples • f is the unknown target function • An example is a pair (x, f(x)) • Problem: find a hypothesis h • such that h ≈ f • given a training set of examples D • Instance of supervised learning • Classification task: f(x) ∈ {0,1,…,C} (usually C=1) • Regression task: f(x) ∈ ℝ

  12. Inductive learning method • Construct/adjust h to agree with f on training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting:

  13. Inductive learning method • Construct/adjust h to agree with f on training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting:

  14. Inductive learning method • Construct/adjust h to agree with f on training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting:

  15. Inductive learning method • Construct/adjust h to agree with f on training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting:

  16. Inductive learning method • Construct/adjust h to agree with f on training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting:

  17. Inductive learning method • Construct/adjust h to agree with f on training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting: h=D is a trivial, but perhaps uninteresting solution (caching)
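
A hedged illustration of the curve-fitting idea on these slides (the plotted data points are not recoverable, so the points below are invented): low-degree and high-degree polynomials can both be made to agree with the training set, and a high enough degree interpolates every point exactly, which is essentially the caching solution h = D.

```python
# Fit polynomials of increasing degree to the same points; degree 5 on six
# points interpolates them exactly (h "caches" D).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])      # invented training inputs
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 4.9])      # invented training targets

for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    residual = np.sum((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training squared error = {residual:.4f}")
```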

  18. Classification Task • The target function f(x) takes on values True and False • An example is positive if f is True, else it is negative • The set X of all examples is the example set • The training set is a subset of X (a small one!)

  19. Logic-Based Inductive Learning • Here, examples (x, f(x)) take on discrete values

  20. Logic-Based Inductive Learning • Here, examples (x, f(x)) take on discrete values • Note that the training set does not say whether an observable predicate is pertinent or not

  21. Rewarded Card Example • Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded” • Background knowledge KB: ((r=1) ∨ … ∨ (r=10)) ⇒ NUM(r); ((r=J) ∨ (r=Q) ∨ (r=K)) ⇒ FACE(r); ((s=S) ∨ (s=C)) ⇒ BLACK(s); ((s=D) ∨ (s=H)) ⇒ RED(s) • Training set D: REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])

  22. Rewarded Card Example • Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded” • Background knowledge KB: ((r=1) ∨ … ∨ (r=10)) ⇒ NUM(r); ((r=J) ∨ (r=Q) ∨ (r=K)) ⇒ FACE(r); ((s=S) ∨ (s=C)) ⇒ BLACK(s); ((s=D) ∨ (s=H)) ⇒ RED(s) • Training set D: REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S]) • Possible inductive hypothesis: h ≡ (NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])) • There are several possible inductive hypotheses
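
A hedged sketch of checking a candidate hypothesis against D (the card encoding and helper names are invented for illustration):

```python
# Check that h: NUM(r) AND BLACK(s) <=> REWARD([r,s]) agrees with every example.
NUM = set(range(1, 11))
BLACK = {"S", "C"}

def h(card):
    rank, suit = card
    return rank in NUM and suit in BLACK

# Training set D from the slide: (card, rewarded?) pairs; J encoded as 11.
D = [((4, "C"), True), ((7, "C"), True), ((2, "S"), True),
     ((5, "H"), False), ((11, "S"), False)]

print(all(h(card) == rewarded for card, rewarded in D))   # True: h is consistent with D
```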

  23. Learning a Logical Predicate (Concept Classifier) • Set E of objects (e.g., cards) • Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD) • Observable predicates A(x), B(x), … (e.g., NUM, RED) • Training set: values of CONCEPT for some combinations of values of the observable predicates

  24. Learning a Logical Predicate (Concept Classifier) • Set E of objects (e.g., cards) • Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD) • Observable predicates A(x), B(x), … (e.g., NUM, RED) • Training set: values of CONCEPT for some combinations of values of the observable predicates • Find a representation of CONCEPT in the form: CONCEPT(x) ⇔ S(A,B, …), where S(A,B,…) is a sentence built with the observable predicates, e.g.: CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))

  25. Hypothesis Space • A hypothesis is any sentence of the form: CONCEPT(x) ⇔ S(A,B, …), where S(A,B,…) is a sentence built using the observable predicates • The set of all hypotheses is called the hypothesis space H • A hypothesis h agrees with an example if it gives the correct value of CONCEPT

  26. Inductive Learning Scheme • Example set X {[A, B, …, CONCEPT]} → Training set D (labeled + / − examples) → Hypothesis space H {[CONCEPT(x) ⇔ S(A,B, …)]} → Inductive hypothesis h

  27. Size of Hypothesis Space • n observable predicates • 2^n entries in the truth table defining CONCEPT, and each entry can be filled with True or False • In the absence of any restriction (bias), there are 2^(2^n) hypotheses to choose from • n = 6 ⇒ 2^64 ≈ 2×10^19 hypotheses!
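
A one-liner confirms the count for n = 6 (illustrative only):

```python
# Number of distinct Boolean concepts over n observable predicates: 2 ** (2 ** n).
n = 6
print(2 ** (2 ** n))          # 18446744073709551616, i.e. about 2e19
```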

  28. Multiple Inductive Hypotheses • h1 ≡ NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s]) • h2 ≡ BLACK(s) ∧ ¬(r=J) ⇔ REWARD([r,s]) • h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇔ REWARD([r,s]) • h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s]) • All agree with all the examples in the training set

  29. Multiple Inductive Hypotheses • Need for a system of preferences – called an inductive bias – to compare possible hypotheses • h1 ≡ NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s]) • h2 ≡ BLACK(s) ∧ ¬(r=J) ⇔ REWARD([r,s]) • h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇔ REWARD([r,s]) • h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s]) • All agree with all the examples in the training set

  30. Notion of Capacity • It refers to the ability of a machine to learn any training set without error • A machine with too much capacity is like a botanist with photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything he has seen before • A machine with too little capacity is like the botanist’s lazy brother, who declares that if it’s green, it’s a tree • Good generalization can only be achieved when the right balance is struck between the accuracy attained on the training set and the capacity of the machine

  31. Keep-It-Simple (KIS) Bias • Examples • Use far fewer observable predicates than the training set • Constrain the learnt predicate, e.g., to use only “high-level” observable predicates such as NUM, FACE, BLACK, and RED and/or to have simple syntax • Motivation • If a hypothesis is too complex, it is not worth learning it (data caching does the job as well) • There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller

  32. Keep-It-Simple (KIS) Bias • Examples • Use far fewer observable predicates than the training set • Constrain the learnt predicate, e.g., to use only “high-level” observable predicates such as NUM, FACE, BLACK, and RED and/or to have simple syntax • Motivation • If a hypothesis is too complex, it is not worth learning it (data caching does the job as well) • There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller • Einstein: “A theory must be as simple as possible, but not simpler than this”

  33. Keep-It-Simple (KIS) Bias • Examples • Use far fewer observable predicates than the training set • Constrain the learnt predicate, e.g., to use only “high-level” observable predicates such as NUM, FACE, BLACK, and RED and/or to have simple syntax • Motivation • If a hypothesis is too complex, it is not worth learning it (data caching does the job as well) • There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller • If the bias allows only sentences S that are conjunctions of k << n predicates picked from the n observable predicates, then the size of H is O(n^k)

  34. Capacity is Not the Only Criterion • Accuracy on the training set isn't the best measure of performance • [Diagram: Example set X ⊃ Training set D → Learn → Hypothesis space H → Test]

  35. Generalization Error • A hypothesis h is said to generalize well if it achieves low error on all examples in X • [Diagram: Example set X → Learn → Hypothesis space H → Test]

  36. Assessing Performance of a Learning Algorithm • Samples from X are typically unavailable • Take out some of the training set • Train on the remaining training set • Test on the excluded instances • Cross-validation
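
A hedged sketch of the hold-out procedure described here (function and parameter names are assumptions, not from the slides):

```python
# Hold-out evaluation: set aside part of the training data, train on the rest,
# and measure error on the held-out instances.
import random

def holdout_error(train_fn, examples, test_fraction=0.25, seed=0):
    """train_fn(training_examples) must return a predictor h(x)."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    h = train_fn(train)
    errors = sum(h(x) != y for x, y in test)
    return errors / len(test)

# Example use with a trivial learner that always predicts the majority label.
def train_majority(train):
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

data = [(i, i % 2 == 0) for i in range(20)]        # toy labeled examples
print(holdout_error(train_majority, data))
```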

  37. Cross-Validation • Split the original set of examples into a training set and a testing set, then train • [Diagram: Examples D → Train → Hypothesis space H]

  38. Cross-Validation • Evaluate the hypothesis on the testing set • [Diagram: Testing set, Hypothesis space H]

  39. Cross-Validation • Evaluate the hypothesis on the testing set • [Diagram: hypothesis predictions vs. labels on the testing set]

  40. Cross-Validation • Compare the true concept against the prediction: 9/13 correct on the testing set

  41. Tennis Example • Evaluate learning algorithm: PlayTennis = S(Temperature, Wind)

  42. Tennis Example • Evaluate learning algorithm: PlayTennis = S(Temperature, Wind) • Trained hypothesis: PlayTennis = (T=Mild or Cool) ∧ (W=Weak) • Training errors = 3/10 • Testing errors = 4/4

  43. Tennis Example • Evaluate learning algorithm: PlayTennis = S(Temperature, Wind) • Trained hypothesis: PlayTennis = (T=Mild or Cool) • Training errors = 3/10 • Testing errors = 1/4

  44. Tennis Example • Evaluate learning algorithm: PlayTennis = S(Temperature, Wind) • Trained hypothesis: PlayTennis = (T=Mild or Cool) • Training errors = 3/10 • Testing errors = 2/4
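
A hedged sketch of how such a hypothesis is scored (the weather records below are invented for illustration; the slide's actual table is not recoverable):

```python
# Count errors for a candidate PlayTennis hypothesis on a set of records.
def h(temperature, wind):
    # Trained hypothesis from slide 42: (T = Mild or Cool) AND (W = Weak).
    return temperature in ("Mild", "Cool") and wind == "Weak"

# Invented records (Temperature, Wind, PlayTennis); not the slide's actual table.
records = [("Mild", "Weak", True), ("Hot", "Strong", False), ("Cool", "Weak", True),
           ("Mild", "Strong", False), ("Hot", "Weak", True)]

errors = sum(h(t, w) != label for t, w, label in records)
print(f"errors = {errors}/{len(records)}")
```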

  45. Ten Commandments of Machine Learning • Thou shalt not: • Train on examples in the testing set • Form assumptions by “peeking” at the testing set, then formulating the inductive bias

  46. Supervised Learning Flow Chart • Target function: the unknown concept we want to approximate • Datapoints: the observations we have seen • Training set → Learner (choice of learning algorithm, hypothesis space) → Inductive hypothesis • Prediction on the test set (observations we will see in the future) gives better quantities to assess performance

  47. How to construct a better learner? • Ideas?

  48. Predicate as a Decision Tree • The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) can be represented by the following decision tree: A? – if False, then False; if True, ask B? – if False, then True; if True, ask C? – if True, then True; if False, then False • Example: a mushroom is poisonous iff it is yellow and small, or yellow, big and spotted • x is a mushroom • CONCEPT = POISONOUS • A = YELLOW • B = BIG • C = SPOTTED

  49. Predicate as a Decision Tree • The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) can be represented by the following decision tree: A? – if False, then False; if True, ask B? – if False, then True; if True, ask C? – if True, then True; if False, then False • Example: a mushroom is poisonous iff it is yellow and small, or yellow, big and spotted • x is a mushroom • CONCEPT = POISONOUS • A = YELLOW • B = BIG • C = SPOTTED • D = FUNNEL-CAP • E = BULKY
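
A hedged sketch of this decision tree as code (the nested tests mirror the tree's order A, B, C; the function name is illustrative):

```python
# Decision tree for CONCEPT(x) <=> A(x) AND (NOT B(x) OR C(x)).
def concept(a: bool, b: bool, c: bool) -> bool:
    if not a:            # A? False branch
        return False
    if not b:            # B? False branch
        return True
    return c             # C? decides the remaining case

# Mushroom reading: A = yellow, B = big, C = spotted, CONCEPT = poisonous.
print(concept(a=True, b=False, c=False))   # yellow and small -> poisonous (True)
print(concept(a=True, b=True, c=False))    # yellow, big, not spotted -> False
```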

  50. Training Set
