
Chapter 6: Machine Learning


Presentation Transcript


  1. Chapter 6: Machine Learning

  2. Objectives At the end of the chapter, you should be able to: 1. Describe how each method can be used to perform classification tasks – given some input features, you should be able to describe how the output classification is determined. 2. Write a program to perform a simple application using the above methods. 3. Compare and contrast the different methods, commenting on their suitability for different kinds of problems.

  3. Topics • Introduction • A Simple Inductive Learning Example • Version Space Learning • Decision Tree Induction • Genetic Algorithms • Neural Networks • Comparing Learning Methods

  4. Introduction • Another fundamental aspect of human intelligence is the ability to learn; it is arguably one of the most crucial characteristics of an intelligent entity. • A system that can learn: • can adapt to change, • can respond to new problems and situations, • can develop more sophisticated rules/reasoning with experience, • may also be easier to program – start off simple, but give the program the ability to learn and improve. • Learning is still an expanding area of AI research. • It overlaps with almost all other areas of AI, for example: • in planning and robotics, there is interest in getting systems to learn rules of behaviour from experience in some environment; • in natural language, a system may learn syntactic rules from example sentences;

  5. Introduction • in vision, a system may learn to recognize some object given some example images; • in expert systems, rules may be learned from example cases. • It is also an area which is attracting interest in industry, with many commercial products available. • For example, there is interest in analyzing data obtained from supermarket loyalty cards in order to find rules that can be used in direct marketing campaigns. • There are many types of learning, e.g.: • Learning from examples (inductive learning): give someone pictures of 10 different tigers. • Learning by “being told”: tell someone “A tiger has a long tail, stripes, short ears, …”. • Learning by analogy: “A tiger is like a big cat with orange stripes and big teeth”. • Learning by discovery. • AI focuses on learning from examples (inductive learning).

  6. Introduction • The inductive learning methods may be used to try to produce a system that automatically produces the correct classification given just the input feature values. • The techniques for inductive learning: • symbolic methods • where the learned concept is represented using symbolic knowledge representation languages; • learning is seen as a search problem. • Two main approaches: • searching the space of possible concepts to find one that matches the examples; • building up the best decision tree to categorize the given examples. • genetic algorithms • based on the notion that good solutions can evolve out of a population, by combining possible solutions to produce “offspring” solutions and “killing off” the weaker of those solutions. • neural networks • loosely based on the architecture of the brain, and a promising approach for certain tasks.

  7. A Simple Inductive Learning Example • Real machine learning applications typically require many hundreds or even thousands of examples in order for interesting knowledge to be learned. • For example, to learn rules to diagnose a particular disease, given that the patient has, say, stomach pains, • data on thousands of patients would be required, • listing the additional symptoms of each patient and • the final diagnoses made by an expert. • The goal of inductive learning is to find some rule or function that lets you draw useful conclusions, e.g.: • a rule letting you determine student performance from entry qualifications; • a rule letting you classify images (into widget/wodget) given some features; • a rule letting you determine whether a word in a sentence is a noun or a verb. • Provide the learning system with data/cases where the result is known: • student results, manually classified images. • The rule should work for new cases.

  8. A Simple Inductive Learning Example Classification • Many learning problems can be expressed as classification problems. • Given various input features (e.g., size of teeth, stripiness, …) the goal is to decide which of a number of classes (or “output categories”) the case falls into (e.g., tiger vs rabbit). • Consider: • character recognition • disease diagnosis • student performance prediction. • All are classification tasks. • One way to formalise this is as looking for a function that maps features to categories: • f(feature1, feature2, …) --> Category • More informally, if we have features such as: • teeth=big, covering=stripy, we want a rule allowing us to conclude that this example is a tiger. • We start with cases where we know the right classification.
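To make the idea of “a function from features to categories” concrete, here is a minimal Python sketch. The feature names (teeth, covering) and the tiger/rabbit categories come from the slide, but the hand-written rule body is only illustrative – in inductive learning it is exactly this rule body that the system should learn from examples rather than have programmed in.

```python
# A minimal sketch of classification as a function f(features) -> category.
# The decision rule here is hand-written purely for illustration.

def classify(teeth: str, covering: str) -> str:
    """Map input features to an output category ('tiger' or 'rabbit')."""
    if teeth == "big" and covering == "stripy":
        return "tiger"
    return "rabbit"

print(classify(teeth="big", covering="stripy"))    # -> tiger
print(classify(teeth="small", covering="smooth"))  # -> rabbit
```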

  9. A Simple Inductive Learning Example Example: The student problem • Suppose we have data on a number of students in last year’s class, and are trying to find a rule that will allow us to determine whether current students are likely to get a first-class degree mark. We will assume the data that we have on last year’s students include: • whether they got the equivalent of a first-class mark last year; • whether they work hard; • whether they are male or female; • whether they go out drinking a lot. • For each, we also know whether they did in fact get a first. • The ones that did are referred to as positive examples, while the ones that didn’t are referred to as negative examples. • Six such students are to be considered.

  10. A Simple Inductive Learning Example Student Exam Performance Data

  Student   First Last Year?   Male?   Works Hard?   Drinks?   First This Year?
  Richard   yes                yes     no            yes       no    (negative example)
  Alan      yes                yes     yes           no        yes   (positive example)
  Alison    no                 no      yes           no        no    (negative example)
  Jeff      no                 yes     no            yes       no    (negative example)
  Gail      yes                no      yes           yes       yes   (positive example)
  Simon     no                 yes     yes           yes       no    (negative example)

  • The table shows that the two people who got firsts (Alan and Gail) both got a first last year and work hard, and that none of the people who failed to get a first both did well last year and work hard. • So a reasonable learned rule is that if you did well last year and work hard this year, you should do OK. • However, other rules are possible. For example: if you EITHER are male and don’t drink OR are female and drink a lot, then you’ll do well.
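For the later worked examples, the table can be encoded directly in Python. This is just one possible encoding (a list of (name, features, label) tuples) using the feature letters L, M, W and D introduced on the next slide; the representation itself is an assumption, not something from the slides.

```python
# The student data from the table above, with features
# L (first last year), M (male), W (works hard), D (drinks);
# the final boolean is the class label (True = got a first this year).
students = [
    ("Richard", {"L": True,  "M": True,  "W": False, "D": True},  False),
    ("Alan",    {"L": True,  "M": True,  "W": True,  "D": False}, True),
    ("Alison",  {"L": False, "M": False, "W": True,  "D": False}, False),
    ("Jeff",    {"L": False, "M": True,  "W": False, "D": True},  False),
    ("Gail",    {"L": True,  "M": False, "W": True,  "D": True},  True),
    ("Simon",   {"L": False, "M": True,  "W": True,  "D": True},  False),
]
```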

  11. A Simple Inductive Learning Example • The second rule is a little odd, and more complex than the other rule. • Generally the best rule, getting most predictions right, will be the simplest one, as it tends to capture generalities (hard-working students do well). • In the example there are four attributes (or features) to focus on: first last year, male/female, works hard, and drinks. All of these have yes/no answers – known as feature values. • We use the letters L (first last year), M (male), W (works hard) and D (drinks) to represent the features, and write the feature values as Ts and Fs. • So, Richard’s feature values correspond to the row TTFT (reading the table columns in the order L, M, W, D). • The fact that a student doesn’t drink but does work hard can also be represented as W ∧ ¬D.

  12. Version Space Learning • Treats learning as a search problem. • As a search problem, we are looking to search through all possible rules/functions, and see which best accounts for our example data. • Example: given data on 10 tigers and non-tigers, search through all possible rules and find the one that (most) correctly classifies all the examples. • We want to find the simplest such rule that covers the example data. • If we make the simplifying assumption that the rule to be learned involves a conjunction of facts, then there is a fairly limited number of possible rules. • For example, a rule only involving AND and not OR. • A rule like “If they work hard and don’t drink a lot they’ll get a first” can be represented as W ∧ ¬D. • In logic terms, if this formula is true for a particular student, then the rule says that it is true that they will get a first. • A rule “Everyone will get a first” is represented by T (always true). • A rule “No one will get a first” is represented by F.
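A conjunctive rule such as W ∧ ¬D can be represented very simply in code. The sketch below is an assumed representation (not from the slides): a dict of required feature values stands for a conjunction, the empty dict plays the role of T and None plays the role of F.

```python
# A conjunctive rule like W ∧ ¬D ("works hard and doesn't drink") as a dict of
# required feature values; {} is T (no constraints), None is F (true for no one).

def rule_holds(rule, example) -> bool:
    """True iff every constraint in the rule is satisfied by the example."""
    if rule is None:          # the rule F is true for nothing
        return False
    return all(example[feature] == value for feature, value in rule.items())

alan = {"L": True, "M": True, "W": True, "D": False}
print(rule_holds({"W": True, "D": False}, alan))  # W ∧ ¬D -> True
print(rule_holds({}, alan))                       # T      -> True
print(rule_holds(None, alan))                     # F      -> False
```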

  13. Version Space Learning • Example: data consisting of positive and negative examples (tigers and non-tigers). • Assume the rule is just a conjunction of required features (e.g., stripy AND longTail -> tiger). • Which is the simplest such rule that predicts tigers given this data? • How do we search through all possible rules to find the best one? • In this (much simplified) example we could just go through all the possible rules systematically. • But in general the “search space” is very large and this is impractical. • So inductive learning is partly about clever ways of managing the search for possible rules.

  14. Version Space Learning • All possible rules can be represented as a graph, where: • the top/upper node is T • the bottom node is F • edges link each node to lower nodes containing more specific rules: the same rule as the upper node but with one additional condition. • For example, W ∧ ¬D adds the condition ¬D (“doesn’t drink”) to the node W. • (Figure: part of the search space for the student problem.)

  15. Version Space Learning: Positive Examples • The learning task involves searching the graph to find possible rules that fit the given example data. • This can be done by: • maintaining some candidate hypotheses about what the right rules could be, • going through the examples one by one, modifying these hypotheses to fit the current example. • A simple version of the approach: • considers only positive examples, • updates a hypothesis S representing a possible rule covering the examples seen so far. • S will always be the most specific formula that is true for the examples looked at so far. • One formula is more specific than another if it is true for fewer possible examples, and will be below it in the graph. • S = F when no examples have been examined. • For each positive example we move S up the graph until a formula is found that is true for that example. If more than one such formula is found, the more specific one is chosen.

  16. Version Space Learning: Positive Examples Find-S Algorithm: 1. Initialize h to the most specific hypothesis in H. 2. For each positive training instance x (example): • For each attribute constraint ai in h: if the constraint ai is satisfied by x, then do nothing; else replace ai in h by the next more general constraint that is satisfied by x. 3. Output hypothesis h.
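The following is a minimal Python sketch of Find-S for yes/no features, using the dict-based hypothesis representation assumed earlier: None stands for the maximally specific hypothesis F, and generalisation simply drops any constraint that the new positive example contradicts. The function name and representation are assumptions for illustration, not from the slides.

```python
# A sketch of Find-S for boolean features.
# Hypothesis = dict of required feature values; None plays the role of F.

def find_s(positive_examples):
    """Return the most specific conjunctive hypothesis covering all positives."""
    s = None  # start at F: covers nothing
    for example in positive_examples:
        if s is None:
            # Generalise F to the conjunction of all this example's literals.
            s = dict(example)
        else:
            # Drop every constraint the new positive example contradicts.
            s = {f: v for f, v in s.items() if example[f] == v}
    return s
```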

  17. Version Space Learning: Positive Examples For the Student Problem: • Positive example: Alan, with feature values TTTF ≡ <L, M, W, ¬D> • Initialize S to the most specific hypothesis: S = F ≡ {<∅, ∅, ∅, ∅>} • Are the elements of S satisfied by the example? None are, so replace the elements of S by the elements of the example: S = L ∧ M ∧ W ∧ ¬D • Only people who got a first last year, work hard, are male AND don’t drink will do well this year.

  18. Version Space Learning: Positive Examples • Gail, with feature values TFTT ≡ <L, ¬M, W, D> • Previous S = L ∧ M ∧ W ∧ ¬D ≡ {<L, ?, ?, ?>, <?, M, ?, ?>, <?, ?, W, ?>, <?, ?, ?, ¬D>} • Looking up the graph, the parts of S that remain true for this example are {<L, ?, ?, ?>, <?, ?, W, ?>}, so a possible new hypothesis is S = L ∧ W • This is the most specific formula that fits; it drops M and ¬D from S. • Other more general formulae are possible given these positive examples (e.g., W) but the most specific one is chosen.
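Running the Find-S sketch from the previous slide on the two positive examples (Alan, then Gail) reproduces the hypothesis S = L ∧ W described above.

```python
# Usage of the find_s sketch on the two positive examples from the slides.
alan = {"L": True, "M": True, "W": True, "D": False}   # TTTF
gail = {"L": True, "M": False, "W": True, "D": True}   # TFTT

print(find_s([alan]))        # {'L': True, 'M': True, 'W': True, 'D': False}
print(find_s([alan, gail]))  # {'L': True, 'W': True}   i.e. S = L ∧ W
```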

  19. Version Space Learning: Negative Examples • It is also possible to consider just the negative examples. • Starting at the top of the graph (T), set the current hypothesis set initially to G = {T}. • This set will contain the most general formulae that are false for the negative examples. • When a negative example is considered, we move down the graph to find formulae that are false for that example. • The most general such formulae are chosen, but there may be more than one equally general formula.
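The “move down the graph” step can itself be sketched in Python: each minimal specialisation adds one literal that is false for the negative example. The helper name specialise is assumed for illustration. Applied to the hypothesis T and the negative example Richard, it yields exactly the set ¬L, ¬M, W, ¬D used on the next slide.

```python
# A sketch of one "move down the graph" step for a negative example.

def specialise(hypothesis, negative_example):
    """Yield minimal specialisations of `hypothesis` that exclude the example."""
    for feature, value in negative_example.items():
        if feature not in hypothesis:
            # Add the opposite literal, which the negative example fails.
            new = dict(hypothesis)
            new[feature] = not value
            yield new

richard = {"L": True, "M": True, "W": False, "D": True}  # TTFT, negative
# Starting from T (no constraints), the specialisations are ¬L, ¬M, W, ¬D:
print(list(specialise({}, richard)))
# -> [{'L': False}, {'M': False}, {'W': True}, {'D': False}]
```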

  20. Version Space Learning: Negative Examples For the Student Problem: • Negative examples: initially G = {T} • Richard, with feature values TTFT ≡ (L, M, ¬W, D) • Moving down the graph, starting at T, a possible hypothesis set is G = {¬L, ¬M, W, ¬D} • From the evidence so far, possible rules are that you’ll get a first if you didn’t last year, if you work hard, if you’re female, or if you don’t go drinking. • Alison, with feature values FFTF ≡ (¬L, ¬M, W, ¬D) • The literals false for this negative example are L, M, ¬W and D. • Previous G = {¬L, ¬M, W, ¬D} • A new possible hypothesis set is G = {¬L ∧ M, ¬L ∧ ¬W, ¬L ∧ D, L ∧ ¬M, L ∧ W, L ∧ ¬D, ¬M ∧ ¬W, ¬M ∧ D, W ∧ M, W ∧ D, ¬D ∧ ¬W}

  21. Version Space Learning: Negative Examples • Jeff, with feature values FTFT ≡ (¬L, M, ¬W, D) • The literals false for this negative example are L, ¬M, W and ¬D. • Previous G = {¬L ∧ M, ¬L ∧ ¬W, ¬L ∧ D, L ∧ ¬M, L ∧ W, L ∧ ¬D, ¬M ∧ ¬W, ¬M ∧ D, W ∧ M, W ∧ D, ¬D ∧ ¬W} • A new possible hypothesis set is G = {¬L ∧ M ∧ W, ¬L ∧ ¬M ∧ ¬W, ¬L ∧ D ∧ ¬M, L ∧ W ∧ ¬D, L ∧ ¬D ∧ ¬M, L ∧ ¬M ∧ D, L ∧ W ∧ M, L ∧ W ∧ D, L ∧ ¬D ∧ ¬W, … etc.} • Simon, with feature values FTTT ≡ (¬L, M, W, D) • The literals false for this negative example are L, ¬M, ¬W and ¬D. • Previous G = {¬L ∧ M ∧ W, ¬L ∧ ¬M ∧ ¬W, ¬L ∧ D ∧ ¬M, L ∧ W ∧ ¬D, L ∧ ¬D ∧ ¬M, L ∧ ¬M ∧ D, L ∧ W ∧ M, L ∧ W ∧ D, L ∧ ¬D ∧ ¬W, … etc.} • Moving down the graph, a possible hypothesis set is G = {¬L ∧ M ∧ W, ¬L ∧ ¬M ∧ ¬W, ¬L ∧ D ∧ ¬M, L ∧ W ∧ ¬D, L ∧ ¬D ∧ ¬M, L ∧ ¬M ∧ D, L ∧ W ∧ M, L ∧ W ∧ D, L ∧ ¬D ∧ ¬W, … etc.}

  22. Version Space Learning: Negative Examples At the end of this process there are many hypotheses (most not explicitly considered above), including the two rules that you can get: • a first by NOT getting one last year, NOT working hard and being female; • the more sensible rule that you can get one if you got a first last year AND work hard. • In general it is best to consider both positive and negative examples. • All the examples can be used to check the current hypotheses. • A maximally specific hypothesis S is maintained as well as the maximally general hypothesis set G. • At the end, the algorithm should give us the range of possible rules, from the most general ones to the most specific. • If, at the end of processing, G has only one element and that equals S, then we can be sure there is only one rule that fits the given facts.

  23. Version Space Learning: Candidate Elimination Algorithm The algorithm that combines the techniques for positive and negative examples is referred to as the candidate elimination algorithm. Candidate Elimination Algorithm: 1. Initialize so that G = {T} ≡ {<?, ?, ?, ?>} and S = F ≡ {<∅, ∅, ∅, ∅>} 2. For each example E: • If it is a positive example then: • If S is false for E, look UP the graph from S and replace S with the first formula found which is true for E. • Delete any elements of G which aren’t true for E. • If it is a negative example then: • If any formulae in G are true for E, look DOWN the graph and replace them with the first formulae found which are false for E. • Delete any elements of G more specific than S.
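Putting the two directions together, here is a compact Python sketch of the candidate elimination algorithm under the same simplifying assumptions (hypotheses are pure conjunctions of yes/no features, represented as dicts). One deliberate difference from the slide's wording: when specialising G it keeps only hypotheses at least as general as S, which is the standard version-space condition; because of this the intermediate G sets can differ slightly from the slide trace, although the final result for the student data is the same.

```python
# A sketch of candidate elimination for boolean features.
# Hypothesis = dict of required feature values; {} is T, None stands for F.

def covers(h, x):
    """True iff the conjunction h is true for example x."""
    return all(x[f] == v for f, v in h.items())

def specialise(h, x):
    """Minimal specialisations of h that are false for the negative example x."""
    return [dict(h, **{f: not v}) for f, v in x.items() if f not in h]

def at_least_as_general(h1, h2):
    """h1 is at least as general as h2 (h1's constraints are a subset of h2's)."""
    return all(h2.get(f) == v for f, v in h1.items())

def candidate_elimination(examples):
    """examples: iterable of (feature_dict, is_positive) pairs."""
    S = None   # F: the maximally specific hypothesis (true for nothing)
    G = [{}]   # T: the maximally general hypothesis (true for everything)
    for x, positive in examples:
        if positive:
            # Generalise S just enough to cover x ...
            S = dict(x) if S is None else {f: v for f, v in S.items() if x[f] == v}
            # ... and delete members of G that are not true for x.
            G = [g for g in G if covers(g, x)]
        else:
            # Specialise every member of G that wrongly covers x.
            new_G = []
            for g in G:
                new_G.extend(specialise(g, x) if covers(g, x) else [g])
            # Keep only hypotheses at least as general as S.
            G = [g for g in new_G if S is None or at_least_as_general(g, S)]
    return S, G
```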

  24. Version Space Learning: Candidate Elimination Algorithm Apply the Candidate Elimination Algorithm to the Student Problem example. 1. Initialize so that G = {T} ≡ {<?, ?, ?, ?>} and S = F ≡ {<∅, ∅, ∅, ∅>} 2. For each example E: 1. The first example (Richard, feature values TTFT ≡ <L, M, ¬W, D>) is negative. • The only element in G is currently T, which is always true. • We look DOWN the graph from T. • We find that four nodes have formulae false for that example: ¬L, ¬M, W, ¬D. • Add all of them to G, replacing T: G = {¬L, ¬M, W, ¬D} ≡ {<F,?,?,?>, <?,F,?,?>, <?,?,T,?>, <?,?,?,F>} S = F ≡ {<∅, ∅, ∅, ∅>} • No element of G is more specific than S, so no element of G is deleted.

  25. Version Space Learning: Candidate Elimination Algorithm 2. The second example (Alan, feature values TTTF ≡ <L, M, W, ¬D>) is positive. • S is false for this example. • We look UP the graph from F to find a formula that is true for it. • Replace S with the first formula found which is true for E. • The one chosen is L ∧ M ∧ W ∧ ¬D. Previous S = F ≡ {<∅, ∅, ∅, ∅>} New S = {L ∧ M ∧ W ∧ ¬D} ≡ {<T,?,?,?>, <?,T,?,?>, <?,?,T,?>, <?,?,?,F>} • This is the most specific such formula, i.e. the first one found looking up the graph from the previous formula.

  26. Version Space Learning: Candidate Elimination Algorithm • Delete any elements of G which aren’t true for E: G = {¬L, ¬M, W, ¬D} ≡ {<F,?,?,?>, <?,F,?,?>, <?,?,T,?>, <?,?,?,F>} • We delete ¬L and ¬M from G, since they are not true for this example. So G becomes: G = {W, ¬D} S = L ∧ M ∧ W ∧ ¬D

  27. Version Space Learning: Candidate Elimination Algorithm 3. The third example (Alison, feature values FFTF ≡ <¬L, ¬M, W, ¬D>) is negative. • Previous elements of G and S: G = {W, ¬D} S = L ∧ M ∧ W ∧ ¬D • All formulae in G are true for this example. • We look DOWN the graph from the formulae currently in G. • Four literals are false for this example: L, M, ¬W, D. • Replace the formulae in G with the first formulae found which are false for the example: G = {<L ∧ W>, <L ∧ ¬D>, <M ∧ W>, <M ∧ ¬D>, <¬W ∧ ¬D>, <D ∧ W>} ≡ {<T,?,T,?>, <T,?,?,F>, <?,T,T,?>, <?,T,?,F>, <?,?,F,F>, <?,?,T,T>} • No elements of G are more specific than S, so none are deleted.

  28. Version Space Learning: Candidate Elimination Algorithm 4. The fourth example (Jeff, feature values FTFT ≡ <¬L, M, ¬W, D>) is negative. • Previous elements of G and S: G = {<L ∧ W>, <L ∧ ¬D>, <M ∧ W>, <M ∧ ¬D>, <¬W ∧ ¬D>, <D ∧ W>} ≡ {<T,?,T,?>, <T,?,?,F>, <?,T,T,?>, <?,T,?,F>, <?,?,F,F>, <?,?,T,T>} S = L ∧ M ∧ W ∧ ¬D • No formulae in G are true for this example, so nothing needs to be specialised. • No elements of G are more specific than S, so none are deleted and G stays the same: G = {<L ∧ W>, <L ∧ ¬D>, <M ∧ W>, <M ∧ ¬D>, <¬W ∧ ¬D>, <D ∧ W>} ≡ {<T,?,T,?>, <T,?,?,F>, <?,T,T,?>, <?,T,?,F>, <?,?,F,F>, <?,?,T,T>}

  29. Version Space Learning: Candidate Elimination Algorithm 5. The fifth example (Gail, feature values TFTT ≡ <L, ¬M, W, D>) is positive. • Previous elements of G and S: G = {<L ∧ W>, <L ∧ ¬D>, <M ∧ W>, <M ∧ ¬D>, <¬W ∧ ¬D>, <D ∧ W>} ≡ {<T,?,T,?>, <T,?,?,F>, <?,T,T,?>, <?,T,?,F>, <?,?,F,F>, <?,?,T,T>} S = L ∧ M ∧ W ∧ ¬D • Is S false for this example? • Yes, so replace S with the first formula found (looking up the graph) which is true for the example: S = {<L ∧ W>} ≡ {<T,?,T,?>} • We delete <L ∧ ¬D>, <M ∧ W>, <M ∧ ¬D> and <¬W ∧ ¬D> from G, since they are not true for this example. So G becomes: G = {<L ∧ W>, <D ∧ W>} ≡ {<T,?,T,?>, <?,?,T,T>}

  30. Version Space Learning: Candidate Elimination Algorithm 6. The sixth example (Simon, feature values FTTT ≡ <¬L, M, W, D>) is negative. • Previous elements of G and S: G = {<L ∧ W>, <D ∧ W>} ≡ {<T,?,T,?>, <?,?,T,T>} S = {<L ∧ W>} ≡ {<T,?,T,?>} • Are any formulae in G true for this example? • Yes (D ∧ W is), so we look DOWN the graph to replace it; the resulting formulae are more specific than S and are deleted, leaving G = {<L ∧ W>} ≡ {<T,?,T,?>} • None of the remaining elements of G are more specific than S, so nothing else is deleted. • After the final negative example (Simon) is considered we get: G = {L ∧ W} S = L ∧ W • This allows us to conclude for sure that the only rule of the type considered (a simple conjunction of features) is the one given: you’ll get a first if you got one last year and work hard.
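As a check, running the candidate_elimination sketch from the algorithm slide on all six students (in the order Richard, Alan, Alison, Jeff, Gail, Simon) ends with S and G both equal to L ∧ W, matching the conclusion above.

```python
# Usage of the candidate_elimination sketch on the full student data set.
examples = [
    ({"L": True,  "M": True,  "W": False, "D": True},  False),  # Richard
    ({"L": True,  "M": True,  "W": True,  "D": False}, True),   # Alan
    ({"L": False, "M": False, "W": True,  "D": False}, False),  # Alison
    ({"L": False, "M": True,  "W": False, "D": True},  False),  # Jeff
    ({"L": True,  "M": False, "W": True,  "D": True},  True),   # Gail
    ({"L": False, "M": True,  "W": True,  "D": True},  False),  # Simon
]
S, G = candidate_elimination(examples)
print(S)  # {'L': True, 'W': True}    -> S = L ∧ W
print(G)  # [{'W': True, 'L': True}]  -> G = {L ∧ W}
```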

  31. Version Space Learning: Candidate Elimination Algorithm Summary • Inductive learning is about finding general rules from examples. • Often used for classification tasks – what kind of thing is it, given some features?
