Artificial Intelligence

Universitatea Politehnica Bucuresti, 2007-2008
Adina Magda Florea
http://turing.cs.pub.ro/ai_07

Course No. 10, 11
Machine learning
• Types of learning
• Learning by decision trees
• Learning disjunctive concepts
• Learning in version space

1. Types of learning
Specific inferences
• Inductive inference
• Abductive inference
• Analogical inference
• Uda(iarba), i.e. Wet(grass)
• (∀x)(PlouaPeste(x) → Uda(x)), i.e. (∀x)(RainsOn(x) → Wet(x))

General structure of a learning system
[Figure: block diagram with the labels Teacher, Learning system, Learning Process, Feed-back, Data, Learning results, Problem Solving, K & B, Inferences, Strategy, Environment, Results, Performance Evaluation, Feed-back]

Types of learning
• Learning through memorization
• Learning through instruction / operationalization
• Learning through induction (from examples)
• Learning through analogy

2. Decision trees. ID3 algorithm
• Inductive learning
• Learns concept descriptions from examples
• Examples (instances of concepts) are defined by attributes and assigned to classes
• Concepts are represented as a decision tree in which every internal node is associated with an attribute
• The leaves are labeled with concepts

Building and using the decision tree
• First build the decision tree from examples
• Label leaves with YES or NO (one class) or with the class (Ci)
• Unknown instances are then classified by following a path in the decision tree according to the values of the attributes

Example

Another example: Credit evaluation

No.  Risk (Classification)  Credit History  Debt  Collateral  Income
1    High                   Bad             High  None        $0 to $15k
2    High                   Unknown         High  None        $15k to $35k
3    Moderate               Unknown         Low   None        $15k to $35k
4    High                   Unknown         Low   None        $0 to $15k
5    Low                    Unknown         Low   None        Over $35k
6    Low                    Unknown         Low   Adequate    Over $35k
7    High                   Bad             Low   None        $0 to $15k
8    Moderate               Bad             Low   Adequate    Over $35k
9    Low                    Good            Low   None        Over $35k
10   Low                    Good            High  Adequate    Over $35k
11   High                   Good            High  None        $0 to $15k
12   Moderate               Good            High  None        $15k to $35k
13   Low                    Good            High  None        Over $35k
14   High                   Bad             High  None        $15k to $35k

Algorithm for building the decision tree
func tree (ex_set, attributes, default)
1. if ex_set = empty then return a leaf labeled with default
2. if all examples in ex_set are in the same class then return a leaf labeled with that class
3. if attributes = empty then return a leaf labeled with the disjunction of classes in ex_set
4. Select an attribute A, create a node for A and label the node with A
   - remove A from attributes -> attributes'
   - m = majority (ex_set)
   - for each value V of A repeat
       - let partitionV be the set of examples from ex_set with value V for A
       - create nodeV = tree (partitionV, attributes', m)
       - create link node A - nodeV and label the link with V
end
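The four steps above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the names build_tree, classify, majority and the domains dictionary are hypothetical, the attribute is selected naively (first in the list) because the pseudocode leaves the selection rule open, and the five rows are taken from the credit table restricted to two attributes.

```python
from collections import Counter

def majority(examples):
    """Most frequent class among (attribute_dict, class) pairs."""
    return Counter(cls for _, cls in examples).most_common(1)[0][0]

def build_tree(examples, attributes, domains, default):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    if not examples:                       # step 1: empty example set
        return default
    classes = {cls for _, cls in examples}
    if len(classes) == 1:                  # step 2: a single class remains
        return classes.pop()
    if not attributes:                     # step 3: no attribute left to test
        return " or ".join(sorted(classes))
    attr, rest = attributes[0], attributes[1:]   # step 4: naive selection (assumption)
    m = majority(examples)
    branches = {}
    for value in domains[attr]:
        partition = [ex for ex in examples if ex[0][attr] == value]
        branches[value] = build_tree(partition, rest, domains, m)
    return (attr, branches)

def classify(tree, instance):
    """Follow the path selected by the attribute values of the instance."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[instance[attr]]
    return tree

# Hypothetical two-attribute view of rows 1, 2, 12, 9 and 8 of the credit table.
domains = {
    "income": ["0-15k", "15-35k", "over-35k"],
    "history": ["bad", "unknown", "good"],
}
examples = [
    ({"income": "0-15k", "history": "bad"}, "high"),
    ({"income": "15-35k", "history": "unknown"}, "high"),
    ({"income": "15-35k", "history": "good"}, "moderate"),
    ({"income": "over-35k", "history": "good"}, "low"),
    ({"income": "over-35k", "history": "bad"}, "moderate"),
]
tree = build_tree(examples, ["income", "history"], domains, default="unknown")
print(classify(tree, {"income": "15-35k", "history": "good"}))
```

Passing the majority class m down as the default means a branch with no examples is still labeled with the most common class of its parent, as in step 4 of the pseudocode.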

Remarks
• Different decision trees can be built from the same examples
• These trees generally have different depths
• Occam's razor: build the simplest tree

Information theory
• Universe of messages
• M = {m1, m2, ..., mn}
• and a probability p(mi) of occurrence of every message in M; the information content of M can be defined as:
  $I(M) = -\sum_{i=1}^{n} p(m_i) \log_2 p(m_i)$

Information content I(T)
• p(risk is high) = 6/14
• p(risk is moderate) = 3/14
• p(risk is low) = 5/14
• The information content of the decision tree is:
• $I(T) = -\frac{6}{14}\log_2\frac{6}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{5}{14}\log_2\frac{5}{14} = 1.531$ bits
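As a quick check of this value, the information content can be computed directly from the class counts. The sketch below is illustrative only; the function name information_content is an assumption, and base-2 logarithms are used so that the result is in bits.

```python
from collections import Counter
from math import log2

def information_content(labels):
    """Shannon information content, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

# Risk column of the 14 credit examples: 6 high, 3 moderate, 5 low.
risk = ["high"] * 6 + ["moderate"] * 3 + ["low"] * 5
print(f"{information_content(risk):.3f} bits")  # prints "1.531 bits"
```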

Information gain G(A)
• For an attribute A, the information gain obtained by selecting this attribute as the root of the tree equals the total information content of the tree minus the information content E(A) that is still needed to finish the classification (building the tree) after selecting A as root
• G(A) = I(T) - E(A)

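Computing E(A)
• Set of learning examples C
• Attribute A with n values in the root -> C is divided into {C1, C2, ..., Cn}
• E(A) is the weighted average of the information content of the partitions: $E(A) = \sum_{i=1}^{n} \frac{|C_i|}{|C|} I(C_i)$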

Example
• "Income" as root:
• C1 = {1, 4, 7, 11}
• C2 = {2, 3, 12, 14}
• C3 = {5, 6, 8, 9, 10, 13}
G(income) = I(T) - E(income) = 1.531 - 0.564 = 0.967 bits
G(credit history) = 0.266 bits
G(debt) = 0.063 bits
G(collateral) = 0.206 bits
Income has the highest gain, so it is selected as the root.
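The gain of each attribute can be recomputed from the credit table with a few lines of Python. This is a sketch under stated assumptions: the tuple layout, the short value labels and the function names (information_content, expected_information, gain) are chosen here for illustration, and E(A) is taken as the size-weighted average of the information content of the partitions.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["credit_history", "debt", "collateral", "income"]
# (risk, credit history, debt, collateral, income), rows 1..14 of the credit table.
EXAMPLES = [
    ("high", "bad", "high", "none", "0-15k"),
    ("high", "unknown", "high", "none", "15-35k"),
    ("moderate", "unknown", "low", "none", "15-35k"),
    ("high", "unknown", "low", "none", "0-15k"),
    ("low", "unknown", "low", "none", "over-35k"),
    ("low", "unknown", "low", "adequate", "over-35k"),
    ("high", "bad", "low", "none", "0-15k"),
    ("moderate", "bad", "low", "adequate", "over-35k"),
    ("low", "good", "low", "none", "over-35k"),
    ("low", "good", "high", "adequate", "over-35k"),
    ("high", "good", "high", "none", "0-15k"),
    ("moderate", "good", "high", "none", "15-35k"),
    ("low", "good", "high", "none", "over-35k"),
    ("high", "bad", "high", "none", "15-35k"),
]

def information_content(labels):
    """Information content, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def expected_information(examples, index):
    """E(A): weighted information content of the partitions induced by column `index`."""
    partitions = defaultdict(list)
    for ex in examples:
        partitions[ex[index]].append(ex[0])
    return sum(len(p) / len(examples) * information_content(p)
               for p in partitions.values())

def gain(examples, index):
    """G(A) = I(T) - E(A)."""
    return information_content([ex[0] for ex in examples]) - expected_information(examples, index)

gains = {name: gain(EXAMPLES, i + 1) for i, name in enumerate(ATTRS)}
for name, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {g:.3f} bits")
```

On the table as transcribed here, income comes out with the largest gain (about 0.97 bits) and credit history with about 0.27 bits, so income would be chosen as the root.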

Learning performance
• Let S be the set of examples
• Divide S into a learning set (LS) and a test set (TS)
• Apply ID3 to the learning set
• How many examples from the test set are correctly classified?
• Repeat the steps above for different LS and TS
• Obtain a prediction of the learning performance
• Graph: X = size of LS, Y = percentage of correctly classified examples
• Happy graphs (learning curves)

Remarks • Lack of data • Attributes with many values and high information gain • Attributes with numerical values • Decision rules

3. Learning by clustering
• Generalization and specialization
Learning examples
1. (yellow brick nice big +)
2. (blue ball nice small +)
3. (yellow brick dull small +)
4. (green ball dull big +)
5. (yellow cube nice big +)
6. (blue cube nice small -)
7. (blue brick nice big -)

Learning by clustering
After example 1:
concept name: NAME
  positive part
    cluster: description: (yellow brick nice big)  ex: 1
  negative part
    ex:
After example 2:
concept name: NAME
  positive part
    cluster: description: ( _ _ nice _ )  ex: 1, 2
  negative part
    ex:

Learning by clustering
concept name: NAME
  positive part
    cluster: description: ( _ _ _ _ )  ex: 1, 2, 3, 4, 5
  negative part
    ex: 6, 7
Overgeneralization: the description ( _ _ _ _ ) also covers the negative examples 6 and 7.

Learning by clustering
concept name: NAME
  positive part
    cluster: description: (yellow brick nice big)  ex: 1
    cluster: description: (blue ball nice small)  ex: 2
  negative part
    ex: 6, 7

Learning by clustering
concept name: NAME
  positive part
    cluster: description: (yellow brick _ _ )  ex: 1, 3
    cluster: description: ( _ ball _ _ )  ex: 2, 4
  negative part
    ex: 6, 7

Learning by clustering
concept name: NAME
  positive part
    cluster: description: (yellow _ _ _ )  ex: 1, 3, 5
    cluster: description: ( _ ball _ _ )  ex: 2, 4
  negative part
    ex: 6, 7
Learned concept: A if yellow or ball

Learning by clustering algorithm
1. Let S be the set of examples
2. Create PP (positive part) and NP (negative part)
3. Add all ex- from S to NP and remove the ex- from S
4. Create a cluster in PP and add the first ex+
5. S = S - ex+
6. for every ex+ ei in S repeat
   6.1 for every cluster Ci repeat
       - Create the description ei + Ci
       - if the description covers no ex- then add ei to Ci
   6.2 if ei has not been added to any cluster then create a new cluster with ei
end
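The algorithm can be sketched in Python on the seven learning examples. The representation is an assumption made for illustration: a description is a tuple in which None plays the role of the "_" wildcard, and the names generalize, covers and learn_clusters are hypothetical. Following the pseudocode literally, an example is added to every cluster whose generalized description covers no negative example.

```python
def generalize(description, example):
    """Keep the attribute values that agree, replace the others with None ('_')."""
    return tuple(d if d == e else None for d, e in zip(description, example))

def covers(description, example):
    """A description covers an example if every non-wildcard value matches."""
    return all(d is None or d == e for d, e in zip(description, example))

def learn_clusters(examples):
    """examples: list of (attribute_tuple, is_positive). Returns [description, members] pairs."""
    negatives = [ex for ex, positive in examples if not positive]    # NP
    clusters = []                                                    # PP
    for ex in (e for e, positive in examples if positive):
        added = False
        for cluster in clusters:
            candidate = generalize(cluster[0], ex)
            if not any(covers(candidate, neg) for neg in negatives):
                cluster[0] = candidate          # accept the generalized description
                cluster[1].append(ex)
                added = True
        if not added:                           # also handles the very first ex+
            clusters.append([ex, [ex]])
    return clusters

examples = [
    (("yellow", "brick", "nice", "big"), True),
    (("blue", "ball", "nice", "small"), True),
    (("yellow", "brick", "dull", "small"), True),
    (("green", "ball", "dull", "big"), True),
    (("yellow", "cube", "nice", "big"), True),
    (("blue", "cube", "nice", "small"), False),
    (("blue", "brick", "nice", "big"), False),
]
clusters = learn_clusters(examples)
for description, members in clusters:
    print(description, len(members))
```

On this data the sketch ends with the two descriptions (yellow _ _ _) and (_ ball _ _), i.e. "yellow or ball".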

4. Learning in version space
Generalization operators in version space
• Replace constants with variables
  color(ball, red) generalizes to color(X, red)
• Remove literals from conjunctions
  shape(X, round) ∧ size(X, small) ∧ color(X, red) generalizes to shape(X, round) ∧ color(X, red)
• Add disjunctions
  shape(X, round) ∧ size(X, small) ∧ color(X, red) generalizes to shape(X, round) ∧ size(X, small) ∧ (color(X, red) ∨ color(X, blue))
• Replace a class with its superclass in is-a relations
  is-a(tom, cat) generalizes to is-a(tom, animal)

Candidate elimination algorithm
• Version space = the set of concept descriptions which are consistent with the learning examples
• Idea: reduce the version space based on the learning examples
• One algorithm searches from specific to general
• One algorithm searches from general to specific
• One algorithm searches in both directions: the candidate elimination algorithm

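Candidate elimination algorithm
[Figure: concept space as a generalization lattice, from most general (top) to most specific (bottom)]
obj(X, Y, Z)
obj(small, Y, Z)   obj(X, red, Z)   obj(X, Y, ball)
obj(X, red, ball)   obj(small, Y, ball)   obj(small, red, Z)
obj(small, red, ball)   obj(small, orange, ball)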

Generalization and specialization
• P and Q: the sets of expressions which unify with p and q in FOPL
• p is more general than q if and only if P ⊇ Q
  Example: color(X, red) is more general than color(ball, red)
• p more general than q is written p ≥ q
  ∀x p(x) → positive(x)
  ∀x q(x) → positive(x)
• p covers q if and only if q(x) → positive(x) is a logical consequence of p(x) → positive(x)
• Concept space: obj(X, Y, Z)

Generalization and specialization
• A concept c is maximally specific if it covers all ex+, does not cover any ex-, and for any c' which covers all ex+, c ≤ c'. These concepts form S.
• A concept c is maximally general if it does not cover any ex- and for any c' which does not cover any ex-, c ≥ c'. These concepts form G.
S: set of hypotheses (candidate concepts) = maximally specific generalizations
G: set of hypotheses (candidate concepts) = maximally general specializations

Algorithm for searching from specific to general
1. Initialize S with the first ex+
2. Initialize N with the empty set
3. for every learning example repeat
   3.1 if ex+, p, then
       for each s ∈ S repeat
       - if s does not cover p then replace s with the most specific generalization which covers p
       - Remove from S all hypotheses more general than other hypotheses from S
       - Remove from S all hypotheses which cover an ex- from N
   3.2 if ex-, n, then
       - Remove from S all hypotheses which cover n
       - Add n to N (to check for overgeneralization)
end

Algorithm for searching from specific to general
S: { }
Positive: obj(small, red, ball)
S: { obj(small, red, ball) }
Positive: obj(small, white, ball)
S: { obj(small, Y, ball) }
Positive: obj(large, blue, ball)
S: { obj(X, Y, ball) }
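The specific-to-general search can be sketched in Python for objects of the form obj(size, color, shape). This is an illustration under assumptions, not the course code: hypotheses are tuples, None stands for a variable, and the names covers, generalize, more_general and specific_to_general are hypothetical. With this attribute-vector representation the most specific generalization of a hypothesis and an example is unique.

```python
def covers(h, x):
    """h covers x if every constant of h equals the value at the same position of x."""
    return all(a is None or a == b for a, b in zip(h, x))

def generalize(h, x):
    """Most specific generalization of h that covers x."""
    return tuple(a if a == b else None for a, b in zip(h, x))

def more_general(h1, h2):
    """h1 is strictly more general than h2."""
    return h1 != h2 and covers(h1, h2)

def specific_to_general(examples):
    """examples: list of (object_tuple, is_positive). Returns (S, N)."""
    S, N = None, []                       # S is created by the first ex+
    for x, positive in examples:
        if positive:
            if S is None:
                S = [x]
            else:
                S = [s if covers(s, x) else generalize(s, x) for s in S]
            S = [s for s in S if not any(more_general(s, other) for other in S)]
            S = [s for s in S if not any(covers(s, n) for n in N)]
        else:
            if S is not None:
                S = [s for s in S if not covers(s, x)]
            N.append(x)                   # kept to detect overgeneralization later
    return (S or []), N

trace = [
    (("small", "red", "ball"), True),
    (("small", "white", "ball"), True),
    (("large", "blue", "ball"), True),
]
S, N = specific_to_general(trace)
print(S)  # [(None, None, 'ball')], i.e. obj(X, Y, ball)
```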

Algorithm for searching from general to specific
1. Initialize G with the most general description
2. Initialize P with the empty set
3. for every learning example repeat
   3.1 if ex-, n, then
       for each g ∈ G repeat
       - if g covers n then replace g with the most general specializations which do not cover n
       - Remove from G all hypotheses more specific than other hypotheses in G
       - Remove from G all hypotheses which do not cover the positive examples from P
   3.2 if ex+, p, then
       - Remove from G all hypotheses that do not cover p
       - Add p to P (to check for overspecialization)
end

Algorithm for searching from general to specific
G: { obj(X, Y, Z) }
Negative: obj(small, red, brick)
G: { obj(large, Y, Z), obj(X, white, Z), obj(X, blue, Z), obj(X, Y, ball), obj(X, Y, cube) }
Positive: obj(large, white, ball)
G: { obj(large, Y, Z), obj(X, white, Z), obj(X, Y, ball) }
Negative: obj(large, blue, cube)
G: { obj(X, white, Z), obj(X, Y, ball) }
Positive: obj(small, blue, ball)
G: { obj(X, Y, ball) }
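The general-to-specific search needs the attribute domains in order to specialize a hypothesis. The Python sketch below assumes the domains that the trace above implies (two sizes, three colors, three shapes); the tuple representation with None as a variable and the names specializations and general_to_specific are hypothetical.

```python
# Assumed attribute domains, inferred from the values appearing in the trace.
DOMAINS = [("small", "large"), ("red", "white", "blue"), ("ball", "brick", "cube")]

def covers(h, x):
    """h covers x if every constant of h equals the value at the same position of x."""
    return all(a is None or a == b for a, b in zip(h, x))

def more_general(h1, h2):
    """h1 is strictly more general than h2."""
    return h1 != h2 and covers(h1, h2)

def specializations(g, n, domains):
    """Most general specializations of g that exclude n: bind one variable to a value n lacks."""
    result = []
    for i, (value, domain) in enumerate(zip(g, domains)):
        if value is None:
            result += [g[:i] + (v,) + g[i + 1:] for v in domain if v != n[i]]
    return result

def general_to_specific(examples, domains):
    """examples: list of (object_tuple, is_positive). Returns the final set G."""
    G, P = [tuple(None for _ in domains)], []
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]
            P.append(x)                   # kept to detect overspecialization later
        else:
            new_G = []
            for g in G:
                new_G += specializations(g, x, domains) if covers(g, x) else [g]
            new_G = list(dict.fromkeys(new_G))            # drop duplicates, keep order
            G = [g for g in new_G if not any(more_general(o, g) for o in new_G)]
            G = [g for g in G if all(covers(g, p) for p in P)]
    return G

trace = [
    (("small", "red", "brick"), False),
    (("large", "white", "ball"), True),
    (("large", "blue", "cube"), False),
    (("small", "blue", "ball"), True),
]
G = general_to_specific(trace, DOMAINS)
print(G)  # [(None, None, 'ball')], i.e. obj(X, Y, ball)
```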

Algorithm for searching in version space
1. Initialize G with the most general description
2. Initialize S with the first ex+
3. for every learning example repeat
   3.1 if ex+, p, then
       3.1.1 Remove from G all the elements that do not cover p
       3.1.2 for each s ∈ S repeat
             - if s does not cover p then replace s with the most specific generalization which covers p
             - Remove from S all hypotheses more general than other hypotheses in S
             - Remove from S all hypotheses more general than some hypothesis in G

Algorithm for searching in version space - cont
   3.2 if ex-, n, then
       3.2.1 Remove from S all the hypotheses that cover n
       3.2.2 for each g ∈ G repeat
             - if g covers n then replace g with the most general specializations which do not cover n
             - Remove from G all hypotheses more specific than other hypotheses in G
             - Remove from G all hypotheses more specific than some hypothesis in S
4. if G = S and card(S) = 1 then a concept is found
5. if G = S = { } then there is no concept consistent with all the examples
end

Algorithm for searching in version space
G: { obj(X, Y, Z) }
S: { }
Positive: obj(small, red, ball)
G: { obj(X, Y, Z) }
S: { obj(small, red, ball) }
Negative: obj(small, blue, ball)
G: { obj(X, red, Z) }
S: { obj(small, red, ball) }
Positive: obj(large, red, ball)
G: { obj(X, red, Z) }
S: { obj(X, red, ball) }
Negative: obj(large, red, cube)
G: { obj(X, red, ball) }
S: { obj(X, red, ball) }
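The bidirectional search can be sketched in Python by combining the two directions. Several choices here are assumptions: the attribute domains, the tuple representation with None as a variable, and all function names. In particular, the pruning of one boundary against the other is implemented as "keep a member of G only if it covers some member of S, and a member of S only if some member of G covers it", which is one reading of the pruning steps and the one that reproduces the trace above.

```python
# Assumed attribute domains for obj(size, color, shape).
DOMAINS = [("small", "large"), ("red", "white", "blue"), ("ball", "brick", "cube")]

def covers(h, x):
    """h covers x if every constant of h equals the value at the same position of x."""
    return all(a is None or a == b for a, b in zip(h, x))

def more_general(h1, h2):
    """h1 is strictly more general than h2."""
    return h1 != h2 and covers(h1, h2)

def generalize(h, x):
    """Most specific generalization of h that covers x."""
    return tuple(a if a == b else None for a, b in zip(h, x))

def specializations(g, n, domains):
    """Most general specializations of g that exclude n."""
    result = []
    for i, (value, domain) in enumerate(zip(g, domains)):
        if value is None:
            result += [g[:i] + (v,) + g[i + 1:] for v in domain if v != n[i]]
    return result

def candidate_elimination(examples, domains):
    """examples: list of (object_tuple, is_positive). Returns the boundaries (G, S)."""
    G, S = [tuple(None for _ in domains)], None       # S is created by the first ex+
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]
            if S is None:
                S = [x]
            else:
                S = [s if covers(s, x) else generalize(s, x) for s in S]
            S = [s for s in S if not any(more_general(s, o) for o in S)]
            S = [s for s in S if any(covers(g, s) for g in G)]
        else:
            if S is not None:
                S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                new_G += specializations(g, x, domains) if covers(g, x) else [g]
            new_G = list(dict.fromkeys(new_G))        # drop duplicates, keep order
            G = [g for g in new_G if not any(more_general(o, g) for o in new_G)]
            if S is not None:
                G = [g for g in G if any(covers(g, s) for s in S)]
    return G, (S or [])

trace = [
    (("small", "red", "ball"), True),
    (("small", "blue", "ball"), False),
    (("large", "red", "ball"), True),
    (("large", "red", "cube"), False),
]
G, S = candidate_elimination(trace, DOMAINS)
print(G, S)  # both are [(None, 'red', 'ball')], i.e. obj(X, red, ball)
```

When G and S meet in a single hypothesis, as here, the concept is found; if both become empty, no description in this language is consistent with the examples.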

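Implementation of the algorithm specific to general

% exemple = examples, acopera = covers, maigeneral = more general
exemple([pos([large,white,ball]), neg([small,red,brick]),
         pos([small,blue,ball]), neg([large,blue,cube])]).

% acopera(H, X): H covers X; an unbound variable matches anything.
acopera([],[]).
acopera([H1|T1], [H2|T2]) :- var(H1), var(H2), acopera(T1,T2).
acopera([H1|T1], [H2|T2]) :- var(H1), atom(H2), acopera(T1,T2).
acopera([H1|T1], [H2|T2]) :- atom(H1), atom(H2), H1=H2, acopera(T1,T2).

% maigeneral(X, Y): X is strictly more general than Y.
maigeneral(X,Y) :- not(acopera(Y,X)), acopera(X,Y).

% generaliz(H, Inst, Gen): keep identical attributes, replace the others with a variable.
generaliz([], [], []).
generaliz([Atrib|Rest], [Inst|RestInst], [Atrib|RestGen]) :-
    Atrib==Inst, generaliz(Rest,RestInst,RestGen).
% \== instead of \= so that an already unbound attribute stays a variable.
generaliz([Atrib|Rest], [Inst|RestInst], [_|RestGen]) :-
    Atrib\==Inst, generaliz(Rest,RestInst,RestGen).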

Implementation of the algorithm specific to general

% specgen: start with the first positive example as the only hypothesis.
specgen :- exemple( [pos(H)|Rest] ), speclagen([H], [], Rest).

% speclagen(H, N, Examples): H = hypotheses, N = negative examples seen so far.
speclagen(H, N, []) :-
    print('H='), print(H), nl,
    print('N='), print(N), nl.
speclagen(H, N, [Ex|RestEx]) :-
    process(Ex, H, N, H1, N1),
    speclagen(H1, N1, RestEx).

process(pos(Ex), H, N, H1, N) :-
    generalizset(H, HGen, Ex),
    elim(X, HGen, (member(Y,HGen), maigeneral(X,Y)), H2),
    elim(X, H2, (member(Y,N), acopera(X,Y)), H1).
process(neg(Ex), H, N, H1, [Ex|N]) :-
    elim(X, H, acopera(X,Ex), H1).

% elim(X, L, Goal, L1): L1 holds the elements X of L for which Goal fails.
elim(X,L,Goal,L1) :- (bagof(X, (member(X,L), not(Goal)), L1) ; L1=[]).

Implementation of the algorithm specific to general

% generalizset(Hyps, NewHyps, Ex): generalize every hypothesis (Ipot) that does not cover Ex.
generalizset([], [], _).
generalizset([Ipot|Rest], IpotNoua, Ex) :-
    not(acopera(Ipot,Ex)),
    (bagof(X, generaliz(Ipot,Ex,X), ListIpot) ; ListIpot=[]),
    generalizset(Rest,RestNou,Ex),
    append(ListIpot,RestNou,IpotNoua).
generalizset([Ipot|Rest], [Ipot|RestNou], Ex) :-
    acopera(Ipot,Ex),
    generalizset(Rest,RestNou,Ex).

?- specgen.
H=[[_G390, _G393, ball]]
N=[[large, blue, cube], [small, red, brick]]
