
Decision Tree Learning


Presentation Transcript


  1. Decision Tree Learning Brought to you by Chris Creswell

  2. Why learn about decision trees? • A practical way to get AI to adapt to the player – a simple form of user modeling • Enhances replayability • The player’s bot allies can be more effective • Opponent bots can learn the player’s tactics, so the player can’t repeat the same strategy over and over

  3. What we’ll learn • What is a decision tree • How do we build a decision tree • What has been done with decision trees in games • What else can we do with them

  4. What is a decision tree • Decision Tree Learning (DTL) is an inductive learning task: it uses a training set of examples to form a hypothesis that draws general conclusions

  5. What is a decision tree – terms/concepts • Attribute: a variable that we take into account in making a decision • Target attribute: the attribute whose value we want to predict; the decision is made based on it

  6. What is a decision tree – an example

  7. What is a decision tree – an example

  8. What is a decision tree – how to use it • Given a set of circumstances (attribute values), use them to traverse the tree from root to leaf • The leaf node is the decision (see the sketch below)
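  As an illustration (not from the original slides), here is a minimal Python sketch of such a traversal, assuming the tree is stored as a nested dict; the attribute names are borrowed from the Black & White “attack the town” example that appears later in the presentation:

    # Hypothetical representation: an internal node holds an attribute name and a
    # branch per value; a leaf is simply the decision value.
    def decide(tree, circumstances):
        # Walk from root to leaf, following the branch matching each attribute value
        while isinstance(tree, dict):
            value = circumstances[tree["attribute"]]
            tree = tree["branches"][value]
        return tree  # the leaf is the decision

    # Toy example (attribute names taken from the B&W example later in the slides)
    attack_tree = {"attribute": "Allegiance",
                   "branches": {"Friendly": "don't attack",
                                "Enemy": {"attribute": "Defense",
                                          "branches": {"Weak": "attack",
                                                       "Medium": "attack",
                                                       "Strong": "don't attack"}}}}
    print(decide(attack_tree, {"Allegiance": "Enemy", "Defense": "Weak"}))  # attack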

  9. Why is this useful • The hypothesis formed from the training set can be used to draw conclusions about sets of circumstances not present in the training set – it will generalize

  10. How do we construct a decision tree? • Guiding principle of inductive learning: • Occam’s razor – choose the simplest possible hypothesis that is consistent with the provided examples • General idea: recursively classify the examples based on one of the attributes until all examples have been used • Here’s the algorithm:

  11. node LearnTree(examples, targetAttribute, attributes)
        examples is the training set
        targetAttribute is what to learn
        attributes is the set of available attributes
        returns a tree node
      begin
        if all the examples have the same targetAttribute value
          return a leaf with that value
        else if the set of attributes is empty
          return a leaf with the most common targetAttribute value among examples
        else begin
          A = the “best” attribute among attributes, having a range of values v1, v2, …, vk
          Partition examples according to their value for A into sets S1, S2, …, Sk
          Create a decision node N with attribute A
          for i = 1 to k begin
            Attach a branch B to node N with test vi
            if Si is non-empty
              Attach B to LearnTree(Si, targetAttribute, attributes – {A})
            else
              Attach B to a leaf node with the most common targetAttribute value
          end
          return decision node N
        end
      end
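  A runnable Python sketch of the same procedure, under the same assumptions as the traversal sketch above (examples as dicts, trees as nested dicts); the helper names learn_tree, plurality_value and choose_best_attribute are illustrative rather than from the slides, and the “best” attribute selection is passed in as a function because the slides define it later (ID3):

    from collections import Counter

    def plurality_value(examples, target):
        # Most common target-attribute value among the examples
        return Counter(e[target] for e in examples).most_common(1)[0][0]

    def learn_tree(examples, target, attributes, choose_best_attribute):
        values = {e[target] for e in examples}
        if len(values) == 1:            # all examples agree: return a leaf
            return values.pop()
        if not attributes:              # no attributes left: guess the most common value
            return plurality_value(examples, target)
        a = choose_best_attribute(examples, target, attributes)
        node = {"attribute": a, "branches": {}}
        # Simplification: branch only on values actually observed in the examples,
        # so the "empty subset" case from the pseudo-code never arises here.
        for v in {e[a] for e in examples}:
            subset = [e for e in examples if e[a] == v]
            node["branches"][v] = learn_tree(subset, target,
                                             [x for x in attributes if x != a],
                                             choose_best_attribute)
        return node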

  12. This is how we construct a decision tree • This very simple pseudo-code implements the construction of a decision tree, except for one key step that is abstracted away • That key step is choosing the “best” attribute to classify on • One algorithm for doing this is ID3 (used in Black and White) • We’ll get to that algorithm in a bit

  13. This is how we construct a decision tree – pseudo-code walkthrough • First, LearnTree is called with all examples, the targetAttribute, and all attributes to classify on • It chooses the “best” (we’ll get to that) attribute to split on, creates a decision node for it, then recursively calls LearnTree for each partition of the examples

  14. This is how we construct a decision tree – pseudo-code walkthrough • Recursion stops when: • All examples have the same value • There are no more attributes • There are no more examples • The first two need some explanation, the third one is trivial – all examples have been classified

  15. This is how we construct a decision tree – pseudo-code walkthrough • Recursion stops when all examples have the same value – when does this happen? • When the examples that reach this node (which by construction agree on all ancestor attributes and their branch values) also share the same target attribute value

  16. This is how we construct a decision tree – pseudo-code walkthrough • Recursion stops when there are no more attributes • This happens when the training set is inconsistent, e.g. there are 2 or more examples having the same values for everything but the target attribute • The way our pseudo-code is written, it guesses when this happens: it picks the most popular target attribute value • This is a decision left up to the implementer • This is a weakness of the algorithm • It doesn’t handle “noise” in its training set well

  17. This is how we construct a decision tree – pseudo-code walkthrough • Let’s watch the algorithm in action … • http://www.cs.ualberta.ca/~aixplore/learning/DecisionTrees/InterArticle/2-DecisionTree.html

  18. ID3 algorithm • Picks the best attribute to classify on in a call of LearnTree, and does so by quantifying how useful each attribute will be with respect to the remaining examples • How? Using Shannon’s information theory: pick the attribute that gives the best reduction in entropy

  19. ID3 algorithm – Shannon’s Information Theory • Choose the attribute that gives the best reduction in entropy • Entropy quantifies the variation in a set of examples with respect to the target attribute values • A set of examples with mostly the same targetAttribute value has very low entropy (that’s good) • A set of examples with many varying targetAttribute values will have high entropy (bad) • Ready? Here come some equations …

  20. ID3: Shannon’s Information Theory • In the following, S is the set of examples, and Si is the subset of S with value Vi under the target attribute:
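  The equation itself did not survive into the transcript; the standard ID3 definition the slide refers to, in the slide’s notation, is

    Entropy(S) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}

  where the sum runs over the n distinct target-attribute values and S_i is the subset of S taking value V_i.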

  21. ID3: Shannon’s Information Theory • The expected entropy of a candidate attribute A is the weighted sum of the entropies of the subsets it creates • In the following, k is the size of the range of attribute A:
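  Again the equation is missing from the transcript; written out, the weighted sum the slide describes is

    ExpectedEntropy(A) = \sum_{i=1}^{k} \frac{|S_i|}{|S|} \, Entropy(S_i)

  where, for this formula, S_1, …, S_k are the subsets of S produced by partitioning on the k values of attribute A.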

  22. ID3: Shannon’s Information Theory • What we really want is to maximize information gain, defined:
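  The definition the slide shows is the standard one:

    Gain(S, A) = Entropy(S) - ExpectedEntropy(A)

  i.e. the reduction in entropy obtained by splitting S on attribute A; ID3 picks the attribute with the largest gain.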

  23. ID3: Shannon’s Information Theory • Entropy of the commute time example: The thirteens are because there are thirteen examples. The fours, twos, and sevens come from how many short, medium, and long commutes there are, respectively.
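  Written out with the counts the slide describes (13 examples: 4 short, 2 medium and 7 long commutes), the entropy is

    Entropy(S) = -\frac{4}{13}\log_2\frac{4}{13} - \frac{2}{13}\log_2\frac{2}{13} - \frac{7}{13}\log_2\frac{7}{13} \approx 1.42 \text{ bits}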

  24. ID3: Shannon’s Information Theory
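  A minimal Python sketch of this attribute-selection step (function names are illustrative, not from the slides); this is the choose_best_attribute function left abstract in the learn_tree sketch earlier:

    import math
    from collections import Counter

    def entropy(examples, target):
        # -sum p * log2(p) over the proportions of each target-attribute value
        counts = Counter(e[target] for e in examples)
        total = len(examples)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(examples, target, attribute):
        # Entropy(S) minus the weighted entropy of the subsets split on `attribute`
        total = len(examples)
        remainder = 0.0
        for v in {e[attribute] for e in examples}:
            subset = [e for e in examples if e[attribute] == v]
            remainder += (len(subset) / total) * entropy(subset, target)
        return entropy(examples, target) - remainder

    def choose_best_attribute(examples, target, attributes):
        # ID3: pick the attribute with the highest expected information gain
        return max(attributes, key=lambda a: information_gain(examples, target, a))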

  25. ID3: Drawbacks • Does not guarantee the smallest possible decision tree • Selects classifying attribute based on best expected information gain, not always right • Not very good with continuous values, best with symbolic data • When given lots of distinct continuous values, ID3 will create very “bushy” trees – 1 or 2 levels deep, lots and lots of leaves • We can make this less serious, but it’s still a drawback

  26. Decision Trees in games • First successful use of a decision tree was in “Black and White” (Lionhead studios, 2001) • http://www.gameai.com/blackandwhite.html • “In Black & White you can be the god you want to be. Will you rule with a fair hand, making life better for your people? Or will you be evil and scare them into prayer and submission? No one can tell you which way to be. You, as a god, can play the game any way you choose.”

  27. Decision Trees in games • “And as a god, you get to own a Creature. Chosen by you from magical, special animals, your Creature will copy you, you will teach him and he will learn by himself. He will grow, ultimately to 30 metres, and can do anything you can do in the game. Your Creature can help the people or can kill and eat them. He can cast Miracles to bring rain to their crops or he can drown them in the sea. Your Creature is your physical manifestation in the world of Eden, He is whatever you want him to be. ... And the game also boasts a new level of artificial intelligence. Your Creature is almost a living, breathing thing. He learns, remembers and makes connections. His huge range of abilities and decisions is born of a ground-breakingly powerful and complex AI system.”

  28. Decision Trees in games • So you teach your creature by giving it feedback – it learns to perform actions that get it the highest feedback • Problem: feedback is a continuous variable • We have to make it discrete • We do so using K-means clustering

  29. Decision Trees in games • In K-means clustering, we decide how many clusters we want to create, then use an algorithm that successively associates and dissociates instances with clusters until the associations stabilize around k clusters (a rough sketch follows below) • The author’s reference for this is from a computer vision textbook • I wasn’t about to go buy it • It’s not important to know the clustering algorithm
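  The slides don’t reproduce the clustering algorithm; for reference, a minimal 1-D k-means sketch in Python (the details here are assumptions, not from the presentation) looks roughly like this:

    import random

    def kmeans_1d(values, k, iterations=100):
        # Repeatedly assign each value to its nearest center, then move each center
        # to the mean of its members, until the centers stop changing.
        centers = random.sample(values, k)
        for _ in range(iterations):
            clusters = [[] for _ in range(k)]
            for v in values:
                nearest = min(range(k), key=lambda i: abs(v - centers[i]))
                clusters[nearest].append(v)
            new_centers = [sum(c) / len(c) if c else centers[i]
                           for i, c in enumerate(clusters)]
            if new_centers == centers:
                break
            centers = new_centers
        return centers, clusters

    # e.g. kmeans_1d(feedback_values, 4) to discretize feedback into 4 clusters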

  30. Decision Trees in games • Example from B&W: should your creature attack a town • Examples:

  31. Decision Trees in games • If we ask for 4 clusters, K-means clustering will create clusters around -1, 0.4, 0.1, -0.3. The memberships in these clusters will be {D1, D3, D5, D9}, {D2}, {D6, D8}, {D4, D7} respectively. • The tree ID3 will create using these examples and clusters:

  32. Decision Trees in games

  33. Decision Trees in games • So in this case, the tree the creature learned can be reduced to a nice compact logical expression: • ((Allegiance = Enemy) AND (Defense = weak)) OR ((Allegiance = Enemy) AND (Defense = Medium)) • This happens sometimes • Makes it easier and more efficient to apply

  34. An Extension to ID3 to better handle continuous values • Seems simple, use an inequality, right? • Not that simple – need to pick cut points • Cut points are the boundaries we create for our inequalities, where do they go? • Key insight: optimal cut points must always reside at boundary points • Okay, so what are boundary points?

  35. An Extension to ID3 to better handle continuous values • If we sort the list of examples according to their values of the candidate attribute, a boundary point is a value in this list between 2 adjacent instances having different values of the target attribute • In the worst case, the number of boundary points is about equal to the number of instances • This happens if the target attribute oscillates back and forth between good and bad
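  A short Python sketch of finding those candidate cut points (names are illustrative; using the midpoint between adjacent values is an assumption, since the slides only say that optimal cut points lie at boundary points):

    def boundary_cut_points(examples, attribute, target):
        # Sort by the continuous attribute; a boundary lies between two adjacent
        # examples whose target values differ. Take the midpoint as the cut point.
        ordered = sorted(examples, key=lambda e: e[attribute])
        cuts = []
        for a, b in zip(ordered, ordered[1:]):
            if a[target] != b[target] and a[attribute] != b[attribute]:
                cuts.append((a[attribute] + b[attribute]) / 2)
        return cuts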

  36. Example software on CD • Show an example made using the software on the CD

  37. Conclusions • Decision Trees are an elegant way of learning – it is easy to expose their logic and understand what they have learned • Decision Trees are not always the best way to learn – they have some weaknesses • But they also have their own set of strengths

  38. Conclusions • Decision Trees work best for symbolic, discrete values • Can be extended to work with continuous values • B&W had to do some clustering of feedback values to use decision trees

  39. Conclusions • Up to now, the only use of Decision Trees in games has been in B&W • What are they good for? • User modeling – teaching the computer how to react to the player, which enhances replayability • Can be used to make bots that are the player’s allies more effective, as in B&W • Could also make enemies more intelligent – the player would be forced to come up with new strategies • How else can they be used? • This is relatively unexplored territory, people – if you think you have a great idea, go for it
