
Introduction to Learning






Presentation Transcript


  1. Introduction to Learning ECE457 Applied Artificial Intelligence Fall 2007 Lecture #11

  2. Outline
  • Overview of learning
  • Supervised learning
    • Russell & Norvig, sections 18.1-18.3, 19.1
  • Unsupervised learning
    • Russell & Norvig, section 20.3
  • Reinforcement learning
    • Russell & Norvig, sections 21.1, 21.3, 21.5
  • CS 498 & CS 698 (Prof. Ben-David)

  3. Limits of Predefined Knowledge
  • Many of the algorithms and techniques we saw relied on predefined information
    • Probability distributions
    • Heuristics
    • Utility functions
  • This only works if the information is easily available
  • For real-world applications, it is often preferable to have the agent learn the information automatically

  4. Overview of Learning
  • Learn what?
    • Facts about the world (KB)
    • Decision-making strategy
    • Probabilities, costs, functions, states, …
  • Learn from what?
    • Training data (often freely available for common problems)
    • Real world
  • Learn how?
    • Need some form of feedback to the agent

  5. Learning and Feedback
  • Supervised learning
    • Training data includes correct output
    • Learn relationship between data and output
    • Evaluate with statistics
  • Unsupervised learning
    • Training data with no correct output
    • Learn patterns in the data
    • Evaluate with fitness of the pattern
  • Reinforcement learning
    • Set of actions with rewards and punishments
    • Learn to maximize reward & minimize punishment
    • Evaluate with value of reward

  6. Supervised Learning
  • Given a training corpus of data-output pairs
    • x & y values
    • Email & spam/not spam
    • Variable values & decision
  • Learn the relationship mapping the data to the output
    • f(x)
    • Spam features
    • Decision rules

  7. Supervised Learning Example
  • 2D state space with binary classification
  • Learn function to separate both classes
  [Figure: scatter plot of + and − examples in a 2D state space]

  8. Decision Tree
  [Figure: binary decision tree over the scatter plot, testing X>5, X<2, Y>8 and Y<4 in turn; points outside those bounds are classified −, the remaining central region is classified +. The thresholds X=2, X=5, Y=4 and Y=8 are marked on the axes.]

  9. Decision Tree
  [Figure: equivalent tree with multi-way splits: test X with branches <2 → −, >5 → −, other → test Y; test Y with branches <4 → −, >8 → −, other → +]

  10. Decision Tree
  • Multiple variables and their values are used to decide whether an email is spam

  11. Decision Tree
  • Build the decision tree
  [Figure: decision tree built from the email data, testing the attributes UW, Multi, Attach and Pictures, with leaves labelled Spam / Not Spam]

  12. Supervised Learning Algorithm
  • Many possible learning techniques, depending on the problem and the data
  • Start with inaccurate initial hypothesis
  • Refine to reduce error or increase accuracy
  • End with trade-off between accuracy and simplicity
  [Figure: the + / − scatter plot from the earlier example]

  13. Trade-Off
  • Why trade-off between accuracy and simplicity?
    • Noisy data
    • Special cases
    • Measurement errors
  [Figure: the scatter plot with a few noisy points (+ examples inside the − region and vice versa)]

  14. Supervised Learning Algorithm
  • Learning a decision tree follows the same general algorithm
  • Start with all emails at the root
  • Pick the attribute that will teach us the most
    • Highest information gain, i.e. difference of probability of each class
  • Branch using that attribute
  • Repeat until the trade-off between accuracy of the leaves and depth limit / relevance of attributes is reached
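
The greedy, information-gain-driven procedure above can be sketched in a few lines of Python. This is an illustrative ID3-style sketch, not the course's implementation: the (features, label) dataset format, the helper names entropy, information_gain and build_tree, and the depth_limit parameter (standing in for the accuracy/simplicity trade-off) are assumptions made here.

```python
# Illustrative sketch of greedy decision-tree induction (ID3-style):
# at each node, pick the attribute with the highest information gain.
# Hypothetical dataset format: a list of (features_dict, label) pairs.
import math
from collections import Counter

def entropy(examples):
    """Entropy of the class labels in a set of (features, label) examples."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Reduction in entropy obtained by splitting on `attribute`."""
    total = len(examples)
    remainder = 0.0
    for value in set(f[attribute] for f, _ in examples):
        subset = [(f, y) for f, y in examples if f[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(examples) - remainder

def build_tree(examples, attributes, depth_limit=3):
    """Recursively grow the tree; stop at the depth limit or a pure node."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if depth_limit == 0 or not attributes or len(set(labels)) == 1:
        return majority                       # leaf: predict the majority class
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {"attribute": best, "branches": {}}
    for value in set(f[best] for f, _ in examples):
        subset = [(f, y) for f, y in examples if f[best] == value]
        rest = [a for a in attributes if a != best]
        tree["branches"][value] = build_tree(subset, rest, depth_limit - 1)
    return tree
```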

  15. Supervised Learning Evaluation
  • Statistical measures of agent’s performance
    • RMS error between f(x) and y
  • Making correct decisions
    • With as few decision rules as possible
    • Shallowest tree possible
  • Accuracy of a classification
  • Precision and recall of a classification

  16. Precision and Recall
  • Binary classification: distinguish + (our target) from − (everything else)
  • Classifier makes mistakes
    • Classifies some + as − and some − as +
  • Define four categories:
    • True Positive (TP): + classified as +
    • False Positive (FP): − classified as +
    • False Negative (FN): + classified as −
    • True Negative (TN): − classified as −

  17. Precision and Recall
  • Precision
    • Proportion of selected items the classifier got right
    • TP / (TP + FP)
  • Recall
    • Proportion of target items the classifier selected
    • TP / (TP + FN)
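
A minimal Python sketch of the two formulas above, assuming the predictions and the true labels are given as two parallel lists; the function name and the "+"/"-" label convention are illustrative choices, not part of the slides.

```python
def precision_recall(predicted, actual, target="+"):
    """Precision and recall for the target ('+') class.
    `predicted` and `actual` are parallel lists of class labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == target and a == target)
    fp = sum(1 for p, a in zip(predicted, actual) if p == target and a != target)
    fn = sum(1 for p, a in zip(predicted, actual) if p != target and a == target)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example: 3 of the 4 selected items are correct, and 3 of the 5 targets were found.
print(precision_recall(["+", "+", "+", "+", "-", "-"],
                       ["+", "+", "+", "-", "+", "+"]))   # -> (0.75, 0.6)
```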

  18. Precision and Recall
  • Why ignore True Negatives?
    • Typically, there are a lot more negatives than positives
    • Internet searches: + are the target websites, − are all other websites
    • Counting TN would skew the statistics and favour a system that classifies everything as negative

  19. Overfitting
  • A common problem with supervised learning is over-specializing the learned relation to the training data
  • Learning from irrelevant features of the data
    • Email features such as paragraph indentation, number of typos, the letter “x” in the sender address, …
  • Works well on training data
    • Because of poor sampling or random chance
  • Fails in real-world tests

  20. Testing Data
  • Evaluate the learned relation using unseen test data
    • i.e. data that was not used in training
    • Therefore the system is not overfitted for it
  • Split the training data beforehand, keep part of it away for testing
  • Only works once!
    • If you reuse testing data, you are overfitting your system for that test!!
    • Never do that!!!

  21. Cross-Validation
  • Shortcomings of holding out test data
    • Test only works once
    • Training on less data, therefore the result is less accurate
  • n-fold cross-validation
    • Split the training corpus into n parts
    • Train with n−1, test with 1
    • Run n tests, each time using a different part for testing
    • Final training with all data and the best features
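
A minimal sketch of the n-fold procedure above, assuming hypothetical train_fn and accuracy_fn callbacks supplied by the caller; the shuffling seed and the slicing used to build the folds are illustrative choices, not part of the slides.

```python
import random

def cross_validate(examples, train_fn, accuracy_fn, n=10, seed=0):
    """n-fold cross-validation: train on n-1 folds, test on the held-out fold,
    and average the n accuracy scores.
    Assumed callbacks: train_fn(train_set) -> model, accuracy_fn(model, test_set) -> float."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::n] for i in range(n)]        # n roughly equal parts
    scores = []
    for i in range(n):
        test_set = folds[i]
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(train_set)
        scores.append(accuracy_fn(model, test_set))
    return sum(scores) / n                        # average accuracy over the n tests
```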

  22. Naïve Bayes Classifier
  • P(C_x|F_1,…,F_n) ∝ P(C_x) ∏_i P(F_i|C_x)
  • Classify the item in the class C_x with maximum probability
  • Weighted Naïve Bayes Classifier
    • Paper on website
    • Give each feature F_i a weight w_i
    • Learn the proper weight values
    • P(C_x|F_1,…,F_n) ∝ P(C_x) ∏_i P(F_i|C_x)^w_i
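
A minimal sketch of the (unweighted) classification rule P(C_x) ∏_i P(F_i|C_x), assuming the class priors and per-class likelihood tables have already been estimated; working in log space and flooring unseen features at a tiny probability are implementation choices added here, not part of the slides.

```python
import math

def classify(features, priors, likelihoods):
    """Pick the class Cx maximizing P(Cx) * prod_i P(Fi|Cx).
    `priors[c]` is P(c); `likelihoods[c][f]` is P(f|c) (hypothetical lookup tables).
    Log-probabilities avoid numerical underflow for many features."""
    best_class, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)
        for f in features:
            score += math.log(likelihoods[c].get(f, 1e-9))   # tiny floor for unseen features
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```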

  23. Learning the Weights
  • Start with initial weight values
  • At each iteration, for each feature:
    • Measure the impact of that feature on the accuracy of the classification
    • Modify the weight to increase the accuracy of the classification
  • End if:
    • Iteration limit is reached
    • Accuracy increase less than threshold

  24. Learning the Weights
  • Define the following:
    • Initial weight values: w_i(0) = 1
    • Learning rate: α
    • Measure of the accuracy of the classification using feature F_i at iteration n: A_i^n
    • Function to convert A_i^n into a weight variation: Δ(A_i^n)
      • Δ(A_i^n) = (1 + e^(−A_i^n))^(−1) · [1 − (1 + e^(−A_i^n))^(−1)]²
    • Threshold improvement in accuracy: ε
    • Iteration limit: n_max

  25. Learning the Weights
  • Start with w_i(0)
  • At iteration n, for each feature F_i:
    • Measure A_i^n
    • Compute Δw_i(n) = α·Δ(A_i^n)
    • w_i(n) = w_i(n−1) + Δw_i(n)
  • End if:
    • n = n_max (entire algorithm)
    • A_i^n < ε (feature F_i)
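
A minimal sketch of the weight-update loop from slides 23-25. The accuracy_fn callback standing in for "measure A_i^n", the placement of the learning rate α in the update, and the per-feature stopping test are assumptions made for illustration, not the paper's exact procedure.

```python
import math

def learn_weights(features, accuracy_fn, alpha=0.1, epsilon=1e-3, n_max=100):
    """Sketch of weight learning for the weighted Naive Bayes classifier.
    Assumed callback: accuracy_fn(i, weights) returns A_i^n, the measured
    accuracy impact of feature F_i under the current weights."""
    weights = {i: 1.0 for i in range(len(features))}    # w_i(0) = 1
    active = set(weights)                               # features still being tuned
    for n in range(1, n_max + 1):                       # stop at the iteration limit
        for i in list(active):
            a = accuracy_fn(i, weights)
            if a < epsilon:                             # improvement below threshold
                active.discard(i)                       # stop updating this feature
                continue
            sigma = 1.0 / (1.0 + math.exp(-a))
            delta = sigma * (1.0 - sigma) ** 2          # Δ(A_i^n)
            weights[i] += alpha * delta                 # w_i(n) = w_i(n-1) + α·Δ(A_i^n)
        if not active:
            break
    return weights
```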

  26. Unsupervised Learning
  • Given a training corpus of data points
    • Observed values of random variables in a Bayesian network
    • Series of data points
    • Orbits of planets
  • Learn the underlying pattern in the data
    • Existence and conditional probabilities of hidden variables
    • Number of classes and classification rules
    • Kepler’s laws of planetary motion

  27. Unsupervised Learning Example
  • 2D state space with unclassified observations
  • Learn the number and form of the clusters
  • Problem of unsupervised clustering
    • Many algorithms proposed for it
    • More research still being done for better algorithms, different kinds of data, …
  [Figure: scatter plot of unlabelled observations (*) forming several natural groups]

  28. Unsupervised Learning Algorithm
  • Define a similarity measure, to compare pairs of elements
  • Starting with no clusters:
    • Pick a seed element
    • Group similar elements until threshold
    • Pick a new seed from the free elements and start again
  [Figure: the scatter plot with clusters grown around seed elements]
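
A minimal sketch of the seed-based grouping described above, assuming a caller-supplied pairwise similarity function and threshold; the ordering used to pick seeds is an arbitrary choice made here.

```python
def seed_clustering(elements, similarity, threshold):
    """Greedy seed-based clustering: pick a seed, gather all free elements
    whose similarity to the seed exceeds `threshold`, then repeat with a new seed.
    `similarity(a, b)` is an assumed pairwise similarity measure."""
    free = list(elements)
    clusters = []
    while free:
        seed = free.pop(0)                     # pick a seed element
        cluster = [seed]
        remaining = []
        for e in free:
            if similarity(seed, e) >= threshold:
                cluster.append(e)              # group similar elements with the seed
            else:
                remaining.append(e)
        clusters.append(cluster)
        free = remaining                       # start again with the free elements
    return clusters
```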

  29. Unsupervised Learning Algorithm
  • Starting with one all-encompassing cluster:
    • Find the cluster with the highest internal dissimilarity
    • Find the most dissimilar pair of elements inside that cluster
    • Split into two clusters
    • Repeat until all clusters have internal homogeneity
    • Merge homogeneous clusters
  [Figure: the scatter plot repeatedly split into smaller, more homogeneous clusters]

  30. Unsupervised Learning Evaluation
  • Need to evaluate the fitness of the relationship learned
    • Number of clusters vs. their internal properties
    • Difference between clusters vs. internal homogeneity
    • Number of parameters vs. number of hidden variables in a Bayesian network
  • No way of knowing what the optimal solution is

  31. K-Means
  • Popular unsupervised clustering algorithm
  • Data represented as a cloud of points in state space
  • Target:
    • Group points in k clusters
    • Minimize intra-cluster variance

  32. K-Means
  • Start with k random cluster centers
  • For each iteration:
    • For each data point:
      • Associate the point to the nearest cluster center
      • Add to variance
    • Move each cluster center to the center of mass of its associated data point cloud
  • End when:
    • Variance less than threshold
    • Cluster centers stabilize

  33. K-Means
  • We have:
    • Data points: x_1, …, x_i, …, x_n
    • Clusters: C_1, …, C_j, …, C_k
    • Cluster centers: μ_1, …, μ_j, …, μ_k
  • Minimize intra-cluster variance
    • V = Σ_j Σ_{x_i ∈ C_j} |x_i − μ_j|²
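
A minimal sketch of the k-means loop from slides 32-33, here on 2D points given as tuples; the random initialization, the empty-cluster handling and the exact stopping test (centers unchanged or iteration limit) are implementation choices added for the example.

```python
import random

def k_means(points, k, max_iter=100, seed=0):
    """Basic k-means on 2D points (x, y) tuples.
    Greedily reduces the intra-cluster variance V = sum_j sum_{x_i in C_j} |x_i - mu_j|^2."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                      # k random cluster centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x, y in points:                              # associate each point to the
            j = min(range(k),                            # nearest cluster center
                    key=lambda j: (x - centers[j][0]) ** 2 + (y - centers[j][1]) ** 2)
            clusters[j].append((x, y))
        new_centers = []
        for j in range(k):                               # move each center to the center
            if clusters[j]:                              # of mass of its point cloud
                xs, ys = zip(*clusters[j])
                new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centers.append(centers[j])           # keep an empty cluster's center
        if new_centers == centers:                       # centers have stabilized
            break
        centers = new_centers
    return centers, clusters
```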

  34. K-Means Example
  [Figure: cloud of data points (*) with the k cluster centers (o) shown moving toward the centers of mass of their associated points over successive iterations]

  35. Reinforcement Learning
  • Given a set of possible actions, the resulting state of the environment, and rewards or punishments for each state
    • Taxi driver: tips, car repair costs, tickets
    • Checkers: advantage in number of pieces
  • Learn to maximize the rewards and/or minimize the punishments
    • Maximize tips, minimize damage to the car and police tickets: drive properly
    • Protect own pieces, take enemy pieces: good play strategy

  36. Reinforcement Learning
  • Learning by trial and error
    • Try something, see the result
    • Speeding results in tickets, going through a red light results in car damage, quick and safe driving results in tips
    • Checkers pieces in the center of the board are soon lost, pieces on the side are kept longer, sacrifice some pieces to take a greater number of enemy pieces
  • Sacrifice known rewarding actions to explore new, potentially more rewarding actions
  • Develop strategies to maximize rewards while minimizing penalties over the long term

  37. Q-Learning
  • Each state has:
    • A reward or punishment
    • A list of possible actions, which lead to other states
  • Learn the value of state-action pairs
    • Q-value

  38. Q-Learning
  • Update the value of the previous (t−1) state-action pair based on the current (t) state-action value
    • ΔQ(s_{t−1}, a_{t−1}) = α[R_{t−1} + γ·max_a Q(s_t, a) − Q(s_{t−1}, a_{t−1})]
  • Q(s,a): estimated value of state-action pair (s,a)
  • R_t: reward of state s_t
  • α: learning rate
  • γ: discount factor of future rewards
    • γ = 0 (future rewards are irrelevant), γ = 1 (future rewards are worth the same as current rewards)
  • Update: Q(s_{t−1}, a_{t−1}) = Q(s_{t−1}, a_{t−1}) + ΔQ(s_{t−1}, a_{t−1})
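
The temporal-difference update above, as a minimal Python sketch; representing Q as a dictionary keyed by (state, action) pairs, and the particular α and γ defaults, are illustrative choices rather than anything prescribed by the slides.

```python
from collections import defaultdict

def q_update(Q, s_prev, a_prev, r_prev, s_curr, actions, alpha=0.1, gamma=0.9):
    """One Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * [R + gamma * max_a' Q(s',a') - Q(s,a)].
    `Q` maps (state, action) pairs to estimated values."""
    best_next = max(Q[(s_curr, a)] for a in actions) if actions else 0.0
    delta = alpha * (r_prev + gamma * best_next - Q[(s_prev, a_prev)])
    Q[(s_prev, a_prev)] += delta
    return Q

# Q-values default to 0 until a state-action pair has been visited.
Q = defaultdict(float)
q_update(Q, s_prev="s0", a_prev="right", r_prev=-1.0, s_curr="s1",
         actions=["left", "right"])
```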

  39. Exploration Function
  • If the agent always does the action with max Q(s,a), it always evaluates the same state-action pairs
  • Need an exploration function
    • Trade-off greed vs. curiosity
    • Try rarely-explored low-payoff actions instead of well-known high-payoff actions
  • Many possible functions

  40. Exploration Function
  • Define:
    • Q(s,a): estimated value of (s,a)
    • N(s,a): number of times (s,a) has been tried
    • R_max: maximum possible value of Q(s,a)
    • N_min: minimum number of times we want the agent to try (s,a)
  • f( Q(s,a), N(s,a) ) =
    • R_max if N(s,a) < N_min
    • Q(s,a) otherwise
  • Agent picks the action with the maximum f(.) value
  • Guarantees each (s,a) pair is explored at least N_min times
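
A minimal sketch of this exploration function; the Q and N tables are assumed to return 0 for unseen pairs (e.g. defaultdicts), and the N_min default is an arbitrary illustration.

```python
def exploration_value(q, n, r_max, n_min):
    """f(Q(s,a), N(s,a)): optimistic value used to rank actions.
    Returns R_max while (s,a) has been tried fewer than N_min times,
    which forces the agent to try every pair at least N_min times."""
    return r_max if n < n_min else q

def pick_action(state, actions, Q, N, r_max, n_min=5):
    """Pick the action with the maximum f(.) value in `state`.
    `Q` and `N` map (state, action) pairs to values and visit counts."""
    return max(actions,
               key=lambda a: exploration_value(Q[(state, a)], N[(state, a)], r_max, n_min))
```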

  41. Limits of RL
  • Search
    • Number of state-action pairs can be very large
    • Intermediate rewards can be noisy
  • Real-world search
    • Initial policy can have very poor reward
    • Necessary exploration of suboptimal actions can be costly
    • Some states are hard to reach

  42. Policy
  • Learn the optimal policy π in a decision network
    • π: S → A
    • EU(π) = Σ_{t=0}^∞ γ^t R_t
  • Greedy search
    • Modify the policy until EU stops increasing

  43. Helicopter Flight Control
  • Sustained stable inverted flight
    • Very difficult for humans
    • First AI able to do it

  44. Helicopter Flight Control
  • Collect flight data with a human pilot
  • Learn a model of the helicopter dynamics
    • Stochastic and nonlinear
    • Supervised learning
  • Learn a policy for the helicopter controller
    • Reinforcement learning

  45. Helicopter Dynamics
  • States
    • Position, orientation, velocity, angular velocity
    • 12 variables
  • 391 seconds of flight data
    • Time step 0.1 s
    • 3910 triplets (s_t, a_t, s_{t+1})
  • Learn the probability distribution P(s_{t+1} | s_t, a_t)
  • Implemented in a simulator and tested by the pilot

  46. Helicopter Controller
  • Problem definition
    • S: set of possible states
    • s_0: initial state (s_0 ∈ S)
    • A: set of possible actions
    • P(S|S,A): state transition probabilities
    • γ: discount factor
    • R: reward function mapping states to values
  • At state s_t, the controller picks action a_t; the system transitions to a random state s_{t+1} with probability P(s_{t+1} | s_t, a_t)

  47. Helicopter Controller
  • Reward function
    • Punish deviation from the desired helicopter position and velocity
    • R ∈ [−∞, 0]
  • Policy learning
    • Reinforcement learning
    • EU(π) = Σ_{t=0}^∞ γ^t R_t
  • Problem
    • Stochastic state transitions
    • Impossible to compare several policies!

  48. PEGASUS Algorithm
  • Predefined series of random numbers
    • Length of the series is a function of the complexity of the policy
  • Use the same series to test all policies
    • At time t, each policy encounters the same random event
  • Simulate the stochastic environment
    • Environment is stochastic from the point of view of the agent
    • Environment is deterministic from our point of view
  • Makes comparison between policies possible
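
A minimal sketch of the common-random-numbers idea behind PEGASUS (not the published algorithm itself): every candidate policy is rolled out against the same predefined series of random numbers, so the otherwise stochastic simulator behaves deterministically for comparison purposes. The simulate_step callback, the horizon and the discount handling are assumptions made for this illustration.

```python
import random

def evaluate_policy(policy, simulate_step, s0, horizon, gamma, random_series):
    """Estimate EU(pi) for one policy using a fixed series of random numbers,
    so every policy is evaluated against the same random events.
    Assumed callback: simulate_step(state, action, u) uses the random number u
    to resolve the stochastic transition and returns (next_state, reward)."""
    total, discount, state = 0.0, 1.0, s0
    for t in range(horizon):
        action = policy(state)
        state, reward = simulate_step(state, action, random_series[t])
        total += discount * reward                 # accumulate gamma^t * R_t
        discount *= gamma
    return total

# The same predefined series is reused for every candidate policy.
rng = random.Random(42)
common_random_numbers = [rng.random() for _ in range(200)]
```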

  49. Summary of Learning
  ECE457 Applied Artificial Intelligence, R. Khoury (2007)
