Machine Learning Algorithms in Computational Learning Theory
Machine Learning Algorithms in Computational Learning Theory | TIAN HE JI GUAN WANG Shangxuan Xiangnan Kun Peiyong Hancheng | 25th Jan 2013
Outline • Introduction • Probably Approximately Correct Framework (PAC) • PAC Framework • Weak PAC-Learnability • Error Reduction • Mistake Bound Model of Learning • Mistake Bound Model • Predicting from Expert Advice • The Weighted Majority Algorithm • Online Learning from Examples • The Winnow Algorithm • PAC versus Mistake Bound Model • Conclusion • Q & A
Machine Learning • A machine cannot learn on its own, but it can be trained.
Machine Learning • Definition "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E". ---- Tom M. Mitchell • Algorithm types • Supervised learning: regression, classification from labeled data • Unsupervised learning: clustering, data mining • Reinforcement learning: learning to act better from observed rewards
Machine Learning • Other Examples • Medical diagnosis • Handwritten character recognition • Customer segmentation (marketing) • Document segmentation (classifying news) • Spam filtering • Weather prediction and climate tracking • Gene prediction • Face recognition
Computational Learning Theory • Why learning works • Under what conditions is successful learning possible and impossible? • Under what conditions is a particular learning algorithm assured of learning successfully? • We need particular settings (models) • Probably approximately correct (PAC) • Mistake bound models
Probably Approximately Correct Framework (PAC) • PAC Learnability • Weak PAC-Learnability • Error Reduction • Occam’s Razor
PAC Learning • PAC Learning • Any hypothesis that is consistent with a sufficiently large set of training examples is unlikely to be wrong. • Stationarity: the future is like the past (training and future examples come from the same distribution). • Concept: an efficiently computable function over a domain, f : {0,1}^n -> {0,1}. • A concept class is a collection of concepts.
PAC Learnability • Learnability • Requirements for ALG • ALG must, with arbitrarily high probability (1 − δ), output a hypothesis having arbitrarily low error ε. • ALG must do so efficiently, in time that grows at most polynomially in 1/δ and 1/ε (and in the problem size n).
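Spelled out in symbols (a standard formulation, writing ε for the slide's e and δ for its d; the notation err_D and h = ALG(S) is ours, not the deck's):

```latex
% C is PAC-learnable if there is an ALG such that for every target c in C,
% every distribution D, and every eps, delta in (0, 1/2):
\Pr_{S \sim D^m}\bigl[\,\mathrm{err}_D(\mathrm{ALG}(S)) \le \varepsilon\,\bigr] \;\ge\; 1 - \delta,
```

with both the sample size m and the running time of ALG bounded by a polynomial in 1/ε, 1/δ, and n.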
PAC Learning for Decision Lists • A Decision List (DL) is a way of representing a certain class of functions over n-tuples. • Example: if x4 = 1 then f(x) = 0, else if x2 = 1 then f(x) = 1, else f(x) = 0. • An upper bound on the number of all possible boolean decision lists on n variables is n! · 4^n = n^O(n).
PAC Learning for Decision Lists • Algorithm: a greedy approach (Rivest, 1987) 1. If the example set S is empty, halt. 2. Examine each term of length k until a term t is found such that all examples in S that make t true have the same label v. 3. Add (t, v) to the decision list and remove those examples from S. 4. Repeat steps 1–3. • Clearly, it runs in polynomial time. A sketch of this procedure is given below.
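A minimal Python sketch of this greedy procedure, under the assumption that examples are tuples of bits and `terms` is the candidate pool of conditions (all names here are illustrative, not from the slides):

```python
def learn_decision_list(S, terms):
    """Greedy decision-list learner in the style of Rivest (1987).

    S: list of (example, label) pairs; terms: candidate conditions,
    each a predicate example -> bool (e.g. literals over the variables,
    including the always-true term so the list can terminate).
    Returns [(term, label), ...] or None if no term ever qualifies.
    """
    S = list(S)
    dl = []
    while S:
        for t in terms:
            covered = [(x, v) for x, v in S if t(x)]
            labels = {v for _, v in covered}
            if covered and len(labels) == 1:
                # Every example satisfying t has the same label:
                # emit the rule (t, v) and discard those examples.
                dl.append((t, labels.pop()))
                S = [(x, v) for x, v in S if not t(x)]
                break
        else:
            return None  # no consistent rule found for the rest of S
    return dl

def evaluate(dl, x):
    """Evaluate a decision list on example x (default output 0)."""
    for t, v in dl:
        if t(x):
            return v
    return 0

# The slide's example list "if x4 then 0 elif x2 then 1 else 0",
# with variables 1-indexed as x[1..n] via a leading dummy entry:
example_dl = [(lambda x: x[4] == 1, 0), (lambda x: x[2] == 1, 1),
              (lambda x: True, 0)]
```

Each pass removes at least one example from S and scans at most |terms| conditions, which is why the run time is polynomial, as the slide notes.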
What does PAC do? • PAC gives a supervised learning framework for classifying data.
How can we use PAC? • Use PAC as a general framework to guide us on efficient sampling for machine learning • Use PAC as a theoretical analyzer to distinguish hard problems from easy problems • Use PAC to evaluate the performance of some algorithms • Use PAC to solve some real problems
What are we going to cover? • Explore what PAC can learn • Apply PAC to real data with noise • Give a probabilistic analysis of the performance of PAC
Analysis of Greedy Algorithm • The output • Performance Guarantee
PAC Learning for Decision Lists 1. For a given sample S of m examples, partition the set of all concepts that agree with the target f on S into a "bad" set (error > ε) and a "good" set (error ≤ ε); we want Pr[some bad concept is consistent with S] ≤ δ. 2. Consider any single bad h: the probability that we pick S such that h ends up consistent with it is at most (1 − ε)^m. 3. By the union bound over the whole class, Pr[some bad h is consistent with S] ≤ |C| (1 − ε)^m. 4. Putting it together: requiring |C| (1 − ε)^m ≤ δ and using (1 − ε)^m ≤ e^(−εm) gives m ≥ (1/ε)(ln|C| + ln(1/δ)).
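As a concrete, illustrative instance of the final bound, take decision lists over n = 10 variables with ε = 0.1 and δ = 0.05; the arithmetic below is ours, not the deck's:

```latex
\ln|C| \;\le\; \ln\!\bigl(10! \cdot 4^{10}\bigr) \;\approx\; 15.1 + 13.9 \;=\; 29.0,
\qquad
m \;\ge\; \frac{1}{0.1}\Bigl(29.0 + \ln\tfrac{1}{0.05}\Bigr) \;\approx\; 10 \times 32.0 \;=\; 320 .
```

Since ln|C| = O(n log n), the required sample size is polynomial in n, 1/ε, and 1/δ, which is what makes decision lists PAC-learnable.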
The Limitation of PAC for DLs • What if the examples are like below? (The illustrating figure is not reproduced in this transcript.)
Other Concept Classes • Decision trees: DTs of restricted size are not PAC-learnable, although those of arbitrary size are. • AND-formulas: PAC-learnable. • 3-CNF formulas: PAC-learnable. • 3-term DNF formulas: it turns out to be an NP-hard problem, given S, to come up with a 3-term DNF formula consistent with S. Therefore this concept class is not PAC-learnable, but only for now, as we shall soon revisit this class with a modified definition of PAC-learning.
Weak PAC-Learnability • A weak learner only has to beat random guessing: its hypothesis may have error as large as 1/2 − γ for some γ > 0. Benefits: • Loosens the requirement for a highly accurate algorithm • Reduces the running time, since |S| can be smaller • Finds a "good" concept using a simple algorithm A
Error Reduction by Boosting • The basic idea exploits the fact that a weak learner can learn a little on any distribution; by running it on suitably modified distributions and combining the results, we can obtain a much lower error rate.
Error Reduction by Boosting • Detailed Steps: 1. Some algorithm A produces a hypothesis with error probability at most p = 1/2 − γ (γ > 0). We would like to decrease this error probability to 1/2 − γ′ with γ′ > γ. 2. We invoke A three times, each time with a slightly different distribution, and get hypotheses h1, h2, and h3, respectively. 3. The final hypothesis is then h = Maj(h1, h2, h3).
Error Reduction by Boosting • Learn h1 from D1 with error at most p. • Modify D1 so that the total weight of the examples h1 misclassifies becomes 1/2; call the result D2. Pick a sample S2 from this distribution and use A to learn h2. • Modify the distribution again so that all of the weight lies on examples where h1 and h2 disagree; call the result D3. Pick a sample S3 from it and use A to learn h3. A schematic sketch of this construction follows below.
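A schematic Python sketch of this three-distribution construction (the helper names and the {0,1}-label convention are our assumptions; `base_learner` stands in for the weak algorithm A):

```python
import random

def boost_three(base_learner, examples, labels, m):
    """Sketch of the h = Maj(h1, h2, h3) construction.

    base_learner: takes a list of (example, label) pairs sampled from
    the current distribution and returns a hypothesis h(x) -> {0, 1}
    with error at most p = 1/2 - gamma on that distribution.
    """
    n = len(examples)

    def draw(weights):
        # Sample m training pairs according to the given distribution.
        idx = random.choices(range(n), weights=weights, k=m)
        return [(examples[i], labels[i]) for i in idx]

    # h1: learned on the original distribution D1 (uniform here).
    h1 = base_learner(draw([1.0] * n))

    # D2: reweight so h1's mistakes carry exactly half the total mass.
    wrong = [i for i in range(n) if h1(examples[i]) != labels[i]]
    right = [i for i in range(n) if h1(examples[i]) == labels[i]]
    w2 = [0.0] * n
    for i in wrong:
        w2[i] = 0.5 / len(wrong)
    for i in right:
        w2[i] = 0.5 / len(right)
    h2 = base_learner(draw(w2))

    # D3: all mass on examples where h1 and h2 disagree.
    disagree = [i for i in range(n) if h1(examples[i]) != h2(examples[i])]
    w3 = [1.0 if i in disagree else 0.0 for i in range(n)]
    h3 = base_learner(draw(w3)) if disagree else h1

    # Final hypothesis: majority vote of the three.
    return lambda x: 1 if h1(x) + h2(x) + h3(x) >= 2 else 0
```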
Error Reduction by Boosting • The total error probability of h is at most 3p^2 − 2p^3, which is less than p for p ∈ (0, 1/2). The full proof of this bound is given in [1]. • Thus there exists γ′ > γ such that the error probability of our new hypothesis is at most 1/2 − γ′. [1] http://courses.csail.mit.edu/6.858/lecture-12.ps
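For intuition, if the three hypotheses erred independently with probability p each (the actual proof in [1] does not need this independence assumption), the majority vote errs exactly when at least two of them err:

```latex
\Pr[h \text{ errs}] \;=\; \binom{3}{2} p^2 (1 - p) \;+\; p^3 \;=\; 3p^2 - 2p^3 \;<\; p
\quad \text{for } p \in \bigl(0, \tfrac{1}{2}\bigr).
```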
Adaboost • Defines a classifier using an additive model: F(x) = Σt αt ht(x), where each ht is a weak hypothesis and αt is its weight; the final classifier is H(x) = sign(F(x)).
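A minimal AdaBoost sketch over this additive model, assuming labels in {−1, +1} and a finite pool of candidate weak classifiers (the function names and the pool are illustrative, not from the slides):

```python
import math

def adaboost(weak_pool, X, y, rounds):
    """Fit F(x) = sum_t alpha_t * h_t(x) and return sign(F(x)).

    X: feature vectors; y: labels in {-1, +1};
    weak_pool: candidate weak classifiers, each mapping x -> {-1, +1}.
    """
    n = len(X)
    w = [1.0 / n] * n          # D_1: start from the uniform distribution
    ensemble = []              # the (alpha_t, h_t) terms of the model

    for _ in range(rounds):
        # Choose the weak classifier with the smallest weighted error.
        h, err = min(
            ((h, sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi))
             for h in weak_pool),
            key=lambda pair: pair[1])
        if err >= 0.5:
            break              # nothing in the pool beats random guessing
        err = max(err, 1e-12)  # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Reweight: mistakes gain weight, correct examples lose it.
        w = [wi * math.exp(-alpha * yi * h(xi))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]

    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```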
Error Reduction by Boosting • Fig.: Error curves for boosting C4.5 on the letter dataset, as reported by Schapire et al. The lower curve is the training error and the upper curve is the test error. (Figure not reproduced in this transcript.)
PAC learning conclusion • Strong PAC learning • Weak PAC learning • Error reduction and boosting
Mistake Bound Model of Learning • Mistake Bound Model • Predicting from Expert Advice • The Weighted Majority Algorithm • Online Learning from Examples • The Winnow Algorithm
Mistake Bound Model of Learning | Basic Settings • x – examples • c – the target concept, c ∈ C • x1, x2, … xt – an input sequence • At the t-th stage: • The algorithm receives xt • The algorithm predicts a classification bt for xt • The algorithm is told the true classification c(xt) • A mistake occurs if c(xt) ≠ bt
Mistake Bound Model of Learning | Basic Settings • A hypothesis class C has an algorithm A with mistake bound M if: • for any concept c ∈ C, and • for any ordering of the examples, • the total number of mistakes ever made by A is bounded by M.
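In symbols (our formalization of the slide's wording, with bt as A's prediction on xt):

```latex
\forall\, c \in C,\;\; \forall\, x_1, x_2, \ldots\,:\qquad
\bigl|\{\, t \;:\; b_t \neq c(x_t) \,\}\bigr| \;\le\; M .
```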
Predicting from Expert Advice • The Weighted Majority Algorithm • Deterministic • Randomized
Predicting from Expert Advice | Basic Flow • The algorithm combines the experts' advice into its own prediction, and is then told the truth. • Assumption: predictions ∈ {0, 1}.
Predicting from Expert Advice | Trial (1) Receive predictions from the experts (2) Make its own prediction (3) Be told the correct answer
Predicting from Expert Advice | An Example • Task: predict whether it will rain today. • Input: advice of n experts, each ∈ {1 (yes), 0 (no)}. • Output: 1 or 0. • Goal: make as few mistakes as possible.
The Weighted Majority Algorithm | Deterministic • Worked example over several trials: each expert starts with weight 1, the algorithm predicts with the weighted majority of the advice, and every expert that predicts incorrectly has its weight halved (1 → 0.50 → 0.25 → …). (The original table of predictions and weights is not reproduced cleanly in this transcript; a sketch of the algorithm follows below.)
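A minimal sketch of the deterministic algorithm behind that example (the function name and input format are our assumptions; β = 1/2 reproduces the halving of weights described above):

```python
def weighted_majority(expert_advice, truths, beta=0.5):
    """Deterministic Weighted Majority over n experts.

    expert_advice: list of per-trial advice vectors (values in {0, 1});
    truths: the correct outcome for each trial.
    Returns the mistake count and the final expert weights.
    """
    n = len(expert_advice[0])
    weights = [1.0] * n
    mistakes = 0

    for advice, truth in zip(expert_advice, truths):
        # Predict with the side holding the larger total weight.
        w1 = sum(w for w, a in zip(weights, advice) if a == 1)
        w0 = sum(w for w, a in zip(weights, advice) if a == 0)
        prediction = 1 if w1 >= w0 else 0
        if prediction != truth:
            mistakes += 1
        # Multiply the weight of each wrong expert by beta (here: halve).
        weights = [w * beta if a != truth else w
                   for w, a in zip(weights, advice)]
    return mistakes, weights
```

A standard fact about this algorithm (Littlestone and Warmuth) is that with β = 1/2 its total number of mistakes is O(m + log n), where m is the number of mistakes made by the best expert in hindsight.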