These slides cover key concepts in evaluating machine learners: training and test sets, experimentation protocols, and performance measures. They stress the need for sufficient data to obtain reliable results, address pitfalls such as overfitting, and survey learning methods from supervised to unsupervised techniques, showing how measures like precision, recall, and root-mean-square error quantify model performance in real-world applications.
Learning from observations (b)
• How good is a machine learner?
• Experimentation protocols
• Performance measures
• Academic benchmarks vs Real Life
Experimentation protocols
• Fooling yourself: training a decision tree on 100 example instances from Earth and then sending the robot to Mars
• training set / test set distinction
• both must be of sufficient size:
  • large training set for a reliable hypothesis ‘h’ (coefficients etc.)
  • large test set for a reliable prediction of real-life performance
Experimentation protocols
• One training set / one test set over a four-year PhD project: still fooling yourself!
• Solution:
  • training set
  • test set
  • final evaluation set with real-life data
• k-fold evaluation: k subsets from a large database, measuring the standard deviation of performance over the k experiments
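To make the protocol concrete, here is a minimal k-fold evaluation loop in NumPy (a sketch, not from the original slides); `train_fn` and `test_fn` are hypothetical stand-ins for any learner and any performance measure:

```python
import numpy as np

def k_fold_scores(X, y, train_fn, test_fn, k=10, seed=0):
    """Split the data into k folds; train on k-1 folds, test on the held-out fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # shuffle once, then slice into folds
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        h = train_fn(X[train_idx], y[train_idx])    # fit the hypothesis h
        scores.append(test_fn(h, X[test_idx], y[test_idx]))
    scores = np.array(scores)
    # mean performance plus its standard deviation over the k experiments
    return scores.mean(), scores.std()
```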
Experimentation protocols
• What to do if you don’t have enough data?
• Solution: leave-one-out
  • use N-1 samples for training
  • use the N-th sample for testing
  • repeat for all N samples
  • compute the average performance
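Leave-one-out is simply k-fold with k = N. A minimal sketch, assuming the same hypothetical `train_fn`/`test_fn` as in the k-fold example:

```python
import numpy as np

def leave_one_out(X, y, train_fn, test_fn):
    scores = []
    for i in range(len(X)):
        mask = np.ones(len(X), dtype=bool)
        mask[i] = False                          # hold out the i-th sample
        h = train_fn(X[mask], y[mask])           # train on the other N-1 samples
        scores.append(test_fn(h, X[~mask], y[~mask]))
    return float(np.mean(scores))                # average performance over all N runs
```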
Performance
• Example: percentage of correctly classified samples (P)
• P_train: performance on the training set
• P_test: performance on the test set
• P_real ≈ P_test: the test-set performance serves as the prediction of real-life performance
Performance, two-class
[figure: 2×2 contingency table of system decision (says Yes / says No) versus ground truth (is Yes / is No)]
Precision = 100 * #correct_hits / #says_Yes [%]
Recall = 100 * #correct_hits / #is_Yes [%]
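These two formulas translate directly into code. A minimal sketch for boolean label arrays; the names `says_yes` and `is_yes` mirror the slide's notation:

```python
import numpy as np

def precision_recall(says_yes, is_yes):
    """says_yes, is_yes: boolean arrays (system decision, ground truth)."""
    correct_hits = np.sum(says_yes & is_yes)              # says Yes AND is Yes
    precision = 100.0 * correct_hits / np.sum(says_yes)   # [%]
    recall    = 100.0 * correct_hits / np.sum(is_yes)     # [%]
    return precision, recall
```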
Performance, multi-class
• Confusion matrix [figure: table of counts, true class versus predicted class]
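A confusion matrix is just a table of counts over (true, predicted) pairs. A minimal sketch, assuming integer class labels 0..n_classes-1 (the row/column convention below is a common one, not stated on the slide):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1          # row = true class, column = predicted class
    return cm                  # diagonal = correct, off-diagonal = confusions
```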
Rankings / hit lists
• Given a query Q, the system returns a hit list of matches M: an ordered set, with instances i in decreasing likelihood of correctness
• Precision: proportion of correct instances in the hit list M
• Recall: proportion of correct instances out of the total number of target samples in the database
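For a single query, both measures reduce to simple counts over the hit list. A sketch with assumed names: `hits` is the ordered list of returned instance ids, `targets` the set of all correct instances in the database:

```python
def hitlist_precision_recall(hits, targets):
    n_correct = sum(1 for h in hits if h in targets)
    precision = n_correct / len(hits)      # correct fraction of the hit list
    recall = n_correct / len(targets)      # fraction of all targets retrieved
    return precision, recall
```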
Function approximation
• For, e.g., regression models: learning an ‘analog’ output
• Example: target function t(x), obtained output function o(x)
• For performance evaluation, compute the root-mean-square error (RMS error):
  RMS = sqrt( Σ (o(x) - t(x))² / N )
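The RMS formula in code, as a one-line NumPy sketch over arrays of outputs o(x) and targets t(x):

```python
import numpy as np

def rms_error(o, t):
    """Root-mean-square error between obtained outputs o and targets t."""
    return np.sqrt(np.mean((o - t) ** 2))   # sqrt of the mean squared deviation
```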
Learning curves
[figure: P [% OK] versus #epochs (presentations of the training set); performance on the training set keeps climbing towards 100%, while performance on the test set peaks and then declines (no generalization: overfit). Stop training at the test-set peak!]
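The "Stop!" rule in the figure is early stopping. A sketch of the idea, with `train_one_epoch` and `evaluate` as hypothetical stand-ins (the slides monitor the test set; in practice a separate validation set is used for this, reserving the test set for the final estimate):

```python
def train_with_early_stopping(h, train_set, test_set,
                              train_one_epoch, evaluate,
                              max_epochs=1000, patience=10):
    best_p, best_epoch = -1.0, 0
    for epoch in range(max_epochs):
        train_one_epoch(h, train_set)        # one presentation of the training set
        p = evaluate(h, test_set)            # P [% OK] on the held-out data
        if p > best_p:
            best_p, best_epoch = p, epoch
        elif epoch - best_epoch >= patience: # held-out curve going down: overfit
            break
    # a fuller version would checkpoint h at best_epoch and restore it here
    return h, best_p
```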
Overfitting
• The learner learns the training set, even perfectly, like a lookup table (LUT):
  • memorizing training instances
  • without correctly handling unseen data
• Usual cause: more free parameters in the learner than data values to constrain them
Preventing Overfit
• For good generalization, the number of training examples must be much larger than the number of attributes (features): N_samples / N_attr >> 1
Preventing Overfit
• For good generalization, also: N_samples >> N_coefficients
• e.g., solving a linear equation: 2 coefficients need at least 2 data points in 2D
• Coefficients: model parameters, weights etc.
Preventing Overfit
• For good generalization: N_datavalues >> N_coefficients
• Coefficients: model parameters, weights etc.
• N_datavalues = N_samples * N_attributes
• e.g., use N_datavalues / N_coefficients for system comparison
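These rules of thumb are easy to compute for any system under comparison; a small illustrative helper (names assumed, not from the slides):

```python
def generalization_ratios(n_samples, n_attributes, n_coefficients):
    """Both returned ratios should be >> 1 for good generalization."""
    n_datavalues = n_samples * n_attributes
    return (n_samples / n_attributes,        # samples per attribute
            n_datavalues / n_coefficients)   # data values per model coefficient
```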
Example: machine-print OCR
• Very accurate today, but:
  • needs 5000 examples of each character
  • printed on ink-jet, laser, and matrix printers, and fax copies
  • of many brands of printers
  • on many paper types
  • for 1 font & point size!
Ensemble methods
• Boosting:
  • train a learner h[m]
  • re-weight each of the instances
  • weight the method m
  • train a new learner h[m+1]
  • perform majority voting on the ensemble’s opinions
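A minimal AdaBoost-style sketch of this loop; the slide describes boosting generically, so the decision-stump weak learner and the exact weighting formulas below are illustrative assumptions, with labels in {-1, +1}:

```python
import numpy as np

def train_stump(X, y, w):
    """Weighted-error-minimizing one-feature threshold rule ('stump')."""
    best = None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for sign in (1, -1):
                pred = np.where(X[:, f] > thr, sign, -sign)
                err = np.sum(w[pred != y])        # weighted error of this stump
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def adaboost(X, y, m_rounds=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                       # instance weights
    ensemble = []
    for _ in range(m_rounds):
        err, f, thr, sign = train_stump(X, y, w)  # train learner h[m]
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # weight of the method m
        pred = np.where(X[:, f] > thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)            # re-weight the instances
        w /= w.sum()                              # up-weights the mistakes
        ensemble.append((alpha, f, thr, sign))
    return ensemble

def vote(ensemble, X):
    """Weighted majority vote over the ensemble's opinions."""
    total = sum(a * np.where(X[:, f] > thr, s, -s)
                for a, f, thr, s in ensemble)
    return np.sign(total)
```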
The advantage of democracy: partly intelligent, independent deciders
Learning methods
• Gradient descent, parameter finding (multi-layer perceptron, regression)
• Expectation Maximization (smart Monte Carlo search for the best model, given the data)
• Knowledge-based, symbolic learning (Version Spaces)
• Reinforcement learning
• Bayesian learning
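As an illustration of the first method, a sketch of gradient-descent parameter finding for the simplest case, a linear regression model (not from the slides):

```python
import numpy as np

def gradient_descent_regression(X, y, lr=0.01, epochs=1000):
    w = np.zeros(X.shape[1])                 # model coefficients
    b = 0.0
    for _ in range(epochs):
        err = X @ w + b - y                  # residuals o(x) - t(x)
        w -= lr * (X.T @ err) / len(y)       # step down the squared-error gradient
        b -= lr * err.mean()
    return w, b
```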
Memory-based ‘learning’
• Lookup table (LUT)
• Nearest neighbour: argmin(dist)
• k-Nearest neighbour: majority(N_argmin(dist, k))
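A sketch of the k-nearest-neighbour rule, majority(N_argmin(dist, k)), over stored training samples:

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every stored sample
    nearest = np.argsort(dists)[:k]               # N_argmin(dist, k)
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority vote
```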
Unsupervised learning
• k-means clustering
• Kohonen self-organizing maps (SOM)
• Hierarchical clustering
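A minimal k-means sketch: alternate between assigning points to the nearest centre and moving each centre to the mean of its assigned points (the initialization and fixed iteration count are illustrative choices):

```python
import numpy as np

def k_means(X, k=3, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)].astype(float)  # init from data
    for _ in range(iters):
        # assign each point to its nearest centre
        labels = np.argmin(np.linalg.norm(X[:, None] - centres[None], axis=2), axis=1)
        # move each centre to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels
```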
Summary (1)
• Learning is needed for unknown environments (and/or lazy designers)
• Learning agent = performance element + learning element
• Learning method depends on the type of performance element, the available feedback, the type of component to be improved, and its representation
Summary (2)
• For supervised learning, the aim is to find a simple hypothesis that is approximately consistent with the training examples
• Decision tree learning using information gain: entropy-based
• Learning performance = prediction accuracy measured on test set(s)