4. Instance-Based Learning
This document explores instance-based learning, focusing on approaches like k-Nearest Neighbors (k-NN) and Radial Basis Functions (RBF). It discusses local approximation methods that classify instances based on the proximity to query points. The challenges in classifying new instances, particularly the computational costs and sensitivity to noise and dimensionality, are highlighted. Additionally, the text examines locally weighted regression and the distinctions between lazy and eager learning paradigms, emphasizing their implications for model generalization and performance.
4. Instance-Based Learning
E N D
Presentation Transcript
4. Instance-Based Learning 4.1 Introduction Instance-Based Learning: Local approximation to the target function that applies in the neighborhood of the query instance • Cost of classifying new instances can be high: Nearly all computations take place at classification time • Examples: k-Nearest Neighbors • Radial Basis Functions: Bridge between instance-based learning and artificial neural networks 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning K-Nearest Neighbors 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning Most plausible hypothesis 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning Now? Or maybe… 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning Is the simplest hypothesis always the best one? 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning 4.2k-Nearest Neighbor Learning Instance x = [ a1(x), a2(x),..., an(x) ] n d(xi,xj) = [ (xi-xj).(xi-xj) ]½ = Euclidean Distance • Discrete-Valued Target Functions ƒ : n V = {v1, v2,.., vs) 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning Prediction for a new query x: (k nearest neighbors of x) ƒ(x) = argmaxvV i=1,k[v,ƒ(xi)] [v,ƒ(xi)] = 1 if v =ƒ(xi) , [v,ƒ(xi)] = 0 otherwise • Continuous-Valued Target Functions ƒ(x) = (1/k) i=1,kƒ(xi) 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning • Distance-Weighted k-NN ƒ(x) = argmaxvV i=1,kwi[v,ƒ(xi)] ƒ(x) = i=1,kwiƒ(xi)/ k i=1,kwi wi = [d(xi,x)]-2 Weights more heavily closest neighbors 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning • Remarks for k-NN • Robust to noise • Quite effective for large training sets • Inductive bias: The classification of an instance will be most similar to the classification of instances that are nearby in Euclidean distance • Especially sensitive to the curse of dimensionality • Elimination of irrelevant attributes by suitably chosen the metric: d(xi,xj) = [ (xi-xj).G.(xi-xj) ]½ 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning 4.3 Locally Weighted Regression Builds an explicit approximation to ƒ(x) over a local region surrounding x (usually a linear or quadratic fit to training examples nearest to x) Locally Weighted Linear Regression: ƒL(x) = w0+ w1x1+ ...+ wnxn E(x) = i=1,k[ƒL(xi)-ƒ(xi)]2 (xi nn of x) 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning Generalization: ƒL(x) = w0+ w1x1+ ...+ wnxn E(x) = i=1,NK[d(xi,x)] [ƒL(xi)-ƒ(xi)]2 K[d(xi,x)] = kernel function Other possibility: ƒQ(x) = quadratic function of xj 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning 4.4 Radial Basis Functions Approach closely related to distance-weighted regression and artificial neural network learning ƒRBF(x) = w0+ µ=1,kwµ K[d(xµ,x)] K[d(xµ,x)] = exp[-d2(xµ,x)/ 22µ] = Gaussian kernel function 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning Training RBF Networks 1st Stage: • Determination of k (=number of basis functions) • xµ and µ (kernel parameters) Expectation-Maximization (EM) algorithm 2nd Stage: • Determination of weights wµ Linear Problem 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006
4. Instance-Based Learning 4.6 Remarks on Lazy and Eager Learning Lazy Learning: stores data and postpones decisions until a new query is presented Eager Learning: generalizes beyond the training data before a new query is presented • Lazy methods may consider the query instance x when deciding how to generalize beyond the training data D (local approximation) • Eager methods cannot (they have already chosen their global approximation to the target function) 1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006