Instance Based Learning Ata Kaban The University of Birmingham
Today we learn: • K-Nearest Neighbours • Locally weighted regression • Case-based reasoning • Lazy and eager learning
Instance-based learning • One approach to approximating discrete- or real-valued target functions • Have training examples: (xn, f(xn)), n = 1..N • Key idea: • just store the training examples • when a test example is given, find the closest matches
1-Nearest neighbour: Given a query instance xq, • first locate the nearest training example xn • then f(xq) := f(xn) • K-Nearest neighbour: Given a query instance xq, • first locate the k nearest training examples • if the target function is discrete-valued, take a vote among the k nearest neighbours; if it is real-valued, take the mean of the f values of the k nearest neighbours
The distance between examples • We need a measure of distance in order to know which examples are the neighbours • Assume that we have T attributes for the learning problem. Then one example point x has elements xt, t = 1,…,T. • The distance between two points xi and xj is often defined as the Euclidean distance: d(xi, xj) = √( Σt=1..T (xi,t − xj,t)² )
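A minimal Python sketch of the two procedures above (not part of the original slides; the function names and the use of NumPy are my own choices). It computes the Euclidean distance just defined, then either votes (discrete-valued target) or averages (real-valued target) over the k nearest neighbours; with k=1 and a discrete target it reduces to 1-NN.

```python
import numpy as np
from collections import Counter

def euclidean(xi, xj):
    # d(xi, xj) = sqrt( sum_t (xi_t - xj_t)^2 )
    return np.sqrt(np.sum((np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)) ** 2))

def knn_predict(X_train, y_train, x_query, k=3, discrete=True):
    # Lazy learner: all the work happens when the query arrives.
    dists = [euclidean(x, x_query) for x in X_train]
    nearest = np.argsort(dists)[:k]              # indices of the k closest examples
    values = [y_train[i] for i in nearest]
    if discrete:
        # Discrete-valued target: majority vote among the k nearest neighbours
        return Counter(values).most_common(1)[0][0]
    # Real-valued target: mean of the neighbours' f values
    return float(np.mean(values))
```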
Characteristics of Inst-b-Learning • An instance-based learner is a lazy learner and does all the work when the test example is presented. This is opposed to so-called eager learners, which build a parameterised compact model of the target. • It produces a local approximation to the target function (a different one for each test instance)
When to consider Nearest Neighbour algorithms? • Instances map to points in ℝⁿ • Not more than, say, 20 attributes per instance • Lots of training data • Advantages: • Training is very fast • Can learn complex target functions • Don't lose information • Disadvantages: • ? (we will see them shortly…)
[Figure: training instances labelled one to eight, with a query point marked '?']
Training data Test instance
Keep data in normalised form • One way to normalise the data ar(x) to a′r(x) is sketched below.
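One common choice (an assumption here, since the slide's formula appears only as an image) is to rescale each attribute by its mean and standard deviation, so that every attribute contributes on a comparable scale to the distance:

```python
import numpy as np

def normalise(X):
    # a'_r(x) = (a_r(x) - mean_r) / std_r  -- one way to normalise; the original
    # slide may use a different rescaling (e.g. min-max).
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    std = np.where(std == 0, 1.0, std)       # avoid division by zero for constant attributes
    return (X - mean) / std
```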
Normalised training data Test instance
Distances of test instance from training data • Classification: • 1-NN: Yes • 3-NN: Yes • 5-NN: No • 7-NN: No
What if the target function is real valued? • The k-nearest neighbour algorithm then returns the mean of the f values of the k nearest neighbours
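This is the discrete=False branch of the earlier sketch; a tiny made-up example (the data values are illustrative only):

```python
# Hypothetical 1-D regression example using knn_predict from the earlier sketch
X_train = [[1.0], [2.0], [3.0], [10.0]]
y_train = [1.1, 1.9, 3.2, 9.8]
print(knn_predict(X_train, y_train, [2.5], k=3, discrete=False))  # mean of 1.1, 1.9, 3.2 ≈ 2.07
```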
Variant of kNN: Distance-Weighted kNN • We might want to weight nearer neighbours more heavily • Then it makes sense to use all training examples instead of just k (Shepard's method)
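A sketch of distance weighting for a real-valued target, assuming inverse-squared-distance weights (for a discrete target one would take a weighted vote instead); the function name and defaults are my own:

```python
import numpy as np

def distance_weighted_knn(X_train, y_train, x_query, k=None):
    # Weight each neighbour by the inverse squared distance, w_i = 1 / d(x_q, x_i)^2.
    # With k=None, all training examples are used (Shepard's method).
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    d = np.sqrt(((X - np.asarray(x_query, dtype=float)) ** 2).sum(axis=1))
    if np.any(d == 0):                        # query coincides with a training point
        return float(y[d == 0][0])
    if k is not None:
        idx = np.argsort(d)[:k]
        d, y = d[idx], y[idx]
    w = 1.0 / d ** 2
    return float((w * y).sum() / w.sum())
```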
Difficulties with k-nearest neighbour algorithms • Have to calculate the distance of the test case from all training cases • There may be irrelevant attributes amongst the attributes – curse of dimensionality
A generalisation: Locally Weighted Regression • Some useful terminology • Regression = approximating a real-valued function • Residual error = the error in approximating the target function • Kernel function = the function of distance that is used to determine the weight of each training example, i.e. K(d(xi, xq))
(Locally Weighted Regression) • Note kNN forms a local approximation to f for each query point • Why not form an explicit local approximation for the region surrounding the query point? E.g. • Fit a linear function to the k nearest neighbours (or a quadratic, …), e.g. f̂(x) = w0 + w1·a1(x) + … + wn·an(x) • …
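A sketch of one concrete version: fit a weighted linear model to the k nearest neighbours, with weights given by a Gaussian kernel of the distance to the query. The kernel choice, width heuristic, and function name are my own assumptions, not prescribed by the slides.

```python
import numpy as np

def locally_weighted_linear(X_train, y_train, x_query, k=5):
    # Fit f_hat(x) = w0 + w1*a1(x) + ... + wn*an(x) to the k nearest neighbours,
    # weighting each squared residual by a kernel of its distance to the query.
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    xq = np.asarray(x_query, dtype=float)
    d = np.sqrt(((X - xq) ** 2).sum(axis=1))
    idx = np.argsort(d)[:k]
    Xk, yk, dk = X[idx], y[idx], d[idx]
    tau = dk.max() + 1e-12                      # kernel width: a simple heuristic
    w = np.exp(-dk ** 2 / (2 * tau ** 2))       # kernel function K(d(xi, xq))
    A = np.hstack([np.ones((len(Xk), 1)), Xk])  # design matrix with intercept column
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], yk * sw, rcond=None)  # weighted least squares
    return float(np.concatenate(([1.0], xq)) @ coef)
```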
Case-based reasoning (CBR) • CBR is an advanced form of instance-based learning, applied to more complex instance objects • Objects may include complex structural descriptions of cases & adaptation rules • It doesn't use Euclidean distance measures, but can match objects using e.g. semantic nets • It tries to model human problem-solving • uses past experience (cases) to solve new problems • retains solutions to new problems • CBR is an ongoing area of machine learning research with many applications
Applications of CBR • Design • landscape, building, mechanical, conceptual design of aircraft sub-systems • Planning • repair schedules • Diagnosis • medical • Adversarial reasoning • legal
[CBR process (flow diagram): a New Case is matched against the Case Base to retrieve matched cases and select the Closest Case; if no adaptation is needed the case is reused, otherwise it is revised using knowledge and adaptation rules; the suggested solution is retained in the Case Base and the matching is learned.]
CBR example: Property pricing Test instance
How rules are generated • Examine cases and look for ones that are almost identical • case 1 and case 2 • R1: If recep-rooms changes from 2 to 1 then reduce price by £5,000 • case 3 and case 4 • R2: If Type changes from semi to terraced then reduce price by £7,000
Matching • Compare the test instance (case 5) with each stored case and count the matching attributes: • matches(5,1) = 3 • matches(5,2) = 3 • matches(5,3) = 2 • matches(5,4) = 1 • Estimated price of case 5 is £25,000
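A sketch of the attribute-matching step. The attribute names and case encoding are illustrative only, since the property table appears as an image in the original slides.

```python
def matches(case_a, case_b, attributes):
    # Count how many of the listed attributes take the same value in both cases.
    return sum(case_a[attr] == case_b[attr] for attr in attributes)

def closest_case(case_base, new_case, attributes):
    # Retrieve the stored case with the highest match count; its price gives the
    # first estimate, which the adaptation rules may then adjust.
    return max(case_base, key=lambda case: matches(case, new_case, attributes))

# Illustrative attribute names only
attributes = ["type", "bedrooms", "recep_rooms", "location"]
```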
Adapting • Reverse rule 2 • if type changes from terraced to semi then increase price by £7,000 • Apply reversed rule 2 • new estimate of price of property 5 is £32,000
Learning • So far we have a new case and an estimated price • nothing is added yet to the case base • If later we find the house sold for £35,000, then the case would be added • could add a new rule: if location changes from 8 to 7, increase price by £3,000
Problems with CBR • How should cases be represented? • How should cases be indexed for fast retrieval? • How can good adaptation heuristics be developed? • When should old cases be removed?
Advantages • A local approximation is found for each test case • Knowledge is in a form understandable to human beings • Fast to train
Summary • K-Nearest Neighbour • (Locally weighted regression) • Case-based reasoning • Lazy and eager learning
Lazy and Eager Learning • Lazy: wait for query before generalizing • k-Nearest Neighbour, Case based reasoning • Eager: generalize before seeing query • Radial Basis Function Networks, ID3, … • Does it matter? • Eager learner must create global approximation • Lazy learner can create many local approximations