
Instance-Based Learning



  1. Instance-Based Learning Evgueni Smirnov

  2. Overview • Instance-Based Learning • Comparison of Eager and Instance-Based Learning • Instance Distances for Instance-Based Learning • Nearest Neighbor (NN) Algorithm • Advantages and Disadvantages of the NN algorithm • Approaches to overcome the Disadvantages of the NN algorithm • Combining Eager and Instance-Based Learning

  3. Instance-Based Learning • Learning = storing all training instances • Classification = a new instance is assigned the class of the training instances nearest to it.

  4. Different Learning Methods • Eager Learning • Learning = acquiring an explicit structure of a classifier from the whole training set; • Classification = a new instance is classified using the explicit structure of the classifier. • Instance-Based Learning (Lazy Learning) • Learning = storing all training instances • Classification = a new instance is assigned the class of the training instances nearest to it.

  5. Different Learning Methods • Eager Learning: a general rule learned in advance is applied to new observations ("Any random movement => it's a mouse. I saw a mouse!").

  6. Instance-Based Learning: a new instance is classified by comparing it with the stored instances ("It's very similar to a desktop!").

  7. Nearest-Neighbor Algorithm (NN) The features of the task of the NN algorithm: • the instance language I is a conjunctive language with a set A of n attributes a1, a2, …, an. The domain of each attribute ai can be discrete or continuous. • an instance x is represented as <a1(x), a2(x), …, an(x)>, where ai(x) is the value of the attribute ai for the instance x; • the classes to be learned can be: • discrete: we learn a discrete function f(x), and the co-domain C of the function consists of the classes c to be learned; • continuous: we learn a continuous function f(x), and the co-domain C of the function is a set of real values (regression).

  8. Distance Functions • The distance functions are composed from difference metrics da w.r.t. the attributes a, defined for every two instances xi and xj. • If the attribute a is numerical, then: da(xi, xj) = |a(xi) − a(xj)|. • If the attribute a is discrete, then: da(xi, xj) = 0 if a(xi) = a(xj), and da(xi, xj) = 1 otherwise.

  9. Distance Functions The main distance function for determining nearest neighbors is the Euclidean distance: d(xi, xj) = √( Σa∈A da(xi, xj)² ).
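
A minimal Python sketch of the two difference metrics and the Euclidean distance composed from them; the function names and the numeric_flags argument (one boolean per attribute) are illustrative choices, not part of the slides:

```python
import math

def attribute_distance(v_i, v_j, numeric):
    # Difference metric d_a for one attribute: absolute difference for
    # numerical attributes, 0/1 overlap for discrete attributes.
    if numeric:
        return abs(v_i - v_j)
    return 0.0 if v_i == v_j else 1.0

def euclidean_distance(x_i, x_j, numeric_flags):
    # Euclidean distance composed from the per-attribute metrics d_a.
    return math.sqrt(sum(
        attribute_distance(a_i, a_j, flag) ** 2
        for a_i, a_j, flag in zip(x_i, x_j, numeric_flags)
    ))
```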

  10. k-Nearest-Neighbor Algorithm The case of a discrete set of classes. • Take the instance x to be classified. • Find the k nearest neighbors of x in the training data. • Determine the class c of the majority of the instances among the k nearest neighbors. • Return the class c as the classification of x.
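
A sketch of this majority-vote procedure in Python; representing training_data as a list of (instance, class) pairs and passing the distance as an argument (e.g. euclidean_distance from the previous sketch) are assumptions, not fixed by the slides:

```python
from collections import Counter

def knn_classify(x, training_data, k, distance):
    # Sort the training pairs by distance to x and keep the k nearest.
    neighbors = sorted(training_data, key=lambda pair: distance(x, pair[0]))[:k]
    # Majority vote over the classes of the k nearest neighbors.
    votes = Counter(c for _, c in neighbors)
    return votes.most_common(1)[0][0]
```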

  11. Classification & Decision Boundaries [Figure: a query instance q1 among positive (+) and negative (−) training instances; 1-NN classifies q1 as positive, while 5-NN classifies q1 as negative.]

  12. k-Nearest-Neighbor Algorithm The case of a continuous set of classes (regression). • Take the instance x to be classified. • Find the k nearest neighbors of x in the training data. • Return the average of the classes (target values) of the k nearest neighbors as the classification of x.
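
The regression variant in the same sketch style, with training_data now holding (instance, target value) pairs:

```python
def knn_regress(x, training_data, k, distance):
    # Keep the k training pairs nearest to x.
    neighbors = sorted(training_data, key=lambda pair: distance(x, pair[0]))[:k]
    # The prediction is the average target value of those neighbors.
    return sum(y for _, y in neighbors) / len(neighbors)
```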

  13. Distance-Weighted Nearest-Neighbor Algorithm The case of a discrete set of classes. • Take the instance x to be classified. • Determine for each class c the sum Sc = Σ 1/d(x, xi)², taken over the k nearest neighbors xi of x that belong to class c. • Return the class c with the greatest Sc.
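
A sketch of the distance-weighted vote, assuming the common inverse-squared-distance weight 1/d² (the slide's exact weighting formula did not survive the transcript):

```python
from collections import defaultdict

def weighted_knn_classify(x, training_data, k, distance):
    neighbors = sorted(training_data, key=lambda pair: distance(x, pair[0]))[:k]
    scores = defaultdict(float)      # scores[c] accumulates S_c
    for x_i, c in neighbors:
        d = distance(x, x_i)
        if d == 0.0:
            return c                 # an identical stored instance decides directly
        scores[c] += 1.0 / d ** 2    # weight each vote by 1 / d(x, x_i)^2
    return max(scores, key=scores.get)
```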

  14. Advantages of the NN Algorithm • the NN algorithm can estimate complex target classes locally and differently for each new instance to be classified; • the NN algorithm provides good generalisation accuracy on many domains; • the NN algorithm learns very quickly; • the NN algorithm is robust to noisy training data; • the NN algorithm is intuitive and easy to understand which facilitates implementation and modification.

  15. Disadvantages of the NN Algorithm • the NN algorithm has large storage requirements because it has to store all the training data; • the NN algorithm is slow during classification because all the training instances have to be visited; • the accuracy of the NN algorithm degrades as the noise in the training data increases; • the accuracy of the NN algorithm degrades as the number of irrelevant attributes increases.

  16. Condensed NN Algorithm The Condensed NN (CNN) algorithm was introduced to reduce the storage requirements of the NN algorithm. The algorithm finds a subset S of the training data D such that each instance in D can be correctly classified by the NN algorithm applied on the subset S. The average storage reduction achieved by the algorithm varies between 60% and 80%.

  17. Condensed NN Algorithm [Figure: the training set D and the condensed subset S.] The algorithm first randomly selects one instance for each class in D and puts it in S. Then each instance in D is classified using only the instances in S. If an instance is misclassified, it is added to S. This process is repeated until no instance in D is misclassified.
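
A minimal sketch of this condensation loop in Python, reusing the (instance, class) pair representation and a pluggable distance function from the earlier sketches:

```python
import random

def condensed_nn(D, distance):
    # Seed S with one randomly chosen instance per class, then keep adding
    # every instance of D that 1-NN on S misclassifies, until a full pass
    # over D adds nothing.
    S = [random.choice([p for p in D if p[1] == c]) for c in {c for _, c in D}]
    changed = True
    while changed:
        changed = False
        for x, c in D:
            nearest_class = min(S, key=lambda pair: distance(x, pair[0]))[1]
            if nearest_class != c:
                S.append((x, c))
                changed = True
    return S
```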

  18. Condensed NN Algorithm • The CNN algorithm is especially sensitive to noise, because noisy instances will usually be misclassified by their neighbors and thus will be retained. This causes two problems: • storage reduction is hindered, because noisy instances are retained, and because they are there, non-noisy instances nearby often also need to be retained; • generalization accuracy is hurt, because noisy instances are usually exceptions and thus do not represent the underlying function well.

  19. Edited NN Algorithm The Edited Nearest Neighbor (ENN) algorithm was proposed to stabilise the accuracy of the NN algorithm when the noise in the training data increases. The algorithm starts with the set S equal to the training data D, and then removes each instance in S that does not agree with the majority of its k nearest neighbors (typically with k = 3). The algorithm edits out noisy instances as well as close border cases, leaving smoother decision boundaries. It also retains all internal points, i.e., it does not reduce the space as much as most other reduction algorithms.
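
A sketch of the editing rule in Python; this variant judges each instance against its neighbors in the original set D, which is one common reading of the procedure:

```python
from collections import Counter

def edited_nn(D, distance, k=3):
    # Keep an instance only if its class agrees with the majority class of
    # its k nearest neighbors among the other training instances.
    S = []
    for i, (x, c) in enumerate(D):
        others = D[:i] + D[i + 1:]
        neighbors = sorted(others, key=lambda pair: distance(x, pair[0]))[:k]
        majority = Counter(cl for _, cl in neighbors).most_common(1)[0][0]
        if majority == c:
            S.append((x, c))
    return S
```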

  20. Edited NN Algorithm [Figure: a noisy negative instance lying among positive instances is removed.] The average storage reduction achieved by the algorithm varies between 20% and 40%.

  21. Weighting Attributes The attribute-weighting technique was proposed in order to improve the accuracy of the NN algorithm in the presence of irrelevant attributes. The key idea is to find weights for all the attributes and to use them when the distance between instances is computed. The weights of the attributes can be determined by a search algorithm, while the adequacy of candidate weights can be estimated by cross-validation. In a similar way we can choose the best parameter k for the NN algorithm!
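
As an illustration of the cross-validation idea, a sketch that selects k; it reuses the hypothetical knn_classify from the slide-10 sketch, and the same wrapper could score candidate attribute-weight vectors instead of k values:

```python
def choose_k(training_data, candidate_ks, distance, folds=5):
    # Score each candidate k by simple cross-validated accuracy.
    def accuracy(k):
        correct = 0
        for f in range(folds):
            test = training_data[f::folds]
            train = [p for i, p in enumerate(training_data) if i % folds != f]
            correct += sum(knn_classify(x, train, k, distance) == c
                           for x, c in test)
        return correct / len(training_data)
    # Return the k with the highest cross-validated accuracy.
    return max(candidate_ks, key=accuracy)
```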

  22. Combining Decision Trees and the NN Algorithm [Decision tree for the weather data:] Outlook? • sunny → Humidity? (high → no, normal → yes) • overcast → yes • rainy → Windy? (false → yes, true → no)

  23. Combining Decision Trees and the NN Algorithm [The same tree, with training instances stored at the leaves instead of class labels:] Outlook? • sunny → Humidity? (high, normal) • overcast • rainy → Windy? (false, true) Classify the instance using the NN algorithm applied to the training instances associated with the classification nodes (leaves).
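
A sketch of this hybrid in Python; the Node structure (a test function at internal nodes, stored instances at leaves) is an assumed representation, and knn_classify is the hypothetical function from the slide-10 sketch:

```python
class Node:
    # A leaf stores the training instances that reached it; an internal
    # node stores a test that maps an instance to one of its children.
    def __init__(self, test=None, children=None, instances=None):
        self.test = test            # function: instance -> branch key
        self.children = children    # dict: branch key -> child Node
        self.instances = instances  # list of (instance, class) at a leaf

def tree_nn_classify(x, node, k, distance):
    # Route x down the tree, then classify it with k-NN restricted to the
    # training instances stored at the leaf it reaches.
    while node.instances is None:
        node = node.children[node.test(x)]
    return knn_classify(x, node.instances, k, distance)
```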

  24. Combining Decision Rules and the NN Algorithm [Diagram: instances are incrementally compiled into abstractions (decision rules).]

  25. Summary Points • Instance-based learning is a simple, efficient and accurate approach to concept learning and classification. • Many of the problems of instance-based learning can be solved. • Instance-based learning can be combined with eager approaches to concept learning.
