Introduction to Classification in Intelligent Computing Systems
This comprehensive overview introduces classification within intelligent computing, focusing on essential concepts such as feature vectors, decision boundaries, and classifiers. It explains how classification functions map measured features to discrete classes, aiding in various applications, including medical diagnosis and speech recognition. The discussion also highlights classification accuracy assessment and linear separability, emphasizing the importance of learning from data to develop effective predictive models. Examples include the classification of mammalian milk compositions and image data, demonstrating diverse real-world applications.
Introduction to Classification in Intelligent Computing Systems
E N D
Presentation Transcript
Computacion inteligente Introduction to Classification
Outline • Introduction • Feature Vectors and Feature Spaces • Decision boundaries • The nearest-neighbor classifier • Classification Accuracy • Accuracy Assesment • Linear Separability • Linear Classifiers • Classification learning
? Classification • Classification: • A task of induction to find patterns from data • Inferring knowledge from data
Learning from data • You may find different names for Learning from data identification estimation Regression classification pattern recognition Function approximation, curve or surface fitting etc…
Sample data Composition of mammalian milk
Classification • Classification is an important component of intelligent systems • We have a special discrete-valued variable called the Class, C • C takes values in {c1, c2, ......, cm}
Classification • Problem is to decide what class an object is • i.e., what value the class variable Y is for a given object • given measurements on the object, e.g., x1, x2, …. • These measurements are called “features” we wish to learn a mapping from Features -> Class
Classification Functions • Notation • Input space: X – ( X Rn) • Output domain: Y • binary classification: C = {-1, 1} • m-class classification: Y = {c1, c2, c3, …, cm} • Training set : S Each class is described by a label
Classification Functions • We want a mapping or function which: • takes any combination of values x = (a, b, d, .... z) and, • produces a prediction C, • i.e., a function C = f(a,b, d, ….z) which produces a value c1 or c2, etc The problem is that we don’t know this mapping: we have to learn it from data!
a b C Classifier d z Classification Functions Feature Values (which are known, measured) Predicted Class Value (true class is unknown to the classifier)
Classification Algorithms Training Data Training Data Classifier (Model) IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’ Classification: Model Construction
Classifier Testing Data Unseen Data (Jeff, Professor, 4) Tenured? Classification: Prediction Using the Model
Applications of Classification • Medical Diagnosis • classification of cancerous cells • Credit card and Loan approval • Most major banks • Speech recognition • IBM, Dragon Systems, AT&T, Microsoft, etc • Optical Character/Handwriting Recognition • Post Offices, Banks, Gateway, Motorola, Microsoft, Xerox, etc • Email classification • classify email as “junk” or “non-junk” • Many other applications • one of the most successful applications of AI technology
Examples: Classification of Galaxies Class 2 Class 1
Original Image Classified Image Examples: Image Classification • Remote sensing
Feature Vectors and Feature Spaces • Feature Vector: Say we have 2 features: • we can think of the features as a 2-component vector (i.e., a 2-dimensional vector, [a b]) • So the features correspond to a 2-dimensional space • We can generalize to d-dimensional space. This is called the “feature space”
Feature Vectors and Feature Spaces • Each feature vector represents the “coordinates” of a particular object in feature space • If the feature-space is 2-dimensional (for example), and the features a and b are real-valued • we can visually examine and plot the locations of the feature vectors
Additive RGB color model HSV Color Space Data with 3 Features • Given a set of of balls • Classify it by “color”
Data from Two Classes • Classes: data sets D1 and D2: • sets of points from classes 1 and 2 • data are of dimension d • i.e., d-dimensional vectors If d = 2 (2 features), we can plot the data
Data from Multiple Classes • Now consider that we have data from mclasses • e.g., m=5
Data from Multiple Classes • Now consider that we have data from mclasses • e.g., m=5 • We can imagine the data from each class being in a “cloud” in feature space
Composition of mammalian milk Proteins (%) Classes Fat (%) Example of Data from 5 Classes
Decision Boundaries • What is a Classifier? • A classifier is a mapping from feature space to the class labels {1, 2, … m} • Thus, a classifier partitions the feature space into mdecision regions • The line or surface separating any 2 classes is the decision boundary
Decision Boundaries • Linear Classifiers • a linear classifier is a mapping which partitions feature space using a linear function • it is one of the simplest classifiers we can imagine In 2 dimensions the decision boundary is a straight line
Class Overlap • Consider two class case • data from D1 and D2 may overlap • features = {age, body temperature}, • classes = {flu, not-flu} • features = {income, savings}, • classes = {good/bad risk}
Class Overlap • common in practice that the classes will naturally overlap • this means that our features are usually not able to perfectly discriminate between the classes • note: with more expensive/more detailed additional features (e.g., a specific test for the flu) we might be able to get perfect separation If there is overlap => classes are not linearly separable
TWO-CLASS DATA IN A TWO-DIMENSIONAL FEATURE SPACE 6 Decision Region 1 Decision 5 Region 2 4 3 Feature 2 2 1 0 Decision Boundary -1 2 3 4 5 6 7 8 9 10 Feature 1 Solution: A More Complex Decision Boundary
Training Data and Test Data • Training data • examples with class values for learning. Used to build a classifier • Test data • new data, not used in the training process, toevaluate how well a classifier does on new data
Some Notation • Feature Vectors • x(i) is the ith training data feature vector • in MATLAB this could be the ith column of an dxN matrix • Class Labels • c(i) is the class label of the ith feature vector • in general, c(i) can take m different class values, (e.g., c = 1, c = 2, ...)
Some Notation • Training Data • Dtrain = {[x(1), c(1)] , [x(2), c(2)] , ……, [x(N), c(N)]} • N pairs of feature vectors and class labels • Test Data • Dtest = {[x(1), c(1)] , [x(2), c(2)] , ……, [x(M), c(M)]} • M pairs of feature vectors and class labels Let y be a new feature vector whose class label we do not know, i.e., we wish to classify it.
Nearest Neighbor Classifier • y is a new feature vector whose class label is unknown • SearchDtrain for the closest feature vector to y • let this “closest feature vector” be x(j) • Classifyy with the same label as x(j), i.e. • y is assigned label c(j)
Nearest Neighbor Classifier • How are “closest x” vectors determined?. We have to define a distance • Euclidean distance • Manhatan distance
Nearest Neighbor Classifier • How are “closest x” vectors determined?. We have to define a distance • Mahalanobis distance
Nearest Neighbor Classifier • How are “closest x” vectors determined? • We have to define a distance • typically use minimum Euclidean distance dE(x, y) = sqrt(S (xi - yi)2) • Side note: this produces a called “Voronoi tesselation” of the d-space • each point “claims” a cell surrounding it • cell boundaries are polygons Analogous to “memory-based” reasoning in humans
1 2 Feature 2 1 2 2 1 Feature 1 Geometric Interpretation of Nearest Neighbor