
Chapter 8 Pattern Classification Prof. Dehan Luo 第八章 模式分类

Presentation Transcript


1. Chapter 8 Pattern Classification (第八章 模式分类)
Section One: Discriminant functions (第一节 判别函数)
Section Two: Quadratic classifiers (第二节 二次方分类器)
Section Three: Euclidean and Mahalanobis metrics (第三节 欧几里得和Mahalanobis方法)
Section Four: Quadratic classifier examples (第四节 二次分类器举例)
Section Five: K Nearest Neighbor classifiers (第五节 KNN分类器)

2. Section One: Discriminant functions (第一节 判别函数)
A convenient way to represent a pattern classifier is in terms of a family of discriminant functions gi(x) combined with a simple MAX gate as the classification rule (表达模式分类器的简易方法是用具有简单MAX门的系列判别函数gi(x)作为分类准则), as written out below.
How do we choose the discriminant functions gi(x)? It depends on the objective function to be minimized (取决于目标函数最小化):
Probability of error (误差概率)
Bayes risk
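The MAX-gate rule referred to above can be stated explicitly; this is the standard formulation (classes ω1, …, ωC are assumed notation, not copied from the slide graphics):

\text{assign } x \text{ to } \omega_i \quad \text{if} \quad g_i(x) > g_j(x)\ \ \forall\, j \neq i, \qquad i, j \in \{1, \dots, C\}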

3. Discriminant functions (Cont.) (判别函数)(续)
The probability of error P[error|x] is "the probability of assigning x to the wrong class" (误差概率是将X值赋予给错误类型的概率).
For a two-class problem, P[error|x] is simply the posterior of the class that was not chosen (see the sketch below).
It makes sense to design the classification rule to minimize the average probability of error P[error] across all possible values of x (这意味分类规则被设计为在所有可能的X值范围内使平均误差概率为最小).
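The two-class expression and the averaged error referred to above, reconstructed in their standard form (the slide's equation images are not part of this transcript, so the exact notation is assumed):

P[\text{error} \mid x] =
\begin{cases}
P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\
P(\omega_2 \mid x) & \text{if we decide } \omega_1
\end{cases}
\qquad
P[\text{error}] = \int P[\text{error} \mid x]\, p(x)\, dx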

4. Discriminant functions (Cont.) (判别函数)(续)
To ensure P[error] is minimum, we minimize P[error|x] by choosing the class with maximum posterior P(ωi|x) at each x (为使误差概率最小,对每个X,选择最大的P(ωi|x)).
This is called the MAXIMUM A POSTERIORI (MAP) rule, and the associated discriminant functions become those written below (这就称为最大后验规则(MAP),相关的判别函数表示为).
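The MAP discriminant functions referred to above, in their usual form (a reconstruction, since the slide equation is not in the transcript):

g_i(x) = P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)}
\quad\Longrightarrow\quad
\text{choose the class with the largest } g_i(x)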

5. Discriminant functions (Cont.) (判别函数)(续)
We "prove" the optimality of the MAP rule graphically (figure not reproduced in this transcript).
The right plot shows the posterior for each of the two classes; the bottom plots show P(error) for the MAP rule and for another rule.
Which one has the lower P(error) (the color-filled area)?

6. Section Two: Quadratic classifiers (第二节 二次方分类器)
Let us assume that the likelihood densities are Gaussian (假设似然概率密度是高斯分布).
Using Bayes rule, the MAP discriminant functions become those sketched below (使用Bayes规则,MAP判别函数成为).
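A reconstruction of the Gaussian likelihood and the resulting MAP discriminant in standard form (assumed, since the slide equations are not in the transcript):

p(x \mid \omega_i) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_i|^{1/2}}
\exp\!\left[ -\tfrac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i) \right],
\qquad
g_i(x) = p(x \mid \omega_i)\, P(\omega_i)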

7. Quadratic classifiers (Cont.) (二次方分类器)(续)
Eliminating constant terms (消去常数项), we take natural logs (the logarithm is monotonically increasing) (采用自然对数,对数是单调递增的), which gives the expression sketched below.
This is known as a Quadratic Discriminant Function (称为二次判别函数).
The quadratic term is known as the Mahalanobis distance (二次项称为Mahalanobis距离).
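The quadratic discriminant function described above, reconstructed in its standard form (assumed to match the missing slide equation); the first term is the squared Mahalanobis distance between x and the class mean μi:

g_i(x) = -\tfrac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i)
          - \tfrac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)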

8. Section Three: Euclidean and Mahalanobis metrics (第三节 欧几里得和Mahalanobis方法)
The Mahalanobis distance can be thought of as a vector distance that uses a Σ⁻¹ norm (Mahalanobis距离被看作是使用Σ⁻¹范数的矢量距离).
Σ⁻¹ can be thought of as a stretching factor on the space (Σ⁻¹被认为是空间弹性因子).
Note that for an identity covariance matrix (Σi = I), the Mahalanobis distance becomes the familiar Euclidean distance (当协方差矩阵Σi=I时,Mahalanobis距离即为欧几里得距离), as written below.
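The two metrics in formula form (standard definitions, assumed rather than copied from the slide):

d_M^2(x, \mu_i) = (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i),
\qquad
\Sigma_i = I \;\Rightarrow\; d_M^2(x, \mu_i) = \| x - \mu_i \|^2 \ \text{(Euclidean)}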

9. Section Four: Quadratic classifier examples (第四节 二次分类器举例)
In the following slides we look at special cases of the quadratic classifier.
For convenience we will assume equiprobable priors (等概率先验) so we can drop the term log P(ωi).
Special case 1: Σi = σ²I (see the sketch below).
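Under this assumption the quadratic discriminant collapses to a minimum-Euclidean-distance rule; a reconstruction of the standard result (the slide's derivation is an image, so this is assumed):

\Sigma_i = \sigma^2 I \;\Rightarrow\;
g_i(x) = -\frac{\| x - \mu_i \|^2}{2\sigma^2}
\quad \text{(minimum Euclidean distance classifier; linear decision boundaries)}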

10. Quadratic classifier examples (Cont.)(续)
Special case 2: Σi = Σ (Σ diagonal); see the sketch below.
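With a shared diagonal covariance, the log-determinant term is identical for all classes and cancels; the discriminant reduces to a variance-normalized Euclidean distance (standard result, assumed):

g_i(x) = -\tfrac{1}{2} \sum_{d=1}^{D} \frac{(x_d - \mu_{i,d})^2}{\sigma_d^2}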

11. Quadratic classifier examples (Cont.)(续)
Special case 3: Σi = Σ (Σ non-diagonal); see the sketch below.
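With a shared full covariance matrix, the classifier becomes a minimum-Mahalanobis-distance rule, again with linear decision boundaries (standard result, assumed):

g_i(x) = -\tfrac{1}{2} (x - \mu_i)^{T} \Sigma^{-1} (x - \mu_i)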

12. Quadratic classifier examples (Cont.)(续)
Special case 4: Σi = σi²I; see the sketch below.
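Here the covariances differ between classes, so the quadratic term no longer cancels; a reconstruction of the standard result (assumed):

\Sigma_i = \sigma_i^2 I \;\Rightarrow\;
g_i(x) = -\frac{\| x - \mu_i \|^2}{2\sigma_i^2} - \tfrac{D}{2} \ln \sigma_i^2
\quad \text{(quadratic decision boundaries)}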

13. Quadratic classifier examples (Cont.)(续)
Case 5: Σi ≠ Σj, the general case. Nothing cancels here, so the full quadratic discriminant of slide 7 applies and the decision boundaries are general hyperquadrics.

14. Quadratic classifier examples (Cont.)(续)
Limitations of quadratic classifiers:
(1) The fundamental limitation is the unimodal Gaussian assumption (基本局限是假设单峰高斯分布). For non-Gaussian or multimodal Gaussian data, the results may be significantly sub-optimal (对于非高斯或多峰高斯分布,其结果可能明显不是最佳).
(2) A practical limitation is the minimum required size of the dataset (实际局限是与所需最小数据集大小有关). If the number of examples per class is smaller than the number of dimensions, the covariance matrix becomes singular and its inverse cannot be computed. In this case it is common to assume the same covariance structure for all classes and to compute a single covariance matrix using all the examples, regardless of class (see the sketch below).
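A minimal numpy sketch of the pooled-covariance workaround described in (2); the function and variable names are illustrative, not taken from the slides.

import numpy as np

def pooled_covariance(X, y):
    """Estimate a single covariance matrix shared by all classes.

    X : (N, D) array of feature vectors
    y : (N,) array of class labels
    Each class is centred on its own mean before the samples are pooled,
    so between-class separation does not inflate the estimate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centred = np.empty_like(X)
    for label in np.unique(y):
        idx = (y == label)
        centred[idx] = X[idx] - X[idx].mean(axis=0)
    # Covariance of the pooled, per-class-centred samples
    return np.cov(centred, rowvar=False)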

15. Quadratic classifier examples (Cont.)(续)
Conclusions. We can extract the following conclusions:
1、The Bayes classifier for normally distributed classes is quadratic.
(1) The Bayes classifier for normally distributed classes with equal covariance matrices is a linear classifier.
(2) The minimum Mahalanobis distance classifier is optimum for normally distributed classes with equal covariance matrices and equal priors.
(3) The minimum Euclidean distance classifier is optimum for normally distributed classes with equal covariance matrices proportional to the identity matrix and equal priors.
(4) Both the Euclidean and Mahalanobis distance classifiers are linear.

16. Quadratic classifier examples (Cont.)(续)
Conclusions (续). We can extract the following conclusions (续):
2、The goal of this discussion was to show that some of the most popular classifiers can be derived from decision-theoretic principles and some simplifying assumptions.
(1) It is important to realize that using a specific (Euclidean or Mahalanobis) minimum distance classifier implicitly corresponds to certain statistical assumptions.
(2) Whether these assumptions hold can rarely be answered in practice; in most cases we are limited to posing and answering the question "does this classifier solve our problem or not?"

17. Section Five: K Nearest Neighbor classifiers (第五节 KNN分类器)
1、K Nearest Neighbor classifier (KNN分类器)
2、KNN in action: examples (KNN分类器运用举例)
3、KNN versus 1NN (KNN分类器与1NN比较)
4、Characteristics of the kNN classifier (KNN分类器特点)

18. 1、K Nearest Neighbor classifier (KNN分类器)
The kNN classifier is based on non-parametric density estimation techniques (KNN分类器是基于非参数密度估计技术).
(1) Let us assume we seek to estimate the density function P(x) from a dataset of examples (假设从样本数据组中求密度函数P(x)).
(2) P(x) can be approximated by the expression sketched below (密度函数P(x)近似表达为).
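The standard non-parametric approximation that the missing slide equation refers to: with N examples in total, of which k fall inside a small region of volume V around x (reconstructed, assumed):

P(x) \approx \frac{k}{N V}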

19. 1、K Nearest Neighbor classifier (KNN分类器)(续)
(3) The volume V is determined by the distance Rk(x) between x and its k-th nearest neighbor in D dimensions (V的大小是由X与第K个最近邻之间的D维距离Rk(x)决定), as sketched below, where CD is the volume of the unit sphere in D dimensions (CD是D维空间单位球体的体积).
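The volume expression implied above, reconstructed (assumed): with Rk(x) the distance from x to its k-th nearest neighbor and CD the unit-sphere volume,

V = C_D\, R_k^D(x)
\qquad\Longrightarrow\qquad
P(x) \approx \frac{k}{N\, C_D\, R_k^D(x)}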

20. 1、K Nearest Neighbor classifier (KNN分类器)(续)
We use the previous result to estimate the posterior probability (使用先前结果估计后验概率).
The priors can be estimated in the same way (先验概率可用同样方法估算).
The posterior probability and the resulting discriminant functions then follow, as sketched below (后验概率及所得判别函数表示如下).
This is known as the k Nearest Neighbor classifier.
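A reconstruction of the missing derivation in its textbook form (assumed): with ki of the k neighbors belonging to class ωi and Ni examples of class ωi among the N training examples,

P(x \mid \omega_i) \approx \frac{k_i}{N_i V},
\qquad
P(\omega_i) \approx \frac{N_i}{N},
\qquad
P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\, P(\omega_i)}{P(x)} \approx \frac{k_i}{k},
\qquad
g_i(x) = \frac{k_i}{k}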

21. 1、K Nearest Neighbor classifier (KNN分类器)(续)
The kNN classifier is a very intuitive method (KNN分类器是非常直观的方法).
Examples are classified based on their similarity to the training data (样本基于与训练数据的相似度进行分类).
For a given unlabeled example Xu ∈ R^D, find the k "closest" labeled examples in the training data set and assign Xu to the class that appears most frequently within that k-subset (see the sketch below).
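A minimal numpy sketch of the rule just described; the names (knn_predict, the Euclidean metric) are illustrative choices, not taken from the slides.

import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Classify each query point by majority vote among its k nearest
    training examples (Euclidean metric assumed)."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in X_query:
        # Distances from the query point to every training example
        dists = np.linalg.norm(X_train - x, axis=1)
        # Labels of the k closest training examples
        nearest = y_train[np.argsort(dists)[:k]]
        # Majority vote within the k-subset
        labels, counts = np.unique(nearest, return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)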

22. 1、K Nearest Neighbor classifier (KNN分类器)(续)
The kNN classifier only requires:
An integer k (整数K)
A set of labeled examples (一组已标注样本)
A measure of "closeness" (接近度测量)

23. 2、KNN in action: examples (KNN分类器运用举例)
We generate data for a 2-dimensional, 3-class problem (产生2维3类问题数据) where the class-conditional densities are multi-modal and non-linearly separable.
We used kNN with k = 5 and the Euclidean distance as the metric (the result is shown as a figure on the original slide; a usage sketch follows below).
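A hedged usage sketch of how such an experiment could be run with the knn_predict helper from the earlier sketch; the data-generation details are illustrative, not the slide's actual dataset.

import numpy as np

# Illustrative 2-D, 3-class data: two Gaussian blobs per class to make
# the class-conditional densities multi-modal (not the slide's dataset).
rng = np.random.default_rng(0)
centers = {0: [(-2, 0), (2, 2)], 1: [(0, -2), (3, -1)], 2: [(-1, 3), (1, -3)]}
X, y = [], []
for label, blobs in centers.items():
    for cx, cy in blobs:
        X.append(rng.normal([cx, cy], 0.6, size=(50, 2)))
        y.append(np.full(50, label))
X, y = np.vstack(X), np.concatenate(y)

# Classify a few query points with k = 5 and the Euclidean metric,
# using the knn_predict sketch defined earlier.
queries = np.array([[0.0, 0.0], [2.0, 2.0], [-1.0, 3.0]])
print(knn_predict(X, y, queries, k=5))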

24. 3、KNN versus 1NN (KNN分类器与1NN比较)

25. 4、Characteristics of the kNN classifier (KNN分类器特点)
(1) Advantages
(a) Analytically tractable and simple to implement (简单易用); nearly optimal in the large-sample limit (N→∞) (大样本条件下接近最佳):
P[error]Bayes ≤ P[error]1-NN ≤ 2·P[error]Bayes
(b) Uses local information, which can yield highly adaptive behavior (使用局部信息,能产生好的适应特性)
(c) Lends itself very easily to parallel implementation (容易进行并行运行)
(2) Disadvantages
Large storage requirements (需要大存储空间)
Computationally intensive recall (计算量大)

26. 4、Characteristics of the kNN classifier (KNN分类器特点)(续)
(3) 1NN versus kNN
The use of large values of k has two main advantages:
(a) It yields smoother decision regions.
(b) It provides probabilistic information: the ratio of examples of each class among the k neighbors indicates the ambiguity of the decision.
However, values of k that are too large are detrimental:
(a) They destroy the locality of the estimation.
(b) In addition, they increase the computational burden.
