
240-650 Principles of Pattern Recognition


Presentation Transcript


  1. 240-650 Principles of Pattern Recognition • Montri Karnjanadecha • montri@coe.psu.ac.th • http://fivedots.coe.psu.ac.th/~montri

  2. Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation

  3. Introduction • We could design an optimum classifier if we knew P(ωi) and p(x|ωi) • In practice we rarely have such complete knowledge of the probabilistic structure of the problem • Instead, we estimate P(ωi) and p(x|ωi) from training data (design samples)
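
As a minimal illustration of why these estimated quantities matter (an added sketch, not part of the original slides), the Python snippet below assumes we already have estimates of the priors P(ωi) and the class-conditional densities p(x|ωi), and classifies a point by maximizing p(x|ωi)P(ωi). The two Gaussian classes, their parameters, and the test point are made up for the example.

```python
# Hypothetical sketch: Bayes classification with estimated priors and
# class-conditional densities.  Picks argmax_i p(x|w_i) * P(w_i).
import numpy as np
from scipy.stats import multivariate_normal

def classify(x, priors, class_densities):
    # score each class by p(x|w_i) * P(w_i) and return the index of the best one
    scores = [dens.pdf(x) * prior for dens, prior in zip(class_densities, priors)]
    return int(np.argmax(scores))

# two made-up Gaussian classes with equal priors (assumed for illustration)
densities = [multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)),
             multivariate_normal(mean=[2.0, 2.0], cov=np.eye(2))]
print(classify(np.array([1.8, 2.1]), priors=[0.5, 0.5], class_densities=densities))  # -> 1
```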

  4. Maximum-Likelihood Estimation • ML estimation nearly always has good convergence properties as the number of training samples increases • It is often simpler than alternative estimation methods

  5. The General Principle • Suppose we separate a collection of samples according to class, so that we have c data sets D1, …, Dc, with the samples in Dj drawn independently according to the probability law p(x|ωj) • We say such samples are i.i.d. (independently and identically distributed) random variables

  6. The General Principle • We assume that p(x|ωj) has a known parametric form and is determined uniquely by the value of a parameter vector θj • For example, p(x|ωj) might be a normal density N(μj, Σj), in which case θj consists of the components of μj and Σj • To show the dependence on the parameters explicitly, we write p(x|ωj) as p(x|ωj, θj)

  7. Problem Statement • Use the information provided by the training samples to obtain good estimates of the unknown parameter vectors θ1, …, θc associated with each category

  8. Simplified Problem Statement • If samples in Di give no information about θj when i ≠ j, we have c separate problems of the following form: use a set D of training samples drawn independently from the probability density p(x|θ) to estimate the unknown parameter vector θ

  9. Suppose that D contains n samples, x1, …, xn • Because the samples were drawn independently, p(D|θ) = ∏_{k=1}^{n} p(x_k|θ) • Viewed as a function of θ, p(D|θ) is called the likelihood of θ with respect to the set of samples • The maximum-likelihood estimate of θ is the value θ̂ that maximizes p(D|θ)
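
A small, hypothetical numerical example of this definition (not from the slides): the snippet draws samples from a univariate normal with a known standard deviation, evaluates the log-likelihood on a grid of candidate means θ, and takes the maximizer. The data, the grid, and σ = 1 are assumptions for illustration only.

```python
# Toy illustration: evaluate the likelihood p(D|theta) = prod_k p(x_k|theta)
# over candidate values of theta and pick the maximizer.
# Log-likelihoods are used to avoid numerical underflow.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=1.5, scale=1.0, size=50)      # n samples x_1, ..., x_n
thetas = np.linspace(-2.0, 4.0, 601)             # candidate values of theta (the mean)

def log_likelihood(theta, data, sigma=1.0):
    # sum_k ln p(x_k | theta) for a univariate normal with known sigma
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (data - theta)**2 / (2 * sigma**2))

ll = np.array([log_likelihood(t, D) for t in thetas])
theta_hat = thetas[np.argmax(ll)]
print(theta_hat, D.mean())   # the grid maximizer is close to the sample mean
```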


  11. Let θ = (θ1, …, θp)^t be the parameter vector • Let ∇θ be the gradient operator, ∇θ = [∂/∂θ1, …, ∂/∂θp]^t

  12. Log-Likelihood Function • We define the log-likelihood function as l(θ) = ln p(D|θ) • We can write our solution as θ̂ = arg max_θ l(θ)

  13. MLE • From p(D|θ) = ∏_{k=1}^{n} p(x_k|θ) • We have the log-likelihood l(θ) = ∑_{k=1}^{n} ln p(x_k|θ) • And its gradient ∇θ l = ∑_{k=1}^{n} ∇θ ln p(x_k|θ) • A necessary condition for the MLE is ∇θ l = 0

  14. The Gaussian Case: Unknown μ • Suppose that the samples are drawn from a multivariate normal population with mean μ and covariance matrix Σ, where μ is the only unknown • For a sample point x_k we find ln p(x_k|μ) = -½ ln[(2π)^d |Σ|] - ½ (x_k - μ)^t Σ^{-1} (x_k - μ) • and ∇μ ln p(x_k|μ) = Σ^{-1}(x_k - μ)

  15. The MLE of μ must satisfy ∑_{k=1}^{n} Σ^{-1}(x_k - μ̂) = 0 • After multiplying by Σ and rearranging, μ̂ = (1/n) ∑_{k=1}^{n} x_k

  16. Sample Mean • The MLE for the unknown population mean is just the arithmetic average of the training samples (the sample mean) • If we think of the n samples as a cloud of points, the sample mean is the centroid of the cloud
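
To make the result concrete, here is a tiny illustrative snippet (the data values are made up): the MLE of the mean is simply the per-dimension average of the training points, i.e. the centroid of the sample cloud.

```python
# Hypothetical example: the MLE of the mean is the sample mean.
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 4.0]])   # n = 3 samples in d = 2 dimensions
mu_hat = X.mean(axis=0)                               # (1/n) * sum_k x_k
print(mu_hat)                                         # [2. 2.], the centroid of the points
```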

  17. The Gaussian Case: Unknown μ and Σ • This is the more typical case, in which both the mean and the covariance matrix are unknown • Consider first the univariate case with θ1 = μ and θ2 = σ², for which the log-likelihood of a single point is ln p(x_k|θ) = -½ ln(2πθ2) - (x_k - θ1)²/(2θ2)

  18. And its derivative is ∇θ ln p(x_k|θ) = [ (x_k - θ1)/θ2 , -1/(2θ2) + (x_k - θ1)²/(2θ2²) ]^t • Setting the summed gradient to 0 gives ∑_{k=1}^{n} (x_k - θ̂1)/θ̂2 = 0 • and -∑_{k=1}^{n} 1/θ̂2 + ∑_{k=1}^{n} (x_k - θ̂1)²/θ̂2² = 0

  19. With a little rearranging, we have μ̂ = (1/n) ∑_{k=1}^{n} x_k and σ̂² = (1/n) ∑_{k=1}^{n} (x_k - μ̂)²

  20. MLE for the multivariate case: μ̂ = (1/n) ∑_{k=1}^{n} x_k and Σ̂ = (1/n) ∑_{k=1}^{n} (x_k - μ̂)(x_k - μ̂)^t
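
The following sketch, under the slide's Gaussian assumption, computes these two estimates with NumPy. Note the division by n, which is the (biased) MLE normalization discussed on the next slide; the function name gaussian_mle and the sample data are illustrative assumptions, not part of the original material.

```python
# Sketch: MLE of the mean and covariance of a multivariate Gaussian.
import numpy as np

def gaussian_mle(X):
    """X: (n, d) array of i.i.d. samples.  Returns (mu_hat, Sigma_hat)."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)                      # (1/n) sum_k x_k
    centered = X - mu_hat
    Sigma_hat = centered.T @ centered / n        # (1/n) sum_k (x_k - mu)(x_k - mu)^t
    return mu_hat, Sigma_hat

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 1.0], [[2.0, 0.3], [0.3, 1.0]], size=500)
print(gaussian_mle(X))                           # close to the true mean and covariance
```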

  21. Bias • The MLE for the variance σ² is biased • The expected value of the sample variance over all data sets of size n is E[σ̂²] = ((n-1)/n) σ², which is not equal to the true variance • An unbiased estimator for Σ is given by C = 1/(n-1) ∑_{k=1}^{n} (x_k - μ̂)(x_k - μ̂)^t
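
A quick, hypothetical simulation of this bias (an added illustration): averaging the 1/n sample variance over many small data sets gives roughly ((n-1)/n)·σ², while dividing by n-1 recovers the true variance on average. The sample size, number of trials, and true variance below are arbitrary choices.

```python
# Empirical check of the bias of the 1/n (MLE) variance estimate.
import numpy as np

rng = np.random.default_rng(2)
n = 5
trials = 20000
biased = np.empty(trials)
unbiased = np.empty(trials)
for t in range(trials):
    x = rng.normal(0.0, 1.0, size=n)                 # true variance = 1
    biased[t] = ((x - x.mean())**2).sum() / n        # MLE, divides by n
    unbiased[t] = ((x - x.mean())**2).sum() / (n - 1)
print(biased.mean(), unbiased.mean())                # roughly 0.8 vs 1.0, since (n-1)/n = 0.8
```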
