
From last time: PR Methods




Presentation Transcript


  1. From last time: PR Methods • Feature extraction + Pattern classification • Training, testing, overfitting, overtraining • Minimum distance methods • Discriminant Functions • Linear • Nonlinear (e.g., quadratic, neural networks) • -> Statistical Discriminant Functions

  2. Statistical Pattern Recognition • Many sources of variability in speech signal • Much more than known deterministic factors • Powerful mathematical foundation • More general way of handling discrimination

  3. Statistical Discrimination Methods • Minimum error classifier and Bayes rule • Gaussian classifiers • Discrete density estimation • Mixture Gaussians • Neural networks

  4. [Figure: decision regions for a two-class Bayes classifier — in one region we decide x is in class 1, in the other we decide x is in class 2]

  5. How to approximate a Bayes classifier • Parametric form with single pass estimation • Discretize, count co-occurrences • Iterative training (mixture Gaussians, ANNs) • Kernel estimation
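The last option above, kernel (Parzen) estimation, can be sketched in a few lines: place a smoothing kernel on every training sample and average. This is a minimal 1-D sketch in pure Python; the function name, the Gaussian kernel, and the bandwidth value are illustrative choices, not from the slides.

```python
import math

def parzen_density(x, samples, h=1.0):
    """Kernel (Parzen window) estimate of p(x) from 1-D samples,
    using a Gaussian kernel of bandwidth h."""
    n = len(samples)
    return sum(
        math.exp(-0.5 * ((x - s) / h) ** 2) / (h * math.sqrt(2 * math.pi))
        for s in samples
    ) / n

# The estimated density is higher near the data than far from it.
samples = [0.9, 1.0, 1.1, 1.2]
print(parzen_density(1.0, samples) > parzen_density(5.0, samples))  # True
```

In practice the bandwidth h controls the bias/variance trade-off: too small and the estimate is spiky, too large and it oversmooths.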

  6. Minimum distance classifiers • If Euclidean distance used, optimum if: • Gaussian • Equal priors • Uncorrelated features • Equal variance per feature • If different variances per feature, correlated features, MD could be better
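The minimum distance rule above can be sketched directly: assign each input to the class whose mean is nearest. This is a minimal sketch; the function name and the example class means are made up for illustration.

```python
import math

def min_distance_classify(x, class_means):
    """Assign x to the class whose mean vector is nearest
    in Euclidean distance (optimal under the conditions above)."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return min(class_means, key=lambda c: dist(x, class_means[c]))

means = {"class1": (0.0, 0.0), "class2": (4.0, 4.0)}
print(min_distance_classify((1.0, 1.0), means))  # class1
```

Note this weights every feature equally, which is exactly why it is suboptimal when features have unequal variances or are correlated.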

  7. Then the discriminant function can be Di(x) = wiᵀx + wi0 • where wi = Σ⁻¹μi (Σ the shared covariance) • and wi0 = −½ μiᵀΣ⁻¹μi + log p(ωi) • This is a linear classifier
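The linear discriminant above can be evaluated numerically. This is a minimal sketch assuming a shared diagonal covariance (so Σ⁻¹μi is an elementwise division); the function name and the example means, variances, and priors are illustrative.

```python
import math

def linear_discriminant(x, mu, var, prior):
    """Di(x) = wi^T x + wi0 for a Gaussian class under a shared
    diagonal covariance; var is the vector of per-feature variances."""
    w = [m / v for m, v in zip(mu, var)]            # wi = Sigma^{-1} mu_i
    w0 = (-0.5 * sum(m * m / v for m, v in zip(mu, var))
          + math.log(prior))                         # wi0 term
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

# A point near class 1's mean scores higher under class 1.
x = [0.5, 0.5]
d1 = linear_discriminant(x, mu=[0.0, 0.0], var=[1.0, 1.0], prior=0.5)
d2 = linear_discriminant(x, mu=[2.0, 2.0], var=[1.0, 1.0], prior=0.5)
print(d1 > d2)  # True
```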

  8. General Gaussian case • Unconstrained covariance matrices per class • Then the discriminant function is Di(x) = xᵀWix + wiᵀx + wi0 • This is a quadratic classifier • Gaussians are completely specified by 1st and 2nd order statistics • Is this enough for general populations of data?
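The quadratic form above can be sketched as code. This is a minimal sketch assuming per-class diagonal covariances for brevity (so Wi = −½Σi⁻¹ acts elementwise); the constant term then picks up a −½ log|Σi| contribution. The function name and the example parameters are illustrative.

```python
import math

def quadratic_discriminant(x, mu, var, prior):
    """Di(x) = x^T Wi x + wi^T x + wi0 for a Gaussian class with its
    own diagonal covariance; equal to log p(x|wi) + log p(wi) + const."""
    quad = -0.5 * sum(xi * xi / v for xi, v in zip(x, var))  # x^T Wi x
    lin = sum(m * xi / v for m, xi, v in zip(mu, x, var))    # wi^T x
    w0 = (-0.5 * sum(m * m / v for m, v in zip(mu, var))
          - 0.5 * sum(math.log(v) for v in var)
          + math.log(prior))
    return quad + lin + w0

# A point at class A's mean scores higher under class A than class B.
x = [0.0, 0.0]
d_a = quadratic_discriminant(x, mu=[0.0, 0.0], var=[1.0, 1.0], prior=0.5)
d_b = quadratic_discriminant(x, mu=[3.0, 3.0], var=[1.0, 1.0], prior=0.5)
print(d_a > d_b)  # True
```

Because each class keeps its own covariance, the decision boundaries are quadratic surfaces rather than hyperplanes.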

  9. A statistical discriminant function: log p(x|ωi) + log p(ωi)

  10. Remember: P(a|b) = P(a,b)/P(b) P(a,b) = P(a|b)P(b) = P(b|a)P(a)
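The identities above can be checked numerically on a small joint distribution. The joint probabilities below are made-up numbers for illustration only.

```python
# A made-up joint distribution over a in {0, 1} and b in {0, 1}.
p_ab = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def p_a(a):
    return sum(p for (a_, _), p in p_ab.items() if a_ == a)

def p_b(b):
    return sum(p for (_, b_), p in p_ab.items() if b_ == b)

def p_a_given_b(a, b):
    return p_ab[(a, b)] / p_b(b)   # P(a|b) = P(a,b) / P(b)

def p_b_given_a(b, a):
    return p_ab[(a, b)] / p_a(a)   # P(b|a) = P(a,b) / P(a)

# Both factorisations recover the same joint probability:
a, b = 1, 0
lhs = p_a_given_b(a, b) * p_b(b)   # P(a|b) P(b)
rhs = p_b_given_a(b, a) * p_a(a)   # P(b|a) P(a)
print(abs(lhs - rhs) < 1e-12 and abs(lhs - p_ab[(a, b)]) < 1e-12)  # True
```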

  11. Upcoming quiz etc. • Monday the 1st: guest talk on "deep" neural networks • Then the quiz. Topics: ASR basics, pattern recognition overview. Typical questions are multiple choice plus a short explanation. Aimed at about 30 minutes. • There will be one more HW and one more quiz; after that, everything is oriented towards the project.
