This week: overview of pattern recognition (related to machine learning)
Non-review of chapters 6/7 • Z-transforms • Convolution • Sampling/aliasing • Linear difference equations • Resonances • FIR/IIR filtering • DFT/FFT
Speech Pattern Recognition • Soft pattern classification plus temporal sequence integration • Supervised pattern classification: class labels used in training • Unsupervised pattern classification: labels not available (or at least not used)
Training and testing • Training: learning parameters of classifier • Testing: classify independent test set, compare with labels, and score
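A minimal sketch of this train/test loop in Python (the data, the 70/30 split, and the mean-per-class "classifier" are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-class data: 2-D feature vectors per class.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Training set and independent test set.
idx = rng.permutation(len(y))
train, test = idx[:70], idx[70:]

# "Training": learn one parameter per class (here, the class mean).
means = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

# "Testing": classify the held-out set and score against its labels.
pred = np.argmin(((X[test][:, None, :] - means) ** 2).sum(axis=2), axis=1)
print("accuracy:", (pred == y[test]).mean())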
Swedish basketball players vs. speech recognition researchers
Feature extraction criteria • Class discrimination • Generalization • Parsimony (efficiency)
Feature vector size • Best representations for discrimination on the training set tend to be high-dimensional • Best representations for generalization to the test set tend to be succinct
Dimensionality reduction • Principal components (e.g., via SVD, KL transform, eigenanalysis, ...) • Linear Discriminant Analysis (LDA) • Application-specific knowledge • Feature selection via PR evaluation
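A sketch of principal-component reduction via the SVD, one of the options above (the data and the choice k = 3 are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 examples, 10 features

Xc = X - X.mean(axis=0)                 # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3                                   # keep the top-k components
X_reduced = Xc @ Vt[:k].T               # project onto principal directions
print(X_reduced.shape)                  # (100, 3)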
PR Methods • Minimum distance • Discriminant functions • Linear • Nonlinear (e.g., quadratic, neural networks) • Some aspects of each - SVMs • Statistical discriminant functions
Minimum Distance • Vector or matrix representing element • Define distance function • Collect examples for each class • In testing, choose the class of the closest example • Choice of distance equivalent to an implicit statistical assumption • Signals (e.g. speech) add temporal variability
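A minimal sketch of the minimum-distance rule, assuming squared Euclidean distance and a handful of invented stored examples (other distance choices encode other implicit statistical assumptions):

import numpy as np

def classify_min_distance(x, examples, labels):
    """Return the label of the stored example closest to x."""
    d = ((examples - x) ** 2).sum(axis=1)   # squared Euclidean distance
    return labels[np.argmin(d)]

# Hypothetical stored examples for two classes.
examples = np.array([[0.0, 0.0], [0.5, 0.2], [3.0, 3.0], [2.8, 3.1]])
labels = np.array([0, 0, 1, 1])
print(classify_min_distance(np.array([2.5, 2.9]), examples, labels))  # -> 1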
Limitations • Variable scale of dimensions • Variable importance of dimensions • For high dimensions, sparsely sampled space • For tough problems, resource limitations (storage, computation, memory access)
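The first two limitations are commonly patched by standardizing each dimension before computing distances; a sketch (z-scoring is one choice among several):

import numpy as np

def zscore(X, eps=1e-12):
    """Rescale each feature dimension to zero mean, unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
print(zscore(X))   # both dimensions now on a comparable scale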
Decision Rule for Min Distance • Nearest Neighbor (NN): in the limit of infinite samples, at most twice the error of the optimum classifier • k-Nearest Neighbor (kNN) • Lots of storage for large problems; potentially large searches
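A kNN sketch; the linear scan over all stored examples makes the storage and search costs above concrete:

import numpy as np
from collections import Counter

def knn_classify(x, examples, labels, k=3):
    """Vote among the k stored examples nearest to x."""
    d = ((examples - x) ** 2).sum(axis=1)   # distance to every stored example
    nearest = np.argsort(d)[:k]             # linear scan: O(N) per query
    return Counter(labels[nearest]).most_common(1)[0][0]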
Some Opinions • Better to throw away bad data than to reduce its weight • Dimensionality reduction based on variance is often a bad choice for supervised pattern recognition • Both of these are only true sometimes
Discriminant Analysis • Discriminant functions: max for the correct class, min for the others • Decision surface between classes • A linear decision surface in 2 dimensions is a line, in 3 a plane; in general, a hyperplane • For 2 classes, the surface is at w^T x + w_0 = 0 • In the 2-class quadratic case, the surface is at x^T W x + w^T x + w_0 = 0
Two-prototype example • D_i^2 = (x - z_i)^T (x - z_i) = x^T x + z_i^T z_i - 2 x^T z_i • D_1^2 - D_2^2 = 2 x^T z_2 - 2 x^T z_1 + z_1^T z_1 - z_2^T z_2 • At the decision surface the distances are equal, so 2 x^T z_2 - 2 x^T z_1 = z_2^T z_2 - z_1^T z_1, or x^T (z_2 - z_1) = ½ (z_2^T z_2 - z_1^T z_1) • If the prototypes are normalized to 1, the decision surface is x^T (z_2 - z_1) = 0 • Each discriminant function is x^T z_i
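A numeric check of the result above, with two arbitrary unit-norm prototypes: the nearer prototype is always the one with the larger discriminant x^T z_i.

import numpy as np

z1 = np.array([1.0, 0.0])                 # unit-norm prototypes
z2 = np.array([0.6, 0.8])
x = np.array([0.9, 0.5])

d1, d2 = ((x - z1) ** 2).sum(), ((x - z2) ** 2).sum()
g1, g2 = x @ z1, x @ z2                   # discriminants x^T z_i
# With normalized prototypes, smaller distance <=> larger discriminant.
print(d1 < d2, g1 > g2)                   # both True or both False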
Training Discriminant Functions • Minimum distance • Fisher linear discriminant • Gradient learning
Generalized Discriminators - ANNs • McCulloch-Pitts neural model • Rosenblatt Perceptron • Multilayer systems
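A sketch of Rosenblatt's perceptron update, the simplest form of gradient-style learning for a linear discriminant (the two-class data here is invented and roughly separable):

import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)        # labels in {-1, +1}

w, w0, lr = np.zeros(2), 0.0, 0.1
for _ in range(100):                      # fixed number of passes
    for xi, yi in zip(X, y):
        if yi * (w @ xi + w0) <= 0:       # misclassified: move the surface
            w += lr * yi * xi
            w0 += lr * yi

print("training errors:", int((np.sign(X @ w + w0) != y).sum()))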
Support Vector Machines (SVMs) • High-dimensional feature vectors • Transformed from simple features (e.g., polynomial) • Can potentially classify the training set arbitrarily well • Improve generalization by maximizing the margin • Via the “kernel trick”, no explicit high-dimensional representation is needed • Inner-product function between 2 points in the space • Slack variables allow for imperfect classification
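A sketch of the kernel trick itself: for the polynomial kernel K(x, y) = (x^T y + 1)^2 on 2-dimensional inputs, the kernel value equals an inner product in an explicit 6-dimensional space that the classifier never has to construct.

import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for 2-D input."""
    x1, x2 = x
    return np.array([1, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
k_implicit = (x @ y + 1) ** 2              # kernel: computed in 2 dimensions
k_explicit = phi(x) @ phi(y)               # same value via explicit 6-D vectors
print(np.isclose(k_implicit, k_explicit))  # True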
Unsupervised clustering • Large and diverse literature • Many methods • Next time: one method explained in the context of a larger statistical system
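As one concrete instance of the many methods, a minimal k-means sketch (the number of clusters and the data are arbitrary):

import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Alternate assignment and mean-update steps; returns labels, centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        # Assumes no cluster goes empty (fine for a sketch).
        centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])
labels, centers = kmeans(X, k=2)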
Some PR/Machine Learning Issues • Testing on the training set • Training on the test set • # parameters vs. # training examples: overfitting and overtraining • For much more on machine learning, see CS 281A/B