
General Signal Processing and Machine Learning Tools for BCI Analysis (Part-2)


Presentation Transcript


  1. General Signal Processing and Machine Learning Tools for BCI Analysis (Part-2) Md. Zia Uddin Bio-Imaging Lab, Department of Biomedical Engineering Kyung Hee University

  2. Previous Lecture • FIR & IIR • Fourier-Based Filter • Bipolar Filtering • Common Average Reference • Laplace Filtering • PCA • ICA

  3. Spatial Filter: Common Spatial Pattern • First, the normalized spatial covariance of each single-trial raw EEG X is determined as R = XX' / trace(XX'), where X is an N x T matrix, with N the number of channels and T the number of samples in the time interval of interest. • The covariance matrices averaged over the trials of class a and class b (Ra and Rb) are summed to give the composite covariance matrix Rc = Ra + Rb. The eigenvectors Bc and eigenvalues λ of Rc (Rc = Bc λ Bc') yield the whitening transform P = sqrt(λ^-1) Bc'. • When Ra and Rb are transformed by Sa = P Ra P' and Sb = P Rb P', then Sa and Sb share the same eigenvectors, such that Sa = U Da U' and Sb = U Db U', where U holds the common orthonormal eigenvectors of Sa and Sb. • Da and Db are the corresponding diagonal matrices of eigenvalues, which sum to the identity.

  4. Spatial Filter: Common Spatial Pattern (2) • So, U'RaU = Da, U'RbU = Db, U'(Ra + Rb)U = I and Da + Db = I. • Assume the eigenvectors in U are sorted in descending order of the eigenvalues in Da = (Da,1, Da,2, ..., Da,N), Da,1 >= Da,2 >= ... >= Da,N (i.e. in ascending order of the eigenvalues in Db = (Db,1, Db,2, ..., Db,N), Db,1 <= Db,2 <= ... <= Db,N). • Now, when classes a and b are both projected onto the first eigenvector U1, class a yields the maximal variance and class b the minimal variance. • Conversely, when the classes are projected onto the last eigenvector UN, class a yields the minimal variance and class b the maximal variance. • In general, projections onto the first m eigenvectors U1, ..., Um and the last m eigenvectors U(N-m+1), ..., UN yield the m largest and m smallest variances for class a (and vice versa for class b), so projecting the data onto these directions is optimal for discriminating the two classes (see the sketch below).
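A minimal NumPy sketch of the CSP computation described on slides 3-4; the trial lists, channel count, and the number of retained filters m are placeholders, and no regularization or artifact handling is included.

```python
import numpy as np

def normalized_cov(X):
    # Normalized spatial covariance of one trial: R = X X' / trace(X X')
    C = X @ X.T
    return C / np.trace(C)

def csp(trials_a, trials_b):
    # trials_a, trials_b: lists of (N_channels x T_samples) arrays
    Ra = np.mean([normalized_cov(X) for X in trials_a], axis=0)
    Rb = np.mean([normalized_cov(X) for X in trials_b], axis=0)

    # Whitening transform P = sqrt(lambda^-1) Bc' from Rc = Ra + Rb
    lam, Bc = np.linalg.eigh(Ra + Rb)
    P = np.diag(1.0 / np.sqrt(lam)) @ Bc.T

    # Sa and Sb share eigenvectors; sort them by the eigenvalues of Sa
    Sa = P @ Ra @ P.T
    d, U = np.linalg.eigh(Sa)
    order = np.argsort(d)[::-1]          # descending variance for class a
    return U[:, order].T @ P             # rows are the CSP spatial filters

# Usage sketch: Z = W @ X projects a trial onto the CSP components;
# the first m and last m rows (e.g. their log-variances) form the features.
```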

  5. Discriminability of Features: Fisher Score • Given labeled data with two classes, the Fisher score of a feature is • S = |mean1 - mean2| / (std1 + std2) • i.e. the separation of the class means divided by the summed within-class spread; larger scores indicate more discriminative features.
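A one-line implementation of the Fisher score as defined above, plus a hypothetical loop for ranking feature columns; the arrays x1, x2, F1, F2 are placeholders.

```python
import numpy as np

def fisher_score(x1, x2):
    # S = |mean1 - mean2| / (std1 + std2) for one feature and two classes
    return abs(x1.mean() - x2.mean()) / (x1.std() + x2.std())

# Ranking the columns of two (trials x features) matrices F1, F2:
# scores = [fisher_score(F1[:, j], F2[:, j]) for j in range(F1.shape[1])]
```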

  6. Discriminant Analysis • The purpose of discriminant analysis is to assign objects to one of several (K) groups based on a set of measurements X = (X1, X2, ..., Xp) obtained from each object. • Each object is assumed to be a member of one (and only one) group 1 <= k <= K. • An error is incurred if the object is assigned to the wrong group. • The measurements of all objects of one class k are characterized by a probability density fk(X). • We would like to find a rule that decides, for every object, to which class it belongs. • Example • A group of people consists of male and female persons → K = 2. • From each person the weight and height are recorded → p = 2. • The gender is unknown in the data set. • We try to classify the gender of each person from the weight and height → discriminant analysis. • A classification rule (discriminant function) is needed to choose the group for each person.

  7. Quadratic Discriminant Analysis • Given a set of observation vectors x, each with a known class label y, the goal is to decide the class of a new observation vector. • In quadratic discriminant analysis (QDA) it is assumed that there are only two classes (y ∈ {0, 1}) and that the measurements of each class are normally distributed. • Suppose the class means μ0, μ1 and covariances Σ0, Σ1 are known. Then class 1 is chosen when the likelihood ratio exceeds a threshold t; after taking logs and rearranging this becomes (x − μ0)' Σ0^-1 (x − μ0) + ln|Σ0| − (x − μ1)' Σ1^-1 (x − μ1) − ln|Σ1| > t'. • Because the two quadratic forms do not cancel, the resulting separating surface between the classes is a quadratic.
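A small sketch of the two-class QDA decision statistic under the Gaussian assumptions above; the means and covariances are assumed to have been estimated beforehand.

```python
import numpy as np

def qda_score(x, mu0, S0, mu1, S1):
    # log p(x | y=1) - log p(x | y=0); decide class 1 if this exceeds log(t)
    def log_gauss(x, mu, S):
        d = x - mu
        return -0.5 * (d @ np.linalg.solve(S, d) + np.log(np.linalg.det(S)))
    return log_gauss(x, mu1, S1) - log_gauss(x, mu0, S0)

# Because the two quadratic forms do not cancel (S0 != S1 in general),
# the decision boundary qda_score(x, ...) = log(t) is quadratic in x.
```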

  8. Linear Discriminant Analysis • LDA approaches the problem by assuming that the class-conditional probability densities p(x | y = 0) and p(x | y = 1) are both normally distributed. • Under this assumption, the Bayes-optimal solution is to predict a point as belonging to the second class if the likelihood ratio exceeds some threshold T. • Without any further assumptions, the resulting classifier is QDA (quadratic discriminant analysis). • LDA additionally makes the simplifying homoscedastic assumption that the class covariances are identical (Σ0 = Σ1 = Σ); in this case the quadratic terms cancel and the decision function becomes linear in x.
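A minimal LDA sketch for the homoscedastic (shared-covariance) case; the pooled covariance is approximated by averaging the two class covariances, and X0, X1 are placeholder (trials x features) matrices.

```python
import numpy as np

def lda_fit(X0, X1):
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Shared covariance (simple average here; a sample-weighted pool is also common)
    S = 0.5 * (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False))
    w = np.linalg.solve(S, mu1 - mu0)      # linear projection weights
    b = -0.5 * w @ (mu0 + mu1)             # threshold at the projected midpoint
    return w, b

def lda_predict(x, w, b):
    return int(w @ x + b > 0)              # 1 -> class 1, 0 -> class 0
```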

  9. Regularized Linear Discriminant Analysis • The covariance estimates are regularized by two parameters, λ and γ. • λ blends each class covariance toward the pooled covariance: with λ = 0 the class-specific covariances are kept (QDA), with λ = 1 all classes share the pooled covariance (LDA). • γ shrinks the resulting covariance toward a multiple of the identity, decreasing the larger eigenvalues and increasing the smaller ones; at γ = 1 the covariance becomes spherical.
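A sketch of one common (Friedman-style) regularization scheme matching the description above; the exact formulation used in a given BCI toolbox may differ.

```python
import numpy as np

def regularized_cov(Sc, S_pooled, lam, gamma):
    # lambda = 0 keeps the class-specific covariance, lambda = 1 the pooled one
    S = (1.0 - lam) * Sc + lam * S_pooled
    # gamma shrinks toward a scaled identity: larger eigenvalues decrease,
    # smaller ones increase; gamma = 1 makes the covariance spherical
    n = S.shape[0]
    return (1.0 - gamma) * S + gamma * (np.trace(S) / n) * np.eye(n)
```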

  10. Fisher's Linear Discriminant Analysis • Two scatter matrices are defined: • Between-class scatter matrix SB = (m1 − m2)(m1 − m2)', where m1 and m2 are the class means. • Within-class scatter matrix SW = Σc Σ(x in class c) (x − mc)(x − mc)'. • The goal is to maximize the ratio of between-class to within-class scatter along the projection direction w: • J(w) = (w' SB w) / (w' SW w). • This can be solved as the generalized eigenvalue problem SB w = λ SW w, where w and λ are the eigenvectors and eigenvalues of SW^-1 SB, respectively.
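A two-class Fisher discriminant sketch solving the generalized eigenvalue problem above with SciPy; X0 and X1 are placeholder feature matrices.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_direction(X0, X1):
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    SB = np.outer(m1 - m0, m1 - m0)                          # between-class scatter
    SW = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)   # within-class scatter
    vals, vecs = eigh(SB, SW)              # generalized problem SB w = lambda SW w
    return vecs[:, np.argmax(vals)]        # direction maximizing J(w)
```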

  11. Nearest Neighbor Classifiers: k Nearest Neighbors • The aim of this technique is to assign to an unseen point the dominant class among its k nearest neighbors within the training set. • For BCI, these nearest neighbors are usually obtained using a metric distance. • kNN algorithms are not very popular in the BCI community, probably because they are known to be very sensitive to the curse of dimensionality, which made them fail in several BCI experiments. • However, when used in BCI systems with low-dimensional feature vectors, kNN may prove efficient. • Two well-known prototype/vector-quantization algorithms often combined with nearest-neighbor classification: • K-means • LBG (Linde-Buzo-Gray)
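A minimal kNN classifier sketch with Euclidean distance and a majority vote; the value of k and any feature scaling are left open.

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=5):
    dists = np.linalg.norm(X_train - x, axis=1)     # distance to every training point
    nearest = np.argsort(dists)[:k]                 # indices of the k closest points
    labels = np.asarray(y_train)[nearest]
    return Counter(labels).most_common(1)[0][0]     # majority vote among the k neighbors
```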

  12. Mahalanobis Distance-Based Classifiers • Mahalanobis-distance-based classifiers assume a Gaussian distribution N(μc, Mc) for each prototype of class c. • A feature vector x is then assigned to the class of the nearest prototype according to the so-called Mahalanobis distance dc(x) = sqrt((x − μc)' Mc^-1 (x − μc)).
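A sketch of the nearest-prototype rule with the Mahalanobis distance above; the per-class means and covariances are assumed to be estimated from training data.

```python
import numpy as np

def mahalanobis(x, mu, M):
    d = x - mu
    return np.sqrt(d @ np.linalg.solve(M, d))

def classify(x, prototypes):
    # prototypes: dict mapping class label c -> (mu_c, M_c)
    return min(prototypes, key=lambda c: mahalanobis(x, *prototypes[c]))
```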

  13. SVM • An SVM also uses a discriminant hyperplane to separate classes. • However, for an SVM the selected hyperplane is the one that maximizes the margin, i.e., the distance to the nearest training points. • Maximizing the margin is known to increase the generalization capability. • An SVM uses a regularization parameter C that enables accommodation of outliers and allows errors on the training set.
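A hedged example of margin-based classification using scikit-learn's SVC (scikit-learn is not mentioned in the slides; any SVM library would do); the toy data are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

# C trades margin width against training errors (larger C tolerates fewer errors)
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)
print(clf.predict([[0.0, 0.0], [3.0, 3.0]]))
```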

  14. Multi-Layer Perceptron • An MLP is composed of several layers of neurons: an input layer, possibly one or several hidden layers, and an output layer. • Each neuron's input is connected to the outputs of the previous layer's neurons, and the neurons of the output layer determine the class of the input feature vector. • A multi-layer perceptron without hidden layers is known as a perceptron. • Interestingly, a perceptron is equivalent to LDA and, as such, has sometimes been used for BCI applications. [Figure: Multi-Layer Perceptron structure]
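An illustrative MLP with a single hidden layer, again using scikit-learn as a stand-in library; the layer size, iteration count, and toy data are arbitrary placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000)  # one hidden layer of 20 neurons
mlp.fit(X, y)
print(mlp.score(X, y))   # training accuracy on the toy data
```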

  15. Other NN Architectures • Learning Vector Quantization (LVQ) Neural Network • Fuzzy ARTMAP Neural Network • Finite Impulse Response Neural Network (FIRNN) • Time-Delay Neural Network (TDNN) • Gamma Dynamic Neural Network (GDNN) • RBF Neural Network • Bayesian Logistic Regression Neural Network (BLRNN)

  16. Hidden Markov Model (1) • Hidden Markov Models (HMMs) are popular dynamic classifiers in the field of speech recognition. • A Hidden Markov Model is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters; the challenge is to determine the hidden parameters from the observable parameters. • An HMM is a kind of probabilistic automaton that can provide the probability of observing a given sequence of feature vectors. • Each state of the automaton models the probability of observing a given feature vector. • HMMs are well suited to the classification of time series. As the EEG components used to drive a BCI have specific time courses, HMMs have been applied to the classification of temporal sequences of BCI features.

  17. Hidden Markov Model (2) • HMM parameters • A generic HMM can be expressed as λ = {S, π, A, B}, where • S denotes the possible states, • π the initial probabilities of the states, • A the transition probability matrix between hidden states, • B the observation symbol probability distribution for every state. [Figure: HMM unrolled over T observations]
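A sketch of the forward algorithm for a discrete HMM λ = {S, π, A, B}; it computes P(observation sequence | model), the score used to compare class-specific HMMs. The scaling or log-space tricks needed for long sequences are omitted.

```python
import numpy as np

def forward(obs, pi, A, B):
    # obs: sequence of observation-symbol indices
    # pi: (n_states,) initial probabilities, A: (n_states, n_states) transitions,
    # B: (n_states, n_symbols) emission probabilities
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()            # P(obs | lambda)
```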

  18. HMM-based Mental Task Classification (1): imagined left or right hand movements • The task was to control a feedback bar by means of imagined left or right hand movements according to the cues shown to the subject. • Several studies show that for imagined hand movements, 2-3 channels of data over the motor cortex are enough to be processed. • Data were collected from 3 channels: C3, C4, and Cz. • Sampling frequency: 128 Hz. • Filtered between 0.5 and 30 Hz. • 140 trials of 9 seconds each. • In each trial: • the first 2 s were quiet; • at t = 2 s an acoustic stimulus indicated the beginning of the trial, and a cross '+' was displayed for 1 s; • at t = 3 s an arrow (left or right) was displayed as the cue, and the subject was asked to move the bar in the direction of the cue. • Therefore, only the period between t = 4 s and t = 9 s of each trial was considered for the classification study.

  19. HMM-based Mental Task Classification (2): imagined left or right hand movements • 100 trials were randomly chosen as the training set and 40 trials as the test set. • This process was repeated 20 times on randomly separated training and test data, and the results of all runs were averaged to obtain the overall classification accuracy. • The best classification accuracy, 77.13%, was obtained with a classifier using 2 states and 16 observable symbols per state on the second 0.5 s segment of data. Reference: S. Solhjoo, A. M. Nasrabadi, and M. R. H. Golpayegani. Classification of chaotic signals using HMM classifiers: EEG-based mental task classification. In Proceedings of the European Signal Processing Conference, 2005.

  20. PCA+HMM+SVM for EEG Pattern Classification (1): imagined left or right hand movements • First approach • Principal component features extracted separately from channels C3 and C4 are concatenated. • They are then fed into the corresponding HMM (which models either left movement or right movement) for training. • Accuracy: 75.70%

  21. PCA+HMM+SVM for EEG Pattern Classification (2): imagined left or right hand movements • Second approach • Principal component features from each channel (C3 and C4) are fed into two HMMs separately, which results in four HMMs in total: • HMMC3L • HMMC4L • HMMC3R • HMMC4R • An SVM is employed to make the final decision from the likelihood scores computed by the HMMs. • Accuracy: 78.15% Reference: H. Lee and S. Choi. PCA+HMM+SVM for EEG pattern classification. In Proceedings of the Seventh International Symposium on Signal Processing and Its Applications, 2003.

  22. Classifying EEG Signals Based on HMM-AR (2): ICA+AR+HMM for imagined left or right hand movements • AR model: an AR model is a linear predictor used for modeling time series. The so-called Kalman-filter AR model achieves this by keeping a state vector of AR coefficients that is updated over time. • Basic steps • band-pass filter, keeping 8 Hz to 30 Hz; • apply ICA to C3 and C4 and take the first independent component as the source; • fit AR coefficients to the component as features; • train and test an HMM for each of the two actions. Reference: Tang Yan, Tang Jingtian, Gong Andong, and Wang Wei. Classifying EEG signals based HMM-AR. In Proceedings of The 2nd International Conference on Bioinformatics and Biomedical Engineering, pp. 2111-2114, 2008.
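A rough Yule-Walker sketch for extracting AR coefficients from one independent component, as a stand-in for the AR step of the pipeline above; the model order is a placeholder and the paper's Kalman-filter AR tracking is not reproduced here.

```python
import numpy as np

def ar_coefficients(x, order=6):
    x = x - x.mean()
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)]) / len(x)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])     # AR coefficients a_1..a_p (feature vector)
```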

  23. Conclusion • Some basic signal processing and classification techniques for BCI analysis have been discussed in this lecture.

  24. Thank you
