Action Recognition from Video Using Feature Covariance Matrices

Action Recognition from Video Using Feature Covariance Matrices Kai Guo, Prakash Ishwar, Senior Member, IEEE, and Janusz Konrad, Fellow, IEEE

Outline • Introduction • Framework • Action Feature • Experiments • Conclusion

Introduction • A new approach to action representation, one based on the empirical covariance matrix of a bag of local action features. • We apply the covariance matrix representation to two types of local feature collections: 1. A sequence of silhouettes of an object (the so-called silhouette tunnel). 2. The optical flow.

Introduction • We focus on two distinct types of classifiers: 1. The nearest-neighbor (NN) classifier. 2. The sparse-linear-approximation (SLA) classifier. • The supervised classification problem in the closed convex cone of covariance matrices is transformed into an equivalent problem in the vector space of symmetric matrices via the matrix logarithm.

Framework • Feature Covariance Matrices • We adopt a “bag of dense local feature vectors” modeling approach. • Inspired by Tuzel et al.’s work, we use the feature-covariance matrix, which can provide a very discriminative representation for action recognition.

Framework • Let F = {f_n} denote a “bag of feature vectors” extracted from a video sample, and let the size of the feature set be |F| = N. • The empirical estimate of the covariance matrix of F is given by: $C = \frac{1}{N} \sum_{n=1}^{N} (f_n - \mu)(f_n - \mu)^T$ (1) • where $\mu = \frac{1}{N} \sum_{n=1}^{N} f_n$ is the empirical mean feature vector.
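
The covariance estimate described on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `feature_covariance`, the random test data, and the choice of the biased 1/N normalization are assumptions made for the example.

```python
import numpy as np

def feature_covariance(features):
    """Empirical covariance of a bag of feature vectors.

    features: array of shape (N, d), one d-dimensional feature vector per row.
    Returns a (d, d) symmetric matrix, normalized by N (assumption: biased estimate).
    """
    F = np.asarray(features, dtype=float)
    mu = F.mean(axis=0)            # empirical mean feature vector
    centered = F - mu              # subtract the mean from every feature vector
    return centered.T @ centered / F.shape[0]

# Hypothetical bag of 500 thirteen-dimensional feature vectors.
rng = np.random.default_rng(0)
bag = rng.normal(size=(500, 13))
C = feature_covariance(bag)
```

The size of the result depends only on the feature dimension d, not on N, so videos of different lengths map to matrices of the same size.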

Framework • Log-Covariance Matrices • A key idea is to map the convex cone of covariance matrices to the vector space of symmetric matrices by using the matrix logarithm proposed by Arsigny et al. • The eigen-decomposition of C is given by $C = V D V^T$. • Then $\log(C) := V \tilde{D} V^T$, where $\tilde{D}$ is a diagonal matrix obtained from D by replacing D’s diagonal entries by their logarithms.
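
A minimal sketch of the matrix logarithm via eigen-decomposition, as described on this slide. The names `log_covariance` and `vectorize_symmetric`, the eigenvalue floor `eps`, and the sqrt(2) scaling of off-diagonal entries are assumptions added for illustration; the slide does not specify how rank-deficient matrices or vectorization are handled.

```python
import numpy as np

def log_covariance(C, eps=1e-10):
    """Matrix logarithm of a symmetric positive (semi-)definite matrix."""
    w, V = np.linalg.eigh(C)       # C = V diag(w) V^T, w real, V orthogonal
    w = np.maximum(w, eps)         # assumption: floor eigenvalues to avoid log(0)
    return (V * np.log(w)) @ V.T   # V diag(log w) V^T

def vectorize_symmetric(S):
    """Stack the upper triangle of S into a vector.

    Off-diagonal entries are scaled by sqrt(2) (an assumption) so that the
    Euclidean norm of the vector equals the Frobenius norm of S.
    """
    rows, cols = np.triu_indices(S.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return S[rows, cols] * scale

# Hypothetical 4 x 4 covariance-like matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
C = A @ A.T + np.eye(4)
L = log_covariance(C)
v = vectorize_symmetric(L)
```

Once in this form, log-covariance matrices can be added, scaled, and compared like ordinary vectors, which is what the classifiers on the following slides rely on.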

Framework • Classification Using Log-Covariance Matrices • Nearest-Neighbor (NN) Classification: • Given a query sample, find the most similar sample in the annotated training set, where similarity is measured with respect to some distance measure, and assign its label to the query sample.
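
The nearest-neighbor rule can be sketched as follows. The slide leaves the distance measure open; this example assumes the Frobenius norm of the difference of log-covariance matrices, and the names `log_spd` and `nn_classify` and the toy training set are hypothetical.

```python
import numpy as np

def log_spd(C):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def nn_classify(query_cov, train_covs, train_labels):
    """Return the label of the training sample closest to the query.

    Distance (assumption): Frobenius norm between log-covariance matrices.
    """
    log_query = log_spd(query_cov)
    dists = [np.linalg.norm(log_query - log_spd(C), ord="fro") for C in train_covs]
    return train_labels[int(np.argmin(dists))]

# Toy training set with two classes (hypothetical data).
train_covs = [np.eye(3), 4.0 * np.eye(3)]
train_labels = ["walk", "run"]
predicted = nn_classify(3.5 * np.eye(3), train_covs, train_labels)
```

In practice the training log-covariance matrices would be computed once and cached rather than recomputed for every query.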

Framework • Sparse Linear Approximation (SLA) Classification: • We approximate the log-covariance matrix of a query sample by a sparse linear combination of log-covariance matrices of all training samples p1, . . . , pN.

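Framework • Given a query sample $p$, one may attempt to express it as a linear combination of training samples by solving the matrix-vector equation $p = P\alpha$, where $P = [p_1, \ldots, p_N]$. • A sparse solution is sought by solving the following NP-hard optimization problem: $\alpha^* = \arg\min_{\alpha} \|\alpha\|_0$ subject to $p = P\alpha$. • If the optimal solution $\alpha^*$ is sufficiently sparse, it can be found by solving the $\ell_1$-minimization problem: $\alpha^* = \arg\min_{\alpha} \|\alpha\|_1$ subject to $p = P\alpha$. • An exact representation of $p$ may not exist, however. This difficulty can be overcome by introducing a noise term as follows: $p = P\alpha + z$, where $z$ is an additive noise term whose length is assumed to be bounded by $\varepsilon$, i.e., $\|z\|_2 \le \varepsilon$. • This leads to the following $\ell_1$-minimization problem: $\alpha^* = \arg\min_{\alpha} \|\alpha\|_1$ subject to $\|P\alpha - p\|_2 \le \varepsilon$.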
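
To make the sparse approximation step concrete, here is a small sketch that solves the closely related unconstrained (Lagrangian) form, minimizing 0.5·||Pα − p||² + λ·||α||₁, with the iterative soft-thresholding algorithm (ISTA). This is a swapped-in solver chosen for brevity, not necessarily the one used in the paper; the weight `lam`, the iteration count, and the function names are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code_ista(P, p, lam=0.01, n_iter=500):
    """Approximately minimize 0.5 * ||P a - p||_2^2 + lam * ||a||_1 with ISTA.

    P: (d, N) matrix whose columns are training samples.
    p: (d,) query sample.
    """
    step = 1.0 / (np.linalg.norm(P, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    alpha = np.zeros(P.shape[1])
    for _ in range(n_iter):
        grad = P.T @ (P @ alpha - p)           # gradient of the quadratic term
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha

# Hypothetical example: the query is a scaled copy of training column 3.
rng = np.random.default_rng(0)
P = rng.normal(size=(30, 8))
P /= np.linalg.norm(P, axis=0)                 # unit-norm columns
p = 2.0 * P[:, 3]
alpha = sparse_code_ista(P, p)
```

A larger `lam` gives sparser coefficients at the cost of a larger reconstruction error, which plays a role similar to loosening the bound ε in the constrained formulation.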

Framework • Use a reconstruction residual error (RRE) measure to decide the query class. • Let $\alpha_i^*$ denote the coefficients associated with class i (having label $l_i$), corresponding to the columns of the training matrix $P_i$. • The RRE measure of class i is defined as: $R_i(p) = \|p - P_i \alpha_i^*\|_2$. • To annotate the sample, we assign the class label that leads to the minimum RRE.
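
The RRE decision rule can be sketched as below, assuming the residual for each class is the Euclidean norm of the query minus its reconstruction from that class's columns and coefficients only. The function name `classify_by_rre` and the toy data are hypothetical.

```python
import numpy as np

def classify_by_rre(P, labels, alpha, p):
    """Pick the class whose columns and coefficients best reconstruct the query.

    P: (d, N) training matrix, labels: length-N class labels (one per column),
    alpha: (N,) sparse coefficients, p: (d,) query sample.
    Returns the winning label and a dict of per-class residuals.
    """
    labels = np.asarray(labels)
    residuals = {}
    for lab in np.unique(labels):
        mask = labels == lab
        # Reconstruct the query using only the columns belonging to this class.
        residuals[lab] = float(np.linalg.norm(p - P[:, mask] @ alpha[mask]))
    best = min(residuals, key=residuals.get)
    return best, residuals

# Toy example: four training columns, two per class (hypothetical data).
P = np.eye(4)
labels = [0, 0, 1, 1]
alpha = np.array([0.9, 0.1, 0.05, 0.0])
p = P @ alpha
best_label, residuals = classify_by_rre(P, labels, alpha, p)
```

Here most of the coefficient mass falls on class 0, so class 0 has the smaller residual and its label is assigned to the query.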

Action Feature • Silhouette Tunnel Shape Features • Our goal is to reliably discriminate between shapes, not to accurately reconstruct them. Hence, a coarse, low-dimensional representation of shape suffices. • We capture the shape of the 3D silhouette tunnel by the empirical covariance matrix of a bag of thirteen-dimensional local shape features.

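Action Feature • With each point s = (x, y, t) of the silhouette tunnel, we associate the following 13-dimensional feature vector f(s) that captures certain shape characteristics of the tunnel: $f(s) = [x, y, t, d_E, d_W, d_N, d_S, d_{NE}, d_{SW}, d_{SE}, d_{NW}, d_{T+}, d_{T-}]^T$, where $d_E, \ldots, d_{NW}$ are the distances from s to the nearest silhouette boundary point in the eight compass directions within the frame, and $d_{T+}$ and $d_{T-}$ are the corresponding distances in the forward and backward temporal directions.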

Action Feature • After obtaining the 13-dimensional silhouette shape feature vectors, we can compute their 13 × 13 covariance matrix, denoted by C, using (1) (with N = |S|): $C = \frac{1}{|S|} \sum_{s \in S} (f(s) - \mu_F)(f(s) - \mu_F)^T$ • where $\mu_F = \frac{1}{|S|} \sum_{s \in S} f(s)$ is the mean feature vector. • Thus, C is an empirical covariance matrix of the collection of vectors F.

Action Feature • Optical Flow Features • Here we use a variant of the Horn and Schunck method, which optimizes a functional based on residuals from the intensity constraints and a smoothness regularization term. • Let I(x, y, t) denote the luminance of the raw video sequence at pixel position (x, y, t) and let u(x, y, t) represent the corresponding optical flow vector. • Based on I(x, y, t) and u(x, y, t), we use the following feature vector f(x, y, t):

Experiments

Conclusion • The action recognition framework that we have developed in this paper is conceptually simple, easy to implement, and has good run-time performance. • The TRECVID [63] and VIRAT [64] video datasets exemplify these types of real-world challenges, and much work remains to be done to address them.

Conclusion • Our method’s relative simplicity, as compared to some of the top methods in the literature, enables almost tuning-free rapid deployment and real-time operation. • This opens new application areas outside the traditional surveillance/security arena, for example in sports video annotation and customizable human-computer interaction.

The End
