
Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data


Presentation Transcript


  1. Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data Xu Chen, University of Illinois at Chicago, Electrical and Computer Engineering, March 1, 2010

  2. Outline • Background and Motivation • Related Work • Problem Statement • Expected Contributions: Null Space Invariants, Tensor Null Space, Localized Null Space, Non-linear Kernel Space Invariants, Bilinear Invariants

  3. Background • Within the last several years, object motion trajectory-based recognition has gained significant interest in diverse application areas including: sign language gesture recognition, Global Positioning System (GPS), Car Navigation System (CNS), animal mobility experiments, sports video trajectory analysis, and automatic video surveillance.

  4. Motivation • Accurate activity classification and recognition across multiple views is an extremely challenging task. • Object trajectories captured from different viewpoints lead to completely different representations, which can be approximately modeled by affine transformations. • To obtain a view-independent representation, the trajectory data is represented in an affine-invariant feature space.

  5. Related Work • [Stiller, IJCV, 1994] mathematical formulation of NSI • [Bashir et al., ACM Multimedia, 2006] curvature scale space (CSS) and centroid distance function (CDF) representations; only works with small camera motions • [Chellappa et al., TIP, 2006] PCNSA for activity recognition • [Huang et al., TIP, 2008] correlation tensor analysis • [Chang et al., PAMI, 2008] kernel methods with multilevel temporal alignment; not view invariant

  6. Problem Statement and Approach • Development of efficient view-invariant representation, indexing/retrieval, and classification techniques for motion-based events • The null space in a particular basis is invariant in the presence of arbitrary affine transformations. • Demonstration of enormous potential in computer vision, especially in motion-event and activity recognition and retrieval.

  7. Null Space Invariants • Let (x_i, y_i), i = 0, 1, ..., n-1, be a single 2-D point. The motion trajectory can be represented by the n 2-D points, in homogeneous coordinates, as the 3-by-n matrix M whose i-th column is (x_i, y_i, 1)^T. • Null space H: M q = 0, where q is an n-by-1 vector and H is the n-by-(n-3) matrix spanned by the linearly independent basis vectors of the null space.

  8. Null Space Invariants (NSI) • Typically, each element of H is given by a closed-form expression in the raw trajectory coordinates.
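The matrix and null-space formulas are not reproduced in the transcript; the following is a minimal NumPy sketch of the construction described above, assuming homogeneous coordinates (x, y, 1) for each point and using an SVD to obtain one orthonormal null-space basis (the particular basis of H used on the slides may differ).

```python
import numpy as np

def null_space_invariant(xs, ys):
    """Null Space Invariant (NSI) of a single 2-D trajectory.

    xs, ys: length-n coordinate arrays. Builds the 3 x n raw-data matrix
    M = [x; y; 1] and returns an n x (n-3) matrix H whose columns form an
    orthonormal basis of the null space of M. An affine change of view
    multiplies M on the left by an invertible 3 x 3 matrix, so the null
    space itself (and hence H, up to a change of basis) is unchanged.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    n = xs.size
    M = np.vstack([xs, ys, np.ones(n)])            # 3 x n raw-data matrix
    # Right-singular vectors with (numerically) zero singular values span null(M).
    _, s, vt = np.linalg.svd(M, full_matrices=True)
    rank = int(np.sum(s > 1e-10 * s[0]))
    return vt[rank:].T                             # n x (n - rank), typically n x (n-3)
```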

  9. Null Space based Classification/Retrieval Algorithm • Normalize the length of the trajectories: take the 2-D FFT, select the N largest coefficients, and take the 2-D IFFT. • Compute the NSI of the normalized raw data and vectorize it: once we obtain the n-by-(n-3) NSI H, we convert H into an n(n-3)-by-1 vector. • Apply Principal Component Null Space Analysis (PCNSA) to the vectorized NSI; various other classification and retrieval algorithms could also be applied to the NSI.
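A minimal sketch of the first two steps, assuming FFT-based resampling to a fixed length (scipy.signal.resample keeps the low-frequency coefficients, which is one plausible reading of the FFT/IFFT step above) and reusing the null_space_invariant helper sketched earlier; n_out = 25 mirrors the normalization example on the next slide.

```python
import numpy as np
from scipy.signal import resample   # FFT-based resampling

def nsi_feature(xs, ys, n_out=25):
    """Length-normalize a trajectory and return its vectorized NSI feature."""
    xs_n = resample(np.asarray(xs, dtype=float), n_out)
    ys_n = resample(np.asarray(ys, dtype=float), n_out)
    H = null_space_invariant(xs_n, ys_n)        # n_out x (n_out - 3)
    return H.reshape(-1)                        # n_out * (n_out - 3) feature vector
```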

  10. Normalization Example to 25 samples

  11. Details of PCNSA • 1. Obtain the PCA space: evaluate the total covariance matrix, then apply PCA to it to find W(PCA), whose columns are the L leading eigenvectors. • 2. Project the data vectors, class means, and class covariance matrices into the corresponding data vectors, class means, and class covariance matrices in the PCA space. • 3. Obtain the ANS: find the approximate null space for each class i by choosing the eigenvectors corresponding to the M(i) smallest eigenvalues.

  12. Details of PCNSA 4. Obtain valid classification directions in the ANS: any direction e(i) that satisfies the validity condition is called a valid direction and is used to build the valid ANS W(NSA, i). 5. Classification: PCNSA finds the distance from a query trajectory X to each class i: d(X, i) = ||W(NSA, i)(X - m(i))||, where m(i) is the mean of class i. We assign X to the class with the smallest distance. 6. Retrieval: we compute the distance of the query trajectory Y to any other trajectory X(i) by d(X, i) = ||W(NSA, i)(X(i) - Y)||.
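A condensed sketch of PCNSA as outlined in steps 1-5, with the validity test on the ANS directions (whose condition is not reproduced in the transcript) omitted, so all selected ANS directions are kept; function and parameter names are illustrative.

```python
import numpy as np

def pcnsa_fit(X, y, n_pca=20, n_ans=5):
    """Fit PCNSA on vectorized NSI features X (num_samples x dim) with labels y."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # 1. PCA space: leading right-singular vectors of the centered data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = vt[:n_pca].T                              # dim x n_pca
    Z = Xc @ W_pca                                    # 2. project into PCA space
    model = {"mean": mean, "W_pca": W_pca, "class_means": {}, "W_nsa": {}}
    for c in np.unique(y):
        Zc = Z[y == c]
        cov = np.cov(Zc, rowvar=False)
        # 3. ANS: eigenvectors with the smallest eigenvalues of the class covariance.
        evals, evecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
        model["class_means"][c] = Zc.mean(axis=0)
        model["W_nsa"][c] = evecs[:, :n_ans]          # n_pca x n_ans
    return model

def pcnsa_classify(x, model):
    """5. Assign x to the class minimizing ||W(NSA, i)^T (x - m(i))||."""
    z = (x - model["mean"]) @ model["W_pca"]
    dists = {c: np.linalg.norm(model["W_nsa"][c].T @ (z - m))
             for c, m in model["class_means"].items()}
    return min(dists, key=dists.get)
```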

  13. Classification Performance • We plot the classification accuracy versus the number of classes, with 20 trajectories in each class (up to 40 classes).

  14. Classification Performance We plot the classification accuracy versus the number of trajectories per class (up to 40 trajectories in each class).

  15. Retrieval Performance Precision-recall curves comparing PCNSA applied to the NSI with PCA applied directly to the NSI. • To further demonstrate the view-invariant nature of our system, we populate the CAVIAR dataset with five rotated versions of each trajectory in each class, rotating the trajectories by -60, -30, 0, 30, and 60 degrees.

  16. Visual illustration of retrieval results with 20 classes of motion trajectories from the CAVIAR dataset, for the motion events "chase" and "shopping and leave", captured by fixed cameras from unknown views (query and top-2 retrievals).

  17. Applications of NSI in image retrieval: facial recognition. SIFT features are extracted as the feature points, so the raw data matrix is not necessarily of size 3 by n.

  18. Image retrieval results

  19. Perturbation Analysis Let Z be the noise matrix on the raw data; we derive the ratio of the output error (the error on the null space) to the input error (the error on the raw data).

  20. SNR The ratio of the energy of the NSI signal to the energy of the noise on the NSI.
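The closed-form error-ratio and SNR expressions are not reproduced in the transcript. As a purely illustrative stand-in, the following Monte-Carlo sketch measures how a small noise matrix Z on the raw data moves the null space, using the largest principal angle between the clean and perturbed null spaces as the output error; the noise model and error measures are assumptions, not the slides' derivation. It reuses the null_space_invariant helper from the earlier sketch.

```python
import numpy as np
from scipy.linalg import subspace_angles

def empirical_error_ratio(xs, ys, sigma=0.01, trials=100, seed=0):
    """Average ratio of output error (null-space perturbation) to input error."""
    rng = np.random.default_rng(seed)
    n = len(xs)
    M = np.vstack([np.asarray(xs, float), np.asarray(ys, float), np.ones(n)])
    H = null_space_invariant(xs, ys)                 # clean null-space basis
    ratios = []
    for _ in range(trials):
        Z = rng.normal(scale=sigma, size=M.shape)
        Z[2] = 0.0                                   # noise on coordinates only
        _, s, vt = np.linalg.svd(M + Z)
        Hn = vt[3:].T                                # perturbed null-space basis
        out_err = subspace_angles(H, Hn).max()       # largest principal angle
        in_err = np.linalg.norm(Z) / np.linalg.norm(M)
        ratios.append(out_err / in_err)
    return float(np.mean(ratios))
```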

  21. Optimal Sampling • Given the perturbation analysis, we design an optimal sampling strategy. • Uniform sampling and Poisson sampling are utilized. • For arbitrary trajectories in the x and y directions, x = f(t), y = g(t), we expand the trajectory in a Maclaurin series.

  22. Optimal Sampling Property 2: The rate parameter λ = O(N) should be chosen for Poisson sampling to guarantee the convergence of the error ratio, where N is the total number of samples.

  23. Optimal Sampling

  24. Optimal Sampling In our framework, the density corresponds to the average number of samples per unit length.
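A small sketch of Poisson sampling of a continuous trajectory x = f(t), y = g(t), assuming a homogeneous Poisson process in the curve parameter t, so that the rate lam plays the role of the density (average number of samples per unit length) described above.

```python
import numpy as np

def poisson_sample(f, g, t_max, lam, seed=0):
    """Sample (f(t), g(t)) at the points of a rate-`lam` Poisson process on [0, t_max]."""
    rng = np.random.default_rng(seed)
    # Exponential inter-arrival times with mean 1/lam generate the Poisson process;
    # draw generously more than the expected lam * t_max arrivals and truncate.
    gaps = rng.exponential(1.0 / lam, size=int(3 * lam * t_max) + 10)
    times = np.cumsum(gaps)
    times = times[times < t_max]
    return np.array([f(t) for t in times]), np.array([g(t) for t in times])
```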

  25. Arbitrary Moving Cameras and Segmented NSI • Fixed cameras from unknown views: all the feature points undergo the same global affine transformation. • For arbitrarily moving cameras the classification and retrieval problem is further compounded, since the feature points can undergo different affine transformations. • Computing the null space of segmented trajectories yields higher accuracy: the orientations and translations of adjacent points are very close, so locally they have more similar null space representations. • Overlapping segmentation and non-overlapping segmentation (assumption); see the sketch below.
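The sketch referenced above: overlapping segmentation of a trajectory, reusing the null_space_invariant helper from the earlier sketch. The window length of 16 and overlap of 5 mirror the example on the next slide but are otherwise arbitrary choices.

```python
import numpy as np

def segmented_nsi(xs, ys, seg_len=16, overlap=5):
    """Per-segment NSI features for overlapping windows of a trajectory.

    Adjacent points share nearly the same local affine transformation under a
    moving camera, so per-segment null spaces stay comparable across views.
    """
    step = seg_len - overlap
    feats = []
    for start in range(0, len(xs) - seg_len + 1, step):
        H = null_space_invariant(xs[start:start + seg_len],
                                 ys[start:start + seg_len])
        feats.append(H.reshape(-1))                 # vectorize each segment's NSI
    return np.concatenate(feats) if feats else np.array([])
```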

  26. Retrieval example for the event "entering the shop": query with rank-1 and rank-2 results, comparing the global NSI, the first 16 NSI, and segments overlapping by 5.

  27. Optimal sampling: the same trajectory has different representations due to camera motion, shown without and with Poisson sampling.

  28. The trajectory "all" and its affine versions, with and without Poisson sampling (λ = 0.8). The null space representations with Poisson sampling (on the right) are more similar than the ones without sampling (on the left); Poisson sampling greatly attenuates the noise effects.

  29. Classification Accuracy

  30. Retrieval Time (seconds)

  31. Comparison on the ASL dataset: 20 classes with 40 trajectories in each class.

  32. Tensor Null Space (TNSI) • Fundamental mathematical framework for tensor NSI • View-invariant classification and retrieval of multiple motion trajectories.

  33. Definition of Tensor Null Space Applying an affine transformation T(m) to the mode-m unfolding of the multi-dimensional data M, if the resulting tensor null space Q is invariant in the m-th dimension, then it is referred to as mode-m invariant. Conditions for rotational invariance are expressed on the unfoldings M(1), M(2), M(3) of the three-dimensional tensor along its different dimensions: M(1) is I1 by I2I3, M(2) is I2 by I1I3, and M(3) is I3 by I1I2.

  34. Definition of Tensor Null Space Due to the rotational invariance, the conditions for translation invariance of the tensor null space can be stated analogously.

  35. Motion Event Tensor K: number of video samples; J: twice the number of trajectories; P: the length of the normalized trajectories. We align each trajectory as two rows of a matrix, one for the x coordinates and one for the y coordinates, so the number of rows of the matrix is twice the number of objects in the motion event under analysis.
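A minimal sketch of computing the null space of a chosen unfolding of the motion-event tensor, which is one plausible reading of the tensor NSI described above; the axis ordering (K, J, P) and the SVD-based basis are assumptions.

```python
import numpy as np

def unfold(T, mode):
    """Mode-m unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tensor_null_space(T, mode):
    """Null space of the mode-`mode` unfolding of a motion-event tensor.

    T is assumed to have shape (K videos, J = 2 * num_trajectories, P samples),
    with each trajectory stored as an x-row followed by a y-row along J.
    """
    M = unfold(T, mode)
    _, s, vt = np.linalg.svd(M, full_matrices=True)
    rank = int(np.sum(s > 1e-10 * s[0]))
    return vt[rank:].T                  # basis of the right null space of the unfolding

# e.g. the "unfolding in K" used on the next slide: Q = tensor_null_space(T, mode=0)
```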

  36. Simulation results for TNSI The accuracy of the proposed classification system versus the number of classes, with 20 tensors in each class. Simulation results show that our system preserves its performance even for a larger number of classes (J = 3 trajectories in each clip, P = 18 samples per trajectory, K = 20 video clips; unfolding along K).

  37. Accuracy values versus the number of tensors within a class, with 20 classes in the system.

  38. Localized Null Space Consider view-invariant video classification and retrieval with partial queries over a dynamic video database. Efficient updating and downdating procedures are needed for the representation of dynamic video databases; the Localized Null Space is one way to solve this problem.

  39. Localized Null Space (LNS) The Localized Null Space relies on different key points in different segments.

  40. Localized Null Space

  41. Structure of Localized Null Space Illustration of the structure of the traditional Null Space and the proposed Localized Null Space: in the proposed structure, W1 is non-zero only on the rows of the first segment (of length K) and contributes K-3 columns, W2 is non-zero only on the rows of the second segment (of length N-K) and contributes N-K-3 columns, and all other blocks are zero; the traditional Null Space of the full length-N trajectory is a dense N-by-(N-3) matrix.
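A sketch of the block structure illustrated above, reusing the null_space_invariant helper: each segment's null-space basis is embedded in zero-padded length-N columns, so the columns belonging to one segment vanish on every other segment's rows. The key points (segment boundaries) are supplied by the caller and are illustrative.

```python
import numpy as np

def localized_null_space(xs, ys, key_points):
    """Localized Null Space: per-segment null spaces, zero-padded to length N.

    key_points: indices splitting the length-N trajectory into segments; a
    segment of length K contributes K-3 columns that are non-zero only on
    that segment's rows, matching the block structure on the slide.
    """
    n = len(xs)
    bounds = [0] + list(key_points) + [n]
    blocks = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        H = null_space_invariant(xs[a:b], ys[a:b])   # (b-a) x (b-a-3)
        W = np.zeros((n, H.shape[1]))
        W[a:b] = H                                   # non-zero only on this segment
        blocks.append(W)
    return np.hstack(blocks)                         # N x sum_i (K_i - 3)
```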

  42. Splitting of Raw Data Space

  43. Splitting of Raw Data Space • Deterministic splitting: the length of the feature vector and the key points are known to the users; the LNS provides a perfect solution. • Random splitting: the length of the feature vector and the key points are not available to the users; the splitting and the key points must be estimated.

  44. Optimal Splitting We minimize the distortion D for random splitting, where P(L) is the distribution of the segments with length L and K is the optimal segmentation length; solving the minimization problem yields the optimal K.

  45. Optimal Key Point Selection within Each Segment C is the probability that all the key points lie in the given range.

  46. Benefits of LNS • The localized null space can be viewed as consisting of multiple subspaces and can therefore be dynamically split for the retrieval of partial queries. • The Localized Null Space can be used to merge multiple Null Spaces into an integrated Null Space. • The Localized Null Space has the same complexity as the traditional null space.

  47. LNS Example Visual illustration of facial image B and part of the rotated image A, which have identical localized null space representations.

  48. Non-linear Kernel Space Invariants (NKSI) • Invariance to non-linear transformations • Relies on Taylor expansions to approximate the non-linear transformations with linear transformations • Application: the standard perspective transformation

  49. Non-linear Kernel Space Invariants (NKSI) The case k = 2.
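The k = 2 construction is not reproduced in the transcript; the following sketch shows one plausible reading under stated assumptions: lift each point to its monomials of degree at most 2, so that a smooth non-linear map (e.g. a perspective transformation approximated by its Taylor expansion) acts roughly linearly on the lifted data, and take the null space of the lifted raw-data matrix. This should not be read as the author's exact construction.

```python
import numpy as np

def nksi(xs, ys):
    """Non-linear kernel-space invariant sketch for k = 2 (degree-2 monomial lift)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # 6 x n lifted raw-data matrix: monomials of degree <= 2 in (x, y).
    lifted = np.vstack([np.ones_like(xs), xs, ys, xs**2, xs*ys, ys**2])
    _, s, vt = np.linalg.svd(lifted, full_matrices=True)
    rank = int(np.sum(s > 1e-10 * s[0]))
    return vt[rank:].T                  # n x (n - rank), typically n x (n - 6)
```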
