Bilinear Classifiers for Visual Recognition: A Comprehensive Overview
Learn about bilinear models in visual recognition, their motivation, advantages, and applications, as presented at NIPS 2009 by the Computational Vision Lab, University of California, Irvine.
Bilinear Classifiers for Visual Recognition • Hamed Pirsiavash, Deva Ramanan, Charless Fowlkes • Computational Vision Lab, University of California, Irvine • To be presented at NIPS 2009
Introduction • Linear model for visual recognition • A window on the input image yields an n_y × n_x × n_f array of features (n_y × n_x spatial cells, n_f features per cell) • Reshape it ("(:)") into a feature vector x ∈ R^{n_y n_x n_f × 1}
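The reshape step on this slide can be sketched in NumPy (an illustrative sketch, not the authors' Matlab code; the sizes are made up):

```python
import numpy as np

# A window's features form an n_y x n_x x n_f array (e.g., gradient
# histograms per spatial cell). The linear model flattens it into a
# single column vector -- the "(:)" reshape on the slide.
n_y, n_x, n_f = 4, 3, 2
feature_map = np.arange(n_y * n_x * n_f, dtype=float).reshape(n_y, n_x, n_f)

x = feature_map.reshape(-1, 1)  # column vector of length n_y * n_x * n_f
print(x.shape)                  # (24, 1)
```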
Introduction • Linear classifier • Learn a template (model) w • Apply it to all possible windows • Classify a window with feature vector x as positive when w^T x > 0
Introduction • Write the features of a window as a matrix X ∈ R^{n_s × n_f} (reshape of the n_y × n_x × n_f array, with n_s = n_y n_x) • Factor the template as W = W_s W_f^T with W_s ∈ R^{n_s × d} and W_f ∈ R^{n_f × d}, which restricts rank(W) ≤ d ≤ min(n_s, n_f)
Introduction • Motivation for bilinear models • Reduced rank: fewer parameters • Better generalization: reduced over-fitting • Run-time efficiency • Transfer learning: share a subset of parameters between different but related tasks
Outline • Introduction • Sliding window classifiers • Bilinear model and its motivation • Extension • Related work • Experiments • Pedestrian detection • Human action classification • Conclusion
Sliding window classifiers • Extract visual features from a spatio-temporal window, e.g., histograms of oriented gradients (HOG) as in Dalal and Triggs' method • Train a linear SVM on annotated positive and negative instances: min_w (1/2) w^T w + C Σ_n max(0, 1 − y_n w^T x_n) • Detection: evaluate the model on all possible windows in the space-scale domain, classifying positive when w^T x > 0 • Since the model is linear, use convolution
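The sliding-window scoring step above can be sketched as follows (a naive NumPy sketch with made-up sizes; a real detector would evaluate a feature pyramid and use convolution/FFT for speed):

```python
import numpy as np

def detect_scores(feat, w):
    """Score every window placement of template w over feature map feat.
    feat is H x V x n_f (e.g., HOG cells); w is h x v x n_f.
    Each score is the inner product <w, window>, which over all
    placements is exactly a multi-channel correlation."""
    H, V, n_f = feat.shape
    h, v, _ = w.shape
    scores = np.zeros((H - h + 1, V - v + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            scores[i, j] = np.sum(w * feat[i:i + h, j:j + v, :])
    return scores

rng = np.random.default_rng(0)
feat = rng.standard_normal((10, 8, 4))
w = rng.standard_normal((3, 3, 4))
s = detect_scores(feat, w)
print(s.shape)  # (8, 6)
```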
Sliding window classifiers • [Figure: a sample image, its extracted features, and a sample template W]
Bilinear model (Definition) • Visual data are better modeled as matrices/tensors than as vectors • Why not exploit the matrix structure directly?
Bilinear model (Definition) • Reshape the features of a window into a matrix X ∈ R^{n_s × n_f} • The linear score can then be written in matrix form: w^T x = Tr(W^T X), where the template W is reshaped the same way
Bilinear model (Definition) • Bilinear model: factor the template as W = W_s W_f^T, so that w^T x = Tr(W^T X) = Tr(W_f W_s^T X) • This restricts rank(W) ≤ d, where W_s ∈ R^{n_s × d}, W_f ∈ R^{n_f × d}, and d ≤ min(n_s, n_f) • The score is linear in W_f for fixed W_s, and linear in W_s for fixed W_f
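The equivalence of the vector and matrix forms of the score, and the rank restriction, can be checked numerically (illustrative NumPy sketch; sizes are made up, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_f, d = 6, 4, 2                 # spatial dim, feature dim, rank

W_s = rng.standard_normal((n_s, d))
W_f = rng.standard_normal((n_f, d))
W = W_s @ W_f.T                        # rank-d template, n_s x n_f
X = rng.standard_normal((n_s, n_f))    # one window in matrix form

lin = W.reshape(-1) @ X.reshape(-1)    # w^T x (flattened form)
mat = np.trace(W.T @ X)                # Tr(W^T X)
bil = np.trace(W_f @ W_s.T @ X)        # Tr(W_f W_s^T X)

print(np.allclose(lin, mat), np.allclose(mat, bil))  # True True
```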
Bilinear model (Learning) • Linear SVM for a given set of training pairs {X_n, y_n}: min_W (1/2) Tr(W W^T) + C Σ_n max(0, 1 − y_n Tr(W^T X_n)) • Objective function = regularizer + slack penalty; the parameter C trades them off • Substituting W = W_s W_f^T gives the bilinear formulation
Bilinear model (Learning) • Bilinear formulation: min over W_s, W_f of (1/2) Tr(W_s W_f^T W_f W_s^T) + C Σ_n max(0, 1 − y_n Tr(W_f W_s^T X_n)) • Biconvex, so solve by coordinate descent • Fixing one set of parameters yields a standard SVM problem (with a change of basis) • Use an off-the-shelf SVM solver in the loop
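The coordinate-descent loop can be sketched as below. This is a toy NumPy sketch, not the authors' Matlab code: a small subgradient hinge-loss solver stands in for the off-the-shelf SVM, and the A^{1/2} change of basis from the learning-details slides is omitted, so each sub-step regularizes the free factor directly.

```python
import numpy as np

def toy_svm(Phi, y, C=1.0, steps=200, lr=0.01):
    """Toy hinge-loss solver (subgradient descent); a stand-in for an
    off-the-shelf linear SVM inside the coordinate-descent loop."""
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        viol = y * (Phi @ w) < 1                        # margin violations
        grad = w - C * (y[viol, None] * Phi[viol]).sum(axis=0)
        w -= lr * grad
    return w

def bilinear_train(Xs, y, d, iters=4):
    """Alternate: fix W_s and solve an SVM for W_f, then fix W_f and
    solve for W_s. Uses Tr(W_f W_s^T X) = <W_f, X^T W_s> = <W_s, X W_f>."""
    N, n_s, n_f = Xs.shape
    rng = np.random.default_rng(0)
    W_s = rng.standard_normal((n_s, d))                 # init (PCA in practice)
    for _ in range(iters):
        Phi_f = np.stack([(X.T @ W_s).reshape(-1) for X in Xs])
        W_f = toy_svm(Phi_f, y).reshape(n_f, d)
        Phi_s = np.stack([(X @ W_f).reshape(-1) for X in Xs])
        W_s = toy_svm(Phi_s, y).reshape(n_s, d)
    return W_s @ W_f.T

rng = np.random.default_rng(1)
Xs = rng.standard_normal((20, 6, 5))                    # 20 toy windows
y = rng.choice([-1.0, 1.0], size=20)
W = bilinear_train(Xs, y, d=2)
print(W.shape)  # (6, 5), with rank(W) <= 2 by construction
```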
Motivation • Regularization: the template lives in a reduced-dimensional subspace (W = W_s W_f^T with d ≤ min(n_s, n_f)); similar to PCA, but not orthogonal, and learned discriminatively and jointly with the template • Run-time efficiency: d convolutions instead of n_f
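The run-time saving comes from factoring the score: first project the n_f feature channels down to d with W_f, then apply d cheap spatial filters. This identity can be verified numerically (illustrative sketch; sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, n_f, d = 6, 8, 2
W_s = rng.standard_normal((n_s, d))
W_f = rng.standard_normal((n_f, d))
X = rng.standard_normal((n_s, n_f))   # one window: spatial cells x channels

full = np.trace((W_s @ W_f.T).T @ X)  # score with the full n_s x n_f template
projected = X @ W_f                   # reduce n_f channels to d
factored = np.sum(W_s * projected)    # d spatial filter responses, summed
print(np.isclose(full, factored))     # True
```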
Motivation • Transfer learning • Share the feature subspace W_f between different but related problems (W = W_s W_f^T with a task-specific W_s), e.g., a human detector and a cat detector • Optimize the sum of all the objective functions • Learn a good subspace using all the data
Extension • Multi-linear classifiers • For high-order tensors, optimize L(W_x, W_y, W_f) instead of just L(W) • For 1-D features, a rank-1 restriction gives a separable spatial filter • Spatio-temporal templates
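The rank-1 (separable) case mentioned above is easy to demonstrate: filtering with a rank-1 2-D filter u v^T is the same as two 1-D passes. A naive valid-mode correlation sketch (illustrative sizes):

```python
import numpy as np

def corr2_valid(img, k):
    """Naive valid-mode 2-D correlation."""
    H, V = img.shape
    h, v = k.shape
    out = np.empty((H - h + 1, V - v + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(k * img[i:i + h, j:j + v])
    return out

rng = np.random.default_rng(3)
img = rng.standard_normal((8, 7))
u = rng.standard_normal((3, 1))       # column factor
v = rng.standard_normal((1, 4))       # row factor
k = u @ v                             # rank-1 (separable) 2-D filter

direct = corr2_valid(img, k)                     # one 2-D pass
two_pass = corr2_valid(corr2_valid(img, v), u)   # rows with v, then columns with u
print(np.allclose(direct, two_pass))             # True
```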
Related work (Rank restriction) • Bilinear models • Often used to increase model flexibility; we instead use them to reduce the number of parameters • Mostly used in generative models such as density estimation; we use them for classification • Soft rank restriction • Prior work regularizes rank by using Tr((W^T W)^{1/2}) (the trace norm) rather than Tr(W^T W) in the SVM objective • Convex, but not easy to solve • Penalizes the sum of the singular values instead of the number of non-zero singular values (the rank) • Wolf et al. (CVPR'07) • Used a formulation similar to ours with a hard rank restriction • Showed results only for the soft rank restriction • Used it for a single task (did not consider multi-task learning)
Related work (Transfer learning) • Dates back to at least Caruana's work (1997) • We were inspired by their work on multi-task learning • Applied to back-propagation nets and k-nearest neighbors • Ando and Zhang (2005) • Linear model • All models share a component in a low-dimensional subspace (transfer) • Uses the same number of parameters
Experiments: Pedestrian detection • Baseline: Dalal and Triggs' spatio-temporal classifier (ECCV'06) • Linear SVM on features computed over 8 × 8 cells (84 features per cell): • Histogram of oriented gradients (HOG) • Histogram of optical flow • Verified that the spatio-temporal classifier beats the static one by modifying the learning method
Experiments: Pedestrian detection • Datasets: INRIA motion and INRIA static • 3400 video frame pairs • 3500 static images • Typical values: n_s = 14 × 6, n_f = 84, d = 5 or 10 • Evaluation: average precision • Initialize with PCA in feature space • Ours is 10 times faster
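With these typical values, the parameter savings of the bilinear factorization are simple arithmetic (shown here for the d = 5 case):

```python
n_s = 14 * 6   # spatial cells
n_f = 84       # features per cell
d = 5          # rank

linear_params = n_s * n_f            # full template W
bilinear_params = d * (n_s + n_f)    # factors W_s and W_f
print(linear_params, bilinear_params)  # 7056 840
```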
Experiments: Human action classification 1 • 1-vs-all action templates • Voting: a second SVM on the confidence values • Dataset: UCF Sports Action (CVPR 2008) • They obtained 69.2%; we get 64.8%, but with • More classes: 12 rather than 9 • A smaller dataset: 150 videos rather than 200 • A harder evaluation protocol: 2-fold cross-validation vs. LOOCV • 87 training examples rather than 199 in their case
UCF action results • Linear: 0.518 (not always feasible) • PCA on features: 0.444 • Bilinear: 0.648
Experiments: Human action classification 2 • Transfer • We used only two examples for each of the 12 action classes • First trained each class independently, then trained jointly with a shared subspace • Adjusted the C parameter for the best result
Transfer • W^i = W_s^i W_f^T with the feature subspace W_f shared across tasks • Use PCA to initialize W_f, then alternate: train each task's W_s^i independently, then update the shared subspace by solving min over W_f of Σ_i L(W_s^i, W_f)
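One way to see the shared-subspace update: with each task's own factor frozen, minimizing the summed objectives over the shared factor is a single SVM over the pooled, per-task-transformed examples (up to how the shared regularizer is counted). A toy numeric check (all sizes and data made up; this is an illustration, not the paper's code):

```python
import numpy as np

def hinge_obj(w, Phi, y, C=1.0):
    """SVM objective: (1/2)||w||^2 + C * sum of hinge losses."""
    return 0.5 * w @ w + C * np.maximum(0, 1 - y * (Phi @ w)).sum()

rng = np.random.default_rng(4)
n_s, n_f, d = 5, 4, 2
tasks = []
for _ in range(2):                            # two related tasks
    W_s = rng.standard_normal((n_s, d))       # task-specific factor (frozen)
    Xs = rng.standard_normal((6, n_s, n_f))
    y = rng.choice([-1.0, 1.0], size=6)
    Phi = np.stack([(X.T @ W_s).reshape(-1) for X in Xs])  # features for W_f
    tasks.append((Phi, y))

w_f = rng.standard_normal(n_f * d)            # shared subspace, vectorized
summed = sum(hinge_obj(w_f, Phi, y) for Phi, y in tasks)
pooled = hinge_obj(w_f, np.vstack([t[0] for t in tasks]),
                   np.concatenate([t[1] for t in tasks]))
# Summing the two task objectives counts the shared regularizer twice:
print(np.isclose(summed, pooled + 0.5 * w_f @ w_f))  # True
```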
Results: Transfer • [Plot: average classification rate]
Results: Transfer (for "walking") • [Figure: template at iteration 1, refined at iteration 2]
Conclusion • Introduced multi-linear classifiers • Exploit the natural matrix/tensor representation of spatio-temporal data • Trained with existing efficient linear solvers • A shared subspace across different problems: a novel form of transfer learning • Better performance and about a 10× run-time speed-up compared to the linear classifier • Easy to apply to most high-dimensional features (in place of dimensionality-reduction methods like PCA) • Simple: ~20 lines of Matlab code
Bilinear model (Learning details) • Linear SVM for a given set of training pairs {X_n, y_n}: min_W (1/2) Tr(W W^T) + C Σ_n max(0, 1 − y_n Tr(W^T X_n)) • Bilinear formulation: min over W_s, W_f of (1/2) Tr(W_s W_f^T W_f W_s^T) + C Σ_n max(0, 1 − y_n Tr(W_f W_s^T X_n)) • It is biconvex, so solve by coordinate descent
Bilinear model (Learning details) • Each coordinate descent iteration: freeze W_s and solve min over W̃_f of (1/2) Tr(W̃_f^T W̃_f) + C Σ_n max(0, 1 − y_n Tr(W̃_f^T X̃_n)), where W̃_f = W_f A^{1/2}, X̃_n = X_n^T W_s A^{−1/2}, and A = W_s^T W_s • Then freeze W_f and solve the symmetric problem for W̃_s
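The change of basis on this slide can be sanity-checked numerically: it preserves both the regularizer and the scores, which is why each sub-problem is a standard SVM. A NumPy sketch (matrix square roots via eigendecomposition of the small d × d matrix A; sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n_s, n_f, d = 5, 4, 2
W_s = rng.standard_normal((n_s, d))
W_f = rng.standard_normal((n_f, d))
X = rng.standard_normal((n_s, n_f))

A = W_s.T @ W_s                       # d x d, symmetric positive definite
vals, vecs = np.linalg.eigh(A)
A_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T        # A^{1/2}
A_minus_half = vecs @ np.diag(1 / np.sqrt(vals)) @ vecs.T  # A^{-1/2}

W_f_t = W_f @ A_half                  # ~W_f
X_t = X.T @ W_s @ A_minus_half        # ~X_n

W = W_s @ W_f.T
reg_ok = np.isclose(np.trace(W @ W.T), np.trace(W_f_t.T @ W_f_t))
score_ok = np.isclose(np.trace(W_f @ W_s.T @ X), np.trace(W_f_t.T @ X_t))
print(reg_ok, score_ok)  # True True
```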