
Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest. Tsz-Ho Yu, Tae-Kyun Kim and Roberto Cipolla. Machine Intelligence Laboratory, Engineering Department, University of Cambridge. Introduction and Motivations: a novel real-time solution for action recognition.




Presentation Transcript


  1. Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest Tsz-Ho Yu, Tae-Kyun Kim and Roberto Cipolla Machine Intelligence Laboratory, Engineering Department, University of Cambridge

  2. Introduction and Motivations • A novel real-time solution for action recognition • Utilises local-appearance and structural information • Main features / major contributions: continuous (frame-by-frame) recognition; real-time feature extraction and classification; pyramidal spatiotemporal relationship match (PSRM) • Main objective: efficiency

  3. A short demo Please visit: “http://www.youtube.com/watch?v=eD5b8d7hV6E” on the Internet for the full demo video.

  4. Related Work • Many current methods focus on accuracy and on the action representation model (feature design) [Schuldt et al. ICPR2004, Niebles et al. BMVC06, Ryoo and Aggarwal ICCV09, Willems et al. BMVC09, Riemenschneider et al. BMVC09] • Some achieve high accuracies, but take a long time to recognise • How can we improve efficiency? • Can we improve codebook learning and feature matching?

  5. Related Work • Vector quantisation by random forest [Moosmann et al. NIPS06] • For image segmentation [Shotton et al. CVPR08] • Can we apply it to video analysis? • Pyramid match kernel [Grauman and Darrell ICCV05] • Image recognition [Grauman and Darrell ICCV05], scene classification [Lazebnik et al. CVPR06], etc. • Spatiotemporal relationship match [Ryoo and Aggarwal ICCV09] • References: S. Lazebnik, C. Schmid, and J. Ponce. “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.” CVPR 2006 • K. Grauman and T. Darrell. “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features.” ICCV 2005 • F. Moosmann, B. Triggs, and F. Jurie. “Fast Discriminative Visual Codebooks using Randomized Clustering Forests.” NIPS 2006 • J. Shotton, M. Johnson, and R. Cipolla. “Semantic Texton Forests for Image Categorization and Segmentation.” CVPR 2008 • M. S. Ryoo and J. K. Aggarwal. “Spatio-temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities.” ICCV 2009

  6. Our Contributions • Our contribution is three-fold: • Spatiotemporal Texton Forest: image segmentation (2D) → action recognition (3D) • SRM → PSRM: pyramidal spatiotemporal relationship match • 1. V-FAST corner detector 2. Random forest classifiers 3. Continuous action recognition

  7. Comparison with existing approaches (typical approaches vs. our method) • Feature encoding: typical approaches use k-means clustering (slow for a large codebook); our method uses a Semantic Texton Forest (efficient) • Feature matching: typical approaches use the “Bag of Words” (BOW) model (lacks structural information, quantisation error); our method uses PSRM (structural information, hierarchical matching, robust)

  8. Overview (pipeline diagram) • V-FAST corners → spatio-temporal cuboids → Spatiotemporal Semantic Texton Forest → BOST (random forest classifier) and PSRM (k-means forest) → results

  9. Feature detection (section divider; pipeline overview diagram as in slide 8)

  10. V-FAST: Spatiotemporal Feature Detection • A novel spatiotemporal interest point detector • Inspired by FAST [Rosten and Drummond ECCV2006] • A cascade of three FAST detectors • Considers three orthogonal Bresenham circles • Features: very fast! • E. Rosten and T. Drummond. “Machine learning for high-speed corner detection.” ECCV 2006
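The slide describes V-FAST only as a cascade of FAST segment tests on three orthogonal Bresenham circles. The sketch below illustrates that idea under stated assumptions, not the authors' detector: the three circles are assumed to lie in the xy, xt and yt planes of a (T, H, W) volume, each is the 16-pixel radius-3 FAST circle, a plane passes if at least n = 9 contiguous pixels are all brighter or all darker than the centre by thr = 20, and the cascade order (spatial plane first, then at least one temporal plane) is hypothetical, as are all function names.

```python
import numpy as np

# 16-pixel Bresenham circle of radius 3 used by FAST, as ordered (du, dv) offsets.
CIRCLE16 = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
            (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def segment_test(center, values, thr, n):
    """FAST segment test: at least n contiguous circle pixels are all brighter than
    center + thr, or all darker than center - thr (the circle wraps around)."""
    for sign in (1, -1):
        flags = [sign * (v - center) > thr for v in values]
        run = 0
        for f in flags + flags:  # walk the circle twice to handle wrap-around
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False


def circle_values(volume, t, y, x, plane):
    """Sample the circle around (t, y, x) in one plane of a (T, H, W) video volume."""
    vals = []
    for du, dv in CIRCLE16:
        if plane == "xy":
            vals.append(int(volume[t, y + dv, x + du]))
        elif plane == "xt":
            vals.append(int(volume[t + dv, y, x + du]))
        else:  # "yt"
            vals.append(int(volume[t + dv, y + du, x]))
    return vals


def is_v_fast(volume, t, y, x, thr=20, n=9):
    """Hypothetical cascade: spatial (xy) test first, then require at least one
    temporal-plane (xt or yt) test to pass, so purely static corners are rejected."""
    c = int(volume[t, y, x])
    if not segment_test(c, circle_values(volume, t, y, x, "xy"), thr, n):
        return False
    return any(segment_test(c, circle_values(volume, t, y, x, p), thr, n)
               for p in ("xt", "yt"))
```

With this ordering, a bright dot that appears in a single frame fires, while a bright dot that stays still in every frame passes the spatial test but fails both temporal tests. The early exit after the cheap spatial test is what makes a cascade fast: most pixels are rejected before the temporal circles are read.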

  11. Feature extraction (section divider; pipeline overview diagram as in slide 8)

  12. Building a codebook using STF • Extract small video cuboids at detected keypoints • Visual codebook using STF:
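The codebook step can be illustrated with a small sketch, assuming details the slide does not give: cuboids of 5×9×9 voxels, complete binary trees whose internal nodes compare the difference of two voxels against a threshold, and the reached leaf index used as the codeword. A real semantic texton forest learns its split tests (slide 28 names information gain as the split criterion); here the tests are drawn at random and left untrained, only to show the encoding path. Concatenating the per-tree leaf histograms gives a bag-of-spatiotemporal-textons (BOST) vector. All names are hypothetical.

```python
import numpy as np


def extract_cuboid(volume, t, y, x, size=(5, 9, 9)):
    """Crop a (T, H, W) cuboid centred on a keypoint; assumes it lies fully inside."""
    dt, dy, dx = (s // 2 for s in size)
    return volume[t - dt:t + dt + 1, y - dy:y + dy + 1, x - dx:x + dx + 1]


class RandomTextonTree:
    """Hypothetical complete binary tree; each internal node compares two voxels."""

    def __init__(self, depth, cuboid_shape, rng):
        self.depth = depth
        n_internal = 2 ** depth - 1
        size = int(np.prod(cuboid_shape))
        # Random, untrained split parameters: two flat voxel indices and a threshold.
        self.a = rng.integers(0, size, n_internal)
        self.b = rng.integers(0, size, n_internal)
        self.th = rng.uniform(-30, 30, n_internal)

    def leaf(self, cuboid):
        """Route a cuboid to a leaf; the leaf index in [0, 2**depth) is its codeword."""
        v = cuboid.astype(float).ravel()
        node = 0
        for _ in range(self.depth):
            go_right = v[self.a[node]] - v[self.b[node]] >= self.th[node]
            node = 2 * node + 1 + int(go_right)
        return node - (2 ** self.depth - 1)


def bost_histogram(cuboids, forest):
    """Concatenate per-tree leaf-count histograms (a bag of spatiotemporal textons)."""
    hists = []
    for tree in forest:
        h = np.zeros(2 ** tree.depth)
        for c in cuboids:
            h[tree.leaf(c)] += 1
        hists.append(h)
    return np.concatenate(hists)
```

For example, three depth-4 trees give a 48-bin vector per video snippet. Encoding a cuboid costs one comparison per tree level, which is why tree-based codebooks are cheaper than nearest-centroid search in a large k-means codebook.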

  13. Feature matching (section divider; pipeline overview diagram as in slide 8)

  14. Pyramidal Spatiotemporal Relationship Match (PSRM) • A set of “rules” (shown in different colours) is designed to describe the spatiotemporal structure of features.
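The slide does not list the actual rules, so the sketch below uses a hypothetical rule set only to show the data structure: every ordered pair of features is tested against each rule, and a pair that satisfies rule r increments bin (codeword of the first feature, codeword of the second) of that rule's histogram. The rules "before", "co_near" and "co_far", their 2-frame and 10-pixel thresholds, and the function names are all assumptions.

```python
from itertools import permutations

import numpy as np


def _dt(p, q):
    """Frame offset of q relative to p; points are (t, y, x)."""
    return q[0] - p[0]


def _dist(p, q):
    """Spatial distance in pixels between p and q."""
    return float(np.hypot(q[1] - p[1], q[2] - p[2]))


# Hypothetical rule set: the real PSRM rules are not given on the slide.
RULES = {
    "before": lambda p, q: _dt(p, q) > 2,
    "co_near": lambda p, q: abs(_dt(p, q)) <= 2 and _dist(p, q) < 10,
    "co_far": lambda p, q: abs(_dt(p, q)) <= 2 and _dist(p, q) >= 10,
}


def relationship_histograms(points, codes, n_codes):
    """One n_codes x n_codes histogram per rule, counting ordered feature pairs."""
    hists = {r: np.zeros((n_codes, n_codes), dtype=int) for r in RULES}
    for i, j in permutations(range(len(points)), 2):
        for name, rule in RULES.items():
            if rule(points[i], points[j]):
                hists[name][codes[i], codes[j]] += 1
    return hists
```

Unlike a bag of words, two videos with the same codewords but a different temporal ordering of them produce different "before" histograms, which is the structural information the slides contrast with BOW.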

  15. Pyramidal Spatiotemporal Relationship Match (PSRM) (figure; panels labelled “Tree N”)

  16. Pyramidal Spatiotemporal Relationship Match (PSRM) • Typical pyramid match kernel: adjacent bins are merged • Our pyramid match kernel: children are merged into their parents
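Merging children into parents can be sketched for one histogram over the leaves of a complete binary tree. Level 0 is the leaf level; each coarser level sums sibling pairs, and, as in Grauman and Darrell's pyramid match, only matches that first appear at a level are counted there, with coarser levels down-weighted. The 1/2^level weight and the function names are assumptions; the slide states only the merge direction.

```python
import numpy as np


def merge_to_parents(hist):
    """Sum sibling pairs of a complete binary tree level: 2^k bins -> 2^(k-1) bins."""
    return hist.reshape(-1, 2).sum(axis=1)


def tree_pyramid_match(hx, hy):
    """Pyramid match between two leaf histograms (length a power of two).
    New matches at level l are weighted by 1 / 2**l (an assumed weighting)."""
    hx, hy = np.asarray(hx, float), np.asarray(hy, float)
    score, prev, level = 0.0, 0.0, 0
    while True:
        inter = np.minimum(hx, hy).sum()      # histogram intersection at this level
        score += (inter - prev) / 2 ** level  # count only matches new at this level
        prev = inter
        if hx.size == 1:
            break
        hx, hy = merge_to_parents(hx), merge_to_parents(hy)
        level += 1
    return score
```

Two features that fall in sibling leaves match one level up (score 0.5), while features in leaves that only share the root match at the top (score 0.25), so the tree structure, not bin adjacency, decides which codewords count as similar.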

  17. Pyramidal Spatiotemporal Relationship Match (PSRM) • Pyramid match kernel (PMK) over multiple structural relationship histograms
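Written out, a pyramid match over several relationship histograms could take the following form. This is a hedged reconstruction in the style of Grauman and Darrell's pyramid match kernel, not a formula shown on the slide: $H^{(r)}_\ell$ is the histogram for rule $r$ at tree level $\ell$ ($\ell = 0$ at the leaves, each coarser level obtained by merging children into their parents, $\ell = L$ at the root), $I^{(r)}_\ell$ is its intersection count, and both the weight $1/2^\ell$ and the plain sum over rules are assumptions.

$$
I^{(r)}_\ell(X, Y) = \sum_{b} \min\!\left(H^{(r)}_\ell(X)[b],\; H^{(r)}_\ell(Y)[b]\right), \qquad
K(X, Y) = \sum_{r} \sum_{\ell=0}^{L} \frac{1}{2^{\ell}} \left( I^{(r)}_\ell(X, Y) - I^{(r)}_{\ell-1}(X, Y) \right), \qquad I^{(r)}_{-1} \equiv 0.
$$

Because merging only adds counts, $\min(a_1 + a_2, b_1 + b_2) \ge \min(a_1, b_1) + \min(a_2, b_2)$, so $I^{(r)}_\ell$ never decreases as $\ell$ grows and each bracketed term counts the matches that first appear at level $\ell$.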

  18. Continuous action recognition (diagram) • Our approach: classification is repeated continuously as new features arrive • Typical methods: features from the whole sequence are collected and classified once
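Continuous recognition can be sketched as a sliding window over the incoming frames. The window of 50 frames and the stride of 5 frames (1 frame in the laptop demo) come from slide 21; the `classify` callable stands in for the whole feature and classification pipeline and, like the function name, is hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Sequence, TypeVar

Frame = TypeVar("Frame")


def continuous_recognition(frames: Iterable[Frame],
                           classify: Callable[[Sequence[Frame]], str],
                           window: int = 50,
                           stride: int = 5) -> Iterator[tuple[int, str]]:
    """Classify the most recent `window` frames every `stride` frames.
    Yields (index of the newest frame, predicted label)."""
    buffer: deque = deque(maxlen=window)  # oldest frame drops out automatically
    for i, frame in enumerate(frames):
        buffer.append(frame)
        if len(buffer) == window and (i - (window - 1)) % stride == 0:
            yield i, classify(list(buffer))
```

Because each label describes the last `window` frames, its evidence is centred about half a window in the past, which is consistent with the recognition delay the demo slide attributes to the subsequence length.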

  19. Classification (section divider; pipeline overview diagram as in slide 8)

  20. Combined Classification • PSRM and BOST (bag of spatiotemporal textons) are classified independently • PSRM: k-means forest • Data points are clustered using k-means at the root • For each cluster, k-means is applied again recursively • Each terminal cluster is assigned a posterior probability distribution • M. Muja and D. G. Lowe. “Fast approximate nearest neighbors with automatic algorithm configuration.” VISAPP 2009 • K-means tree figure courtesy of David Aldavert Miró: http://www.cvc.uab.cat/~aldavert/plor/
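The recursive clustering on this slide can be sketched with plain Lloyd's k-means. This is a simplification under stated assumptions: it clusters ordinary vectors with Euclidean distance, whereas slide 34 describes a kernel k-means forest over PSRM, where the distance would come from the match kernel instead. The branching factor k = 2, maximum depth 4, minimum cluster size 5 and all names are hypothetical. Each terminal cluster stores the class posterior of its training points, and a forest averages the posteriors of trees built with different random seeds.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean) for every row of X."""
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(X, k, rng, iters=20):
    """Plain Lloyd's k-means initialised from k distinct data points."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _assign(X, centroids)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return centroids, _assign(X, centroids)


class KMeansTree:
    """Recursive k-means; every terminal cluster stores a class posterior."""

    def __init__(self, X, y, n_classes, rng, k=2, max_depth=4, min_size=5):
        X, y = np.asarray(X, float), np.asarray(y)
        self.posterior = np.bincount(y, minlength=n_classes) / len(y)
        self.centroids, self.children = None, []
        if max_depth == 0 or len(X) < max(min_size, k):
            return  # terminal cluster
        centroids, labels = kmeans(X, k, rng)
        if np.bincount(labels, minlength=k).min() == 0:
            return  # degenerate split: keep this node as a terminal cluster
        self.centroids = centroids
        self.children = [KMeansTree(X[labels == c], y[labels == c], n_classes, rng,
                                    k, max_depth - 1, min_size) for c in range(k)]

    def predict_proba(self, x):
        """Descend to the nearest centroid at each level; return the leaf posterior."""
        node, x = self, np.asarray(x, float)[None, :]
        while node.centroids is not None:
            node = node.children[int(_assign(x, node.centroids)[0])]
        return node.posterior


def forest_predict_proba(trees, x):
    """Average the leaf posteriors of several independently seeded trees."""
    return np.mean([t.predict_proba(x) for t in trees], axis=0)
```

Lookup costs one nearest-centroid search over k centres per level rather than a search over all training points, which is the efficiency argument behind hierarchical k-means trees.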

  21. Experiments • Short video subsequences (50 frames, about 2 seconds) are extracted from the input video. • A subsequence is sampled every 5 frames in the experiments and every frame in the laptop demo (i.e. frame-by-frame recognition). • Two datasets are used for performance evaluation: the UT interaction dataset and the KTH dataset.

  22. Experiments: Results (KTH dataset) • Snippet: subsequence-level recognition • Sequence: majority voting of subsequence labels • Evaluated with leave-one-out cross-validation • Comparable to most state-of-the-art methods • Around 3% behind the top performer • Is it a sensible trade-off? • Useful for many more practical applications (surveillance, robotics, etc.)
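Sequence-level labels are obtained by majority voting over snippet labels. A minimal sketch, where the function name is hypothetical and ties are broken in favour of the label seen first (the slide does not say how ties are handled):

```python
from collections import Counter


def sequence_label(snippet_labels):
    """Majority vote over subsequence ("snippet") labels.
    Counter.most_common orders equal counts by first occurrence, so ties
    go to the label that appeared first."""
    if not snippet_labels:
        raise ValueError("no snippet labels")
    return Counter(snippet_labels).most_common(1)[0][0]
```

Voting lets a sequence be labelled correctly even when some snippets are misclassified, which is one reason sequence-level accuracy is typically at least as high as snippet-level accuracy.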

  23. Experiments: Results • Results: UT interaction dataset • ~20% performance improvement by simply combining the class labels! PSRM and BOST gave low accuracies when applied separately. • Run-time performance: < 25 fps, but enough for most real-time applications • Can be further optimised (e.g. GPU, multi-core processing)
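Slide 34 describes the final step as a weighted combination of the PSRM and BOST outputs. A minimal sketch, assuming each classifier returns a class-posterior vector and using a hypothetical weight `alpha`; the actual weighting is not given:

```python
import numpy as np


def combine(p_psrm, p_bost, alpha=0.5):
    """Weighted combination of two class-posterior vectors.
    Returns (winning class index, fused posterior)."""
    p = alpha * np.asarray(p_psrm, float) + (1 - alpha) * np.asarray(p_bost, float)
    return int(np.argmax(p)), p
```

For example, with posteriors [0.5, 0.1, 0.4] from PSRM and [0.1, 0.5, 0.4] from BOST, each classifier alone picks a different class, but the fused posterior [0.3, 0.3, 0.4] picks class 2, the one both rate consistently. This kind of complementarity is one way a simple combination can beat either classifier alone.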

  24. Demo video • Frame-level recognition • Potential improvement: delay (~1 s) in recognition results (depends on the subsequence length) • Please visit: “http://www.youtube.com/watch?v=eD5b8d7hV6E” on the Internet for the full demo video.

  25. Conclusions

  26. THANK YOU VERY MUCH THE END

  27. Extra slide • Formulation of V-FAST

  28. Extra slide • Formulation of STF • Split function model: • Split criterion: information gain:
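The split criterion named on this slide is information gain. The sketch below uses the standard decision-tree definition (parent entropy minus the size-weighted entropies of the two children) and picks the best of a few candidate tests. The test form f[a] - f[b] < threshold is an assumption, since the slide's split function model is not recoverable, and the function names are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of the class distribution of `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(labels, go_left):
    """Parent entropy minus the size-weighted entropies of the two children."""
    labels, go_left = np.asarray(labels), np.asarray(go_left, bool)
    n = len(labels)
    gain = entropy(labels)
    for part in (labels[go_left], labels[~go_left]):
        if len(part):  # an empty child contributes nothing
            gain -= len(part) / n * entropy(part)
    return gain


def best_split(features, labels, candidates):
    """Pick the candidate test (a, b, threshold), read as
    features[:, a] - features[:, b] < threshold, with the highest gain."""
    features = np.asarray(features, float)
    scores = [information_gain(labels, features[:, a] - features[:, b] < th)
              for a, b, th in candidates]
    i = int(np.argmax(scores))
    return candidates[i], scores[i]
```

A split that separates two equally frequent classes perfectly gains 1 bit, while a split whose children keep the parent's class mix gains nothing. Randomised forests typically evaluate only a small random set of candidate tests per node, which keeps training cheap.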

  29. Extra slide • Formulation of STF

  30. Extra slide • Formulation of PSRM • Step 1: feature matching • Step 2: semantic PMK over histograms

  31. Extra slide • Formulation of classifier training • Optimising the feature clusters so that they maximise the PMK with their mean.

  32. Extra slide • Experiment parameters

  33. Extra slide • Confusion matrix:

  34. Extra slide • Classification pipeline (diagram): PSRM → kernel k-means forest; BOST → random forest; weighted combination → action recognition results (class labels)
