
Flow Based Action Recognition



Presentation Transcript


  1. Flow Based Action Recognition Papers to discuss: The Representation and Recognition of Action Using Temporal Templates (Bobick & Davis 2001) and Recognizing Action at a Distance (Efros et al. 2003)

  2. What is an Action? Action: Atomic motion(s) that can be unambiguously distinguished and usually have a semantic association (e.g. sitting down, running). An activity is composed of several actions performed in succession (e.g. dining, meeting a person). An event is a combination of activities (e.g. a football match, a traffic accident).

  3. Action Recognition • Previously: action recognition was treated as part of an articulated-tracking problem, or as a generalized tracking problem for directly detecting activities/events • Novelty: direct recognition of short-time motion segments, using new feature descriptors: motion history images, motion energy images, and Efros' features

  4. Flow Based Action Recognition Papers to discuss: The Representation and Recognition of Action Using Temporal Templates (Bobick & Davis 2001) and Recognizing Action at a Distance (Efros et al. 2003)

  5. Motivation

  6. Goal Action: motion over time. Create a view-specific representation of action. Construct a vector image suitable for matching against other instances of the action.

  7. Motion Energy Images D(x, y, t): binary image sequence indicating motion locations
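The MEI is simply the union (logical OR) of the binary motion images D over the last tau frames. A minimal NumPy sketch (not the authors' code; the array shapes and toy input are assumptions for illustration):

```python
import numpy as np

def motion_energy_image(D, tau):
    """Motion Energy Image: union of the binary motion images
    D(x, y, t) over the last tau frames, i.e. every pixel that
    moved at any point inside the temporal window.
    D: (T, H, W) boolean array; returns an (H, W) boolean mask."""
    # E_tau(x, y, t) = OR_{i=0..tau-1} D(x, y, t - i), here for the last frame
    return np.any(D[-tau:], axis=0)

# Toy example: motion appears in different places at different times
D = np.zeros((3, 4, 4), dtype=bool)
D[0, 0, 0] = True
D[1, 1, 1] = True
D[2, 2, 2] = True
mei = motion_energy_image(D, tau=3)
print(int(mei.sum()))  # -> 3: three pixels moved within the window
```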

  8. Motion Energy Images

  9. Motion History Images Descriptor: Build a 2-component vector image by combining the MEI and the MHI
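Unlike the MEI, the MHI records how recently each pixel moved: a moving pixel is set to tau and otherwise decays by one per frame. A hedged sketch of the standard recursive update (shapes and the toy input are illustrative assumptions; note that thresholding the MHI above zero recovers the MEI):

```python
import numpy as np

def motion_history_image(D, tau):
    """Motion History Image: recency-weighted motion.
    H(x, y, t) = tau where D(x, y, t) = 1,
    otherwise max(0, H(x, y, t-1) - 1).
    D: (T, H, W) boolean array; returns an (H, W) float image."""
    H = np.zeros(D.shape[1:], dtype=float)
    for t in range(D.shape[0]):
        H = np.where(D[t], float(tau), np.maximum(0.0, H - 1.0))
    return H

D = np.zeros((3, 2, 2), dtype=bool)
D[0, 0, 0] = True   # oldest motion
D[2, 1, 1] = True   # most recent motion
H = motion_history_image(D, tau=3)
print(H)  # old motion decayed to 1, fresh motion at 3
```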

  10. Matching • Compute the 7 Hu moments • Model the 7 moments of each action class with a Gaussian distribution (diagonal covariance) • Given a new action instance, measure the Mahalanobis distance to all classes and pick the nearest one
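The matching step above reduces to a nearest-class rule under a diagonal-covariance Mahalanobis distance. A minimal sketch (Hu-moment extraction is not shown, and the class statistics below are made-up numbers purely for illustration):

```python
import numpy as np

def diag_mahalanobis(x, mean, var):
    # Mahalanobis distance when the covariance is diagonal (variances only)
    return np.sqrt(np.sum((x - mean) ** 2 / var))

def classify(x, class_stats):
    # Nearest class under the per-class diagonal Mahalanobis distance
    return min(class_stats, key=lambda c: diag_mahalanobis(x, *class_stats[c]))

# Toy per-class statistics over a 7-D "Hu moment" vector (made-up numbers)
stats = {
    "sit":  (np.zeros(7), np.ones(7)),
    "wave": (np.full(7, 5.0), np.ones(7)),
}
x = np.full(7, 4.6)
print(classify(x, stats))  # -> wave (much closer to the "wave" mean)
```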

  11. Image Moments Translation Invariant Moments

  12. Scale-Invariant Moments The 7 Hu Moments
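The moment equations shown on these two slides did not survive the transcript. For reference, the standard definitions they refer to are: central (translation-invariant) moments, their scale-normalized form, and, as one example, the first Hu invariant:

```latex
\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q \, I(x, y),
\quad \bar{x} = \tfrac{M_{10}}{M_{00}}, \;\; \bar{y} = \tfrac{M_{01}}{M_{00}}

\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,1 + (p+q)/2}}

h_1 = \eta_{20} + \eta_{02}
```

Hu's remaining six invariants are polynomial combinations of the \(\eta_{pq}\) that add rotation invariance.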

  13. Results Only the left (30°) camera is used as input, matched against all 7 views of all 18 moves (126 total). Metric: a pooled independent Mahalanobis distance using a diagonal covariance matrix to accommodate variations in the magnitudes of the moments.

  14. Results Two cameras: the minimum sum of Mahalanobis distances between the two input templates and two stored views of an action that have the correct angular difference between them (here, 90°). The assumption: we know the approximate angular relationship between the cameras.
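The two-camera rule above can be sketched as a search over pairs of stored views whose angular separation matches the known camera separation (a hedged illustration; the distance tables and view angles below are made-up toy values, not the paper's data):

```python
import numpy as np

def two_camera_match(d1, d2, view_angles, cam_sep=90):
    """Pick the action whose pair of stored views, separated by the
    known camera angle, gives the minimum summed Mahalanobis distance.
    d1[a][v], d2[a][v]: distance from input template 1 / 2 to stored
    view v of action a; view_angles: angle (degrees) of each view."""
    best_action, best_score = None, np.inf
    for action in d1:
        for i, ai in enumerate(view_angles):
            for j, aj in enumerate(view_angles):
                # keep only view pairs with the correct angular difference
                if (ai - aj) % 360 in (cam_sep, 360 - cam_sep):
                    score = d1[action][i] + d2[action][j]
                    if score < best_score:
                        best_action, best_score = action, score
    return best_action, best_score

# Toy distances to three stored views per action (made-up numbers)
view_angles = [0, 90, 180]
d1 = {"walk": [1.0, 5.0, 5.0], "run": [4.0, 4.0, 4.0]}
d2 = {"walk": [5.0, 0.5, 5.0], "run": [4.0, 4.0, 4.0]}
print(two_camera_match(d1, d2, view_angles))  # -> ('walk', 1.5)
```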

  15. Flow Based Action Recognition Papers to discuss: The Representation and Recognition of Action Using Temporal Templates (Bobick & Davis 2001) and Recognizing Action at a Distance (Efros et al. 2003)

  16. The Goal Recognize medium-field human actions: humans only a few pixels tall, in noisy video

  17. System Flow • Track and stabilize the human figure (simple normalized-correlation-based tracker) • Compute pixelwise optical flow on the stabilized space-time volume • Build the descriptor (more on this later...) • Find nearest neighbors

  18. Descriptor What are good features for motion? • Pixel values • Spatial image gradients • Temporal gradients • Problem: all of these are appearance-dependent and carry no directionality information about the motion • Pixel-wise optical flow: captures motion independent of appearance

  19. Descriptor The key idea is that the channels must be sparse and non-negative
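Sparsity and non-negativity are obtained by half-wave rectifying the flow field into four channels (Fx+, Fx-, Fy+, Fy-), which are then blurred. A minimal NumPy sketch (a simple 3×3 box blur stands in for the paper's Gaussian blur; shapes and the toy input are assumptions):

```python
import numpy as np

def box_blur(c):
    # Simple 3x3 box blur, a stand-in for the Gaussian blur in the paper
    p = np.pad(c, 1, mode='edge')
    h, w = c.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def flow_channels(fx, fy):
    """Split a flow field into four sparse, non-negative channels
    (Fx+, Fx-, Fy+, Fy-) by half-wave rectification, then blur each.
    fx, fy: (H, W) horizontal / vertical flow components."""
    raw = [np.maximum(fx, 0), np.maximum(-fx, 0),
           np.maximum(fy, 0), np.maximum(-fy, 0)]
    return [box_blur(c) for c in raw]

fx = np.array([[1.0, -2.0], [0.0, 3.0]])
fy = -fx
channels = flow_channels(fx, fy)  # four non-negative blurred channels
```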

  20. Similarity a, b: motion descriptors for two different sequences; T: motion (temporal) length; I: frame size; c: # of channels
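The similarity equation itself was an image on the original slide and is lost in the transcript. Reconstructed from the variable list above and the definition in Efros et al. (2003), the frame-to-frame motion similarity is:

```latex
S(i, j) \;=\; \sum_{t \in T} \; \sum_{(x, y) \in I} \; \sum_{c=1}^{4}
a_c^{\,i+t}(x, y) \, b_c^{\,j+t}(x, y)
```

i.e. the correlation of the four blurred, rectified flow channels of sequence a around frame i with those of sequence b around frame j, summed over a temporal window.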

  21. Similarity

  22. Classification • Construct similarity matrix as outlined. • Convolve with the temporal kernel • For each frame of the novel sequence, the maximum score in the corresponding row of this matrix will indicate the best match to the motion descriptor centered at this frame. • Classify this frame using a k-nearest-neighbor classifier: find the k best matches from labeled data and take the majority label.
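The classification steps above can be sketched as follows (a hedged illustration: a plain identity kernel stands in for the paper's blurred "temporal" kernel, and the toy similarity matrix is made up):

```python
import numpy as np

def classify_frames(S, labels, T=3, k=3):
    """Aggregate the raw frame-to-frame similarity matrix S along its
    diagonals over a T-frame window, then label each novel frame by a
    k-NN majority vote over the best entries of its row.
    S: (n_novel, n_labeled) similarity matrix; labels: per labeled frame."""
    n, m = S.shape
    blurred = np.zeros((n, m))
    h = T // 2
    for i in range(n):
        for j in range(m):
            for t in range(-h, h + 1):       # sum along the diagonal
                if 0 <= i + t < n and 0 <= j + t < m:
                    blurred[i, j] += S[i + t, j + t]
    out = []
    for i in range(n):
        top = np.argsort(blurred[i])[::-1][:k]   # k best labeled matches
        votes = [labels[j] for j in top]
        out.append(max(set(votes), key=votes.count))
    return out

# Toy: 4 novel frames vs 6 labeled frames (first 3 labeled "a", rest "b")
S = np.zeros((4, 6))
S[0, :3] = S[1, :3] = 1.0
S[2, 3:] = S[3, 3:] = 1.0
labels = ["a", "a", "a", "b", "b", "b"]
print(classify_frames(S, labels))  # -> ['a', 'a', 'b', 'b']
```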

  23. Results Ballet (16 classes): Clips of motions from an instructional video. Professional dancers, two men and two women, performing mostly standard ballet moves. Tennis (6 classes): Two amateur tennis players outdoors (one player for testing, one for training). Each player was videotaped on different days, in different locations, with slightly different camera positions. Players are about 50 pixels tall. Football (8 classes): Several minutes of a World Cup football game from an NTSC video tape. Wide-angle view of the playing field. Substantial camera motion and zoom. About 30-by-30 noisy pixels per human figure.

  24. Results Per-class accuracies (confusion-matrix diagonals): Ballet (K=5, T=51): [.94 .97 .88 .88 .97 .91 1 .74 .92 .82 .99 .62 .71 .76 .92 .96] Tennis (K=5, T=7): [.46 .64 .7 .76 .88 .42] Football (K=1, T=13): [.67 .58 .68 .79 .59 .68 .58 .66]

  25. Do As I Do Synthesis Given a “target” actor database T and a “driver” actor sequence D, the goal is to create a synthetic sequence S that contains the actor from T performing actions described by D.

  26. Extensions to MHI Alper Yilmaz and Mubarak Shah, "Actions Sketch: A Novel Action Representation," Computer Vision and Pattern Recognition, 2005. Yan Ke, Rahul Sukthankar, and Martial Hebert, "Volumetric Features for Event Recognition in Video," in ICCV 2007.
