
Learning video saliency from human gaze using candidate selection



  1. Learning video saliency from human gaze using candidate selection CVPR 2013 Poster

  2. Outline • Introduction • Method • Experiments • Conclusions

  3. Introduction • Predicting where people look in video is relevant in many applications. • Image vs. video saliency

  4. Introduction • Two observations: • 1. Image saliency studies concentrate on a single image stimulus, without any prior. • 2. When watching dynamic scenes, people usually follow the action and the characters by shifting their gaze to a new interesting location in the scene.

  5. Introduction • We propose a novel method for video saliency estimation, which is inspired by the way people watch videos.

  6. Method • Candidate extraction • Modeling gaze dynamics

  7. Candidate extraction • Three types of candidates: • 1. Static candidates • 2. Motion candidates • 3. Semantic candidates

  8. Candidate extraction • 1. Static candidates • calculate the graph-based visual saliency (GBVS)
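As a rough sketch of this step: once a GBVS map (or any per-frame saliency map) is available, static candidates can be taken as its local maxima. The local-maxima extraction, the neighborhood size, and the threshold below are illustrative assumptions, not values from the paper.

    import numpy as np
    from scipy.ndimage import maximum_filter

    def static_candidates(saliency_map, neighborhood=15, threshold=0.5):
        """Pick local maxima of a saliency map (e.g., GBVS) as static candidates.

        saliency_map: 2-D float array normalized to [0, 1].
        Returns a list of (row, col) candidate coordinates.
        """
        # A pixel is a peak if it equals the maximum of its neighborhood
        # and its saliency exceeds the threshold.
        local_max = maximum_filter(saliency_map, size=neighborhood)
        peaks = (saliency_map == local_max) & (saliency_map > threshold)
        return list(zip(*np.nonzero(peaks)))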

  9. Candidate extraction • 2. Motion candidates • calculate the optical flow between consecutive frames • apply Difference-of-Gaussians (DoG) filtering to the optical flow magnitude
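A minimal version of this pipeline using OpenCV's Farneback optical flow; the DoG sigmas are illustrative choices. Candidates can then be taken as peaks of the DoG response, in the same way as the static candidates above.

    import cv2
    import numpy as np

    def motion_saliency(prev_gray, next_gray, sigma_fine=2.0, sigma_coarse=8.0):
        """Optical flow between consecutive grayscale frames, followed by
        Difference-of-Gaussians (DoG) filtering of the flow magnitude."""
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude = np.linalg.norm(flow, axis=2)  # per-pixel motion strength
        # DoG acts as a band-pass filter that highlights compact moving regions.
        return (cv2.GaussianBlur(magnitude, (0, 0), sigma_fine) -
                cv2.GaussianBlur(magnitude, (0, 0), sigma_coarse))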

  10. Candidate extraction • Static (a) and motion (b) candidates.

  11. Candidate extraction • 3. Semantic candidates • stem from higher-level visual processing • three types: center, face, and body

  12. Candidate extraction • 3. Semantic candidates • Small detections: create a single candidate at their center. • Large detections: create several candidates • four for body detections (head, shoulders, and torso) • three for faces (eyes, and nose with mouth).
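A sketch of this size-dependent splitting; the area threshold and the sub-part offsets are hypothetical placements of the head/shoulders/torso and eyes/nose-with-mouth candidates, chosen only to illustrate the idea.

    def semantic_candidates(box, frame_area, kind, small_frac=0.02):
        """box = (x, y, w, h) detection; kind is 'face' or 'body'.

        Small detections yield one candidate at their center; large ones are
        split into part candidates (all fractions below are assumptions)."""
        x, y, w, h = box
        if w * h < small_frac * frame_area:       # small detection: one candidate
            return [(x + w / 2, y + h / 2)]
        if kind == 'body':                        # four: head, shoulders, torso
            return [(x + w / 2, y + 0.10 * h),    # head
                    (x + 0.25 * w, y + 0.30 * h), # left shoulder
                    (x + 0.75 * w, y + 0.30 * h), # right shoulder
                    (x + w / 2, y + 0.60 * h)]    # torso
        return [(x + 0.30 * w, y + 0.35 * h),     # left eye
                (x + 0.70 * w, y + 0.35 * h),     # right eye
                (x + w / 2, y + 0.70 * h)]        # nose with mouth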

  13. Candidate extraction • Semantic candidates

  14. Modeling gaze dynamics • Features • Gaze transitions for training • Learning transition probability

  15. Features • Create a feature vector for every ordered pair of (source, destination) candidate locations. • The features fall into two sets: destination-frame features and inter-frame features.
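A toy assembly of such a pair feature vector; the specific entries (destination saliency and contrast, plus the gaze-shift vector and its length) are plausible examples of the two feature sets, not the paper's exact list.

    import numpy as np

    def pair_features(src_xy, dst_xy, dst_saliency, dst_contrast):
        """Feature vector for one ordered (source, destination) candidate pair.

        Destination-frame features: saliency and local contrast at the
        destination.  Inter-frame features: gaze-shift vector and its length.
        """
        dx, dy = dst_xy[0] - src_xy[0], dst_xy[1] - src_xy[1]
        return np.array([dst_saliency, dst_contrast, dx, dy, np.hypot(dx, dy)])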

  16. Features • As a low-level spatial cue, we use the local contrast of the neighborhood around the candidate location.
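One common definition of this cue, used here as an assumption: the standard deviation of grayscale intensities in a window around the candidate (the window size is illustrative).

    import numpy as np

    def local_contrast(gray, center, radius=16):
        """Standard deviation of intensities in a square neighborhood
        around the candidate location."""
        r, c = int(center[0]), int(center[1])
        patch = gray[max(r - radius, 0): r + radius + 1,
                     max(c - radius, 0): c + radius + 1]
        return float(patch.std())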

  17. Gaze transitions for training • Determine whether a gaze transition occurs from a given source candidate to a given target candidate: • 1. Choose relevant pairs of frames, taking scene cuts into account. • 2. Label positive and negative gaze transitions between these frames.
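One way to realize the labeling step, assuming per-viewer gaze points are available for both frames of a chosen pair: label a (source, destination) pair positive when enough of the viewers who looked near the source later look near the destination. The radius and vote fraction below are assumptions.

    import numpy as np

    def label_transition(gaze_t0, gaze_t1, src, dst, radius=50, min_frac=0.3):
        """gaze_t0, gaze_t1: (n_viewers, 2) gaze points in the two frames.

        Positive (1) if at least min_frac of the viewers near src in the
        first frame are near dst in the second frame, else negative (0)."""
        near_src = np.linalg.norm(gaze_t0 - np.asarray(src), axis=1) < radius
        if not near_src.any():
            return 0
        near_dst = np.linalg.norm(gaze_t1[near_src] - np.asarray(dst),
                                  axis=1) < radius
        return int(near_dst.mean() >= min_frac)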

  18. Gaze transitions for training

  19. Learning transition probability • To predict whether a transition occurs or not, train a standard random forest classifier on the normalized feature vectors and their labels. • The trained model classifies every transition between source and destination candidates and provides a confidence value.
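This stage maps directly onto scikit-learn; the hyperparameters below are defaults/assumptions, and predict_proba supplies the per-transition confidence value.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import StandardScaler

    def train_transition_classifier(X, y):
        """X: (n_pairs, n_features) pair feature vectors; y: 0/1 labels."""
        scaler = StandardScaler().fit(X)
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(scaler.transform(X), y)
        return scaler, clf

    def transition_confidence(scaler, clf, X_new):
        """Confidence that a gaze transition occurs for each candidate pair."""
        return clf.predict_proba(scaler.transform(X_new))[:, 1]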

  20. Learning transition probability • Transition probability P(d|s_i): the probability of a gaze shift from source candidate s_i to destination candidate d.
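Reading the slide's notation, the classifier confidences can be normalized over all destination candidates of a given source so that each row forms a distribution P(d|s_i); a sketch of that normalization:

    import numpy as np

    def transition_probabilities(confidences):
        """confidences[i, j]: classifier confidence for source i -> destination j.

        Normalize each row into a distribution P(d | s_i) over destinations."""
        conf = np.asarray(confidences, dtype=float)
        return conf / np.maximum(conf.sum(axis=1, keepdims=True), 1e-12)

Summing these distributions over sources, with each source weighted by its saliency in the previous frame, would then propagate saliency through time; this is our reading of the slides, not a formula quoted from the poster.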

  21. Experiments • Datasets: • DIEM (Dynamic Images and Eye Movements) dataset • CRCNS dataset

  22.–25. Experiments (result figures; images not included in the transcript)

  26. Conclusions • The method is substantially different from existing methods and uses a sparse candidate set to model the saliency map. • Using candidates boosts the accuracy of the saliency prediction and speeds up the algorithm. • The proposed method accounts for the temporal dimension of the video by learning the probability of shifting between salient locations.
