VIDEO ANNOTATION TOOL

stacey

Presentation Transcript


  1. The MUSCLE movie database is a multimodal movie corpus collected to develop content-based multimedia processing methods such as:
     - speaker clustering
     - speaker turn detection
     - visual speech activity detection
     - face detection
     - facial feature detection
     - face clustering
     - scene segmentation
     - saliency detection
     - multimodal dialogue detection

  2. This database covers four different modalities:
     - audio
     - video
     - audiovisual
     - text
     The video annotation tool ANVIL and the Anthropos 7 Editor are described.

  3. VIDEO ANNOTATION TOOL
     • ANVIL: a video annotation tool
     • It offers hierarchical multi-layered annotation
     • The annotation board shows colour-coded elements on multiple tracks in time-alignment
     • ANVIL can import data from PRAAT and XWaves

  4. Anthropos 7 Editor
     • Anthropos 7 Editor is an annotation tool for MPEG-7 data
     • It makes viewing and editing MPEG-7 data easier
     • To visualise information, Anthropos 7 Editor uses the Timeline Area. Information based on a single frame is visualised in the Video Area, and static movie information in the Static Information Area.
     • These areas communicate with each other
     • Anthropos 7 Editor can visualise the ROI (region of interest) of each actor. The user can interact with it using the mouse.
     • Every image region encompassing an actor can be overlaid as a box, which the user can modify

  5. MUSCLE movie database specifications
     • Concepts such as dialogue and saliency must be described independently for each modality: an audio-only, a video-only and an audio-visual description

  6. Dialogue annotation
     • 54 movie scenes extracted from 8 movies
     • The language of all scenes is English
     • Each scene lasts between 24 and 123 seconds
     • Each movie scene was separated into two files: an audio file and a video file

  7. MUSCLE movie database description

     Movie title           Dialogue scenes   Non-dialogue scenes   Scenes per movie
     Analyze That          4                 2                     6
     Cold Mountain         5                 1                     6
     Jackie Brown          3                 3                     6
     Lord of the Rings I   5                 3                     8
     Platoon               4                 2                     6
     Secret Window         4                 6                     10
     The Prestige          4                 2                     6
     American Beauty       10                0                     10
     Total number          39                19                    58
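
The totals listed on this slide can be re-derived with a few lines of Python. This is only a consistency check on the figures given here; the variable names are illustrative and not part of the database.

```python
# (movie title, dialogue scenes, non-dialogue scenes), copied from slide 7
SCENES = [
    ("Analyze That", 4, 2),
    ("Cold Mountain", 5, 1),
    ("Jackie Brown", 3, 3),
    ("Lord of the Rings I", 5, 3),
    ("Platoon", 4, 2),
    ("Secret Window", 4, 6),
    ("The Prestige", 4, 2),
    ("American Beauty", 10, 0),
]

# Column totals: dialogue scenes, non-dialogue scenes, and all scenes.
dialogue_total = sum(d for _, d, _ in SCENES)
non_dialogue_total = sum(n for _, _, n in SCENES)
scene_total = dialogue_total + non_dialogue_total

print(dialogue_total, non_dialogue_total, scene_total)  # prints: 39 19 58
```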

  8. Types of dialogue for audio:
     - CD (clean dialogue): dialogue with low-level audio background
     - BD (dialogue with background): dialogue in the presence of noisy background or music
     - a monologue is classified as CM (clean monologue) or BM (monologue with background)
     - all scenes not labeled CD or BD are considered non-dialogue
     Types of dialogue for video:
     - CD: two actors are present in the scene
     - BD: at least two actors are present
     - monologue types for video are labeled CM or BM
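
As a minimal sketch, the label scheme on this slide can be written as an enumeration plus the rule that only CD and BD count as dialogue. The names `SceneLabel` and `is_dialogue` are hypothetical, and CD is read here as "clean dialogue" by analogy with CM; neither is taken from the database itself.

```python
from enum import Enum


class SceneLabel(Enum):
    """Scene labels from slide 8 (the expansion of CD is an assumption)."""
    CD = "clean dialogue"
    BD = "dialogue with background"
    CM = "clean monologue"
    BM = "monologue with background"


def is_dialogue(label):
    """Return True only for CD or BD.

    Every other label, including None for an unlabeled scene,
    is treated as non-dialogue.
    """
    return label in (SceneLabel.CD, SceneLabel.BD)


print(is_dialogue(SceneLabel.BD))  # True
print(is_dialogue(SceneLabel.CM))  # False
```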

  9. Metadata for audio files:
     • Speech activity data:
     • Speech intervals (defined by the start and end time)
     Metadata for video files:
     • Lip activity data (defined by the start and end time and frame)

  10. States used to label lip activity intervals:
     • 0: back of the actor's head is visible
     • 1: actor's frontal face is visible
     • 2: actor's frontal face is visible + lip activity
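
One way to picture a lip activity record is as an interval with start and end time, start and end frame, and one of the three states. The sketch below is an assumption about how such records might be held in memory: the class and field names are hypothetical, times are assumed to be in seconds, and state 0 is read as the back of the actor's head.

```python
from dataclasses import dataclass
from enum import IntEnum


class LipState(IntEnum):
    BACK_OF_HEAD = 0          # back of the actor's head is visible
    FRONTAL_FACE = 1          # frontal face is visible, no lip activity
    FRONTAL_FACE_LIPS = 2     # frontal face is visible + lip activity


@dataclass
class LipInterval:
    start_time: float  # seconds (assumed unit)
    end_time: float
    start_frame: int
    end_frame: int
    state: LipState


def lip_activity_time(intervals):
    """Total duration of the intervals labeled with lip activity (state 2)."""
    return sum(
        i.end_time - i.start_time
        for i in intervals
        if i.state == LipState.FRONTAL_FACE_LIPS
    )


example = [
    LipInterval(0.0, 2.0, 0, 50, LipState.FRONTAL_FACE_LIPS),
    LipInterval(2.0, 3.5, 50, 87, LipState.FRONTAL_FACE),
    LipInterval(3.5, 4.0, 87, 100, LipState.FRONTAL_FACE_LIPS),
]
print(lip_activity_time(example))  # 2.5
```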

  11. Afterwards:
     • Face tracking information is extracted from the scenes
     • The extracted information is processed by a human annotator
     • The face of each actor in a dialogue or monologue is assigned a bounding box
     • The data is saved in XML MPEG-7 format
     • The two files (audio, video) are merged into one XML file for each scene
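
The final merging step could look roughly like the sketch below, which wraps an audio description and a video description in one root element per scene. The element and attribute names (`Scene`, `Audio`, `Video`, `SpeechInterval`, `LipInterval`) are placeholders invented for illustration; they are not the MPEG-7 schema and not the layout actually used by the database.

```python
import xml.etree.ElementTree as ET


def merge_scene(audio_xml, video_xml, scene_id):
    """Combine two XML strings under a single <Scene> root element."""
    scene = ET.Element("Scene", {"id": scene_id})
    for source in (audio_xml, video_xml):
        # Parse each description and attach it as a child of the scene.
        scene.append(ET.fromstring(source))
    return ET.tostring(scene, encoding="unicode")


# Hypothetical per-modality descriptions for one scene.
audio = '<Audio><SpeechInterval start="0.0" end="2.5"/></Audio>'
video = '<Video><LipInterval start="0.0" end="2.0" state="2"/></Video>'

merged = merge_scene(audio, video, "scene01")
print(merged)
```

Keeping the two descriptions as separate children preserves the requirement that each modality is described independently while still giving one file per scene.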

  12. Saliency annotation
     • Based on the detection of "pop-outs" (abrupt changes and abnormalities, e.g. in speech, environmental noises etc.)

  13. • 3 movie clips (27 minutes) from 3 different movies of different genres
     • Chosen carefully to represent all cases of saliency
     • Audio content includes speech in a dialogue, with background sounds such as music and noises
     • The background sounds: animals, knocking, cars etc.
     • Visual content: abrupt scene changes and editing effects, e.g. computer-generated light

  14. - The clips were annotated by two different annotators
     - An event considered salient is annotated separately
     - For audio, whether an event is salient depends on how important the annotator judges its sound to be in the scene
     - For visual content: pop-out colour and motion
     - Sudden events can be regarded as salient
     - Silence is not annotated

  15. ANVIL is used for saliency detection
     • 3 main saliency categories in the annotation scheme: visual, audio and generic saliency
     • Audio saliency is annotated using the auditory sense
     • Visual saliency is annotated using the visual sense
     • Generic saliency is annotated using both modalities simultaneously

  16. Audio saliency
     • Description of the audio in the scene
     • Chosen categories: dialogue, music, noise, sound effect, environmental sound, machine sound, background sound, unclassified sound, mixed sound
     • The annotator can choose more than one sound
     • Speech saliency is measured by the intensity and loudness of the voice

  17. Visual saliency
     • Description of the object's motion
     • Pop-out events are annotated as well

     Visual Saliency
     Motion:            Start-Stop, Stop-Start, Impulsive event, Static, Moving, Other
     Changes of cast:   binary decision
     Pop-out event:     binary decision
     Saliency Factor:   None, Low, Mid, High
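
The four visual saliency attributes listed on this slide map naturally onto a small record type. The sketch below is only an illustration of that scheme: the class and field names are hypothetical and do not reflect how ANVIL stores annotations.

```python
from dataclasses import dataclass
from enum import Enum


class Motion(Enum):
    START_STOP = "Start-Stop"
    STOP_START = "Stop-Start"
    IMPULSIVE_EVENT = "Impulsive event"
    STATIC = "Static"
    MOVING = "Moving"
    OTHER = "Other"


class SaliencyFactor(Enum):
    NONE = "None"
    LOW = "Low"
    MID = "Mid"
    HIGH = "High"


@dataclass
class VisualSaliency:
    motion: Motion
    change_of_cast: bool      # binary decision
    pop_out_event: bool       # binary decision
    saliency_factor: SaliencyFactor


# Example annotation for one hypothetical segment.
segment = VisualSaliency(
    motion=Motion.IMPULSIVE_EVENT,
    change_of_cast=False,
    pop_out_event=True,
    saliency_factor=SaliencyFactor.HIGH,
)
print(segment.motion.value, segment.saliency_factor.value)
```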

  18. Generic saliency
     • A low-level description of saliency
     • Description features are: audio, visual, audiovisual
     • Saliency is measured as high, mid or low
