
Active Frame Selection for Label Propagation in Videos

Sudheendra Vijayanarasimhan and Kristen Grauman
Department of Computer Science, University of Texas at Austin

Motivation
• Manually labeling objects in video is tedious and expensive, yet such annotations are valuable for object and activity recognition.
• Existing methods for interactive labeling propagate labels from an arbitrarily selected frame, and/or assume a human will intervene repeatedly to correct errors.

Main Idea
Identify the k frames which, if labeled, would propagate to the rest of the video with minimal expected error:
• Actively select k informative frames.
• Segment and label the selected frames.
• Propagate labels to all other frames.
Highlights of our approach:
• Annotate all objects in a video with minimal manual effort.
• Jointly select the k most useful frames via predicted "trackability".
• An efficient dynamic programming solution.

Active Frame Selection
To segment an N-frame video, there are two sources of manual effort cost:
• the cost of fully labeling a frame from scratch, denoted Cl, and
• the cost of correcting propagation errors, denoted Cc.
Objective: choose the set of k frames to label so as to minimize the total expected effort, i.e., the labeling cost of the selected frames plus the expected cost of correcting the labels propagated to the remaining frames.

Video Label Propagation
• Pixel Flow Label Propagation: use dense optical flow to track each pixel in both the forward and backward directions, until it reaches the closest labeled frame on either side.
• Pixel Flow + MRF Label Propagation: enhance the flow model with a space-time Markov Random Field that infers label maps that are smooth in space and time and exploits object appearance models defined by the labeled frames.
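For concreteness, here is a minimal sketch of the pixel-flow propagation step described above. It assumes grayscale frames, integer label maps, and OpenCV's Farneback optical flow; the helper names (`warp_labels`, `propagate`) and the pull-style warping toward the nearest labeled frame are illustrative assumptions, not the authors' implementation (which further refines the result with the space-time MRF).

```python
# Illustrative sketch (not the authors' code): propagate label maps from the
# nearest labeled frame to every other frame by chaining dense optical flow.
import numpy as np
import cv2  # assumed available for Farneback optical flow

def warp_labels(labels, flow):
    """Warp a uint8 label map one step: each target pixel pulls its label
    from the source location indicated by the target-to-source flow field."""
    h, w = labels.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Nearest-neighbor sampling keeps class labels discrete.
    return cv2.remap(labels, map_x, map_y, interpolation=cv2.INTER_NEAREST)

def propagate(frames, labeled):
    """frames: list of grayscale images; labeled: dict {frame index: label map}.
    Returns a label map per frame, taken from the closest labeled frame."""
    n = len(frames)
    out = dict(labeled)
    order = sorted(labeled)
    for t in range(n):
        if t in out:
            continue
        src = min(order, key=lambda r: abs(r - t))  # closest labeled frame
        step = 1 if src < t else -1
        labels = labeled[src]
        for u in range(src, t, step):
            # Flow from frame u+step to frame u lets each pixel in u+step
            # pull its label from frame u (pull-style warping).
            flow = cv2.calcOpticalFlowFarneback(
                frames[u + step], frames[u], None,
                0.5, 3, 15, 3, 5, 1.2, 0)
            labels = warp_labels(labels, flow)
        out[t] = labels
    return out
```

Chaining one-step warps this way is exactly why propagation error grows with distance from the labeled frame, which is what the selection criterion below tries to anticipate.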
Estimating Expected Label Propagation Error
We explicitly model the probability that pixel p in frame t will be mislabeled if we were to obtain its label from frame t+1. The distances used in this model are computed from the optical flow and capture errors caused by object boundaries, occlusions, and pixels that change appearance or enter/leave the frame; the model combines an appearance term, a motion term, and an occlusion term. If more than one frame separates the labeled frame rt and the current frame t, we compute the accumulated error recursively (and analogously for the backward labeled frame lt). Let the N x N matrix C record the frame-to-frame predicted errors. (A simplified sketch of how such a matrix could be built appears after the results summary below.)

Dynamic Programming Solution
Let a table entry be the optimal value of the objective when selecting b frames from the first n frames, where i denotes the index of the b-th selected frame. The recurrence distinguishes three cases: Case 1, one-way propagation toward the end (n > i); Case 2, one-way propagation toward the beginning (b = 1 and n = i); and Case 3, propagation in both directions (b > 1 and n = i). For a given k, this yields the optimal frame set in polynomial time, compared with a naïve exhaustive search over all possible k-frame subsets. (A simplified selection sketch follows the error-matrix sketch below.)

Results
Datasets: CamSeq01 (101 frames of a driving scene), CamVid Seq05 (3000 frames of a driving scene), LabelMe 8126 (167 frames of a traffic signal), and SegTrack (6 videos with moving objects).
Baselines:
• Uniform-f: samples frames uniformly and transfers labels forward.
• Uniform: samples frames uniformly and transfers labels in both directions.
• Keyframe: selects frames with k-way spectral clustering on Gist features.
Findings:
• Our error predictions in C follow the actual errors closely.
• Measuring error as the average number of mislabeled pixels (in hundreds of pixels), our active approach outperforms the baselines for all values of k, and saves hours of manual effort per video if the cost of correcting errors is proportional to the number of mislabeled pixels.
• Our approach yields higher accuracy than the baselines, especially for frames far from the labeled frames; it reduces total annotation effort more, and can also predict the optimal number of frames to have labeled, k*.
• In the example of actively selected frames (SegTrack, k = 5), our method automatically selects frames containing high-resolution information about most of the objects.
(Figures: errors and time saved on SegTrack with k = 5; accuracy per frame, sorted from high to low; total annotation time; example of actively selected frames.)
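The transcript above omits the exact formulas for the appearance, motion, and occlusion terms, so the sketch below substitutes a simple proxy: an intensity difference after flow warping plus a forward-backward flow-consistency check, accumulated additively over intermediate frames. `step_error`, the 0.1 weighting, and the additive accumulation are assumptions for illustration only, not the paper's model.

```python
# Illustrative sketch (simplified stand-in for the poster's error model):
# estimate the chance a pixel is mislabeled when its label is copied from a
# neighboring frame, then accumulate into an N x N matrix C of predicted errors.
import numpy as np
import cv2

def step_error(src, dst):
    """Rough per-pixel mislabeling probability when dst copies labels from src:
    an appearance term (intensity change after warping) plus an occlusion term
    (forward-backward flow inconsistency). Weights are arbitrary assumptions."""
    fwd = cv2.calcOpticalFlowFarneback(dst, src, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    bwd = cv2.calcOpticalFlowFarneback(src, dst, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = dst.shape
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (gx + fwd[..., 0]).astype(np.float32)
    map_y = (gy + fwd[..., 1]).astype(np.float32)
    warped = cv2.remap(src, map_x, map_y, cv2.INTER_LINEAR)
    appearance = np.abs(dst.astype(np.float32) - warped.astype(np.float32)) / 255.0
    # Forward-backward consistency: large disagreement hints at occlusion.
    bwd_at_match = cv2.remap(bwd, map_x, map_y, cv2.INTER_LINEAR)
    occlusion = np.linalg.norm(fwd + bwd_at_match, axis=2)
    prob = 1.0 - np.exp(-(appearance + 0.1 * occlusion))
    return float(prob.mean())

def error_matrix(frames):
    """C[j, i]: predicted average error of frame i if its labels come from
    frame j, accumulated additively over the intermediate frames (a coarse
    approximation of the recursive accumulation described above)."""
    n = len(frames)
    C = np.zeros((n, n))
    for t in range(n - 1):
        C[t, t + 1] = step_error(frames[t], frames[t + 1])  # forward step
        C[t + 1, t] = step_error(frames[t + 1], frames[t])  # backward step
    for gap in range(2, n):
        for j in range(n - gap):
            C[j, j + gap] = C[j, j + gap - 1] + C[j + gap - 1, j + gap]
            C[j + gap, j] = C[j + gap, j + 1] + C[j + 1, j]
    return C
```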

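The poster's dynamic program and its three cases are only partially reproduced in this transcript, so the sketch below implements a simplified variant rather than the exact recurrence: each unlabeled frame is assumed to take its labels from whichever of its two nearest selected frames has the lower predicted error in C, and the DP picks the k frames that minimize the resulting total. The names `select_frames`, `head`, `mid`, and `tail` are illustrative.

```python
# Illustrative sketch (a simplified variant, not the paper's exact recurrence):
# given the N x N predicted-error matrix C, pick k frames so that every other
# frame takes labels from the cheaper of its two nearest selected frames.
import numpy as np

def select_frames(C, k):
    n = C.shape[0]
    assert 1 <= k <= n
    INF = float("inf")

    def head(i):   # frames 0..i-1 labeled backward from the first selection i
        return C[i, :i].sum()

    def tail(i):   # frames i+1..n-1 labeled forward from the last selection i
        return C[i, i + 1:].sum()

    def mid(j, i):  # frames strictly between two consecutive selections j < i
        if i - j <= 1:
            return 0.0
        return np.minimum(C[j, j + 1:i], C[i, j + 1:i]).sum()

    # best[b, i]: minimal expected error over frames 0..i when b frames are
    # selected and frame i is the b-th (latest) selection; tail added at the end.
    best = np.full((k + 1, n), INF)
    back = np.zeros((k + 1, n), dtype=int)
    for i in range(n):
        best[1, i] = head(i)
    for b in range(2, k + 1):
        for i in range(b - 1, n):
            for j in range(b - 2, i):
                cand = best[b - 1, j] + mid(j, i)
                if cand < best[b, i]:
                    best[b, i] = cand
                    back[b, i] = j
    # Close the sequence: add the one-way cost after the last selected frame.
    last = int(np.argmin(best[k] + np.array([tail(i) for i in range(n)])))
    selected = [last]
    for b in range(k, 1, -1):
        selected.append(int(back[b, selected[-1]]))
    return sorted(selected)

# Usage (assuming `frames` and the error_matrix sketch above):
# C = error_matrix(frames)
# picks = select_frames(C, k=5)
```

As written, the segment cost `mid` is recomputed inside the recurrence, giving roughly O(k·N³) time; precomputing segment costs tightens this, and either way it avoids enumerating all k-frame subsets as a naïve exhaustive search would.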