220 likes | 297 Vues
On-the-fly Specific Person Retrieval. Omkar M. Parkhi, Andrea Vedaldi and Andrew Zisserman. 24 th May 2012. University of Oxford. Overview. Textual Queries. Ranked Shots. Search for:. Large collection of u n-annotated videos. People “Barack Obama” “George Bush” “Courtney Cox”.
E N D
On-the-fly Specific Person Retrieval Omkar M. Parkhi, Andrea Vedaldi and Andrew Zisserman 24thMay 2012 University of Oxford
Overview Textual Queries Ranked Shots Search for: Large collection of un-annotated videos People “Barack Obama” “George Bush” “Courtney Cox” On-the-fly i.e. with no previous knowledge or model for these queries
ScrubsDataSet • 12 Episodes from Seasons 1-5 and 8 • 5 hours of video data • About 400k frames, partitioned into 5k shots • About 300k near frontal face detections • 768 x 576 MPEG2 format
Demo • Search for “Courteney Cox” in Scrubs dataset. • Steps: • Download example images from Google • Train a ranking function • Apply ranking function to video collection
On the fly person retrieval system OFF-LINE PROCESSING ON-LINE PROCESSING Text Query “CourteneyCox” Negative Training Images Facial Features & Descriptors Facial Features & Descriptors Video Collection Google Image Search “Courteney Cox” Fast Linear Classifier Face Tracks Ranking Facial Features & Descriptors Results
Detection and Tracking Viola-Jones face detection on each frame Tracking measures “connectedness” of a pair of faces by point tracks intersecting both Doesn’t require contiguous detections No drift Faces clustered into tracks [Everinghamet al. 2006, Apostoloff & Zisserman, 2007]
Scrubs Data Set • 12 Episodes from Seasons 1-5 and 8 • 5 hours of video data • About 400k frames, partitioned into 5k shots • 300k face detections • 6k face tracks • 768 x 576 MPEG2 format
Detecting facial feature points • Pictorial structure model • Joint model of feature appearance and position • [Felzenszwalband Huttenlocher’2004, Everinghamet al. 2006]
Face Appearance Representation • Affine transformation of face to canonical frame • Independent photometric normalization of parts • Represent gradients over circle centred on facial feature points • Feature descriptor is a 3849 dimensional vector • [Everinghamet al. 2006]
Negative Training Images • Combination of faces from • Random downloaded images • Labeled Faces in the Wild dataset • Caltech Faces dataset • About 16k face detections. Caltech 10, 000 Web Faces: http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/ Labeled Faces in the Wild: http://vis-www.cs.umass.edu/lfw/
On-the-fly Person Retrieval OFF-LINE PROCESSING ON-LINE PROCESSING Text Query “CourteneyCox” Negative Training Images Facial Features & Descriptors Facial Features & Descriptors Video Collection Google Image Search “Courteney Cox” Fast Linear Classifier Face Tracks Ranking Facial Features & Descriptors Results
TRECVid 2011 (IACC.1.B) • About 200 hours of video data. • 8k videos. • MPEG4, 320x240 pixels • 130k shots, • About 3 million face detections • 25,535 face tracks.
Facial attributes – FaceTracer project • Examples: • gender: male, female • age: baby, child, youth, middle age, senior • race: white, black, asian • smiling, mustache, eye-wear, hair colour • Method • person independent training set with attribute • facial feature representation • discriminative training of classifier for attribute N. Kumar, P. N. Belhumeur and S. K. Nayar, FaceTracer: A Search Engine for Large Collections of Images with Faces, European Conference on Computer Vision (ECCV), 2010 http://www.cs.columbia.edu/CAVE/projects/face_search/
Quantitative Performance - Scrubs Dataset • Performance evaluation for 3 guest actors (Brendan Fraser, Courteney Cox and Michael J Fox) • 12 dataset videos split into training and test sets (3 Training, 9 Testing) • Annotations: • Manual labeling of training and test set for each actor • Manual labeling of positive training images from Google • Negative training images from Caltech Faces dataset.
Quantitative Performance - Scrubs Dataset • Retrieval Average Precision (AP)
Quantitative Performance - Scrubs Dataset • Using more training data per track
Future Work • Exploring sources for positive examples • Better feature representations • Combination of attributes and identities