220 likes | 308 Vues
This system enables real-time search for specific individuals in large video collections without prior knowledge, utilizing fast linear classifiers, face tracks, and facial features. It includes offline and online processing, negative training images, and facial attribute detection.
E N D
On-the-fly Specific Person Retrieval Omkar M. Parkhi, Andrea Vedaldi and Andrew Zisserman 24thMay 2012 University of Oxford
Overview Textual Queries Ranked Shots Search for: Large collection of un-annotated videos People “Barack Obama” “George Bush” “Courtney Cox” On-the-fly i.e. with no previous knowledge or model for these queries
ScrubsDataSet • 12 Episodes from Seasons 1-5 and 8 • 5 hours of video data • About 400k frames, partitioned into 5k shots • About 300k near frontal face detections • 768 x 576 MPEG2 format
Demo • Search for “Courteney Cox” in Scrubs dataset. • Steps: • Download example images from Google • Train a ranking function • Apply ranking function to video collection
On the fly person retrieval system OFF-LINE PROCESSING ON-LINE PROCESSING Text Query “CourteneyCox” Negative Training Images Facial Features & Descriptors Facial Features & Descriptors Video Collection Google Image Search “Courteney Cox” Fast Linear Classifier Face Tracks Ranking Facial Features & Descriptors Results
Detection and Tracking Viola-Jones face detection on each frame Tracking measures “connectedness” of a pair of faces by point tracks intersecting both Doesn’t require contiguous detections No drift Faces clustered into tracks [Everinghamet al. 2006, Apostoloff & Zisserman, 2007]
Scrubs Data Set • 12 Episodes from Seasons 1-5 and 8 • 5 hours of video data • About 400k frames, partitioned into 5k shots • 300k face detections • 6k face tracks • 768 x 576 MPEG2 format
Detecting facial feature points • Pictorial structure model • Joint model of feature appearance and position • [Felzenszwalband Huttenlocher’2004, Everinghamet al. 2006]
Face Appearance Representation • Affine transformation of face to canonical frame • Independent photometric normalization of parts • Represent gradients over circle centred on facial feature points • Feature descriptor is a 3849 dimensional vector • [Everinghamet al. 2006]
Negative Training Images • Combination of faces from • Random downloaded images • Labeled Faces in the Wild dataset • Caltech Faces dataset • About 16k face detections. Caltech 10, 000 Web Faces: http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/ Labeled Faces in the Wild: http://vis-www.cs.umass.edu/lfw/
On-the-fly Person Retrieval OFF-LINE PROCESSING ON-LINE PROCESSING Text Query “CourteneyCox” Negative Training Images Facial Features & Descriptors Facial Features & Descriptors Video Collection Google Image Search “Courteney Cox” Fast Linear Classifier Face Tracks Ranking Facial Features & Descriptors Results
TRECVid 2011 (IACC.1.B) • About 200 hours of video data. • 8k videos. • MPEG4, 320x240 pixels • 130k shots, • About 3 million face detections • 25,535 face tracks.
Facial attributes – FaceTracer project • Examples: • gender: male, female • age: baby, child, youth, middle age, senior • race: white, black, asian • smiling, mustache, eye-wear, hair colour • Method • person independent training set with attribute • facial feature representation • discriminative training of classifier for attribute N. Kumar, P. N. Belhumeur and S. K. Nayar, FaceTracer: A Search Engine for Large Collections of Images with Faces, European Conference on Computer Vision (ECCV), 2010 http://www.cs.columbia.edu/CAVE/projects/face_search/
Quantitative Performance - Scrubs Dataset • Performance evaluation for 3 guest actors (Brendan Fraser, Courteney Cox and Michael J Fox) • 12 dataset videos split into training and test sets (3 Training, 9 Testing) • Annotations: • Manual labeling of training and test set for each actor • Manual labeling of positive training images from Google • Negative training images from Caltech Faces dataset.
Quantitative Performance - Scrubs Dataset • Retrieval Average Precision (AP)
Quantitative Performance - Scrubs Dataset • Using more training data per track
Future Work • Exploring sources for positive examples • Better feature representations • Combination of attributes and identities