
Performance Evaluation Measures for Face Detection Algorithms



Presentation Transcript


  1. Performance Evaluation Measures for Face Detection Algorithms Prag Sharma, Richard B. Reilly DSP Research Group, Department of Electronic and Electrical Engineering, University College Dublin, Ireland.

  2. Aim • To highlight the lack of standard performance evaluation measures for face detection purposes. • To propose a method for the evaluation and comparison of existing face detection algorithms in an unbiased manner. • To apply the proposed method to an existing face detection algorithm. DSP Research Group, University College Dublin

  3. Face Detection: Applications and Challenges Posed DSP Research Group, University College Dublin

  4. Need for Face Detection • Face Recognition • Intelligent Vision-based Human Computer Interaction • Object-based Video Processing • Content-based functionalities • Improved Coding Efficiency • Improved Error-Robustness • Content Description DSP Research Group, University College Dublin

  5. Challenges Associated with Face Detection • Pose Estimation and Orientation • Presence or Absence of Structural Components • Facial Expressions and Occlusion • Imaging Conditions DSP Research Group, University College Dublin

  6. Performance Evaluation Measures DSP Research Group, University College Dublin

  7. Need for Standard Performance Evaluation Measures • Research advances mainly through the comparison and testing of competing methods. • In order to obtain an impartial and empirical evaluation and comparison of any two methods, it is important to consider the following points: • Use of a standard and representative test set for evaluation. • Use of standard terminology for the presentation of results. DSP Research Group, University College Dublin

  8. Data Set Description: Standard and Representative Test Sets for Evaluation • MIT Test Set (Sung and Poggio): the first set contains 301 frontal and near-frontal mugshots of 71 different people; the second set contains 23 images with 149 faces in complex backgrounds. Most faces are frontal and upright. All images are greyscale. • CMU Test Set (Rowley et al.): 130 images with a total of 507 frontal faces; face sizes vary. Also contains 50 images with a total of 223 faces, 95% of them rotated by more than 10 degrees. All images are greyscale. • CMU Profile Test Set (Schneiderman and Kanade): 208 greyscale images with varying facial expressions and profile views. • Kodak Data Set: 80 colour images with 90% of the faces in frontal view; a wide variety of resolutions and face sizes. • UCD Colour Face Image Database: 100 colour images with 299 faces showing variations in pose, orientation and imaging conditions, as well as occlusion and presence or absence of structural components such as beards and glasses. DSP Research Group, University College Dublin

  9. Use of Standard Terminology • Lack of standard terminology to describe results leads to difficulty in comparing algorithms. • E.g., while one algorithm may consider a detection successful if the bounding box contains the eyes and mouth, another may require the entire face (including forehead and hair) to be enclosed in a bounding box for a positive result. [Figure: successful face detection by (a) Rowley et al. and (b) Hsu et al.] DSP Research Group, University College Dublin

  10. Use of Standard Terminology • Lack of standard terminology to describe results leads to difficulty in comparing algorithms. • Moreover, there may be differences in the definition of a face (e.g., cartoon, hand-drawn or human faces). DSP Research Group, University College Dublin

  11. Use of Standard Terminology • Therefore, the first step towards a standard evaluation protocol is to answer the following questions: • What is a face? • What constitutes successful face detection? DSP Research Group, University College Dublin

  12. Use of Standard Terminology • What is a face? • Several databases contain human faces, animal faces, cartoon faces, line-drawn faces, and both frontal and profile views. • MIT-23: contains 23 images with 149 faces. • MIT-20: contains only 20 images with 136 faces (excluding hand-drawn and cartoon faces). • CMU: Rowley established ground truth for 483 faces in this database, excluding some of the occluded faces and the non-human faces. Therefore, the total number of faces in a database can vary for different algorithms!! DSP Research Group, University College Dublin

  13. Use of Standard Terminology • To eliminate this problem: • Use only standard databases that come with clearly marked faces in terms of cartoon/human, pose, orientation, occlusion and presence or absence of structural components such as glasses or sunglasses. • Previous work in this area has led to the development of the UCD Colour Face Image Database. Each face in the database is marked using clearly defined terms (http://dsp.ucd.ie/~prag). • This eliminates any misinterpretation of pose variations, orientation etc. between researchers, as a fixed count of cartoon faces, hand-drawn faces and faces in different poses and orientations is provided with the database. DSP Research Group, University College Dublin

  14. Use of Standard Terminology • What constitutes successful face detection? • Most face detection algorithms do not clearly define a successful face detection process. • A uniform criterion should be adopted to define a successful detection. • (a) Test image. • (b) Possible face detection results to be classified as face or non-face. DSP Research Group, University College Dublin

  15. Use of Standard Terminology • What constitutes successful face detection? • Criterion adopted by Rowley: the center of the detected bounding box must be within four pixels and the scale must be within a factor of 1.2 (their scale step size) of the manually recorded ground truth. • Face detection results should be presented in such a manner that the interpretation of results is left open for specific applications. • Graphical representation: number of faces vs. percentage overlap. • Use a database that comes with hand-segmented results outlining each face, e.g. the UCD Colour Face Image Database. Therefore, a correct face detection is one in which the bounding box includes the visible eyes and the mouth region, and the overlap between the hand-segmented result and the detection result is greater than a fixed threshold (the threshold depending on the application); see the sketch below. DSP Research Group, University College Dublin
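A minimal sketch of the overlap part of this criterion, assuming axis-aligned bounding boxes given as (x_min, y_min, x_max, y_max). The function names and the 85% default threshold are illustrative only, and the requirement that the visible eyes and mouth lie inside the box is assumed to be captured by the hand-segmented ground truth.

```python
def overlap_percentage(ground_truth, detection):
    """Percentage of the hand-segmented ground-truth box covered by the detected box."""
    gx1, gy1, gx2, gy2 = ground_truth
    dx1, dy1, dx2, dy2 = detection
    # Intersection rectangle; width/height clamp to zero when the boxes do not overlap.
    inter_w = max(0, min(gx2, dx2) - max(gx1, dx1))
    inter_h = max(0, min(gy2, dy2) - max(gy1, dy1))
    gt_area = (gx2 - gx1) * (gy2 - gy1)
    return 100.0 * inter_w * inter_h / gt_area if gt_area > 0 else 0.0

def is_correct_detection(ground_truth, detection, threshold=85.0):
    """Correct detection when the overlap exceeds an application-dependent threshold."""
    return overlap_percentage(ground_truth, detection) >= threshold
```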

  16. Use of Standard Terminology • What constitutes successful face detection? • Use of standard terminology in describing results (see the sketch below): • Detection rate: the ratio of the number of faces correctly detected to the number of faces determined by a human expert (hand-segmented results). • False positives: an image region is declared to be a face but it is not. • False negatives: an image region that is a face is not detected at all. • False detections: false detections = false positives + false negatives. DSP Research Group, University College Dublin
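The terminology above translates directly into a few counts. The following is a minimal sketch, assuming detections have already been matched against the hand-segmented ground truth (for example with the overlap check sketched earlier); the function name and the example numbers are hypothetical.

```python
def summarise_results(num_ground_truth_faces, num_correct, num_false_positives):
    """num_ground_truth_faces: faces marked by a human expert;
    num_correct: detections that match a ground-truth face;
    num_false_positives: detections that do not correspond to any face."""
    detection_rate = num_correct / num_ground_truth_faces
    false_negatives = num_ground_truth_faces - num_correct  # faces missed entirely
    false_detections = num_false_positives + false_negatives
    return {
        "detection_rate": detection_rate,
        "false_positives": num_false_positives,
        "false_negatives": false_negatives,
        "false_detections": false_detections,
    }

# Hypothetical example: 299 ground-truth faces, 270 correctly detected, 15 false positives.
print(summarise_results(299, 270, 15))
```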

  17. Use of Standard Terminology • What constitutes successful face detection? • For methods that require training: • The number and variety of training examples have a direct effect on classification performance. • The training and execution time varies between algorithms. • Most of these systems can be tested at different threshold values to balance the detection rate against the number of false positives. DSP Research Group, University College Dublin

  18. Use of Standard Terminology • What constitutes successful face detection? • To standardize this variability: • Training should be completed on a separate dataset prior to testing. • The number and variety of training examples should be left to the algorithm developer. • The training and execution time should always be reported for all algorithms that require training. • All methods should present results in terms of an ROC curve (see the sketch below). DSP Research Group, University College Dublin
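As a rough illustration of the ROC recommendation, the sweep below records one (false positives, detection rate) point per threshold value. `run_detector` and `matches_ground_truth` are hypothetical stand-ins for the algorithm under test and the correct-detection criterion; this is a sketch, not a prescribed implementation.

```python
def roc_points(dataset, thresholds, run_detector, matches_ground_truth):
    """dataset: list of (image, ground_truth_boxes) pairs.
    Returns one (false positives, detection rate) pair per threshold, to be plotted."""
    total_faces = sum(len(gt_boxes) for _, gt_boxes in dataset)
    points = []
    for t in thresholds:
        correct, false_positives = 0, 0
        for image, gt_boxes in dataset:
            for detection in run_detector(image, threshold=t):
                # Simplification: multiple detections of the same face each count as correct.
                if any(matches_ground_truth(detection, gt) for gt in gt_boxes):
                    correct += 1
                else:
                    false_positives += 1
        points.append((false_positives, correct / total_faces))
    return points
```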

  19. Overall Procedure • Employ a colour face detection database that comes with hand-segmented results in the form of eye and mouth coordinates along with segmented face regions. • The face database should also describe the faces in standard terminology of pose, orientation, occlusion and presence of structural components, along with the type of faces (hand-drawn, cartoon, etc.). • Clearly define the type of faces the algorithm can detect. DSP Research Group, University College Dublin

  20. Overall Procedure • For algorithms that require training, the training should be completed prior to testing, using face recognition databases for the face class and the boot-strap training technique for the non-face class. • All results should be presented in the form of two graphical plots: ROC curves to show the correct-detection/false-positive trade-off, and the "number of faces vs. percentage overlap" plot for determining correct face detection. • All results should also include the training and execution times for comparison. DSP Research Group, University College Dublin

  21. Presentation of Results • The above procedure is applied to the performance evaluation of a previously developed face detection algorithm as follows: • The colour face detection database chosen is the HHI MPEG-7 image database. • The algorithm developed does not require any training before execution. • The results are presented in terms of number of faces vs. percentage overlap for the HHI MPEG-7 database (see figure). • Since there is no adjustable threshold, the ROC curve is not presented. • The execution time is 3.54 seconds/image on a Pentium III processor. DSP Research Group, University College Dublin

  22. Presentation of Results • [Figure: No. of Faces vs Percentage Overlap — histogram with percentage overlap (0–100%) on the x-axis and number of faces on the y-axis.] The graph shows that there are 13 faces with no overlap (i.e. false detections) and 43 faces with over 85% overlap with the hand-segmented results. DSP Research Group, University College Dublin
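For reference, a plot of this kind can be produced by collecting one overlap value per hand-segmented face (0% when a face is missed entirely) and binning the values. The sketch below assumes matplotlib is available; the overlap values shown are made up, not the HHI MPEG-7 results.

```python
import matplotlib.pyplot as plt

def plot_overlap_histogram(overlaps, bin_width=10):
    """Histogram of per-face overlap percentages between detections and ground truth."""
    plt.hist(overlaps, bins=range(0, 101, bin_width))
    plt.xlabel("Percentage Overlap")
    plt.ylabel("No. of Faces")
    plt.title("No. of Faces vs Percentage Overlap")
    plt.show()

# Made-up overlap values for illustration only.
plot_overlap_histogram([0, 0, 35, 60, 72, 85, 88, 90, 92, 95, 97, 100])
```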

  23. Conclusions • This paper highlights the problems associated with evaluating and comparing the performance of new and existing face detection methods in an unbiased manner. • A solution in the form of a standard procedure for the evaluation and presentation of results has been presented. • The evaluation procedure described in this paper concentrates on using standard terminology along with carefully labelled face databases for evaluation purposes. • The method also recommends that results be presented graphically: ROC curves to show the correct-detection/false-positive trade-off, and the "number of faces vs. percentage overlap" plot to determine correct face detection accuracy. DSP Research Group, University College Dublin

  24. Questions?? DSP Research Group, University College Dublin
