Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests
Danhang Tang, Tsz-Ho Yu, Tae-kyun Kim (Imperial College London, UK)
Su-A Kim, 3rd June 2014
※ These slides excerpt parts of the authors' oral presentation at ICCV 2013.
Challenges for Hands
• Viewpoint changes and self-occlusions
• The discrepancy between synthetic and real data is larger than for the human body
• Labeling is difficult and tedious
Su-A Kim 3rd June 2014 @CVLAB
Method
• Viewpoint changes and self-occlusions -> Hierarchical Hybrid Forest
• The discrepancy between synthetic and real data is larger than for the human body -> Transductive Learning
• Labeling is difficult and tedious -> Semi-supervised Learning
Existing Approaches
• Generative approach: uses explicit hand models to recover the hand pose
  - Optimization-based: optimizing the current hypothesis relies on the previous result
  - Examples: Hamer et al. ICCV 2009; Ballan et al. ECCV 2012 (motion capture); De La Gorce et al. PAMI 2010; Oikonomidis et al. ICCV 2011
• Discriminative approach: learns a mapping from visual features to the target parameter space, such as joint labels or joint coordinates (i.e. hand poses), from a labelled training dataset
  - Classification, regression, ...
  - Each frame is handled independently, which allows recovery from errors
  - Examples: Keskin et al. ECCV 2012; Wang et al. SIGGRAPH 2009; Stenger et al. IVC 2007; Xu and Cheng ICCV 2013
Discriminative Approach
• Has achieved great success in human body pose estimation
• Efficient: runs in real time
• Accurate: works frame by frame and does not rely on tracking
• Requires a large dataset to cover many poses
• Trained on synthetic data, tested on real data
Hierarchical Hybrid Forest
• STR forest quality function: $Q_{apv} = \alpha Q_a + (1-\alpha)\beta Q_p + (1-\alpha)(1-\beta) Q_v$
• Viewpoint classification, $Q_a$: viewpoint classification quality (information gain), evaluated over all the viewpoint labels in the dataset
• Finger joint classification, $Q_p$: joint label classification quality (information gain), measuring how well individual patches are classified
• Pose regression, $Q_v$: compactness of the voting vectors (trace of their covariance matrix)
• $(\alpha, \beta)$: margin measures of the viewpoint labels and joint labels, where the margin is the difference between the highest and the second-highest class posterior in a node
• Using all three terms together is slow.
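To make the weighting concrete, here is a minimal Python sketch of the combined quality function and of the margin described on the slide. The function names and the thresholded `switch` rule are assumptions for illustration only; the slides state that α and β are margin measures but do not say how a margin is turned into a coefficient.

```python
def margin(posterior):
    """Difference between the highest and second-highest class posterior in a node."""
    top, second = sorted(posterior, reverse=True)[:2]
    return top - second


def switch(posterior, threshold):
    """Hypothetical rule (not from the slides): return 1.0 while the classes in
    the node are still confused (small margin), otherwise 0.0."""
    return 1.0 if margin(posterior) < threshold else 0.0


def q_apv(q_a, q_p, q_v, alpha, beta):
    """Combined quality: alpha*Qa + (1-alpha)*beta*Qp + (1-alpha)*(1-beta)*Qv."""
    return alpha * q_a + (1 - alpha) * beta * q_p + (1 - alpha) * (1 - beta) * q_v


# With binary coefficients only one term is active at a time:
viewpoint_stage = q_apv(0.3, 0.5, 0.8, alpha=1.0, beta=0.0)   # uses Qa only
joint_stage = q_apv(0.3, 0.5, 0.8, alpha=0.0, beta=1.0)       # uses Qp only
regression_stage = q_apv(0.3, 0.5, 0.8, alpha=0.0, beta=0.0)  # uses Qv only
```

Under this assumed rule a node evaluates a single term at any time, which is one way to avoid the cost of computing all three terms together.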
Transductive Learning
• Source space: synthetic data S. Target space: realistic data R.
• Training data D = {R_l, R_u, S}, where R_l is the labeled real data, R_u the unlabeled real data, and S the synthetic data
• Synthetic data S:
  - Generated from an articulated hand model, where |S| >> |R|
  - All labeled
• Realistic data R:
  - Captured from a Primesense depth sensor
  - A small part of R, R_l, is labeled manually; the rest forms the unlabeled set R_u
Transductive Term Q_t
• Source space: synthetic data S. Target space: realistic data R.
• Training data D = {R_l, R_u, S}
• Similar data points in R_l and S are paired by nearest-neighbour matching; a split function that separates a pair is penalized
• Q_t is the ratio of associations preserved after a split: the fraction of matched pairs (association indicator equal to 1) whose two members pass down to the same child node, left or right
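The ratio of preserved associations can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the authors' code: the function name, the sample identifiers, and the convention of returning 1.0 for a node without pairs are all assumptions.

```python
def transductive_quality(pairs, goes_left):
    """Ratio of real/synthetic associations preserved by a candidate split.

    pairs:     list of (real_id, synthetic_id) associations present in the node
    goes_left: dict mapping each sample id to True (left child) or False (right)
    """
    if not pairs:
        return 1.0  # assumption: with no associations, nothing can be broken
    # A pair is preserved when both members are sent to the same child node.
    preserved = sum(1 for r, s in pairs if goes_left[r] == goes_left[s])
    return preserved / len(pairs)


pairs = [("r0", "s0"), ("r1", "s1"), ("r2", "s2"), ("r3", "s3")]
goes_left = {
    "r0": True, "s0": True,    # both left: preserved
    "r1": False, "s1": False,  # both right: preserved
    "r2": True, "s2": False,   # separated: penalized
    "r3": False, "s3": False,  # both right: preserved
}
q_t = transductive_quality(pairs, goes_left)
```

A split that keeps every pair together scores 1, and each separated pair lowers the score, which is the penalty the slide refers to.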
Semi-supervised Term Q_u
• Source space: synthetic data S. Target space: realistic data R.
• Training data D = {R_l, R_u, S}
• Q_u evaluates the appearance similarity of all realistic patches R, labeled and unlabeled, within a node
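The slide does not give a formula for Q_u, so the following Python sketch shows only one plausible way to score appearance similarity: a split scores higher when the realistic patches in each child have a small spread in feature space. The spread measure (sum of per-dimension variances, i.e. the trace of the covariance), the size weighting, and the function names are assumptions.

```python
from statistics import pvariance


def spread(patches):
    """Sum of per-dimension variances of the patch feature vectors
    (equal to the trace of their covariance matrix)."""
    if len(patches) < 2:
        return 0.0
    return sum(pvariance(dim) for dim in zip(*patches))


def unsupervised_quality(left, right):
    """Assumed score in (0, 1]: higher when both children are compact."""
    n = len(left) + len(right)
    if n == 0:
        return 1.0  # assumption: an empty node has nothing to separate
    weighted = (len(left) / n) * spread(left) + (len(right) / n) * spread(right)
    return 1.0 / (1.0 + weighted)


# Toy 2-D appearance features for four realistic patches.
good_split = unsupervised_quality([(0, 0), (0, 0)], [(10, 10), (10, 10)])
bad_split = unsupervised_quality([(0, 0), (10, 10)], [(0, 0), (10, 10)])
```

No labels are used in this score, which is why the unlabeled set R_u can contribute to training.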
Kinematic Refinement
• For each joint, the votes are modelled with a GMM, and the Euclidean distance between the two Gaussian modes is measured
• Based on this distance, each joint is marked as high confidence or low confidence
• The high-confidence joints are used to query a large joint position database; for the uncertain joints, the positions that are close to the result of the query are chosen
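The refinement step can be sketched as follows, assuming the two GMM modes per joint have already been fitted. The threshold `t_q`, the rule that a small gap between modes means high confidence, the squared-distance query, and all names are assumptions made for illustration.

```python
import math


def refine_pose(modes, weights, database, t_q=10.0):
    """modes[j]   = (mode_a, mode_b), the two Gaussian mode centres of joint j
    weights[j] = (w_a, w_b), the corresponding mixture weights
    database   = list of stored poses, each a list of 3-D joint positions
    t_q        = hypothetical distance threshold"""
    # Euclidean distance between the two modes of every joint.
    gaps = [math.dist(a, b) for a, b in modes]
    # Assumption: a small gap means the votes agree, so the joint is confident.
    high = [g < t_q for g in gaps]
    # Start from the heavier mode of every joint.
    pose = [m[0] if w[0] >= w[1] else m[1] for m, w in zip(modes, weights)]
    if all(high) or not any(high):
        return pose, high  # nothing to refine, or nothing to query with

    # Query the database using only the high-confidence joints.
    def cost(candidate):
        return sum(math.dist(candidate[j], pose[j]) ** 2
                   for j, h in enumerate(high) if h)

    nearest = min(database, key=cost)
    # For each uncertain joint, keep the mode closest to the retrieved pose.
    for j, h in enumerate(high):
        if not h:
            pose[j] = min(modes[j], key=lambda m: math.dist(m, nearest[j]))
    return pose, high


modes = [((0, 0, 0), (1, 0, 0)), ((0, 0, 0), (100, 0, 0))]
weights = [(0.9, 0.1), (0.6, 0.4)]
database = [[(0, 0, 0), (95, 0, 0)], [(50, 0, 0), (0, 0, 0)]]
pose, high = refine_pose(modes, weights, database)
```

In this toy example the second joint has two distant modes, so the retrieved database pose decides between them and overrides the heavier but implausible mode.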
Experimental Settings
• Training data:
  - Synthetic data (337.5K images)
  - Real data (81K images, <1.2K labeled)
• Evaluation data: three different testing sequences
  - Sequence A: single viewpoint (450 frames)
  - Sequence B: multiple viewpoints, with slow hand movements (1000 frames)
  - Sequence C: multiple viewpoints, with fast hand movements (240 frames)
Self-comparison Experiment
• The graph shows the joint classification accuracy on Sequence A.
• The realistic and synthetic baselines produce similar accuracies.
• Using the transductive term is better than simply augmenting the real data with synthetic data.
• Using all terms together achieves the best results.
References
[1] Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture, CVPR, 2014
[2] A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, 2010
[3] Motion Capture of Hands in Action using Discriminative Salient Points, ECCV, 2012