Image acquisition • Image acquisition at BBC R&D studios in London using eight different viewpoints. • Frame-by-frame segmentation of each sequence is provided by a chroma keying system using a non-blue screen. A special camera is fitted with blue LED lights. Figure: room equipped with image acquisition systems.
Image acquisition • Example of eight shots from synchronized cameras:
Image segmentation • Chroma keying segmentation enhanced by the Shen algorithm. Figure panels: original image; chroma keying application featuring the Shen algorithm; final result.
3D Reconstruction • Frame-by-frame 3D reconstruction using shots from different viewpoints. • Software developed in ISPG by Federico Lupica specifically to achieve this goal. Figure: a screenshot of the 3D reconstruction system.
3D Reconstruction • Example of voxelset creation by 3D intersection of the visual hulls projected from the segmented edges. This procedure is repeated for each frame of a gesture sequence.
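The intersection step can be illustrated with a minimal voxel-carving sketch. It is not the ISPG software: it assumes calibrated cameras described by 3x4 projection matrices and binary silhouette masks, and the function name `carve_visual_hull` and all parameter names are hypothetical. A voxel is kept only if it projects inside the silhouette of every camera.

```python
import numpy as np

def carve_visual_hull(silhouettes, projections, points):
    """Keep the 3D points whose projection falls inside every silhouette.

    silhouettes: list of 2D boolean arrays (True = foreground), one per camera.
    projections: list of 3x4 projection matrices (assumed known from calibration).
    points:      (N, 3) array of candidate voxel centres in world coordinates.
    Returns a boolean mask of length N marking the occupied voxels.
    """
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    occupied = np.ones(len(points), dtype=bool)
    for mask, P in zip(silhouettes, projections):
        uvw = homogeneous @ P.T
        w = uvw[:, 2]
        in_front = w > 1e-9                      # ignore points behind the camera
        safe_w = np.where(in_front, w, 1.0)      # avoid division by zero
        u = np.round(uvw[:, 0] / safe_w).astype(int)  # pixel column
        v = np.round(uvw[:, 1] / safe_w).astype(int)  # pixel row
        height, width = mask.shape
        inside = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
        hit = np.zeros(len(points), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]
        occupied &= hit                          # intersection over all views
    return occupied
```

Running this once per frame, with the eight silhouettes of that frame, would give one voxelset per frame of the sequence.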
Feature extraction • Global body barycentre, height and horizontal-plane projection are derived from each voxelset representing an action shot. Figure labels: height; barycentre; horizontal projection.
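A minimal sketch of these three global features, assuming each voxelset is stored as an (N, 3) array of occupied voxel centres and that the third coordinate is the vertical one. The function name `global_features` and the `vertical_axis` parameter are illustrative, not taken from the original software.

```python
import numpy as np

def global_features(voxels, vertical_axis=2):
    """Barycentre, height and horizontal projection of one voxelset.

    voxels: (N, 3) array of occupied voxel centres for a single frame.
    """
    barycentre = voxels.mean(axis=0)
    # Height taken as the vertical extent between lowest and highest voxel centre.
    vertical = voxels[:, vertical_axis]
    height = vertical.max() - vertical.min()
    # Horizontal projection: drop the vertical coordinate and remove duplicates.
    horizontal_axes = [a for a in range(3) if a != vertical_axis]
    projection = np.unique(voxels[:, horizontal_axes], axis=0)
    return barycentre, height, projection
```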
Feature extraction • A first rough estimate of the motion direction is the major axis of the ellipse fitted to the shape previously projected onto the horizontal plane. Figure label: estimated motion direction.
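The slide does not say how the ellipse is fitted. One common choice, assumed here, is the ellipse of second-order moments, whose major axis is the eigenvector of the 2D covariance matrix with the largest eigenvalue. The function name `major_axis_direction` is hypothetical.

```python
import numpy as np

def major_axis_direction(points_2d):
    """Unit vector along the major axis of the moment ellipse of 2D points.

    points_2d: (N, 2) array, e.g. the horizontal projection of a voxelset.
    The sign of the returned vector is arbitrary: the axis has no orientation.
    """
    centred = points_2d - points_2d.mean(axis=0)
    covariance = centred.T @ centred / len(points_2d)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # symmetric matrix
    return eigenvectors[:, np.argmax(eigenvalues)]
```

Because the axis is unoriented, this estimate alone cannot tell forward from backward, which is consistent with calling it a rough estimate.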
Feature extraction • Voxelsets are divided into six fundamental body regions, each marked with a barycentre. The K-means algorithm performs the clustering. Regions: head/shoulders, right arm, left arm, abdomen, left leg, right leg.
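A plain K-means (Lloyd) iteration is enough to sketch this step. How the six centres are initialised is not stated in the slides, so the initial centres are passed in explicitly here; that parameter, like the function names, is an assumption made for illustration.

```python
import numpy as np

def assign_labels(points, centres):
    """Index of the nearest centre for every point."""
    distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(points, initial_centres, iterations=50):
    """Lloyd's algorithm: returns (cluster barycentres, label per point).

    points:          (N, 3) voxel centres of one frame.
    initial_centres: (K, 3) starting guesses, K = 6 for the six body regions.
    """
    centres = np.array(initial_centres, dtype=float)
    for _ in range(iterations):
        labels = assign_labels(points, centres)
        new_centres = centres.copy()
        for k in range(len(centres)):
            members = points[labels == k]
            if len(members):                 # leave empty clusters where they are
                new_centres[k] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, assign_labels(points, centres)
```

Seeding the centres from the previous frame's barycentres would keep the region labels consistent along a sequence, although the slides do not say whether this was done.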
Feature extraction • Another, more accurate motion direction estimate is computed through an LDA (Linear Discriminant Analysis) based method designed specifically for this project. Vectors defining the LDA plane: projecting the leg and abdomen datasets onto this plane guarantees the maximum ratio of between-class variance to within-class variance. Note: this plane represents a slice of maximum separability of the legs, hence it can be seen as orthogonal to the motion direction. The abdomen dataset is included in the LDA computation to keep the plane vertical; otherwise the plane could be tilted by the obliquity of the legs.
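The following sketch shows one way to obtain such a plane with classical multi-class LDA. It assumes three classes (left leg, right leg, abdomen), so that the two leading discriminant vectors span a plane; the exact formulation used in the project is not given in the slides, and the function name `lda_motion_direction` is hypothetical.

```python
import numpy as np

def lda_motion_direction(class_points, vertical_axis=2):
    """Horizontal unit normal of the plane spanned by the two leading LDA vectors.

    class_points: list of (N_i, 3) arrays, assumed [left_leg, right_leg, abdomen].
    The sign of the result is arbitrary.
    """
    overall_mean = np.vstack(class_points).mean(axis=0)
    within = np.zeros((3, 3))    # within-class scatter matrix
    between = np.zeros((3, 3))   # between-class scatter matrix
    for pts in class_points:
        mean = pts.mean(axis=0)
        centred = pts - mean
        within += centred.T @ centred
        offset = (mean - overall_mean)[:, None]
        between += len(pts) * (offset @ offset.T)
    # Discriminant directions are eigenvectors of within^-1 @ between.
    eigenvalues, eigenvectors = np.linalg.eig(np.linalg.solve(within, between))
    order = np.argsort(eigenvalues.real)[::-1]
    plane_vectors = eigenvectors.real[:, order[:2]]
    normal = np.cross(plane_vectors[:, 0], plane_vectors[:, 1])
    normal[vertical_axis] = 0.0  # project the normal onto the horizontal plane
    return normal / np.linalg.norm(normal)
```

With the legs separated left to right and the abdomen above them, the plane comes out roughly frontal, so its horizontal normal points along the walking direction.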
Feature extraction • The normal to the LDA plane, projected onto the horizontal plane, can be taken as an estimate of the motion direction. • Finally, the set of features used for the subsequent statistical computations consists of the 3D coordinates of each cluster barycentre, expressed in a reference frame defined by the motion direction, the normal to the motion direction (on the horizontal plane) and the original vertical axis. This reference frame is centred on the global barycentre.
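The change of reference frame described above can be sketched as follows, assuming a unit vertical axis along z and a right-handed frame. The function name `to_body_frame` and the ordering of the output coordinates (forward, lateral, vertical) are illustrative choices.

```python
import numpy as np

def to_body_frame(cluster_barycentres, global_barycentre, motion_direction,
                  vertical=(0.0, 0.0, 1.0)):
    """Express cluster barycentres in the body-centred frame.

    Output columns: coordinate along the motion direction, along its
    horizontal normal, and along the vertical axis (assumed unit length).
    """
    vertical = np.asarray(vertical, dtype=float)
    forward = np.asarray(motion_direction, dtype=float)
    forward = forward - (forward @ vertical) * vertical  # keep the horizontal part
    forward /= np.linalg.norm(forward)
    lateral = np.cross(vertical, forward)                # horizontal normal
    axes = np.stack([forward, lateral, vertical])        # rows are the new axes
    relative = np.asarray(cluster_barycentres, dtype=float) - global_barycentre
    return relative @ axes.T
```

Stacking the six transformed barycentres of a frame gives an 18-dimensional feature vector for that frame.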
Feature extraction (conclusions) • Features are now trajectory independent and fixed to the moving body. • The alternation of the forward and backward leg during walking unavoidably perturbs the motion direction estimate, which rotates from front to back and vice versa. As a consequence, the patterns traced by the features in the new reference frame undergo rigid rotations. The subsequent recognition system could take advantage of this: it is another degree of freedom that conveys important information to the statistical modelling process.
Gestures modeling • An HMM (Hidden Markov Model) is used to model the set of features representing the evolution of an entire action over time. • The E.M. (Expectation-Maximization) algorithm is used to build the most likely HMM for an action. • Each HMM is characterized by a set of matrices:
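For reference, the textbook parameterisation of an HMM is given below. It is assumed, not confirmed by the slides, that the project follows this convention; the symbols are the usual ones, with q_t the hidden state and o_t the observation at time t.

```latex
\lambda = (A, B, \pi), \qquad
A = [a_{ij}], \quad a_{ij} = P(q_{t+1} = j \mid q_t = i), \qquad
B = [b_j(o)], \quad b_j(o) = P(o_t = o \mid q_t = j), \qquad
\pi = [\pi_i], \quad \pi_i = P(q_1 = i)
```

For continuous features such as barycentre coordinates, b_j is typically a probability density (for example a Gaussian per state) instead of a table.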
Gestures modeling • Steps followed by the E.M. algorithm:
Gestures modeling • Example of a military marching action composed of 110 frames, interpreted by a three-state HMM. Plot axes: states versus frame number.
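A state-versus-frame plot like this one is usually obtained by decoding the most likely state sequence. The sketch below uses the Viterbi algorithm in the log domain; whether the original system used Viterbi is an assumption, and discrete observation symbols are used only to keep the example short.

```python
import numpy as np

def viterbi(observations, log_pi, log_A, log_B):
    """Most likely hidden state for every frame of an observation sequence.

    observations: sequence of integer symbols (assumed discrete emissions).
    log_pi: (N,) log initial probabilities; log_A: (N, N) log transitions,
    indexed [from, to]; log_B: (N, M) log emission probabilities.
    """
    T, N = len(observations), len(log_pi)
    score = np.empty((T, N))
    backpointer = np.zeros((T, N), dtype=int)
    score[0] = log_pi + log_B[:, observations[0]]
    for t in range(1, T):
        candidates = score[t - 1][:, None] + log_A   # rows: from, columns: to
        backpointer[t] = candidates.argmax(axis=0)
        score[t] = candidates.max(axis=0) + log_B[:, observations[t]]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = backpointer[t + 1, path[t + 1]]
    return path
```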
Model clustering • HMM models are clustered by defining a metric, which turns the set of models into a model space. • The Kullback-Leibler distance is used as the metric between HMM models. It is defined as:
Model clustering • Example of K-L distance iterative computation between two HMM models (two instances of military marching action).
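A common way to compute such a distance between two HMMs is a Monte Carlo estimate: draw a long sequence O from the first model and compare its per-frame log-likelihood under both models, D(λ1, λ2) ≈ (1/T) [log P(O | λ1) − log P(O | λ2)]. The sketch below assumes this estimator and discrete emissions; the slides do not confirm either choice, and all function names are hypothetical.

```python
import numpy as np

def sample_sequence(pi, A, B, length, rng):
    """Draw `length` observation symbols from a discrete HMM (pi, A, B)."""
    state = rng.choice(len(pi), p=pi)
    observations = []
    for _ in range(length):
        observations.append(rng.choice(B.shape[1], p=B[state]))
        state = rng.choice(len(pi), p=A[state])
    return np.array(observations)

def log_likelihood(observations, pi, A, B):
    """Scaled forward algorithm: log P(observations | model)."""
    alpha = pi * B[:, observations[0]]
    total = 0.0
    for obs in observations[1:]:
        scale = alpha.sum()
        total += np.log(scale)                  # accumulate the scaling factors
        alpha = ((alpha / scale) @ A) * B[:, obs]
    return total + np.log(alpha.sum())

def kl_distance(model_1, model_2, length=2000, seed=0):
    """Estimate of the K-L rate from model_1 to model_2; models are (pi, A, B)."""
    rng = np.random.default_rng(seed)
    observations = sample_sequence(*model_1, length, rng)
    return (log_likelihood(observations, *model_1)
            - log_likelihood(observations, *model_2)) / length

def symmetric_kl_distance(model_1, model_2, length=2000, seed=0):
    """Average of both directions, so the value does not depend on argument order."""
    return 0.5 * (kl_distance(model_1, model_2, length, seed)
                  + kl_distance(model_2, model_1, length, seed))
```

Evaluating `kl_distance` for increasing `length` shows the estimate settling as the sequence grows, which is one possible reading of the iterative computation mentioned above.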
Recognition • Gesture recognition is performed by computing a new HMM from a set of features over time and then classifying it in the space containing the HMM clusters. • If the distance between the new model and every cluster barycentre is over a threshold, the gesture is considered a new one and the system begins to build a new cluster to hold it.
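The decision rule can be sketched as a nearest-barycentre search with a rejection threshold. The distance function is passed in as a parameter, and the naming scheme for new clusters is purely hypothetical; how a cluster barycentre is computed in model space is not specified in the slides.

```python
def recognise(new_model, cluster_barycentres, distance, threshold):
    """Return the label of the closest cluster, or create a new cluster.

    cluster_barycentres: dict mapping gesture label -> barycentre model.
    distance:            callable(model_a, model_b) -> float.
    """
    best_label, best_distance = None, float("inf")
    for label, barycentre in cluster_barycentres.items():
        d = distance(new_model, barycentre)
        if d < best_distance:
            best_label, best_distance = label, d
    if best_distance > threshold:
        # Every cluster is too far away: treat this as a new gesture.
        best_label = f"gesture_{len(cluster_barycentres)}"  # hypothetical name
        cluster_barycentres[best_label] = new_model         # seed the new cluster
    return best_label
```

Checking only the smallest distance is equivalent to checking that every distance exceeds the threshold.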
Future work • Define a consistent measure to quantify the SNR involved in action recognition, distinguishing what is important in the feature signals for the recognition system from what can be considered noise. • Design an efficient filtering system to maximize this SNR, probably using constraints given by body joints. • Enhance the HMM-based recognition system and, if necessary, search for another recognition engine. • Implement a more efficient software version of the system, possibly achieving real-time recognition.