
Combined Gesture-Speech Analysis and Synthesis

  1. Combined Gesture-Speech Analysis and Synthesis M. Emre Sargın, Engin Erzin, Yücel Yemez, A. Murat Tekalp {msargin,eerzin,yyemez,mtekalp}@ku.edu.tr Multimedia Vision and Graphics Laboratory, Koc University The SIMILAR NoE Summer Workshop 2005

  2. Outline • Project Objective • Technical Description • Preparation of Gesture-Speech Database • Detection of Gesture Elements • Gesture-Speech Correlation Analysis • Synthesis of Gestures Accompanying Speech • Resources • Work Plan • Team Members

  3. Project Objective • The production of speech and gesture is interactive throughout the entire communication process. • Human-computer interaction systems should reflect this: in an edutainment application, for example, an animated character’s speech should be aided and complemented by its gestures. • Two main goals of this project: • Analysis and modeling of the correlation between speech and gestures. • Synthesis of correlated, natural gestures accompanying speech.

  4. Technical Description • Preparation of Gesture-Speech Database • Detection of Gesture Elements • Gesture-Speech Correlation Analysis • Synthesis of Gestures Accompanying Speech

  5. Preparation of Database • Gestures of a specific person will be investigated. • The video database recorded for that person should include the gestures that he/she uses frequently. • The locations of the head, arms, elbows, etc. should be easy to detect and track.

  6. Detection of Gesture Elements • In this project, we consider arm and head gestures. • The main tasks in detecting gesture elements: • Tracking of the head region. • Tracking of the hand, and possibly the shoulder and elbow. • Extraction of gesture features. • Recognition and labeling of gestures.

  7. Head Region Tracking • To extract motion information from the head, one must first locate the head region. • An exhaustive search for the head in every frame is a possible solution, but it is computationally inefficient. • Tracking is efficient in terms of computational complexity. • The motion information computed for tracking will also be used as head gesture features.

  8. Tracking Methodology • Exhaustive search for the head region in the initial frame • Haar-based face detection • Skin color information • Extraction of motion information from the head region • Optical flow vectors • Fitting global motion parameters to the optical flow vectors • Warp the search window according to the motion information. • Search for the head region in the search window.
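The global-motion fitting step above can be sketched as a least-squares problem: given sparse optical-flow vectors sampled inside the head region, fit affine motion parameters and use them to warp the search window for the next frame. A minimal numpy sketch under that assumption (function and variable names are illustrative, not taken from the original system):

```python
import numpy as np

def fit_affine_motion(points, flows):
    """Least-squares fit of affine motion parameters to optical-flow vectors.

    points: (N, 2) pixel coordinates (x, y) inside the head region
    flows:  (N, 2) optical-flow displacements (dx, dy) at those points
    Returns six affine parameters [a1..a6] such that
        dx = a1*x + a2*y + a3,   dy = a4*x + a5*y + a6.
    """
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])   # design matrix, (N, 3)
    # dx and dy give two independent least-squares problems.
    px, *_ = np.linalg.lstsq(A, flows[:, 0], rcond=None)
    py, *_ = np.linalg.lstsq(A, flows[:, 1], rcond=None)
    return np.concatenate([px, py])

def warp_window(corners, params):
    """Shift search-window corner points by the predicted affine motion."""
    a = params
    x, y = corners[:, 0], corners[:, 1]
    dx = a[0] * x + a[1] * y + a[2]
    dy = a[3] * x + a[4] * y + a[5]
    return corners + np.column_stack([dx, dy])
```

The fitted parameters double as the head-gesture motion features mentioned on the next slides, so the same computation serves both tracking and feature extraction.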

  9. Head Tracking Results

  10. Hand Tracking Methodology • The hand region will be extracted using skin color information. • Robust state-space tracking will be applied. • Observations are the hand position. • States are the position, velocity and acceleration of the hand. • Kalman filtering removes unwanted noise from the features. • In the regular Kalman filter, the parameters are fixed. • In the robust Kalman filter, the parameters are re-adjusted at each iteration to minimize the MSE and to overcome the effects of abrupt changes in hand motion.
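The state-space model above can be sketched as a constant-acceleration Kalman filter on one coordinate of the hand position (run one filter per axis). The robust parameter re-adjustment is omitted here, and the dynamics matrices and noise levels are illustrative assumptions, not the original system's values:

```python
import numpy as np

dt = 1.0  # time step of one video frame
# State: [position, velocity, acceleration]; observation: position only.
F = np.array([[1., dt, 0.5 * dt * dt],
              [0., 1., dt],
              [0., 0., 1.]])            # constant-acceleration dynamics
H = np.array([[1., 0., 0.]])            # we observe only the hand position
Q = 1e-3 * np.eye(3)                    # process noise covariance (assumed)
R = np.array([[0.25]])                  # measurement noise covariance (assumed)

def kalman_track(measurements):
    """Filter a sequence of noisy 1-D hand positions; return smoothed positions."""
    x = np.array([measurements[0], 0., 0.])    # initial state estimate
    P = np.eye(3)                              # initial state covariance
    out = []
    for z in measurements:
        # Predict step
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + (K @ (np.array([z]) - H @ x)).ravel()
        P = (np.eye(3) - K @ H) @ P
        out.append(x[0])
    return np.array(out)
```

A robust variant, as the slide describes, would re-estimate Q and R (or rescale the gain) at each step when the innovation `z - H @ x` is abnormally large, so abrupt hand movements are not over-smoothed.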

  11. Extraction of Gesture Features • Head Gesture Features: the global motion parameters computed within the head region will be used. • Hand Gesture Features: the position of the hand’s center of mass and its estimated velocity will form the hand gesture features.
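The velocity component of the hand features can be estimated from the tracked center-of-mass positions by finite differences. A small sketch (the function name and frame rate are illustrative assumptions):

```python
import numpy as np

def hand_features(centroids, fps=25.0):
    """Stack hand center-of-mass positions with finite-difference velocities.

    centroids: (T, 2) array of hand center-of-mass positions, one row per frame.
    Returns a (T, 4) feature array [x, y, vx, vy].
    """
    # Central differences in the interior, one-sided at the sequence ends.
    vel = np.gradient(centroids, 1.0 / fps, axis=0)
    return np.hstack([centroids, vel])
```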

  12. Gesture-Speech Correlation Analysis • Recognized gestures are labeled w.r.t. time. • Head Gestures: Down, Up, Left, Right, Left-Right, … • Arm Gestures: Abduction, Adduction, Extension, … • Recognized speech patterns are labeled w.r.t. time. • Semantic Info: Approval, Refusal phrases, etc. • Prosodic Info: Intonational phrases, ToBI transcriptions, etc. • Correlation analysis by examining: • Co-occurrence Matrix • Input/Output Hidden Markov Models

  13. Co-occurrence Matrix • Estimation of the joint probability distribution function f(g, s). • For each time sample, cast a vote for the co-occurring gesture-speech label pair. • For a specific speech element si, the most correlated gesture label is gi = argmax_x f(gx, si). • Relatively easy to compute. • Gives an intuition about what we are examining.
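The voting scheme above can be sketched directly in numpy: count co-occurring label pairs over time, normalize to a joint distribution, and take the argmax over gestures for a given speech label (function names are illustrative):

```python
import numpy as np

def cooccurrence(gesture_labels, speech_labels, n_gestures, n_speech):
    """Vote-count joint distribution f(g, s) from time-aligned label streams."""
    counts = np.zeros((n_gestures, n_speech))
    for g, s in zip(gesture_labels, speech_labels):
        counts[g, s] += 1                  # one vote per time sample
    return counts / counts.sum()           # normalize to a joint pdf

def most_correlated_gesture(f, s_i):
    """gi = argmax_x f(gx, si) for a given speech label si."""
    return int(np.argmax(f[:, s_i]))
```

With time-aligned label streams from the recognition steps on slide 12, this lookup already yields a crude speech-to-gesture mapping, which is why the slide calls it easy to compute and a good source of intuition.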

  14. Input/Output Hidden Markov Models • An IOHMM is a graphical model that maps input sequences to output sequences. • It is used in three sequence-processing tasks: • Prediction • Regression • Classification • The model is trained to maximize the conditional distribution of an output sequence {y1, …, yt} given an input sequence {x1, …, xt}. • In our project: • The input sequence will be speech labels. • The output sequence will be gesture labels.
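In a discrete IOHMM, both the state transitions and the emissions are conditioned on the current input symbol. A minimal sketch of the forward pass that evaluates the conditional probability P(y1..yT | x1..xT) is shown below; the parameter shapes and toy dimensions are illustrative assumptions, and training (e.g. by EM) is omitted:

```python
import numpy as np

def iohmm_forward(pi, A, B, x_seq, y_seq):
    """Forward pass of a discrete input/output HMM.

    pi: (S,)       initial state distribution
    A:  (X, S, S)  input-conditioned transitions, A[x][i, j] = P(s'=j | s=i, x)
    B:  (X, S, Y)  input-conditioned emissions,   B[x][s, y] = P(y | s, x)
    x_seq, y_seq: aligned sequences of input (speech) and output (gesture) labels.
    Returns P(y_seq | x_seq).
    """
    # Initialize with the first input/output pair.
    alpha = pi * B[x_seq[0]][:, y_seq[0]]
    # Recurse: transition under x_t, then emit y_t.
    for x, y in zip(x_seq[1:], y_seq[1:]):
        alpha = (alpha @ A[x]) * B[x][:, y]
    return float(alpha.sum())
```

For synthesis, one would pick, for a given speech label sequence, the gesture label sequence maximizing this conditional probability, which generalizes the per-label argmax of the co-occurrence matrix to whole sequences.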

  15. Synthesis of Gestures Accompanying Speech • Based on the methodology used in the correlation analysis, given a speech signal: • Features will be extracted. • The most probable speech label will be assigned to each speech pattern. • The gesture pattern most correlated with the speech pattern will be used to animate a stick model of a person.

  16. Resources • Database Preparation and Labeling • VirtualDub • Anvil • Praat • Image Processing and Feature Extraction: • Matlab Image Processing Toolbox • OpenCV Image Processing Library • Gesture-Speech Correlation Analysis • HTK HMM Toolbox • Torch Machine Learning Library

  17. Work Plan • Timeline of the project: • Schedule of the lectures:

  18. Team Members • Ferda Ofli • Koc University • Image, Video Processing and Feature Extraction • Yelena Yasinnik • Massachusetts Institute of Technology • Audio-Visual Correlation Analysis • Oya Aran • Bogazici University • Gesture Based Human-Computer Interaction Systems

  19. Team Members • Alexey Anatolievich Karpov • Saint-Petersburg Institute for Informatics and Automation • Speech Based Human-Computer Interaction Systems • Stephen Wilson • University College Dublin • Audio-Visual Gesture Annotation • Alexander Refsum Jensenius • Department of Music, Oslo University • Gesture Analysis

  20. References • J. Yao and J. R. Cooperstock, “Arm Gesture Detection in a Classroom Environment,” Proc. WACV’02, pp. 153-157, 2002. • Y. Azoz, L. Devi, R. Sharma, “Tracking Hand Dynamics in Unconstrained Environments,” Proc. Int. Conference on Automatic Face and Gesture Recognition’98, pp. 274-279, 1998. • S. Malassiotis, N. Aifanti, M. G. Strintzis, “A Gesture Recognition System Using 3D Data,” Proc. Int. Symposium on 3D Data Processing Visualization and Transmission’02, pp. 190-193, 2002. • J.-M. Chung, N. Ohnishi, “Cue Circles: Image Feature for Measuring 3-D Motion of Articulated Objects Using Sequential Image Pair,” Proc. Int. Conference on Automatic Face and Gesture Recognition’98, pp. 474-479, 1998. • S. Kettebekov, M. Yeasin, R. Sharma, “Prosody based co-analysis for continuous recognition of coverbal gestures,” Proc. ICMI’02, pp. 161-166, 2002. • F. Quek, D. McNeill, R. Ansari, X.-F. Ma, R. Bryll, S. Duncan, K. E. McCullough, “Gesture cues for conversational interaction in monocular video,” Proc. Int. Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems’99, pp. 119-126, 1999. • L. Rabiner, B. Juang, “An introduction to hidden Markov models,” IEEE ASSP Magazine, Vol. 3, No. 1, pp. 4-16, Jan. 1986. • A. Just, O. Bernier, S. Marcel, “Recognition of isolated complex mono- and bi-manual 3D hand gestures,” Proc. 6th Int. Conference on Automatic Face and Gesture Recognition, 2004. • For detailed information on HTK visit: http://htk.eng.cam.ac.uk
