
Speech and Gesture Corpus From Designing to Piloting

Gheida Shahrour. Supervised by Prof. Martin Russell and Dr Neil Cooke. Electronic, Electrical and Computer Engineering, University of Birmingham. Our research focuses on modelling human behaviour from body motion.


Presentation Transcript


  1. Speech and Gesture Corpus: From Designing to Piloting
     Gheida Shahrour. Supervised by Prof. Martin Russell and Dr Neil Cooke. Electronic, Electrical and Computer Engineering, University of Birmingham.

  2. Motivation
     Our research focuses on modelling human behaviour from body motion. No existing dataset could serve this research focus.

  3. Dataset Specification
     We need data that:
     • contains the motion of people's head, arms and hands
     • is captured from people who come from different cultural backgrounds
     • contains spontaneous speech
     • is captured using a marker-based tracking technique

  4. Why Marker-based Tracking Technique?
     Capturing people's gestures is mainly based on computer vision techniques, each with its own problem:
     • skin colour: depends on people's skin and the light in the images
     • contour of people: objects may overlap or be occluded
     • tracking from a sequence of frames: may not be accurate
     • images are 2D: accuracy issues
     To avoid these problems, we will capture gestures using marker-based optical motion tracking:
     • data are obtained in a 3D coordinate system
     • less occlusion, and occluded data are recovered easily
     • the object is tracked accurately, given good calibration
     • the light-reflective markers are tracked, which gives accuracy

  5. Qualisys Track Manager (QTM)
     The Balance and Posture Laboratory in the School of Psychology is equipped with a QTM system (http://www.qualisys.com):
     • 12 cameras with LED strobes, which emit a beam of infrared light that is not visible to the naked eye
     • QTM software and an analogue interface for recording speech
     • passive markers of different sizes
     • calibration kit: L-shaped axis frame and T-shaped wand

  6. Camera & Strobe http://www.qualisys.com

  7. How it works?
     • The spherical markers are coated with a material that amplifies their brightness.
     • The strobes project light towards the markers, and the markers reflect it back to the cameras.
     • Each camera measures the 2D position of the reflective target.
     • The camera system combines the 2D data from several cameras to calculate the 3D position of the markers with high spatial resolution.
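The step from per-camera 2D measurements to a 3D marker position can be illustrated with a minimal sketch. It assumes each camera's 2D measurement has already been converted, through calibration, into a viewing ray (camera centre plus direction) and uses the simple two-ray midpoint method. The function name `triangulate_midpoint` and the example values are hypothetical; the slides do not describe the algorithm QTM actually uses, which may differ.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def _point_on_ray(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    return tuple(o + t * d for o, d in zip(origin, direction))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Estimate a 3D point from two camera rays (hypothetical helper).

    c1, c2 are camera centres; d1, d2 are ray directions towards the marker.
    The estimate is the midpoint of the shortest segment joining the rays.
    """
    w = tuple(a - b for a, b in zip(c1, c2))
    a = _dot(d1, d1)
    b = _dot(d1, d2)
    c = _dot(d2, d2)
    d = _dot(d1, w)
    e = _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    t1 = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t2 = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = _point_on_ray(c1, d1, t1)
    p2 = _point_on_ray(c2, d2, t2)
    return tuple((x + y) / 2 for x, y in zip(p1, p2))


# Two made-up cameras looking at a marker placed at (1, 2, 3).
print(triangulate_midpoint((0, 0, 0), (1, 2, 3), (4, 0, 0), (-3, 2, 3)))
# prints (1.0, 2.0, 3.0)
```

With noisy measurements the two rays do not intersect exactly, which is why the midpoint of the closest approach is used; using more cameras, as in the 12-camera setup, gives more rays to combine.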

  8. How it works?

  9. The Process of Capturing Data
     • Attach markers to the objects of interest (how?)
     • Define the measurement area where subjects will stand
     • Test the area
     • Calibrate the area
     • Capture your data
     • Save your data

  10. Reprocess Data Files
     Reprocess the files you captured to construct the 3D view (how?)

  11. Labeling Data
     Label your data (how?)
     • Create a text file (unique name)
     • Unique colour
     • Upload the file
     • Drag & drop
     • Play the motion data
     • Play it again
     • Fill the gap
     • Play it again
     • Save the file
     • Export the data

  12. Experiments (1)_Methods & Materials
     • 2 volunteers, each wearing 36 flat-based, half-spherical 7 mm markers on: head (4), elbows (2), waist (4), golf gloves (26)
     • 12 cameras; the measurement volume is not specified
     • frame rate: 200 frames per second
     • speech is not recorded

  13. Experiments (1)_3D View

  14. Experiments (1)_Best Result

  15. Experiment (2)_Motivation
     To improve the quality of the data:
     1. Quantity: the number of unidentified marker trajectories should equal the number of markers used in the experiment.
     2. Quality: no loss of markers and no ghost markers.
     • The technique: reduce both the number of markers and the measurement volume.

  16. Typical 3D Data & Cameras Position

  17. Low vs High 3D Tracker Parameters (http://www.qualisys.com)
     • Prediction error
     • Residual: the remaining of the trajectory set to low
     • Filling gaps between frames

  18. Markers’ Trajectories & Filling the Gap

  19. Missing Data

  20. How to Fill these Gaps?
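One simple way to fill short gaps in a marker trajectory is linear interpolation between the last frame before the gap and the first frame after it. The sketch below is only an illustration under that assumption: the slides do not state which method QTM applies, and the function name `fill_gaps_linear` and the `max_gap` limit are hypothetical. Missing frames are represented as `None`.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def fill_gaps_linear(track: List[Optional[Point]],
                     max_gap: int = 10) -> List[Optional[Point]]:
    """Fill runs of missing frames (None) by linear interpolation.

    Gaps at the start or end of the track, and gaps longer than max_gap
    frames, are left unfilled because there is too little to interpolate from.
    """
    filled = list(track)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i                      # first missing frame of this gap
        while i < n and filled[i] is None:
            i += 1
        end = i                        # first valid frame after the gap, or n
        if start == 0 or end == n or (end - start) > max_gap:
            continue
        before, after = filled[start - 1], filled[end]
        span = end - (start - 1)
        for k in range(start, end):
            t = (k - (start - 1)) / span   # fraction of the way across the gap
            filled[k] = tuple(b + t * (a - b) for b, a in zip(before, after))
    return filled


# A made-up trajectory with a two-frame gap.
track = [(0.0, 0.0, 0.0), None, None, (3.0, 6.0, 9.0)]
print(fill_gaps_linear(track))
```

Linear interpolation is only reasonable for short gaps; over longer gaps a hand or finger rarely moves in a straight line, which is one reason to cap the gap length.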

  21. Experiments (2)_Methods & Materials
     3 volunteers, each wearing 28 flat-based, half-spherical 7 mm markers on: head (4), elbows (2), shoulders (2), waist (4), golf gloves (16)
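The marker counts on the two Methods & Materials slides, and the quantity criterion from the Experiment (2) motivation, can be checked with a short sketch. The dictionary layout and the function names are hypothetical conveniences for illustration; the per-body-part counts are taken from the slides.

```python
from typing import Dict

# Markers per body part, as listed on the slides.
EXPERIMENT_1: Dict[str, int] = {
    "head": 4, "elbows": 2, "waist": 4, "golf gloves": 26,
}
EXPERIMENT_2: Dict[str, int] = {
    "head": 4, "elbows": 2, "shoulders": 2, "waist": 4, "golf gloves": 16,
}


def expected_markers(layout: Dict[str, int]) -> int:
    """Total number of markers worn by one volunteer."""
    return sum(layout.values())


def surplus_trajectories(layout: Dict[str, int], trajectory_count: int) -> int:
    """Difference between trajectories found and markers used.

    0 meets the quantity criterion; a positive value suggests extra
    trajectories (for example ghost markers); a negative value suggests
    lost markers.
    """
    return trajectory_count - expected_markers(layout)


print(expected_markers(EXPERIMENT_1))          # prints 36
print(expected_markers(EXPERIMENT_2))          # prints 28
print(surplus_trajectories(EXPERIMENT_2, 31))  # prints 3
```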

  22. Experiments (2)_Measurement Volume

  23. Experiments (1)_Cameras Position

  24. Experiments (2)_Cameras Position

  25. Experiments (2)_Sessions

  26. Experiments (2)_Result

  27. Experiments (2)_Result

  28. Conclusion
     We will track the motion of the head, arms and hands.
     • Leave 3 fingers out: middle, ring and little finger.
     • Occlusion of the markers on the fingers is due not only to the camera setup, but also to the degrees of freedom of the hands.
     • Finding the unidentified trajectories of markers is laborious and time-consuming.
     • Tracking all fingers is very useful for many applications, such as sign language, but this is not our focus.

  29. Data collection_assignment
     Each volunteer will wear passive markers of at least 12 mm on head (4), elbows (2), waist (4), shoulders (3) and gloves (10).

  30. Group Setup
     Put yourselves into groups of 3.
     • The members of each group should have the same first language, the same gender and the same country of birth.
     • Each member of a British group (country of birth is Britain and first language is English) will record 2 sessions. Each session lasts 15 minutes and is captured in 5 stages of 3 minutes each.
     • Each member of a cultural group (country of birth is not Britain and first language is not English) will record 4 sessions: 2 in English as a second language and 2 in their first language. Each session lasts 15 minutes and is captured in 5 stages of 3 minutes each.
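The session plan above implies a fixed amount of recording per participant, which a short sketch can make explicit. The stage and session figures come from the slide. The frame rate of 200 frames per second is an assumption carried over from Experiment (1), since the slides do not state the rate for the final data collection, and all names are hypothetical.

```python
STAGES_PER_SESSION = 5
STAGE_MINUTES = 3
SESSIONS_PER_MEMBER = {"british": 2, "cultural": 4}
ASSUMED_FPS = 200  # assumption: same frame rate as Experiment (1)


def session_minutes() -> int:
    """Length of one session: 5 stages of 3 minutes."""
    return STAGES_PER_SESSION * STAGE_MINUTES


def member_minutes(group_type: str) -> int:
    """Total recording time for one member of the given group type."""
    return SESSIONS_PER_MEMBER[group_type] * session_minutes()


def frames_per_stage(fps: int = ASSUMED_FPS) -> int:
    """Number of motion frames captured in one 3-minute stage."""
    return STAGE_MINUTES * 60 * fps


print(session_minutes())           # prints 15
print(member_minutes("british"))   # prints 30
print(member_minutes("cultural"))  # prints 60
print(frames_per_stage())          # prints 36000
```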

  31. Any Questions?
