
Unsupervised Learning for Speech Motion Editing

Yong Cao 1,2, Petros Faloutsos 1, Frederic Pighin 2. 1 University of California, Los Angeles; 2 Institute for Creative Technologies, University of Southern California. Eurographics/SIGGRAPH Symposium on Computer Animation (2003).





Presentation Transcript


  1. Unsupervised Learning for Speech Motion Editing. Yong Cao 1,2, Petros Faloutsos 1, Frederic Pighin 2. 1 University of California, Los Angeles; 2 Institute for Creative Technologies, University of Southern California. Eurographics/SIGGRAPH Symposium on Computer Animation (2003)

  2. Problem • Motion capture is convenient, but the recorded data lacks flexibility • Problem: how can we extract the semantics of the data to support intuitive motion editing?

  3. Related Work 1. Face motion synthesis • Physics-based face models: Lee, Terzopoulos, Waters (SIGGRAPH 1995); Kähler, Haber, Seidel (Graphics Interface 2001) • Speech motion synthesis: Bregler, Covell, Slaney (SIGGRAPH 1997); Brand (SIGGRAPH 1999); Ezzat, Geiger, Poggio (SIGGRAPH 2002) 2. Separation of style and content: Brand, Hertzmann (SIGGRAPH 2000); Chuang, Deshpande, Bregler (Pacific Graphics 2002)

  4. Our Contribution • New statistical representation of facial motion • Decomposition into style and content • Intuitive editing operations

  5. Our Contribution [Video: original neutral motion alongside the edited sad motion]

  6. Roadmap • Independent Component Analysis (ICA) • Facial motion decomposition • Semantics of components • Motion editing

  7. Independent Component Analysis (ICA) • Statistical technique • Linear transformation • Components are maximally independent

  8. Steps of ICA • Preprocessing (PCA): centering, whitening • ICA decomposition: u = W x • Reconstruction: x = A u, with A = W⁻¹
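[Editor's note] As a rough illustration of these steps, the sketch below runs the same pipeline with scikit-learn's FastICA, which performs the centering and PCA whitening internally. The random placeholder data, the marker-coordinate layout, and the choice of 8 components are assumptions made for the example, not values from the paper.

```python
# Minimal sketch of the ICA steps on motion-capture data, assuming one row per
# frame and one column per marker coordinate (placeholder random data here).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
frames, coords = 500, 327                 # e.g. 109 markers x 3 coordinates (assumed layout)
motion = rng.standard_normal((frames, coords))

ica = FastICA(n_components=8, random_state=0, max_iter=500)

u = ica.fit_transform(motion)             # Decomposition: u = W (x - mean)
x_rec = ica.inverse_transform(u)          # Reconstruction: x ~= A u + mean (approximate
                                          # when n_components < number of coordinates)
print(u.shape, x_rec.shape)
```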

  9. Roadmap • Independent Component Analysis (ICA) • Facial motion decomposition • Semantics of components • Motion editing

  10. Speech Motion Dataset • Speech motion of 113 sentences in 5 emotional moods: • Frustrated: 18 sentences • Happy: 18 sentences • Neutral: 17 sentences • Sad: 30 sentences • Angry: 30 sentences • Each motion: 109 motion capture markers, 2–4 seconds

  11. Components in ICA Space [Diagram: a facial motion is decomposed into independent components in ICA space and then reconstructed]

  12. Roadmap • Independent Component Analysis (ICA) • Facial motion decomposition • Semantics of components • Motion editing

  13. Interpretation of Independent Components • Goal: find the semantics of each component • Classify each component as either Style (emotion) or Content (speech) • Methodology: qualitative and quantitative analysis

  14. Qualitatively [Videos: changing the components associated with Style (emotion) vs. those associated with Content (speech)]

  15. Quantitatively • Style (emotion): compare the same speech performed with different emotions, e.g. Happy vs. Frustrated [component trajectories]
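[Editor's note] A hedged sketch of how such a quantitative comparison could be set up: the same sentence in two emotions is projected into a shared ICA space, and components that differ mainly by an offset between the two takes are flagged as candidate style (emotion) components. The placeholder data, the synthetic offset, and the ranking heuristic are illustrative assumptions, not the paper's exact procedure.

```python
# Compare the same sentence in two emotions in a shared ICA space (placeholder data).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
happy = rng.standard_normal((300, 327))        # placeholder "happy" take of a sentence
frustrated = happy + 0.8                       # placeholder: offset mimics an emotion shift

ica = FastICA(n_components=8, random_state=0, max_iter=500)
ica.fit(np.vstack([happy, frustrated]))
u_happy = ica.transform(happy)
u_frustrated = ica.transform(frustrated)

# Components whose trajectories shift by a large constant offset between emotions,
# while the speech is identical, behave like "style" components.
mean_shift = np.abs(u_happy.mean(axis=0) - u_frustrated.mean(axis=0))
print("candidate style components:", np.argsort(mean_shift)[::-1][:2])
```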

  16. Content: Speech • Grouping of motion markers by region: • Mouth motion • Eyebrow motion • Eyelid motion

  17. Content: Speech-Related Motion • Step 1: reconstruct the facial motion from each independent component in isolation (all other components set to 0)

  18. Content: Speech-Related Motion • Step 2: compare each single-component reconstruction against the original motion within each marker region (mouth, eyebrows, eyelids)
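[Editor's note] A minimal sketch of Steps 1 and 2 together, under assumed placeholder data and marker-region index ranges: each independent component is used alone to reconstruct the motion, and the reconstruction's energy is measured per region. Components that concentrate their energy around the mouth would be labeled content (speech).

```python
# Step 1: reconstruct motion from one component at a time; Step 2: measure per-region energy.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
motion = rng.standard_normal((400, 327))                       # placeholder motion data
regions = {"mouth": slice(0, 120),                             # assumed column ranges per region
           "eyebrows": slice(120, 200),
           "eyelids": slice(200, 240)}

ica = FastICA(n_components=8, random_state=0, max_iter=500)
u = ica.fit_transform(motion)

for k in range(u.shape[1]):
    u_k = np.zeros_like(u)
    u_k[:, k] = u[:, k]                      # Step 1: keep only component k ...
    x_k = ica.inverse_transform(u_k)         # ... and reconstruct the motion from it
    # Step 2: per-region motion energy (variance) of that reconstruction
    energy = {name: float(x_k[:, idx].var()) for name, idx in regions.items()}
    print(f"component {k}:", energy)
```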

  19. Roadmap • Independent Component Analysis (ICA) • Facial motion decomposition • Semantic meaning of components • Motion editing

  20. Motion Editing with ICA • Edit the motion in intuitive ways: • Translate • Copy and Replace • Copy and Add
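[Editor's note] The sketch below illustrates how these three operations could act on the component trajectories before mapping back to marker space. The "neutral" and "sad" takes, the index of the style component, and the offset value are all assumptions made for the example.

```python
# Editing in ICA space: translate, copy-and-replace, or copy-and-add a style component.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
neutral = rng.standard_normal((300, 327))    # placeholder "neutral" take
sad = rng.standard_normal((300, 327))        # placeholder "sad" take, same length

ica = FastICA(n_components=8, random_state=0, max_iter=500)
ica.fit(np.vstack([neutral, sad]))
u_neutral, u_sad = ica.transform(neutral), ica.transform(sad)
style = [0]                                  # assumed index of the emotion (style) component

edited = u_neutral.copy()
# Translate: shift the style component by a constant offset
edited[:, style] = u_neutral[:, style] + 1.5
# Copy and Replace (alternative edit): take the style trajectory from the sad take
# edited[:, style] = u_sad[:, style]
# Copy and Add (alternative edit): layer the sad style on top of the neutral one
# edited[:, style] = u_neutral[:, style] + u_sad[:, style]

new_motion = ica.inverse_transform(edited)   # back to marker space for playback
print(new_motion.shape)
```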

  21. Results • Changing the emotional state by translating the style (emotion) components

  22. Conclusion • New statistical representation of facial motion • Decomposition into content and style • Intuitive editing operations

  23. The End Thanks to Wen Tien for his help on this paper, Christos Faloutsos for useful discussions, and Brian Carpenter for his excellent performance. Thanks to the USC School of Cinema-Television and House of Moves for motion capture.
