MikeTalk: An Adaptive Man-Machine Interface

Presentation Transcript


  1. MikeTalk: An Adaptive Man-Machine Interface. Tony Ezzat, Volker Blanz, Tomaso Poggio

  2. TTVS Overview • Input: Text • Output: Photo-realistic talking face uttering text

  3. Desktop Agents

  4. Desktop Agents: “You have received 1 email from Tommy Poggio.”

  5. Customer Support

  6. Customer Support: “You have bought 20 shares of SONY at $40 each.”

  7. Advertisements

  8. Advertisements: “Hi Tony, would you be interested in a ticket from Boston to New York for $50.00?”

  9. Modules

  10. Phoneme Corpus Step 1: • collect a visual corpus from a subject • corpus contains 44 words • one word for each American English phoneme

  11. 6 Consonantal Visemes Step 2: • extract one image per phoneme: viseme • group visemes together by visual similarity

  12. 9 Vocalic Visemes (+ 1 Silence Viseme)

  13. Problem 1: Need to Interpolate!

  14. Solution: Morphing! Simultaneous interpolation of shape and texture (Beier & Neely 1992). Problem 2: Too tedious to specify correspondence by hand across many images!

  15. Solution: Optical Flow (Horn & Schunck 1986) (Lucas & Kanade 1988) • To interpolate between two visemes, optical flow is first computed • A 2D motion vector field is produced: dx(x,y), dy(x,y)
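
The slide does not say how the flow is computed in MikeTalk, so the following is only a minimal sketch of the least-squares step behind Lucas-Kanade-style estimation, assuming small motion and a single shared (dx, dy) for one window. The function name `lucas_kanade_window` and the synthetic test image are illustrative assumptions, not part of the original system. A dense field dx(x,y), dy(x,y) would repeat this estimate for a window around every pixel.

```python
def lucas_kanade_window(img_a, img_b):
    """Estimate one (dx, dy) motion vector from img_a to img_b for a small window.

    Images are 2D lists of floats, indexed img[y][x]. Hypothetical helper:
    it solves the 2x2 least-squares system of the brightness-constancy
    constraint Ix*dx + Iy*dy + It = 0 over the window interior.
    """
    h, w = len(img_a), len(img_a[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (img_a[y][x + 1] - img_a[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (img_a[y + 1][x] - img_a[y - 1][x]) / 2.0  # spatial gradient in y
            it = img_b[y][x] - img_a[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to determine the motion")
    dx = (-sxt * syy + syt * sxy) / det
    dy = (-syt * sxx + sxt * sxy) / det
    return dx, dy


def blob(cx, cy, n=7):
    """Synthetic test image: a smooth quadratic bowl centred at (cx, cy)."""
    return [[(x - cx) ** 2 + (y - cy) ** 2 for x in range(n)] for y in range(n)]


# The bowl moves by (+0.3, -0.2) pixels between the two frames.
dx, dy = lucas_kanade_window(blob(3.0, 3.0), blob(3.3, 2.8))
```

On this synthetic pair the estimate recovers the shift of roughly (0.3, -0.2); real viseme images would normally need a coarse-to-fine scheme to handle larger lip motions.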

  16. Morphing • Forward warping A to B • Forward warping B to A • Blending • Hole filling
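
The four morphing steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes grayscale images stored as 2D lists, nearest-pixel forward warping, and a naive row-wise hole fill. All function names are hypothetical.

```python
import math


def forward_warp(img, flow, scale):
    """Push every pixel along scale * flow; targets nobody lands on stay None (holes)."""
    h, w = len(img), len(img[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            tx = math.floor(x + scale * dx + 0.5)  # round to nearest pixel
            ty = math.floor(y + scale * dy + 0.5)
            if 0 <= tx < w and 0 <= ty < h:
                out[ty][tx] = img[y][x]
    return out


def fill_holes(img):
    """Fill each hole from the nearest filled pixel in the same row (left first, then right)."""
    out = []
    for row in img:
        filled = list(row)
        for order in (range(len(filled)), reversed(range(len(filled)))):
            last = None
            for x in order:
                if filled[x] is None:
                    filled[x] = last
                else:
                    last = filled[x]
        out.append([0.0 if v is None else v for v in filled])  # row with no data at all
    return out


def morph(img_a, img_b, flow_ab, flow_ba, alpha):
    """Intermediate image at alpha in [0, 1]: alpha=0 gives A, alpha=1 gives B."""
    warped_a = fill_holes(forward_warp(img_a, flow_ab, alpha))
    warped_b = fill_holes(forward_warp(img_b, flow_ba, 1.0 - alpha))
    return [[(1.0 - alpha) * pa + alpha * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(warped_a, warped_b)]


# One-row example: a bright pixel moves two pixels to the right between A and B.
img_a = [[0, 0, 10, 0, 0]]
img_b = [[0, 0, 0, 0, 10]]
flow_ab = [[(2, 0)] * 5]
flow_ba = [[(-2, 0)] * 5]
halfway = morph(img_a, img_b, flow_ab, flow_ba, 0.5)
```

At `alpha = 0.5` the bright pixel sits halfway, at x = 3, which is the shape interpolation that plain cross-fading cannot produce. The crude hole fill leaves an artifact at the right edge, which is why hole filling is listed as a separate step.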

  17. Synthesis Database • 16 Visemes total • 256 optical flow fields total (16 × 16), one for each ordered pair of visemes

  18. Concatenation and Lip Sync • Load the correct viseme transitions • Concatenate viseme transitions • Sample the viseme transitions using audio durations
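
The sampling step can be sketched as below, assuming the audio side supplies one duration in seconds per viseme transition and that video runs at a fixed frame rate. The function name, the viseme labels, and the default of 30 frames per second are illustrative assumptions.

```python
def lip_sync_schedule(visemes, durations, fps=30):
    """Return one (from_viseme, to_viseme, alpha) triple per video frame.

    visemes:   sequence of viseme labels for the utterance
    durations: seconds allotted to each transition, len(visemes) - 1 entries
               (assumed to come from the audio timing)
    alpha:     position inside the transition, 0.0 at the source viseme
    """
    if len(durations) != len(visemes) - 1:
        raise ValueError("need exactly one duration per viseme transition")
    frames = []
    for src, dst, seconds in zip(visemes, visemes[1:], durations):
        n = max(1, int(seconds * fps + 0.5))  # frames spent on this transition
        for k in range(n):
            frames.append((src, dst, k / n))
    frames.append((visemes[-1], visemes[-1], 0.0))  # hold the final viseme
    return frames


# Hypothetical labels; each triple selects a stored transition and a morph position.
schedule = lip_sync_schedule(["a", "b", "c"], [0.1, 0.2], fps=10)
```

Each triple names which stored viseme transition to load and how far along it to sample, so longer audio durations simply yield more in-between frames from the same transition.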

  19. Examples “1, 2, 3, 4, 5” “you have received 10 email messages.” “cat, dog, pig, cow, moose, horse, sheep”

  20. Current Work • Coarticulation • Eye + head movements • Emotion • 3D instead of 2D • Psychophysics

  21. 3D With Volker Blanz

  22. The End

  23. Coarticulation • Problem: The current method does not handle coarticulation, so speech looks overly articulated • One could record all possible triphones/quadriphones, but this approach requires a lot of data! • The best method is to learn a model for coarticulation, but what is the representation for the lips?

  24. Principal Components Analysis • Each image is a vector in a high-dimensional space • Using PCA, find the optimal set of vectors that span the space • Project the entire corpus onto those basis vectors
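
As an illustration of these three steps, here is a minimal pure-Python sketch that finds only the top principal component by power iteration and projects the data onto it. It assumes tiny vectors; for real images, where each vector has one entry per pixel, a linear-algebra library and an SVD would be the usual choice. The function name is hypothetical.

```python
import math


def top_principal_component(samples, iterations=200):
    """Return (mean, unit basis vector, per-sample coefficients) for the top PCA basis.

    samples: list of equal-length lists, one flattened image per entry.
    Uses power iteration on the covariance matrix, which is only practical
    for small dimensions. The all-ones start vector must not be orthogonal
    to the top component.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    coeffs = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return mean, v, coeffs


# Toy "corpus": 2-pixel images that all lie on the line pixel2 = 2 * pixel1.
corpus = [[t, 2.0 * t] for t in range(5)]
mean, basis, coeffs = top_principal_component(corpus)
```

For this toy corpus the recovered basis is the unit vector along (1, 2), and each image is summarized by a single coefficient along it.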

  25. Top 2 PCA Bases for /buut/

  26. Top 2 PCA Bases for /get/ Problem: Too nonlinear!

  27. Flow Component Analysis • Compute optical flow from a reference lip image to all other images in the corpus • Compute PCA on all the flows
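
One way to write this parameterization down, using notation that is assumed here for illustration and does not appear on the slides:

```latex
% Assumed notation (not from the slides):
%   f_i          optical flow from the reference lip image to corpus image i,
%                with all dx(x,y), dy(x,y) values stacked into one vector
%   \bar{f}      mean flow over the N corpus images
%   b_k          k-th principal component of the flows (orthonormal)
%   \alpha_{i,k} coefficient of image i along b_k
\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f_i, \qquad
f_i \approx \bar{f} + \sum_{k=1}^{K} \alpha_{i,k}\, b_k, \qquad
\alpha_{i,k} = b_k^{\top} \left( f_i - \bar{f} \right)
```

Under this reading, each mouth image is described by its K coefficients, and the analysis operates on pixel motions rather than pixel intensities.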

  28. Top 2 FPCA Bases for /buut/

  29. Top 2 FPCA Bases for /get/ Much more linear behavior!

  30. Current Work • Now that we have parameterized the mouth, what is the model for mouth synthesis? • How is that model fit to the PCA data?
