Perceiving Talking Faces: A Paradigm for Multimodal Communication
Perceiving Talking Faces: A Paradigm for Multimodal Communication Perceptual Science Laboratory University of California Santa Cruz, CA 95064 mambo.ucsc.edu
Perceptual Science Laboratory (PSL) • Dom Massaro • Michael Cohen • Christopher Campbell • Rashid Clark • Jonas Beskow • Kartik Venkataraman • Tony Rodriguez • Nathan Sanders
Interdisciplinary Endeavor • Cognitive Science • Psychology • Philosophy • Linguistics • Computer Sciences • Anthropology
Anecdotal Evidence • Persons with Hearing Loss • Benjamin Franklin in France • HAL in 2001: A Space Odyssey • “Hear TV Better with Glasses On” • Poorly Dubbed Foreign Films
Summary of Research • Psychological and Psycholinguistic Inquiry • How we make sense of the world • Value of multiple sources of information • Expert parallel processors • Developing and evaluating a talking head • Application in language tutoring • Implications for education • Other application possibilities
Theories of Speech Perception • Psychoacoustic Theories • Motor Theory • Direct Perception Theory • Pattern Recognition Theory
Theories of Speech Perception • Psychoacoustic Theories • Influence is Solely Auditory
Theories of Speech Perception • Motor Theory • Speech Production Mediates Perception
Theories of Speech Perception • Direct Perception Theory • Articulatory Gestures Enforce Direct Perception
Theories of Speech Perception • Pattern Recognition Theory • Speech is Prototypical Pattern Recognition
Theories of Speech Perception • Psychoacoustic Theories • Influence of Visible Speech • No Obvious Mechanism
Theories of Speech Perception • Motor Theory • Influence of Context
Theories of Speech Perception • Direct Perception Theory • Influence of Non-Articulatory Sources
Theories of Speech Perception • Pattern Recognition Theory • Best Description of All Relevant Results
Research Strategy to Develop and Evaluate the Effectiveness of Visible Speech • Control Presentation • Auditory Synthetic speech • Computer Animated Talking Head • Development and Evaluation
Experimental Strategy • Manipulate auditory and visual speech • present unimodal stimuli • present factorial bimodal stimuli • test models of perception
[Figure: expanded factorial design. Auditory conditions: BA, VA, THA, DA, none. Visual conditions: BA, VA, THA, DA, none.]
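The design above (unimodal stimuli plus factorial bimodal stimuli) can be enumerated in a few lines. This is only a sketch: the syllable labels come from the slide, while the function name `expanded_factorial` and the use of `None` for an absent modality are assumptions made for illustration.

```python
from itertools import product

# Syllable labels taken from the design slide.
SYLLABLES = ["BA", "VA", "THA", "DA"]


def expanded_factorial(levels):
    """Return (auditory, visual) pairs for every bimodal and unimodal condition.

    None marks an absent modality (an assumption of this sketch).
    """
    options = list(levels) + [None]
    # Drop the (None, None) cell: a trial needs at least one modality.
    return [
        (a, v)
        for a, v in product(options, repeat=2)
        if a is not None or v is not None
    ]


conditions = expanded_factorial(SYLLABLES)
bimodal = [c for c in conditions if None not in c]
unimodal = [c for c in conditions if None in c]
print(len(conditions), len(bimodal), len(unimodal))  # 24 16 8
```

With four syllables per modality this gives 16 bimodal and 8 unimodal conditions, 24 in total.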
[Figure: stimulus input and its match to the speech alternatives. Visual /da/: a lot like /da/, mostly nothing like /ba/. Auditory /ba/: not at all like /da/, somewhat like /va/, a lot like /ba/, a little like /tha/.]
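[Figure: expanded factorial design with a five-step continuum. Auditory (A) levels: BA, 2, 3, 4, DA, none. Visual (V) levels: BA, 2, 3, 4, DA, none.]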
Pattern Recognition • Central to Cognition • Multiple Continuous Sources of Information • Bimodal Speech Perception • Other Domains • Reading, visual perception, skill learning • Universal Principle
Fuzzy Logical Model of Perception (FLMP) • Continuous Information (Fuzzy Logic) • Independence of Sources • Multiplicative Integration of Sources • Optimal Integration Rule
Fuzzy Logic • Truth of Proposition x: t(x) • Truth of Proposition y: t(y) • 0 < t(x) < 1, 0 < t(y) < 1 • Negation of x: t(~x) = 1 - t(x) • Conjunction: t(x and y) = t(x) t(y) • Disjunction: DeMorgan’s Law • t(x or y) = t(x) + t(y) - t(x) t(y)
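The fuzzy-logic operations listed above can be checked numerically. The sketch below uses hypothetical function names (`neg`, `conj`, `disj`) and arbitrary truth values; it confirms that the disjunction formula is what De Morgan's law yields from negation and multiplicative conjunction.

```python
import math


def neg(tx):
    """Negation: t(~x) = 1 - t(x)."""
    return 1.0 - tx


def conj(tx, ty):
    """Conjunction: t(x and y) = t(x) t(y)."""
    return tx * ty


def disj(tx, ty):
    """Disjunction: t(x or y) = t(x) + t(y) - t(x) t(y)."""
    return tx + ty - tx * ty


# Arbitrary example truth values (not from the presentation).
tx, ty = 0.7, 0.4

# De Morgan's law: x or y is equivalent to not(not x and not y).
via_de_morgan = neg(conj(neg(tx), neg(ty)))

print(round(disj(tx, ty), 2), round(via_de_morgan, 2))  # 0.82 0.82
```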
FLMP • Evaluation: /ba/ - Rising F2-F3 and Closed Lips • /da/ - Level F2-F3 and Open Lips • Integration: s(/ba/) = (1 - a)(1 - v) • s(/da/) = av • Decision: P(/da/) = av / [av + (1 - a)(1 - v)]
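A minimal sketch of the two-alternative integration and decision steps follows. Here `a` and `v` are the auditory and visual support for /da/ as on the slide; the function name `p_da_bimodal` and the example values are assumptions for illustration, not fitted parameters.

```python
def p_da_bimodal(a: float, v: float) -> float:
    """P(/da/) for the two-alternative case, from support values a and v."""
    s_da = a * v                      # integration: support for /da/
    s_ba = (1.0 - a) * (1.0 - v)      # integration: support for /ba/
    total = s_da + s_ba
    if total == 0.0:
        # Happens only when the sources contradict each other completely
        # (a = 1 and v = 0, or a = 0 and v = 1).
        raise ValueError("both alternatives have zero support")
    return s_da / total               # decision: relative support


# Hypothetical support values chosen for illustration.
a, v = 0.6, 0.8
print(round(p_da_bimodal(a, v), 3))  # 0.857
```

In this example, two sources that each favor /da/ yield a bimodal probability higher than either support value alone, which follows from the multiplicative integration.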
Confusion Matrix • Auditory /va/ & visual /da/ ---> /tha/ • Auditory /ba/ & visual /da/ ---> /tha/
[Figure: schematic of the three processing stages. Inputs A_i and V_j → Evaluation → a_i and v_j → Integration → s_k → Decision → R_k.]
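The stages in the schematic can be sketched for more than two alternatives. Everything numeric below is made up for illustration: the support values stand in for the output of the evaluation stage and were chosen only so that the outcome resembles the auditory /ba/ plus visual /da/ case on the confusion-matrix slide. The function names `integrate` and `decide` are likewise hypothetical.

```python
from math import prod


def integrate(*sources):
    """Integration: multiply the support from each source, per alternative (s_k)."""
    alternatives = sources[0].keys()
    return {k: prod(src[k] for src in sources) for k in alternatives}


def decide(support):
    """Decision: each alternative's support relative to the summed support."""
    total = sum(support.values())
    return {k: s / total for k, s in support.items()}


# Hypothetical evaluation output for an auditory /ba/ paired with a visual /da/.
auditory = {"/ba/": 0.90, "/va/": 0.50, "/tha/": 0.20, "/da/": 0.01}
visual = {"/ba/": 0.05, "/va/": 0.10, "/tha/": 0.60, "/da/": 0.90}

probabilities = decide(integrate(auditory, visual))
best = max(probabilities, key=probabilities.get)
print(best)  # /tha/
```

With these illustrative numbers, neither source favors /tha/ on its own, yet /tha/ has the largest product and so the largest response probability.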
FLMP • Evaluation: /da/ - Level F2-F3 and Open Lips • /ba/ - Rising F2-F3 and Closed Lips • Integration: s(/da/ | A) = a • s(/ba/ | A) = (1 - a) • Decision: P(/da/ | A) = a / [a + (1 - a)] = a
FLMP • Evaluation: /da/ - Level F2-F3 and Open Lips • /ba/ - Rising F2-F3 and Closed Lips • Integration: s(/da/ | A) = .6 • s(/ba/ | A) = (1 - .6) • Decision: P(/da/ | A) = .6 / [.6 + (1 - .6)] = .6
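The unimodal result can be verified numerically. The sketch below also shows one way to relate it to the bimodal formula: if the absent visual source is given a neutral support of 0.5, the bimodal expression reduces to a. Treating a missing modality as 0.5 is an assumption of this sketch, and the names `NEUTRAL` and `p_da` are hypothetical.

```python
import math

# Assumed neutral support for an absent source (supports neither alternative).
NEUTRAL = 0.5


def p_da(a: float, v: float = NEUTRAL) -> float:
    """P(/da/) from auditory support a and, optionally, visual support v."""
    s_da = a * v
    s_ba = (1.0 - a) * (1.0 - v)
    return s_da / (s_da + s_ba)


# Auditory-only case from the slide: a = .6 gives P(/da/ | A) = .6.
print(round(p_da(0.6), 3))  # 0.6

# The same identity holds for other auditory support values.
for a in (0.1, 0.25, 0.9):
    assert math.isclose(p_da(a), a)
```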