multimodality, universals, natural interaction… and some other stories… Kostas Karpouzis & Stefanos Kollias, ICCS/NTUA, HUMAINE WP4
going multimodal • ‘multimodal’ is this decade’s main ‘affective interaction’ aspect • a plethora of modalities is available to capture and process • visual, aural, haptic… • ‘visual’ can be broken down into ‘facial expressivity’, ‘hand gesturing’, ‘body language’, etc. • ‘aural’ into ‘prosody’, ‘linguistic content’, etc.
why multimodal? • Extending unimodality… • recognition from traditional unimodal inputs had serious limitations • multimodal corpora are becoming available • What is there to gain? • have recognition rates actually improved? • or have we just introduced more uncertain features?
essential reading • S. Oviatt, “Ten Myths of Multimodal Interaction”, Communications of the ACM, Nov. 1999, Vol. 42, No. 11, pp. 74-81
putting it all together • myth #6: multimodal integration involves redundancy of content between modes • you have features from a person’s • facial expressions and body language • speech prosody and linguistic content • even their heart rate • so, what do you do when their face tells you something different from their …heart?
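One common way to combine non-redundant, possibly conflicting modalities is decision-level (late) fusion: each modality produces its own class probabilities, and the system blends them. The sketch below is illustrative only — the modality names, emotion labels and weights are invented for this example, not taken from the HUMAINE work.

```python
def fuse(scores_per_modality, weights):
    """Weighted average of per-modality class probabilities.

    scores_per_modality: {modality: {label: probability}}
    weights: {modality: relative trust in that modality}
    Returns the winning label and the fused score dict.
    """
    total = sum(weights[m] for m in scores_per_modality)
    fused = {}
    for modality, scores in scores_per_modality.items():
        w = weights[modality] / total  # normalise weights over present modalities
        for label, p in scores.items():
            fused[label] = fused.get(label, 0.0) + w * p
    return max(fused, key=fused.get), fused

# The face says one thing, the heart says another:
scores = {
    "face":      {"happy": 0.7, "angry": 0.3},
    "heartbeat": {"happy": 0.2, "angry": 0.8},
}
weights = {"face": 0.7, "heartbeat": 0.3}  # hypothetical: trust the face more
label, fused = fuse(scores, weights)
# label == "happy": the face's confidence outweighs the heartbeat's dissent
```

The point of the sketch is that fusion is arbitration, not confirmation: the weights encode how much each channel is trusted when the channels disagree.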
but it can be good • what happens when one of the available modalities is not robust? • better yet, when the ‘weak’ modality changes over time? • consider the ‘bartender problem’: • very little linguistic content reaches its target • mouth shape is still available (visemes) • the vocabulary is limited
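The bartender case suggests making the fusion weights themselves adaptive: estimate each channel's current reliability and renormalise. The reliability numbers below are hypothetical (think of a signal-to-noise estimate scaled to [0, 1]); this is a sketch of the idea, not the HUMAINE system's actual scheme.

```python
def adaptive_weights(reliability):
    """Turn per-modality reliability estimates into normalised fusion weights."""
    total = sum(reliability.values())
    return {m: r / total for m, r in reliability.items()}

# Quiet room: audio is the stronger channel.
quiet = adaptive_weights({"audio": 0.9, "visemes": 0.6})

# Noisy bar: audio reliability collapses, so lip reading takes over.
noisy = adaptive_weights({"audio": 0.1, "visemes": 0.6})
```

As the audio channel degrades over time, the weight mass shifts to the visual channel automatically — the ‘weak’ modality is demoted rather than allowed to corrupt the fused decision.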
again, why multimodal? • holy grail: assigning labels to different parts of human-human or human-computer interaction • yes, labels can be nice! • humans do it all the time • and so do computers (e.g., classification) • OK, but what kind of label?
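As a toy illustration of how a computer assigns such labels, here is nearest-centroid classification over segments of an interaction. The two-dimensional features (read them as arousal/valence estimates, an alternative emotion representation the deck mentions later) and the centroid positions are invented for this example.

```python
import math

# Hypothetical class centroids in (arousal, valence) space.
centroids = {
    "neutral": (0.0, 0.0),
    "happy":   (0.6, 0.7),
    "angry":   (0.8, -0.6),
}

def label(segment_features):
    """Assign the label whose centroid is closest to the segment's features."""
    return min(centroids,
               key=lambda c: math.dist(segment_features, centroids[c]))

# Three segments of an interaction, one feature vector each.
labels = [label(seg) for seg in [(0.1, 0.1), (0.7, 0.6), (0.9, -0.5)]]
# labels == ["neutral", "happy", "angry"]
```

The ‘what kind of label?’ question is exactly about this design choice: discrete categories as here, or continuous dimensions, or episode-level annotations.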
In the beginning… • based on the claim that ‘there are six facial expressions recognized universally across cultures’… • all video databases used to contain images of sad, angry, happy or fearful people… • thus, more sad, angry, happy or fearful people appear, even when the data involve HCI, and subtle emotions and additional labels are out of the picture • can you really be afraid that often when using your computer?
the Humaine approach • so where is Humaine in all that? • subtle emotions • natural expressivity • alternative emotion representations • discussing dynamics • classification of emotional episodes from life-like HCI and reality TV
HUMAINE 2010 three years from now in a galaxy (not) far, far away…
a fundamental question • OK, people may be angry or sad, or express positive/active emotions • face recognition provides an answer to the ‘who?’ question • ‘when?’ and ‘where?’ are usually known or irrelevant • but does anyone know ‘why?’ • context information • semantics
is it me or…? • some modalities may display no clues or, worse, contradictory clues • the same expression may mean different things coming from different people • can we ‘bridge’ what we know about someone, or about the interaction, with what we sense? • and can we adapt what we know based on that? • or can we align what we sense with other sources?
another kind of language • sign language analysis poses a number of interesting problems • image processing and understanding tasks • syntactic analysis • context (e.g. when referring to a third person) • natural language processing • vocabulary limitations
want answers? Let us try to extend some of the issues already raised!
Semantics – Context (a peek at the future) • [architecture diagram] visual data → segmentation → feature extraction → visual analysis; fusion of classifiers C1, C2, …, Cn; semantic analysis with labelling and adaptation via a Fuzzy Reasoning Engine (FiRE); supported by a centralised/decentralised knowledge repository, an ontology infrastructure and context analysis
Standardisation Activities • W3C Multimedia Semantics Incubator Group • W3C Emotion Incubator Group • goal: provide machine-understandable representations of available emotion modelling, analysis and synthesis theory, cues and results, to be accessed through the Web and used in all types of affective interaction