multimodality, universals, natural interaction… and some other stories… Kostas Karpouzis & Stefanos Kollias, ICCS/NTUA, HUMAINE WP4
going multimodal • ‘multimodal’ is this decade’s main ‘affective interaction’ aspect • a plethora of modalities is available to capture and process • visual, aural, haptic… • ‘visual’ can be broken down into ‘facial expressivity’, ‘hand gesturing’, ‘body language’, etc. • ‘aural’ into ‘prosody’, ‘linguistic content’, etc.
why multimodal? • Extending unimodality… • recognition from traditional unimodal inputs had serious limitations • multimodal corpora are becoming available • What is there to gain? • have recognition rates actually improved? • or have we just introduced more uncertain features?
essential reading • S. Oviatt, “Ten Myths of Multimodal Interaction”, Communications of the ACM, Nov. 1999, Vol. 42, No. 11, pp. 74-81
putting it all together • myth #6: multimodal integration involves redundancy of content between modes • you have features from a person’s • facial expressions and body language • speech prosody and linguistic content • even their heart rate • so, what do you do when their face tells you something different from their …heart?
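One common way to combine non-redundant, possibly conflicting modalities is decision-level (late) fusion: each modality produces its own class probabilities, and the system blends them. The sketch below is illustrative only — the modality names, emotion labels and weights are invented for this example, not taken from the HUMAINE work.

```python
def fuse(scores_per_modality, weights):
    """Weighted average of per-modality class probabilities.

    scores_per_modality: {modality: {label: probability}}
    weights: {modality: relative trust in that modality}
    Returns the winning label and the fused score dict.
    """
    total = sum(weights[m] for m in scores_per_modality)
    fused = {}
    for modality, scores in scores_per_modality.items():
        w = weights[modality] / total  # normalise weights over present modalities
        for label, p in scores.items():
            fused[label] = fused.get(label, 0.0) + w * p
    return max(fused, key=fused.get), fused

# The face says one thing, the heart says another:
scores = {
    "face":      {"happy": 0.7, "angry": 0.3},
    "heartbeat": {"happy": 0.2, "angry": 0.8},
}
weights = {"face": 0.7, "heartbeat": 0.3}  # hypothetical: trust the face more
label, fused = fuse(scores, weights)
# label == "happy": the face's confidence outweighs the heartbeat's dissent
```

The point of the sketch is that fusion is arbitration, not confirmation: the weights encode how much each channel is trusted when the channels disagree.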
but it can be good • what happens when one of the available modalities is not robust? • better yet, when the ‘weak’ modality changes over time? • consider the ‘bartender problem’: • very little linguistic content reaches its target • mouth shape is still available (visemes) • the vocabulary is limited
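The bartender case suggests making the fusion weights themselves adaptive: estimate each channel's current reliability and renormalise. The reliability numbers below are hypothetical (think of a signal-to-noise estimate scaled to [0, 1]); this is a sketch of the idea, not the HUMAINE system's actual scheme.

```python
def adaptive_weights(reliability):
    """Turn per-modality reliability estimates into normalised fusion weights."""
    total = sum(reliability.values())
    return {m: r / total for m, r in reliability.items()}

# Quiet room: audio is the stronger channel.
quiet = adaptive_weights({"audio": 0.9, "visemes": 0.6})

# Noisy bar: audio reliability collapses, so lip reading takes over.
noisy = adaptive_weights({"audio": 0.1, "visemes": 0.6})
```

As the audio channel degrades over time, the weight mass shifts to the visual channel automatically — the ‘weak’ modality is demoted rather than allowed to corrupt the fused decision.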
again, why multimodal? • holy grail: assigning labels to different parts of human-human or human-computer interaction • yes, labels can be nice! • humans do it all the time • and so do computers (e.g., classification) • OK, but what kind of label?
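As a toy illustration of how a computer assigns such labels, here is nearest-centroid classification over segments of an interaction. The two-dimensional features (read them as arousal/valence estimates, an alternative emotion representation the deck mentions later) and the centroid positions are invented for this example.

```python
import math

# Hypothetical class centroids in (arousal, valence) space.
centroids = {
    "neutral": (0.0, 0.0),
    "happy":   (0.6, 0.7),
    "angry":   (0.8, -0.6),
}

def label(segment_features):
    """Assign the label whose centroid is closest to the segment's features."""
    return min(centroids,
               key=lambda c: math.dist(segment_features, centroids[c]))

# Three segments of an interaction, one feature vector each.
labels = [label(seg) for seg in [(0.1, 0.1), (0.7, 0.6), (0.9, -0.5)]]
# labels == ["neutral", "happy", "angry"]
```

The ‘what kind of label?’ question is exactly about this design choice: discrete categories as here, or continuous dimensions, or episode-level annotations.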
In the beginning… • based on the claim that ‘there are six facial expressions recognized universally across cultures’… • all video databases used to contain images of sad, angry, happy or fearful people… • thus, more sad, angry, happy or fearful people appear, even when the data involve HCI, and subtle emotions and additional labels are out of the picture • can you really be afraid that often when using your computer?
the Humaine approach • so where is Humaine in all that? • subtle emotions • natural expressivity • alternative emotion representations • discussing dynamics • classification of emotional episodes from life-like HCI and reality TV
HUMAINE 2010 three years from now in a galaxy (not) far, far away…
a fundamental question • OK, people may be angry or sad, or express positive/active emotions • face recognition provides an answer to the ‘who?’ question • ‘when?’ and ‘where?’ are usually known or irrelevant • but does anyone know ‘why?’ • context information • semantics
is it me or…? • some modalities may display no clues or, worse, contradictory clues • the same expression may mean different things coming from different people • can we ‘bridge’ what we know about someone, or about the interaction, with what we sense? • and can we adapt what we know based on that? • or can we align what we sense with other sources?
another kind of language • sign language analysis poses a number of interesting problems • image processing and understanding tasks • syntactic analysis • context (e.g. when referring to a third person) • natural language processing • vocabulary limitations
want answers? Let us try to extend some of the issues already raised!
Semantics – Context (a peek at the future) • [architecture diagram] visual data → segmentation → feature extraction → visual analysis; fusion of classifiers C1, C2, …, Cn; semantic analysis with labelling and adaptation via a Fuzzy Reasoning Engine (FiRE); supported by a centralised/decentralised knowledge repository, an ontology infrastructure and context analysis
Standardisation Activities • W3C Multimedia Semantics Incubator Group • W3C Emotion Incubator Group • goal: provide machine-understandable representations of available emotion modelling, analysis and synthesis theory, cues and results, to be accessed through the Web and used in all types of affective interaction