Explore the interaction of verbal, prosodic, and visual components in language understanding. Discover the significance of communication channels and their contributions.
INTERACTION OF THE VERBAL, PROSODIC, AND VISUAL COMPONENTS in language understanding Jaroslavl’, November 22, 2012 Andrej A. Kibrik (Institute of Linguistics RAN and Lomonosov Moscow State University) aakibrik@gmail.com
The mainstream linguistic approach • Language consists of hierarchically organized segmental units, such as phonemes, morphemes, words, phrases, and sentences • Linguistic form is thus equated with verbal form
However • Apart from sound, there are other channels (or components) of communication, first of all vision (body language: gesture, facial expression, gaze, posture, etc.) • Also, sound has prosodic, that is, non-verbal (non-segmental) aspects • Imagine prosody-free talk • or, vice versa, talk behind a wall
Communication channels • The verbal component, prosody, and body language all count as distinct communication (or information) channels • They all cooperate in getting the message from speaker to addressee • This is what is sometimes called the multimodal approach • Cf. Reformatskij 1963: How does the non-verbal “text” interact with the verbal text?
Multimodality • “A multimodal approach assumes that the message is ‘spread across’ all the modes of communication. If this is so, then each mode is a partial bearer of the overall meaning of the message.” (Kress 2002) • “Any use of language is inescapably multimodal” (Scollon 2006) • “Unimpaired communication is, of course, inherently multimodal, with the speech content being modified by prosody and delivered in parallel with facial expression, gesture, posture, and a range of other nonverbal communication methods.” (Alm 2006) • “Within biology, experimental psychology, and cognitive neuroscience, a separate rapidly growing literature has clarified that multisensory perception and integration cannot be predicted by studying the senses in isolation.” (Cohen and Oviatt 2006)
What is the contribution of different channels? • Traditional approach of mainstream linguistics: the verbal channel is so central that prosody and the visual channel are at best downgraded as “paralinguistics” • Applied psychology: it is often stated (the figures go back to Mehrabian 1971) that: • body language conveys 55% of information • prosody conveys 38% of information • the verbal component conveys 7% of information • “Words may be what men use when all else fails” (Krejdlin 2002: 6) • Who is right?
Relative contribution of three communication channels? • [Diagram: DISCOURSE divides into vocal channels and the visual channel; the vocal channels divide into the verbal channel and the prosodic channel]
Experimental design • Isolate the three communication channels • Present a sample discourse in all possible variants (2³ = 8) • Present each of the eight variants to a group of subjects • Assess the degree of understanding in each case • Such assessment may lead to estimates of the contributions of communication channels
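The eight variants are all subsets of the three channels, with the empty subset corresponding to a group that sees only the context. A minimal Python sketch of that enumeration follows; the names `CHANNELS` and `all_conditions` are illustrative and do not come from the studies.

```python
from itertools import combinations

# The three channels named in the talk.
CHANNELS = ("verbal", "prosodic", "visual")


def all_conditions(channels=CHANNELS):
    """Return every subset of channels, from the empty set to the full set."""
    conditions = []
    for size in range(len(channels) + 1):
        # combinations() yields each subset of the given size exactly once.
        for subset in combinations(channels, size):
            conditions.append(subset)
    return conditions


if __name__ == "__main__":
    for condition in all_conditions():
        # The empty subset is the context-only control condition.
        print(" + ".join(condition) if condition else "context only")
```

With three channels the list has one empty condition, three single-channel conditions, three two-channel conditions, and one condition with all channels.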
Studies in this line of research • Èl’bert 2006, term paper • Èl’bert 2007, diploma thesis • Reinterpreted and refined in Kibrik and Èl’bert 2008 • Molchanova 2008, term paper • Molchanova 2009, term paper • Molchanova 2010, diploma thesis • Reinterpreted and refined in Kibrik 2011
Èl’bert 2007, Kibrik and Èl’bert 2008 • Russian TV serial “Tajny sledstvija” (“Mysteries of the investigation”) • Experimental excerpt: 3 min. 20 sec. • Preceded by an 8-minute context (starting from the beginning of the series) • The excerpt consists entirely of a conversation, to ensure that we are testing the understanding of discourse rather than of the film in general • The two vocal channels were separated: • Verbal: running subtitles • Prosodic: superimposed filter creating the “behind a wall” effect • Participants: • 99 participants, divided into 8 groups • Native speakers of Russian • Each group comprised 10 to 17 participants
Eight experimental groups • Group 0: only the context excerpt • Groups 1 (one communication channel) • Verbal: subtitles, temporally aligned • Prosodic: filtered sound • Visual: video • Groups 2 (two communication channels): • Verbal + prosodic = original sound • Verbal + visual: subtitles and video • Prosodic + visual: filtered sound and video • Group 3: original material
Procedure • The context and the experimental excerpt were shown to a group of subjects on a large screen • Each subject was instructed to watch the context and the experimental excerpt and then answer a set of questions concerning the experimental excerpt alone • The questionnaire was constructed in accordance with the received principles of test tasks (Panchenko 2000) • The questionnaire contained 23 multiple-choice questions • A subject had to choose exactly one answer out of four listed variants • Example question: What does Tamara Stepanovna offer Masha before the beginning of the conversation? • a. to take off her coat • b. to have a cup of tea • c. to have a seat • d. to have a drink • The percentage of correct answers is used as an assessment of a subject’s degree of understanding
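The scoring rule on this slide (percentage of correct answers per subject) can be sketched as follows. This is only an illustration: the function names, the answer key, and the sample answers are hypothetical, and averaging over a group is an assumption about how group-level figures might be obtained.

```python
def percent_correct(answers, key):
    """Percentage of questions where the chosen option matches the key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must cover the same questions")
    correct = sum(1 for given, expected in zip(answers, key) if given == expected)
    return 100.0 * correct / len(key)


def group_mean(scores):
    """Mean score of a group of subjects (assumed aggregation, not from the slides)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical four-question key and one subject's answers.
    key = ["c", "a", "d", "d"]
    subject = ["c", "a", "b", "d"]
    print(percent_correct(subject, key))
```

Each answer is one of the options a to d, so a subject who guesses at random would be expected to score about 25%.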
Results • All three channels are substantially informative • Verbal > visual > prosodic • Integration of visual and prosodic channels is difficult
Molchanova 2010 • “Contribution of information channels in understanding spoken discourse: methodological aspects” • The following aspects of the prior study have been changed (improved) • Stimulus material • Prosodic channel • Verbal channel • Questionnaire • Interviewing procedure
Stimulus material: discourse type • Shortcomings of movies • Plot facilitates guessing • Possible familiarity with the movie • Quasi-natural behavior of actors • Solution: natural dialogue • Shared activity • Figure-guessing game • Can be filmed by one camera • Sample clip: все 3 канала.avi (“all 3 channels”), 0:19 – 0:57 • Remaining problems • Hard to remember the sequence of events • Many events are similar
Stimulus material: speakers • Shortcomings of the prior studies • Same-sex speakers indistinguishable in the prosody-only version • Solutions • Different sexes: F0 range is different • Additional features • Acquainted • Not close friends
Prosodic channel • Shortcomings of the prosodic material as used in previous studies • Èl’bert 2007: noisy sound • Molchanova 2009: unnatural, “electronic” sound • Solution: • Loudness is decreased radically at all frequencies except for the speaker’s average F0 • This has led to the “behind the wall” (or “behind the glass”) effect
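The idea of reducing loudness everywhere except near the speaker's average F0 can be illustrated with a toy frequency-domain filter. This is a sketch under stated assumptions, not the procedure used in the study: the function name `keep_f0_band`, the band width, and the attenuation factor are all hypothetical, and a naive discrete Fourier transform is used only to keep the example dependency-free.

```python
import cmath
import math


def keep_f0_band(samples, sample_rate, f0_hz, half_width_hz=20.0, attenuation=0.05):
    """Attenuate all frequency bins except those within half_width_hz of f0_hz."""
    n = len(samples)
    # Forward DFT (naive O(n^2), acceptable for a short illustrative signal).
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    filtered = []
    for k, value in enumerate(spectrum):
        # For a real signal, bins k and n - k carry the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        gain = 1.0 if abs(freq - f0_hz) <= half_width_hz else attenuation
        filtered.append(value * gain)
    # Inverse DFT; the imaginary part is only rounding noise for real input.
    return [
        (sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

Applied to a signal that mixes a tone at the assumed F0 with a higher tone, the function leaves the F0 tone at full strength and scales the higher tone by the attenuation factor. Real speech would need a time-varying treatment, which this sketch does not attempt.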
Verbal channel • Shortcomings of subtitles • Hard to read without punctuation • Especially at the rate of speech • And especially in the “verbal + visual” condition • Solution: spoken prosody-free signal • Each word in the transcript is replaced by an individually pronounced word • The words elicited in this way are concatenated in the original order
Verbal channel • Remaining problem • Unnatural input • No reduction • No intonation • etc.
Questionnaire • Shortcomings of prior studies • Èl’bert 2007: the gap between Group 0 (38.3%) and Group 3 (87.4%) is insufficient • Solution • Testing stage • Identify trivial questions (high Group 0) • Identify unfortunate questions (low Group 3) • 30 → 17 questions • Group 0: 24.7% correct answers • Group 3: 91.2% correct answers
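The testing stage described here amounts to filtering questions by how the two extreme groups answer them. A minimal sketch, assuming per-question correct-answer rates are available for Group 0 and Group 3; the cut-off values and all names below are hypothetical, since the slides do not state the criteria actually used.

```python
def select_questions(group0_rate, group3_rate, max_baseline=0.5, min_full=0.7):
    """Keep questions that are neither trivial nor unfortunate.

    group0_rate / group3_rate map a question id to the share of correct
    answers (0.0 to 1.0) in the context-only and full-material groups.
    max_baseline and min_full are assumed thresholds, not from the study.
    """
    kept = []
    for question in sorted(group0_rate):
        trivial = group0_rate[question] > max_baseline   # guessable from context alone
        unfortunate = group3_rate[question] < min_full   # hard even with all channels
        if not trivial and not unfortunate:
            kept.append(question)
    return kept


if __name__ == "__main__":
    # Hypothetical pilot rates for three questions.
    g0 = {"q1": 0.9, "q2": 0.2, "q3": 0.25}
    g3 = {"q1": 0.95, "q2": 0.9, "q3": 0.4}
    print(select_questions(g0, g3))
```

Removing both kinds of questions lowers the Group 0 score and raises the Group 3 score, which widens the range available for the intermediate conditions.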
Interviewing procedure • Shortcomings of prior studies • Participants of various ages and life experience • Multiple participants may affect each other’s performance • Need for a large room, loudspeakers, and a big screen • Solutions • Control for age, gender, geographical origin, social status • Remote implementation • Stimulus materials on YouTube • Questionnaire in Google Docs • All participants are in similar conditions • Comfortable, adjustable conditions • No need for audio and video control in large rooms
Kibrik and Èl’bert 2008 vs. Molchanova 2010 • The general picture is remarkably similar • All three channels are substantially informative • Verbal > visual > prosodic • The visual + prosodic dip is even sharper • Cleaner results • Two channels are much better than one channel • Verbal and visual channels integrate well
Normalized contribution of three channels • Suppose the three channels are independent • Sum up all percentages of individual channel contributions and normalize to 100% • Identify normalized contribution
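The normalization step can be written as c_i = 100 · p_i / (p_verbal + p_visual + p_prosodic), where p_i is the score obtained with channel i alone. A short sketch follows; the function name and the input numbers are placeholders rather than results from the studies, and the slides do not say whether the single-channel scores are first corrected for the Group 0 baseline, so this sketch takes them as given.

```python
def normalize_contributions(scores):
    """Rescale single-channel scores so that they sum to 100%."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {channel: 100.0 * value / total for channel, value in scores.items()}


if __name__ == "__main__":
    # Placeholder single-channel scores, NOT data from the experiments.
    example = {"verbal": 60.0, "visual": 40.0, "prosodic": 20.0}
    for channel, share in normalize_contributions(example).items():
        print(channel, round(share, 1))
```

The result is only meaningful under the slide's own assumption that the three channels contribute independently.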
Gender differences • Molchanova 2010: gender advantages • Percentages of correct answers
Conclusions • All communication channels are highly significant → the traditional linguistic viewpoint is erroneous • The verbal channel is the leading one → the viewpoint popular in applied psychology is erroneous • Information from the prosodic and the visual channels is primarily used through integration with the verbal channel • Very similar results have been attained in different studies, in spite of very different methodological details
Further questions • Auditory or graphic presentation of the “verbal alone” channel? • Optimal discourse type? • …and: Other suggestions on this approach?
Thanks for your attention • [Diagram labels: language; verbal channel; prosodic channel; visual channel]