Applying Speech and Language Technology to Foreign Language Education

Grażyna Demenko, Natalia Cylwik, Agnieszka Wagner
Adam Mickiewicz University, Institute of Linguistics, Department of Phonetics
2nd International Symposium on Multimedia – Applications and Processing (MMAP'09)

Introduction
L2 learning: acquiring and developing different skills
• reading - vocabulary
• writing - grammar
• listening - perception
• speaking - pronunciation & prosody
Computer-assisted language learning (CALL)
Computer-assisted pronunciation training (CAPT)

Outline
• Requirements on an intelligent tutoring system
• Software: the tutoring system AzAR
• Pronunciation training
  • curriculum
  • feedback
• Prosody training
  • curriculum
  • feedback

Requirements on an intelligent CAPT system
• allow for training of both pronunciation and prosody: poor pronunciation can preclude full intelligibility of speech, and prosody matters too, because it helps listeners process the segmental content
• identify precisely the location and type of each error
• score the learner’s utterance, giving immediate information on overall output quality
• provide effective feedback via different channels (visual, aural, descriptive, contrastive); the feedback should be relevant to the type of error made, easy to interpret, and constructive, so that the learner understands how to self-correct and improve
• keep track of the learner’s performance, so that features needing practice can be identified and progress can be monitored
• be user-friendly: it should be clear how to interpret displays and evaluate results

AzAR (Automat for Accent Reduction)
• focuses on specific language pairs: Slavonic (Polish, Slovak, Russian, Czech) and German
• is a knowledge-based system: it uses expert knowledge of typical errors that L2 learners make through interference from the phonology and phonetics of their native language (L1), based on analyses of large non-native speech corpora
• includes an extensive curriculum for production and perception training of difficult segmental and prosodic contrasts
• the learner’s task is to listen to a recording of the utterance produced by the reference voice and repeat it (production scenario), or to discriminate between utterances realized by the reference voice (perception scenario)

Pronunciation training: curriculum (1)
• Production exercises created by expert phoneticians on the basis of analyses of non-native speech corpora
• Goal (1): elicit specific pronunciation errors
• Goal (2): concentrate on the most common problems
  • pronunciation of sounds that do not exist in the learner’s mother tongue (L1) or are too difficult to pronounce (e.g. production of Polish vowels instead of German ones by speakers with L1 Polish)
  • carry-over of pronunciation regularities from L1 (e.g. assimilation rules with respect to devoicing)
  • overgeneralization of target-language (L2) regularities (e.g. mapping the Polish graphemes ą and ę to /ow~/ and /ew~/ irrespective of the following consonantal context)

Pronunciation training: curriculum (2)
• L1 PL – L2 DE
• L1 DE – L2 PL

Features of the feedback system (1)
• multimodality: visual and audio feedback
• the software applies HMM-based speech recognition and speech signal analysis to the learner’s input, which makes possible a visual and aural comparison of the learner’s performance with that of the reference voice
• in the detection and assessment of pronunciation errors the system relies on “mispronunciation hypotheses” defined by experts
• example: mięsień (a muscle)
  • canonical pronunciation: /m j e j~ s' e n'/
  • mispronunciation hypotheses:
    1) m j e-E~ j~- s' e n'
    2) m j e-E~ j~- s' e n'-n
    3) m j e-E~ j~- s'-S' e n'-n
    4) m j e-E~ j~- s'-S e n'-n
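The hypothesis notation above can be unpacked with a short sketch. It assumes, without confirmation from the slides, that a token `x-y` means "canonical phone x realized as y" and that `x-` (nothing after the hyphen) means "canonical phone x deleted". The function name `parse_hypothesis` is hypothetical and is not part of AzAR.

```python
from typing import List, Tuple


def parse_hypothesis(hyp: str) -> Tuple[List[str], List[str]]:
    """Split a hypothesis string into (canonical, realized) phone lists.

    Assumed reading of the notation (not confirmed by the source):
      "x-y" -> canonical phone x is realized as y (substitution)
      "x-"  -> canonical phone x is not realized (deletion)
      "x"   -> phone x is realized as expected
    """
    canonical: List[str] = []
    realized: List[str] = []
    for token in hyp.split():
        if "-" in token:
            source, _, target = token.partition("-")
            canonical.append(source)
            if target:  # an empty target is read as a deletion
                realized.append(target)
        else:
            canonical.append(token)
            realized.append(token)
    return canonical, realized


if __name__ == "__main__":
    hypotheses = [
        "m j e-E~ j~- s' e n'",
        "m j e-E~ j~- s' e n'-n",
        "m j e-E~ j~- s'-S' e n'-n",
        "m j e-E~ j~- s'-S e n'-n",
    ]
    for hyp in hypotheses:
        canonical, realized = parse_hypothesis(hyp)
        print(" ".join(canonical), "->", " ".join(realized))
```

Under this reading, every hypothesis shares the canonical sequence of the word, and only the realized sequence differs, which is what a recognizer would need in order to compare the variants against the learner's input.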

Features of the feedback system (2)
• the learner’s utterance is displayed and scored
• all uttered phones are marked using a color scale
• an oscillogram of the model utterance is presented simultaneously to allow for comparison
• animated visualization of the vocal tract (lip area and articulator movements) and a formant graph for particular phones
• the learner can listen to his/her own realization of the utterance and to that produced by the reference voice
Figure: user interface in AzAR, template for training vowel contrasts in German (here: long tense /i:/ versus short lax /I/)

Features of the feedback system (3)
• Tutorial: gives an introduction to acoustic and articulatory phonetics and explains how to interpret the acoustic displays

Features of the feedback system (4)
• For each exercise in the curriculum, the system provides a passage on the classification, features and articulation of the phone, together with a sagittal section of the vocal tract during production of the phone and pictures of the lip area

Prosody training: curriculum (1)
• Production and perception exercises created by expert phoneticians on the basis of analyses of non-native speech corpora
• Goal (1): elicit specific prosodic errors
• Goal (2): enable efficient training of the realization of prosodic features in smaller and larger syntactic units

Prosody training: curriculum (2)
Word-level prosody
• similar word form in Polish and German, but different accentuation: student (vs. Student), Irak (vs. Irak)
• stress shift: chodził - chodziłem
• ABW /a.be.vu/, menu, eksmąż, Acha!, matematyka, byliśmy
• regular word stress: łata - sałata; dał, dałby, dałby mi, dałby mi go
• do domu, nie wiem

Prosody training: curriculum (3)
Sentence-level prosody
• Intonation in questions:
  • Are you going home? (rising)
  • What are you doing? (falling)
  • Are you going by car or by plane? (falling)
  • - Do you have it? - What? - A dress. (falling)
  • - Do you have it? - What? - Do you have it? (rising)
• Intonation patterns (accentuation and phrasing) in simple and complex neutral statements
• Intonation patterns in commands, requests, warnings, contrastive and emphatic sentences

Prosody training: curriculum (4)
Perception exercises
• Asking for additional information/repetition: assigning an answer to the question, e.g.
  - I’ve eaten it. - What? a) A banana. b) I’ve eaten it.
• Completing a sentence with focus, e.g.
  - It’s a big car… a) not a small one. b) not a bus.
• Assigning a question to a sentence with focus, e.g.
  - Martha is going to the seaside. a) Who is going to the seaside? b) Where is Martha going?

Prosody training: feedback
• automatic assessment: qualitative (categorical) and quantitative (parametric)
• results are displayed on a color scale (red to green)
• the learner is instructed to compare his/her realization with that of the native speaker; quantitative measurements of both realizations and explanations are provided as support
• relevant portions of the intonation contour (accented and phrase-boundary words) are described parametrically
• a higher-level categorical representation of the utterance’s intonation is derived from the parametric description; intonation contours with different categorical representations convey different meanings
• an accurate visual representation of the student’s and the native speaker’s pitch contours in real time, paired with auditory feedback
• pitch contours are stylized (Pitch Line software) to provide a nearly continuous representation and to ensure that only perceptually relevant pitch variations are displayed
Figure: pitch contours of a Polish native speaker and a German learner (word labels: mam, ją, co, sukienkę)
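The step from a parametric description to a categorical label can be illustrated with a minimal sketch. It is not AzAR's actual procedure: the function names, the reduction of the contour to two F0 values (start and end of the relevant portion), and the 2-semitone threshold are all assumptions made for the example. Only the semitone conversion, 12 · log2(f_end / f_start), is standard.

```python
import math


def semitones(f_start_hz: float, f_end_hz: float) -> float:
    """Pitch interval between two F0 values, in semitones."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def classify_movement(f_start_hz: float, f_end_hz: float,
                      threshold_st: float = 2.0) -> str:
    """Map a parametric pitch change to a categorical label.

    threshold_st is a hypothetical cut-off: changes smaller than this
    (in either direction) are treated as perceptually level.
    """
    change = semitones(f_start_hz, f_end_hz)
    if change >= threshold_st:
        return "rising"
    if change <= -threshold_st:
        return "falling"
    return "level"


if __name__ == "__main__":
    # Illustrative F0 values (Hz) at the start and end of the final
    # phrase-boundary word; these numbers are invented, not measured.
    learner = classify_movement(220.0, 150.0)
    reference = classify_movement(180.0, 260.0)
    print("learner:", learner, "| reference:", reference,
          "| match:", learner == reference)
```

A mismatch between the learner's and the reference label is the kind of categorical difference that would signal a change in meaning, for example a falling contour where a rising question contour was expected.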

Training session in AzAR

Thank you for your attention!
