
SCILL: Spoken Conversational Interaction for Language Learning



Presentation Transcript


  1. SCILL: Spoken Conversational Interaction for Language Learning
  Stephanie Seneff (seneff@csail.mit.edu), Jim Glass (jrg@csail.mit.edu)
  Spoken Language Systems Group, MIT Computer Science and Artificial Intelligence Lab
  Steve Young (sjy@eng.cam.ac.uk)
  Speech Group, CUED Machine Intelligence Lab

  2. Conversational Interfaces
  [Diagram: system components: Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Language Generation, Speech Synthesis, Audio Database]

  3. Conversational Interfaces: Galaxy Architecture
  [Diagram: the same components (Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Language Generation, Speech Synthesis, Audio Database) connected through a central Hub]
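To make the hub-and-spoke organization concrete, here is a minimal sketch (not the actual GALAXY hub) of a central hub routing a user turn through the servers named in the diagram; the message format, routing order, and server stubs are all assumptions:

# Minimal sketch of a hub routing a turn through pipelined servers.
# Server names mirror the diagram; everything else is illustrative.

class Hub:
    def __init__(self):
        self.servers = {}          # name -> callable taking and returning a dict "frame"

    def register(self, name, fn):
        self.servers[name] = fn

    def process_turn(self, frame, pipeline):
        for name in pipeline:      # the hub decides the routing order
            frame = self.servers[name](frame)
        return frame

hub = Hub()
hub.register("recognizer",    lambda f: {**f, "hyp": "will it rain tomorrow"})
hub.register("understanding", lambda f: {**f, "meaning": {"topic": "weather", "date": "tomorrow"}})
hub.register("dialogue",      lambda f: {**f, "reply_frame": {"forecast": "rain likely"}})
hub.register("generation",    lambda f: {**f, "reply": "Rain is likely tomorrow."})

result = hub.process_turn({"audio": "<waveform>"},
                          ["recognizer", "understanding", "dialogue", "generation"])
print(result["reply"])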

  4. Bilingual Weather Domain: Video Clip

  5. Computer Aids through Conversational Interaction
  • Language teachers have limited time to interact with students in dialogue exchanges
  • Computers provide a non-threatening environment in which to practice communicating
  • Three-phase interaction framework is envisioned:
    • Preparation: practice phrases, simulated dialogues
    • Conversational interaction:
      • Telephone conversation with graphical support
      • Seamless translation aid
    • Assessment:
      • Review dialogue interaction
      • Feedback and fluency scores

  6. SCILL: A Spoken Computer Interface for Language Learning
  [Diagram: the system plays the roles of Domain Expert and Tutor; MIT SLS contributes bilingual conversational dialogue systems, and the CU Speech Group contributes speech recognition and pronunciation scoring]
  • Conversational systems provide an interactive environment for language learning
  • Speaks only the target language
  • Has access to information sources
  • Can provide translations for both user queries and system responses

  7. Technology Requirements
  • Robust recognition and understanding of foreign-accented speech
    • If recognition is too poor, student may become frustrated
    • Customize vocabulary and linguistic constructs to lesson plans
  • High quality cross-lingual language generation
  • Natural and fluent speech synthesis
  • Ability to automatically generate simulated dialogues
    • System should be able to generate multiple dialogues based on a given lesson topic on the fly
    • Allows the student to see example sentence constructs for a particular lesson
  • Ability to reconfigure quickly and easily to new lessons
  • Automatic scoring for fluency, pronunciation, tone quality, use of vocabulary, etc.

  8. SCILL System Overview
  [Diagram: User Interface connected to the SCILL system through a Web Server]

  9. Bilingual Spoken Dialogue Interaction: Current Status
  • Initial version of end-to-end system is in place for the weather domain
    • Rain, snow, wind, temperature, warnings (e.g., tornado), etc.
  • MIT recognizer supports both English and Mandarin
    • Seamless language switching
    • English queries are translated into Mandarin
    • Mandarin queries are answered in Mandarin
    • User can ask for a translation into English of the response at any time
  • Currently using off-the-shelf Mandarin synthesizer from ITRI
    • Plan to develop high quality domain-dependent Mandarin synthesis using our Envoice tools
  • System can be configured as telephone-only or as telephone augmented with a Web-based GUI interface

  10. Bilingual Recognizer Construction
  [Diagram: the English corpus is parsed to an interlingua and used to generate a Chinese corpus; each corpus yields a recognizer language model (English network, Chinese network)]
  • Create a Mandarin corpus by automatically translating the existing English corpus
  • Automatically induce language models for both the English and Mandarin recognizers using the NL grammar
  • The two recognizers compete in a common search space
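The "two recognizers compete" point can be illustrated with a hedged sketch: decode the same utterance with an English and a Mandarin recognizer and keep the better-scoring hypothesis. The recognizer interfaces and scores below are toy stand-ins, not the MIT decoder's actual shared search space:

# Illustrative only: pick the better-scoring hypothesis from two recognizers,
# approximating the shared search space as a score comparison.

def decode_bilingual(audio, recognizers):
    """recognizers: dict mapping language -> function(audio) -> (hypothesis, log_prob)."""
    best_lang, best_hyp, best_score = None, None, float("-inf")
    for lang, recognize in recognizers.items():
        hyp, score = recognize(audio)
        if score > best_score:
            best_lang, best_hyp, best_score = lang, hyp, score
    return best_lang, best_hyp

# Toy stand-ins for the English and Mandarin recognizers.
recognizers = {
    "english":  lambda audio: ("will it rain tomorrow", -42.0),
    "mandarin": lambda audio: ("ming2 tian1 hui4 xia4 yu3 ma5", -57.0),
}
lang, hyp = decode_bilingual("<waveform>", recognizers)
print(lang, hyp)   # seamless language switching falls out of which recognizer wins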

  11. HTK Mandarin Speech Recognizer
  Standard HTK LVCSR setup:
  • PLP front-end with 1st/2nd/3rd derivatives, transformed using HLDA
  • 3-state cross-word hidden Markov models
  • Decision-tree clustered context-dependent triphones
  • N-gram language model smoothed with a class-based language model
  Except:
  • Standard PLP front-end augmented with F0 + derivatives (F0 added after the HLDA transformation)
  • 46-phone acoustic model set with long final phones split, e.g., uang -> ua ng
  • Questions about tone added to decision-tree context clustering
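A small sketch, under stated assumptions, of the front-end detail that F0 and its derivatives are appended after the HLDA transformation; the array shapes and the use of numpy gradients for the derivatives are illustrative, not the HTK implementation:

import numpy as np

def mandarin_frontend(plp_with_derivs, hlda_matrix, f0_track):
    """plp_with_derivs: (T, D) PLP + 1st/2nd/3rd derivatives
       hlda_matrix:     (D', D) HLDA projection estimated during training
       f0_track:        (T,) smoothed/interpolated F0 values
       Returns (T, D' + 3) features: HLDA-projected PLP plus F0, dF0, ddF0."""
    projected = plp_with_derivs @ hlda_matrix.T          # HLDA applied first
    d_f0  = np.gradient(f0_track)                        # simple derivative estimates
    dd_f0 = np.gradient(d_f0)
    f0_block = np.stack([f0_track, d_f0, dd_f0], axis=1)
    return np.concatenate([projected, f0_block], axis=1) # F0 appended *after* HLDA

# Toy shapes: 100 frames, 52-dim PLP+derivatives projected to 39 dims by HLDA.
feats = mandarin_frontend(np.random.randn(100, 52),
                          np.random.randn(39, 52),
                          np.abs(np.random.randn(100)) * 40 + 150)
print(feats.shape)   # (100, 42)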

  12. HMM-Based Pronunciation Scoring
  Basic approach:
  • Estimate posterior probabilities P(p | A) (i.e., confidence scores) of each phone or syllable given the acoustics
  • Map confidence scores to a good/bad decision using data labelled by experts
  [Diagram: a simple approximation relating confidence scores to human perception; phone-level scores (sh ih d ax ...) are compared against expert good/bad rankings]
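A minimal sketch of the two-step recipe above: treat per-phone posteriors as confidence scores and fit a mapping to expert good/bad labels. The logistic-regression mapper and the toy expert data are assumptions; the HMM posterior estimation itself is not shown:

import numpy as np

def fit_score_mapper(confidences, expert_labels):
    """Fit a 1-D logistic regression mapping confidence -> P(good).
       confidences:   (N,) posterior/confidence scores for phones
       expert_labels: (N,) 1 = judged good, 0 = judged bad by experts."""
    w, b = 0.0, 0.0
    x, y = np.asarray(confidences), np.asarray(expert_labels)
    for _ in range(2000):                       # plain gradient ascent on the log-likelihood
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w += 0.1 * np.mean((y - p) * x)
        b += 0.1 * np.mean(y - p)
    return lambda c: 1.0 / (1.0 + np.exp(-(w * c + b)))

# Toy expert-labelled data: higher confidence tends to mean "good".
conf   = np.array([0.9, 0.8, 0.75, 0.4, 0.3, 0.2, 0.85, 0.35])
labels = np.array([1,   1,   1,    0,   0,   0,   1,    0])
score  = fit_score_mapper(conf, labels)
for phone, c in [("sh", 0.82), ("ih", 0.31)]:
    print(phone, "good" if score(c) > 0.5 else "bad")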

  13. Multilingual Translation Framework
  Common meaning representation: the semantic frame
  [Diagram: per-language recognition models, parsing rules, and speech corpora (English, Chinese, Spanish, Japanese) feed NLU to produce a semantic frame; NLG applies per-language generation rules and synthesis to render the frame in any of the languages]
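The organizing idea here, one parser and one generator per language around a shared semantic frame, can be sketched as a small registry; the frame contents and the per-language rules below are invented for illustration:

# Sketch: each language contributes parse (text -> semantic frame) and
# generate (semantic frame -> text); translation goes through the frame.

LANGUAGES = {}   # name -> {"parse": fn, "generate": fn}

def register(name, parse, generate):
    LANGUAGES[name] = {"parse": parse, "generate": generate}

def translate(text, source, target):
    frame = LANGUAGES[source]["parse"](text)       # language-specific NLU rules
    return LANGUAGES[target]["generate"](frame)    # language-specific NLG rules

# Toy English and Mandarin rules for a single weather pattern.
register("english",
         parse=lambda t: {"clause": "verify", "topic": "rain", "date": "tomorrow"},
         generate=lambda f: f"will it {f['topic']} {f['date']}")
register("mandarin",
         parse=lambda t: {"clause": "verify", "topic": "rain", "date": "tomorrow"},
         generate=lambda f: "ming2 tian1 hui4 xia4 yu3 ma5")

print(translate("will it rain tomorrow", "english", "mandarin"))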

  14. Content Understanding and Translation
  English: Some thunderstorms may be accompanied by gusty winds and hail
  Semantic frame (indexed under weather, wind, rain, storm, and hail):
    clause: weather_event
      topic: precip_act, name: thunderstorm, num: pl, quantifier: some
      pred: accompanied_by, adverb: possibly
        topic: wind, num: pl, pred: gusty
        and: precip_act, name: hail
  Spanish: Algunas tormentas posiblemente acompañadas por vientos racheados y granizo
  Chinese: 一些雷雨可能會伴有陣風和冰雹
  Japanese:
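The indexing note on this slide suggests an inverted index from content words to frames; the sketch below guesses at how index keys might be collected from the nested frame (the keyword table and traversal rule are assumptions, not the system's actual rule):

# Sketch: collect index keys from a nested semantic frame so forecasts can be
# retrieved by topic (weather, wind, rain/storm, hail). Keyword mapping is assumed.

TOPIC_KEYWORDS = {"thunderstorm": ["rain", "storm"], "wind": ["wind"], "hail": ["hail"]}

def index_keys(frame, keys=None):
    keys = keys if keys is not None else {"weather"}        # every weather_event frame
    for k, v in frame.items():
        if isinstance(v, dict):
            index_keys(v, keys)
        elif k in ("name", "topic") and v in TOPIC_KEYWORDS:
            keys.update(TOPIC_KEYWORDS[v])
    return keys

frame = {"clause": "weather_event",
         "topic": {"name": "thunderstorm", "num": "pl", "quantifier": "some"},
         "pred": {"name": "accompanied_by", "adverb": "possibly",
                  "topic": {"name": "wind", "pred": "gusty"},
                  "and": {"name": "hail"}}}
print(sorted(index_keys(frame)))   # ['hail', 'rain', 'storm', 'weather', 'wind']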

  15. Audio Demonstration
  • User asks: “Will it rain tomorrow in Boston?”
  • System paraphrases query, then responds in Chinese
  • “Please repeat that” in English or Chinese interpreted identically
  • System repeats response in Chinese
  • User speaks query in English: seamless language switching
  • System paraphrases, then translates query into Chinese
  • User attempts to repeat translation
  • Recognition error: hallucinates an erroneous date (February 30), which will be remembered
  • System supplies known cities in England
  • User chooses London
  • System has no weather for London on February 30
  • User asks “how about today?”
  • System provides London’s weather today
  • User asks for a translation into English, which is provided

  16. Proposed Translation Procedure
  English query: “what is your name”  -->  Chinese query: “ni3 jiao4 shen2_me5 ming2_zi4”
  Pipeline: English query --parse--> linguistic frame --transfer (via key-value representation)--> linguistic frame --generate--> Chinese query
  English linguistic frame:
    {c wh_question :auxil "link" :topic {q name :poss "you" } :complement {q object :trace "what" } }
  Key-value representation:
    {c eform :attribute "name" :person "you" }
  Chinese linguistic frame:
    {c wh_question :pro "you" :verb "call" :topic {q name } :complement {q object :trace "what" } }
  If the generated query fails to parse, simplify the interlingua and generation
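The backoff step at the bottom of this slide (if the generated query fails to parse, simplify the interlingua and regenerate) might look like the loop below; what "simplify" means is an assumption here, dropping optional modifier slots one at a time:

# Sketch of generate-then-verify translation with interlingua simplification.
OPTIONAL_KEYS = ("adverb", "quantifier", "num")      # assumed "optional" slots

def simplify(frame):
    """Drop one optional slot (recursively) to produce a simpler interlingua."""
    out, dropped = {}, False
    for k, v in frame.items():
        if not dropped and k in OPTIONAL_KEYS:
            dropped = True
            continue
        out[k] = simplify(v) if isinstance(v, dict) else v
    return out

def translate_with_backoff(frame, generate, parses_ok, max_tries=4):
    for _ in range(max_tries):
        sentence = generate(frame)                   # target-language generation
        if parses_ok(sentence):                      # verify by re-parsing
            return sentence
        frame = simplify(frame)                      # back off to a simpler interlingua
    return None                                      # give up; ask the user to rephrase

# Toy components: generation echoes the frame, the "parser" rejects long outputs.
gen = lambda f: " ".join(f"{k}={v}" for k, v in f.items() if not isinstance(v, dict))
ok  = lambda s: len(s.split()) <= 2
print(translate_with_backoff({"verb": "call", "adverb": "politely", "num": "sg"}, gen, ok))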

  17. Proposed Exercise using Typed Inputs
  Type-in window
    Input: Da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5?
    • System is able to parse the query in spite of tone errors and (limited) syntax errors
  Reply window
    Query: Da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5?
    • System color-codes errors in tone and in syntactic constructs
    Response: Da2 la1 si1 ming2 tian1 xia4 wu3 xia4 te4 da4 yu3
  Next: Dallas rain tomorrow; Los Angeles wind Saturday
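A rough sketch of the tone-checking side of this exercise: align the student's typed pinyin against a reference and flag syllables whose tone digits differ. The GUI's color coding is reduced to a text marker, and the naive positional alignment is an assumption:

# Sketch: flag tone errors in typed pinyin against a reference (toy alignment).

def check_tones(student, reference):
    """student, reference: space-separated pinyin with tone digits, e.g. 'da2 la1 si1'."""
    report = []
    for s, r in zip(student.split(), reference.split()):
        s_base, s_tone = s.rstrip("12345"), s[len(s.rstrip("12345")):]
        r_base, r_tone = r.rstrip("12345"), r[len(r.rstrip("12345")):]
        if s_base != r_base:
            report.append(f"[{s}: wrong syllable, expected {r}]")
        elif s_tone != r_tone:
            report.append(f"[{s}: tone error, expected {r_tone}]")   # would be color-coded in the GUI
        else:
            report.append(s)
    return " ".join(report)

print(check_tones("da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5",
                  "da2 la1 si1 hui4 xia4 yu3 ming2 tian1 ma5"))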

  18. Testing the Effectiveness of Training on Typed Input: Proposed Measures
  • Compare the quality of spoken dialogue recorded before and after a Web-based training session
  • Measures of fluency:
    • Syntactic well-formedness
    • Tone production accuracy
    • Frequency of pauses, edits, and filler words
    • Phonetic quality, etc.
  • Measures of communication success:
    • Frequency of usage of translation assistance
    • Understanding error rate
    • Task completion
    • Time to completion, etc.
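Some of the fluency measures listed above (pauses, fillers, speaking rate) could be computed from a time-aligned transcript; the token format, pause threshold, and filler list in this sketch are assumptions made only to keep the metrics concrete:

# Sketch: crude fluency metrics from a time-aligned transcript.
# Each token is (word, start_sec, end_sec); the filler list is an assumption.

FILLERS = {"um", "uh", "er", "nei4_ge5"}   # English fillers plus one Mandarin-style filler

def fluency_metrics(tokens, pause_threshold=0.5):
    words = [w for w, _, _ in tokens]
    pauses = sum(1 for (_, _, e1), (_, s2, _) in zip(tokens, tokens[1:])
                 if s2 - e1 > pause_threshold)
    fillers = sum(1 for w in words if w.lower() in FILLERS)
    duration = tokens[-1][2] - tokens[0][1]
    return {"words_per_minute": 60.0 * len(words) / duration,
            "pauses": pauses,
            "filler_rate": fillers / len(words)}

transcript = [("ming2", 0.0, 0.3), ("tian1", 0.3, 0.6), ("um", 0.6, 0.9),
              ("hui4", 1.8, 2.1), ("xia4", 2.1, 2.4), ("yu3", 2.4, 2.7), ("ma5", 2.7, 3.0)]
print(fluency_metrics(transcript))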

  19. Technology Goal: Automated Language Understanding
  [Diagram: grammar induction; an English sentence is parsed into an interlingual representation, from which a Mandarin sentence is generated, producing corpus pairs that induce a Mandarin parsing grammar]
  • Once translation ability exists from English to the target language, a reverse system can be created almost effortlessly
  • Utilizes the English parse tree and the Mandarin generation lexicon to induce the Mandarin parse tree
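The induction idea, reusing the English parse tree together with the Mandarin generation lexicon, can be illustrated by projecting tree leaves through a word-level lexicon; real grammar induction is far richer, and the tree encoding and lexicon below are invented for the sketch:

# Sketch: project an English parse tree onto Mandarin words via a generation
# lexicon, yielding a Mandarin tree with the same constituent structure.

GEN_LEXICON = {"rain": "xia4 yu3", "tomorrow": "ming2 tian1", "will": "hui4"}  # toy lexicon

def project_tree(node):
    """node: (label, children) where each child is a sub-node or an English word string."""
    label, children = node
    projected = []
    for child in children:
        if isinstance(child, tuple):
            projected.append(project_tree(child))
        else:
            projected.append(GEN_LEXICON.get(child, child))  # translate the leaves
    return (label, projected)

english_tree = ("verify", [("temporal", ["tomorrow"]), ("predicate", ["will", "rain"])])
print(project_tree(english_tree))
# -> ('verify', [('temporal', ['ming2 tian1']), ('predicate', ['hui4', 'xia4 yu3'])])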

  20. Building NxN Translation Efficiently
  [Diagram: English connects to the interlingua; automatic grammar induction links the interlingua to Japanese, Mandarin, Arabic, French, Spanish, Urdu, and Korean]
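As a rough count of the payoff (not stated on the slide): with the eight languages shown, direct pairwise transfer would require 8 x 7 = 56 separate translation systems, whereas routing everything through the interlingua needs only one parser and one generator per language, i.e. 2 x 8 = 16 modules, and each newly added language contributes just two more.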

  21. Future Plans (Near Term and Long Term)
  • Install current version of system at Cambridge University
  • Incorporate CU Mandarin recognizer
  • Add support for audio input at the computer
  • Build high quality synthesis capability
  • Improve understanding, dialogue, and translation performance
  • Collect and transcribe data from language learners and assess both system and students
  • Develop various scoring algorithms for student fluency
  • Refine all aspects of system based on collected data
