
Speech and Language Technology For Dialog-based CALL


Presentation Transcript


  1. Speech and Language Technology For Dialog-based CALL Gary Geunbae Lee, POSTECH

  2. Outline • 1. Introduction • 2. Spoken Dialog Systems • 3. DBCALL: Educational Error Handling • 4. PESAA: Postech English Speaking Assessment and Assistant • 5. Field Study

  3. CHAPTER 1 Introduction

  4. English Tutoring Methods • Traditional Approaches: <Multimedia> <Classroom> <Textbook> • CALL Approaches: <CMC> <ICALL>

  5. Socio-Economic Effects • Changing the current foreign language education system in public schools • From vocabulary- and grammar-centered methodology • To speaking ability • Significant effect in reducing private English education fees • Private English education fees in Korea reach up to 16 trillion won annually • Expected potential for overseas export • Japan, China, etc.

  6. Interdisciplinary Research • Evaluation • Cognitive effect • Affective effect • NLP • Dialog management • Error detection • Corrective feedback • SLA • Comprehensible input and output • Corrective feedback • Attitude & motivation

  7. Second Language Acquisition Second Language Acquisition Theory • Input Enhancement • Comprehensible input • Provision of inputs with high frequency • Immersion • Authentic environment • Direct form-meaning mapping • Noticing & Attention • Output hypothesis test • Corrective feedback • Affective factors • Motivation • Goal achievement & rewards • Interest • Importance of L2

  8. Dialog-Based CALL (DB-CALL) • Spoken Dialog System • DB-CALL System <Educational Robot> <3D Educational Game>

  9. Existing DB-CALL Systems • Alelo • Tactical language & culture training system • Learn Iraqi Arabic by playing a fun video game • Dedicated to serving language and culture learning needs of the military • SPELL • Learning English in functional situations such as going to a restaurant, expressing (dis-)likes, etc. • The speech recogniser is programmed to recognise grammatical and some ungrammatical utterances • DEAL • Learning in a flea market situation • The model can also convey extra-linguistic signs such as lip-synching, frowning, nodding, and eyebrow movements

  10. Video Demo

  11. CHAPTER 2 Spoken Dialog Systems

  12. SPOKEN DIALOG SYSTEM (SDS)

  13. SDS Applications • Home networking • Car navigation • Tele-service • Robot interface

  14. Automatic Speech Recognition (ASR) • Example input: 버스 정류장이 어디에 있나요? ("Where is the bus stop?") • Runtime pipeline: speech signals → feature extraction → decoding → word sequence, where decoding searches a network built from the acoustic model, pronunciation model, and language model • Training: HMM estimation on a speech DB (acoustic model), G2P (pronunciation model), LM estimation on text corpora (language model)
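In equation form, decoding amounts to the standard noisy-channel search (a textbook formulation, not notation from the slides): W* = argmax_W P(X | W) · P(W), where X is the extracted feature sequence, P(X | W) is scored by the acoustic and pronunciation models, and P(W) by the language model.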

  15. Spoken Language Understanding (SLU) • Semantic frame extraction (~ information extraction approach) • Dialog act / main action identification ~ classification • Frame-slot object extraction ~ named entity recognition • Object-attribute attachment ~ relation extraction • Overall architecture of the semantic analyzer: feature extraction/selection over the information source, followed by dialog act identification, frame-slot extraction, and relation extraction, unified into one semantic frame • Example semantic frame for "How to get to DisneyWorld?": Domain = Navigation, Dialog Act = WH-question, Main Action = Search, Object.Location.Destination = DisneyWorld
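A minimal sketch of these two SLU subtasks, reproducing the slide's DisneyWorld frame; the class name, keyword rules, and gazetteer are illustrative assumptions, not the actual classifier-based implementation:

    # Toy SLU: dialog-act classification + frame-slot extraction -> semantic frame.
    from dataclasses import dataclass, field

    @dataclass
    class SemanticFrame:
        domain: str
        dialog_act: str
        main_action: str
        slots: dict = field(default_factory=dict)

    def understand(utterance: str) -> SemanticFrame:
        # Dialog act / main action identification (~ classification); a real
        # system uses trained classifiers, not keyword rules.
        dialog_act = "WH-question" if utterance.lower().startswith(("how", "what", "where")) else "statement"
        main_action = "Search" if "get to" in utterance else "Unknown"
        # Frame-slot object extraction (~ named entity recognition), via a toy gazetteer.
        slots = {}
        for place in ("DisneyWorld", "Statue of Liberty"):
            if place in utterance:
                slots["Object.Location.Destination"] = place
        return SemanticFrame("Navigation", dialog_act, main_action, slots)

    print(understand("How to get to DisneyWorld?"))
    # -> domain='Navigation', dialog_act='WH-question', main_action='Search',
    #    slots={'Object.Location.Destination': 'DisneyWorld'}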

  16. JOINT APPROACH • Named Entity ↔ Dialog Act [Jeong and Lee, SLT2006][Jeong and Lee, IEEE TASLP2008]

  17. HDP-HMM for Unsupervised Dialog Acts • Generative story: • β ~ GEM(α), ω ~ Dir(ω0) • For each hidden state k ∈ [1, 2, …]: πk ~ DP(α′, β), ϕk ~ Dir(ϕ0), θk ~ Dir(θ0) • For each dialog d: λd ~ Beta(λ0) • For each time stamp t: zt ~ Multi(πzt−1) • For each entity e: ei ~ Multi(θzt) • For each word w: xi ~ Bern(λd) [select word type]; if xi = 0: wi ~ Multi(ϕzt), else wi ~ Multi(ω) [background LM]
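A runnable sketch of this generative story, truncated to K states so the Dirichlet-process draws become finite Dirichlets (entity emissions θk are omitted for brevity; all hyperparameter values are assumptions):

    # Truncated sketch of the HDP-HMM generative story above.
    import numpy as np

    rng = np.random.default_rng(0)
    K, V, alpha, alpha2 = 8, 50, 1.0, 1.0

    # beta ~ GEM(alpha): truncated stick-breaking weights over states
    sticks = rng.beta(1.0, alpha, size=K)
    beta = sticks * np.concatenate(([1.0], np.cumprod(1 - sticks)[:-1]))
    beta /= beta.sum()

    pi = rng.dirichlet(alpha2 * beta + 1e-6, size=K)  # pi_k ~ DP(alpha', beta), finite approximation
    phi = rng.dirichlet(np.ones(V), size=K)           # per-state word distributions phi_k
    omega = rng.dirichlet(np.ones(V))                 # shared background LM omega

    def generate_dialog(T=5, words_per_turn=6, lam0=(2.0, 2.0)):
        lam = rng.beta(*lam0)                         # lambda_d: per-dialog background-word rate
        z, turns = 0, []
        for _ in range(T):
            z = rng.choice(K, p=pi[z])                # z_t ~ Multi(pi_{z_{t-1}})
            words = [rng.choice(V, p=omega if rng.random() < lam else phi[z])
                     for _ in range(words_per_turn)]  # x_i ~ Bern(lambda_d) selects word type
            turns.append((z, words))
        return turns

    print(generate_dialog())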

  18. CRF with Posterior Regularization for Unsupervised NER • Pipeline: entries from a dictionary/DB/the Web (e.g., PLACE: Statue of Liberty, PLACE: Times Square, PLACE: Central Park, BOARD_TYPE: Hop-on, BOARD_TYPE: Hop-off, …) are heuristically matched against an unlabeled corpus (e.g., a New York City bus tour dialog) to produce labeled features; features are extracted, constraints are derived, and the CRF model is learned with posterior regularization (PR) • Example hypotheses: "Which tour goes to the <PLACE>Statue of Liberty</PLACE> ?", "You can visit the <PLACE>Statue of Liberty</PLACE> on either tour ." • The model yields per-token label posteriors (e.g., W8=the → O: 0.924, TOURS-i: 0.064, …) that the constraints shape during learning

  19. Vanilla Example-Based DM (EBDM) • Example-based approaches index a dialog corpus by semantic & discourse features and retrieve the dialog example with the most similar dialog state • Turn #1 (Domain = Building_Guidance): • USER: 회의실이 어디지? ("Where is the meeting room?") [Dialog Act = WH-QUESTION] [Main Goal = SEARCH-LOC] [ROOM-TYPE = 회의실 (meeting room)] • SYSTEM: 3층에 교수회의실, 2층에 대회의실, 소회의실이 있습니다. ("The faculty meeting room is on the 3rd floor; the large and small meeting rooms are on the 2nd floor.") [System Action = inform(Floor)] • Indexed dialog example: Domain = Building_Guidance, Dialog Act = WH-QUESTION, Main Goal = SEARCH-LOC, ROOM-TYPE = 1 (filled), ROOM-NAME = 0 (unfilled), LOC-FLOOR = 0, PER-NAME = 0, PER-TITLE = 0, Previous Dialog Act = <s>, Previous Main Goal = <s>, Discourse History Vector = [1,0,0,0,0], Lexico-semantic Pattern = "ROOM_TYPE 이 어디 지 ?", System Action = inform(Floor) [Lee et al., SPECOM2009]
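A toy sketch of example-based dialog management: dialog examples indexed by semantic/discourse features, with the best-matching example's system action reused for the current state. Feature names follow the slide; the overlap-based similarity is an illustrative assumption:

    # Toy EBDM: retrieve the most similar indexed example, reuse its action.
    EXAMPLE_DB = [
        {"dialog_act": "WH-QUESTION", "main_goal": "SEARCH-LOC",
         "filled_slots": {"ROOM-TYPE"}, "system_action": "inform(Floor)"},
        {"dialog_act": "REQUEST", "main_goal": "GUIDE",
         "filled_slots": {"ROOM-NAME"}, "system_action": "guide(Route)"},
    ]

    def select_action(state):
        def score(ex):  # simple feature-overlap similarity
            return ((ex["dialog_act"] == state["dialog_act"])
                    + (ex["main_goal"] == state["main_goal"])
                    + len(ex["filled_slots"] & state["filled_slots"]))
        return max(EXAMPLE_DB, key=score)["system_action"]

    state = {"dialog_act": "WH-QUESTION", "main_goal": "SEARCH-LOC",
             "filled_slots": {"ROOM-TYPE"}}
    print(select_action(state))   # -> inform(Floor)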

  20. Error Handling and N-best Support • To increase the robustness of EBDM with prior knowledge: 1) Error handling • If the system knows what the user will do next (the agenda graph's focus node, e.g., LOCATION, with next subtasks ROOM ROLE, OFFICE PHONE NUMBER, and GUIDE), it can generate dynamic help • AgendaHelp • S: Next, you can do the subtask 1) asking the room's role, or 2) asking the office phone number, or 3) selecting the desired room for navigation • UtterHelp • S: Next, you can say 1) "What is it?", or 2) "What's the phone number of [ROOM_NAME]?", or 3) "Let's go there." [Lee et al., CSL2010]

  21. Error Handling and N-best Support • To increase the robustness of EBDM with prior knowledge: 2) N-best support • If the system knows which subtask is more probable next, it can rescore the N-best ASR hypotheses (h1~hn), e.g., hypotheses tied to ROOM NAME, OFFICE PHONE NUMBER, FLOOR LOCATION
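A toy sketch of such rescoring; the hypotheses, scores, and the subtask prior are invented for illustration:

    # Rescore ASR n-best hypotheses with an agenda-derived prior over subtasks.
    def rescore(nbest, next_subtask_prior):
        # nbest: list of (hypothesis, asr_score, subtask); combine ASR confidence
        # with the prior probability of each hypothesis's subtask.
        return sorted(
            ((h, asr * next_subtask_prior.get(task, 0.05)) for h, asr, task in nbest),
            key=lambda x: x[1], reverse=True)

    nbest = [("what is the phone number", 0.40, "OFFICE PHONE NUMBER"),
             ("what floor is it on",      0.45, "FLOOR LOCATION"),
             ("let's go there",           0.15, "GUIDE")]
    prior = {"OFFICE PHONE NUMBER": 0.6, "FLOOR LOCATION": 0.2, "GUIDE": 0.2}
    print(rescore(nbest, prior)[0][0])   # ASR's second-best wins after rescoring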

  22. Misunderstanding handling by Confirmation [Kim et al SLT 2010]

  23. The Framework of Ranking-Based EBDM • EBDM retrieves candidate dialog examples; a scoring module computes features for each candidate: discourse similarity, relative position, user intention (system intention), system intention (user intention), entity constraint, and dialog act • RankSVM ranks the candidates using the calculated scores [Noh et al., IWSDS2011]
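A minimal pairwise RankSVM sketch over the feature types listed above, using scikit-learn's LinearSVC on score differences; the feature values and relevance grades are fabricated:

    # Pairwise ranking: learn a linear scorer from differences of feature vectors.
    import numpy as np
    from sklearn.svm import LinearSVC

    # Candidate examples: (discourse similarity, relative position,
    # intention match, entity constraint, dialog-act match), graded relevance.
    X = np.array([[0.9, 0.1, 1, 1, 1],   # good match
                  [0.4, 0.5, 0, 1, 1],
                  [0.2, 0.8, 0, 0, 0]])  # poor match
    y = np.array([2, 1, 0])

    pairs, labels = [], []
    for i in range(len(X)):
        for j in range(len(X)):
            if y[i] != y[j]:             # classify the sign of the score difference
                pairs.append(X[i] - X[j])
                labels.append(1 if y[i] > y[j] else -1)

    ranker = LinearSVC().fit(np.array(pairs), np.array(labels))
    scores = X @ ranker.coef_.ravel()    # higher score = better example
    print(scores.argmax())               # -> 0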

  24. Dialog Simulation • User simulation for spoken dialog systems involves four essential problems, including user intention simulation, user utterance simulation, and ASR channel simulation, which together produce simulated users that interact with the spoken dialog system [Jung et al., CSL 2009]
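A toy end-to-end sketch of the three simulation layers named on the slide; the intention bigram table, utterance templates, and deletion-only ASR noise model are illustrative assumptions:

    # Simulated user: intention simulation -> utterance simulation -> ASR channel.
    import random

    INTENTION_BIGRAMS = {"<s>": ["greet"], "greet": ["ask_location"],
                         "ask_location": ["thank"], "thank": ["</s>"]}
    TEMPLATES = {"greet": "hello", "ask_location": "where is the bus stop",
                 "thank": "thank you"}

    def simulate_turn(prev_intention, wer=0.2):
        intention = random.choice(INTENTION_BIGRAMS[prev_intention])  # intention simulation
        if intention == "</s>":
            return intention, None
        words = TEMPLATES[intention].split()                          # utterance simulation
        noisy = [w for w in words if random.random() > wer]           # ASR channel: word deletions
        return intention, " ".join(noisy)

    intent = "<s>"
    while True:
        intent, utt = simulate_turn(intent)
        if intent == "</s>":
            break
        print(intent, "->", utt)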

  25. Dialog Studio Architecture • Design step: semantic structure, dialog structure, knowledge structure • Annotation step: semantic annotator, dialog annotator, knowledge annotator, knowledge importer (files) • Corpus: SLU corpus, dialog corpus, knowledge source • Language synchronization step: dialog utterance pool • Training step: ASR trainer, SLU trainer, DM trainer, knowledge builder • Models: ASR model, SLU model, dialog model, knowledge model • Running step: ASR, SLU, DM [Jung et al., SPECOM 2008]

  26. Architecture of WOZ [Lee et al., SLATE2011] • The human subject speaks into a mic and hears TTS output through a speaker • The wizard watches a wizard screen, enters text (rendered via TTS), and controls the user character and NPCs on the user screen • User and wizard speech are relayed over network RPC

  27. User Screen (Mission)

  28. CHAPTER 3 DBCALL: Educational Error Handling

  29. Global Errors • Global errors are errors that affect overall sentence organization. They are likely to have a marked effect on comprehension. [1] • Example: • S: What is the purpose of your trip? • U: It's ... I ... purpose business [Intention: inform(trip-purpose)] • S: Sorry, I didn't understand. What did you say? You can say "I am here on business" • U: I am here on business

  30. Hybrid Model • Robust to learners' errors • Hybrid model combining an utterance-based model and a dialog context-based model • The learner's utterance feeds level-1 through level-N utterance models (each trained on level-specific data), the dialog state feeds the dialog context model, and the dialog manager combines them to infer the learner's intention Lee, S., Lee, C., Lee, J., Noh, H., & Lee, G. G. (2010). Intention-based Corrective Feedback Generation using Context-aware Model. Proceedings of the International Conference on Computer Supported Education.

  31. Formulating the prediction as probabilistic inference: letting u be the learner's intention, w the utterance, and s the dialog state, u* = argmax_u P(u | w, s) = argmax_u P(w | u, s) · P(u | s) / P(w | s) (chain rule, Bayes' rule) ≈ argmax_u P(w | u) · P(u | s) (ignoring terms invariant in u) • Utterance model P(w | u) • Maximum Entropy • Features: word, part of speech • Dialog-context model P(u | s) • Enhanced K-Nearest Neighbors • Features: previous system intention, previous user intention, current system intention, a list of exchanged information, number of database query results
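A toy sketch of the combined prediction score(u) = P(w | u) · P(u | s), reusing the fruit-store example from the next slide; the probability tables are fabricated stand-ins for the MaxEnt and KNN models:

    # Combine utterance-model and dialog-context-model scores per intention.
    UTTERANCE_MODEL = {  # P(words | intention), toy values
        ("i need three oranges", "Inform(Order_Quantity)"): 0.7,
        ("i need three oranges", "Inform(Order_Fruit)"): 0.2,
    }
    CONTEXT_MODEL = {    # P(intention | dialog state), toy values
        ("Ask(Order_Quantity)", "Inform(Order_Quantity)"): 0.8,
        ("Ask(Order_Quantity)", "Inform(Order_Fruit)"): 0.1,
    }

    def predict(words, system_intention, candidates):
        return max(candidates,
                   key=lambda u: UTTERANCE_MODEL.get((words, u), 1e-6)
                                 * CONTEXT_MODEL.get((system_intention, u), 1e-6))

    print(predict("i need three oranges", "Ask(Order_Quantity)",
                  ["Inform(Order_Quantity)", "Inform(Order_Fruit)"]))
    # -> Inform(Order_Quantity)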

  32. Dialog-Context Model • Segment #2 (Domain = Fruit Store), from the dialog corpus: • SYSTEM: Namsu, what would you like to buy today? [Intention = Ask(Select_Item)] • USER: I'd like to buy some oranges [Intention = Inform(Order_Fruit), ITEM_NAME = orange] • SYSTEM: How many oranges do you need? [Intention = Ask(Order_Quantity)] • USER: I need three oranges [Intention = Inform(Order_Quantity), NUM = three] • Indexed by semantic & discourse features into the dialog state space: Domain = Fruit_Store, Previous System Intention = Ask(Select_Item), Previous User Intention = Inform(Order_Fruit), System Intention = Ask(Order_Quantity), Exchanged Information State = [ITEM_NAME = 'orange' (C), ITEM_QUANTITY = 3 (U)], Number of DB query results = 0 → User Intention = Inform(Order_Quantity)

  33. Recast Feedback Generation • Flow: recognize the intention of the user's utterance → search the example expression DB for that intention → pattern-match the utterance against the example expressions → if the mismatch N > θ, generate feedback; otherwise no feedback
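A toy sketch of this flow, reading the slide's "N > θ" branch as "generate feedback when the mismatch with the closest example exceeds θ"; the expression DB, similarity measure, and threshold are illustrative assumptions:

    # Recast feedback: compare the utterance to example expressions for the intention.
    import difflib

    EXPRESSION_DB = {"inform(trip-purpose)": ["i am here on business",
                                              "i am here on vacation"]}

    def recast_feedback(utterance, intention, theta=0.15):
        examples = EXPRESSION_DB.get(intention, [])
        # distance to the closest example expression for the recognized intention
        best = min(examples,
                   key=lambda e: 1 - difflib.SequenceMatcher(None, utterance, e).ratio())
        dist = 1 - difflib.SequenceMatcher(None, utterance, best).ratio()
        if dist > theta:                     # "N > theta" on the slide: give feedback
            return f'You can say "{best}"'   # recast with the closest example
        return None                          # close enough: no feedback

    print(recast_feedback("purpose business", "inform(trip-purpose)"))
    # -> You can say "i am here on business"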

  34. Local Errors • Local errors are errors that affect single elements in a sentence. [1] • Example: • S: What is the purpose of your trip? • U: I am here at business • ErrorInfo: prep_sub(at/on) → feedback: "On business" • U: I am here on business • [1] Ellis, R. (2008). The Study of Second Language Acquisition. 2nd ed. Oxford: OUP

  35. Local Error Detector Architecture • Text is passed through grammatical error simulation to produce erroneous text; both text versions train n-gram LMs used by two recognizers (ASR and ASR′), whose merged hypotheses feed a grammaticality checker and an error-type classifier (drawing on error patterns and error frequency) to generate feedback Lee, S., Noh, H., Lee, K., & Lee, G. G. (2011). Grammatical Error Detection for Corrective Feedback Provision in Oral Conversations. Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco.

  36. Two-Step Approach • Data imbalance problem: a one-step classifier simply produces the majority class, or suffers a high false positive rate • Large number of error types makes the model learning and selection procedure vastly complicated • Grammaticality checking by itself is useful for some applications: categorizing learners' proficiency level; generating implicit corrective feedback such as repetition, elicitation, and recast feedback • Example of grammatical error detection for "I am here at business": 1) grammaticality checking → 0 0 0 1 0; 2) error type classification → None None None PRP_LXC None

  37. Grammaticality Checker- Feature Extraction

  38. Grammaticality Checker- Model Learning • Binary classification • Support Vector Machine • Model selection • Radial basis kernel • Search for C, γ which optimize: maximize F-score subject to precision > 0.90 and false positive rate < 0.01 • 5-fold cross-validation
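A sketch of this constrained model selection with scikit-learn on synthetic data; the grid values and data are assumptions, and configurations violating the precision and false-positive constraints are discarded before maximizing F-score (on other data no configuration may qualify):

    # Grid-search C and gamma for an RBF SVM under precision/FPR constraints.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import f1_score, precision_score, confusion_matrix
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, weights=[0.9], random_state=0)

    best = (None, -1.0)
    for C in (0.1, 1, 10, 100):
        for gamma in (0.001, 0.01, 0.1):
            pred = cross_val_predict(SVC(C=C, gamma=gamma, kernel="rbf"), X, y, cv=5)
            tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
            fpr = fp / (fp + tn)
            # Enforce the constraints first, then maximize F-score.
            if precision_score(y, pred, zero_division=0) > 0.90 and fpr < 0.01:
                f = f1_score(y, pred)
                if f > best[1]:
                    best = ((C, gamma), f)

    print("selected (C, gamma) and F-score:", best)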

  39. Error Type Classification • Error type information is useful for: meta-linguistic feedback; a sophisticated learner model • Simplest way: choose the error type associated with the top-ranked error pattern • Two flaws: no principled way to break ties between error patterns; does not consider the error frequency • Weighting according to error frequency: Score(e) = TS(e) + α * EF(e)
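A small sketch of this weighting; the interpretation of TS(e) as the matched pattern's score, the frequency table, and α are assumptions for illustration:

    # Break ties among matched error patterns: Score(e) = TS(e) + alpha * EF(e).
    ERROR_FREQ = {"PRP_LXC": 0.12, "ART_OMISSION": 0.20, "SVA": 0.15}

    def classify_error_type(matched_patterns, alpha=0.5):
        # matched_patterns: list of (error_type, pattern score TS(e))
        return max(matched_patterns,
                   key=lambda p: p[1] + alpha * ERROR_FREQ.get(p[0], 0.0))[0]

    # Two patterns tie on TS; corpus frequency breaks the tie toward ART_OMISSION.
    print(classify_error_type([("PRP_LXC", 0.8), ("ART_OMISSION", 0.8)]))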

  40. GES: Grammar Error Simulator • Correct sentences → grammatical error simulator (driven by error types) → incorrect sentences • Used for LM adaptation & grammatical error detection in the automatic speech recognizer

  41. GES Application <Grammar Quiz Generation>

  42. Markov Logic Network • Simulated error types: subject-verb agreement errors; omission errors of prepositions; omission errors of articles • Example: "He want go to movie theater" Lee, S., & Lee, G. G. (2009). Realistic grammar error simulation using Markov logic. Proceedings of ACL 2009, Singapore. Lee, S., Lee, J., Noh, H., Lee, K., & Lee, G. G. (2011). Grammatical Error Simulation for Computer-Assisted Language Learning. Knowledge-Based Systems.

  43. Grammar Error Simulation • Realistic errors • Encoding characteristics of learners' errors using Markov logic • Over-generalization of some rules of the L2 • Lack of knowledge of some rules of the L2 • Applying rules and forms of the first language to the L2
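A toy sketch in this spirit: each error-producing rule fires with a probability standing in for its learned Markov-logic weight (a simplification of MLN inference; the probabilities and POS tags are illustrative):

    # Probabilistic error injection for SVA, preposition omission, article omission.
    import random

    def simulate_errors(tokens, pos_tags, p_sva=0.4, p_prep_om=0.3, p_art_om=0.5):
        out = []
        for tok, pos in zip(tokens, pos_tags):
            if pos == "VBZ" and random.random() < p_sva:
                out.append(tok[:-1])          # subject-verb agreement: wants -> want
            elif pos == "IN" and random.random() < p_prep_om:
                continue                      # omit a preposition
            elif pos == "DT" and random.random() < p_art_om:
                continue                      # omit an article
            else:
                out.append(tok)
        return " ".join(out)

    random.seed(1)
    print(simulate_errors(
        ["He", "wants", "to", "go", "to", "the", "movie", "theater"],
        ["PRP", "VBZ", "IN", "VB", "IN", "DT", "NN", "NN"]))
    # -> "He want to go to movie theater" (with this seed)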

  44. Overall Process

  45. NICT JLE Corpus • Number of interviews: 167 • Number of interviewee sentences: 8,316 • Average sentence length: 15.59 • Number of total errors: 15,954 • Annotation format: the erroneous part is tagged as <n_num crr="x">...</n_num>, where the tag name encodes POS (n = noun) and grammatical system (num = number), and crr gives the corrected form • Example: I belong to two baseball <n_num crr="teams">team</n_num>

  46. CHAPTER 4 PESAA: POSTECH English Speaking Assessment & Assistant

  47. English oral proficiency assessment: International test

  48. English oral proficiency assessment: Korean national test • National English Ability Test (NEAT) • Tasks • Answering short questions (communication) • Describing pictures (storytelling) • Presentation: describing figures, tables, and graphs; introducing products or events • Giving an opinion (discussion)

  49. English oral proficiency assessment: General common tasks • Giving an opinion / discussion • Rubrics • Delivery: pronunciation, fluency (prosody) • Language use: grammar, word choice • Topic development: organization, discourse, contents

  50. Requirements: Real environment • Existing systems handle read speech; NEAT requires spontaneous speech and text-independent input
