Development of Multilingual Speech Technology: South Africa's AST Project Overview

MAI Internship April-May 2002

What?
• The AST Project promotes the development of speech technology for five of South Africa's official languages:
  • SA English, Afrikaans, Zulu, Xhosa, Sesotho
• Create reusable databases and software
• Prototype: a hotel booking dialogue system
• Duration: 2000-2003

AST dialogue system: basics
[Figure: system architecture linking the Telephone Network, Speech Recognition, Natural Language Understanding, Dialogue Manager, Database and Speech Synthesis]

AST Speech Database
• Use?
  • ASR input: acoustic training
  • ASR output: dictionary
• Built from scratch, even for SA English
• Telephone data, based on the SpeechDat design
• Utterances read from datasheets
• Hierarchical recruiting method
• Labelling tool: Praat

| No. | Language spoken | Code | No. of speakers |
|---|---|---|---|
| 1 | English (E) |  | 1500-2000 |
|  | Mother-tongue English | EE | 300-400 |
|  | Black English | BE | 300-400 |
|  | Coloured English | CE | 300-400 |
|  | Asian English | ASE | 300-400 |
|  | Afrikaans English | AE | 300-400 |
| 2 | isiXhosa (X) | XX | 300-400 |
| 3 | Sesotho (S) | SS | 300-400 |
| 4 | isiZulu (Z) | ZZ | 300-400 |
| 5 | Afrikaans (A) |  | 900-1200 |
|  | Mother-tongue Afrikaans | AA | 300-400 |
|  | Black Afrikaans | BA | 300-400 |
|  | Coloured Afrikaans | CA | 300-400 |
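The per-language targets in the table are sums over the speech varieties. A short Python check, with the ranges copied from the table; the overall total is derived here and is not a figure stated in the slides.

```python
# Target speaker counts per speech variety, copied from the table as (min, max).
TARGETS = {
    "English": {"EE": (300, 400), "BE": (300, 400), "CE": (300, 400),
                "ASE": (300, 400), "AE": (300, 400)},
    "isiXhosa": {"XX": (300, 400)},
    "Sesotho": {"SS": (300, 400)},
    "isiZulu": {"ZZ": (300, 400)},
    "Afrikaans": {"AA": (300, 400), "BA": (300, 400), "CA": (300, 400)},
}


def range_sum(ranges):
    """Add up (low, high) ranges component-wise."""
    ranges = list(ranges)
    return (sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges))


per_language = {lang: range_sum(varieties.values())
                for lang, varieties in TARGETS.items()}
overall = range_sum(per_language.values())

for lang, (lo, hi) in per_language.items():
    print(f"{lang:10s} {lo}-{hi}")
print(f"{'Total':10s} {overall[0]}-{overall[1]}")
```

The English and Afrikaans rows reproduce the 1500-2000 and 900-1200 totals in the table; summing all five languages gives 3300-4400 speakers.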

AST Speech Database
• Acoustic signal → (manual labour) → orthographic annotation
• Orthographic annotation → (rules & dictionary: Patana) → phonemic transcription
• Phonemic transcription → (forced alignment: HTK) → phonetic alignment
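Patana's rules are not described in the slides; the sketch below only illustrates a common "dictionary first, rules as fallback" pattern for turning an orthographic annotation into a phonemic transcription. The lexicon entries, the SAMPA-style phone symbols and the toy rule table are all hypothetical.

```python
# Hypothetical pronunciation lexicon (SAMPA-style symbols, illustrative only).
LEXICON = {
    "room": ["r", "u:", "m"],
    "book": ["b", "U", "k"],
}

# Hypothetical toy letter-to-sound rules: grapheme -> phones.
RULES = {"sh": ["S"], "th": ["T"], "ee": ["i:"], "oo": ["u:"],
         "ck": ["k"], "x": ["k", "s"]}
# Try longer graphemes first so "sh" wins over a single "s".
GRAPHEMES = sorted(RULES, key=len, reverse=True)


def letter_to_sound(word):
    """Greedy left-to-right grapheme matching; unmatched letters map to themselves."""
    phones, i = [], 0
    while i < len(word):
        for grapheme in GRAPHEMES:
            if word.startswith(grapheme, i):
                phones.extend(RULES[grapheme])
                i += len(grapheme)
                break
        else:
            phones.append(word[i])
            i += 1
    return phones


def transcribe(orthographic):
    """Return one phone list per word: lexicon lookup, else the rules."""
    words = orthographic.lower().split()
    return [LEXICON[w] if w in LEXICON else letter_to_sound(w) for w in words]


print(transcribe("Book a room this week"))
```

The resulting phone strings are what a forced aligner such as HTK would then time-align against the audio.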

AST Speech Recognition
• Difficult:
  • Speaker-independent, noisy conditions
  • Medium-size vocabulary (10,000 words)
  • Sparse training data
• Not so difficult:
  • The Dialogue Manager helps
  • Phoneme-based HMMs (future: diphones)
  • Finite-state language model
  • Pitch and clicks of the African languages are ignored
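A finite-state language model constrains the recogniser to the word sequences the grammar allows. A minimal Python sketch with a hypothetical room-booking grammar: `allowed_next` lists the words a decoder would consider from a given state, and `accepts` checks a complete word string. The class and grammar are illustrations, not the AST grammar format.

```python
class FiniteStateGrammar:
    """Word-level finite-state grammar with arcs (state, word) -> next state."""

    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        self.arcs = {}  # state -> {word: next_state}

    def add(self, src, word, dst):
        self.arcs.setdefault(src, {})[word] = dst

    def allowed_next(self, state):
        """Words the recogniser may hypothesise from this state."""
        return sorted(self.arcs.get(state, {}))

    def accepts(self, words):
        """True if the whole word sequence follows arcs and ends in a final state."""
        state = self.start
        for word in words:
            nxt = self.arcs.get(state, {}).get(word)
            if nxt is None:
                return False
            state = nxt
        return state in self.finals


# Hypothetical grammar: "[i want] a (single|double) room".
grammar = FiniteStateGrammar(start=0, finals={3})
for src, word, dst in [(0, "i", 4), (4, "want", 5), (5, "a", 1), (0, "a", 1),
                       (1, "single", 2), (1, "double", 2), (2, "room", 3)]:
    grammar.add(src, word, dst)

print(grammar.accepts("i want a double room".split()))  # True
print(grammar.accepts("a room".split()))                # False
print(grammar.allowed_next(1))                          # ['double', 'single']
```

Restricting each step to a handful of words is what keeps a 10,000-word vocabulary tractable in a given dialogue state.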

AST Natural Language Understanding
• Uses the same finite-state network as the recogniser's language model
  • +: every recognised utterance is ‘understood’
  • -: finite-state grammars (FSGs) are limited
  • It makes no sense to recognise more than we can understand
• Semantic labels are activated
• Alternative: robust parsing (Phoenix, ATIS)
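Because the NLU reuses the recogniser's network, understanding can be as simple as following the recognised word string through that network and collecting labels attached to the arcs it passes. The arcs, label strings and `understand` function below are hypothetical illustrations of that idea.

```python
# Hypothetical network: (state, word) -> (next state, semantic label or None).
ARCS = {
    (0, "yes"): (1, "confirm=yes"),
    (0, "no"): (1, "confirm=no"),
    (0, "a"): (2, None),
    (2, "single"): (3, "room=single"),
    (2, "double"): (3, "room=double"),
    (3, "room"): (1, None),
}
FINALS = {1}


def understand(words, start=0):
    """Collect the semantic labels on the arcs used by the word string.

    Returns None if the string is not accepted by the network.
    """
    state, labels = start, []
    for word in words:
        if (state, word) not in ARCS:
            return None
        state, label = ARCS[(state, word)]
        if label is not None:
            labels.append(label)
    return labels if state in FINALS else None


print(understand("a double room".split()))  # ['room=double']
print(understand(["yes"]))                  # ['confirm=yes']
```

Any string the recogniser can output under this network is also accepted here, which is the "+" on the slide; anything outside the network is neither recognised nor understood, which is the "-".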

AST Natural Language Understanding
[Figure: data flow between the Dialogue Manager, Speech Recognition and NLU (FSG); labels: Grammar ID (×2), Recognised utterance, Meaning]

AST Natural Language Understanding
• Embedded semantic tags:
  • ‘drie honderd duisend agt en neëntig’ (Afrikaans for "three hundred thousand and ninety-eight") → 3 0 0 0 9 8
  • Tags: t1=3 t2=0 t3=0; V6=3 V5=0 V4=0 V3=0 V2=9 V1=8
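The slide shows the tags but not how they are produced. In AST the tags are embedded in the finite-state network itself; the Python sketch below only reproduces the end result with a hypothetical word table and an accumulator, so the V-slots can be checked against the slide. Reading t1-t3 as the three digits of the thousands group is an assumption that happens to match this example.

```python
# Hypothetical Afrikaans number-word tables (covers values below one million).
UNITS = {"een": 1, "twee": 2, "drie": 3, "vier": 4, "vyf": 5,
         "ses": 6, "sewe": 7, "agt": 8, "nege": 9}
TEENS = {"tien": 10, "elf": 11, "twaalf": 12, "dertien": 13, "veertien": 14,
         "vyftien": 15, "sestien": 16, "sewentien": 17, "agtien": 18,
         "negentien": 19}
TENS = {"twintig": 20, "dertig": 30, "veertig": 40, "vyftig": 50,
        "sestig": 60, "sewentig": 70, "tagtig": 80, "neëntig": 90}


def afrikaans_number(text):
    """Convert Afrikaans number words to an integer below 1,000,000."""
    thousands = 0  # part already multiplied by 1000
    current = 0    # part below 1000 still being built
    for word in text.split():
        if word == "en":
            continue  # "agt en neëntig" = eight-and-ninety
        elif word in UNITS:
            current += UNITS[word]
        elif word in TEENS:
            current += TEENS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "honderd":
            current = (current or 1) * 100
        elif word == "duisend":
            thousands += (current or 1) * 1000
            current = 0
        else:
            raise ValueError(f"unknown number word: {word!r}")
    value = thousands + current
    if not 0 <= value < 1_000_000:
        raise ValueError("value outside the six-digit slot range")
    return value


def slot_tags(value):
    """V6..V1: the six digit positions; t1..t3: assumed thousands-group digits."""
    digits = f"{value:06d}"
    tags = {f"V{6 - i}": int(d) for i, d in enumerate(digits)}
    tags.update({f"t{i + 1}": int(d) for i, d in enumerate(digits[:3])})
    return tags


value = afrikaans_number("drie honderd duisend agt en neëntig")
print(value, slot_tags(value))
```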

AST Dialogue Manager
• Trade-off: naturalness vs. response restriction
• System-directed: predictable user utterances, simple dialogues
• Mixed-initiative: shorter dialogues, but more recognition errors
• User-initiative: unpopular

AST Dialogue Manager
• Design:
  • Early focus on users and task
  • Wizard-of-Oz: "pay no attention to the man behind the curtain"
  • System-in-the-loop
• Finite-state structure, chosen for its simplicity and functionality
• Possible frame-based approach in future
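A system-directed, finite-state dialogue can be pictured as a fixed chain of states, each with a prompt and a slot to fill. The states, prompts and the re-prompt-on-failure rule below are hypothetical; the sketch assumes the NLU has already turned each user turn into a value, or `None` when nothing was understood, and that enough answers are supplied.

```python
# Hypothetical hotel-booking network: state -> (prompt, slot to fill, next state).
DIALOGUE = {
    "arrival": ("On which date do you arrive?", "arrival", "nights"),
    "nights": ("For how many nights?", "nights", "room"),
    "room": ("Single or double room?", "room", "confirm"),
    "confirm": ("Shall I book this room?", "confirmed", None),
}


def run_dialogue(answers, start="arrival"):
    """Walk the network, consuming one understood answer per prompt.

    Returns the filled slots and the list of prompts played.
    """
    state, slots, prompts = start, {}, []
    answers = iter(answers)
    while state is not None:
        prompt, slot, next_state = DIALOGUE[state]
        prompts.append(prompt)
        answer = next(answers)
        if answer is None:
            continue  # not understood: stay in this state and prompt again
        slots[slot] = answer
        state = next_state
    return slots, prompts


slots, prompts = run_dialogue(["12 May", None, "3", "double", "yes"])
print(slots)
print(len(prompts))  # 5: the "nights" prompt is repeated once
```

The fixed order of states is what makes user utterances predictable in a system-directed design; a frame-based manager would instead let the user fill slots in any order.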

AST Speech Synthesis
• Fixed machine utterances: pre-recorded speech
• Database query results: limited-domain synthesis (Festival platform)
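The split between pre-recorded prompts and synthesis can be sketched as a planning step that decides, part by part, which source to use. The prompt IDs, file names and `Segment` type are hypothetical, and no Festival call is shown; the synthesis segments stand for the text that would be handed to the limited-domain voice.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str     # "recording" or "synthesis"
    content: str  # file name for a recording, text for synthesis


# Hypothetical inventory of pre-recorded fixed prompts.
RECORDINGS = {
    "welcome": "welcome.wav",
    "available_rooms": "available_rooms.wav",
}


def plan_prompt(parts):
    """parts: list of ("fixed", prompt_id) or ("query", text) items."""
    plan = []
    for kind, value in parts:
        if kind == "fixed":
            plan.append(Segment("recording", RECORDINGS[value]))
        elif kind == "query":
            plan.append(Segment("synthesis", value))
        else:
            raise ValueError(f"unknown part type: {kind!r}")
    return plan


plan = plan_prompt([("fixed", "available_rooms"),
                    ("query", "two double rooms on 12 May")])
for segment in plan:
    print(segment)
```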

Conclusion
• Finite-state approach in:
  • Recogniser
  • NLU component
  • Dialogue manager
• Workable prototype
• New funding for 2003
