The AST Project focuses on advancing speech technology for South Africa's official languages, including English, Afrikaans, Zulu, Xhosa, and Sesotho. The initiative involves creating reusable databases and software, with a prototype dialogue system for hotel booking. The project's scope spans from speech recognition and synthesis to natural language understanding. Challenges include speaker independence and noisy conditions, while goals target effective user dialogue management. Insights are based on extensive acoustic training data and finite-state models to enhance multilingual support and user interaction.
What?
• The AST Project promotes the development of speech technology for the official languages of South Africa
• Languages: SA English (SAE), Afrikaans, Zulu, Xhosa, Sesotho
• Create reusable databases & software
• Prototype hotel booking dialogue system
• 2000-2003
AST dialogue system: basics
[Diagram: Telephone Network, Speech Recognition, Natural Language Understanding, Dialogue Manager, Database, Speech Synthesis]
AST Speech Database
• Use? Input ASR: acoustic training; output ASR: dictionary
• Start from scratch, even for SAE
• Telephone data based on SpeechDat
• Datasheet utterances
• Hierarchical recruiting method
• Labelling tool: PRAAT
| No. | Language spoken | Code | No. of speakers |
|-----|-----------------|------|-----------------|
| 1 | English (E), all varieties | | 1500-2000 |
| | Mother-tongue English | EE | 300-400 |
| | Black English | BE | 300-400 |
| | Coloured English | CE | 300-400 |
| | Asian English | ASE | 300-400 |
| | Afrikaans English | AE | 300-400 |
| 2 | isiXhosa (X) | XX | 300-400 |
| 3 | Sesotho (S) | SS | 300-400 |
| 4 | isiZulu (Z) | ZZ | 300-400 |
| 5 | Afrikaans (A), all varieties | | 900-1200 |
| | Mother-tongue Afrikaans | AA | 300-400 |
| | Black Afrikaans | BA | 300-400 |
| | Coloured Afrikaans | CA | 300-400 |
AST Speech Database
[Diagram: Acoustic signal → (manual labour) → Orthographic annotation → (rules & dictionary: Patana) → Phonemic transcription → (forced alignment: HTK) → Phonetic alignment]
AST Speech Recognition
• Difficult:
  • Speaker-independent, noisy conditions
  • Medium-size vocabulary (10,000 words)
  • Sparse training data
• Not so difficult:
  • Dialogue Manager helps
  • Phoneme-based HMMs (diphones in future)
  • Finite-state language model
  • Pitch and clicks of African languages ignored
AST Natural Language Understanding
• Same finite-state network as the recogniser's language model
  • +: all utterances are ‘understood’
  • -: finite-state grammars (FSGs) are limited
• Makes no sense to recognise more than we can understand
• Semantic labels are activated
• Alternative: robust parsing (Phoenix, ATIS)
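The idea of one finite-state network serving both as language model and as understanding component can be illustrated with a minimal sketch. The grammar, words, and label names below are hypothetical and chosen only for illustration; they are not taken from the AST project's grammars. Each arc carries an optional semantic label that is "activated" when the arc is traversed, so any utterance the network accepts is also understood.

```python
# Hypothetical toy grammar, NOT the AST project's actual network.
# Each state maps a word to (next_state, semantic_label or None).
ARCS = {
    0: {"a": (1, None), "one": (1, None)},
    1: {"single": (2, "room=single"), "double": (2, "room=double")},
    2: {"room": (3, None)},
    3: {"please": (3, None)},
}
FINAL_STATES = {3}


def understand(utterance):
    """Return the semantic labels activated along the path through the
    network, or None if the utterance is not accepted by the grammar."""
    state = 0
    labels = []
    for word in utterance.lower().split():
        arc = ARCS.get(state, {}).get(word)
        if arc is None:
            return None  # word not allowed here: outside the grammar
        state, label = arc
        if label is not None:
            labels.append(label)  # the semantic label is activated
    return labels if state in FINAL_STATES else None


print(understand("a double room please"))
print(understand("a room"))
```

The second call shows the limitation noted on the slide: a word sequence outside the network yields no meaning at all, which is what a robust parser would try to recover from.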
AST Natural Language Understanding
[Diagram: Speech Recognition, NLU, Dialogue Manager; labels: Grammar ID, Recognised utterance, Meaning, FSG]
AST Natural Language Understanding
• Embedded semantic tags:
  • ‘drie honderd duisend agt en neëntig’ (Afrikaans for "three hundred thousand and ninety-eight", i.e. 300 098)
  • Digits: 3 0 0 0 9 8
  • t1=3 t2=0 t3=0
  • V6=3 V5=0 V4=0 V3=0 V2=9 V1=8
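The mapping from the spoken number to the digit slots V6..V1 can be sketched as follows. This is an illustrative assumption about how such slots could be filled, not the project's implementation: the word tables, the function name `digit_slots`, and the six-digit width are hypothetical, and the t1..t3 tags are left out because the slide does not define them.

```python
# Hypothetical word tables for Afrikaans numbers below one million.
UNITS = {"een": 1, "twee": 2, "drie": 3, "vier": 4, "vyf": 5,
         "ses": 6, "sewe": 7, "agt": 8, "nege": 9}
TENS = {"twintig": 20, "dertig": 30, "veertig": 40, "vyftig": 50,
        "sestig": 60, "sewentig": 70, "tagtig": 80,
        "negentig": 90, "neëntig": 90}


def digit_slots(utterance, width=6):
    """Convert a spoken number into digit slots V<width>..V1."""
    total = 0    # value of completed thousands groups
    current = 0  # value of the group currently being read
    for word in utterance.lower().split():
        if word == "en":              # "and": carries no value
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "honderd":       # hundred: scales the preceding unit
            current = max(current, 1) * 100
        elif word == "duisend":       # thousand: closes the current group
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unknown number word: {word}")
    value = total + current
    digits = str(value).zfill(width)
    if len(digits) > width:
        raise ValueError("number does not fit in the available slots")
    # The leftmost digit is V<width>, the rightmost digit is V1.
    return {f"V{width - i}": int(d) for i, d in enumerate(digits)}


print(digit_slots("drie honderd duisend agt en neëntig"))
```

For the utterance on the slide the function returns V6=3, V5=0, V4=0, V3=0, V2=9, V1=8, matching the slot values shown above.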
AST Dialogue Manager
• Trade-off: naturalness vs. response restriction
• System-directed: predictable user utterances, simple dialogues
• Mixed-initiative: shorter dialogues, more recognition errors
• User-initiative: unpopular
AST Dialogue Manager
• Design:
  • Early focus on users and task
  • Wizard-of-Oz: pay no attention to the man behind the curtain
  • System-in-the-loop
• Finite-state structure because of simplicity and functionality
• Possible frame-based approach in future
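A finite-state, system-directed dialogue of the kind described above can be sketched in a few lines. The states, prompts, and slot names are hypothetical examples for a hotel booking task and are not taken from the AST prototype; a `None` answer stands in for a recognition failure, after which the same state simply repeats its prompt.

```python
# Hypothetical states for a system-directed hotel booking dialogue.
# Each state maps to (prompt, slot_to_fill, next_state).
STATES = {
    "ask_date": ("What is your arrival date?", "arrival_date", "ask_nights"),
    "ask_nights": ("How many nights will you stay?", "nights", "ask_room"),
    "ask_room": ("Single or double room?", "room_type", "done"),
}


def run_dialogue(answers):
    """Walk the state machine, reading user answers from an iterable.

    Returns the filled booking slots and the list of prompts issued.
    """
    state = "ask_date"
    booking = {}
    transcript = []
    answers = iter(answers)
    while state != "done":
        prompt, slot, next_state = STATES[state]
        transcript.append(prompt)
        answer = next(answers)
        if answer is None:
            continue  # nothing recognised: stay in this state and re-prompt
        booking[slot] = answer
        state = next_state
    return booking, transcript


booking, transcript = run_dialogue(["5 May", None, "2", "double"])
print(booking)
print(len(transcript))
```

Because each state expects exactly one kind of answer, the recogniser can load a small grammar per state, which is the predictability benefit of the system-directed design. A frame-based manager would instead let the user fill several slots in any order.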
AST Speech Synthesis
• Fixed machine utterances: pre-recorded speech
• Database queries: limited-domain synthesis (Festival platform)
Conclusion
• Finite-state approach in:
  • Recogniser
  • NLU component
  • Dialogue manager
• Workable prototype
• New funding in 2003