
Spoken Dialog Systems

Lecture 3 of Spoken Language Processing, Prof. Andrew Rosenberg. Overview: Conversational Agents; Automatic Speech Recognition (ASR); Natural Language Understanding (NLU); Generation; Cooperative Question-Answering; Dialog Representations.


Presentation Transcript


  1. Spoken Dialog Systems • Lecture 3 • Spoken Language Processing • Prof. Andrew Rosenberg

  2. Overview • Conversational Agents • Automatic Speech Recognition (ASR) • Natural Language Understanding (NLU) • Generation • Cooperative Question-Answering • Dialog Representations

  3. Spoken Dialog Systems • Other terms: • Interactive Voice Response (IVR) Systems • Conversational Agents • Dialog Systems • Applications • Travel Arrangements (Amtrak, United Airlines) • Telephone call routing • Tutoring • Communicating with Robots • Anything with limited screen or keyboard

  4. Travel Assistant: Communicator

  5. Call Routing: AT&T HMIHY (How May I Help You)

  6. Tutoring: ITSPOKE

  7. Conversational Structure • Telephone Conversation • Stage 1: Enter a conversation • Stage 2: Identification • Stage 3: Establish a joint willingness to converse • Stage 4: First topic is raised. Often by the caller.

  8. Conversational confusion • Customer: (rings) • Operator: Directory Enquiries, for which town please? • Customer: Could you give me the phone number of um: Mrs. um: Smithson? • Operator: Yes, which town is this at please? • Customer: Huddleston. • Operator: Yes. And the name again? • Customer: Mrs. Smithson

  9. Customer confusion • Agent: And, what day in May did you want to travel? • Client: OK, uh, I need to be there for a meeting that’s from the 12th to the 15th. • The client didn’t answer the question directly. • Literal meaning of the sentence: • Meeting • start-of-meeting: 12th • end-of-meeting: 15th • It doesn’t say anything about flying. • How does the agent infer that the client is informing him/her of travel dates or requirements?

  10. Customer Confusion • Agent: ….there’s 3 non-stops today • Literally, this is still true if there are in fact more than 3 non-stops today. • The intended meaning is that there are exactly 3. • How can the client infer that the agent means “only 3”?

  11. Grice: Conversational Implicature • Implicature: a particular class of licensed inferences. • Grice (1975): how do conversational participants draw inferences beyond what is actually asserted? • Cooperative Principle • “Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.” • A tacit agreement by speakers and listeners to cooperate in communication

  12. Four Gricean Maxims • Relevance: Be relevant • Quantity: Do not make your contribution more or less informative than required • Quality: Try to make your contribution one that is true: don’t say things that are false, or for which you do not have evidence • Manner: Avoid ambiguity and obscurity. Be brief and orderly

  13. Maxim of Relevance • A: Is Regina here? • B: Her car is outside. • Implicature: yes • Hearer thinks: Why would B mention the car? It must be relevant. How is it relevant? If her car is here, then she probably is too.

  14. Maxim of Relevance • Client: I need to be there for a meeting that’s from the 12th to the 15th. • Hearer thinks: The speaker is following conversational maxims, so this must be relevant. How would a meeting be relevant? It would be relevant if the client meant that he or she must arrive at the destination in time for the meeting, and will not leave until after the meeting is over.

  15. Maxim of Quantity • A: How much money do you have on you? • B: I have 5 dollars. • Implicature: 5 and only 5 dollars, not 6 dollars. • Similarly, 3 non-stops implies “only 3” • Hearer thinks: If the speaker meant 7 non-stops, he or she would have said 7 non-stops • A: Did you do the reading for today’s class? • B: I’m going to at lunch. • Implicature: No. • B’s answer would be literally true if B planned on doing the reading at lunch AND had already done the reading, but this would violate the Maxim of Quantity

  16. Dialog System Architecture

  17. Speech Recognition • Input: acoustic waveform • Output: string of words • Basic Components: • A phone recognizer (acoustic model): small sound units [k], [ae], etc. • A pronunciation dictionary (pronunciation model): cat = [k ae t] • A grammar telling us what words are likely to follow what words (language model) • A search algorithm to find the best string of words (decoder)
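To make the interplay of the components on this slide concrete, here is a minimal sketch, not a real recognizer. It assumes the phone recognizer has already produced a clean phone sequence, and it uses a hypothetical four-word pronunciation dictionary and made-up bigram log-probabilities. The decoder is a small dynamic-programming search over (position, last word) states; every name and number below is illustrative.

```python
import math

# Hypothetical pronunciation dictionary (pronunciation model): word -> phones.
PRONUNCIATIONS = {
    "cat": ("k", "ae", "t"),
    "a": ("ah",),
    "at": ("ae", "t"),
    "the": ("dh", "ah"),
}

# Made-up bigram log-probabilities (language model); "<s>" marks sentence start.
BIGRAM_LOGPROB = {
    ("<s>", "the"): math.log(0.6),
    ("<s>", "a"): math.log(0.4),
    ("the", "cat"): math.log(0.5),
    ("a", "cat"): math.log(0.5),
}
UNSEEN_LOGPROB = math.log(1e-4)  # crude floor for bigrams not listed above


def decode(phones):
    """Return the best-scoring word string whose pronunciations spell out `phones`.

    Returns None when no sequence of dictionary words covers the input.
    """
    phones = tuple(phones)
    # best[(i, w)] = (score, words): best path covering phones[:i] and ending in w.
    best = {(0, "<s>"): (0.0, [])}
    for i in range(len(phones)):
        for (pos, prev), (score, words) in list(best.items()):
            if pos != i:
                continue
            for word, pron in PRONUNCIATIONS.items():
                j = i + len(pron)
                if phones[i:j] != pron:
                    continue
                new_score = score + BIGRAM_LOGPROB.get((prev, word), UNSEEN_LOGPROB)
                if (j, word) not in best or new_score > best[(j, word)][0]:
                    best[(j, word)] = (new_score, words + [word])
    finals = [entry for (pos, _), entry in best.items() if pos == len(phones)]
    if not finals:
        return None
    return " ".join(max(finals, key=lambda entry: entry[0])[1])


print(decode(["dh", "ah", "k", "ae", "t"]))
```

A real system scores acoustic evidence as well and searches over uncertain phone hypotheses; this sketch only shows where the dictionary, the language model, and the search each plug in.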

  18. Natural Language Understanding • “NLU” • “Computational Semantics” • There are many ways to represent the meaning of a sentence, or group of sentences • For spoken dialog systems, the most common is Frame and Slot Semantics

  19. A Sample Frame • “Show me morning flights from Boston to SF on Tuesday”
      SHOW:
        FLIGHTS:
          ORIGIN:
            CITY: Boston
            DATE: Tuesday
            TIME: morning
          DEST:
            CITY: San Francisco
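One way to hold such a frame in a program is a nested dictionary. The sketch below assumes the nesting implied by the slot order on the slide (DATE and TIME under ORIGIN); the helper `get_slot` and its dotted-path convention are hypothetical, introduced only for illustration.

```python
# Nested-dict rendering of the sample frame; the nesting is an assumption
# based on the order in which the slots appear on the slide.
frame = {
    "SHOW": {
        "FLIGHTS": {
            "ORIGIN": {"CITY": "Boston", "DATE": "Tuesday", "TIME": "morning"},
            "DEST": {"CITY": "San Francisco"},
        }
    }
}


def get_slot(frame, path):
    """Follow a dotted slot path such as 'SHOW.FLIGHTS.DEST.CITY'.

    Returns None when any slot along the path is unfilled.
    """
    node = frame
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


print(get_slot(frame, "SHOW.FLIGHTS.DEST.CITY"))
```

A dialog manager could use a lookup like this to find which slots are still empty and therefore which question to ask next.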

  20. How do we generate these semantics? • Many methods: • Simplest: “semantic grammars” • E.g., a simple context-free grammar • The left-hand sides of rules are semantic categories • LIST -> show me | I want | can I see | … • DEPARTURETIME -> (after|around|before) HOUR | morning | afternoon | evening • HOUR -> one | two | three | … | twelve (am|pm) • FLIGHTS -> (a) flight | flights • ORIGIN -> from CITY • DESTINATION -> to CITY • CITY -> Boston | San Francisco | Denver | Washington
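Because the rules on this slide are not recursive, they can be approximated with regular expressions. The following is a rough slot-spotting sketch rather than a true context-free parser. The `RULES` table and `parse` function are hypothetical names, the hours elided on the slide are filled in as an assumption, and the optional am/pm suffix is read as applying to every hour.

```python
import re

CITY = r"(?:Boston|San Francisco|Denver|Washington)"
# The slide elides four..eleven; they are filled in here as an assumption.
HOUR = (
    r"(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)"
    r"(?: (?:am|pm))?"
)

# One regular expression per semantic category (left-hand side of a rule).
RULES = {
    "LIST": r"show me|I want|can I see",
    "DEPARTURETIME": rf"(?:after|around|before) {HOUR}|morning|afternoon|evening",
    "FLIGHTS": r"(?:a )?flight|flights",
    "ORIGIN": rf"from {CITY}",
    "DESTINATION": rf"to {CITY}",
}


def parse(utterance):
    """Return {category: matched text} for every rule found in the utterance."""
    slots = {}
    for category, pattern in RULES.items():
        # Word boundaries keep "flight" from matching inside "flights".
        match = re.search(rf"\b(?:{pattern})\b", utterance, flags=re.IGNORECASE)
        if match:
            slots[category] = match.group(0)
    return slots


print(parse("Show me morning flights from Boston to San Francisco on Tuesday"))
```

Words that no rule covers, such as "on Tuesday" here, are simply ignored, which is one reason semantic grammars of this kind tolerate disfluent speech reasonably well.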

  21. Semantics for a sentence • LIST: Show me • FLIGHTS: flights • ORIGIN: from Boston • DESTINATION: to San Francisco • DEPARTDATE: on Tuesday • DEPARTTIME: morning

  22. Generation and TTS • Generation component • Chooses concepts to express to the user • Plans how to express these concepts in words • Assigns appropriate prosody to the words • Text-to-Speech (TTS) component • Input: words and prosodic annotations • Output: Synthesized audio waveform

  23. Generation Component • Content Planner • Decides what information to express to user • ask a question, present an answer, etc. • Frequently merged with a dialog manager • Language Generation • Chooses syntactic structures and words to express meaning. • Simplest method: Template Based Generation • Pre-specify basic sentences and fill in the blanks as appropriate • Can have variables: • What time do you want to leave CITY-ORIG? • Will you return to CITY-ORIG from CITY-DEST?
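Template-based generation can be sketched in a few lines. The example below reuses the two templates from the slide; the template keys and the `generate` function are hypothetical, and the variables are written with underscores (CITY_ORIG, CITY_DEST) so they are valid Python keyword names.

```python
# Pre-specified sentences with blanks; the keys are illustrative names.
TEMPLATES = {
    "ask_departure_time": "What time do you want to leave {CITY_ORIG}?",
    "confirm_return": "Will you return to {CITY_ORIG} from {CITY_DEST}?",
}


def generate(template_name, **slots):
    """Fill the named template with slot values taken from the dialog state."""
    return TEMPLATES[template_name].format(**slots)


print(generate("ask_departure_time", CITY_ORIG="Boston"))
print(generate("confirm_return", CITY_ORIG="Boston", CITY_DEST="San Francisco"))
```

The approach is easy to build and to control, but every sentence the system can say has to be written out in advance.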

  24. More sophisticated language generation • Natural Language Generation • Approach • Dialog manager builds representation of the meaning of the utterance to be expressed • Possibly Slot-Frame representation • Passes this to a “generator” • Generators have three components • Sentence planner • Surface realizer • Prosody assigner

  25. Architecture of a generator • Based on Walker & Rambow 2002

  26. HCI Constraints on Generation • Human Computer Interaction (HCI) • Discourse markers and pronouns • “Coherence” • Without discourse markers: Please say the date… Please say the start time… Please say the duration… Please say the subject… • With discourse markers: First, tell me the date… Next, I’ll need the time it starts… Thanks. <pause> Now, how long is it supposed to last? Just one more thing. I need a brief description.
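As a small illustration of the coherence point, the sketch below turns a list of field names into prompts linked by discourse markers. The function name and the exact wording of each prompt are assumptions loosely modeled on the slide, not a prescribed design.

```python
def add_discourse_markers(fields):
    """Turn a list of field names into prompts linked by discourse markers."""
    prompts = []
    for i, field in enumerate(fields):
        if i == 0:
            prompts.append(f"First, tell me the {field}.")
        elif i == len(fields) - 1:
            prompts.append(f"Just one more thing. I need the {field}.")
        else:
            prompts.append(f"Next, I'll need the {field}.")
    return prompts


for prompt in add_discourse_markers(["date", "start time", "duration", "subject"]):
    print(prompt)
```

Compared with repeating "Please say the …" for every field, the markers tell the caller where they are in the sequence and when it is about to end.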

  27. HCI Constraints • “Tapered Prompts” • Prompts get incrementally shorter • System: Now, what’s the first company you’d like to add to your watch list? • Caller: Cisco • System: What’s the next company name? Or you can say “Finished” • Caller: IBM • System: Tell me the next company name or say “Finished” • Caller: Intel • System: Next one? • Caller: General Motors • System: Next? • Caller: …
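Tapering can be implemented as a lookup keyed on how many times the system has already asked. This sketch uses the prompts from the slide; the function name and the choice to keep repeating the shortest prompt after the list runs out are assumptions.

```python
# Prompts from the slide, in the order they are used.
TAPERED_PROMPTS = [
    "Now, what's the first company you'd like to add to your watch list?",
    'What\'s the next company name? Or you can say "Finished".',
    'Tell me the next company name or say "Finished".',
    "Next one?",
    "Next?",
]


def prompt_for_turn(turn):
    """Return the prompt for a 0-based turn count.

    Once the list is exhausted, the last (shortest) prompt is reused.
    """
    return TAPERED_PROMPTS[min(turn, len(TAPERED_PROMPTS) - 1)]


for turn in range(7):
    print(prompt_for_turn(turn))
```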

  28. Issues in Building SDS • What architecture should you use? • How much discourse history should you collect and maintain? • What type of dialog strategy will you use? • What type of confirmation strategy will you use? • What help will be provided? • How can you recover from errors?

  29. Next Class • Acoustics of Speech • Reading: J&M 7.4
