Chatbots

Human Language Technologies, Giuseppe Attardi, Dipartimento di Informatica, Università di Pisa

Presentation Transcript
  1. Università di Pisa Chatbots Human Language Technologies Giuseppe Attardi Dipartimento di Informatica Università di Pisa

  2. Evolution of Spoken Dialog Systems Primitive prototype -> Academic demo -> Industrial product Slide by CY Chen

  3. Early Approaches • ELIZA (Weizenbaum, 1966) • Used clever hand-written templates to generate replies that resemble the user’s input utterances • Several programming frameworks are available today for building dialog agents (Marietto et al., 2013; Microsoft, 2017b), as well as Google Assistant

  4. Templates and Rules • Hand-written rules to generate replies. • Simple pattern matching or keyword retrieval techniques are employed to handle the user’s input utterances. • Rules are used to transform a matching pattern or a keyword into a predefined reply:
<category>
  <pattern>What is your name?</pattern>
  <template>My name is Alice</template>
</category>
<category>
  <pattern>I like *</pattern>
  <template>I too like <star/>.</template>
</category>
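The pattern/template mechanism above can be sketched in a few lines of Python (a minimal illustration of the matching idea, not a real AIML interpreter; the two patterns are the ones from the slide):

```python
import re

# (pattern, template) pairs mirroring the AIML categories above.
# "*" is a wildcard whose match is substituted for <star/>.
categories = [
    ("What is your name?", "My name is Alice"),
    ("I like *", "I too like <star/>."),
]

def reply(utterance):
    for pattern, template in categories:
        # Translate the AIML-style pattern into a regex: "*" captures any text.
        regex = "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"
        m = re.match(regex, utterance, re.IGNORECASE)
        if m:
            star = m.group(1) if m.groups() else ""
            return template.replace("<star/>", star)
    return "I don't understand."

print(reply("I like pizza"))  # -> "I too like pizza."
```

A real AIML engine adds normalization, recursive `<srai>` rewriting, and pattern priorities, but the core loop is this simple match-and-substitute.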

  5. Open vs Closed Domain • Closed Domain (easier): the space of possible inputs and outputs is somewhat limited; the system is trying to achieve a very specific goal. Technical customer support or shopping assistants are examples of closed-domain problems. • Open Domain (harder): the user can take the conversation anywhere; there is not necessarily a well-defined goal or intention. Conversations on social media sites like Twitter and Reddit are typically open domain.

  6. Open-Domain vs Closed-Domain

  7. Two Paradigms • Retrieval-based (easier): use a repository of predefined responses and some heuristic to pick an appropriate response based on the input and context. The heuristic could be as simple as a rule-based expression match, or as complex as an ensemble of ML classifiers. These systems don’t generate any new text, they just pick a response from a fixed set. • Generative (harder): don’t rely on predefined responses; they generate new responses from scratch. Generative models are typically based on transducer techniques: they “transduce” an input into an output (response).

  8. Comparison • Retrieval-based: makes no grammatical mistakes, but is unable to handle unseen cases for which no appropriate predefined response exists, and can’t refer back to contextual entity information like names mentioned earlier in the conversation. • Generative: can refer back to entities in the input and give the impression of talking to a human, but is hard to train, likely to make grammatical mistakes (especially on longer sentences), and typically requires huge amounts of training data.

  9. Long vs Short Conversations The longer the conversation, the more difficult it is to automate. • Short Conversation (easier): the goal is to create a single response to a single input, for example answering a specific question from a user with an appropriate answer. • Long Conversation (harder): the bot goes through multiple turns and must keep track of what has been said. Customer support conversations are typically long conversational threads with multiple questions. The Alexa Prize Challenge aims at building a socialbot capable of engaging users for 20 minutes.

  10. Generation Based Models

  11. Transducer Model • Train a machine translation system to perform translation from utterance to response

  12. Data-driven Dialog Response Generation • Lots of filtering, etc., to make sure that the extracted translation rules are reliable • Ritter et al. 2011 Slide by Graham Neubig

  13. Neural Models for Dialog Response Generation • Like other translation tasks, dialog response generation can be done with encoder-decoders • Shang et al. (2015) present the simplest model, translating from previous utterance Slide by Graham Neubig
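The encode-then-decode flow can be sketched with a toy RNN and untrained random weights (the vocabulary, parameters, and greedy decoding loop below are illustrative stand-ins, not Shang et al.'s actual model; a trained system learns E, W, and U end-to-end):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "</s>", "hello", "how", "are", "you", "fine"]
V, H = len(vocab), 8
idx = {w: i for i, w in enumerate(vocab)}

# Randomly initialised parameters; in practice these are trained.
E = rng.normal(size=(V, H))   # word embeddings
W = rng.normal(size=(H, H))   # recurrent weights (shared toy RNN)
U = rng.normal(size=(H, V))   # output projection

def rnn_step(h, w):
    return np.tanh(h @ W + E[idx[w]])

def encode(utterance):
    h = np.zeros(H)
    for w in utterance.split():
        h = rnn_step(h, w)
    return h                  # fixed-size summary of the previous utterance

def decode(h, max_len=5):
    out, w = [], "<s>"
    for _ in range(max_len):
        h = rnn_step(h, w)
        w = vocab[int(np.argmax(h @ U))]   # greedy choice of next word
        if w == "</s>":
            break
        out.append(w)
    return out

response = decode(encode("how are you"))
print(response)  # some token sequence; untrained, so not meaningful
```

The point is the architecture: the encoder compresses the previous utterance into a vector, and the decoder "translates" that vector into a response word by word, exactly as in MT.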

  14. Retrieval-based Models

  15. Templates • Many requests can be answered with templates • Select the most relevant response from a collection of templates, determine the slots required to fill it, and instantiate the template with those values • Slot fillers can be extracted from the user utterance or retrieved with a query
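The select-then-fill step might look like this (the template ids, slot names, and values are hypothetical, purely for illustration):

```python
# Hypothetical templates keyed by id; {slot} placeholders are filled
# with values extracted from the utterance or retrieved with a query.
templates = {
    "weather": "The weather in {city} on {day} will be {forecast}.",
    "greeting": "Hello {name}, how can I help you?",
}

def instantiate(template_id, slots):
    # Fill the selected template with the slot values.
    return templates[template_id].format(**slots)

print(instantiate("weather",
                  {"city": "Pisa", "day": "Monday", "forecast": "sunny"}))
# -> The weather in Pisa on Monday will be sunny.
```

In a deployed system the `forecast` value would typically come from an API call, while `city` and `day` would be extracted from the user utterance by a slot-filling model.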

  16. Retrieval-based Chat • Basic idea: given an utterance, find the most similar in the database and return it (Lee et al. 2009) • Similarity based on exact word match, plus extracted features regarding discourse Slide by Graham Neubig
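A bare-bones version of this idea, using bag-of-words cosine similarity over a toy database (the utterance/response pairs are invented; Lee et al. additionally use discourse features on top of word match):

```python
from collections import Counter
import math

# Toy (stored utterance, stored response) pairs.
database = [
    ("how do I reset my password", "Go to Settings and choose Reset password."),
    ("what are your opening hours", "We are open 9am to 5pm on weekdays."),
    ("I want to cancel my order", "Orders can be cancelled from the Orders page."),
]

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(utterance):
    # Return the stored response whose utterance is most similar to the input.
    q = bow(utterance)
    return max(database, key=lambda pair: cosine(q, bow(pair[0])))[1]

print(retrieve("how can I reset a password"))
# -> Go to Settings and choose Reset password.
```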

  17. Neural Response Retrieval • Idea: use neural models to soften the connection between input and output and do more flexible matching (Nio et al. 2014) • Model uses Socher et al. (2011) recursive autoencoder + dynamic pooling Slide by Graham Neubig

  18. Dialog Control

  19. End-to-end Dialog Control • Train an LSTM that takes in text and entities and directly chooses an action to take (reply or API call) (Williams et al. 2017) • Trained using combination of supervised and reinforcement learning Slide by Graham Neubig
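The action-selection step can be sketched as a masked softmax over candidate actions (a simplification: in Williams et al.'s model the scores come from an LSTM over text, entity, and API-call features, and the mask encodes business rules; the action names and mask here are hypothetical):

```python
import math

actions = ["greet", "ask_phone_number", "place_call", "api_lookup"]

def select_action(scores, mask):
    # scores: raw model outputs for each action template
    # mask:   1 if the action is currently allowed, 0 otherwise
    #         (e.g. "place_call" only once a phone number is known)
    masked = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    mx = max(masked)
    exps = [math.exp(s - mx) for s in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    return actions[probs.index(max(probs))], probs

scores = [0.1, 2.0, 3.0, 0.5]
action, probs = select_action(scores, mask=[1, 1, 0, 1])
print(action)  # "ask_phone_number": "place_call" scores highest but is masked out
```

During supervised training the probabilities are trained against the developer's chosen action; during reinforcement learning they define the policy that is improved from dialog-level reward.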

  20. Challenges

  21. Incorporating Context • To produce sensible responses, systems may need to incorporate both linguistic context and physical context. • In long dialogs people keep track of what has been said and what information has been exchanged. • Experiments in “Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models” and “Attention with Intention for a Neural Network Conversation Model” try to embed a conversation into vectors. Example exchange:
Can you tell me why truth is not lie?
You perceive truth to be not lie, but is that true?
They are very different.
Good to know! I will store that info in my database.
Don’t you think so?
I kind of do, but it depends on the right context

  22. Coherence

  23. Coherence • Considering only a single previous utterance will lead to locally coherent but globally incoherent output • Need to consider more context! (Sordoni et al. 2015) • In contrast to MT, where context is sometimes helpful (Matsuzaki et al. 2015) and sometimes isn’t (Jean et al. 2017) Slide by Graham Neubig

  24. Simple Solution: Concatenate Utterances • Sordoni et al. (2015) consider one additional previous context utterance, concatenated together • Vinyals et al. (2015) just concatenate together all previous utterances and hope an RNN can learn. Sample dialog (Vinyals et al. 2015):
Machine: describe your problem
Human: i am having problems accessing the vpn
Machine: could you please let me know what are the operating systems you are using ?
Human: i am using linux
Machine: ok it was in the sslvpn on which you stated it was broken
Machine: please check if you are logged in to corp network
Machine: what is the error that you are running please
Human: i am seeing an error related to vpn
Machine: what is the error message that you are getting when connecting to vpn using network connect ?
Human: connection refused or something like that
Machine: may i know the version of network connect you connect ?
Human: i am not sure i know that
Machine: is the network connect prompting for any user certificate
Machine: ?
Human: yes
Machine: are you at home or at work at the moment ?
Machine: office ?
Human: i am at home
Machine: try this
Machine: goto <URL> page and enter your name and password
Human: done
Machine: try logging in with <NAME> and check if its working fine now
Human: yes , now it works !
Machine: great
Machine: anything else that i can help
Human: no , that was very useful
Machine: you are other than <NAME> : )
Machine: have a nice night
Slide by Graham Neubig
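The concatenation trick itself is trivial (a sketch; the separator token and the number of retained turns are arbitrary choices):

```python
def build_input(history, max_turns=2, sep=" </s> "):
    # Keep only the last `max_turns` utterances and join them into a
    # single string that the encoder reads as one long "utterance".
    return sep.join(history[-max_turns:])

history = [
    "i am having problems accessing the vpn",
    "what operating system are you using ?",
    "i am using linux",
]
print(build_input(history))
# -> "what operating system are you using ? </s> i am using linux"
```

Truncating to a few turns keeps the input length manageable; keeping everything, as Vinyals et al. do, pushes the burden of remembering onto the RNN.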

  25. Hierarchical Encoder-Decoder Model • Also have an utterance-level RNN track the overall dialog state (Serban et al. 2016) • Two RNNs: one for words, one for utterances • No attention Slide from Graham Neubig

  26. Diversity

  27. More Varied Responses • For translation, there is lexical variation but the content remains the same Slide by Graham Neubig

  28. Discourse-level VAE Model • Encode the entire previous dialog context as a latent variable in a Variational Autoencoder (VAE) (Zhao et al. 2017) • Also meta-information such as dialog acts Slide by Graham Neubig

  29. Diversity Objective • Basic idea: we want responses that are likely given the context, unlikely otherwise (Li et al. 2016) • Method: subtract the weighted unconditioned log probability from the conditioned log probability (calculated only on the first few words) Slide by Graham Neubig
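The reranking objective can be illustrated with toy numbers (the candidate responses and probabilities below are invented, not from a real model):

```python
import math

# p_given_c is p(r|c), the response likelihood given the context;
# p is p(r), the unconditioned likelihood of the response.
candidates = {
    "i don't know": {"p_given_c": 0.20, "p": 0.15},            # safe, generic
    "try reinstalling the driver": {"p_given_c": 0.10, "p": 0.001},  # specific
}

def mmi_score(p_given_c, p, lam=0.5):
    # log p(r|c) - lambda * log p(r): penalises responses that are
    # likely regardless of the context (the bland "i don't know" replies).
    return math.log(p_given_c) - lam * math.log(p)

best = max(candidates, key=lambda r: mmi_score(**candidates[r]))
print(best)  # the specific reply wins despite its lower conditional probability
```

Here the generic reply has twice the conditional probability, but its high unconditioned probability is penalised, so the contentful reply is ranked first.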

  30. Personality

  31. Coherent Personality • If we train on all of our data, our agent will be a mish-mash of personalities (e.g. Li et al. 2016) • We would like our agents to be consistent! Slide by Graham Neubig

  32. Personality Infused Dialog • Train a generation system with controllable “knobs” based on personality traits (Mairesse et al. 2007), e.g. an Extraversion knob • Non-neural, but well done and perhaps applicable Slide by Graham Neubig

  33. Speaker Embeddings • Speaker embeddings may learn general facts associated with a specific speaker (Li et al. 2017) • For example, to the question “Where do you live?” the system might reply with different answers depending on the speaker embedding

  34. Evaluation

  35. Evaluation of Models • Translation uses the BLEU score; while imperfect, it is not horrible • In dialog, BLEU shows very little correlation with human judgments (Liu et al. 2016) Slide by Graham Neubig

  36. DeltaBLEU • Retrieve good-looking responses, perform human evaluation, up-weight good ones, down-weight bad ones • Galley et al. 2015 Slide by Graham Neubig

  37. Learning to Evaluate • Use context, true response, and actual response to learn a regressor that predicts goodness (Lowe et al. 2017) • Important: similar to model, but has access to reference! • Adversarial evaluation: try to determine whether response is true or fake (Li et al. 2017) • One caveat from MT: learnable metrics tend to overfit Slide by Graham Neubig

  38. Advanced Features

  39. Integration with KB • Useful for task oriented dialogs. Not trivial to integrate.

  40. Experiments

  41. Template-based with DialogFlow

  42. Build Actions with Google Assistant • Experiment with Dialogflow: • https://codelabs.developers.google.com/codelabs/actions-1 • https://codelabs.developers.google.com/codelabs/actions-2 • Actions Console • https://console.actions.google.com/u/0/

  43. Key Concepts • Action: An Action is an entry point into an interaction that you build for the Assistant. Users can request your Action by typing or speaking to the Assistant. • Intent: An underlying goal or task the user wants to do; for example, ordering coffee or finding a piece of music. In Actions on Google, this is represented as a unique identifier and the corresponding user utterances that can trigger the intent. • Fulfillment: A service, app, feed, conversation, or other logic that handles an intent and carries out the corresponding Action.
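A fulfillment handler for such an intent might look roughly like this (a sketch assuming the Dialogflow v2 webhook JSON format with its `queryResult` and `fulfillmentText` fields; the intent name `order.coffee` and the `size` parameter are hypothetical examples):

```python
import json

def fulfill(request_body):
    # Extract the matched intent and its parameters from the webhook request.
    query = request_body["queryResult"]
    intent = query["intent"]["displayName"]
    params = query.get("parameters", {})
    if intent == "order.coffee":   # hypothetical intent name
        text = f"One {params.get('size', 'regular')} coffee coming up!"
    else:
        text = "Sorry, I can't handle that yet."
    # The response tells the Assistant what to say back to the user.
    return {"fulfillmentText": text}

request = {"queryResult": {"intent": {"displayName": "order.coffee"},
                           "parameters": {"size": "large"}}}
print(json.dumps(fulfill(request)))
```

In production this function would sit behind an HTTPS endpoint registered as the intent's fulfillment in the Dialogflow console.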

  44. Tools Used • Google Actions • https://console.actions.google.com/ • Dialogflow • https://console.dialogflow.com/api-client/

  45. Test the action on Google Home • Once the action has been set up, click on “See how it works on Google Assistant” • Now the action can be tested also on Google Home

  46. A Retrieval-based Model in TensorFlow

  47. The Ubuntu Dialog Corpus • Ubuntu Dialog Corpus (github). • One of the largest public dialog datasets available. • Based on chat logs from the Ubuntu channels on a public IRC network. • This paper goes into detail on how exactly the corpus was created. • The training data consists of 1,000,000 examples: 50% positive (label 1) and 50% negative (label 0). • Each example consists of a context (the conversation up to this point) and an utterance (a response to the context). • A positive label means that the utterance was an actual response to the context; a negative label means that it wasn’t: it was picked randomly from somewhere in the corpus.
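Constructing the 50/50 positive/negative split by random sampling can be sketched as follows (the context/response pairs here are invented toy examples; the real corpus has a million examples):

```python
import random

random.seed(0)

# Toy (context, true response) pairs standing in for the corpus.
dialogs = [
    ("how do i install a package ?", "use apt-get install <package>"),
    ("my wifi keeps dropping", "try restarting network-manager"),
    ("how do i check disk space ?", "run df -h in a terminal"),
]

def make_examples(dialogs):
    examples = []
    responses = [r for _, r in dialogs]
    for context, response in dialogs:
        examples.append((context, response, 1))       # actual next utterance
        # Negative example: a random utterance from elsewhere in the corpus.
        distractor = random.choice([r for r in responses if r != response])
        examples.append((context, distractor, 0))
    return examples

for ex in make_examples(dialogs):
    print(ex)
```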

  48. The dataset has been preprocessed: it has been tokenized, stemmed, and lemmatized using the NLTK tool. • Entities like names, locations, organizations, URLs, and system paths were replaced with special tokens. • This preprocessing is likely to improve performance by a few percent. • The average context is 86 words long and the average utterance is 17 words long.
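The entity-replacement step can be approximated like this (a rough sketch covering only URLs and filesystem paths; the real preprocessing also replaces names, locations, and organizations, typically with an NER tool, and the token names below are illustrative):

```python
import re

def replace_entities(text):
    # Replace URLs first so their path component is not matched as a path.
    text = re.sub(r"https?://\S+", "__url__", text)
    # Then replace Unix-style filesystem paths.
    text = re.sub(r"(?:/[\w.-]+)+", "__path__", text)
    return text

print(replace_entities("edit /etc/network/interfaces then see http://askubuntu.com"))
# -> "edit __path__ then see __url__"
```

Collapsing these open-vocabulary strings into a handful of special tokens shrinks the vocabulary and lets the model generalize across different concrete paths and URLs.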

  49. Dual Encoder LSTM • Dual Encoder has been reported to give decent performance on this data set. • Applying other models to this problem would be an interesting project.

  50. Training • Both the context and the response text are split into words, and each word is embedded into a vector. The word embeddings are initialized with Stanford’s GloVe vectors and are fine-tuned during training. • Both the embedded context and response are fed into the same RNN word-by-word. The RNN generates a vector representation that captures the “meaning” of the context and response (c and r in the picture). We can choose how large these vectors should be, but let’s say we pick 256 dimensions. • We multiply c with a matrix M to “predict” a response r’. If c is a 256-dimensional vector, then M is a 256×256 matrix, and the result is another 256-dimensional vector, which we can interpret as a generated response. The matrix M is learned during training. • The similarity of the predicted response r’ and the actual response r is measured by taking the dot product of these two vectors (an unnormalized similarity score). We then apply a sigmoid function to convert that score into a probability.
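The scoring step, sigmoid(r'·r) with r' = Mc, is just a few lines (a sketch: the vectors here are random stand-ins for the RNN encodings, with d = 8 instead of 256 for brevity; in the real model c and r are the final RNN hidden states and M is learned):

```python
import math
import random

random.seed(42)
d = 8  # 256 in the text; small here for brevity

# Stand-ins for the RNN encodings of context and response.
c = [random.gauss(0, 1) for _ in range(d)]
r = [random.gauss(0, 1) for _ in range(d)]
M = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]

def score(c, r, M):
    # r' = M c : the "predicted" response vector.
    r_pred = [sum(M[i][j] * c[j] for j in range(d)) for i in range(d)]
    # Dot product of predicted and actual response: similarity score.
    logit = sum(a * b for a, b in zip(r_pred, r))
    # Sigmoid converts the score into the probability of label 1.
    return 1.0 / (1.0 + math.exp(-logit))

p = score(c, r, M)
print(p)  # a value in (0, 1)
```

Training minimizes the cross-entropy between this probability and the 0/1 labels from the corpus, which pushes M to map contexts close to their true responses.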
