
Natural Language in AI




Presentation Transcript


  1. Natural Language in AI

  2. Outline • Text-based natural language • Dialogue-based natural language

  3. Methods in Natural Language Processing Methods in NLP address two broad categories of tasks: • NL generation • NL understanding

  4. Natural Language problems • dialogue-based: NL interfaces; spoken and written communication; uses natural language understanding; discourse (any string more than one sentence long) • text-based: text categorization, text generation, information extraction, machine translation

  5. Text-Based Natural Language

  6. Text-based NL problems • story/text understanding; • information extraction: extracting information from text; • translating documents, manuals, communications; • drafting documents; • summarizing texts; • text generation, categorization or clustering, text DB retrieval, text mining, topic identification;

  7. Text-based Natural Language Topics • Information extraction • Machine translation • Drafting • Text summarization

  8. Information Extraction • extracting specific types of information from large volumes of unrestricted text • the IE system must be given domain guidelines that specify what to find and what to extract • the system searches for the portions of text that might contain the relevant information • IE systems are not required to fully understand the source text

  9. Types of IE • Knowledge-based Information Extraction • Machine learning IE • Template-based, Wrappers • Template Mining

  10. Types of IE Knowledge-based Information Extraction • uses linguistic patterns to support the interpretation of input texts Machine learning IE • uses an inductive learning mechanism to automatically construct a knowledge base of patterns

  11. Types of IE Template-based, Wrappers • the IE output is a populated database, which can be used as a case base • the values for the slots are strings taken from the source text • the resulting database works as a template Template Mining • well suited for areas “where the text is terse and sentences are unambiguous and declarative in nature”
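Below is a minimal sketch of the template-based idea: hand-written patterns fill the slots of a template, and each slot value is a literal string from the source text. The seminar-announcement template, the regular expressions, and the sample text are illustrative assumptions, not the output of any system cited in these slides.

```python
# A minimal template-based extraction sketch: regular expressions fill the
# slots of a toy "seminar announcement" template. Patterns and template are
# illustrative assumptions, not part of any specific IE system.
import re

TEMPLATE_PATTERNS = {
    "speaker":  re.compile(r"Speaker:\s*(?P<value>.+)"),
    "location": re.compile(r"(?:Room|Location):\s*(?P<value>.+)"),
    "time":     re.compile(r"\b(?P<value>\d{1,2}:\d{2}\s*(?:am|pm))\b", re.I),
}

def extract(text):
    """Return one populated template: a dict of slot -> string from the text."""
    record = {}
    for slot, pattern in TEMPLATE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[slot] = match.group("value").strip()
    return record

announcement = """Speaker: Dr. Ada Lovelace
Room: Wean Hall 5409
The talk starts at 3:30 pm and ends at 5:00 pm."""

print(extract(announcement))
# {'speaker': 'Dr. Ada Lovelace', 'location': 'Wean Hall 5409', 'time': '3:30 pm'}
```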

  12. Relation between IE and NLP Using linguistic patterns: • knowledge-based (represents patterns) • inductive learning based (learns patterns) • template mining (skips parsing) • NLP is needed whenever negation must be disambiguated or word ordering changes the meaning

  13. Examples of applications of IE

  14. References of IE • Gaizauskas, R. and Wilks, Y. (1998). Information Extraction: Beyond Document Retrieval. Computational Linguistics and Chinese Language Processing, vol. 3, no. 2, pp. 17-60. • Riloff, E. and Lehnert, W. (1994). Information Extraction as a Basis for High-Precision Text Classification. ACM Transactions on Information Systems, 12(3), 296-333. • Lehnert, W., McCarthy, J., Soderland, S., Riloff, E., Cardie, C., Peterson, J., Feng, F., Dolan, C., and Goldman, S. (1993). UMASS/HUGHES: Description of the CIRCUS System Used for MUC-5. Proceedings of the Fifth Message Understanding Conference, pp. 277-291. San Mateo, CA: Morgan Kaufmann. • Soderland, S. and Lehnert, W. (1994). Wrap-Up: A Trainable Discourse Module for Information Extraction. Journal of Artificial Intelligence Research, 2, 131-168. • Natural Language Processing Laboratory, Online Information Extraction Bibliography: http://www-nlp.cs.umass.edu/ciir-pubs/tepubs.html

  15. Text-based Natural Language Topics • Information extraction • Machine translation • Drafting • Text summarization

  16. Can you translate this sentence? Ever since computers were invented, it has been natural to wonder whether they might be able to learn. By Tom Mitchell

  17. Describe the steps you used to translate the sentence

  18. List the words you used in the translated sentence and associate them with the corresponding words in the source sentence

  19. Desde que computadores foram inventados tem sido natural imaginar que eles sejam capazes de aprender. Ever since computers were invented it has been natural to wonder whether they might be able to learn.

  20. Online translators • http://babelfish.altavista.com/babelfish/tr • http://world.altavista.com/tr • http://www.systransoft.com/ • What’s wrong with them?

  21. Can you translate this sentence? …cursing my head for things that I've said till I finally died, which started the whole world living…

  22. What works? • The KANT project: Knowledge-based, Accurate Translation for technical documentation • founded in 1989 • large-scale, practical translation systems for technical documentation • KANT project homepage: http://www.lti.cs.cmu.edu/Research/Kant/

  23. KANT • uses a controlled vocabulary and grammar for each language • uses explicit yet focused semantic models for each technical domain • achieves very high accuracy in translation • supports multilingual document production • has been applied to the domains of electric power utility management and heavy equipment technical documentation

  24. Machine Translation • Unrestricted MT is still inadequate. Will that ever change? • Should MT aim to outperform human translation? • An alternative is to have humans edit the original document into a subset of the original language (a canonical form) Cost of MT • lexicons of 20,000-100,000 words • grammars with 100 to 10,000 rules
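As a toy illustration of why a controlled subset of the language helps, the sketch below translates word for word against a tiny bilingual lexicon and flags anything outside the controlled vocabulary for human editing. The English-to-Portuguese lexicon is an illustrative assumption, not a resource from KANT or any real MT system.

```python
# A toy word-for-word translator over a controlled English subset. The tiny
# English->Portuguese lexicon is a made-up illustration.
LEXICON = {
    "the": "o", "terrain": "terreno", "is": "é",
    "insurmountable": "intransponível", "computers": "computadores",
    "learn": "aprendem",
}

def translate(sentence):
    """Translate a controlled-English sentence word by word; reject words
    outside the controlled vocabulary, as a controlled-language checker would."""
    out = []
    for word in sentence.lower().rstrip(".").split():
        if word not in LEXICON:
            raise ValueError(f"'{word}' is outside the controlled vocabulary")
        out.append(LEXICON[word])
    return " ".join(out).capitalize() + "."

print(translate("The terrain is insurmountable."))  # O terreno é intransponível.
```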

  25. Text-based Natural Language Topics • Information extraction • Machine translation • Drafting • Text summarization

  26. Drafting • applications in the legal domain • drafting of wills • petitions for restraining orders • use of rhetorical structure

  27. Example Rhetorical Structure

  28. Text-based Natural Language Topics • Information extraction • Machine translation • Drafting • Text summarization

  29. Summarize text

  30. Describe the steps you used to summarize text

  31. Text summarization applications • Generate a summary of many documents; • Generate a summary of one document only; • Headline generation;

  32. Text summarization The traditional idea of summarization is to extract sentences and concatenate them. Human beings, however, produce summaries by creating new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. Given that large collections of text/abstract pairs are available online, it is now possible to envision algorithms that are trained to mimic this process. (From Knight, K. and Marcu, D., 2000.)

  33. Text summarization steps • Identify most relevant segments; • Apply rules for deleting redundant parts; • Compress/aggregate long sentences; • Assess coherence of segments; • Revise.
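A minimal sketch of the first step (identifying the most relevant segments) follows: sentences are scored by the frequency of their content words and the top-ranked ones are kept in their original order. The stop-word list and the scoring scheme are simplifying assumptions; real summarizers also compress, check coherence, and revise.

```python
# A minimal extractive-summarization sketch: rank sentences by the average
# corpus frequency of their content words. Stop words and scoring are
# simplifying assumptions.
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "that", "in", "it", "was"}

def summarize(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Re-emit the chosen sentences in their original order to keep coherence.
    return " ".join(s for s in sentences if s in ranked)

doc = ("Machine translation of unrestricted text is still inadequate. "
       "Controlled vocabularies make translation far more reliable. "
       "The weather was pleasant during the conference.")
print(summarize(doc, n_sentences=1))
```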

  34. Example

  35. Dialogue-based natural language

  36. Dialogue-based natural language NL Understanding • Speech recognition • intonation, pronunciation, speed • Natural Language Processing • syntactic, semantic, and pragmatic analysis Natural Language Generation • intention, generation, speech synthesis

  37. Speech recognition • the analog voice signal is digitized • the phonemes produced are identified • template matching compares the sounds produced against a library of phoneme templates • the outcome is a list of candidate phonemes with probabilities • the words are then found using hidden Markov modeling
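The sketch below illustrates the last bullet: given a sequence of observed acoustic labels, Viterbi decoding over a hidden Markov model recovers the most likely sequence of hidden states. The states, transition, and emission probabilities are made-up toy numbers, not a real acoustic model.

```python
# A minimal Viterbi decoding sketch over a toy hidden Markov model.
# States, observations, and probabilities are illustrative assumptions.
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, best state path) for the observation sequence."""
    V = [{s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs], V[-1][prev][1])
                for prev in states
            )
            column[s] = (prob, path + [s])
        V.append(column)
    return max(V[-1].values())

# Two hypothetical "word" states emitting observed acoustic labels per frame.
states = ["recognize", "wreck"]
start_p = {"recognize": 0.6, "wreck": 0.4}
trans_p = {"recognize": {"recognize": 0.7, "wreck": 0.3},
           "wreck":     {"recognize": 0.4, "wreck": 0.6}}
emit_p = {"recognize": {"r": 0.5, "eh": 0.3, "k": 0.2},
          "wreck":     {"r": 0.4, "eh": 0.2, "k": 0.4}}

prob, path = viterbi(["r", "eh", "k"], states, start_p, trans_p, emit_p)
print(path, prob)  # the most likely state sequence and its probability
```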

  38. “How to recognize speech” vs. “How to wreck a nice beach” • “Ice cream” vs. “I scream”

  39. Speech Recognition Methods • speech recognition can also be implemented with an inductive method such as neural networks • recognizers may be isolated-word (individual) or continuous • a controlled vocabulary can increase the chances of success, e.g., Jupiter • recognizers are often limited to one speaker; when multiple speakers are needed, retraining may often be necessary • speech understanding includes speech recognition plus understanding of the recognized utterance

  40. Natural Language Understanding - Syntactic Analysis - Parsing - Semantics - Pragmatics

  41. Syntactic analysis • a parser recovers the phrase structure of an utterance, given a grammar (rules of syntax) • the parser’s output is that structure (groups of words and their respective parts of speech) • the phrase structure is represented in a parse tree • parsing is the first step towards determining the meaning of an utterance

  42. Parsing • Parsing: method to analyze a sentence to determine its structure according to the grammar • Grammar: formal specification of the structures allowable in the language

  43. Examples of Symbols in a Grammar • (S) sentence • (NP) noun phrase • (VP) verb phrase • (PP) prepositional phrase • (RelClause) relative clause • (Det) determiner

  44. Grammar rules • S → NP VP • S → VP VP • S → VP PP • S → NP VP VP • NP → Det Adjective N • NP → Adjective N • NP → Noun • NP → Det Noun • VP → V Adjective • VP → V S • VP → V NP • VP → V PP • PP → P Noun Dictionary entries: • V → ate • NAME → John • Det(art) → the • N → cat

  45. Parsing Tree • parse tree for “The terrain is insurmountable”: [S [NP [Article The] [Noun terrain]] [VP [Verb is] [Adjective insurmountable]]]
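The sketch below feeds a trimmed-down grammar, in the spirit of the rules above, to a chart parser and parses the sentence from this slide. It assumes the NLTK library is available; the exact rule set and lexical entries are reduced to what the example sentence needs.

```python
# A small chart-parsing sketch using NLTK. The grammar fragment is an
# illustrative reduction of the rules on the "Grammar rules" slide.
import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N | NAME
    VP -> V NP | V Adjective
    Det -> 'the'
    N -> 'cat' | 'terrain'
    NAME -> 'John'
    V -> 'ate' | 'is'
    Adjective -> 'insurmountable'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the terrain is insurmountable".split()):
    print(tree)  # (S (NP (Det the) (N terrain)) (VP (V is) (Adjective insurmountable)))
```

An ambiguous sentence would simply yield more than one tree from the same loop, which is the situation the next slide describes.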

  46. The outcome of syntactic analysis can still be a set of alternative structures with respective probabilities • sometimes grammar rules can disambiguate a sentence, e.g., “John set the set of chairs”; sometimes they can’t • the next step is semantic analysis

  47. Semantic analysis • semantics provides a partial representation of meaning • represents the sentence in meaningful parts • uses the possible syntactic structures and their meanings • builds a parse tree with associated semantics • semantics is typically represented with logic

  48. Compositional semantics • The semantics of a phrase is a function of the semantics of its sub-phrases • It does not depend on any other phrase • So, if we know the meaning of sub-phrases, then we know the meaning of the phrases • “A goal of semantic interpretation is to find a way that the meaning of the whole sentence can be put together in a simple way from the meanings of the parts of the sentence.” (Alison, 1997 p. 112)
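A minimal sketch of the compositional idea: each word's meaning is either a constant or a function, and the meaning of the sentence is obtained only by applying those functions to the meanings of its sub-phrases. The tiny lexicon and the logic-style output notation are illustrative assumptions.

```python
# A minimal compositional-semantics sketch: sentence meaning is assembled
# bottom-up from word meanings only. Lexicon and notation are toy assumptions.
LEXICON = {
    "John": "john",
    "Mary": "mary",
    "loves": lambda subj, obj: f"loves({subj}, {obj})",   # transitive verb
    "died":  lambda subj: f"died({subj})",                # intransitive verb
}

def meaning_of_sentence(subject, verb, obj=None):
    """Compose the sentence meaning from the meanings of its parts only."""
    verb_meaning = LEXICON[verb]
    if obj is None:                                       # intransitive: one argument
        return verb_meaning(LEXICON[subject])
    return verb_meaning(LEXICON[subject], LEXICON[obj])   # transitive: two arguments

print(meaning_of_sentence("John", "loves", "Mary"))   # loves(john, mary)
print(meaning_of_sentence("John", "died"))            # died(john)
```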

  49. Semantic analysis • the transitivity of a verb helps determine the meaning in a parse tree (e.g., jump is intransitive, love is transitive) • “John died Mary”: is there a period missing (“John died. Mary …”), or is it “John dyed Mary”?

  50. Pragmatic analysis • uses context • uses the partial representation of meaning • takes the speaker’s purpose into account and performs disambiguation • considers where, when, and by whom an utterance was said
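As a toy example of pragmatic disambiguation, the sketch below resolves a pronoun against the discourse context by picking the most recently mentioned compatible entity. The small pronoun lexicon and the recency heuristic are illustrative assumptions, not a real anaphora-resolution algorithm.

```python
# A toy pronoun-resolution sketch: resolve "he"/"she"/"it" to the most
# recently mentioned compatible entity. Lexicon and heuristic are assumptions.
CONTEXT_ENTITIES = []  # entities mentioned so far, most recent last

PRONOUN_FOR = {"John": "he", "Mary": "she", "report": "it"}

def mention(entity):
    """Record an entity as it is mentioned in the discourse."""
    CONTEXT_ENTITIES.append(entity)

def resolve(pronoun):
    """Return the most recently mentioned entity compatible with the pronoun."""
    for entity in reversed(CONTEXT_ENTITIES):
        if PRONOUN_FOR.get(entity) == pronoun:
            return entity
    return None

mention("John")
mention("report")
print(resolve("it"))   # report
print(resolve("he"))   # John
```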
