
Can Computers Make Sense of Sentences?



Presentation Transcript


  1. Can Computers Make Sense of Sentences? Text mining, Problems & Challenges Sean Wallis Survey of English Usage University College London, UK

  2. What is Text Mining? • A set of different methods and approaches • Aim: to extract knowledge from natural language sentences • Information extraction • Detailed knowledge from text • e.g. to extract logical rules • Text summarisation • Superficial model of texts • e.g. for comparative purposes

  3. Typical Applications • High-level text summarisation • Superficial abstraction • Cluster texts about similar subjects • Find texts similar to this one • Find the most similar text to a key phrase • Find texts about a subject • Low-level information extraction • Glossary extraction: identify unique terms in text and extract a definition • Convert text into logic, models • Employ the logical description to précis the text
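The "find texts similar to this one" and clustering applications above are usually built on weighted word vectors. Here is a minimal sketch of that idea, assuming scikit-learn is available; the example documents and query are invented.

    # Rank invented documents against a query using TF-IDF weights and
    # cosine similarity (a standard bag-of-words similarity measure).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "The futures market fell by ten points.",
        "Share prices dropped sharply on the futures market.",
        "The rabbit is ready for lunch.",
    ]
    query = "falling futures markets"

    vectorizer = TfidfVectorizer(stop_words="english")
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])

    # The highest-scoring documents are the "most similar" texts.
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    for score, text in sorted(zip(scores, documents), reverse=True):
        print(f"{score:.2f}  {text}")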

  4. Text abstraction • Aim: abstract from longer document • Use surface clues, keywords and statistical frequencies to assign weights to sentences • Sentences with highest aggregate score are selected as important • This method has obvious deficiencies • Measuring textual similarity by weights can have other applications • e.g. internet search algorithms
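As a rough illustration of the weighting scheme just described (not the speaker's own code), here is a sketch in plain Python: sentences are scored by the summed frequency of their words, and the highest-scoring ones are kept. The stopword list is deliberately tiny.

    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "by", "for"}

    def summarise(text, n_sentences=2):
        """Keep the n highest-scoring sentences, in their original order."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        words = [w for w in re.findall(r"[a-z]+", text.lower())
                 if w not in STOPWORDS]
        freq = Counter(words)

        # Aggregate score for a sentence = sum of its word frequencies.
        def score(s):
            return sum(freq[w] for w in re.findall(r"[a-z]+", s.lower()))

        top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
        return " ".join(s for s in sentences if s in top)

As the slide notes, this has obvious deficiencies: it never paraphrases, and a long sentence full of frequent words wins regardless of what it actually says.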

  5. Language Levels • Morphological • internal structure of words • Syntactic • sentence structure, grammar • Semantic • meaning of words and sentences • Discourse • sentence topic, anaphora (“it”, etc.) • Pragmatic • meaning depends on context
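A modern pipeline such as spaCy makes the lower levels of this hierarchy concrete: lemmas and part-of-speech tags for morphology, a dependency parse for syntax. The sketch below assumes spaCy and its small English model are installed; the higher levels (semantics, discourse, pragmatics) are exactly the ones it does not resolve.

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Julie put the bowl on the plate. It broke.")

    for token in doc:
        # Morphological level: lemma and part-of-speech tag.
        # Syntactic level: dependency label and the token's head.
        print(token.text, token.lemma_, token.pos_, token.dep_, token.head.text)

    # The discourse-level question (what does "It" refer to?) is left open.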

  6. Ambiguity at all Levels • Morphological • bank, file, chair • Syntactic • John saw [[the woman] with the telescope] • Semantic • The rabbit is ready for lunch • Discourse • Julie put the bowl on the plate. It broke. • Pragmatic • You owe me £50 (a fact or a request?) (Examples courtesy of Ruslan Mitkov)
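The syntactic example can be made concrete with a toy grammar: a chart parser returns two trees for the same string, one attaching the prepositional phrase to the noun phrase and one to the verb phrase. A sketch assuming NLTK is installed:

    import nltk

    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        VP  -> V NP | VP PP
        NP  -> Det N | NP PP | 'John'
        PP  -> P NP
        Det -> 'the'
        N   -> 'woman' | 'telescope'
        V   -> 'saw'
        P   -> 'with'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("John saw the woman with the telescope".split()):
        tree.pretty_print()   # prints two distinct parse trees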

  7. How is ambiguity resolved? • Information Extraction • Employ background knowledge • semantic nets • Narrow focus • try application-specific approaches • Late resolution • ‘read’ entire text and then disambiguate • combinatorial complexity • requires entire text (cannot be interactive) • Text summarisation • Might accept ambiguity
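A hypothetical sketch of the "late resolution" strategy, with invented word senses and a stand-in for background knowledge, shows where the combinatorial cost comes from:

    from itertools import product

    # Candidate senses collected while 'reading' the text; disambiguation
    # is deferred until the whole text has been seen.
    candidates = {
        "bank":     ["river bank", "financial institution"],
        "interest": ["curiosity", "charge on a loan"],
    }

    def coherence(reading):
        # Stand-in for real background knowledge (e.g. a semantic net):
        # prefer readings whose senses come from the same domain.
        financial = {"financial institution", "charge on a loan"}
        return sum(sense in financial for sense in reading)

    # The number of joint readings grows multiplicatively with each
    # ambiguous word, which is the combinatorial complexity noted above.
    readings = list(product(*candidates.values()))
    print(max(readings, key=coherence))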

  8. How is ambiguity resolved by humans? • Cognitive science • Human information processing • Psycho-physical experiments • Attention and Memory literature • Interpretation and repair • Processes are data-driven, massively parallel and cyclic • Simple local resolution of ambiguity • Repairing ‘garden path’ difficult • Limited recovery from deep ambiguity • Greater parallelism at early stages • Some evidence for maintaining alternative interpretations in parallel

  10. Think like a computer – or a human? • Humans are not very logical • Texts can be logically ambiguous • “I will identify those who have a record of outstanding performance and, in light of market pay comparabilities, warrant a salary increase.” • Does this mean the following? ∀x.(HighPerf(x) ∧ Market(x) → PayIncrease(x)) • No. In this case the “and” refers to the set of conditions: ∀x.((HighPerf(x) → PayIncrease(x)) ∧ (Market(x) → PayIncrease(x))), which is equivalent to ∀x.((HighPerf(x) ∨ Market(x)) → PayIncrease(x))
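The equivalence claimed above, that reading the "and" condition by condition amounts to a disjunction of conditions, can be checked mechanically. A small truth-table check in Python, with predicate names following the formulas above:

    from itertools import product

    for high_perf, market, pay in product([True, False], repeat=3):
        naive       = (not (high_perf and market)) or pay             # (H ∧ M) → P
        intended    = (not high_perf or pay) and (not market or pay)  # (H → P) ∧ (M → P)
        disjunctive = (not (high_perf or market)) or pay              # (H ∨ M) → P
        assert intended == disjunctive   # the two preferred forms always agree
        if naive != intended:
            print("Readings diverge when:", high_perf, market, pay)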

  11. Algorithms • Syntax: the central role of verbs • Frames: • Domain-specific verb complementation patterns • “The futures market fell by ten points” • Grammar: • Penn Treebank II: predicate-argument relations • More general, but parsing is itself imperfect • Semantic nets • WordNet: a very large network of definitions • Dictionaries, thesauri, etc. • Domain ontologies (commercially: now XML)
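WordNet, named above, can be queried directly through NLTK; each synset is a node in the semantic net with a gloss and typed links such as hypernyms. A sketch assuming NLTK and its WordNet data (nltk.download('wordnet')) are available:

    from nltk.corpus import wordnet as wn

    # All senses of the ambiguous word "bank", with their definitions.
    for synset in wn.synsets("bank"):
        print(synset.name(), "-", synset.definition())

    # Hypernym ("is-a") links supply the background knowledge used
    # to choose between the senses.
    print(wn.synset("bank.n.01").hypernyms())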

  12. Algorithms (II) • Discourse: unify references • Crucial to integration of sentences into whole • Anaphora resolution • “If Peter Mandelson had been in Tony Blair’s shoes he would have demanded his resignation the day the Prime Minister forced him to leave the Cabinet.” • Semantic unification • nouns: “X23325” = “the widget” = “it” • verbs: “turn” = “rotate” • Pragmatic algorithms? • Role of expectation
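A hypothetical sketch of the semantic unification idea, using the slide's own examples: the lookup tables are invented, and in a real system the pronoun mapping would depend on anaphora resolution rather than a fixed table.

    # Map different surface forms onto one canonical identifier so that
    # statements from separate sentences can be integrated.
    CANONICAL_NOUNS = {"x23325": "WIDGET", "the widget": "WIDGET", "it": "WIDGET"}
    CANONICAL_VERBS = {"turn": "ROTATE", "rotate": "ROTATE"}

    def unify(noun_phrase, verb):
        return (CANONICAL_NOUNS.get(noun_phrase.lower(), noun_phrase),
                CANONICAL_VERBS.get(verb.lower(), verb))

    print(unify("X23325", "turn"))        # ('WIDGET', 'ROTATE')
    print(unify("the widget", "rotate"))  # ('WIDGET', 'ROTATE')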

  13. State of the art & future work • Information extraction • Fairly robust systems in limited domains • Restricted terms, syntax and pragmatics can be found in, e.g., legal documents • General systems still at the research stage • Research: from morphology to pragmatics • What can be done with the results? • What is the role of human beings? • Where is human assistance most viable? • Machine learning (in particular, ILP) • Generalisation of propositions = ‘learning’ • Deductive reasoning • Detection of contradiction, simplification
