
Can Computers Make Sense of Sentences?



Presentation Transcript


  1. Can Computers Make Sense of Sentences? Text mining, Problems & Challenges Sean Wallis Survey of English Usage University College London, UK

  2. What is Text Mining? • A set of different methods and approaches • Aim: to extract knowledge from natural language sentences • Information extraction • Detailed knowledge from text • e.g. to extract logical rules • Text summarisation • Superficial model of texts • e.g. for comparative purposes

  3. Typical Applications • High-level text summarisation • Superficial abstraction • Cluster texts about similar subjects • Find texts similar to this one • Find the most similar text to a key phrase • Find texts about a subject • Low-level information extraction • Glossary extraction: identify unique terms in text and extract a definition • Convert text into logic, models • Employ the logical description to précis the text
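The "find texts similar to this one" and clustering applications above are usually built on weighted word vectors. Here is a minimal sketch of that idea, assuming scikit-learn is available; the example documents and query are invented.

    # Rank invented documents against a query using TF-IDF weights and
    # cosine similarity (a standard bag-of-words similarity measure).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "The futures market fell by ten points.",
        "Share prices dropped sharply on the futures market.",
        "The rabbit is ready for lunch.",
    ]
    query = "falling futures markets"

    vectorizer = TfidfVectorizer(stop_words="english")
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])

    # The highest-scoring documents are the "most similar" texts.
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    for score, text in sorted(zip(scores, documents), reverse=True):
        print(f"{score:.2f}  {text}")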

  4. Text abstraction • Aim: abstract from longer document • Use surface clues, keywords and statistical frequencies to assign weights to sentences • Sentences with highest aggregate score are selected as important • This method has obvious deficiencies • Measuring textual similarity by weights can have other applications • e.g. internet search algorithms
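As a rough illustration of the weighting scheme just described (not the speaker's own code), here is a sketch in plain Python: sentences are scored by the summed frequency of their words, and the highest-scoring ones are kept. The stopword list is deliberately tiny.

    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "by", "for"}

    def summarise(text, n_sentences=2):
        """Keep the n highest-scoring sentences, in their original order."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        words = [w for w in re.findall(r"[a-z]+", text.lower())
                 if w not in STOPWORDS]
        freq = Counter(words)

        # Aggregate score for a sentence = sum of its word frequencies.
        def score(s):
            return sum(freq[w] for w in re.findall(r"[a-z]+", s.lower()))

        top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
        return " ".join(s for s in sentences if s in top)

As the slide notes, this has obvious deficiencies: it never paraphrases, and a long sentence full of frequent words wins regardless of what it actually says.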

  5. Language Levels • Morphological • internal structure of words • Syntactic • sentence structure, grammar • Semantic • meaning of words and sentences • Discourse • sentence topic, anaphora (“it”, etc.) • Pragmatic • meaning depends on context
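A modern pipeline such as spaCy makes the lower levels of this hierarchy concrete: lemmas and part-of-speech tags for morphology, a dependency parse for syntax. The sketch below assumes spaCy and its small English model are installed; the higher levels (semantics, discourse, pragmatics) are exactly the ones it does not resolve.

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Julie put the bowl on the plate. It broke.")

    for token in doc:
        # Morphological level: lemma and part-of-speech tag.
        # Syntactic level: dependency label and the token's head.
        print(token.text, token.lemma_, token.pos_, token.dep_, token.head.text)

    # The discourse-level question (what does "It" refer to?) is left open.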

  6. Ambiguity at all Levels • Morphological • bank, file, chair • Syntactic • John saw [[the woman] with the telescope] • Semantic • The rabbit is ready for lunch • Discourse • Julie put the bowl on the plate. It broke. • Pragmatic • You owe me £50 (a fact or a request?) (Examples courtesy of Ruslan Mitkov)
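The syntactic example can be made concrete with a toy grammar: a chart parser returns two trees for the same string, one attaching the prepositional phrase to the noun phrase and one to the verb phrase. A sketch assuming NLTK is installed:

    import nltk

    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        VP  -> V NP | VP PP
        NP  -> Det N | NP PP | 'John'
        PP  -> P NP
        Det -> 'the'
        N   -> 'woman' | 'telescope'
        V   -> 'saw'
        P   -> 'with'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("John saw the woman with the telescope".split()):
        tree.pretty_print()   # prints two distinct parse trees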

  7. How is ambiguity resolved? • Information Extraction • Employ background knowledge • semantic nets • Narrow focus • try application-specific approaches • Late resolution • ‘read’ entire text and then disambiguate • combinatorial complexity • requires entire text (cannot be interactive) • Text summarisation • Might accept ambiguity
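A hypothetical sketch of the "late resolution" strategy, with invented word senses and a stand-in for background knowledge, shows where the combinatorial cost comes from:

    from itertools import product

    # Candidate senses collected while 'reading' the text; disambiguation
    # is deferred until the whole text has been seen.
    candidates = {
        "bank":     ["river bank", "financial institution"],
        "interest": ["curiosity", "charge on a loan"],
    }

    def coherence(reading):
        # Stand-in for real background knowledge (e.g. a semantic net):
        # prefer readings whose senses come from the same domain.
        financial = {"financial institution", "charge on a loan"}
        return sum(sense in financial for sense in reading)

    # The number of joint readings grows multiplicatively with each
    # ambiguous word, which is the combinatorial complexity noted above.
    readings = list(product(*candidates.values()))
    print(max(readings, key=coherence))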

  8. How is ambiguity resolved by humans? • Cognitive science • Human information processing • Psycho-physical experiments • Attention and Memory literature • Interpretation and repair • Processes are data-driven, massively parallel and cyclic • Simple local resolution of ambiguity • Repairing ‘garden path’ difficult • Limited recovery from deep ambiguity • Greater parallelism at early stages • Some evidence for maintaining alternative interpretations in parallel

  10. Think like a computer – or a human? • Humans are not very logical • Texts can be logically ambiguous • “I will identify those who have a record of outstanding performance and, in light of market pay comparabilities, warrant a salary increase.” • Does this mean the following? ∀x.(HighPerf(x) ∧ Market(x) → PayIncrease(x)) • No. In this case the “and” refers to the set of conditions: ∀x.((HighPerf(x) → PayIncrease(x)) ∧ (Market(x) → PayIncrease(x))), which is equivalent to ∀x.((HighPerf(x) ∨ Market(x)) → PayIncrease(x))
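The equivalence claimed above, that reading the "and" condition by condition amounts to a disjunction of conditions, can be checked mechanically. A small truth-table check in Python, with predicate names following the formulas above:

    from itertools import product

    for high_perf, market, pay in product([True, False], repeat=3):
        naive       = (not (high_perf and market)) or pay             # (H ∧ M) → P
        intended    = (not high_perf or pay) and (not market or pay)  # (H → P) ∧ (M → P)
        disjunctive = (not (high_perf or market)) or pay              # (H ∨ M) → P
        assert intended == disjunctive   # the two preferred forms always agree
        if naive != intended:
            print("Readings diverge when:", high_perf, market, pay)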

  11. Algorithms • Syntax: the central role of verbs • Frames: • Domain-specific verb complementation patterns • “The futures market fell by ten points” • Grammar: • Penn Treebank II: predicate-argument relations • More general, but parsing is itself imperfect • Semantic nets • WordNet: a very large network of definitions • Dictionaries, thesauri, etc. • Domain ontologies (commercially: now XML)
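WordNet, named above, can be queried directly through NLTK; each synset is a node in the semantic net with a gloss and typed links such as hypernyms. A sketch assuming NLTK and its WordNet data (nltk.download('wordnet')) are available:

    from nltk.corpus import wordnet as wn

    # All senses of the ambiguous word "bank", with their definitions.
    for synset in wn.synsets("bank"):
        print(synset.name(), "-", synset.definition())

    # Hypernym ("is-a") links supply the background knowledge used
    # to choose between the senses.
    print(wn.synset("bank.n.01").hypernyms())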

  12. Algorithms (II) • Discourse: unify references • Crucial to integration of sentences into whole • Anaphora resolution • “If Peter Mandelson had been in Tony Blair’s shoes he would have demanded his resignation the day the Prime Minister forced him to leave the Cabinet.” • Semantic unification • nouns: “X23325” = “the widget” = “it” • verbs: “turn” = “rotate” • Pragmatic algorithms? • Role of expectation
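A hypothetical sketch of the semantic unification idea, using the slide's own examples: the lookup tables are invented, and in a real system the pronoun mapping would depend on anaphora resolution rather than a fixed table.

    # Map different surface forms onto one canonical identifier so that
    # statements from separate sentences can be integrated.
    CANONICAL_NOUNS = {"x23325": "WIDGET", "the widget": "WIDGET", "it": "WIDGET"}
    CANONICAL_VERBS = {"turn": "ROTATE", "rotate": "ROTATE"}

    def unify(noun_phrase, verb):
        return (CANONICAL_NOUNS.get(noun_phrase.lower(), noun_phrase),
                CANONICAL_VERBS.get(verb.lower(), verb))

    print(unify("X23325", "turn"))        # ('WIDGET', 'ROTATE')
    print(unify("the widget", "rotate"))  # ('WIDGET', 'ROTATE')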

  13. State of the art & future work • Information extraction • Fairly robust systems in limited domains • Restricted terms, syntax and pragmatics can be found in, e.g., legal documents • General systems still at the research stage • Research: from morphology to pragmatics • What can be done with the results? • What is the role of human beings? • Where is human assistance most viable? • Machine learning (in particular, ILP) • Generalisation of propositions = ‘learning’ • Deductive reasoning • Detection of contradiction, simplification
