Strategies for Statistical Semantic Knowledge Discovery in Natural Language Processing

Tricks for Statistical Semantic Knowledge Discovery: A Selectionally Restricted Sample Marti A. Hearst, UC Berkeley

Goal: Acquire Semantic Information

Something on Finin

Tricks I Like • Unambiguous Cues • Lots o’ Text • Rewrite and Verify

Trick: Lots o’ Text • Idea: words in the same syntactic context are semantically related. • Hindle, ACL’90, “Noun classification from predicate-argument structure.”
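
As a rough illustration of this idea, the sketch below represents each noun by the verbs that take it as an object and compares nouns by the cosine of those context vectors. The toy pairs, the function names, and the choice of plain cosine over raw counts are assumptions made for illustration; they are not Hindle's actual similarity measure, and a real system would extract the pairs from parsed text.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy (verb, object-noun) pairs (assumption); a parser would supply these in practice.
PAIRS = [
    ("drink", "wine"), ("pour", "wine"),
    ("drink", "beer"), ("pour", "beer"),
    ("read", "book"), ("write", "book"),
    ("read", "paper"), ("write", "paper"),
]

def context_vectors(pairs):
    """Map each noun to a Counter of the verbs that take it as object."""
    vectors = defaultdict(Counter)
    for verb, noun in pairs:
        vectors[noun][verb] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = context_vectors(PAIRS)
print(cosine(vectors["wine"], vectors["beer"]))  # nouns sharing all their verbs
print(cosine(vectors["wine"], vectors["book"]))  # nouns sharing no verbs
```

Nouns that occur as objects of the same verbs score high; nouns with disjoint verb sets score zero.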

Trick: Lots o’ Text • Idea: words in the same syntactic context are semantically related. • Nakov & Hearst, ACL/HLT’08 “Solving Relational Similarity Problems Using the Web as a Corpus”

Trick: Lots o’ Text • Idea: bigger is better than smarter! • Banko & Brill ACL’01: “Scaling to Very, Very Large Corpora for Natural Language Disambiguation”

Trick: Lots o’ Text • Idea: apply web-scale n-grams to every problem imaginable. • Lapata & Keller, HLT/NAACL ’04: “Web as a Baseline: Evaluating the Performance of Unsupervised Web-Based Models for a Range of NLP Tasks” • Results relative to supervised models, reported as “= supervised” or “> supervised”, for: MT candidate selection • Noun compound bracketing • Article suggestion • Adjective ordering • Noun compound interpretation
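
One of the listed tasks, adjective ordering, can be sketched with nothing but n-gram counts: pick the order that is more frequent. The counts below are invented toy numbers standing in for web-scale frequencies, and `ngram_count` and `order_adjectives` are hypothetical helper names; this is a minimal sketch of the count-comparison idea, not the models evaluated in the paper.

```python
# Invented toy counts standing in for web-scale n-gram frequencies (assumption).
NGRAM_COUNTS = {
    ("big", "red"): 1200,
    ("red", "big"): 40,
    ("old", "wooden"): 900,
    ("wooden", "old"): 15,
}

def ngram_count(*words):
    """Look up a count; unseen n-grams count as zero."""
    return NGRAM_COUNTS.get(words, 0)

def order_adjectives(adj1, adj2):
    """Return the order with the higher bigram count (ties keep the input order)."""
    if ngram_count(adj2, adj1) > ngram_count(adj1, adj2):
        return (adj2, adj1)
    return (adj1, adj2)

print(order_adjectives("red", "big"))
print(order_adjectives("old", "wooden"))
```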

Limitation • Sometimes counts alone are too ambiguous. Solution • Bootstrap from unambiguous contexts.

Trick: Use Unambiguous Context • … to build statistics for ambiguous contexts. • Hindle & Rooth, ACL ’91, “Structural Ambiguity and Lexical Relations” • Example: PP attachment • Ambiguous: “I eat spaghetti with sauce.” • Bootstrap from unambiguous contexts: “Spaghetti with sauce is delicious.” “I eat with a fork.”
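
The bootstrapping step can be sketched as follows: count how often a preposition follows a verb or a noun in unambiguous contexts, then compare the two smoothed estimates for an ambiguous case. All counts are invented, the additive smoothing is an assumption, and the log-ratio here is only in the spirit of a lexical association score, not the estimation procedure from the paper.

```python
from collections import Counter
from math import log2

# Invented counts from unambiguous contexts (assumption).
# Verb attachment evidence, e.g. "I eat with a fork."
VERB_PREP = Counter({("eat", "with"): 3, ("put", "on"): 20})
VERB_TOTAL = Counter({"eat": 30, "put": 25})
# Noun attachment evidence, e.g. "Spaghetti with sauce is delicious."
NOUN_PREP = Counter({("spaghetti", "with"): 6, ("book", "on"): 2})
NOUN_TOTAL = Counter({"spaghetti": 20, "book": 40})

def attachment_score(verb, noun, prep, smooth=0.5):
    """Log-ratio of smoothed P(prep | verb) to P(prep | noun)."""
    p_verb = (VERB_PREP[(verb, prep)] + smooth) / (VERB_TOTAL[verb] + 1.0)
    p_noun = (NOUN_PREP[(noun, prep)] + smooth) / (NOUN_TOTAL[noun] + 1.0)
    return log2(p_verb / p_noun)

def attach(verb, noun, prep):
    """Positive score favors verb attachment, otherwise noun attachment."""
    return "verb" if attachment_score(verb, noun, prep) > 0 else "noun"

print(attach("eat", "spaghetti", "with"))
print(attach("put", "book", "on"))
```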

Trick: Use Unambiguous Context • … to identify semantic relations (lexico-syntactic contexts) • Hearst, COLING ’92, “Automatic Acquisition of Hyponyms from Large Text Corpora” Example: Hyponym Identification
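
A minimal sketch of one lexico-syntactic pattern, “X such as A, B and C”, is shown below. Noun phrases are approximated by single words and only this one pattern is handled; both simplifications, and the names `PATTERN` and `hyponym_pairs`, are assumptions for illustration. A realistic version would match noun-phrase chunks and several patterns.

```python
import re

# A "word" that is not the conjunction itself (keeps "and"/"or" out of the list items).
WORD = r"(?!(?:and|or)\b)\w+"
# Single pattern: "<hypernym> such as <w>, <w> and <w>" (single-word NPs only).
PATTERN = re.compile(
    rf"(\w+),? such as ((?:{WORD}, )*{WORD}(?:,? (?:and|or) {WORD})?)"
)

def hyponym_pairs(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1)
        items = re.split(r",?\s+(?:and|or)\s+|,\s*", match.group(2))
        pairs.extend((item, hypernym) for item in items if item)
    return pairs

print(hyponym_pairs("He plays instruments such as guitars, drums and pianos."))
```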

Combine Tricks 1 and 2

Trick: Use Unambiguous Contexts + Lots o’ Text • Combine lexico-syntactic patterns with occurrence counts. • Kozareva, Riloff, Hovy, HLT-ACL’08. “Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs”.

Trick: Use Unambiguous Contexts + Lots o’ Text • Combine (usually) unambiguous surface patterns with occurrence counts. • Nakov & Hearst, HLT/EMNLP’05 “Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution”. • Left dash • cell-cycle analysis → left • Possessive marker • brain’s stem cell → right • Parentheses • growth factor (beta) → left • Punctuation • health care, provider → left • Abbreviation • tum. necr. (TN) factor → right • Concatenation • healthcare reform → left
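
The surface cues above can be turned into a simple vote over text snippets. The sketch below covers only four of the listed cues (left dash, concatenation, punctuation, possessive), uses plain substring counting, and takes its snippets from a hypothetical list rather than web search results; it illustrates the voting idea and is not the paper's system.

```python
def surface_votes(w1, w2, w3, snippets):
    """Count surface cues for left [[w1 w2] w3] vs. right [w1 [w2 w3]] bracketing."""
    left_cues = [
        f"{w1}-{w2} {w3}",    # left dash
        f"{w1}{w2} {w3}",     # concatenation
        f"{w1} {w2}, {w3}",   # punctuation
    ]
    right_cues = [f"{w1}'s {w2} {w3}"]  # possessive marker
    votes = {"left": 0, "right": 0}
    for snippet in snippets:
        text = snippet.lower().replace("’", "'")  # normalize curly apostrophes
        votes["left"] += sum(text.count(cue) for cue in left_cues)
        votes["right"] += sum(text.count(cue) for cue in right_cues)
    return votes

def bracket(w1, w2, w3, snippets):
    """Return 'left', 'right', or None when the cues are tied or absent."""
    votes = surface_votes(w1, w2, w3, snippets)
    if votes["left"] == votes["right"]:
        return None
    return "left" if votes["left"] > votes["right"] else "right"

print(bracket("cell", "cycle", "analysis", ["Cell-cycle analysis was performed."]))
print(bracket("brain", "stem", "cell", ["The brain’s stem cell population grew."]))
```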

Trick: Use Unambiguous Contexts + Lots o’ Text • Identify a “protagonist” in each text to learn narrative structure • Chambers & Jurafsky, ACL’08 “Unsupervised Learning of Narrative Event Chains”.

Trick 3: Rewrite & Verify

Trick: Rewrite & Verify • Check if alternatives exist in text • Nakov & Hearst, HLT/EMNLP’05 “Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution”. • Example: NP bracketing • Prepositional • stem cells in the brain → right • stem cells from the brain → right • cells from the brain stem → left • Verbal • virus causing human immunodeficiency → left • pain associated with arthritis migraine → left • Copula • office building that is a skyscraper → right
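
The prepositional case can be sketched as a two-step procedure: rewrite the compound into paraphrases that each imply one bracketing, then verify which paraphrases occur in text. The preposition list, the fixed “the” determiner, the function names, and the in-memory corpus are assumptions for illustration; the original work queried the web and used more paraphrase types.

```python
PREPOSITIONS = ("in", "from", "of", "for")  # assumed small set

def paraphrases(w1, w2, w3):
    """Yield (paraphrase, supported bracketing) pairs for the compound w1 w2 w3."""
    for prep in PREPOSITIONS:
        yield f"{w2} {w3} {prep} the {w1}", "right"  # [w1 [w2 w3]]
        yield f"{w3} {prep} the {w1} {w2}", "left"   # [[w1 w2] w3]

def bracket_by_rewrite(w1, w2, w3, corpus):
    """Count paraphrase hits in the corpus; return the better-attested bracketing."""
    votes = {"left": 0, "right": 0}
    texts = [sentence.lower() for sentence in corpus]
    for phrase, side in paraphrases(w1, w2, w3):
        votes[side] += sum(text.count(phrase) for text in texts)
    if votes["left"] == votes["right"]:
        return None
    return "left" if votes["left"] > votes["right"] else "right"

corpus = [
    "Stem cells in the brain can divide.",
    "We isolated stem cells from the brain.",
]
print(bracket_by_rewrite("brain", "stem", "cells", corpus))
```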

Trick: Use Lexical Hierarchies • To improve generation of pseudo-words for WSD • Nakov & Hearst, HLT/NAACL’03, “Category-based Pseudo-Words” • To classify nouns in noun compounds and thus determine the semantic relations between them • Rosario, Hearst, & Fillmore, ACL’02, “Descent of Hierarchy and Selection in Relational Semantics” • To generate new (faceted) category systems • Stoica, Hearst, & Richardson, NAACL/HLT’07. “Automating Creation of Hierarchical Faceted Metadata Structures”

Example: Recipes (3500 docs)

Castanet Output (shown in Flamenco)

Castanet Output

Towards New Approaches to Semantic Analysis

Ideas • Inducing Semantic Grammars • Boggess, Agarwal, & Davis, AAAI’91, “Disambiguation of Prepositional Phrases in Automatically Labelled Technical Text”

Ideas • Use Cognitive Linguistics • Hearst, ’90, ’92, “Direction-Based Text Interpretation”. • Talmy’s Force Dynamics + Reddy’s Conduit Metaphor → Path Model • Solves: Was the person in favor of or opposed to the idea?

Using Cognitive Linguistics • Talmy’s Theory of Force Dynamics • Talmy, “Force Dynamics in Language and Thought,” in Parasession on Causatives and Agentivity, Chicago Linguistic Society 1985. • Describes how the interaction of agents with respect to force is lexically and grammatically expressed. • Posits two opposing entities: Agonist and Antagonist. • Each entity expresses an intrinsic force: towards rest or motion. • The balance of the strengths of the entities determines the outcome of the event. • Grammatical expression includes using a clause headed by “despite” to express a weaker antagonist.

Using Cognitive Linguistics • Reddy’s Conduit Metaphor • Reddy, “The Conduit Metaphor – A Case of Frame Conflict in Our Language about Language,” in Metaphor and Thought, Ortony (Ed), Cambridge University Press, 1979. • A thought is schematized as an object which is placed by the speaker into a container that is sent along a conduit. • The receiver at the other end is the listener, who removes the objectified thought from the container and thus possesses it. • Inferences that apply to conduits can be applied to communication. • “Your meaning did not come through.” • “I can’t put this thought into words.” • “She is sending you some kind of message with that remark.”

Using Cognitive Linguistics • Combine into the Path Model • Hearst, “Direction-based Text Interpretation as an Information Access Refinement,” in Text-based Intelligent Systems, Jacobs (Ed), Lawrence Erlbaum Associates, 1992. • If an agent favors an entity or event, that agent can be said to desire the existence or “well-being” of that entity, and vice-versa. • Thus if an agent favors an entity’s triumph in a force-dynamic interaction, then the agent favors that entity or event. • But: force dynamics does not have the expressive power for a sequence. • Instead of focusing on the relative strength of two interacting entities, the model should represent what happens to a single entity through the course of its encounters with other entities. • Thus the entity can be schematized as if it were moving along a path toward some destination or goal.

Using Cognitive Linguistics • The Path Model • Hearst, “Direction-based Text Interpretation as an Information Access Refinement,” in Text-based Intelligent Systems, Jacobs (Ed), Lawrence Erlbaum Associates, 1992.
