This presentation by Tim Chartrand discusses a paper by Mary Elaine Califf and Raymond J. Mooney on Information Extraction (IE) using RAPIER (Robust Automated Production of Information Extraction Rules). The RAPIER method automates rule learning from a corpus of documents and filled templates, producing rules without prior parsing or heavy syntactic processing. It specializes in extracting relevant information from natural language text, offering comparable performance to existing systems while simplifying the rule creation process. The implications for various domains and results from job postings are also explored.
Relational Learning of Pattern-Match Rules for Information Extraction
Presentation by Tim Chartrand of a paper by Mary Elaine Califf and Raymond J. Mooney
Introduction
• Information Extraction (IE) is the task of locating specific pieces of information in NL text
• IE is an important subpart of text understanding
• IE systems are difficult and time-consuming to build, and they don't port well to different domains
• Researchers are combining learning methods with NLP methods to automate IE
Overview of RAPIER
• RAPIER – Robust Automated Production of Information Extraction Rules
• Learns IE rules automatically
• Uses a corpus of documents paired with filled templates (see the sketch below)
• Resulting rules do not require prior parsing or subsequent processing
• Uses limited syntactic information from a POS tagger
• Induced patterns incorporate semantic classes
• Rules characterize slot fillers and their context
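A minimal sketch of the kind of supervision this implies: raw documents paired with filled templates that map slot names to the strings to be extracted. This is illustrative only; the slot names and example posting below are assumptions, not the paper's actual data format.

# Hypothetical training pair: a job-posting document plus its filled template.
training_corpus = [
    {
        "document": (
            "Software engineer position. Located in Atlanta, Georgia. "
            "Salary up to $65K. Requires 3 years of Java experience."
        ),
        "template": {          # slot name -> list of filler strings
            "city": ["Atlanta"],
            "state": ["Georgia"],
            "salary": ["$65K"],
            "language": ["Java"],
        },
    },
    # ... one entry per annotated document
]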
RAPIER Rules
• Consist of three parts (see the sketch after this list):
  • Pre-filler pattern – matches text immediately preceding the extracted information
  • Filler pattern – matches the exact text to be extracted
  • Post-filler pattern – matches text immediately following the extracted information
• Each pattern is a sequence of pattern items or pattern lists
  • A pattern item specifies constraints for exactly one word or symbol
  • A pattern list specifies constraints for 0..n words or symbols
• Constraints include:
  • A list of words, one of which must match the item
  • A POS tag
  • A semantic class
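The representation can be pictured in code. The following Python sketch is my own rendering of the rule structure described above; the class and field names (Rule, PatternElement, sem_class, max_length) are assumptions, and the token-matching logic is much simpler than RAPIER's actual pattern matcher.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PatternElement:
    """A pattern item (max_length == 1) or a pattern list (matches 0..max_length tokens)."""
    words: Optional[List[str]] = None      # if given, the token's word must be one of these
    pos_tags: Optional[List[str]] = None   # if given, the token's POS tag must be one of these
    sem_class: Optional[str] = None        # if given, the token must belong to this semantic class
    max_length: int = 1                    # 1 = pattern item; >1 = pattern list

    def matches_token(self, word: str, pos: str, sem: Optional[str]) -> bool:
        if self.words is not None and word.lower() not in self.words:
            return False
        if self.pos_tags is not None and pos not in self.pos_tags:
            return False
        if self.sem_class is not None and sem != self.sem_class:
            return False
        return True

@dataclass
class Rule:
    slot: str
    pre_filler: List[PatternElement] = field(default_factory=list)
    filler: List[PatternElement] = field(default_factory=list)
    post_filler: List[PatternElement] = field(default_factory=list)

# Example: a rule extracting a "state" filler that is a single proper noun in a
# hypothetical "us_state" semantic class, preceded by a comma and followed by a period.
state_rule = Rule(
    slot="state",
    pre_filler=[PatternElement(words=[","])],
    filler=[PatternElement(pos_tags=["NNP"], sem_class="us_state")],
    post_filler=[PatternElement(words=["."])],
)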
Learning Algorithm

Example phrases: "located in Atlanta, Georgia." and "offices in Kansas City, Missouri."

For each slot S in the template being learned
    SlotRules = most specific rules from the documents for S
    while compression has failed fewer than lim times
        randomly select r pairs of rules from SlotRules
        find the set L of generalizations of the fillers of the rule pairs
        create rules from L, evaluate, and initialize RuleList
        let n = 0
        while the best rule in RuleList produces spurious fillers
              and the weighted information value of the best rule is improving
            increment n
            specialize each rule in RuleList with generalizations of the last n
                items of the pre-filler patterns of the rule pair and add the
                specializations to RuleList
            specialize each rule in RuleList with generalizations of the first n
                items of the post-filler patterns of the rule pair and add the
                specializations to RuleList
        if the best rule in RuleList produces only valid fillers
            add it to SlotRules
            remove empirically subsumed rules
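The core of the loop above is generalizing pairs of rules, starting from their fillers. The sketch below (reusing the PatternElement class from the rule sketch earlier) shows one simplified way two pattern elements could be merged; it is an assumption-laden illustration, since RAPIER actually produces several alternative generalizations (for example, both the union of two word lists and a version with the word constraint dropped), not a single one.

def generalize_elements(a: PatternElement, b: PatternElement) -> PatternElement:
    """Merge two elements into a single element that covers both (simplified)."""
    def merge(x, y):
        # Union of the allowed values; None (unconstrained) absorbs everything.
        if x is None or y is None:
            return None
        return sorted(set(x) | set(y))

    return PatternElement(
        words=merge(a.words, b.words),
        pos_tags=merge(a.pos_tags, b.pos_tags),
        sem_class=a.sem_class if a.sem_class == b.sem_class else None,
        max_length=max(a.max_length, b.max_length),
    )

# Generalizing two single-word fillers such as "Atlanta" and "Georgia" (both NNP)
# would give an element allowing either word, still constrained to proper nouns.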
Experimental Results
• The task: extract information from computer-related job postings
• 17 slots used, including employer, salary, etc.
• Results do not employ semantic categories
• 100-document dataset with filled templates, evaluated with 10-fold cross-validation
• Measured precision, recall, and F-measure (see the sketch below)
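For reference, a minimal sketch of how these metrics can be computed for a single slot, assuming extracted and reference fillers are compared as plain string sets; the paper's exact scoring protocol may differ in detail.

def precision_recall_f(predicted: set, gold: set):
    """Precision, recall, and F-measure for one slot's extracted fillers."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return precision, recall, f_measure

# e.g. two correct fillers out of two predicted, with one gold filler missed:
print(precision_recall_f({"$65K", "Java"}, {"$65K", "Java", "Atlanta"}))
# -> (1.0, 0.666..., 0.8)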
Experimental Results – continued
• Performance:
  • Is comparable to Crystal on a medical domain
  • Is better than AutoSlog and AutoSlog-TS on the MUC-4 terrorism task
  • Is hard to compare directly because the systems were tested on different domains
  • Is good, given that precision is the most important measure for this task
Related Work
• Resolve
  • Uses decision trees
  • Uses annotated coreference examples
• Crystal
  • Uses a clustering algorithm to build a dictionary of extraction patterns
  • Requires patterns identified by an expert
  • Requires prior syntax analysis to identify syntactic elements and their relationships
• AutoSlog
  • Specializes a set of general syntactic patterns
  • An expert must examine the patterns it produces
  • Requires prior syntax analysis
• LIEP
  • Requires prior syntax analysis
  • Makes no real use of semantic information
  • Has not been applied to complex domains
Related Work – BYU DEG
• RAPIER rules correspond closely to DEG data frames
  • Data frames are finer-grained, based on character patterns, whereas RAPIER rules are based on word patterns
• Pre-filler and post-filler patterns correspond closely to data frame contexts and keywords
• Semantic categories correspond closely with lexicons
• It is not mentioned how RAPIER handles multiple-record documents
• The RAPIER data structure is given by the template (slots) defined in the input data
• RAPIER is very similar in purpose to what Joe is trying to do – learn extraction rules based on a filled-in form
Conclusions
• Extracting desired pieces of information from NL text is important
• Manually constructing IE systems is too hard
• RAPIER uses relational learning to build a set of pattern-match rules given a database of texts and filled templates
• Learned patterns employ syntactic and semantic information to match slot fillers and their context
• Fairly accurate results can be obtained for a real-world problem with relatively small datasets
• RAPIER compares favorably with other IE learning systems