This project explores the use of advanced Hidden Markov Models (HMMs) to improve information extraction (IE) from text documents. HMM states correspond to semantic types such as person names and background text. We address limitations of existing approaches by developing more flexible extraction techniques capable of handling multiple fields simultaneously while considering contextual information. Our research focuses on optimizing transitions, emissions, and the handling of unknown words through conditional training. The aim is to achieve higher accuracy in extracting structured information tailored to varying contexts.
Hidden Markov Models for Information Extraction: Recent Results and Current Projects
Joseph Smarr & Huy Nguyen
Advisor: Chris Manning
HMM Approach to IE
• HMM states are associated with a semantic type
  • background-text, person-name, etc.
• Constrained EM learns transitions and emissions
• Viterbi alignment of a document marks tagged ranges of text with the same semantic type (a decoding sketch follows below)
• Extract the range with the highest probability
[Diagram: state sequence 2 3 4 5 6 2 aligned with "Speaker is Huy Nguyen this week"]
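To make the decoding step concrete, here is a minimal sketch (not from the slides) of Viterbi alignment over a toy HMM whose states are semantic types; the state names, vocabulary, and all probabilities are made up for illustration.

```python
import numpy as np

# Toy HMM: each state is a semantic type; every number below is illustrative only.
states = ["background", "prefix", "person-name", "suffix"]
vocab = ["Speaker", "is", "Huy", "Nguyen", "this", "week"]
word_idx = {w: i for i, w in enumerate(vocab)}

start = np.array([0.70, 0.30, 0.00, 0.00])            # P(first state)
trans = np.array([                                     # P(next state | state)
    [0.80, 0.20, 0.00, 0.00],
    [0.00, 0.20, 0.80, 0.00],
    [0.00, 0.00, 0.50, 0.50],
    [0.90, 0.10, 0.00, 0.00],
])
emit = np.array([                                      # P(word | state)
    [0.05, 0.25, 0.05, 0.05, 0.30, 0.30],
    [0.80, 0.15, 0.01, 0.01, 0.02, 0.01],
    [0.01, 0.01, 0.49, 0.47, 0.01, 0.01],
    [0.05, 0.45, 0.02, 0.02, 0.23, 0.23],
])

def viterbi(words):
    """Return the most likely semantic-state sequence for a word sequence."""
    obs = [word_idx[w] for w in words]
    n, k = len(obs), len(states)
    delta = np.zeros((n, k))            # best log-prob of any path ending in state j at time t
    back = np.zeros((n, k), dtype=int)  # best predecessor state
    delta[0] = np.log(start + 1e-12) + np.log(emit[:, obs[0]] + 1e-12)
    for t in range(1, n):
        scores = delta[t - 1][:, None] + np.log(trans + 1e-12)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(emit[:, obs[t]] + 1e-12)
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi(["Speaker", "is", "Huy", "Nguyen", "this", "week"]))
# Tokens tagged "person-name" form the extracted field.
```

With these toy numbers the "Huy Nguyen" tokens come out tagged person-name, mirroring the example alignment above.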
Existing Work
• Leek (1997; UCSD MS thesis)
  • Early results, fixed structures
• Freitag & McCallum (1999, 2000)
  • Grow complex structures
Limitations of Existing Work
• Only one field extracted at a time
• Relative position of fields is ignored
  • e.g. authors usually come before titles in citations
• Similar-looking fields aren't competed for
  • e.g. acquired company vs. purchasing company
• Simple model of unknown words
  • Use <UNK> for all words seen less than N times
• No separation of content and context
  • e.g. can't plug in generic date extractors, etc.
Current Research Goals
• Flexibly train and combine extractors for multiple fields of information
• Learn structures suited for individual fields
  • Can be recombined and reused with many HMMs
• Learn intelligent context structures to link targets
  • Canonical ordering of fields
  • Common prefixes and suffixes
• Construct merged HMM for actual extraction
  • Context/target split makes the search problem tractable
  • Transitions between models are compiled out in the merge
Current Research Goals
• Richer models for handling unknown words (see the sketch below)
  • Estimate likelihood of novel words in each state
  • Featural decomposition for finer-grained probabilities
    • e.g. Nguyen → UNK[Capitalized, No-numbers]
  • Character-level models for higher precision
    • e.g. phone numbers, room numbers, dates, etc.
• Conditional training to focus on the extraction task
  • Classical joint estimation often wastes states modeling patterns in English background text
  • Conditional training is slower, but only rewards structure that increases labeling accuracy
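A minimal sketch of the featural decomposition idea for unknown words; the particular feature set and the unk_token helper are our own, with only the Capitalized / No-numbers pairing taken from the slide's example.

```python
import re

def unk_token(word):
    """Map an out-of-vocabulary word to a feature-based UNK symbol.

    The feature inventory here (capitalization, digits, numeric patterns) is
    illustrative, not the authors' actual feature set.
    """
    feats = []
    feats.append("Capitalized" if word[:1].isupper() else "Lowercase")
    feats.append("Has-numbers" if re.search(r"\d", word) else "No-numbers")
    if re.fullmatch(r"[\d\-\.\(\)/ ]+", word):
        feats.append("Numeric-pattern")   # e.g. phone numbers, room numbers, dates
    return "UNK[" + ", ".join(feats) + "]"

print(unk_token("Nguyen"))   # UNK[Capitalized, No-numbers]
print(unk_token("650-723"))  # UNK[Lowercase, Has-numbers, Numeric-pattern]
```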
Learning Target Structures
• Goal: learn flexible structure tailored to the composition of a particular field
• Representation: disjunction of multi-state chains
• Learning method (a sketch of the greedy loop follows below):
  • Collect and isolate all examples of the target field
  • Initialization: single state
  • Search operators (greedy search):
    • Extend current chain(s)
    • Start a new chain
  • Stopping criteria: MDL score
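The greedy search could be organized as in the following sketch; learn_target_structure and its helpers (train, mdl_score, extend_chain, add_chain) are hypothetical placeholders for EM training, MDL scoring, and the two search operators named on the slide.

```python
def learn_target_structure(field_examples, train, mdl_score, extend_chain, add_chain):
    """Greedy structure search over disjunctions of multi-state chains.

    Assumed helpers: `train` runs EM on a candidate structure, `mdl_score`
    returns its MDL score (lower is better), and the two operators implement
    "extend current chain(s)" and "start a new chain".
    """
    model = {"chains": [1]}              # initialization: one chain with a single state
    best = mdl_score(train(model, field_examples), field_examples)
    while True:
        candidates = [extend_chain(model, i) for i in range(len(model["chains"]))]
        candidates.append(add_chain(model))
        scored = [(mdl_score(train(c, field_examples), field_examples), c)
                  for c in candidates]
        score, winner = min(scored, key=lambda pair: pair[0])
        if score >= best:                # stopping criterion: MDL no longer improves
            return model
        best, model = score, winner
```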
Example Target HMM: dlramt
[Diagram: learned target HMM for the dlramt (dollar amount) field; between START and END, one chain emits numbers (13.5, 240, 100) followed by currency and magnitude words (mln, billion, U.S., Canadian, dlrs, dollars, yen, pesos), and another chain covers phrases like "undisclosed" / "amount withheld".]
Learning Context Structures
• Goal: learn structure to connect multiple target HMMs
  • Captures canonical ordering of fields
  • Identifies prefix and suffix patterns around targets
• Initialization (see the sketch below):
  • Background state connected to each target
  • Find minimum # of words between each target type in the corpus
  • Connect targets directly if the distance is 0
  • Add a context state between targets if they're close
• Search operators (greedy search):
  • Add prefix/suffix between background and target
  • Lengthen an existing chain
  • Start a new chain (by splitting an existing one)
• Stopping criteria: MDL score
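A sketch of that initialization heuristic: measure the minimum word gap between consecutive target spans in the labeled corpus, then connect each pair of target types directly, through a new context state, or only via background. The close_threshold cutoff and the data layout are assumptions; the slide only says "if they're close".

```python
from collections import defaultdict

def init_context_links(docs, close_threshold=4):
    """Decide how target types are linked in the initial context HMM.

    `docs` is a list of token sequences given as ("word", field) pairs, with
    field None for background text. Returns, per ordered pair of fields, how
    they should be connected in the initial structure.
    """
    min_gap = defaultdict(lambda: float("inf"))
    for doc in docs:
        spans = []                       # [field, start, end] for each labeled span
        for i, (word, field) in enumerate(doc):
            if field and (not spans or spans[-1][0] != field or spans[-1][2] != i):
                spans.append([field, i, i + 1])
            elif field:
                spans[-1][2] = i + 1
        for (f1, _, e1), (f2, s2, _) in zip(spans, spans[1:]):
            min_gap[(f1, f2)] = min(min_gap[(f1, f2)], s2 - e1)

    links = {}
    for pair, gap in min_gap.items():
        if gap == 0:
            links[pair] = "direct"         # targets can be adjacent
        elif gap <= close_threshold:
            links[pair] = "context-state"  # add a context state between them
        else:
            links[pair] = "background"     # connect only through the background state
    return links
```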
Example of Context HMM
[Diagram: context HMM with START and END states, a Background state (emitting e.g. "The", "yesterday", "Reuters"), target states Purchaser and Acquired, and a Context state between the targets emitting words such as "purchased", "acquired", "bought".]
Merging Context and Targets
• In the context HMM, targets are collapsed into a single state that always emits "purchaser", etc.
• Target HMMs have a single START and END state
• Glue target HMMs into place by "compiling out" start/end transitions and creating one big HMM (see the sketch below)
• Challenge: create supportive structure without being overly restrictive
  • Too little structure: hard to find regularities
  • Too much structure: can't generate all docs
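One way to picture the compiling-out step: entering the collapsed target state in the context HMM becomes entering whatever the target's START pointed to, and leaving through the target's END becomes following the collapsed state's outgoing context transitions. The matrix layout and function below are our own sketch, not the authors' code.

```python
import numpy as np

def splice_target(ctx_trans, target_state, tgt_trans, tgt_start, tgt_end):
    """Compile a target HMM into a context HMM by removing its START/END states.

    ctx_trans:   (C, C) context transition matrix; `target_state` is the index of
                 the collapsed placeholder state for this target.
    tgt_trans:   (T, T) target transition matrix, including its START/END rows.
    tgt_start, tgt_end: indices of the target's START and END states.
    Returns one merged transition matrix (context states first, then target states).
    """
    keep = [i for i in range(tgt_trans.shape[0]) if i not in (tgt_start, tgt_end)]
    ctx_keep = [i for i in range(ctx_trans.shape[0]) if i != target_state]
    C, T = len(ctx_keep), len(keep)
    merged = np.zeros((C + T, C + T))

    merged[:C, :C] = ctx_trans[np.ix_(ctx_keep, ctx_keep)]
    # Context -> target: entering the placeholder now means entering the states
    # that the target's START state pointed to.
    merged[:C, C:] = np.outer(ctx_trans[ctx_keep, target_state],
                              tgt_trans[tgt_start, keep])
    # Inside the target: internal transitions are unchanged.
    merged[C:, C:] = tgt_trans[np.ix_(keep, keep)]
    # Target -> context: leaving through END is rerouted to wherever the
    # placeholder state transitioned in the context HMM.
    merged[C:, :C] = np.outer(tgt_trans[keep, tgt_end],
                              ctx_trans[target_state, ctx_keep])
    return merged
```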
Example of Merging HMMs
[Diagram: the context HMM (Background, Purchaser, Context, Acquired between START and END) and a target HMM with its own START and END are combined into a single merged HMM; the target's START/END transitions are compiled out so its internal states take the place of the collapsed Purchaser state.]
Tricks and Optimizations
• Mandatory end state
  • Allows explicit modeling of document end
• Structural enhancements
  • Add transitions from start directly to targets
  • Add transitions from target/suffix directly to end
  • Allow "skip-ahead" transitions
• Separation of core structure learning
  • Structure learning is performed on a "skeleton" structure
  • Enhancements are added during parameter estimation
  • Keeps search tractable while exploiting rich transitions
Conditional Training
• Observation: joint HMMs waste states modeling patterns in background text
  • Improves document likelihood (like n-grams)
  • Doesn't improve labeling accuracy (can hurt it!)
  • Ideally, focus on prefixes, suffixes, etc. only
• Idea: maximize the conditional probability of labels, P(labels | words), instead of P(labels, words) (written out below)
  • Should only reward modeling helpful patterns
• Can't use standard Baum-Welch training
  • Solution: use numerical optimization (conjugate gradient)
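Written out (our notation, not taken from the slides), the quantity being maximized is the conditional log-likelihood; both sums can be computed with the forward algorithm, once restricted to state paths that agree with the training labels and once unrestricted:

```latex
\[
  \mathcal{L}(\theta)
  \;=\; \log P_\theta(\text{labels} \mid \text{words})
  \;=\; \log \!\!\sum_{\substack{\text{paths consistent}\\ \text{with labels}}}\!\! P_\theta(\text{path}, \text{words})
  \;-\; \log \sum_{\text{all paths}} P_\theta(\text{path}, \text{words})
\]
```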
Potential of Conditional Training
• Don't waste states modeling background patterns
• Toy data model: ((abc)*(eTo))* [T is the target]
  • e.g. abcabcabcabceToabcabceToabcabcabc
• Modeling abc improves joint likelihood but provides no help for labeling targets
[Diagrams: "Optimal Joint Model" and "Optimal Labeling Model" toy HMMs; state emission labels include b, a|b|c, a|o, c|e, o, e, T]
Running Conditional Training
• Gradient descent requires a differentiable function
  • Value: the conditional log-likelihood, log P(labels, words) − log P(words), computed with the forward algorithm
  • Derivative: built from parameter expectations
• Likelihood and expectations are easily computed with existing HMM algorithms
• Compute values with and without type constraints (a sketch follows below)
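A compact sketch of computing that value with two forward passes, one constrained to label-consistent states and one unconstrained; the array layout and masking scheme are our own assumptions, not the authors' implementation.

```python
import numpy as np

def forward_logprob(obs, start, trans, emit, allowed=None):
    """Log-likelihood of an observation sequence via the forward algorithm.

    `allowed[t]` is an optional boolean mask of states consistent with the
    label at position t; passing it yields the label-constrained likelihood,
    omitting it yields the unconstrained likelihood.
    """
    alpha = np.log(start) + np.log(emit[:, obs[0]])
    if allowed is not None:
        alpha = np.where(allowed[0], alpha, -np.inf)
    for t in range(1, len(obs)):
        alpha = (np.logaddexp.reduce(alpha[:, None] + np.log(trans), axis=0)
                 + np.log(emit[:, obs[t]]))
        if allowed is not None:
            alpha = np.where(allowed[t], alpha, -np.inf)
    return np.logaddexp.reduce(alpha)

def conditional_log_likelihood(obs, labels_mask, start, trans, emit):
    # log P(labels | words) = log P(labels, words) - log P(words)
    return (forward_logprob(obs, start, trans, emit, allowed=labels_mask)
            - forward_logprob(obs, start, trans, emit))
```

The derivative can then be assembled from the parameter expectations produced by the same constrained and unconstrained runs, as the slide indicates.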
Challenges for Conditional Training
• Need an additional constraint to keep numbers small
  • Can't guarantee you'll get a probability distribution
  • But it's OK if you're just summing and multiplying!
  • Solution: the sum of all parameters must equal a constant
• Need to fix the parameter space ahead of time
  • Can't add states, new words, etc.
  • Solution: start with a large ergodic model in which all states emit the entire vocabulary (use UNK tokens); see the sketch below
• Need sensible initialization
  • Uniform structure has high variance
  • Fixed structure usually dictates training
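A small sketch of that starting point: a fully connected (ergodic) model over the UNK-mapped vocabulary, with positive weights rescaled so that all parameters sum to a fixed constant. The total, jitter, and seed defaults are made-up values for illustration.

```python
import numpy as np

def init_ergodic(n_states, vocab_size, total=1000.0, jitter=0.05, seed=0):
    """Ergodic initialization for conditional training (a sketch).

    Every state can transition to every state and emit every vocabulary item
    (unknown words should be mapped to UNK tokens beforehand). Parameters are
    positive weights, not probability distributions; their overall sum is
    fixed to `total` as the constraint that keeps the numbers small.
    """
    rng = np.random.default_rng(seed)
    trans = 1.0 + jitter * rng.random((n_states, n_states))   # fully connected
    emit = 1.0 + jitter * rng.random((n_states, vocab_size))  # full vocabulary
    scale = total / (trans.sum() + emit.sum())                # sum-to-constant constraint
    return trans * scale, emit * scale
```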
Results on Toy Data Set
• Results on (([ae][bt][co])*(eto))*
  • Contains spurious prefix/target/suffix-like symbols
• Joint training always labels every t
• Conditional training eventually gets it perfectly
Current and Future Work
• Richer search operators for structure learning
• Richer models of unknown words (character-level)
• Reduce variance of conditional training
• Build a reusable repository of target HMMs
• Integrate with larger IE framework(s)
  • Semantic Web / KAON
  • LTG
• Applications
  • Semi-automatic ontology markup for web pages
  • Smart email processing