Hidden Markov Models for Information Extraction Recent Results and Current Projects Joseph Smarr & Huy Nguyen Advisor: Chris Manning
HMM Approach to IE • HMM states are associated with a semantic type • background-text, person-name, etc. • Constrained EM learns transitions and emissions • Viterbi alignment of a document marks tagged ranges of text with the same semantic type • Extract the range with the highest probability [Diagram: state numbers 2 3 4 5 6 2 aligned over the words "Speaker is Huy Nguyen this week"]
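A minimal sketch of this labeling step, assuming hypothetical start/transition/emission tables: Viterbi assigns each word the semantic type of its most likely state, so tagged ranges (e.g. the person-name span in "Speaker is Huy Nguyen this week") can be read directly off the label sequence.

```python
import numpy as np

def viterbi_label(words, states, start_p, trans_p, emit_p):
    """Label each word with the semantic type of its most likely HMM state.
    start_p: length-k priors, trans_p: k x k numpy array, emit_p: list of k
    dicts word -> prob (all hypothetical placeholders for this sketch)."""
    n, k = len(words), len(states)
    delta = np.full((n, k), -np.inf)   # log-prob of the best path ending in state j at word i
    back = np.zeros((n, k), dtype=int)
    for j in range(k):
        delta[0, j] = np.log(start_p[j]) + np.log(emit_p[j].get(words[0], 1e-12))
    for i in range(1, n):
        for j in range(k):
            scores = delta[i - 1] + np.log(trans_p[:, j])
            back[i, j] = int(np.argmax(scores))
            delta[i, j] = scores[back[i, j]] + np.log(emit_p[j].get(words[i], 1e-12))
    path = [int(np.argmax(delta[-1]))]  # backtrace the best state sequence
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return [states[s] for s in reversed(path)]
```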
Existing Work • Leek (97 [UCSD MS thesis]) • Early results, fixed structures • Freitag & McCallum (99, 00) • Grow complex structures
Limitations of Existing Work • Only one field extracted at a time • Relative position of fields is ignored • e.g. authors usually come before titles in citations • Similar-looking fields don't compete with each other • e.g. acquired company vs. purchasing company • Simple model of unknown words • Use <UNK> for all words seen fewer than N times • No separation of content and context • e.g. can't plug in generic date extractors, etc.
Current Research Goals • Flexibly train and combine extractors for multiple fields of information • Learn structures suited for individual fields • Can be recombined and reused with many HMMs • Learn intelligent context structures to link targets • Canonical ordering of fields • Common prefixes and suffixes • Construct merged HMM for actual extraction • Context/target split makes search problem tractable • Transitions between models are compiled out in merge
Current Research Goals • Richer models for handling unknown words • Estimate likelihood of novel words in each state • Featural decomposition for finer-grained probabilities • e.g. Nguyen → UNK[Capitalized, No-numbers] • Character-level models for higher precision • e.g. phone numbers, room numbers, dates, etc. • Conditional training to focus on extraction task • Classical joint estimation often wastes states modeling patterns in English background text • Conditional training is slower, but only rewards structure that increases labeling accuracy
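As an illustration of the featural-decomposition idea, a small sketch (the feature set here is only an example) that maps out-of-vocabulary tokens to feature-based UNK classes instead of a single <UNK> symbol:

```python
def unk_signature(token, vocab):
    """Map an out-of-vocabulary token to a feature-based UNK class
    (the two features shown are illustrative, not the actual feature set)."""
    if token in vocab:
        return token
    feats = []
    feats.append("Capitalized" if token[:1].isupper() else "Lowercase")
    feats.append("Has-numbers" if any(c.isdigit() for c in token) else "No-numbers")
    return "UNK[" + ", ".join(feats) + "]"

# e.g. unk_signature("Nguyen", vocab=set()) -> "UNK[Capitalized, No-numbers]"
```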
Learning Target Structures • Goal: Learn flexible structure tailored to composition of particular fields • Representation: Disjunction of multi-state chains • Learning method: • Collect and isolate all examples of the target field • Initialization: single state • Search operators (greedy search): • Extend current chain(s) • Start a new chain • Stopping criteria: MDL score
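A schematic sketch of this greedy search loop; the single-state initial model, the search operators, and the MDL scoring function are hypothetical callables, not the actual implementation:

```python
def learn_target_structure(examples, initial_model, operators, mdl_score):
    """Greedy structure search over disjunctions of state chains.
    `operators` propose candidate structures (extend a chain, start a new chain);
    the best-scoring candidate is kept; stop when no candidate improves the MDL score."""
    structure = initial_model                   # e.g. a single target state
    best = mdl_score(structure, examples)       # description length: lower is better
    while True:
        candidates = [c for op in operators for c in op(structure)]
        if not candidates:
            return structure
        scored = [(mdl_score(c, examples), c) for c in candidates]
        score, cand = min(scored, key=lambda sc: sc[0])
        if score >= best:                       # MDL stopping criterion
            return structure
        structure, best = cand, score
```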
Example Target HMM: dlramt [Diagram: a target HMM for the dlramt field, with chains of states between START and END whose emissions include amounts (13.5, 240, 100), magnitude/currency words (mln, billion, U.S., Canadian, dlrs, dollars, yen, pesos), and words such as undisclosed, withheld, amount]
Learning Context Structures • Goal: Learn structure to connect multiple target HMMs • Captures canonical ordering of fields • Identifies prefix and suffix patterns around targets • Initialization: • Background state connected to each target • Find minimum # words between each target type in corpus • Connect targets directly if distance is 0 • Add context state between targets if they’re close • Search operators (greedy search): • Add prefix/suffix between background and target • Lengthen an existing chain • Start a new chain (by splitting an existing one) • Stopping criteria: MDL score
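A small sketch of the initialization heuristic, under the simplifying assumptions that documents are given as per-word label sequences and that "close" means a gap of at most a few words (the threshold is illustrative):

```python
def min_gap_between(labels, a, b):
    """Minimum number of words between the end of an `a` segment and the start of the
    next `b` segment in one label sequence (None if the pair never occurs in order)."""
    best = None
    for i, lab in enumerate(labels):
        if lab == a and (i + 1 == len(labels) or labels[i + 1] != a):  # end of an a segment
            for j in range(i + 1, len(labels)):
                if labels[j] == b:
                    gap = j - i - 1
                    best = gap if best is None else min(best, gap)
                    break
    return best

def initial_link(min_gap, close_threshold=3):
    """How two target types are connected in the initial context HMM
    (the closeness threshold is an illustrative assumption)."""
    if min_gap == 0:
        return "direct transition between targets"
    if min_gap <= close_threshold:
        return "context state between targets"
    return "route through the background state"
```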
Example of Context HMM [Diagram: a context HMM connecting START and END through a Background state and the Purchaser and Acquired target states, with context words such as "The", "yesterday", "Reuters", "purchased", "acquired", "bought" emitted around the targets]
Merging Context and Targets • In context HMM, targets are collapsed into a single state that always emits "purchaser" etc. • Target HMMs have single START and END state • Glue target HMMs into place by "compiling out" start/end transitions and creating one big HMM • Challenge: create supportive structure without being overly restrictive • Too little structure: hard to find regularities • Too much structure: can't generate all docs
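A rough sketch of the gluing step, assuming transition tables are plain nested dicts, state names are globally unique, and the collapsed target node has no self-loop: probability mass that entered the collapsed node is redirected to the target's internal start states, and mass that reached the target's END is rerouted to that node's context successors.

```python
def splice_target(context_trans, target_trans, target_node,
                  target_start="START", target_end="END"):
    """'Compile out' a target HMM's START/END when gluing it into the context HMM.
    Transition tables are dicts {state: {next_state: prob}} (a simplifying assumption)."""
    merged = {s: dict(nxt) for s, nxt in context_trans.items()}
    entry = target_trans[target_start]            # P(first internal state | START)
    # Redirect context transitions that pointed at the collapsed target node
    for s, nxt in merged.items():
        if target_node in nxt:
            p_in = nxt.pop(target_node)
            for t, p in entry.items():
                nxt[t] = nxt.get(t, 0.0) + p_in * p
    exits = merged.pop(target_node, {})           # context successors of the target node
    # Copy target-internal transitions, rerouting mass on END to those successors
    for s, nxt in target_trans.items():
        if s == target_start:
            continue
        row = merged.setdefault(s, {})
        for t, p in nxt.items():
            if t == target_end:
                for u, q in exits.items():
                    row[u] = row.get(u, 0.0) + p * q
            else:
                row[t] = row.get(t, 0.0) + p
    return merged
```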
Example of Merging HMMs [Diagram: the context HMM with collapsed Purchaser and Acquired states between START and END, and the merged HMM after a target HMM has been spliced in by compiling out its START/END transitions]
Tricks and Optimizations • Mandatory end state • Allows explicit modeling of document end • Structural enhancements • Add transitions from start directly to targets • Add transitions from target/suffix directly to end • Allow “skip-ahead” transitions • Separation of core structure learning • Structure learning is performed on “skeleton” structure • Enhancements are added during parameter estimation • Keeps search tractable while exploiting rich transitions
Conditional Training • Observation: Joint HMMs waste states modeling patterns in background text • Improves document likelihood (like n-grams) • Doesn't improve labeling accuracy (can hurt it!) • Ideally focus on prefixes, suffixes, etc. only • Idea: Maximize conditional probability of labels P(labels|words) instead of P(labels, words) • Should only reward modeling helpful patterns • Can't use standard Baum-Welch training • Solution: use numerical optimization (conjugate gradient)
Potential of Conditional Training • Don't waste states modeling background patterns • Toy data model: ((abc)*(eTo))* [T is target] • e.g. abcabcabcabceToabcabceToabcabcabc • Modeling abc improves joint likelihood but provides no help for labeling targets [Diagram: two small HMMs, "Optimal Joint Model" vs. "Optimal Labeling Model", with states emitting a, b, c, e, o, and the target symbol T]
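For concreteness, a tiny generator for data of this shape (the block count and target rate are arbitrary assumptions):

```python
import random

def toy_sequence(n_blocks=12, p_target=0.3, seed=0):
    """Emit symbols in the spirit of ((abc)*(eTo))*: background 'a b c' blocks
    interleaved with occasional 'e T o' target blocks (parameters are illustrative)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_blocks):
        out += ["e", "T", "o"] if rng.random() < p_target else ["a", "b", "c"]
    return out

# e.g. "".join(toy_sequence()) might look like "abcabceToabcabcabceToabc..."
```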
Running Conditional Training • Gradient descent requires a differentiable function • Value: log P(c | w) = log P(c, w) − log P(w) (forward algorithm) • Deriv: ∂ log P(c | w) / ∂θ_k = (E[n_k | c, w] − E[n_k | w]) / θ_k (parameter expectations) • Likelihood and expectations are easily computed with existing HMM algorithms • Compute values with and without type constraints
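A minimal sketch of the objective computation, reusing the hypothetical tables from the Viterbi sketch plus a per-state type list: the clamped forward pass only allows states whose type matches the label at each position, and log P(labels | words) is the difference of the two forward log-probabilities (the gradient, not shown, follows from the corresponding parameter expectations).

```python
import numpy as np

def forward_logprob(words, labels, states, types, start_p, trans_p, emit_p, constrain):
    """Forward algorithm in log space. If `constrain` is True, only states whose
    semantic type matches the label at each position are allowed (the 'clamped' pass).
    All model tables are hypothetical placeholders, as in the Viterbi sketch."""
    k = len(states)
    def allowed(i):
        return [j for j in range(k) if not constrain or types[j] == labels[i]]
    alpha = np.full(k, -np.inf)
    for j in allowed(0):
        alpha[j] = np.log(start_p[j]) + np.log(emit_p[j].get(words[0], 1e-12))
    for i in range(1, len(words)):
        new = np.full(k, -np.inf)
        for j in allowed(i):
            new[j] = (np.logaddexp.reduce(alpha + np.log(trans_p[:, j]))
                      + np.log(emit_p[j].get(words[i], 1e-12)))
        alpha = new
    return np.logaddexp.reduce(alpha)

def conditional_loglik(words, labels, *hmm):
    # log P(labels | words) = log P(labels, words) - log P(words)
    return (forward_logprob(words, labels, *hmm, constrain=True)
            - forward_logprob(words, labels, *hmm, constrain=False))
```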
Challenges for Cond. Training • Need additional constraint to keep numbers small • Can’t guarantee you’ll get a probability distribution • But it’s ok if you’re just summing and multiplying! • Solution: sum of all params must equal a constant • Need to fix parameter space ahead of time • Can’t add states, new words, etc. • Solution: start with large ergodic model in which all states emit entire vocabulary (use UNK tokens) • Need sensible initialization • Uniform structure has high variance • Fixed structure usually dictates training
Results on Toy Data Set • Results on (([ae][bt][co])*(eto))* • Contains spurious prefix/target/suffix-like symbols • Joint training always labels every t • Conditional training eventually gets it perfectly
Current and Future Work • Richer search operators for structure learning • Richer models of unknown words (char-level) • Reduce variance of conditional training • Build reusable repository of target HMMs • Integrate with larger IE framework(s) • Semantic Web / KAON • LTG • Applications • Semi-automatic ontology markup for web pages • Smart email processing