Information Extraction, Data Mining and Joint Inference

Andrew McCallum, Computer Science Department, University of Massachusetts Amherst. Joint work with Charles Sutton, Aron Culotta, Xuerui Wang, Ben Wellner, David Mimno, and Gideon Mann.

Presentation Transcript


  1. Information Extraction, Data Mining and Joint Inference. Andrew McCallum, Computer Science Department, University of Massachusetts Amherst. Joint work with Charles Sutton, Aron Culotta, Xuerui Wang, Ben Wellner, David Mimno, Gideon Mann.

  2. Goal: Mine actionable knowledge from unstructured text.

  3. Extracting Job Openings from the Web. Example record foodscience.com-Job2:
     JobTitle: Ice Cream Guru
     Employer: foodscience.com
     JobCategory: Travel/Hospitality
     JobFunction: Food Services
     JobLocation: Upper Midwest
     Contact Phone: 800-488-2611
     DateExtracted: January 8, 2001
     Source: www.foodscience.com/jobs_midwest.html
     OtherCompanyJobs: foodscience.com-Job1

  4. A Portal for Job Openings

  5. Job Openings: Category = High Tech Keyword = Java Location = U.S.

  6. Data Mining the Extracted Job Information

  7. IE from Research Papers [McCallum et al. '99]

  8. IE from Research Papers

  9. Mining Research Papers [Rosen-Zvi, Griffiths, Steyvers, Smyth, 2004] [Giles et al]

  10. IE from Chinese Documents regarding Weather. Department of Terrestrial System, Chinese Academy of Sciences. 200k+ documents, several millennia old:
      - Qing Dynasty Archives
      - memos
      - newspaper articles
      - diaries

  11. What is “Information Extraction”? As a family of techniques: Information Extraction = segmentation + classification + association + clustering.
      Example text: October 14, 2002, 4:00 a.m. PT. For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…
      Extracted segments: Microsoft Corporation; CEO; Bill Gates; Microsoft; Gates; Microsoft; Bill Veghte; Microsoft; VP; Richard Stallman; founder; Free Software Foundation

  14. What is “Information Extraction”? As a family of techniques: Information Extraction = segmentation + classification + association + clustering. Same example text and extracted segments as slide 11. After association and clustering, the segments form records:
      NAME              TITLE    ORGANIZATION
      Bill Gates        CEO      Microsoft
      Bill Veghte       VP       Microsoft
      Richard Stallman  founder  Free Soft..

  15. From Text to Actionable Knowledge. Pipeline: Document collection → Spider → Filter → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining (Discover patterns: entity types, links / relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support).

  16. Solution: Uncertainty Info, Emerging Patterns. [Pipeline diagram as on slide 15, with the links between IE and Data Mining annotated "Uncertainty Info" and "Emerging Patterns".]

  17. Solution: Unified Model. Pipeline: Document collection → Spider → Filter → Probabilistic Model, covering IE (Segment, Classify, Associate, Cluster) and Data Mining (Discover patterns: entity types, links / relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support).
      Discriminatively-trained undirected graphical models:
      • Conditional Random Fields [Lafferty, McCallum, Pereira]
      • Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…]
      Complex Inference and Learning: just what we researchers like to sink our teeth into!

  18. Scientific Questions
      • What model structures will capture salient dependencies?
      • Will joint inference actually improve accuracy?
      • How to do inference in these large graphical models?
      • How to do parameter estimation efficiently in these models, which are built from multiple large components?
      • How to do structure discovery in these models?

  20. Outline
      • Examples of IE and Data Mining
      • Motivate Joint Inference
      • Brief introduction to Conditional Random Fields
      • Joint inference: Examples
        • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
        • Joint Co-reference Resolution (Graph Partitioning)
        • Joint Co-reference with Weighted 1st-order Logic (MCMC)
        • Joint Relation Extraction and Data Mining (Bootstrapping)
      • Ultimate application area: Rexa, a Web portal for researchers

  21. (Linear Chain) Conditional Random Fields [Lafferty, McCallum, Pereira 2001]. Undirected graphical model, trained to maximize conditional probability of output (sequence) given input (sequence).
      [Figure: finite state model and graphical model. Output sequence y_{t-1}, y_t, y_{t+1}, y_{t+2}, y_{t+3} (FSM states): OTHER PERSON OTHER ORG TITLE … Input sequence x_{t-1}, x_t, x_{t+1}, x_{t+2}, x_{t+3} (observations): said Jones a Microsoft VP …]
      Wide-spread interest, positive experimental results in many applications:
      • Noun phrase, Named entity [HLT'03], [CoNLL'03]
      • Protein structure prediction [ICML'04]
      • IE from Bioinformatics text [Bioinformatics '04], …
      • Asian word segmentation [COLING'04], [ACL'04]
      • IE from Research papers [HLT'04]
      • Object classification in images [CVPR '04]
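
The definition on slide 21 (maximize the conditional probability of a label sequence given an input sequence) can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the label and token names come from the slide, while the `EMIT` and `TRANS` scores and all function names are hypothetical. It computes log p(y | x) as the score of one label sequence minus log Z(x), with Z(x) obtained by the forward algorithm and cross-checked by brute-force enumeration.

```python
import itertools
import math

LABELS = ["OTHER", "PERSON", "ORG", "TITLE"]        # label names from the slide
TOKENS = ["said", "Jones", "a", "Microsoft", "VP"]  # input sequence from the slide

# Hypothetical scores, not from the talk. EMIT[t][k] scores label k at token t;
# TRANS[i][j] scores moving from label i to label j.
EMIT = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.5, 0.5, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 2.5, 0.0],
    [0.5, 0.0, 0.0, 2.0],
]
TRANS = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sequence_score(emit, trans, labels):
    """Unnormalized log-score of one label sequence (label indices)."""
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score


def log_partition(emit, trans):
    """log Z(x) by the forward algorithm, O(T * K^2)."""
    k = len(trans)
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][j] + logsumexp([alpha[i] + trans[i][j] for i in range(k)])
            for j in range(k)
        ]
    return logsumexp(alpha)


def brute_force_log_partition(emit, trans):
    """log Z(x) by enumerating all K^T label sequences (tiny inputs only)."""
    k = len(trans)
    return logsumexp([
        sequence_score(emit, trans, y)
        for y in itertools.product(range(k), repeat=len(emit))
    ])


def conditional_log_prob(emit, trans, labels):
    """log p(y | x) = score(y, x) - log Z(x)."""
    return sequence_score(emit, trans, labels) - log_partition(emit, trans)


if __name__ == "__main__":
    gold = [LABELS.index(n) for n in ["OTHER", "PERSON", "OTHER", "ORG", "TITLE"]]
    print(math.exp(conditional_log_prob(EMIT, TRANS, gold)))
```

Training would adjust the scores to raise `conditional_log_prob` on labeled sequences; that step is omitted here.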

  23. Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Khashayar, McCallum, ICML 2004]. Layers: Named-entity tag; Noun-phrase boundaries; Part-of-speech; English words.

  25. Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Khashayar, McCallum, ICML 2004]. Layers: Named-entity tag; Noun-phrase boundaries; Part-of-speech; English words. But errors cascade: a pipeline must be perfect at every stage to do well.

  26. Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Khashayar, McCallum, ICML 2004]. Layers: Named-entity tag; Noun-phrase boundaries; Part-of-speech; English words. Joint prediction of part-of-speech and noun-phrase in newswire, matching accuracy with only 50% of the training data. Inference: Loopy Belief Propagation.

  28. Joint co-reference among all pairs: Affinity Matrix CRF. Also known as “entity resolution” or “object correspondence”.
      [Figure: mentions ". . . Mr Powell . . .", ". . . Powell . . .", and ". . . she . . ." joined by pairwise Y/N co-reference decisions with edge weights 45, -99, and 11.]
      ~25% reduction in error on co-reference of proper nouns in newswire.
      Inference: Correlational clustering graph partitioning [McCallum, Wellner, IJCAI WS 2003, NIPS 2004] [Bansal, Blum, Chawla, 2002]
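
To illustrate why the pairwise Y/N decisions on slide 28 are made jointly, here is a rough sketch that scores every partition of the three mentions and keeps the best one. The weights 45, -99, and 11 are from the slide, but which pair each weight belongs to is an assumption, as are all function names. Exhaustive enumeration is used only because the example is tiny; it is not the graph-partitioning procedure cited on the slide.

```python
from itertools import combinations

MENTIONS = ["Mr Powell", "Powell", "she"]

# Edge weights from the slide; their assignment to pairs is assumed here.
AFFINITY = {
    frozenset(["Mr Powell", "Powell"]): 45.0,
    frozenset(["Mr Powell", "she"]): -99.0,
    frozenset(["Powell", "she"]): 11.0,
}


def partitions(items):
    """Yield every partition of items as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing cluster in turn ...
        for i, cluster in enumerate(smaller):
            yield smaller[:i] + [[first] + cluster] + smaller[i + 1:]
        # ... or give it a cluster of its own.
        yield [[first]] + smaller


def partition_score(clusters, affinity):
    """Sum the affinities of all pairs placed in the same cluster."""
    return sum(
        affinity[frozenset(pair)]
        for cluster in clusters
        for pair in combinations(cluster, 2)
    )


def best_partition(mentions, affinity):
    """Exhaustive search; feasible only for a handful of mentions."""
    return max(partitions(mentions), key=lambda c: partition_score(c, affinity))


if __name__ == "__main__":
    print(best_partition(MENTIONS, AFFINITY))
```

Under these assumed weights, the positive "Powell"/"she" edge is overridden: merging all three mentions would also pay the -99 penalty, so the two "Powell" mentions cluster together and "she" stays separate.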

  29. Joint Co-reference for Multiple Entity Types [Culotta & McCallum 2005]. People mentions: Stuart Russell; Stuart Russell; S. Russel. Each pair of mentions has a Y/N co-reference decision.

  30. Joint Co-reference for Multiple Entity Types [Culotta & McCallum 2005]. People mentions: Stuart Russell; Stuart Russell; S. Russel. Organization mentions: University of California at Berkeley; Berkeley; Berkeley. Each pair of mentions within a type has a Y/N co-reference decision.

  31. Joint Co-reference for Multiple Entity Types [Culotta & McCallum 2005]. People mentions: Stuart Russell; Stuart Russell; S. Russel. Organization mentions: University of California at Berkeley; Berkeley; Berkeley. Each pair of mentions within a type has a Y/N co-reference decision. Reduces error by 22%.

  33. Sometimes pairwise comparisons are not enough.
      • Entities have multiple attributes (name, email, institution, location); need to measure “compatibility” among them.
      • Having 2 “given names” is common, but not 4.
      • Need to measure size of the clusters of mentions.
      • ∃ a pair of last-name strings that differ > 5?
      We need measures on hypothesized “entities”. We need first-order logic.

  34. Pairwise Co-reference Features. Mentions: Howard Dean; Dean Martin; Howard Martin.
      Pairwise decisions: SamePerson(Howard Dean, Howard Martin)? SamePerson(Dean Martin, Howard Dean)? SamePerson(Dean Martin, Howard Martin)?
      Pairwise features: StringMatch(x1,x2); EditDistance(x1,x2)

  35. Cluster-wise (higher-order) Representations. Mentions: Howard Dean; Dean Martin; Howard Martin.
      Cluster decision: SamePerson(Howard Dean, Howard Martin, Dean Martin)?
      First-order features: ∀x1,x2 StringMatch(x1,x2); ∃x1,x2 ¬StringMatch(x1,x2); ∃x1,x2 EditDistance>.5(x1,x2); ThreeDistinctStrings(x1,x2,x3); N = 3
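
The cluster-wise features on slide 35 can be sketched as functions of a whole hypothesized entity rather than of a single pair. The version below is illustrative only: it assumes a universally quantified reading of the string-match feature and an existential reading of the mismatch and edit-distance features, it normalizes edit distance by the longer string's length (also an assumption), and every function and key name is hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer string's length."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


def cluster_features(mentions, threshold=0.5):
    """Features of one hypothesized entity (a cluster of mention strings)."""
    pairs = list(combinations(mentions, 2))
    return {
        "forall_string_match": all(a == b for a, b in pairs),
        "exists_string_mismatch": any(a != b for a, b in pairs),
        "exists_edit_distance_gt": any(
            normalized_distance(a, b) > threshold for a, b in pairs
        ),
        "n_distinct_strings": len(set(mentions)),
        "cluster_size": len(mentions),
    }


if __name__ == "__main__":
    print(cluster_features(["Howard Dean", "Dean Martin", "Howard Martin"]))
```

Features such as `n_distinct_strings` and `cluster_size` have no pairwise equivalent, which is the point of the slide: they can only be computed once a whole cluster has been hypothesized.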

  36. Cluster-wise (higher-order) Representations: Combinatorial Explosion!
      Mentions (as extracted): … Dino Martin Dean Martin Howard Dean Howard Martin Howie
      Candidate decisions: SamePerson(x1,x2) … SamePerson(x1,x2,x3) … SamePerson(x1,x2,x3,x4) … SamePerson(x1,x2,x3,x4,x5) … SamePerson(x1,x2,x3,x4,x5,x6) …

  37. This space complexity is common in first-order probabilistic models

  38. Markov Logic (Weighted 1st-order Logic): Using 1st-order Logic as a Template to Construct a CRF [Richardson & Domingos 2005]. Grounding the Markov network requires space O(n^r), where n = number of constants and r = highest clause arity.
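
The space bound on slide 38 follows from counting: a clause with r variables has one grounding per assignment of constants to those variables. The snippet below is a small sketch with hypothetical function names and placeholder constants that enumerates the groundings of a single clause and compares the count with n to the power r.

```python
from itertools import product


def ground_clause(constants, arity):
    """Yield every assignment of constants to a clause's `arity` variables."""
    return product(constants, repeat=arity)


def grounding_count(n_constants, arity):
    """Number of groundings of one clause: n ** r."""
    return n_constants ** arity


if __name__ == "__main__":
    constants = ["A", "B", "C"]  # placeholder constants, not from the talk
    print(len(list(ground_clause(constants, 2))), grounding_count(3, 2))
    # A modest database already makes full grounding impractical:
    print(grounding_count(1000, 3))
```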

  39. How can we perform inference and learning in models that cannot be grounded?

  40. Inference in First-Order Models: SAT Solvers
      • Weighted SAT solvers [Kautz et al 1997]
        • Requires complete grounding of network
      • LazySAT [Singla & Domingos 2006]
        • Saves memory by only storing clauses that may become unsatisfied
        • Still requires exponential time to visit all ground clauses at initialization.

  41. Inference in First-Order Models: Sampling
      • Gibbs Sampling
        • Difficult to move between high probability configurations by changing single variables
        • Although, consider MC-SAT [Poon & Domingos '06]
      • An alternative: Metropolis-Hastings sampling [Culotta & McCallum 2006]
        • Can be extended to partial configurations
        • Only instantiate relevant variables
        • Successfully used in BLOG models [Milch et al 2005]
        • 2 parts: proposal distribution, acceptance distribution.

  42. Proposal Distribution. [Figure: a proposed move from clustering y to clustering y' over the mentions Howard Martin, Howie Martin, Dean Martin, and Dino.]

  43. Proposal Distribution. [Figure: a second proposed move from y to y' over the mentions Dean Martin, Howie Martin, Howard Martin, and Dino.]

  44. Proposal Distribution. [Figure: a third proposed move from y to y' over the mentions Dean Martin, Howie Martin, Howard Martin, and Dino.]

  45. Inference with Metropolis-Hastings
      • y : configuration
      • p(y')/p(y) : likelihood ratio
        • Ratio of P(Y|X)
        • Z_X cancels
      • q(y'|y) : proposal distribution
        • probability of proposing move y → y'
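
The quantities on slide 45 combine into the standard Metropolis-Hastings acceptance probability, min(1, p(y')/p(y) · q(y|y')/q(y'|y)). The sketch below applies it to a toy clustering of four mentions. It is an assumption-laden illustration, not the sampler from the talk: the affinity scores, the label-vector encoding of a clustering, and the single-mention relabeling proposal are all hypothetical. Because only a ratio of probabilities is needed, the code works with unnormalized log-scores and never computes Z_X.

```python
import math
import random
from itertools import combinations

MENTIONS = ["Howard Martin", "Howie Martin", "Dean Martin", "Dino"]

# Hypothetical pairwise log-affinities (only entries with i < j are read).
AFFINITY = [
    [0.0, 2.0, -1.5, -2.0],
    [0.0, 0.0, -1.5, -2.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]


def log_score(labels, affinity):
    """Unnormalized log p(y | x): sum of affinities of same-cluster pairs."""
    return sum(
        affinity[i][j]
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    )


def acceptance_probability(log_p_new, log_p_old,
                           log_q_forward=0.0, log_q_backward=0.0):
    """min(1, p(y')/p(y) * q(y|y')/q(y'|y)); Z_X cancels in the ratio."""
    log_ratio = (log_p_new - log_p_old) + (log_q_backward - log_q_forward)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def metropolis_hastings(affinity, n_steps, rng):
    """Sample cluster labelings; return the best-scoring one visited."""
    n = len(affinity)
    labels = list(range(n))  # start with every mention in its own cluster
    current = log_score(labels, affinity)
    best, best_score = list(labels), current
    for _ in range(n_steps):
        proposal = list(labels)
        # Relabel one random mention with a random cluster id. The move and
        # its reverse are equally likely, so the q terms cancel.
        proposal[rng.randrange(n)] = rng.randrange(n)
        proposed = log_score(proposal, affinity)
        if rng.random() < acceptance_probability(proposed, current):
            labels, current = proposal, proposed
            if current > best_score:
                best, best_score = list(labels), current
    return best, best_score


if __name__ == "__main__":
    labels, score = metropolis_hastings(AFFINITY, 2000, random.Random(0))
    print(dict(zip(MENTIONS, labels)), score)
```

Each step only rescores the proposed configuration, so nothing requires grounding the full model up front. A proposal that splits or merges whole clusters would be asymmetric and would need the `log_q_forward` and `log_q_backward` terms.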

  46. Experiments
      • Paper citation coreference
      • Author coreference
      • First-order features
        • All Titles Match
        • Exists Year MisMatch
        • Average String Edit Distance > X
        • Number of mentions

  47. Results on Citation Data. [Tables: Citeseer paper coreference results (pair F1); Author coreference results (pair F1).]

  49. Data
      • 270 Wikipedia articles
      • 1000 paragraphs
      • 4700 relations
      • 52 relation types
        • JobTitle, BirthDay, Friend, Sister, Husband, Employer, Cousin, Competition, Education, …
      • Targeted for density of relations
        • Bush/Kennedy/Manning/Coppola families and friends
