1 / 93

Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction. Saurabh Verma IIT (BHU), Varanasi Estevam R. Hruschka Jr. F ederal University of São Carlos, Brazil. http:// rtw.ml.cmu.edu. NELL: Never-Ending Language Learner. Inputs: initial ontology

kareem
Télécharger la présentation

Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction SaurabhVerma IIT (BHU), Varanasi Estevam R. Hruschka Jr. Federal University of São Carlos, Brazil ECML/PKDD2012 Bristol, UK September, 26th, 2012

  2. http://rtw.ml.cmu.edu ECML/PKDD2012 Bristol, UK September, 26th, 2012

  3. NELL: Never-Ending Language Learner Inputs: initial ontology handful of examples of each predicate in ontology the web occasional interaction with human trainers The task: run 24x7, forever •each day: 1.extract more facts from the web to populate the initial ontology 2.learn to read (perform #1) better than yesterday ECML/PKDD2012 Bristol, UK September, 26th, 2012

  4. NELL: Never-Ending Language Learner Goal: •run 24x7, forever •each day: 1.extract more facts from the web to populate given ontology 2.learn to read better than yesterday Today... Running 24 x 7, since January, 2010 Input: •ontology defining ~800 categories and relations •10-20 seed examples of each •1 billion web pages (ClueWeb – Jamie Callan) Result: •continuously growing KB with +1,300,000 extracted beliefs ECML/PKDD2012 Bristol, UK September, 26th, 2012

  5. NELL: Never-Ending Language Learner http://rtw.ml.cmu.edu ECML/PKDD2012 Bristol, UK September, 26th, 2012

  6. The Problem with Semi-Supervised Bootstrap Learning ECML/PKDD2012 Bristol, UK September, 26th, 2012 Paris Pittsburgh Seattle Cupertino

  7. The Problem with Semi-Supervised Bootstrap Learning ECML/PKDD2012 Bristol, UK September, 26th, 2012 Paris Pittsburgh Seattle Cupertino … mayor of arg1 … live in arg1

  8. The Problem with Semi-Supervised Bootstrap Learning ECML/PKDD2012 Bristol, UK September, 26th, 2012 San Francisco Austin denial Paris Pittsburgh Seattle Cupertino … mayor of arg1 … live in arg1

  9. The Problem with Semi-Supervised Bootstrap Learning ECML/PKDD2012 Bristol, UK September, 26th, 2012 San Francisco Austin denial Paris Pittsburgh Seattle Cupertino … mayor of arg1 … live in arg1

  10. The Problem with Semi-Supervised Bootstrap Learning ECML/PKDD2012 Bristol, UK September, 26th, 2012 San Francisco Austin denial Paris Pittsburgh Seattle Cupertino … mayor of arg1 … live in arg1 arg1is home of … … traits such asarg1

  11. The Problem with Semi-Supervised Bootstrap Learning ECML/PKDD2012 Bristol, UK September, 26th, 2012 San Francisco Austin denial Paris Pittsburgh Seattle Cupertino … mayor of arg1 … live in arg1 arg1is home of … … traits such as arg1

  12. The Problem with Semi-Supervised Bootstrap Learning ECML/PKDD2012 Bristol, UK September, 26th, 2012 San Francisco Austin denial Paris Pittsburgh Seattle Cupertino … mayor of arg1 … live in arg1 arg1is home of … … traits such as arg1

  13. The Problem with Semi-Supervised Bootstrap Learning ECML/PKDD2012 Bristol, UK September, 26th, 2012 it’s underconstrained!! San Francisco Austin denial Paris Pittsburgh Seattle Cupertino … … mayor of arg1 … live in arg1 arg1is home of … … traits such as arg1

  14. Bayesian Sets (BS) Ghahramani & Heller; NIPS 2005 • Given and , rank the elements of by how well they would “fit into” a set which includes • Define a score for each : • From Bayes rule, the score can be re-written as:

  15. Bayesian Sets (BS) Ghahramani & Heller; NIPS 2005 • Intuitively, the score compares the probability that x and Dcwere generated by the same model with the sameunknown parameters θ, to the probability that x and Dccame from models with different parameters θ and θ’.

  16. Bayesian Sets (BS) Ghahramani & Heller; NIPS 2005 • Intuitively, the score compares the probability that x and Dcwere generated by the same model with the sameunknown parameters θ, to the probability that x and Dccame from models with different parameters θ and θ’.

  17. Bayesian Sets (BS) Ghahramani & Heller; NIPS 2005 • Intuitively, the score compares the probability that x and Dcwere generated by the same model with the sameunknown parameters θ, to the probability that x and Dccame from models with different parameters θ and θ’.

  18. Bayesian Sets (BS) Ghahramani & Heller; NIPS 2005 • Intuitively, the score compares the probability that x and Dcwere generated by the same model with the sameunknown parameters θ, to the probability that x and Dccame from models with different parameters θ and θ’.

  19. ECML/PKDD2012 Bristol, UK September, 26th, 2012 BS using NELL’s Ontology Everything • Initial ontology: Company Person Sport Vegetable

  20. ECML/PKDD2012 Bristol, UK September, 26th, 2012 BS using NELL’s Ontology Everything • Initial ontology: Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town

  21. ECML/PKDD2012 Bristol, UK September, 26th, 2012 BS using NELL’s Ontology • Given a huge web corpus, run BS once Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town

  22. ECML/PKDD2012 Bristol, UK September, 26th, 2012 BS using NELL’s Ontology • Given a huge web corpus, run BS once Everything Company Person Sport City Apple Microsoft Google IBM Yahoo AT&T Boeing Brazil Telecom Texaco Facebook DELL … Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Dalai Lama Freud Tom Mitchell Aristotle Alan Turing Alexander Fleming … Basketball Football Swimming Tennis Golf Soccer Volleyball Jogging Marathon Baseball Badminton … Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town New York London Sao Paulo Brisbane Beijing Cairo …

  23. ECML/PKDD2012 Bristol, UK September, 26th, 2012 BS using NELL’s Ontology

  24. ECML/PKDD2012 Bristol, UK September, 26th, 2012 BS using NELL’s Ontology

  25. ECML/PKDD2012 Bristol, UK September, 26th, 2012 BS using NELL’s Ontology

  26. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Iterative BS using NELL’s Ontology Zhang & Liu, 2011 • Given a huge web corpus, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo AT&T Boeing Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Dalai Lama Freud Basketball Football Swimming Tennis Golf Soccer Volleyball Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town New York London

  27. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Iterative BS using NELL’s Ontology Zhang & Liu, 2011 • Given a huge web corpus, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo AT&T Boeing Brazil Telecom Texaco Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Dalai Lama Freud Tom Mitchell Aristotle Basketball Football Swimming Tennis Golf Soccer Volleyball Jogging Marathon Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town New York London Sao Paulo Brisbane

  28. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Iterative BS using NELL’s Ontology Zhang & Liu, 2011 • Given a huge web corpus, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo AT&T Boeing Brazil Telecom Texaco Facebook DELL … Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Dalai Lama Freud Tom Mitchell Aristotle Alan Turing Alexander Fleming … Basketball Football Swimming Tennis Golf Soccer Volleyball Jogging Marathon Baseball Badminton … Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town New York London Sao Paulo Brisbane Beijing Cairo …

  29. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Iterative BS using NELL’s Ontology

  30. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Iterative BS using NELL’s Ontology

  31. NELL: Coupled semi-supervised training of many functions ECML/PKDD2012 Bristol, UK September, 26th, 2012

  32. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Coupled Training Type 2:Structured Outputs, Multitask, Posterior Regularization, Multilabel Learn functions with the same input, different outputs, where we know some constraint

  33. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Coupled Training Type 2:Structured Outputs, Multitask, Posterior Regularization, Multilabel Learn functions with the same input, different outputs, where we know some constraint

  34. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Coupled Training Type 2:Structured Outputs, Multitask, Posterior Regularization, Multilabel Learn functions with the same input, different outputs, where we know some constraint

  35. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Coupled Bayesian Sets (CBS)

  36. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Coupled Bayesian Sets (CBS)

  37. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Coupled Bayesian Sets (CBS)

  38. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Coupled Bayesian Sets (CBS)

  39. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Coupled Bayesian Sets (CBS)

  40. ECML/PKDD2012 Bristol, UK September, 26th, 2012 Coupled Bayesian Sets (CBS)

  41. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town

  42. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Pearson,Sport); …

  43. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Pearson,Sport); …

  44. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Pearson,Sport); …

  45. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Pearson,Sport); …

  46. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Pearson,Sport); …

  47. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Pearson,Sport); …

  48. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Pearson,Sport); …

  49. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Pearson,Sport); …

  50. ECML/PKDD2012 Bristol, UK September, 26th, 2012 CBS using NELL’s Ontology • Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS Everything Company Person Sport City Apple Microsoft Google IBM Yahoo AT&T Boeing Peter Flach Bill Clinton Jeremy Lin Adele Barak Obama Basketball Football Swimming Tennis Golf Bristol Pittsburgh Rio de Janeiro Tokyo Cape Town

More Related