Gerhard Weikum Max Planck Institute for Informatics mpi-inf.mpg.de/~weikum/

From Information to Knowledge. Harvesting Entities and Relationships From Web Sources. Gerhard Weikum Max Planck Institute for Informatics http://www.mpi-inf.mpg.de/~weikum/. Martin Theobald Max Planck Institute for Informatics http://www.mpi-inf.mpg.de/~mtb/.

Presentation Transcript


  1. From Information to Knowledge: Harvesting Entities and Relationships From Web Sources. Gerhard Weikum, Max Planck Institute for Informatics, http://www.mpi-inf.mpg.de/~weikum/. Martin Theobald, Max Planck Institute for Informatics, http://www.mpi-inf.mpg.de/~mtb/

  2. Goal: Turn Web into Knowledge Base • comprehensive DB of human knowledge • everything that Wikipedia knows • everything machine-readable • capturing entities, classes, relationships. Source: DB & IR methods for knowledge discovery. Communications of the ACM 52(4), 2009

  3. Approach: Harvesting Facts from Web. Politician / Position: (Angela Merkel, Chancellor Germany), (Karl-Theodor zu Guttenberg, Minister of Defense Germany), (Christoph Hartmann, Minister of Economy Saarland), … Actor / Award: (Christoph Waltz, Oscar), (Sandra Bullock, Oscar), (Sandra Bullock, Golden Raspberry), … Politician / Political Party: (Angela Merkel, CDU), (Karl-Theodor zu Guttenberg, CDU), (Christoph Hartmann, FDP), … Company / CEO: (Google, Eric Schmidt), … Movie / Reported Revenue: (Avatar, $2,718,444,933), (The Reader, $108,709,522), … Political Party / Spokesperson: (CDU, Philipp Wachholz), (Die Grünen, Claudia Roth), … Company / Acquired Company: (Google, YouTube), (Yahoo, Overture), (Facebook, FriendFeed), (Software AG, IDS Scheer), … Cyc IWP ReadTheWeb TextRunner YAGO-NAGA

  4. Knowledge as Enabling Technology • entity recognition & disambiguation • understanding natural language & speech • knowledge services & reasoning for semantic apps (e.g. deep QA) • semantic search: precise answers to advanced queries (by scientists, students, journalists, analysts, etc.) US president when Barack Obama was born? Indy 500 winners who are still alive? Politicians who are also scientists? Relationship between Angela Merkel, Jim Gray, Dalai Lama? Enzymes that inhibit HIV? Influenza drugs for teens with high blood pressure? ...

  5. Knowledge Search (1) Who was US president when Barack Obama was born? http://www.wolframalpha.com

  6. Knowledge Search (1) Who was mayor of Indianapolis when Barack Obama was born? Not enough facts in KB! http://www.wolframalpha.com

  7. Knowledge Search (2) Indy 500 winners? http://www.google.com/squared/

  8. Knowledge Search (2) Indy 500 winners? http://www.google.com/squared/

  9. Knowledge Search (2) Indy 500 winners from Europe? No types, no inference! http://www.google.com/squared/

  10. Related Work Yago-Naga EntityRank Cazoodle Text2Onto Powerset ReadTheWeb Avatar System T Hakia Cyc information extraction ontologies UIMA Kylin KOG WebTables (Semantic Web) (Statistical Web) kosmix KnowItAll TextRunner WolframAlpha SWSE StatSnowball EntityCube sig.ma communities DBpedia (Social Web) Cimple DBlife PSOX TrueKnowledge GoogleSquared Freebase Answers START WorldWideTables Cyc IWP ReadTheWeb TextRunner YAGO-NAGA

  11. Outline ✓ What and Why • Framework • Entities and Classes • Relationships • Temporal Knowledge • Wrap-up ...

  12. Framework: Types of Knowledge • facts / assertions: bornIn (JohnDillinger, Indianapolis), hasWon (JimGray, TuringAward), … • taxonomic: instanceOf (JohnDillinger, bankRobbers), subclassOf (bankRobbers, criminals), … • lexical / terminology: means (“Big Apple”, NewYorkCity), means (“Big Mike”, MichaelStonebraker), means (“MS”, Microsoft), means (“MS”, MultipleSclerosis), … • common-sense properties: apples are green, red, juicy, sweet, sour … but not fast, smart …; balls are round, smooth, slippery … but not square, funny … • common-sense axioms: ∀x: human(x) ⇒ male(x) ∨ female(x); ∀x: (male(x) ⇒ ¬female(x)) ∧ (female(x) ⇒ ¬male(x)); ∀x: animal(x) ⇒ (hasLegs(x) ⇒ isEven(numberOfLegs(x))); … • procedural: how to fix/install/prepare/remove … • epistemic / beliefs: believes (Ptolemy, shape(Earth, disc)), believes (Copernicus, shape(Earth, sphere)), … ...

  13. Framework: Information Extraction (IE). Text: “Surajit obtained his PhD in CS from Stanford University under the supervision of Prof. Jeff Ullman. He later joined HP and worked closely with Umesh Dayal …” Extracted: instanceOf (Surajit, scientist), inField (Surajit, computer science), hasAdvisor (Surajit, Jeff Ullman), almaMater (Surajit, Stanford U), workedFor (Surajit, HP), friendOf (Surajit, Umesh Dayal), … Source-centric IE (one source): 1) recall! 2) precision. Yield-centric harvesting (many sources): 1) precision! 2) recall; near-human quality! hasAdvisor (Student, Advisor): (Surajit Chaudhuri, Jeffrey Ullman), (Alon Halevy, Jeffrey Ullman), (Jim Gray, Mike Harrison), … almaMater (Student, University): (Surajit Chaudhuri, Stanford U), (Alon Halevy, Stanford U), (Jim Gray, UC Berkeley), …

  14. Framework: Knowledge Representation • RDF (Resource Description Framework, W3C): subject-property-object (SPO) triples, binary relations; structure, but no (prescriptive) schema • Relations, frames • Description logics: OWL, DL-lite • Higher-order logics, epistemic logics. Facts (RDF triples): 1: (JimGray, hasAdvisor, MikeHarrison) 2: (SurajitChaudhuri, hasAdvisor, JeffUllman) 3: (Madonna, marriedTo, GuyRitchie) 4: (NicolasSarkozy, marriedTo, CarlaBruni). Facts about facts: 5: (1, inYear, 1968) 6: (2, inYear, 2006) 7: (3, validFrom, 22-Dec-2000) 8: (3, validUntil, Nov-2008) 9: (4, validFrom, 2-Feb-2008) 10: (2, source, SigmodRecord). Temporal & provenance annotations can refer to reified facts via fact identifiers (approx. equiv. to RDF quadruples: “Color” × Sub × Prop × Obj) ...
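The fact-identifier scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the helper name `annotations` are hypothetical and are not taken from YAGO or any RDF library; an integer in subject position is read as a reference to another fact.

```python
from typing import Dict, Tuple, Union

Term = Union[str, int]
Triple = Tuple[Term, str, Term]

# Facts keyed by fact identifier, copied from the slide.
# Assumption: an int in SUBJECT position refers to another fact (reification).
facts: Dict[int, Triple] = {
    1: ("JimGray", "hasAdvisor", "MikeHarrison"),
    2: ("SurajitChaudhuri", "hasAdvisor", "JeffUllman"),
    3: ("Madonna", "marriedTo", "GuyRitchie"),
    4: ("NicolasSarkozy", "marriedTo", "CarlaBruni"),
    5: (1, "inYear", 1968),
    6: (2, "inYear", 2006),
    7: (3, "validFrom", "22-Dec-2000"),
    8: (3, "validUntil", "Nov-2008"),
    9: (4, "validFrom", "2-Feb-2008"),
    10: (2, "source", "SigmodRecord"),
}


def annotations(fact_id: int) -> Dict[str, Term]:
    """Collect the facts-about-facts whose subject is the given fact id."""
    return {
        prop: obj
        for (subj, prop, obj) in facts.values()
        if isinstance(subj, int) and subj == fact_id
    }


print(facts[3], annotations(3))
```

Looking up fact 3 this way returns its validity interval, which is the kind of temporal annotation the slide attaches to reified facts.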

  15. KB‘s: Example YAGO (Suchanek et al.: WWW‘07) 2 million entities, 20 million facts, 40 million RDF triples (entity1-relation-entity2, subject-predicate-object). Accuracy 95%. [Figure: excerpt of the YAGO graph. Classes linked by subclass edges: Entity, Organization, Person, Location, Country, State, City, Scientist, Politician, Biologist, Physicist. Entities and values linked by instanceOf, locatedIn, bornIn, bornOn, diedOn, FatherOf, hasWon, citizenOf edges: Germany, Schleswig-Holstein, Kiel, Max_Planck, Erwin_Planck, Max_Planck Society, Angela Merkel, Nobel Prize, Apr 23, 1858, Oct 23, 1944, Oct 4, 1947. Names linked by means edges (weights 0.9 and 0.1 shown): “Max Planck”, “Max Karl Ernst Ludwig Planck”, “Angela Merkel”, “Angela Dorothea Merkel”.] http://www.mpi-inf.mpg.de/yago-naga/

  16. KB‘s: Example YAGO (F. Suchanek et al.: WWW‘07) http://www.mpi-inf.mpg.de/yago-naga/

  17. KB‘s: Example DBpedia (Auer, Bizer, et al.: ISWC‘07) • 3 million entities, 1 billion facts (RDF triples) • 1.5 million entities mapped to hand-crafted taxonomy of 259 classes with 1200 properties http://www.dbpedia.org

  18. Outline ✓ What and Why ✓ Framework • Entities and Classes • Relationships • Temporal Knowledge • Wrap-up ...

  19. Entities & Classes. Which entity types (classes, unary predicates) are there? scientists, doctoral students, computer scientists, … female humans, male humans, married humans, … Which subsumptions should hold (subclass/superclass, hyponym/hypernym, inclusion dependencies)? subclassOf (computer scientists, scientists), subclassOf (scientists, humans), … Which individual entities belong to which classes? instanceOf (Surajit Chaudhuri, computer scientists), instanceOf (Barbara Liskov, computer scientists), instanceOf (Barbara Liskov, female humans), … Which names denote which entities? means (“Lady Di”, Diana Spencer), means (“Diana Frances Mountbatten-Windsor”, Diana Spencer), … means (“Madonna”, Madonna Louise Ciccone), means (“Madonna”, Madonna (painting by Edvard Munch)), … ...

  20. WordNet Thesaurus [Miller/Fellbaum 1998] 3 concepts / classes & their synonyms (synsets) http://wordnet.princeton.edu/

  21. WordNet Thesaurus [Miller/Fellbaum 1998] subclasses (hyponyms) superclasses (hypernyms) http://wordnet.princeton.edu/

  22. WordNet Thesaurus [Miller & Fellbaum 1998] • > 100 000 classes and lexical relations • can be cast into description logics or graph, with weights for relation strengths (derived from co-occurrence statistics) • but: only few individual entities (instances of classes). scientist, man of science (a person with advanced knowledge) => cosmographer, cosmographist => biologist, life scientist => chemist => cognitive scientist => computer scientist ... => principal investigator, PI … HAS INSTANCE => Bacon, Roger Bacon … http://wordnet.princeton.edu/

  23. Tapping on Wikipedia Categories

  24. Tapping on Wikipedia Categories

  25. Mapping: Wikipedia → WordNet [Suchanek: WWW‘07, Ponzetto & Strube: AAAI‘07] Missing Person Sailor, Crewman American Computer Scientist Scientist Jim Gray (computer specialist) Chemist Artist

  26. Mapping: Wikipedia → WordNet [Suchanek: WWW‘07, Ponzetto & Strube: AAAI‘07] Missing Person Sailor, Crewman ? People Lost at Sea Computer Scientists by Nation American instanceOf American Computer Scientists Computer Scientist Scientist subclassOf Jim Gray (computer specialist) Databases Database ? Database Researcher ? Engineering Societies Fellow (1), Comrade ? Fellows of the ACM ? Fellow (2), Colleague ACM name similarity (edit dist., n-gram overlap) ? Fellow (3) (of Society) Members of Learned Societies context similarity (word/phrase level) ? Member (1), Fellow ? machine learning ? Member (2), Extremity
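The two name-similarity signals named on this slide, edit distance and n-gram overlap, can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the choice of character trigrams with Jaccard overlap is an assumption, since the slide does not say which n or which overlap measure was used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def ngram_overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of character n-grams (case-insensitive)."""
    def grams(s: str) -> set:
        s = s.lower()
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


print(edit_distance("Database Researcher", "Databases"))
print(ngram_overlap("Computer Scientists", "Computer Scientist"))
```

Such scores only compare surface strings, which is why the slide pairs them with context similarity and machine learning for ambiguous cases like Fellow (1) versus Fellow (3).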

  27. Mapping: Wikipedia → WordNet [Suchanek: WWW‘07, Ponzetto & Strube: AAAI‘07] Given: entity e in Wikipedia categories c1, …, ck. Wanted: instanceOf(e,c) and subclassOf(ci,c) for WN class c. Problem: vagueness & ambiguity of names c1, …, ck. Analyzing category names with a noun group parser: American (pre-modifier) Musicians (head) of Italian Descent (post-modifier); American Folk (pre-modifier) Music (head) of the 20th Century (post-modifier); American Indy 500 (pre-modifier) Drivers (head) on Pole Positions (post-modifier). Head word is key, should be in plural for instanceOf.

  28. Mapping Wikipedia Entities to WordNet Classes [Suchanek: WWW‘07, Ponzetto & Strube: AAAI‘07] Given: entity e in Wikipedia categories c1, …, ck. Wanted: instanceOf(e,c) and subclassOf(ci,c) for WN class c. Problem: vagueness & ambiguity of names c1, …, ck. Heuristic method: for each ci do: if head word w of category name ci is plural { 1) match w against synsets of WordNet classes 2) choose best fitting class c and set e ∈ c 3) expand w by pre-modifier and set ci ⊆ w+ ⊆ c }. Tuned conservatively: high precision, reduced recall • can also derive features this way • feed into supervised classifier
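The heuristic above can be sketched in Python. Everything beyond the three numbered steps is an assumption made for illustration: `TOY_WORDNET` is a hypothetical stand-in for real synset matching, the plural test just strips a trailing "s", and the noun-group split cuts at the first preposition.

```python
import re

# Hypothetical toy stand-in for WordNet: singular head word -> class name.
TOY_WORDNET = {"musician": "musician", "driver": "driver", "scientist": "scientist"}


def split_category(name):
    """Split a category name into (pre-modifier, head, post-modifier)."""
    m = re.search(r"\s(of|on|in|by|from)\s", name)
    noun_group = name[:m.start()] if m else name
    post = name[m.start():].strip() if m else ""
    words = noun_group.split()
    if not words:
        return "", "", post
    return " ".join(words[:-1]), words[-1], post


def singular(word):
    """Crude plural test: return the singular form, or None if not plural."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and not w.endswith("ss") else None


def map_entity(entity, categories):
    facts = []
    for cat in categories:
        pre, head, _ = split_category(cat)
        lemma = singular(head)
        if lemma is None or lemma not in TOY_WORDNET:
            continue                       # conservative: skip non-plural heads
        cls = TOY_WORDNET[lemma]           # steps 1 and 2: match, pick class c
        facts.append(("instanceOf", entity, cls))
        w_plus = f"{pre} {head}".strip()   # step 3: head expanded by pre-modifier
        if cat != w_plus:
            facts.append(("subclassOf", cat, w_plus))
        facts.append(("subclassOf", w_plus, cls))
    return facts


for fact in map_entity("Bob Dylan", ["American Musicians of Italian Descent",
                                     "American Folk Music of the 20th Century"]):
    print(fact)
```

The second category is skipped because its head word "Music" is not plural, which mirrors the slide's point that only plural heads license instanceOf.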

  29. Learning More Mappings [Wu & Weld: WWW‘08] • Kylin Ontology Generator (KOG): learn classifier for subclassOf across Wikipedia & WordNet using • YAGO as training data • advanced ML methods (MLN‘s, SVM‘s) • rich features from various sources: • category/class name similarity measures • category instances and their infobox templates: template names, attribute names (e.g. knownFor) • Wikipedia edit history: refinement of categories • Hearst patterns: C such as X, X and Y and other C‘s, … • other search-engine statistics: co-occurrence frequencies. > 3 million entities, > 1 million w/ infoboxes, > 500 000 categories
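One of the feature sources listed above, the Hearst pattern "C such as X", can be approximated with a regular expression. This is a rough sketch, not KOG's implementation: the noun-phrase pattern `NP` (runs of capitalized words) and the function name are assumptions made for illustration.

```python
import re

# Assumption: a "noun phrase" is a run of capitalized words, e.g. "Jim Gray".
NP = r"[A-Z][\w-]*(?: [A-Z][\w-]*)*"
SEP = r", and |, or |, | and | or "
SUCH_AS = re.compile(rf"(\w+) such as ({NP}(?:(?:{SEP}){NP})*)")


def hearst_such_as(text):
    """Return (instance, class word) pairs found via 'C such as X, Y and Z'."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        cls = m.group(1)
        for inst in re.split(SEP, m.group(2)):
            pairs.append((inst, cls))
    return pairs


print(hearst_such_as(
    "He admires scientists such as Jim Gray, Jeff Ullman and Barbara Liskov."))
```

Each extracted pair would count as one piece of evidence for a class membership and would be combined with the other features before classification.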

  30. Goal: Comprehensive & Consistent ! Telecomm. History Knuth Prize Laureate Doctoral Students American … Bell Labs Known For Princeton Alumni Academic American People by Occupation Jeffrey Ullman Alma Mater American Computer Scientists Scientist Notable Awards Databases Jim Gray (computer specialist) Database Researcher Fellow(1) Computer Data Fellow(2) Born Fellows of the ACM Members of Learned Societies Award Winner Years Active Madonna (entertainer) U Michigan Alumni Athlete Genres Americans of Italian Descent World Record Holders Artist Also Known As Bob Dylan People by Status Musician American Songwriters … Hall of Fame Inductees Singer Website Guitar Players Italian

  31. Goal: Comprehensive & Consistent ! Telecomm. History Knuth Prize Laureate Doctoral Students American … Bell Labs Known For Princeton Alumni Academic American People by Occupation Jeffrey Ullman Alma Mater American Computer Scientists Scientist Notable Awards Databases Jim Gray (computer specialist) Database Researcher Fellow(1) Computer Data Fellow(2) Born Fellows of the ACM Members of Learned Societies Award Winner Years Active U Michigan Alumni Madonna (entertainer) Athlete Genres Americans of Italian Descent World Record Holders Artist Also Known As Bob Dylan People by Status American Songwriters Musician … Hall of Fame Inductees Singer Website Guitar Players Italian

  32. Goal: Comprehensive & Consistent ! Telecomm. History Knuth Prize Laureate Doctoral Students American … Bell Labs Known For Princeton Alumni Academic American People by Occupation Jeffrey Ullman Alma Mater American Computer Scientists Scientist Notable Awards Databases Jim Gray (computer specialist) Database Researcher Fellow(1) Computer Data Fellow(2) Born Fellows of the ACM Members of Learned Societies Award Winner Years Active U Michigan Alumni Madonna (entertainer) Athlete Genres Americans of Italian Descent World Record Holders Artist Also Known As Bob Dylan People by Status American Songwriters Musician … Hall of Fame Inductees Singer Website Guitar Players Italian

  33. Goal: Comprehensive & Consistent ! Telecomm. History Knuth Prize Laureate Doctoral Students American … Bell Labs Known For Princeton Alumni Academic American People by Occupation Jeffrey Ullman Alma Mater American Computer Scientists Scientist. Clean up the mess: • graph algorithms? • random walk with restart • dense subgraphs … • statistical machine learning? • logical consistency reasoning? • gigantic schema integration? • ontology merging. Notable Awards Databases Jim Gray (computer specialist) Database Researcher Fellow(1) Computer Data Fellow(2) Born Fellows of the ACM Members of Learned Societies Award Winner Years Active U Michigan Alumni Madonna (entertainer) Athlete Genres Americans of Italian Descent World Record Holders Artist Also Known As Bob Dylan People by Status American Songwriters Musician … Hall of Fame Inductees Singer Website Guitar Players Italian

  34. Long Tail of Class Instances

  35. Long Tail of Class Instances [Etzioni et al. 2004, Cohen et al. 2008, Mitchell et al. 2010] State-of-the-art approach (e.g. SEAL): • Start with seeds: a few class instances • Find lists, tables, text snippets (“for example: …”), … that contain one or more seeds • Extract candidates: noun phrases from vicinity • Gather co-occurrence stats (seed & cand, cand & className pairs) • Rank candidates: point-wise mutual information, …; random walk (PR-style) on seed-cand graph. But: Precision drops for classes with sparse statistics (DB profs, …). Harvested items are names, not entities. Canonicalization (de-duplication) unsolved.
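The ranking step above can be illustrated with point-wise mutual information over co-occurrence counts. This is a minimal sketch and not SEAL's actual scoring: it assumes each harvested list, table or snippet has already been reduced to a set of names, and it sums PMI over all seeds, which is one simple aggregation among several possible ones.

```python
import math
from collections import Counter


def rank_by_pmi(snippets, seeds):
    """Rank non-seed names by summed PMI with the seeds.

    snippets: list of sets, each holding the names found in one source.
    """
    n = len(snippets)
    count = Counter(name for s in snippets for name in s)
    scores = {}
    for cand in count:
        if cand in seeds:
            continue
        total = 0.0
        for seed in seeds:
            joint = sum(1 for s in snippets if cand in s and seed in s)
            if joint:
                p_joint = joint / n
                p_indep = (count[cand] / n) * (count[seed] / n)
                total += math.log(p_joint / p_indep)
        scores[cand] = total
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data, not taken from the slides.
snippets = [
    {"Jim Gray", "Mike Stonebraker", "Jeff Ullman"},
    {"Jim Gray", "Jeff Ullman"},
    {"Jim Gray", "Paris"},
    {"Paris", "London"},
    {"Paris", "Berlin"},
]
print(rank_by_pmi(snippets, {"Jim Gray"}))
```

With so few observations a single co-occurrence moves a candidate a long way, which is the sparse-statistics problem the slide raises for small classes.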

  36. Individual Entity Disambiguation. Names: “Penn”, “U Penn”, “Penn State”, “PSU”. Entities: Sean Penn, University of Pennsylvania, Pennsylvania State University, Pennsylvania (US State), Passenger Service Unit. • ill-defined with zero context • known as record linkage for names in record fields • Wikipedia offers rich candidate mappings: disambiguation pages, re-directs, inter-wiki links, anchor texts of href links

  37. Collective Entity Disambiguation [McCallum 2003, Doan 2005, Getoor 2006, Domingos 2007, Chakrabarti 2009, …] • Consider a set of names {n1, n2, …} in same context and sets of candidate entities E1 = {e11, e12, …}, E2 = {e21, e22, …}, … • Define joint objective function (e.g. likelihood for prob. model) that rewards coherence of mappings ni → eij • Solve optimization problem. Stuart Russell: Stuart Russell (DJ), Stuart Russell (computer scientist). Michael Jordan: Michael Jordan (computer scientist), Michael Jordan (NBA)
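A joint objective of this kind can be sketched with exhaustive search over all candidate combinations. The objective used here (sum of per-entity priors plus pairwise coherence) and all numeric scores are assumptions chosen for illustration; the cited systems use probabilistic models and far more scalable inference.

```python
from itertools import product


def disambiguate(candidates, prior, coherence):
    """Pick one entity per name so that prior + pairwise coherence is maximal.

    candidates: {name: [entity, ...]}
    prior: {entity: score}; coherence: {frozenset({e1, e2}): score}
    """
    names = list(candidates)
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[n] for n in names)):
        score = sum(prior.get(e, 0.0) for e in choice)
        score += sum(coherence.get(frozenset((a, b)), 0.0)
                     for i, a in enumerate(choice) for b in choice[i + 1:])
        if score > best_score:
            best, best_score = dict(zip(names, choice)), score
    return best


# Hypothetical scores, made up for this example.
candidates = {
    "Stuart Russell": ["Stuart Russell (DJ)", "Stuart Russell (computer scientist)"],
    "Michael Jordan": ["Michael Jordan (NBA)", "Michael Jordan (computer scientist)"],
}
prior = {
    "Stuart Russell (DJ)": 0.3, "Stuart Russell (computer scientist)": 0.7,
    "Michael Jordan (NBA)": 0.9, "Michael Jordan (computer scientist)": 0.1,
}
coherence = {
    frozenset(("Stuart Russell (computer scientist)",
               "Michael Jordan (computer scientist)")): 1.0,
}
print(disambiguate(candidates, prior, coherence))
```

Taken alone, "Michael Jordan" would map to the NBA player under these priors; the coherence term flips it to the computer scientist once "Stuart Russell" appears in the same context.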

  38. Problems and Challenges • Wikipedia categories reloaded: comprehensive & consistent instanceOf and subClassOf across Wikipedia and WordNet (via consistency reasoning?) • Long tail of entities beyond Wikipedia: domain-specific entity catalogs; discover new entities, detect new names for known entities • Tags, tables, topics: tap on other sources: Web 2.0, Web tables, directories, etc. • Robust disambiguation: near-real-time mapping of names to entities with near-human quality

  39. Outline ✓ What and Why ✓ Framework ✓ Entities and Classes • Relationships • Temporal Knowledge • Wrap-up ...

  40. Relationships. Which instances (pairs of individual entities) are there for given binary relations with specific type signatures? hasAdvisor (JimGray, MikeHarrison), hasAdvisor (HectorGarcia-Molina, Gio Wiederhold), hasAdvisor (Susan Davidson, Hector Garcia-Molina), graduatedAt (JimGray, Berkeley), graduatedAt (HectorGarcia-Molina, Stanford), hasWonPrize (JimGray, TuringAward), bornOn (JohnLennon, 9Oct1940), diedOn (JohnLennon, 8Dec1980), marriedTo (JohnLennon, YokoOno). Which additional & interesting relation types are there between given classes of entities? competedWith(x,y), nominatedForPrize(x,y), … divorcedFrom(x,y), affairWith(x,y), … assassinated(x,y), rescued(x,y), admired(x,y), …

  41. Picking Low-Hanging Fruit (First)

  42. Deterministic Pattern Matching [Kushmerick 97, Califf & Mooney 99, Gottlob 01, …] • Regular expression matching • Wrapper induction (grammar learning for restricted regular languages) • Well understood ...

  43. French Marriage Problem. Facts in KB: married (Hillary, Bill), married (Carla, Nicolas), married (Angelina, Brad). New facts or fact candidates: married (Cecilia, Nicolas), married (Carla, Benjamin), married (Carla, Mick), married (Michelle, Barack), married (Yoko, John), married (Kate, Leonardo), married (Carla, Sofie), married (Larry, Google). For recall: pattern-based harvesting. For precision: consistency reasoning.

  44. Pattern-Based Harvesting (Hearst 92, Brin 98, Agichtein 00, Etzioni 04, …) Facts & fact candidates: (Hillary, Bill), (Carla, Nicolas), (Angelina, Brad), (Victoria, David), (Yoko, John), (Kate, Pete), (Carla, Benjamin), (Larry, Google), … Patterns: X and her husband Y; X and Y on their honeymoon; X and Y and their children; X has been dating with Y; X loves Y; … • good for recall • noisy, drifting • not robust enough for high precision
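The alternation between facts and patterns can be sketched as a small bootstrapping loop. This is an illustration only: patterns are taken to be the literal text between the two arguments, names are single capitalized words, and the sentences are hypothetical. Real harvesters generalize patterns and score them statistically.

```python
import re

NAME = r"([A-Z]\w+)"  # assumption: an entity name is one capitalized word


def harvest(sentences, seed_facts, rounds=2):
    facts, patterns = set(seed_facts), set()
    for _ in range(rounds):
        # 1) Known facts yield patterns: the text between the two arguments.
        for s in sentences:
            for x, y in facts:
                m = re.search(rf"\b{re.escape(x)}\b(.+?)\b{re.escape(y)}\b", s)
                if m:
                    patterns.add(m.group(1))
        # 2) Patterns yield new fact candidates.
        for s in sentences:
            for p in patterns:
                for m in re.finditer(NAME + re.escape(p) + NAME, s):
                    facts.add((m.group(1), m.group(2)))
    return facts, patterns


sentences = [
    "Hillary and her husband Bill attended.",
    "Carla and her husband Nicolas arrived.",
    "Yoko loves John.",
    "Larry loves Google.",
]
facts, patterns = harvest(sentences, {("Hillary", "Bill"), ("Yoko", "John")})
print(sorted(facts))
print(sorted(patterns))
```

The run picks up (Carla, Nicolas) through a reliable pattern but also (Larry, Google) through "loves", which is the noise and drift the slide warns about.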

  45. Reasoning about Fact Candidates. Use consistency constraints to prune false candidates. Ground atoms: spouse(Hillary,Bill), spouse(Carla,Nicolas), spouse(Cecilia,Nicolas), spouse(Carla,Ben), spouse(Carla,Mick), spouse(Carla,Sofie), f(Hillary), f(Carla), f(Cecilia), f(Sofie), m(Bill), m(Nicolas), m(Ben), m(Mick). FOL rules (restricted): spouse(x,y) ∧ diff(y,z) ⇒ ¬spouse(x,z); spouse(x,y) ∧ diff(w,y) ⇒ ¬spouse(w,y); spouse(x,y) ⇒ f(x); spouse(x,y) ⇒ m(y); spouse(x,y) ⇒ (f(x) ∧ m(y)) ∨ (m(x) ∧ f(y)). Rules reveal inconsistencies. Find consistent subset(s) of atoms (“possible world(s)”, “the truth”) • Rules can be weighted (e.g. by fraction of ground atoms that satisfy a rule) • uncertain / probabilistic data • compute prob. distr. of subset of atoms being the truth
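How the rules reveal inconsistencies can be shown by checking the ground atoms directly. This sketch covers only the first four rules (at most one spouse on either side; first argument female, second male), and the function name and return format are hypothetical.

```python
from itertools import combinations


def violations(spouse, female, male):
    """List the groups of atoms that cannot all be true under the rules."""
    conflicts = []
    for (x, y), (w, z) in combinations(sorted(spouse), 2):
        if x == w and y != z:      # spouse(x,y) ∧ diff(y,z) ⇒ ¬spouse(x,z)
            conflicts.append(((x, y), (w, z)))
        if y == z and x != w:      # spouse(x,y) ∧ diff(w,y) ⇒ ¬spouse(w,y)
            conflicts.append(((x, y), (w, z)))
    for x, y in sorted(spouse):    # spouse(x,y) ⇒ f(x) and spouse(x,y) ⇒ m(y)
        if x not in female or y not in male:
            conflicts.append(((x, y),))
    return conflicts


spouse = {("Hillary", "Bill"), ("Carla", "Nicolas"), ("Cecilia", "Nicolas"),
          ("Carla", "Ben"), ("Carla", "Mick"), ("Carla", "Sofie")}
female = {"Hillary", "Carla", "Cecilia", "Sofie"}
male = {"Bill", "Nicolas", "Ben", "Mick"}

for conflict in violations(spouse, female, male):
    print(conflict)
```

Only spouse(Hillary,Bill) is free of conflicts here. Choosing which of the conflicting atoms to keep is the optimization problem addressed by the probabilistic and MaxSat formulations on the following slides.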

  46. Markov Logic Networks (MLN‘s) (M. Richardson / P. Domingos 2006) Map logical constraints & fact candidates into probabilistic graph model: Markov Random Field (MRF). Constraints: s(x,y) ∧ diff(y,z) ⇒ ¬s(x,z); s(x,y) ∧ diff(w,y) ⇒ ¬s(w,y); s(x,y) ⇒ f(x); s(x,y) ⇒ m(y); f(x) ⇒ ¬m(x); m(x) ⇒ ¬f(x). Fact candidates: s(Carla,Nicolas), s(Cecilia,Nicolas), s(Carla,Ben), s(Carla,Sofie), … Grounding: Literal → Boolean Var; Literal → binary RV. Grounded clauses: ¬s(Ca,Nic) ∨ ¬s(Ce,Nic); ¬s(Ca,Nic) ∨ ¬s(Ca,Ben); ¬s(Ca,Nic) ∨ ¬s(Ca,So); ¬s(Ca,Ben) ∨ ¬s(Ca,So); ¬s(Ca,Nic) ∨ m(Nic); ¬s(Ce,Nic) ∨ m(Nic); ¬s(Ca,Ben) ∨ m(Ben); ¬s(Ca,So) ∨ m(So)

  47. Markov Logic Networks (MLN‘s) (M. Richardson / P. Domingos 2006) Map logical constraints & fact candidates into probabilistic graph model: Markov Random Field (MRF). Constraints: s(x,y) ∧ diff(y,z) ⇒ ¬s(x,z); s(x,y) ∧ diff(w,y) ⇒ ¬s(w,y); s(x,y) ⇒ f(x); s(x,y) ⇒ m(y); f(x) ⇒ ¬m(x); m(x) ⇒ ¬f(x). Fact candidates: s(Carla,Nicolas), s(Cecilia,Nicolas), s(Carla,Ben), s(Carla,Sofie), … RVs coupled by MRF edge if they appear in same clause. [Figure: MRF over s(Ce,Nic), m(Nic), s(Ca,Nic), s(Ca,Ben), m(Ben), s(Ca,So), m(So)] MRF assumption: P[Xi|X1..Xn] = P[Xi|N(Xi)]; joint distribution has product form over all cliques. Variety of algorithms for joint inference: Gibbs sampling, other MCMC, belief propagation, randomized MaxSat, …
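For reference, the "product form over all cliques" can be written out as follows. The notation is the standard one from the MRF and MLN literature rather than from the slide itself: \phi_C is the potential of clique C, Z the normalizing constant, w_i the weight of formula i, and n_i(x) the number of true groundings of formula i in world x.

```latex
% Generic MRF: the joint distribution factorizes over the cliques C of the graph
P(X = x) \;=\; \frac{1}{Z} \prod_{C} \phi_C(x_C),
\qquad
Z \;=\; \sum_{x'} \prod_{C} \phi_C(x'_C)

% Markov Logic Network: one feature per ground formula, weight shared per formula
P(X = x) \;=\; \frac{1}{Z} \exp\Big( \sum_{i} w_i \, n_i(x) \Big)
```

A world that violates more groundings of a highly weighted formula thus becomes exponentially less probable, but not impossible.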

  48. Related Alternative Probabilistic Models. Constrained Conditional Models [D. Roth et al. 2007]: log-linear classifiers with constraint-violation penalty, mapped into Integer Linear Programs. Factor Graphs with Imperative Variable Coordination [A. McCallum et al. 2008]: RV‘s share “factors” (joint feature functions); generalizes MRF, BN, CRF, …; inference via advanced MCMC; flexible coupling & constraining of RV‘s. [Figure: factor graph over s(Ce,Nic), m(Nic), s(Ca,Nic), s(Ca,Ben), m(Ben), s(Ca,So), m(So)] Software tools: alchemy.cs.washington.edu code.google.com/p/factorie/ research.microsoft.com/en-us/um/cambridge/projects/infernet/

  49. Reasoning for KB Growth: Direct Route (F. Suchanek et al.: WWW‘09) Facts in KB: married (Hillary, Bill), married (Carla, Nicolas), married (Angelina, Brad). New fact candidates: married (Cecilia, Nicolas), married (Carla, Benjamin), married (Carla, Mick), married (Carla, Sofie), married (Larry, Google). Patterns: X and her husband Y; X and Y and their children; X has been dating with Y; X loves Y. Direct approach: • facts are true; fact candidates & patterns → hypotheses • grounded constraints → clauses with hypotheses as vars • cast into Weighted Max-Sat with weights from pattern stats • customized approximation algorithm • unifies: fact cand consistency, pattern goodness, entity disambig. www.mpi-inf.mpg.de/yago-naga/sofie/
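The reduction to Weighted Max-Sat can be made concrete on a toy instance. This sketch uses brute-force enumeration, which is not the customized approximation algorithm the slide refers to, and all weights are hypothetical. Since married(Carla, Nicolas) is a KB fact and therefore true, each grounded functional constraint shrinks to a unit clause that negates the conflicting hypothesis.

```python
from itertools import product


def weighted_max_sat(variables, clauses):
    """Brute-force Weighted Max-Sat.

    clauses: list of (weight, [(variable, wanted_truth_value), ...]);
    a clause is satisfied if any of its literals holds.
    """
    best, best_weight = None, -1.0
    for values in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, values))
        weight = sum(w for w, lits in clauses
                     if any(assign[v] == wanted for v, wanted in lits))
        if weight > best_weight:
            best, best_weight = assign, weight
    return best, best_weight


variables = ["married(Cecilia,Nicolas)", "married(Carla,Benjamin)",
             "married(Michelle,Barack)"]
clauses = [
    # Grounded constraints (high weight): conflict with KB fact married(Carla,Nicolas).
    (10.0, [("married(Cecilia,Nicolas)", False)]),
    (10.0, [("married(Carla,Benjamin)", False)]),
    # Pattern evidence (hypothetical weights derived from pattern statistics).
    (2.0, [("married(Cecilia,Nicolas)", True)]),
    (1.0, [("married(Carla,Benjamin)", True)]),
    (3.0, [("married(Michelle,Barack)", True)]),
]
print(weighted_max_sat(variables, clauses))
```

Enumeration is exponential in the number of hypotheses, so it only serves to show what the objective rewards: the unconflicted candidate is accepted and the two that clash with the KB are rejected.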

  50. Facts & Patterns Consistency (F. Suchanek et al.: WWW‘09) Constraints to connect facts, fact candidates, patterns • functional dependencies: spouse(X,Y): X → Y, Y → X • relation properties: asymmetry, transitivity, acyclicity, … • type constraints, inclusion dependencies: spouse ⊆ Person × Person; capitalOfCountry ⊆ cityOfCountry • pattern-fact duality: occurs(p,x,y) ∧ expresses(p,R) ⇒ R(x,y); occurs(p,x,y) ∧ R(x,y) ⇒ expresses(p,R) • domain-specific constraints: bornInYear(x) + 10 years ≤ graduatedInYear(x); hasAdvisor(x,y) ∧ graduatedInYear(x,t) ∧ graduatedInYear(y,s) ⇒ s < t • name(-in-context)-to-entity mapping: ¬means(n,e1) ∨ ¬means(n,e2) ∨ … www.mpi-inf.mpg.de/yago-naga/sofie/