
On WordNet, Text Mining, and Knowledge Bases of the Future

On WordNet, Text Mining, and Knowledge Bases of the Future. Peter Clark March 2006 Knowledge Systems Boeing Phantom Works. Introduction. Interested in text understanding & question-answering use of world knowledge to go beyond text Used WordNet as (part of) the knowledge repository


Presentation Transcript


  1. On WordNet, Text Mining, and Knowledge Bases of the Future Peter Clark March 2006 Knowledge Systems Boeing Phantom Works

  2. Introduction • Interested in text understanding & question-answering • use of world knowledge to go beyond text • Used WordNet as (part of) the knowledge repository • got some leverage • can we get more? • what would a WordNet KB look like?

  3. Outline • Machine understanding and question-answering • An initial attempt • From WordNet to a Knowledge Base • Representation • Reasoning • Text Mining for Possibilistic Knowledge • A Knowledge Base of the Future?

  4. Outline • Machine understanding and question-answering • An initial attempt • From WordNet to a Knowledge Base • Representation • Reasoning • Text Mining for Possibilistic Knowledge • A Knowledge Base of the Future?

  5. On Machine Understanding • Consider “China launched a meteorological satellite into orbit Wednesday, the first of five weather guardians to be sent into the skies before 2008.” • Suggests: • there was a rocket launch • China owns the satellite • the satellite is for monitoring weather • the orbit is around the Earth • etc. None of these is explicitly stated in the text

  6. On Machine Understanding • Understanding = creating a situation-specific model (SSM), coherent with data & background knowledge • Data suggests background knowledge which may be appropriate • Background knowledge suggests ways of interpreting data [Diagram: fragmentary, ambiguous inputs → coherent model (situation-specific)]

  7. On Machine Understanding [Diagram: fragmentary, ambiguous inputs → coherent model (situation-specific), via assembly of pieces, assessment of coherence, and inference, drawing on world knowledge]

  8. On Machine Understanding World Knowledge • Conjectures about the nature of the beast: • “Small” number of core theories • space, time, movement, … • can encode directly • Large amount of “mundane” facts • a dictionary contains many of these facts

  9. Outline • Machine understanding and question-answering • An initial attempt • From WordNet to a Knowledge Base • Representation • Reasoning • Text Mining for Possibilistic Knowledge • A Knowledge Base of the Future?

  10. Caption-Based Video Retrieval [Diagram: English captions describing a video segment (partial, ambiguous) → coherent representation of the scene (elaborated, disambiguated), drawing on world knowledge; the representation supports question-answering, search, etc.]

  11. Illustration: Caption-Based Video Retrieval • Captions (manual authoring): “A lever is rotated to the unarmed position”, “…”, “A man opens an airplane door”, “…” • Interpretation of caption text: Open, with agent Man and object Door (Airplane Door) • Elaboration (inference, scene-building) using World Knowledge: Door is-part-of Airplane • Query: Person, Touch, Door • Search over the elaborated representations retrieves the video

  12. Semantic Retrieval • Query: “A person walking” → Result: “A man carries a box across a room” • “Someone injured” → “An employee was drilling a hole in a piece of wood. The drill bit of the drill broke. The drill twisted out of the employee's right hand. The drill injured the employee's right thumb.” • “An object was damaged” → the above caption (x 2) → “Someone broke the side mirrors of a Boeing truck.”

  13. The Knowledge Base • Representation: • Horn-Clause rules • plus add/delete lists for “before” and “after” rules • Authored in simplified English • NLP system interactively translates to logic • WordNet + UT Austin relations as the ontology • ~1000 rules authored • just a drop in the ocean! • Reasoning: • depth-limited forward chaining • precondition/effects just asserted (no sitcalc simulation)

  14. Some of the Rules in the KB: IF a person is carrying an entity that is inside a room THEN (almost) always the person is in the room. IF a person is picking an object up THEN (almost) always the person is holding the object. IF an entity is near a 2nd entity AND the 2nd entity contains a 3rd entity THEN usually the 1st entity is near the 3rd entity. ABOUT boxes: usually a box has a lid. BEFORE a person gives an object, (almost) always the person possesses the object. AFTER a person closes a barrier, (almost) always the barrier is shut. …1000 more…

  15. Some of the Rules in the KB: IF a person is carrying an entity that is inside a room THEN (almost) always the person is in the room. isa(_Person1, person_n1), isa(_Carry1, carry_v1), isa(_Entity1, entity_n1), isa(_Room1, room_n1), agent(_Carry1, _Person1), object(_Carry1, _Entity1), is-inside(_Entity1, _Room1), ==== (almost) always ===> is-inside(_Person1, _Room1).
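The rule on slide 15, together with the "depth-limited forward chaining" mentioned on slide 13, can be illustrated with a small Python sketch. This is not the Boeing system's code: the tuple encoding of facts, the function names, and the instance names (man1, box1, and so on) are assumptions made for illustration, the "(almost) always" strength is ignored, and type membership is asserted directly rather than derived from the WordNet taxonomy.

```python
def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` equals `fact`; return None on failure."""
    if len(pattern) != len(fact):
        return None
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("_"):            # variables follow the slide's _Person1 style
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def all_bindings(body, facts, binding=None):
    """Yield every variable binding that satisfies all literals in `body`."""
    binding = binding or {}
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from all_bindings(body[1:], facts, extended)


def forward_chain(facts, rules, max_depth=3):
    """Apply Horn rules (body, head) for at most `max_depth` rounds."""
    facts = set(facts)
    for _ in range(max_depth):
        derived = set()
        for body, head in rules:
            for b in all_bindings(body, facts):
                new_fact = tuple(b.get(term, term) for term in head)
                if new_fact not in facts:
                    derived.add(new_fact)
        if not derived:                  # fixed point reached early
            break
        facts |= derived
    return facts


# The slide-15 rule, encoded as (body literals, head literal).
CARRY_RULE = (
    [("isa", "_Person1", "person_n1"), ("isa", "_Carry1", "carry_v1"),
     ("isa", "_Entity1", "entity_n1"), ("isa", "_Room1", "room_n1"),
     ("agent", "_Carry1", "_Person1"), ("object", "_Carry1", "_Entity1"),
     ("is-inside", "_Entity1", "_Room1")],
    ("is-inside", "_Person1", "_Room1"),
)

# Hypothetical scene facts for "A man carries a box across a room".
SCENE = [
    ("isa", "man1", "person_n1"), ("isa", "carry1", "carry_v1"),
    ("isa", "box1", "entity_n1"), ("isa", "room1", "room_n1"),
    ("agent", "carry1", "man1"), ("object", "carry1", "box1"),
    ("is-inside", "box1", "room1"),
]

result = forward_chain(SCENE, [CARRY_RULE])
```

With these assumed facts, `result` contains the derived fact `("is-inside", "man1", "room1")`; with `max_depth=0` nothing is derived.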

  16. Critique: 2 Big Questions Hanging • Representation: The Knowledge Base • Unscalable to build the KB from scratch • WordNet helped a lot • Could it be extended to help more? • What would that WordNet KB look like? • How could it be built? • Reasoning: • Deductive inference is insufficient • How to reason with large, noisy, uncertain knowledge?

  17. Outline • Machine understanding and question-answering • An initial attempt • From WordNet to a Knowledge Base • Representation • Reasoning • Text Mining for Possibilistic Knowledge • A Knowledge Base of the Future?

  18. What Knowledge Do We Need? "A dawn bomb attack devastated a major Shiite shrine in Iraq..." We would like the system to infer that: The bomb exploded The explosion caused the devastation The shrine was damaged … System needs to know: Bombs can explode Explosions can destroy things Destruction ≈ devastation Attacks are usually done by people …

  19. What Knowledge Do We Need? "Israeli troops were engaged in a fierce gun battle with militants in a West Bank town. An Israeli soldier was killed." We would like the system to infer that: There was a fight. The soldier died. The soldier was shot. The soldier was a member of the Israeli troops. … System needs to know: A battle involves a fight. Soldiers use guns. Guns can kill. If you are killed you are dead. Soldiers belong to troops …

  20. WordNet (Princeton Univ) • Is not a word net; is a concept net • 117,000 lexically motivated concepts (synsets) • organized into a taxonomy (hypernymy) • massively used in AI (~7000 downloads/month) 201173984 201378060: "shuffle", "ruffle", "mix": (mix so as to make a random order or arrangement; "shuffle the cards") 201174946 superclass / genls / supertype 201378060

  21. WordNet (Princeton Univ) • Is not a word net; is a concept net • 117,000 lexically motivated concepts (synsets) • organized into a taxonomy (hypernymy) • massively used in AI (~7000 downloads/month) handle_v4 mix_v6: "shuffle", "ruffle", "mix": (mix so as to make a random order or arrangement; "shuffle the cards") manipulate_v2 superclass / genls / supertype mix_v6
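The "concept net" idea on slides 20 and 21 (synsets linked by hypernymy) can be sketched with a tiny hand-coded fragment. The dictionary layout and function names are illustrative assumptions, and the chain mix_v6 → manipulate_v2 → handle_v4 is one reading of the flattened diagram, not something the transcript states outright.

```python
# Hand-coded fragment of the taxonomy; the ordering of the chain is an assumed
# reading of the slide's diagram.
HYPERNYM = {
    "mix_v6": "manipulate_v2",
    "manipulate_v2": "handle_v4",
}

# A synset groups the words that express one concept.
SYNSET_WORDS = {
    "mix_v6": ["shuffle", "ruffle", "mix"],
}


def ancestors(synset):
    """Return the hypernym chain above `synset`, nearest first."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain


def isa(synset, candidate):
    """True if `candidate` is `synset` itself or one of its hypernyms."""
    return synset == candidate or candidate in ancestors(synset)
```

Under that assumed chain, `isa("mix_v6", "handle_v4")` holds while the reverse does not, which is the taxonomic inference a reasoner gets "for free" from the hypernym links.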

  22. The Evolution of WordNet lexical resource • v1.0 (1986) • synsets (concepts) + hypernym (isa) links • v1.7 (2001) • add in additional relationships • has-part • causes • member-of • entails-doing (“subevent”) • v2.0 (2003) • introduce the instance/class distinction • Paris isa Capital-City is-type-of City • add in some derivational links • explode related-to explosion • … • v10.0 (2010?) • ????? knowledge base?

  23. WordNet as a Knowledge Base Got: just “isa” and “part-of” knowledge But still need: I. Axioms about each concept • From definitions and examples (?) • shallow extraction has been done (LCC and ISI) • getting close to useful logic II. Relational vocabulary (case roles, semantic relations) • could take from: FrameNet, Cyc, UT Austin III. Relations between word senses: • bank (building) vs. bank (institution) vs. bank (staff) • cut (separate) vs. cut (sweeping motion)

  24. I. Knowledge in the word sense definitions: How much knowledge is in WordNet? • Ide & Veronis: • dictionaries have no broad contextual/world knowledge • e.g., no connection between “lawn” and “house” • Not true! WN1.6 1 sense of lawn Sense 1 lawn#1 -- (a field of cultivated and mowed grass) -> field#1 -- (a piece of land cleared of trees and usually enclosed) => yard#2, grounds#2 -- (the land around a house or other building; "it was a small house with almost no yard at all") WN1.7.1 garden -- (a yard or lawn adjoining a house)

  25. I. Knowledge in the word sense definitions: How much knowledge is in WordNet? "lawn". WordNet seems to "know", among other things, that lawns • need watering • can have games played on them • can be flattened, mowed • can have chairs on them and other furniture • can be cut/mowed • things grow on them • have grass ("lawn grass" is a common compound) • leaves can get on them • can be seeded

  26. I. Knowledge in the word sense definitions: How much knowledge is in WordNet? "accident" (ignoring derivatives like "accidentally") • accidents can block traffic • you can be prone to accidents • accidents happen • result from violent impact; passengers can be killed • involve vehicles, e.g., trains • results in physical damage or hurt, or death • there are victims • you can be blamed for accidents

  27. Let’s take a look…

  28. I. Knowledge in the word sense definitions: Generating Logic from Glosses • Definitions appear deceptively simple • really, huge representational challenges underneath hammer_n2:(a hand tool with a heavy rigid head and a handle; used to deliver an impulsive force by striking) launch_v3: (launch for the first time; "launch a ship") cut_n1: (the act of reducing the amount or number) love_v1: (have a great affection or liking for) theater_n5: (a building where theatrical performances can be held) • Want logic to be faithful but also simple (usable) • Claim: We can get away with a “shallow” encoding • all knowledge as Horn clauses • some loss of fidelity • gain in syntactic simplicity and reusability

  29. I. Knowledge in the word sense definitions: Simplifying • “Timeless” representations • No tagging of facts with situations • Representation doesn’t handle change break_v4: (render inoperable or ineffective; "You broke the alarm clock when you took it apart!") ∀x,y isa(x,Break_v4) & isa(y,Thing) & object(x,y) → property(y,Inoperable) [Graph: Break --object--> Thing --property--> Inoperable]

  30. I. Knowledge in the word sense definitions: Simplifying • For statements about types, use instances instead: “hammer_n2: (… used to deliver an impulsive force by striking)” ∀x isa(x,Hammer_n2) → ∃d,f,s,y,z … & isa(d, Deliver_v9) & isa(s, Hit_v2) & isa(f, Force_n3) & purpose(x, d) & object(d, f) & subevent(d, s). [Graph nodes: Hammer, Deliver, Force, Strike, Handle, Head, Rigid, Heavy; edge labels: purpose, object, subevent, has-part, property] Strictly, should be purpose(x,Deliver-Impulsive-Force)
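One way to make the "use instances instead" idea concrete is to replace each existentially quantified variable with a fresh constant (Skolemization) whenever the axiom is applied to a hammer instance. The sketch below covers only the literals shown on the slide; the elided part ("…", and the variables y and z) is left out rather than guessed. The data layout, the `instantiate` function, and the `_sk` naming scheme are assumptions for illustration.

```python
from itertools import count

# Only the literals visible on the slide; the elided ones are deliberately omitted.
HAMMER_AXIOM = {
    "concept": "Hammer_n2",
    "exists": ["d", "f", "s"],
    "head": [
        ("isa", "d", "Deliver_v9"),
        ("isa", "s", "Hit_v2"),
        ("isa", "f", "Force_n3"),
        ("purpose", "x", "d"),
        ("object", "d", "f"),
        ("subevent", "d", "s"),
    ],
}


def instantiate(axiom, instance, counter):
    """Apply the axiom to `instance`, minting fresh constants for the
    existential variables so that each hammer gets its own deliver event."""
    n = next(counter)
    binding = {"x": instance}
    for var in axiom["exists"]:
        binding[var] = f"{var}_sk{n}"    # hypothetical Skolem-constant naming
    return [tuple(binding.get(term, term) for term in literal)
            for literal in axiom["head"]]


counter = count(1)
hammer_facts = instantiate(HAMMER_AXIOM, "hammer7", counter)
```

Here `hammer7` is a made-up instance name. The resulting facts are ground, so they can be stored and matched like any other facts, which is the "syntactic simplicity" the slides trade some fidelity for.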

  31. II: Relational Vocabulary • Is this enough? • No, also need relational vocabulary • Which relational vocabulary to use? • agent, patient, has-part, contains, destination, … • Possible sources: • UT Austin’s Slot Dictionary (~100 relations) • Cyc (~1000 relations) • FrameNet (??)

  32. III. Relations between word senses: Nouns School_n1: an institution School_n2: a building School_n3: the process of being educated School_n4: staff and students School_n5: a time period of instruction • Nouns often have multiple, related senses • Reasoner needs to know these are related The school declared that the teacher’s strike was over. Students should arrive at 9:15am tomorrow morning. School_n1 (institution) School_n2 (building) School_n4 (staff,students)

  33. III. Relations between word senses: Nouns • Can hand-code these relationships (slow) constituent staff and students (School_n4) institution (School_n1) participants constituent educational process (School_n3) building (School_n2) location during time period of instruction (School_n5)

  34. III. Relations between word senses: Nouns • Can hand-code these relationships (slow) • BUT: The patterns repeat (Buitelaar) constituent staff and students (School_n4) institution (School_n1) participants constituent educational process (School_n3) building (School_n2) location during time period of instruction (School_n5) constituent institution members constituent participants building process location during time period

  35. III. Relations between word senses: Nouns • Can hand-code these relationships (slow) • BUT: The patterns repeat (Buitelaar) • can encode and reuse the patterns constituent staff and students (School_n4) institution (School_n1) participants constituent educational process (School_n3) building (School_n2) location during time period of instruction (School_n5) constituent institution members constituent participants building process location during time period
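"Encode and reuse the patterns" could look roughly like the following: the pattern is written once over role names and then instantiated with the senses of a particular noun. The edge directions below are one reading of the flattened diagram and should be treated as assumptions, as should the role names and the `instantiate_pattern` helper.

```python
# Pattern over abstract roles; edge directions are an assumed reading of the slide.
INSTITUTION_PATTERN = [
    ("institution", "constituent", "members"),
    ("process", "participants", "members"),
    ("process", "location", "building"),
    ("process", "during", "time_period"),
]

# Role assignment for the five senses of "school" listed on slide 32.
SCHOOL_ROLES = {
    "institution": "School_n1",
    "building": "School_n2",
    "process": "School_n3",
    "members": "School_n4",
    "time_period": "School_n5",
}


def instantiate_pattern(pattern, roles):
    """Turn role-level edges into sense-level edges, skipping any edge whose
    roles the noun does not have a sense for."""
    return [(roles[a], rel, roles[b])
            for a, rel, b in pattern
            if a in roles and b in roles]


school_links = instantiate_pattern(INSTITUTION_PATTERN, SCHOOL_ROLES)
```

A noun that lacks some senses (say, no separate "time period" sense) would simply yield fewer links, so the same pattern can be reused without hand-coding each word.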

  36. III. Relations between word senses: Verbs • WordNet’s verb senses: • 41 senses of “cut” • linguistically not representationally motivated • “cut grass” (cut_v18) ≠ “cut timber” (cut_v31) ≠ “cut grain” (cut_v28) (“mow”, “chop”, “harvest”) • cut_v1 (separate) ≠ cut_v3 (slicing movement) • fails to capture commonality • Better: • Organize verbs into “mini taxonomy” • “Supersenses”, to group same meanings • Identify facets of verbs, use multiple inheritance • result of action • style of action

  37. Outline • Machine understanding and question-answering • An initial attempt • From WordNet to a Knowledge Base • Representation • Reasoning • Text Mining for Possibilistic Knowledge • A Knowledge Base of the Future?

  38. The Myth of Common-Sense: All you need is knowledge… “We don’t believe that there’s any shortcut to being intelligent; the “secret” is to have lots of knowledge.” Lenat & Guha ‘86 “Knowledge is the primary source of the intellectual power of intelligent agents, both human and computer.” Feigenbaum ‘96

  39. The Myth of Common-Sense • Common, implicit assumption (belief?) in AI: • Knowledge is the key to intelligence • Acquisition of knowledge is bottleneck • Spawned from: • ’70s experience with Expert Systems • Feigenbaum’s “knowledge acquisition bottleneck” • Introspection

  40. Thought Experiment… • Suppose we had • good logical translations of the WordNet definitions • good relational vocabulary • rich relationships between related word senses • How would these be used? • Would they be enough? • What else would be needed?

  41. Initial Scenario Sentence "A dawn bomb attack devastated a major Shiite shrine in Iraq..." [Graph nodes: Dawn, Attack, Devastate, Shrine, Bomb; edge labels: time, causes, instrument, object]

  42. One Elaboration Step (knowledge of bomb) "A dawn bomb attack devastated a major Shiite shrine in Iraq..." [Scenario graph nodes: Dawn, Attack, Devastate, Shrine, Bomb; edge labels: time, causes, instrument, object] From WordNet: “bomb: an explosive device fused to detonate” [Gloss graph nodes: Device, Bomb, Detonate, Explosive; edge labels: purpose, contains]

  43. One Elaboration Step (knowledge of bomb) "A dawn bomb attack devastated a major Shiite shrine in Iraq..." [Scenario graph nodes: Dawn, Attack, Devastate, Shrine, Bomb; edge labels: time, causes, instrument, object] From WordNet: “bomb: an explosive device fused to detonate” [Gloss graph nodes: Device, Bomb, Detonate, Explosive; edge labels: purpose, contains] [Merged graph nodes: Dawn, Attack, Devastate, Shrine, Bomb, Device, Detonate, Explosive; edge labels: causes, instrument, object, purpose, contains]

  44. Additional, Relevant Knowledge in WordNet • “bomb: an explosive device fused to detonate” • “bombing: the use of bombs for sabotage; a tactic frequently used by terrorists” • “destroy: damage irreparably” • “plastic explosive: an explosive material …intended to destroy” • “explode: destroy by exploding” [Gloss graph nodes: Device, Bomb, Detonate, Explosive, Bombing, Terrorist, Destroy, Damage, Explode; edge labels: purpose, contains, agent, instrument, causes]

  45. Multiple Elaboration Steps "A dawn bomb attack devastated a major Shiite shrine in Iraq..." [Scenario graph nodes: Dawn, Attack, {Devastate, Destroy}, Shrine, Bomb; edge labels: time, causes, instrument, object] From WordNet: “bomb: an explosive device fused to detonate” [Elaborated graph nodes: Dawn, Attack, Devastate, Shrine, Bomb, {Detonate, Explode}, Explosive; edge labels: time, causes, instrument, object, contains]

  46. Multiple Elaboration Steps "A dawn bomb attack devastated a major Shiite shrine in Iraq..." [Scenario graph nodes: Dawn, Attack, {Devastate, Destroy}, Shrine, Bomb; edge labels: time, causes, instrument, object] From WordNet: “bomb: an explosive device fused to detonate” “bombing: the use of bombs for sabotage; a tactic frequently used by terrorists” [Elaborated graph nodes: Dawn, Attack, Devastate, Terrorist, Shrine, Bomb, {Detonate, Explode}, Explosive; edge labels: time, agent, causes, instrument, object, contains]

  47. Multiple Elaboration Steps "A dawn bomb attack devastated a major Shiite shrine in Iraq..." [Scenario graph nodes: Dawn, Attack, {Devastate, Destroy}, Shrine, Bomb; edge labels: time, causes, instrument, object] From WordNet: “bomb: an explosive device fused to detonate” “bombing: the use of bombs for sabotage; a tactic frequently used by terrorists” “plastic explosive: an explosive material …intended to destroy” [Elaborated graph nodes: Dawn, Attack, {Devastate, Destroy}, Terrorist, Shrine, Bomb, {Detonate, Explode}, Explosive; edge labels: time, agent, causes, instrument, object, purpose, contains]

  48. Multiple Elaboration Steps "A dawn bomb attack devastated a major Shiite shrine in Iraq..." [Scenario graph nodes: Dawn, Attack, {Devastate, Destroy}, Shrine, Bomb; edge labels: time, causes, instrument, object] From WordNet: “bomb: an explosive device fused to detonate” “bombing: the use of bombs for sabotage; a tactic frequently used by terrorists” “plastic explosive: an explosive material …intended to destroy” “destroy: damage irreparably” [Elaborated graph nodes: Dawn, Attack, {Devastate, Destroy}, Damage, Terrorist, Shrine, Bomb, {Detonate, Explode}, Explosive; edge labels: time, agent, causes (x 2), instrument, object, purpose, contains]

  49. Multiple Elaboration Steps "A dawn bomb attack devastated a major Shiite shrine in Iraq..." [Scenario graph nodes: Dawn, Attack, {Devastate, Destroy}, Shrine, Bomb; edge labels: time, causes, instrument, object] From WordNet: “bomb: an explosive device fused to detonate” “bombing: the use of bombs for sabotage; a tactic frequently used by terrorists” “plastic explosive: an explosive material …intended to destroy” “destroy: damage irreparably” “explode: destroy by exploding” [Elaborated graph nodes: Dawn, Attack, {Devastate, Destroy}, Damage, Terrorist, Shrine, Bomb, {Detonate, Explode}, Explosive; edge labels: time, agent, causes (x 3), instrument, object, purpose, contains]

  50. How this really works… • Pieces may not “fit together” so neatly • multiple ways of saying the same thing • Uncertainty at all stages of the process • definitions are often only typical facts • errors in both English and translations • Process is not a chain of deductions, rather • is a search of possible elaborations • looking for the most “coherent” elaboration • More “crystallization” rather than “deduction”
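The contrast between "a chain of deductions" and "a search of possible elaborations" can be illustrated with a deliberately crude sketch. It is not the method from the talk: it swaps in a greedy strategy that repeatedly merges whichever gloss fragment shares the most nodes with the growing scenario graph, using node overlap as a stand-in for "coherence". The triples are an assumed reading of the diagrams on slides 41 to 49, the synonym table mirrors the {Devastate, Destroy} and {Detonate, Explode} groupings, and the "lawn" fragment is added only as a distractor.

```python
# Assumed synonym classes, mapped to one canonical node name each.
SYNONYMS = {"Destroy": "Devastate", "Explode": "Detonate"}


def canon(node):
    return SYNONYMS.get(node, node)


def canon_triples(triples):
    return {(canon(s), rel, canon(o)) for s, rel, o in triples}


def nodes(triples):
    return {n for s, _, o in triples for n in (s, o)}


def elaborate(scenario, fragments):
    """Greedy stand-in for coherence search: merge the fragment with the
    largest node overlap until no remaining fragment connects at all."""
    graph = canon_triples(scenario)
    remaining = {name: canon_triples(t) for name, t in fragments.items()}
    used = []
    while remaining:
        def overlap(name):
            return len(nodes(graph) & nodes(remaining[name]))
        best = max(sorted(remaining), key=overlap)   # sorted() makes ties deterministic
        if overlap(best) == 0:
            break                                    # nothing left that fits
        graph |= remaining.pop(best)
        used.append(best)
    return graph, used


SCENARIO = [("Attack", "time", "Dawn"), ("Attack", "causes", "Devastate"),
            ("Attack", "instrument", "Bomb"), ("Devastate", "object", "Shrine")]

GLOSS_FRAGMENTS = {
    "bomb": [("Bomb", "isa", "Device"), ("Bomb", "purpose", "Detonate"),
             ("Bomb", "contains", "Explosive")],
    "bombing": [("Bombing", "agent", "Terrorist"), ("Bombing", "instrument", "Bomb")],
    "destroy": [("Destroy", "causes", "Damage")],
    "plastic explosive": [("Explosive", "purpose", "Destroy")],
    "explode": [("Explode", "causes", "Destroy")],
    "lawn": [("Lawn", "has-part", "Grass")],         # distractor, unrelated to the scene
}

graph, used = elaborate(SCENARIO, GLOSS_FRAGMENTS)
```

In this run every bomb-related fragment is merged and the "lawn" fragment is left out because it shares no node with the scene. A more faithful version would score whole candidate elaborations, weigh uncertain or conflicting facts, and merge nodes such as Bombing and Attack, none of which this sketch attempts.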
