
BBN AQUA

BBN AQUA. Scott Miller & Ralph Weischedel, 13 June 2002. BBN’s Approach. Theme: use statistical learning algorithms for document retrieval, entity recognition, and proposition recognition. Mechanism: analyze the question; reduce the question to propositions, entities, and a bag of words.


Presentation Transcript


  1. BBN AQUA Scott Miller & Ralph Weischedel 13 June 2002

  2. BBN’s Approach
     • Theme: Use statistical learning algorithms for document retrieval, entity recognition, and proposition recognition
     • Mechanism
       • Analyze the question
       • Reduce the question to propositions, entities, and a bag of words
       • Predict the type of the answer
       • Retrieve passages using document retrieval based on the propositional components and the bag of words
       • Find the most relevant (in the document retrieval sense) passage(s) that contain an answer of the right type and satisfy the propositions
       • Return the answer and the document
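The mechanism on this slide can be illustrated with a minimal sketch. It is not BBN's system: the `Passage` class, the stopword list, and the rule-based `predict_answer_type` are hypothetical stand-ins for the trained components the slides describe, and the proposition-matching step is left out entirely. Retrieval here is plain word overlap.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str
    text: str
    entities: dict  # hypothetical: entity string -> type label

# Hypothetical stopword list, only large enough for the toy example.
STOPWORDS = {"who", "what", "where", "when", "is", "was", "the", "a", "of", "did"}

def bag_of_words(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    tokens = [t.strip(".,?!\"'").lower() for t in text.split()]
    return {t for t in tokens if t and t not in STOPWORDS}

def predict_answer_type(question):
    """Toy rule standing in for a trained question classifier."""
    first = question.split()[0].lower()
    return {"who": "PERSON", "where": "LOCATION", "when": "DATE"}.get(first, "OTHER")

def answer(question, passages):
    """Return (answer, doc_id) from the best-overlapping passage that
    contains an entity of the predicted type, or None."""
    q_words = bag_of_words(question)
    wanted = predict_answer_type(question)
    best = None
    for p in passages:
        score = len(q_words & bag_of_words(p.text))
        candidates = [e for e, t in p.entities.items() if t == wanted]
        if score > 0 and candidates and (best is None or score > best[0]):
            best = (score, candidates[0], p.doc_id)
    return None if best is None else (best[1], best[2])

passages = [
    Passage("d1", "Colin Powell became chairman of the Joint Chiefs of Staff.",
            {"Colin Powell": "PERSON"}),
    Passage("d2", "The Joint Chiefs met in Washington.",
            {"Washington": "LOCATION"}),
]
print(answer("Who became chairman of the Joint Chiefs?", passages))
# -> ('Colin Powell', 'd1')
```

The type filter is what keeps passage `d2` from answering a "who" question even though it shares words with it.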

  3. Frequency of Q Types Defined Types

  4. Question Classification
     • Developed initial statistically trainable classifier
     • Offers language independence
     • Collecting and annotating training data by type
       • More than 2,000 questions annotated to date
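The slide does not say which statistical model the classifier uses. As an illustration only, the sketch below uses a multinomial Naive Bayes model with add-one smoothing, one common choice for a trainable text classifier. The class name, the type labels, and the six training questions are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesQuestionClassifier:
    """Multinomial Naive Bayes over question words (an assumed model,
    not necessarily the one used in the presentation)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # type -> word -> count
        self.type_counts = Counter()             # type -> number of questions
        self.vocab = set()

    @staticmethod
    def tokenize(question):
        tokens = (t.strip("?.,!").lower() for t in question.split())
        return [t for t in tokens if t]

    def train(self, labeled_questions):
        for question, qtype in labeled_questions:
            self.type_counts[qtype] += 1
            for word in self.tokenize(question):
                self.word_counts[qtype][word] += 1
                self.vocab.add(word)

    def classify(self, question):
        total = sum(self.type_counts.values())
        best_type, best_logp = None, -math.inf
        for qtype, n in self.type_counts.items():
            logp = math.log(n / total)  # log prior
            denom = sum(self.word_counts[qtype].values()) + len(self.vocab)
            for word in self.tokenize(question):
                # add-one smoothing so unseen words do not zero out a type
                logp += math.log((self.word_counts[qtype][word] + 1) / denom)
            if logp > best_logp:
                best_type, best_logp = qtype, logp
        return best_type

# Hypothetical annotated questions; the real training set is not shown.
training = [
    ("Who is the president of France?", "PERSON"),
    ("Who wrote Hamlet?", "PERSON"),
    ("Where is Mount Everest?", "LOCATION"),
    ("Where was Mozart born?", "LOCATION"),
    ("When did World War II end?", "DATE"),
    ("When was the telephone invented?", "DATE"),
]
clf = NaiveBayesQuestionClassifier()
clf.train(training)
print(clf.classify("Who discovered penicillin?"))
```

Because the model only counts words per type, retraining it for another language needs nothing more than annotated questions in that language, which is one way to read the slide's "language independence" point.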

  5. Spotting Answers via IdentiFinder
     • Same categories as the question-answer types, except no
       • Reason
       • Definition
       • Use
       • Biography
       • Cause-Effect-Influence
       • Other

  6. Status
     • Singly annotated 1M-word Treebank by types and subtypes for names and descriptions
     • Current IdentiFinder performance
     • IdentiFinder easily trainable for other languages, e.g., Arabic and Chinese

  7. Spotting Answers via SIFT Parser
     • Parse a ‘paragraph’ that is relevant
     • Find the following structures that involve the person/thing to be described
       • Appositive
         • Lt. Gen. Colin L. Powell
         • Colin Powell, outgoing chairman of the US Joint Chiefs of Staff,
       • Copula
         • Colin L. Powell became the first black to serve as White House national security adviser
     • Select the appropriate description
     • SIFT parser easily trainable for other languages where a treebank is under development, e.g., Arabic and Chinese

  8. What is Proposition Indexing
     • A shallow semantic representation
       • Deeper than bags of words
       • But broad enough to cover all the text
     • Characterizes documents by
       • The entities they contain
       • Propositions involving those entities
     • Resolves all references to entities
       • Whether named, described, or pronominal
     • Represents all propositions that are directly stated in the text

  9. Why Proposition Indexing
     • Question
       • Who is Ayman al-Zawahri?
     • Source text
       Osama bin Laden, his top deputy and a man identified as a Sept. 11 attacker were shown in a brief video aired Monday by the pan-Arab satellite station Al-Jazeera. It wasn't clear when the tapes were made. The deputy, Ayman al-Zawahri, was shown claiming the Sept. 11 attacks as a “great victory.”
     • Answer
       • Osama bin Laden’s deputy.

  10. Proposition Example
     • Osama bin Laden, his top deputy and a man identified as a Sept. 11 attacker were shown in a brief video aired Monday by the pan-Arab satellite station Al-Jazeera. It wasn't clear when the tapes were made.
     • The deputy, Ayman al-Zawahri, was shown claiming the Sept. 11 attacks as a “great victory.”
     Propositions:
     (shown lobj:e2 in:e1) … (video e1) (brief e1) (*person* e2) (*name* e2 “Osama bin Laden”) (*person* e3) (deputy e3 of:e2) (*person* e4) (man e4) … (*name* e3 “Ayman al-Zawahri”)
     ANSWER
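To show how the propositions on this slide could yield the answer “Osama bin Laden’s deputy,” here is a minimal sketch. The tuple layout `(predicate, entity, roles)` and the helper names are assumptions made for illustration; the slides do not specify how propositions are stored or queried. The `…` gaps in the slide's list are left out rather than filled in.

```python
# Each proposition: (predicate, main entity or None, {role: value}).
# Contents are copied from the slide; the layout is a hypothetical encoding.
propositions = [
    ("shown", None, {"lobj": "e2", "in": "e1"}),
    ("video", "e1", {}),
    ("brief", "e1", {}),
    ("*person*", "e2", {}),
    ("*name*", "e2", {"value": "Osama bin Laden"}),
    ("*person*", "e3", {}),
    ("deputy", "e3", {"of": "e2"}),
    ("*person*", "e4", {}),
    ("man", "e4", {}),
    ("*name*", "e3", {"value": "Ayman al-Zawahri"}),
]

def entity_for_name(name):
    """Find the entity id carrying the given name, if any."""
    for pred, entity, roles in propositions:
        if pred == "*name*" and roles.get("value") == name:
            return entity
    return None

def name_of(entity):
    """Find the name attached to an entity id, if any."""
    for pred, ent, roles in propositions:
        if pred == "*name*" and ent == entity:
            return roles["value"]
    return None

def describe(name):
    """Collect descriptions of the named entity from non-starred predicates."""
    entity = entity_for_name(name)
    descriptions = []
    for pred, ent, roles in propositions:
        if ent != entity or pred.startswith("*"):
            continue
        owner = name_of(roles["of"]) if "of" in roles else None
        descriptions.append(f"{owner}'s {pred}" if owner else pred)
    return descriptions

print(describe("Ayman al-Zawahri"))
# -> ["Osama bin Laden's deputy"]
```

The lookup works only because the two mentions of the deputy ("his top deputy" and "The deputy, Ayman al-Zawahri") are resolved to the same entity `e3`, which is the reference-resolution step the slides call out.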

  11. Status
     • Annotated full UPenn 1M-word Treebank by semantic categories
       • Names
       • Descriptions
       • Quantity/numerical expressions
     • Received initial 100k-word PropBank additions to Treebank
     • Hypothesized an initial generative model

  12. Proposition Recognition Strategy
     • Start with a lexicalized, probabilistic (LPCFG) parsing model
     • Distinguish names by replacing NP labels with NPP
     • Extend the model to
       • Predict argument labels for clauses
       • Resolve references to entities
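The NP-to-NPP relabeling step can be sketched as a tree transform. The slide does not say how name NPs are identified in the training data, so this example assumes a simple criterion: an NP whose children are all proper-noun preterminals (Penn Treebank tags `NNP`/`NNPS`). The nested-tuple tree encoding and the function names are likewise hypothetical.

```python
PROPER_NOUN_TAGS = {"NNP", "NNPS"}

def is_name_preterminal(node):
    """True for a (tag, word) pair whose tag marks a proper noun."""
    return (
        isinstance(node, tuple)
        and len(node) == 2
        and isinstance(node[1], str)
        and node[0] in PROPER_NOUN_TAGS
    )

def relabel_names(tree):
    """Return a copy of the tree with name NPs relabeled as NPP.

    Trees are nested tuples: (label, child, child, ...); leaves are strings.
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    new_children = [relabel_names(child) for child in children]
    # Assumed criterion: every child of the NP is a proper-noun preterminal.
    if label == "NP" and children and all(is_name_preterminal(c) for c in children):
        label = "NPP"
    return (label, *new_children)

tree = (
    "S",
    ("NP", ("NNP", "Colin"), ("NNP", "Powell")),
    ("VP", ("VBD", "became"), ("NP", ("DT", "the"), ("NN", "chairman"))),
)
print(relabel_names(tree))
```

Only the subject NP changes label here; "the chairman" stays an ordinary NP, so a parser trained on such trees would see names and descriptions as distinct categories.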

  13. Extended Parse
     [Figure: parse tree with node labels S, VP, NP, NPP, SBAR, PP, and WHNP; entity labels E1 PER, E2 GPE, and E3 PER; and argument labels lsbj and lobj on the clauses headed by “led” and “ousted”. Leaf tokens as extracted: was ousted led Sharif who , Muscharraf 12 by , * Pakistan Pervez General October Nawaz Army Pakistani]

  14. Conclusions
     • Basis for long-term infrastructure underway
       • Training data for rich set of categories & subcategories on million-word treebank
       • IdentiFinder retrained on rich set of categories
       • Developed trainable question classifier
       • Collected and annotated questions by types
       • Parser able to process collection
     • Remaining steps in CY2002
       • Improve system performance
       • Participate in end-to-end evaluation in TREC QA 2002
       • Develop(ing) proposition recognition while awaiting PropBank
       • Prepare for pilot AQUAINT evaluation
       • Develop model of important, non-redundant information
       • Integrate proposition recognition
