
AeroDAML Applying Information Extraction to Generate DAML Annotations

Dr. Paul Kogut, Lockheed Martin Management & Data Systems.


Presentation Transcript


  1. AeroDAML: Applying Information Extraction to Generate DAML Annotations. Dr. Paul Kogut, Lockheed Martin Management & Data Systems

  2. What is Information Extraction?
     [Diagram: Information Extraction. Labels: Text or web pages; Linguistic Knowledge; Entities; Relationships; Co-references; Events]

  3. Extraction and Semantic Annotation
     • Consumer-side extraction - 3rd party text -> database
       • Advantages:
         • Applicable to raw documents (most of the web)
       • Disadvantages:
         • Must deal with full complexity of natural language
     • Semantic annotation proposed to overcome difficulty of consumer-side extraction - but annotation is labor intensive
     • Producer-side extraction - authored text -> annotation
       • Advantages:
         • Partial automation - reduces manual effort
         • Human-assisted disambiguation
         • Domain customization for intranets and B2B e-commerce
       • Disadvantages:
         • Requires manual effort to correct and add rich set of relationships
         • Domain customization requires up-front effort from the author/webmaster
     • Both types of extraction will coexist.

  4. AeroDAML Architecture
     [Diagram. Components: Text or web pages; Text Extraction; Extraction to DAML Translation; DAML Ontologies; DAML Annotator; UBOT Annotation Editor; DAML annotated text or web pages. Flow labels: basic annotation; refined annotation]

  5. Client-Server AeroDAML
     • Users:
       • personnel who routinely produce documents (e.g., intelligence analysts)
       • personnel who have a large collection of legacy documents

  6. Web-based AeroDAML
     • Users:
       • novice/infrequent DAML annotators
       • people who want to do quick/simple annotation of a web page

  7. AeroDAML Output: Entities

     <aac:ABSOLUTEDATE rdf:about="December19,1997">
       <daml:label><![CDATA[December 19, 1997]]></daml:label>
     </aac:ABSOLUTEDATE>
     <aac:AIRCRAFT rdf:about="Dash8Series400">
       <daml:label><![CDATA[Dash 8 Series 400]]></daml:label>
     </aac:AIRCRAFT>
     <aac:MEASURE rdf:about="61-foot">
       <daml:label><![CDATA[61-foot]]></daml:label>
     </aac:MEASURE>
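In the entity output above, each rdf:about value looks like the label text with its whitespace removed. A minimal Python sketch of that mapping follows; the function names entity_id and entity_annotation are hypothetical, and the whitespace rule is inferred from the three examples on the slide, not taken from AeroDAML itself.

```python
import re
from xml.sax.saxutils import quoteattr


def entity_id(text: str) -> str:
    """Build an identifier by deleting all whitespace from the entity text.

    Assumption: this mirrors the pattern visible in the slide's examples.
    """
    return re.sub(r"\s+", "", text)


def entity_annotation(entity_type: str, text: str) -> str:
    """Render one extracted entity in the element shape shown on the slide.

    The attribute value is escaped with quoteattr; text containing "]]>"
    would need extra handling inside CDATA, which this sketch omits.
    """
    return (
        f"<aac:{entity_type} rdf:about={quoteattr(entity_id(text))}>\n"
        f"  <daml:label><![CDATA[{text}]]></daml:label>\n"
        f"</aac:{entity_type}>"
    )


print(entity_annotation("AIRCRAFT", "Dash 8 Series 400"))
```

Calling entity_annotation("ABSOLUTEDATE", "December 19, 1997") yields an element whose rdf:about is December19,1997, matching the first example on the slide.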

  8. AeroDAML Output: Relationships

     <aac:NATION rdf:about="Austria">
       <daml:label><![CDATA[Austria]]></daml:label>
     </aac:NATION>
     <aac:ORGANIZATION rdf:about="TyroleanAirways">
       <aac:OrgToLoc rdf:resource="Austria"/>
       <daml:label><![CDATA[Tyrolean Airways]]></daml:label>
     </aac:ORGANIZATION>
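One way to read the relationship output above is as (subject, property, object) triples: the enclosing element's rdf:about is the subject, and each child carrying rdf:resource names a property and its target. The sketch below does this with Python's standard xml.etree.ElementTree. The slide omits namespace declarations, so the daml and aac URIs here are placeholders assumed for illustration; only the rdf URI is the standard one. The helper name relation_triples is also hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",  # standard RDF namespace
    "daml": "http://example.org/daml#",      # placeholder, not the real DAML URI
    "aac": "http://example.org/aerodaml#",   # placeholder for the AeroDAML ontology
}

SNIPPET = """
<aac:NATION rdf:about="Austria">
  <daml:label><![CDATA[Austria]]></daml:label>
</aac:NATION>
<aac:ORGANIZATION rdf:about="TyroleanAirways">
  <aac:OrgToLoc rdf:resource="Austria"/>
  <daml:label><![CDATA[Tyrolean Airways]]></daml:label>
</aac:ORGANIZATION>
"""


def relation_triples(fragment: str):
    """Return (subject, property, object) for every child with rdf:resource."""
    # Wrap the fragment in a root element that declares the prefixes.
    xmlns = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    root = ET.fromstring(f"<rdf:RDF {xmlns}>{fragment}</rdf:RDF>")
    about = f"{{{NS['rdf']}}}about"
    resource = f"{{{NS['rdf']}}}resource"
    triples = []
    for entity in root:
        subject = entity.get(about)
        for prop in entity:
            target = prop.get(resource)
            if target is not None:  # labels have no rdf:resource and are skipped
                local_name = prop.tag.split("}", 1)[1]
                triples.append((subject, local_name, target))
    return triples


TRIPLES = relation_triples(SNIPPET)
print(TRIPLES)
```

For the slide's example this produces a single triple linking TyroleanAirways to Austria through OrgToLoc.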

  9. AeroDAML Output: Co-reference

     <aac:PERSON rdf:about="PierreLortie">
       <aac:PersToOrg rdf:resource="BombardierRegionalAircraft"/>
       <daml:equivalentTo rdf:resource="Lortie"/>
       <daml:label><![CDATA[Pierre Lortie]]></daml:label>
     </aac:PERSON>
     <aac:PERSON rdf:about="Lortie">
       <daml:label><![CDATA[Lortie]]></daml:label>
     </aac:PERSON>
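A consumer of the co-reference output above might want to merge mentions linked by daml:equivalentTo into a single cluster per real-world entity. The Python sketch below does this with a small union-find. Treating equivalentTo links as symmetric and transitive is an assumption of this sketch, and coreference_clusters is a hypothetical helper, not part of AeroDAML.

```python
def coreference_clusters(mentions, equivalences):
    """Group mention identifiers connected by equivalence links.

    mentions: iterable of identifiers (rdf:about values).
    equivalences: iterable of (a, b) pairs, one per daml:equivalentTo link.
    Returns a sorted list of sorted clusters.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Identifiers seen only in a link are registered on first use.
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in equivalences:
        parent[find(a)] = find(b)

    clusters = {}
    for m in list(parent):
        clusters.setdefault(find(m), []).append(m)
    return sorted(sorted(cluster) for cluster in clusters.values())


CLUSTERS = coreference_clusters(
    ["PierreLortie", "Lortie", "BombardierRegionalAircraft"],
    [("PierreLortie", "Lortie")],
)
print(CLUSTERS)
```

With the slide's data, PierreLortie and Lortie end up in one cluster, while BombardierRegionalAircraft stays on its own.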

  10. AeroDAML Plans
      • Integrate with annotation editor
      • Improve Web-based AeroDAML
      • Allow user to select other ontologies besides the current AeroDAML default ontology for annotation generation:
        • OpenCyc or Cyc Upper Ontology
        • CIA World Fact Book
        • IEEE Standard Upper Ontology
        • Dublin Core
        • UNSPSC...
      • Try AeroDAML!
        • http://ubot.lockheedmartin.com/ubot/hotdaml/aerodaml.html
