
CSC 9010: AeroText, Ontologies, AeroDAML

Dr. Paula Matuszek, Paula_A_Matuszek@glaxosmithkline.com, (610) 270-6851. AeroText: an Information Extraction tool marketed by Lockheed Martin. Capabilities similar to GATE; much better developed IDE; less open to extensions of the system itself.


Presentation Transcript


  1. CSC 9010: AeroText, Ontologies, AeroDAML
     Dr. Paula Matuszek
     Paula_A_Matuszek@glaxosmithkline.com
     (610) 270-6851

  2. AeroText
     • Information Extraction tool marketed by Lockheed Martin
     • Capabilities similar to GATE
     • Much better developed IDE
     • Less open to extensions of the system itself
     • Equally steep learning curve for effective use!
     • Lockheed AeroText General Overview
     • Lockheed AeroText White Paper

  3. AeroText Demo

  4. Ontologies
     • Information Extraction requires modeling extensive domain knowledge
     • Other applications of text mining, such as document categorization, can also use domain information
     • In modeling such knowledge we often create an ontology: an explicit formal specification of how to represent the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them

  5. A Simple Ontology: Birthdates
     • Objects, concepts, entities:
        • months, days, years
        • dates
        • first names
        • last names
        • persons
        • birthdates
     • Relationships between them:
        • a date has exactly one month, day, year
        • a birthdate is a date
        • a person has at least 1 first name and exactly 1 last name
        • a person has a birthdate
        • a birthdate has a person
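The birthdate ontology above can be sketched in code. This is only an illustration under stated assumptions: the class and field names (`Date`, `Birthdate`, `Person`, `first_names`) and the sample person are hypothetical, and Python dataclasses are a stand-in for a real ontology language.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Date:
    # "a date has exactly one month, day, year"
    month: int
    day: int
    year: int

    def __post_init__(self) -> None:
        if not 1 <= self.month <= 12:
            raise ValueError("month must be in 1..12")
        if not 1 <= self.day <= 31:
            raise ValueError("day must be in 1..31")


@dataclass(frozen=True)
class Birthdate(Date):
    # "a birthdate is a date": modeled here as a subclass
    pass


@dataclass(frozen=True)
class Person:
    # "at least 1 first name and exactly 1 last name"
    first_names: Tuple[str, ...]
    last_name: str
    # "a person has a birthdate"
    birthdate: Birthdate

    def __post_init__(self) -> None:
        if len(self.first_names) < 1:
            raise ValueError("a person needs at least one first name")


# Hypothetical example instance.
jane = Person(("Jane", "Quinn"), "Doe", Birthdate(7, 4, 1990))
print(isinstance(jane.birthdate, Date))
```

The inverse relationship ("a birthdate has a person") is not modeled in this sketch; an ontology language can state both directions, while here it would need a separate lookup from birthdates to persons.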

  6. Who and Why?
     • Many groups are developing ontologies:
        • standardize terms and vocabulary
        • facilitate the semantic web
        • improve information integration
        • interested in the domain itself
     • Some ontologies under development:
        • Cyc
        • GO (Gene Ontology)
        • UMLS (Unified Medical Language System)
        • CIA World Factbook

  7. DAML
     • DARPA Agent Markup Language
     • A language for describing ontologies
     • Example: an ontology for dates
     • Extensive information available at www.daml.org
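DAML builds on RDF, where each statement is a subject-predicate-object triple. As a rough sketch of that idea (not actual DAML syntax), a dates ontology can be held as a set of triples and queried for inherited properties. The predicate names `subClassOf` and `hasProperty` and both helper functions are assumptions made for this illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# The names below are illustrative, not actual DAML vocabulary.
TRIPLES = {
    ("Birthdate", "subClassOf", "Date"),
    ("Date", "hasProperty", "month"),
    ("Date", "hasProperty", "day"),
    ("Date", "hasProperty", "year"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable from it via subClassOf."""
    found = [cls]
    for current in found:  # the list grows as new superclasses are found
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.append(o)
    return found


def properties(cls, triples):
    """Collect properties declared on cls or inherited from its superclasses."""
    classes = superclasses(cls, triples)
    return sorted(o for s, p, o in triples if p == "hasProperty" and s in classes)


print(properties("Birthdate", TRIPLES))  # ['day', 'month', 'year']
```

The point of the sketch is that a machine can derive "a birthdate has a month" from two separate statements, which is the kind of reasoning an ontology language is meant to support.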

  8. UBOT
     • UML Based Ontology Toolkit
     • Part of a DARPA project to automatically mark up web pages to make them
     • The purpose of DAML is to annotate information on the web to make it machine-readable so that software agents can interpret it and reason with it: the semantic web
     • http://ubot.lockheedmartin.com/ubot/intro/index.html

  9. AeroDAML
     • AeroDAML is a web service that takes a web page as an input and generates DAML markup
     • Uses AeroText as the underlying extraction tool
     • Works with various ontologies
     • Paper describing system
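The general shape of such a pipeline (page in, annotations out) can be sketched as follows. This is a toy stand-in, not how AeroDAML or AeroText works internally: the single regex rule `DATE_RULE`, the `annotate` function, and the sample page are all hypothetical, and a real extraction tool uses far richer rules tied to an ontology.

```python
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text content of a page, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Hypothetical extraction rule: a date written like "March 5, 2003".
DATE_RULE = re.compile(
    r"\b(January|February|March|April|May|June|July|August|"
    r"September|October|November|December) (\d{1,2}), (\d{4})\b"
)


def annotate(html):
    """Return (ontology class, matched text) pairs found in the page text."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return [("Date", m.group(0)) for m in DATE_RULE.finditer(text)]


page = "<html><body><p>The launch is set for <b>March 5, 2003</b>.</p></body></html>"
print(annotate(page))  # [('Date', 'March 5, 2003')]
```

Each pair links a span of text to a class in the ontology; a service like the one described would then serialize such pairs as markup rather than printing them.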

  10. Lab: try out AeroDAML
     • AeroDAML page
     • Choose a news page (www.phillynews.com, Google News, ...) and tag it with the Cyc and CIA ontologies.
     • How well did each ontology do at picking up content? Did they miss things they should have found? Was anything tagged incorrectly?
     • Repeat for one of your domain-specific documents, or a web page in a specific area. Try a different ontology if you think one of the others might be more interesting.
     • How was the annotation different?
     • Are we enabling the semantic web?
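One way to make the lab questions concrete is to score the tagger's output against a hand-made answer key: recall measures what was missed, precision measures what was tagged incorrectly. The sketch below assumes annotations are recorded by hand as (class, text) pairs; the function name and the sample data are hypothetical, not output from AeroDAML.

```python
def precision_recall(tagged, expected):
    """Compare a tagger's annotations against a hand-made answer key."""
    tagged, expected = set(tagged), set(expected)
    correct = tagged & expected
    # Precision: share of the tagger's annotations that are right.
    precision = len(correct) / len(tagged) if tagged else 0.0
    # Recall: share of the expected annotations that were found.
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical annotations for one news page, as (class, text) pairs.
expected = {("Person", "Jane Doe"), ("City", "Philadelphia"), ("Date", "March 5, 2003")}
tagged = {("Person", "Jane Doe"), ("City", "Philadelphia"), ("City", "Jane Doe")}

p, r = precision_recall(tagged, expected)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```

Computing the two numbers once per ontology gives a simple basis for comparing how the Cyc and CIA ontologies handled the same page.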
