
Web Crawler Agent (WCA)

Presented by Kirk Martinez, University of Southampton. WCA searches the Web for missing information (fragments) and structures it into an ontology, e.g. the relation “place_of_birth”(Person, Place).


Presentation Transcript


  1. Web Crawler Agent (WCA) • Presented by Kirk Martinez, University of Southampton

  2. Introduction • WCA searches the Web for missing information (fragments) • WCA structures the information it finds into an ontology, e.g. the relation “place_of_birth”(Person, Place) • Techniques used: natural language processing (NLP), information extraction, relation extraction, question answering
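
As a rough illustration of what filling a relation such as “place_of_birth”(Person, Place) involves, here is a minimal Python sketch. It is not WCA's method: the slides do not describe the actual pipeline, so a single hand-written regular expression stands in for the NLP and relation-extraction steps. The names `Fact`, `BORN_IN` and `extract_place_of_birth`, and the example sentence, are assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    """One ontology entry of the form relation(subject, value)."""
    relation: str
    subject: str
    value: str


# Hypothetical surface pattern: "<Person> was born in <Place>".
# A real system would rely on named-entity recognition instead of
# capitalisation heuristics like these.
BORN_IN = re.compile(
    r"(?P<person>[A-Z]\w+(?: \w+)*?) was born in "
    r"(?P<place>[A-Z]\w+(?: [A-Z]\w+)*)"
)


def extract_place_of_birth(sentence: str) -> Optional[Fact]:
    """Return a place_of_birth fact if the sentence matches the pattern."""
    match = BORN_IN.search(sentence)
    if match is None:
        return None
    return Fact("place_of_birth", match.group("person"), match.group("place"))


print(extract_place_of_birth("Rembrandt van Rijn was born in Leiden in 1606."))
```

A sentence that does not match the pattern yields `None`, which is the “missing information” case that would send the agent on to another document.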

  3. Overview

  4. Is it something like “Google”? • Searching for a “date_of_birth” (when was Rembrandt born?) with Google

  5. Searching information with Google • “Old” Web search (e.g. Google) is good for retrieving documents but NOT for extracting concise answers (e.g. “15-July-1606”) • It performs no analysis to “understand” the documents (e.g. “Rembrandt” can also be the name of a hotel or a bookstore)
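
The difference between retrieving a document and extracting a concise answer can be sketched as follows. This is only an assumed illustration, not the extractor used by WCA: it pulls the first day-month-year date out of a piece of text and normalises it, and returns nothing for text that merely mentions the name. `MONTHS`, `DATE` and `extract_date` are hypothetical names, and the input sentences are made up for the example.

```python
import re
from datetime import date
from typing import Optional

# English month names mapped to month numbers (avoids locale-dependent parsing).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches dates such as "15 July 1606" or "15-July-1606".
DATE = re.compile(r"\b(\d{1,2})[ -](" + "|".join(MONTHS) + r")[ -](\d{4})\b")


def extract_date(text: str) -> Optional[date]:
    """Return the first day-month-year date found in the text, if any."""
    match = DATE.search(text)
    if match is None:
        return None
    day, month, year = match.groups()
    return date(int(year), MONTHS[month], int(day))


print(extract_date("Rembrandt was born on 15 July 1606 in Leiden."))
print(extract_date("Hotel Rembrandt, Amsterdam"))
```

Note that this sketch does nothing to decide whether the date it finds is actually a date of birth; that is the “understanding” step the slide says plain Web search lacks.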

  6. Information extraction on the Web • Data may be of low quality and duplicated across sites • e.g. Georges Seurat’s date of death: • 29 March 1891 (http://www.ibiblio.org/wm/paint/auth/seurat/) • 19 March 1891 (http://www.rickdoble.net/influence/20seurat.htm) • WCA depends on: • Well-structured sentences and documents • Good named-entity recognisers
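
One simple way to cope with conflicting sources like the two above is to group the extracted values by source and flag the fact for verification when the sources disagree. The sketch below assumes that approach; the slides do not say how WCA handles conflicts. The function names are hypothetical, and the observations are the two values quoted on the slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Observation = Tuple[str, str]  # (extracted value, source URL)


def group_by_value(observations: List[Observation]) -> Dict[str, List[str]]:
    """Map each distinct extracted value to the sources that reported it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for value, source in observations:
        grouped[value].append(source)
    return dict(grouped)


def needs_verification(observations: List[Observation]) -> bool:
    """A fact needs checking when sources report more than one value."""
    return len(group_by_value(observations)) > 1


# The two conflicting dates of death for Georges Seurat quoted on the slide.
seurat_date_of_death = [
    ("29 March 1891", "http://www.ibiblio.org/wm/paint/auth/seurat/"),
    ("19 March 1891", "http://www.rickdoble.net/influence/20seurat.htm"),
]

for value, sources in group_by_value(seurat_date_of_death).items():
    print(value, sources)
print("needs verification:", needs_verification(seurat_date_of_death))
```

With only two sources that disagree, counting votes cannot pick a winner, so the sketch reports the conflict instead of guessing.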

  7. Future work • Verification of extracted information • Performance • Autonomous operation
