TAP PowerPoint Presentation

Presentation Transcript

  1. TAP • R.V. Guha, IBM Research • Rob McCool, Stanford KSL

  2. TAP: Context • Islands of XML from disparate web services • Example: Tori Amos • It is up to the consumer to put these chunks together • Situation analogous to pre-web hypertext systems and RDBMS today

  3. TAP Goal • Create a coherent semantic web from disparate chunks • Effectively make the web a giant distributed DB • Why: to bring the Internet to programs

  4. TAP: What We Do • Inspired by DNS and the early web: simple contracts, everything decentralized • Protocols to publish & navigate • A small, simple set of publishing & access guidelines that knit together a schematically unified whole • Bootstrapping: create comprehensive chunks of the semantic web in a few areas • Applications: Semantic Search, Internet Wet Lab

  5. TAP Protocol: GetData • Simple API to navigate this web • DNS: GetHostByName(<host>) => IP address • TAP: GetData(<resource>, <property>) => value • GetData(<Tori Amos>, birthplace) => <Newton, NC> • GetData(<Newton, NC>, temperature) => 57 F • GetData(<Newton, NC>, locatedIn) => <North Carolina> • Publisher exposes data as a graph via GetData • Consumer uses GetData to navigate the graph • Key technical issues: caching, directories, names
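
The GetData calls on this slide can be sketched in a few lines of Python. This is a minimal illustration, not TAP's actual implementation: the `GRAPH` dictionary, the `get_data` and `follow` functions, and the use of plain strings for nodes are all assumptions made for the example. Only the three triples come from the slide.

```python
# Hypothetical in-memory graph: (resource, property) -> value.
# The triples mirror the slide's examples; the storage layout is an assumption.
GRAPH = {
    ("Tori Amos", "birthplace"): "Newton, NC",
    ("Newton, NC", "temperature"): "57 F",
    ("Newton, NC", "locatedIn"): "North Carolina",
}


def get_data(resource, prop):
    """Return the value at the end of the arc `prop` leaving `resource`, or None."""
    return GRAPH.get((resource, prop))


def follow(resource, *props):
    """Navigate the graph by chaining get_data calls along a path of properties."""
    node = resource
    for prop in props:
        if node is None:
            return None  # the path broke at an earlier step
        node = get_data(node, prop)
    return node


# A consumer walks two arcs: birthplace, then locatedIn.
state = follow("Tori Amos", "birthplace", "locatedIn")
```

A consumer only ever calls `get_data`; how the publisher stores its graph stays hidden behind that one function, in the same way GetHostByName hides how DNS stores its records.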

  6. The Name Problem • [Figure: one graph in which Geo Almanac, Weather channel, CDNow, and People Magazine all use the same node names, such as Newton, NC and Tori Amos, joined by arcs such as located in, temperature, birthplace, publisher, and instanceof] • We don’t get nice sub-graphs like these, with easy-to-use assembly instructions

  7. We get a mess like this • [Figure: the same graph, but each source uses its own identifier: USNC0491, NTNC, and Newton,_NorthCar in place of Newton, NC; 328723677 and 0,9855,109071,00 in place of Tori Amos]

  8. The Name Problem • Names are crucial in information exchange • Two parties cannot exchange information about an object without agreeing on how they are going to refer to it • The problem: too many names to keep track of! • No URN for <Newton, NC> or <Tori Amos> • Different sites have different names for the same thing! • URN efforts to date have largely been failures • Traditional approach: name-mapping tables
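
To make the traditional approach concrete, here is a minimal sketch of a name-mapping table as a calling program might keep it. The table layout, the `translate` function, and the site keys `site_a`, `site_b`, and `site_c` are hypothetical; the identifiers are the ones that appear on the slides, but which site uses which identifier is an assumption.

```python
# Hypothetical name-mapping table: one row per real-world entity,
# one column per site. Site keys are placeholders, not real site names.
NAME_MAP = [
    # Rows for <Newton, NC> and <Tori Amos>; the site assignment is assumed.
    {"site_a": "USNC0491", "site_b": "NTNC", "site_c": "Newton,_NorthCar"},
    {"site_b": "328723677", "site_c": "0,9855,109071,00"},
]


def translate(name, from_site, to_site):
    """Translate one site's identifier into another site's identifier."""
    for row in NAME_MAP:
        if row.get(from_site) == name:
            # None if the target site has no known name for this entity.
            return row.get(to_site)
    return None  # the identifier is not in the table at all
```

Every new site adds a column and every new entity adds a row, and each calling program has to maintain its own copy, which is why the table does not scale.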

  9. [Figure: the same graph with a name-mapping table held by the calling program (328723677 <-> 0,9855,1…; USNC0491 <-> NTNC <-> . . .), which the program uses to translate between the identifiers of Geo Almanac, Weather channel, CDNow, and People Magazine]

  10. TAP Naming • Reference by descriptions • E.g., “A Musician whose firstName is ‘Tori’ and whose lastName is ‘Amos’ and whose …” • Names are degenerate descriptions • Amzn:B000002UB2, CDNOW: 328723677 • Description-based name negotiation • Core insight • Don’t require globally unique names for everything if we can describe things using a starting vocabulary • Need a description language, a starting vocabulary, and a negotiation mechanism • Bootstrapping some shared meaning into more shared meaning

  11. The vision: descriptions choreograph the integration • [Figure: the calling program sends descriptions D1 and D2 to Geo Almanac, Weather channel, CDNow, and People Magazine, and each source maps them to its own identifiers (USNC0491, NTNC, Newton,_NorthCar, 328723677, 0,9855,109071,00)] • D1 = description of Newton, NC • D2 = description of Tori Amos

  12. Description based References • The core protocol : GetData • GetData(Resource Description, arc-label) • GetData(<Tori Amos>, birthplace) • GetData(RDF Description of Tori Amos, birthplace) • A form of loose coupling: • Handling Ambiguity, Failure to denote, … • The core contract: • Expose your data as a Graph • Map incoming descriptions to nodes in your graph • In return, your data is now integrated into the global semantic web
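
The core contract above (expose a graph, map incoming descriptions to nodes) can be sketched as follows. This is an illustration under stated assumptions, not TAP's description language: descriptions are modeled as plain attribute dictionaries, and `NODES`, `ARCS`, `resolve`, and `Ambiguous` are hypothetical names. The second musician is made-up data, added only to show the ambiguous case.

```python
# Hypothetical publisher data: local node ids with attributes, plus arcs.
NODES = {
    "328723677": {"type": "Musician", "firstName": "Tori", "lastName": "Amos"},
    "local-2": {"type": "Musician", "firstName": "Tori", "lastName": "Example"},
}
ARCS = {("328723677", "birthplace"): "Newton, NC"}


class Ambiguous(Exception):
    """Raised when a description matches more than one local node."""


def resolve(description):
    """Return ids of local nodes whose attributes include every item described."""
    return [
        node_id
        for node_id, attrs in NODES.items()
        if all(attrs.get(key) == value for key, value in description.items())
    ]


def get_data(description, arc_label):
    """GetData(Resource Description, arc-label) against this publisher's graph."""
    matches = resolve(description)
    if not matches:
        return None  # failure to denote: nothing here fits the description
    if len(matches) > 1:
        raise Ambiguous(matches)  # caller should send a richer description
    return ARCS.get((matches[0], arc_label))
```

The caller never learns the publisher's local id. A description that is too thin raises `Ambiguous`, which is the point at which a negotiation step would ask for more detail.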

  13. Infrastructure: Kernel Vocabulary • Provides vocabulary for descriptions • Purpose is to provide the infrastructure for constructing descriptions with which programs can refer to things • “A Musician whose firstName is ‘Tori’ and whose lastName is ‘Amos’ and whose …” • It doesn’t reside anywhere: it’s a specification

  14. Applications • Good infrastructures have waves of applications • WWW: home pages, portals, e-commerce, … • DNS: email, telnet, ftp, gopher, …, WWW • Semantic Search • Adding semantics to search • The crawl, grab, index model of search doesn’t work for dynamic web sites or web applications • Semantics-based search augmentation enables search to cover time-sensitive data • Internet Wet Lab

  15. Semantic Web Application: Semantic Search

  16. Search Augmentation Example

  17. How the Semantic Infrastructure gets used in Semantic Search • [Figure: the Search Front End receives the query “Yo Yo Ma”; the KB turns it into the description “Musician whose genre is ClassicalMusic, First name is …”; UDDI++ answers “Who has concert dates, discography, auctions, bio for musician whose …?”; requests such as “Auctions for …”, “Concert Dates for Musician whose …”, “Bio for …”, and “Discography for …” pass through Caching & Buffering to AllMusic, TicketMaster, EBay, and CDNow]

  18. TAP KBs for Semantic Search • Large Knowledge Base of specific musicians, cities, athletes, … • Currently covers about 20% of search terms • Built in a largely automated fashion • Scrapers for free data sources • Simple noun phrase analysis of news articles • AP, Reuters, … • Scrapers for important sites to bootstrap • KB also helps bootstrap the semantic web

  19. KB Coverage Today • Music: musicians, instruments, styles • Movies: movies, actors, TV shows • Authors: top authors, classic books • Sports: athletes, sports, sports teams, equipment • Autos: auto models, motorcycles • Companies: Fortune 500 • Home appliances: types, brands • Toys: types, brands • Baby products: types, brands • Places: countries, cities, tourist attractions, … • Consumer electronics: audio/video, communication • Games: consoles, titles, … • Health: diseases, drugs, …

  20. Semantic Site Search • Semantic Search useful not just for internet wide search, but also for site search • Same principles as internet-wide search • KBs created for searching related individual sites can be shared between sites • These KBs feed into global semantic web • Example: Semantic Search for www.w3.org

  21. TAP Application: Internet Wet Lab • In many sciences, more data will be produced in the next 2 years than exists today • Increasingly, research consists of writing programs that mine this data • Data is isolated as islands in different labs • Data from one lab is not easily available to programs in another lab • We want to use TAP to create a single virtual net-wide “database” containing all this experimental data • Example: clinical trial data

  22. TAP Organization • TAP is a multi-organization research effort • IBM, Stanford KSL, Stanford Logic Group, CMU West, … • KBs, source-code, etc. freely available (via BSD license) • A number of new projects starting up … places, entertainment, … • We invite you to join • URL: http://tap.stanford.edu/

  23. TAP: Summary • Small set of guidelines that create a coherent semantic web out of disparate web services • Potential solution to the naming problem • Relevant to all web services • Semantic Search & Internet Wet Lab as driving applications • TAP is a research project • A lot of fundamental work remains to be done • Everything is freely available. We want you to join!

  24. Questions

  25. [Figure: graph linking USA, North Carolina, Newton, NC, Tori Amos, Under The Pink, and Crucify across Geo Almanac, Weather channel, CDNow, People Magazine, and a component labeled Bg KB, with instanceof arcs to US State, Country, City, Musician, Author, and Music Album]

  26. [Figure: the graph from slide 6 without its caption; Geo Almanac, Weather channel, CDNow, and People Magazine all use the shared names Newton, NC and Tori Amos]

  27. TAP: Summary • Focus is shifting from just storing and retrieving data to exchanging data. XML provides syntax. We need semantics • We need an infrastructure layer for semantics • Applications drive infrastructures. The driving application for this layer is semantics-based Search & News Augmentation.

  28. What is an Internet Infrastructure Layer? • There is a data structure, pieces of which are in different places on the net • DNS: hash table of host names to IP addresses, accessed via GetHostByName • WWW: directed graph of documents, accessed via HTTP GET/POST • The infrastructure layer provides a set of standards & APIs to unify the different pieces so that a client can pretend it is all local
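
The "client can pretend it is all local" idea can be sketched for a GetData-style graph. This is a hypothetical illustration: the two publisher functions, the `DIRECTORY` mapping, and the lookup order are assumptions, and real publishers would be remote services rather than Python functions.

```python
# Hypothetical publishers, each holding only its own piece of the graph.
def geo_almanac(resource, prop):
    piece = {("Newton, NC", "locatedIn"): "North Carolina"}
    return piece.get((resource, prop))


def weather_channel(resource, prop):
    piece = {("Newton, NC", "temperature"): "62 F"}
    return piece.get((resource, prop))


# Hypothetical directory: which publishers to ask about which property.
DIRECTORY = {
    "locatedIn": [geo_almanac],
    "temperature": [weather_channel],
}


def get_data(resource, prop):
    """Single entry point: the client never sees which publisher answered."""
    for publisher in DIRECTORY.get(prop, []):
        value = publisher(resource, prop)
        if value is not None:
            return value
    return None  # no registered publisher knows this arc


# The client navigates as if the whole graph were one local structure.
place = get_data("Newton, NC", "locatedIn")
temp = get_data("Newton, NC", "temperature")
```

The directory plays the role that the resolver hierarchy plays for GetHostByName: it decides where to look so that the caller does not have to.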

  29. Application 2 : RTA for news articles

  30. RTA for News Articles • [Figure: a news article enters the Search/Syndication Front End; text analysis against the Knowledge Base yields entities such as SportsTeam_TexasRangers and AthleteRodriguez_Alex; the Directory is consulted for “Whose team schedule, posters, auctions, bio?”; requests such as “Team Schedule for team whose title …”, “Auctions for …”, “Poster for …”, and “Videos for …” go to AllPosters, MLB.com, EBay, and AOL Shopping]