
Ontology Based Extraction of RDF Data from the World Wide Web


Presentation Transcript


  1. Ontology Based Extraction of RDF Data from the World Wide Web. Tim Chartrand, Master's Thesis. Research supported by NSF.

  2. Introduction
     • World Wide Web
       • Has a huge amount of existing information
       • Designed primarily for human consumption
     • Semantic Web
       • Is an extension of WWW
       • Gives information a well-defined meaning
       • Allows automation of tasks
     • DEG contribution – Extract data from the WWW
     • Solution
       • Extract Semantic Web data from the WWW
       • Superimpose extracted data

  3. Research Overview
     [Figure: system overview with components User, HTML Page, DAML Ontology, Extraction Ontology, Extraction Engine, Relational Data, RDF Data, and RDF Browser]

  4. RDF – What is it?
     • Resource Description Framework
     • Language of the Semantic Web
     • Set of <subject> <predicate> <object> triples
       <mailto:tim@cs.byu.edu> <genealogy#age> "25"
       <mailto:tim@cs.byu.edu> <genealogy#fatherOf> <mailto:tyler@thechartrands.com>
     [Figure: RDF graph with nodes mailto:tim@cs.byu.edu, mailto:tyler@thechartrands.com, and 25, connected by genealogy:age and genealogy:fatherOf edges]
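The two triples on this slide can be written out as a minimal Python sketch that serializes them in an N-Triples-like form. The slide abbreviates the predicates as `genealogy#age` and `genealogy#fatherOf` without giving the full namespace, so the `GENEALOGY` prefix below is a hypothetical placeholder, and the helper names are illustrative only.

```python
# Hypothetical namespace: the slide never gives the full URI behind "genealogy#".
GENEALOGY = "http://example.org/genealogy#"


def uri(value):
    """Wrap a URI in angle brackets, N-Triples style."""
    return f"<{value}>"


def literal(value):
    """Wrap a plain string literal in double quotes, escaping \\ and \"."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Each triple is (subject, predicate, object); terms are already serialized.
triples = [
    (uri("mailto:tim@cs.byu.edu"), uri(GENEALOGY + "age"), literal("25")),
    (uri("mailto:tim@cs.byu.edu"), uri(GENEALOGY + "fatherOf"),
     uri("mailto:tyler@thechartrands.com")),
]


def to_ntriples(rows):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

The subject and the second object are URIs, so they can appear as nodes in other statements; the age is a literal and can only appear in the object position.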

  5. DAML Core Concepts
     • daml:Class – defines a class
     • daml:Property – defines a binary relation, has a value
     • rdfs:domain – specifies the class to which a property applies
     • rdfs:range – specifies the possible values of a property
     • daml:UniqueProperty, daml:UnambiguousProperty – specify cardinality constraints for a property

  6. Example Ontology
     . . .
     <daml:Class rdf:ID="Program">
       <rdfs:label>Program</rdfs:label>
     </daml:Class>
     <daml:Class rdf:ID="OperatingSystem">
       <rdfs:label>OperatingSystem</rdfs:label>
     </daml:Class>
     . . .
     <daml:DatatypeProperty rdf:ID="Name">
       <rdf:type rdf:resource="&daml;UniqueProperty"/>
       <rdf:type rdf:resource="&daml;UnambiguousProperty"/>
       <rdfs:domain rdf:resource="#Program"/>
       <rdfs:range rdf:resource="&rdfs;Literal"/>
     </daml:DatatypeProperty>
     <daml:Property rdf:ID="supportsOperatingSystem">
       <rdfs:domain rdf:resource="#Program"/>
       <rdfs:range rdf:resource="#OperatingSystem"/>
     </daml:Property>
     . . .

  7. DAML → OSM
     • Class → Non-lexical object set
     • Property → Binary relationship set between object sets
     • Literal property → Lexical object set and binary relationship set between non-lexical and lexical object sets
     • Cardinality restriction → Participation constraint
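The four mapping rules above can be sketched in Python over a simplified, in-memory description of an ontology. This is not the thesis's conversion algorithm: the `OsmModel` type, the input dictionaries, and the `"0:1"` / `"0:*"` notation for participation constraints are all assumptions made for illustration, as is the choice to treat only a unique property as constrained.

```python
from dataclasses import dataclass, field

LITERAL = "rdfs:Literal"


@dataclass
class OsmModel:
    # object-set name -> "lexical" or "non-lexical"
    object_sets: dict = field(default_factory=dict)
    # (domain object set, range object set, participation constraint)
    relationship_sets: list = field(default_factory=list)


def daml_to_osm(classes, properties):
    """Apply the slide's mapping rules to a simplified DAML description.

    classes:    iterable of class names
    properties: iterable of dicts with keys 'name', 'domain', 'range' and an
                optional 'unique' flag (True for a daml:UniqueProperty)
    """
    model = OsmModel()
    for name in classes:
        # Rule 1: class -> non-lexical object set
        model.object_sets[name] = "non-lexical"
    for prop in properties:
        if prop["range"] == LITERAL:
            # Rule 3: literal property -> lexical object set plus relationship
            target = prop["name"]
            model.object_sets[target] = "lexical"
        else:
            # Rule 2: property -> relationship set between object sets
            target = prop["range"]
        # Rule 4 (assumed notation): unique property -> at most one value
        constraint = "0:1" if prop.get("unique") else "0:*"
        model.relationship_sets.append((prop["domain"], target, constraint))
    return model


# The Program / OperatingSystem example from the slides, entered by hand.
model = daml_to_osm(
    classes=["Program", "OperatingSystem"],
    properties=[
        {"name": "Name", "domain": "Program", "range": LITERAL, "unique": True},
        {"name": "supportsOperatingSystem", "domain": "Program",
         "range": "OperatingSystem"},
    ],
)
print(model.object_sets)
print(model.relationship_sets)
```

Keeping the object sets and relationship sets as plain collections makes it easy to inspect the result before attaching data frames to the lexical object sets.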

  8. DAML → OSM
     <daml:Class rdf:ID="Program">
       <rdfs:label>Program</rdfs:label>
     </daml:Class>
     <daml:Class rdf:ID="OperatingSystem">
       <rdfs:label>OperatingSystem</rdfs:label>
     </daml:Class>
     . . .
     <daml:DatatypeProperty rdf:ID="Name">
       <rdf:type rdf:resource="&daml;UniqueProperty"/>
       <rdf:type rdf:resource="&daml;UnambiguousProperty"/>
       <rdfs:domain rdf:resource="#Program"/>
       <rdfs:range rdf:resource="&rdfs;Literal"/>
     </daml:DatatypeProperty>
     <daml:Property rdf:ID="supportsOperatingSystem">
       <rdfs:domain rdf:resource="#Program"/>
       <rdfs:range rdf:resource="#OperatingSystem"/>
     </daml:Property>

  9. Data Frames
     • Lexical object sets need data frames
     • Use data-frame library
     • Match lexical object sets with data frames
       • Compare stemmed names and aliases
         • Levenshtein edit distance
         • Soundex
         • Longest common subsequence
         • Weighted average
       • Specialization heuristic
       • Choose most similar data frame (above a threshold)
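The name-matching steps listed on this slide can be sketched as follows. This is a minimal illustration, not the thesis's matcher: the weights, the 0.6 threshold, and the sample library are hypothetical, lowercasing stands in for stemming, and the specialization heuristic is left out.

```python
def levenshtein(a, b):
    """Edit distance by dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


_CODES = {c: d for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                  "4": "l", "5": "mn", "6": "r"}.items()
          for c in letters}


def soundex(word):
    """American Soundex: first letter plus three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != last:
            result += code
        if c not in "hw":  # h and w do not separate letters with the same code
            last = code
    return (result + "000")[:4]


def similarity(a, b, weights=(0.4, 0.2, 0.4)):
    """Weighted average of the three measures; the weights are hypothetical."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    scores = (1 - levenshtein(a, b) / longest,
              1.0 if soundex(a) == soundex(b) else 0.0,
              lcs_length(a, b) / longest)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def best_data_frame(object_set, library, threshold=0.6):
    """Return the most similar data frame, or None if below the threshold.

    library maps a data-frame name to a list of aliases.
    """
    best_name, best_score = None, 0.0
    for frame, aliases in library.items():
        score = max(similarity(object_set, n) for n in [frame, *aliases])
        if score > best_score:
            best_name, best_score = frame, score
    return best_name if best_score >= threshold else None


# Hypothetical data-frame library for illustration.
library = {"PhoneNumber": ["Phone", "Telephone"], "Price": ["Cost", "RentalRate"]}
print(best_data_frame("Phone", library))
print(best_data_frame("Bedrooms", library))
```

Returning `None` when nothing clears the threshold leaves the object set unmatched, so a user can assign or create a data frame by hand.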

  10. User Modification
      • Provide graphical ontology editor
      • Automate graph layout
      • Allow the user to edit participation constraints
      • Allow the user to edit data-frame mapping
      • Provide data-frame editor

  11. Extracting the Data

  12. Pointing to the Data
      <html> . . .
      <body>
      <table>
        <tr>
          <td>
            <a href="..."><b>Stick Death 1.0</b></a><br />
            Advance in levels, grab weapons, and unlock new levels and characters.<br />
            <b>OS:</b> Windows 3.x/95/98/Me/NT/2000/XP<br />
            <b>File Size:</b>2.66MB<br />
            <b>License:</b>Free<br />
          </td>
          <td>05/14/2002<br /> <i><b>new</b></i> </td>
          <td></td>
          <td>2,235</td>
          <td><a href="...">Download now</a><br /><br /></td>
        </tr>
        . . .
      xpointer(string-range(/html[1]/body[1]/table[1]/tr[1], '', 10, 3))

  13. Convert to RDF
      [Figure: RDF graph rooted at http://www.deg.byu.edu/software.html#Program1001. Edge labels: rdf:type, software:name, software:version, software:ProgSize, software:SizeVal, software:SizeUnit, software:OperatingSystem, software:OSName, software:OSVersion. Class nodes: software:Program, software:Size, software:OperatingSystem. Literal values: Stick Death, 1.0, 2.66, MB, Windows, 3.x/95/98/Me/NT/2000/X]

  14. Superimposed Data

  15. Results
      • RDF Data Extraction and Viewing
        • Built 4 data-extraction ontologies
          • 3 from DAML ontologies for data extraction
          • 1 from an existing DAML ontology
        • Most existing DAML ontologies are not good for data extraction
      • Data Frame Matcher
        • 8 ‘training ontologies’, 16 test ontologies
        • 128 lexical object sets, 40 correct matches, 12 incorrect matches
        • Precision: 77%
        • Recall: 89%
      • Experiment (apartment rentals): 6 students, 3 data frames
        • Phone: 2.8 min
        • RentalRate: 16.5 min
        • Bedrooms: 17.5 min
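The reported precision follows directly from the match counts on this slide, as the short Python check below shows. The slide does not say how many expected matches the matcher missed, so `assumed_missed = 5` is an assumption chosen only because it is consistent with the reported 89% recall.

```python
def precision(correct, incorrect):
    """Fraction of proposed matches that were correct."""
    return correct / (correct + incorrect)


def recall(correct, missed):
    """Fraction of expected matches that were actually found."""
    return correct / (correct + missed)


correct, incorrect = 40, 12  # counts reported on the slide
print(f"precision = {precision(correct, incorrect):.0%}")

# Assumption: the number of missed matches is not reported on the slide.
assumed_missed = 5
print(f"recall    = {recall(correct, assumed_missed):.0%}")
```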

  16. Contributions
      • Advancement of Semantic Web
      • Application of Information Extraction to building Semantic Web content
      • Semantic Web data as superimposed information
      • Algorithm for ontology conversion

  17. Future Work
      • Data extraction
        • Enhance name matcher with data values
        • Support n-ary relationship sets
      • RDF data generation
        • Generate only one URI for an object
        • Associate concepts from DAML ontologies to well-known DAML ontologies
