RDFa, Etc. (Resource Description Framework–in–attributes) W3C: RDFa 1.1 Primer http://www.w3.org/TR/xhtml-rdfa-primer/
RDFa
• RDFa allows RDF statements to be included in ordinary HTML/XHTML files using formally defined attributes
• A W3C recommendation, http://www.w3.org/TR/rdfa-core
• In RDFa 1.0 the vocabularies are specified using XML namespaces, so it is used with XHTML, not HTML, document types; RDFa 1.1 adds the prefix attribute, which also works in HTML5
• There is no need to generate RDF/XML files separately
• RDF/XML is complex
• It requires separate creation and storage mechanisms
• Instead, add extra structured content to the (X)HTML pages
• Let processors extract that content and turn it into RDF
RDFa provides attributes to carry metadata in an XML language
• Note the ‘a’ (attributes) in RDFa
• These attributes include:
• about gives a URI specifying the resource the metadata is about
• rel and rev specify a relationship and inverse relationship with another resource, respectively
• src, href and resource specify the partner resource
• property specifies a property for the content of an element or the partner resource (the resource that the metadata is about)
• content (optional) overrides the content of the element when using the property attribute
• datatype (optional) specifies the datatype of text specified for use with the property attribute
• typeof (optional) specifies the RDF type(s) of the subject or the partner resource
Five "principles of interoperable metadata" met by RDFa • Publisher Independence: Each site can use its own standards • Data Reuse: Data are not duplicated—separate XML and HTML sections aren’t required for the same content. • Self Containment: The HTML and the RDF are separated • Schema Modularity: The attributes are reusable • Evolvability: Additional fields can be added and XML transforms can extract the semantics of the data from an XHTML file
Attributes map to RDF components
• Subject: about, src, e.g., about="rdfa-course"
• Predicate: property, rel, rev, typeof, e.g., property="dc:title"
• Object: content, href, resource, datatype, or just the plain content of an element or a resource, e.g., RDFa Course as the content of an HTML element
Example
<div about="rdfa-course">
  <h3 property="dc:title">RDFa Course</h3>
</div>
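The mapping above can be illustrated with a small Python sketch. It is a toy, not a conforming RDFa processor: it assumes only the about, property and content attributes matter, does not expand prefixes, and the names TinyRdfaExtractor and extract_triples are made up for this illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"meta", "link", "br", "img", "hr", "input"}


class TinyRdfaExtractor(HTMLParser):
    """Toy extractor: handles only `about`, `property` and `content`."""

    def __init__(self):
        super().__init__()
        self.subjects = []   # subject in scope for every open element
        self.pending = None  # (subject, predicate, depth) waiting for element text
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # `about` sets the subject; otherwise inherit it from the parent element.
        subject = a.get("about", self.subjects[-1] if self.subjects else "")
        is_void = tag in VOID
        if not is_void:
            self.subjects.append(subject)
        if "property" in a:
            if "content" in a:
                # `content` overrides the element text as the object.
                self.triples.append((subject, a["property"], a["content"]))
            elif not is_void and self.pending is None:
                self.pending = (subject, a["property"], len(self.subjects))
                self.text = []

    def handle_data(self, data):
        if self.pending:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.subjects:
            return
        if self.pending and len(self.subjects) == self.pending[2]:
            subject, predicate, _ = self.pending
            self.triples.append((subject, predicate, "".join(self.text).strip()))
            self.pending = None
        self.subjects.pop()


def extract_triples(html):
    parser = TinyRdfaExtractor()
    parser.feed(html)
    return parser.triples


SLIDE_EXAMPLE = '<div about="rdfa-course"><h3 property="dc:title">RDFa Course</h3></div>'

if __name__ == "__main__":
    for triple in extract_triples(SLIDE_EXAMPLE):
        print(triple)
```

The h3 element has no about attribute of its own, so it inherits the subject from the enclosing div, which is why the stack of subjects is needed.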
RDFa Example
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  <span typeof="v:Address">
    <span property="v:locality">Albuquerque</span>
    <span property="v:region">NM</span>
  </span>
</div>
• The namespace used here identifies Google's data-vocabulary.org vocabulary, a predecessor of the Schema.org vocabulary (see below)
Publishing RDFa
• RDFa provides an easy way of publishing RDF data on the Web
• Often the same RDF data is available in different formats, including RDFa
• The client chooses which one(s) to support
Consuming RDFa
• Various search engines have begun to consume RDFa
• Google, Yahoo, …
• They may specify which vocabularies they “understand”
• Facebook’s “social graph” is built with the Open Graph protocol, whose initial version is based on RDFa
RDFa Distiller • W3C service to identify and list RDF in a web page http://www.w3.org/2012/pyRdfa/ • Extract RDF from HTML + RDFa • Using a web address, local file or direct text inputs, it provides a clean view of the implied data hierarchy
Example
• Select the tab Distill by Direct Text Input and copy the following into the window
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Books by Marco Pierre White</title>
  </head>
  <body>
    I think White's book
    '<span about="urn:ISBN:0091808189"
           typeof="http://purl.org/ontology/bibo/Book"
           property="http://purl.org/dc/terms/title">Canteen Cuisine</span>'
    is well worth getting since although it's quite advanced stuff,
    he makes it pretty easy to follow. You might also like
    <span about="urn:ISBN:1596913614"
          typeof="http://purl.org/ontology/bibo/Book"
          property="http://purl.org/dc/terms/description">White's autobiography</span>.
  </body>
</html>
Choose the following selections in the dropdowns below the text window • Host Language: HTML5 + RDFa • Output Format: Turtle • Returned content: Only core triples • Expand vocabularies: No • Generate warnings for non RDFa 1.1 Lite usage: No • Click the Go button (below these dropdowns) • Output presented in a downloaded file—open in, e.g., Notepad++
For our example, the output is
@prefix dc: <http://purl.org/dc/terms/> .

<urn:ISBN:0091808189> a <http://purl.org/ontology/bibo/Book>;
    dc:title "Canteen Cuisine" .

<urn:ISBN:1596913614> a <http://purl.org/ontology/bibo/Book>;
    dc:description "White's autobiography" .
RDFa Developer https://addons.mozilla.org/en-US/firefox/addon/rdfa-developer/?src=ss • Firefox add-on that lets us visualize all the RDFa triples in a web page • Shows a list of errors and warnings found while parsing the document • Lets us execute SPARQL queries on the RDFa content
To install, follow the above link, click the Add to Firefox button, and restart Firefox (perhaps first look under Tools > Add-ons for a restart prompt in the RDFa Developer listing)
• The Developer windows occupy the bottom part of the screen
• To add an icon in the lower right corner of the browser (the add-on bar), in the View menu at the top, under Toolbars, have Add-on bar checked
• If the Developer icon doesn't appear in the add-on bar, choose View > Toolbars > Customize and drag the Developer icon from the palette to the add-on bar
• Click the icon to toggle the Developer display off and on
• By default, the Developer windows appear when you start up Firefox
• To prevent this, in the Tools menu, select Add-ons
• In the resulting display, click the Disable button for RDFa Developer
• To use the Developer again, go back and click the Enable button
Example
• Save the following code (the previous example, with the bibo and dc prefixes declared on the body element) in an HTML file
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Books by Marco Pierre White</title>
  </head>
  <body xmlns:bibo="http://purl.org/ontology/bibo/"
        xmlns:dc="http://purl.org/dc/terms/">
    I think White's book
    '<span about="urn:ISBN:0091808189" typeof="bibo:Book"
           property="dc:title">Canteen Cuisine</span>'
    is well worth getting since although it's quite advanced stuff,
    he makes it pretty easy to follow. You might also like
    <span about="urn:ISBN:1596913614" typeof="bibo:Book"
          property="dc:description">White's autobiography</span>.
  </body>
</html>
• Open the saved HTML file in Firefox
• The output should show 4 triples in the Data tab (expand by clicking the triangles) and 3 warnings in the Notices tab
• If the tabs do not show any triples or warnings, try disabling and re-enabling the RDFa Developer add-on
Regarding the Notices tab (errors and warnings), suppose we remove the bibo namespace declaration from the body element
• Change
<body xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dc="http://purl.org/dc/terms/">
to
<body xmlns:dc="http://purl.org/dc/terms/">
• Open the saved HTML file in Firefox
The output shows the errors and warnings in the Notices tab • The errors specify that the prefix used for the bibo namespace is not defined (and the attribute with this prefix is unused) • I couldn't get the Query tab to submit queries
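The undefined-prefix error described above can be reproduced with a rough Python check. This is only a sketch under stated assumptions: it looks at xmlns: declarations anywhere in the document (ignoring scoping and the RDFa 1.1 prefix attribute), treats any value whose part after the colon starts with // as a full URI, and the names PrefixChecker and undefined_prefixes are hypothetical. It is not how RDFa Developer itself works.

```python
from html.parser import HTMLParser

# Attributes whose values may contain prefixed names (CURIEs).
CURIE_ATTRS = ("property", "typeof", "rel", "rev")


class PrefixChecker(HTMLParser):
    """Collects declared xmlns: prefixes and the prefixed names that use them."""

    def __init__(self):
        super().__init__()
        self.declared = set()
        self.used = []  # (attribute name, term)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name.startswith("xmlns:"):
                self.declared.add(name[len("xmlns:"):])
            elif name in CURIE_ATTRS:
                for term in value.split():
                    self.used.append((name, term))


def undefined_prefixes(html):
    """Return the sorted list of prefixes that are used but never declared."""
    checker = PrefixChecker()
    checker.feed(html)
    missing = set()
    for _attr, term in checker.used:
        prefix, sep, rest = term.partition(":")
        # Skip plain terms and full URIs such as http://purl.org/dc/terms/title.
        if sep and not rest.startswith("//") and prefix not in checker.declared:
            missing.add(prefix)
    return sorted(missing)


# The body element after the bibo declaration has been removed.
BROKEN = """<body xmlns:dc="http://purl.org/dc/terms/">
<span about="urn:ISBN:0091808189" typeof="bibo:Book" property="dc:title">Canteen Cuisine</span>
</body>"""

if __name__ == "__main__":
    print(undefined_prefixes(BROKEN))
```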
E.g., on the BBC News World website, http://www.bbc.co.uk/news/world/, one part of the HTML is as follows
• To see the source HTML, in Firefox, right click in the window
• In the resulting menu, click View Page Source
<meta property="og:title" content="BBC News: World">
<meta property="og:description" content="World news from the BBC">
<meta property="og:url" content="http://www.bbc.co.uk/news/world/">
<meta property="og:type" content="website">
<meta property="og:image" content="http://news.bbcimg.co.uk/media/images/56400000/jpg/_56400259_bbcnews.jpg">
<meta property="og:site_name" content="BBC News">
<meta property="fb:app_id" content="218019758281651">
• The next slide shows part of the RDFa Developer Data tab
• The RDFa occurs in several places on the page, so some of the triples come from RDFa that is not shown here
The Open Graph protocol http://ogp.me/ • Enables any web page to become a rich object in a social graph • Used on Facebook to allow any web page (by adding metadata) to have the same functionality as any other object on Facebook • Since Open Graph is an open protocol of sorts, it's not Facebook specific • Google Plus gives schema.org the highest weight • If they don’t exist, it falls back on open graph tags • If they do not exist, falls back on page content, like "title", etc. • Even without a good internal search engine, Facebook already drives more traffic for some searches (social searches) than Google
No single technology provides enough info to richly represent any web page within the social graph
• The Open Graph protocol builds on these existing technologies
• Developer simplicity is a key goal that has informed many of the technical design decisions
• See: The Open Graph Protocol Design Decisions (D. Recordon, presented at the W3C’s Linked Data CAMP at WWW 2010) http://www.scribd.com/doc/30715288/The-Open-Graph-Protocol-Design-Decisions
• Within 7 days of implementation, the following services hosted it
• og:it: simple metadata extractor to HTML
• OpenGraph.in: simple metadata extractor to HTML and JSON
• Multiple RDF parsers now understand the Open Graph protocol
• Open Graph protocol to JSON converter for testing
• Open source libraries for Java, Perl, PHP, and Ruby
• WordPress plugin for easy publishing
Initial version is based on RDFa • Place additional <meta> tags in the <head> of your web page • The 4 required properties • og:title—the title of your object as it’s to appear in the graph • og:type—the type of your object, e.g., "video.movie" • Depending on the type you specify, other properties may also be required • og:image—an image URL to represent your object in the graph • og:url—the canonical URL of your object, used as its permanent ID in the graph
Example: the Open Graph protocol markup for The Rock on IMDB <html prefix="og: http://ogp.me/ns#"> <head> <title>The Rock (1996)</title> <meta property="og:title" content="The Rock" /> <meta property="og:type" content="video.movie" /> <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" /> <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" /> ... </head> ... </html> • Also 7 optional properties
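As a rough illustration of the four required properties, the following Python sketch reads the og: meta tags from a page and reports which required ones are missing. It uses only the standard library; the class and function names are invented for this example, and it makes no attempt to validate the values themselves.

```python
from html.parser import HTMLParser

REQUIRED = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphMeta(HTMLParser):
    """Collects (property, content) pairs from <meta property="og:..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = []  # kept in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop, content = a.get("property"), a.get("content")
        if prop and prop.startswith("og:") and content is not None:
            self.properties.append((prop, content))


def read_open_graph(html):
    parser = OpenGraphMeta()
    parser.feed(html)
    return parser.properties


def missing_required(properties):
    """Return the required properties that do not occur in the list of pairs."""
    present = {prop for prop, _ in properties}
    return [name for name in REQUIRED if name not in present]


# The markup from the slide, without the elided parts.
THE_ROCK = """<html prefix="og: http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
</head>
</html>"""

if __name__ == "__main__":
    pairs = read_open_graph(THE_ROCK)
    print(pairs)
    print("missing:", missing_required(pairs))
```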
Some properties can have extra metadata attached to them
• E.g., the og:image property has some optional structured properties
• og:image:url: identical to og:image
• og:image:secure_url: an alternate URL to use if the webpage requires HTTPS
• og:image:type: a MIME type for this image
• og:image:width: the number of pixels wide
• og:image:height: the number of pixels high
• The og:video tag has the identical structured properties
• The og:audio tag only has the first 3 of them
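One way to picture structured properties is to group them with the root tag they follow. The Python sketch below assumes that a structured property such as og:image:width belongs to the most recent og:image before it in the page, which is my reading of the protocol rather than something stated on the slide; the function name group_structured and the example URLs are hypothetical.

```python
def group_structured(pairs, root="og:image"):
    """Group (property, content) pairs into one dict per occurrence of `root`.

    Assumption: a structured property (e.g. og:image:width) is attached to the
    closest preceding root tag (e.g. og:image).
    """
    objects = []
    for prop, content in pairs:
        if prop == root:
            objects.append({"url": content})
        elif prop.startswith(root + ":"):
            key = prop[len(root) + 1:]
            if not objects:
                # Structured property with no preceding root tag.
                objects.append({})
            objects[-1][key] = content
    return objects


# Hypothetical input, in document order.
PAIRS = [
    ("og:title", "The Rock"),
    ("og:image", "http://example.com/rock.jpg"),
    ("og:image:width", "300"),
    ("og:image:height", "300"),
    ("og:image", "http://example.com/rock2.jpg"),
    ("og:image:height", "1000"),
]

if __name__ == "__main__":
    for image in group_structured(PAIRS):
        print(image)
```

The same function can be called with root="og:video" or root="og:audio", since those tags carry the same kind of structured properties.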
If a tag can have multiple values, put multiple versions of the same <meta> tag on your page • The 1st tag (from top to bottom) is given preference during conflicts • This is effectively an array of values • When the community agrees on the schema for a type, it’s added to the list of global types • All other objects in the type system are CURIEs (see below) of the form <head prefix="my_namespace: http://example.com/ns#"> <meta property="og:type" content="my_namespace:my_type" />
The global types are grouped into verticals, each with its own namespace • The og:type values for a namespace are prefixed with the namespace and then a period • Reduces confusion with user-defined namespace types (which have colons) Example (more a candidate vertical) • profile—namespace URI: http://ogp.me/ns/profile# • profile:first_name—string—a given name • profile:last_name—string—a name inherited from a family or marriage • profile:username—string—a short unique string to identify them • profile:gender—enum(male, female)—their gender
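The naming rule for og:type values (a period for global types inside a vertical, a colon for user-defined namespace types) can be sketched in a few lines of Python. The function name classify_og_type and the returned tuple layout are illustrative choices, not part of the protocol.

```python
def classify_og_type(value):
    """Split an og:type value into (kind, namespace, name).

    - "video.movie"          -> global type in the "video" vertical
    - "my_namespace:my_type" -> user-defined type written as a CURIE
    - "website"              -> global type outside any vertical
    """
    if ":" in value:
        prefix, _, name = value.partition(":")
        return ("custom", prefix, name)
    if "." in value:
        vertical, _, name = value.partition(".")
        return ("global", vertical, name)
    return ("global", None, value)


if __name__ == "__main__":
    for example in ("video.movie", "my_namespace:my_type", "website"):
        print(example, "->", classify_og_type(example))
```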
The types used when defining attributes
• Boolean: values true, false, 1, 0
• DateTime: composed of a date (year, month, day) and an optional time component (hours, minutes) as per the ISO 8601 standard
• Enum: a type consisting of a bounded set of constant string values
• Float: a 64-bit signed floating point number
• Integer: a 32-bit signed integer
• String: a sequence of Unicode characters
• URL: all valid URLs that utilize the http:// or https:// protocols
Discuss the Open Graph Protocol
• in the Facebook group (https://www.facebook.com/groups/opengraph/) or
• on the developer mailing list (http://groups.google.com/group/open-graph-protocol)
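The datatypes in the list above can be turned into a small validating converter. The following Python sketch is only an approximation: the function name parse_og_value is hypothetical, Enum is left out because it needs the allowed values for each property, and DateTime relies on datetime.fromisoformat, which accepts only a subset of ISO 8601 in older Python versions.

```python
from datetime import datetime
from urllib.parse import urlparse


def parse_og_value(kind, text):
    """Convert the string `text` to a Python value of the given Open Graph type.

    Raises ValueError when the text does not fit the type.
    """
    if kind == "Boolean":
        if text in ("true", "1"):
            return True
        if text in ("false", "0"):
            return False
        raise ValueError(f"not a Boolean: {text!r}")
    if kind == "Integer":
        number = int(text)
        if not -2**31 <= number < 2**31:
            raise ValueError(f"outside the 32-bit signed range: {text!r}")
        return number
    if kind == "Float":
        return float(text)  # Python floats are 64-bit
    if kind == "URL":
        parts = urlparse(text)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an http(s) URL: {text!r}")
        return text
    if kind == "DateTime":
        return datetime.fromisoformat(text)
    if kind == "String":
        return text
    raise ValueError(f"unknown type: {kind!r}")


if __name__ == "__main__":
    print(parse_og_value("Integer", "300"))
    print(parse_og_value("DateTime", "2013-11-05T14:30"))
```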
The open source community has developed several parsers and publishing tools • Facebook Object Debugger—Facebook's official parser & debugger • Google Rich Snippets Testing Tool—Open Graph protocol support in specific verticals and Search Engines. • OpenGraph.in—a service that parses Open Graph protocol markup and outputs HTML and JSON • PHP Validator and Markup Generator—OGP 2011 input validator and markup generator in PHP5 objects • PHP Consumer—a small library for accessing of Open Graph Protocol data in PHP • OpenGraphNode in PHP—a simple parser for PHP • PyOpenGraph—a library written in Python for parsing Open Graph protocol information from web sites • Continued
OpenGraph Ruby—a Ruby Gem that parses web pages and extracts Open Graph protocol markup • OpenGraph for Java—a small Java class used to represent the Open Graph protocol • RDF::RDFa::Parser—a Perl RDFa parser that understands the Open Graph protocol • WordPress plugin—Facebook's official WordPress plugin WordPress http://wordpress.org/ • A free and open source blogging tool and a content management system (CMS) based on PHP and MySQL • Runs on a web hosting service • Used by more than 18.9% of the top 10 million websites (August 2013) • The most popular blogging system (>60 M websites)
A CURIE (short for Compact URI) defines a generic, abbreviated syntax for expressing URIs, e.g., [isbn:0393315703]
• May be considered a datatype
• The square brackets may be used to prevent ambiguities between CURIEs and regular URIs, yielding so-called safe CURIEs
• QNames may be considered a type of CURIE
• CURIEs can be better defined and may include checking
• Unlike QNames, the part of a CURIE after the colon needn’t conform to the rules for XML element names
• CURIE Syntax 1.0 became a W3C Candidate Recommendation in January 2009 and was finally published as a W3C Working Group Note in December 2010
CURIE Example (using a QName syntax within XHTML)
<html xmlns:wiki="http://en.wikipedia.org/wiki/">
  <head>...</head>
  <body>
    <p>
      Find out more about <a href="[wiki:Biome]">biomes</a>.
    </p>
  </body>
</html>
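Expanding a CURIE amounts to replacing the prefix with the URI it is bound to. A minimal Python sketch, assuming the prefix bindings are already available as a dictionary (the function name expand_curie and the isbn binding are made up for illustration):

```python
def expand_curie(value, prefixes):
    """Expand a CURIE or safe CURIE ([prefix:reference]) to a full URI.

    A value that is not in square brackets and has no known prefix is
    returned unchanged, on the assumption that it is an ordinary URI.
    """
    safe = value.startswith("[") and value.endswith("]")
    curie = value[1:-1] if safe else value
    prefix, sep, reference = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    if safe:
        # A safe CURIE can only be a CURIE, so an unknown prefix is an error.
        raise ValueError(f"undefined prefix in safe CURIE: {value!r}")
    return value


PREFIXES = {
    "wiki": "http://en.wikipedia.org/wiki/",  # from the xmlns:wiki declaration
    "isbn": "urn:ISBN:",                      # hypothetical binding
}

if __name__ == "__main__":
    print(expand_curie("[wiki:Biome]", PREFIXES))
    print(expand_curie("http://example.org/page", PREFIXES))
```

The second call shows why the square brackets help: without them, http could be mistaken for a prefix, and the sketch can only fall back to treating the value as a regular URI.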
RDFa Play http://rdfa.info/play/ • Beta version (still bugs) yet very useful • HTML fragment with RDFa in left panel, rendering in right • Choose to see (below the panels) either N3 serialization of contained RDF or its graphical visualization • Examples of type Person, Social Network, Event, Place, Product, SVG • Edit these or make your own HTML fragments from scratch
See the Tools tab at the RDFa.info web page (http://rdfa.info/tools/)
The W3C’s Nu Markup Validation Service http://validator.w3.org/nu/
• Handles RDFa in XML and (X)HTML (various versions) as well as SVG and MathML
• Can automatically detect content type
java-rdfa https://github.com/shellac/java-rdfa • An offshoot of the Stars Project, Univ. of Bristol, Institute for Learning and Research Technology (Web Futures team) • STARS (roughly Semantic Tools for Screen Arts Research) project (http://www.dshed.net/dshed/stars, http://stars.ilrt.bris.ac.uk/blog/) is now finished • Funded by JISC, a charity that champions the use of digital technologies in UK education and research • The Semantic Web technologies used in it broadly seek to capture and make machine readable data resources of video content • Lets people browsing the content discover thematic links and describe them in new ways
For HTML sources, add the format argument; this needs the validator.nu parser (see below)
$ java -cp '*' rdfa.simpleparse --format HTML http://www.slideshare.net/intdiabetesfed/world-diabetes-day-2009
<http://www.slideshare.net/intdiabetesfed/world-diabetes-day-2009> <http://www.w3.org/1999/xhtml/vocab#stylesheet> <http://public.slidesharecdn.com/v3/styles/combined.css?1265372095> .
...
• The output of simpleparse is N-Triples (hard to read)
• Add Jena to the classpath and use rdfa.parse instead
$ java -cp '*:/path/to/jena/lib/*' rdfa.parse --format HTML http://www.slideshare.net/intdiabetesfed/world-diabetes-day-2009
@prefix dc: <http://purl.org/dc/terms/> .
@prefix hx: <http://purl.org/NET/hinclude> .
... nice turtle output ...
java-rdfa can be used from Jena—invoke Class.forName("net.rootdev.javardfa.RDFaReader"); • This hooks the 2 readers into Jena, then we can do either of the following model.read(url, "XHTML"); // xml parsing model.read(other, "HTML"); // html parsing
The Validator.nu HTML Parser http://about.validator.nu/htmlparser/ • An implementation of the HTML5 parsing algorithm in Java • Works as a drop-in replacement for the XML parser in applications that • already support XHTML 1.x content with an XML parser and • use SAX, DOM or XOM to interface with the parser • The parser core compiles on Google Web Toolkit
The following are mentioned in RDFa.info, Developers link, http://rdfa.info/dev/ Green Turtle http://code.google.com/p/green-turtle/ • An implementation of RDFa 1.1 for browsers • Including a bit of JavaScript extends the DOM to include the RDFa API • An RDFa 1.1 processor to process any ancillary documents to harvest triples EasyRdf http://www.easyrdf.org/ • A PHP library to make it easy to consume and produce RDF—e.g., $foaf = new EasyRdf_Graph("http://njh.me/foaf.rdf"); $foaf->load(); $me = $foaf->primaryTopic(); echo "My name is: ".$me->get('foaf:name')."\n"; • There’s a class to map between RDF Types and PHP Classes • Support for visualization of graphs using GraphViz • EasyRdf 0.8 does support RDFa, but it's still in beta • Use the converter at easyrdf-converter.aelius.com to test it out
pyrdfa3 https://github.com/RDFLib/pyrdfa3 • This is what provides the W3C’s RDFa Distiller and Parser • Part of Python RDFLib, https://github.com/RDFLib The RDFa gem http://rubygems.org/gems/rdf-rdfa • The Ruby RDF Project collects numerous gems supporting Linked Data and Semantic Web programming in Ruby • See http://ruby-rdf.github.io/ librdfa, “The Fastest RDFa Processor on the Internet” https://github.com/rdfa/librdfa/ • A SAX-based RDFa processor written in C for XML and HTML family languages • Supports XML+RDFa, XHTML+RDFa, SVG+RDFa, HTML4+RDFa and HTML5+RDFa for both RDFa 1.0 and RDFa 1.1
clj-rdfa https://github.com/niklasl/clj-rdfa
• An RDFa extractor implemented in Clojure, running on a Java Virtual Machine
• Clojure (pronounced “closure”) is a dialect of the Lisp programming language
• A functional general-purpose language
• Runs on the Java Virtual Machine, Common Language Runtime, and JavaScript engines
• Focus is on programming with immutable values and explicit progression-of-time constructs
• Facilitates the development of more robust programs, particularly multithreaded ones
Semargl http://semarglproject.org/
• Download from https://github.com/levkhomich/semargl
• A modular framework for crawling linked data from structured documents
• Provides lightweight and performant tools without excess dependencies
• High-performance streaming parsers for RDFa, JSON-LD (see below), RDF/XML, N-Triples
• Streaming serializers for Turtle, N-Triples, N-Quads
• Integration with Jena, Sesame (see below) and Clerezza (see below)
• Small memory footprint and CPU requirements allow this framework to be used by any application
• Runs seamlessly on Android and GAE (Google App Engine)
Sesame http://www.openrdf.org/about.jsp • An open-source framework for querying and analyzing RDF data • Implements an in-memory triple store and an on-disk triple store • And 2 Servlet packages to manage and provide access to these triple stores on a permanent server • The Sesame Rio (RDF Input/Output) package contains a simple API for Java-based RDF parsers and writers • Supports 2 query languages: SPARQL and SeRQL (in the SWI-Prolog Semantic Web Library, http://www.swi-prolog.org/pldoc/package/semweb.html, see also http://www.swi-prolog.org/web/) • Its Alibaba component is an API that lets us • map Java classes onto ontologies and • Generate Java source files from ontologies • Can thus use specific ontologies like RSS, FOAF and the DC directly from Java
Clerezza http://clerezza.apache.org/
• A service platform based on OSGi (Open Services Gateway initiative, open specifications that enable the modular assembly of software built with Java technology, http://www.osgi.org/)
• Functionality for managing semantically linked data accessible through RESTful Web Services and in a secured way
• Tools to manipulate RDF data, create RESTful Web Services and Renderlets using Scala Server Pages
• In Clerezza, a renderlet renders a resource from an RDF graph into a representation such as HTML
• Pimcore (an open source web content management platform implemented in PHP and MySQL) also has containers called renderlets, which can receive any Pimcore object, but these are unrelated to Clerezza
• Scala Server Pages are like JSPs but for Scala instead of Java
• Scala is an object-functional programming and scripting language for general software applications
RDF triples are stored via Clerezza’s Smart Content Binding (SCB)
• A Java implementation of the graph data model and functionalities to operate on it
• A service interface to access multiple named graphs
• Can use various providers to manage RDF graphs in a technology-specific manner (using, e.g., Jena or Sesame)
• Provides for adaptors that allow an application to use various APIs (including the Jena API) to process RDF graphs
• A serialization and a parsing service to convert a graph into a certain representation and vice versa
JSON-LD (JSON for Linked Data, http://json-ld.org/) is a method of transporting Linked Data using JSON • Being standardized by the W3C RDF Working Group (http://www.w3.org/TR/2013/PR-json-ld-20131105/, Nov. 2013) • Linked Data is a way of publishing structured data so that it can be interlinked and more useful • Builds upon standard Web technologies (HTTP, RDF, URIs, …) • Extends them to share info in a computer-readable way so that data from different sources can be connected and queried • JSON-LD aims to require as little effort as possible from developers to transform their existing JSON to JSON-LD • Designed around the concept of a “context” to provide additional mappings from JSON to an RDF-like model • See the playground at http://json-ld.org/playground/
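The role of the “context” can be shown with a tiny Python sketch that maps the keys of an ordinary JSON object to the IRIs given in its @context. This is a deliberately simplified illustration, not the JSON-LD expansion algorithm: it handles only a flat document, the function name expand_terms is hypothetical, and the example data (Alice, example.org) is invented.

```python
import json

# An ordinary JSON object made into JSON-LD by adding @context and @id.
DOCUMENT = json.loads("""
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"}
  },
  "@id": "http://example.org/people/alice",
  "name": "Alice",
  "homepage": "http://example.org/alice/"
}
""")


def expand_terms(document):
    """Replace each plain key with the IRI that the context maps it to."""
    context = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        if key.startswith("@"):
            expanded[key] = value  # keywords such as @id are kept as they are
            continue
        definition = context.get(key, key)
        # A term definition is either an IRI string or an object with an @id.
        iri = definition["@id"] if isinstance(definition, dict) else definition
        expanded[iri] = value
    return expanded


if __name__ == "__main__":
    print(json.dumps(expand_terms(DOCUMENT), indent=2))
```

The existing keys name and homepage stay exactly as a developer would already write them; only the added @context ties them to the FOAF vocabulary.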
checkrdfa http://check.rdfa.info/ • Checks a web page for RDFa and displays the found data • Validates our data against the published recommendations from major consumers/users of RDFa data • I don’t think this works anymore
Microformats • See microformats.org at http://microformats.org/ • Primer: http://www.digital-web.com/articles/microformats_primer/ • A microformat (abbreviated μF) is a web-based approach to semantic markup • Re-uses existing HTML/XHTML tags to convey metadata and other attributes in web pages and other contexts that support (X)HTML (e.g., RSS) • Lets software process info intended for end-users (e.g., contact info, geographic coords, calendar events) automatically • Established microformats (e.g., hCard) are published on the web at least as often as alternatives (e.g., schema and RDFa) • hCard is a microformat version of vCard
Mozilla Operator add-on https://addons.mozilla.org/en-US/firefox/addon/operator/
• Leverages microformats and other semantic data available on many web pages to provide new ways to interact with web services
• After adding it, choose View > Toolbars > Customize and drag the Operator icon from the palette to the add-on bar
• Then, at the top of the Firefox window, choose View > Sidebar and click Operator
Operator toolbar Operator icon and drop-down menu
Add various items of info to various services • Here add an event to my Google Calendar • Get the same options in the toolbar just above the page