This discussion, led by John Snelson, focuses on leveraging RDF and XML databases to enhance the dynamic semantic publishing of sports data. It highlights the challenges faced in managing large datasets—covering numerous athletes, teams, and related statistics—while ensuring a rich user experience. The talk addresses solutions for automating content management, personalizing user journeys, and adapting to fast-changing information. By utilizing triple stores and linked data, it emphasizes robust and scalable architectures for effective data retrieval and analytics within sports media.
An RDF and XML Database John Snelson, Lead Engineer 23rd October 2013
MarkLogic: Database, Search, Application Services
Data ≠ Information
Data + Context = Information
Dynamic Semantic Publishing: BBC Sports
The Challenge
• Size and complexity:
  • # of athletes
  • # of teams
  • # of assets (match reports, statistics, etc.)
  • # of relations (facts)
Goals
• Rich user experience
  • See information in context
  • Personalize content
  • Easy navigation
  • Intelligently serve ads (outside of UK)
• Manageable
  • Static pages? Too many, changing too fast
  • Limited number of journalists
  • Automate as much as possible
Dynamic Semantic Publishing: A Solution
XML Database
• Store, manage documents
  • Stories
  • Blogs
  • Feeds
  • Profiles
• Store, manage values
  • Statistics
• Full-text search
• Performance, scalability
• Robustness
Triple Store
• Metadata about documents
  • Tagged by journalists
  • Added (semi-)automatically
  • Inferred
• Facts reported by journalists
• Linked Open Data for real-world facts
Dynamic Semantic Publishing: Understanding Data
[Diagram: a graph whose edges are labelled "played in", "plays for", and "plays in"]
What is RDF?
[Diagram: an RDF graph with the nodes :person4, :person5, :person20, and :place5 and the literal "John", connected by the predicates :first-name, :birth-place, :spouse, :has-child, and :has-parent]
What is RDF?
• Schema-less
• Triple granularity
• Open world assumption
• Joins: the cost of granularity
What is Semantics?
Data stored in Triples
• Expressed as Subject : Predicate : Object
Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"
What is Semantics?
Data stored in Triples
• Expressed as Subject : Predicate : Object
Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"
Rules tell us something about the triples
Example: If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
Inference: "John Smith" : livesIn : "England"
What is Semantics?
Data stored in Triples
• Expressed as Subject : Predicate : Object
Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"
Rules tell us something about the triples
[Diagram: "John Smith" -> livesIn -> "London" -> isIn -> "England", plus the inferred edge "John Smith" -> livesIn -> "England"]
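The rule on this slide can be tried out with a few lines of Python. This is only a minimal sketch of forward chaining for that single rule over an in-memory set of tuples; the function name `infer_lives_in` is made up for illustration, and nothing here reflects how MarkLogic itself evaluates rules.

```python
# Toy forward chaining for one rule; triples are plain (subject, predicate, object) tuples.
def infer_lives_in(triples):
    """Apply (A livesIn X) AND (X isIn Y) => (A livesIn Y) until nothing new appears."""
    known = set(triples)
    while True:
        new = {
            (a, "livesIn", y)
            for (a, p1, x) in known if p1 == "livesIn"
            for (x2, p2, y) in known if p2 == "isIn" and x2 == x
        } - known
        if not new:
            return known
        known |= new


facts = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}
closure = infer_lives_in(facts)
inferred = closure - facts
print(inferred)
```

The loop runs to a fixed point, so a longer chain of isIn facts (city, country, continent) would also be followed.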
Semantics Architecture
[Diagram: architecture components labelled GRAPH, SPARQL, TRIPLE, XQY, XSLT, and SQL]
Triple Index
• 3 triple orders
• Cached for performance
• Works seamlessly with other indexes
• Security
• 150 bytes per triple on disk
• Billions of triples per host
• Scaling out horizontally
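To make "3 triple orders" concrete, here is a small sketch of an index that keeps the same triples sorted in three permutations and answers a lookup from whichever permutation has the bound positions as a prefix. The slide does not say which orders are stored; subject-predicate-object, predicate-object-subject, and object-subject-predicate are an assumption chosen for illustration, the class name `TripleIndex` is hypothetical, and the sample triples are made up.

```python
from bisect import bisect_left


class TripleIndex:
    # Assumed orders; the slide only says there are three.
    ORDERS = ("spo", "pos", "osp")

    def __init__(self, triples):
        # One sorted copy of the data per order, each row permuted to that order.
        self.rows = {}
        for order in self.ORDERS:
            self.rows[order] = sorted(
                tuple(dict(zip("spo", t))[pos] for pos in order) for t in triples
            )

    def match(self, s=None, p=None, o=None):
        """Return (s, p, o) triples matching the bound positions; None is a wildcard."""
        bound = {"s": s, "p": p, "o": o}

        def bound_prefix(order):
            n = 0
            for pos in order:
                if bound[pos] is None:
                    break
                n += 1
            return n

        # With these three orders, every combination of bound positions
        # is a prefix of at least one order, so a range scan is enough.
        order = max(self.ORDERS, key=bound_prefix)
        n = bound_prefix(order)
        prefix = tuple(bound[pos] for pos in order[:n])
        rows = self.rows[order]
        results = []
        for row in rows[bisect_left(rows, prefix):]:
            if row[:n] != prefix:
                break
            t = dict(zip(order, row))
            results.append((t["s"], t["p"], t["o"]))
        return results


index = TripleIndex([
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":first-name", "Jane"),
    (":person5", ":birth-place", ":place5"),
])
born_in_place5 = index.match(p=":birth-place", o=":place5")
print(born_in_place5)
```

Keeping three sorted copies trades disk space for lookups that never need a full scan. At the quoted 150 bytes per triple, one billion triples would occupy roughly 150 GB on disk.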
RDF Loading
Triples Embedded in Documents
…
<sem:triple>
  <sem:subject>http://example.org/kennedy/person12</sem:subject>
  <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
</sem:triple>
…
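Because the triple is ordinary XML inside a document, any XML parser can pull it out. The sketch below does so with Python's standard library. The wrapper elements `<person>` and `<name>` are hypothetical, and the namespace bound to the `sem:` prefix is assumed to be `http://marklogic.com/semantics`, which the slide does not show.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the sem: prefix; not shown on the slide.
SEM_NS = "http://marklogic.com/semantics"

# Hypothetical document: only the <sem:triple> element comes from the slide.
doc = f"""
<person xmlns:sem="{SEM_NS}">
  <name>Lawford</name>
  <sem:triple>
    <sem:subject>http://example.org/kennedy/person12</sem:subject>
    <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
  </sem:triple>
</person>
"""


def extract_triples(xml_text):
    """Return (subject, predicate, object, datatype) for every embedded sem:triple."""
    ns = {"sem": SEM_NS}
    root = ET.fromstring(xml_text)
    triples = []
    for t in root.iter(f"{{{SEM_NS}}}triple"):
        subject = t.findtext("sem:subject", namespaces=ns).strip()
        predicate = t.findtext("sem:predicate", namespaces=ns).strip()
        obj = t.find("sem:object", ns)
        triples.append((subject, predicate, obj.text.strip(), obj.get("datatype")))
    return triples


embedded = extract_triples(doc)
print(embedded)
```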
Content, Data, and Semantics
<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <description>
    A blue van with license plate ABC 123 was observed parked behind the airport sign…
    <triple>
      <subject>IRIID</subject>
      <predicate>isa</predicate>
      <object>license-plate</object>
    </triple>
    <triple>
      <subject>IRIID</subject>
      <predicate>value</predicate>
      <object>ABC 123</object>
    </triple>
  </description>
</SAR>
Content, Data, and Semantics
[Diagram: the <SAR> document from the previous slide, with its parts labelled Unstructured full-text, Geospatial Data, and Semantic (RDF) Triples]
RDF Values
<http://example.org/kennedy/person4>
_:blank1
"string value"^^xs:string
"bonjour"@fr
"2013-04-09"^^xs:date
"simple"
"987"^^xs:double
Datatype Mapping
Kind                     RDF                    XQuery
IRI                      <http://example.com>   sem:iri("http://example.com")
Blank Node               _:blank1               sem:blank("…")
Simple Literal           "simple"               xs:string("simple")
Language Tagged Literal  "bonjour"@fr           rdf:langString("bonjour", "fr")
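The kinds of RDF value listed above can be told apart from their surface syntax alone. The following sketch classifies a term written in that syntax; `Term` and `parse_term` are hypothetical names, the regular expressions cover only the simple forms shown on the slides (no escape sequences, no long-quoted strings), and it is not a full Turtle or SPARQL term parser.

```python
import re
from typing import NamedTuple, Optional


class Term(NamedTuple):
    kind: str
    value: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


# Order matters: typed and language-tagged literals are tried before simple ones.
_PATTERNS = [
    ("iri", re.compile(r'^<([^<>\s]*)>$')),
    ("blank", re.compile(r'^_:(\w+)$')),
    ("typed-literal", re.compile(r'^"(.*)"\^\^(\S+)$')),
    ("lang-literal", re.compile(r'^"(.*)"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)$')),
    ("simple-literal", re.compile(r'^"(.*)"$')),
]


def parse_term(text):
    """Classify one RDF term given in the notation used on the slides."""
    for kind, pattern in _PATTERNS:
        m = pattern.match(text)
        if not m:
            continue
        if kind == "typed-literal":
            return Term(kind, m.group(1), datatype=m.group(2))
        if kind == "lang-literal":
            return Term(kind, m.group(1), lang=m.group(2))
        return Term(kind, m.group(1))
    raise ValueError(f"not a recognised RDF term: {text!r}")


for sample in ['<http://example.org/kennedy/person4>', '_:blank1',
               '"bonjour"@fr', '"987"^^xs:double', '"simple"']:
    print(parse_term(sample))
```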
SPARQL
select * where { ?person :birth-place ?place; :first-name "John" }
• Executed using the triple index
• SPARQL 1.0 + much of SPARQL 1.1
• Cost-based optimization
• Join ordering and algorithms
Executing SPARQL
sem:sparql("
  prefix : <http://example.org/kennedy/>
  select * {
    ?person :first-name ?first;
            :last-name ?last;
            :alma-mater [:ivy-league :true]
  }",
  map:entry("first", "John"),
  (),
  cts:collection-query("mycollection")
)
Returning Binding Solutions
select * where { ?person :birth-place :place5 }
select * where { ?person :birth-place ?place; :first-name "John" }
Solution Results
[Table of binding solutions, labelled map:map]
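A binding solution is just a mapping from variable names to values, one mapping per way the pattern matches the data. The sketch below evaluates the second query above as a naive nested-loop join over an in-memory list and returns each solution as a Python dict. The helper names and the sample triples are made up for illustration; a real engine would use the triple index and a cost-based join order instead.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must agree with any value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, triples):
    """Join the triple patterns left to right, returning all binding solutions."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# Made-up sample data in the style of the slides.
data = [
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":first-name", "Jane"),
    (":person5", ":birth-place", ":place5"),
    (":person20", ":first-name", "John"),
    (":person20", ":birth-place", ":place7"),
]
query = [
    ("?person", ":birth-place", "?place"),
    ("?person", ":first-name", "John"),
]
solutions = solve(query, data)
for solution in solutions:
    print(solution)
```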
SPARQL Query Results XML Format
sem:query-result-serialize(
  sem:sparql("select * { … }"),
  "xml"
)
Returning Triples
describe :person4
construct { ?bp :uses-name ?fn }
where { ?person :birth-place ?bp; :first-name ?fn }
Triple Results
:place0 :uses-name "Ethel", "Jeffrey", "Kara" .
:place1 :uses-name "Edward", "James" .
:place10 :uses-name "Robert", "Sheila", "Stephen" .
[Labels: sem:triple, sem:iri]
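A construct query turns binding solutions back into triples by filling in a template once per solution. The sketch below shows that step on its own, starting from solutions that are assumed to have been produced already. The function name `construct` and the person identifiers in the sample solutions are hypothetical; the place and names echo the `:place1` row above.

```python
def construct(template, solutions):
    """Instantiate each template triple once per solution, dropping duplicates."""
    triples = set()
    for solution in solutions:
        for pattern in template:
            triple = tuple(solution.get(term, term) for term in pattern)
            # Skip triples that still contain an unbound variable.
            if not any(term.startswith("?") for term in triple):
                triples.add(triple)
    return triples


# Assumed solutions of: where { ?person :birth-place ?bp; :first-name ?fn }
solutions = [
    {"?person": ":personA", "?bp": ":place1", "?fn": "Edward"},
    {"?person": ":personB", "?bp": ":place1", "?fn": "James"},
]
template = [("?bp", ":uses-name", "?fn")]
built = construct(template, solutions)
print(sorted(built))
```

Returning a set mirrors the fact that an RDF graph cannot hold the same triple twice.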
Querying Named Graphs
select * from <http://my_graph> where { ?s ?p ?o }
[Label: collection]
select * where { graph <http://my_graph> { ?s ?p ?o } }
Restricting The Datasets
let $options := "properties"
let $query := cts:and-query((
  cts:directory-query("/triples/"),
  cts:element-range-query(xs:QName("date"), ">", $date)
))
return sem:sparql("…", (), (), $options, $query)
Creating Triples
Returning sem:triple values
• sem:triple()
• sem:rdf-parse()
• sem:rdf-get()
• sem:rdf-builder()
Inserting to a database
• sem:rdf-load()
• sem:rdf-insert()
Graph Store API
declare function graph-insert(
  $graphname as sem:iri,
  $triples as sem:triple*,
  [$permissions as element(sec:permission)*,
   $collections as xs:string*,
   $quality as xs:int?,
   $forest-ids as xs:unsignedLong*]
) as xs:string*;

declare function graph-delete(
  $graphname as sem:iri
) as empty-sequence();
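The idea behind a graph store, inserting triples under a graph name and deleting a whole graph by name, can be modelled in a few lines. This is a toy in-memory analogue only: the class `GraphStore` and its methods are hypothetical, they are not the MarkLogic functions declared above, and the choice to treat a query without a graph name as the union of all graphs is an assumption.

```python
from collections import defaultdict


class GraphStore:
    """Toy named-graph store: a dict from graph name to a set of triples."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def graph_insert(self, graph_name, triples):
        self._graphs[graph_name].update(triples)

    def graph_delete(self, graph_name):
        self._graphs.pop(graph_name, None)

    def triples(self, graph_name=None):
        # Assumption: no graph name means the union of all graphs.
        if graph_name is None:
            return set().union(*self._graphs.values())
        return set(self._graphs.get(graph_name, ()))


store = GraphStore()
store.graph_insert("http://my_graph", [(":person4", ":first-name", "John")])
store.graph_insert("http://other_graph", [(":person5", ":first-name", "Jane")])

in_my_graph = store.triples("http://my_graph")
everything = store.triples()
store.graph_delete("http://my_graph")
after_delete = store.triples()
print(in_my_graph, everything, after_delete)
```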
Conclusion
• Semantics can enhance your data-oriented and search applications.
• XQuery and SPARQL work well together.
• A combined RDF and XML database simplifies working with the technologies together.
• Try MarkLogic 7: http://www.marklogic.com/early-access/