
Query Languages for XML


milos




Presentation Transcript


  1. Query Languages for XML CSE330

  2. Why a query language? Extracting, restructuring, integration, browsing… • XML-QL: http://www.w3.org/TR/NOTE-xml-ql, http://db.cis.upenn.edu/XML-QL/ • XPath (part of a query language): http://www.w3.org/TR/xpath • XSLT: http://www.w3.org/TR/xslt, http://www.mulberrytech.com/quickref/XSLTquickref.pdf • Quilt: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html, http://db.cis.upenn.edu/Kweelt/

  3. XML-QL (XML Query Language) • W3C proposal, August 1998 • Authors: • Mary Fernandez, AT&T • Daniela Florescu, INRIA • Alon Levy, Univ. of Washington • Dan Suciu, AT&T • Alin Deutsch, Univ. of Pennsylvania

  4. Address Book Revisited <addrBook> <person SSN="111-22-3333"> <name> Caesar </name> <greet> Caesar Imperator </greet> <addr> The Capitol </addr> <addr> Rome, OH 98765 </addr> <tel> (321) 786 2543 </tel> <fax> (321) 786 2543 </fax> <tel> (321) 786 2543 </tel> <email> jc@forum.rome.org </email> </person> </addrBook>

  5. XML-QL: Pattern Matching. Find Caesar’s e-mail address: where <addrBook> <person> <name>Caesar</name> <email>$e</email> </person> </addrBook> in "http://db.cis.upenn.edu/~peter/address.xml" construct $e Result: <XML>jc@forum.rome.org</XML> (data extraction)
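The where clause is a pattern that must match the document's structure, with $e bound to the text in the email position. As a rough illustration only, the same extraction can be written in Python with the standard xml.etree.ElementTree module. This is a sketch, not an XML-QL engine. It makes three assumptions. First, the address book is inlined as a string instead of being fetched from the URL. Second, leading and trailing whitespace in text is ignored, which the slide does not specify. Third, find_emails is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Slide 4's address book, trimmed; XML-QL would read it from the URL in the "in" clause.
ADDR_BOOK = """<addrBook>
  <person SSN="111-22-3333">
    <name> Caesar </name>
    <greet> Caesar Imperator </greet>
    <email> jc@forum.rome.org </email>
  </person>
</addrBook>"""

def find_emails(xml_text, name):
    """Approximate: where <addrBook><person><name>NAME</name><email>$e</email></person></addrBook>."""
    root = ET.fromstring(xml_text)
    if root.tag != "addrBook":          # the pattern's outermost element must match
        return []
    emails = []
    for person in root.findall("person"):
        # A person matches only if one of its <name> children has the requested text
        # (whitespace-trimmed: an assumption).
        if any((n.text or "").strip() == name for n in person.findall("name")):
            # Every <email> child of a matching person yields one binding of $e.
            emails.extend((e.text or "").strip() for e in person.findall("email"))
    return emails

print(find_emails(ADDR_BOOK, "Caesar"))  # ['jc@forum.rome.org']
```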

  6. XML-QL: Constructing New XML Data. Whom can we contact electronically? where <addrBook> <person> <greet>$g</greet> <email>$e</email> </person> </addrBook> in "http://..." construct <e-contact> <who>$g</who> <where>$e</where> </e-contact> Result: <XML> <e-contact> <who>Caesar Imperator</who> <where>jc@forum.rome.org</where> </e-contact> <e-contact> <who>Brutus</who> <where>mb@philippi.com</where> </e-contact> ... </XML> (data restructuring)

  7. XML-QL: Joins. Who among our contacts was involved in a movie? where <addrBook> <person> <greet>$g</greet> <email>$e</email> </person> </addrBook> in "http://…address.xml", <movie> <title>$t</> <character>$g</> </movie> in "http://www.imdb.com" construct <cine-contact> <who>$g</who> <movie>$t</movie> <where>$e</where> </cine-contact> (the shared variable $g joins the two sources)

  8. XML-QL: Joins (cont’d) Result: <XML> <cine-contact> <who>Caesar Imperator</who> <movie>Asterix and Cleopatra</movie> <where>jc@forum.rome.org</where> </cine-contact> <cine-contact> <who>Dr. Strangelove</who> <movie>Dr. Strangelove or How I Stopped ...</movie> <where>strangelov@love.the.bomb</where> </cine-contact> ... </XML> (data integration)
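Because $g appears in both patterns, the two sources are joined on equality of the greeting and the movie character. The sketch below reproduces this in Python with ElementTree. It indexes movies by character in a dictionary, which makes it a hash join; a nested-loop join would produce the same bindings. The movie document is a small hypothetical stand-in, not data from http://www.imdb.com, and cine_contacts is an illustrative name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ADDR_BOOK = """<addrBook>
  <person><greet>Caesar Imperator</greet><email>jc@forum.rome.org</email></person>
  <person><greet>Brutus</greet><email>mb@philippi.com</email></person>
</addrBook>"""

# Hypothetical stand-in for the movie source (not real IMDb data).
MOVIES = """<movies>
  <movie><title>Asterix and Cleopatra</title><character>Caesar Imperator</character></movie>
  <movie><title>Hypothetical Film</title><character>Nobody</character></movie>
</movies>"""

def cine_contacts(addr_xml, movie_xml):
    # Index titles by character: the shared variable $g is the equality join key.
    titles_by_character = defaultdict(list)
    for movie in ET.fromstring(movie_xml).iter("movie"):
        for c in movie.findall("character"):
            for t in movie.findall("title"):
                titles_by_character[c.text].append(t.text)

    out = ET.Element("XML")
    for person in ET.fromstring(addr_xml).findall("person"):
        for g in person.findall("greet"):
            for e in person.findall("email"):
                # .get() on a defaultdict does not insert missing keys.
                for title in titles_by_character.get(g.text, []):
                    contact = ET.SubElement(out, "cine-contact")
                    ET.SubElement(contact, "who").text = g.text
                    ET.SubElement(contact, "movie").text = title
                    ET.SubElement(contact, "where").text = e.text
    return out

print(ET.tostring(cine_contacts(ADDR_BOOK, MOVIES), encoding="unicode"))
```

Brutus produces no output because no movie lists him as a character, just as an unmatched pattern produces no bindings in XML-QL.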

  9. XML-QL Data Model • Directed, labeled graph • Tags represented as edge labels • Sets of attribute name-value pairs as node labels • Two models: ordered and unordered

  10. XML-QL Data Model (cont’d) [Figure: the element below drawn as a labeled graph: edges addrBook and person lead to a node labeled SSN="111-…", whose outgoing edges name, greet, addr, addr, tel, fax, tel, email lead to their text values] <person SSN="111-22-3333"> <name> Caesar </name> <greet> Caesar Imperator </greet> <addr> The Capitol </addr> <addr> Rome, OH 98765 </addr> <tel> (321) 786 2543 </tel> <fax> (321) 786 2543 </fax> <tel> (321) 786 2543 </tel> <email> jc@forum.rome.org </email> </person>

  11. XML-QL Semantics: Variable Bindings [Figure: an addrBook graph with two person nodes, one for Caesar (greet: Caesar Imperator) and one for Strangelove (greet: Dr. Strangelove), each with SSN, name, greet, addr, tel, fax and email edges] Query: where <addrBook> <person> <name>$n</> <email>$e</> </> </> Bindings: ($n, $e) = (Caesar, jc@forum.rome.org), (Strangelove, strangelov@love.the.bomb)

  12. XML-QL Semantics: XML Output. Bindings: ($n, $e) = (Caesar, jc@forum.rome.org), (Strangelove, strangelov@love.the.bomb) construct <e-contact> <who>$n</who> <where>$e</where> </e-contact> [Figure: the result tree: an XML root with two e-contact children, each with who and where children holding Caesar / jc@forum.rome.org and Strangelove / strangelov@love.the.bomb]
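These two slides describe evaluation in two phases. The where clause produces a table of variable bindings, and the construct clause is instantiated once per row. The following is a minimal Python sketch of that split. It assumes the two-person address book from the figure, trimmed to name and email, and uses the hypothetical helper names bindings and construct.

```python
import xml.etree.ElementTree as ET

# The two persons from the figure, trimmed to the elements the pattern uses.
ADDR_BOOK = """<addrBook>
  <person><name>Caesar</name><email>jc@forum.rome.org</email></person>
  <person><name>Strangelove</name><email>strangelov@love.the.bomb</email></person>
</addrBook>"""

def bindings(xml_text):
    """Phase 1: where <addrBook><person><name>$n</><email>$e</></></> gives one row per match."""
    rows = []
    for person in ET.fromstring(xml_text).findall("person"):
        for n in person.findall("name"):
            for e in person.findall("email"):
                rows.append({"$n": n.text, "$e": e.text})
    return rows

def construct(rows):
    """Phase 2: instantiate <e-contact><who>$n</who><where>$e</where></e-contact> per row."""
    out = ET.Element("XML")
    for row in rows:
        contact = ET.SubElement(out, "e-contact")
        ET.SubElement(contact, "who").text = row["$n"]
        ET.SubElement(contact, "where").text = row["$e"]
    return out

rows = bindings(ADDR_BOOK)
print(rows)
print(ET.tostring(construct(rows), encoding="unicode"))
```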

  13. Advanced XML-QL. Find the tags of person subelements: where <addrBook.person.$tag></> in "http://db.cis.upenn.edu/~peter/address.xml" construct <childOfPerson>$tag</> Find all e-mail addresses and fax numbers: where <addrBook._*.(email | fax)>$eORf</> in "http://db.cis.upenn.edu/~peter/address.xml" construct <emailOrFax>$eORf</> (schema browsing)

  14. More Advanced XML-QL. Find the attributes of person elements: where <_*.person $attrName=$attrVal></> in "http://db.cis.upenn.edu/~peter/address.xml" construct <personAttribute> <name>$attrName</> <value>$attrVal</> </> (schema browsing)
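Three XML-QL features let a query inspect a document's structure rather than just its values: tag variables ($tag), regular path expressions (_*.(email | fax)) and attribute variables ($attrName=$attrVal). ElementTree has no equivalent of these, so the sketch below walks the tree directly in Python. It models only the bindings, not the constructed output. It also keeps one entry per matching element; whether XML-QL collapses duplicate bindings, such as the two tel tags, is not modelled here.

```python
import xml.etree.ElementTree as ET

ADDR_BOOK = """<addrBook>
  <person SSN="111-22-3333">
    <name>Caesar</name><greet>Caesar Imperator</greet>
    <addr>The Capitol</addr><addr>Rome, OH 98765</addr>
    <tel>(321) 786 2543</tel><fax>(321) 786 2543</fax><tel>(321) 786 2543</tel>
    <email>jc@forum.rome.org</email>
  </person>
</addrBook>"""
root = ET.fromstring(ADDR_BOOK)

# where <addrBook.person.$tag></>: $tag ranges over the tags of person's children.
child_tags = [child.tag for person in root.findall("person") for child in person]

# where <addrBook._*.(email | fax)>$eORf</>: _* matches any path, so scan
# every element in the tree (document order) and keep email and fax.
email_or_fax = [el.text for el in root.iter() if el.tag in ("email", "fax")]

# where <_*.person $attrName=$attrVal></>: every attribute of every person element.
person_attributes = [(name, value)
                     for person in root.iter("person")
                     for name, value in person.attrib.items()]

print(child_tags)
print(email_or_fax)
print(person_attributes)
```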

  15. XPath • Reasonably widely adopted, e.g. in XML Schema and in query languages • Neither more nor less expressive than regular path expressions (it cannot express (ab)*) • Primary goal: to select nodes from a given document • Main construct: axis navigation • An XPath path consists of one or more navigation steps, separated by / • A navigation step is a triplet: axis + node test + list of predicates • Examples • /descendant::node()/child::author • /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2] • XPath also offers some shortcuts • no axis means child • // abbreviates /descendant-or-self::node()/
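To make the "axis + node test + list of predicates" structure concrete, here is a small Python parser that splits a location path into such triplets. It is a sketch, not a full XPath grammar. It handles unabbreviated steps plus the two simplest shortcuts (no axis means child, @ means attribute), keeps predicates as unparsed strings, and rejects the // abbreviation. The names Step and parse_path are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    axis: str                                        # e.g. "child", "descendant"
    node_test: str                                   # e.g. "author", "node()"
    predicates: list = field(default_factory=list)   # predicate bodies, left unparsed

def _split_top_level(text, sep):
    """Split on sep, ignoring occurrences inside [...] or quoted strings."""
    parts, buf, depth, quote = [], [], 0, None
    for ch in text:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append("".join(buf))
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf))
    return parts

def parse_step(text):
    head = text.split("[", 1)[0]
    if "::" in head:
        axis, rest = text.split("::", 1)
    elif text.startswith("@"):
        axis, rest = "attribute", text[1:]     # @x is short for attribute::x
    else:
        axis, rest = "child", text             # no axis means child::
    bracket = rest.find("[")
    if bracket == -1:
        return Step(axis, rest)
    predicates, depth, start, quote = [], 0, bracket, None
    for i in range(bracket, len(rest)):
        ch = rest[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "[":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                predicates.append(rest[start:i].strip())
    return Step(axis, rest[:bracket], predicates)

def parse_path(path):
    """Return (is_absolute, steps) for a path whose steps are separated by single '/'."""
    parts = _split_top_level(path, "/")
    absolute = parts[0] == ""
    if absolute:
        parts = parts[1:]
    if any(p == "" for p in parts):
        raise ValueError("the // abbreviation (or a bare /) is not handled in this sketch")
    return absolute, [parse_step(p) for p in parts]

print(parse_path('/descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]'))
```

The second example from the slide parses into a descendant step with node test node() and no predicates, followed by a child step with node test author and two predicates.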

  16. XPath: child axis navigation [Figure: a context node whose children are 1 (aaa), 2 (ccc) and 3 (aaa); its grandchildren are 4–7, where 4 (a child of an aaa node) and 6 (a child of the ccc node) are labeled bbb] • author is shorthand for child::author. Examples: • aaa -- all the child nodes labeled aaa (1, 3) • aaa/bbb -- all the bbb grandchildren of aaa children (4) • */bbb -- all the bbb grandchildren of any child (4, 6) • . -- the context node • / -- the root node

  17. XPath: child axis navigation (cont’d) • /doc -- all the doc children of the root • ./aaa -- all the aaa children of the context node (equivalent to aaa) • text() -- all the text children of the context node • node() -- all the children of the context node (including text nodes, but not attribute nodes) • .. -- the parent of the context node • .// -- the context node and all its descendants • // -- the root node and all its descendants • //para -- all the para nodes in the document • //text() -- all the text nodes in the document • @font -- the font attribute node of the context node
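Python's xml.etree.ElementTree implements a limited subset of XPath that covers most of these child-axis examples, which makes it convenient for trying them out. The tree below follows the figure on slide 16 only as far as its examples pin it down: children 1, 2 and 3 are labeled aaa, ccc and aaa, and grandchildren 4 and 6 are labeled bbb. The labels and placement of 5 and 7 are assumptions, and the id attributes exist only to identify nodes. ElementTree has no text() step and no step that returns attribute nodes, so an attribute such as @font would be read with element.get("font") instead.

```python
import xml.etree.ElementTree as ET

# Children 1-3 and bbb grandchildren 4 and 6 follow slide 16's examples;
# the labels and positions of 5 and 7 are assumptions.
CONTEXT = ET.fromstring("""<context>
  <aaa id="1"><bbb id="4"/><ccc id="5"/></aaa>
  <ccc id="2"><bbb id="6"/><aaa id="7"/></ccc>
  <aaa id="3"/>
</context>""")

def ids(elements):
    """Report matched elements by their id attribute."""
    return [e.get("id") for e in elements]

print(ids(CONTEXT.findall("aaa")))       # aaa: child::aaa
print(ids(CONTEXT.findall("aaa/bbb")))   # aaa/bbb: bbb grandchildren via aaa children
print(ids(CONTEXT.findall("*/bbb")))     # */bbb: bbb grandchildren via any child
print(ids(CONTEXT.findall("./aaa")))     # ./aaa: same as aaa
print(ids(CONTEXT.findall(".//aaa")))    # .//aaa: aaa descendants, in document order
```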

  18. Predicates • node()[2] -- the second child node of the context node • chapter[5] -- the fifth chapter child of the context node • node()[last()] -- the last child node of the context node • chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all the text in descendant text nodes) • person[.//firstname = "Joe"] -- the person children of the context node that have a firstname descendant with string-value "Joe"
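ElementTree also accepts positional predicates and the [tag='text'] form. In the latter, the comparison uses the child's complete text content, which matches the string-value rule stated above. ElementTree does not accept a path such as .//firstname inside a predicate, so that case is filtered in Python instead. The book and people documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; only their shape matters.
BOOK = ET.fromstring("""<book>
  <chapter n="1"><title>introduction</title></chapter>
  <chapter n="2"><title>data model</title></chapter>
  <chapter n="3"><title>XPath</title></chapter>
  <chapter n="4"><title>XML-QL</title></chapter>
  <chapter n="5"><title>Quilt</title></chapter>
</book>""")
PEOPLE = ET.fromstring("""<people>
  <person id="a"><name><firstname>Joe</firstname></name></person>
  <person id="b"><name><firstname>Ann</firstname></name></person>
</people>""")

def attr(elements, key):
    return [e.get(key) for e in elements]

fifth = attr(BOOK.findall("chapter[5]"), "n")                     # positions start at 1
last = attr(BOOK.findall("chapter[last()]"), "n")
intro = attr(BOOK.findall("chapter[title='introduction']"), "n")  # full text content

# person[.//firstname = "Joe"]: ElementTree rejects .// inside a predicate,
# so test each person's firstname descendants in Python (itertext gives the string-value).
joes = [p.get("id") for p in PEOPLE.findall("person")
        if any("".join(f.itertext()) == "Joe" for f in p.iter("firstname"))]

print(fifth, last, intro, joes)
```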

  19. Unions of Path Expressions • employee | consultant -- the union of the employee and consultant nodes that are children of the context node • For some reason, person/(employee|consultant), as in regular path expressions, is not allowed in XPath 1.0 • However, person/node()[boolean(self::employee | self::consultant)] is allowed! • From the XPath specification: • The boolean function converts its argument to a boolean as follows: • a number is true if and only if it is neither positive or negative zero nor NaN • a node-set is true if and only if it is non-empty • a string is true if and only if its length is non-zero • an object of a type other than the four basic types is converted to a boolean in a way that is dependent on that type
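The conversion rules quoted from the specification are easy to state in code. The sketch below models XPath 1.0's boolean() in Python, representing node-sets as Python lists, an assumption made purely for illustration. It then computes the union employee | consultant over the children of a hypothetical person element, in document order.

```python
import math
import xml.etree.ElementTree as ET

def xpath_boolean(value):
    """XPath 1.0 boolean() for the basic types; node-sets are modelled as Python lists."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # True unless the number is positive or negative zero, or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        return len(value) > 0
    if isinstance(value, list):
        return len(value) > 0
    raise TypeError("other object types convert in a type-dependent way")

# Hypothetical person element with mixed children.
PERSON = ET.fromstring("""<person>
  <employee>Ann</employee><manager>Bob</manager><consultant>Cy</consultant>
</person>""")

# employee | consultant, evaluated with person as the context node:
# the union of both child sets, returned in document order.
union = [child.text for child in PERSON if child.tag in ("employee", "consultant")]

print(union)
print(xpath_boolean(union), xpath_boolean(-0.0), xpath_boolean(float("nan")))
```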

  20. Axis navigation • So far, nearly all our expressions have moved us down the tree by moving to child nodes. The exceptions were • . -- stay where you are • / -- go to the root • // -- all descendants of the root • .// -- all descendants of the context node • All other expressions have been abbreviations for child::…, e.g. child::para; child:: is an example of an axis • XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self • Some of these (self, parent) describe single nodes; others describe sequences of nodes

  21. XPath Navigation Axes (thanks to Arnaud Sahuguet) [Figure: the axes drawn relative to the context (self) node: ancestor, preceding-sibling, following-sibling, self, child, attribute, preceding, following, namespace, descendant]

  22. XPath abbreviated syntax • (nothing) = child:: • @ = attribute:: • // = /descendant-or-self::node()/ • . = self::node() • .// = self::node()/descendant-or-self::node()/ • .. = parent::node() • / = (document root)
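ElementTree elements do not record their parent, so the upward and sideways axes need a little help. A common approach, sketched below, is to build a child-to-parent dictionary once and derive parent, ancestor, following-sibling and preceding-sibling from it. The document is hypothetical. Only element nodes are modelled, because ElementTree does not represent text or attributes as separate nodes, and the walk stops at the root element rather than a separate document node. Reverse axes are returned nearest node first, matching XPath's proximity order for those axes.

```python
import xml.etree.ElementTree as ET

# Hypothetical document.
DOC = ET.fromstring("""<doc>
  <sec id="s1"><para id="p1"/><para id="p2"/><para id="p3"/></sec>
  <sec id="s2"><para id="p4"/></sec>
</doc>""")

# Elements carry no parent pointer, so build a child -> parent map once.
PARENT = {kid: elem for elem in DOC.iter() for kid in elem}

def parent(node):              # parent::node(); None for the root element
    return PARENT.get(node)

def ancestors(node):           # ancestor::*, nearest first
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result

def following_siblings(node):  # following-sibling::*, in document order
    p = parent(node)
    if p is None:
        return []
    kids = list(p)
    return kids[kids.index(node) + 1:]

def preceding_siblings(node):  # preceding-sibling::*, nearest first
    p = parent(node)
    if p is None:
        return []
    kids = list(p)
    return kids[:kids.index(node)][::-1]

p2 = DOC.find(".//para[@id='p2']")
print(parent(p2).get("id"), [a.tag for a in ancestors(p2)])
print([s.get("id") for s in following_siblings(p2)],
      [s.get("id") for s in preceding_siblings(p2)])
```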

  23. Quilt, proposed by Chamberlin, Robie and Florescu (from the authors’ slides) • Leverage the most effective features of several existing and proposed query languages • Design a small, clean, implementable language • Cover the functionality required by all the XML Query use cases in a single language • Write queries that fit on a slide • Design a quilt, not a camel

  24. Quilt = XPath + "comprehension" syntax • XML-QL: where <pattern> in <XML-expression>, <pattern> in <XML-expression>, …, <condition> (binds variables) construct <expression> (uses variables) • Quilt: for $x in <XPath-expression>, $y in <XPath-expression>, … (binds variables) where <condition> return <expression> (uses variables)

  25. Examples of Quilt (from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt) Relational data -- two DTDs: <?xml version="1.0" ?> <!DOCTYPE items [ <!ELEMENT items (item_tuple*)> <!ELEMENT item_tuple (itemno, description, offered_by, start_date?, end_date?, reserve_price? )> <!ELEMENT itemno (#PCDATA)> <!ELEMENT description (#PCDATA)> <!ELEMENT offered_by (#PCDATA)> <!ELEMENT start_date (#PCDATA)> <!ELEMENT end_date (#PCDATA)> <!ELEMENT reserve_price (#PCDATA)> ]> <?xml version="1.0" ?> <!DOCTYPE bids [ <!ELEMENT bids (bid_tuple*)> <!ELEMENT bid_tuple (userid, itemno, bid, bid_date)> <!ELEMENT userid (#PCDATA)> <!ELEMENT itemno (#PCDATA)> <!ELEMENT bid (#PCDATA)> <!ELEMENT bid_date (#PCDATA)> ]>

  26. The data <items> <item_tuple> <itemno>1001</itemno> <description>Red Bicycle</description> <offered_by>U01</offered_by> <start_date>1999-01-05</start_date> <end_date>1999-01-20</end_date> <reserve_price>40</reserve_price> </item_tuple> <item_tuple> <itemno>1002</itemno> <description>Motorcycle</description> <offered_by>U02</offered_by> <start_date>1999-02-11</start_date> <end_date>1999-03-15</end_date> <reserve_price>500</reserve_price> </item_tuple> … </items> <bids> <bid_tuple> <userid>U02</userid> <itemno>1001</itemno> <bid>35</bid> <bid_date>99-01-07</bid_date> </bid_tuple> <bid_tuple> <userid>U04</userid> <itemno>1001</itemno> <bid>40</bid> <bid_date>99-01-08</bid_date> </bid_tuple> … </bids>

  27. Query 1 FUNCTION date() { "1999-02-01" } <result> ( FOR $i IN document("items.xml")//item_tuple WHERE $i/start_date LEQ date() AND $i/end_date GEQ date() AND contains($i/description, "Bicycle") RETURN <item_tuple> $i/itemno , $i/description </item_tuple> SORTBY (itemno) ) </result> Notes: simple function definitions; XPath expressions (highlighted in orange); dates are formatted so that lexicographic ordering gives the right result
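For comparison, the same query can be phrased as a Python comprehension over ElementTree elements. This also shows why dates in this format can be compared as plain strings. The sketch uses only the two item tuples reproduced on slide 26, because the rest of items.xml is elided there. As a result, with the slide's date of 1999-02-01 it returns no items, unlike the output on the next slide. The names query1 and today are hypothetical. Items missing start_date or end_date, which the DTD allows, are treated as not in progress; this is an assumption.

```python
import xml.etree.ElementTree as ET

# The two item tuples shown on slide 26; the rest of items.xml is elided there.
ITEMS = """<items>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description>
    <offered_by>U01</offered_by><start_date>1999-01-05</start_date>
    <end_date>1999-01-20</end_date><reserve_price>40</reserve_price></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description>
    <offered_by>U02</offered_by><start_date>1999-02-11</start_date>
    <end_date>1999-03-15</end_date><reserve_price>500</reserve_price></item_tuple>
</items>"""

def in_progress(item, today):
    start, end = item.findtext("start_date"), item.findtext("end_date")
    # YYYY-MM-DD strings sort lexicographically in date order, so plain string
    # comparison works. Missing dates (allowed by the DTD) count as "not in progress".
    return start is not None and end is not None and start <= today <= end

def query1(items_xml, today="1999-02-01"):
    matches = [i for i in ET.fromstring(items_xml).iter("item_tuple")
               if in_progress(i, today) and "Bicycle" in (i.findtext("description") or "")]
    result = ET.Element("result")
    # SORTBY (itemno): compared as strings, which works for these equal-length numbers.
    for i in sorted(matches, key=lambda t: t.findtext("itemno")):
        tup = ET.SubElement(result, "item_tuple")
        ET.SubElement(tup, "itemno").text = i.findtext("itemno")
        ET.SubElement(tup, "description").text = i.findtext("description")
    return result

print(ET.tostring(query1(ITEMS, today="1999-01-10"), encoding="unicode"))
```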

  28. Output from Q1 <?xml version="1.0" ?> <result> <item_tuple> <itemno> 1003 </itemno> <description> Old Bicycle </description> </item_tuple> <item_tuple> <itemno> 1007 </itemno> <description> Racing Bicycle </description> </item_tuple> </result>

  29. Query Q2. For all bicycles, list the item number, description, and highest bid (if any), ordered by item number. <result> ( FOR $i IN document("items.xml")//item_tuple LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno] WHERE contains($i/description, "Bicycle") RETURN <item_tuple> $i/itemno , $i/description , IF ($b) THEN <high_bid> NumFormat("#####.##", max(-1, $b/bid)) </high_bid> ELSE "" </item_tuple> SORTBY (itemno) ) </result> Notes: a variable used inside an XPath expression; lots of coercion
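The key difference from Q1 is LET. Here $b is bound to the whole (possibly empty) set of matching bids rather than iterating over them, and IF ($b) tests whether that set is non-empty. The Python sketch below uses the two bid tuples shown on slide 26. It also uses the Red Bicycle, Motorcycle and Broken Bicycle items, with itemno and description only, taken from slides 26 and 30. Because the remaining bids are elided, the high bid it computes for 1001 is 40, not the 55 shown in the slide's output. NumFormat is not modelled, and query2 is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Item numbers and descriptions from slides 26 and 30; other fields omitted.
ITEMS = """<items>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description></item_tuple>
  <item_tuple><itemno>1008</itemno><description>Broken Bicycle</description></item_tuple>
</items>"""
# The two bid tuples shown on slide 26.
BIDS = """<bids>
  <bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid><bid_date>99-01-07</bid_date></bid_tuple>
  <bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid><bid_date>99-01-08</bid_date></bid_tuple>
</bids>"""

def query2(items_xml, bids_xml):
    all_bids = ET.fromstring(bids_xml).findall("bid_tuple")
    bikes = [i for i in ET.fromstring(items_xml).findall("item_tuple")
             if "Bicycle" in (i.findtext("description") or "")]
    result = ET.Element("result")
    for i in sorted(bikes, key=lambda t: t.findtext("itemno")):
        # LET $b := the whole set of bids on this item (possibly empty), not one bid at a time.
        b = [t for t in all_bids if t.findtext("itemno") == i.findtext("itemno")]
        tup = ET.SubElement(result, "item_tuple")
        ET.SubElement(tup, "itemno").text = i.findtext("itemno")
        ET.SubElement(tup, "description").text = i.findtext("description")
        if b:  # IF ($b): a non-empty set is true
            # Coerce the bid text to numbers before taking the maximum.
            high = max(float(t.findtext("bid")) for t in b)
            ET.SubElement(tup, "high_bid").text = f"{high:g}"
    return result

print(ET.tostring(query2(ITEMS, BIDS), encoding="unicode"))
```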

  30. Output from Q2 <result> <item_tuple> <itemno> 1001 </itemno> <description> Red Bicycle </description> <high_bid> 55 </high_bid> </item_tuple> <item_tuple> <itemno> 1003 </itemno> <description> Old Bicycle </description> <high_bid> 20 </high_bid> </item_tuple> <item_tuple> <itemno> 1007 </itemno> <description> Racing Bicycle </description> <high_bid> 225 </high_bid> </item_tuple> <item_tuple> <itemno> 1008 </itemno> <description> Broken Bicycle </description> </item_tuple> </result>

  31. Query Q3. Find cases where a user with a rating worse than "C" (alphabetically greater) offers an item with a reserve price of more than 1000. <result> ( FOR $u IN document("users.xml")//user_tuple, $i IN document("items.xml")//item_tuple WHERE $u/rating GT 'C' AND $i/reserve_price GT 1000 AND $i/offered_by = $u/userid RETURN <warning> <user_name>$u/name/text()</user_name>, <user_rating>$u/rating/text()</user_rating>, <item_description>$i/description/text()</item_description>, $i/reserve_price </warning> ) </result> Notes: comparing sets with singletons. Are the rules the same as in XPath? In this case the DTD guarantees uniqueness.
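The note about comparing sets with singletons can be made concrete with the existential rule XPath 1.0 uses when one side of a comparison is a node-set: the comparison is true if it holds for at least one node. This Python sketch assumes Quilt follows that rule. It also differs from XPath 1.0 in one respect: XPath 1.0 converts both sides of > to numbers, whereas the sketch compares strings alphabetically, as the slide describes. The users.xml document is not shown in the slides, so the user tuples are hypothetical, and exists_compare is an illustrative name.

```python
import operator
import xml.etree.ElementTree as ET

def exists_compare(nodes, op, value, convert=str):
    """Node set OP single value: true if op(convert(string-value), value) holds for some node."""
    return any(op(convert("".join(n.itertext())), value) for n in nodes)

# Hypothetical tuples; users.xml is not shown in the slides.
user = ET.fromstring("<user_tuple><rating>D</rating></user_tuple>")
odd_user = ET.fromstring("<user_tuple><rating>B</rating><rating>D</rating></user_tuple>")
item = ET.fromstring("<item_tuple><reserve_price>1500</reserve_price></item_tuple>")

print(exists_compare(user.findall("rating"), operator.gt, "C"))                 # True
print(exists_compare(item.findall("reserve_price"), operator.gt, 1000, float))  # True
# Without the DTD's uniqueness guarantee, a user can be both "worse" and "better" than C:
print(exists_compare(odd_user.findall("rating"), operator.gt, "C"),
      exists_compare(odd_user.findall("rating"), operator.lt, "C"))             # True True
```

The last line shows why the DTD's uniqueness guarantee matters. A user with two ratings satisfies both rating GT 'C' and rating LT 'C'.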

  32. Conclusions • XML is a data format for which there is an increasing number of useful tools for • Constructing schemas • Programming • Querying • Although a query language is likely to emerge soon as a standard, there is less agreement or understanding on how to store XML data efficiently. • Many other database issues must be resolved before XML is useful for manipulating large amounts of data.
