CIS 550


  1. CIS 550 Handout 7 -- XPATH and XQuery Fall 2001

  2. URLs -- XPath
     • http://www.w3.org/TR/xpath
       This is the “recommendation”. Dense. Few examples. Difficult to extract the “big picture” from the morass of detail.
     • http://www.zvon.org/xxl/XPathTutorial/General/examples.html
       A tutorial with some simple examples. Maybe too simple. There are lots of tutorials on the web.

  3. URLs -- XQuery
     • http://www.w3.org/TR/xquery/
       The basic recommendation. Plenty of examples, so work through these first.
     • http://www.w3.org/TR/query-semantics/
       A formal semantics for XQuery. Despite its forbidding title, it is remarkably readable. It also discusses a type system for XQuery.
     • http://www.w3.org/TR/xmlquery-use-cases
       A bunch of example queries and their solutions in XQuery (not surprising, since XQuery is Turing-complete!)

  4. How to Identify Nodes in a Tree -- Regular Path Expressions
     [Figure: a tree rooted at db, with nodes labeled depts, dept, mgr, emps, emp, emp and three name nodes holding “Mary”, “Bill” and “John”]
     In the normal syntax of regular expressions:
       db.emps.emp
       db.(depts.dept.mgr | emps.emp)
       db._*.name
     N.B. Regular path expressions have nothing to do with regular expressions in DTDs.
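The three regular path expressions above can be tried out with a minimal Python sketch. It assumes a particular nesting for the figure's tree (mgr under dept under depts, each emp under emps, and which name belongs to whom), and the helpers `label_paths` and `select` are hypothetical names introduced here. The idea is only to spell each node's path from the root as a dotted string and match it against an ordinary regular expression.

```python
import re
import xml.etree.ElementTree as ET

# Assumed shape of the tree in the figure; the exact nesting is a guess.
DOC = """<db>
  <depts><dept><mgr><name>Mary</name></mgr></dept></depts>
  <emps><emp><name>Bill</name></emp><emp><name>John</name></emp></emps>
</db>"""


def label_paths(root):
    """Yield (dotted label path, element) for every element, in document order."""
    stack = [(root.tag, root)]
    while stack:
        path, node = stack.pop()
        yield path, node
        # Push children in reverse so the first child is popped first.
        for child in reversed(list(node)):
            stack.append((f"{path}.{child.tag}", child))


def select(root, pattern):
    """Return the elements whose whole label path matches the pattern."""
    rx = re.compile(pattern)
    return [node for path, node in label_paths(root) if rx.fullmatch(path)]


root = ET.fromstring(DOC)

# db.emps.emp
emps = select(root, r"db\.emps\.emp")
# db.(depts.dept.mgr | emps.emp)
mgr_or_emp = select(root, r"db\.(depts\.dept\.mgr|emps\.emp)")
# db._*.name, where _ stands for any single label
names = [n.text for n in select(root, r"db(\.\w+)*\.name")]
```

Enumerating every path is fine for a toy tree; it is not meant as an efficient evaluation strategy.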

  5. More examples
     With the DTD:
       <!ELEMENT PERSON (NAME, FATHER, MOTHER)>
       <!ELEMENT MOTHER (PERSON?)>
       …
     the regular path expression (PERSON.MOTHER)* identifies matrilineal ancestry.
     XPATH is a “superset of a subset” of regular path expressions. (It cannot express this set of nodes.) However, it is not limited to moving “down” the tree.

  6. XPath
     • Primary goal: to give access to selected nodes of a given document
     • XPath's main construct: axis navigation
     • An XPath path consists of one or more navigation steps, separated by /
     • A navigation step is a triplet: axis + node-test + list of predicates
     • Examples
       • /descendant::node()/child::author
       • /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
     • XPath also offers some shortcuts
       • no axis means child
       • // ≡ /descendant-or-self::node()/

  7. XPath -- child axis navigation
     [Figure: a tree whose root is the context node; its nodes are labeled aaa, ccc and bbb and numbered 1 to 7]
     • author is shorthand for child::author. Examples:
     • aaa -- all the child nodes labeled aaa (1,3)
     • aaa/bbb -- all the bbb grandchildren of aaa children (4)
     • */bbb -- all the bbb grandchildren of any child (4,6)
     • . -- the context node
     • / -- the root node

  8. XPath -- child axis navigation (cont)
     • /doc -- all the doc children of the root
     • ./aaa -- all the aaa children of the context node (equivalent to aaa)
     • text() -- all the text children of the context node
     • node() -- all the children of the context node (includes text nodes, but not attribute nodes: attributes are not children of their element)
     • .. -- parent of the context node
     • .// -- the context node and all its descendants
     • // -- the root node and all its descendants
     • //para -- all the para nodes in the document
     • //text() -- all the text nodes in the document
     • @font -- the font attribute node of the context node
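The abbreviated child-axis steps can be tried with Python's `xml.etree.ElementTree`, which implements only a small subset of XPath. The sketch below is a rough illustration: the tree is an assumed reading of the numbered figure (the positions of nodes 5 and 7 are a guess), the `id` attributes stand in for the figure's node numbers, and `ids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed layout of the figure; only nodes 1, 3, 4 and 6 are pinned down
# by the examples on the slide.
ctx = ET.fromstring("""
<ctx>
  <aaa id="1"><bbb id="4"/><ccc id="5"/></aaa>
  <ccc id="2"><bbb id="6"/></ccc>
  <aaa id="3"><aaa id="7"/></aaa>
</ctx>""")


def ids(path):
    """Evaluate a path relative to the context node and return node ids."""
    return [e.get("id") for e in ctx.findall(path)]


child_aaa = ids("aaa")          # aaa children of the context node
aaa_bbb = ids("aaa/bbb")        # bbb grandchildren below aaa children
any_bbb = ids("*/bbb")          # bbb grandchildren below any child
desc_aaa = ids(".//aaa")        # aaa descendants at any depth
bbb_parents = ids(".//bbb/..")  # parents of the bbb nodes
```

ElementTree returns elements only, so text nodes and attribute nodes are reached through `.text` and `.get(...)` rather than through `text()` or `@font`.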

  9. Predicates
     • [2] -- the second child node of the context node
     • chapter[5] -- the fifth chapter child of the context node
     • [last()] -- the last child node of the context node
     • chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is “introduction” (the string-value is the concatenation of all the text on descendant text nodes)
     • person[.//firstname = "joe"] -- the person children of the context node that have in their descendants a firstname element with string-value “joe”
     • From the XPath specification: NOTE: If $x is bound to a node set then $x = "foo" does not mean the same as not($x != "foo"). The former is true if some node in $x has the string-value “foo”; the latter is true only if all nodes in $x do.
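A small Python sketch of these predicates, under stated assumptions: the `book` document and its chapter titles are made up, `first_titles` is a hypothetical helper, and `eq` / `ne` model a node set simply as a list of string-values. The first half uses the positional and value predicates that `xml.etree.ElementTree` supports; the second half illustrates why `$x = "foo"` and `not($x != "foo")` differ.

```python
import xml.etree.ElementTree as ET

# Made-up document: three chapters, the third with two title children.
book = ET.fromstring(
    "<book>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>axes</title></chapter>"
    "<chapter><title>predicates</title><title>introduction</title></chapter>"
    "</book>"
)


def first_titles(path):
    """First title of each chapter selected by the path."""
    return [c.findtext("title") for c in book.findall(path)]


second = first_titles("chapter[2]")       # positional predicate
last = first_titles("chapter[last()]")    # last chapter child
# True for a chapter if ANY of its title children has this string-value.
intro = first_titles("chapter[title='introduction']")


def eq(node_set, s):
    """$x = s: some node in the set has string-value s."""
    return any(v == s for v in node_set)


def ne(node_set, s):
    """$x != s: some node in the set has a string-value other than s."""
    return any(v != s for v in node_set)


x = ["foo", "bar"]
some_is_foo = eq(x, "foo")      # True: one node is "foo"
all_are_foo = not ne(x, "foo")  # False: "bar" differs from "foo"
```

For an empty node set the two also disagree, the other way round: `eq([], "foo")` is false while `not ne([], "foo")` is true.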

  10. Unions of Path Expressions
      • employee | consultant -- the union of the employee and consultant nodes that are children of the context node
      • For some reason person/(employee|consultant) -- as in regular path expressions -- is not allowed
      • However person/node()[boolean(employee|consultant)] is allowed!!
      • From the XPath specification:
        • The boolean function converts its argument to a boolean as follows:
          • a number is true if and only if it is neither positive or negative zero nor NaN
          • a node-set is true if and only if it is non-empty
          • a string is true if and only if its length is non-zero
          • an object of a type other than the four basic types is converted to a boolean in a way that is dependent on that type
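The conversion rules quoted from the specification can be written down as a short Python sketch. The function name `xpath_boolean` is hypothetical, and modelling a node-set as a Python list, tuple or set is an assumption made only for illustration.

```python
import math


def xpath_boolean(value):
    """Mimic the XPath boolean() conversion for the four basic types."""
    if isinstance(value, bool):
        # Already a boolean (checked first, because bool is a subclass of int).
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is +0, -0 or NaN.
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):
        # A string is true if its length is non-zero.
        return len(value) > 0
    if isinstance(value, (list, tuple, set)):
        # A node-set (modelled as a collection) is true if it is non-empty.
        return len(value) > 0
    raise TypeError(f"no conversion sketched for {type(value).__name__}")
```

Note that the string "false" converts to true, since only its length matters.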

  11. Axis navigation
      • So far, nearly all our expressions have moved us down the tree by moving to child nodes. Exceptions were
        • . -- stay where you are
        • / -- go to the root
        • // -- all descendants of the root
        • .// -- all descendants of the context node
      • All other expressions have been abbreviations for child::…, e.g. child::para. child:: is an example of an axis
      • XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
      • Some of these (self, parent) describe single nodes, others describe sequences of nodes.
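Python's `xml.etree.ElementTree` has no direct support for most of these axes, but a few of them can be sketched by hand once a child-to-parent map is built. The document and the function names below are hypothetical, and the sketch covers elements only (no text, attribute or namespace nodes).

```python
import xml.etree.ElementTree as ET

# Made-up document: a has children b and c; b has d and e; c has f.
root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")

# ElementTree elements do not know their parent, so record it explicitly.
parent = {child: p for p in root.iter() for child in p}


def ancestors(node):
    """ancestor axis: parent, grandparent, ... up to the root element."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def descendants(node):
    """descendant axis: everything below the node, in document order."""
    return [d for d in node.iter() if d is not node]


def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    if node not in parent:
        return []
    sibs = list(parent[node])
    return sibs[sibs.index(node) + 1:]


def preceding_siblings(node):
    """preceding-sibling axis, listed here in document order."""
    if node not in parent:
        return []
    sibs = list(parent[node])
    return sibs[:sibs.index(node)]


d = root.find("b/d")
e = root.find("b/e")
```

With this document, `ancestors(d)` gives the elements b and a, and `following_siblings(d)` gives e.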

  12. XPath Navigation Axes (thanks to Arnaud Sahuguet)
      [Figure: the axes drawn relative to the self node: ancestor, preceding-sibling, following-sibling, child, attribute, preceding, following, namespace, descendant]

  13. XPath abbreviated syntax
      Abbreviation    Expansion
      (nothing)       child::
      @               attribute::
      //              /descendant-or-self::node()/
      .               self::node()
      .//             self::node()/descendant-or-self::node()/
      ..              parent::node()
      /               (document root)

  14. XPath
      • Reasonably widely adopted -- in XML-Schema and query languages.
      • Neither more expressive nor less expressive than regular path expressions (can’t do (ab)*)
      • Particularly messy in some areas:
        • defining order of results
        • overloading of operations, e.g. [chapter/title = "Introduction"]
        • why not ["Introduction" IN chapter/title]?

  15. XQuery, proposed by Chamberlin, Robie and Florescu (from the authors’ slides)
      • Leverage the most effective features of several existing and proposed query languages
      • Design a small, clean, implementable language
      • Cover the functionality required by all the XML Query use cases in a single language
      • Write queries that fit on a slide

  16. XQuery = XPath + “comprehension” syntax
      • XML-QL
          where <pattern> in <XML-expression>        (bind variables)
                <pattern> in <XML-expression>
                …
                <condition>
          construct <expression>                     (use variables)
      • Quilt
          for x in <XPath-expression>                (bind variables)
              y in <XPath-expression>
              …
          where <condition>
          return <expression>                        (use variables)

  17. Examples from XQuery
      List the titles of books published by Morgan Kaufmann in 1998.
        FOR $b IN document("bib.xml")//book
        WHERE $b/publisher = "Morgan Kaufmann"
          AND $b/year = "1998"
        RETURN $b/title
      (On the original slide the XPath expressions are shown in orange.)
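The FOR / WHERE / RETURN shape of this query maps closely onto a list comprehension. The following Python sketch assumes a tiny inline stand-in for bib.xml; the titles and the second publisher name are made up, since the slides do not show the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; titles and "Other Press" are invented.
bib = ET.fromstring("""<bib>
  <book><title>Title A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>Title B</title><publisher>Morgan Kaufmann</publisher><year>1999</year></book>
  <book><title>Title C</title><publisher>Other Press</publisher><year>1998</year></book>
</bib>""")

titles = [
    b.findtext("title")                               # RETURN $b/title
    for b in bib.iter("book")                         # FOR $b IN ...//book
    if b.findtext("publisher") == "Morgan Kaufmann"   # WHERE ...
    and b.findtext("year") == "1998"                  # AND ...
]
```

One difference worth keeping in mind: `findtext` looks only at the first matching child, whereas `=` in the query is satisfied if any publisher or year child matches.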

  18. Examples from XQuery (cont)
      List each publisher and the average price of its books.
        FOR $p IN distinct(document("bib.xml")//publisher)
        LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
        RETURN
          <publisher>
            <name> {$p/text()} </name>
            <avgprice> {$a} </avgprice>
          </publisher>
      • LET binds a variable to a value. It does not cause an iteration.
      • Does this create a (well-formed) XML document?
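A rough Python rendering of this query, meant only to show the difference between FOR (one iteration per item) and LET (a single binding per iteration). The inline document, the publisher names "P1" and "P2", the prices and the helper `distinct` are all assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; publishers and prices are invented.
bib = ET.fromstring("""<bib>
  <book><publisher>P1</publisher><price>30</price></book>
  <book><publisher>P2</publisher><price>20</price></book>
  <book><publisher>P1</publisher><price>50</price></book>
</bib>""")


def distinct(values):
    """Keep the first occurrence of each value, preserving order."""
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen


results = []
# FOR $p IN distinct(...): the loop body runs once per distinct publisher.
for p in distinct(e.text for e in bib.iter("publisher")):
    # LET $a := avg(...): the whole price list is bound once, no extra loop.
    prices = [float(b.findtext("price"))
              for b in bib.iter("book") if b.findtext("publisher") == p]
    a = sum(prices) / len(prices)
    # RETURN <publisher> ... </publisher>
    pub = ET.Element("publisher")
    ET.SubElement(pub, "name").text = p
    ET.SubElement(pub, "avgprice").text = str(a)
    results.append(pub)

# The result is a sequence of elements with no single enclosing root.
serialized = [ET.tostring(r, encoding="unicode") for r in results]
```

In this sketch `results` holds two separate `publisher` elements, which suggests one reading of the slide's question: without a wrapper element, the output is a sequence of elements rather than a single-rooted document.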

  19. Examples from XQuery (cont)
      List the publishers who have published more than 100 books.
        <big_publishers>
        {
          FOR $p IN distinct(document("bib.xml")//publisher)
          LET $b := document("bib.xml")//book[publisher = $p]
          WHERE count($b) > 100
          RETURN $p
        }
        </big_publishers>
      What about efficiency?

  20. Examples from XQuery (cont)
      Invert the structure of the input document so that each distinct author element contains a sequence of book-titles.
        <author_list>
        {
          FOR $a IN distinct(document("bib.xml")//author)
          RETURN
            <author>
              <name> {$a/text()} </name>
              {
                FOR $b IN document("bib.xml")//book[author = $a]
                RETURN $b/title
              }
            </author>
        }
        </author_list>
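The inversion can be sketched in Python as a pair of nested loops, mirroring the nested FOR clauses. The inline document with titles "T1", "T2" and authors "A", "B" is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; titles and author names are invented.
bib = ET.fromstring("""<bib>
  <book><title>T1</title><author>A</author><author>B</author></book>
  <book><title>T2</title><author>B</author></book>
</bib>""")

author_list = ET.Element("author_list")
seen = []
for a in bib.iter("author"):            # outer FOR over authors
    if a.text in seen:                  # distinct(): skip repeated names
        continue
    seen.append(a.text)
    author = ET.SubElement(author_list, "author")
    ET.SubElement(author, "name").text = a.text
    for b in bib.iter("book"):          # inner FOR over books
        # book[author = $a]: true if any author child matches.
        if any(x.text == a.text for x in b.findall("author")):
            ET.SubElement(author, "title").text = b.findtext("title")

inverted = ET.tostring(author_list, encoding="unicode")
```

The inner loop rescans every book for each author, which is quadratic; that is acceptable for a sketch, and a dictionary keyed by author name would avoid it.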

  21. More Examples (Quilt) (from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt)
      Relational data -- two DTDs:
        <?xml version="1.0" ?>
        <!DOCTYPE items [
          <!ELEMENT items (item_tuple*)>
          <!ELEMENT item_tuple (itemno, description, offered_by, start_date?, end_date?, reserve_price?)>
          <!ELEMENT itemno (#PCDATA)>
          <!ELEMENT description (#PCDATA)>
          <!ELEMENT offered_by (#PCDATA)>
          <!ELEMENT start_date (#PCDATA)>
          <!ELEMENT end_date (#PCDATA)>
          <!ELEMENT reserve_price (#PCDATA)>
        ]>

        <?xml version="1.0" ?>
        <!DOCTYPE bids [
          <!ELEMENT bids (bid_tuple*)>
          <!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>
          <!ELEMENT userid (#PCDATA)>
          <!ELEMENT itemno (#PCDATA)>
          <!ELEMENT bid (#PCDATA)>
          <!ELEMENT bid_date (#PCDATA)>
        ]>

  22. The data
        <items>
          <item_tuple>
            <itemno>1001</itemno>
            <description>Red Bicycle</description>
            <offered_by>U01</offered_by>
            <start_date>1999-01-05</start_date>
            <end_date>1999-01-20</end_date>
            <reserve_price>40</reserve_price>
          </item_tuple>
          <item_tuple>
            <itemno>1002</itemno>
            <description>Motorcycle</description>
            <offered_by>U02</offered_by>
            <start_date>1999-02-11</start_date>
            <end_date>1999-03-15</end_date>
            <reserve_price>500</reserve_price>
          </item_tuple>
          …
        </items>

        <bids>
          <bid_tuple>
            <userid>U02</userid>
            <itemno>1001</itemno>
            <bid>35</bid>
            <bid_date>99-01-07</bid_date>
          </bid_tuple>
          <bid_tuple>
            <userid>U04</userid>
            <itemno>1001</itemno>
            <bid>40</bid>
            <bid_date>99-01-08</bid_date>
          </bid_tuple>
          …
        </bids>

  23. Query 1
        FUNCTION date() { "1999-02-01" }
        <result>
        (
          FOR $i IN document("items.xml")//item_tuple
          WHERE $i/start_date LEQ date()
            AND $i/end_date GEQ date()
            AND contains($i/description, "Bicycle")
          RETURN
            <item_tuple>
              $i/itemno ,
              $i/description
            </item_tuple>
          SORTBY (itemno)
        )
        </result>
      • simple function definitions
      • dates are formatted so that lexicographic ordering gives the right result
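The remark about lexicographic ordering can be checked with a short Python sketch. It uses only the two item rows visible on the data slide (the elided rows are left out, so the result here is empty rather than the output shown for Q1), and the function names are hypothetical. Comparing the yyyy-mm-dd strings directly is contrasted with parsing them into real dates.

```python
from datetime import date


def is_open(start, end, today):
    """Plain string comparison, as LEQ / GEQ do on the date strings."""
    return start <= today <= end


def is_open_parsed(start, end, today):
    """Same test after parsing the strings into calendar dates."""
    parse = date.fromisoformat
    return parse(start) <= parse(today) <= parse(end)


# (itemno, description, start_date, end_date) from the data slide.
ITEMS = [
    ("1001", "Red Bicycle", "1999-01-05", "1999-01-20"),
    ("1002", "Motorcycle", "1999-02-11", "1999-03-15"),
]
TODAY = "1999-02-01"  # the value returned by FUNCTION date()

open_bicycles = [
    itemno for itemno, desc, start, end in ITEMS
    if is_open(start, end, TODAY) and "Bicycle" in desc
]

# For zero-padded yyyy-mm-dd strings both tests agree.
agree = all(is_open(s, e, TODAY) == is_open_parsed(s, e, TODAY)
            for _, _, s, e in ITEMS)

# A two-digit year, as in the bid_date values, breaks the string ordering:
# 7 January 1999 written "99-01-07" sorts after "1999-02-01".
mixed_format_ok = "99-01-07" <= TODAY
```

The trick depends on every date using the same fixed-width, most-significant-field-first format.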

  24. Output from Q1
        <?xml version="1.0" ?>
        <result>
          <item_tuple>
            <itemno> 1003 </itemno>
            <description> Old Bicycle </description>
          </item_tuple>
          <item_tuple>
            <itemno> 1007 </itemno>
            <description> Racing Bicycle </description>
          </item_tuple>
        </result>

  25. Query Q2
      For all bicycles, list the item number, description, and highest bid (if any), ordered by item number.
        <result>
        (
          FOR $i IN document("items.xml")//item_tuple
          LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
          WHERE contains($i/description, "Bicycle")
          RETURN
            <item_tuple>
              $i/itemno ,
              $i/description ,
              IF ($b)
              THEN <high_bid> NumFormat("#####.##", max(-1, $b/bid)) </high_bid>
              ELSE ""
            </item_tuple>
          SORTBY (itemno)
        )
        </result>
      • lots of coercion
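A minimal Python sketch of Q2's "highest bid, if any" logic. It assumes plain tuples in place of the XML files and uses only rows that appear on the slides: the two bids shown for item 1001, plus item 1008, which the Q2 output lists without a bid. Because the elided bids are left out, the highest bid computed here for 1001 is 40, not the 55 shown in the full output.

```python
# (itemno, description); 1008 is taken from the Q2 output slide.
items = [
    ("1001", "Red Bicycle"),
    ("1002", "Motorcycle"),
    ("1008", "Broken Bicycle"),
]
# (userid, itemno, bid) from the data slide; further rows are elided there.
bids = [
    ("U02", "1001", 35),
    ("U04", "1001", 40),
]

result = []
for itemno, description in sorted(items):        # SORTBY (itemno)
    if "Bicycle" not in description:             # WHERE contains(...)
        continue
    # LET $b := ...bid_tuple[itemno = $i/itemno]: all matching bids at once.
    b = [amount for _, bid_item, amount in bids if bid_item == itemno]
    row = {"itemno": itemno, "description": description}
    if b:                                        # IF ($b): non-empty set
        row["high_bid"] = max(b)
    result.append(row)
```

Keeping items that have no bids is what makes this behave like an outer join: 1008 still appears, just without a high_bid entry.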

  26. Output from Q2
        <result>
          <item_tuple>
            <itemno> 1001 </itemno>
            <description> Red Bicycle </description>
            <high_bid> 55 </high_bid>
          </item_tuple>
          <item_tuple>
            <itemno> 1003 </itemno>
            <description> Old Bicycle </description>
            <high_bid> 20 </high_bid>
          </item_tuple>
          <item_tuple>
            <itemno> 1007 </itemno>
            <description> Racing Bicycle </description>
            <high_bid> 225 </high_bid>
          </item_tuple>
          <item_tuple>
            <itemno> 1008 </itemno>
            <description> Broken Bicycle </description>
          </item_tuple>
        </result>

  27. Query Q3
      Find cases where a user with a rating worse (alphabetically greater) than "C" offers an item with a reserve price of more than 1000.
        <result>
        (
          FOR $u IN document("users.xml")//user_tuple,
              $i IN document("items.xml")//item_tuple
          WHERE $u/rating GT 'C'
            AND $i/reserve_price GT 1000
            AND $i/offered_by = $u/userid
          RETURN
            <warning>
              <user_name>$u/name/text()</user_name>,
              <user_rating>$u/rating/text()</user_rating>,
              <item_description>$i/description/text()</item_description>,
              $i/reserve_price
            </warning>
        )
        </result>
      • Comparing sets with singletons
      • Same rules as in XPath?
      • In this case the DTD gives uniqueness
