CSE 636 Data Integration
Learn the basics of XPath query language and how to use it for querying XML documents. This tutorial covers XPath expressions, functions, qualifiers, and navigation axes.
CSE 636 Data Integration • XML Query Languages • XPath
XPath • http://www.w3.org/TR/xpath (11/99) • Building block for other W3C standards: • XSL Transformations (XSLT) • XML Link (XLink) • XML Pointer (XPointer) • XQuery • Was originally part of XSL
Example for XPath Queries <bib> <book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year> </book> <book price="55"> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year> </book> </bib>
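Data Model for XPath • / (Document) is the root, above everything else • Children of the root: the XML PI, Comments, and Element bib (the root element) • Children of bib: Element book, Element book, … • Children of book: Element publisher, Element author, … • Leaves: Text nodes such as Addison-Wesley and Serge Abiteboul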
XPath: Simple Expressions /bib/book/year Result: <year> 1995 </year> <year> 1998 </year> /bib/paper/year Result: empty (there were no papers)
XPath: Restricted Kleene Closure //author Result: <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <author> Jeffrey D. Ullman </author> /bib//first-name Result: <first-name> Rick </first-name>
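The path queries so far can be tried out with a minimal sketch in Python's standard xml.etree.ElementTree module. That module implements only a subset of XPath and evaluates paths from a context element, so the sketch assumes the context is the bib element and writes /bib/book/year as book/year and //author as .//author. The BIB string is a whitespace-trimmed copy of the example document.

```python
import xml.etree.ElementTree as ET

# Whitespace-trimmed copy of the example document from the slides.
BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)  # context node: the <bib> element

years = [y.text for y in bib.findall("book/year")]             # /bib/book/year
papers = bib.findall("paper/year")                              # /bib/paper/year
authors = bib.findall(".//author")                              # //author
first_names = [f.text for f in bib.findall(".//first-name")]   # /bib//first-name

print(years, papers, len(authors), first_names)
```

The empty list for paper/year mirrors the "empty" result on the slide: a path that matches nothing is not an error.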
XPath: Functions /bib/book/author/text() Result: Serge Abiteboul Victor Vianu Jeffrey D. Ullman Rick Hull doesn’t appear because his author element contains first-name and last-name elements rather than text Functions in XPath: • text() = matches the text value • node() = matches any node (= * or @* or text()) • name() = returns the name of the current tag
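ElementTree has no text() or name() in its path language, so the following sketch only emulates them with the .text and .tag properties of each element. It assumes a trimmed version of the example document and ignores whitespace-only text, which a full XPath engine might still report as text nodes.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the first example book (an assumption made for brevity).
BIB = """<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <year>1995</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# Emulates /bib/book/author/text(): keep authors that directly contain text.
author_texts = [
    a.text.strip()
    for a in bib.findall("book/author")
    if a.text and a.text.strip()
]

# Emulates name() applied to every child of book.
child_names = [child.tag for child in bib.findall("book/*")]

print(author_texts)
print(child_names)
```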
XPath: Wildcard //author/* Result: <first-name> Rick </first-name> <last-name> Hull </last-name> * Matches any element
XPath: Attribute Nodes /bib/book/@price Result: "55" @price means that price has to be an attribute
XPath: Qualifiers /bib/book/author[first-name] Result: <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
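A qualifier filters nodes without changing what is returned: the author is selected, not the first-name. A minimal sketch with ElementTree, which supports the [tag] and [@attribute] forms of qualifier, again assuming the bib element as context and a trimmed example document:

```python
import xml.etree.ElementTree as ET

# Trimmed example document (an assumption made for brevity).
BIB = """<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# /bib/book/author[first-name]: authors that have a first-name child.
structured = bib.findall("book/author[first-name]")
last_names = [a.findtext("last-name") for a in structured]

# /bib/book[@price]: books that carry a price attribute.
priced = [b.get("price") for b in bib.findall("book[@price]")]

print(last_names, priced)
```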
XPath: More Qualifiers /bib/book/author[firstname][address[.//zip][city]]/lastname Result: <lastname> … </lastname> <lastname> … </lastname>
XPath: More Qualifiers /bib/book[@price < "60"] /bib/book[author/@age < "25"] /bib/book[author/text()]
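Comparison qualifiers such as [@price < "60"] are not part of ElementTree's XPath subset, so this sketch emulates the comparison in plain Python. It assumes prices are numeric when present; price_below is a hypothetical helper, and a book without a price is simply not selected, which matches how a comparison against a missing attribute behaves in XPath.

```python
import xml.etree.ElementTree as ET

# Trimmed example document (an assumption made for brevity).
BIB = """<bib>
  <book>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

def price_below(book, limit):
    """Hypothetical helper emulating the qualifier [@price < limit]."""
    price = book.get("price")
    return price is not None and float(price) < limit

cheap_titles = [
    b.findtext("title") for b in bib.findall("book") if price_below(b, 60)
]
print(cheap_titles)
```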
XPath: Summary • bib = matches a bib element • * = matches any element • / = matches the root • /bib = matches a bib element under root • bib/paper = matches a paper in bib • bib//paper = matches a paper in bib, at any depth • //paper = matches a paper at any depth • paper|book = matches a paper or a book • @price = matches a price attribute • bib/book/@price = matches price attribute in book, in bib • bib/book[@price<"55"]/author/lastname = matches…
XPath: More Details • An XPath expression, p, establishes a relation between: • A context node, and • A node in the answer set • In other words, p denotes a function: • S[p] : Nodes → {Nodes} • Examples: • author/firstname • . = self • .. = parent • part/*/*/subpart/../name = part/*/*[subpart]/name
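The last equivalence can be checked on a small document. This sketch uses a made-up catalog (the element names catalog, group, and item are assumptions, not from the slides) and ElementTree, which supports the .. step as long as the path does not climb above the context element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the first item has a subpart child.
DOC = """<catalog>
  <part>
    <group>
      <item><subpart/><name>with-subpart</name></item>
      <item><name>without-subpart</name></item>
    </group>
  </part>
</catalog>"""

catalog = ET.fromstring(DOC)  # context node

# Go down to subpart, step back up to its parent, then take name.
via_parent = [n.text for n in catalog.findall("part/*/*/subpart/../name")]

# Keep only the grandchildren that have a subpart child, then take name.
via_qualifier = [n.text for n in catalog.findall("part/*/*[subpart]/name")]

print(via_parent, via_qualifier)
```

Both forms select the name of the item that has a subpart and skip the one that does not.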
The Root and the Root <bib> <paper> 1 </paper> <paper> 2 </paper> </bib> • bib is the “document element” • The “root” is above bib • /bib = returns the document element • / = returns the root • Why? • Because we may have comments before and after <bib> • They become siblings of <bib>
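The difference between the root and the document element shows up in any DOM-style API. A minimal sketch with Python's xml.dom.minidom, assuming a comment has been placed before and after bib: the Document object plays the role of the root (/), and documentElement is what /bib returns.

```python
from xml.dom import minidom, Node

# The slide's document, with one comment added on each side of <bib>.
DOC = "<!-- before --><bib><paper>1</paper><paper>2</paper></bib><!-- after -->"

root = minidom.parseString(DOC)            # the root, i.e. "/"
document_element = root.documentElement    # the bib element, i.e. "/bib"

# The comments are children of the root, hence siblings of <bib>.
kinds = [child.nodeType for child in root.childNodes]

print(document_element.tagName, kinds)
```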
XPath: More Details • We can navigate along 13 axes: ancestor ancestor-or-self parent attribute child descendant-or-self descendant following following-sibling namespace preceding preceding-sibling self • So far we have only used the ones with an abbreviated syntax: child (/), descendant-or-self (//), attribute (@), self (.), and parent (..)
XPath: More Details • Examples: • child::author/child::lastname = author/lastname • child::author/descendant-or-self::node()/child::zip = author//zip • child::author/parent::* = author/.. • child::author/attribute::age = author/@age • What does this mean? • /bib/book/publisher/parent::*/author • /bib//address[ancestor::book] • /bib//author/ancestor::*//zip
XPath: Even More Details • name() = the name of the current node • /bib//*[name()="book"] same as /bib//book • What does this mean? /bib//*[ancestor::*[name()!="book"]] • Is it equivalent to the following? • /bib//* • /bib//*[name()!="book"]//* • Navigation axes give us strictly more power!
XPath: Example How do we evaluate this XPath expression? /bib//*[name()!="book"]//* Let’s take it one step at a time. The example tree, written as parent(children): bib(A, B(book(C, D)))
XPath: Example /bib returns the following list of one node (shown with its subtree): bib(A, B(book(C, D)))
XPath: Example /bib//* when executed on the previous node list, returns the following new list of nodes (each shown with its subtree): A, B(book(C, D)), book(C, D), C, D
XPath: Example /bib//*[name()!="book"] when executed on the previous node list, eliminates one node, leaving (each node shown with its subtree): A, B(book(C, D)), C, D
XPath: Example /bib//*[name()!="book"]//* gives us the resulting node list of the XPath expression (each node shown with its subtree): book(C, D), C, D
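The same step-by-step evaluation can be sketched in Python. The tree shape is an assumption read off the node lists in this example (bib has children A and B, B contains book, and book contains C and D), and the helper names descendants and descend_all are made up for illustration. The steps are written as plain list operations instead of a single XPath call, so each intermediate node list is visible.

```python
import xml.etree.ElementTree as ET

# Assumed tree shape: bib -> A, B; B -> book; book -> C, D.
bib = ET.fromstring("<bib><A/><B><book><C/><D/></book></B></bib>")

def descendants(node):
    """All element descendants of node, excluding node itself."""
    return [d for d in node.iter() if d is not node]

def descend_all(nodes):
    """Apply //* to every node in the list and drop duplicate results."""
    seen, result = set(), []
    for node in nodes:
        for d in descendants(node):
            if id(d) not in seen:
                seen.add(id(d))
                result.append(d)
    return result

step1 = [bib]                                  # /bib
step2 = descend_all(step1)                     # /bib//*
step3 = [n for n in step2 if n.tag != "book"]  # /bib//*[name()!="book"]
step4 = descend_all(step3)                     # /bib//*[name()!="book"]//*

for label, nodes in [("step2", step2), ("step3", step3), ("step4", step4)]:
    print(label, [n.tag for n in nodes])
```

The book element is filtered out in the third step but reappears in the fourth, because it is a descendant of B, which survived the filter.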
Keys in XML Schema • We forgot something about XML Schema • Keys • Key References • Why? • XPath is used for keys and key references
Keys in XML Schema XML: <purchaseReport> <regions> <zip code="95819"> <part number="872-AA" quantity="1"/> <part number="926-AA" quantity="1"/> <part number="833-AA" quantity="1"/> <part number="455-BX" quantity="1"/> </zip> <zip code="63143"> <part number="455-BX" quantity="4"/> </zip> </regions> <parts> <part number="872-AA">Lawnmower</part> <part number="926-AA">Baby Monitor</part> <part number="833-AA">Lapis Necklace</part> <part number="455-BX">Sturdy Shelves</part> </parts> </purchaseReport> XML Schema: <key name="NumKey"> <selector xpath="parts/part"/> <field xpath="@number"/> </key>
Keys in XML Schema XML Schema: <xs:element name="purchaseReport"> <xs:complexType> <xs:sequence> <xs:element name="regions"> … </xs:element> <xs:element name="parts"> … </xs:element> </xs:sequence> </xs:complexType> <xs:key name="numKey"> <xs:selector xpath="parts/part" /> <xs:field xpath="@number" /> </xs:key> <xs:keyref name="numKeyRef" refer="numKey"> <xs:selector xpath="regions/zip/part" /> <xs:field xpath="@number" /> </xs:keyref> </xs:element>
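What numKey and numKeyRef demand can be sketched by hand in Python. This is not schema validation, only an illustration under simplifying assumptions: a single attribute field, plain string comparison, and an abridged copy of the purchase report. The helpers check_key and dangling_refs are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the purchase report.
REPORT = """<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>"""

def check_key(scope, selector, attr):
    """Collect key values; fail if a value is missing or repeated."""
    values = set()
    for node in scope.findall(selector):
        value = node.get(attr)
        if value is None:
            raise ValueError(f"<{node.tag}> has no @{attr}")
        if value in values:
            raise ValueError(f"duplicate key value {value!r}")
        values.add(value)
    return values

def dangling_refs(scope, selector, attr, key_values):
    """Return referenced values that do not occur among the key values."""
    refs = (node.get(attr) for node in scope.findall(selector))
    return [r for r in refs if r is not None and r not in key_values]

# Both selectors start at the element being defined: purchaseReport.
report = ET.fromstring(REPORT)
num_key = check_key(report, "parts/part", "number")
dangling = dangling_refs(report, "regions/zip/part", "number", num_key)

print(sorted(num_key), dangling)
```

Adding an order line for a part number that is not listed under parts would make dangling non-empty, which is the situation the keyref rules out.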
Keys in XML Schema • In general, two flavors: <key name="someNameHere"> <selector xpath="p"/> <field xpath="p1"/> <field xpath="p2"/> … <field xpath="pk"/> </key> <unique name="someNameHere"> <selector xpath="p"/> <field xpath="p1"/> <field xpath="p2"/> … <field xpath="pk"/> </unique> Note: • All XPath expressions “start” at the element currently being defined • The fields must identify a single node
Keys in XML Schema • Unique = guarantees uniqueness • Key = guarantees uniqueness and existence • All XPath expressions are “restricted”: • /a/b | /a/c OK for selector • //a/b/*/c OK for field • Note: better than DTD’s ID mechanism
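The difference between the two flavors can be sketched with one checker and a flag. This is a simplified illustration, not a validator: it assumes a single attribute field, and the people document and the violations helper are made up. Under unique, a selected node without the field is simply skipped; under key, it is a violation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the last person has no id attribute.
PEOPLE = """<people>
  <person id="p1"/>
  <person id="p2"/>
  <person/>
</people>"""

def violations(scope, selector, attr, is_key):
    """List violations of a unique (is_key=False) or key (is_key=True) constraint."""
    seen, problems = set(), []
    for node in scope.findall(selector):
        value = node.get(attr)
        if value is None:
            if is_key:  # key also guarantees existence
                problems.append(f"<{node.tag}> lacks @{attr}")
            continue    # unique ignores nodes without the field
        if value in seen:
            problems.append(f"duplicate value {value!r}")
        seen.add(value)
    return problems

people = ET.fromstring(PEOPLE)
as_unique = violations(people, ".//*", "id", is_key=False)
as_key = violations(people, ".//*", "id", is_key=True)

print(as_unique, as_key)
```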
Keys in XML Schema • Examples: <key name="fullName"> <selector xpath=".//person"/> <field xpath="forename"/> <field xpath="surname"/> </key> <unique name="nearlyID"> <selector xpath=".//*"/> <field xpath="@id"/> </unique> • Recall: each person must have a single forename and a single surname
Foreign Keys in XML Schema • Examples: <keyref name="personRef" refer="fullName"> <selector xpath=".//personPointer"/> <field xpath="@first"/> <field xpath="@last"/> </keyref>
References • Lecture slides by Dan Suciu: http://www.cs.washington.edu/homes/suciu/COURSES/590DS/06xpath.htm and http://www.cs.washington.edu/homes/suciu/COURSES/590DS/14constraintkeys.htm • BRICS XML Tutorial by A. Moeller and M. Schwartzbach: http://www.brics.dk/~amoeller/XML/index.html • W3C's XPath homepage: http://www.w3.org/TR/xpath • W3C's XML Schema homepage: http://www.w3.org/XML/Schema • XML School: http://www.w3schools.com