1 / 70

XML queries and updates

XML queries and updates. Daniela Florescu. Outline. Introduction XML Data Model and Type System XML Queries Technical discussions on Xquery design XML Updates Conclusions. What is XML?.

emma
Télécharger la présentation

XML queries and updates

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. XML queries and updates Daniela Florescu

  2. Outline • Introduction • XML Data Model and Type System • XML Queries • Technical discussions on Xquery design • XML Updates • Conclusions

  3. What is XML? • The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web. • Base specifications: • XML 1.0, W3C Recommendation Feb '98 • Namespaces, W3C Recommendation Jan '99

  4. Simple XML Data Example <bookyear=“1967” xmlns:amz=“www.amazon.com”> <title>The politics of experience</title> <author>R.D. Laing</author> <amz:refamz:isbn=“1341-1444-555”/> <section> The great and true Amphibian, whose nature is disposed to….. <title>Persons and experience</title> Even facts become... </section> … </book>

  5. The secrets of the XML success • XML is a data representation format • XML is universal • XML is human readable • XML is machine readable • XML is international • XML is platform independent • XML is vendor independent • XML is endorsed by the W3C • XML is not a new technology • XML is not only a data representation format

  6. XML as a family of technologies • XML Information Set • XML Schema • XML Query • The Extensible Stylesheet Transformation Language (XSLT) • XML Forms • XML Protocol • XML Encryption • XML Signature • Others • … almost all the pieces needed for a reasonably good Web Services puzzle…

  7. Major application domains for XML • Data exchange on the Web • e.g.HealthCare Level Seven http://www.hl7.org/ • Application integration on the Web • e.g. ebXML http://www.ebxml.org/ • Document exchange on the Web • e.g. Encoded Archival DescriptionApplicationhttp://lcweb.loc.gov/ead/

  8. The role of an XML querylanguage • Why a query language for XML ? • Preserve the logical/physical data independence • The semantics is described in terms of an abstract data model, independent on the physical data storage • Declarativeprogramming • Such programs should describe the “what”, not the “how” • Why a native query language ?? • We need to deal with the specificities of XML (hierarchical, ordered , textual, potentially schema-less structure)

  9. XML query languages: state of the art • Query languages for graph data • e.g. GOOD, GraphLog, Clean • Query languages/scripting languages for the WEB • e.g. WebSQL, WebOQL, WebL • Query languages for semi-structured data • e.g. MSL, UnQL, StruQL, YATL

  10. XML query languages: state of the art • Research query languages for XML • e.g. XML-QL, Lorel, XML-GL, Quilt, Xduce • Industry query languages for XML • e.g. XQL, OQL extensions to query SGML documents • W3C standard processing languages for XML • e.g. XPath, XSLT Standard W3C XML Query Language: Xquery

  11. W3C Query Working Group - History • Sept 1999: WG creation and first F2F • Currently 30+ W3C member companies • Twelve F2F meetings and 80+ telecons so far • Public WDs every three months  http://www.w3.org/XML/Query

  12. W3C Query Working Group - Goal "The goal of the XML Query WG is to produce: - an abstract data model for XML documents, - a set of query operators on that data model, - a query language based on these query operators

  13. XML: Many Environments DOM DOM SAX SAX DBMS DBMS XQuery W3C XML Query Data Model W3C XML Query Data Model XML XML Java Java COBOL COBOL

  14. W3C XML Working Group - Status • June 2001: new or revised working drafts • XML Query Requirements • XML Query Use Cases • XML Query 1.0 and Xpath 2.0 Data Model • XML Query 1.0 Formal Semantics • Xquery 1.0: An XML Query Language • XML Syntax for Xquery 1.0 (XqueryX)

  15. General XML query requirements • Non-procedural, declarative query language • XML syntax for query language but also a human readable syntax • Protocol independent • Standard error conditions • Should not preclude updates

  16. XML Query Use Cases • Use Case Organization • Description, DTD/Schema, Input Data, Queries and Results • Current Use Cases • "XMP": Experiences and exemplars • "TREE": Queries that preserve hierarchy • "SEQ" - Queries based on sequences • "R" - Access to relational data • "TEXT": Full-text search • "NS" - Queries using namespaces • "PARTS" - Recursive computation • "REF" - Queries based on references

  17. XML Abstract Data Model • Common for Xpath 2.0 and XQuery 1.0 • A logical model composed of • a set of logicalentities • constructors and accessors for each entity • Based on the notion of an ordered tree • XML data cannot be modeled as a “simple” tree

  18. XML Abstract Data Model Entities • Nodes Node = Document | Element | Attribute | Text | Namespaces | PI | Comment • Simple values (all XML Schema simple types) string, boolean, ID, IDREF, decimal, QName, URI, ... • Sequences • Errors • Schema components

  19. Document Nodes: Constructors & Accessors • Constructor: document-node : URI X Sequence[1,*] (Element | Text | PI | Comment ) ->DocumentNode • Accessors: base-uri : DocumentNode -> URI children : DocumentNode -> Sequence[1,*](ElementNode|TextNode|PI|Comment) string-value : DocumentNode -> string

  20. Attribute Nodes: Constructors & Accessors • Constructor: attribute-node: Qname X string X SchemaComponent -> AttributeNode • Accessors: name : AttributeNode -> Qname type : AttributeNode -> SchemaComponent typed-value : AttributeNode -> Sequence(SimpleValue) string-value :AttributeNode -> string parent : AttributeNode -> Sequence[0,1] (Node)

  21. Sequences: Constructors & Accessors • Constructors: empty-sequence: () -> Sequence append : Sequence X Sequence -> Sequence • Accessors: empty : Sequence -> boolean head : Sequence -> UnitValue tail : Sequence -> Sequence

  22. XML data model - conclusion • Complete with respect to XML • Relatively simple design • ordered trees, node-labeled, with node identity • Semantics of the query language relies on the data model constructors and accessors • Relationship with the other W3C XML related standards: • Clear mapping to/from the XML Infoset • Less clear relationship with Document Object Model (DOM) • Less clear relationship with the XML Schema and the type system

  23. The Xquery type system • Xquery’s original design had a powerful type system (based on Xduce) • The type system can: (1) detect statically errors in the queries (2) infer the type of the result of valid queries (3) ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type • Queries on types • Big debate: XML type system vs. XML Schema

  24. Xquery in a nutshell • Functional language • A query is an expression • The result of the query is the result of the evaluation of the expression • Expressions are evaluated in a certain environment • Strongly typed • Every expression has a type • Statically typed • The type of the result of an expression can be detected statically • Formal semantics based on XML Abstract Data Model • Dual syntax: XML and non XML • Influenced by: SQL, OQL, XQL, Xpath, Quilt

  25. Xquery expressions • Constants and variables • expression1 operator expression2 • function(expression1,...expression2) • XPath expressions (for navigation) • FLWR expressions (for iteration) • SORTBY expressions • Quantified expressions • Conditional expressions • Type-related expressions • XML node constructors (elements, attributes, etc) • Xquery expressions can be nested with full generality !

  26. First XML queries • 1+1 • $x • $x/title • $x/price+1 • document(“www.amazon.com/books.xml”)

  27. Xquery functions and operators • Arithmetic operators • +, -, *, div, =, !=, <, etc • Logical operators • and, not, or • Collection oriented operators • union, intersection, difference, empty, distinct, count, sum, avg, min, max, etc • Global topological order related operators • before, after, unordered • XML specific functions • document, name, string-value, typed-value, etc • Many semantic open issues related to the semantics of these operators

  28. Xpath expressions • General syntax: expression ‘/’ step • Two syntaxes: abbreviated or not • Step in the non-abbreviated syntax: axis ‘::’ nodeTest • Axis control the navigation direction in the tree • ancestor, ancestor-or-self, attribute, child, descendent, descendent-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self • Node test by: • Name (e.g. publisher, myNS:publisher, *: publisher, myNS:* , *:* ) • Type (e.g. node(), comment(), text() )

  29. Examples of path expressions • document(“bibliography.xml”)/child::bib • $x/child::bib/child::book/attribute::year • $x/parent::* • $x/ancestor::*/descendent::comment()

  30. Semantics of XPath expressions • Semantics of path expressions in Xpath 1.0 (1) Ordered forests of nodes as input, ordered forests of nodes as output (2) For each root node in the input forest, select the nodes in the same document that obey to the given axis (3) Among those select and return the ones that satisfy the node test. (4) No duplicates are allowed in the output (5) Output nodes are ordered by the document order (6) Nodes preserve their identity • No type error for $book/nose • A list of lists is automatically flattened

  31. Xpath abbreviated syntax (1) • Axis can be missing • By default the child axis $x/child::person -> $x/person • Short-hands for common axes • Descendent-or-self $x/descendant-or-self::comment() -> $x//comment() • Parent $x/parent::* -> $x/.. • Attribute $x/attribute::year -> $x/@year • Self $x/self::* -> $x/.

  32. Xpath abbreviated syntax (2) • Implicit root node $root/bib -> /bib $root -> / • Implicit current node (inside in the second order functions ) $self/title -> ./title $self/title -> title

  33. Simple iteration expression • Syntax : for variable in expression1 return expression2 • Example : for $x in document(“bibliography.xml”)/bib/book return $x/title • Semantics : • bind the variable to each root node of the forest returned by expression1 • for each such binding evaluate expression2 • concatenate the resulting forests • lists of lists are automatically flattened

  34. Local variable declaration • Syntax : let variable := expression1 return expression2 • Example : let $x := document(“bibliography.xml”)/bib/book return count($x) • Semantics : • bind the variable to the result of the expression1 • add this binding to the current environment • evaluate expression2 • remove the local variable from the environment.

  35. Conditional expressions • Syntax : if ( expression1 ) then expression2 else expression3 • Example : if ( $book/@year <1980 ) then “old book” else “new book” • Semantics : • If expression1 evaluates to true then return the result of the evaluation of expression2 else return the result of the evaluation of expression3.

  36. FOR var IN expr RETURN expr LET var := expr WHERE expr FLWR expressions • Syntactic sugar that combines FOR, LET, IF • Example for $x in //bib/book /* like the FROM in SQL */ let $y := $x/author /* no analog in SQL */ where $x/title=“The politics of experience” /* like the WHERE in SQL */ return count($y) /* like the SELECT in SQL */

  37. FLWR expression semantics • FLWR expression: for $x in //bib/book let $y := $x/author where $x/title=“The politics of experience” return count($y) • Semantically equivalent to: for $x in //bib/book return (let $y := $x/author return if ($x/title=“The politics of experience” ) then count($y) else () )

  38. More FLWR expression examples • Selections for $b in document("bib.xml")//book where $b/publisher = “Springer Verlag" and $b/@year = "1998" return $b/title • Joins for $b in document("bib.xml")//book, $p in //publisher where $b/publisher = $p/name return $b/title | $p/address/title, $p/name

  39. Xpath filter predicates • Syntax: expression1 [ expression2 ] • [] is an overloaded operator • Filtering by predicate : • //book [./author/firstname = “ronald”] • //book [@price <25] • //book [count(author [@gender=“female”] )>0 ] • Filtering by position : • /book[3] • /book[3]/author[1] • /book[3]/author[1 to 2]

  40. Quantified expressions • Syntax: some variable in expression1 satisfies expression2 every variable in expression1 satisfies expression2 • Examples: • some $x in //book satisfies $x/price >200 • //book[some $x in author satisfies $x/@gender=“female”] • for $x in //book where every $y in $x/author satisfies $y/@gender=“female” return $x/title

  41. SORTBY expressions • Syntax: expression0 SORTBY ( expression1 [ ASCENDING | DESCENDING ] , …., expressionK [ ASCENDING | DESCENDING ] ) • Examples: • //book sortby ( @price ) • //book[@year=“2001”]/author sortby (lastname, firstname) • for $x in //book where empty($x/author) return $x sortby (title)

  42. Global document order queries • Syntax: expression1 ( before | after ) expression2 • Examples: • //section before //section[title=“Persons and experiences”] • //paragraph after //section[name=“Introduction”] before //paragraph[contains(“Xquery”)

  43. Xquery element constructors • Standard XML elements: <section title=“Persons and experiences” > This is a section of the book entitled <title>The politics of Experience</title> written by <author> Ronald Laing</author>. </section> • Dynamically constructed elements: <section title = {$s/title} >This is a section of the book entitled {$s/ascendents::book/title} written by { for $a in$s/ascendents::book/author return <author> {concat($a/firstname, $a,lastname)} </author> }.</section>

  44. Complex Xquery example <bibliography> { for $x in //book[@year=“2001”] return <book title={$x/title}> { if(empty($x/author)) then $x/editor/affiliation else $x/author } </book> } </bibliography>

  45. Xquery operators on datatypes • INSTANCEOF • returns True if its first operand is an instance of the type named in its second operand • CAST • is used to convert a value from one datatype to another • TREAT • causes the query processor to treat an expression as though its datatype were a subtype of its static type • TYPESWITCH • branching based on the dynamic type of the input data

  46. Dealing with node identity • All nodes in the data model have node identity • Nodes identity is preserved through queries • Two equality functions for nodes: • Value based • Identity based

  47. Local function declarations • Example: function number_paragraphs($x ns:section) return xsd:integer {count($x/paragraph) + sum(for $y in $x/section return number_paragraphs($y))} number_paragraphs(/bib/book[title=“The politics of experience”]/section[1])

  48. Joins in XQuery <books-with-prices> {for $a in document(“amaxon.xml”)/book, $b in document(“bn.xml”)/book where $b/isbn=$a/isbn return <book> {$a/title} <price-amazon>{$a/price}</price-amazon>, <price-bn>{$b/price}</price-bn> </book> } </books-with prices>

  49. Left-outer joins in XQuery <books-with-prices> {for $a in document(“amaxon.xml”)/book return <book> {$a/title} <price-amazon>{$a/price}</price-amazon>, {for $b in document(“bn.xml”)/book where $b/isbn=$a/isbn return <price-bn>{$b/price}</price-bn> } </book> } </books-with prices>

  50. Full-outer joins in Xquery let $allISBNs:=distinct(document(“amazon.xml”)/book/isbn union document(“bn.xml”)/book/isbn ) return <books-with-prices> {for $isbn in $allISBNs return <book> { for $a in document(“amazon.xml”)/book[isbn=$isbn] return <price-amazon>{$b/price}</price-amazon> } {for $b in document(“bn.xml”)/book [isbn=$isbn] return <price-bn>{$b/price}</price-bn> } </book> } </books-with prices>

More Related