This document delves into the intricacies of XML and semi-structured data, focusing on their querying languages such as Lorel and UnQL. It highlights the similarities and differences between XML and semi-structured data formats, discussing essential features required for effective query languages, including expressive power, semantics, and compositionality. The document explains the structure of queries, optimization techniques, and the handling of nested data. Insights into formal semantics and examples are also provided, showcasing the capabilities of different query languages and their applications.
Query Languages Aswin Yedlapalli
XML Query data model
• A document is viewed as a labeled tree of nodes.
• The successors of a node may be:
  - an ordered sequence of nodes (e.g., for sub-elements).
  - an unordered set of nodes (e.g., for attributes).
• Compatible with XML schemas.
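A minimal Python sketch of this data model may help. The class name `Node`, its field names, and the sample values ("T1", "A1") are illustrative assumptions, not part of any XML Query specification: attributes sit in an unordered mapping, children in an ordered list.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the labeled tree (hypothetical representation)."""
    label: str
    attributes: dict = field(default_factory=dict)  # unordered set of attributes
    children: list = field(default_factory=list)    # ordered sequence: Node or text


def child_labels(node):
    """Labels of the element children, in document order."""
    return [c.label for c in node.children if isinstance(c, Node)]


# Made-up sample document: <book year="1999"><title>T1</title><author>A1</author></book>
book = Node(
    "book",
    attributes={"year": "1999"},
    children=[
        Node("title", children=["T1"]),
        Node("author", children=["A1"]),
    ],
)
```

Here `child_labels(book)` keeps the sub-element order (title before author), while `book.attributes` carries no order.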
Comparison of XML and semi-structured data
• Similarities:
  - both are best described by a labeled graph.
  - both are schema-less and self-describing.
• Differences:
  - XML is ordered; semi-structured data is unordered.
  - XML can mix text and elements.
Required features for a query language
• Expressive power
  - The query language must be at least as expressive as SQL on relational data.
  - It should be able to restructure data.
  - It should be able to navigate data with arbitrary nesting.
• Semantics
  - Precise semantics are essential for query transformation and optimization.
• Compositionality
  - Queries must stay within one data model: the output of a query is in the same model as its input, so it can feed another query.
• Schema
  - When structure is defined, the query language should exploit it for optimization, type checking, etc.
Query languages
• For semi-structured data
  - Lorel (Lightweight Object REpository Language)
  - UnQL (Unstructured Query Language)
  - StruQL, MSL, W3QL, WebSQL, Weblog, etc.
• For XML
  - XML-QL (XML Query Language)
  - XSLT and structural recursion
  - XML Query Algebra
Formal Semantics
• Given a query Q = SELECT E[X1, …, Xn] FROM F WHERE C and a database DB, Answer(Q, DB) is defined in two steps:
  - Step 1: compute all bindings {(C11, …, C1n), …, (Cm1, …, Cmn)} of the variables X1, …, Xn, where
    • each Cij is a node oid or an atomic value;
    • each binding must satisfy the paths in F;
    • each binding must satisfy the conditions in C.
  - Step 2: the answer is E[C11, …, C1n] ∪ … ∪ E[Cm1, …, Cmn].
• When E has nested subqueries, apply the semantics recursively.
• Note: so far we have dealt with an unordered model. What do we need to do for order?
• Complexity: PTIME in |DB| (not in |Q|).
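The two steps can be sketched in Python for one fixed query, SELECT author: A FROM biblio.book X, X.author A WHERE X.year = 1999. The nested-dictionary encoding of the database (each label maps to a list of children), the helper names `follow` and `answer`, and the sample values are assumptions made for illustration only.

```python
# Hypothetical toy database: every label maps to a list of child nodes.
DB = {"biblio": [{"book": [
    {"author": ["A1"], "year": [1999]},
    {"author": ["A2", "A3"], "year": [2001]},
]}]}


def follow(node, path):
    """Return all nodes reachable from `node` along a dotted label path."""
    frontier = [node]
    for label in path.split("."):
        frontier = [
            child
            for n in frontier if isinstance(n, dict)
            for child in n.get(label, [])
        ]
    return frontier


def answer(db):
    # Step 1: all bindings (X, A) that satisfy the FROM paths and the WHERE condition.
    bindings = [
        (x, a)
        for x in follow(db, "biblio.book")
        for a in follow(x, "author")
        if any(y == 1999 for y in follow(x, "year"))
    ]
    # Step 2: union of E[X, A] over all bindings; a set models the unordered union.
    return {("author", a) for _, a in bindings}
```

On this sample database `answer(DB)` contains only the author of the 1999 book. Each step is a loop over nodes of DB, which is consistent with the polynomial cost in |DB| for a fixed query.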
Lorel
• Minor syntactic differences in regular path expressions (% instead of _, # instead of _*).
• Common path convention:
  SELECT biblio.book.author FROM biblio.book WHERE biblio.book.year = 1999
  becomes
  SELECT X.author FROM biblio.book X WHERE X.year = 1999
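As a rough illustration of the two wildcards, the sketch below follows the reading given above: % stands for any single label and # for any sequence of labels, including the empty one. The function `matches` is a hypothetical helper, not part of Lorel or the Lore system, and it works on a path that has already been split into labels.

```python
def matches(pattern, labels):
    """Check whether a list of labels matches a wildcard path pattern.

    pattern: list of steps, each a literal label, "%" (any one label),
             or "#" (any sequence of labels, possibly empty).
    labels:  list of labels along a concrete path.
    """
    if not pattern:
        return not labels
    head, rest = pattern[0], pattern[1:]
    if head == "#":
        # Let "#" absorb 0, 1, 2, ... labels and try to match the remainder.
        return any(matches(rest, labels[i:]) for i in range(len(labels) + 1))
    if not labels:
        return False
    return (head == "%" or head == labels[0]) and matches(rest, labels[1:])
```

For example, the pattern biblio.%.author accepts biblio.book.author but not biblio.author, while biblio.#.author accepts both.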
Lorel
• The query language of the Lore system; it adapts OQL to semi-structured data.
• Example:
  select X.title from bib.article X where "tova milo" in X.author
  returns {title: "type inf…"}
Features of Lorel
• Differences from typed query languages:
  - performs implicit coercions.
  - deals with missing attributes.
  - deals with set-valued attributes, e.g., X.year > 1998 when X may have several years.
• The select clause creates new nodes.
• Allows nested queries.
• Allows regular path expressions.
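The three differences from typed languages can be imitated in a few lines of Python. This is only a sketch of the idea behind a condition such as X.year > 1998, not Lorel's actual coercion rules; `coerce_number` and `year_after` are hypothetical helpers, and each attribute is assumed to be stored as a list of values.

```python
def coerce_number(value):
    """Implicit coercion: try to read a value as a number, else give up quietly."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def year_after(obj, threshold):
    """Evaluate 'obj.year > threshold' in the forgiving style described above."""
    years = obj.get("year", [])  # missing attribute: empty set, not an error
    # Set-valued attribute: the condition holds if SOME year satisfies it.
    return any(
        n is not None and n > threshold
        for n in map(coerce_number, years)
    )
```

An object with years "1999" and 1997 passes the test for 1998 because one value qualifies after coercion; an object with no year simply fails the condition instead of raising an error.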
UnQL (Unstructured Query Language)
• UnQL is an extension of basic Lorel.
• Unlike Lorel, UnQL does not use coercion.
• The "where" clause contains two kinds of constructs:
  - generators: variables are bound via patterns.
  - conditions: as in Lorel.
• A "from" clause is not needed, since variables are bound in patterns.
UnQL Queries
• Example:
  select title:T where {bib:article:{title:T, year:Y}} in db, Y > 1998
• The root of the database is explicitly represented: db.
• UnQL queries can be rewritten in Lorel. The equivalent Lorel for the above query is:
  select title:T from bib.article A, A.title T, A.year Y where Y > 1998
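One way to picture how a generator binds variables is a small recursive pattern matcher. The sketch below is an assumption-laden toy, not UnQL's implementation: the database is a nested dictionary whose labels map to lists of children, a pattern is a nested dictionary whose string leaves are variable names, and `match`, `query`, and the sample values are hypothetical.

```python
def match(pattern, node, env):
    """Yield every variable binding under which `pattern` matches `node`."""
    if isinstance(pattern, str):
        # Leaf of the pattern: a variable name, bound to the current node.
        yield {**env, pattern: node}
        return
    if not isinstance(node, dict):
        return
    envs = [env]
    for label, sub in pattern.items():
        # Every sub-pattern must match some child reached by `label`.
        envs = [
            extended
            for e in envs
            for child in node.get(label, [])
            for extended in match(sub, child, e)
        ]
    yield from envs


def query(db):
    """select title:T where {bib:article:{title:T, year:Y}} in db, Y > 1998"""
    pattern = {"bib": {"article": {"title": "T", "year": "Y"}}}
    return [{"title": e["T"]} for e in match(pattern, db, {}) if e["Y"] > 1998]


# Made-up sample database.
db = {"bib": [{"article": [
    {"title": ["T1"], "year": [1997]},
    {"title": ["T2"], "year": [1999]},
]}]}
```

The generator part (`match`) produces the bindings of T and Y, and the condition Y > 1998 then filters them, which mirrors the two kinds of constructs in the "where" clause.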
Additional features of Lorel
• Label variables
  - can combine "schema" and "data" information.
  - can turn tables into data and vice versa.
  - can perform group-by operations.
• Can match variables with regular expressions.
References
• Managing XML and semi-structured data, lecture series by Prof. Dan Suciu.
• Website: www.cs.washington.edu/homes/suciu/COURSES/590DS/