
XML and Databases


lionel




Presentation Transcript


  1. XML and Databases 198:541

  2. XML Motivation

  3. XML Motivation • Huge amounts of unstructured data on the web: HTML documents • No structure information • Only format instructions (presentation) • Integration of data from different sources • Structural differences • Closely related to semistructured data

  4. Semistructured Data • Integration of heterogeneous sources • Data sources with non-rigid structures • Biological data • Web data • Need for more structural information than plain text, but fewer constraints on structure than in relational data

  5. Characteristics of Semistructured Data • Missing or additional tuples • Multiple attributes • Different types in different objects • Heterogeneous collections • Self-describing, irregular data with no a priori structure

  6. HTML Document Example
     <h1> Bibliography </h1>
     <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
     <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999
     Type of information (slide annotations on the entries): book, title, authors, year

  7. The Idea Behind XML • Easily support information exchange between applications / computers • Reuse what worked in HTML • Human readable • Standard • Easy to generate and read • But allow arbitrary markup • Uniform language for semistructured data • Data Management

  8. XML

  9. XML • eXtensible Markup Language • Universal standard for documents and data • Defined by W3C • Set of emerging technologies • XLink, XPointer, XSchema, DOM, SAX, XPath, XQuery,…

  10. XML • XML gives a syntax, not semantics • XML defines the structure of a document, not how it is processed • Separate structural information from format instructions

  11. XML Example
      <bibliography>
        <book>
          <title> Foundations… </title>
          <author> Abiteboul </author>
          <author> Hull </author>
          <author> Vianu </author>
          <publisher> Addison Wesley </publisher>
          <year> 1995 </year>
        </book>
        …
      </bibliography>

  12. XML Terminology • Tags: book, title, author,… • Start tag: <book> • End Tag: </book> • Elements are nested • Empty Element • <reviews></reviews> => <reviews/> • XML Document: single root element • XML Document is well formed: matching tags

  13. XML Attributes • Attributes are <name, value> pairs that characterize an element.
      <book price="55" currency="USD">
        <title> Foundations of Databases </title>
        <author> Abiteboul </author>
        …
        <year> 1995 </year>
      </book>
      • Attributes can define an oid (object identifier), but it is just syntax

  14. More XML • Text can be CDATA or PCDATA • Entity References: &amp; for &, &gt; for >,… • Processing Instructions: <?blink?> • Comments: <!-- comment text -->
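A minimal sketch of how entity references behave in practice, assuming Python's standard library (`xml.sax.saxutils` and `xml.etree.ElementTree`); the sample string is made up for illustration. Escaping turns reserved characters into entity references, and a parser turns them back into plain characters.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "AT&T > 5"                      # hypothetical text containing reserved characters

# escape() replaces &, < and > with the entity references &amp;, &lt; and &gt;
escaped = escape(raw)

# Embedding the escaped text in an element and parsing it recovers the original text
element = ET.fromstring("<note>" + escaped + "</note>")

print(escaped)
print(element.text)
```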

  15. Well Formed XML Documents • Elements must be properly nested • <book><title> Foundations of Databases </title></book> • But Not: • <book><title> Foundations of Databases</book></title> • There must be a unique root element • Elements can be of • ‘element content’ • or ‘mixed content’: • <title>This is <b>Mixed</b>Content</title>
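A minimal sketch of checking well-formedness by handing the text to a parser, assuming Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical. It exercises the two rules above: proper nesting and a unique root element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested: accepted
good = "<book><title> Foundations of Databases </title></book>"
# Closing tags in the wrong order: rejected
bad_nesting = "<book><title> Foundations of Databases</book></title>"
# Two top-level elements, so no unique root: rejected
two_roots = "<book/><book/>"

for doc in (good, bad_nesting, two_roots):
    print(is_well_formed(doc), doc)
```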

  16. XML: Potential • Flexible enough to represent anything • Stock market, DNA, Music, Chemicals • Weather information • Wireless network configuration • Enables easy information exchange • Between companies • Within companies • Standard: everybody uses the same technology

  17. XML: Limitations • XML is only a syntax for documents • We need tools! • Editors and parsers • Programming APIs (for Java, C++, etc.) • Languages to manipulate XML (how many books?) • Schemas (What is a book like?) • Storage (What if you have a lot of XML?) • Transfer protocols (How do you exchange it?) • What about XML in Chinese…? • How can XML fit into my phone…? • Query processing? • …

  18. XML Schema Language

  19. DTDs: Document Type Definitions • Similar to a schema • Grammar describing constraints on document structure and content • XML Documents can be validated against a DTD
      <!ELEMENT Book (title, author*)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT author (name, address, age?)>
      <!ATTLIST Book id ID #REQUIRED>
      <!ATTLIST Book pub IDREF #IMPLIED>

  20. Shortcomings of DTDs • Useful for documents, but not so good for data: • No support for structural re-use • Object-oriented-like structures aren’t supported • No support for data types • Can’t do data validation • Can have a single key item (ID), but: • No support for multi-attribute keys • No support for foreign keys (references to other keys) • No constraints on IDREFs (e.g., cannot require that an IDREF reference only a Section)

  21. XSchema • In XML format • Includes primitive data types (integers, strings, dates,…) • Supports value-based constraints (integers > 100) • Inheritance • Foreign keys • …

  22. Example of XSchema
      <schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
        <element name="author" type="string" />
        <element name="date" type="date" />
        <element name="abstract">
          <type> … </type>
        </element>
        <element name="paper">
          <type>
            <attribute name="keywords" type="string" />
            <element ref="author" minOccurs="0" maxOccurs="*" />
            <element ref="date" />
            <element ref="abstract" minOccurs="0" maxOccurs="1" />
            <element ref="body" />
          </type>
        </element>
      </schema>

  23. XML Storage

  24. Storing XML Data • Different approaches: • Storing as text • Using an RDBMS • Using a native system tailored for XML (NATIX, Tamino, Ipedo, etc.) • Performance of the various approaches depends on your application

  25. Storing XML as Text • Simple • Easy to compress • No updates • Need to parse the document every time it is needed

  26. Storing XML in RDBMS • Uses existing RDBMS techniques • Costly in space, takes time to reconstruct original document • Example techniques: • Schema with 2 relations: tag and value • Schema with n relations: 1 per element name
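The two-relation technique can be sketched as follows. This is one possible reading of "tag and value", not a layout given in the slides: the table names `xml_tag` and `xml_value`, their columns, and the helper `shred` are all assumptions made for illustration, using Python's standard `sqlite3` and `xml.etree.ElementTree`.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>"""

def shred(root, conn):
    """Store every element in xml_tag and every non-empty text in xml_value."""
    conn.execute("CREATE TABLE xml_tag (node_id INTEGER PRIMARY KEY, "
                 "parent_id INTEGER, pos INTEGER, name TEXT)")
    conn.execute("CREATE TABLE xml_value (node_id INTEGER, text TEXT)")
    next_id = 0
    stack = [(root, None, 0)]          # (element, parent node id, position among siblings)
    while stack:
        elem, parent_id, pos = stack.pop()
        next_id += 1
        node_id = next_id
        conn.execute("INSERT INTO xml_tag VALUES (?, ?, ?, ?)",
                     (node_id, parent_id, pos, elem.tag))
        text = (elem.text or "").strip()
        if text:
            conn.execute("INSERT INTO xml_value VALUES (?, ?)", (node_id, text))
        for i, child in enumerate(elem):
            stack.append((child, node_id, i))

conn = sqlite3.connect(":memory:")
shred(ET.fromstring(DOC), conn)

# Authors of each book, in document order: one self-join per navigation step
authors = [row[0] for row in conn.execute(
    "SELECT v.text FROM xml_tag b "
    "JOIN xml_tag a ON a.parent_id = b.node_id "
    "JOIN xml_value v ON v.node_id = a.node_id "
    "WHERE b.name = 'book' AND a.name = 'author' ORDER BY a.pos")]
print(authors)
```

Each step of a path costs a join, and rebuilding the original document means walking the parent links again, which is where the space and reconstruction costs noted above come from.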

  27. Accessing and Querying XML Data

  28. XML as a Tree: DOM • DOM = Document Object Model • Class hierarchy serving as an API to XML trees • Methods of those classes can be used to manipulate XML (e.g., Node::child, Node::name) • Can be used from Java, C++ to develop XML applications. • Each node has an identity (i.e., a unique identifier) in the whole document

  29. XML as a DOM Tree • Class hierarchy (node, element, attribute)
      bibliography
        book
          title: Foundations of Databases
          author: Abiteboul
          author: Hull
          author: Vianu
          publisher: Addison Wesley
          year: 1995
        book
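A minimal sketch of navigating such a tree through a DOM API, assuming Python's standard `xml.dom.minidom` (the slides mention Java and C++; the DOM node interface is similar). The document string is the bibliography example written without whitespace, so that `firstChild` is the book element rather than a text node.

```python
from xml.dom import minidom

DOC = ("<bibliography><book>"
       "<title>Foundations of Databases</title>"
       "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
       "<publisher>Addison Wesley</publisher><year>1995</year>"
       "</book></bibliography>")

dom = minidom.parseString(DOC)
root = dom.documentElement                 # the single root element
book = root.firstChild                     # first child node of the root

# Node names of the children of <book>, in document order
child_names = [node.nodeName for node in book.childNodes]

# Text of every <author> descendant; each element holds one text node
authors = [a.firstChild.data for a in book.getElementsByTagName("author")]

print(root.nodeName, child_names, authors)
```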

  30. XML as a Stream: SAX • XML document = event stream. E.g., • Opening tag ‘book’ • Opening tag ‘title’ • Text “Foundations of databases” • Closing tag ‘title’ • Opening tag ‘author’ • Etc. • SAX allows you to associate actions with those events to build applications • Very efficient since it corresponds to events during parsing, but not always sufficient.
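The event stream above can be sketched with Python's standard `xml.sax` module. The handler class `EventLogger` is a hypothetical name; it simply records each event instead of acting on it. A parser may deliver long text in several `characters` calls, which this short example does not need to handle.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record parsing events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("open", name))

    def endElement(self, name):
        self.events.append(("close", name))

    def characters(self, content):
        text = content.strip()
        if text:                      # ignore whitespace-only text
            self.events.append(("text", text))

handler = EventLogger()
xml.sax.parseString(
    b"<book><title>Foundations of Databases</title><author>Abiteboul</author></book>",
    handler,
)
for event in handler.events:
    print(event)
```

No tree is built, so memory use stays flat, but the application has to keep any state it needs across events itself.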

  31. XPath • Language for navigating in an XML document (seen as a tree) • One root node • Types of nodes: root, element, text, attribute, comment,… • XPath expression defines navigation in the tree following axes: child, descendant, parent, ancestor,…

  32. XPath: Examples • Find all the titles of all the books: • //book/title • Find the title of all books written by Charles Dickens • //book[author="Charles Dickens"]/title • Find the title of the first section in the second chapter in “Great Expectations” • //book[title="Great Expectations"]/chapter[2]/section[1]/title • Find the title of all sections that come after the second chapter in “Great Expectations”: • //book[title="Great Expectations"]/chapter[2]/following::section/title
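A minimal sketch of running the first three expressions, assuming Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath: paths are written relative to an element (hence the leading `.`), and axes such as `following::` are not supported, so the fourth expression is left out. The sample library document is made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<library>
  <book>
    <title>Great Expectations</title>
    <author>Charles Dickens</author>
    <chapter><title>Chapter 1</title>
      <section><title>S1.1</title></section>
    </chapter>
    <chapter><title>Chapter 2</title>
      <section><title>S2.1</title></section>
      <section><title>S2.2</title></section>
    </chapter>
  </book>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
  </book>
</library>"""

root = ET.fromstring(DOC)

# //book/title : titles that are direct children of a book
all_titles = [t.text for t in root.findall(".//book/title")]

# //book[author="Charles Dickens"]/title
dickens = [t.text for t in root.findall(".//book[author='Charles Dickens']/title")]

# //book[title="Great Expectations"]/chapter[2]/section[1]/title
first_section = [t.text for t in root.findall(
    ".//book[title='Great Expectations']/chapter[2]/section[1]/title")]

print(all_titles, dickens, first_section)
```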

  33. Querying XML Data • Need for a language to query XML data • Should yield XML output • Should support standard query operations • No schema required • Several proposals for an XML query language: XML-QL, XQuery,…

  34. XQuery • XPath included in XQuery • FLWR expressions: for, let, where, return
      FOR $x IN document("bib.xml")/bib/book
      WHERE $x/year > 1995
      RETURN $x/title
      Result:
      <title> abc </title>
      <title> def </title>
      <title> ghi </title>
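To make the meaning of the FLWR expression concrete, here is a hand-written equivalent in Python using the standard `xml.etree.ElementTree`. It is not XQuery and does not use an XQuery engine; the inline `BIB` document stands in for `bib.xml` and its contents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in for bib.xml (hypothetical contents)
BIB = """<bib>
  <book><title>abc</title><year>1996</year></book>
  <book><title>old</title><year>1994</year></book>
  <book><title>def</title><year>1999</year></book>
</bib>"""

bib = ET.fromstring(BIB)

result = [
    ET.tostring(title, encoding="unicode").strip()
    for book in bib.findall("book")            # FOR $x IN .../bib/book
    if int(book.findtext("year")) > 1995       # WHERE $x/year > 1995
    for title in book.findall("title")         # RETURN $x/title
]

for fragment in result:
    print(fragment)
```

The result is again a sequence of XML fragments, which matches the requirement that a query language for XML should yield XML output.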

  35. How to process XML Queries? • Use indexes • Need to identify nodes • Need to know relations between nodes • Labeling Schemes • Dewey encoding • Prefix-Postfix encoding • Twigstack
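A minimal sketch of Dewey encoding, assuming the common convention that a node's label is its parent's label extended by the node's position among its siblings. The helper names `dewey_labels` and `is_ancestor` are hypothetical. Labels are kept as tuples of integers, so that a prefix test is not confused by string prefixes such as "1.1" and "1.10".

```python
import xml.etree.ElementTree as ET

def dewey_labels(root):
    """Map each Dewey label (a tuple of ints) to its element."""
    labels = {}

    def visit(elem, label):
        labels[label] = elem
        for i, child in enumerate(elem, start=1):
            visit(child, label + (i,))     # child label = parent label + position

    visit(root, (1,))
    return labels

def is_ancestor(a, b):
    """True if the node labeled `a` is a proper ancestor of the node labeled `b`."""
    return len(a) < len(b) and b[:len(a)] == a

DOC = ("<bibliography><book><title>Foundations of Databases</title>"
       "<author>Abiteboul</author><author>Hull</author></book></bibliography>")
labels = dewey_labels(ET.fromstring(DOC))

for label, elem in sorted(labels.items()):
    print(".".join(map(str, label)), elem.tag)

# Structural relationships are decided from the labels alone, without touching the tree
print(is_ancestor((1, 1), (1, 1, 3)), is_ancestor((1, 1, 2), (1, 1, 3)))
```

Because ancestor and descendant tests reduce to prefix comparisons, an index over the labels can answer structural queries without loading the document.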

  36. Web Services

  37. What are Web Services • Programming interfaces for application to application communication on the Web • platform-independent, • language-independent • object model-independent • Possibility to activate methods on remote web servers (RPC) • 2 main applications • E-commerce • Access to remote data

  38. XML and Web Services • Exchange of information between applications is in XML • Input and Result • Use of SOAP to generate messages • Descriptions of the web service functionality given in XML, according to the WSDL schema • Web Services standards use XML heavily

  39. Conclusions • XML: a very active area • Many research directions • Many applications • Standards not finalized yet: XQuery, XML Schema, Web Services…

  40. Some Important XML Standards • XSL/XSLT: presentation and transformation standards • RDF: resource description framework (meta-info such as ratings, categorizations, etc.) • XPath/XPointer/XLink: standard for linking to documents and elements within • Namespaces: for resolving name clashes • DOM: Document Object Model for manipulating XML documents • SAX: Simple API for XML parsing • …

  41. References • XML • http://www.w3.org/XML/ • Sudarshan S. Chawathe: Describing and Manipulating XML Data. IEEE Data Engineering Bulletin 22(3)(1999) • XML Standards • http://www.w3.org/ (XSL, XPath, XSchema, DOM…) • Storing XML Data • Daniela Florescu, Donald Kossmann: Storing and Querying XML Data using an RDMBS. IEEE Data Engineering Bulletin 22(3)(1999) • Hartmut Liefke, Dan Suciu: XMILL: An Efficient Compressor for XML Data. SIGMOD Conference 2000 • XQuery • http://www.w3.org/TR/xquery/ • Peter Fankhauser: XQuery Formal Semantics: State and Challenges. SIGMOD Record 30(3)(2001) • Web Services • http://www.w3.org/2002/ws/
