
Native XML Databases

Native XML Databases. Ronald Bourret, rpbourret@rpbourret.com, http://www.rpbourret.com. Overview: What is a native XML database? Use cases. Code examples. Implementation and performance. Normalization and referential integrity. Common database features. What is a Native XML Database?


Presentation Transcript


  1. Native XML Databases • Ronald Bourret • rpbourret@rpbourret.com • http://www.rpbourret.com

  2. Overview • What is a native XML database? • Use cases • Code examples • Implementation and performance • Normalization and referential integrity • Common database features

  3. What is a Native XML Database?

  4. Storing XML in a database
     1. Build a model
        • Model the data in the XML document, or...
        • Model the XML document itself
     2. Map the model to the database
     3. Transfer data according to the model

  5. Sample XML document

      <Order>
        <Number>1234</Number>
        <Customer>Gallagher Co.</Customer>
        <Date>29.10.00</Date>
        <Item Number="1">
          <Part>A-10</Part>
          <Quantity>12</Quantity>
          <Price>10.95</Price>
        </Item>
        <Item Number="2">
          <Part>B-43</Part>
          <Quantity>600</Quantity>
          <Price>3.99</Price>
        </Item>
      </Order>

  6. Modeling data • Objects in model are specific to XML schema

      object Order {
          number=1234;
          customer="Gallagher Co.";
          date=29.10.00;
          items={ptrs to Items};
      }
      object Item {
          number=1;
          part="A-10";
          quantity=12;
          price=10.95;
      }
      object Item {
          number=2;
          part="B-43";
          quantity=600;
          price=3.99;
      }

  7. Storing data • Database schema specific to XML schema

      Orders
      Number  Customer       Date
      1234    Gallagher Co.  291000
      ...     ...            ...
      ...     ...            ...

      Items
      SONum  Item  Part  Qty  Price
      1234   1     A-10  12   10.95
      1234   2     B-43  600  3.99
      ...    ...   ...   ...  ...
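
The data-centric mapping on slides 6 and 7 can be sketched in a few lines of Java. This is only an illustration using the JDK's built-in DOM parser; the class name `OrderShredder` and the row format are assumptions, and conversion of the date into the column format shown on the slide is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: maps the sample Order document to Orders/Items rows.
class OrderShredder {
    static final String SAMPLE =
        "<Order><Number>1234</Number><Customer>Gallagher Co.</Customer>"
      + "<Date>29.10.00</Date>"
      + "<Item Number=\"1\"><Part>A-10</Part><Quantity>12</Quantity>"
      + "<Price>10.95</Price></Item>"
      + "<Item Number=\"2\"><Part>B-43</Part><Quantity>600</Quantity>"
      + "<Price>3.99</Price></Item></Order>";

    // Text of the first descendant element with the given name.
    static String child(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent();
    }

    // Returns one row for the Orders table and one row per Item.
    static List<String> toRows(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element order = doc.getDocumentElement();
            String number = child(order, "Number");
            List<String> rows = new ArrayList<>();
            rows.add("Orders(" + number + ", " + child(order, "Customer")
                    + ", " + child(order, "Date") + ")");
            NodeList items = order.getElementsByTagName("Item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // The order number is repeated as the foreign key (SONum).
                rows.add("Items(" + number + ", " + item.getAttribute("Number")
                        + ", " + child(item, "Part") + ", " + child(item, "Quantity")
                        + ", " + child(item, "Price") + ")");
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse order", e);
        }
    }

    public static void main(String[] args) {
        toRows(SAMPLE).forEach(System.out::println);
    }
}
```

Note that the code knows the element names Order, Item, Part and so on; that is what "specific to XML schema" means here.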

  8. Modeling documents • Objects in model independent of XML schema

      Element (Order)
        Element (Number)
          Text: 1234
        Element (Customer)
          Text: Gallagher Co.
        Element (Date)
          Text: 29.10.00
        Element (Item)
          Element (Part)
            Text: A-10
          Element (Quantity)
            Text: 12
          Element (Price)
            Text: 10.95
          Attr (Number)
            Text: 1
        Element (Item)
          ...

  9. Storing documents • Database schema independent of XML schema (order columns not shown)

      Elements
      ID  Name      Parent
      1   Order     --
      2   Number    1
      3   Customer  1
      4   Date      1
      5   Item      1
      6   Item      1
      7   Part      5
      8   Quantity  5
      9   Price     5
      10  Part      6
      11  Quantity  6
      12  Price     6

      Attributes
      ID  Name    Parent
      13  Number  5
      14  Number  6

      Text
      Parent  Value
      2       1234
      3       Gallagher Co.
      4       29.10.00
      7       A-10
      8       12
      9       10.95
      10      B-43
      11      600
      12      3.99
      13      1
      14      2
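
A minimal sketch of the document-centric mapping, assuming the JDK DOM parser. The class name `DocumentShredder` and the row strings are illustrative only. IDs are assigned in document (pre-order) order, so they differ from the numbering on the slide, but the three tables have the same shape.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: shreds any XML document into three generic tables.
class DocumentShredder {
    final List<String> elements = new ArrayList<>();   // "ID, Name, Parent"
    final List<String> attributes = new ArrayList<>(); // "ID, Name, Parent"
    final List<String> text = new ArrayList<>();       // "Parent, Value"
    private int nextId = 1;

    static DocumentShredder shredXml(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            DocumentShredder s = new DocumentShredder();
            s.shred(root, 0);
            return s;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Recursively records elements, attributes, and non-blank text nodes.
    // A parentId of 0 marks the root, which is written as "--".
    void shred(Node node, int parentId) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            int id = nextId++;
            elements.add(id + ", " + node.getNodeName() + ", "
                    + (parentId == 0 ? "--" : String.valueOf(parentId)));
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                int attrId = nextId++;
                attributes.add(attrId + ", " + attr.getNodeName() + ", " + id);
                text.add(attrId + ", " + attr.getNodeValue());
            }
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                shred(c, id);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().trim().isEmpty()) {
            text.add(parentId + ", " + node.getNodeValue());
        }
    }
}
```

Nothing in this code mentions Order or Item, so the same tables can hold any document.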

  10. Types of databases (take 1) • Categorize databases based on what they model • Model data => XML-enabled database • Model document => Native XML database

  11. Types of databases (take 2) • XML adds a new data model to the world • In addition to relational, hierarchical, OO, ... • The “XML” data model is • A tree of ordered nodes • Nodes have different types (element, attribute, etc.) • Some nodes are labeled (Date, Quantity, Price, etc.) • Data stored in leaf nodes • Modeling language is an XML schema language • DTD, XML Schemas, RELAX NG, etc.

  12. Types of databases (take 2) • XML-enabled databases • Have their own data model (relational, OO, etc.) • Map their data model to XML data model • Native XML databases • Use XML data model directly

  13. Formal definition¹ • A native XML database: • Defines an XML data model • Uses a document as its fundamental unit of (logical) storage • Can have any physical storage ¹ Definition from XML:DB Initiative

  14. XML data model • XML 1.0 did not define a model • Native XML databases define their own model • Model must include elements, attributes, text, and document order • Examples are XPath, Info Set, DOM, and SAX • XQuery model will be de facto standard in future? • Data transferred according to the model

  15. Fundamental unit of storage • Document is fundamental unit of logical storage • Equivalent structure in RDBMS is a row • Document usually contains single set of data

  16. Physical storage • Can use any physical storage • Examples • Tables for SAX “objects” in relational database • DOM objects in object-oriented database • Binary file format optimized for XPath data model • Hashtables of XPaths and values • Compressed, indexed XML documents in file system
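
As one illustration of the physical storage options above, a "hashtable of XPaths and values" could look roughly like the following. This is a sketch, not any product's actual format: the class name `PathTable` and the positional path syntax are assumptions, and only leaf elements and attributes are recorded (mixed content is ignored).

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens a document into path -> value entries.
class PathTable {
    static Map<String, String> build(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, String> table = new LinkedHashMap<>();
            walk(root, "/" + root.getNodeName() + "[1]", table);
            return table;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // 'path' is the positional XPath of element e.
    static void walk(Element e, String path, Map<String, String> table) {
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            table.put(path + "/@" + attrs.item(i).getNodeName(),
                      attrs.item(i).getNodeValue());
        }
        Map<String, Integer> seen = new HashMap<>(); // per-name sibling counters
        boolean hasChildElement = false;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                int pos = seen.merge(c.getNodeName(), 1, Integer::sum);
                walk((Element) c, path + "/" + c.getNodeName() + "[" + pos + "]", table);
            }
        }
        if (!hasChildElement) {
            table.put(path, e.getTextContent()); // leaf element: store its text
        }
    }
}
```

For the sample order, the key `/Order[1]/Item[2]/Part[1]` would map to `B-43`.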

  17. Use Cases

  18. Use case: Tabular data • Manufacturing data • Accounting data • Employee data • Scientific data • ...

  19. Use a native XML DB only if: • You want to lose your job. • Relational databases excel at storing tabular data • Systems are mature, fast • Few reasons to change existing code or DBMS

  20. Use case: Documents • Product documentation • Static Web pages • Presentations • Advertisements • ....

  21. Use a native XML DB because: • Structure too irregular for relational model • Physical info (entities, CDATA, ...) is important • Need document-centric queries • Get first chapter with a paragraph containing “XML” • Get headings of all chapters that contain figures • Want to retain document identity
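
The two document-centric queries above can be written in XPath 1.0. The sketch below runs them with the JDK's `javax.xml.xpath` package against a tiny in-memory document. The element names (Book, Chapter, Heading, Para, Figure) and the class name `ChapterQueries` are assumptions made for the example; a real document type would use its own names. The first query returns the heading of the matching chapter rather than the whole chapter, only to keep the output short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example document and queries.
class ChapterQueries {
    static final String BOOK =
        "<Book>"
      + "<Chapter><Heading>Intro</Heading><Para>About databases.</Para></Chapter>"
      + "<Chapter><Heading>Storage</Heading><Para>Storing XML documents.</Para>"
      + "<Figure/></Chapter>"
      + "<Chapter><Heading>Queries</Heading><Para>XML query languages.</Para>"
      + "<Figure/></Chapter>"
      + "</Book>";

    // Heading of the first chapter with a paragraph containing "XML".
    static final String FIRST_XML_CHAPTER =
        "(/Book/Chapter[Para[contains(., 'XML')]])[1]/Heading";

    // Headings of all chapters that contain figures.
    static final String FIGURE_HEADINGS = "/Book/Chapter[Figure]/Heading";

    // Evaluates the XPath and returns the text content of each selected node.
    static List<String> query(String xml, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)),
                              XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Query failed: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(query(BOOK, FIRST_XML_CHAPTER));
        System.out.println(query(BOOK, FIGURE_HEADINGS));
    }
}
```

Queries like these depend on document order and containment, which is awkward to express once a document has been split across relational tables.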

  22. Use case: Semi-structured data • Some structure, but not regular • Structured data: white pages • Semi-structured data: yellow pages • Examples • Health data • Result of EAI (Enterprise Application Integration) • Genealogical data • ...

  23. Use a native XML DB because: • Difficult to store in a relational database • Use single, sparsely populated table or... • Many tables (one per subject area) • Documents might contain arbitrary extensions

  24. Use case: “Natural” format is XML • Only format is XML • XSLT stylesheets • Data stored temporarily as XML • Long-running transactions • Enterprise application integration (EAI) • Documents in a message queue • Schema-less documents • No schema or schema not known

  25. Use a native XML DB because: • Want query language, security, transactions, etc. • No reason to use an XML-enabled database • Natural format is XML • Mapping arbitrary documents at run-time is inefficient, error-prone

  26. Use case: Large documents • Need entire document in memory • DOM tree • XSLT transformation • etc. • Document too large for memory

  27. Use a native XML DB because: • In-memory tree lazily instantiated • Unused nodes kept on disk • Tree size limited only by: • Disk size • Database address space
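
Lazy instantiation can be pictured with a small sketch. Everything here is hypothetical: `LazyNode` is an invented class, and the `disk` map stands in for the database's on-disk storage. The point is only that a node's children are materialized the first time they are requested, not when the tree is opened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical lazily instantiated tree node.
class LazyNode {
    static int loads = 0;                          // counts simulated disk reads
    final String name;
    private final Map<String, List<String>> disk;  // stand-in for on-disk storage
    private List<LazyNode> children;               // null until first requested

    LazyNode(String name, Map<String, List<String>> disk) {
        this.name = name;
        this.disk = disk;
    }

    // Loads the child nodes on first access and caches them afterwards.
    List<LazyNode> children() {
        if (children == null) {
            loads++;
            children = new ArrayList<>();
            for (String child : disk.getOrDefault(name, List.of())) {
                children.add(new LazyNode(child, disk));
            }
        }
        return children;
    }
}
```

Subtrees that the application never visits are never read, which is why the tree can be larger than available memory.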

  28. Use case: Archiving • Long-term document storage • Often required by law • Pharmaceutical industry • Contracts • Financial transactions

  29. Use a native XML DB because: • Want query language, security, transactions, etc. • Better than file system or BLOBs • Retains complete document • Retaining original document might be a problem • May support versioning through updates or diffs

  30. Use case: Performance • We all want better performance, right?

  31. Use a native XML DB because: • May be faster than XML-enabled databases • Speed depends on: • Actual query • Query and storage engines • Output format (DOM, SAX, string)

  32. Code Examples

  33. XML:DB API • Database-neutral API from XML:DB Initiative • Similar to JDBC, OLE DB, etc. • Implemented by database drivers • Methods to connect to database, execute queries, etc. • Uses separate query language(s)

  34. JDBC example: INSERT

      // Instantiate the Sun JDBC-ODBC bridge driver. The driver registers
      // itself with the driver manager.
      Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

      // Get a connection to the Orders database and turn auto-commit off.
      Connection conn = DriverManager.getConnection("jdbc:odbc:Orders",
                                                    "rpbourret", "my_password");
      conn.setAutoCommit(false);

      // Get a Statement.
      Statement s = conn.createStatement();

  35. JDBC example: INSERT (cont.)

      // Execute INSERT statements to insert new rows into the Orders
      // and Items tables. Items has the columns SONum, Item, Part, Qty,
      // and Price, so each row supplies all five values.
      s.executeUpdate("INSERT INTO Orders " +
                      "VALUES ('1234', 'Gallagher Co.', #10/29/00#)");
      s.executeUpdate("INSERT INTO Items " +
                      "VALUES ('1234', 1, 'A-10', 12, 10.95)");
      s.executeUpdate("INSERT INTO Items " +
                      "VALUES ('1234', 2, 'B-43', 600, 3.99)");

      // Commit the transaction. Close the statement and the connection.
      conn.commit();
      s.close();
      conn.close();

  36. XML:DB example: INSERT

      // Get the database (driver) for dbXML/Xindice. Register it
      // explicitly with the database manager.
      Class c = Class.forName("org.dbxml.client.xmldb.DatabaseImpl");
      Database db = (Database) c.newInstance();
      DatabaseManager.registerDatabase(db);

      // Get the Orders collection.
      Collection coll = DatabaseManager.getCollection("xmldb:dbxml:///db/Orders",
                                                      "rpbourret", "my_password");

      // Get a transaction service. XML:DB is designed to use "services",
      // such as transaction, collection management, and query services.
      TransactionService txn =
          (TransactionService) coll.getService("TransactionService", "1.0");

  37. XML:DB example: INSERT (cont.)

      // Start a new transaction.
      txn.begin();

      // Create a new (empty) XMLResource in the Orders collection.
      XMLResource res =
          (XMLResource) coll.createResource("Order1234", "XMLResource");

      // Read the contents of the Order1234.xml file into a string.
      int b;
      FileReader r = new FileReader("Order1234.xml");
      StringWriter w = new StringWriter();
      while ((b = r.read()) != -1)
      {
          w.write(b);
      }
      // Close the file once it has been read.
      r.close();

  38. XML:DB example: INSERT (cont.)

      // Set the contents of the new resource to the string.
      res.setContent(w.toString());

      // Commit the transaction.
      txn.commit();

      // Close the collection and deregister the database.
      coll.close();
      DatabaseManager.deregisterDatabase(db);

  39. JDBC example: SELECT

      // Instantiate the Sun JDBC-ODBC bridge driver. The driver registers
      // itself with the driver manager.
      Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

      // Get a connection to the Orders database.
      Connection conn = DriverManager.getConnection("jdbc:odbc:Orders",
                                                    "rpbourret", "my_password");

      // Execute a SELECT statement to get all orders from 10/29/00.
      // The column order (Number, Customer, Date) matches the column
      // indexes 1, 2, 3 used when the rows are printed.
      Statement s = conn.createStatement();
      ResultSet rs = s.executeQuery("SELECT Number, Customer, Date " +
                                    "FROM Orders WHERE Date=#10/29/00#");

  40. JDBC example: SELECT (cont.)

      // Get each row in the result set and print it.
      while (rs.next())
      {
          System.out.println("Order: " + rs.getString(1));
          System.out.println("Customer: " + rs.getString(2));
          System.out.println("Date: " + rs.getDate(3));
          System.out.println();
      }

      // Close the result set, the statement, and the connection.
      rs.close();
      s.close();
      conn.close();

  41. XML:DB example: SELECT

      // Get the database (driver) for dbXML/Xindice. Register it
      // explicitly with the database manager.
      Class c = Class.forName("org.dbxml.client.xmldb.DatabaseImpl");
      Database db = (Database) c.newInstance();
      DatabaseManager.registerDatabase(db);

      // Get the Orders collection.
      Collection coll = DatabaseManager.getCollection("xmldb:dbxml:///db/Orders",
                                                      "rpbourret", "my_password");

      // Get an XPath query service over the Orders collection. XML:DB
      // is designed to use a variety of "services", such as different
      // query engines.
      XPathQueryService service =
          (XPathQueryService) coll.getService("XPathQueryService", "1.0");

  42. XML:DB example: SELECT (cont.)

      // Execute an XPath query to get all orders from 29.10.00.
      ResourceSet docs = service.query("/Order[Date=\"29.10.00\"]");

      // Iterate over the returned documents and print each one. We do
      // not need to know any metadata (such as data types) here. This
      // is because metadata is contained in the XML document itself.
      ResourceIterator iterator = docs.getIterator();
      while (iterator.hasMoreResources())
      {
          XMLResource doc = (XMLResource) iterator.nextResource();
          System.out.println((String) doc.getContent());
          System.out.println();
      }

      // Close the collection and deregister the database.
      coll.close();
      DatabaseManager.deregisterDatabase(db);
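
The XML:DB example needs a running database and driver. To try the same XPath expression without one, the query semantics can be approximated with the JDK alone. In this sketch the "collection" is just a list of document strings and `CollectionQuery` is a made-up class; as in the XML:DB example, the result is the set of whole documents that match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a collection-level XPath query.
class CollectionQuery {
    // Returns every document for which the XPath selects at least one node.
    static List<String> query(List<String> collection, String xpath) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            List<String> hits = new ArrayList<>();
            for (String doc : collection) {
                // A node-set converts to true when it is non-empty.
                Boolean match = (Boolean) xp.evaluate(
                        xpath, new InputSource(new StringReader(doc)),
                        XPathConstants.BOOLEAN);
                if (match) {
                    hits.add(doc);
                }
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalStateException("Query failed: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        List<String> orders = List.of(
            "<Order><Number>1234</Number><Date>29.10.00</Date></Order>",
            "<Order><Number>1235</Number><Date>30.10.00</Date></Order>");
        query(orders, "/Order[Date=\"29.10.00\"]").forEach(System.out::println);
    }
}
```

This version parses every document on every query, which is exactly the cost a native XML database avoids through its storage format and indexes.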

  43. Implementation and Performance

  44. Text-based storage • Stores documents as text • Uses file system, CLOB, etc. • Includes XML-aware text in RDBMS • May need to parse documents at run time • Uses indexes to avoid extra parsing, increase search speed
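
A rough sketch of the idea, not a description of any particular product: documents are kept as plain text, and a value index is built once at insert time so that lookups do not have to parse every stored document. The class `TextStore` and its "Element=value" index keys are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical text-based store with a simple value index.
class TextStore {
    private final Map<String, String> docs = new HashMap<>();       // id -> raw XML text
    private final Map<String, Set<String>> index = new HashMap<>(); // "City=Chicago" -> ids

    // Stores the text unchanged; parses it once to index leaf element values.
    void insert(String id, String xml) {
        docs.put(id, xml);
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (e.getElementsByTagName("*").getLength() == 0) { // leaf element
                    String key = e.getNodeName() + "=" + e.getTextContent();
                    index.computeIfAbsent(key, k -> new TreeSet<>()).add(id);
                }
            }
        } catch (Exception ex) {
            throw new IllegalStateException("Could not index document " + id, ex);
        }
    }

    // Index lookup: no stored document is parsed here.
    Set<String> find(String element, String value) {
        return index.getOrDefault(element + "=" + value, Set.of());
    }

    // Returns the original text exactly as it was inserted.
    String get(String id) {
        return docs.get(id);
    }
}
```

The trade-off is the one noted later under Performance: every insert or update has to maintain the index as well as the text.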

  45. Text-based storage

      <Address>
        <Street>123 Main St.</Street>
        <City>Chicago</City>
        <State>IL</State>
        <PostCode>60609</PostCode>
        <Country>USA</Country>
      </Address>

  46. Text-based databases • Indexed files • TextML • CLOBs • Oracle 9i release 2, DB2

  47. Model-based storage • Stores documents in “object” form • Documents parsed when inserted • For example, store DOM objects in OODBMS • Underlying storage can be relational, object-oriented, hierarchical, proprietary • Uses indexes to speed searches

  48. Model-based storage

      <Address>
        <Street>123 Main St.</Street>
        <City>Chicago</City>
        <State>IL</State>
        <PostCode>60609</PostCode>
        <Country>USA</Country>
      </Address>

      Element
        Element
          Text
        Element
          Text
        Element
          Text
        Element
          Text
        Element
          Text

  49. Model-based databases • Proprietary • Tamino, Xindice, Neocore, Ipedo, XStream DB, XYZFind, Infonyte, Virtuoso, Coherity, Luci, TeraText, Sekaiju, Cerisent, DOM-Safe, XDBM, ... • Relational • Xfinity, eXist, Sybase, DBDOM • Object-oriented • eXcelon, X-Hive, Ozone/Prowler, 4Suite, Birdstep

  50. Performance • No public benchmark data? • Native vendors say native is faster • Relational vendors say relational is faster • Native XML databases rely heavily on indexing • Slows update times • Some guesses are possible
