DASMOD Project A3XDB: XML Databases
Christian Mathis, mathis@informatik.uni-kl.de
Databases and Information Systems Group
1st DASMOD Summer School, July 31st – August 13th, University of Kaiserslautern
A3XDB Project Members
• Joint project of the Information Systems Group with the Software Technology Group
• A3XDB is part of XTC (XML Transaction Coordinator)
• Chairs
  • Theo Härder (Information Systems)
  • Arnd Poetzsch-Heffter (Software Technology)
• Scientific Staff
  • Michael Haustein (Locking and Recovery; Project Founder)
  • Christian Mathis (Query Processing)
  • Jose de Aguiar Moraes Filho (Cost Model)
  • Karsten Schmidt (Adaptivity)
  • Patrick Michel (Adaptivity)
Outline
• Why XML Database Systems? And what do they look like?
• Let's (sky-)dive into XTC
  • L5: 33,000 ft. (XML Management): XML, XQuery, DOM, SAX
  • L4: 15,000 ft. (Node Management): XML Tree
  • L3: 10,000 ft. (Record Management): Mapping onto Records, Pages
  • L2: 5,000 ft. (Buffer Management): DB Buffer
  • L1: 0 ft. (I/O Management): Containers, Blocks
• Adaptivity Aspects
Why XML Database Systems (XDBMS)?
• Q: When do I need an XML Database System?
• A: When you have a lot of XML data
• … and if you also need some of these nice DBMS features:
  • ACID transactions
  • high-level data handling (declarative query processing)
  • efficient and parallel processing of large data volumes
  • high availability and fault tolerance
  • scalability with respect to transaction workload and data volumes
  • adaptive tuning
• Examples:
  • Document-centric view: document collections
    • books, articles, web pages, …
    • application: structure-sensitive information retrieval
  • Data-centric view: semistructured data model
    • messages, configuration files, semistructured data per se
    • application: healthcare information management
What do XDBMS look like?
• XOR: "XML over Relational"
  • "Shredding" XML -> Tables
  • Figure: an XQuery Rewriter on top of an SQL DBMS that stores tables
• ROX: "Relational over XML"
  • "Native" XML storage
  • SQL systems become legacy
  • Figure: an SQL Rewriter on top of an XQuery DBMS with a native XML store
• In both figures, applications see an XQuery/XML interface and an SQL/tuples interface
XML Transaction Coordinator (XTC)
Figure: XTC architecture
• Clients: XTCdriver (DOM, SAX, XTCconnection), Browser, FTP Client
• XTCserver
  • L5 Interface Services: Http Agent, Ftp Agent, DOM RMI, SAX RMI, API RMI
  • XML Services: XQuery Processor, XML Manager, XSLT Processor
  • L4 Node Services: Node Manager, Lock Manager
  • L3 Access Services: Record Mgr, Index Mgr, Catalog Mgr
  • L2 Propagation Control: Buffer Manager
  • L1 File Services: Temp File Mgr, I/O Manager
  • Transaction Services: Transaction Manager, Deadlock Detector
• OS File System: Temporary Files, Container Logs, Transaction Log, Container Files
L5 (33,000 ft.): Example XML Document

    <bib>
      <book year="1994" id="1">
        <title>TCP/IP Illustrated</title>
        <author> <first>W.</first> <last>Stevens</last> </author>
        <price>65.95</price>
      </book>
      <book year="2000" id="2">
        <title>Data on the Web</title>
        <author> <last>Abiteboul</last> <first>Serge</first> </author>
        <author> <last>Buneman</last> <first>Peter</first> </author>
        <author> <last>Suciu</last> <first>Dan</first> </author>
        <price>39.95</price>
      </book>
      <book year="1999" id="3">
        <title>The Economics of . . . </title>
        <editor> <last>Gerbarg</last> <first>Darcy</first> <affiliation>CITI</affiliation> </editor>
        <price>129.95</price>
      </book>
    </bib>
L5 (33,000 ft.): Example API-Access
• XQuery

    <result>{
      for $b in //book[@year=2000]
      where count($b/author) > 2
      return $b/title
    }</result>

• DOM

    Node contextNode = document.getDocumentElement();
    // navigate to first book element
    contextNode = contextNode.getFirstChild();
    // navigate to next sibling book element
    contextNode = contextNode.getNextSibling();

• SAX

    public void startElement(String namespaceURI, String lName, ...) {}
    public void endElement(String namespaceURI, String lName, ...) {}
    public void characters(char ch[], int start, int length) {}
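To make the three access styles concrete, here is a minimal sketch that answers the slide's XQuery (titles of books from 2000 with more than two authors) through DOM navigation, and counts author elements through SAX events. It uses only the standard JAXP parsers shipped with the JDK, not the XTC driver. The class name `BibApiDemo` and the shortened two-book document are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it talks to the JDK's parsers, not to an XTC server.
final class BibApiDemo {
    // Shortened version of the example document (assumption: two books are enough here).
    static final String XML =
        "<bib>"
      + "<book year=\"1994\" id=\"1\"><title>TCP/IP Illustrated</title>"
      + "<author><first>W.</first><last>Stevens</last></author></book>"
      + "<book year=\"2000\" id=\"2\"><title>Data on the Web</title>"
      + "<author><last>Abiteboul</last></author>"
      + "<author><last>Buneman</last></author>"
      + "<author><last>Suciu</last></author></book>"
      + "</bib>";

    private static ByteArrayInputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // DOM: walk first-child / next-sibling links, as on the slide.
    static List<String> titlesViaDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input());
            List<String> titles = new ArrayList<>();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element) || !"book".equals(n.getNodeName())) continue;
                Element book = (Element) n;
                if (!"2000".equals(book.getAttribute("year"))) continue;
                int authors = 0;
                String title = null;
                for (Node c = book.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if ("author".equals(c.getNodeName())) authors++;
                    if ("title".equals(c.getNodeName())) title = c.getTextContent();
                }
                if (authors > 2 && title != null) titles.add(title);
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // SAX: react to parser events instead of navigating a tree.
    static int countAuthorsViaSax() {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(input(), new DefaultHandler() {
                @Override
                public void startElement(String uri, String lName, String qName, Attributes atts) {
                    if ("author".equals(qName)) count[0]++;
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }
}
```

The DOM variant spells out by hand what the XQuery expression states declaratively, which is the "high-level data handling" argument from the motivation slide.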
L5 (33,000 ft.): XTC Command Center
• document handling
  • store/delete documents
• document navigation/modification/querying in transactional context
  • DOM, SAX, XQuery
L4 (15,000 ft.) taDOM data model

    <?xml version="1.0"?>
    <bib>
      <book year="2004" id="book1">
        <title>The Title</title>
        <author> <first>FirstName</first> <last>LastName</last> </author>
        <price>49,99</price>
      </book>
    </bib>

Figure: taDOM tree of the document. Node kinds: element node, attribute root, attribute node, text node (T), string node.

    bib
      book
        attribute root
          id -> "book1"
          year -> "2004"
        title -> T -> "The Title"
        author
          first -> T -> "FirstName"
          last -> T -> "LastName"
        price -> T -> "49,99"
L4 (15,000 ft.) SPLID node addressing scheme
• Stable Path Labeling IDentifiers
  • for document storage
  • for query processing
  • for locking support

Figure: the taDOM tree with a SPLID on every node.

    1            bib
    1.3          book
    1.3.1        attribute root
    1.3.1.3      id
    1.3.1.3.1    "book1"
    1.3.1.5      year
    1.3.1.5.1    "2004"
    1.3.3        title
    1.3.3.3      T
    1.3.3.3.1    "The Title"
    1.3.5        author
    1.3.5.3      first
    1.3.5.3.3    T
    1.3.5.3.3.1  "FirstName"
    1.3.5.5      last
    1.3.5.5.3    T
    1.3.5.5.3.1  "LastName"
    1.3.7        price
    1.3.7.3      T
    1.3.7.3.1    "49,99"
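The three uses listed on this slide follow from one property of path labels: a node's label contains the labels of all its ancestors as prefixes. The sketch below illustrates this on the labels shown in the figure. It is not the XTC implementation: the class `Splid` and its methods are hypothetical, and it assumes every division corresponds to exactly one tree level, which holds for the labels in the figure but ignores whatever extra divisions a real scheme may use for later insertions.

```java
import java.util.Arrays;

// Hypothetical label class; assumes one division per tree level (true for the figure's labels).
final class Splid implements Comparable<Splid> {
    private final int[] divisions;

    private Splid(int[] divisions) {
        this.divisions = divisions;
    }

    static Splid parse(String label) {
        String[] parts = label.split("\\.");
        int[] d = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            d[i] = Integer.parseInt(parts[i]);
        }
        return new Splid(d);
    }

    // The parent label is obtained without touching the document: drop the last division.
    Splid parent() {
        if (divisions.length == 1) return null; // the root has no parent
        return new Splid(Arrays.copyOf(divisions, divisions.length - 1));
    }

    // Ancestor test = proper-prefix test (useful for locking and path queries).
    boolean isAncestorOf(Splid other) {
        if (divisions.length >= other.divisions.length) return false;
        for (int i = 0; i < divisions.length; i++) {
            if (divisions[i] != other.divisions[i]) return false;
        }
        return true;
    }

    // Document order = division-wise comparison; a prefix sorts before its extensions.
    @Override
    public int compareTo(Splid other) {
        int n = Math.min(divisions.length, other.divisions.length);
        for (int i = 0; i < n; i++) {
            if (divisions[i] != other.divisions[i]) {
                return Integer.compare(divisions[i], other.divisions[i]);
            }
        }
        return Integer.compare(divisions.length, other.divisions.length);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Splid && Arrays.equals(divisions, ((Splid) o).divisions);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(divisions);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < divisions.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(divisions[i]);
        }
        return sb.toString();
    }
}
```

For example, `Splid.parse("1.3").isAncestorOf(Splid.parse("1.3.5.5.3.1"))` holds because book (1.3) lies on the path to the string node "LastName".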
L4 (15,000 ft.) Simple Locking Example
• read
  • needs shared access
  • requests R lock
• modify
  • needs exclusive access
  • requests X lock
• Protocol: compatibility matrix
• On a tree: hierarchical locking!

Figure: transactions T1 and T2 request locks on nodes of the SPLID-labeled example tree (labels in the figure: T2: R, T1: X, T2: R, "OK!").
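A compatibility check for the two lock modes named on this slide can be sketched as follows. This is only the flat R/X case on a single node: it does not implement hierarchical locking, it denies a request instead of making the transaction wait, and the class `SimpleLockTable` with all its method names is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lock table for the two modes on the slide: R (shared) and X (exclusive).
final class SimpleLockTable {
    enum Mode { R, X }

    // Compatibility matrix: only R with R is compatible.
    static boolean compatible(Mode held, Mode requested) {
        return held == Mode.R && requested == Mode.R;
    }

    // node label -> (transaction -> mode held)
    private final Map<String, Map<String, Mode>> locks = new HashMap<>();

    // Returns true if the lock is granted; false means the caller would have to wait.
    boolean request(String tx, String node, Mode mode) {
        Map<String, Mode> holders = locks.computeIfAbsent(node, k -> new HashMap<>());
        for (Map.Entry<String, Mode> e : holders.entrySet()) {
            boolean otherTx = !e.getKey().equals(tx);
            if (otherTx && !compatible(e.getValue(), mode)) {
                return false;
            }
        }
        // If tx already holds a lock on the node, keep the stronger of the two modes.
        holders.merge(tx, mode, (held, requested) -> held == Mode.X ? held : requested);
        return true;
    }

    // End of transaction: drop all of its locks.
    void releaseAll(String tx) {
        for (Map<String, Mode> holders : locks.values()) {
            holders.remove(tx);
        }
    }
}
```

With this table, two readers of the same node coexist, while a writer has to wait until all other holders have released their locks.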
L3 (10,000 ft.) XTC Document Index
• document mapped to records and distributed across fixed-size pages
• efficient DOM navigation
• prefix compression works

Figure: the document index on top of the document container. Each record holds a SPLID plus the node data (byte representation). The index keys are the first SPLIDs of the container pages.

    index key      page content (SPLIDs in document order)
    1              1, 1.3, 1.3.1, 1.3.1.3
    1.3.1.3.1      1.3.1.3.1, 1.3.1.5, 1.3.1.5.1, 1.3.3
    1.3.3.3        1.3.3.3, 1.3.3.3.1, 1.3.5, 1.3.5.3
    1.3.5.3.3      1.3.5.3.3, 1.3.5.3.3.1, 1.3.5.5, 1.3.5.5.3
    1.3.5.5.3.1    1.3.5.5.3.1, 1.3.7, 1.3.7.3, 1.3.7.3.1
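Why prefix compression works here: records are stored in document order, so neighboring SPLIDs share most of their leading divisions. The sketch below stores, for each label, only the number of divisions shared with its predecessor plus the remaining suffix. It works on the textual labels for readability. The actual byte-level encoding used by XTC is not given on the slide, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical division-wise prefix compression over SPLID labels in document order.
final class SplidPrefixCompression {
    static final class Entry {
        final int shared;     // number of leading divisions shared with the previous label
        final String suffix;  // remaining divisions, dot-separated

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    static List<Entry> compress(List<String> labelsInDocumentOrder) {
        List<Entry> out = new ArrayList<>();
        String[] prev = new String[0];
        for (String label : labelsInDocumentOrder) {
            String[] cur = label.split("\\.");
            int shared = 0;
            while (shared < prev.length && shared < cur.length && prev[shared].equals(cur[shared])) {
                shared++;
            }
            out.add(new Entry(shared, String.join(".", Arrays.copyOfRange(cur, shared, cur.length))));
            prev = cur;
        }
        return out;
    }

    static List<String> decompress(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        List<String> prev = new ArrayList<>();
        for (Entry e : entries) {
            // Rebuild the label from the shared prefix of the predecessor plus the stored suffix.
            List<String> cur = new ArrayList<>(prev.subList(0, e.shared));
            if (!e.suffix.isEmpty()) {
                cur.addAll(Arrays.asList(e.suffix.split("\\.")));
            }
            out.add(String.join(".", cur));
            prev = cur;
        }
        return out;
    }
}
```

For the page starting at 1.3.1.3.1, the successor 1.3.1.5 is stored as "3 shared divisions, suffix 5", so two divisions are kept instead of four.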
Buffer Management
• Buffer = main memory area with a fixed number of frames for pages
• Exploits reference locality
• Typical BufferManager operations
  • fetch page, allocate page, clear page, fix page, unfix page
• Page replacement strategy: LRU or LRD-V2
• Page addressing by 4-byte page number (external memory address)

Figure: the database buffer as a row of frames holding data pages; page header fields shown are PageNumber (4 bytes) and PageType (1 byte).
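The interplay of fix, unfix, and LRU replacement can be sketched in a few lines. This is an illustrative toy, not the XTC BufferManager: the class `LruBuffer` is hypothetical, a miss allocates an empty page where a real buffer manager would read it from the I/O manager, dirty pages are never written back, and LRD-V2 is not covered.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical buffer with a fixed number of frames and LRU replacement.
final class LruBuffer {
    private static final class Frame {
        final byte[] data;
        int fixCount; // a frame with fixCount > 0 must not be replaced

        Frame(int pageSize) {
            data = new byte[pageSize];
        }
    }

    private final int capacity;
    private final int pageSize;
    // Insertion-ordered map: the first entry is the least recently used page.
    private final LinkedHashMap<Integer, Frame> frames = new LinkedHashMap<>();
    int hits;
    int misses;

    LruBuffer(int capacity, int pageSize) {
        this.capacity = capacity;
        this.pageSize = pageSize;
    }

    // fix page: make the page resident and pin it until unfix is called.
    byte[] fix(int pageNumber) {
        Frame frame = frames.remove(pageNumber);
        if (frame != null) {
            hits++;
        } else {
            misses++;
            if (frames.size() >= capacity) {
                evictUnfixedLru();
            }
            frame = new Frame(pageSize); // placeholder for reading the page from disk
        }
        frames.put(pageNumber, frame); // re-inserting moves the page to the MRU end
        frame.fixCount++;
        return frame.data;
    }

    // unfix page: the caller no longer needs the page to stay in its frame.
    void unfix(int pageNumber) {
        Frame frame = frames.get(pageNumber);
        if (frame == null || frame.fixCount == 0) {
            throw new IllegalStateException("page " + pageNumber + " is not fixed");
        }
        frame.fixCount--;
    }

    boolean contains(int pageNumber) {
        return frames.containsKey(pageNumber);
    }

    private void evictUnfixedLru() {
        Iterator<Map.Entry<Integer, Frame>> it = frames.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().fixCount == 0) {
                it.remove();
                return;
            }
        }
        throw new IllegalStateException("all frames are fixed");
    }
}
```

The hit and miss counters are the kind of local statistic that the adaptivity slides later propose to feed into a tuning loop.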
I/O-Manager (1)
• Container file is sliced into fixed-size blocks (blockSize == pageSize)
• I/O-Manager handles the container file
  • read block, write block, allocate block, release block
• Dynamic allocation of new external memory space if the container is full
• Index block at position 0 manages block size and extent size
• Before-image block at position 1 for update-in-place with write-ahead log
• Block addressing with 3-byte block number

Figure: container layout.

    Block 0: Index (Block Size, Extent Size, …)
    Block 1: Before Image
    Block 2: Data Block
    Block 3: Data Block
    Block n: Data Block
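Fixed-size blocks make addressing simple arithmetic: a block's byte offset in the container file is its number times the block size, and a 3-byte block number limits a container to 2^24 blocks. The sketch below shows this calculation. The class `BlockAddressing`, the big-endian byte order, and the 8192-byte block size in the usage note are assumptions, since the slide specifies none of them.

```java
// Hypothetical helper illustrating 3-byte block numbers in a container of fixed-size blocks.
final class BlockAddressing {
    // Largest block number that fits into 3 bytes.
    static final int MAX_BLOCK_NUMBER = (1 << 24) - 1;

    private static void check(int blockNumber) {
        if (blockNumber < 0 || blockNumber > MAX_BLOCK_NUMBER) {
            throw new IllegalArgumentException("block number out of 3-byte range: " + blockNumber);
        }
    }

    // Byte offset of a block within the container file.
    static long byteOffset(int blockNumber, int blockSize) {
        check(blockNumber);
        return (long) blockNumber * blockSize;
    }

    // Encode a block number into 3 bytes (byte order is an assumption: big-endian).
    static byte[] encode(int blockNumber) {
        check(blockNumber);
        return new byte[] {
            (byte) (blockNumber >>> 16),
            (byte) (blockNumber >>> 8),
            (byte) blockNumber
        };
    }

    static int decode(byte[] b) {
        return ((b[0] & 0xFF) << 16) | ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);
    }

    // Upper bound on the container size for a given block size.
    static long maxContainerBytes(int blockSize) {
        return (long) (MAX_BLOCK_NUMBER + 1) * blockSize;
    }
}
```

With an assumed block size of 8192 bytes, the first data block (block 2) starts at byte offset 16384, and a single container can grow to 2^37 bytes (128 GiB).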
Approaches to Adaptivity of System Behavior
• DBS have a large number of tuning parameters
• Choose default values for tuning: rules of thumb
  • OK for workload-independent parameters: page size, striping unit, minimal buffer size
  • insufficient for load balancing aspects: MPL (multiprogramming level) limit, etc.
• Hardware is cheap: the KIWI principle
  • OK if applied with care
  • however, it often implies a waste of resources
• Autonomic computing: online feedback control loop
  • OK, but requires additional resources (cycles, memory, ...)
Automate some Tasks of the DBA
• Process the loop automatically
  • monitor – analyze – plan – react
• prediction needs quantitative models!
• additional information flow within / between layers
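One pass through such a loop might look like the sketch below, which resizes a buffer based on its observed hit ratio. Everything in it is hypothetical: the class name, the thresholds, the step size, and the bounds are made up for illustration. It also uses a plain threshold rule, which is exactly what the slide warns is not enough for prediction; a quantitative model would replace the plan step.

```java
// Hypothetical one-step feedback controller for the number of buffer frames.
final class BufferTuningLoop {
    // All constants below are illustrative assumptions, not XTC settings.
    static final double LOW_HIT_RATIO = 0.90;
    static final double HIGH_HIT_RATIO = 0.98;
    static final int STEP_FRAMES = 64;
    static final int MIN_FRAMES = 64;
    static final int MAX_FRAMES = 4096;

    // monitor: hits and requests are the statistics observed since the last pass.
    static int nextBufferSize(int currentFrames, long hits, long requests) {
        if (requests == 0) {
            return currentFrames; // nothing observed, nothing to decide
        }
        // analyze: condense the observations into one indicator.
        double hitRatio = (double) hits / requests;

        // plan: grow when the buffer misses too often, shrink when it is oversized.
        int planned = currentFrames;
        if (hitRatio < LOW_HIT_RATIO) {
            planned = currentFrames + STEP_FRAMES;
        } else if (hitRatio > HIGH_HIT_RATIO) {
            planned = currentFrames - STEP_FRAMES;
        }

        // react: apply the plan within the allowed bounds.
        return Math.max(MIN_FRAMES, Math.min(MAX_FRAMES, planned));
    }
}
```

Calling `nextBufferSize(256, 80, 100)` yields 320 frames because a hit ratio of 0.80 is below the lower threshold. A DBA tuning by hand would go through the same four steps, only far less often.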
Local Self-Tuning – Index Selection
• Automatic creation of indexes in L3
• Analogy:
  • Counting traffic locally: planning new resources?
  • Better solution: global traffic observation
• Global self-tuning requires distributed knowledge
  • workload statistics collected in L5
  • use of path processing algorithms in L4
  • availability of alternative indexes in L3
Conclusions
• XTC is a real database system
  • Try it: www.xtc-project.de
• We dived through the 5 XTC layers
  • XML Management
  • Node Management
  • Record Management
  • Buffer Management
  • I/O Management
• Adaptivity
  • We are only at the beginning
  • Central concept: online feedback control loop
  • First step in XTC: let the components talk to each other