240 likes | 363 Vues
This lecture review covers the challenges of XML data storage, distinguishing between small and large XML instances. We explore various schema design solutions derived from DTD, edge relations for graph storage, and BLOB storage, detailing advantages and inefficiencies. Additionally, we discuss queries related to these storage methods, with a focus on relational schema design and XML processing topics, including XPath and XQuery. Prepare effectively for the midterm with insights on SQL, E/R diagrams, and schema design principles.
E N D
Lecture 15:Midterm Review Friday, October 31, 2003
XML Storage • Most often the XML data is small • E.g. a SOAP message • Parsed directly into the application (DOM API) • Sometimes XML data is large • need to store/process it in a database • The XML storage problem: • How do we choose the schema of the database ?
XML Storage Three solutions: • Schema derived from DTD • Storing XML as a graph: “Edge relation” • Store it as a BLOB • Simple, boring, inefficient • Won’t discuss in class
Designing a Schema from DTD Design a relational schema for: <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office?, phone*)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, ((price,availability)|description))> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>
Designing a Schema from DTD First, constructthe DTD graph: company * We ignorethe order * person product * office phone pid price avail. descr. ssn name
Designing a Schema from DTD Next, design therelational schema,using common sense. company * * person product * office phone pid price avail. descr. ssn name Person(ssn, name, office)Phone(ssn, phone)Product(pid, name, price, avail., descr.) Which attributes may be NULL ? (Look at the DTD)
Designing a Schema from DTD What happens to queries: FOR$xIN /company/product[description] RETURN <answer> { $x/name, $x/description } </answer> SELECT Product.name, Product.description FROM Product WHERE Product.description IS NOT NULL
Storing XML as a Graph Sometimes we don’t have a DTD: • How can we store the XML data ? Every XML instance is a tree • Store the edges in an Edge table • Store the #PCDATA in a Value table
0 1 db 9 2 5 book book publisher 4 8 11 3 6 7 10 title author author title state title author “Complete Guide to DB2” “Morgan Kaufman” “CA” “Chamberlin” “Transaction Processing” “Bernstein” “Newcomer” Storing XML as a Graph Can be ANYXML data(don’t know DTD) Edge Value
xdb xbook xauthor xtitle vtitle vauthor Storing XML as a Graph What happens to queries: FOR$xIN /db/book[author/text()=“Chamberlin”] RETURN$x/title db book author title “Chamberlin” Return value
Storing XML as a Graph What happens to queries: A 6-way join !!! SELECT vtitle.value FROM Edge xdb, Edge xbook, Edge xauthor, Edge xtitle, Value vauthor, Value vtitle WHERE xdb.source = 0 and xdb.tag = ‘db’ and xdb.dest = xbook.source and xbook.tag = ‘book’ and xbook.dest = xauthor.source and xauthor.tag = ‘author’ and xbook.dest = xtitle.source and xtitle.tag = ‘title’ and xauthor.dest = vauthor.source and vauthor.value = ‘Chamberlin” and xtitle.dest = vtitle.source
Storing XML as a Graph Edge relation summary: • Same relational schema for every XML document: Edge(Source, Tag, Dest) Value(Source, Val) • Generic: works for every XML instance • But inefficient: • Repeat tags multiple times • Need many joins to reconstruct data
Other XML Topics • Name spaces • XML API: • DOM = “Document Object Model” • XML languages: • XSLT • XML Schema • Xlink, XPointer • SOAP Available from www.w3.org (but don’t spend rest of your life reading those standards !)
Research on XML Data Management at UW • Processing: • Query languages (XML-QL, a precursor of XQuery) • Tukwila • XML updates • XML publishing/storage • SilkRoute: silkroute.sourceforge.net • STORED • XML tools • XML Compressor: Xmill • XML Toolkit (xsort, xagg, xgrep, xtransf, etc): xmltk.sourceforge.net • Theory: • Typechecking • Xpath, Xquery containment
The Midterm Four questions: • SQL • E/R • Schema design (BCNF/3NF) • XML
The Midterm • Open book exam (books,/notebooks,/lectures) • But no computers • No retakes • Your score:Max(midterm-score,100 – 1.2(100 – final-score) • No retakes on final: you *must* take it on Dec. 12
1. SQL • Selection/project/join • Understand well duplicates • Aggregate queries • avoid nested queries when a GROUP BY suffices • Nested queries • More difficult: ANY, ALL, NOT IN • Updates, table creations, views
2. E/R Diagrams • One/many v.s. many/many relationships • Inheritance • Translation to relations • Remember: no table for one/many !
3. Schema Design • What does AB CD mean ? • Compute {AB}+ • Compute all keys • Decompose in BCNF or 3NF
4. XML • XML: • Basic syntax: elements + attributes • DTDs: elements only • The tree model • Canonical XML view of a relation (<row>...) • XPath • XQuery • Basic principles in publishing/storing data
Final Thoughts • Open book • But read the book before the exam • Some question(s) may be hard(er) • Answer first the questions that are easier • The answers should not be very complex
Exercises • ABC, CD, DB, DE • Decompose in BCNF General strategy: find X s.t.: X X+ all attributes
ABC, CD, DB, DE BCNF • Try AB+ = ABCDE nope… • Try C+ = CDBE yep…decompose CDBE, CA • Continue in CDBE: try D+ = DBE yep…decompose DBE, DC • Continue in DBE: try D+ = DE yep…decompose DE, DB • Answer: CA, DC, DE, DB Notice: AB is a key, but A and B are separated. Cleary 3NF differs
ABC, CD, DB, DE 3NF • Minimal keys: AB, AC, AD • Try C+ = CDBE yes, BUT: D,B part of keysso decompose on C E only:CE, CABD Continue in CABD • Minimal keys: AB, AC, AD • Try A+ = A nope… Try B+ = B nope…Try AB+ = ABCD nope…Try C+ = CDB yes, but B, D part of keysTry D+ = DB again B part of a key Try others… nope. • Answer: CE, CABD. Notice: DE, DABC also OK