XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Presentation Transcript


  1. XML. These slides are borrowed from the Silberschatz book and from Johannes Gehrke's web page.

  2. Overview • XML • A self-describing, hierarchical data model • DTD, XML Schema • Standardizing schemas for XML • XPath, XQuery • How to navigate and query XML documents • XSLT • How to transform one XML document into another XML document

  3. XML – eXtensible Markup Language • Language • A way of communicating information • Markup • Notes or meta-data that describe your data or language • Extensible • Limitless ability to define new languages or data sets

  4. Motivation • Data interchange is critical in today’s networked world • Examples: • Banking: funds transfer • Order processing (especially inter-company orders) • Scientific data • Chemistry: ChemML, … • Genetics: BSML (Bio-Sequence Markup Language), … • Paper flow of information between organizations is being replaced by electronic flow of information • Each application area has its own set of standards for representing information • XML has become the basis for all new generation data interchange formats

  5. XML Document <accommodations> <hotel> <name> Hyatt </name> <amenities pool="Y" gym="N"/> <phone number="(216) 555-1234"/> <phone number="(216) 555-1258"/> <address sc="OH"> <street> South St. </street> <city> Cleveland </city> <state> OH </state> </address> <available> <room type="S" price="125.00"> <number> 101 </number> <dates> <from> 10/30/2002 </from> <to> 11/04/2002 </to> </dates> </room> <room type=...> ... </room> ... </available> </hotel> <hotel> ...

  6. Example <class name='CS63005'> <location building='MSB' room='121'/> <professor>Yuri Breitbart</professor> <student_list> <student id='999-991'>John Smith</student> <student id='999-992'>Jane Doe</student> </student_list> </class>

  7. XML Genealogy • HTML is one of the parents of XML • XML is not a replacement for HTML • Ability to introduce new tags and to nest them • Includes data and data description • Example: Chemical Markup Language <molecule> <weight>234.5</weight> <Spectra>…</Spectra> <Figures>…</Figures> </molecule>

  8. Structure of XML Data • XML is a hierarchy of user-defined tags called elements, with attributes and data • Data is described by elements; elements are described by attributes <student id='999-991'>John Smith</student> (Diagram labels: open tag, element name, attribute, attribute value, data, closing tag)

  9. Structure of XML Data • Elements are nested • Every document must have a single top-level element

  10. Elements <student id='999-991'>John Smith</student> • XML is case and space sensitive • Element opening and closing tag names must be identical • Opening tags: “<” + element name + “>” • Closing tags: “</” + element name + “>” • Empty elements have no data and no closing tag • They begin with “<” and end with “/>”: <location/>

  11. Attributes <student id='999-991'>John Smith</student> • Attributes are written inside the start tag of an element, after the element name • There can be zero or more attributes in every element; each one has the form: attribute_name='attribute_value' • By convention there is no space between the name and the “=”, although XML permits white space on either side of it • Attribute values must be surrounded by " or ' characters • Multiple attributes are separated by white space (one or more spaces, tabs, or line breaks)
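
The parts of the student element named on the slides (element name, attribute, data) can be read back with a parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Parse the single element used on the slides.
student = ET.fromstring("<student id='999-991'>John Smith</student>")

element_name = student.tag      # the name shared by the open and closing tags
attributes = student.attrib     # mapping of attribute names to attribute values
data = student.text             # character data between the tags

print(element_name, attributes, data)
```

Both quote styles are accepted for attribute values, so writing `id="999-991"` in the input would give the same `attributes` mapping.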

  12. Attributes Vs. Subelements • Distinction between subelement and attribute • In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents • In the context of data representation, the difference is unclear and may be confusing • Same information can be represented in two ways • <account account-number="A-101"> … </account> • <account> <account-number>A-101</account-number> … </account> • Suggestion: use attributes for identifiers of elements, and use subelements for contents

  13. Data <student id='999-991'>John Smith</student> • XML data is any information between an opening and closing tag • XML data must not contain a literal '<' or '&' character; write them as &lt; and &amp; ('>' is allowed, though it is often written as &gt;)
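
Markup characters have to be escaped before they can appear in element data. The sketch below, which assumes Python's standard library and a made-up sample string, escapes the text with `xml.sax.saxutils.escape` and shows that the parser returns the original characters, while the unescaped version is rejected.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "price < 100 & tax > 5"      # hypothetical data containing markup characters
escaped = escape(raw)              # replaces &, < and > with entity references

# The escaped text parses, and the parser hands back the original characters.
note = ET.fromstring(f"<note>{escaped}</note>")

# The unescaped text is not well-formed XML.
try:
    ET.fromstring(f"<note>{raw}</note>")
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

print(escaped, note.text, parsed_raw)
```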

  14. Nesting & Hierarchy • XML tags can be nested in a tree hierarchy • XML documents can have only one root tag • Between an opening and closing tag you can insert: 1. Data 2. More Elements 3. A combination of data and elements <root> <tag1> Some Text <tag2>More</tag2> </tag1> </root>

  15. More on XML Syntax • Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag • <account number="A-101" branch="Perryridge" balance="200"/> • To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below • <![CDATA[<account> … </account>]]> • Here, <account> and </account> are treated as just strings
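
The effect of a CDATA section can be checked with a parser. In this sketch (Python standard library; the wrapping `note` element is an assumption added so the fragment has a root), the text inside the CDATA section comes back as one string and no `account` subelement is created.

```python
import xml.etree.ElementTree as ET

# A hypothetical <note> element whose content is a CDATA section.
doc = "<note><![CDATA[<account> ... </account>]]></note>"
note = ET.fromstring(doc)

cdata_text = note.text      # the tags inside CDATA arrive as plain characters
child_count = len(note)     # number of subelements of <note>

print(repr(cdata_text), child_count)
```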

  16. Namespaces • XML data has to be exchanged between organizations • Same tag name may have different meaning in different organizations, causing confusion on exchanged documents • Specifying a unique string as an element name avoids confusion

  17. Namespaces • Avoid using long unique names all over document by using XML Namespaces <bank xmlns:FB='http://www.FirstBank.com'> … <FB:branch> <FB:branchname>Downtown</FB:branchname> <FB:branchcity> Brooklyn</FB:branchcity> </FB:branch> … </bank>
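
A parser expands each prefixed name into the full namespace string plus the local name, which is what makes the names unique across organizations. The sketch below uses Python's standard `xml.etree.ElementTree`; the document is the FirstBank fragment from the slide with the elided parts left out, and the `ns` mapping is a helper introduced here.

```python
import xml.etree.ElementTree as ET

doc = """<bank xmlns:FB='http://www.FirstBank.com'>
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity>Brooklyn</FB:branchcity>
  </FB:branch>
</bank>"""

bank = ET.fromstring(doc)

# Prefix-to-URI mapping used only for lookups on the Python side.
ns = {"FB": "http://www.FirstBank.com"}

branch = bank.find("FB:branch", ns)
expanded_name = branch.tag                          # "{namespace-uri}local-name"
branch_name = branch.find("FB:branchname", ns).text

print(expanded_name, branch_name)
```

The prefix itself carries no meaning: a document that binds a different prefix to the same URI would yield the same expanded names.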

  18. XML – Storage • Storage is done just like an n-ary tree (DOM) <root> <tag1> Some Text <tag2>More</tag2> </tag1> </root> • Tree nodes shown in the diagram: • Type: Element_Node, Name: Element, Value: root • Type: Element_Node, Name: Element, Value: tag1 • Type: Text_Node, Name: Text, Value: Some Text • Type: Element_Node, Name: Element, Value: tag2 • Type: Text_Node, Name: Text, Value: More
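
The node tree for this document can be reproduced with a DOM parser. The following sketch uses `xml.dom.minidom` from the Python standard library; the `describe` helper is hypothetical, and the input is written without line breaks so that no whitespace-only text nodes appear.

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<root><tag1>Some Text <tag2>More</tag2></tag1></root>")

def describe(node, depth=0):
    """Return one indented line per DOM node, depth first."""
    if node.nodeType == Node.TEXT_NODE:
        label = f"Text_Node: {node.data.strip()}"
    else:
        label = f"Element_Node: {node.nodeName}"
    lines = ["  " * depth + label]
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines

tree_lines = describe(doc.documentElement)
print("\n".join(tree_lines))
```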

  19. XML vs. Relational Model <Table> <Computer Id='101'> <Speed>800Mhz</Speed> <RAM>256MB</RAM> <HD>40GB</HD> </Computer> <Computer Id='102'> <Speed>933Mhz</Speed> <RAM>512MB</RAM> <HD>40GB</HD> </Computer> </Table> (Figure: the same data shown as a relational Computer table)
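
One way to see the correspondence with the relational model is to flatten each Computer element into a row. This is a minimal sketch in Python (standard library only); the column order in `columns` is an assumption, since the slide's table is not reproduced in the transcript.

```python
import xml.etree.ElementTree as ET

doc = """<Table>
  <Computer Id='101'><Speed>800Mhz</Speed><RAM>256MB</RAM><HD>40GB</HD></Computer>
  <Computer Id='102'><Speed>933Mhz</Speed><RAM>512MB</RAM><HD>40GB</HD></Computer>
</Table>"""

table = ET.fromstring(doc)

# Assumed column order: the attribute first, then the subelements as they appear.
columns = ("Id", "Speed", "RAM", "HD")

# One tuple per Computer element: attribute value followed by subelement text.
rows = [
    (c.get("Id"), c.findtext("Speed"), c.findtext("RAM"), c.findtext("HD"))
    for c in table.findall("Computer")
]

for row in rows:
    print(dict(zip(columns, row)))
```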

  20. XML Document Schema • Database schemas constrain what information can be stored, and the data types of stored values • XML documents are not required to have an associated schema • However, schemas are very important for XML data exchange • Otherwise, a site cannot automatically interpret data received from another site

  21. XML Document Schema • Document Type Definition (DTD) • Widely used • XML Schema • Newer, not yet widely used

  22. Document Type Definition (DTD) • The type of an XML document can be specified using a DTD • A DTD constrains the structure of XML data • What elements can occur • What attributes an element can/must have • What subelements can/must occur inside each element, and how many times • A DTD does not constrain data types • All values are represented as strings in XML • XML protocols and languages can be standardized with a DTD present

  23. Fruit Basket DTD <?xml version='1.0'?> <!ELEMENT Basket (Cherry+, (Apple | Orange)*) > <!ELEMENT Cherry EMPTY> <!ATTLIST Cherry flavor CDATA #REQUIRED> <!ELEMENT Apple EMPTY> <!ATTLIST Apple color CDATA #REQUIRED> <!ELEMENT Orange EMPTY> <!ATTLIST Orange location CDATA 'Florida'> • Valid: <Basket> <Cherry flavor='good'/> <Apple color='red'/> <Apple color='green'/> </Basket> • Invalid (Apple precedes Cherry and lacks the required color attribute): <Basket> <Apple/> <Cherry flavor='good'/> <Orange/> </Basket>

  24. Bank DTD <!DOCTYPE bank [ <!ELEMENT bank ( ( account | customer | depositor)+)> <!ELEMENT account (account-number, branch-name, balance)> <!ELEMENT customer (customer-name, customer-street, customer-city)> <!ELEMENT depositor (customer-name, account-number)> <!ELEMENT account-number (#PCDATA)> <!ELEMENT branch-name (#PCDATA)> <!ELEMENT balance (#PCDATA)> <!ELEMENT customer-name (#PCDATA)> <!ELEMENT customer-street (#PCDATA)> <!ELEMENT customer-city (#PCDATA)> ]>

  25. DTD Syntax • <!ELEMENT element (subelements-specification) > • <!ATTLIST element (attributes) >

  26. Element Specification in DTD <!ELEMENT Basket (Cherry+, (Apple | Orange)*) > (Diagram labels: name, children) • !ELEMENT declares an element name and which child elements (subelements) it should have • Wildcards: • * Zero or more • + One or more

  27. Element Specification in DTD • Subelements can be specified as • names of elements, or • #PCDATA (parsed character data), i.e., character strings • EMPTY (no subelements) or ANY (anything can be a subelement) • Example <!ELEMENT depositor (customer-name, account-number)> <!ELEMENT customer-name (#PCDATA)> <!ELEMENT account-number (#PCDATA)>

  28. Element Specification in DTD • Subelement specification may have regular expressions <!ELEMENT bank ( ( account | customer | depositor)+)> • Notation: • “|” - alternatives • “+” - 1 or more occurrences • “*” - 0 or more occurrences
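
Because a subelement specification is a regular expression over element names, the Basket content model from the earlier slide can be checked with an ordinary regex engine. The sketch below is an illustration in Python, not how a validating parser is implemented: `BASKET_MODEL` and `children_match` are hypothetical names, and attribute constraints are not checked.

```python
import re
import xml.etree.ElementTree as ET

# (Cherry+, (Apple | Orange)*) rewritten as a regex over space-terminated child names.
BASKET_MODEL = re.compile(r"(Cherry )+((Apple|Orange) )*")

def children_match(element, model):
    """Check the sequence of child element names against a content-model regex."""
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None

good = ET.fromstring(
    "<Basket><Cherry flavor='good'/><Apple color='red'/><Apple color='green'/></Basket>"
)
bad = ET.fromstring("<Basket><Apple/><Cherry flavor='good'/><Orange/></Basket>")

good_ok = children_match(good, BASKET_MODEL)
bad_ok = children_match(bad, BASKET_MODEL)
print(good_ok, bad_ok)
```

The second basket fails because its first child is an Apple, while the model requires at least one Cherry before any Apple or Orange.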

  29. Attribute Specification in DTD • Attribute specification: for each attribute • Name • Type of attribute • CDATA • ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs) • Whether it is • mandatory (#REQUIRED) • has a default value (value), • or neither (#IMPLIED)

  30. Attribute Specification in DTD • Examples <!ATTLIST account acct-type CDATA "checking"> <!ATTLIST customer customer-id ID #REQUIRED accounts IDREFS #REQUIRED>

  31. IDs and IDREFs • An element can have at most one attribute of type ID • The ID attribute value of each element in an XML document must be distinct • Thus the ID attribute value is an object identifier • An attribute of type IDREF must contain the ID value of an element in the same document • An attribute of type IDREFS contains a white-space-separated list of one or more ID values, each of which must be the ID value of an element in the same document

  32. DTD – Well-Formed and Valid <?xml version='1.0'?> <!ELEMENT Basket (Cherry+)> <!ELEMENT Cherry EMPTY> <!ATTLIST Cherry flavor CDATA #REQUIRED> • Not well-formed: <basket> <Cherry flavor=good> </Basket> • Well-formed but invalid: <Job> <Location>Home</Location> </Job> • Well-formed and valid: <Basket> <Cherry flavor='good'/> </Basket>
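
Well-formedness can be tested without any DTD, because it depends only on XML syntax. The sketch below (Python standard library; `is_well_formed` is a hypothetical helper) runs the three documents from this slide through a non-validating parser. It says nothing about validity: `xml.etree.ElementTree` does not check documents against a DTD, so the Job document passes here even though it is invalid for the Basket DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, ignoring any DTD constraints."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = [
    # mismatched tag case, unquoted attribute value, unclosed Cherry
    is_well_formed("<basket> <Cherry flavor=good> </Basket>"),
    # syntactically fine, but not a Basket document
    is_well_formed("<Job> <Location>Home</Location> </Job>"),
    # syntactically fine and matches the DTD
    is_well_formed("<Basket> <Cherry flavor='good'/> </Basket>"),
]
print(results)
```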

  33. Bank DTD with Attributes • Bank DTD with ID and IDREF attribute types. <!DOCTYPE bank-2 [ <!ELEMENT account (branch, balance)> <!ATTLIST account account-number ID #REQUIRED owners IDREFS #REQUIRED> <!ELEMENT customer (customer-street, customer-city)> <!ATTLIST customer customer-name ID #REQUIRED accounts IDREFS #REQUIRED> … declarations for branch, balance, customer-street and customer-city ]>

  34. XML data with ID and IDREF attributes <bank-2> <account account-number="A401" owners="Joe Mary"> <branch-name>Downtown</branch-name> <balance>500</balance> </account> <customer customer-name="Joe" accounts="A401"> <customer-street>Monroe</customer-street> <customer-city>Madison</customer-city> </customer> <customer customer-name="Mary" accounts="A401"> <customer-street> Erin</customer-street> <customer-city> Newark </customer-city> </customer> </bank-2>
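
The ID and IDREFS rules (distinct ID values, every reference pointing at an existing ID) can be checked by hand on data like this. The following Python sketch assumes references are separated by white space, as IDREFS values are; the `ID_ATTRS` and `IDREFS_ATTRS` tables and the `check_references` helper are hypothetical stand-ins for what a validating parser would read from the DTD.

```python
import xml.etree.ElementTree as ET

# Which attribute plays the ID / IDREFS role for each element (taken from the DTD by hand).
ID_ATTRS = {"account": "account-number", "customer": "customer-name"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

def check_references(root):
    """Return (duplicate IDs, references that match no ID) for the whole document."""
    ids = [
        elem.get(ID_ATTRS[elem.tag])
        for elem in root.iter()
        if elem.tag in ID_ATTRS and elem.get(ID_ATTRS[elem.tag]) is not None
    ]
    duplicates = {value for value in ids if ids.count(value) > 1}
    known = set(ids)
    dangling = set()
    for elem in root.iter():
        attr = IDREFS_ATTRS.get(elem.tag)
        if attr is not None:
            dangling.update(ref for ref in elem.get(attr, "").split() if ref not in known)
    return duplicates, dangling

bank = ET.fromstring("""<bank-2>
  <account account-number="A401" owners="Joe Mary"><balance>500</balance></account>
  <customer customer-name="Joe" accounts="A401"/>
  <customer customer-name="Mary" accounts="A401"/>
</bank-2>""")

duplicates, dangling = check_references(bank)
print(duplicates, dangling)
```

Like the DTD mechanism itself, this check is untyped: it would accept an owners value that names another account.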

  35. Limitations of DTDs • No typing of text elements and attributes • All values are strings, no integers, reals, etc. • Difficult to specify unordered sets of subelements • Order is usually irrelevant in databases • (A | B)* allows specification of an unordered set, but • Cannot ensure that each of A and B occurs only once • IDs and IDREFs are untyped • The owners attribute of an account may contain a reference to another account, which is meaningless • owners attribute should ideally be constrained to refer to customer elements

  36. XML Schema • XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports • Typing of values • E.g. integer, string, etc • Also, constraints on min/max values • User defined types • Is itself specified in XML syntax, unlike DTDs • More standard representation, but verbose • Is integrated with namespaces • Many more features • List types, uniqueness and foreign key constraints, inheritance .. • BUT: significantly more complicated than DTDs, not yet widely used.

  37. XML Schema of Bank DTD <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="bank" type="BankType"/> <xsd:element name="account"> <xsd:complexType> <xsd:sequence> <xsd:element name="account-number" type="xsd:string"/> <xsd:element name="branch-name" type="xsd:string"/> <xsd:element name="balance" type="xsd:decimal"/> </xsd:sequence> </xsd:complexType> </xsd:element> … definitions of customer and depositor … <xsd:complexType name="BankType"> <xsd:sequence> <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/> …

  38. XML – Querying and Schema Transforming • Translation of XML schemas and querying are closely related and handled by the same tools • Standard XML querying/translation languages • XPath: Simple language consisting of path expressions • XSLT: Simple language designed for translation from XML to XML and XML to HTML • XQuery: An XML query language with a rich set of features • A wide variety of other languages have been proposed.

  39. Tree Model of XML Data • Query and transformation languages are based on a tree model of XML data • An XML document is modeled as a tree, with nodes corresponding to elements and attributes • Element nodes have children nodes, which can be attributes or subelements • Text in an element is modeled as a text node child of the element • Children of a node are ordered according to their order in the XML document • Element and attribute nodes (except for the root node) have a single parent, which is an element node • The root node has a single child, which is the root element of the document

  40. XPath • When XML is stored in a tree, XPath allows you to navigate to different nodes. <Class> <Student>Jeff</Student> <Student>Pat</Student> </Class> (Figure: tree with root Class and two Student children holding the text nodes Jeff and Pat)

  41. XPath • XPath is used to address (select) parts of documents using path expressions • A path expression is a sequence of steps separated by “/” • Result of path expression: set of values that along with their containing elements/attributes match the specified path • E.g. /bank-2/customer/name evaluated on the bank-2 data we saw earlier returns <name>Joe</name> <name>Mary</name> • E.g. /bank-2/customer/name/text() returns the same names, but without the enclosing tags

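  42. XPath • The initial “/” denotes root of the document (above the top-level tag) • Path expressions are evaluated left to right • Selection predicates may follow any step in a path, in [ ] • E.g. /bank-2/account[balance > 400] returns account elements with a balance value greater than 400 • /bank-2/account[balance] returns account elements containing a balance subelement • Attributes are accessed using “@” • E.g. /bank-2/account[balance > 400]/@account-number returns the account numbers of those accounts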
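
The predicate examples can be approximated in Python, with one caveat: the standard `xml.etree.ElementTree` module implements only a subset of XPath. It supports the `[balance]` existence test, but not numeric comparisons or attribute selection steps, so those parts are emulated in plain Python below. Accounts A402 and A403 are hypothetical data added so that the predicates have something to filter out.

```python
import xml.etree.ElementTree as ET

bank = ET.fromstring("""<bank-2>
  <account account-number="A401"><balance>500</balance></account>
  <account account-number="A402"><balance>300</balance></account>
  <account account-number="A403"/>
</bank-2>""")

# /bank-2/account[balance]: accounts that contain a balance subelement.
with_balance = bank.findall("./account[balance]")

# /bank-2/account[balance > 400]/@account-number, emulated:
# the comparison and the attribute step are done in Python.
rich_numbers = [
    account.get("account-number")
    for account in with_balance
    if float(account.findtext("balance")) > 400
]

print([a.get("account-number") for a in with_balance], rich_numbers)
```

A full XPath 1.0 engine would evaluate both expressions directly; the emulation is only meant to show what each predicate selects.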

  43. XPath • An XPath expression is similar to a file-system path, but it can select more than one node: //Class/Student <Class> <Student>Jeff</Student> <Student>Pat</Student> </Class> (Figure: tree with root Class and two Student children holding the text nodes Jeff and Pat; both Student nodes are selected)

  44. XPath <class name='CS63005'> <location building='MSB' room='121'/> <professor>Yuri Breitbart</professor> <student_list> <student id='999-991'>John Smith</student> <student id='999-992'>Jane Doe</student> </student_list> </class> • //class[@name='CS63005']/student_list/student/@id (Diagram labels: //class is the starting element, [@name='CS63005'] the attribute constraint, /student_list/student the element path, /@id the selection) • Selection result: the attribute nodes containing 999-991 and 999-992
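
The same selection can be sketched with Python's standard library. Two adjustments are needed because `xml.etree.ElementTree` covers only part of XPath: the attribute constraint on the top-level `class` element is checked directly on the parsed root, and the final `@id` step is replaced by reading the attribute from each selected element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<class name='CS63005'>
  <location building='MSB' room='121'/>
  <professor>Yuri Breitbart</professor>
  <student_list>
    <student id='999-991'>John Smith</student>
    <student id='999-992'>Jane Doe</student>
  </student_list>
</class>""")

student_ids = []
# Starting element and attribute constraint: class[@name='CS63005']
if root.tag == "class" and root.get("name") == "CS63005":
    # Element path student_list/student, then the id attribute of each match.
    student_ids = [s.get("id") for s in root.findall("./student_list/student")]

print(student_ids)
```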

  45. XPath - Context • Context – your current focus in an XML document • Use: //<root>/… When you want to start from the beginning of the XML document

  46. XPath - Context • XPath: List/Student (Figure: tree with root Class; its children are Prof (text: Gehrke), Location (attr: Olin) and List; List has two Student children holding the text nodes Jeff and Pat)

  47. XPath - Context • XPath: Student (Figure: the same Class tree, with children Prof (text: Gehrke), Location (attr: Olin) and List; List has two Student children holding the text nodes Jeff and Pat)

  48. XPath – Examples <Basket> <Cherry flavor='sweet'/> <Cherry flavor='bitter'/> <Cherry/> <Apple color='red'/> <Apple color='red'/> <Apple color='green'/> … </Basket> Select all of the red apples: //Basket/Apple[@color='red']

  49. XPath – Examples <Basket> <Cherry flavor='sweet'/> <Cherry flavor='bitter'/> <Cherry/> <Apple color='red'/> <Apple color='red'/> <Apple color='green'/> … </Basket> Select the cherries that have some flavor: //Basket/Cherry[@flavor]
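
Both Basket queries use attribute predicates that fall inside the XPath subset of Python's `xml.etree.ElementTree`, so they can be tried directly. In this sketch the parsed root is the Basket element itself, which is why the paths start at `./` rather than `//Basket/`; the elided part of the slide's document is left out.

```python
import xml.etree.ElementTree as ET

basket = ET.fromstring("""<Basket>
  <Cherry flavor='sweet'/> <Cherry flavor='bitter'/> <Cherry/>
  <Apple color='red'/> <Apple color='red'/> <Apple color='green'/>
</Basket>""")

# Apple[@color='red']: apples whose color attribute equals "red".
red_apples = basket.findall("./Apple[@color='red']")

# Cherry[@flavor]: cherries that have a flavor attribute at all.
flavored_cherries = basket.findall("./Cherry[@flavor]")

print(len(red_apples), [c.get("flavor") for c in flavored_cherries])
```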

  50. XPath – Examples <orchard> <tree> <apple color='red'/> <apple color='red'/> </tree> <basket> <apple color='green'/> <orange/> </basket> </orchard> Select all the apples in the orchard: //orchard/descendant::apple (equivalently, //orchard//apple)
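
Selecting descendants at any depth can be sketched with the `.//` step supported by Python's `xml.etree.ElementTree`. Here the parsed root is the orchard element, so `.//apple` finds every apple below it, whether it sits in a tree or in a basket.

```python
import xml.etree.ElementTree as ET

orchard = ET.fromstring("""<orchard>
  <tree> <apple color='red'/> <apple color='red'/> </tree>
  <basket> <apple color='green'/> <orange/> </basket>
</orchard>""")

# All apple elements anywhere below orchard, in document order.
apples = orchard.findall(".//apple")
apple_colors = [apple.get("color") for apple in apples]

print(apple_colors)
```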
