
XML 101: A Technical Introduction to XML

XML 101: A Technical Introduction to XML. 20 November 2002 Bank of Montreal Database Users Group Ian GRAHAM IT Strategy, IBS, Technology and Solutions, BMO Financial Group E: <ian.graham@bmo.com> T: (416) 513.5656 / F: (416) 513.5590



Presentation Transcript


  1. XML 101: A Technical Introduction to XML 20 November 2002 Bank of Montreal Database Users Group Ian GRAHAM IT Strategy, IBS, Technology and Solutions, BMO Financial Group E: <ian.graham@bmo.com> T: (416) 513.5656 / F: (416) 513.5590 To download this talk: http://www.utoronto.ca/ian/talks/

  2. Presentation Outline • What is XML (basic introduction) • Defining language dialects and constraints • DTDs, namespaces, and schemas • XML processing • Parsers and parser interfaces; XML processing tools • XML databases • High-level issues, and references • XML messaging / web services • Why, and some issues/example • Conclusions

  3. What is XML? • A base-level syntax • for encoding structured, text-based information (words, characters, ...) • A text-based syntax • XML is written using printable Unicode characters. Explicit binary data is not allowed • Supports extensible data formats • XML lets you define your own elements (essentially data types), within the constraints of the syntax rules • Designed as a universal format • The syntax rules ensure that all XML processing software MUST identically handle a given piece of XML data. If you can read and process it, so can anybody else

  4. XML: A Simple Example • The first line is the XML declaration (“this is XML”); its encoding attribute flags the character encoding used in the file • On the original slide, XML tags and markup are black and encoded text data is blue
     <?xml version="1.0" encoding="iso-8859-1"?>
     <partorders xmlns="http://myco.org/Spec/partorders">
       <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
         <desc> Gold sprockel grommets, with matching hamster </desc>
         <part number="23-23221-a12" />
         <quantity units="gross"> 12</quantity>
         <deliveryDate date="27aug1999-12:00h" />
       </order>
       <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
         . . . Order something else . . .
       </order>
     </partorders>

  5. Example Revisited • Element tags mark up hierarchical, structured data • units="gross" is an attribute of the quantity element
     <partorders xmlns="http://myco.org/Spec/partorders">
       <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
         <desc> Gold sprockel grommets, with matching hamster </desc>
         <part number="23-23221-a12" />
         <quantity units="gross"> 12 </quantity>
         <deliveryDate date="27aug1999-12:00h" />
       </order>
       <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
         . . . Order something else . . .
       </order>
     </partorders>

  6. XML Data Model - A Tree [Diagram: the document drawn as a tree, with partorders (xmlns=) at the root, two order nodes (ref=, date=) below it, and desc, part, quantity and delivery-date below the first order, with text as leaf nodes]
     <partorders xmlns="...">
       <order date="..." ref="...">
         <desc> ..text.. </desc>
         <part />
         <quantity />
         <delivery-date />
       </order>
       <order ref=".." .../>
     </partorders>
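
The tree view above can be made concrete with a short sketch. It uses Python's standard xml.etree.ElementTree module; the document string and its attribute values are invented for illustration, and the xmlns attribute is left out so that the element names print without a namespace prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the slide's tree (values are made up).
DOC = """<partorders>
  <order ref="a1" date="d1">
    <desc>some text</desc>
    <part/>
    <quantity/>
    <delivery-date/>
  </order>
  <order ref="a2" date="d2"/>
</partorders>"""


def outline(element, depth=0):
    """Return (depth, tag, sorted attribute names) for every element, in document order."""
    rows = [(depth, element.tag, sorted(element.attrib))]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


tree_rows = outline(ET.fromstring(DOC))
for depth, tag, attrs in tree_rows:
    # Indentation mirrors the nesting depth in the tree.
    print("  " * depth + tag, attrs)
```

Each element becomes one node; attributes hang off the node rather than forming children of their own.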

  7. XML: Design goals • Simple but reliable • Strict syntax rules, to eliminate syntax errors • Syntax defines structure (hierarchically), and names structural parts (element names) -- it is self-describing data • Extensible and ‘mixable’ • Can create your own language of tags/elements • Can mix one language with another, and still reliably separate / process the data • Designed for a distributed environment • Can have remote (‘webbed’) data, and retrieve and use it reliably

  8. XML Processing: The XML Parser [Diagram: XML data -> XML parser -> parser interface -> XML-based application] • The parser must verify that the XML is syntactically correct • Such data is said to be well-formed • The minimal requirement to “be” XML • A parser MUST stop processing if the data isn’t well-formed • E.g., stop processing and “throw an exception” to the XML-based application. The XML 1.0 spec requires this behaviour
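
As a rough illustration of the "stop and throw an exception" behaviour, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The helper name is_well_formed and the two sample strings are assumptions made for this example, not part of the talk.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it raises a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser refuses to hand partial data to the application.
        return False


good = "<order><part number='23-23221-a12'/></order>"
bad = "<order><part number='23-23221-a12'></order>"  # <part> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The application sees either a complete document or an exception, never a best-effort guess at what the author meant.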

  9. Special Issues: Characters and Charsets • XML specification defines characters allowed as whitespace in tags: <element id = "23.112" /> • You cannot use the EBCDIC character ‘NEL’ as whitespace • Must make sure to not do so! • What if you want to include characters not defined in the encoding charset (e.g., Greek characters in an ISO-Latin-1 document)? • Use character references. For example: &#9824; -- the spades character (♠), character number 9824 in the Unicode character set • Also, a reminder that binary data is forbidden • It must be encoded as printable characters (e.g. using Base64)
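
Both points can be sketched with Python's standard library. The card and blob element names and the four sample bytes are invented for illustration; the document is declared as iso-8859-1, so the spade and the Greek letters can only appear as character references.

```python
import base64
import xml.etree.ElementTree as ET

# An ISO-8859-1 document: characters outside that charset are written as references.
doc = b'<?xml version="1.0" encoding="iso-8859-1"?><card suit="&#9824;">&#945;&#946;</card>'
root = ET.fromstring(doc)
suit = root.get("suit")   # the parser resolves &#9824; to U+2660
greek = root.text         # &#945;&#946; become Greek alpha and beta

# Binary data must travel as printable characters, for example Base64 text.
payload = bytes([0, 255, 16, 128])
encoded = base64.b64encode(payload).decode("ascii")
wrapped = ET.fromstring("<blob>%s</blob>" % encoded)
decoded = base64.b64decode(wrapped.text)

print(suit, greek, decoded == payload)
```

The receiving application has to know that the blob content is Base64; nothing in the XML syntax itself says so.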

  10. Parsers and DTDs [Diagram: XML data and DTD -> parser -> parser interface -> XML-based application] • A DTD can define external parts (entities) to be ‘included’ in the document • But …. what if the parser can’t find the external parts (firewall?)? • That depends on the type: there are two types of XML parsers • one that MUST retrieve all parts • one that can ignore them (if it can’t find them)

  11. Two types of XML parsers • Validating • Must retrieve all entities and process all of the DTD. Will stop processing and indicate a failure if it cannot • It must also test and verify other things in the DTD -- instructions that define syntactic document rules (allowed elements, attributes, etc.). • Non-validating (well-formed only) • Tries to retrieve all ‘parts’, but will cease processing the DTD content at the first part (entity) it can’t find • But this is not an error -- the parser simply makes available the XML data (and the names of any unresolved ‘parts’) to the application. • Application behavior will depend on parser type • Many parsers can operate in either mode (config)

  12. Presentation Outline • What is XML (basic introduction) • Defining language dialects and constraints • DTDs, namespaces, and schemas • XML processing • Parsers and parser interfaces; XML processing tools • XML databases • High-level issues, and references • XML messaging / web services • Why, and some issues/example • Conclusions

  13. Defining constraints / languages • Two ways of doing so: • XML Document Type Definition (DTD) -- Part of core XML spec. • XML Schema (often called XSD) -- New specification (2001), which allows for richer constraints on XML documents. • What DTDs and/or schemas specify: • Allowed element and attribute names, hierarchical nesting rules; element content/type restrictions • Adding dialect specifications implies two classes of XML data • Well-formed: XML that is syntactically correct • Valid: XML that is well-formed and consistent with a specific DTD (or Schema) • Schemas are more powerful than DTDs • Often used for type validation, or for defining low-level type constraints on values (integer, varchar, datetime, etc.)

  14. DTD Example
     <!DOCTYPE transfers [
       <!ELEMENT transfers (fundsTransfer)+>
       <!ELEMENT fundsTransfer (from, to)>
       <!ATTLIST fundsTransfer date CDATA #REQUIRED>
       <!ELEMENT from (amount, transitID?, accountID, acknowledgeReceipt)>
       <!ATTLIST from type (intrabank|internal|other) #REQUIRED>
       <!ELEMENT amount (#PCDATA)>
       . . . Omitted DTD content . . .
       <!ELEMENT to EMPTY>
       <!ATTLIST to account CDATA #REQUIRED>
     ]>
     <transfers>
       <fundsTransfer date="20010923T12:34:34Z">
       . . . As with previous example . . .

  15. XML Namespaces • Mechanism for identifying different “spaces” for XML names • That is, element or attribute names • This is a way of identifying different language dialects, consisting of names that have specific semantic (and processing) meanings. • For example <key/> in one language (e.g. a security key) can be distinguished from <key/> in another language (a database key) • Mechanism uses a special xmlns attribute to define namespaces. • The namespace is a URL string • But the URL does not reference anything in particular (there may be nothing there!)

  16. Mixing languages together Namespaces let you do this relatively easily: • Default ‘space’ is xhtml • mt: prefix indicates ‘space’ mathml (a different language)
     <?xml version="1.0" encoding="utf-8" ?>
     <html xmlns="http://www.w3.org/1999/xhtml1"
           xmlns:mt="http://www.w3.org/1998/mathml">
       <head>
         <title> Title of XHTML Document </title>
       </head>
       <body>
         <div class="myDiv">
           <h1> Heading of Page </h1>
           <mt:mathml> <mt:title> ... MathML markup . . . </mt:mathml>
           <p> more html stuff goes here </p>
         </div>
       </body>
     </html>
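
A sketch of how an application can separate the two languages again, using Python's standard xml.etree.ElementTree. The namespace strings are copied from the slide as illustrative values (they are not necessarily the URIs the real XHTML and MathML specifications use), the document is a cut-down, completed version of the slide's example, and split_name is a helper invented here.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml1"  # namespace string as written on the slide
MATH = "http://www.w3.org/1998/mathml"   # namespace string as written on the slide

doc = """<html xmlns="http://www.w3.org/1999/xhtml1"
      xmlns:mt="http://www.w3.org/1998/mathml">
  <body>
    <h1>Heading of Page</h1>
    <mt:mathml><mt:title>A formula</mt:title></mt:mathml>
    <p>more html stuff goes here</p>
  </body>
</html>"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


by_language = {}
for element in ET.fromstring(doc).iter():
    uri, local = split_name(element.tag)
    by_language.setdefault(uri, []).append(local)

print(by_language[XHTML])
print(by_language[MATH])
```

The parser never fetches the namespace URL; it only compares the strings, which is why nothing needs to exist at that address.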

  17. XML Schemas • A specification for defining XML validation rules • Specs: http://www.w3.org/XML/Schema • Best-practice: http://www.xfront.com/BestPracticesHomepage.html • Uses pure XML (plus namespaces) to do this • More powerful than DTDs - can specify things like integer types, date strings, real numbers in a given range, etc. • Often used for type validation, or for relating database schemas to XML models • They don’t, however, let you declare entities -- those can only be done in DTDs • The following slide shows the XML schema equivalent to our DTD

  18. XML Schema version of our DTD (Portion)
     <?xml version="1.0" encoding="UTF-8"?>
     <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
       <xs:element name="accountID" type="xs:string"/>
       <xs:element name="acknowledgeReceipt" type="xs:string"/>
       <xs:complexType name="amountType">
         <xs:simpleContent>
           <xs:restriction base="xs:string">
             <xs:attribute name="currency" use="required">
               <xs:simpleType>
                 <xs:restriction base="xs:NMTOKEN">
                   <xs:enumeration value="USD"/>
                   . . . (some stuff omitted) . . .
                 </xs:restriction>
               </xs:simpleType>
             </xs:attribute>
           </xs:restriction>
         </xs:simpleContent>
       </xs:complexType>
       <xs:complexType name="fromType">
         <xs:sequence>
           <xs:element name="amount" type="amountType"/>
           <xs:element ref="transitID" minOccurs="0"/>
           <xs:element ref="accountID"/>
           <xs:element ref="acknowledgeReceipt"/>
         </xs:sequence>
         . . . And still more !!! . . .

  19. Presentation Outline • What is XML (basic introduction) • Defining language dialects and constraints • DTDs, namespaces, and schemas • XML processing • Parsers and parser interfaces; XML processing tools • XML databases • High-level issues, and references • XML messaging / web services • Why, and some issues/example • Conclusions

  20. XML Software • XML parsers….. • Read in XML data, check it against syntactic (and possibly DTD/Schema) constraints, and make the data available to an application. There are three 'generic' kinds of parser API (JDOM is a Java-specific variant of the DOM approach) • SAX Simple API for XML (event-based) • DOM Document Object Model (object/tree based) • JDOM Java Document Object Model (object/tree based) • Pull evolving API (new) (pull-based / object + tree) • Lots of XML parsers and interface software available • Unix, Linux, Windows 2000/XP, z/OS, etc. • SAX-based parsers are fast (often as fast as you can stream data) • DOM is slower and more memory intensive (it creates an in-memory version of the entire document) • Validating can be much slower than non-validating

  21. Parser API: SAX A) SAX: Simple API for XML • http://www.megginson.com/SAX/index.html • An event-based interface (a push parser API) • Parser reports events whenever it sees a tag/attribute/text node/unresolved external entity/other (driven by input stream) • Programmer attaches “event handlers” to handle the event • Advantages • Simple to use • Very fast (not doing very much before you get the tags and data) • Low memory footprint (doesn’t read an XML document entirely into memory) • Disadvantages • Not doing very much for you -- you have to do everything yourself • Not useful if you have to dynamically modify the document once it’s in memory (since you’ll have to do all the work to put it in memory yourself!)
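
The event-handler style can be sketched with Python's standard xml.sax module, which follows the SAX model. The OrderCounter class and the small sample document are assumptions made for this example.

```python
import xml.sax


class OrderCounter(xml.sax.ContentHandler):
    """Event handler: the parser calls startElement once per opening tag."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.orders = []

    def startElement(self, name, attrs):
        self.elements += 1
        if name == "order":
            # Nothing is kept in memory except what the handler chooses to record.
            self.orders.append(attrs.get("ref"))


DOC = (b'<partorders><order ref="x23"><desc>Grommets</desc>'
       b'<part number="p1"/></order><order ref="x24"/></partorders>')

handler = OrderCounter()
xml.sax.parseString(DOC, handler)
print(handler.elements, handler.orders)
```

The parser pushes events in input order; if the application needs to look back at earlier data, the handler has to have saved it.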

  22. Parser API: DOM B) DOM: Document Object Model • http://www.w3.org/DOM/ • An object-based interface • Parser generates an in-memory tree corresponding to the document • DOM interface defines methods for accessing and modifying the tree • Advantages • Very useful for dynamic modification of, and access to, the tree • Useful for querying (i.e., looking for data) that depends on the tree structure, e.g. in pseudocode: [element.childNode("2").getAttributeValue("boobie")] • Same interface for many programming languages (C++, Java, ...) • Disadvantages • Can be slow (needs to produce the tree), and may need lots of memory • DOM programming interface is a bit awkward, not terribly object oriented
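
A minimal sketch of tree access and in-place modification, using Python's standard xml.dom.minidom (a lightweight DOM implementation). The sample document and the note element are invented for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<partorders><order ref="x23">'
    '<quantity units="gross">12</quantity></order></partorders>'
)

# Navigate the in-memory tree.
order = doc.documentElement.getElementsByTagName("order")[0]
quantity = order.getElementsByTagName("quantity")[0]
units_before = quantity.getAttribute("units")

# Modify it in place: 12 gross is the same amount as 144 dozen.
quantity.setAttribute("units", "dozen")
quantity.firstChild.data = "144"

# Add a new element to the tree.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("converted from gross"))
order.appendChild(note)

result = doc.documentElement.toxml()
print(result)
```

Every step works on the whole tree held in memory, which is what makes modification easy and large documents expensive.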

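  23. DOM Parser Processing Model [Diagram: XML data -> parser -> parser interface -> application; the parser builds a Document “object”, an in-memory tree with partorders at the root, order nodes below it, and desc (text), part, quantity and delivery-date below an order]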

  24. Parser API: JDOM B2) JDOM: Java Document Object Model • http://www.jdom.org • A Java-specific object-oriented interface • Parser generates an in-memory tree corresponding to the document • JDOM interface has methods for accessing and modifying the tree • Advantages • Very useful for dynamic modification of the tree • Useful for querying (i.e., looking for data) that depends on the tree structure • Much nicer object-oriented programming interface than DOM • Disadvantages • Can be slow (make that tree...), and can take up lots of memory • New, and not entirely cooked (but close) • Only works with Java

  25. Parser API: Pull C) Pull Interfaces • http://www.xmlpull.org/ (Java); there is also a .NET pull API • A pull-parser interface • API uses expressions / methods to ‘pull’ specific chunks of XML data, or to iterate over the XML • Can be built on top of a DOM model • Advantages • Easier to write applications that need to read in and process XML data (‘easier’ model than a push API, in many cases) • Has proven a very popular component in the .NET toolkit • Disadvantages • Can be slow if you do lots of iteration over the XML input data • No common API across different languages (although xmlpull.org tries to be similar to the .NET API); not yet a ‘real’ standard (still being worked on; not part of most commercial environments)
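
The pull style, where the application asks for the next chunk instead of being called back, can be approximated in Python with xml.etree.ElementTree.iterparse. This is a stand-in for the idea, not the xmlpull.org or .NET API; the quantities helper and the sample data are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET

DOC = (b'<partorders>'
       b'<order ref="x23"><quantity units="gross">12</quantity></order>'
       b'<order ref="x24"><quantity units="each">5</quantity></order>'
       b'</partorders>')


def quantities(stream):
    """Pull one completed <order> at a time and yield (ref, amount, units)."""
    for event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == "order":
            quantity = element.find("quantity")
            yield element.get("ref"), int(quantity.text), quantity.get("units")
            element.clear()  # drop the subtree once the caller has moved on


pulled = list(quantities(io.BytesIO(DOC)))
print(pulled)
```

The loop in the application drives the parser, so it can stop early or skip ahead without writing handler state machines.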

  26. XML Processing: XSLT D) XSLT eXtensible Stylesheet Language -- Transformations • http://www.w3.org/TR/xslt • An XML language for processing/transforming XML • Does tree transformations -- takes XML and an XSLT style sheet as input, and produces a new XML document with a different structure • Advantages • Very useful for tree transformations -- much easier than DOM or SAX for this purpose • Can be used to query a document (XSLT pulls out the part you want) • Disadvantages • Can be slow for large documents or stylesheets • Can be difficult to debug stylesheets (poor error detection; much better if you use schemas)

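  27. XSLT processing model • D) Processing model [Diagram: XML data in and an XSLT style sheet in are each read by an XML parser (with a schema) into document “objects” for data and style sheet; the XSLT processor takes both and writes data out (XML). The example input tree (partorders, order, desc, text, part, quantity, delivery-date) becomes a differently structured output tree with nodes xza, foo, bee, partorders and order]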

  28. XML Processing Toolkits Lots of them … • Java • JAXP ( http://java.sun.com/xml/jaxp/faq.html ) • dom4j ( http://www.dom4j.org ) • .NET ( part of .NET framework ) • … others … • Provide DOM, SAX, (JDOM) interfaces, plus lots of other useful tools in a standardized way (loading parsers, performing XSLT transformations, etc.) • JAXP is standard Java, and thus integrated with Websphere

  29. Presentation Outline • What is XML (basic introduction) • Defining language dialects and constraints • DTDs, namespaces, and schemas • XML processing • Parsers and parser interfaces; XML processing tools • XML databases • High-level issues, and references • XML messaging / web services • Why, and some issues/example • Conclusions

  30. XML and databases • So where do you stick XML data? • Inside a database!?! • But how to do this – and which database type to use: • RDBMS, ORDBMS, ODB, XML?? • How you do so depends on the use cases you have for the data. Some good-to-ask questions are • Am I talking about storing documents, or data? • Is the XML format integral to the application (e.g. XHTML, DocBook)? • How will the database be queried? • Queried by XML structure, or by standard SQL? • What ‘parts’ of the document need to be queried? • Do I need a text index? • How will the data be used/retrieved? • Passed to XML processing tools (e.g. XSLT), or used at ‘atomic’ simple type level? • The answers determine • What database to choose, how to map XML to tables (O-R or table mappings), whether to store as a BLOB or broken up …..
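
One of the options above, breaking the XML up into table columns while also keeping the original fragment, might look like the following sketch. It uses Python's standard sqlite3 and xml.etree.ElementTree modules; the table layout, column names and sample orders are all assumptions made for illustration, not a recommended mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<partorders>
  <order ref="x23" date="25aug1999"><part number="23-a12"/><quantity units="gross">12</quantity></order>
  <order ref="x24" date="26aug1999"><part number="24-b07"/><quantity units="each">5</quantity></order>
</partorders>"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (ref TEXT PRIMARY KEY, order_date TEXT, "
    "part_number TEXT, quantity INTEGER, units TEXT, raw_xml TEXT)"
)

for order in ET.fromstring(DOC).findall("order"):
    quantity = order.find("quantity")
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
        (
            order.get("ref"),
            order.get("date"),
            order.find("part").get("number"),
            int(quantity.text),                    # 'atomic' value, typed for SQL
            quantity.get("units"),
            ET.tostring(order, encoding="unicode"),  # whole fragment, for XML tools
        ),
    )

# Standard SQL over the broken-up columns.
rows = conn.execute(
    "SELECT ref, quantity FROM orders WHERE units = ? ORDER BY ref", ("gross",)
).fetchall()
print(rows)
```

Storing both forms trades disk space for flexibility: SQL queries hit the columns, while XSLT or other XML tools can be fed the stored fragment.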

  31. XML and databases • Upcoming technologies • XML Query – a query language for querying XML datasets (and databases) • Uses XML schema for type casting, and validation • Info: http://www.w3.org/XML/Query • Useful XML Database references • http://www.xml.com/pub/a/2001/10/31/nativexmldb.html Introductory article • http://www.rpbourret.com/xml/XMLAndDatabases.htm XML and databases • http://www.rpbourret.com/xml/XMLDatabaseProds.htm Products list • http://www.xmldb.org/resources.html Docs / resource list

  32. Presentation Outline • What is XML (basic introduction) • Defining language dialects and constraints • DTDs, namespaces, and schemas • XML processing • Parsers and parser interfaces; XML processing tools • XML databases • High-level issues, and references • XML messaging / web services • Why, and some issues/example • Conclusions

  33. XML Messaging • Use XML as the format for sending messages between systems • Advantages: • Common syntax; self-describing (easier to parse) • Can use common/existing transport mechanisms to “move” the XML data (HTTP, HTTPS, SMTP (email), MQ, IIOP (CORBA), JMS, ….) • Requirements • Shared understanding of dialects for transport (requires a registry [namespace!] for identifying dialects) • Shared acceptance of messaging contract • Disadvantages • Asynchronous transport; no guarantee of delivery, no guarantee that partner (external) shares acceptance of contract. • Messages will be much larger than binary (10x or more) [can compress]

  34. Common messaging model • XML over HTTP • Use HTTP to transport XML messages
     POST /path/to/interface.pl HTTP/1.1
     Referer: http://www.foo.org/myClient.html
     User-agent: db-server-olk
     Accept-encoding: gzip
     Accept-charset: iso-8859-1, utf-8, ucs
     Content-type: application/xml; charset=utf-8
     Content-length: 13221
     . . .
     <?xml version="1.0" encoding="utf-8" ?>
     <message> . . . Markup in message . . . </message>
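
A request like the one above can be assembled with Python's standard urllib.request. This sketch only builds the request object and does not send it; the full URL is an assumption (the host is borrowed from the slide's Referer header, and nothing is expected to answer there), and the message body is a placeholder.

```python
import urllib.request

# Placeholder message body, encoded as declared in its XML declaration.
body = '<?xml version="1.0" encoding="utf-8"?><message>hello</message>'.encode("utf-8")

request = urllib.request.Request(
    "http://www.foo.org/path/to/interface.pl",  # hypothetical endpoint
    data=body,
    headers={
        "Content-Type": "application/xml; charset=utf-8",
        "Accept-Charset": "iso-8859-1, utf-8",
    },
    method="POST",
)

# urllib.request.urlopen(request) would transmit it and add Content-Length;
# it is deliberately not called in this sketch.
print(request.get_method(), request.full_url)
```

The charset in the Content-Type header and the encoding in the XML declaration should agree, otherwise the receiver has two conflicting statements about how to decode the bytes.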

  35. Some standards for message format • Define dialects designed to “wrap” remote invocation messages • XML-RPC http://www.xmlrpc.com • Very simple way of encoding function/method call name, and passed parameters, in an XML message. • SOAP (Simple Object Access Protocol) http://www.soapware.org • More complex wrapper, which lets you specify schemas for interfaces; more complex rules for handling/proxying messages, etc. This is a core component of Microsoft’s .NET strategy, and is integrated into more recent versions of Websphere and other commercial packages. The W3C activity that sets the SOAP spec is outlined at: http://www.w3.org/2000/xp/Group/
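
To show how small an XML-RPC call is, this sketch uses Python's standard xmlrpc.client module to encode a method name and parameters as an XML message and decode them again, without any network traffic. The method name orders.place and its parameters are invented for illustration.

```python
import xmlrpc.client

# Encode a call: method name plus positional parameters become one XML message.
call = xmlrpc.client.dumps((23, "gross"), methodname="orders.place")
print(call)

# Decode it again, as a receiving server would.
params, method = xmlrpc.client.loads(call)
print(method, params)
```

The printed message wraps the method name in a methodName element and each parameter in a typed value element, which is essentially the whole wire format for a call.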

  36. XML Messaging + Processing • XML as a universal format for data exchange [Diagram: a Factory application with a SOAP API places an order (XML/EDI) using SOAP over HTTP with a Supplier's SOAP interface; the Supplier response (XML/EDI) comes back using SOAP over HTTP. Transport: HTTP(S), SMTP, other ...]

  37. Web “Services” Model • SOAP plus higher-level modeling for how services are ‘advertised’, ‘exposed’ and ‘found’ • Uses an XML dialect, WSDL (Web Services Description Language) to define a service • WSDL can use XML Schema to define how data is passed between a service provider and requestor • Uses an XML dialect, UDDI (Universal Description, Discovery and Integration) for • Describing services (high-level) • Discovering services (registry services, metadata) • UDDI defined using XML Schema • Core technology for application integration • Microsoft .NET • IBM Websphere • Oracle • …. Many others

  38. Web Services Code Development [Diagram: an automated code generator takes WSDL and XML schema and produces a WS/SOAP proxy for the client code and a WS/SOAP skeleton for the middle tier; proxy and skeleton exchange SOAP requests/responses. “Write the Application!” covers the middle-tier code (validation, business logic, routing, logging, more…) and the code adapters to the Product System and MECH]

  39. Presentation Outline • What is XML (basic introduction) • Defining language dialects and constraints • DTDs, namespaces, and schemas • XML processing • Parsers and parser interfaces; XML processing tools • XML databases • High-level issues, and references • XML messaging / web services • Why, and some issues/example • Conclusions

  40. XML (and related) Specifications [Diagram: map of specifications, marked by status (W3C rec, W3C draft, industry std, ‘open’ std) and grouped by area (XML Core, APIs, Style, Protocols, Web Services, Application areas). Named: XML 1.0, XML names, Infoset, XML base, Canonical, Xfragment, Xpath, Xpointer, Xlink, XML signature, XML schema, XML query, SAX 1, SAX 2, DOM 1, DOM 2, DOM 3, JDOM, JAXP, XSL, XSLT, CSS 1, CSS 2, CSS 3, SOAP, XML-RPC, UDDI, WSDL, XHTML 1.0, XHTML basic, XHTML events, Modularized XHTML, Xforms, MathML, SMIL 1 & 2, SVG, RDF, FinXML, FpML, IFX, Biztalk, ebXML, dirXML, WDDX, XMI, and 100's more ....]

  41. XML 101: A Technical Introduction to XML The End. Ian GRAHAM IT Strategy, IBS, Technology and Solutions, BMO Financial Group E: <ian.graham@bmo.com> T: (416) 513.5656 / F: (416) 513.5590
