Learn the fundamentals of XML, how it differs from HTML and databases, and its key principles. Explore XML document validation, related technologies, and use cases. Includes examples and comparisons.
This Morning's Road Map • System Documentation Overview • Query System Diagram Review • Query Front End Components • How the Front End Works • Query Front End UI File Types • Query Module File • Introduction to XML • Review Sample Query Module XML
Introduction to XML • What XML is and isn't, and how it compares to HTML and relational databases • XML Basics (rules and a simple example) • More than basics (related technologies currently used for the IBIS User Interface, such as namespaces, XSLT, XPath, and XInclude) • XML document validation (DTDs, XSDs) • Other XML technology to be aware of • Resources
What is XML? • XML stands for eXtensible Markup Language • XML is a markup language much like HTML (data goes between tags, for example <tag>my data</tag>) • XML is a markup meta-language: a framework for defining other markup languages • XML was designed to describe data (not how to present it) • XML tags are NOT predefined. You must define your own tags • In some cases it is a replacement for EDI
What XML Is Not • Able to do anything on its own. It is just plain text with tags that enclose the data so it can be differentiated from other data. • Able to define how your data is to be shown. To show data, you need other techniques. • Able to easily deal with binary data like GIFs, JPEGs, old MS-Word documents, MS-Excel documents, etc. • A replacement for HTML • A replacement for relational databases • A Silver Bullet
Compared with HTML • HTML is for the formatting and presentation of data. XML is for the storage/structuring of data. • HTML has a fixed set of elements, each defined for a specific function. XML has a set of user-defined elements that depend on the application(s) that use the data. • HTML can be an XML document (called XHTML), but XML is not necessarily an HTML document. • HTML is typically very difficult to parse in order to share its contained data with an application. • XML provides a standard way of sharing data between disparate systems.
Compared with Relational Databases • Relational databases store lists of similarly structured data. They are used to efficiently store, quickly search/retrieve, and join together different lists of data. Example: A user wants to see everyone who lives in Utah with a zip code of 84108 and drives a blue car. A list of Utah citizens could be joined with the DMV list of cars to quickly produce such a list. • XML typically stores nested hierarchical structures. These nested structures can completely contain all the data for a given item. Example: The same XML document structure can be used to store data about a motorcycle, car, truck, tractor, or tricycle.
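To make the nesting point concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The VEHICLES, VEHICLE, COLOR, WHEELS, and CARGO element names and all of the values are invented for this illustration; they are not part of IBIS. It shows one document structure holding two quite different vehicles, each carrying all of its own data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and values are invented for illustration.
VEHICLES_XML = """<?xml version="1.0"?>
<VEHICLES>
    <VEHICLE type="motorcycle">
        <COLOR>red</COLOR>
        <WHEELS count="2"/>
    </VEHICLE>
    <VEHICLE type="truck">
        <COLOR>blue</COLOR>
        <WHEELS count="6"/>
        <CARGO>
            <CAPACITY units="tons">5</CAPACITY>
        </CARGO>
    </VEHICLE>
</VEHICLES>
"""

root = ET.fromstring(VEHICLES_XML)

summary = {}
for vehicle in root.findall("VEHICLE"):
    # Each VEHICLE contains all of its own data, including optional nested parts.
    summary[vehicle.get("type")] = {
        "color": vehicle.findtext("COLOR"),
        "wheels": int(vehicle.find("WHEELS").get("count")),
        "has_cargo": vehicle.find("CARGO") is not None,
    }

print(summary)
```

A relational design would more likely split this across several tables and join them at query time; the XML keeps each item's nested parts together.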
Summary of XML • Allows for user defined tags and structure which includes hierarchical nesting of related but different data elements. • Each XML language is targeted for its own application domain that knows how to use it. • Provides a mechanism for sharable, platform independent files. • Has free tools and libraries that allow for the creation, usage and adoption of XML data. • Is NOT anything special – not a silver bullet.
XML Basics • Should begin with an XML declaration (prolog); XML 1.0 treats it as optional, but including it is good practice • Must have exactly one root element • Every element must have an opening tag and a matching closing end-tag • Blank/empty elements can be coded in a single tag syntax. Example: <MY_TAG_NAME/> • XML element tags are case sensitive • All elements must be properly nested • All attribute values must be quoted • HTML-style comments are supported. Example: <!-- a comment block -->
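The rules above can be checked mechanically. The following is a small sketch, assuming Python's standard xml.etree.ElementTree parser as the checker; the helper name well_formed_error and the sample strings are made up for this example. Each bad sample breaks one rule from the slide.

```python
import xml.etree.ElementTree as ET


def well_formed_error(text):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)


# Hypothetical samples; each bad one breaks a single rule from the list above.
samples = {
    "ok": "<note><to>Jack</to><empty/></note>",
    "case mismatch": "<note><To>Jack</to></note>",
    "bad nesting": "<note><to><b>Jack</to></b></note>",
    "unquoted attribute": "<note id=1/>",
    "two roots": "<note/><note/>",
}

for label, text in samples.items():
    print(label, "->", well_formed_error(text) or "well formed")
```

The exact error wording comes from the underlying parser and may differ between versions, so the sketch only relies on whether an error was raised.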
More Basics & HTML Gotchas • Empty HTML tag elements need to be updated to have a " /" on the tag's end (the extra space is for Netscape compatibility). Some of these are: <br />, <hr />, <img /> and <input />. • XML allows only three characters below the space character " " (0x20): tab, line feed, and carriage return (0x9, 0xA, 0xD). • The "<" and "&" characters are NOT allowed within an element's text, and ">" is best escaped as well. Escape them as &lt;, &gt;, and &amp;. • &nbsp; is undefined in XML, but &#160; / &#xA0; will produce a non-breaking space character. • A special CDATA section can hold text that the parser does not interpret as markup (characters below the space character are still restricted as described above). Example: <![CDATA[anything < can go inside here]]>
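The escaping rules above can be tried out with a short sketch, assuming Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree). The code element name and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && b > c"

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;.
escaped = escape(raw)
print(escaped)

# The parser turns the entities back into the original characters.
elem = ET.fromstring("<code>" + escaped + "</code>")
print(elem.text)

# Inside a CDATA section, "<" needs no escaping.
cdata = ET.fromstring("<code><![CDATA[anything < can go inside here]]></code>")
print(cdata.text)

# A numeric character reference gives a non-breaking space (U+00A0).
nbsp = ET.fromstring("<p>a&#160;b</p>")
print(repr(nbsp.text))

# &nbsp; is an HTML entity; plain XML does not define it, so parsing fails.
try:
    ET.fromstring("<p>a&nbsp;b</p>")
    nbsp_entity_ok = True
except ET.ParseError:
    nbsp_entity_ok = False
print("&nbsp; accepted:", nbsp_entity_ok)
```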
Sample XML
<?xml version="1.0"?>
<note>
    <to alias='Jack Trip'>Jack</to>
    <from>Jill</from>
    <heading>Reminder</heading>
    <body>Don't forget to bring your pail. We are going to have a good time hiking up to the well and fetching the water for my goldfish Ralph. </body>
</note>
Same Sample XML, Showing That Indentation Does NOT Matter for Validity but Does for Readability • Please use the [Tab] key and keep your code consistently indented. <?xml version="1.0"?><note><to alias='Jack Trip'>Jack</to><from>Jill</from><heading>Reminder</heading><body>Don't forget to bring your pail. We are going to have a good time hiking up to the well and fetching the water for my goldfish Ralph. </body></note>
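As a quick check of the indentation point, here is a sketch assuming Python's standard xml.etree.ElementTree. It parses an indented and a single-line copy of the note (the body text is shortened here to keep the example small) and compares the elements, attributes, and trimmed text. The outline helper is a name made up for this example.

```python
import xml.etree.ElementTree as ET

INDENTED = """<?xml version="1.0"?>
<note>
    <to alias='Jack Trip'>Jack</to>
    <from>Jill</from>
    <heading>Reminder</heading>
    <body>Don't forget to bring your pail.</body>
</note>
"""

COMPACT = (
    "<?xml version=\"1.0\"?><note><to alias='Jack Trip'>Jack</to>"
    "<from>Jill</from><heading>Reminder</heading>"
    "<body>Don't forget to bring your pail.</body></note>"
)


def outline(xml_text):
    """List (tag, attributes, trimmed text) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, dict(el.attrib), (el.text or "").strip()) for el in root.iter()]


same_structure = outline(INDENTED) == outline(COMPACT)
print(same_structure)
print(outline(COMPACT)[1])
```

The only difference between the two parsed trees is the whitespace-only text between elements, which the helper trims away before comparing.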
Overview of Other XML Stuff Used for the IBIS User Interface • Namespaces • XPath • XSLT • XInclude
XML Namespaces • XML Namespaces provide a method to avoid element name conflicts. Most XML extension technologies use a namespace to ensure that their elements do not conflict with the elements of the documents that contain them. • A namespace is declared with an attribute that starts with "xmlns:". Note the "xi:" prefix, which is part of the tag. Example of syntax and usage: <?xml version="1.0"?> <xi:include href="../answers/yearAll.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
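The sketch below, assuming Python's standard xml.etree.ElementTree, shows how a namespace keeps two elements with the same local name apart. The MODULE wrapper and the plain include element are hypothetical and exist only to create the name clash.

```python
import xml.etree.ElementTree as ET

XI = "http://www.w3.org/2001/XInclude"

# Hypothetical wrapper document containing two different "include" elements.
doc = ET.fromstring(
    '<MODULE xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="../answers/yearAll.xml"/>'
    '<include>unrelated element with the same local name</include>'
    '</MODULE>'
)

# ElementTree expands the prefix into "{namespace-uri}localname".
tags = [child.tag for child in doc]
print(tags)

# Searching with a prefix map finds only the namespaced element.
namespaced = doc.findall("xi:include", {"xi": XI})
plain = doc.findall("include")
print(namespaced[0].get("href"), "|", plain[0].text)
```

The prefix itself ("xi") is only shorthand; what identifies the element is the namespace URI it is bound to.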
XPath • XPath is a declarative language for: • Addressing XML elements (used in XLink/XPointer and in XSLT) • Accessing XML elements (pattern matching used in XSLT and in XQuery) • Location paths are evaluated left-to-right and resemble operating system directory paths • Each node resulting from evaluation of one step is used as context for evaluation of the next • Example of selecting a module's count measure element (selects every MEASURE element whose NAME is 'count'): /QUERY/MODULE/MEASURES/MEASURE[NAME='count']
XSL • XSL stands for eXtensible Stylesheet Language • XSL has nothing to do with HTML's CSS • XSL consists of three parts: • XSLT - language for transforming XML documents • XPath - mechanism to access XML document elements • XSL-FO - language for formatting XML documents for printing (how XML is turned into PDF files). • Think of XSL as a set of languages that can transform XML into other XML structures, HTML, or XHTML; filter and sort XML data; format XML data based on the data value; and output XML data to different media, like web pages, paper, or voice.
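Here is a minimal sketch of the location path above, assuming Python's standard xml.etree.ElementTree, which supports only a subset of XPath. The sample document, its TITLE elements, and the measure names other than 'count' are invented for illustration. Because ElementTree does not accept absolute paths on an element, the path is written relative to the QUERY root.

```python
import xml.etree.ElementTree as ET

# Hypothetical query module fragment; only the element nesting follows the slide.
QUERY_XML = """<QUERY>
    <MODULE>
        <MEASURES>
            <MEASURE><NAME>count</NAME><TITLE>Number of Events</TITLE></MEASURE>
            <MEASURE><NAME>rate</NAME><TITLE>Crude Rate</TITLE></MEASURE>
        </MEASURES>
    </MODULE>
</QUERY>"""

root = ET.fromstring(QUERY_XML)  # root is the QUERY element

# Equivalent of /QUERY/MODULE/MEASURES/MEASURE[NAME='count'], relative to QUERY.
# Each step narrows the context: MODULE children, then MEASURES, then MEASURE.
matches = root.findall("MODULE/MEASURES/MEASURE[NAME='count']")

for measure in matches:
    print(measure.findtext("NAME"), "->", measure.findtext("TITLE"))
```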
XSLT • XSLT is the mechanism that is used to merge the IBISQ User Interface XML data into HTML web pages. • XSLT works by using XPath that defines parts of the source document (nodes) that match one or more predefined templates. When a match is found, XSLT will transform the matching node(s) of the source document into the result document based on how the template rules are coded. • XSLT has XSL namespace functions that allow for basic string, element node, and node set data manipulation/operations.
XSLT Example: Module Overview Section Element Template
<xsl:template match="OVERVIEW/SECTION">
    <xsl:if test="string-length(TITLE) > 0">
        <div class="contentBlockTitle">
            <xsl:value-of select="TITLE"/>
        </div>
    </xsl:if>
    <xsl:copy-of select="TEXT/* | TEXT/text() | @*"/>
    <xsl:if test="position() !=$overviewSectionsCount">
        <br/>
    </xsl:if>
    <br/>
</xsl:template>
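To show what this template does step by step, here is a hand-written Python stand-in. It is not XSLT and not how IBISQ runs the transformation; it only mimics the template's rules with the standard xml.etree.ElementTree module. The sample OVERVIEW document and the render_section helper are invented, the copying of attributes (@*) is left out, and the position is assumed to be the section's 1-based index among its siblings.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical input; only the OVERVIEW/SECTION/TITLE/TEXT names follow the slide.
OVERVIEW_XML = """<OVERVIEW>
    <SECTION><TITLE>Purpose</TITLE><TEXT>Counts of <b>events</b> by year.</TEXT></SECTION>
    <SECTION><TITLE></TITLE><TEXT>No title here.</TEXT></SECTION>
</OVERVIEW>"""


def render_section(section, position, sections_count):
    """Mimic the OVERVIEW/SECTION template rules (a sketch, not real XSLT)."""
    out = []
    title = section.findtext("TITLE") or ""
    if len(title) > 0:  # xsl:if test="string-length(TITLE) > 0"
        out.append('<div class="contentBlockTitle">' + escape(title) + "</div>")
    text = section.find("TEXT")
    if text is not None:  # xsl:copy-of select="TEXT/* | TEXT/text()"
        out.append(escape(text.text or ""))
        for child in text:
            # tostring() includes the text that follows the child element.
            out.append(ET.tostring(child, encoding="unicode"))
    if position != sections_count:  # xsl:if test="position() != $overviewSectionsCount"
        out.append("<br/>")
    out.append("<br/>")
    return "".join(out)


sections = ET.fromstring(OVERVIEW_XML).findall("SECTION")
rendered = [
    render_section(section, index, len(sections))
    for index, section in enumerate(sections, start=1)
]
for html in rendered:
    print(html)
```

Every section except the last gets an extra line break, and the title block is skipped when TITLE is empty, which matches the two xsl:if tests in the template.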
XInclude • XInclude is a mechanism that enables XML elements to be split out into separate XML data files and merged in at the time the file is parsed. This allows the XML to be more modular: common XML elements are stored in one file and shared among many module files instead of being replicated. • Currently, XInclude is not well supported in the open source Java XML parsers. This feature has been implemented as an extra XSLT step in the IBISQ system and is only set up to work for query modules. Note that recursive XIncludes are supported. • Usage Example: <xi:include href="sub_filename.xml" xmlns:xi="http://www.w3.org/2003/XInclude"/>
XML Document Validation • A "Well Formed" XML document is a document that conforms to the XML syntax rules outlined in the XML Basics slide. • A "Valid" XML document is a "Well Formed" XML document that also conforms to the rules of a Document Type Definition (DTD) or an XML Schema (XSD). • IBISQ User Interface XML Modules have XSDs, but these are not used because the way XInclude has to be implemented precludes their usage.
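The merge step can be sketched with Python's standard xml.etree.ElementInclude module. This is only an illustration of the idea, not the XSLT-based step used in IBISQ. The file name, the YEARS content, the MODULE wrapper, and the in-memory loader are all hypothetical. Python's ElementInclude recognizes the http://www.w3.org/2001/XInclude namespace, so the sketch uses that URI.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files"; names and content are invented for illustration.
FILES = {
    "yearAll.xml": "<YEARS><YEAR>2004</YEAR><YEAR>2005</YEAR></YEARS>",
}


def memory_loader(href, parse, encoding=None):
    """Return the content for href from FILES instead of reading from disk."""
    if parse != "xml":
        return FILES[href]  # text includes are returned as plain strings
    return ET.fromstring(FILES[href])


module = ET.fromstring(
    '<MODULE xmlns:xi="http://www.w3.org/2001/XInclude">'
    "<NAME>demo</NAME>"
    '<xi:include href="yearAll.xml"/>'
    "</MODULE>"
)

# Replace each xi:include child with the root element of the referenced file.
ElementInclude.include(module, loader=memory_loader)

child_tags = [child.tag for child in module]
print(child_tags)
print(module.findtext("YEARS/YEAR"))
```

Keeping the shared YEARS list in one place means a change to it reaches every module that includes it.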
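The difference between "Well Formed" and "Valid" can be sketched as follows, assuming Python's standard xml.etree.ElementTree. The standard library does not validate against DTDs or XSDs, so the REQUIRED_CHILDREN rule below is a hand-written stand-in for a schema, invented for this example, that says a note must contain to, from, heading, and body in that order.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule standing in for a DTD/XSD: required children of <note>, in order.
REQUIRED_CHILDREN = ["to", "from", "heading", "body"]


def is_well_formed(text):
    """True if the text obeys the XML syntax rules (it parses)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid_note(text):
    """True if the text is well formed AND matches the hand-written rule above."""
    if not is_well_formed(text):
        return False
    root = ET.fromstring(text)
    return root.tag == "note" and [child.tag for child in root] == REQUIRED_CHILDREN


complete = "<note><to>Jack</to><from>Jill</from><heading>Reminder</heading><body>Pail</body></note>"
incomplete = "<note><to>Jack</to></note>"
broken = "<note><to>Jack</note>"

for label, text in [("complete", complete), ("incomplete", incomplete), ("broken", broken)]:
    print(label, is_well_formed(text), is_valid_note(text))
```

The incomplete note is well formed but not valid, while the broken one fails both checks, since validity presupposes well-formedness.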
XML Buzzwords • The technologies listed below have been standards for a few years now, but they are not yet completely implemented in most parsers/products or are not ready for prime time. • XLink • generalization of the HTML link concept but with a higher abstraction level • more expressive power (multiple destinations, special behaviors, linkbases, ...) • XPointer - an extension of XPath • XQuery - SQL-like database queries • XQL - XML Query Language
XML Resources • W3Schools is a good all-around website that provides tutorials, examples, and reference material for most web-related technologies http://www.w3schools.com/ • Microsoft's MSDN http://msdn.microsoft.com/library/default.asp?url=/library/en-us/adschema/adschema/c_group.asp • A good XML technologies overview can be found at "The XML Revolution Technologies for the future Web" http://www.brics.dk/~amoeller/XML/