Learn the basics of XML as an extensible markup language and the advantages it offers for data interchange. Explore how XML is utilized for web syndication and transforming XML into HTML using XSL stylesheets.
XML and XSL Overview by Alex Chaffee alex@jguru.com, http://www.purpletech.com/ Purple Technology: Open source development jGuru: Java online resource FAQs and News and other cool stuff
XML • eXtensible Markup Language • Pitched as a replacement for HTML (in practice it complements HTML rather than replacing it) • Metalanguage - used to create other languages • Has become a universal data-exchange format
Advantages of XML • Human-readable • Machine-readable (easy to parse) • Standard format for data interchange • Possible to validate • Extensible • can represent any data • can add new tags for new data formats • Hierarchical structure (nesting)
Why not HTML? • Browsers are too lenient • Led to sloppy HTML code all over the Web • <imG src="foo.gif> is "legal" HTML • Told HTML, "go to your room and don't come out until it's clean" • Out came XML
XML Searching and Agents • An early motivation for XML • Allows detailed queries of disparate data sources • Find best price for certain product • Search for properties with different real estate brokers • HTML insufficient • Good for humans, bad for computers • Doesn't scale
XML Example <?xml version="1.0"?> <!DOCTYPE menu SYSTEM "menu.dtd"> <menu> <meal name="breakfast"> <food>Scrambled Eggs</food> <food>Hash Browns</food> <drink>Orange Juice</drink> </meal> </menu>
XML Languages • MML - musical scores • CML - chemicals • HRMML - Human Resource Management (???) • MathML - equations • RSS - web syndication
Tag vs. Element • A tag is a name, enclosed by angle brackets, with optional attributes • <foo id="123"> • An element is a tree, containing an open tag, contents, and a close tag • <foo id="123">This is <bar>an element</bar></foo>
XML Syntax • Tags properly nested • Tag names case-sensitive • All tags must be closed • or self-closing • <foo/> is the same as <foo></foo> • Attributes enclosed in quotes • Document consists of a single (root) element • A few other details
Well-Formed vs. Valid • Well-Formed: • Structure follows XML syntax rules • Valid: • Structure conforms to a DTD
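A minimal sketch of the well-formedness half of this distinction, assuming the JDK's built-in DOM parser; the class and method names are illustrative, not from the slides. A non-validating parse succeeds only if the syntax rules hold, so catching the parse error is enough to tell the two cases apart.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the text parses as well-formed XML (no DTD validation is done). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // some XML syntax rule was broken
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        // Properly nested, quoted, and closed.
        System.out.println(isWellFormed("<foo id=\"123\">This is <bar>an element</bar></foo>"));
        // The sloppy HTML tag from the earlier slide: unclosed quote, unclosed tag.
        System.out.println(isWellFormed("<imG src=\"foo.gif>"));
        // Tags closed in the wrong order.
        System.out.println(isWellFormed("<a><b></a></b>"));
    }
}
```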
DTD • Document Type Definition • A grammar for XML documents • Defines • which elements can contain which other elements • which attributes are allowed or required on which elements
DTD and Data Exchange • Both sides must agree on DTD ahead of time • DTD can be part of document or stored separately
DTD Example <?xml encoding="US-ASCII"?> <!ELEMENT menu (meal)*> <!ATTLIST menu name CDATA #IMPLIED> <!ELEMENT meal (food|drink)*> <!ATTLIST meal name CDATA #REQUIRED> <!ELEMENT food (#PCDATA)*> <!ELEMENT drink (#PCDATA)*>
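One way to see the menu DTD at work is to validate a few documents against it. The sketch below assumes the JDK's validating DOM parser and embeds the DTD as an internal subset so that no menu.dtd file is needed; it uses #IMPLIED, the DTD keyword for an optional attribute. The class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidation {
    // The menu grammar, carried inside the document as an internal DTD subset.
    static final String DTD =
        "<!DOCTYPE menu [\n"
      + "<!ELEMENT menu (meal)*>\n"
      + "<!ATTLIST menu name CDATA #IMPLIED>\n"
      + "<!ELEMENT meal (food|drink)*>\n"
      + "<!ATTLIST meal name CDATA #REQUIRED>\n"
      + "<!ELEMENT food (#PCDATA)*>\n"
      + "<!ELEMENT drink (#PCDATA)*>\n"
      + "]>\n";

    /** Returns true if the body is well-formed AND conforms to the menu DTD. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            final boolean[] valid = {true};
            builder.setErrorHandler(new DefaultHandler() {
                // Validity violations arrive as recoverable errors, not exceptions.
                @Override public void error(SAXParseException e) { valid[0] = false; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return valid[0];
        } catch (SAXException e) {
            return false; // not even well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<menu><meal name='breakfast'><food>Scrambled Eggs</food></meal></menu>"));
        System.out.println(isValid("<menu><meal><food>Scrambled Eggs</food></meal></menu>")); // meal lacks name
        System.out.println(isValid("<menu><food>Scrambled Eggs</food></menu>"));              // food outside meal
    }
}
```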
Why isn't a DTD in XML? • It can be: W3C XML Schema (XSD) describes the same kind of grammar in XML syntax
XML Namespaces • A single document can use multiple DTDs • But! Two DTDs can use the same element name with different rules • Solution: Namespaces • Must prefix tag name with a namespace prefix, which an xmlns:prefix attribute binds to a namespace URI • e.g. <xsl:apply-templates select="."/>
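A small sketch of the name clash that namespaces resolve, assuming a namespace-aware JDK DOM parser. The two urn:example URIs, the sample document, and the class name are made up for illustration: both vocabularies define a title element, and the namespace URI tells them apart.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Two hypothetical vocabularies that both define <title>.
    static final String XML =
        "<doc xmlns:book='urn:example:book' xmlns:person='urn:example:person'>"
      + "<book:title>XML and XSL</book:title>"
      + "<person:title>Dr.</person:title>"
      + "</doc>";

    /** Returns the text of the first title element in the given namespace. */
    static String titleIn(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            // Lookup is by (namespace URI, local name); the prefix itself is irrelevant.
            return doc.getElementsByTagNameNS(namespaceUri, "title").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleIn("urn:example:book"));
        System.out.println(titleIn("urn:example:person"));
    }
}
```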
Entities • Macros / constants • Values defined once, used in document <!DOCTYPE foo SYSTEM "foo.dtd" [ <!ENTITY background "#99FFFF"> ]> <BODY BGCOLOR="&background;">
SML / Minimal XML • Simplified Markup Language • Subset of XML, but stripped down • Easier to understand, parse • No • DTDs • Attributes • Processing instructions • etc.
XSL • The eXtensible Stylesheet Language • Transforms XML into HTML • Actually, transforms XML into a tree, then turns that tree into another tree, then outputs that tree as XML
XSL Architecture • XML Source + XSL Stylesheet -> XSL Processor -> HTML Output
XML is a Tree <?xml version="1.0"?> <!DOCTYPE menu SYSTEM "menu.dtd"> <menu> <meal name="breakfast"> <food>Scrambled Eggs</food> <food>Hash Browns</food> <drink>Orange Juice</drink> </meal> <meal name="snack"> <food>Chips</food> </meal> </menu> [Figure: the same document drawn as a tree. menu is the root with two meal children; the first meal has a name attribute ("breakfast") and food, food, drink children holding "Scrambled Eggs", "Hash Browns", "Orange Juice"]
XML Is A Tree • Nodes • Branch nodes contain children • Leaf nodes contain content • Attributes, Values, Entities, etc. • DOM provides API-based access to tree models • XSL turns one tree into a different tree
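To make the tree view concrete, here is a rough sketch that walks a DOM tree and prints branch nodes, attributes, and leaf text as an indented outline. It assumes the JDK's DOM parser; the class name and the outline format are illustrative choices.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeOutline {
    /** Renders a node and its descendants, one line per element or non-blank text node. */
    static String outline(Node node, String indent) {
        StringBuilder sb = new StringBuilder();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            sb.append(indent).append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                sb.append(" @").append(attr.getNodeName()).append('=').append(attr.getNodeValue());
            }
            sb.append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE && !node.getNodeValue().trim().isEmpty()) {
            sb.append(indent).append('"').append(node.getNodeValue().trim()).append("\"\n");
        }
        // Branch nodes recurse into their children; leaf nodes have none.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            sb.append(outline(child, indent + "  "));
        }
        return sb.toString();
    }

    static String outlineOf(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return outline(root, "");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outlineOf(
            "<menu><meal name='breakfast'><food>Scrambled Eggs</food>"
          + "<drink>Orange Juice</drink></meal></menu>"));
    }
}
```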
Command Line Invocation • Apache Xalan java org.apache.xalan.xslt.Process -IN faq.xml -XSL faq.xsl -OUT faq.html • IBM LotusXSL java com.lotus.xsl.xml4j.ProcessXSL -in servletfaq.xml -xsl faq.xsl -out faq.html • And so on…
Formatting Objects • Forget about it for now
XSLT • The meat of XSL • Syntax for making XSL template files • Pattern matching • Output formatting • Rules-based (like Prolog)
XPath • The stuff inside the quotes in XSL patterns • "/person/name/firstname" • A sensible way to locate content in an XML document • More straightforward than walking a DOM tree or waiting for a SAX callback
XPath Syntax • book/title • title child of book child of current node • /book/title • title child of book child of document root • @language • language attribute of current node • chapter/@language • language attribute of chapter child of current node
XPath Syntax (cont.) • chapter[3]/para • all the para children of the third chapter • book/*/title • all title children of all children of book (but not of their children) • chapter//para • all para descendants of chapter, at any depth • ../../title • title child of parent of parent • long form: parent::node()/parent::node()/child::title
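The path expressions above can be tried out with the JDK's javax.xml.xpath API. This is only a sketch: the sample book document, the class name, and the two helper methods are invented for illustration, and every expression is evaluated from the document root.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSyntaxDemo {
    // Hypothetical sample document: a book with three chapters.
    static final String XML =
        "<book><title>XML Basics</title>"
      + "<chapter language='en'><title>One</title><para>a</para></chapter>"
      + "<chapter language='fr'><title>Two</title><para>b</para>"
      + "<section><para>c</para></section></chapter>"
      + "<chapter language='en'><title>Three</title><para>d</para><para>e</para></chapter>"
      + "</book>";

    /** String value of the first node the expression selects. */
    static String text(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of nodes the expression selects. */
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + expr + ")",
                new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text("/book/title"));                       // title child of the root book
        System.out.println(text("/book/chapter[2]/@language"));        // attribute of the second chapter
        System.out.println(count("/book/chapter[3]/para"));            // para children of the third chapter
        System.out.println(count("/book/*/title"));                    // titles of book's children only
        System.out.println(count("/book/chapter[2]//para"));           // para descendants at any depth
        System.out.println(text("/book/chapter[1]/para/../../title")); // up two levels, then title
    }
}
```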
XPath Functions • para[1] or para[position()=1] • the first para child of the current node • para[last()] • the last para child of the current node • para[count(child::note)>0] • all paragraphs with one or more notes • id("abstract") • selects the element whose DTD-declared ID attribute is "abstract", like <para id="abstract"> • para[@type='secret'] or para[attribute::type='secret'] • selects all child nodes like <para type="secret">
XPath Functions (cont.) • para[not(title)] • selects all child paragraphs with no title elements • para[position() >= 2 and position() < last()] • selects all but the first and last paragraphs • para[lang("en")] • matches <para xml:lang="en-uk">…</para> • note[contains(., "alex")] • . is the current node; its string value includes all descendant text, so children's content is tested too, recursively • note[starts-with(., "hello")]
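The predicates on these two slides can be checked the same way. The sketch below again assumes the JDK's javax.xml.xpath API; the four-paragraph sample document and the helper are hypothetical, and the XPath strings use single quotes so they sit comfortably inside Java string literals.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathFunctionDemo {
    // Hypothetical sample: four paragraphs with a note, a title, and an xml:lang.
    static final String XML =
        "<doc>"
      + "<para type='secret'>first<note>hello alex</note></para>"
      + "<para xml:lang='en-uk'><title>T</title>second</para>"
      + "<para>third</para>"
      + "<para>fourth</para>"
      + "</doc>";

    /** Number of nodes the expression selects, evaluated from the document root. */
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + expr + ")",
                new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {
            "/doc/para[count(child::note)>0]",                       // paragraphs with a note
            "/doc/para[@type='secret']",                             // attribute test
            "/doc/para[not(title)]",                                 // paragraphs without a title
            "/doc/para[position() >= 2 and position() < last()]",    // all but first and last
            "/doc/para[lang('en')]",                                 // language test, matches en-uk
            "//note[contains(., 'alex')]",                           // substring of the node's text
            "//note[starts-with(., 'hello')]"                        // prefix of the node's text
        };
        for (String expr : exprs) {
            System.out.println(expr + " selects " + count(expr) + " node(s)");
        }
    }
}
```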
XPath Disadvantages • Not XML • Not hierarchical • New syntax rules • Weird mix of /,[],(),*,:,::,.,..,@ • New function set • Not Java • Concepts like "axis" not always clear
XSL Rules • XSL is a series of rules or templates • Each template matches a pattern, usually an element name • Templates can contain XSL commands mixed with literal output markup
XSL Commands: apply-templates • Main rule: apply-templates • looks for a template match • applies it • Usually the template calls apply-templates recursively on its children • If not, then processing stops at that node (but continues for its other siblings that matched this template)
Default Rule • For a leaf node, output its contents • For a branch node, apply templates (recursively) (including default rule)
Some XSL Commands • value-of • grabs raw value, good for text elements and attributes • if • executes conditionally • number • counts position of element in group • good for ordered list numbering, table of contents, etc.
XSL Example <?xml version="1.0"?> <!DOCTYPE xsl:stylesheet [ <!ENTITY background "#99FFFF"> ]> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/>
Example (cont.) <xsl:template match="menu"> <HTML> <HEAD> <TITLE>Menu: <xsl:value-of select="@name"/> </TITLE> </HEAD> <BODY BGCOLOR="&background;"> <H1> Menu <xsl:value-of select="@name"/> </H1> [Note: Can reuse contents, unlike CSS]
Example (cont.) <xsl:apply-templates /> </BODY> </HTML> </xsl:template>
Example (cont.) <xsl:template match="meal"> <H2><xsl:value-of select="@name"/></H2><br /> <UL> <xsl:apply-templates/> </UL> </xsl:template>
Example (cont.) <xsl:template match="food"> <LI><xsl:apply-templates/></LI> </xsl:template> <xsl:template match="drink"> <LI><xsl:apply-templates/></LI> </xsl:template> </xsl:stylesheet>
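The command-line processors shown earlier are not the only way to run a stylesheet; the JDK's javax.xml.transform API can do it in-process. The following is a trimmed-down sketch of the menu example, not the slides' exact stylesheet: it is written against the XSLT 1.0 Recommendation namespace (http://www.w3.org/1999/XSL/Transform), which is what the JDK processor expects, it drops the DOCTYPE and entity so no external files are needed, and the class and helper names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MenuTransform {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='menu'>"
      + "<HTML><BODY><H1>Menu</H1><xsl:apply-templates/></BODY></HTML>"
      + "</xsl:template>"
      + "<xsl:template match='meal'>"
      + "<H2><xsl:value-of select='@name'/></H2><UL><xsl:apply-templates/></UL>"
      + "</xsl:template>"
      + "<xsl:template match='food'><LI><xsl:apply-templates/></LI></xsl:template>"
      + "<xsl:template match='drink'><LI><xsl:apply-templates/></LI></xsl:template>"
      + "</xsl:stylesheet>";

    static final String XML =
        "<menu><meal name='breakfast'><food>Scrambled Eggs</food>"
      + "<food>Hash Browns</food><drink>Orange Juice</drink></meal></menu>";

    /** Applies a stylesheet to a source document and returns the serialized result. */
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

The food and drink templates call apply-templates on their text children, so the default rule for leaf nodes supplies the item text inside each LI.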
Outputting Attributes • From This: • <link> <name>Stinky</name> <url>http://www.stinky.com/</url></link> • We Want This: • <A href="http://www.stinky.com/">Stinky</A>
Outputting Attributes • The Hard Way: • <xsl:element name="A"> <xsl:attribute name="href"> <xsl:value-of select="url" /> </xsl:attribute><xsl:value-of select="name" /></xsl:element> • The Easy Way: • <A href="{url}"> <xsl:value-of select="name"/></A>
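As a quick check that the two forms are interchangeable, this sketch runs both against the Stinky link and compares the results. It assumes the JDK's built-in XSLT 1.0 processor and the final XSLT namespace; the class name and the wrap helper are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributeOutput {
    /** Wraps a template body for <link> in a minimal stylesheet. */
    static String wrap(String body) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
             + "<xsl:template match='link'>" + body + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // The hard way: build the element and its attribute with explicit instructions.
    static final String HARD = wrap(
        "<xsl:element name='A'><xsl:attribute name='href'><xsl:value-of select='url'/>"
      + "</xsl:attribute><xsl:value-of select='name'/></xsl:element>");

    // The easy way: {url} in an attribute of a literal element is evaluated as XPath.
    static final String EASY = wrap("<A href='{url}'><xsl:value-of select='name'/></A>");

    static final String XML = "<link><name>Stinky</name><url>http://www.stinky.com/</url></link>";

    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(HARD, XML));
        System.out.println(transform(EASY, XML));
    }
}
```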
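Copying Subtrees • <xsl:template match="*|@*|text()"> <xsl:copy> <xsl:apply-templates select="*|@*|text()"/> </xsl:copy></xsl:template> • No, I don't understand it either • Roughly: copy each element, attribute, and text node, then recurse into its attributes and children • Needed because the default (built-in) rules strip all tags/attributes and output only text • Also copy-of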
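The difference between the copy template and the default rules shows up clearly when both are run on the same input. This is a sketch assuming the JDK's XSLT 1.0 processor; the class name, the constants, and the sample input are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CopyDemo {
    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>";

    // The copy template: every element, attribute, and text node is copied through.
    static final String COPY = HEAD
      + "<xsl:template match='*|@*|text()'><xsl:copy>"
      + "<xsl:apply-templates select='*|@*|text()'/>"
      + "</xsl:copy></xsl:template></xsl:stylesheet>";

    // No templates at all: only the built-in default rules apply.
    static final String DEFAULTS_ONLY = HEAD + "</xsl:stylesheet>";

    static final String XML =
        "<menu name='today'><meal name='breakfast'><food>Scrambled Eggs</food></meal></menu>";

    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(COPY, XML));          // tags and attributes survive
        System.out.println(transform(DEFAULTS_ONLY, XML)); // only the text survives
    }
}
```

Adding a more specific template next to the copy rule (for example one matching drink) is the usual way to change one kind of node while passing everything else through untouched.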
XSL conditionals: if • <xsl:if test="author">by <xsl:apply-templates select="author" /></xsl:if> • Note: no else (?!?)
XSL Conditionals: choose • Case 1 • <link> <name>Stinky</name> <url>http://www.stinky.com/</url></link> • <a href="http://www.stinky.com/">Stinky</a> • Case 2 • <link> <url>http://www.stinky.com/</url></link> • <a href="http://www.stinky.com/">http://www.stinky.com/</a> • Case 3 • <link> <name>Stinky</name></link> • Stinky
XSL Conditionals: choose • <xsl:choose><xsl:when test="url"> <A href="{url}"> <xsl:choose><xsl:when test="name"><xsl:value-of select="name" /></xsl:when><xsl:otherwise><xsl:value-of select="url" /></xsl:otherwise></xsl:choose></A></xsl:when><xsl:otherwise><xsl:value-of select="name" /></xsl:otherwise></xsl:choose>
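The nested choose can be exercised against the three link cases. The sketch below assumes the JDK's XSLT 1.0 processor and wraps the choose in a template matching link; the class and helper names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChooseDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='link'>"
      + "<xsl:choose>"
      // A url exists: emit a hyperlink, labelled with the name if there is one.
      + "<xsl:when test='url'><A href='{url}'>"
      + "<xsl:choose>"
      + "<xsl:when test='name'><xsl:value-of select='name'/></xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='url'/></xsl:otherwise>"
      + "</xsl:choose></A></xsl:when>"
      // No url: fall back to the plain name.
      + "<xsl:otherwise><xsl:value-of select='name'/></xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template></xsl:stylesheet>";

    /** Renders one <link> document through the stylesheet. */
    static String render(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("<link><name>Stinky</name><url>http://www.stinky.com/</url></link>"));
        System.out.println(render("<link><url>http://www.stinky.com/</url></link>"));
        System.out.println(render("<link><name>Stinky</name></link>"));
    }
}
```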
XSL Looping: for-each • <xsl:for-each select="chapter"> <h2><xsl:value-of select="@title"/> </h2></xsl:for-each> • Functional overlap with apply-templates • Difference in programming style • Use it inside a given template rule
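To illustrate the overlap with apply-templates, this sketch writes the same chapter listing both ways and checks that the output is identical. It assumes the JDK's XSLT 1.0 processor; the two-chapter sample document, the wrapping div, and the class name are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ForEachDemo {
    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>";

    // Loop style: the per-chapter markup lives inside the book template.
    static final String FOR_EACH = HEAD
      + "<xsl:template match='book'><div>"
      + "<xsl:for-each select='chapter'><h2><xsl:value-of select='@title'/></h2></xsl:for-each>"
      + "</div></xsl:template></xsl:stylesheet>";

    // Rule style: the per-chapter markup lives in its own template.
    static final String APPLY = HEAD
      + "<xsl:template match='book'><div><xsl:apply-templates select='chapter'/></div></xsl:template>"
      + "<xsl:template match='chapter'><h2><xsl:value-of select='@title'/></h2></xsl:template>"
      + "</xsl:stylesheet>";

    static final String XML = "<book><chapter title='One'/><chapter title='Two'/></book>";

    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(FOR_EACH, XML));
        System.out.println(transform(APPLY, XML));
    }
}
```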
Template Modes • Same element name, different context -> different template, different output • Can invoke apply-templates with a mode, matches corresponding moded template • <h1>Table of Contents</h1><ol><xsl:apply-templates select="chapter" mode="toc"/></ol> • <xsl:template match="chapter" mode="toc"> <li><xsl:value-of select="@title"/></li></xsl:template> • <xsl:template match="chapter"> <h1><xsl:value-of select="@title"/></h1> <xsl:apply-templates/></xsl:template>
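A runnable sketch of the table-of-contents idea, assuming the JDK's XSLT 1.0 processor. The book template, the wrapping div, the sample document, and the class name are additions for illustration; the two chapter templates follow the slide, using the match attribute that xsl:template requires.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ModeDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Visit the chapters twice: once in toc mode, once in the default mode.
      + "<xsl:template match='book'><div>"
      + "<h1>Table of Contents</h1>"
      + "<ol><xsl:apply-templates select='chapter' mode='toc'/></ol>"
      + "<xsl:apply-templates select='chapter'/>"
      + "</div></xsl:template>"
      // toc mode: one list item per chapter.
      + "<xsl:template match='chapter' mode='toc'>"
      + "<li><xsl:value-of select='@title'/></li></xsl:template>"
      // default mode: heading followed by the chapter's content.
      + "<xsl:template match='chapter'>"
      + "<h1><xsl:value-of select='@title'/></h1><xsl:apply-templates/></xsl:template>"
      + "</xsl:stylesheet>";

    static final String XML =
        "<book><chapter title='One'><para>Intro</para></chapter>"
      + "<chapter title='Two'><para>Outro</para></chapter></book>";

    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```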