This guide introduces XML, providing a thorough overview from its structure and well-formedness to validation via DTDs. It covers essential encoding types and the importance of XML namespaces. Additionally, readers will learn about XML Schema for validation, various XML tools, and APIs like SAX and DOM for effective XML processing. With examples illustrating the transition from HTML to XML and stylesheets like CSS and XSL, this resource is ideal for both beginners and experienced developers looking to deepen their understanding of XML.
Introducing XML: Table of Contents
1. From HTML to XML
2. Well-Formed XML
3. Validity / DTDs
4. Encodings
5. XML Namespaces
6. XML Schema
7. XML Tools
8. XML APIs / SAX
9. XML APIs / DOM
10. Stylesheets : CSS & XSL
11. XML Query Language (XQL)
1. From HTML to XML

    <HTML>
    <HEAD><TITLE>Drei Sonaten und drei Partiten für Violine solo</TITLE></HEAD>
    <BODY>
    <H1>Drei Sonaten und drei Partiten für Violine solo</H1>
    <P>Publisher : Bärenreiter</P>
    <P>Composer : J. S. Bach</P>
    …

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE music SYSTEM "../DTDs/music.dtd">
    <music ismn="M-006-46489-0" type="concert">
    <title>Drei Sonaten und drei Partiten für Violine solo</title>
    <publisher>Bärenreiter</publisher>
    <composer>J. S. Bach</composer>
    ...
2. Well-Formed XML

An XML document is called well-formed if
1. It starts with the XML prolog, e.g. <?xml version="1.0"?> (the XML declaration is recommended but not strictly required by XML 1.0),
...
2. The tags are properly nested,
3. There is exactly one root element,
...

The XML spec : http://www.w3.org/TR/REC-xml
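The rules above can be checked mechanically. The following is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class and method names (WellFormedCheck, isWellFormed) are made up for illustration. It parses without consulting any DTD and treats a fatal parser error as "not well-formed".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class WellFormedCheck {
    /** Returns true if the text parses as well-formed XML; no DTD or schema is consulted. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // well-formedness violations surface as fatal parse errors
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<music><title>Sonata I</title></music>")); // true
        System.out.println(isWellFormed("<music><title>Sonata I</music></title>")); // false: bad nesting
        System.out.println(isWellFormed("<music/><music/>"));                       // false: two roots
    }
}
```

The first call shows that a document without an XML declaration still parses as well-formed.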
3. Validity / DTDs

An XML document is valid if it has an associated document type declaration and complies with the constraints expressed in it.

    <!-- DTD for Music -->
    <!ELEMENT music (title, publisher, composer?, opus?, remarks?, instruments?, pieces)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT publisher (#PCDATA)>
    <!ELEMENT composer (#PCDATA)>
    <!ATTLIST music ismn CDATA #IMPLIED
    ...

A Gentle Introduction to SGML : http://www-tei.uic.edu/orgs/tei/sgml/teip3sg/
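A validating parser reports violations of the DTD as recoverable errors. The sketch below assumes the JDK's JAXP parser and uses a shortened, hypothetical version of the music DTD, embedded as an internal subset so the example needs no external file. The names DtdValidation and validityErrors are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class DtdValidation {
    // Assumption: a reduced DTD (three child elements only), not the full music.dtd.
    static final String DOCTYPE =
        "<!DOCTYPE music ["
        + "<!ELEMENT music (title, publisher, composer?)>"
        + "<!ELEMENT title (#PCDATA)>"
        + "<!ELEMENT publisher (#PCDATA)>"
        + "<!ELEMENT composer (#PCDATA)>"
        + "<!ATTLIST music ismn CDATA #IMPLIED>"
        + "]>";

    /** Parses DOCTYPE + body with DTD validation on and returns the validity error messages. */
    static List<String> validityErrors(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enables DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors are recoverable
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = "<music ismn='M-006-46489-0'><title>Sonata I</title>"
            + "<publisher>B\u00e4renreiter</publisher></music>";
        String invalid = "<music><title>Sonata I</title></music>"; // publisher is missing
        System.out.println(validityErrors(valid).isEmpty());   // true
        System.out.println(validityErrors(invalid).isEmpty()); // false
    }
}
```

Note that the invalid document is still well-formed; only the validating pass complains.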
4. Encodings

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE music SYSTEM "../DTDs/music.dtd">
    <music ismn="M-006-46489-0" type="concert">
    <title>Drei Sonaten und drei Partiten für Violine solo</title>
    <publisher>Bärenreiter</publisher>
    …

• ISO-8859-1: Western European, one byte per character, superset of US-ASCII
• UTF-8: one to four bytes per character, superset of US-ASCII
• UTF-16: two or four bytes per character, byte order (endianness) must be known
• UTF-32: four bytes per character, byte order (endianness) must be known

Unicode Consortium Homepage: http://www.unicode.org/
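The size differences between these encodings are easy to measure. This is a small sketch using java.nio.charset; the class name EncodingSizes is invented, and it assumes the runtime ships a UTF-32BE charset (OpenJDK does, but it is not among the charsets every Java platform must support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class EncodingSizes {
    /** Number of bytes the text occupies in the given encoding. */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String publisher = "B\u00e4renreiter";         // "Bärenreiter", 11 characters
        Charset utf32be = Charset.forName("UTF-32BE"); // assumption: available on this JVM

        System.out.println(byteLength(publisher, StandardCharsets.ISO_8859_1)); // 11
        System.out.println(byteLength(publisher, StandardCharsets.UTF_8));      // 12 (ä takes 2 bytes)
        System.out.println(byteLength(publisher, StandardCharsets.UTF_16BE));   // 22
        System.out.println(byteLength(publisher, utf32be));                     // 44

        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
        String clef = "\uD834\uDD1E";
        System.out.println(byteLength(clef, StandardCharsets.UTF_8));    // 4
        System.out.println(byteLength(clef, StandardCharsets.UTF_16BE)); // 4 (surrogate pair)

        // Java's generic UTF-16 encoder prepends a byte order mark to signal endianness.
        System.out.println(byteLength("x", StandardCharsets.UTF_16));    // 4 (2 BOM + 2 data)
    }
}
```

The explicit big-endian charsets are used so that no byte order mark distorts the counts.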
5. XML Namespaces

XML namespaces provide a simple method for qualifying element and attribute names used in XML documents by associating them with namespaces identified by URI references.

    <library xmlns:m="http://www.somewhere.com/">
    <m:music><m:title>…
    <m:music><m:title>…
    ...
    </library>

W3C Rec., Namespaces in XML : http://www.w3.org/TR/REC-xml-names/
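What matters to a namespace-aware processor is the pair of namespace URI and local name, not the prefix. A minimal sketch with the JDK's DOM parser follows; NamespaceDemo and expandedName are hypothetical names, and the {uri}local rendering is just a common notation chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class NamespaceDemo {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Renders a node name as {namespace-URI}local-name. */
    static String expandedName(Node node) {
        return "{" + node.getNamespaceURI() + "}" + node.getLocalName();
    }

    public static void main(String[] args) {
        String withM = "<library xmlns:m='http://www.somewhere.com/'><m:music/></library>";
        String withX = "<library xmlns:x='http://www.somewhere.com/'><x:music/></library>";
        Node a = parse(withM).getDocumentElement().getFirstChild();
        Node b = parse(withX).getDocumentElement().getFirstChild();

        System.out.println(a.getNodeName());  // m:music
        System.out.println(expandedName(a));  // {http://www.somewhere.com/}music
        // Different prefixes, same URI: the two elements have the same expanded name.
        System.out.println(expandedName(a).equals(expandedName(b))); // true
    }
}
```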
6. XML Schema

    <?xml version='1.0'?>
    <schema name='music' version='1.0'>
      <elementType name='music'>
        <sequence>
          <elementTypeRef name='title'/>
          <elementTypeRef name='publisher'/>
          <elementTypeRef name='composer' minOccur="0" maxOccur="1"/>
          …
        <attrDecl name='ismn'>
          <datatypeRef name='string'/>
        </attrDecl>
        ...

XML Schema Part 1: Structures : http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes : http://www.w3.org/TR/xmlschema-2/
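The schema on this slide appears to follow an early draft syntax; the XML Schema Recommendation expresses the same constraints with xs:element, xs:sequence, and minOccurs/maxOccurs. The sketch below is one possible translation, written for illustration only, and validates against it with the JDK's javax.xml.validation API. The class name SchemaValidation and the reduced content model are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class SchemaValidation {
    // Assumption: a hand-written W3C XML Schema covering only title, publisher, composer, ismn.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='music'>"
        + "<xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='publisher' type='xs:string'/>"
        + "<xs:element name='composer' type='xs:string' minOccurs='0' maxOccurs='1'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='ismn' type='xs:string'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** Returns true if the document is valid against XSD. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself failed to compile", e);
        }
        try {
            // Without a custom error handler the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<music ismn='M-006-46489-0'><title>Sonata I</title><publisher>P</publisher></music>")); // true
        System.out.println(isValid("<music><title>Sonata I</title></music>")); // false: no publisher
    }
}
```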
7. XML Tools

XML Parsers
• expat, James Clark, C, http://www.jclark.com/xml/expat.html
• XP, James Clark, Java, http://www.jclark.com/xml/xp/
• IE5
• XJParser, DataChannel, http://xdev.datachannel.com/downloads/xjparser/

XML Editors
• Notepad
• XMetaL, SoftQuad, http://www.softquad.com/products/xmetal/index.html
• Adept, Arbortext, http://www.arbortext.com/Products/ADEPT_Series/adept_series.html
8. XML APIs / SAX

SAX 1.0: a free API for event-based XML parsing

    import org.xml.sax.Parser;
    import org.xml.sax.DocumentHandler;
    import org.xml.sax.helpers.ParserFactory;

    // Load the SAX driver by class name and register the event handler
    Parser parser = ParserFactory.makeParser("com.microstar.xml.SAXDriver");
    DocumentHandler handler = new MyHandler();
    parser.setDocumentHandler(handler);
    parser.parse("http://pcjhb.software-ag.de/xml/closedXml/instances/m1.xml");
    ...

    import org.xml.sax.HandlerBase;
    import org.xml.sax.AttributeList;

    // HandlerBase supplies empty defaults; override only the events of interest
    public class MyHandler extends HandlerBase {
        public void startElement (String name, AttributeList atts) {
            ...

SAX Homepage : http://www.megginson.com/SAX/
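The SAX 1.0 interfaces shown above (Parser, DocumentHandler, HandlerBase) were later deprecated in favor of SAX2. As a rough present-day counterpart, the following sketch uses the JDK's SAXParserFactory and DefaultHandler to record each start-tag event; the class name SaxElementNames is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of any library.
class SaxElementNames extends DefaultHandler {
    private final List<String> names = new ArrayList<>();

    /** Called by the parser once per start tag, in document order. */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);
    }

    /** Streams through the document and returns the element names as they were encountered. */
    static List<String> elementNames(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SaxElementNames handler = new SaxElementNames();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames(
            "<music><title>Sonata I</title><composer>J. S. Bach</composer></music>"));
        // [music, title, composer]
    }
}
```

Because the handler only reacts to events, no tree is built in memory, which is the main reason to pick SAX for large documents.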
9. XML APIs / DOM

A platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of XML and HTML documents.

    import org.w3c.dom.*;
    import com.docuverse.dom.*;

    DOM dom = new com.docuverse.dom.DOM();
    ...
    doc = dom.openDocument(url);
    BasicElement rootNode = (BasicElement) doc.getDocumentElement();
    NodeList list = rootNode.getChildNodes();
    System.out.println(Integer.toString(list.getLength()) + " Children");
    ...

DOM Level 1 Specification : http://www.w3.org/TR/REC-DOM-Level-1/
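The slide's code relies on a vendor-specific entry point (com.docuverse.dom). A comparable sketch using only the JDK's JAXP DocumentBuilder is shown below; it reads the child count of the root element and then updates the tree in place. DomDemo and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class DomDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of child nodes of the root element (whitespace text nodes would count too). */
    static int childCount(Document doc) {
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    /** Appends <composer>name</composer> to the root element. */
    static void addComposer(Document doc, String name) {
        Element composer = doc.createElement("composer");
        composer.setTextContent(name);
        doc.getDocumentElement().appendChild(composer);
    }

    public static void main(String[] args) {
        Document doc = parse("<music><title>Sonata I</title><publisher>P</publisher></music>");
        System.out.println(childCount(doc) + " Children"); // 2 Children
        addComposer(doc, "J. S. Bach");
        System.out.println(childCount(doc) + " Children"); // 3 Children
    }
}
```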
10. Stylesheets

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <?xml-stylesheet type="text/xsl" href="music.xsl"?>
    <!DOCTYPE music SYSTEM "music.dtd">
    <music ismn="M-006-46489-0" type="concert">
    …

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
    ...
    <xsl:template match="music">
    <xsl:apply-templates select="opus"/>
    ...

XSL spec : http://www.w3.org/TR/WD-xsl/
The W3C’s CSS Homepage : http://www.w3.org/Style/CSS/
DSSSL Page at Mulberry Technologies : http://www.mulberrytech.com/dsssl/index.html
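The stylesheet above uses the namespace of the XSL working draft. The sketch below instead assumes the XSLT 1.0 namespace (http://www.w3.org/1999/XSL/Transform), which is what the JDK's built-in javax.xml.transform processor understands. The stylesheet text, the class name XslDemo, and the output format are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of any library.
class XslDemo {
    // Assumption: a tiny XSLT 1.0 stylesheet that prints "title by composer" as plain text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='music'>"
        + "<xsl:value-of select='title'/>"
        + "<xsl:text> by </xsl:text>"
        + "<xsl:value-of select='composer'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies XSL to the given document and returns the result as a string. */
    static String render(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(
            "<music><title>Sonata I</title><composer>J. S. Bach</composer></music>"));
        // Sonata I by J. S. Bach
    }
}
```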
11. XML Query Language (XQL)

The XML Query Language (XQL) is a query language for XML using the structure of XML as its data model.

    <pieces>
    <piece>
    <title>Sonata I</title>
    <opus>BWV 1001</opus>
    <movements>
    <movement><title>Adagio</title></movement>
    <movement><title>Fuga Allegro</title></movement>
    ...

    //piece//movement[title="Adagio"]

A W3C Proposal : http://www.w3.org/TandS/QL/QL98/pp/xql.html
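The JDK does not include an XQL engine, but the query on this slide happens to be valid XPath 1.0 as well. As a stand-in, the following sketch evaluates it with javax.xml.xpath; the class name PathQueryDemo is invented, the sample document closes the elements the slide leaves open, and single quotes replace the double quotes only to keep the Java string readable.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class PathQueryDemo {
    static final String XML =
        "<pieces><piece>"
        + "<title>Sonata I</title>"
        + "<opus>BWV 1001</opus>"
        + "<movements>"
        + "<movement><title>Adagio</title></movement>"
        + "<movement><title>Fuga Allegro</title></movement>"
        + "</movements>"
        + "</piece></pieces>";

    /** Evaluates the path expression (as XPath 1.0) and returns how many nodes it selects. */
    static int countMatches(String xml, String query) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches(XML, "//piece//movement[title='Adagio']")); // 1
        System.out.println(countMatches(XML, "//piece//movement"));                 // 2
    }
}
```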