This document provides a comprehensive introduction to XML (Extensible Markup Language), detailing its significance as a universal format for structured documents on the web. It covers the anatomy of XML documents, including well-formedness, markup, elements, and attributes. The guide clarifies what XML is and what it is not, dispelling common misconceptions. Furthermore, it emphasizes XML's role in separating data from presentation, its extensibility, and its utility across various platforms. For more insights and applications, explore the provided resources.
Introduction to XML John Arnett, MSc Standards Modeller Information and Statistics Division NHSScotland Tel: 0131 551 8073 (x2073) mailto:John.Arnett@isd.csa.scot.nhs.uk http://isdscotland.org/xml
Contents • What is XML? • Anatomy of an XML Document • Conformance and Validation • Summary • Find Out More
What is XML? • XML is not… • a programming language • a software panacea • an object-oriented technology • HTML with funny tags • a replacement for HTML… but it is re-shaping publishing on the web
What is XML? • Stands for Extensible Markup Language • Meta-markup language derived from SGML (Standard Generalised Markup Language) • Open Standard, currently XML 1.0 2nd edition (W3C Recommendation 6 October 2000)
What is XML? • W3C says • "XML is the universal format for structured documents and data on the Web" • "A data object is an XML document if it is well-formed, as defined in [the W3C] specification" (more on this later)
What is XML? • Data Content and Presentation • Sample dataset (flat file, database, spreadsheet, etc.)
ID      SURNAME   FORENAME  SEX  DOB
134376  Jones     Ian       0    06011971
198457  McKenzie  Alison    1    23081983
111672  Martin    Lesley    0    12111979
147678  Jackson   Sarah     1    15061976
What is XML? • Record – data-oriented structure • 111672 Martin Lesley 0 12111979 • Structured • Searchable • Easy to understand • Portable
What is XML? • HTML – document-oriented structure • Easy to understand • Portable • Structured • Searchable
Record Id: 111672 | Surname: Martin | Given Name: Lesley | Sex: Male | Date of Birth: 12 November 1979
<h1>Record Id: <font color="red">111672</font></h1>
<table><colgroup><col align="left"></colgroup>
  <tr><th>Surname:</th><td>Martin</td></tr>
  <tr><th>Given Name:</th><td>Lesley</td></tr>
  <tr><th>Sex:</th><td>Male</td></tr>
  <tr><th>Date of Birth:</th><td>12 November 1979</td></tr>
</table>
What is XML? • XML to the rescue! • Easy to understand • Portable • Structured • Searchable
<Record recordId="111672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day><Month>11</Month><Year>1979</Year>
  </DateOfBirth>
</Record>
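The step from the flat record to the XML record can be sketched in a few lines of Python using the standard library. This is only an illustration: the function name `record_to_xml`, the DDMMYYYY reading of the DOB field and the sex-code mapping are assumptions inferred from the sample record, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Assumed coding: the sample record shows 0 rendered as "M"; 1 -> "F" is a guess.
SEX_CODES = {"0": "M", "1": "F"}


def record_to_xml(line: str) -> ET.Element:
    """Turn one whitespace-separated flat record into a <Record> element."""
    record_id, surname, forename, sex, dob = line.split()
    record = ET.Element("Record", recordId=record_id)
    ET.SubElement(record, "Surname").text = surname
    ET.SubElement(record, "GivenName").text = forename
    ET.SubElement(record, "Sex").text = SEX_CODES[sex]
    date = ET.SubElement(record, "DateOfBirth")
    # The DOB field is assumed to be laid out as DDMMYYYY.
    ET.SubElement(date, "Day").text = dob[0:2]
    ET.SubElement(date, "Month").text = dob[2:4]
    ET.SubElement(date, "Year").text = dob[4:8]
    return record


record = record_to_xml("111672 Martin Lesley 0 12111979")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Once the data is in this form, each field can be found by name rather than by position, which is what makes the XML version searchable and self-describing.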
What is XML? • HTML and XML are… • Text based • Open standards • Widely used
What is XML? • But XML also… • Structured • Separates data from presentation • Self-describing • Searchable • Extensible (i.e. any number of tags allowed)
Anatomy of an XML Document • XML documents consist of text • character data • tab, carriage return and line feed • Unicode characters • markup
Anatomy of an XML Document • Markup • start-, end- and empty element tags • tag names are case sensitive! • entity and character references • comments
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
Anatomy of an XML Document • Character data • Reserved characters • &, <, >, ' and "
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
Anatomy of an XML Document • Declaration • Optional first line of markup (but W3C recommended) • Tells the parser which XML version and character encoding to expect
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
Anatomy of an XML Document • Root Element • Single top-level element, exactly one per document • Contains all the data and links to other documents
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
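As a rough illustration of how a parser sees the Message document, the sketch below loads it with Python's standard `xml.etree.ElementTree` module and inspects the root element. The variable names are illustrative only; the document is passed as bytes so that the encoding named in the declaration is honoured.

```python
import xml.etree.ElementTree as ET

MESSAGE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>"""

# fromstring() returns the root element of the parsed document.
root = ET.fromstring(MESSAGE_XML)

# The default parser keeps elements and character data but drops comments.
child_tags = [child.tag for child in root]
body_text = root.findtext("MessageBody")

print(root.tag, child_tags, body_text)
```

The declaration and the comment are markup that guides or annotates the document; only the element structure and the character data end up in the parsed tree.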
Anatomy of an XML Document • Elements • Define the content of the XML document • May contain other elements, character data or can be empty
<Book>XML Bible
  <Price>24.99</Price>
  <img src="book.gif"/>
  <Author>E.R. Harold</Author>
  <Publisher>J. Forbes</Publisher>
</Book>
Anatomy of an XML Document • Attributes • Add data about the elements
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
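A minimal sketch of reading attributes, using Python's standard `xml.etree.ElementTree` on the catalogue example. Attribute values always arrive as strings, so the prices are converted explicitly; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>"""

catalog = ET.fromstring(CATALOG_XML)

# get() returns the attribute value as a string, or None if it is absent.
subject = catalog.get("Subject")

# Map each title to its price, converting the string value to a number.
prices = {book.get("Title"): float(book.get("Price"))
          for book in catalog.findall("Book")}
cheapest = min(prices, key=prices.get)

print(subject, cheapest)
```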
Anatomy of an XML Document • Handling reserved characters • Built-in entities: &amp; = &   &quot; = "   &lt; = <   &gt; = >   &apos; = ' • CDATA Sections
<CodeSnippet>
<![CDATA[if(this->getX() < 5 && values[0] >= 10) cerr << "out of range";]]>
</CodeSnippet>
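The two ways of handling reserved characters can be compared with a short Python sketch. It uses `escape` from the standard `xml.sax.saxutils` module, which replaces `&`, `<` and `>` with the built-in entities, and then checks that the escaped form and the CDATA form parse back to the same character data. The code string is just sample text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

code = 'if(this->getX() < 5 && values[0] >= 10) cerr << "out of range";'

# Option 1: replace the reserved characters with built-in entities.
escaped = escape(code)
via_entities = ET.fromstring(f"<CodeSnippet>{escaped}</CodeSnippet>")

# Option 2: wrap the text in a CDATA section and leave it untouched.
via_cdata = ET.fromstring(f"<CodeSnippet><![CDATA[{code}]]></CodeSnippet>")

print(escaped)
print(via_entities.text == via_cdata.text == code)
```

A CDATA section is usually easier to read for long runs of code, with the caveat that the text itself must not contain the sequence `]]>`.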
Anatomy of an XML Document • Namespaces • Preventing naming collisions
<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
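To show how the two `title` elements are kept apart, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The prefix-to-URI dictionary `NS` is local to the script; in particular, the `order` prefix for the default namespace is an arbitrary choice, since only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>"""

# Prefixes used in the search paths below; only the URIs must match the document.
NS = {
    "cust": "http://www.example.com/custDetails",
    "book": "http://www.example.com/bookDetails",
    "order": "http://www.example.com/order",
}

order = ET.fromstring(ORDER_XML)
customer_title = order.findtext("cust:title", namespaces=NS)
book_title = order.findtext("book:title", namespaces=NS)
# Unprefixed elements belong to the default namespace declared on <order>.
order_number = order.findtext("order:orderNumber", namespaces=NS)

print(customer_title, book_title, order_number)
```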
Conformance and Validation • Well-formedness constraints • One root element • Start and end tags match <Tag>content</Tag> • Empty elements are terminated as <Tag/> • Tags are correctly nested <Parent><Child></Child></Parent> • All attribute values enclosed in "quotes" • All XML processors must check well-formedness constraints
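Because every XML processor has to enforce these constraints, a parser can serve as a simple well-formedness checker. The sketch below wraps Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` is illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = [
    "<Tag>content</Tag>",                 # matching start and end tags
    "<Tag/>",                             # correctly terminated empty element
    "<Parent><Child></Child></Parent>",   # correctly nested
    "<Parent><Child></Parent></Child>",   # badly nested
    "<Tag>content</tag>",                 # tag names are case sensitive
    "<Tag attr=value/>",                  # attribute value not quoted
    "<First/><Second/>",                  # more than one root element
]
results = [is_well_formed(sample) for sample in samples]
for sample, ok in zip(samples, results):
    print(ok, sample)
```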
Conformance and Validation • Validity constraints are specified in Document Type Definitions (DTDs) or Schemas • a valid XML document must be well-formed • a well-formed document need not necessarily be valid • Validating XML processors check against validity constraints
Document Type Definitions • DTD syntax able to specify • Structure and order of child elements
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
• Element attributes
<!ATTLIST Product EffDate CDATA #IMPLIED>
• limited number of data types • default and fixed attribute values
Document Type Definitions • DTDs • Easy to understand and implement • Lightweight alternative to schemas • But… • use non-XML syntax • only limited support for data typing and namespaces • difficult to extend
Schemas • W3C Schema • Uses XML syntax • Provides built-in data types and supports user-defined ones • Supports namespaces • Provides several extensibility mechanisms
Schemas • Schemas therefore more flexible…
<xs:element name="Product">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Size" type="xs:positiveInteger" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="EffDate" type="xs:date"/>
  </xs:complexType>
</xs:element>
• but harder to understand than DTDs
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
<!ATTLIST Product EffDate CDATA #IMPLIED>
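To make the difference between well-formed and valid concrete, the sketch below hand-codes a few of the Product constraints in Python. It is not a validating parser and does not read the DTD or schema: the function `is_valid_product` is a hypothetical stand-in that checks only the child order (Name, then an optional Size) and a simplified form of the schema's positive-integer rule for Size. A real project would use a validating XML processor instead.

```python
import xml.etree.ElementTree as ET


def is_valid_product(text: str) -> bool:
    """Hand-written check of a few Product constraints (illustration only)."""
    product = ET.fromstring(text)  # raises ParseError if not well-formed
    if product.tag != "Product":
        return False
    # Content model (Name, Size?): Name first, optionally followed by Size.
    children = [child.tag for child in product]
    if children not in (["Name"], ["Name", "Size"]):
        return False
    size = product.find("Size")
    if size is not None:
        # Simplified positive-integer test: ASCII digits only, greater than 0.
        value = (size.text or "").strip()
        if not (value.isascii() and value.isdigit() and int(value) > 0):
            return False
    return True


valid = '<Product EffDate="2004-01-01"><Name>Widget</Name><Size>3</Size></Product>'
wrong_order = "<Product><Size>3</Size><Name>Widget</Name></Product>"
bad_size = "<Product><Name>Widget</Name><Size>large</Size></Product>"

print(is_valid_product(valid), is_valid_product(wrong_order), is_valid_product(bad_size))
```

All three samples are well-formed, so a non-validating parser accepts them; only the first satisfies the structural rules. The third sample also shows the data-typing difference: the DTD declares Size as `#PCDATA` and would accept "large", whereas the schema's `xs:positiveInteger` would not.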
In Summary… • A language for describing markup languages • Extensible, i.e. define your own tags • Readable, structured and self-describing • Documents must be well-formed • Documents may be validated using DTDs and/or Schemas
Find Out More • World Wide Web Consortium • www.w3.org • W3C XML v1.0 Specification • http://www.w3.org/TR/REC-xml
Find Out More • The XML Industry Portal • www.xml.org • O’Reilly XML site • www.xml.com • XML Cover Pages • www.oasis-open.org/cover/ • Café Con Leche • www.ibiblio.org/xml/
Find Out More • Scottish Health and Community Care XML Steering Group • www.isdscotland.org/xml
XML Tools • XSV - Open Source XML Schema Validator • www.ltg.ed.ac.uk/~ht/xsv-status.html • MSXML 4.0 • www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42
XML Tools • XML Spy 2004 IDE • www.altova.com/products_ide.html • Free XML Tools and Software • www.garshol.priv.no/download/xmltools/
Printed Sources • Numerous printed sources – for more information visit • Charles F. Goldfarb's www.xmlbooks.com • www.amazon.com