This seminar, presented at the National Library of Scotland on May 5, 2006, explores XML, its history, and its significance as a widely accepted industry standard for information transfer on the internet. Attendees will learn about XML's basic constructs, the importance of well-formed documents, and the use of DTDs and XML Schema for document validation. The presentation covers practical examples and potential issues with XML, such as problem characters and namespaces, while also offering resources for further learning and exploration of XML technology.
Introduction to XML Presented to the seminar “Introduction to MARCXML”, National Library of Scotland, Edinburgh, 5 May 2006, organised by the Cataloguing and Indexing Group in Scotland Eric Jutrzenka, e.jutrzenka@nls.uk, 5 May 2006
Brief History of XML • Markup used by copy editors • Led to the ISO Standard Generalized Markup Language (SGML) – 1986 • Tim Berners-Lee created HyperText Markup Language (HTML) – 1989 • eXtensible Markup Language (XML) became a W3C Recommendation in 1998
What is XML? • A set of specifications that is widely accepted as an industry standard for information transfer on the internet • A means of applying type and structure to information • It’s easy to understand • It’s flexible
Basic Constructs • Document • Contains a single root element • Which contains zero or more elements • Each of which has • zero or more attributes • zero or one value • zero or more child elements
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document -->
<library>
  <book type="novel">
    <title>A Tale of Two Cities</title>
    <author>Charles Dickens</author>
  </book>
  <book/>
</library>
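The basic constructs (one root element, child elements, attributes, values) can be seen by loading the example into a parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `describe` is made up for illustration and is not part of the original presentation.

```python
import xml.etree.ElementTree as ET

# The example document, typed with straight quotes and a space before "type".
DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document -->
<library>
  <book type="novel">
    <title>A Tale of Two Cities</title>
    <author>Charles Dickens</author>
  </book>
  <book/>
</library>"""

root = ET.fromstring(DOC)  # the single root element


def describe(element, depth=0):
    """Return one line per element: its name, attributes and text value."""
    value = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} attrs={element.attrib} value={value!r}"]
    for child in element:  # zero or more child elements
        lines.extend(describe(child, depth + 1))
    return lines


print("\n".join(describe(root)))
```

The empty `<book/>` element shows up with no attributes, no value and no children, which is still perfectly legal XML.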
All XML is Well Formed • Must have a single root element • Every start tag must have an end tag • Must be a tree structure • Element names are case sensitive and must begin with a letter or underscore (_) • Element names can contain letters, digits, periods (.), hyphens (-), underscores (_), or colons (:)
Example – Not XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document, 4 mistakes -->
<library>
  <1book type="novel">
    <title> A Tale of Two Cities <join> </Title>
    <author's name> </join> Charles Dickens </author's name>
  </1book>
</library>
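A parser will refuse any document that breaks the well-formedness rules. As a rough sketch, the check below tries each kind of mistake from the slide in isolation, again using Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` and the sample strings are illustrative assumptions, not taken from the presentation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


GOOD = "<library><book type='novel'><title>A Tale of Two Cities</title></book></library>"

# One small sample per rule; each should be rejected.
BAD_SAMPLES = {
    "name starts with a digit": "<library><1book/></library>",
    "start and end tag differ in case": "<title>A Tale of Two Cities</Title>",
    "overlapping elements, not a tree": "<title>A Tale <join>of Two</title> Cities</join>",
    "apostrophe and space in a name": "<author's name>Charles Dickens</author's name>",
    "more than one root element": "<book/><book/>",
}

print("good:", is_well_formed(GOOD))
for reason, sample in BAD_SAMPLES.items():
    print(f"{reason}: {is_well_formed(sample)}")
```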
Problem Characters • < > & ' " • Angle brackets delimit tags, so they cause problems inside text • Ampersands start entity references • Apostrophes and quotes are used to delimit attribute values
Example (problem characters unescaped)
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document -->
<library>
  <book type="novel">
    <title> Pride & Prejudice </title>
    <author> Jane Austen </author>
    <note desc="quotes look like: ""> 3 < 4, 4 > 3 </note>
  </book>
</library>
Example (problem characters escaped)
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Example XML document -->
<library>
  <book type="novel">
    <title> Pride &amp; Prejudice </title>
    <author> Jane Austen </author>
    <note type="quotes look like: &quot;"> 3 &lt; 4, 4 &gt; 3 </note>
  </book>
</library>
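In practice the escaping is usually left to a library rather than typed by hand. The sketch below uses `escape` from Python's standard `xml.sax.saxutils` module to produce the entity references, then parses the result to confirm that the original characters come back. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() always replaces &, < and >; the extra mapping also replaces
# the double quote so the value is safe inside a "..." attribute.
title = escape("Pride & Prejudice")
note_text = escape("3 < 4, 4 > 3")
note_attr = escape('quotes look like: "', {'"': "&quot;"})

document = (
    "<library><book type=\"novel\">"
    f"<title>{title}</title>"
    "<author>Jane Austen</author>"
    f"<note type=\"{note_attr}\">{note_text}</note>"
    "</book></library>"
)

# Parsing turns the entity references back into the original characters.
root = ET.fromstring(document)
note = root.find("book/note")
print(title)
print(note_text)
print(note.get("type"), "|", note.text)
```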
Namespaces • Name collisions • Conversion <table /> VS. Coffee <table /> • <furniture:table /> • <conversions:table />
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<stuff xmlns:conv='http://nls.uk/conversion'>
  <table xmlns='http://nls.uk/furniture'>
    <material>Wood</material>
    <height>0.5m</height>
  </table>
  <conv:table>
    <conv:unit>Miles</conv:unit>
    <conv:unit>Kilometres</conv:unit>
  </conv:table>
</stuff>
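To a parser, the two `table` elements are different names because each is qualified by its namespace URI. A small sketch with Python's standard `xml.etree.ElementTree` shows how the two can be told apart; the prefix `furn` in the lookup map is an arbitrary choice made here, since prefixes are only local shorthand for the URI.

```python
import xml.etree.ElementTree as ET

DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<stuff xmlns:conv='http://nls.uk/conversion'>
  <table xmlns='http://nls.uk/furniture'>
    <material>Wood</material>
    <height>0.5m</height>
  </table>
  <conv:table>
    <conv:unit>Miles</conv:unit>
    <conv:unit>Kilometres</conv:unit>
  </conv:table>
</stuff>"""

# Prefix-to-URI map used only for the searches below.
NS = {"furn": "http://nls.uk/furniture", "conv": "http://nls.uk/conversion"}

root = ET.fromstring(DOC)
furniture_table = root.find("furn:table", NS)
conversion_table = root.find("conv:table", NS)

# The default namespace declared on <table> also applies to its children.
material = furniture_table.find("furn:material", NS).text
units = [unit.text for unit in conversion_table.findall("conv:unit", NS)]

print(furniture_table.tag)   # the parser stores names as {uri}local-name
print(conversion_table.tag)
print(material, units)
```

Searching for a bare `table` finds nothing, because no element in the document has that name without a namespace.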
DTDs and XML Schema • Document Type Definition (DTD) • Inherited from the SGML standard • XML Schema • New and improved XML standard for specifying type • Formal ways of specifying document type • A valid XML document is well formed and conforms to a DTD or Schema
Example DTD
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
Example Schema
<?xml version="1.0"?>
…
<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="book">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string" />
            <xs:element name="author" type="xs:string" minOccurs='1' maxOccurs='unbounded' />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Extract from MARC 21 XML Schema
<xsd:simpleType name="leaderDataType" id="leader.st">
  <xsd:restriction base="xsd:string">
    <xsd:whiteSpace value="preserve"/>
    <xsd:pattern value="[\d ]{5}[\dA-Za-z ]{1}[\dA-Za-z]{1}[\dA-Za-z ]{3}(2| )(2| )[\d ]{5}[\dA-Za-z ]{3}(4500|    )"/>
  </xsd:restriction>
</xsd:simpleType>
XSLT • Used to transform XML documents • Can produce output formats other than XML, e.g. HTML or plain text • Often used to display the information in an XML document, e.g. XML to PDF (via XSL-FO)
Additional Resources • http://www.w3c.org • For detailed specifications • http://www.w3schools.com/ • For friendly ‘click-through’ web tutorials with many examples • http://www.xml.com • News, Articles, Tutorials