
XML in Libraries

XML in Libraries. Roy Tennant eScholarship California Digital Library escholarship.cdlib.org. Outline. Introduction to XML XML vs. Databases Displaying and Transforming XML XSLT Primer Serving XML to Web Users Case Studies Tips & Advice Resources. Introduction to XML.


Presentation Transcript


  1. XML in Libraries Roy Tennant eScholarship California Digital Library escholarship.cdlib.org

  2. Outline • Introduction to XML • XML vs. Databases • Displaying and Transforming XML • XSLT Primer • Serving XML to Web Users • Case Studies • Tips & Advice • Resources

  3. Introduction to XML • Extensible Markup Language • A method of creating and using tags (elements) to identify the structure and contents of a document — not how it should be displayed • The tags can be arbitrary or can come from a specification (a Document Type Definition or schema)

  4. What it Looks Like <?xml version="1.0" encoding="UTF-8"?> <book> <author> <lastname>Tennant</lastname> <firstname>Roy</firstname> </author> <title>The Great American Novel</title> <chapter number="1"> <chaptitle>It Was Dark and Stormy</chaptitle> <p>It was a dark and stormy night.</p> </chapter> </book>
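
To make the sample concrete, here is a minimal sketch of how software might read it, using Python's standard `xml.etree.ElementTree` module. The choice of Python and the variable names are assumptions for illustration; the slides do not prescribe a language or library.

```python
import xml.etree.ElementTree as ET

# The sample document from the slide, with plain ASCII quotes.
BOOK_XML = """<?xml version="1.0" encoding="UTF-8"?>
<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>It was a dark and stormy night.</p>
  </chapter>
</book>"""

# Parse from bytes so the encoding declaration is honored by the parser.
root = ET.fromstring(BOOK_XML.encode("utf-8"))

# Elements are addressed by name; attributes are read with .get().
author_lastname = root.findtext("author/lastname")
chapter_number = root.find("chapter").get("number")
first_paragraph = root.findtext("chapter/p")

print(root.tag, author_lastname, chapter_number, first_paragraph)
```

The tags say what each piece of text is (an author's last name, a chapter title), which is what lets a program pull out exactly the part it needs.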

  5. Why is XML Important? • It is a standard, easily extensible way to encode loosely-structured as well as highly-structured information • Due to its easy parseability, software can transform it in countless ways, thereby allowing: • Easy migration paths • Alternative displays • On-the-fly response to user needs

  6. Ways to Use XML • Behind the scenes as a standard and easily transformed format for information • As a transfer syntax, to exchange information in a machine-parseable form • As a method of delivery direct to the user (not recommended)

  7. Documents • XML is expressed as “documents”, whether an entire book or a database record • Must haves: • At least one element • Only one “root” element • Should haves: • An XML declaration; e.g., <?xml version="1.0"?> • Namespace declarations • Can haves: • One or more properly nested elements • Comments • Processing instructions

  8. Elements • Must have a name; e.g., <title> • Names must follow rules: no spaces, no special characters other than hyphens, underscores, and periods, must start with a letter or underscore, are case sensitive • Must have a beginning and end; <title></title> or <title/> • May wrap text data; e.g., <title>Hamlet</title> • May have attributes, whose values must be quoted; e.g., <title level="main">Hamlet</title> • May contain other “child” elements; e.g., <title level="main">Hamlet <subtitle>Prince of Denmark</subtitle></title>
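
The element rules above can be tried out by building the Hamlet example programmatically. This is a small sketch with Python's standard `xml.etree.ElementTree`; the library choice is an assumption, not something the slides specify.

```python
import xml.etree.ElementTree as ET

# An element has a name and may carry attributes (values are always quoted
# when the element is written out).
title = ET.Element("title", {"level": "main"})
title.text = "Hamlet "

# A child element nested inside its parent.
subtitle = ET.SubElement(title, "subtitle")
subtitle.text = "Prince of Denmark"

serialized = ET.tostring(title, encoding="unicode")

# An element with no content is written in the minimized form.
empty = ET.tostring(ET.Element("title"), encoding="unicode")

print(serialized)
print(empty)
```

Letting a library write the tags guarantees that every element is closed and every attribute value is quoted.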

  9. Element Relationships • Every XML document must have only one “root” element • All other elements must be contained within the root • An element contained within another element is called a “child” of the container element • An element that contains another element is called the “parent” of the contained element • Two elements that share the same parent are called “siblings”

  10. The Tree <?xml version="1.0"?> <book> <author> <lastname>Tennant</lastname> <firstname>Roy</firstname> </author> <title>The Great American Novel</title> <chapter number="1"> <chaptitle>It Was Dark and Stormy</chaptitle> <p>It was a dark and stormy night.</p> </chapter> </book> (Callouts on the slide: Root element; Parent of <lastname>; Siblings; Child of <author>)
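
The root, parent, child, and sibling relationships can be checked in code. The sketch below uses Python's standard `xml.etree.ElementTree`, which does not store parent links, so a parent lookup table is built by hand; that table and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

BOOK_XML = (
    "<book>"
    "<author><lastname>Tennant</lastname><firstname>Roy</firstname></author>"
    "<title>The Great American Novel</title>"
    '<chapter number="1">'
    "<chaptitle>It Was Dark and Stormy</chaptitle>"
    "<p>It was a dark and stormy night.</p>"
    "</chapter>"
    "</book>"
)

root = ET.fromstring(BOOK_XML)

# Map every child element to its parent. The root never appears as a key
# because nothing contains it.
parent_of = {child: parent for parent in root.iter() for child in parent}

lastname = root.find("author/lastname")
author = parent_of[lastname]

# Siblings are the other children of the same parent.
sibling_tags = [e.tag for e in author if e is not lastname]

# Children of the root, in document order.
root_child_tags = [child.tag for child in root]

print(author.tag, sibling_tags, root_child_tags)
```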

  11. Character Entities • There are 5 characters that are reserved for special purposes; therefore, to use these characters when not part of XML tags, you must use an entity reference: • & (ampersand) becomes: &amp; • < (less than) becomes: &lt; • > (greater than) becomes: &gt; • ' (apostrophe) becomes: &apos; • " (quote) becomes: &quot;
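
In practice the escaping is usually left to a library. As a sketch, Python's standard `xml.sax.saxutils` module escapes `&`, `<`, and `>` by default and takes an optional mapping for the two quote characters; the sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > on its own; the quote characters are added
# through the optional mapping.
QUOTES = {"'": "&apos;", '"': "&quot;"}

raw = 'AT&T says "x < y" isn\'t > z'
escaped = escape(raw, QUOTES)

# unescape() reverses the process when given the inverse mapping.
restored = unescape(escaped, {"&apos;": "'", "&quot;": '"'})

print(escaped)
print(restored == raw)
```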

  12. Two Types of XML • Well-Formed • Valid

  13. Well-Formed XML • Follows general tagging rules: • All tags begin and end • But can be minimized if empty: <br/> instead of <br></br> • All tags are case sensitive • All tags must be properly nested: • <author> <firstname>Mark</firstname><lastname>Twain</lastname> </author> • All attribute values are quoted: • <subject scheme="LCSH">Music</subject> • Has identification & declaration tags • Software can make sure a document follows these rules
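
Any XML parser will reject a document that breaks these rules, which is how software "makes sure". A minimal sketch with Python's standard `xml.etree.ElementTree` follows; the function name `is_well_formed` is a hypothetical helper, not part of the library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "nested correctly": "<author><firstname>Mark</firstname><lastname>Twain</lastname></author>",
    "minimized empty tag": "<br/>",
    "overlapping tags": "<a><b></a></b>",
    "unquoted attribute": "<subject scheme=LCSH>Music</subject>",
    "case mismatch": "<Title>Hamlet</title>",
    "two root elements": "<a/><b/>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first two samples pass; each of the others violates one of the rules listed on the slide.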

  14. Valid XML • Uses only specific tags and rules as codified by one of: • A document type definition (DTD) • A schema definition • Only the tags listed by the schema or DTD can be used • Software can take a DTD or schema and verify that a document adheres to the rules • Editing software can prevent an author from using anything except allowed tags
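
The idea that only listed tags may be used can be illustrated with a toy check. Note the assumptions: Python's standard library does not validate against a DTD or schema, so the hand-written `ALLOWED_CHILDREN` table below stands in for one, and `find_violations` is a hypothetical helper. A real project would use a validating parser instead.

```python
import xml.etree.ElementTree as ET

# A stand-in for a DTD: which child elements each element may contain.
ALLOWED_CHILDREN = {
    "book": {"author", "title", "chapter"},
    "author": {"lastname", "firstname"},
    "chapter": {"chaptitle", "p"},
    "lastname": set(),
    "firstname": set(),
    "title": set(),
    "chaptitle": set(),
    "p": set(),
}


def find_violations(root):
    """List every element that appears where the table does not allow it."""
    problems = []
    if root.tag not in ALLOWED_CHILDREN:
        problems.append(f"unknown root <{root.tag}>")
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> not allowed in <{parent.tag}>")
    return problems


good = ET.fromstring("<book><title>Novel</title><chapter><p>Text</p></chapter></book>")
bad = ET.fromstring("<book><chapter><blink>Hi</blink></chapter></book>")

print(find_violations(good))
print(find_violations(bad))
```

Both documents are well-formed; only the first is valid under this made-up rule set, which is the distinction between the two types of XML.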

  15. XML vs. Databases (a simplistic formula) • If your information is… • Tightly structured • Fixed field length • Massive numbers of individual items • You need a database • If your information is… • Loosely structured • Variable field length • Massive record size • You need XML

  16. Displaying XML: CSS • A modern web browser (MSIE, Mozilla) and a cascading style sheet (CSS) may be used to view XML as if it were HTML • A style must be defined for every XML tag • All display characteristics of each element must be explicitly defined • Elements are displayed in the order they are encountered in the XML • No reordering of elements or other processing is possible
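
A browser finds the style sheet for an XML document through an `xml-stylesheet` processing instruction placed before the root element. The sketch below adds one with Python's standard `xml.dom.minidom`; the file name `book.css` is a hypothetical example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<book><title>The Great American Novel</title></book>"
)

# The processing instruction that links the document to a CSS file.
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/css" href="book.css"'
)

# It must come before the root element.
doc.insertBefore(pi, doc.documentElement)

styled = doc.toxml()
print(styled)
```

The style sheet itself would then need a rule for every element, for example `title { display: block; font-size: 2em; }`, since the browser has no built-in styling for made-up tags.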

  17. Transforming XML: XSLT • Extensible Stylesheet Language Transformations (XSLT) • A markup language and programming syntax for processing XML • Is most often used to: • Transform XML to HTML for delivery to standard web clients • Transform XML from one set of XML tags to another • Transform XML into another syntax/system

  18. XSLT Primer • XSLT is based on the process of matching templates to nodes of the XML tree • Working down from the top, XSLT tries to match segments of code to: • The root element • Any child node • And on down through the document • You can specify different processing for each element if you wish

  19. XSLT Primer: Beginning Syntax • Start with the XSLT namespace declaration: <xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsd="http://www.w3.org/2001/XMLSchema"></xsl:stylesheet> • Then add a template that matches the root node: <xsl:template match="/"> <xsl:apply-templates/></xsl:template>

  20. XSLT Primer: Templates • Then add, for each element you wish to process: <xsl:template match="ELEMENTNAME"> XSLT INSTRUCTIONS AND/OR HTML HERE</xsl:template> • If you want to process nodes below this point (children), use the <xsl:apply-templates/> instruction
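
The template-matching model can be mimicked in a few lines of ordinary code. To be clear about the assumptions: this is a Python analogue of the idea, not XSLT and not how an XSLT processor is implemented. The `TEMPLATES` table plays the role of the `<xsl:template match="...">` rules, and `apply_templates` plays the role of `<xsl:apply-templates/>`. Elements without a rule fall through to a default that just processes their children, similar to XSLT's built-in behavior.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def apply_templates(node):
    """Process the text and children of a node, in document order."""
    parts = [escape(node.text or "")]
    for child in node:
        template = TEMPLATES.get(child.tag, default_template)
        parts.append(template(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def default_template(node):
    """No rule matched: keep going down the tree."""
    return apply_templates(node)


# One rule per element name we want to handle specially.
TEMPLATES = {
    "title": lambda n: "<h1>" + apply_templates(n) + "</h1>",
    "chaptitle": lambda n: "<h2>" + apply_templates(n) + "</h2>",
    "p": lambda n: "<p>" + apply_templates(n) + "</p>",
    "author": lambda n: "",  # matched, but deliberately produces no output
}


def transform(xml_text):
    """The rule for the top of the document: wrap everything in HTML."""
    root = ET.fromstring(xml_text)
    return "<html><body>" + apply_templates(root) + "</body></html>"


BOOK_XML = (
    "<book>"
    "<author><lastname>Tennant</lastname><firstname>Roy</firstname></author>"
    "<title>The Great American Novel</title>"
    '<chapter number="1">'
    "<chaptitle>It Was Dark and Stormy</chaptitle>"
    "<p>It was a dark and stormy night.</p>"
    "</chapter>"
    "</book>"
)

html = transform(BOOK_XML)
print(html)
```

The `<chapter>` element has no rule of its own, yet its children still appear in the output because the default rule keeps descending.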

  21. XSLT Primer: Doing HTML • Typical way to begin:<xsl:template match="/"> <html> <head> <title><xsl:value-of select="title"/></title> <link type="text/css" rel="stylesheet" href="xslt.css" /> </head> <body> <xsl:apply-templates/> </body> </html></xsl:template> • Then, templates for each element appear below

  22. Serving XML to Web Users • Basic requirements: an XML doc and a web server • Additional requirements for simple method: • A CSS stylesheet • Additional requirements for complex, powerful method: • An XSLT stylesheet • An XSLT processor (we’ll use xsltproc) • A CGI program to take user input and call xsltproc • A CSS stylesheet (optional) to control how it looks in a browser
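
The "CGI program that calls xsltproc" step might look roughly like the sketch below. The function names and the file names `book.xsl` and `book.xml` are hypothetical; the only xsltproc features relied on are its basic `xsltproc stylesheet file` invocation and the `--stringparam name value` option for passing a string parameter to the stylesheet.

```python
import subprocess


def build_xsltproc_command(stylesheet, document, params=None):
    """Build the argument list for xsltproc.

    Passing a list (not a shell string) means user-supplied values are
    never interpreted by a shell.
    """
    command = ["xsltproc"]
    for name, value in (params or {}).items():
        command += ["--stringparam", name, value]
    return command + [stylesheet, document]


def transform(stylesheet, document, params=None):
    """Run xsltproc and return the transformed output as text."""
    result = subprocess.run(
        build_xsltproc_command(stylesheet, document, params),
        capture_output=True,
        text=True,
        check=True,  # raise if xsltproc reports an error
    )
    return result.stdout


# Example: the user asked for chapter 1 (hypothetical parameter name).
command = build_xsltproc_command("book.xsl", "book.xml", {"chapter": "1"})
print(command)
```

A real CGI script would also restrict which documents and stylesheets a request may name, rather than passing user input through as file paths.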

  23. XML Web Publishing Software • Software used to add XML serving capability to a web server • A couple examples, both free: • Cocoon (xml.apache.org/cocoon/) • Requires a Java servlet container such as Tomcat (free) or Resin (commercial) • AxKit (axkit.org) • Requires Perl

  24. AxKit mod_perl Web Server

  25. Cocoon Tomcat Web Server

  26. Java Servlet Resin Web Server

  27. Java Servlet Resin Web Server I want this XML doc…

  28. XSLT Stylesheet XML Doc Java Servlet Resin Web Server

  29. XSLT Stylesheet XML Doc XHTML Document (no display markup)* Java Servlet Resin HTML Stylesheet (CSS) Web Server * Dynamic document

  30. Transformation XSLT Stylesheet Information Presentation XML Doc XHTML Document (no display markup)* Java Servlet Resin HTML Stylesheet (CSS) Web Server * Dynamic document

  31. Case Study: Publishing Books @ the California Digital Library • Goals: • To create highly usable online versions of books • To create versions that will migrate easily as technology changes • To create an infrastructure that will support dynamic presentations of the same content

  32. File System Encoded in TEI XML Stored Search Index Full Text

  33. File System Encoded in TEI XML Stored Search Index Full Text Structure Search Index Selected Fields Extracted METS Repository Records Created Stored Project Profile MODS record UC Press record Library Catalog UC Press Database

  34. File System Encoded in TEI XML Stored Search Index Full Text Structure Search Index Selected Fields Extracted METS Repository Records Created Stored Project Profile User queries MODS record UC Press record Library Catalog UC Press Database

  35. File System Encoded in TEI XML Stored Search Index Full Text Structure Search Index Selected Fields Extracted METS Repository Records Created Stored Project Profile Search Results MODS record User requests book UC Press record Library Catalog UC Press Database

  36. File System Encoded in TEI XML Stored Search Index Full Text Java servlet Structure Search Index Selected Fields Extracted METS Repository Records Created Stored User requests book segment Project Profile MODS record METS record in XML UC Press record XSLT Library Catalog UC Press Database

  37. File System Encoded in TEI XML Stored Search Index Full Text Java servlet Structure Search Index Selected Fields Extracted METS Repository Records Created Stored XSLT Project Profile Book segment returned MODS record UC Press record Library Catalog UC Press Database

  38. File System Encoded in TEI XML Stored Search Index Full Text Structure Search Index Selected Fields Extracted METS Repository Records Created Stored Project Profile User queries MODS record UC Press record Library Catalog UC Press Database

  39. File System Encoded in TEI XML Stored Search Index Full Text Structure Search Index Selected Fields Extracted METS Repository Records Created Stored Project Profile Results returned MODS record UC Press record Library Catalog UC Press Database

  40. File System Encoded in TEI XML Stored Search Index Full Text Java servlet Structure Search Index Selected Fields Extracted METS Repository Records Created Stored User wants to see search words in context Project Profile MODS record UC Press record Library Catalog UC Press Database

  41. File System Encoded in TEI XML Stored Search Index Full Text Java servlet Structure Search Index Selected Fields Extracted METS Repository Records Created Stored Book segment returned w/ terms highlighted XSLT Project Profile MODS record UC Press record Library Catalog UC Press Database

  42. http://escholarship.cdlib.org/ucpress/

  43. Case Study: ILL ASAP

  44. Service Tasmania Architecture
