This document provides a comprehensive overview of XML (Extensible Markup Language) and its functions as a flexible, platform-independent data container. It describes the fundamental aspects of XML, including well-formedness, validation, built-in types, and the differences between Document Type Definitions (DTDs) and XML schemas. Additionally, it highlights XML's application in digital libraries, particularly through the DLESE system's metadata management and the use of various design philosophies like the Venetian Blind Model. The document serves as a guide for understanding how XML can organize and structure data dynamically.
XML and XML in DLESE Katy Ginger November 2003
XML Purpose
• Provide a container for data that is independent of presentation and platform
• Provide a container that is flexible and extensible; the user defines the tags and content
• Provide a single container for data that serves multiple purposes in a variety of software and web applications
Note: XML databases now exist
What is XML data?
• XML data files are called instance documents
• They consist of user-defined tags
• They are well-formed and valid
• Their content can be defined and controlled
See the DLESE Annotation Metadata Record example
Built-in Primitive Types
• String: e.g. string
• Boolean and binary: e.g. boolean, hexBinary, base64Binary
• Numeric: e.g. decimal (from which integer is derived), float, double
• Date/time: e.g. date, dateTime, duration, time
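To make the type names above concrete, here is a minimal Python sketch that converts the text of an element into a native value according to an XML Schema type name. The `CONVERTERS` table and the `convert` helper are illustrative names, not part of any DLESE tool, and the conversions only approximate the schema rules (Python's `int`, for instance, accepts some strings that xs:integer would reject).

```python
from datetime import date
from decimal import Decimal


def parse_boolean(text):
    """Convert the xs:boolean lexical forms true, false, 1 and 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not an xs:boolean value: {text!r}")


# Illustrative mapping from schema type name to a Python converter.
# This is an approximation, not a full implementation of the schema rules.
CONVERTERS = {
    "string": str,
    "boolean": parse_boolean,
    "decimal": Decimal,
    "integer": int,
    "date": date.fromisoformat,
}


def convert(xsd_type, text):
    """Convert element text to a Python value for the given type name."""
    return CONVERTERS[xsd_type](text)


print(convert("integer", "1994"))
print(convert("boolean", "true"))
print(convert("date", "2003-11-01"))
```

A real schema processor performs these checks during validation; the sketch only shows what "typed content" means for a handful of types.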
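Well-formed and valid XML

Correct

<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994</year>
  <owner>
    <name>you</name>
    <plate>CO</plate>
  </owner>
</car>

Incorrect (the year element is never closed, and car is closed before owner; the swapped order of plate and name matters only if a DTD or schema fixes the order)

<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994
  <owner>
    <plate>CO</plate>
    <name>you</name>
  </car>
</owner>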
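The difference between the two car documents can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` parser; the `is_well_formed` helper is an illustrative name. Note that this tests well-formedness only: checking validity would additionally require a DTD or schema and a validating parser, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

CORRECT = """<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994</year>
  <owner>
    <name>you</name>
    <plate>CO</plate>
  </owner>
</car>"""

INCORRECT = """<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994
  <owner>
    <plate>CO</plate>
    <name>you</name>
  </car>
</owner>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(CORRECT))    # True
print(is_well_formed(INCORRECT))  # False
```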
DTD, Schemas & Namespaces

DTD (Document Type Definition)
• Describes the elements of XML instance documents
• Is not itself written as well-formed XML
• Offers some data typing
• Makes namespaces harder to deal with

Schemas
• Describe the elements of XML instance documents
• Are themselves well-formed XML
• Offer strong data typing
• Make namespaces easier to deal with

Namespace: a collection of related element names identified by a name label (e.g. dc:title, where dc stands for Dublin Core)
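As a small illustration of the dc:title example, the following Python sketch parses a record that contains both a Dublin Core title and an unqualified title. The record and its text content are made up for the example; the namespace URI is the one used by the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: one title in the dc namespace, one in no namespace.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Plate Tectonics Tutorial</dc:title>
  <title>Local title in no namespace</title>
</record>"""

root = ET.fromstring(RECORD)

# The prefix map tells find() which URI the "dc" label stands for.
dc_title = root.find("dc:title", {"dc": DC_NS})
plain_title = root.find("title")

print(dc_title.tag)      # {http://purl.org/dc/elements/1.1/}title
print(dc_title.text)     # Plate Tectonics Tutorial
print(plain_title.text)  # Local title in no namespace
```

The parser identifies the element by the namespace URI, not by the prefix, so the two title elements never collide even though they share a local name.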
XML Schema Design Philosophy
• Decide where to apply XML; it is good for:
  • Data archiving
  • Message passing
  • Presentation documents
• Decide between global and local XML declarations:
  • Russian Doll Design
  • Salami Slice Design
  • Venetian Blind Model
Russian Doll Design
• The schema mirrors the structure of the instance document
• Elements are declared inside their parent elements
• All elements are local in scope
• Elements may appear multiple times but cannot be referenced elsewhere
• Changes to an element affect only the content model of its parent element
• Hides namespace complexities
Salami Slice Design
• Each element or attribute is declared globally
• Content models are then pieced together through references
• Elements can be used anywhere and in multiple schemas
• Changes to an element take effect everywhere the element is used
• Does not hide namespaces
Venetian Blind Model
• Elements and attributes are defined as types (simple or complex)
• Content models are made of types
• Types are reusable and extendable
• Changes to a type take effect everywhere the type is used
• Namespaces can be hidden or exposed
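The three designs are easiest to compare side by side. The sketch below writes a small car schema in each style and then lists which declarations sit at the top level (global scope) of each schema. The schemas, the `makeType` and `carType` names, and the `global_names` helper are illustrative assumptions, not DLESE's actual schemas.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Russian Doll: one global element, everything else nested and local.
RUSSIAN_DOLL = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="car">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="make" type="xs:string"/>
        <xs:element name="year" type="xs:gYear"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Salami Slice: every element is global and assembled through ref.
SALAMI_SLICE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="make" type="xs:string"/>
  <xs:element name="year" type="xs:gYear"/>
  <xs:element name="car">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="make"/>
        <xs:element ref="year"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Venetian Blind: named global types, with local elements that use them.
VENETIAN_BLIND = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="makeType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:complexType name="carType">
    <xs:sequence>
      <xs:element name="make" type="makeType"/>
      <xs:element name="year" type="xs:gYear"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="car" type="carType"/>
</xs:schema>"""


def global_names(schema_text, kind):
    """List the names of top-level declarations of one kind, e.g. 'element'."""
    root = ET.fromstring(schema_text)
    return [child.get("name") for child in root if child.tag == f"{{{XS}}}{kind}"]


for label, schema in [("Russian Doll", RUSSIAN_DOLL),
                      ("Salami Slice", SALAMI_SLICE),
                      ("Venetian Blind", VENETIAN_BLIND)]:
    print(label,
          global_names(schema, "element"),
          global_names(schema, "complexType") + global_names(schema, "simpleType"))
```

All three schemas describe the same instance document; they differ only in what is exposed globally, which is what determines how a change propagates.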
How to control tag content
• Use restriction and enumeration elements
• Use regular expressions
See the DLESE Annotation Metadata Record example
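As a rough illustration of both techniques, the sketch below defines two simple types, one limited by enumeration and one by a regular-expression pattern, then reads those facets and applies them to sample values. The type names, allowed values, and helper functions are assumptions made for the example. Python's standard library has no schema validator, so the check is emulated with `re.fullmatch`; this works for a simple pattern like the one shown, although schema regular expressions and Python's differ in some details.

```python
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical types: makeType uses enumeration, plateType uses a pattern.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="makeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Dodge"/>
      <xs:enumeration value="Ford"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="plateType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def load_facets(schema_text):
    """Collect enumeration and pattern facets for each named simple type."""
    facets = {}
    for simple_type in ET.fromstring(schema_text).iter(XS + "simpleType"):
        restriction = simple_type.find(XS + "restriction")
        facets[simple_type.get("name")] = {
            "enumeration": [e.get("value") for e in restriction.findall(XS + "enumeration")],
            "pattern": [p.get("value") for p in restriction.findall(XS + "pattern")],
        }
    return facets


def is_allowed(value, type_facets):
    """Emulate the facet checks: the value must be listed and match a pattern."""
    allowed = type_facets["enumeration"]
    if allowed and value not in allowed:
        return False
    patterns = type_facets["pattern"]
    # Schema patterns match the whole value, hence fullmatch.
    if patterns and not any(re.fullmatch(p, value) for p in patterns):
        return False
    return True


facets = load_facets(SCHEMA)
print(is_allowed("Dodge", facets["makeType"]))      # True
print(is_allowed("Toyota", facets["makeType"]))     # False
print(is_allowed("CO", facets["plateType"]))        # True
print(is_allowed("Colorado", facets["plateType"]))  # False
```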
How DLESE uses XML
• DLESE systems act on metadata
• All DLESE metadata is stored as flat XML files
• DLESE harvests metadata XML files from other digital libraries
• DLESE provides metadata XML files to other digital libraries
• DLESE transitioned to schemas in Spring 2003
• DLESE schemas use the Venetian Blind Model
XSLT (Extensible Stylesheet Language Transformations)
• Acts on valid and well-formed XML documents, such as instance documents and schemas
• Used to create new text, HTML, or XML documents
• DLESE uses it to crosswalk the DLESE metadata format to the Dublin Core metadata format
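The idea of a crosswalk can be sketched without an XSLT processor. Python's standard library does not include one, so the example below performs the mapping in plain Python with `xml.etree.ElementTree`; it is a stand-in for what a stylesheet would do, not DLESE's actual transform. The source record, its field names, the `FIELD_MAP` table, and the `metadata` wrapper element are all hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the familiar dc: prefix

# Hypothetical mapping from source field names to Dublin Core element names.
FIELD_MAP = {"title": "title", "description": "description", "creator": "creator"}

# Hypothetical source record; internalId has no Dublin Core counterpart.
SOURCE = """<record>
  <title>Plate Tectonics Tutorial</title>
  <description>An introductory tutorial.</description>
  <internalId>X-001</internalId>
</record>"""


def crosswalk(source_text):
    """Copy mapped fields into a new record that uses Dublin Core elements."""
    source = ET.fromstring(source_text)
    target = ET.Element("metadata")
    for child in source:
        dc_name = FIELD_MAP.get(child.tag)
        if dc_name is None:
            continue  # fields with no mapping are dropped
        out = ET.SubElement(target, f"{{{DC_NS}}}{dc_name}")
        out.text = child.text
    return target


result = crosswalk(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

An XSLT stylesheet would express the same mapping declaratively as template rules, which keeps the crosswalk in a standard, tool-independent form.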