This article explores the ordered hierarchy of content objects in XML and its advantages over HTML. It also covers the basics of XML, its uses in primary sources, metadata and document management, and the components of XML implementations.
Beyond HTML: Extensible Markup Language • Timothy W. Cole • Grainger Engineering Library Information Center, University of Illinois at Urbana-Champaign • American Association of Law Libraries, 19 July 2000 • t-cole3@uiuc.edu • http://dli.grainger.uiuc.edu/Publications/TWCole/AALL_2000/
Ordered Hierarchy of Content Objects: A Definition of Text in Computer Terms • Premise: A Text is the Sum of its Components • So a <BOOK> Could Be Defined as Containing: <FRONT_MATTER> <CHAPTER>s <BACK_MATTER> • <FRONT_MATTER> Could Contain: <BOOK_TITLE> <AUTHOR>s <PUBLISHER> • While Each <CHAPTER> Could Contain: <CHAPTER_TITLE> <SECTION>s • And Each <SECTION> Could Contain: <SECTION_TITLE> <PARAGRAPH>s • Components Chosen Reflect Anticipated Use
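The hierarchy on this slide can be sketched in a few lines of Python. This is only an illustration: the element names come from the slide, while the text values and the helper names `build_book` and `outline` are made up for the example.

```python
import xml.etree.ElementTree as ET


def build_book():
    """Build a tiny <BOOK> following the containment rules on the slide."""
    book = ET.Element("BOOK")

    front = ET.SubElement(book, "FRONT_MATTER")
    ET.SubElement(front, "BOOK_TITLE").text = "A Hypothetical Title"  # placeholder text
    ET.SubElement(front, "AUTHOR").text = "First Author"              # placeholder text
    ET.SubElement(front, "PUBLISHER").text = "Example Press"          # placeholder text

    chapter = ET.SubElement(book, "CHAPTER")
    ET.SubElement(chapter, "CHAPTER_TITLE").text = "Chapter One"
    section = ET.SubElement(chapter, "SECTION")
    ET.SubElement(section, "SECTION_TITLE").text = "Section 1.1"
    ET.SubElement(section, "PARAGRAPH").text = "Text of the first paragraph."

    ET.SubElement(book, "BACK_MATTER")
    return book


def outline(element, depth=0):
    """Return the element names as an indented outline, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


book = build_book()
print("\n".join(outline(book)))
```

Every object has exactly one parent and a fixed position among its siblings, which is what makes the hierarchy "ordered" and why overlapping objects cannot be represented.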
Ordered Hierarchy of Content Objects (continued) • OHCO is a Useful, Albeit Imperfect Model • More Powerful Than Model of Text as a Stream of Characters & Formatting Instructions • Does Not Allow for Overlapping Content Objects • OHCO Model is Inherent in XML, HTML • XML Designed for Descriptive Content Objects, Not Presentational Content Objects • XML Syntax is Fixed, But Semantics is Extensible
XML Basics: Markup & Content • Consider: <?xml version='1.0' ?> <!-- This is an Example --> <author sequence='first'><LName> Colè </LName>,<FName> Tim </FName> </author> • Would Display As: Colè, Tim • This example illustrates: • XML Processing Instructions • XML Comments (Ignored by XML Applications) • XML Element Markup, Including an Attribute • XML Content, Including an Entity
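To see how an XML application reads the example, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The slide says the content includes an entity but the transcript shows only the rendered character, so writing it as the numeric reference `&#232;` is an assumption.

```python
import xml.etree.ElementTree as ET

# The slide's example; "&#232;" (assumed form of the entity) stands for the character è.
SOURCE = """<?xml version='1.0' ?>
<!-- This is an Example -->
<author sequence='first'><LName> Col&#232; </LName>,<FName> Tim </FName> </author>"""

author = ET.fromstring(SOURCE)      # the comment is skipped by the default parser

sequence = author.get("sequence")   # attribute value
last_name = author.findtext("LName").strip()
first_name = author.findtext("FName").strip()
display = f"{last_name}, {first_name}"

print(sequence)
print(display)
```

The parser hands the application the element structure, the attribute, and the decoded character; the comment never reaches it.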
XML Basics (continued) • “Well-Formed” XML Rules: • XML Element Markup is Case-Sensitive • All XML Tags Must Be Closed • Hierarchical Nesting; No Overlapping Elements • All XML Attribute Values Must Be Quoted • Enforces Stricter Syntax than HTML • Facilitates Fast, Efficient Parsing • Extensible Semantics Provide Flexibility • “Well-Formed” More Lightweight Than SGML
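The well-formedness rules can be checked mechanically. The sketch below feeds a few small documents to Python's standard XML parser, which rejects anything that is not well-formed. The sample strings and the helper name `is_well_formed` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


SAMPLES = {
    "well-formed": "<author sequence='first'><LName>Cole</LName></author>",
    "case mismatch": "<Author>Cole</author>",                  # markup is case-sensitive
    "unclosed tag": "<author><LName>Cole</author>",            # every tag must be closed
    "overlapping": "<b><i>Cole</b></i>",                       # elements must nest
    "unquoted attribute": "<author sequence=first>Cole</author>",
}

results = {label: is_well_formed(text) for label, text in SAMPLES.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first sample is accepted; each of the others breaks one of the rules listed on the slide.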
Is It Valid Or Well-Formed? When Does It Matter? • All Web Browsers Need Is Well-Formed • XML Authoring Tools Need To Validate • Otherwise Tower of Babel Ensues • Indexing Agents & Schema-Specific Rendering Agents May Need To Validate • Illustrations: • Malformed XML • Well-Formed But Invalid XML • Valid XML
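The three cases named under "Illustrations" can be told apart with a small sketch. Python's standard library does not validate against a DTD, so the check below uses a hand-written stand-in: `CONTENT_MODEL` is a hypothetical rule table, not a real DTD or schema, and `classify` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each element name maps to the exact
# sequence of child elements it must contain.
CONTENT_MODEL = {"author": ["LName", "FName"], "LName": [], "FName": []}


def classify(text):
    """Return 'malformed', 'well-formed but invalid', or 'valid'."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "malformed"
    for element in root.iter():
        expected = CONTENT_MODEL.get(element.tag)
        children = [child.tag for child in element]
        if expected is None or children != expected:
            return "well-formed but invalid"
    return "valid"


print(classify("<author><LName>Cole</author>"))
print(classify("<author><FName>Tim</FName></author>"))
print(classify("<author><LName>Cole</LName><FName>Tim</FName></author>"))
```

A browser only needs the first test to pass; an authoring tool or indexing agent that depends on a shared vocabulary also needs the second.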
Library Uses of XML: Using XML for Primary Sources • Facilitates Searching • Full-Text Searching & Field-Specific Searching • More Meaningful Proximity Searching • Better Retrieval / Browsing • Selective Views / Suppression of Personal Data • Re-Ordered & Piecemeal Views • Illustration -- Illinois Agronomy Handbook • Search • Browsing
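The difference between full-text and field-specific searching can be sketched as follows. The markup here is invented for the example and is not the actual encoding of the Illinois Agronomy Handbook; `chapter_titles` is likewise an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Invented sample markup, for illustration only.
HANDBOOK = """<handbook>
  <chapter><title>Corn</title><para>Planting dates affect corn yield.</para></chapter>
  <chapter><title>Soybeans</title><para>Rotation with corn reduces disease.</para></chapter>
</handbook>"""


def chapter_titles(root, term, field=None):
    """Titles of chapters containing term, anywhere or only inside one field."""
    hits = []
    for chapter in root.findall("chapter"):
        if field is None:
            haystack = " ".join(chapter.itertext())                            # full text
        else:
            haystack = " ".join(e.text or "" for e in chapter.iter(field))     # one field
        if term.lower() in haystack.lower():
            hits.append(chapter.findtext("title"))
    return hits


root = ET.fromstring(HANDBOOK)
print(chapter_titles(root, "corn"))                  # full-text search
print(chapter_titles(root, "corn", field="title"))   # search restricted to titles
```

Because the markup says which text is a title, the second query can ignore passing mentions in body paragraphs.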
Library Uses of XML: XML for Metadata & Wrapping • Facilitates Interchange, Normalization, ... • Simpler than Fixed Fields, Record Headers, Etc. • XML Implementations of Metadata Standards, e.g.: RDF, EAD, DC, FGDC, US-MARC • Easier Routing / Handling of Specialized Content • In Combination with Primary Source XML • Automatic Extraction of Metadata From Source • Facilitates Authority Control
Library Uses of XML: XML for Document Management • Smarter Documents • XML Namespaces -- Integrating Multiple XML Schemas (Including XHTML) • Rights Management, Technical Requirements,… • Facilitates Enhanced Linking Between Docs. • Creation of Links From Marked Up Content • Easy to Add or Modify Links Over Time • XLink & XPointer Promise More Robust Linking • Metadata File from Illinois DLIB Testbed • Schema Integrates RDF, DC, & Project Design
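As a rough sketch of how namespaces let one document mix vocabularies, the record below combines Dublin Core elements with XHTML. The `record` wrapper element is hypothetical and is not the Illinois DLIB Testbed schema; the two namespace URIs are the published Dublin Core and XHTML ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",  # Dublin Core element set
    "h": "http://www.w3.org/1999/xhtml",       # XHTML
}

# Hypothetical wrapper record mixing the two vocabularies.
RECORD = f"""<record xmlns:dc="{NS['dc']}" xmlns:h="{NS['h']}">
  <dc:title>Beyond HTML</dc:title>
  <dc:creator>Cole, Timothy W.</dc:creator>
  <h:p>Slides: <h:a href="http://dli.grainger.uiuc.edu/Publications/TWCole/AALL_2000/">online</h:a></h:p>
</record>"""

root = ET.fromstring(RECORD)
title = root.findtext("dc:title", namespaces=NS)
creator = root.findtext("dc:creator", namespaces=NS)
link = root.find(".//h:a", NS).get("href")

print(title, "|", creator, "|", link)
```

Each element is identified by its namespace URI plus its local name, so a metadata tool can pick out the `dc:` elements while a browser handles the XHTML.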
Components of XML Implementations: DTDs & XML Schemas • Use Either to: • Define Content Models • Declare Attributes & Entities • DTDs Inherited from SGML • DTDs Themselves Not Well-Formed XML • Limits on Detail of Content Model Definitions • Minimal Data Typing • XML Schemas Are Well-Formed XML • Data Typing & Better Content Models Supported • Not Yet in Widespread Use
Components of XML Implementations: Encoding & Entities (Using Characters Not on Your Keyboard) • Computers Use 1s and 0s, but Characters form the Basis of Human-Readable Texts • Coded Character Sets (CCS) Assign Integer Values to Characters -- ASCII, ISO 8859, Unicode • Character Encoding Schemes (CES) Map Those Integers to Bytes -- 7-bit, 8-bit, UTF-8 • Bytes Are Then Rendered as Glyphs by Your Computer, Using Font Appropriate to CCS/CES • Font Unavailable Or CCS/CES Misunderstood Results in Incorrect Character(s) on Screen
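The chain from character to integer to bytes, and what goes wrong when the encoding is misread, can be seen with one character. This is a small sketch using Python's built-in codecs; the variable names are illustrative.

```python
char = "\u00e8"                        # the character è

code_point = ord(char)                 # CCS step: Unicode assigns it the integer 232
utf8 = char.encode("utf-8")            # CES step: UTF-8 maps 232 to two bytes
latin1 = char.encode("iso-8859-1")     # ISO 8859-1 maps it to a single byte

# Reading UTF-8 bytes as if they were ISO 8859-1 yields two wrong characters.
misread = utf8.decode("iso-8859-1")

# 7-bit ASCII has no code for this character at all.
try:
    char.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(code_point, utf8, latin1, misread, ascii_ok)
```

The same character is one byte in ISO 8859-1 and two in UTF-8, which is why software that guesses the wrong scheme shows the wrong glyphs.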
Components of XML Implementations: Encoding & Entities (continued) • Common Ways to Deal With This Problem: • Select CCS/CES Appropriate to Language • Use Default CCS/CES, but Override Default Font • Use XML/HTML Named or Numeric Entity • HTML Understands Non-Extensible Set of Named Entities • XML Understands Numeric Entities Corresponding to Unicode CCS, All Named Entities Must Be Declared in DTD • Use Unicode for CCS, UTF-8 for CES - XML Defaults • An Illustration in HTML
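The difference between HTML's built-in named entities and XML's declared ones can be demonstrated with the Python standard library. This is a sketch: the `LName` element is borrowed from the earlier author example, and `parse_text` is an illustrative helper.

```python
import html
import xml.etree.ElementTree as ET


def parse_text(xml_text):
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# HTML knows &egrave; without any declaration.
from_html = html.unescape("Col&egrave;")

# XML accepts the numeric reference with no DTD at all.
numeric = parse_text("<LName>Col&#232;</LName>")

# The same named entity is an error in XML unless it is declared.
undeclared = parse_text("<LName>Col&egrave;</LName>")

# Declaring it in the document's internal DTD subset makes it usable.
declared = parse_text(
    '<!DOCTYPE LName [<!ENTITY egrave "&#232;">]><LName>Col&egrave;</LName>'
)

print(from_html, numeric, undeclared, declared)
```

Numeric references always work because they point straight at a Unicode code point; names work only where something has defined them.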
Components of XML Implementations: Presentation - CSS Style Sheets • XML Content Objects Have No Style • Use Cascading Style Sheets (CSS) • Work Like CSS for HTML, Except: • Must Be Explicit About Everything • No Special Treatment of Class & ID Attributes • Attach CSS to XML Using Special XML PI • CSS Does Define Formatting • CSS DOES NOT Reorganize or Add Content • Simple XML-CSS Example; The CSS Used
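The "special XML PI" is the `xml-stylesheet` processing instruction. As a sketch, the helper below builds one and places it ahead of the root element; the file name `author.css`, the sample rules in `AUTHOR_CSS`, and the helper `attach_css` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical style sheet: XML elements have no default style,
# so even block vs. inline display has to be stated.
AUTHOR_CSS = """author { display: block; font-family: serif; }
LName  { display: inline; font-weight: bold; }
FName  { display: inline; }
"""


def attach_css(xml_body, css_href):
    """Return a document string with an xml-stylesheet PI before the root element."""
    pi = ET.ProcessingInstruction(
        "xml-stylesheet", f'type="text/css" href="{css_href}"'
    )
    return (
        "<?xml version='1.0'?>\n"
        + ET.tostring(pi, encoding="unicode")
        + "\n"
        + xml_body
    )


document = attach_css(
    "<author sequence='first'><LName>Cole</LName>, <FName>Tim</FName></author>",
    "author.css",
)
print(document)
```

The style sheet changes how each element looks but leaves the order and content of the elements exactly as they appear in the source.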
Components of XML Implementations: Transformations - XSLT Style Sheets • Some Characteristics of XSLT Style Sheets • XSLT Files Are Well-Formed XML • XSLT Transform to Another Schema, Or to XHTML • XSLT Objects Have Implicit Functionality • Attach XSLT To Document Using XML PI • XSLT Can Reorganize & Add Content • Still Need CSS for Presentation -- CSS Style Sheets Work on the Output of XSLT Processing • Supplement XSLT With Script To Manipulate & Modify Actual Content • Simple XSLT Example; The XSLT Style Sheet
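To illustrate what "reorganize and add content" means, here is a Python stand-in for the kind of transformation an XSLT style sheet performs. It is not XSLT and not the style sheet linked from the slide; `to_xhtml` is a hypothetical helper that turns the author example into an XHTML-style paragraph.

```python
import xml.etree.ElementTree as ET

SOURCE = "<author sequence='first'><LName>Cole</LName><FName>Tim</FName></author>"


def to_xhtml(author):
    """Map an <author> element to a <p>, reordering the names and adding a label."""
    p = ET.Element("p", {"class": "author"})
    label = ET.SubElement(p, "b")
    label.text = "Author: "                      # content not present in the source
    first = author.findtext("FName")
    last = author.findtext("LName")
    label.tail = f"{first} {last}"               # source order was last, then first
    return p


result = ET.tostring(to_xhtml(ET.fromstring(SOURCE)), encoding="unicode")
print(result)
```

The output uses a different vocabulary, a different element order, and an added label, none of which CSS alone could produce; CSS would then style the resulting `p` element.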
The State-of-the-Art in XML Tools • XML Authoring • Add-Ons to Established Word Processors, e.g.: WordPerfect 9 / WordPerfect 2000 • Tools With SGML Roots, e.g.: ArborText’s Epic (was Adept) Editor, SoftQuad’s XMetaL Editor • New XML Tools, e.g.: Vervet Logic’s XMLPro, Extensibility’s XML Authority / XML Turbo • So Far, There Are Fewer Authoring Tools Customized for Specialized XML Schemas
The State-of-the-Art in XML Tools (continued) • XML Presentation Tools: • Latest Releases of Netscape Navigator/Mozilla, and Microsoft’s Internet Explorer Support XML -- But Support is Generic, Partial, & Uneven • Plug-Ins, Standalones Available / In Work for Advanced XML Schemas (CML, MML, VML,…) • XML Database Integration Tools: • Add-Ons to Established DBMS Available / In Work: Microsoft SQL Server-XML Technology Preview • Illustration; With Query & CSS; XML Source File • XML Query Language Specification In Work
Developing XML Applications: The Politics of XML • Evolution of XML • XML Formalized as W3C Recommendation 2/98 • Numerous Ancillary Specs Released & In Work: Namespaces, XSLT, XLink/XPointer, XML Signature • Numerous Early Implementors (Chemistry, Biology, Multimedia, Metadata) • Prerequisites for Community Implementations • Identify Target(s) of Opportunity • Define Horizontal & Vertical Content Objects • Consensus Building & Community Buy-In • Test Implementations & Tool Building
Developing XML Applications: The Politics of XML (continued) • Status of XML In Legal Community • LegalXML Has Identified Targets, Begun Process of Defining Content Objects & Building Consensus • Progress in Some Areas, e.g.: Court Filing (see also XML Court Interface) • Less Visible Progress in Other Workgroups, e.g.: Reference, Public Law, Users • Presence (& Vested Interests) of Extensive Non-XML Legal Automation Systems In Place Lessens Motivation
Developing XML Applications: The Politics of XML (continued) • Status of XML In Publishing & Libraries • Extensive XML Work in Metadata Unfortunately Has Led to Competing Stds. • Many Publishers Have Been Using SGML for a Decade or More -- But Only Internally • Perceived Tradeoff (probably overrated): Publicly Releasing Primary Sources in XML vs. Control of Product & Marketplace • Problems with Early SGML Web Experiments • No One Wants to be First, But No One Wants to be Last Either
Future Directions • Continued Evolution of Standards, Tools • Continued Development of Community Implementations -- Selected Disciplines • Increased Use of XML Behind the Scenes • Carryover from SGML Trends • Integration of XML with Databases • XML Unlikely to Replace HTML, Other Document Formats, But Will Co-Exist • Magnitude of Role in Law Libraries Uncertain, but Likely to Have At Least Some Role