XML Tutorial, Bertram Ludäscher
1. XML: The Big Picture and Some Gory Details
(A brief tutorial with an eye towards e-records and archival)
Bertram Ludaescher
 ludaesch@sdsc.edu
Data Intensive Computing Environments (DICE) Group
San Diego Supercomputer Center, UCSD 
2. DICE Members
Staff
Reagan Moore
Chaitan Baru
Amarnath Gupta
Bertram Ludäscher
Richard Marciano
Arcot Rajasekar
Wayne Schroeder
Michael Wan
Ilya Zaslavsky
Bing Zhu
+ NN * 4
Students
Pratik Mukhopadhyay
Azra Mulic
Kevin Munroe
Paul Nguyen
Michail Petropolis
Nicholas Puz
Pavel Velikhov
+/- NN 
3. Tutorial Outline
Roadmap & Overview
What about XML vs. E-records and Archives? (or: why it’s good to be here ;-)
XML 101 
XML 232
Querying & Transforming XML
Mediation of Information using XML (MIX)
Other Projects... 
4. Some History (or: from fat via lean…
SGML (Standard Generalized Markup Language)
ISO Standard, 1986, for data storage & exchange
Metalanguage for defining languages (through DTDs) 
A famous SGML language: HTML!!
Separation of content and display
Used in U.S. gvt. & contractors, large manufacturing companies, technical info. publishers,...
SGML reference is 600 pages long
XML (eXtensible Markup Language)
W3C (World Wide Web Consortium, http://www.w3.org/XML/) Recommendation in 1998
Simple subset (80/20 rule) of SGML: “ASCII of the Web”, “Semantic Web”.
XML specification is 26 pages long
 
5. … to skinny and back!)
Canonical XML
“normalization”, equivalence testing of XML documents
SML (Simple Markup Language)
“Reduce to the max”: No Attributes / No Processing Instructions (PI)  / No DTD / No non-character entity-references / No CDATA marked sections / Support for only UTF-8 character encoding / No optional features 
XML Schema
XML Schema definition language 
Back to complex: 
Part I (Structures), Part II (Data Types), Part III aehm 0 (Primer)  
X-Zoo (Xoo?), “Brave New X-World”
Specifications CSS • Digital Signatures • ebxml Project Teams • ebXML • IETF Specifications • Internationalization • IOTP (Internet Open Trading Protocol) • OASIS • Requirements Documents • SMIL • SVG (Scalable Vector Graphics) • Topic Maps • W3C Activity Pages • W3C Notes • W3C Standards • W3C Standards-in-progress • WAP • WebDAV • XHTML • XLink • XPath • XSLT
Vocabularies DTDs • Music • P3P • RDF • RSS • SMIL • W3C Standards • W3C Standards-in-progress • WML • XHTML • XSL FO's • XSLT • XUL
 Vertical Industries Advertising • Commerce • Consortiums • Construction • Food • Insurance • Legal • Medical • Music • OASIS • Real Estate • Science • Space Exploration • Telecommunications • Travel • Weather 
6. … but … FEAR NOT!
7. Back to the Future (or Archival for the Past...)
A time traveler sends a message in the virtual bottle, containing parts of the universal library of human and post-human mankind back into the last third of the 20th century...
 ... when the Web, XML, WAP, B2B, and Petabytes were unheard of 
 ... RAM was so precious that it was ok to deal with nibbles 
 ... MS-DOS was still called CP/M 
 ... and in fact Bill hadn’t moved into the garage yet but worked on a homework assignment by Christos, trying to sort pancakes faster (Gates, W.H. and Papadimitriou, C. "Bounds for Sorting by Prefix Reversal." Discr. Math. 27, 47-57, 1979.) 
Task: make sense out of the futuristic message in the past!
 
8. Our past futurist’s (future archeologist’s?) supercomputer looked like this …
9. Message in the bottle: 1
ÐÏ^Qà¡±^Zá^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@>^@^C^@þÿ     ^@^F^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@#^@^@^@^@^@^@^@^@^P^@^@%^@^@^@^A^@^@^@þÿÿÿ^@^@^@^@"^@^@^@ÿÿÿÿÿÿÿÿÿÿÿÿ…ÿÿÿÿÿÿÿÿÿÿÿÿì¥Á^@q^@^D^@^@^@^R¿^@^@^@^@^@^@^P^@^@^@^@^@^D^@^@Ç^G^@^@^N^@bjbjt+t+^@^@^@
^@Some Quotations from the Universal Library^M1 Famous Quotes^M1.1 By William I^M[2, Sonnet XVIII]^MShall I compare thee to a summer's day?^MThou art more lovely and more temperate.^MRough winds do shake the darling buds of May,^MAnd summer's lease hath all too short a date.^MSometime too hot the eye of heaven shines,^MAnd often is his gold complexion dimmed.^MAnd every fair from fair some declines,^MBy chance or nature's changing course untrimmed.^MBut thy eternal summer shall not fade,^MNor lose possession of that fair thou owest,^MNor shall Death brag thou wander'st in his shade^MWhile in eternal lines to time thou growest.^MSo long as men can breathe, or eyes can see,^MSo long live this, and this gives life to thee.^M1.2 By William II^M[1, p.265]^M\223The obvious mathematical breakthrough would be development of^Man easy way to factor large prime numbers."^MReferences^M[1] W. H. Gates. The Road Ahead. Viking Penguin, 1995.^M[2] W. Shakespeare. The Sonnets of Shakespeare.609.^M^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ^A^@þÿ^C^@^@ÿÿÿÿ^F^B^@^@^@^@^@À^@^@^@^@^@^@F^X^@^@^@Microsoft Word Document^@^@^@^@MSWordDoc^@^P^@^@^@Word.Document.8^@ô9²q^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^ 
10. Message in the bottle: 2
{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose02020603050405020304}Times New Roman;}\
{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}^M 
{\f17\froman\fcharset238\fprq2 Times New Roman CE;}{\f18\froman\fcharset204\fprq2 Times New Roman Cyr;}{\f20\froman\fcharset161\fprq2 Times New R\
oman Greek;}{\f21\froman\fcharset162\fprq2 Times New Roman Tur;}^M 
…
Some Quotations from the Universal Library^M 
\par }\pard\plain \s2\sb240\sa60\keepn\widctlpar\outlinelevel1\adjustright \b\i\f1\cgrid {\cgrid0 1 Famous Quotes^M 
\par }\pard\plain \s3\sb240\sa60\keepn\widctlpar\outlinelevel2\adjustright \f1\cgrid {\cgrid0 1.1 By William I^M 
\par }\pard\plain \s4\sb240\sa60\keepn\widctlpar\outlinelevel3\adjustright \b\f1\cgrid {\cgrid0 [2, Sonnet XVIII]^M 
\par }\pard\plain \widctlpar\adjustright \fs20\cgrid {\f1\fs24\cgrid0 Shall I compare thee to a summer's day?^M 
\par Thou art more lovely and more temperate.^M 
\par Rough winds do shake the darling buds of May,^M 
… 
\par }\pard\plain \s3\sb240\sa60\keepn\widctlpar\outlinelevel2\adjustright \f1\cgrid {\cgrid0 1.2 By William II^M 
\par }\pard\plain \s4\sb240\sa60\keepn\widctlpar\outlinelevel3\adjustright \b\f1\cgrid {\cgrid0 [1, p.265]^M 
\par }\pard\plain \widctlpar\adjustright \fs20\cgrid {\f1\fs24\cgrid0 \ldblquote The obvious mathematical breakthrough would be development of^M 
\par an easy way to factor large prime numbers."^M 
\par }\pard\plain \s2\sb240\sa60\keepn\widctlpar\outlinelevel1\adjustright \b\i\f1\cgrid {\cgrid0 References^M 
\par }\pard\plain \widctlpar\adjustright \fs20\cgrid {\f1\fs24\cgrid0 [1] W. H. Gates. The Road Ahead. Viking Penguin, 1995.^M 
\par [2] W. Shakespeare. The Sonnets of Shakespeare. 1609.}{\fs28 ^M 
\par }} 
 
11. Message in the bottle: 3
%!PS-Adobe-2.0
%%Creator: dvipsk 5.58f Copyright 1986, 1994 Radical Eye Software 
%%Title: msg.dvi 
%%Pages: 1 
…
/X{S N}B /TR{translate}N /isls false N /vsize 11 72 mul N /hsize 8.5 72 
mul N /landplus90{false}def /@rigin{isls{[0 landplus90{1 -1}{-1 1} 
ifelse 0 0 0]concat}if 72 Resolution div 72 VResolution div neg scale 
…
TeXDict begin 39158280 55380996 1000 600 600 (msg.dvi) 
@start /Fa 16 117 df<0000000001C0000000000003C0000000000003C00000000000 
07C000000000000FC000000000000FC000000000001FC000000000001FE000000000003F 
E000000000003FE000000000007FE00000000000FFE00000000000EFE00000000001EFE0 
0000000001CFE000000000038FE000000000038FE000000000070FE000000000070FE0 
…
%%EndSetup 
1 0 bop 659 872 a Ff(Some)44 b(Quotations)f(from)f(the)i(Univ)l(ersal)h 
(Library)515 1470 y Fe(1)134 b(F)-11 b(amous)45 b(Quotes)515 
1669 y Fd(1.1)112 b(By)37 b(William)d(I)515 1822 y Fc([2)o(,)d(Sonnet)h 
(XVI)s(I)s(I])722 2004 y Fb(Shall)c(I)g(compare)e(thee)i(to)f(a)g 
(summer's)g(da)n(y?)722 2104 y(Thou)h(art)f(more)f(lo)n(v)n(ely)h(and)g 
(more)g(temp)r(erate.)722 2204 y(Rough)g(winds)h(do)f(shak)n(e)g(the)h 
(darling)e(buds)i(of)g(Ma)n(y)-7 b(,)722 2303 y(And)28 
b(summer's)g(lease)e(hath)i(all)f(to)r(o)h(short)f(a)g(date.)722 
2403 y(Sometime)h(to)r(o)f(hot)h(the)g(ey)n(e)f(of)h(hea)n(v)n(en)e 
(shines,)722 2503 y(And)i(often)g(is)g(his)f(gold)g(complexion)g  
12. Message in the bottle: 4
\documentclass{article}
 \begin{document} 
 \title{Some Quotations from the Universal Library} 
...
\section{Famous Quotes} 
 \subsection{By William I} 
 \textbf{\cite[Sonnet XVIII]{shakespeare-sonnets-1609}} 
 \begin{verse} 
  Shall I compare thee to a summer's day?\\ 
  Thou art more lovely and more temperate. \\ 
  Rough winds do shake the darling buds of May, \\ 
  And summer's lease hath all too short a date. \\ 
  Sometime too hot the eye of heaven shines, \\ 
  And often is his gold complexion dimmed. \\ 
…
  \qquad So long as men can breathe, or eyes can see,\\ 
  \qquad So long live this, and this gives life to thee.   \\ 
\end{verse} 
...
 \bibliographystyle{abbrv} 
\bibliography{msg} 
 
\end{document}  
13. Message in the bottle: 5
<HTML>
<HEAD> 
<TITLE>Some Quotations from the Universal Library</TITLE> 
</HEAD> 
<BODY> 
 
<B><FONT FACE="Arial" SIZE=5><P>Some Quotations from the Universal Library</P> 
</FONT><I><FONT FACE="Arial"><P>1 Famous Quotes</P> 
</B></I><P>1.1 By William I</P> 
<B><P>[2, Sonnet XVIII]</P></B>
<P>Shall I compare thee to a summer's day?</P> 
<P>Thou art more lovely and more temperate.</P> 
<P>Rough winds do shake the darling buds of May,</P> 
<P>And summer's lease hath all too short a date.</P> 
<P>Sometime too hot the eye of heaven shines,</P> 
<P>And often is his gold complexion dimmed.</P> 
...
<P>So long as men can breathe, or eyes can see,</P> 
<P>So long live this, and this gives life to thee.</P> 
<P>1.2 By William II</P> 
<B><P>[1, p.265]</P> 
</B><P>"The obvious mathematical breakthrough would be development of</P> 
<P>an easy way to factor large prime numbers."</P> 
<B><I><P>References</P> 
</B></I><P>[1] W. H. Gates. The Road Ahead. Viking Penguin, 1995.</P> 
<P>[2] W. Shakespeare. The Sonnets of Shakespeare. 1609.</P></FONT></BODY> 
</HTML> 
 
 
14. Message in the bottle: 6
<?xml version="1.0"?>
<universal_library> 
  <books> 
    <book> <title>Some Quotations from the Universal Library</title> 
      <section> <title>Famous Quotes</title> 
        <subsection>  <title>By William I</title> 
          <quote bibref="shakespeare-sonnets-1609"> 
          <title>Sonnet XVIII</title> 
          <verse> 
            <line>Shall I compare thee to a summer's day?</line> 
            <line>Thou art more lovely and more temperate. </line> 
            <line>Rough winds do shake the darling buds of May, </line> 
          </verse>
  …
      <subsection> <title>By William II</title>         
        <quote bibref="gates-road-ahead-1995"> 
          <title>Page 265</title>  
          <line>"The obvious mathematical breakthrough would be development of an easy way to factor large prime numbers."</line>
        </quote> 
      </subsection> 
      </section> 
</book> 
… 
</books>
</universal_library> 
 
15. XML as a Self-Describing Format
can be “understood” using any (archaic CP/M) editor
can be parsed easily
contains its own structure (=parse tree) in the data
=> allows the e-archeologist to rediscover schema and content (=semantics!?)
may also include an explicit schema description (DTD)
=> “meta-model”: definition of a language w.r.t. which it is valid
allows separation of marked-up content from presentation (=>style sheets)
as a self-describing format good for “archival into the past” => not bad for archival into the future 
16. Some thoughts on how XML can help with e-record management...
Assumption: represent e-records in XML
=> self-describing format (good for archival)
=> get a semistructured data model (flexible: encode regular tables, nested structures, objects, or even (cleaned up) HTML)
=> many tools (and many more to come -- (re)use code): 
parsers, validators, query languages, storage
=> standards (good for interoperation, integration, etc):
generic standards (XML, DTDs, XML Schema, XPath,...)
community/industry standards (=specific markup languages) 
17. ...thoughts continued
“E-Record Quality Assurance”:
by “subscribing” to a certain XML DTD/XML Schema/XML ???, you can make sure that “the same language is spoken”
validation using DTDs provides a first simple quality control:
are the right tags used? 
is the nesting of elements ok w.r.t. the DTD? 
is the order and multiplicity of elements ok?
if you need more => use validation w.r.t. an XML Schema
now: check also data types
use specialization and other mechanisms from object-oriented modeling 
more integrity checking possible (cardinalities,…)
still want more integrity checks (ICs) or even “policies”?
=> use a declarative rule language for specifying the constraints and policies at design time. Implement them at run time, e.g., by adding the ICs to the XML DTD/Schema/…
=> checking ICs and policies is similar to issuing specific queries against the data
=> use query processors (relational DBs, XML DBs, XML tools) for integrity checking when possible
=> for evolution of records, look at versioning models for databases and temporal database models and query languages
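A minimal sketch of "checking an IC is like running a query", using Python's standard `xml.etree.ElementTree`. The constraint and the `references`/`reference`/`id` markup are hypothetical (the sample record does not define them): every `quote/@bibref` must name an existing `reference/@id`. The query returns the violations, so an empty answer means the constraint holds.

```python
import xml.etree.ElementTree as ET

# Hypothetical e-record: quotes cite references via bibref -> reference/@id.
RECORD = """<universal_library>
  <quote bibref="shakespeare-sonnets-1609"><title>Sonnet XVIII</title></quote>
  <quote bibref="gates-road-ahead-1995"><title>Page 265</title></quote>
  <quote bibref="unknown-2001"><title>Orphan</title></quote>
  <references>
    <reference id="shakespeare-sonnets-1609"/>
    <reference id="gates-road-ahead-1995"/>
  </references>
</universal_library>"""


def dangling_bibrefs(root: ET.Element) -> list[str]:
    """IC as a query: return every bibref with no matching reference id.

    An empty result means the constraint holds.
    """
    known_ids = {ref.get("id") for ref in root.iter("reference")}
    return [q.get("bibref") for q in root.iter("quote")
            if q.get("bibref") not in known_ids]


violations = dangling_bibrefs(ET.fromstring(RECORD))
print(violations)  # ['unknown-2001']
```

This particular constraint could also be declared in a DTD with `ID`/`IDREF` attribute types; the query formulation carries over to constraints a DTD cannot express.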
18. Back to XML: Different Perspectives
Document (SGML) Community
data = linear text documents
mark up (annotate) text pieces to describe context, structure, semantics of the marked text
Database Community
XML as a (most prominent) example of the semistructured data model
=> captures the whole spectrum from highly structured, regular data to unstructured data 
19. More Perspectives on XML
“XML is the cure for your data exchange, information integration, e-commerce, [x-2-y, U name it] problems” (“snake oil/silver bullet theory”)
“XML is nothing but (another) syntax (for Lisp, trees,…)” (“nothing new under the sun”)
(books (book (author "Shakespeare")
             (title "Sonnets")
             (verse (line "Shall I compare…")
                    (line …) …)))
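To make the "just another syntax for trees" point concrete, here is a small sketch that prints an XML tree as an s-expression like the one above, using Python's standard `xml.etree.ElementTree`. The `to_sexpr` helper is hypothetical and deliberately simple: it ignores attributes and mixed content and does not escape quotes.

```python
import xml.etree.ElementTree as ET

DOC = """<books>
  <book>
    <author>Shakespeare</author>
    <title>Sonnets</title>
    <verse><line>Shall I compare thee to a summer's day?</line></verse>
  </book>
</books>"""


def to_sexpr(elem: ET.Element) -> str:
    """Render an element tree as a Lisp-style s-expression.

    Leaves become (tag "text"); inner nodes become (tag child1 child2 ...).
    Attributes, mixed content, and quote escaping are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        return f'({elem.tag} "{text}")'
    inner = " ".join(to_sexpr(child) for child in children)
    return f"({elem.tag} {inner})"


print(to_sexpr(ET.fromstring(DOC)))
# (books (book (author "Shakespeare") (title "Sonnets") (verse (line "Shall I compare thee to a summer's day?"))))
```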
20. So what is XML (all about)?
Executive Summary:
XML = HTML – idiosyncrasies (simplified syntax)
          + user-definable ("semantic") tags
Separation of data and its presentation
=> simple, very flexible data exchange format:
      semistructured data model 
=> new applications: 
Information exchange (B2B), sharing (diglib), integration ("mediation"), archival, ...
Web site management (XML+XSL stylesheets), ...
21. Many X-cellent(?) Acronyms...
XML (Extensible Markup Language)
XML Namespaces
XML DTDs, XML Schema
RDF (Resource Description Framework)
XSL (Extensible Stylesheet Language)
XPath (= XSLT ∩ XPointer), XLink
XQL, XML-QL (XML Query Language)
XMAS (XML Matching And Structuring language)
eXcelon, ... 
=> XML++ (i.e. += X-tensions)  >>  just syntax
=> a family of technologies (XML extensions, tools, ... )
=> generic standards and industry/community standards 
22. XML Applications & Industry Initiatives
http://www.oasis-open.org/cover/xml.html#applications
Advertising: adXML place an ad onto an ad network or to a single vendor
Literature: Gutenberg convert the world’s great literature into XML
Directories: dirXML Novell’s Directory Services Markup Language (DSML)
Web Servers: apacheXML parsers, XSL, web publishing
Travel: openTravel information for airlines, hotels, and car rental places
News: NewsML creation, transfer and delivery of news
Human Resources: XML-HR standardization of HR/electronic recruiting XML definitions
International Dvt: IDML improve the mgt. and exchange of info. for sustainable development
Voice: VoxML markup language for voice applications
Wireless: WAP (Wireless Application Protocol) wireless devices on the World Wide Web
Weather:  OMF Weather Observation Markup Format (simulation) 
Geospatial: ANZMETA  distributed national directory for land information
Banking: MBA  Mortgage Bankers Association of America --> credit report, loan file, underwriting…
Healthcare: HL7  DTDs for prescriptions, policies & procedures, clinical trials
Math: MathML  (Mathematical Markup Language)
Surveys: DDI  (Data Documentation Initiative) “codebooks” in the social and behavioral sciences
 
23. XML E-commerce Initiatives
CommerceNet
eCo Framework XML specs. to support interoperability among e-businesses
Commerce One Common Business Library (CBL): set of business components, docs. in DTD, XDR, SOX
BizTalk Microsoft spec. based on XML schemas
cXML (Commerce XML) -- tag-sets for e-procurement into BizTalk
Electronic Data Interchange (EDI)
RosettaNet Common format for online ordering
FpML (Financial products Markup Language): sharing of financial data (interest rate & foreign exchange products)
Open Buying on the Internet (OBI)
OBI high volume b2b purchasing transactions over the Internet (Office Depot, Lockheed, barnesandnoble, AX...
E-commerce and XML
VISA Invoices The Visa Extensible Markup Language (XML) Invoice Specification provides a comprehensive list of data elements contained in most invoices, including: Buyer/Supplier, Shipping, Tax, Payment, Currency, Discount, and Line Item Detail. 
B2B Integration
code360 XML-Broker is middleware software that manages XML based transactions
Bluestone XML Suite Enables companies to develop and deploy e-commerce, electronic data interchange, application integration and supply chain management applications. Bluestone XML Suite products include: XML-Server, Visual-XML, XML-Contact and XwingML.
webMethods Provides companies with integrated direct links to buyers and suppliers
 
24. What’s Wrong with HTML?
25. ...What’s Wrong with HTML...
26. ... And Some Repercussions
Lack of schema/semantics when querying the Web (HTML):
"find documents (books, papers, ...) where author = Michael Jackson"
(... and learn how software engineering meets the moon walker ...)
"create a list of M. Jackson's books and  (if available) their prices"
 => HTML is inappropriate for
data exchange
automation of information management (retrieval, manipulation, integration)
27. XML is Based on Markup
28. Elements and their Content
29. Element Attributes
30. XML = Labeled Ordered Trees
31. In Search of the Lost Structure & Semantics
32. Adding Structure and Semantics
XML Document Type Definitions (DTDs):
define the structure of "allowed" documents (i.e., valid wrt. a DTD)
≈ database schema
=> improve query formulation, execution, ...  
XML Schema 
 defines structure and data types 
allows developers to build their own libraries of interchanged data types
XML Namespaces
identify your vocabulary 
33. XML DTDs as Extended CFGs
34. Document Type Definitions (DTDs)
35. Element Declarations
36. Element Content Declarations
37. Attributes
38. Attribute Types
39. Uses of XML Entities
Physical partition
size, reuse, "modularity", … (both XML docs & DTDs)
Non-XML data
unparsed entities (≈ binary data)
Non-standard characters
character entities
Shorthand for phrases & markup
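A small sketch of two of these uses, a character reference for a non-standard character and an internal text entity as a shorthand for a phrase, parsed with Python's standard `xml.etree.ElementTree`. The entity name `ul` is made up for the example. The expat-based parser expands entities declared in the document's internal DTD subset while parsing, so the application only sees the replacement text.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ENTITY ul "Some Quotations from the Universal Library">
]>
<library>
  <title>&ul;</title>
  <author>Bertram Lud&#xE4;scher</author>
</library>"""

root = ET.fromstring(DOC)
title = root.findtext("title")    # internal entity &ul; expanded by the parser
author = root.findtext("author")  # character reference &#xE4; becomes "ä"
print(title)   # Some Quotations from the Universal Library
print(author)  # Bertram Ludäscher
```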
 
40. Entities & Physical Structure
41. External Text Entities
42. Types of Entities
Internal (to a doc) vs. External (→ use URI)
General (in XML doc) vs. Parameter (in DTD)
Parsed (XML) vs. Unparsed (non-XML) 
 
43. Internal Text Entities
44. Unparsed (& "Binary") Entities
45. From Docs to Data: XML Schema
XML DTDs (part of the XML spec.)
flexible, semistructured data model (nesting, ANY, ?, *, |, ...)   
but document-oriented (SGML heritage)
no support for namespaces, datatypes, inheritance (e.g., type of book.title may be different from poem.title)
XML Schema (W3C working draft)
schema definition language in XML
data-oriented: data types
extends capabilities of DTD 
46. XML Schema: Example
<type name="Order">
    <element name="name"   type="string" />
    <element name="street" type="string" />
    <element name="zip"    type="integer" />
    <...>
    <attribute name="orderDate" type="date" />
</type>
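Python's standard library has no XML Schema processor, so the following is only a hand-written sketch of the kind of datatype checking the `Order` type above asks for. The `<order>` instance element, the `type_errors` helper, and the regex-based lexical rules are assumptions that approximate `string`, `integer`, and `date`; they are not the XML Schema definitions.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date


def xs_integer(text: str) -> int:
    """Approximate integer: optional sign followed by digits."""
    if not re.fullmatch(r"[+-]?[0-9]+", text.strip()):
        raise ValueError(text)
    return int(text)


def xs_date(text: str) -> date:
    """Approximate date: YYYY-MM-DD only, no time zone handling."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text.strip()):
        raise ValueError(text)
    return date.fromisoformat(text.strip())  # also rejects e.g. month 13


# Datatypes from the Order example: name -> (type name, parser)
ELEMENT_TYPES = {"name": ("string", str), "street": ("string", str),
                 "zip": ("integer", xs_integer)}
ATTRIBUTE_TYPES = {"orderDate": ("date", xs_date)}


def type_errors(order: ET.Element) -> list[str]:
    """Return one message per missing or ill-typed value; [] means all checks passed."""
    errors = []
    for tag, (type_name, parse) in ELEMENT_TYPES.items():
        child = order.find(tag)
        if child is None:
            errors.append(f"missing <{tag}>")
            continue
        try:
            parse(child.text or "")
        except ValueError:
            errors.append(f"<{tag}> {child.text!r} is not a valid {type_name}")
    for attr, (type_name, parse) in ATTRIBUTE_TYPES.items():
        try:
            parse(order.get(attr, ""))
        except ValueError:
            errors.append(f"@{attr} {order.get(attr)!r} is not a valid {type_name}")
    return errors


good = ET.fromstring('<order orderDate="1999-10-20"><name>Alice</name>'
                     '<street>Main St</street><zip>92093</zip></order>')
bad = ET.fromstring('<order orderDate="20/10/1999"><name>Bob</name>'
                    '<street>Elm St</street><zip>9209x</zip></order>')
print(type_errors(good))  # []
print(type_errors(bad))
```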
 
47. XML Schema: Example
48. W3C Work on XML Schemas
Structures:
Specify complex element structure and 
Set constraints on the permitted values of the content of those elements
Datatypes:
Sets forth a standard of content datatypes and
Sets rules for generating new types from them 
49. Further Approaches
RELAX (REgular LAnguage description for XML)
Standardized by INSTAC XML SWG of Japan.  
Compared with DTD, RELAX has new features:
RELAX grammars are represented in the XML instance syntax 
RELAX borrows rich data types of XML Schema Part 2 
RELAX is namespace-aware 
many others
 XML-Data, XML-DR, DCD, SOX, DDML, DSD, Schematron...
 
50. Normalized Data/Metadata Representation
Resource Description Framework (RDF)
Metadata model
The designer can describe objects, add properties to define and describe them, and also make complicated statements about the objects (statements about relationships between resources).
The specification comes in two sections:
Model & Syntax (viewed as directed, labeled graphs)
RDF Schemas (using an XML vocabulary) 
51. Resource Description Framework (RDF)
Metadata is useful for information retrieval (esp. if no other schema info or semantics is available)
Idea: representation independent encoding of metadata as triples (Resource, PropertyType, Value):
(uri1, DC:creator, uri2), (uri2, vCard:name, smith), ...
"Semantic Net"
 
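A toy sketch of the triple idea in plain Python (no RDF library): the two triples from the slide plus a hypothetical third one, and a pattern-matching lookup where `None` acts as a wildcard. Prefixes such as `DC:` and `vCard:` are kept as opaque strings here rather than expanded to full URIs.

```python
# (Resource, PropertyType, Value) triples; "uri3" is a hypothetical extra resource.
TRIPLES = {
    ("uri1", "DC:creator", "uri2"),
    ("uri2", "vCard:name", "smith"),
    ("uri3", "DC:creator", "uri2"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(t for t in TRIPLES
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


# "Which resources were created by the person named smith?"
creator = match(p="vCard:name", o="smith")[0][0]
works = [s for s, _, _ in match(p="DC:creator", o=creator)]
print(creator, works)  # uri2 ['uri1', 'uri3']
```

Because every statement has the same three-part shape, the same `match` call answers questions about any property, which is what makes the encoding independent of a particular schema.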
52. Identifying Vocabularies
My element may not be your element:
geometry context: <element>line</element> 
chemistry context: <element>oxygen</element> 
SGML/XML context: ....
  use XML namespaces to identify the vocabulary
 
53. XML Namespaces
mechanism for globally unique tag names:
 <h:html xmlns:xdc="http://www.xml.com/books"
         xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  ...
  <xdc:bookreview>
    <xdc:title>XML: A Primer</xdc:title>
  ... 
 </h:html>
mix of different tag vocabularies without confusion
namespaces only identify the vocabulary; additional mechanisms required for structure and meaning of tags 
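A sketch of how a namespace-aware parser sees the example above, using Python's standard `xml.etree.ElementTree`. The `...` parts of the slide are omitted here (the review sits directly under `h:html`). ElementTree expands each prefixed name to `{namespace-URI}local-name`, so the two `title` elements stay distinct even though they share a local name; the `NS` prefix map is only used for lookups.

```python
import xml.etree.ElementTree as ET

DOC = """<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <xdc:bookreview>
    <xdc:title>XML: A Primer</xdc:title>
  </xdc:bookreview>
</h:html>"""

# Prefixes for find(); they need not match the prefixes used in the document.
NS = {"h": "http://www.w3.org/HTML/1998/html4", "xdc": "http://www.xml.com/books"}

root = ET.fromstring(DOC)
html_title = root.find("h:head/h:title", NS).text
book_title = root.find("xdc:bookreview/xdc:title", NS).text
title_tags = sorted(e.tag for e in root.iter() if e.tag.endswith("}title"))

print(root.tag)     # {http://www.w3.org/HTML/1998/html4}html
print(html_title)   # Book Review
print(book_title)   # XML: A Primer
print(title_tags)   # two different expanded names for "title"
```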
54. Processing XML
Non-validating parser:
checks that XML doc is syntactically well-formed
Validating parser:
checks that XML doc is also valid w.r.t. a given DTD
Parsing yields tree/object representation:
Document Object Model (DOM) API 
 Or a stream of events (open/close tag, data):
Simple API for XML (SAX)
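Python's standard-library parsers are built on expat and do not validate against a DTD, so they illustrate only the first kind of check. A minimal well-formedness test with a hypothetical `is_well_formed` helper; the second case reproduces the tag misnesting found in the HTML version of the message in the bottle.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Non-validating check: True iff the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<b><i>fine</i></b>"))   # True
print(is_well_formed("<B><I>oops</B></I>"))   # False: improperly nested tags
print(is_well_formed("<p>unclosed"))          # False: missing end tag
```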
 
55. DOM Structure Model and API
hierarchy of Node objects:
document, element, attribute, text, comment, ...
language independent programming DOM API: 
get... first/last child, prev/next sibling, childNodes
insertBefore, replaceChild
getElementsByTagName
... 
alternative event-based SAX API (Simple API for XML)
does not build a parse tree (reports events when encountering begin/end tags)
for (partially) parsing large documents
 
56. DOM Summary
Object-Oriented approach to traverse the XML node tree
Automatic processing of XML docs
Manipulation & Updating of XML on client & server
Database interoperability mechanism
Memory-intensive 
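A sketch of the DOM calls listed for the DOM API, using Python's `xml.dom.minidom` (a lightweight W3C DOM implementation in the standard library). The book data is illustrative; the document has no whitespace between tags so that `firstChild` and `nextSibling` land on elements rather than on text nodes.

```python
from xml.dom import minidom

DOC = ("<books><book><title>Sonnets</title></book>"
       "<book><title>The Road Ahead</title></book></books>")

dom = minidom.parseString(DOC)
books = dom.documentElement

# Navigation: getElementsByTagName, firstChild, nextSibling
titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]
first_book = books.firstChild
second_book = first_book.nextSibling

# Update: build a new <book> node and insert it between the existing two
new_book = dom.createElement("book")
new_title = dom.createElement("title")
new_title.appendChild(dom.createTextNode("XML: A Primer"))
new_book.appendChild(new_title)
books.insertBefore(new_book, second_book)

order = [b.getElementsByTagName("title")[0].firstChild.data
         for b in books.childNodes]
print(titles)  # ['Sonnets', 'The Road Ahead']
print(order)   # ['Sonnets', 'XML: A Primer', 'The Road Ahead']
```

The whole tree lives in memory, which is what makes arbitrary navigation and updates easy and also what makes DOM memory-intensive for large documents.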
57. SAX Event-Based API
Pros:
The whole file doesn’t need to be loaded into memory
XML stream processing
Simple and fast
Allows you to ignore less interesting data
Cons:
 limited expressive power (query/update) when working on streams
=> application needs to build (some) parse-tree when necessary
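A sketch of event-based processing with Python's standard `xml.sax`. The `TitleCollector` handler is hypothetical; it keeps no tree, only a buffer for the `<title>` currently being read, which illustrates both the "ignore less interesting data" advantage and the cost of tracking context yourself.

```python
import xml.sax

DOC = b"""<books>
  <book><title>Sonnets</title><author>Shakespeare</author></book>
  <book><title>The Road Ahead</title><author>Gates</author></book>
</books>"""


class TitleCollector(xml.sax.ContentHandler):
    """Keeps only <title> text; all other events are ignored as they stream by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # list of text chunks while inside <title>, else None

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may be delivered in several chunks, so collect them all.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


handler = TitleCollector()
xml.sax.parseString(DOC, handler)
print(handler.titles)  # ['Sonnets', 'The Road Ahead']
```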
 
58. Querying XML
What can be done to XML so far:
generation: from HTML, DBs, manually, …
parsing: with/without DTD (valid/well-formed XML)
accessing: APIs for XML applications:
DOM  (in memory, tree-based), SAX (event-based) 
Now: Query languages for XML
XML-QL, XMAS, XPath, XSL(T), XQL, ... 
59. Querying XML
Why not just query XML with SAX or DOM?
SAX: very simple “event-based” queries: ok
DOM: simple navigational queries (getChildNodes, getNextSibling, getElementsByTagName,…): ok
But: these are “low-level” APIs 
≈ iterator/cursor API for RDBs (but more powerful!)
used to write XML applications
“high-level” querying, restructuring and transformation (and updates??) is tedious 
=> analogue to high-level relational query languages (SQL, QBE, Logic (Datalog), …) 
=> Query languages for XML
 
60. Querying XML
No "official" W3C XML QL yet (but bits and pieces)
numerous quite different XML QLs are popping up
some XML QL overviews, comparisons, and resources:
XML Query Languages: Experiences and Exemplars (co-authored by several XML QL gurus)
XML and Query Languages (Oasis Cover Pages)
Comparative Analysis of Five XML Query Languages (A. Bonifati, S. Ceri)
A Data Model and Algebra for XML Query (Philip Wadler et al.; “functional (Haskell) perspective”)
XML-QL vs XSLT queries (Geert Jan Bex and Frank Neven; for (future) XSLT experts only ;-)
Introduction to XMAS (the XML QL of the MIX project)
children of the “(semistructured) database(s) crowd”:
XML-QL, YaTL, Lorel, …
… from the “functional crowd”: 
… from the “document processing folks”: 
XQL, XSL(T), XPath, ... 
XPath: W3C Recommendation  
Powerful pattern language for selecting parts of XML docs
Used by XSL(T), XPointer, and XQL 
XQL 
based on XPath,   
Browser: IE5
XML DBs: Excelon, Tamino, 
Perl, …
 
61. Querying XML
Different XML QL paradigms depending on the community:
(relational, oo, semistructured) database perspective
Lorel, YaTL, XML-QL, XMAS, FLORA/FLORID, ...
document processing perspective
XQL, XSL(T), XPath, ... 
functional programming perspective
QLs with structural recursion, … 
62. Important QL Features (DB Perspective)
typical parts of a query: 
(match) pattern (selects parts of the source XML tree without looking at data)
filter condition (selects further, now looking at the data)
answer construction (putting the results together, possibly reordered, grouped, etc.)
reordering based on nested queries, grouping, sorting, or Skolem functions 
tag variables, path expressions for defining the patterns without requiring knowledge of the DTD
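A sketch of the three parts of a query, written by hand with Python's standard `xml.etree.ElementTree` rather than in an XML query language. The `bib` data and the year cutoff are illustrative; grouping the surviving titles by author in a freshly constructed tree stands in for the "answer construction" step.

```python
import xml.etree.ElementTree as ET

SOURCE = """<bib>
  <book year="1609"><author>Shakespeare</author><title>Sonnets</title></book>
  <book year="1995"><author>Gates</author><title>The Road Ahead</title></book>
  <book year="1999"><author>Gates</author><title>Business @ the Speed of Thought</title></book>
</bib>"""

root = ET.fromstring(SOURCE)

# 1. Pattern: select the book elements (structure only, no data inspected yet).
books = root.findall("book")

# 2. Filter condition: now look at the data.
recent = [b for b in books if int(b.get("year")) >= 1900]

# 3. Answer construction: group the surviving titles by author in a new tree.
result = ET.Element("authors")
by_author = {}
for b in recent:
    name = b.findtext("author")
    if name not in by_author:
        by_author[name] = ET.SubElement(result, "author", name=name)
    ET.SubElement(by_author[name], "title").text = b.findtext("title")

print(ET.tostring(result, encoding="unicode"))
# <authors><author name="Gates"><title>The Road Ahead</title><title>Business @ the Speed of Thought</title></author></authors>
```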
63. Selection Queries with XQL/XPath
Find the root element (bookstore) of this document:
     /bookstore
Find all author elements anywhere within the current document: 
     //author
Find all books where the value of the style attribute on the book is equal to the value of the specialty attribute of the bookstore element at the root of the document: 
     //book[/bookstore/@specialty = @style]
 
 
64. Sample Queries with XQL/XPath
Find all books with author/first-name equal to 'Bob' and all magazines with price less than 10: 
     //(book[author/first-name = 'Bob'] $union$ magazine[price $lt$ 10])
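The XQL/XPath queries above can be approximated with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. The bookstore document is hypothetical. ElementTree predicates cannot refer back to the root or use `<`, so those filters are done in Python. The XQL union corresponds roughly to the XPath 1.0 expression `//book[author/first-name = 'Bob'] | //magazine[price < 10]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document for the sample queries.
STORE = """<bookstore specialty="novel">
  <book style="novel"><author><first-name>Bob</first-name></author><price>12</price></book>
  <book style="textbook"><author><first-name>Alice</first-name></author><price>40</price></book>
  <magazine style="novel"><price>5</price></magazine>
  <magazine style="glossy"><price>15</price></magazine>
</bookstore>"""

root = ET.fromstring(STORE)

# /bookstore: the document element itself
root_tag = root.tag

# //author: ElementTree spells "at any depth" as .// relative to the root
authors = root.findall(".//author")

# //book[/bookstore/@specialty = @style]: read the root attribute, then filter
specialty = root.get("specialty")
styled = [b for b in root.iter("book") if b.get("style") == specialty]

# Union of two filtered selections, kept in document order like an XPath node-set
bobs = [b for b in root.iter("book") if b.findtext("author/first-name") == "Bob"]
cheap = [m for m in root.iter("magazine") if float(m.findtext("price")) < 10]
union = [e for e in root.iter() if e in bobs or e in cheap]

print(root_tag, len(authors), len(styled), [e.tag for e in union])
# bookstore 2 1 ['book', 'magazine']
```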
 
65. Presenting XML: Extensible Stylesheet Language (XSL)
Why Stylesheets?
separation of content (XML) from presentation (XSL)
Why not just CSS for XML?
XSL is far more powerful:
selecting elements
transforming the XML tree
content based display (result may depend on data)
 separation => XML for apps/the machine, XSL to produce human readable output
selections:  regular path expressions, select n-th child, string operations, 
transformations: filter, reorder, restructure the tree (=query capabilities), e.g. TOC, 
strip details, inline results of PIs, e.g., sorted table results of a query
content based: if value < 0 then red else black
66. XSL Overview
XSL stylesheets are denoted in XML syntax
XSL components:
1. a language for transforming XML documents (XSLT: integral part of the XSL specification)
2. an XML formatting vocabulary (Formatting Objects: >90% of the formatting properties inherited from CSS)
1. will be the focus
2. FOs denote typographic abstractions such as page, paragraph
finer level control (=properties):
word- and letter-spacing; and widow, orphan, and hyphenation control,
 
67. XSLT Processing Model
68. XSLT Processing Model
XSL stylesheet: collection of template rules
template rule: (pattern → template)
main steps:
match pattern against source tree
instantiate template (replace current node “.” by the template in the result tree)
select further nodes for processing
control can be a mix of
recursive processing ("push": <xsl:apply-templates> ...)
program-driven ("pull": <xsl:for-each> ...)
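Python's standard library has no XSLT processor, so here is a toy rendering of the processing model in plain Python with `xml.etree.ElementTree`: template rules are (match, template) pairs, and `apply_templates` matches nodes, instantiates templates, and recurses ("push"). The rules, the sample document, and the first-match-wins policy are simplifying assumptions; real XSLT resolves conflicts by import precedence and priority, and its built-in rules also copy text.

```python
import xml.etree.ElementTree as ET

SOURCE = """<universal_library><books><book>
  <title>Some Quotations from the Universal Library</title>
  <section><title>Famous Quotes</title></section>
  <section><title>References</title></section>
</book></books></universal_library>"""


def heading(tag: str, text: str) -> ET.Element:
    elem = ET.Element(tag)
    elem.text = text
    return elem


def book_template(node, apply_templates):
    # Emit a literal result element, then select further nodes to process ("push").
    return [heading("h1", node.findtext("title"))] + apply_templates(node.findall("section"))


def section_template(node, apply_templates):
    return [heading("h2", node.findtext("title"))]


# Template rules: (match pattern written as a predicate, template)
RULES = [
    (lambda n: n.tag == "book", book_template),
    (lambda n: n.tag == "section", section_template),
]


def apply_templates(nodes):
    result = []
    for node in nodes:
        for matches, template in RULES:
            if matches(node):  # first matching rule wins in this sketch
                result.extend(template(node, apply_templates))
                break
        else:
            # Built-in rule for elements: process the children (text is dropped here)
            result.extend(apply_templates(list(node)))
    return result


body = ET.Element("body")
body.extend(apply_templates([ET.fromstring(SOURCE)]))
print(ET.tostring(body, encoding="unicode"))
# <body><h1>Some Quotations from the Universal Library</h1><h2>Famous Quotes</h2><h2>References</h2></body>
```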
69. But first: some syntactic sugar, PLEASE... instead of something complicated like
        y=f(x)
in the brave new XSLT world you can “simply” write this as:
<xsl:variable name="y">
  <xsl:call-template name="f">
    <xsl:with-param name="x" select="$x"/>
  </xsl:call-template>
</xsl:variable>
70.   XML Tutorial, Bertram Ludäscher 70 Template Rule: Example 
71.   XML Tutorial, Bertram Ludäscher 71 Match/Select Patterns match patterns ⊆ select patterns (syntax defined in http://w3.org/TR/xpath)
Examples: 
/mybook/chapter[2]/section/*
chapter|appendix
chapter//para
div[@class="appendix" and position() mod 2 = 1]//para
../@lang 
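Python's `xml.etree.ElementTree` supports a limited XPath subset (child steps, `//`, `*`, `[n]`, `[@attr='v']`), evaluated relative to an element, so it can illustrate several of the patterns above. The sketch below assumes a hypothetical sample document whose root element stands in for `/mybook`. ElementTree has no `|` operator and no boolean or arithmetic predicates, so the union and the `position() mod 2` pattern are emulated by hand; `../@lang` is left out because ElementTree cannot select attribute nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the root element plays the role of /mybook.
DOC = """<mybook>
  <chapter><section><para>c1</para></section></chapter>
  <chapter><section><title>t</title><para>c2</para></section></chapter>
  <appendix><para>a1</para></appendix>
</mybook>"""
root = ET.fromstring(DOC)

# /mybook/chapter[2]/section/*  ([2] = the 2nd <chapter> child, 1-based)
second_chapter_kids = [e.tag for e in root.findall("chapter[2]/section/*")]

# chapter//para  (<para> at any depth below a <chapter> child)
chapter_paras = [p.text for p in root.findall("chapter//para")]

# chapter|appendix  (no union operator: filter children in document order)
chapter_or_appendix = [e.tag for e in root if e.tag in ("chapter", "appendix")]

def odd_appendix_paras(context):
    """div[@class="appendix" and position() mod 2 = 1]//para, relative to context."""
    divs = context.findall("div")  # child::div; position() counts among these
    return [p.text
            for pos, d in enumerate(divs, start=1)
            if d.get("class") == "appendix" and pos % 2 == 1
            for p in d.iter("para")]
```

Here `second_chapter_kids` is `['title', 'para']` and `chapter_paras` is `['c1', 'c2']`. Note that `position()` counts all `div` children, not only those with `class="appendix"`, which is why the position filter is applied to the full `findall("div")` list.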
 
72.   XML Tutorial, Bertram Ludäscher 72 XSLT Processing Flavors: Recursive Descent Processing    
73.   XML Tutorial, Bertram Ludäscher 73 Creating the Result Tree... Literal result elements: non-XSL elements (e.g., HTML) appear “literally” in the result tree
Constructing elements:
(similar for xsl:attribute, xsl:text, xsl:comment,…)
Generating text:
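The result-tree constructors listed above (literal result elements, `xsl:element`, `xsl:attribute`, `xsl:text`, `xsl:comment`) have rough analogues in Python's ElementTree API. This is only an analogy sketch: the function name and its parameters are hypothetical, and the comments pair each call with the XSLT construct it loosely corresponds to.

```python
import xml.etree.ElementTree as ET

def build_result(level: int, heading: str) -> str:
    """Hypothetical helper: build a small result tree and serialize it."""
    html = ET.Element("html")                # cf. literal result element <html>
    body = ET.SubElement(html, "body")
    body.append(ET.Comment(" generated "))   # cf. <xsl:comment>
    h = ET.SubElement(body, f"h{level}")     # cf. <xsl:element name="h{$level}">
    h.set("class", "title")                  # cf. <xsl:attribute name="class">
    h.text = heading                         # cf. <xsl:text> / <xsl:value-of>
    return ET.tostring(html, encoding="unicode")
```

For instance, `build_result(2, "Tree")` serializes to `<html><body><!-- generated --><h2 class="title">Tree</h2></body></html>`; as with `xsl:element`, the element name is computed at run time.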
 
74.   XML Tutorial, Bertram Ludäscher 74 Creating the Result Tree... Further XSL elements for ...
Numbering 
<xsl:number value="position()" format="1 "/>
Conditions
<xsl:if test="position() mod 2 = 0"> ... </xsl:if>
Repetition...
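The numbering and condition elements above both depend on `position()`, the 1-based index of the current node in the node list being processed. A minimal sketch of that logic, assuming a plain list of strings stands in for the selected nodes (the function name and the `[even]` marker are hypothetical):

```python
def numbered_rows(items):
    """Emulate <xsl:number value="position()" format="1 "/> plus an <xsl:if>."""
    rows = []
    for position, item in enumerate(items, start=1):  # position() is 1-based
        row = f"{position} {item}"                    # format="1 ": decimal, then a space
        if position % 2 == 0:                         # test="position() mod 2 = 0"
            row += " [even]"
        rows.append(row)
    return rows
```

`numbered_rows(["a", "b", "c"])` gives `["1 a", "2 b [even]", "3 c"]`. Like `xsl:if`, the condition has no else branch; alternating choices would need `xsl:choose` in XSLT.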
 
75.   XML Tutorial, Bertram Ludäscher 75 Creating the Result Tree: Repetition 
76.   XML Tutorial, Bertram Ludäscher 76 Creating the Result Tree: Sorting 
<xsl:template match="employees">
  <ul>
    <xsl:apply-templates select="employee">
      <xsl:sort select="name/last"/>
      <xsl:sort select="name/first"/>
    </xsl:apply-templates>
  </ul>
</xsl:template>
<xsl:template match="employee">
  <li>
    <xsl:value-of select="name/first"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/last"/>
  </li>
</xsl:template>
 
sort by last name, then by first name 
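Two `xsl:sort` children act as a compound key: employees are compared by last name, and only ties are broken by first name. In Python the same ordering falls out of a tuple key. This sketch assumes the `employees/employee/name/{first,last}` structure used by the stylesheet; one difference is that `xsl:sort` uses a language-dependent collation by default, whereas Python compares strings by code point.

```python
import xml.etree.ElementTree as ET

def sorted_employee_names(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)            # the <employees> element
    emps = root.findall("employee")           # cf. select="employee"
    # Two <xsl:sort> keys: compare by last name, break ties by first name.
    emps.sort(key=lambda e: (e.findtext("name/last", ""),
                             e.findtext("name/first", "")))
    # Each <li> holds "first last", separated by the <xsl:text> space.
    return [f'{e.findtext("name/first", "")} {e.findtext("name/last", "")}'
            for e in emps]
```

With hypothetical employees John Smith, Jane Doe, and Adam Smith, the result is `["Jane Doe", "Adam Smith", "John Smith"]`.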
77.   XML Tutorial, Bertram Ludäscher 77 More on XSL XSL(T):
Conflict resolution for multiple applicable rules 
Modularization <xsl:include> <xsl:import>
…
XSL Formatting Objects
à la CSS
XPath (navigation syntax + functions)
 = XSLT ∩ XPointer (the functionality both share)
... FO: pagination and layout, block formatting, tables, lists
Properties: background, border, font
XPointer: provides for specific reference to elements, character strings, selections,
and other parts of XML documents, whether or not they bear an explicit ID attribute,
using traversals of a document's structure and choice of parts based on their properties
such as element types, attribute values, character content, and relative position,
containment, and order.
XPointer defines the meaning of the "selector" or "fragment identifier" portion of URIs
that locate resources of MIME media types "text/xml" and "application/xml".
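For the conflict-resolution point on the slide above: when several template rules match a node, XSLT 1.0 (section 5.5) first applies import precedence, then priority. A rule without an explicit `priority` attribute gets a default priority from the shape of its pattern: 0 for a plain (optionally `@`- or axis-prefixed) name, -0.25 for `prefix:*`, -0.5 for a bare node test such as `*` or `text()`, and 0.5 for anything more specific. The sketch below is a simplified, regex-based classifier under stated assumptions: it handles one pattern alternative at a time (a `|` pattern counts as separate rules), uses a simplified name syntax, and ignores import precedence and explicit priorities.

```python
import re

NAME = r"[A-Za-z_][\w.-]*"              # simplified NCName (assumption)
QNAME = rf"{NAME}(?::{NAME})?"

def default_priority(alternative: str) -> float:
    """Default priority of ONE pattern alternative (no '|'), per XSLT 1.0 sec. 5.5."""
    p = alternative.strip()
    p = re.sub(r"^(child::|attribute::|@)", "", p)   # optional axis specifier
    if re.fullmatch(QNAME, p) or re.fullmatch(
            r"processing-instruction\(\s*(['\"]).*\1\s*\)", p):
        return 0.0
    if re.fullmatch(rf"{NAME}:\*", p):
        return -0.25
    if p in ("*", "node()", "text()", "comment()", "processing-instruction()"):
        return -0.5
    return 0.5

def choose_rule(matching_patterns):
    """Pick among patterns (in stylesheet order) that all match the current node."""
    best = max(default_priority(p) for p in matching_patterns)
    # A remaining tie is an error in XSLT 1.0; a processor may recover by
    # choosing the rule that occurs last in the stylesheet.
    return [p for p in matching_patterns if default_priority(p) == best][-1]
```

So among `*`, `para`, and `chapter/para` that all match a `para` element, `choose_rule` picks `chapter/para` (priority 0.5 beats 0 and -0.5).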
78.   XML Tutorial, Bertram Ludäscher 78 The MIX Project: Mediation of Information using XML Joint effort between SDSC and the UCSD CSE Department 
79.   XML Tutorial, Bertram Ludäscher 79 Mediation of Information using XML (MIX)   
80.   XML Tutorial, Bertram Ludäscher 80 Integrated / Mediated views 
81.   XML Tutorial, Bertram Ludäscher 81 A Typical Mediation Scenario 
82.   XML Tutorial, Bertram Ludäscher 82 MIX Components MIXm Mediator tool-kit
allows definition of views across multiple resources
views are expressed in a declarative query language
query engine to execute queries on views
XML Matching And Structuring (XMAS) query language
operates on a given set of XML documents to produce a new XML document, using XMAS algebra 
83.   XML Tutorial, Bertram Ludäscher 83 An XML Query (XMAS) 
84.   XML Tutorial, Bertram Ludäscher 84 MIX components... DOM-VXD: DOM Virtual XML Document extension
a “lazy” implementation of DOM. Supports browsing/navigation of XML documents with a server-side, “compute as you go” model
Blended Browsing and Querying (BBQ) interface
supports navigation and querying of XML documents
generates XMAS queries on mediator views
generates XMAS queries modified by DOM-VXD operations to incrementally evaluate the result set, to support navigation of XML documents 
85.   XML Tutorial, Bertram Ludäscher 85 Navigation-driven evaluation In general:
A lazy mediator computes the result of each client navigation of the virtual result by issuing navigations to the sources and processing the results of source navigations.
Thus the mediator acts as a translator of navigational commands (DOM).
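The navigation-driven idea above, where nothing is fetched from a source until the client navigates into that part of the virtual result, can be illustrated with a small lazily expanded tree. This is a hypothetical sketch and not the MIX or DOM-VXD API: `LazyNode`, `fetch_from_source`, and the source names are invented for illustration, and a real mediator would translate each navigation into queries against remote sources rather than return canned strings.

```python
from typing import Callable, Iterable, List, Optional

class LazyNode:
    """Hypothetical virtual-document node: children are computed on first navigation."""

    def __init__(self, label: str,
                 expand: Callable[[], Iterable["LazyNode"]] = lambda: ()):
        self.label = label
        self._expand = expand
        self._children: Optional[List["LazyNode"]] = None

    def children(self) -> List["LazyNode"]:
        if self._children is None:          # first navigation: ask the source(s)
            self._children = list(self._expand())
        return self._children               # later navigations reuse the cached result

calls: List[str] = []                       # log of source accesses, for illustration

def fetch_from_source(source: str) -> List[str]:
    """Hypothetical source wrapper; returns canned records and logs the access."""
    calls.append(source)
    return [f"{source}-1", f"{source}-2"]

def source_node(source: str) -> LazyNode:
    return LazyNode(source, lambda: [LazyNode(r) for r in fetch_from_source(source)])

# The mediated view: two sources under one virtual root; nothing fetched yet.
view = LazyNode("view", lambda: [source_node("srcA"), source_node("srcB")])
```

Navigating to `view.children()` leaves `calls` empty; only descending into the first child (`view.children()[0].children()`) triggers a single access to `srcA`, and repeating that navigation reuses the cached children.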
86.   XML Tutorial, Bertram Ludäscher 86 Blended Browsing and Querying UI (BBQ) 
87.   XML Tutorial, Bertram Ludäscher 87 Another MIX Example: CDL/AMICO Mediator Prototype 
88.   XML Tutorial, Bertram Ludäscher 88 XSL Stylesheet for AMICO Answer Docs 
89.   XML Tutorial, Bertram Ludäscher 89 ... and the Result (+BBQ) 
90.   XML Tutorial, Bertram Ludäscher 90  Projects at DICE/SDSC National Archives and Records Administration, NARA
Persistent Archives and Electronic Records
NHPRC/NARA
XML and GIS
aXioMap
I2T: An Information Integration Testbed for Digital Government 
91.   XML Tutorial, Bertram Ludäscher 91 Projects at SDSC (… cont) AMICO
In conjunction with the California Digital Library (CDL)
Part of the NSF DLI-2 project
ESRI
Community of Science, Inc.
Networked Earthquake Engineering Simulation (NEES)
NSF program 
92.   XML Tutorial, Bertram Ludäscher 92 Information Based Computing 
93.   XML Tutorial, Bertram Ludäscher 93 Integrating Data Set Management Model-Based Information Management
Rule-based ontology mapping, conceptual-level mediation - CMIX
Data Grid
Data federation across multiple libraries - MIX 
Digital Library 
Interoperable services for information discovery and presentation - SDLIP
Data Collection 
Tools for managing data set collections on databases - MCAT
Data Handling
Systems for data retrieval from remote storage - SRB
Persistent Archives
Storage of data collections for 30 years 
94.   XML Tutorial, Bertram Ludäscher 94 Model-Based Mediation 
Knowledge-based mediation 
conceptual-level integration 
Rule-based ontology maps
map source XML to CM to FL (ontologies, views)
 Models for exporting
rules  
integrity constraints  
query capabilities 
data & schema (XML/DTDs) 
95.   XML Tutorial, Bertram Ludäscher 95    Federation of Brain Data 
96.   XML Tutorial, Bertram Ludäscher 96 Further Information xml.com
w3.org
xml.org
ibm.com/xml
... 
 Mediation of Information using XML (MIX):
www.npaci.edu/DICE/MIX/
www.db.ucsd.edu/Projects/MIX/