Instructor: N. Bassiliades, Assoc. Prof., Dept. of Informatics, AUTH

DEPARTMENT OF INFORMATICS, AUTH. POSTGRADUATE PROGRAM OF STUDIES. Information Systems Track, 1st Semester. Semantic Web. lpis.csd.auth.gr/mtpx/sw/index.htm. Instructor: N. Bassiliades, Assoc. Prof., Dept. of Informatics, AUTH. Lectures: 2-3-4. Chapter 2: Structured Web Documents in XML.


Presentation Transcript


  1. DEPARTMENT OF INFORMATICS, AUTH. POSTGRADUATE PROGRAM OF STUDIES. Information Systems Track, 1st Semester. Semantic Web. lpis.csd.auth.gr/mtpx/sw/index.htm. Instructor: N. Bassiliades, Assoc. Prof., Dept. of Informatics, AUTH. Lectures: 2-3-4

  2. Chapter 2: Structured Web Documents in XML. Grigoris Antoniou, Frank van Harmelen

  3. Introduction • Today HTML (HyperText Markup Language) is the standard language for Web pages. • HTML was derived from SGML (Standard Generalized Markup Language) • International standard (ISO 8879) for the definition of device- and system-independent methods of representing information, both human- and machine-readable. A Semantic Web Primer, 2nd Edition

  4. Standards • Standards are important • They enable effective communication • They support technological progress and business collaboration • In WWW, standards are set by the W3C • They are called recommendations • In a distributed environment without central authority, standards cannot be enforced.

  5. SGML Applications • HTML • Developed because SGML is too complex for Internet-related purposes. • XML (eXtensible Markup Language) • Its development was driven by shortcomings of HTML.

  6. An HTML Example <h2>Nonmonotonic Reasoning: Context-Dependent Reasoning</h2> <i>by <b>V. Marek</b> and <b>M. Truszczynski</b></i><br> Springer 1993<br> ISBN 0387976892

  7. The Same Example in XML <book> <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title> <author>V. Marek</author> <author>M. Truszczynski</author> <publisher>Springer</publisher> <year>1993</year> <ISBN>0387976892</ISBN> </book>

  8. HTML versus XML: Similarities • Both use tags (e.g. <h2> and </year>) • Tags may be nested (tags within tags) • Human users can read and interpret both HTML and XML representations quite easily … But how about machines?

  9. HTML versus XML: Differences • All tags in XML must be closed • E.g. for an opening tag <title> there must be a closing tag </title> • The enclosed content, together with its opening and closing tags, is called an element • In HTML some tags (e.g. <br>) may be left open. • XHTML brings HTML more in line with XML • Any valid XHTML document is also a valid XML document • Opening and closing tags in XHTML are balanced

  10. Problems with Automated Interpretation of HTML Documents Consider an intelligent agent trying to retrieve the names of the authors of the book: • Authors’ names could appear immediately after the title • or immediately after the word by • Are there two authors? • Or just one, called “V. Marek and M. Truszczynski”?

  11. HTML vs XML: Structural Information • HTML documents do not contain structural information: pieces of the document and their relationships. • XML is more easily accessible to machines: • Every piece of information is described. • Relations are also defined through the nesting structure • E.g., <author> tags appear within <book> tags, so they describe properties of the particular book.

  12. HTML vs XML: Structural Information (2) • A machine processing the XML document would be able to deduce that • the author element refers to the enclosing book element • from the nesting structure rather than from proximity considerations • XML allows the definition of constraints on values • E.g. a year must be a number of 4 digits
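
As a minimal sketch of the point made on slides 11 and 12, the following Python snippet uses the standard-library `xml.etree.ElementTree` module to retrieve the authors from the XML version of the book record on slide 7. The variable names are illustrative and are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The book record from slide 7, as an XML string.
BOOK_XML = """
<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>
"""

book = ET.fromstring(BOOK_XML)

# Nesting tells us that these <author> elements belong to this <book>;
# no guessing based on the word "by" or on proximity is needed.
authors = [author.text for author in book.findall("author")]
year = book.findtext("year")

print(authors)
print(year)
```

Because each author sits in its own element, the question "two authors or one?" from slide 10 does not arise: the parser returns one list entry per `<author>` element.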

  13. HTML vs XML: Formatting • The HTML representation provides more than the XML representation: • The formatting of the document is also described • The main use of an HTML document is to display information: it must define formatting • XML: separation of content from display • Same information can be displayed in different ways • Content may be used for purposes other than display

  14. HTML vs XML: Another Example • In HTML <h2>Relationship matter-energy</h2> <i> E = M × c² </i> • In XML <equation> <meaning>Relationship matter energy</meaning> <leftside> E </leftside> <rightside> M × c² </rightside> </equation>

  15. HTML vs XML: Different Use of Tags • Both HTML examples use the same tags. • In the XML examples completely different tags are used! • HTML tags define display: color, lists … • XML tags are not fixed: user-definable tags • XML is a meta markup language: a language for defining markup languages

  16. XML Vocabularies • Web applications must agree on common vocabularies to communicate and collaborate • Communities and business sectors are defining their specialized vocabularies • mathematics (MathML) • bioinformatics (BSML) • human resources (HRML) • …

  17. Data Exchange • The main use of XML today is as a uniform data exchange format between applications. • Rather than a document markup language. • Companies retrieve information from their partners, and update their databases accordingly. • If there is not an agreed common standard, then specialized processing and querying software must be developed for each partner separately • Technical overhead • The software must be updated every time a partner decides to change its own database format.

  18. Lecture Outline • Introduction • Detailed Description of XML • Structuring • DTDs • XML Schema • Namespaces • Accessing, querying XML documents: XPath • Transformations: XSLT

  19. The XML Language An XML document consists of • a prolog • a number of elements • an optional epilog (not discussed)

  20. Prolog of an XML Document The prolog consists of • an XML declaration and • an optional reference to external structuring documents <?xml version="1.0" encoding="UTF-16"?> <!DOCTYPE book SYSTEM "book.dtd">

  21. XML Elements • The “things” the XML document talks about • E.g. books, authors, publishers • An element consists of: • an opening tag • the content • a closing tag <lecturer>David Billington</lecturer>

  22. XML Elements (2) • Tag names can be chosen almost freely. • The first character must be a letter, an underscore, or a colon • No name may begin with the string “xml” in any combination of cases • E.g. “Xml”, “xML”
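
The two naming rules above can be sketched as a small check. This is a simplified, ASCII-only approximation: the helper name `is_permissible_name` is hypothetical, and the set of characters allowed after the first one (letters, digits, underscore, colon, period, hyphen) is an assumption taken from the ASCII subset of the XML name rules, not from the slide.

```python
import re

# Simplified, ASCII-only approximation of the naming rules on the slide.
# The XML specification itself allows many more (Unicode) characters.
_NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_permissible_name(name):
    """Return True if `name` satisfies the simplified naming rules."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names starting with "xml" in any mix of upper/lower case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["lecturer", "_note", "2ndAuthor", "Xml", "xMLdata"]:
    print(candidate, is_permissible_name(candidate))
```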

  23. Content of XML Elements • Content may be text, or other elements, or nothing <lecturer> <name>David Billington</name> <phone>+61-7-3875 507</phone> </lecturer> • If there is no content, then the element is called empty; it is abbreviated as follows: <lecturer/> for <lecturer></lecturer>

  24. XML Attributes • An empty element is not necessarily meaningless • It may have some properties in terms of attributes • An attribute is a name-value pair inside the opening tag of an element <lecturer name="David Billington" phone="+61-7-3875 507"/>

  25. XML Attributes: An Example <order orderNo="23456" customer="John Smith" date="October 15, 2002"> <item itemNo="a528" quantity="1"/> <item itemNo="c817" quantity="3"/> </order>

  26. The Same Example without Attributes <order> <orderNo>23456</orderNo> <customer>John Smith</customer> <date>October 15, 2002</date> <item> <itemNo>a528</itemNo> <quantity>1</quantity> </item> <item> <itemNo>c817</itemNo> <quantity>3</quantity> </item> </order>

  27. XML Elements vs Attributes • Attributes can be replaced by elements • When to use elements and when attributes is a matter of taste • But attributes • Cannot be nested • Cannot occur twice with the same name in the same element
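
The claim that attributes can be replaced by elements can be illustrated with a short transformation that turns the attribute-based order of slide 25 into the element-based form of slide 26. This is only a sketch: the function name `attributes_to_elements` is hypothetical, and the function ignores text content, which the order example does not have.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(element):
    """Return a copy of `element` in which every attribute becomes a child element."""
    converted = ET.Element(element.tag)
    # Each attribute name/value pair turns into <name>value</name>.
    for name, value in element.attrib.items():
        child = ET.SubElement(converted, name)
        child.text = value
    # Existing children are converted recursively and keep their order.
    for child in element:
        converted.append(attributes_to_elements(child))
    return converted


ORDER_XML = """
<order orderNo="23456" customer="John Smith" date="October 15, 2002">
  <item itemNo="a528" quantity="1"/>
  <item itemNo="c817" quantity="3"/>
</order>
"""

order = attributes_to_elements(ET.fromstring(ORDER_XML))
print(ET.tostring(order, encoding="unicode"))
```

The reverse direction is not always possible, for the reasons listed on the slide: an element may contain nested elements or repeat the same child name, which attributes cannot express.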

  28. Mixed Content • Elements with mixed content contain both text and other elements at the same time <letter> Dear Mr. <name>John Smith</name>. Your order <orderid>1032</orderid> will be shipped on <shipdate>2001-07-13</shipdate>. </letter>
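
A brief sketch of how a parser exposes mixed content, using Python's standard `xml.etree.ElementTree`. In this particular API (other XML libraries model it differently), text before the first child is stored on the parent's `text`, and text following a child is stored on that child's `tail`.

```python
import xml.etree.ElementTree as ET

LETTER_XML = (
    "<letter>Dear Mr. <name>John Smith</name>. Your order "
    "<orderid>1032</orderid> will be shipped on "
    "<shipdate>2001-07-13</shipdate>.</letter>"
)

letter = ET.fromstring(LETTER_XML)

# Text before the first child element is stored on the parent ...
print(repr(letter.text))
# ... and text following a child is stored in that child's `tail`.
for child in letter:
    print(child.tag, repr(child.text), repr(child.tail))

# itertext() walks all text pieces in document order.
full_text = "".join(letter.itertext())
print(full_text)
```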

  29. Further Components of XML Docs • Comments • A piece of text that is to be ignored by parser • <!-- This is a comment --> • Processing Instructions (PIs) • Define procedural attachments • <?stylesheet type="text/css" href="mystyle.css"?>

  30. Well-Formed XML Documents • Syntactically correct documents • Some syntactic rules: • Only one outermost element (called root element) • Each element contains an opening and a corresponding closing tag • Tags may not overlap; for example, <author><name>Lee Hong</author></name> is not well-formed • Attributes within an element have unique names • Element and tag names must be permissible
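
Any conforming XML parser rejects documents that break these rules. The following sketch wraps the standard-library parser in a hypothetical helper, `is_well_formed`, and tries it on a few small samples written for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts `xml_text` as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "properly nested": "<author><name>Lee Hong</name></author>",
    "overlapping tags": "<author><name>Lee Hong</author></name>",
    "two root elements": "<author/><author/>",
    "duplicate attribute": '<author id="a" id="b"/>',
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Well-formedness is purely syntactic: the check says nothing about whether `author` or `name` are meaningful element names for a given application.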

  31. The Tree Model of XML Documents: An Example <email> <head> <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/> <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/> <subject>Where is your draft?</subject> </head> <body> Grigoris, where is the draft of the paper you promised me last week? </body> </email>

  32. A Tree Example [Figure: tree representation of the email document from slide 31]

  33. The Tree Model of XML Docs • The tree representation of an XML document is an ordered labeled tree: • There is exactly one root • There are no cycles • Each non-root node has exactly one parent • Each node has a label. • The order of elements is important • … but the order of attributes is not important

  34. The Tree Model of XML Docs • Trees are used just as illustrations. • Some XML aspects are not properly represented. • A more refined tree concept is required • The different types of nodes should be differentiated • Element node, attribute node etc. • There is a difference between: • the root of the tree (representing the XML document) • the root element (email) • This distinction is important for addressing and querying XML documents.
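
The ordered tree of the email document from slide 31 can be walked with a few lines of Python. This is a sketch with illustrative names; in `xml.etree.ElementTree`, the `ElementTree` object loosely plays the role of the document root, while `getroot()` returns the root element, which only approximates the more refined node model mentioned above.

```python
import xml.etree.ElementTree as ET

EMAIL_XML = """<email>
  <head>
    <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/>
    <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/>
    <subject>Where is your draft?</subject>
  </head>
  <body>Grigoris, where is the draft of the paper you promised me last week?</body>
</email>"""


def outline(element, depth=0):
    """Return one (depth, tag) pair per element node, in document order."""
    rows = [(depth, element.tag)]
    for child in element:  # children are returned in document order
        rows.extend(outline(child, depth + 1))
    return rows


document = ET.ElementTree(ET.fromstring(EMAIL_XML))  # stands for the whole document
root_element = document.getroot()                    # the <email> element

tree_rows = outline(root_element)
for depth, tag in tree_rows:
    print("  " * depth + tag)
```

Attributes are kept in a dictionary on each element (`element.attrib`) rather than as ordered child nodes, which matches the statement that the order of attributes is not important.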

  35. Lecture Outline • Introduction • Detailed Description of XML • Structuring • DTDs • XML Schema • Namespaces • Accessing, querying XML documents: XPath • Transformations: XSLT

  36. Why Structure XML Documents? • An XML document is well-formed if it respects certain syntactic rules. • These rules say nothing specific about the structure of the document. • Imagine 2 applications that try to communicate • They wish to use the same vocabulary

  37. Structuring XML Documents • Define all the element and attribute names that may be used • Define the structure • what values an attribute may take • which elements may or must occur within other elements, etc. • If such structuring information exists, the document can be validated

  38. Structuring XML Documents (2) • An XML document is valid if • it is well-formed • it respects the structuring information it uses • There are two ways of defining the structure of XML documents: • DTDs (the older and more restricted way) • XML Schema (offers extended possibilities)

  39. DTD: Element Type Definition <lecturer> <name>David Billington</name> <phone>+61-7-3875 507</phone> </lecturer> DTD for above element (and all lecturer elements): <!ELEMENT lecturer (name,phone)> <!ELEMENT name (#PCDATA)> <!ELEMENT phone (#PCDATA)>

  40. The Meaning of the DTD • The element types lecturer, name, and phone may be used in the document • A lecturer element contains a name element and a phone element, in that order (sequence) • A name element and a phone element may have any content • In DTDs, #PCDATA is the only atomic type for elements

  41. DTD: Disjunction in Element Type Definitions • We express that a lecturer element contains either a name element or a phone element as: <!ELEMENT lecturer (name | phone)> • Both elements below are valid: <lecturer> <name>David Billington</name> </lecturer> <lecturer> <phone>+61-7-3875 507</phone> </lecturer>

  42. DTD: Element Type Definitions – Any order • A lecturer element contains a name element and a phone element in any order. <!ELEMENT lecturer ((name,phone) | (phone,name))> • Both elements below are valid: <lecturer> <name>David Billington</name> <phone>+61-7-3875 507</phone> </lecturer> <lecturer> <phone>+61-7-3875 507</phone> <name>David Billington</name> </lecturer>

  43. Example of an XML Element <order orderNo="23456" customer="John Smith" date="October 15, 2002"> <item itemNo="a528" quantity="1"/> <item itemNo="c817" quantity="3"/> </order>

  44. The Corresponding DTD <!ELEMENT order (item+)> <!ATTLIST order orderNo ID #REQUIRED customer CDATA #REQUIRED date CDATA #REQUIRED> <!ELEMENT item EMPTY> <!ATTLIST item itemNo ID #REQUIRED quantity CDATA #REQUIRED comments CDATA #IMPLIED>

  45. Comments on the DTD • The item element type is defined to be empty • + (after item) is a cardinality operator: • ?: appears zero times or once • *: appears zero or more times • +: appears one or more times • No cardinality operator means exactly once
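
DTD content models are essentially regular expressions over child element names. The sketch below makes that correspondence explicit by hand-translating the content models from slides 39 to 44 into Python regular expressions. It is not a DTD validator (Python's standard library does not include one), and the names `CONTENT_MODELS` and `children_match` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child tag becomes "tag " and the
# DTD operators , | ? * + map onto the corresponding regex constructs.
CONTENT_MODELS = {
    "sequence": re.compile(r"name phone "),                    # (name,phone)
    "choice": re.compile(r"(?:name |phone )"),                 # (name | phone)
    "any_order": re.compile(r"(?:name phone |phone name )"),   # ((name,phone) | (phone,name))
    "one_or_more": re.compile(r"(?:item )+"),                  # (item+)
}


def children_match(element, model):
    """Check the sequence of child tags of `element` against a content model."""
    child_tags = "".join(child.tag + " " for child in element)
    return model.fullmatch(child_tags) is not None


name_then_phone = ET.fromstring(
    "<lecturer><name>David Billington</name><phone>+61-7-3875 507</phone></lecturer>"
)
phone_then_name = ET.fromstring(
    "<lecturer><phone>+61-7-3875 507</phone><name>David Billington</name></lecturer>"
)
empty_order = ET.fromstring("<order/>")

print(children_match(name_then_phone, CONTENT_MODELS["sequence"]))
print(children_match(phone_then_name, CONTENT_MODELS["sequence"]))
print(children_match(phone_then_name, CONTENT_MODELS["any_order"]))
print(children_match(empty_order, CONTENT_MODELS["one_or_more"]))
```

The last check fails because `+` demands at least one `item`; with `*` in its place an empty order would be accepted.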

  46. Comments on the DTD (2) • In addition to defining elements, we define attributes • This is done in an attribute list containing: • Name of the element type to which the list applies • A list of triplets of attribute name, attribute type, and value type • Attribute name: A name that may be used in an XML document using a DTD

  47. DTD: Attribute Types • Similar to predefined data types, but limited selection • The most important types are • CDATA, a string (sequence of characters) • ID, a name that is unique across the entire XML document • IDREF, a reference to another element with an ID attribute carrying the same value as the IDREF attribute • IDREFS, a series of IDREFs • (v1| . . . |vn), an enumeration of all possible values • Limitations: no dates, number ranges etc.

  48. DTD: Attribute Value Types • #REQUIRED • Attribute must appear in every occurrence of the element type in the XML document • #IMPLIED • The appearance of the attribute is optional • #FIXED "value" • The attribute always has exactly this value; if it appears in the document, it must carry this value • "value" • This specifies the default value for the attribute, used when the attribute is omitted
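
The four value types can be sketched as a small check applied to a parsed element. This is a hand-rolled illustration, not how a validating parser is implemented: the table `ITEM_ATTRIBUTES` and the function `apply_declarations` are hypothetical, `itemNo`, `quantity` and `comments` come from slide 44, and `currency` and `unit` are invented here only to exercise a default value and `#FIXED`.

```python
import xml.etree.ElementTree as ET

# (kind, value) per attribute; kinds mirror the DTD keywords on the slide.
# "currency" and "unit" are hypothetical additions, not part of slide 44.
ITEM_ATTRIBUTES = {
    "itemNo": ("REQUIRED", None),
    "quantity": ("REQUIRED", None),
    "comments": ("IMPLIED", None),
    "currency": ("DEFAULT", "EUR"),
    "unit": ("FIXED", "piece"),
}


def apply_declarations(element, declarations):
    """Fill in default/fixed values and return a list of violations."""
    problems = []
    for name, (kind, value) in declarations.items():
        present = name in element.attrib
        if kind == "REQUIRED" and not present:
            problems.append(f"missing required attribute {name!r}")
        elif kind == "FIXED" and present and element.get(name) != value:
            problems.append(f"attribute {name!r} must equal {value!r}")
        elif kind in ("DEFAULT", "FIXED") and not present:
            element.set(name, value)  # the declared value is supplied
    return problems


ok_item = ET.fromstring('<item itemNo="a528" quantity="1"/>')
bad_item = ET.fromstring('<item itemNo="c817" unit="box"/>')

ok_problems = apply_declarations(ok_item, ITEM_ATTRIBUTES)
bad_problems = apply_declarations(bad_item, ITEM_ATTRIBUTES)
print(ok_problems, ok_item.attrib)
print(bad_problems)
```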

  49. Referencing with IDREF and IDREFS <!ELEMENT family (person*)> <!ELEMENT person (name)> <!ELEMENT name (#PCDATA)> <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED>

  50. An XML Document Respecting the DTD <family> <person id="bob" mother="mary" father="peter"> <name>Bob Marley</name> </person> <person id="bridget" mother="mary"> <name>Bridget Jones</name> </person> <person id="mary" children="bob bridget"> <name>Mary Poppins</name> </person> <person id="peter" children="bob"> <name>Peter Marley</name> </person> </family>
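
The ID and IDREF constraints of this example can be checked by hand, as in the following sketch. It assumes the attribute names from the DTD on slide 49; `check_references` is a hypothetical helper that covers only the uniqueness of IDs and the resolvability of references, not full DTD validation.

```python
import xml.etree.ElementTree as ET

FAMILY_XML = """
<family>
  <person id="bob" mother="mary" father="peter"><name>Bob Marley</name></person>
  <person id="bridget" mother="mary"><name>Bridget Jones</name></person>
  <person id="mary" children="bob bridget"><name>Mary Poppins</name></person>
  <person id="peter" children="bob"><name>Peter Marley</name></person>
</family>
"""


def check_references(family):
    """Return a list of ID/IDREF violations found in a <family> element."""
    problems = []
    ids = [person.get("id") for person in family.findall("person")]
    # ID values must be unique across the whole document.
    duplicates = {value for value in ids if ids.count(value) > 1}
    problems += [f"duplicate ID {value!r}" for value in sorted(duplicates)]
    known = set(ids)
    for person in family.findall("person"):
        # IDREF attributes hold one reference, IDREFS a space-separated list.
        references = [person.get("mother"), person.get("father")]
        references += (person.get("children") or "").split()
        for ref in references:
            if ref is not None and ref not in known:
                problems.append(f"{person.get('id')}: unknown reference {ref!r}")
    return problems


family = ET.fromstring(FAMILY_XML)
problems = check_references(family)
print(problems)
```

Note that the DTD only guarantees that every reference points to some existing ID. It cannot express that a `mother` reference must point to a person who lists the referring person among her `children`.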
