Tools for Memory: Semantic Content (XML)
Mahesh Chaudhari
School of Computing and Informatics, Department of Computer Science and Engineering
Arizona State University
Outline
• World Wide Web (WWW) and HTML
• Migration towards XML
• What is XML?
• XML supporting technologies
• Languages based on XML specifications
• Conclusion
World Wide Web (WWW) and HTML
• The Internet is a giant network of computers; the Web runs on top of it.
• Part of day-to-day activities.
• Email, chat, video, news.
• Most important: browsing or surfing the Net.
• HTML (HyperText Markup Language) is the common language of the Web.
Why HTML?
• Easy to understand, learn, and use.
• Quick and attractive way to present information.
• Fixed set of instructions in the form of elements (tags) and attributes, e.g. <HTML>, <HEAD>, <BODY>.
• Standard for sharing information over the Internet.
• Understood by all Web browsers.
    • Text-based browsers, e.g. Lynx.
    • Graphical browsers, e.g. IE, Firefox, Netscape Navigator.
Structure of HTML Document
    <HTML>
      <HEAD>
        <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
        <TITLE>course</TITLE>
      </HEAD>
      <BODY bgcolor="#9F9F9F">
        <Center>course</Center>
        <TABLE>
          <TR>
            <TD>123</TD>
            <TD>databases</TD>
            <TD>S. Urban</TD>
            <TD>3</TD>
          </TR>
        </TABLE>
      </BODY>
    </HTML>
• <HTML>: root of the document.
• <HEAD>: header (cover page of the document).
• <BODY>: contents (main content of the document).
• <TABLE>: draws a table with the course information in rows and columns.
Why not HTML?
• Fixed set of elements vs. user-defined elements:
    HTML: <TD>S. Urban</TD>
    XML: <instructor>S. Urban</instructor>
• The same applies to attributes.
• Poorly suited to exchanging information between different applications, organizations, etc.
• Cannot add meaning (semantics) to the data.
School
[Diagram: a virtual organization linking School (school records), University (university records), Company (employment records), Hospital (health records), and Student/Person (personal records) over the Internet/network.]
• Sharing information
• Understanding what is being sent/received
eXtensible Markup Language (XML)
• Similar to HTML (consists of elements, attributes, and data).
• Allows user-defined elements and attributes (an <instructor> tag is allowed).
• Adds meaning (semantics) to the data.
• Extensively used for data exchange.
• Understood by most Web browsers.
• Stricter, more powerful, and richer than HTML.
Structure of XML Document
    <?xml version="1.0" encoding="UTF-8"?>
    <dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="course.xsd">
      <course>
        <crsid>123</crsid>
        <cname>databases</cname>
        <inst>S. Urban</inst>
        <length>3</length>
      </course>
      <course>
        <crsid>124</crsid>
        <cname>software engineering</cname>
        <inst>J. Urban</inst>
        <length>3</length>
      </course>
    </dataroot>
• <dataroot>: root.
• <course> elements: main content of the document.
• User-defined elements give more meaning to the data.
Is XML Strict?
Allowed in HTML (the end tags </TD> and </TR> are missing):
    <HTML>
      <HEAD>
        <TITLE>course</TITLE>
      </HEAD>
      <BODY bgcolor="#9F9F9F">
        <Center>course</Center>
        <TABLE border="1">
          <TR>
            <TD>123
            <TD>databases</TD>
            <TD>S. Urban</TD>
            <TD>3</TD>
        </TABLE>
      </BODY>
    </HTML>
Not allowed in XML (the end tag </crsid> is missing):
    <?xml version="1.0" encoding="UTF-8"?>
    <dataroot>
      <course>
        <crsid>123
        <cname>databases</cname>
        <inst>S. Urban</inst>
        <length>3</length>
      </course>
    </dataroot>
Allowed in XML:
    <?xml version="1.0" encoding="UTF-8"?>
    <dataroot>
      <course>
        <crsid>123</crsid>
        <cname>databases</cname>
        <inst>S. Urban</inst>
        <length>3</length>
      </course>
    </dataroot>
• Every start tag in XML must have a matching end tag!
Is XML more strict than that?
Allowed in HTML:
    <HTML>
      <HEAD>
        <TITLE>course</TITLE>
      </HEAD>
      <BODY bgcolor="#9F9F9F">
        <b><i>This text is bold and italics.</b> but this text is only italics.</i>
      </BODY>
    </HTML>
Not allowed in XML:
    <?xml version="1.0" encoding="UTF-8"?>
    <dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <b><i>This text is bold and italics.</b>but this text is only italics.</i>
    </dataroot>
Allowed in XML:
    <?xml version="1.0" encoding="UTF-8"?>
    <dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <b><i>This text is bold and italics. but this text is only italics. </i></b>
    </dataroot>
• All elements in an XML document must be properly nested!
Key Features of XML
• User-defined tags/elements are possible.
• A document has exactly one root element.
• A document must be well-formed.
• Every start tag must have an end tag.
• Tags must be properly nested.
• Tag names are case sensitive and may not contain white space.
• Tag names must start with a letter or underscore, and may contain letters, digits, periods (.), underscores (_), or hyphens (-).
• Tag names cannot begin with the letters "xml" (reserved).
• Tags should have semantic meaning.
• Start tags may have attributes.
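The well-formedness rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample snippets are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets modeled on the course examples in these slides.
samples = {
    "missing end tag": "<course><crsid>123<cname>databases</cname></course>",
    "improper nesting": "<dataroot><b><i>text</b></i></dataroot>",
    "case mismatch": "<Course>123</course>",
    "two root elements": "<course/><course/>",
    "well-formed": "<course><crsid>123</crsid><cname>databases</cname></course>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the last snippet should be accepted; the other four each break one of the rules listed on the slide.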
Elements
• Elements
    • Always consist of a start tag, data (optional), and an end tag.
    • E.g. <crsid>123</crsid>, <hr></hr> or <hr/>
• Attributes
    • Provide metadata or additional information about the element; each attribute name may occur only once in an element.
    • E.g. <course ID="123"></course>
Special Attributes in XML
• ID and IDREF
    • ID: value that is unique in the whole document (it must be a valid XML name, so it cannot start with a digit).
    • IDREF: references one of the unique ID values in the document.
• E.g.
    <instructor ID="i1">S. Urban</instructor>
    <instructor ID="i2">P. Dasgupta</instructor>
    …
    <course>
      <inst IDREF="i1" />
      …
    </course>
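To make the ID/IDREF idea concrete, here is a small sketch of resolving a reference by hand, assuming Python's xml.etree.ElementTree. Without a DTD or schema the parser treats ID and IDREF as ordinary attributes, so the lookup table is built explicitly; the sample document and the names by_id and resolved are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in the style of the slide's example.
doc = """
<dataroot>
  <instructor ID="i1">S. Urban</instructor>
  <instructor ID="i2">P. Dasgupta</instructor>
  <course>
    <cname>databases</cname>
    <inst IDREF="i1"/>
  </course>
</dataroot>
"""

root = ET.fromstring(doc)

# Index every element that carries an ID attribute.
by_id = {el.get("ID"): el for el in root.iter() if el.get("ID") is not None}

# Follow each course's IDREF to the instructor element it points at.
resolved = {}
for course in root.iter("course"):
    ref = course.find("inst").get("IDREF")
    resolved[course.findtext("cname")] = by_id[ref].text

print(resolved)
```

A validating parser would additionally reject duplicate ID values and dangling IDREFs; this sketch does neither.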
Data-centric XML
• Regular, defined structure.
• Ordering of tags immaterial.
• Used for machine reading.
• E.g. course information or instructor information.
Data-centric XML
E.g. Course Information
    <?xml version="1.0" encoding="UTF-8"?>
    <dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="course.xsd">
      <course>
        <crsid>123</crsid>
        <cname>databases</cname>
        <inst>S. Urban</inst>
        <length>3</length>
      </course>
      <course>
        <crsid>124</crsid>
        <cname>software engineering</cname>
        <inst>J. Urban</inst>
        <length>3</length>
      </course>
    </dataroot>
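Because data-centric XML has a regular structure, it maps naturally onto records. The sketch below, assuming Python's xml.etree.ElementTree, turns each course element into a dictionary; the second course deliberately lists its children in a different order to illustrate that ordering does not matter for this kind of data. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the second course has its child elements reordered.
doc = """<dataroot>
  <course><crsid>123</crsid><cname>databases</cname><inst>S. Urban</inst><length>3</length></course>
  <course><length>3</length><inst>J. Urban</inst><cname>software engineering</cname><crsid>124</crsid></course>
</dataroot>"""

root = ET.fromstring(doc)

# One dictionary per course, keyed by child element name.
courses = [{child.tag: child.text for child in course}
           for course in root.findall("course")]

for record in courses:
    print(record["crsid"], record["cname"], record["inst"], record["length"])
```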
Document-centric XML
• Less regular structure.
• Ordering of tags important.
• Mostly used for human consumption.
• E.g. product descriptions, book information, library catalogs.
Document-centric XML
E.g. Product Description
    <Product>
      <Intro>
        The <ProductName>Turkey Wrench</ProductName> from <Developer>Full Fabrication Labs, Inc.</Developer> is <Summary>like a monkey wrench, but not as big.</Summary>
      </Intro>
      <Description>
        <Para>The turkey wrench, which comes in <i>both right- and left- handed versions (skyhook optional)</i>, is made of the <b>finest stainless steel</b>. The Readi-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials.</Para>
        <Para>You can:</Para>
        <List>
          <Item><Link URL="Order.html">Order your own turkey wrench</Link></Item>
          <Item><Link URL="Wrenches.htm">Read more about wrenches</Link></Item>
          <Item><Link URL="Catalog.zip">Download the catalog</Link></Item>
        </List>
        <Para>The turkey wrench costs <b>just $19.99</b> and, if you order now, comes with a <b>hand-crafted shrimp hammer</b> as a bonus gift.</Para>
      </Description>
    </Product>
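In document-centric XML, text and child elements are interleaved (mixed content), so order carries meaning. A short sketch, assuming Python's xml.etree.ElementTree and a shortened version of one paragraph from the product description, shows how the running text can be recovered in document order; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened paragraph modeled on the product description above.
para = ET.fromstring(
    "<Para>The turkey wrench costs <b>just $19.99</b> and comes with a "
    "<b>hand-crafted shrimp hammer</b> as a bonus gift.</Para>"
)

# itertext() yields the text pieces in document order, descending into children.
pieces = list(para.itertext())
full_text = "".join(pieces)

# The inline markup is still available separately.
bold_phrases = [b.text for b in para.findall("b")]

print(full_text)
print(bold_phrases)
```

Reordering the pieces would change the sentence, which is the practical difference from the data-centric case.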
XML: a Jigsaw Puzzle
[Diagram: jigsaw pieces labeled XML, DTD, XSD, XPath, XQuery, DOM, SAX, CSS, and XSLT.]
• Supporting technologies
    • Give meaning to the elements.
    • Data types for every element.
    • Traversing and querying mechanisms.
    • Support in other programming languages.
    • Presentation like HTML.
Meaning and Structure to XML
• Document Type Definition (DTD)
    • Describes the structure of the XML document.
    • Defines the legal parent-child relationships.
    • Custom non-XML syntax to describe the schema.
    • Does not support data types or namespaces.
• XML Schema Definition (XSD)
    • XML-like syntax to describe the schema.
    • Supports different data types.
    • Supports namespaces.
Traversing and Querying
• XPath
    • Navigates through the XML document.
    • Finds a particular element or attribute.
    • Building block for XQuery and XSLT.
• XQuery
    • Finds and retrieves elements and attributes from an XML document.
    • Query language comparable to SQL.
    • Supported by relational databases like Oracle and SQL Server.
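As an illustration of XPath-style navigation, the sketch below uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath (and no XQuery). The sample data and variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical course data in the style of the earlier slides.
DOC = """<dataroot>
  <course><crsid>123</crsid><cname>databases</cname><inst>S. Urban</inst><length>3</length></course>
  <course><crsid>124</crsid><cname>software engineering</cname><inst>J. Urban</inst><length>3</length></course>
</dataroot>"""

root = ET.fromstring(DOC)

# Path expression: every <cname> directly under a <course> child of the root.
all_names = [e.text for e in root.findall("./course/cname")]

# Predicate on a child's text: the name of the course whose crsid is 124.
name_of_124 = root.findtext("./course[crsid='124']/cname")

# Descendant search with a predicate: course ids taught by S. Urban.
taught_by_s_urban = [c.findtext("crsid")
                     for c in root.findall(".//course[inst='S. Urban']")]

print(all_names, name_of_124, taught_by_s_urban)
```

Full XPath 1.0 features such as functions and arbitrary axes need a dedicated XPath engine rather than this subset.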
Other Programming Languages Support
• Document Object Model (DOM)
    • Standard way of accessing and manipulating XML elements.
    • Loads the entire XML document into memory (RAM).
    • Bi-directional traversal of the XML tree.
    • Slow, with high memory consumption, for large XML documents.
• Simple API for XML (SAX)
    • Event-driven parser to access XML elements.
    • Reads the XML document from the file element by element.
    • Unidirectional traversal of the XML tree (top to bottom).
    • Fast, with low memory consumption, for large XML documents.
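The contrast between the two models can be sketched with Python's standard xml.dom.minidom and xml.sax modules. Both halves extract the same course names from a hypothetical document; the handler class NameHandler and the variable names are illustrative.

```python
import xml.dom.minidom
import xml.sax

DOC = ("<dataroot>"
       "<course><cname>databases</cname></course>"
       "<course><cname>software engineering</cname></course>"
       "</dataroot>")

# DOM: the whole tree is built in memory and can be walked in any direction.
dom = xml.dom.minidom.parseString(DOC)
cname_nodes = dom.getElementsByTagName("cname")
dom_names = [node.firstChild.data for node in cname_nodes]
parent_tag = cname_nodes[0].parentNode.tagName  # child -> parent traversal


# SAX: the parser pushes events top to bottom; we keep only what we need.
class NameHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._in_cname = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "cname":
            self._in_cname = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it in a buffer.
        if self._in_cname:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "cname":
            self.names.append("".join(self._buffer))
            self._in_cname = False


handler = NameHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names

print(dom_names, parent_tag, sax_names)
```

The SAX handler never holds more than the current course name, which is why the event-driven style scales to documents that would not fit comfortably in memory as a DOM tree.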
Presenting XML Data
• Cascading Style Sheets (CSS)
    • Set of instructions to present data in a readable format.
    • Non-XML syntax to make data look pretty.
    • Used in conjunction with HTML.
• Extensible Stylesheet Language Transformations (XSLT)
    • Transforms an XML document into HTML, another XML document, or a text file.
    • Uses XPath extensively.
    • XML-like syntax.
Other Markup Languages
• MathML: markup language for mathematics.
• SVG: Scalable Vector Graphics.
• MusicXML: an XML-based music notation file format.
• VoiceXML: format for specifying interactive voice dialogues between a human and a computer.
• Linguists: use of XML in studying different languages and their grammar.
http://en.wikipedia.org/wiki/List_of_XML_markup_languages
Useful Links
• http://xml.coverpages.org/PESC-HS-Transcript2006.html XML High School Transcript Standard
• http://enterprise.astm.org/REDLINE_PAGES/E2369.htm XML Health Care Record Standard
• http://www.w3schools.com/xml/default.asp XML Tutorial from W3Schools
• http://www.w3schools.com/xsl/xsl_languages.asp XSL Tutorial from W3Schools
• http://www.w3schools.com/xpath/default.asp XPath Tutorial from W3Schools
• http://www.w3schools.com/xquery/default.asp XQuery Tutorial from W3Schools
• http://www.w3schools.com/dtd/default.asp DTD Tutorial from W3Schools
• http://www.w3schools.com/schema/default.asp XML Schema (XSD) Tutorial from W3Schools
• http://www.xml.com/pub/rg/XML_Editors XML Editors (a list of editors; not exhaustive, many more exist out there)
• http://www.w3.org/ The World Wide Web Consortium (W3C)
• http://www.wowwiki.com/XML_User_Interface World of Warcraft and XML
• http://docs.info.apple.com/article.html?artnum=93732 iTunes and XML
School Revisit
[Diagram: the same virtual organization, now exchanging XML: School (school records, XML), University (university records, XML), Company (employment records), Hospital (health records, XML), and Student/Person (personal records), connected over the Internet/network.]
• Sharing information
• Understanding what is being sent/received
Summary
• HTML and WWW
• Limitations of HTML
• Introduction to XML
• XML
    • Structure
    • Key features
    • Data-centric vs. document-centric
• Supporting technologies
Document Type Definition (DTD) The following slides are derived from the slides of Dr. Suzanne Dietrich. She is an assistant professor at the West campus, Department of Mathematical Sciences & Applied Computing.
Document Type Definition (DTD)
• Describes the structure of the XML document.
• Defines the legal parent-child relationships.
• Can be declared internally, in a section of the XML document before the root element:
    <!DOCTYPE root-element [element-declarations]>
• Can be attached to the XML document as an external reference:
    <!DOCTYPE root-element SYSTEM "filename.dtd">
Structure of DTD Document
• <!ELEMENT elementName contentSpecification>
• contentSpecification defines the content of the element:
    • ANY: no restrictions on the element's content; limited use.
    • EMPTY: cannot store any content (may still carry attributes).
    • #PCDATA: contains parsed character data (NO ELEMENTS). Special characters are written as entity references: &lt; (<), &gt; (>), &quot; ("), &apos; ('), &amp; (&).
        <!ELEMENT inst (#PCDATA)>
    • Nested elements, using parentheses.
    • Mixed elements: can contain parsed character data and nested elements.
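Parsed character data cannot contain a raw < or &, which is why the predefined entity references exist. The following sketch, assuming Python's standard xml.sax.saxutils.escape and xml.etree.ElementTree, escapes a hypothetical string, embeds it in an <inst> element, and shows that the parser returns the original characters.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical text containing characters that are special in XML.
raw = 'length < 4 & inst = "S. Urban"'

# escape() handles &, < and > by default; quotes are added via the mapping.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# The escaped text is safe to embed as parsed character data.
element = ET.fromstring(f"<inst>{escaped}</inst>")

print(escaped)
print(element.text)
```

Embedding the raw string directly would make the document ill-formed, since the parser would read the < as the start of a tag.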
DTD: Nested Elements
• (element1, element2, element3) indicates a sequence of elements, i.e., ordered:
    <!ELEMENT sequencedElements (element1, element2, element3)>
    <!ELEMENT course (crsid, cname, inst, length)>
• (elementA | elementB | elementC) indicates a choice of elements:
    <!ELEMENT choiceOfElements (elementA | elementB | elementC)>
    <!ELEMENT customer (name | company)>
DTD: Elements Cardinality
• element+: element occurs one or more times
• element*: element occurs zero or more times
• element?: optional (0 or 1)
• element: exactly once
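The sequence, choice, and cardinality operators behave much like regular-expression operators. Python's standard library does not validate against a DTD, so the sketch below only mimics the idea: it translates a simple content model into a regular expression over the child element names. The helpers content_model_to_regex and children_match are hypothetical, and #PCDATA, EMPTY, and ANY are not handled.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def content_model_to_regex(model):
    """Translate a simple DTD content model, e.g. '(a, b+, c?)', to a regex."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that consumes 'name;'.
    with_names = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + ";)", compact)
    # A comma means 'followed by', which is plain concatenation in a regex.
    return re.compile(with_names.replace(",", ""))


def children_match(element, model):
    """Check the element's child tag sequence against the content model."""
    tags = "".join(child.tag + ";" for child in element)
    return content_model_to_regex(model).fullmatch(tags) is not None


course_model = "(crsid, cname, inst+, length?)"
good = ET.fromstring("<course><crsid/><cname/><inst/><inst/></course>")
bad_order = ET.fromstring("<course><cname/><crsid/><inst/></course>")

customer_model = "(name | company)"
one_choice = ET.fromstring("<customer><name/></customer>")
both_choices = ET.fromstring("<customer><name/><company/></customer>")

checks = {
    "good": children_match(good, course_model),
    "bad_order": children_match(bad_order, course_model),
    "one_choice": children_match(one_choice, customer_model),
    "both_choices": children_match(both_choices, customer_model),
}
print(checks)
```

A real validating parser also checks attributes, entities, and text content, so this is an illustration of the operators rather than a substitute for DTD validation.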
DTD: Mixed Elements
    <!ELEMENT elementName (#PCDATA | child1 | child2 | …)*>
• Elements with mixed content allow both parsed character data and child elements.
• Allows any number of occurrences of #PCDATA or child elements.
• Not very useful for a document with a defined structure.
Limitations of DTD
• No support for newer features of XML, most importantly namespaces.
• Lack of expressivity: certain formal aspects of an XML document cannot be captured in a DTD.
• Custom non-XML syntax to describe the schema.