XML, XSLT


Presentation Transcript


  1. XML, XSLT

  2. Discussion on Markup Languages, Trends

  3. What is XML? • XML is the Extensible Markup Language • It is designed to enable the use of SGML on the World Wide Web. • It defines ‘an extremely simple dialect of SGML’ • XML is not a single, predefined markup language: it's a metalanguage -- a language for describing other languages • XML lets you define your own customized markup languages • XML is a markup language for structured documentation

  4. What is XML for? • To make it easy and straightforward to use SGML on the Web • easy to define document types • easy to author and manage SGML-defined documents • easy to transmit and share them across the Web • It defines ‘an extremely simple dialect of SGML’ which is completely described in the XML Specification • The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML • XML has been designed for ease of implementation, and for interoperability with both SGML and HTML

  5. What is SGML? • SGML is the Standard Generalized Markup Language (ISO 8879) • It is the international standard for defining descriptions of the structure and content of different types of electronic document • SGML is quite complex to implement and contains a lot of features that are very rarely used • SGML parsers and browsers are complex and difficult to write

  6. Why XML? • Extensibility • Structure • Validation • Media independence • Platform independence • Meaningful markup • Strict hierarchies ensure clean structures. • Define your own schema to suit tags

  7. Why XML? • Structure modeling (DTD) • UNICODE - multilingual support • XML replaces ASCII CSV for data interchange • XML provides the means for structured information exchange

  8. Why XML? • A bridge between SGML and HTML • Simplified version of SGML • SGML is too complex and heavy to use • It is difficult to interpret an SGML document • A new standard which would take the best features of SGML, yet keep it SIMPLE • HTML is too rigid. It has served its purpose. XML is more extensible • By defining your own markup language (DTD), you can encode the information of your documents much more precisely

  9. Why XML? • It removes two constraints which are holding back Web development • dependence on a single, inflexible document type (HTML) • the complexity of full SGML, whose syntax allows many powerful but hard-to-program options • Authors and providers can design their own document types using XML, instead of being stuck with HTML • Document types can be explicitly tailored to an audience, so the cumbersome fudging that has to take place with HTML to achieve special effects can become a thing of the past

  10. Why XML? • Information content can be richer and easier to use, because the hypertext linking abilities of XML are much greater than those of HTML • XML can provide more and better facilities for browser presentation and performance, using CSS and XSL stylesheets • Information will be more accessible and reusable, because the more flexible markup of XML can be used by any XML software instead of being restricted to specific manufacturers as has become the case with HTML

  11. SGML, XML and HTML… • SGML is the `mother tongue', used for describing thousands of different document types • HTML is just one of these document types, the one most frequently used in the Web • It defines a simple, fixed type of document with markup designed for a common class of documents • XML is an abbreviated version of SGML • Makes it easier for you to define your own document types • Omits the more complex and less-used parts of SGML

  12. SGML, XML and HTML… • XML itself does not replace HTML: instead, it provides an alternative which allows you to define your own set of markup elements • HTML is expected to remain in common use for some time to come • Document Type Definitions for HTML are available in XML versions (XHTML)

  13. HTML vs. XML • HTML: presentation + unstructured data; an application of SGML; predefined tags; inability to nest components properly; one view • XML: data structure divorced from presentation (more OO-like); a subset of SGML and a metalanguage; user-defined tags; ability to nest components using a DTD; one document, multiple views

  14. Example: a resume • HTML representation: <HTML> <HEAD> <TITLE>RESUME</TITLE> </HEAD> <BODY> <B>Name</B> Amit Rekhi<BR/> <B>Age</B> 23yrs<BR/> </BODY> </HTML> • Resume DTD: <!ELEMENT Resume (Name, Age, Sex,…)> <!ELEMENT Name (#PCDATA)> <!ELEMENT Age (#PCDATA)> <!ELEMENT Sex (#PCDATA)> • XML representation: <?xml version="1.0"?> <Resume> <Name>Amit Rekhi</Name> <Age>23yrs</Age> <Sex>Male</Sex> </Resume>

  15. What happened to HTML? • HTML is already overburdened with dozens of interesting but incompatible inventions from different manufacturers, because it provides only one way of describing your information. • It is too rigid and fixed. There is no easy, generic and standard way to extend HTML • It mixes structure with presentation • It does not allow groups of people or organizations to create their own customized, standardized markup applications for exchanging information in their domain • HTML has served its purpose of making the web popular. Now something better and more extensible is needed

  16. What happened to HTML? • HTML is broken. A few start tags do not have end tags. • Can I make my existing HTML files work in XML? If so how?

  17. Why XML and not Word or Notes? • Public information cannot afford to be restricted to one make or model or manufacturer • It is helpful for such information to be in a standard form that can be reused in many different ways, as this can minimize wasted time and effort • Proprietary data formats, no matter how well documented or publicized, are simply not an option • XML gives a standard format for document interchange which is easily processed by programs

  18. Is XML the same as C/C++/Java? • XML is a markup language • C/C++/Java are programming languages • XML does not have programming constructs. • No if, for…next etc. • No compiler, ONLY a parser • If there are no constructs, then how to represent logic? Do I need to do it? If so where is it done?

  19. How to control presentation in XML? • In XML, you can define your own tagset; consequently browsers cannot know anything about the names/elements you use, so the use of a stylesheet is required • XML deals only with structure. Style and presentation are taken care of separately • Concept of presentation is similar to Document View Architecture • How should presentation be taken care of? Any reuse of XML here?

  20. Can I use Java/C++/Scripts in XML? • XML is ONLY about describing information • Scripting languages and other programming languages provide embedded functionality that lets information be manipulated at the user's end; they are not used to represent structure • No place for PLs in XML • Do I need to represent PL logic in XML? How?

  21. Can I use PLs to create XML files? • Any programming language can be used to output data from any source in XML format • XML is an interchange format. It is an input/output format. It only represents structure • Implementations are available to manipulate XML • Should you have APIs to access XML? What types? How would APIs relate to PLs?
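
The point of slide 21, that any programming language can emit XML, can be sketched in a few lines. This is only an illustration using Python's standard `xml.etree.ElementTree` module; the function name `build_resume` is made up for this example, and the element names are borrowed from the resume example on slide 14.

```python
import xml.etree.ElementTree as ET


def build_resume(name: str, age: str, sex: str) -> str:
    """Serialize three fields as a <Resume> document (hypothetical helper)."""
    resume = ET.Element("Resume")
    ET.SubElement(resume, "Name").text = name
    ET.SubElement(resume, "Age").text = age
    ET.SubElement(resume, "Sex").text = sex
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(resume, encoding="unicode")


resume_xml = build_resume("Amit Rekhi", "23yrs", "Male")
print(resume_xml)

# The serializer escapes markup characters in text for us.
escaped_xml = build_resume("A & B", "1", "n/a")
print(escaped_xml)
```

Building the tree through an API rather than concatenating strings means reserved characters such as `&` are escaped automatically, so the output stays well-formed.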

  22. How to execute XML files? • You can't and you don't • XML is not a programming language, so XML files don't ‘run’ or ‘execute’ • XML files are data: You have to • Run a program which displays them (like a browser) • Write a program that does some work with them (like a converter which writes the data in another format) • Create a program that creates them

  23. What does XML look like? <?xml version="1.0" standalone="yes"?> <conversation> <greeting>Hello, world!</greeting> <response> Stop the planet,I want to get off! </response> </conversation>

  24. What does XML look like? <?xml version="1.0" encoding="UTF-8" standalone="no"?> <!DOCTYPE titlepage SYSTEM "http://www.frisket.org/dtds/typo.dtd" [<!ENTITY % active.links "INCLUDE">]> <titlepage> <white-space type="vertical" amount="36"/> <title font="Baskerville" size="24/30" alignment="centered">Hello, world!</title> <image location="http://www.foo.bar/fleuron.eps" type="URL" alignment="centered"/> </titlepage>

  25. XML - A Simple Example Structure Definition <!ELEMENT order (order-no, deliver-to, item+) > <!ELEMENT order-no (#PCDATA) > <!ELEMENT deliver-to (address) > <!ELEMENT item (name, quantity) > • DTD - defines STRUCTURE of XML documents • DTD - used for VALIDATION of XML documents

  26. XML - Instance Data <!DOCTYPE order SYSTEM "http://www.something.org/messages/xml/message1.xml"> <order> <order-no>0000123</order-no> <deliver-to> <address> <company>A. B. Infosys Private Limited</company> <street>B-102, Gulmohar Park</street> <town>New Delhi</town> <region>India</region> <postcode>110049</postcode> </address> </deliver-to> <item> <name>Pencil</name> <quantity>12</quantity> </item> </order>
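
As slide 22 says, XML files are data, so something has to read them. The following is a minimal sketch of such a program, assuming Python's standard `xml.etree.ElementTree`. It uses a shortened copy of the order instance from slide 26 with the DOCTYPE line left out, and the function name `summarize_order` and the keys of the returned dictionary are invented for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the order instance; the DOCTYPE line is omitted here.
ORDER_XML = """\
<order>
  <order-no>0000123</order-no>
  <deliver-to>
    <address>
      <company>A. B. Infosys Private Limited</company>
      <town>New Delhi</town>
      <postcode>110049</postcode>
    </address>
  </deliver-to>
  <item><name>Pencil</name><quantity>12</quantity></item>
</order>
"""


def summarize_order(xml_text: str) -> dict:
    """Pull a few fields out of an <order> document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "order_no": root.findtext("order-no"),
        "town": root.findtext("deliver-to/address/town"),
        # item+ in the DTD means one or more, so collect them all
        "items": [
            (item.findtext("name"), int(item.findtext("quantity")))
            for item in root.findall("item")
        ],
    }


summary = summarize_order(ORDER_XML)
print(summary)
```

The paths passed to `findtext` mirror the nesting that the DTD on slide 25 describes, which is why a program can navigate the document without knowing anything about how it will be displayed.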

  27. Benefits of XML • Separation of Presentation (XSL) from Structure (XML-DTD) • All XML-based languages can be parsed using a single browser • Easy development of 3-tier Web applications. • Data integration from disparate sources. • XML data is self-describing • Interchange format for a variety of applications

  28. Benefits of XML • Local computation and manipulation. • Multiple views of data. • XML based on Open Standards

  29. EXERCISE 1 XML

  30. What do you see here? <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <!DOCTYPE FAQ SYSTEM "FAQ.DTD"> <FAQ> <INFO> <SUBJECT>XML</SUBJECT> <AUTHOR>Lars Marius Garshol</AUTHOR> <EMAIL>larsga@ifi.uio.no</EMAIL> <VERSION>1.0</VERSION> <DATE>20.jun.97</DATE> </INFO> <PART NO="1"> <Q NO="1"> <QTEXT>What is XML?</QTEXT> <A>SGML light.</A> </Q> </PART> </FAQ>

  31. … And here? <!ELEMENT FAQ (INFO, PART+)> <!ELEMENT INFO (SUBJECT, AUTHOR, EMAIL?, VERSION?, DATE?)> <!ELEMENT SUBJECT (#PCDATA)> <!ELEMENT AUTHOR (#PCDATA)> <!ELEMENT EMAIL (#PCDATA)> <!ELEMENT VERSION (#PCDATA)> <!ELEMENT DATE (#PCDATA)> <!ELEMENT PART (Q+)> <!ELEMENT Q (QTEXT, A)> <!ELEMENT QTEXT (#PCDATA)> <!ELEMENT A (#PCDATA)> <!ATTLIST PART NO CDATA #IMPLIED TITLE CDATA #IMPLIED> <!ATTLIST Q NO CDATA #IMPLIED>

  32. XML Document • Valid: conforms to a DTD • Well-formed: obeys XML syntax

  33. Concepts • Document Type Definition (DTD) • Validity • Well-formedness

  34. Concepts - DTD • A Document Type Definition (DTD) is a file written in XML's declaration syntax • It contains a formal description of the syntax and structure of a particular type of document • It sets out what names can be used for element types • It sets out where elements may occur • It also shows how all elements and other constructs fit together • The concept of a DTD and XML is similar to a class and an object

  35. DTD Sample A DTD fragment: . . . <!ELEMENT List (Item)+> <!ELEMENT Item (#PCDATA)> An XML instance: . . . <List> <Item>Chocolate</Item> <Item>Music</Item> <Item>Surfing</Item> </List>
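
One way to read a content model such as `(Item)+` is as a regular expression over the names of an element's children. The sketch below illustrates that idea only; it is not a DTD validator, and Python's standard library does not provide one. The `CONTENT_MODELS` table and the function `children_match_model` are hypothetical names, and the regex was translated from the DTD fragment by hand.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: element name -> regex over child names.
# Each child name is followed by a single space so names cannot run together.
CONTENT_MODELS = {
    "List": re.compile(r"(Item )+"),  # <!ELEMENT List (Item)+>
}


def children_match_model(element: ET.Element) -> bool:
    """Check an element's direct children against its content model."""
    model = CONTENT_MODELS[element.tag]
    child_names = "".join(child.tag + " " for child in element)
    return model.fullmatch(child_names) is not None


good = ET.fromstring(
    "<List><Item>Chocolate</Item><Item>Music</Item><Item>Surfing</Item></List>"
)
empty = ET.fromstring("<List/>")
wrong = ET.fromstring("<List><Item>Surfing</Item><Note>x</Note></List>")

print(children_match_model(good))   # True: one or more Item children
print(children_match_model(empty))  # False: '+' requires at least one Item
print(children_match_model(wrong))  # False: Note is not in the model
```

A real validating parser does this for every element and also checks attributes and entities; the sketch covers only the "one or more" rule from the fragment above.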

  36. Concepts - Validity • Valid XML files are those which have a Document Type Definition (DTD) and adhere to it • They must already be well-formed • A valid file begins with a Document Type Declaration (the <!DOCTYPE> line), which names the DTD • An XML version of the specified DTD must be accessible to the XML processor. This can be specified by supplying the URL for the DTD in a System Identifier • Sample XML file: • <?xml version="1.0"?> <!DOCTYPE advert SYSTEM "….">

  37. Concepts - Well-formedness • All tags must be balanced: that is, all elements which may contain character data must have both start- and end-tags present • All attribute values must be in quotes • Any EMPTY element tags must either end with ‘/>’ or you have to make them appear non-EMPTY by adding a real end-tag • There must not be any isolated markup-start characters (< or &) in your text data. If present it should be escaped. • Elements must nest inside each other properly

  38. Sample well-formed XML file? <?xml version="1.0" standalone="yes"?> <foo> <bar>...<blort/>...</bar> </foo>
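
The well-formedness rules on slide 37 are exactly what a non-validating parser enforces, so a rough check is simply "does it parse?". A minimal sketch using Python's standard `xml.etree.ElementTree` follows; the helper name `is_well_formed` and the labels in `samples` are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "slide sample": "<foo> <bar>...<blort/>...</bar> </foo>",
    "unbalanced tag": "<foo><bar></foo>",
    "unquoted attribute": "<foo id=1/>",
    "bare ampersand": "<foo>fish & chips</foo>",
    "bad nesting": "<a><b></a></b>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample passes. Note that this says nothing about validity: a document can be well-formed without having, or conforming to, any DTD.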

  39. Structure of an XML file Word Document

  40. XML Structure: Elements • Elements are the most common form of markup • Delimited by angle brackets (start tags, end tags), most elements identify the nature of the content they surround. • Some elements may be empty in which case they have no content and are shown as <element/> • If an element is not empty, it begins with a start-tag, <element>, and ends with an end-tag, </element> • Element Sample • <Element>Sample Content </Element>

  41. XML Structure: Attributes • Attributes are name-value pairs that occur inside tags after the element name • All attribute values must be quoted • Attribute Sample • <Element attr1="sample attribute">Sample Content </Element>
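
To a program, the name-value pairs of an element typically show up as a small dictionary. The sketch below assumes Python's standard `xml.etree.ElementTree` and reuses the `<title>` element from slide 24; the attribute name `color` is hypothetical and is looked up only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

title = ET.fromstring(
    '<title font="Baskerville" size="24/30" alignment="centered">'
    "Hello, world!</title>"
)

# All attributes of the element, as a plain dict of strings.
attributes = dict(title.attrib)
print(attributes)

# Single lookups; a missing attribute yields None or a supplied default.
font = title.get("font")
missing = title.get("color")
fallback = title.get("color", "black")
print(font, missing, fallback)
```

Attribute values always arrive as strings; interpreting `24/30` as a size and leading is left to the application.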

  42. XML Structure: Entity References • An entity reference refers to the content of a named entity • They are used to insert reserved markup characters (<, &, ") into your document as content • Entities are also used to refer to often repeated or varying text and to include the content of external files • In order to use an entity, you simply reference it by name • References to parsed general entities use ampersand (&) and semicolon (;) as delimiters. Parameter-entity references use percent-sign (%) and semicolon (;) as delimiters

  43. XML Structure: Char. References • Entity references sample: &amp; and %pe1; • A character reference is a special form of entity reference • It can be used to insert arbitrary Unicode characters into your document • This is a mechanism for inserting characters that cannot be directly typed • Character references take one of two forms: • decimal references, e.g. &#8478; • hexadecimal references, e.g. &#x211E;
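
A parser resolves the predefined entity references and character references before the application sees the text. The following sketch, assuming Python's standard `xml.etree.ElementTree`, shows that the decimal and hexadecimal forms from slide 43 name the same character; the `<note>` element and its wording are invented for the example.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring(
    "<note>Fish &amp; chips &lt;hot&gt; &#8478; &#x211E;</note>"
)

# References are already replaced by the characters they stand for.
text = note.text
print(text)

# 8478 decimal and 211E hexadecimal are the same code point.
same_code_point = int("211E", 16) == 8478
print(same_code_point)
```

Parameter-entity references such as `%pe1;` are a different case: they are only meaningful inside a DTD and are not shown here.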

  44. XML Structure:Comments • Comments begin with “<!--” and end with “-->” • They can contain any data except the literal string “--” • You can place comments between markup anywhere in your document • They are not part of the textual content of an XML document • XML Comment Sample • <!-- This is a sample comment -->

  45. XML Struc.: Processing Instructions • Processing instructions (PIs) are an escape hatch to provide information to an application • They are not textually part of the XML document • They have the form: <?name pidata?> • The name, called the PI target, identifies the PI to the application • The names used in PIs may be declared as notations in order to formally identify them • Any data that follows the PI target is optional; it is for the application that recognizes the target • Sample in PI syntax • <?xml version="1.0"?> (strictly, this is the XML declaration, which has the form of a PI but is not classed as one by the XML specification; no space is allowed between "<?" and the target)
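
The split between PI target and PI data can be seen by asking a DOM parser for the processing-instruction nodes. This is a sketch using Python's standard `xml.dom.minidom`; the `xml-stylesheet` instruction is a commonly used PI, while the file name `style.css` and the `<doc/>` element are assumptions made for the example.

```python
from xml.dom import Node, minidom

SOURCE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<doc/>"
)

document = minidom.parseString(SOURCE)

# Collect (target, data) for every PI that is a direct child of the document.
instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

Only the stylesheet instruction is reported. The leading `<?xml version="1.0"?>` line is consumed by the parser as the XML declaration and does not appear as a PI node.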

  46. XML Structure: CDATA Sections • CDATA sections may occur anywhere character data may occur • They are used to escape blocks of text containing characters which would otherwise be recognized as markup • They begin with the string "<![CDATA[" and end with the string "]]>" • The only string that cannot occur in a CDATA section is "]]>" • Sample XML CDATA Section: • <![CDATA[ I < 3; ]]>
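
A CDATA section is only a different way of writing character data, so the application receives the same text as it would from the escaped form. A small sketch with Python's standard `xml.etree.ElementTree`, wrapping the slide's sample in a made-up `<code>` element:

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring("<code><![CDATA[ I < 3; ]]></code>")
with_escape = ET.fromstring("<code> I &lt; 3; </code>")

# Both spellings deliver identical character data to the application.
print(repr(with_cdata.text))
print(with_cdata.text == with_escape.text)
```

The choice between the two is a matter of authoring convenience: CDATA is easier to read when a block contains many `<` or `&` characters.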

  47. XML Structure Document Type Declarations

  48. XML Structure:Doc.Type Decl. • For any XML document to have meaning there must be some constraint on the sequence and nesting of tags • Declarations are where these constraints can be expressed • Declarations allow a document to communicate meta-information to the parser about its content • Meta-information includes • Allowed sequence and nesting of tags • Attribute values and their types and defaults • the names of external files that may be referenced and whether or not they contain XML • the formats of some external (non-XML) data, and entities

  49. XML Structure:Doc.Type Decl. • There are four kinds of declarations in XML: • element declarations • attribute list declarations • entity declarations • notation declarations

  50. XML Structure:Element Decl. • The element structure of an XML document may, for validation purposes, be constrained using element type and attribute-list declarations. • Element type declarations often constrain which element types can appear as children of the element • Element declarations identify the names of elements and the nature of their content (content model) • In addition to element names, the special symbol #PCDATA is reserved to indicate character data • Elements with both element content and PCDATA content are said to have “mixed content”
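
Mixed content, where character data and child elements are interleaved, is worth seeing from the program's side. The sketch below assumes Python's standard `xml.etree.ElementTree`, which stores text before the first child in `.text` and text after each child in that child's `.tail`. The `<para>` and `<term>` element names and the helper `has_mixed_content` are invented for this example.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<para>XML is a <term>metalanguage</term>, not a fixed tag set.</para>"
)
term = para[0]

print(repr(para.text))  # character data before the child element
print(repr(term.text))  # content of the child element
print(repr(term.tail))  # character data after the child element


def has_mixed_content(element: ET.Element) -> bool:
    """True if the element has child elements and non-blank text of its own."""
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    return len(element) > 0 and has_text


print(has_mixed_content(para))
print(has_mixed_content(term))
```

In a DTD, a model of this kind would be declared with `#PCDATA` listed alongside the permitted element names.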
