Tutorial 3: XML

Presentation Transcript


  1. Tutorial 3: XML. Validating Documents with DTDs

  2. Section 3.1 Creating a Valid Document

  3. Customer orders table

  4. The structure of the orders.xml document
     customers
       customer +  (attributes: custID, [custType])
         name  (attribute: [title])
         address
         phone
         email ?
         orders
           order +  (attributes: orderID, orderBy)
             orderDate
             items
               item +  (attributes: itemPrice, [itemQty])
     Attributes in square brackets are optional; + marks one or more occurrences and ? marks zero or one.
     • The customers element must have at least one customer child.
     • A customer must have a custID, name, address, and phone, and may have a custType, title, and email.
     • An orders element groups the one or more separate orders placed by a customer; it must have at least one order child.
     • The items element must have at least one item child.

  5. The first customer in orders.xml

  6. DTD and A Valid Document • An XML document can be validated using either DTDs (Document Type Definitions) or schemas. • A DTD is a collection of rules that define the content and structure of an XML document. • A DTD can be used to: • enforce a specific data structure • ensure all required elements are present • prevent undefined elements from being used • specify the use of attributes and define their possible values

  7. Declaring a DTD • A DTD is declared in a DOCTYPE statement. It must be added to the document prolog, after the XML declaration and before the document's root element. • While there can only be one DTD per XML document, it can be divided into two parts: • An internal subset is placed within the same XML document. • An external subset is located in a separate file.

  8. To declare an internal DTD subset <!DOCTYPE root [ declarations ]> where root is the name of the document's root element and declarations are the DTD rules • An example: <!DOCTYPE customers [ ]>
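
To make the placement in the prolog concrete, here is a minimal sketch of a complete document with an internal subset. The two element declarations and the sample content are assumptions chosen to keep the example short; they are deliberately simpler than the real orders.xml structure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: this simplified content model is assumed, not taken from orders.xml -->
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (#PCDATA)>
]>
<customers>
  <customer>Lea Ziegler</customer>
</customers>
```

The DOCTYPE statement sits after the XML declaration and before the root element, and the name after DOCTYPE matches the root element name.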

  9. To declare an external DTD subset • External subsets have two types of locations: system and public. For a system DTD, the syntax is <!DOCTYPE root SYSTEM "uri_ExternalFile"> • An example: <!DOCTYPE customers SYSTEM "rules.dtd">

  10. To declare an external DTD subset • The syntax of the DOCTYPE declaration using a public identifier: <!DOCTYPE root PUBLIC "id" "uri"> where id is the public identifier, which names the DTD much as a namespace URI names a namespace, and uri is the location of the DTD file • An example: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

  11. Using External & Internal DTDs • The real power of XML comes from an external DTD that can be shared among many documents. • If a document contains both an internal and an external subset, the internal subset takes precedence over the external subset if there is a conflict between the two. This applies to entity and attribute declarations, where the first declaration read is the binding one; an element may be declared only once. • This way, the external subset can define basic rules for all the documents, and the internal subset can define the rules specific to each document.
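
A rough sketch of a document that combines both subsets is shown here. The file name rules.dtd is reused from the earlier system-identifier example, and its contents are unknown; the entity name storeName and its value are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- rules.dtd is assumed to declare the customers element and its children -->
<!DOCTYPE customers SYSTEM "rules.dtd" [
  <!-- Internal subset: it is read first, so this declaration would override
       a storeName entity declared in rules.dtd (hypothetical entity) -->
  <!ENTITY storeName "hypothetical store name">
]>
<customers>
  <!-- content is validated against the external and internal subsets together -->
</customers>
```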

  12. Using External and Internal DTDs

  13. Declaring Document Elements • In a valid document, every element must be declared in the DTD. • The syntax of an element declaration is: <!ELEMENT element content-model> where element is the name of the element and content-model specifies what type of content the element contains. • The element name is case sensitive.

  14. Five values for content-model • ANY - No restrictions on the element’s content • EMPTY - The element cannot store any content • #PCDATA - The element can only contain parsed character data • Elements - The element can only contain child elements • Mixed - The element contains both parsed character data and child elements

  15. <!ELEMENT element ANY> • An example: <!ELEMENT product ANY> All of the following satisfy the above declaration (provided any child elements and attributes they use are themselves declared):
      <product>PLBK70 Painted Lady Breeding Kit</product>
      <product type="Painted Lady Breeding Kit" />
      <product><name>PLBK70</name><type>Painted Lady Breeding Kit</type></product>

  16. <!ELEMENT element EMPTY> • An example: <!ELEMENT img EMPTY> The following would satisfy the above declaration: <img />

  17. <!ELEMENT element (#PCDATA)> • An example: <!ELEMENT name (#PCDATA)> would permit the following element in an XML document: <name>Lea Ziegler</name> • A PCDATA element may contain plain text. The "parsed" part means that the text is parsed for markup rather than treated as raw text, and that entity references in it are replaced. • A PCDATA element does not allow child elements.

  18. <!ELEMENT parent (children)> • An example: <!ELEMENT customer (phone)> • The customer element can contain only a single child element, named phone. • The following would be invalid because customer also contains a name element:
      <customer>
        <name>Lea Ziegler</name>
        <phone>555-2819</phone>
      </customer>

  19. Specifying an element sequence <!ELEMENT parent (child1, child2, ...)> where child1, child2, ... is the order in which the child elements must appear within the parent element • <!ELEMENT customer (name, phone, email)> makes the document below invalid, because email appears before phone:
      <customer>
        <name>Lea Ziegler</name>
        <email>LZiegler@tempmail.net</email>
        <phone>(813) 555-8931</phone>
      </customer>

  20. Specifying an element choice <!ELEMENT parent (child1 | child2 | ...)> where child1, child2, ... are the possible child elements of the parent element • <!ELEMENT customer (name | company)> allows the customer element to contain either the name element or the company element. • <!ELEMENT customer ((name | company), phone, email)> indicates that the customer element must have three child elements: either name or company, followed by phone and then email.
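
As a quick illustration of a choice nested inside a sequence, the sketch below wraps the last declaration in a complete document. Declaring the four child elements as #PCDATA is an assumption; the slides do not state their content models at this point.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customer [
  <!-- either name or company, then phone, then email, in that order -->
  <!ELEMENT customer ((name | company), phone, email)>
  <!-- assumed: the children hold text only -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<customer>
  <name>Lea Ziegler</name>
  <phone>(813) 555-8931</phone>
  <email>LZiegler@tempmail.net</email>
</customer>
```

Swapping the phone and email elements, or supplying both name and company, would make the document invalid.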

  21. Modifying Symbols • DTDs use a modifying symbol to specify the number of occurrences of each element • ? allows zero or one of the item • + allows one or more of the item • * allows zero or more of the item • If you want to specify that an element contains exactly three child elements, you have to enter the sequence child, child, child into the declaration

  22. Modifying Symbols • <!ELEMENT customers (customer+)> the customers element must contain at least one element named customer • <!ELEMENT order (orderDate, items)+> the (orderDate, items) sequence can be repeated one or more times within each order element • <!ELEMENT customer (name, address, phone, email?, orders)> the customer element contains zero or one email element

  23. Declaring child elements • The customers element can contain one or more customer elements. • The customer element has the following child elements: name, address, phone, email (optional), and orders.
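
Putting the element declarations together, one possible DTD for the structure described in this section might look like the sketch below. The customers and customer declarations come from the slides; the remaining declarations are inferred from the structure diagram, and the #PCDATA content of the leaf elements and all sample values are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (name, address, phone, email?, orders)>
  <!-- inferred from the structure diagram -->
  <!ELEMENT orders (order+)>
  <!ELEMENT order (orderDate, items)>
  <!ELEMENT items (item+)>
  <!-- assumed: the leaf elements hold text only -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT orderDate (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
]>
<customers>
  <customer>
    <name>Lea Ziegler</name>
    <address>(placeholder address)</address>
    <phone>555-2819</phone>
    <!-- email omitted: permitted by email? -->
    <orders>
      <order>
        <orderDate>(placeholder date)</orderDate>
        <items>
          <item>PLBK70</item>
        </items>
      </order>
    </orders>
  </customer>
</customers>
```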

  24. Working with Mixed Content • Mixed content elements contain both parsed character data and child elements. The syntax is: <!ELEMENT parent (#PCDATA | child1 | child2 | … )*> • The parent element can contain character data or any number of the specified child elements, or it can contain no content at all. • It is better not to work with mixed content if you want a tightly structured document.
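
A small sketch of mixed content follows. The element names summary, name, and product are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE summary [
  <!-- hypothetical element names; text and the listed children may appear in any order -->
  <!ELEMENT summary (#PCDATA | name | product)*>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT product (#PCDATA)>
]>
<summary>Customer <name>Lea Ziegler</name> ordered one <product>PLBK70</product> kit.</summary>
```

Because the model cannot constrain the order or count of the children, an empty summary element or one containing only text would also be valid.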

  25. Section 3.2 Declaring Attributes

  26. Declaring Element Attributes Add an attribute-list declaration to the document’s DTD to accomplish the following: • list the names of all of the attributes associated with a specific element • specify the data type of each attribute • indicate whether each attribute is required or optional • provide a default value for each attribute, if necessary

  27. Attributes used in orders.xml

  28. Declaring Attributes in a DTD
      <!ATTLIST element attribute1 type1 default1 attribute2 type2 default2 attribute3 type3 default3 … >
      or, equivalently,
      <!ATTLIST element attribute1 type1 default1>
      <!ATTLIST element attribute2 type2 default2>
      <!ATTLIST element attribute3 type3 default3>
      • element is the name of the element associated with the attributes • attribute is the name of an attribute • type is the attribute’s data type • default indicates whether the attribute is required and whether it has a default value

  29. Declaring Attribute Names Attribute-list declarations can be placed anywhere within the document type declaration, although it is easier to work with them if they are located adjacent to the declaration for the element with which they are associated.

  30. Attribute Types Attribute values can consist only of character data, but you can control the format of those characters • CDATA - character data • Enumerated list - a list of possible attribute values • ID - a unique text string • IDREF - a reference to an ID value • IDREFS - a list of ID references separated by white space • ENTITY - a reference to an external unparsed entity • ENTITIES - a list of entities separated by white space • NMTOKEN - an accepted XML name token • NMTOKENS - a list of XML name tokens separated by white space

  31. CDATA can contain any character except those reserved by XML <!ATTLIST element attribute CDATA default> • <!ATTLIST item itemPrice CDATA> <!ATTLIST item itemQty CDATA> (the default part is left out of these examples for brevity; a complete declaration needs one) • Any of the following attribute values are allowed:
      <item itemPrice="29.95"> . . . </item>
      <item itemPrice="$29.95"> . . . </item>
      <item itemPrice="£29.95"> . . . </item>

  32. Enumerated Types: Attributes that are limited to a set of possible values <!ATTLIST element attribute (value1 | value2 | value3 | ...) default> where value1, value2, ... are the allowed values • <!ATTLIST customer custType (home | business | school)> • Any custType attribute whose value is not "home", "school", or "business" causes parsers to reject the document as invalid
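
The enumerated declaration can be tried out in a small self-contained document such as the sketch below. The #IMPLIED default makes the declaration complete, and the simplified customer content model is an assumption made to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customer [
  <!-- assumed: simplified content model for illustration -->
  <!ELEMENT customer (#PCDATA)>
  <!ATTLIST customer custType (home | business | school) #IMPLIED>
]>
<!-- custType="school" is in the list; a value such as "retail" would be rejected -->
<customer custType="school">Lea Ziegler</customer>
```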

  33. Tokenized Types are character strings that follow certain rules (known as tokens) for format & content • DTDs support four kinds of tokens: IDs, ID references, name tokens, and entities

  34. An ID token is used when an attribute value must be unique within the document <!ATTLIST customer custID ID> • This declaration ensures that each customer has a unique ID • The following elements would not be valid because the same custID value is used more than once:
      <customer custID="Cust021"> ... </customer>
      <customer custID="Cust021"> ... </customer>

  35. <!ATTLIST element attribute IDREF default> An attribute declared as an IDREF token must have a value equal to the value of an ID attribute located somewhere in the same document. This enables an XML document to contain cross-references between one element and another. • <!ATTLIST order orderBy IDREF> • When an XML parser encounters this attribute, it searches the XML document for an ID value that matches the value of the orderBy attribute. If it doesn't find one, it rejects the document as invalid.

  36. An attribute can contain a list of ID references (IDREFS)
      <!ATTLIST customer orders IDREFS>
      <!ATTLIST order orderID ID>
      <customer orders="OR3413 OR3910 OR5310"> ... </customer>
      ...
      <order orderID="OR3413"> ... </order>
      <order orderID="OR3910"> ... </order>
      <order orderID="OR5310"> ... </order>

  37. Specifying attribute IDs and IDREFs • Each custID value must be unique in the document. • Each orderBy value must reference an ID value somewhere in the document.
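
The cross-reference between orderBy and custID can be sketched in a complete document as follows. The attribute names and sample values come from the slides; the trimmed-down element declarations and the use of #REQUIRED to complete each declaration are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customers [
  <!-- assumed: element structure trimmed to what the example needs -->
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (orders)>
  <!ELEMENT orders (order+)>
  <!ELEMENT order EMPTY>
  <!ATTLIST customer custID ID #REQUIRED>
  <!ATTLIST order orderID ID    #REQUIRED
                  orderBy IDREF #REQUIRED>
]>
<customers>
  <customer custID="Cust021">
    <orders>
      <!-- orderBy must match an ID value somewhere in the document -->
      <order orderID="OR3413" orderBy="Cust021"/>
    </orders>
  </customer>
</customers>
```

Changing orderBy to a value that no ID attribute carries, or reusing "Cust021" as an orderID, would make the document invalid.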

  38. Attribute Defaults There are four possible defaults: • #REQUIRED: The attribute must appear with every occurrence of the element. • #IMPLIED: The attribute is optional. • A default value: A validating XML parser will supply the default value if the attribute is not specified. • #FIXED: The attribute is optional. If an attribute value is specified, it must match the default value.

  39. Examples of attribute defaults
      <!ATTLIST customer custID ID #REQUIRED> A customer ID is required for every customer.
      <!ATTLIST customer custType (school | home | business) #IMPLIED> If an XML parser encounters a customer element without a custType attribute, it supplies no value and treats the attribute as unspecified.
      <!ATTLIST item itemQty CDATA "1"> Assume a value of "1" for itemQty if it's missing.

  40. Specifying attribute defaults
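
A compact sketch of a default value in action is given below. The itemQty declaration is the one from the slides; treating itemPrice as #REQUIRED, the #PCDATA content of item, and the sample values are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE items [
  <!ELEMENT items (item+)>
  <!-- assumed: item holds text only -->
  <!ELEMENT item (#PCDATA)>
  <!-- assumed: itemPrice is required; itemQty falls back to "1" -->
  <!ATTLIST item itemPrice CDATA #REQUIRED
                 itemQty   CDATA "1">
]>
<items>
  <!-- no itemQty here, so the parser reports itemQty="1" -->
  <item itemPrice="29.95">MBL25</item>
  <!-- an explicit value overrides the default -->
  <item itemPrice="29.95" itemQty="2">PLBK70</item>
</items>
```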

  41. DTDs and Namespaces • You can work with namespace prefixes, applying a validation rule to the element's qualified name. <!ELEMENT cu:phone (#PCDATA)> • Any namespace declarations in a document must also be included in the DTD for the document to be valid. This is usually done using a #FIXED default for the namespace's URI. <!ATTLIST cu:customers xmlns:cu CDATA #FIXED "http://www.butterfly.com/customers">

  42. Validating a Document with XMLSpy • The Web is an excellent source for validating parsers, including Web sites to which you can upload your XML document for free to have it validated. • XMLSpy is an XML development environment created by Altova, which is used for designing and editing professional applications involving XML, XML Schema, and other XML-based technologies.
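
The two declarations above can be combined into a small valid document, sketched here. The content model of cu:customers is an assumption made for brevity. Note that a DTD matches the qualified name literally, so the document must use the same cu prefix that the DTD uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cu:customers [
  <!-- assumed: simplified content model for illustration -->
  <!ELEMENT cu:customers (cu:phone+)>
  <!ELEMENT cu:phone (#PCDATA)>
  <!-- the namespace declaration is itself an attribute, so it must be declared -->
  <!ATTLIST cu:customers xmlns:cu CDATA #FIXED "http://www.butterfly.com/customers">
]>
<cu:customers xmlns:cu="http://www.butterfly.com/customers">
  <cu:phone>555-2819</cu:phone>
</cu:customers>
```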

  43. Section 3.3 Working with Entities

  44. Introducing Entities • XML supports the following built-in entities: &amp; &lt; &gt; &apos; &quot; • If you have a long text string that will be repeated throughout your XML document, avoid data entry errors by placing the text string in its own entity. • You can create your own customized set of entities corresponding to text strings, such as product descriptions, that you want referenced by the XML document.

  45. Working with General Entities • A general entity is an entity that references content to be used within an XML document. That content can be either parsed or unparsed. • A parsed entity references text that can be readily interpreted or parsed by an application reading the XML document. • An entity that references content that is either nontextual or which cannot be interpreted by an XML parser is an unparsed entity. One example of an unparsed entity is an entity that references a graphic image file.

  46. Working with General Entities • The content referenced by an entity can be placed either within the DTD or in an external file. Internal entities reference content found in the DTD. External entities reference content from external files.

  47. Internal Parsed Entities <!ENTITY entity "value"> where entity is the name assigned to the entity and value is the entity’s value, which must be well-formed XML text • Examples: <!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae"> or, with markup in the value, <!ENTITY MBL25 "<desc>Monarch Butterfly, 6-12 larvae</desc>"> • A bare & or % is not allowed as part of an entity's value. Use &amp; to include the & symbol, if necessary.

  48. External Parsed Entities • For longer text strings, place the content in an external file. To create an external parsed entity, use: <!ENTITY entity SYSTEM "uri"> where uri is the URI of the external file containing the entity value • In the declaration <!ENTITY MBL25 SYSTEM "description.xml"> an entity named "MBL25" gets its value from the description.xml file

  49. Referencing a General Entity • After an entity is declared, it can be referenced anywhere within the document. The syntax is: &entity; • For example, <item>&MBL25;</item> is interpreted as <item>Monarch Butterfly, 6-12 larvae</item>

  50. Declare parsed entities in the codes.dtd file for the product codes in the orders.xml document (each declaration gives the entity name followed by the entity value)
      <!ENTITY BF100P "Butterfly farm pop-up self erecting portable greenhouse">
      <!ENTITY BFGK10 "Field of Dreams backyard butterfly garden kit">
      <!ENTITY HME100 "Hummingbird Hawkmoth (Manduca Sexta), 100 eggs">
      <!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
      <!ENTITY MP12 "Monarch Pupae (Danaus Plexippus), 12 pupae">
      <!ENTITY MWT15 "Giant Milkweed Tree (Calotropis Ssp.), 1 crown flower">
      <!ENTITY PLBK70 "Painted Lady classroom breeding kit, 70 larvae">
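
To see declaration and reference side by side, the following sketch places the MBL25 entity in an internal subset. The items and item declarations are simplified assumptions; the entity name and value are the ones used in the slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE items [
  <!-- assumed: simplified element declarations -->
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
]>
<items>
  <!-- the parser replaces the reference with the entity's value -->
  <item>&MBL25;</item>
</items>
```

The name in the reference must match the declared name exactly, including case; a reference to an undeclared entity is an error.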
