
SchemaPath: a minimal extension to XML Schema for conditional constraints
Paolo Marinelli, Claudio Sacerdoti Coen, Fabio Vitali
University of Bologna (Italy)

Validation
• Validation means writing correctness rules for XML documents and verifying that they hold for every document received.
• It is possible with a number of schema languages, roughly divided into two kinds:
  • Grammar-based languages: DTD, XML Schema (XSD), Relax NG, etc. A whole generative grammar is created, and every document that can be built with this grammar is valid.
  • Rule-based languages: Schematron, xlinkit, etc. Rules are defined to check for special conditions (required or rejected). Every document that does not violate any of these rules is valid.

Why validate XML documents?
[Diagram: XML doc → DOM parser (rejects "not well-formed") → DOM tree → schema validator, driven by rules (rejects "invalid") → DOM tree + PSVI → downstream application]
• Usually, when receiving data from an unreliable source, programmers intersperse their application code with checks on data values, error handling, remedial procedures, etc.
• Validation performs all checks before the data reaches the downstream application, removing the need for most of the checks on data values.

The PSVI
• XML Schema adds another concept to the validation of XML structures: the decoration of each structure of the XML document with additional information. This is called the Post-Schema Validation Infoset, or PSVI.
• This can be useful for the downstream application, which can activate specific code depending on the PSVI data available for each element.
• The most important contribution of the PSVI is without doubt the data type: validation code can assess that an element contains a valid date, a valid number, or a valid complex markup structure, so that the downstream application can skip any check on it and call the appropriate handling code.

Unfortunately…
• … most schema languages cannot express all the structure and data constraints that document designers may need.
• For example:
  • Mutual exclusion (“element x may have either the a attribute or the b attribute, but not both”)
  • Deep exclusions (“element x cannot contain, at any level of its subtree, element y”)
  • Structure-dependent structures (“if the item is gratis (the attribute gratis is present), then no price should be specified (the element price should be absent)”)
  • Data-dependent structures (“if the address is a PO box, then the address must include a PO box number, otherwise it must include a street name and a street number”)
• These kinds of constraints are known as co-constraints, or co-occurrence constraints. Most real-life XML document types have one or more of them.

Plenty of examples
• XHTML
  • “a elements cannot contain other a elements” (appendix B)
  • Neither the normative DTD nor the non-normative XML Schema can fully express this requirement (they only express a weaker form: “a elements cannot directly contain other a elements”).
• XSLT
  • “In a template element at least one of the match and name attributes must be present”
  • Again, the DTD and the XML Schema cannot express this requirement, and specify both attributes as optional.
• XML Schema itself
  • “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then the type attribute or one of the simpleType or complexType elements must be present, but not two.”
  • The normative XML Schema can only specify all these elements and attributes as optional.
• … and plenty more…

Who cares?
[Diagram: the same pipeline as before (XML doc → DOM parser → DOM tree → schema validator → DOM tree + PSVI → downstream application), now marked with question marks and an additional "incorrect" outcome next to "not well-formed" and "invalid"]
• Documents could contain violations of these rules and still be considered valid according to the DTD or XML Schema.
• Three solutions:
  • Cross your fingers and hope for the best
  • Provide a default behavior (pick one option and ignore other structures)
  • Provide validation code within the downstream application

Schematron
• Schematron could in fact express most of these requirements (but data- and structure-dependent structures only through hacks).
• Schematron lacks generative rules; they can be specified only with great pain, or by mixing Schematron rules with the grammar-based rules of another schema language.
• Suggestions to use XML Schema and Schematron together in one schema document exist in the literature.
• This is quite complex in practice, requires competence in both languages, and has problems with the PSVI.

Extending XML Schema
• Our view is that the only practical solution is to extend XML Schema (or another grammar-based language).
• If the extension is minimal, then implementation costs, learning effort, and impact on existing schemas are also minimal.

Our proposal: SchemaPath
• SchemaPath is our proposal to minimally extend XML Schema to handle co-constraints of all kinds.
• The idea is to conditionally assign types to elements and attributes.
• Furthermore, a non-satisfiable type is added to flag the error conditions to be rejected.
• SchemaPath maintains the XML Schema syntax, adds only ONE construct and ONE predefined simple type, maintains important XML Schema properties (the validation theorem and the round-tripping and reverse round-tripping properties), and does not impact the PSVI for valid documents.
• Its simplest implementation is straightforward (~15 lines of code) in any language and architecture where an XSLT engine and an XML Schema engine already exist.
• It is qualified under the namespace http://www.cs.unibo.it/SchemaPath/1.0, but the parser also accepts the plain XSD namespace.

SchemaPath syntax (in one slide!)
• <xsd:alt>: expresses a condition in the type assignment of an element or an attribute. Its attributes are:
  • cond: an optional XPath expression stating the condition that must hold for the type assignment to be performed. Several conditions may hold at the same time, in which case a priority mechanism is employed. An alt element without an explicit cond attribute implicitly has a low-priority, default, always-true condition.
  • priority: an optional decimal number specifying the priority level of a condition, in case the default priority is unsatisfactory.
  • type: a required XML Schema type name, which is assigned to the element or attribute if the condition holds and has the top priority.
• xsd:error: a predefined unsatisfiable simple type. Assigning this type to an element or an attribute always causes a validation error.

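The selection rule described on this slide (among the alt elements whose condition holds, the one with the top priority assigns the type) can be sketched in a few lines. This is only an illustration in Python, not the authors' XSLT-based implementation: conditions are modelled as Python callables instead of XPath expressions, and the class `Alt`, the functions `effective_priority` and `select_type`, the numeric default priorities, the tie-breaking by document order, and the `None` result when no condition holds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
import xml.etree.ElementTree as ET

Condition = Callable[[ET.Element], bool]


@dataclass
class Alt:
    """One <xsd:alt>: a type name, an optional condition, an optional priority."""
    type_name: str
    cond: Optional[Condition] = None   # None models a missing cond attribute
    priority: Optional[float] = None   # None models a missing priority attribute


def effective_priority(alt: Alt) -> float:
    # ASSUMPTION: these numeric defaults are placeholders. The slides only say
    # that an alt without cond gets a low default priority.
    if alt.priority is not None:
        return alt.priority
    return -1.0 if alt.cond is None else 0.0


def select_type(alts: List[Alt], element: ET.Element) -> Optional[str]:
    """Return the type of the highest-priority alt whose condition holds."""
    holding = [a for a in alts if a.cond is None or a.cond(element)]
    if not holding:
        return None  # ASSUMPTION: the slides do not cover this case
    # max() keeps the first of several equal maxima, i.e. document order.
    return max(holding, key=effective_priority).type_name


# Hypothetical element x that may carry attribute a or b, but not both.
x_alts = [
    Alt("xsd:error", cond=lambda e: "a" in e.attrib and "b" in e.attrib),
    Alt("myType"),
]
print(select_type(x_alts, ET.fromstring('<x a="1" b="2"/>')))  # xsd:error
print(select_type(x_alts, ET.fromstring('<x a="1"/>')))        # myType
```

In the first call both alternatives hold and the conditional one wins on priority; in the second only the default alternative holds.
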
A few examples
• Mutual exclusion
  • “Element x may have either the a attribute or the b attribute, but not both”. Suppose we have defined a type myType with both a and b attributes as optional.
    <xsd:element name="x">
      <xsd:alt cond="(@a and @b)" type="xsd:error"/>
      <xsd:alt type="myType"/>
    </xsd:element>
• Data-dependent structures
  • “The element quantity must be an integer if the unit element is ‘items’, and it must be a decimal value if the unit element is ‘meters’”. Suppose we have already defined the data type of the unit element so that it can only contain the values “meters” or “items”.
    <xsd:element name="quantity">
      <xsd:alt cond="../unit='items'" type="xsd:integer"/>
      <xsd:alt cond="../unit='meters'" type="xsd:decimal"/>
    </xsd:element>

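To make the data-dependent example concrete, the following Python sketch performs by hand the check that the two alt elements for quantity describe: pick the type from the sibling unit element, then test the lexical form of the value. It is an illustration only. The parent element name `invoiceLine` is borrowed from a later slide, the function name `quantity_is_valid` is hypothetical, and rejecting a document whose unit matches neither condition is an assumption (the slide avoids that case by restricting the type of unit).

```python
import re
import xml.etree.ElementTree as ET

# Lexical forms of the two XSD built-in types used on the slide.
LEXICAL = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
    "xsd:decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
}

# The two conditions from the slide: ../unit='items' and ../unit='meters'.
TYPE_FOR_UNIT = {"items": "xsd:integer", "meters": "xsd:decimal"}


def quantity_is_valid(parent: ET.Element) -> bool:
    """Check <quantity> against the type selected by its sibling <unit>."""
    unit = parent.findtext("unit")
    quantity = parent.findtext("quantity", default="")
    type_name = TYPE_FOR_UNIT.get(unit)
    if type_name is None:
        return False  # ASSUMPTION: no condition holds, so treat as invalid
    # Both types collapse surrounding whitespace before checking the value.
    return LEXICAL[type_name].fullmatch(quantity.strip()) is not None


meters = ET.fromstring(
    "<invoiceLine><unit>meters</unit><quantity>2.5</quantity></invoiceLine>")
items = ET.fromstring(
    "<invoiceLine><unit>items</unit><quantity>2.5</quantity></invoiceLine>")
print(quantity_is_valid(meters))  # True
print(quantity_is_valid(items))   # False
```

This is exactly the kind of hand-written check that a conditional type assignment moves out of the downstream application and into the schema.
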
Addressing co-constraints: XHTML
• Deep exclusion of a elements within other a elements
  • “a elements cannot contain other a elements”
  • Suppose we have defined an inlineType to contain all inline elements that can go inside an a element, as well as inside other elements such as b, i, etc.
    <xsd:element name="a">
      <xsd:alt cond=".//a" type="xsd:error"/>
      <xsd:alt type="inlineType"/>
    </xsd:element>

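The condition `.//a` holds whenever an a element has another a element anywhere in its subtree, not just as a direct child. A small Python sketch of the same test, using the limited XPath support of the standard `xml.etree.ElementTree` module, shows the difference from the weaker "direct child" form. The function name `type_for_a` is hypothetical, and the example assumes elements without a namespace (real XHTML elements are namespaced, which would require qualified names in the path).

```python
import xml.etree.ElementTree as ET


def type_for_a(a_element: ET.Element) -> str:
    """Mirror the two alts on the slide for an <a> element."""
    # ".//a" selects <a> descendants at any depth, never the element itself.
    if a_element.find(".//a") is not None:
        return "xsd:error"
    return "inlineType"


nested = ET.fromstring('<a href="#x">see <b><a href="#y">this</a></b></a>')
flat = ET.fromstring('<a href="#x">see <b>this</b></a>')
print(type_for_a(nested))  # xsd:error
print(type_for_a(flat))    # inlineType
```

In the `nested` document the inner a is a grandchild, so a grammar that only forbids a as a direct child of a would accept it.
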
Addressing co-constraints: XSLT
• Minimal presence
  • “In a template element at least one of the match and name attributes must be present”
  • Suppose we have already defined a templateType type with the match and name attributes both set as optional.
    <xsd:element name="template">
      <xsd:alt cond="@match or @name" type="templateType"/>
      <xsd:alt type="xsd:error"/>
    </xsd:element>

Addressing co-constraints: XML Schema
• Complex mutual exclusions
  • “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then either the type attribute or one of the simpleType or complexType elements must be present.”
  • Suppose we have already defined an elementType with a choice of simpleType and complexType, and the type, ref and name attributes as optional.
    <xsd:element name="element">
      <xsd:alt cond="@name and @ref" priority="2.0" type="xsd:error"/>
      <xsd:alt cond="(@type or @ref) and (simpleType or complexType)" priority="1.5" type="xsd:error"/>
      <xsd:alt cond="../schema and @ref" priority="1.0" type="xsd:error"/>
      <xsd:alt cond="not(@name) and not(@ref)" priority="0.5" type="xsd:error"/>
      <xsd:alt priority="0.0" type="elementType"/>
    </xsd:element>
• The conditions could be made simpler by using different complex types.

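The five prioritized alternatives on this slide can be read as a small decision table. The Python sketch below mirrors them for illustration; it is not the authors' implementation. Two points are assumptions: the condition `../schema and @ref` is interpreted as "the declaration is a top-level child of schema and has a ref attribute" and is passed in as the flag `top_level` (ElementTree elements do not know their parent), and the fallback type is written `elementType`, the name used in the slide's premise. The names `local` and `element_decl_type` are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def element_decl_type(decl: ET.Element, top_level: bool) -> str:
    """Return the type the slide's alts would assign to an element declaration."""
    has_name = "name" in decl.attrib
    has_ref = "ref" in decl.attrib
    has_type = "type" in decl.attrib
    inline = any(local(c.tag) in ("simpleType", "complexType") for c in decl)
    # (priority, condition holds, assigned type), one row per alt on the slide.
    alts = [
        (2.0, has_name and has_ref, "xsd:error"),
        (1.5, (has_type or has_ref) and inline, "xsd:error"),
        (1.0, top_level and has_ref, "xsd:error"),  # ASSUMED reading of ../schema
        (0.5, not has_name and not has_ref, "xsd:error"),
        (0.0, True, "elementType"),
    ]
    # The last row always holds, so the candidate list is never empty.
    return max((a for a in alts if a[1]), key=lambda a: a[0])[2]


good = ET.fromstring('<element name="price" type="xsd:decimal"/>')
both = ET.fromstring('<element name="price" ref="p"/>')
print(element_decl_type(good, top_level=True))   # elementType
print(element_decl_type(both, top_level=False))  # xsd:error
```

Because the explicit priorities decrease in document order here, picking the highest-priority holding row gives the same result as taking the first row that holds.
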
Implementation: an XSD preprocessor
[Diagram: document X passes the DOM parser (rejects "not well-formed") and enters the XSD preprocessor together with the SP rules; the preprocessor produces document X’ and plain XSD rules for a standard schema validator (rejects "invalid"); an "ok" document reaches the downstream application]
• SchemaPath validators can be implemented:
  • From scratch (but their complexity is of the same order as that of an XML Schema validator)
  • By modifying an existing XML Schema validator (breaking the evolution path of the selected validator)
  • As an XSD preprocessor (i.e. an independent application feeding a plain XML Schema validator)
• It can be proved that the SchemaPath schema SP validates X if and only if the generated XSD schema validates X’.

Our XSLT-based process
• Our test preprocessor is implemented simply with two (rather convoluted) XSLT stylesheets and about 20 lines of real code.
• The whole process uses a stylesheet T’ to create an XSD schema out of the SchemaPath schema, and a meta-stylesheet MT to generate a stylesheet T’’ that transforms the XML document X into X’.
[Diagram: SP rules → XSLT T’ → XSD rules; SP rules → XSLT MT → XSLT T’’; X → XSLT T’’ → X’]

An example of the final schema and XML doc
• The generated XSD schema. The name of each wrapper element encodes what used to be the XPath condition (“../unit=‘items’” for the first, “../unit=‘meters’” for the second):
    <xsd:choice>
      <xsd:element name="wrquantity0.2E.2E.2Funit.3D.27items.27">
        <xsd:complexType><xsd:sequence>
          <xsd:element name="quantity" type="xsd:integer"/>
        </xsd:sequence></xsd:complexType>
      </xsd:element>
      <xsd:element name="wrquantity0.2E.2E.2Funit.3D.27meters.27">
        <xsd:complexType><xsd:sequence>
          <xsd:element name="quantity" type="xsd:decimal"/>
        </xsd:sequence></xsd:complexType>
      </xsd:element>
    </xsd:choice>
• The transformed XML document:
    <invoiceLine>
      <unit>meters</unit>
      <wrquantity0.2E.2E.2Funit.3D.27meters.27>
        <quantity>2.5</quantity>
      </wrquantity0.2E.2E.2Funit.3D.27meters.27>
    </invoiceLine>

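The wrapper names on this slide appear to be built by keeping letters and digits of the XPath condition and writing every other character as a dot followed by its two-digit hexadecimal code (so `/` becomes `.2F` and `=` becomes `.3D`). The Python sketch below reproduces the names and the document transformation for the quantity example. It is an illustration under assumptions, not the authors' stylesheets: the encoding rule is inferred from the two names shown, the `wr` prefix and the `0` are copied from the slide without knowing what the `0` stands for, and leaving the document unchanged when no condition holds is a choice made for this example. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode_xpath(xpath: str) -> str:
    """Keep ASCII letters and digits; write any other character as .XX (hex)."""
    return "".join(
        ch if ch.isascii() and ch.isalnum() else ".%02X" % ord(ch)
        for ch in xpath
    )


def wrapper_name(element_name: str, xpath: str, marker: str = "0") -> str:
    # ASSUMPTION: "wr" and "0" are copied from the slide; the meaning of the
    # "0" is not explained there.
    return "wr" + element_name + marker + encode_xpath(xpath)


# The two conditions of the quantity example, each paired with a Python
# predicate that tests the same fact on the parent element.
CONDITIONS = {
    "../unit='items'": lambda line: line.findtext("unit") == "items",
    "../unit='meters'": lambda line: line.findtext("unit") == "meters",
}


def wrap_quantity(line: ET.Element) -> ET.Element:
    """Wrap <quantity> in the wrapper named after the first condition that holds."""
    quantity = line.find("quantity")
    if quantity is None:
        return line
    for xpath, holds in CONDITIONS.items():
        if holds(line):
            position = list(line).index(quantity)
            line.remove(quantity)
            wrapper = ET.Element(wrapper_name("quantity", xpath))
            wrapper.append(quantity)
            line.insert(position, wrapper)
            break
    return line


doc = ET.fromstring(
    "<invoiceLine><unit>meters</unit><quantity>2.5</quantity></invoiceLine>")
print(wrapper_name("quantity", "../unit='items'"))
# wrquantity0.2E.2E.2Funit.3D.27items.27
print(ET.tostring(wrap_quantity(doc), encoding="unicode"))
```

After the transformation, a plain XML Schema validator only has to match the wrapper name against the generated choice, which then fixes the type of the quantity element inside it.
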
Conclusions
• Support for co-constraints is badly needed in many situations.
  • Many schemas and DTDs contain plain-language specifications of co-constraints.
  • Some document specifications even lament the lack of support for co-constraints in the schema language.
• The solution is to extend a schema language.
  • One grammar, one validation, one schema document
  • The implementation as a preprocessor is a great aid.
• Conditional type assignments are much cleaner than conditional types.
  • The PSVI does not change.
  • Good validity properties are preserved.
  • Much simpler to implement

Thanks! Visit us at http://genesispc.cs.unibo.it:3333/schemapath.asp or http://tesi.fabio.web.cs.unibo.it/schemapath/