XML for .NET: Session 1 • Introduction to XML • Introduction to XSLT • Programmatically Reading XML Documents • Introduction to XPath
XML Documents Can be Read Programmatically • The .NET Framework provides many classes to aid in programmatically iterating through and navigating XML documents. • These classes are found in the System.Xml namespace. The various classes in the System.Xml namespace are highlighted in Chapter 6 of the text, XML and ASP.NET (starting on page 261).
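Accessing XML Content • XML documents can be accessed in one of two ways, which this course calls the pull model and the push model. • The pull model loads the entire XML document into memory, and then works with the document once it has been completely loaded. • The push model reads the XML document one small piece (node) at a time, as needed, without holding the whole document in memory. • Note that these labels are specific to this course: Microsoft's documentation describes XmlReader itself as a forward-only pull parser (in contrast to SAX-style push parsers) and treats the DOM as an in-memory tree model.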
How to Use the Two Methods • The .NET Framework provides developers with both methods: • Pull Method – use the DOM classes in the .NET Framework. • Push Method – use the XmlReader and XmlWriter classes.
Using the Pull Method • The System.Xml namespace contains a number of classes to work with XML documents in the DOM paradigm: • XmlDocument – represents an XML document. • XmlElement – represents an individual element in the DOM • XmlAttribute – represents an attribute. • XmlText – represents text content.
Using the Push Method • The XmlReader reads one node at a time from a specified XML source. The XmlReader can only read in a FORWARD direction. • The XmlReader class is abstract and cannot be instantiated directly; one of its derived classes must be used instead: • XmlNodeReader – reads one node at a time from an XML DOM. • XmlTextReader – reads one node at a time from an XML source, such as a file with XML content. • XmlValidatingReader – a reader that performs DTD or schema validation (more on this next week!)
Iterating through an XML Document using XmlTextReader • To iterate through the contents of an XML document with the XmlTextReader we need to: • Specify the XML document to iterate through when creating the XmlTextReader. • Call the Read() method, which reads in the next Node. • Access the properties of the XmlTextReader to determine the name, value, and other information about the read Node.
Iterating through an XML Document using XmlTextReader • We can programmatically read through the contents of an XML file like so:
    // create an XmlTextReader to read the specified XML file
    XmlTextReader reader = new XmlTextReader(filepath);

    // now, display the information of each node in the TextBox
    while (reader.Read())
    {
        // access the properties of the XmlTextReader class...
        // like reader.Name, reader.NodeType, reader.Value, etc.
    }

    // close the XmlTextReader
    reader.Close();
What is a Node? • Recall that the XmlReader classes read XML nodes. What constitutes a node? Can you identify the nodes in the following XML fragment?
    <?xml version="1.0" encoding="utf-8" ?>
    <books>
      <book price="34.95">
        <title>Animal Farm</title>
        <authors>
          <author>Orwell</author>
        </authors>
      </book>
    </books>
What is a Node? • In the fragment above, the whitespace between elements (if present) is also considered a node, although you can set the XmlTextReader’s WhitespaceHandling property to specify whether the reader should report whitespace nodes.
What is a Node? • Notice that the attributes of an element (such as price in the fragment above) are not read as separate nodes by the XmlTextReader; they are reached through the element node instead.
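One way to answer the "what is a node?" question is to let the reader list what it sees. The following is a minimal sketch, not the course demo: the class and method names (NodeLister, ListNodes) are made up for illustration, and the XML is held in a string (a copy of the Animal Farm fragment) instead of a file so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeLister
{
    // Returns one "NodeType: Name = Value" line per node the reader visits.
    public static List<string> ListNodes(string xml, WhitespaceHandling handling)
    {
        var lines = new List<string>();
        // XmlTextReader accepts a TextReader, so a string works as well as a file.
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        reader.WhitespaceHandling = handling;
        while (reader.Read())
        {
            lines.Add(reader.NodeType + ": " + reader.Name + " = " + reader.Value);
        }
        reader.Close();
        return lines;
    }

    public static void Main()
    {
        string xml =
            "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n" +
            "<books>\n  <book price=\"34.95\">\n    <title>Animal Farm</title>\n" +
            "    <authors>\n      <author>Orwell</author>\n    </authors>\n" +
            "  </book>\n</books>";

        // WhitespaceHandling.None skips whitespace nodes;
        // WhitespaceHandling.All reports them as well.
        foreach (string line in ListNodes(xml, WhitespaceHandling.None))
        {
            Console.WriteLine(line);
        }
    }
}
```

Switching the argument to WhitespaceHandling.All should make the whitespace nodes appear in the listing; in neither mode does the price attribute show up as a line of its own.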
Creating a Program to View the Content Read by an XmlTextReader • We can create a program that allows the user to select an XML file; the contents of the XML file are then read by an XmlTextReader, with each read node’s name, type, and value displayed. (Run demo!)
Reading the Attributes • As we saw in the demo, the attributes are not read as separate nodes. • We can determine whether or not a given node has attributes by checking the HasAttributes property. • In order to programmatically access the attributes of a node, we must use the MoveToNextAttribute() method of the XmlTextReader.
Reading the Attributes
    // C#
    while (reader.Read())
    {
        if (reader.HasAttributes)
        {
            while (reader.MoveToNextAttribute())
            {
                // Access the attribute name/value via
                // reader.Name/reader.Value
            }
        }
    }

    ' VB.NET
    While reader.Read()
        If reader.HasAttributes Then
            While reader.MoveToNextAttribute()
                ' Access the attribute name/value via
                ' reader.Name/reader.Value
            End While
        End If
    End While
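The attribute loop can be turned into a small runnable sketch. The names here (AttributeLister, ListAttributes) and the isbn attribute are hypothetical, and the XML comes from a string instead of a file. The sketch also filters on element nodes, on the assumption that only element attributes are of interest (the XML declaration can report attributes too), and calls MoveToElement() to return to the owning element after the attributes have been visited.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeLister
{
    // Collects "element@attribute=value" strings for every element attribute.
    public static List<string> ListAttributes(string xml)
    {
        var found = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.HasAttributes)
            {
                string elementName = reader.Name;
                while (reader.MoveToNextAttribute())
                {
                    // While positioned on an attribute, Name and Value
                    // describe the attribute, not the element.
                    found.Add(elementName + "@" + reader.Name + "=" + reader.Value);
                }
                // Step back to the element that owns the attributes.
                reader.MoveToElement();
            }
        }
        reader.Close();
        return found;
    }

    public static void Main()
    {
        string xml = "<books><book price=\"34.95\" isbn=\"sample\"><title>Animal Farm</title></book></books>";
        foreach (string entry in ListAttributes(xml))
        {
            Console.WriteLine(entry);
        }
    }
}
```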
The XmlTextReader Properties and Methods • The properties and methods of the XmlTextReader are listed starting on pg. 272 of the text. • Some of the more germane methods include: • ReadInnerXml() – returns a string containing the current node’s content (child nodes, text content, etc.), including XML markup. • ReadOuterXml() – returns a string containing the current node’s own XML markup along with the markup of its content.
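The difference between the two methods is easiest to see side by side. This is a hedged sketch with made-up names (InnerOuterDemo, ReadBook); it positions the reader on a book element with ReadToFollowing() and then reads either the inner or the outer markup. Each call uses a fresh reader because both methods advance the reader past the element they return.

```csharp
using System;
using System.IO;
using System.Xml;

public static class InnerOuterDemo
{
    const string Xml = "<books><book><title>Animal Farm</title></book></books>";

    // Positions a new reader on <book> and returns its inner or outer markup.
    public static string ReadBook(bool outer)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(Xml));
        try
        {
            reader.ReadToFollowing("book"); // advance to the <book> element
            return outer ? reader.ReadOuterXml() : reader.ReadInnerXml();
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadBook(false)); // <title>Animal Farm</title>
        Console.WriteLine(ReadBook(true));  // <book><title>Animal Farm</title></book>
    }
}
```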
The XmlTextReader Properties and Methods • Run ReadInnerOutterXml-ForXmlTextReader demo… • When reading an XML document, the XmlTextReader class will throw an XmlException if there is an error in parsing the XML. • An error can occur if, for example, the XML is malformed (that is, not well-formed).
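Catching the exception might look like the sketch below. The helper name (WellFormedCheck.FindParseError) is an assumption made for illustration; it reads a document to the end and reports the first parse error using the exception's LineNumber, LinePosition, and Message properties.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedCheck
{
    // Returns null when the XML parses cleanly,
    // otherwise a description of the first parse error.
    public static string FindParseError(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                // Nothing to do: reading every node is enough to trigger parsing.
            }
            return null;
        }
        catch (XmlException ex)
        {
            return "Line " + ex.LineNumber + ", position " + ex.LinePosition + ": " + ex.Message;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // The <title> element is never closed, so this is not well-formed.
        string error = FindParseError("<book><title>Animal Farm</book>");
        Console.WriteLine(error ?? "No parse errors.");
    }
}
```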
The XmlTextReader Properties and Methods • Run the XmlException demo • We will examine the XmlNodeReader and XmlValidatingReader – the other two XmlReader classes – later in this course.
Using the DOM to Iterate through an XML Document • In contrast to the Push method (XmlReader/XmlWriter), the .NET Framework offers a Pull method. • Recall that the Pull method reads the entire XML document into memory and then works with it from there. • For this model, XML documents are represented in the Document Object Model (DOM).
What is the DOM? • DOM stands for Document Object Model, and it’s a model that can be used to describe an XML document. • The DOM expresses the XML document as a hierarchy of nodes, where each element can have zero to many child elements. • The text content of an element is expressed as a child node as well. Attributes are also nodes, but they belong to the element’s Attributes collection rather than to its list of child nodes.
Example XML File
    <?xml version="1.0" encoding="UTF-8" ?>
    <books>
      <book price="34.95">
        <title>TYASP 3.0</title>
        <authors>
          <author>Mitchell</author>
        </authors>
      </book>
      <book price="29.95">
        <title>ASP.NET Tips</title>
        <authors>
          <author>Mitchell</author>
          <author>Walther</author>
          <author>Seven</author>
        </authors>
      </book>
    </books>
The DOM Classes - XmlNode • There are a number of classes in the System.Xml namespace that represent the DOM. • Each “box” in the DOM is represented in the .NET Framework by the XmlNode class. • This means that elements, attributes, and text values are all represented by the XmlNode class. The XmlNode class is discussed on pg. 287.
Extending the XmlNode Class • There are a number of classes that are derived from the XmlNode class: • XmlAttribute • XmlElement • XmlDocument • And so on…
The XmlNode Properties • The XmlNode class has many properties, the most germane ones being: • Name – the name of the node. For elements and attributes, the name is the name of the element or attribute. For text content, the name is #text. • Value – the value of the node. For elements, there is no value (the property returns null). For attributes, it’s the value of the attribute; for text nodes, it’s the value of the text in the node. • NodeType – indicates the type of the node (element, text, attribute, etc.)
More XmlNode Properties • InnerXml – the string content of the XML markup of the node’s children. • OuterXml – the string content of the XML markup of the node itself and its children. • InnerText – the concatenated text values of the node and all of its child nodes. • HasChildNodes – a Boolean, indicating if the node has any children.
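These properties can be inspected on a small document. The sketch below uses hypothetical names (NodePropertiesDemo, LoadBook) and calls XmlDocument.LoadXml(), which parses a string rather than a file, purely to keep the example self-contained.

```csharp
using System;
using System.Xml;

public static class NodePropertiesDemo
{
    // Parses a one-book document held in a string and returns its root element.
    public static XmlNode LoadBook()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<book price=\"34.95\"><title>Animal Farm</title>" +
            "<authors><author>Orwell</author></authors></book>");
        return doc.DocumentElement;
    }

    public static void Main()
    {
        XmlNode book = LoadBook();
        Console.WriteLine(book.Name);           // book
        Console.WriteLine(book.NodeType);       // Element
        Console.WriteLine(book.Value == null);  // True: elements have no value
        Console.WriteLine(book.HasChildNodes);  // True
        Console.WriteLine(book.InnerText);      // Animal FarmOrwell
        Console.WriteLine(book.InnerXml);       // markup of the children only
        Console.WriteLine(book.OuterXml);       // markup of <book> and its children
    }
}
```

Note how InnerText simply runs the text of the title and author together; the markup that separated them is gone.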
The XmlNodeList Class • The XmlNodeList class represents an arbitrary collection of XmlNodes. • For example, the XmlNode class has a ChildNodes property, which returns an XmlNodeList instance. This instance is a collection of nodes representing the DOM element’s children.
Loading an XML Document into a DOM Representation • The XmlDocument’s Load() method has four variations: • Load(Stream) • Load(string) • Load(TextReader) • Load(XmlReader) • In the Load(string) variation, the input string is a file path (or URL) to the XML file to load into the DOM representation.
The XmlDocument Properties • The XmlDocument is derived from the XmlNode class, meaning it has all of the properties and methods available to the XmlNode class. • Once an XML file has been loaded into an XmlDocument instance, we can access the root element through the DocumentElement property.
The XmlElement and XmlAttribute Classes • The XmlElement and XmlAttribute classes are also derived from the XmlNode class. • They represent, respectively, an element and an attribute.
Example • The following loads an XML document and displays the name of the root element.
    Dim xmlDoc As New XmlDocument()
    xmlDoc.Load(filepath)
    Dim rootElementName As String
    rootElementName = xmlDoc.DocumentElement.Name
Example • Iterating through the root element’s children:
    Dim xmlDoc As New XmlDocument()
    xmlDoc.Load(filepath)
    Dim n As XmlNode
    For Each n In xmlDoc.DocumentElement.ChildNodes
        ' Display the name of the node using n.Name
    Next
An Example of Iterating through an XML Document • Let’s create an application that displays an XML document in a TreeView control. • Each node in the TreeView represents a node in the DOM.
An Example of Iterating through an XML Document • We can recursively iterate through the DOM, ensuring that we’ll visit each node. (Explain recursion?) • Examine application code... • Questions on the program?
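The recursive traversal can be sketched without the TreeView. This is not the course application; the names (DomWalker, Walk) are made up, and indentation in a list of strings stands in for the tree control. The method handles the current node and then calls itself for each child, so every node reachable through ChildNodes is visited exactly once.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DomWalker
{
    // Records the node, then recurses into each of its children.
    // Attributes are not part of ChildNodes, so they are not visited here.
    public static void Walk(XmlNode node, int depth, List<string> output)
    {
        output.Add(new string(' ', depth * 2) + node.NodeType + ": " + node.Name);
        foreach (XmlNode child in node.ChildNodes)
        {
            Walk(child, depth + 1, output);
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<books><book price=\"34.95\"><title>Animal Farm</title>" +
            "<authors><author>Orwell</author></authors></book></books>");

        var lines = new List<string>();
        Walk(doc.DocumentElement, 0, lines);
        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
```

The recursion ends on its own: a node with no children has an empty ChildNodes list, so the loop body never runs for it.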
Navigating through an XML Document • So far, all we have seen is how to iterate through an XML document, one node at a time. • With the pull method (DOM), however, we can navigate through the document as well. • For example, we might want to access just the elements in the document that have a certain name. (Such as elements with the name <author>.)
Accessing Elements with a Certain Name • The XmlDocument class contains a GetElementsByTagName() method, which returns an XmlNodeList containing the elements that have the specified tag name.
    Dim xmlDoc As New XmlDocument()
    xmlDoc.Load(filepath)
    Dim n As XmlNode
    For Each n In xmlDoc.GetElementsByTagName("author")
        ' Display n.Value
    Next
What would be the output of the above code?
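One way to check the question above is to run a C# equivalent. The sketch below (AuthorFinder and AuthorNames are hypothetical names, and the XML is a trimmed copy of the books example held in a string) relies on the fact that Value is null for element nodes: the author's name lives in a child #text node, so InnerText is read instead.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class AuthorFinder
{
    // Returns the text of every <author> element, in document order.
    public static List<string> AuthorNames(XmlDocument doc)
    {
        var names = new List<string>();
        foreach (XmlNode n in doc.GetElementsByTagName("author"))
        {
            // n.Value is null here because n is an element node;
            // InnerText gathers the text of its child #text node.
            names.Add(n.InnerText);
        }
        return names;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<books>" +
            "<book price=\"34.95\"><title>TYASP 3.0</title>" +
            "<authors><author>Mitchell</author></authors></book>" +
            "<book price=\"29.95\"><title>ASP.NET Tips</title>" +
            "<authors><author>Mitchell</author><author>Walther</author>" +
            "<author>Seven</author></authors></book>" +
            "</books>");

        foreach (string name in AuthorNames(doc))
        {
            Console.WriteLine(name);
        }
    }
}
```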
Navigating through an XML Document • However, what if we want to access nodes based on more complex criteria, such as: “Access all <book> elements with a price attribute value less than 30,” or, “Access the name of the authors who have written more than one book.” • To accomplish this we need something more powerful – enter XPath!
A Quick Examination of XPath • XPath is used to select particular parts of an XML document. • XPath is so named because its syntax is similar to the syntax for a file path. For example, in our books XML document, we could use the following XPath expression to access all of the author elements: /books/book/authors/author
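In the DOM classes, an XPath expression like this one can be evaluated with the SelectNodes() method that XmlDocument inherits from XmlNode. The following is a minimal sketch with made-up names (XPathDemo, SelectAuthors) and a trimmed, string-based copy of the books document.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // Evaluates the XPath expression from the slide against the document.
    public static XmlNodeList SelectAuthors(XmlDocument doc)
    {
        return doc.SelectNodes("/books/book/authors/author");
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<books>" +
            "<book><title>TYASP 3.0</title>" +
            "<authors><author>Mitchell</author></authors></book>" +
            "<book><title>ASP.NET Tips</title>" +
            "<authors><author>Mitchell</author><author>Walther</author></authors></book>" +
            "</books>");

        foreach (XmlNode author in SelectAuthors(doc))
        {
            Console.WriteLine(author.InnerText);
        }
    }
}
```

Unlike GetElementsByTagName(), the path spells out where in the hierarchy the elements must sit, so an author element elsewhere in the document would not be selected.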
Why We Might Want to Access Certain XML Document Portions • When using XSLT to display an XML file, typically we want to display only a subset of the XML document. For example, we might want to display a listing of flights, displaying the date, the departure city and the destination city. • When working with XML data, we might want to retrieve only a certain subset of the data. • We might want to access data that meets a certain set of criteria. All of these tasks can be accomplished with XPath.
XPath Components – Steps • To access the root element of the XML document, we use the following syntax: /RootElementName • Then, to access immediate descendants (children) of a given element, we use /, followed by the name of the child element. • The / operator is referred to as the step operator.
XPath Components – Steps • The step operator parallels the \ separator in file paths. With file systems, you navigate the directory structure by using \. • For example, the file path C:\Games\Quake\SavedGames takes you to the SavedGames directory.
The file system can be represented as an XML document…
    <?xml version="1.0" encoding="UTF-8" ?>
    <filesystem>
      <drive letter="C">
        <folder name="Program Files" />
        <folder name="Games">
          <folder name="Quake">
            <folder name="SavedGames" />
            <file>Quake.exe</file>
            <file>README.txt</file>
          </folder>
        </folder>
        <folder name="Windows">
          <file>README.txt</file>
        </folder>
      </drive>
      <drive letter="D">
        <folder name="Backup">
          <file>2003-06-01.bak</file>
          <file>2003-06-07.bak</file>
        </folder>
      </drive>
    </filesystem>
XPath Components - Steps • Using XPath we can access the root element using: /filesystem
XPath Components - Steps • To access all of the <drive> elements, we’d use: /filesystem/drive
XPath Components - Steps • To access all of the folder elements that are children of <drive> elements, we’d use: /filesystem/drive/folder
XPath Components - Steps • What about /filesystem/drive/folder/folder/folder?
Descendant Steps • Using elementName/elementName2, we get all of the elements that are children of elementName that have the name elementName2. • But what if we want all elements that are descendants of elementName, regardless of whether or not the element is a child, grandchild, great-grandchild, etc.? • Here, we use the // operator.
Descendant Steps • As we saw earlier, /filesystem/drive/folder will return the folders that are immediate children of the <drive> elements (Program Files, Games, Windows, and Backup). • If we want to get all folders, regardless of their depth in the hierarchy, we can use: /filesystem/drive//folder
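The difference between / and // can be checked by counting matches. In this hedged sketch the names (FileSystemPaths, Count) are hypothetical, and the file system document is reproduced as a string, with single-quoted attributes to keep the C# literal readable.

```csharp
using System;
using System.Xml;

public static class FileSystemPaths
{
    // The file system document from the earlier slide, held in a string.
    public const string Xml =
        "<filesystem>" +
        "<drive letter='C'>" +
        "<folder name='Program Files' />" +
        "<folder name='Games'><folder name='Quake'>" +
        "<folder name='SavedGames' /><file>Quake.exe</file><file>README.txt</file>" +
        "</folder></folder>" +
        "<folder name='Windows'><file>README.txt</file></folder>" +
        "</drive>" +
        "<drive letter='D'><folder name='Backup'>" +
        "<file>2003-06-01.bak</file><file>2003-06-07.bak</file>" +
        "</folder></drive>" +
        "</filesystem>";

    // Returns how many nodes the XPath expression selects.
    public static int Count(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Only folders directly beneath a drive.
        Console.WriteLine(Count("/filesystem/drive/folder"));   // 4
        // Folders at any depth beneath a drive.
        Console.WriteLine(Count("/filesystem/drive//folder"));  // 6
    }
}
```

The two extra matches in the second count are the Quake and SavedGames folders, which sit deeper than one level below their drive.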
Descendant Steps - Example • What will /filesystem//file return?