XML Language and Its Applications
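Main Contents
• XML overview
• XML syntax
• Creating and using DTDs
• The XML parser: DOM
What Is XML
• What HTML is (Example 1-1)
• What XML is (Example 1-2)
• XML needs a DTD as the grammar of its markup
• XML needs a style sheet to be displayed
• Describing the DTD (Example 1-3)
2) XML application example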
Example 1-1: a contact list in HTML (labels: 用户ID = user ID, 公司 = company, 电话 = telephone, 地址 = address, 城市 = city, 省份 = province)
<UL>
  <LI>张三</LI>
  <UL>
    <LI>用户ID:001</LI>
    <LI>公司:A公司</LI>
    <LI>EMAIL:zhang@aaa.com</LI>
    <LI>电话:(010)62345678</LI>
    <LI>地址:五街1234号</LI>
    <LI>城市:北京市</LI>
    <LI>省份:北京</LI>
  </UL>
  <LI>李四</LI>
  <UL>
    <LI>用户ID:002</LI>
    <LI>公司:B公司</LI>
    <LI>EMAIL:li@bbb.org</LI>
    <LI>电话:(021)87654321</LI>
    <LI>地址:南京路9876号</LI>
    <LI>城市:上海市</LI>
    <LI>省份:上海</LI>
  </UL>
</UL>
Example 1-2: the same data in XML (联系人列表 = contact list, 联系人 = contact, 姓名 = name, 街道 = street)
<联系人列表>
  <联系人>
    <姓名>张三</姓名>
    <ID>001</ID>
    <公司>A公司</公司>
    <EMAIL>zhang@aaa.com</EMAIL>
    <电话>(010)62345678</电话>
    <地址>
      <街道>五街1234号</街道>
      <城市>北京市</城市>
      <省份>北京</省份>
    </地址>
  </联系人>
  <联系人>
    <姓名>李四</姓名>
    <ID>002</ID>
    <公司>B公司</公司>
    <EMAIL>li@bbb.org</EMAIL>
    <电话>(021)87654321</电话>
    <地址>
      <街道>南京路9876号</街道>
      <城市>上海市</城市>
      <省份>上海</省份>
    </地址>
  </联系人>
</联系人列表>
Example 1-3: the DTD for the contact list
<!ELEMENT 联系人列表 (联系人)*>
<!ELEMENT 联系人 (姓名,ID,公司,EMAIL,电话,地址)>
<!ELEMENT 地址 (街道,城市,省份)>
<!ELEMENT 姓名 (#PCDATA)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT 公司 (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT 电话 (#PCDATA)>
<!ELEMENT 街道 (#PCDATA)>
<!ELEMENT 城市 (#PCDATA)>
<!ELEMENT 省份 (#PCDATA)>
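To make the role of a DTD concrete, here is a minimal sketch of validating a document against element declarations like the ones in Example 1-3. It is not part of the original slides. It uses the JDK's built-in parser, a reduced DTD with ASCII element names (`contacts`, `contact`, `name`, `id`) placed in the internal subset, and the class and method names `DtdValidationSketch` and `isValid` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationSketch {
    // Hypothetical, reduced DTD in the spirit of Example 1-3 (internal subset).
    static final String DOCTYPE =
        "<!DOCTYPE contacts [\n"
        + "  <!ELEMENT contacts (contact)*>\n"
        + "  <!ELEMENT contact (name, id)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT id (#PCDATA)>\n"
        + "]>\n";

    /** Returns true if the well-formed body is valid against DOCTYPE. */
    static boolean isValid(String body) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validity violations are reported here
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // Children in the declared order (name, id).
        System.out.println(isValid(
            "<contacts><contact><name>Zhang San</name><id>001</id></contact></contacts>"));
        // Children in the wrong order (id, name).
        System.out.println(isValid(
            "<contacts><contact><id>001</id><name>Zhang San</name></contact></contacts>"));
    }
}
```

The second document is well-formed but violates the content model `(name, id)`, which is the kind of structural constraint the DTD exists to express.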
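XML Application Example
• The document type definition (DTD) written for FCLML, Company F's customer-list markup language; its file is Com.dtd
• The XML document holding the customer contact information, Com.xml (Example 1-5)
• A style sheet written for Com.xml, Com.xsl (Example 1-6)
• The HTML form and how it is displayed (Examples 1-7 and 1-8)
3) XML compared with HTML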
Example 1-4: Fclml.dtd
<?xml version="1.0" encoding="GB2312"?>
<!ELEMENT 联系人列表 (联系人)*>
<!ELEMENT 联系人 (姓名,ID,公司,EMAIL,电话,地址)>
<!ELEMENT 地址 (街道,城市,省份)>
<!ELEMENT 姓名 (#PCDATA)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT 公司 (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT 电话 (#PCDATA)>
<!ELEMENT 街道 (#PCDATA)>
<!ELEMENT 城市 (#PCDATA)>
<!ELEMENT 省份 (#PCDATA)>
Example 1-5: Com.xml
<?xml version="1.0" encoding="GB2312" standalone="no"?>
<!DOCTYPE 联系人列表 SYSTEM "com.dtd">
<?xml-stylesheet type="text/xsl" href="mystyle.xsl"?>
<联系人列表>
  <联系人>
    <姓名>张三</姓名>
    <ID>001</ID>
    <公司>A公司</公司>
    <EMAIL>zhang@aaa.com</EMAIL>
    <电话>(010)62345678</电话>
    <地址>
      <街道>五街1234号</街道>
      <城市>北京市</城市>
      <省份>北京</省份>
    </地址>
  </联系人>
  <联系人>
    <姓名>李四</姓名>
    <ID>002</ID>
    <公司>B公司</公司>
    <EMAIL>li@bbb.org</EMAIL>
    <电话>(021)87654321</电话>
    <地址>
      <街道>南京路9876号</街道>
      <城市>上海市</城市>
      <省份>上海</省份>
    </地址>
  </联系人>
</联系人列表>
Example 1-6: MyStyle.xsl (the page title F公司的客户联系信息 means "Company F's customer contact information")
<?xml version="1.0" encoding="GB2312"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl" xmlns="http://www.w3.org/TR/REC-html40" result-ns="">
  <xsl:template><xsl:apply-templates/></xsl:template>
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>F公司的客户联系信息</TITLE>
      </HEAD>
      <BODY>
        <xsl:apply-templates select="联系人列表"/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="联系人列表">
    <xsl:for-each select="联系人">
      <UL>
        <LI><xsl:value-of select="姓名"/></LI>
        <UL>
          <LI>用户ID:<xsl:value-of select="ID"/></LI>
          <LI>公司:<xsl:value-of select="公司"/></LI>
          <LI>EMAIL:<xsl:value-of select="EMAIL"/></LI>
          <LI>电话:<xsl:value-of select="电话"/></LI>
          <LI>街道:<xsl:value-of select="地址/街道"/></LI>
          <LI>城市:<xsl:value-of select="地址/城市"/></LI>
          <LI>省份:<xsl:value-of select="地址/省份"/></LI>
        </UL>
      </UL>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Example 1-7: the resulting HTML
<HTML>
  <HEAD>
    <TITLE>F公司的客户联系信息</TITLE>
  </HEAD>
  <BODY>
    <UL>
      <LI>张三</LI>
      <UL>
        <LI>用户ID:001</LI>
        <LI>公司:A公司</LI>
        <LI>EMAIL:zhang@aaa.com</LI>
        <LI>电话:(010)62345678</LI>
        <LI>地址:五街1234号</LI>
        <LI>城市:北京市</LI>
        <LI>省份:北京</LI>
      </UL>
      <LI>李四</LI>
      <UL>
        <LI>用户ID:002</LI>
        <LI>公司:B公司</LI>
        <LI>EMAIL:li@bbb.org</LI>
        <LI>电话:(021)87654321</LI>
        <LI>地址:南京路9876号</LI>
        <LI>城市:上海市</LI>
        <LI>省份:上海</LI>
      </UL>
    </UL>
  </BODY>
</HTML>
Example 1-8: how a browser displays Example 1-7
• 张三
  • 用户ID:001
  • 公司:A公司
  • EMAIL:zhang@aaa.com
  • 电话:(010)62345678
  • 地址:五街1234号
  • 城市:北京市
  • 省份:北京
• 李四
  • 用户ID:002
  • 公司:B公司
  • EMAIL:li@bbb.org
  • 电话:(021)87654321
  • 地址:南京路9876号
  • 城市:上海市
  • 省份:上海
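XML Compared with HTML: Family Tree of Markup Languages
[Figure: family tree of markup languages]
• GML (1969): Generalized Markup Language
• SGML (1985): Standard Generalized Markup Language
• HTML (1993): HyperText Markup Language
• XML (1998): Extensible Markup Language
• Also shown: XHTML (Extensible HyperText Markup Language), SVG (Scalable Vector Graphics), SMIL (Synchronized Multimedia Integration Language), HDML (Handheld Device Markup Language), …, OEB (open e-book structure specification)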
Sources • Major Sources: • http://www.cis.upenn.edu/~cis550/slides/xml.ppt CIS550 Course Notes, U. Penn, source for many slides • http://www.cs.technion.ac.il/~oshmu/236804 - Seminar in Computer Science 4: XML - Technology, Systems and Theory • http://dom4j.org
Agenda
• Short Introduction to XML
  • What is XML
  • Structure and Terminology
  • Java APIs for XML: an Overview
• dom4j
  • Parsing an XML document
  • Writing to an XML document
• XPath
  • XPath Queries
  • XPath in dom4j
• References
The Structure of XML
• XML consists of tags and text.
• Tags come in pairs: <date> ... </date>
• They must be properly nested:
  <date> <day> ... </day> ... </date>   (good)
  <date> <day> ... </date> ... </day>   (bad)
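The nesting rule above can be checked mechanically, because a conforming parser rejects improperly nested tags. The following is a minimal sketch using the JDK's built-in parser (not dom4j); the class and method names `WellFormedSketch` and `isWellFormed` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedSketch {
    /** Returns true if the string parses as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports broken nesting as a fatal parse error.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<date><day>1</day></date>")); // properly nested
        System.out.println(isWellFormed("<date><day>1</date></day>")); // overlapping tags
    }
}
```

The default error handler may also print a diagnostic to standard error for the rejected input; the return value is what matters here.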
XML text
• XML has only one "basic" type: text. It is bounded by tags, e.g.
  <title> The Big Sleep </title>
  <year> 1935 </year>   (1935 is still text)
• XML text is called PCDATA (parsed character data).
• The text is Unicode; a document is encoded as UTF-8 or UTF-16 unless its XML declaration names another encoding.
XML structure
• Nested tags can be used to express various structures, e.g. a tuple (record):
  <person>
    <name> Jeff Cohen</name>
    <tel> 04-828-1345 </tel>
    <tel> 054-470-778 </tel>
    <email> jeffc@cs.technion.ac.il </email>
  </person>
XML structure (cont.)
• We can represent a list by using the same tag repeatedly:
  <addresses>
    <person> ... </person>
    <person> ... </person>
    <person> ... </person>
    ...
  </addresses>
XML structure (cont.)
• Nested tags can be part of a list too:
  <addresses>
    <person>
      <name> Yossi Orr</name>
      <tel> 04-828-1345 </tel>
      <email> yossio@cs.technion.ac.il </email>
    </person>
    <person>
      <name> Irma Levy</name>
      <tel> 03-426-1142 </tel>
      <email>irmal@yourmail.com</email>
    </person>
  </addresses>
Terminology
• The segment of an XML document between an opening tag and its corresponding closing tag is called an element.
• Metadata about an element can appear in an attribute.
  <person type="Friend">
    <name>Ortal Derech</name>
    <tel>04-8732122</tel>
    <tel>054-646888</tel>
    <email>oderech@tx.technion.ac.il</email>
  </person>
• Here type="Friend" is an attribute, person is an element, name, tel and email are sub-elements of person, and their content is text.
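The distinction between elements, attributes and text shows up directly in a parsing API. Below is a minimal sketch that reads the person record with the JDK's built-in DOM (used here instead of dom4j so it runs without extra jars); `TerminologySketch`, `PERSON` and `parseRoot` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TerminologySketch {
    static final String PERSON =
        "<person type=\"Friend\">"
        + "<name>Ortal Derech</name>"
        + "<tel>04-8732122</tel>"
        + "<tel>054-646888</tel>"
        + "<email>oderech@tx.technion.ac.il</email>"
        + "</person>";

    /** Parses the string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element person = parseRoot(PERSON);
        // Attribute: metadata attached to the element itself.
        System.out.println(person.getAttribute("type"));
        // Sub-elements: the same tag may repeat.
        System.out.println(person.getElementsByTagName("tel").getLength());
        // Text: the content of a sub-element.
        System.out.println(person.getElementsByTagName("name").item(0).getTextContent());
    }
}
```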
XML is tree-like
[Figure: a person node with child nodes name, tel, tel and email, whose leaf values are Malcolm Atchison, (215) 898 4321, (215) 898 4321 and mp@dcs.gla.ac.sc]
A Complete XML Document
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE addresses SYSTEM "http://www.technion.ac.il/~erant/addresses.dtd">
<addresses>
  <person>
    <name> Jeff Cohen</name>
    <tel> 04-828-1345 </tel>
    <tel> 054-470-778 </tel>
    <email> jeffc@cs.technion.ac.il </email>
  </person>
</addresses>
• The standalone attribute tells whether or not this document references an external entity or an external data type specification.
XML Structure Definitions
• DTD
  • Document Type Definition: defines structure constraints for XML documents.
• XML Schema
  • Serves the same purpose as a DTD but is more powerful: it can specify the data types of elements, and it is itself written in XML.
• Namespaces
  • A way of preventing name clashes among elements that come from more than one source within the same XML document.
More Standards
• XPath
  • XML Path Language, a language for locating parts of an XML document.
• XQuery
  • A query language for XML documents (like SQL…).
• XSLT
  • XSL Transformations, a language for transforming XML documents into other XML documents.
• RDF
  • Resource Description Framework, a formal knowledge model from the World Wide Web Consortium (W3C).
Why Is XML Important? • Because it exists, and everybody uses it. • Plain Text - you can create and edit files with anything. • Data Identification - XML tells you what kind of data you have, not how to display it. • Separation from style. • Hierarchical, and easily processed.
An Overview of the APIs
• JAXP: Java API for XML Processing
  • Provides a common interface for creating and using the standard SAX, DOM, and XSLT APIs.
• JAXB: Java Architecture for XML Binding
  • Defines a mechanism for mapping between Java objects and XML, so that objects can be written out as XML and read back.
• JDOM
  • Represents an XML file as a tree of Java objects (a more Java-friendly alternative to DOM).
• dom4j
  • A lightweight alternative to JDOM.
dom4j • An Open Source XML framework for Java. • Allows you to read, write, navigate, create and modify XML documents. • Integrates with DOM and SAX. • Full XPath support. • XSLT Support.
Download and Use • Go to: http://dom4j.org. • Go to http://dom4j.org/download.html, and download the latest release (current = 1.4). • Unzip. • Don’t forget the classpath. When working in an IDE, don’t forget to add the log4j.jar library. • Javadoc: http://dom4j.org/apidocs/index.html. • Quick start guide: http://dom4j.org/guide.html.
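Opening an XML Document
import org.dom4j.*;
import org.dom4j.io.SAXReader;

public class TestDom4j {
    // Parse the XML found at the given file name or URL into a dom4j Document.
    public Document parse(String id) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read(id);
        return document;
    }
}
• SAXReader can read from a file, a URL, an InputStream, or a String that names a file or URL.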
Example XML File
<?xml version="1.0" encoding="UTF-8" ?>
<salesdata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="C:\Documents and Settings\eran\My Documents\Academic\Courses\XML\xpath_ass_schema.xsd">
  <year>
    <theyear>1997</theyear>
    <region><name>central</name><sales unit="millions">34</sales></region>
    <region><name>east</name><sales unit="millions">34</sales></region>
    <region><name>west</name><sales unit="millions">32</sales></region>
  </year>
  <year>
    <theyear>1998</theyear>
    <region><name>east</name><sales unit="millions">35</sales></region>
    <region><name>west</name><sales unit="millions">42</sales></region>
  </year>
</salesdata>
Accessing XML Elements
// Requires: import java.util.Iterator;
public void dump(Document document) throws DocumentException {
    Element root = document.getRootElement();                   // accessing the root element
    for (Iterator i = root.elementIterator(); i.hasNext(); ) {  // retrieving child elements
        Element element = (Element) i.next();
        System.out.println(element.getQualifiedName());         // retrieving the element name
        System.out.println(element.getTextTrim());              // retrieving the element text
        System.out.println(element.elementText("theyear"));     // retrieving the text of the child element "theyear"
    }
}
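Accessing XML Elements – cont’d
• What will be the output of dump()?
  year
  (empty line)
  1997
  year
  (empty line)
  1998
• Why? root.elementIterator() visits only the direct children of salesdata, which are the two year elements. getTextTrim() returns an element's own text; for year that is only whitespace, so an empty line is printed. elementText("theyear") returns the text of the theyear child.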
Accessing XML Elements Recursively
public void go(Element element, int depth) {
    // Indent by one space per nesting level.
    for (int d = 0; d < depth; d++) {
        System.out.print(" ");
    }
    System.out.print(element.getQualifiedName());
    System.out.println(" " + element.getTextTrim());
    // Visit every child element one level deeper.
    for (Iterator i = element.elementIterator(); i.hasNext(); ) {
        Element son = (Element) i.next();
        go(son, depth + 1);
    }
}
What will be the output?
Accessing Recursively – cont’d
salesdata
 year
  theyear 1997
  region
   name central
   sales 34
  region
   name east
   sales 34
  region
   name west
   sales 32
 year
  theyear 1998
  region
   name east
   sales 35
  region
   name west
   sales 42
The whole XML tree: element names + values
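For readers without dom4j on the classpath, the same depth-first walk can be sketched with the JDK's built-in DOM. This is an illustration, not the slides' code: `OutlineSketch`, `walk` and `outline` are hypothetical names, the result is collected into a string instead of being printed, and, unlike go(), no trailing space is emitted for elements that have no text of their own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class OutlineSketch {
    /** Appends one line per element: indentation, tag name, then the element's own text. */
    static void walk(Element element, int depth, StringBuilder out) {
        NodeList children = element.getChildNodes();
        StringBuilder ownText = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Text) {
                ownText.append(child.getNodeValue()); // direct text only, not descendants
            }
        }
        for (int d = 0; d < depth; d++) {
            out.append(' ');
        }
        out.append(element.getTagName());
        String text = ownText.toString().trim();
        if (!text.isEmpty()) {
            out.append(' ').append(text);
        }
        out.append('\n');
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                walk((Element) child, depth + 1, out); // recurse one level deeper
            }
        }
    }

    /** Parses the string and returns the indented outline of its element tree. */
    static String outline(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<year><theyear>1997</theyear>"
            + "<region><name>east</name><sales unit=\"millions\">34</sales></region></year>"));
    }
}
```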
Creating an XML document
public Document createDocument() {
    Document document = DocumentHelper.createDocument();
    Element root = document.addElement("phonebook");      // creating the root element
    Element address1 = root.addElement("address")         // adding elements
        .addAttribute("name", "Yuval")
        .addAttribute("category", "family")
        .addText("Ehud 3, Jerusalem");
    Element address2 = root.addElement("address")
        .addAttribute("name", "Ortal")
        .addAttribute("category", "friends")
        .addText("Kibbutz Givaat Haim");
    return document;
}
What will we get when running go()?
Creating an XML document – cont’d
XML tree structure of the new document:
phonebook
 address Ehud 3, Jerusalem
 address Kibbutz Givaat Haim
Writing the XML document to a file:
FileWriter out = new FileWriter("C:\\addresses.xml");
document.write(out);
out.close();   // flush the output and release the file
Retrieving the XML itself as a string:
String XML = document.asXML();
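The same build-then-serialize steps can be sketched with the JDK's own DOM and Transformer APIs, which may help when comparing dom4j's fluent style with the standard interfaces. This is an assumption-laden illustration rather than the slides' method: `CreateDocumentSketch`, `createDocument` and `asXml` are hypothetical names, and only the first address entry is built.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CreateDocumentSketch {
    /** Builds a phonebook document with a single address entry. */
    static Document createDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("phonebook"); // creating the root element
            doc.appendChild(root);
            Element address = doc.createElement("address"); // adding an element
            address.setAttribute("name", "Yuval");
            address.setAttribute("category", "family");
            address.setTextContent("Ehud 3, Jerusalem");
            root.appendChild(address);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the document to a string without the XML declaration. */
    static String asXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(asXml(createDocument()));
    }
}
```

Compared with dom4j's chained addElement/addAttribute/addText calls, the standard DOM needs a separate create and append step for each node, which is one reason the slides favor dom4j.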
Client Program
public static void main(String[] args) {
    Foo foo = new Foo();
    try {
        Document doc = foo.parse("C:\\sales.xml");   // opening the file
        foo.dump(doc);                               // dumping
        foo.go(doc.getRootElement(), 0);             // printing recursively
        foo.xpath(doc);                              // XPath queries (method not shown here)
        Document newDoc = foo.createDocument();      // creating a new document
        foo.go(newDoc.getRootElement(), 0);
        FileWriter out = new FileWriter("C:\\addresses.xml");
        newDoc.write(out);
        out.close();                                 // flush the output and release the file
    } catch (Exception e) {
        System.out.println(e);
    }
}
XPath - Introduction
• XML Path Language: XPath is a language for addressing parts of an XML document.
• It locates and retrieves nodes, much like accessing directories in a file system.
• It has limited (but not bad) filtering and querying abilities.
• It retrieves the actual text (PCDATA) or node sets.
XPath – Simple Path Selection
XPath expression: /salesdata/year/theyear
  <theyear>1997</theyear>
  <theyear>1998</theyear>
• "/" signifies child-of.
XPath expression: /salesdata/year[2]/theyear
  <theyear>1998</theyear>
• [2] filters the level, keeping only the second year element.
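The two path expressions above can be tried out with the JDK's javax.xml.xpath package. This is a minimal sketch, not the dom4j API covered by the slides; `XPathPathSketch`, `XML` and `select` are hypothetical names, and the sample data is the sales file with the schema attributes left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPathSketch {
    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit='millions'>34</sales></region>"
        + "<region><name>east</name><sales unit='millions'>34</sales></region>"
        + "<region><name>west</name><sales unit='millions'>32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit='millions'>35</sales></region>"
        + "<region><name>west</name><sales unit='millions'>42</sales></region></year>"
        + "</salesdata>";

    /** Evaluates the expression against XML and returns the text of each selected node. */
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/salesdata/year/theyear"));    // every year
        System.out.println(select("/salesdata/year[2]/theyear")); // second year only
    }
}
```

Note that XPath positions start at 1, so year[2] is the second year element, not the third.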
XPath – Conditions
XPath expression: /salesdata/year/region[sales > 34]
  <region>
    <name>east</name>
    <sales unit="millions">35</sales>
  </region>
  <region>
    <name>west</name>
    <sales unit="millions">42</sales>
  </region>
• Goes down to region and filters according to the sales element.
• What does /salesdata/year/region[sales > 34]/name return?
XPath – Traveling Up the Tree
XPath expression: /salesdata/year/region[sales > 34]/parent::year/theyear
  <theyear>1998</theyear>
• Goes up the XML tree (and then down again).
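The predicate and parent-axis queries above, including the one posed as a question, can be evaluated with the JDK's javax.xml.xpath package. As before this is a sketch under assumptions: it is not dom4j, the names `XPathConditionSketch`, `XML` and `select` are hypothetical, and the data is the sales file without its schema attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathConditionSketch {
    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit='millions'>34</sales></region>"
        + "<region><name>east</name><sales unit='millions'>34</sales></region>"
        + "<region><name>west</name><sales unit='millions'>32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit='millions'>35</sales></region>"
        + "<region><name>west</name><sales unit='millions'>42</sales></region></year>"
        + "</salesdata>";

    /** Evaluates the expression against XML and returns the text of each selected node. */
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Names of the regions whose sales exceed 34.
        System.out.println(select("/salesdata/year/region[sales > 34]/name"));
        // Climb from those regions to their year, then down to theyear.
        System.out.println(select("/salesdata/year/region[sales > 34]/parent::year/theyear"));
    }
}
```

With this data the first query selects the names east and west, both from 1998. The second returns 1998 only once even though two regions match, because an XPath result is a node set without duplicates.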
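XPath – Traveling Down Fast
XPath expression: /descendant::sales
  <sales unit="millions">34</sales>
  <sales unit="millions">34</sales>
  <sales unit="millions">32</sales>
  <sales unit="millions">35</sales>
  <sales unit="millions">42</sales>
• descendant goes all the way down the tree until it reaches the sales elements.
• ./*/sales (labeled "same" on the slide) is a relative path: from a year element it reaches that year's sales elements through any child (*). The abbreviation //sales selects all five sales elements from any context.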
XPath – Advanced Queries
• The years in which the west region's sales exceeded 32 (in millions):
  //region[name="west" and sales > 32]/sales[@unit='millions']/ancestor::year/theyear
  <theyear>1998</theyear>
• "and" is a logical operator; @unit accesses an attribute.
• ancestor works like parent but keeps climbing the tree; ancestor::year reaches the enclosing year element.
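The combined query (logical operator, attribute test and ancestor axis) can be checked with the JDK's javax.xml.xpath package. This is a sketch rather than the dom4j code the slides refer to; `XPathAdvancedSketch`, `XML` and `select` are hypothetical names, and single quotes are used inside the expression so it needs no escaping in a Java string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAdvancedSketch {
    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit='millions'>34</sales></region>"
        + "<region><name>east</name><sales unit='millions'>34</sales></region>"
        + "<region><name>west</name><sales unit='millions'>32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit='millions'>35</sales></region>"
        + "<region><name>west</name><sales unit='millions'>42</sales></region></year>"
        + "</salesdata>";

    /** Evaluates the expression against XML and returns the text of each selected node. */
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // West-region sales above 32, expressed in millions, mapped back to their year.
        System.out.println(select(
            "//region[name='west' and sales > 32]"
            + "/sales[@unit='millions']/ancestor::year/theyear"));
        // Attribute nodes can be selected directly as well.
        System.out.println(select("//region[name='west']/sales/@unit"));
    }
}
```

The 1997 west figure is exactly 32, so the strict comparison sales > 32 excludes it and only 1998 remains.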
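XPath – Advanced Queries (cont’d)
• The years (text nodes) in which the west region's sales were higher than the east region's sales; sales may be expressed in thousands or in millions:
  year[region[name="west"]/sales[@unit='millions'*1000 or @unit='thousands'] > region[name="east"]/sales[@unit='millions'*1000 or @unit='thousands']]/theyear/text()
• Read this as a statement of intent rather than a working query: in XPath 1.0, 'millions'*1000 multiplies a string by a number and yields NaN, so no unit conversion takes place.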
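One way to express the mixed-unit comparison in plain XPath 1.0 is to scale each sales figure by a factor computed from its unit attribute. The sketch below is an assumption, not the slides' solution: it relies on XPath 1.0 converting a boolean to 1 or 0 in arithmetic, so `1 + 999 * (@unit = 'millions')` is 1000 for millions and 1 for thousands. The sample data with mixed units, the class `XPathUnitsSketch` and its members are all hypothetical, and the JDK's javax.xml.xpath is used in place of dom4j.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathUnitsSketch {
    // Hypothetical data: each year mixes millions and thousands.
    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>east</name><sales unit='millions'>34</sales></region>"
        + "<region><name>west</name><sales unit='thousands'>32000</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit='thousands'>35000</sales></region>"
        + "<region><name>west</name><sales unit='millions'>42</sales></region></year>"
        + "</salesdata>";

    // Compares the raw numbers and ignores the unit attribute.
    static final String NAIVE =
        "/salesdata/year[region[name='west']/sales > region[name='east']/sales]/theyear";

    // Converts both sides to thousands before comparing.
    static final String NORMALIZED =
        "/salesdata/year["
        + "region[name='west']/sales"
        + " * (1 + 999 * (region[name='west']/sales/@unit = 'millions'))"
        + " > region[name='east']/sales"
        + " * (1 + 999 * (region[name='east']/sales/@unit = 'millions'))"
        + "]/theyear";

    /** Evaluates the expression against XML and returns the text of each selected node. */
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(NAIVE));      // misled by the differing units
        System.out.println(select(NORMALIZED)); // compares like with like
    }
}
```

On this data the naive query picks 1997, where 32000 thousands only looks larger than 34 millions, while the normalized query picks 1998. The trick assumes exactly two unit values; more units would call for a different scheme, or for doing the conversion in Java after selecting the nodes.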