This project provides an overview of XML (eXtensible Markup Language), a versatile markup language specification from the W3C whose general structure resembles HTML. It details the two main Java APIs for XML parsing: SAX (Simple API for XML) and DOM (Document Object Model). SAX is memory-efficient and processes XML as a stream, while DOM builds an in-memory representation of the whole XML document, allowing more flexible data manipulation. The project includes a simple SAX example that prints a MODS document without formatting, demonstrating parsing in Java.
XML Parsing Using Java Timmy Wong, Fall 2010 AIP Independence project
XML overview • XML (eXtensible Markup Language) is a markup language specification created by the W3C • Structurally similar to HTML, but more general: there is no fixed set of tags • Documents consist of nested, user-defined tags (elements) that enclose information, optionally with attributes • e.g. <recordCreationDate encoding="w3cdtf">2010-10-06</recordCreationDate> • The allowed tags and their structure can be defined in XML Schema documents (.xsd)
JAXP • Java API for XML Processing, the XML parsing library bundled with the standard Java platform • It provides two main parsing interfaces: SAX and DOM
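As a rough sketch of how the two JAXP interfaces are reached in code, the class below obtains a SAX parser and a DOM builder from their standard factories and runs the sample element from the overview slide through both. The class name `JaxpEntryPoints` and the helper methods are hypothetical, chosen only for illustration; the factory and parser calls are the standard `javax.xml.parsers` API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpEntryPoints {
    // Sample input reusing the element shown on the overview slide.
    static final String SAMPLE =
        "<recordCreationDate encoding=\"w3cdtf\">2010-10-06</recordCreationDate>";

    // SAX: get a parser from the factory and stream the input through a handler.
    static void parseWithSax(String xml, DefaultHandler handler) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
    }

    // DOM: get a builder from the factory and build the whole tree in memory.
    static Document parseWithDom(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // A bare DefaultHandler ignores every event, so this only checks well-formedness.
        parseWithSax(SAMPLE, new DefaultHandler());
        Document doc = parseWithDom(SAMPLE);
        System.out.println(doc.getDocumentElement().getTagName()); // recordCreationDate
    }
}
```

Both paths read the same bytes; the difference is what they hand back. SAX returns nothing and reports events to the handler, while DOM returns a `Document` that stays in memory.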
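SAX • SAX (Simple API for XML) is used for serial reading, analogous to reading a file stream • Faster and uses less memory than DOM • Doesn't store the whole XML document in memory; it reports elements to the program as events while reading • The programmer is responsible for keeping track of any data that is needed later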
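To illustrate the last point (the programmer keeps track of needed data), here is a minimal SAX handler that remembers the text and `encoding` attribute of a `recordCreationDate` element. The handler keeps that state in its own fields, because SAX only delivers events. The class names and the wrapping `<record>` element are hypothetical; the sketch also assumes element names without a namespace prefix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStateExample {
    // Remembers the text and encoding attribute of <recordCreationDate>.
    static class DateHandler extends DefaultHandler {
        private boolean inDate = false;                 // currently inside the element?
        private final StringBuilder text = new StringBuilder();
        String encoding;                                // value of the encoding attribute
        String date;                                    // collected element text

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("recordCreationDate")) {
                inDate = true;
                text.setLength(0);
                encoding = atts.getValue("encoding");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() may be called several times for one text run, so append.
            if (inDate) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("recordCreationDate")) {
                inDate = false;
                date = text.toString().trim();
            }
        }
    }

    static DateHandler extract(String xml) throws Exception {
        DateHandler handler = new DateHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<record><recordCreationDate encoding=\"w3cdtf\">2010-10-06"
                   + "</recordCreationDate></record>";
        DateHandler h = extract(xml);
        System.out.println(h.date + " (" + h.encoding + ")"); // 2010-10-06 (w3cdtf)
    }
}
```

The default `SAXParserFactory` is not namespace-aware, so the handler matches on `qName`. For prefixed elements such as `mods:recordCreationDate`, one would typically call `setNamespaceAware(true)` on the factory and compare `localName` instead.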
DOM • DOM (Document Object Model) • Builds an actual in-memory tree representation of the XML document • Provides non-sequential (random) access, so data can be read and modified in any order • Slower and uses more memory than SAX, since the whole tree must be built first
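For contrast with the SAX handler, the sketch below parses a small hypothetical record into a DOM tree. It then jumps directly to an element, changes its text in place, and afterwards reads an element that appears earlier in the document. SAX cannot do this without the program storing the data itself. The class name and the sample record are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomExample {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<record>"
                   + "<titleInfo><title>Sample</title></titleInfo>"
                   + "<recordCreationDate encoding=\"w3cdtf\">2010-10-06</recordCreationDate>"
                   + "</record>";
        Document doc = parse(xml);

        // Random access: go straight to an element anywhere in the tree.
        Element date = (Element) doc.getElementsByTagName("recordCreationDate").item(0);
        System.out.println(date.getAttribute("encoding") + " " + date.getTextContent()); // w3cdtf 2010-10-06

        // Manipulation: the tree is mutable, so values can be changed in place.
        date.setTextContent("2010-12-01");

        // Earlier nodes are still available; nothing was discarded while parsing.
        String title = doc.getElementsByTagName("title").item(0).getTextContent();
        System.out.println(title + " / " + date.getTextContent()); // Sample / 2010-12-01
    }
}
```

The cost of this flexibility is the one the slide names: the entire tree exists in memory before the first lookup.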
A related API: JAXB • Java Architecture for XML Binding • A separate and somewhat more sophisticated approach • Java classes are generated from the schema document, and XML elements are bound to instances of those classes • Allows intuitive coding, but is also memory-intensive
A simple example • This program uses SAX to print the provided sample MODS document • It doesn't apply any formatting or try to interpret the information yet, but doing so should be possible using the MODS specification
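The original program is not included here. The following is only a minimal sketch of what a SAX-based, unformatted printer might look like: it echoes each element name with its attributes, followed by any non-blank text, one item per line. The class names and the inline sample are hypothetical stand-ins for the provided MODS file. A real file path can be passed as the first argument.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PrintMods {
    // Echoes every element and its text, one line each, without formatting.
    static class PrintHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flushText();
            out.append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
            }
            out.append('\n');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
        }

        // Emit buffered text unless it is only whitespace, then reset the buffer.
        private void flushText() {
            String s = text.toString().trim();
            if (!s.isEmpty()) out.append("  ").append(s).append('\n');
            text.setLength(0);
        }
    }

    static String print(InputStream in) throws Exception {
        PrintHandler handler = new PrintHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the sample MODS document.
        String sample = "<mods><titleInfo><title>Sample record</title></titleInfo>"
            + "<recordInfo><recordCreationDate encoding=\"w3cdtf\">2010-10-06"
            + "</recordCreationDate></recordInfo></mods>";
        try (InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))) {
            System.out.print(print(in));
        }
    }
}
```

Printing element names as raw `qName` values keeps any `mods:` prefixes visible, which may help when matching the output against the MODS specification later.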