Storing XML in ORDBMS

Storing XML in ORDBMS Amine Kaddara Supervisor: Dr Haddouti

Outline • Motivation • Benefits of using ORDBMS for storing XML • Storage techniques using XORator algorithm • JDOM API (JavaDOM) • JDOM Examples • JDO API (Java Data Objects) • JDO Examples

Motivation • First, most database vendors today offer universal database products that combine their relational DBMS and ORDBMS offerings into a single product. • Second, an ORDBMS has a more expressive type system than an RDBMS. • Third, an ORDBMS is better suited for storing and querying XML documents that may use a richer set of data types.

Motivation: Applications • Computer-Aided Design (CAD) • Computer-Aided Manufacturing (CAM) • Computer-Aided Software Engineering (CASE) • Network Management Systems • Office Information Systems (OIS) and Multimedia Systems • Digital Publishing • Geographic Information Systems (GIS) • Interactive and Dynamic Web sites • Other applications with complex and interrelated objects and procedural data.

Motivation: RDBMS weaknesses • Poor Representation of “Real World” Entities • Normalization leads to relations that do not correspond to entities in the “real world”. • Semantic Overloading • The relational model has only one construct for representing data and data relationships: the relation. • The relational model is therefore semantically overloaded. • Difficulty Handling Recursive Queries • RDBMSs are poor at navigational access to data. • Limited Operations • RDBMSs have only a fixed set of operations, which are difficult to extend.

Motivation: ORDBMS Advantages • Add object storage facilities to relational database • Greater flexibility than strict relational • Easier to introduce into organisation than full OO • Backwards compatible with strict relational applications, SQL etc • Relational paradigm retained • Tables with rows of values • But attributes can contain objects, sets, arrays, tuples etc

Motivation: ORDBMS Advantages • Code held within the database, as functions, procedures or methods • common functionality can be centralised rather than re-implemented by every application that uses the data • BLOBs (Binary Large Objects) and CLOBs (Character Large Objects) are used to store large unstructured values within the database • allows storage of complex data, e.g. multimedia

Motivation: ORDBMS Advantages • Transparent persistence: the ability to directly manipulate data stored in a relational database using an object programming language • Object-relational mapping means less code to write • Can offer higher performance than embedded SQL or a call-level interface (JDBC, ODBC)

XML and ORDBMS

XORator mapping • The XORator (XML to OR Translator) algorithm is a practical demonstration of the use of XML data types • It takes advantage of using an ORDBMS over an RDBMS. • XORator uses Document Type Definitions (DTDs) to map XML documents to tables in an ORDBMS. • An important part of this mapping is the assignment of a fragment of an XML document to a new XML data type, called XADT (XML Abstract Data Type).

XORator: DTD -> OR schema • Reducing the DTD complexity • Building the DTD graph • Mapping the DTD to an OR schema • Defining the XADT (XML Abstract Data Type)

XORator: DTD -> OR schema
<!ELEMENT PLAY (INDUCT?, ACT+)>
<!ELEMENT INDUCT (TITLE, SUBTITLE*, SCENE+)>
<!ELEMENT ACT (SCENE+, TITLE, SUBTITLE*, SPEECH+, PROLOGUE?)>
<!ELEMENT SCENE (TITLE, SUBTITLE*, (SPEECH | SUBHEAD)+)>
<!ELEMENT SPEECH (SPEAKER, LINE)+>
<!ELEMENT PROLOGUE (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT SUBHEAD (#PCDATA)>
<!ELEMENT SPEAKER (#PCDATA)>
<!ELEMENT LINE (#PCDATA)>

XORator: DTD complexity • Simplify the DTD to a form that makes the mapping process easier. • A set of transformations reduces the number of nested expressions and the number of element items: • Flattening (converts a nested definition into a flat representation): (e1, e2)* -> e1*, e2* • Simplification (reduces multiple unary operators to a single unary operator): e1** -> e1* • Grouping (groups subelements that have the same name): e0, e1*, e1*, e2 -> e0, e1*, e2 • In addition, e+ is transformed to e*.
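The three transformations can be sketched in a few lines of Java. This is only an illustrative sketch, not the XORator implementation: the class name DtdSimplifier, the string encoding of items ("LINE", "LINE*", ...) and the use of '1' for "exactly once" are assumptions made for the example. Flattening is taken to distribute the group operator over each member, which agrees with the simplified SPEECH declaration (SPEAKER*, LINE*). Choice groups such as (SPEECH | SUBHEAD)+ are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DtdSimplifier {
    /** Combines two occurrence operators ('1' = exactly once, '?', '*', '+'); '+' is widened to '*'. */
    static char combine(char outer, char inner) {
        char o = outer == '+' ? '*' : outer;
        char i = inner == '+' ? '*' : inner;
        if (o == '1') return i;
        if (i == '1') return o;
        if (o == '*' || i == '*') return '*';
        return '?'; // both are '?'
    }

    /** Simplifies a sequence group: items are names with optional trailing operators, groupOp applies to the whole group. */
    static List<String> simplify(List<String> items, char groupOp) {
        Map<String, Character> result = new LinkedHashMap<>();
        for (String item : items) {
            String name = item;
            char op = '1';
            // Simplification: collapse stacked unary operators such as e1** into one.
            while (!name.isEmpty() && "?*+".indexOf(name.charAt(name.length() - 1)) >= 0) {
                op = combine(op, name.charAt(name.length() - 1));
                name = name.substring(0, name.length() - 1);
            }
            // Flattening: push the group operator down onto each member.
            op = combine(groupOp, op);
            // Grouping: a name seen more than once becomes a single starred item.
            result.put(name, result.containsKey(name) ? '*' : op);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Character> e : result.entrySet()) {
            out.add(e.getValue() == '1' ? e.getKey() : e.getKey() + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // <!ELEMENT SPEECH (SPEAKER, LINE)+>
        System.out.println(simplify(Arrays.asList("SPEAKER", "LINE"), '+')); // prints [SPEAKER*, LINE*]
        System.out.println(simplify(Arrays.asList("e0", "e1*", "e1*", "e2"), '1')); // prints [e0, e1*, e2]
    }
}
```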

XORator: DTD -> OR schema • The simplified version of the previous DTD
<!ELEMENT PLAY (INDUCT?, ACT*)>
<!ELEMENT INDUCT (TITLE, SUBTITLE*, SCENE*)>
<!ELEMENT ACT (SCENE*, TITLE, SUBTITLE*, SPEECH*, PROLOGUE?)>
<!ELEMENT SCENE (TITLE, SUBTITLE*, SPEECH*, SUBHEAD*)>
<!ELEMENT SPEECH (SPEAKER*, LINE*)>
<!ELEMENT PROLOGUE (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT SUBHEAD (#PCDATA)>
<!ELEMENT SPEAKER (#PCDATA)>
<!ELEMENT LINE (#PCDATA)>

XORator: DTD -> OR schema • We build a DTD graph to represent the structure of the DTD. • Nodes in the DTD graph are elements, attributes, and operators. • In the DTD graph, elements that contain character data are duplicated so that no such node is shared between parents.

XORator: DTD -> OR schema • Given a DTD graph, a relation is created for each node that satisfies any of the following conditions: 1) nodes that have an in-degree of zero 2) recursive nodes with an in-degree greater than one 3) one node among mutually recursive nodes with in-degree one. • All remaining nodes (nodes not mapped to a relation) are inlined as attributes under the relation created for their closest ancestor node in the DTD graph.
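The selection rule can be sketched as follows. This is a rough illustration of the three conditions as stated on the slide, not the XORator code: the class name RelationChooser and the adjacency-map encoding (element name to child element names, with operator nodes left out) are assumptions, and "the closest ancestor" inlining step is not shown.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RelationChooser {
    /** Returns every node reachable from start by following one or more edges. */
    static Set<String> reachable(Map<String, List<String>> g, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>(g.getOrDefault(start, Collections.<String>emptyList()));
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (seen.add(n)) stack.addAll(g.getOrDefault(n, Collections.<String>emptyList()));
        }
        return seen;
    }

    /** Picks the nodes that get their own relation; all other nodes would be inlined. */
    static Set<String> chooseRelations(Map<String, List<String>> g) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String n : g.keySet()) inDegree.putIfAbsent(n, 0);
        for (List<String> children : g.values())
            for (String c : children) inDegree.merge(c, 1, Integer::sum);

        Set<String> relations = new LinkedHashSet<>();
        // Conditions 1 and 2: roots, and recursive nodes with in-degree greater than one.
        for (String n : inDegree.keySet()) {
            boolean recursive = reachable(g, n).contains(n);
            if (inDegree.get(n) == 0 || (recursive && inDegree.get(n) > 1)) relations.add(n);
        }
        // Condition 3: one representative for each cycle that has no relation yet.
        for (String n : inDegree.keySet()) {
            Set<String> fromN = reachable(g, n);
            if (!fromN.contains(n)) continue; // n is not recursive
            boolean cycleCovered = false;
            for (String m : fromN)
                if (reachable(g, m).contains(n) && relations.contains(m)) cycleCovered = true;
            if (!cycleCovered) relations.add(n);
        }
        return relations;
    }

    public static void main(String[] args) {
        // Hypothetical recursive DTD: BOOK contains SECTION, SECTION contains TITLE and nested SECTIONs.
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("BOOK", java.util.Arrays.asList("SECTION"));
        g.put("SECTION", java.util.Arrays.asList("TITLE", "SECTION"));
        System.out.println(chooseRelations(g)); // prints [BOOK, SECTION]
    }
}
```

For the non-recursive PLAY DTD shown earlier, these three conditions alone select only PLAY, since it is the only node with in-degree zero.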

XORator: DTD -> OR schema • An XADT attribute can store a fragment of an XML document • The XORator algorithm allows an entire subtree of the DTD graph to be mapped to a single attribute of type XADT.

XORator: XADT • One storage option is a compressed representation for each XML fragment. • Element tag names are replaced by integer codes. • A small dictionary is stored along with the XML fragment to record the mapping between the integer codes and the actual element tag names. • There are two implementations of the XADT: one that uses compression, and one that does not.

XORator: XADT • The decision to use the “correct” implementation of the XADT is made during the document transformation process by monitoring the effectiveness of the compression technique. • Compression is used only if the space efficiency is above a certain threshold value.
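The dictionary idea and the threshold test can be sketched as below. This is a minimal illustration under stated assumptions, not the XADT storage format: the class name TagDictionaryCodec, the textual "<1>" encoding, the way dictionary overhead is counted, and the THRESHOLD value of 0.2 are all invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagDictionaryCodec {
    // Matches the name in an opening or closing tag, e.g. <SPEAKER or </SPEAKER.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z_][\\w.-]*)");
    // Hypothetical cut-off: compress only if at least 20% of the space is saved.
    static final double THRESHOLD = 0.2;

    /** Tag name -> integer code, stored alongside the compressed fragment. */
    final Map<String, Integer> dictionary = new LinkedHashMap<>();

    /** Replaces every element tag name with its integer code, assigning new codes on first use. */
    String compress(String xml) {
        Matcher m = TAG.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer code = dictionary.get(m.group(2));
            if (code == null) {
                code = dictionary.size() + 1;
                dictionary.put(m.group(2), code);
            }
            m.appendReplacement(out, "<" + m.group(1) + code);
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Fraction of space saved, counting the dictionary's tag names as overhead. */
    double spaceSaved(String original, String compressed) {
        int overhead = 0;
        for (String name : dictionary.keySet()) overhead += name.length();
        return 1.0 - (double) (compressed.length() + overhead) / original.length();
    }

    public static void main(String[] args) {
        String xml = "<SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>My friend</LINE></SPEECH>";
        TagDictionaryCodec codec = new TagDictionaryCodec();
        String packed = codec.compress(xml);
        System.out.println(packed); // prints <1><2>HAMLET</2><3>My friend</3></1>
        // Keep the compressed form only when it pays for its dictionary.
        String stored = codec.spaceSaved(xml, packed) > THRESHOLD ? packed : xml;
        System.out.println(stored.equals(packed) ? "compressed" : "uncompressed");
    }
}
```

The saving grows with the number of repeated tags, which is why the choice between the two representations is made per document rather than fixed in advance.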

XORator: XADT • XADT getElm(XADT inXML, VARCHAR rootElm, VARCHAR searchElm, VARCHAR searchKey, INTEGER level): • This method returns all rootElm elements that have a searchElm element, within a depth of level from the rootElm, whose content matches searchKey. • INTEGER findKeyInElm(XADT inXML, VARCHAR searchElm, VARCHAR searchKey): • This method examines all elements with the tag name searchElm in inXML and returns 1 if any of them has content that matches the searchKey keyword. • XADT getElmIndex(XADT inXML, VARCHAR parentElm, VARCHAR childElm, INTEGER startPos, INTEGER endPos): • This method returns all childElm elements that are children of the parentElm elements and whose sibling order lies between the startPos and endPos positions.
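To make the behaviour of findKeyInElm concrete, here is a small in-memory sketch using the JDK's DOM parser. It is not the database implementation: the class name XadtSketch is hypothetical, the XADT value is assumed to be available as a plain string, "matches" is interpreted as substring containment, and the fragment is wrapped in a synthetic root element so that fragments with several top-level elements still parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XadtSketch {
    /** Returns 1 if any searchElm element in the fragment has text containing searchKey, else 0. */
    static int findKeyInElm(String inXml, String searchElm, String searchKey) {
        try {
            // Wrap the fragment so it has exactly one root element (an assumption of this sketch).
            String wrapped = "<fragment>" + inXml + "</fragment>";
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
            NodeList matches = doc.getElementsByTagName(searchElm);
            for (int i = 0; i < matches.getLength(); i++) {
                if (matches.item(i).getTextContent().contains(searchKey)) return 1;
            }
            return 0;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML fragment", e);
        }
    }

    public static void main(String[] args) {
        String speech = "<SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>My good friend</LINE></SPEECH>";
        System.out.println(findKeyInElm(speech, "LINE", "friend"));    // prints 1
        System.out.println(findKeyInElm(speech, "SPEAKER", "friend")); // prints 0
    }
}
```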

XORator: XADT • This query retrieves lines that are spoken in acts by the ‘SPEAKER’ ‘HAMLET’ and have the keyword ‘friend’ in the line.

JDOM • JDOM is an open source, tree-based (DOM-like), pure Java API for parsing, creating, manipulating, and serializing XML documents • JDOM represents an XML document as a tree composed of elements, attributes, comments, processing instructions, text nodes, CDATA sections, etc. • JDOM is written in and for Java. It consistently uses the Java coding conventions and the class library, and its classes implement the Cloneable and Serializable interfaces

JDOM • Xerces 1.4.4 is bundled with JDOM to parse XML documents. • A JDOM tree is fully read-write. All parts of the tree can be moved, deleted, and added to, subject to the usual restrictions of XML. • Unlike DOM, there are no annoying read-only sections of the tree that one can’t change.

JDOM Example
<person>
  <name>Michael Owen</name>
  <address>222 Bazza Lane, Liverpool, MN</address>
  <ssn>111-222-3333</ssn>
  <email>michael@owen.com</email>
  <home-phone>720.111.2222</home-phone>
  <work-phone>111.222.3333</work-phone>
</person>

JDOM Example
public class Person {
    private String name;
    private String address;
    private String ssn;
    private String email;
    private String homePhone;
    private String workPhone;
    // -- allows us to create a Person
    public Person(String name, String address, String ssn, String email, String homePhone, String workPhone) {
        this.name = name;
        this.address = address;
        this.ssn = ssn;
        this.email = email;
        this.homePhone = homePhone;
        this.workPhone = workPhone;
    }
    // -- used by the data-binding
    public Person() { }
    // -- accessors
    public String getName() { return name; }
    public String getAddress() { return address; }
    public String getSsn() { return ssn; }
    public String getEmail() { return email; }
    public String getHomePhone() { return homePhone; }
    public String getWorkPhone() { return workPhone; }
    // -- mutators
    public void setName(String name) { this.name = name; }
    public void setAddress(String address) { this.address = address; }
    public void setSsn(String ssn) { this.ssn = ssn; }
    public void setEmail(String email) { this.email = email; }
    public void setHomePhone(String homePhone) { this.homePhone = homePhone; }
    public void setWorkPhone(String workPhone) { this.workPhone = workPhone; }
}

JDOM Example
import org.exolab.castor.xml.*;
import java.io.FileReader;
public class ReadPerson {
    public static void main(String args[]) {
        try {
            // -- Castor XML data binding: unmarshal person.xml into a Person object
            Person person = (Person) Unmarshaller.unmarshal(Person.class, new FileReader("person.xml"));
            System.out.println("Person Attributes");
            System.out.println("-----------------");
            System.out.println("Name: " + person.getName());
            System.out.println("Address: " + person.getAddress());
            System.out.println("SSN: " + person.getSsn());
            System.out.println("Email: " + person.getEmail());
            System.out.println("Home Phone: " + person.getHomePhone());
            System.out.println("Work Phone: " + person.getWorkPhone());
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

JDOM Example
import org.exolab.castor.xml.*;
import java.io.FileWriter;
public class CreatePerson {
    public static void main(String args[]) {
        try {
            // -- create a person to work with
            Person person = new Person("Bob Harris", "123 Foo Street", "222-222-2222",
                    "bob@harris.org", "(123) 123-1234", "(123) 123-1234");
            // -- marshal the person object out as a <person>
            FileWriter file = new FileWriter("bob_person.xml");
            Marshaller.marshal(person, file);
            file.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

JDOM Example
import org.exolab.castor.xml.*;
import java.io.FileWriter;
import java.io.FileReader;
public class ModifyPerson {
    public static void main(String args[]) {
        try {
            // -- read in the person
            Person person = (Person) Unmarshaller.unmarshal(Person.class, new FileReader("person.xml"));
            // -- change the name
            person.setName("David Beckham");
            // -- marshal the changed person back to disk
            FileWriter file = new FileWriter("person.xml");
            Marshaller.marshal(person, file);
            file.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

JDO • Sun's Java Data Objects (JDO) standard. • JDO allows you to persist Java objects. • It supports transactions and multiple users. It differs from JDBC in that you don't have to think about SQL and "all that database stuff." • It differs from serialization as it allows multiple users and transactions. • It allows Java developers to use their object model as a data model. There is no need to spend time going between the "data" side and the "object" side.

JDO: Example
package addressbook;
import java.util.*;
import javax.jdo.*;
import com.prismt.j2ee.connector.jdbc.ManagedConnectionFactoryImpl;
public class PersonPersist {
    private final static int SIZE = 3;
    private PersistenceManagerFactory pmf = null;
    private PersistenceManager pm = null;
    private Transaction transaction = null;
    private Person[] people;
    // Vector of current object identifiers
    private Vector id = new Vector(SIZE);
    public PersonPersist() {
        try {
            Properties props = new Properties();
            props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
                    "com.prismt.j2ee.jdo.PersistenceManagerFactoryImpl");
            pmf = JDOHelper.getPersistenceManagerFactory(props);
            pmf.setConnectionFactory(createConnectionFactory());
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }

JDO: Example
    public static Object createConnectionFactory() {
        ManagedConnectionFactoryImpl mcfi = new ManagedConnectionFactoryImpl();
        Object connectionFactory = null;
        try {
            mcfi.setUserName("scott");
            mcfi.setPassword("tiger");
            mcfi.setConnectionURL("jdbc:oracle:thin:@localhost:1521:thedb");
            mcfi.setDBDriver("oracle.jdbc.driver.OracleDriver");
            connectionFactory = mcfi.createConnectionFactory();
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
        return connectionFactory;
    }

JDO: Example
    public void persistPeople() {
        // create an array of Person objects
        people = new Person[SIZE];
        // create three people
        people[0] = new Person("Gary Segal", "123 Foobar Lane", "123-123-1234",
                "gary@segal.com", "(608) 294-0192", "(608) 029-4059");
        people[1] = new Person("Michael Owen", "222 Bazza Lane, Liverpool, MN", "111-222-3333",
                "michael@owen.com", "(720) 111-2222", "(303) 222-3333");
        people[2] = new Person("Roy Keane", "222 Trafford Ave, Manchester, MN", "234-235-3830",
                "roy@keane.com", "(720) 940-9049", "(303) 309-7599");
        // persist the array of people inside a transaction
        pm = pmf.getPersistenceManager();
        transaction = pm.currentTransaction();
        transaction.begin();
        pm.makePersistentAll(people);
        transaction.commit();
        // retrieve the object ids for the persisted objects
        for (int i = 0; i < people.length; i++) {
            id.add(pm.getObjectId(people[i]));
        }
        // close current persistence manager to ensure that
        // objects are read from the db not the persistence
        // manager's memory cache.
        pm.close();
    }

JDO: Example
    public void change() {
        Person person;
        // retrieve objects from datastore
        pm = pmf.getPersistenceManager();
        transaction = pm.currentTransaction();
        transaction.begin();
        // change the name field of the second persisted object
        person = (Person) pm.getObjectById(id.elementAt(1), false);
        person.setName("Steve Gerrard");
        // commit the change and close the persistence manager
        transaction.commit();
        pm.close();
    }
}

JDOM Example
<addressbook name="Manchester United Address Book">
  <person name="Roy Keane">
    <address>23 Whistlestop Ave</address>
    <ssn>111-222-3333</ssn>
    <email>roykeane@manutd.com</email>
    <home-phone>720.111.2222</home-phone>
    <work-phone>111.222.3333</work-phone>
  </person>
  <person name="Juan Sebastian Veron">
    <address>123 Foobar Lane</address>
    <ssn>222-333-444</ssn>
    <email>juanveron@manutd.com</email>
    <home-phone>720.111.2222</home-phone>
    <work-phone>111.222.3333</work-phone>
  </person>
</addressbook>

JDOM: Example
import java.util.List;
import java.util.ArrayList;
public class Addressbook {
    private String addressBookName;
    private List persons = new ArrayList();
    public Addressbook() { }
    // -- manipulate the List of Person objects
    public void addPerson(Person person) { persons.add(person); }
    public List getPersons() { return persons; }
    // -- manipulate the name of the address book
    public String getName() { return addressBookName; }
    public void setName(String name) { this.addressBookName = name; }
}

JDOM Example
<?xml version="1.0"?>
<mapping>
  <description>A mapping file for our Address Book application</description>
  <class name="Person">
    <field name="name" type="string">
      <bind-xml name="name" node="attribute" />
    </field>
    <field name="address" type="string" />
    <field name="ssn" type="string" />
    <field name="email" type="string" />
    <field name="homePhone" type="string" />
    <field name="workPhone" type="string" />
  </class>
  <class name="Addressbook">
    <field name="name" type="string">
      <bind-xml name="name" node="attribute" />
    </field>
    <field name="persons" type="Person" collection="collection" />
  </class>
</mapping>

JDOM Example
import org.exolab.castor.xml.*;
import org.exolab.castor.mapping.*;
import java.io.FileReader;
import java.util.List;
import java.util.Iterator;
public class ViewAddressbook {
    public static void main(String args[]) {
        try {
            // -- Load a mapping file
            Mapping mapping = new Mapping();
            mapping.loadMapping("mapping.xml");
            Unmarshaller un = new Unmarshaller(Addressbook.class);
            un.setMapping(mapping);
            // -- Read in the Addressbook using the mapping
            FileReader in = new FileReader("addressbook.xml");
            Addressbook book = (Addressbook) un.unmarshal(in);
            in.close();
            // -- Display the addressbook
            System.out.println(book.getName());
            List persons = book.getPersons();
            Iterator iter = persons.iterator();
            while (iter.hasNext()) {
                Person person = (Person) iter.next();
                System.out.println("\n" + person.getName());
                System.out.println("-----------------------------");
                System.out.println("Address = " + person.getAddress());
                System.out.println("SSN = " + person.getSsn());
                System.out.println("Home Phone = " + person.getHomePhone());
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

The End
