
A Logic Programming Approach to Supporting the Entries of XML Documents in an Object Database



  1. A Logic Programming Approach to Supporting the Entries of XML Documents in an Object Database Ching-Long Yeh 葉 慶 隆 Department of Computer Science and Engineering Tatung University Taipei 104, Taiwan chingyeh@cse.ttu.edu.tw XML Object Database

  2. Introduction • XML improves upon HTML in • capturing the meaning of a document and • extending the tag set. • At the same time, it also reduces the complexity of SGML. • It is believed that XML will soon be the standard for data exchange on the Web.

  3. Introduction • Because plain files carry no indices, we cannot make full use of the meaning (or metadata) in an XML document that is stored in a file. • Since an XML document maps naturally onto the object-oriented model, a promising solution is to use object database technology to manage access to XML documents.

  4. Introduction • In this talk, I will present our research on • the design and implementation of an XML object database, and • an extensible template-based query interface for accessing the XML object database.

  5. The Remainder of the Talk • An Introduction to XML • Design and Implementation of an XML Object Database • An Extensible Template-based Interface

  6. An Introduction to XML

  7. HyperText Markup Language • HTML is a language used to create hypertext for the WWW. • The text is presented according to a set of predefined tags. • The definition of the tags is based on the Document Type Definition (DTD) of SGML. • In other words, HTML is an application of SGML in the WWW.

  8. Standard Generalized Markup Language • Central to SGML is the concept that documents have structure, content, and format. • These three ingredients combine to form a document.

  9. What is Content? • Content is the actual data within a document. • The words and illustrations that make up a bicycle assembly manual are its contents.

  10. What is Format? • Format is how the words, sentences, and paragraphs are visually presented and distinguished from one another within a document. • Boldface for titles, italics for special terms, and blank lines between sections are examples of document formats. • People often confuse format with structure.

  11. What is Structure? • A recipe annotated with its structural parts:
      Recipe
        Title: Coconut Pudding
        Ingredient List
          Ingredient: 12 ounces coconut milk
          Ingredient: 4 to 6 tablespoons sugar
          Ingredient: 4 to 6 tablespoons cornstarch
          Ingredient: 3/4 cup water
        Instruction List
          Step: Pour coconut milk into saucepan.
          Step: Combine sugar and cornstarch; stir in water and blend well.
          Step: Stir sugar mixture into coconut milk; cook and stir over low heat until thickened.

  12. Document Type Definition • Defining the structures in XML/SGML • The structure of a document, that is, its type, is defined by a document type definition, or DTD. • The DTD lays out the rules for a document through the use of elements, attributes, and entities.

  13. Document Type Definition • A DTD looks like:
      <!ELEMENT recipe -- (title, ingredientList, instructionList)>
      <!ELEMENT title -- (#PCDATA)>
      <!ELEMENT ingredientList -- (ingredient*)>
      <!ELEMENT instructionList -- (step*)>
      <!ELEMENT ingredient -- (#PCDATA)>
      <!ELEMENT step -- (#PCDATA)>

  14. Document Instance
      <!DOCTYPE RECIPE PUBLIC "recipe" "recipe">
      <RECIPE>
        <TITLE>Coconut Pudding</TITLE>
        <INGREDIENTLIST>
          <INGREDIENT>12 ounces coconut milk</INGREDIENT>
          <INGREDIENT>4 to 6 tablespoons sugar</INGREDIENT>
          <INGREDIENT>4 to 6 tablespoons cornstarch</INGREDIENT>
          <INGREDIENT>3/4 cup water</INGREDIENT>
        </INGREDIENTLIST>
        <INSTRUCTIONLIST>
          <STEP>Pour coconut milk into saucepan.</STEP>
          <STEP>Combine sugar and cornstarch; stir in water and blend well.</STEP>
          <STEP>Stir sugar mixture into coconut milk; cook and stir over low heat until thickened.</STEP>
          …
        </INSTRUCTIONLIST>
      </RECIPE>
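To make the idea of structure concrete, the following is a minimal Python sketch that parses a shortened, well-formed version of the recipe instance and lists its element hierarchy. It uses the standard `xml.etree.ElementTree` module; the `outline` helper and the abbreviated recipe text are assumptions made for illustration and are not part of the talk.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe instance (DOCTYPE line omitted for brevity).
RECIPE = """<RECIPE>
  <TITLE>Coconut Pudding</TITLE>
  <INGREDIENTLIST>
    <INGREDIENT>12 ounces coconut milk</INGREDIENT>
    <INGREDIENT>4 to 6 tablespoons sugar</INGREDIENT>
  </INGREDIENTLIST>
  <INSTRUCTIONLIST>
    <STEP>Pour coconut milk into saucepan.</STEP>
  </INSTRUCTIONLIST>
</RECIPE>"""


def outline(element, depth=0):
    """Return one indented line per element, showing the document structure."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(RECIPE)
structure = outline(root)
print("\n".join(structure))
```

The printed outline contains only tag names, which is the structure of the document; the text inside the tags is its content.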

  15. HTML, SGML, XML • HTML helped establish the Internet by providing a universal way to present information. • However, HTML only addresses the presentation of data. • Using SGML, users can add structure along with the content of a document. • However, SGML has proven too heavyweight for the Internet.

  16. Extensible Markup Language • XML is a simple dialect of SGML. • HTML is sufficient for sending web pages that are viewed by human beings. • XML, however, adds the tags that enable computers to understand, act on, or process the information. • XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

  17. XML Application Profile • Electronic commerce • Electronic data interchange (EDI) • Fine-grain content publishing • Internet search engines • Distributed application design • etc.

  18. Data Type Requirements of Documents • HTML • One file per page • Simple uni-directional linking • XML • Tens, hundreds or even thousands of objects per page • Multiple DTDs • Hierarchical structure and rich linking • Query and navigation capabilities required • Agents and business rules interact with the data

  19. Data Types of Storage • File system • Stores monolithic items. • A folder system is layered on top of the files. • Good at storing multimedia data

  20. Data Types of Storage • Relational database • Tabular in nature • Good at storing rows and columns of data like spreadsheets and data from forms like invoices.

  21. Data Types of Storage • Object-oriented database • Good at managing structured, hierarchical, richly linked information. • That’s exactly what XML is. • XML is the object representation of data.

  22. Design and Implementation of an XML Object Database

  23. Basic Idea • The arrangement of elements in an XML document is governed by the element and attribute list declarations in the document type definition (DTD). • Creating a DTD is, in a sense, closely related to defining new data types and hierarchical relationships in an object database. • Thus, to enter an XML document into an object database, a new schema corresponding to the DTD is first generated in the object database, and then the document conforming to that DTD is fragmented into objects and entered into the database.

  24. Basic Idea • Both tasks, creating a schema in the object database for a DTD and fragmenting XML documents into objects, can be divided into two parts: analysis and generation. • For the former task, an input DTD is analyzed according to the formation rules specified in the XML recommendation, and schema definitions are produced for the structures found in the analysis of the DTD. • The other task is to analyze XML document instances and produce object definitions for the elements found in them.
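The second task, fragmenting a document instance into objects, can be sketched in Python as an analysis step (parsing) followed by a generation step (emitting one record per element). This is only an illustration under stated assumptions: the record fields `oid`, `class`, `parent` and `text` are hypothetical, the class-naming convention imitates the capitalised names shown later in the talk, and the talk's own implementation is written in Prolog rather than Python.

```python
import itertools
import xml.etree.ElementTree as ET


def class_name(tag):
    """Capitalise the first letter only, e.g. 'ingredientList' -> 'IngredientList'."""
    return tag[:1].upper() + tag[1:]


def fragment(xml_text):
    """Analyse an XML document and generate one flat object record per element."""
    ids = itertools.count(1)
    objects = []

    def visit(element, parent_id):
        oid = next(ids)
        objects.append({
            "oid": oid,                        # hypothetical object identifier
            "class": class_name(element.tag),  # class assumed to mirror the element type
            "parent": parent_id,               # link that preserves the hierarchy
            "text": (element.text or "").strip(),
        })
        for child in element:
            visit(child, oid)

    # Analysis: parse the text. Generation: walk the tree and emit records.
    visit(ET.fromstring(xml_text), None)
    return objects


objects = fragment(
    "<recipe><title>Coconut Pudding</title>"
    "<ingredientList><ingredient>3/4 cup water</ingredient></ingredientList>"
    "</recipe>"
)
for record in objects:
    print(record)
```

Each record could then be turned into an object-creation statement for the target database; that last step depends on the database's own language and is left out here.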

  25. Basic Idea • We use Prolog's definite clause grammar (DCG) notation as the tool for implementing the analysis and generation tasks. • The basic idea is to encode the analysis task in the context-free rule part and the generation task in the action part of the DCG rules.

  26. Structured Document Database • Combining structured documents with OODB technology: • VERSO project at INRIA • News-On-Demand Application • Document Database from GMD-IPSI • XML document database products: • The Poet XML Repository • eXcelon, ODI • Ardent Software, Inc.

  27. System Architecture

  28. DTD Parser • Each production of the XML grammar is paired with a DCG rule:
      elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
      elementdecl(contentModel(N,C)) --> elementPrefix, name(N), contentSpec(C), rightAngle.
      contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
      contentSpec(C) --> empty, {C='EMPTY'}; any, {C='ANY'}; mixed(C); children(C).

  29. Parsing Result • Input DTD:
      <!ELEMENT top (p,spec,div1)>
      <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
      <!ELEMENT spec (front,body, back?)*>
      <!ELEMENT div1 (head,(p|list1 |note)*, div2*)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT a (#PCDATA)>
      <!ELEMENT ul (#PCDATA)>
      <!ELEMENT b (#PCDATA)>
      <!ELEMENT i (#PCDATA)>
      <!ELEMENT em (#PCDATA)>
      <!ELEMENT front (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
      <!ELEMENT back (#PCDATA)>
      <!ELEMENT head (#PCDATA)>
      <!ELEMENT list1 (#PCDATA)>
      <!ELEMENT note (#PCDATA)>
      <!ELEMENT div2 (#PCDATA)>
      • Parsed content models:
      [contentModel(top,seq([p/null,spec/null,div1/null])/null),
       contentModel(p,mixed([pcdata,a,ul,b,i,em])),
       contentModel(spec,seq([front/null,body/null,back/question])/star),
       contentModel(div1,seq([head/null,alt([p/null,list1/null,note/null])/star,div2/star])/null),
       contentModel(name,pcdata), contentModel(a,pcdata), contentModel(ul,pcdata),
       contentModel(b,pcdata), contentModel(i,pcdata), contentModel(em,pcdata),
       contentModel(front,pcdata), contentModel(body,pcdata), contentModel(back,pcdata),
       contentModel(head,pcdata), contentModel(list1,pcdata), contentModel(note,pcdata),
       contentModel(div2,pcdata)]
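The mapping from element declarations to `contentModel` terms can be approximated by a small recursive-descent parser. The Python sketch below is not the DCG used in the talk; it is an assumed equivalent that represents `X/Occurrence` as the tuple `(X, occurrence)` and `seq([...])` or `alt([...])` as `("seq", [...])` or `("alt", [...])`. The function names are hypothetical, and malformed declarations are not handled.

```python
import re

# Occurrence indicators, named as in the parsing result on the slide.
OCCURRENCE = {"?": "question", "*": "star", "+": "plus"}


def tokenize(spec):
    """Split a content specification into names and punctuation tokens."""
    return re.findall(r"#PCDATA|[A-Za-z_][\w.-]*|[()|,?*+]", spec)


def parse_occurrence(tokens):
    """Consume an optional ?, * or + and return its name ('null' if absent)."""
    if tokens and tokens[0] in OCCURRENCE:
        return OCCURRENCE[tokens.pop(0)]
    return "null"


def parse_particle(tokens):
    """Parse a name or a parenthesised group, plus its occurrence indicator."""
    if tokens[0] == "(":
        tokens.pop(0)
        items = [parse_particle(tokens)]
        separator = None
        while tokens[0] != ")":
            separator = tokens.pop(0)  # ',' for a sequence, '|' for alternatives
            items.append(parse_particle(tokens))
        tokens.pop(0)  # closing parenthesis
        body = ("alt" if separator == "|" else "seq", items)
    else:
        body = tokens.pop(0)
    return (body, parse_occurrence(tokens))


def parse_element_decl(decl):
    """Turn one <!ELEMENT ...> declaration into (name, content model)."""
    match = re.fullmatch(r"<!ELEMENT\s+(\S+)\s+(.*?)\s*>", decl.strip())
    name, spec = match.group(1), match.group(2)
    if spec in ("EMPTY", "ANY"):
        return (name, spec)
    tokens = tokenize(spec)
    if "#PCDATA" in tokens:
        # Mixed content: keep only the element names that may appear with text.
        names = [t for t in tokens if t not in {"(", ")", "|", "*", "#PCDATA"}]
        return (name, ("mixed", ["pcdata"] + names) if names else "pcdata")
    return (name, parse_particle(tokens))


div1 = parse_element_decl("<!ELEMENT div1 (head,(p|list1 |note)*, div2*)>")
print(div1)
```

The analysis (token matching) and the generation (building the tuple) are interleaved in each function, which is the same division of labour that the DCG rules express with their grammar part and action part.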

  30. Schema Generation
      defineClass 'Top' super: SingleSeq { instance: 'P' 'p'; 'Spec' 'spec'; 'Div1' 'div1';};
      defineClass 'P' super: Mixed { instance: List<Mixedp> mixedp;};
      defineClass Mixedp super: SingleAlt { instance: String pcdata; 'A' 'a'; 'Ul' 'ul'; 'B' 'b'; 'I' 'i'; 'Em' 'em';};
      defineClass 'Spec' super: MultiSeq { instance: List<Seqspec> seqspec;};
      defineClass 'Seqspec' super: SingleSeq { instance: 'Front' 'front'; 'Body' 'body'; 'Back' 'back';};
      defineClass 'Div1' super: SingleSeq { instance: 'Head' 'head'; List<Alt1> 'alt1'; List<Div2> 'div2';};
      defineClass 'Alt1' super: SingleAlt { instance: 'P' 'p'; 'List1' 'list1'; 'Note' 'note';};
      defineClass 'Name' super: Unstructured { instance: String pcdata;};
      ...
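As a rough illustration of the generation step, the Python sketch below emits class definitions for the two simplest cases: an element whose content is a plain sequence of named children, and an element that contains only character data. The output merely imitates the text shown on the slide; it has not been checked against a real ODQL processor, and the mapping rules (optional children treated like required ones, repeated children wrapped in `List<...>`) are assumptions inferred from the slide. Mixed content, alternatives and repeated groups are deliberately left out.

```python
def class_name(tag):
    """Capitalise the first letter, as in 'top' -> 'Top'."""
    return tag[:1].upper() + tag[1:]


def generate_class(name, model):
    """Return a class definition string for a pcdata or plain-sequence model."""
    if model == "pcdata":
        return (f"defineClass '{class_name(name)}' super: Unstructured "
                "{ instance: String pcdata;};")
    (kind, items), occurrence = model
    if kind != "seq" or occurrence != "null":
        raise NotImplementedError("only non-repeating sequences are sketched here")
    fields = []
    for child, child_occurrence in items:
        if not isinstance(child, str):
            raise NotImplementedError("nested groups are not sketched here")
        if child_occurrence in ("star", "plus"):
            # Repeated child: assumed to map onto a list-valued attribute.
            fields.append(f"List<{class_name(child)}> '{child}';")
        else:
            fields.append(f"'{class_name(child)}' '{child}';")
    return (f"defineClass '{class_name(name)}' super: SingleSeq "
            f"{{ instance: {' '.join(fields)}}};")


top_model = (("seq", [("p", "null"), ("spec", "null"), ("div1", "null")]), "null")
top_class = generate_class("top", top_model)
name_class = generate_class("name", "pcdata")
print(top_class)
print(name_class)
```

For the `top` and `name` content models, the generated strings coincide with the first and last definitions listed on the slide.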

  31. DI Parser
      top(V) --> stg(top), p(P), spec(Spec), div1(Div1), etg(top).
      p(V) --> stg(p), mixedp(Mixedp), etg(p).
      mixedp(V) --> (pcdata(Pcdata); a(A); ul(Ul); b(B); i(I); em(Em); {false}), mixedp(_); [].
      spec(V) --> stg(spec), spec1(Spec), etg(spec).
      spec1(V) --> front(Front), body(Body), (back(Back); []), spec1(_); [].
      div1(V) --> stg(div1), head(Head), alt1(Alt1), div21(Div21), etg(div1).
      alt1(V) --> (p(P); list1(List1); note(Note); {false}), alt1(_); [].
      div21(V) --> div2(Div2), div21(_); [].
      name(V) --> stg(name), pcdata(Pcdata), etg(name).

  32. DI Parser Generation • Each generated rule has the form:
      Rule_Head --> Start_Tag, Rule_Body, End_Tag, {Semantic Actions}.
      • Generation procedure:
      for each contentModel(ElementName, ContentStructure) do
        generate the rule head for ElementName;
        generate the start tag for ElementName;
        generate the rule body for ContentStructure;
        generate the end tag for ElementName;
        generate the semantic action;
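The generation procedure above can be illustrated for the simplest case, an element whose content is a plain sequence of child elements. The Python sketch below builds the text of one DCG rule in the five steps listed on the slide. The semantic action it emits, `{V = name(Args)}`, is a hypothetical placeholder: the transcript does not show the actions of the real generator, and repetition or alternatives in the content structure are not covered.

```python
def variable(tag):
    """Prolog variable for a child element, e.g. 'spec' -> 'Spec'."""
    return tag[:1].upper() + tag[1:]


def generate_rule(name, children):
    """Build the text of a DCG rule for an element with a plain child sequence."""
    head = f"{name}(V)"                                           # rule head
    start_tag = f"stg({name})"                                    # start tag
    body = [f"{child}({variable(child)})" for child in children]  # rule body
    end_tag = f"etg({name})"                                      # end tag
    # Hypothetical semantic action: collect the children's results in one term.
    args = ", ".join(variable(child) for child in children)
    action = f"{{V = {name}({args})}}"
    return f"{head} --> " + ", ".join([start_tag, *body, end_tag, action]) + "."


rule = generate_rule("top", ["p", "spec", "div1"])
print(rule)
```

Apart from the added action, the generated text matches the `top` rule of the DI parser shown earlier in the talk.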

  33. Implementation • We have built a prototype of the system using LPA Win-Prolog V3.5 on a personal computer. • It consists of a DTD parser, a schema generator, and a DI parser generator. • After creating the physical store and class family for XML documents, we can build the database schema for a DTD by executing the ODQL code produced by the schema generator.

  36. An Extensible Query-By-Template Interface for Accessing an XML Document Database

  37. Motivation • Vastness of search results on current WWW search engines • A text-based query language, even with a simple English-like syntax, is inconvenient for the user. • Current user interfaces primarily use form-based queries.

  38. Goal • The goal is to design a convenient interface that lets users access XML documents without prior knowledge of the document types. • The interface will relieve users from typing queries in a complex query language. • The interface should be web-based and platform-independent.

  39. System Architecture Visual Query Interface

  40. Visual Query Facility • Query By Example (QBE) • The interface is composed of tabular skeletons representing tables in the database. • Query By Forms (QBF) • The interface presents a list of searchable fields, each with an entry area that can be used to indicate the search string. • Query By Template (QBT) • The interface displays a template for a representative entry of the database. Users express their queries by entering search keywords in the appropriate regions of the template.

  41. Example of Image-based QBT

  42. Limits of Image-based QBT • The image template is divided into regions, each of which corresponds to an element in the document structure. • A query action is associated with each region. • Its significant drawback is the lack of flexibility in template creation. • It is difficult to automate the reconfiguration of the query actions associated with a new template. • A single interface template for all types of document is probably not a good idea.

  43. Concept of eXtensible QBT (XQBT) • The environment provides a template creator, which consists of a DTD schema browser and a scene for presentation design. • The environment aims at automatically configuring the query actions associated with the presentation of a template. • The design of the template presentation must be tightly coupled with the arrangement of the document data stored in the repository. • Each component of the presentation design must be properly associated with the corresponding node in the object database schema.

  44. Environment for XQBT

  45. Template Creator • The template creator consists of a DTD schema browser, a scene for the template draft, and a functional area. • The template creator relies mainly on the DTD schema browser, which corresponds to the database schema. • The scene is a visual display area where the designer can organize a template draft for a particular purpose. • The content of the template draft is exported to a file, which contains the template presentation and additional information.

  46. Template Creator: Functional Area

  47. Exported File • The file contains the presentation properties of the template that are associated with each element. • Each element is annotated with its path in the database schema, so that the template executor can use this information to carry out query actions.

  48. Template Executor • The template executor loads the exported file and presents the template as it was originally designed in the template creator. • The path of each node in the DTD schema browser is used to carry out the query action required by the user.
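How a stored path can drive a query action is sketched below in Python. Everything concrete here is an assumption made for illustration: the exported template is modelled as a list of regions with a `label` and a schema `path`, the database is replaced by two small in-memory documents, and matching is a case-insensitive substring test. The actual system is described as querying a Jasmine object database through Java, which this sketch does not attempt to reproduce.

```python
import xml.etree.ElementTree as ET

# Hypothetical exported template: each region keeps the schema path of its element.
TEMPLATE = [
    {"label": "Title", "path": "title"},
    {"label": "Ingredient", "path": "ingredientList/ingredient"},
]

# Stand-in for the object database: two documents held in memory.
DOCUMENTS = [
    ET.fromstring("<recipe><title>Coconut Pudding</title><ingredientList>"
                  "<ingredient>12 ounces coconut milk</ingredient>"
                  "</ingredientList></recipe>"),
    ET.fromstring("<recipe><title>Lemonade</title><ingredientList>"
                  "<ingredient>3/4 cup water</ingredient>"
                  "</ingredientList></recipe>"),
]


def run_query(template, filled_in, documents):
    """Return the documents in which every filled-in region matches at its path."""
    paths = {region["label"]: region["path"] for region in template}

    def region_matches(document, label, keyword):
        # The path stored with the region locates the elements to search.
        nodes = document.findall(paths[label])
        return any(keyword.lower() in (node.text or "").lower() for node in nodes)

    return [
        document for document in documents
        if all(region_matches(document, label, keyword)
               for label, keyword in filled_in.items())
    ]


hits = run_query(TEMPLATE, {"Ingredient": "coconut"}, DOCUMENTS)
titles = [document.findtext("title") for document in hits]
print(titles)
```

Because the query is derived entirely from the paths stored with the template, changing the template layout does not require any hand-written query code, which is the point the XQBT design makes against image-based templates.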

  49. Comparison between Image-based QBT and XQBT • Image-based QBT • The template is an image obtained by taking a photograph or by scanning existing pages. • The query action associated with each region is hand-coded. • A planar or nested template is limited to region levels that are not very deep. • XQBT • The template is generated for a representative document. • The associated query actions can be generated automatically for the interface program. • The designer can change the template to meet the requirements of various region levels.

  50. Implementation • Java Proxies (Jp) for Jasmine • Jp allows developers to build their applications in J-API and take advantage of the Jasmine class libraries.
