
Efficient Management Of Multi-Version XML Documents For E-Government Applications

WEBIST 2005 – International Conference on Web Information Systems and Technologies. Efficient Management Of Multi-Version XML Documents For E-Government Applications. Federica Mandreoli Riccardo Martoglia Università degli Studi di Modena e Reggio Emilia Fabio Grandi Maria Rita Scalas


Presentation Transcript


  1. WEBIST 2005 – International Conference on Web Information Systems and Technologies. Efficient Management Of Multi-Version XML Documents For E-Government Applications. Federica Mandreoli, Riccardo Martoglia (Università degli Studi di Modena e Reggio Emilia); Fabio Grandi, Maria Rita Scalas (Università degli Studi di Bologna)

  2. Overview • Our research activities concern the implementation of Web information systems for e-Government applications • Importance of e-Government: • European Commission e-government • U.S. President’s e-government initiatives • More and more resources and services are being made available by Public Administrations (PAs) • We make use of temporal database and semantic Web techniques to provide improved access to such resources and services WEBIST 2005 Mandreoli Martoglia Grandi Scalas - Efficient Management Of Multi-Version XML Documents For E-Government Applications

  3. Importance of versioning [Figure: timeline with the original normative text followed by two new versions, at times 1, 2, 3] • Temporal concerns are ubiquitous in the law domain • Each normative text changes over time through successive modifications, but keeps its identity • The ability to model temporal dimensions is essential for the management of evolving norms • it is crucial to reconstruct the consolidated version of a norm • past versions also remain important

  4. Importance of versioning • Applicability (semantic) versioning also plays an important role • some norms, or some of their parts, have or acquire a limited applicability • personalized version of the norm • a version containing only the articles applicable to a citizen’s personal case (slide example: Self-employed)

  5. Objectives • Development of an effective and efficient system where: • norms are represented as XML documents and are available on the Web • the dynamics of norms over time is captured • limited applicability of norms (and their parts) is captured • Enable citizens to access personalized versions of multi-version resources • Improve and optimize the involvement of citizens in the e-Governance process

  6. Approach • Definition of a temporal XML model including • a temporal XML schema • temporal manipulation operations • applicability (semantic) extensions • Design, implementation and evaluation of system prototypes supporting the model • First system, based on a “stratum” approach on top of a commercial DBMS • Ongoing research: second system, “native” approach • includes semantic annotations in multiversioning

  7. The temporal XML data model • XML Schema based • Based on the hierarchical organization of normative texts • contents-section-article-paragraph • At each level of the hierarchy, the history of changes is represented by the versions produced • The temporal pertinence is represented by timestamps, i.e. temporal elements encoded as multiple 3-dimensional intervals (TA) • A reference to the modifying (active) norm is added (an_ref) • Supports ancestor-descendant inheritance • Timestamps of a node are inherited by its descendants • Along the hierarchy, redefinitions can only involve a restriction of the temporal pertinence
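
The inheritance rule on this slide (a descendant inherits its ancestor's timestamps, and a redefinition may only restrict them) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it handles a single temporal dimension, and the names Interval, contains and effective are hypothetical, as is the use of None for an open-ended interval.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """One temporal dimension; end=None means open-ended (assumption)."""
    start: date
    end: Optional[date] = None

    def contains(self, other: "Interval") -> bool:
        # `other` must not start earlier nor end later than `self`.
        if other.start < self.start:
            return False
        if self.end is None:
            return True
        return other.end is not None and other.end <= self.end


def effective(parent: Interval, own: Optional[Interval]) -> Interval:
    """Timestamp that applies to a node, given its ancestor's timestamp."""
    if own is None:
        return parent  # nothing redefined: inherit from the ancestor
    if not parent.contains(own):
        raise ValueError("a redefinition may only restrict the inherited pertinence")
    return own


article = Interval(date(2001, 1, 1))
paragraph = Interval(date(2001, 6, 10), date(2002, 1, 1))
print(effective(article, None))       # inherited unchanged
print(effective(article, paragraph))  # accepted restriction
```

A real model would apply the same check to each temporal dimension and to sets of intervals rather than to a single one.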

  8. The temporal XML schema
     4 Temporal Dimensions:
       Publication time: time of publication on the Official Journal
       Validity time: time the norm is in force
       Efficacy time: time the norm can be applied
       Transaction time: time the norm is stored in the system
     [Schema diagram: Law (Num – R, Type – R) with Title and Contents; hierarchy levels Section (Num – R), Article (Num – R) and Paragraph (Num – R), with Heading elements; Ver elements (Num – R, An_ref – O) carrying TA elements; timestamp attributes Publication – R, Vt_Start – R, Vt_End – O, Tt_Start – R, Tt_End – O, Et_Start – R, Et_End – O]

  9. An example document
     <norm num="2624/1999" type="Law">
       <title>Cereals Importation</title>
       <contents publication="2001-01-01" vt_start="2001-01-01" tt_start="2001-01-10" et_start="2001-01-01">
         …
         <article num="1">
           <ver num="1">
             <ta vt_start="2001-01-01" tt_start="2001-01-10" tt_end="2001-06-01" et_start="2001-01-01" />
             <ta vt_start="2001-01-01" et_start="2001-01-01" et_end="2001-06-10" … />
             <ta vt_start="2001-01-01" vt_end="2001-06-10" et_start="2001-06-10" … />
             <paragraph num="1"><ver num="1">…Art. 1 before modification…</ver></paragraph>
             …
           </ver>
           <ver num="2" an_ref="LD135/2000">
             <ta vt_start="2001-06-10" tt_start="2001-06-01" et_start="2001-06-10" />
             <paragraph num="1"><ver num="1">…Art. 1 after modification…</ver></paragraph>
             <paragraph num="2"><ver num="1">…Art. 1 after modification…</ver></paragraph>
           </ver>
         </article>
         …

  10. “Stratum” approach. Based on two components: • XML document management facilities offered by Oracle 9i • document-size granularity • structural and textual constraints • software stratum built on top • temporal aspects • reconstruction. Extensive experimental results on the system behaviour show: • good performance • ability to manage large collections of XML multi-version documents

  11. Querying and modification aspects • Full search and reconstruction functionalities: FOR $a IN path WHERE constraints on $a RETURN const-tree(document($a), temporal specs) • constraints can contain keyword-based text selections • const-tree • operator for the reconstruction of a temporally consistent normative act (involves temporal selections) • temporal specs may require a temporal predicate for each of the supported dimensions • Two basic operators for the management of norm modifications: • change the textual content of a norm portion • deletion, introduction, replacement of (a part of) the norm • modifications to the temporal pertinence of a given version • time extension or suspension of (part of) the norm
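
The kind of temporal selection that const-tree involves can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: it considers validity time alone, each version carries a single interval, intervals are treated as half-open, and the names VERSIONS, valid_at and select_versions are hypothetical. The dates mirror the two versions of Article 1 in the example document.

```python
from datetime import date

# Hypothetical, simplified version records: one validity interval per version.
# vt_end=None stands for an open-ended interval (assumption of this sketch).
VERSIONS = [
    {"num": 1, "vt_start": date(2001, 1, 1), "vt_end": date(2001, 6, 10)},
    {"num": 2, "vt_start": date(2001, 6, 10), "vt_end": None},
]


def valid_at(version, day):
    """True if `day` falls in the half-open interval [vt_start, vt_end)."""
    if day < version["vt_start"]:
        return False
    return version["vt_end"] is None or day < version["vt_end"]


def select_versions(versions, day):
    """Return the numbers of the versions valid on `day` (validity time only)."""
    return [v["num"] for v in versions if valid_at(v, day)]


print(select_versions(VERSIONS, date(2001, 3, 1)))
print(select_versions(VERSIONS, date(2002, 1, 1)))
```

A full reconstruction would apply one such predicate per supported dimension and then assemble the surviving versions into a document tree.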

  12. Example of reconstruction (curr. ver.)
      <norm num="2624/1999" type="Law">
        <title>Cereals Importation</title>
        <contents publication="2001-01-01" vt_start="2001-01-01" tt_start="2001-01-10" et_start="2001-01-01">
          …
          <article num="1">
            <ver num="1">
              <ta vt_start="2001-01-01" tt_start="2001-01-10" tt_end="2001-06-01" et_start="2001-01-01" />
              <ta … />
              <ta … />
              <paragraph num="1"><ver num="1">…Art. 1 before modification…</ver></paragraph>
              …
            </ver>
            <ver num="2" an_ref="LD135/2000">
              <ta vt_start="2001-06-10" tt_start="2001-06-01" et_start="2001-06-10" />
              <paragraph num="1"><ver num="1">…Art. 1 after modification…</ver></paragraph>
              <paragraph num="2"><ver num="1">…Art. 1 after modification…</ver></paragraph>
            </ver>
          </article>
          …
      [Slide annotation: ( NOT INCLUDED )]

  13. “Native” approach. Based on a Temporal XML Query Processor: • provides all the temporal, structural, textual and applicability query facilities in a single component • exploits ad-hoc data structures and algorithms • finer “tuple” granularity • embedded “light” DBMS libraries • structural join algorithms • allows users to store and reconstruct on-the-fly XML norm texts satisfying the four types of constraints

  14. Semantic versioning • Extension of the multi-version model based on temporal dimensions to include a semantic versioning dimension • Aim: provide personalized access to norm texts • Civic ontology: a classification of citizens based on the distinctions introduced by successive norms (founding acts) that imply some limitations in their applicability

  15. Semantic versioning • At this stage of the project, we manage “tree-like” ontologies • class taxonomies induced by the IS-A relationship • we exploit the pre-order and post-order properties of trees • New versioning dimension • Applicability of different parts of a norm text to the relevant classes of the civic ontology
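
The pre-order and post-order properties mentioned on this slide can be illustrated with a minimal Python sketch: each class of a tree-shaped taxonomy receives a (pre, post) pair, and an IS-A test reduces to two integer comparisons. The taxonomy and the function names pre_post and is_a are hypothetical and are not taken from the presentation.

```python
def pre_post(tree, root):
    """Assign (pre-order, post-order) ranks to every node of a tree.

    `tree` maps a node to the list of its children (leaves may be absent).
    """
    codes = {}
    counter = {"pre": 0, "post": 0}

    def visit(node):
        counter["pre"] += 1
        pre = counter["pre"]
        for child in tree.get(node, []):
            visit(child)
        counter["post"] += 1
        codes[node] = (pre, counter["post"])

    visit(root)
    return codes


def is_a(codes, cls, ancestor):
    """True if `cls` is `ancestor` itself or one of its descendants."""
    pre_c, post_c = codes[cls]
    pre_a, post_a = codes[ancestor]
    return pre_a <= pre_c and post_c <= post_a


# Hypothetical civic taxonomy, for illustration only.
TAXONOMY = {
    "Citizen": ["Employee", "Self-employed"],
    "Employee": ["Public", "Private"],
}
CODES = pre_post(TAXONOMY, "Citizen")
print(CODES)
print(is_a(CODES, "Public", "Employee"))
print(is_a(CODES, "Self-employed", "Employee"))
```

With such an encoding, checking whether a norm part annotated for a class applies to a citizen of a more specific class needs no tree traversal at query time.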

  16. Semantic versioning • Applicability is inherited by descendant nodes unless locally redefined • By means of redefinitions we can also introduce, for each part of a document, complex applicability properties • Extensions with respect to ancestors • Restrictions with respect to ancestors

  17. Example of full search • John Smith is a self-employed citizen. • He is interested in the text of all the norms ... (structural constraint) • ... which contain paragraphs dealing with health care, ... (textual constraint) • ... which were valid and in effect between 2002 and 2004, ... (temporal constraint) • ... and which are applicable to his class. (applicability constraint) • 4 orthogonal constraints

  18. Example of full search
      FOR $a IN norm
      WHERE textConstr ($a//paragraph//text(), ’health AND care’)
        AND tempConstr (’vTime OVERLAPS PERIOD(’2002-01-01’,’2004-12-31’)’)
        AND tempConstr (’eTime OVERLAPS PERIOD(’2002-01-01’,’2004-12-31’)’)
        AND applConstr (’class 7’)
      RETURN $a
      [Slide labels: Structural constraint, Textual constraint, Temporal constraint, Applicability constraint; 4 orthogonal constraints]

  19. Finer storing granularity. Each document is split into ad-hoc structures (tuples), providing a finer granularity and optimized time and space requirements. Tuple ( id, < structural attributes >, < temporal attributes >, < text >, < applicability attributes > ). Each constraint is verified at query time on the respective attributes.
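
How each constraint could be checked on its own group of tuple attributes can be illustrated with a minimal Python sketch. It is not the system's storage format: the NormTuple fields are a simplified, hypothetical subset (validity time only), keyword matching is a plain substring test, and the set of classes passed for the citizen, which is assumed to contain the citizen's class together with its ancestors in the civic ontology, is invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class NormTuple:
    id: int
    start_pos: int            # structural attributes
    level: int
    vt_start: date            # temporal attributes (validity time only)
    vt_end: Optional[date]    # None = open-ended (assumption)
    text: str
    aa: FrozenSet[int]        # applicability annotation: class identifiers


def overlaps(t: NormTuple, lo: date, hi: date) -> bool:
    """Temporal constraint: validity interval overlaps the period [lo, hi]."""
    return t.vt_start <= hi and (t.vt_end is None or t.vt_end >= lo)


def matches(t: NormTuple, keywords, lo, hi, citizen_classes) -> bool:
    """Check textual, temporal and applicability constraints on one tuple."""
    textual = all(k.lower() in t.text.lower() for k in keywords)
    temporal = overlaps(t, lo, hi)
    applicable = bool(t.aa & citizen_classes)
    return textual and temporal and applicable


t = NormTuple(1, 4, 4, date(1980, 1, 1), None, "Health care …", frozenset({3}))
period = (date(2002, 1, 1), date(2004, 12, 31))
# Hypothetical: class 7 together with its assumed ancestors 3 and 1.
print(matches(t, ["health", "care"], *period, frozenset({1, 3, 7})))
```

Because every check reads only one attribute group, a query can discard a tuple as soon as any single constraint fails.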

  20. Finer storing granularity
      Example tuple:
        startPos  level  vtStart     vtEnd  etStart     etEnd  ttStart     ttEnd  pt          AA  text
        4         4      01/01/1980  F      01/01/1980  F      20/12/1979  UC     15/12/1979  3   Health care …

  21. Example of full search [Figure: the query (… norm//paragraph//text() …, ‘class 7’) evaluated against the Civic ontology and the Normative DB. The Norm contains Article 1 and Article 2 with paragraphs Par 1 and Par 2; node labels include TA, Ver 1, Ver 2 and the applicability annotations AA=3, AA=4 and AA=3,8; three candidate texts X, Y and Z contain “Health care…”]

  22. “Native” approach benefits • The native approach is able to access and retrieve only the strictly necessary data • ad-hoc and temporally-enhanced structures are queried • finer granularity than the entire documents managed by standard XML engines • Only the parts which are required and which satisfy the temporal constraints are used for the reconstruction of the retrieved documents • There is no need to retrieve whole XML documents and build space-consuming structures such as DOM trees, as required in the stratum approach • Result: enhanced query processing efficiency and reduced memory requirements

  23. Evaluation benchmark • Three XML document sets: 5000 documents (120MB), 10000 documents (240MB), 20000 documents (480MB) • Variable document size: min = 2KB, avg = 24KB, max = 125KB • Five different query types • Queries on keywords (structural + textual constraints) • Q1 – keywords in contents • Q2 – keywords in type and contents • Temporal queries (structural + temporal constraints) • Q3 – conditions on publication, validity and transaction time • Mixed queries (structural + textual + temporal constraints) • Q4, Q5 – with keywords and temporal conditions • Five variants with personalized access • Qx-A – with additional applicability constraints

  24. Performance evaluation • The selectivity of the query predicates strongly influences the performance of the stratum approach • Q2, Q3: large numbers of documents containing some (typically small) relevant portions have to be retrieved • The native approach shows faster and more reliable performance in all cases • Retrieval of useless document parts is avoided

  25. Performance evaluation • Very high personalization query efficiency • The system is able to solve personalization problems by means of simple comparisons involving pre-post encodings • 0.5-1% more time than for the original versions • 3-4% storage space overhead

  26. Performance evaluation • Scalability tests • The computing time grows sublinearly with the number of documents • Good scalability of the system in every type of query context [Figure: computing time versus number of documents; values shown: 1741 msec, 1366 msec, 1046 msec; 5000 docs, 10000 docs, 20000 docs]
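
The sublinear-growth claim can be checked with a short Python calculation. The transcript does not preserve which timing belongs to which collection, so the pairing below (1046 msec for 5000 documents, 1366 msec for 10000, 1741 msec for 20000) is an assumption, chosen because it is the increasing order consistent with the claim; the variable names are hypothetical.

```python
# Assumed pairing of collection sizes and timings (not stated in the transcript).
sizes = [5000, 10000, 20000]
times_ms = [1046, 1366, 1741]

# Each step doubles the collection; linear growth would double the time too.
size_ratios = [b / a for a, b in zip(sizes, sizes[1:])]
time_ratios = [b / a for a, b in zip(times_ms, times_ms[1:])]

for s, t in zip(size_ratios, time_ratios):
    print(f"documents x{s:.1f} -> time x{t:.2f}")
```

Under that pairing, each doubling of the collection increases the time by roughly 30%, well below the factor of 2 that linear growth would give.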

  27. Conclusions • We presented our research work concerning the design and implementation of efficient Web-based information systems for e-Government applications • We developed a platform (“stratum” approach) for temporal management of multi-version norm texts on top of a commercial DBMS • We migrated such a system towards a more efficient platform (“native” approach) for which a specialized Temporal XML Query Processor has been designed • We implemented advanced functionalities • personalized access to resources on the basis of the digital identity of citizens • We proved our approach to be very efficient in a large set of experimental situations and showed excellent scale-up figures with varying load configurations

  28. Future Work • Extensions of the current framework • more advanced application requirements may include a more sophisticated ontology definition • Development of a complete technological infrastructure usable in a large Web-based e-Government scenario • identification, classification and reconstruction services • Assessment of our developed systems in a concrete working environment • real users • large repository of real legal documents
