
Efficient Processing of Updates in Dynamic XML Data



Presentation Transcript


  1. Efficient Processing of Updates in Dynamic XML Data
     Changqing Li, Tok Wang Ling, Min Hu

  2. Outline
     • Background and related work
     • Our proposals
       • Lexicographical order
       • A compact dynamic binary string encoding (CDBS)
       • Applying CDBS to different labeling schemes for update processing
     • Experimental evaluation
     • Conclusion
     Efficient Processing of Updates in XML

  3. Background and related work: Labeling schemes
     • Three main categories of labeling schemes are used to process XML queries
       • (1) Containment labeling scheme [Zhang et al SIGMOD01 etc.]
       • (2) Prefix labeling scheme [Tatarinov et al SIGMOD02 etc.]
       • (3) Prime number labeling scheme [Wu et al ICDE04]
     • In this talk, we focus on labeling schemes that process updates efficiently

  4. (1) Containment scheme
     [Figure: XML tree with node labels (start,end,level). Level 1: 1,18,1. Level 2: 2,3,2; 4,9,2; 10,11,2; 12,17,2. Level 3: 5,6,3; 7,8,3; 13,14,3; 15,16,3]
     • Each node is assigned three values: “start”, “end”, and “level”
     • The relationships between nodes (ancestor-descendant etc.) are determined from “start”, “end”, and “level”
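The slide does not spell out the relationship tests, so the following is a minimal sketch of the usual interval-containment checks, using labels from the figure. The names `Label`, `is_ancestor`, `is_parent`, and `precedes` are illustrative assumptions, not taken from the talk.

```python
from typing import NamedTuple


class Label(NamedTuple):
    start: int
    end: int
    level: int


def is_ancestor(a: Label, d: Label) -> bool:
    # a is an ancestor of d when a's interval strictly contains d's interval.
    return a.start < d.start and d.end < a.end


def is_parent(a: Label, d: Label) -> bool:
    # A parent is an ancestor exactly one level above.
    return is_ancestor(a, d) and d.level == a.level + 1


def precedes(a: Label, b: Label) -> bool:
    # Document order follows the "start" value.
    return a.start < b.start


root = Label(1, 18, 1)
inner = Label(4, 9, 2)
leaf = Label(5, 6, 3)
other = Label(12, 17, 2)
```

With these labels, `root` is an ancestor but not the parent of `leaf`, `inner` is its parent, and `other` is unrelated to it.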

  5. Containment handles updates poorly
     [Figure: the same tree before an insertion. Level 1: 1,18,1. Level 2: 2,3,2; 4,9,2; 10,11,2; 12,17,2. Level 3: 5,6,3; 7,8,3; 13,14,3; 15,16,3]
     • An insertion requires re-labeling all the ancestor nodes and all the nodes after the inserted node in document order

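  6. Containment handles updates poorly (cont.)
     [Figure: the tree after inserting one node at level 2. Level 1: 1,20,1. Level 2: 2,3,2; 4,9,2; 10,11,2; 12,13,2; 14,19,2. Level 3: 5,6,3; 7,8,3; 15,16,3; 17,18,3]
     • All the ancestor nodes and all the nodes after the inserted node in document order are re-labeled, e.g. the root changes from 1,18,1 to 1,20,1 and 12,17,2 changes to 14,19,2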

  7. Existing approaches to process the updates in containment scheme
     • Increase the interval size and leave some values unused for future insertions [Li et al VLDB01]
       • When the unused values are used up, nodes have to be re-labeled
     • Use floating-point values [Amagasa et al ICDE03]
       • A floating-point value is represented in a computer with a fixed number of bits
       • Because of the limited floating-point precision, nodes eventually have to be re-labeled
     • Neither approach can avoid re-labeling
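To make the precision limit concrete, here is a small illustration (not the scheme of [Amagasa et al ICDE03] itself). It assumes new values are always placed at the midpoint of two neighbors and counts how many insertions fit before the midpoint is no longer distinct. The function name `count_midpoint_insertions` is hypothetical.

```python
def count_midpoint_insertions(lo: float, hi: float) -> int:
    # Repeatedly insert a new value halfway between lo and the most recently
    # inserted value, until the midpoint can no longer be told apart from a
    # neighbor.
    count = 0
    while True:
        mid = (lo + hi) / 2
        if not (lo < mid < hi):
            return count
        hi = mid
        count += 1


insertions = count_midpoint_insertions(1.0, 2.0)
```

On 64-bit floating-point numbers the loop stops after a few dozen insertions at the same position, at which point the neighbors would have to be re-labeled.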

  8. (2) Prefix scheme
     • Three main prefix schemes
       • DeweyID [Tatarinov et al SIGMOD02]
       • BinaryString [Cohen et al PODS02]
       • OrdPath [O'Neil et al SIGMOD04]

  9. DeweyID (cont.)
     [Figure: tree with DeweyID labels 1, 2, 3, 4 and, one level below, 2.1, 2.2, 4.1, 4.2]
     • The relationships between nodes are determined from the prefix property
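As a sketch of the prefix property, the checks below split a label on "." and compare components, using labels from the figure. The helper names `parse`, `is_ancestor`, and `is_parent` are assumptions made for illustration.

```python
from typing import List


def parse(label: str) -> List[int]:
    # "4.1" -> [4, 1]
    return [int(part) for part in label.split(".")]


def is_ancestor(a: str, d: str) -> bool:
    # a is an ancestor of d when a's components are a proper prefix of d's.
    pa, pd = parse(a), parse(d)
    return len(pa) < len(pd) and pd[:len(pa)] == pa


def is_parent(a: str, d: str) -> bool:
    return is_ancestor(a, d) and len(parse(d)) == len(parse(a)) + 1


# Document order is the component-wise order of the parsed labels.
in_document_order = sorted(["4.1", "2.2", "4", "2", "2.1"], key=parse)
```

Comparing parsed components rather than raw strings avoids treating a label such as "12" as a descendant of "1".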

  10. DeweyID handles order-sensitive updates poorly
     [Figure: the tree before an insertion, with labels 1, 2, 3, 4 and 2.1, 2.2, 4.1, 4.2]
     • Order-sensitive updates: the document order must be maintained when updates are performed
     • An insertion requires re-labeling all the sibling nodes after the inserted node and all the descendants of these siblings

  11. DeweyID handles order-sensitive updates poorly (cont.)
     [Figure: the tree after an insertion, with labels 1, 2, 3, 4, 5 and 2.1, 2.2, 5.1, 5.2]
     • The sibling nodes after the inserted node and all the descendants of these siblings are re-labeled, e.g. 4.1 and 4.2 become 5.1 and 5.2

  12. Existing approaches to process the updates in prefix scheme: OrdPath
     [Figure: tree with OrdPath labels 1, 3, 5, 7 and 3.1, 3.3, 7.1, 7.3]
     • OrdPath [O'Neil et al SIGMOD04]
       • Similar to DeweyID
       • But the initial labels use odd numbers only

  13. Existing approaches to process the updates in prefix scheme: OrdPath (cont.)
     [Figure: the OrdPath tree with labels 1, 3, 5, 7 and 3.1, 3.3, 7.1, 7.3, with nodes a, b, c, d marked]
     • OrdPath
       • Label of node a: “-1”
       • Label of node b: “4.1”
       • Label of node c: “4.3”
       • Label of node d: “4.2.1”
     • They are siblings, but their labels look very different

  14. (3) Prime number scheme [Wu et al ICDE04]
     • Instead of re-labeling, Prime re-calculates the SC (simultaneous congruence) values to maintain the document order
     • But this re-calculation is much more expensive

  15. Our CDBS encoding
     • (1) Lexicographical order
     • (2) Encoding
     • (3) Applications and processing of updates
     • (4) Experimental results

  16. (1) Lexicographical order of binary strings
     • Given two binary strings “0011” and “01”, “0011” < “01” lexicographically, because the comparison runs from left to right and the 2nd bit of “0011” is “0” while the 2nd bit of “01” is “1”
     • Given two binary strings “01” and “0101”, “01” < “0101” lexicographically, because “01” is a prefix of “0101”
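The two rules above can be written out directly. This is a minimal sketch with a hypothetical function name `lex_less`. For strings made only of "0" and "1", Python's built-in `<` on strings yields the same order.

```python
def lex_less(a: str, b: str) -> bool:
    # Rule 1: compare from left to right; the first differing bit decides.
    for x, y in zip(a, b):
        if x != y:
            return x < y
    # Rule 2: no differing bit, so a proper prefix is the smaller string.
    return len(a) < len(b)


pairs = [("0011", "01"), ("01", "0101"), ("01", "01"), ("1", "0111")]
results = [lex_less(a, b) for a, b in pairs]
```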

  17. Find a binary string between two binary strings lexicographically
     • To insert a binary string between “0011” and “01”:
       • The size of “0011” is 4, which is larger than the size 2 of “01”; this is Case (a) (larger than or equal)
       • Therefore we directly concatenate one more “1” after “0011”
       • The inserted binary string is “00111”, and “0011” < “00111” < “01” lexicographically
     • To insert a binary string between “01” and “0101”:
       • The size of “01” is 2, which is smaller than the size 4 of “0101”; this is Case (b) (smaller than)
       • Therefore we change the last bit “1” of “0101” to “01”, i.e. the inserted binary string is “01001”; “01” < “01001” < “0101” lexicographically
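Cases (a) and (b) translate into a two-branch function. The sketch below assumes, as in both examples on the slide, that the left string is lexicographically smaller than the right one and that the right string ends with “1”. The name `insert_between` is illustrative.

```python
def insert_between(left: str, right: str) -> str:
    # Precondition (assumed): left < right lexicographically, right ends in "1".
    if len(left) >= len(right):
        # Case (a): append one more "1" to the left string.
        return left + "1"
    # Case (b): replace the last bit "1" of the right string with "01".
    return right[:-1] + "01"


case_a = insert_between("0011", "01")
case_b = insert_between("01", "0101")
```

Neither neighbor is modified; the new string is built from a copy of one of them, which is why no existing label has to change.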

  18. (2) Compact encoding
     • The insertion rules achieve the dynamic objective
     • Further, we propose a Compact Dynamic Binary String encoding, called CDBS

  19. Example illustration of CDBS
     [Figure: the containment-labeled tree. Level 1: 1,18,1. Level 2: 2,3,2; 4,9,2; 10,11,2; 12,17,2. Level 3: 5,6,3; 7,8,3; 13,14,3; 15,16,3]
     • We show how to encode the 18 numbers with our CDBS encoding
     • This is only an example; any other count of numbers can be encoded with CDBS in the same way

  20-27. [Figure-only slides; no text survives in the transcript]

  28. (3) Applying CDBS to the containment scheme
     [Figure: the tree with CDBS labels (start,end,level). Level 1: 00001,1111,1. Level 2: 0001,001,2; 0011,0111,2; 1,10001,2; 1001,111,2. Level 3: 01,01001,3; 0101,011,3; 101,1011,3; 11,1101,3]
     • Replace the “start” and “end” values 1 to 18 with their CDBS codes
     • Compare the codes by lexicographical order
     • The level value stays the same
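The encoding steps themselves are not part of this transcript. The sketch below is therefore an assumption: it assigns a code to the middle position first (rounding halves up) and then recurses into both halves, using the Case (a) and Case (b) insertion rules with empty strings as outer boundaries. Under that assumption it reproduces the 18 codes in the figure above. The names `insert_between` and `cdbs_encode` are hypothetical.

```python
from typing import List


def insert_between(left: str, right: str) -> str:
    # Case (a): left is at least as long as right, so append "1" to left.
    if len(left) >= len(right):
        return left + "1"
    # Case (b): left is shorter, so replace the final "1" of right with "01".
    return right[:-1] + "01"


def cdbs_encode(n: int) -> List[str]:
    # Positions 0 and n + 1 are empty boundary strings (an assumption).
    codes = [""] * (n + 2)

    def fill(lo: int, hi: int) -> None:
        if hi - lo < 2:
            return
        mid = (lo + hi + 1) // 2  # middle position, halves rounded up
        codes[mid] = insert_between(codes[lo], codes[hi])
        fill(lo, mid)
        fill(mid, hi)

    fill(0, n + 1)
    return codes[1:n + 1]


codes_18 = cdbs_encode(18)
# codes_18[0] is the code for number 1, codes_18[17] the code for number 18.
```

For example, number 10 receives “1”, number 5 receives “01”, and number 1 receives “00001”, matching the labels “00001,1111,1” and “01,01001,3” in the figure.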

  29. Applying CDBS to the prefix scheme
     [Figure: tree with prefix labels 001, 01, 1, 11 and, one level below, 01.01, 01.1, 11.01, 11.1]
     • The CDBS codes for 4 numbers are “001”, “01”, “1” and “11”
     • The CDBS codes for 2 numbers are “01” and “1”

  30. Applying CDBS to the prime scheme
     • Store the document order with our CDBS codes
     • Determine the order of nodes from the lexicographical order of the codes
     • Prime has large labels and poor query performance, so we do not show the details

  31. Processing updates based on CDBS: for containment scheme
     [Figure: the tree with CDBS labels (start,end,level). Level 1: 00001,1111,1. Level 2: 0001,001,2; 0011,0111,2; 1,10001,2; 1001,111,2. Level 3: 01,01001,3; 0101,011,3; 101,1011,3; 11,1101,3]
     • To insert two binary strings between “0011” and “01”, the two inserted binary strings are “00111” and “001111”
     • The complete label of the inserted node is “00111,001111,3”
     • The existing nodes need no re-labeling, yet the relationships (ancestor-descendant etc.) can still be determined and the document order is kept
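A quick check of this example: the sketch below applies the usual interval-containment test to the CDBS labels, relying on Python's string comparison, which is lexicographical. The tuple layout and the name `contains` are illustrative assumptions.

```python
def contains(ancestor, descendant) -> bool:
    # Labels are (start, end, level); start and end are compared as strings,
    # which for "0"/"1" strings is the lexicographical order.
    return ancestor[0] < descendant[0] and descendant[1] < ancestor[1]


root = ("00001", "1111", 1)
parent = ("0011", "0111", 2)
old_first_child = ("01", "01001", 3)
inserted = ("00111", "001111", 3)

checks = [
    contains(root, inserted),           # still inside the root
    contains(parent, inserted),         # child of the node labeled 0011,0111,2
    inserted[0] < inserted[1],          # start precedes end
    inserted[1] < old_first_child[0],   # placed before the old first child
]
```

All existing labels are used unchanged; only the new node's label is created.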

  32. Processing updates based on CDBS: for prefix scheme
     [Figure: tree with prefix labels 001, 01, 1, 11 and, one level below, 01.01, 01.1, 11.01, 11.1]
     • To insert a binary string before “01”, the inserted binary string is “001”
     • The complete label of the inserted node is “01.001”
     • The existing nodes need no re-labeling, yet the relationships (ancestor-descendant etc.) can still be determined and the document order is kept
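The same kind of check works for the prefix labels. This sketch assumes labels are compared component by component, each component by lexicographical order. The helper names `components` and `is_ancestor` are hypothetical.

```python
from typing import List


def components(label: str) -> List[str]:
    # "01.001" -> ["01", "001"]
    return label.split(".")


def is_ancestor(a: str, d: str) -> bool:
    ca, cd = components(a), components(d)
    return len(ca) < len(cd) and cd[:len(ca)] == ca


existing = ["001", "01", "01.01", "01.1", "1", "11", "11.01", "11.1"]
inserted = "01.001"

# Sorting by component lists gives the document order.
ordered = sorted(existing + [inserted], key=components)
```

The inserted label sorts directly after its parent “01” and before the former first child “01.01”, while every existing label keeps its value.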

  33. Problem with CDBS
     • The size of V-CDBS and F-CDBS (the variable-length and fixed-length variants) may encounter an overflow problem when many nodes are inserted
     • To solve the overflow problem, we propose QED in [Li & Ling CIKM05]
       • QED uses four quaternary symbols, i.e. 0, 1, 2, and 3, and each is stored with 2 bits
       • 0 is used as the separator (delimiter), so QED never encounters the overflow problem
       • QED is not as compact as CDBS, and its update cost is higher

  34. (4) Experimental results
     • Experimental setup
     • Performance study on static XML
     • Performance study on updates

  35. Experimental setup
     • All the schemes are implemented in Java, and all the experiments are carried out on a 3.0 GHz Pentium 4 processor with 1 GB RAM running Windows XP Professional

  36. Experimental setup (cont.)
     • The following table shows the datasets we used.
     [Table: datasets; not preserved in the transcript]

  37. Performance study on static XML
     • Our V-CDBS and F-CDBS are the most compact variable-length and fixed-length dynamic encodings, respectively
     [Figure: Label sizes of different schemes]

  38. The 5 cases of node updates in experiments
     • We select one XML file, Hamlet, in dataset D1 to test the update performance (the results are similar for other XML files)
     • Hamlet has 5 act elements. We test the following 5 cases:
       • inserting an act element before act[1],
       • inserting an act element before act[2],
       • ...,
       • and inserting an act element before act[5]

  39. Number of nodes to re-label in updates

  40. Total time for node updates
     • When only several nodes are inserted, most of the time is I/O time; our approaches are still the best at processing updates
     • When considering processing time only, our approaches are much better, more than 300 times faster, which makes them more appropriate for updates that insert many nodes
     [Figure: Log2(Update time) of different schemes]

  41. Conclusion
     • Our CDBS is dynamic
     • Our CDBS is the most compact
     • Its update cost is the cheapest: only the last 1 bit of the neighbor label needs to be modified
