Efficient Top-K Keyword Search in XML Databases: Join-Based Algorithm Strategy

This presentation covers a join-based algorithm for top-k keyword search in XML databases. It addresses the inefficiency of existing methods and reports good performance for high-frequency and highly correlated keywords. It also reviews the LCA (Lowest Common Ancestor) concept and the ELCA (Exclusive Lowest Common Ancestor) and SLCA (Smallest Lowest Common Ancestor) query semantics, and closes with experiments and conclusions.

Presentation Transcript


  1. Supporting Top-K Keyword Search in XML Databases, ICDE 2010

  2. Outline • Introduction • Motivation • Preliminaries • Join-based Algorithm • Join-based Top-k Algorithm • Experiments • Conclusions

  3. Introduction • LCA: Lowest Common Ancestor

  4. Introduction • LCA: Lowest Common Ancestor

  5. Motivation • The naive LCA-based semantics is straightforward, but leads to exponential computation and result size. • Two keywords, {XML} and {data}: with a list of m nodes containing XML and a list of n nodes containing data, the total number of LCAs is m*n. • Existing algorithms focus on efficiency and cannot provide effective support for top-k processing.
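The m*n blow-up on the Motivation slide can be sketched as follows. This is a minimal illustration, assuming nodes carry Dewey labels (as on the later slides) stored as tuples; the node lists and the helper names `lca` and `naive_lcas` are hypothetical and not taken from the presentation.

```python
from itertools import product

def lca(a, b):
    """LCA of two Dewey labels (tuples): their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

def naive_lcas(list_a, list_b):
    """Naive semantics: one LCA per pair of nodes, so m*n results."""
    return [lca(a, b) for a, b in product(list_a, list_b)]

# Hypothetical node lists for the keywords {XML} and {data}.
xml_nodes = [(1, 1, 2, 1), (1, 2), (1, 3, 4, 1)]   # m = 3
data_nodes = [(1, 1, 2, 2), (1, 4)]                # n = 2
pairs = naive_lcas(xml_nodes, data_nodes)
```

Most of the pairs collapse to the root, which is why the naive result set is both large and uninformative.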

  6. Preliminaries 1. Query semantics • k-keyword query • The list of nodes that directly contain a keyword • The LCA of a set of nodes • ELCA semantics: the result is the set of nodes that contain at least one occurrence of all of the query keywords, either in their labels or in the labels of their descendant nodes, after excluding the occurrences of the keywords in subtrees that already contain at least one occurrence of all the query keywords.
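The ELCA definition above can be checked by brute force on a small tree. The sketch below is not the join-based algorithm from the talk; it applies the definition directly. The tree (`occurrences`) is a hypothetical one, and the function name `elca` and the tuple encoding of Dewey labels are assumptions made for illustration.

```python
def elca(occurrences, keywords):
    """Brute-force ELCA. `occurrences` maps a Dewey label (tuple) to the
    set of keywords found directly at that node."""
    keywords = set(keywords)
    # Every prefix of an occurrence label is a node of the tree.
    nodes = {d[:i] for d in occurrences for i in range(1, len(d) + 1)}

    def subtree_keywords(v):
        found = set()
        for d, kws in occurrences.items():
            if d[:len(v)] == v:
                found |= kws & keywords
        return found

    # Nodes whose subtree contains every query keyword.
    full = {v for v in nodes if subtree_keywords(v) == keywords}

    result = []
    for v in sorted(full):
        witnessed = set()
        for d, kws in occurrences.items():
            if d[:len(v)] != v:
                continue
            # Skip occurrences that sit inside a proper descendant of v
            # that already contains all the keywords.
            blocked = any(d[:i] in full for i in range(len(v) + 1, len(d) + 1))
            if not blocked:
                witnessed |= kws & keywords
        if witnessed == keywords:
            result.append(v)
    return result

# Hypothetical tree: keyword occurrences keyed by Dewey label.
occurrences = {
    (1, 1, 1): {"xml"},
    (1, 1, 2, 1): {"xml"}, (1, 1, 2, 2): {"data"},
    (1, 2): {"xml"},
    (1, 3, 4, 1): {"xml"}, (1, 3, 4, 2): {"data"},
    (1, 3, 5): {"data"},
    (1, 4): {"data"},
}
answers = elca(occurrences, ["xml", "data"])
```

In this tree, nodes 1.1 and 1.3 contain both keywords but are not ELCAs, because one of the two keywords only occurs inside a child subtree that is already a complete answer.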

  7. Cont. • SLCA: a subset of the LCAs such that no LCA in the subset is the ancestor of another LCA. • LCA: 1.1, 1.1.2, 1, 1.3.4, 1.3 • SLCA: 1.1.2, 1.3.4 • ELCA: 1.1.2, 1.3.4, 1
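The SLCA filter on this slide can be reproduced from the listed LCA set alone. A minimal sketch, assuming Dewey labels written as dotted strings; the helper names `parse`, `is_ancestor`, and `slca` are hypothetical.

```python
def parse(label):
    """Turn a dotted Dewey label such as '1.3.4' into a tuple of ints."""
    return tuple(int(part) for part in label.split("."))

def is_ancestor(a, b):
    """True if a is a proper ancestor of b (a is a strict prefix of b)."""
    return len(a) < len(b) and b[:len(a)] == a

def slca(lcas):
    """Keep only the LCAs that are not an ancestor of another LCA."""
    ids = [parse(x) for x in lcas]
    return [x for x, t in zip(lcas, ids)
            if not any(is_ancestor(t, other) for other in ids)]

lca_set = ["1.1", "1.1.2", "1", "1.3.4", "1.3"]
smallest = slca(lca_set)
```

Applied to the LCA set from the slide, the filter keeps 1.1.2 and 1.3.4. The root is dropped here even though it is an ELCA, which is the difference between the two semantics.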

  8. Cont. 2. Ranking function

  9. Cont. • d( ): a decreasing function

  10. Join-based Algorithm 1. Node encoding

  11. Join-based Algorithm 2. Algorithm • Two lists of nodes

  12. Cont. (2,3) join (1): no match

  13. Cont. (3,5,6) join (1,2,4): no match

  14. Cont. (2,3,4,5) join (1,2,4) => (2,4) matched. The nodes numbered 2 and 4 at level 3 are the lowest ELCAs => erased

  15. Cont. (2,3) join (1): no match

  16. Cont. (1,1) join (1): matched => the root is an ELCA. Number 1 corresponds to two nodes (1.2.3 and 1.3.5.6); output one of them
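The column joins in the walkthrough above behave like an intersection of two sorted lists of node numbers. The sketch below shows only that join step as a merge join; the level bookkeeping and the erasing of matched ELCAs are left out, and the function name `column_join` is an assumption, not the presentation's notation.

```python
def column_join(left, right):
    """Merge-join two sorted lists of node numbers from the same level.
    Returns each matching number once, even if it repeats on one side."""
    i = j = 0
    matched = []
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            if not matched or matched[-1] != left[i]:
                matched.append(left[i])
            i += 1  # advance the left side only, so repeats still match
    return matched

# The joins listed on the slides above.
step_12 = column_join([2, 3], [1])
step_13 = column_join([3, 5, 6], [1, 2, 4])
step_14 = column_join([2, 3, 4, 5], [1, 2, 4])
step_16 = column_join([1, 1], [1])
```

Because both inputs are sorted, each join is a single linear pass, which fits the claim that the approach does well when keyword lists are long.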

  17. Cont.

  18. Cont. Score(1.3.4.5.3.1.1) is greater than Score(1.3.5.6), but in the 4th column 0.5*d(3) may be greater than or equal to 0.44

  19. Cont.

  20. Cont. Assume d( ). Joining columns 5 and 4 yields no result

  21. Cont. Column 3: number 2 is matched. Its score is 0.73 + 0.41 = 1.14. The threshold for the unseen results in column 3 is max{0.7+0.3, 0.5+0.4} = 1

  22. Cont. Consider the unseen results in the other columns: columns 1 and 2 do not contain sequence s, so ignore them. Consider column 2: the maximum scores are 0.7*0.9 and 0.5*0.9, so the threshold is 0.63 + 0.45 = 1.08 < 1.14. Therefore, node 2 at level 3 can be output.
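The arithmetic on the last two slides amounts to a stop test: a seen result can be output once its score is at least the best score any unseen result could still reach. The sketch below replays the slide's numbers. The function name `can_output`, the use of `>=`, and reading the factor 0.9 as the damping applied in column 2 are assumptions.

```python
def can_output(score, thresholds):
    """A seen result is safe to output when no unseen result can beat it."""
    return all(score >= t for t in thresholds)

# Score of the matched node (number 2 in column 3), from the slide.
score = 0.73 + 0.41

# Upper bound for unseen results in column 3.
threshold_col3 = max(0.7 + 0.3, 0.5 + 0.4)

# Upper bound for unseen results in column 2: the top scores of the two
# lists, each scaled by 0.9 (assumed to be the damping value).
damping = 0.9
threshold_col2 = 0.7 * damping + 0.5 * damping

safe = can_output(score, [threshold_col3, threshold_col2])
```

With these numbers the score of 1.14 exceeds both bounds (1 and 1.08), so the node can be returned without scanning the rest of the lists.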

  23. Experiments

  24. Cont.

  25. Cont.

  26. Cont.

  27. Conclusions • 1. The join-based algorithm performs well when keyword frequency is high. • 2. The join-based top-k algorithm performs well when keyword correlation is high.