Flexible Search on XML Data

Presentation Transcript


  1. Flexible Search on XML Data. Stefanos Souldatos

  2. Flexible Search on XML Data. Outline: Partial queries; Query processing; Query evaluation; Query containment; Experiments; Conclusion

  3. Difficulties on Querying XML Data. [Figure: tree of hotel data with the labels Hotels, Location, City, Island, Center, Athens, Creta, Poros, Chania, Heraklio.]

  4. Difficulties on Querying XML Data. Search problem. Name: Xiaoying Wu. Place: Athens Center, Heraklio. Purpose: Sightseeing (Parthenon, 438 BC; Phaistos’ Disk, 1700 BC). Problem: structural difference. [Figure: the hotel data tree of slide 3.]

  5. Difficulties on Querying XML Data. Search problem. Name: Theodore Dalamagas. Place: Islands. Purpose: Sea sports (windsurf, jet ski). Problem: structural inconsistency. [Figure: the hotel data tree of slide 3.]

  6. Difficulties on Querying XML Data. Search problem. Name: Dimitri Theodoratos. Place: Heraklio. Purpose: HDMS Conference (HDMS 2008). Problem: unknown structure. [Figure: the hotel data tree of slide 3.]

  7. Difficulties on Querying XML Data. Search problem. Name: Stefanos Souldatos. Place: Any island. Purpose: Escape from PhD! Problem: multiple sources. [Figure: Creta, 1400 islands; sources theHotel.gr, hotels.gr, holidays.gr.]

  8. Difficulties on Querying XML Data. Can we use existing query languages (XPath, XQuery) to express our queries? Can we use existing techniques to evaluate our queries? [Figure: the hotel data tree of slide 3.]

  9. Partial Queries in XPath. [Figure: five example queries over the labels Hotels, City, Athens and Island, numbered 1 to 5 and placed on a scale from 0% to 100% structure. Queries 1 to 3 are path queries; queries 4 and 5 are tree-pattern queries.] The corresponding XPath expressions, with the paths inside predicates written relative to the Hotels node:
      1. //Hotels[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Athens]]
      2. //Hotels[./City[descendant-or-self::*[ancestor-or-self::Athens]]]
      3. //Hotels[./City//Athens]
      4. //Hotels[./City[descendant-or-self::*[ancestor-or-self::Athens]]][.//City[descendant-or-self::*[ancestor-or-self::Island]]]
      5. //Hotels[./City//Athens][./City//Island]
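
Expression 1 follows a regular pattern: one witness node under the anchor, with one `ancestor-or-self` test per label that must lie on the same path. A minimal sketch of that pattern as a string builder is shown below. The function name `same_path_xpath` and its parameters are hypothetical and do not come from the slides; the sketch only builds the expression text and does not evaluate it.

```python
def same_path_xpath(anchor, labels):
    """Build an XPath string in the style of expression 1.

    anchor: label of the element the query starts from (e.g. "Hotels").
    labels: labels that must occur on one common path, in any order.
    Both names are illustrative assumptions, not part of the slides.
    """
    # One ancestor-or-self test per label, all applied to the same witness node.
    tests = "".join(f"[ancestor-or-self::{label}]" for label in labels)
    return f"//{anchor}[descendant-or-self::*{tests}]"


if __name__ == "__main__":
    print(same_path_xpath("Hotels", ["City", "Athens"]))
```

Called with `"Hotels"` and `["City", "Athens"]`, the builder reproduces expression 1 character for character.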

  10. Partial Queries. [Figure: example partial query with nodes r, a, c, c, b, a, d.] Legend: root node r (optional); query node labelled by “a”; child relationship; descendant relationship.

  11. Conclusions (up to now) • Need for queries with partial structure • We introduce partial queries • Partial queries can be expressed in XPath

  12. Flexible Search on XML Data. Outline: Partial queries; Query processing; Query evaluation; Query containment; Experiments; Conclusion

  13. Query Processing. [Figure: query processing turns a partial path query (nodes r, a, c, c, b, a, d) into a partial path query in canonical form (nodes r, c, a, a, b, d), which is then passed to query evaluation.]

  14. Query Processing • Full form • Satisfiability • Redundant nodes • Canonical form [Figure: example query with nodes r, a, c, c, b, a, d.]

  15. Query Processing • Full form • Satisfiability • Redundant nodes • Canonical form [Figure: example query with nodes r, a, c, c, b, a, d; rule IR1 marked.] Inference rules, where x, y, z, w are query nodes and a_i, b_j are nodes labelled by a, b:
      (IR1) |- r//a_i
      (IR2) x/y |- x//y
      (IR3) x//y, y//z |- x//z
      (IR4) x/a_i, x//b_j |- a_i//b_j
      (IR5) a_i/x, b_j//x |- b_j//a_i
      (IR6) x/y, y/w, x//z, z//w |- x/z
      (IR7) x/y, x//z, w/z, w//y |- x/z
      (IR8) x/y, y/w, x/z |- z/w
      (IR9) x//y, y//w, x/z |- z//w
      (IR10) x/y, w/y, w/z |- x/z
      (IR11) x//y, w/y, w//z |- x//z
      (IR12) x/y, y/w, z/w |- x/z
      (IR13) x//y, y//w, z/w |- x//z
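
To make the idea of repeatedly applying inference rules concrete, here is a small sketch that closes a set of relationships under IR2 and IR3 only. It is not the full procedure of the talk: the other eleven rules are left out, and the function name `close_ir2_ir3` and the example edges are hypothetical.

```python
def close_ir2_ir3(child, desc):
    """Return the descendant relation closed under IR2 and IR3.

    child: set of (x, y) pairs read as x/y  (y is a child of x).
    desc:  set of (x, y) pairs read as x//y (y is a descendant of x).
    Only IR2 and IR3 are applied; the remaining rules are omitted here.
    """
    closed = set(desc) | set(child)      # IR2: x/y |- x//y
    changed = True
    while changed:                       # IR3: x//y, y//z |- x//z, until nothing new appears
        changed = False
        for (x, y) in list(closed):
            for (y2, z) in list(closed):
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    return closed


if __name__ == "__main__":
    # Hypothetical input, not the query drawn on the slide.
    child_edges = {("r", "a")}
    desc_edges = {("a", "c"), ("c", "d")}
    for pair in sorted(close_ir2_ir3(child_edges, desc_edges)):
        print("//".join(pair))
```

The loop stops at a fixpoint, which is the same stopping condition a full-form computation over all thirteen rules would need.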

  16. Query Processing • Full form • Satisfiability • Redundant nodes • Canonical form [Figure: example query with nodes r, a, c, c, b, a, d; rule IR4 marked. Inference rules IR1 to IR13 as on slide 15.]

  17. Query Processing • Full form • Satisfiability • Redundant nodes • Canonical form [Figure: example query with nodes r, a, c, c, b, a, d; rule IR4 marked. Inference rules IR1 to IR13 as on slide 15.]

  18. Query Processing • Full form • Satisfiability • Redundant nodes • Canonical form [Figure: example query with nodes r, a, c, c, b, a, d; rules IR6 and IR8 marked. Inference rules IR1 to IR13 as on slide 15.]

  19. Query Processing • Full form • Satisfiability • Redundant nodes • Canonical form [Figure: query with nodes r, c, c, a, a, b, d. Inference rules IR1 to IR13 as on slide 15.]

  20. Query Processing • Full form • Satisfiability • Redundant nodes • Canonical form. A query is unsatisfiable if its full form contains a trivial cycle. [Figure: query with nodes r, c, c, a, a, b, d, and a cycle pattern on two nodes x and y.]
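
A cycle check of this kind can be sketched as follows. The sketch assumes that every relationship is read as "strictly below" and that a trivial cycle means some node can reach itself through such relationships (for example x//y together with y//x). That reading, the function name `is_unsatisfiable` and the example edges are assumptions, since the slide's figure is not recoverable from the transcript.

```python
def is_unsatisfiable(edges):
    """Report whether some node lies strictly below itself.

    edges: iterable of (x, y) pairs, each read as "y is strictly below x".
    Assumption: a cycle in these pairs is what the slide calls a trivial cycle.
    """
    succ = {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)

    def reaches(start, target):
        # Iterative depth-first search over the "below" relation.
        stack, seen = list(succ.get(start, ())), set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(succ.get(node, ()))
        return False

    return any(reaches(x, x) for x in succ)


if __name__ == "__main__":
    print(is_unsatisfiable([("x", "y"), ("y", "x")]))
    print(is_unsatisfiable([("r", "a"), ("a", "b")]))
```

No tree can place a node strictly below itself, so a positive answer is always a sound reason to reject the query, whatever the exact pattern on the slide was.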

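  21. Query Processing • Full form • Satisfiability • Redundant nodes • Canonical form. A node y is redundant if one of the following patterns occurs: a), b), c). [Figure: three patterns over nodes x, y, z, shown next to the query with nodes r, c, a, a, b, d; the patterns are not recoverable from the transcript.]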

  22. Query Processing • Full form • Satisfiability • Redundant nodes • Canonical form. Canonical form of a satisfiable query = full form – IR2 – IR3 – redundant nodes. [Figure: query with nodes r, c, a, a, b, d.]

  23. Canonical Form. Partial path query: directed acyclic graph with same-path constraint. Partial tree-pattern query: directed acyclic graph with same-path constraints. [Figure: two example queries, each with nodes r, d, b, c, e.]

  24. Conclusions (up to now) • Need for queries with partial structure • We introduce partial queries • Partial queries can be expressed in XPath • We can process any partial query into a DAG

  25. Flexible Search on XML Data. Outline: Partial queries; Query processing; Query evaluation; Query containment; Experiments; Conclusion

  26. Evaluation Algorithms. Partial path queries: PQGen (produce path queries); PathJoin (decompose into paths); PartialMJ (decompose into spanning tree paths); PartialPathStack (novel, holistic). Partial tree-pattern queries: TPQGen (produce TPQs); PPJoin (decompose into PPs); PartialTreeStack (novel, holistic). [Figure: two example queries, each with nodes r, d, b, c, e.]

  27. Partial Path Queries: PQGen. Producing all possible path queries: 1. Produce all possible path queries. 2. Evaluate the paths using existing algorithms. 3. Keep all results. [Figure: the example query with nodes r, d, b, c, e, and the path queries produced from it.]
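
Step 1 can be sketched as enumerating every linear order of the query nodes that respects the query's edges. The sketch below makes simplifying assumptions that are not stated on the slide: all edges are descendant relationships, all labels are distinct, and the edge set is a hypothetical stand-in for the figure. The name `linear_extensions` is also illustrative.

```python
def linear_extensions(nodes, edges):
    """Yield every ordering of `nodes` that respects all (above, below) pairs in `edges`."""
    # For each node, the set of nodes that must appear above it on the path.
    preds = {n: {a for a, b in edges if b == n} for n in nodes}

    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for n in sorted(remaining):
            if preds[n] <= set(prefix):   # everything required above n is already placed
                yield from extend(prefix + [n], remaining - {n})

    yield from extend([], set(nodes))


if __name__ == "__main__":
    # Hypothetical DAG over the labels used on the slide; the real edges are not in the transcript.
    nodes = ["r", "b", "d", "c", "e"]
    edges = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
    for order in linear_extensions(nodes, edges):
        print("//".join(order))
```

Each printed line is one candidate path query; steps 2 and 3 would evaluate them with an existing path-query algorithm and take the union of the results. The number of orderings can grow quickly with the number of unordered nodes.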

  30. Partial Path Queries: PathJoin. Decomposing into root-to-leaf paths: 1. Decompose into root-to-leaf paths. 2. Evaluate the paths using existing algorithms. 3. Apply join conditions (identity, path). [Figure: the example query with nodes r, d, b, c, e, and its root-to-leaf paths.]
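
Step 1 amounts to listing every root-to-leaf path of the query graph. A minimal sketch follows; the function name `root_to_leaf_paths` and the edge list are hypothetical, since the transcript does not preserve the edges of the example query.

```python
def root_to_leaf_paths(root, edges):
    """Return all root-to-leaf paths of a DAG given as (parent, child) pairs."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)

    def walk(node, path):
        path = path + [node]
        if node not in succ:          # leaf: no outgoing edges
            yield path
        else:
            for child in succ[node]:
                yield from walk(child, path)

    return list(walk(root, []))


if __name__ == "__main__":
    # Hypothetical edges over the slide's labels.
    edges = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
    for path in root_to_leaf_paths("r", edges):
        print(" ".join(path))
```

A node that appears in more than one path, such as `c` or `d` in this made-up example, is where a join condition is needed to make the separately evaluated paths agree.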

  33. Partial Path Queries: PartialMJ. Using a spanning tree: 1. Create a spanning tree of the query. 2. Decompose into root-to-leaf paths. 3. Evaluate the paths using an extension of PathStack. 4. Apply join conditions (identity, structural, path). [Figure: the example query with nodes r, d, b, c, e, its spanning tree, and the paths of the spanning tree.]
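
Step 1 can be sketched with a breadth-first traversal that keeps the first edge reaching each node and sets the other edges aside. This is one plausible way to pick a spanning tree, not necessarily the one used in the talk; the function name `spanning_tree`, the edge list, and the idea that the set-aside edges are what the later join conditions must check are all assumptions.

```python
from collections import deque


def spanning_tree(root, edges):
    """Split the edges of a rooted DAG into spanning-tree edges and left-over edges."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)

    tree, extra, seen = [], [], {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in succ.get(node, []):
            if child in seen:
                extra.append((node, child))   # edge left out of the tree
            else:
                seen.add(child)
                tree.append((node, child))
                queue.append(child)
    return tree, extra


if __name__ == "__main__":
    # Hypothetical edges over the slide's labels.
    edges = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
    tree, extra = spanning_tree("r", edges)
    print("tree edges:", tree)
    print("left-over edges:", extra)
```

Because every node has exactly one parent in the tree, the root-to-leaf paths of step 2 share only prefixes, which is what distinguishes this decomposition from decomposing the DAG directly.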

  37. Partial Path Queries: PartialPathStack. [Figure: PathStack and PartialPathStack shown side by side on the example query (nodes r, b, d, c, e), each with stacks Sr, Sb, Sd, Sc, Se, over a data tree with nodes r, b1, d1, c1, e1, d2, c2, e2. Slides 38 to 45 animate the run, adding one data node per slide.] Results: none yet.

  38. Partial Path Queries: PartialPathStack. Node added: r. Results: none yet.

  39. Partial Path Queries: PartialPathStack. Node added: b1. Results: none yet.

  40. Partial Path Queries: PartialPathStack. Node added: d1. Results: none yet.

  41. Partial Path Queries: PartialPathStack. Node added: c1. Results: none yet.

  42. Partial Path Queries: PartialPathStack. Node added: e1. Results (PathStack): ra1b1d1c1e1. Results (PartialPathStack): ra1b1d1c1e1.

  43. Partial Path Queries: PartialPathStack. Node added: d2. Results (PathStack): ra1b1d1c1e1. Results (PartialPathStack): ra1b1d1c1e1.

  44. Partial Path Queries: PartialPathStack. Node added: c2. Results (PathStack): ra1b1d1c1e1. Results (PartialPathStack): ra1b1d1c1e1, ra1b1d1c2e1.

  45. Partial Path Queries: PartialPathStack. Node added: e2. Results (PathStack): ra1b1d1c1e1, ra1b1d1c1e2. Results (PartialPathStack): ra1b1d1c1e1, ra1b1d1c2e1, ra1b1d1c1e2.

  46. Partial Path Queries: PartialPathStack. PathStack [Bruno et al, 2002]: optimal for path queries, O(input + output). PartialPathStack [Souldatos et al, 2007]: optimal for partial path queries, O(input*indegree + output*outdegree). [Figure: the example query and the data tree with nodes r, b1, d1, c1, e1, d2, c2, e2.]

  47. Partial Path Queries: Comparison

  48. Evaluation Algorithms. Partial path queries: PQGen (produce path queries); PathJoin (decompose into paths); PartialMJ (decompose into spanning tree paths); PartialPathStack (novel, holistic). Partial tree-pattern queries: TPQGen (produce TPQs); PartialPathJoin (decompose into PPs); PartialTreeStack (novel, holistic). [Figure: two example queries, each with nodes r, d, b, c, e.]

  49. Partial Tree-Pattern Queries: TPQGen. Producing all possible tree-pattern queries: 1. Produce all possible tree-pattern queries. 2. Evaluate the queries using existing algorithms. 3. Keep all results. [Figure: the example query with nodes r, d, b, c, e, and the tree-pattern queries produced from it.]
