
Seminar: Information Management in the Web



Presentation Transcript


  1. Seminar: Information Management in the Web. Query Processing Over Peer-to-Peer Data Sharing Systems (UC Santa Barbara). Thomas Zahn, CST

  2. Motivation
     • Example: find all objects whose attribute values (not their hash IDs) lie between 100 and 200
     • DHTs support range queries poorly
     • Due to hashing, semantically adjacent objects may be stored at "opposite" ends of the overlay
     • As a result, a separate lookup must be issued for each value in the range

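  3. Overlay Object Placement
     • identifier space: 0 to 2^128 − 1
     • h(15) = d1a08e, h(16) = 3102ab, h(17) = d46a1c
     [Figure: the values 15, 16 and 17 placed at their hashed positions d1a08e, 3102ab and d46a1c in the identifier space]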
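The scattering effect can be reproduced with a minimal sketch. MD5 is only an assumed stand-in for the overlay's hash function h (it happens to produce 128 bits); the names `overlay_id` and `lookups_for_range` are hypothetical, and the resulting IDs will not match the ones shown on the slide.

```python
import hashlib

ID_BITS = 128  # identifier space 0 .. 2^128 - 1, as on the slide


def overlay_id(value: int) -> int:
    """Map an attribute value to an overlay ID (MD5 is an assumed stand-in for h)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest, "big")  # an MD5 digest is exactly 128 bits


def lookups_for_range(lo: int, hi: int) -> list:
    """A plain DHT needs one lookup per discrete value in [lo, hi]."""
    return [overlay_id(v) for v in range(lo, hi + 1)]


# Adjacent values 15, 16, 17 end up at unrelated positions in the ID space.
ids = lookups_for_range(15, 17)
for value, node_id in zip(range(15, 18), ids):
    print(value, format(node_id, "032x"))
```

The range 100 to 200 would already require 101 separate lookups under this scheme.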

  4. Problem
     • a separate lookup would have to be issued for each value in the range
     • theoretically possible for discrete sets (e.g. [10, 11, 12, …, 50])
     • completely impossible for continuous sets (e.g. [10.0, 50.0])

  5. General Concept (1)
     • uses a 2-dimensional CAN virtual space
     • the virtual space is partitioned into rectangular zones
     • each zone is owned by an active node
     • each node maintains a routing table (RT) with its neighbors
     [Figure: virtual space with corners (20,20), (80,20), (20,80) and (80,80), partitioned into zones 1 to 7]

  6. General Concept (2)
     • a node stores the results of queries whose ranges are hashed to its zone
     • range query <a,b> is hashed to target point (a,b) → target zone → target node
     • the result of a range query is stored at the target node/zone
     • e.g. range query <55,70>
     [Figure: virtual space with corners (20,20), (80,20), (20,80) and (80,80), partitioned into zones 1 to 7]
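The mapping from a range query to its target zone can be sketched as follows. The two-zone layout, the `Zone` class and the half-open zone boundaries are assumptions made for illustration; they are not the partitioning shown in the slide's figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    x1: float  # lower-left corner
    y1: float
    x2: float  # upper-right corner
    y2: float
    owner: int  # id of the active node owning this zone

    def contains(self, point):
        # Half-open boundaries (an assumption) so every point has one owner.
        x, y = point
        return self.x1 <= x < self.x2 and self.y1 <= y < self.y2


def target_point(a, b):
    """Range query <a,b> is hashed to the point (a,b)."""
    return (a, b)


def target_zone(zones, a, b):
    """Return the zone whose rectangle contains the target point."""
    p = target_point(a, b)
    for z in zones:
        if z.contains(p):
            return z
    raise LookupError(f"no zone covers {p}")


# Hypothetical layout: the space [20,80) x [20,80) split at x = 50.
zones = [Zone(20, 20, 50, 80, owner=1), Zone(50, 20, 80, 80, owner=2)]
caches = {1: {}, 2: {}}

z = target_zone(zones, 55, 70)
caches[z.owner][(55, 70)] = ["tuples with 55 <= A <= 70"]  # result cached at target node
```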

  7. General Concept (3)
     • given two range queries r1: <a1,b1> and r2: <a2,b2>
     • with two target points t1 (for r1) and t2 (for r2)
     • if a1 < a2 → t1 lies to the left of t2
     • if b1 < b2 → t1 lies below t2
     • t1 lies to the upper-left of t2 iff range r1 contains range r2
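The equivalence between range containment and the upper-left relation can be checked by brute force on small integer ranges. This is a sketch under assumptions: ranges are treated as closed integer intervals, and the comparisons are non-strict so that a range counts as containing itself.

```python
def contains_range(r1, r2):
    """True if closed integer range r1 contains r2, decided by comparing member sets."""
    members1 = set(range(r1[0], r1[1] + 1))
    members2 = set(range(r2[0], r2[1] + 1))
    return members2 <= members1


def upper_left_of(t1, t2):
    """True if target point t1 lies to the upper-left of (or on) target point t2."""
    return t1[0] <= t2[0] and t1[1] >= t2[1]


# Every range <a,b> with 0 <= a <= b < 8, used both as a range and as a point.
ranges = [(a, b) for a in range(8) for b in range(a, 8)]
agree = all(
    contains_range(r1, r2) == upper_left_of(r1, r2)
    for r1 in ranges
    for r2 in ranges
)
print(agree)
```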

  8. General Concept (4)
     • range query <x,y> is hashed into zone A
     • if a prior range query result containing <x,y> exists, it must have been hashed to a point in the shaded region (the region to the upper-left of (x,y))
     • any zone intersecting the shaded region can potentially contain a result
     [Figure: zones A, B, C and D, with the target point (x,y) in zone A]

  9. General Concept (5)
     • two target points t1 (for r1) and t2 (for r2)
     • → t1 lies to the upper-left of t2 iff range r1 contains range r2
     • Diagonal zone: zone z with corners (x1,y1), (x2,y2); zone z' with corners (a1,b1), (a2,b2)
     • z' is a diagonal zone of z if a2 ≤ x1 and b1 ≥ y2
     • Intuitively: z' is diagonally above the upper-left corner of z
     • only non-empty zones exist
     • → a diagonal zone of z can answer ALL range queries that hash into z
     [Figure: zones B, C, z and z']
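The diagonal-zone condition translates directly into a predicate. In this sketch a zone is written as a pair of corners (lower-left, upper-right); the three example zones are hypothetical and not taken from the slide's figure.

```python
from itertools import product


def is_diagonal_zone(z_prime, z):
    """z' = ((a1,b1),(a2,b2)) is a diagonal zone of z = ((x1,y1),(x2,y2))
    if a2 <= x1 and b1 >= y2."""
    (a1, b1), (a2, b2) = z_prime
    (x1, y1), (x2, y2) = z
    return a2 <= x1 and b1 >= y2


def corners(zone):
    """All four corner points of a zone given as (lower-left, upper-right)."""
    (x1, y1), (x2, y2) = zone
    return list(product((x1, x2), (y1, y2)))


z = ((50, 20), (80, 50))        # zone a query hashes into
z_prime = ((20, 50), (50, 80))  # touches z's upper-left corner
beside = ((20, 20), (50, 50))   # left of z, but not above it

# Every corner of a diagonal zone lies to the upper-left of every corner of z,
# so any range stored in z' contains any range that hashes into z.
all_upper_left = all(
    p[0] <= q[0] and p[1] >= q[1]
    for p in corners(z_prime)
    for q in corners(z)
)
```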

  10. Zone Maintenance
     • initially the entire hash space is a single zone assigned to one active node
     • each active node has a routing table (RT) containing its neighboring active nodes along with their zone coordinates
     • a zone splits when its load (storage and/or processing) becomes too high → the decision is made by the zone owner
     • the owner contacts a passive node → assigns it a portion of its zone → transfers the corresponding results and the neighbor list
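A zone split with hand-over of cached results might look like the sketch below. Halving the zone along its longer side and using half-open boundaries are assumptions; the slides only state that a portion of the zone and the corresponding results are transferred. The function name `split_zone` is hypothetical.

```python
def split_zone(zone, cache):
    """Split zone (x1, y1, x2, y2) in half and divide cached results by target point.

    Returns ((kept_zone, kept_cache), (given_zone, given_cache)), where the
    second pair is handed to the newly activated passive node.
    """
    x1, y1, x2, y2 = zone
    if x2 - x1 >= y2 - y1:  # assumption: cut across the longer side
        mid = (x1 + x2) / 2
        kept, given = (x1, y1, mid, y2), (mid, y1, x2, y2)
    else:
        mid = (y1 + y2) / 2
        kept, given = (x1, y1, x2, mid), (x1, mid, x2, y2)

    def inside(z, p):
        return z[0] <= p[0] < z[2] and z[1] <= p[1] < z[3]

    given_cache = {p: r for p, r in cache.items() if inside(given, p)}
    kept_cache = {p: r for p, r in cache.items() if p not in given_cache}
    return (kept, kept_cache), (given, given_cache)


# Cache keys are target points (a, b) of previously answered range queries.
cache = {(30, 70): "result for <30,70>", (55, 70): "result for <55,70>"}
(kept, kept_cache), (given, given_cache) = split_zone((20, 20, 80, 80), cache)
```

The neighbor list would have to be updated on both sides as well, which is omitted here.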

  11. Query Routing (1)
     • the result is likely to be cached at the target zone → the range query is routed through the virtual space toward its target zone
     • starting at the requesting zone, each zone passes the query on to a neighboring zone
     • a zone chooses the neighbor zone whose coordinates are closest to the target point
     • the process continues until the target zone is reached

  12. Query Routing (2)
     • simple way: compute the Euclidean distance between the target point and the center of a zone
     • → might not converge
     [Figure: zones 1 to 4 and target point t]

  13. Query Routing (3)
     • the distance of target t from a zone Z should be measured as the closest distance from t to any point of the zone
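The two distance measures can be compared in a short sketch. The zone coordinates are hypothetical and chosen so that the center-based metric prefers a neighbor that does not contain the target; zones are written as (x1, y1, x2, y2).

```python
import math


def distance_to_center(t, zone):
    """Euclidean distance from target t to the center of the zone."""
    x1, y1, x2, y2 = zone
    return math.hypot(t[0] - (x1 + x2) / 2, t[1] - (y1 + y2) / 2)


def distance_to_zone(t, zone):
    """Closest Euclidean distance from target t to any point of the zone
    (0.0 if t lies inside the zone)."""
    x1, y1, x2, y2 = zone
    dx = max(x1 - t[0], 0.0, t[0] - x2)
    dy = max(y1 - t[1], 0.0, t[1] - y2)
    return math.hypot(dx, dy)


def next_hop(t, neighbors, metric):
    """Greedy choice: forward to the neighbor zone closest to the target."""
    return min(neighbors, key=lambda name: metric(t, neighbors[name]))


neighbors = {
    "wide": (0, 60, 100, 100),   # contains the target, but its center is far away
    "small": (80, 40, 100, 60),  # does not contain the target, center is close
}
t = (90, 62)

by_center = next_hop(t, neighbors, distance_to_center)
by_zone = next_hop(t, neighbors, distance_to_zone)
```

Here the center-based metric forwards to "small" although the target lies in "wide", while the zone-based metric picks "wide".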

  14. Forwarding (1)
     • the query reaches the target zone → check the local cache
     • if no result containing the query range is found → forward the query
     • → only zones to the upper-left of the target point can have a result containing the given range
     • forwarding is similar to flooding → bounded by a Forward Limit between 0.0 and 1.0
     [Figure: zones labeled 2 to 8, 10 and 11]
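Selecting the zones that may hold a containing result can be sketched as a rectangle test against the region to the upper-left of the target point. The four-zone layout and the half-open zone boundaries are assumptions, and the Forward Limit is not modeled because the slides do not define its semantics.

```python
def may_hold_containing_result(zone, target):
    """True if zone (x1, y1, x2, y2) intersects the region to the upper-left
    of the target point, i.e. all points (p, q) with p <= x and q >= y."""
    x1, y1, x2, y2 = zone
    x, y = target
    return x1 <= x and y2 > y  # half-open zones: the top edge y2 is excluded


def forwarding_candidates(zones, target):
    """Names of all zones that could cache a range containing the query range."""
    return [name for name, z in zones.items() if may_hold_containing_result(z, target)]


# Hypothetical layout: the space [20,80) x [20,80) cut into four quadrants.
zones = {
    "A": (20, 20, 50, 50),
    "B": (20, 50, 50, 80),
    "C": (50, 50, 80, 80),  # target zone of <55,70>
    "D": (50, 20, 80, 50),
}
candidates = forwarding_candidates(zones, (55, 70))
```

The target zone itself appears among the candidates because part of it also lies to the upper-left of the target point.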

  15. Forwarding (2)
     • Again: diagonal zones are especially interesting → guaranteed to have a result containing the given range
     • Because: every point in the diagonal zone represents a range that contains the query range → every point lies to the upper-left of the target point
     • BUT: a zone may not have a diagonal zone
     [Figure: zones 1 to 7]

  16. Updates
     • a tuple t with range attribute A = k is updated
     • send an update message to the target zone containing (k,k)
     • tuple t is included in all ranges <a,b> such that a ≤ k ≤ b
     • → forward the update to all zones that lie to the upper-left of the target zone
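At a single node, finding the cached ranges touched by an update can be sketched as below. Dropping the stale entries is an assumption; the slides only say that the update is forwarded, not how a node reacts to it. The names `affected_ranges` and `apply_update` are hypothetical.

```python
def affected_ranges(cache, k):
    """Cached ranges <a,b> that include the updated attribute value k."""
    return [(a, b) for (a, b) in cache if a <= k <= b]


def apply_update(cache, k):
    """Remove every cached result whose range covers k (assumed invalidation
    policy) and return the removed ranges."""
    stale = affected_ranges(cache, k)
    for key in stale:
        del cache[key]
    return stale


# Cache keys are ranges <a,b>; values stand in for cached query results.
cache = {
    (55, 70): "result for <55,70>",
    (10, 20): "result for <10,20>",
    (60, 60): "result for <60,60>",
}
removed = apply_update(cache, 60)
```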

  17. Conclusion
     • Does not assume a naturally uniform distribution of attribute values
     • Efficient average path length (O( ))
     • BUT: hot-spot nodes in the upper-left section → many splits → heavy partitioning → longer path length
     • cached results may not reflect the current result
     • → updates / deletions are expensive

  18. Questions?
