Seminar: Information Management in the Web

Query Processing Over Peer-to-Peer Data Sharing Systems (UC Santa Barbara)
Thomas Zahn, CST

Motivation
• Example: find all objects whose attribute values (NOT hash IDs!) are between 100 and 200
• DHTs support range queries poorly
• Due to hashing, semantically adjacent objects can be stored at "opposite" ends of the overlay
• → for each value in the range, a separate lookup needs to be issued

Overlay Object Placement
[Figure: overlay ID space from 0 to 2^128 - 1. The consecutive attribute values 15, 16 and 17 are placed at distant IDs: h(15) = d1a08e, h(16) = 3102ab, h(17) = d46a1c.]
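
The scattering shown on this slide can be reproduced with a minimal sketch. It assumes a 128-bit ID taken from a truncated SHA-256 digest; the slide does not name the hash function, so `overlay_id` and the choice of hash are hypothetical, and the resulting IDs will differ from the illustrative values on the slide.

```python
import hashlib

ID_BITS = 128  # size of the overlay ID space, as on the slide


def overlay_id(value: int) -> int:
    """Map an attribute value to a 128-bit overlay ID.

    Assumption: SHA-256 truncated to 16 bytes stands in for the
    (unspecified) hash function h of the overlay.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")


ids = {value: overlay_id(value) for value in (15, 16, 17)}
for value, oid in ids.items():
    # Show each ID as a relative position in the ID space, in [0, 1).
    print(value, f"{oid / 2**ID_BITS:.3f}")
```

Consecutive attribute values end up at unrelated positions, which is why a range cannot be resolved with a single lookup.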

Problem
• for each value in the range, a separate lookup would have to be issued
• theoretically possible for discrete sets (e.g. [10,11,12,…,50])
• completely impossible for continuous sets (e.g. [10.0, 50.0])

General Concept (1)
• uses a 2-dimensional CAN virtual space
• the virtual space is partitioned into rectangular zones
• each zone is owned by an active node
• each node maintains a routing table (RT) with its neighbors
[Figure: virtual space with corners (20,20), (80,20), (20,80) and (80,80), partitioned into zones numbered 1 to 7.]

General Concept (2)
• a node stores the results of queries whose ranges are hashed to its zone
• range query <a,b> is hashed to target point (a,b) → target zone, target node
• the result of the range query is stored at the target node/zone
• e.g. range query <55,70>
[Figure: virtual space with corners (20,20), (80,20), (20,80) and (80,80), partitioned into zones numbered 1 to 7.]
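
The mapping from a range query to its target zone can be sketched as follows. The four-zone partition, the owner names, and the half-open zone boundaries are assumptions made for illustration; they are not the partition from the slide's figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    owner: str
    x1: float  # left edge (inclusive)
    y1: float  # bottom edge (inclusive)
    x2: float  # right edge (exclusive)
    y2: float  # top edge (exclusive)

    def contains(self, point):
        x, y = point
        return self.x1 <= x < self.x2 and self.y1 <= y < self.y2


def target_point(a, b):
    """A range query <a,b> is hashed to the point (a,b)."""
    return (a, b)


def target_zone(zones, point):
    """Return the zone that owns the point, or None if no zone covers it."""
    for zone in zones:
        if zone.contains(point):
            return zone
    return None


# Hypothetical partition of the space [20,80) x [20,80) into four zones.
ZONES = [
    Zone("n1", 20, 20, 50, 50),
    Zone("n2", 50, 20, 80, 50),
    Zone("n3", 20, 50, 50, 80),
    Zone("n4", 50, 50, 80, 80),
]

zone = target_zone(ZONES, target_point(55, 70))
print(zone.owner)
```

With this partition, the example query <55,70> lands in the zone owned by `n4`, which would then store the query result.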

General Concept (3)
• given two range queries r1:<a1,b1> and r2:<a2,b2>
• two target points t1 (r1) and t2 (r2)
• if a1 < a2 → t1 lies to the left of t2
• if b1 < b2 → t1 lies below t2
• t1 lies to the upper-left of t2 iff range r1 contains range r2
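
The equivalence between "upper-left" and "contains" can be checked exhaustively on a small integer domain. This is a sketch under two assumptions: ranges are closed integer intervals, and equal endpoints count as containment (so the comparisons are non-strict). The function names are hypothetical.

```python
from itertools import combinations_with_replacement


def is_upper_left(t1, t2):
    """True if target point t1 lies to the upper-left of (or on) t2."""
    return t1[0] <= t2[0] and t1[1] >= t2[1]


def contains(r1, r2):
    """Set-based check that closed integer range r1 contains r2."""
    values1 = set(range(r1[0], r1[1] + 1))
    values2 = set(range(r2[0], r2[1] + 1))
    return values2 <= values1


# All ranges <a,b> with 20 <= a <= b <= 30.
ranges = list(combinations_with_replacement(range(20, 31), 2))

# Compare the geometric test with the set-based test for every pair.
agree = all(
    is_upper_left(r1, r2) == contains(r1, r2)
    for r1 in ranges
    for r2 in ranges
)
print(agree)
```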

General Concept (4)
• range query <x,y> is hashed into zone A
• if a prior range query result containing <x,y> exists, it must have been hashed to a point in the shaded region
• any zone intersecting the shaded region can potentially contain a result
[Figure: zones A, B, C and D; the target point (x,y) lies in zone A, and the region to its upper-left is shaded.]

General Concept (5)
• two target points t1 (r1) and t2 (r2)
• → t1 lies to the upper-left of t2 iff range r1 contains range r2
• Diagonal zone: zone z with corners (x1,y1),(x2,y2), zone z' with corners (a1,b1),(a2,b2)
• z' is a diagonal zone of z if a2 ≤ x1 and b1 ≥ y2
• Intuitively: z' is diagonally above the upper-left corner of z
• only non-empty zones exist
• → any result cached in a diagonal zone of z can answer ALL range queries that hash into z
[Figure: zone z with neighboring zones B and C and a diagonal zone z'.]
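
The diagonal-zone condition translates directly into a predicate. In this sketch a zone is a pair of corners (lower-left, upper-right), as on the slide; the function name and the example coordinates are hypothetical.

```python
def is_diagonal_zone(z_prime, z):
    """True if z_prime is a diagonal zone of z.

    Each zone is ((x_low, y_low), (x_high, y_high)).
    Condition from the slide: a2 <= x1 and b1 >= y2.
    """
    (a1, b1), (a2, b2) = z_prime
    (x1, y1), (x2, y2) = z
    return a2 <= x1 and b1 >= y2


z = ((50, 20), (80, 50))        # zone in the lower right
upper_left = ((20, 50), (50, 80))  # touches z only at z's upper-left corner
above = ((50, 50), (80, 80))       # directly above z, not diagonal

print(is_diagonal_zone(upper_left, z))
print(is_diagonal_zone(above, z))
```

Every point of `upper_left` has an x-coordinate no larger and a y-coordinate no smaller than any point of `z`, so every range stored there contains every range that hashes into `z`.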

Zone Maintenance
• initially the entire hash space is a single zone assigned to one active node
• each active node has an RT containing its neighboring active nodes along with their zone coordinates
• a zone splits when its load (storage and/or processing) is too high → decision made by the zone owner
• the owner contacts a passive node → assigns it a portion of its zone → transfers the corresponding results and neighbor list
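
A zone split might look like the sketch below. The slide only says that the owner hands over "a portion" of its zone; halving the zone along its longer side (as in CAN) is an assumption, as are the function name and the data layout. Neighbor-list transfer is left out.

```python
def split_zone(zone, results):
    """Split a zone in half and partition its cached results.

    zone    -- (x1, y1, x2, y2), half-open on the upper edges
    results -- dict mapping target points (a, b) to cached results
    Returns ((kept_zone, kept_results), (given_zone, given_results)).
    """
    x1, y1, x2, y2 = zone
    # Assumption: split along the longer side, at the midpoint.
    axis = 0 if (x2 - x1) >= (y2 - y1) else 1
    if axis == 0:
        mid = (x1 + x2) / 2
        kept_zone, given_zone = (x1, y1, mid, y2), (mid, y1, x2, y2)
    else:
        mid = (y1 + y2) / 2
        kept_zone, given_zone = (x1, y1, x2, mid), (x1, mid, x2, y2)

    # Results whose target point falls in the given half move to the new node.
    kept = {p: r for p, r in results.items() if p[axis] < mid}
    given = {p: r for p, r in results.items() if p[axis] >= mid}
    return (kept_zone, kept), (given_zone, given)


cache = {(30, 70): "result A", (55, 70): "result B"}
(old_zone, old_cache), (new_zone, new_cache) = split_zone((20, 20, 80, 80), cache)
print(old_zone, sorted(old_cache))
print(new_zone, sorted(new_cache))
```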

Query Routing (1)
• the result is likely to be cached at the target zone → the range query is routed through the virtual space toward its target zone
• starting at the requesting zone, each zone passes the query on to a neighboring zone
• a zone chooses the neighbor zone whose coordinates are closest to the target point
• the process continues until the target zone is reached

Query Routing (2)
• simple way: compute the Euclidean distance between the target point and the center of each zone
• → routing might not converge
[Figure: zones 1 to 4 and target point t.]

Query Routing (3)
• the distance of target t from a zone Z should be measured as the shortest distance from t to any point of the zone
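
The two distance measures can be compared in a short sketch. The zone coordinates, the neighbor names and the helper functions are hypothetical; the example is chosen so that the center-based measure prefers a neighbor that does not contain the target.

```python
import math


def center_distance(t, zone):
    """Euclidean distance from t to the center of zone (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = zone
    return math.hypot((x1 + x2) / 2 - t[0], (y1 + y2) / 2 - t[1])


def distance_to_zone(t, zone):
    """Shortest Euclidean distance from t to any point of the zone."""
    x1, y1, x2, y2 = zone
    dx = max(x1 - t[0], 0.0, t[0] - x2)  # 0 if t is within the x-extent
    dy = max(y1 - t[1], 0.0, t[1] - y2)  # 0 if t is within the y-extent
    return math.hypot(dx, dy)


def next_hop(t, neighbors, metric):
    """Pick the neighbor whose zone is closest to t under the given metric."""
    return min(neighbors, key=lambda name: metric(t, neighbors[name]))


t = (78, 70)
neighbors = {
    "A": (20, 60, 80, 80),  # wide zone that actually contains t
    "B": (80, 50, 90, 60),  # small zone whose center happens to be nearer
}

print(next_hop(t, neighbors, center_distance))
print(next_hop(t, neighbors, distance_to_zone))
```

Here the center-based choice is `B`, although `A` contains the target; measuring against the whole zone picks `A`.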

Forwarding (1)
• query reaches target zone → check local cache
• if no result containing the query range is found → forward the query
• → only zones to the upper-left of the target point can have a result containing the given range
• forwarding is similar to flooding → Forward Limit 0.0 – 1.0
[Figure: zones numbered 2 to 11 around the target zone.]
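
Selecting the zones that are worth forwarding to can be sketched as a rectangle test against the region to the upper-left of the target point. The 3x3 grid, the zone naming by (column, row) and the half-open zone boundaries are assumptions; the Forward Limit is not modeled because the slide does not define how it is applied.

```python
def in_zone(zone, point):
    """True if the point lies in the half-open zone (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = zone
    return x1 <= point[0] < x2 and y1 <= point[1] < y2


def intersects_upper_left(zone, target):
    """True if the zone overlaps the region {x <= a, y >= b} of target (a, b)."""
    x1, y1, x2, y2 = zone
    a, b = target
    return x1 <= a and y2 > b


# Hypothetical 3x3 grid over [20,80) x [20,80), zones named (column, row).
GRID = {
    (c, r): (20 + 20 * c, 20 + 20 * r, 40 + 20 * c, 40 + 20 * r)
    for c in range(3)
    for r in range(3)
}

target = (55, 45)  # lands in zone (1, 1)
candidates = sorted(
    name
    for name, zone in GRID.items()
    if intersects_upper_left(zone, target) and not in_zone(zone, target)
)
print(candidates)
```

Only the zones to the left of, above, and diagonally above-left of the target zone remain; all other zones cannot hold a containing range.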

Forwarding (2)
• Again: diagonal zones are especially interesting → any result cached there is guaranteed to contain the given range
• Because: every point in the diagonal zone lies to the upper-left of the target point → every range hashed there contains the given range
• BUT: a zone may not have a diagonal zone
[Figure: zones numbered 1 to 7.]

Updates
• tuple t with range attribute A=k is updated
• send an update message to the target zone containing (k,k)
• tuple t is included in all ranges <a,b> such that a ≤ k ≤ b
• → forward to all zones that lie to the upper-left of the target zone
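
Which cached results an update touches follows from the condition a ≤ k ≤ b. The sketch below assumes a flat list of cached ranges and a hypothetical function name; a real node would only see the ranges stored in its own zone.

```python
def affected_ranges(cached_ranges, k):
    """Return the cached ranges <a,b> that include the updated value k."""
    return [(a, b) for (a, b) in cached_ranges if a <= k <= b]


cached = [(20, 40), (30, 70), (55, 70), (60, 80)]
stale = affected_ranges(cached, 60)
print(stale)

# Every affected range has its target point to the upper-left of (k, k).
print(all(a <= 60 and b >= 60 for (a, b) in stale))
```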

Conclusion
• Does not assume a naturally uniform distribution of attribute values
• Efficient average path length (O( ))
• BUT: hot spot nodes in the upper-left section → many splits → heavy partitioning → longer path length
• cached results may not reflect the current result
• → updates / deletions are expensive

Questions?
