Supporting Top- k join Queries in Relational Databases
130 likes | 253 Vues
This presentation discusses techniques for optimizing top-k join queries in relational databases. It highlights the challenges of conducting searches across multiple features, which often yield different rankings. A novel algorithm called Rank-Join is introduced, which optimizes the join and ranking process simultaneously. This method generates valid join combinations, computes scores for each, and efficiently manages top-k results using priority queues and various join strategies like Ripple Join and Hash Rank Join. The presentation emphasizes the benefits of these approaches in reducing costs and improving query performance.
Supporting Top- k join Queries in Relational Databases
E N D
Presentation Transcript
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington
Introduction • Often searches are done on multiple features • Each feature produces a different ranking for the query • Must thus join and aggregate rankings on different features
Example • Find location for a house such that the combination of the cost of the house and 5 years tuition at a nearby school is minimal. • Exact location is not predefined in query, per location the house and school features would have to be analyzed.
Motivation • Current techniques decouple join and sorting (ranking) of results. • Sorting is expensive and is a blocking operation. • More apparent if ranking and the joining features are different.
Rank-Join Algorithm • Generate new valid join combinations • Compute score for each combination • For each incoming input, calculate the total score of: • The last seen feature value and the top ranked feature value for all other features in the query. • Store the maximum of these as T (threshold) • Store top k in priority queue. • Halt when lowest value of queue ≥ T
Optimality • Is Instance Optimal over all correct top-K join algorithms. • Guarantees that cost of Rank-Join is O (cost of any other algorithm). • Mathematically: • Cost(Rank-Join) ≤ c*Cost(Any Other Algorithm) + c’ • c is the optimality ratio • c, c’ > 0
Rank-Join Continued … • Join strategy crucial • Recommended: Ripple Join • Alternates between tuples • Flexible in the way it sweeps out (rectangular, etc) • Retains ordering in considering samples • Variant of Rank-Join • Hash Rank Join (HRJN) • Block Ripple Join
Hash Rank Join (HRJN) Operator • Built on idea of hash ripple join • Inputs are as two hash tables • Maintains highest (first) and lowest (last selected) objects from each relation. • Results are added to a priority queue • Advantages: • Smaller space requirement • Can be pipelined
Hash Rank Join (HRJN) Operator: Problems • Local Ranking Problem • Results from three or more input streams • Larger queue sizes • More database accesses • Buffer Problem • Cannot predict how many partial joins will result
HRJN Solutions? • Block Ripple Joins • Do comparisons as blocks • Score-Guided Strategy • If thresholds are very different, then this may be because of the way one of the rankings is larger and descends at a slower rate • Can then take more inputs from the slower growing ranking so that the threshold goes closer to the other thresholds
Optimal Join-Order • Try to have the least number of input records in order to get a correct ranking • No clear way of estimating the order of joins • Have a heuristic – Footrule Distance • Simple measure of similarity among two rankings. • First join the most similar rankings • This would quickly yield a join by accessing fewer records
Rank-Join Algorithm: Benefits • What can it do? • Integrates well with query plans • Produces results as fast as possible • Provides performance guarantees • Minimizes space requirements • Offers a mechanism to determine the best order of joining to execute query optimally. • Can be improved further if random access is available • Can eliminate on-the-fly duplicate elimination
References • “Supporting top-k join queries in relational databases” - Ihab Ilyas, Walid Aref, Ahmed Elmagarmid (2004) • Jing Chen : DBIR Spring 2005, CSE-UT Arlington http://ranger.uta.edu/~gdas/Courses/ Spring2005/DBIR/slides/top-k_join.ppt