Query Specific Ranking

Query Specific Ranking. CSE 6392, 02/27/2006. Contents: comparison of the FA and TA algorithms; representing the ranking problem as a geometric problem; query-specific ranking.

mitch

Presentation Transcript


  1. Query Specific Ranking CSE 6392 02/27/2006 Database Exploration

  2. Content • Comparison of the FA and TA algorithms • Representing the ranking problem as a geometric problem • Query-specific ranking

  3. Comparison between FA (Fagin's Algorithm) and TA (Threshold Algorithm) • TA is generally faster than FA: it never reads deeper into the sorted lists than FA does. • TA stops as soon as the score of the hypothetical tuple is less than the score of the k-th tuple in the top-k buffer. • TA is a bounded-buffer algorithm: it maintains only a top-k buffer. • FA maintains a candidate set of all the tuples read from each sorted list until ‘k’ objects are common to all of these sets.

  4. Comparison between FA and TA • TA is eager: as soon as it reads a tuple under sorted access, it immediately looks up the tuple's remaining attributes to compute its full score. • FA computes scores in 2 phases: - sort phase - scan phase • Both TA and FA require the scoring function to be monotonic.
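
The two phases of FA can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code. It assumes that each list holds (tuple_id, grade) pairs sorted by grade in descending order, that a higher score is better, and that every tuple appears in every list. The name `fagin_top_k` and the sample data are hypothetical, and a dictionary lookup stands in for random access.

```python
import heapq

def fagin_top_k(lists, score, k):
    """Sketch of FA. `lists` holds one list per attribute; each is a list of
    (tuple_id, grade) pairs sorted by grade, highest first (assumption)."""
    m = len(lists)
    grades = [dict(lst) for lst in lists]  # stands in for random access
    seen = {}      # tuple_id -> indexes of the lists where it has been read
    complete = 0   # number of tuples already read in all m lists
    depth = 0
    # Phase 1: read the lists in parallel until k tuples are common to all.
    while depth < len(lists[0]) and complete < k:
        for i, lst in enumerate(lists):
            tid, _ = lst[depth]
            where = seen.setdefault(tid, set())
            if i not in where:
                where.add(i)
                if len(where) == m:
                    complete += 1
        depth += 1
    # Phase 2: compute the full score of every candidate and keep the best k.
    scored = [(score([g[tid] for g in grades]), tid) for tid in seen]
    return heapq.nlargest(k, scored)

# Hypothetical data: two attributes, four tuples.
lists = [
    [("a", 9), ("b", 8), ("c", 3), ("d", 1)],
    [("b", 9), ("a", 7), ("d", 6), ("c", 2)],
]
print(fagin_top_k(lists, sum, 2))
```

Note that the candidate set `seen` can grow well beyond k entries, which is the contrast with TA's bounded buffer.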

  5. Why does TA work? • The stopping condition for TA is: • Score(hypothetical tuple) < Score(k-th tuple in top-k buffer) • The hypothetical tuple is built from the last value read from each sorted list. • By the monotonic property, the score of any unseen tuple can be no greater than the score of the hypothetical tuple, so no unseen tuple can enter the top-k.
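
The stopping condition can be seen at work in a short Python sketch. It is an illustration under the same assumptions as Fagin-style sorted lists: each list holds (tuple_id, grade) pairs sorted by grade in descending order, all lists have the same length, and a higher score is better. The name `threshold_top_k` and the sample data are hypothetical, and a dictionary lookup stands in for random access.

```python
import heapq

def threshold_top_k(lists, score, k):
    """Sketch of TA with a top-k buffer that never holds more than k tuples."""
    grades = [dict(lst) for lst in lists]  # stands in for random access
    top = []      # min-heap of (score, tuple_id); top[0] is the k-th best
    seen = set()
    for depth in range(len(lists[0])):
        last = []  # last grade read from each list at this depth
        for lst in lists:
            tid, grade = lst[depth]
            last.append(grade)
            if tid not in seen:
                seen.add(tid)
                # Eager step: compute the full score right away.
                s = score([g[tid] for g in grades])
                if len(top) < k:
                    heapq.heappush(top, (s, tid))
                elif s > top[0][0]:
                    heapq.heapreplace(top, (s, tid))
        # The hypothetical tuple combines the last grade seen in each list.
        threshold = score(last)
        # Stop once the hypothetical tuple scores below the k-th buffered tuple.
        if len(top) == k and threshold < top[0][0]:
            break
    return sorted(top, reverse=True)

# Hypothetical data: two attributes, four tuples.
lists = [
    [("a", 9), ("b", 8), ("c", 3), ("d", 1)],
    [("b", 9), ("a", 7), ("d", 6), ("c", 2)],
]
print(threshold_top_k(lists, sum, 2))
```

On this data the loop stops after reading two rows from each list, because the hypothetical tuple (8, 7) scores 15, below both buffered tuples.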

  6. Closing points on TA and FA • FA stops its sorted access only when ‘k’ common objects appear in the intersection of the candidate sets. • TA bounds the scores of unseen tuples by the score of the hypothetical tuple in order to stop. • Therefore, there is no way FA can stop earlier than TA. • Hence, TA is instance optimal.

  7. Query Specific Ranking • The ranking functions we have discussed so far assume a total ordering on each attribute's values. • E.g. total ordering of price: - high price is bad - low price is good • In reality, this is not always true.

  8. Query Specific Ranking • Different people will have a different ideal price in mind. • E.g. for one person, the ideal restaurant has price = $20 and capacity = 100. • In this case, the ranking function can be: • Score(<p, c>) = 5*|20-p| + 10*|100-c|
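
As a small worked example, the ranking function on this slide can be written directly in Python. The function name `score` and the sample restaurants are hypothetical. The sketch assumes that a lower score means a closer match, since the function measures weighted distance from the ideal point.

```python
def score(price, capacity, ideal_price=20, ideal_capacity=100):
    """Weighted distance from the ideal restaurant; 0 is a perfect match."""
    return 5 * abs(ideal_price - price) + 10 * abs(ideal_capacity - capacity)

print(score(20, 100))  # the ideal restaurant itself
print(score(30, 120))  # 5*10 + 10*20
print(score(15, 85))   # 5*5 + 10*15
```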

  9. Query Specific Ranking • The above ranking function is more realistic than a ranking function based on a total ordering. • But the above ranking function is not monotonic in price or capacity. • How can we find the top-k restaurants in this case without looking at the whole data set?
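
A quick check makes the loss of monotonicity concrete. The sketch below holds capacity at the ideal value of 100 and varies only the price; the helper `is_monotonic` and the chosen prices are hypothetical.

```python
def score(price, capacity):
    return 5 * abs(20 - price) + 10 * abs(100 - capacity)

def is_monotonic(values):
    """True if the sequence never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

prices = [10, 20, 30]                       # increasing price
scores = [score(p, 100) for p in prices]    # score falls, then rises
print(scores, is_monotonic(scores))
```

The score drops as the price approaches $20 and rises again beyond it, so reading tuples in price order gives no bound on the scores of unseen tuples.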

  10. Solution • Assume the data set is sorted on each of the attributes of interest. • First, create transformed attributes from the original attributes involved in the ranking function such that the ranking function is monotonic in the transformed attributes. • Second, simulate sorted access on the transformed attributes.

  11. Transformed attributes • Consider the restaurant example where: Score(<p, c>) = 5*|20-p| + 10*|100-c| • Transformed attributes are: • ∆p = |20-p|, the absolute difference between the price and the ideal price • ∆c = |100-c|, the absolute difference between the capacity and the ideal capacity • Suppose tid1 = <$30, 120>, then <∆p, ∆c> = <10, 20>; tid2 = <$15, 85>, then <∆p, ∆c> = <5, 15>.
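
The transformation for the two example tuples can be reproduced in a few lines of Python. The function names are hypothetical; the ideal point (price $20, capacity 100) and the weights 5 and 10 come from the slide.

```python
def transform(price, capacity, ideal_price=20, ideal_capacity=100):
    """Map <price, capacity> to the transformed attributes <dp, dc>."""
    return abs(ideal_price - price), abs(ideal_capacity - capacity)

def transformed_score(dp, dc):
    """The same ranking function, now a weighted sum of dp and dc."""
    return 5 * dp + 10 * dc

tid1 = transform(30, 120)
tid2 = transform(15, 85)
print(tid1, transformed_score(*tid1))
print(tid2, transformed_score(*tid2))
```

Because `transformed_score` is a weighted sum with positive weights, increasing ∆p or ∆c can never decrease it, which is the monotonic property that TA needs.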

  12. Simulating sorted access • Achieving monotonicity is just part of the problem. We also need sorted access on the transformed (∆p and ∆c) attributes. • Suppose the data is presorted on the ‘price’ attribute. • Without re-sorting the whole dataset on ∆p, we can go directly to the ‘sweet spot’ (i.e. price = $20 & capacity = 100) using a B+ tree index. • From this point, do 2 walks in opposite directions and merge them to obtain ∆p (and likewise ∆c) in sorted order.
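
The two-walk merge can be sketched in Python as follows. Here a sorted Python list searched with `bisect_left` stands in for the B+ tree index, which is an assumption made for illustration; the generator name `sorted_by_distance` and the price values are hypothetical.

```python
from bisect import bisect_left

def sorted_by_distance(sorted_values, ideal):
    """Yield (delta, value) pairs in non-decreasing order of |value - ideal|,
    given values that are already sorted in ascending order."""
    right = bisect_left(sorted_values, ideal)  # jump to the sweet spot
    left = right - 1
    while left >= 0 or right < len(sorted_values):
        # Take from the left walk if the right walk is exhausted, or if the
        # left value is at least as close to the ideal as the right value.
        take_left = right >= len(sorted_values) or (
            left >= 0
            and ideal - sorted_values[left] <= sorted_values[right] - ideal
        )
        if take_left:
            yield ideal - sorted_values[left], sorted_values[left]
            left -= 1
        else:
            yield sorted_values[right] - ideal, sorted_values[right]
            right += 1

prices = [5, 15, 18, 20, 30, 45]  # presorted on price
walk = list(sorted_by_distance(prices, 20))
print(walk)
```

Since the result is produced lazily, an algorithm such as TA can consume only as many entries as it needs before its stopping condition fires.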

  13. Adding Selection • This explains how hard conditions are added to a ranking function and handled. • E.g. look for restaurants in Arlington • location = “Arlington” is a hard condition

  14. Handling hard conditions • The query will look like this: Select top[10] From restaurants Where location = “Arlington” Order by 5*abs(120 - price) • How do we solve this query?

  15. Handling hard conditions • Method 1: do selection first, then do ranking. • This is not the best method, for the following reasons: • If selection produces a big result, it defeats the purpose of doing ranking. • If selection produces a small result, then ranking it is overkill. • The raw data is presorted, and doing a selection first on this raw data will destroy the order of the tuples. TA requires the data to be presorted.
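
For reference, the select-first approach can be sketched in Python. The table contents and the name `select_then_rank` are hypothetical; the ordering expression 5*abs(120 - price) is taken from the example query, and the sketch assumes ascending order, so the smallest value ranks first.

```python
import heapq

# Hypothetical restaurant table.
restaurants = [
    {"name": "A", "location": "Arlington", "price": 100},
    {"name": "B", "location": "Dallas", "price": 120},
    {"name": "C", "location": "Arlington", "price": 125},
    {"name": "D", "location": "Arlington", "price": 60},
]

def select_then_rank(rows, location, k, ideal_price=120):
    # Step 1: selection on the hard condition scans every row.
    selected = [r for r in rows if r["location"] == location]
    # Step 2: ranking has to process the entire selection result.
    return heapq.nsmallest(
        k, selected, key=lambda r: 5 * abs(ideal_price - r["price"])
    )

top = select_then_rank(restaurants, "Arlington", 2)
print([r["name"] for r in top])
```

Both steps touch every qualifying row, so nothing in this approach allows early termination in the way TA does.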

  16. Handling hard conditions • The second method is to integrate selection into the ranking function. • Score(<l, p, c>) = if l = “Arlington” then 5*|20-p| + 10*|100-c| else 0

  17. Handling hard conditions • Now we are no longer dealing with numeric values alone. • Because of the condition location = “Arlington”, the ranking function is no longer defined on numeric data only; it also involves categorical (character) data. • How do we deal with ranking functions that involve categorical data?
