Query Specific Ranking

CSE 6392, 02/27/2006, Database Exploration

Contents
• Comparison of the FA and TA algorithms
• Representing the ranking problem as a geometric problem
• Query Specific Ranking

Comparison between the FA and TA algorithms
• TA is faster than FA: it never reads deeper into the sorted lists than FA does.
• TA stops as soon as the score of the hypothetical tuple is no greater than the score of the k-th tuple in the top-k buffer.
• TA is a bounded-buffer algorithm: it maintains only a top-k buffer.
• FA maintains a set of candidates containing every tuple read so far, and keeps reading until k objects have been seen in all of the sorted lists.

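Comparison between FA and TA
• TA computes scores eagerly: as soon as it reads a tuple under sorted access, it performs random accesses on the other attributes to compute that tuple's full score.
• FA computes scores in 2 phases:
  - sorted-access phase (read the lists until k objects are common to all of them)
  - random-access phase (compute the full score of every candidate seen)
• Both the TA and FA algorithms require the scoring function to be monotonic.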
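
A minimal Python sketch of FA's two phases may help. It assumes a hypothetical in-memory input: each sorted list is a Python list of (object_id, score) pairs ordered highest score first, every object appears in every list, and higher aggregate scores are better. Random access is simulated with dictionaries. The names `fagin_fa`, `agg` and the toy lists are illustrative, not taken from the lecture.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical input format: each list holds (object_id, attribute_score) pairs,
# sorted by attribute_score from highest to lowest; higher scores are better.
SortedList = List[Tuple[str, float]]


def fagin_fa(lists: Sequence[SortedList], k: int,
             agg: Callable[[Sequence[float]], float] = sum) -> List[Tuple[str, float]]:
    """Sketch of Fagin's Algorithm (FA); `agg` must be monotonic."""
    m = len(lists)
    max_depth = max(len(lst) for lst in lists)
    # Random access is simulated with one dict per attribute: object_id -> score.
    lookup = [dict(lst) for lst in lists]
    seen_in: Dict[str, Set[int]] = {}  # candidate set: object_id -> lists it was seen in

    # Phase 1 (sorted access): read all lists in parallel, one row at a time,
    # until k objects have been seen in every list.
    depth = 0
    while depth < max_depth and sum(len(s) == m for s in seen_in.values()) < k:
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, _ = lst[depth]
                seen_in.setdefault(obj, set()).add(i)
        depth += 1

    # Phase 2 (random access): compute the full score of every candidate.
    scored = [(obj, agg([lookup[i][obj] for i in range(m)])) for obj in seen_in]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    list_a = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.1)]
    list_b = [("b", 0.9), ("c", 0.8), ("a", 0.3), ("d", 0.2)]
    print(fagin_fa([list_a, list_b], k=2))
```

Note that FA keeps every tuple it has read as a candidate, which is why its buffer is unbounded.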

Why does TA work?
• The stopping condition for TA is:
  Score(hypothetical tuple) ≤ Score(k-th tuple in the top-k buffer)
• The hypothetical tuple takes, for each attribute, the last value read under sorted access. By the monotonic property, the score of any unseen tuple is at most the score of the hypothetical tuple, so no unseen tuple can displace the current top k.
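
The stopping rule can be made concrete with a Python sketch of TA under similar assumptions: hypothetical (object_id, score) lists sorted highest first, all of equal length, higher scores better, and dictionary lookups standing in for random access. The min-heap is the bounded top-k buffer, and `threshold` is the score of the hypothetical tuple. The function name `threshold_ta` is illustrative.

```python
import heapq
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical input format: each list holds (object_id, score) pairs sorted
# from highest to lowest score; every object appears in every list.
SortedList = List[Tuple[str, float]]


def threshold_ta(lists: Sequence[SortedList], k: int,
                 agg: Callable[[Sequence[float]], float] = sum) -> List[Tuple[str, float]]:
    """Sketch of the Threshold Algorithm (TA); `agg` must be monotonic."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]   # simulated random access
    buffer: List[Tuple[float, str]] = []    # bounded top-k buffer (min-heap on score)
    seen: Set[str] = set()

    for depth in range(len(lists[0])):
        last_read = []
        for lst in lists:
            obj, attr_score = lst[depth]
            last_read.append(attr_score)
            if obj in seen:
                continue
            seen.add(obj)
            # Eager scoring: random access to all attributes of the new object.
            score = agg([lookup[i][obj] for i in range(m)])
            if len(buffer) < k:
                heapq.heappush(buffer, (score, obj))
            elif score > buffer[0][0]:
                heapq.heapreplace(buffer, (score, obj))
        # The hypothetical tuple combines the last values read in each list.
        threshold = agg(last_read)
        if len(buffer) == k and buffer[0][0] >= threshold:
            break  # no unseen object can beat the k-th buffered score

    return sorted(((obj, s) for s, obj in buffer), key=lambda p: p[1], reverse=True)
```

The buffer never holds more than k entries, which is what makes TA a bounded-buffer algorithm.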

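Closing points on TA and FA
• FA stops only when k objects are common to all the candidate sets, i.e., have been seen in every sorted list.
• TA stops by using the score of the hypothetical tuple as an upper bound on the score of every unseen tuple.
• Therefore FA can never stop earlier than TA: when FA stops at some depth, each of its k common objects has, in every list, a value at least as high as the value at that depth, so by monotonicity each scores at least as high as the hypothetical tuple, and TA's stopping condition already holds.
• TA is moreover instance optimal, a stronger property than never reading deeper than FA.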
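
The claim that FA never stops before TA can be checked on small inputs by counting how many rows of sorted access each algorithm reads. This sketch assumes hypothetical toy lists of (object_id, score) pairs, highest first, all of equal length, with sum as the monotonic scoring function; `EXAMPLE` is made-up data and the function names are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

SortedList = List[Tuple[str, float]]  # (object_id, score), highest score first


def fa_stop_depth(lists: Sequence[SortedList], k: int) -> int:
    """Rows of sorted access FA reads before k objects are seen in every list."""
    m = len(lists)
    counts: Dict[str, int] = {}
    for depth in range(len(lists[0])):
        for lst in lists:
            obj = lst[depth][0]
            counts[obj] = counts.get(obj, 0) + 1
        if sum(c == m for c in counts.values()) >= k:
            return depth + 1
    return len(lists[0])


def ta_stop_depth(lists: Sequence[SortedList], k: int,
                  agg: Callable[[Sequence[float]], float] = sum) -> int:
    """Rows of sorted access TA reads before its stopping condition holds."""
    lookup = [dict(lst) for lst in lists]
    scores: Dict[str, float] = {}
    for depth in range(len(lists[0])):
        for lst in lists:
            obj = lst[depth][0]
            scores[obj] = agg([table[obj] for table in lookup])
        threshold = agg([lst[depth][1] for lst in lists])  # hypothetical tuple
        best = sorted(scores.values(), reverse=True)[:k]
        if len(best) == k and best[-1] >= threshold:
            return depth + 1
    return len(lists[0])


# Made-up data: the best object "a" tops one list but sits deep in the other.
EXAMPLE = [
    [("a", 1.0), ("c", 0.2), ("e", 0.15), ("b", 0.1), ("d", 0.05), ("f", 0.0)],
    [("b", 0.5), ("d", 0.4), ("f", 0.3), ("a", 0.2), ("c", 0.1), ("e", 0.0)],
]

if __name__ == "__main__":
    print("FA depth:", fa_stop_depth(EXAMPLE, k=1))
    print("TA depth:", ta_stop_depth(EXAMPLE, k=1))
```

On `EXAMPLE` with k = 1, FA reads 4 rows and TA reads 2: TA obtains the full score of "a" by random access in the first row and stops once the hypothetical tuple's score drops below it, while FA must wait until some object has surfaced in both lists.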

Query Specific Ranking
• The ranking functions we have discussed so far assume a total ordering on each attribute.
• E.g., a total ordering on price:
  - a high price is bad
  - a low price is good
• In reality, this is not always true.

Query Specific Ranking
• Different people have different ideal prices in mind.
• E.g., for one person, the ideal restaurant has price = $20 and capacity = 100.
• In this case, the ranking function can be:
  Score(<p, c>) = 5*|20 - p| + 10*|100 - c|
• Here a lower score means a closer match to the ideal.

Query Specific Ranking
• This ranking function is more realistic than one based on a total ordering.
• But it is not monotonic: as p increases, |20 - p| first decreases (until p = $20) and then increases.
• How can we find the top-k restaurants in this case without looking at the whole data set?
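
A small Python sketch of the ranking function above, with a check that exposes its non-monotonicity in price. The function names are hypothetical; the weights and the ideal <$20, 100> come from the example.

```python
from typing import Iterable


def restaurant_score(price: float, capacity: float) -> float:
    """Weighted distance from the ideal <$20, 100>; lower is better."""
    return 5 * abs(20 - price) + 10 * abs(100 - capacity)


def is_monotonic_in_price(prices: Iterable[float], capacity: float = 100) -> bool:
    """True if the score only rises or only falls as price increases (capacity fixed)."""
    scores = [restaurant_score(p, capacity) for p in sorted(prices)]
    pairs = list(zip(scores, scores[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


if __name__ == "__main__":
    print(restaurant_score(30, 120), restaurant_score(15, 85))  # <$30, 120> and <$15, 85>
    print(is_monotonic_in_price([10, 20, 30]))  # scores go 50 -> 0 -> 50
```

The second call returns False, which is exactly why TA cannot be run directly on the raw price and capacity columns.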

Solution
• Assume the data set is sorted (e.g., indexed) on every attribute of interest.
• First, create transformed attributes from the attributes in the ranking function, such that the ranking function is monotonic in the transformed attributes.
• Second, simulate sorted access on the transformed attributes.

Transformed attributes
• Consider the restaurant example, where Score(<p, c>) = 5*|20 - p| + 10*|100 - c|.
• The transformed attributes are:
  - ∆p = |p - 20|, the distance of the price from the ideal price
  - ∆c = |c - 100|, the distance of the capacity from the ideal capacity
• The score becomes 5*∆p + 10*∆c, which is monotonic in ∆p and ∆c.
• For example:
  - tid1 = <$30, 120> gives <∆p, ∆c> = <10, 20>
  - tid2 = <$15, 85> gives <∆p, ∆c> = <5, 15>
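
The transformation can be sketched as follows, assuming the ideal <$20, 100> and the weights 5 and 10 from the example; `transform` and `transformed_score` are illustrative names. Because both weights are positive, the transformed score never decreases when ∆p or ∆c increases, which is the monotonicity a TA-style algorithm needs (with lower scores being better).

```python
from typing import Tuple

IDEAL = (20.0, 100.0)   # sweet spot (price, capacity) from the example
WEIGHTS = (5.0, 10.0)   # weights of |20 - p| and |100 - c|


def transform(tuple_pc: Tuple[float, float]) -> Tuple[float, ...]:
    """Map <p, c> to <delta_p, delta_c> = <|p - 20|, |c - 100|>."""
    return tuple(abs(v - ideal) for v, ideal in zip(tuple_pc, IDEAL))


def transformed_score(deltas: Tuple[float, ...]) -> float:
    """5*delta_p + 10*delta_c: non-decreasing in every delta, i.e. monotonic."""
    return sum(w * d for w, d in zip(WEIGHTS, deltas))


if __name__ == "__main__":
    for tid, t in {"tid1": (30, 120), "tid2": (15, 85)}.items():
        d = transform(t)
        print(tid, d, transformed_score(d))
```

For tid1 this gives 5*10 + 10*20 = 250, and for tid2 5*5 + 10*15 = 175, the same values the original (non-monotonic) function produces.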

Simulating sorted access
• Achieving monotonicity is only part of the problem: we also need sorted access on the transformed attributes ∆p and ∆c.
• Suppose the data is presorted on the 'price' attribute, e.g., through a B+ tree index.
• Without re-sorting the whole data set by ∆p, we can use the B+ tree index to go directly to the 'sweet spot' (price = $20); likewise, an index on capacity leads directly to capacity = 100.
• From the sweet spot, do 2 walks in opposite directions (toward lower and toward higher values) and merge them by distance from the sweet spot; this produces tuples in increasing order of ∆p (respectively ∆c).
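
A sketch of the two walks, assuming the B+ tree's leaf level can be modeled as a Python list of (value, tid) entries sorted by value; `bisect` plays the role of the index seek, and `walk_from_sweet_spot` is a hypothetical name. At every step the generator emits whichever walk's next entry is closer to the sweet spot, so tuples come out in increasing ∆ order. Because it is a generator, a top-k algorithm can pull entries lazily and stop early without touching the rest of the index.

```python
import bisect
from typing import Iterator, List, Tuple


def walk_from_sweet_spot(index: List[Tuple[float, str]],
                         ideal: float) -> Iterator[Tuple[float, str]]:
    """Yield (delta, tid) in increasing |value - ideal|.

    `index` stands in for a B+ tree leaf level: (value, tid) entries sorted by value.
    """
    # Seek: (ideal,) sorts before any (ideal, tid), so this is the first entry with value >= ideal.
    right = bisect.bisect_left(index, (ideal,))
    left = right - 1  # last entry with value < ideal
    while left >= 0 or right < len(index):
        # Merge the two walks: take whichever next entry is closer to the sweet spot.
        if right >= len(index) or (left >= 0 and ideal - index[left][0] <= index[right][0] - ideal):
            value, tid = index[left]
            left -= 1
        else:
            value, tid = index[right]
            right += 1
        yield abs(value - ideal), tid


if __name__ == "__main__":
    price_index = [(8, "r1"), (15, "r2"), (19, "r3"), (22, "r4"), (30, "r5"), (41, "r6")]
    for delta, tid in walk_from_sweet_spot(price_index, 20):
        print(tid, delta)
```

Running one such walk per attribute (price around 20, capacity around 100) yields simulated sorted-access streams on ∆p and ∆c, where "smallest ∆ first" plays the role of "best first".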

Adding Selection
• This part explains how hard conditions are handled in, or added to, a ranking function.
• E.g., look for restaurants in Arlington:
  - location = “Arlington” is a hard condition

Handling hard conditions
• The query will look like this:
    SELECT TOP 10 *
    FROM restaurants
    WHERE location = 'Arlington'
    ORDER BY 5*ABS(20 - price) + 10*ABS(100 - capacity)
• How can this query be evaluated?

Handling hard conditions
• The first method: do the selection first, then do the ranking.
• This is not the best method, for the following reasons:
  - If the selection produces a large result, every selected tuple must still be scored, which defeats the purpose of top-k ranking.
  - If the selection produces a small result, running a ranking algorithm on it is overkill.
  - The raw data is presorted, and doing the selection first destroys the order of the tuples, while TA requires presorted data.
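
For contrast, a sketch of the select-then-rank method, assuming a hypothetical `Restaurant` record and made-up rows. It filters first and then scores every surviving row with `heapq.nsmallest`; nothing here can stop early, and the filtered list has no sorted order left for TA to exploit.

```python
import heapq
from typing import List, NamedTuple


class Restaurant(NamedTuple):  # hypothetical schema for the running example
    name: str
    location: str
    price: float
    capacity: int


def score(r: Restaurant) -> float:
    """Running example's score; lower is better."""
    return 5 * abs(20 - r.price) + 10 * abs(100 - r.capacity)


def select_then_rank(rows: List[Restaurant], location: str, k: int) -> List[Restaurant]:
    # Step 1: apply the hard condition; the result is an unsorted intermediate list.
    selected = [r for r in rows if r.location == location]
    # Step 2: score every selected row; there is no early termination.
    return heapq.nsmallest(k, selected, key=score)


ROWS = [
    Restaurant("r1", "Arlington", 30, 120),  # score 250
    Restaurant("r2", "Arlington", 15, 85),   # score 175
    Restaurant("r3", "Dallas", 20, 100),     # score 0, but fails the hard condition
    Restaurant("r4", "Arlington", 22, 95),   # score 60
]

if __name__ == "__main__":
    print([r.name for r in select_then_rank(ROWS, "Arlington", 2)])
```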

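Handling hard conditions
• The second method is to integrate the selection into the ranking function:
  Score(<l, p, c>) = if l = “Arlington” then 5*|20 - p| + 10*|100 - c| else ∞
• Since a lower score is better here, tuples that fail the condition receive the worst possible score (∞), so they never enter the top k as long as at least k tuples satisfy the condition.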
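
A sketch of the integrated scoring function, again on a hypothetical `Restaurant` record with made-up rows. Because lower scores are better in this example, a row that fails the hard condition gets `math.inf`, the worst possible score, so it can enter the top k only if fewer than k rows satisfy the condition. The ranking step here is a plain `heapq.nsmallest` scan for illustration, not TA.

```python
import heapq
import math
from typing import List, NamedTuple


class Restaurant(NamedTuple):  # hypothetical schema for the running example
    name: str
    location: str
    price: float
    capacity: int


def integrated_score(r: Restaurant, location: str = "Arlington") -> float:
    """Hard condition folded into the score; lower is better, so a miss scores +inf."""
    if r.location != location:
        return math.inf
    return 5 * abs(20 - r.price) + 10 * abs(100 - r.capacity)


def top_k(rows: List[Restaurant], k: int) -> List[Restaurant]:
    return heapq.nsmallest(k, rows, key=integrated_score)


ROWS = [
    Restaurant("r1", "Arlington", 30, 120),  # score 250
    Restaurant("r2", "Arlington", 15, 85),   # score 175
    Restaurant("r3", "Dallas", 20, 100),     # would score 0, but fails the condition
    Restaurant("r4", "Arlington", 22, 95),   # score 60
]

if __name__ == "__main__":
    print([r.name for r in top_k(ROWS, 2)])
```

The location test is still a yes/no check on a non-numeric attribute, which is the difficulty the next slide raises.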

Handling hard conditions
• Now we are no longer dealing with numeric values alone.
• Because the condition is on location, the ranking function operates not only on numeric data but also on categorical data.
• How do we deal with ranking functions that involve categorical data?
