This talk by Davide Mottin presents a probabilistic optimization framework for the Empty-Answer Problem. When a query returns no results, the framework guides the user through interactive query relaxation, proposing one relaxation at a time so that the system adapts to the user's preferences. Using a Query Relaxation Tree, the proposed solutions choose which relaxation to propose next so as to optimize a user-centric objective while keeping the number of query modifications small. The talk covers exact and approximate algorithms and presents experimental results on real and synthetic datasets, together with a user study, that validate the framework's effectiveness.
A Probabilistic Optimization Framework for the Empty-Answer Problem. Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis. Talk by Davide Mottin at Yahoo! Research Barcelona
Who am I? • Born in Marostica • Live in Trento • I'm a member of the dbTrento group at the University of Trento • Advisors: Themis Palpanas and Yannis Velegrakis
Empty-Answer Problem: a query for a car with Alarm, DSL, and Manual over the car DB returns no answer: {}
Issues • Users need a product matching their preferences • It is difficult to propose an approximate answer close to the user's needs • The system does not provide sufficient help
Outline • Background • Query relaxation • Interactive query relaxation • User Model • Query Relaxation Tree • Problem definition • Solutions • Exact Solutions (FullTree and FastOpt) • Approximate Solution • Experimental Results • Conclusions
Existing Solutions • Ranking function • Propose ranked results that are close to the user's preferences • Both IR [Baeza11] and database solutions [Chaudhuri04] • Query relaxation • Remove or change one of the conditions in the user query [Mishra09]
Query Relaxation: the query (Alarm, DSL, Manual) over the car DB returns {}; relaxing its conditions may produce a non-empty answer
How Many Relaxations? Exponential in the size of the query
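To make the growth concrete, here is a small Python sketch. It assumes (our assumption, not the talk's definition) that a relaxation removes any non-empty subset of the query's conditions, which gives 2^n - 1 candidates for a query with n conditions; the function name `relaxations` is hypothetical.

```python
from itertools import combinations


def relaxations(query):
    """Every non-empty subset of conditions that could be dropped from `query`.

    Assumption: a relaxation removes a subset of conditions, so a query
    with n conditions has 2**n - 1 candidate relaxations.
    """
    conds = sorted(query)
    return [
        set(subset)
        for k in range(1, len(conds) + 1)
        for subset in combinations(conds, k)
    ]


print(len(relaxations({"Alarm", "DSL", "Manual"})))  # 7
```

For the three-condition car query this yields 7 candidates; a query with 10 conditions already has 1023.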
Challenges • Too many relaxations proposed • Lack of a principled method to propose the next relaxation • Exploring all the relaxations is impractical • Limited user interaction with the system • Lack of a user-centric model and motivation for a refinement. Solution: Interactive Query Relaxation
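Interactive Query Relaxation: the query (Alarm, DSL, Manual) over the car DB returns {}. The system asks one question at a time (Remove DSL? Remove Alarm?), the user answers YES or NO, and the process ends with a non-empty result: {Askari, A10, …}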
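A minimal simulation of this interaction loop, under stated assumptions: the car table, the fixed proposal order, and the scripted `ask` callback are all hypothetical, and the fixed order only stands in for the principled choice of the next relaxation developed in the rest of the talk. A YES drops the condition; a NO turns it into a hard constraint that is never proposed again.

```python
# Hypothetical toy car table: car name -> set of features it offers.
CARS = {
    "car1": {"Alarm", "Manual"},
    "car2": {"DSL", "Manual"},
    "car3": {"Alarm"},
}


def answer(query):
    """Names of the cars that satisfy every condition in `query`."""
    return sorted(name for name, feats in CARS.items() if query <= feats)


def interactive_relaxation(query, proposal_order, ask):
    """Propose one relaxation at a time until the answer is non-empty.

    proposal_order: hypothetical fixed order of conditions to propose
    ask: callback returning True (YES, drop it) or False (NO, keep it)
    Returns the final answer, the number of questions asked, and the
    conditions that became hard constraints.
    """
    query = set(query)
    hard = set()  # refused relaxations become hard constraints
    questions = 0
    for cond in proposal_order:
        if answer(query):
            break  # non-empty answer: stop asking
        if cond not in query:
            continue
        questions += 1
        if ask(cond):
            query.discard(cond)
        else:
            hard.add(cond)
    return answer(query), questions, sorted(hard)


# Scripted user: refuses to drop DSL, accepts everything else.
result = interactive_relaxation(
    {"Alarm", "DSL", "Manual"}, ["DSL", "Alarm", "Manual"], lambda c: c != "DSL"
)
print(result)  # (['car2'], 2, ['DSL'])
```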
Applications • Small mobile and hand-held devices • Interaction with an agent via telephone • Restaurant reservations • …
Query Relaxation Tree: Relax DSL? (if refused, DSL is not relaxable anymore) Relax Alarm? Problem: find the optimal path that maximizes or minimizes some user-centric quantity
Query Relaxation Tree • Nodes represent • the next relaxation proposed (relaxation nodes) • the user's Yes/No choices (choice nodes) • Leaves represent • a non-relaxable query • a non-empty query • Choice branches have probabilities relpref_yes and relpref_no • A refused relaxation cannot be relaxed further (hard constraint) • For each node we compute a cost that depends on the optimization adopted (Dynamic, Semi-Dynamic, Static)
User-Centric Model • Prior(t, Q, Q') • the user's knowledge about the existence of a tuple t satisfying the relaxed query Q' • Pref(t, Q') [preference function] • the user's preference for a tuple t given the query Q'
User-Centric Model Q: What is the probability that the user says NO to a relaxation? A: The probability that the user doesn't like any of the tuples that satisfy the relaxed query Q'
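The slide does not give the formula. One plausible way to turn this answer into a probability, assuming the user's reactions to different tuples are independent, is to multiply, over every tuple t satisfying Q', the probability that she does not both expect and like t. The `prior` and `pref` callbacks below are hypothetical stand-ins for Prior(t, Q, Q') and Pref(t, Q'), and their values are made up.

```python
from math import prod


def prob_no(tuples, prior, pref):
    """Pr(user says NO to the relaxed query Q').

    Assumption: reactions to different tuples are independent, so the user
    rejects Q' only if, for every tuple t satisfying Q', she does not both
    expect t (prior) and like it (pref).
    """
    return prod(1 - prior(t) * pref(t) for t in tuples)


# Hypothetical values: both tuples fully expected, liked with prob. 0.5 and 0.2.
likes = {"t1": 0.5, "t2": 0.2}
p_no = prob_no(["t1", "t2"], prior=lambda t: 1.0, pref=likes.get)
print(p_no)  # about 0.4
```

With no tuple satisfying Q' the product is empty and Pr(NO) = 1, which matches the intuition that the user cannot like an empty answer.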
Problem Definition Given a query Q and a database D, find the sequence of relaxations in the Query Relaxation Tree that (3 separate goals): • Minimize the number of relaxations (Dynamic) • Maximize the user satisfaction (Semi-Dynamic) • Maximize some profit/benefit (Static)
Cost function, where optimize = min if the goal is Dynamic (minimum number of steps) and max otherwise • Cost of a leaf: • 0 for Dynamic (minimize the number of steps) • the maximum preference (using Pref) of the tuples for Semi-Dynamic • the maximum value (e.g. price, revenue) of the tuples for Static
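The formula itself did not survive on this slide. A hedged reconstruction for the Dynamic goal, consistent with the term relpref · (1 + cost(n')) used on the CDR slides and with the sum (choice nodes) / optimize (relaxation nodes) combination described there:

```latex
\mathrm{cost}(n) =
\begin{cases}
  \mathrm{leaf}(n) & n \text{ is a leaf},\\
  \operatorname*{optimize}_{c \in \mathrm{children}(n)} \mathrm{cost}(c) & n \text{ is a relaxation node},\\
  \mathrm{relpref}_{yes}(n)\,\bigl(1 + \mathrm{cost}(n_{yes})\bigr)
  + \mathrm{relpref}_{no}(n)\,\bigl(1 + \mathrm{cost}(n_{no})\bigr) & n \text{ is a choice node}.
\end{cases}
```

For Dynamic, leaf(n) = 0 and optimize = min; the choice-node term shown is the step-counting one, and how Semi-Dynamic and Static combine the yes/no branches is not stated on the slide.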
Exact Solution (FullTree) Input: query Q, database D Output: optimal cost • Construct the Query Relaxation Tree • Compute the cost of each node (bottom-up) • Return the cost of the root
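A compact Python sketch of FullTree for the Dynamic goal, assuming a hypothetical three-car table and a constant relpref_yes of 0.7 with relpref_no = 1 - relpref_yes (in the talk these probabilities come from the user model). It recursively expands every relaxation and choice node and combines costs bottom-up, so its running time grows exponentially with the query size.

```python
# Hypothetical toy car table: each car is the set of features it offers.
CARS = [
    frozenset({"Alarm", "Manual"}),
    frozenset({"DSL", "Manual"}),
    frozenset({"Alarm", "DSL"}),
]


def answer(query):
    """Cars that satisfy every condition in the query."""
    return [car for car in CARS if query <= car]


def relpref_yes(query, cond):
    """Hypothetical probability that the user accepts dropping `cond`.
    Constant here; in the talk it comes from the user model."""
    return 0.7


def full_tree_cost(query, hard=frozenset()):
    """Expected number of relaxation questions (Dynamic goal), computed
    bottom-up over the whole Query Relaxation Tree.

    query: conditions still in the query
    hard:  conditions the user refused to relax (hard constraints)
    """
    relaxable = query - hard
    if answer(query) or not relaxable:
        return 0.0  # leaf: non-empty answer, or nothing left to relax
    best = float("inf")
    for cond in sorted(relaxable):  # one choice node per proposable relaxation
        p_yes = relpref_yes(query, cond)
        cost_yes = full_tree_cost(query - {cond}, hard)  # YES: drop the condition
        cost_no = full_tree_cost(query, hard | {cond})   # NO: it becomes hard
        choice = p_yes * (1 + cost_yes) + (1 - p_yes) * (1 + cost_no)
        best = min(best, choice)  # relaxation node: optimize = min (Dynamic)
    return best


print(full_tree_cost(frozenset({"Alarm", "DSL", "Manual"})))
```

For the query (Alarm, DSL, Manual), dropping any single condition already yields a non-empty answer, and the root cost comes out at roughly 1.39 expected questions.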
FullTree Algorithm (Dynamic) [figure: worked example of the bottom-up cost computation]
Fast Solution (FastOpt) Idea: prune the unpromising branches in advance and expand only the good ones • Associate an upper and a lower bound with each node • The upper bound is the cost of the node when the probability of NO is 1 • The lower bound is the opposite: the cost when the probability of YES is 1 • Remove a node if its lower bound is greater than the upper bound of some sibling node
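The pruning rule can be sketched as follows, assuming the lower and upper bounds of the sibling nodes have already been computed; the node names and bound values in the example are hypothetical. Comparing against the smallest upper bound among all siblings is enough, because a node's own upper bound is never below its own lower bound.

```python
def fastopt_prune(bounds):
    """Keep the sibling nodes that survive FastOpt-style pruning (min goal).

    bounds maps a node name to its (lower, upper) cost bounds, with
    lower <= upper. A node is pruned when its lower bound exceeds the
    upper bound of some sibling, since that sibling is guaranteed cheaper.
    """
    best_upper = min(upper for _, upper in bounds.values())
    return {
        node: (lower, upper)
        for node, (lower, upper) in bounds.items()
        if lower <= best_upper
    }


# Hypothetical bounds for three relaxations proposed at the same node.
kept = fastopt_prune({
    "relax DSL": (1.0, 2.0),
    "relax Alarm": (2.5, 4.0),
    "relax Manual": (1.5, 3.0),
})
print(sorted(kept))  # ['relax DSL', 'relax Manual']
```

Here relax Alarm is pruned because its lower bound 2.5 exceeds the upper bound 2.0 of relax DSL.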
FastOpt Algorithm (Dynamic) [figure: example of pruning]
Approximate Solution (CDR) Idea: compute the cost distribution of each node and expand only the node that maximizes the probability that its cost is less than the cost of all its siblings. • Associate a b-size histogram with each node • Construct the tree for the first L levels • Assign uniform cost distributions to the nodes • Use convolution to find the sum (choice nodes) and min (relaxation nodes) distributions of the costs • Expand the branch that has the highest probability of having the lowest cost
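CDR keeps a b-size histogram per node. A minimal sketch, assuming each bucket is represented by its midpoint (the talk does not specify the representation): `uniform_histogram` builds the initial uniform cost distribution of a node, and `rebin` is a hypothetical helper that compresses a convolved distribution back to at most b buckets.

```python
def uniform_histogram(lo, hi, b):
    """b-bucket histogram of a cost uniformly distributed in [lo, hi].

    Each bucket is represented by its midpoint and has probability 1/b.
    """
    width = (hi - lo) / b
    return {lo + (i + 0.5) * width: 1 / b for i in range(b)}


def rebin(dist, b):
    """Compress a {cost: probability} distribution into at most b equal-width
    buckets (represented by their midpoints) so every node keeps a b-size
    histogram after a convolution."""
    lo, hi = min(dist), max(dist)
    if lo == hi:
        return {lo: sum(dist.values())}
    width = (hi - lo) / b
    out = {}
    for cost, p in dist.items():
        i = min(int((cost - lo) / width), b - 1)  # the maximum goes in the last bucket
        mid = lo + (i + 0.5) * width
        out[mid] = out.get(mid, 0.0) + p
    return out


# Slide example: a choice node whose cost is uniform in [1, 3], with b = 4.
print(uniform_histogram(1, 3, 4))  # {1.25: 0.25, 1.75: 0.25, 2.25: 0.25, 2.75: 0.25}
```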
Compute Cost Distributions Query size 5. Recall the cost formula. Choice node at level 2, cost uniformly distributed in [1,3]. Compute relpref · (1 + cost(n'))
Compute Cost Distributions Probability distribution of n_yes and probability distribution of n_no: sum the distributions of the yes child and the no child using sum-convolution
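The yes/no combination at a choice node can be sketched with distributions stored as dicts from cost to probability. This assumes the costs of the two children are independent; the child distributions and relpref values in the example are hypothetical.

```python
def yes_no_term(dist, relpref):
    """Distribution of relpref * (1 + cost) given a child's cost distribution."""
    out = {}
    for cost, p in dist.items():
        v = relpref * (1 + cost)
        out[v] = out.get(v, 0.0) + p
    return out


def sum_convolution(x, y):
    """Distribution of X + Y for independent discrete X and Y."""
    out = {}
    for a, pa in x.items():
        for b, pb in y.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


# Hypothetical children: n_yes costs 1 or 3 (equally likely), n_no costs 1.
yes_term = yes_no_term({1: 0.5, 3: 0.5}, relpref=0.5)
no_term = yes_no_term({1: 1.0}, relpref=0.5)
choice = sum_convolution(yes_term, no_term)
print(choice)  # {2.0: 0.5, 3.0: 0.5}
```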
Compute Cost Distributions Compute the min-convolution of the children of the relaxation node
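At a relaxation node the children's cost distributions are combined into the distribution of the smallest child cost. A sketch assuming independent children and the same dict representation as above; for goals where optimize = max, `max` would replace `min`. The example distributions are hypothetical.

```python
from functools import reduce


def min_convolution(x, y):
    """Distribution of min(X, Y) for independent discrete X and Y."""
    out = {}
    for a, pa in x.items():
        for b, pb in y.items():
            m = min(a, b)
            out[m] = out.get(m, 0.0) + pa * pb
    return out


# Hypothetical cost distributions of three children; fold pairwise.
children = [{1: 0.5, 3: 0.5}, {2: 1.0}, {0: 0.5, 5: 0.5}]
combined = reduce(min_convolution, children)
print(combined)  # {0: 0.5, 1: 0.25, 2: 0.25}
```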
Choose the Branch to Expand Idea: for each child of the root, compute the probability that its cost is smaller than that of its siblings, and choose the child with the highest probability. Example: Pr(n1 < n2) = 0.6 and Pr(n2 < n1) = 0.4, so expand n1
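The selection step can be sketched as follows, assuming independent children. With more than two siblings it computes Pr(X < every sibling's cost) directly instead of multiplying pairwise probabilities, since those events share X and are not independent; ties count as not smaller. The distributions of n1 and n2 are hypothetical, chosen so that they reproduce the 0.6 / 0.4 split of the slide.

```python
def prob_smallest(name, children):
    """Pr(cost of `name` < cost of every sibling), all costs independent."""
    total = 0.0
    for a, pa in children[name].items():
        p = pa
        for other, dist in children.items():
            if other != name:
                p *= sum(pb for b, pb in dist.items() if b > a)
        total += p
    return total


def choose_branch(children):
    """Child most likely to have the lowest cost among its siblings."""
    return max(children, key=lambda n: prob_smallest(n, children))


# Hypothetical cost distributions of the two relaxations of the root.
children = {"n1": {1: 0.6, 3: 0.4}, "n2": {2: 1.0}}
print(choose_branch(children))  # n1
```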
Experimental Setup • Datasets: • US Home dataset: 38k tuples, 18 attributes • Car dataset: 100k tuples, 31 attributes • Synthetic datasets: 20k to 500k tuples • Baseline algorithms: • Query refinement algorithm [Mishra09] (QueryRef) • Random relaxation • Greedy: choose the first relaxation that gives a non-empty answer, otherwise a random one
Experimental Setup • Effectiveness: • Query time • Size of the tree (number of nodes) • Cost of the root (expected number of steps) • CDR calibration: • Impact of L and of the number of buckets • User study: • 125 users on Mechanical Turk • Random queries with 4-8 attributes • Evaluation of the system's usefulness
Root Cost • CDR is close to optimal • QueryRef is 30% worse on average • Random is 150% worse than FullTree
Goal Comparison • All the objective functions correctly optimize their goals • Dynamic and Semi-Dynamic are very similar in performance
Query Time • Exponential behaviour • Efficient for small queries: 1.4 s for query size 10
User Study Q1 - Rate the suggested refinements Q2 - Did you like the system guiding you? Q3 - Did the system help you arrive to the results fast? Q4 - Did you prefer using the help of this system to relaxing the query by yourself?
Conclusions • We propose • a novel, principled, user-centric, and interactive approach for the empty-answer problem • two exact algorithms and an approximate algorithm • We show that • the framework can deal with the combinatorial explosion • the user effort is minimized • the user is generally satisfied by the system
Bibliography [Mishra09] C. Mishra and N. Koudas, “Interactive query refinement,” in EDBT, 2009. [Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania, “Minimum-effort driven dynamic faceted search in structured databases,” in CIKM, 2008. [Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, “Probabilistic ranking of database query results,” in VLDB, 2004. [Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto, Modern Information Retrieval, 2011.
Cost Probability: probability that the cost of n1 is less than the cost of n2, where n1 and n2 are relaxations of the root