Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration. Fangjiao Jiang, Renmin University of China. Joint work with Weiyi Meng & Xiaofeng Meng
The previous Web: things are just on the surface Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration (DASFAA2009)
The current Web: Getting “deeper” • A great deal of information is hidden behind query forms • Deep = not accessible through search engines
Why is it important? • More than 10 million distinct forms
Why is it important? • Up to 5,000 billion dynamic result pages
A Key Component: Query translation (diagram labels: Integrated query interface, Query translation, Web database query interfaces) • Challenge • Large-scale • Heterogeneity • Autonomy
The Problem Selectivity Estimation for Exclusive Query Translation
Example (figure not recovered; surviving marks: ?, ?, √)
Related work & the Challenge • A prominent solution for selectivity estimation: histograms [Piatetsky+, Poosala+, Ioannidis+] • Categorical attribute • Infinite-value attribute • Another solution: random sampling [Goodman+, Haas+, Olken+, Vitter+, Dasgupta+] • Random sampling • Challenge • Selectivity estimation of infinite-value attributes
Selectivity Estimation for Exclusive Query Translation
Two Observations • There exist different correlations between different attribute pairs (figure labels: Strongest, Weaker, Weaker, Weakest) • The word frequency of the values of an infinite-value attribute usually follows a Zipf-like distribution
Selectivity Estimation for Exclusive Query Translation • Attribute Correlation calculation for a domain • Selectivity estimation for a Web database • Correlation-based sampling • Word frequency probing • Zipf equation calculation • Selectivity estimation
Selectivity Estimation Challenges 1. Attribute Correlation calculation • Find the least correlated attribute • Discover the word rank 2. Zipf equation calculation • Calculate the parameters of the Zipf equation • Estimate selectivity
Attribute Correlation Calculation
Attribute Correlation calculation • Goal: Random sample • Word Rank (equations (1) and (2) not recovered)
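The slides do not preserve the correlation measure itself, so the following is only a minimal sketch of the idea of picking the least correlated attribute to probe through. It assumes empirical mutual information as the correlation measure, which is a stand-in and not necessarily the measure used in this work. The attribute names (`title_word`, `subject`, `format`) and the toy records are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length value lists.

    Assumption: mutual information stands in for the deck's correlation measure.
    """
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def least_correlated_attribute(records, target, candidates):
    """Return (attribute, scores): the candidate with the lowest score against target."""
    target_values = [r[target] for r in records]
    scores = {
        a: mutual_information(target_values, [r[a] for r in records])
        for a in candidates
    }
    return min(scores, key=scores.get), scores

# Hypothetical toy records: 'subject' fully determines the title word,
# while 'format' is independent of it.
records = [
    {"title_word": "java", "subject": "programming", "format": "hardcover"},
    {"title_word": "java", "subject": "programming", "format": "paperback"},
    {"title_word": "rome", "subject": "history", "format": "hardcover"},
    {"title_word": "rome", "subject": "history", "format": "paperback"},
]
best, scores = least_correlated_attribute(records, "title_word", ["subject", "format"])
```

Under this assumption, queries issued on the lowest-scoring attribute would return records whose values on the target attribute are closest to a random sample.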
Discussion on Word rank • Word rank should be computed for each attribute
Zipf Equation Calculation
Zipf equation calculation • Zipf equation
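The equation itself did not survive on this slide. For reference, the classical Zipf law and its Mandelbrot generalization are given here. Mapping the deck's parameters P, p and E onto the generalized form is an assumption based on the parameter names that appear on a later slide.

```latex
% Classical Zipf law: frequency f of the word with rank r
f(r) = \frac{C}{r^{\alpha}}

% Zipf-Mandelbrot generalization (assumed parameter naming: P, p, E)
f(r) = P \,(r + p)^{-E}

% Taking logarithms gives a line in \log(r + p), which is what makes fitting easy
\log f(r) = \log P - E \,\log(r + p)
```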
The parameters of Zipf equation
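As a rough illustration of how such parameters could be obtained from a few probed words, the sketch below fits the assumed form f(r) = P (r + p)^(-E) to (rank, frequency) pairs. It does a grid search over p and, for each p, an ordinary least-squares line fit in log-log space. Both the functional form and the fitting procedure are assumptions made for this example, not the method stated on the slides. The function names are hypothetical.

```python
import math

def fit_zipf(ranks, freqs, p_grid=None):
    """Fit f(r) = P * (r + p) ** (-E) to observed (rank, frequency) pairs.

    Assumptions: frequencies are positive, at least two distinct ranks are
    given, and p lies on the search grid (default 0.0 .. 10.0 in steps of 0.1).
    """
    if p_grid is None:
        p_grid = [i / 10 for i in range(101)]
    ys = [math.log(f) for f in freqs]
    n = len(ranks)
    my = sum(ys) / n
    best = None
    for p in p_grid:
        xs = [math.log(r + p) for r in ranks]
        mx = sum(xs) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx              # equals -E
        intercept = my - slope * mx    # equals log P
        sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), p, -slope)
    _, P, p, E = best
    return P, p, E

def estimate_frequency(rank, P, p, E):
    """Estimated number of records containing the word at the given rank."""
    return P * (rank + p) ** (-E)

# Synthetic probe results generated from known parameters (illustrative only).
probe_ranks = [1, 5, 10, 50, 100, 500]
probe_freqs = [1000.0 * (r + 2.0) ** (-1.2) for r in probe_ranks]
P, p, E = fit_zipf(probe_ranks, probe_freqs)
```

Once P, p and E are fitted, `estimate_frequency(rank, P, p, E)` gives an estimate for a word that was never probed, provided its rank is known. Reading that estimated frequency as the query's selectivity is this sketch's interpretation.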
Discussion on P, p and E
Experiments
Data Sets & Evaluation Method • Data sets • Evaluation method
Experimental Results • The average precision of selectivity estimations is high.
Summary
Contributions • Identify the selectivity estimation problem of infinite-value attributes for exclusive query translation • Propose a correlation-based sampling approach to obtain a sample that is as random as possible • Propose a Zipf-based selectivity estimation method • Verify the accuracy of our approach
Thanks (Q&A)