
A Generic Framework for Handling Uncertain Data with Local Correlations

Xiang Lian and Lei Chen, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. {xlian, leichen}@cse.ust.hk


Presentation Transcript


  1. A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong Kong University of Science and Technology Clear Water Bay, Kowloon Hong Kong, China {xlian, leichen}@cse.ust.hk VLDB 2011 @ Seattle

  2. Motivation Example • Forest monitoring application • Sensory data: <temperature, light>

  3. Motivation Example (cont'd) • Samples s_i collected from sensor node n_i

  4. Motivation Example (cont'd) • Sensory data are uncertain and imprecise (figure label: uncertainty regions)

  5. Motivation Example (cont'd) • 3 monitoring areas in the forest

  6. Motivation Example (cont'd) • 3 monitoring areas in the forest (figure labels: sensors far away; spatially close sensors)

  7. Locally Correlated Sensory Data • Efficient Query Answering on Locally Correlated Uncertain Data (figure labels: Area 1, Area 2, Area 3)

  8. Nearest Neighbor Queries on Locally Correlated Uncertain Data

  9. Outline • Introduction • Model for Locally Correlated Uncertain Data • Problem Definition • Query Answering on Uncertain Data With Local Correlations • Experimental Evaluation • Conclusions

  10. Introduction • Uncertain data are pervasive in real applications • Sensor networks • RFID networks • Location-based services • Data integration • Existing works often assume independence among uncertain objects, but uncertain objects exhibit correlations, in particular local correlations

  11. Data Model for Local Correlations • Data Model • Uncertain objects contain several locally correlated partitions (LCPs) • Uncertain objects within each LCP are correlated with each other • Uncertain objects from distinct LCPs are independent of each other

  12. Data Model for Local Correlations (cont'd) • Bayesian network • Each vertex corresponds to a random variable • Each vertex is associated with a conditional probability table (CPT)

  13. Data Model for Local Correlations (cont'd) • The joint probability of variables • Join tuples in CPTs and multiply conditional probabilities • Variable elimination
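
The two steps on this slide can be illustrated with a minimal Python sketch. The two-node network A -> B, its probabilities, and the names `cpt_a`, `cpt_b_given_a`, `joint`, and `marginal_b` are all hypothetical and are not taken from the talk; the sketch only shows the "join CPT rows and multiply" step and the smallest possible case of variable elimination.

```python
from itertools import product

# Hypothetical two-node Bayesian network A -> B over binary values 0/1.
cpt_a = {0: 0.7, 1: 0.3}                      # P(A)
cpt_b_given_a = {(0, 0): 0.9, (0, 1): 0.1,    # P(B | A), keyed by (a, b)
                 (1, 0): 0.2, (1, 1): 0.8}

def joint(a, b):
    """P(A=a, B=b): join the matching CPT rows and multiply them."""
    return cpt_a[a] * cpt_b_given_a[(a, b)]

def marginal_b(b):
    """P(B=b): eliminate A by summing the joint over all values of A."""
    return sum(joint(a, b) for a in cpt_a)

# Full joint table over (A, B), built from the CPTs.
joint_table = {(a, b): joint(a, b) for a, b in product((0, 1), repeat=2)}
```

In a larger network, variable elimination applies the same sum-out step one variable at a time, so that the full joint table never has to be materialized.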

  14. Definition of LC-PNN Query • Probabilistic Nearest Neighbor Query on Uncertain and Locally Correlated Data, LC-PNN
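
The formal definition on this slide did not survive in the transcript. As a rough illustration only, the sketch below assumes the usual probabilistic nearest-neighbor semantics: return every object whose probability of being the nearest neighbor of q reaches a threshold. The function name `lc_pnn_bruteforce`, the parameter `alpha`, the explicit joint-outcome representation of each LCP, and the sample data are all assumptions made for this example. It enumerates possible worlds by brute force, which is the costly baseline the filtering methods aim to avoid.

```python
from itertools import product
from math import dist

def lc_pnn_bruteforce(q, lcps, alpha):
    """Brute-force NN probabilities over possible worlds (illustrative only).

    lcps: list of LCPs. Each LCP is a list of joint outcomes
          (prob, {object_id: point}), so correlations inside an LCP are
          captured by listing its objects' positions together.
    """
    nn_prob = {}
    # Distinct LCPs are independent: a world picks one joint outcome per LCP
    # and its probability is the product of the chosen outcome probabilities.
    for world in product(*lcps):
        p = 1.0
        positions = {}
        for prob, assignment in world:
            p *= prob
            positions.update(assignment)
        nearest = min(positions, key=lambda o: dist(q, positions[o]))
        nn_prob[nearest] = nn_prob.get(nearest, 0.0) + p
    return {o: p for o, p in nn_prob.items() if p >= alpha}

# Hypothetical data: o1 and o2 are correlated (same LCP), o3 is independent.
lcp1 = [(0.6, {"o1": (1, 0), "o2": (5, 0)}),
        (0.4, {"o1": (4, 0), "o2": (2, 0)})]
lcp2 = [(0.5, {"o3": (3, 0)}), (0.5, {"o3": (10, 0)})]
answers = lc_pnn_bruteforce((0, 0), [lcp1, lcp2], alpha=0.5)
```

The number of worlds grows exponentially with the number of LCPs, which is why pruning and filtering matter.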

  15. Challenges & Solutions • Challenges • A straightforward linear scan is costly • Integration is computationally expensive • Data correlations must be handled • Filtering Methods • Index pruning • Candidate filtering with pre-computations

  16. Index Pruning • Basic idea • Let best_so_far be the smallest maximum distance from query point q to any uncertain object seen so far • Then, any object/node e having mindist(q, e) > best_so_far can be safely pruned
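
The pruning rule can be sketched in Python as follows. This is a flat scan over a dictionary rather than an R-tree traversal, and it assumes each uncertainty region is an axis-aligned rectangle given as ((xlo, ylo), (xhi, yhi)); the helper names `maxdist` and `prune` and the sample regions are hypothetical.

```python
from math import hypot

def mindist(q, rect):
    """Smallest distance from point q to an axis-aligned rectangle."""
    (xlo, ylo), (xhi, yhi) = rect
    dx = max(xlo - q[0], 0.0, q[0] - xhi)
    dy = max(ylo - q[1], 0.0, q[1] - yhi)
    return hypot(dx, dy)

def maxdist(q, rect):
    """Largest distance from point q to an axis-aligned rectangle."""
    (xlo, ylo), (xhi, yhi) = rect
    dx = max(abs(q[0] - xlo), abs(q[0] - xhi))
    dy = max(abs(q[1] - ylo), abs(q[1] - yhi))
    return hypot(dx, dy)

def prune(q, regions):
    """Return the names of regions that survive the best_so_far test."""
    best_so_far = min(maxdist(q, r) for r in regions.values())
    # A region whose closest point is farther than best_so_far can never
    # contain the nearest neighbor, so it is dropped.
    return {name for name, r in regions.items() if mindist(q, r) <= best_so_far}

regions = {
    "a": ((1, 1), (2, 2)),
    "b": ((2, 0), (3, 1)),
    "c": ((5, 5), (6, 6)),
}
survivors = prune((0, 0), regions)
```

In an index traversal the same test would be applied to node bounding boxes as well, with best_so_far tightened incrementally as objects are encountered.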

  17. Candidate Filtering with Pre-Computations • Basic idea • Obtain an upper bound, UB_PrLC-PNN(q, o_i), of the LC-PNN probability • Object o_i can be safely pruned if UB_PrLC-PNN(q, o_i) < α • How to obtain the probability upper bound? It is derived from the formula of the LC-PNN probability, via pivots

  18. Derivation of Probability Upper Bound (figure labels: pivot pivs5; l)

  19. Range [min_l, max_l] of l • l= • Let min_l = and max_l = • If online l is smaller than min_l, then JPo(s5) = 1 • If online l is greater than max_l, then JPo(s5) = 0 • Thus, we do not need to store pre-computations with l outside the range [min_l, max_l]
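
The storage-saving idea on this slide can be sketched as a clamped lookup table. The formulas for l, min_l, and max_l are missing from the transcript, so this sketch treats them as given numbers. The class name `PrecomputedTable`, the grid of stored l values, and the assumption that the stored value is non-increasing in l (so that rounding l down to a stored grid point never underestimates it) are all hypothetical choices made for illustration.

```python
from bisect import bisect_right

class PrecomputedTable:
    """Stores pre-computed values only for l inside [min_l, max_l]."""

    def __init__(self, min_l, max_l, grid, values):
        # grid: sorted l values with grid[0] == min_l and grid[-1] <= max_l
        # values[i]: pre-computed value at grid[i] (assumed non-increasing)
        assert grid == sorted(grid) and grid[0] == min_l and grid[-1] <= max_l
        assert len(grid) == len(values)
        self.min_l, self.max_l = min_l, max_l
        self.grid, self.values = grid, values

    def lookup(self, l):
        if l < self.min_l:      # below the stored range: value is 1
            return 1.0
        if l > self.max_l:      # above the stored range: value is 0
            return 0.0
        # Largest stored grid point <= l; under the monotonicity assumption
        # this returns a value that is at least the true one.
        i = bisect_right(self.grid, l) - 1
        return self.values[i]

table = PrecomputedTable(1.0, 3.0, [1.0, 2.0, 3.0], [0.9, 0.5, 0.1])
```

Only the grid points inside the range are stored; queries outside it are answered by the two constant cases.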

  20. Candidate Positions of Pivots (figure labels: sample s5; pivot pivs5)

  21. Selection of Pivot Positions • We provide a cost model to formalize the filtering and refinement costs, and obtain a good value of parameter d to achieve low query cost

  22. LC-PNN Query Procedure • Index uncertain objects containing LCPs in an R-tree based index • For an LC-PNN query • When traversing the index, apply the index pruning and candidate filtering methods to remove false alarms • Refine the remaining candidates and return the true query answers

  23. Experimental Evaluation • Data Sets • Real data: California road network • Synthetic data: lUeU, lUeG, lSeU, and lSeG • Generate center locations of LCPs with Uniform or Skew distribution • Produce extent lengths of LCPs with Uniform or Gaussian distribution • Within LCPs, randomly generate locally correlated uncertain objects with Bayesian networks • Competitor • Basic method [Cheng et al., SIGMOD 2003] • Assuming uncertain objects are independent • Measures • Wall clock time • Speed-up ratio

  24. LC-PNN Performance vs. α • Extent length of LCP = [1, 3], data size N = 150K, average number of uncertain objects in an LCP = 5

  25. Conclusions • We proposed the problem of queries over locally correlated uncertain data, in particular, the LC-PNN query, which is important in real applications • We designed the index pruning method, and based on a proposed cost model, we presented the candidate filtering method via offline pre-computations w.r.t. pivots • We provided efficient query processing techniques to answer LC-PNN queries on locally correlated uncertain data, and discussed applying the same framework to answer other types of queries.

  26. Thank you! Q/A
