
Managing Uncertainty in Spatial and Spatio-temporal Data

Managing Uncertainty in Spatial and Spatio-temporal Data. Andreas Züfle¹, Goce Trajcevski², Tobias Emrich⁴, Matthias Renz¹, Hans-Peter Kriegel¹, Nikos Mamoulis³, Reynold Cheng³. ¹ LMU Munich, ² NWU Evanston, ³ HKU Hong Kong, ⁴ USC Los Angeles.





Presentation Transcript


  1. Managing Uncertainty in Spatial and Spatio-temporal Data. Andreas Züfle¹, Goce Trajcevski², Tobias Emrich⁴, Matthias Renz¹, Hans-Peter Kriegel¹, Nikos Mamoulis³, Reynold Cheng³. ¹ LMU Munich, ² NWU Evanston, ³ HKU Hong Kong, ⁴ USC Los Angeles

  2. Managing Uncertainty in Spatial and Spatio-temporal Data. Andreas Züfle¹, Goce Trajcevski², Tobias Emrich⁴, Matthias Renz¹, Hans-Peter Kriegel¹, Nikos Mamoulis³, Reynold Cheng³. ¹ LMU Munich, ² NWU Evanston, ³ HKU Hong Kong, ⁴ USC Los Angeles

  3. Aim of this tutorial … • Understand basic concepts for scalable probabilistic query processing on uncertain spatial and spatio-temporal data. • A tutorial, not a survey. • Get the big picture … • NOT in terms of a long list of recent methods and algorithms • BUT in terms of general concepts commonly used in this field.

  4. Outline • Tutorial decomposed into three parts: • Uncertain Spatial Data (Andreas Züfle) • Uncertain Spatio-Temporal Data (Geometric Approach) (Goce Trajcevski) • Uncertain Spatio-Temporal Data (Probabilistic Approach) (Tobias Emrich) • Please feel free to ask questions at any time during the presentation. • The latest version of these slides will be made available within the next week: http://www.dbs.ifi.lmu.de/~zuefle

  5. Outline • Introduction • Uncertain Spatial Data • Uncertain Spatio-Temporal Data (Geometric Approach) • Uncertain Spatio-Temporal Data (Probabilistic Approach)

  6. Geo-Spatial Data • Huge flood of geo-spatial data • Modern technology • New user mentality • Great research potential • New applications • Innovative research • Economic Boost • “$600 billion potential annual consumer surplus from using personal location data” [1] [1] McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. June 2011.

  7. Geo-Spatial Data

  8. Spatio-Temporal Data • (object, location, time) triples • Queries: • “Find friends that attended the same concert last Saturday” • Best case: Continuous function. GPS log taken from a thirty-minute drive through Seattle. Dataset provided by: P. Newson and J. Krumm. Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009.

  9. Sources of Uncertainty • Missing Observations • Missing GPS signal • RFID sensors available in discrete locations only • Wireless sensor nodes sending infrequently to preserve energy • Infrequent check-ins of users of geo-social networks • Dataset provided by: E. Cho, S. A. Myers and J. Leskovec. Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011.

  10. Sources of Uncertainty • Uncertain Observations • Imprecise sensor measurements (e.g. radio triangulation, Wi-Fi positioning) • Inconsistent information (e.g. contradictory sensor data) • Human errors (e.g. in crowd-sourcing applications) • From a database perspective, the position of a mobile object is uncertain • Dataset provided by: E. Cho, S. A. Myers and J. Leskovec. Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011.

  11. Research Challenge Include the uncertainty, which is inherent in spatial and spatio-temporal data, directly in the querying and mining process.

  12. Research Challenge Include the uncertainty, which is inherent in spatial and spatio-temporal data, directly in the querying and mining process. Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process.

  13. Research Challenge Include the uncertainty, which is inherent in spatial and spatio-temporal data, directly in the querying and mining process. Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process. Improve the quality of modern location based applications and of research results in the field.

  14. Possible World Semantics. Uncertain Spatial Data: Models • Discrete Models • Continuous Models

  15. Possible World Semantics • A collection of uncertain spatial objects defines an uncertain spatial database. • Combinations of object instances define possible database instances, called Possible Worlds. • Assumption: The probability of a possible world can be computed efficiently.

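  16. Answering Queries using Possible World Semantics (PWS) • Let • $\mathcal{D}$ be an uncertain database, • $W$ be the set of possible worlds of $\mathcal{D}$, • $\varphi$ be a query predicate, • $I(\varphi, w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w \in W$ and zero otherwise. • The probability that a query predicate $\varphi$ holds on an uncertain database $\mathcal{D}$ is defined as $P(\varphi, \mathcal{D}) = \sum_{w \in W} I(\varphi, w) \cdot P(w)$.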
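
A minimal Python sketch of this definition, assuming a discrete model in which every object has a finite list of alternative instances and objects are mutually independent. The data layout and the names `possible_worlds` and `predicate_probability` are illustrative and do not come from the slides.

```python
from itertools import product
from math import prod

def possible_worlds(db):
    """Yield (world, probability) pairs for a discrete uncertain database.

    db maps an object id to a list of (instance, probability) alternatives.
    Objects are assumed independent, so the probability of a world is the
    product of the probabilities of the chosen instances.
    """
    ids = list(db)
    for choice in product(*(db[i] for i in ids)):
        world = {i: inst for i, (inst, _) in zip(ids, choice)}
        yield world, prod(p for _, p in choice)

def predicate_probability(db, predicate):
    """Sum the probabilities of all worlds in which the predicate holds."""
    return sum(p for world, p in possible_worlds(db) if predicate(world))

# Hypothetical database: two objects with 1-D positions.
db = {
    "A": [(1.0, 0.8), (5.0, 0.2)],
    "B": [(2.0, 0.4), (7.0, 0.6)],
}
# Predicate: at least one object lies within distance 2.5 of q = 0.
near_q = lambda world: any(abs(x) <= 2.5 for x in world.values())
print(predicate_probability(db, near_q))
```

The enumeration visits every combination of instances, so its cost grows exponentially with the number of objects; it is only meant to make the semantics concrete.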

  17. Possible Worlds: Example II [Figure: uncertain objects labeled A to Z]

  18. Possible Worlds: Example II [Figure: uncertain objects labeled A to Z]

  19. Too many possible worlds

  20. Querying Uncertain Data: Complexity • Naive query processing is exponential in the number of objects • Are there efficient solutions to query uncertain spatial data? • In general: No! • “The problem of answering queries on a probabilistic database D is #P-complete in the size of D.” [DalviSuciu04] • Can be reduced to uncertain spatial databases • But: Specific queries may have polynomial time solutions! [DalviSuciu04] Dalvi, N. N., and Suciu, D. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.

  21. Querying Uncertain Data: Running Example. Return the number of objects located in the depicted circular region centered at query point q. This number is a random variable. Total number of possible worlds:

  22. Data Cleaning: Aggregation • Ignore Uncertainty (Data Cleaning) • Replace uncertain objects by a deterministic “best guess” • Expected Positions • Most-likely Positions • … • Query results are not reliable! • Query results may be biased!
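
A small Python sketch of how a "best guess" can bias a result, assuming 1-D positions and a hypothetical object with two equally likely instances. The helper names `expected_position` and `prob_inside` are illustrative, not from the slides.

```python
def expected_position(instances):
    """Probability-weighted mean of an object's alternative positions."""
    return sum(x * p for x, p in instances)

def prob_inside(instances, center, radius):
    """Exact probability that the object lies within radius of center."""
    return sum(p for x, p in instances if abs(x - center) <= radius)

# Hypothetical object: either far left or far right of the query point.
obj = [(-4.0, 0.5), (4.0, 0.5)]
q, r = 0.0, 1.0

cleaned = expected_position(obj)          # the expected position is exactly q
cleaned_answer = abs(cleaned - q) <= r    # cleaned data: object is in the region
exact = prob_inside(obj, q, r)            # uncertain data: it never is
print(cleaned_answer, exact)
```

In this constructed case the cleaned database reports the object as a certain result even though no possible world places it inside the query region.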

  23. Equivalent Worlds: An intuition. Observation #1: For any possible world $w$ and any possible world $w'$ derived from $w$ by changing the position of object $o$, the following equivalence holds:

  24. Equivalent Worlds: An intuition. Observation #1 allows us to discard objects outside of the query region. Remaining number of equivalence classes of possible worlds:

  25. Equivalent Worlds: An intuition. Observation #2: For each remaining object, we only need to consider the predicate “inside”.

  26. Equivalent Worlds: An intuition. Observation #2: For each remaining object, we only need to consider the predicate “inside”. Remaining number of equivalence classes of possible worlds:

  27. Equivalent Worlds: An intuition. Observation #3: We only require the number of objects in the query region. Information about the concrete result objects can be discarded.

  28. Generating Functions • Main idea: Use polynomial multiplication to enumerate possible results [Figure: objects A, B, C around query point q, with probability labels 0.8, 0.2, 0.4]

  29. Generating Functions • Main idea: Use polynomial multiplication to enumerate possible results • Observation 3: Anonymize objects by substituting A, B, C with x [Figure: objects relabeled x around query point q, with probability labels 0.8, 0.2, 0.4]

  30. Generating Functions • Main idea: Use polynomial multiplication to enumerate possible results • Observation 3: Anonymize objects by substituting A, B, C with x [Figure: objects relabeled x around query point q, with probability labels 0.8, 0.2, 0.4]

  31. Generating Functions • Main idea: Use polynomial multiplication to enumerate possible results • Observation 3: Anonymize objects by substituting A, B, C with x • Each monomial $c \cdot x^i$ implies that the probability of having exactly $i$ results equals $c$ [Figure: objects relabeled x around query point q, with probability labels 0.8, 0.2, 0.4]

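  32. Generating Functions: Formally. For each object $o_i$, $1 \le i \le n$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]: $F(x) = \prod_{i=1}^{n} (1 - p_i + p_i x) = \sum_{j=0}^{n} c_j x^j$. In the expanded polynomial, the coefficient $c_j$ of monomial $x^j$ equals the probability that exactly $j$ objects are inside the query region. [2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009).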
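
The polynomial expansion can be sketched in a few lines of Python. This is an illustrative implementation under the assumption that objects are independent. The function name `count_distribution` is hypothetical, and the input probabilities 0.8, 0.2, 0.4 are the labels visible in the slides' figure, interpreted here as the per-object probabilities of lying inside the query region.

```python
def count_distribution(probs):
    """Return [P(count = 0), ..., P(count = n)] for independent objects.

    probs[i] is the probability that object i lies inside the query region.
    The result is the coefficient list of prod_i ((1 - p_i) + p_i * x).
    """
    coeffs = [1.0]  # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # object outside: count stays k
            nxt[k + 1] += c * p       # object inside: count becomes k + 1
        coeffs = nxt
    return coeffs

# Assumed reading of the figure labels; which object carries which
# probability does not affect the count distribution.
print(count_distribution([0.8, 0.2, 0.4]))
```

Each factor is multiplied into a polynomial of at most n + 1 coefficients, so the whole distribution is obtained with a quadratic number of multiplications instead of enumerating all possible worlds. For the inputs above the coefficients are approximately 0.096, 0.472, 0.368 and 0.064.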

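  33. Count Queries on Uncertain Data. Example: [Figure and step-by-step expansion of the generating function: objects A, B, C around query point q, with probability labels 0.8, 0.2, 0.4]

  34. Count Queries on Uncertain Data. Example: [Figure and step-by-step expansion of the generating function, continued]

  35. Count Queries on Uncertain Data. Example: [Figure and step-by-step expansion of the generating function, continued]

  36. Count Queries on Uncertain Data. Example: [Figure and step-by-step expansion of the generating function, continued]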
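
The equations of this example did not survive in the transcript. As a hedged reconstruction, assume the three objects lie inside the query region independently with probabilities 0.8, 0.2 and 0.4 (the labels visible in the figure; which object carries which value is an assumption and does not change the result). Multiplying one factor $(1 - p_i) + p_i x$ per object then gives:

```latex
\begin{align*}
F(x) &= (0.2 + 0.8x)\,(0.8 + 0.2x)\,(0.6 + 0.4x) \\
     &= (0.16 + 0.68x + 0.16x^2)\,(0.6 + 0.4x) \\
     &= 0.096 + 0.472x + 0.368x^2 + 0.064x^3
\end{align*}
```

Under these assumptions the query region contains no object with probability 0.096, exactly one with 0.472, exactly two with 0.368, and all three with 0.064.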

  37. The Paradigm of Equivalent Worlds. Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied: • A traditional query on certain data can be answered in polynomial time. • We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|. • The probability of a class can be computed in polynomial time.

  38. The Paradigm of Equivalent Worlds

  39. Approximated Results: Sampling • Materialize a set S of possible worlds • Samples are drawn independently and without bias • Evaluate the query predicate on each world • The distribution of sampled results is an unbiased approximation of the true distribution of results.
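
A minimal Monte-Carlo sketch of these steps for the count query, assuming independent objects that are each inside the query region with a known probability. The function name `sample_count_distribution` and the input probabilities are illustrative assumptions.

```python
import random

def sample_count_distribution(probs, num_samples, seed=None):
    """Estimate P(count = k) by drawing independent possible worlds.

    probs[i] is the probability that object i is inside the query region;
    objects are assumed independent.
    """
    rng = random.Random(seed)
    hits = [0] * (len(probs) + 1)
    for _ in range(num_samples):
        # One sampled world: decide for each object whether it is inside.
        count = sum(rng.random() < p for p in probs)
        hits[count] += 1
    return [h / num_samples for h in hits]

estimates = sample_count_distribution([0.8, 0.2, 0.4], num_samples=100, seed=0)
print(estimates)
```

With only 100 sampled worlds the estimates can deviate noticeably from the exact probabilities; increasing `num_samples` shrinks the expected error at the usual square-root rate.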

  40. Sampling: Example • Drawing 100 possible worlds may yield the following estimators: • Compare to the exact probabilities: • No indication of reliability or confidence of estimations! [Figure: objects A, B, C around query point q, with probability labels 0.8, 0.2, 0.4]

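  41. Sampling: Confidences • Drawing 100 possible worlds may yield the following estimators: • Use statistical methods to assess the quality of estimators • E.g. Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/|S|}$ • where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution. • At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]. • True probability [Figure: objects A, B, C around query point q, with probability labels 0.8, 0.2, 0.4]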
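
The interval quoted on this slide can be reproduced with a standard Wald interval. The sketch below is illustrative: the function name `wald_interval` is hypothetical, and the inputs (an estimate of 0.54 from 100 samples at a significance level of 0.05) are inferred because they yield exactly the interval [0.442, 0.638]; the transcript does not state them.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided Wald confidence interval for a sampled probability."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed inputs: estimate 0.54 from 100 sampled worlds, alpha = 0.05.
low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The Wald interval relies on a normal approximation, so it is least reliable for small sample sizes or estimates close to 0 or 1.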

  42. Uncertain Spatial Data Management: Summary • Motivation • Flood of geo-spatial data • Enriched with additional contexts (text, social, multimedia) • Inherent uncertainty • Data Cleaning • “Best guess” answers • Unreliable results • Biased results • Paradigm of Equivalent Worlds • Efficient solution for the most prominent types of spatial queries • Example: Generating Functions • Approximations • Monte-Carlo sampling • Probabilistic guarantees
