An Efficient Distance Calculation Method for Uncertain Objects
This paper presents an efficient method for calculating distances between uncertain objects, addressing the error introduced by traditional methods that first transform uncertain data into exact values. It examines several approximation techniques, including Distance between Means (DM), Pair-wise between Random Samples (PRS), and Approximation by Single Gaussian (ASG). The study finds that ASG is much more accurate than DM at comparable speed, and much faster than Grid Approximation (GAPS) with higher or comparable accuracy. The results indicate that ASG can replace GAPS without losing accuracy.
An Efficient Distance Calculation Method for Uncertain Objects • Edward Hung (csehung@comp.polyu.edu.hk) • Hong Kong Polytechnic University • 2007 CIDM, Hawaii, USA, Apr 1-5, 2007
Uncertain Objects: From Where? • Sources • Sensor readings • Statistical classifiers in image processing • Predictive programs for the stock market • Weather forecasts
Uncertain Objects: Handled Traditionally … • Transformed into exact values • Weighted average or mean • Value of highest frequency or possibility • Why is this bad? • Intermediate and final results become approximate • E.g., cluster centroids deviate and some data points are assigned to the wrong cluster
Distance: Why Important? • Various queries and data mining tasks, e.g., • Nearest-neighbor queries • Clustering (e.g., K-means clustering)
Distance: Why Expensive? • An uncertain object has more than one possible location • The set of possible locations may be continuous • E.g., take n samples on each uncertain object • More samples in regions of higher probability density [Figure: uncertain objects o1 and o2]
Expected Distance: Why Expensive? • Expected distance: weighted average of all pair-wise combinations’ distances • VERY expensive • Much cheaper IF we do NOT need to try all combinations
Analytic Solutions • Uniform pdf • Gaussian pdf
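The transcript does not preserve the formulas from this slide. As a hedged sketch, assuming the distance is squared Euclidean and the two objects are independent (neither is stated in the transcript), the expected distance depends only on the means and covariances:

```latex
% Assumptions: squared Euclidean distance; X ~ pdf of o_i, Y ~ pdf of o_j, independent.
\begin{aligned}
ED(o_i, o_j) &= \mathbb{E}\,\lVert X - Y \rVert^2
  = \lVert \mu_i - \mu_j \rVert^2 + \operatorname{tr}(\Sigma_i) + \operatorname{tr}(\Sigma_j) \\[4pt]
\text{Gaussian pdf:}\quad & \mu,\ \Sigma \text{ are the distribution's own parameters} \\[4pt]
\text{Uniform pdf on } \textstyle\prod_{k=1}^{d}[a_k, b_k]\text{:}\quad
  & \mu_k = \frac{a_k + b_k}{2}, \qquad \operatorname{tr}(\Sigma) = \sum_{k=1}^{d} \frac{(b_k - a_k)^2}{12}
\end{aligned}
```

The symbols X, Y, a_k and b_k are introduced here for illustration only.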
Approximation Methods for Arbitrary pdf • 5 methods proposed …
2. Pair-wise between Random Samples (PRS) • Take n samples on each uncertain object [Figure: uncertain objects o1 and o2]
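A minimal sketch of the pair-wise idea, assuming squared Euclidean distance and equally weighted samples (the transcript states neither). The function names and the two sample clouds are hypothetical.

```python
import random

def sq_dist(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def expected_distance_prs(samples_i, samples_j):
    """Average the squared distance over all n_i * n_j sample pairs."""
    total = 0.0
    for x in samples_i:
        for y in samples_j:
            total += sq_dist(x, y)
    return total / (len(samples_i) * len(samples_j))

# Hypothetical 2-D objects: sample clouds around (0, 0) and (3, 4).
rng = random.Random(0)
o1 = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
o2 = [(rng.gauss(3, 1), rng.gauss(4, 1)) for _ in range(200)]
print(expected_distance_prs(o1, o2))
```

The double loop visits every pair, so the cost grows with the product of the two sample counts.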
3. Grid Approximation and Pair-wise between Samples (GAPS) • Approximation by a grid of √s × √s cells formed on the uncertainty domain • Probability of each cell determined by sampling
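A rough sketch of the grid idea for 2-D objects. Several details are assumptions not given in the transcript: the uncertainty domain is a rectangle supplied by the caller, each cell is represented by its center, and the distance is squared Euclidean. All names are hypothetical.

```python
import math
import random

def sq_dist(x, y):
    """Squared Euclidean distance between two 2-D points."""
    return (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2

def grid_cells(samples, domain, s):
    """Return [(cell_center, probability)] for a sqrt(s) x sqrt(s) grid.

    domain = (xmin, ymin, xmax, ymax) is the object's uncertainty domain;
    every sample is assumed to lie inside it.
    """
    xmin, ymin, xmax, ymax = domain
    g = math.isqrt(s)                          # cells per side
    w, h = (xmax - xmin) / g, (ymax - ymin) / g
    counts = {}
    for x, y in samples:
        cx = min(int((x - xmin) / w), g - 1)   # far-edge points go in the last cell
        cy = min(int((y - ymin) / h), g - 1)
        counts[(cx, cy)] = counts.get((cx, cy), 0) + 1
    n = len(samples)
    # Cell probability is estimated as the fraction of samples falling in it.
    return [((xmin + (cx + 0.5) * w, ymin + (cy + 0.5) * h), c / n)
            for (cx, cy), c in counts.items()]

def expected_distance_gaps(cells_i, cells_j):
    """Probability-weighted average over all pairs of occupied cells."""
    return sum(p * q * sq_dist(a, b) for a, p in cells_i for b, q in cells_j)

# Hypothetical objects: uniform samples over two separate squares.
rng = random.Random(0)
dom1, dom2 = (0.0, 0.0, 2.0, 2.0), (4.0, 4.0, 6.0, 6.0)
o1 = [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(500)]
o2 = [(rng.uniform(4, 6), rng.uniform(4, 6)) for _ in range(500)]
print(expected_distance_gaps(grid_cells(o1, dom1, 16), grid_cells(o2, dom2, 16)))
```

In this sketch the pair-wise step runs over at most s cells per object rather than over all samples, but replacing samples by cell centers introduces a discretization error.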
4. Pair-wise between Gaussian Mixture (PGM) • Use K-means to cluster samples into a few clusters • Approximate the uncertain object by a mixture of Gaussian distributions [Figure: uncertain objects o1 and o2]
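A hedged sketch of the mixture idea, assuming squared Euclidean distance, a plain Lloyd-style K-means, and one component per cluster whose weight is the cluster's share of the samples. The transcript does not give these details, and every name below is hypothetical.

```python
import random

def sq_dist(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans_labels(samples, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per sample."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)
    labels = [0] * len(samples)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in samples]
        for c in range(k):
            members = [p for p, l in zip(samples, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def gaussian_components(samples, labels):
    """Per cluster: (weight, mean, total variance = trace of the covariance)."""
    comps = []
    for c in set(labels):
        members = [p for p, l in zip(samples, labels) if l == c]
        m = len(members)
        mean = tuple(sum(col) / m for col in zip(*members))
        var = sum(sq_dist(p, mean) for p in members) / m
        comps.append((m / len(samples), mean, var))
    return comps

def expected_distance_pgm(comps_i, comps_j):
    """Weighted sum of the closed-form distance over all component pairs."""
    return sum(wa * wb * (sq_dist(ma, mb) + va + vb)
               for wa, ma, va in comps_i
               for wb, mb, vb in comps_j)

# Hypothetical objects: o1 has two clumps, o2 has one.
rng = random.Random(1)
o1 = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)] + \
     [(rng.gauss(6, 1), rng.gauss(0, 1)) for _ in range(100)]
o2 = [(rng.gauss(3, 1), rng.gauss(5, 1)) for _ in range(200)]
c1 = gaussian_components(o1, kmeans_labels(o1, 2))
c2 = gaussian_components(o2, kmeans_labels(o2, 2))
print(expected_distance_pgm(c1, c2))
```

Only each component's weight, mean, and total variance are needed here, because the squared-distance expectation between two components does not depend on the off-diagonal covariance terms.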
5. Approximation by Single Gaussian (ASG) • Approximate an uncertain object by a single Gaussian distribution • Complexity = O((ni+nj)d) [Figure: uncertain objects o1 and o2]
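A minimal sketch of the single-Gaussian idea, assuming squared Euclidean distance and reading ni, nj as the sample counts and d as the dimensionality (assumptions, since the slide's formula is not preserved in the transcript). Each object is summarized by its sample mean and total variance, and the function names are hypothetical.

```python
import random

def mean_and_total_variance(samples):
    """One pass per dimension: sample mean and trace of the (biased) covariance."""
    n = len(samples)
    d = len(samples[0])
    mean = [sum(p[k] for p in samples) / n for k in range(d)]
    total_var = sum((p[k] - mean[k]) ** 2 for p in samples for k in range(d)) / n
    return mean, total_var

def expected_distance_asg(samples_i, samples_j):
    """Squared distance between the means plus both total variances."""
    mean_i, var_i = mean_and_total_variance(samples_i)
    mean_j, var_j = mean_and_total_variance(samples_j)
    between = sum((a - b) ** 2 for a, b in zip(mean_i, mean_j))
    return between + var_i + var_j

# Hypothetical 2-D objects: sample clouds around (0, 0) and (3, 4).
rng = random.Random(0)
o1 = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
o2 = [(rng.gauss(3, 1), rng.gauss(4, 1)) for _ in range(200)]
print(expected_distance_asg(o1, o2))
```

Each sample is touched a constant number of times per dimension, which is consistent with the O((ni+nj)d) cost on the slide. No loop over sample pairs is needed.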
Equivalence of PRS, PGM and ASG • Theorem: • Given any uncertain objects oi, oj and their samples, ED_PRS(oi, oj) = ED_PGM(oi, oj) = ED_ASG(oi, oj) • So, ASG vs PRS and PGM • ASG is the cheapest, with the same accuracy • What about ASG vs DM and GAPS?
Performance Study • Experimental results show that • ASG vs DM • ASG is much more accurate, with comparable speed • ASG vs GAPS • ASG is much faster than GAPS, with higher or comparable accuracy
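The transcript gives the theorem without its proof. A short sketch of why the PRS and ASG values coincide, assuming squared Euclidean distance, equally weighted samples x_1..x_{n_i} and y_1..y_{n_j}, and variances computed with the 1/n normalization (all assumptions here):

```latex
% Split each difference around the sample means \bar{x}, \bar{y}:
%   x_a - y_b = (x_a - \bar{x}) + (\bar{x} - \bar{y}) - (y_b - \bar{y})
% All cross terms vanish because \sum_a (x_a - \bar{x}) = 0 and \sum_b (y_b - \bar{y}) = 0.
\begin{aligned}
\frac{1}{n_i n_j} \sum_{a=1}^{n_i} \sum_{b=1}^{n_j} \lVert x_a - y_b \rVert^2
 &= \lVert \bar{x} - \bar{y} \rVert^2
  + \frac{1}{n_i} \sum_{a=1}^{n_i} \lVert x_a - \bar{x} \rVert^2
  + \frac{1}{n_j} \sum_{b=1}^{n_j} \lVert y_b - \bar{y} \rVert^2
\end{aligned}
```

The left side is the all-pairs average, and the right side needs only each object's sample mean and total variance. Grouping the samples into clusters first gives the same total by the law of total variance.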
Conclusion • ASG can obtain highly accurate results quickly • For data with arbitrary pdf, uniform pdf, Gaussian mixture pdf • ASG can replace GAPS used in recent research work