Density Biased Sampling An improved method for clustering
Density Biased Sampling: An improved method for clustering. By: Mesbah, Seyedsadra, Department of Computer Science, Lakehead University, December 2013. Table of Contents: Abstract, Introduction, Density Biased Sampling, Related Works, Approximating Density Biased Sampling, Experiments
Presentation Transcript
Density Biased Sampling: An improved method for clustering By: Mesbah, Seyedsadra Department of Computer Science, Lakehead University December 2013
Table of Contents • Abstract • Introduction • Density Biased Sampling • Related Works • Approximating Density Biased Sampling • Experiments • Methodology • Evaluation Metrics • Data Generation • Results • Conclusion • References
Abstract • Purpose • Problem with Uniform Random Sampling • Under Sample / Over Sample • Weighted Sample • Memory Efficient
Introduction • Uniform Sampling / No Value Consideration • Sets of Equivalent Records • Clustering in General • Reduce the Data Size • P-Uniform • Example • Density Biased Sample / Weighted Sample
Density Biased Sampling • Basic Definition • Constraints • Uniform Selection • Density Preserving Sample • Biased by Group Size / Sample Size M • Observations
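The definition and constraints on this slide can be illustrated with a minimal Python sketch. It assumes the formulation in Palmer and Faloutsos (reference 1): the data is partitioned into groups, a point in a group of size n_i is kept with probability alpha / n_i^e, alpha is chosen so that the expected sample size is M, and each kept point carries the weight 1 / probability so that dense groups are not under-represented. The grid-cell grouping, the function name density_biased_sample, and the parameters cell_size, e, and seed are assumptions made for illustration, not taken from the slides.

```python
import random
from collections import defaultdict

def density_biased_sample(points, cell_size, M, e=0.5, seed=0):
    """Return a list of (point, weight) pairs (hypothetical helper).

    points    : list of equal-length tuples of floats
    cell_size : edge length of the grid cells used as groups (assumption)
    M         : target expected sample size
    e         : bias exponent; e=0 is uniform sampling, larger e favors sparse groups
    """
    rng = random.Random(seed)

    # Partition the points into groups: one group per grid cell.
    groups = defaultdict(list)
    for p in points:
        key = tuple(int(x // cell_size) for x in p)
        groups[key].append(p)

    # Choose alpha so that sum_i n_i * (alpha / n_i**e) = M.
    norm = sum(len(g) ** (1.0 - e) for g in groups.values())
    alpha = M / norm

    sample = []
    for g in groups.values():
        # Clipping at 1.0 can make the expected size fall slightly below M.
        prob = min(1.0, alpha / len(g) ** e)
        for p in g:
            if rng.random() < prob:
                # Weight = 1 / selection probability, so each sampled point
                # stands in for that many original points.
                sample.append((p, 1.0 / prob))
    return sample
```

With e=0 every point is kept with the same probability M/N, which is plain uniform sampling. With e=1 every group contributes the same expected number of points regardless of its size.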
Related Works • Some Related Works • BIRCH Algorithm • Uniform Sampling vs. CF-Tree • DBS vs. PPS
Approximating DBS • Data Needs to Be Partitioned • Insufficient Memory Problem • Two-Pass Algorithm • Sample of First j Items • Convert to One-Pass Algorithm
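The two-pass idea on this slide can be sketched as follows. This is a hedged illustration, not the presenter's code: it assumes group sizes are approximated by hashing each grid-cell label into a fixed-size array of counters, so memory stays bounded even when there are too many groups to count exactly. Colliding groups share a counter and are treated as one larger group. The name hashed_dbs and the parameters cell_size, table_size, e, and seed are hypothetical. The conversion to a one-pass algorithm is not shown.

```python
import random

def hashed_dbs(points, cell_size, M, table_size=1024, e=0.5, seed=0):
    """Two-pass, hash-based approximation of density biased sampling.

    points must be re-iterable (e.g. a list), because it is scanned twice.
    Returns a list of (point, weight) pairs.
    """
    rng = random.Random(seed)

    def slot(p):
        # Grid-cell label hashed into a fixed number of counters.
        key = tuple(int(x // cell_size) for x in p)
        return hash(key) % table_size

    # Pass 1: approximate group sizes with a fixed-size count array.
    counts = [0] * table_size
    for p in points:
        counts[slot(p)] += 1

    # Normalize so the expected sample size is M (before clipping).
    norm = sum(c ** (1.0 - e) for c in counts if c > 0)
    alpha = M / norm

    # Pass 2: keep each point with a probability based on its slot count.
    sample = []
    for p in points:
        prob = min(1.0, alpha / counts[slot(p)] ** e)
        if rng.random() < prob:
            sample.append((p, 1.0 / prob))
    return sample
```

A smaller table_size saves memory at the cost of more collisions, which merge unrelated groups and push the result toward the behavior of a coarser partition.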
Experiments • Aim • Conditions
Methodology • Experiment Specifications • BIRCH Summarization • Uniform Random Sampling • Hash Based Approximation • Exact Density Biased Sampling
Evaluation Metrics • RMS • RMS Error • Number of Clusters Found (NC)
Data Generation • Based on Mixture Model • Discard Noise • Cluster Membership Distributions • Example
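A mixture-model generator with skewed cluster sizes might look like the sketch below. The slides do not give the generator's details, so everything specific here is an assumption: Gaussian clusters, centers drawn uniformly from the unit cube, cluster sizes proportional to 1 / rank^z (a Zipf-like law, with z=0 giving equal sizes), and the hypothetical names zipf_cluster_sizes, generate_mixture, spread, and seed.

```python
import random

def zipf_cluster_sizes(n_points, n_clusters, z=1.0):
    """Split n_points into cluster sizes proportional to 1 / rank**z."""
    weights = [1.0 / (rank ** z) for rank in range(1, n_clusters + 1)]
    total = sum(weights)
    sizes = [int(n_points * w / total) for w in weights]
    # Give the rounding remainder to the largest cluster.
    sizes[0] += n_points - sum(sizes)
    return sizes

def generate_mixture(n_points, n_clusters, dim=2, z=1.0, spread=0.02, seed=0):
    """Return a list of (point, cluster_label) pairs from a Gaussian mixture."""
    rng = random.Random(seed)
    data = []
    for label, size in enumerate(zipf_cluster_sizes(n_points, n_clusters, z)):
        # Assumed: each cluster center is uniform in the unit cube.
        center = [rng.random() for _ in range(dim)]
        for _ in range(size):
            point = tuple(rng.gauss(c, spread) for c in center)
            data.append((point, label))
    return data
```

Raising z makes the size distribution more skewed, which is the setting where uniform sampling tends to miss the small clusters.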
Results (1) • BIRCH Performs Quite Poorly
Results (2) • IBS and IRBS Find More Clusters
Results (3) • In the Average Case, IBS and IRBS Are Better
Results (4) • Binning Is Ideal for IRBS
Results (5) • Collisions Have No Effect on Clustering
Applications • Improve Summarizations • Statistical Models
Conclusion • General Summary • Hash Based Approximation • Appropriate Binning • Problem with Uniform Sampling • Using Zipf Distribution
References 1. Christopher R. Palmer, Christos Faloutsos, "Density Biased Sampling: An Improved Method for Data Mining and Clustering" 2. A. K. Jain et al., "Survey of Recent Clustering Techniques in Data Mining", International Journal of Computer Science and Management Research, Vol. 1, Issue 1, Aug 2012, ISSN 2278-733X, p. 72