A Framework for Projected Clustering of High

A Framework for Projected Clustering of High Dimensional Data Streams

Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004

Motivation and Underlying Concepts
• In a high-dimensional setting, not all dimensions should be considered for clustering.
• The fading cluster structure: a fading function f(t) weights each point by its age.
• The half-life t0 of a point is defined as the time at which f(t0) = (1/2) f(0).
• A fading cluster structure at time t is defined for a set of d-dimensional points.
• The cluster structure has two properties: additivity and temporal multiplicity.
• The clustering process must simultaneously maintain the clusters and the set of dimensions associated with each cluster.
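The fading cluster structure and its two properties can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the exponential fading function f(t) = 2^(-λt), which the slides do not spell out, and the class and method names (`FadingCluster`, `fade_to`, `merge`) are hypothetical.

```python
import numpy as np


class FadingCluster:
    """Hypothetical fading cluster structure: (FC2, FC1, W) for d dimensions."""

    def __init__(self, d, decay_rate):
        self.decay_rate = decay_rate   # lambda in the assumed f(t) = 2 ** (-lambda * t)
        self.fc2 = np.zeros(d)         # weighted sum of squares per dimension
        self.fc1 = np.zeros(d)         # weighted sum per dimension
        self.weight = 0.0              # sum of the weights of all points
        self.last_update = 0.0

    def fade_to(self, t):
        """Temporal multiplicity: with no new points, every entry scales by f(dt)."""
        factor = 2.0 ** (-self.decay_rate * (t - self.last_update))
        self.fc2 *= factor
        self.fc1 *= factor
        self.weight *= factor
        self.last_update = t

    def add_point(self, x, t):
        """Fade the statistics to time t, then add the new point with weight 1."""
        self.fade_to(t)
        x = np.asarray(x, dtype=float)
        self.fc2 += x * x
        self.fc1 += x
        self.weight += 1.0

    def merge(self, other):
        """Additivity: two structures at the same time stamp add entry by entry."""
        if self.last_update != other.last_update:
            raise ValueError("fade both clusters to the same time before merging")
        self.fc2 += other.fc2
        self.fc1 += other.fc1
        self.weight += other.weight

    def centroid(self):
        return self.fc1 / self.weight


def half_life(decay_rate):
    """Under the assumed f(t), f(t0) = f(0) / 2 holds at t0 = 1 / lambda."""
    return 1.0 / decay_rate
```

With a decay rate of 0.5, a point's weight halves every 2 time units, while the centroid of a cluster that receives no new points stays where it is, because FC1 and W fade by the same factor.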

HPStream: High-Dimensional Projected Stream Clustering Method

HPStream Algorithm – Brief Explanation
- Set the parameters.
- Normalize the data.
- Perform the initial clustering using k-means on InitNumber points.
- ComputeDimensions: this procedure chooses the dimensions of each cluster so that the spread along the chosen dimensions is as small as possible.
- FindProjectedDist: this procedure determines the cluster closest to the incoming data point.
- FindLimitingRadius: this procedure determines the limiting radius of that cluster.
- Finally, decide which cluster to add or delete.
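The dimension selection and assignment steps can be sketched roughly as below. The function names are hypothetical stand-ins for ComputeDimensions, FindProjectedDist, and FindLimitingRadius, and several details are assumptions that the slides do not confirm: the dimensions are picked globally as the k*l smallest spreads, the projected distance is a Manhattan distance averaged over the selected dimensions, and the limiting radius is the spread radius factor times the mean spread over those dimensions.

```python
import numpy as np


def compute_dimensions(radii, l):
    """Keep the k*l (cluster, dimension) pairs with the smallest spread.

    radii: (k, d) array; radii[i, j] is the spread of cluster i along dimension j.
    Returns a boolean (k, d) mask of the dimensions kept for each cluster.
    """
    radii = np.asarray(radii, dtype=float)
    k, d = radii.shape
    keep = np.argsort(radii, axis=None, kind="stable")[: k * l]
    mask = np.zeros(k * d, dtype=bool)
    mask[keep] = True
    return mask.reshape(k, d)


def projected_distance(x, centroid, dims):
    """Assumed metric: Manhattan distance averaged over the selected dimensions."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(centroid, dtype=float))
    n = int(dims.sum())
    # A cluster that kept no dimensions can never be the closest one.
    return diff[dims].sum() / n if n else np.inf


def assign(x, centroids, mask, radii, spread_factor=2.0):
    """Return the index of the closest cluster, or None if x lies outside
    that cluster's (assumed) limiting radius and should seed a new cluster."""
    radii = np.asarray(radii, dtype=float)
    dists = [projected_distance(x, c, m) for c, m in zip(centroids, mask)]
    best = int(np.argmin(dists))
    limiting_radius = spread_factor * radii[best][mask[best]].mean()
    return best if dists[best] <= limiting_radius else None
```

A point that is far from a centroid only along a discarded dimension is still assigned to that cluster, which is the point of measuring distance in the projected subspace.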

Experimental Setup
HPStream is compared with CluStream; both are implemented in MS VC++.
Data: one synthetic data set and two real-world data sets (Network Intrusion and Forest Cover Type).
Comparison criteria for judging the two algorithms:
- accuracy: clustering quality
- efficiency: stream processing rate
- sensitivity: varying the decay rate, l, and the radius threshold
- scalability: varying the number of dimensions and clusters
Parameters are initialized as follows: decay rate = 0.5, spread radius factor = 2, InitNumber = 2000, average projected dimensionality l > d/2.

Comparing Accuracy : Using clustering quality and cluster purity
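Cluster purity can be illustrated with a short sketch. It assumes the common definition, the share of the dominant class label in each cluster averaged over the clusters, which the slide itself does not state; the function name `cluster_purity` is hypothetical.

```python
from collections import Counter


def cluster_purity(assignments, labels):
    """Average, over clusters, of the fraction of points with the dominant label."""
    by_cluster = {}
    for cluster_id, label in zip(assignments, labels):
        by_cluster.setdefault(cluster_id, []).append(label)
    shares = [
        Counter(members).most_common(1)[0][1] / len(members)
        for members in by_cluster.values()
    ]
    return sum(shares) / len(shares)
```

For example, a cluster holding two "normal" points and one "attack" point has purity 2/3, and a cluster holding only "attack" points has purity 1, so the average over both is 5/6.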

Accuracy comparison continued:

Efficiency comparison using Stream Processing Rate:

Sensitivity: Varying the average projected dimensionality l

Sensitivity: Varying radius threshold and decay rate

Scalability: Varying dimensionality and number of clusters
