This text introduces cluster analysis, a technique also known as classification analysis or numerical taxonomy, which groups objects so that similarity within groups and dissimilarity between groups are both maximized. It outlines applications of cluster analysis in consumer studies: benefit segmentation, niche identification, finding homogeneous market segments, and data reduction. It covers both divisive and agglomerative procedures, several similarity measures, and linkage methods, then lists common problems in cluster analysis along with practical fixes.
Cluster Analysis Dr. Michael R. Hyman
Introduction • Also called classification analysis and numerical taxonomy • Goal: assign objects to groups so that intra-group similarity and inter-group dissimilarity are maximized • No distinction between dependent and independent variables • Find naturally occurring groupings of objects
Uses in Studying Consumers • Benefit segmentation • Finding market niches • Finding homogeneous market segments for future study • Data reduction
Scatter Plot of Income and Education Data for PC Owners and Non-owners
Procedure #1: Divisive (tear down) • Start with profile data • Find variable with highest variance • Split objects above and below mean on this variable • Find the remaining variable with the highest variance and split at its mean
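The divisive steps above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `divisive_clusters`, the `n_splits` parameter, and the income/education figures are all hypothetical, and it assumes variances and means are computed once over the full data set rather than recomputed within each subgroup.

```python
from statistics import mean, pvariance

def divisive_clusters(profiles, n_splits=2):
    """Label each object by whether it falls above the mean on the
    n_splits highest-variance variables (assumption: full-sample means)."""
    n_vars = len(profiles[0])
    columns = [[row[j] for row in profiles] for j in range(n_vars)]
    # Rank variables from highest to lowest variance.
    ranked = sorted(range(n_vars), key=lambda j: pvariance(columns[j]), reverse=True)
    labels = [() for _ in profiles]
    for j in ranked[:n_splits]:
        cut = mean(columns[j])
        # Append True if the object lies above the mean on variable j.
        labels = [lab + (row[j] > cut,) for lab, row in zip(labels, profiles)]
    return labels

# Hypothetical profile data: [income in $1000s, years of education]
profiles = [[20, 10], [25, 16], [80, 12], [90, 18]]
labels = divisive_clusters(profiles)
```

Objects that share the same label tuple fall in the same cluster. Note that the unstandardized income variable dominates the variance ranking here simply because of its scale.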
Procedure #2: Agglomerative (build up) • Select similarity measure • Distance (Euclidean, city block) • Correlation • Similarity • Search similarity matrix for most similar cluster pair and merge it • Repeat iteratively until only one cluster remains
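A rough Python sketch of the build-up loop follows, using the two distance measures named above. The helper names (`city_block`, `cluster_gap`, `agglomerate`) and the sample points are hypothetical. The slide does not say how the distance between two multi-object clusters is measured at this step, so the sketch assumes the closest pair of members is used.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)

def city_block(a, b):
    """City-block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cluster_gap(points, cluster_a, cluster_b, distance):
    # Assumption: clusters are compared by their closest pair of members.
    return min(distance(points[i], points[j]) for i in cluster_a for j in cluster_b)

def agglomerate(points, distance=dist):
    """Merge the most similar cluster pair until one cluster remains.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [[i] for i in range(len(points))]  # every object starts alone
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_gap(points, pair[0], pair[1], distance))
        history.append((a, b, cluster_gap(points, a, b, distance)))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return history

points = [(1, 1), (1, 2), (6, 6), (8, 6)]  # hypothetical objects
history = agglomerate(points)              # Euclidean by default
history_cb = agglomerate(points, city_block)
```

Each entry in `history` records which two clusters (as lists of object indices) were combined and at what distance, which is the information a dendrogram displays.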
Procedure #2: Agglomerative Stopping Rules • Theory and practice • Distance at which clusters combine • Within/between group variance • Relative sizes of clusters
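One way to act on the "distance at which clusters combine" rule is to stop just before the largest jump in merge distance. The sketch below is a common heuristic offered as an illustration, not a rule stated in the slides; the function name and the sample distances are hypothetical.

```python
def clusters_before_largest_jump(merge_distances):
    """Suggest a cluster count by stopping before the biggest increase
    in merge distance. merge_distances lists the n-1 merge distances,
    in merge order, for n objects (needs at least two merges)."""
    n_objects = len(merge_distances) + 1
    # Increase in distance between each merge and the next one.
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    # Keep merges 0..biggest; each merge reduces the cluster count by one.
    return n_objects - (biggest + 1)

# Hypothetical merge distances for four objects.
suggested = clusters_before_largest_jump([1.0, 2.0, 6.4])
```

Here the third merge is far more costly than the first two, so the heuristic suggests keeping two clusters. Theory and the relative sizes of the resulting clusters should still temper any purely numeric rule.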
Procedure #2: Agglomerative Linkage Methods • Single (nearest neighbor) • Makes long, thin clusters • Complete (farthest neighbor; maximum distance) • Sensitive to outliers • Average (mean distance between objects) • Variance methods (minimum within-cluster variance) • Nodal (begin with two least similar objects as nodes)
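To make the differences among linkage methods concrete, the sketch below computes the distance between the same two clusters under four rules. The function name `linkage_distance` and the sample clusters are hypothetical. The "ward" branch is one specific variance method (Ward's criterion, the increase in within-cluster sum of squares caused by a merge), chosen here as an assumption since the slide does not name a particular one. Nodal linkage is not covered.

```python
from math import dist
from statistics import mean

def linkage_distance(cluster_a, cluster_b, method):
    """Distance between two clusters of points under a given linkage rule."""
    pairwise = [dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":      # nearest neighbor
        return min(pairwise)
    if method == "complete":    # farthest neighbor
        return max(pairwise)
    if method == "average":     # mean of all between-cluster pairs
        return mean(pairwise)
    if method == "ward":        # increase in within-cluster sum of squares
        centroid_a = tuple(mean(coord) for coord in zip(*cluster_a))
        centroid_b = tuple(mean(coord) for coord in zip(*cluster_b))
        n_a, n_b = len(cluster_a), len(cluster_b)
        return n_a * n_b / (n_a + n_b) * dist(centroid_a, centroid_b) ** 2
    raise ValueError(f"unknown method: {method}")

# Hypothetical clusters laid out along one axis for easy checking.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (9, 0)]
results = {m: linkage_distance(cluster_a, cluster_b, m)
           for m in ("single", "complete", "average", "ward")}
```

The point (9, 0) sits far from the rest, and it alone sets the complete-linkage distance, which illustrates the sensitivity to outliers noted above.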
Procedure #2: Agglomerative Reliability and Validity Assessment • Use different distance measures • Use different clustering methods • Split data, run both halves, and compare • Shuffle cases (objects) • Solve with subset of profile variables
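Comparing two solutions, whether from different distance measures, different methods, or shuffled cases, requires some measure of agreement between cluster assignments. The slides do not name one. The sketch below uses the Rand index (the share of object pairs that both solutions treat the same way) as one plausible choice, and the label lists are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two clusterings agree:
    both put the pair together, or both keep it apart."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical cluster labels for the same four objects.
solution_1 = [0, 0, 1, 1]
solution_2 = [1, 1, 0, 0]   # same grouping, different label names
solution_3 = [0, 0, 0, 1]   # a genuinely different grouping

same = rand_index(solution_1, solution_2)
different = rand_index(solution_1, solution_3)
```

Because the index looks only at whether pairs are grouped together, it ignores arbitrary differences in cluster numbering between runs.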
General Problems • Early assignments treated as permanent • Precludes later revision for improved fit • Number of clusters • More clusters means greater intra-group homogeneity but less descriptive power • No good measure of cluster compactness • Lack of statistical properties makes inference difficult
General Problems (cont.) • Coping with inter-correlated profile variables • Must select profile variables that can discriminate among objects • Sensitive to unit of measurement and outliers • Fix: Standardize data and delete outliers • Subjective interpretation of results (e.g., naming clusters)
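The standardization fix can be illustrated with z-scores, which put every profile variable on the same scale before distances are computed. This is a minimal sketch: the function name `standardize` and the data are hypothetical, it uses the population standard deviation, and it assumes no variable is constant (which would cause division by zero). Outlier deletion is not shown.

```python
from statistics import mean, pstdev

def standardize(profiles):
    """Convert each variable to z-scores: (value - mean) / standard deviation."""
    columns = list(zip(*profiles))
    centers = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]  # assumes no constant column
    return [
        [(x - m) / s for x, m, s in zip(row, centers, spreads)]
        for row in profiles
    ]

# Hypothetical profiles: [income in dollars, years of education]
raw = [[20000, 10], [40000, 12], [60000, 14]]
z = standardize(raw)
```

In the raw data, income differences in the thousands swamp education differences of a few years. After standardization both variables contribute equally to any distance measure.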