
Cluster Analysis

Objectives: address heterogeneity. Combine observations into groups or clusters such that the groups formed are homogeneous (similar) within the group and heterogeneous (different) from other groups on some set of variables.

tamal

Presentation Transcript


  1. Cluster Analysis: Objectives
     Address heterogeneity: combine observations into groups or clusters such that the groups formed are homogeneous (similar) within the group and heterogeneous (different) from other groups on some set of variables.
     When we don't have "some variables", we can still form groups using Multidimensional Scaling (MDS) techniques.
     • MDS: continuous space
     • Cluster: discrete groups
     Main application in marketing: market segmentation
     Data requirement: generally interval or ratio (ordinal and nominal?)
     Steps
     • Decide on measures of distance (similarity or dissimilarity)
     • Hierarchical cluster: decide on how to combine observations
     • Non-hierarchical cluster (K-means or quick cluster)
     • Interpretation of clusters
     • How many clusters
     • Cluster validation

  2. Cluster Analysis: Measures of Distance (Similarity or Dissimilarity)
     Two types of measures of distance (or proximity, similarity)
     • Direct: we shall use these in MDS
     • Indirect: derived from original variables or factor scores
     Indirect measures of distance
     • Non-metric: we shall use these in MDS
     • Metric data
       • Euclidean distance
       • Minkowski distance
       • Mahalanobis distance
     Example: distance between BMW and Ford, where i = BMW, j = Ford, and k = number of variables
     [Figure: BMW and Ford plotted on axes v1 and v2, with the Euclidean distance (ED) marked; labels: Luxury, Luxury & Safety, Safety. The slide's Euclidean, Minkowski, and Mahalanobis formulas did not survive in the transcript.]
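The three metric distances named on this slide can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions, not the slide's own formulas (which are missing from the transcript). The ratings for `bmw` and `ford` and the covariance matrix `cov` are hypothetical values chosen for illustration, and `mahalanobis_2d` is a hypothetical helper limited to two variables.

```python
import math

def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def minkowski(x, y, p):
    """Minkowski distance of order p (p=1 is city-block, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance for two variables, given a 2x2 covariance matrix.

    It rescales the differences by the variances and adjusts for the
    correlation between the two variables.
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    # Inverse of the 2x2 covariance matrix, written out by hand.
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    d = [x[0] - y[0], x[1] - y[1]]
    q = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
         + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    return math.sqrt(q)

# Hypothetical ratings on (luxury, safety); not taken from the slides.
bmw = (9.0, 6.0)
ford = (5.0, 3.0)
# Hypothetical covariance matrix of the two variables.
cov = [[4.0, 0.0], [0.0, 1.0]]

print(euclidean(bmw, ford))       # 5.0
print(minkowski(bmw, ford, 1))    # 7.0
print(mahalanobis_2d(bmw, ford, cov))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, which is one way to see it as a scale- and correlation-adjusted version of the same idea.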

  3. Similarity Measures
     • Correlational vs. distance measure: a conceptual decision
     • Compromise: Mahalanobis distance
     • Not based on a statistical foundation
     Assumptions
     • Representative samples
     • No multicollinearity

  4. Cluster Analysis: Hierarchical Clustering
     • Dendrogram
     • Methods to combine observations
       • Centroid
       • Nearest neighbor or single linkage
       • Farthest neighbor or complete linkage
       • Average linkage
       • Ward's
     • Data should be scaled?
     [Figure: observations s1 to s6 plotted against distance; panel titles: Centroid Method, Nearest neighbor.]
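As a rough sketch of how observations get combined, the following implements the nearest-neighbor (single linkage) rule from the list above. The function name `single_linkage`, the six coordinates standing in for s1 to s6, and the choice to stop at three clusters are all assumptions made for illustration; the slide's actual data are not in the transcript.

```python
import math

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of point indices) and the merge
    history as (cluster_a, cluster_b, distance) tuples, which is the
    information a dendrogram displays.
    """
    clusters = [[i] for i in range(len(points))]
    history = []

    def linkage(a, b):
        # Nearest neighbor: smallest distance between any pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Once merged, members are never split apart again.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, history

# Hypothetical coordinates for six observations (s1..s6 are indices 0..5).
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0), (9.0, 1.6)]
clusters, history = single_linkage(points, 3)
print(sorted(sorted(c) for c in clusters))
```

Replacing `min` with `max` inside `linkage` gives the farthest-neighbor (complete linkage) rule. Because the rule works on raw distances, variables measured on larger scales dominate, which is why the slide asks whether the data should be scaled.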

  5. Cluster Analysis: Non-Hierarchical Clustering
     • K-Means Cluster / Quick Cluster
     • The data are divided into k groups, each group representing a cluster
     • STEPS
       • Select k initial cluster centroids, where k is the number of clusters desired
       • Assign each observation to the cluster to which it is closest
       • Reassign or relocate each observation to one of the k clusters until a predetermined stopping rule is met
     Example: say we want 3 clusters and the first 3 observations are the initial centroids. Change criterion: continue if the change is > 2%.
     Which clustering method is best?
     1. Hierarchical
        • Which one to use?
        • Advantage: no prior knowledge of the number of clusters is needed
        • Disadvantage: once assigned, no reassignment
     2. K-Means / Quick Cluster
        • Requires prior knowledge: how many clusters?
     Complementary: run hierarchical, decide on the number of clusters, then run K-means
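The K-means steps can be sketched as follows, using the slide's setup of taking the first k observations as initial centroids. The slide does not say which quantity the 2% change criterion applies to; this sketch assumes it is the relative drop in the within-cluster sum of squares, and that is an assumption, not the presenter's stated rule. The function name `k_means`, its parameters, and the data are hypothetical.

```python
import math

def k_means(points, k, tol=0.02, max_iter=100):
    """K-means using the first k observations as initial centroids.

    Assumption: iteration continues only while the within-cluster sum of
    squares improves by more than `tol` (2%) relative to the previous pass.
    """
    centroids = [tuple(p) for p in points[:k]]
    prev_sse = None
    labels = []
    for _ in range(max_iter):
        # Assign each observation to the cluster whose centroid is closest.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
        sse = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
        # Stop once the relative improvement is 2% or less.
        if prev_sse is not None and (prev_sse == 0 or (prev_sse - sse) / prev_sse <= tol):
            break
        prev_sse = sse
    return labels, centroids

# Hypothetical data: the first three observations seed the three clusters.
points = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.2, 0.8), (5.1, 4.9), (8.8, 1.2)]
labels, centroids = k_means(points, 3)
print(labels)
```

Unlike the hierarchical methods, an observation here can move to a different cluster on a later pass, which is the reassignment advantage the comparison on this slide points to.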

  6. Interpretation of Clusters
     • Pseudo F
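The slide only names the pseudo F statistic. Its usual definition is the ratio of between-cluster to within-cluster variation, each divided by its degrees of freedom: pseudo F = [SSB / (k - 1)] / [SSW / (n - k)], where n is the number of observations and k the number of clusters. A minimal sketch under that definition, with a hypothetical function name and made-up data:

```python
def pseudo_f(points, labels):
    """Pseudo F: (SSB / (k - 1)) / (SSW / (n - k)).

    SSB is the between-cluster sum of squares and SSW the within-cluster
    sum of squares; larger values indicate tighter, better-separated clusters.
    """
    n = len(points)
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    k = len(groups)
    dims = range(len(points[0]))
    grand = [sum(p[d] for p in points) / n for d in dims]
    ssb = ssw = 0.0
    for members in groups.values():
        mean = [sum(p[d] for p in members) / len(members) for d in dims]
        ssb += len(members) * sum((mean[d] - grand[d]) ** 2 for d in dims)
        ssw += sum((p[d] - mean[d]) ** 2 for p in members for d in dims)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Hypothetical one-variable data with an obvious two-cluster structure.
data = [(0.0,), (2.0,), (10.0,), (12.0,)]
print(pseudo_f(data, [0, 0, 1, 1]))  # 50.0
print(pseudo_f(data, [0, 1, 0, 1]))  # much smaller: a poor grouping
```

Comparing the statistic across candidate solutions is one common way to approach the "how many clusters" step listed in the objectives.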

  7. Cluster Analysis: Validation
     Cross-validation
     • S1 = assignment based on cluster on 1-14 cases
     • S2 = assignment based on separate cluster
     Example from text: hit rate = 112/151 = 74%
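The hit rate is the share of cases that both assignments place in the same cluster. A minimal sketch, assuming the cluster labels of the two solutions have already been matched to each other (the transcript does not describe how the text does this); `hit_rate` and the short label lists are hypothetical.

```python
def hit_rate(s1, s2):
    """Share of cases given the same cluster label by both assignments."""
    if len(s1) != len(s2):
        raise ValueError("both assignments must cover the same cases")
    hits = sum(a == b for a, b in zip(s1, s2))
    return hits / len(s1)

# Hypothetical assignments for five cases.
print(hit_rate([1, 1, 2, 2, 3], [1, 1, 2, 3, 3]))  # 0.8

# The slide's figure: 112 matching cases out of 151.
print(round(112 / 151 * 100))  # 74
```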

  8. Latent Segments Model to Incorporate Heterogeneity

  9. Introduction
     • Customer segmentation: partition consumers into homogeneous groups that differ in purchasing behavior
     • It provides information about consumer preferences and market structure at the segment level
     • Consumers with similar socio-demographics can have different purchasing behavior
     • Brand choice probabilities can be used to define both market segments and market structure
     • Theoretical model: multinomial logit
       • Conceptual appeal: grounded in economic theory
       • Analytical tractability and ease of econometric estimation
       • Excellent empirical performance
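To make the link between brand choice probabilities and segments concrete, here is a minimal sketch of the multinomial logit formula, P(brand j) = exp(u_j) / sum over brands of exp(u_k), combined across segments. It assumes the common finite-mixture form in which market-level shares are the segment-size-weighted average of within-segment logit probabilities; the function names, utilities, and segment sizes are hypothetical and not taken from the presentation.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for one segment."""
    m = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_shares(segment_sizes, segment_utilities):
    """Market-level shares: segment-size-weighted mix of segment-level logits."""
    n_brands = len(segment_utilities[0])
    shares = [0.0] * n_brands
    for size, utils in zip(segment_sizes, segment_utilities):
        for j, p in enumerate(mnl_probabilities(utils)):
            shares[j] += size * p
    return shares

# Hypothetical example: two segments choosing between two brands.
sizes = [0.3, 0.7]                    # segment sizes, summing to 1
utilities = [[1.0, 0.0], [0.0, 1.0]]  # each segment prefers a different brand
print(aggregate_shares(sizes, utilities))
```

In this form each segment has its own set of choice probabilities, so differences in the utilities describe the segments while the mix of shares describes the market structure.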

  10. Kamakura and Russell (1989) propose and test latent segmentation.
      • Numerous applications and citations (200+)
      • Discrete interpretation of a continuous distribution
      • Useful applications in marketing and other areas
      • In our own work, it was used to determine the size of the price-sensitive segment (25% to 35%)
