Session 9

Presentation Transcript


  1. Session 9

  2. Outline • Two Multivariate Methods • Cluster Analysis • Excel • Minitab • Discriminant Analysis • Excel • Minitab • Steam case • Cars Applied Regression -- Prof. Juran

  3. Cluster Analysis • Concerned with grouping a large number of observations into reasonable sub-groups (clusters) on the basis of their similarities on multiple dimensions • Similar to regression in terms of its basic method: finding a solution that minimizes a total sum of squared errors • Not concerned with explaining variability or forecasting • No dependent variable

  4. Example: MBA Programs

  5. Cluster Analysis Questions • Given a certain number of clusters, which schools are grouped together? • How is the set of clusters affected if we change the number of clusters? • For each cluster, which school is the most “typical”? • How different are the clusters from each other? • What is the best number of clusters?

  6. Basic Method in Excel • We will assume that all of these attributes deserve equal weighting in our analysis. • We will (a) name a school as the “typical” school in each cluster (called the centroid of the cluster), (b) assign each of the non-centroid schools to the cluster whose centroid it is most similar to, and (c) optimize the identities of the centroids and the cluster assignments so as to minimize the total squared Euclidean distance between each school and its cluster centroid. • We define “most similar” to be the least sum of squared errors across all attributes between a cluster member and the centroid of the cluster.
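The objective described on this slide can be written compactly. This is a sketch under assumed notation, none of which appears in the slides: $z_{ij}$ is the value of attribute $j$ for school $i$, $n$ is the number of schools, $p$ the number of attributes, $k$ the number of clusters, and $C$ the set of schools chosen as centroids.

```latex
\min_{C \subseteq \{1,\dots,n\},\; |C| = k} \;\; \sum_{i=1}^{n} \; \min_{c \in C} \; \sum_{j=1}^{p} \left( z_{ij} - z_{cj} \right)^{2}
```

The inner sum is the squared distance between school $i$ and centroid school $c$; the inner minimum assigns each school to its most similar centroid; the outer sum is the total that the choice of centroids is meant to minimize. A centroid school contributes zero, since its distance to itself is zero.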

  7. Nonlinear Problems Some nonlinear problems can be formulated in a linear fashion (e.g. some network problems). Other nonlinear functions can be solved with our basic methods (e.g. smooth, continuous functions that are concave or convex, such as portfolio variances). However, there are many types of nonlinear problems that pose significant difficulties.

  8. Nonlinear Problems The linear solution to a nonlinear (say, integer) problem may be infeasible. The linear solution may be far away from the actual optimal solution. Some functions have many local minima (or maxima), and Solver is not guaranteed to find the global minimum (or maximum).
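The local-minimum issue can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function `f`, the step size, and the use of plain gradient descent as a stand-in for a smooth nonlinear solver (it is not the algorithm Solver uses). The point is only that the answer depends on the starting value.

```python
def f(x):
    # Illustrative function with two minima (an assumption, not from the slides).
    return x**4 - 3 * x**2 + x

def f_prime(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, step=0.01, iterations=1000):
    """Follow the negative gradient from x0 for a fixed number of steps."""
    x = x0
    for _ in range(iterations):
        x -= step * f_prime(x)
    return x

x_right = gradient_descent(2.0)   # starts to the right of both minima
x_left = gradient_descent(-2.0)   # starts to the left of both minima

print(x_right, f(x_right))
print(x_left, f(x_left))
```

Starting from 2.0, the search settles in the shallower minimum on the right; starting from -2.0, it reaches the deeper minimum on the left. Neither run has any way of knowing whether a better minimum exists elsewhere.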

  9. 3 Solvers • Simplex LP Solver • GRG Nonlinear Solver • Evolutionary Solver

  10. Solution Methodology The standard simplex algorithm (Solver’s default method) won’t work on this problem. The GRG Nonlinear algorithm will make an honest effort, but is likely to give up without finding the optimal solution. This can result from the use of MAX, IF, and SUMIF functions, which make our objective function and constraints discontinuous functions of the decision variables. It can also be the result of using numerical decision variables that are in fact simply names (as in this example, where the names of the clusters happen to be numbers). The Evolutionary Solver, a genetic algorithm, can do a good job with a problem like this, but is not guaranteed to find the optimal solution.

  11. Solution Methodology The Evolutionary Solver operates in a completely different way from the other types. Instead of searching in a structured way guaranteed to reach the optimal solution, genetic algorithms operate somewhat like biological evolutionary processes, with some degree of randomness in the steps taken from one solution to the next. In a finite period of time, the Evolutionary Solver is not guaranteed to find the optimal solution, but it will find very good solutions and try to improve upon them.
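The mutate-and-select idea can be sketched in Python. This is not the algorithm Excel's Evolutionary Solver uses internally; it is a minimal illustration with random mutation and survival of the best candidates, without crossover. The names `total_cost` and `evolve`, the population size, the number of generations, and the toy data are all assumptions.

```python
import random

def squared_distance(a, b):
    """Sum of squared differences across all attributes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def total_cost(z, centroids):
    """Total squared distance from each row to its nearest centroid row."""
    return sum(min(squared_distance(row, z[c]) for c in centroids) for row in z)

def evolve(z, k, generations=200, pop_size=20, seed=0):
    """Search for k centroid indices by random mutation and selection."""
    rng = random.Random(seed)
    n = len(z)
    # Every candidate starts with all centroid indices pointing at row 0.
    population = [[0] * k for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for parent in population:
            child = parent[:]
            # Mutation: replace one centroid index with a random row index.
            child[rng.randrange(k)] = rng.randrange(n)
            children.append(child)
        # Selection: keep the pop_size cheapest candidates, parents included,
        # so the best cost never gets worse from one generation to the next.
        population = sorted(population + children,
                            key=lambda c: total_cost(z, c))[:pop_size]
    best = population[0]
    return best, total_cost(z, best)

# Hypothetical standardized data: two loose groups of three points each.
z = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
     [4.0, 4.0], [4.2, 4.0], [4.0, 4.2]]
best, cost = evolve(z, k=2)
print(best, cost)
```

Because the steps are random, a different `seed` may follow a different path, and nothing in the procedure proves that the final answer is the global optimum.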

  12. Standardization In cluster analysis it is common to standardize the attribute data, so that variables with large units (such as cost, salary, and student body size) do not dominate the sum of squares over attributes with small units (such as % female, % admitted, and % with a job at graduation). So we transform each attribute for each school into a z-value.
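A minimal Python sketch of the z-value transformation follows. The attribute values are made up for illustration and are not taken from the MBA data; using the sample standard deviation (n - 1 denominator) is also an assumption, since the slides do not say which version is used.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-values: (x - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation; an assumption
    return [(x - m) / s for x in values]

# Hypothetical attributes on very different scales.
tuition = [30000.0, 40000.0, 50000.0]
pct_female = [0.30, 0.35, 0.40]

z_tuition = standardize(tuition)
z_pct_female = standardize(pct_female)
print(z_tuition)
print(z_pct_female)
```

After the transformation both attributes are on the same scale (here each comes out at roughly -1, 0, 1), so neither dominates a sum of squared differences.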

  15. Optimization Procedure We set up the model in a large spreadsheet, as shown here. The upper section contains the standardized data, the middle section contains information about the 10 centroids, and the lower section evaluates the distances between each school and each of the centroids, and assigns schools to clusters on the basis of minimum distance.

  16. Decision Variables We begin by setting up cells C34:C43, where Solver can identify which schools are centroids. In this initial solution, all centroids have a value of 1 (the index for Stanford), and the corresponding standardized data for Stanford appear in D34:P43. These indices will be manipulated by Solver to find the best ten centroids.

  17. In the lower section of the worksheet, we calculate the total squared distance from each school to each centroid, and pick the minimum. Cell B45 (the objective function in this problem) is the sum of M49:M75.
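The same calculation can be sketched outside the spreadsheet. This Python version is an illustration under assumptions: the function names and the toy data are hypothetical, and row indices stand in for the school indices that Solver manipulates.

```python
def squared_distance(a, b):
    """Sum of squared differences across all attributes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_and_score(z, centroid_idx):
    """Assign each row of z to its nearest centroid row and total the distances.

    z            -- list of rows of standardized attribute values
    centroid_idx -- list of row indices currently acting as centroids
    """
    assignments = []
    total = 0.0
    for row in z:
        # Squared distance from this row to every centroid.
        dists = [squared_distance(row, z[c]) for c in centroid_idx]
        # Pick the minimum, as the lower section of the worksheet does.
        best = min(range(len(centroid_idx)), key=dists.__getitem__)
        assignments.append(centroid_idx[best])
        total += dists[best]
    return assignments, total

# Hypothetical standardized data: four observations, two attributes.
z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]]
assignments, total = assign_and_score(z, [0, 2])
print(assignments, total)
```

Here `total` plays the role of the objective cell: it is the quantity the search over centroid indices tries to make as small as possible.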

  34. Cluster Analysis Questions • Given a certain number of clusters, which schools are grouped together? • How is the set of clusters affected if we change the number of clusters? • For each cluster, which school is the most “typical”? • How different are the clusters from each other? • What is the best number of clusters?

  35. Given a certain number of clusters, which schools are grouped together? • Columbia and NYU are always in the same cluster, as are Harvard-Penn, Indiana-Michigan State. • Michigan-Cornell-Yale-Dartmouth-Chicago-Duke. • Texas-Emory-Georgetown-Minnesota. • What happens with UCLA-Berkeley?

  36. How is the set of clusters affected if we change the number of clusters? • Notice the behavior of Northwestern as we reduce the number of clusters. • Stanford seems to be very different from all other schools; it is the last school to have its own cluster.

  37. For each cluster, which school is the most “typical”? • The centroid represents the school most typical in each cluster. • We observe that Michigan is almost always the centroid of a large cluster.

  38. How different are the clusters from each other? • This is difficult to assess with this method; Minitab will provide more useful output.

  39. What is the best number of clusters?

  40. Correlation issues?
