
Multivariate Statistical Analysis

Shyh-Kang Jeng, Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia


Presentation Transcript


  1. Multivariate Statistical Analysis Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia

  2. What Is Multivariate Analysis? • Statistical methodology to analyze data with measurements on many variables • [Diagram: a process with an input and an output, acted on by controllable factors and uncontrollable factors]

  3. Why Learn Multivariate Analysis? • An explanation of a social or physical phenomenon must be tested by gathering and analyzing data • The complexity of most phenomena requires an investigator to collect observations on many different variables

  4. Application Examples • Is one product better than the other? • Which factor is the most important to determine the performance of a system? • How to classify the results into clusters? • What are the relationships between variables?

  5. Course Outline • Introduction • Matrix Algebra and Random Vectors • Sample Geometry and Random Samples • Multivariate Normal Distribution • Inference about a Mean Vector • Comparison of Several Multivariate Means • Multivariate Linear Regression Models

  6. Course Outline • Principal Components • Factor Analysis and Inference for Structured Covariance Matrices • Canonical Correlation Analysis • Discrimination and Classification • Clustering, Distance Methods, and Ordination • Multidimensional Scaling* • Structural Equation Modeling*

  7. Textbook and Website • R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, 5th ed., Prentice Hall, 2002. (雙葉) • http://cc.ee.ntu.edu.tw/~skjeng/MultivariateAnalysis2006.htm

  8. References • J. F. Hair, Jr., B. Black, B. Babin, R. E. Anderson, and R. L. Tatham, Multivariate Data Analysis, 6th ed., Prentice Hall, 2006. (華泰) • D. C. Montgomery, Design and Analysis of Experiments, 6th ed., John Wiley, 2005. (歐亞) • D. Salsberg, Statistics Changed the World (統計,改變了世界), Chinese translation by 葉偉文, 天下遠見, 2001.

  9. References • 張碧波, Inferential Statistics (推理統計學), 三民, 1976. • 張輝煌 (ed. and trans.), Experimental Design and Analysis of Variance (實驗設計與變異分析), 建興, 1986.

  10. Time Management • [Diagram: four quadrants I, II, III, and IV on two axes labeled Importance and Emergency]

  11. Some Important Laws • First things first • 80 – 20 Law • Fast prototyping and evolution

  12. Major Uses of Multivariate Analysis • Data reduction or structural simplification • Sorting and grouping • Investigation of the dependence among variables • Prediction • Hypothesis construction and testing

  13. Array of Data

  14. Descriptive Statistics • Summary numbers to assess the information contained in data • Basic descriptive statistics • Sample mean • Sample variance • Sample standard deviation • Sample covariance • Sample correlation coefficient

  15. Sample Mean and Sample Variance

  16. Sample Covariance and Sample Correlation Coefficient
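The formulas on the two slides above are not preserved in this transcript. As a minimal sketch, assuming the divisor-n definitions (which match the "Number of rows" divisor selected in the SAS steps of this presentation), the four statistics can be computed as follows. The function names and the data values are illustrative and are not taken from the slides.

```python
import math

def sample_mean(x):
    """Arithmetic mean of the n observations."""
    return sum(x) / len(x)

def sample_cov(x, y):
    """Sample covariance with divisor n (an assumption; n - 1 is also common)."""
    n = len(x)
    mx, my = sample_mean(x), sample_mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def sample_var(x):
    """Sample variance is the covariance of a variable with itself."""
    return sample_cov(x, x)

def sample_corr(x, y):
    """Sample correlation coefficient: covariance scaled by both standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_var(x) * sample_var(y))

# Illustrative data: two variables measured on four items.
sales = [42.0, 52.0, 48.0, 58.0]
books = [4.0, 5.0, 4.0, 3.0]

print("means:", sample_mean(sales), sample_mean(books))
print("variances:", sample_var(sales), sample_var(books))
print("covariance:", sample_cov(sales, books))
print("correlation:", sample_corr(sales, books))
```

The sample standard deviation is the square root of `sample_var`. Switching to the divisor n - 1 changes the variances and covariances but leaves the correlation unchanged, because the divisor cancels in the ratio.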

  17. Standardized Values (or Standardized Scores) • Centered at zero • Unit standard deviation • Sample correlation coefficient can be regarded as a sample covariance of two standardized variables
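The last bullet can be checked numerically. The sketch below assumes divisor-n variances and uses illustrative data; `standardize` and `cov_n` are hypothetical helper names.

```python
import math

def cov_n(x, y):
    """Sample covariance with divisor n (assumed convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def standardize(x):
    """Subtract the sample mean and divide by the sample standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(cov_n(x, x))
    return [(v - mean) / sd for v in x]

# Illustrative data only.
x1 = [42.0, 52.0, 48.0, 58.0]
x2 = [4.0, 5.0, 4.0, 3.0]

z1, z2 = standardize(x1), standardize(x2)
r12 = cov_n(x1, x2) / math.sqrt(cov_n(x1, x1) * cov_n(x2, x2))

print("mean of z1:", sum(z1) / len(z1))            # centered at zero
print("variance of z1:", cov_n(z1, z1))            # unit standard deviation
print("cov(z1, z2):", cov_n(z1, z2), "r12:", r12)  # the two agree
```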

  18. Properties of Sample Correlation Coefficient • Value is between -1 and 1 • Magnitude measures the strength of the linear association • Sign indicates the direction of the association • Value remains unchanged if all x_ji's and x_jk's are changed to y_ji = a x_ji + b and y_jk = c x_jk + d, respectively, provided that the constants a and c have the same sign
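The invariance property in the last bullet can be illustrated with a short check. This is a sketch on made-up data; `corr` and `affine` are hypothetical helper names, and the constants are arbitrary.

```python
import math

def corr(x, y):
    """Sample correlation coefficient (the divisor cancels, so none is needed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def affine(values, scale, shift):
    """Apply y = scale * x + shift to every observation."""
    return [scale * v + shift for v in values]

# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = corr(x, y)
r_same_sign = corr(affine(x, 2.0, 5.0), affine(y, 0.5, -1.0))       # a and c both positive
r_opposite_sign = corr(affine(x, -2.0, 5.0), affine(y, 0.5, -1.0))  # a negative, c positive

print(r, r_same_sign, r_opposite_sign)
```

When a and c have opposite signs, the magnitude is still preserved but the sign of the coefficient flips, which is why the slide requires the same sign.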

  19. Arrays of Basic Descriptive Statistics
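The arrays themselves are not preserved in this transcript. As a sketch, assuming they are the vector of sample means, the matrix of sample variances and covariances (divisor n), and the matrix of sample correlations, they can be built from an n-by-p data array as follows. The function name and the data are illustrative.

```python
import math

def descriptive_arrays(data):
    """Return (means, S, R) for a list of n rows with p measurements each.

    means: list of p sample means
    S:     p-by-p sample covariance matrix with divisor n (assumed convention)
    R:     p-by-p sample correlation matrix
    """
    n, p = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(p)]
    S = [[sum((row[i] - means[i]) * (row[k] - means[k]) for row in data) / n
          for k in range(p)]
         for i in range(p)]
    R = [[S[i][k] / math.sqrt(S[i][i] * S[k][k]) for k in range(p)]
         for i in range(p)]
    return means, S, R

# Illustrative data: n = 3 items, p = 3 variables.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 2.0],
]

means, S, R = descriptive_arrays(data)
print("means:", means)
print("S:", S)
print("R:", R)
```

Both matrices are symmetric, and R has ones on its diagonal.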

  20. Example • Four receipts from a university bookstore • Variable 1: dollar sales • Variable 2: number of books

  21. Arrays of Basic Descriptive Statistics

  22. Using SAS • Create New Project • Name → Project • Insert Data • New: Name → Data1 • Change column name • Right button, select Properties… • Enter data in the data grid • Select Data1 under Project • Analysis → Descriptive • Summary statistics • Correlations

  23. Summary Statistics • Save • Personal → Enterprise Guide Sample → Data → Data1.sas7bdat • Columns → Variables to assign → Analysis variables • Statistics → Mean

  24. Report on Means

  25. Correlations • Delete Summary statistics node • Save • Personal → Enterprise Guide Sample → Data → Data1.sas7bdat • Columns → Variables to assign → Correlation variables • Correlations → Pearson → Covariances → Show Pearson correlations in results → Divisor for variances (Number of rows)

  26. Correlations • Results → Show results → (uncheck) show statistics for each variable → (uncheck) show significance probabilities associated with correlations

  27. Report on Correlations

  28. Scatter Plot and Marginal Dot Diagrams

  29. Scatter Plot and Marginal Dot Diagrams for Rearranged Data

  30. Effect of Unusual Observations

  31. Effect of Unusual Observations

  32. Paper Quality Measurements

  33. Lizard Size Data *SVL: snout-vent length; HLS: hind limb span

  34. 3D Scatter Plots of Lizard Data

  35. Female Bear Data and Growth Curves

  36. Utility Data as Stars

  37. Chernoff Faces over Time

  38. Euclidean Distance • Each coordinate contributes equally to the distance

  39. Statistical Distance • Weight coordinates subject to a great deal of variability less heavily than those that are not highly variable

  40. Statistical Distance for Uncorrelated Data
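The distance formulas are not preserved in this transcript. As a sketch, assuming the usual definition for uncorrelated coordinates, in which each squared coordinate difference is divided by that coordinate's sample variance, the contrast with Euclidean distance looks like this. The function names and the variances are illustrative.

```python
import math

def euclidean_distance(p, q):
    """Every coordinate contributes equally."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def statistical_distance_uncorrelated(p, q, variances):
    """Each squared difference is divided by that coordinate's variance,
    so highly variable coordinates are weighted less heavily.
    Valid only when the coordinates are uncorrelated (assumption)."""
    return math.sqrt(sum((a - b) ** 2 / s
                         for a, b, s in zip(p, q, variances)))

origin = (0.0, 0.0)
variances = (4.0, 1.0)   # illustrative: x1 varies more than x2

point_a = (2.0, 0.0)     # two units along the highly variable axis
point_b = (0.0, 1.0)     # one unit along the less variable axis

print(euclidean_distance(origin, point_a), euclidean_distance(origin, point_b))
print(statistical_distance_uncorrelated(origin, point_a, variances),
      statistical_distance_uncorrelated(origin, point_b, variances))
```

The two points are at different Euclidean distances from the origin but at the same statistical distance. Points at a constant statistical distance from the origin therefore lie on an ellipse whose longer axis runs along the more variable coordinate.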

  41. Ellipse of Constant Statistical Distance for Uncorrelated Data x2 x1 0

  42. Scatter Plot for Correlated Measurements

  43. Statistical Distance under Rotated Coordinate System

  44. General Statistical Distance
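The general formula is not preserved in this transcript. A common form for two variables is the quadratic form d² = a11·d1² + 2·a12·d1·d2 + a22·d2², where d1 and d2 are the coordinate differences. The sketch below assumes the coefficients are taken from the inverse of the sample covariance matrix, which is one standard choice and not necessarily the one on the slide. The function name and the numbers are illustrative.

```python
import math

def statistical_distance(p, q, s11, s12, s22):
    """Distance between 2-D points p and q under covariance [[s11, s12], [s12, s22]].

    The weights a11, a12, a22 are the entries of the inverse covariance
    matrix (an assumed choice of coefficients).
    """
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    a11, a12, a22 = s22 / det, -s12 / det, s11 / det
    d1, d2 = p[0] - q[0], p[1] - q[1]
    return math.sqrt(a11 * d1 * d1 + 2 * a12 * d1 * d2 + a22 * d2 * d2)

origin = (0.0, 0.0)

# Positively correlated variables with unit variances (illustrative values).
along = statistical_distance((1.0, 1.0), origin, 1.0, 0.5, 1.0)     # follows the trend
against = statistical_distance((1.0, -1.0), origin, 1.0, 0.5, 1.0)  # goes against it

print(along, against)
```

Both points are the same Euclidean distance from the origin, but the point that goes against the direction of correlation is statistically farther away. With `s12 = 0` the function reduces to the uncorrelated case.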

  45. Necessity of Statistical Distance

  46. Necessary Conditions for Statistical Distance Definitions
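The conditions listed on this slide are not preserved in this transcript. The standard requirements for any distance measure, stated here as a sketch with the point labels P, Q, and R assumed for illustration, are:

```latex
\begin{aligned}
d(P, Q) &= d(Q, P) \\
d(P, Q) &> 0 \quad \text{if } P \neq Q \\
d(P, Q) &= 0 \quad \text{if } P = Q \\
d(P, Q) &\le d(P, R) + d(R, Q) \quad \text{(triangle inequality)}
\end{aligned}
```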

  47. Reading Assignments • Textbook • pp. 50-60 • pp. 84-97

  48. Outliers • Observations with a unique combination of characteristics identifiable as distinctly different from the other observations • Impact • Can limit the generalizability of any type of analysis • Whether to retain or delete an outlier depends on how representative it is of the population

  49. Sources of Outliers • Procedural error • Extraordinary event • e.g., a hurricane in a daily rainfall analysis • Extraordinary observation • Researcher has no explanation • Observation unique in its combination of values across the variables • Each value falls within the ordinary range for its variable • Retain it unless proved invalid

  50. Rules of Thumb for Univariate Outlier Detection • Small samples (80 or fewer observations) • Cases with standard scores of 2.5 or greater • Larger samples • Threshold increases up to 4 • If standard scores are not used • Cases falling outside the range of 2.5 to 4 standard deviations, depending on the sample size
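The rule of thumb above can be sketched as a small function. The slide only says the threshold rises "up to 4" for larger samples, so the default used here for samples above 80 observations is an assumption, as are the function name, the divisor-n standard deviation, and the data.

```python
import math

def univariate_outliers(values, threshold=None):
    """Return the indices of cases whose absolute standard score meets the threshold.

    Default threshold: 2.5 for 80 or fewer observations, otherwise 4.0
    (assumed here; the rule of thumb allows any value up to 4).
    """
    n = len(values)
    if threshold is None:
        threshold = 2.5 if n <= 80 else 4.0
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # divisor n (assumption)
    return [i for i, v in enumerate(values)
            if abs(v - mean) / sd >= threshold]

# Illustrative data: the last observation is unusually large.
measurements = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0, 10.0, 20.0]

flagged = univariate_outliers(measurements)
print("flagged indices:", flagged)
```

A flagged case is only a candidate. Whether to retain or delete it is a separate decision that depends on how representative the observation is of the population.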
