1 / 82

# Multivariate Statistical Analysis

Multivariate Statistical Analysis. Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia. controllable factors. input. Process. output. uncontrollable factors. What Is Multivariate Analysis?. Télécharger la présentation ## Multivariate Statistical Analysis

E N D

### Presentation Transcript

1. Multivariate Statistical Analysis Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia

2. controllable factors input Process output uncontrollable factors What Is Multivariate Analysis? • Statistical methodology to analyze data with measurements on many variables

3. Why to Learn Multivariate Analysis? • Explanation of a social or physical phenomenon must be tested by gathering and analyzing data • Complexities of most phenomena require an investigator to collect observations on many different variables

4. Application Examples • Is one product better than the other? • Which factor is the most important to determine the performance of a system? • How to classify the results into clusters? • What are the relationships between variables?

5. Course Outline • Introduction • Matrix Algebra and Random Vectors • Sample Geometry and Random Samples • Multivariate Normal Distribution • Inference about a Mean Vector • Comparison of Several Multivariate Means • Multivariate Linear Regression Models

6. Course Outline • Principal Components • Factor Analysis and Inference for Structured Covariance Matrices • Canonical Correlation Analysis • Discrimination and Classification • Clustering, Distance Methods, and Ordination • Multidimensional Scaling* • Structural Equation Modeling*

7. Text Book and Website • R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, 5th ed., Prentice Hall, 2002. (雙葉) • http://cc.ee.ntu.edu.tw/~skjeng/ MultivariateAnalysis2006.htm

8. References • J. F. Hair, Jr., B. Black, B. Babin, R. E. Anderson, and R. L. Tatham, Multivariate Data Analysis, 6th ed., Prentice Hall, 2006. (華泰) • D. C. Montgomery, Design and Analysis of Experiments, 6th ed., John Wiley, 2005. (歐亞) • D. Salsberg著, 葉偉文譯,統計,改變了世界, 天下遠見, 2001.

9. References • 張碧波，推理統計學，三民，1976. • 張輝煌編譯，實驗設計與變異分析，建興，1986.

10. Time Management Importance II I Emergency III IV

11. Some Important Laws • First things first • 80 – 20 Law • Fast prototyping and evolution

12. Major Uses of Multivariate Analysis • Data reduction or structural simplification • Sorting and grouping • Investigation of the dependence among variables • Prediction • Hypothesis construction and testing

13. Array of Data

14. Descriptive Statistics • Summary numbers to assess the information contained in data • Basic descriptive statistics • Sample mean • Sample variance • Sample standard deviation • Sample covariance • Sample correlation coefficient

15. Sample Mean and Sample Variance

16. Sample Covariance and Sample Correlation Coefficient

17. Standardized Values (or Standardized Scores) • Centered at zero • Unit standard deviation • Sample correlation coefficient can be regarded as a sample covariance of two standardized variables

18. Properties of Sample Correlation Coefficient • Value is between -1 and 1 • Magnitude measure the strength of the linear association • Sign indicates the direction of the association • Value remains unchanged if all xji’s and xjk’s are changed to yji = axji + b and yjk = cxjk + d, respectively, provided that the constants a and c have the same sign

19. Arrays of Basic Descriptive Statistics

20. Example • Four receipts from a university bookstore • Variable 1: dollar sales • Variable 2: number of books

21. Arrays of Basic Descriptive Statistics

22. Using SAS • Create New Project • Name  Project • Insert Data • New: Name  Data1 • Change column name • Right button, select Properties… • Enter data in the data grid • Select Data1 under Project • Analysis Descriptive • Summary statistics • Correlations

23. Summary Statistics • Save • Personal  Enterprise Guide Sample  Data Data1.sas7bdat • Columns  Variables to assign  Analysis variables • Statistics  Mean

24. Report on Means

25. Correlations • Delete Summary statistics node • Save • Personal  Enterprise Guide Sample  Data Data1.sas7bdat • Columns  Variables to assign  Correlation variables • Correlations  Pearson  Covariances  Show Pearson correlations in results  Divisor for variances (Number of rows)

26. Correlations • Results  Show results  (uncheck) show statistics for each variable  (uncheck) show significance probabilities associated with correlations

27. Report on Correlations

28. Scatter Plot and Marginal Dot Diagrams

29. Scatter Plot and Marginal Dot Diagrams for Rearranged Data

30. Effect of Unusual Observations

31. Effect of Unusual Observations

32. Paper Quality Measurements

33. Lizard Size Data *SVL: snoutvent length; HLS: hind limb span

34. 3D Scatter Plots of Lizard Data

35. Female Bear Data and Growth Curves

36. Utility Data as Stars

37. Chernoff Faces over Time

38. Euclidean Distance • Each coordinate contributes equally to the distance

39. Statistical Distance • Weight coordinates subject to a great deal of variability less heavily than those that are not highly variable

40. Statistical Distance for Uncorrelated Data

41. Scattered Plot for Correlated Measurements

42. Statistical Distance under Rotated Coordinate System

43. General Statistical Distance

44. Necessity of Statistical Distance

45. Necessary Conditions for Statistical Distance Definitions

46. Reading Assignments • Text book • pp. 50-60 • pp. 84-97

47. Outliers • Observations with a unique combination of characteristics identifiable as distinctly different from the other observations • Impact • Limiting the generalizability of any type of analysis • Must be viewed in light of how representative it is of the population to be retained or deleted

48. Sources of Outliers • Procedure error • Extraordinary event • e,g., hurricane for daily rainfall analysis • Extraordinary observations • Researcher has no explanation • Unique in their combinations of values across the variables • Falls within ordinary range of values of variables • Retain it unless proved invalid

49. Rules of Thumb for Univariate Outlier Detection • Small samples (80 or fewer observations) • Cases with standard scores of 2.5 or greater • Larger sample size • Threshold increases up to 4 • Standard score not used • Cases falling outside the range of 2.5 versus 4 standard deviations, depending on the sample size

More Related