1 / 82

1.82k likes | 3.34k Vues

Multivariate Statistical Analysis. Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia. controllable factors. input. Process. output. uncontrollable factors. What Is Multivariate Analysis?.

Télécharger la présentation
## Multivariate Statistical Analysis

**An Image/Link below is provided (as is) to download presentation**
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.
Content is provided to you AS IS for your information and personal use only.
Download presentation by click this link.
While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

**Multivariate Statistical Analysis**Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia**controllable factors**input Process output uncontrollable factors What Is Multivariate Analysis? • Statistical methodology to analyze data with measurements on many variables**Why to Learn Multivariate Analysis?**• Explanation of a social or physical phenomenon must be tested by gathering and analyzing data • Complexities of most phenomena require an investigator to collect observations on many different variables**Application Examples**• Is one product better than the other? • Which factor is the most important to determine the performance of a system? • How to classify the results into clusters? • What are the relationships between variables?**Course Outline**• Introduction • Matrix Algebra and Random Vectors • Sample Geometry and Random Samples • Multivariate Normal Distribution • Inference about a Mean Vector • Comparison of Several Multivariate Means • Multivariate Linear Regression Models**Course Outline**• Principal Components • Factor Analysis and Inference for Structured Covariance Matrices • Canonical Correlation Analysis • Discrimination and Classification • Clustering, Distance Methods, and Ordination • Multidimensional Scaling* • Structural Equation Modeling***Text Book and Website**• R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, 5th ed., Prentice Hall, 2002. (雙葉) • http://cc.ee.ntu.edu.tw/~skjeng/ MultivariateAnalysis2006.htm**References**• J. F. Hair, Jr., B. Black, B. Babin, R. E. Anderson, and R. L. Tatham, Multivariate Data Analysis, 6th ed., Prentice Hall, 2006. (華泰) • D. C. Montgomery, Design and Analysis of Experiments, 6th ed., John Wiley, 2005. (歐亞) • D. Salsberg著, 葉偉文譯,統計,改變了世界, 天下遠見, 2001.**References**• 張碧波，推理統計學，三民，1976. • 張輝煌編譯，實驗設計與變異分析，建興，1986.**Time Management**Importance II I Emergency III IV**Some Important Laws**• First things first • 80 – 20 Law • Fast prototyping and evolution**Major Uses of Multivariate Analysis**• Data reduction or structural simplification • Sorting and grouping • Investigation of the dependence among variables • Prediction • Hypothesis construction and testing**Descriptive Statistics**• Summary numbers to assess the information contained in data • Basic descriptive statistics • Sample mean • Sample variance • Sample standard deviation • Sample covariance • Sample correlation coefficient**Standardized Values (or Standardized Scores)**• Centered at zero • Unit standard deviation • Sample correlation coefficient can be regarded as a sample covariance of two standardized variables**Properties of Sample Correlation Coefficient**• Value is between -1 and 1 • Magnitude measure the strength of the linear association • Sign indicates the direction of the association • Value remains unchanged if all xji’s and xjk’s are changed to yji = axji + b and yjk = cxjk + d, respectively, provided that the constants a and c have the same sign**Example**• Four receipts from a university bookstore • Variable 1: dollar sales • Variable 2: number of books**Using SAS**• Create New Project • Name Project • Insert Data • New: Name Data1 • Change column name • Right button, select Properties… • Enter data in the data grid • Select Data1 under Project • Analysis Descriptive • Summary statistics • Correlations**Summary Statistics**• Save • Personal Enterprise Guide Sample Data Data1.sas7bdat • Columns Variables to assign Analysis variables • Statistics Mean**Correlations**• Delete Summary statistics node • Save • Personal Enterprise Guide Sample Data Data1.sas7bdat • Columns Variables to assign Correlation variables • Correlations Pearson Covariances Show Pearson correlations in results Divisor for variances (Number of rows)**Correlations**• Results Show results (uncheck) show statistics for each variable (uncheck) show significance probabilities associated with correlations**Lizard Size Data***SVL: snoutvent length; HLS: hind limb span**Euclidean Distance**• Each coordinate contributes equally to the distance**Statistical Distance**• Weight coordinates subject to a great deal of variability less heavily than those that are not highly variable**Ellipse of Constant Statistical Distance for Uncorrelated**Data x2 x1 0**Reading Assignments**• Text book • pp. 50-60 • pp. 84-97**Outliers**• Observations with a unique combination of characteristics identifiable as distinctly different from the other observations • Impact • Limiting the generalizability of any type of analysis • Must be viewed in light of how representative it is of the population to be retained or deleted**Sources of Outliers**• Procedure error • Extraordinary event • e,g., hurricane for daily rainfall analysis • Extraordinary observations • Researcher has no explanation • Unique in their combinations of values across the variables • Falls within ordinary range of values of variables • Retain it unless proved invalid**Rules of Thumb for Univariate Outlier Detection**• Small samples (80 or fewer observations) • Cases with standard scores of 2.5 or greater • Larger sample size • Threshold increases up to 4 • Standard score not used • Cases falling outside the range of 2.5 versus 4 standard deviations, depending on the sample size

More Related