
Principal Components Analysis


Presentation Transcript


  1. Principal Components Analysis

  2. Principal components factor analysis • Obtaining a factor solution through principal components analysis is an iterative process that usually requires repeating the SPSS factor analysis procedure a number of times to reach a satisfactory solution. • We begin by identifying a group of variables whose variance we believe can be represented more parsimoniously by a smaller set of factors, or components. The end result of the principal components analysis will tell us which variables can be represented by which components, and which variables should be retained as individual variables because the factor solution does not adequately represent their information. • A principal component factor analysis requires: • The variables included must be metric level • The sample size must be at least 150 (Tabachnick and Fidell) • The correlation matrix for the variables must contain 2 or more correlations of 0.30 or greater • Variables with measures of sampling adequacy less than 0.50 must be removed • The overall measure of sampling adequacy must be 0.50 or higher • The Bartlett test of sphericity must be statistically significant (p ≤ alpha) • The suitability phase of a principal component analysis is devoted to verifying that we meet these requirements. If we do not meet these requirements, factor analysis is not appropriate.

  3. Extracting the Factor Solution • The second phase of a principal component factor analysis focuses on deriving a factor model, or pattern of relationships between variables and components, that satisfies the following requirements: • The derived components explain 50% or more of the variance in each of the variables, i.e. each variable has a communality greater than 0.50 • None of the variables have loadings, or correlations, of 0.40 or higher for more than one component, i.e. none have complex structure • No component has only one variable in it • To meet these requirements, we remove problematic variables from the analysis and repeat the principal component analysis procedure in SPSS. • At the conclusion of this process, we can substitute the components for the variables in further analyses if: • the components have more than one variable loading on them, • the components explain at least 50% of the variance in each of the included variables, and • the components cumulatively explain more than 60% of the variance in the set of included variables. • Variables that were removed in the analysis should be included individually in further analyses.
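The three requirements on the factor model can be checked mechanically once a loading matrix is available. The following is a minimal sketch, not SPSS's own procedure: the function name `check_solution`, the variable names, and the loading values are hypothetical, and it assumes a component "has" a variable when the variable's loading on it is 0.40 or higher in magnitude.

```python
import numpy as np

def check_solution(loadings, names, min_communality=0.50, min_loading=0.40):
    """Flag variables and components that violate the three requirements."""
    L = np.abs(np.asarray(loadings, dtype=float))  # sign is ignored, only magnitude matters
    communality = (L ** 2).sum(axis=1)             # share of each variable's variance explained
    high = L >= min_loading                        # loadings that count as "high"
    return {
        "low_communality": [n for n, c in zip(names, communality) if c < min_communality],
        "complex_structure": [n for n, h in zip(names, high) if h.sum() > 1],
        "single_variable_components": [j for j in range(L.shape[1]) if high[:, j].sum() == 1],
    }

# Hypothetical loadings for four variables on two components (illustration only).
names = ["v1", "v2", "v3", "v4"]
loadings = [[0.85, 0.10],
            [0.80, 0.15],
            [0.55, 0.50],
            [0.20, 0.60]]
report = check_solution(loadings, names)
print(report)
```

In the iterative procedure described above, a flagged variable would be removed and the analysis rerun before the checks are applied again.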

  4. Substituting components for variables • Substitution of components for individual variables is accomplished by: • using only the highest loading variable in place of the other variables loading on the component, or • combining the variables loading on each component to create a new variable. • This week’s problems focus on obtaining the factor structure and do not require us to create any factor variables.

  5. Notes • When evaluating measures of sampling adequacy, communalities, or factor loadings, we ignore the sign of the numeric value and base our decision on the size or magnitude of the value. • The sign of the number indicates the direction of the relationship (direct or inverse). • A loading of -0.732 is just as strong as a loading of 0.732. The minus sign indicates an inverse or negative relationship; the absence of a sign is meant to imply a plus sign indicating a direct or positive relationship. • If there are two or more components in the component matrix, the pattern of loadings is based on the SPSS Rotated Component Matrix. If there is only one component in the solution, the Rotated Component Matrix is not computed, and the pattern of loadings is based on the Component Matrix. • It is possible that the analysis will break down and we will have too few variables in the analysis to support the use of principal component analysis.

  6. The Problem in BlackBoard • The problem statement tells us: • the data set and variables included in the analysis • to make the assumption that it is not necessary to omit outliers • the alpha for the statistical tests

  7. Statement about Level of Measurement

  8. Marking the Statement about Level of Measurement All of the variables included in the analysis are ordinal level. We will employ the common convention of treating ordinal variables as metric variables, but we should consider mentioning this as a limitation to the analysis. Since we treated all variables as metric, we mark the check box.

  9. Statement about Sample Size We will use the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

  10. Run the Principal Components Analysis - 1 Select the Factor command from the Analyze > Data Reduction menu.

  11. Run the Principal Components Analysis - 2 First, move the variables listed in the problem to the Variables list box. Next, click on the Descriptives button to request the statistics needed to evaluate the suitability of the data for factor analysis.

  12. Run the Principal Components Analysis - 3 First, mark the check box for Univariate Statistics to get the number of valid cases for the analysis. • Second, mark the check boxes for the statistics for the suitability of factor analysis: • Coefficients of the correlation matrix, • KMO and Bartlett’s test of sphericity, and • Anti-image correlation matrix. • Third, click on the Continue button to close the Factor Analysis: Descriptives dialog box.

  13. Run the Principal Components Analysis - 4 Click on the Extraction button to tell SPSS what method it should use to extract the factors.

  14. Run the Principal Components Analysis - 5 We will use the default method of Principal Components; the drop down list contains numerous other methods. We accept the other defaults for displaying the unrotated factor solution and extracting eigenvalues over 1. Click on the Continue button to close the dialog box.
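To make the extraction step concrete, here is a minimal sketch of principal components extraction from a correlation matrix, keeping only components with eigenvalues over 1. It uses NumPy rather than SPSS, the function name `extract_components` is hypothetical, and the correlation matrix is made up for illustration; the signs of the loadings may differ from what SPSS would print.

```python
import numpy as np

def extract_components(R, min_eigenvalue=1.0):
    """Eigenvalues and unrotated loadings for components with eigenvalue > min_eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(R, dtype=float))
    order = np.argsort(eigvals)[::-1]              # eigh returns eigenvalues in ascending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue                # latent root criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings

# Made-up correlation matrix for four variables (illustration only).
R = np.array([[1.0, 0.6, 0.1, 0.1],
              [0.6, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.5],
              [0.1, 0.1, 0.5, 1.0]])
eigvals, loadings = extract_components(R)
communalities = (loadings ** 2).sum(axis=1)        # variance in each variable explained
print(eigvals.round(3), loadings.round(3), communalities.round(3))
```

The eigenvalues of a correlation matrix sum to the number of variables, which is why each eigenvalue divided by that number gives the proportion of total variance a component explains.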

  15. Run the Principal Components Analysis - 6 Click on the Rotation button to tell SPSS what method it should use to rotate the factors to clarify the interpretation.

  16. Run the Principal Components Analysis - 7 We mark the option button for the Varimax rotation, an orthogonal rotation that keeps the factors independent of each other. Click on the Continue button to close the dialog box.
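For readers curious about what a varimax rotation does numerically, the sketch below implements a commonly used SVD-based varimax iteration in NumPy. It is an illustration under stated assumptions, not SPSS's implementation: it omits the Kaiser normalization that SPSS applies by default, and the function name `varimax` and the unrotated loadings are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix toward the varimax criterion."""
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    T = np.eye(k)                                   # accumulated orthogonal rotation
    last = 0.0
    for _ in range(max_iter):
        B = A @ T
        # Gradient of the varimax criterion with respect to the rotation.
        grad = A.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(grad)
        T = U @ Vt                                  # nearest orthogonal matrix to the gradient
        crit = s.sum()
        if last != 0.0 and crit < last * (1 + tol):
            break                                   # criterion stopped improving
        last = crit
    return A @ T

# Hypothetical unrotated loadings for four variables on two components.
unrotated = np.array([[0.7, 0.5],
                      [0.7, 0.4],
                      [0.6, -0.5],
                      [0.6, -0.6]])
rotated = varimax(unrotated)
print(rotated.round(3))
```

Because the rotation is orthogonal, each variable's communality is the same before and after rotation; only the distribution of loadings across components changes.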

  17. Run the Principal Components Analysis - 8 Having specified the analysis, click on the OK button to produce the output.

  18. Output for Sample Size Requirement The 509 cases available for this principal components analysis satisfy the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

  19. Marking the Statement about Sample Size Since we satisfied the minimum sample size requirement, we mark the statement. If we did not satisfy the sample size requirement, we should consider mentioning this fact as a limitation to the analysis. Factor analysis can be numerically unstable when the sample size is small.

  20. The Statement about Suitability for Factor Analysis: Sufficient Correlations Principal components analysis requires that more than one of the correlations between the variables included in the analysis be greater than 0.30.

  21. Sufficient Correlations in Correlation Matrix For this set of variables, there are 9 correlations in the matrix greater than 0.30.

  22. Marking the Statement about Sufficient Correlations Since there are 9 correlations greater than 0.30, we mark the statement.
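Counting qualifying correlations by eye is error-prone for larger matrices. A minimal sketch of the count follows; the function name and the correlation matrix are hypothetical, and comparing magnitudes (so that a correlation of -0.45 would also count) is an assumption rather than something the slides state for this step.

```python
import numpy as np

def count_large_correlations(R, threshold=0.30):
    """Count distinct variable pairs whose correlation exceeds the threshold in magnitude."""
    R = np.asarray(R, dtype=float)
    upper = R[np.triu_indices_from(R, k=1)]   # each pair once, diagonal excluded
    return int((np.abs(upper) > threshold).sum())

# Made-up correlation matrix for four variables (illustration only).
R = np.array([[1.0, 0.6, 0.1, 0.1],
              [0.6, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.5],
              [0.1, 0.1, 0.5, 1.0]])
n_large = count_large_correlations(R)
print(n_large)
```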

  23. The Statement about Suitability for Factor Analysis: Test of Sphericity Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix with 1’s, or perfect correlations, on the main diagonal, and 0’s for all of the remaining elements. If this is true, the variables are not correlated and the factor analysis will not work. Our goal in this test is to reject the null hypothesis, supporting the contention that there are sufficient correlations, or similarity of values, among the variables that several can be combined into a factor or component.

  24. Bartlett’s Test of Sphericity Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity (χ²(df=15, N = 509) = 854.15, p < .001) be less than or equal to the level of significance (0.05). The probability associated with the Bartlett Test satisfies this requirement.

  25. Marking the Statement about Bartlett’s Test of Sphericity Since the probability associated with the Bartlett Test is sufficient to reject the null hypothesis, we mark the check box.
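The chi-square statistic reported for Bartlett's test can be computed from the determinant of the correlation matrix with the standard formula, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where p is the number of variables and n the number of cases. The sketch below is an illustration in NumPy with a made-up correlation matrix; the function name is hypothetical.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for Bartlett's test of sphericity."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)          # log of the determinant of R
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return statistic, df

# Made-up correlation matrix for four variables (illustration only).
R = np.array([[1.0, 0.6, 0.1, 0.1],
              [0.6, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.5],
              [0.1, 0.1, 0.5, 1.0]])
statistic, df = bartlett_sphericity(R, n=509)
print(round(statistic, 2), df)
```

With six variables the formula gives 6 × 5 / 2 = 15 degrees of freedom, matching the df reported above. The p-value is the upper tail of the chi-square distribution, which could be obtained with, for example, `scipy.stats.chi2.sf(statistic, df)`.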

  26. The Statement about Suitability for Factor Analysis: Sampling Adequacy Sampling adequacy predicts if data are likely to factor well, based on correlation and partial correlation. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA) must be 0.50 or higher for each individual variable as well as for the set of variables. Variables that do not have an MSA of .50 or greater are removed from the analysis one at a time, until all variables and the overall measure are at or above .50.

  27. Measures of Sampling Adequacy for Individual Variables In the initial iteration for suitability of principal components analysis, the MSA for all of the individual variables was greater than 0.50 ("information and knowledge are shared openly within this organization" [q76] - .70; "an effort is made to get the opinions of people throughout the organization" [q77] - .69; "our web site is easy to use and contains helpful information" [q83] - .76; "I have a good understanding of our mission, vision, and strategic plan" [q84] - .73; "I believe we communicate our mission effectively to the public" [q85] - .81; and "my organization encourages me to be involved in my community" [q86] - .84). Note: Not all MSAs are shown on this slide.

  28. Kaiser-Meyer-Olkin Measure of Sampling Adequacy In addition, the overall MSA for the set of variables included in the analysis was 0.75, which exceeds the minimum requirement of 0.50 for overall MSA.

  29. Marking the Statement about Measures of Sampling Adequacy Since the sampling adequacy measures met the criteria for both individual variables and overall, the check box is marked.
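The MSA values compare the squared correlations with the squared partial correlations: the measure approaches 1 when the partial correlations are small relative to the ordinary correlations. The sketch below computes the per-variable and overall measures from a correlation matrix using the standard Kaiser-Meyer-Olkin formula. It is an illustration in NumPy, the function name `kmo` is hypothetical, and the equicorrelated matrix is made up so the result can be checked by hand.

```python
import numpy as np

def kmo(R):
    """Per-variable and overall Kaiser-Meyer-Olkin measures of sampling adequacy."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)           # partial correlations given all other variables
    np.fill_diagonal(partial, 0.0)
    r = R.copy()
    np.fill_diagonal(r, 0.0)                  # keep only off-diagonal correlations
    r2, p2 = r ** 2, partial ** 2
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return per_variable, overall

# Three variables that all correlate 0.5 with each other (illustration only).
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
per_variable, overall = kmo(R)
print(per_variable.round(3), round(overall, 3))
```

In this example every partial correlation is 1/3, so each measure is 0.25 / (0.25 + 1/9) = 9/13, about 0.69.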

  30. Statement about Initial Number of Factors Various tests are used to estimate the number of factors to be extracted. This was very important when factor analysis was calculated by hand. Two of the criteria are the latent root criterion, which is based on the number of eigenvalues greater than 1.0, and the cumulative proportion of variance criterion, which identifies the number of components needed to explain 60% or more of the total variance in the original set of variables. The problem offers two possible responses.

  31. Initial Number of Factors: Eigenvalues Greater than One The latent root criterion for number of factors to extract would indicate that there were 2 components to be extracted for these variables, since there were 2 eigenvalues greater than 1.0 (2.84, and 1.05).

  32. Initial Number of Factors: Percentage of Variance Explained In addition, the cumulative proportion of variance criterion can be met with 2 components, which satisfy the requirement of explaining 60% or more of the total variance in the original set of variables. A 2 component solution would explain an estimated 64.86% of the total variance.

  33. Marking the Statement about Initial Number of Factors Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution will be based on the extraction of 2 components. We mark the second statement in the pair. Note: the question is worded to indicate that both criteria suggest the same number of factors. Should they suggest a different number of factors, neither statement would be marked, but we would still continue with the factor analysis using the number of factors suggested by the latent root criterion.
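Both criteria can be reproduced from the reported eigenvalues. The sketch below uses the two eigenvalues given above (2.84 and 1.05) and the six variables in the analysis; it relies on the fact that the eigenvalues of a correlation matrix sum to the number of variables. The function names are hypothetical.

```python
from itertools import accumulate

def latent_root_count(eigenvalues, cutoff=1.0):
    """Number of components with an eigenvalue greater than the cutoff."""
    return sum(1 for ev in eigenvalues if ev > cutoff)

def cumulative_proportion(eigenvalues, n_variables):
    """Running share of total variance explained, in the order the eigenvalues are given."""
    return [total / n_variables for total in accumulate(eigenvalues)]

reported = [2.84, 1.05]                       # eigenvalues greater than 1.0 from the output
n_components = latent_root_count(reported)
cumulative = cumulative_proportion(reported, n_variables=6)
print(n_components, [round(c, 4) for c in cumulative])
```

The second cumulative value is (2.84 + 1.05) / 6, roughly 64.8%, which agrees with the 64.86% in the SPSS output up to the rounding of the eigenvalues.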

  34. Statement about First Iteration of Factor Extraction The problem suggests that the first iteration of the factor solution included a variable (my organization encourages me to be involved in my community [q86]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.

  35. Output for Communalities on First Iteration Examination of the first principal components model extracted by SPSS resulted in the removal of the variable "my organization encourages me to be involved in my community" [q86] from the analysis. "My organization encourages me to be involved in my community" [q86] was removed because its communality (.467) meant that the factor solution explained less than half of the variable's variance. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable.

  36. Marking the Statement about First Iteration of Factor Extraction My organization encourages me to be involved in my community [q86] was removed because it did not satisfy the requirement for communalities, i.e. the factors should explain at least 50% of the variance in the variable. Since we have already determined that the variable is to be removed, it was not necessary to check the factor loadings for simple structure. The first statement in the pair is marked.

  37. Removing a Variable from the Factor Analysis - 1 To remove the variable, my organization encourages me to be involved in my community [q86], we select Factor Analysis from the Dialog Recall drop down menu.

  38. Removing a Variable from the Factor Analysis - 2 To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.

  39. Removing a Variable from the Factor Analysis - 3 Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the second iteration.

  40. Statement about Second Iteration of Factor Extraction The problem suggests that the second iteration of the factor solution included a variable (I believe we communicate our mission effectively to the public [q85]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.

  41. Output for Communalities on Second Iteration Examination of the second principal components model extracted by SPSS produced a table of Communalities in which all variables have a communality of at least the required minimum of .50.

  42. Output for Factor Structure on Second Iteration Examination of the second principal components model extracted by SPSS resulted in the removal of the variable "I believe we communicate our mission effectively to the public" [q85] from the analysis. The variable "I believe we communicate our mission effectively to the public" [q85] had loadings of 0.40 or higher on component 1 (.526) and component 2 (.536). Multiple high loadings violate the requirement for simple structure, so this variable was removed from the analysis.

  43. Marking the Statement about Second Iteration of Factor Extraction I believe we communicate our mission effectively to the public [q85] was removed because it did not satisfy the requirement for simple structure, so the first statement in the pair is marked.

  44. Removing a Variable from the Factor Analysis - 1 To remove the variable, I believe we communicate our mission effectively to the public [q85], we select Factor Analysis from the Dialog Recall drop down menu.

  45. Removing a Variable from the Factor Analysis - 2 To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.

  46. Removing a Variable from the Factor Analysis - 3 Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the third iteration.

  47. Statement about Third Iteration of Factor Extraction • The problem indicates that no variables were removed on the third iteration of the factor extraction, and that the solution met all of the requirements for a factor analysis solution: • all the variables remaining in the analysis had communalities above 0.50, • all of them demonstrated simple structure, and • each component had more than one variable loading on it

  48. Output for Communalities on Third Iteration - 1 Examination of the third principal components model extracted by SPSS produced a table of Communalities in which all four variables have a communality of at least the required minimum of .50.

  49. Output for Factor Structure on Third Iteration - 2 Examination of the third principal components model extracted by SPSS did not show any variables having a loading of .40 or higher on both of the components.

  50. Output for Factor Structure on Third Iteration - 3 Each of the components has two variables loading on it. If a component had only one variable loading on it, it would make more sense to use the original variable in subsequent analyses rather than the component.
