
Business Research Methods: Factor Analysis


Presentation Transcript


  1. Business Research Methods Factor Analysis

  2. Factor analysis is used to uncover the latent structure (dimensions) of a set of variables. It reduces attribute space from a larger number of variables to a smaller number of factors and as such is a "non-dependent" procedure (that is, it does not assume a dependent variable is specified). Factor analysis could be used for any of the following purposes: • To reduce a large number of variables to a smaller number of factors for modeling purposes, where the large number of variables precludes modeling all the measures individually. As such, factor analysis is integrated in structural equation modeling (SEM), helping create the latent variables modeled by SEM. • However, factor analysis can be and is often used on a stand-alone basis for similar purposes. • To select a subset of variables from a larger set, based on which original variables have the highest correlations with the principal component factors. • To create a set of factors to be treated as uncorrelated variables as one approach to handling multicollinearity in such procedures as multiple regression • To validate a scale or index by demonstrating that its constituent items load on the same factor, and to drop proposed scale items which cross-load on more than one factor. • To establish that multiple tests measure the same factor, thereby giving justification for administering fewer tests. • To identify clusters of cases and/or outliers.
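
As a hedged illustration of the data-reduction purpose described above (not part of the original slides), the sketch below uses Python's scikit-learn rather than SPSS. The file name survey_items.csv, the item set, and the four-factor choice are assumptions for the example, and scikit-learn's maximum-likelihood extraction differs from SPSS's principal components default.

```python
# Minimal, hypothetical sketch of factor analysis as data reduction.
# Assumes a CSV of numeric questionnaire items; names are illustrative only.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

survey = pd.read_csv("survey_items.csv")          # hypothetical item responses

fa = FactorAnalysis(n_components=4, random_state=0)
factor_scores = fa.fit_transform(survey)          # cases x 4 latent factor scores

# Loadings table: rows are the observed items, columns are the latent factors
loadings = pd.DataFrame(fa.components_.T,
                        index=survey.columns,
                        columns=[f"Factor{i + 1}" for i in range(4)])
print(loadings.round(2))
```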

  3. A non-technical analogy: A mother sees various bumps and shapes under a blanket at the bottom of a bed. When one shape moves toward the top of the bed, all the other bumps and shapes move toward the top also, so the mother concludes that what is under the blanket is a single thing, most likely her child. Similarly, factor analysis takes as input a number of measures and tests, analogous to the bumps and shapes. Those that move together are considered a single thing, which it labels a factor. That is, in factor analysis the researcher is assuming that there is a "child" out there in the form of an underlying factor, and he or she takes simultaneous movement (correlation) as evidence of its existence. If correlation is spurious for some reason, this inference will be mistaken, of course, so it is important when conducting factor analysis that possible variables which might introduce spuriousness, such as anteceding causes, be included in the analysis and taken into account. • Factor analysis is part of the multiple general linear hypothesis (MGLH) family of procedures and makes many of the same assumptions as multiple regression: linear relationships, interval or near-interval data, untruncated variables, proper specification (relevant variables included, extraneous ones excluded), lack of high multicollinearity, and multivariate normality for purposes of significance testing. • Factor analysis generates a table in which the rows are the observed raw indicator variables and the columns are the factors or latent variables which explain as much of the variance in these variables as possible. • The cells in this table are factor loadings, and the meaning of the factors must be induced from seeing which variables are most heavily loaded on which factors. • This inferential labeling process can be fraught with subjectivity as diverse researchers impute different labels. • There are several different types of factor analysis, with the most common being principal components analysis (PCA). • However, principal axis factoring (PAF), also called common factor analysis, is preferred for purposes of confirmatory factor analysis in structural equation modeling.

  4. Initial Considerations • Sample Size • Correlation coefficients fluctuate from sample to sample, much more so in small samples than in large. • Therefore, the reliability of factor analysis is also dependent on sample size. Much has been written about the necessary sample size for factor analysis, resulting in many "rules of thumb." • The common rule is to suggest that a researcher has at least 10-15 subjects per variable. • In fact, Tabachnick and Fidell (1996) agree that 'it is comforting to have at least 300 cases for factor analysis' (p. 640), and Comrey and Lee (1992) class 300 as a good sample size, 100 as poor and 1000 as excellent. • More recently, Guadagnoli and Velicer (1988) found that the most important factors in determining reliable factor solutions were the absolute sample size and the absolute magnitude of the factor loadings. • In short, they argue that if a factor has four or more loadings greater than 0.6 then it is reliable regardless of sample size. • Furthermore, factors with 10 or more loadings greater than 0.40 are reliable if the sample size is greater than 150. • Finally, factors with a few low loadings should not be interpreted unless the sample size is 300 or more.

  5. Data Screening • If you find any variables that do not correlate with any other variables (or correlate with very few), you should consider excluding them before the factor analysis is run. • One extreme of this problem is when the R-matrix resembles an identity matrix (we’ll peek at this later). • In this case, variables correlate only with themselves and all other correlation coefficients are close to zero. • SPSS tests this using Bartlett’s test of sphericity (see the next slides).

  6. The correlations between variables can be checked using the correlate procedure to create a correlation matrix of all variables. • This matrix can also be created as part of the main factor analysis. • The opposite problem is when variables correlate too highly. • Although mild multicollinearity is not a problem for factor analysis, it is important to avoid extreme multicollinearity (i.e. variables that are very highly correlated) and singularity (variables that are perfectly correlated). • Therefore, at this early stage we look to eliminate any variables that don’t correlate with any other variables or that correlate very highly with other variables (r > 0.9). • As well as looking for interrelations, you should ensure that variables have roughly normal distributions and are measured at an interval level (which Likert scales are, perhaps wrongly, assumed to be!). • The assumption of normality is important only if you wish to generalize the results of your analysis beyond the sample collected.
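
A minimal screening sketch along these lines, using pandas rather than SPSS's correlate procedure; the file survey_items.csv and the 0.3 "weak correlation" cut-off are assumptions chosen for illustration, not values from the slides.

```python
# Hypothetical screening sketch: flag items that barely correlate with anything
# and item pairs that correlate so highly (|r| > 0.9) that singularity is a risk.
import numpy as np
import pandas as pd

survey = pd.read_csv("survey_items.csv")            # hypothetical item responses
R = survey.corr()                                   # the R-matrix
off_diag = R.where(~np.eye(len(R), dtype=bool))     # blank out the diagonal of 1s

# Items whose largest absolute correlation with any other item is small
max_abs_r = off_diag.abs().max()
print("Weakly correlated items:\n", max_abs_r[max_abs_r < 0.3])

# Item pairs that are nearly collinear
high = off_diag.abs().stack()
print("Very high correlations:\n", high[high > 0.9])
```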

  7. The SPSS Stress Test

  8. The Data from the SPSS Stress Test

  9. Running the Analysis • Access the main dialog box by using Analyze - Data Reduction - Factor. • Simply select the variables you want to include in the analysis (remember to exclude any variables that were identified as problematic during the data screening) and transfer them to the box labeled Variables by clicking on the arrow button.

  10. There are several options available, the first of which can be accessed by clicking on Descriptives to open the dialog box shown in the preceding figure. • The Univariate descriptives option provides means and standard deviations for each variable. • You can also ask for the Determinant of the correlation matrix, and this option is vital for testing for multicollinearity or singularity. • KMO and Bartlett’s test of sphericity produces the Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett’s test. • With a sample of 2571 we shouldn’t have cause to worry about the sample size. • The value of KMO should be greater than 0.5 if the sample is adequate.

  11. Bartlett’s test examines whether the population correlation matrix resembles an identity matrix (i.e. it tests whether the off-diagonal components are zero). • If the population correlation matrix resembles an identity matrix then it means that every variable correlates very badly with all other variables (i.e., all correlation coefficients are close to zero). • If it were an identity matrix then it would mean that all variables are perfectly independent from one another (all correlation coefficients are zero). • Given that we are looking for clusters of variables that measure similar things, it should be obvious why this scenario is problematic: if no variables correlate then there are no clusters to find. • The Reproduced option produces a correlation matrix based on the model (rather than the real data).

  12. The Anti-image option produces an anti-image matrix of covariances and correlations. • These matrices contain measures of sampling adequacy for each variable along the diagonal and the negatives of the partial correlations/covariances on the off-diagonals. • When you have finished with this dialog box, click on Continue to return to the main dialog box.

  13. Factor Extraction on SPSS • To access the extraction dialog box, click on Extraction in the main dialog box. • There are a number of ways of conducting a factor analysis, and which method you use depends on what you hope to do with the analysis. • Tinsley and Tinsley (1987) give an excellent account of the different methods available. • There are two things to consider: whether you want to generalize the findings from your sample to a population and whether you are exploring your data or testing a specific hypothesis. • Here, we’re looking at techniques for exploring data using factor analysis. • Hypothesis testing requires considerable complexity and can be done with computer programs such as LISREL and others. • Those interested in hypothesis testing techniques (known as confirmatory factor analysis) are advised to read Pedhazur and Schmelkin (1991) for an introduction.

  14. In the Analyze box there are two options: to analyze the Correlation matrix or to analyze the Covariance matrix. • You should be happy with the idea that these two matrices are actually different versions of the same thing: the correlation matrix is the standardized version of the covariance matrix. Analyzing the correlation matrix is a useful default method because it takes the standardized form of the matrix; therefore, if variables have been measured using different scales this will not affect the analysis. • In our example, all variables have been measured using the same measurement scale (a five-point Likert scale), but often you will want to analyze variables that use different measurement scales. • Analyzing the correlation matrix ensures that differences in measurement scales are accounted for.
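
A quick numerical check of this point (an illustrative sketch, not SPSS output): the correlation matrix is exactly the covariance matrix of the z-scored variables.

```python
# Verify that correlation = covariance of standardized variables.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))    # arbitrary correlated data

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)           # standardize each column
print(np.allclose(np.cov(Z, rowvar=False),
                  np.corrcoef(X, rowvar=False)))            # prints True
```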

  15. The Display box has two options within it: to display the Unrotated factor solution and a Scree plot. • The scree plot is a useful way of establishing how many factors should be retained in an analysis. • The unrotated factor solution is useful in assessing the improvement of interpretation due to rotation. • If the rotated solution is little better than the unrotated solution then it is possible that an inappropriate (or less optimal) rotation method has been used.

  16. The Extract box provides options pertaining to the retention of factors. • You have the choice of either selecting factors with Eigenvalues greater than a user-specified value or retaining a fixed number of factors. • For the Eigenvalues over option the default is Kaiser’s recommendation of eigenvalues over 1, but you could change this to Jolliffe’s recommendation of 0.7 or any other value you want. • It is probably best to run a primary analysis with the Eigenvalues over 1 option selected, select a scree plot, and compare the results. • If looking at the scree plot and the eigenvalues over 1 lead you to retain the same number of factors then continue with the analysis and be happy. • If the two criteria give different results then examine the communalities and decide for yourself which of the two criteria to believe. • If you decide to use the scree plot then you may want to redo the analysis specifying the number of factors to extract. • The number of factors to be extracted can be specified by selecting Number of factors and then typing the appropriate number in the space provided (e.g. 4).
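
For illustration, both retention rules mentioned above can be checked directly from the eigenvalues of the R-matrix; this is a hedged Python sketch assuming a hypothetical survey_items.csv of item responses, not the SPSS output itself.

```python
# Compare Kaiser's (eigenvalue > 1) and Jolliffe's (eigenvalue > 0.7) criteria.
import numpy as np
import pandas as pd

survey = pd.read_csv("survey_items.csv")                       # hypothetical data
eigenvalues = np.linalg.eigvalsh(survey.corr().values)[::-1]   # largest first

print("Kaiser (eigenvalue > 1):    retain", (eigenvalues > 1).sum(), "factors")
print("Jolliffe (eigenvalue > 0.7): retain", (eigenvalues > 0.7).sum(), "factors")
```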

  17. Rotation Techniques • The interpretability of factors can be improved through rotation. • Rotation maximizes the loading of each variable on one of the extracted factors whilst minimizing the loading on all other factors. • This process makes it much clearer which variables relate to which factors. • Rotation works through changing the absolute values of the variables while keeping their differential values constant. • Click on Rotation to access the dialog box. • Varimax, quartimax and equamax are all orthogonal rotations, while direct oblimin and promax are oblique rotations. • Quartimax rotation attempts to maximize the spread of factor loadings for a variable across all factors; therefore, interpreting variables becomes easier. • However, this often results in lots of variables loading highly onto a single factor. • Varimax is the opposite in that it attempts to maximize the dispersion of loadings within factors. • Therefore, it tries to load a smaller number of variables highly onto each factor, resulting in more interpretable clusters of factors. • Equamax is a hybrid of the other two approaches and is reported to behave fairly erratically. • In most circumstances the default of 25 iterations is more than adequate for SPSS to find a solution for a given data set. • However, if you have a large data set (like we have here) then the computer might have difficulty finding a solution (especially for oblique rotation). To allow for the large data set we are using, change the value to 30.

  18. Scores • The factor scores dialog box can be accessed by clicking on Scores in the main dialog box. • This option allows you to save factor scores for each subject in the data editor. • SPSS creates a new column for each factor extracted and then places the factor score for each subject within that column. • These scores can then be used for further analysis, or simply to identify groups of subjects who score highly on particular factors. • There are three methods of obtaining these scores. • If you want to ensure that factor scores are uncorrelated then select the Anderson-Rubin method; if correlations between factor scores are acceptable then choose the Regression method. • As a final option, you can ask SPSS to produce the factor score coefficient matrix.

  19. Options • This set of options can be obtained by clicking on Options in the main dialog box. • Missing data are a problem for factor analysis just as they are for most other procedures, and SPSS provides a choice of excluding cases or estimating a value for a case. • If the missing data are non-normally distributed or the sample size after exclusion is too small then estimation is necessary. • SPSS uses the mean as an estimate (Replace with mean). • These procedures lower the standard deviation of variables and so can lead to significant results that would otherwise be non-significant. • The final two options relate to how coefficients are displayed. • By default SPSS will list variables in the order in which they are entered into the data editor. Usually, this format is most convenient. • However, when interpreting factors it is sometimes useful to list variables by size. • By selecting Sorted by size, SPSS will order the variables by their factor loadings. • In fact, it does this sorting fairly intelligently so that all of the variables that load highly onto the same factor are displayed together. • The second option is to Suppress absolute values less than a specified value (by default 0.1). • The default value is not that useful and I recommend changing it either to 0.4 (for interpretation purposes) or to a value reflecting the expected value of a significant factor loading given the sample size. • For this example set the value at 0.4.

  20. Interpreting Output from SPSS • Preliminary Analysis • The first body of output concerns data screening, assumption testing and sampling adequacy. • You’ll find several large tables (or matrices) that tell us interesting things about our data. • If you selected the Univariate descriptives option then the first table will contain descriptive statistics for each variable (the mean, standard deviation and number of cases). • The table also includes the number of missing cases; this summary is a useful way to determine the extent of missing data. • SPSS shows the R-matrix (or correlation matrix) produced using the Coefficients and Significance levels options. • The easiest way to check that each variable relates to the others is by scanning the significance values and looking for any variable for which the majority of values are greater than 0.05. • Then scan the correlation coefficients themselves and look for any greater than 0.9. • If any are found then you should be aware that a problem could arise because of singularity in the data: check the determinant of the correlation matrix and, if necessary, eliminate one of the two variables causing the problem. • The determinant is listed at the bottom of the matrix (blink and you’ll miss it). • For these data its value is 5.271E-04 (i.e., 0.0005271), which is greater than the necessary value of 0.00001. • Therefore, we can be confident that multicollinearity is not a problem for these data. • In summary, all questions in the SAQ correlate fairly well with all others (this is partly because of the large sample) and none of the correlation coefficients are particularly large; therefore, there is no need to consider eliminating any questions at this stage.
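
The determinant check described above can also be reproduced outside SPSS; a minimal sketch, again assuming a hypothetical survey_items.csv standing in for the SAQ data.

```python
# Determinant of the R-matrix: very small values (below ~0.00001) warn of
# multicollinearity or singularity among the items.
import numpy as np
import pandas as pd

survey = pd.read_csv("survey_items.csv")          # hypothetical item responses
det = np.linalg.det(survey.corr().values)
print(f"Determinant of R-matrix: {det:.3e}")
if det < 1e-5:
    print("Possible multicollinearity/singularity - inspect highly correlated items.")
```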

  21. SPSS shows several very important parts of the output: the Kaiser-Meyer-Olkin measure of sampling adequacy, Bartlett’s test of sphericity and the anti-image correlation and covariance matrices (note that these matrices have been edited down to contain only the first and last five variables). • The anti-image correlation and covariance matrices provide similar information (remember the relationship between covariance and correlation) and so only the anti-image correlation matrix need be studied in detail as it is the most informative. • These tables are obtained using the KMO and Bartlett’s test of sphericity and the Anti-image options.

  22. The KMO statistic can be calculated for individual and multiple variables and represents the ratio of the squared correlation between variables to the squared partial correlation between variables. In this instance, the statistic is calculated for all 23 variables simultaneously. • The KMO statistic varies between 0 and 1. A value of 0 indicates that the sum of partial correlations is large relative to the sum of correlations, indicating diffusion in the pattern of correlations (bad news). • A value close to 1 indicates that patterns of correlations are relatively compact and so factor analysis should yield distinct and reliable factors. • Kaiser (1974) recommends accepting values greater than 0.5 (values below this should lead you to either collect more data or rethink which variables to include). • Furthermore, values between 0.5 and 0.7 are mediocre, values between 0.7 and 0.8 are good, values between 0.8 and 0.9 are great and values above 0.9 are superb. • For these data the value is 0.93, which falls into the range of being superb: so, we should be confident that factor analysis is appropriate for these data.

  23. Remember, the KMO can be calculated for multiple and individual variables. • The KMO values for individual variables are produced on the diagonal of the anti-image correlation matrix (I have highlighted the values in yellow; next slide). • These values make the anti-image correlation matrix an extremely important part of the output (although the anti-image covariance matrix can be ignored). • As well as checking the overall KMO statistic, it is important to examine the diagonal elements of the anti-image correlation matrix: the value should be above 0.5 for all variables. • For these data all values are well above 0.5, which is good news! • If you find any variables with values below 0.5 then you should consider excluding them from the analysis (or run the analysis with and without that variable and note the difference). • Removal of a variable affects the KMO statistics, so if you do remove a variable be sure to re-examine the new anti-image correlation matrix. • As for the rest of the anti-image correlation matrix, the off-diagonal elements represent the partial correlations between variables. • For a good factor analysis we want these correlations to be very small (the smaller the better). • So, as a final check you can just look through to see that the off-diagonal elements are small (they should be for these data).

  24. Bartlett’s measure tests the null hypothesis that the original correlation matrix is an identity matrix. • For factor analysis to work we need some relationships between variables and if the R-matrix were an identity matrix then all correlation coefficients would be zero. • Therefore, we want this test to be significant (i.e. have a significance value less than 0.05). • For these data, Bartlett’s test is highly significant (p < 0.001), and therefore factor analysis is appropriate.
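
Both diagnostics (the overall and per-item KMO statistics and Bartlett's test) can be reproduced with the third-party Python package factor_analyzer. This is a hedged sketch, not the SPSS output; survey_items.csv is a stand-in for the SAQ data.

```python
# KMO and Bartlett's test of sphericity via factor_analyzer.
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

survey = pd.read_csv("survey_items.csv")          # hypothetical item responses

chi_square, p_value = calculate_bartlett_sphericity(survey)
kmo_per_item, kmo_overall = calculate_kmo(survey)

print(f"Bartlett's test: chi-square = {chi_square:.1f}, p = {p_value:.4f}")  # want p < .05
print(f"Overall KMO = {kmo_overall:.2f}")          # want > 0.5; above 0.9 is 'superb'
print("Item-level KMO values:\n", kmo_per_item)    # each should also exceed 0.5
```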

  25. Factor Extraction • The first part of the factor extraction process is to determine the linear components within the data set (the eigenvectors) by calculating the eigenvalues of the R-matrix. • We know that there are as many components (eigenvectors) in the R-matrix as there are variables, but most will be unimportant. • To determine the importance of a particular vector we look at the magnitude of the associated eigenvalue. • We can then apply criteria to determine which factors to retain and which to discard. • By default SPSS uses Kaiser’s criterion of retaining factors with eigenvalues greater than 1.

  26. The eigenvalues associated with each factor represent the variance explained by that particular linear component and SPSS also displays the eigenvalue in terms of the percentage of variance explained (so, factor 1 explains 31.696% of total variance). • It should be clear that the first few factors explain relatively large amounts of variance (especially factor 1) whereas subsequent factors explain only small amounts of variance. • SPSS then extracts all factors with eigenvalues greater than 1, which leaves us with four factors. • The eigenvalues associated with these factors are again displayed (and the percentage of variance explained) in the columns labelled Extraction Sums of Squared Loadings. • The values in this part of the table are the same as the values before extraction, except that the values for the discarded factors are ignored (thus, the table is blank after the fourth factor). • In the final part of the table (labeled Rotation Sums of Squared Loadings), the eigenvalues of the factors after rotation are displayed.

  27. In optimizing the factor structure, one consequence for these data is that the relative importance of the four factors is equalized. • Before rotation, factor 1 accounted for considerably more variance than the remaining three (31.696% compared to 7.560, 5.725, and 5.336%); however, after rotation it accounts for only 16.219% of variance (compared to 14.523, 11.099 and 8.475% respectively).

  28. The scree plot is shown in SPSS with an arrow indicating the point of inflexion on the curve. • This curve is difficult to interpret because it begins to tail off after three factors, but there is another drop after four factors before a stable plateau is reached. • Therefore, we could probably justify retaining either two or four factors. • Given the large sample, it is probably safe to assume Kaiser’s criterion; however, you might like to rerun the analysis specifying that SPSS extract only two factors and compare the results.
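
A matplotlib sketch of a scree plot for the same kind of data, under the same assumptions as the earlier snippets (a hypothetical survey_items.csv); the point of inflexion still has to be judged by eye.

```python
# Scree plot: eigenvalues of the R-matrix in descending order.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

survey = pd.read_csv("survey_items.csv")                        # hypothetical data
eigenvalues = np.linalg.eigvalsh(survey.corr().values)[::-1]    # largest first

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1, linestyle="--", label="Kaiser criterion (eigenvalue = 1)")
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.legend()
plt.show()
```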

  29. Factor Rotation • The first analysis to run was an orthogonal rotation (Varimax). • However, we also ran the analysis using oblique rotation. • In this section the results of both analyses will be reported so as to highlight the differences between the outputs. • This comparison will also be a useful way to show the circumstances in which one type of rotation might be preferable to another.

  30. Orthogonal Rotation (Varimax) • SPSS Output shows the rotated component matrix (also called the rotated factor matrix in factor analysis), which is a matrix of the factor loadings for each variable onto each factor. • This matrix contains the same information as the component matrix in SPSS except that it is calculated after rotation. • There are several things to consider about the format of this matrix. • First, factor loadings less than 0.4 have not been displayed because we asked for these loadings to be suppressed using the Suppress absolute values less than option. • If you didn’t select this option, or didn’t adjust the criterion value to 0.4, then your output will differ. • Second, the variables are listed in the order of size of their factor loadings. • By default, SPSS orders the variables as they are in the data editor; however, we asked for the output to be Sorted by size using that option. • If this option was not selected your output will look different. • I have allowed the variable labels to be printed to aid interpretation. • The original logic behind suppressing loadings less than 0.4 was based on Stevens’s (1992) suggestion that this cut-off point was appropriate for interpretative purposes (i.e. loadings greater than 0.4 represent substantive values).
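
For readers without SPSS, a varimax-rotated loading matrix of this kind can be approximated with the factor_analyzer package. This is only a hedged sketch: factor_analyzer's default extraction (minres common factor analysis) differs from the principal components extraction used in these slides, so the numbers will not match the SPSS output exactly, and the data file is again a hypothetical stand-in.

```python
# Four-factor varimax solution with loadings below 0.4 suppressed, mimicking the
# SPSS options described above.
import pandas as pd
from factor_analyzer import FactorAnalyzer

survey = pd.read_csv("survey_items.csv")          # hypothetical item responses

fa = FactorAnalyzer(n_factors=4, rotation="varimax")
fa.fit(survey)

loadings = pd.DataFrame(fa.loadings_, index=survey.columns,
                        columns=["Factor1", "Factor2", "Factor3", "Factor4"])

suppressed = loadings.where(loadings.abs() >= 0.4)   # hide small loadings
print(suppressed.round(3))
```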

  31. Unrotated Solution (Before rotation, most variables loaded highly onto the first factor and the remaining factors didn’t really get a look-in).

  32. Rotated Solution (However, the rotation of the factor structure has clarified things considerably: there are four factors and variables load very highly onto only one factor … with the exception of one question).

  33. The next step is to look at the content of questions that load onto the same factor to try to identify common themes. • If the mathematical factor produced by the analysis represents some real-world construct then common themes among highly loading questions can help us identify what the construct might be. • The questions that load highly on factor 1 all seem to relate to using computers or SPSS. • Therefore we might label this factor fear of computers. • The questions that load highly on factor 2 all seem to relate to different aspects of statistics; therefore, we might label this factor fear of statistics. • The three questions that load highly on factor 3 all seem to relate to mathematics; therefore, we might label this factor fear of mathematics. • Finally, the questions that load highly on factor 4 all contain some component of social evaluation from friends; therefore, we might label this factor peer evaluation. • This analysis seems to reveal that the initial questionnaire, in reality, is composed of four sub-scales: fear of computers, fear of statistics, fear of maths, and fear of negative peer evaluation. • There are two possibilities here. • The first is that the SAQ failed to measure what it set out to (namely SPSS anxiety) but does measure some related constructs. • The second is that these four constructs are sub-components of SPSS anxiety; however, the factor analysis does not indicate which of these possibilities is true.

  34. In the original analysis I asked for scores to be calculated based on the Anderson-Rubin method (which is why they are uncorrelated). • You will find these scores in the data editor. • There should be four new columns of data (one for each factor) labeled FAC1_1, FAC2_1, FAC3_1 and FAC4_1 respectively. • If you asked for factor scores in the oblique rotation as well, then those scores will appear in the data editor in four further columns (labeled FAC1_2 and so on). • These factor scores can be listed in the output viewer using the Analyze – Reports – Case Summaries... command path. • Given that there are over 1500 cases you might like to restrict the output to the first 10 or 20.
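
As a rough non-SPSS counterpart, factor_analyzer's transform() returns regression-style factor scores (it does not provide an Anderson-Rubin option), so the sketch below corresponds to SPSS's Regression method; the column names simply mimic SPSS's FACn_1 convention and the data file is again hypothetical.

```python
# Compute factor scores for each case and list the first few, echoing the
# Case Summaries step described above.
import pandas as pd
from factor_analyzer import FactorAnalyzer

survey = pd.read_csv("survey_items.csv")          # hypothetical item responses

fa = FactorAnalyzer(n_factors=4, rotation="varimax")
fa.fit(survey)

scores = pd.DataFrame(fa.transform(survey),
                      columns=["FAC1_1", "FAC2_1", "FAC3_1", "FAC4_1"])
print(scores.head(10))                            # restrict output to the first 10 cases
```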

  35. It should be pretty clear that subject 9 scored highly on the first three factors, and so this person is very anxious about statistics, computing and maths, but less so about peer evaluation (factor 4). • Factor scores can be used in this way to assess the relative fear of one person compared to another, or we could add the scores up to obtain a single score for each subject (which we might assume represents SPSS anxiety as a whole). • We can also use factor scores in regression when groups of predictors correlate so highly that there is multicollinearity.

  36. For the APA-Style Write Up, Click Here
