
Data preparation for use in SEM

Ned Kock


Presentation Transcript


  1. Data preparation for use in SEM. Ned Kock.

  2. Data in table format. Each column corresponds to a manifest variable. Some groups of columns correspond to a latent variable. Each row often contains the answers from one subject under a particular condition, and is also known as a “case”.

  3. Missing values
     • A missing value is an empty cell in a data table.
     • Missing values are a fact of life in many areas of research, including behavioral research.
     • In terms of behavioral research, missing values may be present when:
        • Respondents do not answer one or more questions in a questionnaire.
        • A researcher empties a data cell when a respondent answers a question with non-usable data; e.g., by responding with a “0” (zero) when asked for his or her age.

  4. Examples of missing values. Datasets with missing values are a common occurrence in behavioral research, as well as other types of research.

  5. Percentage of missing data. A simple Excel formula can be used to calculate the percentage of missing data for a manifest variable (that is, the number of empty cells in a column divided by the number of rows, times 100). How much is too much? A recent Monte Carlo simulation suggests that as much as 30% missing data may be acceptable; more than that can lead to problems. Supporting source: Kock, N. (2014). Single missing data imputation in PLS-SEM. Laredo, TX: ScriptWarp Systems.
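
The slide's Excel formula is not reproduced in this transcript. As a rough equivalent, the same per-column calculation might be sketched in Python as below. The table, the variable names, and the use of `None` to mark an empty cell are all illustrative assumptions, not part of the original slides.

```python
# Hypothetical data table: one column per manifest variable, one row per case.
# None marks a missing value (an empty cell); all names and numbers are made up.
table = {
    "age":    [34, None, 27, 45, None],
    "income": [52, 61, None, 48, 55],
    "satisf": [4, 5, 3, 4, 2],
}

def percent_missing(values):
    """Return the percentage of missing (None) entries in one column."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)

# Percentage of missing data for each manifest variable.
missing_report = {name: percent_missing(col) for name, col in table.items()}

for name, pct in missing_report.items():
    # 30% is the threshold suggested on the slide.
    flag = "above 30%" if pct > 30.0 else "within 30%"
    print(f"{name}: {pct:.1f}% missing ({flag})")
```

Reporting one figure per column matches the slide's framing, which applies the 30% guideline to each manifest variable rather than to the table as a whole.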

  6. Dealing with missing values
     • A first step is to make an effort to ensure that no more than 30% of the data is missing in each column of a data table.
     • The above can be accomplished by employing data collection techniques that minimize missing data; e.g., targeted questionnaires and interviews.
     • Then the remaining missing cells can be filled using one of the several imputation methods, such as:
        • Arithmetic Mean Imputation
        • Multiple Regression Imputation
        • Hierarchical Regression Imputation
        • Stochastic Multiple Regression Imputation
        • Stochastic Hierarchical Regression Imputation
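
Of the methods listed, Arithmetic Mean Imputation is the simplest to illustrate: each missing cell in a column is replaced by the mean of that column's observed values. The following is a minimal sketch under that textbook definition; it is not WarpPLS's implementation, and the function name and data are hypothetical.

```python
from statistics import mean

def mean_impute(values):
    """Replace None entries with the arithmetic mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]

# Hypothetical column for one manifest variable; None marks a missing cell.
ages = [34, None, 27, 45, None]
imputed_ages = mean_impute(ages)
print(imputed_ages)
```

The regression-based methods on the slide instead predict each missing value from other variables, which is beyond the scope of this sketch.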

  7. Missing data imputation with WarpPLS
     Main menu > Settings > View or change missing data imputation settings.
     Using deletion, listwise or pairwise, to deal with missing data: Researchers have traditionally used deletion methods, often listwise and pairwise deletion, to deal with missing data. A report by the American Psychological Association Task Force on Statistical Inference stated that these techniques are “among the worst methods available for practical applications”.
     Supporting source: Kock, N. (2014). Single missing data imputation in PLS-SEM. Laredo, TX: ScriptWarp Systems.

  8. Missing data imputation performance
     Main menu > Settings > View or change missing data imputation settings.
     Results from a Monte Carlo simulation: Multiple Regression Imputation yielded the least biased mean path coefficient estimates, followed by Arithmetic Mean Imputation. With respect to mean loading estimates, Arithmetic Mean Imputation yielded the least biased results, followed by Stochastic Hierarchical Regression Imputation and Hierarchical Regression Imputation.
     Supporting source: Kock, N. (2014). Single missing data imputation in PLS-SEM. Laredo, TX: ScriptWarp Systems.
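
To make the two deletion methods concrete, here is a small sketch of what each one discards. Listwise deletion drops every case that has any missing value; pairwise deletion keeps, for a given pair of variables, every case where both of those values are present. The data and function names are hypothetical, and the sketch only counts retained cases rather than estimating anything.

```python
# Hypothetical cases: each tuple is one row (case) with three manifest variables.
# None marks a missing value.
rows = [
    (34, 52, 4),
    (None, 61, 5),
    (27, None, 3),
    (45, 48, 4),
    (30, 55, None),
]

def listwise(cases):
    """Keep only the cases with no missing value in any column."""
    return [r for r in cases if all(v is not None for v in r)]

def pairwise(cases, i, j):
    """Keep the (column i, column j) pairs where both values are present."""
    return [(r[i], r[j]) for r in cases if r[i] is not None and r[j] is not None]

complete_cases = listwise(rows)   # cases usable under listwise deletion
pairs_01 = pairwise(rows, 0, 1)   # pairs usable for columns 0 and 1

print(len(rows), len(complete_cases), len(pairs_01))
```

In this toy table, listwise deletion retains 2 of the 5 cases, while pairwise deletion retains 3 pairs for the first two columns, so different pairs of variables end up based on different subsets of the data.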

  9. Replacing missing values with SPSS

  10. Creating source data file for WarpPLS
     • Source data files contain the data used in a WarpPLS analysis.
     • They are often referred to as “raw data files”.
     • Source data files should be prepared as follows:
        • They should be .xls or .xlsx files (Excel), or plain text files with the names of the variables in the first row, followed by one data case per row, with values in the same order as the variable names (missing data points do not have to be imputed a priori).
        • In text files, variable names and numeric data should be separated from each other by tabs.
        • Text files should have the .txt file extension.
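
As an illustration of the text-file layout described above, the sketch below writes a tab-delimited .txt file with the variable names in the first row and one case per row. The variable names, data, and file name are hypothetical, and writing a missing value as an empty field is an assumption based on the earlier description of a missing value as an empty cell.

```python
import csv
import os
import tempfile

variable_names = ["age", "income", "satisf"]   # hypothetical manifest variables
cases = [
    [34, 52, 4],
    [None, 61, 5],   # None is a missing value, written as an empty field (assumption)
    [27, 48, 3],
]

def write_source_file(path, names, rows):
    """Write a tab-delimited text file: variable names first, then one case per line."""
    with open(path, "w", newline="", encoding="ascii") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(names)
        for row in rows:
            writer.writerow(["" if v is None else v for v in row])

# Hypothetical output location in the system's temporary directory.
source_path = os.path.join(tempfile.gettempdir(), "raw_data_example.txt")
write_source_file(source_path, variable_names, cases)
print(source_path)
```

Opening the file with `encoding="ascii"` makes the write fail loudly if a variable name contains a non-ASCII character, which fits the ASCII tab-delimited format recommended later in the slides.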

  11. Using Excel to create a .txt file

  12. Important tips
     • One file format that usually works well for a .txt file, and that is widely available, is the ASCII tab-delimited format.
     • If you are using Excel to create a .txt file, save the Excel-formatted file first, and then create the .txt file under a different name.
     • With Excel, have only one worksheet, containing the raw data.
     • You can also create tab-delimited .txt files using SPSS, in which case it is important to instruct SPSS to write the variable names into the .txt file.
     • Excel writes the variable names into the .txt file by default.
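
Before importing a .txt file, it can help to check that it really follows these tips. The sketch below is one possible pre-import check, not a WarpPLS feature: it tests that the text is ASCII, that the first row contains no empty variable names, and that every data row has as many tab-separated fields as the header. The function name and sample strings are hypothetical.

```python
def check_tab_delimited(text):
    """Return a list of problems found in tab-delimited text (empty list if none)."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]
    problems = []
    if not text.isascii():
        problems.append("file contains non-ASCII characters")
    header = lines[0].split("\t")
    if any(name.strip() == "" for name in header):
        problems.append("header row has an empty variable name")
    # Data rows start on line 2; each must match the header's field count.
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} fields, found {len(fields)}"
            )
    return problems

good = "age\tincome\n34\t52\n\t61\n"   # second case has a missing age
bad = "age\tincome\n34\t52\t7\n"       # data row has one field too many

print(check_tab_delimited(good))
print(check_tab_delimited(bad))
```

An empty field still counts as a field here, so a row with a missing value passes the check as long as its tabs are in place.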

  13. Reading raw data file in WarpPLS. File import wizard. Viewing and accepting data.

  14. Acknowledgements. Adapted text, illustrations, and ideas from the following sources were used in the preparation of the preceding set of slides:
     • Kock, N. (2015). WarpPLS 5.0 User Manual. Laredo, TX: ScriptWarp Systems.
     • Kline, R. B. (1998). Principles and Practice of Structural Equation Modeling. New York, NY: The Guilford Press.
     • MS Excel, SPSS, and WarpPLS software applications.
     • Rencher, A. C. (1998). Multivariate Statistical Inference and Applications. New York, NY: John Wiley & Sons.
     • SPSS’ web site: www.spss.com.
     • WarpPLS software.
