Introduction to SPSS

##### Presentation Transcript

1. Introduction to SPSS: data types, SPSS data entry and analysis

2. In this session • What does SPSS look like? • Types of data (revision) • Data Entry in SPSS • Simple charts in SPSS • Summary statistics • Contingency tables and crosstabulations • Scatterplots and correlations • Tests of differences of means

3. SPSS/PASW

4. Aspects of SPSS • Menus, especially Analyse and Graphs • Data View (spreadsheet view of the data) • Rows are cases (people, respondents etc.) • Columns are variables • Variable View • Shows the details of each variable (type, labels, values, measure)

5. Questionnaire Data Coding

6. In SPSS • We change ticks etc. on a questionnaire into numbers • One number for each variable for each case • How we do this depends on the type of variable/data

7. Types of data • Nominal • Ranked • Scales/measures • Mixed types • Text answers (open ended questions)

8. Nominal (categorical) • the order of the categories is arbitrary • e.g. sex, country of birth, personality type, yes or no • Use numeric in SPSS and give value labels (e.g. 1=Female, 2=Male, 99=Missing) (e.g. 1=Yes, 2=No, 99=Missing) (e.g. 1=UK, 2=Ireland, 3=Pakistan, 4=India, 5=Other, 99=Missing)
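The coding step can be sketched outside SPSS as a lookup from answer to number, assuming the questionnaire answers arrive as text. The variable names and the five responses below are hypothetical; the dictionary plays roughly the role of the Values column in Variable View.

```python
# Hypothetical value labels for a nominal "sex" variable, following the
# slide's scheme: 1=Female, 2=Male, 99=Missing.
SEX_LABELS = {1: "Female", 2: "Male", 99: "Missing"}
MISSING = 99

# Reverse lookup: label text -> numeric code (the missing label is excluded).
LABEL_TO_CODE = {label: code for code, label in SEX_LABELS.items() if code != MISSING}


def code_response(answer):
    """Turn one ticked answer into its numeric code (99 if left blank)."""
    if answer is None:
        return MISSING
    return LABEL_TO_CODE[answer]


# Hypothetical answers from five questionnaires (None = left blank).
responses = ["Female", "Male", None, "Female", "Male"]
codes = [code_response(a) for a in responses]
print(codes)  # [1, 2, 99, 1, 2]

# The codes carry no order or size: they are only looked up, never averaged.
print([SEX_LABELS[c] for c in codes])
```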

9. Ranks or Ordinal • in order, 1st, 2nd, 3rd etc. • e.g. status, social class • Use numeric in SPSS with value labels • E.g. 1=Working class, 2=Middle class, 3=Upper class • E.g. Class of degree, 1=First, 2=Upper second, 3=Lower second, 4=Third, 5=Ordinary, 99=Missing

10. Measures, scales • Interval - equal units • e.g. IQ • Ratio - equal units and a true zero on the scale • e.g. height, income, family size, age • Makes sense to say one value is twice another (ratio data only) • Use numeric (or comma, dot or scientific notation) in SPSS • E.g. family size: 1, 2, 3, 4 etc. • E.g. income per year: 25000, 14500, 18650 etc.

11. Mixed type • Categorised data • Actually ranked, but used to identify categories or groups • e.g. age groups • = ratio data put into groups • Use numeric in SPSS and use value labels. • E.g. Age group, 1=‘Under 18’, 2=‘18-24’, 3=‘25-34’, 4=‘35-44’, 5=‘45-54’, 6=‘55 or greater’
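Grouping ratio data like this is a recode (in SPSS, Transform > Recode into Different Variables). A minimal Python sketch, assuming ages are whole years; the function name and the sample ages are hypothetical, while the group codes and labels follow the slide.

```python
# Group codes and labels taken from the slide.
AGE_GROUP_LABELS = {
    1: "Under 18",
    2: "18-24",
    3: "25-34",
    4: "35-44",
    5: "45-54",
    6: "55 or greater",
}


def age_group(age):
    """Map an age in whole years to its group code (1-6)."""
    if age < 18:
        return 1
    if age <= 24:
        return 2
    if age <= 34:
        return 3
    if age <= 44:
        return 4
    if age <= 54:
        return 5
    return 6


ages = [17, 18, 24, 25, 40, 54, 55, 71]  # hypothetical ages
groups = [age_group(a) for a in ages]
print(groups)  # [1, 2, 2, 3, 4, 5, 6, 6]
print([AGE_GROUP_LABELS[g] for g in groups])
```

Note that the original ages cannot be recovered from the groups, so it is usually worth keeping the ungrouped variable too.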

12. Text answers • E.g. answers to open-ended questions • Either enter the text as given (use String in SPSS) • Or • Code or classify the answers into one of a small number of types (use numeric/nominal in SPSS)

13. Data Entry in SPSS • Video by Andy Field

14. Frequency counts • Used with categorical and ranked variables • e.g. gender of students taking Health and Illness option
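A frequency count is simply a tally of each code. The sketch below uses hypothetical gender codes; the printed columns are meant to resemble the Frequency and Percent columns of an SPSS Frequencies table, not reproduce it exactly.

```python
from collections import Counter

# Hypothetical gender codes for one class (1=Female, 2=Male, 99=Missing).
LABELS = {1: "Female", 2: "Male", 99: "Missing"}
codes = [1, 1, 2, 1, 99, 2, 1, 1]

counts = Counter(codes)  # how many cases have each code
total = len(codes)

# One row per category: label, count and percentage of all cases.
for code in sorted(counts):
    n = counts[code]
    print(f"{LABELS[code]:<8} {n:>3} {100 * n / total:6.1f}%")
```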

15. Central Tendency • Mean • = average value • sum of all the values divided by the number of values • Mode • = the most frequent value in a distribution • (N.B. it is possible to have 2 or more modes, e.g. bimodal distribution) • Median • = the half-way value, or the value that divides the ordered distribution in the middle • The middle score when scores are ordered • N.B. need to put values into order first
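The three measures can be checked on a small set of hypothetical ages with Python's standard `statistics` module. The values were chosen to have two modes and one large value, which pulls the mean above the median.

```python
import statistics

# Hypothetical ages of ten students.
ages = [19, 20, 20, 21, 22, 22, 23, 25, 30, 48]

mean = sum(ages) / len(ages)        # sum of the values / number of values
median = statistics.median(ages)    # sorts the values, then takes the middle
modes = statistics.multimode(ages)  # every most-frequent value (here bimodal)

print(mean, median, modes)  # 25.0 22.0 [20, 22]
```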

16. Dispersion and variability • Quartiles • The three values that split the sorted data into four equal parts. • Second quartile = median. • Lower quartile = median of the lower half of the data • Upper quartile = median of the upper half of the data • Need to order the individuals first • About one quarter of the individuals fall in each of the four parts, so the inter-quartile range (lower to upper quartile) contains the middle half
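The "median of each half" rule can be written out directly. This sketch assumes the convention that, for an odd number of values, the overall median is left out of both halves; other definitions exist, and SPSS's Explore output may use a different one, so its quartiles can differ slightly for small samples. The ages are hypothetical.

```python
import statistics


def quartiles(values):
    """Lower quartile, median and upper quartile by the median-of-halves rule."""
    data = sorted(values)          # order the individuals first
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above it (middle value skipped if n is odd)
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


ages = [18, 19, 19, 20, 21, 22, 24, 27, 35]  # hypothetical ages
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3, "IQR =", q3 - q1)  # 19.0 21 25.5 IQR = 6.5
```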

17. Used on a box plot (figure: age of Health and Illness students, with the upper quartile, median and lower quartile marked)

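18. Variance • The average of the squared deviations from the mean • 5.20 is the Sum of Squares (the squared deviations added up) • This depends on the number of individuals, so we divide by n (5) • Gives 1.04, which is the variance • (SPSS divides by n - 1 instead, giving the sample variance, which here would be 1.30)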
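The slide's own data are not shown, so the sketch below uses hypothetical values chosen so that the Sum of Squares also comes to 5.20. It computes both the divide-by-n variance used on the slide and the divide-by-(n - 1) sample variance that SPSS reports.

```python
# Hypothetical values (not the slide's data), chosen so the sum of squares is 5.20.
values = [1, 2, 3, 3, 4]

n = len(values)
mean = sum(values) / n                                 # 2.6
sum_of_squares = sum((x - mean) ** 2 for x in values)  # squared deviations, added up
variance_n = sum_of_squares / n                        # slide's version: divide by n
variance_n_minus_1 = sum_of_squares / (n - 1)          # sample variance, as SPSS reports

print(round(sum_of_squares, 2), round(variance_n, 2), round(variance_n_minus_1, 2))
# 5.2 1.04 1.3
```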

19. Standard Deviation • The variance has one problem: it is measured in squared units. • This isn’t a very meaningful metric, so we take the square root. • This is the Standard Deviation (e.g. √1.04 ≈ 1.02)
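Taking the square root brings the spread back into the original units. A short sketch with hypothetical values whose divide-by-n variance is 1.04; `statistics.pstdev` divides by n, while `statistics.stdev` divides by n - 1, which matches what SPSS reports as the standard deviation.

```python
import math
import statistics

values = [1, 2, 3, 3, 4]  # hypothetical values; divide-by-n variance = 1.04

sd_n = math.sqrt(statistics.pvariance(values))  # square root of the variance
print(round(sd_n, 3))                           # back in the original units

# pstdev does the same in one step (divide by n);
# stdev uses n - 1, the sample standard deviation.
print(round(statistics.pstdev(values), 3), round(statistics.stdev(values), 3))
```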

20. Using SPSS • ‘Analyse > Descriptive Statistics > Explore’ menu. • Gives mean, median, SD, variance, min, max, range, skewness and kurtosis. • Can also produce stem-and-leaf plots and histograms.

21. Charts in SPSS • Use ‘Chart Builder’ from the ‘Graphs’ menu, or the Legacy Dialogs • Double-click a chart to edit it • E.g. double-click the bars to change them from colour to a fill pattern • Do this in SPSS first, before copying and pasting into Word • Label the chart (in SPSS or in Word)

22. Stem and leaf plots • e.g. age of students taking Health and Illness option • good at showing • distribution of data • outliers • range

23. Stem and leaf plots e.g.

24. Box Plot

25. Box plot with the fill colour changed. N.B. the numbers beside outlying points are case numbers.

26. Histograms and bar charts • Length/height of bar indicates frequency

27. Histogram with a fill pattern suitable for black-and-white printing

29. Pie chart • angle of segment indicates proportion of the whole
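The angle of each segment is its share of the total multiplied by 360 degrees. A sketch with hypothetical counts:

```python
# Hypothetical counts per category.
counts = {"Female": 30, "Male": 15, "Missing": 5}

total = sum(counts.values())
# Each segment's angle = (category count / total) * 360 degrees.
angles = {label: 360 * n / total for label, n in counts.items()}
print(angles)  # {'Female': 216.0, 'Male': 108.0, 'Missing': 36.0}
```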

30. Pie chart with a shadow and one slice moved out for emphasis

31. Analysing relationships • Contingency tables or crosstabulations • Compare nominal/categorical variables • But can include ordinal variables • N.B. the table contains counts (= frequency data) • One variable across the top (columns) • One variable down the side (rows) • Row and column totals are known as marginals
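A contingency table is a count of cases in each combination of categories, plus the row and column totals. The sketch below tabulates hypothetical (sex, age group) pairs; the category names are assumptions, loosely based on the class example.

```python
from collections import Counter

# Hypothetical (sex, age group) pairs, one per student.
cases = [
    ("Female", "Under 21"), ("Female", "Under 21"), ("Female", "21 or over"),
    ("Male", "Under 21"), ("Male", "21 or over"), ("Male", "21 or over"),
    ("Female", "Under 21"),
]

rows = ["Female", "Male"]          # variable down the side
cols = ["Under 21", "21 or over"]  # variable across the top
cell = Counter(cases)              # count of cases in each (row, column) cell

# Marginals: row totals, column totals and the grand total.
row_totals = {r: sum(cell[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(cell[(r, c)] for r in rows) for c in cols}
grand_total = sum(row_totals.values())

print(f"{'':8}" + "".join(f"{c:>12}" for c in cols) + f"{'Total':>8}")
for r in rows:
    print(f"{r:8}" + "".join(f"{cell[(r, c)]:>12}" for c in cols) + f"{row_totals[r]:>8}")
print(f"{'Total':8}" + "".join(f"{col_totals[c]:>12}" for c in cols) + f"{grand_total:>8}")
```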

32. Example • In the Health and Illness class, are women more likely to be under 21 than men?

33. Crosstabulations • Use column and row percentages to look for relationships (e.g. compare the percentage of women who are under 21 with the percentage of men who are under 21)
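Row percentages divide each cell by its row total; column percentages divide by its column total. A sketch using a hypothetical table of counts:

```python
# Hypothetical counts: rows = sex, columns = age group.
table = {
    "Female": {"Under 21": 30, "21 or over": 10},
    "Male": {"Under 21": 12, "21 or over": 8},
}
cols = ["Under 21", "21 or over"]

# Row percentages: each cell as a share of its row total.
row_pct = {
    r: {c: 100 * n / sum(cells.values()) for c, n in cells.items()}
    for r, cells in table.items()
}

# Column percentages: each cell as a share of its column total.
col_totals = {c: sum(table[r][c] for r in table) for c in cols}
col_pct = {r: {c: 100 * table[r][c] / col_totals[c] for c in cols} for r in table}

print(row_pct["Female"]["Under 21"], row_pct["Male"]["Under 21"])  # 75.0 60.0
```

In these made-up numbers 75% of the women but only 60% of the men are under 21; whether a difference like that is more than chance is what the Chi-square test addresses.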

34. SPSS output

35. Chi-square ² Cross tabulations and Chi-square are tests that can be used to look for a relationship between two variables: • When the variables are categorical so the data are nominal (or frequency). • For example, if we wanted to look at the relationship between gender and age. • There are several different types of Chi-square (²), we will be using the 2 x 2 Chi-square

36. 2x2 Chi-square results in SPSS

37. Another example • The Bank employees data

38. Bank employees: Chi-Square tests

40. Low expected values in cells • Get SPSS to output the expected values (Crosstabs > Cells > Expected) • Look for cells where these are below 5 • Consider recoding to combine columns or rows
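The check itself is simple: compute every expected count and flag those below 5. SPSS summarises something similar in a footnote under its Chi-Square Tests table. The function name and the counts below are hypothetical.

```python
def low_expected_cells(table, threshold=5):
    """Return the expected-count table and the (row, col, expected) cells below threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    low = [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]
    return expected, low


observed = [[2, 8], [3, 12]]  # hypothetical counts with a small first column
expected, low = low_expected_cells(observed)
n_cells = len(observed) * len(observed[0])
print(f"{len(low)} of {n_cells} cells ({100 * len(low) / n_cells:.1f}%) have expected count < 5")
```

Here both low cells are in the first column, which suggests merging that column with a neighbouring category before testing.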

41. Tabulating questionnaire responses • Categorical survey data are often “collapsed” (small categories combined) for the purposes of data analysis • An analysis of a group of only 2 respondents (e.g. Black African) would not have been very meaningful!
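Collapsing can be sketched as merging every category below a minimum size into "Other". The threshold of 5 and all counts except the "Black African" example are hypothetical; in practice, categories should be combined in ways that make sense for the research question, not by size alone.

```python
from collections import Counter

# Hypothetical ethnicity answers from a small survey.
answers = (
    ["White British"] * 12 + ["Indian"] * 6 + ["Pakistani"] * 5
    + ["Black African"] * 2 + ["Chinese"] * 1
)

# Hypothetical rule: categories with fewer than 5 respondents become "Other".
MIN_GROUP_SIZE = 5
counts = Counter(answers)
collapsed = [a if counts[a] >= MIN_GROUP_SIZE else "Other" for a in answers]

collapsed_counts = dict(Counter(collapsed))
print(collapsed_counts)
```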

42. Recoding variables • http://www.youtube.com/watch?v=uzQ_522F2SM&feature=related (ignore the t-test for now; 6m11s) • http://www.youtube.com/watch?v=FUoYZ_f6Lxc (uses an old version of SPSS; the submenu shown no longer exists; 6m)

43. Scatterplots and correlations • Look for association between variables, e.g. • population size and GDP • crime and unemployment rates • height and weight • Both variables must be rank, interval or ratio (ordinal or scale in SPSS). • So variables such as gender, ethnicity, town of birth or occupation cannot be used.

44. Scatterplots • e.g. age (in years) versus number of GCSEs

45. Interpretation • As X increases, Y increases • This is called a (positive) correlation • The regression line (model) is shown in red
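The strength of a linear association is usually summarised by Pearson's r, and the red line is a least-squares regression line. A minimal sketch with hypothetical data (in SPSS, Analyse > Correlate > Bivariate gives Pearson's r; for ranked data Spearman's rho, which is not shown here, is the usual choice).

```python
import math


def pearson_and_line(xs, ys):
    """Pearson's r and the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)       # between -1 and +1
    slope = sxy / sxx                    # change in Y per unit of X
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return r, slope, intercept


ages = [16, 17, 18, 19, 20]  # hypothetical X (age in years)
gcses = [2, 4, 5, 4, 5]      # hypothetical Y (number of GCSEs)
r, slope, intercept = pearson_and_line(ages, gcses)
print(round(r, 3), round(slope, 2))  # 0.775 0.6
```

A positive r and slope match "as X increases, Y increases"; they say nothing about which variable, if either, causes the other.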

46. Correlation measures association, not causation • The older the child, the better s/he is at reading • The lower your income, the greater the risk of schizophrenia • Height correlates with weight • But weight does not cause height • Height is one of the causes of weight (along with body shape, diet, fitness level etc.) • The number of ice creams sold is correlated with the rate of drowning • Ice creams do not cause drowning (nor vice versa) • A third variable is involved: people swim more and buy more ice creams when it’s warm