
Chapter 13

Chapter 13. Describing Data. Exploratory Data Analysis. Descriptive statistics allow you to summarize data with only a few numbers. Use exploratory data analysis (EDA) to search for patterns in your data.

delta

Presentation Transcript


  1. Chapter 13 Describing Data

  2. Exploratory Data Analysis • Descriptive statistics allow you to summarize data with only a few numbers • Use exploratory data analysis (EDA) to search for patterns in your data • Before conducting any inferential test, use EDA to ensure that your data meet the requirements and assumptions of the test you are planning to use (e.g., that scores are normally distributed)

  3. Steps involved in EDA: • Organize and summarize your data on a data coding sheet • If desired, organize data for computer entry • Graph data (bar graph, histogram, line graph, or scatterplot) so that you can visually inspect distributions • This will help you choose the appropriate statistics • Display frequency distributions on a histogram, and create a stemplot • Examine your graphs for normality or skewness in your distributions

  4. Organizing Your Data • How you organize your data depends on the type of research conducted • Survey data • Organized into a series of columns: one for respondent number and the rest for responses to survey items • Nominally scaled data should be dummy coded (e.g., 1=Yes, 2=No) • A “key” or “code sheet” is used to describe the scales used for questions • Experimental/quasi-experimental data • Data broken down by condition • Unstacked format: A separate column for the data from each treatment • Stacked format: One column for participant number, one for treatment, and one for the data

  5. Data Organized in Unstacked (Left) and Stacked (Right) formats
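The two layouts can be made concrete with a short sketch. The scores, the condition names, and the `stack` helper below are hypothetical, and numbering participants consecutively across conditions assumes a between-subjects design.

```python
# Hypothetical scores for two treatment conditions (illustrative values only).
unstacked = {
    "control": [12, 15, 11],
    "treatment": [18, 17, 20],
}


def stack(columns):
    """Convert unstacked data (one column per treatment) to stacked rows.

    Each stacked row is (participant number, treatment label, score).
    Participants are numbered consecutively across conditions, which
    assumes that each participant served in only one condition.
    """
    rows = []
    participant = 1
    for treatment, scores in columns.items():
        for score in scores:
            rows.append((participant, treatment, score))
            participant += 1
    return rows


stacked = stack(unstacked)
for row in stacked:
    print(row)
```

Statistical packages differ in which layout they expect, so a conversion like this is often the first step after data entry.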

  6. Grouped Versus Individual Data • You must decide whether to analyze grouped or individual data • Grouped data • Data represented by a single score (e.g., the average) • Convenient but there are limitations • Average score may not represent subjects’ individual performance in a group • Curves representing averaged data may not represent true nature of psychological phenomenon measured

  7. Individual data • Makes most sense for repeated measures • Useful when phenomenon measured is either/or (something learned or not) • May reflect effect more faithfully than grouped data • Best strategy is to inspect both individual and grouped data
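A small sketch of why averaged curves can mislead when the phenomenon is either/or. The four learners and their trial of learning are invented for illustration: every individual curve is a sudden step from 0 to 1, yet the group mean rises gradually.

```python
# Hypothetical all-or-none learners: each scores 0 until the trial on which
# the item is learned, then 1 on every later trial.
learned_on_trial = [2, 4, 6, 8]  # assumed trial of learning, one per participant
n_trials = 10

individual = [
    [1 if trial >= t else 0 for trial in range(1, n_trials + 1)]
    for t in learned_on_trial
]

# Group curve: mean score across participants on each trial.
group_mean = [sum(col) / len(col) for col in zip(*individual)]

for scores in individual:
    print(scores)
print(group_mean)
```

No participant ever scores 0.25 or 0.5, so the smooth group curve describes none of them, which is the reason to inspect both levels.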

  8. Graphing Your Data • A graph represents data in two dimensional space • Two dimensions represented by two lines intersecting at a right angle • Horizontal axis is called the abscissa or x-axis • Levels of independent variable represented on x-axis • Vertical axis is called the ordinate or y-axis • Value of the dependent variable represented on y-axis • A pair of values (one from x-axis and one from y-axis) defines a single point on the graph

  9. Types of Graphs • Bar Graph • Presents data as bars extending from the axis representing the independent variable • Length of each bar determined by value of the dependent variable • Width of each bar has no meaning • Precision of estimate shown in error bars (whiskers extending above and below top of each bar) • Can be used to represent data from single-factor and two-factor designs • Best if independent variable is categorical

  10. Example Bar Graph (error bars labeled)

  11. Line Graph • Data represented by a series of points connected by a line • Most appropriate for quantitative independent variables • Used to display functional relationships • Line graphs can show different shapes • Positively accelerated: Curve starts flat and becomes progressively steeper as it moves along x-axis • Negatively accelerated: Curve is steep at first and then “levels off” as it moves along x-axis • Once the curve levels off it is said to be asymptotic

  12. Example Line Graph (error bars labeled)

  13. A line graph can vary in complexity • A monotonic function increases or decreases uniformly, never reversing direction • A nonmonotonic function has reversals in direction

  14. Examples of Positively (Top) and Negatively (Bottom) Accelerated Functions

  15. Example of a Nonmonotonic Function
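The monotonic/nonmonotonic distinction can be checked numerically. The helper `is_monotonic` is a hypothetical name, and treating flat stretches (ties) as monotonic is an assumption made here so that a curve that levels off still counts.

```python
def is_monotonic(values):
    """Return True if the sequence never reverses direction.

    Ties between neighbouring points are allowed (an assumption), so a
    curve that rises and then stays flat is still treated as monotonic.
    """
    pairs = list(zip(values, values[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing


positively_accelerated = [1, 2, 4, 8, 16]     # flat at first, then steeper
negatively_accelerated = [5, 9, 10, 10.5, 10.5]  # steep at first, then levels off
inverted_u = [2, 6, 9, 6, 3]                  # rises, then reverses

print(is_monotonic(positively_accelerated))
print(is_monotonic(negatively_accelerated))
print(is_monotonic(inverted_u))
```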

  16. Scatterplot • Used to represent data from two dependent variables • The value of one dependent variable is represented on the x-axis and the value of the other on the y-axis • May include a “best fitting” straight line indicating the general trend in the data • Correlation coefficient and equation for line usually shown in this case

  17. Example Scatterplot

  18. Pie Graph • Used to represent proportions or percentages • Two types • Standard pie graph • Exploded pie graph

  19. Standard Pie Graph (Left) and Exploded Pie Graph (Right)

  20. Importance of Graphing Data • Graphing data has two advantages over tabular presentation • Shows relationship between independent and dependent variables more clearly • Graph shows subtleties that may not be evident in a table • Helps you evaluate data to decide on appropriate statistical test • Shapes of graphs can be inspected to see if data meet assumptions of statistical test

  21. The Frequency Distribution • Represents a set of mutually exclusive categories into which actual values are classified • Can take the form of a table or a graph • Graphically, a frequency distribution is shown on a histogram • A bar graph on which the bars touch • The y-axis represents a frequency count of the number of observations falling into a category • Categories represented on the x-axis

  22. Example of a Histogram
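A frequency distribution in table form can be sketched with a counter. The scores and the class-interval width of 5 are assumptions chosen for illustration; each score falls into exactly one interval, so the categories are mutually exclusive.

```python
from collections import Counter

scores = [52, 67, 71, 58, 63, 75, 69, 61, 66, 73, 64, 68]  # hypothetical scores
width = 5  # assumed class-interval width

# Map each score to the lower limit of its interval, then count per interval.
counts = Counter((s // width) * width for s in scores)

# Print a text histogram: one row per interval, one star per observation.
for lower in range(min(counts), max(counts) + width, width):
    n = counts.get(lower, 0)
    print(f"{lower}-{lower + width - 1}: {'*' * n} ({n})")
```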

  23. Shapes of Histograms • You should examine your frequency distribution to determine its shape • Normal distribution: Most scores centered around the mean • Positive skew: Most scores at the lower end of the measurement scale, with a tail extending toward the higher end • Negative skew: Most scores at the higher end of the measurement scale, with a tail extending toward the lower end • Bimodal distribution: Two modes

  24. Histogram Showing a Positive Skew

  25. Histogram Showing a Negative Skew

  26. A Bimodal Distribution
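Visual inspection can be supplemented with a number. The sketch below uses the moment-based skewness coefficient, which is one common index and not something the slides prescribe; the data sets are hypothetical. A positive value indicates a tail toward high scores, a negative value a tail toward low scores.

```python
def skewness(values):
    """Moment-based skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


positive = [1, 2, 2, 3, 3, 3, 4, 10]   # hypothetical: long tail toward high scores
negative = [11 - v for v in positive]  # mirror image: long tail toward low scores
symmetric = [1, 2, 3, 4, 5]

print(skewness(positive), skewness(negative), skewness(symmetric))
```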

  27. The Stemplot • Alternative to histogram • Displays data so that original numbers can be retrieved • Creating a stemplot • Break each number into a stem (leading digit or digits) and a leaf (final digit) • Arrange stems in a column and draw a line to the right of the stems • Place each leaf number for each stem to the right of the line

  28. Example Stemplot
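The three construction steps can be sketched as follows. The function name `stemplot` and the scores are hypothetical, and the sketch assumes non-negative whole numbers, with the last digit as the leaf.

```python
from collections import defaultdict


def stemplot(values):
    """Return the lines of a stemplot for non-negative integers.

    The stem is everything except the last digit; the leaf is the last digit.
    Stems with no scores are still listed so gaps in the data stay visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {row}")
    return lines


scores = [23, 27, 31, 35, 35, 38, 42, 44, 59]  # hypothetical scores
for line in stemplot(scores):
    print(line)
```

Reading a row back (stem 3 with leaves 1, 5, 5, 8 gives 31, 35, 35, 38) shows how the original numbers are retrieved.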

  29. Measures of Center: Characteristics and Applications • Mode • Most frequent score in a distribution • Simplest measure of center • Scores other than the most frequent not considered • Limited application and value • Median • Central score in an ordered distribution • More information taken into account than with the mode • Relatively insensitive to outliers • Used primarily when the mean cannot be used

  30. Mean • Average of all scores in a distribution • Value dependent on each score in a distribution • This makes the mean sensitive to outliers (extreme scores) • Most widely used and informative measure of center

  31. Measures of Center: Applications • Mode • Used if data are measured along a nominal scale • Median • Used if data are measured along an ordinal scale (scores must be rankable, so the median does not apply to nominal data) • Used if interval data do not meet requirements for using the mean

  32. Mean • Used if data are measured along an interval or ratio scale • Most sensitive measure of center • Used if scores are normally distributed • Mean does not represent center if distribution is skewed

  33. Distributions Where Mean Represents Center (Left) and Does Not (Right two distributions)
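A quick sketch of how the three measures of center respond to one extreme score, using Python's standard `statistics` module. The scores are hypothetical.

```python
import statistics

scores = [2, 3, 3, 4, 5, 6, 7]   # hypothetical scores
with_outlier = scores + [50]     # same scores plus one extreme value

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(
        label,
        "mode:", statistics.mode(data),
        "median:", statistics.median(data),
        "mean:", round(statistics.mean(data), 2),
    )
```

The mode stays put and the median barely moves, while the mean more than doubles, which is why the mean is a poor index of center for a skewed distribution.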

  34. Measures of Spread: Characteristics • Range • Subtract the lowest from the highest score in a distribution of scores • Simplest and least informative measure of spread • Scores between extremes are not taken into account • Very sensitive to extreme scores • Semi-Interquartile Range • Less sensitive than the range to extreme scores • Used when you want a simple, rough estimate of spread

  35. Variance • Average squared deviation of scores from the mean • Not expressed in same units as original numbers • Standard Deviation • Square root of the variance • Expressed in the same units as original numbers • Most widely used measure of spread

  36. Measures of Spread: Applications • The range and standard deviation are sensitive to extreme scores • When extreme scores are present, the semi-interquartile range is best • When your distribution of scores is skewed, the standard deviation does not provide a good index of spread • With a skewed distribution, use the semi-interquartile range
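The four measures of spread can be computed side by side. This sketch makes two assumptions: the variance divides by N, following the "average squared deviation" wording (a sample estimate would divide by N - 1), and quartiles use the "inclusive" interpolation method of `statistics.quantiles`, one of several conventions. The scores are hypothetical.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

score_range = max(scores) - min(scores)

# Quartiles via linear interpolation; other conventions give slightly
# different values for small data sets.
q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
semi_interquartile_range = (q3 - q1) / 2

variance = statistics.pvariance(scores)  # average squared deviation (divide by N)
std_dev = statistics.pstdev(scores)      # square root of the variance

print(score_range, semi_interquartile_range, variance, std_dev)
```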

  37. The Five Number Summary and Box Plots • Five Number Summary • Convenient way to represent a distribution with a few numbers • Statistics included • Minimum score • The first quartile • The median (second quartile) • Third quartile • Maximum score

  38. Example of a Five Number Summary
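A minimal sketch of computing the five number summary. The helper name `five_number_summary` and the scores are hypothetical, and the quartiles again assume the "inclusive" method of `statistics.quantiles`; textbooks and software packages do not all define quartiles the same way.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


scores = [1, 3, 4, 5, 6, 7, 8, 9, 11]  # hypothetical scores
print(five_number_summary(scores))
```

These five values are exactly what a boxplot draws: the box spans the quartiles, the line inside it marks the median, and the whiskers reach the extremes.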

  39. Boxplot • Graphic representation of the five number summary • First and third quartile define the ends of the box • A line in the box represents the median • Vertical “whiskers” extending above and below the box represent the maximum and minimum scores (respectively) • Data from multiple treatments are represented by side-by-side boxplots

  40. Example of a Boxplot (Left) and a Side-By-Side Boxplot (Right)

  41. Measures of Association • Used when you want to determine the direction and degree of relationship between variables • Various measures of association available for different applications

  42. The Pearson Product–Moment Correlation (r) • Most widely used measure of association • Value of r can range from +1 through 0 to –1 • Magnitude of r tells you the degree of LINEAR relationship between variables • Sign of r tells you the direction (positive or negative) of the relationship between variables

  43. Scatterplots Showing a Positive (a), Negative (b) and No Correlation (c)

  44. The presence of outliers can affect both the sign and the magnitude of r. Variability of scores within a distribution affects the value of r. The Pearson r is used when scores are normally distributed.
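The points about linearity and outliers can be demonstrated with a from-scratch computation of r. The function name `pearson_r` and all data values are hypothetical; the formula is the standard sum of cross-products divided by the square root of the product of the sums of squares.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# A perfect linear relationship gives r = +1.
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# A perfect but curved (U-shaped) relationship gives r = 0,
# because r indexes only the LINEAR component.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# One extreme point can reverse the sign of r.
r_without_outlier = pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
r_with_outlier = pearson_r([1, 2, 3, 4, 5, 20], [5, 4, 3, 2, 1, 20])

print(r_linear, r_curved, r_without_outlier, r_with_outlier)
```

This is one reason to inspect the scatterplot before interpreting r.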

  45. Measures of Association • Pearson Product-Moment Correlation • Index of linear relationship between two continuously measured variables • Point-Biserial Correlation • Index of correlation between two variables, one of which is dichotomous (two categories on a nominal scale) and the other measured on at least an interval scale • Spearman Rank-Order Correlation (rho) • Index of correlation between two variables measured along an ordinal scale

  46. Phi Coefficient • Index of correlation between two dichotomous variables measured along a nominal scale
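As an illustration of how these alternatives relate to r, the Spearman rho can be computed as the Pearson correlation of the ranked scores. The helper names `ranks`, `pearson_r`, and `spearman_rho` are hypothetical, as are the data; tied scores are given their average rank, a common convention.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # increases with x, but not in a straight line

print(pearson_r(x, y), spearman_rho(x, y))
```

Because rho uses only rank order, it reaches 1 for any consistently increasing relationship, while r falls short of 1 when the relationship is not linear.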

  47. Linear Regression and Prediction • Used to find the straight line that best fits the data plotted on a scatterplot • The best fitting straight line is known as the least squares regression line • The regression line is defined mathematically as Ŷ = bX + a, where b is the slope (regression weight) and a is the y-intercept

  48. The regression weight (b) is based on raw scores and is difficult to interpret • The standardized regression weight (beta weight) is based on standard scores and is easier to interpret • You can predict a value of Y from a value of X once the regression equation has been calculated • The difference between a predicted and an observed value of Y is a residual (error of prediction); the standard error of estimate summarizes the typical size of these residuals
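A minimal sketch of fitting and using a least squares line, assuming the usual form Ŷ = bX + a. The function name `least_squares` and the data are hypothetical. The standard error of estimate is computed here with N - 2 in the denominator, a common convention that some texts replace with N.

```python
import math


def least_squares(x, y):
    """Return (a, b): intercept and slope of the least squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # regression weight (slope), in raw-score units
    a = mean_y - b * mean_x  # y-intercept
    return a, b


x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores

a, b = least_squares(x, y)
predicted = [a + b * xi for xi in x]

# Residuals: observed minus predicted Y for each case.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Standard error of estimate: typical size of the residuals (N - 2 convention).
see = math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))

# Beta weight: the slope after rescaling both variables to standard scores.
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / len(x))
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / len(y))
beta = b * sd_x / sd_y

print(a, b, see, beta)
```

With a single predictor, the beta weight equals the Pearson r between X and Y, which is part of why it is easier to interpret than b.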

  49. Example Correlation Matrix
