
Univariate Descriptive Statistics


dena


Presentation Transcript


  1. Univariate Descriptive Statistics Chapter 2

  2. Lecture Overview • Tabular and Graphical Techniques • Distributions • Measures of Central Tendency • Measures of Dispersion

  3. Tabular and Graphical Techniques • Frequency Tables • Ungrouped • Grouped • Histograms • Cumulative Frequency Histogram

  4. Frequency Tables

  5. Histograms • Note: sometimes percent is on the Y axis rather than frequency

  6. Cumulative Frequency Histograms
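
A minimal sketch of the tabular techniques on slides 4 to 6, using only the Python standard library. The `scores` data and the bin width of 10 are hypothetical values chosen for illustration; they do not come from the lecture.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data for illustration only (not from the lecture).
scores = [52, 67, 67, 71, 74, 74, 74, 80, 88, 95]

# Ungrouped frequency table: one row per distinct value.
ungrouped = sorted(Counter(scores).items())

# Grouped frequency table: equal-width, non-overlapping bins [50, 60), [60, 70), ...
bin_width = 10
lower_bounds = range(50, 100, bin_width)
grouped = [(lo, lo + bin_width, sum(lo <= s < lo + bin_width for s in scores))
           for lo in lower_bounds]

# Cumulative frequencies: running total of the grouped counts.
cumulative = list(accumulate(count for _, _, count in grouped))

for (lo, hi, count), cum in zip(grouped, cumulative):
    pct = 100 * count / len(scores)  # percent can replace frequency on the Y axis
    print(f"[{lo}, {hi}): freq={count} pct={pct:.0f}% cum={cum}")
```

The last cumulative value always equals the number of observations, which is a quick check that no observation fell outside the bins.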

  7. Key Concepts • Choosing Intervals (i.e., choosing your “bins”) • Rules from the textbook (pages 38 – 39) • Commonly Used Examples from GIS • Equal Interval • Quantiles (e.g., quartiles and quintiles) • Natural Breaks • Standard Deviation

  8. Rules For Bin Sizes • Note: This is very relevant for GIS • Rule 1: Use intervals with simple bounds • Rule 2: Respect natural breakpoints • Rule 3: Intervals should not overlap • Rule 4: Intervals should be the same width • Rule 5: Select an appropriate number of classes

  9. The Effect of Classification • Equal Interval • Splits data into a user-specified number of classes of equal width • Each class may contain a different number of observations
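
A minimal sketch of equal-interval classification, assuming the classes span the data minimum to the data maximum. The function names and the `data` values are hypothetical, chosen only to show how unevenly observations can fall into equal-width classes.

```python
def equal_interval_breaks(values, k):
    """Return k + 1 class boundaries of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]

def class_counts(values, breaks):
    """Count observations per class; the last class includes its upper bound."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if breaks[i] <= v < breaks[i + 1] or (is_last and v == breaks[-1]):
                counts[i] += 1
                break
    return counts

# Hypothetical, positively skewed data (not from the lecture).
data = [1, 2, 2, 3, 3, 4, 5, 8, 13, 21]
breaks = equal_interval_breaks(data, 4)
counts = class_counts(data, breaks)
print(breaks)  # four classes, each 5 units wide
print(counts)  # most observations land in the first class
```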

  10. The Effect of Classification • Quantiles • Data are divided so that there is an equal number of observations in each class • Some classes can have quite narrow intervals
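
A rough sketch of quantile classification that splits the sorted data by rank. The function name and data are hypothetical, and tied values can straddle a class boundary in this simplified version; GIS packages may handle ties and boundaries differently.

```python
def quantile_classes(values, k):
    """Split sorted values into k classes with (nearly) equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    # Class i holds the observations ranked i*n//k up to (i+1)*n//k - 1.
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

# Hypothetical data with 12 observations (not from the lecture).
data = [1, 2, 2, 3, 3, 4, 5, 8, 13, 21, 34, 55]
classes = quantile_classes(data, 4)  # quartiles

for c in classes:
    print(f"{c[0]} to {c[-1]}: n={len(c)}, width={c[-1] - c[0]}")
```

Every class holds three observations, but the class widths differ widely, which is the trade-off the slide describes.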

  11. The Effect of Classification • Natural Breaks • Splits data into classes based on natural breaks represented in the data histogram
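
The slide does not say how natural breaks are found. One common formulation (an assumption here, in the spirit of the Jenks method used in GIS software) picks the class boundaries that minimize the total within-class sum of squared deviations. The brute-force sketch below tries every possible split, so it is only practical for small datasets; the function names and data are hypothetical.

```python
from itertools import combinations

def within_class_ss(group):
    """Sum of squared deviations of a class from its own mean."""
    m = sum(group) / len(group)
    return sum((x - m) ** 2 for x in group)

def natural_breaks_bruteforce(values, k):
    """Try every way to cut the sorted data into k classes; keep the tightest."""
    ordered = sorted(values)
    n = len(ordered)
    best_cost, best_classes = float("inf"), None
    for cuts in combinations(range(1, n), k - 1):
        edges = (0, *cuts, n)
        classes = [ordered[a:b] for a, b in zip(edges, edges[1:])]
        cost = sum(within_class_ss(c) for c in classes)
        if cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes

# Hypothetical data with three visible clusters (not from the lecture).
data = [1, 2, 3, 10, 11, 12, 30, 31]
classes = natural_breaks_bruteforce(data, 3)
print(classes)
```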

  12. The Effect of Classification • Standard Deviation • Class breaks are placed at the mean plus or minus one or more standard deviations
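
A minimal sketch of standard-deviation class breaks, assuming the sample standard deviation and whole-number multiples of it. Both choices are assumptions; the slide does not specify them, and the function name and data are hypothetical.

```python
import statistics

def std_dev_breaks(values, n_sd=2):
    """Return breaks at mean - n_sd*s, ..., mean, ..., mean + n_sd*s."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [mean + j * sd for j in range(-n_sd, n_sd + 1)]

# Hypothetical data (not from the lecture).
data = [2, 4, 4, 4, 5, 5, 7, 9]
breaks = std_dev_breaks(data, n_sd=1)
print(breaks)  # [mean - s, mean, mean + s]
```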

  13. Key Concepts • Making sense of your histograms using distributions • Rectangular • Unimodal • Bimodal • Multimodal • Skew (positive and negative)

  14. Bimodal Distribution

  15. Multimodal Distribution

  16. Skew • An asymmetrical distribution
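
One way to put a number on skew is the moment coefficient of skewness. This measure is an assumption for illustration; the slides do not say which measure the course uses. A positive value indicates a longer right tail, a negative value a longer left tail. The data are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (values must not all be equal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical datasets (not from the lecture).
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
left_tailed = [-x for x in right_tailed]
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # positive skew
print(skewness(left_tailed) < 0)   # negative skew
```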

  17. Measures of Central Tendency • Measures of central tendency • Measures of the location of the middle or the center of a distribution • Mean, median, mode, midrange

  18. Definitions • Midrange • Mode • Median • Quantiles • Mean
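
The measures listed above can be sketched with Python's standard `statistics` module. The data are hypothetical, and quantile conventions vary between textbooks and software, so the quartile values may differ from a hand calculation.

```python
import statistics

# Hypothetical data (not from the lecture).
data = [8, 4, 2, 6, 10, 4]

midrange = (min(data) + max(data)) / 2       # halfway between the extremes
mode = statistics.mode(data)                 # most frequent value
median = statistics.median(data)             # middle of the sorted values
quartiles = statistics.quantiles(data, n=4)  # three cut points for four classes
mean = statistics.mean(data)                 # sum divided by count

print(midrange, mode, median, quartiles, mean)
```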

  19. Definitions • Sample Mean • Population Mean

  20. Description of Mean • Mean – Most commonly used measure of central tendency • Average of all observations • The sum of all the scores divided by the number of scores • Note: This assumes that each observation is equally weighted

  21. Symbols • n : the number of observations • N : the number of elements in the whole population • Σ : this (capital sigma) is the symbol for sum • i : the index of summation; its initial value is the starting point of a series of numbers • X : one element in our dataset, usually has a subscript (e.g., i, min, max) • X̄ : the sample mean • μ : the population mean

  22. Summation Notation: Components • In $\sum_{i=1}^{n} X_i$: • $n$ (upper limit) refers to where the sum of terms ends • $X_i$ indicates what we are summing up • $\Sigma$ indicates we are taking a sum • $i = 1$ (lower limit) refers to where the sum of terms begins

  23. Mathematical Notation of Mean • The mathematical notation used most often in this course is the summation notation • The Greek letter capital sigma ($\Sigma$) is used as a shorthand way of indicating that a sum is to be taken • The expression $\sum_{i=1}^{n} X_i$ is equivalent to: $X_1 + X_2 + \dots + X_n$
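
Summation notation maps directly onto a loop. The sketch below uses the Example I data from slide 26; the variable names are illustrative. Note that the notation counts from 1 while Python lists count from 0.

```python
X = [8, 4, 2, 6, 10]  # Example I data from slide 26
n = len(X)            # upper limit of the sum

total = 0
for i in range(1, n + 1):  # i runs from the lower limit 1 to the upper limit n
    total += X[i - 1]      # X_i is stored at list position i - 1

print(total)     # same as X_1 + X_2 + ... + X_n
print(sum(X))    # built-in shorthand for the same sum
```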

  24. Summation Notation: Simplification • A summation will often be written leaving out the upper and/or lower limits of the summation, assuming that all of the terms available are to be summed

  25. Equation for Mean • Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ • Population mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$

  26. Example Mean Calculations • Example I • Data: 8, 4, 2, 6, 10 • Example II • Sample: 10 trees randomly selected from Battle Park • Diameter (inches): 9.8, 10.2, 10.1, 14.5, 17.5, 13.9, 20.0, 15.5, 7.8, 24.5
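
The two examples above can be checked with a few lines of Python. The helper name `mean` is illustrative; the data are those listed on the slide.

```python
def mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

example_1 = [8, 4, 2, 6, 10]
tree_diameters_in = [9.8, 10.2, 10.1, 14.5, 17.5, 13.9, 20.0, 15.5, 7.8, 24.5]

print(mean(example_1))                    # 30 / 5
print(round(mean(tree_diameters_in), 2))  # 143.8 / 10, in inches
```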

  27. Example Mean Calculations • Example III • Data: Monthly mean temperature (°F) at Chapel Hill, NC (2001) • Result: Annual mean temperature (°F)

  28. Examples IV & V • Chapel Hill, NC (1972-2001) • Mean annual precipitation (mm): Mean 1198.10 (mm) • Mean annual temperature (°F): Mean 58.51 (°F)

  29. Explanation of Mean • Advantage • Sensitive to any change in the value of any observation • Disadvantage • Very sensitive to outliers • Example: Mean = 6.19 m without observation #8; Mean = 8.10 m with observation #8
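
The slide's underlying observations are not included in this transcript, so the sketch below uses hypothetical values to show the same effect: one extreme observation shifts the mean far more than it shifts the median.

```python
import statistics

# Hypothetical measurements in metres (not the data behind the slide).
values_m = [5.0, 6.0, 6.0, 7.0, 7.0]
outlier_m = 40.0
with_outlier = values_m + [outlier_m]

mean_without = statistics.mean(values_m)
mean_with = statistics.mean(with_outlier)
median_without = statistics.median(values_m)
median_with = statistics.median(with_outlier)

print(f"mean:   {mean_without:.2f} -> {mean_with:.2f}")
print(f"median: {median_without:.2f} -> {median_with:.2f}")
```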

  30. Measures of Dispersion • Used to describe the data dispersion/spread/variation/deviation numerically • Usually used in conjunction with measures of central tendency

  31. Measures of variation • [Figure: two histograms of number of observations against score, one labeled low variation and one labeled high variation] • Groups have equal means and equal n, but one varies more than the other

  32. Definitions • Range • Mean Deviation • Variance • Standard Deviation • Coefficient of Variation • Pearson’s
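
A sketch of the first five measures, applied to the Example I data from slide 26. It assumes the sample (n - 1) forms of variance and standard deviation and expresses the coefficient of variation as a percentage; both are assumptions, since the transcript does not give the definitions. The truncated "Pearson's" entry is left out.

```python
import statistics

data = [8, 4, 2, 6, 10]  # Example I data from slide 26

data_range = max(data) - min(data)
mean = statistics.mean(data)
mean_deviation = sum(abs(x - mean) for x in data) / len(data)
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)      # square root of the sample variance
coeff_variation = 100 * std_dev / mean  # percent of the mean

print(data_range, mean_deviation, variance)
print(round(std_dev, 3), round(coeff_variation, 1))
```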

  33. Symbols • s² : the sample variance • σ² : the population variance • s : the sample standard deviation • σ : the population standard deviation

  34. Sample Variance and Standard Deviation • Variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ • Standard Deviation: $s = \sqrt{s^2}$ • Note: as with the mean, there are both sample and population standard deviations and variances
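
The difference between the sample and population forms comes down to the denominator. The sketch below assumes the usual convention of n - 1 for a sample and N for a population, and uses the Example I data from slide 26; the function names are illustrative.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

data = [8, 4, 2, 6, 10]  # Example I data from slide 26

s2 = sample_variance(data)
sigma2 = population_variance(data)
s = math.sqrt(s2)          # sample standard deviation
sigma = math.sqrt(sigma2)  # population standard deviation

print(s2, sigma2)
print(round(s, 3), round(sigma, 3))
```

The sample variance is always the larger of the two for the same data, because it divides the same sum by a smaller number.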

  35. Next Class • Read chapter 3 • Work on the homework • Come with questions • Bring your laptop
