1 / 41

Lecture 1

Lecture 1. STAT 211 – 019 Dan Piett West Virginia University. Overview. 1.1 Science of Statistics 1.2 Displaying Small Sets of Numbers 1.3 Graphing Categorical Data BREAK - Problems 1.4 Frequency Histograms 1.5 Density Histograms 1.6 Centers of Data 1.7 Mean vs Median. Section 1.1.

tbaxter
Télécharger la présentation

Lecture 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 1 STAT 211 – 019 Dan Piett West Virginia University

  2. Overview • 1.1 Science of Statistics • 1.2 Displaying Small Sets of Numbers • 1.3 Graphing Categorical Data • BREAK - Problems • 1.4 Frequency Histograms • 1.5 Density Histograms • 1.6 Centers of Data • 1.7 Mean vs Median

  3. Section 1.1 Science of Statistics

  4. Definitions • Statistics • The science of collecting, organizing, and analyzing data for the purpose of estimation and making inferences. • Data • Values which arise from observing characteristics on a selected group (sample) of individuals. The characteristic(s) which are observed are called variables. • Population • The entire group of “things” from which data may be collected. • Sample • The selected group of the population that is studied.

  5. Population vs. Sample • Population • All undergraduate students at WVU • Sample • A group of 100 undergraduate students at WVU All WVU Students (population) Group of 100 Students (sample)

  6. Types of Data • Variables • The characteristics which we look at when observing data. These can be classified into four groups. • Numeric • Discrete • Continuous • Categorical • Ranked (Ordinal) • Unranked (Nominal)

  7. Numeric Variables • Numeric Variables • Variables whose values represent quantities. This can be further broken down into discrete and continuous. • Discrete Numeric • Variables which usually arise by counting • Examples • Number of cars in a parking lot • Number of courses taken in a semester • Continuous Numeric • Variables which arise by measuring • Examples • Time spent running a marathon • Weight

  8. Categorical Variables • Categorical Variables • Variables that are not numeric. These can be further classified into ranked and unranked. • Ranked Categorical Variables • If the possible values of a categorical value follow a “natural” ordering • Example • Olympic Medal • Unranked Categorical Variables • Anything that is not ranked • Example • Blood Type

  9. Two Major Branches of Statistics Descriptive Statistics Inferential Statistics Use graphical displays and numeric summarizations to represent data. First half of this semester Use analytic methods and theory of probability to draw conclusions or make decisions. Second half of this semester

  10. Section 1.2 Displaying Small Sets of Numbers

  11. Displaying Small Sets of Data • We will be looking at three different ways to display small sets of data. • Dot Plots • Stem and Leaf Display • Histograms

  12. Dot Plot • Each Dot represents one data value • Example – Height of Students 62, 71, 65, 68, 64, 72, 66, 68, 70, 67, 67, 68, 64, 65, 68

  13. Stem and Leaf Display • Represents the data using the actual digits that make up the data. • Leading digit becomes the stem • Trailing digit becomes the leaf

  14. Stem and Leaf Examples • Example – Exam Grades 76, 74, 82, 96, 66, 76, 93, 86, 84, 62, 82, 75, 58, 71, 73, 79, 65, 80 Example - Decimals 1.3, 2.4, 1.7, 3.2, 5.6

  15. Outliers • Outlier • An observation whose value is unusual or extreme. • Example 15, 21, 13, 18, 23, 19, 26, 16, 71, 22, 14, 21 71 is an outlier

  16. Section 1.3 Graphing Categorical Data

  17. Grouped Frequency Table • Frequencies are tabulated for each value of the categorical variable. • Example – STAT 101 Grade Distribution (2000)

  18. Bar Graph • Represents categorical data by showing the amount of data that belong to each category as proportionally sized rectangles. • Example: US Coal Production (in millions) WV – 172.0 PA – 189.2 KY – 154.8 WY – 233.6 Other – 120.4

  19. Pie Chart (Circle Graph) • A pie chart shows the amount of data that belongs to each category as a proportional part of a circle. • Example: US Coal Production (in millions) WV – 172.0 PA – 189.2 KY – 154.8 WY – 233.6 Other – 120.4

  20. Section 1.4 Frequency Histograms

  21. Grouped Frequency Distributions • Grouped Frequency Distributions • A list (or table) which pairs ranges for values of a variable with their frequencies (counts). Each range is called a class. • Rules for constructing a grouped frequency distribution • Each class should be the same width. • Classes should not overlap. • Each observation falls into one and only one class. • Use between 3 and 15 classes.

  22. Frequency Histogram • Frequency Histogram • A graphical representation of a frequency distribution • Example – Math3 Exam Scores

  23. Shape of a Distribution • Symmetric Shaped • Mound Shaped • U-Shaped • Uniform

  24. Shape of a Distribution • Asymmetric Shaped • Skewed Right (Positive) • Skewed Left (Negative)

  25. Break

  26. Section 1.5 Density Histograms

  27. Density Histogram • Relative Frequency Histogram • A histogram in which the vertical axis represents percentages or proportions, rather than counts. • Example – Math3 Exam Scores

  28. Constructing Density Histograms • For each class, compute the percent of observations in each class. • Divide the percent associated with each class by the width of that class. This yields the percent of observations associated with each unit of the measurement scale. • Draw the histogram using the values computed in step 2.

  29. More on Density Histograms • If the bars of a histogram are arranged from largest to smallest, it is called a Pareto Chart. • A density histogram is a histogram whose vertical axis is scaled so that the sum of the areas of it’s rectangles is 1 square unit. • If the sample size, n, is very large, a density histogram can be used to estimate the distribution of the population from which the data was obtained.

  30. Section 1.6 Centers of Data

  31. Averages • Average • An average is a single value which represents all of the data. • Types of Averages • Sample Mean • Sample Median • Sample Mode

  32. Sample Mean (x) • Sample Mean • Formula • Example x: 14, 23, 8, 19, 41 Sample Mean = (14 + 23 + 8 + 19 + 41)/5 = 105/5 = 21

  33. More on Sample Means • If we acquire data from all members of a population, we can compute the population mean ( ) • Because of this, we can use the sample mean to estimate the population mean

  34. Sample Median ( ) • Sample Median • The middle value of the observed data values ranked from lowest to highest • If the sample contains n observations, the median is the ½(n+1) ranked value • Example: x: 14, 23, 8, 19, 41 Ranked: 8, 14, 19, 23, 41 n = 5 Position of the median: ½(5+1) observation = 3rd Median = 19

  35. Sample Median Even Example • In the previous example, n was odd • Example with an even n: x: 43, 26, 37, 19, 52, 80 Ranked: 19, 26, 37, 43, 52, 80 n=6 Position of the Median: ½ (6+1) observation =3½ position Median is the mean of the 3rd and 4th ranked observation Median = (37+43)/2 = 40

  36. Sample Mode • Sample Mode • The observed value which occurs with the greatest frequency • Example: x: 4, 3, 1, 4, 0, 3, 3, 1, 4, 0, 1, 1, 2, 2 Mode = 1

  37. More Mode Examples • A set of x might have no mode (every value occurs once) • Mode = None • NOT Mode = 0 • A set of x might have more than one mode • Mode = 5 and 6

  38. Section 1.7 Mean vs. Median

  39. Best Measure of the Center Mean Median Use for approximately symmetric distributions Greatly influenced by outliers Use for significantly skewed distributions Not greatly influenced by outliers

  40. How to Handle Outliers • If the outlier is a “mistaken” data value, eliminate it from the analysis • If the outlier is an actual data value, do not eliminate it

  41. End of Class

More Related