
Chapter 1


ellard




Presentation Transcript


  1. Chapter 1 Data Collection

  2. Section 1.1 Introduction to the Practice of Statistics

  3. Statistics • The science of statistics is • Collecting • Organizing • Summarizing • Analyzing information to draw conclusions or answer questions • Statistics provides a measure of confidence in any conclusion

  4. Data • Solve 3x + 5 = 11 • Everyone gets (or should get) the same answer • How long was your drive (or walk) to class today? • Different answers… this is why we need statistics! • We can then break the data down into meaningful information

  5. Statistics and mathematics have similarities but are different • Mathematics • Solves problems with 100% certainty • Has only one correct answer • Statistics, because of variability • Does not solve problems with 100% certainty (95% certainty is much more common) • Frequently has multiple reasonable answers

  6. Population vs. Sample • A population • Is the group to be studied • Includes all of the individuals in the group • Population values are usually denoted by Greek letters (for example, μ for the population mean) • A sample • Is a subset of the population • Is often used in analyses because getting access to the entire population is impractical

  7. Population vs. Sample • Population Example • People 18 years and older • Sample Example • Students at SHU 18 and older

  8. Parameter vs. Statistic • A statistic is a numerical summary of a sample • Descriptive statistics organize and summarize the data in ways such as tables and graphs • Inferential statistics extend the sample results to the population and measure the reliability of those conclusions • A parameter is a numerical summary of a population
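
The parameter/statistic distinction can be simulated in a few lines of Python. This is a minimal sketch under an assumed, hypothetical population of 1,000 students, 350 of whom live on campus; none of these numbers come from the slides. The parameter is computed once from everyone. The statistic is computed from a random sample of 100, and it changes from sample to sample.

```python
import random

# Hypothetical population: 1,000 students, 350 of whom live on campus.
population = [1] * 350 + [0] * 650     # 1 = lives on campus, 0 = does not

# Parameter: a numerical summary of the whole population (a fixed number).
parameter = sum(population) / len(population)

# Statistic: the same summary computed from a random sample of 100 students.
rng = random.Random(2024)              # fixed seed so the run is repeatable
sample = rng.sample(population, 100)
statistic = sum(sample) / len(sample)

# Different samples give different statistics; the parameter never changes.
other_statistics = [sum(rng.sample(population, 100)) / 100 for _ in range(5)]

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.2f}")
print("statistics from five more samples:", other_statistics)
```

Inferential statistics ask how far a statistic like this is likely to be from the unknown parameter.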

  9. Example • Suppose the actual percentage of all students at SHU that own a car is 48.2% • This is a ________________________ • We surveyed 100 students and found 46% own a car • This is a _________________________

  10. The Process of Statistics • Identify the research objective: what do you want answered? • Collect the data needed to answer the question: usually a sample (1.2 – 1.6) • Describe the data: the descriptive statistics (ch. 2 – 4) • Perform inference: use appropriate techniques to assess how reliably the results extend to the population (ch. 9 – 12)

  11. Variables • Characteristics of the individuals under study are called variables • Some variables have values that are attributes or characteristics … those are called qualitative or categorical variables • Some variables have values that are numeric measurements … those are called quantitative variables

  12. Qualitative Variables • Examples of qualitative variables • Gender • Zip code • Blood type • States in the United States • Brands of televisions • Qualitative variables have category values … those values cannot be added, subtracted, etc.

  13. Quantitative Variables • Examples of quantitative variables • Temperature • Height and weight • Sales of a product • Number of children in a family • Points achieved playing a video game • Quantitative variables have numeric values … those values can be added, subtracted, etc.

  14. Discrete Vs. Continuous • Quantitative variables can be either discrete or continuous • Discrete variables • Variables that have a finite or a countable number of possibilities • Frequently variables that are counts • Continuous variables • Variables that have an infinite but not countable number of possibilities • Frequently variables that are measurements

  15. Discrete Variables • Examples of discrete variables • The number of heads obtained in 5 coin flips • The number of cars arriving at a McDonald’s between 12:00 and 1:00 • The number of students in class • The number of points scored in a football game • The possible values of discrete variables can be listed

  16. Continuous Variables • Examples of continuous variables • The distance that a particular model car can drive on a full tank of gas • Heights of college students • Sometimes the variable is discrete but has so many close values that it could be considered continuous • The number of DVDs rented per year at video stores • The number of ants in an ant colony

  17. Section 1.2 Observational Studies Versus Designed Experiments

  18. Observational Study • A survey sample is an example of an observational study • An observational study is one in which there is no attempt to influence the value of the variable • An observational study is also called an ex post facto (after the fact) study • Advantages • It can detect associations between variables • Disadvantages • It cannot isolate causes, so it cannot establish causation

  19. Designed Experiment • A designed experiment is an experiment • That applies a treatment to individuals • Often compares the treated group to a control (untreated) group • Where the variables can be controlled • Advantages • Can analyze individual factors • Disadvantages • Cannot be done when the variables cannot be controlled • Cannot be used when applying the treatment would be unethical

  20. Lurking & Confounding Variables • Confounding and lurking variables are a danger in observational studies • Confounding occurs when the effects of two or more explanatory variables are not separated, so any relation between an explanatory variable and the response may be due to another variable not accounted for in the study • A lurking variable is an explanatory variable that was not considered in the study but that affects the response variable • Associated does not mean that one causes the other • A simple observational study may find that smoking and cancer are associated • From that study alone, we cannot conclude that smoking causes cancer • Nor can we conclude that cancer causes people to smoke • What are some lurking variables with smoking and cancer?
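
A short simulation shows how a lurking variable can create an association between two variables that have no effect on each other. This is a hypothetical sketch, not a model of smoking and cancer. An unrecorded variable z drives both x and y, and x and y still come out clearly correlated. With this setup the theoretical correlation is 0.5.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)      # fixed seed so the run is repeatable
size = 5000

# Hypothetical lurking variable z, never recorded by the "study".
z = [rng.gauss(0, 1) for _ in range(size)]

# x and y each depend on z plus independent noise; neither affects the other.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r = correlation(x, y)
print(f"correlation between x and y: {r:.2f}")
```

An observational study that recorded only x and y would find a real association, but changing x would do nothing to y.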

  21. Types of Observational Studies • Cross-sectional • Case-control • Cohort

  22. Cross-sectional Studies • Observational studies that collect information about individuals at a specific point in time, or over a very short period of time • Case-control Studies • These studies are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records • In case-control studies, individuals who have certain characteristics are matched with those who do not • Cohort Studies • A cohort study first identifies a group of individuals to participate in the study (the cohort) • The cohort is then observed over a period of time, during which characteristics of the individuals are recorded • Because the data are collected over time, cohort studies are prospective

  23. Census • A census is a list • Of all the individuals in a population • That records the characteristics of the individuals • Example: the US Census, conducted every 10 years • Advantages • Every individual is measured, so there is no sampling error • Disadvantages • May be difficult or impossible to obtain • Costs may be prohibitive

  24. Section 1.3 Simple Random Sampling

  25. Simple Random Sample • A simple random sample of size n from a population of size N is obtained when every possible sample of size n has an equally likely chance of occurring

  26. Simple Random Sample

  27. Let’s Try It! • 5 Volunteers… • A simple (but not foolproof) method • Write each individual’s name on a separate piece of paper • Put all the papers into a hat • Draw 2 random papers from the hat • Physical methods have some issues • Are the papers sufficiently mixed? • Are some of the papers folded? • What else???
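
Python's `random.sample` draws without replacement, so it can serve as a software version of the hat. This sketch uses five hypothetical volunteers labeled A to E and draws 2 of them many times. If the method is a simple random sample, each of the 10 possible pairs should come up about one tenth of the time.

```python
import random
from collections import Counter
from itertools import combinations

volunteers = ["A", "B", "C", "D", "E"]   # hypothetical labels for 5 volunteers
n = 2

rng = random.Random(7)                   # fixed seed so the run is repeatable
trials = 50_000
# Sort each drawn pair so ("B", "A") and ("A", "B") count as the same sample.
counts = Counter(tuple(sorted(rng.sample(volunteers, n))) for _ in range(trials))

all_samples = list(combinations(volunteers, n))   # the 10 possible samples
for s in all_samples:
    print(s, round(counts[s] / trials, 3))        # each close to 0.1
```

Software avoids the physical problems listed above, such as poorly mixed or folded papers.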

  28. Random Numbers • A method using a table of random numbers • (Table 1 in the back of the book) • List and number the individuals • Decide on a way to pick the random numbers (how to choose the starting point and what rule to use to select digits after that) • Select the random numbers • Match the numbers to the individuals • With the technology available today, this method is rarely needed
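
The table method can be written as a rule for reading digits. This sketch assumes one possible rule: start at the beginning of a row and read consecutive two-digit groups. Values outside 01 to 30 and repeats are skipped. The row of digits is hypothetical and is not copied from the textbook's Table 1.

```python
def sample_from_digit_table(digits, population_size, n):
    """Select n distinct labels from 1..population_size by reading a string of
    random digits left to right, skipping out-of-range values and repeats."""
    digits = digits.replace(" ", "")           # tables print digits in groups
    width = len(str(population_size))          # e.g. 2 digits for labels 01-30
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen   # may be shorter than n if the digits run out

# Hypothetical row of digits (NOT copied from the textbook's Table 1).
row = "84120 73029 19053 01566"
print(sample_from_digit_table(row, 30, 5))   # [12, 7, 30, 29, 19]
```

Here 84 is skipped because no individual has that number. The next five valid labels become the sample.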

  29. Calculator • randInt(start #, end #, how many) • Leave the 3rd entry blank for 1 value • Table 3, page 25: • Randomly survey 5 of their 30 clients • Number them 1 – 30 • randInt(1,30,5) • randInt can repeat a value; if it does, generate another number • Survey the clients corresponding to the generated values
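
This sketch imitates the calculator steps in Python. The helper `rand_int_no_repeat` is hypothetical and is not a calculator or library function. It draws integers from 1 to 30 one at a time, as `randInt` does, and discards any repeat so that no client is chosen twice.

```python
import random

def rand_int_no_repeat(start, end, how_many, seed=None):
    """Hypothetical helper: draw integers in [start, end] one at a time, like
    randInt(start, end), discarding repeats until how_many distinct values remain."""
    if how_many > end - start + 1:
        raise ValueError("cannot choose more distinct values than the range holds")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < how_many:
        value = rng.randint(start, end)     # inclusive on both ends, like randInt
        if value not in chosen:             # a repeat would survey one client twice
            chosen.append(value)
    return chosen

clients_to_survey = rand_int_no_repeat(1, 30, 5, seed=11)
print("survey clients:", sorted(clients_to_survey))
```

In practice, `random.sample(range(1, 31), 5)` gives the same kind of sample in one call.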

  30. Section 1.4 Other Effective Sampling Methods

  31. Collecting Data • There are other effective ways to collect data • Stratified sampling • Systematic sampling • Cluster sampling • Each of these is particularly appropriate in certain specific circumstances

  32. Stratified Sample • A stratified sample is obtained when we choose a simple random sample from each subgroup of a population • This is appropriate when the population is made up of nonoverlapping (distinct) groups called strata • Within each stratum, the individuals are likely to share a common attribute • Between strata, the individuals are likely to have different attributes

  33. Stratified Sample

  34. Stratified Sample • Example – polling a population about a political issue • It is reasonable to divide up the population into Democrats, Republicans, and Independents • It is reasonable to believe that the opinions of individuals within each party are similar • It is reasonable to believe that the opinions differ from group to group • Therefore it makes sense to consider each stratum separately • This method helps ensure that all subgroups are represented, so our data are more reliable

  35. Stratified Sample • Example – a poll about safety within a university • Three identified strata • Resident students • Commuter students • Faculty and staff • It is reasonable to assume that the opinions within each group are similar • It is reasonable to assume that the opinions differ from group to group

  36. Stratified Sample • Assume that the sizes of the strata are • Resident students – 5,000 • Commuter students – 4,000 • Faculty and staff – 1,000 • If we wish to obtain a sample of size n = 100 that reflects the same relative proportions, we would want to choose • 50 resident students • 40 commuter students • 10 faculty and staff • Finally, conduct a simple random sample within each subgroup to obtain data.
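
The allocation arithmetic can be checked directly. Each stratum gets n × (stratum size / N) of the sample, so 100 × 5,000/10,000 = 50, 100 × 4,000/10,000 = 40, and 100 × 1,000/10,000 = 10. The Python sketch below uses the slide's stratum sizes. The ID numbers within each stratum are hypothetical. After the allocation, it takes a simple random sample inside each stratum.

```python
import random

# Strata sizes from the slide; the ID numbers inside each stratum are hypothetical.
strata_sizes = {
    "Resident students": 5000,
    "Commuter students": 4000,
    "Faculty and staff": 1000,
}
n = 100
N = sum(strata_sizes.values())         # 10,000 people in total

# Proportional allocation: each stratum gets n * (stratum size / N) people.
# (With other sizes, rounding can make the counts sum to slightly more or less than n.)
allocation = {name: round(n * size / N) for name, size in strata_sizes.items()}

# Simple random sample within each stratum.
rng = random.Random(3)                 # fixed seed so the run is repeatable
stratified_sample = {
    name: rng.sample(range(1, size + 1), allocation[name])
    for name, size in strata_sizes.items()
}

for name, ids in stratified_sample.items():
    print(f"{name}: {len(ids)} selected, e.g. IDs {sorted(ids)[:3]}")
```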

  37. Systematic Sample • A systematic sample is obtained when we choose every kth individual in a population • The first individual selected corresponds to a random number between 1 and k • Systematic sampling is appropriate • When we do not have a frame (a list of all the individuals in a population)

  38. Systematic Sampling

  39. Systematic Sampling • Example – polling customers about satisfaction with service • We do not have a list of customers arriving that day • We do not even know how many customers will arrive that day • Simple random sampling (and stratified sampling) cannot be implemented

  40. Systematic Sampling • Assume that • We want to choose a sample of 40 customers • We believe that there will be about 350 customers • Values of k • k = 7 is reasonable because it is likely that enough customers will arrive to reach the 40 target • k = 2 is not reasonable because we will only interview the very early customers • k = 20 is not reasonable because it is unlikely that enough customers will arrive to reach the 40 target
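
The reasoning about k can be made concrete. In the worst case the random start is k, and the 40th customer selected is then customer 40 × k. A value of k works only if 40 × k stays at or below the roughly 350 expected arrivals. The Python sketch below is an illustration under that assumption. It also generates one systematic sample with a random start between 1 and k.

```python
import random

def systematic_positions(k, n, seed=None):
    """Arrival positions (1-based) for a systematic sample of size n:
    a random start between 1 and k, then every kth customer after that."""
    rng = random.Random(seed)
    start = rng.randint(1, k)          # inclusive on both ends
    return [start + i * k for i in range(n)]

def last_position_worst_case(k, n):
    """If the random start is k (the latest possible), the nth pick is customer n * k."""
    return n * k

expected_customers = 350               # the slide's estimate, not a guarantee
n = 40
for k in (2, 7, 20):
    last = last_position_worst_case(k, n)
    print(f"k = {k:2d}: sample ends by customer {last}; "
          f"fits within {expected_customers}? {last <= expected_customers}")

positions = systematic_positions(7, n, seed=5)
print(positions[:5], "...", positions[-1])
```

With k = 7 the sample ends by customer 280, which leaves some margin if fewer customers arrive than expected. With k = 20 it would need 800 customers. With k = 2 the sample fits, but it ends by customer 80, so it covers only the early part of the day.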

  41. Cluster Sample • A cluster sample is obtained when we choose a random set of groups and then select all individuals within those groups • We can obtain a sample of size 50 by choosing 10 groups of 5 • Cluster sampling is appropriate when it is very time consuming or expensive to choose the individuals one at a time

  42. Cluster Sample

  43. Cluster Sample • Example – testing the fill of bottles • It is time consuming to pull individual bottles • It is expensive to waste an entire carton of 12 bottles just to test one bottle • If we would like to test 240 bottles, we could • Randomly select 20 cartons • Test all 12 bottles within each carton • This reduces the time and expense required
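
The bottle example can be sketched in Python. The production lot of 500 cartons is an assumed, hypothetical size. The cartons are the clusters: the code chooses 20 at random and then keeps every bottle in each chosen carton, for 20 × 12 = 240 bottles.

```python
import random

# Hypothetical production lot: 500 cartons, each holding 12 bottles.
cartons = {c: [f"carton{c}-bottle{b}" for b in range(1, 13)] for c in range(1, 501)}

rng = random.Random(9)                                  # fixed seed for repeatability
chosen_cartons = rng.sample(list(cartons), 20)          # random set of clusters
bottles_to_test = [bottle for c in chosen_cartons for bottle in cartons[c]]  # all bottles in each

print(len(bottles_to_test))  # 240
```

Only 20 cartons are opened, instead of pulling 240 bottles from up to 240 different cartons.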

  44. Convenience Sample • A convenience sample is obtained when we choose individuals in an easy, or convenient, way • Self-selecting samples are examples of convenience sampling • Individuals who respond to television or radio announcements • “Just asking around” is an example of convenience sampling • Individuals who are known to the pollster

  45. Convenience Sample • Convenience sampling has little statistical validity • The design is poor • The results are suspect • However, there are times when convenience sampling could be useful as a rough guess

  46. Multistage Sample • A multistage sample is obtained using a combination of • Simple random sampling • Stratified sampling • Systematic sampling • Cluster sampling • Many large-scale samples (such as the surveys the US Census Bureau conducts in noncensus years) use multistage sampling

  47. Section 1.5 Errors in Sampling

  48. Bias • If the results of the sample are not representative of the population, then the sample has bias. • Three Sources of Bias • Sampling Bias • Nonresponse Bias • Response Bias

  49. Sampling Bias • The technique used to obtain individuals tends to favor one part of the population over another • Occurs often in convenience sampling • Often results in undercoverage: the proportion of a subgroup in the sample is lower than its proportion in the population

  50. Nonresponse Bias • Occurs when the “nonresponders” to a survey have different opinions from those who respond • Frequent with surveys • Controlled using callbacks or incentives
