
Introduction to Statistics

Cambodian Mekong University. MB102. Introduction to Statistics. Chapter 5 Sampling. Learning Objectives. Distinguish between a population and a sample Understand the importance of sampling to public and private enterprise Recognise and avoid bias in sampling

shandi

Presentation Transcript


  1. Cambodian Mekong University MB102 Introduction to Statistics Chapter 5 Sampling

  2. Learning Objectives • Distinguish between a population and a sample • Understand the importance of sampling to public and private enterprise • Recognise and avoid bias in sampling • Apply sampling techniques to commercial problems • Understand common methods of sampling (including random, systematic, quota, stratified, cluster, multistage and importance) • Measure and round numbers • Design questionnaires and surveys • Select an appropriate sample size

  3. 1. Introduction • Definitions • A population is the complete set that we seek to obtain information about • A sample is some part of the population (i.e. a ‘subset’ of the population) that is actually available as a source of information • It is not always practical to analyze the entire data relating to a particular problem. This may be due to the following: 1. It may not be physically possible to collect the entire data 2. It may be too expensive to collect the entire data 3. The data may be collected and summarized more quickly by using a sample rather than the complete set 4. The situation may change with time, so that sampling needs to be confined to a short time interval

  4. 2. Bias • A sample should be unbiased • E.g. there should be no tendency for certain individuals to have a larger or smaller chance of being selected in the sample • It should be genuinely representative of the population • If there is any bias the result will be valueless • Bias may be generated even if the sample selected is sufficiently representative • E.g. in interviewing situations, where the wording of the questions and behaviour or appearance of the interviewer can combine to distort the outcome of sample surveys

  5. 2. Bias • There are several simple rules that can be followed to help eliminate bias in sampling. These include: • Do not use only people who volunteer to be in the sample • Do not choose a sample using a method that omits segments of the population • Do not use people in the sample only because they are ‘handy’ or readily available • Ensure that the person selecting the sample does not have a vested interest in the results of the sample • Even following a set of rules, in practice it is very difficult to eliminate bias completely—quite often, the best we can hope for is to limit it as much as possible

  6. 3. Sampling error • Even if a sampling process were completely free of bias, there would still be fluctuations due to naturally occurring random variation • It is necessary to assess how much variation can be expected to occur from one sample to another • This random discrepancy between a measurement from a sample (called a sample statistic) and the population quantity being estimated (the population parameter) is called sampling error
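
The idea of sampling error on this slide can be illustrated with a small simulation. This is only a sketch: the population (the integers 1 to 1000), the sample size and the helper name `sample_means` are hypothetical choices made for illustration, not part of the course material.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw several unbiased random samples and return the mean of each."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: the values 1..1000.
population = list(range(1, 1001))
population_mean = statistics.mean(population)  # the population parameter

# Each sample mean is a sample statistic; its gap from the
# population mean is the sampling error for that sample.
means = sample_means(population, sample_size=50, n_samples=5)
errors = [m - population_mean for m in means]

for m, e in zip(means, errors):
    print(f"sample mean = {m:.2f}, sampling error = {e:+.2f}")
```

Even though every sample is drawn without bias, the printed sample means differ from one another and from the population mean, which is the random variation the slide describes.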

  7. 4. Selection of a random sample • Choosing a random sample from a population means that each member of the population has the same chance of being selected • This is intended to ensure that the sample is unbiased, although of course there will still be sampling error • When selecting any sample, consider the following: • What precisely are the members from which the sample is to be chosen (i.e. the population)? • What should be the size of the sample? • How should the members be selected for inclusion in the sample?

  8. 4. Selection of a random sample • The lottery method • Each member of the population is assigned a number • These numbers are transferred to ‘marbles’, which are well mixed; then a sample is chosen randomly from them • The marble technique is no longer used (since 1980 the winning numbers have been computer-generated) • Random numbers • A process similar to the lottery method is selecting the identifying numbers from a table of random numbers as published in books or generated by a computer • In their simplest form, these tables consist of a series of the digits 0 to 9 that have (as far as can be ascertained) an equal chance of occurring • When selecting random numbers from a random number table, it is desirable to adopt a specific pattern
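
A computer-generated version of the lottery method might look like the sketch below. The member labels, the sample size of 10 and the function name `simple_random_sample` are assumptions for illustration only.

```python
import random

def simple_random_sample(members, k, seed=None):
    """Select k distinct members, each with the same chance of selection."""
    rng = random.Random(seed)
    # random.Random.sample draws without replacement,
    # like pulling numbered marbles from a well-mixed container.
    return rng.sample(members, k)

# Hypothetical population: 200 members, each assigned an identifying number.
population = [f"member-{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(population, k=10, seed=42)
print(sample)
```

Passing a seed makes the selection repeatable, which can help when a sample has to be audited; leaving it as `None` gives a fresh draw each time.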

  9. 5. Selection of other types of samples • Systematic sampling • In systematic sampling, the members of the population are put in the form of an array • That is, they are arranged in some natural order (usually ascending or descending) • Then every nth item is selected for the sample • The value of n should be chosen so that it will yield a sufficiently large sample • The greatest problem seems to arise when the listing of the population is incomplete • E.g. taking every tenth number out of the telephone directory may give a reasonable selection of the people listed there, but it discriminates very forcefully against people who do not have a telephone and those who have ‘silent’ numbers
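
Systematic sampling, as described on this slide, can be sketched as follows. The random starting position is an added assumption (the slide only says that every nth item is selected), and the function name and the example population are hypothetical.

```python
import random

def systematic_sample(ordered_members, n, seed=None):
    """Take every n-th member of an ordered list.

    Assumption: the starting position is chosen at random from the
    first n members, so every member has the same chance of selection.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    start = random.Random(seed).randrange(n)
    return ordered_members[start::n]

# Hypothetical ordered population of 100 items; n = 10 yields 10 of them.
population = list(range(1, 101))
sample = systematic_sample(population, n=10, seed=1)
print(sample)
```

As the slide warns, the result can only be as good as the listing itself: anyone missing from `ordered_members` has no chance of being selected.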

  10. 5. Selection of other types of samples • Quota sampling • Uses selection rules that include certain types of people and at the same time controls the mix of people who can be included in the sample • One difficulty in using a random sample is that, once the sample is selected, interviewers must commit themselves to contacting the selected persons • A number of factors can affect the cost of conducting a survey. These include: • the length of the interview • the number of interviewers used • telephone charges • the travel expenses of interviewers • the number of call-backs required, and so on

  11. 5. Selection of other types of samples • Stratified sampling • The stratified sampling method divides the population into strata (subsets), then takes a sample from each stratum (of equal size, or in proportion to the size of each stratum) • The selection of subjects from each stratum is done randomly • There are advantages in using a stratified sample; it may cost more to obtain but the precision of results is usually greater • Stratified sampling provides valuable information about the characteristics of the strata themselves, as well as overall characteristics of the population
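
The sketch below shows one way the equal-size allocation mentioned on this slide could be carried out. The customer records, the `region` field used to define strata and the function name `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, per_stratum, seed=None):
    """Randomly select the same number of members from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)  # divide into strata
    # Random selection within each stratum.
    return {
        name: rng.sample(group, per_stratum)
        for name, group in strata.items()
    }

# Hypothetical population: 30 customers spread over three regions.
regions = ["north", "south", "east"]
customers = [{"id": i, "region": regions[i % 3]} for i in range(30)]

sample = stratified_sample(
    customers, stratum_of=lambda c: c["region"], per_stratum=4, seed=7
)
for region, chosen in sample.items():
    print(region, [c["id"] for c in chosen])
```

Because the result is grouped by stratum, it can be summarised per stratum as well as for the population as a whole.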

  12. 5. Selection of other types of samples • Cluster sampling • In cluster sampling the population is also divided into groups (clusters), and then certain clusters are selected randomly and the sample is chosen from only those clusters • Cluster sampling is also used when the population in question is concentrated in defined areas • Multistage sampling • This is an extension of cluster sampling • The population is first broken down into a set of distinct groups, from which a number of groups are selected randomly • The groups selected are again broken down according to another characteristic, and a random selection is again made • This process is repeated until all desired stages have been considered
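
A two-stage version of the procedure on this slide might be sketched as below: groups are selected at random first, then members are selected at random within the chosen groups only. The village names, group sizes and the function name `two_stage_sample` are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, per_cluster, seed=None):
    """Stage 1: pick groups at random. Stage 2: sample within those groups."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1
    return {
        name: rng.sample(clusters[name], per_cluster)  # stage 2
        for name in chosen
    }

# Hypothetical population: five villages of 20 households each.
clusters = {
    f"village-{letter}": [f"{letter}{i:02d}" for i in range(20)]
    for letter in "ABCDE"
}
sample = two_stage_sample(clusters, n_clusters=2, per_cluster=3, seed=3)
print(sample)
```

Further stages would repeat the same pattern, breaking each selected group down again before the next random selection.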

  13. 5. Selection of other types of samples • Importance sampling • This is where the probability of selecting a particular individual (or larger subset) is dependent on its importance • For example, in election predictions, there is often little purpose in sampling electorates where the result is a ‘foregone conclusion’ • If the ‘swinging’ electorates can be identified, the efforts of the pollsters can be concentrated on effective sampling from these

  14. 6. Measurement and rounding • Approximation to the nearest number above or below • In this method of approximating data, the digits 1, 2, 3 and 4 are rounded down, while the digits 6, 7, 8 and 9 are rounded up. • A borderline case that ends in a 5 presents a problem. One method of resolution is to round such cases to the nearest even value • E.g. 38.5 would become 38 and 43.5 would become 44. • Because borderline cases are then rounded down about as often as up, this convention tends to avoid the larger errors that always rounding 5 up would create in the totals of rounded numbers.
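
The round-to-nearest-even rule for borderline cases can be tried out with Python's standard `decimal` module, as in this sketch. The helper name `round_half_even` is hypothetical; values are passed as strings so that they are represented exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value):
    """Round to the nearest whole number, sending a trailing 5 to the even side."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))

# The two borderline cases from the slide.
print(round_half_even("38.5"))
print(round_half_even("43.5"))

# Non-borderline cases round to the nearest number as usual.
print(round_half_even("38.4"))
print(round_half_even("38.6"))
```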

  15. 6. Measurement and rounding • Types of error • The error made in recording or approximating continuous data is the difference between the computed value and the true value • There are several methods of expressing these errors • The absolute error is simply the difference between the approximated value and the actual value • The relative error is the ratio of the absolute error to the actual value • The relative error percentage is the absolute error expressed as a percentage of the actual value
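
The three error measures on this slide can be written out as a short sketch. Taking the absolute value of the difference is an assumption (the slide says only "the difference"), and the function names and the example values are hypothetical.

```python
def absolute_error(approx, actual):
    """Size of the difference between the approximated and actual values."""
    return abs(approx - actual)  # assumption: the sign is ignored

def relative_error(approx, actual):
    """Absolute error as a ratio of the actual value."""
    return absolute_error(approx, actual) / abs(actual)

def relative_error_percentage(approx, actual):
    """Absolute error as a percentage of the actual value."""
    return 100 * relative_error(approx, actual)

# Hypothetical measurement: an actual value of 40 recorded as 38.
approx, actual = 38, 40
print(absolute_error(approx, actual))
print(relative_error(approx, actual))
print(relative_error_percentage(approx, actual))
```

The relative measures are undefined when the actual value is zero, so this sketch would raise `ZeroDivisionError` in that case.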

  16. 7. Questionnaire and survey design • Many surveys take the form of a questionnaire in which the respondent must answer a series of questions • There are two basic types of questions: • Multiple-choice questions • Open-ended questions • Rules for designing questions • There are a number of basic rules when framing survey questions, including: • Order of questions • Direct questions • Unbiased questions • Unambiguity and clarity of questions • Presampling

  17. 8. Modern practices • There are three main types of surveys, namely • mail surveys • telephone interviews • in-person interviews • Conducting a credible survey entails scores of activities, each of which must be carefully planned and controlled, e.g. • Always pretest field procedures • Ensure that there is sufficient follow-up on non-respondents • Maintain a high standard of field work and adequate quality controls

  18. 9. Sample size selection • The issue is to select a sample of a certain size that meets the requirements of accuracy • The most commonly selected confidence level is 95% • Suppose we want the point estimate $\hat{p}$ (obtained from our sample) to be within a specified distance d (expressed as a decimal) of the true (unknown) population proportion p. The value d is sometimes called the margin of error of the estimate. • The minimum sample sizes required can be found as follows (using the standard normal values 1.645, 1.96 and 2.576 and the most cautious case p = 0.5): To be 90% confident select n such that: $n \ge \left(\frac{1.645}{2d}\right)^2$ To be 95% confident select n such that: $n \ge \left(\frac{1.96}{2d}\right)^2$ To be 99% confident select n such that: $n \ge \left(\frac{2.576}{2d}\right)^2$

  19. 9. Sample size selection • Example A sociologist wants to estimate the proportion of teenagers who have driven a car when they were above the legal blood-alcohol level. She would like the estimate to be within 5% of the true population proportion with 95% confidence. How large a sample should she select? • Solution Using d = 0.05, the minimum sample size n must satisfy: $n \ge \left(\frac{1.96}{2 \times 0.05}\right)^2 = 19.6^2 = 384.16$ Therefore, rounding up, the minimum sample size is 385
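
The calculation in this example can be checked with a short sketch. It assumes the conservative rule n ≥ (z / 2d)², which takes the most cautious case p = 0.5 and reproduces the slide's answer of about 385; the z-values 1.645, 1.96 and 2.576 are the usual standard normal values for 90%, 95% and 99% confidence. The function and table names are hypothetical.

```python
import math

# Standard normal critical values for the usual confidence levels.
Z_VALUES = {90: 1.645, 95: 1.96, 99: 2.576}

def min_sample_size(d, confidence=95):
    """Smallest n with n >= (z / (2 * d)) ** 2.

    Assumption: the population proportion is unknown, so the most
    cautious case p = 0.5 is used, which gives the 2 * d in the formula.
    """
    z = Z_VALUES[confidence]
    return math.ceil((z / (2 * d)) ** 2)

# The sociologist's problem: within 5% of the true proportion, 95% confidence.
print(min_sample_size(0.05, confidence=95))  # 385

# The same margin of error at the other two confidence levels.
print(min_sample_size(0.05, confidence=90))  # 271
print(min_sample_size(0.05, confidence=99))  # 664
```

Halving the margin of error d roughly quadruples the required sample size, since d appears squared in the denominator.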

  20. Summary • We have distinguished between a population and a sample • We discovered the importance of sampling to public and private enterprise • We recognized and avoided bias in sampling • We applied sampling techniques to commercial problems • We covered common methods of sampling (including random, systematic, quota, stratified, cluster, multistage and importance) • We measured and rounded numbers • We looked at designing questionnaires and surveys • Lastly we selected an appropriate sample size
