Elementary Survey Sampling, 7E, Scheaffer, Mendenhall, Ott and Gerow. STT 350: SURVEY SAMPLING, Dr. Cuixian Chen. Chapter 8: Cluster sampling
REVIEW: Why is systematic sampling a useful alternative? • Easier to perform in the field (possibly less subject to selection errors by fieldworkers, especially if a good frame is not available). • Can provide more information per unit cost than simple random or stratified random sampling.
Comparison of four sampling schemes • Goal: obtain a specified amount of information about a population parameter at minimum cost. • Stratified random sampling is often better suited for this than simple random sampling. • Systematic sampling often gives results at least as accurate as those from simple random sampling, and it is easier to perform. • Cluster sampling can give more information per unit cost than any of the other three designs.
Cluster sampling • A cluster sample is a probability sample in which each sampling unit is a collection, or cluster, of elements. • To summarize, cluster sampling is an effective design for obtaining a specified amount of information at minimum cost under the following conditions: • 1. A good frame listing population elements either is not available or is very costly to obtain, but a frame listing clusters is easily obtained. • 2. The cost of obtaining observations increases as the distance separating the elements increases.
Illustrative example • Suppose we wish to estimate the average income per household in a large city. How should we choose the sample? • One option is a frame listing all households (elements) in the city, but such a frame may be very costly or impossible to obtain, whereas a list of city blocks (clusters) is easy to get. • City block statistics from the Census Bureau are widely used in cluster sampling by market research firms, which may want to estimate the potential market for a product, the potential sales if a new store were to open in the area, or the potential number of clients for a new service, such as an emergency medical facility.
Difference between strata and clusters • There is a key difference between the optimal construction of strata (Chapter 5) and the construction of clusters. • Strata should be as homogeneous (alike) as possible within, but one stratum should differ as much as possible from another with respect to the characteristic being measured. • Clusters, on the other hand, should be as heterogeneous (different) as possible within, and one cluster should look very much like another for the economic advantages of cluster sampling to pay off.
Est of µ with Cluster sampling • The estimated variance V̂(ȳ) is biased and is a good estimator of V(ȳ) only if n is large, say n ≥ 20. • The bias disappears if the cluster sizes m_1, m_2, ..., m_N are equal.
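The slide's formulas did not survive extraction. As a minimal sketch, assuming they are the standard one-stage cluster-sampling ratio estimator, ȳ = Σy_i / Σm_i with V̂(ȳ) = ((N − n)/(N·n·M̄²))·s_r² and s_r² = Σ(y_i − ȳ·m_i)²/(n − 1), the computation looks like this. The function name `estimate_mean_cluster` and the toy numbers are hypothetical and are not the textbook's data.

```python
from math import sqrt


def estimate_mean_cluster(y, m, N, M=None):
    """Ratio estimator of the population mean per element (one-stage cluster sampling).

    y: cluster totals y_i for the n sampled clusters
    m: cluster sizes m_i for the same clusters
    N: number of clusters in the population
    M: number of elements in the population, if known; otherwise the
       average cluster size M-bar = M/N is estimated by m-bar
    """
    n = len(y)
    y_bar = sum(y) / sum(m)  # ratio estimator: sum(y_i) / sum(m_i)
    # s_r^2: spread of the cluster totals around y_bar * m_i
    s_r2 = sum((yi - y_bar * mi) ** 2 for yi, mi in zip(y, m)) / (n - 1)
    M_bar = M / N if M is not None else sum(m) / n
    var_hat = (N - n) / (N * n * M_bar ** 2) * s_r2  # estimated V(y_bar)
    bound = 2 * sqrt(var_hat)  # bound on the error of estimation
    return y_bar, var_hat, bound


# Hypothetical toy data (NOT the textbook data): n = 3 clusters from N = 10.
y_bar, var_hat, bound = estimate_mean_cluster(y=[10, 12, 28], m=[2, 3, 5], N=10)
print(f"y_bar = {y_bar:.3f}, V-hat = {var_hat:.4f}, bound = {bound:.3f}")
# → y_bar = 5.000, V-hat = 0.1890, bound = 0.869
```

With only n = 3 clusters the variance estimate is exactly the kind the slide warns about: biased unless n is large or the m_i are equal.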
Eg 8.1, page 254 • Q: A sociologist wants to estimate the per-capita income in a certain small city. No list of resident adults is available. How should he design the sample survey? • Answer: Cluster sampling, because no list of elements is available. • The city is marked off into rectangular blocks, except for two industrial areas and three parks that contain only a few houses. • Clusters: (a) each city block is one cluster, (b) the two industrial areas together form one cluster, (c) the three parks together form one cluster. • The clusters are numbered on a city map from 1 to 415. The experimenter has enough time and money to sample n = 25 clusters and to interview every household within each sampled cluster. • Results: 25 random numbers are drawn by SRS from 1 to 415.
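The selection step in Eg 8.1 (an SRS of 25 cluster numbers from 1 to 415) can be sketched as follows. The helper name `draw_cluster_sample` and the seed value are hypothetical; the seed only makes the draw reproducible and will not reproduce the textbook's particular sample.

```python
import random


def draw_cluster_sample(N, n, seed=None):
    """Draw a simple random sample, without replacement, of n cluster labels from 1..N."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))


# Eg 8.1 setting: N = 415 numbered clusters on the city map, n = 25 clusters sampled.
sample = draw_cluster_sample(N=415, n=25, seed=350)
print(sample)
```

Every household in each selected cluster is then interviewed, so the sampling unit is the cluster, not the household.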
Recall Eg 8.1: N = 415; n = ? Are we estimating μ, τ, or p? Which sampling scheme? Eg 8.2, page 256. Data is available on the class website: Dataset used in Textbook Examples
Est of τ with Cluster sampling, when M is known • Note that the estimator Mȳ is useful only if M, the number of elements in the population, is known.
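A minimal sketch of the estimator Mȳ, assuming the usual textbook variance V̂(Mȳ) = M²·V̂(ȳ) = N²·((N − n)/(N·n))·s_r², where s_r² is the same residual variance used for estimating μ. The function name and the toy inputs (including M = 35) are hypothetical.

```python
from math import sqrt


def estimate_total_M_known(y, m, N, M):
    """Estimate the population total tau by M * y_bar when M is known."""
    n = len(y)
    y_bar = sum(y) / sum(m)  # ratio estimator of the mean per element
    s_r2 = sum((yi - y_bar * mi) ** 2 for yi, mi in zip(y, m)) / (n - 1)
    tau_hat = M * y_bar
    # V-hat(M * y_bar) = M^2 * V-hat(y_bar) = N^2 * (N - n) / (N * n) * s_r^2
    var_hat = N ** 2 * (N - n) / (N * n) * s_r2
    return tau_hat, var_hat, 2 * sqrt(var_hat)


# Hypothetical toy data: 3 of N = 10 clusters sampled, M = 35 elements in total.
tau_hat, var_hat, bound = estimate_total_M_known(y=[10, 12, 28], m=[2, 3, 5], N=10, M=35)
print(f"tau_hat = {tau_hat:.1f}, V-hat = {var_hat:.1f}, bound = {bound:.2f}")
# → tau_hat = 175.0, V-hat = 210.0, bound = 28.98
```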
Recall Eg 8.1: N = 415; n = ? Are we estimating μ, τ, or p? Which sampling scheme? M is known. Eg 8.3, page 258. Data is available on the class website: Dataset used in Textbook Examples
Ex 8.2, page 279 [background for Ex 8.4]. Data is available on the class website: Dataset in Excel format
Ex 8.4, page 279. Note: n = 20 of the N = 96 clusters are sampled; M is known. Data is available on the class website: Dataset in Excel format
Est of τ with Cluster sampling, when M is unknown • Often M, the number of elements in the population, is unknown; τ is then estimated by Nȳ_t. • Note: ȳ_t = (1/n) Σ y_i is the sample average of the n sampled cluster totals. • It is an unbiased estimator of the population average of the N cluster totals.
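A minimal sketch of Nȳ_t, assuming the usual textbook variance V̂(Nȳ_t) = N²·((N − n)/(N·n))·s_t² with s_t² = Σ(y_i − ȳ_t)²/(n − 1). The function name and toy numbers are hypothetical.

```python
from math import sqrt


def estimate_total_M_unknown(y, N):
    """Estimate the population total tau by N * y_bar_t when M is unknown.

    Only the sampled cluster totals y_i are needed; cluster sizes are not used.
    """
    n = len(y)
    y_bar_t = sum(y) / n  # sample average of the n cluster totals
    s_t2 = sum((yi - y_bar_t) ** 2 for yi in y) / (n - 1)
    tau_hat = N * y_bar_t
    var_hat = N ** 2 * (N - n) / (N * n) * s_t2  # estimated V(N * y_bar_t)
    return tau_hat, var_hat, 2 * sqrt(var_hat)


# Same hypothetical cluster totals as before, N = 10 clusters.
tau_hat, var_hat, bound = estimate_total_M_unknown(y=[10, 12, 28], N=10)
print(f"tau_hat = {tau_hat:.2f}, V-hat = {var_hat:.2f}")
# → tau_hat = 166.67, V-hat = 2271.11
```

Because this estimator ignores the cluster sizes, it typically has a larger variance than Mȳ when the cluster totals are strongly related to the cluster sizes.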
Recall Eg 8.1: N = 415; n = ? Are we estimating μ, τ, or p? Which sampling scheme? Eg 8.4, page 260. Data is available on the class website: Dataset used in Textbook Examples
Ex 8.2, page 279 [background for Ex 8.3]. Data is available on the class website: Dataset in Excel format
Ex 8.3, page 279. Note: n = 20 of the N = 96 clusters are sampled; M is unknown. Data is available on the class website: Dataset in Excel format
Section 8.5: Find sample size for Est of µ • The quantity of information is affected by two factors: the number of clusters and the relative cluster size.
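A minimal sketch of the sample-size calculation for μ, assuming the standard textbook form n = N·σ_r² / (N·D + σ_r²) with D = B²·M̄²/4, where σ_r² is approximated by s_r² from a previous survey or pilot. The function name and inputs are hypothetical.

```python
from math import ceil


def sample_size_mean(N, s_r2, B, M_bar):
    """Number of clusters n needed to estimate mu within bound B.

    s_r2 approximates sigma_r^2 (e.g. from a previous survey);
    M_bar is the average cluster size (M/N, or m-bar if M is unknown).
    """
    D = B ** 2 * M_bar ** 2 / 4
    n = N * s_r2 / (N * D + s_r2)
    return ceil(n)  # round up to a whole number of clusters


# Hypothetical inputs: N = 100 clusters, s_r^2 = 25, B = 1, M_bar = 2.
print(sample_size_mean(N=100, s_r2=25, B=1, M_bar=2))  # → 20
```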
Recall Eg 8.1: N = 415. Are we estimating μ, τ, or p, or finding n? Which sampling scheme? Eg 8.6, page 265. Data is available on the class website: Dataset used in Textbook Examples
Find sample size for Est of τ, when M is known • The quantity of information is affected by two factors: the number of clusters and the relative cluster size.
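For τ estimated by Mȳ, a minimal sketch assuming the same formula n = N·σ_r² / (N·D + σ_r²) but with D = B²/(4N²), as in the standard textbook treatment. The function name and inputs are hypothetical.

```python
from math import ceil


def sample_size_total_M_known(N, s_r2, B):
    """Number of clusters n needed to estimate tau by M * y_bar within bound B."""
    D = B ** 2 / (4 * N ** 2)
    return ceil(N * s_r2 / (N * D + s_r2))  # round up to whole clusters


# Hypothetical inputs: N = 100 clusters, s_r^2 = 25, B = 400.
print(sample_size_total_M_known(N=100, s_r2=25, B=400))  # → 6
```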
Recall Eg 8.1: N = 415. Are we estimating μ, τ, or p, or finding n? Which sampling scheme? Eg 8.7, page 265. Data is available on the class website: Dataset used in Textbook Examples
Find sample size for Est of τ, when M is unknown • The quantity of information is affected by two factors: the number of clusters and the relative cluster size.
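For τ estimated by Nȳ_t, a minimal sketch assuming the standard form n = N·σ_t² / (N·D + σ_t²) with D = B²/(4N²), where σ_t² is approximated by s_t², the variance of the cluster totals from a preliminary sample. The function name and inputs are hypothetical.

```python
from math import ceil


def sample_size_total_M_unknown(N, s_t2, B):
    """Number of clusters n needed to estimate tau by N * y_bar_t within bound B."""
    D = B ** 2 / (4 * N ** 2)
    return ceil(N * s_t2 / (N * D + s_t2))  # round up to whole clusters


# Hypothetical inputs: N = 100 clusters, s_t^2 = 400, B = 400.
print(sample_size_total_M_unknown(N=100, s_t2=400, B=400))  # → 50
```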
Recall Eg 8.1: N = 415. Are we estimating μ, τ, or p, or finding n? Which sampling scheme? Eg 8.8, page 267. Data is available on the class website: Dataset used in Textbook Examples
Ex 8.5, page 279. Note: n’ = 20 of the N = 96 clusters; M is unknown. Data is available on the class website: Dataset in Excel format
Est of p with Cluster sampling • Let a_i denote the total number of elements in cluster i that possess the characteristic of interest. • p is estimated by p̂ = Σ a_i / Σ m_i.
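Estimating p is the ratio estimator with a_i in place of y_i. A minimal sketch, assuming the standard textbook variance V̂(p̂) = ((N − n)/(N·n·M̄²))·Σ(a_i − p̂·m_i)²/(n − 1); the function name and toy counts are hypothetical.

```python
from math import sqrt


def estimate_proportion_cluster(a, m, N, M=None):
    """Estimate the population proportion p from a one-stage cluster sample.

    a: number of elements with the characteristic in each sampled cluster (a_i)
    m: cluster sizes m_i
    N: number of clusters in the population
    M: number of elements in the population, if known (else M-bar is estimated by m-bar)
    """
    n = len(a)
    p_hat = sum(a) / sum(m)
    s_p2 = sum((ai - p_hat * mi) ** 2 for ai, mi in zip(a, m)) / (n - 1)
    M_bar = M / N if M is not None else sum(m) / n
    var_hat = (N - n) / (N * n * M_bar ** 2) * s_p2  # estimated V(p_hat)
    return p_hat, var_hat, 2 * sqrt(var_hat)


# Hypothetical toy counts: 3 of N = 10 clusters sampled.
p_hat, var_hat, bound = estimate_proportion_cluster(a=[1, 2, 2], m=[2, 3, 5], N=10)
print(f"p_hat = {p_hat:.2f}, V-hat = {var_hat:.5f}")  # → p_hat = 0.50, V-hat = 0.00525
```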
Eg 8.9, page 269. Data is available on the class website: Dataset in Excel format
Ex 8.8, page 280. Data is available on the class website: Dataset in Excel format
Est of p with Cluster sampling • Let a_i denote the total number of elements in cluster i that possess the characteristic of interest.
Eg 8.10, page 271. Preliminary study outcomes. Data is available on the class website: Dataset in Excel format
Ex 8.9, page 280. Data is available on the class website: Dataset in Excel format