
By: Pulakesh Maiti Indian Statistical Institute

Evolution of the art of keeping Records and Development of Total Survey Design with application to some projects. By: Pulakesh Maiti Indian Statistical Institute.


Presentation Transcript


  1. Evolution of the art of keeping Records and Development of Total Survey Design with application to some projects By: Pulakesh Maiti Indian Statistical Institute

  2. Summary. While statistics have been collected and used in this subcontinent since antiquity, many changes in their collection and use took place during the British period (1757 – 1947) of Indian history. • Some of the changes were due to imperial needs, but many came about indirectly, as a result of western education and a spirit of scientific curiosity and experimentation. • Interest in rapid social, economic and technical development added a new dimension after India’s independence in 1947. • We trace the evolution of data collection and discuss the acceptance of random sampling, its basic principles and the activities necessary in its domain. • Finally, a Total Survey Design is developed and applied to some projects undertaken in India and abroad.

  3. Two highlights from the pre-British period. The Arthasastra of Kautilya (321 – 296 B.C.), whose title literally means a treatise on economics, gives an account of data collection: “It is the duty of Gopa, village accountant, to attend [to] the accounts [of] five or ten villages, as ordered by the collector general……..Also, having numbered the houses as tax paying or non-tax paying, he shall not only register the total number of inhabitants of all the four castes in each village, but also keep an account of the exact number of cultivators, cowherds, merchants, artisans, labourers, slaves and biped and quadruped animals, fixing at the same time the amount of gold, free labour, toll and fines that can be collected from it (each household)”. (Shamasastry, 1929, p. 158).

  4. “The basic unit for recording information pertaining to agriculturists and the produce was the village. Ascertainment of the extent of the soil in cultivation and weighing several portion of personal observation was made through the superintendent to the survey, the Bitkchi, the Patwaris who were being appointed at the village level”. The passage above relates to the period of Akbar, the Great Moghul Emperor (1556 A.D. to 1605 A.D.).

  5. Mechanisms of data collection also evolved in many other countries, especially in Europe. As early as 1662 A.D., Graunt published his work on social statistics based on data collected in an arbitrary or haphazard manner; such practice was not, however, well organized. Only after the industrial revolution in Europe did the need for enquiries of greater depth and breadth increase, and collection of data by complete enumeration on various social, economic, demographic and biological characteristics came into practice through organized bodies in the 19th century. The practice came to India too, when the British Government started census operations around 1881.

  6. The following is a summary picture of the current statistical system in India, developed through the early and later British periods and the period after independence.

  7. Office of the Registrar General and Census Commissioner, Home Ministry, is responsible for conducting the decennial population census, compiling birth and death statistics and calculating birth, death and other demographic rates; • Department of Commercial Intelligence and Statistics, Ministry of Finance, looks after statistics on foreign trade and business; • Reserve Bank of India, Ministry of Finance, looks after foreign trade, monetary flow, interest rates etc.; • Directorate of Economics and Statistics, Ministry of Food and Agriculture, is responsible for compiling and publishing agricultural statistics such as crop production, crop forecasts, fisheries and livestock on an all-India basis; • Labour Bureau, Ministry of Labour, prepares the consumer price index number;

  8. Office of the Economic Adviser (OEA), Ministry of Industry, compiles the wholesale price index on a weekly basis from price quotations collected by official as well as some non-official agencies in respect of 435 selected items and commodities identified in the basket of the index; • Central Bureau of Health Intelligence (CBHI), State Welfare Bureau, ICMR, Ministry of Health and Family Welfare, record different aspects of public health and family welfare. The system producing health statistics is totally decentralized and, even by Indian standards, still relatively weak on the incidence or prevalence of major diseases at the national level; • The newly created Ministry of Environment and the CSO have been bringing out a handbook on environmental statistics.

  9. The rapid growth of interest in “sampling methods”, and in the conclusions drawn from them, probably started with Kiaer (1895), who introduced the concept of random sampling to replace the usual approach of complete enumeration and emphasized the value of a representative sample. • A representative sample has been likened to a photograph that reproduces the details of the original in their true relative proportions. Bowley, A.L. (1906) discussed the use of a random sample. The works of Bowley, A.L. (1926) and Neyman, J. (1934) may be said to have laid the foundation of modern sampling theory.

  10. India witnessed the advent of large-scale sample surveys under the guidance of the late Professor P.C. Mahalanobis. The National Sample Survey (NSS) was created in 1950 as a multi-faceted fact-finding body. The Department of Statistics (DOS) was set up in the Cabinet Secretariat in April 1961, and around the same time the CSO and NSS were brought under this full-fledged department. In February 1999, the Department of Statistics and the Department of Programme Implementation were merged and named the ‘Department of Statistics and Programme Implementation’ in the Ministry of Planning and Implementation. Finally, in October 1999, the Department of Statistics and Programme Implementation was declared the ‘Ministry of Statistics and Programme Implementation’.

  11. Responsibility for the collection and co-ordination of data fell on the NSSO and CSO. Since then, the NSSO has continued to contribute to the national database, whereas the CSO mainly deals with the conceptualization and standardization of concepts and definitions.

  12. Classification of Available Data Sources. At present, data are obtained mainly through (i) the government organizational set-up; (ii) different line departments of the government; and (iii) academic research institutes/universities. Data from the first two sources may be termed official statistics, whereas data from the third may be termed academic statistics.

  13. Academic statistics are mainly generated by research projects undertaken by research institutions/universities while investigating methodological issues, such as the development of probability/non-probability samples and understanding the nature and extent of errors and their effects on survey results.

  14. It may be noted that survey theoreticians have paid much attention to measuring the extent of sampling error (one part of the total error) and to controlling it through properly adopted sampling designs and the choice of appropriate estimators, but so far survey design has received little importance in the theory and practice of survey sampling.

  15. Schematic Diagram 1

  16. Schematic Diagram 1 (Contd.)

  17. Each tool involves estimation, and the estimators differ in mathematical complexity. At this stage one also needs to examine whether relatively simple descriptive estimators, such as totals, means and ratios, or more complex relationship measures, such as regression or correlation coefficients, are to be used in exploratory analysis, whose primary concern is to make the characteristics of the population under study more understandable. It is also necessary to plan whether some of these tools will be used in confirmatory analysis, where the objective is to test statistical models or assumptions suggested by exploratory work. Finally, one must decide on the activities to be conducted in the face of non-sampling errors and on the statistical tests to be used for measuring total error.
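As an illustration of the difference between the two kinds of tool, the sketch below computes simple descriptive estimators (the mean, the expansion estimate of the total and a ratio) and two relationship measures (a regression slope and a correlation coefficient) from one simple random sample. It assumes SRS with a known population size N; the function name `describe` and the sample values are hypothetical.

```python
"""Descriptive vs relationship estimators from one SRS (hypothetical data)."""
from math import sqrt


def describe(y, x, N):
    """Return descriptive and relationship estimates for paired sample values.

    y, x : paired observations from a simple random sample of size n
    N    : population size (assumed known)
    """
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    # Descriptive estimators
    total_y = N * y_bar            # expansion estimator of the population total of y
    ratio = sum(y) / sum(x)        # ratio estimator of R = Y / X
    # Relationship measures, typically used in exploratory analysis
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    slope = sxy / sxx              # least-squares regression coefficient of y on x
    corr = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient
    return {"mean": y_bar, "total": total_y, "ratio": ratio,
            "slope": slope, "corr": corr}


if __name__ == "__main__":
    # Hypothetical sample: household expenditure (y) and household size (x)
    y = [10.0, 14.0, 9.0, 20.0, 17.0]
    x = [2, 3, 2, 5, 4]
    print(describe(y, x, N=1000))
```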

  18. Schematic Diagram 2

  19. Schematic Diagram 2 (Contd.)

  20. The normal practice so far in survey sampling has been to decide on • a sampling frame; • a sampling design; • a questionnaire design; • sample size; • sample weights; without much consideration of survey design. In the next few slides, we discuss some issues relating to these topics.

  21. A Sampling Frame: In some situations, a number of frames may be available. For example, for a study of the health status of workers, the possible frames could be (i) a list of workplaces, (ii) workers’ homes, reached by home visits, and (iii) workers’ home telephone numbers. For a specific illness, physicians’ offices may also be visited. Thus, candidates for a sampling frame could include lists of areas, telephone numbers, business establishments, physicians or hospitals. The frame chosen will affect the quality of survey results, so when deciding to adopt a particular frame one needs to consider the errors that this choice would introduce.

  22. A Sampling Design: The type of frame chosen will influence both the type of sampling design that can be used and the efficiency of the potential design. When no frame is available, the sampling design adopted will differ from the one used when several frames, which jointly cover the entire population, are available. In practice, a stratified multistage sampling design is normally adopted. When intermediate reference units serve as sampling units, the sampling design would again be different.
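A stratified two-stage design, the simplest case of the stratified multistage designs mentioned above, can be sketched as follows. This is an illustrative sketch only: it assumes SRSWOR at both stages (large-scale surveys often select first-stage units with probability proportional to size instead), and the frame layout and the function `stratified_two_stage` are hypothetical. Each selected unit carries its overall inclusion probability, from which sample weights can later be built.

```python
import random


def stratified_two_stage(frame, n_psu, m_ssu, seed=None):
    """Select a stratified two-stage sample by SRSWOR at both stages.

    frame : {stratum: {psu_id: [ssu_id, ...]}}  (hypothetical frame layout)
    n_psu : {stratum: number of PSUs to draw in that stratum}
    m_ssu : number of SSUs to draw within each selected PSU
    Returns a list of (stratum, psu, ssu, overall inclusion probability).
    """
    rng = random.Random(seed)
    sample = []
    for stratum, psus in frame.items():
        k = n_psu[stratum]
        chosen = rng.sample(sorted(psus), k)    # first stage: SRSWOR of PSUs
        p1 = k / len(psus)                      # first-stage inclusion probability
        for psu in chosen:
            units = psus[psu]
            m = min(m_ssu, len(units))
            p2 = m / len(units)                 # second-stage inclusion probability
            for ssu in rng.sample(units, m):    # second stage: SRSWOR of SSUs
                sample.append((stratum, psu, ssu, p1 * p2))
    return sample


if __name__ == "__main__":
    # Hypothetical frame: villages/blocks (PSUs) containing households (SSUs)
    frame = {
        "rural": {"v1": ["h1", "h2", "h3", "h4"], "v2": ["h5", "h6"],
                  "v3": ["h7", "h8", "h9"]},
        "urban": {"b1": ["u1", "u2", "u3", "u4", "u5"], "b2": ["u6", "u7"]},
    }
    for row in stratified_two_stage(frame, {"rural": 2, "urban": 1}, 2, seed=42):
        print(row)
```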

  23. A Questionnaire Design: After the concepts and definitions to be used have been fixed and the sampling design chosen, a detailed list and description of the survey variables, with their units of measurement, is prepared in consultation with subject specialists, before the variables are presented in the most efficient way as a data-gathering instrument. Sometimes the variables to be measured have to be translated into operational/workable definitions and expressed as a logical series of questions which the interviewer can ask and the interviewee can comprehend and answer. The questions should be designed so that they (i) enable the collection of actual information, (ii) facilitate data collection, collation, processing and tabulation, (iii) ensure economy in data collection, and (iv) permit comprehensive and meaningful analysis and purposeful use of the captured data. Refining the general data requirements of a survey into precise questions is a step-by-step process, just as the development of a complex design is. The questionnaire should have spaces indicating “confidentiality”, the identity of the agency and the hierarchical identity of the respondent.

  24. Sample size: Basic approaches to determining the sample size for single items of enquiry under SRS designs depend on (i) precision measured by the margin of error, (ii) precision measured by the coefficient of variation, (iii) cost alone, and (iv) both precision and cost. • These approaches are applied to commonly used designs such as unstratified sampling, stratified sampling, cluster sampling and multistage sampling. • The statistical tool used for determining the sample size is the probability statement • $\Pr\left(|p - P| \le d\right) = 1 - \alpha$, for a qualitative characteristic, • and • $\Pr\left(|\bar{y} - \bar{Y}| \le d\right) = 1 - \alpha$, for a quantitative characteristic, • where $(p, \bar{y})$ and $(P, \bar{Y})$ are the sample statistics and population parameters respectively; $d$ is the margin of error and $1 - \alpha$ is the confidence coefficient attached to the statement that the sample statistic would be within $\pm d$ of the population parameter.
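Under the usual normal approximation for SRSWR (an assumption; the slide does not state it), the probability statements $\Pr(|p - P| \le d) = 1 - \alpha$ and $\Pr(|\bar{y} - \bar{Y}| \le d) = 1 - \alpha$ can be solved for $n$. Writing $z$ for the standard normal deviate corresponding to the confidence coefficient $1 - \alpha$ (about 2 for 95%), and $\sigma^2$ for the population variance of $y$:

```latex
\Pr\bigl(|\bar{y}-\bar{Y}|\le d\bigr)=1-\alpha
\;\Longrightarrow\; z\,\frac{\sigma}{\sqrt{n}} = d
\;\Longrightarrow\; n = \frac{z^{2}\sigma^{2}}{d^{2}},
\qquad
\Pr\bigl(|p-P|\le d\bigr)=1-\alpha
\;\Longrightarrow\; z\,\sqrt{\frac{P(1-P)}{n}} = d
\;\Longrightarrow\; n = \frac{z^{2}P(1-P)}{d^{2}} \approx \frac{4P(1-P)}{d^{2}}\quad (z \approx 2).
```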

  25. Considering precision of estimate only • Situation I, Single Item: Qualitative Characteristic. Under SRS, using the above probability statement with 95% confidence, $n_I = 4P(1-P)/d^2$. • As a thumb rule under SRSWR, $n_I$ would be taken as $n_I = 1/d^2$ (the value at $P = 0.5$, where $P(1-P)$ is largest). • Under SRSWOR, $n_F$ would be taken as $n_F = n_I \, N/(N + n_I)$.
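The Situation I formulas translate directly into a few lines of Python. This is a minimal sketch, assuming the normal approximation behind the factor 4 (that is, $z \approx 2$ for 95% confidence); the function names are hypothetical.

```python
import math


def n_proportion_srswr(P, d, z=2.0):
    """Initial sample size n_I = z^2 P(1-P) / d^2; z = 2 gives 4P(1-P)/d^2."""
    return z * z * P * (1 - P) / d ** 2


def n_thumb_rule(d):
    """Conservative rule n_I = 1/d^2, i.e. the formula above at P = 0.5."""
    return 1 / d ** 2


def n_srswor(n_I, N):
    """Finite-population adjustment n_F = n_I * N / (N + n_I) for SRSWOR."""
    return n_I * N / (N + n_I)


if __name__ == "__main__":
    n_I = n_proportion_srswr(P=0.3, d=0.05)
    n_F = n_srswor(n_I, N=2000)
    # Sample sizes are rounded up to whole units
    print(math.ceil(n_I), math.ceil(n_F))
```

Note that $n_F < n_I$ always, and the reduction matters only when $n_I$ is a sizeable fraction of $N$.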

  26. Situation II, Single Item

  27. Situation III, Multiple Items

  28. Situation IV: sample size for subdivision

  29. Considering cost aspects only: There is no denying that in most surveys cost is of primary concern. An overall budget is contemplated and various cost components are envisaged, which again depend on the set-up and the survey problem at hand.
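When cost alone drives the decision, the simplest model is a linear cost function $C = c_0 + c_1 n$, with overhead $c_0$ and cost $c_1$ per sampled unit. This is an assumption made here for illustration, not the cost model of the slides that follow. The sketch finds the largest affordable $n$ and, for reference, the margin of error it implies under SRSWR with the normal approximation; all names are hypothetical.

```python
import math


def n_from_budget(C, c0, c1):
    """Largest n affordable under the assumed linear cost function C = c0 + c1 * n."""
    if C <= c0:
        raise ValueError("budget does not cover the overhead cost c0")
    return math.floor((C - c0) / c1)


def achievable_margin(sigma, n, z=2.0):
    """Margin of error z * sigma / sqrt(n) implied by n (SRSWR, normal approximation)."""
    return z * sigma / math.sqrt(n)


if __name__ == "__main__":
    n = n_from_budget(C=100_000, c0=20_000, c1=250)   # hypothetical budget figures
    print(n, achievable_margin(sigma=10, n=n))
```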

  30. Situation V: One-stage sampling

  31. Situation VI

  32. Cluster Sampling

  33. Two-Stage Sampling

  34. Optimality Criteria

  35. Computation of Sample Weights: Under the fixed-population view, two attributes of each respondent are identified. One is the structural identity, indicating which part of the stratum structure (stratum, primary or secondary sampling unit) the person came from; the other is the sampling weight, reflecting the relative likelihood of being selected and responding in the survey. In the absence of non-response, the sampling weight is calculated as the reciprocal of the respondent’s original probability of selection. In the case of non-response, this weight has to be revised by multiplying it by the inverse of the response probability.
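The weighting steps just described can be sketched as follows. Response probabilities are taken as given here; in practice they would have to be estimated (for example, as response rates within weighting classes), which is an assumption beyond what the slide says. The weighted total $\sum_i w_i y_i$ has the Horvitz–Thompson form and is shown as one common choice, not necessarily the estimator the author uses; the function names are hypothetical.

```python
def base_weights(pi):
    """Base (design) weight of each respondent = 1 / selection probability."""
    return [1.0 / p for p in pi]


def nonresponse_adjusted(weights, resp_prob):
    """Multiply each base weight by 1 / (assumed known) response probability."""
    return [w / r for w, r in zip(weights, resp_prob)]


def weighted_total(weights, y):
    """Weighted estimate of the population total: sum of w_i * y_i."""
    return sum(w * v for w, v in zip(weights, y))


if __name__ == "__main__":
    pi = [0.1, 0.1, 0.05, 0.05]     # hypothetical selection probabilities
    resp = [0.8, 0.8, 0.5, 0.5]     # hypothetical response probabilities
    w = nonresponse_adjusted(base_weights(pi), resp)
    print(w, weighted_total(w, [3, 5, 2, 4]))
```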

  36. Type of estimator(s) used for estimating the total of a character y

  37. Survey Design: Survey design is the design for allocating jobs to the investigators and supervisors engaged as members of the survey personnel group. It helps one estimate measurement variance and is essential for separating the sources of variation, so it has to be determined at the planning stage.
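One well-known way of allocating work so that investigator variation can be measured is Mahalanobis’s method of interpenetrating sub-samples: the sample is split at random into sub-samples, one per investigator, so that differences between investigator-wise results reflect investigator effects as well as sampling variation. The slide does not name this technique, so the sketch below is offered only as an illustration. It assumes a balanced allocation and estimates the between-investigator variance component from a one-way analysis of variance; the function names are hypothetical.

```python
import random


def interpenetrate(units, k, seed=None):
    """Randomly split the sample units into k equal sub-samples, one per investigator."""
    units = list(units)
    if len(units) % k:
        raise ValueError("this sketch assumes the number of units is a multiple of k")
    random.Random(seed).shuffle(units)
    m = len(units) // k
    return [units[i * m:(i + 1) * m] for i in range(k)]


def investigator_variance(groups):
    """One-way ANOVA on values recorded by each investigator (balanced case).

    Returns (estimated between-investigator variance component, within mean square),
    using E(MSB) = s2_within + m * s2_between and E(MSW) = s2_within.
    """
    k, m = len(groups), len(groups[0])
    means = [sum(g) / m for g in groups]
    grand = sum(means) / k
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g) / (k * (m - 1))
    return max(0.0, (msb - msw) / m), msw


if __name__ == "__main__":
    print(interpenetrate(range(12), k=3, seed=1))
    print(investigator_variance([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```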

  38. In standard practice, analysis is carried out under the assumptions that (1) there is no problem with the frame, (2) there is no non-response, and (3) there is no measurement error. The only error then arises from sampling, and for that the standard error (s.e.) of the estimator is calculated.
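For reference, the sampling-error-only quantity that standard practice reports, namely the standard error of the sample mean and of the estimated total under SRSWOR, can be computed as below. It is a sketch assuming SRSWOR with a known population size N; it says nothing about frame, non-response or measurement errors, which is exactly the limitation raised on this slide. Function names are hypothetical.

```python
import math


def se_mean_srswor(y, N):
    """Standard error of the sample mean under SRSWOR: sqrt((1 - n/N) * s^2 / n)."""
    n = len(y)
    y_bar = sum(y) / n
    s2 = sum((v - y_bar) ** 2 for v in y) / (n - 1)   # sample variance, divisor n - 1
    return math.sqrt((1 - n / N) * s2 / n)


def se_total_srswor(y, N):
    """Standard error of the estimated total N * y_bar."""
    return N * se_mean_srswor(y, N)


if __name__ == "__main__":
    sample = [2, 4, 6, 8]      # hypothetical sample values
    print(se_mean_srswor(sample, N=40), se_total_srswor(sample, N=40))
```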

  39. WHAT HAPPENS WHEN THE ABOVE ASSUMPTIONS ARE VIOLATED? (i.e., there are frame errors, true values are not reported and data sets are incomplete).

  40. Survey Errors: By type, survey errors can be classified as sampling and non-sampling errors, and within each category, by nature, as variable errors and biases. Variable errors and biases can arise from sampling and/or non-sampling operations. This double dichotomy gives rise to a fourfold classification of errors.

  41. Many potential sources of error can be found in each of these classes, since every operation is a potential source of variable errors and biases. Biases can be considered as a set of constants determined by the essential survey conditions, although their values remain largely unknown. A bias is the difference between the expected value of the sample estimate and the true value, whereas variable errors measure the difference between the estimate and its expected value; variable errors would fluctuate if we were to select different samples with the same design. Most biases cannot be reduced by increasing the sample size, but only by improving the quality of operations. Conversely, the reduction of variable errors depends on increasing the number of units of some kind.
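A small simulation makes the contrast between biases and variable errors concrete. In this sketch, which uses entirely hypothetical population and error parameters, every reported value carries a constant bias plus random response error. Repeating the survey many times shows that the variance of the estimate falls as the sample size grows, while the bias stays roughly where it is.

```python
import random
import statistics


def simulate(n, bias, reps=1000, seed=0):
    """Repeat an SRSWOR survey of n units whose reported values carry a constant
    bias plus random response error; return (empirical bias, empirical variance)
    of the sample mean. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    population = [rng.gauss(50, 10) for _ in range(5_000)]
    true_mean = statistics.fmean(population)
    estimates = []
    for _ in range(reps):
        sample = rng.sample(population, n)
        reported = [v + bias + rng.gauss(0, 5) for v in sample]
        estimates.append(statistics.fmean(reported))
    emp_bias = statistics.fmean(estimates) - true_mean
    emp_var = statistics.pvariance(estimates)
    return emp_bias, emp_var


if __name__ == "__main__":
    for n in (25, 400):
        print(n, simulate(n, bias=2.0))
```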

  42. Variable errors can be measured through internal replication of units within the sample. Measurement requires replication of units, whether sampling units or observations, through a proper survey design that separates the sources of variation. Measurement of biases, by contrast, depends on methods external to the survey proper.
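When a survey is run as k independent replicated sub-samples with the same design, the spread of the replicate estimates gives a direct measure of variable error. The sketch below uses the standard estimator for the variance of the mean of k independent replicates, $v(\bar{t}) = \sum_j (t_j - \bar{t})^2 / \{k(k-1)\}$; it is offered as an illustration, not as the author’s procedure, and the function name is hypothetical.

```python
import math


def replicate_variance(estimates):
    """Combine k independent replicate estimates; return (combined estimate, its s.e.).

    v(t_bar) = sum (t_j - t_bar)^2 / (k (k - 1)), valid when the k replicates are
    independent and drawn with the same design.
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two replicates")
    t_bar = sum(estimates) / k
    v = sum((t - t_bar) ** 2 for t in estimates) / (k * (k - 1))
    return t_bar, math.sqrt(v)


if __name__ == "__main__":
    print(replicate_variance([10, 12, 14, 16]))   # hypothetical replicate estimates
```

Because it uses only the replicate estimates, the same formula applies however complex the estimator inside each replicate is.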

  43. Non-sampling errors are often thought of as being due entirely to mistakes and deficiencies introduced during the planning, execution and processing stages of a survey operation; they are defined as a residual category. Thus, non-sampling errors can arise from (1) deficiencies in problem formulation, leading to wrongly conceived concepts and definitions and an inability to arrive at a workable definition; (2) imperfections in the frame, leading to an inappropriate sampling design and the wrong population being studied, or poor construction and/or inadequacy of the frame; (3) imperfections in the questionnaire design; (4) an inappropriate choice of reference period, or imperfections in the tabulation plan;

  44. (5) inability to collect information on all items from all respondents; (6) inaccurate survey design; (7) mistakes in recording information; (8) variability in responses; (9) illogical/unrepresented data; (10) errors in interpretation.

  45. Many such factors can cause a disagreement between survey results and the true population value. As one might expect, even the notion of a true population value is sometimes controversial. In some situations, the notion of an absolute standard for comparison is a fundamental element of the conceptual framework; in others, one may be satisfied with a purely operational view of reality, in which measurements are simply defined as the product of a specified data collection procedure. An absolute standard of truth plays no role in the purely operational view.

  46. Some illustrative examples of sources of non-sampling errors encountered in real-life problems: Every activity outlined at the different stages of Schematic Diagram 1 may be subject to errors if proper measures are not taken at that stage; errors may arise anywhere from the definition stage to the completion of the study. For illustration, the next few slides give a few examples of errors that could have occurred at various stages of some of the projects undertaken, had no measures been taken.

  47. Workable definition: The project “Domestic Tourists in Orissa (1988-89)” needed a redefinition of a “tourist” with respect to the objective of the study. Among other things, enquiries were directed at finding the availability of existing infrastructure facilities in terms of accommodation, transport (road, rail, air), medicine and other aspects. Normally, a tourist is by definition a person who visits historical monuments, places of pilgrimage, etc. According to the objective of the study, however, any person who, for any reason whatsoever, required accommodation to spend at least one night was to be considered a tourist, and hence a member of the target population. The usual definition of a tourist was therefore unusable, and a tourist was defined according to the objective of the study. Otherwise, the target population considered would have suffered from undercoverage.

  48. Frame Problems: That an imperfect frame may lead to coverage errors was observed both in the study of the health status of workers and in the study under the IPP-VIII Project (1998). In the former study, errors due to coverage problems were likely to occur • if an area frame were used, which would cover all workers but would also include a large number of non-workers; • if a telephone frame were used, which would not cover workers without telephones and would also include a large number of non-workers; • if business establishments were used, which would contain large concentrations of workers, though it might be extremely difficult to construct a complete list of elements; • however, if medical records were used, it would be easy to identify persons who had the disease.

  49. The Indian Population Project (IPP-VIII, 1998), undertaken at the Indian Statistical Institute, was meant to study different facets of IPP-VIII. One important component was to assess the impact of the project on the beneficiaries, who were drawn from the lower income group. While the beneficiaries in an area were being listed, many non-beneficiaries were included, causing overcoverage. It was also observed in the project “Cost Benefit Analysis of Rural Electrification (1975-76)” that investigators employed as piece-rate workers tended to list additional households that were not within the village boundaries. This was later detected through maps and other relevant materials.
