Biostatistics course Part 7 Introduction to inferential statistics

Biostatistics course, Part 7: Introduction to inferential statistics. Dr. Sc. Nicolas Padilla Raygoza, Department of Nursing and Obstetrics, Division of Health Sciences and Engineering, Campus Celaya-Salvatierra, University of Guanajuato, Mexico

About the author • Physician and Surgeon, Universidad Autónoma de Guadalajara. • Pediatrician certified by the Consejo Mexicano de Certificación en Pediatría. • Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London. • Master of Science with a focus on Epidemiology, Atlantic International University. • Doctor of Science with a focus on Epidemiology, Atlantic International University. • Full-time Professor (Profesor Titular A), University of Guanajuato. • Level 1 of the National System of Researchers (Sistema Nacional de Investigadores). • padillawarm@gmail.com

Competencies • The reader will define inferential statistics. • The reader will know what a sampling distribution is. • The reader will know and define the properties of a sampling distribution. • The reader will analyze the implications of the sampling distribution for work based on samples.

Population and sample • We want to measure the prevalence of Entamoeba histolytica in the Mexican Republic. • We cannot measure it in the whole Mexican population, for financial and practical reasons. • We can measure the prevalence in a subset of the Mexican population, called a sample.

How do we select a sample? • It is easier to obtain a sample from Mexico City, but the prevalence of Entamoeba histolytica there is probably different from the prevalence in the whole country, so we would obtain a biased estimate of the prevalence in the Mexican population. • If we choose the sample by chance, we are likely to avoid such biases. • A random sample is one in which chance alone decides who is included and who is not.
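The idea that chance alone decides who is included can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 5000 ID numbers, the sample size of 500, and the function name `simple_random_sample` are assumptions made for the example, not part of the original slides.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every member of the sampling frame has the same chance of selection,
    so only chance decides who is included.
    """
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: ID numbers of 5000 registered schoolchildren.
frame = list(range(1, 5001))

# Select 10% of the frame at random (seed fixed only for reproducibility).
sample = simple_random_sample(frame, 500, seed=1)
print(len(sample), "children selected; first five IDs:", sample[:5])
```

Selecting children from the schools that are easiest to reach would not have this property, because convenience rather than chance would decide who is included.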

Example of two samples • The chief of a Sanitary Jurisdiction wants to investigate the prevalence of E. histolytica among schoolchildren in his jurisdiction. • The project is given to an epidemiologist, and there are few resources for it. • A community medical doctor also wants to know the prevalence of amebiasis in schoolchildren. • He hires two people to collect the data.

Example of two samples • The epidemiologist obtained a sample of 10% of the schoolchildren registered in schools in the jurisdiction. • For the 500 schoolchildren selected, they obtained data on age, gender, and an E. histolytica antigen detection test in feces.
Age (years)    Amebiasis +      Amebiasis –      Total
               M      F         M      F
______________________________________________________
6              7      12        22     28        69
7              10     9         25     19        63
8              5      13        24     17        59
9              9      9         20     24        62
10             7      9         18     23        57
11             11     9         27     17        64
12             4      15        21     21        61
13             12     8         23     22        65
______________________________________________________
Total          65     84        180    171       500

Example of two samples • In this 10% sample, they obtained a prevalence of 29.8% (149 of 500). • The prevalence was 26.5% in males and 32.9% in females.
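The prevalence figures for the jurisdiction sample can be recomputed from the table of counts. The sketch below is illustrative only; the dictionary layout and the function name `prevalence_summary` are assumptions made for the example.

```python
# Counts by age from the jurisdiction sample:
# (positive males, positive females, negative males, negative females)
jurisdiction = {
    6: (7, 12, 22, 28),
    7: (10, 9, 25, 19),
    8: (5, 13, 24, 17),
    9: (9, 9, 20, 24),
    10: (7, 9, 18, 23),
    11: (11, 9, 27, 17),
    12: (4, 15, 21, 21),
    13: (12, 8, 23, 22),
}

def prevalence_summary(table):
    """Return (overall, male, female) prevalence as proportions."""
    pos_m = sum(row[0] for row in table.values())
    pos_f = sum(row[1] for row in table.values())
    neg_m = sum(row[2] for row in table.values())
    neg_f = sum(row[3] for row in table.values())
    total = pos_m + pos_f + neg_m + neg_f
    overall = (pos_m + pos_f) / total
    males = pos_m / (pos_m + neg_m)
    females = pos_f / (pos_f + neg_f)
    return overall, males, females

overall, males, females = prevalence_summary(jurisdiction)
print(f"Overall {100 * overall:.1f}%, males {100 * males:.1f}%, females {100 * females:.1f}%")
```

Each prevalence is the number of positive tests divided by the number of children tested in that group: 149/500 overall, 65/245 in males, and 84/255 in females.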

Example of two samples • The medical doctor collected data through a survey in two schools near his house. • He surveyed 500 students from these schools and tested them for the presence of E. histolytica antigen.
Age (years)    Amebiasis +      Amebiasis –      Total
               M      F         M      F
______________________________________________________
6              5      7         52     50        114
7              10     9         71     34        124
8              2      1         41     37        81
9              6      10        2      1         19
10             7      3         13     19        42
11             10     9         5      12        36
12             4      7         7      18        36
13             9      8         12     19        48
______________________________________________________
Total          53     54        203    190       500

Example of two samples • In the medical doctor's sample of 500 students, the prevalence of amebiasis was 21.4% (107 of 500). • The prevalence was 20.7% in males (53 of 256) and 22.1% in females (54 of 244).

Example of two samples • Why are the results different in the two samples? • First, we should review the age and sex distribution of each sample (each cell is a percentage of the 500 children in that sample).
Age        Jurisdiction sample       Medical doctor sample
(years)    Male %     Female %       Male %     Female %
________________________________________________________
6          5.8        8.0            11.4       11.4
7          7.0        5.6            16.2       8.6
8          5.8        6.0            8.6        7.6
9          5.8        6.6            1.6        2.2
10         5.0        6.4            4.0        4.4
11         7.6        5.2            3.0        4.2
12         5.0        7.2            2.2        5.0
13         7.0        6.0            4.0        5.6
________________________________________________________
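One way to see how differently the two samples are composed is to compute the share of each sample in each age group from the row totals of the two tables of counts. This is a sketch for illustration; grouping ages 6 to 8 together is a choice made here, not something stated in the slides.

```python
AGES = [6, 7, 8, 9, 10, 11, 12, 13]

# Row totals by age, copied from the two tables of counts (500 children each).
jurisdiction_totals = [69, 63, 59, 62, 57, 64, 61, 65]
doctor_totals = [114, 124, 81, 19, 42, 36, 36, 48]

def age_shares(totals):
    """Return the percentage of the sample found in each age group."""
    n = sum(totals)
    return [100 * t / n for t in totals]

def share_aged_6_to_8(totals):
    """Percentage of the sample aged 6 to 8 years (first three age groups)."""
    return sum(age_shares(totals)[:3])

for age, j, d in zip(AGES, age_shares(jurisdiction_totals), age_shares(doctor_totals)):
    print(f"{age:>2} years: jurisdiction {j:4.1f}%   medical doctor {d:4.1f}%")

print(f"Aged 6 to 8: jurisdiction {share_aged_6_to_8(jurisdiction_totals):.1f}%, "
      f"medical doctor {share_aged_6_to_8(doctor_totals):.1f}%")
```

The jurisdiction sample is spread fairly evenly across ages, while almost two thirds of the medical doctor's sample is aged 6 to 8 years, so the two samples do not represent the same mix of children.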

Target population and sampling population • It is important to distinguish between the target population and the sampling population. • The target population is the population about which we want information. • The sampling population is the population from which we can obtain information. • In the previous example, both studies have the same target population, but their results differ because they do not have the same sampling population. • If the characteristics of the sampling population differ from those of the target population, the results will be biased.

Sampling estimates and sampling distribution • We select a sample because we want information about a particular outcome in the target population, for example, the prevalence of E. histolytica among schoolchildren. • Since we cannot obtain this result directly, we must gather information from a random sample taken from the target population and use it to obtain our best estimate of the value of the outcome in the population. • To distinguish between population and sample values, we use Greek letters for population values and Roman letters for sample values (for example, π for the population prevalence and p for the sample prevalence).

Sampling estimates and sampling distribution • It is unlikely that the proportion of schoolchildren with amebiasis found in the random sample of 500 students, 29.8%, is exactly the same as the true prevalence in the total school population of the jurisdiction. • But how close is the estimate, p, to the true population value, π? • In general, we do not know π, so we need another way to assess how reliable p is as an estimate of π. • One way is to keep in mind that the random sample we are using is only one of many that could have been drawn.

Sampling estimates and sampling distribution • Thus, many alternative samples could have been drawn instead of the single sample that we obtained. • How different would the results have been if we had used other samples? • To answer this question, we look at a simulation: • We have a school population of 5000 with a prevalence of amebiasis assumed to be 29.8%. • We take a thousand independent samples from this population; the sample size is set at 500 (10%). • We calculate the percentage of students with amebiasis in each sample. • The percentages of students with amebiasis found in the first 14 samples (sample estimates) are shown. • Note that each of them is an estimate of the true population prevalence, and that we generated 1000 samples in total.

Sampling estimates and sampling distribution
Sample    Prevalence (%)      Sample    Prevalence (%)
_____________________________________________________
1         29.8                8         28.4
2         32.1                9         30.7
3         28.0                10        33.1
4         32.0                11        28.8
5         27.3                12        29.5
6         25.4                13        30.5
7         31.1                14        29.4
_____________________________________________________
The distribution of these estimates is called the sampling distribution, or distribution of sample estimates. Note that: • Most sample estimates are close to the true prevalence, π ≈ 30%. • The distribution ranges from about 25% to 35%. • The distribution is nearly symmetrical.
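The simulation described above can be reproduced approximately with Python's standard library. This is a minimal sketch, not the author's original program: the seed, the variable names, and the choice to sample without replacement are assumptions, and the individual estimates will not match the table exactly.

```python
import random
import statistics

POPULATION_SIZE = 5000   # school population
PREVALENCE = 0.298       # assumed true prevalence (pi)
SAMPLE_SIZE = 500        # 10% of the population
N_SAMPLES = 1000         # number of independent samples

rng = random.Random(2024)  # arbitrary seed, used only for reproducibility

# 1 = amebiasis present, 0 = absent; 1490 of the 5000 children are positive.
n_positive = round(POPULATION_SIZE * PREVALENCE)
population = [1] * n_positive + [0] * (POPULATION_SIZE - n_positive)

def sample_prevalence(pop, n, rng):
    """Prevalence observed in one random sample of size n (without replacement)."""
    return sum(rng.sample(pop, n)) / n

estimates = [sample_prevalence(population, SAMPLE_SIZE, rng) for _ in range(N_SAMPLES)]

print("First 14 estimates (%):", [round(100 * e, 1) for e in estimates[:14]])
print(f"Mean of all estimates: {100 * statistics.mean(estimates):.1f}%")
print(f"Lowest and highest estimate: {100 * min(estimates):.1f}% to {100 * max(estimates):.1f}%")
```

Plotting `estimates` as a histogram should show the roughly symmetrical shape centered near the true prevalence that the slide describes.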

Sampling distribution

Sampling distribution • This illustrates the idea that, in theory, we can draw many samples from a population and obtain a different estimate from each. • However, in practice, we have only one sample from the population of interest, so we can never observe the distribution of sample estimates. • The idea of the sampling distribution is fundamental to statistical inference, because it allows us to relate the one sample we have, from which we got our information, to the true value in the population.

Properties of sampling distribution • All sampling distributions share the same characteristics. • When samples of different sizes are taken from the same population of 5000 students, their sampling distributions show similar characteristics.

Properties of sampling distribution • Sampling distributions obtained from samples of different sizes, drawn from the same population, are described by the same three properties: • The mean • The standard deviation • The shape
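The effect of sample size on the mean and standard deviation of the sampling distribution can be explored with a small simulation. The sample sizes 50, 200, and 500, the seed, and the function name `sampling_distribution` are assumptions chosen for this sketch; the slides do not say which sizes were used.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed, used only for reproducibility

# Population of 5000 students with an assumed prevalence of 29.8% (1490 positives).
population = [1] * 1490 + [0] * 3510

def sampling_distribution(pop, n, n_samples, rng):
    """Prevalence estimates from n_samples random samples of size n."""
    return [sum(rng.sample(pop, n)) / n for _ in range(n_samples)]

results = {}
for n in (50, 200, 500):
    estimates = sampling_distribution(population, n, 1000, rng)
    results[n] = (statistics.mean(estimates), statistics.pstdev(estimates))
    print(f"n = {n:>3}: mean = {100 * results[n][0]:.1f}%, "
          f"standard deviation = {100 * results[n][1]:.1f} percentage points")
```

For every sample size the mean of the estimates stays close to the population prevalence, while the standard deviation shrinks as the sample size grows.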

Inferences • The three properties of the sampling distribution allow us to make inferences about the population from data obtained in a sample. • In the amebiasis example, 29.8% of the elementary school students in the sample had the infection. • If we relate our result to the other results that could have been obtained, we can establish a range of values that is likely to include the true prevalence of amebiasis among schoolchildren.

Inferences The inference process can be carried out in 7 steps: • The sample estimate obtained, 0.298, is one of many that we could have obtained from other random samples of the same size. • Property one of the sampling distribution: its mean is the true population value, π. • Property two of the sampling distribution: its standard deviation, called the standard error, is: $$\mathrm{SE}(p) = \sqrt{\frac{\pi(1-\pi)}{n}}$$

Inferences • Property three of the sampling distribution: it is approximately Normal when the sample size is large. Thus, 95% of the estimates that could be obtained from samples of 500 will lie within 2 SE of the mean, which is the population prevalence, π. • However, we have only one sample and do not know the mean (π) of the sampling distribution. For the same reason, we cannot calculate the SE, because it requires π.

Inferences • However, we can use the sample proportion as our best estimate of π and use it to calculate the SE. • Again using the properties of the Normal distribution, we can say with 95% confidence that the true population prevalence, π, lies within 2 SE of the sample proportion, 0.298. This range is the 95% confidence interval.
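The last steps can be worked through numerically for the jurisdiction sample (149 positive tests among 500 children). This is an illustrative sketch: the function names are assumptions, and the multiplier 2 follows the rule of thumb used in these slides rather than the more precise 1.96.

```python
import math

def standard_error(p, n):
    """Estimated standard error of a proportion, using p in place of pi."""
    return math.sqrt(p * (1 - p) / n)

def confidence_interval_95(p, n):
    """Approximate 95% confidence interval: p minus and plus 2 standard errors."""
    se = standard_error(p, n)
    return p - 2 * se, p + 2 * se

n = 500          # sample size
p = 149 / n      # sample proportion with amebiasis (0.298)

se = standard_error(p, n)
lower, upper = confidence_interval_95(p, n)
print(f"SE = {se:.4f}; 95% CI = {100 * lower:.1f}% to {100 * upper:.1f}%")
```

With these numbers the standard error is about 0.02, so the interval runs from roughly 25.7% to 33.9%. The approximation relies on the sample being large enough for the sampling distribution to be close to Normal.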

Inferences • Conclusion • Thus, putting all these results together, we can estimate the sampling distribution from which the estimate we have is derived.

