
Basic concept of statistics



Presentation Transcript


  1. Basic concept of statistics • Measures of central tendency • Measures of dispersion & variability

  2. Measures of central tendency Arithmetic mean (= simple average) • The best estimate of the population mean is the sample mean, $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $X_i$ is a measurement in the population, $\sum$ denotes summation, $n$ is the sample size, and $i$ is the index of measurement.

  3. Measures of variability All describe how “spread out” the data are • Sum of squares, the sum of squared deviations from the mean • For a sample, $SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$

  4. Why? • The average (mean) sum of squares = variance, $s^2$ • For a sample, $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

  5. n – 1 represents the degrees of freedom, $\nu$ (Greek letter “nu”), or the number of independent quantities in the estimate $s^2$ • Because the deviations from the mean must sum to zero, once n – 1 of the deviations are specified, the last deviation is already determined.

  6. Standard deviation, s • Variance has squared measurement units – to regain the original units, take the square root • For a sample, $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

  7. Standard error of the mean • The standard error of the mean is a measure of variability among the means of repeated samples from a population • For a sample, $SE_{\bar{X}} = \frac{s}{\sqrt{n}}$
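As a quick numeric illustration of the formulas on slides 2–7, here is a minimal Python sketch; the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical sample of measurements (illustrative values only)
x = np.array([14.0, 15.0, 13.0, 14.0, 16.0])
n = len(x)

mean = x.sum() / n                 # sample mean, X-bar
ss = ((x - mean) ** 2).sum()       # sum of squared deviations from the mean
variance = ss / (n - 1)            # sample variance s^2, with n - 1 degrees of freedom
sd = np.sqrt(variance)             # sample standard deviation s
se = sd / np.sqrt(n)               # standard error of the mean, s / sqrt(n)

print(mean, ss, variance, sd, se)
# np.var(x, ddof=1) and np.std(x, ddof=1) give the same s^2 and s
```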

  8. Means of repeated random samples, each with sample size n = 5, drawn from a population of values. [Figure: a population of values (roughly 12–16) from which repeated random samples of n = 5 are taken, with the mean computed for each sample.]

  9. For a large enough number of large samples, the frequency distribution of the sample means (= the sampling distribution) approaches a normal distribution.
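This behaviour can be checked with a small simulation; the following is a sketch with an invented population, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented population of values (any non-degenerate distribution will do)
population = rng.integers(12, 17, size=10_000).astype(float)

# Draw many random samples of n = 5 and record each sample mean
sample_means = np.array([
    rng.choice(population, size=5, replace=False).mean()
    for _ in range(5_000)
])

# The sampling distribution of the mean is approximately normal, centred on
# the population mean, with spread close to sigma / sqrt(n)
print(population.mean(), sample_means.mean())
print(population.std() / np.sqrt(5), sample_means.std())
```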

  10. Normal distribution: bell-shaped curve

  11. Testing statistical hypotheses between 2 means • State the research question in terms of statistical hypotheses. It always starts with a statement that hypothesizes “no difference”, called the null hypothesis, H0. • E.g., H0: Mean bill length of female hummingbirds is equal to mean bill length of male hummingbirds

  12. Then we formulate a statement that must be true if the null hypothesis is false, called the alternative hypothesis, HA. • E.g., HA: Mean bill length of female hummingbirds is not equal to mean bill length of male hummingbirds • If we reject H0 as a result of sample evidence, then we conclude that HA is true.

  13. William Sealy Gosset (a.k.a. “Student”) • Choose an appropriate statistical test that would allow you to reject H0 if H0 were false. E.g., Student’s t test for hypotheses about means

  14. t statistic: $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$, i.e., the difference between the mean of sample 1 and the mean of sample 2, divided by the standard error of the difference between the sample means. To estimate $s_{\bar{X}_1 - \bar{X}_2}$, we must first know the relation between both populations.

  15. Relation between populations • Independent populations • Identical (homogeneous) variance • Not identical (heterogeneous) variance • Dependent populations

  16. Independent populations with homogeneous variances • Pooled variance: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ • Then, $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ and $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$ with $\nu = n_1 + n_2 - 2$ degrees of freedom.

  17. Independent populations with homogeneous variances
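A minimal sketch of the pooled-variance t statistic, assuming the formulas above; the data are made up, and scipy's ttest_ind is used only as a cross-check.

```python
import numpy as np
from scipy import stats

def pooled_t(x1, x2):
    """Two-sample t statistic assuming equal (homogeneous) variances."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = np.var(x1, ddof=1), np.var(x2, ddof=1)

    # Pooled variance: weighted average of the two sample variances
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

    # Standard error of the difference between the sample means
    se_diff = np.sqrt(sp_sq * (1 / n1 + 1 / n2))

    t = (np.mean(x1) - np.mean(x2)) / se_diff
    df = n1 + n2 - 2
    return t, df

# Hypothetical data; scipy.stats.ttest_ind(x1, x2, equal_var=True) should agree
x1 = [15.2, 16.1, 15.8, 16.0, 15.7]
x2 = [14.1, 14.6, 14.0, 14.4, 14.3]
print(pooled_t(x1, x2))
print(stats.ttest_ind(x1, x2, equal_var=True))
```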

  18. Select the level of significance for the statistical test. Traditionally, researchers choose α = 0.05. The level of significance (alpha value, α) is the probability of incorrectly rejecting the null hypothesis when it is, in fact, true. 5 percent of the time, or 1 time out of 20, the statistical test will reject H0 when it is true. Note: the choice of 0.05 is arbitrary!

  19. Determine the critical value the test statistic must attain to be declared significant. Most test statistics have a known frequency distribution under H0.

  20. When sample sizes are small, the sampling distribution is described better by the t distribution than by the standard normal (Z) distribution. The shape of the t distribution depends on the degrees of freedom, ν = n – 1.

  21. [Figure: t distributions for ν = 1, 5, and 25, together with the standard normal; as ν → ∞, t(ν) converges to Z.]

  22. For α = 0.05, the distribution of the test statistic is divided into an area of acceptance (0.95, between the lower and upper critical values) and two areas of rejection (0.025 in each tail). [Figure: t distribution centred at 0, with rejection regions beyond the lower and upper critical values.]

  23. Mean bill length from a sample of 5 female hummingbirds, $\bar{X}_1$ = 15.75; • Mean bill length from a sample of 5 male hummingbirds, $\bar{X}_2$ = 14.25; • Perform the statistical test.

  24. Compare the calculated test statistic with the critical test statistic at the chosen α. • Draw and state the conclusions. • Reject or fail to reject H0. • Obtain the P-value, the probability of obtaining a test statistic at least as extreme as the one observed if H0 is true.

  25. Critical t for a two-tailed test about equality = $t_{\alpha(2),\nu}$

  26. To test H0 at α = 0.05 using n1 = 5 and n2 = 5, ν = n1 + n2 – 2 = 8, so the critical value is $t_{0.05(2),8}$ = 2.306. If |t| ≥ 2.306, reject H0.
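The tabled critical value can be reproduced with scipy's t quantile function; a short sketch:

```python
from scipy import stats

alpha = 0.05
n1, n2 = 5, 5
df = n1 + n2 - 2          # 8 degrees of freedom

# Two-tailed critical value: put alpha/2 in each tail
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(round(t_crit, 3))   # ~2.306, matching the tabled t_0.05(2),8
```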

  27. Since calculated t > t0.05(2),8 (because 3.000 > 2.306), reject H0. • Conclude that hummingbird bill length is sexually size-dimorphic.

  28. What is the probability, P, of observing by chance a difference as large as we saw between female and male hummingbird bill lengths? 0.01 < P < 0.02
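The P-value bracket quoted above can be checked from the calculated t and its degrees of freedom; a sketch using scipy's t survival function:

```python
from scipy import stats

t_calc, df = 3.000, 8

# Two-tailed P-value: probability of a |t| at least this large under H0
p = 2 * stats.t.sf(t_calc, df)
print(round(p, 3))        # ~0.017, i.e. 0.01 < P < 0.02
```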

  29. Independent populations with heterogeneous variances • When the variances cannot be assumed equal, they are not pooled: $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, with approximate degrees of freedom given by the Welch–Satterthwaite formula.
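In practice this case is handled by the unequal-variance (Welch) option of scipy's two-sample test; a sketch with invented data:

```python
from scipy import stats

# Hypothetical samples with clearly unequal spread
x1 = [15.2, 16.8, 14.1, 17.3, 15.9]
x2 = [14.0, 14.2, 14.1, 14.3, 14.2]

# Welch's t-test: does not pool the variances, and adjusts the
# degrees of freedom (Welch-Satterthwaite approximation)
print(stats.ttest_ind(x1, x2, equal_var=False))
```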

  30. Dependent populations (paired samples) • Null hypothesis: the mean difference is equal to zero. • Null distribution: t with n – 1 df, where n is the number of pairs. • Compute the test statistic and compare it with the null distribution to ask how unusual it is: if P > 0.05, fail to reject H0; if P < 0.05, reject H0.
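A minimal sketch of the paired (dependent-samples) test described above, with hypothetical before/after measurements; scipy.stats.ttest_rel is included as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (e.g. before/after on the same subjects)
before = np.array([12.1, 13.4, 11.8, 12.9, 13.0, 12.4])
after  = np.array([12.8, 13.9, 12.1, 13.5, 13.2, 12.9])

d = after - before
n = len(d)                              # n = number of pairs

# Test statistic: mean difference divided by its standard error
t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
df = n - 1

p = 2 * stats.t.sf(abs(t), df)          # two-tailed P-value
print(t, df, p)
print(stats.ttest_rel(after, before))   # should match
```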

  31. Analysis of Variance (ANOVA)

  32. Independent t-test • Compares the means of one variable for TWO groups of cases. • Statistical formula: $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$ Meaning: compare the ‘standardized’ mean difference • But this is limited to two groups. What if there are more than 2 groups? • Pairwise t-tests (previous example) • ANOVA (ANalysis Of Variance)

  33. From t-test to ANOVA 1. Pairwise t-tests If you compare three or more groups using t-tests at the usual 0.05 level of significance, you would have to compare each pair (A to B, A to C, B to C), so the chance of getting at least one wrong result would be: 1 – (0.95 × 0.95 × 0.95) = 14.3%. Multiple t-tests inflate the false-alarm (Type I error) rate.
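The 14.3% figure is just this arithmetic, generalized here to k independent tests as a small sketch:

```python
# Family-wise error rate when running k independent tests at alpha = 0.05
alpha = 0.05

for k in (1, 3, 6, 10):
    fwer = 1 - (1 - alpha) ** k
    print(k, round(fwer, 3))
# k = 3 (pairs A-B, A-C, B-C) gives 1 - 0.95**3 ~= 0.143, the 14.3% above
```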

  34. From t-test to ANOVA 2. Analysis of Variance • In the t-test, the mean difference is used. Similarly, the ANOVA test compares the observed variance among group means. • The logic behind ANOVA: • If the groups are from the same population, the variance among the group means will be small (note that the group means are not exactly the same). • If the groups are from different populations, the variance among the group means will be large.

  35. What is ANOVA? • ANOVA (Analysis of Variance) is a procedure designed to determine if the manipulation of one or more independent variables in an experiment has a statistically significant influence on the value of the dependent variable. • Assumptions • Each independent variable is categorical (nominal scale). Independent variables are called Factors and their values are called levels. • The dependent variable is numerical (ratio scale). • The basic idea is that the “variance” of the dependent variable given the influence of one or more independent variables {Expected Sum of Squares for a Factor} is checked to see if it is significantly greater than the “variance” of the dependent variable assuming no influence of the independent variables {also known as the Mean-Square-Error (MSE)}.

  36. Rationale for ANOVA • We can break the total variance in a study into meaningful pieces that correspond to treatment effects and error. That’s why we call this Analysis of Variance. Notation: $\bar{X}_{..}$ = the grand mean, taken over all observations; $\bar{X}_j$ = the mean of any group j; $\bar{X}_1$ = the mean of a specific group (group 1 in this case); $X_{ij}$ = the observation or raw data for the ith subject in group j.

  37. The ANOVA model: $X_{ij} = \mu + \tau_j + e_{ij}$, where $\mu$ is the grand mean, $\tau_j$ is a treatment effect, and $e_{ij}$ is the error for trial i. The sums of squares partition accordingly: SS Total = SS Treatment + SS Error.

  38. Analysis of Variance • Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies. • Use the sample results to test the following hypotheses. • H0: μ1 = μ2 = μ3 = . . . = μk Ha: Not all population means are equal • If H0 is rejected, we cannot conclude that all population means are different. • Rejecting H0 means that at least two population means have different values.

  39. Assumptions for Analysis of Variance • For each population, the response variable is normally distributed. • The variance of the response variable, denoted σ2, is the same for all of the populations. • The effect of the independent variable is additive. • The observations must be independent.

  40. Analysis of Variance: Testing for the Equality of k Population Means • Between-Treatments Estimate of Population Variance • Within-Treatments Estimate of Population Variance • Comparing the Variance Estimates: The F Test • ANOVA Table

  41. Between-Treatments Estimate of Population Variance • A between-treatments estimate of σ2 is called the mean square due to treatments (MSTR): MSTR = SSTR / (k – 1). • The numerator of MSTR, $SSTR = \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{\bar{X}})^2$, is called the sum of squares due to treatments. • The denominator, k – 1, represents the degrees of freedom associated with SSTR.

  42. Within-Treatments Estimate of Population Variance • The estimate of σ2 based on the variation of the sample observations within each treatment is called the mean square due to error (MSE): MSE = SSE / (nT – k). • The numerator of MSE, $SSE = \sum_{j=1}^{k} (n_j - 1) s_j^2$, is called the sum of squares due to error. • The denominator, nT – k, represents the degrees of freedom associated with SSE.

  43. Comparing the Variance Estimates: The F Test • If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to k – 1 and MSE d.f. equal to nT – k. • If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates σ2. • Hence, we will reject H0 if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.

  44. Test for the Equality of k Population Means • Hypotheses H0: μ1 = μ2 = μ3 = . . . = μk Ha: Not all population means are equal • Test Statistic F = MSTR/MSE

  45. Test for the Equality of k Population Means • Rejection Rule Using the test statistic: Reject H0 if F > Fα Using the p-value: Reject H0 if p-value < α, where the value of Fα is based on an F distribution with k – 1 numerator degrees of freedom and nT – k denominator degrees of freedom

  46. Sampling Distribution of MSTR/MSE The figure below shows the rejection region associated with a level of significance equal to α, where Fα denotes the critical value. [Figure: F distribution of MSTR/MSE; do not reject H0 for values below the critical value Fα, reject H0 for values above it.]

  47. ANOVA Table

  Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
  Treatment           | SSTR           | k – 1              | MSTR        | MSTR/MSE
  Error               | SSE            | nT – k             | MSE         |
  Total               | SST            | nT – 1             |             |

  SST divided by its degrees of freedom, nT – 1, is simply the overall sample variance that would be obtained if we treated the entire nT observations as one data set.
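A sketch of the computations behind this table, with three hypothetical treatment groups; scipy.stats.f_oneway is included only as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for k = 3 treatment groups
groups = [
    np.array([23.0, 25.0, 24.0, 26.0, 22.0]),
    np.array([28.0, 27.0, 29.0, 30.0, 26.0]),
    np.array([23.5, 24.5, 25.0, 23.0, 24.0]),
]

k = len(groups)
n_T = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-treatments: SSTR and MSTR = SSTR / (k - 1)
sstr = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
mstr = sstr / (k - 1)

# Within-treatments: SSE and MSE = SSE / (nT - k)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
mse = sse / (n_T - k)

F = mstr / mse
F_crit = stats.f.ppf(1 - 0.05, k - 1, n_T - k)   # critical F at alpha = 0.05
p = stats.f.sf(F, k - 1, n_T - k)                # P-value for the observed F
print(F, F_crit, p)

# scipy's one-way ANOVA should give the same F and P-value
print(stats.f_oneway(*groups))
```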

  48. What does ANOVA tell us? ANOVA will tell us whether we have sufficient evidence to say that measurements from at least one treatment differ significantly from at least one other. It will not tell us which ones differ, or how many differ.

  49. ANOVA vs t-test • ANOVA is like a t-test among multiple data sets simultaneously • t-tests can only be done between two data sets, or between one set and a “true” value • ANOVA uses the F distribution instead of the t-distribution • ANOVA assumes that all of the data sets have equal variances • Use caution on close decisions if they don’t

  50. ANOVA – a Hypothesis Test • H0: There is no significant difference among the results provided by treatments. • Ha: At least one of the treatments provides results significantly different from at least one other.
