AS 737 Categorical Data Analysis For Multivariate

AS 737 Categorical Data Analysis For Multivariate, Week 2

The Data (var00002)

Binomial Test

The Result Using SPSS. But how is it calculated?

How To Calculate the P-value Binomial table made in Excel with n=20 and p=0.70. Why? The p-value is the probability, under the null hypothesis, of observing what was observed or something more extreme. In our example X=15, so the p-value equals P(X>=15), the sum of the binomial probabilities for 15 through 20, which is .4163708.

Binomial Test, x=15 The analysis treats the “0” as a success, and there are 15 zeros. Thus, again, the p-value equals the sum of the probabilities for 15 through 20: P(X>=15) = .4163708
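
The upper-tail sum described above can be reproduced outside Excel. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the slides.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail_p_value(x, n, p):
    """One-sided p-value P(X >= x): sum the binomial probabilities x through n."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

# Values from the slides: n = 20 trials, null probability 0.70, 15 observed successes.
p_value = upper_tail_p_value(15, 20, 0.70)
print(round(p_value, 7))  # the slides report .4163708
```

Note that this is the one-sided (upper-tail) probability only, matching the calculation shown on the slide.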

Sampling
• Last week we covered the Binomial distribution and the Poisson distribution.
• Count data often come from the Binomial/Multinomial or Poisson distribution.
• Luckily, whether the data come from the Binomial/Multinomial or the Poisson distribution, most analyses of categorical data are performed in the same manner.
• For this reason we will often not discuss which distribution the data came from.

Two-Way Contingency Tables [Table: Gender by Belief in Afterlife] Notation: n_rc is the count in row r, column c (e.g. n_12 is the count in the 1st row, 2nd column)

Joint, Marginal and Conditional Probabilities

Independence

Difference of Proportions When the counts in the two rows are independent binomial samples, the estimated standard error of p1-p2 is
$$\widehat{SE}(p_1 - p_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Class, take 10 minutes to do the following: Calculate the 95% confidence interval for the difference in proportions between women and men (women-men) that believe in an afterlife.

Difference of Proportions 95% CI for the difference in proportions (the difference can range from -1 to 1):
.010684 +/- 1.96*.02656
.010684 +/- .052057
(-0.04137, 0.062741)
Do you believe the difference is different from zero? Now that we have calculated a 95% CI, explain what a 95% CI is. Were we to take an infinite number of samples and create a 95% confidence interval from each one, 95% of those intervals would contain the true difference of proportions.
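
The repeated-sampling interpretation of a 95% CI can be illustrated by simulation. The sketch below is only an illustration: the true proportions (0.75 and 0.74), the sample sizes (200 per group), the number of repetitions, and the function names are all assumptions chosen for the example, not values from the slides. It draws many pairs of binomial samples, builds the interval from each, and reports the fraction of intervals that contain the true difference.

```python
import random
from math import sqrt

def wald_ci_diff(x1, n1, x2, n2, z=1.96):
    """95% CI for p1 - p2 from independent binomial counts."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se

def coverage(true_p1, true_p2, n1, n2, reps, seed=0):
    """Fraction of simulated intervals that contain the true difference."""
    rng = random.Random(seed)
    true_diff = true_p1 - true_p2
    hits = 0
    for _ in range(reps):
        # Simulate one binomial sample per group as a sum of Bernoulli draws.
        x1 = sum(rng.random() < true_p1 for _ in range(n1))
        x2 = sum(rng.random() < true_p2 for _ in range(n2))
        lo, hi = wald_ci_diff(x1, n1, x2, n2)
        hits += lo <= true_diff <= hi
    return hits / reps

# Hypothetical true proportions and sample sizes, chosen only for illustration.
cov = coverage(0.75, 0.74, 200, 200, reps=2000)
print(cov)  # should be close to 0.95
```

With a finite number of repetitions the observed coverage fluctuates around 95% rather than equalling it exactly.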

Difference of Proportions [Table: Group (placebo, aspirin) by Myocardial Infarction (MI)] Class, take 5 minutes to do the following: Calculate a 95% CI for the difference in proportions.

Difference in Proportions vs. Relative Risk The 95% CI is (.0171-.0094) +/- 1.96(0.0015), approximately (.005, .011), so aspirin appears to diminish the risk of MI. Another way to compare placebo vs. aspirin is to look at the relative risk. The sample relative risk is p1/p2 = .0171/.0094 = 1.82. Thus in the sample the proportion of MI cases was 82% higher for the placebo group than for the aspirin group. To calculate the CI for the relative risk, first calculate the CI for the log of the relative risk and then take the antilog of its limits. (Note: log represents the natural log; in Excel you must use ln, not log.)
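
The interval for the difference in proportions can be checked numerically. This sketch assumes the 2x2 counts are the ones that appear in the odds ratio calculation elsewhere in these slides (placebo: 189 MI and 10845 no MI; aspirin: 104 MI and 10933 no MI); the function name is illustrative.

```python
from math import sqrt

def diff_of_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Return (difference, standard error, (lower, upper)) for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)

# Assumed counts: group totals are MI cases plus non-cases.
placebo_mi, placebo_n = 189, 189 + 10845
aspirin_mi, aspirin_n = 104, 104 + 10933

diff, se, (lo, hi) = diff_of_proportions_ci(placebo_mi, placebo_n,
                                            aspirin_mi, aspirin_n)
print(round(diff, 4), round(se, 4), round(lo, 3), round(hi, 3))
```

Rounded to three decimals, the limits agree with the approximate interval (.005, .011) quoted on the slide.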

Relative Risk The 95% CI for the log relative risk is 0.597628 +/- 1.96*0.121347 = (.359787, .835469). Taking antilogs, (Exp(0.359787), Exp(0.835469)) = (1.43, 2.31), so the confidence interval for the relative risk is (1.43, 2.31). From this we would conclude that the risk of MI is at least 43% higher for patients taking the placebo than for patients taking aspirin. It can be misleading to look only at the difference in proportions; looking at this situation in terms of relative risk, clearly you would want to take aspirin.
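
The log-scale calculation for the relative risk can be sketched as follows. The counts are assumed to be those used in the odds ratio calculation elsewhere in these slides (placebo: 189 MI of 11034; aspirin: 104 MI of 11037), and the standard error formula sqrt((1-p1)/(n1 p1) + (1-p2)/(n2 p2)) is the usual large-sample one for the log relative risk; it reproduces the 0.121347 shown on the slide. The function name is illustrative.

```python
from math import exp, log, sqrt

def relative_risk_ci(x1, n1, x2, n2, z=1.96):
    """Return (RR, log RR, SE of log RR, (lower, upper)) for p1 / p2."""
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    log_rr = log(rr)  # natural log
    # n1 * p1 equals x1, so (1 - p1) / (n1 * p1) is written as (1 - p1) / x1.
    se_log = sqrt((1 - p1) / x1 + (1 - p2) / x2)
    lo, hi = log_rr - z * se_log, log_rr + z * se_log
    # Antilog of the limits gives the interval on the relative risk scale.
    return rr, log_rr, se_log, (exp(lo), exp(hi))

# Assumed counts: placebo is group 1, aspirin is group 2.
rr, log_rr, se_log, (rr_lo, rr_hi) = relative_risk_ci(189, 189 + 10845,
                                                      104, 104 + 10933)
print(round(rr, 2), round(log_rr, 3), round(se_log, 4),
      round(rr_lo, 2), round(rr_hi, 2))
```

Because the interval is built on the log scale, it is not symmetric around the sample relative risk once the limits are exponentiated.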

The Odds Ratio The odds are nonnegative; when the odds are greater than one, a success is more likely than a failure. The odds ratio can take any nonnegative value. When X and Y are independent, the odds ratio equals 1. An odds ratio of 4 means that the odds of success in row 1 are 4 times the odds of success in row 2. When the odds of success are higher for row 2 than for row 1, the odds ratio is less than 1.

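The Odds Ratio The maximum likelihood estimator of the odds ratio is:
$$\hat{\theta} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$$
The asymptotic standard error for the log of the MLE is:
$$ASE(\log\hat{\theta}) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$$
The confidence interval for the log odds ratio is:
$$\log\hat{\theta} \pm z_{\alpha/2}\, ASE(\log\hat{\theta})$$
Taking the antilog of its limits gives the confidence interval for the odds ratio.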

Inference for Log Odds Ratios Class, take 10 minutes to do the following: Calculate the odds ratio for MI, and then a 95% CI for the odds ratio.

Inference for Log Odds Ratios
Odds ratio = (189*10933)/(104*10845) = 1.832
Log(1.832) = .605
ASE of the log = (1/189 + 1/10933 + 1/10845 + 1/104)^(1/2) = .123
95% CI of the log odds ratio is (.365, .846)
Thus the 95% CI of the odds ratio is (1.44, 2.33)
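
The steps above can be sketched in a few lines of Python. The cell counts are taken from the calculation on the slide; arranging them as placebo in row 1 and aspirin in row 2, and the function name, are assumptions made for the example.

```python
from math import exp, log, sqrt

def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Return (odds ratio, log OR, ASE of log OR, (lower, upper))."""
    theta = (n11 * n22) / (n12 * n21)
    log_theta = log(theta)  # natural log
    ase = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    lo, hi = log_theta - z * ase, log_theta + z * ase
    # Exponentiate the limits to return to the odds ratio scale.
    return theta, log_theta, ase, (exp(lo), exp(hi))

# Row 1: placebo (MI, no MI); row 2: aspirin (MI, no MI).
theta, log_theta, ase, (or_lo, or_hi) = odds_ratio_ci(189, 10845, 104, 10933)
print(round(theta, 3), round(log_theta, 3), round(ase, 3),
      round(or_lo, 2), round(or_hi, 2))
```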

Dealing with small cell counts When zero cell counts occur, or some cell counts are very small, the following slightly amended formula is used:
$$\tilde{\theta} = \frac{(n_{11}+0.5)(n_{22}+0.5)}{(n_{12}+0.5)(n_{21}+0.5)}$$

The Relationship Between Odds Ratio and Relative Risk
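
Both ideas on this slide can be checked with a short sketch. Two assumptions are made here: the amended estimator is taken to be the common version that adds 0.5 to every cell, and the relationship between the two measures is taken to be the algebraic identity odds ratio = relative risk × (1-p2)/(1-p1). The MI counts come from the odds ratio calculation elsewhere in these slides; the zero-cell table and all function names are hypothetical.

```python
def odds_ratio(n11, n12, n21, n22):
    """Sample odds ratio; fails if n12 or n21 is zero."""
    return (n11 * n22) / (n12 * n21)

def amended_odds_ratio(n11, n12, n21, n22, c=0.5):
    """Odds ratio with a constant c added to every cell (assumed c = 0.5)."""
    return ((n11 + c) * (n22 + c)) / ((n12 + c) * (n21 + c))

def relative_risk(n11, n12, n21, n22):
    """Ratio of the row-wise success proportions p1 / p2."""
    return (n11 / (n11 + n12)) / (n21 / (n21 + n22))

# MI data: row 1 placebo (MI, no MI), row 2 aspirin (MI, no MI).
mi = (189, 10845, 104, 10933)
p1 = mi[0] / (mi[0] + mi[1])
p2 = mi[2] / (mi[2] + mi[3])
theta = odds_ratio(*mi)
rr = relative_risk(*mi)
# When both proportions are small, (1 - p2) / (1 - p1) is near 1 and OR ~ RR.
identity_rhs = rr * (1 - p2) / (1 - p1)
print(round(theta, 3), round(rr, 3), round(identity_rhs, 3))

# Hypothetical table with a zero cell: the plain estimator is undefined.
zero_cell = (10, 0, 5, 5)
print(amended_odds_ratio(*zero_cell))
```

For the MI data the odds ratio and relative risk are nearly equal because MI is rare in both groups.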

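Chi-Squared Tests To calculate chi-squared statistics for testing a null hypothesis with fixed probability values $\pi_{ij}$, we use the expected frequencies $\mu_{ij} = n\pi_{ij}$:
$$X^2 = \sum \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}} \qquad G^2 = 2 \sum n_{ij} \log\!\left(\frac{n_{ij}}{\mu_{ij}}\right)$$
Here $X^2$ is the Pearson statistic and $G^2$ is the likelihood ratio statistic.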

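Chi-Squared Tests of Independence To calculate chi-squared statistics for testing the null hypothesis of independence, the expected frequencies are $\mu_{ij} = n\pi_{i+}\pi_{+j}$. Most likely the true probabilities are unknown and the sample proportions must be used:
$$\hat{\mu}_{ij} = n\, p_{i+}\, p_{+j} = \frac{n_{i+}\, n_{+j}}{n}$$
With I rows and J columns, the statistics have (I-1)(J-1) degrees of freedom.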
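
A minimal sketch of the independence calculation follows, assuming the usual estimated expected frequencies (row total × column total / n), the Pearson statistic, and the likelihood ratio statistic. The 3x2 table of counts is hypothetical, invented only to show the mechanics, and the function names are illustrative.

```python
from math import log

def expected_under_independence(table):
    """Estimated expected counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_and_lr(table):
    """Return (Pearson X^2, likelihood ratio G^2, degrees of freedom)."""
    mu = expected_under_independence(table)
    x2 = sum((n_ij - m_ij) ** 2 / m_ij
             for row, mu_row in zip(table, mu)
             for n_ij, m_ij in zip(row, mu_row))
    # Cells with a zero count contribute 0 to G^2 (0 * log 0 is taken as 0).
    g2 = 2 * sum(n_ij * log(n_ij / m_ij)
                 for row, mu_row in zip(table, mu)
                 for n_ij, m_ij in zip(row, mu_row) if n_ij > 0)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, g2, df

# Hypothetical counts: rows are three people, columns are (heads, tails).
tosses = [[12, 8], [9, 11], [15, 5]]
x2, g2, df = pearson_and_lr(tosses)
print(round(x2, 3), round(g2, 3), df)
```

The two statistics are compared with a chi-squared distribution on the stated degrees of freedom; they are usually close to each other when the expected counts are not small.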

Chi-Squared Test of Independence Take 15 minutes and calculate the Pearson statistic and the likelihood ratio chi-squared statistic for the null hypothesis that the probability of heads is the same for all people, assuming the true probability is unknown. [Table: Person by Coin Toss]

Adjusted Residuals When the null hypothesis is true, each adjusted residual has a large-sample standard normal distribution. An adjusted residual of about 2-3 or larger in absolute value indicates lack of fit of the null hypothesis within that cell. Take 10 minutes to calculate the adjusted residuals. [Table: Gender by Political Party Identification]
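
The adjusted residual calculation can be sketched as follows, assuming the usual definition for a test of independence: (n_ij - mu_ij) / sqrt(mu_ij (1 - p_i+)(1 - p_+j)), where mu_ij is the estimated expected count and p_i+, p_+j are the row and column sample proportions. The 2x3 table is hypothetical, invented only to show the mechanics, and the function name is illustrative.

```python
from math import sqrt

def adjusted_residuals(table):
    """Adjusted residual for every cell under the independence hypothesis."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for row, r_tot in zip(table, row_totals):
        out_row = []
        for n_ij, c_tot in zip(row, col_totals):
            mu = r_tot * c_tot / n  # estimated expected count
            denom = sqrt(mu * (1 - r_tot / n) * (1 - c_tot / n))
            out_row.append((n_ij - mu) / denom)
        result.append(out_row)
    return result

# Hypothetical counts: rows are two genders, columns are three party groups.
party = [[40, 25, 35], [30, 20, 50]]
for row in adjusted_residuals(party):
    print([round(value, 2) for value in row])
```

The sign of each residual shows whether the cell has more (positive) or fewer (negative) observations than independence predicts.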

Adjusted Residuals From this example we can see how the adjusted residuals can add further insight beyond the chi-squared test of independence, such as the direction of the departure from independence in each cell. [Table: Gender by Political Party Identification]

Chi-Squared Tests of Independence with Ordinal Data Linear trend alternative to independence. Under independence, $M^2 = (n-1)r^2$, where r is the sample correlation between the row scores and the column scores, is chi-squared with one degree of freedom for large samples. $M = \sqrt{n-1}\, r$, its square root, follows a standard normal distribution, and its sign gives insight into direction. Note: when categories do not have natural scores, such as education level, logical scores must be assigned, e.g. High School degree = 1, College degree = 2, Master's degree = 3.
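
A minimal sketch of the linear trend test follows, assuming the usual form M² = (n-1)r², with r the correlation between row and column scores weighted by the cell counts. The table and the scores are hypothetical, chosen only to show the mechanics, and the function name is illustrative.

```python
from math import erfc, sqrt

def linear_trend_test(table, row_scores, col_scores):
    """Return (r, M, M^2, two-sided p-value) for the linear trend test."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    u_bar = sum(u * t for u, t in zip(row_scores, row_totals)) / n
    v_bar = sum(v * t for v, t in zip(col_scores, col_totals)) / n
    # Sums of squares and cross-products, weighted by the counts.
    s_uv = sum((u - u_bar) * (v - v_bar) * n_ij
               for u, row in zip(row_scores, table)
               for v, n_ij in zip(col_scores, row))
    s_uu = sum((u - u_bar) ** 2 * t for u, t in zip(row_scores, row_totals))
    s_vv = sum((v - v_bar) ** 2 * t for v, t in zip(col_scores, col_totals))
    r = s_uv / sqrt(s_uu * s_vv)
    m = sqrt(n - 1) * r
    # Two-sided standard normal p-value; equals the chi-squared(1) tail of M^2.
    p_value = erfc(abs(m) / sqrt(2))
    return r, m, m * m, p_value

# Hypothetical counts: rows are three ordered exposure levels (scores 0, 1, 2),
# columns are a binary outcome (scores 0, 1).
table = [[50, 5], [40, 8], [30, 12]]
r, m, m2, p_value = linear_trend_test(table, [0, 1, 2], [0, 1])
print(round(r, 3), round(m, 3), round(m2, 3), round(p_value, 4))
```

Different score choices can change r and therefore the test result, which is why the scores should be chosen before looking at the data.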

Example with Ordinal Data: Alcohol and Infant Malformation [Table: Alcohol Consumption by Infant Malformation] Take 2-3 minutes and think of logical value assignments for scores. Note: nominal binary data can be treated as ordinal.

Example with Ordinal Data: Alcohol and Infant Malformation [Table: Alcohol Consumption by Infant Malformation] Take 20 minutes and, using the scores given, calculate r.
