Measures of Association for Contingency Tables


Measures of Association • General measures of association that can be used with any variable types. • Measures of association when both X and Y are nominal. • Measures of association when both X and Y are ordinal. • Measures of association when X and Y are both ordinal or dichotomous nominal.

Measures of Association • There are two main classes of measures of association: symmetric and asymmetric. • Symmetric measures will be the same if the roles of X and Y are reversed. In other words, it does not matter which variable is viewed as the independent variable (X) and which is viewed as the dependent variable (Y).

Measures of Association • Asymmetric measures will be different if the roles of X and Y are reversed. In other words, which variable is viewed as the independent variable (X) and which is viewed as the dependent variable (Y) matters.


Measures of Association Rule of Thumb for Interpreting the Magnitude (i.e. ignoring the sign/direction) of the various measures of association we will be examining: • .00 to <.10 “no relationship” • .10 to <.30 “weak relationship” • .30 to <.50 “moderate relationship” • .50 to 1.00 “strong relationship” You could find several other adjective scales; these are NOT set in stone!
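
The adjective scale above can be written as a small lookup. This is only a sketch: the function name `describe_strength` is made up for illustration, and the cut-offs are the rule-of-thumb values from the slide, not fixed standards.

```python
def describe_strength(value):
    """Map a measure of association to the rule-of-thumb adjective (sign ignored)."""
    magnitude = abs(value)
    if magnitude < 0.10:
        return "no relationship"
    if magnitude < 0.30:
        return "weak relationship"
    if magnitude < 0.50:
        return "moderate relationship"
    return "strong relationship"


print(describe_strength(-0.42))  # moderate relationship
```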

Measures of Association SYMMETRIC AND CAN BE USED WITH ANY DATA TYPES

Measures of Association Between Two Categorical Variables - Phi statistic SYMMETRIC AND CAN BE USED WITH ANY DATA TYPES

Measures of Association Between Two Categorical Variables – Phi statistic This can be applied to the cervical cancer case-control study. Using this measure, there is a weak association between risk factor and disease status.
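
As a rough illustration of the Phi statistic, the sketch below assumes the usual definition, the square root of the Pearson chi-square statistic divided by the sample size. The cervical cancer counts are not reproduced in these notes, so the 2 X 2 table here is hypothetical, and the helper names are chosen only for this example.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and sample size for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def phi(table):
    """Phi = sqrt(chi-square / n)."""
    stat, n = chi_square(table)
    return math.sqrt(stat / n)


# Hypothetical counts (NOT the study data): rows = risk factor yes/no,
# columns = case/control.
hypothetical_table = [[30, 20], [10, 40]]
print(round(phi(hypothetical_table), 3))  # 0.408
```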

Measures of Association Between Two Categorical Variables – Yule’s Q Symmetric measure for 2 X 2 tables only!

Measures of Association Between Two Categorical Variables – Yule’s Q There is a strong association between risk factor (Preg. Age < 25) and case-control status (Cervical Cancer) using this measure.
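
Yule's Q is commonly written in two equivalent ways: from the four cell counts a, b, c, d of the 2 X 2 table, or from the odds ratio. The sketch below shows both forms. The cell labels and the counts are assumptions for illustration; they are not the cervical cancer data.

```python
def yules_q(a, b, c, d):
    """Yule's Q from the cells of a 2 X 2 table laid out as [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)


def yules_q_from_odds_ratio(odds_ratio):
    """Equivalent form: Q = (OR - 1) / (OR + 1)."""
    return (odds_ratio - 1) / (odds_ratio + 1)


# Hypothetical counts (NOT the study data).
a, b, c, d = 30, 20, 10, 40
odds_ratio = (a * d) / (b * c)          # 6.0
print(round(yules_q(a, b, c, d), 3))    # 0.714
print(round(yules_q_from_odds_ratio(odds_ratio), 3))  # 0.714
```

Because Q depends only on the odds ratio, it can be much larger than Phi for the same table, which is one way a "weak" Phi and a "strong" Q can appear together.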

Measures of Association Between Two Categorical Variables – Cramer’s V SYMMETRIC AND CAN BE USED WITH ANY DATA TYPES

Measures of Association Between Two Categorical Variables – Cramer’s V Is there a relationship between histological type of Hodgkin’s disease and response to treatment? For the Hodgkin’s study, Cramer’s V suggests a weak relationship between histological type and response to treatment.
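
A minimal sketch of Cramer's V, assuming the standard definition V = sqrt(chi-square / (n * (min(r, c) - 1))) for an r x c table. The Hodgkin's counts are not reproduced in these notes, so the table below is a hypothetical placeholder and the function names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and sample size for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramer's V for an r x c table of counts."""
    stat, n = chi_square(table)
    smaller_dim = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (smaller_dim - 1)))


# Hypothetical 3 x 3 counts (NOT the Hodgkin's data).
hypothetical_table = [[25, 10, 5], [15, 20, 10], [10, 15, 20]]
print(cramers_v(hypothetical_table))
```

V is 0 when the rows and columns are exactly independent and 1 when each row falls entirely in its own column.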

Measures of Association Between Two Categorical Variables – Pearson’s C SYMMETRIC AND CAN BE USED WITH ANY DATA TYPES This can be used for general r x c tables regardless of the data types involved.

Measures of Association Between Two Categorical Variables – Pearson’s C This can be used for the Hodgkin’s example, where it suggests a moderate relationship between type and response to treatment.
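
A sketch of Pearson's contingency coefficient, assuming the standard definition C = sqrt(chi-square / (chi-square + n)). The table is hypothetical (the Hodgkin's counts are not reproduced here) and the function names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and sample size for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def pearsons_c(table):
    """Pearson's contingency coefficient for an r x c table of counts."""
    stat, n = chi_square(table)
    return math.sqrt(stat / (stat + n))


# Hypothetical, perfectly associated 3 x 3 table (NOT study data).
perfect_table = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
print(round(pearsons_c(perfect_table), 4))  # 0.8165
```

Note that C cannot reach 1: even for this perfectly associated 3 x 3 table it tops out at sqrt(2/3), so the adjective scale should be applied to it with some care.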

Measures of Association ASYMMETRIC AND CAN BE USED WITH NOMINAL X & Y

Measures of Association Between Two Categorical Variables – Lambda • Lambda (λ) is an asymmetrical measure of association suitable for use with nominal variables that looks at predictive abilities, i.e. one variable predicting the level of the other. It provides us with an indication of the strength of an association between the independent (X) and dependent (Y) variables. • It may range from 0.0 (meaning the extra information provided by the independent variable does not help prediction) to 1.0 (meaning use of the independent variable results in no prediction errors). • It is asymmetric, i.e. which variable is viewed as X and which as Y matters!

Measures of Association Between Two Categorical Variables – Lambda Lambda (λ) is based on the concept of Proportional Reduction of Error (PRE): $\lambda = \frac{E_1 - E_2}{E_1}$, where $E_1$ = errors of prediction made when the independent variable is ignored, and $E_2$ = errors of prediction made when the prediction is based on the independent variable.

Measures of Association Between Two Categorical Variables – Lambda The best way to see how the formulae for calculating Lambda (λ) work, and the rationale behind them, is to consider an example.

Example: Physical and Psychological Pain of DBM Admits • These data come from a study conducted by three master’s nursing students who recently graduated (Kelsey, Woods, & Langhans). • One of the questions examined was whether there was a relationship between high physical pain at admission and high psychological pain. The high classification for psych pain meant 5 on a five-point ordinal scale, and high physical pain meant 5+ on the ten-point pain scale.

Example: Physical and Psychological Pain of DBM Admits Below is a 2 X 2 table of the results with Physical Pain as Row (Y) and Psych Pain as Column (X).

Example: Physical and Psychological Pain of DBM Admits In the absence of any information about psychological pain, we predict that a subject will not be suffering from high physical pain, as that is the modal level on the physical pain scale. This approach yields 18 prediction errors.

Example: Physical and Psychological Pain of DBM Admits Using Psych Pain to predict Physical Pain status, we see that if the subject has high Psych Pain the modal response is High Physical Pain, and if the subject does not have high Psych Pain the modal response is not having high Physical Pain. This approach yields 17 prediction errors.

Example: Physical and Psychological Pain of DBM Admits With 18 prediction errors when Psych Pain is ignored and 17 when it is used, Lambda (λ) = $\frac{18 - 17}{18} \approx 0.0556$. We have roughly a 5.56% improvement in predicting physical pain using knowledge about psychological pain.

Example: Physical and Psychological Pain of DBM Admits Using Physical Pain to predict Psychological Pain status, we see that if the subject has high Physical Pain the modal response is High Psych Pain, and if the subject does not have high Physical Pain the modal response is not having High Psych Pain. Ignoring Physical Pain gives 21 prediction errors; using it gives 17 prediction errors. Thus Lambda (λ) = $\frac{21 - 17}{21} \approx 0.1905$, a 19.05% improvement in prediction error (PRE). Notice the asymmetry of the association!
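
The two asymmetric Lambda calculations can be sketched in code. The original 2 X 2 table is not reproduced in these notes: the cells 11, 7 and 10 below are inferred from the stated error counts (18, 21 and 17), while the fourth cell (30) is a hypothetical placeholder chosen only so that the modal categories match the text. The function names are illustrative.

```python
def lambda_predict_rows(table):
    """Lambda for predicting the ROW variable from the COLUMN variable."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Errors when X is ignored: everyone outside the modal row is misclassified.
    errors_ignoring_x = n - max(row_totals)
    # Errors when X is used: within each column, everyone outside that
    # column's modal row is misclassified.
    errors_using_x = sum(sum(col) - max(col) for col in zip(*table))
    return (errors_ignoring_x - errors_using_x) / errors_ignoring_x


def transpose(table):
    """Swap the roles of the row and column variables."""
    return [list(col) for col in zip(*table)]


# Rows: physical pain (high, not high); columns: psych pain (high, not high).
# 11, 7, 10 are inferred from the text; 30 is a HYPOTHETICAL placeholder.
pain_table = [[11, 7], [10, 30]]

print(round(lambda_predict_rows(pain_table), 4))             # 0.0556
print(round(lambda_predict_rows(transpose(pain_table)), 4))  # 0.1905
```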

Example: Physical and Psychological Pain of DBM Admits The symmetric Lambda (λ) is simply the average of the two asymmetric measures, i.e. Lambda (λ) – symmetric = $\frac{0.0556 + 0.1905}{2} \approx 0.123$, or roughly a 12.3% improvement in prediction error.
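
As a check on the arithmetic, the sketch below averages the two asymmetric values using the error counts stated in the example. It also computes a pooled version, since many references define the symmetric Lambda by pooling the errors from both directions rather than averaging the two ratios. The two need not coincide, so it is worth confirming which one a given software package reports. Variable names are illustrative.

```python
# Error counts stated in the example.
e1_physical, e2_physical = 18, 17   # predicting physical pain
e1_psych, e2_psych = 21, 17         # predicting psychological pain

lambda_physical = (e1_physical - e2_physical) / e1_physical
lambda_psych = (e1_psych - e2_psych) / e1_psych

# Simple average of the two asymmetric measures, as described in the text.
lambda_average = (lambda_physical + lambda_psych) / 2

# Pooled form: total reduction in errors over total errors when X is ignored.
total_e1 = e1_physical + e1_psych
total_e2 = e2_physical + e2_psych
lambda_pooled = (total_e1 - total_e2) / total_e1

print(round(lambda_average, 4))  # 0.123
print(round(lambda_pooled, 4))   # 0.1282
```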

Example: Physical and Psychological Pain of DBM Admits The Lambda association measures are highlighted. You can see they match those we calculated by hand on the previous slides. The Uncertainty Coefficient is calculated differently, but it measures the PRE as Lambda does, so it can be interpreted in a similar fashion.

Measures of Association SYMMETRIC AND ASYMMETRIC MEASURES USED TO MEASURE THE ASSOCIATION BETWEEN ORDINAL VARIABLES.

Measures of Association Between Two Ordinal Variables Some of the previously discussed measures can be used. However, for cases where both variables are ordinal, better measures include Gamma, Kendall’s tau, Stuart’s tau and Somers’ D. We will discuss these in a bit. First, though, in some cases we wish to measure the degree of exact agreement between two nominal or ordinal variables measured using the same levels or scales, in which case we generally use Cohen’s Kappa (κ).

Medicare Health Outcomes Survey Website for Medicare Health Outcomes Survey: http://www.hosonline.org/Content/Default.aspx

Medicare Health Outcomes Survey (HOS) FROM THE MEDICARE HOS SURVEY WEBSITE: The Medicare HOS is the first patient-reported outcomes measure used in Medicare managed care. The goal of the Medicare HOS program is to gather valid and reliable clinically meaningful data that have many uses, such as for targeting quality improvement activities and resources; monitoring health plan performance and rewarding top-performing health plans; helping beneficiaries make informed health care choices; and advancing the science of functional health outcomes measurement. Managed care plans with Medicare Advantage (MA) contracts must participate. Each spring a random sample of Medicare beneficiaries is drawn from each participating Medicare Advantage Organization (MAO), that has a minimum of 500 enrollees and is surveyed (i.e., a survey is administered to a different baseline cohort, or group, each year). Two years later, these same respondents are surveyed again (i.e., follow up measurement). Cohort 1 was surveyed in 1998 and was resurveyed in 2000. Cohort 2 was surveyed in 1999 and was resurveyed in 2001, and so on. During the current HOS administration (2013 Round 16), Cohort 16 is surveyed and Cohort 14 is resurveyed using HOS 2.5. For data collection years 1998-2006, the MAO sample size was one thousand. Effective 2007, the MAO sample size was increased to twelve hundred.

Measures of Association Between Two Categorical Variables Cohen’s Kappa (κ) – measures the degree of agreement between two variables on the same scales. HOS Study – General health measured ordinally at baseline and 2-yr. follow-up; how well do they agree? • κ > .75 excellent agreement • .40 < κ < .75 good agreement • 0 < κ < .40 marginal agreement There is fairly good agreement between the general assessment of overall health at baseline and at follow-up. However, there appears to be some general trend for improvement as well.
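
A minimal sketch of Cohen's Kappa, assuming the usual definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of exact agreement and p_e is the agreement expected by chance from the margins. The HOS counts are not reproduced in these notes, so the square table below is hypothetical and the function name is illustrative.

```python
def cohens_kappa(table):
    """Cohen's Kappa for a square table (rows = first rating, columns = second)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: proportion of cases on the main diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: sum over levels of (row share * column share).
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT the HOS data): baseline rating by follow-up rating.
hypothetical_table = [[20, 5, 0], [5, 30, 5], [0, 5, 30]]
print(round(cohens_kappa(hypothetical_table), 3))  # 0.695
```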

Bowker’s Test of Symmetry Symmetry of Disagreement: Bowker’s test suggests the differences are asymmetric (p < .0001). Examining the percentages suggests a majority of patients either stayed the same or improved in each group based on baseline score. Therefore it is reasonable to state that we have evidence that, in general, subjects’ health stayed the same or, if it did change, it was generally for the better (p < .0001).
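
For reference, Bowker's statistic compares each off-diagonal cell with its mirror image across the diagonal. The sketch below computes only the statistic and its degrees of freedom; the p-value would come from a chi-square distribution, which is left to statistical software. The counts are hypothetical (not the HOS data) and the function name is illustrative.

```python
def bowker_statistic(table):
    """Bowker's symmetry statistic and degrees of freedom for a square table."""
    k = len(table)
    stat = 0.0
    df = 0
    for i in range(k):
        for j in range(i + 1, k):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:  # pairs with no observations contribute nothing
                stat += (table[i][j] - table[j][i]) ** 2 / pair_total
                df += 1
    return stat, df


# Hypothetical counts (NOT the HOS data): baseline rating by follow-up rating.
hypothetical_table = [[20, 10, 2], [4, 30, 12], [1, 5, 16]]
stat, df = bowker_statistic(hypothetical_table)
print(round(stat, 3), df)  # 5.787 3
```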

Kruskal’s Gamma (γ) • Before computing Gamma we need to introduce the concept of discordant and concordant paired observations. • Paired observations – Observations compared in terms of their relative rankings on the independent (X) and dependent variable (Y).

Kruskal’s Gamma (γ) • Same order pair (Ns) – Paired observations that show a positive association; the member of the pair ranked higher on the independent variable is also ranked higher on the dependent variable. • Inverse order pair (Nd) – Paired observations that show a negative association; the member of the pair ranked higher on the independent variable is ranked lower on the dependent variable.

Kruskal’s Gamma (γ) • Gamma is a symmetrical measure of association suitable for use with ordinal variables or with dichotomous nominal variables. • For dichotomous nominal variables it is the same as Yule’s Q for 2 X 2 tables. • Its magnitude can vary from 0.0 (meaning the extra information provided by the independent variable does not help prediction) to 1.0 (meaning use of the independent variable results in no prediction errors), so gamma itself ranges from −1.0 to +1.0 and provides us with an indication of the strength and direction of the association between the variables. • When there are more Ns pairs, gamma will be positive; when there are more Nd pairs, gamma will be negative.

Example 1: Job Security & Satisfaction

Example 1: Job Security & Satisfaction Same order pair (Ns) – Paired observations that show a positive association; the member of the pair ranked higher on the independent variable is also ranked higher on the dependent variable.

Example 1: Job Security & Satisfaction Inverse order pairs (Nd) – Paired observations that show a negative association; the member of the pair ranked higher on the independent variable is ranked lower on the dependent variable.

Example 1: Job Security & Satisfaction Gamma (γ) = $\frac{N_s - N_d}{N_s + N_d}$. The other measures also use Ns and Nd but make adjustments for ties. Somers’ D, as you can see, is an asymmetrical measure.
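
Counting same order and inverse order pairs by hand is tedious, so a short sketch may help. It assumes both variables' categories are listed from lowest to highest in the table, and it uses Gamma = (Ns − Nd) / (Ns + Nd). The job security table is not reproduced in these notes, so the counts below are hypothetical and the function names are illustrative.

```python
def count_pairs(table):
    """Return (Ns, Nd): same order and inverse order pairs in an ordinal table."""
    ns = nd = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a strictly lower row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:      # higher on both variables: same order
                        ns += table[i][j] * table[k][m]
                    elif m < j:    # higher on one, lower on the other: inverse
                        nd += table[i][j] * table[k][m]
    return ns, nd


def gamma(table):
    """Gamma = (Ns - Nd) / (Ns + Nd); tied pairs are ignored."""
    ns, nd = count_pairs(table)
    return (ns - nd) / (ns + nd)


# Hypothetical 3 x 3 ordinal table (NOT the job security data).
hypothetical_table = [[10, 5, 0], [5, 10, 5], [0, 5, 10]]
print(count_pairs(hypothetical_table))      # (550, 50)
print(round(gamma(hypothetical_table), 3))  # 0.833
```

For a 2 X 2 table laid out as [[a, b], [c, d]], Ns = ad and Nd = bc, so this reduces to Yule's Q.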

Example 2: Medicare Survey – General Health: Baseline vs. Follow-up Each highlighted measure suggests a strong relationship between general health at baseline and general health at follow-up as all measures exceed 0.50. The association is also positive indicating if health was good at baseline it also tends to be good at follow-up.

Summary We have considered the following measures of association for contingency tables. Depending on the variable types and the goals of our analysis, we generally choose from among these measures.

Other Measures for Ordinal Variables • There are other measures that can be used when both X and Y are ordinal in nature. These are more akin to the traditional correlation measure for continuous X and Y, which is Pearson’s Product Moment Correlation (r). • Spearman’s Rank Correlation (a.k.a. Spearman’s Rho), Kendall’s τ, and Hoeffding’s D are all available in JMP; they are obtained through Analyze > Multivariate Methods and are found under the Nonparametric Correlations option.

Example: NHANES Survey The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations.

Example: NHANES Dermatology Survey This link will take you to a description of the NHANES dermatology survey module conducted in 2005-2006. http://www.cdc.gov/nchs/nhanes/nhanes2005-2006/DEQ_D.htm

Example: NHANES Dermatology Survey Here we are examining ordinal measures on several variables pertaining to sun protective measures. The higher the score, the more frequently the respondent said they used the preventative measure. As these are ALL ordinal variables, the use of Pearson’s Product Moment Correlation is NOT appropriate!

Example: NHANES Dermatology Survey The nonparametric correlations we might consider using are found in the Nonparametric Correlations pull-out menu. Spearman’s Rho is a good choice when X and Y are continuous but neither variable is normally distributed or if there are noticeable outliers. It can also be used with ordinal variables like we have here. Kendall’s Tau is also a valid choice for ordinal variables. Hoeffding’s D is good when the relationship between X and Y is nonlinear which would rarely, if ever, be the case for ordinal X and Y.

Example: NHANES Dermatology Survey Summary: As one would expect, all correlations are positive, as someone who is cautious in one aspect of sun protection probably tends to be cautious in others as well. Spearman’s ρ and Kendall’s τ yield similar results. Hoeffding’s D should not be used for these data!
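
Outside of JMP, the two recommended rank correlations can be sketched in a few lines. This version assumes Spearman's ρ is computed as Pearson's r on average ranks and that Kendall's τ is the tie-adjusted tau-b variant. The scores below are hypothetical 1 to 5 responses, not NHANES data, and the function names are illustrative.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson's product moment correlation."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) adjusted for pairs tied on only one variable."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from the denominator
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + tied_x_only) * (untied + tied_y_only)
    )


# Hypothetical ordinal scores (NOT NHANES data).
sunscreen = [1, 2, 2, 3, 4, 5, 5]
shade = [1, 1, 2, 3, 3, 4, 5]
print(spearman_rho(sunscreen, shade), kendall_tau_b(sunscreen, shade))
```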

Summary • If X and Y are ordinal but not on the same scale, or if agreement is not of primary interest when they are, then there are several choices: Gamma, Kendall’s tau, Stuart’s tau and Somers’ D. Try them all and pick the one you think is “best”. • For non-ordinal associations you again have several choices: Phi, Cramer’s V (Yule’s Q), Lambda, Uncertainty Coefficient, etc. Again, try them all, think about what you are trying to show, and choose the one you think is “best”.
