Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Variables

David L. Streiner, Nour Kteily, PSY 1950

The Danger of Dichotomizing • Reduced statistical power • Increased probability of type II error • Difficulty reinterpreting data once definitions have changed

The Rationale for Dichotomizing Outcomes • 1) “Clinicians have to make dichotomous decisions to treat or not to treat, so it makes sense to have a binary outcome” • 2) “Physicians find it easier to understand results when they are expressed in proportions or odds ratios rather than beta weights and other indices”

Streiner’s Retort • 1) The rationale confuses measurement with decision making. • 2) Many ‘binary’ disorders could actually be seen as a continuum. • 3) All research using the old dichotomy becomes much more difficult to interpret if the definition of the dichotomy changes. • 4) Many treatments for ‘binary’ disorders actually fall along a continuum.

Example 1 • Scale dichotomized: scores below 15 are considered normal; scores above 15 count as a ‘case’ • If the scores in Group 1 and Group 2 are treated as continuous: Mean G1 = 11.70, Mean G2 = 16.80, t(18) = 2.16, p = 0.045 • If the scores are treated as dichotomous: G1: 9 normal, 1 ‘case’; G2: 4 normal, 6 ‘cases’; Fisher’s exact test: p = 0.057
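The dichotomized result in Example 1 can be checked by hand, because the slide gives the full 2×2 table. The sketch below is one way to compute a two-sided Fisher exact p-value using only the Python standard library; the function name `fisher_exact_two_sided` is made up for this illustration, and the two-sided rule used here (sum every table that is no more probable than the observed one) is assumed to be the rule behind the slide's figure. The t-test cannot be reproduced the same way, since the raw scores are not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Add up every table that is no more probable than the observed one
    # (the small factor guards against floating-point ties)
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )


# Rows are groups, columns are (normal, case), as in Example 1
p_value = fisher_exact_two_sided(9, 1, 4, 6)
print(round(p_value, 3))  # 0.057
```

With these counts the function returns about 0.0573, which matches the p = 0.057 reported for the dichotomized analysis.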

Example 2 • 40 subjects, measured on 4 variables A-D • With the variables kept continuous, 4 of the 6 pairwise correlations are significant at the p < 0.01 level (upper triangle) • If you dichotomize the data using median splits, only 2 correlations are significant (lower triangle) • Run a regression with A as the dependent variable and B-D as the predictors: • Variables kept as continua: R² = 0.588 • Variables dichotomized: R² = 0.211
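The 40-subject data set behind Example 2 is not reproduced here, so the pattern can only be illustrated on simulated data. The sketch below assumes two standard-normal variables with a true correlation of 0.5 and a large sample, so that sampling noise does not obscure the effect; `pearson`, `median_split`, `rho`, and `n` are all names and values chosen for this illustration, not taken from the original analysis.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def median_split(values):
    """Recode values as 1 (above the median) or 0 (at or below it)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 0:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    else:
        median = ordered[mid]
    return [1 if v > median else 0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
rho = 0.5               # assumed true correlation (illustrative)
n = 20000               # assumed sample size (illustrative)

x = [rng.gauss(0, 1) for _ in range(n)]
y = [rho * xi + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]

r_continuous = pearson(x, y)                              # both kept continuous
r_one_split = pearson(median_split(x), y)                 # x dichotomized
r_both_split = pearson(median_split(x), median_split(y))  # both dichotomized

print(f"continuous: {r_continuous:.3f}")
print(f"one split:  {r_one_split:.3f}")
print(f"both split: {r_both_split:.3f}")
```

Under these assumptions the correlation shrinks each time a variable is split at its median, which is the same direction of change the slide reports for the correlations and for R².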

Issues with Dichotomizing • 1) The magnitude of the effects was lower when outcomes were treated as dichotomous rather than continuous. • 2) Findings that were significant using continuous variables were not significant using dichotomous variables. Why? • Dichotomizing results in a ‘tremendous’ loss of information • Misclassification • Lower signal-to-noise ratio • Taken together, these issues result in decreased statistical power and increased probability of type II error
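The size of the information loss can be made concrete in one idealized case. The relations below are a sketch that assumes two variables X and Y are bivariate normal with correlation ρ and that each split is made exactly at the median; the symbols X_d and Y_d (the dichotomized versions) are notation introduced here, and real data will depart from these values to the extent that the assumptions fail.

```latex
% Assumption: (X, Y) bivariate normal with correlation \rho;
% X_d and Y_d are the median-split (0/1) versions of X and Y.
\begin{align}
\operatorname{corr}(X_d, Y)   &= \rho\,\sqrt{2/\pi} \approx 0.798\,\rho \\
\operatorname{corr}(X_d, Y_d) &= \frac{2}{\pi}\arcsin\rho \\
\frac{\operatorname{corr}(X_d, Y)^2}{\rho^2} &= \frac{2}{\pi} \approx 0.637
\end{align}
```

Under these assumptions, splitting a single variable at the median leaves only about 64% of the variance it originally shared with the other variable, which is one way to read the ‘tremendous’ loss of information.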

It’s Not All Bad • There are a few cases, based on statistical rather than clinical considerations, in which we should divide a variable into a dichotomy or into ordinal categories. • 1) J-shaped distributions • 2) Non-linear relationships

Conclusion • Gather data as continua whenever possible • Unless your variable deviates considerably from normality, don’t dichotomize: doing so decreases power and increases the probability of type II error
