Overview of Conditional Logistic Regression Gur Hoshen BASAS 7-7-9
What is Conditional Logistic Regression? • Also known as fixed effects • (not be confused with fixed vs. random effects model) • Useful for non-experimental data • In experimental studies, one ensures that groups of interest have similar characteristics • In non-experimental studies, traditional way was to use explanatory variables to account for variances in outcomes • What happens if we do not have these variables at our disposal? • Use each individual as a control • Models are based on variances within individuals
Conditional Logistic Regression: a brief history in SAS • Phreg • STRATA ID • had limitations in number of records • Strata statement in PROC LOGISTIC version 9 • http://www2.sas.com/proceedings/sugi27/p257-27.pdf Has matched pairs example • Paul Allison’s excellent examples under • http://www2.sas.com/proceedings/sugi31/184-31.pdf • Longitudinal studies on poverty, repeat offenders • STRATA ID (not to be confused with SURVEY PROCs)
Conlog: a slight twist • Can also be used in a completely different context of employment equity, like promotions. • Minorities do not get their fair share of promotions • What does this have to be with Conlog? • How do we answer such allegations? • Consider a simple case first: • What is fair share?
Conlog: a slight twist • Example below has data aggregated requisition, but we do have individual data.
Back to Conlog • In this particular situation, the STRATA is not the individual ID but the job. • Also, in the previous table assumes everyone equally qualified. • Individual qualifications are the explanatory variables account for selections and reduce the disparities • Expected (predicted) promotions is equal to actual at the level of the STRATA in Conlog • Strata where no one (or everyone) is selected do not contribute to disparities • “noninformative” strata in SAS • Strata with just whites or just minorities do not contribute to disparities
Actual Settled Case • Allegation that certain race did not receive promotions proportional to their representation • About 670 applicants • About 70 job requisitions • How do we measure this shortfall with no qualifications? • Calculate shortfall from one job to the next, then aggregate across all requisitions • Assumes minority is equally likely to be selected as white • Selection in each job requisition is independent of other job openings • Think of it as drawing balls from urns of different colors
Actual Settled Case • Notice again that requisitions with no selections or where everyone is selected have disparities equal to zero • Yes, we deal with partial people. This is one way of quantifying disparities and $s. • Some requisitions have positive, others negative shortfall so one can see which particulars areas are problematic.
Actual Settled Case • What if we use unconditional logistic regression? • What if we use simulated variables to account for the disparity? • Simulated variables in order to tell a simple story • What if we use simulated variables which do not account for the disparity in order to see if we are over-fitting the model? • Common allegation is there are too many variables in model used in order to capitalize on chance
Variables in Model (100 simulations) • What kind of variables go into model: • Education (BA degree or not) • Work experience: roughly triangular with lots of “0”s
Disparities • Parameter estimates are not meaningful in terms of disparities but one can use predicted probabilities that an individual will be selected. These can be aggregated to a meaningful disparity. • Uncorrelated variables for conditional regression has a disparity of -9.7 (-9.7 if used no qualifications) • Note that, even though odd ratios were closers to zero for uncorrelated model for regular model, disparity increases to -15.6 from -9.7. Why? • l l
l * Noninformative strata where we fill in data
Disparities • Regular regression ignores that some strata may have no (or 100%) selections and yet calculate non-zero expected number of selections, as well as strata with one race only • If do regular regression by requisition, would have on average, 9-10 applicants per position so could not use many variables. • Could do regression by level of position (low, medium, high) rather than just one overall regression, but still have problem that some pools with no (or 100%) selections will have non-zero disparities • l l
Conclusions Can use Conlog in situations where people are competing against one another Could be for scholarships, college admissions, not just in the labor market Note that if strata have large number of records, resultant disparities are close to regular logistic regressions Can take long time to run if want to output predicted (expected) probabilities Questions? l l