
Designing an Evaluation of the Effectiveness of NIH’s Extramural Loan Repayment Programs


Presentation Transcript


  1. Designing an Evaluation of the Effectiveness of NIH’s Extramural Loan Repayment Programs

  2. Goals of Meeting
  • Review design:
    -- research questions and conceptual framework
    -- choice of comparison group
    -- data sources and outcome measures
    -- methods
    -- possible options for timing and sample selection
  • Respond to comments
  • Discuss proposed options and possible modifications to them

  3. Goals of LRPs and of Evaluation
  • To increase the number of individuals conducting research in certain fields, NIH implemented 5 extramural loan repayment programs (LRPs):
    -- Clinical (began in 2002)
    -- Clinical, for those from disadvantaged backgrounds (2001)
    -- Pediatric (2003)
    -- Health disparities (2001)
    -- Contraception and infertility (1997)
  • Evaluation objective: assess whether the programs are achieving their goals of recruiting and retaining researchers in these fields

  4. Evaluation Research Questions
  • Do LRPs have a "recruitment effect" -- do they increase the number of individuals who begin research careers in the designated LRP field?
  • Do LRPs have a "retention effect" -- do they increase the length of time individuals conduct research in the LRP field, or in any biomedical field?
  • Do LRPs have a "productivity effect" -- do they make awardees more successful than they would have been without the program?

  5. Conceptual Framework for How LRPs Might Affect Outcomes
  Extramural LRPs might affect:
  • Recruitment into a research field, if individuals know about, and are motivated by, LRPs before choosing to pursue a research career
  • Retention in the LRP field (or in any research field), by relieving financial pressures that could otherwise cause individuals to leave research for higher-paying positions
  • Research productivity, by enabling individuals to devote more time and focus to research

  6. Choosing a Comparison Group
  • To determine what would have happened to extramural LRP awardees absent the program, we need a comparison group.
  • Should the comparison group be "external" (outside the applicant pool) or "internal" (drawn from the applicant pool)?

  7. Why an External Comparison Group Is Not Feasible
  • The comparison group would need to be broadly defined, because LRP applicants come from such a wide variety of backgrounds.
  • Recruitment might be measured by comparing all doctoral degree recipients who were barely eligible to those who were barely ineligible according to the debt-to-salary ratio. But:
    -- For MDs, the sample size needed would be enormous, since the proportion of MDs conducting research in any particular field is so small.
    -- For PhDs, sample sizes in available data sources are not large enough to detect even the maximum possible impact of the extramural LRPs.
  • Retention might be measured with an external comparison group, but matching the diverse backgrounds and work experiences of LRP participants would be difficult.

  8. Internal Comparison Group: Attractive Features and Possible Concerns
  Attractive features
  • All applicants were interested in the LRP, and awardees and non-awardees have similar characteristics.
  • Administrative data are available for the full sample.
  Possible concerns
  • Selection bias: can we control for the likelihood that funded applicants are more promising researchers than non-funded applicants?
  • Will sample sizes be large enough to detect program effects?
  • Can recruitment be measured, since all applicants must already hold positions in the field to be eligible?

  9. Overcoming Selection Bias
  • If the scoring process is known and measured, we can use statistical models to obtain unbiased estimates of program effects. A regression discontinuity design can be used when a score cut-off point or range is used to make funding decisions (see the sketch below).
  • The LRP scoring process is suitable for a regression discontinuity design because:
    -- Applicants are scored on the basis of their research potential according to standardized criteria.
    -- ICs appear to fund all those above a funding cut-off point or range. (Sometimes ICs go strictly by score in determining whom to fund; other times, ICs look at all scores close to the "payline" and may choose applicants with lower scores who are doing research in areas of particular interest.)
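
A minimal sketch of how the regression discontinuity estimate might be computed, assuming a sharp cutoff; the variable names, the simulated data, and the payline value are all illustrative, not the study's actual scoring rules:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated applicant data: application score, funding status, and a
    # later binary outcome (e.g., still conducting research in the LRP field).
    rng = np.random.default_rng(0)
    n = 1_000
    score = rng.uniform(0, 100, n)
    cutoff = 60.0                           # hypothetical payline
    funded = (score >= cutoff).astype(int)  # sharp RD: funding follows score
    outcome = (rng.uniform(size=n) <
               0.35 + 0.10 * funded + 0.002 * score).astype(int)

    df = pd.DataFrame({
        "outcome": outcome,
        "funded": funded,
        "score_c": score - cutoff,  # center the score at the cutoff
    })

    # Local linear RD: regress the outcome on funding status and the centered
    # score (separate slopes on each side), within a bandwidth of the cutoff.
    local = df[df["score_c"].abs() <= 20.0]
    rd = smf.ols("outcome ~ funded + score_c + funded:score_c",
                 data=local).fit()
    print(rd.params["funded"])  # jump at the cutoff = estimated program effect

Since the payline may vary by IC and cohort, scores would presumably be centered on the relevant cutoff for each applicant before pooling.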

  10. Hypothetical Effect of Extramural LRP on Length of Time in Research Career (chart not reproduced in transcript)

  11. What Size Program Effects Could Be Detected with Available Sample Sizes?
  • Sample sizes are large enough to detect whether the 5 LRPs collectively had an effect of 10 percentage points (see the calculation sketched below)
  • An effect of 15 percentage points could be detected for certain subgroups
  • Effects of 10 to 20 percentage points could be detected for the larger LRPs (and outcomes could be reported for all applicants in each LRP)
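
A back-of-the-envelope version of such a minimum detectable effect (MDE) calculation for a binary outcome; the sample sizes, baseline rate, and the conventional 5% significance / 80% power settings below are illustrative assumptions, not the study's actual inputs:

    from math import sqrt
    from scipy.stats import norm

    def mde_percentage_points(n_treat, n_comp, p_base=0.5,
                              alpha=0.05, power=0.80):
        """MDE (in percentage points) for a two-sided comparison of two
        proportions, via the standard normal-approximation formula."""
        z_alpha = norm.ppf(1 - alpha / 2)
        z_power = norm.ppf(power)
        se = sqrt(p_base * (1 - p_base) * (1 / n_treat + 1 / n_comp))
        return 100 * (z_alpha + z_power) * se

    # Illustrative pooled sample: 600 awardees vs. 300 non-funded applicants
    # yields an MDE of roughly 10 percentage points.
    print(round(mde_percentage_points(600, 300), 1))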

  12. Measuring Recruitment Effects
  • A recruitment effect is difficult to measure through comparison to non-funded applicants, because all applicants must already hold research positions in the relevant field before applying
  • But a retrospective survey could gauge:
    -- whether applicants knew about the LRP before taking a research position
    -- the extent to which the LRP influenced that decision
    -- how applicants gauged their chances of receiving an award

  13. Outcome Measures
  Ideally, we could measure the LRPs' effect on whether individuals:
  • Conducted research in the LRP field (and persistence)
  • Conducted research in any field (and persistence)
  • Devoted more than 50% of their time to research in the LRP field or any field
  • Obtained an NIH R01 grant
  • Were PIs on an NIH grant or any grant
  • Had NIH research funding or any research funding
  • Applied for NIH funding
  • Held a tenured academic position
  • Conducted research in a nonprofit or government setting
  • Were peer reviewers for NIH
  • Were peer reviewers for journals
  • Had publications
  • Had their work cited

  14. Data Sources
  • Applicant data from OLRS
  • Publications databases, such as PubMed (see the sketch below)
  • Funding databases, such as NIH's IMPAC-II
  • Proposed survey of past applicants
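
One way publication outcomes might be pulled from PubMed is through NCBI's public E-utilities; the author name and date range below are purely illustrative. A query like this also shows why name-matching is fragile (common names return many false positives), a limitation the proposed survey avoids:

    import urllib.parse
    import urllib.request

    # Search PubMed for publications by an (illustrative) applicant name.
    term = "Smith J[Author] AND 2003:2008[pdat]"
    url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
           + urllib.parse.urlencode({"db": "pubmed",
                                     "term": term,
                                     "retmax": "20"}))
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode()[:500])  # XML listing matching PubMed IDs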

  15. Reasons for the Proposed Survey
  • Critical information (e.g., field of research, non-NIH funding, being part of a research team) is not readily available from secondary sources.
  • Publications can be tracked without relying on error-prone name-matching.
  • Data can be collected sooner (no need to wait out publication time lags).
  • Response bias can be gauged by comparing program effects on certain outcomes (such as being PI on an NIH grant) between the full sample and the sample of survey respondents.

  16. Methods
  • A regression discontinuity design will be used to obtain unbiased estimates of program effects.
  • The primary analysis would estimate combined effects for all LRPs pooled together.
  • The model would control for differences between ICs and LRPs (such as scoring patterns or applicant characteristics); a sketch of one such specification follows.
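
A sketch of what the pooled specification might look like, extending the slide-9 example; the fixed-effects-by-dummy approach, the column names, and the simulated data are assumptions about modeling details the slides leave open:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated pooled applicant file with (hypothetical) LRP and IC labels.
    rng = np.random.default_rng(1)
    n = 1_000
    score = rng.uniform(0, 100, n)
    df = pd.DataFrame({
        "score_c": score - 60.0,                # centered at a common payline
        "funded": (score >= 60.0).astype(int),
        "lrp": rng.choice(["clinical", "pediatric", "disparities"], n),
        "ic": rng.choice(["IC_A", "IC_B", "IC_C"], n),
    })
    df["outcome"] = (rng.uniform(size=n) <
                     0.35 + 0.10 * df["funded"]).astype(int)

    # C(...) adds dummy (fixed) effects that absorb differences in scoring
    # patterns and applicant characteristics across LRPs and ICs.
    pooled = smf.ols(
        "outcome ~ funded + score_c + funded:score_c + C(lrp) + C(ic)",
        data=df,
    ).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
    print(pooled.params["funded"])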

  17. Possible Subgroups
  • Each of the larger LRPs
  • MDs vs. PhDs
  • Those who received NIH funding vs. those who did not
  • Those who had higher vs. lower debt levels
  • Those who received their degree recently vs. longer ago

  18. Options for Timing of Data Collection
  • Need to strike a balance between capturing full research careers (which could take years) and providing timely information to policymakers.
  • Propose measuring early outcomes 4 to 5 years from the time of application.
  • Possibly measure long-term outcomes 7 to 9 years after application.

  19. Sample Selection
  • Propose to include only the 2003 and/or 2004 cohorts, since the number of non-funded applicants in 2001 and 2002 was so small.
  How large should the sample be?
  • The larger the sample, the more likely we are to detect program effects if they exist.
  • A large sample is particularly important for measuring effects of the LRPs separately or for other subgroups.
  • But collecting data on a large sample will be more costly.

  20. Option 1: Include All Individuals from the 2003 and 2004 Cohorts
  • Able to detect the smallest program impacts (about 9 percentage points for survey respondents)
  • Most costly, because it has the largest sample
  • Including the 2004 pool means that data collection and analysis would occur later than in Option 2

  21. Option 2: Include Only the 2003 Cohort
  • Less costly than Option 1, involving half the sample
  • Data collection occurs a year earlier than under Option 1 or Option 3
  • Minimum detectable effects are the largest among the 3 options (10 to 13 percentage points), reducing the ability to measure subgroup impacts

  22. Option 3: Clinical LRP Only
  • Middle of the three options in terms of sample size and minimum detectable effects (9 to 11 percentage points)
  • Would yield results for only one LRP
  • Data would be collected a year later than under Option 2

  23. Recommendations and Issues for Consideration
  • If OLRS desires separate estimates for the large LRPs and for subgroups, implement Option 1
  • If subgroups are not a priority and/or timeliness is a priority, implement Option 2
  • Option 3 is suitable if OLRS wants to detect relatively small program impacts but is concerned about the cost of surveying the full sample from all LRPs
  • OLRS needs to consider how small an effect it must be able to detect. (Would a 7-percentage-point effect be so small that the program would not be considered cost-effective?)
