
From Republic of Science to Audit Society


Presentation Transcript


  1. From Republic of Science to Audit Society. Irwin Feller, Professor Emeritus, Economics, Pennsylvania State University. Assessing impact of public policies - Assessing impact of public research: which complementarities? ASIRPA, Paris, France, June 13, 2012

  2. Presentation’s Focus • Good and Bad Metrics/Methodologies for Evaluating the Impacts of Research • Good and Bad Use(s) of Assessments of Research

  3. Outline • Political Economy of Governance of Science • Perennial S&T Policy Decisions • Choice of Methods/Metrics • Use/Nonuse/Misuse of Evidence

  4. The Times They Are A-Changin’ “Then you’d better start swimming or you’ll sink like a stone.” (Bob Dylan)

  5. What Has Been Lost (As seen by some) “The age of chivalry is gone. That of sophisters, economists and calculators has succeeded” Edmund Burke, Reflections on the Revolution in France

  6. Context, Context, Context • (Historical, Path Dependent) Context of National Science/Innovation Systems • (Current Political/Policy) Context • (Decision/Situational) Context • (Linguistic/Disciplinary) Context

  7. Assessment Assessment is an historical, integral component of patron-based scientific activity.

  8. Assessment Framework • Who is Assessing Whom • Using What Criteria/Performance Measures/Methods to Document and Value What Outcomes, with • What Impacts on the Vitality of the Scientific Enterprise and the Position(s) of Those Engaged in Scientific Activity?

  9. Pre-New Public Management Assessment Paradigm • Republic of Science (M. Polanyi) • Peer (Expert) Review • Social Contract

  10. Social Contract for Science “Government promises to fund the basic science that peer reviewers find most worthy of support, and scientists promise that the research will be performed well and honestly and will produce a steady stream of discoveries that can be translated into new products, medicines or weapons” (Guston and Keniston, 1994: “The Social Contract for Science”)

  11. New Public Management Paradigm • Accountability • Deregulation • Competition • Performance Measurement

  12. What is New • Performance measures are increasingly mandated components of appropriation decisions and oversight reviews • Dominant ethos is that better measures will lead to better (evidence-based) decisions • Stream of new data sets and analytical techniques • Increased political/policy trends towards performance based budgeting

  13. The Evidence-based Decision Making Imperative • “Agencies should demonstrate the use of evidence throughout their FY 2014 budget submissions.” • “…comparative cost-effectiveness of agency investments: allocation of funding across agency programs or within programs” • OMB, Use of Evidence and Evaluation in the 2014 Budget (May 18, 2012)

  14. Tensions Among Accountability, Efficiency and Autonomy There is a fine line between improved, evidence-based decision-making (“Wanted: Better Benchmarks”) and increased influence over the direction and inner workings of the scientific enterprise (“Asking Scientists to Measure Up”).

  15. Low-Stakes/High-Stakes Assessments • Low Stakes • Reputational Surveys: National Research Council Assessments of Graduate Programs; Shanghai Academic Ranking of World Universities • High Stakes • Performance-based University Research Funding Systems: UK Research Assessment Exercise; German University Excellence Competition

  16. 3 Faces (Purposes) of Evaluation • Learn about a program’s operations (Does it work? How can it be made better?) • Control the behavior of those responsible for program implementation (modify objectives; reallocate resources; reassign responsibilities) • Influence the responses of outsiders in the program’s political environment (create the appearance of a well-managed program; preemptively set metrics and methodologies)

  17. Generic Science Policy Questions “The major issues in science policy are about allocating sufficient resources to science, to distribute them wisely between activities, to make sure that resources are used efficiently and contribute to social welfare” (Lundvall, B. and S. Borrás, 2005, p. 605)

  18. Promises of Research Performance Assessment • Objectives provide a useful baseline for assessing performance. • Performance measurement focuses attention on the end objectives of public policy, on what has happened or is happening outside rather than inside the black box. • Well-defined objectives and documentation of results facilitate communication with funders, performers, users, and others.

  19. Limitations of Research Performance Measurement • Returns/Impacts to research are uncertain, long-term, and circuitous • Impacts are typically dependent on complementary actions by agents outside of Federal agency control • Benefits from “failure” are underestimated • Specious precision in selection of measures • Distortion of Incentives • Limited (public) evidence of contributions to improved decision making

  20. Assessment as Lever for Structural Change “Given that science is changing, the institutions that are efficient in supporting science at one point in time may be less appropriate at a later point of time. On precise dimensions, a failure to continually re-tune science policy may therefore impede scientific progress.” B. Jones (2010), As Science Evolves, How Can Science Policy?

  21. Performance Metrics • Metrics Abound: Generic List of 37, with New Ones Constantly being Proposed • Most Programs Have Multiple Objectives--Select Metrics most relevant to Decisions • “Cherry Pick” Metrics (Strategic Retreat from Objectives)

  22. All Performance Measures Can be Gamed “Once STI indicators are made targets for STI policy, such indicators lose most of the informational content that qualify them to play such a role” (Freeman and Soete, 2009)

  23. Overview of Evaluation Methodologies

  24. Taking the “Con” Out of Econometrics Leamer (1983): “Hardly anyone takes the data analysis seriously.” Distressing lack of robustness to changes in key (“whimsical”) assumptions.

  25. Credibility Revolution in Empirical Economics • “Empirical microeconomics has experienced a credibility revolution, with a consequent increase in policy relevance and scientific impact…” • “Primary engine driving improvement has been a focus on the quality of empirical research designs.” Angrist and Pischke (Journal of Economic Perspectives, 2010)

  26. Issue: Before/After Design. Comparing outcomes at time τ (before) with time τ + 1 (after) shows changes “related” to the policy intervention, but does not adjust for “intervening” factors (threats to internal validity). Reframe the analysis: did the policy “cause” change(s) in the treatment group different from those observable in a comparison/control group?
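
A minimal sketch of the reframed comparison, with invented numbers (an editorial illustration, not from the deck): the naive before/after estimate attributes the whole observed change to the policy, while a difference-in-differences estimate nets out the change a comparison group experienced over the same period.

```python
# Invented numbers: mean outcomes for a treated and a comparison group,
# observed at time tau (before) and tau + 1 (after).
treated_before, treated_after = 100.0, 130.0
control_before, control_after = 100.0, 120.0

# Naive before/after estimate: attributes the whole change to the policy.
before_after = treated_after - treated_before  # 30.0

# Difference-in-differences: subtracts the change the comparison group
# experienced anyway (secular trend, other intervening factors).
did = (treated_after - treated_before) - (control_after - control_before)  # 10.0

print(f"before/after estimate:              {before_after:+.1f}")
print(f"difference-in-differences estimate: {did:+.1f}")
```

With these numbers, two-thirds of the naive estimate is shared trend, not policy effect.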

  27. Trends in U.S. Agricultural Productivity

  28. Impact of R&D on U.S. Agricultural Productivity

  29. Benefit-Cost Analysis Steps • Conduct Technical Analysis • Identify Next Best Alternative • Estimate Program Costs • Estimate Economic Benefits • Determine Agency Attribution • Estimate Benefits of Economic Return (RTI, 2010)
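
A hedged sketch of how the listed steps combine arithmetically; all figures and the 7% discount rate are assumptions for illustration, not values from the RTI study.

```python
# Hypothetical program: annual costs and benefits in $M.
def npv(flows, rate):
    """Net present value of a stream of annual flows (year 0 undiscounted)."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

program_costs  = [10.0, 10.0, 10.0, 0.0, 0.0]   # Estimate Program Costs
gross_benefits = [0.0, 0.0, 15.0, 25.0, 30.0]   # Estimate Economic Benefits
attribution = 0.6     # Determine Agency Attribution: benefit share credited
                      # to the program rather than the next best alternative
discount_rate = 0.07  # assumed real discount rate

bcr = (npv([b * attribution for b in gross_benefits], discount_rate)
       / npv(program_costs, discount_rate))
print(f"benefit-cost ratio: {bcr:.2f}")
```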

  30. Benefit-Cost Estimates of Returns to Health Research • An average 45-year-old in 1994 had a life expectancy 4½ years longer than in 1950 because cardiovascular disease mortality had decreased. • “…(U)nambiguous conclusion…that medical research on cardiovascular disease is clearly worth the cost” (2002, p. 113). • In benefit-cost terms, this increase is estimated to yield a 4-to-1 return for medical treatment and a 30-to-1 return for research and dissemination costs related to behavioral change. (Cutler and Kadiyala, 2002)
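
A back-of-the-envelope version of how such ratios are formed; all inputs are hypothetical stand-ins, and Cutler and Kadiyala's published 4:1 and 30:1 figures rest on far more detailed estimates.

```python
# All inputs hypothetical. Benefits: value of added life-years from lower
# cardiovascular mortality; costs: per-person spending on treatment vs.
# research and dissemination.
value_per_life_year = 100_000   # $, a common assumption in this literature
life_years_gained = 4.5         # from the slide's life-expectancy gain

benefit = value_per_life_year * life_years_gained   # $450,000 per person

treatment_cost = 112_500        # hypothetical, illustrating a ~4:1 ratio
research_cost = 15_000          # hypothetical, illustrating a ~30:1 ratio

print(f"treatment return: {benefit / treatment_cost:.0f}:1")
print(f"research return:  {benefit / research_cost:.0f}:1")
```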

  31. Benefit-Cost Estimates of DOE-EERE Geothermal Technology Studies (RTI, 2010)

  32. Econometric Approach: Manufacturing Extension Partnership Summary Statistics. R. Jarmin, Measuring the Impact of Manufacturing Extension
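
A sketch of the underlying regression logic on simulated data; Jarmin's actual study used Census plant-level microdata and corrections for selection into the program, and nothing here reproduces it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
log_size = rng.normal(5.0, 1.0, n)   # control: log employment
mep = rng.binomial(1, 0.3, n)        # 1 if plant received MEP services
true_effect = 0.04                   # assumed 4% productivity gain
log_prod = 1.0 + 0.2 * log_size + true_effect * mep + rng.normal(0.0, 0.3, n)

# OLS of log productivity on an intercept, the control, and the MEP dummy.
# With randomly assigned "participation" as simulated here, OLS recovers the
# effect; real participation is self-selected, hence the need for corrections.
X = np.column_stack([np.ones(n), log_size, mep])
beta, *_ = np.linalg.lstsq(X, log_prod, rcond=None)
print(f"estimated MEP effect on log productivity: {beta[2]:.3f}")
```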

  33. “Dominant” U.S. Methodology is Expert Panels “The most effective means of evaluating federally funded research programs is expert review. Expert review, which includes quality review, relevance review, and benchmarking, should be used to assess both basic research and applied research programs” (National Academies, Evaluating Federal Research Programs, 1999, p. 5)

  34. Bibliometrics: US • Added to reputational surveys in NRC assessments • Patent to Citation Linkage to Document Impacts of Basic Research • Increasingly used by departments/colleges • No use of performance-based funding • Little evidence (to date) of impacts on Federal funding of academic research

  35. Use of Bibliometric Data to Allocate Resources Across Fields • Over the last 3 decades, even as the US position in the life sciences has remained strong, its world share of engineering papers has been cut almost in half, from 38% in 1981 to 21% in 2009, placing it below the share (33%) for the EU27. Similar declines in world share are noted for mathematics, physics, and chemistry. • If bibliometric performance is a function of resource allocation, a nation gets what it funds. • This formulation begs the questions of whether what a nation is producing is what it most needs, and whether it is being produced in the most efficient manner.

  36. Is Anyone Listening? “The ideas of economists and political philosophers, both when they are right and when they are wrong, are more powerful than is commonly understood. Indeed the world is ruled by little else. Practical men, who believe themselves to be quite exempt from any intellectual influence, are usually the slaves of some defunct economist. Madmen in authority, who hear voices in the air, are distilling their frenzy from some academic scribbler of a few years back” (J. M. Keynes)

  37. Is Anyone Listening? Yes • Continuing Impact of “Academic Scribblers” on Men in Power in Setting Policy Worldview • Solow-Abramovitz-Romer • Arrow-Nelson • Mansfield

  38. Big “3” Federal Science Questions and the Role of Performance Measures • Question: How much should be allocated to Federal research? Role: measures do not provide a basis for determining if, say, 3% is too high, too low, or just right. • Question: How much to spend across missions/agencies/fields of science? Role: measures/methodologies provide multiple answers, leading to multiple possible decisions. • Question: Which performers; what allocation criteria? Role: potentially of considerable value, but underutilized.

  39. Using Social Rates of Return to Guide Resource Allocations “But it is evident that these studies can provide very limited guidance to the Office of Management and Budget or to the Congress regarding many pressing issues. Because they are retrospective, they shed little light on current resource allocation decisions, since these decisions depend on the benefits and costs of proposed projects, not those completed in the past.” (Mansfield, 1991, p. 26)

  40. Use, Non-Use, Misuse of Program Assessments • Use: NIH Benefit-Cost Studies • Nonuse: ATP (program terminated); DOE-EERE (budget slashed) • Misuse: overgeneralization of findings to different decision/policy settings • “Bunkum”: worthless, mundane, incompetently done assessments

  41. Asymmetrical Impacts of Well Done Evaluations • If a program is not working, kill it because it is ineffective • If a program is working, that is prima facie evidence that the private sector would engage in it were it not being crowded out

  42. The Ever-Lurking (Ideological) Counterfactual: ATP “ATP’s defenders claim that these subsidies generate greater technological innovation. They point out all the technologies on the market that ATP funded. Of course, ATP grants have funded some successful products. But the key question is whether the market would have produced those products even without ATP. Both economic theory and practice say, ‘Yes.’” Brian Riedl, Testimony before the Homeland Security and Governmental Affairs Committee, United States Senate, May 2005

  43. Thank you
