
RESEARCH EVALUATION: WHEN YOU MEASURE A SYSTEM, YOU CHANGE THE SYSTEM Giorgio Sirilli


Presentation Transcript


  1. RESEARCH EVALUATION: WHEN YOU MEASURE A SYSTEM, YOU CHANGE THE SYSTEM Giorgio Sirilli, IRCrES-CNR, ROARS editorial board (Redazione ROARS)

  2. ROARS Start: 2011. Members of the Editorial board: 14. Collaborators: 250. Contacts: 10.6 million (November 2011 – May 2015). Average daily contacts: 500 in November 2011; 8,000 in 2014. Articles published: 2,000. Comments by readers: 30,000. ROARS is ranked 8th among the top national cultural blogs. ROARS, a genuine expression of democracy and participation, has become a very important player in the policy debate and in policy making.

  3. Evaluation Evaluation may be defined as an objective process aimed at the critical analysis of the relevance, efficiency and effectiveness of policies, programmes, projects, institutions, groups and individual researchers in the pursuit of their stated objectives. Evaluation consists of a set of coordinated activities of a comparative nature, based on formalised methods and techniques and on codified procedures, aimed at assessing intentional interventions with reference to their implementation and their effectiveness. Internal/external evaluation

  4. The first evaluation (Genesis) In the beginning God created the heaven and the earth. And God saw everything that He had made. "Behold", God said, "it is very good". And the evening and morning were the sixth day. And on the seventh day God rested from all His work. His Archangel came then unto Him asking, "God, how do you know that what You have created is 'very good'? What are Your criteria? On what data do You base Your judgement? Aren't You a little close to the situation to make a fair and unbiased evaluation?" God thought about these questions all that day and His rest was greatly disturbed. On the eighth day, God said, "Lucifer, go to hell!" (From Halcolm's "The Real Story of Paradise Lost")

  5. A brief history of evaluation Research Assessment Exercise (RAE) Research Excellence Framework (REF) (impact) "The REF will over time doubtless become more sophisticated and burdensome. In short we are creating a Frankenstein monster" (Ben Martin) Italy, a latecomer. Evaluation in Italy: yes or no? Yes, but … good evaluation

  6. What do we evaluate?

  7. The value of science William Gladstone, then British Chancellor of the Exchequer (minister of finance), asked Michael Faraday about the practical value of electricity: "But, after all, what use is it?" Faraday replied: "Why, sir, there is every probability that you will soon be able to tax it."

  8. The case of physicists Bruno Maksimovič Pontekorvo

  9. The case of physicists "Physics is a single discipline, but unfortunately nowadays physicists are divided into two categories: the theoreticians and the experimentalists. If a theoretician does not possess extraordinary ability, his work makes no sense… In experimental work, instead, even a person of average ability can do useful work." (Enrico Fermi, 1931; translated from the Italian)

  10. The case of graphene Graphene is an allotrope of carbon in the form of a two-dimensional, atomic-scale, hexagonal lattice. Graphene has many extraordinary properties. It is about 100 times stronger than steel by weight, conducts heat and electricity with great efficiency, and is nearly transparent. It was first measurably produced and isolated in the lab in 2003. Andre Geim and Konstantin Novoselov at the University of Manchester won the Nobel Prize in Physics in 2010 "for groundbreaking experiments regarding graphene." The global market for graphene is reported to have reached $9 million by 2014, with most sales in the semiconductor, electronics, battery energy and composites industries.

  11. The case of graphene The famous paper by Andre Geim and Konstantin Novoselov was published in 2004, and by 2007 it was indeed quite famous and highly cited. The point is whether a committee would have selected Geim's project and awarded him an ERC Starting Grant in 2004. Looking at his citation and publication records in 2004, it is very improbable that he would have been ranked among the top 10%.

  12. The case of graphene [charts: publication and citation records as of 2004]

  13. The knowledge bundle

  14. The knowledge institutions University: teaching, research, "third mission". Research agencies: research, problem solving, management.

  15. The neo-conservative wave of the 1980s

  16. The new catchwords New public management Value for money Accountability Relevance Excellence

  17. The neo-conservative wave in Italy Letizia Moratti, Italian minister of education and research: "You first show that you use public money efficiently and effectively, then we will loosen the purse strings." Never happened!

  18. Contro l’ideologia della valutazione. L’ANVUR e l’arte della rottamazione dell’università (Against the ideology of evaluation: ANVUR and the art of scrapping the university) A model of firm management based on the principles of competitiveness and customer satisfaction (the market). The catchwords: competitiveness, excellence, meritocracy. The "evaluative state" as the "minimum state", in which the government gives up its political responsibility, avoids the democratic debate in search of consensus, and rests on the "automatic pilot" of techno-administrative control.

  19. Contro l’ideologia della valutazione. L’ANVUR e l’arte della rottamazione dell’università "ANVUR is much more than an administrative branch. It is the outcome of a cultural and political project aimed at reducing the range of alternatives and hampering pluralism." Sergio Benedetto

  20. Changes in university life The university is now at the mercy of: - increasing bibliometric measurement - quality standards - blind refereeing (someone sees you but you do not see him) - bibliometric medians - journal classifications (A, B, C, …) - opportunistic citing - academic tourism - administrative burden - …

  21. The epistemic consequences of bibliometrics-based evaluation Interviews with Italian researchers (40-65 years old). Main results: a drastic change in researchers' attitude due to the introduction of bibliometrics-based evaluation. Bibliometrics-based evaluation has an extremely strong normative effect on scientific practices, which deeply impacts the epistemic status of the disciplines (T. Castellani, E. Pontecorvo, A. Valente, "Epistemological consequences of bibliometrics: Insights from the scientific community", Social Epistemology Review and Reply Collective, vol. 3, no. 11, 2014).

  22. The epistemic consequences of bibliometrics-based evaluation Results: 1. Bibliometrics-based evaluation criteria have changed the way scientists choose their research topics: choosing a fashionable theme; placing the article in the tail of an important discovery (bandwagon effect); choosing short empirical papers. 2. The hurry. 3. Interdisciplinary topics are hindered; bibliometric evaluation systems encourage researchers not to change topic during their career. 4. Repetition of experiments is discouraged: only new results are considered interesting (Castellani, Pontecorvo and Valente, cit.).

  23. Excellence: CNR Statute 2011 vs. CNR Statute 2015

  24. Research evaluation Indicators used: - bibliometrics - R&D - peer review - students - graduates - patents - spin-offs - contracts and other funding - other

  25. Some indicators • Number of publications • Number of citations • Impact factor • h-index

  26. Use of publications for decision making • The case of China (SCI) • The case of Russia

  27. The h-index (Jorge Eduardo Hirsch) In 2005, the physicist Jorge Hirsch suggested a new index to measure the broad impact of an individual scientist's work, the h-index. A scientist has index h if h of his or her Np papers have at least h citations each and the other (Np − h) papers have no more than h citations each. In plain terms, a researcher has an h-index of 20 if he or she has published at least 20 articles receiving at least 20 citations each.
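A minimal sketch in Python of how the definition above can be computed, assuming a list of per-paper citation counts is available (the function name h_index and the example counts are illustrative, not from the presentation):

```python
# Sketch of Hirsch's h-index: the largest h such that
# h papers have at least h citations each.
def h_index(citations):
    # Rank papers by citation count, descending; h is the last 1-based
    # rank at which the citation count still reaches the rank.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Example with made-up counts: four papers reach their rank, so h = 4.
print(h_index([10, 8, 5, 4, 3]))  # 4
```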

  28. Impact factor (Eugene Garfield) The impact factor (IF) of an academic journal is a measure reflecting the average number of citations to recent articles published in that journal. It is frequently used as a proxy for the relative importance of a journal within its field. In any given year, the impact factor of a journal is the average number of citations received per paper published in that journal during the two preceding years. For example, if a journal has an impact factor of 3 in 2008, then its papers published in 2006 and 2007 received, on average, 3 citations each in 2008. ("Citable items" for this calculation are usually articles, reviews, proceedings, or notes; not editorials or letters to the editor.)
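A minimal sketch of the two-year calculation described above, with hypothetical aggregate counts (the function impact_factor and the figures 600 and 200 are invented to reproduce the slide's IF = 3 example):

```python
# Sketch of the two-year Journal Impact Factor.
def impact_factor(cites_to_prev_two_years, citable_items_prev_two_years):
    # IF(Y) = citations received in year Y by items published in Y-1 and
    # Y-2, divided by the citable items published in Y-1 and Y-2.
    return cites_to_prev_two_years / citable_items_prev_two_years

# Matches the slide's example: papers from 2006-2007 cited 600 times in
# 2008, with 200 citable items published in 2006-2007 -> IF 3.0 in 2008.
print(impact_factor(600, 200))  # 3.0
```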

  29. Nobel laureates and bibliometrics (Higgs boson, 2013) Peter Ware Higgs: 13 works, mostly in "minor" journals, h-index = 6. François Englert: 89 works, in both prestigious and minor journals, h-index = 10. W. S. Boyle: h-index = 7. G. E. Smith: h-index = 5. C. K. Kao: h-index = 1. T. Maskawa: h-index = 1. Y. Nambu: h-index = 17.

  30. Science and ideology: the impact on citations [photo: fall of the Berlin Wall, Berlin, November 1989]

  31. San Francisco Declaration on Research Assessment The Journal Impact Factor, as calculated by Thomson Reuters, was originally created as a tool to help librarians identify journals to purchase, not as a measure of the scientific quality of research in an article. With that in mind, it is critical to understand that the Journal Impact Factor has a number of well-documented deficiencies as a tool for research assessment. These limitations include: A) citation distributions within journals are highly skewed; B) the properties of the Journal Impact Factor are field-specific: it is a composite of multiple, highly diverse article types, including primary research papers and reviews; C) Journal Impact Factors can be manipulated (or "gamed") by editorial policy; and D) data used to calculate the Journal Impact Factors are neither transparent nor openly available to the public.

  32. San Francisco Declaration on Research Assessment General Recommendation: Do not use journal-based metrics, such as Journal Impact Factors, as a surrogate measure of the quality of individual research articles, to assess an individual scientist's contributions, or in hiring, promotion, or funding decisions.

  33. The Leiden manifesto on bibliometrics

  34. The Leiden Manifesto Bibliometrics: The Leiden Manifesto for research metrics "Data are increasingly used to govern science. Research evaluations that were once bespoke and performed by peers are now routine and reliant on metrics. The problem is that evaluation is now led by the data rather than by judgement. Metrics have proliferated: usually well intentioned, not always well informed, often ill applied. We risk damaging the system with the very tools designed to improve it, as evaluation is increasingly implemented by organizations without knowledge of, or advice on, good practice and interpretation."

  35. The Leiden Manifesto – Ten principles 1) Quantitative evaluation should support qualitative, expert assessment. 2) Measure performance against the research missions of the institution, group or researcher. 3) Protect excellence in locally relevant research. 4) Keep data collection and analytical processes open, transparent and simple. 5) Allow those evaluated to verify data and analysis.

  36. The Leiden Manifesto – Ten principles 6) Account for variation by field in publication and citation practices. 7) Base assessment of individual researchers on a qualitative judgement of their portfolio. 8) Avoid misplaced concreteness and false precision. 9) Recognize the systemic effects of assessment and indicators. 10) Scrutinize indicators regularly and update them.

  37. Ranking universities and research agencies CNRS, Fraunhofer, CNR, …

  38. Ranking universities and research agencies Evaluating: difficult and even dangerous …

  39. Ranking of universities Four major sources of rankings: ARWU (Shanghai Jiao Tong University), QS World University Rankings, THE World University Rankings (Times Higher Education), US News & World Report (Best Global Universities)

  40. TopUNIVERSITIES Worldwide university rankings, guides & events • Criteria selected as the key pillars of what makes a world class university: • Research • Teaching • Employability • Internationalisation • Facilities • Social Responsibility • Innovation • Arts & Culture • Inclusiveness • Specialist Criteria

  41. Global rankings cover less than 3-5% of the world's universities

  42. Ranking of universities: the case of Italy ARWU Shanghai: Bologna 173, Milano 186, Padova 188, Pisa 190, Sapienza 191. QS World University Rankings: Bologna 182, Sapienza 202, Politecnico di Milano 229. World University Ranking SA: Sapienza 95, Bologna 99, Pisa 184, Milano 193. US News & World Report: Sapienza 139, Bologna 146, Padova 146, Milano 155.

  43. The rank-ism (De Nicolao)

  44. The rank-ism (De Nicolao) The vice-rector of the University of Pavia declared: "There are various rankings in the world: in each of them the University of Pavia ranks in the first 1%." But it is not true. According to three agencies, Pavia ranks as follows: 371 in the QS World University Rankings; 251-275 in Times Higher Education; 401-500 in the Shanghai Ranking (ARWU).

  45. Evaluation is an expensive exercise Research Assessment Exercise (RAE): 540 million Euro. Research Excellence Framework (REF): 1 million Pounds (500 million). Evaluation of the Quality of Research (VQR): 300 million Euro (ROARS); 182 million Euro (Geuna). Rule of thumb: less than 1% of the R&D budget should be devoted to its evaluation; a sketch of this check follows below.
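A minimal sketch of the rule-of-thumb check just mentioned, with invented figures (the function evaluation_share and both budget numbers are hypothetical, not from the slides):

```python
# Hypothetical check of the "< 1% of R&D budget" rule of thumb.
def evaluation_share(evaluation_cost_eur, rd_budget_eur):
    # Evaluation cost as a fraction of the R&D budget it assesses.
    return evaluation_cost_eur / rd_budget_eur

# E.g. a 300 million Euro exercise over a 60 billion Euro R&D budget:
print(f"{evaluation_share(300e6, 60e9):.1%}")  # 0.5% -> within the rule
```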

  46. Evaluation is an expensive exercise National Scientific Habilitation: 126 million Euro - Cost per application: 2,300 Euro - Cost per job assigned: 32,000 Euro

  47. Cost of evaluation: the saturation effect Source: Geuna and Martin

  48. Cost of evaluation: a systematic loss Source: Geuna and Martin

  49. Evaluation of the Quality of Research by ANVUR Researchers’ products to be evaluated - journal articles - books and book chapters - patents - designs, exhibitions, software, manufactured items, prototypes, etc. University teachers: 3 “products” over the period 2004-2010 Public Research Agencies researchers: 6 “products” over the period 2004-2010 Scores: from 1 (excellent) to -1 (missing)

  50. Evaluation of the Quality of Research by ANVUR Indicators linked to research: quality (0.5), ability to attract resources (0.1), mobility (0.1), internationalisation (0.1), high-level education (0.1), own resources (0.05), improvement (0.05). Attention is basically focused here, on the quality indicator! (See the weighted-sum sketch below.)
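A minimal sketch of how the weights above could combine into a single composite score, assuming each indicator is scored on a normalised 0-1 scale (the per-indicator scores and the function composite_score are hypothetical; only the weights come from the slide):

```python
# VQR indicator weights from the slide; they sum to 1.0, with
# "quality" alone carrying half of the composite score.
WEIGHTS = {
    "quality": 0.5,
    "ability to attract resources": 0.1,
    "mobility": 0.1,
    "internationalisation": 0.1,
    "high-level education": 0.1,
    "own resources": 0.05,
    "improvement": 0.05,
}

def composite_score(scores):
    # Weighted sum of indicator scores (each assumed in [0, 1]).
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Example with made-up scores: every indicator at 0.8 yields 0.8.
example = {k: 0.8 for k in WEIGHTS}
print(round(composite_score(example), 2))  # 0.8
```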
