Research Assessment (RA) plays a central role in shaping the priorities and operation of the research system, but current practices are often misaligned with evolving policy demands and the changing process of research. The over-reliance of RA on narrow performance measures tends to undervalue critical research practices and contributions such as collaboration, openness, societal impact or policy support. At the same time, these measures generate perverse incentives and can lead to undesirable behaviours.
There is a pressing need to develop and test alternative evaluation tools and processes aimed at nurturing a healthy and productive research culture that serves societal demands. This need for a systemic evolution towards better adapted RA concepts and methodologies is recognised across research performers and funders. However, transforming RA has proven challenging as it requires coordinated effort and a paradigm shift in assessment practices and structures that are deeply entrenched.
RA should be used in a cost-effective way. RA has grown exponentially in parallel with the demands put on the research system. Evaluation is now taking too much time and resources from researchers and research administrators alike. RA should only be carried out for clearly defined objectives and when other, less resource-intensive, processes are not adequate to achieve these objectives.
Effective RA requires a FAIR and open data environment. Relying solely on proprietary data and services reduces the capacity for conducting meta-analyses that are critical for effective RA. Loss of control and transparency can undermine the autonomy of research agencies and institutions and the trust of the research community.
National and global rankings and league tables, which have become very influential in recent years, should not be used in RA. They are not adapted to the specific profiles and purposes of different research performers and therefore do not accurately reflect countries’ or institutions’ performance in conducting quality research.
The potential role of artificial intelligence (AI) in RA needs to be carefully examined. While AI presents interesting potential to streamline some RA processes, it is also prone to errors and biases, and its reliability depends on multiple factors that need to be carefully controlled. Further work is required to fully assess whether AI, used in a transparent and responsible way, may be of added value in RA.
What’s the issue?
Research assessment (the systematic process of monitoring, evaluating and reviewing research inputs, processes, outputs and impacts) is carried out by a variety of public and corporate actors (government ministries, national institutes and agencies, universities, publishers etc.) at national, institutional, and individual levels. It plays a central role in determining how research priorities are set, how resources are allocated, and how researchers’ careers develop. The design of these assessments – including the choice of criteria and methods used to assess research – influences the type of research conducted, how research institutions operate, and how individuals and groups of researchers approach their work. In this way, research assessment shapes the direction, quality, and culture of scientific research (Figure 1).
Figure 1. Relationship between research culture, assessment measures, incentives and behaviours
Why is supporting changes in Research Assessment important?
Research assessment frameworks are important because they signal what is valued – or not valued – in a research system by those carrying out the assessment, and ultimately shape what is achieved through scientific research. Research assessment frameworks should therefore reflect what scientific communities and societies regard as valuable in research – but they are not neutral instruments. They are tools through which individuals, institutions, and states monitor (and incentivise) goals such as advancing scientific knowledge, ensuring research ‘excellence’, quality and integrity, fostering open science and aligning research with societal needs. They can play an important role in determining the allocation of limited research resources, which in turn impacts the working lives and career experiences of those working in research.
Research assessment has evolved over time to respond to the broadening role of science in society. Quantitative and qualitative methods have been progressively introduced as the research ecosystem and its outputs have become more complex. However, in recent years, movements advocating reform in research assessment have grown as existing methodologies are increasingly perceived as unable to respond to emerging needs, and even potentially detrimental to the system they are supposed to evaluate (OECD, 2026[1]). For example, high-risk research, which can potentially bring high rewards in the long term, is often undervalued because of the lack of relevant RA indicators (OECD, 2021[2]). Similarly, transdisciplinary research, spanning multiple disciplines and sectors, is poorly captured by traditional disciplinary evaluation frameworks focused on publications and citations, with little recognition of societal relevance and long-term impact (OECD, 2020[3]).
There are growing concerns that dominant and well-established assessment frameworks promote undesirable effects and behaviours. These frameworks rely heavily on a set of quantitative, publication-based criteria and metrics – such as publication counts, citation rates and journal impact factors. Although there are good reasons why these simple quantitative measures were initially introduced, their mis- or over-use, as a proxy for research productivity and quality, has contributed to a range of perverse incentives with undesirable consequences. These include, for example, fostering a ‘publish or perish’ culture, fuelled by an exponential and uncontrolled proliferation of scientific journals and conferences. This culture pressures researchers to prioritise quantity over quality, undermining research ethics and integrity, reducing the overall diversity and innovative potential of research systems and leading to a loss of talent as researchers burn out and leave the career (Fanelli, 2010[4]) (Bergstrom, Foster and Song, 2016[5]) (Edwards and Roy, 2017[6]) (Demir, 2018[7]) (Öztürk and Taşkın, 2024[8]).
At the same time, the public and policy expectations from research are changing. Research is increasingly expected to demonstrate societal impact and contribute to solutions for complex societal challenges. These expectations bring new demands for research processes and outputs that go beyond publications, and they require forms of knowledge production that are often inter- or trans-disciplinary, open, and collaborative. Existing assessment frameworks often fail to capture or reward these wider contributions and modes of research (OECD, 2020[3]) (Science Europe, 2022[9]). This creates a growing misalignment between what is expected of research and what is incentivised in practice, limiting the responsiveness and relevance of research systems. Hence, the need for developing new assessment and incentive frameworks (Figure 2).
Figure 2. Assessment frameworks, incentives and new expectations and demands from science
Delivering on these evolving expectations requires not only changes in what is assessed, but also in how research is organised and supported. Research assessment can be a powerful tool to incentivise and monitor desirable changes in research practices and behaviour. Practices such as collaboration, community-based work, science communication and policy engagement are often undervalued as they are difficult to quantify and do not fit easily within traditional research assessment models. Openness and originality/risk-taking are similarly disincentivised. New assessment frameworks and processes that value and reward these desirable practices and behaviours can be a catalyst for more dynamic, diverse, innovative and societally relevant research. They can help foster a positive research culture, enhance discovery and increase the likelihood of finding acceptable solutions to the increasingly complex challenges that society is facing today.
Research assessment can be resource- and time-intensive for the research community. The constant demand for outputs, publication inflation, the exponential growth of data, the increased emphasis on performance indicators and the extensive use of peer review (of articles, projects, programmes, laboratories etc.) all contribute to an increase in research assessment and in the burden on researchers and administrative staff (Technopolis Group, 2015[10]) (Kovanis et al., 2016[11]). Care should be taken to ensure proportionality in RA, since the administrative or research burden can, in some cases, outweigh the intended benefits.
There are concerns about the accessibility, transparency and governance of databases, data infrastructures and analytic services that are commonly used for conducting research assessment. Many scientific production and publication databases and services are controlled by corporate actors that select and curate research information using methodologies and algorithms that are not fully transparent. These may embed linguistic and epistemic biases that are not reflective of the breadth and richness of research contributions (for example, contributions in a number of disciplines or from non-English-speaking countries are not faithfully represented if data is limited to publications in English).
There are also concerns about the influence that corporate publishers, data analytic firms and university rankings jointly have in assessing “research performance” and shaping research cultures at both the institutional and national levels. A number of commercial entities have developed national or international rankings, which are heavily reliant on their own proprietary data analytics. In the absence of viable alternatives, the influence of these rankings on decision-making has grown over time, partly due to their apparent simplicity (Brankovic, Ringel and Werron, 2018[12]). However, research has shown that such rankings are often based on flawed and insufficiently transparent data and methods, are biased towards STEM subjects and English-speaking scholars and universities, and can accentuate global, regional and national inequalities (Hazelkorn and Mihut, 2021[13]) (Independent Expert Group (IEG), 2023[14]). Moreover, their use in decision-making is generating an increasing demand for costly analytics and consultancy products and services that are frequently provided by the same companies. Dependency on commercial technological companies limits the motivation of organisations to conduct their own assessments and meta-analyses. It can undermine accountability and the autonomy of decision-making as it relates to the allocation of research funds and the procurement of research information infrastructures. More broadly, a lack of transparency in the RA system can have a negative impact on the trust of the research community.
The potential role of AI in responding to the exponential demand for research assessment can be problematic. The capacity of “artificial intelligence” (AI, see definition in (OECD, 2024[15])) to filter and analyse very large datasets and document collections is potentially attractive to reduce reviewer workloads and thus is being explored by a number of research assessment actors. However, its use also presents considerable risks (Thelwall, 2025[16]) (Watermeyer et al., 2025[17]). The robustness of AI-generated solutions very much depends on the quality and methodology of their training. AI can be misled or deceived and cannot easily replace the in-depth knowledge of experts in a given field. Moreover, the use of these technologies can entail considerable investment in data analytics services (including licences), diverting resources and personnel towards their acquisition, management and monitoring. As with the dependency on commercial technological companies for the analysis of research metadata, integrating externally owned, non-transparent AI systems into assessment processes risks introducing new biases and reducing the autonomy and accountability of the research system as a whole.
What are the priority actions to support change?
Support initiatives that seek to advance common international principles/coordinated frameworks through contextually relevant mechanisms
In recent years, a number of international initiatives (Table 1) have been promoting responsible and context-sensitive approaches to research assessment, recognising broader contributions to research and addressing the unintended consequences of an excessive focus on metrics and indicators.
Table 1. Examples of International initiatives promoting change in research assessment
| Initiative | Year | Description |
|---|---|---|
| DORA (The San Francisco Declaration on Research Assessment) | 2012 | Calls on research actors to avoid using journal-based metrics as a surrogate measure for the quality of scientists or their work and to consider a broad range of impact measures. The TARA (Tools to Advance Research Assessment) initiative, which grew out of DORA in 2021, is developing practical tools to promote responsible research assessment. |
| Leiden Manifesto for Research Metrics | 2015 | Sets out principles and best practices for the use of quantitative indicators in research assessment. |
| INORMS (International Network of Research Management Societies) | 2018 | Brings together research management societies and associations to share good practices in managing research. Key outputs of the INORMS Research Evaluation Group – including the SCOPE Framework for Research Evaluation and the More than Our Rank initiative – encourage all stakeholders in the higher education research system to adopt responsible research evaluation. |
| FOLEC-CLACSO (Latin American Forum on Research Assessment) | 2019 | A forum for exchange on the meaning and practices of research evaluation in Latin America, initiated by the Latin American Council for Social Sciences (CLACSO). The forum provides regionally specific guidelines for research assessment. FOLEC-CLACSO’s Research Assessment Academy trains reviewers and assessment specialists to support fairer and more situated evaluation processes. |
| Hong Kong Principles for Assessing Researchers | 2019 | Principles endorsed at the 6th World Conference on Research Integrity to help research institutions minimise perverse incentives, recognise and reward trustworthy research, and support the inclusion of behaviours that strengthen research integrity in frameworks for career appraisal and advancement. |
| Science Europe | 2020 | Position Statement and Recommendations on Research Assessment Processes. |
| CoARA (Coalition for Advancing Research Assessment) | 2022 | A collective of organisations committed to reforming the assessment of research, researchers, and research organisations. The initiative is largely governed by European partners but has global signatories. Developed an Agreement on Reforming Research Assessment. |
| Barcelona Declaration on Open Research Information | 2024 | Advocates for open research information. |
| Global Research Council’s Working Group on Responsible Research Assessment | 2024 | Supports the adoption of responsible research assessment globally. Developed a new framework with 11 dimensions and guiding principles for responsible research assessment. |
These initiatives propose coordinated frameworks and widely recognised principles but also express conceptual and strategic differences that reflect the diversity and asymmetries of the global research system, and their implementation necessarily varies across national research systems and contexts. It is essential to promote sustained dialogue among initiatives, enabling the identification of areas of convergence, discussion of divergences, and mutual learning from situated experiences. Governments should consider highlighting the importance of these initiatives in national strategy and policy, and integrating research assessment approaches and their alignment with emerging international principles as criteria within cyclical institutional assessment exercises.
Empower key actors and provide flexibility in implementing reforms
Research assessment is conducted by multiple actors for a large diversity of objectives, at different levels of the research system. These objectives can be very context-specific and evolve over time. It is important to ensure that the main actors cooperate within their national context to share general principles while also providing enough flexibility for a diversity of approaches in implementation. Recent reforms of national research assessment systems in different countries have shown the value of an iterative process with top-down policy signals interacting with institutional experimentation, feedback loops, and periodic recalibration of evaluation criteria (Liang, Zhao and Li, 2024[18]) (Rushforth, 2024[19]). Rather than a one-off reform, this represents a long-term structural transition that continuously adapts to the changing research system.
Promote experimentation with new research assessment methods at national and institutional level
The acceptability of reforms in research assessment needs to be legitimised by robust evidence on the value of these reforms. Existing research assessment processes have been largely developed using sets of quantitative indicators linked to research outputs, and qualitative methods that usually involve peer review. While there is a growing movement towards reforming these processes in order to better match expected impact and incentivise various research practices, there is less evidence regarding the actual efficacy of new assessment methodologies (Dotti et al., 2024[20]) (Luxembourg National Research Fund (FNR), 2026[21]). Demonstrating the effectiveness of research assessment reforms in supporting high quality science and achieving broader objectives, whilst minimising negative side-effects, is important. It is also important to analyse potential unintended consequences and ensure that reforms do not increase the current burden that assessment represents for many research actors. Governments could consider financing pilots or collaborative institutional initiatives as research projects, using national research funding instruments, for example, which could help signal the importance of reform and support wide diffusion of the results.
Make scientific publication, data and data infrastructures as accessible as possible
Effective research assessment requires a diversity of data that are often locked in by corporate entities. It is paradoxical that organisations are still often monitoring (and incentivising) open science practices using closed data. Over-dependency on corporate databases, data infrastructures and analytical services for research assessment (and associated policymaking) reduces the autonomy of the research system and the capacity of organisations to conduct their own metadata analyses and assessments. There are already a number of robust alternatives to corporate databases that offer comprehensive coverage of knowledge production across disciplines, regions and languages (e.g., Redalyc and OpenAlex) and these should be strengthened and promoted.
Discourage the use of commercial rankings and league tables in research assessment and decision-making
National or international commercial rankings are marketing tools that may have a role in raising visibility but should not be used for research assessment or to support funding decisions. Even though their data and methods lack transparency, flaws have been demonstrated in a number of commercially operated university rankings. Their widespread adoption has many negative side-effects and has been shown to exacerbate global, regional and national inequalities (Kochetkov, 2024[22]) (Meho, 2025[23]). To counteract the influence of commercial rankings, stakeholders and decision-makers need to have access to alternative tools and methodologies for research assessment that can provide robust evidence to support their deliberative processes.
Evaluate only what and when necessary
There is growing concern about research assessment fatigue across the research system, affecting researchers, evaluators, and institutions alike. Evaluation processes require significant time and resources, and their proliferation creates administrative burdens that divert attention from research and innovation activities. In this context, it is increasingly important to ensure that research assessment is conducted only when it serves a clearly defined purpose and provides meaningful added value. This includes consideration of what kind(s) of evaluation serve the agreed purpose; for example, traditional summative evaluation (that looks at past performance) for decision-making, formative evaluation (that looks forward to enable change), or a combination of both. Whatever the methodology, the resources, data collection, and efforts involved should remain proportionate to the intended objectives and expected outcomes of the assessment.
Use AI cautiously and responsibly
In response to the increasing complexity and demand for research assessment, the potential role of AI is being actively explored in different contexts. Considering its potential impact, the adoption of AI should be conducted in a careful and responsible way, based on a deep understanding of its principles of operation and its limitations. For example, transparent deterministic models might be preferred over large language models (LLMs), which have limited transparency and explainability (Newman-Griffis et al., 2025[24]). The training history, development and deployment of any algorithmic system in research assessment should be fully transparent, and its usage compliant with international standard disclosures. Using AI in a responsible way is clearly context-specific and needs to be problem-oriented and needs-driven. The role of AI in research assessment should be cautiously explored and subject to ex-ante risk assessment, human oversight and ex-post evaluation.
Provide the necessary support for training and capacity building
Effective research assessment is largely dependent on the experience, expertise and skills of the people involved. Implementing reforms in research assessment necessarily means involving a wide range of research stakeholders: senior university administrators, senior policy and decision makers, librarians, and researchers and other staff who carry out peer review and evaluation. Many of these individuals may have been applying established assessment methodologies and their results in a set context for many years. Even if they are well inclined, they will not necessarily be well prepared to implement new RA frameworks and processes. Guidance, training and capacity building will therefore be key to successful reforms.
What can policymakers do?
Support initiatives that advance the implementation of globally agreed research assessment principles and mechanisms and mutual learning on research assessment reforms
Reduce the systematic demand for research assessment and prioritise its use in situations where the aims and benefits are clear.
Foster studies on the effectiveness of reforms in research assessment
Implement policies that support the establishment and adoption of open data infrastructures for research assessment
Reform the way rankings are being used, discouraging the use of commercial rankings in research assessment
Support careful experimentation and the development of guidelines on the use of AI in research assessment
References
[5] Bergstrom, C., J. Foster and Y. Song (2016), “Why Scientists Chase Big Problems: Individual Strategy and Social Optimality”, Physics and Society, https://arxiv.org/abs/1605.05822.
[11] Kovanis, M. et al. (2016), “The Global Burden of Journal Peer Review in the Biomedical Literature: Strong Imbalance in the Collective Enterprise”, PLOS ONE, Vol. 11/11, p. e0166387, https://doi.org/10.1371/journal.pone.0166387.
[12] Brankovic, J., L. Ringel and T. Werron (2018), “How Rankings Produce Competition: The Case of Global University Rankings”, Zeitschrift für Soziologie, Vol. 47/4, pp. 270-288, https://doi.org/10.1515/zfsoz-2018-0118.
[7] Demir, S. (2018), “Predatory journals: Who publishes in them and why?”, Journal of Informetrics, Vol. 12/4, pp. 1296-1311, https://doi.org/10.1016/j.joi.2018.10.008.
[20] Dotti, N. et al. (2024), Funding-by-lottery and other alternative models for research funding, https://doi.org/10.2777/3966961.
[6] Edwards, M. and S. Roy (2017), “Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition”, Environmental Engineering Science, Vol. 34/1, pp. 51-61, https://doi.org/10.1089/ees.2016.0223.
[13] Hazelkorn, E. and G. Mihut (2021), “Introduction: putting rankings in context looking back, looking forward”, in Research Handbook on University Rankings, Edward Elgar Publishing, https://doi.org/10.4337/9781788974981.00008.
[14] Independent Expert Group (IEG) (2023), Statement on Global University Rankings, United Nations University - International Institute for Global Health, https://doi.org/10.37941/pb/2023/2.
[22] Kochetkov, D. (2024), “University rankings in the context of research evaluation: A state-of-the-art review”, Quantitative Science Studies, Vol. 5/3, pp. 533-555, https://doi.org/10.1162/qss_a_00317.
[18] Liang, H., K. Zhao and J. Li (2024), “Responding to the new research assessment reform in China: the universities’ institutional hybrid actions”, Journal of Higher Education Policy and Management, Vol. 47/4, pp. 473-489, https://doi.org/10.1080/1360080x.2024.2446372.
[21] Luxembourg National Research Fund (FNR) (2026), Three years of Narrative CVs at the FNR: what the data tell us, https://www.fnr.lu/research-with-impact-fnr-highlight/three-years-of-narrative-cvs-at-the-fnr-what-the-data-tell-us/.
[23] Meho, L. (2025), “Gaming the metrics: bibliometric anomalies in global university rankings and the research integrity risk index (RI2)”, Scientometrics, Vol. 130/11, pp. 6683-6726, https://doi.org/10.1007/s11192-025-05480-2.
[24] Newman-Griffis, D. et al. (2025), Funding by Algorithm - A handbook for responsible uses of AI and machine learning by research funders, Research on Research Institute (RoRI), https://doi.org/10.6084/m9.figshare.29041715.
[1] OECD (2026), “New expectations and demands from science: Rethinking research assessment frameworks”, OECD Science, Technology and Industry Working Papers, No. 2026/7, OECD Publishing, Paris, https://doi.org/10.1787/0c685800-en.
[15] OECD (2024), “Explanatory memorandum on the updated OECD definition of an AI system”, OECD Artificial Intelligence Papers, No. 8, OECD Publishing, Paris, https://doi.org/10.1787/623da898-en.
[2] OECD (2021), “Effective policies to foster high-risk/high-reward research”, OECD Science, Technology and Industry Policy Papers, No. 112, OECD Publishing, Paris, https://doi.org/10.1787/06913b3b-en.
[3] OECD (2020), “Addressing societal challenges using transdisciplinary research”, OECD Science, Technology and Industry Policy Papers, No. 88, OECD Publishing, Paris, https://doi.org/10.1787/0ca0ca45-en.
[8] Öztürk, O. and Z. Taşkın (2024), “How metric-based performance evaluation systems fuel the growth of questionable publications?”, Scientometrics, Vol. 129/5, pp. 2729-2748, https://doi.org/10.1007/s11192-024-04991-8.
[19] Rushforth, A. (2024), “Beyond impact factors? Lessons from the Dutch attempt to transform academic research assessment”, Research Evaluation, Vol. 34, https://doi.org/10.1093/reseval/rvaf035.
[4] Fanelli, D. (2010), “Do Pressures to Publish Increase Scientists’ Bias? An Empirical Support from US States Data”, PLoS ONE, Vol. 5/4, p. e10271, https://doi.org/10.1371/journal.pone.0010271.
[9] Science Europe (2022), A Values Framework for the Organisation of Research, https://doi.org/10.5281/zenodo.6637847.
[10] Technopolis Group (2015), REF Accountability Review: Costs, benefits and burden, https://technopolis-group.com/report/ref-accountability-review-costs-benefits-and-burden/.
[16] Thelwall, M. (2025), “Research quality evaluation by AI in the era of large language models: advantages, disadvantages, and systemic effects – An opinion paper”, Scientometrics, Vol. 130/10, pp. 5309-5321, https://doi.org/10.1007/s11192-025-05361-8.
[17] Watermeyer, R. et al. (2025), Exploring the potential of generative AI for REF2029, https://bpb-eu-w2.wpmucdn.com/blogs.bristol.ac.uk/dist/3/1073/files/2025/11/Full-report-final.pdf.
Contact
Frédéric SGARD (frederic.sgard@oecd.org).