Enhancing evidence use with AI: The UN system‑wide approach
Abstract
Despite the vast amount of evaluation evidence available across the United Nations (UN) system, its use is often limited. In 2024, the UN Sustainable Development Group (UNSDG) System-Wide Evaluation Office (SWEO) conducted an AI-assisted initiative to map and summarise evaluations. The goal is to better inform UN decision makers and Member States, and thereby enhance the effectiveness and efficiency of UN programmes in support of progress towards the 2030 Agenda and the Sustainable Development Goals (SDGs).
Challenge
Like many other large institutions, the United Nations (UN) has been struggling to strengthen knowledge management and improve the use of evidence for more strategic decision making: UN evaluations are neither well known nor widely used, whether by the UN itself or by Member States seeking to assess its contribution to development results. Despite a large volume of reports, filled with evidence, real-life examples and concrete lessons for improving the policies and practices of development co-operation, evaluations remain scattered, difficult to access and underutilised. To address these challenges, the UN Sustainable Development Group System-Wide Evaluation Office (SWEO) launched an AI-assisted initiative to make evaluation evidence more accessible. The initiative also served as a proof of concept for a UN system-wide mapping of evaluations, exploring trends in evaluation practice and testing the use of AI for evidence extraction, classification and synthesis.
Approach
The UN implemented its evaluation mapping initiative between April and October 2024, resulting in new interactive maps and summary products. Drawing on the evaluation repository maintained by the United Nations Evaluation Group (UNEG), an interagency network of UN evaluation units, the initiative followed the steps below, also captured in Figure 1:
Scoping and consultation (step 0): Drafting of a project concept note for consultation; feedback from senior evaluation officers and evidence synthesis experts across the UN system.
Developing the framework (step 1): Taxonomy development based on identification of key UN development system priorities, as well as report typologies.
Setting eligibility criteria (step 2): Selection of evaluations of strategic importance that clearly contribute to the SDGs (including country-level, regional, thematic and strategic/policy); joint or pooled funding evaluations, and evaluation syntheses.
Establishing search strategy (step 3): This step involved scanning selected evaluations using information available in the UNEG evaluation repository. The final selection of evaluations was based on the criteria in step 2 and comprised 25% of the approximately 4 000 UN evaluations published in 2021-24.
Building of interactive maps (steps 4-7): Preliminary keyword coding of themes; the development and piloting of large language model (LLM) coding/data extraction for classification based on a sample of reports; the scale-up of LLM classification to all reports; and the creation of a geographical map in ArcGIS and several interactive maps in EPPI Mapper, a free, open-source software tool.
Drafting of summary products (steps 4-7): Establishing a short list of summary topics; selecting five priority topics; identifying and sampling relevant evaluations; and drafting five 10-15 page summaries detailing key insights from evaluations.
Figure 1. Approach to the AI-assisted mapping of UN cross-system initiative
Source: UNSDG System-Wide Evaluation Office (2025), Utilizing United Nations evaluation evidence in support of the 2024 QCPR, SWEO Learning Paper #1, https://www.un.org/system-wide-evaluation-office/sites/default/files/2025-03/SWEO_Evaluation%20Evidence%20Mapping%20%26%20Summaries_Learning%20Paper_Feb2025.pdf.
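The preliminary keyword coding of themes mentioned in the steps above can be illustrated with a minimal Python sketch. The theme names, keyword lists and the function name `keyword_code` are hypothetical placeholders chosen for illustration; they are not the taxonomy or tooling used by SWEO. The sketch tags a report's text with every theme whose keywords appear in it, a first pass that LLM-based classification could then refine.

```python
import re

# Hypothetical taxonomy (theme -> keywords). Illustrative only; not SWEO's actual taxonomy.
THEME_KEYWORDS = {
    "gender equality": ["gender", "women"],
    "climate": ["climate", "emissions"],
    "food security": ["food security", "nutrition"],
}


def keyword_code(text: str, taxonomy: dict[str, list[str]] = THEME_KEYWORDS) -> list[str]:
    """Return the sorted themes whose keywords occur in the text as whole words or phrases."""
    lowered = text.lower()
    themes = []
    for theme, keywords in taxonomy.items():
        # \b word boundaries avoid false hits such as "gender" inside "engendered".
        if any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in keywords):
            themes.append(theme)
    return sorted(themes)


sample = "The evaluation assessed nutrition outcomes for women in drought-affected districts."
print(keyword_code(sample))  # ['food security', 'gender equality']
```

Keyword matching of this kind is cheap and transparent but misses paraphrases and context, which is one reason a later LLM classification step is useful.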
Results
The AI-assisted mapping of UN evaluations issued in 2021-24 led to the following results:
The comprehensive overview of evaluative evidence helped enhance its availability and use. The initiative mapped and summarised approximately 1 000 UN evaluations, providing a clear picture of the state of evaluations across the UN system and demonstrating the usefulness of AI and LLMs for evidence classification and extraction. The findings showed that while UN evaluations contain valuable insights on specific evaluation criteria, and on cross-cutting or UN system-wide issues, these are often hidden within lengthy reports and underutilised, pointing to the need for improved system-wide knowledge management solutions.
The mapping helped identify the main thematic priorities of evaluations, as well as gaps in evaluation coverage. The themes set out in the Quadrennial Comprehensive Policy Review 2016-2020 (QCPR) were used to tag the evaluations, as this is the process through which UN Member States assess UN activities and agree on medium-term priorities. Mapping revealed that much evaluative evidence focused on national priorities, gender equality, and thematic areas such as climate, education and food security. Gaps were identified in areas such as results-based management, disability inclusion, SDG financing, funding quality and co‑ordination within the UN system. While many evaluations do contain evidence relating to strategic, systemic and cross-cutting issues, such as results-based management and funding, few make them the central focus (Figure 2). LLM performance also varied depending on topic complexity.
Future work aims to create living evidence maps, embedding AI tools in UN databases to make evaluation evidence more timely, relevant and aligned with key decision moments across the UN system. SWEO is involved in a joint taskforce to revamp the UNEG database by integrating AI capabilities for improved document processing, categorisation and extraction of information.
Figure 2. Results of UN evaluation mapping, by theme, 2021-24
Source: UNSDG System-Wide Evaluation Office (2025), Utilizing United Nations evaluation evidence in support of the 2024 QCPR, SWEO Learning Paper #1, https://www.un.org/system-wide-evaluation-office/sites/default/files/2025-03/SWEO_Evaluation%20Evidence%20Mapping%20%26%20Summaries_Learning%20Paper_Feb2025.pdf.
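The distinction drawn in the results between themes that evaluations merely touch on and themes they make their central focus can be pictured with a small tally. This is a sketch under stated assumptions: the record structure, field names (`mentions`, `focus`) and data are invented for illustration and do not reproduce the SWEO dataset or method.

```python
from collections import Counter

# Invented records: each evaluation lists the themes it mentions and its single central theme.
evaluations = [
    {"id": "E1", "mentions": ["climate", "results-based management"], "focus": "climate"},
    {"id": "E2", "mentions": ["gender equality", "results-based management"], "focus": "gender equality"},
    {"id": "E3", "mentions": ["climate", "funding quality"], "focus": "climate"},
]


def coverage(records):
    """Map each theme to (evaluations mentioning it, evaluations with it as central focus)."""
    mentioned = Counter(theme for r in records for theme in set(r["mentions"]))
    central = Counter(r["focus"] for r in records)
    return {theme: (mentioned[theme], central[theme]) for theme in sorted(mentioned)}


def gaps(records):
    """Themes that are mentioned somewhere but are never the central focus."""
    return [theme for theme, (_, focus_count) in coverage(records).items() if focus_count == 0]


print(coverage(evaluations))
print(gaps(evaluations))  # ['funding quality', 'results-based management']
```

In this toy data, results-based management is mentioned in two evaluations but is the focus of none, mirroring the kind of pattern the mapping surfaced.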
Lessons learnt
Using mature, commercially available LLMs allowed for effective evidence extraction, classification and abstract generation without the need for model training or development. Despite encouraging results, some issues arose when LLM security settings filtered out potentially harmful content (e.g. child protection issues). Collaboration with LLM companies, improved data governance, and AI-supported automation will help to enhance evaluation knowledge management. Future LLM versions are expected to further improve in accuracy.
Having a “human in the loop” is vital. The team set out to meet targets for LLM “percentage accuracy” measured against a human-tagged set of reports. This proved challenging because tagging varied across human coders, despite clear protocols, especially when multiple experts were involved. Nonetheless, expert familiarity with a sample of reports proved vital for the effective and efficient review of multiple iterations and the re-engineering of prompts, in a process of “human-AI collaboration” on the classification of evidence.
Small, focused, interdisciplinary teams enhance outcomes. Blending evaluation, synthesis and machine learning/AI expertise enabled rapid generation of high-quality outputs at relatively low cost. Three core team members produced the evaluation evidence mapping in 8-10 weeks. This experience suggests that greater value may be achieved by working with a smaller, focused team that prioritises multiple prompt iterations.
Standardising and improving data inputs and processing are essential for effective AI-assisted evaluation and knowledge management. The pilot highlighted limitations in existing UN evaluation databases, variations in report formats and structures, and inconsistent use of tagging and metadata, all of which hinder effective automation and analysis. This points to the need for more consistent knowledge management practices and evaluation structures.
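The comparison of LLM tags against a human-tagged set, and the variation between human coders noted above, can be made concrete with two simple measures. The sketch below computes raw percentage agreement and Cohen's kappa, which corrects agreement for chance. Cohen's kappa is introduced here as an illustrative assumption; the source does not state which agreement statistic, if any, the team used. The labels and data are invented, and each report is assumed to carry a single label for simplicity.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of items on which two label lists agree."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement between two coders, corrected for agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both coders pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for four reports (one theme per report).
human_1 = ["climate", "gender", "climate", "funding"]
human_2 = ["climate", "gender", "funding", "funding"]
llm = ["climate", "gender", "climate", "climate"]

print(percent_agreement(human_1, llm))      # 0.75
print(percent_agreement(human_1, human_2))  # 0.75
print(round(cohens_kappa(human_1, human_2), 3))
```

In this toy example the LLM agrees with one human coder exactly as often as the two human coders agree with each other, which illustrates why an accuracy target defined against a single human-tagged set is hard to interpret when coders themselves differ.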
Further information
UNSDG System-Wide Evaluation Office (2025), Utilizing United Nations evaluation evidence in support of the 2024 QCPR, SWEO Learning Paper #1, https://www.un.org/system-wide-evaluation-office/sites/default/files/2025-03/SWEO_Evaluation%20Evidence%20Mapping%20%26%20Summaries_Learning%20Paper_Feb2025.pdf.
UNSDG System-Wide Evaluation Office, United Nations Evaluation Evidence Map: Coverage of 2020 Quadrennial Comprehensive Policy Review Priorities, https://www.sdgsynthesiscoalition.org/sites/default/files/2024-10/UNSWE_Interactive%20Evaluation%20Evidence%20Map_QCPR_coverage_v1.0.html (accessed 31 July 2025).
UNSDG System-Wide Evaluation Office, United Nations Evaluation Evidence Map: Detailed Evidence on 2020 Quadrennial Comprehensive Policy Review Priorities, https://www.sdgsynthesiscoalition.org/sites/default/files/2024-10/UNSWE_Interactive%20Evaluation%20Evidence%20Map_QCPR_detailed%20evidence_V1.0.html (accessed 31 July 2025).
UNSDG System-Wide Evaluation Office, United Nations Evaluation Evidence Map: Coverage of Sustainable Development Goals, https://www.sdgsynthesiscoalition.org/sites/default/files/2024-10/UNSWE_Interactive%20Evaluation%20Evidence%20Map_SDGs_v1.0.html (accessed 31 July 2025).
UNSDG System-Wide Evaluation Office, United Nations Evaluation Evidence Map: Country Coverage (2021–2024), https://system-wide-evaluation-office.github.io/UN_evaluation_evidence_map/ (accessed 31 July 2025).
UNSDG System-Wide Evaluation Evidence Summaries, https://www.un.org/system-wide-evaluation-office/en.
OECD resources
OECD (2025), Harnessing AI for efficient use of evaluation evidence in Finland, Development Co-operation TIPs • Tools Insights Practices, https://read.oecd.org/10.1787/ec059edb-en.
OECD (2024), Recommendation of the Council on Artificial Intelligence, C/MIN(2024)16/FINAL, https://legalinstruments.oecd.org/en/instruments/oecd-legal-0449.
Lorenz, P., K. Perset and J. Berryhill (2023), “Initial policy considerations for generative artificial intelligence”, OECD Artificial Intelligence Papers, No. 1, OECD Publishing, Paris, https://doi.org/10.1787/fae2d1e6-en.
OECD (2021), Applying Evaluation Criteria Thoughtfully, OECD Publishing, Paris, https://doi.org/10.1787/543e84ed-en.
To learn more about other development co-operation practitioners, see:
More In Practice examples available on Development Co-operation TIPs • Tools Insights Practices.
This work is published under the responsibility of the Secretary-General of the OECD. The opinions expressed and arguments employed herein do not necessarily reflect the official views of the Member countries of the OECD.
This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area.
Photo credits: ©Andrey Popov/Shutterstock.
© OECD 2025
Attribution 4.0 International (CC BY 4.0)
This work is made available under the Creative Commons Attribution 4.0 International licence. By using this work, you accept to be bound by the terms of this licence (https://creativecommons.org/licenses/by/4.0/).
Attribution – you must cite the work.
Translations – you must cite the original work, identify changes to the original and add the following text: In the event of any discrepancy between the original work and the translation, only the text of original work should be considered valid.
Adaptations – you must cite the original work and add the following text: This is an adaptation of an original work by the OECD. The opinions expressed and arguments employed in this adaptation should not be reported as representing the official views of the OECD or of its Member countries.
Third-party material – the licence does not apply to third-party material in the work. If using such material, you are responsible for obtaining permission from the third party and for any claims of infringement.
You must not use the OECD logo, visual identity or cover image without express permission or suggest the OECD endorses your use of the work.
Any dispute arising under this licence shall be settled by arbitration in accordance with the Permanent Court of Arbitration (PCA) Arbitration Rules 2012. The seat of arbitration shall be Paris (France). The number of arbitrators shall be one.