In the field of policy evaluation, AI can take over a range of tasks, allowing government analysts to draw on a broader range of evidence and process it faster. While some initial applications of AI have been identified in evaluation design, analysis, and evaluation communication and management, the use of AI in policy evaluation remains limited. There are therefore several areas where AI has the potential for significant impact on policy evaluation in the future.
First, to support evaluation design, chatbots could help evaluators build their knowledge in specific fields. If well prompted, chatbots can perform several activities that support learning. As some initial examples of evaluation design show, they can also support creative thinking and serve as useful brainstorming tools (Ferretti, 2023[133]). Even if these tools do not generate new evidence, they can provide new insights helpful for the initial stages of an evaluation process. For instance, ChatGPT's Deep Research attempts to automate a large part of the evidence review and synthesis process. Using chain of thought (CoT) reasoning, tools such as Deep Research break complex research questions into smaller, comprehensible sub-questions that the system answers in sequence, enabling it to prepare a detailed report based on its review of the available evidence. This may enable researchers who would previously develop a few reviews from scratch to instead automate, quality-assure and build upon dozens of AI-generated research reviews.
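The decomposition pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration of the control flow only: `answer()` stands in for a call to an LLM API, and the sub-questions are hard-coded, whereas a real tool such as Deep Research would generate both dynamically.

```python
def answer(question: str) -> str:
    # Stand-in for an LLM call; returns canned text so the sketch runs end to end.
    canned = {
        "What outcomes does the policy target?": "Lower emissions.",
        "What evidence exists on similar policies?": "Carbon taxes in the EU.",
        "What gaps remain in the evidence?": "Few ex post studies.",
    }
    return canned.get(question, "No evidence found.")

def decompose(research_question: str) -> list[str]:
    # In a real CoT pipeline the model proposes these; here they are fixed.
    return [
        "What outcomes does the policy target?",
        "What evidence exists on similar policies?",
        "What gaps remain in the evidence?",
    ]

def synthesise(research_question: str) -> str:
    # Answer each sub-question in sequence, then assemble a short report.
    parts = [f"- {q} {answer(q)}" for q in decompose(research_question)]
    return f"Report: {research_question}\n" + "\n".join(parts)

print(synthesise("What is the impact of carbon pricing?"))
```

The value of the pattern lies in the sequencing: each sub-answer narrows the evidence base for the final synthesis, which is why such tools can produce structured reviews rather than single free-form responses.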
Second, from an analytical perspective, AI has strong potential to support more ambitious ex ante and ex post evaluations, drawing on a broader range of data and assessing impact through quasi-experimental methods. For instance, AI-driven behavioural forecasting can analyse large quantities of historical data and observed behaviour to identify patterns, anticipate decisions and optimise user experiences by integrating contextual variables and external stimuli. ML tools can be used for counterfactual prediction in cases where a control group is missing. This applies, for example, to carbon pricing assessments, where policy evaluators often lack a control group for ex post analysis. One study proposes a policy evaluation approach combining ML tools and economic theory for counterfactual prediction to analyse the costs and emissions impacts of the UK Carbon Price Support (CPS), “a carbon tax levied on all fossil-fired power plants” (Abrell, Kosch and Rausch, 2022[134]).
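The counterfactual-prediction idea can be illustrated with a minimal sketch on synthetic data. This is not the method of the cited study: the single covariate, the linear model and all numbers below are illustrative assumptions. The logic, however, is the same: fit a model on the pre-policy period, use it to predict what outcomes would have been after the policy had nothing changed, and read the policy impact off the gap between observed and predicted outcomes.

```python
import random
import statistics

random.seed(0)

# Pre-policy period: emissions depend linearly on electricity demand (synthetic).
demand_pre = [random.gauss(0, 1) for _ in range(500)]
emis_pre = [3.0 * d + random.gauss(0, 0.1) for d in demand_pre]

# Post-policy period: same relationship plus a true policy effect of -2 (assumed).
TRUE_EFFECT = -2.0
demand_post = [random.gauss(0, 1) for _ in range(200)]
emis_post = [3.0 * d + TRUE_EFFECT + random.gauss(0, 0.1) for d in demand_post]

# Fit an OLS line on the pre-policy data only (closed-form slope and intercept).
mx = statistics.mean(demand_pre)
my = statistics.mean(emis_pre)
slope = (sum((x - mx) * (y - my) for x, y in zip(demand_pre, emis_pre))
         / sum((x - mx) ** 2 for x in demand_pre))
intercept = my - slope * mx

# Counterfactual: predicted post-policy emissions had the policy not existed.
counterfactual = [intercept + slope * d for d in demand_post]

# Estimated policy impact = mean of (observed minus counterfactual).
estimated_effect = statistics.mean(o - c for o, c in zip(emis_post, counterfactual))
print(round(estimated_effect, 2))  # close to TRUE_EFFECT
```

In practice the "model" would be a richer ML predictor trained on many covariates (fuel prices, weather, load), which is precisely where AI adds value over a hand-specified regression; the comparison of observed outcomes against model-predicted counterfactuals is the quasi-experimental step.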
Finally, in the longer term, AI has the potential to move policymaking beyond a linear policy cycle and allow evaluations to feed into decision-making at multiple stages. Because AI makes evaluations quicker and, to some extent, less costly, academics suggest the possibility of shifting from a system where evaluations often arrive too late for decision-making to one where evaluative evidence is available to shape, adjust and redesign policies almost in real time. This is referred to as the Dynamic Public Policy Cycle (Jacob, 2025[125]). As countries around the world have faced a series of crises in recent years, it is essential that governments have access to evaluative evidence at key decision-making stages. Rapid evaluations are designed to inform urgent decision-making and have been used effectively for this purpose, for example, in Australia (Better Evaluation Knowledge, 2022[135]). While these rapid evaluations currently rely mainly on qualitative data, AI could play an important role in making them more robust and more common in the future.
However, for AI to effectively support evaluation, governments need to invest in civil servants’ skills and in developing a strong data infrastructure. Stronger international collaboration can also enhance AI’s potential in policy evaluation. Evaluators need a good understanding of AI’s potential benefits, risks and limitations to make informed decisions on when and how to use it. For this reason, governments need to invest in training courses for evaluators to ensure they understand the different tools available to them. Training courses have been developed across OECD governments (see Chapter 4, section on “Fostering skills and talent”); however, these mainly cover the use of AI in government generally and are not tailored to the field of evaluation. In addition to training, it is important to support experimentation with AI and learning by doing. Developing a network across line ministries to exchange relevant AI applications can be a good way to support AI uptake in different evaluation tasks. Some incubators are currently being developed, but a stronger focus on evaluation is needed.
As is the case for other policy areas, governments should invest in relevant data infrastructures and in data sharing that is safe and secure (see Chapter 4, sections on “Creating a strong data foundation” and “Building out digital infrastructure” for a detailed discussion). Some government organisations, such as the Australian Centre for Evaluation, have developed guides to facilitate data discovery and access to support evaluation activities (ACE, 2025[136]). Some OECD countries have developed ways to link different datasets and access them in secure environments for policy analysis. In Denmark, for example, Statistics Denmark (2025[137]) facilitates the use of these micro-level databases for research purposes by approved analysts, universities, research organisations and ministries. In the Netherlands, the government launched the Data Agenda Government, outlining plans to improve the management of personal data, open data and big data, leveraging analysis and integration for informed policymaking and addressing societal challenges (Netherlands Ministry of the Interior and Kingdom Relations, 2019[138]).
Finally, AI has potential in evidence synthesis. There is a broader call for stronger collaboration on evidence generation across countries, following strategic initiatives supported by countries such as the United Kingdom and Australia (Halpern and Maru, 2024[139]). This agenda recognises the need for faster, reliable synthesis at the international level, and AI is already helping to shorten timelines for evidence production. This could help fill some of the existing gaps faster.