2. Constructing a framework to measure AI capabilities

Abstract
The AI and Future of Skills (AIFS) project at the OECD’s Centre for Educational Research and Innovation (CERI) presents a framework to systematically measure artificial intelligence (AI) and robotic capabilities and compare them to human skills. This chapter describes the OECD’s methodology to develop the beta indicators presented in this volume. It shows how the indicators can provide clear and evidence-based insights to policy makers about AI developments and their implications for society, work and education. The OECD also argues that the indicators provide a principled and cautious framework to measure progress towards artificial general intelligence (AGI). To that end, it gives policy makers the tools needed to independently verify claims from technology leaders and researchers about AI progress.
AI has progressed beyond our understanding of its capabilities
Depending on the source, artificial intelligence (AI) is either set to save the world or destroy it. In a landscape dominated by hype and fear, clear, reliable and nuanced information about the true capabilities of AI remains strikingly absent. Even AI developers do not understand the current capacities of AI systems – or how rapidly they are advancing. As an independent and authoritative international body, the OECD is well positioned to fill this knowledge gap. Drawing on its experience of comparative assessment, extensive collaboration with leading computer scientists and engineers, and its international perspective, the OECD has developed a unique methodology to deliver rigorous, evidence-based and clear insights into the real-world performance of AI. This framework offers policy makers the clarity they urgently need to navigate an increasingly complex technological environment and craft informed, future-proof strategies.
Since the launch of ChatGPT in 2022, AI and robotics have advanced rapidly, and policy makers worldwide recognise the need to assess their capabilities. The AI Act of the European Union (EUR-Lex, 2024[1]), for example, mandates regular monitoring. For their part, the OECD Council’s AI Recommendation (OECD Legal Instruments, 2024[2]) and the 2025 Paris AI Summit (La Maison Élysée, 2025[3]) emphasise the importance of understanding AI’s influence on the job market.
Despite this increased attention, a persistent gap remains: no systematic framework comprehensively measures AI capabilities in a way that is both understandable and policy relevant. To address this gap, the OECD has developed a framework for evaluating AI capabilities, introducing its beta AI Capability Indicators (see Figure 2.1 for an overview of their development process). The OECD’s indicators are designed to be:
1. Understandable – communicating AI strengths and limitations in a straightforward manner.
2. Policy relevant – offering insights into AI’s impact on education, employment and the economy.
3. Comprehensive – covering all critical aspects of AI capabilities.
4. Responsive – tracking AI progress over time through systematic updates.
Linking AI capabilities to human abilities allows policy makers to gauge AI’s potential role in education, work and everyday life. Existing frameworks that characterise AI capabilities, such as MLCommons (MLCommons, 2025[4]) and Stanford’s AI Index (Maslej et al., 2025[5]), discuss capabilities purely in terms of benchmark performance, without any comparison to human abilities. Taken in isolation, benchmark results are unclear to non-AI experts, and even AI researchers find it difficult to judge how results relate to AI systems’ capacity to perform tasks in real-world situations.
In a discussion of the limitations of current AI benchmarks, the authors of the 2025 AI Index report noted that “(t)o truly assess the capabilities of AI systems, more rigorous and comprehensive evaluations are needed” (Maslej et al., 2025[5]). The OECD’s framework is unique in its effort to compare AI capabilities to the full range of human abilities used in the job market, recognising that in many cases adequate AI benchmarks for the advanced levels are still lacking. The indicators provide a conceptual framework for identifying or developing tests that can evaluate AI’s capabilities systematically across the full range of human skill domains relevant to life and employment. Frameworks based on AI benchmark performance without a human baseline risk being quickly overtaken by rapid AI developments. An additional benefit of grounding a framework for evaluating AI capabilities in human abilities is that human abilities are relatively fixed over time. In other words, the framework will remain stable and informative amid rapid AI progress until AI truly surpasses the full range of human performance.
Anthropic’s AI Economic Index (Handa et al., 2025[6]) adopted a novel approach to analysing the impact of AI developments on the economy by linking Claude.ai’s capabilities to some human tasks. It did so by analysing millions of user interactions with Claude large language models (LLMs) and mapping the tasks performed in those conversations to tasks found in the O*NET database. However, this approach was limited to analyses of chat-based interactions with LLMs. As such, it did not aim to compare AI performance to the full range of human abilities used in occupations. The OECD’s own approach to leveraging its AI Capability Indicators to analyse the impact of AI developments on occupations is described in Chapter 4 of this report.
Acknowledging the limitations of its current framework, the OECD presents these indicators in beta form, inviting collaboration with experts in AI and human psychology. The ratings of AI systems presented below were finalised in November 2024 and therefore reflect the state of the art at that time. Future refinements will strengthen the indicators as a precise and responsive tool for tracking AI developments.
This chapter introduces the motivation for the indicators and the methodology for constructing them (see Figure 2.1 for an overview of the indicators’ development). It also acknowledges the limitations of the beta scales identified during our peer review process. A more detailed discussion of the methodology can be found in the technical volume released alongside this report (OECD, 2025[7]). The chapter ends with a discussion of the ways policy makers can use the indicators to help answer the questions arising from rapid AI development.
Chapter 3 presents all nine scales, the descriptors of performance at each level and the ratings of state‑of‑the‑art AI systems. Each indicator also includes a discussion of what is important to measure in each domain and the available evidence of AI performance.
Chapter 4 gives more specific examples of how researchers and policy makers can leverage the scales to provide evidence-based analysis of the impact of AI and its implications for the workforce and education systems.
Methodology: A novel and unique approach
From tasks to capabilities: A new approach to assessing AI
Understanding the impact of AI on society requires more than evaluating whether it can perform specific jobs or tasks. While task-based analyses have been central in labour economics, they often lack clarity and coherence when applied to AI; most human tasks involve multiple, interrelated capabilities.
In recent years, the economics literature has moved from occupation-based analyses of the impact of AI on work to analyses based on tasks. This shift recognises that occupations are collections of tasks that are likely to be affected quite differently by any given AI technique or other technology. This task-based approach has been a major innovation in labour economics research (Autor, Levy and Murnane, 2003[8]).1 However, it fails to provide a clear framework for describing developments in AI. First, task taxonomies are difficult to understand because there are thousands or tens of thousands of distinct work tasks in the modern economy; in contrast, there are a relatively small number of basic capabilities. Second, individual tasks typically demand multiple skills, each of which may be affected quite differently by a given AI technique. Tasks alone therefore fail to provide a clear vocabulary for communicating key advances or limitations in AI.
The OECD instead adopts a capability-based framework, shifting focus from fragmented tasks to core human abilities2 such as reasoning, language, social interaction and psychomotor skills. Grounded in human psychology, this approach offers a structured and high-level perspective on AI development. To show AI development across the full range of human abilities, the OECD has developed nine AI Capability Indicators shown in Figure 2.2.
Figure 2.1. The development of the OECD AI Capability Indicators
Figure 2.2. OECD AI Capability Indicators
Constructing and developing the indicators
Building scales with five levels
The OECD collaborated with a core group of 30 computer scientists, psychologists and assessment experts to develop comprehensive indicators that capture AI’s progression from simple to complex tasks; distinguish qualitatively significant breakthroughs; and provide meaningful understanding of AI’s capabilities compared to human abilities.
The OECD created scales with five levels to represent the increasing difficulty of tasks for AI systems (see Figure 2.3). The scales aim to provide coverage for all types of AI and robotics systems. The current version of the scales considers narrow symbolic AI systems, neuro-symbolic systems, LLMs, social agents and robotics systems, as they appear in the AI subfields working on each capability. In each scale, level 1 reflects solved AI challenges (e.g. Google Search’s retrieval capabilities), while level 5 represents performance that simulates all aspects of the corresponding human abilities.
The OECD’s primary motivation for developing five-level scales was to communicate progression in AI capabilities in a manner understandable to those outside of the field. Each scale generally includes several dimensions that reflect varying difficulties for AI. These levels are marked by clear qualitative differences, not just gradual improvements in performance. Each indicator identifies the current level of performance of AI systems on the five-level scale.
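To make this structure concrete, the sketch below shows one minimal way a five-level scale and its rating could be represented in code. It is purely illustrative: the class name, fields and level descriptors are assumptions for exposition, and the descriptors paraphrase the general scheme rather than quoting official OECD level definitions.

```python
from dataclasses import dataclass

@dataclass
class CapabilityScale:
    """One AI capability domain represented as a five-level scale."""
    name: str                     # capability domain, e.g. "Language"
    level_descriptors: list[str]  # descriptors for levels 1..5, simple to complex
    current_level: int = 1        # highest level with robust, reliable performance

    def describe_current(self) -> str:
        """Return the descriptor matching the current state-of-the-art rating."""
        return self.level_descriptors[self.current_level - 1]

# Illustrative paraphrase of the generic level scheme, not official wording
language = CapabilityScale(
    name="Language",
    level_descriptors=[
        "Solved challenges, often quantitatively superhuman (e.g. retrieval)",
        "Reliable performance on narrow, well-benchmarked tasks",
        "Robust performance across standard benchmark suites",
        "Emerging capabilities with partial or contested evidence",
        "Simulates all aspects of the corresponding human ability",
    ],
)
print(language.describe_current())
```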
Ideally, each scale would be supported by formal tests with comparable results for both AI systems and humans, but such tests do not yet exist for many areas. Instead, the OECD draws on whatever evidence is available and uses expert judgement to fill in the gaps. The evidence may include formal tests but also competitions or analyses of the performance of individual AI systems. Expert input is also used to critically assess the relevance and interpretation of existing tests, ensuring that performance claims are well-grounded in evidence. Standardised human assessments such as the Programme for International Student Assessment (PISA) and the Programme for the International Assessment of Adult Competencies (PIAAC) have occasionally been used to evaluate AI. However, their primary value lies in inspiring the design of new AI tests that can better reflect the full range of human capabilities relevant to AI systems.
Future extensions of the scale may include aspects of AI capabilities that go beyond the scope of human abilities, which can be described as qualitatively superhuman aspects of AI performance. This contrasts with quantitatively superhuman performance. In the latter case, AI simply outperforms humans on speed, accuracy or data coverage but is otherwise doing things that humans can also do. The concept of superhuman capabilities is not unique to AI. Historically, innovations like microscopes, calculators and power tools have enabled humans to perform tasks well beyond our natural limits. However, AI raises new questions about capabilities that require intelligent behaviour rather than just mechanical or computational advantage.
Figure 2.3. Overview of the five levels of AI
Validating the indicators
The development process evolved over five years, culminating in a structured peer review in late 2024. The review involved 25 researchers from the fields of AI, psychology and education, divided into two groups:
1. Overall reviewers – evaluating the comprehensiveness of the framework
2. Domain-specific reviewers – assessing individual scales within their areas of expertise.
Expert feedback has guided refinements, shaping the current beta version. Future iterations will integrate additional evidence, standardise measurement methodologies and enhance policy relevance. The OECD hopes for more feedback from AI researchers, psychologists, education specialists and economists to further refine the beta indicators.
Linking AI performance measures
The goal of linking AI performance measures to the scales is to identify the highest level on each scale at which AI shows robust and reliable performance at a specific point in time. The scale levels are ordered by difficulty. Consequently, when AI is rated at a given level, the state of the art can simulate all aspects of the capability up to that level in a robust and reliable way. However, due to design constraints, any specific AI system may lack some aspects of the capability. The scales reflect the general state of the art, not specific system limitations.
Some scales highlight that lower-level capabilities in narrow applications may not be easily integrated into a system with generalised performance. The bottom levels reflect solved problems where AI is often quantitatively superhuman – performing tasks faster and more accurately than humans. Middle levels often have standard benchmark tests, while higher levels cover emerging or unevaluated aspects of the capabilities that still challenge current AI systems. Experts provided estimates for these higher levels, even in the absence of solid performance measures.
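As a sketch of the rating logic just described (an illustrative reading, not the OECD’s formal procedure): because levels are ordered by difficulty, a scale’s rating is the highest level up to which every level shows robust, reliable performance.

```python
def rate_scale(level_met: list[bool]) -> int:
    """Return the highest level L such that levels 1..L are all robustly met.

    level_met[i] is True if the state of the art shows robust, reliable
    performance at level i+1. Returns 0 if even level 1 is unmet.
    """
    rating = 0
    for met in level_met:
        if not met:
            break  # higher levels cannot count if a lower level is unmet
        rating += 1
    return rating

# Levels 1-3 robustly demonstrated, level 4 not yet; an isolated level-5
# result does not raise the rating because a level below it is unmet.
print(rate_scale([True, True, True, False, True]))  # -> 3
```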
Limitations
A major challenge in evaluating AI lies in the uneven availability of measurement tools across different capabilities. While areas like language and vision benefit from decades of benchmarks, others – such as social interaction and creativity – lack formal assessments. In these cases, expert judgement helps fill the gaps.
Although a fully systematic AI assessment framework does not yet exist, this approach collects current evidence and highlights key areas for future development. The indicators are still in beta form, with plans for refinement through collaboration with researchers and experts.
The beta version of the AI Capability Indicators offers a valuable foundation for understanding what AI can and cannot do. However, advancing this framework will require deeper collaboration with a broader community of experts. Key next steps comprise:
Expanding coverage: Include additional capabilities and indicators to capture a fuller picture of AI’s range.
Refining structure: Ensure consistency in benchmarks, account for multi-agent systems and manage overlapping capabilities.
Clarifying human performance baselines: Since the scales vary in whether they compare AI to average or expert human performance, provide greater consistency – or intentional differentiation – to reflect meaningful contrasts between AI and human difficulty.
Enabling dynamic updates: Create a process for regular reviews and maintain a repository3 of performance measures to stay current with rapid AI advances.
By aligning AI capabilities with core human abilities, the OECD indicators provide a transparent, policy‑relevant tool for assessing AI’s societal impact, especially in areas like work and education. While still evolving, this framework represents a key step towards a more systematic, evidence-based understanding of AI progress. A more detailed discussion of the methodology used to develop the indicators, along with its limitations, can be found in the technical volume published alongside this report (OECD, 2025[7]).
Next steps
After the beta indicators have been suitably refined, the OECD aims to pursue additional activities. These will ensure the indicators remain responsive to AI developments and help AI researchers design and implement valid and informative tests of AI capabilities.
1. Regular updates
The OECD will implement a cycle of regular updates. Each cycle will monitor improved AI results on existing benchmark tests that are linked to the scales, monitor the scientific literature to identify new benchmark tests that should be linked to the scales, and synthesise the results into statements about AI’s current performance in relation to the scales (a minimal sketch of this synthesis step follows). The updates will be vetted with the OECD’s network of AI researchers and psychologists. The OECD plans to develop the updating methodology through the remainder of 2025, with the first update carried out at the beginning of 2026.
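The sketch below illustrates what the synthesis step could look like, assuming a simple in-memory collection of benchmark results. The record fields and the synthesis rule are illustrative assumptions, not the OECD’s actual updating methodology, and expert vetting would follow any such automated step.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    benchmark: str  # e.g. an entry submitted to the online repository (note 3)
    scale: str      # which of the nine indicators the benchmark is linked to
    level: int      # scale level (1-5) the benchmark is judged to test
    passed: bool    # whether state-of-the-art systems robustly pass it

def synthesise_update(results: list[BenchmarkResult], scale: str) -> int:
    """Illustrative rule: the provisional rating is the highest level with a
    robustly passed benchmark and no failed benchmark at or below it."""
    failed = {r.level for r in results if r.scale == scale and not r.passed}
    passed = {r.level for r in results if r.scale == scale and r.passed}
    rating = 0
    for level in range(1, 6):
        if level in failed:
            break
        if level in passed:
            rating = level
    return rating

results = [
    BenchmarkResult("benchmark-a", "Language", 2, True),
    BenchmarkResult("benchmark-b", "Language", 3, True),
    BenchmarkResult("benchmark-c", "Language", 4, False),
]
print(synthesise_update(results, "Language"))  # -> 3
```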
2. Anticipating AI breakthroughs
Thus far, the OECD has focused on developing descriptions of AI’s current capabilities that have been demonstrated in the literature. To extend the indicators’ potential usefulness in understanding AI’s implications, the OECD will develop an approach to anticipating potential breakthroughs. To that end, it will invite expert groups to analyse key research plans in the field to describe the scope of potential breakthroughs in relation to the indicators. The goal is to analyse research related to capabilities that AI currently lacks to understand which capabilities may be poised for breakthroughs in performance. The OECD plans to carry out the initial effort to link the indicators to AI breakthroughs in 2026.
3. Expert survey
To complement expert judgements, the OECD will develop a formal periodic expert survey to review and provide input on key statements about AI’s capabilities, modelled after the University of Chicago Economic Experts Panel (Clark Center for Global Markets, 2025[9]). This survey will provide regular updates on aspects of AI’s capabilities that public benchmarks do not currently assess. Such a survey will provide early indications when new work is starting to focus on AI capabilities that have previously been too difficult for the field to address. The OECD plans to recruit experts for the panel during 2025, and then launch the panel in 2026. It will conduct monthly surveys following the Chicago model.
4. New benchmark tests and competitions
To provide more complete information about developing AI capabilities, the OECD will identify specific levels on the scales where benchmark tests and competitions are currently inadequate. These scale levels will reflect areas that should be monitored to identify when new AI capabilities are poised for substantial development. The long-term goal is to develop a new testing programme that can provide independent and authoritative benchmark test results for these missing levels of AI capabilities. In so doing, it would provide adequate information to the public about AI developments. The OECD plans to hold an initial workshop in 2026 to discuss candidate levels where new tests or competitions could be beneficial and identify available assessment approaches. This will lead to initial assessment development work in 2027.
The role of the AI Capability Indicators
The OECD’s indicators offer policy makers an evidence-based tool to assess AI progress in terms understandable for a non-technical audience. This knowledge is particularly valuable for implementing major policy initiatives like the AI Act of the European Union and the OECD AI Recommendation.
Using the indicators to anticipate the impact of AI on education, work and society
As AI systems evolve, their growing capabilities raise new questions about when and how they should be trusted, deployed or constrained. The indicators help clarify which levels of AI performance may trigger ethical or safety concerns. These include decision making without accountability, or autonomy in high-stakes domains like warfare or health care.
Beyond ethics, the indicators offer a practical tool to analyse how AI capabilities align with the demands of human work. This makes it possible to identify occupations more exposed to automation and to anticipate broader economic impacts. While the indicators do not predict whether AI will replace human workers, they highlight where it could technically perform key tasks. This prompts deeper analysis of economic, regulatory or ethical barriers to adoption.
In education, the indicators provide insights into both the use of AI in teaching and the evolving skill sets students will need. As AI becomes able to perform more complex tasks, education systems must consider which abilities remain essential for humans – whether for practical, cognitive or intrinsic reasons – and how to prepare future generations to thrive alongside increasingly powerful AI systems.
A framework for defining and measuring artificial general intelligence
The OECD AI Capability Indicators offer a potential framework for defining and measuring artificial general intelligence (AGI), which is generally understood as AI that matches the full range of human cognitive and social abilities. While many believe AGI is still far off, some tech leaders and researchers warn of its imminent arrival and potential existential risks. In this context, the need for clear, evidence-based monitoring tools becomes more urgent.
Existing attempts to define superhuman AGI rely on characterisations of human abilities that are abstract and difficult to measure in practice. For example, Google DeepMind defined a superhuman AGI system as one that "outperforms 100% of humans" across all tasks (Morris et al., 2024[10]). In contrast, the OECD's AI Capability Indicators provide a framework to systematically compare AI developments with human performance across the range of human ability domains. In so doing, they provide meaningful descriptions of AI’s capabilities and limitations that do not require policy makers to directly interpret the results of AI tests.
This is particularly important as AI capabilities will likely increase at different rates in different domains. The development of LLM capabilities has been described as a "jagged frontier" (Dell’Acqua et al., 2023[11]) because of relatively advanced capabilities in some domains (e.g. breadth of factual knowledge) and limited ones in others (e.g. formal reasoning). This is likely to continue with future AI advances up to and including hypothetical AGI systems. Being able to track the strengths and weaknesses across domains relevant to human abilities will be important to map the social, economic and political implications of AI advancements.
The indicators allow policy makers to track AI progress systematically, with level 5 performance across all scales representing a possible benchmark for human-level general intelligence. By capturing advanced AI capabilities such as creativity, reasoning and metacognition, the framework helps bridge the gap between public concern and technical reality. Where level 5 performance remains elusive, potential “AGI-level” risks can be weighed against empirical evidence rather than conjecture. This enables more grounded, forward‑looking policy responses.
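Under this framing, the benchmark mentioned above admits a deliberately simple operationalisation, sketched below. The function and the domain names in the example are illustrative assumptions, and the example ratings are invented for demonstration, not actual OECD ratings.

```python
def meets_agi_benchmark(ratings: dict[str, int]) -> bool:
    """True only if every one of the nine capability scales is rated level 5."""
    return len(ratings) == 9 and all(level == 5 for level in ratings.values())

# Invented ratings for illustration only (domain names are placeholders)
ratings = {
    "Language": 3, "Social interaction": 2, "Problem solving": 2,
    "Creativity": 3, "Metacognition and critical thinking": 2,
    "Knowledge, learning and memory": 3, "Vision": 3,
    "Manipulation": 2, "Robotic intelligence": 2,
}
print(meets_agi_benchmark(ratings))  # -> False
```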
Supporting actors’ engagement with the indicators
Over time, the OECD intends for these indicators to serve as a global reference point for governments, academic researchers and industry. Achieving that goal will involve several key strategies:
Usability and outreach: Developing interactive formats and searchable databases so that a wide array of stakeholders can easily leverage the indicators.
Practical tools: Providing both quantitative and qualitative resources (e.g. occupational vignettes) that illustrate how emerging AI capabilities will alter workplace and educational contexts.
Systematic tracking: Collecting diverse use cases from policy, business and research communities to create an accessible knowledge base of real-world applications.
Continuous stakeholder involvement: Engaging experts, practitioners and policy makers in the indicators’ refinement, promoting broad adoption and collaborative improvement.
Ultimately, these strategies aim to ensure that the OECD AI Capability Indicators bridge the gap between technical assessments and actionable policy. By measuring AI capabilities comprehensively, governments and stakeholders can more accurately anticipate the benefits and risks of emerging AI technologies. This will allow them to take responsible, informed steps towards sustainable, innovation-driven growth (MLCommons, 2025[4]).
References
[8] Autor, D., F. Levy and R. Murnane (2003), “The skill content of recent technological change: An empirical exploration”, The Quarterly Journal of Economics, Vol. 118/4, pp. 1279-1333, https://academic.oup.com/qje/article/118/4/1279/1925105.
[9] Clark Center for Global Markets (2025), Kent A. Clark Center for Global Markets, https://www.kentclarkcenter.org/us-economic-experts-panel/.
[11] Dell’Acqua, F. et al. (2023), “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality”, Harvard Business School Working Paper, No. 24-013, https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf.
[12] Eloundou, T. et al. (2023), “GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models”, arXiv, https://arxiv.org/abs/2303.10130.
[1] EUR-Lex (2024), Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024, EUR-Lex, https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng.
[6] Handa, K. et al. (2025), “Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations”, arXiv, https://arxiv.org/abs/2503.04761.
[3] La Maison Élysée (2025), Statement on Inclusive and Sustainable Artificial Intelligence for People and the Planet, 11 February, La Maison Élysée, Paris, https://www.elysee.fr/en/emmanuel-macron/2025/02/11/statement-on-inclusive-and-sustainable-artificial-intelligence-for-people-and-the-planet.
[5] Maslej, N. et al. (2025), Artificial Intelligence Index Report 2025, https://arxiv.org/abs/2504.07139.
[4] MLCommons (2025), Better AI for Everyone, https://mlcommons.org/ (accessed on 27 May 2025).
[10] Morris, M. et al. (2024), Levels of AGI for Operationalizing Progress on the Path to AGI, DeepMind, https://deepmind.google/research/publications/66938.
[7] OECD (2025), AI and the Future of Skills Volume 3: The OECD AI Capability Indicators, OECD Publishing.
[2] OECD Legal Instruments (2024), Recommendation of the Council on Artificial Intelligence, OECD, Paris, https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449.
Notes
1. For a task-based approach to evaluating LLMs’ impact on occupations, see (Eloundou et al., 2023[12]) or (Handa et al., 2025[6]).
2. Throughout this report, we use the terms “capability” to refer to the basic types of things that AI can do and “ability” to refer to the basic types of things that humans can do.
3. The OECD is launching an online repository alongside this report to systematically collect evidence from benchmarks that test the AI capabilities described in the indicators. At https://aicapabilityindicators.oecd.org, AI researchers will be able to submit benchmarks and other forms of AI evaluation that assess any of the capabilities in our scales. Submitted evaluations will be reviewed by the OECD and its expert group to judge their suitability for use in future updates of the scales.