Introducing the OECD AI Capability Indicators

1. Overview of current AI capabilities
Abstract
The AI and the Future of Skills (AIFS) project at the OECD’s Centre for Educational Research and Innovation (CERI) presents a framework to systematically measure artificial intelligence (AI) and robotic capabilities and compare them to human skills. This chapter gives an overview of AI performance across each of the OECD’s AI Capability Indicators. The first section introduces a comparative table and provides the information needed to understand it. The table indicates the current level of AI in each domain and describes the sorts of capabilities possessed by cutting-edge AI systems in November 2024. Below the table, a brief commentary explains the rationale for the OECD’s expert group rating of AI systems at that level and the capabilities that would allow AI systems to progress to the next level.
Comparative table of current AI capabilities
Table 1.1 features an overview of the current capabilities of the most advanced artificial intelligence (AI) systems. The current level applied to AI systems in each domain is printed next to a description of the sort of capabilities AI systems possess at that level. Below the table, a commentary briefly describes why the OECD’s expert group rated AI systems at that level and the capabilities that would allow AI systems to progress to the next level on the scale.
The OECD has developed five-level scales to communicate progression in AI capabilities in a manner understandable to those outside of the field. The scales aim to provide coverage for all types of AI systems. The current ratings include narrow symbolic AI systems, neuro-symbolic systems, large language models (LLMs), social agents and robotics systems at the cutting edge of given domains. At one end, level 1 reflects long-solved and uncontroversially trivial aspects of capabilities for current AI systems. At the other, level 5 AI systems can replicate all aspects of the corresponding human ability. The intermediate three levels show the development of different aspects of AI performance towards full human equivalence.
The OECD explains its approach to developing the scales in Chapter 2 and, in more detail, in the complementary technical report (OECD, 2025[1]). The ratings of AI systems reflect the state of the art in November 2024.
To be ranked at a given level, an AI system must consistently and reliably possess most aspects of the capability described at that level. For example, our experts placed LLMs at the threshold between level 2 and level 3 on the Language scale. LLMs have many aspects of language capability described at level 3. However, they are held back by their inability to engage in well-formed analytical reasoning, a tendency to hallucinate incorrect information and an incapacity to learn dynamically. Nevertheless, as LLMs fulfil most of the other aspects of language capability at this level, they are rated at level 3.
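The ranking rule above can be read as a simple decision procedure. The Python sketch below is a hypothetical illustration only and is not the OECD's method, because the expert group rates systems by judgement. Several elements are assumptions made for illustration: the function name `rated_level`, reading "most aspects" as more than half, requiring every lower level to be met as well, and the aspect profile itself.

```python
# Hypothetical sketch of the "most aspects" ranking rule.
# Assumptions, not OECD methodology: "most" means a fraction above
# `threshold`, and a level only counts if all lower levels also count.


def rated_level(fulfilled: dict[int, list[bool]], threshold: float = 0.5) -> int:
    """Return the highest level (1-5) whose fraction of fulfilled aspects,
    and that of every lower level, exceeds `threshold`.

    `fulfilled[level]` holds one bool per aspect described at that level:
    True if the system consistently and reliably shows that aspect.
    Returns 0 if not even level 1 is reached.
    """
    level = 0
    for lvl in range(1, 6):
        aspects = fulfilled.get(lvl, [])
        if aspects and sum(aspects) > threshold * len(aspects):
            level = lvl
        else:
            break  # stop at the first level that is not mostly fulfilled
    return level


# Hypothetical aspect profile loosely modelled on LLMs on the Language scale:
# at level 3, four aspects are met, while analytical reasoning, freedom from
# hallucination and dynamic learning are not.
llm_language = {
    1: [True, True],
    2: [True, True, True],
    3: [True, True, True, True, False, False, False],
    4: [False, False, False],
}

print(rated_level(llm_language))                  # 3
print(rated_level(llm_language, threshold=0.75))  # 2
```

Under these assumptions, the hypothetical profile reaches level 3 even though three of its seven level-3 aspects are missing. This matches the kind of borderline placement described for LLMs on the Language scale. Raising the threshold to 75% drops the same profile to level 2, which shows how sensitive a borderline rating is to how strictly "most" is read.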
A prominent failing of current AI systems – the persistent problem of hallucination in LLMs – appears in a variety of ways across the scales, both directly and indirectly. The Knowledge, learning and memory scale notes that hallucinations will be fixed at level 5. The Language scale places critical thinking at level 5, and the Metacognition and critical thinking scale places critical evaluation of knowledge at level 3. This diversity reflects differing views on how difficult this challenge will be to fix, and this aspect of the scales may need to be harmonised in future versions. However, one important function of the scales is to remind the public that hallucination is only one challenge among many: several challenges need to be solved for AI to reach human-level performance.
Readers will note that our experts have placed all current AI systems at levels 2 and 3, an indirect consequence of our approach to constructing the inaugural scales. The scales aim to communicate the major developments in each capability, from the past towards a hypothetical future where AI can reproduce all human aspects of the capability. In each scale, the level descriptors outline the major development steps in the domain. Those already achieved sit at the lower levels, while those remaining sit at the upper levels. Levels 4 and 5 generally describe aspects of capabilities that AI still struggles to perform consistently and reliably.
Many researchers in the field may not agree with our judgements about the state of the art in 2024 or the distribution of capabilities across the five-level scales. The OECD encourages AI researchers to contact the Organisation to aid our updating process and better align the scales to the most recent developments.
The level descriptions in this chapter are abbreviated; the full versions of each level and its accompanying scale can be found in Chapter 3.
Table 1.1. Overview of current AI capability levels
Domain | Level (from 1 to 5) | Capability description¹
---|---|---
Language | 3 | AI systems at this level reliably understand and generate semantic meaning using multi-corpus knowledge. They show advanced logical and social reasoning ability and can process text, speech and images. They support a diverse range of languages and adapt through iterative learning techniques.
Social interaction | 2 | AI systems combine simple movements to express emotions and learn from interactions for future encounters. They recall events and adapt slightly based on experience, recognising basic signals and detecting emotions through tone and context. They also perceive individual distinctions and apply past experiences to recurring challenges.
Problem solving | 2 | AI systems integrate qualitative reasoning – such as spatial or temporal relationships – with quantitative analysis to address complex professional problems framed using conventional domain abstractions. They handle multiple qualitative states and transitions, predicting how systems may evolve or change over time.
Creativity | 3 | AI systems generate valuable outputs that deviate significantly from their training data and challenge traditional boundaries. They generalise skills to new tasks and integrate ideas across domains.
Metacognition and critical thinking | 2 | AI systems monitor their own understanding and adjust their approaches accordingly. They work with familiar information that may contain ambiguities, requiring measured confidence and informed guesses. They can handle partially incomplete information by discerning what they know and what they do not.
Knowledge, learning and memory | 3 | AI systems learn the semantics of information through distributed representations and generalise to novel situations. They can process massive datasets for context-sensitive understanding but lack real-time learning capabilities.
Vision | 3 | AI systems can handle some variation in target object appearance and lighting, perform multiple subtasks and cope with known variations in data and situations.
Manipulation | 2 | AI systems handle a variety of object shapes and moderately pliable materials, operating in controlled environments with low to moderate clutter. They navigate around small obstacles in open spaces, accommodate objects placed randomly within a defined region, and perform tasks without time constraints.
Robotic intelligence | 2 | Robotic systems operate in partially known, mostly static, semi-structured environments with some well-defined variability. They handle short-horizon, simple multi-function tasks that, while well defined, involve inherent uncertainty. They can engage in limited human interaction, for example through minimal interfaces, and manage some unexpected outcomes within familiar task settings. They face few or no ethical issues.
Commentary on current ratings
Language
As described above, today’s most advanced LLMs, such as GPT-4o (OpenAI, 2024a[2]) used by ChatGPT, are rated at the lower threshold of level 3. LLMs excel at accessing world knowledge, working across multiple languages and learning iteratively through fine-tuning and post-processing. Their difficulty with robust reasoning continues to be a bottleneck for advancement. This difficulty stems from their inability to engage in well-formed analytical reasoning and their tendency to hallucinate incorrect information.
Social interaction
GPT-4o and equivalent LLMs are rated at level 2 on the Social interaction scale due to their strong social memory skills. However, they are not embodied, have no sense of identity and have limited social perception. Social robots such as Sony’s AIBO are also level 2 systems but have a different set of capabilities. These systems are embodied and have basic perception and identity, but their problem-solving skills are much more basic than those of LLMs.
Problem solving
Symbolic AI systems demonstrate superhuman capabilities in narrow domains such as logistics planning and model checking and are therefore ranked as level 2 systems. LLMs can fulfil some level 3 requirements, such as the capability to solve problems described in natural language. However, hallucinations make them too brittle to reach that level. Our experts felt this was still true of early “reasoning” models such as the preview of OpenAI o1 (OpenAI, 2024b[3]) that became available in late 2024. Whether this is still true of more advanced “reasoning” models such as OpenAI o3 (OpenAI, 2025[4]) and DeepSeek-R1 (DeepSeek-AI, 2025[5]) is analysed in the full version of the OECD AI Capability Indicators.
Creativity
Current AI systems can create outputs that are valuable to humans, somewhat novel and occasionally surprising. One example of a level 3 system is Google’s AlphaZero (Silver et al., 2017[6]), which used a neuro-symbolic architecture to produce efficient and surprising strategies in games such as chess and shogi. LLMs rely on a probabilistic architecture and on training data (i.e. previous human-generated content), so they are unable to generate outputs substantially distinct from existing human knowledge. However, these outputs are often useful and occasionally novel, which makes LLMs typical level 2 systems.
Metacognition and critical thinking
The most advanced LLMs typically perform at level 2 of the Metacognition and critical thinking scale. They can monitor their own understanding and adjust their approach to the problem at hand. However, they struggle to integrate unfamiliar information or evaluate their own knowledge, both of which are required for level 3. At the time of evaluation, agentic systems also typically performed at level 2. This reflects continuing limitations in AI’s ability to self-monitor and adaptively regulate its own reasoning. Agentic systems released in 2025 will be reviewed in the next edition of the OECD AI Capability Indicators.
Knowledge, learning and memory
LLMs and related forms of generative AI are the cutting-edge systems in this domain, reaching level 3 through capabilities such as generalising from stored knowledge. While efforts have been made with AI agents in this domain, none has shown the capabilities required for level 4, such as incremental learning through interaction with the world or metacognitive awareness of knowledge gaps.
Vision
Cutting-edge AI vision systems are at level 3. Our experts have identified a small number of systems with limited level 4 capabilities. However, this performance is not yet reliable enough for any system to achieve that rating. Level 3 systems robustly handle a limited range of data types and can cope with modest variations in lighting, shape and appearance of target objects. Unlike level 4 systems, current AI vision systems are unable to improve performance based on self-feedback or cope with large variations of lighting and target objects.
Manipulation
Manipulation systems are rated at level 2. A typical state-of-the-art system is a robotic arm used in highly controlled manufacturing environments. In contrast, level 3 systems can perform in moderately cluttered and dynamic environments with objects of variable shape, size and weight. Manipulation systems are still far from human equivalence. However, insofar as objects and environments can be standardised, as in factories, these systems will still affect human jobs and the demand for skills.
Robotic intelligence
The most advanced robotic systems are autonomous delivery robots and industrial automation systems, which our experts ranked at level 2. These systems perform well in structured environments with pre-defined tasks. However, robotic systems cannot yet reliably perform multi-step tasks or collaborate with humans, both of which would be required to reach level 3.
References
[5] DeepSeek-AI (2025), “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, arXiv, Vol. 2501.12948, https://doi.org/10.48550/arXiv.2501.12948.
[1] OECD (2025), AI and the Future of Skills Volume 3: The OECD AI Capability Indicators, OECD Publishing.
[4] OpenAI (2025), “OpenAI o3 and o4-mini System Card”, 16 April, OpenAI, https://openai.com/index/o3-o4-mini-system-card.
[2] OpenAI (2024a), “GPT-4o System Card”, arXiv, Vol. 2410.21276, https://arxiv.org/abs/2410.21276.
[3] OpenAI (2024b), “OpenAI o1 System Card”, arXiv, Vol. 2412.16720, https://arxiv.org/abs/2412.16720.
[6] Silver, D. et al. (2017), “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”, arXiv, Vol. 1712.01815, https://arxiv.org/abs/1712.01815.
Note
1. The descriptions in the comparison table are abbreviated versions of the relevant scale-level descriptions found in Chapter 3.