Several limitations must be considered concerning the results of the current study. First, the descriptive results and the underlying sample do not represent the population of firms in each country. In other words, the results in the current study relate to averages among the surveyed firms; they are not directly generalisable to the respective population of firms within a given country. One reason for this lack of generalisability is that this study does not use sampling weights to correct for the actual distribution of firms with respect to sector or size. For instance, the population of enterprises in all countries is characterised by the fact that the number of medium-sized enterprises (50‑249 employees) is significantly higher than the number of large enterprises (≥ 250 employees), a fact that is not taken into account in this study through weighting. Therefore, any result that does not directly differentiate between enterprise size classes – i.e. either by controlling for size class in a regression analysis or by presenting statistics for each size class separately – will be skewed towards large enterprises, as compared to a representative result for the population of firms.
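The effect of missing sampling weights can be illustrated with a small, purely hypothetical example. All numbers below are invented for illustration; they do not come from the survey. The point is only that when large firms are over-represented in a sample, the unweighted average differs from a population-weighted one:

```python
# Hypothetical illustration of unweighted vs population-weighted averages.
# Suppose (invented numbers) the survey contains equally many large and
# medium-sized firms, while in the population medium-sized firms account
# for 90% and large firms for 10%.

def weighted_mean(values, weights):
    """Weighted arithmetic mean of `values` given per-group `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical group averages of some survey outcome (e.g. an adoption score).
mean_large, mean_medium = 0.8, 0.4

# Unweighted sample average with equal cell sizes: skewed towards large firms.
unweighted = (mean_large + mean_medium) / 2

# Average weighted by the (hypothetical) population shares of each size class.
weighted = weighted_mean([mean_large, mean_medium], [0.1, 0.9])
```

Under these invented shares, the unweighted average (0.6) overstates the population-weighted one (0.44), which is the direction of skew described above.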
Another caveat with respect to the generalisation stems from the sampling procedure. Instead of conducting probability sampling among the population of enterprises in a given country, the survey provider primarily contacted enterprises with a high probability of being AI users. Thus, the sampling frame used by the survey provider was not a random subset of enterprises in each country. As a result, the underlying sample for the current study is not a random sample of AI-using enterprises but rather a selection of AI-using enterprises out of a pool of enterprises with a high probability of being AI users. As discussed by Stantcheva (2022[1]), “non-probability sampling, such as the quota sampling performed by survey companies, carries risks in terms of representativeness.” However, as the sampling procedure did not differ systematically between countries, this bias is less likely to affect the comparability of the results across countries in the current study.
A second limitation of the study arises from the number of observations. On the one hand, the total of 840 AI-using enterprises is a considerably larger number of observations than in most studies of AI-using enterprises. On the other hand, the analysis covers seven countries, two sectors and two enterprise size classes, so any breakdown along these dimensions quickly reduces the sample size available to produce statistically precise estimates of the true population parameters (even if sampling weights and a representative sample had been used). Since the total number of 840 observations was fixed due to budget constraints, the following approach was used to maximise the statistical power of the analyses among G7 countries. Given the available 120 observations per country, it was first decided that at most two strata should be used in the sampling procedure, with 30 observations per cell (see the following paragraph for a discussion of the statistical properties of such sample sizes). Analyses at this granular level of 30 observations, however, are not published in this study. Stratifying the sample is an important step, as it allows statistical analyses of pre-defined groups of interest within the target population.
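The loss of precision from breaking the sample into cells can be sketched with a standard-error calculation. The sketch below uses the textbook formula SE = s/√n for a sample mean; the standard deviation of 0.5 is an assumed value chosen purely for illustration (roughly what a 50/50 yes/no survey item would produce):

```python
import math

def standard_error(sample_sd, n):
    """Standard error of a sample mean: SE = s / sqrt(n)."""
    return sample_sd / math.sqrt(n)

# Assumed spread, for illustration only.
s = 0.5

se_country = standard_error(s, 120)  # full per-country sample
se_cell = standard_error(s, 30)      # a single 30-observation cell

# Quartering the sample size doubles the standard error.
ratio = se_cell / se_country
```

This is why breakdowns by country, sector and size class simultaneously are not published: confidence intervals around cell-level estimates would be twice as wide as country-level ones.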
There is no universal formula to determine what sample size qualifies as “sufficiently large”. A general rule of thumb used in many academic publications, and often taught in statistics courses, is 30 observations. There are several reasons for this number: it is commonly seen as the lower bound for the central limit theorem (CLT) to hold, and 30 observations are generally regarded as a good balance between maximising the sample size (and hence statistical precision) and cost efficiency. The CLT refers to the fact that, regardless of the shape of the original population distribution (which might be unknown), the sampling distribution of the sample mean will approximate a normal distribution as the sample size increases, provided the sample size is sufficiently large.
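The CLT statement above can be illustrated with a small simulation, sketched here using only the Python standard library. The choice of an exponential population is arbitrary; it simply provides a heavily right-skewed distribution, so that the decreasing skewness of the sample means shows the convergence towards normality:

```python
import random
import statistics

def sample_means(n, draws=2000, seed=42):
    """Means of `draws` repeated samples of size n from a skewed
    (exponential) population, drawn with a fixed seed."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(draws)]

def skewness(xs):
    """Simple moment-based skewness estimate (0 for a symmetric shape)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / sd) ** 3 for x in xs)

# The exponential population itself has skewness 2; the sampling
# distribution of the mean moves towards symmetry as n grows.
skew_n5 = skewness(sample_means(5))
skew_n30 = skewness(sample_means(30))
```

With n = 5 the distribution of sample means remains visibly skewed, while at n = 30 it is already close to symmetric, which is the intuition behind the rule of thumb.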
This rule of thumb is applied not only by academics and researchers but has also repeatedly been used in the context of policy advice and in publications by governmental and non-governmental institutions. However, the number 30 is not set in stone and is sometimes adapted. A working paper by the European Central Bank (ECB) chose a threshold of 20 observations, stating that “[d]ue to confidentiality constraints, less than 20 observations per cell at the sector level were dropped” (ECB, 2014[2]). Eurostat’s 1990 poverty report was stricter, stating that “[i]f the number of observations per cell is below 50 households, the estimates relating to that cell are considered unreliable and will not be presented in the tables” (Eurostat, 1990, p. 38[3]). Another practical example is the guidelines of the Bundesamt für Statistik, the Swiss Federal Statistical Office. According to these, comparisons that rely on cells with fewer than 10 observations must not be published, and comparisons based on cell frequencies of 10‑29 observations must be accompanied by a note concerning the reduced statistical reliability of the results (BASS, 2016, p. 131[4]). Moreover, in Germany, the widely used “Mietspiegel” (rent index) is officially required by the Federal Office for Building and Regional Planning (BBR) to rely on at least 30 apartments per cell in order to publish reliable information on average rental prices (BBR, 2020[5]).
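The Swiss publication guidelines cited above amount to a simple decision rule per cell. The function below is only an illustrative sketch of those thresholds (suppress below 10 observations, flag 10‑29, publish from 30), not an official implementation:

```python
def publication_status(n_obs):
    """Illustrative cell-publication rule based on the cited thresholds:
    fewer than 10 observations -> suppressed; 10-29 -> published with a
    reliability note; 30 or more -> publishable without restriction."""
    if n_obs < 10:
        return "suppressed"
    if n_obs < 30:
        return "flagged: reduced statistical reliability"
    return "publishable"

# Applying the rule to a few hypothetical cell sizes:
for n in (8, 15, 30):
    print(n, "->", publication_status(n))
```

Other institutions' rules (e.g. the ECB's 20-observation cut-off or Eurostat's 50-household rule) differ only in where these thresholds sit, not in the structure of the rule.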