Several limitations must be considered concerning the results of the current study. First, the descriptive results and the underlying sample do not represent the population of firms in each country. In other words, the results in the current study relate to averages among the surveyed firms; they are not directly generalisable to the respective population of firms within a given country. One reason for this lack of generalisability is that this study does not use sampling weights to correct for the actual distribution of firms with respect to sector or size. For instance, the population of enterprises in all countries is characterised by the fact that the number of medium-sized enterprises (50‑249 employees) is significantly higher than the number of large enterprises (≥ 250 employees), a fact that is not taken into account in this study through weighting. Therefore, any result that does not directly differentiate between enterprise size classes – i.e. either by controlling for size class in a regression analysis or by presenting statistics for each size class separately – will be skewed towards large enterprises, as compared to a representative result for the population of firms.
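The effect of missing sampling weights can be illustrated with a small, purely hypothetical example. All numbers below are invented for illustration; they do not come from the survey. The point is only that when large firms are over-represented in a sample, the unweighted average differs from a population-weighted one:

```python
# Hypothetical illustration of unweighted vs population-weighted averages.
# Suppose (invented numbers) the survey contains equally many large and
# medium-sized firms, while in the population medium-sized firms account
# for 90% and large firms for 10%.

def weighted_mean(values, weights):
    """Weighted arithmetic mean of `values` given per-group `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical group averages of some survey outcome (e.g. an adoption score).
mean_large, mean_medium = 0.8, 0.4

# Unweighted sample average with equal cell sizes: skewed towards large firms.
unweighted = (mean_large + mean_medium) / 2

# Average weighted by the (hypothetical) population shares of each size class.
weighted = weighted_mean([mean_large, mean_medium], [0.1, 0.9])
```

Under these invented shares, the unweighted average (0.6) overstates the population-weighted one (0.44), which is the direction of skew described above.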
Another caveat with respect to the generalisation stems from the sampling procedure. Instead of conducting probability sampling among the population of enterprises in a given country, the survey provider primarily contacted enterprises with a high probability of being AI users. Thus, the sampling frame used by the survey provider was not a random subset of enterprises in each country. As a result, the underlying sample for the current study is not a random sample of AI-using enterprises but rather a selection of AI-using enterprises out of a pool of enterprises with a high probability of being AI users. As discussed by Stantcheva (2022[1]), “non-probability sampling, such as the quota sampling performed by survey companies, carries risks in terms of representativeness.” However, as the sampling procedure did not differ systematically between countries, this bias is less likely to affect the comparability of the results across countries in the current study.
A second limitation of the study arises from the number of observations. On the one hand, the total of 840 AI-using enterprises is a considerably larger number of observations than in most studies of AI-using enterprises. On the other hand, the analysis covers seven countries, two sectors and two enterprise size classes, so any breakdown along these dimensions quickly reduces the sample size available to produce statistically precise estimates of the true population parameters (even if sampling weights and a representative sample had been used). Since the total number of 840 observations was fixed due to budget constraints, the following approach was used to maximise the statistical power of the analyses among G7 countries. Given the available 120 observations per country, it was first decided that at most two strata should be used in the sampling procedure, with 30 observations per cell (see the following paragraph for a discussion of the statistical properties of such sample sizes). Analyses at this granular level of 30 observations, however, are not published in this study. Stratifying the sample is an important step, as it allows statistical analyses of pre-defined groups of interest within the target population.
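The loss of precision from breaking the sample into cells can be sketched with a standard-error calculation. The sketch below uses the textbook formula SE = s/√n for a sample mean; the standard deviation of 0.5 is an assumed value chosen purely for illustration (roughly what a 50/50 yes/no survey item would produce):

```python
import math

def standard_error(sample_sd, n):
    """Standard error of a sample mean: SE = s / sqrt(n)."""
    return sample_sd / math.sqrt(n)

# Assumed spread, for illustration only.
s = 0.5

se_country = standard_error(s, 120)  # full per-country sample
se_cell = standard_error(s, 30)      # a single 30-observation cell

# Quartering the sample size doubles the standard error.
ratio = se_cell / se_country
```

This is why breakdowns by country, sector and size class simultaneously are not published: confidence intervals around cell-level estimates would be twice as wide as country-level ones.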
There is no universal formula to determine what sample size qualifies as “sufficiently large”. A general rule of thumb used in many academic publications, and often taught in statistics courses, is 30 observations. There are several reasons for this number: it is commonly seen as the lower bound for the central limit theorem (CLT) to hold, and 30 observations are generally regarded as a good balance between maximising the sample size (and hence statistical precision) and cost efficiency. The CLT refers to the fact that, regardless of the shape of the original population distribution (which might be unknown), the sampling distribution of the sample mean will approximate a normal distribution as the sample size increases, provided the sample size is sufficiently large.
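The CLT statement above can be illustrated with a small simulation, sketched here using only the Python standard library. The choice of an exponential population is arbitrary; it simply provides a heavily right-skewed distribution, so that the decreasing skewness of the sample means shows the convergence towards normality:

```python
import random
import statistics

def sample_means(n, draws=2000, seed=42):
    """Means of `draws` repeated samples of size n from a skewed
    (exponential) population, drawn with a fixed seed."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(draws)]

def skewness(xs):
    """Simple moment-based skewness estimate (0 for a symmetric shape)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / sd) ** 3 for x in xs)

# The exponential population itself has skewness 2; the sampling
# distribution of the mean moves towards symmetry as n grows.
skew_n5 = skewness(sample_means(5))
skew_n30 = skewness(sample_means(30))
```

With n = 5 the distribution of sample means remains visibly skewed, while at n = 30 it is already close to symmetric, which is the intuition behind the rule of thumb.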
This rule of thumb is applied not only by academics and researchers but has also repeatedly been used in the context of policy advice and in publications by governmental and non-governmental institutions. However, the number 30 is not set in stone and is sometimes adapted. A working paper by the European Central Bank (ECB) chose a threshold of 20 observations, stating that “[d]ue to confidentiality constraints, less than 20 observations per cell at the sector level were dropped” (ECB, 2014[2]). Eurostat’s 1990 poverty report was stricter, stating that “[i]f the number of observations per cell is below 50 households, the estimates relating to that cell are considered unreliable and will not be presented in the tables” (Eurostat, 1990, p. 38[3]). Another practical example is the guidelines of the Bundesamt für Statistik, the Swiss Federal Statistical Office. According to these, comparisons that rely on cells with fewer than 10 observations must not be published, and comparisons based on cell frequencies of 10‑29 observations must be accompanied by a note concerning the reduced statistical reliability of the results (BASS, 2016, p. 131[4]). Moreover, in Germany, the widely used “Mietspiegel” (rent index) is officially required by the Federal Office for Building and Regional Planning (BBR) to rely on at least 30 apartments per cell in order to publish reliable information on average rental prices (BBR, 2020[5]).
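The Swiss publication guidelines cited above amount to a simple decision rule per cell. The function below is only an illustrative sketch of those thresholds (suppress below 10 observations, flag 10‑29, publish from 30), not an official implementation:

```python
def publication_status(n_obs):
    """Illustrative cell-publication rule based on the cited thresholds:
    fewer than 10 observations -> suppressed; 10-29 -> published with a
    reliability note; 30 or more -> publishable without restriction."""
    if n_obs < 10:
        return "suppressed"
    if n_obs < 30:
        return "flagged: reduced statistical reliability"
    return "publishable"

# Applying the rule to a few hypothetical cell sizes:
for n in (8, 15, 30):
    print(n, "->", publication_status(n))
```

Other institutions' rules (e.g. the ECB's 20-observation cut-off or Eurostat's 50-household rule) differ only in where these thresholds sit, not in the structure of the rule.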