How Do Health System Features Influence Health System Performance?

Report

21 March 2025

2. Does one “best” health system exist?

Abstract

The chapter updates previous OECD work on health system efficiency, using newer data on health system characteristics and an expanded set of countries. Cluster analysis was used to group countries with similar healthcare system characteristics, resulting in eight distinct clusters based on features such as market mechanisms, provider choice, insurance coverage and gatekeeping arrangements. There is no evidence that any particular type of health system consistently outperforms the others: high-performing countries can be found in all clusters, and so can poorly performing ones. The analysis concludes that, rather than attempting wholesale system changes, countries should focus on specific policy improvements that can enhance performance regardless of their overall healthcare system design.

This chapter looks at whether efficiency – a measure that offers insights into performance – is influenced by the overall design of health systems. New OECD data on health system characteristics were used to understand links between health policy and performance.

Previous OECD work (Joumard, André and Nicq, 2010[1]) assessed whether higher efficiency is associated with the overall design and policy approach of health systems. Those empirical analyses (Box 2.1) suggested that there is room in all health systems to improve performance, and there is no healthcare system that performs systematically better in delivering cost-effective healthcare. Those analyses also showed that increasing the coherence of policy settings, by adopting best policy practices within a similar system and borrowing the most appropriate elements from other systems will likely be more practical and effective to raise healthcare spending efficiency – big bang reforms1 are therefore unlikely to be warranted.

Box 2.1. Results of previous OECD work on health systems efficiency

To capture the key features of the overall design and policy approach of health systems, the following indicators – constructed on the basis of responses to the 2008 round of the OECD survey on health systems characteristics – were used to identify clusters of health systems:

Indicator	Description
Degree of user choice of basic coverage	Evaluates the source of basic healthcare coverage and whether users are free to choose an insurer. Categories: residence‑based; insurance‑based, single insurer; insurance‑based, multiple insurers without choice; insurance‑based, multiple insurers with choice
Degree of private provision of primary care and outpatient specialist services	Evaluates the degree of private provision in primary care and outpatient specialist care. A higher value of the indicator means that primary and outpatient specialist care are predominantly provided privately
Patient choice of providers	Indicates whether individuals are free to choose any doctor or hospital, face incentives to choose any specific doctor or hospital, or have a limited choice. A higher value of the indicator means a larger possibility to use any provider
Health insurance as a secondary source of coverage (“over the basic” coverage)	Shows spending on private voluntary health insurance as a secondary source of coverage, as a share of total health expenditure. A higher value of the indicator means larger spending on secondary coverage
Role of primary care in the health system (gate‑keeping)	Indicates whether the referral from a primary care physician to access specialist care is required or there are financial incentives to do so. A higher value of the indicator means that referral is required

Those indicators were selected as those that most differentiate health systems – using a Principal Component Analysis1 – and that help identify plausible and interpretable clusters (Joumard, André and Nicq, 2010[1]).

The empirical analyses suggested that efficiency levels vary more within groups of countries sharing similar institutional characteristics than between groups. Thus, there is no indication that one group of healthcare systems would systematically outperform another. On the contrary, well-performing countries can be found in all institutional groups, and poorly performing countries are present in most groups.

Figure 2.1. Efficiency scores across and within country groups for 2007

This chapter updates those analyses to see whether the findings are confirmed using a larger set of countries, responses to the 2016 and 2023 rounds of the OECD Health Systems Characteristics (HSC) survey,2 and different input and output variables.

First, a purely statistical, data-driven clustering was conducted using the five indicators from previous OECD work: degree of user choice of basic coverage; degree of private provision of primary care and outpatient specialist services; patient choice of providers; health insurance as a secondary source of coverage (“over the basic” coverage); and role of primary care in the health system (gate‑keeping) (Box 2.2).

Box 2.2. Results of the data driven approach to clustering health systems

The data driven approach identified the following seven clusters of health systems:

Figure 2.2. Clusters of health systems

At first sight, some of the clusters look plausible – most of them include countries which are neighbours of each other and/or which have aspects in common. However, there are interlopers that are difficult to explain (e.g. Slovenia in cluster 2; Costa Rica in cluster 6), and cluster 7 groups quite different health systems.

Cluster analysis algorithms will always produce a set of clusters, whether or not there are true patterns in the data; it is therefore important that results are plausible, interpretable and easily explained to an audience that is not necessarily familiar with statistical methods. Selecting the appropriate number of homogeneous groups or families of health systems for policy discussions should not rely on quantitative measures alone, as the challenge lies in balancing comparisons that are statistically valid with comparisons that are policy relevant.
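The point that clustering always returns clusters can be illustrated with a small sketch: Ward's hierarchical clustering (via SciPy) applied to purely random, hypothetical data that has no group structure by construction still yields seven groups once the tree is cut at seven. The data, seed and number of clusters are illustrative assumptions and are not the OECD indicators.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical data: 30 "countries" described by 5 indicators drawn uniformly
# at random, so there is no true group structure by construction.
rng = np.random.default_rng(seed=0)
noise = rng.uniform(size=(30, 5))

# Ward's hierarchical clustering, with the tree cut into at most seven groups.
tree = linkage(noise, method="ward")
labels = fcluster(tree, t=7, criterion="maxclust")

# The algorithm still returns seven non-empty "clusters".
print(len(set(labels.tolist())))  # 7
print(np.bincount(labels)[1:])     # sizes of the seven groups
```

Whether such groups mean anything has to be judged on other grounds, which is why expert judgement was added to the data-driven step.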

Starting from the data-driven results, expert judgement was then applied to ensure that there are meaningful and identifiable policy differences explaining why countries are grouped together. As a result, eight clusters with the following features were identified (Figure 2.3):

Cluster 1: Germany, Israel, the Netherlands, the Slovak Republic and Switzerland rely extensively on market mechanisms to steer the behaviour of service providers, and basic insurance coverage is provided by multiple insurers with choice of insurer on the part of users.
Cluster 2: A second group of countries – Australia, Belgium, Canada and France – relies extensively on market mechanisms in service provision and features public basic insurance coverage. Over-the‑basic insurance coverage plays a significant role, and cost control generally takes the form of moderate gate‑keeping arrangements.
Cluster 3: The third group – which includes Austria, Czechia, Greece, Japan, Korea and Luxembourg – is also characterised by extensive use of market mechanisms in private provision of care. But there is no gate‑keeping system in place and over-the‑basic coverage is limited.
Cluster 4: Chile, Colombia and Mexico feature a mixed provision of healthcare services, basic coverage provided by multiple insurers and limited patient choice of providers.
Cluster 5: A fifth group of countries – Estonia, Latvia and Lithuania – is characterised by a mixed provision of healthcare services, basic coverage provided by a single insurer and ample patient choice of providers.
Cluster 6: The healthcare systems of Hungary, Iceland and Türkiye offer free choice of provider to patients in all three areas of care – primary, specialist and hospital care – with no gate‑keeping.
Cluster 7: In the group consisting of Costa Rica, Finland, Portugal and Spain, healthcare is mainly provided by a heavily regulated public system. Patients’ choice among providers is very limited and the role of gate‑keeping is important.
Cluster 8: The last group also consists of heavily regulated public systems – Denmark, Ireland, Italy, New Zealand, Norway, Poland, Slovenia, Sweden and the United Kingdom. Compared with the previous group, patients tend to have greater choice between providers.

Compared with the previous OECD analysis (Joumard, André and Nicq, 2010[1]), two clusters were added to group health systems with mixed private‑public provision (Cluster 4 and Cluster 5). The other clusters (1‑3 and 6‑8) are not substantially different from the earlier analysis, reflecting the fact that few countries saw major changes in their health system characteristics. Only three countries were reassigned: Mexico to the new Cluster 4, as its health system features mixed provision of healthcare services; Hungary to Cluster 6, as no referral is required to access specialist care (gate‑keeping); and Sweden to Cluster 8, as referral is required to access specialist care.

Figure 2.3. Groups of health systems with similar institutional features

Statistical methods

Data Envelopment Analysis (DEA) – a nonparametric statistical technique3 – is a commonly used analytical tool to estimate the relative efficiency with which inputs are turned into output (Jacobs, Smith and Street, 2006[3]). DEA was used in the comparison of efficiency of OECD health systems mentioned earlier (Joumard, André and Nicq, 2010[1]). It was also used in other health system efficiency analyses (Sicari and Sutherland, 2023[4]; Moran, Suhrcke and Nolte, 2023[5]).
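A standard way to write the output-oriented DEA problem for a country $o$, with input $x_o$ (here, health spending as a share of GDP) and output $y_o$, is the linear programme below. It is a reference sketch only: the variable-returns-to-scale constraint $\sum_j \lambda_j = 1$ and the assumption that the output is “higher is better” (so a mortality rate would first need to be transformed) are assumptions, not details stated in this chapter.

$$
\begin{aligned}
\phi_o^{*} = \max_{\phi,\,\lambda}\ & \phi \\
\text{subject to}\ & \sum_{j=1}^{n} \lambda_j x_j \le x_o, \qquad \sum_{j=1}^{n} \lambda_j y_j \ge \phi\, y_o, \\
& \sum_{j=1}^{n} \lambda_j = 1, \qquad \lambda_j \ge 0 \quad (j = 1, \dots, n).
\end{aligned}
$$

Here $n$ is the number of countries compared and $\phi_o^{*} \ge 1$ is the largest proportional expansion of output that a convex combination of observed countries achieves without using more input than country $o$. The corresponding efficiency score is $1/\phi_o^{*}$, which equals 1 for countries on the frontier and falls towards 0 as the attainable expansion grows.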

Health spending as a share of GDP4 was the main input variable in the DEA model used in this set of analyses. The variable proxies the countries’ current investments aiming to improve population health, while accounting for the different levels of GDP across OECD countries.

The selection of output variables is another important methodological decision to be made, as it should capture key results of health systems that will reflect their efficiency. Depending on the purposes of the analysis, these could be, for example, health outcomes, metrics associated with service delivery, indicators of quality of care, indicators of access to care, or even indicators related to carbon emissions.

A wide range of health outcome variables have been previously used to compare change over time and between populations, ranging from life expectancy (Joumard, André and Nicq, 2010[1]; Medeiros and Schwierz, 2015[6]) to infant mortality (Retzlaff-Roberts, Chang and Rubin, 2004[7]; Manavgat and Audibert, 2024[8]) and treatable mortality (Medeiros and Schwierz, 2015[6]).

As comparisons of measures such as life expectancy at birth do not take into account differences in population structure across countries, this set of analyses, unlike previous OECD analyses, uses yearly age‑standardised mortality rates (ASMR)5 as the output variable in the DEA model. Age‑standardised measures account for differences and changes in population structure and size that can affect mortality-based health outcomes across countries and within a country over time. For example, if a population is ageing, the observed death rate might reasonably be expected to rise, independently of the performance of the health system. However, ASMR does not account for other external factors that influence performance, including environmental and social factors and health behaviours.
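The effect of age standardisation can be shown with a minimal sketch of the direct method: age-specific death rates are weighted by a fixed standard population rather than by each country's own age structure. All numbers below, including the three age bands and the standard population weights, are hypothetical; the standard population used for the OECD ASMR data is not reproduced here.

```python
# Direct age standardisation with hypothetical numbers.

# Age-specific death rates per 100 000 (young, middle-aged, old),
# assumed identical in both countries.
rates = [50.0, 400.0, 5000.0]

# Population counts by age group: country B is much older than country A.
pop_a = [600_000, 300_000, 100_000]
pop_b = [400_000, 300_000, 300_000]

# Hypothetical standard population weights (sum to 1).
standard_weights = [0.5, 0.3, 0.2]


def crude_rate(rates, pop):
    """Deaths per 100 000 in the country's actual population."""
    deaths = sum(r * n / 100_000 for r, n in zip(rates, pop))
    return deaths / sum(pop) * 100_000


def asmr(rates, weights):
    """Age-standardised mortality rate per 100 000 (direct method)."""
    return sum(r * w for r, w in zip(rates, weights))


print(round(crude_rate(rates, pop_a)), round(crude_rate(rates, pop_b)))  # 650 1640
print(round(asmr(rates, standard_weights)))  # 1145 for both countries
```

The older country has a crude death rate more than twice as high, yet its ASMR is identical because its age-specific risks are the same, which is the population-ageing example in the paragraph above.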

An output-oriented DEA efficiency measure is then compared across countries to explore the proportional expansion in output that is possible, both within and between groups/clusters of health systems.
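The output-oriented DEA step can be sketched with SciPy's `linprog`, under stated assumptions: variable returns to scale (not specified in this chapter), outputs where “higher is better” (ASMR would need to be transformed first, and how the OECD analysis did this is not stated here) and made-up data. The helper `dea_output_vrs` is hypothetical and does not compute the confidence intervals shown in Figure 2.4.

```python
import numpy as np
from scipy.optimize import linprog


def dea_output_vrs(x, y):
    """Output-oriented DEA efficiency scores under variable returns to scale.

    x: inputs, shape (n_units,) or (n_units, n_inputs)
    y: outputs, shape (n_units,) or (n_units, n_outputs); assumed "higher is better"
    Returns scores 1 / phi in (0, 1], where phi is the maximal proportional
    expansion of all outputs attainable without using more of any input.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = x.shape[0]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [phi, lambda_1, ..., lambda_n]; linprog minimises,
        # so phi is maximised by minimising -phi.
        c = np.r_[-1.0, np.zeros(n)]
        # Inputs: sum_j lambda_j * x_j <= x_o
        a_in = np.c_[np.zeros(x.shape[1]), x.T]
        # Outputs: phi * y_o - sum_j lambda_j * y_j <= 0
        a_out = np.c_[y[o], -y.T]
        a_ub = np.vstack([a_in, a_out])
        b_ub = np.r_[x[o], np.zeros(y.shape[1])]
        # Convexity (variable returns to scale): sum_j lambda_j = 1
        a_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)
        res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (n + 1), method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        scores[o] = 1.0 / res.x[0]
    return scores


# Hypothetical example: four countries, one input and one output.
spending = [6.0, 8.0, 10.0, 12.0]   # health spending, % of GDP
outcome = [2.0, 3.0, 3.5, 3.2]      # any "higher is better" health output
scores = dea_output_vrs(spending, outcome)
print(scores.round(3))  # first three on the frontier; the fourth about 0.914
```

In this toy example the first country is scored as fully efficient only because no other country spends as little, the situation that Annex 2.A flags for countries with atypical input levels.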

Key findings

Efficiency scores by country computed using DEA are shown in Figure 2.4.

Figure 2.4. Efficiency scores with confidence intervals by country, 2019

Descriptive analysis showed that well-performing health systems can be found in all institutional groups. Poorly performing health systems are also present in all groups, in particular in cluster 5, which is characterised by a mixed provision of healthcare services, basic coverage provided by a single insurer and ample patient choice of providers (Figure 2.5). This indicates that no institutional characteristics are suggestive of “best performance”, but that there is ample scope for learning within and across groups.

Figure 2.5. Efficiency scores within and across health systems clusters, 2019

Analysis of variance (ANOVA) indicated that almost 40% of the variation in efficiency scores across countries is explained by the eight clusters used in this analysis (Table 2.1).

Cluster	Countries	Mean efficiency score (1 = highest efficiency)	95% confidence interval
1	Germany, Israel, Netherlands, Slovak Republic, Switzerland	0.65	0.48 – 0.82
2	Australia, Belgium, Canada, France	0.71	0.59 – 0.84
3	Austria, Czechia, Greece, Japan, Korea, Luxembourg	0.67	0.49 – 0.85
4	Chile, Colombia, Mexico	0.62	0.51 – 0.73
5	Estonia, Latvia, Lithuania	0.21	0.01 – 0.41
6	Hungary, Iceland, Türkiye	0.58	0.24 – 0.92
7	Costa Rica, Finland, Portugal, Spain	0.64	0.56 – 0.71
8	Denmark, Ireland, Italy, New Zealand, Norway, Poland, Slovenia, Sweden, United Kingdom	0.64	0.56 – 0.71
Total		0.61	0.55 – 0.67
Variation (η²)
	Between groups	39.8%
	Within groups	60.2%
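The cluster means and the between/within split in Table 2.1 can be reproduced in outline from the 2019 scores listed in Annex Table 2.A.1. This sketch uses only clusters 2 to 5, whose members all appear in that table, so its η² (about 58%) describes that subset rather than the 39.8% reported for all eight clusters. The normal-approximation interval is an assumption about how the table's intervals were built; on these scores it lands within about 0.01 of the reported bounds.

```python
from statistics import mean, stdev

# 2019 base-model scores from Annex Table 2.A.1 for the four clusters whose
# members all appear there (clusters 2 to 5).
groups = {
    2: [0.84, 0.56, 0.77, 0.68],              # Australia, Belgium, Canada, France
    3: [0.61, 0.45, 0.38, 0.84, 0.92, 0.84],  # Austria, Czechia, Greece, Japan, Korea, Luxembourg
    4: [0.71, 0.62, 0.52],                    # Chile, Colombia, Mexico
    5: [0.42, 0.10, 0.12],                    # Estonia, Latvia, Lithuania
}

# Cluster mean with a normal-approximation 95% interval: mean +/- 1.96 * SE.
for k, vals in groups.items():
    m = mean(vals)
    se = stdev(vals) / len(vals) ** 0.5
    print(f"cluster {k}: mean={m:.2f}, 95% CI=({m - 1.96 * se:.2f}, {m + 1.96 * se:.2f})")

# Eta squared: share of the total sum of squares lying between clusters.
all_vals = [v for vals in groups.values() for v in vals]
grand = mean(all_vals)
ss_total = sum((v - grand) ** 2 for v in all_vals)
ss_between = sum(len(vals) * (mean(vals) - grand) ** 2 for vals in groups.values())
eta_squared = ss_between / ss_total
print(f"eta squared (clusters 2-5 only): {eta_squared:.1%}")
```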

Furthermore, the difference in mean efficiency scores between each pair of clusters was assessed using Tukey's method with a 95% family-wise confidence level. Even though the point estimate of the difference in mean scores may suggest that one cluster is better (more efficient) than another, there is enough variation that we cannot be 95% confident that this is the case. Only the mean efficiency score of cluster 5 is significantly lower than those of clusters 1, 2, 3 and 8 (Figure 2.6).
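A pairwise Tukey comparison of this kind can be sketched with `scipy.stats.tukey_hsd`, which applies the Tukey-Kramer adjustment when group sizes differ. The sketch uses only the 2019 scores of clusters 2 to 5 from Annex Table 2.A.1, so its intervals will not match Figure 2.6, which covers all eight clusters; on this subset, cluster 5 comes out significantly lower than clusters 2 and 3 but not cluster 4, in line with the text.

```python
from scipy.stats import tukey_hsd

# 2019 base-model scores from Annex Table 2.A.1 for clusters 2 to 5.
cluster_2 = [0.84, 0.56, 0.77, 0.68]
cluster_3 = [0.61, 0.45, 0.38, 0.84, 0.92, 0.84]
cluster_4 = [0.71, 0.62, 0.52]
cluster_5 = [0.42, 0.10, 0.12]

# Tukey's HSD on all pairs of groups, with 95% family-wise intervals.
res = tukey_hsd(cluster_2, cluster_3, cluster_4, cluster_5)
ci = res.confidence_interval(confidence_level=0.95)

# Group order is 2, 3, 4, 5, so index 3 is cluster 5.
names = ["cluster 2", "cluster 3", "cluster 4"]
for j, name in enumerate(names):
    print(f"cluster 5 vs {name}: diff={res.statistic[3, j]:.2f}, "
          f"95% CI=({ci.low[3, j]:.2f}, {ci.high[3, j]:.2f}), p={res.pvalue[3, j]:.3f}")
```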

Figure 2.6. Comparison of the difference in the mean efficiency scores for all pairs of clusters

In this analysis, some countries with low levels of health spending as a share of GDP (e.g. Türkiye) have high efficiency scores, in contrast to some historically well-funded health systems (e.g. Germany). This result may be attributed to the declining marginal productivity of health spending in countries with high health expenditure as a share of GDP. In support of these long-standing OECD findings (OECD, 2017[9]), the International Monetary Fund recently showed that the relationship between financial resources and health outcomes is “flatter” for advanced economies than for emerging markets and developing countries, meaning that additional health spending results in smaller gains in health outputs over time (Garcia-Escribano, Mogues and Juarros, 2022[10]). Taken together, these results support the idea that as inputs increase in the health systems of major developed economies, outputs tend to increase too, but at a slower rate (Gallet and Doucouliagos, 2017[11]). This may help explain the lower efficiency of health systems in countries with higher levels of inputs.

Sensitivity analyses

Sensitivity analyses were conducted to assess the robustness of results to three changes: adding one input variable to the base model; using efficiency scores estimated with DEA for each year between 2016‑22 as the outcome of a regression analysis; and using life expectancy at birth as the output variable.

When the base one input – one output model is compared against different two‑input models, results show that there is little variation in efficiency scores between models (Table 2.2).

	Base model + GDP per capita (in Purchasing Power Parities)	Base model + Obesity (% share of population)	Base model + tobacco (% share of population)	Base model + pollution	Base model + hospital beds	Base model + workforce
Correlation with efficiency scores of the base model	0.93	1	0.94	0.94	0.93	0.87

Note: Correlation is measured using the Pearson correlation coefficient.
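The Pearson correlations in Table 2.2 can be checked in outline with the per-country scores in Annex Table 2.A.1. This sketch uses only the rows from Australia to Norway, so its coefficient for the workforce model will not exactly equal the 0.87 reported for all countries. It also lists the countries whose scores rise most when workforce is added; two of the three are in cluster 4.

```python
import numpy as np

# 2019 scores from Annex Table 2.A.1, rows Australia to Norway:
# (base model, base model + workforce as a second input)
scores = {
    "Australia": (0.84, 0.84), "Austria": (0.61, 0.58), "Belgium": (0.56, 0.54),
    "Canada": (0.77, 0.76), "Chile": (0.71, 0.92), "Colombia": (0.62, 0.90),
    "Costa Rica": (0.69, 0.90), "Czechia": (0.45, 0.54), "Denmark": (0.62, 0.63),
    "Estonia": (0.42, 0.48), "Finland": (0.65, 0.65), "France": (0.68, 0.67),
    "Germany": (0.49, 0.49), "Greece": (0.38, 0.69), "Hungary": (0.23, 0.21),
    "Iceland": (0.72, 0.72), "Ireland": (0.77, 0.76), "Israel": (0.88, 0.87),
    "Italy": (0.59, 0.74), "Japan": (0.84, 0.84), "Korea": (0.92, 0.90),
    "Latvia": (0.10, 0.23), "Lithuania": (0.12, 0.09), "Luxembourg": (0.84, 0.84),
    "Mexico": (0.52, 0.81), "Netherlands": (0.69, 0.69), "New Zealand": (0.71, 0.70),
    "Norway": (0.74, 0.75),
}
base = np.array([b for b, _ in scores.values()])
workforce = np.array([w for _, w in scores.values()])

# Pearson correlation coefficient between the two sets of scores.
r = np.corrcoef(base, workforce)[0, 1]
print(round(r, 2))

# Countries whose score rises most when workforce is added as an input.
gains = sorted(scores, key=lambda c: scores[c][1] - scores[c][0], reverse=True)
print(gains[:3])
```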

Results also confirm the central findings that well-performing health systems can be found in all institutional groups, and that poorly performing health systems are also present in all groups (see Annex 2.A for details). Only when workforce is added as an input variable do results change somewhat: efficiency scores of health systems with a lower density of workforce, such as those assigned to cluster 4, increase (Figure 2.7).

Figure 2.7. Efficiency scores within and across health systems clusters, 2019

To assess the robustness of results over time, efficiency scores were estimated through DEA for each year between 2016‑22, and those scores were then used as the outcome of a regression analysis. Compared with the reference cluster (cluster 1), results confirm that cluster 5 shows lower efficiency (Table 2.3): because the model outcome measures the potential for outcome increase, the positive cluster 5 coefficient (1.58) indicates more room for improvement, and it is statistically significant at the 0.1% level (p < 0.001).

Variable	Estimate	SE	P-value
GDP	‑1.89E‑05	6.74E‑06	0.01**
Education	‑3.18E‑02	8.59E‑03	<0.001***
Gini	‑5.93E+00	1.45E+00	<0.001***
Post_Covid	3.19E‑01	2.03E‑01	0.12
Cluster 2 (versus cluster 1)	‑2.35E‑01	2.15E‑01	0.27
Cluster 3 (versus cluster 1)	‑4.26E‑02	9.84E‑02	0.67
Cluster 4 (versus cluster 1)	3.50E‑01	3.72E‑01	0.35
Cluster 5 (versus cluster 1)	1.58E+00	3.57E‑01	<0.001***
Cluster 6 (versus cluster 1)	‑6.71E‑02	3.52E‑01	0.85
Cluster 7 (versus cluster 1)	1.16E‑02	1.56E‑01	0.94
Cluster 8 (versus cluster 1)	‑1.40E‑02	1.40E‑01	0.92
Pseudo-R2	1.99E+01	4.64E+00	<0.001***

Note: Significant result at *0.05, **0.01, ***0.001 level. The model uses the Arellano method for heteroskedasticity-consistent standard errors (White). The outcome of the model (Potential for outcome increase) is log transformed. The full regression results can be found in Annex 2.A.
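The sign convention can be made concrete: when the outcome is the log of the potential output increase, a positive cluster coefficient means more room for improvement, i.e. lower efficiency, than the reference cluster. This is a deliberately simplified, single-year OLS sketch using the 2019 scores for clusters 2 to 5 from Annex Table 2.A.1, with cluster 2 as the reference. It omits the covariates, the 2016‑22 panel and the robust standard errors of Table 2.3, and defining the outcome as log(1/score) is an assumption.

```python
import numpy as np

# 2019 base-model scores (Annex Table 2.A.1) for clusters 2 to 5.
data = [
    (2, 0.84), (2, 0.56), (2, 0.77), (2, 0.68),
    (3, 0.61), (3, 0.45), (3, 0.38), (3, 0.84), (3, 0.92), (3, 0.84),
    (4, 0.71), (4, 0.62), (4, 0.52),
    (5, 0.42), (5, 0.10), (5, 0.12),
]
cluster = np.array([c for c, _ in data])
score = np.array([s for _, s in data])

# Assumed outcome: log of the output-oriented expansion factor phi = 1 / score,
# so larger values mean more scope to improve (lower efficiency).
log_potential = np.log(1.0 / score)

# Design matrix: intercept plus dummies for clusters 3, 4 and 5 (cluster 2 = reference).
X = np.column_stack([np.ones_like(score)] + [(cluster == k).astype(float) for k in (3, 4, 5)])
coef, *_ = np.linalg.lstsq(X, log_potential, rcond=None)

names = ["intercept", "cluster 3", "cluster 4", "cluster 5"]
print({name: round(float(b), 2) for name, b in zip(names, coef)})
```

With only cluster dummies, each coefficient equals the difference between that cluster's mean outcome and the reference cluster's mean, so the large positive cluster 5 coefficient simply reflects its low efficiency scores.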

Finally, the sensitivity of results to a different outcome variable was assessed. Efficiency scores are considerably higher when life expectancy is used as the output variable instead of age‑standardised mortality rates (Figure 2.8). This is explained by the lower variation in the outcome variable, which places every country relatively closer to the efficiency frontier.

Compared with the base model, most countries maintain their relative position within the cluster they are assigned to, and most clusters maintain the same position relative to the OECD average. However, there are some exceptions, because life expectancy at birth does not standardise for differences in population structure. This can help explain the higher efficiency scores, compared with the base model, for Italy, Japan and Spain, countries with relatively older populations and high life expectancy, and the lower efficiency scores for countries assigned to cluster 4, which have relatively younger populations and lower life expectancy.

Figure 2.8. Efficiency scores within and across health systems clusters, 2019 (life expectancy at birth as output variable)

Conclusion: no best health system

In line with previous work (Joumard, André and Nicq, 2010[1]), this analysis confirmed that there is no indication that one group of health systems systematically outperforms another. The policy implication of these findings is to reinforce the suggestion that large-scale “big bang” reforms, which require considerable political capital and financial resources, should be designed and implemented with caution, as changing the whole system will not automatically improve performance. Future work could improve the ability to capture additional meaningful characteristics of health systems and explore the use of different outcome variables, to refine understanding of how characteristics may influence outcomes at system level.

In addition to the performance of the whole health system, there are various specific dimensions on which health systems can be evaluated, and a system may perform differently across them. Therefore, there is room for health systems to adopt policy actions that improve performance on a specific dimension by drawing on the most appropriate elements from other systems, regardless of their institutional set-up. To this end, more targeted measures of performance and more granular classifications of health systems, based on characteristics such as the use of financial incentives to providers to improve quality and the strength of gatekeeping (Table 2.4), may be used. Such targeted measures of performance are plausibly more influenced by the characteristics being examined.

As those characteristics seem to be particularly promising actionable policy levers for countries to improve performance (OECD/WHO, 2014[12]; OECD, 2016[13]; OECD, 2020[14]), the following chapters help identify the role of financial incentives to providers for quality, physicians’ payment methods and strength of primary care in improving performance.

Cluster/country	Strength of financial incentives to providers to improve quality of care	Use of fee‑for-service payments to physicians	Role of primary care in the health system (gate‑keeping)	Continuity of care
Cluster 1
Germany	Weak	Large	Medium	Strong
Israel	Weak	Limited	Medium	Strong
Netherlands	Limited	Large	Strong	Strong
Slovak Republic	Strong	Limited	Strong	n.a.
Switzerland	Weak	Large	Medium	Strong
Cluster 2
Australia	Weak	Large	Strong	Strong
Belgium	Weak	Large	Medium	Strong
Canada	Weak	Large	Strong	Medium
France	Strong	Large	Medium	Medium
Cluster 3
Austria	Weak	Limited	Weak	Medium
Czechia (2023)	Strong	Large	Weak	Strong
Greece	Weak	Limited	Weak	Weak
Japan	n.a.	n.a.	Weak	Weak
Korea	Strong	Large	Weak	Weak
Luxembourg	Weak	Large	Weak	Strong
Cluster 4
Chile	Strong	Limited	Strong	Weak
Colombia	Weak	Limited	Strong	n.a.
Mexico	Weak	Limited	Strong	n.a.
Cluster 5
Estonia	Weak	Limited	Strong	Strong
Latvia	Weak	Large	Medium	Strong
Lithuania	Weak	Limited	Strong	Strong
Cluster 6
Hungary	Weak	Limited	Weak	Strong
Iceland	Weak	Limited	Weak	Weak
Türkiye (2016)	Weak	Large	Weak	n.a.
Cluster 7
Costa Rica	Weak	Limited	Strong	Strong
Finland	Weak	Limited	Strong	Strong
Portugal	Strong	Limited	Strong	Medium
Spain	Strong	Limited	Strong	Medium
Cluster 8
Denmark	Weak	Large	Medium	n.a.
Ireland	Weak	n.a.	Strong	Medium
Italy	Weak	Limited	Strong	Medium
New Zealand	n.a.	n.a.	n.a.	n.a.
Norway	Weak	Limited	Strong	Strong
Poland	Limited	Limited	Strong	Medium
Slovenia	Weak	Limited	Strong	Strong
Sweden	Limited	Limited	Medium	Medium
United Kingdom	Strong	Limited	Strong	Strong

Note: n.a. not available as countries did not provide responses to the question. The qualification of health system features is based on responses to the OECD Health Systems Characteristics survey (see Annex A).

Annex 2.A. Clustering and sensitivity analyses

Input and output variables

The cross-country comparison of health spending as a share of GDP – the input variable of this set of analyses – is shown in Annex Figure 2.A.1.

Annex Figure 2.A.1. Health spending as a share of GDP by country, 2019

The cross-country comparison of age‑standardised mortality rates per 100 000 population – the output variable of this set of analyses – is shown in Annex Figure 2.A.2.

Annex Figure 2.A.2. Age standardised mortality rates by country, 2019

Efficiency frontier

Annex Figure 2.A.3 shows the efficiency frontier estimated using DEA for 2019 (base model of output-oriented analysis, one input and one output). It should be noted that the reliability of an efficiency score also depends on the density of observations in the region of the frontier where a country is located. Countries with atypical levels of inputs and outputs tend to be classified as efficient, but this may simply reflect a lack of comparable observations (Simar and Wilson, 2007[15]). In this analysis, the “flat” part of the efficiency frontier is determined by the output of Korea: countries with a level of health spending as a share of GDP similar to or higher than that of Korea are compared against Korea’s output, while countries with a lower level are compared against the hypothetical frontier represented by the “downwards slope” part of the frontier.
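The frontier shape described above (a downward-sloping segment followed by a flat segment at the lowest observed mortality) can be reproduced with a small linear programme. This sketch uses hypothetical data, treats ASMR directly as a “lower is better” output under variable returns to scale, and is not the OECD implementation; the hypothetical country spending 8% of GDP plays the role that Korea plays in the 2019 frontier, and `frontier_asmr` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical countries: health spending (% of GDP) and ASMR per 100 000.
spending = np.array([4.0, 6.0, 8.0, 10.0, 12.0])
asmr = np.array([900.0, 700.0, 600.0, 650.0, 720.0])


def frontier_asmr(x0, spending, asmr):
    """Lowest ASMR reachable by a convex combination of observed countries
    whose combined spending does not exceed x0."""
    n = len(spending)
    res = linprog(
        c=asmr,                                      # minimise sum_j lambda_j * asmr_j
        A_ub=spending.reshape(1, -1), b_ub=[x0],     # sum_j lambda_j * spending_j <= x0
        A_eq=np.ones((1, n)), b_eq=[1.0],            # sum_j lambda_j = 1
        bounds=[(0, None)] * n,
        method="highs",
    )
    return res.fun


for x0 in spending:
    print(x0, round(frontier_asmr(x0, spending, asmr), 1))
```

Countries spending 10% and 12% of GDP are measured against the flat part of the frontier (ASMR 600), while the lowest spender sits on the frontier only because no country spends less, which is the caveat about sparse observations noted above.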

Annex Figure 2.A.3. Efficiency frontier estimated using DEA, 2019

Clustering

Ward’s method and Gower’s distance

Ward’s hierarchical clustering was used to group countries into homogeneous groups. Ward’s hierarchical agglomerative clustering is a bottom-up approach in which each country starts as its own cluster and clusters are iteratively merged until a single cluster is formed. To keep clusters as homogeneous as possible, the method minimises, at each merging step, the increase in the total within-cluster sum of squared deviations (the error sum of squares, ESS) (Equation 1).

Equation 1

$\mathrm{ESS}(C) = \sum_{i \in C} \sum_{j=1}^{p} \left(x_{ij} - \bar{x}_{C_j}\right)^{2}$

The ESS in the context of Ward’s method is a measure of the variance within a cluster: the sum of squared deviations of each point in the cluster from the cluster centroid (the mean of the points in the cluster). Ward’s method merges the two clusters that result in the smallest increase in the total ESS at each step. In the formula, $x_{ij}$ is the value of variable $j$ for point $i$, $\bar{x}_{C_j}$ is the centroid (mean) of variable $j$ in cluster $C$, and $p$ is the number of variables.
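As a worked check of the merge criterion, the increase in ESS from merging two clusters $A$ and $B$ equals $\frac{n_A n_B}{n_A + n_B}\lVert \bar{x}_A - \bar{x}_B \rVert^2$, a standard identity for Ward’s method on Euclidean data. This sketch verifies it numerically on two small hypothetical clusters; the helper names are hypothetical, and the identity assumes squared Euclidean distances rather than the Gower distance used in this analysis.

```python
import numpy as np


def ess(points):
    """Error sum of squares: squared deviations from the cluster centroid."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())


def ward_merge_cost(a, b):
    """Increase in total ESS if clusters a and b are merged (Ward's criterion)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return n_a * n_b / (n_a + n_b) * float(diff @ diff)


# Hypothetical clusters of countries described by two indicators.
a = [[0.0, 1.0], [1.0, 1.0]]
b = [[4.0, 5.0], [5.0, 4.0], [6.0, 6.0]]

direct = ess(a + b) - ess(a) - ess(b)   # list concatenation pools the points
print(round(direct, 4), round(ward_merge_cost(a, b), 4))  # 43.5 43.5
```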

Given that the dataset contains both categorical and continuous variables, Gower’s distance was chosen as the distance metric. Gower’s distance is a dissimilarity measure that accommodates mixed data types: it scales differences in continuous variables by each variable’s range and scores categorical variables as either matching or not matching, computing a dissimilarity between 0 and 1 for each variable and then averaging these into a single distance. This makes Gower’s distance an appropriate choice when the dataset contains a mix of continuous, ordinal and nominal variables, as it prevents any variable from disproportionately influencing the clustering simply because of its scale or data type.
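A minimal sketch of this pipeline, assuming simple Gower dissimilarities (match/mismatch for nominal variables, range-scaled absolute differences for numeric ones, averaged with equal weights) and SciPy’s Ward linkage on the resulting distance matrix. The records, their coding and the helper `gower_matrix` are hypothetical. SciPy documents Ward linkage as correctly defined only for Euclidean distances, so applying it to Gower distances is a pragmatic approximation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical records: (coverage type [nominal], private provision share [numeric],
# gatekeeping level 0-2 [treated as numeric here]).
records = [
    ("residence", 0.2, 2),
    ("residence", 0.3, 2),
    ("multiple_insurers", 0.8, 0),
    ("multiple_insurers", 0.9, 1),
    ("single_insurer", 0.5, 1),
]
kinds = ["nominal", "numeric", "numeric"]


def gower_matrix(records, kinds):
    """Pairwise Gower distances: mean of per-variable dissimilarities in [0, 1]."""
    n = len(records)
    cols = list(zip(*records))
    ranges = [
        (max(col) - min(col)) if kind == "numeric" else None
        for col, kind in zip(cols, kinds)
    ]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            parts = []
            for k, kind in enumerate(kinds):
                xi, xj = records[i][k], records[j][k]
                if kind == "nominal":
                    parts.append(0.0 if xi == xj else 1.0)        # match / mismatch
                else:
                    r = ranges[k]
                    parts.append(abs(xi - xj) / r if r else 0.0)  # range-scaled gap
            d[i, j] = d[j, i] = sum(parts) / len(parts)
    return d


dist = gower_matrix(records, kinds)
# Ward linkage on the condensed Gower matrix, cut into two groups.
tree = linkage(squareform(dist), method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```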

The dendrogram produced by the hierarchical clustering algorithm was used to explore the clustering structure, helping to visualise the merging process. Additionally, the Silhouette score, a metric that evaluates how similar an object is to its own cluster compared to other clusters, was used to assess the quality of the clusters and determine the optimal number of clusters. The silhouette score ranges from ‑1 to 1, with higher values indicating better-defined clusters.
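The silhouette computation can be written directly from a distance matrix, for example a Gower matrix. This is a plain NumPy sketch with a hypothetical helper name; scikit-learn’s `silhouette_score` with `metric="precomputed"` returns the same average. The toy comparison shows a well-separated labelling scoring near 1 and a poor one scoring below 0.

```python
import numpy as np


def silhouette_scores(dist, labels):
    """Per-observation silhouette s(i) = (b - a) / max(a, b) from a distance matrix.

    a: mean distance to the other members of the same cluster;
    b: lowest mean distance to the members of any other cluster.
    Observations in singleton clusters get s(i) = 0 by convention.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    s = np.zeros(len(labels))
    for i, lab in enumerate(labels):
        same = labels == lab
        same[i] = False
        if not same.any():
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == other].mean()
                for other in set(labels.tolist()) if other != lab)
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s


# Hypothetical one-dimensional data with two obvious groups.
x = np.array([0.0, 1.0, 10.0, 11.0])
dist = np.abs(x[:, None] - x[None, :])
good = silhouette_scores(dist, [0, 0, 1, 1]).mean()
bad = silhouette_scores(dist, [0, 1, 0, 1]).mean()
print(round(good, 3), round(bad, 3))  # about 0.9 and -0.45
```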

Quality checks

To ensure the robustness of the clustering results, a series of quality checks was conducted using both cross-validation techniques and feature importance analysis. These analyses ensured that the clusters were both statistically stable and meaningfully interpretable: the results were not overly sensitive to small perturbations in the data, and the variables used for clustering were meaningful.

Clustering stability was evaluated using a leave‑one‑out cross-validation technique. This method involves iteratively excluding one data point (country-year) at a time from the dataset, reapplying the clustering algorithm to the remaining data, and then analysing whether the clusters formed remain consistent. The purpose of this approach is to detect outliers or data points that disproportionately influence the clustering structure. If the exclusion of a single data point leads to major shifts in cluster assignments, this would indicate potential instability in the clustering solution. Conversely, no changes to the clusters would suggest that the solution is robust and not overly dependent on any particular observation.
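A minimal sketch of such a leave-one-out check, assuming Ward clustering on Euclidean data and measuring stability as the share of observation pairs whose “same cluster / different cluster” status is unchanged when one observation is removed. Comparing co-memberships rather than raw labels avoids problems with arbitrary cluster numbering. The agreement measure, helper names and data are hypothetical; the source does not state which stability statistic was used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def co_membership(labels):
    """Boolean matrix: True where two observations share a cluster."""
    labels = np.asarray(labels)
    return labels[:, None] == labels[None, :]


def loo_stability(data, k):
    """Share of pairwise co-memberships preserved when each row is left out in turn."""
    data = np.asarray(data, dtype=float)
    full = co_membership(fcluster(linkage(data, method="ward"), t=k, criterion="maxclust"))
    agreement = []
    for drop in range(len(data)):
        keep = np.arange(len(data)) != drop
        sub = fcluster(linkage(data[keep], method="ward"), t=k, criterion="maxclust")
        same = co_membership(sub) == full[np.ix_(keep, keep)]
        agreement.append(same.mean())
    return np.array(agreement)


# Hypothetical data: three well-separated groups of four observations each.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
data = np.vstack([c + rng.normal(scale=0.3, size=(4, 2)) for c in centres])
print(loo_stability(data, k=3).round(3))  # 1.0 everywhere means no change in grouping
```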

Additionally, a Random Forest algorithm was employed to assess the importance of variables in determining the cluster structure. Random Forest is a non-parametric ensemble learning method that generates multiple decision trees based on random subsets of the data and then aggregates their predictions. In this context, it was used to evaluate the contribution of each variable to the clustering result.

The variable importance was quantified using two metrics:

Mean Decrease Accuracy: This metric measures how much accuracy decreases in the random forest model when a particular variable is randomly permuted, keeping other variables unchanged. A higher mean decrease accuracy indicates that the variable is critical for correct cluster identification.
Mean Decrease Gini: This metric measures the decrease in node impurity (as quantified by the Gini index) when a variable is used to split the data in a tree. Higher values suggest that the variable plays a significant role in distinguishing between data points in different clusters.

Both metrics were used to rank the importance of variables in defining the clusters. The Random Forest results provided insights into which variables contributed the most to the clustering structure, helping to explain the differentiation between clusters or the distinction of a particular cluster from the rest.
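The two importance metrics have direct scikit-learn analogues: `feature_importances_` of a fitted `RandomForestClassifier` gives the mean decrease in Gini impurity, and `sklearn.inspection.permutation_importance` measures the drop in accuracy when one variable is shuffled. Using Python here is an assumption (the R `randomForest` package reports these as MeanDecreaseGini and MeanDecreaseAccuracy, with the latter computed on out-of-bag samples), as is computing permutation importance on the training data. The data are synthetic: cluster labels depend only on the first indicator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Hypothetical data: cluster labels are determined by the first indicator only;
# the second indicator is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = (X[:, 0] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Mean decrease in impurity (Gini importance), normalised to sum to 1.
print(forest.feature_importances_.round(3))

# Permutation importance: mean drop in accuracy when one column is shuffled.
perm = permutation_importance(forest, X, y, scoring="accuracy", n_repeats=20, random_state=0)
print(perm.importances_mean.round(3))
```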

“Pure” statistical clusters

Based on the five indicators used in previous OECD work – that is degree of user choice of basic coverage; degree of private provision of primary care and outpatient specialist services; patient choice of providers; health insurance as a secondary source of coverage (“over the basic” coverage); role of primary care in the health system (gate‑keeping) – a hierarchical clustering algorithm was used to create a dendrogram (Annex Figure 2.A.4).

Annex Figure 2.A.4. Dendrogram

Seven clusters were then identified as containing elements that were similar among themselves and dissimilar to elements belonging to other groups (Annex Figure 2.A.5).

Annex Figure 2.A.5. Health systems by cluster. Data driven approach

Sensitivity analysis

A series of analyses was performed to understand whether efficiency scores by country, and their relationship with country clusters, were sensitive to the use of:

two input variables (instead of the one-input base model, whose input is health expenditure as a share of GDP)
a beta regression model to estimate the effect of clusters (instead of visual inspection of efficiency scores for a single year)
life expectancy at birth as the model output (instead of the age‑standardised mortality rate)

Sensitivity of results to using one additional input variable

Efficiency scores by health system using one additional input variable – GDP per capita; obesity; tobacco consumption; pollution; hospital beds; workforce – were compared against efficiency scores estimated using the base model (Annex Table 2.A.1).

				Model with one additional input variable
	Year	Base Model (95% Confidence Interval): ASMR – health expenditure as a share of GDP		GDP per capita (in Purchasing Power Parities)	Obesity (% share of population)	Tobacco (% share of population)	Pollution	Hospital beds	Workforce
Australia	2019	0.84	(0.77 – 0.88)	0.84	0.84	0.96	0.82	0.98	0.84
Austria	2019	0.61	(0.51 – 0.64)	0.61	0.59	0.61	0.78	0.70	0.58
Belgium	2019	0.56	(0.46 – 0.6)	0.56	0.55	0.60	0.58	0.69	0.54
Canada	2019	0.77	(0.69 – 0.8)	0.77	0.77	0.92	0.92	0.97	0.76
Chile	2019	0.71	(0.62 – 0.76)	0.88	0.71	0.72	0.89	0.96	0.92
Colombia	2019	0.62	(0.53 – 0.69)	0.80	0.62	0.64	0.76	0.88	0.90
Costa Rica	2019	0.69	(0.61 – 0.76)	0.86	0.70	0.92	0.70	0.87	0.90
Czechia	2019	0.45	(0.35 – 0.53)	0.47	0.45	0.46	0.51	0.53	0.54
Denmark	2019	0.62	(0.53 – 0.66)	0.62	0.61	0.62	0.75	0.85	0.63
Estonia	2019	0.42	(0.3 – 0.5)	0.47	0.41	0.51	0.49	0.50	0.48
Finland	2019	0.65	(0.56 – 0.71)	0.65	0.65	0.75	0.86	0.84	0.65
France	2019	0.68	(0.59 – 0.71)	0.66	0.65	0.68	0.80	0.80	0.67
Germany	2019	0.49	(0.39 – 0.53)	0.49	0.47	0.48	0.64	0.57	0.49
Greece	2019	0.38	(0.27 – 0.48)	0.58	0.36	0.40	0.57	0.62	0.69
Hungary	2019	0.23	(0.09 – 0.34)	0.29	0.24	0.24	0.29	0.29	0.21
Iceland	2019	0.72	(0.63 – 0.79)	0.72	0.72	0.94	0.93	0.93	0.72
Ireland	2019	0.77	(0.69 – 0.8)	0.78	0.77	0.83	0.92	0.94	0.76
Israel	2019	0.88	(0.8 – 0.94)	0.87	0.87	0.93	0.88	0.92	0.87
Italy	2019	0.59	(0.5 – 0.67)	0.63	0.54	0.60	0.66	0.80	0.74
Japan	2019	0.84	(0.76 – 0.87)	0.82	0.83	0.83	0.81	0.80	0.84
Korea	2019	0.92	(0.85 – 0.99)	0.92	0.85	0.92	0.91	0.92	0.90
Latvia	2019	0.10	(0 – 0.2)	0.23	0.10	0.11	0.19	0.20	0.23
Lithuania	2019	0.12	(0 – 0.22)	0.18	0.11	0.16	0.22	0.22	0.09
Luxembourg	2019	0.84	(0.75 – 0.92)	0.84	0.80	0.93	0.87	0.90	0.84
Mexico	2019	0.52	(0.37 – 0.65)	0.81	0.52	0.80	0.57	0.85	0.81
Netherlands	2019	0.69	(0.6 – 0.72)	0.69	0.66	0.71	0.76	0.86	0.69
New Zealand	2019	0.71	(0.62 – 0.77)	0.73	0.71	0.80	0.69	0.93	0.70
Norway	2019	0.74	(0.66 – 0.78)	0.75	0.71	0.92	0.93	0.90	0.75
Poland	2019	0.43	(0.31 – 0.53)	0.50	0.41	0.53	0.44	0.51	0.53
Portugal	2019	0.53	(0.43 – 0.58)	0.62	0.51	0.59	0.59	0.72	0.54
Slovak Republic	2019	0.42	(0.31 – 0.51)	0.49	0.42	0.43	0.49	0.51	0.57
Slovenia	2019	0.52	(0.41 – 0.6)	0.60	0.51	0.52	0.59	0.72	0.69
Spain	2019	0.67	(0.58 – 0.73)	0.71	0.66	0.68	0.82	0.87	0.81
Sweden	2019	0.73	(0.65 – 0.77)	0.73	0.70	0.88	0.92	0.97	0.74
Switzerland	2019	0.78	(0.7 – 0.81)	0.79	0.74	0.78	0.94	0.91	0.79
Türkiye	2019	0.78	(0.56 – 0.98)	0.81	0.78	0.81	0.81	0.85	0.81
United Kingdom	2019	0.59	(0.5 – 0.63)	0.58	0.59	0.60	0.71	0.83	0.58
OECD average		0.61		0.66	0.60	0.67	0.70	0.76	0.67
Pearson correlation with base model				0.93	1.00	0.94	0.94	0.93	0.87

Note: Results presented in this report are shown in bold. Instead of GDP, GNI is used for Luxembourg and GNI* for Ireland.
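The “Pearson correlation with base model” row can be checked directly against the table. The sketch below recomputes it for the Workforce column from the rounded 2019 values shown above. The report presumably used unrounded scores, so small differences in the last digit are possible.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# 2019 scores from Annex Table 2.A.1, Australia to United Kingdom in table order
base = [0.84, 0.61, 0.56, 0.77, 0.71, 0.62, 0.69, 0.45, 0.62, 0.42, 0.65, 0.68, 0.49,
        0.38, 0.23, 0.72, 0.77, 0.88, 0.59, 0.84, 0.92, 0.10, 0.12, 0.84, 0.52, 0.69,
        0.71, 0.74, 0.43, 0.53, 0.42, 0.52, 0.67, 0.73, 0.78, 0.78, 0.59]
workforce = [0.84, 0.58, 0.54, 0.76, 0.92, 0.90, 0.90, 0.54, 0.63, 0.48, 0.65, 0.67, 0.49,
             0.69, 0.21, 0.72, 0.76, 0.87, 0.74, 0.84, 0.90, 0.23, 0.09, 0.84, 0.81, 0.69,
             0.70, 0.75, 0.53, 0.54, 0.57, 0.69, 0.81, 0.74, 0.79, 0.81, 0.58]
print(round(pearson(base, workforce), 2))  # 0.87
```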

Sensitivity of results to using a beta regression model

To test the robustness of results, efficiency scores for each year from 2016 to 2022 were calculated and then used in a panel data framework to assess whether clusters of health systems help explain observed variations in efficiency.

Initially, a pooled OLS regression was used to estimate the relationship between efficiency scores and a set of socio‑economic, environmental and health-related factors. Dummy variables for the eight country clusters (Clusters) were included to assess their effect. A pooled OLS model was chosen instead of a country fixed-effects model because the cluster variables do not vary over time, so they would be perfectly collinear with the country fixed effects. Tests for time fixed effects revealed no significant patterns, so time fixed effects were excluded.
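The collinearity argument can be made concrete with a small hypothetical panel: four countries observed over two years, with countries A and B in cluster 1 and C and D in cluster 2. In the pooled design (intercept, one time-varying covariate and a cluster 2 dummy) every coefficient is identifiable. Once country dummies are added, however, the cluster dummy equals the sum of the dummies for C and D, so the design matrix loses a rank. The exact rank computation below uses fractions to avoid rounding issues. All data are illustrative.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

cluster_of = {"A": 1, "B": 1, "C": 2, "D": 2}  # hypothetical cluster membership
pooled, with_country_fe = [], []
x = 0
for country, cluster in cluster_of.items():
    for _year in (2016, 2017):
        x += 1  # hypothetical time-varying covariate
        d_cluster2 = 1 if cluster == 2 else 0  # cluster 1 is the reference
        pooled.append([1, x, d_cluster2])
        country_dummies = [1 if country == c else 0 for c in ("B", "C", "D")]  # A is the reference
        with_country_fe.append([1, x] + country_dummies + [d_cluster2])

print(matrix_rank(pooled), len(pooled[0]))                    # 3 3: full column rank
print(matrix_rank(with_country_fe), len(with_country_fe[0]))  # 5 6: one column is redundant
```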

The efficiency scores were expressed as the “potential for output increase”. This equals 1 for the most efficient countries and rises above 1 for less efficient ones, usually reaching slightly above 2 (a value of 2 implies a 100% potential for outcome increase). A logarithmic transformation was applied to reduce skewness and compress the scores into the range between 0 and 1. Because the transformed scores are continuous and bounded, a beta regression model was used, which provided a better fit for the efficiency scores. Equation 2 shows the regression formula used.
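A short sketch of the transformation described above follows. It assumes that the potential for output increase is the reciprocal of an efficiency score in (0, 1], such as those in Annex Table 2.A.1, so a score of 0.5 means output could double. The function names are hypothetical. The log stays below 1 only while the potential is below e (about 2.72), that is, for efficiency scores above roughly 0.37.

```python
import math

def potential_for_output_increase(efficiency):
    """Factor by which output could rise: 1 on the frontier, 2 if output could double."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    return 1.0 / efficiency

def log_potential(efficiency):
    """Log-transformed potential, the kind of outcome used in the beta regression."""
    return math.log(potential_for_output_increase(efficiency))

for score in (1.0, 0.8, 0.5, 0.1):
    print(score, round(potential_for_output_increase(score), 2), round(log_potential(score), 3))
```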

Equation 2

$\log(\mathit{Efficiency}_{it}) = \alpha + \beta_{0}\,\mathit{GDP}_{it} + \beta_{1}\,\mathit{Education}_{it} + \beta_{2}\,\mathit{Unemployment}_{it} + \beta_{3}\,\mathit{Gini}_{it} + \beta_{4}\,\mathit{Workforce}_{it} + \beta_{5}\,\mathit{HospitalBeds}_{it} + \beta_{6}\,\mathit{Obesity}_{it} + \beta_{7}\,\mathit{Tobacco}_{it} + \beta_{8}\,\mathit{COVID}_{it} + \beta_{9}\,\mathit{Clusters}_{i} + \varepsilon_{it}$

Where i indexes countries and t indexes years in the 2016‑22 period; GDP is Gross Domestic Product per capita (in Purchasing Power Parities); Education is the percentage of the population aged 25‑65 with tertiary education; Unemployment is the unemployment rate; Gini is the Gini index of income inequality; Workforce is the number of healthcare workers per 1 000 population; Hospital beds is the number of hospital beds per 1 000 population; Obesity is the percentage of the population that is obese; Tobacco is the percentage of daily smokers aged 15 and above; and COVID (Post_COVID‑19) is a dichotomous variable that takes the value 0 for 2016‑18 and 1 for 2019‑21.

The model showed both heteroskedasticity and autocorrelation, so standard errors were clustered by both time and country, which also accounts for potential country fixed effects. No multicollinearity was detected (all variance inflation factors were below 3), and no outliers or cross-sectional dependence were found (assessed with Cook’s distance and Pesaran’s test, respectively). Annex Table 2.A.2 presents the results of the final model.

Variable	Estimate	SE	P-values
(Intercept)	2.5717	0.85687	0**
GDP	-1.89E-05	6.74E-06	0.01**
Education	-3.18E-02	8.59E-03	0***
Unemployment	5.02E-03	2.06E-02	0.81
Gini	-5.9267	1.4482	0***
Obese	2.21E-03	3.34E-03	0.51
Tobacco	2.35E-02	5.66E-02	0.68
Workforce	6.74E-03	1.28E-02	0.6
Beds	-3.08E-03	1.53E-02	0.84
Post_Covid	3.19E-01	2.03E-01	0.12
Clusters2 (versus cluster 1)	-2.35E-01	2.15E-01	0.27
Clusters3 (versus cluster 1)	-4.26E-02	9.84E-02	0.67
Clusters4 (versus cluster 1)	3.50E-01	3.72E-01	0.35
Clusters5 (versus cluster 1)	1.5779	3.57E-01	0***
Clusters6 (versus cluster 1)	-6.71E-02	3.52E-01	0.85
Clusters7 (versus cluster 1)	1.16E-02	1.56E-01	0.94
Clusters8 (versus cluster 1)	-1.40E-02	1.40E-01	0.92
(phi)	1.99E+01	4.6438	0***
Log-likelihood	225.5 on 18 Df
Pseudo-R2	0.64

Note: Significant at the * 0.05, ** 0.01 and *** 0.001 levels. Heteroskedasticity-consistent (White) standard errors are computed with the Arellano method. The dependent variable (efficiency scores expressed as potential for output increase) is log-transformed.
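For readers unfamiliar with beta regression, the sketch below writes out the log-likelihood that such a model maximises. It assumes the common mean-precision parametrisation with a logit link for the mean, in which the (phi) row in Annex Table 2.A.2 would presumably correspond to the precision parameter. The functions and example values are hypothetical. The sketch covers only the likelihood, not the estimation or the clustered standard errors used in the report. Outcomes must lie strictly between 0 and 1.

```python
import math

def logistic(eta):
    """Inverse of the logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_loglik(y, X, beta, phi):
    """Log-likelihood of a beta regression with mean mu = logistic(x'beta) and precision phi.

    Each outcome follows Beta(mu * phi, (1 - mu) * phi), so a larger phi means
    outcomes are more tightly concentrated around their mean.
    """
    total = 0.0
    for yi, xi in zip(y, X):
        if not 0.0 < yi < 1.0:
            raise ValueError("outcomes must lie strictly between 0 and 1")
        mu = logistic(sum(b * x for b, x in zip(beta, xi)))
        p, q = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(phi) - math.lgamma(p) - math.lgamma(q)
                  + (p - 1.0) * math.log(yi) + (q - 1.0) * math.log(1.0 - yi))
    return total

# Intercept-only model with beta = [0.0], so mu = 0.5 for every observation
y = [0.3, 0.5, 0.6]
X = [[1.0], [1.0], [1.0]]
print(beta_loglik(y, X, [0.0], 2.0))   # Beta(1, 1) is uniform, so the log-likelihood is 0.0
print(beta_loglik(y, X, [0.0], 20.0))  # larger, since these outcomes sit close to the mean 0.5
```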

Sensitivity of results to using a different outcome variable

The sensitivity of results to using a different outcome variable, life expectancy at birth, was tested. Life expectancy has previously been used as a proxy for health system performance in efficiency analyses and is widely regarded as one of the main outputs of health systems. Results of the DEA model using life expectancy as the output variable, with different input specifications, are shown in Annex Table 2.A.3.

				Model with one additional input variable
	Year	Base Model (95% CI): LE ~ health expenditure as a share of GDP		GDP per capita (in Purchasing Power Parities)	Obesity (% share of population)	Tobacco (% share of population)	Pollution	Hospital beds	Workforce
Australia	2019	0.98	(0.98 – 0.98)	0.98	0.98	0.99	0.98	0.98	0.98
Austria	2019	0.97	(0.96 – 0.97)	0.97	0.97	0.97	0.97	0.97	0.97
Belgium	2019	0.97	(0.96 – 0.97)	0.97	0.97	0.97	0.97	0.97	0.97
Canada	2019	0.97	(0.96 – 0.97)	0.97	0.97	0.98	0.97	0.98	0.97
Chile	2019	0.95	(0.95 – 0.96)	0.98	0.95	0.95	0.95	0.97	0.97
Colombia	2019	0.91	(0.9 – 0.91)	0.98	0.91	0.91	0.91	0.92	0.94
Costa Rica	2019	0.96	(0.95 – 0.97)	0.98	0.96	1.01	0.96	0.98	0.99
Czechia	2019	0.95	(0.94 – 0.95)	0.95	0.94	0.95	0.94	0.95	0.95
Denmark	2019	0.96	(0.96 – 0.97)	0.96	0.96	0.96	0.96	0.97	0.96
Estonia	2019	0.94	(0.93 – 0.95)	0.96	0.94	0.94	0.94	0.94	0.96
Finland	2019	0.97	(0.97 – 0.98)	0.97	0.97	0.98	0.98	0.97	0.97
France	2019	0.98	(0.97 – 0.98)	0.98	0.98	0.98	0.98	0.98	0.98
Germany	2019	0.96	(0.95 – 0.96)	0.96	0.96	0.96	0.96	0.96	0.96
Greece	2019	0.97	(0.97 – 0.98)	0.99	0.97	0.97	0.97	0.97	0.99
Hungary	2019	0.91	(0.89 – 0.92)	0.93	0.91	0.91	0.91	0.91	0.93
Iceland	2019	0.99	(0.99 – 0.99)	0.99	0.99	0.99	0.99	0.99	0.99
Ireland	2019	0.98	(0.97 – 0.98)	0.98	0.98	0.98	0.98	0.98	0.98
Israel	2019	0.99	(0.98 – 1)	0.99	0.99	0.99	0.99	0.99	0.99
Italy	2019	0.99	(0.99 – 1)	0.99	0.99	1.00	0.99	0.99	0.99
Japan	2019	1.00	(0.99 – 1)	0.99	0.98	0.99	1.00	0.99	1.00
Korea	2019	0.99	(0.99 – 1)	0.99	0.99	1.00	0.99	0.99	0.99
Latvia	2019	0.90	(0.88 – 0.9)	0.92	0.90	0.90	0.90	0.90	0.92
Lithuania	2019	0.91	(0.9 – 0.91)	0.92	0.91	0.91	0.91	0.91	0.92
Luxembourg	2019	0.99	(0.97 – 1)	0.98	0.98	0.98	0.98	0.98	0.98
Mexico	2019	0.90	(0.88 – 0.92)	0.98	0.90	0.97	0.91	0.97	0.98
Netherlands	2019	0.97	(0.97 – 0.98)	0.97	0.97	0.97	0.97	0.97	0.97
New Zealand	2019	0.97	(0.97 – 0.98)	0.97	0.97	0.98	0.97	0.98	0.97
Norway	2019	0.98	(0.98 – 0.98)	0.98	0.98	0.99	0.98	0.98	0.98
Poland	2019	0.93	(0.92 – 0.94)	0.95	0.93	0.93	0.93	0.93	0.95
Portugal	2019	0.97	(0.97 – 0.97)	0.98	0.97	0.97	0.97	0.97	0.97
Slovak Republic	2019	0.93	(0.92 – 0.93)	0.94	0.93	0.93	0.93	0.93	0.95
Slovenia	2019	0.97	(0.97 – 0.97)	0.97	0.97	0.97	0.97	0.97	0.97
Spain	2019	1.00	(0.99 – 1)	0.99	1.00	1.00	1.00	0.99	0.99
Sweden	2019	0.98	(0.98 – 0.99)	0.98	0.98	0.99	0.98	0.99	0.98
Switzerland	2019	0.99	(0.99 – 0.99)	0.99	0.99	0.99	0.99	1.00	0.99
Türkiye	2019	0.97	(0.92 – 1)	0.98	0.97	0.97	0.97	0.98	0.98
United Kingdom	2019	0.96	(0.96 – 0.97)	0.96	0.96	0.96	0.96	0.97	0.96
OECD average		0.96		0.97	0.96	0.97	0.96	0.97	0.97
Pearson correlation with base model				0.78	1.00	0.88	1.00	0.90	0.87

Note: Instead of GDP, GNI is used for Luxembourg and GNI* for Ireland.

References

[18] Dutu, R. and P. Sicari (2016), “Public Spending Efficiency in the OECD: Benchmarking Health Care, Education and General Administration”, OECD Economics Department Working Papers, No. 1278, OECD Publishing, Paris, https://doi.org/10.1787/5jm3st732jnq-en.

[11] Gallet, C. and H. Doucouliagos (2017), “The impact of healthcare spending on health outcomes: A meta-regression analysis”, Social Science & Medicine, Vol. 179, pp. 9-17, https://doi.org/10.1016/j.socscimed.2017.02.024.

[10] Garcia-Escribano, M., T. Mogues and P. Juarros (2022), “Patterns and Drivers of Health Spending Efficiency”, IMF Working Papers, Vol. 2022/048, p. 1, https://doi.org/10.5089/9798400204388.001.

[19] Ham, C. (1997), “Reforming the New Zealand health reforms”, BMJ, Vol. 314/7098, pp. 1844-1844, https://doi.org/10.1136/bmj.314.7098.1844.

[2] Hotelling, H. (1933), “Analysis of a complex of statistical variables into principal components”, Journal of Educational Psychology, Vol. 24/6, pp. 417-441, https://doi.org/10.1037/h0071325.

[3] Jacobs, R., P. Smith and A. Street (2006), Measuring efficiency in health care, Cambridge University Press.

[1] Joumard, I., C. André and C. Nicq (2010), “Health Care Systems: Efficiency and Institutions”, OECD Economics Department Working Papers, No. 769, OECD Publishing, Paris, https://doi.org/10.1787/5kmfp51f5f9t-en.

[16] Klein, R. (1995), “Big Bang Health Care Reform - Does it work? The Case of Britain’s 1991 National Health Service Reforms”, The Milbank Quarterly, Vol. 73/3, pp. 299-337.

[8] Manavgat, G. and M. Audibert (2024), “Healthcare system efficiency and drivers: Re-evaluation of OECD countries for COVID-19”, SSM - Health Systems, Vol. 2, p. 100003, https://doi.org/10.1016/j.ssmhs.2023.100003.

[6] Medeiros, J. and C. Schwierz (2015), “Efficiency estimates of health care systems”, European Commission Publications Office, https://data.europa.eu/doi/10.2765/49924.

[5] Moran, V., M. Suhrcke and E. Nolte (2023), “Exploring the association between primary care efficiency and health system characteristics across European countries: a two-stage data envelopment analysis”, BMC Health Serv Res, Vol. 23/1, https://doi.org/10.1186/s12913-023-10369-y.

[14] OECD (2020), Realising the Potential of Primary Health Care, OECD Health Policy Studies, OECD Publishing, Paris, https://doi.org/10.1787/a92adee4-en.

[9] OECD (2017), “Life expectancy at birth”, in Health at a Glance 2017: OECD Indicators, OECD Publishing, Paris, https://doi.org/10.1787/health_glance-2017-6-en.

[13] OECD (2016), Better Ways to Pay for Health Care, OECD Health Policy Studies, OECD Publishing, Paris, https://doi.org/10.1787/9789264258211-en.

[12] OECD/WHO (2014), Paying for Performance in Health Care: Implications for Health System Performance and Accountability, Open University Press - McGraw-Hill, Buckingham, https://doi.org/10.1787/9789264224568-en.

[7] Retzlaff-Roberts, D., C. Chang and R. Rubin (2004), “Technical efficiency in the use of health care resources: a comparison of OECD countries”, Health Policy, Vol. 69/1, pp. 55-72, https://doi.org/10.1016/j.healthpol.2003.12.002.

[4] Sicari, P. and D. Sutherland (2023), “Health sector performance and efficiency in Ireland”, OECD Economics Department Working Papers, No. 1750, OECD Publishing, Paris, https://doi.org/10.1787/6a000bf1-en.

[15] Simar, L. and P. Wilson (2007), “Estimation and inference in two-stage, semi-parametric models of production processes”, Journal of Econometrics, Vol. 136/1, pp. 31-64, https://doi.org/10.1016/j.jeconom.2005.07.009.

[17] Tuohy, C. (2011), “American Health Reform in Comparative Perspective: Big Bang, Blueprint, or Mosaic?”, Journal of Health Politics, Policy and Law, Vol. 36/3, pp. 571-576, https://doi.org/10.1215/03616878-1271279.

Notes


1. The term “big bang healthcare reform” refers to large-scale changes implemented swiftly (Tuohy, 2011[17]), such as the National Health Service reforms introduced in England in 1991 (Klein, 1995[16]) and the market-oriented reforms introduced in New Zealand in 1993 (Ham, 1997[19]).

2. Data on policies and institutions were available from the questionnaires for the United States. However, reflecting the complexity and variety of the US health system, those data were not used in the analyses discussed in this report. For examples of descriptive analyses comparing the performance of the US healthcare system with that of other high-income countries, see reports by the Commonwealth Fund (https://www.commonwealthfund.org/publications/fund-reports/2024/sep/mirror-mirror-2024) and the Kaiser Family Foundation (https://www.kff.org/health-policy-101-international-comparison-of-health-systems/?entry=table-of-contents-introduction).

3. Using linear programming, Data Envelopment Analysis (DEA) constructs a frontier of the maximum output that can be obtained from a given set of inputs (output-oriented efficiency) or of the proportional reduction in input use that is possible while keeping output fixed (input-oriented efficiency). This frontier is then used as a benchmark against which the performance of each unit can be assessed. A country’s relative distance to the DEA-estimated frontier is interpreted as a measure of potential efficiency gains (Dutu and Sicari, 2016[18]). Compared with parametric approaches to measuring relative efficiency, such as Stochastic Frontier Analysis, DEA does not require assumptions about the underlying production function, although it still assumes that this function is common to all units. However, because it makes no assumptions about the functional form of the relationship between inputs and outputs, DEA cannot provide conclusions about the expected change in outputs following a marginal change in inputs. On its own, therefore, DEA does not indicate how to improve efficiency; instead, it shows how far a unit is from the most efficient performance possible at its level of input.

4. For Luxembourg, Gross National Income (GNI) was used to account for the compensation of cross-border workers. For Ireland, modified GNI (GNI*) was used, which excludes the net profits sent abroad by companies, depreciation on intellectual property and on leased aircraft, and the net income of redomiciled public limited companies.

5. Age‑standardised mortality rates for 2019 were used to limit bias due to the COVID‑19 pandemic. In the sensitivity analysis, a longer time window (2016‑22) was used.
