This chapter presents the meta-analytical methodology used to derive preliminary unadjusted value of statistical life (PUVSL) estimates, which are intermediate values on the way to the value of statistical life (VSL) estimates, and provides details on how the data was prepared and screened. The chapter highlights best practices in meta-analyses and explains the concept of the random effects model used in this report. It presents the PUVSL estimates obtained for the OECD, the EU, the United States, high-income countries, low- and middle-income countries and at the global level, together with results from sensitivity analyses of these estimates.
Mortality Risk Valuation in Policy Assessment
4. Meta-analysis method and preliminary unadjusted VSL estimates
4.1. Main approach to the analysis of VSL meta-data for policy use
In many cases, it may not be possible to perform primary valuation studies to estimate the economic benefits or costs of mortality effects associated with a policy under consideration. A common practice is therefore to use existing value of statistical life (VSL) estimates in cost-benefit analyses (CBA). This chapter presents the methodological approach for estimating VSLs from existing primary studies. The preliminary VSL estimates presented in this chapter are not recommended for use without further adjustments, which are discussed in Chapters 5 and 6. This chapter also discusses data quality considerations and presents sensitivity analyses of the preliminary estimates.
This report considers four different options for deriving updated VSL estimates from existing VSL primary studies. The first option is a form of unit or benefit function transfer in which primary VSL estimates from a subset of studies are compared, matched and then adjusted, to the extent possible, to the relevant policy context. USEPA (2016[1]) observes that a key advantage of this option is its potential to enhance the precision of the transfer if suitable studies matching the policy context are available. Conversely, this approach carries a risk that analytical accuracy could be reduced when the availability of such studies or primary estimates is limited. Moreover, this approach is not feasible for new policy contexts for which primary valuation estimates are not available. Finally, this approach requires some degree of analytical judgement, potentially complicating the ultimate transparency and objectivity of the approach.
The second approach proceeds by establishing one or more base VSL estimates, and then reviews available evidence from the literature to identify appropriate adjustment factors (e.g. differences in risk and population characteristics, or child vs. adult mortality risk). Under this approach, the “evidence speaks” without a need to impose restrictive assumptions, such as introducing estimation covariates that would require assumptions about the functional relationship between VSL and risk characteristics. Despite its flexibility in terms of assumptions, this approach relies on supplementary evidence from a broader literature on the relationship between VSL and other factors in the primary valuation data. Considering that the meta-data collected from primary valuation data are often not detailed enough to allow for robust estimations, this approach may also be challenging to implement in practice.
The third approach is the standard meta-regression approach in which meta-functions are estimated and used directly in the benefit transfer (Johnston et al., 2021[2]; Johnston et al., 2017[3]) (cf. discussion in Section 2.5 of Chapter 2 and examples in Lindhjem and Navrud (2015[4])). In this approach, values for relevant covariates specific to the policy context such as type of risk, age, and income are inserted into the meta-function. Values for certain methodological characteristics1 are also identified and the meta-regression function is then used to calculate the relevant VSL estimate for the policy context of interest. While this third approach is the most general of the three options, it also requires significant effort to collect and code information on a wide range of descriptors from each study, which can be challenging as this information is not always readily available for all studies. Additionally, it requires that meta-data are extensive enough to identify functional relationships between VSL and a range of variables (Boyle and Wooldridge, 2018[5]). This can be difficult to achieve in practice, especially in the case of global meta-data where heterogeneity in methods and other factors can be high. Finally, relevant data from the policy context of interest (e.g. information on risk and population characteristics) must also be available in order to carry out benefit transfers using this approach.
Given the high level of heterogeneity in the study characteristics and policy contexts of global VSL meta-data, most of the approaches used in the academic literature to carry out benefit transfer, which pertain to situations where the data is less heterogeneous (e.g. national datasets such as in Johnston et al. (2017[3])), have less relevance to more heterogeneous international contexts as a basis for practical policy guidance. A suitable alternative to the comprehensive meta-analytic approach is therefore a simpler reduced form meta-function containing only the most critical variables such as income, country/region, cause of mortality and valuation methods (Lindhjem and Navrud, 2015[4]). This approach can, however, be sensitive to the choice of functional form and the models’ ability to explain the variations in the data (Johnston et al., 2021[2]). A simpler and more transparent approach that uses unit transfers based on the baseline distribution of VSL estimates can also produce reliable results (Johnston et al., 2021[2]; Lindhjem and Navrud, 2015[4]; Robinson et al., 2019[6]).
Another common approach is to combine the second and third approaches described above. This approach proceeds by establishing one or more base VSL estimates along with a range of estimates for “default” case(s). It then uses meta-regression analysis, literature review and expert assessment to identify the adjustment factors that are appropriate to apply to the base VSL estimates. This approach was used in OECD (2012[7]), wherein the central tendencies (mean and median) of the distribution of VSL estimates were estimated from the meta-data. The study then carried out (i) a meta-regression analysis to estimate the income elasticity of VSL to use in benefit transfer between countries and (ii) a general review of the evidence underpinning other potential adjustments. In its most recent review of VSL literature from the United States, the US EPA (2016[1]) similarly opted for a combination of the second and third approaches discussed above.2
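To illustrate how an income elasticity of VSL can be used in benefit transfer between countries, the following minimal sketch scales a base VSL by the ratio of incomes raised to the elasticity. The constant-elasticity form, the function name `transfer_vsl` and all numerical inputs are illustrative assumptions, not values or code from this report.

```python
def transfer_vsl(vsl_base, income_base, income_policy, elasticity):
    """Scale a base VSL to a policy context.

    Assumes a constant income elasticity of VSL:
        VSL_policy = VSL_base * (income_policy / income_base) ** elasticity
    """
    if income_base <= 0 or income_policy <= 0:
        raise ValueError("incomes must be positive")
    return vsl_base * (income_policy / income_base) ** elasticity


# Hypothetical inputs, for illustration only.
base_vsl = 4_000_000.0      # base VSL in 2022 USD
base_income = 40_000.0      # income level underlying the base VSL
policy_income = 20_000.0    # income level in the policy context

# With an elasticity of 1, VSL scales in proportion to income.
transferred = transfer_vsl(base_vsl, base_income, policy_income, elasticity=1.0)
```

An elasticity below 1 would make the transferred VSL fall less than proportionally with income, and an elasticity above 1 more than proportionally.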
The approach taken in the present report is similar to that of OECD (2012[7]) and US EPA (2016[1]). In particular, it first estimates preliminary and unweighted mean VSL (PUVSL) estimates and ranges (Section 4.5). Second, it considers potential adjustments to the base VSL estimates for specific policy use cases based on meta-regression analysis (Sections 5.2 and 5.3 of Chapter 5), as well as a review of other available empirical evidence (Section 5.4 of Chapter 5).
4.2. Choice of meta-data for updating the VSL analysis
A number of choices regarding data and methodology must be made in deriving new base VSL estimates and potential adjustment factors. This chapter discusses the choice of the scope of studies included in the meta-analysis.3 A critical question is how well current evidence reflects individuals’ trade-offs between mortality risk reduction and income or wealth, and whether this provides a reliable basis for assessing such trade-offs in the future. This is difficult to evaluate, as a number of factors can change over time that can make older studies less reliable for future projections. For example, structural economic changes, including to industries and demographics, may alter how individuals make trade-offs between income and mortality risk reductions (cf. discussion in Chapter 5 on the income elasticity of VSL).
In many policy applications, individual preferences are often assumed to be constant (e.g. regarding ranking products or making trade-offs between savings and consumption). This implies that changes in willingness to pay (WTP) and therefore in VSL estimates over time are primarily functions of income and other factors that shape individuals’ utility and the trade-offs they make between income or wealth and reductions in mortality risk (St-Amour, 2024[8]). While this assumption may hold over short time periods, it may be more difficult to justify over longer horizons. Further, such temporal dynamics require highly specific data in order to be captured empirically (e.g. longitudinal panel studies of the same individuals) (Hammitt, Liu and Liu, 2022[9]; Ørbeck et al., 2024[10]; SAB/USEPA, 2017[11]).4 In addition, if the quality of VSL estimates is assumed to improve over time as scientific methodologies improve, a set of newer studies can be expected to provide higher-quality estimates than older studies.
A number of studies have explored the temporal stability of preferences and WTP for various types of non-market goods, especially for environmental goods. Some evidence indicates that preferences are reasonably stable across short periods of a few years (Ørbeck et al., 2024[10]).5 For example, in identical contingent valuation (CV) surveys of cancer risks carried out in 2014 and 2019, Alberini and Ščasný (2021[12]) found that VSL increased by 41% over this time period. The authors attribute this increase to an increase in income and a lower dread of cancer among participants in 2019. Hammitt et al. (2019[13]) found that VSL increased by a factor of 25 over an 11-year period for a repeated CV survey conducted in China. The authors observed that this increase could be explained by income growth, although their implied income elasticity of 3 falls in the upper range of parameters when compared to other studies (cf. discussion in Section 5.4 of Chapter 5).
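The link between a change in VSL, a change in income and an implied income elasticity can be made explicit with a short calculation. The sketch below assumes that the elasticity is defined as the ratio of the log change in VSL to the log change in income, and that income growth alone explains the change in VSL. The function names are hypothetical, and the income growth factor is back-calculated here from the two figures quoted above rather than taken from the study.

```python
import math


def implied_income_elasticity(vsl_ratio, income_ratio):
    """Elasticity implied by two ratios, assuming VSL is proportional to income ** elasticity."""
    return math.log(vsl_ratio) / math.log(income_ratio)


def implied_income_ratio(vsl_ratio, elasticity):
    """Income growth factor consistent with a VSL growth factor and an elasticity."""
    return vsl_ratio ** (1.0 / elasticity)


# A 25-fold increase in VSL with an elasticity of 3 corresponds to
# income growing by the cube root of 25, i.e. a little under threefold.
income_growth = implied_income_ratio(25.0, 3.0)

# Round trip: recovering the elasticity from the two ratios.
elasticity = implied_income_elasticity(25.0, income_growth)
```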
Overall, however, the evidence on the stability of preferences for mortality risks and VSL over time is limited (Alberini and Ščasný, 2021[12]; Hammitt et al., 2019[13]; Hammitt, Liu and Liu, 2022[9]), and the issue of when empirical evidence becomes irrelevant remains unsettled, in particular for VSL estimates. Consequently, systematic review procedures follow no standard rules on this issue, suggesting that the time period covered by meta-data should depend on the policy issue of interest. A common practice in the literature is to limit systematic reviews to a consideration of more recent studies, such as those published in the last 10 to 20 years (e.g. Keller et al. (2021[14]), who performed a systematic review of VSL estimates based on evidence from 2009 to 2019). Generally speaking, justification for the choice of time period covered in systematic reviews and meta-analyses is seldom provided. Further, no established guidelines exist regarding the weighting of evidence from older studies in meta-analyses. It seems reasonable to include as much of the existing empirical evidence as possible, even if some of this evidence may be outdated or perceived as less informative, and especially when the available pool of studies to draw from is limited. Including all studies regardless of their age has been the approach favoured by the US EPA (2016[1]) as well as in other single country reviews (Ananthapavan et al., 2021[15]; Ginbo, Adamowicz and Lloyd-Smith, 2023[16]).6 When following such an approach, however, it is unclear how any challenges related to older evidence (e.g. relative weighting) should be addressed.
As highlighted in Chapter 3, while there are four meta-data sources available for this analysis, only the newer datasets (new stated preference (New SP) studies and new revealed preference (New RP) studies), covering publications in the period 2009 – 2024 are used to derive VSL estimates.7 In light of the above discussion, this choice is motivated by four reasons.
The first reason for limiting the meta-analysis to more recent studies is that the recommendations presented in this report are expected to be used to value the effects of policies pertaining to mortality risks in future years. To this end, more recent studies are likely to better reflect the current preferences of individuals and circumstances under which (real or hypothetical) choices underpinning VSL estimates are made. Second, assuming that SP and RP methodologies, and the scientific quality of valuation analyses more generally, have improved over time, VSL estimates from more recent studies can be considered more reliable than those from older studies. Third, considering uncertainty in the literature regarding how the income elasticity of VSL may change over time, limiting the data to more recent SP and RP studies reduces the possible impact of such uncertainty. As the descriptive data in Section 3.3 of Chapter 3 indicate, the choice between using all available studies versus only the new SP and RP datasets is unlikely to significantly impact base VSL estimates. Fourth, unlike the older data sources, the new SP and RP studies were compiled using a pre-registered set of transparent systematic review procedures following current best practice (cf. description in Section 3.1.3 of Chapter 3 and Annex A). These methods substantially alleviate the potential for selection bias arising from data collection procedures. Moreover, the use of such methods facilitates replication efforts, enabling the analysis to be repeated for subsequent updates of the meta-data and lending methodological consistency over time. An analysis of the sensitivity of estimated mean VSL estimates to the choice of data source is nevertheless provided in Section 4.7.
4.3. Data preparation and screening
4.3.1. Conversions and outlier exclusion
Apart from using CPI and PPP to adjust for inflation and normalise the VSL estimates obtained from different studies to 2022 USD, no other adjustments are applied to the VSL estimates in the data preparation for analysis (cf. Section 3.1.4 of Chapter 3). This approach eliminates the risk of making assumptions that may not be theoretically well-founded or that may be difficult to document based on information from the available primary valuation studies.
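As an illustration of this kind of normalisation, the sketch below first inflates a local-currency VSL to 2022 prices with a CPI ratio and then converts it to USD with a PPP conversion factor. The order of the two steps, the function name `to_2022_usd` and all input values are assumptions made for illustration; the procedure actually applied is the one described in Section 3.1.4 of Chapter 3.

```python
def to_2022_usd(vsl_local, cpi_study_year, cpi_2022, ppp_2022):
    """Convert a local-currency VSL to 2022 USD.

    Assumed steps (illustrative only):
      1. inflate to 2022 prices using the national CPI ratio;
      2. divide by the 2022 PPP factor, expressed as local
         currency units per USD.
    """
    if cpi_study_year <= 0 or ppp_2022 <= 0:
        raise ValueError("CPI and PPP inputs must be positive")
    inflated = vsl_local * (cpi_2022 / cpi_study_year)
    return inflated / ppp_2022


# Hypothetical inputs: a VSL of 10 million local currency units,
# a CPI that rose from 80 to 100, and a PPP factor of 5 per USD.
vsl_2022_usd = to_2022_usd(10_000_000.0, cpi_study_year=80.0,
                           cpi_2022=100.0, ppp_2022=5.0)
```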
There is no consensus in the meta-analysis literature on VSL regarding the superiority of certain pre-screening procedures for addressing challenges such as the discounting of mortality risks that occur at different times. In practice, it is difficult to extract highly specific information from a substantial portion of primary valuation studies, and it may be undesirable to exclude certain studies from the meta dataset because they lack this information8. Moreover, each VSL estimate was elicited under certain circumstances and contexts that are unique to that estimate. Such circumstances can only be partially extracted or quantitatively coded in the analysis. As discussed in Section 3.3 of Chapter 3, the only VSL estimates that are excluded from the current analysis are outliers that are defined via statistical procedures as implausibly high or low (far-outs). This screening procedure is common practice in the scientific literature and results on the impact of removing far-out observations are provided in Annex E.
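The statistical definition of far-outs is not spelled out in this section. One common convention, assumed here purely for illustration, follows Tukey's outer fences: an observation is a far-out if it lies more than three interquartile ranges below the first quartile or above the third quartile. The function names and data below are hypothetical, and the rule used in the report (for example, whether it is applied to levels or logarithms) may differ.

```python
import statistics


def far_out_bounds(values, k=3.0):
    """Return (lower, upper) outer fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_far_outs(values, k=3.0):
    """Keep only the observations lying inside the outer fences."""
    low, high = far_out_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical VSL estimates (millions of 2022 USD) with one extreme value.
estimates = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
screened = drop_far_outs(estimates)
```

Because the fences are built from quartiles, a single extreme estimate has little influence on where they fall.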
4.3.2. Coding of variables
As explained in Chapter 3, information contained in primary valuation studies in the New RP and SP datasets was screened and extracted to code a set of basic variables for analysis. Table 4.1 describes these variables. As extensively discussed in OECD (2012[7]), it is impossible to capture information on all of the factors that may explain variations in VSL estimates across studies. This is because some studies do not report the information needed to code all factors, or because certain explanatory factors are unique to particular valuation methods or application contexts (e.g. specific experimental variations in SP survey instruments). The variation in study types and comprehensiveness of reporting across studies results in significant proportions of missing values for some coded variables. For example, in SP surveys, respondents can be presented with scenarios that can vary along a range of dimensions, including types and sizes of risks, risk acuteness, different interventions that can be taken to reduce risks, the types and period of payments that can be made to reduce risks, etc.
It is also relevant to note that, when merging SP and RP datasets, there are SP- and RP-specific variables that cannot be meaningfully analysed with merged VSL data. Following the literature and previous meta-analysis studies (OECD, 2012[7]; Ginbo, Adamowicz and Lloyd-Smith, 2023[16]; Lindhjem et al., 2011[17]; Masterman and Viscusi, 2018[18])9, however, several variables specific to the SP and RP datasets were nevertheless coded in order to enable method-specific analyses. The definitions of coded variables in Table 4.1 are grouped into categories: Risk types, Risk acuteness, Elicitation methods, Sample data characteristics, and Methodological characteristics. Variables under the latter category are SP- and RP-specific, and some of them may be treated as proxies for study quality (see discussion in Section 4.3.3). In addition to the variables shown in the table, other variables in the database were coded to identify individual countries and country groups (i.e. the OECD, EU, Low- and middle-income countries and High-income countries10). Summary statistics of all coded variables are provided in Table A C.5 of Annex C.
Table 4.1. Description of coded variables in the meta dataset
| Variable Name | Description |
|---|---|
| WB income category | World Bank income category (Hamadeh, Van Rompaey and Metreau, 2023[19]) |
| GDP per capita (USD thousand) | Country's GDP per capita in the year of VSL estimate, adjusted to thousands of 2022 USD |
| Journal | 1 if published in a peer-reviewed journal, 0 otherwise |
| Risk types | |
| Climate | 1 if climate-related risk, 0 otherwise |
| Crime | 1 if crime-related risk, 0 otherwise |
| Disaster | 1 if disaster-related, 0 otherwise |
| Environment | 1 if environment-related risk, 0 otherwise |
| Health | 1 if health-related risk, 0 otherwise |
| Job | 1 if job-related risk, 0 otherwise |
| Military | 1 if military-related risk, 0 otherwise |
| Natural Disaster | 1 if natural disaster-related risk, 0 otherwise |
| Not Specified | 1 if no cause/type of risk specified, 0 otherwise |
| Suicide | 1 if suicide-related risk, 0 otherwise |
| Transportation | 1 if transportation-related risk, 0 otherwise |
| Virus | 1 if virus-related risk, 0 otherwise |
| Cancer | 1 if cancer is mentioned in the study/survey, 0 otherwise |
| Risk acuteness | |
| Acute | 1 if risk occurs in the immediate term, 0 otherwise |
| Chronic | 1 if risk occurs in the long term, 0 otherwise |
| Mixed/job-related | 1 if risk occurs over an undetermined time frame or is job-related, 0 otherwise |
| Elicitation methods | |
| CM | For RP studies: 1 if consumer market (hedonic price), 0 otherwise |
| HW | For RP studies: 1 if hedonic wage, 0 otherwise |
| CV | For SP studies: 1 if contingent valuation, 0 otherwise |
| CE | For SP studies: 1 if choice experiment, 0 otherwise |
| Other SP method | For SP studies: 1 if other SP methods, 0 otherwise |
| Sample data characteristics | |
| Avg sample income (USD thousand) | Average annual income in the sample, as reported, adjusted to thousands of 2022 USD |
| Household income | 1 if average sample income is household income, 0 if individual income |
| Nationwide | 1 if the target population is nationwide, 0 otherwise |
| General population | 1 if the target population covers the general adult population, 0 if the target population is specific groups |
| Representative | For SP studies: 1 if random or probabilistic sampling is used, 0 otherwise. For RP studies: 1 if all sample of a specific group is used, 0 otherwise |
| Sample income | 1 if any income information (e.g. mean, range) of the sample is reported, 0 otherwise |
| Sample age | 1 if any age information (e.g. mean, range) is reported, 0 otherwise |
| Sample average age | Average age of the sample, as reported |
| Sample gender | The proportion of male individuals in the sample |
| Only male | 1 if the sample is limited to male individuals (i.e. Sample gender = 1), 0 otherwise |
| Only female | 1 if the sample is limited to female individuals (i.e. Sample gender = 0), 0 otherwise |
| Methodological characteristics | |
| SP representative | 1 if the SP study uses random or probabilistic sampling (same as Representative), 0 otherwise |
| SP study including baseline risk | 1 if the survey clearly defined the baseline risk, 0 otherwise |
| SP study focusing on scope sensitivity | 1 if the study reports a scope test (including external and internal), 0 if otherwise |
| SP study including visual aids | 1 if the study uses visual aids, 0 otherwise |
| SP with risk change | Annualised risk change used in the SP survey |
| RP study using Census of Fatal Occupational Injuries (CFOI) data | 1 if CFOI data is used, 0 otherwise |
| RP study using IV estimate | 1 if the VSL is estimated using an IV estimator for fatality rate, 0 otherwise |
| RP study controlling for non-fatal injury risks | 1 if the regression controlled for non-fatal injury risk, 0 otherwise |
| RP with full sample | 1 if the full sample of employed individuals is used, 0 if particular groups of workers are used |
| Hedonic wage RP study | 1 if the dependent variable in the HW regression is wage rather than natural logarithm of wage, 0 otherwise |
| Quality score (SP) | 1 if: SP study including baseline risk = 1 and SP study focusing on scope sensitivity = 1 and SP study including visual aids = 1, 0 otherwise |
| Quality score (RP) | 1 if: RP study using CFOI data = 1 and RP study using IV estimate = 1 and RP study controlling for non-fatal injury risk = 1, 0 otherwise |
Note: Risk types are defined as the following: Disaster-related risk: nuclear accident, house fire; Natural disaster-related risk: avalanche, flood, earthquake; Climate-related risk: extreme weather, sea level change and depletion of fisheries, heat stroke; Virus: COVID-19, dengue fever, rabies.
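The two quality scores in Table 4.1 are conjunctions of three binary indicators each. A minimal sketch of that coding rule is given below; the function name is hypothetical, and treating a missing indicator (`None`) as not satisfied is an assumption made here rather than a rule stated in the table.

```python
def quality_score(first, second, third):
    """Return 1 only if all three component indicators equal 1, else 0.

    A missing indicator (None) is treated as not satisfied, which is
    an illustrative assumption.
    """
    return int(all(flag == 1 for flag in (first, second, third)))


# SP study: baseline risk defined, scope test reported, no visual aids.
sp_score = quality_score(1, 1, 0)

# RP study: CFOI data, IV estimate, control for non-fatal injury risk.
rp_score = quality_score(1, 1, 1)
```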
4.3.3. Quality considerations
Valuation studies vary in their methodological rigor and the robustness of their estimates. Moreover, as with other methods of scientific inquiry, valuation methods improve over time. A key methodological question for meta-analyses is whether estimates from less rigorous studies, i.e. “invalid” estimates, should be excluded from analyses on the basis of which VSL recommendations are made. “Invalid” estimates can be understood as estimates that are systematically biased from the true underlying VSL estimates (Bishop and Boyle, 2019[20]). The concept of validity is different from reliability, as the latter is related to uncertainty (i.e. the variance) in the estimates around the true values11. In principle, excluding VSL estimates considered to be invalid may be sensible since (highly) biased estimates do not accurately reflect individuals’ true trade-offs between risk and income.
Varying practices exist in the literature regarding screening for the quality of primary valuation studies. OECD (2012[7]) carries out a quality screening based on the size and representativeness of the samples and whether information about risk changes was provided. Other criteria are used in the broader health literature (e.g. Keller et al. (2021[14])12) but may not be easily applied when assessing the validity of VSL estimates. Additional criteria that were considered (for SP in particular) are whether WTP is higher for larger risk reductions (i.e. whether VSL estimates pass a scope test), whether certain estimates have been identified as “best estimates”, and whether the surveys are well-tested and of high quality (Lindhjem et al., 2011[17]). US EPA (2016[1]), for example, applies relatively strict eligibility and quality criteria for the inclusion of primary valuation studies based on earlier VSL guidance (USEPA, 2011[21])13. This procedure was critiqued by US EPA SAB (2017, p. 18[22]), stating that: “…it may not be possible to determine that a study or estimate is valid, but it may be possible to decide that there is insufficient evidence to support a conclusion of invalidity and the data are therefore worthy of inclusion in the analyses. In such cases the burden of proof should be on rejecting studies. If the weight of evidence points toward validity, the study should be included.”14 They propose a list of quality criteria, while recognising the difficulty of operationalising it, and highlight that more work is needed in the future15. Ultimately the US EPA SAB (2017[11]) argues that, until more objective and transparent criteria become available, it is preferable to include estimates rather than to exclude them, for transparency and other reasons.
It is difficult to determine a set of transparent and generally acceptable criteria for assessing study quality that are stable over time. This is due in part to the continuing evolution of scientific methodologies, as well as the fact that scientific opinions regarding these criteria can vary at any one point in time. What may be considered to be highly valuable by an analyst in one context may not be deemed appropriate by a different analyst in another context. Moreover, various considerations may drive an analyst’s preferred quality criteria, ranging from minimum requirements to desirable requirements. Additionally, the effects of quality screening on VSL estimates may vary substantially across studies/contexts, and could be difficult to assess systematically. Finally, some factors that are considered important for research quality in the context of peer-reviewed journal publications (e.g. advanced econometrics) may be considered less relevant for producing valid VSL estimates in contexts where more practical considerations (e.g. sampling, SP survey design, testing and data quality)16 carry more weight than academic sophistication. Existing guidelines remain unsettled on several important methodological issues, especially for SP methods (see, for example, Johnston et al. (2017[23])). While some issues also remain unsettled for RP methods, several best practices are nevertheless well-established in this area (Evans and Taylor, 2020[24]).
Based on the considerations above, this report adopts a conservative approach, erring on the side of including rather than excluding studies based on quality criteria that are difficult to define17. Consequently, to the extent possible, information on several quality indicators for both RP and SP studies is recorded in the database. For SP studies, examples include whether probabilistic sampling and visual aids to display risks are used. For hedonic wage RP studies, examples include an indicator of the quality of underlying risk data18, of the use of instrumental variables (IV) and of controlling for non-fatal injuries (Viscusi, 2019[25]) (cf. Table 4.1). Another quality variable indicates whether studies have been published in peer-reviewed journals19. Using the information provided by these variables, sensitivity analyses to assess the importance of study quality on the results are provided in Section 4.7 and Annex E. Basing the recommendations in this report on an inclusive dataset allows practitioners to make their own choices with respect to controlling for such quality criteria.
Further, to integrate measures of statistical error associated with VSL estimates in the analysis, standard errors are extracted, included and estimated (or imputed if missing) (cf. Section 4.4 and Annex D), allowing for some control of uncertainty in the estimation of VSL estimates. As a result, VSL estimates from studies with small samples that are associated with relatively higher uncertainty have a lower weight in the meta-analysis relative to more reliable estimates. The use of Bayesian imputation for the standard errors in this study represents a methodological advancement relative to previous VSL literature (McElreath, 2018[26]).
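The way standard errors translate into weights can be seen in a simple inverse-variance weighted mean, sketched below. This is a deliberately simplified fixed-effect calculation, not the multilevel random effects model used in this report, and the function name and data are hypothetical.

```python
def inverse_variance_mean(estimates, std_errors):
    """Pool estimates with weights 1 / SE**2 (fixed-effect pooling).

    Returns the weighted mean and its standard error.
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need equally long, non-empty inputs")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, estimates)) / total
    return mean, (1.0 / total) ** 0.5


# Two hypothetical VSL estimates (millions of 2022 USD). The second
# has twice the standard error, so it receives a quarter of the weight.
pooled, pooled_se = inverse_variance_mean([2.0, 4.0], [1.0, 2.0])
```

In this example the pooled value lies closer to the more precise estimate than the simple average of the two would.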
4.3.4. Publication bias
A final point related to data quality concerns so-called publication selection bias. This can occur when authors selectively report their research results, and also if journals systematically avoid publishing certain results. If this process systematically prevents the publishing of certain results, it may bias the body of evidence in the literature over time. In other words, where publication selection bias is present, published studies are no longer a representative sample of the available evidence. This problem is likely to be more serious in other types of economic research where it is often necessary to document statistically significant relationships in order to publish an analysis (Askarov et al., 2023[27]). However, such bias has also been investigated by a few explorative studies in the VSL literature (Doucouliagos, Stanley and Giles, 2012[28]; Masterman and Viscusi, 2020[29]). Current evidence of publication bias in the VSL literature is still relatively limited and the methods to detect such bias are not necessarily fully appropriate.

The general method of detecting publication bias is based on statistical methods that derive so-called “funnel plots”. Both the methods used to derive these plots and the interpretation of their results have limitations, some of which are discussed here as examples. First, a critical assumption for analyses of publication bias is the independence between VSL estimates and standard error (SE) estimates. An implication of this assumption is that imputed SE estimates cannot be used in analyses of publication bias because they are derived from a relationship between the SE and mean VSL estimates (implying that imputed SEs and mean VSL estimates are correlated).20 Second, funnel plots are unable to provide insights regarding the causes of suspected bias. Doucouliagos, Stanley and Giles (2012[28]), for example, find that the mean VSL estimate from the hedonic wage literature is 70-80% lower when accounting for publication bias. Masterman and Viscusi (2020[29]) report a similar magnitude of publication bias, resulting in a reduction of the mean VSL estimate from SP studies by 90%. Third, findings of bias are also sensitive to the assumptions made about the underlying true distribution of the VSL estimates and to how the primary VSL study was structured. For example, it is common in the stated preference literature to use statistical methods and tests that ensure that respondents exhibit a positive willingness to pay to reduce risk, which then by design eliminates the negative willingness to pay that can sometimes be observed in revealed preference studies.

As discussed above, existing research suggests that the impact of publication selection bias could potentially be large. However, further investigation is needed, regarding both the validity and use of these methods to detect bias and the potential reasons why large biases may exist. To the authors’ knowledge, no existing guidance regarding CBA and the valuation of mortality risks recommends adjusting base VSL estimates for publication bias, and therefore no adjustments were made in this report.
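A regression-based counterpart to the visual inspection of a funnel plot is a funnel asymmetry test of the Egger type: estimates are regressed on their standard errors with weights of 1/SE², the slope indicates asymmetry, and the intercept is read as the estimate that a study with a standard error of zero would report. The sketch below is a generic illustration of that idea with hypothetical data and names. It is not the procedure of any study cited here and, as noted above, it would not be meaningful with imputed standard errors.

```python
def funnel_asymmetry(estimates, std_errors):
    """Weighted least squares of estimate on SE with weights 1 / SE**2.

    Returns (intercept, slope). A slope far from zero suggests that
    less precise studies report systematically different estimates.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, std_errors)) / total
    y_bar = sum(w * y for w, y in zip(weights, estimates)) / total
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, std_errors))
    if s_xx == 0:
        raise ValueError("standard errors must not all be equal")
    s_xy = sum(w * (x - x_bar) * (y - y_bar)
               for w, x, y in zip(weights, std_errors, estimates))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data built so that estimate = 1 + 2 * SE exactly.
intercept, slope = funnel_asymmetry([2.0, 3.0, 5.0], [0.5, 1.0, 2.0])
```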
Several steps have nevertheless been taken in the present report to reduce potential publication bias. First, studies were identified based on a keyword search with a wide topical scope that covered economics, health, as well as interdisciplinary publication outlets. Second, the VSL meta-data includes publication types beyond traditional journal papers, such as working papers, conference proceedings, academic theses and other studies that were not published in journals. Third, the full range of VSL estimates has been included in the analysis, including a small share of negative VSL estimates, which are statistically possible outcomes of analyses of primary valuation data. In addition, Annex E provides an estimate of the impact on the preliminary unweighted VSL estimates of limiting the sample to peer-reviewed journals, and Chapter 5 also includes the marginal effect of peer-reviewed journals and of using two study quality scores.
4.4. Meta-analysis methodology for estimating mean VSL
4.4.1. Best practice meta-analysis methods
Drawing on best practice guidelines provided by US EPA (2006[30]), Nelson and Kennedy (2009[31]) and Nelson (2015[32]), the meta-analysis in this report uses a multilevel random effects model (Harrer et al., 2021[33]; Sera et al., 2019[34]). This approach normally considers three key characteristics of the meta-data: sample heterogeneity, heterogeneity in the variance of VSL estimates (so-called heteroskedasticity), and correlation within and between primary studies (Nelson and Kennedy, 2009[31]). In this report a fourth level is added that considers heterogeneity resulting from the choice of elicitation approach. This is further discussed below and described formally in Box 4.1.
Primary valuation studies use different samples and methods to estimate VSL estimates, which generates heterogeneity in VSL estimates in the meta-data. Previous meta-analyses use either a meta-regression approach, which accounts for variation through the inclusion of regressors (Viscusi, 2019[25]) or a random effects approach, which accounts for variation by including random intercepts (Kochi, Hubbell and Kramer, 2006[35]). In this study, both approaches are used: a random effects model (Section 4.4.2 and Box 4.1) and a meta-regression, also controlling for random effects (Chapter 5).
Due to heterogeneity in the characteristics of primary valuation studies, VSL estimates generally have heteroskedastic variances. Meta-analyses require variance (i.e. standard error) estimates from primary studies in order to calculate weighted averages and to correct for heteroskedasticity. Nelson and Kennedy (2009[31]) therefore strongly recommend researchers to collect primary data on variances21. However, many recent meta-analyses of VSL estimates are mixed in that some studies collect, calculate or impute these estimates from primary studies (Lindhjem et al., 2011[17]; Viscusi, 2019[25]), while others approximate variances using sample sizes instead (Ginbo, Adamowicz and Lloyd-Smith, 2023[16]).
In this report, standard errors of VSL estimates are collected from the primary valuation studies. When standard errors are not reported, they are recovered from other available information, such as confidence intervals. In the absence of both, standard errors are imputed using a Bayesian imputation method (McElreath, 2018[26])22. With this information, the random effects model considers each estimate's sampling error as a random draw from a normal distribution, taking into account each estimate's variance from the primary study. This standard error information, together with higher-level random effects, is used to calculate mean VSL estimates (Harrer et al., 2021[33]; Sera et al., 2019[34]) and to correct for heteroskedasticity.
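As an illustration of recovering a standard error from a reported confidence interval, the following minimal sketch assumes the interval is symmetric and based on a normal approximation. The report does not state the exact formula it used, and intervals obtained by bootstrapping or simulation may be asymmetric, in which case this shortcut is only approximate. The function name and the example values are hypothetical.

```python
from statistics import NormalDist


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Back out a standard error from a symmetric, normal-based confidence interval."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # The interval spans z standard errors on each side of the point estimate.
    return (upper - lower) / (2.0 * z)


# Hypothetical primary estimate: VSL of USD 5.0 million, 95% CI of 3.0 to 7.0.
se = se_from_ci(3.0, 7.0)
```

With these hypothetical bounds the recovered standard error is roughly USD 1.02 million.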
Multilevel random effects models also address the independence assumption of estimates within and between studies. Since most primary valuation studies of VSL report multiple estimates, the independence assumption between estimates from the same study can easily be violated. Although Nelson and Kennedy (2009[31]) call attention to this problem of non-independence, most previous meta-analyses of VSL have taken this into account only to a limited extent.23
In addition to accounting for random effects at the estimate level, multilevel random effects models can also account for random effects at the study level (level three) in order to address non-independence. This enables the model to capture heterogeneity in VSL estimates between studies by allowing a random effect to vary across studies. The use of three-level random effects models is common practice in the meta-analysis literature in subjects such as medicine and natural sciences (Brown et al., 2024[36]; Harrer et al., 2021[33]; Sera et al., 2019[34]).
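To see why a study-level random effect addresses non-independence, consider a three-level sketch. The notation is illustrative, follows the conventions of Harrer et al. (2021[33]) and is not taken from the report: $y_{ij}$ is estimate $i$ in study $j$, $\mu$ is the overall mean, $\zeta_{(3)j}$ is a study-level random intercept with variance $\tau_{(3)}^2$, $\zeta_{(2)ij}$ is an estimate-level random intercept with variance $\tau_{(2)}^2$, and $\varepsilon_{ij}$ is the sampling error with variance $s_{ij}^2$. All terms are assumed to be mutually independent.

```latex
\begin{aligned}
y_{ij} &= \mu + \zeta_{(3)j} + \zeta_{(2)ij} + \varepsilon_{ij}, \\
\operatorname{Var}(y_{ij}) &= \tau_{(3)}^2 + \tau_{(2)}^2 + s_{ij}^2, \\
\operatorname{Cov}(y_{ij},\, y_{i'j}) &= \tau_{(3)}^2 \qquad (i \neq i'), \\
\operatorname{Cov}(y_{ij},\, y_{i'j'}) &= 0 \qquad (j \neq j').
\end{aligned}
```

Two estimates from the same study share the intercept $\zeta_{(3)j}$ and are therefore correlated, whereas estimates from different studies are not. Treating all estimates as independent would ignore this shared component and overstate the amount of information in the data.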
4.4.2. The random effects model concept
The most basic random effects model, also called the two-level random effects model, introduces an additional variance component (in addition to the “usual” sample variance) that accounts for the fact that VSL estimates do not come from a single population. Instead, each VSL estimate is assumed to be an independent draw from a “universe” of populations (Harrer et al., 2021[33]). This two-level random effects model is illustrated in Figure 4.1. The true VSL value $\theta_k$ (or “effect size” in the figure) of study $k$ differs from the observed VSL value $\hat{\theta}_k$ by a sampling error $\varepsilon_k$. However, the true VSL value of study $k$ is only a point within a distribution of true VSL values of mean $\mu$. The difference (or error) between the true VSL value of study $k$ and the mean of the distribution of true VSL values is $\zeta_k$. In this example, the observed VSL value of study $k$ deviates from the mean of the distribution of true VSL values by two sources of error ($\zeta_k$ and $\varepsilon_k$), so that $\hat{\theta}_k = \mu + \zeta_k + \varepsilon_k$.
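The two-level idea can be sketched numerically. The example below pools a handful of hypothetical estimates using the DerSimonian-Laird method-of-moments estimator for the between-study variance. This estimator is chosen here only because it is short and transparent; it is not the multilevel estimation procedure used for the results in this report, and the function name and input values are assumptions made for illustration.

```python
import math


def random_effects_mean(estimates, std_errors):
    """Pool estimates with a two-level random-effects model.

    Uses the DerSimonian-Laird estimator for the between-study variance
    tau^2 (an illustrative choice, not the report's estimator).
    Returns (pooled mean, its standard error, tau^2).
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")

    # Fixed-effect step: inverse-variance weights ignore between-study variation.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures dispersion beyond what sampling error alone predicts.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects step: add tau^2 to each estimate's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    mean = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_mean = math.sqrt(1.0 / sum(w_re))
    return mean, se_mean, tau2


# Hypothetical observed VSL values (USD million) and their standard errors.
mean, se_mean, tau2 = random_effects_mean([2.0, 6.0, 10.0], [1.0, 1.0, 1.0])
ci = (mean - 1.96 * se_mean, mean + 1.96 * se_mean)
```

In this toy example the estimates differ by far more than their standard errors would suggest, so the estimated between-study variance is large and the pooled mean has a much wider confidence interval than a fixed-effect average would give.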
Figure 4.1. Intuition behind the random effects model (two levels)
Source: Harrer et al. (2021[33])
Results from the four-level random effects model are reported in Table 4.2.
4.5. Estimated preliminary and unweighted mean VSL (PUVSL) by country group using a multilevel random effects model
As described above, the random effects meta-analysis model accounts for the fact that the VSL estimates collected from many studies have more potential sources of variation than those drawn from a single homogeneous population. Unlike the fixed-effects24 meta-analysis approach, the random effects model assumes that individual VSL estimates deviate from the true VSL value not only due to pure sampling error, but also due to other sources of variation. After testing a number of alternative econometric specifications of random effects models, a four-level model was selected as the basis for the reported results. This model accounts for four sources of heterogeneity in the data, namely sampling variations within estimates of a study, heterogeneity between estimates in a study, heterogeneity among VSL studies, and heterogeneity resulting from the choice of VSL elicitation method. To the authors’ knowledge, the use of a fourth level that accounts for VSL elicitation methods is a methodological improvement that has not previously been used in the VSL literature. The detailed specification of the random effects model is provided in Box 4.1.
Alternative specifications to the final model and related discussions are also available in Annex E. Table 4.2 and Figure 4.2 report mean preliminary unweighted VSL (PUVSL) estimates and 95% confidence intervals (CI) for studies conducted in the following country groups:
All countries
OECD
EU
United States
World Bank income categories country groups25:
High-income countries
Low- and middle-income countries
Box 4.1. Econometric specification of the multilevel random effects model
To formalise the multilevel random effects model, let $y_{ijm}$ denote an observed VSL estimate $i$ from a study $j$ that uses an elicitation method $m$ (see note 1). For example, the selected dataset that includes newer SP and RP studies has 2 469 VSL estimates ($N = 2\,469$) produced by 156 studies ($J = 156$). The first-level random effects model accounts for sampling variations within estimates in a study by assuming that an observed individual VSL estimate, $y_{ijm}$, can be decomposed into the estimate’s true VSL estimate $\theta_{ijm}$ and a pure sampling error $\varepsilon_{ijm}$ (i.e. $y_{ijm} = \theta_{ijm} + \varepsilon_{ijm}$). The sampling error is assumed to follow a normal distribution with mean zero and a standard deviation equal to the observed standard error estimate $s_{ijm}$: $\varepsilon_{ijm} \sim N(0, s_{ijm}^2)$.
The second-level random effects model captures heterogeneity between VSL estimates within studies. The true individual VSL estimate is now modelled as follows: $\theta_{ijm} = \kappa_{jm} + \zeta_{(2)ijm}$, with $\zeta_{(2)ijm} \sim N(0, \tau_{(2)}^2)$, where $\zeta_{(2)ijm}$ is a random intercept at the estimate level and $\tau_{(2)}^2$ will be estimated in the model. $\kappa_{jm}$ is the true study-level VSL estimate.
The three-level random effects model captures heterogeneity in VSL estimates between studies within elicitation methods. The true study-level VSL estimate can now be decomposed as follows: $\kappa_{jm} = \lambda_m + \zeta_{(3)jm}$, $\zeta_{(3)jm} \sim N(0, \tau_{(3)}^2)$, where $\zeta_{(3)jm}$ is a random intercept at the study level. The three-level model ends here, and the true VSL estimate from the level three model is $\lambda_m$, which will be reported under three-level models.
The four-level random effects model goes one step further by also capturing the heterogeneity in VSL estimates between elicitation methods (i.e. CE, CV, HW and CM). With this added component, the true elicitation-method-level VSL estimate can be decomposed as follows: $\lambda_m = \mu + \zeta_{(4)m}$, $\zeta_{(4)m} \sim N(0, \tau_{(4)}^2)$. $\zeta_{(4)m}$ is a random intercept at the level of the elicitation method. The true population VSL estimate $\mu$ is reported as the result of the four-level model.
The heterogeneity variance parameters $\tau_{(l)}^2$ are larger the greater the heterogeneity within each level $l$. For example, if the estimate of $\tau_{(4)}^2$ is not statistically significantly different from zero, the data suggest that the three-level model is preferable to, or not distinguishable from, the four-level model (see note 2). In summary, an observed individual VSL estimate $y_{ijm}$ using the population VSL estimate $\mu$ (which is the estimand; see note 3) can be written as follows:
Equation 4.1
$$y_{ijm} = \mu + \zeta_{(4)m} + \zeta_{(3)jm} + \zeta_{(2)ijm} + \varepsilon_{ijm},$$
$$\zeta_{(4)m} \sim N(0, \tau_{(4)}^2),$$
$$\zeta_{(3)jm} \sim N(0, \tau_{(3)}^2),$$
$$\zeta_{(2)ijm} \sim N(0, \tau_{(2)}^2),$$
$$\varepsilon_{ijm} \sim N(0, s_{ijm}^2).$$
(See note 4.)
1. I.e. Choice experiment (CE), contingent valuation (CV), hedonic wage (HW) or consumer market (CM).
2. A likelihood-ratio test is used to determine whether a four-level model is statistically different from a three-level model.
3. An estimand is a quantity that is to be estimated in a statistical analysis.
4. Note that although Shapiro-Francia tests reject strict normality of the residuals at levels 1-3 (but not at level 4), the residuals are distributed almost symmetrically around zero (these results are not reported for brevity). It should also be noted that normality of the residuals is not essential for linear estimators and has little impact on the point estimates (Rubio-Aparicio et al., 2018[37]; Schmidt and Finan, 2018[38]; Wooldridge, 2014[39]). Log transformation worsens the model fit for the four-level random effects model.
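The likelihood-ratio comparison of a three-level and a four-level model mentioned in note 2 can be sketched as follows. The log-likelihood values are hypothetical placeholders, not results from this report, and the function name is an assumption. The report does not state which reference distribution was used: the sketch returns both the conventional chi-square p-value with one degree of freedom and the boundary-adjusted p-value that is often used when the parameter being tested is a variance, which cannot be negative.

```python
import math


def lr_test_variance_component(loglik_restricted, loglik_full):
    """Likelihood-ratio test for one additional variance component.

    Returns (LR statistic, chi-square(1) p-value, boundary-adjusted p-value).
    """
    # A nested model cannot fit better than the full model; clip rounding noise.
    lr = max(0.0, 2.0 * (loglik_full - loglik_restricted))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_chi2 = math.erfc(math.sqrt(lr / 2.0))
    # Under the null the variance sits on the boundary of the parameter space,
    # so a 50:50 mixture of chi-square(0) and chi-square(1) is a common
    # reference distribution, which halves the p-value whenever lr > 0.
    p_boundary = 0.5 * p_chi2 if lr > 0.0 else 1.0
    return lr, p_chi2, p_boundary


# Hypothetical log-likelihoods of a three-level and a four-level fit.
lr, p_chi2, p_boundary = lr_test_variance_component(-5230.4, -5228.1)
```

A small p-value would indicate that the elicitation-method level explains variation that the three-level model leaves unaccounted for.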
The results presented in Table 4.2 and Figure 4.2 exclude so-called far-out observations of VSL estimates. Figure A E.5 in Annex E presents results from a sensitivity analysis of the estimated PUVSLs for all countries to the inclusion of far-outs. The results suggest that including far-outs increases the PUVSL by approximately 5%. Hence, the effect of extreme values on the simple (non-parametric) means at the estimate level shown in Section 3.3 of Chapter 3 is accounted for and almost fully removed in the four-level random effects model.
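This section does not restate how far-out observations are defined. As a minimal sketch, the example below assumes Tukey's convention, under which a value is far out when it lies more than three interquartile ranges below the first quartile or above the third quartile. Both the rule and the quartile method are assumptions made for illustration, and the data are hypothetical.

```python
import statistics


def flag_far_outs(values, k=3.0):
    """Flag values outside Tukey's outer fences (k interquartile ranges from the quartiles)."""
    # Quartiles computed with the inclusive method (an assumed convention).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v < lower_fence or v > upper_fence for v in values]


# Hypothetical VSL estimates in USD million; the last one is extreme.
vsl_estimates = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flags = flag_far_outs(vsl_estimates)
kept = [v for v, is_far_out in zip(vsl_estimates, flags) if not is_far_out]
```

Only the last value is flagged in this example, so `kept` contains the first nine estimates.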
It should be noted that the mean PUVSL estimates reported in Table 4.2 are not recommended values for policy use. Further adjustments, detailed in Chapters 5 and 6, are necessary when applying them in specific policy settings. As can be seen in Table 4.2, estimated mean PUVSLs range from approximately USD 0.9 million for the low- and middle-income country group to USD 7.6 million for the EU. There is little variation within the groups of relatively richer countries, with mean PUVSL estimates ranging from USD 7.1 million for the OECD to USD 7.6 million for the EU.26
Table 4.2. Mean Preliminary Unweighted VSL (PUVSL) estimates by country group
New SP and RP data, USD million
| Countries | Mean PUVSL estimate | 95% CI (lower bound) | 95% CI (upper bound) | Number of studies | Number of estimates |
|---|---|---|---|---|---|
| All countries | 5.5 | 3.7 | 7.2 | 156 | 2 449 |
| OECD | 7.1 | 5.4 | 8.8 | 108 | 1 928 |
| EU | 7.6 | 6.4 | 8.9 | 36 | 467 |
| United States | 7.2 | 5.4 | 8.9 | 38 | 1 042 |
| High-income countries | 7.1 | 5.7 | 8.6 | 113 | 2 016 |
| Low- and middle-income countries | 0.9 | 0.3 | 1.5 | 43 | 433 |
Note: The mean VSL estimate (population parameter from Equation 4.1) and 95% confidence intervals were estimated using the statistical software package STATA. The STATA code for reproducing these results is available upon request.
All countries refers to the unweighted mean of all countries for which VSL studies were included. OECD reflects its 38 member countries as of 2025. EU refers to the 27 members of the European Union in 2025. The three World Bank income country categories are defined as: Low income (Gross National Income (GNI) per capita < USD 1 145), Lower-middle income (GNI per capita = USD 1 146 – 4 515) and Upper-middle income (GNI per capita = USD 4 516 – 14 005). Low-, lower-middle and upper-middle income categories are merged to form the category “Low- and middle-income countries” in order to have a sufficient number of studies to estimate the mean VSL with reasonable levels of uncertainty. “High-income countries” reflects countries with GNI per capita > USD 14 005.
Note that the relative over-representation of high-income countries in the sample increases the mean for the All countries group to USD 5.5 million. For example, the Gross National Income (GNI) per capita for the low- and middle-income group ranges from USD 4 466 to 13 845, while the average GNI per capita in the EU is around USD 59 000 (OECD, 2025[40]). As is shown in Chapter 5, income differences are a key driver of the differences between the PUVSL for the low- and middle-income group of countries and that for the EU and other high-income country groups. They are also one of the key reasons why the PUVSLs need to be adjusted to be representative of each respective country group.
The confidence intervals around the means are relatively narrow due to a relatively high number of estimates for all groups, as is evident in Figure 4.2. Note that since these estimated mean PUVSL estimates take into account the four levels of random effects, i.e. sources of heterogeneity in VSL estimates,27 the estimates derived through this statistical meta-analysis procedure are not directly comparable to the descriptive estimate-level and study-level means presented in Section 3.3 of Chapter 3.
Figure 4.2. Preliminary unweighted VSL estimates (PUVSL) by country group
Note: Based on four-level random effects model estimates. 47 far-out estimates excluded. All countries refers to the unweighted mean of all countries for which VSL studies were included. OECD reflects its 38 member countries as of 2025. EU refers to the 27 members of the European Union in 2025. The three World Bank income country categories are defined as: Low income (Gross National Income (GNI) per capita < USD 1 145), Lower-middle income (GNI per capita = USD 1 146 – 4 515) and Upper-middle income (GNI per capita = USD 4 516 – 14 005). Low-, lower-middle and upper-middle income categories are merged to form the category “Low- and middle-income countries” in order to have a sufficient number of studies to estimate the mean VSL with reasonable levels of uncertainty. “High-income countries” reflects countries with GNI per capita > USD 14 005.
As noted above, the estimated mean PUVSL values in Table 4.2 are a starting point for calculating recommended base VSL values. In order to arrive at a recommended estimate that can be used in policy analysis, additional adjustments need to be considered, such as adjustments for income, risk type and other evidence in the literature. Potential adjustments are considered in Chapter 5.
4.6. Sensitivity of preliminary unweighted VSL estimates to valuation methods and datasets
This section presents and discusses the sensitivity of PUVSL estimates to the choice of valuation methods and datasets. This is first done “partially”, i.e. by considering each factor in isolation. Chapter 5 reports the results of a meta-regression analysis controlling for income and other factors, as well as results of sensitivity analyses regarding the choice of SP or RP methods and datasets.
4.6.1. Choice of SP and RP methods and data sources
Figure 4.3 depicts the sensitivity of the mean PUVSL estimate for the All countries group (USD 5.5 million) to the choice of valuation method (SP or RP) and dataset (Old or New). The top line shows the main model estimate of mean USD 5.5 million and a confidence interval of USD 3.6 to 6.9 million.
The mean PUVSL is USD 5.1 million for the new SP data and USD 5.7 million for the new RP data. The similarity of these estimates suggests that the two main types of valuation methods exhibit a large degree of convergent validity. However, this is not the case when comparing the Old SP and RP data, where the Old RP data yields significantly higher PUVSL estimates than the Old SP data. When comparing SP and RP data (both old and new data), the PUVSL estimates are USD 6.3 million for SP data and USD 6.4 million for RP data. A further analysis of these differences, including by type of RP and SP method, is presented in Section 4.6.2 below.
Other studies such as US EPA (2016[1]), which is limited to studies from the United States only, find that RP estimates tend to be systematically higher than SP estimates. An older meta-analysis, Kochi et al. (2006[35]), also finds that RP estimates are significantly higher than SP estimates. Note that both of these studies only include hedonic wage studies (not consumer market studies) as RP studies. In contrast to these findings, Ginbo et al. (2023[16]) find that SP methods yield VSL estimates roughly twice as high as RP methods among VSL studies in Canada. While these studies document evidence of differences in VSL estimates across valuation methods, the reasons for these differences remain unclear.
Note also that the new dataset (New SP and New RP datasets) yields a PUVSL estimate of USD 5.5 million, compared to USD 7.3 million for the old dataset. An important reason for the decrease in VSL estimates for the new dataset compared to the old is the inclusion of CM studies in the new dataset, which tend to yield systematically lower values. Another important reason is that the use of both SP and RP methods has spread to other parts of the world with lower income levels (cf. Section 3.2.3 of Chapter 3) and hence lower VSL estimates. There may also be an element of increasing methodological prudence due to dissemination and use of best practice guidance for non-market valuation studies over time and to new regions producing more conservative VSL estimates. The PUVSL estimate of the merged datasets (old and new) is close to the middle value at USD 6.2 million (see Figure A E.3 in Annex E). Further details on PUVSL estimates calculated on the basis of different datasets are provided in Annex E.
Figure 4.3. Sensitivity of PUVSL estimates to elicitation methods and dataset
All countries
Note: The figure displays mean PUVSL estimates and 95% confidence intervals using a four-level random effects model for the All countries group. SP refers to “Stated Preference” and RP refers to “Revealed preference”. Old refers to studies published before 2009, and New refers to studies published from 2009.
4.6.2. Elicitation methods (CE, CV, HW, CM)
This section investigates the sensitivity of PUVSLs to the specific types of RP and SP methods, i.e. contingent valuation (CV), choice experiments (CE), hedonic wage (HW) and consumer market (CM) methods. Exploring the sensitivity of PUVSL estimates to elicitation methods contributes to understanding the differences in VSL estimates between SP and RP observed in Section 4.6.1. Figure 4.4 depicts mean PUVSL estimates and 95% confidence intervals across these four methods for the old and new meta-data.
The top three lines reflect results from CV for all, new and old meta-data, focusing only on the overall mean (for the All countries group)28. These results indicate that CV, CE, HW, and CM yield PUVSL estimates of USD 4.2, 6.5, 8.4 and 3.5 million, respectively. The RP methods produce quite heterogeneous estimates, containing both the highest (HW) and the lowest (CM) estimates. As noted in Chapter 3, this is due in part to the fact that CM studies (a type of RP method) tend to give systematically lower estimates than HW studies, for reasons that remain unclear, and that the Old RP database did not include any CM studies. Additionally, relatively few VSL estimates come from CM studies overall, yielding a wide confidence interval for estimates produced using this method. CE and HW, in contrast, yield relatively similar estimates. CV methods tend to yield lower VSL estimates than CE and HW methods, but higher than CM methods. Taken together, these results demonstrate that the type of SP- or RP-specific elicitation method used can have a larger effect on PUVSL estimates than the choice of SP or RP methods more generally.
Figure 4.4. Sensitivity of PUVSL to CV, CE, HW and CM methods
All countries
Note: The figure displays mean PUVSL estimates and 95% confidence intervals using a three-level random effects model for the All countries group. SP refers to “Stated Preference” and RP refers to “Revealed preference”. Old refers to studies published before 2009, and New refers to studies published from 2009.
4.7. Sensitivity of PUVSL to quality and screening factors
This section briefly discusses several other factors that could potentially be used to screen the dataset before estimating PUVSL using the random effects meta-analysis model described in Section 4.4.2. Such factors could include the scope of the population covered in the primary valuation study or aspects related to the scientific quality of the study. As discussed in Section 4.3, making assessments about the quality of primary valuation studies and deciding to exclude studies on the basis of such assessments requires careful consideration and justification.
Table 4.3 reports the effects of screening factors on the PUVSL estimate for the All countries group as reported in Table 4.2. For this analysis, the dummy variables defined in Section 4.3.2 are used to screen the dataset before the estimation. Note that this analysis is partial in that it assesses the effect of each of these factors individually without controlling for other factors. Results in Table 4.3 are reported in the following order:
Screening for population coverage:
Nationwide: Whether or not the target population is nationwide.
General population: Whether the target population covers the general adult population or specific groups.
Aspects of study quality:
Random sampling: For SP studies, whether random or probabilistic sampling is used; for RP studies, whether an entire sample of a specific group is used.29
No standard errors of VSL estimates are reported.
SP-specific: Whether the survey clearly defines the baseline level of risk.
SP-specific: Whether the study reports a scope test (including external and internal)30.
SP-specific: Whether a visual aid to explain risk is used.
RP-specific: Whether risk data is based on the Census of Fatal Occupational Injuries (CFOI).
RP-specific: Whether VSL is estimated using an instrumental variable (IV) estimator for fatality rate.
RP-specific: Whether the regression controls for non-fatal injury risk.
Figure A E.6 through Figure A E.15 of Annex E report PUVSL estimates resulting from screening the dataset according to the factors above. It is important to note that while this analysis indicates differences in PUVSL estimates based on these screenings, it does not provide insights regarding the causes or significance of the differences. Instead, the exercise is mainly intended for illustration purposes to demonstrate the potential consequences of analyst choices regarding whether and how to screen datasets of VSL estimates based on certain criteria.
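The screening exercise can be sketched as splitting the dataset on one dummy variable at a time and re-estimating the mean on each subset. For brevity the example below pools each subset with simple inverse-variance weights, which is a deliberate simplification and not the three-level model used for Table 4.3. The records, the field names (`vsl`, `se`, `nationwide`) and the function names are hypothetical.

```python
def inverse_variance_mean(rows):
    """Inverse-variance weighted mean of the 'vsl' field (None for an empty subset)."""
    if not rows:
        return None
    weights = [1.0 / r["se"] ** 2 for r in rows]
    return sum(w * r["vsl"] for w, r in zip(weights, rows)) / sum(weights)


def screen_by_dummy(rows, dummy):
    """Pooled mean for all rows, for rows with dummy == 1 and for rows with dummy == 0."""
    yes_rows = [r for r in rows if r[dummy] == 1]
    no_rows = [r for r in rows if r[dummy] == 0]
    return {
        "pooled": inverse_variance_mean(rows),
        "yes": inverse_variance_mean(yes_rows),
        "no": inverse_variance_mean(no_rows),
    }


# Hypothetical estimates (USD million) with standard errors and one screening dummy.
data = [
    {"vsl": 8.0, "se": 1.0, "nationwide": 1},
    {"vsl": 6.0, "se": 1.0, "nationwide": 1},
    {"vsl": 3.0, "se": 1.0, "nationwide": 0},
    {"vsl": 5.0, "se": 1.0, "nationwide": 0},
]
result = screen_by_dummy(data, "nationwide")
```

Because each factor is screened separately, differences between the "yes" and "no" means may reflect other characteristics that happen to be correlated with the dummy, which is why the analysis is described as partial.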
Table 4.3 shows, for example, that limiting the data only to studies that are nationwide or cover the general population increases the mean PUVSL estimate from USD 5.5 million for the All countries group to USD 6.8 and 6.3 million, respectively. Using quality controls also appears to impact mean PUVSL estimates.
Table 4.3. Summary of PUVSL estimates when screening data based on target group and quality measures
All countries, USD million
| Variable | PUVSL (pooled data) | Yes (=1) | No (=0) |
|---|---|---|---|
| Population and target group variables: | | | |
| Nationwide | 5.4 | 6.8 | 3.7 |
| General population | 5.4 | 6.3 | 4.6 |
| Quality dimensions of studies: | | | |
| Random sampling (representative sample) | 5.4 | 5.0 | 6.6 |
| No standard errors of VSL estimates reported | 5.4 | 5.1 | 6.0 |
| SP-specific: Whether the survey clearly defines the baseline risk | 4.8 | 5.9 | 3.9 |
| SP-specific: Whether the study reports a scope test (including external and internal) | 4.8 | 6.0 | 3.7 |
| SP-specific: Whether the survey uses a visual aid to explain risk | 4.8 | 5.8 | 3.9 |
| RP-specific: Whether risk data is based on the Census of Fatal Occupational Injuries (CFOI) | 6.8 | 5.3 | 9.9 |
| RP-specific: Whether VSL is estimated using an instrumental variable (IV) estimator for fatality rate | 6.8 | 7.0 | 6.8 |
| RP-specific: Whether the regression controls for non-fatal injury risk | 6.8 | 7.6 | 6.1 |
Note: The mean baseline VSLs for SP and RP differ from USD 5.5 million because each is derived from data for only one of the two methods. “Yes” means that the dummy for the variable of interest is equal to one, so the exercise is performed only on data with that characteristic (e.g. nationwide studies). “No” means that studies with the characteristic of interest are excluded from the exercise (the dummy is equal to zero). These values are based on a three-level meta-analysis model, as opposed to the values presented in Table 4.2 and Figure 4.2, which are based on a four-level meta-analysis model. This is because incorporating the quality dimension leads to insufficient variation in elicitation methods, which makes it impossible to estimate the four-level model (see Table A C.5. in Annex A, which provides descriptive statistics of estimates per elicitation methods and methodological characteristics).
Table A E.1 in Annex E also presents the results of a sensitivity analysis to investigate the impact of including only mean PUVSL estimates from studies published in journals. Although journal publication is relevant for assessing quality, as discussed in Section 4.3.3, it is not recommended as a screening criterion in isolation. The impact of this screening on PUVSL estimates is observable for most country groups. For the group of All countries, the PUVSL estimate increases by USD 300 000 from USD 5.5 to 5.9 million. Estimates for most other country groups also increase somewhat (except for the low- and middle-income country groups), with the largest increase observed for the United States.
References
[12] Alberini, A. and M. Ščasný (2021), “On the validity of the estimates of the VSL from contingent valuation: Evidence from the Czech Republic”, Journal of Risk and Uncertainty, Vol. 62/1, pp. 55-87, https://doi.org/10.1007/S11166-021-09347-8.
[15] Ananthapavan, J. et al. (2021), “Systematic review to update ‘value of a statistical life’ estimates for Australia”, International Journal of Environmental Research and Public Health, Vol. 18/11, https://doi.org/10.3390/IJERPH18116168.
[27] Askarov, Z. et al. (2023), “The Significance of Data-Sharing Policy”, Journal of the European Economic Association, Vol. 21/3, pp. 1191-1226, https://doi.org/10.1093/JEEA/JVAC053.
[20] Bishop, R. and K. Boyle (2019), “Reliability and Validity in Nonmarket Valuation”, pp. 463-497, https://doi.org/10.1007/978-94-007-7104-8_12.
[42] Bishop, R. and K. Boyle (2019), “Reliability and Validity in Nonmarket Valuation”, Environmental and Resource Economics, Vol. 72/2, pp. 559-582, https://doi.org/10.1007/S10640-017-0215-7.
[45] Borenstein, M. et al. (2010), “A basic introduction to fixed-effect and random-effects models for meta-analysis”, Research Synthesis Methods, Vol. 1/2, pp. 97-111, https://doi.org/10.1002/JRSM.12.
[5] Boyle, K. and J. Wooldridge (2018), “Understanding Error Structures and Exploiting Panel Data in Meta-analytic Benefit Transfers”, Environmental and Resource Economics, Vol. 69/3, pp. 609-635, https://doi.org/10.1007/S10640-017-0211-Y.
[36] Brown, A. et al. (2024), “Meta-analysis of Empirical Estimates of Loss Aversion”, Journal of Economic Literature, Vol. 62/2, pp. 485-516, https://doi.org/10.1257/JEL.20221698.
[28] Doucouliagos, C., T. Stanley and M. Giles (2012), “Are estimates of the value of a statistical life exaggerated?”, Journal of Health Economics, Vol. 31/1, pp. 197-206, https://doi.org/10.1016/J.JHEALECO.2011.10.001.
[24] Evans, M. and L. Taylor (2020), “Using Revealed Preference Methods to Estimate the Value of Reduced Mortality Risk: Best Practice Recommendations for the Hedonic Wage Model”, Review of Environmental Economics and Policy, Vol. 14/2, pp. 282-301, https://doi.org/10.1093/REEP/REAA006.
[16] Ginbo, T., W. Adamowicz and P. Lloyd-Smith (2023), “Valuing Mortality Risk Reductions in Canada: An Updated Meta-Analysis and Policy Guidance”, Canadian Public Policy, Vol. 49/3, pp. 233-251, https://doi.org/10.3138/CPP.2022-052.
[19] Hamadeh, N., C. Van Rompaey and E. Metreau (2023), World Bank Group country classifications by income level for FY24 (July 1, 2023- June 30, 2024), https://blogs.worldbank.org/en/opendata/new-world-bank-group-country-classifications-income-level-fy24 (accessed on 28 October 2024).
[46] Hamadeh, N., C. Van Rompaey and E. Metreau (2023), World Bank Group country classifications by income level for FY24 (July 1, 2023- June 30, 2024), https://blogs.worldbank.org/en/opendata/new-world-bank-group-country-classifications-income-level-fy24 (accessed on 28 October 2024).
[13] Hammitt, J. et al. (2019), “Valuing mortality risk in China: Comparing stated-preference estimates from 2005 and 2016”, Journal of Risk and Uncertainty, Vol. 58/2-3, pp. 167-186, https://doi.org/10.1007/S11166-019-09305-5.
[9] Hammitt, J., J. Liu and J. Liu (2022), “Is survival a luxury good? Income elasticity of the value per statistical life”, Journal of Risk and Uncertainty, Vol. 65/3, pp. 239-260, https://doi.org/10.1007/S11166-022-09397-6.
[33] Harrer, M. et al. (2021), Doing Meta-Analysis with R, https://doi.org/10.1201/9781003107347.
[43] Husereau, D. et al. (2013), “Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement.”, BMJ (Clinical research ed.), Vol. 346, https://doi.org/10.1136/BMJ.F1049.
[23] Johnston, R. et al. (2017), “Contemporary guidance for stated preference studies”, Journal of the Association of Environmental and Resource Economists, Vol. 4/2, pp. 319-405.
[2] Johnston, R. et al. (2021), “Guidance to Enhance the Validity and Credibility of Environmental Benefit Transfers”, Environmental and Resource Economics, Vol. 79/3, pp. 575-624, https://doi.org/10.1007/S10640-021-00574-W.
[3] Johnston, R. et al. (2017), “Contemporary guidance for stated preference studies”, Journal of the Association of Environmental and Resource Economists, Vol. 4/2, pp. 319-405, https://doi.org/10.1086/691697.
[14] Keller, E. et al. (2021), “How Much Is a Human Life Worth? A Systematic Review”, Value in Health, Vol. 24/10, pp. 1531-1541, https://doi.org/10.1016/J.JVAL.2021.04.003.
[35] Kochi, I., B. Hubbell and R. Kramer (2006), “An Empirical Bayes Approach to Combining and Comparing Estimates of the Value of a Statistical Life for Environmental Policy Analysis”, Environmental & Resource Economics, Vol. 34/3, pp. 385-406, https://doi.org/10.1007/S10640-006-9000-8.
[4] Lindhjem, H. and S. Navrud (2015), “Reliability of Meta-analytic Benefit Transfers of International Value of Statistical Life Estimates: Tests and Illustrations”, pp. 441-464, https://doi.org/10.1007/978-94-017-9930-0_19.
[17] Lindhjem, H. et al. (2011), “Valuing Mortality Risk Reductions from Environmental, Transport, and Health Policies: A Global Meta-Analysis of Stated Preference Studies”, Risk Analysis, Vol. 31/9, pp. 1381-1407, https://doi.org/10.1111/J.1539-6924.2011.01694.X.
[29] Masterman, C. and W. Viscusi (2020), “Publication Selection Biases in Stated Preference Estimates of the Value of a Statistical Life”, Journal of Benefit-Cost Analysis, Vol. 11/3, pp. 357-379, https://doi.org/10.1017/BCA.2020.21.
[18] Masterman, C. and W. Viscusi (2018), “The Income Elasticity of Global Values of a Statistical Life: Stated Preference Evidence”, Journal of Benefit-Cost Analysis, Vol. 9/3, pp. 407-434, https://doi.org/10.1017/BCA.2018.20.
[26] McElreath, R. (2018), Statistical Rethinking: A Bayesian Course with Examples in R and Stan, pp. 1-469, https://doi.org/10.1201/9781315372495.
[32] Nelson, J. (2015), “Meta-analysis: statistical methods. Ch.15: Benefit Transfer of Environmental and Resource Values: A Guide for Researchers and Practitioners”.
[31] Nelson, J. and P. Kennedy (2009), “The use (and abuse) of meta-analysis in environmental and natural resource economics: An assessment”, Environmental and Resource Economics, Vol. 42/3, pp. 345-377, https://doi.org/10.1007/S10640-008-9253-5.
[44] Newman, R. and I. Noy (2023), “The global costs of extreme weather that are attributable to climate change”, Nature Communications, Vol. 14/1, pp. 1-13, https://doi.org/10.1038/s41467-023-41888-1.
[40] OECD (2025), Gross national income, https://www.oecd.org/en/data/indicators/gross-national-income.html?oecdcontrol-d7f68dbeee-var3=2023&oecdcontrol-e4e765a1a9-var1=AUS%7CAUT%7CBEL%7CCAN%7CCHL%7CCOL%7CCRI%7CCZE%7CDNK%7CEST%7CFIN%7CFRA%7CDEU%7CGRC%7CHUN%7CISL%7CIRL%7CISR%7CITA%7CJPN%7CKOR%7CLVA%7CLTU%7CLUX%7CMEX%7CNLD%7CNZL%7CNOR%7CPOL%7CPRT%7CSVK%7CSVN%7CESP%7CSWE%7CCHE%7CTUR%7CGBR%7CUSA%7CEU27_2020 (accessed on 27 March 2025).
[7] OECD (2012), Mortality Risk Valuation in Environment, Health and Transport Policies, OECD Publishing, Paris, https://doi.org/10.1787/9789264130807-en.
[10] Ørbeck, M. et al. (2024), “Temporal stability of environmental values in times of uncertainty”.
[41] Ørbeck, M. et al. (2022), “Stated Preferences in Tumultuous Times: Investigating Environmental Preferences Over a Five-Year Period Controlling for COVID-19”, SSRN Electronic Journal, https://doi.org/10.2139/SSRN.4280689.
[6] Robinson, L. et al. (2019), “Reference Case Guidelines for Benefit-Cost Analysis in Global Health and Development”, SSRN Electronic Journal, https://doi.org/10.2139/SSRN.4015886.
[37] Rubio-Aparicio, M. et al. (2018), “A methodological review of meta-analyses of the effectiveness of clinical psychology treatments”, Behavior Research Methods, Vol. 50/5, pp. 2057-2073, https://doi.org/10.3758/S13428-017-0973-8.
[11] SAB/USEPA (2017), Review of EPA’s Proposed Methodology for Updating Mortality Risk Valuation Estimates for Policy Analysis.
[38] Schmidt, A. and C. Finan (2018), “Linear regression and the normality assumption”, Journal of Clinical Epidemiology, Vol. 98, pp. 146-151, https://doi.org/10.1016/J.JCLINEPI.2017.12.006.
[34] Sera, F. et al. (2019), “An extended mixed-effects framework for meta-analysis”, Statistics in Medicine, Vol. 38/29, pp. 5429-5444, https://doi.org/10.1002/SIM.8362.
[8] St-Amour, P. (2024), “Valuing life over the life cycle”, Journal of Health Economics, Vol. 93, https://doi.org/10.1016/J.JHEALECO.2023.102842.
[1] USEPA (2016), Valuing mortality risk reductions for policy: a meta-analytic approach. White Paper, https://www.epa.gov/system/files/documents/2025-04/vsl-white-paper_final_020516-1.pdf.
[21] USEPA (2011), Review of Valuing Mortality Risk Reductions for Environmental Policy: A White Paper, Office of the Administrator, Science Advisory Board.
[30] USEPA (2006), Report on the EPA Work Group on VSL Meta-Analyses, https://www.epa.gov/sites/default/files/2018-02/documents/ee-0494-01.pdf (accessed on 30 October 2024).
[22] USEPA SAB (2017), Review of EPA’s Proposed Methodology for Updating Mortality Risk Valuation Estimates for Policy Analysis.
[25] Viscusi, W. (2019), “Risk guideposts for a safer society: Introduction and overview”, Journal of Risk and Uncertainty, Vol. 58/2-3, pp. 101-119, https://doi.org/10.1007/S11166-019-09307-3.
[39] Wooldridge, J. (2014), “Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables”, Journal of Econometrics, Vol. 182/1, pp. 226-234, https://doi.org/10.1016/J.JECONOM.2014.04.020.
Notes
← 1. Typically, these values are set at “best practice” or sample means, or alternatively at values to, for example, balance the evidence used from SP and RP approaches (Johnston et al., 2021[63]; USEPA, 2016[6]).
← 2. It is worth noting that the US EPA study did not conduct a standard meta-regression analysis, and the US EPA Science Advisory Board (SAB) (2017[11]) did not object to this general approach.
← 3. Additional tests and sensitivity analyses that informed the choice of approach are provided in Annex E.
← 4. As noted in Section 3.1.4 of Chapter 3, it may not be appropriate to adjust older VSL estimates by income using strong assumptions about income elasticity and then convert to a common year for analysis.
← 5. A previous version of this paper is available as a working paper: Ørbeck et al. (2022[41]). This study provides a review of the most important literature on the temporal stability of preferences.
← 6. Ginbo et al. (2023[16]) included all Canadian studies from 1989 to 2018, while Anthapavan et al. (2021[15]) included Australian studies from 2007 to 2019, where the first primary valuation study of VSL in Australia was conducted in 2017.
← 7. As described in Chapter 3, there is a lag between the years of the data for analysis and the publication year of VSL estimates. As a result, VSL estimates reported in studies published between 2009 and 2024 may be based on older data. Note that such time lags also apply to the old SP and RP data.
← 8. USEPA (2016[1]), for example, states that they “only include estimates for immediate risk reductions or those that begin within one year (under the assumption that this is “nearly” immediate)”. However, the time frame for risk reductions is not always clear in studies estimating VSL.
← 9. The meta-data used in OECD (2012[7]) (i.e. the Old SP dataset) included coded variables such as public vs. private risk, willingness to pay vs. willingness to accept compensation, household vs. individual WTP and types of payment vehicles. Based on the experience of gathering and using these data, many of these variables were not coded in the newer data (New SP and New RP datasets). Given that only a fraction of the studies provided the information needed to code these variables, doing so was expected to limit the number of primary valuation studies that could be used in the analysis.
← 10. World Bank income categories are used to define the country classification by income level (Hamadeh, Van Rompaey and Metreau, 2023[19]).
← 11. This difference is illustrated well by Bishop and Boyle (2019[42]) with a dart board where validity reflects the extent to which the arrows are clustered evenly around the centre or around some other point, and reliability reflects how distant the arrows are from the centre.
← 12. Keller et al. (2021[14]) assessed the quality of studies included in the review based on a modification of the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) checklist (Husereau et al., 2013[43]).
← 14. Note that while the recency of studies can also be considered to be related to their quality (e.g. as discussed in Section 4.2), US EPA SAB (2017[22]) does not articulate a clear position regarding whether or not to exclude older studies for quality reasons. In this regard, they do not discuss the impact of potential changes in preferences and circumstances over time on the validity of VSL estimates from older studies (for reasons other than quality per se). USEPA SAB (2017, p. 39[22]) recommends “to explore whether older or newer studies have a strong influence on the VSL estimate […] the EPA should consider conducting a sensitivity analysis.” This report provides results from sensitivity analyses based on the old, new and merged meta-data in Annex E.
← 15. The list contains 10 items pertaining to SP studies:
1. Was the survey pretested using focus groups, one-on-one interviews, or field pretests?
2. Was the survey applied to a random sample of a clearly specified population?
3. Did the survey clearly define the baseline risk?
4. Did the survey clearly explain the change in risk to be valued?
5. Was the valuation scenario consequential (mandatory payment and valuation response with a non-zero probability of influencing provision of the item being valued)?
6. Was the stated preference question a binary choice framed as a referendum or product purchase?
7. Were robustness checks conducted of the statistical analyses that led to the value estimate?
8. Were construct validity tests conducted?
9. Was the sample of respondents investigated for comparability to the population sampled?
10. Has the study been subject to peer review?
← 16. This means, for example, that using “journal publication” or some ranking of journals (such as in Newman and Noy (2023[44])) may be too simplistic as a screening criterion for quality.
← 17. Note that this choice is not contradictory to the screening out of statistically implausible VSL estimates (far-outs), as described in Section 4.3.1, which is based on objective and relatively standard statistical criteria.
← 18. Specifically, whether it is based on data from the Census of Fatal Occupational Injuries (CFOI), a record of fatal work injuries produced by the Bureau of Labor Statistics in the United States.
← 19. As noted in Section 3.1.3 of Chapter 3, around 70% of the primary valuation studies in the new meta dataset have been published in journals. It is nevertheless difficult to use this criterion alone to judge study quality and suitability, notably because peer review processes for journals are difficult to compare with those for e.g. working papers, book chapters, conference proceedings and other study types.
← 20. In an exploratory analysis of publication bias in the VSL estimates of the new SP and RP data (where estimates with imputed SE have been removed due to the noted endogeneity problem), no clear indication of small-study effects was found (i.e. results from small-sample studies differing systematically from those of large-sample studies), nor any evidence of selective reporting. However, more specific investigations, beyond the scope of this report, are required on the methods of bias detection and the explanation of likely sources in published primary VSL estimates, if such bias indeed exists.
← 21. Nelson and Kennedy (2009, p. 350[31]): “Given the wide range of outcomes in primary estimates, it is recommended that future researchers place a high priority on collecting primary data on variances and sample sizes.”
← 22. Information on the calculation methods used in this analysis is provided in Table A D.1 in Annex D. A comparison between observed and imputed standard errors is illustrated in Figure A D.1 in Annex D.
← 23. This is typically done either by including only estimate-level random effects, which is equivalent to a two-level random effects model (Kochi, Hubbell and Kramer, 2006[35]), or through weighted-least-squares estimation using the inverse of the variance as weights (Viscusi, 2019[25]).
← 24. The fixed-effect model assumes that each observed estimate can be decomposed into a true population estimate and a pure sampling error (Borenstein et al., 2010[45]). In other words, it assumes that all estimates reported by various studies share the estimate from a single homogeneous population and that variation in observations is solely due to sampling errors. This is equivalent to a weighted-least-squares meta-regression using only the intercept as a regressor (Nelson and Kennedy, 2009[31]).
← 25. The three World Bank income country categories are defined as: Low income (Gross National Income (GNI) per capita < USD 1 145), Lower-middle income (GNI per capita = USD 1 146 – 4 515) and Upper-middle income (GNI per capita = USD 4 516 – 14 005). Low-, lower-middle and upper-middle income categories are merged to form the category “Low- and middle-income countries” in order to have a sufficient number of studies to estimate the mean VSL with reasonable levels of uncertainty. “High-income countries” refers to countries with GNI per capita > USD 14 005 (Hamadeh, Van Rompaey and Metreau, 2023[46]).
← 26. The higher mean VSL in the EU compared to the US, despite lower GDP per capita in the EU, can be partially explained by CV studies from the United States reporting, for unknown reasons, systematically lower mean VSL estimates than those from the EU (USD 6.5 million vs. USD 8.2 million), as well as by the fact that the only CM studies included in the meta-analysis, which produce systematically lower VSL estimates than other methods, were conducted in the United States. This effect is only partly counteracted by CE and HW studies from the United States, which produce relatively higher VSL estimates than EU studies. Note that US estimates are also characterised by wider confidence intervals, suggesting that studies from the United States are more heterogeneous, at least in terms of elicitation methods.
← 27. A sensitivity analysis of mean VSL estimates is provided in Figure A E.2 and Figure A E.3 in Annex E.
← 28. Further disaggregating the results into individual country groups would significantly reduce the sample size in some cases.
← 29. This increases the chance of obtaining a representative sample.
← 30. The results of the scope tests per se are not coded, only whether scope test results are reported or not.
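Notes 23 and 24 describe pooling estimates with the inverse of the variance as weights, which under the fixed-effect assumption is equivalent to a weighted-least-squares regression on an intercept only. The following is a minimal sketch of that calculation, not the implementation used in this report (which relies on a random-effects model). The function name `fixed_effect_mean` and the three VSL values and standard errors are hypothetical, chosen only for illustration.

```python
def fixed_effect_mean(estimates, std_errors):
    """Inverse-variance weighted mean and its standard error.

    Under the fixed-effect assumption each estimate equals one common
    true value plus sampling error, so the weight is 1 / SE^2.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


# Hypothetical VSL estimates (USD million) and standard errors.
vsl = [4.0, 6.0, 9.0]
se = [1.0, 2.0, 3.0]

pooled, pooled_se = fixed_effect_mean(vsl, se)

# First-order condition of intercept-only weighted least squares:
# the weighted residuals sum to zero at the pooled mean.
residual_sum = sum((y - pooled) / s ** 2 for y, s in zip(vsl, se))

print(round(pooled, 3), round(pooled_se, 3))
```

The more precise estimates dominate the pooled value, which is why imputed or missing standard errors (Notes 20 to 22) matter for this type of weighting.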