PISA collects data from a sample, not from the whole population of 15-year-old students. The sample is drawn so as to avoid bias in the selection procedure and to achieve the maximum precision given the available resources (for more information, see Chapter 3 in the PISA Data Analysis Manual: SPSS and SAS, Second Edition).

In practice, this means that estimating a population parameter requires (1) using the weights associated with the sampling and (2) computing the uncertainty due to the sampling (the standard error of the parameter).

Use final student weights for obtaining unbiased parameter estimates



All analyses using PISA data should be weighted, as unweighted analyses will provide biased population parameter estimates. In PISA 2015 files, the variable w_schgrnrabwt corresponds to final student weights that should be used to compute unbiased statistics at the country level.
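
To illustrate why weighting matters, the following minimal Python sketch compares an unweighted mean with a weighted mean. The scores and weights are invented toy values, and the function name `weighted_mean` is hypothetical; with real data, the weights would be read from the final student weight variable.

```python
def weighted_mean(values, weights):
    """Return the mean of `values`, each value counted according to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Toy data (invented): three students' scores and their final student weights,
# i.e. the number of students in the population that each one represents.
scores = [450.0, 500.0, 550.0]
final_weights = [10.0, 30.0, 60.0]

unweighted = sum(scores) / len(scores)           # treats every sampled student alike
weighted = weighted_mean(scores, final_weights)  # reflects the population structure

print(unweighted, weighted)
```

In this toy case the two results differ because the highest-scoring student represents many more students in the population than the lowest-scoring one.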



The final student weights add up to the size of the population of interest. When conducting an analysis for several countries, this means that countries with a larger number of 15-year-old students will contribute more to the analysis. For this reason, in some cases, the analyst may prefer to use senate weights, that is, weights that have been rescaled to add up to the same constant value within each country. Each country will thus contribute equally to the analysis.
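
The rescaling described above can be sketched in Python as follows. This is only an illustration: the function name `senate_weights`, the country codes, the weights, and the target constant of 1000 are all assumptions made for the example, not values taken from the PISA files.

```python
from collections import defaultdict


def senate_weights(countries, weights, target=1000.0):
    """Rescale weights so that they sum to `target` within each country."""
    totals = defaultdict(float)
    for country, weight in zip(countries, weights):
        totals[country] += weight
    return [weight * target / totals[country]
            for country, weight in zip(countries, weights)]


# Toy data (invented): country "AAA" has a population ten times larger than "BBB".
countries = ["AAA", "AAA", "BBB", "BBB"]
final_weights = [100.0, 300.0, 10.0, 30.0]

rescaled = senate_weights(countries, final_weights)
print(rescaled)
```

After rescaling, the weights of each country sum to the same constant, while the relative weights of students within a country are unchanged.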

Use replicate weights for obtaining unbiased standard errors



A statistic computed from a sample provides an estimate of the true population parameter. One therefore needs to compute its standard error, which indicates the reliability of the estimate: the standard error tells us how close the statistic obtained from this sample is likely to be to the true value for the overall population. These standard-error estimates can be used, for instance, to report differences between or within countries that are statistically significant.
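
As an illustration of how standard errors feed into such comparisons, the sketch below tests the difference between two country means. It assumes that the two samples are independent (reasonable for two different countries, but not for two groups within the same country) and uses a two-sided 5% critical value of 1.96. The function name and the numbers are invented for the example.

```python
import math


def difference_test(est_a, se_a, est_b, se_b, critical=1.96):
    """Compare two independent estimates.

    Returns the difference, its standard error, and whether the
    difference is statistically significant at the chosen critical value.
    """
    diff = est_a - est_b
    # For independent samples, the variances of the two estimates add up.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, abs(diff / se_diff) > critical


# Toy values (invented): mean scores and standard errors for two countries.
diff, se_diff, significant = difference_test(510.0, 3.0, 498.0, 4.0)
print(diff, se_diff, significant)
```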



As the PISA sample design is complex, the standard-error estimates provided by common statistical procedures are usually biased. Moreover, the mathematical computation of the sampling variance is not always feasible for some multivariate indices. For these reasons, the estimation of sampling variances in PISA relies on replication methodologies, more precisely Balanced Repeated Replication (BRR) with Fay’s modification (for details see Chapter 4 in the PISA Data Analysis Manual: SAS or SPSS, Second Edition or the associated guide “Computation of standard-errors for multistage samples”). The general principle of these methods consists of using several replicates of the original sample in order to estimate the sampling error. The statistic of interest is first computed based on the whole sample, and then again for each replicate. The replicate estimates are then compared with the whole sample estimate to estimate the sampling variance.

In PISA, 80 replicate samples are formed, and a set of weights is computed for each of them.

In practice, this means that one should estimate the statistic of interest using the final weight as described above, then again using each of the replicate weights (denoted by w_fsturwt1 to w_fsturwt80 in PISA 2015, w_fstr1 to w_fstr80 in previous cycles). The sampling variance is then proportional to the average of the squared differences between the main estimate obtained in the original sample and those obtained in the replicate samples, and the standard error is its square root (for details on the computation of averages over several countries, see Chapter 12 of the PISA Data Analysis Manual: SAS or SPSS, Second Edition).
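
The procedure described above can be sketched in Python as follows. This is a minimal illustration, not the official macros: it assumes a Fay factor of 0.5 (the value documented in the PISA Data Analysis Manual), so that with 80 replicates the sampling variance is the sum of the squared differences divided by 20. The function names and the toy data, which use only two replicates to keep the arithmetic visible, are invented for the example.

```python
import math


def weighted_mean(values, weights):
    """Return the weighted mean of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_standard_error(values, final_weights, replicate_weights,
                             fay_factor=0.5):
    """Estimate a weighted mean and its standard error from replicate weights.

    `replicate_weights` is a list of weight vectors, one per replicate
    (80 of them in PISA). `fay_factor` is assumed to be 0.5.
    """
    main = weighted_mean(values, final_weights)
    squared_diffs = [(weighted_mean(values, rep) - main) ** 2
                     for rep in replicate_weights]
    n_rep = len(replicate_weights)
    # Sampling variance: sum of squared differences / (G * (1 - k)^2).
    variance = sum(squared_diffs) / (n_rep * (1.0 - fay_factor) ** 2)
    return main, math.sqrt(variance)


# Toy data (invented): two students and two replicate weight vectors.
scores = [400.0, 600.0]
final_weights = [1.0, 1.0]
replicate_weights = [[1.5, 0.5], [0.5, 1.5]]

estimate, std_error = replicate_standard_error(scores, final_weights,
                                               replicate_weights)
print(estimate, std_error)
```

The same pattern applies to any statistic, not only the mean: compute it once with the final weight, once with each replicate weight, and combine the squared differences.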

Procedures and macros have been developed to compute these standard errors within the specific PISA framework (see below for a detailed description).