Strengthening Individualised Support for Jobseekers Furthest from the Labour Market in Greece
5. Monitoring and evaluation framework of the digital tool to identify clients needing intensive support
Abstract
This chapter sets out a monitoring and evaluation framework to accompany the design and implementation of DYPA’s new digital tool for identifying and supporting vulnerable jobseekers. It explains how a robust framework can guide evidence‑based decision making by clarifying the intervention logic through a Theory of Change and corresponding results chain, linking inputs and activities to outputs, outcomes and longer-term impacts. Building on the implementation steps developed in Chapter 3, this chapter proposes a structured set of monitoring indicators, highlighting the data requirements for these indicators. The chapter then outlines a concept for a counterfactual impact evaluation of the tool, with a particular focus on the design of a randomised controlled trial and possible quasi‑experimental alternatives when randomisation is not feasible.
5.1. Introduction
Monitoring and Evaluation (M&E) frameworks are essential tools to accompany the implementation of any policy. They enable continuous tracking of policy implementation, facilitate early detection of deviations from the original plan, and provide a solid basis for assessing whether the policy’s impact aligns with its strategic objectives. In doing so, M&E frameworks play a key role in governance: they allow policymakers and project managers to systematically monitor progress, ensure accountability for results, and promote transparency vis-à-vis internal and external stakeholders.
This chapter proposes an M&E framework that is structured around two interrelated components that serve complementary purposes. The proposed set of monitoring indicators – aligned with the steps of the implementation plan presented in Chapter 3 – enables the systematic tracking of progress throughout the implementation of the digital tool. In contrast, the counterfactual impact evaluation aims to assess the causal effects of the tool, determining whether and to what extent it enhances the identification and referral of, and support for, vulnerable jobseekers compared with the existing tool.
The chapter is organised as follows. Section 5.2 presents the Theory of Change underpinning the M&E framework and outlines the corresponding results chain. The Theory of Change articulates the logical links between inputs, activities, outputs, and outcomes. Importantly, the development of both the Theory of Change and the results chain should be a participatory process involving DYPA’s management, counsellors, and other relevant stakeholders to ensure ownership and to guarantee that the tool responds to actual needs effectively. Section 5.3 builds on the results chain introduced in Section 5.2 and proposes a possible set of indicators to form the monitoring framework, structured along the results chain. Section 5.4 discusses a possible concept for the counterfactual impact assessment of the tool, outlining possible methodological approaches (with a particular focus on Randomised Controlled Trials – RCTs) and highlighting key aspects to consider when implementing the evaluation.
5.2. Foundations of the Monitoring and Evaluation Framework
Monitoring and evaluation (M&E) frameworks are essential for any policy to systematically track its implementation and assess its effectiveness and contribution towards clearly defined policy objectives. Such frameworks help policymakers and practitioners answer key questions: Is the intervention reaching the right people? Is it being implemented as intended? Is it achieving the outcomes it was designed to deliver?
A successful implementation of the proposed digital tool for DYPA will require such a framework. As outlined in Step 5 of the implementation plan (Chapter 3), the tool should be systematically monitored and evaluated to track progress and inform decision making. This framework will enable DYPA to identify and address operational issues early on, while also assessing the extent to which the tool supports vulnerable jobseekers in accessing timely and tailored services and, ultimately, sustainable employment.
This chapter outlines the conceptual foundations for developing an M&E framework for DYPA’s new digital tool. It explains how M&E can be structured around a clear understanding of the intervention logic, specifically through a Theory of Change (ToC) and its corresponding results chain. Together, these elements describe how the tool’s activities are expected to generate outputs, outcomes, and impacts, and they provide the basis for assessing whether these effects occur as intended. The chapter first introduces the role of M&E frameworks in guiding evidence‑based implementation (Section 5.2.1), then explains how the ToC and results chain form both the starting point and the backbone of such frameworks (Section 5.2.2) and finally presents an example of a results chain tailored to DYPA’s digital tool (Section 5.2.3).
5.2.1. Monitoring and evaluation (M&E) frameworks are key to effective implementation, learning, and impact
The terms monitoring and evaluation are sometimes used interchangeably, but they have different functions and, therefore, it is useful to distinguish them clearly:
Monitoring refers to the ongoing collection and analysis of information on specific indicators during programme implementation. In this case, it may involve tracking the use of the digital tool (e.g. number of counsellors trained, share of jobseekers profiled at registration, number of referrals made) and the uptake of services by jobseekers. Monitoring ensures that the tool is being implemented as intended and allows timely adjustments as challenges are identified.
Evaluation, by contrast, may focus on how the tool is implemented (process evaluation), whether it represents good value for money (cost-benefit analysis), or whether it has achieved its intended outcomes (impact evaluation) (OECD, 2020[1]). This chapter focuses on the latter: in this context, assessments may include measuring whether vulnerable jobseekers who interact with the tool experience shorter unemployment spells, greater use of tailored support, or higher rates of transition into stable employment. Evaluation requires more time and methodological rigour than monitoring, since it aims to isolate the effect of the tool itself from other external factors.
Together, monitoring and evaluation create a feedback loop that supports decision making at multiple levels (OECD, 2022[2]). Monitoring provides real-time information to improve implementation, while evaluation sheds light on the tool’s overall effectiveness and longer-term impacts. Both processes benefit from a clear understanding of the causal logic of the intervention, in other words, how inputs and activities are expected to lead to outputs and outcomes and, ultimately, to contribute to policy goals.
A comprehensive M&E framework typically includes four key components:
First, it begins by defining the intervention logic through a Theory of Change and results chain, which make explicit how inputs and activities are expected to lead to outputs, outcomes, and impacts.
Second, it involves setting indicators to track progress toward objectives.
Third, it requires planning for data collection and management, ensuring that reliable information on indicators is gathered before, during, and after implementation.
Finally, it includes an evaluation plan that defines how and when effectiveness will be assessed, using methods proportionate to the policy’s scope and objectives.
A well-designed M&E framework not only promotes accountability and transparency but also supports learning and improvement by identifying what works, what does not, and why. Importantly, the process of developing a framework should be participatory, involving stakeholders from the outset to ensure that the design reflects real needs and that results will be used in practice.
5.2.2. A Theory of Change and results chain are both the starting point and the backbone of an M&E framework
A first step in developing an M&E framework is to set out the logic of the intervention through a ToC and its corresponding results chain. By outlining what the intervention does, and what it aims to achieve, the ToC and results chain together form the conceptual foundation for all M&E activities and provide a structure against which the effectiveness of an intervention can later be assessed.
Theory of change
A ToC describes how an intervention is supposed to work, why it will work, who it will benefit, and under what conditions. It sets out the pathways through which activities lead to outcomes and, ultimately, to long-term impacts. A ToC makes explicit the assumptions behind the intervention logic, identifies the contextual factors that may affect success, and clarifies the conditions needed for change to occur. Developing a ToC helps create a shared understanding among stakeholders of the rationale and logic of an intervention, which should, in turn, support the strategic planning, implementation, and evaluation of the policy change.
Creating a ToC takes time and should be approached as a participatory process. Experience from evaluation practice suggests that organisations should aim for a manageable scope and engage stakeholders early in the process to ensure that the resulting ToC reflects operational realities and institutional priorities (Clark and Anderson, 2004[3]). The process typically involves analysing programme documents and linking the intervention to broader strategic objectives.
Results chain
A complete results chain can be developed using the ToC process (Rist and Zall Kusek, 2004[4]). The ToC provides the conceptual logic, explaining how and why change is expected to occur. The results chain operationalises the ToC by translating this logic into a structured sequence that can be monitored and evaluated.
The results chain sets out a logical sequence from inputs (the resources available) to activities (the actions taken), to outputs (the immediate products or services delivered), through to outcomes (the direct results of the intervention), and finally impact (the broad, long-term change that the initiative contributes to). This approach makes the relationships between different elements explicit and highlights any assumptions or risks along the way.
The process usually starts with defining the impact, and then working backwards to identify outcomes, outputs, and activities. These elements should always be revisited and validated in discussion with stakeholders, who can help refine priorities and clarify assumptions. This provides both a planning tool and a foundation for monitoring and evaluation, since each stage can be associated with indicators to measure progress and results.
Developing a results chain should begin early, ideally as part of the design stage of a project, so that the intervention’s objectives, assumptions, and expected causal links are clearly defined before implementation begins. Doing so not only clarifies how the planned activities are expected to lead to the desired outcomes but can also inform and improve the design itself, for instance, by identifying weak or missing links between actions and intended results that may need to be adjusted.
Integrating stakeholder input into the development of the intervention logic
For the DYPA digital tool, the development of the ToC and results chain should be co‑ordinated by DYPA experts from different teams, including employment counsellors, DYPA’s data analysts, and IT specialists responsible for the tool’s technical implementation. Additionally, feedback from external partners can help enrich and validate the logic of the intervention.
In practical terms, the engagement with external actors could take place as part of Step 2 of the implementation plan (Chapter 3), which calls for identifying and engaging relevant stakeholders in the tool’s design and implementation. For instance, DYPA could organise meetings or workshops to test and refine the Theory of Change and results chain with a wider range of DYPA staff, and where relevant, involve other experts, such as representatives from the Ministry of Labour, researchers, or external consultants. These sessions could:
Review the problem the tool seeks to address and the desired outcomes and policy impact.
Map the logical sequence of inputs, activities, and outputs leading to these outcomes.
Identify risks, assumptions, and contextual factors that may influence success.
5.2.3. An example results chain can guide the development of the M&E framework and inform stakeholder discussions
Figure 5.1 presents a suggested results chain for the DYPA digital tool. It is based on the tool’s design and implementation as described in Chapter 3 and aims to illustrate how the intervention logic can be translated into a structured framework for monitoring and evaluation. The purpose of this exercise is twofold: first, to provide practical guidance on how to articulate the links between activities, outputs, and outcomes; and second, to serve as a foundation for subsequent work on indicator development and evaluation planning. It should be emphasised that this results chain is illustrative and should be refined through consultation with DYPA and other stakeholders before the tool is implemented.
In this example, the impact of the DYPA digital tool is defined as improving the social and economic inclusion of vulnerable jobseekers by supporting their integration into the labour market. The outcomes include: timely interventions for vulnerable jobseekers; greater use of services and referrals to ALMPs; and shorter unemployment spells leading to more stable employment trajectories.
The activities identified in the results chain build directly on the implementation plan outlined in Chapter 3. These include, for instance:
Defining DYPA’s profiling approach and situating the tool within its broader digitalisation strategy (Step 1).
Engaging stakeholders such as counsellors, IT specialists, data analysts, and NGOs to guide design and ensure ownership (Step 2).
Specifying the technical and operational design of the tool, including data sources, referral mechanisms, and service mapping (Step 3).
Developing the tool – technical implementation, testing, and release into the live operational IT system (putting it into production) (Step 4).
Training counsellors in tool use, supported by a “train-the-trainer” strategy and practical guidance (Step 5).
Piloting the tool to test functionality and make refinements prior to national roll-out (Step 6).
Refining the tool based on pilot evidence and scaling it up nationally (Step 7).
Together, the activities, outputs, and outcomes outlined in Figure 5.1 form a coherent sequence that connects the tool’s operational implementation with its intended policy impact. They can later serve as the basis for developing M&E indicators and identifying data needs.
Figure 5.1. Using the results chain to monitor and evaluate the new digital tool
Source: Adapted from OECD (2022[2]), Technical report: Impact Evaluation of Vocational Training and Employment Subsidies for the Unemployed in Lithuania.
Once the intervention logic has been articulated through the ToC and results chain, the next steps involve defining measurable indicators, establishing reliable data collection processes, and planning for evaluation. These elements translate the conceptual framework into operational tools for tracking performance and assessing effectiveness. The next sections detail these components, beginning with the selection of monitoring indicators and followed by the design of the evaluation plan for the DYPA digital tool.
5.3. Monitoring indicators
Monitoring the implementation of DYPA’s new digital tool for identifying and supporting vulnerable jobseekers helps ensure that it achieves its intended objectives and facilitates the early identification of challenges as they arise. This would allow DYPA to promptly detect situations in which the tool’s performance deviates from expectations and to implement operational adjustments where necessary. A well-designed monitoring framework also complements the longer-term evaluation and impact assessment of the tool.
5.3.1. The monitoring framework includes indicators related to each step of the results chain
The monitoring framework is structured around a set of indicators (Table 5.1) developed along the results chain, covering inputs, activities (or steps of implementation in the case of the DYPA’s digital tool), outputs, and outcomes. Each category of indicators serves a distinct purpose:
Table 5.1. The proposed monitoring framework contains indicators for inputs, activities, outputs and outcomes
| Step of the results chain | Monitoring indicators | Data source |
|---|---|---|
| Input |  | Financial report, project management (staff time logs) |
| Activities before national roll-out (relative to the steps of the implementation plan in Chapter 3) |  | DYPA IIS, data obtained through the profiling questionnaire, ERGANI, project management logs |
| Activities after national roll-out |  | DYPA IIS, IT process data, counsellor surveys, a feedback mechanism built into the tool |
| Outputs |  | DYPA IIS, ERGANI, IT process data, counsellor surveys |
| Outcomes |  | DYPA IIS, ERGANI, IT process data, counsellor surveys, jobseeker survey |
Note: IIS – Integrated Information System. Steps for activity indicators refer to the steps of the implementation plan presented in Chapter 3. Indicators related to longer term outcomes and impact are presented in Table 5.2.
Input indicators ensure that the necessary resources are in place and used efficiently. These indicators can track both the financial resources allocated to the project and the time that DYPA officials dedicate to its development and implementation. Monitoring this helps ensure that sufficient personnel resources are assigned to the project and that the source of these resources is clearly identified. It is also useful to keep track of the financial resources already spent compared with those budgeted.
Activity indicators ensure that the necessary actions are taken to convert inputs into outputs and outcomes. During the development phases of the new tool, the activity indicators cover the steps of the implementation plan and ensure that the activities underpinning the tool’s development remain on track. Key activities during this period include defining DYPA’s future profiling approach, identifying the relevant stakeholders and securing their active engagement, and specifying the tool’s technical and operational design. This step also involves monitoring the data included in the tool and their quality, as well as the information on services (e.g. target groups and eligibility criteria) needed to support effective referrals of vulnerable jobseekers. Additional activities include making training available to DYPA counsellors and ensuring participation, defining the details of the M&E framework (including the evaluation method), piloting the tool, and planning for national roll-out.
While indicators for inputs, outputs and outcomes can be broadly similar in the M&E framework before and after the national roll-out of the new digital tool, activity indicators need to undergo more changes. Many of the activities will have been completed by the time of the national roll-out and will no longer be relevant, while additional activity indicators might become relevant for the continuous deployment and maintenance of the tool after the roll-out. However, ensuring training availability will remain relevant also after the national roll-out of the tool, as will further fine‑tuning the tool based on user feedback and performance data.
Outputs capture the short-term changes that result directly from the activities carried out to implement the tool, before the final outcomes are realised. In DYPA’s context, the introduction of the tool is expected to influence how effectively vulnerable jobseekers are identified and the types of services they are referred to. For example, it may affect the types and frequency of referrals to various services – both those provided directly by DYPA (e.g. training, employment programmes) and those procured externally. The data analysis conducted under Chapter 3 showed that only a small share of jobseekers typically participates in ALMPs (approximately 4%). This figure could serve as a benchmark for comparing the participation rate of vulnerable groups in relevant services following the implementation of the tool.
Finally, outcome indicators track short- to medium-term changes that result directly from the use of the tool. In DYPA’s context, the tool is expected to lead to more timely and more intensive support for vulnerable jobseekers. Outcomes can be both objective – such as the time to the first meeting between counsellors and vulnerable jobseekers – and behavioural, such as counsellors’ satisfaction with the tool and its technical implementation, or jobseekers’ satisfaction with the services received. Beyond these intermediate outcomes, the tool may influence final outcomes as well, such as reducing unemployment durations and accelerating (re‑)entry into employment among jobseekers counselled using the tool.
5.3.2. The monitoring indicators require different types of data and could be visualised in a dashboard
While many of the indicators described in Table 5.1 could, in principle, be constructed using data from DYPA’s Integrated Information System (IIS) or other registries – such as ERGANI, with which DYPA already has automatic data exchange for certain information – some additional data will need to be collected specifically for monitoring and evaluation purposes. Identifying these data needs at the design stage is crucial to ensure that the necessary information is systematically collected as the tool is implemented. This may include new variables, such as tool usage logs or data on counsellor training, as well as data that are already collected but not yet available in a format suitable for analysis (e.g. information on meetings with DYPA counsellors). Moreover, if the tool is used to identify suitable external services and measures for jobseekers, data on how counsellors use this information could also be collected systematically. Such data would help measure how the tool is used in practice and assess its effectiveness in supporting both counsellors and jobseekers.
Extending existing data collection and storage protocols to include the new data required for the M&E framework would help ensure quality and consistency. Importantly, any data collection and integration – whether for M&E purposes or for the general functioning of the tool – must comply with applicable data protection regulations, including the EU General Data Protection Regulation (GDPR). Among other things, this implies that data must be processed lawfully and transparently, that only data strictly necessary for the purposes of the tool are collected (“data minimisation” principle), that data are kept accurate and up to date, and that DYPA puts in place appropriate technical and organisational safeguards to ensure data security. This is particularly important as the tool requires the use of personal data concerning a group of jobseekers that is especially vulnerable. In this spirit, early co‑ordination between DYPA’s operational, IT, and analytical teams, as well as with the Data Protection Officer (DPO), will be essential to guarantee that new data fields are standardised across systems, that data protection and ethical standards are upheld, and that the information collected can effectively support both monitoring and impact evaluation activities.
In addition to data generated directly through DYPA’s operations and interactions with clients, supplementary information could be collected separately through online surveys of counsellors and jobseekers, triggered following defined actions. Such information is essential for identifying changes in counselling practices and jobseeker behaviour, including measures of jobseeker motivation, trust in counsellors, and counsellors’ views on tool usability and satisfaction. Surveys can help not only to understand whether the tool works, but also how it works. Regular surveys could capture baseline values prior to the implementation of the tool and subsequently collect new data at regular intervals (depending on DYPA’s capacity).
When conducting surveys, it is crucial to minimise attrition due to non-response by encouraging participation among both counsellors and jobseekers. Raising awareness of the importance of data collection for understanding the tool’s effects can help increase the response rate. Additionally, jobseekers could be invited to complete the survey during their visit to a regional office or during their consultation with a counsellor. Although this approach may be time‑consuming, it can substantially improve participation rates and enhance the representativeness of the data collected.
Once the monitoring indicators have been defined and data collection protocols for the new variables established, DYPA could visualise selected indicators – such as the most important ones or those requiring regular monitoring and updating – in a dashboard that enables project managers to regularly track progress in the tool’s implementation over time and identify challenges or issues requiring timely action.
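To illustrate how such a dashboard could be fed, the sketch below computes two of the monitoring indicators mentioned earlier from hypothetical extracts of DYPA’s IIS. It is a minimal sketch only: the file names, column names and the assumption that profiling on the registration date counts as “profiled at registration” are illustrative and would need to be aligned with the actual IIS data model.

```python
import pandas as pd

# Hypothetical IIS extracts; file and column names are illustrative only.
registrations = pd.read_csv("iis_registrations.csv", parse_dates=["registration_date"])
profiling = pd.read_csv("iis_profiling.csv", parse_dates=["profiling_date"])
meetings = pd.read_csv("iis_meetings.csv", parse_dates=["meeting_date"])

# Indicator 1: share of newly registered jobseekers profiled at registration (same day).
merged = registrations.merge(profiling, on="jobseeker_id", how="left")
merged["profiled_at_registration"] = merged["profiling_date"] == merged["registration_date"]
share_profiled = merged["profiled_at_registration"].mean()

# Indicator 2: average number of days from registration to the first counselling meeting.
first_meeting = (
    meetings.groupby("jobseeker_id")["meeting_date"].min()
    .rename("first_meeting_date").reset_index()
)
timing = registrations.merge(first_meeting, on="jobseeker_id", how="left")
timing["days_to_first_meeting"] = (
    timing["first_meeting_date"] - timing["registration_date"]
).dt.days

dashboard_snapshot = pd.DataFrame(
    {
        "indicator": ["Share profiled at registration", "Avg. days to first meeting"],
        "value": [round(share_profiled, 3), round(timing["days_to_first_meeting"].mean(), 1)],
    }
)
print(dashboard_snapshot)
```

In practice, a script of this kind could be scheduled to refresh the dashboard at regular intervals as new IIS data become available.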
5.4. Proposal for an evaluation plan
Evaluation is a systematic process that involves forming judgements about the design, implementation, effectiveness, and impact of an intervention. It can take different forms depending on its purpose, for instance, process evaluations examine how an intervention is implemented and whether it operates as intended; cost-benefit analyses assess its efficiency and value for money; and impact evaluations estimate the causal effects of the intervention on specific outcomes.
Ideally, evaluation should be planned alongside the design of the intervention itself. Early planning helps to ensure that relevant data are collected and that evaluation questions are aligned with the initiative’s objectives. Thus, evaluation planning is closely linked to, and informed by, the theory of change, which clarifies the outcomes the tool seeks to achieve and the activities through which these outcomes are expected to be realised (see Section 5.2.2). An evaluation plan sets out the key components necessary to ensure that the evaluation (whether addressing processes, outcomes, efficiency, or impact) is focussed, feasible, and aligned with the objectives of the initiative. It typically outlines the following components:
What needs to be evaluated: This component involves identifying the purpose of the evaluation and its main audience and then formulating the key evaluation questions. These questions should, ideally, be developed in consultation with stakeholders, and should be directly linked to the programme’s goals and objectives. For instance, a relevant question in this context might be whether the DYPA digital tool has achieved its intended objectives of improving the identification, counselling, and referral of vulnerable jobseekers, and whether these improvements translate into better labour market outcomes.
What information is required: Once the outcomes, indicators, and evaluation questions have been defined, it is necessary to identify the data needed and how these will be collected. Relevant data may be drawn from the monitoring process (e.g. number of users, referrals, or uptake of services), as well as from administrative sources such as ERGANI employment histories or DYPA profiling records. In addition, surveys, interviews, and focus groups can provide qualitative insights into implementation and user experience. Ethical and data protection considerations should also be addressed at this stage.
Who is responsible: Although DYPA would be responsible for organising and overseeing all evaluation activities as part of its M&E framework, it may choose to contract the implementation of specific evaluations to independent evaluators or external experts. It is important to note that, even in this case, DYPA should retain ownership of the process, including decisions on what and when to evaluate, and ensure that findings are used to inform policy design and implementation and guide any necessary adjustments.
Evaluation approach and methodology: The choice of evaluation design should be informed by the purpose of the evaluation and the availability of data. The methodological approach, which in the case of an impact evaluation can be experimental or quasi‑experimental, is often developed in collaboration with evaluators and it aims to balance rigour with feasibility.
Timeline: Finally, the plan should determine when evaluations will take place. For example, a mid-term evaluation could be conducted after the pilot phase to assess early outcomes and implementation challenges, followed by a final evaluation after national roll-out to measure overall effectiveness and sustainability.
The remainder of this chapter proposes a general framework for evaluating the impact of the DYPA’s new digital tool. It develops the main components of an impact evaluation framework in greater depth and provides concrete examples and technical guidance to support DYPA in preparing for implementation. Specifically, Section 5.4.1 discusses the outcomes to be evaluated and related data needs; Section 5.4.2 highlights the need for robust and credible evaluation approaches; Section 5.4.3 offers more detailed technical advice on designing a randomised controlled trial (RCT) to assess the impact of the digital tool; and Section 5.4.4 presents alternative evaluation methods.
In practice, DYPA can use the framework presented in this chapter as a reference and practical guideline. Moreover, a series of guiding questions set out in Annex Table 5.A.1 can help DYPA refine and adjust the framework as needed, depending on the tool’s final design, implementation context, and policy priorities. The table has been adapted from established evaluation planning templates in the monitoring and evaluation literature and is intended to serve as a practical checklist for structuring the planning process.
5.4.1. Defining outcomes and indicators to evaluate impact enables planning for data collection and evaluation
Defining the purpose of the evaluation helps determine what type of evaluation is required and what data need to be collected. At a minimum, the evaluation of the DYPA digital tool should assess whether it delivers the outcomes identified in the results chain and contributes to improving service delivery and labour-market outcomes for vulnerable jobseekers, i.e. conducting an impact evaluation.
Following the example provided in Figure 5.1, the evaluation should examine whether the tool leads to: (i) more timely interventions for vulnerable jobseekers; (ii) greater uptake of tailored services and ALMPs; (iii) shorter unemployment spells and improved transitions into employment; and (iv) more stable employment outcomes and a reduced risk of return to unemployment.
Once the purpose and outcomes of the evaluation have been defined, they should be linked to indicators that make it possible to measure the achievement of the outcomes over time. Indicators provide a way to translate abstract objectives into quantifiable metrics that can be tracked systematically. For example, the outcome of “shorter unemployment spells” can be measured through indicators such as the average duration of unemployment spells or transition rates into unsubsidised employment within 6‑12 months of registration.
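As a minimal sketch of how such an outcome indicator could be operationalised, the example below computes the average spell duration and the 12-month transition rate from a hypothetical linked extract of DYPA registrations and ERGANI hiring records; the file layout and column names are assumptions, and a real implementation would need to treat censored spells more carefully.

```python
import pandas as pd

# Hypothetical linked extract: one row per unemployment spell, with the date of the
# first subsequent hiring (from ERGANI) left empty for spells without a job entry yet.
spells = pd.read_csv(
    "spells_with_hirings.csv",
    parse_dates=["registration_date", "first_hiring_date"],
)

# Duration of the unemployment spell in days (missing for censored spells).
spells["spell_days"] = (spells["first_hiring_date"] - spells["registration_date"]).dt.days

# Transition rate into employment within 12 months of registration.
spells["employed_within_12m"] = spells["spell_days"] <= 365  # NaN compares as False
transition_rate_12m = spells["employed_within_12m"].mean()

# Average duration of completed spells (censored spells are excluded in this simple version).
avg_spell_days = spells["spell_days"].mean()

print(f"Transition rate within 12 months: {transition_rate_12m:.1%}")
print(f"Average completed spell duration: {avg_spell_days:.0f} days")
```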
Building on the monitoring framework presented in Section 5.3, which outlined indicators to track inputs, activities, outputs and outcomes, Table 5.2 presents a set of indicators and corresponding data sources for the impact evaluation of the DYPA digital tool.
Table 5.2. Linking outcomes to indicators and data sources for the DYPA digital tool evaluation
| Outcomes and impact | Indicators | Potential data sources |
|---|---|---|
| Timely interventions for vulnerable jobseekers | • Average time from registration to first counselling meeting | • DYPA IIS (profiling, appointments) • IT process data (tool usage logs) |
| Greater uptake of ALMPs and external services | • Participation rate of vulnerable jobseekers in ALMPs • Average number of referrals to external services per jobseeker | • DYPA IIS or newly developed database (external service mapping) |
| Shorter unemployment spells and improved transitions into employment | • Average duration of unemployment spells • Transition rates into unsubsidised employment within 6/12/18 months • Probability of remaining employed 6 months after placement | • DYPA IIS • ERGANI • Self-employment records |
| More stable employment outcomes and reduced return to unemployment | • Employment retention rates at 6/12/18 months • Re‑registration rates at DYPA after exit to employment • Share of jobseekers with repeated LTU spells | • ERGANI • DYPA IIS |
| Improved social and economic inclusion (longer-term impact) | • Earnings growth trajectories post-employment, including average monthly earnings after 12/24/36 months of re‑entering employment • Other social outcomes, including those related to health and crime | • ERGANI • Administrative data from other public institutions (e.g. Ministry of Health, Justice) |
Note: Indicators related to earlier stages of implementation are presented in Table 5.1. IIS – Integrated Information System.
As with monitoring, most of the information required to evaluate DYPA’s digital tool is already routinely collected through the administrative systems of DYPA, ERGANI and other administrative databases. These sources provide detailed information on jobseeker registration, profiling, counselling, and employment outcomes, providing a solid basis for monitoring and evaluation.
5.4.2. A robust evaluation design is essential to assess the tool’s impact
Credibly assessing the impact of the tool requires determining whether jobseekers supported through the tool achieve better outcomes than they would have without it (for instance, whether they find jobs more quickly, remain employed for longer, or make greater use of services). To do this, the evaluation must compare the outcomes of jobseekers whose counsellors used the tool with the hypothetical outcomes of those same jobseekers had their counsellors not used it (the “counterfactual”). Because, in reality, a counsellor either uses or does not use the tool with a given jobseeker, only one of these situations can ever be observed. Thus, a robust evaluation must establish a credible counterfactual against which actual outcomes can be compared.
If the use of the tool were left to counsellors’ discretion, building such a counterfactual would be challenging. Jobseekers whose counsellors chose to use the tool might differ systematically from those whose counsellors did not. For example, counsellors could be more likely to rely on the tool when working with jobseekers whose distance from the labour market is more difficult to assess. Similarly, counsellors with stronger digital skills might be more inclined to use the tool, and their effectiveness in supporting jobseekers could be higher for reasons unrelated to the tool itself. In such cases, any observed differences in outcomes would reflect both the effect of the tool and these underlying differences, making it difficult to isolate the causal impact of the tool.
The goal of a robust evaluation, therefore, is to compare the outcomes of jobseekers who were supported with the digital tool against those of otherwise identical jobseekers who were not, so that any differences can be attributed solely to the tool. This can be achieved through experimental methods, such as randomised controlled trials (RCTs), or through quasi‑experimental methods that account for differences analytically (UK Treasury, 2020[5]).
RCTs are considered the gold standard. In this context, randomisation would mean deciding in advance which counsellors (or which local offices) will use the new tool, rather than leaving it to individual choice. This ensures that jobseekers assigned to the “treatment” and “control” groups are comparable in both observable and unobservable characteristics. Any measured differences in outcomes between the two groups could then be attributed with confidence to the tool itself.
Although RCTs require careful design and can be more resource‑intensive than quasi‑experimental methods, they offer results that are generally easier to interpret and communicate, and they generate the most reliable evidence on the impact of new interventions. An RCT could not only demonstrate whether the digital tool improves outcomes for vulnerable jobseekers, but also provide insights into why and for whom the tool is most effective. Section 5.4.3 presents the key considerations for designing such an RCT, including decisions about randomisation, stratification, and sample size.
5.4.3. RCTs deliver the most credible evidence when carefully designed and implemented
The following sections describe key aspects relevant to the design and implementation of an RCT to evaluate the DYPA’s new digital tool. This guidance draws on recent OECD work on RCT design for digital tools in Latvia (OECD, 2024[6]) and has been adapted to the context of DYPA’s new digital tool. It is important to emphasise that the precise design of the RCT will depend on the final technical specifications of the tool and its integration into DYPA’s systems and processes. Therefore, DYPA should plan the evaluation design carefully once the tool’s conceptualisation and pilot strategy are finalised.
Feasibility assessment
Before defining the experimental design, DYPA would first need to assess whether conditions for a robust RCT are in place. This feasibility assessment would consider practical questions such as whether:
Counsellors and KPA2 offices are willing to participate;
Workflows and staffing are stable;
The IT systems can support automated randomisation, logging of tool usage and timely extraction of outcome data;
Training for counsellors and offices can be delivered consistently.
The feasibility assessment should also examine risks of contamination/non-compliance between treatment and control groups. Clarifying these preconditions will help DYPA determine whether an RCT is viable and, if so, which design options are feasible in practice.
Level of randomisation
A central design choice in an RCT concerns the level at which randomisation occurs. In the case of the DYPA’s digital tool, randomisation could take place at three possible levels: jobseeker, counsellor, or local employment office.
Jobseeker-level randomisation would involve assigning each newly registered jobseeker randomly to either the treatment group (where counsellors use the new tool to guide service delivery) or the control group (where counsellors follow standard procedures without the tool). Randomisation could be automated based on the jobseeker’s unique identifier at registration.
Alternatively, randomisation could be carried out at the counsellor or at the office level. Counsellor-level randomisation (a form of cluster randomisation) would mean assigning counsellors randomly to use either the new or existing system. All jobseekers advised by a “treatment” counsellor would then form part of the treatment group, while those advised by “control” counsellors would serve as the comparison group. Similarly, office‑level randomisation would assign entire DYPA local offices to treatment or control. All jobseekers registered in a treatment office would receive counselling supported by the new tool.
Each level of randomisation has advantages and disadvantages. Jobseeker-level randomisation maximises statistical power and requires a smaller sample size to detect a given effect. However, it also introduces risks of spillover effects,1 since counsellors would be using both the new and old systems for different clients. Counsellor- or office‑level randomisation, by contrast, minimises such spillovers and may better reflect real-world implementation, but requires more participants and, as a result, it can be more costly.
The most appropriate level of randomisation will depend on the tool’s operational design and DYPA’s IT infrastructure. For instance, jobseeker-level randomisation would require that counsellors access the new tool only for certain clients and that DYPA’s information systems can assign clients automatically to either process. By contrast, counsellor- or office‑level randomisation allows changes to be implemented for each counsellor or office separately. On the other hand, if the new digital tool is developed specifically for the KPA2 offices serving vulnerable groups, options for office‑level randomisation would be more limited, as the number of eligible offices is smaller. In this case, counsellor-level randomisation could offer a practical compromise that provides sufficient variation for evaluation while still limiting spillovers. Conversely, if the tool is intended for use across all DYPA offices, an office‑level pilot in a subset of offices could provide valuable evidence before national roll-out.
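To make the assignment mechanics concrete, the sketch below shows how random assignment could be automated at the jobseeker level and at the counsellor (cluster) level. The identifiers, group sizes, 50/50 split and fixed seed are illustrative assumptions; in practice the assignment would need to be generated and stored within DYPA’s IT systems.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2024)  # fixed seed so the assignment is reproducible

# Jobseeker-level randomisation: each newly registered jobseeker is assigned at registration.
jobseekers = pd.DataFrame({"jobseeker_id": range(1, 10_001)})
jobseekers["treatment"] = rng.integers(0, 2, size=len(jobseekers))  # 1 = counselled with the new tool

# Counsellor-level (cluster) randomisation: all clients of a counsellor share the assignment.
counsellors = pd.DataFrame({"counsellor_id": range(1, 201)})
counsellors["treatment"] = rng.permutation(
    np.repeat([0, 1], len(counsellors) // 2)  # exactly half of counsellors use the new tool
)

print(jobseekers["treatment"].mean(), counsellors["treatment"].mean())
```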
Level of stratification
DYPA could apply stratified randomisation to ensure that key characteristics are evenly distributed between treatment and control groups, which in turn enhances the precision of the results and allows DYPA to investigate effect heterogeneity across different subgroups. More specifically, stratification involves dividing the sample into subgroups (strata) based on relevant observable characteristics and randomising within each stratum independently. Possible stratification variables include jobseeker or counsellor characteristics such as gender or age; the region or office; or a combination of multiple characteristics (e.g. gender and region).
For example, DYPA could assign half of the counsellors within each regional office to the treatment group and the other half to the control group, ensuring balance by office. Alternatively, if randomisation is implemented at the office level, stratification could occur by region to ensure representation across urban and rural contexts. Stratification can also be useful to analyse heterogeneity in the tool’s impact across specific subgroups of vulnerable jobseekers (for instance, by gender, age group, or vulnerability category) as it leads to each subgroup being sufficiently represented in both the treatment and control groups.
The order of stratification variables is important. Equal numbers of counsellors (or offices) in treatment and control groups can only be ensured within the first stratum, while subsequent strata may show small imbalances.
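A minimal sketch of stratified counsellor-level randomisation is shown below, assigning half of the counsellors within each office to the treatment group; the office names and roster are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)

# Hypothetical roster of counsellors with their office (the stratification variable).
roster = pd.DataFrame(
    {
        "counsellor_id": range(1, 41),
        "office": np.repeat(["Athens-1", "Athens-2", "Thessaloniki", "Patras"], 10),
    }
)

assigned_parts = []
for office, group in roster.groupby("office"):
    n = len(group)
    labels = np.repeat([1, 0], [n // 2, n - n // 2])  # handles odd-sized strata
    group = group.copy()
    group["treatment"] = rng.permutation(labels)  # shuffle within the stratum
    assigned_parts.append(group)

assigned = pd.concat(assigned_parts)
print(assigned.groupby("office")["treatment"].mean())  # roughly 0.5 in every office
```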
Sample size and statistical power
A critical step in planning the RCT is determining the sample size required to detect an impact of the digital tool with sufficient statistical power. Statistical power refers to the probability that the evaluation will detect a true effect of the tool if such an effect exists.
The ability of an RCT to detect whether the DYPA’s digital tool has a real impact depends on several factors, including the sample size, the expected size of the effect, and the level of randomisation (jobseeker, counsellor, or office). The larger the sample of treated units, the smaller the effect that the RCT can detect with precision. Conversely, the smaller the expected effect that the RCT aims to detect, the larger the required sample size needs to be. Inadequate sample sizes can lead to inconclusive results, while unnecessarily large samples increase costs and complexity.
Other elements also affect the precision of the evaluation, such as the degree of similarity between participants within the same cluster (the intra-class correlation) and the inclusion of baseline characteristics that explain part of the variation in outcomes (Duflo and Banerjee, 2017[7]). Understanding how these factors interact helps to design an RCT that balances feasibility, cost, and analytical power.
In the case of the DYPA’s tool, a power analysis can provide an indication of the required number of jobseekers (for jobseeker-level randomisation) or counsellors (for counsellor-level randomisation) in the treatment and control groups. When planning the evaluation, it may be useful to consider evidence from similar initiatives to establish expectations of effect sizes. For example, the SEND@ tool in Spain helping job counsellors to advise jobseekers in job search was found to increase employment probabilities by around 2‑3% within three months of registration (OECD, 2022[8]).
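The sketch below illustrates the type of power calculation involved, assuming a baseline 3-month employment rate of 20% and an expected increase of 3 percentage points, and then inflating the required sample with a standard design effect for counsellor-level (cluster) randomisation. All parameter values (baseline rate, effect size, intra-class correlation, cluster size) are illustrative assumptions that DYPA would need to replace with its own figures.

```python
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Assumed baseline: 20% employed within 3 months; the tool is assumed to raise this to 23%.
baseline_rate, treated_rate = 0.20, 0.23
effect_size = proportion_effectsize(treated_rate, baseline_rate)

# Jobseekers per arm for 80% power at a 5% significance level (individual randomisation).
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, ratio=1.0, alternative="two-sided"
)

# Counsellor-level randomisation: inflate by the design effect 1 + (m - 1) * ICC,
# assuming ~100 jobseekers per counsellor and an intra-class correlation of 0.02.
cluster_size, icc = 100, 0.02
design_effect = 1 + (cluster_size - 1) * icc
n_per_arm_cluster = n_per_arm * design_effect

print(f"Jobseekers per arm (individual randomisation): {math.ceil(n_per_arm)}")
print(f"Jobseekers per arm (counsellor-level randomisation): {math.ceil(n_per_arm_cluster)}")
print(f"Counsellors per arm needed: {math.ceil(n_per_arm_cluster / cluster_size)}")
```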
Selection and balance of participants
The selection of participants is a critical step for ensuring that the pilot and the resulting evaluation produce credible and generalisable evidence. Ideally, the pilot of the DYPA’s digital tool would be implemented across all local offices to maximise statistical power and external validity. If this is not feasible due to resource constraints, a smaller number of offices could be selected, provided they are representative of DYPA’s network in terms of region, jobseeker composition, and counsellor characteristics. Ensuring that offices from different geographic areas are included would help capture diverse labour market contexts and reduce the risk that observed effects are specific to a particular location.
As exclusion from the treatment group can be perceived as unfair, DYPA could adopt a phased roll-out approach to maintain motivation among participants. For example, DYPA could allow control-group counsellors or offices to gain access to the tool after a defined period. The length of the delay would depend on the number of jobseekers who need to be advised using the profiling tool (as determined by the power analysis). This phased approach would preserve the experimental integrity of the evaluation while preventing negative impacts on staff and jobseekers’ engagement and motivation.
After randomisation, it is crucial to verify that participants in the treatment and control groups are balanced in observable characteristics (e.g. counsellor experience, office type, jobseeker age, education, and prior unemployment history). If imbalances are detected, potential biases can be removed by reweighting, inclusion of covariates, or by combining the RCT with quasi‑experimental methods such as matching.
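A simple balance check could compute standardised mean differences between the two groups for key baseline characteristics, as in the sketch below; the file and variable names are illustrative.

```python
import numpy as np
import pandas as pd

def standardised_mean_difference(df: pd.DataFrame, var: str, group: str = "treatment") -> float:
    """Difference in group means divided by the pooled standard deviation."""
    treated = df.loc[df[group] == 1, var]
    control = df.loc[df[group] == 0, var]
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

# Hypothetical pre-treatment characteristics of jobseekers in the pilot.
sample = pd.read_csv("pilot_baseline.csv")  # illustrative file name
covariates = ["age", "years_of_education", "months_unemployed_before_registration"]

balance = {var: standardised_mean_difference(sample, var) for var in covariates}
# Absolute values above roughly 0.1 are often taken as a sign of meaningful imbalance.
print(pd.Series(balance).round(3))
```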
5.4.4. Quasi‑experimental methods can provide credible evidence when randomisation is not feasible
While RCTs are regarded as the gold standard for causal evaluation, quasi‑experimental methods offer a valuable and practical alternative, particularly when an RCT is not feasible or when evaluating the longer-term effects of interventions that have already been rolled out. In such cases, these methods allow evaluators to estimate credible counterfactuals using existing administrative data, enabling rigorous ex post assessment of policy effectiveness. They can also help determine whether impacts observed during a pilot are sustained once the intervention is implemented at scale, or how effectiveness evolves as labour-market conditions or institutional parameters change. Moreover, a meta‑analysis of ALMP evaluations (Card, Kluve and Weber, 2017[9]) shows that average estimated impacts from non-experimental methods are not significantly different from those derived from randomised controlled trials.
Quasi‑experimental approaches approximate a counterfactual by comparing outcomes of jobseekers who were exposed to the tool with those of similar jobseekers who were not. Three main types of quasi‑experimental methods can be considered: Difference‑in-Differences (DiD), Regression Discontinuity Design (RDD), and Propensity Score Matching (PSM).
Among these, DiD and RDD are natural experiment approaches that rely on specific implementation conditions. DiD compares changes in outcomes over time between treated and untreated groups, assuming both would have followed similar trends in the absence of the intervention.2 It can yield credible estimates when the introduction of the tool follows a clear, exogenous schedule (e.g. phased roll-out across regions) that is unrelated to jobseeker characteristics (Roth et al., 2023[10]). RDD, on the other hand, exploits explicit eligibility thresholds (Imbens and Lemieux, 2008[11]). For example, if access to the tool were based on a profiling score or other rule, it would be possible to compare the outcomes of jobseekers just above and just below the cut-off. These approaches can yield robust causal estimates but are only possible if the implementation naturally generates such conditions, which may not be the case for DYPA’s tool.
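If the roll-out did follow such a phased, exogenous schedule, a basic two-period DiD specification could be estimated as in the sketch below, using the formula interface of statsmodels with standard errors clustered by region; the dataset and variable names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per jobseeker and period, with a 12-month employment outcome,
# a flag for regions where the tool was introduced ("treated") and a post-introduction flag.
panel = pd.read_csv("did_panel.csv")

# The coefficient on treated:post is the DiD estimate of the tool's effect.
model = smf.ols("employed_12m ~ treated * post", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["region"]}
)
print(model.summary().tables[1])
```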
By contrast, PSM is more flexible and can be applied even without phased roll-out or formal eligibility rules. This approach involves pairing jobseekers whose counsellors used the tool with otherwise similar jobseekers whose counsellors did not, based on observable characteristics such as age, gender, education, unemployment duration, or prior participation in ALMPs. This creates an artificial control group that mimics the counterfactual scenario (Rosenbaum and Rubin, 1983[12]). PSM can produce well-balanced groups and credible estimates of the tool’s impact when applied using rich administrative data.
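A minimal propensity score matching sketch is shown below: it estimates propensity scores with a logistic regression and pairs each treated jobseeker with the nearest untreated one on that score. The file and variable names are illustrative, and a real application would add common-support checks, covariate balance diagnostics and appropriate standard errors.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

data = pd.read_csv("jobseekers_for_psm.csv")  # illustrative file name
covariates = ["age", "female", "years_of_education", "months_unemployed", "prior_almp"]

# Step 1: estimate the propensity score (probability of being counselled with the tool).
logit = LogisticRegression(max_iter=1000).fit(data[covariates], data["treated"])
data["pscore"] = logit.predict_proba(data[covariates])[:, 1]

treated = data[data["treated"] == 1]
control = data[data["treated"] == 0]

# Step 2: 1-to-1 nearest-neighbour matching on the propensity score (with replacement).
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[idx.ravel()]

# Step 3: average treatment effect on the treated (difference in mean outcomes).
att = treated["employed_12m"].mean() - matched_control["employed_12m"].mean()
print(f"Estimated ATT on 12-month employment: {att:.3f}")
```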
Regardless of the specific approach, DYPA should integrate evaluation planning into the roll-out of the tool. Decisions about where, when, and how the tool is introduced, and what information is collected, can determine whether robust evaluation methods can later be applied.
5.5. Conclusion
This chapter outlines an M&E framework to accompany the implementation of the new digital tool designed to strengthen DYPA’s support for vulnerable jobseekers. The framework builds on a proposed results chain that guides the development of a set of monitoring indicators, as well as the concept for a counterfactual impact evaluation of the tool. Importantly, this results chain should be developed through a participatory process involving all relevant stakeholders. Therefore, it should be revisited and adjusted as DYPA further concretises its plans for the implementation of the new digital tool.
The set of monitoring indicators presented in this framework allows DYPA to track the introduction of the new digital tool, from the inputs used and the activities performed through to outputs and outcomes. While most of the data required to implement the monitoring indicators are already contained in DYPA’s IIS or in registers with which DYPA is already exchanging information, additional information on tool usage will need to be systematically collected. Furthermore, behavioural data could be collected through surveys, as this would allow DYPA to better understand not only whether the tool works but also how it works.
The concept for the counterfactual evaluation discusses possible ways of identifying the causal impact of the tool on the labour market outcomes of vulnerable jobseekers. A reliable method to obtain a causal estimate of the effect of the new digital tool is an RCT, which randomises the use of the tool across DYPA counsellors or DYPA KPA2s. Although this approach is typically more costly than quasi‑experimental methods, it allows for a more intuitive and easier interpretation of the results. The concept highlights some of the aspects that need consideration when carrying out an RCT and also discusses some possible alternatives to RCTs. Importantly, the evaluation of the tool should be designed and conceptualised early on, ideally in parallel with the concretisation of the design of the tool.
References
[14] Angrist, J. and J. Pischke (2009), Mostly harmless econometrics: An empiricist’s companion, Princeton University Press.
[9] Card, D., J. Kluve and A. Weber (2017), “What Works? A Meta Analysis of Recent Active Labor Market Program Evaluations”, Journal of the European Economic Association, Vol. 16/3, pp. 894-931, https://doi.org/10.1093/jeea/jvx028.
[3] Clark, H. and A. Anderson (2004), Theories of Change and Logic Models: Telling Them Apart.
[7] Duflo, E. and A. Banerjee (eds.) (2017), Handbook of Field Experiments, Elsevier.
[11] Imbens, G. and T. Lemieux (2008), “Regression discontinuity designs: A guide to practice”, Journal of Econometrics, Vol. 142/2, pp. 615-635, https://doi.org/10.1016/j.jeconom.2007.05.001.
[6] OECD (2024), Monitoring and evaluation framework for a new jobseeker profiling tool in the State Employment Agency.
[8] OECD (2022), Impact evaluation of the digital tool for employment counsellors in Spain: SEND@, OECD, Paris, https://web-archive.oecd.org/2022-10-11/642826-FinalReport-EvaluationOfSEND.pdf.
[2] OECD (2022), Technical report: Impact Evaluation of Vocational Training and Employment Subsidies for the Unemployed in Lithuania, https://www.oecd.org/content/dam/oecd/en/about/projects/technical-reports-and-presentations-dg-reform/lithuania/LTtechnicalNote-September2022.pdf.
[1] OECD (2020), Impact Evaluations Framework for the Spanish Ministry of Labour and Social Economy and Ministry of Inclusion, Social Security and Migrations, https://www.oecd.org/content/dam/oecd/en/topics/policy-issues/employment-services/impact_evaluations_framework.pdf.
[4] Rist, R. and J. Zall Kusek (2004), Ten steps to a results-based monitoring and evaluation system: a handbook for development practitioners, World Bank Publications, https://www.oecd.org/content/dam/oecd/en/toolkits/derec/evaluation-reports/derec/worldbankgroup/35281194.pdf.
[12] Rosenbaum, P. and D. Rubin (1983), “The central role of the propensity score in observational studies for causal effects”, Biometrika, Vol. 70/1, pp. 41-55, https://doi.org/10.1093/biomet/70.1.41.
[13] Rossi, P., M. Lipsey and H. Freeman (2003), Evaluation: A systematic approach, Sage Publications.
[10] Roth, J. et al. (2023), “What’s trending in difference-in-differences? A synthesis of the recent econometrics literature”, Journal of Econometrics, Vol. 235/2, pp. 2218-2244, https://doi.org/10.1016/j.jeconom.2023.03.008.
[5] UK Treasury (2020), The Magenta Book: HM Treasury guidance on what to consider when designing an evaluation, https://www.gov.uk/government/publications/the-magenta-book.
Annex 5.A. Key questions in planning an evaluation
Annex Table 5.A.1. Key questions in planning an evaluation
Why do I need to evaluate? What is the purpose of the evaluation? Who is the intended audience for the evaluation findings?
Identifying the purpose of the evaluation and who the main audience is will help to determine the type of evaluation undertaken and the data collected. The purpose of an evaluation is often linked to the intended audience and what they want to know. For example, staff and managers may want to know whether the initiative is reaching the right people and whether any improvements could be made. Funders will want to know whether their investment is making a positive difference, and by how much. Different evaluation purposes may require a different type of evaluation because each will ask different questions.
What do I need to find out? What are the evaluation questions? What evaluation design or type is best?
Evaluation questions are high-level questions that guide the evaluation and outline what is hoped to be learned. Before deciding what types of data are required and how to collect them, it is necessary to specify the questions to be answered. Evaluation questions may seem intuitive and straightforward, but without well-developed, relevant, and accurate evaluation questions, evaluations can fail to address the most important issues the audience(s) is interested in. Evaluation questions should ideally be developed with stakeholders, which may include the staff delivering the initiative, community representatives and/or advocacy groups, funders, and local and national decision makers. Consultations may take the form of informal conversations or semi-structured individual and/or group interviews. Involving stakeholders is also valuable for gaining their buy-in to the process and the results. According to Rossi, Lipsey and Freeman (2003[13]), evaluation questions must be:
• reasonable and appropriate, or realistic for the given programme;
• answerable: like the reasonableness of a question, good evaluation questions must be able to be answered to some degree of certainty. If questions are too vague or broad, or require data that are unavailable or unobservable, they are not answerable; and
• based on programme goals and objectives.
Often the first question stakeholders want answered is “how well is the initiative being implemented?” Is it being implemented in accordance with policy and programme design objectives, how is it working in practice, and is it meeting the desired short-term goals? These types of questions are typically answered through a process evaluation. A process evaluation will help to identify:
1. Who is receiving the initiative?
2. Are they the target audience?
3. How does the target population(s) interact with the initiative?
4. What sorts of services are they participating in due to receiving case management support?
5. What do they think of the support and services they are receiving? Are they satisfied?
6. What are the barriers to implementation?
7. Conversely, what are the enablers?
8. How is the initiative functioning from administrative, organisational, and/or staff perspectives?
Another important role a process evaluation can play is assessing whether appropriate foundations are in place to undertake future evaluations, e.g. outcome/impact evaluations. Outcome and cost-effectiveness evaluations and cost-benefit analyses can be undertaken when an initiative has been running long enough to produce results that show whether it has had demonstrable effects (and at what cost). Ideally, they are planned in the early phases of an initiative, as it can be difficult (and potentially impossible) and expensive to set them up later.
What will I measure? What are the outcomes sought? What indicators should be used?
Outcomes and the respective indicators should have been determined through the ToC process, discussed above. A process evaluation requires clarity not only about the outcomes sought for service users but also about implementation goals.
How will I measure it? What data collection methods to use? How to ensure the data are of high quality? What are the ethical considerations?
Once the outcomes and indicators, evaluation questions and type of evaluation have been identified, the data required and how to collect them need to be determined. Data can be quantitative (numbers) or qualitative (words). Many evaluations are “mixed methods”, where a combination of quantitative and qualitative data is collected and used. Quantitative data will identify how many, how much or how often something has occurred. Quantitative data collection methods include outcome measurement tools, surveys with rating scales, and observation methods that count how many times something has happened. Qualitative data will identify why or how something happened and are useful for understanding attitudes, beliefs, and behaviours. Qualitative data collection methods include interviews, focus groups and open-ended questions in a questionnaire. Ethical issues need to be considered and addressed. For example, if service users’ data are going to be used, is their informed consent required? Are there relevant data protection and ethical guidelines that need to be adhered to? Does the evaluation require ethics clearance from a relevant body? The volume of data and the questions being addressed through the analysis will determine the type of analysis done. Data collection and analysis are specialised fields and, ideally, advice and support from data, research and/or evaluation experts should be sought. Software programmes such as SPSS (for quantitative data) or NVivo (for qualitative data) can support analysis, but even with these programmes, skills and knowledge in data analysis are still required.
What resources are required? What is the timeframe?
The funding allocated for monitoring and evaluation will significantly affect what can be done and in what timeframe, particularly if in-house evaluation resources are limited. The availability of evaluation resources will have implications for the design and scale of the evaluation, as well as what questions can realistically be answered. Management questions that need to be addressed include:
• who will manage the evaluation?
• who will conduct the evaluation? (It is always a good idea to use an evaluator who is independent from the management of the initiative.)
• how will it be governed? (An oversight body, such as a working group or steering committee, can be useful.)
How will the analysis and findings of the evaluation be shared?
An important question that is often not considered when an evaluation is being planned is how the evaluation findings will be communicated to stakeholders and possibly members of the public. What kinds of information will be included (e.g. findings, conclusions, judgements, recommendations)? Effective dissemination of findings increases the likelihood that the evaluation will have an impact on decision making. Findings should be presented in a way that the audience(s) will understand – a lengthy and complicated report, for example, may not be the best way to communicate findings to some audiences. Different communication techniques and styles will be required for different audiences.
Notes
← 1. These effects occur when the assignment of some units to the treatment group affects units in the control group. In the context of the jobseeker profiling tool, spillover effects could bias the results towards zero.
← 2. The parallel-trends assumption implies that, in the absence of the intervention, average outcomes for treated and untreated groups would have evolved in the same way over time (Angrist and Pischke, 2009[14]).