
Environ. Res. Lett. 16 (2021) 054046 https://doi.org/10.1088/1748-9326/abf964
OPEN ACCESS
RECEIVED 23 December 2020
REVISED 25 March 2021
ACCEPTED FOR PUBLICATION 19 April 2021
PUBLISHED 10 May 2021
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
LETTER
Integrated assessment model diagnostics: key indicators and model evolution
Mathijs Harmsen1,2, Elmar Kriegler3,21, Detlef P van Vuuren1,2, Kaj-Ivar van der Wijst1,2,
Gunnar Luderer3,20, Ryna Cui4, Olivier Dessens5, Laurent Drouet6, Johannes Emmerling6,
Jennifer Faye Morris7, Florian Fosse8, Dimitris Fragkiadakis9, Kostas Fragkiadakis9,
Panagiotis Fragkos9, Oliver Fricko10, Shinichiro Fujimori11, David Gernaat1,2, Céline Guivarch12,
Gokul Iyer13, Panagiotis Karkatsoulis9, Ilkka Keppo14, Kimon Keramidas8, Alexandre Köberle15,
Peter Kolp10, Volker Krey10, Christoph Krüger1,2, Florian Leblanc12, Shivika Mittal15,
Sergey Paltsev7, Pedro Rochedo16, Bas J van Ruijven10, Ronald D Sands17, Fuminori Sano18,
Jessica Strefler3, Eveline Vasquez Arroyo16, Kenichi Wada18 and Behnam Zakeri10,19
1PBL Netherlands Environmental Assessment Agency, Bezuidenhoutseweg 30, 2594 AV The Hague, The Netherlands
2Copernicus Institute for Sustainable Development, Utrecht University, Princetonlaan 8a, 3584 CB Utrecht, The Netherlands
3Potsdam Institute for Climate Impact Research (PIK), Member of the Leibniz Association, Potsdam D-14412, Germany
4Center for Global Sustainability, University of Maryland, 3101 Van Munching Hall, College Park, MD 20742, United States of America
5University College London, London, United Kingdom
6RFF-CMCC European Institute on Economics and the Environment (EIEE), Centro Euro-Mediterraneo sui Cambiamenti Climatici, Via Bergogne 34, 20144 Milan, Italy
7MIT Joint Program on the Science and Policy of Global Change, Massachusetts Institute of Technology, Cambridge, MA, United States of America
8European Commission, Joint Research Centre, Seville, Spain
9E3Modelling S.A., Panormou 70-72, Athens, Greece
10 International Institute for Applied Systems Analysis, Schlossplatz-1, A-2361 Laxenburg, Austria
11 Department of Environmental Engineering, Kyoto University, Kyoto, Japan & National Institute for Environmental Studies, Center for Social and Environmental Systems Research, Tsukuba, Ibaraki 305-8506, Japan
12 Ecole des Ponts ParisTech, CIRED, 45bis avenue de la Belle Gabrielle, Nogent-sur-Marne, France
13 Joint Global Change Research Institute, Pacific Northwest National Laboratory and University of Maryland, 5825 University Research Court, Suite 3500, College Park, MD 20740, United States of America
14 Department of Mechanical Engineering, School of Engineering, Aalto University, Otakaari 4, Espoo 02150, Finland
15 Grantham Institute, Imperial College London, Exhibition Road, London SW7 2AZ
16 Energy Planning Program, COPPE, Universidade Federal do Rio de Janeiro (UFRJ), PO Box 68565, 21941-914 Rio de Janeiro, RJ, Brazil
17 USDA Economic Research Service, Kansas City, MO, United States of America
18 Research Institute of Innovative Technology for the Earth (RITE), 9-2, Kizugawadai, Kizugawa-Shi, Kyoto 619-0292, Japan
19 Sustainable Energy Planning Research Group, Aalborg University, A. C. Meyers Vænge 15, Copenhagen 2450, Denmark
20 Global Energy Systems Analysis, Technische Universität Berlin, Straße des 17. Juni 135, Berlin 10623, Germany
21 Faculty of Economics and Social Sciences, University of Potsdam, August-Bebel-Str. 89, Potsdam 14482, Germany
E-mail: mathijs.har[email protected]
Keywords: diagnostics, integrated assessment models, climate policy, 6th Assessment Report IPCC, renewable energy, mitigation, AR6
Supplementary material for this article is available online
Abstract
Integrated assessment models (IAMs) are a prime tool for informing climate mitigation strategies. Diagnostic indicators that allow comparison across these models can help describe and explain differences in model projections, which increases transparency and comparability. The IAM community has previously developed an approach to diagnose models (Kriegler et al 2015 Technol. Forecast. Soc. Change 90 45–61). Here we build on this by proposing a selected set of well-defined indicators as a community standard to systematically and routinely assess IAM behaviour, similar to the metrics used in other modeling communities, such as climate modeling. These indicators are the relative abatement index, emission reduction type index, inertia timescale, fossil fuel reduction, transformation index and cost per abatement value. We apply the approach to 17 IAMs, assessing both older versions and the latest versions as applied in the IPCC 6th Assessment Report. The study shows that the approach can be easily applied and used to identify key differences between models and model versions. Moreover, we demonstrate that this comparison helps to link model behavior to model characteristics and assumptions. We show that, together, the six indicators give a useful indication of the main traits of a model and a rough characterization of its general behavior. The results also show that there is often a considerable spread across the models. Interestingly, the diagnostic values often change between model versions, but there does not seem to be a distinct trend.
© 2021 The Author(s). Published by IOP Publishing Ltd
1. Introduction
Integrated assessment models (IAMs) are widely used for climate policy and climate change analysis (van Beek et al 2020). They offer the means to assess the linkages between long-term climate policy goals and near-term policy choices. They can also explore mitigation strategies, taking into account cross-sectoral, cross-regional and system interactions (energy, land, economy, climate). As such, they form a key information source feeding into the climate change mitigation policy process, e.g. via IPCC Assessment Reports (ARs) (Halsnæs et al 2000, IPCC 2014). Within IAMs, a distinction can be made between cost-benefit IAMs (mostly highly stylized) and detailed process IAMs, which are mostly used to explore different pathways to reach selected policy goals. The latter comprise a diverse group of models with different functional structures.
A thorough understanding of how IAM structure and assumptions affect IAM behavior is critically important for assessing IAM-based policy analysis and advice. For both policy makers and researchers, it can provide insights into why results differ between models and link projections to policy-relevant model assumptions and structure. Diagnostic tools aim to foster such understanding. Specifically, such tools can serve two key functions: (a) characterizing model behavior by use of stylized diagnostic experiments, and (b) relating model behavior patterns to model structure and input assumptions. We focus mostly on the first in this study, but aim to cover the second where possible. A subsequent function, beyond the scope of this study, is to qualify model behavior and assess models’ policy applicability.
Similar diagnostic tools have been developed in other modeling disciplines. In climate research, for instance, diagnostic metrics have been applied to compare climate models and to evaluate their performance (Andrews et al 2012, Flato et al 2013, Eyring et al 2016). Such indicators include climate sensitivity (indicating the temperature increase for a doubling of the CO2 concentration) and the transient climate response (indicating warming over a more limited time period). These tools are used not only to regularly compare models and thus qualify their behavior, but also in validation experiments, leading to assessments of model quality for specific experiments and to the evaluation of models over time.
The IAM community has also undertaken several model diagnostic activities in the past (Gaskins and Weyant 1993, Weyant 2004, 2010, van Vuuren et al 2009, Wilkerson et al 2015), culminating in the most recent and comprehensive diagnostic assessment by Kriegler et al (2015). Here, we propose an updated and expanded set of widely applicable key diagnostic indicators to be used as a community standard. We derived these by revisiting the approach of Kriegler et al (2015) and improving it in terms of precision, simplicity and completeness. In particular, we propose a novel, standardized approach to compare different model versions in order to assess and monitor model differences over time. The approach is analogous to climate model diagnostics in the sense that it is based on stylized scenarios with exogenous assumptions. It has been tested on 17 IAMs and 32 model versions as part of two EU model development projects, ADVANCE (www.fp7-advance.eu/) and NAVIGATE (https://navigate-h2020.eu/), thus covering all main process-based IAMs (a much higher coverage than in preceding studies), including all the latest model versions. The latter is especially needed in light of the forthcoming AR6.
A standard set of diagnostics for the community has obvious advantages. It provides a tool to systematically and consistently assess model behavior in all future studies. Model diagnostic results can be part of model documentation that can be referenced and highlighted in papers. Future model intercomparison projects could require participating models to regularly run the core set of diagnostics in order to analyze the behavior of newly developed models or model versions. Together with model documentation, this will ultimately lead to greater transparency and comprehensibility of IAM applications. It will also allow tracking the development of IAMs over time and possibly, in the future, confronting the outcomes with empirical information or information from other scientific disciplines.
An important innovation of the present study is the introduction of two diagnostic indicators in addition to the ones established by Kriegler et al (2015), namely inertia timescale (IT) and fossil fuel reduction (FFR). IT provides a measure of a model’s level of inertia in response to the introduction of climate policy, a crucial determining factor in deep mitigation projections. FFR highlights a model’s tendency to reduce fossil fuel use as part of climate policy, a key element in model studies that examine the energy transition.
Here, we present the results for six key indicators, adding IT and FFR to the original set of indicators from Kriegler et al (2015): relative abatement index (RAI), carbon intensity over energy intensity (CoEI, here replaced by the emission reduction type index, ERT), transformation index (TI) and cost per abatement value (CAV). The indicators have been simplified to make them more suitable as a community standard, namely by focusing on one strong mitigation case and one benchmark year 30 years in the future (here 2050; a correspondingly later year in post-2020 assessments). The latter allows for comparability with future diagnostic assessments. To ensure precision in the diagnostic results, we define single, unique values to indicate model behavior.
In section 2 (Methods), we explain the study design and list the participating models. The results are split into subsections for each of the indicators and conclude with an overview table that classifies all participating models. In section 4, we reflect on the research questions: can these indicators be easily used as diagnostic tools for IAMs, including their development over time? And what insights do these tools provide?
2. Methods
2.1. Diagnostic experiments and indicators
The experiments described in this study form a small selection from a larger set of stylized diagnostic scenarios originally developed as part of the EU FP7 ADVANCE project (www.fp7-advance.eu/). These are: Base (a zero carbon tax, i.e. a no-climate-policy baseline) and C80-gr5 (a run with a carbon equivalent price that grows exponentially at 5% per year starting in 2020 and reaches 80 (2010)$/tCO2eq in 2040). C80-gr5 is used for each key indicator presented here. For two indicators (RAI and IT), extra scenarios were used, as explained below. Note that the C80-gr5 scenario represents a 1.5 °C–2 °C case in most models (see supplement S7 (available online at stacks.iop.org/ERL/16/054046/mmedia)), in line with the climate ambitions of the Paris Agreement. This makes it a highly relevant showcase for assessing model behavior in the deep mitigation scenarios that are frequently studied. Preferably, model groups used SSP2, the middle-of-the-road socioeconomic baseline scenario (Riahi et al 2017), for all assumptions, including population and economic growth.
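To make the C80-gr5 price path concrete, the short sketch below computes carbon prices for an exponentially growing tax anchored at a given 2040 level. It is an illustration, not the ADVANCE scenario protocol: the function name `carbon_price` is hypothetical, and we assume the price is zero before 2020 and that C30-gr5 is anchored at 30 $/tCO2 in 2040 with the same 5% growth rate.

```python
def carbon_price(year, p_2040=80.0, growth=0.05, start_year=2020):
    """Carbon price in (2010)$/tCO2eq for a stylized Cxx-gr5 scenario.

    Assumptions (illustrative): the price grows exponentially at `growth`
    per year, is anchored to `p_2040` in 2040, and is zero before
    `start_year`.
    """
    if year < start_year:
        return 0.0
    return p_2040 * (1.0 + growth) ** (year - 2040)


# Print the price in 2020, 2040 and 2050 for both stylized scenarios.
for name, p_2040 in (("C30-gr5", 30.0), ("C80-gr5", 80.0)):
    path = {y: round(carbon_price(y, p_2040), 1) for y in (2020, 2040, 2050)}
    print(name, path)
```

Under these assumptions, the C80-gr5 path starts at about 30 $/tCO2 in 2020 and reaches about 130 $/tCO2 in 2050, while the C30-gr5 path reaches about 49 $/tCO2 in 2050. This is consistent with the ∼50 and 130 $/tCO2 points mentioned for the derived MAC curve later in this section.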
The indicators were originally chosen based on criteria set by Kriegler et al (2015), and are adapted here on the same basis:
• Identification of heterogeneity in model responses
• Diagnosis of relevant features for climate policy analysis
• Applicability to diverse models
• Accessibility and ease of use
Here, we add the following criteria:
• Standardization and comparability between diagnostic studies
• Precision/quantifiability
Based on these criteria, we derive a set of six indicators that describe model responses to climate policy. These indicators go beyond the work of Kriegler et al (2015) because we provide a standardized formulation, in each case leading to a single value that characterizes the model. We specify fixed rules (benchmark year, scenario used, socio-economic assumptions) to allow for quantitative comparability between studies. The main focus is on the year 2050, as it is (a) policy relevant and (b) provides a reasonable indication of model behavior throughout the century. For future use of the indicators, we define all indicators based on C80-gr5, using the value 30 years after the introduction of the tax (here 2020). While the focus is on 2050, we also show the 2100 results in the supplement (S3) to assess whether the 2100 numbers would lead to different conclusions.
Table 1 gives an overview of the key diagnostic indicators proposed and assessed in this study. Below, we briefly summarize the setup and rationale behind the indicators and, in particular, indicate differences with and additions to the Kriegler et al (2015) approach. In combination, the indicators focus on (a) the responsiveness of the model, (b) the type of mitigation, (c) the scale of the transformation of the energy system, and (d) mitigation costs as a function of the carbon price signal.
As in earlier diagnostic exercises, the indicators are based on global totals to assess the overall behavior related to global climate policy. A regional assessment would be possible in a follow-up study. All emission indicators are based on CO2 emissions from energy and industrial processes (E&I). This allows all models to participate (the land-use system and non-CO2 emissions are modeled by only about half of the models). Moreover, CO2 E&I makes up more than two thirds of all GHG emissions (Olivier and Peters 2020).
The RAI characterizes the emission reductions in a carbon tax scenario relative to the baseline. It can be considered the main indicator in the sense that it measures the overall response to a climate policy incentive and correlates with elements of the other indicators (demand- and supply-side emission reductions, transformation rate, FFR and limited inertia). Hence, it can also be considered a ‘mitigation sensitivity’ indicator, analogous to ‘climate sensitivity’ in climate models. To assess mitigation of the full suite of GHGs, we also provide a full Kyoto GHG analysis in the supplement (S4). Furthermore, an extra scenario (C30-gr5, with a tax about two thirds lower) is used to visualize a stylized ‘derived MAC curve’ from the RAI, by connecting the projected relative abatement at ∼0, 50 and 130 $/tCO2.
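Because Table 1 is not reproduced here, the sketch below assumes the RAI is the fractional CO2 E&I emission reduction in the benchmark year, RAI = 1 − E_policy/E_base, and shows how the three points of a derived MAC curve could be assembled. The function names and all emission numbers are hypothetical and purely illustrative.

```python
def relative_abatement_index(e_base, e_policy):
    """Assumed form: RAI = 1 - E_policy / E_base (CO2 E&I, benchmark year)."""
    return 1.0 - e_policy / e_base


def derived_mac_points(e_base, e_c30, e_c80, prices=(0.0, 50.0, 130.0)):
    """(carbon price, RAI) pairs for a stylized 'derived MAC curve'.

    The zero-price point is the Base scenario, so its abatement is zero.
    """
    rai = (0.0,
           relative_abatement_index(e_base, e_c30),
           relative_abatement_index(e_base, e_c80))
    return list(zip(prices, rai))


# Hypothetical 2050 CO2 E&I emissions in GtCO2/yr (illustrative only).
points = derived_mac_points(e_base=50.0, e_c30=30.0, e_c80=15.0)
for price, rai in points:
    print(f"{price:6.1f} $/tCO2 -> RAI {rai:.2f}")
```

Connecting these points with straight lines gives the stylized curve; a model whose RAI rises steeply between 0 and 50 $/tCO2 has more low-cost abatement potential than one whose RAI rises mostly between 50 and 130 $/tCO2.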
Table 1. Key diagnostic indicators. For further explanation, see main text.
The ERT indicates the share of supply-side measures (e.g. renewable energy) in bringing down emissions. One minus ERT gives the share of the RAI that can be attributed to reduced final energy demand. Values higher than 0.5 indicate supply-focused models (the most common case); values lower than 0.5 indicate demand-focused models. This indicator replaces the CoEI indicator from Kriegler et al (2015), i.e. carbon intensity (CI, as a fraction of CI in the baseline) over energy intensity, which did not strongly reflect reductions in energy intensity (e.g. a model with no energy efficiency at all could still be classified as a demand-focused model).
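The exact ERT formula is given in Table 1, which is not reproduced here. As a rough sketch of the idea, we assume a simple split in which the demand-side contribution equals the relative reduction in final energy, so that ERT = 1 − (1 − FE_policy/FE_base)/RAI. This formulation and the function names are hypothetical, not necessarily the definition used in the study.

```python
def emission_reduction_type(e_base, e_policy, fe_base, fe_policy):
    """Supply-side share of the relative abatement (hypothetical formulation).

    Assumption: the demand-side contribution equals the relative reduction
    in final energy, and ERT = 1 - demand_contribution / RAI.
    """
    rai = 1.0 - e_policy / e_base
    if rai <= 0.0:
        raise ValueError("ERT is undefined without positive abatement")
    demand_contribution = 1.0 - fe_policy / fe_base
    return 1.0 - demand_contribution / rai


def classify(ert):
    """Supply model if ERT > 0.5, otherwise demand model (0.5 lumped with demand)."""
    return "supply" if ert > 0.5 else "demand"


# Hypothetical: emissions fall by 70 %, final energy by 14 %.
ert = emission_reduction_type(50.0, 15.0, 500.0, 430.0)
print(round(ert, 2), classify(ert))
```

In this illustrative case, demand reduction explains one fifth of the abatement, so the model would be classified as a supply model.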
Two energy system transformation indicators have been assessed: FFR, which is new in this study, and the transformation index (TI, from Kriegler et al 2015). FFR is a simple, policy-relevant indicator that shows the relative reduction of fossil energy compared to the base year (2020). The FFR indicator was added to the transformation analysis because it represents a less abstract alternative to TI and relates directly to recent studies of fossil fuel phase-out and renewable integration (in the results section, we also compare FFR to TI to understand what drives transitions in models). TI shows the extent of transformation in the energy system (2 = maximum, 0 = none). Note that in table 1, the shares of energy sources in the primary energy system (S) are based on the following aggregated energy sources: fossil, non-bioenergy renewables, bioenergy and nuclear, since these are reported by all models, thus allowing for a complete comparison.
Table 2. Participating models, types and versions. Latest model version indicated in bold. For detailed model documentation see: www.iamcdocumentation.eu/ (IAMC wiki). See supplement (S1) for an overview of all scenarios and submissions by the different models.
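The sketch below illustrates both transformation indicators using the four aggregated carriers listed above. For TI we assume the sum of absolute changes in primary energy shares between a reference and the policy case, which is bounded between 0 (no change) and 2 (complete switch), matching the stated range; for FFR we assume one minus the ratio of benchmark-year to 2020 fossil primary energy. Function names and energy numbers are hypothetical.

```python
CARRIERS = ("fossil", "non_bio_renewables", "bioenergy", "nuclear")


def shares(primary_energy):
    """Shares of the four aggregated carriers in total primary energy."""
    total = sum(primary_energy[c] for c in CARRIERS)
    return {c: primary_energy[c] / total for c in CARRIERS}


def transformation_index(pe_reference, pe_policy):
    """Assumed form: sum of absolute share changes, bounded in [0, 2]."""
    s_ref, s_pol = shares(pe_reference), shares(pe_policy)
    return sum(abs(s_pol[c] - s_ref[c]) for c in CARRIERS)


def fossil_fuel_reduction(fossil_base_year, fossil_benchmark_year):
    """Assumed form: relative reduction of fossil primary energy versus 2020."""
    return 1.0 - fossil_benchmark_year / fossil_base_year


# Hypothetical primary energy mixes in EJ/yr (illustrative only).
reference = {"fossil": 400.0, "non_bio_renewables": 50.0, "bioenergy": 25.0, "nuclear": 25.0}
policy = {"fossil": 150.0, "non_bio_renewables": 250.0, "bioenergy": 50.0, "nuclear": 50.0}
print(round(transformation_index(reference, policy), 3))
print(round(fossil_fuel_reduction(450.0, 180.0), 3))
```

Because TI only looks at shares, two models with the same FFR can still differ in TI if one replaces fossil energy mainly by demand reduction and the other by non-fossil supply.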
In this study, we adopt a new indicator that describes the level of inertia (i.e. persistence of path dependency) in the models: IT. Path dependencies are particularly relevant for the energy system due to long-lived capital stocks, technological learning and other sources of inertia in the upscaling of new technologies, as well as behavioral inertia on the demand side. They are also highly policy-relevant in the context of delayed climate policy adoption and carbon lock-in, as analyzed in several scenario studies (Riahi et al 2015, Luderer et al 2018). The indicator captures inertia in response to the introduction of climate policy, a crucial characteristic of IAMs, and is based on a newly introduced diagnostic carbon price shock scenario. In our scenario set, the shock scenario follows baseline developments with zero carbon prices until 2040, followed by an instantaneous carbon price of 80 $/tCO2 in 2040, as in the default scenario, with an exponentially growing carbon price thereafter. For the shock scenarios, models with perfect foresight were instructed to disable anticipation of future carbon pricing. The difference between the shock scenario and the default scenario can be measured in terms of the 2040 ‘emissions gap’. After 2040, the shock scenarios and the corresponding early pricing scenarios can be expected to converge, since they are subject to the same carbon prices. However, during a transition period, the shock scenarios will continue to have higher emission levels than the corresponding early pricing scenarios due to the system’s inertia. The IT (in units of years) is defined as the ratio between the cumulative emission difference between the two scenarios after 2040 and the ‘emissions gap’ in the model year prior to 2040. For more information and a visualization, see the supplement (S2).
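The IT definition above can be sketched as follows. We assume that emissions are reported as annual rates on the model’s time grid, that the cumulative post-shock difference is integrated with the trapezoidal rule from 2040 onward, and that the gap is taken in the last model year before 2040; the function name and emission numbers are hypothetical.

```python
def inertia_timescale(years, e_shock, e_default, shock_year=2040):
    """Inertia timescale (years) from a carbon price shock experiment.

    Assumptions: emissions are annual rates (e.g. GtCO2/yr) on the model's
    time grid; the cumulative post-shock difference is integrated with the
    trapezoidal rule from shock_year onward; the 'emissions gap' is the
    difference in the last model year before shock_year.
    """
    diff = [s - d for s, d in zip(e_shock, e_default)]
    before = [i for i, y in enumerate(years) if y < shock_year]
    after = [i for i, y in enumerate(years) if y >= shock_year]
    if not before or len(after) < 2:
        raise ValueError("need model years on both sides of shock_year")
    gap = diff[before[-1]]
    cumulative = sum(
        0.5 * (diff[a] + diff[b]) * (years[b] - years[a])
        for a, b in zip(after, after[1:])
    )
    return cumulative / gap


# Hypothetical CO2 E&I emissions (GtCO2/yr) on a 5-year grid.
years = [2030, 2035, 2040, 2045, 2050, 2055, 2060]
e_default = [30.0, 28.0, 25.0, 20.0, 15.0, 12.0, 10.0]
e_shock = [39.0, 38.0, 33.0, 24.0, 17.0, 13.0, 10.0]
print(inertia_timescale(years, e_shock, e_default))
```

With these illustrative numbers, the post-2040 excess of 55 GtCO2 divided by the 10 GtCO2/yr gap in 2035 gives an IT of 5.5 years, i.e. the shock scenario behaves as if the pre-shock gap persisted at full strength for about five and a half years.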
The CAV is a dimensionless measure of the economic implications of emissions abatement at a given carbon price. It shows the ratio between the policy costs and the marginal abatement costs (MACs). For partial equilibrium (PE) models, this can be seen as an indicator of the shape of the (implicit) MAC curve. The closer this indicator is to 1, the more concave the MAC curve and the higher the projected policy costs. In other words, a low value indicates more mitigation potential at lower carbon prices. For general equilibrium (GE) models, macro-economic feedbacks are also factored in. Here, a value higher than 1 implies that these feedbacks are a dominant factor in the costs. We simplified the original indicator by looking at a benchmark year (2050) instead of discounting to a net present value. Note that for this indicator, we include all greenhouse gases represented by the models (this differs per model), since that corresponds with the model’s projected policy costs.
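To illustrate how the CAV reflects the shape of the MAC curve in a PE model, the sketch below assumes CAV = policy cost / (carbon price × abatement), with the policy cost taken as the area under the MAC curve. The function names, the 130 $/tCO2 price, the 35 GtCO2 abatement and the three curve shapes are hypothetical.

```python
import math


def cost_per_abatement_value(policy_cost, carbon_price, abatement):
    """Assumed form: CAV = policy cost / (carbon price * abatement).

    Dimensionless if cost, price and abatement use consistent units
    (e.g. billion $, $/tCO2 and GtCO2).
    """
    return policy_cost / (carbon_price * abatement)


def area_under_mac(mac, abatement, n=10_000):
    """Midpoint-rule integral of a MAC curve, price = mac(a), from 0 to abatement."""
    da = abatement / n
    return sum(mac((k + 0.5) * da) for k in range(n)) * da


price, abatement = 130.0, 35.0  # hypothetical 2050 values
mac_shapes = {
    "convex": lambda a: price * (a / abatement) ** 2,
    "linear": lambda a: price * a / abatement,
    "concave": lambda a: price * math.sqrt(a / abatement),
}
for shape, mac in mac_shapes.items():
    cav = cost_per_abatement_value(area_under_mac(mac, abatement), price, abatement)
    print(f"{shape:8s} CAV = {cav:.3f}")
```

A linear MAC curve gives a CAV of 0.5. A convex curve (cheap abatement first) gives a lower value, and a concave curve gives a value closer to 1, matching the interpretation above for PE models. In GE models, consumption losses can push the value above 1.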
Reported policy cost metrics also differ per model type. We used consumption loss compared to the baseline for all GE models and the area under the MAC curve for all PE models, except for PROMETHEUS and TIAM-Grantham, where the additional total energy system costs were applied. Although the metrics differ, they are comparable in the sense that they (at least) factor in first-order economic expenditures,