The process of parameter estimation targeting a chosen set of observations is an essential aspect of numerical modeling. This process is usually named tuning in the climate modeling community. In climate models, the variety and complexity of physical processes involved, and their interplay through a wide range of spatial and temporal scales, must be summarized in a series of approximate submodels. Most submodels depend on uncertain parameters. Tuning consists of adjusting the values of these parameters to bring the solution as a whole into line with aspects of the observed climate. Tuning is an essential aspect of climate modeling with its own scientific issues, which is probably not advertised enough outside the community of model developers. Optimization of climate models raises important questions about whether tuning methods a priori constrain the model results in unintended ways that would affect our confidence in climate projections. Here, we present the definition and rationale behind model tuning, review specific methodological aspects, and survey the diversity of tuning approaches used in current climate models. We also discuss the challenges and opportunities in applying so-called objective methods in climate model tuning. We discuss how tuning methodologies may affect fundamental results of climate models, such as climate sensitivity. The article concludes with a series of recommendations to make the process of climate model tuning more transparent.
We survey the rationale and diversity of approaches for tuning, a fundamental aspect of climate modeling, which should be more systematically documented and taken into account in multimodel analysis.
As is often the case in sciences that address complex systems, numerical models have become central in climate science (Edwards 2001). General circulation models of the atmosphere were originally developed for numerical weather forecasting (e.g., Phillips 1956). The coupling of global atmospheric and oceanic models began with Manabe and Bryan (1969) and came of age in the 1980s and 1990s. Global climate models or Earth system models (ESMs) are nowadays used extensively to study climate changes caused by anthropogenic and natural perturbations (Lynch 2008; Edwards 2010). The evaluation and improvement of these global models is the driver of much theoretical and observational research. Publications that analyze the simulations coordinated at an international level within the framework of the Coupled Model Intercomparison Project (CMIP) constitute a large part of the material synthesized in the Intergovernmental Panel on Climate Change (IPCC) Assessment Reports. Beyond their use for prediction and projection at meteorological to climatic time scales, global models play a key role in climate science. They are used to understand and assess the mechanisms at work, while accounting for the complexity of the climate system and for the spatial and temporal scales involved (Dalmedico 2001; Held 2005).
The development of a climate model is a long-term project. When releasing a new model or new version of a model, a series of submodels, sometimes developed or improved over years in separate teams, are combined and optimized together to produce a climate that matches some key aspects of the observed climate. While the fundamental physics of climate is generally well established, submodels or parameterizations are approximate, either because of numerical cost issues (limitations in grid resolution, acceleration of radiative transfer computation) or, more fundamentally, because they try to summarize complex and multiscale processes through an idealized and approximate representation. Each parameterization relies on a set of internal equations and often depends on parameters, the values of which are often poorly constrained by observations. The process of estimating these uncertain parameters in order to reduce the mismatch between specific observations and model results is usually referred to as tuning in the climate modeling community.
Climate model tuning is a complex process that is in some ways analogous to reaching harmony in music. Producing a good symphony or rock concert requires first a good composition and good musicians who work individually on their score. Then, when playing together, instruments must be tuned, which is a well-defined adjustment of wave frequencies that can be done with the help of electronic devices. But the orchestra harmony is reached also by adjusting to a common tempo as well as by subjective combinations of instruments, volume levels, or musicians’ interpretations, which will depend on the intention of the conductor or musicians. When gathering the various pieces of a model to simulate the global climate, there are also many scientific and technical issues, and tuning itself can be defined as an objective process of parameter estimation to fit a predefined set of observations, accounting for their uncertainty, that is, as a process that can be engineered. However, because of the complexity of the climate system and of the choices and approximations made in each submodel, and because of priorities defined in each climate center, there is also subjectivity in climate model tuning (Tebaldi and Knutti 2007) as well as substantial know-how held by a limited number of people with vast experience with a particular model. One goal of this paper is to make this knowledge more explicit.
Choices and compromises made during the tuning exercise may significantly affect model results and influence evaluations that measure a statistical distance between the simulated and observed climate. In theory, tuning should be taken into account in any evaluation, intercomparison, or interpretation of the model results. Although the need for parameter tuning was recognized in pioneering modeling work (e.g., Manabe and Wetherald 1975) and discussed as an important aspect in epistemological studies of climate modeling (Edwards 2001), the importance of tuning is probably not advertised as it should be. It is often ignored when discussing the performances of climate models in multimodel analyses. In fact, the tuning strategy was not even part of the required documentation of the CMIP phase 5 (CMIP5) simulations. In the best cases, the description of the tuning strategy was available in the reference publications of the modeling groups (Mauritsen et al. 2012; Golaz et al. 2013; Hourdin et al. 2013a,b; Schmidt et al. 2014). Why such a lack of transparency? This may be because tuning is often seen as an unavoidable but dirty part of climate modeling, more engineering than science, an act of tinkering that does not merit recording in the scientific literature. There may also be some concern that explaining that models are tuned may strengthen the arguments of those claiming to question the validity of climate change projections. Tuning may be seen indeed as an unspeakable way to compensate for model errors.
The purpose of this paper is to help make the process of model tuning more explicit and transparent. Tuning is an intrinsic and fundamental part of climate modeling that should be better documented and discussed as such in the scientific literature. Tuning can be described as an optimization step and follows a scientific approach. Tuning can provide important insights on climate mechanisms and model uncertainties. Some biases in climate models can be reduced or removed by tuning, while others remain stubbornly resistant. It is important to understand why if we want to improve models. Below, we present a definition of tuning, document current practices and methodologies, and address emerging issues. We conclude with recommendations on model tuning and its documentation.
DEFINITION OF CLIMATE MODEL TUNING.
Model tuning or calibration is neither a new concept nor specific to climate modeling. In statistical sciences, Fisher introduced three steps in the process of modeling (Fisher 1922; Burnham and Anderson 2002): (i) model formulation, (ii) parameter estimation, and (iii) estimation of uncertainty. This categorization applies also to the wider context of numerical modeling. It is conceptually useful to discriminate between model formulation and parameter estimation, even if this distinction is by no means clear-cut in climate model tuning, as explained below.
Climate model development is founded on well-understood physics combined with a number of heuristic process representations. The fluid motions in the atmosphere and ocean are resolved by the so-called dynamical core down to a grid spacing of typically 25–300 km for global models, based on numerical formulations of the equations of motion from fluid mechanics. Subgrid-scale turbulent and convective motions must be represented through approximate subgrid-scale parameterizations (Smagorinsky 1963; Arakawa and Schubert 1974; Edwards 2001). These subgrid-scale parameterizations include coupling with thermodynamics; radiation; continental hydrology; and, optionally, chemistry, aerosol microphysics, or biology.
Parameterizations are often based on a mixed physical, phenomenological, and statistical view. For example, the cloud fraction needed to represent the mean effect of a field of clouds on radiation may be related to the resolved humidity and temperature through an empirical relationship. But the same cloud fraction can also be obtained from a more elaborate description of processes governing cloud formation and evolution. For instance, for an ensemble of cumulus clouds within a horizontal grid cell, clouds can be represented with a single mean plume of warm and moist air rising from the surface (Tiedtke 1989; Jam et al. 2013) or with an ensemble of such plumes (Arakawa and Schubert 1974). Similar parameterizations are needed for many components not amenable to first-principle approaches at the grid scale of a global model, including boundary layers, surface hydrology, and ecosystem dynamics. Each parameterization, in turn, typically depends on one or more parameters whose numerical values are poorly constrained by first principles or observations at the grid scale of global models. Because parameterizations are approximate descriptions of unresolved processes, many processes can be represented in several different ways. The development of competing approaches to different processes is one of the most active areas of climate research. The diversity of possible approaches and parameter values is one of the main motivations for model intercomparison projects in which a strict protocol is shared by various modeling groups in order to better isolate the uncertainty in climate simulations that arises from the diversity of models (model uncertainty).
A model configuration is determined by two aspects: its complexity and resolution. For global climate models or ESMs, the configuration retained generally results from compromises between resolution, complexity, and length and number of simulations. Different modeling groups may have different priorities in terms of scientific questions and applications, thus making different judgments on how to best balance finite resources. The choice of complexity and resolution itself can be considered as tuning in a wide sense, since it is often motivated by the ability of the model to reproduce with some realism key aspects of the climate system.
Here, we focus on the classical definition of tuning that corresponds to parameter estimation in Fisher’s terminology. Once a model configuration is fixed, tuning consists of choosing parameter values in such a way that a certain measure of the deviation of the model output from selected observations or theory is minimized or reduced to an acceptable range. Defined this way, tuning is usually called calibration in other application areas of complex numerical models (Kennedy and O’Hagan 2001). Some climate modelers are reluctant to use this term, however, since they know that by adjusting parameters they also compensate, intentionally or not, for some (often unknown) deficiencies in the model formulation itself.
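The definition above can be made concrete with a minimal sketch. It assumes that the "measure of the deviation" is a root-mean-square of model-minus-observation differences, each normalized by a tolerance; this is one possible choice among many and is not prescribed by the text. The metric names, target values, and tolerances below are hypothetical and purely illustrative.

```python
from math import sqrt


def tuning_cost(simulated, observed, sigma):
    """Root-mean-square of normalized model-minus-observation deviations."""
    terms = [((simulated[k] - observed[k]) / sigma[k]) ** 2 for k in observed]
    return sqrt(sum(terms) / len(terms))


def within_acceptable_range(simulated, observed, sigma, n_sigma=2.0):
    """True if every metric lies within n_sigma tolerances of its target."""
    return all(
        abs(simulated[k] - observed[k]) <= n_sigma * sigma[k] for k in observed
    )


# Hypothetical targets and tolerances (illustrative values only).
observed = {"toa_net_flux_wm2": 0.5, "global_mean_t_k": 287.0}
sigma = {"toa_net_flux_wm2": 1.0, "global_mean_t_k": 0.5}

# Hypothetical output of one model configuration.
simulated = {"toa_net_flux_wm2": 1.5, "global_mean_t_k": 287.5}

cost = tuning_cost(simulated, observed, sigma)
ok = within_acceptable_range(simulated, observed, sigma)
print(cost, ok)
```

The two functions mirror the two options in the definition: the deviation can be minimized (`tuning_cost`) or merely brought into an acceptable range (`within_acceptable_range`). The choice of metrics and tolerances remains a matter of expert judgment.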
Parameter tuning itself occurs at various levels that correspond to stages of model development. An initial calibration may be performed during the development phase of a new parameterization, for instance, using a single-column version of the climate model. Although desirable in principle, this parameterization tuning is often difficult in practice because processes are strongly coupled to each other and to the large-scale dynamics. At the next stage, a number of parameterizations are tuned together when assembled into components: atmosphere, ocean, and continental surface. This component tuning is performed by using standalone components with boundary conditions that would otherwise be provided by other components. For example, an ocean model with imposed surface wind stress, inputs of freshwater, precipitation, and radiation might be tuned to get sea surface temperatures or meridional overturning circulation that match expectations. A system tuning is finally required to ensure consistency across the full climate system once components are coupled together.
COMMON PRACTICES AND TARGETS.
Tuning of coupled Earth system models generally follows a common practice but with targets and priorities that may vary from group to group. This was confirmed by a poll conducted in August–September 2014 (see sidebar on “How do modeling centers tune their models?” for results). Most of the major climate modeling groups (23 model centers) submitted answers to a questionnaire on why and how their models are tuned.
A survey was conducted in August–September 2014, polling 23 different modeling centers that develop coupled atmosphere and ocean models to find out how they tune models. Most centers had a number of people discuss the answers before submission (one answer per group). The full results can be found in the online supplemental information (http://dx.doi.org/10.1175/BAMS-D-15-00135.2); 22 of 23 groups reported adjusting model parameters to achieve desired properties such as radiation balance at the top of the atmosphere. Percentages are reported based on the fraction of respondents; 83% of centers use atmosphere and land only (fixed sea surface temperatures or a data ocean) to adjust parameters and 44% use single-column models, while 74% perform their adjustment with a preindustrial (1850) coupled atmosphere–ocean configuration and 39% use coupled present-day simulations. Many groups also adjust ocean (48%) and land (39%) model parameters using standalone configurations. In addition, 21% use historical twentieth-century simulations, and 17% use slab ocean models.
The goals of tuning are fairly uniform. Groups were asked about a wide variety of metrics (26 in all). About one-third (8 of 26) of the metrics were rated as decisive or very important by at least one-third (35%) of modeling centers. However, there was broad agreement on the decisive (most important) metrics: global net top-of-atmosphere flux (70%) and then global-mean surface temperature (26%). Based on these goals of tuning, there are a number of different parameterizations adjusted to achieve them. Since tuning is generally focused on the top-of-atmosphere and surface radiation balance, the most common properties adjusted are uncertain cloud properties and then properties that affect surface albedo; 29% adjusted every parameterization asked about occasionally or frequently. The most common parameterizations frequently adjusted are clouds in the atmosphere, including cloud microphysics (65%), convection (52%), and cloud fraction (52%). The most common occasionally adjusted parameters were snow (79%) and sea ice (57%) albedo, along with ocean mixing (57%), orographic drag (57%), and cloud optical properties (48%). Soil (43%) and vegetation (39%) properties were also adjusted. These adjustments are consistent with the view among respondents that atmospheric cloud physics and atmospheric convection were most likely to introduce biases in the model, with ocean physics and mixing third.
Finally, groups were asked whether different tuning practices were eligible (justified) on a five-point scale of disagree, somewhat disagree, neutral, somewhat agree, and agree. All groups agreed or somewhat agreed that tuning was justified; 91% thought that tuning global-mean temperature or the global radiation balance was justified (agreed or somewhat agreed). Given that these were groups attending a meeting on the subject, there is a self-selection bias. Using the same top two categories as registering agreement, the following were considered acceptable for tuning by over half the respondents: atmospheric circulation (74%), sea ice volume or extent (70%), and cloud radiative effects by regime and tuning for variability (both 52%).
With the increasing diversity in the applications of climate models, the number of potential targets for tuning increases. There are a variety of goals for specific problems, and different models may be optimized to perform better on a particular metric, related to specific goals, expertise, or cultural identity of a given modeling center. Groups more focused on the European climate may give more importance to the ocean heat transport in the North Atlantic, whereas others may be more concerned with tropical climate and convection. Some groups may put more weight on metrics that measure the skill to reproduce the present-day mean climatology or observed modes of variability, while others may privilege process-oriented metrics targeting processes that are believed to dominate the climate change response to anthropogenic forcing.
There is, however, a dominant shared target for coupled climate models: the climate system should reach a mean equilibrium temperature close to observations when energy received from the sun is close to its real value (≃340 W m−2). This energy source will be balanced by the energy lost to space by reflected sunlight and thermal infrared radiation if the model conserves energy numerically (which cannot always be strictly imposed). We know indeed that the system is nearly in balance but for the ocean heat uptake, believed to be about 0.5 W m−2 in our warming climate, a value much smaller than the model and observational uncertainties. This provides a strong, large-scale constraint.
A common practice to fulfill this constraint is to adjust the top-of-atmosphere or surface energy balance in atmosphere-only simulations exposed to observed sea surface temperatures (component tuning) and check if the temperature obtained in coupled models is realistic. This energy balance tuning is crucial since a change of 1 W m−2 in the global energy balance typically produces a change of about 0.5–1.5 K in the global-mean surface temperature in coupled simulations, depending on the sensitivity of the given model.
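The stakes of energy balance tuning can be illustrated with a short calculation. This is a sketch only: it assumes that the temperature response scales linearly with the flux change, using the 0.5–1.5 K per W m−2 range quoted above, and the 2 W m−2 residual imbalance is a hypothetical value chosen for illustration.

```python
# Sensitivity range quoted in the text: about 0.5 to 1.5 K per W m-2.
SENSITIVITY_LOW_K_PER_WM2 = 0.5
SENSITIVITY_HIGH_K_PER_WM2 = 1.5


def temperature_shift_range(flux_change_wm2):
    """Return (low, high) bounds in K for a flux change, assuming linear scaling."""
    a = SENSITIVITY_LOW_K_PER_WM2 * flux_change_wm2
    b = SENSITIVITY_HIGH_K_PER_WM2 * flux_change_wm2
    return (min(a, b), max(a, b))


# Hypothetical residual imbalance left after component tuning.
low, high = temperature_shift_range(2.0)
print(f"Expected global-mean temperature shift: {low:.1f} to {high:.1f} K")
```

Under these assumptions, an error of only a few watts per square meter in the atmosphere-only energy balance translates into a shift of several kelvins in the coupled model, which is why this balance is usually adjusted first.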
In general, the parameters are given some a priori values and ideally a range around this value. This information can come from theory, from a back-of-the-envelope estimate, from numerical experiments (tuning an eddy diffusion coefficient from explicit simulations of the turbulent process), or from observations (a mean effective cloud droplet for instance). Note that many internal parameters are not directly observable. Given this information, a common practice is to adjust the most uncertain parameters that significantly affect key climate metrics. Indeed, not all parameters are known with the same accuracy. There is fair consensus (see poll) that the most uncertain parameters that affect the atmospheric radiation are those entering in the parameterization of clouds and of the albedo of Earth’s surface. Clouds exert a large net cooling effect (about −20 W m−2), but this effect is uncertain to within several watts per square meter (Loeb et al. 2009). A 1 W m−2 change in cloud radiative effects is only a 5% variation of the net cloud cooling effect and 2% of the solar (or shortwave) effect, well below observational and model uncertainty (L’Ecuyer et al. 2015).
Most tuning parameters are specific to submodel (parameterization) choices. Parameters controlling mixing of convective clouds with the environment will depend on the specific description of the convective vertical transport, parameters controlling the size distribution of cloud droplets will depend on the sophistication of the microphysics, and so on. As an example, Fig. 1, reproduced from Mauritsen et al. (2012, their Fig. 1), illustrates the various parameters that are used for tuning in one particular model.
Some parameterizations and associated tuning parameters are, however, shared by several models. We show in Fig. 2 how a scaling factor on the ice crystal fall velocity (process h in Fig. 1) is used to constrain both the global shortwave and longwave radiation to match the observed value of 240 ± 4 W m−2 in climate models that share the same formulation for the ice crystal fall velocity (Heymsfield and Donner 1990). A larger fall velocity systematically reduces the amount of ice clouds and thus increases both the absorbed shortwave radiation (reduced planetary albedo) and outgoing longwave radiation (reduced greenhouse effect). Beyond global values, tuning is sometimes applied to spatial variations of the radiative fluxes like the latitudinal dependency that drives the general circulation or land–sea contrasts that drive monsoon circulations. Figures 2b and 2c illustrate for two models how the same factor on ice crystal fall velocity affects the latitudinal distribution of absorbed solar radiation and outgoing longwave radiation.
After clouds, the most common tuning parameters are those entering in the parameterizations of snow and sea ice albedo, ocean mixing, and orographic drag. Soil and vegetation properties are also sometimes used for tuning.
Because of the uncertainties in observations and in the model formulation, the possible parameter choices are numerous and will differ from one modeling group to another. These choices should be more often considered in model intercomparison studies. The diversity of tuning choices reflects the state of our current climate understanding, observation, and modeling. It is vital that this diversity be maintained. It is, however, important that groups better communicate their tuning strategy. In particular, when comparing models on a given metric, either for model assessment or for understanding of climate mechanisms, it is essential to know whether some models used this metric as tuning target.
APPLYING OBJECTIVE METHODS.
There exists a considerable literature on parametric tuning using objective approaches developed in the statistics, engineering, and computer science communities. By objective methods, one means that a well-founded mathematical or statistical framework is used to perform the model tuning, for instance, by defining and minimizing a cost function or by introducing a Bayesian formulation of the calibration problem (Kennedy and O’Hagan 2001). The use of objective methods does not, however, in any way obviate the requirement for subjective judgment concerning the priorities and targets of the tuning process. An objective algorithm merely identifies those parts of the procedure that require the subjective scientific expertise of the modeler. It requires that the modeler formulate this judgment in terms of numbers or mathematical formulas, which can sometimes be quite demanding but also contributes to making the process of tuning more explicit and reproducible. Objective methods then provide an automatic tuning procedure based on those judgments.
Broadly speaking, objective methods fall into one of two categories. The first involves fast optimization of some cost function measuring the distance of model simulations to a small collection of observations. Applications of such methods in climate science include Bellprat et al. (2012), Yang et al. (2013), Zou et al. (2014), and Zhang et al. (2015). The second class of methods represents a Bayesian approach and is now part of a class of methods under the banner of uncertainty quantification (UQ; Kennedy and O’Hagan 2001). UQ, for parameter tuning, aims to provide uncertainty for the parameters using a statistical model relating the climate model to observations that explicitly quantifies the key sources of uncertainty present in the problem: observational uncertainty, initial condition uncertainty (internal variability), and structural uncertainty (missing or incorrect physics). Applications of these methods to climate models include Rougier (2007), Jackson et al. (2008), Edwards et al. (2011), and Williamson et al. (2013). UQ methods, for example, were used to provide the U.K. Climate Projections (Murphy et al. 2009; Sexton et al. 2012).
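The first, optimization-based class of methods can be illustrated with a toy example. The sketch below assumes a hypothetical two-parameter linear stand-in for a climate model (the parameter names and all coefficients are invented for illustration) and searches a regular grid for the parameter pair that brings both global fluxes closest to the 240 ± 4 W m−2 value quoted earlier. Real applications use far more elaborate search algorithms and, of course, a real model.

```python
from itertools import product


def toy_model(fall_speed_factor, entrainment):
    """Hypothetical stand-in for a climate model.

    Returns (absorbed shortwave, outgoing longwave) in W m-2.
    The linear coefficients are invented for illustration.
    """
    absorbed_sw = 238.0 + 4.0 * fall_speed_factor - 2.0 * entrainment
    outgoing_lw = 236.0 + 3.0 * fall_speed_factor + 1.0 * entrainment
    return absorbed_sw, outgoing_lw


def cost(params, targets=(240.0, 240.0), sigma=(4.0, 4.0)):
    """Sum of squared deviations from the targets, normalized by tolerances."""
    outputs = toy_model(*params)
    return sum(((m - t) / s) ** 2 for m, t, s in zip(outputs, targets, sigma))


# Regular grid over both parameters, perturbed simultaneously.
fall_speed_grid = [0.5 + 0.25 * i for i in range(7)]   # 0.5 ... 2.0
entrainment_grid = [0.25 * i for i in range(9)]        # 0.0 ... 2.0

best = min(product(fall_speed_grid, entrainment_grid), key=cost)
print(best, cost(best))
```

An exhaustive grid is used here only because the toy model is free to evaluate. With an actual climate model each evaluation is a costly simulation, which is the main practical obstacle discussed below.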
Both classes of objective methods (optimization and UQ) share advantages over more arbitrary trial-and-error approaches that focus on tuning only one or two parameters at a time. For example, by perturbing multiple parameters simultaneously and systematically, automatic methods can overcome concerns that a local optimum for one objective may not be a good solution for other objectives and may not even be a global optimum for the tuning metric (Qian et al. 2015; Williamson et al. 2015).
Both classes of methods also share some of the same challenges. The main challenge is the computational cost of running the climate model with sufficient parameter choices to explore the parameter space. For high-resolution climate models (or even their components), available supercomputing power and the time available between tuning cycles (typically on the order of one to a few years between two model releases) limit even the best-equipped institutions.
To overcome these computational issues, statistical emulators (also called metamodels) can be used. Developed by statisticians since the late 1980s (Sacks et al. 1989; Currin et al. 1991; Haylock and O’Hagan 1996), emulators use small training ensembles to train statistical models that can predict the climate model response very quickly (Neelin et al. 2010), reporting a measure of uncertainty (typically offering a full probability distribution for the climate model at any choice of the parameters). The emulator uncertainty must be included in Bayesian UQ methods for parameter tuning, though it is ignored in some applications of optimization methods with the emulator mean function used directly.
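The idea of an emulator can be sketched in a few lines. The example below is a minimal illustration, assuming a one-parameter problem and a Gaussian process with a squared-exponential covariance, which is one common choice and not necessarily the one used in the cited studies. The function `expensive_model`, the kernel variance, and the length scale are hypothetical stand-ins; an operational emulator would estimate such hyperparameters from the training ensemble.

```python
from math import exp, sqrt


def expensive_model(x):
    """Hypothetical stand-in for a costly climate-model run (one metric, W m-2)."""
    return 240.0 + 3.0 * (x - 1.0) - 2.0 * (x - 1.0) ** 2


def kernel(a, b, variance=25.0, length=0.7):
    """Squared-exponential covariance; hyperparameters are assumed, not fitted."""
    return variance * exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [matrix[i][:] + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def emulate(x_new, x_train, y_train, jitter=1e-8):
    """Return the emulator mean and standard deviation at x_new."""
    mean_y = sum(y_train) / len(y_train)
    cov = [
        [kernel(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(x_train)]
        for i, a in enumerate(x_train)
    ]
    k_star = [kernel(x_new, a) for a in x_train]
    alpha = solve(cov, [y - mean_y for y in y_train])
    mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    weights = solve(cov, k_star)
    var = kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, weights))
    return mean, sqrt(max(var, 0.0))


# Small training ensemble: five runs of the "expensive" model.
x_train = [0.0, 0.5, 1.0, 1.5, 2.0]
y_train = [expensive_model(x) for x in x_train]

mean_at_train, std_at_train = emulate(1.0, x_train, y_train)
mean_between, std_between = emulate(1.25, x_train, y_train)
print(mean_at_train, std_at_train)
print(mean_between, std_between)
```

The emulator reproduces the training runs almost exactly and reports a larger uncertainty between them. It is this uncertainty that Bayesian UQ methods carry through the tuning, whereas optimization methods that use only the emulator mean discard it.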
For high-resolution models and models with long spinup time, running the model enough to build an emulator represents a huge challenge. Ensembles of shorter simulations to replace the traditional, serial-in-time, long-term climatology simulations have been proposed (Wan et al. 2014), and the UQ literature has long proposed and demonstrated the success of linked models of different resolution to build emulators. For example, Williamson et al. (2012) built an emulator for the Hadley Centre Coupled Model, version 3 (HadCM3), a CMIP5 model, using only 16 integrations of that model and a large ensemble of its low-resolution version, the Fast Met Office/U.K. Universities Simulator (FAMOUS). This is an active area of research in UQ.
A principal challenge for automatic tuning methods is that tuning to a handful of metrics may risk achieving improved performance in those metrics at the expense of unphysical behavior in metrics or processes that were not used in tuning, that is, we get some things “right for the wrong reasons.” This problem, known as overfitting or overtuning, will arise as soon as a minimization or parameter selection is done that does not properly account for the observation and model structural uncertainties. It will also arise when tuning to partial observations (i.e., not tuning the whole state vector of the climate model) or overfitting data that are partly simply natural variability (Notz 2015). Then tuning may be seen as an error compensation process rather than as model calibration. Overtuning can also occur when tuning by hand, but blind trust in an automatic tool may be more risky in that it prevents us from exercising the part of the expert judgment that cannot easily be translated into objective functions or expressed mathematically as uncertainties.
Overtuning is a real concern and the raison d’être for Bayesian UQ methods. However, because the key sources of uncertainty in the tuning problem (observation uncertainty and structural error) are so poorly understood and difficult to quantify, automatic tuning has a long way to go before it is adopted routinely by the major modeling centers for CMIP integrations. A class of UQ methods that explicitly avoids overtuning, called history matching, has recently been proposed for the climate model tuning community (Williamson et al. 2015). These methods avoid overtuning by changing the problem from one of searching for a single best value of the parameters to one of looking for unacceptable parameter values and iteratively ruling out the corresponding regions of the parameter space.
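A single iteration of history matching can be sketched as follows. The example assumes the implausibility measure commonly used in the history matching literature, namely the distance between the observation and the emulator mean divided by the square root of the summed emulator, observational, and structural variances, together with the conventional cutoff of 3. The toy emulator, the variances, and the candidate values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def implausibility(obs, emu_mean, emu_var, obs_var, struct_var):
    """Standardized distance between an observation and the emulator prediction."""
    return abs(obs - emu_mean) / sqrt(emu_var + obs_var + struct_var)


def toy_emulator(x):
    """Hypothetical emulator: (mean, variance) of a flux metric in W m-2."""
    return 230.0 + 10.0 * x, 0.25


# Hypothetical observation and uncertainty budget (variances in W2 m-4).
obs, obs_var, struct_var = 240.0, 1.0, 1.0
threshold = 3.0  # conventional cutoff, an assumption of this sketch

candidates = [0.25 * i for i in range(9)]  # parameter values 0.0 ... 2.0

not_ruled_out = []
for x in candidates:
    emu_mean, emu_var = toy_emulator(x)
    if implausibility(obs, emu_mean, emu_var, obs_var, struct_var) <= threshold:
        not_ruled_out.append(x)

print(not_ruled_out)
```

The outcome is a set of parameter values that cannot yet be ruled out, rather than a single best value. In practice the procedure is repeated, with new model runs and a refined emulator inside the retained region at each iteration.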
TUNING AND MODEL IMPROVEMENT.
Although tuning is an efficient way to reduce the distance between model and selected observations, it can also risk masking fundamental problems and the need for model improvements.
There is evidence that a number of model errors are structural in nature and arise specifically from the approximations in key parameterizations as well as their interactions. For example, some models systematically underestimate rainfall over monsoon regions, whereas others will do the opposite. Other biases are systematic across models, like the presence of a persistent double Pacific intertropical convergence zone (ITCZ) on both sides of the equator or warm biases over the eastern tropical oceans. Those model biases are indeed often resistant to model tuning. Tuning a model to improve its performance on a specific target also often degrades performance on other metrics. For example, tuning a model to improve the intraseasonal variability of precipitation in the tropics often comes at the cost of increased biases in the mean state (Kim et al. 2012).
Introduction of a new parameterization or improvement also often decreases the model skill on certain measures. The preexisting version of a model is generally optimized by both tuning uncertain parameters and selecting model combinations giving acceptable results, probably inducing compensating errors (overtuning). Improving one part of the model may then make the skill relative to observations worse, even though it has a better formulation. The stronger the previous tuning, the more difficult it will be to demonstrate a positive impact from the model improvement and to obtain an acceptable retuning. In that sense, tuning (in case of overtuning) may even slow down the process of model improvement by preventing the incorporation of new and original ideas. This difficulty has been known for decades in operational numerical weather prediction centers and could be overcome by not overweighting climate performance metrics (the ones that matter for the end users or for impact models) with respect to process-oriented ones. Process-oriented metrics are intended to help relate large-scale biases to the misrepresentation of specific subgrid-scale processes. Process-oriented metrics include, for example, compositing cloud or precipitation characteristics by dynamical regimes (Bony et al. 2004), compositing relative humidity profiles based on precipitation percentiles to assess the sensitivity of convection schemes to relative humidity (Kim et al. 2014), or evaluating simulated cloud microphysical properties (and their covariability) directly from satellite measurements (Suzuki et al. 2013).
On the other hand, tuning may highlight where further model improvement is needed. If parameter values needed to satisfy a given metric are outside the acceptable range, or if different values are needed for different regions or climate regimes, developers may consider revisiting the formulation of the parameterization or develop new ones. Then, the tuning process can be pushed back to a deeper level inside the model while increasing the physical realism of the model.
For clouds and convection, parameterization development is often performed using single-column versions of the global model compared to explicit high-resolution simulations of the processes that are parameterized, following a strategy defined 20 years ago (see, e.g., Ayotte et al. 1996; Liu et al. 2001). The explicit simulation gives access to variables hardly accessible by observation (like 3D fields of temperature and humidity or vertical velocities) but also to estimation of parameters that have no observational counterpart (like entrainment and detrainment rates between a mean bulk plume and its environment or a mean fall velocity for ice crystals at the model grid scale). Such parameters can be derived by sampling and characterizing the equivalent of the parameterized structures in the explicit simulations, as done, for example, by Couvreux et al. (2010), to derive mixing rates between a mean bulk plume and its environment. The parameterization development process can thus help constrain some parameters but also propose physically based submodels for some others.
One way to make the reduction of model large-scale biases and the parameterization development processes more “in tune” is to derive an acceptable range of parameter values, instead of a single value, from the aforementioned process studies and to use this range when tuning global simulations. To achieve this goal, UQ methods could be applied to the single-column model using explicit process simulations as a reference. It is important that the representation of turbulence, microphysics, and radiation continue to be improved in explicit high-resolution simulations, so that the parameterization can be evaluated not only in terms of subgrid-scale dynamics (as usually done so far) but also in terms of the radiative effect of clouds.
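The idea of retaining a range of parameter values rather than a single optimum can be sketched with a toy example. Everything below is a hypothetical stand-in: the function `toy_single_column_metric`, the reference value, and the tolerance are invented for illustration and do not come from any actual single-column model or explicit simulation.

```python
# Toy illustration only: the "single-column model" and the reference value are
# hypothetical stand-ins, not taken from any real model or explicit simulation.

def toy_single_column_metric(entrainment_rate):
    """Hypothetical metric (say, a cloud-top height in km) versus one parameter."""
    return 3.0 - 1.5 * entrainment_rate


def acceptable_range(candidates, reference, tolerance):
    """Return (min, max) of the candidate values whose metric lies within
    `tolerance` of the reference, or None if no candidate qualifies."""
    kept = [p for p in candidates
            if abs(toy_single_column_metric(p) - reference) <= tolerance]
    return (min(kept), max(kept)) if kept else None


# Candidate parameter values 0.0, 0.1, ..., 2.0 (arbitrary units).
candidates = [i / 10 for i in range(21)]
reference_metric = 1.5   # assumed value diagnosed from an explicit simulation
tolerance = 0.35         # assumed combined reference and structural uncertainty

param_range = acceptable_range(candidates, reference_metric, tolerance)
print(param_range)
```

In such a workflow, the resulting interval, rather than its midpoint, would be what is handed to the global tuning stage; an empty result would hint at a structural problem in the parameterization rather than a tuning problem.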
Another emerging approach consists of using initialized or nudged simulations (Zhang et al. 2014) in the tuning process. In nudged simulations, the model is forced to follow the observed trajectory by relaxing winds and, optionally, temperature and humidity toward meteorological analyses, with a time constant of typically a few hours. With initialized or nudged simulations, the simulated and observed meteorology follow the same trajectory and the comparison with observations can be done on a day-by-day basis. Wind-only nudging allows separation of parameterization tuning for a given meteorological situation (as is done in 1D mode) from that of the coupling of parameterization with large-scale dynamics. Nudging with short enough time constants (typically of a few hours) removes the chaotic nature of the atmospheric large-scale circulation and slow feedbacks of that circulation on fast processes (such as clouds). Nudged or initialized simulations may also help accelerate tuning for high-resolution climate models.
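The relaxation described above is commonly written as an extra tendency term, −(x − x_analysis)/τ, added to the model equations. The following is a minimal sketch under that assumption, using an explicit Euler step and a single scalar; the function name `nudge_step` and all numbers are hypothetical and not taken from any particular model.

```python
def nudge_step(value, tendency, analysis, dt, tau):
    """Advance one explicit Euler step with Newtonian relaxation toward an analysis.

    value, analysis : model and analysis values of the nudged field (e.g., a wind)
    tendency        : model tendency from dynamics and physics (units per second)
    dt, tau         : time step and relaxation time constant (seconds)
    """
    return value + dt * (tendency - (value - analysis) / tau)


# Hypothetical numbers: a wind of 12 m/s relaxed toward an analysis of 10 m/s,
# with no other tendency, a 30-min time step, and a 6-h time constant.
dt, tau = 1800.0, 6 * 3600.0
u, u_analysis = 12.0, 10.0
for _ in range(48):  # 48 steps of 30 min = 24 h
    u = nudge_step(u, 0.0, u_analysis, dt, tau)
print(round(u - u_analysis, 3))
```

With a time constant of a few hours, the departure from the analysis decays within about a day, which is why the simulated meteorology can then be compared with observations day by day.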
Whatever the approach, there is a need for relying more on observational studies at the process scale to tune the radiative budget in a more physical way. Progress will be made by further incorporating model tuning as an uncertainty analysis into the parameterization development process.
TUNING TO TWENTIETH-CENTURY WARMING?
The increase of about 1 K of the global-mean temperature observed from the beginning of the industrial era, hereafter twentieth-century warming, is a de facto litmus test for climate models (Mauritsen et al. 2012). However, as a test of model quality, it is not without issues because the desired result is known to model developers and therefore becomes a potential target of the development.
The amplitude of the twentieth-century warming depends primarily on the magnitude of the radiative forcing, the climate sensitivity, and the efficiency of ocean heat uptake. By linearizing about a basic stationary climatic state, the global-mean temperature change for a gradually increasing forcing can be approximated as

$$\Delta T \approx \frac{F}{\kappa - \lambda},$$

where T denotes global-mean surface temperature, F is an imposed radiative forcing, κ is the deep-ocean heat uptake efficiency, and λ is the feedback parameter that is inversely proportional to equilibrium climate sensitivity (ECS; ECS ≈ −F/λ). Climate models have values of λ that range from −0.6 to −1.8 W m−2 K−1 and κ ranges from approximately 0.5 to 1.2 W m−2 K−1. On average, in models the denominator (κ − λ) is about 2 W m−2 K−1, and in the year 2003, the forcing is around 1.7 W m−2 (Forster et al. 2013).
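As a worked illustration of these numbers, the short sketch below evaluates the approximation ΔT ≈ F/(κ − λ) implied by the quoted denominator. The split κ = 1.0, λ = −1.0 used for the "typical" case is an assumption chosen only so that κ − λ = 2 W m−2 K−1, and the two extreme combinations are illustrative bounds, not actual models.

```python
def transient_warming(forcing, kappa, lam):
    """Approximate global-mean warming (K) as forcing / (kappa - lam).

    forcing : radiative forcing F (W m-2)
    kappa   : deep-ocean heat uptake efficiency (W m-2 K-1), positive
    lam     : feedback parameter (W m-2 K-1), negative for a stable climate
    """
    return forcing / (kappa - lam)


F_2003 = 1.7  # W m-2, forcing quoted in the text for the year 2003

# Assumed split giving the quoted average denominator of about 2 W m-2 K-1.
typical = transient_warming(F_2003, kappa=1.0, lam=-1.0)

# Extremes of the quoted ranges, combined here only for illustration;
# real models do not necessarily sample these combinations.
weakest = transient_warming(F_2003, kappa=1.2, lam=-1.8)
strongest = transient_warming(F_2003, kappa=0.5, lam=-0.6)

print(f"typical: {typical:.2f} K, range: {weakest:.2f} to {strongest:.2f} K")
```

The typical case gives 1.7/2 = 0.85 K, while the illustrative extremes differ by almost a factor of 3, which shows how strongly the simulated warming depends on the pair (κ, λ) for a given forcing.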
The often-deployed paradigm of climate change projection is that climate models are developed using theory and present-day observations, whereas ECS is an emergent property of the model and the matching of the twentieth-century warming constitutes an a posteriori model evaluation. Some modeling groups claim not to tune their models against twentieth-century warming; however, even for model developers, it is difficult to ensure that this is absolutely true in practice because of the complexity and historical dimension of model development.
The reality of this paradigm is questioned by the findings of Kiehl (2007), who discovered an anticorrelation between the total radiative forcing and climate sensitivity in a model ensemble; high-sensitivity models were found to have a smaller total forcing and low-sensitivity models were found to have a larger forcing, yielding less cross-ensemble variation of historical warming than would otherwise be expected. Even if alternate explanations have been proposed and even if the results were not so straightforward for CMIP5 (cf. Forster et al. 2013), this could suggest that some models may have been inadvertently or intentionally tuned to the twentieth-century warming.
There is a broad spectrum of methods to improve the model match to twentieth-century warming, ranging from simply choosing to no longer modify the value of a sensitive parameter when a match is already good for a given model (Mauritsen et al. 2012), through selecting physical parameterizations that improve the match, to explicitly tuning either forcing or feedback, both of which are uncertain and depend critically on tunable parameters (Murphy et al. 2004; Golaz et al. 2013). Model selection could, for instance, consist of choosing to include or leave out new processes, such as aerosol–cloud interactions, to help the model better match the historical warming or choosing to work on or replace a parameterization that is suspected of causing a perceived unrealistically low or high forcing or climate sensitivity.
An illustration of twentieth-century tuning with the GFDL-CM3 model is shown in Fig. 3. The model (green) produces a relatively weak warming over the twentieth century due to a strong cooling effect from aerosol–cloud interactions. Sensitivity tests, which were performed after the model was frozen, showed that it is possible to reduce this effect and thereby obtain a more realistic warming. However, this was achieved by lowering the threshold size for the conversion of cloud droplets to rain to values smaller than supported by observations (Golaz et al. 2013; Suzuki et al. 2013, and references therein).
Adjusting the twentieth-century warming would in principle require a series of multicentury simulations with the coupled ocean–atmosphere model because of the long spinup of the ocean state required before starting transient twentieth-century simulations. However, it has long been known that short atmospheric simulations can be used to estimate either adjusted forcing when forced with perturbed atmospheric composition (Hansen et al. 2005) or ECS when forced with perturbed sea surface temperature (Cess et al. 1989; Gettelman et al. 2012). Thereby, it is possible to target specific values of F and λ thought to provide a good match to historical warming based on experience with previous model versions.
Any ECS tuning would need to take into account three main sources of uncertainties. First, as usual, the uncertainty of the observation of the global-mean surface temperature should not be forgotten even if it is believed today to be much smaller than the intermodel dispersion. Second, the radiative forcing F itself is uncertain. It is composed of a fairly well-known greenhouse gas forcing that is partly compensated by an uncertain aerosol forcing and modified by a series of other less important forcing agents. Tuning of the twentieth century could, for instance, be obtained with an overly large ECS balancing an overly strong aerosol radiative forcing. In such a case, and because the effect of greenhouse gases will dominate in the future, this would result in an overestimate of future global warming. The third important source of uncertainty comes from the internal climate variability, which can cause variations of typically ±0.1 K in centennial warming among realizations with different initial conditions; since the observed record represents only one such realization, a model need not be closer than this to match the target. Trying to match the twentieth-century global warming without accounting for these sources of uncertainty would inevitably lead to overtuning.
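The compensation between sensitivity and aerosol forcing can be made concrete with the same linear approximation, ΔT ≈ F/(κ − λ). The sketch below is purely illustrative: the values of κ, the greenhouse gas forcings, the warming target, and the two feedback parameters are all assumptions, and holding the aerosol forcing fixed in the future is a deliberate simplification.

```python
KAPPA = 1.0       # assumed ocean heat uptake efficiency (W m-2 K-1)
GHG_HIST = 2.7    # assumed historical greenhouse gas forcing (W m-2)
GHG_FUTURE = 5.0  # assumed future greenhouse gas forcing (W m-2)
TARGET = 0.85     # assumed historical warming to be matched (K)


def warming(forcing, lam, kappa=KAPPA):
    """Linear approximation of global-mean warming (K)."""
    return forcing / (kappa - lam)


def aerosol_forcing_matching_target(lam, kappa=KAPPA):
    """Aerosol forcing (W m-2) for which the historical warming equals TARGET."""
    return TARGET * (kappa - lam) - GHG_HIST


results = {}
for name, lam in {"low sensitivity": -1.4, "high sensitivity": -0.6}.items():
    aerosol = aerosol_forcing_matching_target(lam)
    historical = warming(GHG_HIST + aerosol, lam)
    future = warming(GHG_FUTURE + aerosol, lam)  # aerosol forcing held fixed
    results[name] = (aerosol, historical, future)
    print(f"{name}: aerosol {aerosol:+.2f} W m-2, "
          f"historical {historical:.2f} K, future {future:.2f} K")
```

Both hypothetical configurations reproduce the same historical warming, but the high-sensitivity one needs a stronger (more negative) aerosol forcing and warms more once greenhouse gases dominate. Given the internal variability of about ±0.1 K noted above, matching the target more closely than that would not discriminate between them.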
The question of whether the twentieth-century warming should be considered a target of model development or an emergent property is polarizing the climate modeling community, with 35% of modelers rating twentieth-century warming as very important to decisive, whereas 30% would not consider it at all during development. Some view the temperature record as an independent evaluation dataset not to be used, while others view it as a valuable observational constraint on the model development. Likewise, opinions diverge as to which measures, either forcing or ECS, are legitimate means for improving the model match to observed warming. The question of developing toward the twentieth-century warming therefore is an area of vigorous debate within the community.
However, the capability to control the modeled twentieth-century warming also offers new opportunities to explore the bounds of modeled climate sensitivity (Golaz et al. 2013); by combining altered ECS and aerosol forcing, it is technically possible to construct outlier low- and high-sensitivity models that match the observed warming. Evaluating such models with other observed aspects, such as midcentury warming or modes of variability, and running them in prehistoric climates, such as the Last Glacial Maximum or the Pliocene, could potentially allow us to rule out extreme values of ECS and/or aerosol forcing.
The fact that some models are explicitly, or implicitly, tuned to better match the twentieth-century warming, while others may not be, clearly complicates the interpretation of the results of combined model ensembles such as CMIP. The diversity of approaches is unavoidable as individual modeling centers pursue their model development to seek their specific scientific goals. It is, however, essential that decisions affecting forcing or feedback made during model development be transparently documented.
CONCLUSIONS, IMPLICATIONS, AND RECOMMENDATIONS.
There was a debate among authors on the idea of using the word art in the title of the paper. Tuning is seen by some modelers more as a pure engineering calibration exercise, which consists of applying objective or automatic tools based on purely scientific considerations. Others see it as an experienced craftsmanship or as an art: “a skill that is attained by study, practice, or observation.” As in art, there is also some diversity and subjectivity in the tuning process because of the complexity of the climate system and because of the choices made among the equally possible representations of the system. It is essential to maintain this diversity in model approaches and tuning because of the approximate nature of models, the lack of observational counterparts for many internal model parameters, and the importance of climate change predictions, for which no observations exist.
This subjectivity does not contradict the fundamental and twofold scientific nature of climate tuning. On the one hand, the tuning process involves many scientific issues like the physical understanding of the phenomena to be modeled, algorithmic formulation of physical laws, mathematical basis of optimization, and the statistics of internal variability. On the other hand, the understanding of climate mechanisms can be inspired by the act of tuning, which is based intrinsically on a large exploration of possible climates through sensitivity experiments. It allows us to identify and understand the role of the various modeled processes and feedbacks involved. Tuning may also help identify model structural errors, for instance, if the optimal value of a parameter falls outside the acceptable range or if different values of the same parameter are optimal for different situations. In this sense, tuning is a form of uncertainty analysis.
Because tuning will affect the behavior of a climate model, and the confidence that can be given to a particular use of that model, it is important to document the tuning portion of the model development process. We recommend that for the upcoming CMIP6 exercise, modeling groups provide a specific document on their tuning strategy and targets that would be referenced when accessing the dataset. We recommend distinguishing three levels in the tuning process: individual parameterization tuning, component tuning, and climate system tuning. At the component level, emphasis should be put on the relative weight given to climate performance metrics versus process-oriented ones and on the possible conflicts with parameterization-level tuning. For the climate system tuning, particular emphasis should be put on the way energy balance was obtained in the full system: was it done by tuning the various components independently or was some final tuning needed? The degree to which the observed trend of the twentieth century was used for tuning should also be described. Comparisons against observations and adjustment of forcing or feedback processes should be noted. At each step, any occasion where a team had to struggle with a parameter value or push it to its limits to solve a particular model deficiency should be emphasized. This information may well be scientifically valuable as a record of the uncertainty of a model formulation.
It would also be valuable to produce and document two or more versions of the same model that would differ only by their tuning. One can imagine changing a parameter that is known to affect the sensitivity, keeping both this parameter and the ECS in the anticipated acceptable range and retuning the model otherwise with the same strategy toward the same targets.
Finally, development of new methodologies is strongly encouraged. Some of the most promising ideas include 1) the systematic use of the single-column versus explicit-simulation approach for parameterization tuning, 2) the use of process-oriented metrics, and 3) the use of nudged simulations to fill the gap between parameterization and component tuning. The systematic use of objective methods at the process level to estimate the range of acceptable parameter values for tuning at the upper levels is probably one strategy that should be encouraged and may help make the process of model tuning more transparent and tractable.
There is a legitimate question of whether tuning should be performed preferentially at the process level, with the global radiative budget and other climate metrics used for a posteriori evaluation of the model performance. It could be a good way to evaluate our current degree of understanding of the climate system and to estimate the resulting uncertainty in ECS. Restricting adjustment to the process level may also be a good way to avoid compensating model structural errors in the tuning procedure. However, because of the multiapplication nature of climate models, because of consistency issues across the model and its components, because of the limitations of process-study metrics (sampling issues, lack of energy constraints), and also simply because the climate system itself is not observed with sufficient fidelity to fully constrain models, an a posteriori adjustment will probably remain necessary for a while. This is especially important for the global energy constraints that are a strong and fundamental aspect of global climate models. Adjustment will usually be done by tuning, within acceptable ranges, the most uncertain parameters involved in the representation of processes that most affect radiation, such as cirrus clouds or low clouds. Tuning will probably induce some compensation of shortcomings or errors in the model parameterizations or configuration. However, this error compensation is probably unavoidable and desirable for current models, because of the importance of the energetic tuning for a reasonable simulation of most aspects of the climate system. The level of accuracy required for the global energy tuning (a few tenths of a watt per square meter) is, for instance, smaller than the error arising from not computing radiation at every time step, as is often done to save computational resources (on the order of several watts per square meter; see, e.g., Balaji et al. 2016).
It is recommended, however, to ensure that the final global tuning is not obtained for a set of parameter values that would not be acceptable in terms of process studies and process-oriented metrics.
The use of objective methods could also be promoted at all the stages of model tuning in order to render the process more efficient. However, objective tuning approaches should be used with caution. Because of the approximate nature of models and because of observational uncertainties, it is impossible to single out one unique parameter set on objective criteria. Formalizing the question of tuning addresses an important concern: it is essential to explore the uncertainty coming both from model structural errors, by favoring the existence of tens of models, and from parameter uncertainties, by not overtuning. Either reducing the number of models or overtuning, especially if an explicit or implicit consensus emerges in the community on a particular combination of metrics, would artificially reduce the dispersion of climate simulations. It would not reduce the uncertainty but only hide it.
We end by expressing the hope that this article will encourage both a systematic effort by the community to document this arcane aspect of model construction and wider participation in a vigorous debate on model tuning and evaluation.
The authors thank the World Climate Research Programme and its Working Group on Coupled Modelling for initiating and helping organize the workshop on model tuning in October 2014 in Garmisch-Partenkirchen, Germany. Work at LLNL was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. The contribution of Yun Qian was supported by the U.S. Department of Energy’s Office of Science as part of the Earth System Modeling Program. The Pacific Northwest National Laboratory is operated for DOE by Battelle Memorial Institute under Contract DE-AC05-76RL01830. V. Balaji is grateful to Labex L-IPSL for support for his long stay at L’Institut Pierre-Simon Laplace, during which he worked on drafts of this article.
Current affiliations: Tomassini—Met Office, Exeter, United Kingdom; Golaz—Lawrence Livermore National Laboratory, Livermore, California
The National Center for Atmospheric Research is supported by the National Science Foundation.
A supplement to this article is available online (10.1175/BAMS-D-15-00135.2)
Even observations of the radiative fluxes are in fact adjusted using this constraint. The CERES–EBAF data stand for energy balance adjusted flux.
Top-of-atmosphere and surface energy balance should not differ if exact energy conservation in the atmosphere is ensured, which turns out not to be an easy task.