## Abstract

A simple and coherent framework for partitioning uncertainty in multimodel climate ensembles is presented. The analysis of variance (ANOVA) is used to decompose a measure of total variation additively into scenario uncertainty, model uncertainty, and internal variability. This approach requires fewer assumptions than existing methods and can be easily used to quantify uncertainty related to model–scenario interaction—the contribution to model uncertainty arising from the variation across scenarios of model deviations from the ensemble mean. Uncertainty in global mean surface air temperature is quantified as a function of lead time for a subset of the Coupled Model Intercomparison Project phase 3 ensemble and results largely agree with those published by other authors: scenario uncertainty dominates beyond 2050 and internal variability remains approximately constant over the twenty-first century. Both elements of model uncertainty, due to scenario-independent and scenario-dependent deviations from the ensemble mean, are found to increase with time. Estimates of model deviations that arise as by-products of the framework reveal significant differences between models that could lead to a deeper understanding of the sources of uncertainty in multimodel ensembles. For example, three models show a diverging pattern over the twenty-first century, while another model exhibits an unusually large variation among its scenario-dependent deviations.

## 1. Introduction

Uncertainty in climate change prediction arises from three different sources: model uncertainty, scenario uncertainty, and internal variability. *Model uncertainty* arises from an incomplete understanding of the physical processes and from limitations in how that understanding is implemented in models. *Scenario uncertainty* arises because of incomplete information about future emissions. *Internal variability* is the natural unforced fluctuation of the climate system. Internal variability is aleatoric and cannot be reduced by improving our scientific knowledge. However, Smith et al. (2007) demonstrated that a proper initialization of climate predictions with observations can reduce the uncertainty for the next decade. A simple exploratory way to evaluate the total of these uncertainties is to calculate the spread of a multimodel ensemble. However, further statistical analysis is needed to quantify the contributions of particular sources of uncertainty and to describe how a particular model reacts to a particular emissions scenario.

Various methods to decompose the total uncertainty into its sources have been suggested in climate science. Cox and Stephenson (2007) propose a conceptual framework for this purpose and illustrate its use with a single energy balance model. Hawkins and Sutton (2009, hereafter HS09) and Hawkins and Sutton (2011) fit polynomial trend models over time and calculate various sources of uncertainty. These studies offer simple interpretations of uncertainty, but the drawback is that the total uncertainty cannot be easily interpreted.

We use the analysis of variance (ANOVA) to decompose sources of uncertainty [for a complete review, see Von Storch and Zwiers (2001, chapter 9)]. ANOVA is a model-based approach that partitions the total variance into components due to different sources of variation, allowing a fuller interpretation. The seminal work of Madden (1976) suggests an ANOVA approach to test for the likelihood of potentially predictable long-range variability. Formal ANOVA models are used in Zwiers (1987, 1996) for analyzing seasonal observations and an ensemble of climate simulations, respectively, and are discussed from a statistical point of view. Several previous papers use ANOVA extensively for evaluating model uncertainty from ensembles. Räisänen (2001) uses ANOVA to divide the surface air temperature (SAT), precipitation, and sea level pressure change into a common signal and variances associated with internal variability and model differences under the same forcing scenario. Hingray et al. (2007) use ANOVA to estimate the uncertainty in temperature and precipitation for a collection of atmosphere–ocean general circulation models (AOGCMs).

## 2. Data and methodologies

### a. CMIP3 data

Following HS09, we illustrate the methodology by applying it to global, decadal mean SAT multimodel ensemble predictions for years 2001–99 from the Coupled Model Intercomparison Project phase 3 (CMIP3) archive. The multimodel ensemble data are converted from the original monthly scale to a decadal scale using a 10-yr moving average (Table 1). The predictions are from *N*_{m} = 7 general circulation models (GCMs) under *N*_{s} = 3 different future emissions scenarios [Special Report on Emissions Scenarios (SRES) A1B, A2, and B1] with *N*_{r} = 2 initial condition ensemble members for each model and scenario. This gives a total of *N*_{m} × *N*_{s} × *N*_{r} = 42 predictions for 2001–99. These future scenarios are summarized in Solomon et al. (2007, chapter 10). We use fewer GCMs than other studies, such as Boer (2009) and HS09, because only seven GCMs have simulated all three scenarios with at least two ensemble runs. Figure 1 shows time series of all the simulation runs of all the models, scenarios, and replicates. The total uncertainty increases with time because the simulation runs diverge with time.

### b. Methodologies used in previous studies

The methodology in HS09 considers only one realization per model per scenario (*N*_{r} = 1). Each prediction of SAT is fitted using a fourth-order polynomial model over years 1950–2099. The raw predictions *X* for each model *m*, scenario *s*, and year *t* are written as

where a reference temperature for each model–scenario combination is denoted by *μ*_{ref}, the polynomial fit of the projected change in global mean temperature is represented by *z*, and the regression error (internal variability) is *ε*. The reference temperature used is the 1971–2000 mean for each model and scenario. The internal variability estimator is the multimodel mean of the variance of the regression error *ε*(*m*, *s*, *t*):

The internal variability is considered to have constant variance in time. The model uncertainty estimator is the multiscenario mean of intermodel variance of *z*(*m*, *s*, *t*):

where *z*(·, *s*, *t*) = Σ_{m} *z*(*m*, *s*, *t*)/*N*_{m}. The scenario uncertainty estimator is the variance of multimodel means of *z*(*m*, *s*, *t*):

where *z*(·, ·, *t*) = Σ_{m,s} *z*(*m*, *s*, *t*)/(*N*_{m}*N*_{s}). The sum of these sources of uncertainty is then defined to be the *total uncertainty*:
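Read literally, the verbal definitions in this subsection correspond to expressions of the following form. This is a sketch rather than a transcription of HS09: the superscript HS labels, the operator var, the choice of taking the residual variance over both scenario and year, and the absence of any model weighting are notational assumptions made here, and the variance normalization (1/*N* or 1/(*N* − 1)) is deliberately left unspecified.

$$
\begin{aligned}
X(m,s,t) &= \mu_{\mathrm{ref}}(m,s) + z(m,s,t) + \varepsilon(m,s,t),\\
V^{\mathrm{HS}} &= \frac{1}{N_m}\sum_{m} \operatorname{var}_{s,t}\!\left[\varepsilon(m,s,t)\right],\\
M^{\mathrm{HS}}(t) &= \frac{1}{N_s}\sum_{s} \operatorname{var}_{m}\!\left[z(m,s,t)\right],\\
S^{\mathrm{HS}}(t) &= \operatorname{var}_{s}\!\left[z(\cdot,s,t)\right],\\
T^{\mathrm{HS}}(t) &= V^{\mathrm{HS}} + M^{\mathrm{HS}}(t) + S^{\mathrm{HS}}(t).
\end{aligned}
$$

In this reading, the intermodel variance is taken about the multimodel mean *z*(·, *s*, *t*) and the scenario variance about *z*(·, ·, *t*); the internal variability term carries no time argument because it is treated as constant in time.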

Cox and Stephenson (2007) define the fractional uncertainty (noise-to-signal ratio) at time *t* to be

HS09 also consider the fraction of variance, defined as
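The procedure described in this subsection can be sketched in Python on synthetic data. Everything below is illustrative: the ensemble is made up, the variable names are hypothetical, the 1/*N* variance normalization (`ddof=0`) is an assumption, and the fraction of variance is taken to be each component divided by the total. It is not the code used by HS09.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Hypothetical synthetic ensemble: 7 models, 3 scenarios, annual values 1950-2099.
n_models, n_scen = 7, 3
years = np.arange(1950, 2100)
trend = np.linspace(0.0, 1.0, years.size) ** 2            # made-up warming shape
model_gain = 1.0 + 0.2 * rng.standard_normal(n_models)    # made-up model sensitivity
scen_gain = np.array([2.0, 3.0, 4.0])                     # made-up scenario strength
X = (model_gain[:, None, None] * scen_gain[None, :, None] * trend
     + 0.1 * rng.standard_normal((n_models, n_scen, years.size)))

# Anomalies relative to the 1971-2000 mean of each model-scenario series.
ref = X[:, :, (years >= 1971) & (years <= 2000)].mean(axis=2, keepdims=True)
anom = X - ref

# Fourth-order polynomial fit z(m, s, t) and residual eps(m, s, t).
z = np.empty_like(anom)
for m in range(n_models):
    for s in range(n_scen):
        z[m, s] = Polynomial.fit(years, anom[m, s], deg=4)(years)
eps = anom - z

V = eps.var(axis=(1, 2)).mean()          # multimodel mean of residual variance (constant in time)
M = z.var(axis=0).mean(axis=0)           # multiscenario mean of intermodel variance, per year
S = z.mean(axis=0).var(axis=0)           # variance across scenarios of multimodel means, per year
T = V + M + S                            # total uncertainty, per year
frac = np.stack([V / T, M / T, S / T])   # fractions of variance; each column sums to 1
```

Because the internal variability term is a single number while the other two terms grow with the fitted trend, its fraction necessarily shrinks with lead time in this construction.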

### c. Analysis of variance method

We adopt a model-based approach rather than a descriptive or algorithmic approach because the use of a statistical model facilitates a coherent interpretation of uncertainty. We fit an ANOVA model to the projected temperature anomalies *x*(*m*, *s*, *r*, *t*), relative to the 1971–2000 mean, for model *m*, scenario *s*, and replicate *r* at time *t*. First, we consider the following ANOVA model for global decadal SAT *x*(*m*, *s*, *r*, *t*) for each time *t*:

$$
x(m,s,r,t) = \mu(t) + \alpha(m,t) + \beta(s,t) + \gamma(m,s,t) + \varepsilon(m,s,r,t),
$$

where *μ*(*t*) is the overall effect representing the grand ensemble mean of all simulations at time *t* = 1, 2, … , 99; *α*(*m*, *t*) is the scenario-independent deviation of model *m* = 1, 2, … , 7 from the overall ensemble mean *μ*(*t*); *β*(*s*, *t*) is the scenario deviation of emission scenario *s* = 1, 2, 3; the parameters *α*(*m*, *t*) and *β*(*s*, *t*) are collectively called main effects; and *γ*(*m*, *s*, *t*) is the interaction effect between model *m* and scenario *s* at time *t*, which describes the scenario-dependent deviation. The error terms *ε*(*m*, *s*, *r*, *t*) are assumed to be independent and identically distributed.

The notion of interaction is an important concept in ANOVA. Mathematically, interaction is said to occur if the separate effects do not combine additively (Berrington de González and Cox 2007). For climate projections, it arises when models react differently to emission scenarios, which we call scenario-dependent model uncertainty as opposed to scenario-independent model uncertainty. To demonstrate that there is a potential interaction term, we show the mean response at different lead times to different emission scenarios for each model in Fig. 2. The lines are not parallel, indicating that there is an apparent interaction between some models and scenarios, especially for long lead times. For example, for the period 2061–99, the effect of changing emission scenario is different for the CGCM3.1 (refer to Table 1 for model name expansions) than for the MIUBECHO, as the lines joining the models cross.
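The link between nonparallel lines and a nonzero interaction term can be illustrated with a toy table of mean responses. The numbers and the helper `interaction` below are hypothetical and chosen only for illustration; they are not taken from the CMIP3 data.

```python
import numpy as np

def interaction(table):
    """Return cell mean minus row mean minus column mean plus grand mean."""
    table = np.asarray(table, dtype=float)
    row = table.mean(axis=1, keepdims=True)    # model means
    col = table.mean(axis=0, keepdims=True)    # scenario means
    return table - row - col + table.mean()

# Hypothetical mean warming (K): rows are models, columns are scenarios.
additive = np.array([[1.0, 1.5, 2.0],
                     [1.4, 1.9, 2.4]])         # parallel lines: constant 0.4 K offset
nonparallel = np.array([[1.0, 1.5, 2.0],
                        [1.4, 1.5, 1.6]])      # second model responds more weakly

print(np.abs(interaction(additive)).max())     # zero up to rounding
print(interaction(nonparallel))                # nonzero entries signal interaction
```

When the effects combine additively, every cell is fully explained by its row and column means, so the residual table vanishes; any departure from parallel lines shows up as nonzero residuals that sum to zero along each row and column.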

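The method of least squares is used for the parameter estimation. Applying the constraints Σ_{m} *α*(*m*, *t*) = 0, Σ_{s} *β*(*s*, *t*) = 0, Σ_{m} *γ*(*m*, *s*, *t*) = 0 for *s* = 1, … , *N*_{s}, and Σ_{s} *γ*(*m*, *s*, *t*) = 0 for *m* = 1, … , *N*_{m}, the parameter estimators are

$$
\begin{aligned}
\hat{\mu}(t) &= x(\cdot,\cdot,\cdot,t),\\
\hat{\alpha}(m,t) &= x(m,\cdot,\cdot,t) - x(\cdot,\cdot,\cdot,t),\\
\hat{\beta}(s,t) &= x(\cdot,s,\cdot,t) - x(\cdot,\cdot,\cdot,t),\\
\hat{\gamma}(m,s,t) &= x(m,s,\cdot,t) - x(m,\cdot,\cdot,t) - x(\cdot,s,\cdot,t) + x(\cdot,\cdot,\cdot,t),
\end{aligned}
$$

where *x*(·, ·, ·, *t*) is the overall mean at time *t*; *x*(*m*, *s*, ·, *t*) is the mean over all the members at time *t* for model *m* and scenario *s*; and *x*(·, *s*, ·, *t*) and *x*(*m*, ·, ·, *t*) are the means over the models and replicates and over the scenarios and replicates, respectively, at time *t*.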

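We define all four sources of uncertainty in terms of the notion of variance. The ANOVA approach does not assume constant internal variability over time and does not require a particular form of trend to be specified for the models. The internal variability *V*(*t*) is the variance of each member around the model–scenario mean, defined as

$$
V(t) = \frac{1}{N_m N_s N_r}\sum_{m}\sum_{s}\sum_{r}\left[x(m,s,r,t) - x(m,s,\cdot,t)\right]^2 .
$$

The scenario-independent model uncertainty *M*(*t*) is the variance of model means around the ensemble mean, defined as

$$
M(t) = \frac{1}{N_m}\sum_{m}\left[x(m,\cdot,\cdot,t) - x(\cdot,\cdot,\cdot,t)\right]^2 .
$$

The scenario uncertainty *S*(*t*) is the variance of scenario means around the ensemble mean,

$$
S(t) = \frac{1}{N_s}\sum_{s}\left[x(\cdot,s,\cdot,t) - x(\cdot,\cdot,\cdot,t)\right]^2 .
$$

The model–scenario interaction uncertainty *I*(*t*) is the variance of the model–scenario means around the sum of the estimated overall effect *μ*(*t*) and main effects *α*(*m*, *t*) and *β*(*s*, *t*), defined as

$$
I(t) = \frac{1}{N_m N_s}\sum_{m}\sum_{s}\left[x(m,s,\cdot,t) - x(m,\cdot,\cdot,t) - x(\cdot,s,\cdot,t) + x(\cdot,\cdot,\cdot,t)\right]^2 .
$$

The total uncertainty *T*(*t*) is simply the variance of the ensemble, defined as

$$
T(t) = \frac{1}{N_m N_s N_r}\sum_{m}\sum_{s}\sum_{r}\left[x(m,s,r,t) - x(\cdot,\cdot,\cdot,t)\right]^2 = M(t) + S(t) + I(t) + V(t).
$$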
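The additive partition can be checked numerically. The sketch below assumes a balanced ensemble stored as an array `x[m, s, r]` for a single lead time and uses 1/*N* (finite-population) normalization throughout; the function name `anova_partition` and the synthetic data are hypothetical.

```python
import numpy as np

def anova_partition(x):
    """Partition the variance of x[m, s, r] (one lead time) into M, S, I, V and total T."""
    grand = x.mean()
    cell = x.mean(axis=2)                        # model-scenario means
    alpha = x.mean(axis=(1, 2)) - grand          # scenario-independent model deviations
    beta = x.mean(axis=(0, 2)) - grand           # scenario deviations
    gamma = cell - grand - alpha[:, None] - beta[None, :]   # model-scenario interaction
    return {
        "M": np.mean(alpha ** 2),
        "S": np.mean(beta ** 2),
        "I": np.mean(gamma ** 2),
        "V": np.mean((x - cell[:, :, None]) ** 2),
        "T": np.mean((x - grand) ** 2),
    }

# Hypothetical synthetic ensemble at one lead time: 7 models, 3 scenarios, 2 replicates.
rng = np.random.default_rng(1)
x = (rng.normal(0.0, 0.3, (7, 1, 1))      # made-up model offsets
     + rng.normal(0.0, 0.5, (1, 3, 1))    # made-up scenario offsets
     + rng.normal(0.0, 0.1, (7, 3, 1))    # made-up interaction
     + rng.normal(0.0, 0.1, (7, 3, 2)))   # made-up internal variability
parts = anova_partition(x)
print(parts)
```

Applying the function separately at each lead time gives the components as functions of *t*; no trend model is fitted and no assumption is made about how the internal variability changes in time.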

Since the multimodel ensembles used here are “ensembles of opportunity” (Tebaldi and Knutti 2007), we do not think of the available models and scenarios as a sample from a wider population of possible models and scenarios. If a wider population could be envisaged from which the ensemble members form a sample, then our ANOVA model could be adapted to include so-called random effects that would enable inferences about the population (Eisenhart 1947). The simulation runs under the same model–scenario combination are thought of as samples drawn from a finite population. However, it is also appropriate to interpret internal variability as independent realizations from an infinite population. In that case, our estimator *V*(*t*) underestimates the internal variability by 50% on average when there are only two replicates (*N*_{r} = 2). A possible solution is to use a random effects model to capture such a setting (e.g., Gelman 2005), but we retain simplicity by assuming a finite population.
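The 50% figure follows from the standard result that a variance computed with divisor *n* has expectation (*n* − 1)/*n* times the population variance, which is one-half for *n* = 2. A small Monte Carlo sketch, assuming independent Gaussian replicates with a made-up variance, illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 1.0                                    # true variance of the hypothetical population
n_rep, n_trials = 2, 200_000
samples = rng.normal(0.0, np.sqrt(sigma2), (n_trials, n_rep))

v_finite = samples.var(axis=1, ddof=0).mean()   # divisor n, as in a finite-population variance
v_sample = samples.var(axis=1, ddof=1).mean()   # divisor n - 1, unbiased for an infinite population

print(v_finite / sigma2)                        # close to (n - 1) / n = 0.5
print(v_sample / sigma2)                        # close to 1
```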

### d. Connection to the estimates in HS09

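The model–scenario interaction can be interpreted as a component of the model uncertainty defined in HS09. Consider the sum of the scenario-independent model uncertainty *M*(*t*) and the model–scenario interaction variance *I*(*t*), which can be written as

$$
M(t) + I(t) = \frac{1}{N_m N_s}\sum_{m}\sum_{s}\left[x(m,s,\cdot,t) - x(\cdot,s,\cdot,t)\right]^2 .
$$

This is the multiscenario mean of the intermodel variance of the model–scenario means, which mirrors the form of the HS09 model uncertainty estimator.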
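The relation between *M*(*t*) + *I*(*t*) and the multiscenario mean of the intermodel variance can be verified numerically. The check below assumes 1/*N* normalization and a balanced array `x[m, s, r]` of made-up values; it relies on the interaction estimates summing to zero over scenarios for each model, so that the cross term between the model and interaction deviations vanishes.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(7, 3, 2))              # hypothetical x[m, s, r] at one lead time

grand = x.mean()
cell = x.mean(axis=2)                       # model-scenario means
alpha = x.mean(axis=(1, 2)) - grand
beta = x.mean(axis=(0, 2)) - grand
gamma = cell - grand - alpha[:, None] - beta[None, :]

m_plus_i = np.mean(alpha ** 2) + np.mean(gamma ** 2)
# Intermodel variance (divisor N_m) within each scenario, averaged over scenarios.
intermodel = cell.var(axis=0, ddof=0).mean()

print(m_plus_i, intermodel)                 # the two values agree to rounding error
```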

## 3. Results

We now use the ANOVA approach described above to quantify the uncertainty in global mean and decadal mean SAT in a subset of the CMIP3 climate projections. Figure 3a shows how the different variance components vary with lead time. The scenario uncertainty [*S*(*t*), thick dashed line] dominates the total uncertainty after year 2050, which agrees with HS09. Over the entire period, model uncertainty [*M*(*t*), thick solid line] is also greater than the internal variability [*V*(*t*), thin solid line], which itself is rather constant over time (as assumed by HS09). The model–scenario interaction variance [*I*(*t*), thin dashed line] increases from less than 1 × 10^{−3} to 2.5 × 10^{−2} K^{2}, larger than the internal variability component, demonstrating that interaction is an important component of uncertainty (see Fig. 3b).

Figure 4 presents a comparison of the fractions of uncertainty from the methodologies of HS09 and ANOVA. With our ANOVA approach, the scenario uncertainty dominates the total uncertainty after the year 2050. In the ANOVA method, the fraction of variance due to internal variability decreases rapidly in the first few decades, which may be due to a random fluctuation in the internal variability *V*(*t*) in the first decade. Meanwhile, scenario uncertainty contributes slightly less in the ANOVA method for about the first 30 years. The model uncertainty dominates in the first few decades and has similar values in HS09 and the ANOVA approach. The fraction due to model–scenario interaction increases from a small value and saturates at about 5% after two decades.

Figure 5 shows how the various fitted ANOVA parameters evolve over time. In Fig. 5a, the separation between the model effects *α* is seen to increase with time, and some of the models, such as ECHAM5/MPI-OM and PCM, contribute more to the model uncertainty than others. Some models (such as PCM) also show systematic changes in their mean deviations. Such a pattern may be attributable to the fact that the PCM simulations in the historical period are not continuous with those in the twenty-first-century SRES simulations. In Fig. 5b, SRES A2 overtakes SRES A1B in the year 2070, and they have a 0.5-K separation in the year 2100. This separation can be understood as a response to the socioeconomic differences between the emission scenarios and to differences in how the models treat forcings such as aerosols. A plot of the interaction term (Fig. 5c) is helpful for understanding the contribution of model–scenario interaction, which increases with lead time.

Figure 6 explores the interaction effect in more detail and shows that the interaction effect varies widely. For example, the variation across scenarios is greatest for CCSM3, which is relatively cool in SRES B1 and relatively warm in A2.

## 4. Discussion and conclusions

### a. Methodology

We introduce a simple, coherent approach for the modeling of uncertainty in multimodel ensembles. The sources of uncertainty are estimated from the ANOVA model and add up to give total variance, which is a natural measure of global uncertainty. In contrast to the uncertainty decomposition constructed in previous studies, this approach does not need to specify a particular type of trend and noise distribution and does not assume constant internal variability over time. The ANOVA approach is a powerful way to quantify sources of uncertainty, and the results generated are often easy to interpret. It is easy to summarize the structure of all ensemble members under different scenarios with simple exploratory techniques. Another important feature is interaction. In this framework, model–scenario interaction is defined as a form of nonconstancy of variance across scenarios in different models. The framework supports the decomposition of model uncertainty into a term that measures the uncertainty due to a variation between scenario-independent model deviations and an interaction term that measures the uncertainty due to a variation between scenario-dependent model deviations. Ignoring a significant interaction term in the analysis would substantially alter the interpretation of the data. The framework offers, along with some exploratory data analysis techniques, a more detailed interpretation of uncertainty and of the contribution from a particular ensemble member.

### b. Scientific interpretation

Our results for uncertainty in the global and decadal mean temperature change broadly agree with previous studies; however, some details are different, especially for short lead times. Here are some important findings:

1. Scenario uncertainty, conditional on the choice of scenarios, is of the greatest importance after year 2050.
2. Internal variability is approximately constant over time but decreases rapidly as a fraction of the total.
3. Uncertainty from scenario-independent model deviations dominates uncertainty from scenario-dependent model deviations over the entire study period.
4. The model–scenario interaction effect is an important contribution to uncertainty, especially at long lead times.

The first finding is fully in agreement with previous papers, such as HS09 and Cox and Stephenson (2007). The second finding is an assumption in HS09 and is now validated using the ANOVA framework. The latter two findings are more closely tied to the presence of a significant interaction term. There are several possible reasons why certain models have large interaction terms. The most likely reason is that the same forcings are treated differently across the range of models (e.g., Kiehl 2007). However, even if two models treat a forcing in exactly the same way, there could still be a contribution to the interaction if the models respond differently to the forcing, that is, if the models have different effective climate sensitivities.

### c. Future work

The ANOVA approach, because of its simplicity, is a good starting point for tackling more complex problems of attributing uncertainty in multimodel ensembles. Apart from global mean temperature, it is also interesting to investigate uncertainty for different space–time scales and other meteorological fields, such as precipitation and stratospheric ozone (Hawkins and Sutton 2011; Charlton-Perez et al. 2010). It is possible to extend this approach to a more general class of models. These extensions are not currently common in the climate science community, but they have been used extensively in areas such as biology, epidemiology, and financial modeling. For example, an obvious extension to climate science is a multivariate ANOVA (MANOVA; see details in Press 1972, chapter 8), which incorporates the relationship between different atmospheric fields. This is particularly useful because atmospheric fields are often correlated. A separate analysis of fields such as temperature and precipitation may lead to repeated use of data. For epidemiology applications, Zhang et al. (2009) developed the techniques of smoothed ANOVA (SANOVA) to smooth spatial random effects by taking advantage of the spatial variation.

## Acknowledgments

S. Yip and E. Hawkins are funded by NERC National Centre for Atmospheric Science (NCAS). We thank Mat Collins and three anonymous reviewers for their thoughtful comments, which helped improve the paper. We also thank the modeling groups, the Program for Climate Model Diagnosis and Intercomparison, and the WCRP’s Working Group on Coupled Modelling for making the CMIP3 multimodel dataset freely available (https://esg.llnl.gov:8443/index.jsp). Support of this dataset is provided by the Office of Science, U.S. Department of Energy.

## REFERENCES

Räisänen, J., 2001: CO_{2}-induced climate change in CMIP2 experiments: Quantification of agreement and role of internal variability.

## Footnotes

A comment/reply has been published regarding this article and can be found at http://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-12-00527.1 and http://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-12-00858.1