## Abstract

The relative importance of initial conditions and boundary conditions in interannual to decadal climate predictability is addressed. A simple framework is developed in which (i) ensembles of climate model simulations with changing external forcing can be measured against climatology to get an estimate of the timescale on which changing boundary conditions can provide predictive skill, and (ii) the rate of spread of ensembles of simulations with small perturbations to the initial conditions can be measured against climatology to assess the timescale at which the information in the initial conditions is degraded by chaotic error growth. A preliminary test of the method on a limited number of climate model simulations is presented.

## 1. Introduction

Predictability of climate on interannual to decadal timescales can arise in two distinct ways (Lorenz 1975).

Predictability of the first kind focuses on the initial value problem: how uncertainties in the initial state of the climate system amplify and spoil the forecast at a given lead time. This is the dominant source of error in weather forecasting. Errors in the atmospheric analysis grow such that the current predictability horizon is often of the order of a week or so (e.g., Molteni et al. 1996). In climate forecasting, it is clearly not possible to forecast individual weather events beyond this limit. Slow processes, such as changes in ocean circulation, might, however, lead to predictability of the bulk statistical properties of weather and, in principle, give useful information on interannual timescales and beyond—for example, forecasting of the El Niño–Southern Oscillation (ENSO; e.g., Stockdale et al. 1998). To first order, it is the specification of the initial ocean state that is crucial in the first-kind predictability of climate since this provides the possible “memory” of the system (although other factors such as variations in land cover may be important).

Predictability of the second kind focuses on the boundary value problem: how predictable changes in the boundary conditions that affect climate can provide predictive power. A common class of second-kind predictability studies uses atmosphere models with prescribed sea surface temperatures (SSTs) in order to assess the upper limit of predictability associated with a perfect knowledge of the future ocean state (e.g., Zwiers 1996; Rowell 1998). Here we focus on a different problem: that of future changes in external forcing agents of climate, principally anthropogenic increases in greenhouse gases and other radiatively active substances. Changes in radiative forcing can lead to changes in climate that are outside the “noise” of climate variability and, to the extent that these forcing changes are themselves predictable, thus provide a source of potentially useful skill (Stott et al. 2000). The usual focus of climate change detection studies has been on timescales of 50 yr and greater, since their priority has been to attain high levels of confidence by maximizing the signal-to-noise ratio (e.g., Tett et al. 1999). Here we assess the prospects for useful skill on shorter timescales.

On a practical level, knowing the role of the initial conditions and boundary conditions will be crucial in the design of any “operational” climate forecasting system. Consider a climate forecast for a lead time of 10 yr into the future. If predictability on this timescale turns out to be a first-kind (initial value) problem, then there is a requirement for a huge multinational investment in taking routine observations of the ocean in order to provide an accurate initial analysis with which to start the forecast. Conversely, if even very small errors in the initial conditions (i.e., errors that are smaller than the minimum errors achievable in any feasible ocean–atmosphere analysis system) rapidly saturate, leaving only the response to changing boundary conditions, then we need to focus on understanding the response to relatively predictable changes in radiative forcing, such as anthropogenic factors and (on subdecadal to decadal timescales) solar irradiance changes. Some changes in boundary conditions, such as explosive volcanic eruptions, are unlikely ever to be predictable in a deterministic sense, but their role still needs to be understood since it places limits on attainable forecast skill.

## 2. Comparing two kinds of predictability

The question of predictability of the second kind can be addressed by measuring the mean of an ensemble of climate model experiments with changing radiative forcing and comparing that with some estimate of climatological internal variability. The question of predictability of the first kind can be addressed by taking an ensemble of climate model simulations with small perturbations to the initial conditions and measuring the ensemble spread at increasing lead time against the spread expected from random sampling of climatological noise (see Shukla 1981 for a discussion of these issues in the context of weather forecasting).

Assume, for example, that we have *n* experiments with changing radiative forcing (base experiments), from each of which we have produced an *m*-member ensemble forecast by making slight changes to the initial conditions at some date during the base experiment (so that we have *n* ensemble forecasts each with *m* members). Figure 1 displays this experimental setup, with *n* = 4 and *m* = 2, for the simulations considered later in this paper. Let *x*_{ij}(*t*) be a measure of the change in some climate variable over a fixed time interval, *t*, as observed in the *i*th member of the *j*th ensemble forecast. Absolute values could be considered as well as changes, but persistence then introduces additional complications. From here on we take the time dependence, *t*, as given, so we drop all (*t*)s from the equations.

We define *r* as the change in *x* attributable to the deterministic response to external forcing over this period (i.e., *r* is the mean change over this period in a hypothetical infinite ensemble). We also define *s*_{j} as the change in *x* attributable to the predictable component of internal variability given the initial conditions of the *j*th ensemble. The sum *r* + *s*_{j} may be thought of as the mean of another hypothetical infinite ensemble with the same external forcing and initial conditions “close” to those of ensemble *j*. The term close is ambiguous, but because of the wide disparity of timescales between the atmosphere and ocean, the precise definition turns out to be irrelevant, and we take close to mean identical ocean initial states and arbitrarily selected atmospheric initial states.

The residual, *u*_{ij}, is the change in *x* that cannot be predicted even in principle given both initial and boundary condition information. On the assumption of no interaction between these different components, we have

$$x_{ij} = r + s_j + u_{ij}. \tag{1}$$

The components *r*, *s*, and *u* are unknown, but we have an estimate of the likely level of variability in the sum *s*_{j} + *u*_{ij} given by

$$\mathcal{E}\left[(s_j + u_{ij})^2\right] = \sigma^2, \tag{2}$$

where $\mathcal{E}$ denotes the expectation operator and *σ*^{2} is the variance of changes in *x* measured over same-length segments, started at the same point in the seasonal cycle, extracted from a long control integration of the model with constant radiative forcing.
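One way the control-run variance described above might be estimated is sketched below. This is a minimal illustration, not the procedure used in the paper: the function name `control_change_variance`, the assumption of monthly data, the use of the net change as the diagnostic, and the choice to let segments overlap are all assumptions made for the example.

```python
import numpy as np

def control_change_variance(control, lead, start_month=0, period=12):
    """Variance of changes over `lead` time steps in a control series.

    Hypothetical helper: segments all begin at the same phase of the
    seasonal cycle (`start_month`, repeating every `period` steps) and
    are allowed to overlap, which is an assumption of this sketch.
    """
    control = np.asarray(control, dtype=float)
    # Start indices that share a seasonal phase and leave room for the segment.
    starts = np.arange(start_month, control.size - lead, period)
    if starts.size < 2:
        raise ValueError("control series too short for this lead time")
    # Net change over each segment (end value minus start value).
    changes = control[starts + lead] - control[starts]
    return changes.var(ddof=1)

# Toy control series: a pure seasonal cycle has no year-to-year change.
months = np.arange(240)
seasonal_only = np.sin(2 * np.pi * months / 12)
sigma2_seasonal = control_change_variance(seasonal_only, lead=12)
```

Starting every segment at the same calendar month keeps the seasonal cycle out of the variance estimate, which is why the toy seasonal series gives a value of essentially zero.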

The assumption of no interaction between *r*, *s*_{j}, and *u*_{ij} may be too restrictive as we expect that, as is the case in weather forecasting, climate predictability will be state dependent (e.g., predictability during the growth phase of a large El Niño event may be higher than that at the onset of a weak one). We proceed though with this assumption in order to illustrate the method, which can be applied to individual ensemble forecasts on a case-by-case basis to highlight this state dependence.

Ultimately the decision of which climatological measures to use will be problem specific and may be determined by the end user. However, Eqs. (1) and (2) could be applied to almost any linear diagnostic, and contain all the ingredients we need for our measures of first- and second-kind predictability. It is simplest to address the second kind first.

### a. Measures of second-kind predictability

We can estimate *r* simply by averaging over all the members of the base experiments that have a particular change in radiative forcing:

$$\hat{r} = \frac{1}{nm}\sum_{j=1}^{n}\sum_{i=1}^{m} x_{ij}. \tag{3}$$

Predictability arising from changing boundary conditions might be considered useful when

$$r^2 > \sigma^2, \tag{4}$$

meaning that >50% of the variance in the forecast quantity can be predicted given accurate knowledge of the boundary conditions, a perfect model, and a hypothetical infinite ensemble. This criterion is, of course, somewhat arbitrary: the actual skill required for a forecast to be useful will depend on the details of the application, feasible ensemble size, payoff matrix, and so forth. Nevertheless, it provides a reasonable benchmark.

Our estimate of *r*, *r̂*, will contain noise due to the small sample sizes, so

$$r^2 + \frac{\sigma^2}{nm} \le \mathcal{E}(\hat{r}^2) \le r^2 + \frac{\sigma^2}{n}, \tag{5}$$

with the lower and upper limits corresponding to long and short forecast lead times, respectively. In the cases considered here, boundary condition predictability does not become significant until initial condition predictability is almost negligible, so we will use the long-forecast approximation (lower limit). Provided *n* ≥ 4, the difference in the resulting criterion on *r̂* is at most *O*(0.1). Substituting for *r* in the left-hand inequality of Eq. (5) thus provides a criterion of useful skill arising from time-evolving boundary conditions:

$$\mathcal{B} \equiv \frac{\hat{r}^2}{\sigma^2} - \left(1 + \frac{1}{nm}\right) > 0. \tag{6}$$
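The boundary-condition criterion can be sketched numerically as follows. The function name `boundary_skill`, the `(m, n)` array layout, and the exact normalization (squared ensemble-mean change over σ², less the long-lead sampling term 1 + 1/(*nm*), with positive values indicating useful skill) are assumptions of this illustration rather than a transcription of the paper's code.

```python
import numpy as np

def boundary_skill(x, sigma2):
    """Second-kind skill measure under the long-lead approximation.

    x      : array of shape (m, n), changes x_ij for member i of ensemble j
    sigma2 : control-run variance of changes of the same length
    Assumed form: r_hat**2 / sigma2 - (1 + 1/(n*m)); > 0 means useful skill.
    """
    x = np.asarray(x, dtype=float)
    m, n = x.shape
    r_hat = x.mean()  # average over all n*m simulations
    return r_hat**2 / sigma2 - (1.0 + 1.0 / (n * m))

# Toy example with n = 4 ensembles of m = 2 members each.
x_toy = np.full((2, 4), 3.0)   # every run warms by 3 units
b_toy = boundary_skill(x_toy, sigma2=4.0)
```

With no forced response (*r* = 0) and fully decorrelated members, the expected value of this measure is −1, so values near −1 indicate that the forced signal has not yet emerged.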

### b. Measures of first-kind predictability

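An estimate of the change in *x* in the *j*th ensemble that is predictable given only knowledge of the forecast initial conditions is provided by

$$\hat{s}_j = \frac{1}{m}\sum_{i=1}^{m} x_{ij} - \hat{r}, \tag{7}$$

giving an estimate of the completely unpredictable component

$$\hat{u}_{ij} = x_{ij} - \hat{r} - \hat{s}_j = x_{ij} - \frac{1}{m}\sum_{k=1}^{m} x_{kj}. \tag{8}$$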

Focusing on the role of initial condition information, thus assuming *r* is negligible (as will generally be the case early in the forecast), a similar criterion to Eq. (6) is provided by

$$\mathcal{E}(u_{ij}^2) < \tfrac{1}{2}\sigma^2. \tag{9}$$

Again, if this criterion is satisfied, more than half the climatological variance is, in principle, predictable given perfect knowledge of initial conditions, a perfect model, and an arbitrarily large ensemble.

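Taking into account small ensemble sizes, we have

$$\mathcal{E}(\hat{u}_{ij}^2) = \frac{m-1}{m}\,\mathcal{E}(u_{ij}^2), \tag{10}$$

with *û*_{ij} the deviation of member *i* from the mean of ensemble *j*. Using summation to approximate the role of the expectation operator and substituting for *u*_{ij} in Eq. (9), the criterion becomes

$$\mathcal{I} \equiv 1 - \frac{2}{n(m-1)\sigma^2}\sum_{j=1}^{n}\sum_{i=1}^{m}\hat{u}_{ij}^2 > 0. \tag{11}$$

In the specific case described later on where *m* = 2, this reduces to

$$\mathcal{I} = 1 - \frac{1}{n\sigma^2}\sum_{j=1}^{n}\left(x_{1j} - x_{2j}\right)^2 > 0. \tag{12}$$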

This is intuitively plausible, making the forecast useless when the average spread between members of individual ensembles exceeds the climatological variance in the quantity in question. This simple type of variance ratio has been widely used in the area of atmospheric (weather) predictability (e.g., Shukla 1981; Roads 1986; Tribbia and Baumhefner 1988) and, more recently, in the case of climate predictability (see also Boer 2000). We present the derivation here for completeness.
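A variance-ratio measure of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not the paper's implementation: the name `initial_condition_skill`, the `(m, n)` array layout, and the normalization (one minus twice the bias-corrected within-ensemble variance over σ², so that positive values indicate useful skill) are choices made for the example.

```python
import numpy as np

def initial_condition_skill(x, sigma2):
    """First-kind skill measure from within-ensemble spread.

    x      : array of shape (m, n), changes x_ij for member i of ensemble j
    sigma2 : control-run variance of changes of the same length
    Assumed form: 1 - 2 * E(u^2) / sigma2, with E(u^2) estimated from
    deviations about each ensemble mean and corrected by m / (m - 1).
    """
    x = np.asarray(x, dtype=float)
    m, n = x.shape
    dev = x - x.mean(axis=0)                   # deviation from each ensemble mean
    u2 = (dev**2).sum() / (n * (m - 1))        # small-ensemble (m-1)/m correction
    return 1.0 - 2.0 * u2 / sigma2

# Toy example: n = 4 identical-twin pairs (m = 2).
x_toy = np.array([[1.0, 2.0, 3.0, 4.0],
                  [2.0, 2.0, 5.0, 4.0]])
i_general = initial_condition_skill(x_toy, sigma2=2.5)

# For m = 2 the same quantity is the mean squared twin difference over sigma2.
i_twin = 1.0 - np.mean((x_toy[0] - x_toy[1])**2) / 2.5
```

The two expressions agree for *m* = 2, which is the sense in which the forecast becomes useless once the average squared difference between twins exceeds the climatological variance.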

## 3. A preliminary application of the method

In this section we illustrate the methods developed above by applying them to a limited set of coupled climate model simulations. These simulations were not performed specifically to address the problem of initial and boundary condition predictability, but nevertheless they present a suitable test bed for our ideas.

The climate model used is the second Hadley Centre Coupled Ocean–Atmosphere General Circulation Model (HadCM2). The model has a horizontal resolution of 2.5° latitude and 3.75° longitude in both the atmosphere and ocean components with 19 and 20 levels in the vertical, respectively, and the two components are coupled using flux adjustments (see Johns et al. 1997 for more details).

We use nine simulations of the coupled model. Four of these run from 1860 to 1996 with increases in greenhouse gases and sulfate aerosols (Mitchell and Johns 1997); we denote them GS1–4. From each of these base experiments we started an “identical twin” experiment in which the oceanic initial conditions from the GS runs were kept fixed and the atmosphere was perturbed by introducing a small change to the radiative forcing (see Fig. 1). Perhaps a more sensible approach would have been to select an atmospheric state corresponding to, for example, the previous day. However, as stated above, the experiments were not designed specifically to address this problem and we use them opportunistically to illustrate the method. In practice, the atmospheric states in the simulations differ significantly after a week, indicating that the method is sufficient to cause a perturbation on climatic timescales. The change in radiative forcing corresponds to an estimate of the downward trend in stratospheric ozone since 1974 superimposed on the GS greenhouse and sulfate forcing; hence, we denote the experiments GSO1–4. We also use a control experiment that was run for 200 yr with fixed concentrations of greenhouse gases, etc. This represents the background internal variability of the climate system.

A posteriori we see that the impact of stratospheric ozone depletion on surface temperatures is undetectable (see below and Fig. 2), at least for the first 10 yr, so we can sensibly treat GS1 and GSO1 as two members of an initial condition ensemble. The main purpose of this paper is to present a simple method for attacking the initial condition versus boundary condition problem and hence the precise details of the analysis presented in this section are less important. In terms of the boundary condition predictability, Allen and Tett (1999) have shown that the most accurate reproduction of observed tropospheric temperature trends is obtained by reducing the amplitude of the ozone contribution by approximately a factor of 2, so we can examine potential predictability arising from the total anthropogenic response using the mean of the GS and GSO ensemble members. The precise details of the externally forced signal used are unimportant, since the magnitude of this signal is changing over time: our aim here is simply to provide an indication of its importance for predictability relative to initial condition information.

Figure 2 shows ℬ, the measure of predictability arising from changing boundary conditions, computed for a “forecast” starting in September 1974. We consider two diagnostics: the linear trend and the net change over the forecast period. For global mean temperatures, the boundary conditions are unimportant at lead times of less than a decade, but thereafter the signal emerges from the noise and we see the influence of a significant warming trend. For Northern Hemisphere land temperatures, the signal emerges at a similar lead time, although the net change diagnostic highlights a seasonal dependence of predictability, with winter being predictable and summer not. This shows the influence of feedbacks in the climate system (here a snow–albedo feedback), which can significantly modulate the climate change signal.
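The two diagnostics can be contrasted with a short sketch. The function names `net_change` and `trend_change` are hypothetical, and expressing the linear trend as the total change it implies over the period (slope times duration) is an assumption made here so that the two numbers are directly comparable.

```python
import numpy as np

def net_change(series):
    """Net change over the forecast period: last value minus first value."""
    series = np.asarray(series, dtype=float)
    return series[-1] - series[0]

def trend_change(series):
    """Change implied by a least-squares linear trend over the same period."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)
    slope = np.polyfit(t, series, 1)[0]   # first coefficient is the slope
    return slope * (series.size - 1)

# A clean ramp gives the same answer for both diagnostics ...
ramp = 0.5 * np.arange(10)
ramp_net, ramp_trend = net_change(ramp), trend_change(ramp)

# ... but a single anomalous final value affects the net change far more.
spiked = ramp.copy()
spiked[-1] += 10.0
spiked_net, spiked_trend = net_change(spiked), trend_change(spiked)
```

Because the trend uses every point in the segment, it damps the influence of any single anomalous month, whereas the net change depends only on the two end points and so retains information specific to the season at the end of the forecast.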

There is no indication of any second-kind predictability at lead times of less than 20 yr for North Atlantic SSTs nor for Niño-3 (5°S–5°N, 90°–150°W) SSTs. The climate change signal is more likely to emerge on global scales because the noise is reduced relative to the “signal” by spatial averaging (Stott and Tett 1998). Also, land areas are likely to warm at a greater rate than the ocean because of the difference in thermal inertia and because, over water, much of the excess energy from the increased concentrations of greenhouse gases is taken up in greater evaporation rather than heating.

There appears to be some discrepancy between Fig. 2 and the typical conclusions of climate change detection and attribution studies, which have often argued that 30 years or more is required for the anthropogenic signal to emerge robustly from climatological noise (Hegerl et al. 1997). In fact, there is none: because of the economic implications of a “false positive,” these studies typically demand a high level of confidence before concluding that a signal is clearly detectable. A much lower level of confidence may nonetheless be useful for practical predictability: here we only require that a trend be larger than the climatological spread of trends of equal length (equivalent to “detection” at the 50% level).

Figure 3 shows ℐ, the measure of predictability arising from perfect knowledge of the initial conditions, computed for lead times up to 5 yr. Plots of ℐ based on simple differences in temperature show large variations from month to month, whereas the measure based on trends smooths this indicator. Large negative values of ℐ arise because of the small sample size used here. When averaged over many cases, ℐ should saturate at a value of −1. Clearly larger ensembles are required to draw firm conclusions, but it seems that, in general, global temperatures may be predictable up to lead times of a couple of years; North Atlantic SSTs may be predictable on decadal timescales, but there is little predictability past a month or so for temperatures over land. These results are comparable to those of Griffies and Bryan (1997), Grötzner et al. (1999), and Boer (2000). We suspect that skillful lead times may, in fact, be highly variable and will be “state dependent” in the sense that different initial states may be more or less amenable to prediction. That then leaves the problem of either relating ensemble spread to skill, or of determining the physical precursors of predictable states.
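The saturation value and the occurrence of large negative values in small samples can be illustrated with a simple Monte Carlo experiment. This sketch assumes the two-member form of the measure (one minus the mean squared twin difference over σ²) and models fully decorrelated twins as independent Gaussian draws with variance σ²; the sample sizes, the seed, and the Gaussian assumption are choices made for the example, not properties of the model simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma2 = 1.0       # climatological variance (arbitrary units)
n_pairs = 4        # twin pairs per estimate, as in the GS/GSO setup
n_trials = 20000   # number of independent small-sample estimates

# Fully decorrelated twins: independent draws from the climatological distribution.
x1 = rng.normal(0.0, np.sqrt(sigma2), size=(n_trials, n_pairs))
x2 = rng.normal(0.0, np.sqrt(sigma2), size=(n_trials, n_pairs))

# Assumed m = 2 skill measure, one value per trial of n_pairs twins.
skill = 1.0 - np.mean((x1 - x2) ** 2, axis=1) / sigma2

mean_skill = skill.mean()   # average over many cases
lowest_skill = skill.min()  # what a single unlucky small sample can give
```

Since the squared difference of two independent draws has expectation 2σ², the average over many cases settles near −1, while individual four-pair estimates scatter widely and can fall well below that.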

The ℐ measure for the Niño-3 index shows a rapid drop-off of skill on the seasonal timescale and a return of skill at a lead time of 1–2 yr. This seasonal modulation is consistent with the experience of ENSO forecasters, but the suggestion of potential predictability on interannual timescales may indicate a lack of realism in the model's representation of ENSO. Sampling uncertainty with these small ensembles is also an issue.

## 4. Discussion

This study has served to illustrate the importance of two kinds of ensembles: an ensemble of perturbations on a given oceanic initial condition gives an accurate assessment of forecast divergence; and an ensemble of initial conditions provides information on the state dependence of this divergence. For standard statistical measures to be easily applicable, these individual ensembles should contain at least 4 members (for a single 2-member initial condition ensemble there is a 40% chance of falsely concluding there is predictive skill), although it is difficult to deduce an optimal ensemble size. Also, other diagnostic measures of predictability should be applied to variables other than surface air temperature.

Ultimately, first- and second-kind climate predictability will be limited by our ability to estimate the precise state of the ocean at any given time and to predict the future evolution of external factors affecting climate (e.g., explosive volcanic eruptions). Hence, studies of this type must be interpreted carefully as they represent only the potential predictability. The message, however, is clear: both initial and boundary condition information will need to be considered when designing operational climate forecasting systems in the future.

## Acknowledgments

We thank Simon Tett who ran the experiments. This work was supported by U.K. Department of the Environment, Transport and the Regions (PECD/7/12/37) and the U.K. Natural Environment Research Council through the COAPEC thematic programme (MC) and an Advanced Research Fellowship (MRA), with additional support from the European Commission QUARCC project.

## Footnotes

*Corresponding author address:* Dr. Matthew Collins, Centre for Global Atmospheric Modelling, Department of Meteorology, University of Reading, P.O. Box 243, Earley Gate, Reading RG6 6BB, United Kingdom. Email: matcollins@met.rdg.ac.uk