Ensemble experiments are performed with five coupled atmosphere–ocean models to investigate the potential for initial-value climate forecasts on interannual to decadal time scales. Experiments are started from similar model-generated initial states, and common diagnostics of predictability are used. We find that variations in the ocean meridional overturning circulation (MOC) are potentially predictable on interannual to decadal time scales, a more consistent picture of the surface temperature impact of decadal variations in the MOC is now apparent, and variations of surface air temperatures in the North Atlantic Ocean are also potentially predictable on interannual to decadal time scales, albeit with potential skill levels that are less than those seen for MOC variations. This intercomparison represents a step forward in assessing the robustness of model estimates of potential skill and is a prerequisite for the development of any operational forecasting system.
Predictions of the future state of the climate system are of potential benefit to society. The ability to predict (here we consider the potential ability to predict) can also give insight into the physical aspects of the climate system that are not simply the averaged or integrated effects of chaotic, unpredictable weather “noise.” Restricting attention to variations in climate that are purely internally generated, predictability in the system hints at processes that have long time scales or that may have periodic behavior. Quantifying the predictability associated with such processes can lead to a greater understanding of the climate system.
Operational predictions of climate on seasonal to interannual time scales associated with the El Niño–Southern Oscillation (ENSO) are now commonplace (e.g., Goddard et al. 2001). Prediction systems for other seasonal–interannual “modes” of climate are also emerging (e.g., Rodwell and Folland 2002). Here we consider the predictability of interannual to decadal variations in the North Atlantic region. On these time scales, both the initial conditions (principally the initial state of the ocean) and the boundary conditions (associated with both natural and anthropogenic forcing of the system) are important (Collins and Allen 2002; Collins 2002), but here we focus solely on the initial value problem of the predictability of internally generated interannual to decadal climate variability.
The Atlantic meridional overturning circulation (MOC) is the main northward heat-carrying component of the ocean part of the climate system (e.g., Trenberth and Caron 2001). Coupled atmosphere–ocean models (AOGCMs) exhibit internally generated variations in the strength of the MOC and associated heat transport (e.g., Dong and Sutton 2001), and the surface climate impact of those variations has also been seen in historical (Latif et al. 2004) and paleoclimatic records (Delworth and Mann 2000). Shorter records of ocean observations (Dickson et al. 1996; Curry et al. 2003; Marsh 2000) also exhibit variations that have been linked with the MOC. Variations in the MOC thus represent an ideal candidate for the study of interannual to decadal climate predictability.
Predictability studies with AOGCMs, in which ensembles of simulations are run with small perturbations to the initial conditions, have revealed potential predictability in these MOC variations and in related surface and atmosphere variables (Griffies and Bryan 1997; Grötzner et al. 1999; Boer 2000; Collins and Sinha 2003; Pohlmann et al. 2004). While all studies show some level of potential predictability, it is difficult to form robust conclusions because of the range of complexity (and hence realism) of the different models used, because of the range of different initial states considered, and because of subtle differences in the measures of predictability employed. For example, it is well known in weather forecasting that predictive skill can vary considerably with different initial conditions. Clearly it is important to quantify the potential skill level of interannual–decadal climate forecasts prior to the expensive development of operational prediction schemes and the deployment of operational observing systems.
Here we present a step forward in making a robust estimate of the potential predictive skill of interannual to decadal climate predictions associated with internally generated variations in the MOC. A coordinated set of potential predictability experiments has been performed with five recently developed complex AOGCMs. An attempt is made to initiate the experiments from similar ocean states, and a common set of measures of potential skill is used. This “multimodel” approach has proved useful in other areas of weather and climate prediction. Here the emphasis is on a comparison of the levels of potential predictability seen in the different models. Other publications discuss the individual model results (e.g., Collins and Sinha 2003; Pohlmann et al. 2004, 2005, manuscript submitted to J. Climate) in more detail.
2. The ensemble experiments
Five coupled atmosphere–ocean models are used (see Table 1), as follows.
The version-3 Action de Recherche Petite Echelle Grande Echelle “ORCA” Louvain-la-Neuve Sea-Ice Model (ARPEGE3-ORCALIM) has an atmospheric component (Déqué et al. 1994) with a horizontal spacing of T63 with 31 levels in the vertical direction (20 in the troposphere). The ocean component, “ORCA2,” is the global configuration of the Océan Parallélisé (OPA8) model (Madec et al. 1998) with a horizontal spacing of 2° in longitude and 0.5° to 2° in latitude. It includes a dynamic–thermodynamic sea ice model (Fichefet and Morales Maqueda 1997). The components are coupled through the Ocean–Atmosphere–Sea Ice–Soil, version 2.5, software interface (OASIS 2.5; Valcke et al. 2000), which ensures the time synchronization and performs spatial interpolation from one grid to another.
The Bergen Climate Model (BCM; Furevik et al. 2003; Bentsen et al. 2004) uses the Miami Isopycnic Coordinate Ocean Model (Bleck et al. 1992) coupled to a dynamic–thermodynamic sea ice module. The ocean mesh is formulated on a Mercator projection with a nominal horizontal spacing of 2.4° and 24 vertical layers. The atmospheric component is version 3 of the ARPEGE model with a horizontal spacing of T63 and 31 layers in the vertical direction, essentially the same atmosphere that is used in ARPEGE3-ORCALIM. Freshwater and heat flux adjustments are applied.
The European Centre for Medium Range Weather Forecasts–Deutsches Klimarechenzentrum Hamburg, version 5/Max Planck Institute Ocean Model (ECHAM5/MPI-OM; Latif et al. 2004) uses the ECHAM5 atmospheric model (Roeckner et al. 2003) at T42 horizontal spacing with 19 vertical layers. The oceanic component, the MPI-OM (Marsland et al. 2003), is run on a curvilinear grid with equatorial refinement and 23 vertical levels. A dynamic–thermodynamic sea ice model and a river-runoff scheme are included.
The Third Hadley Centre Coupled Ocean–Atmosphere GCM (HadCM3; Gordon et al. 2000; Collins et al. 2001) uses an oceanic component with a horizontal spacing of 1.25° longitude by 1.25° latitude and 20 levels in the vertical direction. The atmospheric component uses a gridpoint formulation with a horizontal spacing of 3.75° × 2.5° in longitude and latitude with 19 unequally spaced vertical levels (Pope et al. 2000). A simple thermodynamic sea ice scheme is used.
The Istituto Nazionale di Geofisica e Vulcanologia (INGV) model uses the ECHAM4 model (Roeckner 1996) at T42 resolution with 19 vertical levels. The ocean component is essentially the same as that used in the ARPEGE3-ORCALIM model. More details can be found in Gualdi et al. (2003) and Carril et al. (2004).
Ensemble experiments are performed from initial states of anomalously high and anomalously low MOC taken from a control (i.e., unforced) run of each model (Fig. 1). In addition, some models were used to perform experiments with initial states near the time-mean value of overturning. Perturbations to the initial conditions were made using the common method of taking different atmospheric start conditions (in most cases atmosphere start conditions differ only by one day of model integration) and identical ocean start conditions for the respective model (see, e.g., Collins and Sinha 2003). Hence both atmosphere and ocean initial states are in perfect balance with the model as they are taken from the respective control simulations and are thus solutions of the model equations. While this perturbation methodology is in no way optimal in terms of, for example, sampling the likely range of atmosphere–ocean analysis error, it is sufficient to generate ensemble spread on the time scales of interest. Note that the perturbation method produces ensemble experiments that are likely to give the upper limit of model-world predictability: hence the terms potential predictability and perfect-model or perfect-ensemble experiments.
The availability of computer resources limited the number of ensemble members and experiments that could be performed; nevertheless, all experiments were integrated out to at least 20 yr. The experiments correspond to a total of 1340 simulated years for the predictability experiments combined with a total of 3100 simulated years for the control experiments used to assess background variability. Annual mean diagnostics are examined because of the focus on interannual to decadal time scales.
3. Potential predictability of MOC variations
The first point to note is the wide range of time scales and magnitudes of MOC variability in the different models (Fig. 2). The ECHAM5/MPI-OM model shows the largest variations in MOC strength with clear interdecadal variability present. HadCM3 and BCM also show interdecadal variations but at a reduced level in comparison. The ARPEGE3-ORCALIM model has the lowest level of variability, but decadal–interdecadal time scales are still clearly present in the time series. The large trend seen in the INGV model is almost certainly due to a drift seen in this particular control experiment—the model has yet to reach equilibrium, and we do not attempt to extract quantitative measures of predictability. Although not calculated, diagnostic measures of predictability/variability (e.g., Boer 2000) would clearly show a range of different levels of MOC potential predictability in these models. However, the only reliable way to assess potential predictability is to perform ensemble experiments.
The perfect-ensemble experiments are also shown in Fig. 2. Potential predictability is evident when the ensemble spread is small in comparison with the total level of variability in the control time series, or even if the ensemble spread is relatively large but the center of gravity of the ensemble is displaced significantly with respect to the mean of the control (e.g., Collins 2002). We may imagine a background or climatological distribution that, in the absence of a forecast, would be all the information we would have to form an assessment of the future strength of the MOC. Alternatively, we may imagine a form of damped persistence based on the autocorrelation structure of past observations. A forecast may allow us to reduce the potential range (low ensemble spread) or shift the mean of the distribution (displaced ensemble), or both. Both types of (potential) predictability are seen on interannual to decadal time scales in the experiments shown in Fig. 2. For example, the first HadCM3 ensemble (anomalously strong MOC initial conditions) has relatively small ensemble spread in the first decade of the experiment and the ensemble is significantly shifted to stronger values with respect to the mean with no ensemble members indicating weaker than average overturning [see Collins and Sinha (2003) for more details]. Other examples are clear.
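The two kinds of potential predictability described above (reduced spread and a displaced ensemble mean), together with a damped-persistence baseline, can be illustrated with a minimal Python sketch. The function names, the lag-1 autoregressive form assumed for damped persistence, and the toy numbers are all assumptions made for illustration; none of them are taken from the experiments.

```python
import statistics


def spread_ratio(ensemble, control):
    """Ensemble standard deviation at one lead time divided by the
    standard deviation of the control series. Values well below 1 mean
    the forecast narrows the climatological range."""
    return statistics.pstdev(ensemble) / statistics.pstdev(control)


def mean_shift(ensemble, control):
    """Displacement of the ensemble mean from the control mean,
    in units of the control standard deviation."""
    shift = statistics.fmean(ensemble) - statistics.fmean(control)
    return shift / statistics.pstdev(control)


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a series about its own mean."""
    mean = statistics.fmean(series)
    anomalies = [x - mean for x in series]
    numerator = sum(a * b for a, b in zip(anomalies[:-1], anomalies[1:]))
    denominator = sum(a * a for a in anomalies)
    return numerator / denominator


def damped_persistence(initial_anomaly, r1, lead):
    """Assumed AR(1)-style baseline: the anomaly decays as r1**lead."""
    return initial_anomaly * r1 ** lead


# Toy annual-mean MOC values in Sv (invented for illustration only).
control = [18.0, 19.0, 20.5, 21.0, 20.0, 18.5, 17.0, 17.5, 19.0, 20.0]
ensemble_at_lead_5 = [20.8, 21.2, 20.5, 21.5]

print("spread ratio:", spread_ratio(ensemble_at_lead_5, control))
print("mean shift:", mean_shift(ensemble_at_lead_5, control))
r1 = lag1_autocorrelation(control)
print("damped persistence:", [damped_persistence(2.0, r1, k) for k in range(4)])
```

In this toy case the ensemble is both narrower than the control distribution and displaced toward stronger overturning, so both types of potential predictability would be present at that lead time.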
There is a wide range of measures that may be used for forecast verification (as stated above, we measure the potential skill of a perfect model forecast, which is an upper limit). We examine two of the simplest measures of forecast skill to quantify levels of potential predictability: the anomaly correlation coefficient (ACC) and the normalized root-mean-square error (rmse). Formulas are given in Collins (2002).
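As a rough illustration of how such measures can be evaluated in a perfect-ensemble setting, the sketch below treats each ensemble member in turn as the "truth" and uses the mean of the remaining members as the forecast. This is one plausible formulation, assumed here for illustration; the exact formulas are those of Collins (2002) and may differ in detail (for example, in the normalization of the rmse). All names and numbers are hypothetical.

```python
import math
import statistics


def perfect_ensemble_pairs(experiments):
    """Yield (forecast, truth) pairs at one lead time. Each member is
    treated in turn as 'truth'; the forecast is the mean of the other
    members of the same experiment."""
    for members in experiments:
        for i, truth in enumerate(members):
            others = members[:i] + members[i + 1:]
            yield statistics.fmean(others), truth


def anomaly_correlation(experiments, clim_mean):
    """Uncentred correlation between forecast and truth anomalies,
    with anomalies taken about the control (climatological) mean."""
    pairs = [(f - clim_mean, t - clim_mean)
             for f, t in perfect_ensemble_pairs(experiments)]
    cross = sum(f * t for f, t in pairs)
    norm = math.sqrt(sum(f * f for f, _ in pairs)
                     * sum(t * t for _, t in pairs))
    return cross / norm


def normalized_rmse(experiments, clim_std):
    """Root-mean-square forecast error divided by the control standard
    deviation (an assumed normalization)."""
    pairs = list(perfect_ensemble_pairs(experiments))
    mse = sum((f - t) ** 2 for f, t in pairs) / len(pairs)
    return math.sqrt(mse) / clim_std


# Toy MOC values (Sv) at a single lead time, invented for illustration:
# one strong-MOC and one weak-MOC experiment with four members each.
experiments = [[21.0, 21.4, 20.8, 21.2], [17.2, 16.8, 17.5, 17.0]]
acc = anomaly_correlation(experiments, clim_mean=19.0)
nrmse = normalized_rmse(experiments, clim_std=1.3)
print("ACC:", acc, "normalized rmse:", nrmse)
```

With this normalization, a normalized rmse approaching or exceeding 1 means the forecast is no better than the climatological mean. Repeating the calculation at each lead time would give curves of ACC and rmse against forecast year.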
Figure 3 shows both measures for the MOC in the ensemble experiments discussed above. For the strong MOC initial states, the ACC is “high” for approximately the first decade in all the model experiments, with high being above 0.6—a commonly used cutoff value in weather forecasting. The rmse is correspondingly low. After the first decade, the ARPEGE3 model predictability drops off rapidly whereas for the other models the ACC drops off slowly to low values by the end of the 20-yr experiments. The rmse similarly saturates in 20 yr. For the weak MOC initial states, error growth and loss of predictability seem to happen sooner in the ensemble experiments, although there is some noise in these measures because of small ensemble sizes. ACC and rmse are not shown for the normal initial states because of the small sample size.
While the number of ensemble experiments is small, we may attempt to draw some conclusions about the multimodel estimate of potential predictability of MOC variability in these experiments (Fig. 3, thick solid line). The multimodel ensemble indicates potential predictability of interannual–decadal MOC variations for one to two decades into the future. It also indicates that initial states that have anomalously strong overturning are more predictable than those with anomalously weak overturning. This latter result is intriguing but is subject to some uncertainty because of the relatively small number of models and ensemble experiments included in the multimodel analysis. Nevertheless, some consensus is emerging, in contrast to the previous situation in which a large range of predictability was seen in the literature. It would be safe to conclude that there is a robust signal of potential predictability of variations in the MOC on interannual to decadal time scales.
4. Potential predictability of surface climate variations
Predictions of MOC variability may be of interest to scientists, but they would be of little relevance to society unless they are accompanied by predictions of surface climate variables. A simple measure of the impact of MOC variations can be obtained by performing a regression between decadal-averaged MOC strength and decadal-averaged surface air temperature (SAT) in the different models (Fig. 4). The general impression in all the models is of a warmer Northern Hemisphere when the MOC is stronger and is transporting more heat poleward. Differing levels of statistical significance seen in Fig. 4 may be interpreted as resulting from different levels of signal-to-noise in the sense that in models with larger variations in MOC, the surface signal has a better chance of overwhelming the noise of unrelated random climate variations. What is interesting is that the magnitude of the surface response (in kelvins per Sverdrup) is similar across all models.
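The regression described above can be sketched for a single grid point as follows. This is a minimal illustration, assuming non-overlapping 10-yr block means and an ordinary least-squares slope; the averaging and regression details actually used for Fig. 4 are not specified here, and the synthetic series and function names are hypothetical.

```python
import math
import statistics


def block_means(series, block=10):
    """Non-overlapping block averages (e.g., decadal means of annual
    data). Any incomplete trailing block is dropped."""
    n_blocks = len(series) // block
    return [statistics.fmean(series[i * block:(i + 1) * block])
            for i in range(n_blocks)]


def regression_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Synthetic 40-yr annual series (invented): MOC in Sv with a slow
# oscillation, and SAT in K responding at 0.1 K per Sv plus
# year-to-year "noise" that averages out within each decade.
years = range(40)
moc = [18.0 + 2.0 * math.sin(2.0 * math.pi * t / 40.0) for t in years]
sat = [286.0 + 0.1 * (m - 18.0) + 0.05 * (-1) ** t
       for t, m in zip(years, moc)]

slope = regression_slope(block_means(moc), block_means(sat))
print(f"SAT response: {slope:.3f} K per Sv")
```

Applying the same calculation at every grid point would give a map of the surface response per unit change in MOC strength, which is the kind of quantity compared across models.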
The North Atlantic Ocean is a region in all the models in which there is a significant relationship between decadal variations in SAT (and underlying SST) and the MOC. Time series of annual mean SAT from the control and ensemble experiments averaged over a region of the North Atlantic [used in Collins and Sinha (2003) and Pohlmann et al. (2004)] are shown in Fig. 5. Strong similarities between these time series and those shown in Fig. 2 for the MOC are evident, although there is clearly more noise in this variable as a result of unrelated random atmospheric variability.
ACC and rmse measures of ensemble spread (Fig. 6) for North Atlantic SAT are similar to those computed for MOC variations (Fig. 3), but the levels of potential predictability are clearly less and the differences between ensemble members greater. It may be possible to find greater levels of potential predictability for each individual model by adjusting the boundaries of the region chosen, but here we compare the models on an equal footing. Also, the effects of interannual noise, which are more prominent in this variable, may be reduced by taking averages over a greater number of years. Nevertheless, the picture of potentially predictable surface climate variations associated with variations in the MOC appears consistent.
Whereas previously it has been difficult to assess the potential for making interannual to decadal forecasts of climate because of different studies indicating different levels of predictability, a more complete picture of the predictability is emerging. This intercomparison study shows that
variations in the ocean meridional overturning circulation are potentially predictable on interannual to decadal time scales,
a more consistent picture of the surface temperature impact of decadal variations in the MOC is now apparent, and
variations of surface air temperatures in the North Atlantic are also potentially predictable on interannual to decadal time scales, albeit with potential skill levels that are less than those seen for MOC variations.
Perhaps the biggest difference between the models is in the wide range of strengths of decadal variability evident in Fig. 2. In general, models with greater decadal MOC variability have greater levels of potential predictability—despite the fact that the ACC and rmse are signal-to-noise measures and thus allow for differences in background natural variability. Investigation into the mechanisms responsible for the different levels of variability would seem to be a priority.
In any real-world prediction system, an estimate of the three-dimensional ocean state would have to be made using data assimilation. Currently there are significant disagreements between estimates of even the mean value of the overturning using such techniques, ranging from as little as 12 Sv (1 Sv ≡ 10⁶ m³ s⁻¹; Stammer et al. 2002) to as much as 25 Sv [S. Masina et al. 2005, unpublished result based on system outlined in Masina et al. (2004)], both being consistent with observational estimates. Hence estimating anomalous interannual to decadal variations about this mean may seem almost impossible, particularly given the relative paucity of in situ ocean observations. We may take some hope though from ocean-model simulations driven by estimates of observed surface winds and fluxes that generally show agreement between their MOC variations over the latter part of the twentieth century (Bentsen et al. 2004). It may be the case that more accurate and balanced reconstructions of past variations in surface fluxes (e.g., from weather forecast reanalysis products) coupled with recently deployed ocean observations (Hirschi et al. 2003) and improved models and data assimilation schemes could provide accurate ocean initial states with which to initialize forecasts. Pilot forecast systems are in development.
The far more pertinent question is, of course, that of the (potential) prediction of surface climate variations over land. The simple measures used in this study do not reveal robustly predictable land signals. Collins and Sinha (2003) and Pohlmann et al. (2005, manuscript submitted to J. Climate) investigate probabilistic techniques more commonly used in medium-range and seasonal forecasting in the context of the interannual–decadal problem with some limited success. However, the application and verification of such measures (here the assessment of potential skill) requires much larger ensemble sizes and many more ensemble simulations than used here. It is hoped that such ensembles will be performed in future. In addition, the modeling, initialization, and observational issues that need to be addressed before we routinely produce interannual–decadal climate forecasts are numerous.
This work was performed while the lead author was based at the Centre for Global Atmospheric Modelling, Department of Meteorology, University of Reading. All authors received support from the EU FP5 PREDICATE project (EVK2-CT-1999-00020) and from national sources.
Corresponding author address: Dr. Matthew Collins, Hadley Centre, Met Office, Exeter, Devon EX1 3PB, United Kingdom.