## Abstract

Seasonal forecast of climate anomalies holds the prospect of improving agricultural planning and food security, particularly in the low latitudes where rainfall represents a limiting factor in agrarian production. Present-day methods are usually based on simulated precipitation as a predictor for the forthcoming rainy season. However, climate models often have low skill in predicting rainfall due to the uncertainties in physical parameterization. Here, the authors present an extended statistical model approach using three-dimensional dynamical variables from climate model experiments, such as temperature, geopotential height, wind components, and atmospheric moisture. A cross-validated multiple regression analysis is applied in order to fit the model output to observed seasonal precipitation during the twentieth century. This model output statistics (MOS) system is evaluated in various regions of the globe with potential predictability and compared with the conventional superensemble approach, which uses the same variable for predictand and predictors.

It is found that predictability is highest in the low latitudes. Given the remarkable spatial teleconnections in the Tropics, a large number of dynamical predictors can be determined for each region of interest. To avoid overfitting in the regression model an EOF analysis is carried out, combining predictors that are largely in-phase with each other. In addition, a bootstrap approach is used to evaluate the predictability of the statistical model. As measured by different skill scores, the MOS system reaches much higher explained variance than the superensemble approach in all considered regions. In some cases, predictability only occurs if dynamical predictor variables are taken into account, whereas the superensemble forecast fails. The best results are found for the tropical Pacific sector, the Nordeste region, Central America, and tropical Africa, amounting to 50% to 80% of total interannual variability. In general, the statistical relationships between the leading predictors and the predictand are physically interpretable and basically highlight the interplay between regional climate anomalies and the omnipresent role of El Niño–Southern Oscillation in the tropical climate system.

## 1. Introduction

Freshwater availability is a limiting factor for agriculture and food security in many parts of the globe, particularly in the belt of arid and semiarid climates. At the same time, these regions are subject to large rainfall variability at intraseasonal to decadal time scales. A well-documented example is sub-Saharan West Africa, where drought anomalies have prevailed for many years since the late 1960s (Nicholson et al. 2000), leading to tremendous economic loss and a general deterioration of life conditions, especially in the Sahel (Benson and Clay 1998). Findley (1994) also pointed to the relationship between large-scale migration processes and drought. Other parts of the low latitudes, such as India, the Nordeste region in Brazil, Central America, and tropical East Africa, experience similar problems. In addition, many regions with scarce freshwater availability are characterized by strong population growth and an increasing need for agricultural products. Thus, anticipating whether the forthcoming rainy season will be abundant or deficient is of basic interest to agricultural planning and may help to improve food security and human welfare in those countries (Dodd and Jolliffe 2001; Tarhule and Lamb 2003).

While seasonal forecasting in the extratropics is largely obstructed by the predominating effect of stochastic high-frequency variability in the atmosphere (e.g., Rodwell et al. 1999), the low latitudes seem to be favored for longer-range weather prediction up to several months into the future (Reichler and Roads 2004). This arises from the fact that atmospheric dynamics in the Tropics are closely tied to variations in sea surface temperatures (SSTs). These are characterized by a long memory (Colman and Davey 2003; Paeth and Hense 2003; Repelli and Nobre 2004). Most regions are primarily affected by the tropical Pacific basin and the El Niño–Southern Oscillation (ENSO) phenomenon. This holds for tropical South America, Central America, and the Indian summer monsoon region (Bertacchi et al. 1998; Latif and Grötzner 2000; Sutton et al. 2000; Taphyal and Rajeevan 2003; Webster et al. 1998). The African continent represents a special case, since it is influenced by variations in all three tropical ocean basins (Paeth and Friederichs 2004). Besides the ENSO effect, which is less robust in this part of the low latitudes (Camberlin et al. 2001; Nicholson et al. 2000), West African precipitation is directly related to SST anomalies in the tropical Atlantic (Chang et al. 2000; Su et al. 2001; Vizy and Cook 2001), providing an excellent predictor for seasonal forecast in the West African monsoon region (Paeth and Hense 2003; Tarhule and Lamb 2003). The Indian Ocean dipole also plays a role in West and, especially, East African rainfall fluctuations (Bader and Latif 2003; Black et al. 2003; Latif et al. 1999). Some SST impact is even found over northwest Africa (Rodriguez-Fonseca and de Castro 2002), although this subtropical region is mainly governed by extratropical atmospheric circulation (Cullen and de Menocal 2000; Knippertz et al. 2003). Finally, Mediterranean SST changes directly influence precipitation anomalies in the Sahel (Rowell 2003). 
The response of the Indian monsoon to SST variations is somewhat less pronounced than in Africa (May 2003). However, there is a close relationship between the strength of the Indian summer monsoon season and land surface conditions over Eurasia like snow cover and soil moisture with several-months lead time (Klaßen et al. 1994; Robock et al. 2003).

Present-day approaches in seasonal forecasting either use statistical predictors from SST or land surface characteristics (Taphyal and Rajeevan 2003) or numerical climate model simulations (Krishnamurti et al. 1999; Mo and Thiaw 2002). Either a coupled climate model or a statistical prediction scheme produces SST anomalies until the end of the forthcoming rainy season (Colman and Davey 2003; Repelli and Nobre 2004), which are then taken as lower boundary conditions in atmospheric circulation models (Clark and Déqué 2003; Garric et al. 2002). Krishnamurti et al. (1999) have described an improved forecast system, which is based on a large number of ensemble simulations from different atmospheric general circulation models (GCMs). For the forecast, the individual climate models are weighted according to their reliability in the past, which is inferred from a multiple linear regression analysis between various simulated rainfall estimates as predictors and observed rainfall as predictand. This so-called superensemble forecast is found to be more skillful than the forecast of the single climate model that best fitted the observations in the past.
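The weighting step of such a superensemble can be illustrated with a small least-squares sketch. The following Python fragment is a minimal illustration, not the authors' implementation: all data are synthetic, the number of models is arbitrary, and the model weights are simply ordinary least-squares regression coefficients of observed rainfall on the individual model hindcasts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: observed seasonal rainfall anomalies and
# hindcasts from four hypothetical climate models (all values invented).
n_years, n_models = 40, 4
obs = rng.standard_normal(n_years)
models = obs[:, None] * rng.uniform(0.2, 0.9, n_models) \
         + 0.5 * rng.standard_normal((n_years, n_models))

# Superensemble calibration: regress the observations on the model
# hindcasts (plus an intercept) over the training period.
X = np.column_stack([np.ones(n_years), models])
weights, *_ = np.linalg.lstsq(X, obs, rcond=None)

# The weighted combination is the superensemble hindcast.
superensemble = X @ weights

# Explained variance of the calibrated forecast on the training data.
r2 = 1.0 - np.sum((obs - superensemble) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

In an operational setting the weights would of course be estimated on past seasons only and then applied to the forecasts for the forthcoming season.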

However, there is a major limiting factor in this procedure: using simulated precipitation as a predictor for the forthcoming rainy season (Feddersen et al. 1999; Kang et al. 2004) holds the risk that this variable is substantially biased due to uncertainties and deficiencies in the representation of subgrid-scale cloud and rainfall processes (Errico et al. 2001). Furthermore, most climate models do not account for local feedbacks and interactions with land surface characteristics like vegetation cover, albedo, and soil processes, which, besides SST variations, play a key role in tropical precipitation anomalies (Nicholson 2001; Zeng et al. 1999; Zeng and Neelin 2000). One way out of this dilemma is a statistical recalibration of the simulated rainfall data, using various dynamical variables of the same model simulation (Hansen and Emanuel 2003). The basic assumption is that state-of-the-art climate models are much more realistic in terms of the large-scale atmospheric dynamics than with respect to precipitation. Many studies have shown that even some smaller-scale features of rainfall amount and distribution are intimately linked to large-scale circulation phenomena like the North Atlantic Oscillation (NAO) or the monsoon systems (Cullen and de Menocal 2000; Knippertz et al. 2003; Long et al. 2000; Saha and Saha 2001). Therefore, the dynamical model output can be fitted to observed precipitation in order to obtain statistical transfer functions, which can be extrapolated to periods for which observational data are not available, for instance, during the forthcoming rainy season. Such model output statistics (MOS) were first introduced by Glahn and Lowry (1972) in numerical weather forecasting. MOS applications for simulated precipitation in South Africa and South Asia are described by Bartman et al. (2003) and Tippett et al. (2003), respectively. Craig et al. (2001) even suggest that any kind of model output should be statistically postprocessed before using the data for follow-up studies and applications.

In the present study, we describe an MOS approach that is dedicated to predict seasonal precipitation in various regions of the globe. The predictand regions are selected according to their sensitivity to oceanic forcing. A cross-validated stepwise multiple regression analysis is applied to a large number of climate variables from a six-member ensemble of long-term SST-forced climate model experiments and various observational datasets. The predictor variables include temperature, precipitation, geopotential, wind components, atmospheric humidity, and SST, partly in several tropospheric levels. Given the large number of predictors and the distinct teleconnections in tropical climate (Klein et al. 1999), an empirical orthogonal function (EOF) analysis is carried out for each set of predictors prior to computing the MOS equation in order to avoid statistical overfitting of the regression model. The results are compared with the classical superensemble approach by Krishnamurti et al. (1999), albeit considering a smaller number of different climate models. The question whether the MOS system provides some added value with respect to the superensemble forecast is addressed by computing two skill scores, one for the general skill of the forecast and one for different magnitudes of climate extremes. The skill scores are finally compared between both forecast strategies, MOS and superensemble. The present analysis is restricted to a hindcast approach, training and evaluating the MOS system with data during the twentieth century. However, under the assumption of reasonably well predicted SSTs in the tropical oceans (Colman and Davey 2003; Repelli and Nobre 2004), the MOS system could actually be used for operational seasonal forecasting in many regions of the low latitudes, provided that the atmospheric processes are predominantly governed by tropical SST rather than by extratropical dynamics.

The paper is organized as follows: Section 2 lists the considered observational and model datasets. The combined EOF–MOS approach is described in section 3. Sections 4 and 5 are dedicated to the selection of predictand regions and predictor variables, respectively. The results of both forecast approaches, superensemble and MOS, are presented in section 6. The evaluation of the forecasts is addressed in section 7, and the main conclusions of this analysis are drawn in section 8.

## 2. Datasets

### a. Observational data

Depending on the predictand variables, different observational datasets are used (Table 1). The main focus of this study is on seasonal rainfall, since it is usually of highest relevance to socioeconomic systems in the low latitudes. We rely on the Climatic Research Unit (CRU) precipitation dataset, which is based on a large number of available station data interpolated to a regular 0.5° grid (New et al. 2000). It covers all landmasses except Antarctica during the period 1901 to 1998 in monthly resolution. Of course, statistical interpolation cannot generally account for the spatial heterogeneity in orographic terrain and in regions with pronounced interactions with land surface conditions, especially in areas with low station density. However, it appears that the CRU dataset is more appropriate for describing rainfall variability in the Tropics, at least in Africa (Poccard et al. 2000), than reanalysis products, because the latter suffer from inaccuracies of the assimilating model in the low latitudes (Lim and Ho 2000; Trenberth et al. 2001).

In addition, two dynamical predictands are taken into account: the zonal wind over northeastern Africa (WNA) and the dynamical monsoon index (DMI) (Stephenson et al. 2001). The WNA is included because it may itself represent a further precursor of West African rainfall. Many authors have shown that Sahelian and Guinean coast precipitation is tied to the midtropospheric African easterly jet (AEJ) and African easterly waves (AEW) (Druyan et al. 1997; Grist and Nicholson 2001; Hastenrath 2000). The DMI is an alternative measure of the strength of the Indian summer monsoon (Stephenson et al. 2001). It represents the vertical shear of zonal wind between the lower and upper troposphere averaged over India and may be more strongly embedded in the tropical teleconnections than Indian monsoon precipitation. WNA and DMI are taken from the National Centers for Environmental Prediction (NCEP) reanalyses in monthly time resolution, albeit from a former version extending from 1958 to 1999. The Global Sea Ice and Sea Surface Temperature dataset (GISST) provides observed monthly-mean SSTs, which serve both as predictors for the MOS approach and as lower boundary conditions in the considered atmospheric climate model experiments (see following subsection).

### b. Model data

The predictors are predominantly derived from a six-member ensemble of ECHAM4 simulations (Roeckner et al. 1996), which are driven by observed SSTs and sea ice margins for the period 1903 to 1994 using the GISST2.2 dataset (Parker and Jackson 1995) (Table 2). The model is run in T42 truncation. The ECHAM4 ensemble data are used to define the predictand regions, which are characterized by a strong SST signal (see section 3), and the dynamical predictors for the MOS system. Six variables, mostly in eight atmospheric levels, are considered: monthly precipitation, temperature (including SST), geopotential height, specific humidity, and horizontal wind components in 1000, 850, 700, 600, 500, 300, 200, and 150 hPa.

For the superensemble approach a relatively small multimodel ensemble of rainfall data is used, consisting of the ECHAM4 precipitation mentioned above, two ECHAM3 ensembles (Hense and Römer 1995; Roeckner et al. 1992), and six HADAM2 ensemble runs (Rodwell et al. 1999). Besides the use of a multimodel ensemble, this approach is similar to Kang et al. (2004) and Feddersen et al. (1999). Only those model fields that correspond to the predictand are selected as predictors, for example, simulated precipitation as predictor for observed precipitation. For the MOS approach the selection is more general: in principle, the complete model state is taken into account for the regression with precipitation. As this is a strongly ill-posed problem, an a priori selection of the predictors is performed as described below. All climate models are forced with the same SST data. Note that the horizontal resolutions and integration periods differ from model to model (Table 2). All climate models have been referred to in a variety of climate research studies and reproduce the observed rainfall characteristics in a reasonable way (Hense and Römer 1995; Paeth and Hense 2004; Rodwell et al. 1999). In general, the more recent ECHAM4 version provides a more reliable representation of rainfall than the former coarser-grid ECHAM3 version (Paeth and Hense 2004).

## 3. Methods

Although the predictands are derived from the observations, the definition of the predictand regions is based on the six-member ECHAM4 ensemble. The idea is to select only those regions of the globe that are substantially affected by changes in the global SST field in order to ensure that the predictands profit from the long-term memory of the oceanic component. The contribution of the SST forcing to total precipitation variability can be quantified and tested by an analysis of variance (ANOVA) (von Storch and Zwiers 1999). Given the data *X*_{jk} in model run *j* at time *k*, the linear model

*X*_{jk} = *μ* + *β*_{k} + *ɛ*_{jk}

can be set up with the overall mean *μ*, the impact of the common forcing *β*_{k} (in this case SST), and the unpredictable part *ɛ*_{jk}, which in this case arises from internal atmospheric variability imposed by varied initial conditions in the individual ensemble members. Square sum decomposition leads to the portion SS_{β} of variance explained by SST variations. Under the Null hypothesis that all *β*_{k} vanish, the ratio of SS_{β} to the residual square sum, each divided by appropriately chosen degrees of freedom, is a Fisher *F*-distributed random variable. SS_{β} is calculated separately for each model grid box, and the boundaries of the predictand regions are defined by large values of SS_{β}. The predictand time series are computed by averaging the observed rainfall data over the predefined regions.
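The square sum decomposition can be sketched in a few lines of Python. This is a minimal illustration under simplified assumptions (synthetic data for a single grid box, balanced one-way design); the ensemble size, year count, and noise amplitude are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ensemble: J members, K years; a common SST-driven signal
# beta_k plus internal noise eps_jk (all amplitudes illustrative).
J, K = 6, 90
beta = rng.standard_normal(K)                 # common forcing per year
X = beta[None, :] + 0.8 * rng.standard_normal((J, K))

# One-way ANOVA: split total variance into the ensemble-mean (forced)
# part and the within-ensemble (internal) part.
grand_mean = X.mean()
ens_mean = X.mean(axis=0)                     # estimate of mu + beta_k
ss_beta = J * np.sum((ens_mean - grand_mean) ** 2)
ss_eps = np.sum((X - ens_mean[None, :]) ** 2)
ss_total = np.sum((X - grand_mean) ** 2)

explained = ss_beta / ss_total                # fraction forced by SST
# F statistic with (K - 1) and K * (J - 1) degrees of freedom.
F = (ss_beta / (K - 1)) / (ss_eps / (K * (J - 1)))
```

Comparing `F` against the corresponding Fisher quantile then yields the significance mask used to delimit the predictand regions.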

The predictors are determined by linear correlation between the observed predictand time series and the simulated variables at each grid point and atmospheric level. In the final MOS approach, only those predictors are considered for which the correlation coefficient is statistically significant at the 5% or 1% level.

The MOS system uses a stepwise multiple regression analysis (von Storch and Zwiers 1999). All predictand and predictor time series are standardized in order to balance out the different units and amplitudes of variability. Moreover, a trend polynomial of fourth order is removed, ensuring that correlation does not spuriously arise from coinciding long-term trends, since this study is focused on the interannual variations.
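The preprocessing of each time series might look as follows; this is a minimal numpy sketch with an invented series, using a fourth-order polynomial fit as described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative annual series (1901-1998) with a slow trend plus
# interannual noise; values are invented.
years = np.arange(1901, 1999)
series = 0.02 * (years - years.mean()) + rng.standard_normal(years.size)

# Remove a fourth-order trend polynomial, then standardize, so that
# only interannual variations enter the regression.
t = years - years.mean()            # centring improves conditioning
coeffs = np.polyfit(t, series, deg=4)
detrended = series - np.polyval(coeffs, t)
standardized = (detrended - detrended.mean()) / detrended.std()
```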

We use two strategies to avoid overfitting of the MOS system: 1) An EOF analysis is applied to the entire set of model predictors (von Storch and Zwiers 1999), combining predictors that are linearly dependent, for instance, temperature gradients and wind components or SST and geopotential height in 1000 hPa. The MOS is then built from the principal component (PC) time series of the EOFs instead of the original predictors from the model fields. For the interpretation of the results the PC predictors are transformed back from the EOF space into the grid point space. 2) A cross validation is used to cut off the list of predictors if additional predictors do not add further information to the MOS equation, as measured against an independent dataset (Michaelsen 1987). For this purpose, six so-called bootstrap years are withheld from the time series prior to computing the EOFs and the stepwise regression. The MOS equation is evaluated with respect to these six independent years: an additional predictor is only accepted if the root-mean-square error (rmse) between estimated and withheld predictands of the independent data decreases compared with the preceding regression step. The bootstrap approach is repeated 1000 times, each time withholding six different years selected by a random process. This leads to a probabilistic distribution of the MOS results over the 1000 iterations. The accepted predictors and the mean and 95% confidence intervals of the corresponding portion of explained variance are documented and illustrated in section 6. Note that the EOF analysis is also carried out 1000 times; it is ensured that the EOFs of the individual bootstrap iterations are comparable with each other.
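A single bootstrap iteration of this combined EOF plus cross-validated stepwise procedure can be sketched as follows. This is a simplified illustration, not the original code: predictors and predictand are synthetic, the EOFs are computed by SVD on the training years only, and PCs are added in order of explained predictor variance until the rmse on the six withheld years stops decreasing.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic standardized data: 90 years, 12 candidate predictors, of
# which the first two carry the signal (purely illustrative).
n, p = 90, 12
X = rng.standard_normal((n, p))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.4 * rng.standard_normal(n)

def stepwise_cv(X, y, n_holdout=6, rng=rng):
    """One bootstrap iteration: withhold years, build EOFs/PCs on the
    training set, add PC predictors only while the holdout rmse drops."""
    test = rng.choice(len(y), size=n_holdout, replace=False)
    train = np.setdiff1d(np.arange(len(y)), test)
    # EOF/PC decomposition of the training predictors via SVD.
    Xt = X[train] - X[train].mean(axis=0)
    _, _, Vt = np.linalg.svd(Xt, full_matrices=False)
    pcs_train = Xt @ Vt.T
    pcs_test = (X[test] - X[train].mean(axis=0)) @ Vt.T
    best_rmse, n_used = np.inf, 0
    for k in range(1, pcs_train.shape[1] + 1):
        A = np.column_stack([np.ones(len(train)), pcs_train[:, :k]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        At = np.column_stack([np.ones(n_holdout), pcs_test[:, :k]])
        rmse = np.sqrt(np.mean((y[test] - At @ coef) ** 2))
        if rmse >= best_rmse:         # cut off: no further improvement
            break
        best_rmse, n_used = rmse, k
    return n_used, best_rmse

n_used, rmse = stepwise_cv(X, y)
```

In the full procedure this iteration would be repeated 1000 times with freshly drawn holdout years, yielding a distribution of accepted predictors and explained variance.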
We compare two forecast approaches with each other: 1) the multiple regression analysis is applied to the ECHAM4 ensemble, including all dynamical predictors and precipitation, preceded by an EOF analysis (MOS); 2) the method is applied to the superensemble rainfall from four climate models, without EOF decomposition. The goal is to assess to what extent the MOS provides added value over the classical seasonal forecasting strategy. A more detailed description of the method can be found in Paeth and Hense (2003).

Both forecast approaches, MOS and superensemble, are evaluated and compared by means of two different skill scores:

- The Brier skill score (BSS) measures the increase in explained variance by the MOS or superensemble forecast with respect to the climatological forecast, that is, taking the long-term mean of a time series as forecast for a future anomaly (von Storch and Zwiers 1999). The BSS is not automatically related to a statistical test. Whether a certain increase in explained variance is valuable or not may depend on the specific problem, for instance food security in sub-Saharan Africa or flood risk in middle Europe, and on the technical or logistic costs that are caused by an improvement of the forecast system. One may construct a Null hypothesis based on the practical benefit of the forecast, but this is far beyond the scope of this paper. Therefore, we consider the BSS as a relative measure of the forecast skill, comparing both forecast approaches with each other, but not interpreting whether a certain increase in explained variance may lead to socioeconomic benefit against the background of enhanced costs. The BSS is calculated by relating the rmse between forecast *Ŷ*^{B} and predictand *Y*^{B} to the rmse between climatological mean *a*_{0}^{B} and predictand, both referring to the independent data of the six bootstrap years subscripted by *B*: BSS = 1 − rmse(*Ŷ*^{B}, *Y*^{B}) / rmse(*a*_{0}^{B}, *Y*^{B}). For a perfect forecast, the BSS equals 1. Values larger (smaller) than zero indicate that the forecast *Ŷ*^{B} provides some (no) added value compared with the climatological forecast *a*_{0}^{B}.
- The log-odds ratio provides a more sophisticated skill analysis of a forecast system (Stephenson 2000). For continuous variables it can be calculated for different thresholds, e.g., of observed and simulated rainfall amount, and discussed as a function of these. Thus, it draws a more differentiated picture of the forecast skill for weak and strong negative and positive anomalies around the long-term mean. It is conceivable that a forecast system is very skillful for weak anomalies, whereas extremes are hardly captured. This may still lead to a high BSS, since weak anomalies predominate, but the correct forecast of extremes may be much more relevant in terms of socioeconomic benefit. The log-odds ratio is based on simple counting statistics, which can be arranged in a contingency table with four entries: the number *a* represents all years or other time steps where both the forecast *f* and the observations *o* exceed a given threshold *T*_{i}; it determines the hit rate *H*. Accordingly, the false alarm rate *F* summarizes all cases in *b*, where the forecast is above and the observations below the threshold. The opposite is true for the "miss" *c*, while *d* counts the "correct rejection," implying that both forecast and observations are below the threshold. The odds ratio is given by *θ* = (*â* · *d̂*)/(*b̂* · *ĉ*), with *â* = *a* − *δa* and *δ* = (*b* − *c*)/(*a* + *b*) (and likewise for all four entries) being the so-called hedging correction accounting for systematic model errors. The logarithmic odds (log-odds) ratio is an asymptotically normally distributed random variable on condition that *a*, *b*, *c*, *d* > 5, allowing an estimate of statistical significance of the forecast skill at a given threshold *T*_{i}. We examine thresholds between ±0.2 and ±1.4 standard deviations in order to evaluate also the predictability of large rainfall anomalies.
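Both skill measures can be illustrated with a short sketch. The data here are synthetic, the climatological forecast is taken as zero because the series are assumed standardized, and the hedging correction of the contingency table entries is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative standardized observations and a correlated hindcast
# pooled over many bootstrap evaluations (all values invented).
obs = rng.standard_normal(600)
fct = 0.7 * obs + 0.5 * rng.standard_normal(600)

# Brier skill score relative to the climatological forecast a0 = 0
# (the series are standardized, so the climatological mean is zero).
rmse_fct = np.sqrt(np.mean((fct - obs) ** 2))
rmse_clim = np.sqrt(np.mean(obs ** 2))
bss = 1.0 - rmse_fct / rmse_clim

def log_odds(fct, obs, threshold):
    """Log-odds ratio for exceedances of a threshold, from the 2x2
    contingency table: a hits, b false alarms, c misses, d rejections."""
    a = np.sum((fct > threshold) & (obs > threshold))
    b = np.sum((fct > threshold) & (obs <= threshold))
    c = np.sum((fct <= threshold) & (obs > threshold))
    d = np.sum((fct <= threshold) & (obs <= threshold))
    return np.log((a * d) / (b * c))

lo = log_odds(fct, obs, 0.2)   # skill for weak positive anomalies
```

Positive values of `lo` indicate skill above random guessing at that threshold; scanning thresholds from ±0.2 to ±1.4 standard deviations traces the skill for increasingly extreme anomalies.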

## 4. Selection of predictands

As climate predictability is usually enhanced in areas where atmospheric processes are largely affected by SST variations, the predictand regions are defined according to the contribution of the SST forcing to total variability. Using the ANOVA, the portion of total rainfall variance accounted for by global SSTs is shown in Fig. 1. Only values that are statistically significant at the 5% level or lower are plotted. The ANOVA results are differentiated between the periods December–March (DJFM) and June–September (JJAS), since the rainy season may occur in different months depending on the geographical location. This estimate of SST-related variability is an upper limit for the predictability. In general, the SST impact prevails in the low latitudes and, not surprisingly, especially over the oceans, where it explains up to 90% of total rainfall variability. This is true for the tropical Pacific in boreal winter as well as for the Atlantic Ocean in summer. Precipitation over the Indian Ocean is less governed by SST changes. The oceanic boundary also plays a major role over some landmasses like tropical South America in winter and the West African monsoon region in summer. The SST influence propagates far north into continental West Africa in summer. Indonesia is strongly affected by SSTs throughout the year. Toward the extratropics, random atmospheric processes seem to gain control of rainfall fluctuations. The patterns basically confirm the suitability of the Tropics for seasonal forecasting (Reichler and Roads 2004).

Among other variables, the zonal wind component in 700 hPa reveals a striking sensitivity to SST variability (Fig. 2). Aside from the equatorial Walker circulation a weaker signal with still 40% of explained variance is found over northeastern Africa. Since this part of Africa can be regarded as a source region for the AEJ and AEWs, which in turn induce rainfall events in sub-Saharan West Africa (Druyan et al. 1997; Grist and Nicholson 2001; Hastenrath 2000), it joins the list of predictands.

Based on these results, 14 predictand regions are defined as illustrated in Fig. 3. The majority of predictands is derived from the precipitation field. For the predictand time series observed rainfall in wintertime (DJFM) is averaged over Central America, northeast Brazil, the eastern Amazon basin, the Congo basin, Angola, and southeast Africa. Summertime (JJAS) precipitation is averaged over the Sahel zone, the Guinean coast region, tropical East Africa, India, and Indonesia. Oceanic grid points are masked out when computing the regional means. In addition, two nonprecipitation predictands are defined: the summertime zonal wind in 700 hPa over eastern Africa (here referred to as WNA) and the DMI as a dynamical measure of the Indian summer monsoon intensity. In some cases, the selection is not supported by a strikingly high level of SST impact as, for instance, in Central America or East Africa. These predictands have been included out of curiosity, since seasonal forecasting, if possible, would be of great benefit. After the spatial averaging, all predictand time series are standardized and low-frequency variability is suppressed by removing a trend polynomial of fourth order.

## 5. Selection of predictors

For a given observed predictand time series, the predictors are derived from the simulated variables (Table 2), as well as from observed SST, by linear correlation analysis. An example of the predictor selection is presented in Fig. 4, showing the linear correlation between summertime (JJAS) Guinean coast precipitation and monthly geopotential height anomalies all over the globe in different tropospheric levels and with different lead times. Only the 5% quantiles of strongest positive and negative correlations are displayed. The West African monsoon season is out of phase with geopotential height in the tropical Atlantic during May to July and in phase with the tropical Pacific in the preceding April and May, hence La Niña–type anomalies (cf. Paeth and Hense 2003). Toward higher tropospheric levels, the Atlantic predictor appears to be barotropic, whereas the Pacific impact mainly prevails in the lower troposphere with an indication of a sign reversal at upper levels, pointing to an involvement of the Walker circulation (Latif and Grötzner 2000). Some effect is also exhibited by the southern Indian and Pacific Oceans. The predictor time series for the MOS approach are built by averaging over the regions of strongest linear correlation with the predictand. This may be a regional mean in one level; a horizontal gradient, if two neighboring regions with strong opposite correlation exist; or a vertical gradient in the case of a baroclinic structure. This analysis is carried out for all simulated variables in different atmospheric levels as well as observed SSTs with lead times up to 2 months prior to the onset of the considered seasonal precipitation. Only those predictor time series are chosen that reveal a relatively high linear correlation with the predictand.
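The construction of a predictor from a correlation map can be sketched as follows. Grid size, the embedded signal region, and the correlation cutoff are invented for illustration; in the paper the cutoff corresponds to the 5%/1% significance levels rather than a fixed value, and regional means may be replaced by horizontal or vertical gradients.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative setup: a predictand series and a simulated field on a
# small lat-lon grid, with one region truly correlated with it.
n_years, nlat, nlon = 60, 10, 20
predictand = rng.standard_normal(n_years)
field = rng.standard_normal((n_years, nlat, nlon))
field[:, 2:5, 3:8] += 0.8 * predictand[:, None, None]  # embedded signal

# Correlation map between predictand and the field at every grid point.
pa = (predictand - predictand.mean()) / predictand.std()
fa = (field - field.mean(axis=0)) / field.std(axis=0)
corr = np.einsum('t,tij->ij', pa, fa) / n_years

# Build the predictor time series as an average over the grid points
# whose correlation magnitude exceeds the cutoff.
mask = np.abs(corr) > 0.3
predictor = field[:, mask].mean(axis=1)
```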

This results in a large list of possible predictors for each predictand region (Fig. 5). According to the amount of striking correlations, the number of predictors varies from predictand to predictand. While the observed WNA is related to 40 predictor time series, only 13 time series are found for the Congo basin. In general, the dynamical predictands are better represented than the rainfall predictands. To the left of the thin vertical line, the superensemble rainfall predictors are listed, which are the simulated area averages of precipitation in the same region as the predictands, except for the dynamical predictands WNA, DMI, and SOI. The gray shading of the bars indicates the significance level of the correlation coefficients between predictors and predictand. It is obvious that most linear relationships are statistically significant at the 1% level. Each bar is labeled by the respective predictor variable in a specific atmospheric level, a geographical region, and a certain time lag with respect to the predictand period, DJFM or JJAS. This information is also summarized in Table 3 (see below). For the final MOS approach, the actual number of predictors will be determined in the stepwise regression. However, the large number of predictors in Fig. 5 holds the risk that some of them are not independent of each other in the stepwise regression, given the distinct teleconnections in the low latitudes (Klein et al. 1999). Such dependencies would destabilize the regression model. To avoid an overfitting of the regression model and an overestimation of the forecast skill, an EOF analysis is applied to all predictor lists prior to computing the multiple regression equation. The evaluation of residual variance in each regression step is based on a cross-validation procedure. The cross validation splits the full dataset into a dependent training dataset, which is used to estimate the EOFs and the regression coefficients between PCs and predictands, and an independent dataset. The EOFs are projected onto the predictors of the independent dataset, and the regression coefficients are then used to estimate the predictands for the independent period.

## 6. Statistical model

### a. Superensemble approach

First, the stepwise multiple regression analysis is carried out using the superensemble rainfall from four different climate models as predictors. The explained variance of this type of model calibration (cf. Feddersen et al. 1999; Krishnamurti et al. 1999), averaged over 1000 bootstrap iterations, is displayed by the circles in Fig. 6 for each of the 11 rainfall predictand regions. The numbers in brackets denote the length of the input time series in years, depending on the minimum length of the considered predictand and predictor time series (see Tables 1 and 2). The *x* axis refers to the climatological mean (*x* = 1) and the four ensemble mean time series of seasonal precipitation, according to the definition of the predictand time series. The order of the predictors reflects their importance in terms of observed rainfall. By definition the predictability of the statistical model increases with each additional predictor, but the process is not linear: The leading predictors usually contribute more explained variance to the system than the least important ones. Depending on the predictand region, the superensemble approach maximally accounts for 5% (Central America) to 65% (Indonesia) of total observed variance. Note that the dynamical predictands are not dealt with in this step.

The black circles mark the number of appropriate predictors as a mean over the 1000 realizations. The subsequent predictors do not add further information, as inferred from cross validation, and are cut off. In most cases, only one predictor is retained, implying that the most realistic climate model is sufficient to describe the observed interannual precipitation changes, in contrast to Krishnamurti et al. (1999). In some regions with rather low overall predictability, none of the considered ensemble means reduces the rmse with respect to the independent data. This means that no seasonal forecasting is possible from simulated rainfall time series, at least given the limited selection of climate models used here. The thin vertical bars indicate the 95% confidence intervals of explained variance derived from the 1000 iterations. If the intervals do not include zero, the corresponding predictability of the regression model is statistically significant at an error level of 2.5%. This is true for most predictand regions, except Central America and southeast Africa, and therein for almost all predictor lists. The order of the model predictors is always documented; it is a function of the 1000 realizations and, especially, of the predictand regions. In almost all cases, the more recent ECHAM4 and HADAM2 atmospheric climate models provide the most relevant (or realistic) predictors (not shown). The order is quite robust with respect to the various bootstrap iterations. The question now arises to what extent the MOS approach, which includes dynamical predictors from one model ensemble, results in enhanced predictability.
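The cutoff criterion above — predictors are added only as long as they reduce the cross-validated rmse — can be sketched as a greedy forward selection. This is a minimal Python sketch, not the study's code; leave-one-out rmse stands in for the paper's cross-validation, and all names are hypothetical.

```python
import numpy as np

def loo_rmse(X, y):
    """Leave-one-out RMSE of a least-squares fit of y on X (with intercept).
    An empty X corresponds to the climatological (mean-only) forecast."""
    n = len(y)
    err = np.empty(n)
    for k in range(n):
        tr = np.arange(n) != k
        A = np.column_stack([np.ones(tr.sum()), X[tr]])
        coef, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
        err[k] = y[k] - (coef[0] + X[k] @ coef[1:])
    return np.sqrt(np.mean(err ** 2))

def stepwise_select(X, y):
    """Forward selection: greedily add the predictor column that most reduces
    the cross-validated RMSE; stop as soon as no candidate improves it."""
    chosen, best = [], loo_rmse(np.empty((len(y), 0)), y)  # start: climatology
    while True:
        scores = {j: loo_rmse(X[:, chosen + [j]], y)
                  for j in range(X.shape[1]) if j not in chosen}
        if not scores:
            return chosen
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:          # no further rmse reduction: cut off
            return chosen
        chosen.append(j_best)
        best = scores[j_best]
```

A predictor set that fails to beat the climatological baseline comes back empty, which corresponds to the regions where no seasonal forecasting is possible from simulated rainfall alone.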

### b. MOS approach

Figure 7 shows the explained variance of the MOS approach for each predictand as a function of the considered predictor lists. It is obvious that a regression with a nonrestricted predictor set allows for a much better model calibration. Note that now the two nonprecipitation predictands are included. As explained in section 3, the maximum number of predictors depends on the number of statistically significant correlation coefficients between observed precipitation and simulated variables as well as on the length of the time series, ensuring that the predictor variance–covariance matrix can be inverted (von Storch and Zwiers 1999). At first sight, the predictability of the MOS approach is much larger than in the superensemble method (see Fig. 6). Although the predictor set is partly confined to one or two time series, the explained variance ranges between 22% in the Congo basin and 76% in terms of the DMI. Predictability for the dynamical variables is very pronounced. In the cases of Angola, Central America, the Sahel zone, and southeast Africa, predictability exists only if the MOS approach with dynamical predictors is taken into account. The 95% confidence intervals are mostly small and highlight that the MOS results are quite robust with respect to different bootstrap selections. It is particularly promising that regions highly vulnerable to rainfall fluctuations, like the Sahel, the Nordeste region, and Central America (Benson and Clay 1998; Bertacchi et al. 1998; Findley 1994), show a substantial improvement in seasonal predictability.

The robustness of the order of predictors with respect to the 1000 realizations is illustrated in Fig. 8 for the Guinean coast predictand as an example. The height of the gray bars indicates how often within the 1000 bootstrap iterations a certain predictor occupies a certain rank in the stepwise multiple regression model. Remember that in the MOS system the predictors are PC time series derived from the EOF analysis of the original predictors. The leading EOF is always the most important precursor of observed summertime precipitation in the Guinean coast region. Likewise, the fourth EOF occupies rank 2 in almost all 1000 realizations. The assignment of the lower ranks is less clear. However, the cross validation has revealed that these lower ranks do not contribute useful information to the MOS system for Guinean coast rainfall (see Fig. 7). Similar results are found for the other predictand regions, demonstrating that the MOS equations are not excessively sensitive to the bootstrap selection.
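The rank statistics behind Fig. 8 can be illustrated schematically. In this Python sketch (not the study's code), predictors are ranked by absolute correlation with the predictand within each bootstrap sample of years, a simplification of the regression-based ranking actually used; the function name is hypothetical.

```python
import numpy as np

def rank_frequencies(X, y, n_boot=1000, seed=0):
    """For each bootstrap resampling of the years, rank the predictor columns
    of X by absolute correlation with the predictand y, and count how often
    each predictor occupies each rank (cf. the gray bars in Fig. 8)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((p, p), dtype=int)    # counts[j, r]: predictor j at rank r
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)         # resample years with replacement
        r = np.array([abs(np.corrcoef(X[idx, j], y[idx])[0, 1])
                      for j in range(p)])
        order = np.argsort(-r)              # strongest correlation first
        for rank, j in enumerate(order):
            counts[j, rank] += 1
    return counts
```

A predictor whose column of `counts` is concentrated in a single rank, as for the leading EOF in Fig. 8, indicates an ordering that is robust to the bootstrap selection.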

To gain insight into the physical relationships between observed predictands and simulated predictors, the results of the MOS approach are transformed back from EOF space to the associated original predictor time series. For the Guinean coast summer monsoon rainfall, for instance, the MOS has revealed the first and fourth EOFs to contribute to seasonal predictability. In Fig. 9 the two most important associated original predictors are displayed; they are determined from the strength of the loadings of the corresponding EOF. The high-pass-filtered predictand time series is shown as well (top panel). By definition, all predictor time series are largely in phase with each other and with the predictand. The leading EOF (middle panel) is mainly related to regional predictors in the vicinity of tropical Africa, that is, SST in the eastern tropical Atlantic prior to the onset of the monsoon season (cf. Chang et al. 2000) but also simulated precipitation during the rainy season (cf. Paeth and Hense 2003). This implies that the considered ECHAM4 model reproduces the observed interannual monsoon variability quite well. One might conclude that the complex MOS approach does not provide substantially new information compared with the simple rainfall-to-rainfall forecast; however, it will be shown later that the MOS indeed represents a basic improvement. The fourth EOF (bottom panel) is an indicator of the ENSO impact on African monsoon precipitation (Nicholson et al. 2000; Sutton et al. 2000). It is tied to sea level pressure (SLP) variations in the eastern tropical Pacific and upper-tropospheric zonal wind over the tropical Atlantic basin. This reflects the ENSO-related teleconnections via the Walker circulation in the low latitudes (Bertacchi et al. 1998; Latif and Grötzner 2000).
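The back-transformation from EOF space amounts to multiplying the PC regression weights by the corresponding EOF loadings: since the PCs are linear combinations of the original predictors, a weight vector on the retained PCs maps exactly onto an equivalent weight vector on the predictors. A brief numerical check on synthetic data (not data from the study):

```python
import numpy as np

# Synthetic illustration: regression weights fitted on retained PCs are
# mapped back onto the original predictor time series.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 6))           # standardized predictor time series
U, s, Vt = np.linalg.svd(X, full_matrices=False)
E = Vt.T                                   # EOF loadings (predictors x modes)
P = X @ E                                  # PC time series
retained = [0, 3]                          # e.g., the first and fourth EOF
coef = np.array([0.8, -0.5])               # regression weights on the PCs

beta = E[:, retained] @ coef               # equivalent weights on predictors
yhat_pc = P[:, retained] @ coef            # prediction in EOF space
yhat_orig = X @ beta                       # same prediction from raw predictors
assert np.allclose(yhat_pc, yhat_orig)
```

The entries of `beta` with the largest magnitude identify the original predictors most strongly associated with a retained EOF, which is how the leading predictors in Fig. 9 are determined.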

Finally, Fig. 10 addresses the predictor–predictand relationships for Central American precipitation. In the superensemble setup, this region is characterized by almost no predictability (see Fig. 6). EOFs 1 and 2 can easily be distinguished by their spatial reference, that is, the Caribbean Sea and the northeastern tropical Pacific, respectively. Various dynamical predictors explain up to 60% of total rainfall variability in a highly vulnerable region of the globe, where climate models fail to simulate interannual precipitation variability in a realistic way (see Fig. 6). Apparently, the model simulations contain the correct information for precipitation forecasts within the dynamical variables, but the models fail to transfer this information to simulated precipitation, for example, due to coarse resolution or errors in the parameterizations.

An overview of all predictand regions is given in Table 3, listing the optimal number of EOF predictors and the mean predictor time series of the original model fields. ENSO clearly dominates tropical climate anomalies, governing seasonal rainfall variations from Indonesia across the Indian monsoon region, large parts of Africa, and Central as well as South America. In particular, the Walker circulation serves as a circumglobal mechanism that translates changes in the tropical Pacific basin into regional climate fluctuations all around the globe. In addition, it is striking that tropical Africa is more strongly affected by regional predictors, particularly from the surrounding oceans. Nicholson et al. (2000) have pointed out that the ENSO influence on West African rainfall is less robust than in most other parts of the low latitudes. Thus, the statistically based predictor–predictand relationships are also interpretable in a physical sense and can be traced back to the existing literature. This holds in principle for all predictand regions considered here.

## 7. Evaluation of the statistical model

### a. Brier skill score

The skill of the MOS and superensemble approaches is measured and compared by two skill scores. In this subsection, the BSS indicates the improvement of predictability with respect to the so-called climatological forecast, in which the long-term mean is taken as the forecast of future climate anomalies (von Storch and Zwiers 1999; see section 3). A valuable forecast approach is of course required to provide better skill than this trivial climatological forecast. On the other hand, there is no objective threshold or statistical test criterion to decide whether an increase in explained variance justifies the use of more complex forecast methods. Therefore, we list the BSS of all predictand regions and both forecast strategies in Table 4 in order to gain insight into the relative skill in various parts of the globe. The explained variance of the optimal multiple regression equations is indicated as well, according to the black circles in Figs. 6 and 7. The predictands are listed in the order of predictability as inferred from the MOS equation. The dynamical predictands—DMI and WNA—are characterized by the highest explained variance and BSS values: the MOS system accounts for more than 70% of the total interannual variability, and the BSS indicates that the explained variance is enhanced by around 50% with respect to the climatological forecast. The superensemble approach has not been realized for the dynamical predictands. In terms of seasonal precipitation, there is no clear geographical structure: regions with high skill are found next to regions with low skill, for instance the Nordeste region in Brazil (BSS ∼ 44%) versus the Amazon basin (BSS ∼ 25%). There is also no systematic relationship with geographical latitude or with west-coast versus east-coast locations. The best rainfall predictability is found in Indonesia, as expected due to the strong SST impact around the scattered islands. The Guinean coast region, used above as the reference predictand for describing the MOS results, falls in the middle third, with an explained variance of ∼52% and a BSS of ∼31%. The worst results are found for the Congo basin and southeastern Africa. In all cases, the order of the BSS corresponds to the order of explained variance in the MOS approach.
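The skill score used here can be written compactly in its mean-square-error form, with the climatological mean of the observations as the reference forecast. A minimal Python sketch (the function name is illustrative):

```python
import numpy as np

def brier_skill_score(y_obs, y_fc):
    """Mean-square-error skill score relative to the climatological forecast
    (the long-term mean of the observations): 1 is a perfect forecast, 0 means
    no improvement over climatology, negative values are worse than it."""
    mse_fc = np.mean((y_obs - y_fc) ** 2)
    mse_clim = np.mean((y_obs - y_obs.mean()) ** 2)
    return 1.0 - mse_fc / mse_clim
```

Because the score is normalized by the climatological error, it can be compared across predictand regions with very different variances, which is what Table 4 exploits.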

Comparing both forecast methods clearly shows that the MOS approach is more skillful than the superensemble. In some regions like Central America, Angola, the Sahel, and southeast Africa, seasonal forecasting is possible only if dynamical predictors are taken into account. In terms of seasonal precipitation in Central America, for instance, the superensemble approach provides neither predictability nor skill, whereas the MOS is able to explain almost 60% of the interannual fluctuations. This demonstrates to what extent realistic rainfall information can be inferred from the simulated dynamics of the ECHAM4 model. Note that the superensemble in this study cannot directly be compared with the one in Krishnamurti et al. (1999); the latter used a much larger number of atmospheric models to train the multiple regression model. It is conceivable that the skill of the superensemble approach is enhanced when more models are included. On the other hand, the MOS approach may equally profit from the consideration of additional climate models, provided that ECHAM4 is not the most powerful one. Thus, the MOS approach provides a substantial improvement over classical forecast strategies and model calibrations, at least in terms of the general enhancement of predictability as measured by the BSS.

### b. Log-odds ratio

A more differentiated skill score is the log-odds ratio (LOR) (Stephenson 2000; see section 3). It measures the ratio between the odds of a correct forecast and the odds of a false alarm, given a certain threshold of climate anomalies. The higher the LOR, the more this ratio is shifted toward the correct forecast of threshold exceedance (or correct rejection). An LOR of 5, for instance, means that the product of the counts of "hits" and "correct rejections" is about 150 times (e⁵ ≈ 148) larger than the product of "false alarms" and "misses." In the present study, we compute the LOR for increasing positive and negative anomalies ranging from 0.2 times the standard deviation (STD) to 1.4 times the STD. Thus, we can assess the ability of both forecast methods with respect to weak and strong departures from the long-term mean. The LOR of the MOS approach is displayed in Fig. 11 for each predictand region. For most predictands, the LOR is statistically significant even for distinct anomalies up to one STD. In some cases, even strong departures up to 1.4 STD are correctly predicted by the MOS system. Stronger anomalies cannot be addressed, since the criterion of having at least five values in each category is not fulfilled (see section 3). An exception is the Congo basin, which does not show any forecast skill; it is also characterized by the lowest predictability. Southeastern Africa also reveals low skill, especially with respect to dry conditions. In general, there is a relationship between the order of the predictands in Table 4 and the number of significant LORs: the higher the overall predictability and BSS, the more skill is usually found also in terms of the various thresholds of climate anomalies. The shape of the diagrams is not fully symmetric: in some regions like Angola, Indonesia, and Central America, strong positive anomalies are predicted with more certainty than negative ones; in the Amazon basin, India, and for all dynamical predictands it is vice versa. The height of the bars often rises from low to high thresholds. This is because the LOR also increases when many counts fall into the category of correct rejection (see section 3), and this category is filled more frequently for strong than for weak anomalies. Note that the confidence intervals at the 1% error level are quite small, since the overall number of values is large (*n* > 6000).
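For a given anomaly threshold, the LOR follows directly from the 2×2 contingency table of forecast and observed exceedances. The Python sketch below adds 0.5 to each cell as a continuity correction against empty cells; this correction, like the function name, is an assumption of the sketch, and the study's exact treatment of sparse cells (the five-values-per-category criterion) may differ.

```python
import numpy as np

def log_odds_ratio(y_obs, y_fc, thresh):
    """Log-odds ratio for forecasting exceedance of `thresh`:
    ln[(hits * correct rejections) / (false alarms * misses)].
    A count of 0.5 is added to every cell to avoid division by zero."""
    obs = y_obs > thresh
    fc = y_fc > thresh
    hits = np.sum(fc & obs)
    misses = np.sum(~fc & obs)
    false_alarms = np.sum(fc & ~obs)
    corr_rej = np.sum(~fc & ~obs)
    return np.log(((hits + 0.5) * (corr_rej + 0.5)) /
                  ((false_alarms + 0.5) * (misses + 0.5)))
```

A forecast unrelated to the observations yields an LOR near zero, while a perfect exceedance forecast drives the false-alarm and miss cells toward zero and the LOR strongly positive.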

For comparison, Fig. 12 lists the LOR of the superensemble approach. For many predictands, the skill is substantially lower than in the MOS system. In addition, strong anomalies are less accurately predicted. This particularly holds for Central America and the Sahel, for which no skill is provided by the superensemble forecast. However, there are some exceptions, like the Amazon basin, the Guinean coast, and India, where the superensemble is better suited to forecasting strong precipitation anomalies. An explanation may be that in these regions, rainfall predictors from climate models other than ECHAM4 are more powerful in terms of extremely dry and wet monsoon conditions than the dynamical predictors in the MOS system, which are exclusively derived from ECHAM4. Remember that the dynamical predictands are not addressed by the superensemble method. In general, the LOR confirms that the MOS approach is fundamentally more skillful than the classical superensemble method.

## 8. Discussion

This study has presented an alternative approach to seasonal climate forecasting, which uses dynamical predictors from global climate models rather than relying on simulated precipitation alone. This procedure is motivated by the fact that climate models are usually most reliable in terms of the large-scale atmospheric circulation. We have developed a stepwise multiple regression model in order to determine statistical transfer functions between observed predictands and simulated predictors from a six-member ensemble of long-term ECHAM4 experiments. A cross validation is applied to assess the robustness of the statistical relationships. The predictand regions are defined according to their sensitivity to SST forcing, since the oceanic component is supposed to be autocorrelated at seasonal time scales. The predictors are determined by linear correlation analysis with the predictand, taking into account various atmospheric variables at several levels and with different lead times. Fourteen predictand regions are found, all located in the low latitudes. Up to 44 predictors are detected per predictand. An EOF analysis of all predictor time series is carried out prior to computing the regression model in order to avoid overfitting. This so-called MOS forecast is compared with the classical superensemble method (Krishnamurti et al. 1999) by means of two skill scores. The MOS is generally more skillful than the superensemble. This concerns the overall explained variance of the regression model, the enhancement of predictability with respect to the climatological forecast as measured by the BSS, and the accuracy of the forecast in terms of strong climate anomalies as indicated by the LOR.
In some regions like Central America and the Sahel zone, the superensemble precipitation hardly provides any forecast skill, likely due to problems in the parameterization of cloud and rainfall processes and the representation of orography, whereas the MOS system reveals a remarkable predictability, amounting to 57% and 51% of the total interannual variations, respectively.

The results of this study highlight how well observed climate fluctuations, including those in the hydrological cycle, can be inferred from the simulated dynamics of state-of-the-art global climate models, even in regions where the simulation of rainfall itself fails. This conclusion is very promising for the improvement of present-day seasonal forecasting strategies (e.g., Garric et al. 2002; Krishnamurti et al. 1999; Mo and Thiaw 2002; Tarhule and Lamb 2003), which still do not exist in many parts of the globe where a substantial forecast potential is given and seasonal forecasting is definitely required by decision makers in agriculture, energy production, and policy. Of course, this presupposes that the positive results of this hindcast study, which has been performed under the so-called perfect model assumption, that is, perfectly known boundary conditions in the form of observed SSTs, can be transferred to a real forecast period. A conceptual idea for the operational forecast of, say, the forthcoming rainy season in the Guinean coast region consists of the following steps: 1) global SST anomalies are observed instantaneously until the onset of the summer monsoon season; 2) based on the autocorrelation of the tropical oceans (Paeth and Hense 2003), the SST anomalies are extrapolated several months into the future, possibly under slight variations in order to produce ensemble data; 3) the extrapolated SST fields are used as lower boundary conditions in atmospheric climate models, which are integrated until the end of the rainy season; 4) the model output—dynamical variables and precipitation—enters the MOS equation, which was a priori trained with hindcast data as in the present study; and 5) the resulting MOS forecast is provided as a seasonal forecast with mean and confidence intervals over the various ensemble members, according to the SST extrapolation.
Note that a general deficiency of such combined dynamical–statistical forecast approaches is the assumption that the statistical transfer functions are stationary in time. This is not necessarily the case, especially in a changing climate (e.g., Taylor et al. 2002).

There are still two important improvements to the presented MOS forecast to be carried out. 1) Additional climate models should be taken into account in order to enhance the forecast skill of the MOS system. Although the dynamics of the ECHAM4 atmospheric climate model has been shown to account for a large part of the observed interannual variability, it is conceivable that other models are even more powerful. Krishnamurti et al. (1999) have shown that a multimodel ensemble is generally more accurate than an individual climate model ensemble. This may also be true for the dynamical predictors in the MOS approach. 2) From a spatial point of view, the predictand regions are not sufficiently differentiated for applications in agricultural planning at the regional scale, since they represent spatial means over large areas like, for instance, the entire Guinean coast region. A finer-scale forecast may be performed by nesting a regional climate model in the SST-driven global climate model data. Under the perfect model assumption, the regional climate model REMO of the Max Planck Institute for Meteorology in Hamburg has been found to provide a promising forecast potential over tropical Africa at the regional scale. Further investigations will address to what extent the predictability during the hindcast period presented here can be transferred to an operational forecast of the forthcoming rainy season. An implementation of our results for the improvement of seasonal forecasts in tropical West Africa is planned in the near future.

## Acknowledgments

This work was supported by the Federal German Minister of Education and Research (BMBF) under Grant 07 GWK 02 and by the Ministry of Science and Research (MWF) of the Federal State of North Rhine-Westphalia under Grant 223-21200200. We thank two anonymous reviewers whose comments helped to substantially improve the readability of the paper.

## REFERENCES


## Footnotes

*Correspondence author address:* Heiko Paeth, Meteorological Institute, University of Bonn, Auf dem Hügel 20, 53121 Bonn, Germany. Email: hpaeth@uni-bonn.de