## 1. Introduction

Rainfall is the most important climate element affecting the Ethiopian population (96.6 million). The main rainy season (called Kiremt) occurs during June–September (JJAS), when the northern two-thirds of the country receives 65%–95% of Ethiopia’s total annual rainfall (Segele and Lamb 2005) and produces 85%–95% of its food (Degefu 1987). Because all Ethiopian agricultural activities and the resulting crop production depend on the distribution and amount of JJAS rainfall, its seasonal prediction is of great importance for agricultural planning and socioeconomic disaster mitigation.

Several previous studies have examined aspects of the intraseasonal-to-interannual variability of Ethiopian rainfall. These studies have considered the following: Kiremt onset, cessation, and resulting growing season variability (Segele and Lamb 2005); the abrupt latitudinal rainfall changes involved and their representation in climate model simulations (Riddle and Cook 2008; Segele et al. 2009c); associated large-scale atmospheric circulation characteristics and global sea surface temperature patterns (Beltrando and Camberlin 1993; Shanko and Camberlin 1998; Segele et al. 2009a; Berhane et al. 2014); and connections to the Indian monsoon (Camberlin 1997; Vizy and Cook 2003). Also investigated have been the hydrological responses to rainfall (Conway 2000), temporal trends and decadal shifts of the rain (Seleshi and Zanke 2004; Bowden and Semazzi 2007; Cheung et al. 2008; Williams et al. 2012; Viste et al. 2013; Berhane et al. 2014), and seasonal spring (Belg) and Kiremt rainfall predictability (Gissila et al. 2004; Korecha and Barnston 2007; Block and Rajagopalan 2007; Diro et al. 2008; Korecha and Sorteberg 2013; Jury 2013; Nicholson 2014).

Segele et al. (2009b, hereinafter Part I) used wavelet analysis to identify temporal and spectral characteristics of the seasonal-to-interannual variability of 5-day average Ethiopian JJAS rainfall, and to present a time–frequency quantification of teleconnections between the rainfall and large-scale atmospheric circulation and global sea surface temperature (SST) patterns. The results linked Ethiopian monsoon rainfall variations principally with seasonal-to-interannual time-scale atmospheric circulation patterns that involve fluctuations in the major components of the monsoon system, including the monsoon trough, tropical easterly jet (TEJ), and Somali low level jet (SLLJ). Part I thus provided the physical basis for understanding the mechanisms of Ethiopian rainfall variability and identifying regional rainfall drivers. The current study builds on that foundation by developing new statistical methods for assessing the predictability of monthly-to-seasonal Ethiopian rainfall on national and local scales, as a step toward an operational seasonal prediction capability for Ethiopia.

On interannual time scales, monsoon rainfall variability is now widely recognized as being governed by slowly varying surface boundary conditions, such as SSTs, land surface albedo, and soil moisture. For Africa, the role of basin- and global-scale SST anomalies in such rainfall variability has received considerable attention over the last 35 years (e.g., Lamb 1978a,b; Folland et al. 1986, 1991; Lamb and Peppler 1992; Barnston et al. 1996; Ward 1998; Mutai and Ward 2000; Camberlin et al. 2001; Giannini et al. 2003; Segele et al. 2009a; Part I; Ndiaye et al. 2011). Using diagnostic relationships developed in those studies, several other investigations further assessed the predictability of Ethiopian Belg (Diro et al. 2008) and Kiremt (Gissila et al. 2004; Block and Rajagopalan 2007; Korecha and Barnston 2007; Jury 2013; Nicholson 2014) seasonal rainfall. The possible role of land surface conditions for African rainfall was noted first by Charney (1975) and has been investigated in many subsequent studies (e.g., Charney et al. 1977; Charney and Shukla 1981; Xue and Shukla 1993; Clark and Arritt 1995; Clark et al. 2001).

The surface boundary focus of the present Ethiopian study is SST. However, the El Niño–Southern Oscillation (ENSO)-related “predictability barrier” in Northern Hemisphere spring (e.g., Goswami and Shukla 1991; Webster and Yang 1992; Webster et al. 1998) can pose a major challenge to providing seasonal rainfall forecasts two or more months in advance in the tropics (Goddard et al. 2001; Korecha and Barnston 2007). Also, because the effects of slowly evolving global SST variations on Ethiopian rainfall must be transmitted through changes in local and regional atmospheric circulations and SST patterns, development of monthly-to-seasonal Ethiopian rainfall predictions can be enhanced by identifying local and regional predictors and aggregating their effects into ensemble-based statistical prediction schemes.

The present Kiremt predictability investigation involves 2–3-month lead time forecasts of local, regional, and national rainfall for Ethiopia. Individual predictions employ a set of 20 predetermined regional and global predictor variables to construct an equal number (20) of initial statistical model estimates, forming an ensemble that captures potentially valuable and unique signals from every variable in the predictor set. Final models are selected objectively from the 20 initial models after a series of tests is applied to address regression assumptions and statistical significance requirements. Averaging the final ensemble of statistical estimates is shown to improve cross-validation skill compared to a traditional, single-model-based technique. The ensemble provides a novel approach to quantifying the envelope of uncertainty created by the model building process, reflecting the predictive uncertainties associated with individual predictors and models. Cross-validation skill improvement is achieved through the error minimization afforded by the ensemble.

## 2. Data and methodology

Accurate prediction of all-Ethiopian Kiremt rainfall would provide valuable national-scale information. Stakeholders would benefit from accurate monthly rainfall predictions at local and regional levels, especially for highly populated areas and those with large interannual rainfall variability. To achieve a high level of accuracy and a measure of forecast uncertainty, ensemble-based statistical prediction tools were developed and assessed for 1) all-Ethiopian standardized Kiremt rainfall anomalies, 2) August total rainfall at Addis Ababa (the most populated city in the African Union) and Combolcha (center of the 1984 Ethiopian drought), and 3) northeastern Ethiopia standardized JJAS rainfall anomalies. Addis Ababa and Combolcha have daily rainfall records, with Combolcha’s high interannual rainfall variability challenging traditional single-model-based seasonal forecast methods (e.g., Gissila et al. 2004; Korecha and Barnston 2007). Northeastern Ethiopia was selected for regional prediction because of the region’s susceptibility to drought and high rainfall variability (e.g., Lanckriet et al. 2015).

### a. Data

This study utilizes sets of Ethiopian rainfall (predictand) and large-scale atmospheric and global SST (predictor) data for 1970–99 that were described in Part I and have now been extended through 2002. For seasonal prediction at a national level, a standardized all-Ethiopian JJAS rainfall index originally was constructed for 1970–99 using daily totals from 100 rain gauge stations (Fig. 1, dots). The present extension of that index for 2000–02 employed seasonal totals for 52 of those stations (Fig. 1, squares) analyzed in Korecha and Barnston (2007; data obtained from A. G. Barnston in 2008). For local prediction examples, August monthly rainfall totals for 1970–99 were used for Addis Ababa and Combolcha (Fig. 1). For regional prediction, 11 stations from three regional states (Amhara, Afar, and Tigray) were used to construct a standardized regional JJAS rainfall index for 1970–99 for northeastern Ethiopia (Fig. 1, stars). This index was extended for 2000–02 using seasonal total rainfall from seven of the Korecha and Barnston (2007) stations in northeastern Ethiopia (Fig. 1, collocated squares/stars).

The set of atmospheric predictors for 1970–99 was constructed from monthly NCEP–NCAR reanalysis data (50°N–40°S, 30°W–90°E and 2.5° latitude × 2.5° longitude spatial resolution; Kalnay et al. 1996). These predictors included daily-averaged fields of mean sea level pressure (MSLP), geopotential height, temperature, horizontal wind, vertical velocity, and specific humidity at standard pressure levels. SST predictors for 1970–99 were obtained from the Met Office Hadley Centre global monthly SST dataset (HadISST1; 1° latitude × 1° longitude resolution; Rayner et al. 2003). Both these datasets were extended to 2002.

### b. Prediction design

In Part I, wavelet analysis identified the temporal and spectral characteristics of the intraseasonal-to-interannual variability of 5-day average JJAS Ethiopian rainfall, and their strong contemporaneous teleconnections with large-scale atmospheric circulation and global SST patterns. Although variability on quasi-biennial (QB; 5%) and ENSO (2%) time scales accounted for much less rainfall variance than the annual mode (66%), these modes modulated the season-long persistence of major regional monsoon components and their remote teleconnection linkages. However, time-lagged relationships between regional rainfall and such large-scale variables generally do not remain strong or statistically significant (e.g., DelSole and Shukla 2002; Segele et al. 2009a, their Fig. 14), which limits their predictive potential.

The present study therefore develops and applies an ensemble-based multiple linear regression technique to predict 1) standardized JJAS rainfall anomalies for all of Ethiopia for 1970–2002, 2) monthly rainfall totals for Addis Ababa and Combolcha for 1970–99, and 3) regional standardized rainfall anomalies for northeastern Ethiopia for 1970–2002 (Fig. 1). This approach assumes that slowly evolving global SST variations affect Ethiopian rainfall through changes in more regional and local atmospheric circulation and SST anomalies, even when the influential ENSO state is unpredictable or its phase has minimal effect (e.g., DelSole and Shukla 2002; Nicholson 2014). The methodology involves three distinct steps: 1) building a set of potential atmospheric circulation and SST predictors from a series of correlation maps; 2) constructing initial ensemble model members by specifying each potential predictor as the first predictor in a multiple regression model; and 3) selecting final ensemble model members based on statistical significance tests.

#### 1) Step 1: Build a set of potential predictors

Guided by Part I, the full three-dimensional state of the regional atmosphere and global SST was represented by a set of potential March predictors of Kiremt (JJAS) rainfall. Raw atmospheric predictors consisted of gridpoint values for 50°N–40°S, 30°W–90°E of geopotential height (Φ), temperature (*T*), horizontal wind (*u*, *υ*), vertical velocity (*w*), and specific humidity (*q*) at standard pressure levels, and MSLP. To accommodate potential interaction effects between these predictors, additional second-order predictors were derived by their cross multiplication, that is, Φ*u*, Φ*υ*, *Tu*, *Tυ*, *uυ*, *uw*, and *υw* at 12 pressure levels from 1000 to 100 hPa, and *qu* and *qυ* at eight levels from 1000 to 300 hPa. Also, following Part I, the horizontal wind components and vertical velocity for 1000–100 hPa within 30°N–20°S, 30°–50°E were used to construct zonal and meridional vertical cross sections of *uw* and *υw*. Raw global (60°N–60°S) SST predictors for March were used in their original form. Each gridpoint variable was standardized. Correlations with the rainfall predictand were computed at all grid points and levels and, following Gissila et al. (2004), Block and Rajagopalan (2007), Korecha and Barnston (2007), and Nicholson (2014), coherent areas of highly significant correlation were selected and the average values of individual variables within the coherent areas were taken as candidate predictors. Here, it was required that the correlations remain statistically significant at the 5% level according to a nonparametric bootstrap test for both 1970–90 and 1970–99. In addition, the physical significance of the correlations was assessed for all selected predictors. Representative samples of predictors are discussed in section 3a. The number of candidate predictors for a given predictand was limited to a maximum of 20. These predictors were then used for both retroactive/retrospective verification (RV) and leave-one-out cross validation (LOOCV) approaches.

In RV, the 20 selected March predictors were used to generate all-Ethiopian Kiremt (JJAS) rainfall predictions for the 1990–2002 verification period, based on 1970–89 model development data. In the LOOCV, 20 March predictors were used for each year during 1) 1970–2002 for all-Ethiopian national and northeastern regional predictions of seasonal rainfall anomalies and 2) 1970–99 for local monthly rainfall prediction at Addis Ababa and Combolcha, based on model development data for all other years in the corresponding training period.
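The predictor construction in step 1 can be illustrated with a minimal sketch. It assumes a hypothetical `grid` dictionary of yearly March values keyed by (latitude, longitude), replaces the bootstrap significance test with a simple correlation threshold `r_min`, and keeps only positively correlated points without checking spatial coherence; these simplifications and all names are illustrative and are not taken from the study.

```python
import math

def standardize(series):
    """Return anomalies divided by the (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def candidate_predictor(grid, rainfall, r_min):
    """Average the standardized grid-point series with r >= r_min.

    grid: dict mapping (lat, lon) -> yearly March values (hypothetical layout).
    Returns the selected grid points and the area-averaged predictor series.
    """
    z = {pt: standardize(vals) for pt, vals in grid.items()}
    selected = sorted(pt for pt, vals in z.items() if pearson(vals, rainfall) >= r_min)
    if not selected:
        raise ValueError("no grid point reaches the correlation threshold")
    index = [sum(z[pt][t] for pt in selected) / len(selected)
             for t in range(len(rainfall))]
    return selected, index

# Synthetic six-year example: two grid points track rainfall, one does not.
rain = [0.5, -1.0, 1.2, -0.3, 0.8, -1.2]
grid = {
    (10.0, 50.0): [1.0, -2.1, 2.5, -0.5, 1.4, -2.6],
    (10.0, 52.5): [0.4, -0.9, 1.3, -0.4, 0.9, -1.0],
    (12.5, 50.0): [1.0, 1.0, -1.0, -1.0, 1.0, -1.0],
}
selected, index = candidate_predictor(grid, rain, r_min=0.8)
print(selected, [round(v, 2) for v in index])
```

A negatively correlated area would be handled symmetrically by selecting points with r ≤ −`r_min`; mixing both signs in one average would cancel the signal.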

The above predictors may not be mathematically independent. However, the problem of multicollinearity that arises in a given predictor matrix, owing to lack of independence between levels and variables, was evaluated for individual regression models during the final model selection stage, as described in step 3 below.

#### 2) Step 2: Construction of multimodel ensemble set

Model development used a forward selection stepwise multiple linear regression procedure (S-PLUS 2013) that began with base models containing only one predictor from the preselected set and an intercept. Other preselected predictors then were added individually in turn, until no further improvement resulted according to the Akaike information criterion (AIC; Venables and Ripley 1997, 218–222). This process was repeated until each of the *N*_{p} preselected predictor variables for a given prediction/verification year had been used as the first model parameter, giving *N*_{p} individual prediction equations. This ensures that a potential predictive signal uniquely associated with a predictor is not unduly discarded. The forward selection procedure was necessary because the alternative backward elimination stepwise approach starts from a full (complex) model and does not guarantee that the final model will contain a particular predictor as its first parameter.
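The construction of the initial ensemble can be sketched as follows. This is an illustrative standard-library implementation, not the S-PLUS procedure used in the study: it assumes the Gaussian least-squares form of the criterion, AIC = *n* ln(RSS/*n*) + 2*k*, and the function names and synthetic data are hypothetical.

```python
import math
import random

def ols_rss(columns, y):
    """Residual sum of squares for OLS of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gaussian elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[pivot] = a[pivot], a[k]
        for r in range(k + 1, p):
            factor = a[r][k] / a[k][k]
            for c in range(k, p + 1):
                a[r][c] -= factor * a[k][c]
    beta = [0.0] * p
    for k in reversed(range(p)):
        beta[k] = (a[k][p] - sum(a[k][c] * beta[c] for c in range(k + 1, p))) / a[k][k]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def aic(columns, y):
    """Assumed AIC form for a Gaussian linear model (additive constants dropped)."""
    n = len(y)
    return n * math.log(ols_rss(columns, y) / n) + 2 * (len(columns) + 1)

def forward_select(predictors, y, first):
    """Forward selection by AIC with `first` forced in as the first predictor."""
    chosen = [first]
    best = aic([predictors[first]], y)
    while True:
        trials = [(aic([predictors[n] for n in chosen + [name]], y), name)
                  for name in predictors if name not in chosen]
        if not trials:
            break
        score, name = min(trials)
        if score >= best:  # no further AIC improvement: stop adding predictors
            break
        chosen.append(name)
        best = score
    return chosen

def build_initial_ensemble(predictors, y):
    """One model per predictor, each started from a different first predictor."""
    return {name: forward_select(predictors, y, name) for name in predictors}

# Synthetic example: rainfall depends on "a" and "b"; "c" is noise.
rng = random.Random(1)
predictors = {name: [rng.gauss(0, 1) for _ in range(30)] for name in ("a", "b", "c")}
rain = [2 * a - b + rng.gauss(0, 0.3) for a, b in zip(predictors["a"], predictors["b"])]
models = build_initial_ensemble(predictors, rain)
print(models)
```

Forcing each predictor into the base model is what yields as many initial models as there are predictors, even when a predictor would never be picked first by an unconstrained search.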

#### 3) Step 3: Selection of final model members

Final multimodel ensemble members were selected from candidate models identified in step 2, based on predictor independence, model assumption validity, model coefficient statistical significance, and overall model significance. Several procedures were applied to ensure that the multiple regression assumption of independent predictors was not seriously violated. Correlations between all predictors were determined for the entire time series, and models for which correlation magnitudes between any two predictors exceeded 0.5 were discarded. Further scrutiny for multicollinearity used the variance inflation factor (VIF; Tamhane and Dunlop 2000, p. 417) to eliminate remaining models with VIFs >10. Remaining models were considered for further analysis only when the null hypotheses of normality, zero autocorrelation, and homoscedasticity were not rejected at the 5% (*α* = 0.05) significance level according to the following tests: the Shapiro–Wilk normality test (Steinskog et al. 2007), the Durbin–Watson test for serial independence (Wilks 2006, 192–193), and the Breusch–Pagan test for conditional heteroscedasticity (Breusch and Pagan 1979). A model then was admitted to a final multimodel ensemble set only when its individual regression coefficients, as well as the overall model fit, were statistically significant at the 5% level according to an *F* test (e.g., Draper and Smith 1981, 31–33; Tamhane and Dunlop 2000, 407–410). In the above significance tests, if the cost of type I error is much larger (smaller) than expected, the *α* value can be adjusted to a smaller (larger) value. The cost of type II error needs to be considered simultaneously when adjusting *α*. Following other regression-based prediction development (e.g., Hastenrath et al. 1995; DelSole and Shukla 2002; Saunders and Lea 2005; Block and Rajagopalan 2007; Korecha and Barnston 2007), nonparametric tests were not performed for regression coefficient significance because of the considerable computational requirements associated with the large number of regression models involved.
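The predictor-independence part of step 3 can be sketched with the two thresholds quoted above (|*r*| > 0.5 and VIF > 10). The sketch uses the identity that the VIF of predictor *j* equals the *j*th diagonal element of the inverse of the predictor correlation matrix. The residual diagnostics (Shapiro–Wilk, Durbin–Watson, Breusch–Pagan) and the *F* tests are not reproduced here, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (intended for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        diag = aug[k][k]
        aug[k] = [v / diag for v in aug[k]]
        for r in range(n):
            if r != k:
                factor = aug[r][k]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[k])]
    return [row[n:] for row in aug]

def passes_independence_screen(columns, r_max=0.5, vif_max=10.0):
    """Reject a model whose predictors are strongly intercorrelated."""
    p = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(p)] for i in range(p)]
    # Screen 1: any pairwise |r| above the threshold discards the model.
    if any(abs(corr[i][j]) > r_max for i in range(p) for j in range(i + 1, p)):
        return False
    # Screen 2: VIF_j is the j-th diagonal element of the inverse correlation matrix.
    if p > 1:
        inverse = invert(corr)
        if max(inverse[j][j] for j in range(p)) > vif_max:
            return False
    return True

# Synthetic predictors: b is nearly unrelated to a, c is almost a copy of a.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
c = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2]
print(passes_independence_screen([a, b]), passes_independence_screen([a, c]))
```

The VIF screen is kept after the pairwise screen because three or more predictors can be jointly collinear even when every pairwise correlation is modest.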

### c. Forecast verification

Comprehensive model evaluation was performed by comparing predicted and observed rainfall values for mutually exclusive/independent verification periods, using goodness-of-fit and relative/absolute error measures with supporting estimates of reliability (Willmott et al. 1985; Legates and McCabe 1999, hereinafter LM99). Mean absolute error (MAE) and root-mean-square error (RMSE) statistics quantified prediction errors. To assess the models’ goodness of fit or relative error, the Pearson’s product-moment correlation coefficient, coefficient of efficiency [*E*_{2}; Eq. (2) of LM99], index of agreement [*d*_{2}; Eq. (3) of LM99], modified index of agreement [*d*_{1}; Eq. (4) of LM99], and modified coefficient of efficiency [*E*_{1}; Eq. (5) of LM99] were all calculated. Willmott et al. (1985) and LM99 explain *E*_{2}, *d*_{2}, *d*_{1}, and *E*_{1}, while DelSole and Shukla (2002) and Wilks (2006, 280–281) further discuss *E*_{2}.
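For reference, the error and goodness-of-fit measures named above can be computed as in the sketch below. It uses the standard squared-error and absolute-error forms of the coefficient of efficiency and index of agreement; the function name and dictionary keys are illustrative.

```python
import math

def verification_stats(obs, pred):
    """MAE, RMSE, coefficients of efficiency (E2, E1), indices of agreement (d2, d1)."""
    n = len(obs)
    obar = sum(obs) / n
    abs_err = [abs(o - p) for o, p in zip(obs, pred)]
    sq_err = [e ** 2 for e in abs_err]
    abs_dev = [abs(o - obar) for o in obs]
    # "Potential error": distance of both prediction and observation from the observed mean.
    potential = [abs(p - obar) + abs(o - obar) for o, p in zip(obs, pred)]
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "E2": 1 - sum(sq_err) / sum(d ** 2 for d in abs_dev),
        "d2": 1 - sum(sq_err) / sum(t ** 2 for t in potential),
        "d1": 1 - sum(abs_err) / sum(potential),
        "E1": 1 - sum(abs_err) / sum(abs_dev),
    }

# Synthetic standardized anomalies for illustration only.
observed = [0.5, -1.0, 1.2, -0.3, 0.8, -1.2]
predicted = [0.3, -0.7, 0.9, 0.1, 0.6, -1.0]
for name, value in verification_stats(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```

All four relative measures equal 1 for a perfect prediction, and all four equal 0 for a prediction fixed at the observed mean; the absolute-error versions (*d*_{1}, *E*_{1}) are less sensitive to a few large errors than the squared versions.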

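Categorical predictions were assessed with the ranked probability score (RPS), which in its standard form is

$$\mathrm{RPS} = \sum_{m=1}^{J} \left(Y_m - O_m\right)^2 ,$$

where *J* is the number of prediction categories, and *Y*_{m} and *O*_{m} are cumulative predicted and observed probabilities through category *m*, respectively. The RPS was converted into a ranked probability skill score (RPSS) by computing it relative to climatological probabilities, where

$$\mathrm{RPSS}_{\mathrm{Clim}} = 1 - \frac{\mathrm{RPS}}{\mathrm{RPS}_{\mathrm{Clim}}} ,$$

and RPS_{Clim} is the RPS obtained from the climatological probabilities. Three equiprobable categories (terciles) were used, and were obtained by ranking and grouping predictand time series into the bottom (below normal), middle (near normal), and top (above normal) thirds.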
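A short sketch of the tercile-based RPS and RPSS follows. It assumes the unnormalized RPS (sum of squared differences of cumulative probabilities) and forecast probabilities supplied directly as three-element lists; how the study derived category probabilities from the ensemble is not specified here, so the example probabilities are purely illustrative.

```python
def tercile_categories(series):
    """Rank a series and label each value 0 (below), 1 (near), or 2 (above normal)."""
    n = len(series)
    order = sorted(range(n), key=lambda i: series[i])
    categories = [0] * n
    for rank, i in enumerate(order):
        categories[i] = rank * 3 // n
    return categories

def rps(forecast_probs, observed_category):
    """Sum over categories of squared cumulative forecast minus cumulative observation."""
    cumulative_forecast = 0.0
    score = 0.0
    for m, prob in enumerate(forecast_probs):
        cumulative_forecast += prob
        cumulative_observed = 1.0 if m >= observed_category else 0.0
        score += (cumulative_forecast - cumulative_observed) ** 2
    return score

def rpss(forecasts, observed_categories):
    """Skill relative to equiprobable climatological tercile probabilities."""
    climatology = [1.0 / 3.0] * 3
    total = sum(rps(f, o) for f, o in zip(forecasts, observed_categories))
    reference = sum(rps(climatology, o) for o in observed_categories)
    return 1.0 - total / reference

# Illustrative data: six seasons of observed anomalies and forecast probabilities.
anomalies = [0.9, -1.1, 0.1, 1.4, -0.6, 0.2]
observed = tercile_categories(anomalies)
forecasts = [
    [0.1, 0.3, 0.6], [0.7, 0.2, 0.1], [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7], [0.5, 0.4, 0.1], [0.3, 0.4, 0.3],
]
print(observed, round(rpss(forecasts, observed), 3))
```

Because the score is built from cumulative probabilities, a forecast that puts its weight one category away from the outcome is penalized less than one that is two categories away.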

Because the aforementioned measures follow no known distribution, no standard parametric approach can determine the reliability and significance; therefore, significance and confidence intervals were assessed through two nonparametric bootstrap methods that permit any data distribution to generate statistics (Efron and Gong 1983; Willmott et al. 1985; Solow 1985; LM99). First, and following Part I, the reliability of each sample statistic (Mason and Mimmack 1992; Nicholls 2001) was assessed by calculating its confidence intervals, using the bias-corrected and accelerated (BCa) bootstrap technique of Efron and Tibshirani (1993, p. 178). To establish confidence interval accuracy, the BCa method was applied to sample statistics computed from 5000 bootstrap samples of *N* pairs of observed/predicted rainfall time series members. Each of the *N* pairs was obtained by reshuffling and randomly choosing, with replacement, single-pair values at a time from the *N* pairs of observed/predicted rainfall time series members. In the second bootstrap approach, an achieved significance level (ASL)—the probability of exceeding the observed correlation by chance—was obtained for correlations computed from the same 5000 bootstrap samples of *N* pairs of observed/predicted rainfall time series members (Efron and Tibshirani 1993, p. 203). Each observed–predicted correlation was ranked among the bootstrap sample correlations to determine the fraction of the random correlations that were at least as large as the observed correlation.
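The two resampling ideas can be sketched as below. This is a simplified stand-in for the procedure described above: the confidence interval is a plain percentile interval rather than the BCa interval, and the ASL is estimated by shuffling the predicted series to break its pairing with the observations, which is an assumption about how the chance correlations are generated. The data and function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; returns 0.0 for a degenerate (zero-variance) sample."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / math.sqrt(sxx * syy)

def bootstrap_correlation(obs, pred, n_boot=5000, seed=0):
    """Return (r, 95% percentile interval, ASL) for the observed-predicted correlation."""
    rng = random.Random(seed)
    n = len(obs)
    r_obs = pearson(obs, pred)
    # (a) Resample observed/predicted pairs with replacement for the interval.
    replicates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(pearson([obs[i] for i in idx], [pred[i] for i in idx]))
    replicates.sort()
    interval = (replicates[int(0.025 * n_boot)], replicates[int(0.975 * n_boot) - 1])
    # (b) Break the pairing to see how often chance matches the observed correlation.
    shuffled = list(pred)
    exceed = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if pearson(obs, shuffled) >= r_obs:
            exceed += 1
    return r_obs, interval, exceed / n_boot

# Synthetic 12-year verification sample.
observed = [-1.2, 0.3, 0.8, -0.5, 1.5, -0.9, 0.1, 1.1, -1.6, 0.6, -0.2, 0.9]
predicted = [-0.9, 0.5, 0.4, -0.6, 1.2, -0.5, -0.1, 1.0, -1.1, 0.2, 0.1, 0.7]
r, ci, asl = bootstrap_correlation(observed, predicted)
print(round(r, 3), tuple(round(v, 3) for v in ci), asl)
```

The BCa method additionally corrects the percentile end points for bias and skewness in the replicate distribution, which matters for bounded, skewed statistics such as correlations close to 1.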

## 3. Prediction of Kiremt rainfall

### a. Examples of March predictors for all-Ethiopia Kiremt rainfall

This section documents the physical importance of certain selected atmospheric and SST predictors for Kiremt rainfall prediction. Figures 2 and 3 (atmospheric circulation) and Fig. 4 (SST) document the March predictors that were most strongly correlated with standardized all-Ethiopian Kiremt rainfall for the 1970–90 and 1970–99 periods. The discussion below focuses on regions of strong correlations indicated by red arrows in Figs. 2–4.

Zonal temperature advection at 100 hPa (Tu100) over southern Europe during March correlated strongly with Kiremt rainfall, with achieved significance levels (ASLs) of ≤1% according to a nonparametric bootstrap test for both the 1970–90 and 1970–99 periods (Fig. 2, top). The association between north–south variations in zonal westerlies and warm lower stratospheric midlatitude air is manifested by strong correlations. The correlation magnitude decreased appreciably (from +0.50 to +0.40) at 150 hPa, although similar correlation patterns continue in the zonal wind at pressure levels greater than 250 hPa. The positive correlation indicates a likelihood of enhanced Kiremt rainfall with increased Tu100 over southern Europe.

Another potential predictor involving the mass field was found over the western equatorial Indian Ocean in association with the meridional flux of geopotential height at 500 hPa (ϕv500) during March (Fig. 2, middle). The main correlation signal was linked to the meridional wind (v500), but including variability in the midtropospheric tropical geopotential height field increased the correlation with Kiremt rainfall. Perturbation in the northerly wind occurs in response to the east–west (mainly) oscillation of a weak tropical high over Africa and the eastward passage of subtropical westerlies, which break the weak zonal midtropospheric easterlies over the equatorial Indian Ocean. The positive v500–rainfall correlations indicate that the strengthening of equatorial southerlies over the western Indian Ocean during March is likely to enhance Ethiopian rainfall in summer.

In the lower troposphere, meridional moisture flux in the western Indian Ocean in March is associated strongly with Ethiopian Kiremt rainfall. March meridional moisture flux is affected by a transitory subtropical anticyclone over the Arabian Peninsula, which brings moisture for the short rains in Ethiopia from mid-February to mid-May (locally called Belg) when it moves eastward over the Arabian Sea. In association with the eastward displacement of the subtropical high/anticyclone to the Arabian Sea (usually in response to a southward intrusion of westerlies), the dry continental northeasterlies are replaced by maritime northeasterly to easterly humid air flowing into Ethiopia from the Arabian Sea. The negative association between rainfall and the meridional moisture flux at 925 hPa (qv925) indicates that increased (decreased) northerly moisture flux over the northern Indian Ocean during March can lead to enhanced (reduced) Ethiopian seasonal rainfall during JJAS.

Predictors in the horizontal wind and vertical velocity fields include 500–200-hPa zonal and meridional winds and vertical velocity. In the upper troposphere, the product of 200-hPa *u* and *w* (uw200) during March possessed a strong negative correlation with Kiremt rainfall over the northeastern Arabian Sea. This correlation arises from the passage of a ridge–trough system that perturbs a normally weak zonal and subsiding flow over the northern Indian Ocean. The negative correlation indicates that increasing westerlies/ascending motion (easterlies/descending motion) during March likely lead to reduced (enhanced) Ethiopian Kiremt rainfall. Nicholson (2014) also found a strong correlation between Horn of Africa rainfall and zonal wind at 200 hPa for May. The 200-hPa vertical velocity association with Ethiopian rainfall strengthens at 300 hPa, with its footprint extending farther west to the western Indian Ocean (Fig. 3, middle).

In the middle troposphere, the product of the zonal and meridional wind for March showed a strong correlation with Ethiopian rainfall off the coast of Kenya (Fig. 3, bottom). The predictability signal was obtained from both the zonal and meridional components, with the zonal and meridional wind interaction shifting the region of influence nearer the African coast compared to Fig. 2 (middle). The negative correlation indicates that the simultaneous strengthening of northerlies (southerlies) and westerlies (easterlies) during March over the western Indian Ocean can enhance Ethiopian rainfall during JJAS.

Much of the Ethiopian Kiremt rainfall predictability from SST is generated from tropical Pacific SST, where there are strong statistically significant correlations for both the 1970–90 and 1970–99 periods. Previous studies also found the northern tropical Pacific to be a source of Ethiopian rainfall predictability (e.g., Block and Rajagopalan 2007; Korecha and Barnston 2007; Nicholson 2014). After the circulation patterns associated with the correlations were examined, a physical basis for retaining these predictors was confirmed, and a total of 20 predictors were selected for modeling of all-Ethiopian Kiremt rainfall (Table 1). These predictors were used for both RV and LOOCV approaches.

Table 1. Predictors selected for the prediction of standardized Kiremt (JJAS) all-Ethiopian rainfall anomalies (Fig. 1; 100 stations for 1970–99, dots; 52 stations for 2000–02, squares) using both retroactive verification (RV, section 2b) and leave-one-out cross validation (LOOCV, section 2b) approaches. Predictors were selected based on their high correlations with rainfall (sections 2b and 3a). All predictand–predictor correlations are statistically significant with ASLs ≤5% according to a nonparametric bootstrap test.

### b. Retroactive verification of all Ethiopian predictions using March predictors

For RV of all-Ethiopian Kiremt rainfall predictions, 20 individual regression models (with intercepts) initially were developed for the 1970–89 training period, following the multimodel ensemble construction (step 2) described in section 2b. The numbers of model coefficients (not counting the intercepts) stepped into the initial regression models were 2 (5% of models), 3 (65%), and 4 (30%). The final multimodel ensemble set (11 models) was identified as described in step 3 (section 2b) using model statistics for the 1970–89 training period (regression coefficients and residuals, numbers of coefficients, multiple *R*^{2}, VIF, and correlations between predictors).

Model performance is summarized in Fig. 5, which displays the correlations between model-predicted and observed standardized all-Ethiopian Kiremt rainfall anomalies for the training (1970–89) and independent verification (1990–2002) periods for the initial 20 and final 11 models (Fig. 5b). All 20 models reproduced the observations well for the training period (*R*^{2} > 0.77), but their cross-validated skill decreased significantly for the independent 1990–2002 period, with 55% of models having less than 50% explained variance and RMSE increasing by 39%–193% (Fig. 5a). Figure 5b shows the regression equations for the final 11 selected models, which objectively satisfied the assumptions of linear regression and were statistically significant at the 5% level (section 2b). All models have a maximum of four nonintercept coefficients. With the exception of the vw925 predictor, which appeared in 7 of the 11 final models as the second parameter, the other predictors are fairly uniformly distributed, each occurring in ≤4 of the 11 ensemble members. Comparison of Figs. 5a and 6 shows that significant prediction improvement resulted from averaging the ensemble members, consistent with findings of Krishnamurti et al. (2000) and Bohn et al. (2010) for dynamic multimodel ensemble forecasting. This benefit primarily arises from cancelling (reinforcing) disagreements among (common features between) ensemble members (Wilks 2006, p. 235).

Figure 6 and Table 2 document the RV performance using the mean and median of the final 11-member model ensemble. To highlight the skill advantages that arise from the ensemble averaging, the median column in Table 2 gives the median values of the statistics under the observed versus predicted column but obtained for the final 11 regression models individually prior to ensemble averaging. Pearson’s correlations between observed and predicted ensemble means (+0.84) and medians (+0.79) for the 1990–2002 verification period have ASLs of <1% according to a nonparametric bootstrap test (Fig. 6). The Spearman’s rank correlation increased slightly for the ensemble mean (+0.87) but remained nearly the same for the median (+0.78), both with similarly determined ASLs ≤1%, indicating strong monotonic association between the ranking of ensemble forecasts and observations. Predictions have similar variability as the observations, as evidenced by their standard deviations (Table 2, Fig. 6 inset). Absolute prediction error statistics (MAE and RMSE) are 46%–59% of the observed standard deviation. All relative error measures in Table 2 indicate that the model ensemble mean satisfactorily reproduced the observed standardized rainfall anomalies. The prediction showed conventional skill score (SS_{Clim}; Wilks 2006, p. 281) improvement over 1990–2002 climatology (62% MSE reduction) and persistence (87%). For tercile predictions (below, near, and above normal), the ranked probability skill score (RPSS_{Clim}) exceeded that of climatology (+0.55). Further, by comparing the ensemble mean statistics with the median average values for the individual regression model counterparts, there emerges a clear increase in skill for all statistics for the ensemble mean: increased prediction variance, reduced absolute errors, and much increased goodness-of-fit statistics.
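The ensemble averaging and the climatology-based skill score can be illustrated with a toy calculation. The member predictions and observations below are invented for illustration, and the skill score is taken as 1 − MSE/MSE_{Clim}, with the climatological reference set to the mean of the verification-period observations; the function names are hypothetical.

```python
from statistics import mean, median

def ensemble_series(members, reducer):
    """Collapse member predictions (one list per model) into one value per year."""
    return [reducer(year_values) for year_values in zip(*members)]

def mse(obs, pred):
    """Mean squared error between observations and predictions."""
    return mean((o - p) ** 2 for o, p in zip(obs, pred))

def skill_score(obs, pred, reference):
    """Fractional MSE reduction relative to a reference forecast."""
    return 1.0 - mse(obs, pred) / mse(obs, reference)

# Three hypothetical member models predicting three verification years.
members = [
    [0.0, 1.0, -1.0],
    [1.0, 2.0, 0.0],
    [2.0, 0.0, -2.0],
]
observed = [1.0, 2.0, -3.0]

ens_mean = ensemble_series(members, mean)
ens_median = ensemble_series(members, median)
climatology = [mean(observed)] * len(observed)

ss_clim = skill_score(observed, ens_mean, climatology)
member_mse = [mse(observed, m) for m in members]
print(ens_mean, ens_median, round(ss_clim, 3))
print(round(mse(observed, ens_mean), 3), round(mean(member_mse), 3))
```

In this toy case the MSE of the ensemble mean (5/3) is smaller than the average MSE of the individual members (7/3). For squared error this ordering always holds, which is one way to see why pooling can outperform the typical single model.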

Table 2. Verification (section 2c) of 2-month lead time model predictions of standardized Kiremt (JJAS) all-Ethiopian rainfall anomalies (Fig. 1, squares, 52 stations) for 1990–2002 using the RV approach (section 2b, steps 2 and 3), based on atmospheric and SST predictors observed in March. Predictions are means for the final 11-member multimodel ensemble developed using 1970–89 training data (section 3b). Details on statistics and confidence intervals (BCa bootstrap method) appear in section 2c. The median column gives the median values of the statistics obtained for the 11 individual regression estimates without pooling, and the mean column under the bootstrap heading contains the average of 5000 bootstrap statistic replicates.

Figure 7 documents the reliability of the above statistics in the context of empirically derived frequency distributions of bootstrapped samples and corresponding confidence intervals. Because the RPSS_{Clim} [skewness *b*_{1} = −0.3 and kurtosis *b*_{2} = +3.2, defined in Tamhane and Dunlop (2000, p. 118) and computed from sample replicates used in Fig. 7] and (especially) the Pearson’s correlation *r* (*b*_{1} = −0.7, *b*_{2} = +4.3) are negatively skewed and leptokurtic compared to the standard normal distribution, the BCa method makes appropriate adjustments to correct the skewness bias in the estimation of their confidence intervals. The modified index of agreement *d*_{1} also is negatively skewed and leptokurtic compared to the standard normal distribution (*b*_{1} = −0.7, *b*_{2} = +3.6), while MAE is positively skewed (*b*_{1} = 0.3) and mesokurtic (*b*_{2} = +3.0).
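The shape statistics quoted above can be computed from a set of bootstrap replicates as in the following sketch. It assumes the moment-based definitions (standardized third and fourth central moments, so that a normal distribution has *b*_{2} = 3); the function name and sample values are illustrative.

```python
def shape_statistics(sample):
    """Return (skewness b1, kurtosis b2) from central moments of the sample."""
    n = len(sample)
    centre = sum(sample) / n
    m2 = sum((v - centre) ** 2 for v in sample) / n
    m3 = sum((v - centre) ** 3 for v in sample) / n
    m4 = sum((v - centre) ** 4 for v in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# A symmetric sample and a right-skewed sample for comparison.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [0.0, 0.0, 0.0, 0.0, 10.0]
print(shape_statistics(symmetric))
print(shape_statistics(right_skewed))
```

A negative *b*_{1}, as reported for the correlation replicates, signals a longer lower tail, which is the situation in which a symmetric interval around the sample statistic would be misleading.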

To examine the cross-validated performance of the ensemble prediction as a function of ensemble sizes, the number of predictors, and hence initial ensemble model sizes, was varied from five to the maximum available number (20) and the ensemble performance was assessed following the model development and selection procedures described in section 2b (steps 2 and 3). Using cross-validated correlation between observed values and predicted ensemble mean as a measure of model performance, Fig. 8 shows that as the ensemble size increases, the cross-validated correlation skill also increases from as low as +0.62 for a two-member model ensemble to the maximum of +0.84 for the final 11-member ensemble model set. At the same time, the variance of the prediction also decreases as the ensemble size increases, as shown by the dashed line in Fig. 8. However, the ensemble standard deviation for the final ensemble is still higher than the observed standard deviation (Table 2). It can also be seen from the figure that an increase in the number of predictors may not increase the ensemble size, because some of the models associated with the added predictors do not make it to the final ensemble set as they fail to satisfy preset conditions of statistical significance and regression requirements. In general, diversifying the predictor set can increase the likelihood of capturing regional atmospheric predictive signals that improve the prediction of Kiremt rainfall.

### c. Leave-one-out cross validation of all-Ethiopian predictions using March predictors

All-Ethiopian Kiremt rainfall predictability also was assessed for 1970–2002 using March predictors via the LOOCV approach described in section 2b. The procedures described in section 2b were followed for model construction (step 2) and final ensemble model member selection (step 3). After the passage of all statistical tests, 7–14 models were accepted as the final ensemble set from the initial 20 models (Table 2). In addition to the intercept, the initial models had 3–6 predictor coefficients, with the majority of the models having 5 coefficients. The maximum number of nonintercept coefficients allowed in the final ensemble set was five, because reducing the number of regression parameters to four led to a single model for 1974 (Table 3) that did not satisfy the regression assumptions and statistical significance requirements.

Characteristics of regression models developed using a LOOCV approach (section 2b) for 2-month lead time prediction of standardized Kiremt (JJAS) all-Ethiopian rainfall anomalies (Fig. 1) for 1970–2002, based on atmospheric and SST predictors observed in March. The columns under the heading “Number of models having indicated number of coefficients” (3–6) exclude the intercept. Models for which the predictors were strongly intercorrelated (i.e., |*r*_{px}| ≥ 0.5, where *r*_{px} is the correlation between any two predictors in a model) were removed. The column under the heading “Number of models significant at 5% level” gives results of tests for 1) normality, serial independence, and constant variance according to the Shapiro–Wilk, Durbin–Watson, and Breusch–Pagan tests, respectively, and 2) individual model coefficients and overall model fit according to an *F* test. The “Final ensemble model size” column gives the number of statistically significant (at the 5% level) models selected having a maximum of five coefficients excluding the intercept.

The mean and median of the final ensemble model members were used as verification metrics for the forecasts. Figure 9 and Table 3 document the performance of the LOOCV approach for the predictions of standardized all-Ethiopian Kiremt rainfall anomalies. The Pearson’s correlations between the observed and predicted ensemble mean and median respectively are +0.81 and +0.75 (ASLs <1% according to a nonparametric bootstrap test) for 1970–2002. The Spearman’s rank correlation was +0.81 for the ensemble mean and +0.77 for the median.

Table 4 provides additional prediction verification statistics (section 2c) for the final multimodel ensemble shown in Table 3 (last column). The performance statistics under the median column were obtained from individual regression estimates. For verification statistics that require pairwise observed–predicted time series, only the smallest number of final models available in any of the 33 verification years (7, for 1973 and 1982) was used, so that no data pairs are excluded or repeatedly used in cross validating individual regression estimates. This issue arises in the LOOCV approach because the number of final models (and hence the number of predicted values) varies from year to year (Table 3). To make the selection as objective as possible, seven models were randomly sampled (without replacement) for all verification years with larger final model ensembles and were used for cross validation. As seen from Table 4, the LOOCV ensemble mean prediction proved more skillful than the individual regression estimates for all verification measures, including prediction variance (standard deviation), SS_{Clim}, RPSS_{Clim}, *r*, and *E*_{1}. In particular, the RPSS_{Clim} skill for the individual regression estimates was far inferior to the skill obtained for the ensemble mean. To ensure that this low skill was not an artifact of a single random sample, RPSS_{Clim} was computed for 10 more samples obtained with different initializations of the random number generator. In all cases, the median of the RPSS_{Clim} for the individual regression estimates was less than 0, indicating skill worse than climatology.
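The equal-size subsampling described above can be sketched as follows. This is an illustration only: the dictionary of predicted anomalies is hypothetical, and the function name `subsample_to_common_size` is introduced here rather than taken from the study.

```python
import random

def subsample_to_common_size(members_by_year, seed=0):
    """Draw the same number of members for every year, without replacement.

    The common size is the smallest ensemble size found in any year.
    """
    rng = random.Random(seed)
    k = min(len(members) for members in members_by_year.values())
    sampled = {year: rng.sample(members, k)
               for year, members in members_by_year.items()}
    return sampled, k

# Hypothetical predicted standardized anomalies from the final models.
members_by_year = {
    1972: [-0.61, -0.48, -0.55, -0.40, -0.52],
    1973: [0.10, 0.22, 0.15],
    1974: [0.35, 0.28, 0.41, 0.30],
}

sampled, k = subsample_to_common_size(members_by_year, seed=42)

# Repeating the draw with different seeds mimics the check that a result
# is not an artifact of one particular random sample.
repeats = [subsample_to_common_size(members_by_year, seed=s)[0]
           for s in range(10)]
```

Each column of the subsampled set then forms one observed–predicted time series of equal length, so statistics for individual regression estimates can be computed without reusing or dropping years.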

Summary statistics and 95% confidence intervals for model verifications (section 2c) of 2-month lead time prediction of standardized Kiremt (JJAS) all-Ethiopian rainfall anomalies (Fig. 1) for 1970–2002 using a LOOCV approach (section 2b, steps 2 and 3), based on atmospheric and SST predictors observed in March. Ensemble models were developed by excluding a single verification year in turn (section 3c). Confidence intervals were computed by applying the BCa bootstrap method to 5000 verification statistic replicates obtained by bootstrapping the observed and predicted rainfall anomalies (section 2c). The median column gives the median values of the statistics obtained for the individual regression estimates without pooling, and the mean column under the bootstrap heading contains the average of 5000 bootstrap statistic replicates. Asterisks indicate values were computed using the least common number of models available in all verification years (i.e., 7; Table 3).

The prediction variability for the ensemble mean is reasonably close to the observed variability, with the prediction standard deviation being about 74% of the standard deviation of the observations (Table 4). The MAE (0.18 standardized units) and RMSE (0.23 standardized units) are smaller than the observed standard deviation (0.39 standardized units). The LOOCV prediction approach showed a substantial reduction in MSE compared to climatology (65%) and persistence (86%). Generally, the skill of the LOOCV approach remains close to the skill obtained for the RV approach, and the approach provided skillful all-Ethiopian Kiremt rainfall forecasts from the average regional atmospheric and global SST conditions observed in March (e.g., *r* = +0.81, *d*_{1} = 66%, and RPSS_{Clim} = 45%).
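For reference, the error measures quoted above can be computed as in the following sketch. It assumes the usual definitions: SS_{Clim} = 1 − MSE/MSE_{Clim}, with the climatological forecast taken here as the mean of the verification sample, and the modified coefficient of efficiency *E*_{1} and index of agreement *d*_{1} as given by Legates and McCabe (1999). The function name and the sample values are hypothetical.

```python
def verification_stats(obs, pred):
    """MAE, RMSE, SS_clim, E1, and d1 for paired observations and predictions."""
    n = len(obs)
    obar = sum(obs) / n                      # climatological reference
    abs_err = [abs(p - o) for o, p in zip(obs, pred)]
    mae = sum(abs_err) / n
    mse = sum(e * e for e in abs_err) / n
    mse_clim = sum((o - obar) ** 2 for o in obs) / n
    e1 = 1.0 - sum(abs_err) / sum(abs(o - obar) for o in obs)
    d1 = 1.0 - sum(abs_err) / sum(abs(p - obar) + abs(o - obar)
                                  for o, p in zip(obs, pred))
    return {"MAE": mae, "RMSE": mse ** 0.5,
            "SS_clim": 1.0 - mse / mse_clim, "E1": e1, "d1": d1}

# Hypothetical standardized anomalies.
obs = [-0.6, -0.2, 0.1, 0.3, 0.4]
pred = [-0.4, -0.1, 0.0, 0.2, 0.5]
stats = verification_stats(obs, pred)
print({name: round(value, 2) for name, value in stats.items()})
```

As a rough consistency check on the rounded values in the text, if MSE_{Clim} is approximated by the observed variance, then 1 − (0.23/0.39)^2 ≈ 0.65, in line with the quoted 65% reduction in MSE relative to climatology.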

### d. Leave-one-out cross validation for local and regional predictions using March predictors

The LOOCV approach was applied on finer temporal and spatial scales for the prediction of August monthly rainfall totals at Addis Ababa and Combolcha (Fig. 1) and standardized JJAS rainfall anomalies for northeastern Ethiopia based on atmospheric and SST predictors observed in March. Tables 5–7 list the sets of 20 predictors used for local and regional predictions. Following previous studies (e.g., Gissila et al. 2004; Block and Rajagopalan 2007; Korecha and Barnston 2007; Nicholson 2014), these predictors were selected based on a careful assessment of maps of predictand–predictor correlations. All selected predictors have statistically significant correlations with ASLs ≤5% according to a nonparametric bootstrap test for both the 1970–90 and 1970–99 periods. The predictors for Addis Ababa (Table 5) include upper-tropospheric air temperature, meridional moisture flux over the northern Arabian Sea, SST over the Bay of Bengal, and products of zonal and meridional winds with vertical velocities across the southern oceans, most of which are strongly correlated with the performance of the Belg rains in Ethiopia and the development of the monsoon systems in summer. For Combolcha (Table 6), lower-tropospheric meridional wind and specific humidity over the central Indian Ocean and SST over the southern Pacific possessed strong predictive signals, while for regional predictions (Table 7) midtropospheric zonal and meridional wind over the western Indian Ocean and vertical velocity over West Africa and/or the Gulf of Guinea in March are major predictors of JJAS rainfall anomalies for northeastern Ethiopia.
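The screening criterion above (ASL ≤5% in both the 1970–90 and 1970–99 periods) can be sketched as follows. The exact bootstrap test used in the study is not specified in this section, so the sketch assumes one common variant: the null hypothesis of no association is simulated by resampling the predictor and the predictand independently, and the ASL is the fraction of replicates whose absolute correlation reaches the observed one. The function names and the synthetic series are hypothetical.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 for a degenerate (constant) sample."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = (sxx * syy) ** 0.5
    return sxy / denom if denom else 0.0

def bootstrap_asl(x, y, n_boot=2000, seed=0):
    """Two-sided achieved significance level under a no-association null."""
    rng = random.Random(seed)
    n = len(x)
    r_obs = abs(pearson_r(x, y))
    exceed = 0
    for _ in range(n_boot):
        # Independent resampling breaks any real predictor-rainfall pairing.
        xs = [x[rng.randrange(n)] for _ in range(n)]
        ys = [y[rng.randrange(n)] for _ in range(n)]
        if abs(pearson_r(xs, ys)) >= r_obs:
            exceed += 1
    return exceed / n_boot

def passes_screening(x, y, n_first, level=0.05):
    """Require significance in the first n_first years and in the full record."""
    return (bootstrap_asl(x[:n_first], y[:n_first]) <= level
            and bootstrap_asl(x, y) <= level)

# Synthetic 30-year series (standing in for 1970-99); first 21 years = 1970-90.
predictor = [((7 * i) % 11) - 5 + 0.1 * i for i in range(30)]
rainfall = [2.0 * p + ((3 * i) % 5 - 2) * 0.5 for i, p in enumerate(predictor)]
keep = passes_screening(predictor, rainfall, n_first=21)
```

Requiring significance in both periods guards against predictors whose correlation is carried by only a few years of the record.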

Predictors selected for the prediction of August total rainfall for Addis Ababa (Fig. 1) using a LOOCV approach (section 2b). Predictors were selected based on their high correlations with rainfall, with predictand–predictor correlations being statistically significant with ASLs ≤5% according to a nonparametric bootstrap test.

Predictors selected for the prediction of August total rainfall for Combolcha (Fig. 1) using a LOOCV approach (section 2b). Predictors were selected based on their high correlations with rainfall, with predictand–predictor correlations being statistically significant with ASLs ≤5% according to a nonparametric bootstrap test.

Predictors selected for the prediction of standardized JJAS rainfall anomalies for northeastern Ethiopia (Fig. 1; 11 stations for 1970–99, stars; 7 stations for 2000–02, squares) using a LOOCV approach (section 2b). Predictors were selected based on their high correlations with rainfall, with predictand–predictor correlations being statistically significant with ASLs ≤5% according to a nonparametric bootstrap test.

The procedures presented in section 2b (steps 2 and 3) were followed for model construction and final ensemble model member selection for local predictions for Addis Ababa and Combolcha for 1970–99 and regional prediction for northeastern Ethiopia for 1970–2002. In all cases, models with a maximum of only five nonintercept coefficients were allowed in the final ensemble set.

For Addis Ababa, the final ensemble models range from 5 for 1984 to 14 for 1972, with the majority of the models having 4–5 nonintercept coefficients (Table 8). Training period degrees-of-freedom–adjusted *R*^{2} (coefficient of multiple determination) varied from 56% to 88% (from 61% to 91%) for all verification years. The LOOCV ensemble mean overestimated the three driest years (1987, 1972, and 1975) by 51–91 mm and underestimated the wettest three years by 18–80 mm (Fig. 10). However, differences between predicted and observed rainfall amounts were less than 50 (33) mm for 77% (73%) of the 1970–99 verification years. The observed and LOOCV ensemble means for 1970–99 differed by <5 mm, with the year-to-year variability of the ensemble mean being 86% of the observed standard deviation. The RPSS_{Clim} and all MSE-based skill scores (SS_{Clim}, SS_{Pers}, *E*_{1}, *d*_{1}, and *d*_{2}) showed improved forecast skill relative to the 1970–99 climatology and persistence (Table 9). The Pearson’s correlations between observed and predicted ensemble mean and median respectively are +0.72 and +0.68 (ASLs <1% according to a conventional nonparametric bootstrap test), which respectively decreased to +0.64 and +0.58 when the maximum number of nonintercept coefficients was reduced to four. In the latter case, however, the final ensemble sizes were small, ranging from 3 or 4 models for many verification years to 12 models for 1972, which can affect the quantification of prediction uncertainty for years with small ensemble sizes. Compared to the median values of individual regression estimates, there is a substantial improvement of the ensemble averaging as evidenced by a smaller predicted-minus-observed mean difference, higher prediction variance, and smaller MAE and RMSE values. 
In addition, the ensemble mean provided higher skill values (Table 8) for all verification metrics, including RPSS_{Clim}, SS_{Clim}, and *E*_{1}, when five models were randomly selected from the final ensemble for every year except 1984, which had only five ensemble members.

Characteristics of regression models developed using a LOOCV approach (section 2b, steps 2 and 3) for 3-month lead time localized prediction of August monthly total rainfall at Addis Ababa (Bole International Airport; Fig. 1) for 1970–99, based on atmospheric and SST predictors observed in March. The columns under the heading “Number of models having indicated number of coefficients” (3–7) exclude the intercept. Models for which the predictors were strongly intercorrelated (i.e., |*r*_{px}| ≥ 0.5, where *r*_{px} is the correlation between any two predictors in a model) were removed. The column under the heading “Number of models significant at 5% level” gives results of tests for 1) normality, serial independence, and constant variance according to the Shapiro–Wilk, Durbin–Watson, and Breusch–Pagan tests, respectively, and 2) individual model coefficients and overall model fit according to an *F* test. The “Final ensemble model size” column gives the number of statistically significant (at the 5% level) models selected with a maximum of five nonintercept coefficients.

As in Table 4, but for 3-month lead time localized prediction of August monthly total rainfall at Addis Ababa (Bole International Airport; Fig. 1) for 1970–99, based on atmospheric and SST predictors observed in March. Asterisks indicate values computed using the least common number of models available for all verification years (i.e., 5; Table 8).

For the prediction of August rainfall total for Combolcha, from 7 (for 1974) to 17 (for 1986) final ensemble members were selected from the initial 20 models (Table 10). In addition to the intercept, the initial models stepped in 3–6 predictors, with the majority of the models having four regression coefficients. The LOOCV ensemble mean (median) forecasts (Fig. 11) for the final ensembles produced an observed–predicted Pearson’s correlation of +0.68 (+0.63), both with ≤1% ASLs according to a nonparametric bootstrap test. The Spearman’s rank correlation decreased for both the ensemble mean (+0.63) and median (+0.56), indicating a reduced monotonic association between ensemble-mean predicted and observed time series. Reducing the maximum number of coefficients from five to four did not change the correlation skills, but lowered the ensemble sizes to between 5 (for many verification years) and 12 for 1996. The observed and prediction ensemble means have comparable values (Table 11), but the predicted year-to-year variability was 19% lower than the observed standard deviation. The prediction outperformed climatology and persistence, with a 45% (76%) reduction of the MSE obtained from 1970–99 climatology (persistence). Note that RPSS_{Clim} and *E*_{1} are not statistically different from zero. Despite the ensemble’s poor RPSS_{Clim} and *E*_{1} skill values, the ensemble averaging showed modest improvements over the median values of individual regression estimates (excluding RPSS_{Clim} and *E*_{1}), with percentage skill increases of 2%–37% for relative error measures, and reductions in absolute error measures of 5%–10%.

As in Table 3, but for 3-month lead time localized prediction of August monthly total rainfall at Combolcha (Fig. 1) for 1970–99, based on atmospheric and SST predictors observed in March. The maximum number of nonintercept coefficients stepped in was six, and all models with six coefficients were excluded from the final ensemble models.

As in Table 4, but for 3-month lead time localized prediction of August monthly total rainfall at Combolcha (Fig. 1) for 1970–99, based on atmospheric and SST predictors observed in March. Asterisks indicate values computed using the least common number of models available for all verification years (i.e., 7; Table 10).

For regional JJAS rainfall prediction for northeastern Ethiopia, from 6 (for 1986) to 13 (for 1994) final ensemble models were selected. The majority of models again had 4–5 coefficients, and none of the final ensemble models had more than five nonintercept coefficients (Table 12). The Pearson’s correlations between observed and predicted ensemble mean and median respectively are +0.80 and +0.75 (Fig. 12; ASLs <1% according to a conventional nonparametric bootstrap test), while the Spearman’s rank correlation counterpart for the ensemble mean (median) is +0.76 (+0.75). The Pearson’s correlation skill values for the ensemble mean and median decreased slightly, to +0.78 and +0.73, respectively, when the maximum number of nonintercept coefficients was reduced from five to four. However, the small change in correlation values was accompanied by substantial reductions in ensemble sizes (4 and 5 for many years and 11 for 1982). The small ensemble sizes in turn affect the quantification of prediction uncertainty. Although the ensemble mean overestimated the four driest years and underestimated the two wettest years, the prediction exhibited sufficiently high year-to-year variability (90% of the observed standard deviation). Unlike for Combolcha, the RPSS_{Clim} and all MSE-based skill scores (SS_{Clim}, SS_{Pers}, *E*_{1}, *d*_{1}, and *d*_{2}) showed improved forecast skill relative to the 1970–2002 climatology and persistence (Table 13). Compared to the median values of individual regression estimates, the ensemble mean showed 20%–22% decreases in MAE and RMSE and 7%–62% increases in SS_{Clim}, SS_{Pers}, *r*, *E*_{1}, *d*_{1}, and *d*_{2} when six models were randomly selected from the final ensemble for every year except 1986, which had only six ensemble members. Thus, the ensemble-based statistical technique provided high skill scores for regional prediction of JJAS standardized rainfall anomalies two months in advance of the onset of Kiremt in northeastern Ethiopia.

As in Table 3, but for a LOOCV approach (section 2b) for 2-month lead time prediction of standardized JJAS northeastern Ethiopia (Fig. 1; 11 stations for 1970–99, stars; 7 stations for 2000–02, collocated squares) rainfall anomalies for 1970–2002, based on atmospheric and SST predictors observed in March. The columns under the heading “Number of models having indicated number of coefficients” (3–7) exclude the intercept. The “Final ensemble model size” column gives the number of statistically significant (at the 5% level) models selected with a maximum of five nonintercept coefficients.

As in Table 4, but for 2-month lead time prediction of standardized JJAS northeastern Ethiopia (Fig. 1, 11 stations for 1970–99, stars; 7 stations for 2000–02, collocated squares) rainfall anomalies for 1970–2002, based on atmospheric and SST predictors observed in March. Asterisks indicate values were computed using the least common number of models available in all verification years (i.e., 6; Table 12).

## 4. Summary and discussion

This study evaluates the predictability of Ethiopian Kiremt monthly-to-seasonal rainfall at local, regional, and national levels based on the climate system causation of Kiremt rainfall variability over Ethiopia. Time-scale analysis in Part I showed the contemporaneous linkages of Ethiopian rainfall variability with both atmospheric and sea surface temperature (SST) forcing during June–September (JJAS). The monsoon’s response to both slowly varying SST variability and higher-frequency regional atmospheric changes was found to be the primary climate system causation on which to base the assessment of Ethiopian seasonal and monthly rainfall predictability one to three months in advance.

A pool of 20 predictors deemed relevant to the rainfall was selected by careful assessment of a series of historical correlation maps that relate rainfall to individual regional atmospheric and global SST variables for March. Regression models were constructed using a forward stepwise model-fitting procedure in which each selected predictor in turn was specified as the first model parameter, along with an intercept, to ensure that a potential predictive signal uniquely associated with that predictor is not unduly discarded. This produces an ensemble of models, one seeded by each predictor. Forecast skill was assessed using several verification metrics.
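The seeded forward stepwise construction summarized above can be sketched as follows. Several details are assumptions made for illustration and are not taken from the study: predictors enter when they reduce the residual sum of squares by at least a fixed fraction (`min_gain`), the intercorrelation limit (|*r*| ≥ 0.5) is applied while selecting rather than by discarding finished models, and the significance and residual tests of step 3 are omitted. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(row[r] * row[s] for row in X) for s in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))] for r in range(p)]
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        for r in range(p):
            if r != j:
                f = A[r][j] / A[j][j]
                A[r] = [a - f * b for a, b in zip(A[r], A[j])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def build_ensemble(predictors, y, max_terms=5, min_gain=0.05, max_intercorr=0.5):
    """One stepwise model per predictor, each forced to start with that predictor."""
    ensemble = []
    for first in predictors:
        chosen = [first]
        rss = ols_rss([predictors[first]], y)
        while len(chosen) < max_terms and rss > 0.0:
            best = None
            for name in predictors:
                if name in chosen or any(
                        abs(pearson_r(predictors[name], predictors[c])) >= max_intercorr
                        for c in chosen):
                    continue
                cand = ols_rss([predictors[c] for c in chosen + [name]], y)
                if best is None or cand < best[1]:
                    best = (name, cand)
            if best is None or (rss - best[1]) / rss < min_gain:
                break
            chosen.append(best[0])
            rss = best[1]
        ensemble.append((chosen, rss))
    return ensemble

# Synthetic data: six candidate predictors, 30 years.
rng = random.Random(0)
predictors = {f"x{j}": [rng.gauss(0.0, 1.0) for _ in range(30)] for j in range(6)}
y = [1.5 * a - 1.0 * b + rng.gauss(0.0, 0.3)
     for a, b in zip(predictors["x0"], predictors["x1"])]
ensemble = build_ensemble(predictors, y)
```

Forcing each predictor into the first position means that the initial ensemble contains as many models as there are predictors, which matches the relation between predictor count and initial ensemble size described in section 3.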

In the retroactive verification (RV) approach, the ensemble prediction for 1990–2002 reproduced well the observed all-Ethiopian Kiremt rainfall variability two months in advance, with a Pearson’s correlation [mean-square skill score over climatology (SS_{Clim})] of +0.84 (62%). The leave-one-out cross validation (LOOCV) for 1970–2002, which is a fairer test of the predictive capability of the models, has a Pearson’s correlation (SS_{Clim}) of +0.81 (65%). For probability forecasts of below normal, near normal, and above normal, the ensemble mean showed improvement compared to climatological forecasts, with a ranked probability skill score (RPSS_{Clim}) of 0.45 for the LOOCV approach. Results of LOOCV-based localized predictions of August rainfall at Addis Ababa and Combolcha for 1970–99 showed that the predictions captured the relative interannual variability well. The correlations between observed values and ensemble means for Addis Ababa (+0.72) and Combolcha (+0.68) were high. Consistent with the lower Spearman’s rank correlation, the RPSS_{Clim} skill is low, especially for Combolcha (+0.12), where it is not statistically significant at the 5% level. For regional prediction of JJAS standardized rainfall for northeastern Ethiopia, strong linear correlation (+0.80) was found.
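A minimal sketch of the three-category ranked probability skill score follows. It uses the standard definition, RPSS_{Clim} = 1 − RPS/RPS_{Clim}, with equal climatological probabilities of one-third per category. Taking forecast probabilities as the fraction of ensemble members in each category, and the category thresholds of ±0.3 standardized units, are assumptions made only for this illustration.

```python
def category(value, lower, upper):
    """0 = below normal, 1 = near normal, 2 = above normal."""
    if value < lower:
        return 0
    if value > upper:
        return 2
    return 1

def ensemble_probs(members, lower, upper):
    """Category probabilities as the fraction of members in each category."""
    counts = [0, 0, 0]
    for m in members:
        counts[category(m, lower, upper)] += 1
    return [c / len(members) for c in counts]

def rps(probs, obs_cat):
    """Ranked probability score: squared differences of cumulative probabilities."""
    cum_f = cum_o = score = 0.0
    for k, p in enumerate(probs):
        cum_f += p
        cum_o += 1.0 if k == obs_cat else 0.0
        score += (cum_f - cum_o) ** 2
    return score

def rpss_clim(forecast_probs, obs_cats):
    """Skill relative to a climatological forecast of 1/3 per category."""
    clim = [1.0 / 3.0] * 3
    rps_f = sum(rps(p, o) for p, o in zip(forecast_probs, obs_cats))
    rps_c = sum(rps(clim, o) for o in obs_cats)
    return 1.0 - rps_f / rps_c

# Hypothetical ensembles for three years and the observed anomalies.
lower, upper = -0.3, 0.3
ensembles = [[-0.5, -0.4, 0.1, -0.6], [0.0, 0.2, 0.4, -0.1], [0.5, 0.6, 0.2, 0.7]]
observed = [-0.45, 0.05, 0.55]
probs = [ensemble_probs(e, lower, upper) for e in ensembles]
cats = [category(o, lower, upper) for o in observed]
skill = rpss_clim(probs, cats)
```

A positive value indicates that the categorical probabilities are closer to the observed categories than the climatological forecast, and a value at or below zero indicates no improvement over climatology.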

There is a marked improvement in the performance of the current empirical ensemble prediction method compared to the prediction skill found in previous studies. For example, the correlation skill derived herein between predicted and observed standardized all-Ethiopian Kiremt rainfall anomalies was +0.81 (+0.84) for the LOOCV (RV) approach for models initiated from average March atmospheric and SST conditions. This can be compared to Korecha and Barnston (2007), who found a correlation skill of +0.64 (+0.51) between predicted and observed standardized all-Ethiopian Kiremt rainfall anomalies for the LOOCV (RV) verification approach for models developed from March–May SSTs (i.e., the present correlations are 27%–65% higher). For regional northwestern Ethiopia prediction, Block and Rajagopalan (2007) found cross-validated correlation (RPSS_{Clim}) skill of +0.69 (+0.39), based on local polynomial regression models developed using March–May predictors. The current ensemble prediction technique has about 16%–44% gain in explained variance compared to the corresponding *R*^{2} in these studies. In addition, the ensemble prediction provided longer lead times compared to the above studies (March versus May). However, Nicholson (2014) found a high correlation skill of +0.75 for LOOCV-based July–September rainfall predictions for the larger Horn of Africa region encompassing Ethiopia and Sudan. Although enlargement of the domain could have reduced the rainfall variability [e.g., negative 1996 rainfall anomaly for the larger Horn of Africa in Fig. 12 of Nicholson (2014) versus the locally wet all-Ethiopian Kiremt in Figs. 6 and 8] and affected the comparison of results, the current ensemble-based prediction still shows improved skill at longer lead times (March versus May) and finer spatial scales, with 8%–9% gain in explained variance compared to the corresponding *R*^{2} of 56% (for LOOCV) and 62% (for RV) in Nicholson (2014).
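The comparisons above can be retraced with simple arithmetic. The sketch below assumes that the correlation improvements are relative differences and that the gains in explained variance are percentage-point differences between squared correlations truncated to whole percent. The truncation is an inference that happens to reproduce the quoted 8%–9% and the 44% upper bound; the text does not state how the values were rounded.

```python
def relative_gain_pct(r_new, r_old):
    """Relative improvement in correlation, in percent."""
    return 100.0 * (r_new - r_old) / r_old

def r2_pct(r):
    """Explained variance in whole percent (truncated; an assumption)."""
    return int(100.0 * r * r)

# Correlations quoted in the text: this study versus Korecha and Barnston (2007).
gain_loocv = relative_gain_pct(0.81, 0.64)     # about 27%
gain_rv = relative_gain_pct(0.84, 0.51)        # about 65%

# Explained-variance gains in percentage points.
r2_gain_rv_kb = r2_pct(0.84) - r2_pct(0.51)    # versus Korecha and Barnston, RV
r2_gain_loocv_n = r2_pct(0.81) - 56            # versus Nicholson (2014), LOOCV
r2_gain_rv_n = r2_pct(0.84) - 62               # versus Nicholson (2014), RV

print(round(gain_loocv), round(gain_rv), r2_gain_rv_kb, r2_gain_loocv_n, r2_gain_rv_n)
```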

The RV (using a model developed for 1970–89 for upcoming season prediction) and LOOCV (using a model developed for all years up to the current year for upcoming season prediction) approaches can be applied for real-time predictions of all-Ethiopian Kiremt rainfall anomalies a few days after the end of March, as the necessary initialization data (real-time reanalysis, SST) are available 5–10 days after the end of the month. The predictors selected in this study may not pose a problem in the near-term climate. However, the predictor pool should be reevaluated every few years to account for 1) predictors that remain skillful but have decreasing frequency, 2) predictors that are decreasing in skill, 3) predictors that were not included in the selected sets that are starting to show additional skill, and 4) changes in the likelihood functions due to climate change. Updating the RV and LOOCV prediction models using the entire dataset, followed by real-time forecast evaluation, could realistically lead to successful real-time operational use of the approaches. The simultaneous use of both approaches builds confidence in the value of the prediction if they yield similar forecasts. Moreover, to our knowledge, the methodology of building observational ensembles for statistical prediction is unique. The high-quality local and national prediction capability should have significant beneficial societal implications. In particular, the forecasting of seasonal anomalies at regional and national scales and monthly rainfall totals at specific localities with usable skill could play a key role in risk management to help minimize the damaging effects of recurring droughts in Ethiopia.

## Acknowledgments

This research was supported by the NOAA Cooperative Institute for Mesoscale Meteorological Studies (CIMMS). The computations for this project were performed at the OU Supercomputing Center for Education and Research (OSCER) at The University of Oklahoma. The primary research data (1970–99) were provided by the National Meteorological Agency of Ethiopia in the early part of this research. Ethiopian seasonal rainfall station data for 2000–02 were kindly provided by Dr. A. G. Barnston. The comments and suggestions of three anonymous reviewers are appreciated. In particular, Reviewer 1’s detailed analysis of the manuscript greatly improved the final product. The first three authors express their deep and lasting gratitude to the fourth author, the late Peter J. Lamb, who dedicated much of his life to investigation of climate variability in Africa. Without his support and expert assistance in preparing this manuscript, it would not have been possible.

## REFERENCES

Barnston, A. G., W. Thiao, and V. Kumar, 1996: Long-lead forecasts of seasonal precipitation in Africa using CCA. *Wea. Forecasting*, **11**, 506–520, doi:10.1175/1520-0434(1996)011<0506:LLFOSP>2.0.CO;2.

Beltrando, G., and P. Camberlin, 1993: Interannual variability of rainfall in the eastern Horn of Africa and indicators of atmospheric circulation. *Int. J. Climatol.*, **13**, 533–546, doi:10.1002/joc.3370130505.

Berhane, F., B. Zaitchik, and A. Dezfuli, 2014: Subseasonal analysis of precipitation variability in the Blue Nile river basin. *J. Climate*, **27**, 325–344, doi:10.1175/JCLI-D-13-00094.1.

Block, P., and B. Rajagopalan, 2007: Interannual variability and ensemble forecast of Upper Blue Nile Basin Kiremt season precipitation. *J. Hydrometeor.*, **8**, 327–343, doi:10.1175/JHM580.1.

Bohn, T. J., M. Y. Sonessa, and D. P. Lettenmaier, 2010: Seasonal hydrologic forecasting: Do multimodel ensemble averages always yield improvements in forecast skill? *J. Hydrometeor.*, **11**, 1358–1372, doi:10.1175/2010JHM1267.1.

Bowden, J. H., and F. H. M. Semazzi, 2007: Empirical analysis of intraseasonal climate variability over the Greater Horn of Africa. *J. Climate*, **20**, 5715–5731, doi:10.1175/2007JCLI1587.1.

Breusch, T. S., and A. R. Pagan, 1979: A simple test for heteroscedasticity and random coefficient variation. *Econometrica*, **47**, 1287–1294, doi:10.2307/1911963.

Camberlin, P., 1997: Rainfall anomalies in the source region of the Nile and their connection with the Indian summer. *J. Climate*, **10**, 1380–1392, doi:10.1175/1520-0442(1997)010<1380:RAITSR>2.0.CO;2.

Camberlin, P., S. Janicot, and I. Poccard, 2001: Seasonality and atmospheric dynamics of the teleconnection between African rainfall and tropical sea-surface temperature: Atlantic vs. ENSO. *Int. J. Climatol.*, **21**, 973–1005, doi:10.1002/joc.673.

Charney, J. G., 1975: Dynamics of deserts and drought in the Sahel. *Quart. J. Roy. Meteor. Soc.*, **101**, 193–202, doi:10.1002/qj.49710142802.

Charney, J. G., and J. Shukla, 1981: Predictability of monsoons. *Monsoon Dynamics*, J. Lighthill and R. P. Pearce, Eds., University Press, 99–109.

Charney, J. G., W. J. Quirk, S.-H. Chow, and J. Kornfield, 1977: A comparative study of the effects of albedo change on drought in semi-arid regions. *J. Atmos. Sci.*, **34**, 1366–1385, doi:10.1175/1520-0469(1977)034<1366:ACSOTE>2.0.CO;2.

Cheung, W. H., G. B. Senay, and A. Singh, 2008: Trends and spatial distribution of annual and seasonal rainfall in Ethiopia. *Int. J. Climatol.*, **28**, 1723–1734, doi:10.1002/joc.1623.

Clark, C. A., and R. W. Arritt, 1995: Numerical simulations of the effect of soil moisture and vegetation cover on the development of deep convection. *J. Appl. Meteor.*, **34**, 2029–2045, doi:10.1175/1520-0450(1995)034<2029:NSOTEO>2.0.CO;2.

Clark, D. B., Y. Xue, R. J. Harding, and P. J. Valdes, 2001: Modeling the impact of land surface degradation on the climate of tropical North Africa. *J. Climate*, **14**, 1809–1822, doi:10.1175/1520-0442(2001)014<1809:MTIOLS>2.0.CO;2.

Conway, D., 2000: The climate and hydrology of the Upper Blue Nile River. *Geogr. J.*, **166**, 49–62, doi:10.1111/j.1475-4959.2000.tb00006.x.

Degefu, W., 1987: Some aspects of meteorological droughts in Ethiopia. *Drought and Hunger in Africa: Denying Famine a Future*, M. H. Glantz, Ed., Cambridge University Press, 23–36.

DelSole, T., and J. Shukla, 2002: Linear prediction of Indian monsoon rainfall. *J. Climate*, **15**, 3645–3658, doi:10.1175/1520-0442(2002)015<3645:LPOIMR>2.0.CO;2.

Diro, G. T., E. Black, and D. I. F. Grimes, 2008: Seasonal forecasting of Ethiopian spring rains. *Meteor. Appl.*, **15**, 73–83, doi:10.1002/met.63.

Draper, N. R., and H. Smith, 1981: *Applied Regression Analysis.* Wiley, 724 pp.

Efron, B., and G. Gong, 1983: A leisurely look at the bootstrap, jackknife, and cross-validation. *Amer. Stat.*, **37**, 36–48.

Efron, B., and R. J. Tibshirani, 1993: *An Introduction to the Bootstrap.* Chapman and Hall, 436 pp.

Folland, C. K., T. N. Palmer, and D. E. Parker, 1986: Sahel rainfall and world-wide sea temperatures. *Nature*, **320**, 602–607, doi:10.1038/320602a0.

Folland, C. K., J. Owen, M. N. Ward, and A. Colman, 1991: Prediction of seasonal rainfall in the Sahel region using empirical and dynamical methods. *J. Forecasting*, **10**, 21–56, doi:10.1002/for.3980100104.

Giannini, A., R. Saravanan, and P. Chang, 2003: Oceanic forcing of Sahel rainfall on interannual to interdecadal time scales. *Science*, **302**, 1027–1030, doi:10.1126/science.1089357.

Gissila, T., E. Black, D. I. F. Grimes, and J. M. Slingo, 2004: Seasonal forecasting of the Ethiopian summer rains. *Int. J. Climatol.*, **24**, 1345–1358, doi:10.1002/joc.1078.

Goddard, L., S. J. Mason, S. E. Zebiak, C. F. Ropelewski, R. Basher, and M. A. Cane, 2001: Current approaches to seasonal-to-interannual climate predictions. *Int. J. Climatol.*, **21**, 1111–1152, doi:10.1002/joc.636.

Goswami, B., and J. Shukla, 1991: Predictability of a coupled ocean–atmosphere model. *J. Climate*, **4**, 3–22, doi:10.1175/1520-0442(1991)004<0003:POACOA>2.0.CO;2.

Hastenrath, S., L. Greischar, and J. van Heerden, 1995: Prediction of summer rainfall over South Africa. *J. Climate*, **8**, 1511–1518, doi:10.1175/1520-0442(1995)008<1511:POTSRO>2.0.CO;2.

Jury, M. R., 2013: Ethiopian highlands crop-climate prediction: 1979–2009. *J. Appl. Meteor. Climatol.*, **52**, 1116–1126, doi:10.1175/JAMC-D-12-0139.1.

Kalnay, E., et al., 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471, doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.

Korecha, D., and A. G. Barnston, 2007: Predictability of June to September rainfall in Ethiopia. *Mon. Wea. Rev.*, **135**, 628–650, doi:10.1175/MWR3304.1.

Korecha, D., and A. Sorteberg, 2013: Validation of operational seasonal rainfall forecast in Ethiopia. *Water Resour. Res.*, **49**, 7681–7697, doi:10.1002/2013WR013760.

Krishnamurti, T. N., C. M. Kishtawal, Z. Zhang, T. Larow, D. Bachiochi, E. Williford, S. Gadgil, and S. Surendran, 2000: Multimodel ensemble forecasts for weather and seasonal climate. *J. Climate*, **13**, 4196–4216, doi:10.1175/1520-0442(2000)013<4196:MEFFWA>2.0.CO;2.

Lamb, P. J., 1978a: Large-scale tropical Atlantic surface circulation patterns associated with sub-Saharan weather anomalies. *Tellus*, **30A**, 240–251, doi:10.1111/j.2153-3490.1978.tb00839.x.

Lamb, P. J., 1978b: Case studies of tropical Atlantic surface circulation patterns during recent sub-Saharan weather anomalies: 1967 and 1968. *Mon. Wea. Rev.*, **106**, 482–491, doi:10.1175/1520-0493(1978)106<0482:CSOTAS>2.0.CO;2.

Lamb, P. J., and R. A. Peppler, 1992: Further case studies of tropical Atlantic surface atmospheric and oceanic patterns associated with sub-Saharan drought. *J. Climate*, **5**, 476–488, doi:10.1175/1520-0442(1992)005<0476:FCSOTA>2.0.CO;2.

Lanckriet, S., A. Frankl, E. Adgo, P. Termonia, and J. Nyssen, 2015: Droughts related to quasi-global oscillations: A diagnostic teleconnection analysis in North Ethiopia. *Int. J. Climatol.*, doi:10.1002/joc.4074, in press.

Legates, D. R., and G. J. McCabe, 1999: Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. *Water Resour. Res.*, **35**, 233–241, doi:10.1029/1998WR900018.

Mason, S. J., and G. M. Mimmack, 1992: The use of bootstrap confidence intervals for the correlation coefficient in climatology. *Theor. Appl. Climatol.*, **45**, 229–233, doi:10.1007/BF00865512.

Mutai, C. C., and M. N. Ward, 2000: East African rainfall and the tropical circulation/convection on intraseasonal to interannual timescales. *J. Climate*, **13**, 3915–3939, doi:10.1175/1520-0442(2000)013<3915:EARATT>2.0.CO;2.

Ndiaye, O., M. N. Ward, and W. M. Thiaw, 2011: Predictability of seasonal Sahel rainfall using GCMs and lead-time improvements through the use of a coupled model. *J. Climate*, **24**, 1931–1949, doi:10.1175/2010JCLI3557.1.

Nicholls, N., 2001: Commentary and analysis: The insignificance of significance testing. *Bull. Amer. Meteor. Soc.*, **82**, 981–986, doi:10.1175/1520-0477(2001)082<0981:CAATIO>2.3.CO;2.

Nicholson, S. E., 2014: The predictability of rainfall over the Greater Horn of Africa. Part I. Prediction of seasonal rainfall. *J. Hydrometeor.*, **15**, 1011–1027, doi:10.1175/JHM-D-13-062.1.

Rayner, N. A., D. E. Parker, E. B. Horton, C. K. Folland, L. V. Alexander, D. P. Rowell, E. C. Kent, and A. Kaplan, 2003: Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. *J. Geophys. Res.*, **108**, 4407, doi:10.1029/2002JD002670.

Riddle, E. E., and K. H. Cook, 2008: Abrupt rainfall transitions over the Greater Horn of Africa: Observations and regional model simulations. *J. Geophys. Res.*, **113**, D15109, doi:10.1029/2007JD009202.

Saunders, M. A., and A. S. Lea, 2005: Seasonal prediction of hurricane activity reaching the coast of the United States. *Nature*, **434**, 1005–1008, doi:10.1038/nature03454.

Segele, Z. T., and P. J. Lamb, 2005: Characterization and variability of Kiremt rainy season over Ethiopia.

,*Meteor. Atmos. Phys.***89**, 153–180, doi:10.1007/s00703-005-0127-x.Segele, Z. T., , P. J. Lamb, , and L. M. Leslie, 2009a: Large-scale atmospheric circulation and global sea surface temperature associations with Horn of Africa June–September rainfall.

,*Int. J. Climatol.***29**, 1075–1100, doi:10.1002/joc.1751.Segele, Z. T., , P. J. Lamb, , and L. M. Leslie, 2009b: Seasonal-to-interannual variability of Ethiopian/Horn of Africa monsoon. Part I: Associations of wavelet-filtered large-scale atmospheric circulation and global sea surface temperature.

,*J. Climate***22**, 3396–3421, doi:10.1175/2008JCLI2859.1.Segele, Z. T., , L. M. Leslie, , and P. J. Lamb, 2009c: Evaluation and adaptation of a regional climate model for the Horn of Africa: Rainfall climatology and interannual variability.

,*Int. J. Climatol.***29**, 47–65, doi:10.1002/joc.1681.Seleshi, Y., , and U. Zanke, 2004: Recent changes in rainfall and rainy days in Ethiopia.

,*Int. J. Climatol.***24**, 973–983, doi:10.1002/joc.1052.Shanko, D., , and P. Camberlin, 1998: The effects of the southwest Indian Ocean tropical cyclones on Ethiopian drought.

,*Int. J. Climatol.***18**, 1373–1388, doi:10.1002/(SICI)1097-0088(1998100)18:12<1373::AID-JOC313>3.0.CO;2-K.Solow, A. R., 1985: Bootstrapping correlated data.

,*J. Int. Assoc. Math. Geol.***17**, 769–775, doi:10.1007/BF01031616.S-PLUS, 2013: TIBCO Spotfire S+ 8.2. Tibco Software, Inc. [Available online at http://tibco-spotfire-s.software.informer.com/8.2/.]

Steinskog, D. J., D. B. Tjøstheim, and N. G. Kvamstø, 2007: A cautionary note on the use of the Kolmogorov–Smirnov test for normality. *Mon. Wea. Rev.*, **135**, 1151–1157, doi:10.1175/MWR3326.1.

Tamhane, A. C., and D. D. Dunlop, 2000: *Statistics and Data Analysis: From Elementary to Intermediate.* Prentice Hall, 736 pp.

Venables, W. M., and B. D. Ripley, 1997: *Statistics and Computing: Modern Applied Statistics with S-Plus*. Springer, 548 pp.

Viste, E., D. Korecha, and A. Sorteberg, 2013: Recent drought and precipitation tendencies in Ethiopia. *Theor. Appl. Climatol.*, **112**, 535–551, doi:10.1007/s00704-012-0746-3.

Vizy, E. K., and K. H. Cook, 2003: Connection between the summer East African and Indian rainfall regimes. *J. Geophys. Res.*, **108**, 4510, doi:10.1029/2003JD003452.

Ward, M. N., 1998: Diagnosis and short-lead time prediction of summer rainfall in tropical Africa at interannual and multidecadal timescales. *J. Climate*, **11**, 3167–3191, doi:10.1175/1520-0442(1998)011<3167:DASLTP>2.0.CO;2.

Webster, P. J., and S. Yang, 1992: Monsoon and ENSO: Selectively interactive systems. *Quart. J. Roy. Meteor. Soc.*, **118**, 877–925, doi:10.1002/qj.49711850705.

Webster, P. J., V. O. Magaña, T. N. Palmer, J. Shukla, R. A. Tomas, M. Yanai, and T. Yasunari, 1998: Monsoons: Processes, predictability, and prospects for prediction. *J. Geophys. Res.*, **103**, 14 451–14 510, doi:10.1029/97JC02719.

Wilks, D. S., 2006: *Statistical Methods in the Atmospheric Sciences.* 2nd ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, A. P., and Coauthors, 2012: Recent summer precipitation trends in the Greater Horn of Africa and the emerging role of Indian Ocean sea surface temperature. *Climate Dyn.*, **39**, 2307–2328, doi:10.1007/s00382-011-1222-y.

Willmott, C. J., S. G. Ackleson, R. E. Davis, J. J. Feddema, K. M. Klink, D. R. Legates, J. O’Donnell, and C. M. Rowe, 1985: Statistics for the evaluation and comparison of models. *J. Geophys. Res.*, **90**, 8995–9005, doi:10.1029/JC090iC05p08995.

Xue, Y., and J. Shukla, 1993: The influence of land surface properties on Sahel climate. Part I: Desertification. *J. Climate*, **6**, 2232–2245, doi:10.1175/1520-0442(1993)006<2232:TIOLSP>2.0.CO;2.