1. Introduction
Subseasonal predictions, on time scales from two weeks to a season, are relevant for informing decision making and early warning across a range of sectors in the Greater Horn of Africa (e.g., agriculture, energy, water, and disaster risk management). Subseasonal forecasts bridge the gap between medium-range weather and seasonal forecasts (Vitart et al. 2012; Robertson et al. 2015; Vitart et al. 2017; White et al. 2017) and have the potential to contribute to early warning and early action for both flood and drought disasters (Moron et al. 2018).
Given the potential applications of subseasonal predictions (White et al. 2017), and the increasing demand for forecast information across sectors in recent years, the World Weather Research Programme (WWRP) and World Climate Research Programme (WCRP) launched a joint research initiative, the Subseasonal to Seasonal Prediction project (S2S), together with a multimodel database of S2S forecasts and reforecasts. The database provides an opportunity to compare the outputs of different prediction models and to advance knowledge of S2S prediction (Vitart et al. 2017). Since the establishment of the S2S database, several studies have evaluated the skill of S2S models in different regions. Li and Robertson (2015) assessed the weekly prediction skill of three global prediction systems over the globe and found that the models had very good skill for the first week. Over Africa, de Andrade et al. (2021) evaluated subseasonal forecasts from three global prediction systems and found that although skill was lower in weeks 3 and 4 than in weeks 1 and 2, the probabilistic forecasts were still skillful in weeks 3–4. de Andrade et al. (2019) compared the performance of subseasonal precipitation reforecasts from 11 S2S models at lead times of up to 4 weeks using deterministic verification metrics and found higher skill during the first week and reduced skill as lead time increased. Vigaud et al. (2017) examined subseasonal rainfall forecast skill over the summer monsoon regions of the Northern Hemisphere at submonthly lead times and found that multimodel forecasts remained skillful and reliable beyond week 1.
Because of the different drivers of S2S variability, and the nonlinear response to these drivers, precipitation prediction skill varies widely from region to region and from time scale to time scale. Evaluating forecast skill for different regions and time scales is vitally important to identify model errors, improve forecast skill, and promote the uptake and use of forecast information in decision making. In this study, we assess the skill of 11 S2S models over the Greater Horn of Africa (GHA) during the March–April–May (MAM) rainfall season, with a focus on the monthly time scale.
Past studies have shown that the MAM rainfall over the GHA, commonly known as the long rains, is weakly associated with large-scale oceanic and atmospheric features (e.g., Hastenrath et al. 1993; Rowell et al. 1994; Vellinga and Milton 2018) and has low predictability compared to the October–November–December (OND) rainfall, known as the short rains (Camberlin and Philippon 2002). Furthermore, there is intraseasonal inhomogeneity within the long-rains season: the spatial rainfall anomaly patterns are similar in March and April but quite different in May (Camberlin and Philippon 2002). Other studies (e.g., Rowell et al. 1995; Nicholson and Kim 1997) also found that the interannual variability time series for March, April, and May differ. Nicholson (2015) further indicated that the prevailing atmospheric circulation and the controls on interannual variability are clearly different during the three months of the long rains. As a result of this inhomogeneity within the season, some authors (e.g., Camberlin et al. 2009; Moron et al. 2013; Rowell et al. 1994) have suggested that subseasonal analysis of the long-rains season is required to advance the understanding and prediction of precipitation variability.
It is also important to recall that the World Meteorological Organization (WMO) Executive Council, at its 69th Session in May 2017, recommended that operational Regional Climate Centres (RCCs) and National Meteorological and Hydrological Services (NMHSs) access digital forecast and reforecast data from the WMO lead centers for long-range forecasts and produce an objectively consolidated subseasonal and seasonal forecast product that is traceable and reproducible. The recommendations also stressed the need to assess the skill of forecasting models for different regions and to select a subset of models with better skill over the region of interest when constructing the multimodel ensemble. The results from this study therefore address these recommendations and provide a crucial baseline for identifying skillful models over the GHA on the S2S time scale.
2. Data and methods
a. Data
1) Observed data used for verification
The observed data used to verify the rainfall reforecasts are the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), version 2.0 (Funk et al. 2015). This dataset is a blended product of 0.05°-resolution satellite imagery and in situ station data provided by the Climate Hazards Group, and it is available from 1981 to the near present. Validation of CHIRPS rainfall data has been conducted over different parts of East Africa by comparing CHIRPS with rain gauge data and other satellite rainfall products such as African Rainfall Climatology version 2 (ARC2) and the Tropical Applications of Meteorology using Satellite and Ground-Based Observations (TAMSAT) (e.g., Maidment et al. 2017; Dinku et al. 2018). CHIRPS performed significantly better than ARC2 and TAMSAT, with higher skill, lower bias, and lower random errors, particularly at dekadal (10-day) and monthly time scales (Dinku et al. 2018), indicating its suitability as a reference rainfall dataset.
The European Centre for Medium-Range Weather Forecasts (ECMWF) fifth-generation reanalysis (ERA5; Hersbach et al. 2020) was used to evaluate the mean circulation features. This global dataset is available from 1979 to the near present at 0.25° resolution. In this study, monthly 850-hPa zonal and meridional winds are utilized for the analysis period. The observed sea surface temperature (SST) data used in this study are version 2 of the National Oceanic and Atmospheric Administration (NOAA) Optimum Interpolation SST analysis (NOAA_OI_SST_V2), retrieved from https://climatedataguide.ucar.edu/climate-data/sst-data-noaa-optimal-interpolation-oi-sst-analysis-version-2-oisstv2-1x1. The NOAA_OI_SST_V2 integrates both in situ and satellite data and is available from 1982 to the present at 1.0° spatial resolution.
2) Model data
The S2S database consists of reforecasts and near-real-time forecasts (released 3 weeks behind real time) from 11 global prediction centers, made available for scientific research via the data archive portals at ECMWF and the China Meteorological Administration (CMA) (Vitart et al. 2017). The 11 centers are the Australian Bureau of Meteorology (BoM), CMA, Météo-France/Centre National de Recherche Météorologiques (CNRM), Environment and Climate Change Canada (ECCC), ECMWF, the Hydrometeorological Centre of Russia (HMCR), the Institute of Atmospheric Sciences and Climate (ISAC), the Japan Meteorological Agency (JMA), the Korea Meteorological Administration (KMA), the National Centers for Environmental Prediction (NCEP), and the Met Office (UKMO). Not all 11 models are independent of each other: UKMO and KMA use the same system with the same configuration, differing only in their atmospheric initial conditions and ensemble size.
The reforecasts and forecasts are archived on a common 1.5° horizontal grid in the S2S database. The reforecasts, also known as hindcasts, are a set of forecasts with start and prediction dates in the past; they are used to assess the skill of the model and to calibrate the real-time forecasts. Reforecasts are identical to the real-time forecasts in every respect apart from ensemble size. This study assesses the skill of the 11 global prediction systems in predicting monthly rainfall over the GHA.
As the S2S models are developed and run by different prediction centers, they have different configurations. For instance, some models have a fixed reforecast configuration, whereas others use an on-the-fly configuration. Fixed reforecasts are produced once during the lifetime of a given version of the model (e.g., BoM, CMA, Météo-France, and NCEP), whereas on-the-fly reforecasts are produced at the same time as the real-time forecasts (e.g., ECMWF, KMA, and UKMO). The frequency and initial start date of the reforecasts also vary from model to model: some models run continuously on a daily basis (e.g., CMA, NCEP), whereas others run on a weekly or subweekly basis (e.g., BoM, ECMWF). In addition, the reforecast length and time range vary from model to model. For example, NCEP has 12 years of reforecasts initialized every day from 1999 to 2010, whereas ECMWF produces reforecasts on the fly covering the past 20 years, initialized 2 days per week (Monday and Thursday) for each model version. The reforecast ensemble size also varies from model to model. Some models are atmosphere-only models forced by observed SSTs, while others have the atmospheric component coupled to an ocean model and a sea ice model. The general features of the global prediction systems used in this study are summarized in Table 1.
Table 1. Summary of the configuration of the global prediction systems (models) used in this analysis. The reforecast length, time range, frequency, and number of ensemble members depend on the modeling center.
Although the S2S prediction systems have different configurations, they share common features that make model intercomparison possible (de Andrade et al. 2019). For instance, all of the prediction systems have reforecasts covering the period 1999–2010. Each model also has a control reforecast member run from a single unperturbed initial condition and perturbed members produced to sample uncertainty in the initial conditions. Furthermore, most of the prediction systems produce forecasts and reforecasts starting on the 1st and the middle of each month. Therefore, it is possible to compare the models over the common period 1999–2010.
In this analysis, all reforecasts (control and perturbed) from one-week lead to zero lead have been used. For example, to assess the skill of the models during April, all reforecasts initialized from 23 to 31 March are analyzed. The rationale for this choice is 1) to include the models that have a shorter forecast range in the comparison and 2) to obtain a sufficiently large number of ensemble members for the probabilistic verification, as some models, especially those run on a daily basis, have few ensemble members if only one or two initialization dates are considered. To enable comparison between all models, the analysis is performed over the common 12-yr period 1999–2010. Both CHIRPS and the model reforecasts were regridded to 0.5° using bilinear interpolation prior to the skill analysis, as shown in the sketch below. We chose 0.5° because this is the spatial resolution currently used operationally at the IGAD Climate Prediction and Applications Centre (ICPAC), the RCC for the GHA, when producing the monthly and seasonal downscaled climate outlooks for the region.
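For illustration, a minimal sketch of this regridding step is given below. It assumes the reforecasts and CHIRPS have been loaded as xarray DataArrays with ascending `lat`/`lon` coordinates; the variable names and the exact domain bounds are placeholders rather than values taken from the paper.

```python
# Regridding sketch (assumptions: DataArrays named `chirps` and `reforecast`
# with ascending "lat"/"lon" coordinates; domain bounds are illustrative).
import numpy as np
import xarray as xr

target_lat = np.arange(-15.0, 25.5, 0.5)  # illustrative GHA latitude range
target_lon = np.arange(20.0, 55.5, 0.5)   # illustrative GHA longitude range

def to_half_degree(da: xr.DataArray) -> xr.DataArray:
    """Bilinear interpolation onto the common 0.5 deg analysis grid."""
    return da.interp(lat=target_lat, lon=target_lon, method="linear")

# chirps_05 = to_half_degree(chirps)      # 0.05 deg -> 0.5 deg
# model_05 = to_half_degree(reforecast)   # 1.5 deg -> 0.5 deg
```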
b. Verification methods
It is important to note that forecast quality is multifaceted and no single verification metric captures all of its aspects (Murphy 1993). It is therefore important to assess forecast skill using a range of statistical measures. Several methods are available to evaluate the skill of weather and climate forecasts, ranging from simple traditional statistics and scores to more detailed and advanced verification methods. In the present analysis, the skill of the models has been assessed using three deterministic and three probabilistic forecast verification measures. The deterministic measures are the mean error, linear correlation, and root-mean-square error; the probabilistic metrics are the ranked probability skill score, the relative operating characteristic, and the spread–error ratio. The deterministic verification is performed between the ensemble mean of all reforecast members (control plus perturbed) and the verifying observation, whereas the probabilistic verification uses all individual ensemble members. In addition to these metrics, Taylor and reliability (attribute) diagrams, which provide summary statistics of the agreement between the model and reference fields, are used.
1) Deterministic verification metrics
In this section we summarize the deterministic verification methods utilized. The mathematical equations for the deterministic metrics are presented in the online supplemental material.
(i) Mean error
The mean error represents the average difference between forecast and verification values. The mean error is primarily a measure of the systematic part of the forecast error. It is important to note that the mean error does not measure the magnitude of the errors. It also does not measure the correspondence between forecast and observation as it is possible to get a perfect score for a bad forecast if there are compensating errors (Kendzierski et al. 2018).
(ii) Root-mean-square error
The root-mean-square error (RMSE) is the square root of the average of the squared differences between forecasts and verification data. It is a measure of the random component of the forecast error and is often used to represent forecast accuracy. The RMSE is sensitive to large errors and provides information on the average magnitude of the forecast errors, although it does not indicate the direction of the deviations. Because the RMSE gives greater weight to large errors than to small ones (Jorgensen 2016), it is a good indicator of large errors.
(iii) Linear correlation
Correlation is one of the most widely used measures for forecast verification, and provides an assessment of the strength of the linear association between forecasts and the verifying observation. It is a good measure of linear association or phase error. Jolliffe and Stephenson (2012) noted that it is possible for a forecast with large errors to still have a good correlation coefficient with the observation.
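To make the three deterministic scores concrete, the following sketch computes them per grid point from the interannual time series. It is an illustration, not the authors' code, and assumes NumPy arrays of shape (year, lat, lon) for the ensemble-mean reforecast and the observations.

```python
# Illustrative per-gridpoint deterministic scores (arrays of shape (year, lat, lon)).
import numpy as np

def mean_error(fcst, obs):
    # Systematic (bias) component: positive values indicate overestimation.
    return np.mean(fcst - obs, axis=0)

def rmse(fcst, obs):
    # Average error magnitude; squaring penalizes large errors most.
    return np.sqrt(np.mean((fcst - obs) ** 2, axis=0))

def correlation(fcst, obs):
    # Pearson correlation of the interannual anomalies at each grid point.
    fa = fcst - fcst.mean(axis=0)
    oa = obs - obs.mean(axis=0)
    return (fa * oa).sum(axis=0) / np.sqrt((fa ** 2).sum(axis=0) * (oa ** 2).sum(axis=0))
```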
(iv) Taylor diagram
A Taylor diagram (Taylor 2001) summarizes the statistical relationship between a model field and the observed/reference field. The diagram is useful for evaluating the accuracy of multiple model outputs against a reference dataset. Further information on the Taylor diagram is provided in the supplemental material.
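For reference, the quantities a Taylor diagram summarizes can be computed as in the hedged sketch below, where `m` and `r` are flattened model and reference arrays (the names are illustrative).

```python
# Statistics summarized by a Taylor diagram (illustrative; m, r are 1D NumPy arrays).
import numpy as np

def taylor_stats(m, r):
    ma, ra = m - m.mean(), r - r.mean()
    corr = (ma * ra).mean() / (m.std() * r.std())        # pattern correlation (angular coordinate)
    norm_std = m.std() / r.std()                         # normalized std dev (radial coordinate)
    crmsd = np.sqrt(((ma - ra) ** 2).mean()) / r.std()   # centered RMS difference from reference
    return corr, norm_std, crmsd
```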
2) Probabilistic verification metrics
(i) Ranked probability skill score
The ranked probability skill score (RPSS) measures the performance of a probabilistic categorical forecast relative to a climatological reference forecast. It is based on the ranked probability score (RPS), which sums the squared differences between the cumulative forecast and observed probabilities over the (here tercile) categories. An RPSS greater than zero indicates a forecast better than climatology, and a score of 1 indicates a perfect forecast. In this study the fair version of the RPSS, which accounts for the finite ensemble size, is used (see section 3c).
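As an illustration, a minimal sketch of a tercile-based RPSS at a single grid point is given below; it implements the standard (not the fair) score, and the array layout is an assumption.

```python
# Standard tercile RPSS at one grid point (illustrative).
# ens: (year, member) reforecast values; obs: (year,) verifying observations.
import numpy as np

def rps(prob, obs_cat, n_cat=3):
    # Ranked probability score: squared error of cumulative category probabilities.
    obs_onehot = np.eye(n_cat)[obs_cat]
    return np.sum((np.cumsum(prob, axis=-1) - np.cumsum(obs_onehot, axis=-1)) ** 2, axis=-1)

def rpss_terciles(ens, obs):
    lo, hi = np.percentile(obs, [100 / 3, 200 / 3])   # tercile thresholds from observed climatology
    obs_cat = np.digitize(obs, [lo, hi])              # 0 = below, 1 = normal, 2 = above
    ens_cat = np.digitize(ens, [lo, hi])
    # Forecast probability of each category = fraction of members falling in it.
    prob = np.stack([(ens_cat == k).mean(axis=1) for k in range(3)], axis=-1)
    clim = np.full_like(prob, 1.0 / 3.0)              # climatological reference forecast
    return 1.0 - rps(prob, obs_cat).mean() / rps(clim, obs_cat).mean()
```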
(ii) Relative operating characteristic (ROC)
The ROC measures the ability of a forecast to discriminate between events and nonevents (Mason 1982). Discrimination is the ability to distinguish one categorical outcome from another. The ROC is not sensitive to bias in the forecast, so it says nothing about reliability; a biased forecast may still have good resolution and produce a good ROC curve, which means that it may be possible to improve the forecast through calibration (Jolliffe and Stephenson 2012). The ROC score, computed as the area under the ROC curve, is a useful summary measure of forecast skill. A ROC score of 0.5 indicates an unskillful forecast (i.e., the system is no better than climatology), a score above 0.5 indicates positive discrimination skill, and a score of 1.0 represents a perfect forecast. More information on the ROC can be found in Mason (1982) and Jolliffe and Stephenson (2003, 2012).
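A minimal sketch of the ROC area for a binary event (e.g., rainfall in the lower tercile) follows, assuming `prob` holds ensemble-derived event probabilities and `event` the 0/1 outcomes; this is an illustration rather than the verification package used in the paper.

```python
# ROC area from forecast probabilities and binary outcomes (illustrative).
import numpy as np

def roc_area(prob, event):
    # Sweep a decision threshold from above all probabilities down to the
    # lowest one, collecting hit and false-alarm rates along the ROC curve.
    thresholds = np.concatenate(([np.inf], np.unique(prob)[::-1]))
    n_ev = max(np.sum(event == 1), 1)
    n_non = max(np.sum(event == 0), 1)
    hits, fars = [], []
    for t in thresholds:
        warned = prob >= t
        hits.append(np.sum(warned & (event == 1)) / n_ev)   # hit rate
        fars.append(np.sum(warned & (event == 0)) / n_non)  # false-alarm rate
    hits, fars = np.array(hits), np.array(fars)
    # Trapezoidal area under the curve: 0.5 = no skill, 1.0 = perfect.
    return np.sum((fars[1:] - fars[:-1]) * (hits[1:] + hits[:-1]) / 2.0)
```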
(iii) Reliability (or attribute) diagram
The reliability (also known as attribute) diagram is a graphical method used to evaluate the reliability of probabilistic forecast systems. The diagram plots the observed frequency against the forecast probability, answering the question of how well the predicted probabilities of an event correspond to their observed frequencies. A forecast system is reliable if and only if all the forecast probabilities are reliable (Toth et al. 2003). A reliability diagram displays a range of forecast probabilities for a given event and their corresponding observed frequencies collected over the reforecast period (Weisheimer and Palmer 2014). Reliability is high when the correspondence between the forecast probabilities and the observed frequencies is good, and low when it is poor; when the correspondence is perfect, all data points lie on the straight diagonal line of the diagram. A reliability diagram becomes an attribute diagram when the no-resolution line (the horizontal, climatological line) and the no-skill line with respect to climatology are included. In the attribute diagram, a curve below the diagonal indicates overestimation (i.e., forecast probabilities are too high), whereas a curve above the diagonal indicates underestimation (i.e., forecast probabilities are too low).
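The binning behind such a diagram can be sketched as follows; `prob` and `event` are assumed to be flattened over all years and grid points (an illustration, not the paper's implementation).

```python
# Reliability-diagram binning (illustrative).
import numpy as np

def reliability_curve(prob, event, n_bins=10):
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    mean_prob, obs_freq = [], []
    for b in range(n_bins):
        in_bin = idx == b
        mean_prob.append(prob[in_bin].mean() if in_bin.any() else np.nan)
        obs_freq.append(event[in_bin].mean() if in_bin.any() else np.nan)
    # Perfect reliability: the points (mean_prob, obs_freq) fall on the 1:1 diagonal.
    return np.array(mean_prob), np.array(obs_freq)
```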
(iv) Spread–error ratio
The spread–error ratio (SPR) is used to assess the relationship between the ensemble spread and the deterministic forecast error. It is defined as the square root of the ratio of the mean ensemble variance to the mean squared error of the ensemble mean against the verifying observation. The variance measures the spread of the members of a particular forecast, indicating whether the ensemble spread is large or small, while the RMSE measures the error of the ensemble-mean forecast. Thus, the SPR evaluates the ability of the ensemble spread to depict the forecast error expressed as the RMSE of the ensemble mean. When the RMSE and spread are equal, the ensemble successfully predicts the forecast error. When the RMSE exceeds the spread (SPR less than 1), the ensemble is underdispersive (overconfident); conversely, an SPR greater than 1 indicates an overdispersive (underconfident) ensemble. For a reliable forecast system, the ensemble spread is expected to match the RMSE (Leutbecher and Palmer 2008; Leutbecher 2009). The SPR is suitable for the verification of ensemble forecasts and is sensitive to both forecast resolution and reliability (Christensen et al. 2015).
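A short sketch of this definition, assuming `ens` has shape (year, member) and `obs` has shape (year,):

```python
# Spread-error ratio (illustrative; ens: (year, member), obs: (year,)).
import numpy as np

def spread_error_ratio(ens, obs):
    spread = np.sqrt(ens.var(axis=1, ddof=1).mean())         # sqrt of mean ensemble variance
    error = np.sqrt(((ens.mean(axis=1) - obs) ** 2).mean())  # RMSE of the ensemble mean
    return spread / error  # < 1: underdispersive (overconfident); > 1: overdispersive
```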
3. Results and discussion
a. Rainfall climatology
We first analyzed the spatial distribution of the rainfall climatology for the individual months using CHIRPS data. Figure 1 shows the observed rainfall climatology during March, April, and May averaged over the period 1981–2010. Climatologically, during March the maximum rainfall is located over the southern parts of the region, mainly over most of Tanzania, Burundi, and Rwanda. During April and May, the rainfall band moves from the southern to the northern part of the GHA following the position of the intertropical convergence zone (ITCZ). In April, a marked increase in rainfall occurs throughout the region. In May, the maximum rainfall is located over the western part of Ethiopia and most parts of South Sudan and Uganda. The following sections present the monthly rainfall prediction skill of the S2S models over the GHA for the individual months using the verification metrics described above.
Fig. 1. Spatial distribution of rainfall climatology during March, April, and May over GHA using CHIRPS observed data.
b. Deterministic verification scores
1) Mean error
Figures 2a–c show the spatial distribution of the mean errors of rainfall between the S2S models and CHIRPS over the GHA for March, April, and May, respectively. During March, CMA, HMCR, ISAC, and JMA overestimate the monthly rainfall over most parts of the region, while BoM underestimates it; the overestimation of total monthly precipitation by the HMCR and ISAC systems is particularly notable. The rest of the models show a mixed signal, with variations in the location and magnitude of the mean error. Generally, BoM, CMA, CNRM, HMCR, and ISAC show large errors, while ECCC, ECMWF, JMA, KMA, NCEP, and UKMO show smaller mean errors over the region during March.
Fig. 2. Spatial distribution of the mean error of rainfall between the models and CHIRPS during (a) March, (b) April, and (c) May over GHA for the period 1999–2010.
In April, most of the models show larger errors (Fig. 2b) than in March (Fig. 2a). Consistent with the results for March, the magnitudes of the errors are smaller for the ECCC, ECMWF, JMA, KMA, NCEP, and UKMO models. In contrast, CMA, CNRM, HMCR, and ISAC largely overestimate the rainfall; the overestimation by the HMCR and ISAC models over the northern part of the region is especially notable.
During May, the majority of the models overestimate the rainfall, mainly over the northern part of the region (Fig. 2c). In contrast, BoM underestimates the rainfall in most parts of the region. Moreover, some models, including CMA, JMA, KMA, NCEP, and UKMO, show a dry bias over the southern part of the region. KMA and UKMO show similar bias patterns, and BoM, CMA, CNRM, HMCR, and ISAC still show large errors over the region.
In general, the mean-error analysis shows that the magnitudes of the mean errors are lower during March than during April and May for all the prediction models. CMA, CNRM, HMCR, and ISAC overestimate the monthly rainfall over most parts of the region, whereas BoM systematically underestimates the rainfall throughout most of the region. Overall, ECCC, ECMWF, JMA, KMA, NCEP, and UKMO show low bias over the region during March, April, and May. The spatial distributions of the mean error of rainfall from KMA and UKMO are almost identical over most of the region, likely because the two models have exactly the same configuration; as mentioned earlier, the only difference between them is the atmospheric initial conditions (Noh et al. 2016). The reason for the month-to-month skill difference is discussed later.
2) Root-mean-square error (RMSE)
The spatial distributions of the RMSE of the S2S models with reference to CHIRPS are presented in Figs. 3a–c. The RMSE is generally higher in April than in March and May. BoM, CMA, HMCR, and ISAC show large errors over the region in all months, with HMCR and ISAC performing worst (mean RMSE above 100 mm), consistent with the mean-error results. On the other hand, ECCC, ECMWF, KMA, NCEP, and UKMO exhibit good prediction skill over the region in terms of RMSE, and the KMA and UKMO systems again exhibit similar RMSE patterns. Generally, the error magnitudes are smaller during March than during April and May.
Fig. 3. Spatial distribution of the RMSE of rainfall between the 11 S2S models and CHIRPS during (a) March, (b) April, and (c) May.
3) Linear correlation
Figures 4a–c illustrate the spatial distribution of the correlation coefficients of rainfall between the models and CHIRPS for March, April, and May, respectively, for the period 1999–2010 over the GHA. Hatching indicates regions where the correlation is statistically significant at the 95% confidence level according to a Student's t test. The skill of the models in reproducing the rainfall varies from month to month. During March, the majority of the models, with the exception of HMCR, show high and statistically significant correlations over the equatorial and southern sectors of the region, mainly toward the coast. Some models show low correlations over the northern part of the GHA, mainly over Sudan, South Sudan, and northern and western Ethiopia, but it is important to note that March is not the rainfall season over the northern part of the region (Fig. 1). Overall, ECMWF, JMA, KMA, NCEP, and UKMO show relatively high and significant correlations over the equatorial sector compared to the rest of the models. During April, the correlations are relatively low compared to March, with some models showing negative correlations in parts of the region. Most notably, CMA shows negative correlations over the eastern part of the region in April (Fig. 4b), and CNRM, HMCR, ISAC, JMA, and NCEP also exhibit negative correlations over parts of equatorial East Africa, mainly over parts of Kenya and Somalia. BoM, ECMWF, JMA, KMA, and UKMO show relatively improved skill compared to the other models, mainly over the equatorial and southern parts of the region; this may be linked to increased predictability in that region associated with the development of the low-level Somali jet and the Asian summer monsoon system in May, as shown by Nicholson (2015). The predictability associated with the jet and monsoon is discussed further in section 3d. During May (Fig. 4c), the models generally show better skill than during April. ECCC, ECMWF, KMA, NCEP, and UKMO show relatively higher skill, with significant correlations over the region, compared to the other models, whereas HMCR presents negative correlations over most of the region, reflecting its failure to reproduce the interannual variability.
Fig. 4. Spatial distribution of the correlation coefficient of rainfall between the models and CHIRPS during (a) March, (b) April, and (c) May for the period 1999–2010. Hatching indicates regions where the correlation is statistically significant at the 95% confidence level.
In addition to evaluating the S2S models at the monthly time scale, we also analyzed the skill of the models for weeks 1 + 2 and weeks 3 + 4 to investigate whether the skill of the monthly forecast comes from weeks 1 + 2 only or whether there is also skill in weeks 3 + 4. In March (Fig. S1 in the online supplemental material), the weeks 1 + 2 correlation coefficients are statistically significant at the 5% level for most models except HMCR, showing that the prediction skill is high. In weeks 3 + 4 (Fig. S2), the skill is lower than in weeks 1 + 2; however, ECMWF, KMA, NCEP, and UKMO still show correlations greater than 0.5 over most of the southern and equatorial region. In April (Fig. S3), the weeks 1 + 2 prediction skill is high for most models except CMA, CNRM, and HMCR, which show weak negative correlations in some areas. The majority of the models have lower skill in weeks 3 + 4 of April, with most showing weak negative or positive correlations (Fig. S4); only ECCC shows statistically significant correlations over equatorial parts of the region. Since these statistics are calculated over a 12-yr period, a larger sample would provide greater confidence in the weeks 3 + 4 skill in April. In May (Figs. S5–S6), most models show high prediction skill (significant correlations) in weeks 1 + 2, except for CMA, ECCC, and HMCR. The weeks 3 + 4 skill in May is generally higher than in April, with CMA, KMA, NCEP, and UKMO performing better than the other models. In general, therefore, although the models have lower prediction skill in weeks 3 + 4, they do retain some skill at that range. These results are consistent with Vigaud et al. (2018), who found that during the February–April season the ECMWF model had skill up to weeks 3 + 4. Issuing monthly forecasts is thus likely to aid tactical decision making in the various sectors of the region that use forecast information from the S2S models.
Overall, the correlation analysis shows that correlations are highest during March and poorest during April. The high prediction skill during March might be linked to the stronger association of March rainfall with tropical sea surface temperatures (SSTs) compared with April and May, as indicated by Camberlin et al. (2009) and Moron et al. (2013). On the other hand, the low prediction skill during April might be related to the wind and pressure pattern changes over the Indian Ocean, as the low-level winds shift direction from northeasterly (in March) to southwesterly (in May).
4) Taylor diagram
Figure 5 shows Taylor diagrams displaying a normalized statistical comparison (i.e., correlation, root-mean-square difference, and amplitude of variation) of the monthly total rainfall of the S2S models with CHIRPS during March, April, and May. The rainfall is spatially averaged over the GHA domain by masking out the regions outside the GHA. In March, most models (including CMA, CNRM, ECCC, ECMWF, JMA, KMA, and UKMO) show high correlations (>0.6) with the observations. In particular, ECMWF, KMA, and UKMO present relatively high correlations (>0.8), low root-mean-square differences, and variance close to that of the reference data. On the other hand, BoM, HMCR, and NCEP show low correlations (<0.6), with HMCR showing the lowest correlation (0.1) and variance far from the reference field. During April, correlations are relatively low compared to March. Moreover, most of the models underestimate the magnitude of the year-to-year variation relative to CHIRPS, while three models (CMA, JMA, and ISAC) overestimate it. BoM, ECCC, and ECMWF have relatively high correlations (r > 0.6) compared with the other models; ISAC shows much higher variance than CHIRPS, while CMA exhibits the lowest correlation. During May, CNRM, ECCC, ISAC, KMA, NCEP, and UKMO have relatively high correlations (r > 0.6) compared with the other S2S models, while JMA and HMCR present the lowest correlations and show much higher variance than CHIRPS.
Fig. 5. Taylor diagram displaying a normalized statistical comparison of the monthly total rainfall of the S2S models with CHIRPS during (top left) March, (top right) April, and (bottom left) May.
c. Probabilistic verification scores
1) RPSS
The fair RPSS values for the 11 S2S models for March, April, and May are presented in Figs. 6a–c, respectively. During March, most models show positive RPSS (i.e., forecasts better than climatology) over most parts of the region, with maximum scores over the equatorial sector (Fig. 6a). Consistent with the other verification metrics, HMCR shows the lowest skill, with negative scores over most of the region. In April, the skill of most S2S models is relatively low compared to March, with more grid points showing negative scores; ECCC, ECMWF, and KMA show relatively better skill over the region. During May, the skill of the forecasts is generally higher than in April but lower than in March. While ECCC, ECMWF, KMA, NCEP, and UKMO present the highest skill, CMA, HMCR, and ISAC show the lowest (Fig. 6c). Overall, the RPSS results indicate that the skill of the S2S forecasts is lower in April than in March and May, in agreement with the mean-error and correlation results. The RPSS values obtained in this study are relatively higher than those of Vigaud et al. (2018) for seasonal evaluation, highlighting the value of monthly updates during the season. It is also noted that most models predict worse than climatology over the northern parts of the region, mainly over Sudan; however, the northern part of the GHA is generally dry during this season (Fig. 1).
Fig. 6. Ranked probability skill score (RPSS) of the 11 S2S models for (a) March, (b) April, and (c) May validated against CHIRPS for the period 1999–2010.
2) ROC
Figures 7a–c show the ROC skill scores (ROCSS) for lower-tercile forecasts for March, April, and May, respectively. During March, most of the models show forecast skill better than the climatological forecast (Fig. 7a); in particular, CMA, CNRM, ECCC, ECMWF, ISAC, KMA, NCEP, and UKMO show good skill over the region. On the other hand, BoM, HMCR, and JMA perform worse than a climatological forecast over parts of the region, especially over parts of Kenya, Somalia, Ethiopia, South Sudan, Uganda, and Tanzania. In April, most of the S2S models show lower skill than in March. ECMWF, KMA, and UKMO perform better than the other models, with ECMWF showing high ROCSS over the region and outperforming the rest. The remaining models (BoM, CMA, CNRM, ECCC, HMCR, ISAC, JMA, and NCEP) exhibit skill scores below 0.5 over equatorial parts of the region, indicating forecasts worse than climatology there. During May, the ECMWF, KMA, NCEP, and UKMO systems show good prediction skill over the region compared to the other systems, whereas HMCR performs worst. In general, April forecasts exhibit lower skill than those of both March and May. The ROC skill scores for upper-tercile forecasts have also been analyzed, and the results are very similar to those for the lower tercile (figure not shown). ROC skill scores for the lower tercile in weeks 1 + 2 and weeks 3 + 4 of each month were also computed (Figs. S7–S12). The results reveal that although weeks 1 + 2 have higher skill than weeks 3 + 4, weeks 3 + 4 still show skill, especially in March and May. de Andrade et al. (2021) also evaluated the quality of subseasonal precipitation forecasts over Africa using reforecasts from three models (ECMWF, UKMO, and NCEP) and found that the probabilistic forecasts showed reasonable skill in weeks 3 + 4.
Fig. 7. Relative operating characteristic skill score (ROCSS) for the lower tercile during (a) March, (b) April, and (c) May for the period 1999–2010.
3) Reliability (or attribute) diagrams
Figure 8 shows the attribute diagrams of precipitation for the below-normal category over the GHA from the 11 S2S models during March, April, and May. During March, the majority of the models lie within the gray area, particularly for higher probabilities, indicating good reliability of the issued reforecast probabilities. Only three of the S2S models (CMA, HMCR, and ISAC) lie below the no-skill line for forecast probabilities above 0.4. During April, most prediction systems, including BoM, CMA, CNRM, ECCC, HMCR, ISAC, and NCEP, lie away from the perfect-reliability diagonal (45°) line, particularly for higher forecast probabilities, indicating a lack of reliability and resolution in the issued hindcast probabilities. The remaining S2S models show good reliability; in particular, the curves for ECMWF, KMA, NCEP, and UKMO lie much closer to the perfect-reliability line, indicating much better agreement between the forecast probabilities and the observed frequencies. In May, three S2S models (BoM, HMCR, and ISAC) show the lowest skill, with low resolution and overconfidence. It is also noted that the majority of the models underestimate the low probabilities (below the climatological line). Across the three months, ECMWF shows better reliability than the rest of the S2S models. The results for the above-normal category (figure not shown) are consistent with those for the below-normal category.
Fig. 8. Attribute diagrams for predictions of monthly precipitation during March, April, and May over GHA for the below-normal category (lower tercile) for the period 1999–2010. The x axis shows the average forecast probability, and the y axis shows the corresponding observed relative frequency.
4) SPR
The SPR values for the 11 S2S models for March, April, and May are presented in Figs. 9a–c, respectively. In general, most of the S2S models are underdispersive (overconfident) over the wet areas and overdispersive (underconfident) over the dry areas in the northern parts of the region, mainly over Sudan. A recent study by de Andrade et al. (2021) also noted overconfidence in the ECMWF, NCEP, and UKMO models in all weeks and suggested applying calibration to obtain more reliable predictions. In March (Fig. 9a), most of the models perform well, particularly over the equatorial and southern parts of the region; in the HMCR model, the spread is much smaller than the error. During April (Fig. 9b), most models have an SPR of less than 1, indicating underdispersion (overconfidence); ECCC and ECMWF outperform the other models, with SPR values close to 1. In May (Fig. 9c), as in April, the majority of the models have an error larger than the spread, reflecting underdispersion, except over the northern parts of the region. ECMWF and ECCC perform better than the rest of the prediction systems, while HMCR performs worst in terms of the spread–error relationship. de Andrade et al. (2021) found enhanced skill in ECMWF and associated it with the correct representation of the teleconnections of climate drivers such as El Niño–Southern Oscillation (ENSO), the Indian Ocean dipole (IOD), and the Madden–Julian oscillation (MJO).
Fig. 9. Spread–error ratio (SPR) for (a) March, (b) April, and (c) May for the period 1999–2010. SPR below 1 indicates underdispersion (overconfidence), and SPR greater than 1 indicates overdispersion (underconfidence).
d. SST and atmospheric features
Further to the evaluation of the skill of the S2S models in predicting monthly rainfall, this study assessed the ability of the models to represent some of the important large-scale features. The goal is to provide insight into the connection between the skill of the rainfall forecasts and the representation of key processes that drive monthly rainfall variability in the region.
1) Indian Ocean SST
The Indian Ocean plays an important role in modulating the climate variability of the GHA. Previous studies (e.g., Camberlin and Philippon 2002; Vellinga and Milton 2018; Wainwright et al. 2019) have shown the influence of SST anomalies over the tropical Indian Ocean on the East African long rains. In this study, we assessed the ability of the S2S models to reproduce the teleconnections between Indian Ocean SSTs and rainfall over the GHA by regressing gridpoint rainfall over the GHA onto SST indices over the Indian Ocean. The specific regions (boxes) used to compute the indices are shown in Fig. 10a; they were selected in accordance with previous studies and are based on the correlation analysis between spatially averaged observed monthly rainfall over the GHA and concurrent gridpoint SST shown in Fig. 10a. For March, the SST gradient between the northern (5°S–10°N, 40°–75°E) and southern (40°–20°S, 20°–60°E) Indian Ocean is used, following Wainwright et al. (2019), who linked reduced March rainfall and a delayed onset of the long rains to warm SSTs south of Madagascar. For the May index, average SSTs in the northern Indian Ocean box (5°S–15°N, 50°–90°E) are used, where the correlations with rainfall are strongest and statistically significant.
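As an illustration of this step, a minimal sketch of the March index and the gridpoint regression is given below; `sst` and `rain` are assumed to be xarray DataArrays sharing a `year` dimension, with ascending latitudes, and all names are placeholders rather than the authors' code.

```python
# March meridional SST-gradient index and gridpoint regression (illustrative).
import xarray as xr

north = sst.sel(lat=slice(-5, 10), lon=slice(40, 75)).mean(("lat", "lon"))
south = sst.sel(lat=slice(-40, -20), lon=slice(20, 60)).mean(("lat", "lon"))
index = north - south  # meridional SST gradient (deg C)

# Least-squares regression slope of rainfall on the index at each grid point
# (mm month^-1 per deg C): cov(rain, index) / var(index).
cov = ((rain - rain.mean("year")) * (index - index.mean("year"))).mean("year")
slope = cov / index.var("year")
```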
Fig. 10. (a) Correlations between monthly rainfall (March, April, and May) averaged over the GHA region and concurrent gridpoint SSTs for the period 1982–2018 using CHIRPS rainfall and NOAA SST data. Hatching indicates regions where the correlation is statistically significant at the 95% confidence level. The boxes indicate the locations of the SST regions used to compute the indices for the regression analysis. For the March analyses, a western Indian Ocean meridional index is formed by taking the difference between average SSTs over the northern (red) and southern (blue) boxes shown. (b) Linear regression (mm month−1 °C−1) between March rainfall and the SST index (meridional gradient) over the western tropical Indian Ocean for the period 1999–2010. Hatching indicates regions where the regression coefficient is statistically significant at the 95% confidence level. (c) As in (b), but for the regression between May rainfall and the northern Indian Ocean SST index.
Figure 10b shows the SST–rainfall teleconnection patterns obtained by regressing March rainfall onto the meridional SST gradient over the Indian Ocean for the observations (top-left panel) and the individual S2S models (all other panels). The observed pattern indicates that the equatorial parts of the region (5°S–10°N) are positively associated with the index, with above-normal rainfall when the north–south gradient is strong. On the other hand, the southern and southeastern parts of Tanzania are negatively associated with the index: warm SSTs over the southwestern Indian Ocean weaken the meridional SST gradient, enhance local convective activity (moisture convergence), and lead to enhanced rainfall in that part of the region. This is consistent with Wainwright et al. (2019), who suggested that warmer SSTs to the south delay the northward progression of the rainband, leading to increased March rainfall in the southern part but reduced rainfall over the equatorial and northern parts of the GHA. The positive coefficients over the eastern Horn of Africa are statistically significant at the 95% confidence level. Most S2S models reasonably reproduce the observed features (Fig. 10b), supporting the idea that the relatively strong coupling of SST and rainfall in March is well captured by the S2S models and contributes to the high monthly skill found for March.
Rainfall teleconnections for May with the SST index over the northern tropical Indian Ocean are shown in Fig. 10c. The observations exhibit significant positive coefficients over most of the equatorial and southern parts of the region, and negative coefficients over western parts of Ethiopia and the South Sudan–Sudan border areas. This implies that warm SST anomalies in the northern Indian Ocean bring enhanced rainfall over most parts of East Africa but reduced rainfall over parts of western Ethiopia, South Sudan, and Sudan. Most models poorly represent both the spatial distribution and the amplitude of this teleconnection pattern, particularly the positive association over the southern and eastern parts of the region and the negative association over the summer monsoon areas. There is also a linkage between forecast skill and the teleconnection patterns: for example, ECCC has quite good skill in May over northern Somalia compared to the other models (Figs. 4c and 6c) and also has the best representation of the teleconnection in that region (Fig. 10c). Similarly, ECMWF shows good skill over western Kenya and has a good representation of the SST teleconnection in that area.
2) Somali low-level jet
The Somali low-level jet (SLLJ), a major component of the Asian summer monsoon system, is one of the most important sources of moisture for East Africa, particularly during the summer season. It plays an important role in transporting moisture from the Indian Ocean to the region. Although the jet is most intense during the boreal summer season, the northward cross-equatorial flow of the jet starts in April and the jet becomes active over the Indian Ocean during May. A study by Nicholson (2015) indicated that the surface features of the SLLJ begin to develop over the Indian Ocean in April, and by May a deep and well-developed monsoon low becomes evident.
The climatological pattern of the SLLJ during May from ERA5 and the mean errors of the jet from the S2S models relative to ERA5 are shown in Fig. S13. In ERA5, the jet is characterized by southeasterly flow south of the equator, meridional flow near the equator along the East African coast, and southwesterly monsoonal flow over the Arabian Sea. Generally, the models that capture these large-scale features have higher skill. Consistent with their precipitation performance, ECCC, ECMWF, JMA, KMA, NCEP, and UKMO show smaller errors than the rest of the models, whereas BoM, CMA, HMCR, and ISAC show the largest biases.
To examine the ability of the S2S models to represent the spatial pattern and magnitude of the rainfall teleconnections with the SLLJ, a regression analysis was applied to a scalar index of the jet. The index of jet intensity was constructed by computing the square root of twice the spatial-mean kinetic energy (KE) of the 850-hPa horizontal wind over the domain 5°S–20°N, 50°–70°E, as in Boos and Emanuel (2009); a sketch of this calculation follows below. Figure 11 shows the rainfall teleconnections with the SLLJ index estimated by linear regression during May from observations and the S2S models. The teleconnection pattern from ERA5 (Fig. 11, top left) indicates a positive association between the SLLJ index and rainfall over the summer rainfall region (northwestern parts of the analyzed domain), indicating wet conditions with a strong jet, possibly through increased moisture flux into the region. Most S2S models fail to capture the pattern and amplitude of this positive teleconnection over the northern part of the region; in particular, BoM, CMA, HMCR, and NCEP produce signals of opposite sign to those found in ERA5 over those areas. ECMWF and ECCC generally capture the positive relationship between the SLLJ index and rainfall, although ECMWF tends to overestimate the magnitude and spatial extent of the positive teleconnection pattern.
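A minimal sketch of the index, assuming `u850` and `v850` are xarray DataArrays of the 850-hPa wind components with ascending latitude (the names are placeholders):

```python
# SLLJ intensity index: square root of twice the spatial-mean kinetic energy
# of the 850-hPa wind over 5S-20N, 50-70E (after Boos and Emanuel 2009).
import numpy as np

ke = 0.5 * (u850 ** 2 + v850 ** 2)                    # kinetic energy per unit mass
box = ke.sel(lat=slice(-5, 20), lon=slice(50, 70))    # jet domain
sllj_index = np.sqrt(2.0 * box.mean(("lat", "lon")))  # units: m s^-1, one value per time
```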
Fig. 11. Linear regression [mm month−1 (m s−1)−1] between May rainfall and the SLLJ index for the period 1999–2010. Hatching indicates regions where the regression coefficient is statistically significant at the 95% confidence level.
Most areas of the equatorial and southern parts of the region show weak inverse relationships with the strength of the SLLJ in ERA5, implying that an enhancement of the jet leads to reduced rainfall there. Nicholson (1996) also indicated that a strengthening of the SLLJ is associated with enhanced frictionally induced subsidence along the East African coast. The majority of the S2S models capture this negative relationship between the strength of the SLLJ and rainfall over the equatorial and southern parts of the region reasonably well. Analysis of the rainfall teleconnections with the SLLJ index from observations over a longer period (1981–2018) revealed regression coefficients that are statistically significant at the 95% confidence level over most parts of the region (figure not shown), suggesting that a larger sample is crucial for greater confidence in the models' representation of the teleconnection patterns.
Overall, our analyses of these important large-scale features reveal that the ability of the models to reproduce the rainfall is partly linked to their ability to represent key oceanic and atmospheric circulation features. However, many other processes contribute to regional rainfall variability, and more in-depth analysis of other relevant atmospheric and oceanic features [such as the MJO, the quasi-biennial oscillation (QBO), and the Arabian heat low] is needed to better understand the mechanisms behind the sources of monthly rainfall predictability and to elucidate both strengths and deficiencies in the S2S models. For example, Vitart et al. (2017) showed that the ECMWF and UKMO models consistently have higher bivariate MJO correlations than the other models, with the correlation remaining above 0.6 at several weeks of lead time. The ability of such models to better capture large-scale drivers like the MJO could explain their consistently higher skill throughout the different months.
4. Summary and conclusions
Due to the increasing demand from the user community for S2S forecast products and information, it is important to assess and document the prediction skill of operational prediction systems for different regions and time scales. This study evaluates and compares the skill of 11 state-of-the-art operational models from the S2S database in predicting monthly precipitation over the Greater Horn of Africa during the long rains. The prediction skill of the S2S models was examined using reforecast (hindcast) data, combining forecasts at lead times from one week to zero over the common period 1999–2010, and was quantified using a range of deterministic and probabilistic verification metrics. The deterministic assessment is performed on the ensemble mean of all reforecast members, whereas the probabilistic verification uses all individual ensemble members. The skill of the models in predicting rainfall depends on both the month and the region: the models generally show good prediction skill during the early stage of the rainy season in March and poor skill during the peak of the season in April. In addition to the monthly evaluation, model skill was analyzed for weeks 1 + 2 and weeks 3 + 4; although weeks 1 + 2 have higher skill, weeks 3 + 4 still exhibit some skill, especially in March and May. The high prediction skill observed during March is likely linked to strong teleconnections between March rainfall and Indian Ocean SSTs, which are well represented by most S2S models. This accords with the findings of Camberlin et al. (2009) and Moron et al. (2013), which indicate that March rainfall anomaly patterns are more spatially coherent than those of April and May and are highly associated with tropical SSTs. The low prediction skill during April might be linked to the progressive directional shift of the low-level winds from northeasterly in March to southeasterly in April, with the southeasterlies becoming stronger and well established in May, as highlighted by Nicholson (2015). In May, a diagnostic of the SLLJ suggests that the mean error (phase bias) in the position of the jet contributes more strongly to the quality of the rainfall forecast than the representation of the large-scale teleconnections.
Among the 11 prediction systems, ECCC, ECMWF, KMA, NCEP, and UKMO demonstrate noticeably better skill than the other models. In contrast, the BoM, CMA, HMCR, and ISAC prediction systems tend to yield poor prediction skill over the region. Overall, ECMWF outperforms the rest of the models in terms of both deterministic and probabilistic verification metrics. The best and worst performing models identified in this study agree with the findings of the recent study by de Andrade et al. (2019), which assessed the deterministic forecast quality of weekly accumulated precipitation over the globe. This study provides a crucial baseline skill assessment for selecting the better-performing models, thus informing which could be used to construct a multimodel ensemble for producing consolidated forecasts for the GHA region, as sketched below. In doing so, this study directly addresses the WMO recommendation to critically evaluate the skill of forecasting models for different regions and time scales and to select a subset of models for producing operational objective S2S forecasts. It has been revealed that the skill of the models in reproducing the regional rainfall is partly linked to the correct representation of some of the important atmospheric and oceanic processes and teleconnections, such as the SLLJ and SST anomalies over the tropical Indian Ocean. Further diagnostic analysis of other potential drivers is needed to better understand the sources of subseasonal predictability and the linkage between rainfall forecast skill and the representation of key processes. Moreover, this analysis was performed over a relatively short period (12 years); a larger sample is needed to provide greater confidence in the skill of the S2S models both in predicting the rainfall and in representing the teleconnection patterns.
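Purely for illustration, one simple way such a baseline assessment could feed into a consolidated product is equal-weight pooling of members from the better-performing systems, from which tercile probabilities are derived. The member counts below are placeholders and this is not the consolidation method of this study.

```python
# Illustrative multimodel ensemble sketch: pool all members from a chosen
# subset of better-performing systems with equal weight and derive tercile
# probabilities from the pooled distribution. Member counts are placeholders.
import numpy as np

rng = np.random.default_rng(2)
selected = {"ECMWF": 11, "UKMO": 7, "NCEP": 4, "KMA": 3, "ECCC": 4}
n_years = 12

# Synthetic stand-in for each model's reforecast anomalies (members x years).
pooled = np.concatenate(
    [rng.standard_normal((m, n_years)) for m in selected.values()], axis=0)

# Tercile thresholds from the pooled reforecast climatology (in practice
# observed climatological terciles would normally be used instead).
terciles = np.quantile(pooled, [1/3, 2/3])
p_below = (pooled <= terciles[0]).mean(axis=0)
p_above = (pooled > terciles[1]).mean(axis=0)
p_normal = 1.0 - p_below - p_above
print("Year-1 tercile probabilities:",
      np.round([p_below[0], p_normal[0], p_above[0]], 2))
```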
Acknowledgments
This work was supported by U.K. Research and Innovation as part of the Global Challenges Research Fund, African SWIFT program, Grant NE/P021077/1. Hussen Seid was also supported by the Intra-ACP Climate Services and Related Applications (ClimSA) project funded by the 11th EDF (ACP/FED/038-833). Linda Hirons and Steve Woolnough were also supported by the National Centre for Atmospheric Science ODA national capability program ACREW (NE/R000034/1), which is supported by NERC and the GCRF. Zewdu Segele was supported by the Weather and Climate Information Services (WISER) Support to ICPAC Project (W2-SIP).
Data availability statement
All the datasets analyzed in this study (S2S hindcasts, observational, and reanalysis datasets) are openly available and can be accessed from the following links: S2S hindcasts: http://apps.ecmwf.int/datasets/data/s2s, CHIRPS: https://data.chc.ucsb.edu/products/CHIRPS-2.0/, ERA5: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels-monthlymeans?tab=form, and NOAA OISSTv2: https://climatedataguide.ucar.edu/climate-data/sst-data-noaa-optimal-interpolation-oi-sst-analysis-version-2-oisstv2-1x1.
REFERENCES
Boos, W. R., and K. A. Emanuel, 2009: Annual intensification of the Somali jet in a quasi-equilibrium framework: Observational composites. Quart. J. Roy. Meteor. Soc., 135, 319–335, https://doi.org/10.1002/qj.388.
Camberlin, P., and N. Philippon, 2002: The East African March–May rainy season: Associated atmospheric dynamics and predictability over the 1968–97 period. J. Climate, 15, 1002–1019, https://doi.org/10.1175/1520-0442(2002)015<1002:TEAMMR>2.0.CO;2.
Camberlin, P., V. Moron, R. E. Okoola, N. Philippon, and W. Gitau, 2009: Components of rainy seasons’ variability in equatorial East Africa: Onset, cessation, rainfall frequency and intensity. Theor. Appl. Climatol., 98, 237–249, https://doi.org/10.1007/s00704-009-0113-1.
Christensen, H. M., I. M. Moroz, and T. N. Palmer, 2015: Evaluation of ensemble forecast uncertainty using a new proper score: Application to medium-range and seasonal forecasts. Quart. J. Roy. Meteor. Soc., 141, 538–549, https://doi.org/10.1002/qj.2375.
de Andrade, F. M., C. A. Coelho, and I. F. Cavalcanti, 2019: Global precipitation hindcast quality assessment of the subseasonal to seasonal (S2S) prediction project models. Climate Dyn., 52, 5451–5475, https://doi.org/10.1007/s00382-018-4457-z.
de Andrade, F. M., M. P. Young, D. MacLeod, L. C. Hirons, S. J. Woolnough, and E. Black, 2021: Subseasonal precipitation prediction for Africa: Forecast evaluation and sources of predictability. Wea. Forecasting, 36, 265–284, https://doi.org/10.1175/WAF-D-20-0054.1.
Dinku, T., C. Funk, P. Peterson, R. Maidment, T. Tadesse, H. Gadain, and P. Ceccato, 2018: Validation of the CHIRPS satellite rainfall estimates over eastern Africa. Quart. J. Roy. Meteor. Soc., 144, 292–312, https://doi.org/10.1002/qj.3244.
Ferro, C. A. T., 2014: Fair scores for ensemble forecasts. Quart. J. Roy. Meteor. Soc., 140, 1917–1923, https://doi.org/10.1002/qj.2270.
Funk, C., and Coauthors, 2015: The climate hazards infrared precipitation with stations—A new environmental record for monitoring extremes. Sci. Data, 2, 150066, https://doi.org/10.1038/sdata.2015.66.
Hastenrath, S., A. Nicklis, and L. Greischar, 1993: Atmospheric-hydrospheric mechanisms of climate anomalies in the western equatorial Indian Ocean. J. Geophys. Res. Oceans, 98, 20 219–20 235, https://doi.org/10.1029/93JC02330.
Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803.
Jolliffe, I. T., and D. B. Stephenson, Eds., 2003: Forecast Verification: A Practitioner’s Guide in Atmospheric Science. John Wiley and Sons, 240 pp.
Jolliffe, I. T., and D. B. Stephenson, Eds., 2012: Forecast Verification: A Practitioner’s Guide in Atmospheric Science. 2nd ed. John Wiley and Sons, 292 pp.
Jorgensen, S. E., Ed., 2016: Handbook of Ecological Models Used in Ecosystem and Environmental Management. Vol. 3, CRC Press, 636 pp.
Kendzierski, S., B. Czernecki, L. Kolendowicz, and A. Jaczewski, 2018: Air temperature forecasts’ accuracy of selected short-term and long-term numerical weather prediction models over Poland. Geofizika, 35, 19–85, https://doi.org/10.15233/gfz.2018.35.5.
Leutbecher, M., 2009: Diagnosis of ensemble forecasting systems. Seminar on Diagnosis of Forecasting and Data Assimilation Systems, ECMWF, 235–266, https://www.ecmwf.int/sites/default/files/elibrary/2010/10725-diagnosis-ensemble-forecasting-systems.pdf.
Leutbecher, M., and T. N. Palmer, 2008: Ensemble forecasting. J. Comput. Phys., 227, 3515–3539, https://doi.org/10.1016/j.jcp.2007.02.014.
Li, S., and A. W. Robertson, 2015: Evaluation of submonthly precipitation forecast skill from global ensemble prediction systems. Mon. Wea. Rev., 143, 2871–2889, https://doi.org/10.1175/MWR-D-14-00277.1.
Maidment, R. I., and Coauthors, 2017: A new, long-term daily satellite-based rainfall dataset for operational monitoring in Africa. Sci. Data, 4, 170063, https://doi.org/10.1038/sdata.2017.63.
Mason, I., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag., 30, 291–303.
Moron, V., P. Camberlin, and A. W. Robertson, 2013: Extracting subseasonal scenarios: An alternative method to analyze seasonal predictability of regional-scale tropical rainfall. J. Climate, 26, 2580–2600, https://doi.org/10.1175/JCLI-D-12-00357.1.
Moron, V., A. W. Robertson, and F. Vitart, 2018: Sub-seasonal to seasonal predictability and prediction of monsoon climates. Front. Environ. Sci., 6, 83, https://doi.org/10.3389/fenvs.2018.00083.
Müller, W. A., C. Appenzeller, F. J. Doblas-Reyes, and M. A. Liniger, 2005: A debiased ranked probability skill score to evaluate probabilistic ensemble forecasts with small ensemble sizes. J. Climate, 18, 1513–1523, https://doi.org/10.1175/JCLI3361.1.
Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281–293, https://doi.org/10.1175/1520-0434(1993)008<0281:WIAGFA>2.0.CO;2.
Nicholson, S. E., 1996: A review of climate dynamics and climate variability in Eastern Africa. The Limnology, Climatology and Paleoclimatology of the East African Lakes, T. C. Johnson and E. O. Odada, Eds., Gordon and Breach, 25–56.
Nicholson, S. E., 2015: The predictability of rainfall over the Greater Horn of Africa. Part II: Prediction of monthly rainfall during the long rains. J. Hydrometeor., 16, 2001–2012, https://doi.org/10.1175/JHM-D-14-0138.1.
Nicholson, S. E., and J. Kim, 1997: The relationship of the El Niño–Southern Oscillation to African rainfall. Int. J. Climatol., 17, 117–135, https://doi.org/10.1002/(SICI)1097-0088(199702)17:2<117::AID-JOC84>3.0.CO;2-O.
Noh, Y. C., B. J. Sohn, Y. Kim, S. Joo, and W. Bell, 2016: Evaluation of temperature and humidity profiles of Unified Model and ECMWF analyses using GRUAN radiosonde observations. Atmosphere, 7, 94, https://doi.org/10.3390/atmos7070094.
Robertson, A. W., A. Kumar, M. Peña, and F. Vitart, 2015: Improving and promoting subseasonal to seasonal prediction. Bull. Amer. Meteor. Soc., 96, ES49–ES53, https://doi.org/10.1175/BAMS-D-14-00139.1.
Rowell, D. P., J. M. Ininda, and M. N. Ward, 1994: The impact of global sea surface temperature patterns on seasonal rainfall in East Africa. Proc. Int. Conf. on Monsoon Variability and Prediction, Trieste, Italy, WMO, 666–672.
Rowell, D. P., C. K. Folland, K. Maskell, and M. N. Ward, 1995: Variability of summer rainfall over tropical North Africa (1906–92): Observations and modelling. Quart. J. Roy. Meteor. Soc., 121, 669–704, https://doi.org/10.1002/qj.49712152311.
Taylor, K. E., 2001: Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res., 106, 7183–7192, https://doi.org/10.1029/2000JD900719.
Tippett, M. K., 2008: Comments on “The discrete Brier and ranked probability skill scores.” Mon. Wea. Rev., 136, 3629–3633, https://doi.org/10.1175/2008MWR2594.1.
Toth, Z., O. Talagrand, G. Candille, and Y. Zhu, 2003: Probability and ensemble forecasts. Forecast Verification: A Practitioner’s Guide in Atmospheric Science, I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley and Sons, 137–163.
Vellinga, M., and S. F. Milton, 2018: Drivers of interannual variability of the East African “Long Rains.” Quart. J. Roy. Meteor. Soc., 144, 861–876, https://doi.org/10.1002/qj.3263.
Vigaud, N., A. W. Robertson, M. K. Tippett, and N. Acharya, 2017: Subseasonal predictability of boreal summer monsoon rainfall from ensemble forecasts. Front. Environ. Sci., 5, 67, https://doi.org/10.3389/fenvs.2017.00067.
Vigaud, N., M. K. Tippett, and A. W. Robertson, 2018: Probabilistic skill of subseasonal precipitation forecasts for the East Africa–West Asia sector during September–May. Wea. Forecasting, 33, 1513–1532, https://doi.org/10.1175/WAF-D-18-0074.1.
Vitart, F., A. W. Robertson, and D. L. Anderson, 2012: Subseasonal to seasonal prediction project: Bridging the gap between weather and climate. Bull. WMO, 61 (2), https://public.wmo.int/en/resources/bulletin/subseasonal-seasonal-prediction-project-bridging-gap-between-weather-and-climate.
Vitart, F., and Coauthors, 2017: The Subseasonal to Seasonal (S2S) prediction project database. Bull. Amer. Meteor. Soc., 98, 163–173, https://doi.org/10.1175/BAMS-D-16-0017.1.
Wainwright, C. M., J. H. Marsham, R. J. Keane, D. P. Rowell, D. L. Finney, E. Black, and R. P. Allan, 2019: ‘Eastern African Paradox’ rainfall decline due to shorter not less intense long rains. npj Climate Atmos. Sci., 2, 34, https://doi.org/10.1038/s41612-019-0091-7.
Weisheimer, A., and T. N. Palmer, 2014: On the reliability of seasonal climate forecasts. J. R. Soc. Interface, 11, 20131162, https://doi.org/10.1098/rsif.2013.1162.
White, B. J., and Coauthors, 2017: Potential applications of subseasonal-to-seasonal (S2S) predictions. Meteor. Appl., 24, 315–325, https://doi.org/10.1002/met.1654.