Seasonal predictions have great socioeconomic potential if they are reliable and skillful. In this study, we assess the prediction performance of SEAS5, version 5 of the seasonal prediction system of the European Centre for Medium-Range Weather Forecasts (ECMWF), over South America against homogenized station data. For temperature, we find the highest prediction performance in the tropics during austral summer, where the probability that the predictions correctly discriminate different observed outcomes is 70%. In regions east of the Andes, the predictions of maximum and minimum temperature still exhibit considerable performance, while farther south in Chile and Argentina the temperature prediction performance is low. Generally, the prediction performance for minimum temperature is slightly lower than that for maximum temperature. The prediction performance for precipitation is generally lower, and spatially and temporally more variable, than that for temperature. The highest precipitation prediction performance is observed at the coast and over the highlands of Colombia and Ecuador, over northeastern Brazil, and over an isolated region north of Uruguay during DJF. In general, Niño-3.4 has a strong influence on both air temperature and precipitation in the regions where ECMWF SEAS5 shows high performance, in some regions through teleconnections (e.g., north of Uruguay). However, we show that SEAS5 outperforms a simple empirical prediction based on Niño-3.4 in most regions where the prediction performance of the dynamical model is high, thereby supporting the potential benefit of using a dynamical model instead of statistical relationships for predictions at the seasonal scale.
Seasonal climate forecasts are increasingly sought after to support decisions in a variety of sectors. Their potential has been demonstrated with applications in agriculture (WMO 2007; Hansen 2002) through the prediction of droughts (NIDIS 2004; Schubert et al. 2007; Pozzi et al. 2013; Shafiee-Jood et al. 2014; Yuan and Wood 2013) and crop yield modeling (Cantelaube and Terres 2005; Challinor et al. 2005; Ceglar et al. 2018), in the energy sector (Troccoli 2010; Brayshaw et al. 2011; De Felice et al. 2015; Clark et al. 2017; Svensson et al. 2015), in the insurance sector through weather derivatives (Jewson and Brix 2005), in the transport sector (Palin et al. 2016; Karpechko et al. 2015), in seasonal hurricane prediction (Emanuel et al. 2012), and in the health sector through malaria predictions (Morse et al. 2005). Seasonal predictions are based either on statistical tools, such as linear regression, or on dynamical models such as ECMWF Systems 4 and 5 (Molteni et al. 2011; Johnson et al. 2019), NCEP CFS (Saha et al. 2006), and many others [see Troccoli (2010) for an overview], or on a combination of both (Coelho et al. 2006a).
The quality of monthly and seasonal predictions has been assessed regionally in a variety of studies (e.g., Palmer 2002; Palmer et al. 2004; Saha et al. 2006; Wang et al. 2009; Alessandri et al. 2011; Lee et al. 2010, 2011; Kim et al. 2012). Within the European Provision of Regional Impacts Assessments on Seasonal and Decadal Time Scales (EUPORIAS) project, for example, verification metrics of the European Centre for Medium-Range Weather Forecasts (ECMWF) forecasting Systems 4 and 5 (Molteni et al. 2011; Johnson et al. 2019) against ERA-Interim (Dee et al. 2011) were calculated globally on a regular 2° grid and published online (Wehrli et al. 2017). However, the interpretation of these verification analyses often focused on regions in the Northern Hemisphere. For South America, two studies by Coelho et al. (2006a,b) based on both empirical and multimodel approaches showed that the best seasonal precipitation prediction quality during austral summer is found in the tropics and in the region around southern Brazil, Uruguay, Paraguay, and northern Argentina. A recent study by Osman and Vera (2017) on both temperature and precipitation confirms that temperature prediction performance is highest in the tropics during DJF, and reports that precipitation prediction ability over South America is similar in DJF and JJA.
In South America, national weather services (NWSs) typically use the IRI Climate Predictability Tool (CPT) (Mason and Baddour 2008; Mason and Tippett 2016) to issue seasonal forecasts. CPT is a statistical downscaling tool that can be used for statistical prediction with different predictor fields. In practice, the predictor field is determined individually for each region, forecast variable, and time period of interest, which is often a tedious and subjective task. Further, many weather services qualitatively modify the CPT output based on a consensus discussion before publicly issuing the seasonal forecasts. The subjectivity of this procedure and the fact that the forecasts are not stored make retrospective verification of these predictions difficult. Dynamical models are currently much less used in operational seasonal forecasting in South America. However, there is a trend toward more extensive use of dynamical model forecasts by the NWSs in South America, particularly because these have a variety of advantages over statistical models: the physical consistency between different variables is assured to the extent represented by the model, predictions are issued globally so that no abrupt changes occur at country borders, and there is no need to define predictors individually. Since the ECMWF seasonal forecasting system has proven to be among the best models at predicting ENSO (Barnston et al. 2012) and is now openly available through the Copernicus Climate Change Service, there is considerable interest among the NWSs in South America in this system, and consequently in its verification.
El Niño–Southern Oscillation (ENSO) is one of the principal modes influencing climate at the seasonal scale in South America (Grimm et al. 2000; Pezzi and Cavalcanti 2001; Zhou and Lau 2001; Brönnimann 2007; Troccoli 2010; Shimizu et al. 2017; Sulca et al. 2018), and the predictability of ENSO and its teleconnections has been identified as the main source of predictability at the seasonal scale in that region (e.g., Manzanas et al. 2014). ENSO is well known to have a large influence on extreme events in the Andean region. For instance, a strong El Niño event may cause economic losses due to increased rainfall, landslides, and floods, mainly in the lowlands of Ecuador and Peru, and due to rainfall deficits in Colombia and the highlands of Peru (e.g., Vicente-Serrano et al. 2017; Erfanian et al. 2017). Given ENSO’s large impacts in South America and the ability of seasonal forecasts to predict it at least to some extent, seasonal prediction has clear potential for socioeconomic benefits in this region. Within the second phase of Climandes (Rosas et al. 2016), a project aiming to develop climate services for the agricultural sector, we investigated the potential benefit of seasonal forecasts for smallholder farmers. To ensure that forecasts are trusted and applied by farmers, the performance of the forecasts and their relevance for decision-making at the local scale are crucial (Ash et al. 2007). For instance, Ziervogel et al. (2005) state that seasonal climate forecasts may only benefit smallholder farmers if they are correct in more than 60%–70% of the cases; otherwise they do more harm than good. Furthermore, the local focus implies that forecast performance should ideally be evaluated against representative ground observations rather than against quasi-observations such as reanalyses, which have traditionally been used to analyze seasonal forecast quality.
The use of reanalyses as ground truth in South America is further complicated as they show deficiencies in representing the local climate conditions for both temperature (Hofer et al. 2012) and precipitation (Imfeld et al. 2019) in regions of complex terrain.
For these reasons, this study verifies SEAS5, the latest seasonal prediction system of the ECMWF, against high-quality homogenized in situ precipitation and air temperature observations over South America. Furthermore, we extend the existing analyses on seasonal forecast performance cited above by comparing forecast quality for all seasons and regions in South America. The performance of SEAS5 is then interpreted with an analysis of the relationship of the ENSO index of region 3.4 (Niño-3.4) with air temperature and precipitation observations. Further, SEAS5 prediction performance is compared to a simple statistical model based on Niño-3.4 to study whether high prediction performance observed for SEAS5 is purely due to ENSO as the main source of predictability.
The ECMWF seasonal prediction system SEAS5, released in 2017, is the operational seasonal forecasting system of ECMWF at the time of publication (Johnson et al. 2019). We verify the hindcasts (or reforecasts) of ECMWF SEAS5, which are available for the period 1981–2018. Like the operational seasonal forecasts, the hindcasts are initialized on the first day of every month, but in contrast to the operational forecast ensemble with 51 members, the hindcasts consist of 25 members. SEAS5 has a horizontal resolution of ~35 km, and forecasts run out to 7 months. This study focuses on forecasts with a 1-month lead time, referring, for example, to a forecast for December–February that is issued in November.
The observations used as ground truth for verification consist of more than 200 meteorological stations measuring daily precipitation (Prec) as well as maximum and minimum temperature (Tmax and Tmin, respectively). The stations cover the whole of South America from Colombia and Brazil to southern Chile and Argentina (Fig. 1), with the exception of Venezuela, Guyana, Suriname, and French Guiana. In a joint effort described in Skansi et al. (2013), all measurements were quality controlled using the RClimDex software (Zhang and Yang 2004) and additional quality control software developed by Aguilar et al. (2010). The observations were further homogenized using the RHtestV3 (Wang et al. 2010) and RSNHT (Alexandersson and Moberg 1997; Aguilar 2010) software packages, however without gap filling. The homogenized observations are available from 1965 to 2010. Note that homogenization and quality control are important prerequisites for verification, since erroneous measurements or artificial break points may alter the outcome of a verification procedure. In this study, all data are averaged on a monthly or seasonal basis.
The relationship between ENSO (e.g., Trenberth 1997) and the meteorological variables is analyzed based on the seasonal El Niño index (Barnston and Ropelewski 1992), a 3-month running mean of ERSST.v5 sea surface temperature (SST) anomalies in the Niño-3.4 region (5°N–5°S, 120°–170°W) issued by NOAA (Huang et al. 2017), referred to as Niño-3.4 hereafter.
a. Clustering—Climatology of South America
As a first step, the observation sites are clustered based on the correlations of the standardized monthly anomalies of Tmin, Tmax, and precipitation. A hierarchical distance clustering of the Fisher-transformed and averaged correlation matrices of the three variables was performed using the “agnes” algorithm (Kaufman and Rousseeuw 1990; Struyf et al. 1996, 1997; Lance and Williams 1967) of the “cluster” R-package. An analysis of different numbers of clusters (2–20) showed that a relatively large number of clusters is needed to represent the different climate zones in South America. As a compromise between representing all climate zones and keeping the number of clusters manageable for analysis, a total of 15 clusters was selected. Four stations were then manually moved from their objective cluster to another cluster based on obvious geographic considerations such as horizontal distance and altitude (Fig. 1). All subsequent analyses are performed on the resulting clusters (referred to by an uppercase letter “C” followed by the cluster number), which are briefly described here in terms of their climatology. The climatologies depicted in Fig. 2 are averages of the measurements of all stations within a cluster.
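The clustering step can be illustrated with a short Python sketch. It assumes gap-free anomaly arrays, a dissimilarity of 1 − r computed from the averaged Fisher-transformed correlations, and SciPy's average linkage as a stand-in for the default method of “agnes”; the study itself used the R “cluster” package, and the exact dissimilarity it used is not stated here. The function name `cluster_stations` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_stations(anomalies, n_clusters=15):
    """Cluster stations from standardized monthly anomalies (hypothetical helper).

    anomalies: dict mapping variable name -> array (n_months, n_stations),
    assumed gap-free here (the real station series contain gaps).
    """
    z_mats = []
    for values in anomalies.values():
        r = np.corrcoef(values, rowvar=False)   # station-by-station correlation
        r = np.clip(r, -0.999999, 0.999999)     # keep arctanh finite on the diagonal
        z_mats.append(np.arctanh(r))            # Fisher transform
    r_mean = np.tanh(np.mean(z_mats, axis=0))   # average in z space, back-transform
    dist = 1.0 - r_mean                         # assumed dissimilarity: 1 - r
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # upper triangle as a condensed vector
    tree = linkage(condensed, method="average") # agglomerative, average linkage
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Synthetic example: two groups of three stations driven by independent signals
rng = np.random.default_rng(0)
n_months = 240
signal_a, signal_b = rng.standard_normal((2, n_months))
anomalies = {}
for var in ("Tmin", "Tmax", "Prec"):
    base = np.column_stack([signal_a] * 3 + [signal_b] * 3)
    anomalies[var] = base + 0.3 * rng.standard_normal((n_months, 6))
labels = cluster_stations(anomalies, n_clusters=2)
```

With two groups of synthetic stations, the sketch recovers the two groups. Averaging in Fisher z space and transforming back is a common way of averaging correlation coefficients, since raw correlations are bounded and skewed near ±1.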
Four distinct rainfall classes can be distinguished in South America (Fig. 2). Many regions show a unimodal rainfall distribution peaking either in austral summer (C3–C5 and C10–C12), in austral spring (C8 and C9), or in austral winter (C14 and C15). A clearly bimodal rainfall pattern is observed in the Colombian and Ecuadorian Andes and lowlands (C1, C2, and C7) peaking during the transition seasons. No rainfall at all is observed in the desert region at the northern Chilean/southern Peruvian coast (C6). The highest rainfall amounts occur in the Amazonian regions in Brazil, while the lowest amounts occur in the desert region mentioned above, in Argentina, and in the Altiplano region in Bolivia and Peru.
In the northern tropical regions (C1–C3, C6, and C7), temperature is almost constant throughout the year, and the daily temperature range (i.e., the difference between day and night temperatures) reaches around 5°–10°C on average. Farther south, the annual cycle becomes apparent and increasingly pronounced, with a mean amplitude (i.e., the mean difference between winter and summer temperatures) of up to 15°C in the south.
The verification was carried out for 3-monthly means of the forecasts with a 1-month lead time (i.e., forecast months 2–4). It focuses on ensemble means and on categorical forecasts in the form of tercile probabilities. This means that both the forecasts and the verifying observations are categorized into three climatologically equiprobable classes, the boundaries of which are derived from the hindcast climatology and the observation climatology, respectively. This forecast format implicitly contains a calibration step, so that verification results are much less affected by systematic forecast errors (e.g., time-mean biases and variance errors). In the following, the lower (upper) terciles are referred to as dry (wet) for precipitation and cold (warm) for temperature for the sake of simplicity (even though, for example, “dry” does not necessarily imply dry in an absolute sense, but drier-than-normal conditions). At each station, the hindcasts at the closest SEAS5 grid point were bias-corrected by quantile mapping using “qmap” (Gudmundsson et al. 2012) as implemented within the “biascorrection” R-package (Bhend 2017). The verification was performed for the overlapping period of the two datasets (i.e., 1981–2010).
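A minimal Python sketch of the tercile format and of quantile mapping follows. It is only illustrative: it uses a plain empirical-CDF mapping rather than the “qmap” implementation, ignores cross-validation, and all function names and data are hypothetical. Forecast tercile probabilities are the fractions of ensemble members falling into each class, with boundaries from the hindcast climatology; observation categories use boundaries from the observation climatology.

```python
import numpy as np

def tercile_bounds(climatology):
    """Lower/upper tercile boundaries of a climatological sample (pooled over years and members)."""
    return np.quantile(np.ravel(climatology), [1 / 3, 2 / 3])

def tercile_category(values, bounds):
    """0 = below normal (dry/cold), 1 = normal, 2 = above normal (wet/warm)."""
    return np.digitize(values, bounds)

def tercile_probabilities(ensemble, bounds):
    """ensemble: (n_years, n_members) -> (n_years, 3) fraction of members per tercile."""
    cats = tercile_category(ensemble, bounds)
    return np.stack([(cats == k).mean(axis=1) for k in range(3)], axis=1)

def empirical_quantile_mapping(model_hist, obs_hist, model_values):
    """Replace each model value by the observed quantile at the same empirical CDF level."""
    model_sorted = np.sort(np.ravel(model_hist))
    cdf = np.searchsorted(model_sorted, model_values, side="right") / model_sorted.size
    return np.quantile(np.ravel(obs_hist), cdf)

# Hypothetical station: 30 years x 25 members, hindcasts biased warm by about 2 units
rng = np.random.default_rng(1)
hindcast = rng.normal(2.0, 1.0, size=(30, 25))
obs = rng.normal(0.0, 1.0, size=30)
corrected = empirical_quantile_mapping(hindcast, obs, hindcast)
fc_probs = tercile_probabilities(corrected, tercile_bounds(corrected))
obs_cat = tercile_category(obs, tercile_bounds(obs))
```

Because each data set is split at its own terciles, a constant bias shifts both the boundaries and the values and does not affect the categories, which is the implicit calibration mentioned above.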
It is widely recognized that verification should be based on a variety of measures to assess the different aspects of forecast quality (Jolliffe and Stephenson 2012). Therefore, diverse prediction quality metrics measuring the association, the accuracy, the discrimination, and the reliability of the forecasts were applied here (Table 1; see Murphy 1993 for an overview). In the order listed, these metrics assess the following aspects of the forecasts: the linear relationship between the variation of the forecast ensemble mean and the variation of the observations (measured by the Pearson correlation coefficient); the level of agreement between forecasts and observations (measured by the root-mean-square error between the forecast ensemble mean and the observations, and by the ranked probability skill score of the tercile forecasts); the ability of the forecasts to distinguish outcomes for which the observations differ (measured by the discrimination score and the area under the receiver operating characteristic curve); and the correspondence between the forecast probability and the observed frequency of an event (measured by the Weisheimer reliability score).
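The probabilistic accuracy and discrimination measures can be sketched in Python as follows. The ranked probability skill score is computed against a climatological forecast of 1/3 per tercile (any normalization of the RPS cancels in the skill score), and the discrimination score is written in its simple form for continuous forecasts and observations, i.e., the fraction of pairs of years with different observations that the forecast ranks correctly, with forecast ties counted as one half. These are illustrative re-implementations, not the code of “easyVerification” or “SpecsVerification”; the ROC area, which applies the same pairwise idea to a single tercile event, is omitted.

```python
import numpy as np

def rps(probs, obs_cat):
    """Ranked probability score per forecast (unnormalized; the factor cancels in the RPSS)."""
    obs_onehot = np.eye(probs.shape[1])[obs_cat]
    cum_diff = np.cumsum(probs, axis=1) - np.cumsum(obs_onehot, axis=1)
    return np.sum(cum_diff ** 2, axis=1)

def rpss(probs, obs_cat):
    """Skill relative to the climatological tercile forecast (1/3, 1/3, 1/3)."""
    clim = np.full(probs.shape, 1.0 / probs.shape[1])
    return 1.0 - rps(probs, obs_cat).mean() / rps(clim, obs_cat).mean()

def two_afc(forecast, obs):
    """Fraction of year pairs with different observations that the forecast orders correctly."""
    correct, pairs = 0.0, 0
    n = len(obs)
    for i in range(n):
        for j in range(i + 1, n):
            if obs[i] == obs[j]:
                continue  # only pairs with different observed outcomes are informative
            pairs += 1
            agreement = np.sign(forecast[i] - forecast[j]) * np.sign(obs[i] - obs[j])
            if agreement > 0:
                correct += 1.0
            elif agreement == 0:
                correct += 0.5  # tied forecasts count as a coin toss
    return correct / pairs
```

A score of 0.5 for `two_afc` corresponds to no discrimination, which is why the 60% and 70% levels discussed in section 3d are meaningful departures from chance.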
All metrics were calculated in R (R Core Team 2012) using the “easyVerification” (MeteoSwiss 2016) and “SpecsVerification” (Siegert et al. 2017) R-packages. Except for the Weisheimer reliability score (referred to as the Weisheimer score in the following), all metrics were calculated individually at each station and then summarized for each cluster using the median. The Weisheimer score is based on the slope of the regression line of the reliability diagram and its associated uncertainty (e.g., Weisheimer and Palmer 2014). Note that a slope of one corresponds to perfect agreement between the forecast probabilities and the observed frequencies. To ensure a large sample size, the Weisheimer score was determined for each cluster from an “artificial” time series consisting of the pooled hindcasts and observations of all stations within the cluster. The score ranges from “dangerous to use” (score 1) to “almost perfect” (score 5).
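The slope underlying the Weisheimer score can be sketched in Python by pooling the forecast probabilities and event occurrences of all stations in a cluster, binning the probabilities, and fitting a count-weighted least-squares line to the observed frequency versus the mean forecast probability. The number of bins and the weighting are assumptions for illustration, the function name is hypothetical, and the mapping of the slope and its uncertainty onto the five categories of Weisheimer and Palmer (2014) is not reproduced.

```python
import numpy as np

def reliability_slope(prob, event, n_bins=10):
    """Count-weighted least-squares slope of the reliability diagram.

    prob:  forecast probabilities of one tercile event (e.g., "dry"),
           pooled over all stations and years of a cluster.
    event: 1 where that event was observed, 0 otherwise.
    Returns (slope, intercept); a slope of one means perfect reliability.
    """
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)  # bin index per forecast
    x, y, w = [], [], []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():                       # skip empty bins
            x.append(prob[in_bin].mean())      # mean forecast probability in the bin
            y.append(event[in_bin].mean())     # observed relative frequency in the bin
            w.append(in_bin.sum())             # number of forecasts in the bin
    # np.polyfit weights the unsquared residuals, so sqrt(count) weights
    # each squared residual by the bin population
    slope, intercept = np.polyfit(x, y, 1, w=np.sqrt(w))
    return slope, intercept

# Hypothetical cluster: 5 stations whose forecasts are reliable by construction
rng = np.random.default_rng(2)
station_probs, station_events = [], []
for _ in range(5):
    p = rng.uniform(0.0, 1.0, 4000)
    station_probs.append(p)
    station_events.append((rng.uniform(size=p.size) < p).astype(int))
pooled_p = np.concatenate(station_probs)   # the pooled "artificial" time series
pooled_e = np.concatenate(station_events)
slope, intercept = reliability_slope(pooled_p, pooled_e)
```

Pooling is what makes the slope estimable: a single station with 30 seasons would leave most probability bins nearly empty.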
To give an overview of the verification results, we focus on the Pearson correlation and the Weisheimer score. Other scores are included in the discussion of the clusters where they differ significantly from these main scores and provide additional insight for interpretation. The results for all scores can be found in the appendix.
c. Comparison with ENSO
The statistical relationship between ENSO and the meteorological variables is analyzed based on the Pearson correlation between the seasonal Niño-3.4 index and the observations. The correlations are first determined for lags from 0 to 11 months. In the analyses, special focus is put on lag 1, referring, for example, to the Niño-3.4 index in NDJ correlated against the observations in DJF (where January and February refer to the following year). This focus was chosen because the relationship between ENSO and the observations is strongest for the shortest lag over the whole continent, and because the NWSs in South America using CPT often apply an ENSO index at lag 1 for their seasonal forecasts.
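The lagged correlation can be sketched in Python with monthly series that start in January. Here the 3-month running means are computed from monthly anomalies with hypothetical helpers, whereas the study used the seasonal index issued by NOAA directly; a target season is identified by its last month, so lag 1 pairs, for example, the NDJ index with DJF observations. The toy response below simply lags the index by one month.

```python
import numpy as np

def running_mean3(monthly):
    """3-month running mean; element t averages months t-2, t-1, t (NaN for the first two)."""
    out = np.full(len(monthly), np.nan)
    out[2:] = (monthly[:-2] + monthly[1:-1] + monthly[2:]) / 3.0
    return out

def lagged_season_correlation(nino_monthly, obs_monthly, end_month, lag):
    """Correlate the index season ending `lag` months earlier with each target season.

    Both inputs are monthly series starting in January; end_month is 0-based
    (e.g., 1 = February, so the target season is DJF).
    """
    idx3 = running_mean3(np.asarray(nino_monthly, dtype=float))
    obs3 = running_mean3(np.asarray(obs_monthly, dtype=float))
    targets = np.arange(end_month, len(obs3), 12)   # one target season per year
    targets = targets[targets - lag >= 2]           # need complete 3-month windows
    x, y = idx3[targets - lag], obs3[targets]
    ok = ~(np.isnan(x) | np.isnan(y))
    return np.corrcoef(x[ok], y[ok])[0, 1]

# Hypothetical 30-year monthly series; the toy "observations" lag the index by one month
rng = np.random.default_rng(3)
nino = rng.standard_normal(360)
obs = np.roll(nino, 1)
r_by_lag = [lagged_season_correlation(nino, obs, end_month=1, lag=lag) for lag in range(12)]
```

In this toy setup the lag-1 correlation (NDJ index versus DJF observations) is one by construction, while lag 0 shares only two of three months.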
Furthermore, the prediction skill of SEAS5 was compared to this lag-1 Niño-3.4 benchmark. The reader should note that this benchmark is not applicable as a prediction in practice, since two of the three predicted months will already have passed when a seasonal index at lag 1 is used for prediction (e.g., December and January when using the NDJ index to predict DJF). Here, we use it as a theoretical reference and express the comparison as the ratio of the coefficient of determination of SEAS5 to that of the lag-1 Niño-3.4 index, both with respect to the observations. Ratios above one indicate a higher performance of SEAS5, while ratios below one indicate a higher performance of a simple linear model using Niño-3.4 as predictor. It is widely known that the different types of ENSO (i.e., different regions of anomalous sea surface temperature) influence the atmosphere in different ways and may have opposite effects depending on where the sea surface temperature anomaly is maximized (Waylen and Poveda 2002; Takahashi et al. 2011; Penalba and Rivera 2016; Tedeschi et al. 2015, 2016; Garreaud 2018). However, studying the influence of ENSO in regions other than Niño-3.4 is beyond the scope of this publication.
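Because a simple linear model with a single predictor has a coefficient of determination equal to the squared Pearson correlation, the ratio described above can be computed from two correlations. The Python sketch below assumes exactly that; note that squaring discards the sign, so in this simplified form a negative SEAS5 correlation would also count as explained variance. The function name is hypothetical.

```python
import numpy as np

def r2_ratio(seas5_mean, nino_lag1, obs):
    """Ratio of SEAS5 to lag-1 Niño-3.4 coefficients of determination (>1: SEAS5 better)."""
    r_seas5 = np.corrcoef(seas5_mean, obs)[0, 1]
    r_nino = np.corrcoef(nino_lag1, obs)[0, 1]
    return r_seas5 ** 2 / r_nino ** 2

# Toy example: SEAS5 tracks the observations perfectly, the index only partly (r = 0.8)
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
nino = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
ratio = r2_ratio(obs, nino, obs)
```

Here the ratio is 1 / 0.64, i.e., SEAS5 explains about 56% more variance than the linear Niño-3.4 benchmark.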
d. Usefulness of seasonal forecasts for small-scale applications
One main goal of the Climandes project was to determine the usefulness of SEAS5 for small-scale agricultural applications in South America. Ziervogel et al. (2005) state that forecasts need to be correct at least 60%–70% of the time to be of use for smallholder farmers. The generalized discrimination score (Weigel and Mason 2011), which can be interpreted as the percentage of cases in which the forecasts correctly discriminate between different observed outcomes, is a suitable measure of the forecasts’ usefulness. To enable analyses based on the Pearson correlation, as done throughout this study, the percentages given by Ziervogel et al. (2005), which can be related to the discrimination score, were empirically converted to thresholds for the Pearson correlation. To this end, a linear model relating the correlation to the discrimination score was fitted based on all data points available in this study. Applying a Fisher transformation (i.e., the inverse hyperbolic tangent function) to the correlation and to the discrimination score (after first rescaling the latter to the interval [−1, 1]) ensures that the two verification metrics are approximately normally distributed. The fitted linear model resulted in the following relationship between the correlation r and the discrimination score (DISCR):

$$\tanh^{-1}(r) = 0.032 + 1.496\,\tanh^{-1}(2\,\mathrm{DISCR} - 1).$$
From the formula, it follows that the case of forecasts being correct at least 60% of the time (i.e., a discrimination of 0.6) corresponds to correlations of at least 0.32, while forecasts that are correct at least 70% relate to correlations lying above 0.58. Therefore, we chose 0.3 as the correlation threshold below which forecasts are potentially harmful and 0.6 as the correlation threshold above which forecasts are useful even for small-scale applications. Note that the uncertainty with regard to these thresholds is quite large, and that the values 0.032 and 1.496 are estimated empirically and hold for the data underlying this study. Whether the parameters would strongly differ in another region of the world cannot be determined here.
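The threshold conversion can be checked numerically. The Python sketch below assumes the fitted relationship has the form artanh(r) = a + b·artanh(2·DISCR − 1), with a = 0.032 and b = 1.496 as quoted in the text (a form that reproduces the thresholds of 0.32 and 0.58); it also includes a least-squares fit of that form. Function names are hypothetical.

```python
import numpy as np

A, B = 0.032, 1.496  # empirical coefficients quoted in the text for this data set

def discr_to_correlation(discr, a=A, b=B):
    """Map a generalized discrimination score (0..1) to the corresponding Pearson correlation."""
    z_discr = np.arctanh(2.0 * np.asarray(discr, dtype=float) - 1.0)  # rescale to [-1, 1], then Fisher
    return np.tanh(a + b * z_discr)                                     # back-transform to r

def fit_fisher_linear_model(correlations, discr_scores):
    """Least-squares fit of arctanh(r) = a + b * arctanh(2 * DISCR - 1); returns (a, b)."""
    x = np.arctanh(2.0 * np.asarray(discr_scores, dtype=float) - 1.0)
    y = np.arctanh(np.asarray(correlations, dtype=float))
    b, a = np.polyfit(x, y, 1)  # polyfit returns the highest-degree coefficient first
    return a, b

thresholds = discr_to_correlation([0.6, 0.7])  # approximately [0.323, 0.582]
```

Rounded to two decimals, these are the correlation thresholds of 0.32 and 0.58 stated above, from which the working thresholds of 0.3 and 0.6 were chosen.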
4. Results and discussion
a. Forecast quality of ECMWF SEAS5 and the influence of ENSO
Given the strong influence of ENSO on the climate of South America and its important role in determining predictability at the seasonal scale (see references in the introduction), high prediction performance might be expected in South America, with potential even for small-scale applications. However, detailed analyses show that the prediction performance at the seasonal scale is limited to specific regions and seasons, and is highly dependent on the variable of interest. In general, the performance of seasonal temperature predictions is higher than that of precipitation predictions (Fig. 3), while differences between minimum and maximum temperature predictions are less pronounced. This is not surprising, since precipitation is intermittent and its formation is strongly influenced by local processes, which generally makes it less predictable than temperature. In the following sections, the spatiotemporal differences in temperature and precipitation prediction performance are assessed in more detail and related to the Niño-3.4 index.
1) Northern Andes and Pacific coast
At the stations close to the Pacific Ocean and in the Andes north of 5°S (C1–C3), SEAS5 shows high prediction performance, with correlations of the temperature hindcasts lying on average above the usefulness threshold of 0.6 determined in section 3d (Figs. 3a,c) and high reliability scores between 4 and 5 (Figs. 4b,c,e,f). This is also the region with the strongest relationship between Niño-3.4 and air temperature (Fig. 5). Because they are strongly influenced by the sea surface temperature, air temperatures in this region rise (fall) during El Niño (La Niña). This correlation is strongest from austral winter to fall, with correlations between Niño-3.4 and the observations ranging between 0.4 and 0.8 for Tmax and between 0.3 and 0.6 for Tmin (Figs. 5a,b). In the tropical regions of the Andes (as well as in the Amazon, described below), the relationship of Niño-3.4 with precipitation is of the warm–dry/cold–wet type (i.e., a positive ENSO index is associated with dry and a negative index with wet conditions), as indicated by the negative correlations of the precipitation observations with this ENSO index (Fig. 5c), while at the Ecuadorian coast, the influence of ENSO is of opposite sign. These findings corroborate previous studies by Vuille et al. (2000), Poveda et al. (2011), Waylen and Poveda (2002), Córdoba-Machado et al. (2015a,b), Recalde-Coronel et al. (2014), and Sulca et al. (2018). In accordance with Vuille et al. (2000), for instance, the strongest negative correlations between Niño-3.4 and precipitation in the Colombian/Ecuadorian highlands occur during December–March (Fig. 5c) as well as in June–September (Fig. A1). In contrast, El Niño is related to heavy precipitation events due to strong convection in austral summer along the coast of northern Peru and Ecuador (Aceituno 1988; Takahashi 2004; Lagos et al. 2008; Bazo et al. 2013; among others).
This increase in heavy precipitation is due to enhanced water vapor availability and convection caused by anomalously high SSTs (e.g., Lavado-Casimiro and Espinoza 2014) and is confined to a narrow band along the Ecuadorian coast (Vuille et al. 2000). As for temperature, the prediction performance for precipitation is relatively high in these tropical regions, with correlations above 0.45 (C1 and C2) and even above 0.8 (C3) (Fig. 3e), and Weisheimer scores between 4 and 5 for C1 and C2 (Figs. 4h,i). Note, however, that C3 at the Ecuadorian coast exhibits a Weisheimer score of only 3.
This cluster illustrates that several performance measures need to be analyzed to fully describe forecast quality. Despite the strong association between rainfall forecasts and the corresponding observations at the Ecuadorian coast (C3), the accuracy and discrimination of the tercile category forecasts are clearly lower than for the nearby clusters (e.g., Fig. A1). A detailed analysis showed that the strong El Niño years with heavy precipitation events (1982/83 and 1997/98) are well captured by ECMWF SEAS5, resulting in high correlations and a small RMSE compared to the climatological forecast. The low RPSS and discrimination as well as the Weisheimer score of 3 (Figs. 4h,i), however, indicate that the year-to-year variability is not well captured in years when El Niño is weaker or absent. This reflects the general challenge of issuing seasonal predictions during neutral ENSO phases or, in this region, during La Niña.
2) Central Andes
The prediction performance for temperature farther south in the Andes [i.e., in the central Andes (C4 and C5)] is still high, especially during austral summer (Figs. 3a,c and A2). This is seen both in correlations above 0.6 for Tmax and between 0.3 and 0.6 for Tmin on average, and in Weisheimer scores between 4 and 5 for both Tmin and Tmax in DJF (Figs. 4b,c,e,f). In contrast, the negative rainfall anomalies associated with positive ENSO phases in the central Andes during austral summer reported previously (Vuille 1999; Silva et al. 2008; Lagos et al. 2008; Lavado-Casimiro et al. 2012; Lavado-Casimiro and Espinoza 2014) and confirmed by this study (correlation values around −0.3, Fig. 5) do not translate into a prediction performance exceeding climatological information for precipitation, as indicated by correlations below the lower usefulness threshold of 0.3 (Fig. 3c) and generally low reliability scores (Fig. 4h). An exception is a reliability score of 4 for dry episodes in the Peruvian and Bolivian Altiplano. The low correlations of SEAS5 with precipitation observations may have several causes, for instance the independence of the onset of the rainy season from ENSO (Silva et al. 2008) or the relation of dry spells in the Peruvian Andes with wet anomalies in northeastern Brazil through the Bolivian high (Sulca et al. 2016). Furthermore, other processes such as upper-tropospheric zonal wind anomalies influence precipitation in the central Andes (Imfeld et al. 2019). Investigating whether these phenomena are well represented in SEAS5 would require further examination beyond the scope of this study.
As a side effect, the verification analysis revealed a few remaining quality problems in the observations. For example, the temperature forecast skill at one station in the Altiplano region differed strongly from that at other stations in the region. A closer look at the temperature record revealed obvious issues with the temperature measurements (e.g., Hunziker et al. 2017), so the station had to be excluded from the analysis. Model evaluation is known to be useful for assessing the quality of observational data (see Massonnet et al. 2016 for a more comprehensive approach and in-depth discussion).
3) Northern and central Amazon
Toward the east of the northern Andes [i.e., in the western Amazon basin (C7)], temperature prediction performance is still relatively high with correlation values clearly above 0.3 despite a weaker relationship with Niño-3.4 than in the Andes (Figs. 3a–c, 5a,b, and A3). The high prediction quality of SEAS5 in this region indicates that modes other than ENSO influence the predictability of temperature at the seasonal scale, for instance, North Atlantic SST (Marengo et al. 2008; Coelho et al. 2012; Panisset et al. 2018), which are presumably well represented by SEAS5. In contrast, precipitation performance is very low throughout the year in that specific region. Although it has been shown that the SST of the Atlantic influences precipitation in the Amazon region (Yoon and Zeng 2010), specifically a north–south tropical Atlantic SST dipole-like structure (Vuille et al. 2000; Ronchail et al. 2002), this study indicates no prediction performance for precipitation in the region, with correlation values below 0.3 (Figs. 3e and A3).
In contrast, the prediction performance of SEAS5 in northern Brazil (C8) is quite high for many 3-month periods, with correlation values clearly above 0.3 for precipitation and up to 0.6 for temperature. The highest correlations occur during DJF for temperature (Figs. 3b,d and A3) and from MAM to MJJ for precipitation (Fig. 3e). The influence of ENSO on both temperature and precipitation in the region is relatively high (Fig. 5; e.g., Uvo et al. 1998; Coelho et al. 2002, 2006a). Aceituno (1988) showed that drier-than-normal conditions prevail during negative phases of the Southern Oscillation in northeastern Brazil in late austral summer, which might explain the relatively high prediction performance for precipitation compared with neighboring regions.
In eastern Brazil (C9), temperature shows only a weak positive correlation with Niño-3.4 around MAM (Fig. A3). Nevertheless, the correlations of the SEAS5 temperature hindcasts are positive and quite high throughout the year, around 0.6, with no indication of seasonality. Processes other than ENSO, such as the tropical Atlantic dipole (Moura and Shukla 1981), have previously been shown to influence the predictability of temperature in the region, possibly explaining the relatively high year-round prediction performance. For precipitation, the highest prediction performance is observed from AMJ to MJJ, similar to C8 (Fig. 3e), indicating in general that climate modes other than ENSO influence seasonal precipitation predictions in this region. Pezzi and Cavalcanti (2001) reported that precipitation anomalies in northeastern Brazil are influenced both by the SST conditions in the central Pacific and by the tropical Atlantic SST dipole, resulting in drier-than-normal conditions if El Niño is combined with a positive dipole, and wetter-than-normal conditions for negative dipoles.
Between 20° and 30°S (C10–C15), the temperature forecast performance is only marginal; only in JJA do the scores reach values slightly above zero. Similarly, precipitation performance is low, except in the region north of Uruguay (C12), which stands out as a local peak with reliability category 5 during DJF (Figs. 4h,i). This isolated region of higher prediction performance, with correlation values above 0.3, has already been detected in previous studies (e.g., Coelho et al. 2006a,b). The increased precipitation prediction performance in this region relative to its surroundings is related to ENSO teleconnections through a warm–wet/cold–dry relationship (Figs. 5c and A4; see also Diaz et al. 1998; Montecinos et al. 2000).
No significant forecast performance was found for regions farther south (C13–C15; Figs. 3, 4, and A5), although various studies have highlighted relationships of both temperature and precipitation with ENSO (e.g., Garbarini et al. 2016; Aceituno 1988; Montecinos et al. 2000; Montecinos and Aceituno 2003; González and Vera 2010; Rutllant and Fuenzalida 1991; Garreaud et al. 2009; Schneider and Gies 2004), some of which were also found here (Fig. 5). However, the correlations were weaker than the lower usefulness threshold of 0.3 and are therefore not discussed further.
b. Is the identified prediction performance solely due to ENSO?
The comparison between a statistical forecast based on Niño-3.4 alone and SEAS5 shows that SEAS5 temperature predictions outperform the simple statistical model in most regions (Figs. 6a,b), indicating that the prediction performance of SEAS5 is not solely due to a simple lagged response to ENSO. For precipitation, the differences between SEAS5 and the simple statistical ENSO model are less pronounced (Fig. 6c). In Peru, the highlands of Bolivia, and the region to the north of Uruguay in particular, the simple statistical model yields similar correlations on average for precipitation. In these regions, ENSO and its teleconnections are probably the only modes of variability that are well represented in SEAS5. An analysis of other modes of variability and their representation in SEAS5 could yield further insights and opportunities to improve the model in these regions.
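The simple empirical benchmark can be illustrated with a short Python sketch. The exact formulation behind Fig. 6 is not restated in this section, so the following is an assumption: a linear regression of the seasonal anomaly on a single Niño-3.4 value, fitted in leave-one-out cross-validation so that each year is predicted without using its own observation. The helper names (`loo_regression_forecast`, `pearson`) and all input series are hypothetical, and `seas5_mean` is only a synthetic placeholder for an ensemble-mean forecast.

```python
import numpy as np


def loo_regression_forecast(nino34, target):
    """Leave-one-out linear regression forecast of `target` from Niño-3.4.

    nino34: predictor, e.g. Niño-3.4 anomalies at forecast initialization
    target: observed seasonal anomalies for one cluster (same length)
    Returns one out-of-sample prediction per year.
    """
    x = np.asarray(nino34, dtype=float)
    y = np.asarray(target, dtype=float)
    n = x.size
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # withhold year i from the fit
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        preds[i] = slope * x[i] + intercept
    return preds


def pearson(a, b):
    """Pearson correlation between two 1-D series."""
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical synthetic data standing in for one cluster and season.
rng = np.random.default_rng(1)
nino34 = rng.normal(size=36)
obs = 0.5 * nino34 + rng.normal(scale=0.8, size=36)
seas5_mean = obs + rng.normal(scale=0.6, size=36)  # placeholder ensemble mean

r_empirical = pearson(loo_regression_forecast(nino34, obs), obs)
r_dynamical = pearson(seas5_mean, obs)
print(f"empirical r = {r_empirical:.2f}, dynamical r = {r_dynamical:.2f}")
```

The printed values depend entirely on the synthetic inputs; the point is the procedure, in which both forecasts are scored with the same metric against the same observations, as in Fig. 6.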
More sophisticated statistical tools such as CPT certainly make use of additional predictors. These analyses do not show, and were not intended to show, that SEAS5 outperforms every statistical prediction; a thorough verification of CPT forecasts would be required to assess the potential added benefit of one method over the other. However, the generally higher performance of SEAS5 compared with a simple empirical model using Niño-3.4 as the sole predictor of South American seasonal climate variables supports a potential benefit of using dynamical models, in addition to the advantages mentioned in the introduction of this study.
c. On the effect of spatial aggregation on forecast quality
It is widely accepted that spatiotemporal aggregation of seasonal forecasts increases their performance (Buizza and Leutbecher 2015). In this section, the verification results are compared with the global study by Weisheimer and Palmer (2014), in which South America is summarized in two areas (i.e., the continent is partitioned into two parts roughly divided by the 18°S latitude). Weisheimer and Palmer (2014) show that the reliability of ECMWF System 4 for both precipitation and temperature is high over South America [see Figs. 4 and 5 in Weisheimer and Palmer (2014)], except for the lower tercile categories (cold and dry, respectively) in DJF. In DJF, the southern part of South America falls into the middle, "marginally useful," category of the five reliability categories.
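The reliability categories of Weisheimer and Palmer (2014) are based on the slope of a weighted regression line fitted to the reliability diagram, together with its uncertainty. A minimal Python sketch of the slope calculation follows. The bin count (10 equally spaced probability bins) and the weighting by the number of forecasts per bin are assumptions for illustration, the confidence-interval test that maps slopes onto the five categories is not reproduced, and `reliability_slope` is a hypothetical helper name.

```python
import numpy as np


def reliability_slope(prob, event, n_bins=10):
    """Slope of a count-weighted least-squares line through a reliability diagram.

    prob:  forecast probabilities of one tercile event (values in [0, 1])
    event: 1 where the event was observed, 0 otherwise
    """
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    idx = np.digitize(prob, inner_edges)  # bin index 0 .. n_bins-1
    x, y, w = [], [], []
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            x.append(prob[sel].mean())   # mean forecast probability in the bin
            y.append(event[sel].mean())  # observed relative frequency in the bin
            w.append(sel.sum())          # number of forecasts in the bin
    # np.polyfit weights the unsquared residuals, so sqrt(count) yields
    # least squares weighted by the number of forecasts per bin.
    slope, _intercept = np.polyfit(x, y, 1, w=np.sqrt(w))
    return float(slope)


# Hypothetical example: 60 forecasts issued at three probability levels.
prob = np.repeat([0.05, 0.55, 0.95], 20)
reliable = np.concatenate([[1] * 1 + [0] * 19, [1] * 11 + [0] * 9, [1] * 19 + [0] * 1])
flat = np.tile([1] * 10 + [0] * 10, 3)  # 50% observed frequency in every bin

print(reliability_slope(prob, reliable))  # close to 1: observed frequency matches probability
print(reliability_slope(prob, flat))      # close to 0: observed frequency ignores probability
```

A slope near 1 corresponds to the upper categories and a slope near or below 0 to the lower ones; the category boundaries themselves should be taken from Weisheimer and Palmer (2014).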
Although the datasets used in Weisheimer and Palmer (2014) differ in two respects from those of the present study (i.e., an updated version of the ECMWF seasonal prediction model is used here, and station data rather than reanalysis data serve as ground truth), we think that comparing the studies yields new insights. For instance, this study reveals more detailed spatial differences and more complex patterns. The global study by Weisheimer and Palmer (2014) aimed to introduce a simple categorization of prediction performance and to provide a broad global picture of reliability; it was not designed to resolve individual regional features as the present study does.
The reliability scores determined by Weisheimer and Palmer (2014) are mostly higher than those found in this study, especially in the southern part of the continent. South of 25°S, the reliability of average temperature in DJF ranges between 2 and 3 in this study (not shown), whereas it is classified between 3 and 4 in Weisheimer and Palmer (2014). These higher reliabilities can be attributed to the improvement of prediction performance through spatial aggregation (Buizza and Leutbecher 2015). However, improving performance by aggregation (spatial or temporal) limits the potential use of seasonal forecasts to stakeholders operating at larger scales and excludes, for instance, smallholder farmers.
The opposite also occurs, however: in some cases we observed higher reliability at lower spatial aggregation. The most prominent example is the precipitation forecast for the region to the north of Uruguay (C12) in DJF [see section 4a(4)]. In this rather small cluster, "perfect" reliability is obtained for both the wet and dry terciles in DJF, while the reliability score in all but one of the surrounding regions ranges between 1 and 3 (Figs. 4h,i). This local spot of high prediction performance is naturally not spatially resolved in Weisheimer and Palmer (2014), although it may still have influenced their aggregated reliability estimate. The example indicates that spatial aggregation can result in a loss of information and performance, thereby concealing potential opportunities for stakeholders acting at smaller scales.
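The two opposite effects of spatial aggregation can be illustrated with a synthetic Python experiment. It rests on strong simplifying assumptions: an idealized forecast equal to a common predictable signal, Gaussian noise, correlation as a stand-in for reliability, and arbitrary choices of 20 clusters and 36 years. No SEAS5 or station data are used, and `aggregation_demo` is a hypothetical helper.

```python
import numpy as np


def aggregation_demo(seed=42, n_years=36, n_clusters=20):
    """Correlation of an idealized forecast with single clusters vs. their spatial mean."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(size=n_years)  # common predictable signal (synthetic)
    forecast = signal                   # idealized forecast capturing the signal exactly

    # Case 1: every cluster = shared signal + strong independent local noise.
    noisy = signal[:, None] + rng.normal(scale=2.0, size=(n_years, n_clusters))
    single = [np.corrcoef(forecast, noisy[:, j])[0, 1] for j in range(n_clusters)]
    aggregated = np.corrcoef(forecast, noisy.mean(axis=1))[0, 1]

    # Case 2: only cluster 0 carries the signal; the other clusters are pure noise.
    local = rng.normal(size=(n_years, n_clusters))
    local[:, 0] = signal + rng.normal(scale=0.3, size=n_years)
    local_only = np.corrcoef(forecast, local[:, 0])[0, 1]
    local_aggregated = np.corrcoef(forecast, local.mean(axis=1))[0, 1]

    return {
        "mean_single": float(np.mean(single)),
        "aggregated": float(aggregated),
        "local": float(local_only),
        "local_aggregated": float(local_aggregated),
    }


for key, value in aggregation_demo().items():
    print(f"{key:>17s}: {value:.2f}")
```

In the first case, averaging cancels independent local noise and the aggregated correlation exceeds the typical single-cluster value; in the second, averaging dilutes a signal confined to one cluster, analogous to the region to the north of Uruguay being absorbed into a larger area.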
Another discrepancy in the spatial structure is observed for temperature reliability in JJA. In Weisheimer and Palmer (2014), the southern part of the continent has a higher reliability (5) than the northern part (4). This study, by contrast, finds no difference between the southern and northern parts of the continent during JJA (Fig. A7): the reliability scores range around the "still useful" category (category 4), with very few individual clusters scoring lower or higher. Unlike Weisheimer and Palmer (2014), the pattern observed here therefore does not suggest a higher seasonal prediction quality for average temperature in JJA in the extratropical regions of South America compared with the tropical ones.
This study is the first to investigate temperature and precipitation prediction performance of a state-of-the-art dynamical seasonal forecast model against homogenized station observations over South America for all seasons. Using station observations avoids distortions of the verification results arising from known biases or errors in reanalyses or other datasets, such as satellite-based products.
In accordance with previous studies (e.g., Coelho et al. 2006a,b; Manzanas et al. 2014; Weisheimer and Palmer 2014), the highest prediction performance for precipitation and temperature was found in the regions most strongly affected by ENSO variability (i.e., in the tropics during DJF). In the southern extratropics, which are generally characterized by low seasonal prediction performance, an isolated region of high precipitation prediction performance was found to the north of Uruguay in DJF, possibly due to ENSO teleconnections. There, the prediction performance of SEAS5 is at a level that is potentially useful even for applications by smallholder farmers.
It is widely recognized that prediction performance can be increased through spatial and temporal aggregation. This study showed, however, that regions of limited extent with high potential prediction performance can be identified, such as the isolated subtropical region to the north of Uruguay. These findings are potentially relevant for operational prediction at smaller scales. Furthermore, the results presented here contradict previous studies in some cases. Discrepancies were observed in both the spatial and temporal patterns of prediction performance, indicating that different prediction models and/or verification datasets can lead to different findings.
The example of precipitation verification metrics at the Ecuadorian coast showed that more than one skill metric must be analyzed to assess model performance. The very high correlations between SEAS5 and observed precipitation stemmed mainly from individual strong El Niño events. Forecast quality measures based on tercile category forecasts, however, exhibited much lower values, showing that the model discriminated precipitation outcomes poorly apart from these strong El Niño events. While predicting heavy precipitation during El Niño is certainly relevant for end users, this case shows that correlation alone is not sufficient to evaluate model performance.
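How a few extreme years can dominate the correlation while tercile-based measures stay low can be shown with a deliberately constructed, hypothetical Python example. It uses a deterministic tercile hit rate as a simple stand-in for the probabilistic tercile scores used in the study; the data are invented, and `tercile_category` and `tercile_hit_rate` are hypothetical helpers.

```python
import numpy as np


def tercile_category(x):
    """Assign 0 (lower), 1 (middle), or 2 (upper) tercile by rank within the sample."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))  # 0 = smallest value
    return ranks * 3 // x.size


def tercile_hit_rate(forecast, observed):
    """Fraction of years in which forecast and observation fall in the same tercile."""
    return float(np.mean(tercile_category(forecast) == tercile_category(observed)))


# Ten ordinary years in which the forecast ranks are reversed,
# plus two "strong El Niño" years with very large values in both series.
obs = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 40, 50], dtype=float)
fc = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 40, 50], dtype=float)

r = float(np.corrcoef(fc, obs)[0, 1])
hits = tercile_hit_rate(fc, obs)
print(f"correlation = {r:.2f}, tercile hit rate = {hits:.2f}")
# prints: correlation = 0.94, tercile hit rate = 0.33
```

The correlation is high only because of the two extreme years, while the tercile hit rate equals the climatological chance of one in three, which mirrors the behavior described for the Ecuadorian coast.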
We conclude that the seasonal forecasts from ECMWF SEAS5 perform adequately and are reliable enough to be usefully applied in many regions. Furthermore, we found evidence that the prediction performance of SEAS5 stems not solely from ENSO but also from other sources of predictability, which contributed to higher performance in all regions where high predictability was identified. Given this benefit, we strongly encourage national weather services in South America to complement or replace their empirical seasonal forecasts with dynamical model predictions, or to combine the two approaches.
We appreciate the constructive comments of three anonymous reviewers, which substantially helped to improve the manuscript. We further thank the South American weather services for providing their quality-controlled and homogenized datasets for this study. The study was performed within the project Climandes, a pilot project of the Global Framework for Climate Services (GFCS) (Hewitt et al. 2012; WMO 2011) funded by the Swiss Agency for Development and Cooperation, which aimed to provide high-quality climate services in the form of seasonal predictions to decision-makers in the Peruvian Andean region. CASC thanks the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Process 304586/2016-1, and the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Process 2015/50687-8 (CLIMAX Project), for the support received. MJC thanks projects FONDAP 15110009, FONDECYT 11170486, and PAI 79160105 (all CONICYT-Chile).
Data availability statement: The data used in this publication are not publicly available. The observational data used for verification are subject to the data policies of the South American weather services and therefore cannot be shared with this publication. Readers interested in the data should contact the respective weather services directly. Use of the ECMWF SEAS5 hindcasts is subject to the ECMWF data policy. SEAS5 hindcasts are available from the C3S Climate Data Store, but only from 1993 onward.
Verification Metrics for Individual Clusters and Seasons
The figures in the appendix show the verification metrics for all individual clusters (Figs. A1–A5), as well as the correlation and Weisheimer categories for MAM (Fig. A6), JJA (Fig. A7), and SON (Fig. A8) complementing Fig. 4.