
Forecasting with Reference to a Specific Climatology

Emily Wallace, Met Office Hadley Centre, Exeter, United Kingdom

and
Alberto Arribas, Met Office Hadley Centre, Exeter, United Kingdom


Abstract

Seasonal forecasts are most commonly issued as anomalies with respect to some multiyear reference period. However, different seasonal forecasting centers use different reference periods. This paper shows that for near-surface temperature, precipitation, and mean sea level pressure, over most regions of the world there is evidence that these differences between reference periods should not be ignored, especially when forecasters combine outputs from several prediction systems. Three methods are presented by which reference periods could be adjusted, and it is shown that the differences between the proposed methods are smaller than the errors that result from not correcting for different reference periods.

Corresponding author address: Emily Wallace, Met Office Hadley Centre, Exeter, EX1 3PB, United Kingdom. E-mail: emily.wallace@metoffice.gov.uk

1. Introduction

Seasonal forecasts are often issued as a deviation from a specified reference period (or climatology). This format enables users to see if the forecast represents an increase or decrease in risk from what is normally expected. Three common ways to express forecasts of this nature are as follows: 1) as an anomaly from a particular reference value, 2) as a percentile of a climatological distribution, and 3) as the probability of an event occurring. Forecasts can also be expressed in their original units (i.e., real-world values without reference to a climatology), but this is less useful for risk management.
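As a concrete illustration (not from the paper), here is a minimal Python sketch of the three formats, assuming the climatological sample and an event threshold are given; all function and variable names are hypothetical:

```python
import numpy as np
from scipy.stats import percentileofscore

def forecast_formats(ens, clim, threshold):
    """Express a raw ensemble forecast `ens` (real-world units) in the three
    reference-based formats: anomaly, percentile, and event probability.
    `clim` is the climatological sample for the same season and location."""
    anomaly = ens.mean() - clim.mean()                   # 1) anomaly from the reference mean
    pctiles = [percentileofscore(clim, m) for m in ens]  # 2) percentile of the climatological distribution
    prob_event = float(np.mean(ens > threshold))         # 3) probability of exceeding the event threshold
    return anomaly, pctiles, prob_event
```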

Forecasting against a model reference/climatology serves another purpose: it provides a simple linear way to remove biases in dynamical model output (Stockdale 1997). The model climatology is defined by running retrospective forecasts (hindcasts) for a set of start dates in the past (the hindcast period). Different centers use different hindcast periods (Table 1), and so reference periods vary between centers. This causes a problem because it is common to collate information from different sources when producing seasonal forecasts, owing to the resulting improvements in reliability and consistency (Hagedorn et al. 2005). Clearly, attempting to compare forecasts expressed against different reference periods has the potential to introduce biases due to long-term trends or low-frequency variability. Nevertheless, the practice is common, and even the multimodel forecasts shown by the World Meteorological Organization (WMO) lead center for long-range forecasting (http://www.wmolc.org) are made from forecasts with varying reference periods. In this paper we hope to persuade users of dynamical monthly-to-seasonal forecast products of the need to objectively account for the different reference periods in the information that they collate.

Table 1. Hindcast periods of the 12 WMO long-range forecasting Global Producing Centres (GPCs).

The format of the paper is as follows. In section 1a, we describe the data used. In section 1b, we present three possible methods for converting raw model output to a forecast relative to a specific reference period (later called reference period correction). In section 2a, we demonstrate the difference that a reference period can make to a forecast using, as an example, an ensemble mean forecast of near-surface temperature (2mT) anomaly expressed against three different reference periods. In section 2b, we look at the effect of reference period correction on individual ensemble members over a wider range of possible forecast values. In section 2c we explore the risk involved in picking the wrong reference period correction method compared to the risk involved in not correcting for reference period. The analysis is extended to include precipitation and mean sea level pressure (MSLP) in section 2d. Conclusions are presented in section 3.

a. Data

The ensemble mean forecasts in this study (section 2a) are from the Met Office operational global seasonal forecasting system (GloSea4; Arribas et al. 2011). Analysis is presented using operational forecasts of March–May 2010 (MAM); results were similar for other periods and for the European Centre for Medium-Range Weather Forecasts (ECMWF) system. For each operational forecast (42 ensemble members), 168 hindcast members were used to make the bias correction: 12 members for each year of the 1996–2009 hindcast period.

To fully analyze the differences between methods we create a synthetic forecast (or ensemble of realizations) designed to span almost the entire model distribution. It is defined as a 99-member ensemble equal to the 1st, 2nd, … , 99th percentiles of the hindcast distribution, calculated separately for the area average of each Giorgi region. While it is important to consider forecasts of ensemble mean anomaly, they could hide the differences between methods. Indeed, the differences between methods may be of opposite signs at different ends of the distribution, and thus would be small when averaged over an ensemble. Conversely, differences between reference periods are likely to be of the same sign throughout the distribution. The synthetic ensemble is used to explore these differences throughout the distribution.
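A minimal sketch of how such a synthetic ensemble could be built, assuming the area-averaged hindcast values for one Giorgi region sit in a 1-D NumPy array (the helper name is hypothetical):

```python
import numpy as np

def synthetic_ensemble(hindcast):
    """99-member synthetic ensemble: the 1st, 2nd, ..., 99th percentiles
    of the hindcast distribution for one region."""
    return np.percentile(hindcast, np.arange(1, 100))
```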

The datasets used as observations are as follows. Near-surface temperature (2mT) data were obtained from the National Centers for Environmental Prediction (NCEP) reanalysis [provided by the National Oceanic and Atmospheric Administration/Earth System Research Laboratory/Physical Sciences Division (NOAA/ESRL/PSD), Boulder, Colorado, from their website http://www.esrl.noaa.gov/psd/]. The precipitation dataset was version 2.2 of the Global Precipitation Climatology Project (GPCP; Adler et al. 2003; Huffman et al. 2009). The MSLP data were from the near-real-time update of the Hadley Centre sea level pressure dataset, version 2 (HadSLP2r; Allan and Ansell 2006).

b. The correction methods

We present three methods that can be used to display an ensemble forecast with respect to a specific reference period (Table 2). Each method assumes the model has particular systematic errors and tries to estimate and remove them as follows:

Table 2. Algebraic description of methods 1–3.

Method 1—An assumed bias in the mean is removed by subtracting the model’s climatological mean, and adding the observed climatological mean from the hindcast period.

Method 2—Assumed biases in the mean and interannual variability are removed by subtracting the model’s climatological mean, inflating/deflating the variance of the resulting anomalies to bring it into line with the observed variance, and adding the observed climatological mean from the hindcast period.

Method 3—Errors in the model’s distribution (which include errors in the mean and interannual variability, but also, e.g., in skewness) are removed by expressing the forecast ensemble members as percentiles of the model’s climatological distribution, and then matching these percentiles against the percentiles of the observations over the hindcast period.
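The body of Table 2 is not reproduced in this copy; a plausible algebraic summary consistent with the verbal descriptions above (our notation, not necessarily the paper’s) is the following, where $f$ is a raw forecast member, $(\bar{m}, \sigma_m)$ and $(\bar{o}, \sigma_o)$ are the model and observed climatological means and standard deviations over the hindcast period, and $F_m$, $F_o$ are the model and observed climatological distribution functions:

```latex
\begin{align}
  f'_1 &= \bar{o} + \left(f - \bar{m}\right)
         && \text{(method 1: mean bias removal)} \\
  f'_2 &= \bar{o} + \frac{\sigma_o}{\sigma_m}\left(f - \bar{m}\right)
         && \text{(method 2: mean and variance)} \\
  f'_3 &= F_o^{-1}\!\left(F_m(f)\right)
         && \text{(method 3: quantile matching)}
\end{align}
```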

When choosing a method there is a balance to strike: it is desirable to correct as many of the errors as possible, but to rely on the fewest estimates of those errors. For example, if it were believed that the model has no errors in its interannual variability, then one would not wish to risk introducing noise by attempting to estimate the error in variance. Method 3 has the potential to introduce more noise than either method 1 or 2, so it should be avoided unless it is thought to be necessary.
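For concreteness, a minimal Python sketch of the three corrections under the assumptions above (hypothetical function names; a sketch, not the authors’ code):

```python
import numpy as np

def reference_period_correct(ens, hindcast, obs, method=1):
    """Re-express ensemble members `ens` against the reference period sampled
    by `obs`, using the model `hindcast` to estimate systematic errors."""
    if method == 1:   # remove an assumed bias in the mean only
        return ens - hindcast.mean() + obs.mean()
    if method == 2:   # additionally rescale the interannual variability
        return obs.mean() + (ens - hindcast.mean()) * (obs.std() / hindcast.std())
    if method == 3:   # match the full climatological distributions
        # percentile of each member within the model climatology ...
        pct = np.array([np.mean(hindcast <= m) for m in ens]) * 100.0
        # ... mapped onto the same percentile of the observed climatology
        return np.percentile(obs, pct)
    raise ValueError("method must be 1, 2, or 3")
```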

2. Results

a. Assessing the correction methods on the ensemble mean anomalies

Figure 1 illustrates (for the 2mT anomaly) the differences between ensemble mean forecasts that are identical except in their reference period. In general, for the same forecast, anomalies are greater against the older (cooler) reference periods than against the newer (warmer) ones. This is most evident at high latitudes, in particular in northeast Canada (a difference of more than 4°C; Fig. 1g). There are also examples of the sign of the anomaly varying depending on the reference period (e.g., over the United Kingdom and Spain). This example illustrates that if forecasts expressed against different reference periods are combined, the true forecast signals may be obscured.

Fig. 1.

Ensemble-mean forecast for the average temperature anomaly (K) over MAM 2011, initialized around 1 Feb. (a) Forecast against 1996–2009 climatology, (b) 1981–2010 climatology, and (e) 1971–2000 climatology, all using method 1. (c),(f) As in (b),(e), respectively, but using method 3. (d),(g) The mean difference between the 1996–2009 climatology and the climatologies 1981–2010 and 1971–2000, respectively. These differences are equal to the differences (b) − (a) and (e) − (a), respectively.

Of the three methods presented earlier, the choice of method used to convert the forecast is relatively unimportant (cf. method 1 in the left-hand column with method 3 in the middle column). Even in the regions where the largest differences are observed (e.g., the United States), the main message of the forecast is not significantly changed by using an alternative method.

b. Assessing the correction methods on individual ensemble members

Using the synthetic ensemble we now look at the effect of reference period correction on individual ensemble members for 2mT, precipitation, and MSLP. In this analysis we call ensemble members expressed against the 1996–2009 period the “original” members (x axis, Fig. 2), those expressed against the 1981–2010 or 1971–2000 periods “corrected,” and the difference between these (corrected − original) the “correction difference” (y axis, Fig. 2). In general, correction differences are positive, as the 1996–2009 hindcast period is usually warmer than the 1981–2010 and 1971–2000 periods (Fig. 2).
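As one plausible reading of this bookkeeping (assuming the corrected member’s percentile is taken within the climatology of the alternative reference period; all names are hypothetical), the correction difference for method 1 could be computed as:

```python
import numpy as np

def correction_differences(hindcast, obs_ref):
    """Correction differences (percentile points) for the 99-member synthetic
    ensemble, using method 1 as an example. Members start as the 1st..99th
    percentiles of the 1996-2009 hindcast (original percentiles 1..99 by
    construction); each is corrected toward `obs_ref`, the observations over
    the alternative reference period, and re-expressed as a percentile."""
    original_pct = np.arange(1, 100)
    members = np.percentile(hindcast, original_pct)
    corrected = members - hindcast.mean() + obs_ref.mean()              # method 1
    corrected_pct = np.array([np.mean(obs_ref <= c) for c in corrected]) * 100
    return corrected_pct - original_pct                                  # corrected - original
```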

Fig. 2.

Correction difference (y axis; defined in text) plotted against the original percentile of the 2mT synthetic ensemble created using data from MAM. Correction differences are for the reference periods 1971–2000 (red lines) and 1981–2010 (blue lines). Corrections are made using each of the three methods: method 1 (solid), method 2 (dots), and method 3 (dashes). Gray areas show where the correction difference is large enough to move the original percentile into a different tercile category. Note that the axes vary from plot to plot.

Probabilistic seasonal forecasts are often made by counting the proportion of ensemble members that fall into a particular tercile category. The tercile into which an ensemble member falls depends on the period over which the terciles are calculated. Consider the 30th member of the synthetic ensemble: in the original ensemble this member is equal to the 30th percentile (by definition), and so is in the lower tercile when the reference period is 1996–2009. If, say, against the 1971–2000 period it were corrected to the 40th percentile (a correction difference of 10), then the member would count toward the middle tercile. A switch like this alters the “message” that the ensemble member gives, and therefore is important to highlight in this analysis. We have done this in Fig. 2 by shading in gray the areas where a switch of tercile would occur; a small code sketch of this bookkeeping follows. Notice that close to a tercile boundary even a small correction difference (of the relevant sign) can result in a change of tercile, and so the gray area approaches the x axis near tercile boundaries.
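Since the synthetic ensemble lives in percentile space, the tercile check reduces to comparisons against the 33.3rd and 66.7th percentiles; a small sketch (hypothetical helper) reproducing the worked example above:

```python
def tercile_category(pct):
    """Tercile of a percentile value: 0 = lower, 1 = middle, 2 = upper."""
    return 0 if pct < 100 / 3 else (2 if pct > 200 / 3 else 1)

# The worked example: a member at the original 30th percentile (lower tercile)
# corrected to the 40th percentile (middle tercile) switches category.
assert tercile_category(30.0) != tercile_category(40.0)
```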

In many regions much of the data fall within the gray shading, in particular in eastern Africa. Here large correction differences can be seen (up to 205% of the width of the middle tercile), which would represent fundamental changes in the forecast issued. If we compare the magnitude of the correction differences with the differences between methods, it is clear that the differences between methods are smaller than the reference period correction differences. This indicates that, even if a suboptimal choice of method were made, it would still be better to correct for reference period than not to do so.

In regions where the method choice has a large influence on the resulting corrected ensemble (e.g., northern Europe), the differences between methods should be seen as an incentive to investigate the most appropriate correction method rather than as a reason to make no correction. However, it may not be straightforward to ascertain the most appropriate method, so we now investigate the consequences of making an uninformed choice.

c. The consequence of an uninformed decision

In this section we attempt to answer the question: “If it is not possible to ascertain the optimal method, is it safer not to correct for reference period at all?” To do this we estimate the probability that a randomly selected method is better (closer to some unknown perfect correction method) than leaving the original ensemble without correction, calling this the probability of improvement. As a first approximation, if the probability of improvement is greater than 0.5 then the reference period should be corrected even if there is no conclusive evidence that could be used to select the best method.

The probability of improvement is estimated using the synthetic ensemble to find the proportion of ensemble members for which the perfect method is closer to one of methods 1–3 than it is to the original ensemble. Closeness is measured as the absolute difference between methods for each ensemble member, when expressed in the percentile format. As the perfect method is unknown we assume that one of methods 1–3 is perfect. While this is unlikely to be true, given that the methods each use different assumptions as to the nature of the systematic errors in the model, it is reasonable to use one of the existing methods as a proxy for the perfect method. To protect ourselves from overestimating the probability of improvement we do not compare the closeness of the assigned perfect method to itself.
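A sketch of one plausible implementation of this estimate (hypothetical names): each method in turn plays the role of the unknown perfect correction, every other method plays the role of the randomly selected one, and we count the ensemble members for which the selected method lands closer to the “perfect” result than the uncorrected original does:

```python
import numpy as np

def prob_improvement(original_pct, corrected_pct_by_method):
    """`original_pct`: percentiles of the uncorrected synthetic members.
    `corrected_pct_by_method`: dict {1: ..., 2: ..., 3: ...} holding the same
    members after correction with each method, in percentile format."""
    wins, trials = 0, 0
    for perfect, truth in corrected_pct_by_method.items():
        for selected, guess in corrected_pct_by_method.items():
            if selected == perfect:
                continue  # never compare the assumed-perfect method to itself
            closer = np.abs(guess - truth) < np.abs(original_pct - truth)
            wins += int(closer.sum())
            trials += closer.size
    return wins / trials
```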

The probability of improvement over all regions ranges from 0.3 (Amazon basin) to 1 (central Asia and eastern Africa), with an average value of 0.8 for the 1971–2000 period, and from 0.43 (central North America) to 0.93 (eastern Africa), with an average value of 0.7 for the 1981–2010 period. For the majority (>80%) of regions the probabilities are greater than 0.5, which indicates that in general there is less risk involved in randomly selecting a correction method than in not correcting for reference period at all. The implication is that for most regions the best strategy is to implement a correction method first, and only then to attempt to pick the optimal one.

d. Precipitation and mean sea level pressure

Now consider precipitation and MSLP (Figs. 3 and 4). Low-frequency oscillations in precipitation and MSLP mean that the mean values of these variables vary between reference periods, making reference period correction necessary. Indeed, the correction differences are similar to those found for temperature. The average absolute correction difference is around 30% of the width of the middle tercile. Larger correction differences can be seen for precipitation in Southeast Asia and for MSLP in southern South America; here the differences are large enough to affect the main message of the forecast. Probabilities of improvement are lower than for temperature (all are between 0.6 and 0.7), but even so the indication is that it is more important to correct for reference period than it is to optimize the method for doing so.

Fig. 3.

As in Fig. 2, but for precipitation.

Fig. 4.

As in Fig. 2, but for MSLP.

3. Conclusions

For ease of production and use, seasonal forecasts using dynamical model output are most commonly issued as anomalies from a multiyear average or as the probability of an event, where the event is defined using a particular reference period. We have shown that there is strong evidence that differences between reference periods should not be ignored. We have offered three methods by which the reference period could be corrected, and have shown that the differences between them are small. This indicates that the particular method chosen to adjust the baseline period is less important than the fact that the baseline should always be corrected.

A problem remains in that many users of seasonal forecast products (e.g., those displayed on http://www.wmolc.org) do not have access to the hindcast data needed to make reference period corrections. In these cases it is recommended that users still try to take account of differences between reference periods objectively, although for plots in probabilistic format this would be extremely difficult. This should be considered when producing centers decide what information to make publicly available.

Acknowledgments

This work was supported by the Joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101).

REFERENCES

• Adler, R. F., and Coauthors, 2003: The version-2 Global Precipitation Climatology Project (GPCP) monthly precipitation analysis (1979–present). J. Hydrometeor., 4, 1147–1167.

• Allan, R., and T. Ansell, 2006: A new globally complete monthly historical gridded mean sea level pressure dataset (HadSLP2): 1850–2004. J. Climate, 19, 5816–5842.

• Arribas, A., and Coauthors, 2011: The GloSea4 Ensemble Prediction System for Seasonal Forecasting. Mon. Wea. Rev., 139, 1891–1910.

• Hagedorn, R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting I. Basic concept. Tellus, 57A, 219–233.

  • Huffman, G. J., R. F. Adler, D. T. Bolvin, and G. J. Gu, 2009: Improving the global precipitation record: GPCP Version 2.1. Geophys. Res. Lett., 36, L17808, doi:10.1029/2009GL040000.

• Stockdale, T. N., 1997: Coupled ocean–atmosphere forecasts in the presence of climate drift. Mon. Wea. Rev., 125, 809–818.
