## Abstract

A statistical postprocessing approach is applied to seasonal forecasts of surface air temperatures (SAT) over North America in fall, when the original uncalibrated predictions have little skill. The data used are ensemble-mean seasonal forecasts from four atmospheric general circulation models (GCMs) in the Canadian Historical Forecasting Project (HFP2) during the period 1969–2001. The statistical postprocessing uses the relationship between the predicted 500-hPa geopotential height (Z500) and the observed SAT to calibrate the SAT forecasts. The dimensions of the predicted Z500 fields are reduced to three modes with fixed spatial structures but time-dependent amplitudes. The latter are obtained through a singular value decomposition (SVD) analysis linking the variability of the ensemble-mean predicted Z500 to the tropical Pacific sea surface temperatures (SSTs). Results show that the postprocessing significantly improves the predictive skill of North American SAT in fall. The distributions of the SAT temporal standard deviation and the skill of the postprocessed ensemble forecasts are consistent among the GCMs, indicating that the approach is effective in reducing the model-dependent part of the errors associated with GCMs.

## 1. Introduction

It is well known that some teleconnection patterns, such as the Pacific–North American (PNA) pattern and the North Atlantic Oscillation (NAO), can significantly influence the seasonal atmospheric conditions over North America (Wallace and Guztler 1981; Barnston and Livezey 1987). The external forcing associated with sea surface temperature (SST) anomalies is known to play an important role in the variability of the PNA, and to a lesser extent of the NAO (Horel and Wallace 1981; Trenberth et al. 1998; Hoerling et al. 2001; Straus and Shukla 2002; Jia et al. 2009). The relationship between the lower boundary forcings and the atmospheric response in regions as far away as the midlatitudes is at the heart of skilful seasonal forecasts (e.g., Derome et al. 2001). The skill of seasonal forecasts depends on the extent to which the boundary forcings can generate strong enough signals recognizable from the chaotic internal variability of the atmosphere (noise) and on the accuracy of the response achieved by the forecast models. GCMs can be used to estimate the potential predictability of the seasonal-mean atmospheric condition (signal-to-noise ratio) via the use of ensembles of forecasts using identical boundary conditions but slightly different initial conditions (Zwiers 1996). One difficulty with this approach is that the signal and noise estimations are model dependent.

Despite numerous improvements, all numerical models have some systematic errors that can arise either from the model internal physics or from the response to the boundary forcings. Identifying the sources of systematic errors is of critical importance in improving numerical model forecasts and different approaches have been developed to correct these errors (Feddersen et al. 1999; D’Andrea and Vautard 2000; Kang et al. 2004). A statistical postprocessing approach was proposed by Lin et al. (2005) to reduce the systematic errors of ensemble-mean forecasts from GCMs. This technique was applied to the winter Z500 forecasts from two dynamical models for the period from 1969 to 1999. Results showed that significant improvements of the seasonal forecast skill were achieved for the NAO in the case of both models and for the PNA for the less skillful model. In Lin et al. (2008) this technique was applied to the seasonal forecasts of the winter precipitation over Canada. Four dynamical models were used in that study and again significant improvements of predictive skill over the original forecasts were obtained in the southern prairies and the Québec–Ontario region. However, the above studies mainly focused on the wintertime when the relationship between the SST forcing and the seasonal-mean atmospheric conditions appears to be strongest. In this paper, we extend their studies by applying this statistical postprocessing technique to surface air temperature (SAT) forecasts in fall over North America. Results for the other seasons will also be mentioned briefly.

The models that are used for the seasonal forecasts and the data are presented in section 2. In section 3, the postprocessing approach is briefly described. The forecast skills of the SAT in fall before and after the postprocessing are presented in section 4. Section 5 investigates the possible mechanisms accounting for the improvement of the skill followed by the summary and discussion in section 6.

## 2. Data and models

The Canadian Climate Variability Research Network has conducted a project termed the Historical Forecasting Project (HFP). The goal was to test how well seasonal-mean atmospheric conditions could be predicted using dynamical general circulation models in an operational environment (Derome et al. 2001; Kharin et al. 2009). The data used in the present study are from the second phase of the HFP (HFP2) in which four models were used. The models are the second and third generations of the general circulation models (GCM2 and GCM3), used for climate simulations, that were developed at the Canadian Centre for Climate Modeling and Analysis (CCCma; Boer et al. 1984; McFarlane et al. 1992). The other two models are reduced-resolution versions of the global spectral model (SEF; Ritchie 1991) and the Global Environmental Multiscale model (GEM; Côté et al. 1998a,b), used for medium-range weather forecasting, developed at Recherche en Prévision Numérique (RPN) in Montréal, Canada.

For each model, 10 four-month forecasts were carried out starting from the first day of each month. The initial conditions were from 12-h intervals preceding the first day of the expected forecast season. For the initial conditions, the reanalyses from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR; Kalnay et al. 1996) were used. The SST in the forecasts was the sum of the SST anomaly of the month prior to the forecast period, persisted through the forecast period, and the monthly-varying climatological SST. The SST and ice data were taken from the Seasonal Prediction Model Intercomparison Project-2 (SMIP-2) boundary data.

In this study, we only make use of the ensemble mean of the seasonal forecasts spanning the 33 yr from 1969 to 2001. We are therefore focusing on the externally forced variability in the forecasts. The dominant external source of a skillful forced response by the GCMs is derived from the prescribed SST anomalies. The data used for the verification of the forecasts is the Climatic Research Unit (CRU) TS 2.1 dataset, a set of monthly averaged observed SAT over the land surface from the CRU at the University of East Anglia, United Kingdom (more information is available online at http://www.cru.uea.ac.uk/cru/data/hrg.htm). As we are interested in the forecast skill coming from anomalies in the boundary conditions, seasonal forecasts with a 1-month lead time are analyzed to minimize the influence of the initial conditions. In other words, the forecast data used for December–February (DJF), March–May (MAM), June–August (JJA), and September–November (SON) are from the forecasts made for November–February (NDJF), February–May (FMAM), May–August (MJJA), and August–November (ASON).

## 3. Postprocessing approach

It is known that atmospheric large-scale patterns such as the PNA and NAO have a low-frequency time scale, thus are more potentially predictable for longer periods than other smaller scale structures. The presence of these large-scale atmospheric patterns can modulate the climate distributions of the upper-atmospheric flows and affect the temperature and precipitation in various regions over North America (Dole and Gordon 1983; Hurrell 1995; Hurrell et al. 2003; Archambault et al. 2008).

Lin et al. (2005) found that the GCMs involved in the HFP forecasts can predict the time evolution of the PNA and NAO amplitudes in wintertime with some skill while the forecast spatial patterns are biased and model dependent. In essence the statistical procedure they developed is aimed at correcting the bias in the ensemble-mean Z500 field by using the statistical relationship between the observed and predicted Z500 fields. Here we apply similar ideas, but with the aim of correcting the SAT forecast bias. Our postprocessing method is based on the statistical relationship between the observed SAT field over North America and the (ensemble mean) predicted Z500 field. However, rather than use the entire predicted Z500 field at all grid points (or spherical harmonics), which are not independent, we reduce its dimension to three modes with fixed structures but time-dependent amplitudes. Experience has shown that using more than three modes does not improve the results to be discussed. To determine the modes to represent the predicted Z500 field we could have used a standard principal component analysis (PCA) to maximize the variance explained by the modes. Instead, we decided to follow Lin et al. (2005, 2008) and use a singular value decomposition (SVD) analysis involving the predicted Z500 and the tropical SST fields, on the basis that the tropical SST is the most likely source of predictive skill and hence the SVD analysis is most likely to yield the Z500 modes with the most relevance to seasonal forecasts. Lin et al. (2005, 2008) had shown that their approach succeeds in improving PNA and NAO forecasts and precipitation forecasts over Canada, so it appeared of interest to use it in an attempt to improve SAT forecasts over North America. With the above representation of the predicted Z500, a linear regression equation is used to relate the amplitudes of the three modes (predictors) to the observed SAT, grid point by grid point over North America (predictands). The statistical model is applied in a cross-validation approach, meaning that when the equation is to be used to predict the SAT of year *n*, the coefficients of the regression equation are computed with data from all years other than year *n*. Of course, this can be done for as many years as we have Z500 forecasts and SAT observations (33 in our case) with regression coefficients that may change from year to year due to the relatively short time series.

It should be noted that while it is necessary to employ a cross-validation approach in determining the coefficients of the regression model, it is not necessary to do it in the SVD analysis, for the following reason. Imagine an operational forecasting environment in which we need to calibrate the SAT forecast for the coming season. Naturally at that point we do not have the observed SAT for that season. But we do have the dynamical predictions of Z500, so that the latter can be used in the SVD analysis to obtain the amplitudes of the three modes to be used in the regression equation. In other words, the (unknown) SAT cannot (and should not) be used to develop the statistical model, but the (available) Z500 forecasts can, and are used.

The first step in the calibration procedure is therefore to obtain the three modes with which to represent the ensemble-mean predicted Z500 and this is done by an SVD analysis linking the ensemble-mean Z500 north of 20°N for the average of months 2, 3, and 4 (DJF, MAM, JJA, and SON) and the observed SST from the month before the start of the forecasts (October, January, April, and July, respectively) over the tropical Pacific and Indian Oceans (20°N–20°S, 120°E–90°W). Again, the procedure is meant to mimic an operational environment in which only the SST prior to the forecast season is known. In addition, in the dynamical models it is the SST anomaly of the month before the start of the forecast that is persisted throughout the forecast period, so it is natural to use it in the SVD analysis. The atmospheric components of the expansion coefficients in the first three leading SVDs are denoted by *C*_{1}(*t*), *C*_{2}(*t*), and *C*_{3}(*t*) (*t* = 1, 2, … , 33), respectively, where *t* represents the year in the period 1969–2001.

For each grid point in space, a statistical linear regression model can be written as

where *T _{o}*(

*t*) is the observed seasonal-mean SAT at a given grid point and

*a*

_{1},

*a*

_{2}, and

*a*

_{3}are the regression coefficients that can be calculated by the least squares approach, and the residual is ɛ. Thus (1) relates, at each grid point, the observed SAT to the predicted Z500.

In the forecast year *t _{f}* , once we have the ensemble-mean forecasts from the GCMs,

*C*

_{1}(

*t*),

_{f}*C*

_{2}(

*t*), and

_{f}*C*

_{3}(

*t*) can be determined by projecting the forecast Z500 onto the SVD patterns. Then the corrected forecast can be obtained as

_{f}## 4. Forecast skill of SAT before and after the postprocessing

We begin by showing the climatology of the SAT in SON. Figure 1 depicts the standard deviation of the seasonal-mean SAT calculated from the CRU data for the 33 yr from 1969 to 2001. The SAT variability is seen to generally decrease from north to south over North America. The standard deviations of the ensemble-mean SAT forecasts for the four models are shown in Fig. 2. A comparison between Figs. 1 and 2 reveals that the magnitudes of the SAT standard deviations in the models are generally smaller than in the observations with SEF having the lowest values among the GCMs. Not surprisingly the variability in the forecasts is lower than in the observations, as the former is based on the ensemble-mean forecasts and is consequently an estimate of the forced variability, whereas the latter includes both the forced and internally generated variability. The main point here, however, is that the forced variability is quite model dependent.

The predictive skill of the 1-month lead ensemble-mean forecast SAT for DJF, MAM, JJA, and SON is examined first. Figure 3 illustrates the temporal correlation scores, over the 33 yr, between the observed SAT and the ensemble-mean forecasts averaged over the four models. Areas with a correlation score greater than 0.3, significant at the 5% level or better, are shaded. Significant predictive skill is found over many parts of North America in DJF, MAM, and JJA, while SON has the lowest forecast skill. Limited predictive skill is found in SON over small parts of the midwestern and south-central United States. We then examine the correlation scores for the original ensemble SAT forecasts in SON for each model (Fig. 4). The correlation scores are seen not to be statistically significant for any of the models. It is an open question as to why the GCMs have much lower forecast skill in SAT for SON compared to other seasons.

The correlation score for the postprocessed forecast SAT averaged for the four models for SON is displayed in Fig. 5. The postprocessing of the forecasts is applied to each model separately and then the results are combined by simply averaging them. A comparison between Figs. 3d and 5 shows that significant improvement of the predictive skill is achieved, especially over northern and eastern Canada and the southwestern United States. The corresponding correlation scores for the postprocessed ensemble forecasts for each model are shown in Fig. 6. Results show that the forecast skill has been improved for all GCMs and all models have significant forecast skill over the central and southwestern United States.

The forecast skill of the SAT before and after the postprocessing approach can also be measured by the mean squared error (MSE). Following Smith and Livezey (1999) and Lin et al. (2008), the MSE is calculated using the observed and four model averaged SAT anomalies that are normalized using their respective standard deviations. The MSE for the original model forecasts and the postprocessed forecasts are presented in Figs. 7a,b, respectively. Areas with MSE smaller than 1.2 are shaded. [As the observed and predicted fields have been normalized, the MSE would be equal to 2 if the fields were uncorrelated. In Smith and Livezey (1999) and Lin et al. (2008), a value of 1.5 was used.] The distributions of the MSE for the original and postprocessed forecasts can be seen to be quite consistent with their correlation skills as shown in Figs. 3d and 5. Overall the forecast errors are much lower in the postprocessed forecasts than those in the original forecasts over northern and eastern Canada and the southwestern United States.

When applied to DJF, JJA, and MAM the postprocessing did not significantly improve the forecasts. Thus, this postprocessing approach works best for the SON season when the original uncalibrated forecasts have the least skill. One possible explanation for this result is that with only 33 yr of forecasts in our dataset (32 to compute the regression coefficients) the estimates of the regression coefficients are not sufficiently good to improve on dynamical forecasts that already have a skill above some level, at least a level higher than in SON. If that is the case, a longer dataset would be needed to improve on the dynamical SAT forecasts for the other three seasons. In the above analysis the first month of the forecasts is skipped in order to isolate the contribution of the initial conditions in the skill. We also tested the procedure on zero-lead forecasts. The results (not shown) indicated that the forecasts skill is higher, as expected, than for 1-month lead forecasts, but the conclusions as to the improvements that the postprocessing produces remain the same.

## 5. Physical processes behind the improvement of the skill

We have seen in Fig. 2 that the variability in the ensemble-mean SAT forecasts is strongly model dependent, and extremely small in the SEF model forecasts. Figure 8 shows that after the statistical postprocessing the variability of the SEF forecasts is appreciably larger. We also note that the distribution and magnitude of the variability are now more consistent among the models and closer to the total variability in the observations, suggesting that a part of the model-dependent errors in the forced signal has been removed.

To help understand the source of the improvement in the forecast skill in SON in the postprocessed forecasts, we compare the first two leading SVD modes from an analysis of pairs of observed tropical Pacific SST and observed Z500 to the corresponding analysis involving the observed SST and the predicted Z500. As we mentioned before, the data used for SON are those from the forecast season of ASON. The SST anomaly used throughout the forecasts was that of July, the month prior to the forecast period. So the SVD is carried out between the observed July SST over the tropical Pacific (20°N–20°S, 120°E–90°W) and Z500 for SON north of 20°N.

The two leading SVD modes in the observations are shown in Fig. 9. The first SVD mode reveals, not surprisingly, a Z500 PNA pattern (Fig. 9a) and a typical El Niño pattern in the SST field (Fig. 9b) over the tropical Pacific. The second SVD pair is characterized by a negative SST anomaly band along the equatorial Pacific (Fig. 9d) and positive Z500 anomalies over northeastern Canada and Greenland. These two pairs of SVDs explain 49% and 18% of the squared covariance between Z500 and the SST, respectively.

To understand the relationship between the SAT over North America and the leading SVD Z500 patterns, the correlations between the CRU SAT and the atmospheric expansion coefficients for the first two SVDs are shown in Fig. 10. The distribution of the correlation associated with the first SVD shows a dipole structure with a significant positive area over the northwest and a negative area over southeast North America. The correlation with the second SVD shows significant positive values over most of North America except northwestern Canada. The link between the North American SAT and Z500 in SON and the tropical SST in July is thus clear.

The same SVD analysis was conducted for all GCMs separately between the ensemble 1-month lead forecast Z500 of SON and the July SST over the tropical Pacific. The models were found to be unable to produce a one-to-one match of the observed patterns. Shown in Fig. 11 is an example of the two leading SVD patterns for the GCM3 ensemble-mean forecasts. These two pairs of SVDs explain 59% and 25% of the squared covariance, respectively. The atmospheric component of SVD1 (Fig. 11a) resembles the observed SVD1 pattern, and the associated SST component (Fig. 11b) has some similarity to the El Niño signal, but significant differences can be found. Specifically, the warm SST anomaly has a different meridional extent and the center is located too far west. The Z500 centers are shifted eastward compared with the observations. For the second SVD pair, the SST pattern is similar, but differences between the model and the observations are obvious for the Z500 structure.

The above discrepancies in the forecast spatial patterns can seriously degrade the seasonal forecast skill. On the positive side, the time evolution of the observed SVD patterns (PC) are correlated with those of the forecast SVD, in spite of the differences in the spatial patterns. In fact, the success of the postprocessing technique is linked to this property. The technique makes use of the three leading SVD PCs of the model ensemble forecasts (i.e., a linear combination of the three forecast PCs). If one or more forecast PC is correlated with the observed PCs, it will contribute to the postprocessing system to improve the skill of the forecast. Table 1 shows the correlations of the atmospheric expansion coefficients between the HPF2 data and observations. It can be seen that the time series in the observations are significantly correlated to at least one of the PCs in the HFP2 data. Interestingly, the correlations are systematically significant for the second SVD mode. It will be shown later that the postprocessed ensemble forecasts benefit more from the second SVD mode than from the first.

Figure 12 presents the spatial correlations of the CRU SAT with the expansion coefficients averaged for four models for the first two SVDs. As can be seen, the observed SAT is correlated with APC2 over a large area over North America while little significant correlation is found for SVD1. Figure 13 shows the correlation score between the observed SAT and the postprocessed forecasts using only the first (Fig. 13a) and the second (Fig. 13b) PC. It shows that the postprocessed ensemble forecasts benefit mainly from the PC2 over northern and eastern Canada and southwestern North America, while the PC1 only contributes to a small area of southwestern North America.

It is of interest to see if it would be possible to construct a weighted average of the original and postprocessed forecasts that would have a better skill than either forecast over a period of time, the weights being statistically determined. This idea was tested, using a cross-validation approach in determining the weights. The results were negative, presumably because the dataset is too short to enable a sufficiently good estimate of the weights to be used. Various modifications to the above described correction methods have also been tested including using other variables, such as the 850-hPa air temperature as the predictor for the SAT instead of Z500. The North Atlantic Ocean SSTs were also added to the tropical Pacific SST when the SVD analysis was performed. No significant gains were achieved, the results being quite similar to those presented in this study.

## 6. Summary and discussion

The relationship between the ENSO signal and the midlatitude atmospheric response is recognized as the main source of seasonal predictive skill in the extratropics. Recently, some studies found that the positive phase of the NAO can be linked to a negative SST anomaly over the tropical Pacific (e.g., Jia et al. 2009). It is therefore important for numerical models to capture the above association between the midlatitude atmosphere and the tropical Pacific SST. Previous studies (Lin et al. 2005, 2008) showed that the time series associated with these SST-forced large-scale patterns could be reasonably well predicted by GCMs. However, their spatial patterns are found to be biased and model dependent. A statistical postprocessing approach based on the regression of the forecast forced SVD patterns and the historical observations has been used to correct the atmospheric response pattern to tropical SST forcings. It was found that this statistical postprocessing approach is efficient in correcting the systematic errors. When it was applied to the winter Z500 forecasts and to the Canadian winter precipitation, significant improvements were achieved (Lin et al. 2005, 2008). In this paper, we further applied this approach to SAT forecasts in the fall season over North America when the original dynamical model predictive skill is poor, in order to postprocess the ensemble forecasts over a 33-yr (1969–2001) period using the HFP2 data.

An SVD analysis was performed between the ensemble-mean forecast Z500 over the Northern Hemisphere (20°–90°N) and the tropical Pacific SST (20°N–20°S, 120°E°–90°W). The first three atmospheric components of the expansion coefficients in the SVDs together with some regression coefficients were used to construct a new set of SAT forecasts. Results showed that significant improvement of the predictive skill for the SAT in SON was achieved over northern and eastern Canada and southwestern United States. It was also shown that the distributions of the standard deviation of the postprocessed ensemble forecasts are more consistent among the four GCMs used in the study after postprocessing, indicating that the approach is effective in improving the model-dependent part of the systematic errors. Further investigation showed that the postprocessed ensemble forecasts benefit more from the second SVD mode than the first SVD mode. It was found that the statistical postprocessing technique is beneficial for SAT forecasts in the fall season (SON), when the dynamical forecasts are of very poor quality, but it does not lead to improvements over the model dynamical forecasts for the other seasons. It would appear that a longer dataset would be needed to evaluate the coefficients of the statistical model with sufficient accuracy to improve the dynamical forecasts of the three seasons when the dynamical forecasts are of higher quality than in the fall.

Finally, it is worth noting that this postprocessing method has been developed using the leading SVD modes of the ensemble mean of the seasonal forecasts that are mainly forced by the tropical Pacific SST forcing. It is possible that other external forcings, such as the SST anomalies in other ocean basins or the snow cover over Eurasian continent, for example, can also play a role and may not be well represented by these modes.

## Acknowledgments

This research was funded by the Ouranos Consortium, the Global Environmental and Climate Change Centre, the Chinese Universities Scientific Fund, and the Chinese Academy of Meteorological Sciences under Grant 2008LASW-A03. Hai Lin is partly supported by the Canadian Foundation for Climate and Atmospheric Sciences (CFCAS) and by the Natural Science and Engineering Research Council of Canada (NSERC). We are grateful to the two anonymous reviewers for their helpful suggestions on improving our paper.

## REFERENCES

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

## Footnotes

*Corresponding author address:* XiaoJing Jia, Department of Earth Sciences, ZheJiang University, HangZhou, ZheJiang, China. Email: xiaojing.jia@mail.mcgill.ca