## Abstract

The skill of two global numerical weather prediction models, the National Centers for Environmental Prediction (NCEP) medium-range forecast model and the European Centre for Medium-Range Weather Forecasts (ECMWF) operational model, has been assessed over the Southern Hemisphere extratropics for much of the 1990s. Forecast skill and circulation predictability are calculated in terms of predicted and observed 500-hPa height fields. The skill of both the NCEP and ECMWF models has increased steadily through the decade. The useful forecast range (mean anomaly correlation at least 0.6) extended out to about day 6 during the late 1990s compared to day 5 in the early 1990s. The ECMWF model generally performed best out to the useful forecast limit, but scores were insignificantly different beyond that. ECMWF forecasts show a gradual increase in variance with forecast interval, while NCEP forecasts show a decrease.

For both models, the most predictable wintertime circulation pattern, defined by a singular value decomposition analysis, is associated with wave propagation across the South Pacific and southern Atlantic Oceans, the so-called Pacific–South American pattern, analogous to results found for the Northern Hemisphere. At day 10, the predicted amplitude of the leading pattern correlates at 0.6 with the analysis amplitude, while average hemispheric anomaly correlations are less than 0.3. For the leading singular mode pair, the spatial patterns and summary statistics compare closely between models. The spatial pattern of the leading singular mode is very similar in form to the leading analysis EOF from either model. A study of forecast errors reveals that a pattern related to the “high-latitude mode” or Antarctic oscillation, associated with a zonally symmetric exchange of mass between mid- and high latitudes, is weakly associated with large forecast errors. Large errors tend to be associated with positive height anomalies over the Pole and weak westerlies near 55°S. The more predictable patterns exhibit stronger temporal persistence than do the least predictable. Applications of these results to operational forecasting are discussed.

## 1. Introduction

Variations in the skill of operational numerical weather prediction (NWP) models arise in two main ways. First, technological improvements (faster computers, better models) have led to substantial and relatively steady improvements in skill over the last several decades (Kalnay et al. 1998). Second, overlaid on the general upward trend in forecast skill are shorter-term variations related to the predictability of the atmospheric circulation. The chaotic nature of the circulation (Lorenz 1963, 1965) ensures that predictability and forecast skill vary on daily, seasonal, and interannual timescales (e.g., Brankovic and Palmer 1997; Buizza 1997; Renwick and Wallace 1996a; Zheng et al. 2000). Research into medium-range predictability is concerned with the latter kind of variation in skill, although it is important from the point of view of operational weather forecasting to document and keep close track of changes in performance related to model formulation. Moreover, based upon operational NWP model output, it may be difficult to separate the impact on predictability of the “secular” effects of model changes from the “dynamical” effects of circulation variability (Renwick and Wallace 1996a).

A number of medium-range predictability studies have found associations between the presence of specific circulation patterns and a tendency for high (or low) forecast skill (e.g., Branstator et al. 1993; O'Lenic and Livezey 1989; Palmer 1988; Renwick and Wallace 1995). Much of the work has concentrated on the Northern Hemisphere (NH) winter, where it has been found that the leading empirical orthogonal functions (EOFs; Jolliffe 1986) tend to be the most accurately forecast features of the circulation (Branstator et al. 1993). The best forecast (most predictable) features tend to be the most persistent. The most poorly forecast features appear related to the occurrence of high-latitude “blocking” and associated high-frequency baroclinic wave activity (Dalcher and Kalnay 1987; Renwick and Wallace 1996b). While blocking events themselves are persistent, and therefore predictable once established, transitions between blocking states and zonal flows often occur rapidly and unpredictably.

It is important to distinguish between the skill in forecasting the amplitude of “predictable” or “unpredictable” patterns, and skill in forecasting the full circulation anomaly field. Here, predictable patterns are taken as those whose amplitude is most well forecast. Skill in forecasting such patterns carries through to skill in forecasting the full circulation anomaly field at times when the leading predictable patterns dominate the circulation. When predictable pattern amplitude is large, the full circulation tends to be well forecast. However, when predictable pattern amplitude is small, the full circulation may not be well forecast, even though that part of the variability associated with the predictable pattern is predicted skillfully. Similar remarks apply, in a converse way, to unpredictable patterns associated with low forecast skill.

The initial aim of this work was to document recent operational model performance over the Southern Hemisphere (SH) extratropics, to aid the interpretation of model output in an operational setting. Of the global models in use today, relatively long time series of analyses and forecasts were readily available from two centers: the European Centre for Medium-Range Weather Forecasts (ECMWF) and the National Centers for Environmental Prediction (NCEP). A second aim was to assess aspects of the medium-range predictability of the Southern Hemisphere circulation, using methods previously applied to the northern extratropics, as outlined above. By comparing the output of two different NWP models, we hope to distinguish circulation features genuinely associated with predictability from those particular to a single model or related to changes in model formulation. The paper begins with a description of data sources and analysis techniques (section 2). General error and skill statistics are presented in section 3 and the analysis of predictable and unpredictable circulation “modes” is discussed in sections 4 and 5. The final section contains a summary and concluding remarks.

## 2. Data and methodology

The results presented here are based on operational output of the analysis and forecast systems of the ECMWF (e.g., Bengtsson 1985; Buizza 1997) and the U.S. NCEP Medium Range Forecast (MRF) model (Kalnay et al. 1998). ECMWF 500-hPa height (H500) analyses and forecasts (the “Lorenz” datasets) were obtained for the period December 1990–November 1998, supplemented through 1999 with fields received in real time as Global Telecommunications System transmissions. A full set of NCEP fields was obtained for the period January 1990–June 1998, through the National Center for Atmospheric Research (NCAR). In both cases, daily analyses (denoted D00) and 1- to 10-day predictions (in daily steps, denoted D1, D2, … , D10) were obtained. For much of the discussion below, results refer to the common H500 field and the common period of data coverage, December 1990–June 1998. During the period of record, there have been many significant changes to the formulation of both modeling systems, resulting in upward trends in forecast skill. Model improvements will not be discussed here, but for more information, a very readable account of developments in the NCEP operational forecasting system may be found in Kalnay et al. (1998). A description of changes to the ECMWF prediction system is available in ECMWF Data Services (1999).

For ECMWF data, all model runs were initialized from analyses at 1200 UTC, while for NCEP all runs began from 0000 UTC analyses. The different analysis and validity times for the two models imply that strict comparisons are not possible. However, it is expected that large-scale variance structures in the model fields should not be sensitive to the time of day and that averaged statistics should indicate the relative skill of both models. All fields were defined on a global 2.5° latitude–longitude grid (prior to December 1995, NCEP fields were converted from spectral coefficients to the 2.5° grid). As the focus of this study is large-scale variability over the SH extratropics, fields were projected onto a relatively coarse 19 × 19 polar stereographic grid centered over the South Pole, covering all latitudes south of 20°S. Such a grid is capable of resolving the dominant large-scale features of the SH circulation (e.g., Renwick 1998).

In all of the results presented, model forecasts were compared to their own analyses. This was considered to be preferable to taking one model's analysis as the “truth,” which would bias results in favor of that model. In practical terms, analyses could not be “exchanged” between models as analysis times were always 12 h different. Since the SH extratropics are rather poorly observed, much of the analysis detail must come from model “first-guess” fields, especially over the data-sparse southern oceans. Hence, forecast errors calculated using a model's own analysis may be optimistically biased. To make a simple estimate of the sensitivity of results to the choice of analysis, many of the calculations for both sets of model forecasts were performed using analyses from a third readily available source, the NCEP–NCAR reanalyses (Kalnay et al. 1996). The reanalyses are clearly not independent of the MRF fields, but they at least provide a single set of analyses for 0000 and 1200 UTC verification times. Comparisons have also been made between model forecasts and land-based upper-air observations in the New Zealand region. Results of such comparisons generally reinforced conclusions reached from the model–analysis comparisons.

Measures of forecast and analysis variance are taken as root-mean-squared amplitudes *A* and *F*:

$$A = \left[\frac{1}{n}\sum_{i=1}^{n}\left(a_i - c_i\right)^2\right]^{1/2}, \tag{1}$$

$$F = \left[\frac{1}{n}\sum_{i=1}^{n}\left(f_i - c_i\right)^2\right]^{1/2}, \tag{2}$$

where *a*_{i} is the analysis value, *f*_{i} the forecast value, and *c*_{i} the climatological value at point *i.* The number of points in the summation (over space or time) is *n.* The two “basic” integrated statistics used are the root-mean-squared error *E* and the anomaly correlation *R,* defined as

$$E = \left[\frac{1}{n}\sum_{i=1}^{n}\left(f_i - a_i\right)^2\right]^{1/2}, \tag{3}$$

$$R = \frac{1}{nAF}\sum_{i=1}^{n}\left(f_i - c_i\right)\left(a_i - c_i\right) = \frac{A^2 + F^2 - E^2}{2AF} \tag{4}$$

(Molteni and Palmer 1991; Renwick and Wallace 1995). In (4), *A* and *F* are as defined in (1) and (2). Statistic *E* measures the typical size of forecast errors while *R* measures how well the forecast anomaly pattern matches the analysis, without reference to the size of the anomalies. The two statistics are negatively correlated. Note that in all results presented, an “error” is taken as forecast minus analysis.
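As an illustration of how these four statistics fit together, the following minimal sketch computes them with NumPy for a single field. It assumes the standard forms: rms anomaly amplitudes for *A* and *F*, rms forecast-minus-analysis difference for *E*, and an uncentered anomaly correlation for *R*, with the same climatology removed from both fields. The function name `skill_stats` and the synthetic values are hypothetical and are not data from this study.

```python
import numpy as np

def skill_stats(f, a, c):
    """Return (A, F, E, R) for forecast f, analysis a, climatology c (same shape)."""
    fa = np.asarray(f, dtype=float) - c    # forecast anomaly
    aa = np.asarray(a, dtype=float) - c    # analysis anomaly
    A = np.sqrt(np.mean(aa ** 2))          # rms analysis amplitude
    F = np.sqrt(np.mean(fa ** 2))          # rms forecast amplitude
    E = np.sqrt(np.mean((fa - aa) ** 2))   # rms error (forecast minus analysis)
    R = np.mean(fa * aa) / (A * F)         # anomaly correlation (uncentered)
    return A, F, E, R

# Synthetic 19 x 19 = 361-point field; numbers are illustrative only.
rng = np.random.default_rng(0)
c = np.full(361, 5500.0)                   # flat "climatology" in metres
a = c + rng.normal(0.0, 100.0, c.size)     # analysis = climatology + anomaly
f = a + rng.normal(0.0, 60.0, c.size)      # forecast = analysis + error
A, F, E, R = skill_stats(f, a, c)
```

Under these assumptions the identity E² = A² + F² − 2AFR holds exactly, which is why a change in forecast amplitude alone can alter *E* even when *R* is unchanged.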

Separate climatologies were derived from the full forecast and analysis datasets using a harmonic analysis with annual and semiannual components. For the calculation of anomaly correlations, the analysis climatology was removed from both analyses and forecasts. This is in line with the operational practice of including systematic errors, as well as anomalies from climatology, in the calculation of *R* (Palmer and Tibaldi 1988). For other calculations, both climatologies were used, effectively removing the mean bias in the forecasts. In practical terms, derived statistics are relatively insensitive to systematic errors, since the systematic component of the forecast errors is comparatively small, even for D10 forecasts, as found for the extratropics of the NH (Renwick and Wallace 1995).
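A harmonic climatology of this kind can be sketched as a least-squares fit of a mean plus annual and semiannual sine and cosine terms. The example below is one plausible implementation, not necessarily the procedure used here; the function name `harmonic_climatology`, the 365.25-day period, and the synthetic series are all assumptions.

```python
import numpy as np

def harmonic_climatology(days, values, period=365.25):
    """Least-squares fit of mean + annual + semiannual harmonics.

    days: day count since the start of the record; values: (time,) or (time, space).
    Returns the fitted climatology and the five regression coefficients.
    """
    t = 2.0 * np.pi * np.asarray(days, dtype=float) / period
    X = np.column_stack([np.ones_like(t),
                         np.cos(t), np.sin(t),            # annual harmonic
                         np.cos(2 * t), np.sin(2 * t)])   # semiannual harmonic
    coef, *_ = np.linalg.lstsq(X, np.asarray(values, dtype=float), rcond=None)
    return X @ coef, coef

# Synthetic 8-yr daily series at one grid point (illustrative values only).
days = np.arange(8 * 365)
truth = (5500.0 + 80.0 * np.cos(2 * np.pi * days / 365.25)
         + 20.0 * np.sin(4 * np.pi * days / 365.25))
noise = np.random.default_rng(1).normal(0.0, 50.0, days.size)
clim, coef = harmonic_climatology(days, truth + noise)
anomaly = truth + noise - clim             # daily anomalies about the fit
```

Fitting separate climatologies to forecasts and analyses, as described above, would amount to calling the function once per dataset and forecast interval.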

Statistics *E* and *R* are used to characterize the relative performance of both models through the period of record. Patterns of variability associated with high and low skill are derived from a series of multivariate techniques. Here, we concentrate on the cool half-year (May–October, or MJJASO), the time of maximum error variance over the Southern Hemisphere. Anomaly patterns associated with low skill are calculated as composites over cases of large rmse *E* and by regressing daily time series of *E* against forecast and analysis anomaly maps, as in Renwick and Wallace (1996a). EOF analyses of forecast error fields were also performed for comparison. Anomaly patterns associated with high skill are derived from singular value decomposition analysis (SVDA) of forecast and analysis anomaly fields. SVDA is a two-field analog of EOF analysis, where the matrix of covariances between forecasts and analyses is diagonalized. The leading forecast and analysis “singular vector” patterns are defined such that their respective amplitude time series have maximal covariance. SVDA results are summarized in terms of the squared covariance fraction (SCF), analogous to the “explained variance” in an EOF analysis; correlations in time (*r*_{t}) and space (*r*_{s}) between the singular mode time series and spatial patterns, respectively; and in terms of the amount of variance accounted for separately in the forecast (EV_{f}) and analysis (EV_{a}) fields [see Bretherton et al. (1992) and Renwick and Wallace (1995) for more discussion].
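The SVDA step can be illustrated with a short NumPy sketch: form the forecast–analysis covariance matrix, take its singular value decomposition, and summarize each mode by its SCF and temporal correlation. This is a generic sketch following the description in Bretherton et al. (1992), without area weighting or any other preprocessing that may have been applied in this study; the function name `svda` and the synthetic fields are hypothetical.

```python
import numpy as np

def svda(fcst, anal, n_modes=2):
    """SVD analysis of two (time, space) anomaly arrays.

    Returns forecast patterns u, analysis patterns v, their amplitude
    time series, the squared covariance fraction, and temporal correlations.
    """
    nt = fcst.shape[0]
    C = fcst.T @ anal / nt                        # forecast-analysis covariance matrix
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    scf = s ** 2 / np.sum(s ** 2)                 # squared covariance fraction
    u = U[:, :n_modes]                            # forecast singular vectors
    v = Vt[:n_modes].T                            # analysis singular vectors
    tf = fcst @ u                                 # forecast amplitude time series
    ta = anal @ v                                 # analysis amplitude time series
    r_t = np.array([np.corrcoef(tf[:, k], ta[:, k])[0, 1] for k in range(n_modes)])
    return u, v, tf, ta, scf[:n_modes], r_t

# Synthetic fields sharing one pattern p (illustrative values only).
rng = np.random.default_rng(2)
nt, ns = 500, 40
p = np.sin(np.linspace(0.0, 2.0 * np.pi, ns))
amp = rng.normal(0.0, 3.0, nt)
anal = np.outer(amp, p) + rng.normal(0.0, 1.0, (nt, ns))
fcst = np.outer(0.8 * amp + rng.normal(0.0, 1.0, nt), p) + rng.normal(0.0, 1.0, (nt, ns))
anal -= anal.mean(axis=0)
fcst -= fcst.mean(axis=0)
u, v, tf, ta, scf, r_t = svda(fcst, anal)
r_s = u[:, 0] @ v[:, 0]                           # spatial correlation of unit-length patterns
```

In this sketch the spatial correlation *r*_{s} of the leading pair reduces to the dot product of the two unit-length singular vectors, which is only approximately a correlation when the patterns have near-zero spatial mean.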

## 3. Overall model performance

The form and magnitude of many of the results presented below depend upon the variance structure of the observed circulation. We therefore begin with a brief discussion of observed SH circulation variability. Figure 1 shows standard deviation fields for H500 from both models, for the analyses, and for D10 errors. Values were calculated for all times of year combined, from daily fields. The pattern of height and error variance is nearly constant through the year but magnitudes are largest in winter (June–August, JJA). All plots are largely zonally symmetric, as is typical of intraseasonal variability in the SH circulation (Hurrell et al. 1998). Maximum values occur in a band centered near 60°S, a few degrees north of the climatological mean sea ice edge (Bromwich and Parish 1998). The southeast Pacific exhibits the largest height variance and largest forecast errors, associated with the strong low-frequency variability seen in that region (Kidson 1999; Renwick and Revell 1999). At D10, midlatitude height errors of 100 m are typical. The form of the error variance field matches closely with that of the analysis variance field, as in the NH extratropics (Renwick 1995), since on average the greatest potential for error occurs where analysis anomalies are largest. Both models agree closely on the variance structure of the circulation, and on its mean state and seasonal variation (not shown).

The four leading EOFs of the daily analysis anomalies during the cool half-year (MJJASO) are shown in Fig. 2. The patterns shown reflect variability on many timescales, but are dominated by intraseasonal and shorter timescale variance, and resemble the EOFs of 10–50-day variability shown in Kiladis and Mo (1998) and Kidson (1991). The leading pair of modes represent an eastward-travelling wavenumber-4 signature, with significant cross correlation in their time series at 3-day lag. The third and fourth modes appear to be stationary wavenumber-3 patterns, showing no systematic tendency for propagation (Kiladis and Mo 1998). Results from only the ECMWF model are shown, but MRF EOFs are very similar in form to those in Fig. 2, as are NCEP–NCAR reanalysis EOFs for the same period. Modes 1 and 3 also contain zonally symmetric elements at high latitudes and account for some of the variance associated with the “high-latitude mode” (HLM; Kidson 1988) or Antarctic Oscillation (Thompson and Wallace 2000), although they do not represent the HLM explicitly. A varimax rotation produces a clear zonally symmetric HLM pattern, which accounts for around 7% of the total height variance.

Seasonally averaged values of *E* and *R* are presented in Fig. 3, for both ECMWF and NCEP. There is a clear annual cycle in *E* related to the annual cycle in circulation variance [*A* and *F* from Eqs. (1) and (2)]. NCEP rmse's tend to be slightly larger than those of ECMWF at shorter forecast intervals, but the situation reverses (on average) by D10. Based on *t* tests, differences in seasonal mean statistics between models are statistically significant (*p* < 0.05) for *E* at D3 and for *R* at D3 and D5. The trend toward relatively lower NCEP rmse's at longer forecast intervals is related to the amount of variance retained in the forecast fields. NCEP forecast variance (anomaly amplitude) decreases slowly with forecast interval while ECMWF forecast variance increases with forecast interval. By D10, NCEP forecast variance averages around 10% less than analysis variance while ECMWF is around 5% more than analysis. At long forecast intervals, where *R* is close to zero, (4) implies that error variance is equal to the sum of analysis and forecast variance. Hence, a decrease in forecast amplitude (variance) leads to a decrease in *E.* Qualitatively, none of the above statistics are affected by the choice of analysis (individual model analysis or reanalysis), although the separation between models is reduced a little when reanalyses are used with both sets of forecasts. Rmse's are larger in both models in the SH than in the NH (not shown), largely because of the relative paucity of surface weather observations and surface-based tropospheric sounding data over the SH (Kalnay et al. 1998).
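The link between retained forecast variance and rmse can be made concrete with a short worked example. It assumes that relation (4) takes the standard form below, that *R* is exactly zero, that both models share the same analysis amplitude *A*, and that the quoted 10% and 5% figures apply to variance rather than to amplitude; all four are simplifying assumptions made for illustration only.

$$E^2 = A^2 + F^2 - 2AFR \quad\Longrightarrow\quad E^2 \approx A^2 + F^2 \quad (R \approx 0)$$

$$E_{\mathrm{NCEP}} \approx \sqrt{A^2 + 0.90\,A^2} \approx 1.38\,A, \qquad E_{\mathrm{ECMWF}} \approx \sqrt{A^2 + 1.05\,A^2} \approx 1.43\,A$$

Under these assumptions the NCEP rmse would be roughly 4% lower than the ECMWF rmse purely because of the difference in forecast amplitude, with no difference in pattern skill. Since *R* at D10 is small but not zero, the actual effect would differ somewhat from this estimate.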

Anomaly correlations (Fig. 3b) show a weak seasonal cycle, having a maximum in the winter when forecast and analysis variance is largest. Again, ECMWF skill tends to be slightly higher at shorter forecast intervals, but both models are roughly equal by D10, by which time deterministic predictability is very low. There are noticeable trends in skill over the period shown in Fig. 3, particularly in *R.* Both models exhibit statistically significant (*F*-test, *p* < 0.01) upward trends in *R* at most forecast intervals. For NCEP, significant downward trends are also evident in *E* out to D4 (Table 1). The largest trends in skill occur in the middle of the forecast range, between D3 and D5, where mean anomaly correlations have increased by around 0.1 during the period of record. The NCEP model generally shows (insignificantly) larger improvements in skill than does ECMWF. On average, there has been a gain of around one day in skill levels through the 1990s. For ECMWF, “useful” forecasts (mean anomaly correlations 0.6 or above) extended to forecast day 5.5 in 1991, compared to forecast day 6.5 in 1998. For NCEP, the useful forecast period has increased from around D5 to D6.
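Fractional values such as “forecast day 5.5” imply some interpolation between daily forecast intervals. One simple way to obtain such a figure, assumed here rather than taken from the text, is linear interpolation of the mean anomaly correlation between the two forecast days that bracket the 0.6 threshold. The function name `useful_range` and the correlation values are hypothetical.

```python
import numpy as np

def useful_range(days, r, threshold=0.6):
    """Forecast day at which mean anomaly correlation first falls below threshold."""
    days = np.asarray(days, dtype=float)
    r = np.asarray(r, dtype=float)
    below = np.nonzero(r < threshold)[0]
    if below.size == 0:
        return days[-1]                    # never drops below threshold in range
    k = below[0]
    if k == 0:
        return days[0]                     # already below threshold at first day
    # Linear interpolation between the bracketing forecast days.
    return days[k - 1] + (r[k - 1] - threshold) * (days[k] - days[k - 1]) / (r[k - 1] - r[k])

# Hypothetical mean anomaly correlations for D1..D10 (not values from Fig. 3).
days = np.arange(1, 11)
r_mean = np.array([0.97, 0.93, 0.86, 0.78, 0.68, 0.57, 0.47, 0.38, 0.31, 0.25])
limit = useful_range(days, r_mean)         # about day 5.7 for these numbers
```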

Model biases over the SH are relatively small at all forecast intervals. Even at D10, mean height errors exceed 20 m only in small areas (Fig. 4). Bias patterns grow in amplitude with time but largely keep the same form. ECMWF biases show a zonal wavenumber-3 pattern at midlatitudes, while NCEP biases are more zonally symmetric, associated with a weakening of the subtropical westerlies in the forecasts. While both models have increased in average skill through the decade, there has been little change in either the magnitude or the form of the mean error field seen in Fig. 4. Substituting NCEP reanalyses for operational model analyses has little impact on MRF bias patterns and changes ECMWF results only over Antarctica.

## 4. Patterns associated with high forecast skill

Identification of “predictable patterns” (those associated on average with high forecast skill) is carried out in the same way as described in Renwick and Wallace (1995), using SVDA between forecast and verifying analysis height anomaly fields. SVDA mode pairs are displayed as covariance maps, based on the analysis amplitude time series in all cases. The analysis maps are “homogeneous” covariances, showing typical pattern amplitude in the verifying analyses, while forecast maps are “heterogeneous,” showing the typical amplitude of the pattern “response” in the forecasts.

Summary statistics for D5 and D10 SVDA during MJJASO are shown in Tables 2 and 3. The two leading D10 modes are illustrated in Fig. 5 for ECMWF and in Fig. 6 for NCEP. Spatial patterns associated with the first mode at D5 are nearly identical to those found at D10, but there is some mixing of modes two and three between D5 and D10 results. The leading forecast and analysis SVDA modes are highly correlated in space (columns headed *r*_{s} in Tables 2 and 3), as also found for the NH wintertime circulation (Renwick and Wallace 1995). Hence, there is little bias in model forecasts of the leading patterns of spatial variability, and forecast and analysis anomaly fields projecting strongly onto the leading modes should be associated on average with high anomaly covariance/correlation. The leading D10 mode (Figs. 5a,c) accounts for around 40% of the squared covariance between forecast and analysis fields. The amplitude time series of the leading mode pair have a temporal correlation of around 0.6, lower than the 0.7 level found for the more predictable Pacific–North American-like mode in the NH extratropics (Renwick and Wallace 1995), but considerably higher than the mean anomaly (spatial) correlation of around 0.25 for D10 forecasts. Note that in the extratropics at all forecast intervals, there is approximate correspondence between the time mean of the forecast–analysis spatial (anomaly) correlations and the spatial mean of the temporal correlations (not shown). For example, south of 20°S the temporal correlation (during MJJASO) of matching D10 forecast and analysis gridpoint time series lies between 0.2 and 0.3 for both models.

The leading two analysis patterns resemble the leading EOFs of the analyses (Figs. 2a,b), representing zonal wavenumber-3–4 activity at midlatitudes (the sign is arbitrary in both SVDA and EOF analysis). However, the forecast patterns have most of their amplitude in the Pacific–South American (PSA) sector, with generally a much weaker response over the Indian Ocean, especially for the NCEP model (Figs. 6a,c). Hence, much of the forecast skill is associated with the PSA region. It is well known that Rossby wave propagation is common across the South Pacific and southern South America, often emanating from the tropical or subtropical Pacific where the waves may be forced episodically by anomalies in tropical diabatic heating (Kidson 1999; Kiladis and Mo 1998; Mo and Higgins 1998; Renwick and Revell 1999). The predictability across the South Pacific region is likely to come from the persistence of such Rossby wave patterns, having an average timescale longer than that of background synoptic variability. The above result is in broad agreement with the result found for the Pacific–North American pattern in the NH (Renwick and Wallace 1995).

Both NWP models produce essentially the same leading SVDA mode, suggesting that the South Pacific–South American region is genuinely the most predictable area of the SH at the medium range. Results shown here are based on the whole period of record, but there is little change in the form of the leading SVDA patterns between the first and second halves of the record.

Patterns such as those in Figs. 5 and 6, found using SVDA, should be predictable in either polarity, as the analysis technique is linear and the sign of the result is arbitrary. That is, when the amplitude of the leading forecast mode from Fig. 5 is large, in either polarity, the anomaly covariance (and correlation) between the full D10 forecast and analysis fields should be relatively high, since the most well-forecast elements dominate the circulation at those times. In Fig. 7, the ECMWF D10 anomaly correlation *R* and analysis rms amplitude *A* are plotted against the amplitude of the leading ECMWF D10 forecast SVDA pattern (Fig. 5a). A running-mean smoother has been applied to highlight the general form of the relationship. The *R* is generally lowest on average when the pattern amplitude is small, and there is a tendency for higher values as the pattern amplitude increases. However, forecast skill appears highest in the extreme positive polarity of the pattern. Positive extremes are more common and dominant generally, associated with elevated values of total rms amplitude (*A*) and forecast skill. The positive polarity is associated with positive height anomalies over the southeast Pacific, the main region of blocking in the SH (Sinclair 1996; Renwick and Revell 1999). Relations between the PSA pattern and blocking over the southeast Pacific are discussed further by Marques and Rao (1999, 2000). On average, SVDA pattern 1 magnitude and D10 *R* are well correlated. Averaged according to the deciles of the absolute value of the amplitude of the leading ECMWF D10 forecast SVDA mode, anomaly correlation and pattern amplitude are correlated at 0.9. However, strong day-to-day variability in both time series results in a correlation of less than 0.2 for unsmoothed daily values, suggesting that forecast skill cannot be usefully predicted from the amplitude of the PSA pattern.
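The contrast between a decile-averaged correlation near 0.9 and a daily correlation below 0.2 is a general property of averaging within bins, and can be reproduced with synthetic data. The sketch below is illustrative only: the function name `decile_means`, the linear link between amplitude and skill, and the noise level are assumptions, not estimates from the forecasts analyzed here.

```python
import numpy as np

def decile_means(x, y, n_bins=10):
    """Average x and y within equal-count bins ordered by x."""
    order = np.argsort(x)
    groups = np.array_split(order, n_bins)
    xm = np.array([x[g].mean() for g in groups])
    ym = np.array([y[g].mean() for g in groups])
    return xm, ym

# Synthetic daily values: a weak link between |amplitude| and skill plus noise.
rng = np.random.default_rng(3)
amp = np.abs(rng.normal(0.0, 1.0, 1400))              # |pattern amplitude|
skill = 0.2 + 0.1 * amp + rng.normal(0.0, 0.3, amp.size)

r_daily = np.corrcoef(amp, skill)[0, 1]               # weak day-to-day correlation
xm, ym = decile_means(amp, skill)
r_decile = np.corrcoef(xm, ym)[0, 1]                  # much higher after averaging
```

Averaging removes most of the day-to-day noise, so the binned correlation describes the mean relationship well while saying little about any individual forecast.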

## 5. Patterns associated with low forecast skill

“Unpredictable patterns” (those associated on average with low forecast skill) are identified using composites and regression maps based on rmse *E.* Time series of *E* were ordered into quintiles, and average forecast and analysis anomaly fields calculated for the first (lowest 20% of errors, the “best”) and last (highest 20% of errors, the “worst”) quintiles. The *E* time series were also regressed against forecast and analysis anomalies at each grid point to obtain patterns linearly associated with changes in rms error. Figures 8 and 9 show a set of such maps for D5 and D10 forecasts, respectively. There are a number of similarities between models in the form of the composites and regression maps, but the agreement is much less than that found for the well-forecast patterns in section 4. Both models show a weak tendency for predicted negative height anomalies over the pole when *E* is low (Figs. 8a,b and 9a,b), and for positive polar height anomalies when *E* is high (Figs. 8c,d and 9c,d), implying a degree of linearity in the relationships. Regression maps with both forecast and analysis fields exhibit a similar pattern, weakly associated with the HLM or Antarctic Oscillation. The correlation between the analysis amplitude of the HLM-like patterns shown in Figs. 8 and 9 (e.g., Figs. 8g, 9g) and a “standard” HLM index (Kidson 1988), calculated as the difference in 500-hPa zonal mean zonal wind between 60° and 40°S, is around 0.6. There is moderate agreement between models on the form of the regression maps, both showing large rmse's associated on average with positive polar height anomalies in analyses and forecasts (Figs. 8g,h).
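The quintile composites and regression maps described above can be sketched in a few lines of NumPy. This is a generic illustration: the function name `composites_and_regression`, the choice to standardize *E* before regressing, and the synthetic link placed at one grid point are assumptions.

```python
import numpy as np

def composites_and_regression(E, anom):
    """E: (time,) rmse series; anom: (time, space) anomaly fields."""
    lo, hi = np.quantile(E, [0.2, 0.8])
    best = anom[E <= lo].mean(axis=0)       # composite over lowest-error quintile
    worst = anom[E >= hi].mean(axis=0)      # composite over highest-error quintile
    Es = (E - E.mean()) / E.std()           # standardized rmse series
    # Regression map: anomaly change per one standard deviation of E.
    reg = Es @ (anom - anom.mean(axis=0)) / E.size
    return best, worst, reg

# Synthetic example: grid point 0 is tied to E, the rest is noise.
rng = np.random.default_rng(4)
nt, ns = 1000, 25
E = rng.gamma(9.0, 10.0, nt)                # positive, skewed "rmse" values
anom = rng.normal(0.0, 50.0, (nt, ns))
anom[:, 0] += E - E.mean()                  # hypothetical link at one point
best, worst, reg = composites_and_regression(E, anom)
```

If the relationship is close to linear, the “best” and “worst” composites have opposite sign and resemble the regression map, which is the sense in which the text infers a degree of linearity.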

The association between a pattern related to the HLM and large rmse's appears to come from its modulating influence on total circulation variability. Total forecast and analysis anomaly amplitudes tend to be high on average when the HLM pattern is in the weak westerly polarity (positive anomalies over the pole), resulting in higher average rmse's [from (4)]. In the strong westerly polarity, forecast and analysis amplitudes tend to be lower than average, leading to reduced rmse's. The differences in mean values of both *A* and D10 *F* [as in (1) and (2)] between polarities of the HLM pattern are statistically significant (*p* < 0.05) under a two-tailed *t* test. However, the amplitude of the pattern shown in Fig. 9g is relatively well forecast. At D10, the forecast and analysis amplitude time series are correlated at ∼0.48 for both ECMWF and MRF. Such a result suggests that it is not the HLM-like pattern itself that is poorly forecast, but that the weak westerly polarity of the pattern (Fig. 9g) is associated with other elements of the circulation that are poorly forecast, in a nonsystematic way.

Another approach to identifying patterns associated with forecast errors is to calculate EOFs from the error fields. The leading four EOFs of ECMWF D10 error fields are shown in Fig. 10 (NCEP results were very similar in form and are not shown). The four leading patterns show two pairs of patterns in quadrature, associated with zonal wavenumber 4 at midlatitudes. There are similarities with the leading analysis EOFs (Fig. 2) and with the leading SVDA patterns (Fig. 5), although the latter set of patterns maximize in amplitude over the Pacific sector, while the amplitude in the error EOFs is more evenly distributed around the hemisphere. The leading D10 error EOFs are quite different in form to the D10 *E*-regression maps, while over the NH the leading error EOF closely resembles the analogous regression map (Renwick 1995). Over the SH, error patterns of the form of Fig. 10a are presumably the result of nonsystematic errors in the forecast phase and amplitude of zonal wavenumber 4, which average out to near zero in the regression maps of Fig. 9. A cluster analysis was also performed on the D10 error fields. Cluster means (not shown) consistently included an HLM pattern and a pair of zonal wavenumber-4 patterns. However, the variation in forecast skill across clusters was generally small (statistically insignificant).

There is a separation between predictable (SVDA-based) and unpredictable (error EOF) modes in the time domain. Figure 11 shows the day-to-day autocorrelation functions of the analysis amplitude of the leading ECMWF D10 SVDA mode (Fig. 5c), the leading D10 error EOF (Fig. 10a), and the D10 forecast–rmse regression map (Fig. 9g). As expected, the time series of the SVDA pattern is more persistent than that of the leading error EOF. The HLM-like pattern (Fig. 9g) is nearly as persistent as the leading SVDA mode and yet is associated with large rmse's in its positive polarity. The relatively large autocorrelation values for the HLM pattern at longer lags are consistent with the view that feedback processes help maintain the HLM beyond the synoptic timescale (Kidson and Watterson 1999).
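An autocorrelation function of the kind compared here can be computed as in the sketch below. It treats the amplitude series as a single continuous record, so it ignores the breaks between successive MJJASO seasons that a real calculation would need to handle. The function names and the AR(1) test series are hypothetical.

```python
import numpy as np

def autocorr(x, max_lag=10):
    """Lagged autocorrelation of a 1D series for lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    denom = np.sum(x * x)
    return np.array([np.sum(x[:n - k] * x[k:]) / denom for k in range(max_lag + 1)])

def ar1(phi, n, rng):
    """Synthetic first-order autoregressive series with coefficient phi."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

rng = np.random.default_rng(5)
acf_persistent = autocorr(ar1(0.9, 3000, rng))   # stands in for a persistent mode
acf_transient = autocorr(ar1(0.6, 3000, rng))    # stands in for a less persistent mode
```

For an AR(1) process the autocorrelation decays as the coefficient raised to the power of the lag, so the two synthetic curves separate quickly with increasing lag.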

## 6. Summary and conclusions

The skill of two of the main global NWP models in operational use has been assessed over the Southern Hemisphere extratropics for most of the 1990s. Both NCEP and ECMWF models have increased steadily in skill through the decade, although skill levels are somewhat below those seen over the wintertime NH extratropics. During the latter part of the 1990s, the useful forecast range (mean anomaly correlation at least 0.6) extended out to about D6. The ECMWF model generally performed best out to the useful forecast limit, but scores were insignificantly different for D7 through D10. ECMWF rmse's gradually overtake NCEP rmse's, as a consequence of a gradual increase in variance of ECMWF fields with forecast interval, compared to a decrease in NCEP forecasts. Model biases are small at all forecast intervals.

The best-forecast SH wintertime circulation pattern, defined by forecast–analysis SVDA, appears to be associated with wave propagation across the South Pacific and southern Atlantic Oceans, the so-called Pacific–South American (PSA) pattern (Mo and Higgins 1998) or South Pacific wave train (Kidson 1999). At D10, the predicted amplitude of the leading pattern correlates at 0.6 with the analysis amplitude, while average anomaly correlations and gridpoint temporal correlations are 0.3 or less. Both models reproduce essentially the same pattern, which in the analyses is very similar to the leading EOF, in agreement with the findings of Branstator et al. (1993). One polarity of the pattern, associated with positive height anomalies over the far southeast Pacific, is somewhat more favored for skillful forecasts, suggesting a connection between persistent ridging in that region and high forecast skill. The predictability of the PSA pattern comes largely from its persistent nature, as found for the Pacific–North American pattern in the northern winter.

Results regarding poorly forecast situations are rather ambiguous. The SH wintertime circulation pattern associated with the largest forecast errors, defined by rmse composites and regression maps, is weakly associated with the high-latitude mode (HLM) and the strength of the zonal winds around 50°–60°S. Large errors are on average associated with positive height anomalies over the pole and weak westerlies, especially in the Indian and Pacific Ocean sectors. However, the HLM-like pattern itself is quite well predicted, implying that it is the association between the weak westerly polarity of the pattern and relatively large circulation variance that leads to the relationship with rmse's. An EOF analysis of forecast errors suggests that errors in prediction of zonal wavenumber-4 activity at midlatitudes contribute strongly to overall error variance, in a nonsystematic way. This result differs from the situation for the NH wintertime circulation, where the leading EOF of the forecast error field is similar in form to the regression map between hemispheric rmse's and analysis height anomalies (Renwick 1995).

Many of the results presented here appear to be model independent, at least in terms of the two NWP models studied, suggesting that at the medium range, much of the predictability in the SH circulation is associated with the PSA pattern. Results for unpredictable patterns are weaker. It appears that nonsystematic errors in the prediction of zonal wavenumber-4 activity are the main contributor to medium-range error variance. Such errors may be linked weakly to positive height anomalies over the pole and weakened westerlies at 50°–60°S. In terms of their temporal behavior, patterns associated with high forecast skill exhibit stronger temporal persistence than those associated with low skill.

Can the results discussed above be applied on a routine basis for operational weather forecasting? It was shown by Renwick and Wallace (1995) that prediction of forecast skill from the form of the forecast field itself, and statistical correction of forecasts, are both compromised by secular changes in model behavior. Neither technique has been formally pursued here. In a qualitative sense, however, these results may be used as a (weak) guide to forecast model performance. Relationships between (un)predictable pattern amplitude and forecast error or skill hold well on an average basis, even though day-to-day correlations between pattern amplitude and forecast skill are weak (typically around 0.2). When grouped into deciles and averaged, the absolute value of the amplitude of the leading predictable pattern (Fig. 5, PSA) is correlated at 0.9 with the averaged anomaly correlation *R*. Similarly, the decile average of the leading unpredictable pattern amplitude (Fig. 9, D10 forecast-*E* regression map; HLM) is correlated at 0.9 with mean rmse *E*. Routine monitoring of the amplitudes of the leading predictable and unpredictable patterns may therefore provide some guide to the relative medium-range predictability of the SH circulation.
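The contrast between weak day-to-day correlations and strong decile-averaged correlations can be illustrated with a minimal sketch. It uses synthetic data only and is not the processing applied in this study: the function name `decile_mean_correlation`, the series `amp` and `skill`, and the coefficient 0.35 are all assumptions chosen so that the case-by-case correlation is near 0.2.

```python
import numpy as np

def decile_mean_correlation(amplitude, score, n_bins=10):
    """Correlate bin-averaged |amplitude| with bin-averaged score.

    Cases are sorted by absolute amplitude, split into n_bins groups of
    (nearly) equal size, and both series are averaged within each group.
    """
    abs_amp = np.abs(np.asarray(amplitude, dtype=float))
    score = np.asarray(score, dtype=float)
    order = np.argsort(abs_amp)
    amp_means = np.array([g.mean() for g in np.array_split(abs_amp[order], n_bins)])
    score_means = np.array([g.mean() for g in np.array_split(score[order], n_bins)])
    return np.corrcoef(amp_means, score_means)[0, 1]

# Synthetic illustration (hypothetical data, not model output):
# a skill score that depends only weakly on pattern amplitude.
rng = np.random.default_rng(0)
n_cases = 2000
amp = rng.standard_normal(n_cases)
skill = 0.35 * np.abs(amp) + rng.standard_normal(n_cases)

# Case-by-case correlation is weak ...
daily_r = np.corrcoef(np.abs(amp), skill)[0, 1]
# ... but averaging within deciles suppresses the unrelated noise.
decile_r = decile_mean_correlation(amp, skill)

print(f"case-by-case correlation: {daily_r:.2f}")
print(f"decile-mean correlation:  {decile_r:.2f}")
```

Averaging within each decile removes most of the variability that is unrelated to pattern amplitude, which is why a relationship that is weak from day to day can still appear strong on an average basis.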

A more direct possibility for real-time monitoring comes from the daily persistence of *E* and *R*. Over the SH extratropics, there is a correlation of around +0.6 between D1 scores from one day (run) to the next. Errors in the forecast circulation tend to be retained over data-sparse regions and may take several days to be corrected. Conversely, good forecasts tend to follow one another as high skill tends to occur when the circulation is in a persistent state.
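The run-to-run persistence described above amounts to a lag-1 autocorrelation of the daily score series. A minimal sketch follows, assuming one D1 score per daily run stored in a one-dimensional array. The function name `lag1_autocorrelation` and the synthetic first-order autoregressive series are illustrative assumptions, not data from either model.

```python
import numpy as np

def lag1_autocorrelation(scores):
    """Correlation between each run's score and the following run's score."""
    scores = np.asarray(scores, dtype=float)
    return np.corrcoef(scores[:-1], scores[1:])[0, 1]

# Synthetic illustration (hypothetical data): an AR(1) series built with a
# prescribed lag-1 correlation of 0.6, standing in for daily D1 scores.
rng = np.random.default_rng(1)
phi = 0.6
n_runs = 5000
noise = rng.standard_normal(n_runs)
d1_score = np.empty(n_runs)
d1_score[0] = noise[0]
for t in range(1, n_runs):
    # Scale the innovation so the series keeps unit variance.
    d1_score[t] = phi * d1_score[t - 1] + np.sqrt(1.0 - phi**2) * noise[t]

r1 = lag1_autocorrelation(d1_score)
print(f"lag-1 autocorrelation: {r1:.2f}")
```

In a monitoring setting, the same function could be applied to a running window of recent scores, although the choice of window length is left open here.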

## Acknowledgments

The operational NCEP model output and processing software were kindly provided through the NCAR Data Support Section. Particular thanks to Joey Comeaux and Chi-Fan Shih for help in getting our data extraction working. Thanks to Tim Palmer and David Richardson at ECMWF for providing the ECMWF “Lorenz” data. John Kidson provided useful comments on an earlier draft of this paper. The comments of P. L. Silva Dias and two anonymous reviewers helped clarify a number of points. This research was supported by the New Zealand Foundation for Research, Science and Technology under Contract CO1831.

## Footnotes

*Corresponding author address:* James A. Renwick, National Institute of Water and Atmospheric Research Wellington, P.O. Box 14901, Wellington, New Zealand. Email: J.Renwick@niwa.cri.nz