The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) is a World Weather Research Programme project. One of its main objectives is to enhance collaboration on the development of ensemble prediction between operational centers and universities by increasing the availability of ensemble prediction system (EPS) data for research. This study analyzes the prediction of Northern Hemisphere extratropical cyclones by nine different EPSs archived as part of the TIGGE project for the 6-month time period of 1 February 2008–31 July 2008, which included a sample of 774 cyclones. An objective feature tracking method has been used to identify and track the cyclones along the forecast trajectories. Forecast verification statistics have then been produced [using the European Centre for Medium-Range Weather Forecasts (ECMWF) operational analysis as the truth] for cyclone position, intensity, and propagation speed, showing large differences between the different EPSs. The results show that the ECMWF ensemble mean and control have the highest level of skill for all cyclone properties. The Japan Meteorological Agency (JMA), the National Centers for Environmental Prediction (NCEP), the Met Office (UKMO), and the Canadian Meteorological Centre (CMC) have 1 day less skill for the position of cyclones throughout the forecast range. The relative performance of the different EPSs remains the same for cyclone intensity, except for NCEP, whose intensity errors are relatively larger than its position errors. NCEP, the Centro de Previsão de Tempo e Estudos Climáticos (CPTEC), and the Australian Bureau of Meteorology (BoM) all have faster intensity error growth in the earlier part of the forecast. They are also very underdispersive and significantly underpredict intensities, perhaps because the comparatively low spatial resolutions of these EPSs cannot accurately represent the tilted structure essential to cyclone growth and decay.
There is very little difference between the skill of the ensemble mean and that of the control for cyclone position, but the ensemble mean is more skillful than the control for cyclone intensity in all EPSs except CPTEC and for propagation speed in all EPSs. ECMWF and JMA have an excellent spread–skill relationship for cyclone position. The EPSs are all much more underdispersive for cyclone intensity and propagation speed than for position, with ECMWF and CMC performing best for intensity and CMC performing best for propagation speed. ECMWF is the only EPS to consistently overpredict cyclone intensity, although the bias is small. BoM, NCEP, UKMO, and CPTEC significantly underpredict intensity and, interestingly, all the EPSs underpredict the propagation speed, that is, the cyclones move too slowly on average in all EPSs.
Ensemble prediction involves the integration of multiple forecasts from slightly different initial states to provide an estimate of the probability density function of forecast states (Leith 1974). The approach was first introduced operationally in 1992 by both the European Centre for Medium-Range Weather Forecasts (ECMWF; Buizza and Palmer 1995; Molteni et al. 1996) and the National Centers for Environmental Prediction (NCEP; Toth and Kalnay 1993, 1997). Both systems have since undergone extensive developments and modifications (Buizza et al. 2007; Wei et al. 2006) and a large number of other operational centers around the world now routinely produce ensemble forecasts. The ensemble prediction systems (EPSs) of the different meteorological centers differ in many ways; they use different models, resolutions, initial condition perturbations, model perturbations, numbers of ensemble members, and so on. It is important that the performance of the different EPSs is assessed and compared to explore the impacts these factors have on forecast skill and to determine how the forecast systems could be improved.
In the past, EPS comparison studies have been limited to just a few centers. For example, Buizza et al. (2005) performed a comprehensive comparison of the ECMWF, NCEP, and Canadian Meteorological Centre (CMC) EPSs. Overall, the ECMWF EPS was found to have the highest performance, which was attributed mainly to the superior model and data assimilation system of the ECMWF EPS, rather than other factors such as the perturbation method. Bourke et al. (2004) compared the ECMWF and Australian Bureau of Meteorology (BoM) EPSs in the Southern Hemisphere. The ECMWF EPS was found to have a higher level of performance than the BoM EPS overall. There have also been a number of other studies comparing just the ECMWF and NCEP EPSs (Atger 1999; Mullen and Buizza 2001; Wei and Toth 2003; Froude et al. 2007b).
One of the main reasons why such studies have been limited to just a few EPSs has been the difficulty for the research community in obtaining data from other EPSs. In 2005, The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE), a World Weather Research Programme project, was initiated at a workshop at ECMWF (Richardson et al. 2005). One of its main objectives is to enhance collaboration on the development of ensemble prediction between operational centers and universities by increasing the availability of EPS data for research. Since 1 February 2008, 10 operational weather forecasting centers have been delivering near-real-time ensemble forecast data to three TIGGE data archives located at ECMWF, the National Center for Atmospheric Research (NCAR), and the China Meteorological Administration (CMA). An in-depth description of the TIGGE program, which includes examples of initial research performed with data from the archive, is provided by Bougeault et al. (2010).
Preliminary studies of the TIGGE dataset include Park et al. (2008), which analyzed eight of the EPSs available from the TIGGE archive and found large differences between them. For 500-hPa geopotential height in the Northern Hemisphere (NH), the difference between the best and worst control forecasts or ensemble means was about 2 days of predictability at day 5 of the forecast. Titley et al. (2008) compared the ECMWF and Met Office (UKMO) EPSs using data from the TIGGE archive. Overall, the ECMWF EPS was found to have a higher level of performance than the UKMO EPS. Another objective of the TIGGE program is to allow researchers to assess the potential value of combining different ensembles to generate multimodel ensemble forecasts. Some initial studies have also been performed in this area (Johnson and Swinbank 2009; Matsueda and Tanaka 2008; Pappenberger et al. 2008; Park et al. 2008).
A new cyclone-tracking approach to forecast verification has recently been introduced by Froude et al. (2007a). The method involves the identification and tracking (Hodges 1995, 1999) of extratropical cyclones along forecast trajectories. Statistics can then be generated to determine the rate at which the position, intensity, and other properties of the forecast cyclones diverge from the analyzed cyclones with increasing forecast time. This provides detailed information about the prediction of cyclones that cannot be obtained from other forecast verification measures. The approach has been used to analyze the prediction of extratropical cyclones by the ECMWF and NCEP EPSs (Froude et al. 2007b), and a more in-depth study of the ECMWF EPS, which focused on the regional differences in forecast skill, has also been performed (Froude 2009).
The aim of this paper is to use the cyclone-tracking analysis method to analyze and compare the predictions of extratropical cyclones by the different EPSs archived in the TIGGE dataset. The Froude et al. (2007b) study mentioned above performed a very preliminary comparison of the ECMWF and NCEP EPSs, but data limitations meant it was not possible to perform a full comparison. By making use of the TIGGE archive, it has now been possible to perform a full comparison of nine different EPSs. Since TIGGE is a key component of THORPEX (WMO 2005), which has the major goal of accelerating improvements in the accuracy of 1-day to 2-week high-impact weather forecasts for humanity, the current paper is perhaps particularly timely. It represents the first of a number of planned studies, presenting a basic analysis of the prediction of Northern Hemisphere cyclones. Further studies will perform a more in-depth regional analysis for both the Northern and Southern Hemispheres and will also attempt to understand any differences seen in the prediction of the cyclones by the different EPSs.
2. Data description
Since 1 February 2008, the TIGGE archive has included EPS data from 10 different operational weather centers, namely the BoM, the CMA, the CMC, the ECMWF, the Japan Meteorological Agency (JMA), the Korea Meteorological Administration (KMA), NCEP, UKMO, the Brazilian Centre for Weather Prediction and Climate Studies (Centro de Previsão de Tempo e Estudos Climáticos, CPTEC), and Météo-France. This study analyzes EPS data for all of these centers, except Météo-France, for the 6-month time period of 1 February 2008–31 July 2008. Météo-France was excluded because its forecasts are only integrated out to 3 days, which is not long enough to include the full life cycle of a large number of extratropical cyclones. The selected time period was chosen because it was the first 6-month period for which data from all nine of the EPSs were available. If this study were repeated for different seasons or years, the results might be slightly different.
As discussed in the introduction, the EPSs of the different centers vary in a large number of ways. Table 1 shows the main properties of the nine EPSs evaluated in this study. For the initial-condition perturbations, BoM, ECMWF, and JMA use singular vectors (SVs; Buizza and Palmer 1995; Bourke et al. 2004); CMA and KMA use bred vectors (BVs; Toth and Kalnay 1997); NCEP uses an ensemble transform method (ET; Wei et al. 2008); UKMO uses an ensemble transform Kalman filter approach (ETKF; Bishop et al. 2001; Wei et al. 2006; Bowler et al. 2007); CPTEC uses a method based on empirical orthogonal functions (EOFs; Zhang and Krishnamurti 1999); and CMC adds random perturbations to the observations and generates perturbed analyses using an ensemble Kalman filter (EKF; Houtekamer and Mitchell 1998). Some centers apply perturbations to the entire globe, whereas others apply them only in certain regions or hemispheres. The fifth column in Table 1 lists the initial perturbation method together with the region the perturbations are applied to for each EPS.
As well as applying perturbations to the initial state, some centers also apply perturbations to the forecast model. ECMWF applies random perturbations to the parameterized physical processes [stochastic physics; Buizza et al. (1999)], UKMO uses the Shutts (2005) kinetic energy backscatter algorithm, and CMC uses schemes similar to both the Buizza et al. (1999) and Shutts (2005) schemes and also uses several different physical parameterization schemes (Houtekamer and Lefaivre 1997). All the other main characteristics of the different EPSs, such as the number of ensemble members, resolutions, and data assimilation approaches, are listed in Table 1. It is important to note that the operational models of all of the EPSs are subject to continuous development and improvement, and so during the 6-month period of this study, changes may have been implemented that affected the model characteristics.
3. Analysis methodology
The analysis methodology used in this study has been described in detail in several previous studies (Froude et al. 2007a,b; Froude 2009; Hodges 1995, 1999; Hoskins and Hodges 2002, 2005). The reader is referred to these studies for further discussion of the methodology, including its advantages and disadvantages.
a. Cyclone tracking
The extratropical cyclones were identified and tracked along the 6-hourly forecast trajectories of each of the perturbed ensemble members and the control forecasts of each EPS in the NH (20°–90°N) using the identification and tracking scheme of Hodges (1995, 1999). The identification and tracking were performed using the 850-hPa relative vorticity (ξ₈₅₀) field. Since this paper focuses on synoptic-scale extratropical cyclones, the resolution of the data was reduced to T42 before the cyclones were identified so that only the synoptic-scale features were identified. It would be possible to look at smaller-scale weather systems, such as polar lows, using higher-resolution data, but this is not the purpose of this paper. The planetary scales with total wavenumbers less than or equal to five were also removed (Hoskins and Hodges 2002, 2005) before the cyclones were identified. Vorticity maxima exceeding 1.0 × 10⁻⁵ s⁻¹ were identified and considered to be cyclones.
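The filtering and identification steps above can be sketched as follows. This is a minimal illustration and not the Hodges (1995, 1999) scheme itself: it assumes spherical-harmonic coefficients stored in a hypothetical array `coeffs[n, m]` whose row index is the total wavenumber n, and it finds maxima only at grid points of a regular latitude–longitude grid, whereas the actual scheme locates features more precisely. Both function names are hypothetical.

```python
import numpy as np


def bandpass_spectral(coeffs: np.ndarray, n_min: int = 6, n_max: int = 42) -> np.ndarray:
    """Zero all spectral coefficients outside total wavenumbers n_min..n_max.

    Assumes a hypothetical layout coeffs[n, m] (row = total wavenumber n).
    The defaults drop the planetary scales (n <= 5) and truncate at T42.
    """
    n = np.arange(coeffs.shape[0])[:, None]  # total wavenumber of each row
    keep = (n >= n_min) & (n <= n_max)
    return np.where(keep, coeffs, 0)


def find_vorticity_maxima(vort: np.ndarray, threshold: float = 1.0e-5) -> list:
    """Return (i, j) grid indices of local maxima of vort exceeding threshold.

    vort is a (lat, lon) array of filtered 850-hPa relative vorticity (s^-1).
    Longitude is treated as periodic; the first and last latitude rows are
    skipped because they lack a full set of neighbours.
    """
    nlat, nlon = vort.shape
    maxima = []
    for i in range(1, nlat - 1):
        for j in range(nlon):
            v = vort[i, j]
            if v <= threshold:
                continue  # below the 1.0e-5 s^-1 detection threshold
            neighbours = [
                vort[i + di, (j + dj) % nlon]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            ]
            if all(v > w for w in neighbours):
                maxima.append((i, j))
    return maxima
```

In practice the filtered coefficients would be transformed back to a grid before the maxima search; that transform is omitted here.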
Figure 1 shows the T42 postprocessed vorticity field (colored shading) from the ECMWF analysis at 12-h intervals for the selected time period of 1200 UTC 7 February 2008–0000 UTC 9 February 2008. The dots indicate the locations of the centers of the vorticity features. The high-resolution (T799) mean sea level pressure (MSLP) field is also shown as contours over the top. The positions of the cyclones identified with the vorticity field clearly correspond well with the positions of the low pressure centers. A number of weaker vorticity features are also identified that are not apparent in the MSLP field. This is in fact one of the advantages of using vorticity rather than MSLP for cyclone identification: more features are identified with vorticity, allowing more robust statistics to be produced, and the cyclones are usually identified at an earlier stage of development than with the MSLP field. Another advantage is that vorticity is less sensitive to the background flow. A similar correspondence between the vorticity features and low pressure centers was also found for other time periods (not shown). For further details and discussion of the uses and advantages of vorticity in identifying and tracking cyclones, see Hoskins and Hodges (2002).
Once the cyclones were identified, the tracking was performed, which involves the minimization of a cost function (Hodges 1995) to obtain smooth trajectories (cyclone tracks). As a final step, only those cyclone tracks that lasted at least 2 days and traveled farther than 1000 km were retained for the statistical analysis. This method of identification and tracking has been used extensively in other studies of extratropical cyclones (e.g., Bengtsson et al. 2005, 2009; Hoskins and Hodges 2002, 2005).
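The final retention step can be sketched as below. This is an illustration under assumptions: a track is a list of 6-hourly (lat, lon) points, its lifetime is taken as (number of points − 1) × 6 h, and "traveled farther than 1000 km" is interpreted as the great-circle separation between the first and last points; the original criterion may use a different distance measure. The Earth radius of 6371 km and the function names are likewise assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(p, q):
    """Haversine great-circle distance (km) between (lat, lon) points in degrees."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def keep_track(points, step_h=6, min_lifetime_h=48, min_distance_km=1000.0):
    """Retain a track lasting at least 2 days and traveling farther than 1000 km.

    points: list of (lat, lon) at consecutive 6-hourly times.
    Distance is measured between the first and last points (an assumption).
    """
    lifetime_h = (len(points) - 1) * step_h
    if lifetime_h < min_lifetime_h:
        return False
    return great_circle_km(points[0], points[-1]) > min_distance_km
```

For example, nine 6-hourly points moving 1.5° of longitude per step along the equator span 48 h and about 1330 km, so the track is kept; the same lifetime at 1° per step covers only about 890 km and is discarded.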
The identification and tracking were also performed with the ECMWF operational analyses for the selected time period to use for the verification. Ideally, it would have been fairer to verify each EPS against its own analysis data. However, we currently only have access to the ECMWF analysis data at the 6-hourly frequency required for the tracking. This means the results presented may be slightly biased in ECMWF’s favor. The use of different analyses for verification will be investigated in future work.
All of the results of this paper concerning the intensity of the cyclones use the T42 postprocessed vorticity field. Since forecasters generally use the MSLP field, these results may therefore be a little difficult to interpret for use operationally. The reader is referred to the Hodges et al. (2003) study, in which the tracking is performed on reanalysis data with both the MSLP and vorticity fields. That study includes a distribution of cyclone intensity in terms of vorticity compared with MSLP and may therefore help a forecaster to interpret the results concerning the cyclone intensity included in this paper. Further discussion of this issue is provided at the end of this paper, including plans to address the applications of the results to potential users in future work.
b. Matching methodology
To validate the ensemble forecast cyclone tracks against the analysis cyclone tracks, it is necessary to have a systematic method of determining which forecast cyclone tracks correspond to which analysis cyclone tracks. The matching methodology of Froude et al. (2007b) was used, in which a forecast cyclone track was considered to be the same system as an analysis cyclone track (i.e., matched) if the two tracks met certain predefined spatial and temporal criteria. A forecast track was said to match an analysis track if the following held true:
(i) at least T% of their points coincided in time, that is, $100 \times 2n_m/(n_A + n_F) \ge T$, where $n_A$ and $n_F$ denote the total number of points in the analysis and forecast tracks, respectively, and $n_m$ denotes the number of points in the analysis track that coincide in time with the forecast track; and
(ii) the geodetic separation distance d between the first k points of the forecast track that coincide in time with the analysis track and the corresponding points in the analysis track was no greater than S°, that is, $d \le S$.
The forecast tracks that matched analysis tracks were used to generate diagnostics concerning the position, intensity, and other properties of the cyclones. In Froude et al. (2007a,b), the sensitivity of the diagnostics to the choice of parameters k, T, and S was explored in detail. They found that, although the number of forecast cyclone tracks that matched the analysis tracks varied with different choices of the parameters, the diagnostics produced from the matched tracks were basically unaffected. For this study all the results were obtained using the parameters k = 4, T = 60%, and S = 4°.
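The matching criteria (i) and (ii), with k = 4, T = 60%, and S = 4°, can be sketched as follows, together with the genesis constraint described in the next paragraph. This is a hypothetical illustration rather than the code of Froude et al. (2007b). It assumes tracks are dictionaries mapping valid time (h) to (lat, lon), takes d to be the mean geodesic separation over the first k coinciding points (the text does not state how the k separations are combined), and uses all coinciding points if there are fewer than k.

```python
import math


def angular_sep_deg(p, q):
    """Geodesic (great-circle) separation in degrees between (lat, lon) points."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def tracks_match(analysis, forecast, forecast_start_h, k=4, T=60.0, S=4.0,
                 max_genesis_lead_h=72):
    """Return True if a forecast track matches an analysis track.

    analysis, forecast: dicts mapping valid time (h) -> (lat, lon).
    forecast_start_h: valid time (h) at which the forecast was started.
    """
    # Genesis constraint: the forecast cyclone must exist by day 3 of the forecast.
    if min(forecast) - forecast_start_h > max_genesis_lead_h:
        return False
    common = sorted(set(analysis) & set(forecast))
    n_a, n_f, n_m = len(analysis), len(forecast), len(common)
    # Criterion (i): at least T percent of points coincide in time.
    if n_m == 0 or 100.0 * 2 * n_m / (n_a + n_f) < T:
        return False
    # Criterion (ii): assumed mean separation over the first k coinciding points.
    first_k = common[:k]
    d = sum(angular_sep_deg(analysis[t], forecast[t]) for t in first_k) / len(first_k)
    return d <= S
```

A forecast track that is displaced 1° from the analysis and overlaps it for 9 of 11 points passes both tests; the same track displaced 6° fails criterion (ii).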
As an additional constraint, only those cyclones whose genesis occurs within the first 3 days of the forecast or that already existed at time 0 were considered. Results from the study of Bengtsson et al. (2005) indicated that the skill in predicting cyclone tracks after 3 days is relatively low. If a cyclone is generated in a forecast at a lead time (the time since the start of the forecast) greater than 3 days and matches a cyclone in the analysis, the match is probably due more to chance than to accurate prediction. Although this may not be the case for the more recent forecast and analysis systems used in this and the previous studies of Froude et al. (2007b) and Froude (2009), this constraint is kept so that the methodology is consistent with Froude et al. (2007a).
The number of ensemble members that match will vary for different cyclones and forecast start times. In addition, the cyclone tracks of the different ensemble members will be of different lengths, so the number of ensemble member tracks available decreases with increasing lead time. The statistics in section 4 therefore only include those data points where at least five perturbed member tracks are available, since values calculated from fewer than five members would not be very informative. For further discussion of this constraint, see Froude et al. (2007b).
Table 2 shows the number of data points available for inclusion in the statistics for each of the different EPSs as a function of forecast lead time. For each lead time a particular cyclone can be included in the statistics multiple times, but from forecasts of different start times and with the cyclone at a different stage of development [see Froude et al. (2007a) for detailed discussion of this]. The numbers in brackets in the table therefore show the number of distinct cyclones included at each lead time. A total of 774 cyclone tracks were identified in the ECMWF analysis over the 6-month time period examined. The number of data points (and the number of distinct cyclones) available for inclusion in the statistics increases until the day 3 lead time from which point it then decreases. This is because of the constraint that only those cyclones whose genesis occurs within the first 3 days of the forecast or that already existed at time 0 were considered (see above). The number of data points will therefore increase until the day 3 lead time since any new cyclones generated will be included as well as those that already existed at earlier lead times. After this point any new cyclones will not be included and the number of data points will decrease due to lysis (most cyclones last 5 days or less). This decrease in data points means that the statistics of the following section cannot be shown beyond the day 7 lead time and some cannot be shown beyond day 5.
a. Example cyclones
As with the previous studies using the cyclone tracking analysis approach, it is useful to consider a couple of examples of individual cyclones before presenting the statistical analysis. Figure 2 shows the tracks and intensities predicted by the control forecasts of each of the different EPSs for both a Pacific and an Atlantic cyclone. The ECMWF-analyzed track and intensities are also shown and the Pacific cyclone is labeled with a cross in Fig. 1 (the Atlantic cyclone does not occur during the time period in Fig. 1).
The analyzed Pacific cyclone originated over Japan’s Shikoku Island at 0000 UTC 4 February 2008. It then traveled rapidly across the Pacific Ocean, intensifying over the next 4 days and reaching its maximum intensity of 10.9 × 10⁻⁵ s⁻¹ at 1800 UTC 7 February. The cyclone remained very intense for 2 days, before decaying rapidly over the next 1.5 days and reaching the coast of British Columbia, Canada, at 0000 UTC 11 February.
The control forecasts started at 1200 UTC 4 February, 12 h after the cyclone is first identified in the ECMWF analysis, are shown in Figs. 2a and 2b. This was the earliest time after the cyclone had been identified in the ECMWF analysis for which there is a forecast available from every weather center. The CMA, JMA, and CPTEC control forecasts predict the track of the cyclone most accurately. The ECMWF track is broken in two at day 2.5 of the forecast, and the NCEP and UKMO tracks are cut short at this point in the forecast. This is caused by a large jump in the position of the cyclone, meaning that the smoothness constraints of the tracking algorithm are not met [see Hodges (1995, 1999) for details of these constraints]. Broken tracks can also occur when a feature splits and a double center is formed. Work is currently under way to address the issue of broken tracks. It is, for example, possible to relax the smoothness constraints (although this could result in some spuriously elongated tracks). However, this is unlikely to make any significant difference to the statistical results presented in the following sections of this paper, since the number of broken tracks is very small compared to the total number of tracks included in the statistics. The CMC, KMA, BoM, and ECMWF forecasts diverge to the right of the analyzed track from day 2 or 3 of the forecast. For the intensity of the cyclone, the CMA and CPTEC forecasts provide the most accurate predictions, although they do not intensify fast enough between days 2 and 3. The JMA forecast overpredicts the intensity, and the other centers’ control forecasts underpredict it. Overall, therefore, the CMA and CPTEC control forecasts provide the most accurate predictions of this particular cyclone.
The analyzed Atlantic cyclone, shown in Figs. 2c and 2d, formed over North America at 0000 UTC 22 February 2008. It then traveled across the Atlantic, intensifying rapidly over the next 3 days before reaching its maximum amplitude of 11.9 × 10⁻⁵ s⁻¹ at 0600 UTC 25 February. The cyclone then moved north of the British Isles, over Scandinavia, and just into Russia while decaying over the next 3.5 days.
The control forecasts started at 1200 UTC 22 February, 12 h after the cyclone is first identified in the ECMWF analysis, are shown in Figs. 2c and 2d. Again, this is the earliest time after the cyclone had been identified in the ECMWF analysis for which there is a forecast available from every weather center. The track of the cyclone is predicted very well by all the centers until about day 4 of the forecast, at which point the forecast tracks begin to diverge from the analyzed track. Some of the forecast cyclones travel considerably farther into Russia than the analyzed cyclone. As with the cyclone tracks, the intensities are more accurately predicted than for the Pacific cyclone, with ECMWF and KMA having the highest level of skill. The CMA control forecast overpredicts the intensity of this particular cyclone. The other centers underpredict the intensity, but to a far lesser degree than for the Pacific cyclone.
Figure 3 shows the ensemble mean track and intensity of the Pacific and Atlantic cyclones for each of the different EPSs. The ensemble mean track and intensity values are calculated by taking the mean of all the ensemble member tracks and intensities, rather than by computing the track and intensities from the ensemble mean of the vorticity field. Considering first the Pacific cyclone, overall the ensemble mean provides a superior forecast to that of the control for all the different EPSs. However, all the EPSs underpredict the intensity of the cyclone. The JMA ensemble mean forecast does not overpredict the intensity like its control forecast, and the CMA ensemble mean forecast provides a better prediction of the growth of the cyclone between days 2 and 4 of the forecast. The NCEP, UKMO, and ECMWF ensemble mean tracks are not broken or cut short like the control tracks. It is noted that some of the individual ensemble member tracks could potentially be broken and that these tracks may or may not be included in the calculation of the ensemble mean track. This will depend on whether they meet the 2-day lifetime constraint (see section 3a) and the matching criteria (see section 3b). It is also possible that a broken track that does not satisfy one or both of these requirements may bring the number of ensemble member tracks down below the threshold of 5 (see section 3b), resulting in a truncated ensemble mean track. However, as stated previously, the number of such tracks is comparatively small and will therefore have no significant effect on the statistical results of the following sections.
For the Atlantic cyclone, there is less of a difference in skill between the ensemble mean and control forecasts. There is some improvement in the prediction of the intensity of the cyclone from the control to the ensemble mean. However, CMA still overpredicts the intensity. The relative performance of the different EPSs varies considerably between the two cyclones. This highlights the importance of performing a statistical analysis of a large number of cyclones to assess the skill and determine the strengths and weaknesses of the different EPSs, which will be addressed in the remainder of this paper.
b. Relative skill of the ensemble mean
In this section the relative skill of the different EPSs in predicting cyclone position, intensity, and propagation speed is considered by looking at the ensemble mean error. To calculate the ensemble mean error, the mean track, mean intensity, and mean propagation speed of the matching ensemble member tracks (including the control) were computed for each cyclone in each ensemble forecast at each forecast lead time.
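The ensemble mean calculation, combined with the minimum of five perturbed member tracks from section 3b, can be sketched as below. Averaging positions through three-dimensional unit vectors is our assumption (it avoids errors when member longitudes straddle 0° or 180°); the text does not specify how the mean position is formed. Member tracks are hypothetical dictionaries mapping valid time (h) to (lat, lon, intensity), and the function names are illustrative.

```python
import math


def _to_xyz(lat, lon):
    """Unit vector on the sphere for a (lat, lon) point in degrees."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def _to_latlon(x, y, z):
    """(lat, lon) in degrees of the direction of vector (x, y, z)."""
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))


def ensemble_mean_track(perturbed, control=None, min_perturbed=5):
    """Mean position and intensity at each valid time.

    perturbed: list of matched perturbed-member tracks, each a dict
    {time_h: (lat, lon, intensity)}; control: one such dict or None.
    Times with fewer than min_perturbed perturbed tracks are omitted.
    """
    times = sorted(set().union(*perturbed))
    mean = {}
    for t in times:
        pts = [m[t] for m in perturbed if t in m]
        if len(pts) < min_perturbed:
            continue  # too few members for an informative mean
        if control is not None and t in control:
            pts.append(control[t])  # the control is included in the mean
        vecs = [_to_xyz(lat, lon) for lat, lon, _ in pts]
        n = len(pts)
        lat, lon = _to_latlon(*(sum(c) / n for c in zip(*vecs)))
        mean[t] = (lat, lon, sum(p[2] for p in pts) / n)
    return mean
```

With members at 10°E and 350°E on the same latitude, the vector mean places the center near 0° rather than at 180°, which a naive average of longitudes would give.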
The mean error in position was then calculated as the mean geodetic separation distance between the mean tracks and the corresponding ECMWF analysis tracks. The mean intensity error was calculated similarly, from the fully processed filtered vorticity value at the feature points, using the absolute intensity difference as the measure of error. The propagation speeds of the analysis and ensemble member cyclones were calculated at each point on their tracks by comparing the position of consecutive points on the tracks. Since the points on the tracks are 6 h apart, the speed calculated at each point corresponds to the average propagation speed of the cyclone in the next 6 h. An alternative approach would be to calculate the speed using a ±6 h span around each point, but this would not have any significant impact on the results. The mean propagation speed error was then calculated in the same way as the mean position and intensity error, but using the absolute propagation speed difference as a measure of error. As with the position error, the speed error is calculated along great circles and therefore avoids any biases introduced by working with projections.
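The position and propagation speed errors described above can be sketched as follows. This minimal illustration assumes a spherical Earth of radius 6371 km, haversine great-circle distances, and tracks given as lists of (lat, lon) at 6-hourly intervals whose forecast and analysis points are already paired in time; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
STEP_H = 6.0              # spacing of track points (h)


def great_circle_km(p, q):
    """Haversine great-circle distance (km) between (lat, lon) points in degrees."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def forward_speeds_kmh(track):
    """Speed at each point = great-circle distance to the next point / 6 h.

    The final point has no following point, so the result is one shorter.
    """
    return [great_circle_km(track[i], track[i + 1]) / STEP_H for i in range(len(track) - 1)]


def position_errors_km(forecast, analysis):
    """Great-circle separation between time-paired forecast and analysis points."""
    return [great_circle_km(f, a) for f, a in zip(forecast, analysis)]


def abs_speed_errors_kmh(forecast, analysis):
    """|forecast speed - analysis speed| at each time-paired point."""
    fc, an = forward_speeds_kmh(forecast), forward_speeds_kmh(analysis)
    return [abs(f - a) for f, a in zip(fc, an)]
```

Averaging these per-point errors over all cyclones at a given lead time gives the mean errors; a ±6 h centered difference could replace the forward difference in `forward_speeds_kmh`.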
Figure 4 shows the mean errors in position, intensity, and propagation speed for each EPS. First, considering the position of the cyclones, there is a large difference in the predictive skill of the different EPSs. ECMWF shows the highest level of skill, although there may be some bias because all the EPSs were verified against the ECMWF analysis. JMA, NCEP, UKMO, and CMC have approximately 1 day less skill than ECMWF throughout the forecast. CPTEC has the least skill, but this is perhaps to be expected since the NH extratropics are not the focus of its ensemble design (for example, it applies perturbations only in the region 45°S–30°N). The CPTEC EPS might be expected to perform more competitively in the Southern Hemisphere (this will be explored in future work).
For the intensity of the cyclones, the relative skill of the different EPSs generally remains the same. That is, EPSs with smaller (larger) errors in position generally have smaller (larger) errors in intensity. However, NCEP has a larger error in intensity relative to the other EPSs than it does for position. For position, NCEP has errors comparable to those of CMC, UKMO, and JMA, but for intensity its errors are comparable to those of CMA and KMA. The error growth is faster initially for NCEP, CPTEC, and BoM. This is perhaps because these EPSs are integrated at low resolutions and are not able to accurately capture the cyclones’ growth and decay (see, e.g., Jung et al. 2006). However, the CMC EPS is also integrated at a low resolution and does not have this rapid error growth. Perhaps the use of four-dimensional variational data assimilation (4DVAR) is compensating for this by providing a better initial state (e.g., Johnson et al. 2006).
The mean error in propagation speed is large throughout the forecast for all the EPSs, ranging from around 8 to 16 km h⁻¹. It should be noted that the speed error is different in nature from the position or intensity error in that it would not necessarily be expected to grow with lead time. However, a consistent error in speed will have a cumulative effect on the position of the cyclone with increasing lead time. The relative skill of the different EPSs is similar to that for the position error. It was only possible to plot the propagation speed error to day 5 as there were insufficient data beyond this point for this particular diagnostic.
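To indicate the scale of this cumulative effect, suppose, purely for illustration, that a speed error of 8 km h⁻¹ (the lower end of the range above) kept the same sign throughout a 5-day (120 h) forecast and was not offset by any other error. The resulting along-track displacement would be

```latex
\Delta s = \Delta v \, t = 8~\mathrm{km\,h^{-1}} \times 120~\mathrm{h} = 960~\mathrm{km},
```

and 1920 km at the upper end of 16 km h⁻¹. Actual speed errors vary in sign and magnitude from case to case, so this only indicates the order of magnitude involved and is not an estimate of the position errors shown in Fig. 4a.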
c. Intensity and propagation speed bias
In this section the biases of the intensity and propagation speed errors presented in the previous section are explored. It is not possible to compute a bias for the position error shown in Fig. 4a, since a separation distance is always nonnegative and has no sign. Figure 5a shows the signed intensity difference between the mean intensity of the ensemble member tracks and the intensity of the analysis tracks for each EPS (i.e., the bias of the error shown in Fig. 4b). CMC has the smallest bias (not exceeding 0.5 × 10⁻⁵ s⁻¹), and ECMWF, CMA, JMA, and KMA also all have small biases. ECMWF is the only system that consistently overpredicts the intensity of cyclones. JMA and KMA underpredict, and CMA and CMC vary, but the biases of all of these systems are very small. On the other hand, BoM, NCEP, CPTEC, and UKMO all significantly underpredict cyclone intensity. BoM, NCEP, and CPTEC in particular show a dramatic increase in negative bias in the earlier part of the forecast (shorter lead time). This corresponds to the rapid error growth exhibited by these systems in the earlier part of the forecast (Fig. 4b).
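The difference between the mean absolute error of Fig. 4b and the signed bias of Fig. 5a can be made concrete with a short sketch. The input layout (a list of (lead time, forecast value, analysis value) tuples, for example intensities in units of 10⁻⁵ s⁻¹) and the function name are hypothetical.

```python
from collections import defaultdict


def error_and_bias_by_lead(samples):
    """samples: iterable of (lead_h, forecast_value, analysis_value).

    Returns {lead_h: (mean absolute error, mean signed difference)}.
    A negative bias means the forecasts underpredict on average.
    """
    diffs = defaultdict(list)
    for lead_h, fc, an in samples:
        diffs[lead_h].append(fc - an)  # signed difference, forecast minus analysis
    return {
        lead: (sum(abs(d) for d in ds) / len(ds), sum(ds) / len(ds))
        for lead, ds in sorted(diffs.items())
    }
```

Where over- and underpredictions cancel, the bias is zero although the absolute error is not, which is why the two diagnostics carry different information.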
Figure 5b shows the signed propagation speed difference between the mean propagation speed of the ensemble member tracks and the propagation speed of the analysis tracks (i.e., the bias of the error shown in Fig. 4c). It is interesting that all of the EPSs underpredict the propagation speed of the cyclones; hence, cyclones will in general arrive earlier than forecast. The magnitude of the bias varies between the centers, with BoM and UKMO having the largest and CPTEC, CMA, and ECMWF the smallest. A similar negative bias was found for the control forecasts of each EPS (not shown). This suggests that the bias is due to a deficiency in the models rather than in the perturbation methodologies.
The curves in Figs. 5a and 5b have an oscillating pattern, which appears to be related to the frequency with which the EPS suites are run. NCEP has the smoothest curves, as it produces forecasts every 6 h, and the JMA curves have a 24-h oscillation corresponding to the once-daily frequency at which they were produced. This pattern was noted for the ECMWF EPS by Froude (2009) and was attributed to the data assimilation. A larger number of observations are assimilated at 0000 and 1200 UTC than at 0600 and 1800 UTC; in particular, radiosonde observations are only assimilated at 0000 and 1200 UTC. It was suggested that the extra observations assimilated into the analysis at these times would improve its accuracy but would also nudge it farther away from the forecast than the smaller number of observations assimilated at 0600 and 1800 UTC, resulting in larger bias values and the 12-h oscillation seen in the results. However, the results for the EPSs produced at frequencies other than the 12-hourly frequency used at ECMWF and most of the other centers suggest that the frequency at which the forecasts are produced plays a larger role than the data assimilation.
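The sampling argument can be made concrete with a short Python sketch that lists, for each lead time, the UTC hours at which forecasts started at a given set of initialization times verify. With 12-hourly starts, successive 6-h lead times alternate between verifying at 0000/1200 UTC and at 0600/1800 UTC, giving a 12-h period; with 6-hourly starts every lead time mixes all four verification times; and with a single daily start the pattern repeats only every 24 h. The initialization hours are illustrative assumptions (in particular, the 1200 UTC start assumed for the once-daily suite), and the function name is hypothetical.

```python
def verifying_hours(init_hours, lead_hours):
    """Sorted UTC hours at which runs started at init_hours verify after lead_hours."""
    return sorted({(h + lead_hours) % 24 for h in init_hours})


# Initialization times are illustrative assumptions.
suites = {
    "12-hourly (0000, 1200 UTC)": [0, 12],
    "6-hourly (0000, 0600, 1200, 1800 UTC)": [0, 6, 12, 18],
    "24-hourly (assumed 1200 UTC)": [12],
}

for name, inits in suites.items():
    print(name)
    for lead in range(0, 48, 6):
        print(f"  +{lead:2d} h verifies at {verifying_hours(inits, lead)} UTC")
```

Because each lead time of the 6-hourly suite averages over all verification hours, any dependence of the analysis on the time of day is smoothed out, whereas the less frequent suites sample it systematically.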
d. Control and ensemble mean error
One of the aims of an ensemble prediction system is for the ensemble mean to provide a superior forecast to the control (Leith 1974; Toth and Kalnay 1993, 1997). In this section the skill of the control forecast is compared with that of the ensemble mean for each EPS for cyclone position, intensity, and propagation speed.
Figures 6–8 show the control and ensemble mean error in cyclone position, intensity, and propagation speed, respectively, for each EPS (the spread of the ensemble is also shown in the figures and is discussed in the following subsection). It is clear from Fig. 6 that the ensemble mean provides very little advantage over the control forecast in predicting the positions of cyclones. A small difference can be seen from around day 4 for CMA, and perhaps for CMC and KMA, but for most of the EPSs there is very little difference at all.
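A minimal Python sketch of the position comparison is given below. It assumes that the ensemble mean position at a given lead time is obtained by averaging the member cyclone positions as unit vectors on the sphere (which avoids problems at the date line) and that position errors are great-circle distances; these choices, the function names, and the example coordinates are assumptions for illustration rather than the exact procedure used here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical Earth assumed


def to_xyz(lat, lon):
    """Unit vector for a (lat, lon) position given in degrees."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def mean_position(points):
    """Ensemble-mean position: average of unit vectors, projected back to the sphere."""
    xs, ys, zs = zip(*(to_xyz(lat, lon) for lat, lon in points))
    x, y, z = sum(xs), sum(ys), sum(zs)
    norm = math.sqrt(x * x + y * y + z * z)
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    return lat, math.degrees(math.atan2(y, x))


def position_error_km(p, q):
    """Great-circle distance in km between two (lat, lon) positions."""
    (x1, y1, z1), (x2, y2, z2) = to_xyz(*p), to_xyz(*q)
    # atan2(|cross|, dot) is accurate for both small and large separations
    cx, cy, cz = y1 * z2 - z1 * y2, z1 * x2 - x1 * z2, x1 * y2 - y1 * x2
    cross = math.sqrt(cx * cx + cy * cy + cz * cz)
    return EARTH_RADIUS_KM * math.atan2(cross, x1 * x2 + y1 * y2 + z1 * z2)


# Hypothetical positions (lat, lon) of one cyclone at a single lead time
analysis = (50.0, -20.0)
control = (50.5, -19.0)
members = [(51.0, -19.0), (50.0, -18.5), (50.5, -20.0), (49.5, -19.5)]

ens_mean = mean_position(members)
print(f"control error: {position_error_km(control, analysis):.0f} km")
print(f"ensemble-mean error: {position_error_km(ens_mean, analysis):.0f} km")
```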
For the intensity of the cyclones, much more of a difference can be seen between the control and the ensemble mean (Fig. 7). From around day 2 of the forecast, the ensemble mean begins to provide an advantage over the control forecast for all the EPSs except CPTEC. As already discussed, CPTEC only applies perturbations in the region 45°S–30°N, so it is unsurprising that there is less impact in the NH extratropics. CMA, CMC, ECMWF, JMA, KMA, and UKMO all have similar differences in skill between their ensemble mean and control forecasts; for all these EPSs, the skill of the control at day 5 of the forecast is comparable to the skill of the ensemble mean at day 7. There is less of a difference between the control and ensemble mean for BoM and NCEP, which, as mentioned in section 4b in relation to their faster intensity error growth, have low spatial resolutions.
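A statement such as "the skill of the control at day 5 is comparable to that of the ensemble mean at day 7" amounts to a lead-time gain, which can be estimated by finding the lead time at which the ensemble mean error curve reaches the control error at a given day. The Python sketch below does this by linear interpolation; the function names are hypothetical, and the error values are invented for illustration (in units of 10−5 s−1) rather than read from Fig. 7.

```python
def lead_time_at_error(leads, errors, target):
    """Lead time at which an increasing error curve first reaches `target`.

    Uses linear interpolation between points; returns None if the curve
    never reaches the target within the forecast range.
    """
    points = list(zip(leads, errors))
    for (l0, e0), (l1, e1) in zip(points, points[1:]):
        if e0 <= target <= e1:
            if e1 == e0:
                return l0
            return l0 + (target - e0) * (l1 - l0) / (e1 - e0)
    return None


def lead_time_gain(leads, control_err, ensmean_err, day):
    """Days of extra lead time the ensemble mean gains over the control at `day`."""
    target = control_err[leads.index(day)]
    t = lead_time_at_error(leads, ensmean_err, target)
    return None if t is None else t - day


# Hypothetical intensity errors (1e-5 s^-1) at days 0-7
days = [0, 1, 2, 3, 4, 5, 6, 7]
control = [0.5, 1.0, 1.6, 2.2, 2.8, 3.4, 3.9, 4.3]
ens_mean = [0.5, 0.9, 1.4, 1.9, 2.4, 2.8, 3.1, 3.4]
print(lead_time_gain(days, control, ens_mean, 5))
```

With these invented curves the control error at day 5 is reached by the ensemble mean only at day 7, a gain of 2 days; the gain cannot be computed at day 7 itself because the ensemble mean curve does not reach the control error within the forecast range.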
Finally, considering the propagation speed (Fig. 8), the ensemble mean has a higher level of skill than the control forecast for all of the EPSs. CMA, CMC, and KMA have the largest differences between the ensemble mean and control. It is interesting that CMA and KMA are the only two EPSs that use BV perturbations and that CMC is the only EPS that uses an EnKF.
e. Ensemble spread and ensemble mean error
In this final results section, the ensemble spread is compared with the ensemble mean error for each of the EPSs. The spread is calculated as the average difference of the ensemble members from the ensemble mean. For an EPS to be statistically reliable, the ensemble mean error should, on average, be equal to the ensemble spread. Figure 6 shows the ensemble spread and ensemble mean error for cyclone position for each of the EPSs. ECMWF and JMA have the best spread–skill relationship. This is interesting since the two systems have very similar characteristics (see Table 1): both EPSs have 50 members, use SV perturbations and 4DVAR, and have similar horizontal and vertical resolutions. However, the ECMWF EPS includes model perturbations whereas the JMA EPS does not. In fact, when ECMWF first introduced model perturbations to its EPS, one of the impacts was an increase in the spread of the ensemble [see Buizza et al. (1999), which assessed this using more traditional field-based verification measures]. It is therefore interesting that the JMA EPS has a relatively large spread without model perturbations. The other EPSs are all underdispersive to varying degrees. For CMC and CMA there is quite a small difference between the spread and mean error curves, whereas for BoM there is a very large difference. Once again, the extremely large difference seen for CPTEC is to be expected, given the limited region to which its perturbations are applied.
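The spread–skill comparison can be sketched in Python for a scalar property such as intensity. Following the definition above, the spread is the average absolute difference of the members from the ensemble mean, and the ensemble mean error is the average absolute difference between the ensemble mean and the analysis; summarizing the two as a spread/error ratio (below 1 for an underdispersive ensemble) is an assumption made here for convenience, and the function names and example values are hypothetical.

```python
def ensemble_mean(members):
    return sum(members) / len(members)


def spread(members):
    """Average absolute difference of the members from the ensemble mean."""
    m = ensemble_mean(members)
    return sum(abs(x - m) for x in members) / len(members)


def spread_skill(cases):
    """Mean spread, mean ensemble-mean error, and their ratio over a set of cases.

    `cases` is a list of (member_values, analysis_value) pairs at one lead time.
    A ratio below 1 indicates an underdispersive ensemble.
    """
    spreads = [spread(members) for members, _ in cases]
    errors = [abs(ensemble_mean(members) - obs) for members, obs in cases]
    mean_spread = sum(spreads) / len(spreads)
    mean_error = sum(errors) / len(errors)
    return mean_spread, mean_error, mean_spread / mean_error


# Hypothetical intensities (1e-5 s^-1): members cluster too tightly and too weak
cases = [([5.0, 6.0, 7.0], 8.0), ([4.0, 4.5, 5.0], 6.0)]
s, e, ratio = spread_skill(cases)
print(f"spread = {s:.2f}, error = {e:.2f}, ratio = {ratio:.2f}")
```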
For cyclone intensity (Fig. 7), the differences between the mean error and the spread are much larger than for position. This was found previously for the ECMWF EPS (Froude et al. 2007b). All the EPSs are underdispersive. In slight contrast to the position results, it is ECMWF and CMC that have the smallest differences between their mean error and spread, with JMA having a slightly larger difference. It seems that model perturbations have more of an impact on the spread in cyclone intensity than on that in position. Again, the BoM EPS is very underdispersive, but here so is NCEP. This again possibly highlights the importance of resolution in predicting cyclone intensity.
As with intensity, Fig. 8 shows that all the EPSs are underdispersive for propagation speed, and to an even greater extent. CMC has the best spread–skill relationship (i.e., the spread and ensemble mean error curves are closest together) for this diagnostic, with ECMWF and JMA not performing as well as for position and intensity.
5. Discussion and conclusions
This paper has analyzed the prediction of extratropical cyclones by nine different EPSs archived as part of the TIGGE program for the 6-month time period of 1 February 2008–31 July 2008. This period was chosen because it was the first 6-month period for which all of the data were available. If a different period were analyzed, such as a December–February (DJF) season containing potentially more intense winter cyclones, the results might differ slightly. It is also important to note that since all the EPSs are subject to continuous development, their performance characteristics are likely to change in the future as the systems evolve.
The results for the selected time period show large differences between the EPSs. The relative performance of the different EPSs varies with the measure of ensemble performance (i.e., forecast skill of the ensemble mean or the spread–skill relationship) and with the cyclone property (i.e., position or intensity). This highlights the importance of using a variety of verification measures when assessing the skill of a forecasting system. It also illustrates how different EPSs may be useful for different applications. For example, an oil company may require forecast information on cyclones for managing its operations. If an intense cyclone is forecast to strike a region where operations are being carried out, then it may be more important for the company to know accurately when the cyclone is likely to strike than to know its exact intensity. Shutting down operations too early is extremely costly, whereas shutting them down too late can be both costly and result in loss of life. It is also useful to know any biases in the system so that they can be taken into account when an operational forecaster interprets the forecast data. For example, if the ensemble is generally underdispersive, then the ensemble mean is likely to have a larger error than the ensemble spread suggests; or, if cyclones generally propagate too slowly in the model, then this should be taken into account when predicting the time at which a cyclone is likely to strike a particular location. This paper finishes with a discussion of the main results and plans for future work.
Overall, the relative performance of the different EPSs in predicting cyclones seems to agree with the results obtained using more conventional forecast verification measures (Park et al. 2008). The ECMWF EPS has the highest level of skill for both its ensemble mean and its control forecast for cyclone position, intensity, and propagation speed. As mentioned previously, there may be some bias in its favor because all the EPSs have been verified against the ECMWF analyses. For cyclone position, the JMA, NCEP, UKMO, and CMC ensemble means have approximately 1 day less skill than ECMWF throughout the forecast. For intensity, the relative performance of the different EPSs remains the same, except that NCEP performs less well for intensity than for position. Closer examination of the results reveals that the intensity error grows much faster in the earlier part of the forecast for both the control and the ensemble mean of the NCEP, CPTEC, and BoM EPSs. The spread of these EPSs is far too small for cyclone intensity, and the ensembles significantly underpredict intensity, with a dramatic increase in negative bias in the earlier part of the forecast (see Fig. 5). This is perhaps because these EPSs have lower horizontal and vertical resolutions than the other EPSs and are not able to accurately capture the cyclones’ structure. Sufficient vertical and horizontal resolutions are necessary to accurately predict the tilted structure of baroclinic systems, which is critical to the cyclones’ growth and decay. The CMC EPS also has low resolution but performs much better at predicting cyclone intensity. This is possibly due to its use of a 4DVAR data assimilation system and the EnKF perturbation approach.
In 4DVAR assimilation systems, a model trajectory is fitted to the observations taken throughout the assimilation window, so these systems may better represent the vertical structure and evolution of baroclinic systems in the earlier part of the forecast (Johnson et al. 2006). In the EnKF perturbation approach of CMC, the observations are perturbed randomly and then reassimilated with 4DVAR. The tilted structure of the cyclones may therefore be better predicted by both the control and the perturbed ensemble members despite the comparatively low resolution.
Results comparing the skill of the ensemble mean with that of the control forecast show that, for cyclone position, there is very little difference between the two for any of the EPSs. For cyclone intensity, however, the ensemble mean does offer a significant improvement in skill for all EPSs except CPTEC. At day 7 of the forecast, the ensemble mean has 2 days more skill than the control for all of these EPSs except BoM and NCEP. This shows that running an ensemble gives a more accurate prediction of cyclone intensity but no real gain in skill for cyclone position (although the spread–skill results show that the EPSs do provide useful probabilistic information, giving a measure of confidence in the predicted cyclone position on a case-by-case basis). This is useful information: if an accurate prediction of cyclone position is of most importance to a user, they may be better off using a high-resolution deterministic forecast than an EPS. For cyclone propagation speed, the ensemble mean has a higher level of skill than the control forecast for all of the EPSs, with the largest differences for CMA, CMC, and KMA. It is noted that CMA and KMA are the only two EPSs that use BV perturbations and CMC is the only EPS that uses an EnKF.
A good spread–skill relationship is a desirable property of an EPS, since the spread of the ensemble can then provide an indication of the current predictability of the atmosphere and of the magnitude of the error of the ensemble mean. For cyclone position, ECMWF and JMA have the highest performance; both ensembles are only slightly underdispersive. As discussed in section 4e, these two EPSs have very similar characteristics except that ECMWF perturbs its forecast model physics whereas JMA does not. The other EPSs are all underdispersive to varying degrees. For cyclone intensity, there is a much greater difference between the ensemble mean error and the spread for all of the EPSs. This was also found previously for the ECMWF and NCEP EPSs (Froude et al. 2007b; Froude 2009). Here, it is ECMWF and CMC that have the highest performance, with JMA not performing quite as well. This is very interesting since ECMWF and CMC both perturb their forecast model physics. It seems that forecast model physics perturbations have more impact on increasing the spread in cyclone intensity than in position. This is probably to be expected, since cyclone position will depend more on the large-scale steering-level flow than on the smaller-scale parameterized processes that are perturbed, whereas cyclone intensity will be much more influenced by these smaller-scale processes. For cyclone propagation speed, it is CMC that has the best spread–skill relationship, with ECMWF and JMA not performing as well (particularly in the earlier part of the forecast).
ECMWF is the only EPS to consistently overpredict the intensity of cyclones, although this bias is small in magnitude. BoM, NCEP, CPTEC, and UKMO significantly underpredict cyclone intensity. It is worth pointing out that this and some of the other results may be influenced by biases or deficiencies in the ECMWF analysis (see, e.g., Bengtsson et al. 2009). This will be an issue with any verification dataset, which is why plans for future work include verifying the EPSs against alternative analyses in order to obtain a broader picture (discussed above). In particular, there may be some positive bias in the results for the ECMWF EPS, but this is likely to be confined to the earlier part of the forecast (Bengtsson et al. 2005).
All the EPSs have a bias (of varying magnitude) toward cyclones propagating too slowly. This was also found for the control forecasts, which suggests that it is related to the forecast models rather than to how the ensembles are constructed. The magnitude of the bias is small; however, its cumulative effect will result in the 5-day forecast cyclone being on the order of 200–400 km behind the analyzed cyclone. While this is small relative to the spatial scale of extratropical cyclones, it would still be important for applications such as oil operations.
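The 200–400 km figure follows from simple arithmetic if the speed bias is assumed to be constant with lead time and directed entirely along the track (a simplification). The Python sketch below converts between a speed bias and the resulting along-track lag at day 5; the function name is hypothetical.

```python
LEAD_HOURS_DAY5 = 5 * 24  # lead time of a day-5 forecast in hours


def along_track_lag_km(speed_bias_kmh, lead_hours):
    """Distance (km) by which a forecast cyclone trails the analyzed one.

    Assumes a constant speed bias acting along the track; a negative
    bias (cyclone too slow) gives a positive lag.
    """
    return -speed_bias_kmh * lead_hours


# Invert the relationship: which constant bias produces a 200 or 400 km lag?
for lag_km in (200, 400):
    implied_bias = -lag_km / LEAD_HOURS_DAY5
    print(f"{lag_km} km behind at day 5 <=> bias of {implied_bias:.1f} km/h")
```

Under these assumptions, a lag of 200–400 km at day 5 corresponds to a constant bias of roughly −1.7 to −3.3 km h−1.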
This paper represents the first of a number of planned studies exploring the prediction of extratropical cyclones by the EPSs contained in the TIGGE archive. In future studies Southern Hemisphere cyclones will be considered and a more in-depth regional analysis will be performed. There are many factors to be considered in the construction of an EPS and the EPSs analyzed in this paper differ considerably. Plans for future work also include trying to gain additional understanding of how these factors impact cyclone prediction. In particular, the vertical structure and tilt of the cyclones will be explored to try to gain a more in-depth understanding of the impacts that different factors, such as resolution, have on the prediction of cyclone growth and evolution.
The results and cyclone-tracking methodology of this paper are potentially very useful to weather forecasters and a wide range of forecast users. However, further work is required to determine how the results should be interpreted and utilized by each particular user. For the weather forecasters themselves, it is important to determine which diagnostics provide the most helpful information for making a forecast. For example, the results of this paper concerning the intensity of cyclones are presented using the filtered vorticity field because this is the most appropriate field to use for the cyclone identification and tracking (Hoskins and Hodges 2002). However, forecasters in general make use of the MSLP field and may find information about the vorticity field difficult to interpret. It is possible to identify and track the cyclones using the vorticity field and then obtain the corresponding MSLP values (Bengtsson et al. 2009). This approach would probably be more appropriate for operational weather forecasting and will be investigated as future work. Another important question, relevant to forecasters, is how the biases of cyclone intensity and propagation speed could be taken into account and used to adjust a forecast. Again, this is a topic for future work. As well as considering the forecasters, plans are under way to work with other users of weather forecast data. Work has already been performed in collaboration with the oil and gas consultancy Schlumberger (information online at www.slb.com) to determine how information concerning the prediction of cyclones by EPSs can be used to manage their operations both on- and offshore. For this type of application, information concerning wind strength and storm surge will probably be most useful to the user.
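The vorticity-then-MSLP approach can be sketched as follows: having located a cyclone in the vorticity field, search the gridded MSLP field for its minimum within a fixed great-circle radius of the vorticity center. The 5° radius, the representation of the grid as a list of points, and the function names are assumptions for illustration and are not necessarily the choices made by Bengtsson et al. (2009).

```python
import math


def great_circle_deg(lat1, lon1, lat2, lon2):
    """Great-circle separation in degrees of arc (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def mslp_at_center(center, grid, radius_deg=5.0):
    """Minimum MSLP (and its location) within radius_deg of a vorticity center.

    `grid` is an iterable of (lat, lon, mslp_hpa) points; returns
    (mslp_hpa, lat, lon), or None if no grid point lies within the radius.
    """
    lat0, lon0 = center
    nearby = [(p, lat, lon) for lat, lon, p in grid
              if great_circle_deg(lat0, lon0, lat, lon) <= radius_deg]
    return min(nearby) if nearby else None


# Hypothetical MSLP grid points and a vorticity-tracked center
grid = [(55.0, -10.0, 975.0), (56.0, -12.0, 981.0), (52.0, -8.0, 990.0),
        (40.0, -10.0, 968.0)]  # the last point belongs to a distant system
print(mslp_at_center((54.5, -9.5), grid))
```

Restricting the search to a radius around the vorticity center prevents a deeper but unrelated low elsewhere in the domain from being assigned to the tracked cyclone.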
The author thanks the TIGGE contribution centers and the data centers for providing the data. Thanks go to Kevin Hodges, Robert Gurney, and Lennart Bengtsson for their help and advice, and also to the anonymous reviewers for their thoughtful and detailed comments, which helped to improve this paper considerably. The oil and gas consultancy Schlumberger is acknowledged for funding this research.
Corresponding author address: Lizzie S. R. Froude, Environmental Systems Science Centre, University of Reading, Harry Pitt Bldg., Whiteknights, P.O. Box 238, Reading RG6 6AL, United Kingdom. Email: firstname.lastname@example.org
Performing the identification and tracking on the ensemble mean of the vorticity field would not work well. The smoothing that results from averaging means that the identification thresholds would not be met. Averaging the data would also produce a large number of double centers, causing further problems with the identification.
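The smoothing effect can be illustrated with a one-dimensional toy example in Python: two members each contain a vorticity maximum of the same amplitude but at displaced positions, and their average has a much weaker peak that is split into two maxima. The Gaussian profiles, the displacement, the threshold value, and the function names are hypothetical.

```python
import math


def gaussian(x, center, width=1.0, amplitude=1.0):
    """Idealized 1D vorticity profile with a single maximum."""
    return amplitude * math.exp(-((x - center) ** 2) / (2 * width ** 2))


def local_maxima(values):
    """Indices of strict interior local maxima."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


xs = [0.1 * i for i in range(-50, 51)]
member_a = [gaussian(x, -1.5) for x in xs]  # vortex displaced west
member_b = [gaussian(x, +1.5) for x in xs]  # same vortex displaced east
mean_field = [(a + b) / 2 for a, b in zip(member_a, member_b)]

THRESHOLD = 0.6  # hypothetical identification threshold
print("member peak:", round(max(member_a), 2), "passes:", max(member_a) >= THRESHOLD)
print("mean peak:", round(max(mean_field), 2), "passes:", max(mean_field) >= THRESHOLD)
print("maxima in mean field:", len(local_maxima(mean_field)))
```

Each member field peaks at 1.0 and passes the threshold, whereas the averaged field peaks at only about 0.51 and has two separate maxima, so it both fails the threshold and presents a double center.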