## 1. Introduction

Ensemble prediction systems (EPS) have become one of the key components of the operational forecasting suites at many meteorological institutes. The European Centre for Medium-Range Weather Forecasts (ECMWF; Palmer et al. 1993; Molteni et al. 1996) and the National Centers for Environmental Prediction (NCEP, previously the National Meteorological Center, NMC; Tracton and Kalnay 1993; Toth and Kalnay 1993; Wei et al. 2006) implemented the first operational ensemble prediction systems in 1992. They were followed, in 1995, by the Meteorological Service of Canada (MSC; Houtekamer et al. 1996, 2007). In the past decade, global ensemble prediction systems have been developed and implemented in eight other centers: the U.S. Navy in Monterey, California; the Bureau of Meteorology Research Centre in Melbourne, Australia; the Centro de Previsao de Tempo e Estudos Climatico in Sao Paulo, Brazil; the Chinese Meteorological Agency in Beijing, China; the Japanese Meteorological Agency in Tokyo, Japan; the Korean Meteorological Agency in Seoul, South Korea; and, more recently, Météo-France in Toulouse, France, and the Met Office in Exeter, United Kingdom [see Park et al. (2008) for a recent comparison of the performance of the global ensemble systems taking part in the World Meteorological Organization (WMO) The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) project]. Ensemble forecasts are used to construct a range of products, including the two most commonly used ones: the first-order moment, the ensemble mean, and the second-order moment, the ensemble standard deviation.

One of the advantages that one should expect from using ensemble-based products, in particular the ensemble mean, instead of a single forecast defined by the control member, is a better consistency between consecutive forecasts valid for the same verification time, because of the filtering effect of ensemble averaging. For example, ensemble-mean forecasts should be more consistent, as illustrated by Buizza (2008a), who investigated the importance of consistency of precipitation forecasts in the case of a severe weather event that affected northern Italy between 14 and 16 October 2000 (this event was the most severe flood of the Po River since the terrible events of 1951).

This key property, the consistency of a forecast (or, conversely, its inconsistency, or “jumpiness”), is investigated in this work. More specifically, attention will focus on the following two questions:

- Are consecutive forecasts valid for the same verification time defined by the ensemble mean more consistent than the corresponding forecasts defined by the control?
- If the control forecast jumps, how closely does the ensemble-mean forecast follow the control forecast?

These two questions are addressed considering forecasts given by the control and the ensemble-mean forecasts of the ECMWF and the Met Office operational ensemble systems for an 18-month period. These two ensembles include one unperturbed member, the control forecast that starts from the unperturbed analysis (i.e., the best estimate of the atmosphere's initial conditions defined by the data assimilation system), and *N* perturbed members, where *N* = 50 for the ECMWF and *N* = 23 for the Met Office ensemble, starting from perturbed initial conditions. In both systems, stochastic schemes are also used during the forecasts to simulate the effect of model error on the forecast states. Forecast inconsistency has been assessed for four variables: the 500- and 1000-hPa geopotential height and the 500- and 850-hPa temperature. Four different areas are considered: three over northwestern Europe with different sizes but centered on the same point, and a separate area over southeast Europe with the same size as the medium northwestern European area (Fig. 1).

After this introduction, section 2 briefly reviews the ECMWF and the Met Office ensemble prediction systems, describes the data used in this work, and explains the methodology used to measure the consistency/inconsistency (i.e., the jumpiness) of control and ensemble-mean forecasts. Section 3 illustrates the use of the inconsistency and jumpiness concepts in two case studies, and also for a 2-month time series. Section 4 presents the bulk of the jumpiness results: it discusses the period-average inconsistency, the inconsistency correlation and the forecast jump frequency statistics of consecutive ECMWF and Met Office control and ensemble-mean forecasts; and analyzes the sensitivity of the inconsistency measures to parameters, areas, and sampling periods. Section 5 addresses the relationship between the forecast jumpiness and forecast error. Finally, conclusions are drawn in section 6.

## 2. Description of the ECMWF and the Met Office ensemble systems, datasets, and introduction of the jumpiness indices

In this section, first the key characteristics of the ECMWF and the Met Office Ensemble Prediction Systems are briefly described, then the database used in this work is introduced, and finally the methodology used to assess the forecast jumpiness and the statistical indices used to quantify it are defined.

### a. The ECMWF Ensemble Prediction System

Since 11 March 2008, ECMWF has been running a unified medium-range and monthly ensemble system, formed by merging the previous Variable Resolution Ensemble Prediction System (VAREPS) and monthly configurations (Buizza et al. 2007; Vitart et al. 2008). The merged system includes 51 members integrated twice a day up to 15 days, except once a week when they are extended to 32 days. More precisely, these are the characteristics of the ECMWF Ensemble Prediction System (EC-EPS):

- 1200 UTC: T_L399L62 (50-km resolution, 62 levels, days 0–10) and T_L255L62 (80-km resolution, days 10–15) with persisted SST anomaly;
- 0000 UTC: T_L399L62 (days 0–10) with persisted SST anomaly, and T_L255L62 (days 10–15) with coupled ocean model; on Thursdays the integration is extended to day 32.

Initial uncertainties are simulated using a combination of initial-time and evolved singular vectors (Buizza and Palmer 1995) computed at T42L62 resolution, with 48-h optimization time interval and a total-energy norm. Singular vectors are computed to maximize the final-time norm over different areas (Barkmeijer et al. 1999, 2001), combined and scaled to have initial amplitude comparable to an estimate of the analysis error. Model uncertainties due to physical parameterizations are simulated using a stochastic scheme (Buizza et al. 1999). The reader is referred to Palmer et al. (2007) for a review of the developments of the ECMWF ensemble system since its implementation in 1992 to date.

### b. The Met Office Ensemble Prediction System

The 15-day Met Office Ensemble Prediction System (UK-EPS) is an extension of the Met Office Global Regional Ensemble Prediction System (MOGREPS; see Bowler et al. 2008). The UK-EPS is run twice a day, at 0000 and 1200 UTC, has a resolution of 90 km, 38 vertical levels, and includes 24 members (23 perturbed plus the unperturbed control forecast). Initial uncertainties are simulated using an ensemble transform Kalman filter (ETKF) approach (Wei et al. 2006; Bishop et al. 2001; Bowler et al. 2007). Model uncertainties are modeled using a stochastic backscatter scheme (Shutts 2005). The reader is referred to Park et al. (2008) and Johnson and Swinbank (2008) for two recent papers that compare the performance of the EC-EPS and the UK-EPS, and discuss the potential value of combining the two ensembles.

### c. The dataset used in this study

The EC-EPS and the UK-EPS forecasts used in this study were extracted from the TIGGE archive hosted online at ECMWF (more information available online at http://tigge.ecmwf.int/) on a regular 1° × 1° latitude–longitude grid. To account for the poleward distortion of the grid, weights defined by the cosine of the latitude have been given to each grid point when area averages have been computed. Since the UK-EPS started being available in the TIGGE archive in January 2007, we chose to base our statistical computations on the period from February 2007 to August 2008. In effect this meant 18 months' worth of forecasts, issued twice a day at 0000 and 1200 UTC, a sufficiently large sample to provide robust results. Furthermore, to assess the seasonal variability of the ensemble jumpiness, at least for the EC-EPS, ECMWF ensemble data from June 2003 to August 2008 (i.e., about 5 yr) have also been considered.

### d. The methodology and indices used to measure forecast jumpiness

Five different measures of forecast jumpiness are introduced: one basic index and four related concepts. First, the inconsistency index of two forecasts valid for the same verification time is defined. Then, considering the average of the inconsistency index for a given sampling period, the concepts of different forecast jumps—the “flip,” “flip-flop,” and “flip-flop-flip”—and their corresponding indices, are defined in their two flavors, the *single* and the *parallel* one. Finally, the *inconsistency correlation* between two forecasts’ inconsistency index time series is introduced to measure how similar the inconsistency of two different forecasts is.

Denote with *f_j^C*(*d*, *t*) the forecast started on date/time *d* and valid for date/time *d* + *t*, defined for all grid points of an area Σ, where the superscript *C* = EC identifies the ECMWF forecast, and *C* = UK identifies the Met Office forecast. The subscript *j* = 0 identifies the control forecast, while *j* = 1, …, *N^C* identifies the *N^C* perturbed ensemble forecasts. Denote with 〈*f_m^C*(*d*, *t*)〉 the ensemble-mean forecast, defined by averaging the control and all the perturbed members:

$$
\langle f_m^C(d,t)\rangle = \frac{1}{N^C+1}\sum_{j=0}^{N^C} f_j^C(d,t).
$$

Consider two forecast fields *f*(·, ·) and *g*(·, ·). Denote with *d*_Σ[*f*, *g*] the difference between the two forecasts over the area Σ:

$$
d_\Sigma[f,g] = \frac{\sum_{\mathrm{gp}\in\Sigma} w_{\mathrm{gp}}\,\big[f(\mathrm{gp}) - g(\mathrm{gp})\big]}{\sum_{\mathrm{gp}\in\Sigma} w_{\mathrm{gp}}},
$$

where gp denotes the grid points in the area and *w*_gp are the cosine-of-latitude weights introduced above, and denote with std_Σ[*f*] the standard deviation of the field values over the area Σ:

$$
\mathrm{std}_\Sigma[f] = \sqrt{\frac{\sum_{\mathrm{gp}\in\Sigma} w_{\mathrm{gp}}\,\big[f(\mathrm{gp}) - A_\Sigma\big]^2}{\sum_{\mathrm{gp}\in\Sigma} w_{\mathrm{gp}}}},
$$

where *A*_Σ is the average of the field values over the area Σ:

$$
A_\Sigma = \frac{\sum_{\mathrm{gp}\in\Sigma} w_{\mathrm{gp}}\,f(\mathrm{gp})}{\sum_{\mathrm{gp}\in\Sigma} w_{\mathrm{gp}}}.
$$
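As an illustration, the area statistics above can be sketched in Python/NumPy on a regular latitude–longitude grid (this is our sketch, not code from the paper; all function and variable names are ours):

```python
import numpy as np

# Illustrative sketch of the area statistics of section 2d: the area
# average A_sigma, the forecast difference d_sigma, and the standard
# deviation std_sigma, each weighted by the cosine of the latitude as
# described for the 1x1 degree TIGGE data in section 2c.

def area_average(field, lats):
    """A_sigma: cos(latitude)-weighted mean of a (nlat, nlon) field."""
    w = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    return np.sum(w * field) / np.sum(w)

def area_difference(f, g, lats):
    """d_sigma[f, g]: weighted area mean of the pointwise difference f - g."""
    return area_average(f - g, lats)

def area_std(field, lats):
    """std_sigma[f]: weighted standard deviation about A_sigma."""
    a = area_average(field, lats)
    return np.sqrt(area_average((field - a) ** 2, lats))

# A small synthetic example: g sits uniformly 10 gpm above f, so the
# signed area difference is close to -10.
lats = np.arange(45.0, 61.0)                  # illustrative European box
f = np.random.default_rng(0).normal(5400.0, 50.0, (lats.size, 20))
g = f + 10.0
print(round(area_difference(f, g, lats), 6))  # approximately -10.0
```

Note that *d*_Σ is signed, which is what later allows consecutive jumps to have a direction.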

#### 1) Definition of the inconsistency index used to measure forecast jumpiness

Consider two forecasts issued at date/times *d* and (*d* + *δ*) and valid for the same verification time, where *δ* has been set to 12 h in our case. The inconsistency index INC_Σ[*f_j^C*(*d*, *t*), *δ*] between the two forecast fields issued *δ* hours apart (here *j* is either 0 or *m*, corresponding to the control or the ensemble mean) over the area Σ is defined as the ratio between their distance and their average standard deviation:

$$
\mathrm{INC}_\Sigma\big[f_j^C(d,t),\delta\big] =
\frac{d_\Sigma\big[f_j^C(d+\delta,\,t-\delta),\,f_j^C(d,t)\big]}
{\tfrac{1}{2}\Big(\mathrm{std}_\Sigma\big[f_j^C(d+\delta,\,t-\delta)\big] + \mathrm{std}_\Sigma\big[f_j^C(d,t)\big]\Big)}.
$$

The denominator has been introduced to take into account the fact that different forecasts have different characteristics (e.g., resolution and variability). For example, the control forecast has more detail (i.e., includes smaller scales) and thus is expected to have larger inconsistency than the smoother ensemble-mean forecast. This scaling factor normalizes the differences between consecutive forecasts to the variability of the forecasts themselves. The normalization is also important to enable comparison of inconsistency indices computed for different forecast steps, different weather parameters, or different forecasting systems.

Given the time series of the inconsistency index over *N* dates, the *N*-period root-mean-square average of the daily inconsistency index is defined as (hereafter referred to as the period-average inconsistency):

$$
\overline{\mathrm{INC}}_\Sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \mathrm{INC}_\Sigma\big[f_j^C(d_i,t),\delta\big]^2}.
$$
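The two definitions above can be sketched as follows (our sketch; `f_new`, `f_old`, and the weight vector `w` are our names for the newer and older fields valid at the same time and the cos-latitude weights):

```python
import numpy as np

# Sketch of the inconsistency index of section 2d(1) and of its
# period-average RMS: the weighted-mean difference of two fields valid at
# the same time but issued 12 h apart, normalized by the average of their
# weighted standard deviations.

def _wmean(x, w):
    return np.sum(w * x) / np.sum(w)

def inconsistency_index(f_new, f_old, w):
    """Signed, normalized inconsistency between two fields over an area."""
    wstd = lambda x: np.sqrt(_wmean((x - _wmean(x, w)) ** 2, w))
    return _wmean(f_new - f_old, w) / (0.5 * (wstd(f_new) + wstd(f_old)))

def period_average_inconsistency(inc_series):
    """Root-mean-square of a time series of daily inconsistency indices."""
    inc = np.asarray(inc_series, dtype=float)
    return np.sqrt(np.mean(inc ** 2))
```

For example, `period_average_inconsistency([3.0, 4.0])` gives √12.5 ≈ 3.54; the RMS (rather than a plain mean of absolute values) keeps the period average consistent with the squared-index statistics used later.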

#### 2) Definition of the concept of a forecast flip

Consider two consecutive forecasts issued at date/times *d* and (*d* + *δ*), and valid for the same verification time (*d* + *t*). The forecast *f_j^C*(*d*, *t*) is defined to have made a single flip if the absolute value of the inconsistency index is larger than half of the period-average inconsistency over the *N* forecast runs:

$$
\big|\mathrm{INC}_\Sigma\big[f_j^C(d,t),\delta\big]\big| > \tfrac{1}{2}\,\overline{\mathrm{INC}}_\Sigma.
$$

#### 3) Definition of the concept of a forecast flip-flop

Consider three consecutive forecasts issued at date/times (*d* − *δ*), *d*, and (*d* + *δ*), and valid for the same verification time. The forecast *f_j^C*(*d*, *t*) is defined to have made a single flip-flop if the forecast made two consecutive single flips but with an opposite sign. In other words, a forecast made a single flip-flop if the following three conditions are met:

$$
\big|\mathrm{INC}_\Sigma\big[f_j^C(d-\delta,\,t+\delta),\delta\big]\big| > \tfrac{1}{2}\,\overline{\mathrm{INC}}_\Sigma,
$$
$$
\big|\mathrm{INC}_\Sigma\big[f_j^C(d,t),\delta\big]\big| > \tfrac{1}{2}\,\overline{\mathrm{INC}}_\Sigma,
$$
$$
\mathrm{INC}_\Sigma\big[f_j^C(d-\delta,\,t+\delta),\delta\big]\cdot \mathrm{INC}_\Sigma\big[f_j^C(d,t),\delta\big] < 0.
$$

#### 4) Definition of the concept of a forecast flip-flop-flip

Consider four consecutive forecasts issued at date/times (*d* − *δ*), *d*, (*d* + *δ*), and (*d* + 2*δ*), and valid for the same verification time. The forecast *f_j^C*(*d*, *t*) is defined to have made a single flip-flop-flip if the forecast made three consecutive single flips with alternating signs; in other words, if both forecasts *f_j^C*(*d*, *t*) and *f_j^C*(*d* + *δ*, *t* − *δ*) have made a single flip-flop with opposite orientation.

#### 5) Definition of the parallel forecast jumps

To help characterize the strength of the relationship between two forecast systems (e.g., the control and the ensemble-mean forecasts of an EPS or the control forecasts of two different EPSs) we use an extended concept of the single forecast jump indices. A two-forecast system is said to have made a parallel flip, parallel flip-flop, or parallel flip-flop-flip, when both forecasts make a single flip, single flip-flop, or single flip-flop-flip, respectively, and the jumps are in phase.
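Under our reading of “in phase,” the parallel flip can be sketched as requiring both single flips plus sign agreement of the two indices (function name ours):

```python
import numpy as np

# Sketch of the "parallel flip" of section 2d(5): two systems (e.g.,
# control and ensemble mean) make a parallel flip at a given run when
# each exceeds its own half period-average threshold and the two
# inconsistency indices have the same sign, i.e., the jumps are in phase.

def parallel_flips(inc_a, inc_b):
    a = np.asarray(inc_a, dtype=float)
    b = np.asarray(inc_b, dtype=float)
    flip_a = np.abs(a) > 0.5 * np.sqrt(np.mean(a ** 2))
    flip_b = np.abs(b) > 0.5 * np.sqrt(np.mean(b ** 2))
    return flip_a & flip_b & (a * b > 0)

print(parallel_flips([1.0, -1.0, 0.1], [0.8, -0.9, 0.05]).tolist())
# -> [True, True, False]
```

Parallel flip-flops and flip-flop-flips follow analogously by combining the single-jump flags of the two series run by run.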

#### 6) Definition of the inconsistency correlation

Consider the inconsistency index time series of two forecasts *f*(*d_i*, *t*) and *g*(*d_i*, *t*) over *N* forecast runs, denoted here for simplicity by *F_i* and *G_i*, where *i* = 1, 2, …, *N* refers to the forecast runs. The inconsistency correlation between the two inconsistency index time series, *F_i* and *G_i*, over the area Σ is defined by the product-moment correlation coefficient:

$$
r_\Sigma(F,G) = \frac{\sum_{i=1}^{N}\big(F_i-\overline{F}\big)\big(G_i-\overline{G}\big)}
{\sqrt{\sum_{i=1}^{N}\big(F_i-\overline{F}\big)^2\,\sum_{i=1}^{N}\big(G_i-\overline{G}\big)^2}}.
$$

The inconsistency correlation is computed either for an individual EPS system, with *f* and *g* representing the control and the ensemble-mean forecasts, or for two EPS systems, with *f* and *g* representing the two controls or the two ensemble-mean forecasts.
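Since this is the ordinary product-moment (Pearson) coefficient, it can be computed directly with `np.corrcoef` (function name ours):

```python
import numpy as np

# The inconsistency correlation of section 2d(6) is the product-moment
# (Pearson) correlation between two inconsistency-index time series
# F_i and G_i; np.corrcoef computes exactly this coefficient.

def inconsistency_correlation(F, G):
    F = np.asarray(F, dtype=float)
    G = np.asarray(G, dtype=float)
    return float(np.corrcoef(F, G)[0, 1])

# Two perfectly proportional series correlate at 1.0:
print(round(inconsistency_correlation([0.2, -0.5, 0.9],
                                      [0.4, -1.0, 1.8]), 6))  # -> 1.0
```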

## 3. Practical examples of the jumpiness

In this section we give some insight into the behavior of the forecast jumpiness by considering only the EC-EPS control and ensemble-mean forecasts. First, two individual cases are discussed: forecasts verifying on 23 February 2008 and 31 January 2008. Then, the time series of the inconsistency index at *t* + 168 h valid for January–February 2008 is analyzed.

### a. Case 1: Jumpiness during zonal-flow conditions in February 2008

During this case of February 2008 (this event actually triggered our investigation), the ECMWF control and the EC-EPS ensemble-mean forecasts were very jumpy for a few days in a row over northern Europe, and the EC-EPS ensemble mean followed the control forecast very closely. This raised the question of whether this was an unusual event, and more generally of whether the ensemble-mean forecast was on average following the control forecast too closely.

At the time of this event, the northern part of Europe was under the influence of a strong zonal flow, with a ridge building up and moving eastward rather rapidly (Fig. 2). Figure 3 shows the 180-, 168-, and 156-h control and ensemble-mean forecasts valid for 1200 UTC 23 February. In terms of the large-scale zonal flow the forecasts look consistent and are in quite good agreement with the observed flow. However, a closer look highlights large inconsistencies in the small scales (e.g., in the prediction of the ridge that was affecting the northern European countries). The high level of uncertainty in the region is also indicated by the unusually large ensemble spread (not shown), as measured by the ensemble standard deviation. The “jumpy” behavior of the consecutive forecasts is highlighted in Fig. 3 by the arrows indicating the dominant flow direction. The forecasts started on 16 (*t* + 180 h) and 18 (*t* + 156 h) February show a ridge stretching over the United Kingdom, while the intermediate forecast started on 17 February (*t* + 168 h) indicates a significantly more zonal flow with only slightly anticyclonic curvature over the United Kingdom. Note that the control and the ensemble-mean forecasts show a very similar behavior, although the jumps in the ensemble mean have smaller amplitude.

Figure 4 shows the inconsistency index of the EC-EPS control and ensemble-mean forecasts as a function of forecast range for all forecasts verifying at 1200 UTC 23 February. The ensemble-mean forecast follows the control forecast rather closely between forecast steps *t* + 240 h and *t* + 144 h, while it jumps much less for longer forecast ranges. At a short forecast range both forecasts change less and follow each other very closely (this is not surprising in the short range since, by construction, the ensemble-mean and the control forecasts coincide at initial time, and perturbations grow linearly during the first 24–48 h; see, e.g., Gilmour et al. 2001).

The dotted and dash–dotted lines in Fig. 4 show the threshold values used in the definition of forecast jumps. The control and/or the ensemble mean makes a flip if the corresponding inconsistency indices (open circles for the control, triangles for the ensemble mean) are outside of the area bounded by the two dotted (control) or dash–dotted (ensemble mean) lines (i.e., if the modulus of the inconsistency index is larger than half of the period-average inconsistency). The values enclosed in the large circles in Fig. 4 (at *t* + 180 h and *t* + 168 h) correspond to the forecasts displayed in Fig. 3. The lower circle includes the inconsistency indices computed between the first (*t* + 180 h) and second (*t* + 168 h) forecast fields (delivering the inconsistency index for *t* + 180 h) in the upper (control) and lower (ensemble mean) rows of Fig. 3. Similarly, the upper circle includes the inconsistency indices computed between the second (*t* + 168 h) and third (*t* + 156 h) fields (inconsistency index for *t* + 168 h). The two circled inconsistency-index pairs actually correspond to a single flip-flop of the control and also of the ensemble mean, because they zigzag outside of the jump-threshold lines (first both geopotential height fields get lower, then on the next run both get higher again). Moreover, since the control and the ensemble-mean jump in the same direction, they also make a parallel flip-flop. A closer inspection actually indicates a parallel flip-flop-flip of the three consecutive 180-, 168-, and 156-h forecasts. Note also that from *t* + 228 h to *t* + 144 h the control and the ensemble mean always jump in phase.

### b. Case 2: Jumpiness during a fast-moving cyclone in January 2008

The second case was associated with a fast-moving, very intense cyclone sweeping across northern Europe in January 2008 and causing disruption and damages to the surrounding countries (Fig. 5). Figure 6 shows the inconsistency indices of the EC-EPS control and ensemble-mean forecasts as a function of forecast range for this event. This case differs substantially from the previous one in the extent to which the ensemble mean follows the control. In the February case (Fig. 4) the control and the ensemble mean changed from run to run very closely for days, while in this case the relationship is much less apparent and the parallel zigzagging is rather limited.

### c. Time series of the inconsistency index for the t + 168-h forecasts valid for January–February 2008

Before section 4’s discussion of the 18-month average results, it is interesting to analyze in some detail a time series of the inconsistency index for a shorter, 2-month period. Figure 7 shows the inconsistency indices for the *t* + 168-h forecasts from January to February 2008. The inconsistency index of the EC-EPS ensemble-mean forecasts is smaller than that of the control forecast for most cases, with synchronized zigzag events for both the ensemble mean and the control occurring only very rarely (see e.g., the consecutive runs around 4–7 January 2008, when both indices have roughly the same values). The correlation coefficient between the two curves is 0.60.

## 4. Statistics of the jumpiness of the EC- and the UK-EPS control and ensemble-mean forecasts

In this section we summarize the 18-month average results for both the EC-EPS and the UK-EPS systems. For reasons of space, most plots and discussion will refer to the 500-hPa geopotential height over the medium northwestern European area for the sampling period 16 February 2007 to 15 August 2008 (the period for which forecasts from the UK-EPS were available in the ECMWF TIGGE archive).

### a. Period-average inconsistency statistics

Figure 8 shows that all average inconsistency indices increase with the forecast range in agreement with the practical experience that the forecasts are usually more consistent at short forecast ranges. Figure 8 also shows that for both ensemble systems, the ensemble-mean period-average inconsistency index is smaller than the control one, especially at long forecast ranges. Although it is difficult to relate the differences between the indices to precise system characteristics, factors that might have contributed to them are model resolution, model activity, and forecast skill. Figure 8 shows, for example, that the EC-EPS control is more inconsistent than the UK-EPS control, which could be because the EC-EPS has a higher resolution, and its members are more active. By contrast, Fig. 8 shows that the EC-EPS ensemble mean is less inconsistent than the UK-EPS ensemble mean: this could be due to the fact that the EC-EPS has a larger, better tuned ensemble spread (Park et al. 2008).

### b. Control and ensemble-mean inconsistency correlation statistics

To measure the relationship between the inconsistency indices of the control and ensemble-mean forecasts, the product-moment correlation coefficient (Neter et al. 1988) between their corresponding inconsistency indices has been computed. In both the EC-EPS and UK-EPS, the initial perturbations are centered on the control forecast; therefore, the correlation between the control and ensemble-mean inconsistency is high at short ranges, while the initial perturbations evolve linearly, and then decreases as perturbation growth becomes more nonlinear (Fig. 9). The correlation is slightly stronger in the EC-EPS than in the UK-EPS during the first few days, while it is weaker in the medium range. This is also consistent with the spread characteristics of the two systems: in the short forecast range the UK-EPS has a larger spread since it uses larger initial perturbations. This makes each perturbation's growth more nonlinear, and the ensemble-mean/control divergence stronger at shorter forecast ranges. The opposite holds in the long range, where the UK-EPS is underdispersive (Park et al. 2008).

It is interesting to contrast these correlations with the correlation between the inconsistencies of the two control forecasts or the two ensemble-mean forecasts. Figure 9 shows that the inconsistency correlation between the two control forecasts (pluses) and the two ensemble-mean forecasts (crosses) are much lower than the correlations obtained for each individual system (squares and triangles). The large difference between the same-system and the different-system correlations throughout the forecast range could be seen as an indication that each single system is still not capable of properly simulating the effect of model or analysis uncertainty.

### c. Forecast jump frequency statistics

Occasionally (e.g., the case discussed in section 3a), the ensemble mean follows the control very closely in a zigzagging manner for a number of runs. If such situations happened too often, the capability of the EPS to improve the prediction of potentially damaging weather scenarios could be reduced. To measure the frequency of occurrence of such events we counted the number of cases when the control and the ensemble-mean forecasts—either individually or in parallel—made flips, flip-flops, and flip-flop-flips.

Figure 10 shows that the frequency of individual flips is similar throughout the forecast range for the control and ensemble-mean forecasts for both systems. The parallel flip frequency between the control and the ensemble-mean forecasts of each system reflects the weakening relationship between the control and the ensemble mean with the forecast range (cf. Fig. 9): at short ranges almost every jump occurs simultaneously, but this decreases steadily through the medium range. The parallel flips between EC-EPS and UK-EPS also reflect the correlations discussed in the previous section: the two systems jump together substantially less often than the same-system control and ensemble mean.

The frequencies of flip-flops (Fig. 11) and flip-flop-flips (Fig. 12) are lower than those of flips. For the ensemble mean, the flip-flop and flip-flop-flip indices are noticeably smaller than the control indices for both EPS systems, especially in the medium range. Thus, although the control and ensemble-mean forecasts have a similar number of flips, the ensemble mean has a definitely smaller frequency of flip-flops and flip-flop-flips. This confirms the earlier results that the ensemble mean jumps less than the control forecast. Note also that this is more evident for the EC-EPS: this, as already mentioned, is probably linked to the fact that the EC-EPS has a better-tuned spread than the UK-EPS.

### d. Results sensitivity to parameters, areas, and sampling periods

To assess the robustness of the results presented above, the inconsistency indices have been assessed for the other variables, different areas, and sampling periods. The discussion is limited to the EC-EPS system.

Figure 13 shows the sensitivity of the period-average inconsistency to the choice of parameter. Although the relationship between the control and the ensemble-mean curves is similar for all parameters, the absolute values vary substantially. Over the medium northwestern European area the most inconsistent parameter is the 1000-hPa geopotential height, and forecasts are more consistent for temperature than for geopotential height.

Regarding the sensitivity of the inconsistency to the area, first consider the areas nested in the large northwestern European area (Fig. 1), which, by construction, are affected on average by similar weather climatology. Figure 14 shows that the period-average inconsistency is very sensitive to the area size. This strong sensitivity is not surprising, since it reflects the fact that, in general, any forecast, or a forecast rms error, has more variability if computed over a smaller rather than a larger area. Consider now the indices for the medium northwestern European area and the southeast European area, which have the same size but are affected, on average, by different synoptic activity and weather regimes. Figure 14 shows that the indices for the two areas are very similar, suggesting that there is little sensitivity to the weather climatology. The different flip indices and the inconsistency correlation between the control and the ensemble mean also show little sensitivity to the choice of the weather parameter and the area size (not shown).

The sensitivity of the inconsistency indices to the sampling period has also been investigated. Although there is some moderate variability on the seasonal scale, it becomes very small if averages are computed for longer periods (i.e., 12 months or more, not shown), confirming the limited sensitivity of the inconsistency to the weather regimes.

## 5. Relationship between jumpiness and forecast error

Routine assessment of the ECMWF ensemble performance has indicated that there is a certain degree of correlation between ensemble spread and forecast error, with cases with lower-than-average ensemble standard deviation characterized by lower-than-average ensemble-mean errors (see e.g., Buizza and Palmer 1998). It is useful to know whether there is a similar relationship between jumpiness and forecast error. In other words, are cases characterized by less inconsistent forecasts easier to predict?

To assess this, forecast error statistics have been computed for subsets of cases characterized by either 0, 1, 2, or 3 consecutive flips. Similarly, forecast error statistics have been computed for cases with small, medium–small, medium–large, and large ensemble spread (the spread was classified as small when it was below the average spread minus one spread standard deviation, medium–small when it was between the average minus one standard deviation and the average, etc.). This assessment was made for the EC-EPS over the three months June, July, and August, so that 5.5 months' worth of data were available within the 18-month total period. Figure 15 shows the average and standard deviation of rms errors of the control and ensemble-mean forecasts for the cases with flip numbers equal to 0, 1, 2, and 3, computed for the 500-hPa geopotential over the medium northwestern European area. Figure 16 shows the corresponding average and standard deviation of errors for the cases with small, medium–small, medium–large, and large spread. Results show clearly that while the ensemble spread is a good predictor of the forecast error, there is only a weak relationship between forecast jumpiness and forecast error. Similar results were obtained for other parameters and areas (not shown).
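The spread classification described above can be sketched as a simple binning against the period-mean spread and its standard deviation (our reading of the categories; function name ours):

```python
import numpy as np

# Sketch of the spread classification: cases are binned against the
# period-mean spread m and its standard deviation s as small (< m - s),
# medium-small (m - s to m), medium-large (m to m + s), and large (> m + s).

def classify_spread(spread):
    spread = np.asarray(spread, dtype=float)
    m, s = spread.mean(), spread.std()
    labels = np.array(["small", "medium-small", "medium-large", "large"])
    return labels[np.digitize(spread, [m - s, m, m + s])]

print(classify_spread([1.0, 8.0, 12.0, 19.0]).tolist())
# -> ['small', 'medium-small', 'medium-large', 'large']
```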

For either the control or the ensemble-mean forecast, the flip number is a measure of the spread of consecutive, lagged forecasts. This result indicates that there is only a moderate correlation between the spread of a lagged ensemble and the error of the most recent forecasts. This result is in line with the conclusions of Buizza (2008b) who showed that the ECMWF operational EPS performs better than a lagged ensemble based on the six most recent high-resolution forecasts.

## 6. Discussion and conclusions

In the past decade, ensemble prediction systems have become an essential part of the operational suite of numerical weather forecasting systems. One of the key advantages that should be expected from using an ensemble prediction system instead of a single forecast is a better consistency between consecutive forecasts valid for the same verification time (Buizza 2008a). In particular, one would expect that forecasts defined by the ensemble mean (i.e., the first-order moment of the probability distribution function of forecast states and one of the most commonly used ensemble products), should be more consistent than forecasts defined by the single control forecast.

In the first part of this work, an inconsistency index (the “jumpiness” index) was introduced to measure the consistency/inconsistency of consecutive forecasts over a given area. This inconsistency index was defined as the ratio of the difference between the two forecasts and the mean of their standard deviations. This normalization was introduced to take into account the fact that different forecast systems can have different characteristics (e.g., resolution and variability). Considering the average of the inconsistency index for a sampling period, the concepts of different forecast jumps—the “flip,” “flip-flop,” and “flip-flop-flip”—and their corresponding indices, were defined. The inconsistency correlation between two forecasts’ inconsistency index time series was introduced.

In the second part of this work, these indices have been used to address two key questions:

- Are consecutive forecasts valid for the same verification time defined by the ensemble mean more consistent than the corresponding forecasts defined by the control?
- If the control forecast jumps, how closely does the ensemble-mean forecast follow the control forecast?

These questions have been addressed by considering the control and the ensemble-mean forecasts of the ECMWF and the Met Office operational ensemble systems for an 18-month period (with extension to 5 yr for the EC-EPS only) for 500- and 1000-hPa geopotential height and 500- and 850-hPa temperature, over four different areas in Europe, using datasets available at ECMWF within the TIGGE archive.

Results (Fig. 8) have shown that in the short forecast range the EPS control and ensemble-mean forecasts have very low and similar inconsistency indices, which increase slightly with forecast step. By contrast, in the medium and long forecast ranges (from around *t* + 72 h onward) the inconsistency index of the control forecast increases significantly faster than that of the ensemble mean. The ensemble mean therefore provides more consistent forecasts than the control, and this benefit increases with forecast range. This conclusion holds for both the EC-EPS and the UK-EPS, but the EC-EPS ensemble mean showed a smaller inconsistency, probably because the EC-EPS has a better-tuned ensemble spread in the medium range.

Considering the frequency of single forecast jumps, the control and ensemble-mean forecasts of both the EC-EPS and the UK-EPS have very similar statistics (Fig. 10): the percentage of single flips stays at about 55%–60% over the whole forecast range, and the percentage of times when the control and the ensemble mean jump in synchrony (parallel flips) decreases with forecast step from ∼50% at initial time to ∼25% by day 15. By contrast, the control and ensemble-mean statistics for flip-flop and flip-flop-flip events differ noticeably: results (Figs. 11–12) have shown that for both the EC-EPS and the UK-EPS the ensemble-mean frequencies are lower than the control frequencies. This indicates that the ensemble-mean forecasts jump less frequently in a *zigzagging* way (e.g., 18% for the ensemble mean versus 26% for the control in the EC-EPS flip-flop frequencies at forecast step *t* + 168 h; see Fig. 11).
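For illustration, the jump statistics compared above can be sketched as follows (a hypothetical implementation in which a forecast "flips" when its forecast-to-forecast change exceeds a threshold, a "flip-flop" is two consecutive large changes of opposite sign, and a "parallel flip" is a flip that both the control and the ensemble mean make at the same verification time; the operational definitions in Part I are built on the inconsistency index and may differ in detail):

```python
import numpy as np

def flips(changes, thresh):
    """Boolean array: True where a signed forecast-to-forecast change
    exceeds the threshold in magnitude (a single flip)."""
    return np.abs(changes) > thresh

def flip_flops(changes, thresh):
    """True where two consecutive changes are both large and of
    opposite sign, i.e., the forecast jumps one way and then back."""
    big = flips(changes, thresh)
    opposite = np.sign(changes[1:]) * np.sign(changes[:-1]) < 0
    return big[1:] & big[:-1] & opposite

def parallel_flip_fraction(ctrl_changes, em_changes, thresh):
    """Fraction of control flips matched by an ensemble-mean flip at
    the same verification time."""
    c, e = flips(ctrl_changes, thresh), flips(em_changes, thresh)
    return (c & e).sum() / max(c.sum(), 1)

# Example: four consecutive forecast-to-forecast changes
changes = np.array([2.0, -2.0, 0.5, 2.0])
print(flips(changes, 1.0))       # which changes are single flips
print(flip_flops(changes, 1.0))  # which consecutive pairs reverse direction
```

Averaging such indicators over many start dates would yield frequencies of the kind summarized in Figs. 10–12.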

The behavior of forecasts generated using the same model has been compared with that of forecasts generated using different models. Results (Fig. 9) have shown that the inconsistency correlation between ensemble-mean and control forecasts generated using the same system is significantly higher than that between forecasts generated using different systems. This is confirmed by the parallel jump frequencies of forecasts across the EC-EPS and the UK-EPS: for both the control and the ensemble mean, these values are significantly lower than the percentage of times when forecasts from the same EPS jump in synchrony (see Figs. 10–12). These results indicate that the relationship between the control and the ensemble mean of the same EPS is much stronger than the relationship between forecasts of two different EPSs.

The sensitivity of the inconsistency indices to the weather parameter, the area, and the sampling period has been assessed for the EC-EPS using forecasts covering a 5-yr period. Results (see Figs. 13–14) have indicated a strong sensitivity to the parameter and especially to the area size, and a weak sensitivity to the area climatology and the sampling period. By contrast, the correlation between the control and the ensemble-mean inconsistency indices has shown only a very weak sensitivity.

Furthermore, it has been investigated whether there is a significant link between jumpiness and forecast error (e.g., whether forecasts with lower inconsistency have, on average, lower forecast error). Results based on the EC-EPS have shown that the connection between forecast inconsistency and error is weak, while they have confirmed the existence of a more substantial relationship between ensemble spread and forecast error. This latter result has been seen as further evidence of the superiority of the EC-EPS compared to an ensemble system defined by the latest available lagged forecasts, as pointed out by Buizza (2008b).

Overall, results have shown that for both the EC-EPS and the UK-EPS, the ensemble-mean forecast is generally less inconsistent than the control forecast and has substantially fewer flip-flops. However, the ensemble-mean and control forecasts of one system follow each other more closely than the control or ensemble-mean forecasts of two different ensemble systems follow each other. The EC-EPS and the UK-EPS have different climatologies, different model activity, and different forecast skill, as documented by Park et al. (2008); all of these differences may contribute to this conclusion. Nevertheless, the results show that flip-flops do occur for both ensemble means, which indicates that the forecast uncertainty is not sufficiently well sampled in either EPS. This could be because the ensemble size is not large enough, or because the EPS perturbations do not adequately represent the initial and model uncertainties (see, e.g., the discussion in Buizza et al. 2005).

As a practical conclusion, this work has shown that forecasters will benefit from higher consistency between consecutive runs by using the ensemble mean rather than the control forecast. However, forecasters should not rely on jumpiness as an indicator of predictability. Instead, this work has confirmed that the ensemble spread provides a more reliable guide, and that cases with low spread generally have lower ensemble-mean error.

This work compared for the first time objective measures of the inconsistency between control and ensemble-mean forecasts valid for the same verification time. To further investigate the advantage of an ensemble over a single forecast, the definition of the jumpiness index should be generalized so that it can be applied to probabilistic forecasts. Furthermore, it would be interesting to apply the diagnostic measures introduced in this work to all the ensemble forecasts available within TIGGE, and to the multimodel grand-ensemble mean that can be generated using all TIGGE forecasts. Work along these lines is encouraged.

## Acknowledgments

The authors thank Anabel Bowen and Rob Hine for editing and substantially improving the quality of all the figures. The editor and two anonymous reviewers are also thanked for their very useful comments, which helped us revise an earlier version of this work.

## REFERENCES

Barkmeijer, J., R. Buizza, and T. N. Palmer, 1999: 3D-Var Hessian singular vectors and their potential use in the ECMWF Ensemble Prediction System. *Quart. J. Roy. Meteor. Soc.*, **125**, 2333–2351.

Barkmeijer, J., R. Buizza, T. N. Palmer, K. Puri, and J-F. Mahfouf, 2001: Tropical singular vectors computed with linearized diabatic physics. *Quart. J. Roy. Meteor. Soc.*, **127**, 685–708.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. *Mon. Wea. Rev.*, **129**, 420–436.

Bowler, N. E., A. Arribas, K. R. Mylne, and K. B. Robertson, 2007: The MOGREPS short-range ensemble prediction system. Part I: System description. Met Office NWP Tech. Rep. 497, 18 pp. [Available from The Met Office, FitzRoy Rd., Exeter, EX1 3PB, United Kingdom, and online at www.metoffice.gov.uk.]

Bowler, N. E., A. Arribas, K. R. Mylne, and K. B. Robertson, 2008: The MOGREPS short-range ensemble prediction system. *Quart. J. Roy. Meteor. Soc.*, **134**, 703–722.

Buizza, R., 2008a: The value of probabilistic prediction. *Atmos. Sci. Lett.*, **9**, 36–42, doi:10.1002/asl.170.

Buizza, R., 2008b: Comparison of a 51-member low-resolution (TL399L62) ensemble with a 6-member high-resolution (TL799L91) lagged-forecast ensemble. *Mon. Wea. Rev.*, **136**, 3343–3362.

Buizza, R., and T. N. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. *J. Atmos. Sci.*, **52**, 1434–1456.

Buizza, R., and T. N. Palmer, 1998: Impact of ensemble size on ensemble prediction. *Mon. Wea. Rev.*, **126**, 2503–2518.

Buizza, R., M. Miller, and T. N. Palmer, 1999: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. *Quart. J. Roy. Meteor. Soc.*, **125**, 2887–2908.

Buizza, R., P. L. Houtekamer, Z. Toth, G. Pellerin, M. Wei, and Y. Zhu, 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. *Mon. Wea. Rev.*, **133**, 1076–1097.

Buizza, R., J-R. Bidlot, N. Wedi, M. Fuentes, M. Hamrud, G. Holt, and F. Vitart, 2007: The new ECMWF VAREPS (Variable Resolution Ensemble Prediction System). *Quart. J. Roy. Meteor. Soc.*, **133**, 681–695.

Gilmour, I., L. A. Smith, and R. Buizza, 2001: On the duration of the linear regime: Is 24 hours a long time in weather forecasting? *J. Atmos. Sci.*, **58**, 3525–3539.

Houtekamer, P. L., L. Lefaivre, J. Derome, H. Ritchie, and H. L. Mitchell, 1996: A system simulation approach to ensemble prediction. *Mon. Wea. Rev.*, **124**, 1225–1242.

Houtekamer, P. L., M. Charron, H. Mitchell, and G. Pellerin, 2007: Status of the global EPS at Environment Canada. *Proc. ECMWF Workshop on Ensemble Prediction*, Reading, United Kingdom, ECMWF, 57–68. [Available from ECMWF, Shinfield Park, Reading, RG2 9AX, United Kingdom.]

Johnson, C., and R. Swinbank, 2008: Medium-range multi-model ensemble combination and calibration. Met Office Forecasting Research Tech. Rep. 517, 30 pp. [Available from The Met Office, FitzRoy Rd., Exeter, EX1 3PB, United Kingdom, and online at www.metoffice.gov.uk.]

Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. *Quart. J. Roy. Meteor. Soc.*, **122**, 73–119.

Neter, J., W. Wasserman, and G. A. Whitmore, 1988: *Applied Statistics*. Allyn and Bacon Inc., 1006 pp.

Palmer, T. N., F. Molteni, R. Mureau, R. Buizza, P. Chapelet, and J. Tribbia, 1993: Ensemble prediction. *Proc. ECMWF Seminar on Validation of Models over Europe*, Vol. I, Reading, United Kingdom, ECMWF, 21–66.

Palmer, T. N., and Coauthors, 2007: The ECMWF Ensemble Prediction System: Recent and on-going developments. ECMWF Research Department Tech. Memo. 540, ECMWF, Reading, United Kingdom, 53 pp.

Park, Y-Y., R. Buizza, and M. Leutbecher, 2008: TIGGE: Preliminary results on comparing and combining ensembles. *Quart. J. Roy. Meteor. Soc.*, **134**, 2029–2050.

Shutts, G., 2005: A kinetic energy backscatter algorithm for use in ensemble prediction systems. *Quart. J. Roy. Meteor. Soc.*, **131**, 3079–3100.

Tracton, M. S., and E. Kalnay, 1993: Operational ensemble prediction at the National Meteorological Center: Practical aspects. *Wea. Forecasting*, **8**, 379–398.

Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. *Bull. Amer. Meteor. Soc.*, **74**, 2317–2330.

Vitart, F., and Coauthors, 2008: The new VAREPS-monthly forecasting system: A first step towards seamless prediction. *Quart. J. Roy. Meteor. Soc.*, **134**, 1789–1799.

Wei, M., Z. Toth, R. Wobus, Y. Zhu, C. Bishop, and X. Wang, 2006: Ensemble Transform Kalman Filter-based ensemble perturbations in an operational global prediction system at NCEP. *Tellus*, **58A**, 28–44.