• Anderson, J., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A, 72–83.

• Barker, T., 1991: The relationship between spread and forecast error in extended-range forecasts. J. Climate, 4, 733–742.

• Candille, G., and O. Talagrand, 2005: Evaluation of probabilistic prediction systems for a scalar variable. Quart. J. Roy. Meteor. Soc., 131, 2131–2150.

• Descamps, L., and O. Talagrand, 2007: On some aspects of the definition of initial conditions for ensemble prediction. Mon. Wea. Rev., 135, 3260–3272.

• Hagedorn, R., T. Hamill, and J. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble forecasts. Part I: Two-meter temperatures. Mon. Wea. Rev., 136, 2608–2619.

• Hamill, T., and J. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229.

• Hamill, T., J. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447.

• Hamill, T., R. Hagedorn, and J. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble forecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 2620–2632.

• Houtekamer, P. L., 1993: Global and local skill forecasts. Mon. Wea. Rev., 121, 1834–1846.

• Kuhl, D., and Coauthors, 2007: Assessing predictability with a local ensemble Kalman filter. J. Atmos. Sci., 64, 1116–1140.

• Leith, C. E., 1974: Theoretical skill of Monte Carlo forecasts. Mon. Wea. Rev., 102, 409–418.

• Li, H., E. Kalnay, and T. Miyoshi, 2009: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 523–533, doi:10.1002/qj.371.

• Majumdar, S., C. H. Bishop, I. Szunyogh, and Z. Toth, 2001: Can an ensemble transform Kalman filter predict the reduction in forecast error variance produced by targeted observations? Quart. J. Roy. Meteor. Soc., 127, 2803–2820.

• Oczkowski, M., I. Szunyogh, and D. J. Patil, 2005: Mechanisms for the development of locally low-dimensional atmospheric dynamics. J. Atmos. Sci., 62, 1135–1156.

• Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.

• Patil, D. J., B. R. Hunt, E. Kalnay, J. A. Yorke, and E. Ott, 2001: Local low dimensionality of atmospheric dynamics. Phys. Rev. Lett., 86, 5878–5881.

• Rao, C., 1973: Linear Statistical Inference and Its Applications. 2nd ed. Wiley, 625 pp.

• Satterfield, E., and I. Szunyogh, 2010: Predictability of the performance of an ensemble forecast system: Predictability of the space of uncertainties. Mon. Wea. Rev., 138, 962–981.

• Szunyogh, I., and Coauthors, 2007: The local ensemble transform Kalman filter and its implementation on the NCEP global model at the University of Maryland. Proc. Workshop on Flow Dependent Aspects of Data Assimilation, Reading, United Kingdom, ECMWF, 47–63.

• Szunyogh, I., E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. A. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation system for the NCEP global model. Tellus, 60A, 113–130.

• Talagrand, O., 1981: A study of the dynamics of four-dimensional data assimilation. Tellus, 33, 43–60.

• Talagrand, O., R. Vautard, and B. Strauss, 1999: Evaluation of probabilistic prediction systems. Proc. Workshop on Predictability, Reading, United Kingdom, ECMWF, 1–25.

• Tribbia, J. J., and D. P. Baumhefner, 2004: Scale interactions and atmospheric predictability: An updated perspective. Mon. Wea. Rev., 132, 703–713.

• Whitaker, J., and A. Loughe, 1998: The relationship between ensemble spread and ensemble mean skill. Mon. Wea. Rev., 126, 3292–3302.

Assessing the Performance of an Ensemble Forecast System in Predicting the Magnitude and the Spectrum of Analysis and Forecast Uncertainties

Elizabeth Satterfield, Texas A&M University, College Station, Texas

Istvan Szunyogh, Texas A&M University, College Station, Texas


Abstract

The ability of an ensemble to capture the magnitude and spectrum of uncertainty in a local linear space spanned by the ensemble perturbations is assessed. Numerical experiments are carried out with a reduced-resolution 2004 version of the model component of the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS). The local ensemble transform Kalman filter (LETKF) data assimilation system is used to assimilate observations in three steps, gradually adding more realistic features to the observing network. In the first experiment, randomly placed, noisy, simulated vertical soundings, which provide 10% coverage of horizontal model grid points, are assimilated. Next, the impact of an inhomogeneous observing system is introduced by assimilating simulated observations in the locations of real observations of the atmosphere. Finally, observations of the real atmosphere are assimilated.

The most important findings of this study are the following: predicting the magnitude of the forecast uncertainty and the relative importance of the different patterns of uncertainty is, in general, a more difficult task than predicting the patterns of uncertainty; the ensemble, which is tuned to provide near-optimal performance at analysis time, underestimates not only the total magnitude of the uncertainty, but also the magnitude of the uncertainty that projects onto the space spanned by the ensemble perturbations; and finally, a strong predictive linear relationship is found between the local ensemble spread and the upper bound of the local forecast uncertainty.

Corresponding author address: Elizabeth Satterfield, Naval Research Laboratory, Marine Meteorology Division, Monterey, CA 93943-5502. E-mail: elizabeth.satterfield.ctr@nrlmry.navy.mil

This article is included in the Third THORPEX International Science Symposium special collection.


1. Introduction

In an earlier paper (Satterfield and Szunyogh 2010) we investigated the performance of the linear space S spanned by the ensemble perturbations in capturing the space of uncertainties in 1000-km horizontal length scale local regions. In that paper, we found that in the 72–120-h forecast range:

  • the performance of S in capturing the forecast error patterns was strongly flow dependent: the more rapid the error growth, the more reliable S was in capturing the forecast errors; and

  • the performance of S was highly predictable because of its linear dependence on the flow-dependent E-dimension diagnostic, which can be computed from a forecast ensemble.

In Satterfield and Szunyogh (2010), we emphasized that a good prediction of the space of uncertainties does not necessarily guarantee a good prediction of the magnitude, or in general, the probability distribution of the uncertainties. This paper reports on our first attempt at investigating the performance of the ensemble in predicting these characteristics of the forecast uncertainty. In particular, we assess the skill of the ensemble in predicting the magnitude of the uncertainties and the relative importance of the different error patterns within S. A more expansive introduction can be found in Satterfield and Szunyogh (2010).

The structure of this paper is as follows. In section 2, we introduce the diagnostics we use to assess and explain the performance of the ensemble prediction system at the different locations and times. In section 3, we describe the design of the numerical experiments. In section 4, we examine the ability of the ensemble to accurately capture the magnitude and spectrum of forecast uncertainties. In section 5, we summarize our conclusions.

2. Diagnostics

As in Satterfield and Szunyogh (2010), we explore the predictive qualities of the ensemble by using linear diagnostics applied to the ensemble perturbations in a small local neighborhood of each model grid point. In what follows, we briefly summarize the general mathematical model we adopt and introduce the local diagnostics we employ to explore the predictability of the magnitude and spectrum of the forecast uncertainties.

a. Local vectors and their covariance

We define the local state vector x(ℓ) at each model grid point ℓ by the model representation of the state within a local volume centered at location (grid point) ℓ. We assume that the dimension of x(ℓ), which is defined by the product of the number of grid points within the local volume and the number of model variables considered at each grid point, is equal to N. To simplify the notation, in what follows, we drop the argument ℓ from the notation of the local state vector, because all equations are valid at any arbitrary location ℓ.

We predict the uncertainty in the knowledge of the local state at both the analysis and the different forecast times by an ensemble-based estimate P of the local error covariance matrix:

$$\mathbf{P} = \frac{1}{K-1}\sum_{k=1}^{K} \mathbf{x}'^{(k)}\left(\mathbf{x}'^{(k)}\right)^{\mathrm{T}}. \qquad (1)$$

In Eq. (1), K is the number of ensemble members, T denotes the matrix transpose, and the ensemble perturbations {x′(k): k = 1, …, K} are defined by the difference

$$\mathbf{x}'^{(k)} = \mathbf{x}^{(k)} - \bar{\mathbf{x}} \qquad (2)$$

between the ensemble members {x(k): k = 1, …, K} and the ensemble mean

$$\bar{\mathbf{x}} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{x}^{(k)}. \qquad (3)$$
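For concreteness, Eqs. (1)–(3), together with the eigenvector basis used in the next subsection, can be sketched in a few lines of numpy. The array layout and names below are illustrative assumptions, not the code of our system.

```python
import numpy as np

def local_covariance(ens):
    """Ensemble mean, perturbations, and covariance estimate, Eqs. (1)-(3).

    ens: (K, N) array of K local state vectors (a hypothetical layout; N is
    the number of grid points in the local volume times the number of
    variables considered at each grid point).
    """
    K = ens.shape[0]
    x_bar = ens.mean(axis=0)          # ensemble mean, Eq. (3)
    X = ens - x_bar                   # perturbations x'(k), Eq. (2)
    P = X.T @ X / (K - 1)             # covariance estimate P, Eq. (1)
    return x_bar, X, P

# The K-1 leading (nonzero) eigenpairs of P provide the orthonormal basis
# {u_k} of the ensemble-spanned space S used in section 2b.
rng = np.random.default_rng(0)
x_bar, X, P = local_covariance(rng.standard_normal((40, 125)))
lam, U = np.linalg.eigh(P)                      # eigenvalues in ascending order
lam, U = lam[::-1][:39], U[:, ::-1][:, :39]     # keep the K-1 = 39 leading pairs
```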

b. Diagnostics for the magnitude of the uncertainties

In Satterfield and Szunyogh (2010), we focused on investigating the efficiency of the linear space S defined by the range of P in capturing the error

$$\boldsymbol{\xi} = \mathbf{x}^e - \mathbf{x}^t \qquad (4)$$

in the deterministic prediction x^e started from the ensemble mean analysis. In Eq. (4), x^t is the model representation of the true state. In the present paper, our goal is to investigate (i) the accuracy of the ensemble prediction of the expected value of the magnitude of the uncertainty and (ii) the accuracy of the ensemble prediction of the spectrum of uncertainties within S. To achieve this goal, we apply diagnostics to the difference

$$\delta\mathbf{x}^t = \mathbf{x}^t - \bar{\mathbf{x}} \qquad (5)$$

between the model representation of the true state and the ensemble mean state estimate, instead of to ξ. The difference δx^t is often referred to in the literature as the error in the ensemble mean forecast. This terminology is justified when the ensemble mean is used as a deterministic forecast, which is motivated by the fact that the mean of a perfectly designed ensemble would be the most accurate deterministic forecast in the root-mean-square sense (Leith 1974). In our study, however, we consider x̄ to be the prediction of the mean of a probability distribution. Since, except at analysis time, δx^t is expected to be nonzero even if x̄ is a perfect prediction of the mean of the probability distribution, we refer to δx^t as either the difference between the ensemble mean and the model representation of the true state or the local forecast uncertainty. [The notation δx^t is consistent with the notation of Satterfield and Szunyogh (2010), where we represented all local state vectors x as the sum of the ensemble mean state estimate x̄ and a perturbation δx.]
The motivation to apply diagnostics to δx^t, instead of ξ, is that a verifiable optimality condition between ‖δx^t‖, the magnitude of δx^t, and the ensemble variance

$$V_\ell = \frac{1}{K-1}\sum_{k=1}^{K}\left\|\mathbf{x}'^{(k)}\right\|^2 \qquad (6)$$

exists at all forecast times: because V_ℓ is a prediction of the total variance TV_ℓ = E[(δx^t)²], where (δx^t)² = (δx^t)ᵀ(δx^t) = ‖δx^t‖², its expected value, V = E[V_ℓ], should satisfy the equation

$$V = \mathrm{TV}, \qquad (7)$$

where TV = E[TV_ℓ]. (Hereafter, E[·] denotes the expected value.) Verifying the relationship defined by Eq. (7), which is often referred to as the spread–skill relationship, is one of the most widely used diagnostics for the validation of an ensemble prediction system. In contrast, all we know about the magnitude ‖ξ‖ of ξ is that it should satisfy E[ξ²] = V (where ξ² = ξᵀξ = ‖ξ‖²) at analysis time and, under an ergodic hypothesis, E[ξ²] = 2V once the forecast time is so long that predictability is completely lost (Leith 1974). That is, no verifiable diagnostic relationship exists between ξ² and V at the intermediate forecast lead times.
We decompose δx^t as

$$\delta\mathbf{x}^t = \delta\mathbf{x}^{t(\|)} + \delta\mathbf{x}^{t(\perp)}. \qquad (8)$$

Here, δx^{t(‖)} is the component of δx^t that projects onto S, and δx^{t(⊥)} is the component of δx^t that projects onto the null space of P (orthogonal to the space spanned by the ensemble perturbations). Since the normalized eigenvectors associated with the first K − 1 eigenvalues of P, {u_k: k = 1, …, K − 1}, define an orthonormal basis in S, they provide a convenient basis to compute δx^{t(‖)} by

$$\delta\mathbf{x}^{t(\|)} = \sum_{k=1}^{K-1}\left[\left(\delta\mathbf{x}^t\right)^{\mathrm{T}}\mathbf{u}_k\right]\mathbf{u}_k. \qquad (9)$$

In addition, the first K − 1 (nonzero) eigenvalues of P satisfy the following equation:

$$\sum_{k=1}^{K-1}\lambda_k = V_\ell. \qquad (10)$$

We introduce the notation TVS = E[(δx^{t(‖)})²] for the variance in S. In the optimal case, δx^t would fully project onto S, satisfying δx^t = δx^{t(‖)} and leading to TVS = TV. But, when part of δx^t is not captured by the ensemble, δx^{t(⊥)} ≠ 0, which leads to TVS < TV.

When the ensemble correctly represents the variance of δx^{t(‖)}, V = TVS. For a given ensemble system, V can be either smaller or larger than TVS. In the former case (V < TVS), the ensemble underestimates the magnitude of the uncertainty that can be explained by S, while in the latter case (V > TVS) it overestimates the magnitude that can be explained by S. It may even happen that the ensemble satisfies the optimality condition of Eq. (7) by overestimating the true variance in S to compensate for the variance lost by not capturing all true error directions. This situation occurs when the ensemble variance is tuned to satisfy Eq. (7) at a given forecast time (e.g., 48 h), but the ensemble cannot fully capture δx^t. Such a situation can be diagnosed by verifying that V > TVS. When analyzing the results of our numerical experiments, we therefore always make a three-way comparison among TV, TVS, and V.
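In sketch form, the local samples of TV, TVS, and V that enter this comparison could be computed as follows (a minimal numpy illustration under assumed array shapes; the expectations are then estimated by averaging the returned local values over locations and forecast cases).

```python
import numpy as np

def magnitude_diagnostics(ens, x_true):
    """Local samples of TV_l, TVS_l, and V_l for one location and time.

    ens: (K, N) ensemble of local state vectors; x_true: (N,) model
    representation of the true local state (shapes are illustrative).
    """
    K = ens.shape[0]
    x_bar = ens.mean(axis=0)
    X = ens - x_bar
    dxt = x_true - x_bar                       # local uncertainty, Eq. (5)
    V_l = np.sum(X ** 2) / (K - 1)             # ensemble variance, Eq. (6)
    TV_l = dxt @ dxt                           # sample of ||dxt||^2
    lam, U = np.linalg.eigh(X.T @ X / (K - 1))
    U = U[:, ::-1][:, :K - 1]                  # basis {u_k} of S
    dxt_par = U @ (U.T @ dxt)                  # projection onto S, Eq. (9)
    TVS_l = dxt_par @ dxt_par                  # part of ||dxt||^2 captured by S
    return TV_l, TVS_l, V_l
```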

c. Diagnostics for the spectrum of uncertainties

While the explained variance diagnostic of Satterfield and Szunyogh (2010) quantifies the efficiency of the space S in capturing the space of uncertainty in the state estimate, the comparison of TV, TVS, and V quantifies the quality of the ensemble in predicting the magnitude of the uncertainty. These diagnostics, however, do not provide information about the performance of the ensemble in distinguishing between the relative importance of the different error patterns within S. To introduce a diagnostic that can measure the performance of the ensemble in quantifying the contributions of the different error patterns to the total error within S, we first recall that the eigenvalue λ_k is the ensemble-based prediction of the variance of the uncertainty in the kth eigendirection.1 We choose the d ratio,

$$d_k = \frac{\left(\delta x_k^{t(\|)}\right)^2}{\lambda_k}, \qquad (11)$$

which was first introduced in Ott et al. (2004), to measure the accuracy of the prediction of the variance in the kth eigendirection. In Eq. (11), δx_k^{t(‖)} is the kth coordinate of δx^{t(‖)} in the coordinate system {u_k: k = 1, …, K − 1}. Since the d ratio is defined independently for each eigendirection, it is more appropriate to talk about a spectrum of the d ratio. It can be shown that if the ensemble correctly predicts, in a statistical sense, the uncertainty in the kth direction, the expected value, E[d_k], of d_k at a given time and location is equal to 1:

$$E[d_k] = \frac{E\left[\left(\delta x_k^{t(\|)}\right)^2\right]}{\lambda_k} = 1. \qquad (12)$$

In Eq. (12) we made use of the fact that λ_k is the ensemble-based prediction of E[(δx_k^{t(‖)})²]; thus, for a correct prediction the two quantities must be equal. Since we have a verifiable optimality condition only for the expected value of d_k, we cannot use d_k to measure the performance of the ensemble at a given time and location. Instead, we collect statistical samples of d_k and obtain estimates of the expected value by computing the sample means. A sample mean smaller than 1 indicates that the ensemble tends to overestimate the uncertainty in the kth direction, while a sample mean larger than 1 indicates that the ensemble tends to underestimate the uncertainty in the kth direction.
Finally, we note that the condition E[d_k] = 1 is always satisfied when the random variable

$$\tilde{d}_k = \frac{\delta x_k^{t(\|)}}{\sqrt{\lambda_k}} \qquad (13)$$

has an expected value equal to 0, E[d̃_k] = 0, and a variance equal to 1, var(d̃_k) = 1, since

$$E[d_k] = E\left[\tilde{d}_k^2\right] = \mathrm{var}\left(\tilde{d}_k\right) + \left(E\left[\tilde{d}_k\right]\right)^2 = 1 + 0 = 1. \qquad (14)$$

A random variable similar to d̃_k was first introduced for the verification of the ensemble forecast of a scalar variable by Talagrand et al. (1999) and was later named the reduced centered random variable (RCRV) by Candille and Talagrand (2005) and Descamps and Talagrand (2007). The difference between d̃_k and the RCRV is that, while the RCRV is defined for a scalar atmospheric state variable, d̃_k is defined for a vector, the local state vector: a spectrum of scalar ratios is obtained by first projecting the centered local state vector onto the principal components of the ensemble-based estimate of the background error covariance matrix and then performing the reduction by the ensemble spread in that particular eigendirection only.
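A minimal sketch of the computation of the d-ratio spectrum, under the same assumed array layout as before, is given below; the eigenvalue floor is an illustrative numerical safeguard, not a parameter of the paper.

```python
import numpy as np

def d_ratio_spectrum(ens, x_true, floor=1e-12):
    """Sample of d_k, Eq. (11), and of the reduced variable of Eq. (13) for
    one location and time; sample means of d_k over many cases estimate
    E[d_k]. A mean below (above) 1 flags over- (under-) estimated variance
    in direction k. ens: (K, N); x_true: (N,)."""
    K = ens.shape[0]
    x_bar = ens.mean(axis=0)
    X = ens - x_bar
    lam, U = np.linalg.eigh(X.T @ X / (K - 1))
    lam, U = lam[::-1][:K - 1], U[:, ::-1][:, :K - 1]
    coords = U.T @ (x_true - x_bar)          # kth coordinate of dxt(||)
    d_tilde = coords / np.sqrt(np.maximum(lam, floor))  # Eq. (13)
    return d_tilde ** 2, d_tilde             # d_k = d_tilde_k^2, Eq. (11)
```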

3. Experiment design

Since our experiment design is identical to that of Satterfield and Szunyogh (2010), here we provide only a brief summary of the design of the experiments. All experiments are carried out with a reduced-resolution (T62L28) 2004 version of the model component of the operational National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS). Observations are assimilated with the implementation of the local ensemble transform Kalman filter (LETKF) data assimilation scheme on the NCEP GFS, which was described in detail in Szunyogh et al. (2008). With the help of the LETKF, we generate a K = 40-member analysis ensemble, which is then used to generate the background ensemble of the next analysis cycle and to provide the initial conditions of the 120-h ensemble forecasts used in the diagnostics. All numerical experiments are carried out for the period between 0000 UTC 1 January and 1800 UTC 29 February 2004. The diagnostics are computed based on the forecasts started at 0000 and 1200 UTC between 11 January and 15 February on a 2.5° × 2.5° resolution grid.

Three experiments are carried out: two under the perfect-model hypothesis, assimilating observations of a known true state space trajectory, which was generated with a continuous integration of the model started from the operational analysis of NCEP at 0000 UTC 1 January 2004; and one in a realistic setting, assimilating observations of the real atmosphere. The difference between the two perfect-model experiments is that one of them assimilates 2000 vertical soundings of the model atmosphere, which are randomly distributed in the horizontal direction, while the other assimilates observations whose types and locations are identical to those in the realistic experiment. While there are error sources common to all experiments, comparing results from the two perfect-model experiments enables us to detect features that are most likely due to inhomogeneities in a realistic observing network. Similarly, by comparing the results from the experiment that assimilates observations of the real atmosphere to the results of the experiment that uses identically distributed simulated observations, we can detect features that are most likely due to model errors. For the experiment conducted in the realistic setting, high-resolution (T254L64) operational NCEP analyses truncated to 2.5° × 2.5° resolution are used as a proxy for the "true" state. These operational analyses were obtained by NCEP assimilating a large number of satellite radiance observations in addition to the conventional observations used in our experiments.

The one parameter of the LETKF that has an important effect on our diagnostics is the multiplicative covariance inflation factor. In our implementation of the LETKF, we apply multiplicative covariance inflation at each analysis step to increase the magnitude of the estimated analysis uncertainty, compensating for the loss of ensemble variance due to sampling errors, the effects of nonlinearities, and model errors. In essence, the covariance inflation factor controls the magnitude of the analysis ensemble perturbations. In our code, the covariance inflation factor, ρ = ρ(σ, φ), is a function of the model vertical coordinate σ and the geographical latitude φ. That is, ρ is constant in the zonal direction at a given model level and latitude. We tuned the covariance inflation factor independently for the three experiments, trying to ensure that the ensemble of analysis perturbations satisfies the condition V ≈ TVS in each experiment. We note that this tuning condition is different from the one used in the common practice of ensemble forecasting, where the magnitude of the analysis ensemble perturbations is tuned so that V ≈ TV. Our motivation to use V ≈ TVS instead of V ≈ TV is that in our setting the magnitude of the variance inflation affects not only the magnitude of the analysis perturbations, but also the quality of the analysis. In particular, we find that while increasing the variance inflation increases V, it also increases TV. For instance, for the experiment with observations of the real atmosphere, when the variance inflation is doubled, V increases from 154.91 to 389.23 J kg⁻¹ and TV increases from 389.23 to 843.68 J kg⁻¹. (The associated degradation in the analyses and forecasts of the meridional component of the 500-hPa wind is shown in Fig. 1.) Thus, increasing the variance inflation not only has the undesirable effect of increasing the analysis error, but it also fails to improve the relationship between V and TV. This problem does not exist in the current practice of numerical weather prediction, because there the data assimilation system and the ensemble prediction system are tuned independently. But our result should serve as a cautionary note as operational centers move in the direction of unifying their analysis and ensemble forecasting systems.
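In sketch form, the inflation step amounts to rescaling the analysis perturbations: multiplying them by √ρ multiplies the covariance estimate of Eq. (1) by ρ. The (K, N) array layout below is an assumption made for illustration.

```python
import numpy as np

def inflate(ens_analysis, rho):
    """Multiplicative covariance inflation: scaling the analysis
    perturbations by sqrt(rho) multiplies the ensemble-based covariance
    estimate by rho. In the paper, rho = rho(sigma, phi) varies with model
    level and latitude; here it may be a scalar or any array that
    broadcasts against the assumed (K, N) state layout."""
    x_bar = ens_analysis.mean(axis=0)
    return x_bar + np.sqrt(rho) * (ens_analysis - x_bar)
```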

Fig. 1. The time mean forecast error shown for the meridional component of the wind at 500 hPa averaged over all latitudes in the NH extratropics. Results are shown for the experiment that assimilates conventional observations using the original values of inflation (triangles) and doubled inflation (circles).

We define the local state vector by all temperature, wind, and surface pressure grid point variables in a cube that is defined by 5 × 5 horizontal grid points and the entire column of the model atmosphere. Computing projections in the vector space requires the definition of a scalar product on S. In this paper, we follow the approach of Oczkowski et al. (2005) and Kuhl et al. (2007): we use the Euclidean scalar product, but before we compute it, we transform the ensemble perturbations to ensure that all vector components have the same physical dimension. In particular, we choose the transformation weights so that the square of the Euclidean norm, computed by taking the scalar product of a transformed ensemble perturbation with itself, has a dimension of energy. The use of this transformation to compute scalar products of the perturbations of the state vector of a primitive equation model was first suggested by Talagrand (1981).
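A minimal sketch of such a transformation is given below. The weights follow the commonly used total-energy inner product (cf. Talagrand 1981; Oczkowski et al. 2005); the reference values and the variable layout are illustrative assumptions rather than the exact constants of our system.

```python
import numpy as np

CP = 1004.0    # specific heat at constant pressure (J kg^-1 K^-1)
RD = 287.0     # gas constant for dry air (J kg^-1 K^-1)
T_R = 280.0    # reference temperature (K), an assumed value
P_R = 1.0e5    # reference pressure (Pa), an assumed value

def energy_rescale(u, v, T, ps):
    """Rescale wind (m s^-1), temperature (K), and surface pressure (Pa)
    perturbation components so that the squared Euclidean norm of the
    stacked vector has the dimension of energy per unit mass (J kg^-1),
    up to a constant factor."""
    return np.concatenate([u, v,
                           np.sqrt(CP / T_R) * T,
                           np.sqrt(RD * T_R) * (ps / P_R)])
```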

4. Numerical experiments

a. Prediction of the magnitude of forecast error

Figure 2 shows the evolution of TV, TVS, and V. In this figure the expected value is estimated by taking the spatial average over all grid points in the NH extratropics (30°–90°N) and the temporal average over all forecasts started between 0000 UTC 11 January 2004 and 0000 UTC 15 February 2004.

Fig. 2. The time evolution of TV (squares), TVS (triangles), and V (circles) for the NH extratropics. Results are shown for experiments that assimilate (top) randomly distributed simulated observations, (middle) simulated observations at the locations of conventional observations, and (bottom) observations of the real atmosphere. Note the scale is exponential. Also note the different scale in the (bottom) panel.

Interestingly, the difference between TVS and V at longer lead times is much larger than the difference between TV and TVS. In other words, although the linear space spanned by the ensemble perturbations provides a good representation of the space of forecast uncertainties, the ensemble severely underestimates the total variance in S. Even though this underestimation is more serious in the experiment where model errors have an effect on the total error variance TV, the underestimation in the two perfect-model experiments is also significant. Thus, the underestimation of the forecast error variance cannot be fully explained by the lack of accounting for the effect of model errors in our ensemble.

To investigate whether the underestimation of TV by V at the different forecast times is primarily due to the underestimation of the analysis error by the analysis ensemble perturbations or to insufficient perturbation growth with time, we introduce the magnitude doubling time T_d to measure the rate of the error and perturbation growth. Here T_d is obtained by first fitting exponential curves of the form x(t) = x(0) exp(at) to TV, V, and TVS and then computing T_d = a⁻¹ ln 2. (The smaller T_d, the faster the growth of the magnitude of the error or the ensemble perturbation.) In the two perfect-model experiments, T_d is very similar for TV and V: in the experiment with simulated observations at random locations, T_d is 25.6 h for TV and 23.1 h for V, while in the experiment with simulated observations at realistic locations, T_d is 27.0 h for both TV and V. These numbers suggest that, in the perfect-model experiments, the underestimation of the typical (average) magnitude of the forecast errors by the ensemble is primarily due to the underestimation of the typical (average) magnitude of the analysis error.
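The fit can be sketched as a linear least squares fit of the logarithm of a variance series against lead time; the series in the example below is synthetic, with made-up numbers.

```python
import numpy as np

def doubling_time(t_hours, series):
    """Magnitude doubling time T_d from an exponential fit
    x(t) = x(0) exp(a t): a linear least-squares fit of log(series)
    against lead time gives a, and T_d = ln(2) / a."""
    a, _ = np.polyfit(t_hours, np.log(series), 1)
    return np.log(2.0) / a

t = np.arange(0, 132, 12)            # 0-120-h lead times at 12-h increments
tv = 300.0 * np.exp(0.026 * t)       # synthetic, made-up TV series
print(doubling_time(t, tv))          # ~26.7 h for this synthetic series
```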

In the experiment that uses observations of the real atmosphere, T_d is 33.2 h for TV and 42.7869 h for V. We attribute the slower growth of V relative to TV to the lack of representation of the effects of model errors in the ensemble, since we did not observe a similar difference in the two perfect-model experiments. Thus, in the experiment with observations of the real atmosphere, the underestimation of the forecast error is due to a combination of the underestimation of the analysis error and the lack of representation of the effects of model errors.

Finally, we note that, interestingly, TVS grows faster than either TV or V in all three experiments: T_d for TVS is 19.7 h for the experiment with simulated observations at random locations, 22.4 h for the experiment with simulated observations at realistic locations, and 25.8 h for the experiment with observations of the real atmosphere. In addition, the initial growth rate of TVS and V is faster than exponential: up until the 36-h lead time, the time evolution of the growth rate of TVS and V is best approximated by a second-order polynomial.

Next, we turn our attention to the evolution of the spectral distribution of the forecast errors and the prediction of this property by the ensemble. In particular, Fig. 3 shows the zonal power spectrum of the meridional wind at 500 hPa averaged over all latitudes in the NH extratropics and over time. The left panels show results that were obtained by computing the power spectra for each ensemble perturbation and then taking the ensemble mean of the spectra. The right panels show the spectra for the difference δx^t between the model representation of the truth and the ensemble mean. The shape and the time evolution of the spectra are very similar in all panels, except for the lower-left panel, which shows the ensemble-based prediction of the spectrum for the experiment with observations of the real atmosphere. Unlike in the other panels, where a dominant range of wavenumbers 6–10 emerges from a relatively flat initial spectrum, here, at analysis time, the spectrum has a well-pronounced peak at wavenumber 2. A more detailed investigation of this spectrum revealed that this peak is associated with large ensemble variance in a region over the Arctic north of Russia.
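The zonal power spectra can be obtained with a discrete Fourier transform along each latitude circle; a minimal sketch, with an assumed 2.5° grid layout, is the following.

```python
import numpy as np

def zonal_power_spectrum(field):
    """Zonal power spectrum averaged over latitudes. field: (nlat, nlon)
    array, e.g., the 500-hPa meridional wind component of one ensemble
    perturbation or of dxt on the 2.5-degree grid (shapes illustrative)."""
    nlon = field.shape[1]
    coeff = np.fft.rfft(field, axis=1) / nlon   # zonal Fourier coefficients
    power = np.abs(coeff) ** 2
    power[:, 1:-1] *= 2.0                       # fold in negative wavenumbers (nlon even)
    return power.mean(axis=0)                   # index = zonal wavenumber

v_pert = np.random.default_rng(1).standard_normal((25, 144))  # 30-90N rows
spectrum = zonal_power_spectrum(v_pert)
```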

Fig. 3. The zonal power spectrum of the meridional component of the wind averaged over all latitudes in the NH extratropics and over time. Results are shown for 0–120-h forecast lead times at 12-h increments, averaged over the (left) ensemble perturbations and (right) for δx^t for the experiments that assimilate (top) simulated observations in random locations, (middle) simulated observations in realistic locations, and (bottom) observations of the real atmosphere.

While the qualitative behavior of the spectra of δx^t and the ensemble perturbations is similar, as can be expected based on the results discussed earlier in this section, the amplitudes are larger for δx^t than for the ensemble perturbations. This result suggests that the underestimation of the typical magnitude of the analysis error in all three experiments, and the lack of representation of the model error in the experiment with observations of the real atmosphere, leads to an underestimation of the forecast errors that is most severe at the synoptic scales, the scales where the error growth is the fastest. This result is in good agreement with that of Tribbia and Baumhefner (2004), who found that the propagation of forecast uncertainties with time in spectral space was toward the synoptic scales, where the growth rate of their magnitude became exponential.

b. Prediction of the spatiotemporal changes in the magnitude of forecast error

The usual candidate for a predictor of ‖δx^t‖ at location ℓ is the ensemble spread, V_ℓ^{1/2}, at the same location. These two quantities are known to have a positive correlation, which is typically low at analysis time and asymptotes to a level of about 0.5 by about the 72-h lead time (e.g., Barker 1991; Houtekamer 1993; Whitaker and Loughe 1998). Kuhl et al. (2007) found that the correlation for our system was in good agreement with those earlier results. Since a correlation of 0.5 for a sample size of N = 129 600 suggests the existence of a linear relationship between V_ℓ^{1/2} and ‖δx^t‖, a prediction of ‖δx^t‖ based on V_ℓ^{1/2} with a linear regression may seem to be a natural choice. Computing the correlation for our experiments, we find that it is largest for the case of simulated observations at random locations (0.58), slightly lower for the case of realistically placed simulated observations (0.52), and much lower for the realistic case (0.26). This relatively large drop in the correlation for the realistic case would itself provide an argument against using a linear regression to predict ‖δx^t‖. An even more problematic feature of the relationship between ‖δx^t‖ and V_ℓ^{1/2}, which is illustrated by Fig. 4, is that for larger values of V_ℓ^{1/2}, ‖δx^t‖ varies over a much wider range of values. To better understand the problematic aspects of this result, we recall from linear statistics that univariate regression predicts the value of a random variable y based on a predictor x by the conditional expectation E[y|x] of y given x [e.g., p. 264 in Rao (1973)]. Thus, a prediction of ‖δx^t‖ by a prediction of its conditional expectation does not reflect the large potential magnitude of the forecast error for a large value of the spread. This motivates us to investigate the relationship between V_ℓ^{1/2} and the upper bound of ‖δx^t‖, instead of the expectation of ‖δx^t‖ given V_ℓ^{1/2}.

Fig. 4. Linear regression for ensemble skill based on spread. Shown are the NH results for the actual values (gray dots) and predicted values (black line) at the 5-day lead time for experiments that assimilate (top) simulated observations in random locations, (middle) simulated observations in realistic locations, and (bottom) observations of the real atmosphere.

We start our investigation by breaking up the 36-day dataset of 120-h forecasts into two sets of 18 days, and we search for a quantitative linear relationship between V_ℓ and the upper bound of ‖δx^t‖² given V_ℓ based on the first 18 days (training period). We then use the functional relationship found for this training period to predict the upper bound of ‖δx^t‖² based on V_ℓ for the second 18-day period. To dampen the effects of outliers, we use the 95th percentile of the bin values of ‖δx^t‖² instead of the bin maximum. To obtain a quantitative prognostic relationship, we first order the values of V_ℓ for the training period and divide them into 100 bins, each containing an equal number of data points. Each bin provides a pair of data: a value of V_ℓ defined by its mean for the bin and the 95th percentile of ‖δx^t‖². A similar technique was presented in Majumdar et al. (2001), where binned error variances were compared to ensemble sample variances.
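The binning and regression procedure can be sketched as follows; the function and variable names are illustrative, not those of our code.

```python
import numpy as np

def upper_bound_fit(V, err2, nbins=100, q=95):
    """Bin-based linear model for the upper bound of the squared local
    uncertainty: order the V_l samples, split them into nbins
    equal-population bins, pair each bin mean of V_l with the q-th
    percentile of ||dxt||^2 in the bin, and fit a straight line.
    V, err2: 1-D arrays of V_l and ||dxt||^2 from the training period."""
    order = np.argsort(V)
    bin_V = np.array_split(V[order], nbins)
    bin_e = np.array_split(err2[order], nbins)
    x = np.array([b.mean() for b in bin_V])
    y = np.array([np.percentile(b, q) for b in bin_e])
    slope, intercept = np.polyfit(x, y, 1)
    r_training = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r_training

# The fitted line, y = slope * x + intercept, is then used to predict the
# 95th percentile of ||dxt||^2 from the bin means of V_l in the
# independent verification period.
```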

Based on the training dataset, the correlation between the bin mean of V_ℓ and the bin 95th percentile of ‖δx^t‖² is 0.9889, 0.8176, and 0.9206 at the 120-h lead time for the experiments that assimilate simulated observations in random locations, simulated observations in realistic locations, and observations of the real atmosphere, respectively; these values are statistically significant at the 99.99% level by a t test. This suggests that V_ℓ may be a good linear predictor of the upper bound of ‖δx^t‖², even in the realistic case. Thus, we use the linear regression coefficients obtained for the training dataset to predict the 95th percentile of ‖δx^t‖² for the second 18 days. The results are summarized in Fig. 5. The correlation values (0.9518, 0.8685, and 0.8568) between the predicted and the actual 95th percentile values of ‖δx^t‖² indicate a linear relationship, which is statistically significant at the 99.99% confidence level by a t test. Encouraged by the strong linear predictive relationship we find at 120 h, we turn our attention to the shorter lead times (figures for shorter lead times are not shown). At analysis time, we find lower correlation values between the bin mean of V_ℓ and the bin 95th percentile of ‖δx^t‖² (0.2472, 0.7076, and 0.2182), as well as lower correlation values between the predicted and the actual 95th percentile values of ‖δx^t‖² (0.2902, 0.6573, and 0.4161), than at the 120-h lead time. As can be expected, the correlation values increase with forecast lead time: by the 48-h lead time, both perfect-model experiments show correlation values greater than 0.75 for both the training period and the prediction. For the experiment that assimilates randomly placed simulated observations, the lowest values of V_ℓ, which correspond to the highest values of ‖δx^t‖², are found in the polar circle, where, by experiment design, we consider a greater number of observations in each local region.2 For the case of realistic observations, the correlation values remain relatively low until around the 96-h lead time, where we find correlation values of 0.7259 for the training period and 0.6382 for the prediction of the 95th percentile values of ‖δx^t‖². We recall that 96 h is about the lead time at which the peak in the power spectrum of the ensemble perturbations moves into the k = 8 range, in better agreement with that of δx^t (Fig. 3).

Fig. 5. Mean V and the 95th percentile of TV of data divided equally into 100 bins for the NH extratropics for the first 18 days (triangles). The linear regression curve fitted to these data is shown by a solid straight line. If the prediction of the 95th percentile of TV by the linear statistical model was perfect, the actual values for the second 18 days (open circles) would fall on this line. Shown are the distributions for (left) the analysis and (right) the 120-h forecast lead time for experiments that assimilate (top) randomly distributed simulated observations, (middle) simulated observations at the locations of conventional observations, and (bottom) observations of the real atmosphere. The legends show the correlation between V and the 95th percentile of TV in the training dataset (Rtraining) and between V and the predicted value of the 95th percentile of TV.

c. Spectrum of the d ratio

We now turn our attention to investigating the efficiency of the ensemble in distinguishing between the importance of the eigendirections (error patterns in physical space) in S. We first compute the spectrum of the d ratio d_k using the same definition of the local volume as in our calculations of the E dimension and the explained variance. The results are summarized in Fig. 6.

Fig. 6. The time mean of the NH average spectrum of the ratio d_k, calculated for all assimilated variables in the local regions with energy rescaling. Results are shown for (left) the analysis time and (right) the 5-day lead time for experiments that assimilate (top) randomly distributed simulated observations, (middle) simulated observations at the locations of conventional observations, and (bottom) observations of the real atmosphere. The average is taken over all forecasts started between 0000 UTC 11 Jan 2004 and 0000 UTC 15 Feb 2004.

First, we discuss the results for analysis time (left panels of Fig. 6). We find that for the experiment that assimilates simulated observations in random locations, the ensemble only slightly underestimates the error in the directions it captures (top-left panel). When realistically placed simulated observations are used, the ensemble tends to underestimate the uncertainty in all captured directions, with the exception of the few leading directions (middle-left panel). In the experiment with observations of the real atmosphere, the uncertainty is underestimated in all directions captured by the ensemble. The similarity between the shapes of the spectra in the two experiments that assimilate observations at realistic locations, and the flat spectrum in the third experiment, where observations are nearly uniformly distributed, suggests that the larger underestimation of the error variance in the trailing directions is due to the uneven distribution of observations.

As forecast time increases, the underestimation of the error by the ensemble becomes increasingly severe in all directions and in all experiments. We show results for the 120-h forecast time (right panels of Fig. 6). In the experiment with randomly distributed observations, the spectrum remains flat (top-right panel), while in the experiment with realistically distributed simulated observations the slope of the spectrum does not increase. In the realistic case, in contrast, the spectrum becomes much steeper, indicating an increasingly severe underestimation toward the trailing directions. Comparing the spectra from the two experiments with realistically distributed observations, we conclude that model errors lead to a more severe underestimation of the forecast uncertainty in the trailing directions.

To obtain d-ratio figures whose meteorological (physical) meaning is easier to interpret, we now change the definition of the local volume: we investigate a single variable at a single level using 5 × 5 horizontal grid points. In these calculations N = 25 (N < K); hence, the upper bound for the E dimension in S is 25. The variables and levels we choose for this analysis are the surface pressure, the temperature at 850 hPa, the two horizontal wind components at 500 hPa, and the geopotential height at 250 hPa. Figure 7 shows the time mean of this ratio in the leading direction, d_1, for the temperature at 850 hPa. For the experiment that assimilates randomly placed observations, initially d_1 is highest near the poles. The main regions of enhanced baroclinicity, over Japan and off the coast of Newfoundland, also show underestimation. The latter result suggests that when the distribution of the observations is nearly uniform, the use of a zonally constant covariance inflation factor in the analysis leads to an underestimation of the uncertainty in the dynamically more active regions. In contrast, for the two experiments that assimilate realistically placed observations, d_1 tends to reflect the local observation density: the uncertainty is underestimated in regions of high observation density, such as Europe, Japan, and the United States, and overestimated in regions of lower observation density, such as the Southern Hemisphere and the oceanic regions. This result is an indication that our zonally constant covariance inflation factor cannot be tuned to be optimal everywhere when there are zonal changes in the observation density. Thus, we conjecture that implementing a spatially varying adaptive covariance inflation technique, such as that described in Anderson (2009), or a localized version of the approach of Li et al. (2009), may lead to an improvement of the analyses and the short-term ensemble forecasts.

Fig. 7. The time average of the ratio d_k in the leading direction for the temperature at 850 hPa. Results are shown for (left) the analysis time and (right) the 5-day forecast for experiments that assimilate (top) randomly distributed simulated observations, (middle) simulated observations at the locations of conventional observations, and (bottom) observations of the real atmosphere. The average is taken over all forecasts started between 0000 UTC 11 Jan 2004 and 0000 UTC 15 Feb 2004.

The time-averaged spectrum of the d ratio for a particular grid point in a densely observed region (40°N, 80°W) in the northeast United States (Fig. 8) at analysis time shows that for the experiment that uses simulated observations in random locations, the ensemble, on average, overestimates the uncertainty in all directions. When simulated observations are placed in the locations of conventional observations, the uncertainty in the leading directions is overestimated and that in the trailing directions is underestimated. For the experiment that uses observations of the real atmosphere, the ensemble underestimates the uncertainty in all directions, more severely in the trailing directions. For the same grid point at the 120-h lead time, the two experiments that use simulated observations show overestimation in all directions. For observations of the real atmosphere, the ensemble overestimates the uncertainty in the leading directions and underestimates the uncertainty in the trailing directions. In contrast, Fig. 9 shows the same plot for a grid point chosen in a sparsely observed region of the tropical Pacific Ocean (0°, 120°W). For this grid point, we find that the ensemble overestimates the uncertainty in all directions for all experiments, at analysis time and at the 120-h lead time, with the exception of a few trailing directions for the experiment that uses conventional observations, in which the uncertainty is underestimated.

Fig. 8. The time-averaged spectrum of the ratio d_k in a densely observed local region centered at grid point (40°N, 80°W) for the temperature at 850 hPa. Results are shown for (left) the analysis time and (right) the 5-day forecast for experiments that assimilate (top) randomly distributed simulated observations, (middle) simulated observations at the locations of conventional observations, and (bottom) conventional observations of the real atmosphere. The average is taken over all forecasts started between 0000 UTC 11 Jan 2004 and 0000 UTC 15 Feb 2004.

Fig. 9. As in Fig. 8, but in a sparsely observed local region centered at grid point (0°, 120°W).

d. Relationship between E dimension and d ratio

The E dimension, introduced by Patil et al. (2001) and discussed in detail in Oczkowski et al. (2005), characterizes the local complexity of the dynamics. The E dimension is a spatiotemporally evolving measure of the steepness of the eigenvalue spectrum, λ_1 ≥ λ_2 ≥ … ≥ λ_{K−1}, having smaller values for a steeper spectrum (Szunyogh et al. 2007). Because Satterfield and Szunyogh (2010) found the E dimension to be a good predictor of the performance of S in capturing the forecast error, we hoped to find a similar relationship between the E dimension and the quality of the prediction of the magnitude and the spectrum of the uncertainties by the ensemble. While all of our attempts at finding a qualitative relationship between the E dimension and the performance of the ensemble in predicting the magnitude of the uncertainty have failed, we have found interesting differences between the spectra of d ratios for different values of the E dimension.
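For reference, the E dimension is computed from the eigenvalue spectrum as E = (Σ_k √λ_k)² / Σ_k λ_k (Patil et al. 2001; Oczkowski et al. 2005); a short implementation:

```python
import numpy as np

def e_dimension(lam):
    """E dimension of Patil et al. (2001): (sum_k sqrt(lambda_k))^2 /
    sum_k lambda_k. It equals the number of nonzero eigenvalues for a
    flat spectrum and approaches 1 when a single direction dominates."""
    s = np.sqrt(np.asarray(lam, dtype=float))
    return s.sum() ** 2 / np.sum(lam)

print(e_dimension([1.0, 1.0, 1.0, 1.0]))  # 4.0: flat spectrum, four directions
print(e_dimension([1.0, 1e-6, 1e-6]))     # ~1.0: one dominant direction
```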

To explore the relationship between the E dimension and the spectrum of d ratios, we output the values of the E dimension and the corresponding values of the spectrum of the d ratio. The data pairs are then ordered by E-dimension value and divided equally into 100 bins. We find that at analysis time the spectrum is better behaved for lower values of the E dimension. For instance, while for the bin with the lowest values of the E dimension (Fig. 10, top-left panel) the spectrum is relatively flat and the values are near 1, for the bin with the highest values of the E dimension the underestimation of the uncertainty by the ensemble is more severe. Interestingly, at the 120-h lead time the spectra are better behaved for the higher values of the E dimension. For instance, for the same two examples compared at analysis time, the underestimation of the forecast uncertainty, with the exception of a few leading directions, is more severe for the regions of low E dimension. These results indicate that while the spectrum of the d ratio benefits from the better representation of the space of uncertainties in the low-E-dimensional regions at analysis time, having a more diverse distribution of the uncertainty at a longer forecast lead time improves the representation of the forecast uncertainty. The practical implication of this result, when combined with the results of Satterfield and Szunyogh (2010), is that while a forecaster should put more trust in the 96–120-h ensemble predictions of the possible error patterns in the lower-dimensional regions, he or she should also keep in mind that the patterns of uncertainty associated with small eigenvalues play a more important role in reality than suggested by the raw ensemble forecast.

Fig. 10. The time mean of the NH average spectrum of the ratio d_k, calculated for all assimilated variables in local regions with energy rescaling. Results are shown for observations of the real atmosphere for (top) the minimum bin average of E dimension, (middle) the median bin average of E dimension, and (bottom) the maximum bin average of E dimension for (left) the analysis time and (right) the 120-h forecast lead time.

e. E dimension, d ratio, and forecast error

To further investigate the problem of underestimating the uncertainty with an ensemble that effectively captures the space of uncertainties, we plot the eigenvalue spectrum (normalized by the leading eigenvalue) and the percentage of TVS,

$$\frac{\sum_{j=1}^{k} E\left[\left(\delta x_j^{t(\|)}\right)^2\right]}{\mathrm{TVS}} \times 100\%, \qquad (15)$$

captured by the first k eigendirections. Figure 11, which is obtained with the same bin-averaging technique as Fig. 10, shows the results for the minimum, median, and maximum bin values of the E dimension. At analysis time, these three values of the E dimension are 29.3679, 34.1271, and 36.3424 for the simulated observations in random locations; 14.4292, 25.9455, and 35.4192 for the simulated observations in realistic locations; and 21.8493, 31.88, and 36.9536 for the observations of the real atmosphere. At the 120-h lead time, the minimum, median, and maximum bin values of the E dimension are 6.34464, 14.0873, and 24.9761 for the simulated observations in random locations; 8.22311, 17.0315, and 26.0225 for the simulated observations in realistic locations; and 10.3602, 17.7928, and 25.7089 for the observations of the real atmosphere. We find that low values of the E dimension show a quicker saturation of the percentage of TVS compared to the eigenvalue spectrum. For example, for the experiment that assimilates simulated observations in random locations, at the 120-h lead time, at the point where the eigenvalue spectrum approaches 0, only approximately 90% of TVS has been captured by the ensemble. For the experiment that assimilates observations of the real atmosphere, at the 120-h lead time, even for the lowest bin value of the E dimension, we find that all directions captured by the ensemble are necessary to capture 100% of TVS, but the eigenvalue spectrum saturates around k = 15 (in agreement with Fig. 10, which shows that the underestimation by the ensemble increases sharply for the trailing directions). These results support the use of linear postprocessing techniques to increase the ensemble spread in the trailing directions at longer forecast lead times.
Fig. 11. The eigenvalue spectrum (normalized by the leading eigenvalue) and the percentage of TVS for low (red plus signs), median (green open circles), and high (blue triangles) values of the E dimension for the NH. Shown are the results at (left) the analysis time and (right) the 120-h forecast lead time for the experiments that assimilate (top) simulated observations in random locations, (middle) simulated observations in realistic locations, and (bottom) observations of the real atmosphere.

5. Conclusions

Motivated by our earlier result that the linear space defined by the local ensemble perturbations provides a good representation of the forecast uncertainty in the 72–120-h forecast range (Satterfield and Szunyogh 2010), in this paper we continued our investigation into the performance of the ensemble in predicting the magnitude and the distribution of the forecast uncertainty. All of our numerical experiments were carried out with one particular analysis and ensemble generation system: the implementation of the LETKF on the NCEP GFS by Szunyogh et al. (2008). Some of our results point to shortcomings of that particular system. Most importantly, using a zonally invariant multiplicative covariance inflation factor leads to an underestimation of the local magnitude of the analysis uncertainty in regions of dense observations and to an overestimation of it in regions of sparse observations. Also, the system underestimates the analysis uncertainty at the synoptic scales, which leads to an underestimation of the forecast uncertainty at the same scales. These shortcomings can most likely be eliminated by implementing a more sophisticated covariance inflation scheme in the LETKF. Some of our other results, on the other hand, have potentially broader implications and, we believe, deserve further investigation:

  • The results suggest that predicting the magnitude of the forecast uncertainty and the relative importance of the different patterns of uncertainty is, in general, a more difficult task than predicting the space of uncertainty (the collection of the potential patterns of uncertainty).

  • While the ensemble, which is tuned to provide near-optimal performance at analysis time, provides a good representation of the space of forecast uncertainty, it severely underestimates not only the total magnitude of the uncertainty but also the magnitude of the uncertainty that projects onto the space spanned by the ensemble perturbations.

  • The ensemble tends to underestimate the forecast uncertainty most severely in the directions (patterns of uncertainty) that are present in the ensemble with small amplitude. This problem is more pronounced at locations where the E dimension is low (where a few patterns dominate the predicted uncertainty), which are, interestingly, also the locations where the ensemble provides the best representation of the space of uncertainties.

  • An especially encouraging finding is the strong predictive linear relationship between the local ensemble spread and the 95th percentile of the local forecast uncertainty (a minimal fitting sketch is given after this list).

  • Finally, we note that our decision to tune the inflation to satisfy V ≈ TVS may not be optimal for 5-day prediction. In the current experimental setting, we make this choice of inflation because the size of the perturbations affects the analysis itself, and inflating further (i.e., to satisfy V ≈ TV, as is the more common practice) degrades the analysis. Decoupling the analysis from the forecast and rescaling the perturbations to better match the forecast error deserves further investigation.
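
The spread-based regression mentioned in the fourth bullet can be fit, for example, by ordinary least squares on bin statistics of a training sample. The sketch below is only a hypothetical illustration of such a fit, not the paper's procedure: the binning choice, the function, and the array names are ours.

```python
import numpy as np

def fit_spread_error_regression(spread, error, n_bins=100, percentile=95):
    """Fit a linear upper-bound model: error_p95 = a * spread + b.

    spread : (n_samples,) local ensemble spread
    error  : (n_samples,) magnitude of the local forecast error
    """
    order = np.argsort(spread)
    spread_bins = np.array_split(spread[order], n_bins)
    error_bins = np.array_split(error[order], n_bins)

    # Bin-mean spread vs the 95th percentile of the error in each bin.
    x = np.array([b.mean() for b in spread_bins])
    y = np.array([np.percentile(b, percentile) for b in error_bins])

    # Least squares fit of y = a * x + b.
    a, b = np.polyfit(x, y, deg=1)
    return a, b

# Upper-bound estimate for a new forecast, after tuning on training data:
# a, b = fit_spread_error_regression(spread_train, error_train)
# predicted_p95 = a * spread_new + b
```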

We hope to investigate the generality of these findings in the near future utilizing The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) database. If the results of the present paper prove to be general, they would support the approach followed by many synopticians in interpreting raw ensemble forecasts, which is to pay more attention to the predicted patterns of uncertainty than to the uncalibrated quantitative ensemble-based measures of the uncertainty. Such findings would also provide an additional argument for the use of postprocessing techniques to enhance ensemble-based forecasts. The fact that a linear space provides a good representation of the uncertainty in the medium-range forecasts confirms that linear statistical techniques, such as those based on reforecasts (e.g., Hamill et al. 2004; Hamill and Whitaker 2006; Hamill et al. 2008; Hagedorn et al. 2008), have great potential to improve ensemble-based forecasts.

Finally, we believe that our reasoning behind using a simple linear regression to predict the upper bound of the forecast uncertainty from the spread is sufficiently general to be valid for any ensemble forecast system. This relationship, along with the strong predictive linear relationship between the E dimension and the performance of the ensemble in capturing the patterns of uncertainty found in Satterfield and Szunyogh (2010), could be implemented in operations after a proper tuning of the linear regression coefficients.

Acknowledgments

The authors wish to thank David Kuhl, Eric Kostelich, and Gyorgyi Gyarmati for their contributions to this work. The two anonymous reviewers made several suggestions that helped us improve the presentation of our results. The research reported in this paper was funded by the National Science Foundation (Grant ATM-0935538).

1

Graphically, the vectors are the principal axes of the ellipsoid defined by . This ellipsoid represents states of equal probability in .

2

The current implementation of the LETKF defines the local region in terms of distance rather than by model grid points. Since we generate randomly placed simulated observations to cover 10% of the model grid, as in Kuhl et al. (2007), our experiment considers a greater number of observations for analyses at higher latitudes.

REFERENCES

  • Anderson, J., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A, 72–83.

  • Barker, T., 1991: The relationship between spread and forecast error in extended-range forecasts. J. Climate, 4, 733–742.

  • Candille, G., and O. Talagrand, 2005: Evaluation of probabilistic prediction systems for a scalar variable. Quart. J. Roy. Meteor. Soc., 131, 2131–2150.

  • Descamps, L., and O. Talagrand, 2007: On some aspects of the definition of initial conditions for ensemble prediction. Mon. Wea. Rev., 135, 3260–3272.

  • Hagedorn, R., T. Hamill, and J. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble forecasts. Part I: Two-meter temperatures. Mon. Wea. Rev., 136, 2608–2619.

  • Hamill, T., and J. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229.

  • Hamill, T., J. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447.

  • Hamill, T., R. Hagedorn, and J. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble forecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 2620–2632.

  • Houtekamer, P. L., 1993: Global and local skill forecasts. Mon. Wea. Rev., 121, 1834–1846.

  • Kuhl, D., and Coauthors, 2007: Assessing predictability with a local ensemble Kalman filter. J. Atmos. Sci., 64, 1116–1140.

  • Leith, C. E., 1974: Theoretical skill of Monte Carlo forecasts. Mon. Wea. Rev., 102, 409–418.

  • Li, H., E. Kalnay, and T. Miyoshi, 2009: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 523–533, doi:10.1002/qj.371.

  • Majumdar, S., C. H. Bishop, B. I. Szunyogh, and Z. Toth, 2001: Can an ensemble transform Kalman filter predict the reduction in forecast error variance produced by targeted observations? Quart. J. Roy. Meteor. Soc., 127, 2803–2820.

  • Oczkowski, M., I. Szunyogh, and D. J. Patil, 2005: Mechanisms for the development of locally low-dimensional atmospheric dynamics. J. Atmos. Sci., 62, 1135–1156.

  • Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.

  • Patil, D. J., B. R. Hunt, E. Kalnay, J. A. Yorke, and E. Ott, 2001: Local low dimensionality of atmospheric dynamics. Phys. Rev. Lett., 86, 5878–5881.

  • Rao, C., 1973: Linear Statistical Inference and Its Applications. 2nd ed. Wiley, 625 pp.

  • Satterfield, E., and I. Szunyogh, 2010: Predictability of the performance of an ensemble forecast system: Predictability of the space of uncertainties. Mon. Wea. Rev., 138, 962–981.

  • Szunyogh, I., and Coauthors, 2007: The local ensemble transform Kalman filter and its implementation on the NCEP global model at the University of Maryland. Proc. Workshop on Flow Dependent Aspects of Data Assimilation, Reading, United Kingdom, ECMWF, 47–63.

  • Szunyogh, I., E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. A. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation system for the NCEP global model. Tellus, 60A, 113–130.

  • Talagrand, O., 1981: A study of the dynamics of four-dimensional data assimilation. Tellus, 33, 43–60.

  • Talagrand, O., R. Vautard, and B. Strauss, 1999: Evaluation of probabilistic prediction systems. Proc. Workshop on Predictability, Reading, United Kingdom, ECMWF, 1–25.

  • Tribbia, J. J., and D. P. Baumhefner, 2004: Scale interactions and atmospheric predictability: An updated perspective. Mon. Wea. Rev., 132, 703–713.

  • Whitaker, J., and A. Loughe, 1998: The relationship between ensemble spread and ensemble mean skill. Mon. Wea. Rev., 126, 3292–3302.