• Bowler, N. E., M. J. P. Cullen, and C. Piccolo, 2015: Verification against perturbed analyses and observations. Nonlinear Processes Geophys., 22, 403–411, doi:10.5194/npg-22-403-2015.
  • Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, and B. He, 2010: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and single-observation experiments. Mon. Wea. Rev., 138, 1550–1566, doi:10.1175/2009MWR3157.1.
  • Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724, doi:10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.
  • Cullen, M. J. P., 2010: A demonstration of cycled 4D-Var in the presence of model error. Quart. J. Roy. Meteor. Soc., 136, 1379–1395, doi:10.1002/qj.653.
  • Cullen, M. J. P., 2013: Analysis of cycled 4D-Var with model error. Quart. J. Roy. Meteor. Soc., 139, 1473–1480, doi:10.1002/qj.2045.
  • Desroziers, G., L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of observation, background and analysis-error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, 3385–3396, doi:10.1256/qj.05.108.
  • Evensen, G., and P. J. van Leeuwen, 2000: An ensemble Kalman smoother for nonlinear dynamics. Mon. Wea. Rev., 128, 1852–1867, doi:10.1175/1520-0493(2000)128<1852:AEKSFN>2.0.CO;2.
  • Fisher, M., M. Leutbecher, and G. A. Kelly, 2005: On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation. Quart. J. Roy. Meteor. Soc., 131, 3235–3246, doi:10.1256/qj.04.142.
  • Hodyss, D., and N. Nichols, 2015: The error of representation: Basic understanding. Tellus, 67A, 24 822–24 839, doi:10.3402/tellusa.v67.24822.
  • Isaksen, L., M. Bonavita, R. Buizza, M. Fisher, J. Haseler, M. Leutbecher, and L. Raynaud, 2011: Ensemble of data assimilations at ECMWF. ECMWF Tech. Memo. 636, 48 pp. [Available online at http://old.ecmwf.int/publications/library/ecpublications/_pdf/tm/601-700/tm636.pdf.]
  • Mitchell, H., and P. Houtekamer, 2009: Ensemble Kalman filter configurations and their performance with the logistic map. Mon. Wea. Rev., 137, 4325–4343, doi:10.1175/2009MWR2823.1.
  • Palmer, T. N., 2001: A nonlinear dynamical perspective on model error: A proposal for non-local stochastic-dynamic parametrization in weather and climate prediction models. Quart. J. Roy. Meteor. Soc., 127, 279–304, doi:10.1002/qj.49712757202.
  • Palmer, T. N., R. Buizza, F. Doblas-Reyes, T. Jung, M. Leutbecher, G. J. Shutts, M. Steinheimer, and A. Weisheimer, 2009: Stochastic parametrization and model uncertainty. ECMWF Tech. Memo. 598, 42 pp. [Available online at http://old.ecmwf.int/publications/library/ecpublications/_pdf/tm/501-600/tm598.pdf.]
  • Piccolo, C., and M. Cullen, 2013: Estimation of model errors using data assimilation techniques. Met Office, 26 pp. [Available online at https://www0.maths.ox.ac.uk/system/files/attachments/Chiara_Piccolo.pdf.]
  • Smith, L. A., and J. A. Hansen, 2004: Extending the limits of ensemble forecast verification with the minimum spanning tree. Mon. Wea. Rev., 132, 1522–1528, doi:10.1175/1520-0493(2004)132<1522:ETLOEF>2.0.CO;2.
  • Tennant, W. J., G. J. Shutts, A. Arribas, and S. A. Thompson, 2011: Using a stochastic kinetic energy backscatter scheme to improve MOGREPS probabilistic forecast skill. Mon. Wea. Rev., 139, 1190–1206, doi:10.1175/2010MWR3430.1.
  • Todling, R., 2015: A lag-1 smoother approach to system error estimation: Sequential method. Quart. J. Roy. Meteor. Soc., 141, 1502–1513, doi:10.1002/qj.2460.
  • Trémolet, Y., 2007: Model-error estimation in 4D-Var. Quart. J. Roy. Meteor. Soc., 133, 1267–1280, doi:10.1002/qj.94.
  • Whitaker, J. S., and T. M. Hamill, 2012: Evaluating methods to account for system errors in ensemble data assimilation. Mon. Wea. Rev., 140, 3078–3089, doi:10.1175/MWR-D-11-00276.1.
  • Wilks, D. S., 2004: The minimum spanning tree histogram as a verification tool for multidimensional ensemble forecasts. Mon. Wea. Rev., 132, 1329–1340, doi:10.1175/1520-0493(2004)132<1329:TMSTHA>2.0.CO;2.
Fig. 1. MST rank histogram for 500-hPa geopotential height at T + 6 h: (a) when the verification points are close together (180 km) and (b) when the verification points are far apart (1800 km). The solid horizontal lines are the expected value for each bin and the dashed lines are the expected standard deviation of each bin.

Fig. 2. 850-hPa zonal wind spread–skill verification: solid lines represent the rmse of the ensemble mean and dashed lines the ensemble spread for the Northern Hemisphere extratropics (blue), the Southern Hemisphere extratropics (red), and the tropics (green).

Fig. 3. 850-hPa zonal wind rmse: dashed–dotted lines represent the rmse of the deterministic (unperturbed) control run and solid lines the rmse of the ensemble mean. Color code as in Fig. 2.

Fig. 4. Autocorrelation function of the analysis increments as a function of time in the northern extratropics for different pressure levels. Solid lines represent autocorrelations for potential temperature and dashed lines for zonal wind.

Fig. 5. (top) Geographical variation of the error variance for the zonal wind component at 850 hPa (right) when the analysis increments are added once per window and (left) added at each time step. (bottom) As in (top), but for potential temperature at 250 hPa.


Ensemble Data Assimilation Using a Unified Representation of Model Error

  • 1 Met Office, Exeter, United Kingdom

Abstract

A natural way to set up an ensemble forecasting system is to use a model with additional stochastic forcing representing the model error and to derive the initial uncertainty by using an ensemble of analyses generated with this model. Current operational practice has tended to separate the problems of generating initial uncertainty and forecast uncertainty. Thus, in ensemble forecasts, it is normal to use physically based stochastic forcing terms to represent model errors, while in generating analysis uncertainties, artificial inflation methods are used to ensure that the analysis spread is sufficient given the observations. In this paper a more unified approach is tested that uses the same stochastic forcing in the analyses and forecasts and estimates the model error forcing from data assimilation diagnostics. This is shown to be successful if there are sufficient observations. Ensembles used in data assimilation have to be reliable in a broader sense than the usual forecast verification methods; in particular, they need to have the correct covariance structure, which is demonstrated.

Denotes Open Access content.

Corresponding author address: Chiara Piccolo, Met Office, Fitzroy Road, Exeter EX1 3PB, United Kingdom. E-mail: chiara.piccolo@metoffice.gov.uk


1. Introduction

Stochastic data assimilation methods, such as that described by Evensen and van Leeuwen (2000), have the benefit that they naturally provide an estimate of analysis uncertainty that is consistent with model uncertainty. This is achieved by representing the model uncertainty by a stochastic forcing term and then running an ensemble smoother in which the observation uncertainty is represented by using perturbed observations. If the analysis update is performed using the Kalman gain formula, then the ensemble, if large enough, will correctly represent the evolution of the forecast error covariance in a linear system with Gaussian errors (Burgers et al. 1998). This includes the case where the analysis updates are performed variationally—for instance, the ECMWF ensemble data assimilation system (Isaksen et al. 2011). The utility of ensemble methods in estimating forecast error covariances is demonstrated by Buehner et al. (2010).

To make such systems work, it is essential that the ensembles are reliable. In particular, the true state should be statistically indistinguishable from a random member of the prior ensemble. If this condition is met and the analysis update is performed using the Kalman gain formula, the dynamics is linear, and the errors are Gaussian, then the truth will also be statistically indistinguishable from a random member of the analysis ensemble (Burgers et al. 1998). However, if the model is imperfect, such a prior ensemble can only be constructed if the statistics of the model error are known. If so, and the model is forced by perturbations drawn from a population with these model error statistics, the truth trajectory will be statistically indistinguishable from a randomly chosen trajectory of the forced model ensemble.

Most model error comes from the limited resolution of computer models, which can only represent a small fraction of the observed scales of variability of the atmosphere. Further contributions come from the many physical effects that cannot be described by partial differential equations, whether deterministic or stochastic. The practical success of numerical prediction is measured by comparison with observations, and it is natural to use observations to estimate the model error. As also noted by Todling (2015), the statistics of the model error can only be estimated under a stationarity assumption because the observations only represent a single realization of the truth. Data assimilation techniques are a natural way to estimate model error from observations because they allow for observation error in a systematic way and produce estimates in model space, as in Trémolet (2007). In particular, by using Trémolet's weak-constraint formulation, it is possible to estimate a forcing term that represents the model error. However, as noted above and also pointed out by Todling (2015), data assimilation can only be performed if there is prior knowledge of the model error statistics. We therefore carry out a prior data assimilation cycle and calculate the statistics of the analysis increments assuming stationarity. If the actual analysis increments can be regarded as a random draw from a population with these stationary statistics of analysis increments, then the analyzed trajectory will be statistically indistinguishable from a trajectory of the model forced with randomly chosen increments from this population. If the analysis trajectory can be regarded as a reasonable proxy for the truth trajectory, then these stationary statistics of analysis increments will provide an accurate estimate of the model error covariance.

The more common approach for representing model error is to use physically based methods to construct model error forcing terms that are state dependent. These approaches include, for example, introducing a stochastic element into atmospheric models by randomly perturbing increments or tendencies from parameterization schemes (e.g., Palmer et al. 2009; Tennant et al. 2011) or seeking a stochastic formulation of the parameterization schemes (e.g., Palmer 2001). The limitation of this approach is that it is usually empirical and, ideally, should be calibrated using data assimilation techniques such as those we describe. However, this calibration can only be performed in a climatological sense, for the same reasons as above. Stochastic physics approaches are normally used in ensemble forecasts rather than in ensemble data assimilation algorithms. In the latter case, it is more usual to use ad hoc inflation methods, such as those described by Whitaker and Hamill (2012).

Our approach is then first to perform a calibration analysis cycle, using the weak-constraint formulation, over a sufficiently long period to obtain stationary statistics. The analysis increments are archived. We then run an ensemble of analyses using a model forced with increments randomly chosen from the archive. No other method of inflation is used. This provides a seamless approach for estimating analysis uncertainty and forecast uncertainty. In this paper we focus on testing the assumption that the model forced with randomly chosen analysis increments does indeed deliver a reliable ensemble, so that the truth trajectory is statistically indistinguishable from a randomly chosen trajectory of the model. The analysis updates are performed using current Met Office operational methods, which work well in practice. Previous work has shown that low-resolution results using model error statistics estimated from analysis increments showed a clear improvement of the spread–skill relationship at all lead times globally and a reduction in the rmse of the ensemble mean in the tropics compared to standard physically based methods for representing model error (Piccolo and Cullen 2013). Additional work in progress compares the effects of forcing by analysis increments with the operational Met Office physically based methods described in Tennant et al. (2011) in ensemble forecasts. This comparison will be published in a following paper.
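The two-stage procedure just described (a calibration cycle that archives analysis increments, followed by forecasts forced with random draws from the archive) can be illustrated with a deliberately simple scalar sketch. Everything below (the toy model, the fixed gain, the numbers) is illustrative and is not the Met Office configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar "atmosphere": the imperfect model misses a constant drift,
# so the model error has a systematic plus random component.
def truth_step(x):
    return 0.95 * x + 1.0 + rng.normal(0.0, 0.3)

def model_step(x):
    return 0.95 * x  # imperfect: the +1.0 drift is unrepresented

r = 0.2 ** 2  # observation error variance

# Calibration cycle: assimilate observations and archive the increments.
x_true, x_a = 0.0, 0.0
archive = []
for _ in range(2000):
    x_true = truth_step(x_true)
    y = x_true + rng.normal(0.0, np.sqrt(r))   # imperfect observation
    x_b = model_step(x_a)                      # background forecast
    inc = 0.5 * (y - x_b)                      # increment, fixed gain 0.5
    x_a = x_b + inc
    archive.append(inc)
archive = np.array(archive[200:])              # discard spinup

# Forecast ensemble: force the model with randomly drawn archived increments.
ens = np.full(50, x_a)
for _ in range(20):
    ens = np.array([model_step(x) + rng.choice(archive) for x in ens])

# The archived increments absorb the unrepresented drift (roughly +1 per step),
# so the forced ensemble does not drift away from the truth.
print(archive.mean(), ens.mean())
```

In this toy setting the mean of the archived increments recovers most of the tendency the model cannot represent, which is the sense in which the increments act as a model error forcing.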

2. Methodology

a. General approach

Data assimilation for the purpose of generating initial conditions for a forecast can only be carried out optimally given prior knowledge of the statistics of background errors, model errors, and observation errors. Our premise is that the statistics of model error are unknown a priori and can only be inferred by using observations. We therefore first estimate the model error using data assimilation over a training period and then use the resulting statistics in an ensemble data assimilation system.

Consider first the estimation of the model error statistics. In practice, the truth is only known through imperfect observations. If there were no observations, it would be impossible to determine whether a model was perfect or have any idea of its errors. Since much of the model error comes from the inability to resolve many scales of motion, it can be expected that the statistics will be highly nonstationary. However, the observations only represent a single realization of the state of the atmosphere. Even if the observations were complete and perfect, this would only allow a deterministic estimate of the model error to be made over any fixed time interval. Thus, an estimate of the model error statistics can only be made over a large number of cases. A natural way to do this is to use weak-constraint four-dimensional variational data assimilation (4D-Var) as in Trémolet (2007). It is pointed out in Cullen (2013) that the length of time required for cycled short-window 4D-Var, given stationary model and observation error statistics, to spin up to stationary analysis error statistics is the same as the length of window required for long-window 4D-Var to “forget” its initial conditions, as described by Fisher et al. (2005). Once the assimilation cycles have reached a statistically steady state, the analysis increments must balance the error growth due to both model error and the growth of analysis errors. This will allow the model error to be estimated accurately if the analysis errors are small enough. Note that methods of decomposing this error growth into its components [e.g., that of Desroziers et al. (2005)] rely on prior knowledge of the statistics of the model and observation errors. Nonstationary model error statistics can only be estimated on a time scale longer than the spinup time.

It is inevitable that estimating model errors from observations will be limited by the accuracy and completeness of the observations. However, most errors propagate in space and time and will be corrected by observations at a different place and time in a cycled data assimilation system. If this effect were significant, we would expect to see the analysis increments concentrated in data-rich areas, which would not be a correct representation of the model error. However, statistics of actual analysis increments from the Met Office system (illustrated later in Fig. 5) do not show an obvious correspondence to data-rich regions; they correspond more to regions of greater synoptic activity. This probably largely reflects the global coverage given by satellite data.

A major assumption of our method is that the analysis increments from a cycled data assimilation can be regarded as a random draw from an archive of increments with stationary statistics. This would not be true if the analysis increments were closely related to individual synoptic systems. It could also fail to hold because the analysis procedure may be highly tuned to making small increments in sensitive regions of the atmosphere. In the case of a perfect model, methods based on the Kalman filter deliberately do the latter, so that errors in growing modes are selectively corrected. It is not clear how much this argument carries over to correcting model error, which may not be particularly associated with unstable modes of the model. The effectiveness of our assumption can only be tested by actual results, which is the main theme of section 3b.

b. Estimation of the model error statistics

We first summarize the formulation introduced by Trémolet (2007). Write x_i for the model state vector at time step i and assume that the true state evolution is given by

x_(i+1) = M(x_i) + η_i,  (1)

where M is a deterministic model operator and η_i is a residual term representing the error in the deterministic model. Assume that the η_i are a random draw from a population with stationary statistics with covariance Q. Then Eq. (1) states that the true evolution is a realization of a stochastic model, which is thus "perfect." We thus make the following definition:
  • Definition 2.1. A perfect stochastic model is such that the true evolution is statistically indistinguishable from a random realization of the model.

As noted above, information about the truth is only available from imperfect observations. Under the randomness assumption discussed above, if we replace the η_i by a random draw from an archive of analysis increments, we will obtain a perfect stochastic model in the sense of definition 2.1 if the phrase "true evolution" is replaced by "analysis evolution." In the limit of vanishing analysis error, the statistics of the analysis increments will be the same as the "true" model error statistics defined from Eq. (1). In general, the statistics of the analysis increments will also include the uncertainty in estimating the model error, so it will be beneficial to minimize that uncertainty by using the best available observations and the best available data assimilation scheme, as in a reanalysis.

We thus suppose we are given an archive of observations and fit the model defined in Eq. (1) to the data. As noted above, the initial state uncertainty does not need to be estimated. We write the estimated residuals η̂_i to distinguish them from the true residuals η_i. We assume that the statistics of the η_i can be described by a covariance Q and that the statistics of the η̂_i can be described by a covariance E. To make the estimate, we require a prior covariance, which we write as D, and will show in the next subsection how this should be chosen. We will also illustrate that, in the linear case, the covariance E of the estimated residuals is minimized by making an optimal choice of D but will always be greater than the true model error covariance Q. This means that a suboptimal choice of D will result in a greater uncertainty in the estimate of the true model error.

We use cycled weak-constraint 4D-Var, using a model error control variable to define the increments η̂_i, as in Trémolet (2007). The observation vectors are y_i with assumed error covariance R, and the observation operators that map a model state to observation space are H_i. The cost function to be minimized is

J = (1/2) Σ_(i=1)^(n) η̂_i^T D^(-1) η̂_i + (1/2) Σ_(i=0)^(n) [y_i − H_i(x_i)]^T R^(-1) [y_i − H_i(x_i)], with x_i = M(x_(i−1)) + η̂_i.  (2)

Here n is the number of time steps and the remaining notation is as in Eq. (1).
The minimizer of Eq. (2) gives a minimum variance estimate of the η̂_i required to ensure that the trajectory defined by Eq. (1) fits the observations to within observation error. The trajectory defined by

x_(i+1) = M(x_i) + η̂_i  (3)

will be a realization of a stochastic model that satisfies definition 2.1 with the true state replaced by the analyzed state. It is important to note that the minimum variance estimate of the η̂_i is not given by the choice D = Q. This is analyzed in more detail in the next subsection. In practice, this estimation problem will have to be bootstrapped, as in Cullen (2010), to get closer to the optimal choice of D.

Ideally, this estimate should be made using a long assimilation window, as shown in Eq. (2) and suggested by Trémolet. In practice, since only the stationary statistics are required, the same result can be obtained more simply by cycling a short-window algorithm, as in Cullen (2010) and Cullen (2013), in which case the η̂_i are estimated sequentially rather than simultaneously. Once the system has spun up to give stationary statistics, the solution of the short-window algorithm will be statistically the same as that of a long-window algorithm and will be a steady-state solution of a Kalman smoother. Thus, the estimate could be made using an equivalent ensemble smoother.

c. Scalar case with single control variable

In this section we show why the estimated model error covariance is greater than the true model error covariance and is not the same as the optimal choice of prior covariance needed for the estimation procedure. We use a simple scalar analysis, as in Cullen (2013), which is sufficient for this purpose. The same analysis can be applied to each mode of a multivariate system, as in Desroziers et al. (2005). The notation follows Cullen (2013) in that variances are written as lowercase letters: p is the actual (analysis) error variance, q the true model error variance, and r the observation error variance. The background, model, and observation errors are assumed to be mutually uncorrelated.

The assimilation is assumed to be split into time steps i. To be consistent with Eq. (1), given an error variance p_i at time i, it is assumed that the error variance of the trajectory evolved to time i + 1 is given by

f_(i+1) = m² p_i + q,  (4)

where m is the growth rate under the action of the model and q is the true residual variance. The assimilation is carried out by minimizing Eq. (2) with the observation at the end of the time step. We calculate residual terms, with assumed prior variance d, to be added at the end of each time step. We will show that the variance e of the computed residuals is different from d, as expected from the theory described by Desroziers et al. (2005). The resulting trajectory will be a realization of Eq. (3).
The error in the trajectory is reduced by a factor (1 − k) by the analysis, where the gain k is given by

k = d/(d + r).  (5)

The assimilation is cycled to a statistically steady state as in Cullen (2013). The steady-state error is given by Eq. (13) from that paper with suitably adjusted notation:

p = [(1 − k)² q + k² r]/(1 − μ),  (6)

where

μ = m²(1 − k)²,  (7)

provided that μ < 1. Otherwise the error will diverge to infinity. The steady-state error is less than r if

q < r[(1 + k)/(1 − k) − m²].  (8)
If this condition is satisfied, the analysis will fit the observation to within observation error.
It is shown in Cullen (2013) that p is minimized by choosing d to be the forecast error variance at the time the observation is valid. Thus, we define d_opt by

d_opt = m² p + q.  (9)

The variance of the innovation at steady state is given at the end of each time step by m² p + q + r. Since the residuals are computed by multiplying the innovations by k, the variance of the residuals is given by

e = k²(m² p + q + r).  (10)

Then setting d = d_opt in Eq. (5) gives

k = (m² p + q)/(m² p + q + r),  (11)

and, since Eq. (24) of Cullen (2013) gives p = (1 − k)d_opt,

p = kr.  (12)

Thus p is always less than r, so that the observations are fitted to within observational error. Substituting Eq. (11) in Eq. (10), and cancelling a factor of (m² p + q + r), gives

e = k(m² p + q).  (13)

The analysis error a at the end of the window is given by the standard formula [Eq. (3) of Cullen (2013)]:

a = (1 − k)(m² p + q).  (14)

At steady state we must have a = p, so

e = (m² − 1)p + q.  (15)

This shows that the estimated residuals compensate for both the growth of the steady-state error and the true model error q. If the analyses are perfect, so that p = 0, then e = q. Otherwise e is greater than q. The value of e is not the same as the assumed value d_opt used to calculate it. Since the setting d = d_opt minimizes p, Eq. (10) shows that it also minimizes e, since r and q are given.
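The scalar steady-state relations above can be checked numerically: iterating the variance recursion with the optimal prior d = m²p + q converges to a fixed point at which Eq. (12) and Eq. (15) hold. A minimal sketch (the parameter values are arbitrary):

```python
# Fixed-point check of the scalar steady-state relations.
# Symbols as in the text: m growth rate, q true model error variance,
# r observation error variance, p steady-state (analysis) error variance,
# k gain, e variance of the estimated residuals.
m, q, r = 1.1, 0.2, 0.5

p = 1.0
for _ in range(200):
    f = m * m * p + q        # forecast error variance, as in Eq. (4)
    k = f / (f + r)          # gain with the optimal prior d = f, Eq. (5)
    p = (1.0 - k) * f        # analysis error variance, Eq. (14)

f = m * m * p + q
k = f / (f + r)
e = k * k * (f + r)          # residual variance, Eq. (10)

print(p, e)
```

At the fixed point, p equals kr and e equals (m² − 1)p + q to machine precision, and e exceeds q whenever p > 0 and m > 1.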

d. Discussion

We have shown that our estimation procedure creates a realization of the stochastic model Eq. (3), which would be a perfect stochastic model in the sense of definition 2.1 if the true states were represented by analyses. This is not the same as estimating the real model error defined by Eq. (1). The property that 4D-Var, or other deterministic methods of data assimilation, gives a minimum variance estimate means that the forcing terms are the minimum required to allow the available observations to be fitted to within observation error for a long period. Thus, if there are very few observations the estimated error will be very small even if the true errors are larger. However, in a real system where observations are plentiful, even though inhomogeneous, cycling the system to a statistically steady state is likely to smooth out the effect of inhomogeneities in the observations.

The analysis set out above shows that the estimated residuals compensate the error growth, which includes the inherent growth of the irreducible analysis error as well as the model error. If the observation coverage and assimilation techniques are improved, the residuals will become closer to the true model errors. If the residuals are genuinely random and are used in Eq. (3), then the analyzed states will be statistically indistinguishable from a random member of an ensemble of forecasts using the model. This is because the evolution from one analyzed state to the next is given by the model and an analysis increment and is, thus, a realization of Eq. (3). Thus, use of Eq. (3) should give reliable ensemble forecasts as judged against analyses.

In the real case, the assumption that model error, and thus the residual estimated as above, can be represented by random noise uncorrelated in time is unlikely to be realistic. However, the result that the residuals in a spun-up data assimilation cycle have to compensate the forecast error growth must still hold, or else the analysis would not be fully spun up. The residuals only compensate for the error growth, not the total error. Systematic errors that do not grow in time are excluded. The results of Cullen (2013) suggest that excluding nongrowing errors that are correlated in time from the prior covariance is beneficial. This is supported by Hodyss and Nichols (2015). Thus, it is reasonable to expect that our procedure will be effective in the real case, even though no theory is available to prove this.

Todling (2015) claims to be able to estimate Q more directly from innovation statistics in the manner of Desroziers et al. (2005). His method relies on making optimal choices of the gain matrices, but it is not clear how this can be achieved without prior knowledge of Q. He does not use the assumption that the assimilation has reached a statistical steady state directly, but this assumption is implicit in the way the covariance matrix is estimated from individual innovations.

e. Implementation in an ensemble data assimilation system

We now use the stochastic model defined in Eq. (3) in an ensemble data assimilation system. This is set up in a standard way using perturbed observations and using Eq. (3) to generate the prior ensemble, which is also the background state for the assimilation. Independent analyses are performed for each ensemble member using a randomly chosen set of perturbed observations. Any method can be used for the analysis update, but as in the ensemble Kalman filter of Burgers et al. (1998), the same covariances are used for all the analyses.

Consider the linear case. This procedure will then ensure that the covariance A of the ensemble at the end of the assimilation window will evolve to the end of the subsequent window as

P = M A M^T + E,  (16)

where M is the model evolution operator and E is the covariance of the residual forcing. If a four-dimensional method such as 4D-ensemble-Var (4DEnVar; Buehner et al. 2010) is employed, with the ensemble covariances used directly in the assimilation, the result will be equivalent to a fixed-lag ensemble smoother as discussed by Evensen and van Leeuwen (2000). If an ensemble of 4D-Vars with static covariances is used, it is necessary to ensure that the implied covariance evolution is equivalent to Eq. (16) with the initial covariance set to A. This can be achieved by using the weak-constraint formulation of Trémolet (2007), with the background error covariance matrix set to A and the model error covariance set to E. It is important that the same residual forcing term as used in the background trajectory for each member is applied when regenerating the analysis for that member.

The correct analysis spread will be obtained if the background trajectories include forcing by a random member of the archive of residual forcing terms generated in the calibration procedure described in section 2c. As discussed in section 2d, the true analysis will be statistically indistinguishable from a random member of the background ensemble and will also be statistically indistinguishable from a random member of the observation ensemble. Thus, in the hypothetical case where the truth is selected from both ensembles, so that there is no innovation and no analysis update, the truth will also be statistically indistinguishable from a random member of the analysis ensemble.

If linear theory applies, then Trémolet states that the analysis obtained using the model error control variable will be the same as that given by calculating a model state increment with the same covariance. Therefore, a similar set of analyses could be obtained by using strong-constraint 4D-Var with covariance A + E. The addition of these matrices is justified because the analysis errors will be uncorrelated with the increments that generated them if the analyses are statistically optimal; otherwise, the analyses could be improved by simple postprocessing of the increments (Bowler et al. 2015).
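In the linear Gaussian limit, the perturbed-observation update applied to each member reproduces the Kalman analysis spread, which is the property the construction above relies on. A scalar sketch (large ensemble, illustrative numbers only):

```python
import numpy as np

# Perturbed-observation analysis update in a linear scalar setting,
# in the spirit of Burgers et al. (1998). With a large ensemble, the
# analysis spread matches the Kalman analysis variance (1 - k) * p_b.
rng = np.random.default_rng(1)
n = 200_000                 # ensemble size, large to suppress sampling noise
p_b, r = 2.0, 0.5           # background and observation error variances
truth = 1.0

background = truth + rng.normal(0.0, np.sqrt(p_b), n)   # prior ensemble
y = truth + rng.normal(0.0, np.sqrt(r))                 # one real observation
obs = y + rng.normal(0.0, np.sqrt(r), n)                # perturbed copies
k = p_b / (p_b + r)                                     # Kalman gain
analysis = background + k * (obs - background)

print(analysis.var(), (1.0 - k) * p_b)
```

Without the observation perturbations the same update would give an analysis variance of (1 − k)²p_b, i.e., an underspread ensemble; the perturbations restore the missing k²r term.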

3. Results

In this section we first test how well the assumption that the analysis increments are random is satisfied, so that it is valid to use them as residual forcing terms. Second, we test the correctness of the covariance structures of the T + 6-h forecast ensemble starting from the end of the assimilation window, and how the forecast ensemble performs at longer lead times.

a. Calibration step

First we perform the calibration step to generate the archive of analysis increments from the weak-constraint formulation of Eq. (2) using a 6-h window. These increments are an estimate of the residual forcing to be applied at each model time step over each window. In the calibration step we set the covariance equal to the background error covariance matrix from the Met Office operational system. This is a suitable first guess because it is known to deliver good-quality analyses with operational observations. We run the calibration step for four months in 2011 to generate a large archive of increments. The steady state was achieved within a week, which is the same spinup period as the operational system at the same resolution. We use the full operational observation system of 2011 except for Infrared Atmospheric Sounding Interferometer (IASI) and Atmospheric Infrared Sounder (AIRS) observations, which were omitted for technical reasons. This observation system is sufficient to estimate the residual forcing well enough to derive reliable ensemble forecasts.

b. Ensemble data assimilation setup

The ensemble data assimilation system used here is a system of independent strong-constraint 4D-Var cycles that differ through perturbed observations, perturbed sea surface temperature (SST) fields, and random residual forcing applied at each time step of each 6-h integration of the model. We use N320L70 resolution, which has 640 × 480 horizontal grid points, giving a resolution of about 40 km at 50°N, and 70 vertical levels. We run this ensemble data assimilation system for one month in summer 2011.

The ensemble system may be summarized as follows: the forecast model [Eq. (3)] is applied to N perturbed analyses, which provide N backgrounds for the next analysis time. The term used in each ensemble forecast member is a random draw from the archive of residual forcings obtained during the calibration step. The strong-constraint 4D-Var analysis scheme is then applied individually to these N backgrounds and to N sets of perturbed observations using the background error covariance matrix and the observation covariance matrix from the Met Office operational system for all members. This provides N new perturbed analyses and so on. The N sets of observations are obtained as random realizations of a Gaussian pdf whose covariance matrix corresponds to the specified observation covariance matrix . In the following experiments, we choose the covariance matrix to include representativeness errors as well as the instrumental errors and to be inflated to compensate for spatial correlation errors. Each ensemble member is kept independent. This means that the mean of the observation perturbations may not be zero. This may result in a degraded analysis (Mitchell and Houtekamer 2009).
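The generation of the N sets of perturbed observations described above can be sketched as follows. This is a minimal illustration only, assuming a diagonal observation-error covariance matrix; the function name and array shapes are ours, not the Met Office system's:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_observations(y, r_var, n_members):
    """Return n_members sets of perturbed observations y + eps, with
    eps drawn from N(0, R) for a diagonal R whose variances are given
    by r_var. Each member's perturbations are drawn independently, so
    their mean over members is generally nonzero for a finite ensemble
    size, as noted in the text."""
    y = np.asarray(y, dtype=float)
    eps = rng.normal(0.0, np.sqrt(r_var), size=(n_members, y.size))
    return y + eps

y = np.array([1.0, 2.0, 3.0])                  # hypothetical observation vector
y_pert = perturb_observations(y, r_var=np.array([0.1, 0.1, 0.2]),
                              n_members=20)
```

A full system would draw from the complete (correlated) observation-error covariance; the diagonal draw above conveys the principle only.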

c. Random assumption of the analysis increments

An ensemble constructed from Eq. (3), with residual forcing defined by a random draw from an archive generated in the calibration step, may not contain the analyzed state if the residual forcing terms are not random. We test this by running a 6-h ensemble forecast from the end of the assimilation window and verifying it against a random member of the analysis ensemble at the end of the following assimilation window. This is equivalent to verifying against the truth if the analysis is statistically optimal and the analysis spread is equal to the expected analysis error (Bowler et al. 2015). In this way, we are comparing the effect of adding random analysis increments from an archive to the background trajectories with that of adding the real analysis increments. If the randomness assumption is valid, the ensemble spread at T + 6 h from the end of the assimilation window should match the rmse of the ensemble mean.
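The spread–skill comparison can be sketched as follows (an illustrative implementation; names and array shapes are ours). For a reliable ensemble, the average spread should match the rmse of the ensemble mean computed against a randomly chosen analysis member:

```python
import numpy as np

rng = np.random.default_rng(1)

def spread_and_rmse(forecast_ens, analysis_ens):
    """forecast_ens, analysis_ens: (N, K) arrays (members x grid points).
    The verification is a random member of the analysis ensemble,
    following Bowler et al. (2015). Returns (mean spread, rmse of the
    ensemble mean); the (N + 1)/N sampling correction is omitted for
    clarity."""
    n_members = forecast_ens.shape[0]
    verif = analysis_ens[rng.integers(n_members)]
    mean = forecast_ens.mean(axis=0)
    var = ((forecast_ens - mean) ** 2).sum(axis=0) / (n_members - 1)
    spread = np.sqrt(var.mean())
    rmse = np.sqrt(((mean - verif) ** 2).mean())
    return spread, rmse
```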

Table 1 reports the ensemble spread, the rmse of the ensemble mean, and their relative percentage difference, with 95% confidence intervals, at T + 6 h for different areas over the globe for zonal wind at 850 hPa (m s−1). The mismatch between the ensemble spread and the rmse of the ensemble mean at T + 6 h is of the order of a few percent. The 95% confidence intervals show that this mismatch is mainly due to sampling effects. These results demonstrate that the randomness assumption is valid if there are enough observations, so that the forcing terms calibrated using the weak-constraint formulation are a realistic enough estimate of the residual forcing for Eq. (3) to be valid. However, the presence of radiosondes gives a strong semidiurnal variability in the increments, so that the observation statistics used in the calibration step will be nonstationary. This will lead to increased variance in the archived increments, which may artificially inflate the ensemble spread reported in Table 1 for the southern extratropics and the tropics. Note that this problem cannot be resolved by subsetting the increments by phase of the diurnal cycle, because such increments will not be spun up and will not properly represent stationary statistics.

Table 1. Random assumption of the analysis increments for zonal wind at 850 hPa (m s−1).

The randomness assumption may not be valid if the analysis increments, used to estimate the residual forcings, are correlated in time. In section 4e, we will look at the time correlation of the analysis increments. If there is sufficient time correlation in the error growth, it would be worth including it in the generation of the residual forcing terms.

d. Minimum spanning tree rank histogram

Traditional ensemble verification scores are not sufficient to evaluate the adequacy of covariances for ensemble data assimilation systems. The definition of the perfect stochastic model requires the truth to be statistically indistinguishable from a random member of an ensemble. A proper way to enforce this definition is to use the minimum spanning tree (MST) rank histogram verification. As described by Smith and Hansen (2004), the MST rank histogram “assesses the predicted pdf by testing the hypothesis that the truth is a member of the population defined by the ensemble.” The MST rank histogram ranks the smallest length of a multidimensional tree constructed from a pool of N + 1 points (the N ensemble members plus the verification): N MST lengths are calculated by replacing each ensemble member in turn with the verification, while one length is constructed by using only the ensemble (Smith and Hansen 2004; Wilks 2004). If the ensemble and the verification belong to the same probability distribution, then the MST lengths constructed using only the ensemble members or by substituting the verification for each ensemble member should be randomly distributed. If a large number of cases is used, the resulting MST rank histogram should be flat.
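A minimal sketch of the MST rank computation follows. This is our own illustrative implementation using `scipy`; Smith and Hansen (2004) and Wilks (2004) give the full procedure:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_rank(ensemble, verification):
    """Rank used to build the MST rank histogram: the number of the N
    'substituted' MST lengths (verification swapped for one member in
    turn) that fall below the length of the ensemble-only MST.
    ensemble: (N, K) array at K verification points; verification: (K,).
    Distances are Euclidean, as in the text."""
    def mst_length(points):
        dists = squareform(pdist(points))           # pairwise distances
        return minimum_spanning_tree(dists).sum()   # total tree length

    ref = mst_length(ensemble)
    sub = []
    for j in range(ensemble.shape[0]):
        pts = ensemble.copy()
        pts[j] = verification                       # substitute verification
        sub.append(mst_length(pts))
    return int(np.sum(np.array(sub) < ref))
```

Accumulating these ranks over many cases and binning them gives the histogram; flatness indicates that ensemble and verification are drawn from the same distribution.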

To infer information about the covariance structure of the ensemble, we first correct the ensemble data at a set of K verification points for bias and spread. We then assess the ensemble horizontal correlation scale by choosing verification points with different horizontal separation distances. For verification, we use a random member of the ensemble of analyses.

The bias-corrected ensemble (Wilks 2004) has elements given by
$$\tilde{x}_{ijk} = x_{ijk} - \langle \bar{x}_{ik} \rangle + \langle v_{ik} \rangle, \qquad (17)$$
where each point $x_{ijk}$, in this $K$-dimensional space, corresponds to the value of the $k$th element of the $j$th ensemble member on the $i$th forecast occasion, where $i = 1, \ldots, n$, $j = 1, \ldots, N$, and $k = 1, \ldots, K$. The $x_{ijk}$ represent the raw ensemble, $v_{ik}$ their corresponding verifications, $\bar{x}_{ik}$ the ensemble mean, and the angle brackets the averages over the $n$ occasions.
Since we want to use the MST rank histogram to diagnose error structures and not just error variances, we need to calibrate the ensemble so that the spread matches the ensemble mean error as well as removing bias. Let $d_{ijk}$ denote the ensemble perturbations for the forecast event $i$ and location $k$:
$$d_{ijk} = \tilde{x}_{ijk} - \bar{\tilde{x}}_{ik}, \qquad (18)$$
and the corresponding ensemble variance $\sigma^2_{ik}$ and ensemble mean error $\gamma_{ik}$ are
$$\sigma^2_{ik} = \frac{1}{N-1} \sum_{j=1}^{N} d_{ijk}^2, \qquad \gamma_{ik} = \bar{\tilde{x}}_{ik} - v_{ik}, \qquad (19)$$
where $v_{ik}$ is the verifying analysis at location $k$ and event $i$. Then, a calibrated ensemble for the forecast event $i$ and location $k$ is given by
$$\hat{x}_{ijk} = \bar{\tilde{x}}_{ik} + s_k \, d_{ijk}, \qquad (20)$$
where
$$s_k = \sqrt{ \frac{\langle \gamma_{ik}^2 \rangle}{\langle \sigma_{ik}^2 \rangle} }; \qquad (21)$$
the adjustment of the spread is given by the ratio of the rmse ($\sqrt{\langle \gamma_{ik}^2 \rangle}$) and the ensemble spread ($\sqrt{\langle \sigma_{ik}^2 \rangle}$) for each location averaged over the forecast events.
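The debiasing and spread calibration described above can be sketched as follows. This is our own illustrative implementation; the array shapes are assumptions:

```python
import numpy as np

def calibrate_ensemble(x, v):
    """Debias and spread-calibrate an ensemble (after Wilks 2004).

    x: raw ensemble, shape (n, N, K) (occasions x members x locations);
    v: verifications, shape (n, K). The ensemble is shifted so that the
    mean bias over occasions vanishes; the perturbations are then
    scaled, per location, by the ratio of the rmse of the ensemble mean
    to the ensemble spread, both averaged over occasions."""
    n, N, K = x.shape
    ens_mean = x.mean(axis=1)                        # (n, K)
    bias = (ens_mean - v).mean(axis=0)               # (K,) mean bias
    x_db = x - bias[None, None, :]                   # debiased ensemble
    mean_db = x_db.mean(axis=1)                      # (n, K)
    pert = x_db - mean_db[:, None, :]                # perturbations
    var = (pert ** 2).sum(axis=1) / (N - 1)          # (n, K) ensemble variance
    err = mean_db - v                                # (n, K) ensemble-mean error
    s = np.sqrt((err ** 2).mean(axis=0) / var.mean(axis=0))  # (K,) scale factors
    return mean_db[:, None, :] + s[None, None, :] * pert
```

By construction, the calibrated ensemble has zero mean bias over the occasions and, at each location, an occasion-averaged variance equal to the mean squared error of the ensemble mean.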

Here we present results generated from one month of data and K horizontal locations. To increase the statistical sample, the K locations were sampled in 10 different parts of the northern extratropics (i.e., 90 locations). Here the MST lengths are calculated using the Euclidean norm.

Figure 1 shows the 500-hPa geopotential height MST rank histogram results for ensemble forecasts at T + 6 h when the verification points are clustered together (Fig. 1a) and when they are chosen at large separation distance (Fig. 1b). As described in Smith and Hansen (2004), “the solid horizontal lines are the expected value for each bin based on the ensemble size, and the dashed lines are the expected standard deviation of each bin based on the ensemble size and the number of samples.”

Fig. 1.

MST rank histogram for 500-hPa geopotential height at T + 6 h: (a) when the verification points are close together (180 km) and (b) when the verification points are far apart (1800 km). The solid horizontal lines are the expected value for each bin and the dashed lines are the expected standard deviation of each bin.

Citation: Monthly Weather Review 144, 1; 10.1175/MWR-D-15-0270.1

In both cases, with the verification points chosen at short separation distance (around 180 km) and at large separation distance (around 1800 km), the MST rank histogram is reasonably flat. This indicates that the correlation of the ensemble perturbations and that of the error are similar, and therefore that the ensemble and the verification are sampled from the same pdf.

e. Spread–skill at longer lead times

Figure 2 shows the spread–skill relationship for 850-hPa zonal winds averaged over a 1-month period. Solid lines represent the rmse of the ensemble mean and dashed lines the ensemble spread, for the northern extratropics in blue, the southern extratropics in red, and the tropics in green. The rmse of the ensemble mean refers to the rmse of the mean of the ensemble forecasts.

Fig. 2.

850-hPa zonal wind spread–skill verification: solid lines represent the rmse of the ensemble mean and dashed lines the ensemble spread for the Northern Hemisphere extratropics (blue), the Southern Hemisphere extratropics (red), and for the tropics (green).


Each ensemble forecast member is perturbed every 6 h up to the end of the forecast range with the term being a random draw from the archive of residual forcings obtained during the calibration step.
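Perturbing each member at every window with a random archived forcing can be sketched as follows (names and shapes are ours; the archive itself comes from the calibration step):

```python
import numpy as np

rng = np.random.default_rng(4)

def draw_residual_forcing(archive):
    """Return one residual-forcing field drawn uniformly at random from
    the archive built in the calibration step; a fresh, independent
    draw is made for every member and every 6-h window.
    archive: (M, ...) array of M archived increments."""
    return archive[rng.integers(len(archive))]

archive = rng.normal(size=(100, 8, 8))   # hypothetical archive of 100 fields
eta = draw_residual_forcing(archive)     # forcing for one member, one window
```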

Figure 2 illustrates that the system is slightly underdispersive for the Northern Hemisphere extratropics and slightly overdispersive for the Southern Hemisphere extratropics and the tropics. This is in agreement with the results shown at T + 6 h in Table 1. The linear analysis of section 2c suggests that the residual forcing will be more seriously overestimated if there are few observations, leading to an overdispersive ensemble, because large analysis increments will be required to keep the evolution close to the observations that are available. When the spread generated by the residual forcing is not sufficient, as in the Northern Hemisphere extratropics, it may be because either the randomness assumption is not satisfied or the analysis increments are localized to the neighborhood of the observations and thus inadequate for use as residual forcing terms.

Figure 3 compares the rmse of the deterministic (unperturbed) control run (dashed–dotted lines) with the rmse of the ensemble mean (solid lines). The rmse of the control is larger than the rmse of the ensemble mean almost everywhere. At initial times the rmse of the control is much larger than the rmse of the ensemble mean; this is because the verification is performed against a random member of the ensemble of analyses instead of the unperturbed analysis. Bowler et al. (2015) demonstrated that this is equivalent to verifying against the truth. This demonstrates that the skill of the ensemble mean forecast is greater than that of the deterministic run at the same resolution at all lead times.

Fig. 3.

850-hPa zonal wind rmse: dashed–dotted lines represent the rmse of the deterministic (unperturbed) control run and solid lines the rmse of the ensemble mean. Color code as in Fig. 2.


f. Potential inconsistencies in the system

Though the assumption that the analysis increments are random is potentially restrictive, the assumption will certainly be degraded if

  • the assimilation procedure is not set up optimally,
  • the assumptions of linearity and of Gaussian random errors are not valid,
  • the analysis increments are correlated in time,
  • the analysis increments are not calculated consistently with their use as forcing terms.
Here we illustrate only the latter two issues, leaving the others for future work. In particular, the use of the operational Met Office background error covariance matrix is likely to be a nonoptimal choice both for the calibration step, which implies that the residual term is larger than necessary, and for the ensemble data assimilation.

In the following two sections we show some evidence of the importance of the time correlation of the analysis increments, and how using weak-constraint 4D-Var to estimate the residual forcing terms improves the consistency between the calibration of the residual forcing and the forecast system as expressed by Eq. (3), as well as the ensemble spread–skill relationship.

1) Time correlation of the analysis increments

Table 1 suggests that there is some degree of time correlation in the analysis increments, and it should therefore also be allowed for when defining the residual forcing terms to be used in Eq. (3). Here we look at the time correlation of the analysis increments collected over three months. Figure 4 shows the autocorrelation function of the increments as a function of time in the northern extratropics at different pressure levels: 250 hPa in cyan, 500 hPa in magenta, and 850 hPa in blue. Solid lines represent autocorrelations for potential temperature and dashed lines for zonal wind.

Fig. 4.

Autocorrelation function of the analysis increments as a function of time in the northern extratropics for different pressure levels. Solid lines represent autocorrelations for potential temperature and dashed lines for zonal wind.


Figure 4 shows that there are strong time correlations, of the order of a few days, depending on the pressure level. Similar results (not shown here) are obtained for the Southern Hemisphere and the tropics, although with different time scales depending on region, variable, and pressure level. Therefore, future experiments will be designed in such a way that the different time correlation scales are taken into account when defining the residual forcing term in Eq. (3). This could partially explain the mismatch between the ensemble spread and the ensemble mean error.
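The sample autocorrelation underlying such a diagnostic can be sketched as follows (our own illustrative implementation, applied here to a synthetic AR(1) series standing in for a time series of area-averaged increments):

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation of a time series at lags 0..max_lag;
    with 6-hourly cycling, one lag corresponds to 6 h."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = (x * x).sum()
    return np.array([(x[: x.size - k] * x[k:]).sum() / denom
                     for k in range(max_lag + 1)])

# synthetic AR(1) series with a correlation time of a few cycles
rng = np.random.default_rng(5)
z = np.zeros(2000)
for t in range(1, z.size):
    z[t] = 0.8 * z[t - 1] + rng.normal()
acf = autocorrelation(z, max_lag=10)
```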

2) Weak-constraint 4D-Var analysis increments

It is important to have a correct and consistent system that calculates the residual forcing terms in the same way as they are going to be applied in the ensemble forecast. That means that the forcing should be assumed to be applied at each time step, as in weak-constraint 4D-Var, rather than only at the beginning of the window, as in strong-constraint 4D-Var. This inconsistency in the system would contribute to a mismatch between spread and error, since the scale and the magnitude of the residual error forcings are different in the two cases, as shown in Fig. 5.

Fig. 5.

(top) Geographical variation of the error variance for the zonal wind component at 850 hPa (right) when the analysis increments are added once per window and (left) added at each time step. (bottom) As in (top), but for potential temperature at 250 hPa.


Figure 5 compares the geographical variation of the error variance for the zonal wind component at 850 hPa (top panels) between residual forcings computed assuming that they are added once per window (right) and at each time step (left). The latter increments show more variance and are of larger scale than the former. Similar results are also obtained for different variables and different pressure levels; for example, the bottom panels of Fig. 5 show results for potential temperature at 250 hPa. This is to be expected, because when using strong-constraint 4D-Var we rely on the model to evolve the initial perturbations for 6 h, while in weak-constraint 4D-Var small increments are added at each time step through the 6-h window. At the end of the window the effect is the same.
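The claim that the end-of-window effect is the same can be checked in a scalar linear toy model (purely illustrative; the model coefficient, window length, and forcings are arbitrary choices): forcing applied at every step is equivalent to a single, suitably transformed increment applied at the start of the window.

```python
import numpy as np

# Toy linear model x_{t+1} = m * x_t. Applying a forcing eta_k at each
# of n steps yields the same end-of-window state as the single initial
# increment d0 = sum_k m**(-(k+1)) * eta_k, since
# x_n = m**n * x0 + sum_k m**(n-1-k) * eta_k = m**n * (x0 + d0).
# (Assumes m is invertible; scalar for clarity.)
m, n = 0.9, 12
rng = np.random.default_rng(6)
eta = rng.normal(size=n)
x0 = 1.0

# weak-constraint style: forcing added at each step
x = x0
for k in range(n):
    x = m * x + eta[k]

# strong-constraint style: one equivalent increment at the start
d0 = sum(m ** (-(k + 1)) * eta[k] for k in range(n))
x_strong = m ** n * (x0 + d0)
```

The two end states agree exactly in this linear setting; the point made in the text is that the statistics (scale and variance) of the per-step forcings and of the equivalent initial increment differ, so the two formulations must not be mixed between calibration and forecast.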

Less variance and smaller scales in the residual forcing applied in Eq. (3) will probably decrease the spread of the ensemble, as well as produce an inconsistency between the assimilation and the forecast. Lower-resolution experiments showed that, when using forcing terms generated from a strong-constraint formulation, the ensemble becomes very underdispersive (Piccolo and Cullen 2013).

4. Conclusions

In this paper we have illustrated how to use observations to evaluate the effect of model error by using an ensemble of data assimilations. We have demonstrated the requirements needed to create a stochastic model such that the analyzed truth is statistically indistinguishable from a member of the forecast ensemble at all times with minimum spread.

The analysis error of a cycled data assimilation system converges to a steady state under suitable stationarity assumptions. At this steady state, the statistics of the analysis increments have to be the same as the statistics of the error growth within a data assimilation cycle. We can create a “perfect” stochastic model by using the analysis increments as random residual forcing terms, provided that the error growth over a given time interval is random with stationary statistics. We tested the randomness assumption of the analysis increments, which define the residual forcing in a stochastic model. This assumption is accurate to within 3%.

Finally, we anticipate that optimizing the assimilation procedure, using a more complete set of observations such as a reanalysis, and introducing time correlation when defining the residual forcing, will lead to further improvement in creating a perfect stochastic model and thus a reliable ensemble data assimilation system.

Acknowledgments

The authors thank Neill Bowler for very useful discussions on the project. Andrew Lorenc and Neill Bowler are thanked for providing valuable suggestions to improve the manuscript. The authors would also like to thank Mike Thurlow and Paul Earnshaw for the technical support in running an ensemble of 4D-Vars and David Davies for implementing the perturbed observation software.

REFERENCES

  • Bowler, N. E., M. J. P. Cullen, and C. Piccolo, 2015: Verification against perturbed analyses and observations. Nonlinear Processes Geophys., 22, 403–411, doi:10.5194/npg-22-403-2015.

  • Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, and B. He, 2010: Intercomparison of variational assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and single-observation experiments. Mon. Wea. Rev., 138, 1550–1566, doi:10.1175/2009MWR3157.1.

  • Burgers, G., P. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724, doi:10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.

  • Cullen, M. J. P., 2010: A demonstration of cycled 4D-Var in the presence of model error. Quart. J. Roy. Meteor. Soc., 136, 1379–1395, doi:10.1002/qj.653.

  • Cullen, M. J. P., 2013: Analysis of cycled 4D-Var with model error. Quart. J. Roy. Meteor. Soc., 139, 1473–1480, doi:10.1002/qj.2045.

  • Desroziers, G., L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of observation, background and analysis-error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, 3385–3396, doi:10.1256/qj.05.108.

  • Evensen, G., and P. J. van Leeuwen, 2000: An ensemble Kalman smoother for nonlinear dynamics. Mon. Wea. Rev., 128, 1852–1867, doi:10.1175/1520-0493(2000)128<1852:AEKSFN>2.0.CO;2.

  • Fisher, M., M. Leutbecher, and G. A. Kelly, 2005: On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation. Quart. J. Roy. Meteor. Soc., 131, 3235–3246, doi:10.1256/qj.04.142.

  • Hodyss, D., and N. Nichols, 2015: The error of representation: Basic understanding. Tellus, 67A, 24 822–24 839, doi:10.3402/tellusa.v67.24822.

  • Isaksen, L., M. Bonavita, R. Buizza, M. Fisher, J. Haseler, M. Leutbecher, and L. Raynaud, 2011: Ensemble of data assimilations at ECMWF. ECMWF Tech. Memo. 636, 48 pp. [Available online at http://old.ecmwf.int/publications/library/ecpublications/_pdf/tm/601-700/tm636.pdf.]

  • Mitchell, H., and P. Houtekamer, 2009: Ensemble Kalman filter configurations and their performance with the logistic map. Mon. Wea. Rev., 137, 4325–4343, doi:10.1175/2009MWR2823.1.

  • Palmer, T. N., 2001: A nonlinear dynamical perspective on model error: A proposal for non-local stochastic-dynamic parametrization in weather and climate prediction models. Quart. J. Roy. Meteor. Soc., 127, 279–304, doi:10.1002/qj.49712757202.

  • Palmer, T. N., R. Buizza, F. Doblas-Reyes, T. Jung, M. Leutbecher, G. J. Shutts, M. Steinheimer, and A. Weisheimer, 2009: Stochastic parametrization and model uncertainty. ECMWF Tech. Memo. 598, 42 pp. [Available online at http://old.ecmwf.int/publications/library/ecpublications/_pdf/tm/501-600/tm598.pdf.]

  • Piccolo, C., and M. Cullen, 2013: Estimation of model errors using data assimilation techniques. Met Office, 26 pp. [Available online at https://www0.maths.ox.ac.uk/system/files/attachments/Chiara_Piccolo.pdf.]

  • Smith, L. A., and J. A. Hansen, 2004: Extending the limits of ensemble forecast verification with the minimum spanning tree. Mon. Wea. Rev., 132, 1522–1528, doi:10.1175/1520-0493(2004)132<1522:ETLOEF>2.0.CO;2.

  • Tennant, W. J., G. J. Shutts, A. Arribas, and S. A. Thompson, 2011: Using a stochastic kinetic energy backscatter scheme to improve MOGREPS probabilistic forecast skill. Mon. Wea. Rev., 139, 1190–1206, doi:10.1175/2010MWR3430.1.

  • Todling, R., 2015: A lag-1 smoother approach to system error estimation: Sequential method. Quart. J. Roy. Meteor. Soc., 141, 1502–1513, doi:10.1002/qj.2460.

  • Trémolet, Y., 2007: Model-error estimation in 4D-Var. Quart. J. Roy. Meteor. Soc., 133, 1267–1280, doi:10.1002/qj.94.

  • Whitaker, J. S., and T. M. Hamill, 2012: Evaluating methods to account for system errors in ensemble data assimilation. Mon. Wea. Rev., 140, 3078–3089, doi:10.1175/MWR-D-11-00276.1.

  • Wilks, D. S., 2004: The minimum spanning tree histogram as a verification tool for multidimensional ensemble forecasts. Mon. Wea. Rev., 132, 1329–1340, doi:10.1175/1520-0493(2004)132<1329:TMSTHA>2.0.CO;2.