## 1. Introduction

Stochastic data assimilation methods, such as that described by Evensen and van Leeuwen (2000), have the benefit that they naturally provide an estimate of analysis uncertainty that is consistent with model uncertainty. This is achieved by representing the model uncertainty by a stochastic forcing term and then running an ensemble smoother in which the observation uncertainty is represented by using perturbed observations. If the analysis update is performed using the Kalman gain formula, then the ensemble, if large enough, will correctly represent the evolution of the forecast error covariance in a linear system with Gaussian errors (Burgers et al. 1998). This includes the case where the analysis updates are performed variationally—for instance, the ECMWF ensemble data assimilation system (Isaksen et al. 2011). The utility of ensemble methods in estimating forecast error covariances is demonstrated by Buehner et al. (2010).

To make such systems work, it is essential that the ensembles are reliable. In particular, the true state should be statistically indistinguishable from a random member of the prior ensemble. If this condition is met and the analysis update is performed using the Kalman gain formula, the dynamics is linear, and the errors are Gaussian, then the truth will also be statistically indistinguishable from a random member of the analysis ensemble (Burgers et al. 1998). However, if the model is imperfect, such a prior ensemble can only be constructed if the statistics of the model error are known. If so, and the model is forced by perturbations drawn from a population with these model error statistics, the truth trajectory will be statistically indistinguishable from a randomly chosen trajectory of the forced model ensemble.

Most model error comes from the limited resolution of computer models, which can only represent a small fraction of the observed scales of variability of the atmosphere. Further contributions come from the many physical effects that cannot be described by partial differential equations, whether deterministic or stochastic. The practical success of numerical prediction is measured by comparison with observations, and it is natural to use observations to estimate the model error. As also noted by Todling (2015), the statistics of the model error can only be estimated under a stationarity assumption because the observations only represent a single realization of the truth. Data assimilation techniques are a natural way to estimate model error from observations because they allow for observation error in a systematic way and produce estimates in model space, as in Trémolet (2007). In particular, by using Trémolet weak-constraint formulation, it is possible to estimate a forcing term that represents the model error. However, as noted above and also pointed out by Todling (2015), data assimilation can only be performed if there is prior knowledge of the model error statistics. We therefore carry out a prior data assimilation cycle and calculate the statistics of the analysis increments assuming stationarity. If the actual analysis increments can be regarded as a random draw from a population with these stationary statistics of analysis increments, then the analyzed trajectory will be statistically indistinguishable from a trajectory of the model forced with randomly chosen increments from this population. If the analysis trajectory can be regarded as a reasonable proxy for the truth trajectory, then these stationary statistics of analysis increments will provide an accurate estimate of the model error covariance.

The more common approach for representing model error is to use physically based methods to construct model error forcing terms that are state dependent. The latter approaches include, for example, introducing a stochastic element into atmospheric models by randomly perturbing increments or tendencies from parameterization schemes (e.g., Palmer et al. 2009; Tennant et al. 2011) or seeking a stochastic formulation of the parameterization schemes (e.g., Palmer 2001). The limitation of this approach is that it is usually empirical and, ideally, should be calibrated using data assimilation techniques such as those we describe. However, this calibration can only be performed in a climatological sense for the same reasons as above. Stochastic physics approaches are normally used in ensemble forecasts rather than ensemble data assimilation algorithms. In the latter case, it is more usual to use ad hoc inflation methods, such as those described by Whitaker and Hamill (2012).

Our approach is then first to perform a calibration analysis cycle, using weak-constraint formulation, over a sufficiently long period to obtain stationary statistics. The analysis increments are archived. We then run an ensemble of analyses using a model forced with increments randomly chosen from the archive. No other method of inflation is used. This provides a seamless approach for estimating analysis uncertainty and forecast uncertainty. In this paper we focus on testing the assumption that the model forced with randomly chosen analysis increments does indeed deliver a reliable ensemble, so that the truth trajectory is statistically indistinguishable from a randomly chosen trajectory of the model. The analysis updates are performed using current Met Office operational methods which work well in practice. Previous work has shown that low-resolution results using model error statistics estimated using analysis increments showed a clear improvement of the spread skill relationship at all lead times globally and a reduction in the rmse of the ensemble mean in the tropics compared to standard physically based methods for representing model error (Piccolo and Cullen 2013). Additional work in progress compares the effects of forcing by analysis increments with the operational Met Office physically based methods described in Tennant et al. (2011) in ensemble forecasts. This comparison will be published in a following paper.

## 2. Methodology

### a. General approach

Data assimilation for the purpose of generating initial conditions for a forecast can only be carried out optimally given prior knowledge of the statistics of background errors, model errors, and observation errors. Our premise is that the statistics of model error are unknown a priori and can only be inferred by using observations. We therefore first estimate the model error using data assimilation over a training period and then use the resulting statistics in an ensemble data assimilation system.

Consider first the estimation of the model error statistics. In practice, the truth is only known through imperfect observations. If there were no observations, it would be impossible to determine whether a model was perfect or have any idea of its errors. Since much of the model error comes from the inability to resolve many scales of motion, it can be expected that the statistics will be highly nonstationary. However, the observations only represent a single realization of the state of the atmosphere. Even if the observations were complete and perfect, this would only allow a deterministic estimate of the model error to be made over any fixed time interval. Thus, an estimate of the model error statistics can only be made over a large number of cases. A natural way to do this is to use weak-constraint four-dimensional variational data assimilation (4D-Var) as in Trémolet (2007). It is pointed out in Cullen (2013) that the length of time required for cycled short-window 4D-Var, given stationary model and observation error statistics, to spin up to stationary analysis error statistics is the same as the length of window required for long-window 4D-Var to “forget” its initial conditions, as described by Fisher et al. (2005). Once the assimilation cycles have reached a statistically steady state, the analysis increments must balance the error growth due to both model error and the growth of analysis errors. This will allow the model error to be estimated accurately if the analysis errors are small enough. Note that methods of decomposing this error growth into its components [e.g., that of Desroziers et al. (2005)] rely on prior knowledge of the statistics of the model and observation errors. Nonstationary model error statistics can only be estimated on a time scale longer than the spinup time.

It is inevitable that estimating model errors from observations will be limited by the accuracy and completeness of the observations. However, most errors propagate in space and time, and will be corrected by observations at a different place and time in a cycled data assimilation system. If this is significant, we would expect to see the analysis increments concentrated in data-rich areas, which would not be a correct representation of the model error. However, statistics of actual analysis increments from the Met Office system (illustrated later in Fig. 5) do not show an obvious correspondence to data-rich regions. They seem to correspond more to regions of greater synoptic activity. Probably this largely reflects the global coverage given by satellite data.

A major assumption of our method is that the analysis increments from a cycled data assimilation can be assumed to be a random draw from an archive of increments with stationary statistics. This would not be true if the analysis increments were closely related to individual synoptic systems. It could also not be true because the analysis procedure may be highly tuned to making small increments in sensitive regions of the atmosphere. In the case of a perfect model, methods based on the Kalman filter deliberately do the latter, so that errors in growing modes are selectively corrected. It is not clear how much this argument carries over to correcting model error, which may not be particularly associated with unstable modes of the model. The effectiveness of our assumption can only be tested by actual results, which is the main theme of section 3b.

### b. Estimation of the model error statistics

*i*and assume that the true state evolution is given bywhere

- Definition 2.1. A perfect stochastic model is such that the true evolution is statistically indistinguishable from a random realization of the model.

As noted above, information about the truth is only available from imperfect observations. Under the randomness assumption discussed above, if we replace the

We thus suppose we are given an archive of observations, and fit the model defined in Eq. (1) to the data. As noted above, the initial state uncertainty does not need to be estimated. We write the estimated residual

*n*is the number of time steps and the remaining notation is as in Eq. (1).

Ideally, this estimate should be made using a long assimilation window, as shown in Eq. (2) and suggested by Trémolet. In practice, since only the stationary statistics are required, the same result can be obtained more simply by cycling a short-window algorithm, as in Cullen (2010) and Cullen (2013), in which case the

### c. Scalar case with single control variable

In this section we show why the estimated model error covariance *p* is the actual variance and *r* is the observation error variance. The background, model, and observation errors *p*, *q*, and *r* are assumed to be uncorrelated.

*i*. To be consistent with Eq. (1), given an error

*i*, it is assumed that the error in the trajectory evolved to the time

*m*is the growth rate under the action of the model and

*q*is the true residual variance. The assimilation is carried out by minimizing Eq. (2) with the observation at the end of the time step. We calculate residual terms, with assumed prior variance

*d*, to be added at the end of each time step. We will show that the variance

*e*of the computed residuals is different from

*d*, as expected from the theory described by Desroziers et al. (2005). The resulting trajectory will be a realization of Eq. (3).

*k*is given byThe assimilation is cycled to a statistically steady state as in Cullen (2013). The steady-state error

*r*ifIf this condition is satisfied, the analysis will fit the observation to within observation error.

*d*to be the forecast error at the time the observation is valid. Thus, we define

*k*, the variance of the residuals is given byThen setting

*r*, so that the observations are fitted to within observational error. Substituting Eq. (11) in Eq. (10), and cancelling

*a*at the end of the window is given by the standard formula [Eq. (3) of Cullen (2013)]:At steady state we must have

This shows that the estimated residuals compensate for both the growth of the steady-state error *q*. If the analyses are perfect, so that *e* is greater than *q*. The value of *e* is not the same as the assumed value *e* since *r* and *q* are given.

### d. Discussion

We have shown that our estimation procedure creates a realization of the stochastic model Eq. (3), which would be a perfect stochastic model in the sense of definition 2.1 if the true states were represented by analyses. This is not the same as estimating the real model error defined by Eq. (1). The property that 4D-Var, or other deterministic methods of data assimilation, gives a minimum variance estimate means that the forcing terms

The analysis set out above shows that the estimated residuals compensate the error growth, which includes the inherent growth of the irreducible analysis error as well as the model error. If the observation coverage and assimilation techniques are improved, the residuals will become closer to the true model errors. If the residuals are genuinely random and are used in Eq. (3), then the analyzed states will be statistically indistinguishable from a random member of an ensemble of forecasts using the model. This is because the evolution from one analyzed state to the next is given by the model and an analysis increment and is, thus, a realization of Eq. (3). Thus, use of Eq. (3) should give reliable ensemble forecasts as judged against analyses.

In the real case, the assumption that model error, and thus the residual estimated as above, can be represented by random noise uncorrelated in time is unlikely to be realistic. However, the result that the residuals in a spun-up data assimilation cycle have to compensate the forecast error growth must still hold, or else the analysis would not be fully spun up. The residuals only compensate for the error growth, not the total error. Systematic errors that do not grow in time are excluded. The results of Cullen (2013) suggest that excluding nongrowing errors that are correlated in time from the prior covariance is beneficial. This is supported by Hodyss and Nichols (2015). Thus, it is reasonable to expect that our procedure will be effective in the real case, even though no theory is available to prove this.

Todling (2015) claims to be able to estimate

### e. Implementation in an ensemble data assimilation system

We now use the stochastic model defined in Eq. (3) in an ensemble data assimilation system. This is set up in a standard way using perturbed observations and using Eq. (3) to generate the prior ensemble, which is also the background state for the assimilation. Independent analyses are performed for each ensemble member using a randomly chosen set of perturbed observations. Any method can be used for the analysis update, but as in the ensemble Kalman filter of Burgers et al. (1998), the same covariances are used for all the analyses.

The correct analysis spread will be obtained if the background trajectories include forcing by a random member of the archive of residual forcing terms generated in the calibration procedure described in section 2c. As discussed in section 2d, the true analysis will be statistically indistinguishable from a random member of the background ensemble and will be also statistically indistinguishable from a random member of the observation ensemble. Thus, in the hypothetical case where the truth is selected from both ensembles, so that there is no innovation and no analysis update, the truth will also be statistically indistinguishable from a random member of the analysis ensemble.

If linear theory applies, then Trémolet states that the analysis obtained using the model error control variable will be the same as that given by calculating a model state increment with the same covariance. Therefore, a similar set of analyses could be obtained by using strong-constraint 4D-Var with covariance

## 3. Results

In this section first we want to test how well the assumption that the analysis increments are random is satisfied so that it is valid to use them as residual forcing terms. Second, we want to test the correctness of the covariance structures of the *T* + 6-h forecast ensemble starting from the end of the assimilation window and how the forecast ensemble performs at longer lead times.

### a. Calibration step

First we perform the calibration step to generate the archive of analysis increments from the weak-constraint formulation of Eq. (2) using a 6-h window. These increments are an estimate of

### b. Ensemble data assimilation setup

The ensemble data assimilation system used here is a system of

The ensemble system may be summarized as follows: the forecast model [Eq. (3)] is applied to *N* perturbed analyses, which provide *N* backgrounds for the next analysis time. The term *N* backgrounds and to *N* sets of perturbed observations using the background error covariance matrix *N* new perturbed analyses and so on. The *N* sets of observations are obtained as random realizations of a Gaussian pdf whose covariance matrix corresponds to the specified observation covariance matrix

### c. Random assumption of the analysis increments

An ensemble constructed from Eq. (3) with residual forcing *T* + 6 h from the end of the assimilation window should match the rmse of the ensemble mean.

Table 1 reports the ensemble spread, the rmse of the ensemble mean, and their relative percentage difference with 95% confident intervals at *T* + 6 h as a function of different area over the globe for zonal winds at 850 hPa (m s^{−1}). The mismatch between ensemble spread and rmse of the ensemble mean at *T* + 6 h is of the order of a few percent. The 95% confidence intervals show that this mismatch is mainly due to sampling effect. These results demonstrate that the randomness assumption is valid if there are enough observations, so that the

Random assumption of the analysis increments for zonal wind at 850 hPa (m s^{−1}).

The randomness assumption may not be valid if the analysis increments, used to estimate the residual forcings, are correlated in time. In section 4e, we will look at the time correlation of the analysis increments. If there is sufficient time correlation in the error growth, it would be worth including in the generation of

### d. Minimum spanning tree rank histogram

Traditional ensemble verification scores are not sufficient to evaluate the adequacy of covariances for ensemble data assimilation systems. The definition of the perfect stochastic model relies on the truth to be statistically indistinguishable from a random member of an ensemble. A proper way to enforce this definition is to use the minimum spanning tree (MST) rank histogram verification. As described by Smith and Hansen (2004), the MST rank histogram “assesses the predicted pdf by testing the hypothesis that the truth is a member of the population defined by the ensemble.” The MST histogram ranks the smallest length of a multidimensional tree consisted of *N* ensemble members plus the verification), where *N* MST lengths are calculated by replacing each ensemble member with the verification while one length is constructed by using only the ensemble (Smith and Hansen 2004; Wilks 2004). If the ensemble and the verification belong to the same probability distribution, then the MST lengths constructed using only the ensemble members or by replacing the verification to each ensemble member should be randomly distributed. If a large number of cases

To infer information about the covariance structure of the ensemble, we first correct the ensemble data at a set of *K* verification points for bias and spread. We then assess the ensemble horizontal correlation scale by choosing verification points with different horizontal separation distances. For verification, we use a random member of the ensemble of analyses.

*K*-dimensional space, corresponds to the value of the

*k*th element of the

*j*th ensemble member on the

*i*th forecast occasion, where

*i*and location

*k*:and the corresponding ensemble variance

*σ*and the ensemble mean error

*γ*arewhere

*k*and event

*i*. Then, a calibrated ensemble for the forecast event

*i*and location

*k*is given bywherethe adjustment of the spread is given by the ratio of the rmse (

Here we present results generated from one month of data and *K* parameters were sampled in 10 different parts of the northern extratropics (i.e., 90 locations). Here the MST lengths are calculated using the Euclidean

Figure 1 shows the 500-hPa geopotential height MST rank histogram results for ensemble forecasts at *T* + 6 h when the verification points are clustered together (Fig. 1a) and when they are chosen at large separation distance (Fig. 1b). As described in Smith and Hansen (2004), “the solid horizontal lines are the expected value for each bin based on the ensemble size, and the dashed lines are the expected standard deviation of each bin based on the ensemble size and the number of samples.”

In both cases, when the verification points are chosen at short separation distance (around 180 km) and at large separation distance (around 1800 km), the MST rank histogram is reasonably flat. This indicates that the correlation of the ensemble perturbations and of the error is similar and therefore the ensemble and the verification are sampled from the same pdf.

### e. Spread–skill at longer lead times

Figure 2 shows the spread–skill relationship for 850-hPa zonal winds averaged over 1-month period. Solid lines represent the rmse of the ensemble mean and dashed lines the ensemble spread for the northern extratropics in blue, the southern extratropics in red, and the tropics in green. The rmse of the ensemble mean refers to the rmse of the mean of the ensemble forecasts.

Each ensemble forecast member is perturbed every 6 h up to the end of the forecast range with the term

Figure 2 illustrates that the system is slightly underdispersive for the Northern Hemisphere extratropics and slightly overdispersive for the Southern Hemisphere extratropics and the tropics. This is in agreement with that shown at *T* + 6 h in Table 1. The linear analysis of section 2c suggests that the residual forcing will be more seriously overestimated if there are few observations leading to an overdispersive ensemble, because large analysis increments will be required to keep the evolution close to the observations that are available. When the spread generated by the residual forcing is not sufficient as in the Northern Hemisphere extratropics, it may be due to the fact that either the randomness assumption is not satisfied, or that the analysis increments are localized to the neighborhood of the observations and thus inadequate for use as residual forcing terms.

Figure 3 compares the rmse of the deterministic (unperturbed) control run in dashed–dotted lines and the rmse of the ensemble mean in solid lines. The rmse of the control is larger than the rmse of the ensemble mean almost everywhere. At initial times the rmse of the control is much larger of the rmse of the ensemble mean, this is because the verification is performed against a random member of the ensemble of analysis instead of the unperturbed analysis. Bowler et al. (2015) demonstrated that this is equivalent to verifying against the truth. This demonstrates that the skill of the ensemble mean forecast is greater than the deterministic run’s skill at the same resolution at all lead times.

### f. Potential inconsistencies in the system

Though the assumption that the analysis increments are random is potentially restrictive, the assumption will be definitely degraded if

- the assimilation procedure is not set up optimally,
- the assumptions of linearity and of Gaussian random errors are not valid,
- the analysis increments are correlated in time,
- the analysis increments are not calculated consistently with their use as forcing terms.

In the following two sections we show some evidence of the importance of the time correlation of the analysis increments and how using weak-constraint 4D-Var to estimate the

#### 1) Time correlation of the analysis increments

Table 1 suggests that there is some degree of time correlation in the analysis increments and therefore it should also be allowed for when defining the

Figure 4 shows that there are strong time correlations depending on the pressure levels of the order of a few days. Similar results (not shown here) are obtained for the Southern Hemisphere and the tropics, although they show different time scales depending on regions, variables, and pressure levels. Therefore, future experiments will be designed in such a way that the different time correlation scales are taken into account when defining the residual forcing term

#### 2) Weak-constraint 4D-Var analysis increments

It is important to have a correct and consistent system that calculates the residual forcing terms as they are going to be applied to the ensemble forecast. That means that the forcing should be assumed to be applied at each time step as in weak-constraint 4D-Var rather than only at the beginning of the window as in strong-constraint 4D-Var. This inconsistency in the system would contribute to mismatch between spread and error since the scale and the magnitude of the residual error forcings are different in the two cases as shown in Fig. 5.

Figure 5 shows the comparison of the geographical variation of the error variance for the zonal wind component at 850 hPa (top panels) between residual forcings computed assuming that they are added once per window on the right and added at each time step on the left. The latter increments show more variance and are larger scale than the former. Similar results are obtained also for different variables and different pressure levels. For example the bottom panels of Fig. 5 show results for potential temperature at 250 hPa. This is to be expected because when using strong-constraint 4D-Var we rely on the model to evolve the initial perturbations for 6 h while in weak-constraint 4D-Var small increments are added at each time step through the 6-h window. At the end of window the effect is the same.

Less variance and smaller scale in

## 4. Conclusions

In this paper we have illustrated how to use observations to evaluate the effect of model error by using an ensemble of data assimilations. We have demonstrated the requirements needed to create a stochastic model such that the analyzed truth is statistically indistinguishable from a member of the forecast ensemble at all times with minimum spread.

The analysis error of a cycled data assimilation system converges to steady state under suitable stationarity assumptions. At this steady state, the statistics of the analysis increments have to be the same as the statistics of the error growth within a data assimilation cycle. We can create a “perfect” stochastic model by using the analysis increments as random residual forcing terms provided that the error growth over a given time interval is random with stationary statistics. We tested the randomness assumption of the analysis increments which define the residual forcing in a stochastic model. This assumption is accurate to within 3%.

Finally, we anticipate that optimizing the assimilation procedure, using a more complete set of observations such as a reanalysis, and introducing time correlation when defining the residual forcing, will lead to further improvement in creating a perfect stochastic model and thus a reliable ensemble data assimilation system.

## Acknowledgments

The authors thank Neill Bowler for very useful discussions on the project. Andrew Lorenc and Neill Bowler are thanked for providing valuable suggestions to improve the manuscript. The authors would also like to thank Mike Thurlow and Paul Earnshaw for the technical support in running an ensemble of 4D-Vars and David Davies for implementing the perturbed observation software.

## REFERENCES

Bowler, N. E., , M. J. P. Cullen, , and C. Piccolo, 2015: Verification against perturbed analyses and observations.

,*Nonlinear Processes Geophys.***22**, 403–411, doi:10.5194/npg-22-403-2015.Buehner, M., , P. L. Houtekamer, , C. Charette, , H. L. Mitchell, , and B. He, 2010: Intercomparison of variational assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and single-observation experiments.

,*Mon. Wea. Rev.***138**, 1550–1566, doi:10.1175/2009MWR3157.1.Burgers, G., , P. van Leeuwen, , and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter.

,*Mon. Wea. Rev.***126**, 1719–1724, doi:10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.Cullen, M. J. P., 2010: A demonstration of cycled 4D-Var in the presence of model error.

,*Quart. J. Roy. Meteor. Soc.***136**, 1379–1395, doi:10.1002/qj.653.Cullen, M. J. P., 2013: Analysis of cycled 4D-Var with model error.

,*Quart. J. Roy. Meteor. Soc.***139**, 1473–1480, doi:10.1002/qj.2045.Desroziers, G., , L. Berre, , B. Chapnik, , and P. Poli, 2005: Diagnosis of observation, background and analysis-error statistics in observation space.

,*Quart. J. Roy. Meteor. Soc.***131**, 3385–3396, doi:10.1256/qj.05.108.Evensen, G., , and P. J. van Leeuwen, 2000: An ensemble Kalman smoother for nonlinear dynamics.

,*Mon. Wea. Rev.***128**, 1852–1867, doi:10.1175/1520-0493(2000)128<1852:AEKSFN>2.0.CO;2.Fisher, M., , M. Leutbecher, , and G. A. Kelly, 2005: On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation.

,*Quart. J. Roy. Meteor. Soc.***131**, 3235–3246, doi:10.1256/qj.04.142.Hodyss, D., , and N. Nichols, 2015: The error of representation: Basic understanding.

,*Tellus***67A**, 24 822–24 839, doi:10.3402/tellusa.v67.24822.Isaksen, L., , M. Bonavita, , R. Buizza, , M. Fisher, , J. Haseler, , M. Leutbecher, , and L. Raynaud, 2011: Ensemble of data assimilations at ECMWF. ECMWF Tech. Memo. 636, 48 pp. [Available online at http://old.ecmwf.int/publications/library/ecpublications/_pdf/tm/601-700/tm636.pdf.]

Mitchell, H., , and P. Houtekamer, 2009: Ensemble Kalman filter configurations and their performance with the logistic map.

,*Mon. Wea. Rev.***137**, 4325–4343, doi:10.1175/2009MWR2823.1.Palmer, T. N., 2001: A nonlinear dynamical perspective on model error: A proposal for non-local stochastic-dynamic parametrization in weather and climate prediction models.

,*Quart. J. Roy. Meteor. Soc.***127**, 279–304, doi:10.1002/qj.49712757202.Palmer, T. N., , R. Buizza, , F. Doblas-Reyes, , T. Jung, , M. Leutbecher, , G. J. Shutts, , M. Steinheimer, , and A. Weisheimer, 2009: Ensemble of data assimilations at ECMWF. ECMWF Tech. Memo. 598, 42 pp. [Available online at http://old.ecmwf.int/publications/library/ecpublications/_pdf/tm/501-600/tm598.pdf.]

Piccolo, C., , and M. Cullen, 2013: Estimation of model errors using data assimilation techniques. Met Office, 26 pp. [Available online at https://www0.maths.ox.ac.uk/system/files/attachments/Chiara_Piccolo.pdf.]

Smith, L. A., , and J. A. Hansen, 2004: Extending the limits of ensemble forecast verification with the minimum spanning tree.

,*Mon. Wea. Rev.***132**, 1522–1528, doi:10.1175/1520-0493(2004)132<1522:ETLOEF>2.0.CO;2.Tennant, W. J., , G. J. Shutts, , A. Arribas, , and S. A. Thompson, 2011: Using stochastic kinetic energy backscatter scheme to improve MOGREPS probabilistic forecast skill.

,*Mon. Wea. Rev.***139**, 1190–1206, doi:10.1175/2010MWR3430.1.Todling, R., 2015: A lag-1 smoother approach to system error estimation: Sequential method.

,*Quart. J. Roy. Meteor. Soc.***141**, 1502–1513, doi:10.1002/qj.2460.Trémolet, Y., 2007: Model-error estimation in 4D-Var.

,*Quart. J. Roy. Meteor. Soc.***133**, 1267–1280, doi:10.1002/qj.94.Whitaker, J. S., , and T. M. Hamill, 2012: Evaluating methods to account for system errors in ensemble data assimilation.

,*Mon. Wea. Rev.***140**, 3078–3089, doi:10.1175/MWR-D-11-00276.1.Wilks, D. S., 2004: The minimum spanning tree histogram as a verification tool for multidimensional ensemble forecasts.

,*Mon. Wea. Rev.***132**, 1329–1340, doi:10.1175/1520-0493(2004)132<1329:TMSTHA>2.0.CO;2.