## 1. Introduction

A popular method used to account for initial condition uncertainty in Numerical Weather Prediction (NWP) is the ensemble approach to forecasting (Palmer et al. 1992; Toth and Kalnay 1993). The atmosphere’s initial condition probability distribution function (PDF) is discretely sampled, and each sample is propagated forward by the NWP model. The resulting collection of forecasts is treated as a discrete sample of the forecast PDF. One factor that can limit an ensemble forecast’s utility is its relatively small size. This paper presents an approach to increase forecast ensemble size using lagged ensemble forecasts that have been transformed to account for all observations that have become available since the forecasts were launched. This transformed lagged ensemble forecasting (TLEF) technique is equally valid for single-model and multimodel ensembles, but the technique introduced here places emphasis on the single-model case in the context of an idealized model. Issues associated with operational NWP ensembles and multimodel scenarios will appear in a companion paper.

The TLEF proves to be an extremely inexpensive way to increase ensemble size, but as will be shown below, because the methodology is applied to ensemble perturbations it impacts only the higher-order moments of the distribution, not the mean. Thus, the traditional benefit of improved ensemble mean skill with increased ensemble size (Buizza 1997; Houtekamer and Mitchell 1998) is not realized for TLEF ensembles. Instead, benefits are derived for other ensemble-based applications, such as targeting or limited-area modeling, where increased numbers of ensemble forecast perturbations can be put to good use.

Increasing the size of NWP ensemble forecasts is a computationally demanding endeavor. The method described here increases ensemble size inexpensively by utilizing old ensemble forecasts. The idea of using old forecasts to augment the most recent forecast is not new. Lagged average forecasting was introduced by Hoffman and Kalnay (1983) as a precursor to ensemble forecasting. Instead of defining an ensemble of initial conditions at the initial time and propagating them forward to some forecast time of interest, one generates a forecast ensemble by collecting the single forecasts launched at different times but all verifying at the same time. Because the different forecasts are conditioned upon different amounts of observational information, the authors proposed weighting the older forecasts by a measure of their expected skill. In the majority of applications, each of the forecasts is given equal weight (Kalnay 2003). A more sophisticated weighting for lagged average forecasting was proposed by Ebisuzaki and Kalnay (1991). The method proposed in the current work differs from the above approaches in two important ways: 1) it combines ensemble forecasts rather than single forecasts, and 2) it uses the combination of observations and ensemble-based uncertainty information to provide a statistically rigorous, state-dependent “weighting.”

The TLEF, as presented here, is based on the ensemble transform Kalman filter (ETKF) of Bishop et al. (2001). It is important to note that the technique is not limited to the ETKF or even to deterministic ensemble filters; Evensen (2003) provides a formulation of the ensemble Kalman filter (EnKF) that is amenable to application with the TLEF. In our implementation of the TLEF, a transformation based on the ETKF is performed on old ensemble forecasts each time new observations become available. The information contained in the transformation is then propagated forward in the ensemble subspace under the assumption of linear error evolution to alter the ensemble forecasts *without rerunning models*; the ensemble forecast itself provides all the information necessary to propagate the impact of the observations forward in time. The transformation process results in a collection of ensemble forecasts that are directly comparable at any verification time: all ensemble members have been conditioned upon the same observations. This approach effectively increases the forecast ensemble size without adding any new model runs.

The most recent ensemble forecast will provide an estimate of *p*[**x**^{t}(*τ*_{ver})|**Y**(0)]: the probability of the true state of the system at *t* = *τ*_{ver} given all observations up to and including those at *t* = 0. The ensemble forecast launched one observation time prior to the most recent provides an estimate of *p*[**x**^{t}(*τ*_{ver})|**Y**(−*τ*_{obs})], a different distribution from that which is desired. The aim of the TLEF is to transform the samples from *p*[**x**^{t}(*τ*_{ver})|**Y**(−*τ*_{obs})] so that they become samples from *p*[**x**^{t}(*τ*_{ver})|**Y**(0)] and can be used to augment the most recent ensemble forecast. Ensembles can be transformed multiple times; each time new observations become available, previously transformed ensembles can undergo another transformation. The number of times an ensemble forecast can be transformed is controlled by the validity of the underlying linearity assumption and the level of model error. It is important to point out that the approach only provides new information when the model dynamics are nonlinear and/or when the ensemble prediction system’s ensemble construction methodology is different from that used by the TLEF (the ETKF in this case). This point is discussed in section 2c.

This manuscript describes a proof-of-concept study of the TLEF technique by applying it to an idealized model. In section 2 the ETKF is briefly described, followed by a description of the TLEF methodology. Section 3 explores the implementation of this framework in an idealized model environment where the sensitivity of the results can be extensively studied. Section 4 concludes with a brief discussion of the results and some suggestions for implementation into an NWP framework and possible improvements to the approach.

## 2. Formulation of the TLEF

### a. The ETKF

The ETKF is conceptually similar to both the ensemble square root filter (Whitaker and Hamill 2002) and the ensemble adjustment filter (Anderson 2001). The ETKF has been applied to the targeted observation problem (Bishop et al. 2001), ensemble generation (Wang and Bishop 2003), and data assimilation (Majumdar et al. 2002). In this work we use the ETKF to propagate observational influence forward in time. The TLEF can use the ETKF machinery to update both the forecast mean and ensemble perturbations when observations become available, and then propagate those changes forward in the ensemble subspace under the assumption that they evolve linearly. *This is achieved by simply applying the transformation matrix derived at the observation time to all forecast perturbations*.

The ETKF is an approximation of the extended Kalman filter (Jazwinski 1970). The extended Kalman filter is a near-optimal algorithm that provides a minimum error variance estimate of the state, **x**^{a}, and its uncertainty, 𝗣^{a}, given a prior estimate of the state, **x**^{f}, available observational data, **y**, and the uncertainty associated with each, 𝗣^{f} and 𝗥, respectively.^{1} The first guess is typically in the form of a short-term forecast, **x**^{f} = **F**(**x**^{a}), where **F** can be a nonlinear operator. The uncertainty associated with the first guess is obtained by linearly propagating the uncertainty associated with the previous minimum error variance estimate forward to the new observation time, 𝗣^{f} = 𝗠𝗣^{a}𝗠^{T}, where 𝗠 is the linear uncertainty propagator defined by integrating the Jacobian of **F**. The ensemble approximations to the extended Kalman filter take a Monte Carlo, or ensemble, approach to producing the uncertainty associated with the first guess, 𝗣^{f} = 𝗫^{f}𝗫^{f}^{T}. Here 𝗫^{f} is a matrix whose columns consist of the vectors (**x**^{f}_{i} − **x̄**^{f})(*N*_{ens} − 1)^{−1/2} for *i* = 1, . . . , *N*_{ens}, where **x̄**^{f} is the ensemble mean, **x**^{f}_{i} is an individual ensemble member, and *N*_{ens} is the number of ensemble members: 𝗫^{f} is a matrix of normalized ensemble perturbations.^{2}

To obtain analysis ensemble perturbations, forecast ensemble perturbations 𝗫^{f} are subjected to a *linear transformation* (via a 𝗧 matrix) so that the distribution of transformed ensemble perturbations exactly matches the Kalman filter expression for the uncertainty associated with the analysis estimate:

𝗣^{a} = 𝗣^{f} − 𝗣^{f}𝗛^{T}(𝗛𝗣^{f}𝗛^{T} + 𝗥)^{−1}𝗛𝗣^{f},   (1)

𝗫^{a}𝗫^{a}^{T} = 𝗣^{a}, where 𝗫^{a} = 𝗫^{f}𝗧.   (2)

Bishop et al. (2001) described a computationally efficient approach to solving Eqs. (1) and (2) when the ensemble size *N*_{ens} is smaller than the number of observations *N*_{obs}. Note that, as is the case for all ensemble filters, if the ensemble size is smaller than the rank of the true forecast error covariance matrix in observation space, the mean ETKF analysis increments and error covariance updates are far from optimal. The ETKF minimum error variance estimate of the state is obtained through the expression in Eq. (3), where **y** is an observation, 𝗛 is the observation operator, 𝗖 and **Γ** contain the eigenvectors and eigenvalues of 𝗫^{f}^{T}𝗛^{T}𝗥^{−T/2}𝗥^{−1/2}𝗛𝗫^{f}, respectively, and 𝗘 contains the eigenvectors of 𝗥^{−1/2}𝗛𝗫^{f}𝗫^{f}^{T}𝗛^{T}𝗥^{−T/2} that are not in the null space of 𝗫^{f}𝗫^{f}^{T}. Only the highlights of the method are given here; for a complete treatment the interested reader should see Bishop et al. (2001).

The analysis ensemble is formed when the transformed perturbations 𝗫^{a} are added to the minimum error variance estimate of the state vector **x̄**^{a}. The transformation matrix itself is 𝗧 = 𝗖(**Γ** + 𝗜)^{−1/2}𝗖^{T} [Eq. (4)]. Note that the transformation 𝗧 only depends on the forecast and observation error *statistics*, not on the forecasts or observations themselves. Equations (2)–(5) provide the tools necessary to perform ensemble-based data assimilation using the ETKF.

The expression for the 𝗧 matrix in Eq. (4) includes a matrix 𝗖^{T}, which, although not included in the original formulation by Bishop et al. (2001), provides an extra transformation to ensure the new analysis perturbations are centered about a zero mean without changing their sample covariance. If this matrix is omitted, one of the analysis ensemble members is nearly indistinguishable from the mean. This is the “spherical simplex” ETKF formulation of Wang et al. (2004).
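In outline, the spherical simplex transformation can be sketched numerically. The snippet below is illustrative only (the array sizes, random perturbations, and observation operator are invented for the example); it builds 𝗧 = 𝗖(**Γ** + 𝗜)^{−1/2}𝗖^{T} from normalized forecast perturbations and checks the two properties discussed above: the transformed perturbations reproduce the Kalman filter analysis covariance and remain centered about zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_ens, n_obs = 8, 5, 6        # invented sizes for illustration

# Normalized forecast perturbations: columns are (x_i - xbar)/(N_ens - 1)^{1/2}
members = rng.standard_normal((n_state, n_ens))
Xf = (members - members.mean(axis=1, keepdims=True)) / np.sqrt(n_ens - 1)

H = rng.standard_normal((n_obs, n_state))   # linear observation operator
R = 0.04 * np.eye(n_obs)                    # observation error covariance

# S = R^{-1/2} H Xf; the eigenpairs of S^T S give C and Gamma
S = np.linalg.solve(np.linalg.cholesky(R), H @ Xf)
gamma, C = np.linalg.eigh(S.T @ S)

# Spherical simplex transform T = C (Gamma + I)^{-1/2} C^T (Wang et al. 2004)
T = C @ np.diag(1.0 / np.sqrt(gamma + 1.0)) @ C.T
Xa = Xf @ T                                 # analysis perturbations

# The trailing C^T keeps the analysis perturbations centered about zero
assert np.allclose(Xa.mean(axis=1), 0.0)
```

Because 𝗧 is applied on the right, the same small *N*_{ens} × *N*_{ens} matrix can later be applied unchanged to the perturbations of the same ensemble valid at any forecast time, which is the operation the TLEF exploits.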

We reiterate that the TLEF is not dependent upon the use of the ETKF. Other ensemble data assimilation schemes can be similarly framed (see Tippett et al. 2003). As will be discussed later, the TLEF is derived under the assumption that the dynamics is linear, but will actually give no new information if the dynamics really is linear. The current work relies on the nonlinearity of the model and the mismatch between the scheme used to generate the forecast ensembles (the EnKF) and the scheme used in the TLEF (the ETKF). Other researchers may consider utilizing the EnKF in the TLEF framework; the perturbed observations utilized by the EnKF would allow transformed ensembles to be different from untransformed ensembles in the case of linear dynamics.

### b. Ensemble inflation

The ETKF analysis ensemble underestimates the total analysis error variance both because it lacks contributions from important parts of the error space and because of sample size limitations. A simple approach that partially addresses this deficiency (although it does not entirely ameliorate the problem) uses an inflation factor to boost the perturbation amplitude. Following the basic methodology of Wang and Bishop (2003), this inflation factor, denoted by Π_{r}, is calculated to ensure that the forecast ensemble variance at some chosen lead time *t*_{r} is made consistent with the control forecast error variance at the observation locations. Therefore, assuming that the statistics of the next globally averaged forecast will be similar to those of the previous forecast, the value of Π_{r} can be obtained by

Π_{r} = Π_{r−1}√*α*_{r},   (6)

where *r* = 1, . . . , *N*_{r} (*N*_{r} is the total number of available forecasts or “realizations”) and *α*_{r} is defined as

*α*_{r} = (**d̃**_{r}^{T}**d̃**_{r} − *N*_{obs}) / tr(𝗥^{−1/2}𝗛𝗣^{e}_{r}𝗛^{T}𝗥^{−T/2}),   (7)

where **d̃**_{r} is the innovation vector at *t*_{r} normalized by the square root of the observation error covariance matrix. Similarly, 𝗣^{e}_{r} is the ensemble covariance at *t*_{r} and *N*_{obs} is the number of observations.
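A minimal sketch of this inflation bookkeeping follows. It assumes the form of Eq. (7) given above, and the class itself is a hypothetical helper written for illustration, not code from the paper; it already incorporates the running-average modification described next in the text.

```python
import numpy as np
from collections import deque

class InflationFactor:
    """Running-average inflation in the spirit of Wang and Bishop (2003),
    modified as in the text: Pi_r is the square root of a running average
    of recent alpha values, with no dependence on Pi_{r-1}."""

    def __init__(self, window=200):
        self.alphas = deque(maxlen=window)  # keep the last `window` alphas

    def update(self, d_tilde, S):
        """d_tilde: innovation normalized by R^{-1/2}.
        S = R^{-1/2} H Xf, so that S S^T is the normalized forecast
        ensemble covariance in observation space."""
        n_obs = d_tilde.size
        # alpha_r = (d~^T d~ - N_obs) / tr(R^{-1/2} H P^e H^T R^{-T/2})
        self.alphas.append((d_tilde @ d_tilde - n_obs) / np.trace(S @ S.T))
        return np.sqrt(np.mean(self.alphas))  # inflation factor Pi_r
```

The returned factor would multiply the lag-0 forecast perturbations before any transformation is applied.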

We slightly modify the definition of the inflation factor given in Eq. (6). A running average of the last 200 values of *α* is calculated for each realization, and Π_{r} is obtained directly from the square root of this running average of *α* values (previous values of the inflation factor, Π_{r−1}, are not used in the calculation). The reason for this modification can be seen in Eq. (7). When the dimension of the observation space *N*_{obs} is small, it is possible to have innovation vectors that are so small that the numerator of Eq. (7) becomes negative. The running average allows us to avoid square roots of negative numbers, and dropping Π_{r−1} removes the multiplicative memory of Eq. (6), in which a long string of *α*_{r} values on one side of unity can cause the inflation factor to diverge. In the following study, inflation is only applied to the initial ensemble forecast (not to the transformed forecasts). Section 2c describes the simple extensions to the ETKF that are necessary to implement the TLEF.

### c. The TLEF

Suppose an analysis is available at *t* = 0, new observations arrive at *t* = 1, and a forecast is desired at *t* = 2. If one defines 𝗠_{i} to be a linear uncertainty propagator, then the traditional approach is to propagate the analysis from *t* = 0 to *t* = 1, update it with the observations at *t* = 1, and propagate the result forward to *t* = 2. Alternatively, it is possible to jump straight from the analysis at time *t* = 0 to the forecast at time *t* = 2 that is conditioned on the observations at time *t* = 1 by first linearly propagating to time *t* = 2 and then applying the impact of the observations at time *t* = 1. This is trivially extended to longer forecast leads and more observation periods. A more detailed explanation is given below. The treatment and nomenclature may appear confusing to the casual reader, but are necessary for those who wish to reproduce our results. If desired, the remainder of the section can be skipped by the casual reader without loss of comprehension.

The time indices were removed from the equations in section 2a for the sake of notational simplicity. A coherent discussion of the TLEF requires that the time indices now be reintroduced in an unusual fashion. Define 𝗫^{f}(*τ*_{1}, *τ*_{2}, *τ*_{3}) to be the normalized ensemble forecast perturbations from a forecast that was originally launched at *t* = *τ*_{1}, that was last transformed at *t* = *τ*_{2}, and that is being assessed at *t* = *τ*_{3}. For example, 𝗫^{f}(−6, 0, 6) would be the normalized ensemble forecast perturbations verifying at *t* = 6 h that were launched at *t* = −6 h but were transformed by the observations at *t* = 0 h. An analyzed state will always have *τ*_{1} = *τ*_{2} = *τ*_{3}; for example, **x̄**^{a}(0, 0, 0). Using this nomenclature, an analysis state can also be expressed in terms of a forecast state as **x̄**^{a}(0, 0, 0) = **x̄**^{f}(−*τ*_{obs}, 0, 0).

In what follows, the most recent observations are assumed to have arrived at *t* = 0, and observation times have been normalized to integer values. The analysis increment gives the perturbation that must be added to the first-guess state in order to obtain the analyzed state. Assuming the analysis increment is small, the impact of the increment at some future time can be obtained by linearly propagating the analysis increment forward in time. Define 𝗠(*τ*_{1}, *τ*_{3}) to be the linear propagator that takes a perturbation at *t* = *τ*_{1} and propagates it forward to *t* = *τ*_{3} (in this example *τ*_{3} = 4). Then the correction can be written as in Eq. (12), or equivalently as in Eq. (13); Eqs. (12) and (13) can be combined to obtain Eq. (14). If one assumes that the ensemble subspace captures all dynamically important directions, and that forecast ensemble perturbations evolve linearly, then

𝗠(0, 4)𝗫^{f}(−1, −1, 0) = 𝗫^{f}(−1, −1, 4).   (15)

In words, the linearly propagated perturbations are nothing more than the original forecast perturbations produced by the ensemble forecasting system. Equation (15) can be substituted into Eq. (14) to give Eq. (16). Thus, mean forecast corrections can be made simply by replacing the 𝗫^{f}(−1, −1, 0) of Eq. (12) with the normalized forecast perturbations at the desired forecast time, *t* = 4 in this example.

Recall that 𝗫^{a}(0, 0, 0) = 𝗫^{f}(−1, −1, 0)𝗧(0). If we assume that the transformed perturbations also evolve linearly, 𝗠(0, 4)𝗫^{f}(−1, 0, 0) = 𝗫^{f}(−1, −1, 4)𝗧(0), then 𝗫^{f}(−1, 0, 4) = 𝗫^{f}(−1, −1, 4)𝗧(0). Here 𝗧(0) indicates that the transformation matrix was calculated at *t* = 0. *All that is necessary to propagate the influence of the new observations forward in time over the forecast period is to transform the normalized forecast ensemble perturbations with the same transformation matrix that was used at the observation time.*

Continuing with this example, the transformed forecast ensemble perturbations at *t* = 4 can be transformed again once the next set of observations becomes available at *t* = 1: 𝗫^{f}(−1, 1, 4) = 𝗫^{f}(−1, 0, 4)𝗧(1). This process can be repeated until *τ*_{2} = *τ*_{3} [𝗫^{f}(−1, 4, 4)] or until the assumptions that go into the transformation (linearity, validity of ensemble subspace, etc.) are violated. The reader is reminded that the implications of this treatment are that *if* the system is linear and *if* the data assimilation scheme employed is the ETKF, then the TLEF as presented above provides no new information. The TLEF aims to provide value by exploiting the atmosphere’s nonlinearity and the current disconnect between operational data assimilation and ensemble construction.
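Because the update is a right-multiplication, successive transformations simply compose. The toy sketch below illustrates the bookkeeping only: all matrices are random stand-ins for the quantities defined in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n_state, n_ens = 6, 4

# Xf(-1, -1, 4): perturbations launched at t = -1, never transformed,
# assessed at t = 4 (random placeholder values)
Xf_m1_m1_4 = rng.standard_normal((n_state, n_ens))

# T(0), T(1): transformation matrices computed at observation times
# t = 0 and t = 1 (placeholders for the ETKF results)
T0 = rng.standard_normal((n_ens, n_ens))
T1 = rng.standard_normal((n_ens, n_ens))

# Xf(-1, 0, 4) = Xf(-1, -1, 4) T(0): conditioned on the t = 0 observations
Xf_m1_0_4 = Xf_m1_m1_4 @ T0
# Xf(-1, 1, 4) = Xf(-1, 0, 4) T(1): conditioned on the t = 1 observations too
Xf_m1_1_4 = Xf_m1_0_4 @ T1
```

No model integration appears anywhere in the chain; only the stored forecast perturbations and the small *N*_{ens} × *N*_{ens} transformation matrices are needed.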

In an effort to clarify the different time indices, in the narrative of the text we will refer to *lags* and *leads*. The lag will be defined as the number of observation times between the time at which a forecast was launched and the time at which the last observations were incorporated into the forecast: lag = *τ*_{2} − *τ*_{1}. An untransformed ensemble forecast launched from the most recent observations would be a lag-0 forecast.^{3} The lead will be defined as the number of observation times between the time at which the last observations were incorporated and the forecast time of interest: lead = *τ*_{3} − *τ*_{2}. For example, 𝗫* ^{f}*(−1, 1, 4) is a lag-2, lead-3 forecast: it has been transformed twice, and we are interested in a forecast time 3 observation times from the most recent observation.

Once the observations at *t* = 1 have been incorporated, the ensemble forecast at *t* = 4 can be made up of the ensembles given by the most recent update 𝗫^{f}(1, 1, 4) (lag 0, lead 3), and a collection of transformed ensemble forecast perturbations [𝗫^{f}(0, 1, 4) (lag 1, lead 3), 𝗫^{f}(−1, 1, 4) (lag 2, lead 3), and any additional lags that one wishes to include]. A simplified cartoon of this process is shown in Fig. 1. In Fig. 1a an ensemble forecast (represented by dark circles), based on observations (denoted by the small white box), is generated by an ensemble prediction system (EPS) and is launched from *t* = −1 to *t* = 4. The ensemble perturbations at *t* = 4 are described by 𝗫^{f}(−1, −1, 4). In Fig. 1b a new observation becomes available at *t* = 0 and a transformation of the 𝗫^{f}(−1, −1, 0) perturbations is performed. In addition, a new ensemble is produced by the EPS at *t* = 0 and propagated forward to *t* = 4. The 𝗫^{f}(0, 0, 4) forecast perturbations at *t* = 4 are given by the dark triangles, and the transformed 𝗫^{f}(−1, 0, 4) = 𝗫^{f}(−1, −1, 4)𝗧(0) ensemble perturbations at *t* = 4 are given by the new position of the dark circles. Notice that the original forecasts launched from *t* = −1 are now shown as gray circles to denote that they have been superseded by the updated forecasts. Another set of new observations becomes available at *t* = 1 in Fig. 1c. A transformation of both 𝗫^{f}(−1, 0, 1) and 𝗫^{f}(0, 0, 1) takes place,^{4} and a new ensemble is generated at *t* = 1 by the EPS. The ensemble is propagated forward and the 𝗫^{f}(1, 1, 4) forecast perturbations at *t* = 4 are given by the dark squares, the singly transformed 𝗫^{f}(0, 1, 4) = 𝗫^{f}(0, 0, 4)𝗧(1) forecast perturbations are given by the dark triangles, and the doubly transformed forecast perturbations, 𝗫^{f}(−1, 1, 4) = 𝗫^{f}(−1, 0, 4)𝗧(1), are given by the dark circles. In Fig. 1c, the previous “forecast trajectory” lines have been omitted for clarity and only the new forecast trajectory and the transformed states remain. The forecast states prior to transformation are always shown as gray symbols. So in this example, the size of the lead-3 ensemble forecast valid at *t* = 4 has been tripled, as shown by the collection of dark circles, triangles, and squares in Fig. 1c.

It is important to point out that the transformed perturbations do not rely upon the transformed mean [as seen in Eq. (4)]. This implies that the perturbations need not be added back onto the transformed mean at the desired forecast time. Instead, they can be added to the mean of the untransformed ensemble (lag 0) launched from the most recent observations. The implementation of the ETKF considered here is unable to account for the spurious long-distance correlations that can arise from small sample sizes,^{5} and Wang and Bishop (2003) suggest that these incorrect correlations can make obtaining a good transformed mean more difficult than obtaining good transformed ensemble perturbations. In this work we choose to use the latest lag-0 forecasts to center the transformed ensemble perturbations; we do not explore the impact of using the transformed mean. Also note that the method does not require that the ETKF be used to construct the initial ensembles; it only assumes that the ensembles accurately reflect the underlying forecast uncertainty.

## 3. Application of the TLEF to an idealized model

### a. Experimental setup

The idealized model used in this study is the Lorenz (1996) system with *n* = 100 variables and forcing *F* = 8. The equations represent a forced dissipative system containing a nonlinear advection-like term, a damping term, and an external forcing. A fourth-order Runge–Kutta integration scheme is employed with a fixed time step of 0.05, which corresponds to approximately 6 h in the atmosphere based on error-doubling times. The boundary conditions are cyclic, so if one insists upon an atmospheric analog, the variables can be thought of as representing some atmospheric quantity distributed along a latitude circle. The important aspects of the system are that it is chaotic and spatially extended, with a reasonable number of degrees of freedom.
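The description above matches the widely used Lorenz (1996) model; under that assumption, the system and its integration can be sketched as follows.

```python
import numpy as np

def tendency(x, F=8.0):
    """Lorenz (1996) tendencies: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F,
    i.e., an advection-like nonlinearity, linear damping, and constant
    forcing, with cyclic boundary conditions."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """One fixed fourth-order Runge-Kutta step (dt = 0.05, roughly 6 h)."""
    k1 = tendency(x, F)
    k2 = tendency(x + 0.5 * dt * k1, F)
    k3 = tendency(x + 0.5 * dt * k2, F)
    k4 = tendency(x + dt * k3, F)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# n = 100 variables, slightly perturbed from the unstable steady state x = F
state = np.full(100, 8.0)
state[0] += 0.01
for _ in range(4):          # one model day = four 6-h steps
    state = rk4_step(state)
```

The fixed time step and cyclic indexing via `np.roll` mirror the configuration described in the text.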

The results reported in this work are for the case where a perfect model of the system exists: the model equations are identical to the system equations. The impact of model inadequacies will be explored in a companion paper. Although the TLEF is based on the ETKF framework, it does not require that the ensemble forecasts be generated using ETKF data assimilation. In the majority of experiments described below, data assimilation and ensemble generation are done using the EnKF (Evensen 1994) with *N*_{obs} = 100 and *N*_{ens} = 1024. A subset of these 1024 ensemble members and a reduced number of observations are utilized by the TLEF. The motivation for this experimental setup was to provide ensemble forecasts that are as close to random draws from the correct forecast PDF as possible, and to eliminate a potential source of misinterpretation. Assessments performed on the large EnKF ensembles (not shown) demonstrate that this aim was achieved. The sensitivity of results to the quality of the ensemble forecasts is explored and discussed below. The ensemble members used in the TLEF experiments are randomly chosen from the large EnKF analysis ensemble each iteration. Experiments were repeated where the same subsample of ensemble members was always used, and there was no qualitative impact on the results.

While not strictly necessary, the TLEF is best suited for situations where the ensemble size *N*_{ens} is less than the number of observations *N*_{obs}. In the experiments described below, the TLEF ensemble size varies from *N*_{ens} = 10 to *N*_{ens} = 80, and the number of observations varies from *N*_{obs} = 20 to *N*_{obs} = 90. The baseline configuration sets *N*_{ens} = 80 and *N*_{obs} = 90. Observations were generated by randomly perturbing the true system state and are taken every six model hours. The observational error is assumed to be Gaussian and uncorrelated in both space and time. While these assumptions are unlikely to be satisfied in NWP applications, they allow us to eliminate the impact of correlated observational errors and simplify the interpretation of results. The standard deviation of the observational noise was set to *ϵ* = 0.2, which is roughly 3% of the range of the state variables. All statistics are based on the average of 10 independent experiments, each using 3300 successive observation times.

The implementation of the TLEF used in this work does not support localization, but it does support covariance inflation. As described in section 2b, the inflation factor is applied *only* to the lag-0 forecast perturbations (not to the transformed forecast perturbations), and is applied prior to any transformations. This means that lag-0 forecast perturbations are calibrated on the fly and any variance deficiency associated with ensemble size will be reduced. This removes any advantage the TLEF transformed ensembles might have that is simply due to calibration.

The final aspects of the experimental setup are to define the number of lagged ensembles to be included in a combined (or “augmented”) ensemble, and the distance into the future that transformations are performed (the definitions of lag and lead are given in section 2c). Note that the lead is defined with respect to the most recent observation. The ensemble forecast with lag 4 (launched one model day in the past) and lead 4 (verifying one model day in the future) will have been subjected to four transformations from the four observations encountered over the lag period; the lag gives the number of transformations performed on the ensemble, and the lead gives the distance into the future that transformations have been performed. For all of the experiments described below, there are 4 lags (one model day) and 24 leads (six model days) available at any time.

### b. Ensemble mean error and ensemble spread

To demonstrate the characteristics of the TLEF, Figs. 2a–c display the average root-mean-square (RMS) distance of the *perturbations* from the ensemble mean (here referred to as the average ensemble spread) as a function of lag. This method of ensemble evaluation gives an indication of the effect of the transformations on the variance of the ensemble. Negative *x*-axis values in Fig. 2a indicate times before the most recent observation (observations are available every six model hours over the negative interval) and positive values along the *x* axis represent times beyond the most recent observation.

Figure 2a shows a plot of average ensemble spread of the lag 0 and *transformed* perturbations for all available lags and leads up to six model days. For this combination of ensemble size, observational uncertainty level, and number of observations considered for the baseline experiment, the transformed ensembles do not degrade significantly before 6 days. The thick black solid curve in Fig. 2a is for lag 0, and gives the spread for the untransformed (initial) ensemble perturbations. Remember that the lag-0 forecast perturbations have been calibrated using the state-dependent inflation factor. As a result the lag-0 spread curve is consistent with the average RMSE of the ensemble mean forecasts (not shown). The thin solid curve gives the same information for lag-1 perturbations; this is the average spread of the ensemble that results from applying the TLEF to the lag-0 ensemble perturbations once an observation becomes available. The legend in Fig. 2a indicates the line types for lag 0–lag 4 where the same process of successive transformations continues. Note that the lag-4 forecasts have undergone four transformations and that each 6-h transformation slightly reduces the average ensemble spread of the new forecast perturbations. The *y* axis has been normalized with respect to the lag-0 value at *t* = 0 to aid comparison and error bars reflecting the standard deviation of the RMS values over 10 experiments have been added to all curves. Figure 2b shows the same information as Fig. 2a but plotted relative to the time of the most recent observation seen by the ensembles. Therefore, all curves are lined up as if launched from the same time. Figure 2c shows the information from Fig. 2b in more detail between 0 and 1 day. Note that the rate of increase in spread of the transformed ensemble curves is almost identical to the rate of increase of spread in the lag-0 curve in Fig. 2c. It is also clear that 1-day forecasts exhibit relatively linear error characteristics. 
Assuming this behavior is characteristic of the TLEF in this configuration, 1-day forecasts are assessed for clarity and ease of comparison in the rest of this study.

It is important to emphasize that one should not expect the TLEF to produce ensemble perturbations that are an improvement upon the calibrated lag-0 ensemble perturbations.^{6} This implies that while it would be unprofitable to use a transformed ensemble in isolation (unless it is in the period between the time that new observations have arrived and the time that the latest ensemble forecast is completed), it can be used to augment the most recent untransformed ensemble forecast. Using this approach, forecast ensemble sizes can be increased by combining ensemble forecasts that have been corrected by all available observational information; the untransformed and all transformed ensembles are draws from the same distribution, *p*[**x**(*t*)|**Y**(0)].

The transformed ensemble perturbations are added onto the most recent lag-0 forecast mean to produce a new ensemble forecast. This newly transformed forecast can then be combined with the lag-0 forecast to produce an ensemble with a larger number of members. As all ensembles have been updated using the same number of observations, it does not matter when the original perturbations were “launched” or how many ensembles are combined. As all perturbations are centered around the most recent forecast, it is clear that an ensemble augmented in this way would have an ensemble mean forecast error identical to the lag-0 ensemble mean forecast error. This negates the most practical benefit of increased ensemble size (improved ensemble mean), but benefits can still be gained for techniques such as targeting (where a large ensemble would reduce spurious long-distance correlations) and regional ensemble forecasting (where larger ensembles of boundary conditions can be generated).

To assess whether the error variances forecast by an augmented forecast ensemble are correct, ensemble mean forecast error (with respect to the true system state) for a single component is divided by that component’s associated ensemble standard deviation.^{7} If the ensembles have correct second moment statistics, then the distribution of this ratio will have a mean of 0 and a standard deviation of 1. Means different than 0 indicate a bias, standard deviations different than 1 imply that the ensemble variance is systematically over- or underestimated. The covariances are not considered here. Results for the baseline case for the lag 0 (80 members), transformed (lag 1, lag 2, lag 3, and lag 4 with 80 members each), and augmented (400 member) 1-day forecasts are given in Table 1 and represent an average of second-order moments over all components.
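The calibration diagnostic just described can be written compactly. The sketch below is illustrative (the function name and the synthetic Gaussian test data are invented); for an ensemble with correct second-moment statistics the ratios should have mean near 0 and standard deviation near 1.

```python
import numpy as np

def error_spread_ratio(ensembles, truth):
    """For a single state component, divide the ensemble-mean error by the
    ensemble standard deviation for each forecast case.
    ensembles: (n_cases, n_ens); truth: (n_cases,).
    Returns the mean and standard deviation of the ratios."""
    err = ensembles.mean(axis=1) - truth
    spread = ensembles.std(axis=1, ddof=1)
    ratio = err / spread
    return ratio.mean(), ratio.std(ddof=1)

# Synthetic check: members and truth drawn from the same distribution,
# so the ensemble is (statistically) perfectly calibrated
rng = np.random.default_rng(2)
members = rng.standard_normal((2000, 80))   # 80 members, as in the baseline
truth = rng.standard_normal(2000)
mean_ratio, std_ratio = error_spread_ratio(members, truth)
```

A biased ensemble would shift the mean away from 0, while an underdispersive ensemble would push the standard deviation above 1, matching the interpretation used in Table 1.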

The mean values of the distributions (displayed in the first row) are indistinguishable from 0, indicating a relatively unbiased ensemble. The mean standard deviation values in Table 1 are greater than 1.0, implying slightly underdispersive forecast ensembles, although the associated uncertainty indicates that the standard deviations are indistinguishable from 1.0. Note that among the mean standard deviation values, the lag-0 value is always closest to 1.0 and the value increases with increasing lag (lag 1, lag 2, lag 3, and lag 4). These values reflect the characteristics of the ensemble spread curves in Fig. 2c. When all ensembles are combined into a 400-member augmented ensemble (the lag-0 ensemble together with all transformed ensembles, lag 1–lag 4), the standard deviation of the ratio is equal to or closer to 1 than that of each individual 80-member transformed ensemble, but not closer than that of the 80-member lag-0 ensemble. Perhaps one should not be too surprised by these results; after all, the lag-0 forecasts include an inflation factor to help overcome variance deficiency due to finite ensemble size, and that inflation propagates through the TLEF. The important thing to take away is that the first two moments of the transformed and augmented ensemble forecasts indicate that the TLEF is behaving as expected, and that the ensemble forecasts seem consistent with draws from an approximately correct distribution; the transformation process does not significantly degrade the second-moment statistics of the forecast ensemble relative to the lag-0 results.

### c. Rank histograms

Rank histograms were constructed to test whether these second-moment results hold for higher-order moments as well. One-day forecast ensembles for 10 specific components were used in the calculation of rank histograms. These components were spaced equally throughout the 100-component domain to minimize correlation between histogram increments, and the results were qualitatively similar to rank histograms produced for a single component. Truth is used as the verification, and in each case the ensemble is subsampled to *N*_{ens} = 9 in order to adequately populate the rank histograms and fairly compare ensembles of different sizes. Rank histograms for each of the 10 components and each of the 10 realizations were averaged to produce a single rank histogram for each of the lag-0, transformed, and augmented ensembles. The variability over the 10 independent realizations permitted the production of standard deviation error bars for the population in each bin. Results are shown in Fig. 3. Figure 3a plots the 1-day rank histogram for the lag-0 ensemble. The solid horizontal line indicates the expected value for each bin (based on the number of bins), and the dashed horizontal lines represent plus and minus one expected standard deviation (based on the number of bins and the number of realizations making up the rank histogram; Smith and Hansen 2004). The error bars at the top of each histogram bar indicate the *realized* standard deviations of the distributions over the 10 experiments. Also note that the *y* axis has been restricted to the range 0.8–1.1 so that the structure of the distribution can be seen more clearly.
Upon close inspection, the rank histogram for the lag-0 ensemble forecasts displays end bins that are slightly underpopulated in the mean, although the error bars for all bins are consistent with the expected mean (i.e., for some realizations the end bins are overpopulated); statistically, the lag-0 rank histogram is indistinguishable from a uniform distribution.
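The rank histogram construction described above, subsampling each ensemble to nine members and tallying the rank of truth among them, can be sketched as follows (function name and array layout are illustrative):

```python
import numpy as np

def rank_histogram(ensembles, truth, n_sub=9, rng=None):
    """Counts of the rank of truth within a randomly subsampled ensemble.

    ensembles: (n_realizations, n_members); truth: (n_realizations,)
    Returns an array of n_sub + 1 bin counts; a statistically consistent
    ensemble yields (on average) a flat histogram.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.zeros(n_sub + 1, dtype=int)
    for members, v in zip(ensembles, truth):
        sub = rng.choice(members, size=n_sub, replace=False)
        counts[np.sum(sub < v)] += 1   # rank = number of members below truth
    return counts
```

Underpopulated end bins in such a histogram indicate overdispersion, while overpopulated end bins indicate underdispersion.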

Figure 3b plots the 1-day rank histogram from the augmented 400-member ensemble. To ease comparison with the lag-0 ensemble forecast results, this combined ensemble is also subsampled to nine random members so that all rank histograms have the same number of bins. The augmented ensemble’s rank histogram appears underdispersive in the mean, but again the error bars indicate that the distribution is indistinguishable from a uniform distribution. Figures 3c–f plot the rank histograms for lag 1 through lag 4, respectively. Each shows that the transformed ensembles are slightly underdispersive in the mean, but statistically indistinguishable from a uniform distribution. A rank histogram produced using a 400-member lag-0 ensemble (not shown) was indistinguishable from the collection of rank histograms displayed in Fig. 3. Consistent with the second-moment statistics of section 3b, the transformation process does not significantly degrade the lag-0 performance, but does significantly increase the ensemble size.

### d. Brier skill scores

Our final form of assessment is the Brier score (Wilks 1995). The Brier score measures the mean-square error of probabilistic forecasts and ranges from 0 (for a perfect forecast) to 1 (for an unskillful forecast):

BS = (1/*K*) Σ_{*k*=1}^{*K*} (*y*_{k} − *o*_{k})^{2},

where *y*_{k} is the forecast probability and *o*_{k} is the observation probability such that *o*_{k} = 1 if the event occurs and *o*_{k} = 0 if the event does not occur. The index *k* denotes a numbering of the forecast/event pairs. The score therefore averages the squared difference between the forecast probabilities and the subsequent binary observations.

For the simple model experiment, values of truth are readily available and so we define an event occurrence as when truth exceeds a climatological threshold value. The climatology is defined as the average value of the truth over all realizations and all components over 3300 realizations. The threshold values are based on the standard deviation (*σ*) of the climatology and is set at the mean, ±*σ*, ±0.5*σ*, and ±0.25*σ*. We consider an event to have occurred when verification exceeds the threshold value. Forecast probabilities are calculated based on the number of ensemble members exceeding each of the threshold values. The Brier score was calculated for each threshold at all model components and then averaged over all components and all realizations.
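The threshold-exceedance probabilities and the resulting Brier score can be computed as in this sketch (array layout assumed; names are ours):

```python
import numpy as np

def brier_score(ensembles, truth, threshold):
    """Brier score for the event 'value exceeds threshold'.

    ensembles: (n_realizations, n_members); truth: (n_realizations,)
    Forecast probability = fraction of members exceeding the threshold.
    """
    y = (ensembles > threshold).mean(axis=1)   # forecast probabilities
    o = (truth > threshold).astype(float)      # binary event outcomes
    return np.mean((y - o) ** 2)
```

As in the text, this score would be computed for each threshold at every model component and then averaged over components and realizations.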

The Brier skill score (BSS) measures a forecast’s Brier score BS against that of a reference:

BSS = 100% × (BS_{ref} − BS)/BS_{ref},

where BS_{ref} is the Brier score of the reference forecast. This expresses the *percentage improvement* of the Brier score of the new EPS over the reference EPS. We define our reference EPS to be the lag-0 forecast. The BSS thus provides a measure of the improvement, if any, of the transformed and augmented ensembles over the initial lag-0 ensemble. The BSS was calculated from the average Brier scores (not by averaging all components’ BSS results), and 15 independent experiments with 5000 realizations each were run to assess the variability of results. The individually transformed ensemble BSS results were assessed along with augmented ensemble results containing one, two, three, or four transformed ensembles. The mean results over the 15 independent experiments are shown in Table 2, along with the results from a 400-member fully nonlinear ensemble. The results of this large nonlinear ensemble should be compared with those of the 400-member augmented ensemble consisting of four transformations. Positive values of BSS (improvements over the lag-0 ensemble) are shown in boldface type for clarity. The values reveal that a large majority of individually transformed ensembles are worse than lag-0 ensembles (i.e., they have negative BSS). However, the BSSs of the augmented ensembles are positive in all cases, with mean scores ranging between +0.23% and +0.79%. Analysis of the distribution of results reveals that the BSSs of the individually transformed ensembles are indistinguishable from zero: they are statistically the same as the lag-0 forecast. The results also show that the BSSs of the augmented ensembles are robustly positive, indicating that the augmented ensembles are an improvement over the lag-0 ensembles. Error bars were provided for the largest of the augmented ensembles (lag 1–lag 4) and for the 400-member nonlinear ensemble. The BSSs of the nonlinear ensemble are systematically larger than those of the augmented ensemble, but the two are statistically consistent.
The augmented ensemble forecast produced by the TLEF provides nearly the same information as the 400-member lag-0 ensemble forecast, but at a significant reduction in cost. Because the calibration of the lag-0 forecast strives to remove problems associated with sample size, the positive BSS values are indicative of an increase in the resolution component of the Brier score, not the reliability component (Wilks 1995).
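In the percentage-improvement convention used in Table 2, the BSS reduces to the following one-liner (positive values mean the forecast beats the reference):

```python
def brier_skill_score(bs, bs_ref):
    """Percentage improvement of Brier score bs over the reference bs_ref.

    Positive values mean the forecast improves on the reference;
    negative values mean it is worse.
    """
    return 100.0 * (bs_ref - bs) / bs_ref
```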

The BSS results of the 400-member ensemble indicate that there is little marginal gain in a Brier score sense for increasing the ensemble size for this experiment, and it is very likely that other techniques would produce BSS superior to those produced by the TLEF (Wilks 2002; Kharin and Zwiers 2003). But the benefit of the TLEF is that increased numbers of dynamically consistent ensemble forecasts are made available for use by other ensemble-based techniques.

### e. Sensitivity of results

In nonlinear systems, the linearity criterion assumed by the TLEF is violated for various combinations of initial error magnitude, location in state space, and time scale over which linearity is assumed. To investigate the sensitivity of the TLEF within this idealized model framework, we alter some of the defining parameters of the baseline case, such as ensemble size, number of realizations, number of observations, and method of ensemble generation.

#### 1) Ensemble size

Because one of the aims of the TLEF is to characterize the benefits of increased ensemble size, a number of experiments were performed with subsampled ensemble sizes both smaller and larger than the baseline case. Statistical assessment of a 10-member ensemble was performed with an identical experimental setup, and second-moment statistics revealed that the lag-0 and the subsequent transformed ensembles are underdispersive to a greater degree than the baseline case in section 3b. The ratio standard deviations of a single experiment increased from 1.19 to 1.50 for lag 0 through lag 4, respectively. Subsequent augmentation of all ensembles was consistent with the baseline case, where combining the ensembles produced improved statistics over the individual ensembles (with the exception of the lag-0 ensemble). Rank histograms produced for this example were consistent with these results. BSSs suggest that when any combination of the lagged forecasts is augmented with the most recent lag-0 forecast, the resulting ensemble is an improvement over the lag-0 ensemble alone (given by positive BSS). This may suggest that, given the small ensemble size, the lag-0 ensemble is largely degraded by sampling error, and that the limitations imposed by the TLEF are therefore not as prominent. It may also demonstrate that for these smaller ensemble sizes the effect of the TLEF is greatly amplified.

It has already been shown that the transformation process can make ensemble sizes larger without a significant degradation of statistics. This was verified in section 3b by comparing the statistics of the augmented 400-member ensemble (made up of a lag 0 and four 80-member lagged ensembles) with a lag-0, 400-member ensemble. Second-moment statistics of the (calibrated) lag-0, 400-member ensemble produced a ratio mean of 0.005 and a ratio standard deviation of 1.05. The increased computational effort of running a 400-member ensemble does not provide any noticeable benefit in this case for these measures.

#### 2) Observational density and uncertainty magnitude

The ensembles used for the baseline experiment were a random subsample from a large (*N*_{ens} = 1024) ensemble generated by the EnKF. This experimental setup was chosen to ensure that the ensemble forecasts transformed by the TLEF were a random draw from the correct distribution. Reducing the number of observed components used in the EnKF generation of the analyses and lag-0 forecasts has the same impact on the lag-0 forecasts as reducing the ensemble size: the forecast ensemble becomes a worse representation of the correct forecast distribution. Decreases in observational density also cause the transformed ensembles to become worse than the pretransformed forecasts at shorter lag–lead combinations than in the baseline experiment; the linearity assumption becomes more tenuous as the size of the initial error increases. The same effect is generated by increasing the magnitude of observational uncertainty. As the quality of the analysis degrades, the TLEF is less and less able to provide quality transformed forecasts.

#### 3) Ensemble construction methodology

Experiments were carried out in which the ETKF was employed as the data assimilation scheme, and the TLEF used the same ensemble size and number of observations as the assimilation scheme. Because of the smaller ensemble size (*N*_{ens} = 80) and the lack of covariance localization in our implementation of the ETKF, the quality of the analysis ensembles was degraded, and forecast errors were 1.5–2 times larger than in the baseline case. These increased error levels were not large enough to have a large impact on the quality of the linearity assumption, and the augmented ensemble results were qualitatively similar to the baseline case. The only major difference was that the lag-1 ensemble at lead 0 was identical to the most recent lag-0 ensemble because the TLEF employs the ETKF for its updating. This again highlights the importance of nonlinearity to the successful implementation of the TLEF; nonlinear error evolution results in a divergence of the lag-1 and lag-0 forecasts at greater leads.

To further investigate the sensitivity of results to ensemble construction methodology, ensemble forecasts were also generated using singular vectors. Maintaining an ensemble size of *N*_{ens} = 80, ensembles were generated by adding perturbations to the EnKF ensemble mean analysis in the plus and minus directions of the first 40 singular vectors. Both an isotropic uncertainty norm and an analysis error covariance norm (based on the EnKF analysis ensemble) were utilized, and a range of perturbation magnitudes was explored. Because singular vectors do not represent a random draw from the correct analysis distribution, the resulting ensemble forecasts do not represent a random draw from the correct forecast distribution. This does not mean that good second-order moments are unobtainable, but they depend upon the combination of initial perturbation size and TLEF boost factor. Initial perturbation sizes that are too small result in transformed ensembles that are variance deficient, while initial perturbation sizes that are too large result in transformed ensembles that have an excess of variance. This strong dependence on exactly how the singular vectors are constructed and applied makes us hesitate to draw any general conclusions about what to expect for singular vector ensembles.
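A sketch of the plus and minus singular vector construction under the isotropic norm; the tangent linear propagator `M` is assumed to be available as an explicit matrix (in practice it would come from linearizing the model about the forecast trajectory), and all names are illustrative:

```python
import numpy as np

def singular_vector_ensemble(mean, M, n_pairs, eps):
    """Ensemble of mean +/- eps times the leading right singular vectors
    of a tangent linear propagator M (isotropic norm).

    mean: (n,) analysis mean; M: (n, n). Returns an (n, 2*n_pairs) array.
    """
    _, _, Vt = np.linalg.svd(M)   # rows of Vt: right singular vectors,
                                  # ordered by decreasing singular value
    V = Vt[:n_pairs].T            # (n, n_pairs) fastest-growing directions
    perts = np.concatenate([eps * V, -eps * V], axis=1)
    return mean[:, None] + perts
```

By construction the paired perturbations cancel, so the ensemble mean equals the analysis mean regardless of the perturbation magnitude `eps`.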

#### 4) State-dependent transformation matrix

The transformation matrix 𝗧 utilized by the TLEF both rotates and rescales ensemble perturbations in a state-dependent manner. To assess the importance of the state dependence and the rotation, experiments were performed utilizing a static, scalar “transformation.” Lead-specific rescaling factors were calculated over a long control run in an effort to produce the best augmented 1-day forecast ensembles possible. By construction, good second-moment statistics were produced by these climatologically rescaled, augmented ensembles. However, the rank histograms and BSSs indicated that the simple rescaling significantly degraded the forecast statistics relative to the lag-0 ensemble forecast. The state-dependent rotation and rescaling performed by the TLEF cannot be efficiently reproduced by using a simple climatological scaling factor.
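For concreteness, one common symmetric square-root form of such a transform (cf. Wang et al. 2004), assuming a linear observation operator 𝗛 and no inflation, can be sketched as follows; variable names are ours:

```python
import numpy as np

def etkf_transform(Xp, H, R):
    """Symmetric ETKF transform T such that Xp @ T gives analysis
    perturbations consistent with the Kalman update in the ensemble subspace.

    Xp: (n, N) forecast ensemble perturbations (mean removed)
    H:  (p, n) linear observation operator
    R:  (p, p) observation error covariance
    """
    N = Xp.shape[1]
    # Perturbations mapped into normalized observation space
    L = np.linalg.cholesky(R)
    S = np.linalg.inv(L) @ H @ Xp / np.sqrt(N - 1)
    # Rotation C and spectrum gamma from the eigendecomposition of S^T S
    gamma, C = np.linalg.eigh(S.T @ S)
    # Symmetric completion: T = C (gamma + I)^(-1/2) C^T
    return C @ np.diag(1.0 / np.sqrt(gamma + 1.0)) @ C.T
```

Because 𝗧 acts as the identity on the null directions of 𝗦 (including the vector of ones), the transformed perturbations remain zero mean.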

The sensitivity of the TLEF depends on many factors, and understanding how the results might break down in different situations may help us understand future results obtained using output from operational NWP models.

## 4. Summary and conclusions

We have presented a technique based on ensemble data assimilation ideas to increase ensemble size without additional model runs. This work was motivated by a desire to sensibly and inexpensively increase forecast ensemble sizes by reusing old ensemble forecasts. We have employed the ETKF machinery to provide state-dependent transformations of old ensemble forecasts using new observations. In the idealized case considered, the resulting augmented (larger) ensemble forecasts did not degrade the statistics of the most recent ensemble forecast, and produced results that were statistically consistent with large lag-0 ensembles. Because transformations are performed in perturbation space, transformed ensembles must be centered about a mean state. While it is theoretically possible to transform the ensemble mean given observations that have become available since the forecast was launched, we chose to center the transformed forecast perturbations about the mean of the most recent ensemble forecasts. The implication is that the TLEF does not alter ensemble mean forecast statistics.

We applied the TLEF to the Lorenz 96 system of equations using a baseline case of *n* = 100, *N*_{ens} = 80, and *N*_{obs} = 90. Ensemble inflation factors were determined using a version of the approach described in Wang et al. (2004) modified to account for the small size of our observation space. Transformed and augmented ensembles were assessed by examining ensemble spread, ensemble second-moment statistics, rank histograms, and BSSs. We found that the spread, second-moment statistics, and rank histograms of the transformed and augmented ensembles were statistically indistinguishable from the lag-0 ensemble forecasts. The BSSs of the augmented ensembles indicated an improvement over the lag-0 ensemble forecasts due to an increase in resolution (ensemble size), not reliability. Extensive sensitivity studies were carried out that highlighted the dependence of the TLEF on the linearity assumption of perturbation error growth and on the relevance of the ensemble subspace.
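For completeness, the Lorenz 96 dynamics referred to above can be sketched as follows; the forcing *F* = 8 and the fourth-order Runge–Kutta step are the conventional choices (assumptions here, not specified in this section):

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """Lorenz 96 tendencies: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F,
    with cyclic indexing handled by np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One fourth-order Runge-Kutta step of length dt."""
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

The uniform state *x*_{i} = *F* is an (unstable) fixed point; small perturbations about it grow chaotically onto the attractor, which is what makes the linearity assumption lag- and lead-dependent.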

This work has introduced the formulation of the TLEF and provided proof of concept for the ability of the TLEF to increase ensemble sizes (without having to increase the number of model integrations). Although the method presented here does not represent the definitive solution to small sample sizes, preliminary results of experiments using output from NWP ensemble prediction systems are promising. Results from ensembles from a single NWP EPS are broadly consistent with the results shown here, and application to multimodel NWP ensembles is under way.

## Acknowledgments

The authors thank the reviewers for comments and suggestions that led to an improved manuscript, as well as Steve Cohn for pertinent advice and Greg Lawson for discussions on technical issues. Significant insight into the ETKF and the state-dependent inflation factor calculations were provided by conversations with Craig Bishop and Xuguang Wang. The authors gratefully acknowledge financial support by ONR YIP Grant N00014-02-1-0473.

## REFERENCES

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. *Mon. Wea. Rev.*, **129**, 420–436.

Buizza, R., 1997: Potential forecast skill of ensemble prediction, and spread and skill distributions of the ECMWF ensemble prediction system. *Mon. Wea. Rev.*, **125**, 99–119.

Ebisuzaki, W., and E. Kalnay, 1991: Ensemble experiments with a new lagged average forecasting scheme. WMO research activities in atmospheric and oceanic modelling, WMO Rep. 15, Geneva, Switzerland, 6.31–6.32.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99**, 10143–10162.

Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. *Ocean Dyn.*, **53**, 343–367.

Hansen, J. A., and L. A. Smith, 2000: The role of operational constraints in selecting supplementary observations. *J. Atmos. Sci.*, **57**, 2859–2871.

Hoffmann, R. N., and E. Kalnay, 1983: Lagged average forecasting, an alternative to Monte Carlo forecasting. *Tellus*, **35A**, 100–118.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Jazwinski, A. H., 1969: Adaptive filtering. *Automatica*, **5**, 475–485.

Kalnay, E., 2003: *Atmospheric Modeling, Data Assimilation, and Predictability*. Cambridge University Press, 341 pp.

Kharin, V., and F. W. Zwiers, 2003: Improved seasonal probability forecasts. *J. Climate*, **16**, 1684–1701.

Lorenz, E. N., 1996: Predictability: A problem partly solved. *Proc. Seminar on Predictability*, Vol. 1, Reading, United Kingdom, ECMWF, 1–18.

Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulations with a small model. *J. Atmos. Sci.*, **55**, 399–414.

Majumdar, S., C. Bishop, R. Buizza, and R. Gelaro, 2002: A comparison of ensemble transform Kalman filter targeting guidance with ECMWF and NRL total energy singular vector guidance. *Quart. J. Roy. Meteor. Soc.*, **128**, 1–24.

Palmer, T. N., F. Molteni, R. Mureau, R. Buizza, P. Chapelet, and J. Tribbia, 1992: Ensemble prediction. *Proc. Seminar on Validation of Models over Europe*, Reading, United Kingdom, ECMWF, 21–66.

Smith, L., and J. Hansen, 2004: Extending the limits of forecast verification with the minimum spanning tree. *Mon. Wea. Rev.*, **132**, 1522–1528.

Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square-root filters. *Mon. Wea. Rev.*, **131**, 1485–1490.

Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. *Bull. Amer. Meteor. Soc.*, **74**, 2317–2330.

Wang, X., and C. H. Bishop, 2003: A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. *J. Atmos. Sci.*, **60**, 1140–1158.

Wang, X., C. H. Bishop, and S. J. Julier, 2004: Which is better, an ensemble of positive–negative pairs or a centered spherical simplex ensemble? *Mon. Wea. Rev.*, **132**, 1590–1605.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences: An Introduction*. Academic Press, 467 pp.

Wilks, D. S., 2002: Smoothing forecast ensembles with fitted probability distributions. *Quart. J. Roy. Meteor. Soc.*, **128**, 2821–2836.

Average second-moment statistics (over 10 experiments), where second-moment statistics are averaged over 100 components in each. All means are close to 0, indicating correctly sampled distributions. Standard deviations above 1.0 indicate slightly underdispersive ensembles. The lag-0 value is closest to 1.0, and the augmented ensemble (*N*_{ens} = 400) is closer to 1.0 than any of the lagged forecasts.

BSS (for individual and augmented ensembles with respect to lag 0) over all components. All values are averages over 15 experiments. Analysis of the variability over the 15 experiments indicates that the BSSs of the individually transformed ensembles are indistinguishable from zero, while those of the augmented ensembles are robustly positive. Positive values of BSS (improvements over the lag-0 ensemble) are shown in boldface type for clarity. The last column contains the BSS from a large nonlinear ensemble that is the same size as the augmented ensemble containing lags 0–4. Error bars indicate the statistical similarity between the augmented and the nonlinear ensembles.

^{1}

For notational clarity the time indices have been removed from these expressions.

^{2}

Note that because the ensemble-based filters can be configured so that the observation operator operates on 𝗫^{f} rather than on 𝗣^{f}, 𝗛 need not be linear.

^{3}

This is not strictly true. The lag-0 forecast ensemble perturbations are multiplied by an inflation factor in a calibration aimed at reducing ensemble variance deficiencies arising from limited sample sizes. Details are given in section 2b.

^{4}

One can choose to construct a 𝗧 from the most recent lag-0 forecast and apply it to all previously transformed ensembles, or construct a new 𝗧 for each transformed ensemble using their ensemble perturbations. In the limit of all assumptions being satisfied the two are equivalent. In this work, a 𝗧 is computed for each lagged ensemble. Experiments were run using only the lag-0 𝗧 and results were qualitatively unchanged.

^{5}

The ETKF can be modified to include localization, but complicating issues associated with propagating localization information forward in the forecast ensemble subspace motivated neglecting such an implementation.

^{6}

A pathological counterexample is the case where the lag-0 ensemble is constructed without regard to dynamics, observational density, or level of observational uncertainty. In such a case the ensemble data assimilation roots of the TLEF would result in improved ensembles relative to the lag 0.

^{7}

The L96 equations are symmetric, making the result independent of the choice of component.