## 1. Introduction

Hybrid ensemble-variational data assimilation (DA) is becoming increasingly popular at operational numerical weather prediction (NWP) centers (Buehner et al. 2010; Clayton et al. 2013; Kuhl et al. 2013; Kleist and Ide 2015; Bonavita et al. 2016). By combining high-rank conventional error covariance with flow-dependent localized ensemble covariance, the hybrid outperforms traditional 4DVAR. A key component in both hybrid and traditional 4DVAR is a dynamical tangent linear model (TLM) and its adjoint, which implicitly propagate the background error covariance. The TLM is limited, however, in its ability to model nonlinear physics (e.g., processes that use a step function threshold), in its maintainability as forecast model algorithms and the accompanying code evolve, and in its scalability to multiple processors (Lorenc 2003). New ensemble-variational approaches are promising, but are currently limited by the inability to propagate the climatological covariance and by difficulties in localizing a four-dimensional ensemble covariance (e.g., Buehner et al. 2010, 2013, 2015; Kleist and Ide 2015; Lorenc et al. 2015; Poterjoy and Zhang 2015).

To overcome these problems, local ensemble-based TLMs (LETLMs) have been proposed [see Frolov and Bishop (2016) for an overview of previous work and details of the theory and practical construction of LETLMs]. Bishop et al. (2016, manuscript submitted to *Quart. J. Roy. Meteor. Soc.*, hereafter B16) have shown that LETLMs perfectly reproduce the true TLM when the ensemble perturbations are small enough to be governed entirely by linear dynamics and the ensemble size exceeds the number of variables that influence the evolution of a single variable over an LETLM time step. Regardless of the size of the ensemble perturbations, LETLMs only require ensembles of forecasts from the full nonlinear forecast model. The perturbations they propagate do not have to lie within the ensemble subspace and they can propagate the full hybrid analysis correction. Frolov and Bishop (2016) obtained promising results with one-dimensional tests for both nondispersive and dispersive wave dynamics. Further work by B16 used a simple coupled model to show that the LETLM approach can precisely recover the true TLM and can be used with a 4DVAR outer loop to find the most likely state in systems with a high degree of nonlinearity. The use of LETLMs in coupled models is particularly appealing because the challenges associated with building and maintaining conventional TLMs are even greater in coupled models. An additional complication is the construction and maintenance of the TLM and adjoint of the coupler between model components.

In this paper, we extend the work of Frolov and Bishop (2016) and B16 by applying an LETLM to a shallow-water model (SWM). While not a full NWP model, the SWM is sufficiently complex to examine realistic atmospheric flows that include both slow (Rossby) and fast (gravity) modes. The B16 4DVAR study was limited to the case where the ensemble size exceeded the number of variables within each LETLM influence volume and where the LETLM time step was the same as the time step of the nonlinear model. Here, we use the SWM to show that LETLMs can also be accurate with less stringent requirements. Specifically, we show that the LETLM is accurate even when the ensemble size is less than the number of variables within each influence volume and when the LETLM time step is significantly longer than the time step of the nonlinear model. The SWM used in this study has been employed previously to study trace gas assimilation using 4DVAR (Allen et al. 2014), the ensemble Kalman filter (EnKF) (Allen et al. 2015), and hybrid 4DVAR (Allen et al. 2016). In this study, we present results from experiments in which height observations (in lieu of temperature for the SWM) are assimilated in a cycling system with both a conventional TLM and an LETLM.

The layout of the paper is as follows. Section 2 describes the forecast model and DA. Section 3 describes the tuning of a baseline LETLM and examines some diagnostics of its performance. Section 4 presents the results of cycling experiments with the baseline LETLM. Section 5 details the sensitivity of LETLM errors to several model parameters, and conclusions are provided in section 6.

## 2. Model description

### a. Forecast model, truth run, and observations

The forecast model is the spectral SWM coupled to a spectral advection model for tracer transport, as described in Allen et al. (2014, 2015). For this study, we use a triangular truncation of T21 (Gaussian grid spacing of ~5.6° at the equator) and a global mean height of 10 km. Horizontal fourth-order diffusion was applied to the geopotential, vorticity, divergence, and tracer, with a coefficient of 8.9 × 10^{16} m^{4} s^{−1}, which represents an *e*-folding damping time of 1 day for the smallest scales. The forecast model was run with a time step of 2 min.
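As a quick consistency check on the quoted damping time, the following sketch evaluates the *e*-folding time implied by the diffusion coefficient at the truncation wavenumber. It assumes that the fourth-order diffusion damps total wavenumber *n* at the rate *K*[*n*(*n* + 1)/*a*^{2}]^{2} and that *a* is a mean Earth radius of 6371 km; the formula, the radius, and all names in the code are assumptions made for illustration rather than details taken from the model.

```python
import math

EARTH_RADIUS_M = 6.371e6   # assumed mean Earth radius (m)
K4 = 8.9e16                # fourth-order diffusion coefficient from the text (m^4 s^-1)
N_TRUNC = 21               # triangular truncation T21


def efolding_time_days(n, k4=K4, radius=EARTH_RADIUS_M):
    """e-folding time (days) of total wavenumber n under del^4 diffusion."""
    laplacian_eigenvalue = n * (n + 1) / radius**2   # magnitude, m^-2
    damping_rate = k4 * laplacian_eigenvalue**2      # s^-1
    return 1.0 / damping_rate / 86400.0


tau_days = efolding_time_days(N_TRUNC)
print(f"e-folding time at n = {N_TRUNC}: {tau_days:.2f} days")
```

Under these assumptions the result is close to 1 day at *n* = 21, in line with the value stated above.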

The truth run (hereafter referred to as “truth”) was initialized with a Northern Hemisphere (NH) zonal jet in geostrophic balance with the fluid height. The tracer was initialized with ozone (O_{3}) mixing ratios typical of the middle stratosphere [see Fig. 5 of Allen et al. (2015) for representative O_{3} maps]. Topographic forcing simulated a zonal wavenumber-1 mountain that grew and decayed over 20 days [see Allen et al. (2014) for details], followed by flat topography. We define the start of the truth run as day −20. The DA experiments are performed from day 0 onward, when there is no topographic forcing. Maps of potential vorticity (PV) in the NH and Southern Hemisphere (SH) are shown in Fig. 1 at 5-day intervals starting at day −15. The scenario resembles a major stratospheric warming, in which the NH polar vortex is pushed off the pole and stretched into a comma shape (days −15 to 0). From day 0 to 10 the vortex returns to the pole and assumes a more nearly circular shape. In the SH, the flow is weaker and generally easterly, similar to the summer stratosphere.

Fluid height *z* and O_{3} observations were generated by sampling the truth and adding Gaussian random noise. The *z* observations are located randomly in space and time, while the O_{3} observations simulate polar-orbiting sampling (see Fig. 2). Error standard deviations are 200 m for *z* and 0.3 ppmv for O_{3}. The *z* error represents approximately 1-K temperature error in the middle stratosphere, as explained by Allen et al. (2015), while the O_{3} error is typical of limb-sounding measurements in the stratosphere. Observations were made at hourly intervals from days 0 to 10, with approximately 150 observations of each variable in each hourly batch.
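The observation-generation step described above can be sketched as follows. This is a minimal illustration, assuming the truth has already been interpolated to the observation locations; the truth values, the random seed, and the function name `make_observations` are hypothetical, and only the error standard deviations and batch size come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

SIGMA_Z = 200.0    # height observation error std dev (m), from the text
SIGMA_O3 = 0.3     # ozone observation error std dev (ppmv), from the text
N_OBS = 150        # approximate observations per hourly batch, from the text


def make_observations(truth_at_obs, sigma, rng):
    """Add zero-mean Gaussian noise to truth values sampled at the obs locations."""
    return truth_at_obs + rng.normal(0.0, sigma, size=truth_at_obs.shape)


# Hypothetical truth values at the observation locations (illustrative only).
z_truth = np.full(N_OBS, 10_000.0)   # m
o3_truth = np.full(N_OBS, 6.0)       # ppmv

z_obs = make_observations(z_truth, SIGMA_Z, rng)
o3_obs = make_observations(o3_truth, SIGMA_O3, rng)
```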

Sampling patterns for 24 h of (a) pseudorandom height data and (b) polar-orbiting ozone data.

Citation: Monthly Weather Review 145, 1; 10.1175/MWR-D-16-0184.1

### b. 4DVAR using a hybrid linear mix of ensemble and static error covariances

The DA system we use is the hybrid 4DVAR described in Allen et al. (2016). The ensembles for both the hybrid background error covariance and the LETLM are produced from a “perturbed observations” EnKF as described in Allen et al. (2015). The EnKF analysis equation was solved with streamfunction, velocity potential, height, and ozone. As shown in Allen et al. (2015), the use of streamfunction and velocity potential resulted in less imbalance than the use of zonal and meridional wind components. An adaptive state space covariance inflation factor was applied to the background ensemble, forcing the globally averaged ensemble spread to match the globally averaged root-mean-square error (RMSE) in the streamfunction (relative to the truth). While derived from the streamfunction only, the same inflation factor is applied equally to all state variables. After a brief spinup of ~2 days, the average spread over days 2–10 is within 10% of the RMSE for each of the state variables. Elementwise (Schur product) localization (e.g., Houtekamer and Mitchell 2001) was applied using Eq. (4.10) of Gaspari and Cohn (1999) and a localization length of 7500 km. To initialize the ensemble, we sampled the truth at 6-h intervals from days 0 to 25. The EnKF was run with assimilation of *z* and O_{3} in 60-min batches [rather than 20 min in Allen et al. (2015)]. There are therefore six batches of data assimilated over a 6-h analysis window. The EnKF errors are discussed in more detail in section 3a.
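For readers unfamiliar with the localization function, the sketch below implements the compactly supported fifth-order function of Gaspari and Cohn (1999) and applies it as an elementwise (Schur) product. How the 7500-km localization length maps onto the half-width `c` is not stated here, so using it directly as `c` is an assumption, as are the toy distance and covariance matrices.

```python
import numpy as np


def gaspari_cohn(distance, c):
    """Fifth-order compactly supported correlation function.

    `c` is the half-width: the taper is 1 at zero distance and 0 beyond 2c.
    """
    r = np.abs(np.atleast_1d(np.asarray(distance, dtype=float))) / c
    taper = np.zeros_like(r)

    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)

    ri = r[inner]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)

    ro = r[outer]
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper


# Illustrative use: taper a hypothetical 3x3 ensemble covariance by separation.
dist_km = np.array([[0.0, 5000.0, 20000.0],
                    [5000.0, 0.0, 15000.0],
                    [20000.0, 15000.0, 0.0]])
C_KM = 7500.0                 # assumption: localization length used directly as c
p_ens = np.ones((3, 3))       # placeholder ensemble covariance
p_localized = gaspari_cohn(dist_km, C_KM) * p_ens   # Schur (elementwise) product
```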

### c. Local ensemble tangent linear model

Frolov and Bishop (2016) introduced several variants of localization for ensemble-based recursive TLMs. We experimented with many different methods (including spectral localization). As in Frolov and Bishop (2016) and B16, we found that the version of the LETLM that uses a Heaviside function for localization, much like early forms of the local ensemble transform Kalman filter [Eq. (29) of Frolov and Bishop (2016)], produced the most accurate results. We therefore only show results using this method.

[…] *p* is formulated as […] *t* = *m* − 1) and future (*t* = *m*) states of the model, where *t* is a time index, and […] *u*, meridional wind *υ*, and *z*, respectively, which were determined by globally averaging the gridpoint ensemble standard deviations for each variable for a representative ensemble. The term […] *p*, and […] *s*) is the maximum eigenvalue for prediction point *p*, […] *u*, *υ*, and *z*) at this grid point.

Influence volume for a grid point over the western United States, using a circle of radius 2000 km. Red points are model grid points that influence the central (large black) point, while small black points indicate the model Gaussian grid for triangular truncation T21.

[…] m s^{−1}, where *g* = 9.81 m s^{−2} is the acceleration of gravity and *H* = 10 km is the model depth. These waves travel 1126 km in the LETLM time step. To safely account for these, we chose the influence volume to be a circle with great-circle radius of 2000 km. Imposing this on the T21 Gaussian grid results in latitudinal variation in the number of grid points in the influence volume (see Fig. 3 for an illustration), ranging from 37 at the equator to 188 at the poles. For this study, we only propagate the three dynamical variables with the LETLM (*u*, *υ*, and *z*), and we do not allow the O_{3} to influence the dynamical variables. Since the vector
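The reasoning behind the influence-volume radius can be checked with a short calculation. The sketch below assumes that the relevant wave speed is the shallow-water gravity wave speed, the square root of *gH*, that the T21 Gaussian grid has 64 longitudes by 32 latitudes, and that the Earth radius is 6371 km; these assumptions and all names in the code are illustrative. It computes the distance travelled in one 3600-s LETLM step and counts the grid points within 2000 km of a near-equatorial and a near-polar grid point.

```python
import numpy as np

G = 9.81                   # acceleration of gravity (m s^-2), from the text
H = 10_000.0               # model depth (m), from the text
DT_LETLM = 3600.0          # LETLM time step (s), from the text
RADIUS_KM = 2000.0         # influence-volume radius (km), from the text
EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
NLAT, NLON = 32, 64        # assumed T21 Gaussian grid dimensions

wave_speed = np.sqrt(G * H)                      # m s^-1
wave_distance_km = wave_speed * DT_LETLM / 1e3   # distance in one LETLM step

# Gaussian latitudes are the arcsines of the Gauss-Legendre nodes.
nodes, _ = np.polynomial.legendre.leggauss(NLAT)
lats = np.arcsin(nodes)                          # radians, south to north
lons = np.deg2rad(np.arange(NLON) * 360.0 / NLON)
lon_grid, lat_grid = np.meshgrid(lons, lats)


def count_points_in_volume(lat0, lon0=0.0):
    """Number of grid points within RADIUS_KM (great circle) of (lat0, lon0)."""
    cos_angle = (np.sin(lat0) * np.sin(lat_grid)
                 + np.cos(lat0) * np.cos(lat_grid) * np.cos(lon_grid - lon0))
    dist_km = EARTH_RADIUS_KM * np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return int(np.sum(dist_km <= RADIUS_KM))


n_equator = count_points_in_volume(lats[NLAT // 2])   # row just north of the equator
n_pole = count_points_in_volume(lats[-1])             # northernmost row
print(wave_distance_km, n_equator, n_pole)
```

With these assumptions the travel distance is within a few kilometers of the quoted value, and the point counts should fall close to the equatorial and polar values given above, with the exact numbers depending on the assumed Earth radius.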

The ensemble size used in the LETLM is another “tuning parameter.” As discussed in Allen et al. (2015), the state size (i.e., the number of spectral coefficients) of the T21 SWM is 1518. Unlocalized ETLMs with 1518 members could therefore accurately propagate analysis corrections, since the ensemble is as large as the state space. However, this is computationally impractical for large systems. By localizing the ETLM we can reduce the effective necessary ensemble size, as long as the influence volume is large enough to encompass all important physical processes. In this study we use a 100-member ensemble for the baseline LETLM, which is ~10%–80% smaller than the number of variables in the influence volume. Sensitivity tests for smaller ensemble sizes are also shown in section 5c.

Note that here we are considering experiments in which there will be more variables in each influence volume than there are independent ensemble perturbations. B16 showed that the LETLM is guaranteed to be accurate if the number of independent ensemble perturbations exceeds the number of variables on each influence volume and provided the ensemble perturbations are governed by purely linear dynamics. B16 speculated that the LETLM might still be accurate when the number of variables within each influence volume was greater than the ensemble size provided that the rank of the forecast error covariance matrix when confined to the influence volume was less than the ensemble size. Hence, the experiments considered here serve to explore the possibility that LETLMs can still deliver acceptable accuracy even when the conditions for LETLM accuracy outlined in B16 are not precisely met.
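The accuracy condition discussed above can be illustrated with a toy regression. The sketch below is not the LETLM formulation used in this paper; it is a generic least-squares estimate of a single row of a linear operator from ensemble perturbations, with hypothetical dimensions and random data. It shows that the row is recovered exactly when the members outnumber the variables, while with fewer members only the component lying in the ensemble subspace is recovered.

```python
import numpy as np

rng = np.random.default_rng(1)


def estimate_local_row(x_past, x_future):
    """Least-squares estimate of one linear-operator row from perturbations.

    x_past   : (n_vars, n_members) perturbations inside the influence volume
    x_future : (n_members,) perturbations of the predicted variable one step later
    """
    return x_future @ np.linalg.pinv(x_past)


n_vars = 12
true_row = rng.normal(size=n_vars)        # hypothetical "true" TLM row

# Case 1: more members than variables, so the row is recovered exactly.
x_past_big = rng.normal(size=(n_vars, 20))
row_big = estimate_local_row(x_past_big, true_row @ x_past_big)

# Case 2: fewer members than variables, so only the projection of the row
# onto the ensemble subspace is recovered.
x_past_small = rng.normal(size=(n_vars, 6))
row_small = estimate_local_row(x_past_small, true_row @ x_past_small)

err_big = np.linalg.norm(row_big - true_row)
err_small = np.linalg.norm(row_small - true_row)
# The small-ensemble estimate still reproduces the ensemble's own evolution.
fit_small = np.linalg.norm(row_small @ x_past_small - true_row @ x_past_small)
```

In the second case the estimate is still exact for any perturbation that lies within the span of the ensemble, which is one way to read the B16 speculation about low-rank forecast error covariances.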

The time required for the LETLM calculation depends on several factors, including time step, influence volume radius, forecast model resolution, and ensemble size. In this study, the first three factors are fixed, but we do vary the ensemble size as a sensitivity study. We note that, theoretically, the computational cost of the matrix inversion in Eq. (2) should scale as the cube of the ensemble size. To test the timing with the current algorithm, we calculated the computation time to run 6-h LETLM forecasts with ensemble sizes of 10, 20, …, 100, and compared it with the 6-h forecast time for the conventional TLM with a 120-s time step (1.43 s). The results in Fig. 4 (also see Table 1) show that the LETLM takes ~2 times longer for 10 members and ~100 times longer for 100 members. The time required for the LETLM is approximately proportional to the ensemble size raised to the power 1.7 for the range of 10–100 members, while the exponent rises to 2.0 for the range of 50–100 members. These exponents are significantly lower than the theoretical value of 3.0 for the matrix inversion alone. We note that the LETLM as currently coded is not optimized for speed; a discussion of several methods for speeding up the LETLM is provided in the conclusions.
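A power-law exponent of this kind is typically obtained from a straight-line fit in log-log space. The sketch below shows one way to do this; the timings are synthetic values generated from an exact power law and are not the measurements reported in Table 1, and the function name and prefactor are hypothetical.

```python
import numpy as np


def power_law_exponent(sizes, times):
    """Slope of a straight-line fit to log(time) versus log(size)."""
    slope, _ = np.polyfit(np.log(sizes), np.log(times), 1)
    return slope


sizes = np.arange(10, 101, 10)
# Synthetic timings (NOT the Table 1 values): an exact power law, exponent 1.7.
synthetic_times = 0.05 * sizes**1.7
exponent = power_law_exponent(sizes, synthetic_times)
```

Restricting the fit to a subrange, for example `sizes >= 50`, corresponds to the second fit described above.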

Time required for the computation of a 6-h forecast with the LETLM on a single processor as a function of ensemble size. Lines indicate power-law fits of 1.7 over the range of 10–100 members (red) and 2.0 over the range of 50–100 members (blue).

Computation time for 6-h forecasts with the LETLM as a function of ensemble size along with the ratio relative to the TLM forecast time of 1.43 s.

## 3. Tuning of the baseline LETLM using 6-h forecasts of an analysis correction

This section discusses the spinup of the DA system and subsequent tuning of LETLM parameters that will be used as a baseline for the cycling experiments presented in section 4 as well as subsequent sensitivity tests in section 5. Baseline LETLM parameters include a time step of 3600 s, an influence volume radius of 2000 km, 100 ensemble members, and a

Baseline LETLM parameters.

### a. Spinup of the DA system

To spin up the DA, we assimilated *z* and O_{3} for 6 days with conventional 4DVAR ( […] *u*, *υ*, and *z*, along with the wind extraction potential (WEP). WEP is a normalized diagnostic relating the analyzed RMSE of the vector wind to the initial RMSE of the vector wind [a WEP value of 100% indicates winds matching the truth, while 0% indicates no improvement; details of the WEP calculation are provided in Allen et al. (2015)]. The errors decrease (and WEP increases) initially and then start to level out at the end of the spinup period. The state at day 6, hour 0 is used as the initial condition for subsequent experiments. The EnKF errors are also shown in Fig. 5 (dotted lines) for comparison. While the EnKF was run without any coupling to the 4DVAR, it tracks the 4DVAR errors closely, particularly for the zonal wind and height. We therefore decided that recentering the EnKF on the 4DVAR analysis was not necessary for the experiments in this study. Note that the EnKF errors are somewhat lower for the meridional wind than for the zonal wind. This is likely related to ozone gradients being generally oriented in the north–south direction, which facilitates extraction of the meridional wind component from ozone assimilation.
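One formula consistent with the description of WEP above (100% when the analyzed winds match the truth, 0% when there is no improvement over the initial state) is sketched below. This form is an assumption made for illustration; the exact definition is given in Allen et al. (2015) and may differ, and the function names are hypothetical.

```python
import numpy as np


def vector_wind_rmse(u, v, u_truth, v_truth):
    """Root-mean-square vector wind error over all grid points (unweighted)."""
    return float(np.sqrt(np.mean((u - u_truth) ** 2 + (v - v_truth) ** 2)))


def wind_extraction_potential(rmse_analysis, rmse_initial):
    """Assumed form: percentage reduction of the vector wind RMSE."""
    return 100.0 * (1.0 - rmse_analysis / rmse_initial)


# Illustrative values only: a hypothetical analysis with half the initial error.
u_truth = np.zeros(8)
v_truth = np.zeros(8)
rmse_0 = vector_wind_rmse(np.full(8, 4.0), np.full(8, 3.0), u_truth, v_truth)
rmse_a = vector_wind_rmse(np.full(8, 2.0), np.full(8, 1.5), u_truth, v_truth)
wep = wind_extraction_potential(rmse_a, rmse_0)
```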

(a) RMSE for zonal wind, (b) RMSE for meridional wind, (c) wind extraction potential, and (d) RMSE for height for 4DVAR (solid) and EnKF (dotted) assimilation of height and O_{3}.

As benchmarks, we first performed three experiments from days 6 to 10 (16 DA cycles). In the first experiment there was no data assimilated (NODATA), while in the second and third experiments, *z* was assimilated using 3DVAR or 4DVAR. Note that in this study the first guess at appropriate time (FGAT) method is employed for 3DVAR, so the only key difference between the 3DVAR and 4DVAR is the covariance propagation. Errors from these experiments are plotted in Fig. 6. Wind and height errors clearly increase (and WEP decreases) when no data are assimilated, while the errors in the 3DVAR experiment are smaller than NODATA. The 4DVAR results in further error reductions, showing that propagation of the background error through the analysis window is an important factor in this experimental design.

(a) RMSE for zonal wind, (b) RMSE for meridional wind, (c) wind extraction potential, and (d) RMSE for height. Results are shown for NODATA (green), 3DVAR (red), and 4DVAR (black) assimilation of height only.

### b. Tuning the LETLM pseudoinverse’s eigenvalue cutoff

To tune the quality of the LETLM, we start with an analysis correction (or analysis perturbation) generated by assimilating 6 h of randomly spaced *z* observations (see Fig. 2a for sampling), starting at day 6, hour 0, using the adjoint of the conventional TLM and the static (i.e.,

RMSE for zonal wind, meridional wind, and height as a function of eigenvalue cutoff parameter

Running with

### c. Comparing performance of baseline LETLM and TLM

Next we compare 6-h forecasts of analysis corrections from the same initial conditions, calculated with the TLM, the LETLM, and persistence (i.e., 3DVAR). The 6-h results are compared with full nonlinear forecasts as before. Figure 8 shows maps of the 6-h propagation using the conventional covariance (*α* = 0.0). The initial correction is shown in the first row, the 6-h TLM and LETLM results are in the second and third rows, and the difference between these is in the last row. Because of the high quality of the TLM, maps of the full nonlinear calculation are nearly identical and are not shown. The LETLM is able to forecast all of the major features that are present in the TLM forecast. The difference maps are largely random, except for an apparent wavelike feature in the wind fields in the NH near 180° longitude.

Analysis corrections at (first row) day 0, and propagated for 6 h with (second row) TLM and (third row) LETLM with

Figure 9 (first row) shows the global RMSE for *u*, *υ*, and *z* as a function of time for the *α* = 0.0 test. The TLM errors (dotted lines very close to the horizontal axis) are quite small, indicating that the TLM is nearly a perfect representation of the full nonlinear SWM for analysis corrections of this size. The persistence errors (dashed) grow rapidly with time, reaching nearly the same order as the global RMSE value of the correction (solid black line), which indicates the size of the corrections. While the LETLM errors (red) are larger than the TLM errors, they are considerably smaller than both the global RMSE values of the correction and the persistence errors, suggesting that the LETLM will provide useful time-dependent information in the cycling process.

Errors for tests with (top)

Figure 10 shows forecast maps for the analysis correction based completely on the ensemble covariance (

As in Fig. 8, but with

Finally, Fig. 11 presents the results for the hybrid based on a 50–50 blend of the conventional and ensemble covariances (

As in Fig. 8, but with

While the LETLM errors are much smaller than persistence errors, these tests show that the TLM errors for this case are even smaller. One difference is that while the TLM was run at the nominal model time step of 120 s, the LETLM was run at 3600 s (even though the nonlinear ensemble forecasts used for the LETLM were run at 120 s). Further improvements to the LETLM may be made by reducing the time step of the LETLM calculation itself (and altering the influence volume accordingly). However, as shown in section 4, when used in the cycling system, the DA with the baseline LETLM produces analyses similar to those with the TLM, so the differences in quality seen in these tests appear not to affect the overall performance of the DA system.

### d. Quantifying the degrees of freedom in the LETLM

To further understand the degrees of freedom (DOF) inherent to our system, we examined the number of eigenvalues that are kept by the pseudoinverse (i.e., ensemble size minus the number of discarded eigenvalues). Figure 12a shows the DOF and Fig. 12b shows the maximum eigenvalue for the first time step of the LETLM used in these baseline tests. The DOF ranged from 70 to 83, with larger values in the extratropics of both hemispheres and at the equator, while bands of lower values occur in the subtropics. The limited range in the number of DOF as a function of latitude contrasts with the large difference in the number of variables (three per grid point) included in the influence volume at the poles (564) as compared to the equator (111). This suggests that, despite changes in the gridpoint resolution, the inherent complexity of the dynamics is similar at the poles and at the equator. This is not surprising, given that the model is a spectral model with triangular truncation, which gives close to isotropic resolution in the continuous space of the spherical harmonic functions used by the spectral model. At present we do not have an explanation for the bands of lower DOF in the subtropics. However, we can see the impact of the computational influence volume in the trend of the maximum eigenvalues. The larger maximum eigenvalues at the poles suggest that the larger influence volume at the poles contained redundant information, leading to a more poorly conditioned matrix
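The DOF count described above can be illustrated with a truncated pseudoinverse. The sketch below is a generic construction, assuming that eigenvalues of the ensemble Gram matrix smaller than a fixed fraction of the largest eigenvalue are discarded; the function name, the `cutoff` value, and the random test matrix are hypothetical and are not the settings used in this study.

```python
import numpy as np


def truncated_pinv(x, cutoff):
    """Pseudoinverse of x keeping eigenvalues of x^T x above cutoff * max eigenvalue.

    x : (n_vars, n_members) matrix of ensemble perturbations
    Returns the (n_members, n_vars) pseudoinverse and the number of retained
    eigenvalues (the degrees of freedom).
    """
    gram = x.T @ x                               # (n_members, n_members)
    eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    keep = eigvals > cutoff * eigvals[-1]
    inv_vals = 1.0 / eigvals[keep]
    gram_pinv = (eigvecs[:, keep] * inv_vals) @ eigvecs[:, keep].T
    return gram_pinv @ x.T, int(np.count_nonzero(keep))


rng = np.random.default_rng(3)
x = rng.normal(size=(30, 10))    # hypothetical: 30 variables, 10 members
x[:, 8] = x[:, 0]                # make two members redundant
x[:, 9] = x[:, 1]
x_pinv, dof = truncated_pinv(x, cutoff=1e-8)
```

In this toy case the two redundant members contribute near-zero eigenvalues that are discarded, so the DOF is the ensemble size minus two.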

(a) Degrees of freedom, (b) maximum eigenvalue, and eigenvalues at individual points, indicated by numbers in (a) for (c) point 1 and (d) point 2. In (c) and (d), red dots are eigenvalues that are kept, while blue dots indicate eigenvalues that are discarded.

## 4. Cycling experiments with the LETLM

The success of the LETLM in propagating analysis corrections suggests that the LETLM can be used in cycling. Here we examine the results when using the baseline LETLM for assimilation of *z* data from days 6 to 10. Longer tests are possible, but the truth run used in this study has no topographic forcing during the DA period, so the system evolves toward a less dynamically active state over time (see Fig. 1). While four days may seem like a short assimilation test, it is sufficiently long to serve our purpose to demonstrate that the LETLM is a feasible alternative to the TLM. For baseline cycling tests, 6-h nonlinear ensemble forecasts (using time step of 120 s) for each cycle were created offline from the EnKF DA. As in the tests shown in the previous section, the LETLM uses a time step of 1 h, with the influence volume being a circle with radius of 2000 km and with

Results for

RMS errors for cycling experiments using

Final WEP values (averaged over the last 24 h) from the cycling experiments. The parameter

As in Fig. 13, but using

Further cycling experiments were also performed using data assimilation windows of 12 and 24 h, for comparison with the baseline 6-h results. Offline tuning experiments showed that while the baseline value of […] *z* data, and the final WEP value is used for comparison. Figure 15 summarizes the results for

Wind extraction potential (averaged over the last 24 h) as a function of cycling time for three different values of the hybrid blending coefficient: (a)

## 5. Sensitivity tests

The baseline LETLM discussed in section 3 demonstrated the ability of a well-tuned LETLM to successfully propagate a hybrid analysis correction. In this section, we examine LETLM skill with varying model parameters to provide further guidance as to the broader applicability of the LETLM. In particular, we examine sensitivities to the size of the analysis correction, the ensemble spread, ensemble size, and forecast model error. For each sensitivity test, 6-h forecasts were performed for the

### a. Sensitivity to the size of the analysis correction

^{−4}%, indicating a very linear regime, which favors the conventional TLM, making this a very stringent test for the LETLM.

Sensitivity tests were performed by running 6-h forecasts with the TLM and LETLM using initial corrections that were increased by factors of 2, 4, 8, 16, …, 1024. The LETLM settings were unchanged, and the same nonlinear ensemble forecasts were used for each case. The NLI as a function of scaling factor is shown in Fig. 16a, and the 6-h errors for *u* are shown in Fig. 16b. In the highly linear regime (NLI ≪ 1), the TLM errors are smaller than the LETLM errors. Power-law fits of the errors in the linear regime indicate that the TLM errors increase as the square of the analysis correction size; this is due to the quadratic form of the terms that are neglected when forming the TLM from the nonlinear equations. The LETLM errors, however, increase linearly with correction size in the linear regime. Therefore, as the correction size increases, the errors increase more slowly for the LETLM than for the TLM. At values of NLI greater than ~1%, the errors in the two systems become very similar, and at NLI ≫ 1 they give nearly identical results. This suggests that the LETLM is applicable to more realistic geophysical systems, in which the analysis corrections and the nonlinearity are larger.
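The quadratic growth of the TLM error with correction size can be reproduced with a scalar toy model. The sketch below is purely illustrative: the function `f`, the base state, and the perturbation sizes are hypothetical and unrelated to the SWM. The tangent linear forecast neglects the quadratic term, so the log-log slope of its error against perturbation size is 2.

```python
import numpy as np


def f(x):
    """Toy nonlinear model with a quadratic term (illustrative only)."""
    return x + 0.5 * x**2


def tlm(x0, dx):
    """Tangent linear propagation of dx about x0, i.e. f'(x0) * dx."""
    return (1.0 + x0) * dx


x0 = 1.0
factors = 2.0 ** np.arange(0, 11)     # 1, 2, 4, ..., 1024
dx = 1e-4 * factors
# Error of the tangent linear forecast: the neglected term 0.5 * dx**2.
tlm_error = np.abs(f(x0 + dx) - f(x0) - tlm(x0, dx))

slope, _ = np.polyfit(np.log(dx), np.log(tlm_error), 1)
```

An error that is a fixed fraction of the perturbation, as described above for the LETLM in the linear regime, would give a slope of 1 in the same plot.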

LETLM sensitivity test results calculated from 6-h forecasts of analysis corrections. (a) The nonlinear index as a function of analysis correction scaling factor. (b)–(f) Global RMS errors in *u* as a function of (b) the analysis correction scaling factor, (c) the ensemble scaling factor, (d) the ensemble size, (e) the diffusion coefficient scaling factor, and (f) the model time step. In (b)–(f) the LETLM errors are indicated by colored dots and connecting lines, while the TLM errors are indicated by black dots and connecting lines. Large dots indicate LETLM results using the baseline settings.

Citation: Monthly Weather Review 145, 1; 10.1175/MWR-D-16-0184.1

### b. Sensitivity to the ensemble spread

One reason the LETLM errors are larger than the TLM errors may be that the ensemble perturbations do not evolve purely linearly, so the nonlinear ensemble forecasts may not accurately represent the linear evolution. In practice, this problem can be overcome by reducing the size of the ensemble perturbations (i.e., deflating the spread about the ensemble mean). This requires the added computation of running additional nonlinear ensemble forecasts from the deflated ensembles for the purpose of the LETLM only, but may result in significant benefit. To illustrate this, we performed a series of LETLM tests that were initialized on days 1, 2, 3, 4, 5, and 6 of the assimilation experiment. Separate analysis corrections were calculated for each case, using 6 h of height data from the respective day. Since the EnKF ensemble spread is adaptively tuned to the global RMSE of the analysis, the earlier ensembles have larger perturbations and are therefore more nonlinear (see Fig. 5 for plots of EnKF errors). For each initial time, we ran four additional nonlinear ensemble forecasts in which the ensemble perturbations were scaled by factors of 0.3, 0.1, 0.03, and 0.01. We then performed LETLM tests with the same initial corrections (and the baseline LETLM parameters), but run with the different ensemble forecasts.
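The deflation step described above amounts to rescaling each member's departure from the ensemble mean. The following is a minimal sketch under stated assumptions: the array layout (members by state variables), the function name `deflate_ensemble`, and the random stand-in ensemble are all hypothetical, and the subsequent nonlinear forecasts from the deflated ensembles are not shown.

```python
import numpy as np

def deflate_ensemble(ens, factor):
    """Scale perturbations about the ensemble mean by `factor`.

    ens has shape (n_members, n_state); the ensemble mean is left unchanged.
    """
    mean = ens.mean(axis=0, keepdims=True)
    return mean + factor * (ens - mean)

# Stand-in ensemble (random numbers, not model output): 100 members, 8 variables.
rng = np.random.default_rng(0)
ens = rng.normal(loc=5.0, scale=2.0, size=(100, 8))

# The four scaling factors used in the text.
deflated = {f: deflate_ensemble(ens, f) for f in (0.3, 0.1, 0.03, 0.01)}
```

Each deflated ensemble keeps the same mean while its spread shrinks by the chosen factor, which is what makes the resulting forecasts evolve more linearly.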

Figure 16c shows the errors as a function of ensemble scaling factor. For the baseline test, initialized on day 6, it was possible to reduce LETLM errors by ~5% by scaling the ensemble perturbations by a factor of 0.1, and a slight further reduction occurred with scaling by 0.01. However, for tests initialized on earlier days the error reduction from ensemble scaling was much larger. For example, the test initialized on day 1 showed error reductions of ~35% for scaling of 0.1 and ~38% for scaling of 0.01. This illustrates the potential benefit of deflating the ensemble to create more linear ensemble forecasts for use in the LETLM.

### c. Sensitivity to ensemble size

The LETLM skill will, of course, be sensitive to the ensemble size. More ensemble members provide a greater probing of the local error space. To quantify this effect, we ran 6-h forecasts using ensemble sizes of 10, 20, …, 100. To create the reduced ensembles we subsample the ensemble forecasts created from the original 100-member EnKF. For the LETLM parameters, we use the other baseline parameters (radius of 2000 km, time step of 1 h, and

The errors as a function of ensemble size are provided in Fig. 16d along with the errors for persistence and for the TLM. For an ensemble size of 10, the errors are larger than persistence, indicating that the LETLM is not providing useful information. From 20 to 100 members, the errors are smaller than persistence and drop monotonically with ensemble size, with an apparent leveling out above around 70–80 members. This is consistent with the degrees of freedom calculated in the baseline tests, which ranged from 70 to 83 (see Fig. 12a). Adding ensemble members above this number does not appear to yield much additional improvement in the LETLM.
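The dependence on ensemble size can be illustrated with a toy regression. This is not the LETLM construction used in this study. It assumes exactly linear perturbation evolution and a hypothetical influence volume of 20 variables, and it estimates one row of a made-up linear operator from ensemble perturbations by minimum-norm least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n_local = 20                         # variables in a hypothetical influence volume
a_true = rng.normal(size=n_local)    # one row of a hypothetical "true" linear operator

def row_error(n_members):
    """Relative error of the regression estimate of a_true from n_members perturbations."""
    x_in = rng.normal(size=(n_members, n_local))   # initial perturbations
    x_out = x_in @ a_true                          # their exactly linear evolution
    # Minimum-norm least-squares solution of x_in @ a = x_out
    a_hat = np.linalg.lstsq(x_in, x_out, rcond=None)[0]
    return np.linalg.norm(a_hat - a_true) / np.linalg.norm(a_true)

errors = {k: row_error(k) for k in (10, 20, 40)}
for k, err in errors.items():
    print(f"{k:3d} members: relative error {err:.2e}")
```

In this toy setting the estimate is exact (to rounding) once the number of members exceeds the number of local variables, and it degrades when fewer members are available, which is qualitatively similar to the leveling out seen once the ensemble size reaches the local degrees of freedom.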

We conclude this subsection by recalling that the quality of the LETLMs with small ensemble sizes might have been further improved by reducing the LETLM time step and making the influence volume correspondingly smaller. However, an exploration of this possibility is beyond the scope of this paper.

### d. Sensitivity to model error

The experiments performed so far were all based on a perfect model assumption, in that the exact same forecast model was used for the truth and for the DA. As a final test, we examine how the LETLM and TLM behave when the underlying forecast model has different parameters than the forecast model used in the truth run. Here, we test the sensitivity to two different model parameters: diffusion coefficient and time step. In the former case, we multiply the baseline diffusion coefficient by factors of 3, 10, 30, and 100, and in the latter case we increase the time step from the baseline value of 120 s to 600, 1200, and 1800 s. In each case, the nonlinear ensemble forecasts used for the LETLM are repeated with the new forecast model settings, but all other LETLM parameters use the baseline settings. In addition, the TLM is run with the altered model parameters for comparison.

The sensitivity to errors in the diffusion coefficient is shown in Fig. 16e. The LETLM errors do not change much when diffusion is increased by factors of 3 and 10, but they increase significantly for factors of 30 and 100. The TLM errors also increase with diffusion factor, but more rapidly than for the LETLM, so that by a factor of 30 the errors are nearly the same, and they remain similar for a factor of 100. These results suggest that the presence of model errors tends to equalize the skill of the LETLM and the TLM. A similar tendency is seen in the sensitivity to model time step (Fig. 16f), where the LETLM and TLM errors are similar at values of 1200 and 1800 s. It is interesting that the LETLM and TLM asymptote toward a similar solution in both cases of model error. Further tests are necessary to examine the impact of model error in the complete cycling system.

## 6. Conclusions

The construction of the LETLM and its adjoint, and their use in a 4DVAR system with no outer loop, have been demonstrated for a geophysically relevant global shallow-water model. Compared with the highly idealized LETLM studies of Frolov and Bishop (2016) and B16, this study moves us a significant step closer to our ultimate aim of investigating their use in a full NWP system. While B16 demonstrated very high LETLM accuracy in a dynamical system in which the ensemble size exceeded the number of variables in each influence volume, here we demonstrated accurate LETLMs for the case where the number of ensemble members was significantly fewer than the number of variables in each influence volume. Furthermore, we showed that, at the expense of larger influence volumes, LETLM time steps can be made much larger than the time step of the nonlinear model used to produce the ensemble.

Nevertheless, more research is needed to determine the potential value of LETLMs and their adjoints in NWP. In complex multicomponent systems such as coupled models, LETLMs have the potential to be much easier to construct and maintain than traditional TLMs and adjoints. However, approaches to minimizing their computational cost, particularly in terms of input–output operations, while maximizing their computational scalability are still in their infancy. The cost of the LETLM is greater than that of a traditional TLM (in these experiments using a single processor the cost was a factor of 100 larger when using 100 ensemble members). These costs are amenable to parallel computing and can be further reduced through code optimization. For example, more efficient matrix inverse procedures could be employed (e.g., a Cholesky solver instead of singular value decomposition), and further benefit could be obtained by computing matrix inverses on a sparser grid than the original grid. Finally, in a supercomputer with sufficient memory, one could imagine precomputing the LETLM operator before the DA cycle and holding it in memory as a sparse matrix or as a distributed entity across many processors. Multiplication by a sparse matrix is a highly efficient and scalable operation that is supported by standard math libraries on current and future generations of supercomputers. Furthermore, the input–output cost of reading in ensembles for each time step of the LETLM would only have to be incurred once, before beginning the iterative TLM/adjoint-based descent algorithms used in 4DVAR.
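The suggested swap of a Cholesky solver for singular value decomposition can be sketched on a generic symmetric positive-definite system. The matrix below is random and hypothetical; it stands in for whatever local matrix is inverted within an influence volume, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
m = rng.normal(size=(n, n))
a = m @ m.T + n * np.eye(n)      # symmetric positive definite by construction
b = rng.normal(size=n)

# SVD route: a = U diag(s) V^T, so a^{-1} b = V diag(1/s) U^T b
u, s, vt = np.linalg.svd(a)
x_svd = vt.T @ ((u.T @ b) / s)

# Cholesky route: a = L L^T, then solve L y = b followed by L^T x = y.
# np.linalg.solve is a general solver used here for brevity; a dedicated
# triangular solver would exploit the structure of L.
chol = np.linalg.cholesky(a)
y = np.linalg.solve(chol, b)
x_chol = np.linalg.solve(chol.T, y)

print("max difference between the two solutions:", np.abs(x_svd - x_chol).max())
```

Both routes give the same solution for a well-conditioned system. The Cholesky factorization requires fewer operations, but it applies only when the matrix is symmetric positive definite, so an ill-conditioned or rank-deficient matrix would first need regularization.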

The LETLM provides a promising approach to propagating the full hybrid covariance without the major limitations of a hybrid 4DEnVar (e.g., Lorenc et al. 2015). First, it can propagate the full hybrid covariance with any chosen blending of the static and ensemble parts, just like a conventional TLM. Second, it can propagate an initially localized covariance, thereby overcoming four-dimensional localization difficulties associated with identifying and attenuating spurious ensemble covariances across space–time. Even in the limiting case of a blending coefficient of 1.0 (i.e., localized ensemble covariance only), the LETLM approach should, in principle, be able to outperform the 4DEnVar. Further research is necessary to test and computationally optimize the LETLM in a realistic NWP system.

## Acknowledgments

This work was funded by the U.S. Office of Naval Research. Douglas Allen, Karl Hoppel, and Gerald Nedoluha acknowledge support from Office of Naval Research base funding via Task BE-033-02-42. David Kuhl, Karl Hoppel, and Gerald Nedoluha acknowledge support from Office of Naval Research base funding via Task BE-435-050. Sergey Frolov and Craig Bishop acknowledge support from Office of Naval Research Award N0001412WX20323. We thank the editor and two anonymous reviewers for their helpful comments.

## REFERENCES

Allen, D. R., K. W. Hoppel, and D. D. Kuhl, 2014: Wind extraction potential from 4D-Var assimilation of stratospheric O3, N2O, and H2O using a global shallow water model. *Atmos. Chem. Phys.*, **14**, 3347–3360, doi:10.5194/acp-14-3347-2014.

Allen, D. R., K. W. Hoppel, and D. D. Kuhl, 2015: Wind extraction potential from ensemble Kalman filter assimilation of stratospheric ozone using a global shallow water model. *Atmos. Chem. Phys.*, **15**, 5835–5850, doi:10.5194/acp-15-5835-2015.

Allen, D. R., K. W. Hoppel, and D. D. Kuhl, 2016: Hybrid ensemble 4DVar assimilation of stratospheric ozone using a global shallow water model. *Atmos. Chem. Phys.*, **16**, 8193–8204, doi:10.5194/acp-16-8193-2016.

Bonavita, M., E. Hólm, L. Isaksen, and M. Fisher, 2016: The evolution of the ECMWF hybrid data assimilation system. *Quart. J. Roy. Meteor. Soc.*, **142**, 287–303, doi:10.1002/qj.2652.

Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, and B. He, 2010: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part II: One-month experiments with real observations. *Mon. Wea. Rev.*, **138**, 1567–1586, doi:10.1175/2009MWR3158.1.

Buehner, M., J. Morneau, and C. Charette, 2013: Four-dimensional ensemble-variational data assimilation for global deterministic weather prediction. *Nonlinear Processes Geophys.*, **20**, 669–682, doi:10.5194/npg-20-669-2013.

Buehner, M., and Coauthors, 2015: Implementation of deterministic weather forecasting systems based on ensemble-variational data assimilation at Environment Canada. Part I: The global system. *Mon. Wea. Rev.*, **143**, 2532–2559, doi:10.1175/MWR-D-14-00354.1.

Clayton, A. M., A. C. Lorenc, and D. M. Barker, 2013: Operational implementation of a hybrid ensemble/4D-Var global data assimilation system at the Met Office. *Quart. J. Roy. Meteor. Soc.*, **139**, 1445–1461, doi:10.1002/qj.2054.

Frolov, S., and C. H. Bishop, 2016: Localized ensemble-based tangent linear models and their use in propagating hybrid error covariance models. *Mon. Wea. Rev.*, **144**, 1383–1405, doi:10.1175/MWR-D-15-0130.1.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757, doi:10.1002/qj.49712555417.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137, doi:10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.

Kleist, D. T., and K. Ide, 2015: An OSSE-based evaluation of hybrid variational-ensemble data assimilation for the NCEP GFS. Part II: 4DEnVar and hybrid variants. *Mon. Wea. Rev.*, **143**, 452–470, doi:10.1175/MWR-D-13-00350.1.

Kuhl, D. D., T. E. Rosmond, C. H. Bishop, J. McLay, and N. L. Baker, 2013: Comparison of hybrid ensemble/4DVar and 4DVar within the NAVDAS-AR data assimilation framework. *Mon. Wea. Rev.*, **141**, 2740–2758, doi:10.1175/MWR-D-12-00182.1.

Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP—A comparison with 4D-Var. *Quart. J. Roy. Meteor. Soc.*, **129**, 3183–3203, doi:10.1256/qj.02.132.

Lorenc, A. C., N. E. Bowler, A. M. Clayton, S. R. Pring, and D. Fairbairn, 2015: Comparison of hybrid-4DEnVar and hybrid-4DVAR data assimilation methods for global NWP. *Mon. Wea. Rev.*, **143**, 212–229, doi:10.1175/MWR-D-14-00195.1.

Machenhauer, B., 1977: On the dynamics of gravity oscillations in a shallow water model, with applications to normal mode initialization. *Contrib. Atmos. Phys.*, **50**, 253–271.

Poterjoy, J., and F. Zhang, 2015: Systematic comparison of four-dimensional data assimilation methods with and without the tangent linear model using hybrid background error covariance: E4DVar versus 4DEnVar. *Mon. Wea. Rev.*, **143**, 1601–1621, doi:10.1175/MWR-D-14-00224.1.

Rosmond, T., and L. Xu, 2006: Development of NAVDAS-AR: Non-linear formulation and outer loop tests. *Tellus*, **58A**, 45–58, doi:10.1111/j.1600-0870.2006.00148.x.

Xu, L., T. Rosmond, and R. Daley, 2005: Development of NAVDAS-AR: Formulation and initial tests of the linear problem. *Tellus*, **57A**, 546–559, doi:10.1111/j.1600-0870.2005.00123.x.