## 1. Introduction

Covariance localization (Hamill et al. 2001; Houtekamer and Mitchell 2001; Anderson 2007a, 2012) and inflation (Anderson and Anderson 1999; Zhang et al. 2004; Anderson 2007b, 2009; Whitaker et al. 2008; Miyoshi 2011; Whitaker and Hamill 2012) are essential for the ensemble Kalman filter (EnKF; Evensen 1994; Burgers et al. 1998) to perform well in oceanic and atmospheric applications (e.g., Houtekamer and Mitchell 1998; Whitaker et al. 2004) because the EnKF is subject to sampling error resulting from small ensemble sizes (Anderson 2007a). A basic EnKF often diverges from the true state or fails numerically for large geophysical applications with affordable ensemble sizes (Houtekamer and Mitchell 1998).

With small ensemble size, the EnKF suffers from spuriously large magnitudes of estimated covariances between observations and state variables, especially when the separation between an observation and a state variable is large. To reduce the errors from these spurious covariances, Houtekamer and Mitchell (1998) implemented a cutoff radius beyond which the observations had no impact. This procedure for limiting the impact of remote observations on state variables is called localization. By using the compactly supported polynomial approximation of a normal distribution given by Gaspari and Cohn (1999), Houtekamer and Mitchell (2001) and Hamill et al. (2001) demonstrated the efficacy of a more general localization that limits the covariance between an observation and a state variable as a function of separation to reduce the impact of spuriously large covariances.

Localization can be implemented in a serial EnKF by multiplying the regression coefficient (equivalently the component of the Kalman gain) by a distance-dependent function that traditionally starts from 1.0 at the observation location and gradually decreases to 0.0 at a threshold distance (Hamill et al. 2001). It can also be implicitly implemented by applying the assimilation algorithm on local regions as in the local ensemble transform Kalman filter (Ott et al. 2004; Miyoshi et al. 2007).
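As a minimal sketch of the explicit form of localization described above, the update of a single state variable by a single observation could be written as follows. The function name and arguments are illustrative assumptions and are not taken from any particular EnKF code; the observation increments are assumed to have been computed already by the filter.

```python
from statistics import mean

def localized_update(x_ens, y_ens, y_incr, loc):
    """Regress observation increments onto one state variable, scaled by loc.

    x_ens  : prior ensemble of the state variable
    y_ens  : prior ensemble of the observed quantity
    y_incr : observation-space increments for each member (assumed given)
    loc    : localization value, traditionally between 0.0 and 1.0
    """
    n = len(x_ens)
    x_bar, y_bar = mean(x_ens), mean(y_ens)
    # Sample covariance of (x, y) and sample variance of y.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_ens, y_ens)) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in y_ens) / (n - 1)
    b = cov_xy / var_y  # regression coefficient
    # Localization multiplies the regression coefficient.
    return [x + loc * b * dy for x, dy in zip(x_ens, y_incr)]
```

With `loc = 0.0` the observation has no impact on the state variable, and with `loc = 1.0` the unlocalized regression is recovered.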

The Gaspari and Cohn (1999) function (GC function) has been the standard solution for localizing in the horizontal for atmospheric applications of ensemble Kalman filters. The GC function is an approximately Gaussian, fifth-order, piecewise polynomial function with a single real parameter that defines the width of the function. The parameter is half the distance at which the GC function goes to zero, and it must be tuned for good filter performance for a given application. However, tuning even this single parameter can be computationally expensive for large atmospheric and oceanic applications.
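For reference, the fifth-order piecewise form that is commonly used as the GC function can be sketched as below. The function name is an assumption; `half_width` is the single tuning parameter discussed above, and the function reaches zero at twice that value.

```python
def gaspari_cohn(dist, half_width):
    """Compactly supported, approximately Gaussian localization function."""
    z = abs(dist) / half_width
    if z <= 1.0:
        return -0.25 * z**5 + 0.5 * z**4 + 0.625 * z**3 - (5.0 / 3.0) * z**2 + 1.0
    if z < 2.0:
        return (z**5 / 12.0 - 0.5 * z**4 + 0.625 * z**3 + (5.0 / 3.0) * z**2
                - 5.0 * z + 4.0 - 2.0 / (3.0 * z))
    return 0.0  # no impact beyond twice the half-width
```

The two polynomial pieces meet continuously at one half-width, where the function takes the value 5/24.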

Localization in the vertical is also required for good filter performance in atmospheric applications, but there are few theoretical studies of the vertical localization. Whitaker et al. (2004) chose a vertical localization function that has a value of 1.0 below *p* unit. Localization in the vertical needs further study since the appropriate localization form is not well established.

Moreover, there is evidence that different localization functions are needed for different observation types (Houtekamer and Mitchell 2005) and kinds of state variable (Anderson 2007a, 2012), and at different times (Anderson 2007a; Chen and Oliver 2010). This further increases the complexity of tuning localization parameters and motivates the design of algorithms to automatically compute the localization.

Bishop and Hodyss (2007) proposed a method for adaptive ensemble covariance localization. Further studies of Bishop and Hodyss (2009a,b) dynamically moved the localization function with the true error correlation and adapted to the width of the true error correlation function. Anderson (2007a) proposed a hierarchical ensemble filter to detect and correct the sampling error that creates the need for localization. The nature of sampling error in ensemble Kalman filters was further explored by Anderson (2012) in which an algorithm that computes localization as a function of ensemble size and sample correlation was proposed. A number of other studies including Zhou et al. (2008) and Emerick and Reynolds (2010) have related localization to the sample correlation.

Taking sampling error and other potential errors into account, Anderson and Lei (2013) developed an empirical localization function (ELF) that can automatically provide an estimate for the localization for any possible observation type with a given kind of state variable. The ELF makes few a priori assumptions for the shape of the localization function. It significantly outperformed the best GC function for an observation type whose forward operator was a sum of state variables (similar to satellite radiance) in the Lorenz-96 model (Lorenz and Emanuel 1998). Lei and Anderson (2014) found that the ELFs that are computed for each observation type with every kind of state variable outperformed the best GC function in the dynamical core of the Geophysical Fluid Dynamics Laboratory (GFDL) B-grid climate model (Anderson et al. 2004).

The ELF provides a natural way to examine the localization in the vertical, because it only assumes the localization is a function of the distance between an observation and a state variable and it does not assume a Gaussian-like function. Since there are few theoretical studies for the vertical localization, the ELF is applied here to provide an estimate of the vertical localization. The ELF can give estimates of the localization function for different observation types. It can also be constructed for different regions without additional computational cost. Compared to tuning the GC localization functions for different observation types and regions, the ELF can provide the localization function with much less computation. Thus, based on the promising results from a low-order model and a simple atmospheric general circulation model, the ELF is further explored using the serial EnKF in the Data Assimilation Research Testbed (DART; Anderson et al. 2009) and the Community Atmosphere Model, version 5.0 (CAM5; Neale et al. 2012), a full-physics numerical weather prediction (NWP) model.

Section 2 describes the experimental design. A group of GC parameters are tuned and results are compared in section 3. Section 4 explains the procedures to compute the ELFs in the horizontal and vertical and the ELFs varying with region. The results of the horizontal and vertical ELFs from section 4 are presented in section 5. The convergence of the ELFs is discussed in section 6. Section 7 discusses the ELFs with empirical inflation and section 8 gives conclusions.

## 2. Experimental design

### a. The DART/CAM system

An observing system simulation experiment (OSSE) is conducted in the DART/CAM system (Raeder et al. 2012). The forecast model is CAM5, the atmospheric component of the Community Earth System Model, version 1 (CESM1; Gent et al. 2011). The model uses a finite-volume grid with approximately 2° resolution (96 × 144), and has 30 vertical levels. At the surface, CAM5 is forced by the temporally interpolated observed monthly-mean sea surface temperature. The default configuration of the Atmospheric Model Intercomparison Project (AMIP; Gates 1992) protocol is used.

The data assimilation system is DART in which the serial ensemble adjustment Kalman filter (EAKF; Anderson 2001) is used to combine synthetic observations with an ensemble of forecasts from CAM5 to produce an ensemble of analyses at an analysis time. To avoid filter divergence, the spatially varying and time-varying state space adaptive inflation (Anderson 2009) and sampling error correction (Anderson 2012) are applied. The default covariance localization is the GC localization in which the vertical distance is converted to equivalent radians by normalizing the pressure difference between an observation and a state variable by 1000 hPa. The empirical localization functions will be discussed in sections 4–7.

### b. Experimental design of the OSSE

The nature run is obtained by advancing the forecast model from an initial condition at 0000 UTC 1 January to 1200 UTC 30 November 2008. Synthetic observations of temperature and zonal and meridional winds are generated by adding random draws from a normal distribution with mean 0 and specified observation error variances to spatially interpolated values from the gridded true state. The observation error variances are 1 K^{2} for temperature and 4 m^{2} s^{−2} for zonal and meridional winds. The synthetic observations are nearly uniformly distributed in the horizontal (600 profiles on the sphere) and range from 1000 to 5 hPa in the vertical on standard radiosonde mandatory pressure levels. There are 27 000 synthetic observations available every 12 h.
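The generation of synthetic observations can be sketched as follows, assuming that the true state has already been interpolated to the observation locations. The dictionary keys and function name are illustrative; only the error variances come from the text.

```python
import random

# Observation error variances from the text: K^2 for T, m^2 s^-2 for U and V.
OBS_ERROR_VAR = {"T": 1.0, "U": 4.0, "V": 4.0}

def synthetic_obs(truth_values, kind, rng):
    """Add zero-mean normal noise with the specified variance to true values."""
    sd = OBS_ERROR_VAR[kind] ** 0.5
    return [v + rng.gauss(0.0, sd) for v in truth_values]
```

Passing an explicit `random.Random` instance as `rng` keeps the draws reproducible between experiments.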

Adding small perturbations from a normal distribution with mean 0 and standard deviation of 10^{−11} (Anderson et al. 2005) to the temperature field of the true state at 0000 UTC 30 June 2008 generates *N* = 80 perturbed states. These perturbed states are advanced to 1200 UTC 31 July 2008, and the result is a set of *N* ensemble members that can be viewed as random draws from the model’s climatological distribution. The synthetic observations are assimilated every 12 h from 0000 UTC 1 August to 1200 UTC 31 August 2008. The default GC localization with half-width 0.2 rad (1274 km) is used. The ensemble analyses at 1200 UTC 31 August 2008 are the ensemble initial conditions for assimilation experiments.
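The conversion between a half-width in radians and a distance in kilometers is simple arithmetic, sketched below under the assumption of a mean Earth radius of 6371 km (a value not stated in the text).

```python
EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def radians_to_km(angle_rad):
    """Convert an angular separation on the sphere to a surface distance."""
    return angle_rad * EARTH_RADIUS_KM
```

Under this assumption a half-width of 0.2 rad corresponds to about 1274 km, so the GC function reaches zero at about 2548 km.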

A set of assimilation experiments with different GC half-widths is conducted in order to find the optimal value. These experiments extend from 1200 UTC 31 August to 1200 UTC 30 September 2008. The first 10 days are discarded to eliminate transient effects, and the last 20 days are used to produce the results in section 3.

With an initial guess of the GC half-width as 0.2 rad that might not be the optimal value, an assimilation experiment (GC0.2) is conducted from 1200 UTC 31 August to 1200 UTC 30 November 2008. The results from September are discarded to eliminate transient effects. The results from October are used to compute the empirical localization functions. The details of constructing the empirical localization functions will be discussed in section 4. Additional assimilation experiments using the computed empirical localization functions are conducted starting from the ensemble initial conditions at 1200 UTC 31 August 2008 and for the same time span as GC0.2. Similar to Anderson and Lei (2013), the empirical localization function is iteratively computed by using the output of an OSSE in which the previously constructed empirical localization function is applied. Table 1 summarizes the experiments using different localization functions. The experiments using empirical localization functions will be compared to the initial guess of the GC half-width and the best GC half-width in sections 5–7.

Table 1. Summary of the experiments with different localization functions.

### c. Evaluation metrics

For the set of experiments with different GC half-widths, the time mean root-mean-square error (RMSE) of the ensemble mean from the truth is computed for the state variables of surface pressure, temperature, zonal and meridional winds, and specific humidity.

For the experiments with empirical localization functions, the time series of domain-averaged RMSE of the ensemble mean from the truth and ensemble spread for each state variable are used for evaluation. The vertical profiles of temporally and horizontally averaged RMSE and ensemble spread are also examined. The temporally and vertically averaged inflation for each state variable is also presented. The evaluation metrics are computed for October and November separately. These can be seen as dependent and independent verification periods because the ELFs are computed from the OSSE results for October. Results are shown for the RMSE and ensemble spread of the prior, but qualitatively similar results are obtained for the RMSE and ensemble spread of the posterior.
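A minimal sketch of the two main metrics is given below. It assumes cosine-of-latitude area weights and defines the spread as the square root of the area-weighted mean ensemble variance; both choices, and the function name, are assumptions rather than details given in the text.

```python
import math

def area_weighted_rmse_and_spread(ensemble, truth, lats_deg):
    """ensemble: list of members, each a list of values at the grid points.

    Returns (RMSE of the ensemble mean from the truth, ensemble spread).
    """
    n_members = len(ensemble)
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]  # assumed weights
    w_sum = sum(weights)
    sq_err = 0.0
    var_sum = 0.0
    for j, w in enumerate(weights):
        vals = [member[j] for member in ensemble]
        m = sum(vals) / n_members
        var = sum((v - m) ** 2 for v in vals) / (n_members - 1)
        sq_err += w * (m - truth[j]) ** 2
        var_sum += w * var
    return math.sqrt(sq_err / w_sum), math.sqrt(var_sum / w_sum)
```

Comparing the two returned values indicates whether the ensemble is underdispersive (spread smaller than RMSE) or overdispersive.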

## 3. Tuning the GC half-width

To find the optimal GC half-width, a set of assimilation experiments with the GC half-width varying from 0.2 to 0.6 rad is performed. Figure 1 shows the time mean RMSEs for each state variable averaged globally (GL), in the Southern Hemisphere (SH), in the tropics (TP), and in the Northern Hemisphere (NH). For the globally averaged RMSEs, GC0.4 has the smallest RMSE for temperature and zonal wind and GC0.45 has the smallest RMSE for surface pressure and meridional wind, while GC0.35 has the smallest RMSE for specific humidity. Thus, different kinds of state variables require different localization scales. The differences of the globally averaged RMSE among the GC half-widths of 0.35, 0.4, and 0.45 rad are small, so 0.4 rad (2548 km) is chosen as the best half-width.

The time mean RMSEs computed for SH, TP, and NH separately have larger variations than for the globally averaged RMSE. GC0.5 has the smallest surface pressure RMSE in SH and TP, and GC0.45 has the smallest surface pressure RMSE in NH (Fig. 1a). For temperature and zonal and meridional wind, GC0.5 has the smallest RMSE in SH, and GC0.4 has the smallest RMSE in TP. GC0.35 has the smallest RMSE of temperature and zonal wind in NH, and GC0.45 has the smallest RMSE of meridional wind in NH. For specific humidity, GC0.35 has the smallest RMSE in TP and NH, and GC0.45 has the smallest RMSE in SH. These variations highlight the complexity of tuning the GC half-width.

## 4. Empirical localization function

The ELF proposed by Anderson and Lei (2013) minimizes the RMS difference between the true value and the posterior ensemble mean, given the results from an OSSE. The ELF is computed as follows: 1) for each pair of an observation and a state variable compute the separation between them, 2) divide the set of all pairs into subsets using the separation and possibly additional criteria (details below), and 3) compute the localization value based on the pairs for each subset.

Let **Y** be the set of observations to be used in a subsequent OSSE and **X** be the set of model state variables, where *L* is the total number of observations in **Y** and *M* is the total number of state variables in **X**. All pairs of an observation and a state variable are divided into subsets by separation: subset *s* contains the *K* pairs for which the separation is between (*s* − 1) × 0.02 and *s* × 0.02 rad, and *S* is the total number of subsets. In the notation used for the ELF, *t* and *p* denote the true value and the prior, and the overbar denotes the ensemble mean.
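The minimization that defines the ELF for one subset can be sketched as a one-parameter least squares problem. The sketch assumes that the localized posterior mean is the prior mean plus the localization value times the unlocalized ensemble-mean increment; the function and argument names are illustrative and do not reproduce the notation of Anderson and Lei (2013).

```python
def empirical_localization(truth, prior_mean, increment):
    """Localization value that minimizes the RMS difference from the truth.

    For the K pairs of one subset:
      truth      : true values of the state variables
      prior_mean : prior ensemble means of the state variables
      increment  : unlocalized ensemble-mean increments from the observations
    Minimizing sum((prior_mean + alpha * increment - truth)**2) over alpha gives
      alpha = sum((truth - prior_mean) * increment) / sum(increment**2).
    """
    num = sum((t - p) * d for t, p, d in zip(truth, prior_mean, increment))
    den = sum(d * d for d in increment)
    return num / den if den > 0.0 else 0.0
```

Nothing in this estimate constrains the result to lie between 0.0 and 1.0, so values outside that range are possible for a given subset.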

Given the separation between an observation and a state variable, the localization is approximated by the product of the localization for the horizontal separation and the localization for the vertical separation. Lei and Anderson (2014) computed the ELFs for an observation type at a vertical level for every kind of state variable at every vertical level. However, this is impractical here because CAM5 has more kinds of state variables and more vertical levels than the idealized B-grid model.

The horizontal ELFs are constructed from the output of an OSSE as follows. Empirical localization as a function of horizontal separation is computed separately on the 10 lowest mandatory radiosonde levels (1000, 850, 700, 500, 400, 300, 200, 150, 100, and 50 hPa) for temperature and zonal and meridional winds (a total of 30 different values for each separation). To increase the sample size *K* and produce a smoother ELF, the gridded state values of temperature and zonal and meridional winds on the 10 model levels (1, 7, 10, 12, 14, 15, 18, 20, 22, and 25) that are closest to each of the 10 lowest mandatory radiosonde levels are used for the true observation values. In this way, the sample size of **Y** for an ELF increases from 600 (number of observation profiles) to 13 824 (grid points on one vertical level). For temperature, an empirical localization is computed for each subset ST(*i*, *s*) where *i* indexes the 10 model levels. The subset ST(*i*, *s*) contains all pairs (*y*, *x*) where *y* is a temperature state variable on level *i* at any time *t* from the OSSE and *x* is a temperature variable on level *i* at time *t* with a separation from *y* that is between (*s* − 1) × 0.02 and *s* × 0.02 rad. Subsets SU(*i*, *s*) and SV(*i*, *s*) are constructed similarly for zonal and meridional wind variables, respectively.
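The assignment of a pair to a separation subset can be sketched as below. The haversine formula for the angular separation and the treatment of the bin edges (lower edge included, upper edge excluded) are assumptions, since the text does not specify them.

```python
import math

BIN_WIDTH_RAD = 0.02  # subset width from the text

def great_circle_separation(lat1, lon1, lat2, lon2):
    """Angular separation in radians between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))

def subset_index(separation_rad):
    """1-based index s such that (s - 1) * 0.02 <= separation < s * 0.02."""
    return int(separation_rad // BIN_WIDTH_RAD) + 1
```

Pairs from all assimilation times that share the same index and level are pooled into one subset before the localization value is estimated.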

A *z* test is used to assess the significance of the localization for each subset (Lei and Anderson 2014). Given the localization value and its standard error, the *z* test is applied with a null hypothesis that the localization is 0. If the *z* value is outside the critical region for 95% confidence, the null hypothesis cannot be rejected and the value of the localization for the subset is set to 0.
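A minimal sketch of this screening step follows, retaining a localization value only when it differs significantly from zero. It assumes a two-sided test against the standard normal distribution; the function name and the handling of the standard error are illustrative.

```python
from statistics import NormalDist

def screen_localization(loc, std_err, confidence=0.95):
    """Return loc if it is significantly different from 0, otherwise 0.0."""
    if std_err <= 0.0:
        raise ValueError("standard error must be positive")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    z = loc / std_err
    return loc if abs(z) > z_crit else 0.0
```

Subsets with few pairs or noisy estimates tend to have large standard errors, so this step mainly removes values in the tail of the ELF.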

Similarly, vertical localization as a function of vertical separation is computed separately on the same 10 model levels for temperature and zonal and meridional winds. For temperature, an empirical localization is computed for each subset ST(*i*, *s*) where *i* indexes the 10 model levels. For the vertical, the subset ST(*i*, *s*) contains all pairs (*y*, *x*) where *y* is a temperature state variable on level *i* at any time *t* from the OSSE and *x* is a temperature variable in the same vertical column at time *t* with a vertical separation from *y* that is in a specified range of log pressure. The number of separation subsets varies with level *i* because of the finite domain in the vertical. Subsets SU(*i*, *s*) and SV(*i*, *s*) are constructed similarly for zonal and meridional winds, respectively. Again, the *z* test is applied to each localization value.

From the output of the GC0.2 OSSE, the resulting localizations for each of the subsets are plotted as black dots in Fig. 2b. Finally, the vertical ELFSP is computed by Eq. (6) and is shown by the blue solid line in Fig. 2b. The final localization applied in a subsequent OSSE (ELFOne) is the product of the horizontal and vertical ELFSPs.

Figure 2 shows that the localization decreases with separation in both the horizontal and vertical. The horizontal ELFs (the black dots in Fig. 2a) have smaller localization at small separations than the GC functions with half-widths 0.2 and 0.4 rad. The horizontal ELFs are larger than the GC with half-width 0.2 (0.4) rad when the separation is larger than 0.2 (0.5) rad. Similarly, the vertical ELFs are smaller than the GC localization with half-width 0.2 (0.4) when the separation is smaller than 0.2 (0.4) log*p* units, and are much broader than the GC function with half-width 0.2 (0.4) for separations larger than 0.2 (0.4) log*p* units.

The horizontal and vertical empirical localization functions are also computed for the SH, TP, and NH separately. Figure 3 shows the horizontal and vertical ELFSPs for the three regions. The horizontal ELFSPs for SH and NH (ELFSP_SH and ELFSP_NH) have a similar shape to the global ELFSP. The horizontal ELFSP of TP (ELFSP_TP) has larger values than the ELFSP, ELFSP_SH, and ELFSP_NH when the separation is smaller than 0.3 rad, and it has a more compact tail for larger separations. The vertical ELFSPs of SH and NH (ELFSP_SH and ELFSP_NH) are similar to each other, and both have smaller magnitude than the ELFSP. However, the vertical ELFSP for TP (ELFSP_TP) is broader than the global ELFSP. Therefore, the horizontal and vertical ELFSPs are different for the three regions; in particular, the ELFSPs in TP are different from those in SH and NH. The horizontal and vertical ELFSPs that are varying with region are used in a subsequent OSSE (ELFReg).

## 5. Results of ELFOne and ELFReg

The global ELFSPs are used as an empirical localization in an OSSE called ELFOne while the regional ELFSPs are applied in an OSSE ELFReg. Figure 4 shows the time series of area-weighted global-average RMSE and ensemble spread for temperature, zonal and meridional wind, surface pressure, and specific humidity. The black solid line separates the dependent and independent verification periods. In both periods, ELFOne has smaller RMSE of temperature and zonal and meridional winds than GC0.2. The RMSE of surface pressure and specific humidity can be seen as indirect verification, because the computation of ELFs does not include minimizing the RMSE of surface pressure and specific humidity. In both verification periods, ELFOne has smaller surface pressure RMSE than GC0.2, and it has similar specific humidity RMSE to GC0.2. However, ELFOne has larger RMSE for every state variable than GC0.4, the best GC experiment.

For the directly assimilated state variables of temperature and zonal and meridional winds, ELFReg has smaller RMSE than ELFOne and GC0.2 in both dependent and independent verification periods. For the indirectly assimilated state variables of surface pressure and specific humidity, ELFReg also has smaller RMSE than ELFOne and GC0.2, except that ELFReg has similar RMSE of surface pressure to ELFOne in the dependent verification period. Thus ELFReg improves on ELFOne by using regionally varying ELFSPs, but ELFReg still has larger RMSE for every state variable than GC0.4.

Next, the RMSE and ensemble spread for each state variable are computed for the SH, TP, and NH, respectively. Figure 5 shows the time series of RMSE and ensemble spread for temperature and zonal wind in the three geographic regions. ELFOne has smaller RMSE of temperature and zonal wind than GC0.2 in SH and NH for both verification periods (Figs. 5a,b and 5e,f), as for the global results. However, ELFOne has larger RMSE of temperature and zonal wind in TP than GC0.2 for both verification periods (Figs. 5c,d). The vertical profiles of RMSE show that ELFOne has smaller RMSE of temperature and zonal wind than GC0.2 at nearly every vertical level in SH and NH, but it has larger RMSE of temperature and zonal wind in TP (not shown). Similar results are obtained for meridional wind and specific humidity, although ELFOne produces smaller RMSE of surface pressure than GC0.2 in all three regions (not shown).

ELFReg, which uses ELFSPs that vary with region, has slightly smaller RMSE of temperature and zonal wind than ELFOne in SH and NH for both verification periods. In TP, ELFReg has smaller RMSE of temperature and zonal wind than ELFOne and GC0.2 at nearly every assimilation time (Figs. 5c,d) and every vertical level (not shown). Similar results are obtained for meridional wind, specific humidity, and surface pressure (not shown). Thus, the ELFSP_TP that is different from the ELFSP, ELFSP_SH, and ELFSP_NH in both the horizontal and vertical helps to reduce the RMSE in TP. This also explains why ELFReg has smaller globally averaged RMSE than ELFOne for every state variable (Fig. 4).

As shown in Fig. 4, GC0.2 has larger ensemble spread than RMSE for all state variables. GC0.4, ELFOne, and ELFReg have smaller ensemble spread than RMSE for all state variables except surface pressure. GC0.4, ELFOne, and ELFReg have smaller ensemble spread than GC0.2 for all state variables, because they have broader localizations than GC0.2. ELFOne and ELFReg have larger ensemble spread than GC0.4 for all state variables, because the horizontal global ELFSP and regional ELFSPs have much smaller magnitude than the GC localization function with half-width 0.4 rad when separation is smaller than 0.4 rad, although the vertical global ELFSP and regional ELFSPs are broader than the GC localization function with half-width 0.4 rad. Figure 5 shows that GC0.2 has larger ensemble spread than RMSE especially in SH and NH, while GC0.4, ELFOne, and ELFReg have smaller ensemble spread than RMSE mainly in TP. ELFReg has similar ensemble spread to ELFOne in SH, and ELFReg has larger (smaller) ensemble spread than ELFOne in NH (TP). This is consistent with the localization functions used in ELFOne and ELFReg (Fig. 3).

The temporally and vertically averaged inflation of temperature in the independent verification period is shown in Fig. 6. The inflation values for GC0.2 that are larger than 1.0 are distributed over the sphere, and these inflation values are relatively small compared to those for GC0.4, ELFOne, and ELFReg. The inflation values that are larger than 1.0 for GC0.4, ELFOne, and ELFReg are mostly concentrated in TP, which is consistent with the smaller ensemble spread than RMSE in TP (Fig. 5c). ELFOne has smaller inflation values than ELFReg, and ELFReg has smaller inflation values than GC0.4.

## 6. Convergence of the ELFs

Section 5 shows that the ELFSPs that vary by region have advantages over the single ELFSP especially in TP. ELFOne and ELFReg outperform GC0.2 from which the ELFSPs are computed, but they have larger RMSE for every state variable except surface pressure than GC0.4. Anderson and Lei (2013) demonstrated that ELFs appear to converge to a solution and lead to smaller RMSE when the construction process of the ELF is iterated. Thus, the convergence of the ELFSPs varying with region is examined in this section.

Five OSSEs (ELFReg_I#, # = 1, … , 5) are conducted iteratively. Each OSSE uses the regional ELFSPs that are computed based on the output of the previous OSSE. From the output of ELFReg (section 5), a new group of ELFSPs (ELFSP_SH_I1, ELFSP_TP_I1, and ELFSP_NH_I1 in the horizontal and vertical) are constructed following the procedures in section 4. A new OSSE ELFReg_I1 uses this new group of ELFSPs. After five iterations, a group of ELFSPs (ELFSP_SH_I6, ELFSP_TP_I6, and ELFSP_NH_I6 in the horizontal and vertical) are constructed from the output of ELFReg_I5.

Figure 7 shows the iterative ELFSP_SHs in the horizontal and vertical. From iteration 1 to 4 the magnitude of the horizontal (vertical) ELFSP_SHs becomes larger when the separation is smaller than 0.4 rad (4 log*p* units). The horizontal (vertical) ELFSP_SHs from iteration 3 to 6 have values larger than 1.0 when the separation is smaller than 0.2 rad (1.5 log*p* units), although they are constrained to 1.0 at zero separation (section 4). Anderson and Lei (2013) and Lei and Anderson (2014) noted that localization values larger than 1.0 indicate insufficient spread and the empirical localization acts as an inflation. To avoid this, empirical localization values larger than 1.0 are set to 1.0 when used in an OSSE for results in this section. This constraint is relaxed in section 7. The ELFSP_SHs in the horizontal and vertical appear to have converged after three iterations. Similar results are obtained for ELFSP_TP and ELFSP_NH, although ELFSP_TP (ELFSP_NH) converges faster (slower) than ELFSP_SH.
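The constraint described above can be sketched together with the product form of the localization from section 4. Applying the cap to each factor before multiplying is an assumption, since the text does not say whether the cap is applied to the factors or to their product; the function name is illustrative.

```python
def applied_localization(horizontal_elf, vertical_elf, cap=1.0):
    """Product of horizontal and vertical localization, each capped at `cap`."""
    return min(horizontal_elf, cap) * min(vertical_elf, cap)
```

Passing `cap=float("inf")` disables the cap, which corresponds to allowing the empirical localization to act as an inflation.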

The horizontal and vertical ELFSPs of iteration 3 for the three regions are shown by the solid lines in Fig. 8. The ELFSPs are broader than the GC0.4, except that the horizontal ELFSP_TP_I3 has a more compact tail than GC0.4. The horizontal and vertical ELFSPs in SH and NH have wider localization scales than the ELFSPs in TP. Please note that to avoid the ELFSP acting as an inflation, the ELFSP values larger than 1.0 at distances greater than 0 are set to 1.0 when used in OSSEs in this section.

Experiment ELFReg_I1 using the first iteration of the ELFSPs varying with region produces smaller RMSE than ELFReg, and the following iterative experiments that use the ELFSPs constructed from the output of the previous OSSE have similar RMSE to ELFReg_I1 (not shown). The RMSE and spread of ELFReg_I3 are shown in Fig. 9. ELFReg_I3 produces similar globally averaged RMSE of temperature and zonal and meridional winds to GC0.4 in both dependent and independent verification periods. It has smaller RMSE of surface pressure than GC0.4 but larger RMSE of specific humidity than GC0.4 in both periods. ELFReg_I3 produces similar RMSE of temperature and zonal and meridional winds to GC0.4 in all three geographic regions and smaller RMSE of surface pressure than GC0.4 in all three regions. However, ELFReg_I3 has larger RMSE of specific humidity than GC0.4 in TP (not shown).

The time mean RMSE for every state variable from GC0.4 and the five iterative ELFReg OSSEs for the dependent and independent verification periods are shown in Table 2. To examine if the time mean RMSEs are significantly different between each pair of OSSEs for each state variable and period, a paired Student’s *t* test is performed on the time series of area-weighted-average RMSE. Differences between pairs of time mean RMSEs are referred to as significant here if the *t* test indicates differences at greater than the 95% significance level. For the directly assimilated state variables of temperature and zonal and meridional winds, ELFReg_I4 and ELFReg_I5 have significantly smaller time mean RMSE than GC0.4 in the dependent verification period, and ELFReg_I3 has smaller RMSE than GC0.4 in the independent verification period although the difference is not significant. For surface pressure, ELFReg experiments from iterations 1 to 5 have significantly smaller time mean RMSE than GC0.4 in the dependent verification period, and ELFReg experiments from iterations 2 to 5 have significantly smaller time mean RMSE than GC0.4 in the independent verification period. For specific humidity, ELFReg_I5 has significantly smaller time mean RMSE than GC0.4 in the dependent verification period, but GC0.4 has significantly smaller time mean RMSE than the iterative ELFReg experiments in the independent verification period. The specific humidity performs differently from the other variables because the specific humidity is less dynamically constrained and has smaller correlation length scales than the other variables (Fig. 1).
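The paired test statistic can be sketched as follows for two RMSE time series verified at the same times. The sketch treats the paired differences as independent samples, which is an assumption; the function name is illustrative.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(rmse_a, rmse_b):
    """Return (t statistic, degrees of freedom) for paired RMSE time series."""
    diffs = [a - b for a, b in zip(rmse_a, rmse_b)]
    n = len(diffs)
    # stdev uses the n - 1 denominator, as required for the t statistic.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

The difference in time mean RMSE would then be called significant when the absolute value of the statistic exceeds the two-sided 5% critical value of Student's *t* distribution for that many degrees of freedom.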

Table 2. For both the dependent (DEP) and independent (IND) periods, the first row is the time mean RMSE for a particular state variable for the GC0.4 OSSE, the five iterated ELFReg OSSEs, and the ELFRegEI_I3 OSSE. The minimum RMSE in each row is set boldface. The second row lists all experiments to the right of a given experiment for which the RMSE differences are not significant at the 95% level using the Student’s *t* test discussed in the text with EI3 denoting experiment ELFRegEI_I3. Zonal wind, US; meridional wind, VS; surface pressure, PS.

Figure 9 shows that the ELFReg_I3 has smaller ensemble spread than RMSE. The ensemble spread becomes smaller than RMSE after one iteration, and the ensemble spread further decreases with more iterations and converges when the ELFSPs appear to have converged (not shown). The inflation for the iterative ELFReg experiments has a similar spatial pattern to ELFReg (Fig. 6d) but with larger magnitudes.

The ELFSPs that have converged have much broader vertical localization than GC0.4 (Figs. 7 and 8). To determine if the advantage of the ELFSP over GC0.4 is mainly from the broader vertical localization, an experiment with the GC horizontal localization of half-width 0.4 rad and a GC vertical localization that fits the converged ELFSPs is conducted for the dependent verification period. The new experiment produces smaller RMSE than GC0.4 for all state variables, and it has similar RMSE to the converged ELFSPs (not shown). Thus, the broader vertical localizations suggested by the ELFSPs appear to be reasonable.

## 7. ELF with empirical inflation

Lei and Anderson (2014) demonstrated that ELFs could detect insufficient ensemble spread and act as an inflation through localization values larger than 1.0 at small separations. Although the adaptive inflation is used in all experiments, the ELF is able to detect the insufficient spread that is not satisfactorily addressed by the heuristic adaptive inflation algorithm.

The dashed lines in Fig. 8 show the horizontal and vertical ELFSPs with empirical inflation (ELFSPEI) for the three regions for iteration 3, because the ELFSPs varying with region appear to have converged after three iterations. The horizontal and vertical ELFSPEIs in the three regions have values larger than 1.0 at small separations. The localization value of ELFSPEI_TP at zero separation is larger than those of ELFSPEI_SH and ELFSPEI_NH. This is consistent with the smaller ensemble spread than RMSE in TP compared to SH and NH (Fig. 5).

The ELFSPEIs varying with region for iteration 3 are used in a new experiment ELFRegEI_I3 in which the adaptive inflation is also applied. The area-weighted global average RMSE and ensemble spread for ELFRegEI_I3 are shown in Fig. 9. ELFRegEI_I3 has slightly smaller RMSE for temperature and zonal and meridional winds than ELFReg_I3 and GC0.4 in the dependent verification period, and this improvement occurs in all three regions (not shown). ELFRegEI_I3 has similar RMSE for temperature and zonal and meridional winds to ELFReg_I3 and GC0.4 in the independent verification period. ELFRegEI_I3 has smaller RMSE for surface pressure than GC0.4 and slightly larger RMSE for specific humidity than GC0.4. The inflation for experiment ELFRegEI_I3 has a similar spatial pattern to ELFReg (Fig. 6d) but with larger magnitudes.

Table 2 shows that ELFRegEI_I3 has the smallest time mean RMSE of temperature and zonal and meridional winds in the dependent verification period, and the RMSE is significantly smaller than GC0.4. In the independent verification period, ELFRegEI_I3 has similar time mean RMSE of temperature to ELFReg_I3, which has the minimal time mean RMSE. ELFRegEI_I3 has the smallest time mean RMSE of zonal and meridional winds, although the RMSE is not significantly different from GC0.4. ELFRegEI_I3 has significantly smaller time mean RMSE of surface pressure than GC0.4 in both verification periods. ELFRegEI_I3 has larger time mean RMSE of specific humidity than ELFReg_I5 in the dependent verification period and GC0.4 in the independent verification period.

## 8. Conclusions

A new algorithm for empirical localization of observations is tested in CAM5. The empirical localization function (ELF) is defined to minimize the RMS difference between the true value and posterior ensemble mean. Without assuming a Gaussian-like function or empirically tuning the GC localization parameter, the ELF can provide an estimate of the localization function for an observation type and a kind of state variable. To practically apply the ELF, the localization value is assumed to be a product of the horizontal localization value and vertical localization value. The horizontal and vertical ELFs are first constructed from the output of an OSSE for each observation type at a pressure level with a state variable that has the same type as the observation. A *z* test is used to eliminate noisy values of the ELFs at the tail. Then a three-knot cubic spline is applied to the horizontal and vertical ELFs in order to obtain a smooth localization function that is applied to all types of observations and kinds of state variables.
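The minimization that defines the ELF has a closed form. For a subset of observation and state-variable pairs, let Δ be the unlocalized regression increment to the prior ensemble mean and e the true value minus the prior ensemble mean. Minimizing the sum of (αΔ − e)² over α gives α = ΣΔe / ΣΔ². The sketch below applies this per separation bin. It assumes that this least-squares form matches the definition used in the paper, whose Eq. (3) is not reproduced in this section. All names and the synthetic data are hypothetical.

```python
import numpy as np

def empirical_localization(separation, mean_increment, prior_error, bin_edges):
    """Least-squares localization value for each separation bin.

    separation     : distance between observation and state variable, per pair
    mean_increment : unlocalized regression increment to the prior ensemble mean
    prior_error    : true value minus prior ensemble mean
    """
    separation = np.asarray(separation, dtype=float)
    mean_increment = np.asarray(mean_increment, dtype=float)
    prior_error = np.asarray(prior_error, dtype=float)
    n_bins = len(bin_edges) - 1
    elf = np.full(n_bins, np.nan)          # NaN marks bins with no usable pairs
    # np.digitize returns 1..n_bins for values inside the edges.
    which = np.digitize(separation, bin_edges) - 1
    for b in range(n_bins):
        sel = which == b
        denom = np.sum(mean_increment[sel] ** 2)
        if denom > 0.0:
            # alpha minimizing sum((alpha * increment - prior_error) ** 2)
            elf[b] = np.sum(mean_increment[sel] * prior_error[sel]) / denom
    return elf

# Synthetic pairs whose "true" localization decays with separation (illustration only).
rng = np.random.default_rng(1)
d = rng.uniform(0.0, 1.0, size=5000)
inc = rng.normal(size=d.size)
target = np.exp(-(d / 0.3) ** 2)
err = target * inc + 0.5 * rng.normal(size=d.size)
edges = np.linspace(0.0, 1.0, 11)
elf = empirical_localization(d, inc, err, edges)
```

The *z* test on the tail and the cubic-spline smoothing described above are not included; they would be applied to the binned values returned here.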

From the output of an OSSE that uses GC localization with half-width 0.2 rad, a horizontal ELFSP and a vertical ELFSP are first computed. ELFOne, which uses the horizontal and vertical ELFSPs, has smaller RMSE of temperature and zonal and meridional winds than GC0.2. For the unobserved state variables, ELFOne has smaller surface pressure RMSE than GC0.2 and similar specific humidity RMSE to GC0.2. The improvements of ELFOne compared to GC0.2 are mainly from the error reduction in the Southern Hemisphere (SH) and Northern Hemisphere (NH). Therefore, the horizontal and vertical ELFSPs are then computed for SH, the tropics (TP), and NH separately. ELFReg, which uses the horizontal and vertical ELFSPs varying with region, has smaller RMSE of every state variable than ELFOne and GC0.2. ELFReg has advantages over ELFOne especially in TP. However, ELFOne and ELFReg have larger RMSE of every state variable than GC0.4, which is the best GC experiment.

The convergence of the ELFSPs varying with region is examined by iteratively computing the ELFSPs from the output of the previous OSSE. The horizontal and vertical ELFSPs appear to have converged after three iterations. After the ELFSPs converge, the localization values are larger than 1.0 at small separations indicating insufficient spread, and the empirical localization plays the role of empirical inflation (Anderson and Lei 2013; Lei and Anderson 2014).

The ELFReg_I# experiments (# = 1, …, 5), which use the iteratively computed ELFSPs varying with region, have smaller RMSE of every state variable than ELFReg. At the 95% significance level, the converged ELFSPs from iterations 3 to 5 generally lead to significantly smaller time mean RMSE than GC0.4 for the directly assimilated state variables of temperature and zonal and meridional winds in the dependent verification period, and similar RMSE to GC0.4 in the independent verification period. For unobserved state variables, ELFReg experiments from iterations 2 to 5 have significantly smaller time mean RMSE of surface pressure than GC0.4 in both verification periods, and ELFReg_I5 has significantly smaller time mean RMSE of specific humidity than GC0.4 in the dependent verification period.

The ELFSPs with empirical inflation are tested by allowing localization values larger than 1.0 at small separations. ELFRegEI_I3 has the smallest time mean RMSE of temperature and zonal and meridional winds in the dependent verification period, and the RMSE is significantly smaller than GC0.4. ELFRegEI_I3 has the smallest time mean RMSE of zonal and meridional winds in the independent verification period, although the RMSE is not significantly different from GC0.4. Thus, the advantage of the empirical localization function that plays the role of empirical inflation is demonstrated (Lei and Anderson 2014). Therefore, starting from a suboptimal GC half-width, the converged ELFSPs can outperform both the suboptimal and optimal GC half-widths. The converged ELFSPs with empirical inflation have even smaller RMSE than those without empirical inflation.

The ELF algorithm does not make any assumptions about the shape of the resulting localization function. While the horizontal ELFs are approximately Gaussian, the vertical ELFs are not. Distinctly non-Gaussian ELFs were obtained in Anderson and Lei (2013) for an observation type whose forward operator involves a sum of state variables (similar to a satellite radiance) in a low-order model. The shape of the ELF was related to the ensemble correlations between the observation and the state variables. The vertical ELFs here are similarly related to the ensemble prior correlations in the vertical. The non-Gaussian vertical localization functions outperform the best GC localization function.

With the experimental design applied here, the ELF requires less computation than finding an optimal half-width for the GC localization function. The ELF appears to have converged after three iterations. Thus, the computational cost for finding the converged ELF is the expense for four OSSE experiments, since the computation for the ELF given the output of an OSSE is relatively small. Here, eight OSSE experiments were needed to find the optimal GC half-width, although the optimal GC half-width could be found with fewer experiments by using a better search algorithm. Moreover, the computational cost of manually tuning the GC half-width would significantly increase if multiple GC half-widths need to be tuned for different observation types and regions. Thus, the ELF has computational cost advantages over manually tuning the GC half-width.

The ELFs are applied here to a serial ensemble Kalman filter, but it may be possible to approximate the method in ensemble Kalman filters that simultaneously assimilate observations (Lei and Anderson 2014). From the output of any ensemble OSSE, serial or simultaneous, an ELF can be computed by Eq. (3), and an ELFSP, a smooth ELF, can be obtained by applying a cubic spline to the ELF. A localization matrix that has the same dimension as the background error covariance matrix can be constructed that approximates the ELF or ELFSP, and the Schur product of the background error covariance matrix and the localization matrix provides the localized background error covariance matrix for ensemble Kalman filters that simultaneously assimilate observations.
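The Schur-product construction can be sketched as follows for a one-dimensional set of grid points. The fifth-order GC function stands in for the localization function here; a tabulated ELF or ELFSP could be interpolated in its place through the `loc_fn` argument. The domain, ensemble, and function names are assumptions made for illustration only.

```python
import numpy as np

def gaspari_cohn(dist, half_width):
    """Fifth-order compactly supported function of Gaspari and Cohn (1999)."""
    z = np.abs(np.asarray(dist, dtype=float)) / half_width
    out = np.zeros_like(z)                 # zero beyond twice the half-width
    near = z <= 1.0
    far = (z > 1.0) & (z < 2.0)
    zn, zf = z[near], z[far]
    out[near] = -0.25 * zn**5 + 0.5 * zn**4 + 0.625 * zn**3 - (5.0 / 3.0) * zn**2 + 1.0
    out[far] = (zf**5 / 12.0 - 0.5 * zf**4 + 0.625 * zf**3 + (5.0 / 3.0) * zf**2
                - 5.0 * zf + 4.0 - 2.0 / (3.0 * zf))
    return out

def localize_covariance(ensemble, positions, loc_fn):
    """Schur (element-wise) product of the sample covariance and a localization matrix."""
    cov = np.cov(ensemble, rowvar=False)                     # (n_state, n_state)
    sep = np.abs(positions[:, None] - positions[None, :])    # pairwise separations
    rho = loc_fn(sep)                                        # same shape as cov
    return rho * cov, rho

# Synthetic 20-member ensemble on 40 grid points (illustration only).
rng = np.random.default_rng(2)
positions = np.linspace(0.0, 1.0, 40)
ensemble = rng.normal(size=(20, positions.size))
localized, rho = localize_covariance(
    ensemble, positions, lambda d: gaspari_cohn(d, 0.1)
)
```

The localization matrix has ones on its diagonal, so the sample variances are unchanged, while covariances between points separated by more than twice the half-width are set to zero.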

The variation of the best GC parameter indicates that different localization functions are needed for different state variables and regions. Further investigation of ELFs with different observation types and state variables is needed, as is the examination of spatially varying ELFs. The localization scale may vary temporally, as shown by the time series of RMSE, so the exploration of time-varying ELFs is also of interest. Moreover, a uniform observation network is used here, and further exploration of the performance of the ELF with different observation densities and networks is needed. The ELF has been examined in the Weather Research and Forecasting Model (WRF; Skamarock et al. 2008) with higher resolutions than those used in CAM here.

Although the ELFs computed in this manuscript use the output of an OSSE, the ELFs can provide an initial estimate of the localization functions for real observation experiments. The ELFs can also be computed with real observations from an ensemble simulation system (Anderson and Lei 2013). Further investigation of applying the ELFs in real observation experiments and constructing the ELFs with real observations is needed.

Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation. Thanks to Kevin Raeder, Tim Hoar, and Nancy Collins for technical support, and thanks to Doug Nychka for helpful discussions.

## REFERENCES

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903, doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.

Anderson, J. L., 2007a: Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. *Physica D*, **230**, 99–111, doi:10.1016/j.physd.2006.02.011.

Anderson, J. L., 2007b: An adaptive covariance inflation error correction algorithm for ensemble filters. *Tellus*, **59A**, 210–224, doi:10.1111/j.1600-0870.2006.00216.x.

Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. *Tellus*, **61A**, 72–83, doi:10.1111/j.1600-0870.2008.00361.x.

Anderson, J. L., 2012: Localization and sampling error correction in ensemble Kalman filter data assimilation. *Mon. Wea. Rev.*, **140**, 2359–2371, doi:10.1175/MWR-D-11-00013.1.

Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. *Mon. Wea. Rev.*, **127**, 2741–2758, doi:10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.

Anderson, J. L., and L. Lei, 2013: Empirical localization of observation impact in ensemble Kalman filters. *Mon. Wea. Rev.*, **141**, 4140–4153, doi:10.1175/MWR-D-12-00330.1.

Anderson, J. L., and Coauthors, 2004: The new GFDL global atmosphere and land model AM2–LM2: Evaluation with prescribed SST simulations. *J. Climate*, **17**, 4641–4673, doi:10.1175/JCLI-3223.1.

Anderson, J. L., B. Wyman, S. Zhang, and T. Hoar, 2005: Assimilation of surface pressure observations using an ensemble filter in an idealized global atmospheric prediction system. *J. Atmos. Sci.*, **62**, 2925–2938, doi:10.1175/JAS3510.1.

Anderson, J. L., T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Avellano, 2009: The Data Assimilation Research Testbed: A community facility. *Bull. Amer. Meteor. Soc.*, **90**, 1283–1296, doi:10.1175/2009BAMS2618.1.

Bishop, C. H., and D. Hodyss, 2007: Flow adaptive moderation of spurious ensemble correlations and its use in ensemble based data assimilation. *Quart. J. Roy. Meteor. Soc.*, **133**, 2029–2044, doi:10.1002/qj.169.

Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. *Tellus*, **61A**, 84–96, doi:10.1111/j.1600-0870.2008.00371.x.

Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. *Tellus*, **61A**, 97–111, doi:10.1111/j.1600-0870.2008.00372.x.

Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 1719–1724, doi:10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.

Chen, Y., and D. S. Oliver, 2010: Cross-covariances and localization for EnKF in multiphase flow data assimilation. *Comput. Geosci.*, **14**, 579–601, doi:10.1007/s10596-009-9174-6.

Emerick, A., and A. Reynolds, 2010: Combining sensitivities and prior information for covariance localization in the ensemble Kalman filter for petroleum reservoir applications. *Comput. Geosci.*, **15**, 251–269, doi:10.1007/s10596-010-9198-y.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to do forecast error statistics. *J. Geophys. Res.*, **99** (C5), 10 143–10 162, doi:10.1029/94JC00572.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757, doi:10.1002/qj.49712555417.

Gates, W. L., 1992: AMIP: The Atmospheric Model Intercomparison Project. *Bull. Amer. Meteor. Soc.*, **73**, 1962–1970, doi:10.1175/1520-0477(1992)073<1962:ATAMIP>2.0.CO;2.

Gent, P. R., and Coauthors, 2011: The Community Climate System Model version 4. *J. Climate*, **24**, 4973–4991, doi:10.1175/2011JCLI4083.1.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background-error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790, doi:10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.

Hastie, T., and R. Tibshirani, 1990: *Generalized Additive Models.* CRC Press, 352 pp.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811, doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137, doi:10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.

Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. *Quart. J. Roy. Meteor. Soc.*, **131**, 3269–3289, doi:10.1256/qj.05.135.

Lei, L., and J. L. Anderson, 2014: Comparisons of empirical localization techniques for serial ensemble Kalman filters in a simple atmospheric general circulation model. *Mon. Wea. Rev.*, **142**, 739–754, doi:10.1175/MWR-D-13-00152.1.

Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. *J. Atmos. Sci.*, **55**, 399–414, doi:10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.

Miyoshi, T., 2011: The Gaussian approach to adaptive covariance inflation and its implementation with the local ensemble transform Kalman filter. *Mon. Wea. Rev.*, **139**, 1519–1535, doi:10.1175/2010MWR3570.1.

Miyoshi, T., S. Yamane, and T. Enomoto, 2007: Localizing the error covariance by physical distance within a local ensemble transform Kalman filter (LETKF). *SOLA*, **3**, 89–92, doi:10.2151/sola.2007-023.

Neale, R. B., and Coauthors, 2012: Description of the NCAR Community Atmosphere Model (CAM 5.0). NCAR Tech. Note NCAR/TN-486+STR, National Center for Atmospheric Research, 289 pp.

Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. *Tellus*, **56A**, 415–428, doi:10.1111/j.1600-0870.2004.00076.x.

Raeder, K., J. L. Anderson, N. Collins, T. J. Hoar, J. E. Kay, P. H. Lauritzen, and R. Pincus, 2012: DART/CAM: An ensemble data assimilation system for CESM atmospheric models. *J. Climate*, **25**, 6304–6317, doi:10.1175/JCLI-D-11-00395.1.

Skamarock, W. C., and Coauthors, 2008: A description of the advanced research WRF version 3. NCAR Tech Note NCAR/TN-475+STR, 125 pp. [Available online at http://www.mmm.ucar.edu/wrf/users/docs/arw_v3.pdf.]

Whitaker, J. S., and T. M. Hamill, 2012: Evaluating methods to account for system errors in ensemble data assimilation. *Mon. Wea. Rev.*, **140**, 3078–3089, doi:10.1175/MWR-D-11-00276.1.

Whitaker, J. S., G. P. Compo, X. Wei, and T. M. Hamill, 2004: Reanalysis without radiosondes using ensemble data assimilation. *Mon. Wea. Rev.*, **132**, 1190–1200, doi:10.1175/1520-0493(2004)132<1190:RWRUED>2.0.CO;2.

Whitaker, J. S., T. M. Hamill, X. Wei, Y. Song, and Z. Toth, 2008: Ensemble data assimilation with the NCEP Global Forecast System. *Mon. Wea. Rev.*, **136**, 463–482, doi:10.1175/2007MWR2018.1.

Zhang, F., C. Snyder, and J. Sun, 2004: Impacts of initial estimate and observation availability on convective-scale data assimilation with ensemble Kalman filter. *Mon. Wea. Rev.*, **132**, 1238–1253, doi:10.1175/1520-0493(2004)132<1238:IOIEAO>2.0.CO;2.

Zhou, Y., D. McLaughlin, D. Entekhabi, and G. C. Ng, 2008: An ensemble multiscale filter for large nonlinear data assimilation problems. *Mon. Wea. Rev.*, **136**, 678–698, doi:10.1175/2007MWR2064.1.