## Abstract

In data assimilation applications using ensemble Kalman filter methods, localization is necessary to make the method work with high-dimensional geophysical models. For ensemble square root Kalman filters, domain localization (DL) and observation localization (OL) are commonly used. Depending on the localization method, appropriate values have to be chosen for the localization parameters, such as the localization length and the weight function. Although frequently used, the properties of the localization techniques are not fully investigated. Thus, up to now an optimal choice for these parameters is a priori unknown and they are generally found by expensive numerical experiments. In this study, the relationship between the localization length and the ensemble size in DL and OL is studied using twin experiments with the Lorenz-96 model and a two-dimensional shallow-water model. For both models, it is found that the optimal localization length for DL and OL depends linearly on an effective local observation dimension that is given by the sum of the observation weights. In the experiments no influence of the model dynamics on the optimal localization length was observed. The effective observation dimension defines the degrees of freedom that are required for assimilating observations, while the ensemble size defines the available degrees of freedom. Setting the localization radius such that the effective local observation dimension equals the ensemble size yields an adaptive localization radius. Its performance is tested using a global ocean model. The experiments show that the analysis quality using the adaptive localization is similar to the analysis quality of an optimally tuned constant localization radius.

## 1. Introduction

In ocean modeling and weather forecasting an estimate of the current state is important to initialize forecasts of the dynamical process. In sequential data assimilation, variants of the ensemble Kalman filter (EnKF; Evensen 1994) are commonly used. To deal with the particular problems of the geophysical systems many improvements of the methods (e.g., covariance inflation and localization; Houtekamer and Mitchell 1998) have been introduced. Typically, the state dimension of the models is very high, but only a small ensemble is feasible to use. This introduces noise and spurious correlations in the covariance matrices and limits the degrees of freedom for the analysis, which are defined by the ensemble size. Localization is used to access the problem of spurious correlations, and increases the degrees of freedom by calculating a local analysis in every grid point. This approach is justified by the fact that dynamical systems can locally behave like low-dimensional systems (see Patil et al. 2001). The positive effect of localization for ensemble Kalman filters has recently been described for different applications in oceanography and meterology (e.g., Nerger et al. 2006; Janjić et al. 2011; Otkin 2012; Losa et al. 2012; Kang et al. 2012).

Localization can be applied to the covariance matrices by pointwise multiplication (Houtekamer and Mitchell 2001), referred to as covariance localization (CL). Alternatively, the domain is decomposed as in domain localization (DL) and separate analyses for each subdomain are calculated (Houtekamer and Mitchell 1998). The latter method can be combined with observation localization (OL), where the observations are weighted according to their distance, as described in Hunt et al. (2007). Several studies (Miyoshi and Yamane 2007; Greybush et al. 2011; Sakov and Bertino 2011; Nerger et al. 2012) investigated the relationship between CL and OL and found that the results were comparable, even though the effective localization length is shorter for OL than for CL. The relation between different weight functions and localization radii was examined in Whitaker and Hamill (2002). They found that using a weight function similar to the Gaussian function [see Gaspari and Cohn 1999, Eq. (4.10)] produces better results than using a Heaviside step function. For a regional ocean model, the effect of different localization radii in DL was examined in Nerger et al. (2006). Yoon et al. (2010) have shown that localization improves the estimation of the covariances. According to their findings the localization radius should be chosen large enough to get most of the relevant covariances. For all of these localization methods, extensive tuning of the localization parameters is necessary to achieve the optimal results.

Recently, adaptive localization methods (Anderson 2007, 2012; Bishop and Hodyss 2007, 2009) have been developed to estimate the correlations between different variables flow dependently. Further, information-based localization schemes have been developed (Zupanski et al. 2007; Migliorini 2013). As shown for different examples, these methods improve the assimilation results, but they still require the choice of different parameters or are computational very expensive.

Here, an alternative approach to define the localization radius is investigated. From experiments using two small models, a relationship between the ensemble size and the optimal localization radius is derived in the context of dense observations with uniform error statistics. Examples of these kind of observations are gridded satellite observations of sea surface temperature or sea surface elevation, which are frequently used in ocean data assimilation applications (see e.g., Janjić et al. 2012; Losa et al. 2012; Sakov et al. 2012). The relation is then used to define an adaptive localization method and tested using a global ocean model.

The article is structured as follows. In section 2 the assimilation algorithm and the localization techniques are discussed. Afterward, the models are introduced and the numerical experiments are described in section 3. In section 4, the results for the Lorenz-96 model are presented. The experiments using the shallow-water-equations are discussed in relation to the Lorenz-96 model in section 5. In section 6 assimilation results using a global ocean model are discussed and conclusions are drawn in section 7.

## 2. Assimilation algorithm

The assimilation experiments in this study are performed with the widely used ensemble transform Kalman filter (ETKF; Bishop et al. 2001) with localization (Hunt et al. 2007). In this section, the ETKF and the localization techniques are reviewed.

### a. ETKF

Data assimilation methods provide an estimate of the state of a system **x**_{k} ∈ ℝ^{n} at time *k* given the model dynamics

and a set of observations . These are related to the model state via the observation operator **H**:

The errors ** ε** ∈ ℝ

^{n}and

**∈ ℝ**

*η*^{p}are assumed to be Gaussian with zero mean and covariance matrices ∈ ℝ

^{n×n}and ∈ ℝ

^{p×p}, respectively . Below, the time index

*k*is omitted.

The background state ^{f} and the covariance matrix ^{f} are now represented by an ensemble of state realizations ^{f(i)}, *i* ∈ {1, …, *N*}. The matrix ^{f} denotes the matrix whose column vectors are the ensemble members, and ^{f}′ is the matrix of ensemble perturbations. The state estimate is given by the mean of the ensemble .

The idea of the ETKF is to carry out the analysis in the ensemble space and then map the corrections into the state space via the ensemble perturbations. Here, only the equations for the ETKF are given. For a detailed derivation of the filter equations see Hunt et al. (2007).

At an analysis time, an analysis weight vector and an analysis covariance matrix are calculated in the space spanned by the ensemble perturbations:

The factor *ρ* ≥ 1 is used to inflate the ensemble (see Hunt et al. 2007).

The forecast ensemble is then

where **1** is a matrix containing 1 in all elements. During the forecast phase, the ensemble members are all moved forward in time using the full nonlinear model:

for all *i* = 1, …, *N*.

### b. Localization in ETKF

For a local analysis with the ETKF, the domain is decomposed into different local regions (Houtekamer and Mitchell 1998, e.g., every single grid point). An analysis increment is then calculated separately for every local domain. For the local analysis domains a support radius *l* for the observations is defined. Only observations closer than *l* from the analysis point will have a nonzero weight and thus influence on the local analysis. According to Hunt et al. (2007), the observations used for two neighboring analysis regions should overlap significantly to ensure that the weights are similar and a smooth analysis is produced. Except for very small localization radii, this was ensured in the experiments.

The observations inside each observation region are weighted according to their distance to the analysis point. These weights are applied by Schur multiplying the inverse of the observation covariance matrix by a matrix constructed from a correlation function (see Hunt et al. 2007).

We examine the effect of two localization techniques—DL and OL—which are characterized by their weighting functions. DL was formulated without explicit weights to the observations (see e.g., Houtekamer and Mitchell 1998; Nerger et al. 2006), but implicitly the weights

are used. Here, *l* is a predefined cutoff radius. This weighting corresponds to a unit weight inside an observation domain and zero outside.

For OL, a fifth-order polynomial [Gaspari and Cohn 1999, Eq. (4.10)] is used for weighting the observations. This function is very popular because its shape is similar to the probability density function of a normal distribution but has compact support. The equations can be written as

with

The scheme OL is the current standard for localization in the LETKF (e.g., Miyoshi and Yamane 2007). DL is an older formulation (see e.g., Houtekamer and Mitchell 1998; Nerger et al. 2006) and nowadays it is unusual to use DL, because OL yields better assimilation performance. However, the constant observation weights allow us to investigate the influence of localization without considering the effects of varying weights. If the results from DL are then compared to the variable weight functions of OL, the basic properties of localization become clearer.

## 3. Configuration of numerical experiments

The numerical experiments are performed with the Lorenz-96 model (Lorenz 1995) and a shallow-water model. Although being rather simple, both models exhibit strong nonlinear behavior. For the Lorenz-96 model this was described in (Lorenz 1995). The shallow-water model configuration used here can develop strongly nonlinear dynamics in the form of a meandering zonal jet and associated eddies (see Krysta et al. 2011). Since the dynamics of the models are distinct, the comparison of the results from both models provides insight to which extent the localization behavior is independent of the model.

### a. Experiments with the Lorenz-96 model

The characteristics of the localization techniques are first investigated with twin experiments using the 40-dimensional Lorenz-96 model (Lorenz 1995). For the twin experiments, the initial condition *X* ∈ ℝ^{40} with *X*_{20} = 8.008 and *X*_{j} = 8 for all *i* ≠ 20 is first integrated for 1000 time steps by using the classical forth-order Runge–Kutta scheme with a time step of 0.05. By integrating the model for another 5000 time steps, a trajectory is obtained that represents the truth. The observations are generated by adding Gaussian distributed random numbers with unit variance and zero mean to the truth. All grid points are observed. The observation error covariance matrix is chosen to be diagonal with the variance of the observation error on the diagonal. A constant inflation factor of *ρ* = 1.05 is used to inflate the background covariance matrix.

The initial ensemble is generated by second-order exact sampling (Pham 2001) from a model run over 10 000 time steps. The ensemble size is varied between 5 and 28. Localization radii between 0 and 20 are used for the experiments with DL, while for OL localization radii from 0 to 50 are used. All experiments are repeated 10 times with different random numbers for the ensemble initialization and observations. The ETKF as implemented in the Parallel Data Assimilation Framework (PDAF; Nerger and Hiller 2013, http://pdaf.awi.de) is used for the experiments.

For evaluating the assimilation performance, the root-mean-squared error, averaged over the assimilation times and the repetitions is used. This quantity will be denoted as MRMSE.

### b. Experiments with the shallow-water model

A 2D model using the shallow-water equations (see Krysta et al. 2011) is used to assess the localization in case of a multivariate model. A detailed review of the model is given in the appendix. The model is calculated on a regular square grid with 25-km resolution. At each grid point, the sea surface height (*h*), the horizontal (*u*), and the vertical velocities (*υ*) are defined. The state vector has 19 380 elements, of which only the sea surface height is observed in the experiments. Both fully observed *h* and partial observations of *h* are considered in the experiments. For the partial observations, every second and every third point in both directions is observed.

The experiment is initialized by integrating the initial state *h* = 500 m and *u* = *υ* = 0 m s^{−1} for 15 yr. The first 5 yr are used to spin up the model state. A sample of every second day from year 6 to 15 is used to initialize the ensemble through second-order-exact sampling. Synthetic observations are generated from the sea surface height with zero mean and constant variance of 2 m^{2}. The observation errors are assumed to be uncorrelated and are assimilated once a day.

A local analysis is calculated for every single grid point. The influence region for the observations is a circle of radius *l* around the analysis location. The weighting is applied according to the Euclidean distance. For the experiments, localization radii between 20 and 350 km with a step size of 10 km and ensemble sizes from 5 to 40 are used.

The inflation factor is set to *ρ* = 1.08. It is tuned so that the estimated and true errors are in the same order of magnitude for several converged configurations. Thus, is not tuned to achieve the minimal error, but such that the following results do not depend on the choice of the inflation factor. For the experiments, the same configuration of PDAF as in section 3a was used.

To compare the analysis quality of the different experiments, the root-mean-squared error (RMSE) of the height field *h* is examined.

## 4. Localization behavior with the Lorenz-96 model

### a. Optimal localization radius for DL

Figure 1 shows in the top row the MRMSE for all considered parameter values *N* and *l* for DL. The parameter region can be clearly divided into diverged and converged results. An experiment is defined as divergent, if the MRMSE of an experiment is larger than the observation error. For every ensemble of less than 21 members, filter divergence occurs when a certain localization radius is exceeded (e.g., *l* = 4 for *N* = 5). In the following, this radius is denoted by *l*_{div}.

For a constant localization radius, increasing the ensemble size reduces the MRMSE. However, after the most information content from the observations is extracted, very little error reduction is gained (e.g., for *N* > 14 for *l* = 7).

If the ensemble size is kept constant and the localization radius is increased, the error shrinks until an optimal localization radius, denoted by *l*_{opt}, is reached. Increasing *l* beyond this radius deteriorates the assimilation results and filter divergence can occur.

In the top panel of Fig. 2, *l*_{opt} and *l*_{div} as functions of the ensemble size *N* are shown for DL. The optimal value for *l*_{opt} is always close to *N*/2. Filter divergence occurs approximately if the localization radius, measured in grid points, exceeds the number of ensemble members. As long as in a local analysis not all observations are used, *l*_{opt} and *l*_{div} depend linearly on the ensemble size. For DL, the behavior changes if the ensemble size is big enough so that the filter converges without localization. In this case, filter divergence does not occur anymore and the global filter produces the best results.

### b. Optimal ensemble size for OL

For OL, the MRMSE for various localization radii and ensemble sizes is also divided into regions where the filter diverges or converges (Fig. 1, bottom). The assimilation converges as long as *l* is only slightly bigger than 2*N*. Compared to DL, the convergence region in case of OL is enlarged approximately by a factor of 2. A similar relationship holds for the optimal localization radius. Since more observations are assimilated, the best assimilation results for OL are more accurate than the ones for DL, even with less ensemble members. As expected, the observation weighting of OL results in a smaller error with a minimum MRMSE = 0.1883 compared to MRMSE = 0.1901 in case of DL.

The bottom panel of Fig. 2 shows that the relationship between the optimal localization radius *l*_{opt} and the ensemble size *N* is also linear. However, with OL longer localization radii can be used than with DL. The behavior of the optimal localization radius for *N* > 20 is not representative for OL. The reason is that *l*_{opt} is bounded by the largest tested localization radius. Thus, for *N* > 20 *l*_{opt} is likely to be larger than the radii tested here.

### c. Sampling quality of the covariance matrix

The localization implicitly modifies the state covariance matrix. Here, it is examined how well the true covariance matrix is approximated with localization. The results are shown for a single ensemble size (*N* = 16), but also hold for other choices.

The true covariance matrix ^{t} is generated from a twin experiment using an ensemble Kalman filter with an ensemble size of 128. Since the ensemble is significantly larger than the state dimension, this covariance matrix should be close to the truth.

At the end of the assimilation experiment, the normalized difference between the true covariance matrix and the analysis ensemble covariance matrix

is compared in the Frobenius norm ||⋅||_{F}. Here, the matrix denotes the ensemble covariance matrix calculated from an assimilation experiment with the localization radius *l* using the LETKF with OL.

In the local filter, not all elements of the covariance matrix are used. To take this into account, we define the matrix _{l} as the matrix with all elements (*p*)_{ij} set to zero that correspond to long distances beyond the localization radius, that is,

The quantity *δ*_{l} is then defined as

In Fig. 3, *δ* and *δ*_{l} are plotted for the case of OL for *N* = 16 over all localization radii. Both curves show small errors in the covariance estimates as long as *l* < 13. Increasing *l* beyond 13 worsens the estimation of the covariances. If only the observation at each analysis grid point is used (*l* = 0), the estimates of the variance are even worse than in the case when all observations are assimilated at once. Despite this, the state estimation with *l* = 0 is improved over the global filter (see Fig. 1). The smallest error is obtained for the localization radius *l* = 11. This is consistent with the optimal localization radius in section 4a. For *l* > 14 the assimilations become unstable until divergence happens.

Compared to the global estimate , the error of the local estimate is always smaller for all localization radii. This shows that the neglected covariances are noisy and therefore it is beneficial to omit those noisy parts. For *l* between 3 and 11 the error of the local approximation has roughly the same smallest value. In this interval, the covariances are gradually improved by increasing the localization radius. The interval becomes narrower if a smaller ensemble is used. Thus, it becomes more difficult to find the optimal localization radius. Overall, this experiment shows that a good choice of the localization radius improves the estimate of the covariance matrix .

### d. Relation between domain and observation localization

Domain and observations localization differ only in their weight functions. To relate the localizations of DL and OL, we define an effective observation dimension for an assimilation experiment as the sum of the local weights used to compute the analysis, that is,

where *p*_{l} is the number of observations in each local region, *l* is the localization radius, and *k* is the localization type (OL or DL). Thus, the effective observation dimension takes not only into account the number of observations but also the weights given to the observations. Because in the experiments the observations have uniform density, the effective observation dimension is identical for all grid points. It follows directly from the definition in Eq. (10) that for DL the effective observation dimension is equal to the number of observations. In Fig. 4, *d*_{W} is plotted for the optimal and divergence localization radii for both DL and OL. The optimal effective observation dimensions are in good agreement for ensemble sizes below 16 with a difference of at most 1. For 16 ≤ *N* ≤ 20 the difference gets slightly bigger. Only values up to *N* = 20 are shown, because, as noted in section 4b, the effective observation dimension for OL is bounded by the considered localization radii for *N* ≥ 20.

The effective observation dimension where divergence occurs (bottom of Fig. 4) is nearly equal for *N* < 9 for DL and OL. Above *N* = 9, the observation dimension where the analysis with OL diverges is slightly smaller than the one for domain localization. Yet, the trend for the two functions is still similar. Above *N* = 17, the filter with OL converged for all considered localization radii. The behavior of the curves is also similar if an exponential weight function is used (not shown). Overall, decreasing the weight of the observations, does not constrain the ensemble as strongly anymore and the number of observations that can effectively assimilated is increased.

## 5. Localization with the shallow-water equations

In this section the localization experiments are repeated using a model with different dynamics, to examine whether similar results are obtained. In addition, the shallow-water model is multivariate, so an additional degree of complexity is introduced.

The MRMSEs for the experiments with the shallow-water model (see Fig. 5) are qualitatively similar to the ones for the Lorenz-96 model. The ability of the filter to handle more observations with increasing ensemble size is clearly visible (e.g., the step from *l* = 70 km to *l* = 80 km for *N* = 8 to *N* = 9) for DL (Fig. 5, top). Compared to the experiments with the Lorenz-96 model, the convergence region is not increasing uniformly with growing ensemble size. This is due to the nonuniform increase of the number of observations in the local domains because the domain is two-dimensional. The smallest errors for the considered ensemble sizes are achieved for localization radii between 80 and 100 km. If *l* is increased beyond this value, the analysis quality is degraded. For OL (Fig. 5, bottom), the methods behave more uniformly, since the weighting of the observations allows a smoother increase of the observation dimension. This leads to an almost linear increase of the optimal localization radius for *N* ≤ 14.

For OL, the convergence region is almost twice as large compared to DL. This occurs because the weight of distant observations is decreased so that more observations can be assimilated in a beneficial way. As a consequence, the errors are also slightly reduced. The smallest MRMSE = 0.27 is obtained with a localization radius between 190 and 210 km and the largest investigated ensemble size.

In Fig. 6, the effective observation dimensions for the experiments are shown. For DL, the optimal observation dimension *l*_{opt} is nearly a step function. This means that a much bigger ensemble is needed to assimilate the stepwise increase of observations in an optimal way. This effect does not occur for OL where the optimal observation dimension is growing at a slower rate. For *N* = 15 and *N* = 28, the optimal observation dimension for DL and OL are almost the same. In between, the optimal observation dimension increases about linearly for OL compared to the sudden step for DL. The optimal value for the effective observation dimension is slightly smaller than the ensemble size *N* for OL, and depends linearly on the ensemble size.

For the effective observation dimension *l*_{div} at which the filter diverges, the behavior is slightly different. Divergence occurs for both weighting functions for nearly the same effective observation dimension. Again, the dependence on *N* is smoother for OL than for DL.

The optimal localization radii for the unobserved *u* and *υ* fields are almost equal to the optimal localization radius for the height field. There is only a minor difference for DL, when the local number of observations is heavily increased (e.g., *l* = 70 to *l* = 80 km). At this point the optimal localization radius is a bit smaller for the *u* and *υ* fields than for the *h* field (not shown).

For DL, the slopes of *l*_{opt} and *l*_{div} as functions of the ensemble size are reduced compared to the experiment with the Lorenz-96 model. Nevertheless, the effective observation dimensions for DL and OL are very similar, thus the degrees of freedom for both methods are very close to each other.

If the observation density is reduced, the optimal effective observation dimension still depends linearly on the ensemble size (see Fig. 7). The smaller the observation density, the smaller the optimal effective observation dimension becomes. Thus, if not the whole field is observed, the optimal localization radius has to be normalized by the observational density. This especially becomes an issue, if the spatial distribution of the observations is not regular. This case will be examined in future studies.

Figure 5 also allows us to estimate the optimal localization radius as a function of the ensemble size. The relationship is approximately

where *dx* denotes the grid spacing. At this localization radius, the effective observation dimension is approximately equal to the ensemble size. This relation should hold in general for dense observations that are distributed in two dimensions and a regular orthogonal model grid.

## 6. Localization in a global ocean model

The experiments discussed above indicate that an optimal localization radius is obtained when the effective observation dimension is approximately equal to the ensemble size. To assess whether this localization can be applied in a realistic large-scale model, we apply it here in twin experiments using a global configuration of the finite-element sea ice ocean model (FESOM; Danilov et al. 2004; Wang et al. 2008; Timmermann et al. 2009). The twin experiments are similar to an application of FESOM by Janjić et al. (2012) where real satellite dynamic ocean topography data were assimilated.

### a. Experimental setup

FESOM is an ocean general circulation model that utilizes finite elements to solve the hydrostatic ocean primitive equations. Unstructured triangular meshes are used, which allow for a varying resolution of the mesh. The configuration used here has a horizontal resolution of about 1.3° with refinement in the equatorial region. The model uses 40 vertical levels.

For the data assimilation, FESOM was coupled to the assimilation framework PDAF (Nerger et al. 2005; Nerger and Hiller 2013; http://pdaf.awi.de) into a single program. The state vector includes the sea surface height (SSH) and the three-dimensional fields of temperature, salinity, and the velocity components. The state vector has a size of about 10 million. For the twin experiments, the model is initialized from a spinup run and a trajectory over 1 yr is computed. This trajectory contains the model fields at each tenth day and represents the “truth” for the assimilation experiments. An ensemble of 32 members is used, which is generated by second-order exact sampling from the variability of the true trajectory (see Pham 2001). The initial state estimate is given by the mean of the true trajectory. Pseudo observations of the SSH at each surface grid point are generated by adding uncorrelated random Gaussian noise with a standard deviation of 5 cm to the true model state. The analysis step is computed after each forecast phase of 10 days with an observation vector containing about 68 000 observations. Overall, the experiments were conducted over a period of 360 days.

The experiments use the ETKF with OL. Two experiments with fixed localization radii of *l* = 500 km and *l* = 1000 km are performed. A third experiment uses the localization radius determined such that the effective observation dimension is equal to the ensemble size. The inflation factor was set to *ρ* = 1.1.

### b. Assimilation performance

Figure 8 shows of the RMS errors of the sea surface height over time relative to an experiment without data assimilation for the three experiments. For the fixed radius of *l* = 1000 km, the relative RMS error is quickly reduced below 0.5, but increases again after day 150. The relative RMS errors for the fixed radius of 500 km and the experiment with the localization radius based on the effective observation dimension are similar and the errors generally decrease over time. However, the variable localization results in smaller RMS errors than the fixed localization radius. In the second half of the experiment, the RMS errors obtained with the variable localization radius are even smaller than those for the fixed localization radius of 1000 km.

Overall, the experiments show that the effective observation dimension can be used to specify a spatially varying localization radius that yields estimates of similar quality than those produced by a fixed radius. However, while the fixed radius has to be tuned with several experiments, this is not required for the variable radius.

## 7. Conclusions

In this study, the optimal value for the localization radius in domain localization and observation localization was examined using numerical experiments. Using the Lorenz-96 model and a nonlinear shallow-water model allowed for the assessment the localization behavior with two simple nonlinear models with different dynamics. The main focus was on dense observations with uniform observational error, which are used in real assimilation applications (e.g., as gridded satellite observations of the ocean surface temperature or sea surface height). For this type of observations, it was possible to assess the relation of the localization radius to the ensemble size over the whole model domain.

The localization radius is optimal if the estimation errors are minimal. It depends on the ensemble size and varies for different weight functions. Typically, the optimal radius is determined by experimentation. Yet, one can define an effective observation dimension given as the sum of the observation weights involved in a local analysis. The optimal localization radius was obtained, if the effective observation dimension was about equal to the size of the ensemble. Moreover, the optimal value of the effective observation dimension is constant for different weighting functions. This situation can be explained by the fact that the degrees of freedom for the analysis are determined by the rank of the ensemble. The degrees of freedom are optimally utilized if the ensemble size equals the effective observation dimension. In the case of constant observation errors, the degrees of freedom are distributed over different numbers of observations for different weight functions. If the observation network is less dense, other effects, like sampling error for distant observations, become more important so that this relation is weaker. For multivariate data assimilation in the shallow-water model, the optimal effective observation dimension was the same for all three model fields. If the observation density is reduced, the linear relation in the shallow-water model was still conserved, but the slope was different. For both models, the optimal value of the effective observation dimension was roughly equal to the ensemble size if a field was completely observed. For dense observations that are distributed in two dimensions, a simple relation between the ensemble size and the optimal localization radius was deduced from the experiments. This relation can be used to define an adaptive localization radius that ensures that the effective observation dimension is equal to the number of ensemble members. The relation was tested using a global ocean model where synthetic observations of the sea surface height were assimilated. With the adaptive localization, without tuning, a similar error reduction as using an optimally tuned fixed localization radius was achieved. This study leads to a simple relation between the ensemble size and the localization radius that should result in the minimal analysis errors of the observed field for dense observations. However, in real applications the localization radius can be influenced by other factors. For example, it is known that localization influences balances in the model state and a longer localization radius will have a smaller impact on the balances. Accordingly, one might prefer a longer localization radius in multivariate assimilation applications. In addition, the study only considered twin experiments. When assimilating real observations one can encounter biases and the observation error covariance matrix might be incorrectly estimated. It is unclear to which extent these factors can require the adaption of the localization radius to obtain overall optimal assimilation results.

In the experiments, the optimal localization length was not influenced by the model properties. Thus, while different fields in a model can have different correlation length scales, this does not seem to influence the optimal localization radius. A reason for this finding might be that the optimal localization radius for dense observations is rather short. For example, the optimal radius was eight grid points in the shallow-water model for the largest tested ensemble of 40 members. In combination with the weighting by observation localization, observations have only an influence over a distance of a few grid points. This distance should be short enough to effectively remove spurious correlations when the real correlations are very short ranged. If the true error correlations are significant over a long range, at some point they can no longer influence the analysis, because of the limited degrees of freedom provided by the ensemble. Since it is well known that long-range correlations are not well approximated with small ensembles, this might be desirable. Nevertheless, the relation between the optimal localization radius and the physical error correlation should be further investigated.

The findings of this study hold for dense observations with uniform observation errors and spatially constant inflation. The experiments with lower observation density indicate that the chosen effective localization dimension has to be smaller in this case, to account for the lack of information. This effect might be related to the sampling quality of the ensemble-estimated state error covariance matrix. When observations with spatially varying error variances and varying spatial distribution are assimilated, the global measurements of this study are no longer possible. One can expect that observations with different error variances show a varying influence on the analysis step that should be accounted for in the localization, perhaps by information-based methods (e.g., Migliorini 2013). These aspects will be investigated in a future study.

## Acknowledgments

The authors thank the reviewers and the editor for their helpful comments. This work was funded by the SANGOMA EU Project (Grant FP7-SPACE-2011-1-CT-283580-SANGOMA).

### APPENDIX

#### The Shallow-Water Equations

The shallow-water model used in section 4 is similar to that used in Krysta et al. (2011). For completeness, the equations are given here. This two-dimensional model consists of the horizontal and vertical velocities (*u*, *υ*) and the water height *h*. The model equations are

The model domain is chosen as the square domain [0, *L*] × [*y*_{0} − *L*, *y*_{0} + *L*] with length *L* = 2000 km and *y*_{0} = 0. The Coriolis parameter *f* is approximated by a *β*-plane approximation:

where *β* = 2 × 10^{−11} m^{−1} s^{−1}. The variable *g** denotes the reduced gravity, *ρ*_{0} is water density, *ν* is diffusivity friction, and *r* is the bottom friction coefficient. The system is driven by a wind stress ** τ** = (

*τ*

_{x},

*τ*

_{y})

^{T}, which is given by

*τ*

_{x}(

*y*) =

*τ*

_{0}cos[2

*π*(

*y*−

*y*

_{0})/

*L*] and

*τ*

_{y}= 0. The constants are chosen as

*f*(0) = 7 × 10

^{−5}s

^{−1},

*g** = 0.02 m s

^{−2},

*ρ*

_{0}= 10

^{3}km

^{−3},

*τ*

_{0}= 0.015 N m

^{−2},

*r*= 5 × 10

^{−9}s

^{−1}, and

*ν*= 9 m

^{2}s

^{−1}.

The domain is discretized on a regular Arakawa C grid with 25-km resolution in both directions. For the boundary, a no-slip condition is used (i.e., *u* = *υ* = 0). As a time stepping method, a leapfrog scheme (Sadourny 1975) smoothed by the Robert–Asselin filter (Robert 1966) with *α* = 0.01 and Δ_{t} = 30 min is used.

## REFERENCES

*J. Geophys. Res.,*

**117,**D24101, doi:.

*Proc. Seminar on Predictability,*Reading, United Kingdom, ECMWF, 1–18.

**139,**2033–2054, doi:.

*Use of High Performance Computing in Meteorology—Proceedings of the 11th ECMWF Workshop,*Reading, United Kingdom, ECMWF,

*J. Geophys. Res.,*

**113,**C05015, doi:.

*J. Geophys. Res.,*

**112,**D17107, doi:.