## Abstract

This work assesses the large-scale applicability of the recently proposed nonlinear ensemble transform filter (NETF) in data assimilation experiments with the NEMO ocean general circulation model. The new filter constitutes a second-order exact approximation to fully nonlinear particle filtering. Thus, it relaxes the Gaussian assumption contained in ensemble Kalman filters. The NETF applies an update step similar to the local ensemble transform Kalman filter (LETKF), which allows for efficient and simple implementation. Here, simulated observations are assimilated into a simplified ocean configuration that exhibits globally high-dimensional dynamics with a chaotic mesoscale flow. The model climatology is used to initialize an ensemble of 120 members. The number of observations in each local filter update is of the same order resulting from the use of a realistic oceanic observation scenario. Here, an importance sampling particle filter (PF) would require at least 10^{6} members. Despite the relatively small ensemble size, the NETF remains stable and converges to the truth. In this setup, the NETF achieves at least the performance of the LETKF. However, it requires a longer spinup period because the algorithm only relies on the particle weights at the analysis time. These findings show that the NETF can successfully deal with a large-scale assimilation problem in which the local observation dimension is of the same order as the ensemble size. Thus, the second-order exact NETF does not suffer from the PF’s curse of dimensionality, even in a deterministic system.

## 1. Introduction

Data assimilation (DA) refers to the combination of predictions from numerical models with real-world observations. The resulting state estimates are of high relevance, particularly in atmospheric or oceanic modeling (e.g., Bennett 2002). Its application in model initialization represents an essential contribution to forecast quality. However, DA also allows one to consistently reconstruct the system state over extended time periods, and the resulting reanalyses are an extremely valuable data source for climate studies and diagnostics (e.g., Bengtsson et al. 2007). Additionally, systematic model or observation errors can be diagnosed from DA output (e.g., Haimberger 2007).

In principle, the DA problem is entirely determined by Bayes’s theorem (e.g., Wikle and Berliner 2007). However, a major challenge in geophysical applications (e.g., van Leeuwen 2010) consists of the high dimensionality of the involved state and observation spaces, together with the lack of knowledge about the involved probability density functions (pdfs). This work focuses on sequential, ensemble-based square root filters that iteratively improve the state estimate throughout the assimilation window. They perform an ensemble prediction during the forecast step and update the prior ensemble with the current observations in the analysis step. Compared to variational methods (e.g., 4DVAR; Talagrand and Courtier 1987), which fit a whole trajectory, these filters are conceptually attractive and easy to implement since no tangent linear and adjoint models are required (Kalnay et al. 2007). Particle filters (PFs) offer a direct Monte Carlo solution to Bayes’s theorem without a parametric assumption on the forecast pdf (Doucet et al. 2001). However, the likelihood weights are computed in a high-dimensional probability space in which the region of significant probability density is extremely narrow. Consequently, many ensemble members obtain insignificant weight, which leads to filter divergence (Snyder et al. 2008). This is known as the curse of dimensionality (Silverman 1986).

The ensemble Kalman filter (EnKF; Evensen 1994; Burgers et al. 1998) avoids this issue by assuming Gaussian distributions, where the required mean and covariance of the state are directly estimated from the forecast ensemble. Over the past two decades, the EnKF has evolved to a robust scheme that is applicable to large-scale systems with small ensemble sizes, such as in numerical weather prediction (e.g., Reich et al. 2011; Miyoshi and Kunii 2012) or oceanography (e.g., Nerger et al. 2007; Losa et al. 2012). Deterministic variants such as the (local) ensemble transform Kalman filter [(L)ETKF, Bishop et al. 2001; Hunt et al. 2007] avoid sampling noise in the analysis step by applying a matrix square root transform. However, the implicit Gaussian assumption leads to a linear update mechanism and renders the analysis suboptimal in nonlinear systems (Lei and Bickel 2011). Consequently, there is broad research activity toward enabling the applicability of nonlinear filters in high dimensions (e.g., van Leeuwen 2009). A more recent development is the equivalent weights PF (EWPF; van Leeuwen 2010). It explores proposal densities to ensure that the particles lie in the important region and exhibit small variability of the weights, which allows for applicability to large-scale DA (van Leeuwen and Ades 2013; Ades and van Leeuwen 2015). However, the EWPF only works in models with stochastic model errors and requires an adequate adaption of the forecast step to properly guide the particles (Ades and van Leeuwen 2013), which increases the implementation complexity.

While the EWPF aims at considering the full analysis pdf, approximations to fully nonlinear filtering have been suggested as well (van Leeuwen 2009). This work builds upon Tödter and Ahrens (2015), who introduced the nonlinear ensemble transform filter (NETF). It is based on the nonlinear ensemble adjustment filter (NLEAF; Lei and Bickel 2011), which updates each member with perturbed observations. In contrast, the NETF applies a matrix square root transform, similar to the ETKF, to ensure that the analysis mean and covariance exactly match the Monte Carlo estimates of the Bayesian expectations. To overcome the curse of dimensionality inherent to most PFs, the effective dimensionality is reduced by localizing the analysis as done in the LETKF. The NETF only acts in the analysis step, which allows the use of deterministic models. The empirical findings resulting from assimilation experiments with ensemble sizes not larger than 100 (Tödter and Ahrens 2015) point out that it produces reasonable and stable analyses, even in higher-dimensional and chaotic systems with state dimensions up to 10^{3}. In these experiments with simplified models, the NETF outperforms the stochastic EnKF and the (L)ETKF in the presence of nonlinearity. Additionally, it improves upon the stochastic NLEAF in larger-dimensional cases because of the deterministic update mechanism. Thus, the NETF exhibits potential applicability to large-scale DA.

Motivated by these results, this work assesses the NETF’s performance in an advanced circulation system. This system is characterized by more realistic and higher-dimensional dynamics with a state dimension that is orders of magnitudes larger than studied before. For that purpose, we apply the Nucleus for European Modelling of the Ocean (NEMO v3.3; Madec 2012), a state-of-the-art ocean general circulation model (OGCM), within an identical twin experiment that includes a realistic observation scenario and a challenging filter initialization. The principal objective of this paper is to demonstrate that the generic NETF is applicable to high-dimensional DA, even though it only relies on the Bayesian weights in the analysis step. In particular, no proposal densities, as in the EWPF, or other modifications need to be used. Furthermore, a comparison with the LETKF reveals aspects that influence the filter’s performance. This allows us to gain more insight into its behavior and to enable future improvements.

The remainder of this paper is organized as follows. Section 2 contains a brief review of ensemble square root filters and establishes the formal analogy of NETF and ETKF. In section 3, the configuration of the ocean model and its circulation characteristics are presented. Section 4 specifies the observation scenario and filter setup. The results of the assimilation experiments are shown and discussed in section 5. Finally, section 6 draws the principal conclusions and outlines potential continuative research paths.

## 2. Ensemble square root filtering

In the following work, we exclusively concentrate on the analysis step and, therefore, neglect any time indices. We consider a dynamical system , and the state vector of size *d* contains the prognostic model variables. We assume that an ensemble of *m* independent and identically distributed prior states is available. Usually, it arises from a preceding forecast step and can be regarded as the best representation of the forecast pdf, (i.e., before assimilating the new observation). The ensemble vectors are stored in the columns of the matrix . Subtracting the prior ensemble mean, , from each column yields the ensemble perturbation matrix . The observation, represented by the vector of size *k*, contains all measurements available at that time. The observation operator maps any model state into observation space. We define the matrix that contains the mapped ensemble vectors [i.e., ]. The observational uncertainty is represented by the likelihood pdf, , which may be of arbitrary form in the NETF. In this work, we restrict the likelihood to a Gaussian distribution with covariance {i.e., }.

As the prior pdf is usually unknown, approximate solutions have been developed, thus leading to a large variety of filters and smoothers. Second-order exact filters aim at an analysis ensemble such that its mean, , and covariance, , exactly match some specified values. Both the ETKF and NETF can be described by the same square root filtering framework (Nerger et al. 2012). The analysis ensemble, , incorporates the observation by updating the prior mean and perturbations as follows:

The mean analysis increment is a linear combination of the prior perturbations, and it is determined by a weight vector , while the matrix , typically a matrix square root, transforms into analysis perturbations, . The optional random matrix constitutes a rotation in the ensemble subspace. The weight vector and transform matrix depend on the prior ensemble and observational properties. Their specific forms in the ETKF and NETF are presented next.

### a. Linear filtering and the ETKF

The Kalman filter (KF; Kalman 1960) assumes that the prior as well as the likelihood pdf are Gaussian. Then, according to Bayes’s theorem, the analysis pdf remains Gaussian, and its mean and covariance can be computed analytically from the prior moments. Ensemble Kalman filters extract the prior moments from an ensemble integration. This partly accounts for the nonlinear nature of the model (Anderson 2012). Stochastic EnKFs (Burgers et al. 1998; Houtekamer and Mitchell 1998) introduce additional sampling errors (Whitaker and Hamill 2002). In contrast, deterministic EnKFs (Tippett et al. 2003) transform the prior ensemble such that its first two moments exactly match the theoretical KF values. The ETKF (Bishop et al. 2001) is determined by the update applied with Eqs. (1) and (2) if the weight vector and transform matrix are chosen as follows:

Following Wang et al. (2004), the unique symmetric square root should be chosen to compute the transform matrix in Eq. (4).

### b. Nonlinear filtering and the NETF

The Gaussian assumption in the EnKFs leads to a robust solution, but the resulting filters are suboptimal in nonlinear systems. In contrast, PFs (Doucet et al. 2001) offer a nonparametric solution of the assimilation problem. The basic importance sampling PF (Gordon et al. 1993) interprets the prior ensemble as a mixture of delta distributions that approximate the prior pdf [i.e., for an equally weighted ensemble]. Using this particle representation, Bayes’s theorem states that the analysis pdf is approximated as

The analysis weights contain all the information extracted from the observation. For Gaussian observation errors,

While the EWPF (van Leeuwen 2010) attempts to sample the full analysis pdf, the NETF (Tödter and Ahrens 2015) represents a generic, yet approximative solution to nonlinear filtering. It applies a deterministic update mechanism in analogy to the ETKF such that the first two moments of the analysis ensemble exactly match the unbiased PF estimates of the Bayesian mean and covariance. The analysis mean and perturbations of the NETF are computed with Eqs. (1) and (2) by using the Bayesian weights defined in Eq. (5) and the following transform matrix:

Here, the matrix is the diagonal matrix formed by the weights. As in the ETKF, the unique symmetric square root is used in Eq. (8). For stability reasons, the random rotation with [see Eq. (2)] is mandatory in the NETF. Details on the theoretical background of the NETF can be found in Tödter and Ahrens (2015) and Tödter (2015).

The ETKF and NETF are described by update equations of identical form. They only differ in the explicit equations of the weight vector and the transform matrix, thus allowing simple implementation of the NETF. Both filters perform the update in the ensemble subspace, and their computational expenses are similar for a given ensemble size.

### c. Localization and inflation

In large-scale applications, most ensemble filters require modifications because the ensemble size is usually much smaller than the system’s dimension (e.g., Anderson 2012). Observation localization (OL; Hunt et al. 2007) suppresses spurious correlations associated with distant locations and increases the rank of the analysis covariance by partitioning the state vector into subsets, the so-called local domains. Typically, a local domain contains all state variables at each grid point or in each vertical column (e.g., Houtekamer and Mitchell 1998; Losa et al. 2012). Each local domain is updated independently using only a part of the global observation vector. Typically, nearby observations are included by choosing an appropriate localization radius. The overlapping observation regions of different local domains ensure that the covariances in the ensemble covariance matrix are taken into account. Mathematically, the observation impact is reduced with distance by multiplying by an appropriate correlation function through the use of a Schur product (Whitaker and Hamill 2002; Kirchgessner et al. 2014). This term appears in Eqs. (3)–(4) and (6) for the LETKF and NETF, respectively. Therefore, OL can be directly adopted for the NETF. It reduces the effective dimension of the observation space, where the Bayesian weights are computed. This counteracts filter divergence in higher dimensions. A detailed algorithm for the localized NETF is given in Tödter and Ahrens (2015, their section 4f).

Inflation is usually applied to counteract the tendency of the ensemble to underestimate the uncertainty. Multiplicative inflation adjusts the prior ensemble perturbations by . This increases the prior covariance by a factor of *γ*, which is slightly larger than 1 (Anderson and Anderson 1999). The localization radius and inflation factor constitute the main tuning factors for most ensemble filters. They allow the EnKFs to achieve results that are competitive to established variational schemes (e.g., Buehner et al. 2010; Fairbairn et al. 2014), and they can be used similarly in the NETF.

## 3. Ocean model and its configuration

This section gives an overview of the ocean model applied in our experiment. Additionally, we discuss the characteristic circulation that emerges from our model setup.

### a. Model characterization

The NEMO model (Madec 2012, http://www.nemo-ocean.eu) contains an OGCM that numerically solves the primitive equations that determine the ocean’s dynamics and thermodynamics. Its prognostic variables are temperature *T*, salinity *S*, zonal and meridional fluid velocity (), as well as sea surface height (). Processes that occur on a subgrid scale, such as turbulence or convection, are parameterized. For example, the diffusive fluxes are described by second-order closure schemes that use the gradients of the large-scale fields together with associated eddy coefficients. The NEMO model applies a leapfrog time stepping scheme for all nondiffusive terms in conjunction with a Robert–Asselin time filter. For the lateral diffusive and damping parts of the equations, a forward scheme is used, while in the vertical direction, an implicit scheme is required. The spatial discretization applies second-order central finite differences on a curvilinear Arakawa C-type grid.

Boundary conditions are required to determine the fluxes of momentum, mass, and energy at the ocean interfaces. At the bottom, usually all fluxes are assumed to vanish. In contrast, the fluxes at the upper boundary are derived from atmospheric forcing fields by using bulk formulations or by applying analytical prescriptions. The lateral boundary conditions depend on the specific model setup.

### b. Model configuration and setup

The aim of this work is to demonstrate the large-scale applicability of the NETF in principle. Therefore, we chose a simplified configuration of the NEMO model with a wind-driven ocean in a closed basin. It idealizes the ocean circulation that is representative for the midlatitudes (e.g., the Gulf Stream in the North Atlantic). This configuration is an established test bed in oceanography (e.g., Carrier and Robinson 1962; Cosme et al. 2010; Lévy et al. 2010; Yan et al. 2014). The forcing induces a large-scale double-gyre circulation that is complemented by mesoscale eddies.

The model is applied to a closed, rectangular basin in the North Atlantic in an area ranging from 60° to 30°W and from 24° to 44°N. The horizontal grid employs a distance of 0.25°. This resolution corresponds to an eddy-permitting setup. This discretization results in grid points in the zonal and grid points in the meridional direction. In the vertical, 11 layers are defined with exponentially increasing thickness. The bottom is assumed to be flat and located at a depth of 5054 m. The leapfrog scheme is used with a time step of 15 min and a smoothing parameter of 0.1 for the Robert–Asselin time filter. External gravity waves are damped explicitly in the horizontal momentum equations according to Roullet and Madec (2000). The space variation of the lateral eddy coefficients is constrained to the horizontal, while the vertical ones are fixed. Here, the default values 1.2 × 10^{−4} m^{2} s^{−1} for momentum and 1.2 × 10^{−5} m^{2} s^{−1} for temperature are set. The parameterization of lateral mixing is realized by a biharmonic diffusion operator with an eddy coefficient of −8 × 10^{10} m^{4} s^{−1} for both momentum and temperature. At the bottom, linear friction with a drag coefficient of 4 × 10^{−4} m s^{−1} is prescribed, while the lateral boundaries are assumed to be frictionless.

The model is entirely forced by a prescribed zonal wind stress , which varies with latitude *ϕ*, but is constant in time *t* and longitude *λ* (Cosme et al. 2010):

Here, is the latitude at the southern boundary and is the latitude range of the domain. The forcing is symmetric with respect to in the domain center, where the zonal wind directs to the east between 29° and 39°. In the northern and southern parts, the wind blows to the west. No freshwater influx is considered, and therefore, salinity does not vary and is disregarded in all following considerations.

### c. Model initialization and truth run

We create a reference trajectory for the identical twin experiment, which is referred to as the truth, by initializing the model with an ocean at rest. Salinity remains constant at a prescribed value of 35.5 g kg^{−1}. Each vertical ocean column is initialized by the same temperature profile that corresponds to stratification typically observed in ocean climatologies (Chassignet and Gent 1991):

After initialization, the model is integrated forward for 75 years. One year is idealized to 360 days. The first 50 years are considered as the spinup phase toward reaching the model climatology. Afterward, the dynamic equilibrium of the model with respect to the applied forcing is reached (Cosme et al. 2010). The actual DA experiment is performed in year 75, while the years 51–74 allow us to estimate the model climatology.

Before turning to the actual assimilation problem, it is useful to gain an overview of the dynamical structure of the wind-driven ocean. For this purpose, we refer to the true initial state in year 75. Figure 1 shows the surface fields. The large-scale double-gyre circulation, which intensifies at the western boundary, is directly visible in the and *T* fields. In the west, the inhomogeneous zonal wind forcing leads to boundary current (see Fig. 1d). They support an eastward jet in the center, which is located around in Fig. 1c. However, the jet is subject to dynamic instabilities. Consequently, chaotic behavior can be observed that leads to a mesoscale flow besides the large-scale double-gyre circulation. It is characterized by eddies, which exhibit notable local differences in velocity, temperature, and , and they may also influence the large-scale flow (Holland 1978). This underlines the model’s high-dimensional dynamics, which is important concerning the aim of our study. More quantitatively, the dynamics of a system can be regarded as high dimensional if a considerable number of eigenvectors are necessary to explain most of its variability. The black line in Fig. 2 shows the amount of variability that can be explained for a model run over 4 years. This is derived from a multivariate singular value decomposition (Lermusiaux and Robinson 1999) of states that were sampled each second day. To explain 90% of the variability, about 200 eigenvectors are needed while 500 eigenvectors can explain 99%. These numbers are slightly smaller, but of the same order of magnitude as was found for data from atmosphere or atmosphere–ocean general circulation systems (Achatz and Branstator 1999; Achatz and Opsteegh 2003).

## 4. Experimental setup

Having shown the model and its circulation characteristics, the remaining task is to define an assimilation problem by simulating observations. Furthermore, we now describe and discuss the filter setup.

### a. Observation scenario

Even though our experiment employs a simplified ocean configuration with simulated observations, we create a situation that resembles a typical ocean assimilation problem. Cosme et al. (2010) and Yan et al. (2014) applied the NEMO configuration described above to investigate different DA algorithms. We closely follow their setup and simulate two types of observations with distinct spatial structures and densities.

First, *Environmental Satellite* (*Envisat*) observations (e.g., Durrant et al. 2009) are simulated on the flight tracks falling into our model domain during the year 2009. This yields on average 150 measurements at each analysis time. As the observation operator, a bilinear interpolation from the model grid to the observation grid is applied. The observation error, which summarizes the measurement and representivity errors, is modeled as uncorrelated Gaussian noise with a standard deviation of 0.06 m. Second, temperature observations mimic the Argo network (Carval et al. 2013) on a horizontal 3° × 3° grid that is shifted at each analysis time to reflect the nonstationarity of the profilers. Vertically, all Argo levels located below the first layer’s center are considered for each profile. The standard deviation of the observation error variance is 0.3°C, and the observation operator is given by trilinear interpolation.

The observations are assimilated every second day (i.e., 192 model time steps define one analysis cycle). Since the truth run covers one idealized year, 180 analysis steps are performed in total. Figure 3 visualizes the observation characteristics on day 8 (i.e., at the fourth analysis step). Figure 3a is a horizontal snapshot that shows both the *Envisat* locations and the Argo network available at this time. The observations along the satellite tracks are indicated by their color. Figure 3b gives an example of the temperature profiles along the line and their location in the vertical NEMO grid.

### b. Technical implementation

For the assimilation experiments, the NEMO model was coupled with the parallel DA framework (PDAF; Nerger and Hiller 2013, available online at http://pdaf.awi.de). In PDAF, numerous EnKF variants including the LETKF are already implemented and used for different applications (e.g., Losa et al. 2012; Fournier et al. 2013). Because of the similarity of the LETKF and NETF equations (see section 2), almost the same implementation is used. Only minor changes were needed to account for the different transform matrices and weight vectors. At an analysis time, each ensemble state vector () is constructed from the model fields (*T*, *U*, *V*, ) and mapped into observation space by interpolation (see section 4a), thus yielding . Localization is implemented as described in Nerger et al. (2006) by performing a local analysis for each vertical column. For each column, nearby measurements (see section 4c) are selected to create the local observation vector and the local part of . Then, a local analysis is computed as explained in section 2. Finally, all local analyses are accumulated and each analysis state vector is distributed into the model fields to initialize the next forecast phase.

Technically, NEMO has been extended to call PDAF directly from the model code, and it has been compiled into a single program that contains the full assimilation system (Tödter 2015, see chapter 6.4.1). Therefore, all parallel features that are already present in the PDAF library are directly usable. In addition, the same model or observation specific routines can be applied directly to both filters and the twin experiments can be conveniently performed.

### c. Filter setup

For our experiment, we apply the NETF algorithm as presented in section 2 without any model-specific modifications. A fixed multiplicative inflation as shown in section 2c is applied. We found that a factor of yields the best performance for the NETF (see section 5).

To localize the analysis, the vertical ocean columns are updated independently. A horizontal localization radius of 2.5° is used in conjunction with a fifth-order polynomial correlation function [Gaspari and Cohn 1999, their Eq. (4.10)] to reduce the observational influence with distance. This radius, which roughly corresponds to 250 km, arises from experiences in prior LETKF experiments in this setting and is in agreement with similar experiments and the statistical properties of the system (Cosme et al. 2010; Yan et al. 2014). In tuning experiments, we found that this radius also allows the NETF to achieve its best performance. However, in other applications, the optimal radius might differ for both filters. Table 1 provides some statistics for the observation dimensions from a global and local point of view. On average, at each analysis time, 3273 independent observations (mostly temperature data) are assimilated. The average local observation dimension is nearly 100, but it may reach up to 200. Thus, on average, the likelihood weights are evaluated in a 100-dimensional subspace.

The filter is initialized from model climatology. To allow an investigation of its influence on the filter performance, we prepared both a long- and short-term climatology by extracting a model state from the truth run every other month within the years 51–74 and 65–74, respectively. Table 2 summarizes the basic statistics of these climatologies. They exhibit a similar deviation from the true initial state, but the long-term climatology has significantly more spread, particularly in the temperature field. The ensemble of 120 members is constructed from the corresponding climatology via second-order exact sampling (Pham 2001). This guarantees that the ensemble contains the dominant error directions, but it could only explain at best 80% of the system’s variability (see Fig. 2). Additionally, following Table 2, we increased its spread by 25% to ensure that the truth lies within the ensemble range. The initial members contain mesoscale features, however, their mean only reflects the large-scale double-gyre circulation and the central jet, as visualized in Fig. 4 for the long-term climatology.

### d. Complexity of the assimilation problem

The twin experiment employs a state-of-the-art, complex circulation system, as used in ocean or atmospheric modeling, including advanced physical parameterizations. The global state vector consists of all prognostic variables of NEMO except salinity [i.e., *T*, *U*, *V* (three-dimensional fields), and (two-dimensional)]. Therefore, the global state dimension is . On the global scale, the system exhibits high-dimensional dynamics, as reflected by the black line in Fig. 2 and discussed in section 3c. The dynamics are characterized by the chaotic mesoscale flow. Each *local* region, as defined by the localization radius, contains nearly 300 water columns with about 1 × 10^{4} state variables. The gray line in Fig. 2 shows the average amount of eigenvectors needed to explain the system’s variance in a local region. It is smaller than for the global case, but the difference is not very pronounced. Thus, because of the eddies, the dynamics also exhibit considerably high variability on the local scale.

The number of and *T* observations at each analysis step is about 3300, which is one order of magnitude larger than the ensemble size of 120. Yet, the average *local* observation space dimension is of the same order as the ensemble size. This is a typical situation in large-scale ocean problems (e.g., Yan et al. 2014). For the NETF as a particle-based technique, it represents a major challenge. Ordinary PFs (with or without resampling) would require at least () members to assimilate 100 (200) independent observations (Snyder et al. 2008). However, in atmospheric DA applications, the local observation dimension may strongly exceed the ensemble size (e.g., Buehner et al. 2010). This remains a limitation of our study.

Finally, the initial ensemble chosen here does not contain information about the actual mesoscale flow. Hence, we not only assess the capability of the NETF to track the truth, but also to converge to it at first. In summary, our experiment considers a full three-dimensional ocean including its thermodynamics, together with a realistic observation scenario and a climatological initialization. For comparison, Ades and van Leeuwen (2015) initialized the EWPF around the truth in a highly nonlinear single-layer primitive equation system with a state dimension of 6 × 10^{4} using only 24 particles. We conclude that the experiment conducted here is suited to assess the NETF’s general applicability for nonlinear, high-dimensional DA.

## 5. Results and discussion

This section presents and discusses the results of the experiment described above. For comparative reasons, we also integrated the initial ensemble throughout the time window but without assimilating any observations. This free ensemble serves as a reference to show the impact of the observations on the model evolution.

### a. Qualitative evaluation

For now, we apply the long-term climatology to create the initial ensemble, as presented in section 4. Thus, the free ensemble can only deliver climatological information of the system during the forecast phase (i.e., the large-scale double-gyre circulation created by the wind forcing). However, it is not able to resolve the mesoscale patterns, which are of a chaotic nature and average out over the free ensemble. This is supported by snapshots of the fields on day 260 in Fig. 5. In principle, the mean of the free ensemble, Fig. 5b, has not changed compared to the initial time, as revealed by a comparison with Fig. 4a. In contrast, the NETF analysis at that time, Fig. 5c, yields an estimate that closely resembles the true field including the mesoscale features, as shown in Fig. 5a. The field difference, plotted in Fig. 5d, shows only small deviations of at most 10%. They mainly concern the magnitude of the mesoscale perturbations in the dynamically very active region in the domain center, which is highlighted by the box in Fig. 5d. These qualitative findings are similarly observed for all other variables as well, including the unobserved velocity fields. These results demonstrate that the NETF successfully assimilates the observations in this high-dimensional problem.

### b. Quantitative evaluation

#### 1) Evaluation measures

The main evaluation criterion used for filter performance is the root-mean-square error (RMSE) of the ensemble mean at each analysis time level . It is based on the spatial average of the squared deviations, and it is computed separately for each model variable according to

Here, *X* stands for each of the four different model fields (*T*, *U*, *V*, ) and represents the ensemble mean field. The term is the dimension of the field (i.e., 107 811 for *T*, *U*, and *V* and 9801 for ), and indicates summation over all model grid points. As an overall measure, the RMSE may not capture all performance details, but a low RMSE represents a necessary condition for a successful analysis. Ensemble filters allow one to calculate an *estimated* RMSE by the field-averaged ensemble spread:

This measure can be computed independently from the truth. In a well-calibrated DA system, it should be of similar magnitude as the RMSE, since then the ensemble distribution will allow for reliable diagnostics of the filter uncertainty (e.g., Palmer et al. 2005; Hopson 2014).

The RMSE enables a general judgment of the analysis quality. However, it only evaluates the ensemble mean and neglects higher-order moments. To assess the quality of the ensemble distribution as well, we also apply the probabilistic continuous ranked probability score (CRPS, e.g., Gneiting et al. 2007; Tödter and Ahrens 2012) for the comparison with the LETKF. The CRPS accounts for the distribution of probability mass around the truth for any state vector component *x*. Specifically, let be the empirical ensemble pdf. The CRPS quantifies the difference between the cumulative density functions (cdfs) of the ensemble and the verifying truth (a Heaviside step function, ) in probability space by using a quadratic norm:

The empirical ensemble cdf, , is a piece-wise constant function, which allows for the computation of Eq. (13) according to the algorithm suggested by Hersbach (2000). Similar to the RMSE, we evaluate the CRPS at each grid point and then average over the whole field.

#### 2) Error reduction

Figure 6 shows the temporal evolution of the analysis RMSE for each control variable, together with the average ensemble spread. The evaluation measures represent the average over the whole field (two-dimensional for , three-dimensional for *T*, *U*, and *V*). The corresponding results for the free ensemble are also included. As expected, the free ensemble does not improve upon the initial time. Its error and spread remain of similar magnitude, and their high variability reflects the chaotic nature of the model evolution. The RMSEs of the NETF indicate that it performs successfully, as its analyses only exhibit small errors. This reflects the insight that the NETF is able to resolve the mesoscale patterns of the true flow. For all variables, the initial error is reduced nearly monotonically until convergence is reached. This spinup phase is explained by the fact that the initial ensemble does not contain information about the mesoscale circulation (Yang et al. 2012), and this will be discussed in more detail in section 5c. After the spinup, the errors remain approximately constant, which shows that the filter ensemble is in a quasi-stationary balance constrained by the observations. The comparison with the free ensemble demonstrates the error reduction in absolute terms. In addition, the small variability of the filter errors compared to the free ensemble shows that the NETF tracks the truth continuously. These considerations are further confirmed by the ensemble spread, which shows that the inflation factor is well tuned. After the spinup phase, the spread is nearly constant and typically slightly larger than the RMSE. Therefore, the ensemble distributions are consistent with the truth in a statistical sense.

Next, for a better assessment of the performance, the RMSEs are normalized by their values at the initial time (day 0) at which the NETF and free ensemble are equivalent. Figure 7 shows the temporal evolution of these relative errors. The strongest error reduction is observed for the temperature *T*. This is easily explained by the fact that at least the upper temperature field is constrained by the majority of the observations. A minimal relative error of about 7.5% can be achieved. , also an observed variable, achieves the second-best minimal relative error, with about 10% beyond the spinup. Even though the velocity fields are not observed, they can also be estimated with fairly high accuracy, and minimal relative errors of about 15% are found. Here *U* exhibits slightly smaller errors, presumably because it is, at least close to the surface, constrained by the fixed zonal wind forcing. Figure 8 provides a more detailed evaluation of the overall performance by displaying the error evolution in all vertical layers for the three-dimensional variables (*T*, *U*, and *V*). It shows that not only the densely observed surface layer but also the subsurface layers are improved by the filter. As is typical in ocean data assimilation, the spinup time increases with depth because the deeper ocean layers are less constrained by observations and exhibit a slower variability (Zhang and Rosati 2010).

#### 3) Comparison to the LETKF

As demonstrated, the NETF produces consistent and reasonable analyses, which proves the main objective of this work. To further assess its performance, we also applied the LETKF using the same initial ensemble and setup. We found that an inflation factor of results in the best performance for the LETKF. Figure 9 compares the relative errors of both filters in terms of the relative RMSE (black and gray lines) and CRPS (red and orange lines) for all four variables. The LETKF requires only slightly more than 100 days to reach convergence, while the NETF spinup phase takes about 200 days. This difference will be discussed in section 5c. However, from day 200 on, the NETF and LETKF perform almost identical in terms of the RMSE. In principle, the same holds for the CRPS. The spinup time indicated by the CRPS is slightly longer than that for the RMSE. This behavior is independent of the filter. The reason is that the RMSE only measures the convergence of the ensemble mean, while the CRPS takes the whole ensemble distribution into account. After the spinup, the CRPS and RMSE achieve a similar relative value for each variable. This confirms that the ensemble distribution, which quantifies the filter uncertainty, is consistent with the error of the ensemble mean.

To assess the statistical significance of the similarity of the results, we repeated the assimilation experiment 10 times with different random numbers for the generation of the initial ensemble and the observations (see section 4) as well as for the random rotations [see Eq. (2)]. The truth (see section 3c) was kept fixed to ensure a reproducible reference trajectory of balanced model states. The thick lines in Fig. 10 show the relative RMSEs averaged over all model variables and all repetitive runs. After the spinup phase, they are very similar for the NETF (in black) and LETKF (in gray), respectively. This confirms that, on average, the NETF achieves the performance of the LETKF after the spinup. The minimal and maximal scores of the LETKF are close to the mean, which underlines the high robustness of the LETKF. The scores of the NETF exhibit more variability in the repetition runs, particularly during its spinup phase. For the CRPS, the corresponding results are very similar as in Fig. 10 (not shown).

### c. Initial ensemble and spinup phase

The results have confirmed that the NETF yields a reasonable analysis with low errors, but it requires a longer spinup time than the LETKF. During this period, the results are also more sensitive to random variations. In general, the spinup phase of an ensemble filter is related to its initialization (Kalnay and Yang 2010). Hence, the NETF is more sensitive to the specification of the initial ensemble than the LETKF. This property might also be valid for nonlinear filtering in general, and this section provides a more detailed exploration of this issue.

#### 1) Sensitivity to the initial ensemble

As demonstrated in section 5b, the long-term climatology (spanning years 51–74) results in a successful NETF run. Initially, we had used the short-term climatology (spanning years 64–74, see also section 4c), for which the LETKF works as well. In contrast, the NETF diverges in this case independently of the tuning parameters. The ensemble spread decreases, but there is no error reduction (Fig. 11). These earlier failures emphasize that the consistent specification of the initial ensemble is a key criterion for the NETF to avoid filter divergence. It should appropriately span the largest sources of variability (Zhang et al. 2004). This claim is supported by the statistical properties of the model climatologies, which are shown in Table 2. The deviations from the true initial state, as measured by the RMSE, are of similar magnitude for both climatologies. However, the spread in *T* is very small for the short-term climatology (i.e., only about one-third of the RMSE). Hence, the initial ensemble does not properly cover the truth. For the other variables in Table 2, the situation is less problematic, but the weights are computed in the observation space, which is mainly spanned by temperature measurements. Therefore, in the beginning, most ensemble members receive insignificant relative weights, and this leads to filter divergence. In contrast, the long-term climatology exhibits more consistent statistics. Even though the spread is still too small, its deviation from the RMSE is less pronounced.

#### 2) Spinup time and inflation

The initial ensemble based on the long-term climatology enables the NETF to work successfully in our experiment, but the relatively long spinup time represents a practical restriction for future NETF applications. A first hint toward understanding this behavior can be found by revisiting Fig. 6. During the spinup phase, the spread decreases faster than the RMSE for all variables, particularly for *U*, *V*, and . EnKF-like methods do not sufficiently account for the observations if the ensemble spread is too small. This can increase the spinup time or even lead to filter divergence (e.g., Evensen 2009; Yang et al. 2012). Usually, it is counteracted by inflation procedures. Therefore, one might consider that a larger inflation factor might reduce the spinup time of the NETF. However, after the spinup phase, it would be detrimental to the filter and cause the RMSE to increase again. To explore this question, we tested a linearly decreasing inflation factor (1.11–1.02) during the initial phase of the assimilation experiment. Empirically, we found that applying this tuning during the first 100 days results in a much better agreement between the RMSE and spread over the entire assimilation window, as shown in Fig. 12. However, the spinup time is not shortened much by this modified inflation procedure and the RMSE remains almost as before. This indicates that the NETF’s slower convergence is not simply a matter of inflation tuning, even though a more objective adaptive inflation technique (e.g., Anderson 2009) might still improve the results.

#### 3) Explanation

The NETF’s higher sensitivity toward the initial ensemble in comparison with the LETKF can be explained by a key difference in their update mechanisms that was not explored in Tödter and Ahrens (2015). In EnKFs, the mean increment is *proportional* to the mean innovation (i.e., ) (Hodyss 2012), as visible in Eq. (3). The analysis covariance in the LETKF is determined by the transform matrix in Eq. (4). It only depends on the prior ensemble covariance and the observation error covariance matrix , but not on the actual observation (see Posselt et al. 2014). Therefore, the EnKFs are stable if the initial ensemble exhibits sufficient spread and the observation error is specified consistently. In contrast, according to Eqs. (7) and (8), both the NETF’s analysis mean and covariance are determined directly by the likelihood weights [Eq. (5)]. For Gaussian observation errors, the weights, as given by Eq. (6), decay *exponentially* with the innovations, . Thus, the relative contribution of members with a large distance to the observation is less pronounced in the NETF compared to the EnKFs. If many weights are insignificant, the analysis mean and covariance are effectively estimated by few members only, which degrades their quality. This property becomes particularly apparent for an ensemble initialized from a model climatology that is characterized by large innovations . In this case, the ensemble might collapse (see above), but at least the marginal contribution of most members results in the longer spinup period. This property also explains the increased sensitivity toward random variations in the observations (see section 5b).

To verify this hypothesis, the average effective ensemble size (Doucet et al. 2001) is computed by for the experiment in section 5b. It equals *m* if all members have equal weight and is smaller otherwise. Thus, it serves as a descriptive measure of the variance of the weights (Stordal et al. 2011). As shown in Fig. 13, the effective ensemble size is only about 20 during the first analysis steps and increases to over 100 later on. These numbers support the theoretical expectation stated above that most NETF members contribute only a little information in the beginning.

## 6. Conclusions and outlook

This work explores the large-scale applicability of the recently proposed NETF (Tödter and Ahrens 2015), which is a model-independent filter that creates a new analysis ensemble whose mean and covariance exactly match the Monte Carlo estimates of the corresponding Bayesian expectations. Thereby, it avoids biases that arise because of the Gaussian assumption inherent in EnKFs. The NETF performs an ensemble transformation like the LETKF, but with a distinct transform matrix and weight vector. It is supplemented by a moment-preserving random rotation. In contrast to most PFs, this transformation increases the stability of the NETF in larger-dimensional settings if the analysis is localized. Compared to Tödter and Ahrens (2015), this paper applies the new filter to a more realistic assimilation problem in an OGCM with high-dimensional dynamics and about 3.3 × 10^{5} state variables. The assimilation of temperature profiles and satellite sea surface heights mimics realistic ocean observation networks. The system is characterized by its wind-driven large-scale dynamics, a double-gyre circulation as occurs in the North Atlantic, and mesoscale eddies that express its chaotic nature.

The NETF remains stable in this setting and shows a reasonable performance. It converges to the truth and keeps track of it, including the mesoscale features. Therefore, the NETF is applicable to this nonlinear, high-dimensional problem even though it only relies on the Bayesian weights. Thus, it is able to overcome to curse of dimensionality with a computationally feasible ensemble size. Furthermore, its implementation in realistic circulation systems requires only little additional effort if an LETKF system is available. To the knowledge of the authors, so far, a similar successful large-scale application could only be achieved with the more complex EWPF, which additionally relies on stochastic models. The circulation system considered here exhibits globally high-dimensional dynamics. A limitation of this study is that the local observation space, where the analysis is computed, exhibits a dimension of the same order of magnitude as the ensemble size. The observation scenario constructed here is realistic and dense for ocean applications, and already very challenging for a PF-based technique. However, in meteorological applications, the observation density can become considerably higher.

The experiments demonstrate that the NETF reduces the analysis errors to between 7.5% and 15% for the different variables, compared to the initial time. Thus, it generates reasonable analysis increments based on the local likelihood weights. The ensembles are statistically consistent, which shows that the observations are used efficiently. After the spinup period, it can at least match the performance of the LETKF, as determined by both the RMSE and CRPS. However, it is unknown whether more improvement over the LETKF is actually possible in this setup. Because of its nonlinear design, the NETF may offer potential benefits, but this has to be elaborated upon in additional nonlinear, large-scale applications.

The results further emphasize that the NETF is sensitive to its initialization, which becomes particularly relevant in higher dimensions. An inconsistent initial ensemble (e.g., one with too little spread) results in many members with insignificant weights in the first analysis steps and might cause filter collapse. Furthermore, this issue increases the spinup phase in comparison to the LETKF, particularly if the initial ensemble only contains little information about the true flow. This problem is likely of concern to the EWPF as well. In the large-scale EWPF experiments published so far (van Leeuwen and Ades 2013; Ades and van Leeuwen 2015), the initial ensembles were generated by perturbing the true initial state, which prevents a spinup phase. Nevertheless, even with a climatological initialization, the NETF is still able to reconstruct the truth. In future applications, one should make sure to specify consistent initial ensembles that contain as much knowledge about the true flow as possible (e.g., from previous analyses). As shown for EnKFs, this can strongly reduce the spinup time (Yang et al. 2012). The most pragmatic solution consists of simply using an EnKF for the spinup phase and switching to the nonlinear filter afterward. Alternatively, more advanced adaptive inflation methods (e.g., Anderson 2009; Miyoshi 2011) could be adopted for the NETF or the spinup could be reduced by smoothing the estimates with future observations (Cosme et al. 2010). Such a nonlinear smoother based on the NETF can be derived in analogy to the ensemble transform Kalman smoother (Kalnay and Yang 2010; Nerger et al. 2014) and will be presented in an upcoming paper.

The successful application of the NETF in an OGCM offers numerous possibilities for future research. First, we anticipate that the experiences presented here will stimulate further research to reduce the spinup time. Second, in the experiment conducted here, the filter performance was similar to the LETKF at least after the spinup phase. Therefore, the NETF should be applied to other large-scale systems to accentuate potential benefits of the nonlinear approach. This could concern setups where the chaotic dynamics are able to modify the large-scale flow, since the latter is constrained by the analytical wind forcing in our setup. Additionally, it would be important to assess whether the NETF is able to deal with a local observation space whose dimension is much higher than the ensemble size. Such a situation is characteristic of atmospheric data assimilation, and it might represent a limitation for the filter.

Finally, the NETF should be compared to the EWPF to assess the relevance of higher-order moments in nonlinear, large-scale DA. This necessarily requires the employment of a stochastic model. The NETF can be combined with a nudged forecast step as well (Tödter 2015, chapter 4.9), which allows for a fair comparison of the analysis algorithms in such a scenario. For that purpose, the NETF algorithm does not need to be modified, except that the prior weights resulting from nudging have to be considered in Eq. (7). In this context, it could also be investigated whether the EWPF exhibits a similar sensitivity to the initial ensemble.

## Acknowledgments

The authors thank the three anonymous reviewers for their detailed comments that greatly helped in improving the manuscript. J. Tödter and B. Ahrens acknowledge support by Senckenberg BiK-F and by the German Federal Ministry of Education and Research (BMBF) under Grant MiKlip: DEPARTURE/01LP1118B. P. Kirchgessner and L. Nerger acknowledge the support by the SANGOMA EU Project (Grant FP7-SPACE-2011-1-CT-283580-SANGOMA). The calculations for this research were partly conducted on the LOEWE-CSC high-performance computer of the Goethe University Frankfurt. We thank HPC-Hessen, funded by the State Ministry of Higher Education, Research and the Arts, for their support.

## REFERENCES

*Inverse Modeling of the Ocean and Atmosphere.*Cambridge University Press, 256 pp.

*Sequential Monte-Carlo Methods in Practice.*Springer-Verlag, 581 pp.

*Data Assimilation: The Ensemble Kalman Filter*. Springer, 279 pp.

*ECMWF Newsletter*, No. 106, ECMWF, Reading, United Kingdom,

*Density Estimation for Statistics and Data Analysis.*Chapman and Hall, 175 pp.