## Abstract

The serial ensemble square root filter (EnSRF) typically assumes observation errors to be uncorrelated when assimilating the observations one at a time. This assumption causes the filter solution to be suboptimal when the observation errors are spatially correlated. Using the Lorenz-96 model, this study evaluates the suboptimality due to mischaracterization of observation error spatial correlations. Neglecting spatial correlations in observation errors results in mismatches between the specified and true observation error variances in spectral space, which cannot be resolved by inflating the overall observation error variance. As a remedy, a multiscale observation (MSO) method is proposed to decompose the observations into multiple scale components and assimilate each component with separately adjusted spectral error variance. Experimental results using the Lorenz-96 model show that the serial EnSRF, with the help from the MSO method, can produce solutions that approach the solution from the EnSRF with correctly specified observation error correlations as the number of scale components increases. The MSO method is further tested in a two-layer quasigeostrophic (QG) model framework. In this case, the MSO method is combined with the multiscale localization (MSL) method to allow the use of different localization radii when updating the model state at different scales. The combined method (MSOL) improves the serial EnSRF performance when assimilating observations with spatially correlated errors. With adjusted observation error spectral variances and localization radii, the combined MSOL method provides the best solution in terms of analysis accuracy and filter consistency. Prospects and challenges are also discussed for the implementation of the MSO method for more complex models and observing networks.

## 1. Introduction

Data assimilation (DA) combines the information from observations and model forecasts according to their relative uncertainties to obtain an improved estimate of the model state. The ensemble Kalman filter (EnKF; Evensen 1994; Burgers et al. 1998) is a popular ensemble filter formulation where prior and observation errors are assumed to follow Gaussian distributions. Serial EnKF formulations (Anderson 2003; Tippett et al. 2003), where observation errors are assumed uncorrelated and observations are assimilated one at a time, have good computational scaling and have been successfully implemented in operational prediction systems (e.g., Whitaker et al. 2008). Despite its popularity, the serial EnKF is known to have suboptimal performance when the observation errors are correlated, since the uncorrelated observation error assumption will cause an overfit to the observations.

In this paper, observation errors are considered as including both instrument noises and errors of representativeness, or “representation errors” (van Leeuwen 2015; Hodyss and Satterfield 2017; Janjic et al. 2018). Representation errors can be caused by errors in the forward operator that maps model state to corresponding observations, scale mismatches between observations and model states, as well as errors introduced during preprocessing of observations. Scale mismatches between observations and model states often give rise to spatial correlations in representation errors (Stewart 2010; Weston et al. 2014; Waller et al. 2014). Due to these spatial correlations, observations obtained from Doppler radar (Waller et al. 2016), satellite radiances (Bormann and Bauer 2010; Geer et al. 2017; Migliorini and Candy 2019), and many others cannot be fully utilized in DA. To avoid overfitting to observations with correlated errors, it is common practice to inflate the observation error variances during DA (Courtier et al. 1998; Collard 2004; Hilton et al. 2009; Bormann et al. 2016). However, Miyoshi et al. (2013) pointed out that inflating observation error variances to try to compensate for correlated observation errors is suboptimal. Other observation subsampling methods such as thinning (Liu and Rabier 2002, 2003; Dando et al. 2007) and the creation of superobservation, also known as “superobbing” (Berger and Forsythe 2004; Zhang et al. 2009), have been proposed to reduce spatial correlations in observation errors. Hoffman (2018) recently investigated the impact of observation thinning or smoothing on analysis accuracy in a simple one-dimensional setting. Although subsampling the observations will prevent overfitting, doing so will discard small-scale observation information and degrade the overall DA performance.

For high-resolution prediction systems, the need for retaining small-scale information calls for DA methods that can incorporate the observational information at all ranges of scales. Recent studies have investigated DA methods that can account for observation error correlations in simplified (Stewart et al. 2008, 2013; Rainwater et al. 2015; Ruggiero et al. 2016) or realistic scenarios (Bormann and Bauer 2010; Weston et al. 2014; Waller et al. 2016; Bormann et al. 2016; Campbell et al. 2017; Simonin et al. 2019). These DA methods, although more computationally expensive, allow more observation information to be incorporated and thus improves prediction skill. A common approach for ensemble filtering to assimilate observations with correlated errors is to perform a transform of the observations to a space where their errors are no longer correlated (rotating the observation error covariance matrix) and perform DA in that space (e.g., Anderson 2001). For example, if the observation error correlation length is constant in space, the observations can be transformed to spectral space (e.g., Fourier transform) where their errors become uncorrelated. However, reducing sampling noises in spectral space is more difficult, since a distance-based correlation function in physical space will transform to a nonlocal function in spectral space.

Spatial correlations in observation errors change the distribution of error variances in spectral space, effectively changing the relative precision of observation information at different scales (Miyoshi et al. 2013; Rainwater et al. 2015; Fowler et al. 2018). Observation errors with spatial correlation are essentially a red noise with more error variances concentrated at larger scales. Recent ideas of using spatial differences of neighboring observations to extract small-scale information and improve DA performance (Anderson 2016; Bedard and Buehner 2020) inspired a new multiscale approach described in this paper. The new approach decomposes the observations into several scale components and assimilates them sequentially using a serial EnKF. During assimilation of an observation scale component, its corresponding error variance is adjusted individually to account for the effect of spatial correlations.

This paper is organized as follows. Section 2 describes the DA framework used in this study for testing the multiscale approach. Section 3 uses the Lorenz (1996) model to demonstrate the suboptimality due to mischaracterizing observation error correlations and provides a proof of concept of how the multiscale approach reduces such suboptimality. Section 4 further evaluates the multiscale approach using a two-layer quasigeostrophic model. Section 5 discusses further implementation of the multiscale approach in real prediction systems and section 6 summarizes the findings of this study.

## 2. Methodology

### a. The ensemble square root filters

Ensemble filtering is a sequential DA approach that propagates the model state in time and assimilates observations as they become available to update the state. An ensemble of model state realizations is used to provide a flow-dependent representation of its uncertainties. An analysis cycle consists of an filter update step where the prior ensemble from model forecasts is fused with observations to form the posterior ensemble (also known as the analyses) and a forecast step where the posterior ensemble is propagated forward in time to provide priors for the next cycle. This study focuses on a particular variant of ensemble filters known as the ensemble square root filter (EnSRF), which is described as follows.

Let **x** denote a random vector of size *N*_{x} containing the model state variables. Let *N* be the ensemble size and subscript *n* indexes ensemble members. Superscripts *b* and *a* denote the prior and posterior state, respectively. The *n*th prior ensemble member can be written as $xnb=x\xafb+x'nb$, where

is the ensemble mean and $x'nb$ is the *n*th ensemble perturbation. Let **y**^{o} be a vector of size *N*_{y} that contains the observations. *H* is a forward operator that maps a prior model state to its corresponding observation priors:

The observation priors can also be written in terms of the sum of mean and perturbations, $ynb=y\xafb+y'nb$. The ensemble filter derives the posterior states by weighting the prior states and the observations according to their relative uncertainties. The prior state and observation errors are both assumed to follow Gaussian distributions with zero mean (unbiased) and error covariances $B$ and $R$, respectively. The prior ensemble $xnb$ provides a sample estimate of $B$, while $R$ is typically specified.

A shorthand notation for sample-estimated error covariance between two vectors **a** and **b** is defined as

where *n* indexes the ensemble member and the prime indicate ensemble perturbations. For the sake of simplicity, the covariance between a vector **a** and itself is denoted as cov(**a**). Note the notation also holds for scalars [e.g., cov(**a**, *c*) is the error covariance between a vector **a** and scalar *c*]. The covariance between a scalar *c* and itself is just its variance, and therefore denoted as var(*c*).

The EnSRF (Whitaker and Hamill 2002) update equations can be written as

and

where $xna=x\xafa+x'na$ is the *n*th posterior member. The EnSRF is a deterministic ensemble filter formulation (e.g., Bishop et al. 2001; Anderson 2001) that does not require perturbing the observations and therefore avoids introducing additional sampling noise.

For observations with uncorrelated errors ($R$ is diagonal), the observations can be assimilated serially, which is known as the “serial EnSRF” (Whitaker and Hamill 2002; Tippett et al. 2003). Let

be a state vector in a joint state–observation space (Anderson 2003) for the *n*th ensemble member. The serial EnSRF assimilates one observation at a time to iteratively update the joint space state vectors. Let $yjo$ be the *j*th observation and $\sigma o,j2$ be its error variance (*j*th diagonal term of $R$). Superscript (*j*) indicates the updated values after assimilating the previous *j* − 1 observations. Starting from the prior state $zn\u2061(1)=znb$. The update equations for the *j*th observation are

and

where

is the Kalman gain associated with the *j*th observation, and

is a square root modification factor. When all *N*_{y} observations are assimilated, $xna=x\xaf\u2061(Ny+1)+x'n\u2061(Ny+1)$ gives the *n*th posterior ensemble member. A full $R$ matrix can be specified in the EnSRF to account for error correlations. Section 3a will use numerical experiments to show how the serial EnSRF solution deviates from the EnSRF solution due to mischaracterization of $R$.

### b. Spatially correlated observation errors

This study only considers spatially uniform observing networks. A simple first-order autoregressive process is selected as observation error model. The $R$ matrix is defined as

where *σ*^{2} is the error variance, *D*_{ij} is the distance between the *i*th and *j*th observations, and *L* is the correlation length scale. Both *σ* and *L* are considered to be constant in space and time. Figure 1a shows the spatial correlation function when *L* = 0, 1, 2, 5, and 10 grid points on a one-dimensional domain. *L* = 0 is the case when observation errors are uncorrelated. The $R$ matrix can be diagonalized as

where **Λ**^{o} contains the eigenvalues (error spectrum) along the diagonal and $U$ contains the discrete Fourier bases as eigenvectors; $\Lambda ko$, the *k*th diagonal term in **Λ**^{o}, is the spectral error variance associated with wavenumber *k* = 0, 1, …, *N*_{y}/2. The (*j*, *k*)th element of $U$ is given by

where $i=\u22121$. Figure 1b shows the error spectra for *σ* = 1 and *L* varying from 0 to 10. The uncorrelated errors (*L* = 0) have an even distribution of spectral variances. As *L* increases, more spectral error variances are distributed in the large scales (small *k*). When the true observation errors are spatially correlated, assuming uncorrelated observation errors (in the serial EnSRF) is effectively overspecifying observation error variances at small scales and underspecifying them at large scales.

Note that the exact choice of observation error correlation functions is not important for reasons that will become clear later as the multiscale approach is introduced in section 2d. Decomposing the observations into scale components will allow adjustment of spectral error variances to adapt to any arbitrary correlation functions. What is more challenging is the spatially and temporally varying *σ* and *L* in real observations, which will be discussed in detail in section 5.

### c. Metrics for evaluating filter performance

Let us now define several quantities to measure filter performance. The true model state from a nature run is denoted as $x*$. Synthetic observations are generated as $yo=H\u2061(x*)+\epsilon o$, where random noises *ε*^{o} are drawn from the true observation error distribution $N\u2061(0,\u2009R*)$ with the true error statistics $\sigma *$ and $L*$. The observation error covariance specified a priori in the EnSRF is denoted as $R$ with parameters *σ* and *L*. The true and specified observation error spectra are denoted as $\Lambda o*$ and **Λ**^{o}, respectively, according to (12). The error of the prior ensemble mean compared to the truth, $\epsilon b=x\xafb\u2212x*$, measures the actual accuracy of the prior model state, while the prior ensemble spread represents the uncertainty “specified” in the EnSRF which reflects the filter’s knowledge of the uncertainty in state estimate. The spectrum of the actual prior error, $\Lambda b*$, can be calculated as

where *E* denotes the expected value that is evaluated by averaging *ε*^{b}*ε*^{bT} over many analysis cycles, $U\u02dc$ is similar to $U$ in (13) but replacing *N*_{y} with *N*_{x}. The spectrum of prior ensemble spread, **Λ**^{b}, can be calculated as

Ideally, the ensemble spread should match the actual error standard deviation for each wavenumber, so that the specified uncertainties match the truth and correct weights are assigned to the observations and prior states. A consistency ratio (CR) is defined as the ratio between **Λ**^{b} and $\Lambda b*$ to evaluate this match. When CR < 1, the ensemble spread underestimates the actual uncertainties in model states (the ensemble is underdispersive), and observations are not given enough weights to adjust the state estimates. In extreme cases, the observations can be completely ignored when the ensemble spread collapses, a phenomenon known as “filter divergence”. Note that the same metrics can be calculated for the posterior error and ensemble spread, $\Lambda a*$ and **Λ**^{a}. The overall error and spread can be measured by summing over all wavenumbers. For example, the square root of the trace of $\Lambda a*$ and **Λ**^{a} gives the overall posterior RMSE and posterior ensemble spread, respectively.

Covariance localization (Houtekamer and Mitchell 2001; Hamill et al. 2001) and inflation (Anderson and Anderson 1999) are also implemented to counter the deficiencies in ensemble filter and improve its performance. Localization reduces the sampling noises due to the use of a limited-size ensemble. The Gaspari and Cohn (1999) localization function is used in this study with radius of influence (ROI) defined as the distance at which ensemble estimated covariance is tapered to zero. A multiplicative inflation factor *λ* is applied to the prior ensemble perturbations before the filter update to maintain filter consistency in the presence of model errors or other unaccounted for error sources.

### d. A multiscale ensemble filtering approach

This section introduces a multiscale ensemble filtering approach comprising two methods: multiscale observation (MSO) and multiscale localization (MSL). The MSO method decomposes the observations into several scale components and the filter update is calculated sequentially for each component to update the model state. The uniform observing network considered in this study allows scale decomposition to be performed by fast Fourier transforms. The *s*th observation scale component can be found by

where $U$ contains the Fourier bases from (13), **f**_{s} is a scale-selective response function for the *s*th scale component, and $\u2218$ denotes the element-wise product. Approaches for extracting scale component from nonuniform observing networks will be discussed in section 5. The number of scale components *N*_{s} and the response functions are predefined in this study. The MSO method starts from the prior ensemble $xn\u2061(1)=xnb$ and observation priors $yn\u2061(1)=ynb$ (subscript *n* indexes ensemble member), and performs the following steps for *s* = 1, …, *N*_{s} to iteratively update the states and observation priors.

Use (16) to compute the

*s*th scale component of the observations and observation priors, $yso$ and $yn,s\u2061(s)$.Use $xn\u2061(s)$ as prior ensemble, $yso$ as observation, and $yn,s\u2061(s)$ as observation priors; use $\u2061(\lambda so\sigma )2$ as observation error variance, where $\lambda so$ is an adjustment factor for the spectral observation error varainces. Apply the serial EnSRF update equations in (7)–(10) and set $xn\u2061(s+1)$ as the updated ensemble and $yn\u2061(s+1)$ as the updated observation priors.

If

*s*<*N*_{s}, let*s*=*s*+ 1 and go to step 1, otherwise the final posterior ensemble is $xna=xn\u2061(Ns+1)$.

In this study, the optimal spectral observation error variance adjustment factors are found by $\lambda so=\Lambda so*/\Lambda so$, where **Λ**_{s} is the averaged spectral variances for the *s*th scale component. Section 5 will discuss how to estimate $\lambda so$ when the truth ($\Lambda so*$) is unknown, and how to account for spatially and temporally varying observation error statistics.

Multiscale localization methods have been suggested in the literature to improve DA performance for high-dimensional problems (e.g., Zhang et al. 2009; Miyoshi and Kondo 2013; Buehner and Shlyaeva 2015). The MSL method is implemented in this study by decomposing the model state into several scale components and applying a separate localization ROI when updating each component. The model state is decomposed into *N*_{t} scale components $xt=U\u02dc\u2061[f\u02dct\u2218(U\u02dcTx)]$, where *t* indexes the scale components. Let $xn,tb$ denote the *t*th scale component of the *n*th prior ensemble member. Starting from the prior states $xn\u2061(1)=xnb$, the following steps are applied for *t* = 1, …, *N*_{t} to update the states.

The observation priors corresponding to the intermediate updated ensemble states are computed as $yn\u2061(t)=H\u2061[xn\u2061(t)]$.

Use $xn,tb$ as prior ensemble,

**y**^{o}as observations, and $yn\u2061(t)$ as observation priors; use ROI_{t}as localization radius and*σ*^{2}as observation error variance; apply the serial EnSRF update step described in (7)–(10), and the resulting analysis increments are*δ***x**_{n}. Add these increments to update the ensemble as $xn\u2061(t+1)=xn\u2061(t)+\delta xn$.If

*t*<*N*_{t}, let*t*=*t*+ 1 and go to step 1, otherwise the final posterior ensemble is $xna=xn\u2061(Nt+1)$.

The MSL method can be combined with the MSO method for applications involving both multiple correlation length scales and spatially correlated observation errors. The combined method will be referred to as MSOL, which will go through each pair of *s*th observation component and *t*th state component, and compute serial EnSRF analysis increments to iteratively update the ensemble states to reach the final posterior. Note that the multiscale approach can also be applied to other variants of ensemble filters.

## 3. Lorenz-96 model experiments

In this section, the Lorenz (1996) model is employed as the test model to demonstrate the suboptimality due to mischaracterized observation error statistics in EnSRF and evaluate the MSO method. The governing equations for the Lorenz-96 model state **x** are

*i* = 1, 2, …, *N*_{x}, where *N*_{x} = 40, and *F* is a forcing parameter. The true model forcing parameter is set to $F*$ = 8, for which chaotic behavior occurs. Cycling DA experiments are conducted over 100 000 cycles with a cycling period of 0.2 time units, and performance metrics (RMSE and CR) are averaged over these cycles. The identity matrix is used as forward operator (i.e., observations are the same as state variables, $ynb=xnb$, and all state variables are observed). The true observation error statistics are set to $\sigma *$ = 1 and $L*$ = 5.

### a. Impact from mischaracterization of observation error correlation

This section evaluates the suboptimality in DA performance due to mis-specified *L*. In DA practice, several sources of suboptimality may exist simultaneously, including mis-specified observation error statistics, model errors, sampling errors due to limited ensemble size, and the deviation from linear and Gaussian approximation due to nonlinear model dynamics and forward operators. Here, the impact of changing *L* will be evaluated in several scenarios with or without sampling errors (small *N*) and model parameter errors (wrong *F*). In each scenario, the localization ROI and prior inflation factor *λ* are manually tuned by trying ROI = 5, 10, 15, …, 60 grid points, and *λ* = 1, 1.02, 1.04, …, 1.2 (or up to 1.5 if model errors exist) to find the best solution. These different scenarios provide a more comprehensive evaluation of the suboptimality due to mis-specified *L*.

A benchmark case is chosen with correctly specified *σ* = 1 and *L* = 5, the perfect model (*F* = 8), and ensemble size of *N* = 40. Note that the benchmark solution is still suboptimal because the EnSRF makes linear and Gaussian assumptions and the Lorenz-96 model has nonlinear dynamics. Nonetheless, it is still considered the benchmark case since this study only focuses on mischaracterized observation error statistics. Also note that using very large ensemble sizes in this case does not further improve performance due to the nonlinear model dynamics (Anderson 2010). For the benchmark case, the best ROI = 60 grid points and *λ* = 1.04. With correctly specified parameters, only a little localization and inflation is needed, almost all prior error covariances are retained with those at greater distances given less weight.

To understand the impact from mis-specified *L*, a set of experiments are performed with *L* that deviates from the benchmark case. The changes in DA performance are evaluated by investigating the error spectra averaged over all analysis cycles. Figure 2 shows the spectra of true observation error ($\Lambda o*$), specified observation error (**Λ**^{o}), prior error ($\Lambda b*$) and ensemble spread (**Λ**^{b}), and posterior error ($\Lambda a*$) and ensemble spread (**Λ**^{a}) for *L* = 0, 0.5, 1, 3, 5 (benchmark), and 8. The benchmark case (Fig. 2a) has minimum spectral error variances and CR close to 1 for all wavenumbers. When $L>L*$ (Fig. 2b), observation error spectral variances are underspecified at smaller scales compared to the truth, resulting in underdispersive posterior ensemble. Note that observation error spectral variance at *k* = 0 is actually overspecified (not shown on plot) and the overall observation error variance is still matching with the true since $\sigma =\sigma *$. But this overspecification only happens at the largest scale and the posterior solution seems to be dominated by the underspecification at small scales. On the other hand, when *L* < $L*$ (Fig. 2c), the observation error spectral variances are overspecified at smaller scales, causing an overdispersive posterior ensemble. For the *L* = 3 and *L* = 8 cases, the mismatches between true and specified observation error spectral variances are not large and do not considerably increase the posterior error variances. As *L* deviates more from $L*$ (Figs. 2d–f), the posterior error variances become much larger than the benchmark case. While posterior error variances keep increasing as *L* decreases to 0, the posterior ensemble consistency seems to be worst for *L* = 1. For *L* = 0 (Fig. 2f), CR is close to 1 again for most of the spectrum except for wavenumbers 0–2. In this case, observation error spectral variances are underspecified at large scales and overspecified at small scales, and the effects on ensemble spread from such under and overspecification seem to cancel out. However, since observation error spectral variances are not correctly specified during assimilation, the posterior error variances do increase. The solution from *L* = 0 is similar to the case of running the serial EnSRF where observation error correlations are not accounted for. The solutions from the serial EnSRF and the EnSRF with *L* = 0 differ only due to their different ways of applying localization.

In practice, the specified *σ* is often inflated during DA to compensate for correlated observation errors (Courtier et al. 1998; Bormann et al. 2016). Another set of experiments are performed by deviating both *σ* and *L* from the benchmark case to see if adjusting *σ* can help compensate the misspecification of *L*. Figure 3 plots the overall posterior RMSE and CR (averaged over all scales) as a function of *σ* ranging from 0.4 to 1.6 and *L* ranging from 0 to 10 grid points. The “+” sign marks the benchmark case which has the lowest RMSE and CR close to 1. Results show that small deviations from $L*$ (*L* = 3–10) can be well compensated by adjusting *σ*. For example, *σ* = 1.2 for *L* = 7 and *σ* = 0.8 for *L*= 3 both yield DA performance that is comparable to the benchmark. However, as *L* deviates more from the truth (*L* < 2), adjusting *σ* becomes no longer effective and overall posterior RMSE increases. As seen from Figs. 2d–f, the shape of the specified observation error spectrum differs too much from the truth and cannot be matched by simply adjusting the overall error level. For *L* < $L*$, inflating *σ* can better match its large-scale error variances, but at a price of sacrificing the small-scale components where the mismatch gets even worse. Similarly, trying to match small-scale error variances will sacrifice the large-scale components.

The best DA performance for each specified *L* is found by manually tuning *σ*, ROI, and *λ*, and results are summarized in the first section of Table 1. The benchmark case does not require too much localization and inflation because of the near-optimal ensemble size and the perfect model. As a result, cases with *L* = 3–10 have low RMSE and CR close to 1 because *L* only deviates slightly from the truth (*L* = 5). Note that the shape of the observation error spectrum does not change linearly with *L*, therefore doubling *L* from 5 to 10 does not result in big mismatches in error spectra. The suboptimality becomes clear as *L* drops below 2, where the posterior RMSE increases considerably. The same experiments are repeated for two other scenarios: more sampling errors (*N* = 20, *F* = 8), and imperfect model (*N* = 40, *F* = 7.5), whose results are also summarized in Table 1. In these scenarios, extra sources of suboptimality are introduced to the system, causing DA performance to degrade. The manually tuned ROI and *λ* counter these suboptimality as much as possible (larger inflation factor and tighter localization distances are used). In these scenarios, the suboptimality due to mis-specified *L* still appear in a similar fashion as in the first scenario, indicating that the behavior described above is robust.

### b. Test of multiscale observation approach

In this section the MSO method is tested with the serial EnSRF using the Lorenz-96 model. Since the model does not feature multiscale dynamics (errors grow and instantaneously project to all wavenumbers), multiscale localization is not applied. The wavenumber space is evenly divided into *N*_{s} bands for each scale component, and the cases using *N*_{s} = 1, 2, 3, 4, 5, and 7 are tested. Note that *N*_{s} = 1 is the original single-scale serial EnSRF. Figure 4 shows the spectra of prior and posterior error and ensemble spread for cases with increasing *N*_{s}. Note that results from *N*_{s} = 4 and 5 do not visually differ much from *N*_{s} = 7, thus not shown. For *N*_{s} = 2, the observations are decomposed into two scale components, the larger-scale observation errors are inflated to *σ* ≈ 1.3 and the smaller-scale errors are deflated to *σ* ≈ 0.4. Results suggest that separating observations into just two scale components already reduces most of the suboptimality due to mis-specified *L*. As *N*_{s} increases, the filter solution approaches the benchmark EnSRF solution with correctly specified *L*. More quantitatively, the overall posterior RMSE and CR are summarized in the first section of Table 2. The MSO method is also tested in the other two scenarios with additional suboptimalities, and results are also summarized in Table 2. As *N*_{s} increases, the overall posterior RMSE also decreases for these scenarios despite the fact that performance is worse than the first scenario due to additional error sources. One exception is for the second scenario with more sampling errors, where the improvement in performance stops at *N*_{s} = 5 and posterior RMSE increases back up for *N*_{s} = 7. The accumulation of sampling noises may be the culprit here. When the serial EnSRF assimilates the *s*th observation scale component, error covariances between the observation at scale *s* and the model state need to be estimated by the ensemble. Such cross-scale error covariance is inevitably contaminated with sampling noises not fully removed by a distance-based localization function. This may also explain why the *N*_{s} = 7 case still have larger posterior RMSE compared to the benchmark EnSRF case, since no cross-scale covariance is needed in the EnSRF.

## 4. Two-layer quasigeostrophic model experiments

The multiscale approach is further tested in a higher-dimensional setting using the two-layer quasigeostrophic (QG) model (Smith et al. 2002). The model describes two-dimensional vorticity dynamics in geophysical flows with baroclinic instabilities. Although much simpler than actual weather prediction models, the QG model describes flows spanning many scales, and therefore is useful for this study. Horizontal model grid dimension is 128 × 128 with doubly periodic boundary conditions. Let *x* and *y* be the coordinates in the zonal and meridional directions, and subscripts 1 and 2 indicate the top and bottom layer, respectively. Background streamfunction is $\psi \xaf1=\u2212Uy$ and $\psi \xaf2=Uy$, where *U* is the mean flow. The governing equations are

where $qi=\u22072\psi i\u2212\u2061(\u22121)i\u2061(kd2/2)\u2061(\psi 2\u2212\psi 1)$, *i* = 1, 2, are the perturbation potential vorticity, *ψ* is the perturbation streamfunction, and *J* denotes the Jacobian term. Model parameters include the Rossby deformation wavenumber (*k*_{d}), the meridional gradient of the Coriolis force (*β*), and the Ekman bottom drag strength (*b*). The scale defined by $kd\u22121$ features energy injection due to baroclinic instability, while *β* and *b* define the Rhines scale at which the inverse cascade of kinetic energy halts. This model configuration follows the control experiment in Ying et al. (2018): *k*_{d} = 20, *β* = 16, *U* = 0.2, and *b* = 0.5. The potential temperature *θ* is chosen as the model state variable for assimilation in this study, because its spectrum resembles the real atmospheric temperature and kinetic energy spectra. Let $k=kx2+ky2$ be the horizontal total wavenumber, so the maximum wavenumber for the model grid is *k*_{max} = 63. The *θ* field is converted from streamfunction in spectral space,

A nature run is first obtained to provide the verifying truth. Figure 5a shows a snapshot of the true *θ* field. Synthetic observations are then generated by adding randomly drawn errors to the truth field. A uniform observation grid with horizontal spacing of 3 model grid points is used. The *θ* variable is directly observed, but only available at the top model layer. Observation errors are drawn from a first-order autoregressive spatial random process. Figures 5b and 5c show snapshots of observation error fields with standard deviation $\sigma *$ = 3 and spatial correlation $r*$ = 0 and 0.6, respectively.

Only the serial EnSRF is implemented for the QG model (the EnSRF is not computationally feasible) and tested with the MSO and MSL methods. To determine the ROIs specified in the MSL method, a set of sensitivity experiments are performed to determine the best ROIs for different scales. The QG model states are arbitrarily decomposed into three scale components: large (*k* = 0–5), medium (*k* = 6–12), and small (*k* = 13–63) scales. Figure 6 compares the spectra of posterior error and ensemble spread using ROIs ranging from 8 to 24 grid points for the $r*$ = 0 and 0.6 cases. The reference spectrum has a −3 slope, which corresponds to the forward enstrophy cascade in QG dynamics. The uncorrelated observation error spectrum has a +1 slope (white noise), and the correlated one has increased spectral error variances at large scale and decreased at small scale. Note that the spectra are plotted on a log-log scale. For $r*$ = 0 (Fig. 6a), the posterior error spectral variances are concentrated at the medium scale, since the large scale is well constrained by the observations and the small scale has low reference error variances to begin with. Since the observing network is at lower spatial resolution, the small scale lacks observation information and the posterior errors are mostly saturated (close to reference level). The best ROI is 24 grid points for the large scale and 12 grid points for the small scale, consistent with findings from Ying et al. (2018). For $r*$ = 0.6 (Fig. 6b), the posterior error spectral variances are more concentrated in the large scale. Underspecification of observation error spectral variances at the large scale causes the posterior ensemble to be underdispersive. This case seems to display a stronger scale dependency in best ROIs. The favorable ROIs actually exceed the range shown in Fig. 6b. The best ROI is 32 grid points for the large scale and 4 grid points for the small scale.

The three scale components defined above are used for both the MSO and MSL methods. The ROIs are set to 24, 16, and 8 grid points for the three components in the MSL method. For the MSO method, the spectral observation error variance adjustment factors $\lambda so$ are set to 2.4, 1.5, and 0.8 for the three scale components according to the mismatches between the specified and true observation error spectral variances. An adaptive prior inflation algorithm is implemented based on innovation statistics (Hollingsworth and Lönnberg 1986; Desroziers et al. 2005) to avoid manual tuning. At each analysis cycle, a maximum likelihood estimate gives the prior inflation factor:

A relaxation to prior perturbation method (Zhang et al. 2004) is also applied with a fixed coefficient *α* = 0.5 (chosen by trial and error) to provide some posterior inflation, since prior and posterior inflation may ameliorate different kinds of suboptimalities during DA (El Gharamti et al. 2019).

Cycling DA experiments are conducted using *N* = 20 ensemble members and assimilating the observations with correlated errors ($r*$ = 0.6) every 0.05 time units for 200 cycles. The cycling period is long enough so that the posterior errors have time-lagged correlations of no more than 0.2, and thus can be treated as quasi-independent samples for performance metrics. Tests are first conducted in a perfect model scenario. Four experiments are compared: the original single-scale serial EnSRF with ROI = 16, and the serial EnSRF with the MSO, MSL, and MSOL methods.

Figure 7 compares the averaged spectra of posterior error and ensemble spread from each method. More quantitatively, the first section of Table 3 summarizes the averaged posterior RMSE and CR for the three scale components. All comparisons of RMSE between different methods have passed the student *t* test at *p* < 0.01 significance level. The underspecified observation error spectral variances at the large scale causes the single-scale filter solution to be severely underdispersive (CR = 0.13) at the large scale. The MSL method lowers the posterior error spectral variances for all wavenumbers, but the large-scale component is still underdispersive (CR = 0.27). Spatially correlated observation errors can also be considered as locally biased large-scale observation errors (Wilks 1995, section 5.2.3). Assimilating these observations assuming uncorrelated errors will cause the posterior ensemble to overfit the observations and also become locally biased (hence the low CR). The MSO method significantly improves the large-scale component of the posterior ensemble, reducing RMSE and bringing the CR much closer to 1. The posterior ensemble no longer overfits the observations because the large-scale observation error variances are inflated. The MSOL method produces the lowest posterior RMSE for all three scale components. The MSL method seems to cause the large-scale posterior ensemble to be underdispersive and small-scale ensemble to be overdispersive, which is adjusted during the model forecast steps to produce an overall consistent prior ensemble (not shown). The interplay between scale-dependent error growth during forecast and error reduction during DA calls for further investigation in more complex models.

The experiments are repeated under another scenario where model parameter error exists in the forecast steps. The forecast model has *k*_{d} = 10 such that the peak baroclinic instability generation is shifted toward larger scales, causing smaller scales (*k* > 10) to be less energetic than the true model (*k*_{d} = 20). The second section in Table 3 shows that the presence of model error increases the overall posterior RMSEs. The time averaged adaptive prior inflation factor also increases from 0.98 to 1.07. However, the MSOL method still improves the performance over the single-scale method, and the impact from the MSO and MSL methods are similar to those found in the perfect model scenario. The only difference is that the MSL method improves the small-scale component more than the MSO method for the imperfect model scenario.

## 5. Discussion

The current study uses an idealized observing network to demonstrate the suboptimality in ensemble filtering performance due to mis-specified observation error statistics and test the proposed MSO method as a remedy. The idealized observing network is uniform in space and time, its observation error is modeled as a first-order autoregressive process with constant variance *σ*^{2} and correlation length scale *L* in space and time. Such idealized observations are hard to find in real application scenarios. Typically, observations are not distributed uniformly in space and time, and the statistics of representation errors also vary in space and time due to the flow dependency in model forecast errors (Janjić and Cohn 2006; Waller et al. 2014). The true observation error statistics are ultimately unknown, which is even more challenging. In this section, potential approaches for extending the MSO method to more realistic scenarios will be discussed.

For a nonuniform observing network, scale decomposition cannot be performed with a pair of fast Fourier transforms. A spatial averaging approach similar to superobbing can be used instead. Observations are averaged within each grid box of a uniform grid at a certain resolution. The smoothed observation defined on the low-resolution grid represents its large-scale component. Due to the spatial inhomogeneity of observations, some grid boxes will have too few observations to provide robust statistics for the large scale, therefore shall be marked as missing values and discarded by the filter. To find the smaller-scale component, the large-scale component can be subtracted from the original observations and spatial averaging can be applied again with higher-resolution grid corresponding to the smaller scale. This procedure will be repeated for the total number of scale components, resulting in a multigrid representation of the original observations, which will then be used as observation scale components in the MSO method.

The MSO method uses an adjustment factor $\lambda so$ to change the observation error variance for the *s*th scale component. In this study, the correct factors are given by

where $\Lambda so*$ is the true observation error spectral variance and $\Lambda so$ is the originally specified one. In real application scenarios, $\Lambda so*$ is unknown and has to be estimated through observation-space innovation statistics (e.g., Hollingsworth and Lönnberg 1986; Desroziers et al. 2005; Bowler 2017; Howes et al. 2017). For example, the innovation statistics for the *s*th scale component can be written as

where $yso$ is the observation and $y\xafsb$ is the observation prior ensemble mean, $\lambda so$ and $\lambda sb$ are adjustment factors for the specified observation error and prior error spectral variances. This study only considers the case where $\Lambda so\u2260\Lambda so*$ due to mischaracterization of observation error correlation length scale *L*. Therefore, assume $\Lambda sb=\Lambda sb*$, $\lambda sb$ = 1, and (22) will provide a maximum likelihood estimate for $\lambda so$. In a more general case, the prior error variances may also be mis-specified due to other suboptimalities (e.g., sampling noises), then $\lambda sb$ can be estimated as an adaptive inflation factor to adjust the prior error variances. The method described in Li et al. (2009) can provide simultaneous estimate for both adjustment factors $\lambda so$ and $\lambda sb$ using the innovation statistics. Since spatial correlation in representation errors may be actually caused by the prior errors, adjusting the prior inflation factors $\lambda sb$ to account for the effect of these spatial correlations may be an alternative method. Adaptive inflation methods using Bayesian formulations (Anderson 2009; Miyoshi 2011; El Gharamti 2018) can be borrowed to estimate a spatially and temporally varying $\lambda so$, so that the MSO method can be applied to nonuniform observing networks.

Another concern is the increase in computational cost for the MSO method. As described in this paper, the MSO method increases the number of observations by a factor of *N*_{s}, which is the number of scale components. Using a multigrid representation, however, would only increase the number of observations by a factor of no more than 2, since larger-scale components would have fewer number of observations. Different model variables may have different spectral error variance distributions. For example, wind and temperature tend to have errors in the larger scales compared to precipitation and vertical motion. Complex scale interactions among state variables may also occur. Determining the cutoff wavenumbers for each scale component may be a nontrivial task for real models. Adaptive algorithms will be needed to turn on the MSO method when mismatches in spectral error variances are detected and scale decomposition is performed to optimally adjust the spectral error variances. The cost-benefit analysis of spending computational resources on ensemble size and model resolution has been performed in previous studies (e.g., Lei and Whitaker 2017). Similar analysis can also be performed for the number of scale components for real application.

## 6. Conclusions

In this study, the suboptimality in ensemble square root filters (EnSRF) due to mischaracterizing the observation error spatial correlations is evaluated using the Lorenz-96 model. The true observation errors have spatial correlations that are neglected by the serial EnSRF, resulting in suboptimal filter performance. The mismatches between specified and true observation error spectral variances cause the analysis ensemble to be overdispersive at small scales and underdispersive at large scales. As the spectral distribution is vastly different between correlated and uncorrelated observation errors, a scale-dependent adjustment of observation error variances is needed to reduce such suboptimality, which motivates the multiscale approach. The proposed multiscale observation (MSO) method decomposes observations into multiple scale components using a spectral bandpass filter and assimilates each component with adjusted observation error variances to update the model state iteratively. Lorenz-96 model experiments indicate that the solution from the serial EnSRF with the MSO method approaches the solution from the EnSRF with correctly specified observation error correlations as the number of scales increases. The MSO method is further tested with a more complicated two-layer quasigeostrophic (QG) model. The serial EnSRF is implemented with the MSO method to assimilate observations with spatially correlated errors. The MSO method is capable of reducing the suboptimality due to mischaracterization of observation error correlations in both perfect and imperfect model scenarios. The multiscale localization (MSL) method is also applied to allow the use of different localization radii when updating the model state at different scales. The QG model state spans a range of spatial scales, therefore its large-scale component favors a longer localization radius, and small-scale favors a shorter radius. The MSL method improves filter performance compared to the single-scale serial EnSRF with a fixed localization radius. Combining the MSO and MSL, the MSOL multiscale approach gives the best filter performance in terms of analysis error and consistency between error and ensemble spread. Although the MSO method is only tested in very idealized settings in this study (uniform observing network, simple observation error model, and known true statistics), the underlying concept of adjusting the observation error spectral variances is potentially applicable to realistic scenarios. Using innovation statistics to estimate a spatially and temporally varying observation error adjustment factor for each scale component, the MSO method can be potentially applied to nonuniform observing networks with inhomogeneous error statistics. Future studies will further investigate the cost and benefit of the MSO method in realistic prediction systems.

## Acknowledgments

This material is based upon work supported by the National Center for Atmospheric Research, which is a major facility sponsored by the National Science Foundation under Cooperative Agreement 1852977. The author appreciates the support from the Advanced Study Program, and the guidance of Jeffrey Anderson and Moha El Gharamti at various stages of this research. This paper is dedicated to the author’s mentor, Fuqing Zhang, who recently passed away unexpectedly. The author also appreciates the comments from three anonymous reviewers that helped improve an earlier version of this paper.

## REFERENCES

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Tellus*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Tellus*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*J. Geophys. Res.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Mon. Wea. Rev.*

*Proc. Workshop on the Assimilation of IASI Data in NWP*, Reading, United Kingdom, ECMWF, 73 pp.

*Data Assimilation for Atmospheric, Oceanic, and Hydrologic Applications*, S. K. Park and L. Xu, Eds., Vol. III, Springer, 177–194

*Mon. Wea. Rev.*

*Tellus*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*J. Adv. Model. Earth Syst.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Proc. Seminar on Predictability*, Reading, United Kingdom, ECMWF, 1–18

*Quart. J. Roy. Meteor. Soc.*

*Mon. Wea. Rev.*

*SOLA*

*Inverse Probl. Sci. Eng.*

*Quart. J. Roy. Meteor. Soc.*

*J. Atmos. Oceanic Technol.*

*Quart. J. Roy. Meteor. Soc.*

*J. Fluid Mech.*

*Int. J. Numer. Methods Fluids*

*Tellus*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Statistical Methods in the Atmospheric Sciences: An Introduction*. International Geophysics Series, Vol. 59, Elsevier, 467 pp

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

10th CAWCR Workshop, Melbourne, Australia, CSIRO–Australian Bureau of Meteorology, Geophysical Research Abstracts, Vol. 18, EGU2016-3104.