  • Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.

  • Anderson, J. L., 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131, 634–642.

  • Anderson, J. L., 2007: Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. Physica D, 230, 99–111.

  • Anderson, J. L., 2009a: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A, 72–83.

  • Anderson, J. L., 2009b: Ensemble Kalman filters for large geophysical applications. IEEE Control Syst., 29, 66–82.

  • Anderson, J. L., B. Wyman, S. Zhang, and T. Hoar, 2005: Assimilation of surface pressure observations using an ensemble filter in an idealized global atmospheric prediction system. J. Atmos. Sci., 62, 2925–2938.

  • Anderson, J. L., T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Arellano, 2009: The Data Assimilation Research Testbed. Bull. Amer. Meteor. Soc., 90, 1283–1296.

  • Bishop, C. H., and D. Hodyss, 2007: Flow adaptive moderation of spurious ensemble correlations and its use in ensemble based data assimilation. Quart. J. Roy. Meteor. Soc., 133, 2029–2044.

  • Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. Tellus, 61A, 84–96.

  • Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. Tellus, 61A, 97–111.

  • Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724.

  • Campbell, W. F., C. H. Bishop, and D. Hodyss, 2010: Vertical covariance localization for satellite radiances in ensemble Kalman filters. Mon. Wea. Rev., 138, 282–290.

  • Chen, Y., and D. S. Oliver, 2009: Cross-covariances and localization for EnKF in multiphase flow data assimilation. Comput. Geosci., 14, 579–601, doi:10.1007/s10596-009-9174-6.

  • Courtier, P., and Coauthors, 1998: The ECMWF implementation of three-dimensional variational assimilation (3D-Var). I: Formulation. Quart. J. Roy. Meteor. Soc., 124, 1783–1807.

  • Emerick, A., and A. Reynolds, 2010: Combining sensitivities and prior information for covariance localization in the ensemble Kalman filter for petroleum reservoir applications. Comput. Geosci., 15, 251–269, doi:10.1007/s10596-010-9198-y.

  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143–10 162.

  • Fillion, L., H. L. Mitchell, H. Ritchie, and A. Staniforth, 1995: The impact of a digital filter finalization technique in a global data assimilation system. Tellus, 47A, 304–323.

  • Furrer, R., and T. Bengtsson, 2007: Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivar. Anal., 98, 227–255.

  • Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757.

  • GFDL Global Atmospheric Model Development Team, 2004: The new GFDL global atmosphere and land model AM2–LM2: Evaluation with prescribed SST simulations. J. Climate, 17, 4641–4673.

  • Greybush, S. J., E. Kalnay, T. Miyoshi, K. Ide, and B. R. Hunt, 2011: Balance and ensemble Kalman filter localization techniques. Mon. Wea. Rev., 139, 511–522.

  • Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790.

  • Held, I. M., and M. J. Suarez, 1994: A proposal for the intercomparison of the dynamical cores of atmospheric general circulation models. Bull. Amer. Meteor. Soc., 75, 1825–1830.

  • Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811.

  • Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 3269–3289.

  • Kepert, J. D., 2009: Covariance localisation and balance in an Ensemble Kalman Filter. Quart. J. Roy. Meteor. Soc., 135, 1157–1176.

  • Li, H., E. Kalnay, and T. Miyoshi, 2009a: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 523–533.

  • Li, H., E. Kalnay, T. Miyoshi, and C. M. Danforth, 2009b: Accounting for model errors in ensemble data assimilation. Mon. Wea. Rev., 137, 3407–3419.

  • Liu, H., J. L. Anderson, Y.-H. Kuo, and K. Raeder, 2007: Importance of forecast error multivariate correlations in idealized assimilations of GPS radio occultation data with the ensemble adjustment filter. Mon. Wea. Rev., 135, 173–185.

  • Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. J. Atmos. Sci., 55, 399–414.

  • Lyster, P. M., S. E. Cohn, R. Menard, L.-P. Chang, S.-J. Lin, and R. G. Olsen, 1997: Parallel implementation of a Kalman filter for constituent data assimilation. Mon. Wea. Rev., 125, 1674–1686.

  • Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. Mon. Wea. Rev., 128, 416–433.

  • Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance and model-error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.

  • Miyoshi, T., S. Yamane, and T. Enomoto, 2007: Localizing the error covariance by physical distance within a local ensemble transform Kalman filter (LETKF). Sci. Online Lett. Atmos., 3, 89–92.

  • Oke, P. R., P. Sakov, and S. P. Corney, 2007: Impacts of localization in the EnKF and EnOI: Experiments with a small model. Ocean Dyn., 57, 32–45.

  • Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.

  • Tong, M., and M. Xue, 2005: Ensemble Kalman filter assimilation of Doppler radar data with a compressible nonhydrostatic model: OSS experiments. Mon. Wea. Rev., 133, 1789–1807.

  • Whitaker, J. S., G. P. Compo, X. Wei, and T. M. Hamill, 2004: Reanalysis without radiosondes using ensemble data assimilation. Mon. Wea. Rev., 132, 1190–1200.

  • Zhang, Y., and D. S. Oliver, 2010: Improving the ensemble estimate of the Kalman gain by bootstrap sampling. Math. Geosci., 42, 327–345.

Fig. 1. Sampling error correction as a function of the absolute value of the ensemble sample correlation for a range of ensemble sizes.

Fig. 2. Time mean root-mean-square error of ensemble mean prior estimate (solid line) and prior ensemble spread (dashed line) as a function of ensemble size for assimilations with the 200-variable separable linear model with sampling error correction. The thin dashed line is the RMS error and spread for a 201-member ensemble with no sampling error correction or adaptive inflation.

Fig. 3. Time mean values of localization obtained from sampling error correction algorithm and adaptive inflation as a function of ensemble size for assimilations with the 200-variable separable linear model.

Fig. 4. Time mean root-mean-square error of the ensemble mean as a function of the half-width of the background Gaspari–Cohn localization for ensemble assimilations in the Lorenz-96 model with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 10 (dash–dot lines), 20 (dashed lines), and 40 (solid lines).

Fig. 5. Time mean values of systematic error correction for an observation that is the average of the 15th and 16th state variables in the Lorenz-96 assimilation experiments (thick solid lines) for (top) 10-, (middle) 20-, and (bottom) 40-member ensembles. The 10-member result is repeated in the (middle) and (bottom) (thin solid line) for easy comparison. The time mean localization from a group filter with four groups and corresponding ensemble size is shown by the dashed line in each. No background Gaspari–Cohn localization is applied for any of these cases. In (top), a Gaspari–Cohn localization with half-width 0.1 of the domain size is shown by the thin solid line.

Fig. 6. Time mean root-mean-square error of the ensemble mean for surface pressure as a function of the half-width of a background Gaspari–Cohn localization (rad) for ensemble assimilations in the low-order dynamical core with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 20 (dash–dot lines), 40 (dashed lines), and 80 (solid lines).

Fig. 7. Time mean ratio of ensemble spread to ensemble mean root-mean-square error for surface pressure as a function of the half-width of a background Gaspari–Cohn localization (rad) for ensemble assimilations in the low-order dynamical core with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 20 (dashed lines) and 80 (solid lines).

Fig. 8. Time mean of spatial mean inflation for surface pressure as a function of the half-width of a background Gaspari–Cohn localization (rad) for ensemble assimilations in the low-order dynamical core with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 20 (dashed lines) and 80 (solid lines).

Fig. 9. Time and space mean of spatial root-mean-square value of the initial time tendency of surface pressure after each assimilation step as a function of the half-width of a background Gaspari–Cohn localization (rad) for ensemble assimilations in the low-order dynamical core with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 20 (dashed lines) and 80 (solid lines). The horizontal dashed line marks the value from an extended free integration of the model.

Fig. 10. Time mean value of the localization obtained from the systematic error correction algorithm for the influence of a surface pressure observation located at the lower left of the “+” on surface pressure state variables for an assimilation with 80 ensemble members and a 1.2-rad background Gaspari–Cohn localization. The bottom (right) panel shows values along the horizontal (vertical) line in the main panel.

Fig. 11. Time mean value of the localization obtained from the systematic error correction algorithm for the influence of a north–south wind observation at the middle model level located at the lower left of the “+” on the middle level east–west wind state variables for an assimilation with 80 ensemble members and a 1.2-rad background Gaspari–Cohn localization. The bottom (right) panel shows values along the horizontal (vertical) line in the main panel. Note the reduced spatial extent of the plots and the color scale that is different from that in Fig. 10.

Fig. 12. As in Fig. 11, but localizations are time mean from a group filter with two groups of 80 ensemble members each. Note the different range on the color scale from Fig. 11.


Localization and Sampling Error Correction in Ensemble Kalman Filter Data Assimilation

NCAR/Data Assimilation Research Section,* Boulder, Colorado

Abstract

Ensemble Kalman filters use the sample covariance of an observation and a model state variable to update a prior estimate of the state variable. The sample covariance can be suboptimal as a result of small ensemble size, model error, model nonlinearity, and other factors. The most common algorithms for dealing with these deficiencies are inflation and covariance localization. A statistical model of errors in ensemble Kalman filter sample covariances is described and leads to an algorithm that reduces ensemble filter root-mean-square error for some applications. This sampling error correction algorithm uses prior information about the distribution of the correlation between an observation and a state variable. Offline Monte Carlo simulation is used to build a lookup table that contains a correction factor between 0 and 1 depending on the ensemble size and the ensemble sample correlation. Correction factors are applied like a traditional localization for each pair of observations and state variables during an ensemble assimilation. The algorithm is applied to two low-order models and reduces the sensitivity of the ensemble assimilation error to the strength of traditional localization. When tested in perfect model experiments in a larger model, the dynamical core of a general circulation model, the sampling error correction algorithm produces analyses that are closer to the truth and also reduces sensitivity to traditional localization strength.

The National Center for Atmospheric Research is sponsored by the National Science Foundation.

Corresponding author address: Jeffrey Anderson, NCAR, 1850 Table Mesa Dr., Boulder, CO 80305. E-mail: jla@ucar.edu

1. Introduction

Ensemble Kalman filters were developed for data assimilation in oceanic and atmospheric applications during the 1990s (Evensen 1994; Burgers et al. 1998). Basic ensemble filters worked well for low-order models, but performed poorly or diverged from the observed system when applied to large geophysical models. Houtekamer and Mitchell (1998) determined that small ensembles could not accurately estimate the small correlations between a state variable and a physically remote observation. To reduce errors, they did not use observations that were farther than a cutoff distance from a state variable; this procedure is called localization. A properly tuned localization allows ensemble filters with fewer than 100 members to work with the largest atmosphere and ocean models. However, tuning the cutoff distance for a particular application is expensive.

Theoretically motivated functions that approximate the spatial covariance of geophysical variables for data assimilation purposes were described by Gaspari and Cohn (1999). Data assimilation experiments with variational methods (Courtier et al. 1998) and Kalman filters (Lyster et al. 1997) using these functions were already under way, and it was a natural extension to apply them in ensemble Kalman filters. The compactly supported polynomial approximation of a normal probability distribution in Gaspari and Cohn (1999) became the standard solution for localizing in the horizontal in atmospheric applications of the ensemble Kalman filter. Only a single real coefficient defining the width of the localization must be tuned, but even this can be expensive for large models.

Localization in the vertical is also required for good filter performance in atmospheric applications. In this case, there is less theoretical foundation for choosing a localization and a number of functions have been used (Whitaker et al. 2004; Houtekamer and Mitchell 2005). To further increase the complexity of tuning, ensemble filter performance is also improved when different localizations are used for the impact of different observation types (Houtekamer and Mitchell 2005; Tong and Xue 2005). An additional challenge is that increasingly strong localization of observation impacts can lead to increasingly unbalanced posterior model states and consequent transient nonequilibrium oscillations such as gravity waves in atmospheric models (Mitchell et al. 2002; Kepert 2009). This complexity motivates the design of algorithms to automatically tune localizations.

Developing a better understanding of why localization is needed for good filter performance is important for improving filter algorithms. Bishop and Hodyss (2007) were among the first to propose a method for adaptive ensemble covariance localization. Anderson (2007) suggested that localization was primarily needed because of sampling error and proposed a method using a group of ensemble filters to detect and correct for this error. The nature of sampling error in ensemble filters is explored further here, leading to an algorithm that computes localization as a function of ensemble sample correlation and ensemble size. A number of other studies, including Chen and Oliver (2009) and Bishop and Hodyss (2009a,b), have related localization to the correlation between an observation and a state variable.

Section 2 discusses methods for automatically tuning localization and hypothesizes that localization is required because of sampling error. Section 3 proposes an algorithm for reducing sampling error and section 4 presents results in two low-order models and the dynamical core of an atmospheric circulation model.

2. Sampling error in ensemble filters

The Kalman filter is the optimal solution to a data assimilation problem with a linear forecast model, linear observation operators, and normal observational error. A deterministic ensemble Kalman filter like the ensemble adjustment Kalman filter (EAKF; Anderson 2001) with an ensemble size that exceeds a threshold is simply an algorithm for computing the Kalman filter solution (Anderson 2009b). In that case, the ensemble sample covariance of an observation and a state vector component is the optimal estimate. When the EAKF is used with an ensemble that is too small, or when any of the conditions for the Kalman filter to be optimal are violated, the ensemble sample covariance is no longer guaranteed to be optimal. In addition, stochastic ensemble Kalman filters like the perturbed observation filter (Burgers et al. 1998) are subject to sampling error for any ensemble size, even when the Kalman filter is an optimal solution.

Localization is a standard algorithm for reducing the impact of errors in ensemble Kalman filters (Houtekamer and Mitchell 1998; Hamill et al. 2001; Furrer and Bengtsson 2007). The regression coefficient, or gain, relating ensemble increments for observed quantity y to increments for state variable x is multiplied by a factor between 0 and 1 called a localization. The localization is often a function of the physical distance between y and x. A common form for this function is a compactly supported polynomial approximation to a normal known as the Gaspari–Cohn (GC) function (Gaspari and Cohn 1999). The GC function works well for many applications where localization in the horizontal is important, but specifying localization in the vertical is more challenging (Whitaker et al. 2004). In addition, some observations, for instance a satellite radiance (Houtekamer and Mitchell 2005) or a Constellation Observing System for Meteorology Ionosphere and Climate (COSMIC) radio occultation (Liu et al. 2007), are not associated with a unique spatial location. Campbell et al. (2010) explored localization functions for observations, like satellite radiances, that have forward operators that perform weighted averages of a large number of state variables. As noted in Anderson (2007), localizations for such variables are expected to be quite different from a GC function in some cases. Good localizations may also be a function of the types of the observation and the state variable; for instance, an observation of temperature might require different localizations for temperature and wind state variables (Anderson 2007). Finally, localization is also expected to be a function of the time difference between an observation and a state variable (Anderson 2007); this is relevant for ensemble Kalman smoothing. Some ensemble filters like the local ensemble transform Kalman filter (Ott et al. 2004) implicitly localize by implementing the assimilation algorithm on local patches, although they can be modified to have more general localization (Miyoshi et al. 2007).
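For concreteness, here is a minimal sketch of the GC function in Python with NumPy. The function name `gaspari_cohn` is hypothetical, and the sketch assumes that "half-width" means the parameter c of Gaspari and Cohn (1999), so that the localization reaches exactly zero at a separation of 2c. The returned factor is what would multiply the regression coefficient for an observation and state variable separated by `distance`.

```python
import numpy as np


def gaspari_cohn(distance, half_width):
    """Gaspari-Cohn (1999) fifth-order piecewise rational function.

    Equals 1 at zero distance, decreases smoothly, and is exactly 0 for
    distances of 2 * half_width or more (compact support).
    """
    z = np.atleast_1d(np.abs(np.asarray(distance, dtype=float))) / half_width
    loc = np.zeros_like(z)

    near = z <= 1.0                      # branch for 0 <= z <= 1
    zn = z[near]
    loc[near] = (-0.25 * zn**5 + 0.5 * zn**4 + 0.625 * zn**3
                 - (5.0 / 3.0) * zn**2 + 1.0)

    far = (z > 1.0) & (z < 2.0)          # branch for 1 < z < 2; zero beyond
    zf = z[far]
    loc[far] = (zf**5 / 12.0 - 0.5 * zf**4 + 0.625 * zf**3
                + (5.0 / 3.0) * zf**2 - 5.0 * zf + 4.0 - 2.0 / (3.0 * zf))
    return loc


# Localization factors for a few separations, with half-width 1.
factors = gaspari_cohn(np.linspace(0.0, 3.0, 7), half_width=1.0)
```

The two polynomial pieces meet at z = 1 with value 5/24, which is why only a single width parameter needs to be tuned.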

There are a number of techniques to adapt to or correct for errors in ensemble Kalman filter sample covariances. Mitchell and Houtekamer (2000) developed a model of model error and compared prior statistics to observations. Their algorithm improved filter performance and provided estimates of the error terms. Li et al. (2009b) explored several models of model error and compared their efficacy in perfect model assimilations with and without inflation of ensemble priors. Li et al. (2009a) presented a method to simultaneously correct for a lack of variance in prior ensembles and estimate the error variance of observations.

Methods for dynamically computing localization have been developed. In an algorithm closely related to the one presented here, Emerick and Reynolds (2010) based localization on observation sensitivity matrices in a petroleum reservoir application. A similar algorithm, where localization is related to the prior ensemble covariance between an observation and a state variable, is presented in Chen and Oliver (2009). Bishop and Hodyss (2007, 2009a,b) present algorithms where localization is equal to a power of the sample correlation of a smoothed ensemble.

Anderson (2007) hypothesized that most of the errors necessitating localization in atmospheric applications were literally sampling error (i.e., dependent on arbitrary choices in the selection of a finite initial ensemble). Running several ensemble filters, differing only in the initial ensemble, produces a sample of regression coefficients (gains) for an observation y and a state vector component x. A localization that minimizes the expected, ensemble mean, root-mean-square error for the state variable component, given the set of regression coefficients, can be computed analytically. Time mean values of this group filter localization are similar to the best heuristically derived localizations for a variety of applications. However, this method ignores the possibility that the regression coefficients from the group of filters are biased compared to the optimal value. A similar approach that estimates errors in ensemble priors by splitting the ensemble into pieces is presented in Houtekamer and Mitchell (1998). A bootstrap method presented in Zhang and Oliver (2010) for estimating errors in ensemble covariances is also similar.

The group filter algorithm is expensive because it requires several ensemble filter assimilations. Anderson (2007) suggested that the time mean localization from a short group filter assimilation could be used as a static localization for a traditional ensemble filter. The next section describes a less costly algorithm for estimating sampling error using a single ensemble filter. This algorithm assumes that the ensemble filter is more similar to a true Monte Carlo algorithm in which the ensemble is a random draw from some underlying distribution than to a deterministic exact Kalman filter algorithm. If applied in the case where the ensemble filter is the optimal solution, the algorithm can only degrade performance. However, it may lead to improved performance in large geophysical applications with small ensemble sizes.

3. Sampling error correction algorithm

Describing the impact of a single observation y on a single state variable component x is sufficient to define all commonly used ensemble filter algorithms without loss of generality (see Anderson 2003 for a derivation). In this paper, a localization is defined as a factor that multiplies the regression coefficient that is used to compute increments to the prior estimate of x given increments for y,
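$$\Delta x_n = \alpha \hat{b}\, \Delta y_n, \quad n = 1, \ldots, N, \tag{1}$$
where α is a localization, n indexes the N ensemble members, $\Delta y_n$ is the increment for the nth ensemble estimate of the observed quantity, $\Delta x_n$ is the corresponding increment for the nth ensemble estimate of the state variable component, and $\hat{b}$ is the sample regression coefficient:
$$\hat{b} = \hat{r}\, \frac{\hat{\sigma}_x}{\hat{\sigma}_y}, \tag{2}$$
where $\hat{r}$ is the sample correlation and $\hat{\sigma}_x$ and $\hat{\sigma}_y$ are the sample standard deviations of x and y, respectively.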
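The update in (1) and (2) for one observation and one state variable can be sketched as follows. The observation increments $\Delta y_n$ are computed with the standard scalar ensemble adjustment (shift the ensemble mean to the Gaussian posterior mean and contract the spread about it), assuming a Gaussian prior and Gaussian observation error. Function names, the random ensembles, and the value α = 0.8 are hypothetical and chosen only for illustration.

```python
import numpy as np


def eakf_obs_increments(y_prior, obs, obs_var):
    """Scalar ensemble adjustment increments for one observation.

    Assumes the prior for y is Gaussian with the ensemble mean and
    variance, and the observation error is Gaussian with variance obs_var.
    """
    prior_mean = y_prior.mean()
    prior_var = y_prior.var(ddof=1)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    # Shift the ensemble mean and deterministically contract the spread about it.
    y_post = post_mean + np.sqrt(post_var / prior_var) * (y_prior - prior_mean)
    return y_post - y_prior


def regress_increments(x_prior, y_prior, dy, alpha=1.0):
    """Eq. (1) with Eq. (2): dx_n = alpha * b_hat * dy_n."""
    r_hat = np.corrcoef(x_prior, y_prior)[0, 1]
    b_hat = r_hat * x_prior.std(ddof=1) / y_prior.std(ddof=1)
    return alpha * b_hat * dy


rng = np.random.default_rng(1)
y = rng.normal(size=20)                  # prior ensemble of the observed quantity
x = 2.0 * y + rng.normal(size=20)        # correlated state variable ensemble
dy = eakf_obs_increments(y, obs=1.0, obs_var=0.5)
dx = regress_increments(x, y, dy, alpha=0.8)
```

Note that $\hat{r}\hat{\sigma}_x/\hat{\sigma}_y$ is simply the sample covariance of x and y divided by the sample variance of y.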
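The algorithm described here assumes that the N-member sample regression coefficient $\hat{b}$ is drawn from a distribution that is $\mathrm{Normal}(\bar{b}_N, \sigma_{b,N}^2)$. A nonzero standard deviation $\sigma_{b,N}$ indicates that there is sampling error in the ensemble filter. The algorithm computes a localization α such that multiplying a random draw z from $\mathrm{Normal}(\bar{b}_N, \sigma_{b,N}^2)$ by α minimizes the expected RMS difference between αz and the maximum likelihood estimate $\bar{b}_N$. This minimizes the expected RMS error in the updated ensemble mean for x. Minimizing the expected squared difference requires that
$$\frac{\partial}{\partial \alpha} \int_{-\infty}^{\infty} (\alpha z - \bar{b}_N)^2\, p(z)\, dz = 0, \tag{3}$$
where
$$p(z) = \frac{1}{\sqrt{2\pi}\, \sigma_{b,N}} \exp\left[-\frac{(z - \bar{b}_N)^2}{2\sigma_{b,N}^2}\right]. \tag{4}$$
Taking the partial derivative in (3) results in
$$\int_{-\infty}^{\infty} 2z(\alpha z - \bar{b}_N)\, p(z)\, dz = 0. \tag{5}$$
Using the integral identities
$$\int_{-\infty}^{\infty} z\, p(z)\, dz = \bar{b}_N \tag{6}$$
and
$$\int_{-\infty}^{\infty} z^2\, p(z)\, dz = \sigma_{b,N}^2 + \bar{b}_N^2 \tag{7}$$
gives
$$\alpha\left(\sigma_{b,N}^2 + \bar{b}_N^2\right) - \bar{b}_N^2 = 0. \tag{8}$$
Solving for α gives
$$\alpha = \frac{\bar{b}_N^2}{\bar{b}_N^2 + \sigma_{b,N}^2} = \frac{Q^2}{1 + Q^2}, \tag{9}$$
where Q is the ratio of the mean regression coefficient to its standard deviation:
$$Q = \frac{\bar{b}_N}{\sigma_{b,N}}. \tag{10}$$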
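A quick numerical check of (9), as a sketch under the Normal assumption above: a Monte Carlo estimate of $E[(\alpha z - \bar{b}_N)^2]$, evaluated on a grid of α with common random numbers, should bottom out at $Q^2/(1+Q^2)$. The function names and the values $\bar{b}_N = \sigma_{b,N} = 0.3$ are illustrative choices, not taken from the paper.

```python
import numpy as np


def sec_alpha(b_mean, b_std):
    """Eq. (9): alpha = Q**2 / (1 + Q**2) with Q = b_mean / b_std (Eq. 10)."""
    q2 = (b_mean / b_std) ** 2
    return q2 / (1.0 + q2)


def expected_sq_error(alpha, b_mean, b_std, n_draws=100_000, seed=0):
    """Monte Carlo estimate of E[(alpha * z - b_mean)**2], z ~ Normal(b_mean, b_std**2)."""
    z = np.random.default_rng(seed).normal(b_mean, b_std, n_draws)
    return np.mean((alpha * z - b_mean) ** 2)


alphas = np.linspace(0.0, 1.0, 101)
errors = [expected_sq_error(a, 0.3, 0.3) for a in alphas]
best = alphas[int(np.argmin(errors))]    # close to sec_alpha(0.3, 0.3) = 0.5
```

With Q = 1 (the spread of the regression coefficient equals its mean), the optimal localization halves the gain; as Q grows, α approaches 1 and localization fades.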
A basic sampling error correction algorithm would proceed as follows. First, the sample regression coefficient $\hat{b}$ is computed. Then a sampling error correction localization is computed with (9). The state variable ensemble is then updated using
$$\Delta x_n = \alpha \hat{b}\, \Delta y_n, \quad n = 1, \ldots, N. \tag{11}$$
To compute (9), one needs to compute $\bar{b}_N$ and $\sigma_{b,N}$ given the ensemble sample and some additional prior information about the regression coefficient. This prior information is what is known about the regression coefficient before the assimilation is undertaken. Because possible values of the regression coefficient are not bounded, it can be challenging to define an appropriate prior. However, the correlation coefficient r is independent of the relative scale of the priors and is bounded on [−1, 1].

In what follows, the regression coefficient is expressed as in (2), and it is assumed that there is no sampling error in the sample standard deviations of the observation and state variable; all the sampling error comes from the correlation coefficient. The appendix provides some evidence that this assumption is a reasonable approximation. If one can compute the mean $\bar{r}_N$ and standard deviation $\sigma_{r,N}$ of the distribution from which the sample correlation $\hat{r}$ is drawn, then
$$\bar{b}_N = \bar{r}_N\, \frac{\hat{\sigma}_x}{\hat{\sigma}_y} \tag{12}$$
and
$$\sigma_{b,N} = \sigma_{r,N}\, \frac{\hat{\sigma}_x}{\hat{\sigma}_y}, \tag{13}$$
and (9) can be computed.
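Because (12) and (13) scale $\bar{r}_N$ and $\sigma_{r,N}$ by the same ratio $\hat{\sigma}_x/\hat{\sigma}_y$, that ratio cancels in Q, so α depends only on the correlation statistics. A small sketch that makes this explicit; the helper names and numbers are hypothetical.

```python
def regression_stats_from_correlation(r_mean, r_std, sd_x, sd_y):
    """Eqs. (12)-(13): scale the correlation statistics by sd_x / sd_y."""
    scale = sd_x / sd_y
    return r_mean * scale, r_std * scale


def sec_alpha(b_mean, b_std):
    """Eq. (9) written in terms of Q = b_mean / b_std."""
    q2 = (b_mean / b_std) ** 2
    return q2 / (1.0 + q2)


b_mean, b_std = regression_stats_from_correlation(0.4, 0.2, sd_x=3.0, sd_y=0.5)
alpha_from_b = sec_alpha(b_mean, b_std)   # Q = 2, so alpha = 0.8
alpha_from_r = sec_alpha(0.4, 0.2)        # same value: the scale factor cancels
```

This is what allows the lookup tables described below to be indexed by ensemble size and sample correlation alone.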

For all results shown here, the prior information assumed about r is that it is uniformly distributed on [−1, 1]. This is a relatively uninformative prior. If more detailed information about the prior is available, for instance that the correlation is greater than 0.5, it could be used (see section 5). An additional complication is that the sample correlation coefficient $\hat{r}$ is generally biased compared to the mean $\bar{r}_N$ of the distribution conditioned on the additional prior information. This bias can be corrected by including an additional factor, $\bar{r}_N/\hat{r}$, multiplying the localization in (11).

An offline Monte Carlo technique is used to approximate $\bar{r}_N$ and $\sigma_{r,N}$ given the sample correlation $\hat{r}$. For the uniform correlation prior used here, a set of K correlation values is selected that equally partition the interval [−1, 1] with
$$r_k = -1 + \frac{2(k-1)}{K-1}, \quad k = 1, \ldots, K. \tag{14}$$
For results shown here, K = 201. For each $r_k$, M random samples ($M = 10^8$ for all results here) of size N from a bivariate normal distribution with covariance
$$\begin{pmatrix} 1 & r_k \\ r_k & 1 \end{pmatrix}$$
are generated, and the sample correlation coefficient is computed for each sample. The result is a set of K × M sample correlation values:
$$\{\hat{r}_{k,m} : k = 1, \ldots, K;\ m = 1, \ldots, M\}. \tag{15}$$
The values $\hat{r}_{k,m}$ are partitioned into K subsets $R_k$, where the kth subset contains all values that are closer to $r_k$ than to any other value $r_j$ from the set defined in (14). Each sample correlation $\hat{r}_{k,m}$ has an associated actual correlation $r_k$. The mean and standard deviation of the actual correlations in each subset are computed. Suppose that an observation and a state variable have sample correlation $\hat{r}$. The mean and standard deviation of the actual correlations in the subset $R_k$ for the value of $r_k$ that is closest to $\hat{r}$ are used to approximate $\bar{r}_N$ and $\sigma_{r,N}$.
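The offline table construction can be sketched as below. This is a scaled-down illustration rather than the paper's code: it assumes unit variances for the bivariate normal (sample correlations do not depend on them) and uses far fewer samples per grid value than the $M = 10^8$ used here. The names `build_correlation_table` and `nearest_index` are hypothetical, and empty subsets are marked with NaN.

```python
import numpy as np


def nearest_index(r_hat, n_grid):
    """Index of the grid value of Eq. (14) that is closest to each r_hat."""
    idx = np.rint((np.asarray(r_hat) + 1.0) / 2.0 * (n_grid - 1)).astype(int)
    return np.clip(idx, 0, n_grid - 1)


def build_correlation_table(n_ens, n_grid=201, m_per_value=2000, seed=0):
    """Approximate r_bar_N and sigma_r_N for every grid value of Eq. (14).

    Assumes a uniform prior on the true correlation and unit variances
    for the bivariate normal.
    """
    rng = np.random.default_rng(seed)
    r_grid = np.linspace(-1.0, 1.0, n_grid)                      # Eq. (14)
    sample_r, actual_r = [], []
    for r in r_grid:
        a = rng.standard_normal((m_per_value, n_ens))
        e = rng.standard_normal((m_per_value, n_ens))
        x = a
        y = r * a + np.sqrt(max(1.0 - r * r, 0.0)) * e           # corr(x, y) = r
        xc = x - x.mean(axis=1, keepdims=True)
        yc = y - y.mean(axis=1, keepdims=True)
        r_hat = (xc * yc).sum(axis=1) / np.sqrt(
            (xc**2).sum(axis=1) * (yc**2).sum(axis=1))
        sample_r.append(r_hat)
        actual_r.append(np.full(m_per_value, r))
    sample_r = np.concatenate(sample_r)                           # Eq. (15)
    actual_r = np.concatenate(actual_r)

    # Subset R_k holds the samples whose nearest grid value is r_k; summarize
    # the actual correlations that generated them.
    k = nearest_index(sample_r, n_grid)
    count = np.bincount(k, minlength=n_grid)
    total = np.bincount(k, weights=actual_r, minlength=n_grid)
    total_sq = np.bincount(k, weights=actual_r**2, minlength=n_grid)
    with np.errstate(invalid="ignore", divide="ignore"):
        r_bar = total / count                                     # NaN if R_k is empty
        r_sd = np.sqrt(np.maximum(total_sq / count - r_bar**2, 0.0))
    return r_grid, r_bar, r_sd


# Scaled-down example for a 10-member ensemble.
r_grid, r_bar, r_sd = build_correlation_table(n_ens=10, m_per_value=500)
k = nearest_index(0.5, len(r_grid))      # grid index for a sample correlation of 0.5
```

Because each subset is summarized by the actual correlations that produced it, `r_bar[k]` approximates the mean of r given the sample correlation under the uniform prior; for small ensembles it is pulled toward zero relative to $\hat{r}$.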

The final sampling error correction algorithm proceeds as follows:

  1. The ensemble sample regression coefficient $\hat{b}$ and correlation $\hat{r}$ are computed.
  2. The mean $\bar{r}_N$ and standard deviation $\sigma_{r,N}$ are obtained from the offline computation outlined in the previous paragraph given $\hat{r}$.
  3. $\bar{b}_N$ and $\sigma_{b,N}$ are computed from (12) and (13).
  4. α is computed from (9).
  5. Increments for x are computed as
    $$\Delta x_n = \alpha \frac{\bar{r}_N}{\hat{r}}\, \hat{b}\, \Delta y_n, \quad n = 1, \ldots, N. \tag{16}$$
    The term
    $$S = \alpha \frac{\bar{r}_N}{\hat{r}} \tag{17}$$
    is the product of the localization and the bias correction and is referred to as the sampling error correction (SEC) hereafter.
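Steps 1 to 5 can be combined into one function for a single observation and state variable pair. This sketch assumes a precomputed table of $\bar{r}_N$ and $\sigma_{r,N}$ for one ensemble size, stored on the uniform grid of (14); the table layout, the function names, and the tiny three-point table in the example are hypothetical. Since the ratio $\hat{\sigma}_x/\hat{\sigma}_y$ cancels in Q, Q is taken directly as $\bar{r}_N/\sigma_{r,N}$.

```python
import numpy as np


def sec_factor(r_hat, r_bar_table, r_sd_table):
    """S = alpha * r_bar / r_hat (Eq. 17), tables on the uniform grid of Eq. (14)."""
    n_grid = len(r_bar_table)
    k = int(np.clip(np.rint((r_hat + 1.0) / 2.0 * (n_grid - 1)), 0, n_grid - 1))
    r_bar, r_sd = r_bar_table[k], r_sd_table[k]               # step 2
    if r_hat == 0.0:
        return 0.0                       # b_hat is 0, so the increment is 0 anyway
    if r_sd == 0.0:
        alpha = 1.0                      # no spread recorded: no localization
    else:
        q2 = (r_bar / r_sd) ** 2         # steps 3-4: Q from Eqs. (10), (12), (13)
        alpha = q2 / (1.0 + q2)          # Eq. (9)
    return alpha * r_bar / r_hat


def sec_increments(x_prior, y_prior, dy, r_bar_table, r_sd_table):
    """Step 5: dx_n = S * b_hat * dy_n (Eq. 16)."""
    r_hat = np.corrcoef(x_prior, y_prior)[0, 1]               # step 1
    b_hat = r_hat * x_prior.std(ddof=1) / y_prior.std(ddof=1)
    return sec_factor(r_hat, r_bar_table, r_sd_table) * b_hat * dy


# Toy 3-point table on the grid [-1, 0, 1] (illustrative values only).
r_bar_table = np.array([-0.8, 0.0, 0.8])
r_sd_table = np.array([0.4, 0.5, 0.4])
s = sec_factor(0.9, r_bar_table, r_sd_table)   # alpha = 0.8, so S = 0.64 / 0.9
```

In an assimilation, S would be looked up once per observation and state variable pair and applied exactly like a traditional localization factor.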
Lookup tables for S as a function of ensemble size and ensemble sample correlation can be computed offline and referenced during an assimilation. Figure 1 shows S as a function of the absolute value of the sample correlation for a variety of ensemble sizes. S is smaller for small ensembles and small correlations, indicating that the relative sampling error is larger in these cases.
Fig. 1.

Sampling error correction as a function of the absolute value of the ensemble sample correlation for a range of ensemble sizes.

Citation: Monthly Weather Review 140, 7; 10.1175/MWR-D-11-00013.1

4. Results

Perfect model assimilation experiments with increasingly complex models are used to evaluate the SEC algorithm. In a perfect model experiment, a single long run of the numerical forecast model, referred to as the “truth” run here, is used as an analog for an evolving physical system. Synthetic observations of this long run are generated by applying forward operators to the state vector from the truth run and adding in random samples from a specified observational error distribution.
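Generating the synthetic observations can be sketched as follows, assuming the truth run is stored as an array with one row per time and that observational errors are Gaussian. The helper names are hypothetical, and the `truth` array below is random placeholder data standing in for a model run. The two forward operators mirror examples from this paper: observing every state variable directly, and observing the average of the 15th and 16th state variables (zero-based indices 14 and 15).

```python
import numpy as np


def make_synthetic_obs(truth, forward_operator, obs_error_sd, seed=0):
    """Apply a forward operator to each truth state and add Gaussian error.

    truth has shape (n_times, n_state); forward_operator maps one state
    vector to an array of observed values.
    """
    rng = np.random.default_rng(seed)
    clean = np.array([forward_operator(state) for state in truth])
    return clean + rng.normal(0.0, obs_error_sd, size=clean.shape)


def identity(state):
    return state                                     # observe every state variable


def mean_of_15th_and_16th(state):
    return np.array([0.5 * (state[14] + state[15])])  # zero-based indices


truth = np.random.default_rng(3).normal(size=(50, 200))   # placeholder, not a model run
obs_all = make_synthetic_obs(truth, identity, obs_error_sd=1.0)
obs_avg = make_synthetic_obs(truth, mean_of_15th_and_16th, obs_error_sd=1.0)
```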

All assimilation results use the ensemble adjustment Kalman filter (Anderson 2001) to compute observation increments. The spatially and temporally varying adaptive inflation algorithm of Anderson (2009a) is used with the inflation standard deviation fixed at 0.6 and the inflation value damped 10% of the distance to 1 before each assimilation step. These settings perform well in a variety of real data assimilations in large geophysical models (Anderson et al. 2009). The root-mean-square error of the ensemble mean from the truth,
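$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\left(\bar{x}_m - x_{m,t}\right)^2}, \qquad (18)$$
is used to evaluate results, where $\bar{x}_m$ is the ensemble mean, $x_{m,t}$ is the true value, the subscript m indexes the model state variable, and M is the number of state variables included in the average.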
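The verification statistics and the inflation damping can be written in a few lines of NumPy. This sketch assumes an ensemble array of shape (members, M); the spread definition (square root of the mean ensemble variance) is an assumption, since it is not spelled out in this section, and the damping function simply moves each inflation value 10% of the way back toward 1 as described above. The function names are hypothetical.

```python
import numpy as np

def rmse(ens, truth):
    """Eq. (18): RMS over the M state variables of (ensemble mean - truth).

    ens has shape (n_members, M); truth has shape (M,).
    """
    return float(np.sqrt(np.mean((ens.mean(axis=0) - truth) ** 2)))

def spread(ens):
    """Ensemble spread, assumed here to be sqrt of the mean ensemble variance."""
    return float(np.sqrt(np.mean(ens.var(axis=0, ddof=1))))

def damp_inflation(inflation, damping=0.1):
    """Move each inflation value a fraction `damping` of the distance back toward 1."""
    return 1.0 + (1.0 - damping) * (np.asarray(inflation, dtype=float) - 1.0)
```

The time mean RMSE reported in the figures would then be the average of `rmse` over the retained assimilation steps.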

a. Simple linear model

The first model is a simple linear growth model:
e19
where the first subscript indexes the state vector component and the second the time. Each of the 200 components evolves independently and is observed every time step with a simulated observational error drawn from Normal(0, 1). For ensemble sizes N > 200, the EAKF has no sampling error and produces the Kalman filter solution. For N ≤ 200, the EAKF solution diverges, leading to unbounded growth in the ensemble estimates of the state. Because the state variables evolve independently, the optimal localization for this problem is a delta function: the ith observation impacts only the ith state variable.

Ensemble assimilations are performed for 100 000 steps. Figure 2 displays the time mean root-mean-square error [(18)] of the ensemble mean averaged over all 200 state vector components for various ensemble sizes using the SEC algorithm. The dashed horizontal line indicates both the spread and the expected RMSE of the optimal solution from a 201-member filter with no SEC. The largest SEC assimilation ensemble shown has 201 members and an RMSE larger than the optimal value. All of the smaller ensembles would diverge without SEC and their RMSE increases as ensemble size decreases. Even with SEC, ensembles smaller than 50 generally diverged during the 100 000 steps. Figure 2 also displays the ensemble spread. It is larger than the optimal spread in all cases and is larger than the corresponding RMSE for ensemble sizes greater than 100.

Fig. 2.

Time mean root-mean-square error of ensemble mean prior estimate (solid line) and prior ensemble spread (dashed line) as a function of ensemble size for assimilations with the 200-variable separable linear model with sampling error correction. The thin dashed line is the RMS error and spread for a 201-member ensemble with no sampling error correction or adaptive inflation.

Figure 3 displays the time mean estimate of the SEC averaged over all pairs of observations and state variables that are not collocated; the optimal value in this case is 0. The average SEC is very close to 0 for ensemble sizes of 201 and 200 and increases to 0.4 as ensemble size decreases to 50. These nonzero values reflect the inability of the SEC algorithm to accurately determine that all nonzero correlations are spurious in this case. However, the sampling error is systematically decreased for all ensemble sizes greater than 50. The value of SEC for all collocated pairs of observations and state variables is 1 in this case. Figure 3 also displays the time mean value of the adaptive inflation averaged over all state variables. As the ensemble gets smaller, more inflation is needed to offset the erroneous loss of variance that occurs when observations improperly impact state variables that are not collocated.

Fig. 3.

Time mean values of localization obtained from sampling error correction algorithm and adaptive inflation as a function of ensemble size for assimilations with the 200-variable separable linear model.

b. Lorenz-96 40-variable model

The 40-variable configuration of the Lorenz-96 model (Lorenz and Emanuel 1998) is used with standard parameter settings discussed in their section 3: forcing F = 8.0, time step of 0.05 units, and the fourth-order Runge–Kutta time differencing scheme. At each assimilation time, 20 observations
$$y_i = \tfrac{1}{2}\left(x_{2i-1} + x_{2i}\right), \qquad i = 1, \ldots, 20, \qquad (20)$$
that are equally spaced in the model domain are assimilated. Observations are generated from a 110 000 step control integration with observational error simulated by adding a random draw from Normal(0, 1). The initial ensemble is composed of random draws from the long run. The first 10 000 steps of the assimilation are discarded and results are the average of the last 100 000 steps. A background GC localization is used in most experiments, both with and without SEC, with a half-width expressed as a fraction of the model domain.
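A compact Lorenz-96 configuration matching the settings above (F = 8.0, time step 0.05, fourth-order Runge–Kutta) can be sketched as follows. The tendency equation is the standard Lorenz-96 form; the pairwise-average forward operator follows the description of the 15th/16th-variable observation in Fig. 5, and the function names and 0-based array layout are illustrative assumptions.

```python
import numpy as np

FORCING = 8.0  # F in Lorenz and Emanuel (1998), section 3
DT = 0.05      # model time step (nondimensional units)

def l96_tendency(x, forcing=FORCING):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def l96_step(x, dt=DT):
    """Advance the state one time step with fourth-order Runge-Kutta."""
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * dt * k1)
    k3 = l96_tendency(x + 0.5 * dt * k2)
    k4 = l96_tendency(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def forward_operator(x):
    """20 equally spaced observations, each the average of an adjacent pair:
    (x_1 + x_2)/2, (x_3 + x_4)/2, ..., (x_39 + x_40)/2 in 1-based indexing."""
    return x.reshape(-1, 2).mean(axis=1)

def synthetic_obs(x_true, rng, obs_sd=1.0):
    """Perfect-model observations: forward operator plus a Normal(0, obs_sd**2) draw."""
    y = forward_operator(x_true)
    return y + obs_sd * rng.standard_normal(y.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.full(40, FORCING)
    x[19] += 0.01           # small perturbation of the unstable rest state
    for _ in range(1000):   # spin up onto the attractor
        x = l96_step(x)
    print(synthetic_obs(x, rng))
```

With 0-based arrays, observation index 7 averages `x[14]` and `x[15]`, i.e. the 15th and 16th state variables.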

Figure 4 displays the time mean ensemble mean RMSE [(18)] as a function of the half-width of the GC localization for ensemble sizes of 10, 20, and 40 with and without SEC. For 10-member ensembles, the SEC RMSE is larger for GC half-widths less than 0.3 but smaller for larger half-widths. Without SEC, the filter diverges for GC half-widths greater than 0.5. Filters with and without SEC both give their smallest RMSE for a GC half-width of 0.2, but the case without SEC has a significantly smaller RMSE.
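For reference, a background GC localization can be sketched with the standard fifth-order piecewise rational function of Gaspari and Cohn, with half-width c defined so that the weight falls to zero at a distance of 2c. Treating c as the "half-width" used here, and measuring distance on the cyclic Lorenz-96 domain as a fraction of its length, are assumptions of this sketch; the function names are hypothetical.

```python
import numpy as np

def gaspari_cohn(dist, half_width):
    """Gaspari-Cohn fifth-order piecewise rational function.

    Weight is 1 at zero distance, decreases smoothly, and is exactly 0 for
    distances of 2 * half_width or more (half_width convention assumed).
    """
    z = np.atleast_1d(np.abs(np.asarray(dist, dtype=float))) / half_width
    w = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi = z[inner]
    w[inner] = (((-0.25 * zi + 0.5) * zi + 0.625) * zi - 5.0 / 3.0) * zi**2 + 1.0
    zo = z[outer]
    w[outer] = (((((zo / 12.0 - 0.5) * zo + 0.625) * zo + 5.0 / 3.0) * zo - 5.0) * zo
                + 4.0 - 2.0 / (3.0 * zo))
    return w

def cyclic_distance(i, j, n=40):
    """Distance between grid points i and j on a periodic domain of n points,
    expressed as a fraction of the domain length."""
    d = np.abs(np.asarray(i) - np.asarray(j)) % n
    return np.minimum(d, n - d) / n
```

For example, `gaspari_cohn(cyclic_distance(np.arange(40), 15), 0.2)` gives the weights applied to all 40 state variables for a point at index 15 with a half-width of 0.2 of the domain.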

Fig. 4.

Time mean root-mean-square error of the ensemble mean as a function of the half-width of the background Gaspari–Cohn localization for ensemble assimilations in the Lorenz-96 model with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 10 (dash–dot lines), 20 (dashed lines), and 40 (solid lines).

Qualitatively, the behavior for 20-member ensembles is similar. The case without SEC has its smallest RMSE for a GC half-width of 0.3, while the case with SEC has its smallest RMSE for a GC half-width of 0.5. The SEC case has larger RMSE for GC half-widths less than 0.7 and smaller RMSE for larger half-widths. Again, the smallest overall RMSE is for the case without SEC. Finally, for 40-member ensembles the case without SEC has smaller RMSE for all GC half-widths, with the largest differences from the SEC case occurring for smaller half-widths.

The SEC degrades filter performance for many of the Lorenz-96 cases; only for large GC half-widths and small ensemble sizes does it reduce the RMSE. Figure 5 shows time mean values of S for a particular observation location and 10-, 20-, and 40-member ensembles with no background GC localization. The largest values of S are for the two state variables that are averaged in the forward operator for this observation. For 10 members, the maximum value of S is approximately 0.75, and this increases to nearly 0.9 for 40 members. For state variables far from the observation, S has a minimum of about 0.2 for all three ensemble sizes; the figure includes the 10-member curve on the plots of 20- and 40-member results to facilitate this comparison.

Fig. 5.

Time mean values of sampling error correction for an observation that is the average of the 15th and 16th state variables in the Lorenz-96 assimilation experiments (thick solid lines) for (top) 10-, (middle) 20-, and (bottom) 40-member ensembles. The 10-member result is repeated in the (middle) and (bottom) panels (thin solid line) for easy comparison. The time mean localization from a group filter with four groups and corresponding ensemble size is shown by the dashed line in each panel. No background Gaspari–Cohn localization is applied for any of these cases. In (top), a Gaspari–Cohn localization with half-width 0.1 of the domain size is shown by the thin solid line.

Figure 5 also shows the time mean localization that results from applying a group filter (Anderson 2007) with four groups for each ensemble size; for example, four groups of 10 members (40 model forecasts in total) are used for the group results in the top panel of Fig. 5. The group filter localization is larger than the SEC near the observation location and smaller for remote state variables. The differences between the group filter and SEC time means become smaller, especially close to the observation, as the ensemble size increases. These results suggest that the SEC gives too little weight to observations close to a state variable and too much weight to observations that are remote. As the ensemble size increases, the remaining difference is predominantly that too much weight is given to remote observations.
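The group filter localization used for comparison can be sketched as follows, based on our reading of Anderson (2007). With G groups each producing an estimate $\beta_i$ of the regression coefficient, the regression confidence factor α is chosen to minimize $\sum_{i \ne j}(\alpha\beta_i - \beta_j)^2$, which gives $\alpha = \max\{0, [(\sum_i \beta_i)^2/\sum_i \beta_i^2 - 1]/(G - 1)\}$; averaging α over assimilation steps yields a time mean localization of the kind plotted in Fig. 5. The function name is hypothetical and this is an illustration, not the paper's code.

```python
import numpy as np

def group_filter_alpha(betas):
    """Regression confidence factor from G group estimates of a regression coefficient.

    Minimizes sum over ordered pairs i != j of (alpha * beta_i - beta_j)**2.
    Setting the derivative to zero gives
        alpha = [(sum beta)^2 / sum beta^2 - 1] / (G - 1),
    which is floored at 0 so that the localization is never negative.
    """
    b = np.asarray(betas, dtype=float)
    g = b.size
    if g < 2:
        raise ValueError("need at least two groups")
    ss = np.sum(b**2)
    if ss == 0.0:
        return 0.0
    alpha = (np.sum(b) ** 2 / ss - 1.0) / (g - 1)
    return float(max(0.0, alpha))
```

Identical group estimates give α = 1 (no sampling error detected), while estimates that cancel in sign give α = 0.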

The top panel of Fig. 5 also shows a GC function for comparison. With the exception of the long tails, the GC function is fairly similar to the group filter time mean localizations. The GC localizations that lead to the smallest RMSE for Lorenz-96 are quite similar to the corresponding group filter time mean localizations. The best SEC cases are not quite as good because state variables receive too much impact from nearly unrelated distant observations and too little from nearby observations. Nevertheless, the SEC does stabilize the RMSE for large background GC half-widths for the 10- and 20-member ensembles. In particular, with the SEC the 10-member case remains stable even with no background GC localization, while the case without SEC becomes unstable for GC half-widths greater than 0.5.

The similarity between the 40-member SEC time mean in Fig. 5 and the four-group filter result suggests that the SEC is an inexpensive way to estimate time mean localizations that could then be used in a standard filter. Anderson (2007) explored the impact of using time mean localizations from a group filter in this way.

c. Low-order dry dynamical core

A low-resolution version of the B-grid dynamical core of the Geophysical Fluid Dynamics Laboratory Atmospheric Model version 2.0 (GFDL AM2) general circulation model (GFDL Global Atmospheric Model Development Team 2004) with 30 latitudes, 60 longitudes, and 5 levels is used for perfect model assimilation experiments. The model is forced with a pole-to-equator temperature gradient as described in Held and Suarez (1994) and is the same model used in Anderson et al. (2005) to examine assimilation of only surface pressure (PS) observations. The model has midlatitude baroclinic instability and a total of 28 800 variables.

For perfect model assimilation experiments, wind components and temperature at all five levels and surface pressure are observed for 180 columns located at latitudes
eq1
and longitudes
eq2
The horizontal points are selected so that no observation lies exactly on any model grid point. Observations are taken every 12 h by interpolating the model state in the horizontal to the column location and adding a random draw from Normal(0, 200²) to PS (units are Pa), and Normal(0, 3²) to wind components and temperature (units are m s⁻¹ and K).

A 100-yr free run of the model starting from no motion is used to generate a climatological sample from the model attractor; its final state is the preliminary initial condition. This preliminary initial condition is extended an additional 150 days during which synthetic observations are generated. A 320-member ensemble is generated by adding small perturbations to the preliminary initial condition, and a preliminary ensemble assimilation with no localization is done for these 150 days.

The end of the 150-day control integration is the initial condition for the ensemble assimilation experiments. This initial condition is integrated for an additional 200 days during which observations are generated. The first N members from the final state of the 320-member preliminary assimilation are used as initial conditions for an ensemble assimilation with N members. This N-member ensemble assimilation is then applied to the next 200 days of observations; the first 50 days are discarded, and the final 150 days (300 assimilation times) are used to compute statistics.

The relative quality of filter solutions is measured by the spatial and temporal mean of the RMSE for the ensemble mean of the model PS variables [(18) with m indexing only the M = 1800 surface pressure variables from the model]. Results are qualitatively similar for any of the other model variables. Figure 6 shows the time mean PS RMSE as a function of the half-width of the background horizontal GC localization for 20-, 40-, and 80-member ensembles with and without SEC. For 20- and 40-member ensembles, the SEC has slightly larger RMSE for a GC half-width of 0.2 but smaller RMSE for all other GC values. For 80 members, the SEC has slightly larger RMSE for GC values less than 1 but smaller RMSE for larger values of GC. For all ensemble sizes, the absolute minimum of RMSE is for an SEC case. For larger values of GC, the SEC RMSE is much smaller than for cases without SEC. Since exhaustively tuning the GC half-width becomes prohibitively expensive in large models, the fact that applying the SEC makes the RMSE less sensitive to GC values is potentially useful.

Fig. 6.

Time mean root-mean-square error of the ensemble mean for surface pressure as a function of the half-width of a background Gaspari–Cohn localization (rad) for ensemble assimilations in the low-order dynamical core with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 20 (dash–dot lines), 40 (dashed lines), and 80 (solid lines).

Figure 7 displays the spatial and temporal mean of the ratio of ensemble spread to RMSE for PS as a function of GC half-width for 20- and 80-member ensembles with and without SEC. For small GC, cases with and without SEC have too much spread. As the GC half-width is increased, the ratio decreases for all four cases. For values of GC for which the RMSE was smaller in the SEC case, the ratio of spread to RMSE is closer to 1 for the SEC case. It is not surprising that the SEC spread is less deficient for large GC half-width since each state variable is being affected by many remote observations that are only weakly related and the SEC reduces some of this noise.

Fig. 7.

Time mean ratio of ensemble spread to ensemble mean root-mean-square error for surface pressure as a function of the half-width of a background Gaspari–Cohn localization (rad) for ensemble assimilations in the low-order dynamical core with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 20 (dashed lines) and 80 (solid lines).

Figure 8 shows the spatial and temporal mean of the adaptive inflation applied to the PS field as a function of GC half-width for the 20- and 80-member ensemble cases. For small GC, inflation values in all cases are very close to 1 since the ensemble has sufficient spread. As the GC half-width increases, inflation increases in all cases. The increase is faster for the 20-member ensembles and for the cases without SEC, again consistent with the spread ratios in Fig. 7. The 80-member SEC case with GC half-width of 3 requires mean inflation of about 1.05, while the case without SEC requires inflation of 1.7. The smaller inflation for the SEC cases suggests that the SEC is acting mostly to remove noise while retaining much of the signal in the assimilation.

Fig. 8.

Time mean of spatial mean inflation for surface pressure as a function of the half-width of a background Gaspari–Cohn localization (rad) for ensemble assimilations in the low-order dynamical core with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 20 (dashed lines) and 80 (solid lines).

A number of studies have pointed out that ensemble assimilation can lead to unbalanced posterior states that result in transient nonequilibrium oscillations such as gravity waves in atmospheric models (Mitchell et al. 2002; Kepert 2009; Oke et al. 2007; Greybush et al. 2011). One common measure of imbalance in primitive equation models is the RMS of the time tendency of the PS (Fillion et al. 1995). The spatial RMS of the PS time tendency for the first model time step after each assimilation is averaged over all assimilation times to measure imbalance. Figure 9 shows this measure as a function of GC half-width for the 20- and 80-member ensembles. The figure also shows a baseline value obtained from the last 10 years of the free model run that generated the preliminary initial condition.
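The imbalance measure can be sketched as a forward-difference PS tendency over the first model step after each analysis. This sketch assumes the tendency is estimated as (PS after one step − analysis PS)/Δt and that all grid points are weighted equally (no area weighting on the latitude–longitude grid); both choices, and the function names, are assumptions.

```python
import numpy as np

def ps_imbalance(ps_analysis, ps_next, dt):
    """Spatial RMS of the surface pressure tendency over the first model time
    step after an assimilation, estimated by a forward difference
    (equal weighting of grid points is assumed)."""
    tendency = (np.asarray(ps_next, dtype=float) - np.asarray(ps_analysis, dtype=float)) / dt
    return float(np.sqrt(np.mean(tendency**2)))

def mean_imbalance(pairs, dt):
    """Average the per-assimilation imbalance over all assimilation times.

    pairs: iterable of (analysis PS field, PS field one model step later).
    """
    return float(np.mean([ps_imbalance(a, f, dt) for a, f in pairs]))
```

Applying `ps_imbalance` to consecutive steps of a free run instead of post-analysis states would give a baseline comparable to the horizontal line in Fig. 9.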

Fig. 9.

Time and space mean of spatial root-mean-square value of the initial time tendency of surface pressure after each assimilation step as a function of the half-width of a background Gaspari–Cohn localization (rad) for ensemble assimilations in the low-order dynamical core with (thick lines) and without (thin lines) sampling error correction for ensemble sizes of 20 (dashed lines) and 80 (solid lines). The horizontal dashed line marks the value from an extended free integration of the model.

In general, the 20-member ensemble cases are much less balanced than their 80-member counterparts (Fig. 9). For all but the largest values of GC half-width, the SEC cases are more unbalanced than the corresponding cases without SEC. The smallest imbalance occurs for intermediate values of the background GC. For very small GC, imbalance occurs because strongly correlated state variables that are close to one another may be impacted quite differently by a legitimately correlated observation that is slightly closer to one of them. For large GC, the introduction of noise due to accidental correlation between observations and distant state variables leads to imbalance.

Figure 10 shows the time mean SEC for the impact of a PS observation near the equator on PS state variables for an 80-member ensemble with a 1.2-rad background GC half-width. For the closest state variables, the SEC is very close to 1, while for distant state variables the values are between 0.3 and 0.4. One-dimensional cross sections are roughly symmetric about the observation and similar in shape to those for Lorenz-96 in Fig. 5. Results for smaller ensembles (not shown) are similar in shape but have smaller values near the observations and roughly the same values at larger distances. As in the Lorenz-96 case, the SEC is somewhat similar in shape to a GC function in the horizontal.

Fig. 10.

Time mean value of the localization obtained from the sampling error correction algorithm for the influence of a surface pressure observation located at the lower left of the “+” on surface pressure state variables for an assimilation with 80 ensemble members and a 1.2-rad background Gaspari–Cohn localization. The bottom (right) panel shows values along the horizontal (vertical) line in the main panel.

Figure 11 shows the time mean SEC for the impact of a north–south wind component observation on the model’s middle level on east–west wind state variables on the same level. For remote state variables (not shown in the limited domain of the figure), the SEC values are roughly the same as for the PS observation in Fig. 10. However, close to the observation there is a more complicated pattern with four distinct maxima. Interestingly, the largest values of SEC are not for the state variables that are closest to the observation in this case. Also, the maximum values are not nearly as large as those in Fig. 10. This is consistent with results in Anderson (2007) suggesting that localization needs to be applied not only as a function of the spatial relation between an observation and state variable, but also depending on the “types” of the observation and state.

Fig. 11.

Time mean value of the localization obtained from the sampling error correction algorithm for the influence of a north–south wind observation at the middle model level located at the lower left of the “+” on the middle level east–west wind state variables for an assimilation with 80 ensemble members and a 1.2-rad background Gaspari–Cohn localization. The bottom (right) panel shows values along the horizontal (vertical) line in the main panel. Note the reduced spatial extent of the plots and the color scale that is different from that in Fig. 10.

Figure 12 shows the time mean localization from a two-group filter for the same observation as in Fig. 11. The pattern is quite similar, with four maxima located around the observation location. As in the comparisons of group filters and SEC for Lorenz-96, the maximum values are slightly larger for the group filter than for the SEC. However, the similarity suggests that the SEC is detecting much of the sampling error that is detected by the group filter.

Fig. 12.

As in Fig. 11, but localizations are time mean from a group filter with two groups of 80 ensemble members each. Note the different range on the color scale from Fig. 11.

There are several reasons why the SEC RMSE is generally smaller than the RMSE without SEC for this model. First, having three spatial dimensions means that the number of distant observations that are potentially contaminated with noise, relative to the number of nearby observations, is larger than in the one-dimensional Lorenz-96 model. Second, the group filter results in Fig. 12 suggest that there are cases where the appropriate localization for sampling error is less similar in shape to an appropriately tuned GC function than for the Lorenz-96 model. Third, there is evidence that the horizontal width of localization suggested by the group filter varies as a function of the horizontal location of observations; the SEC for a PS observation in midlatitudes (not shown) is found to have a much larger area of large values than for an equatorial PS observation. Finally, there is limited experience with applying vertical localization in large model applications, and it is not clear that a GC function is appropriate in the vertical for this application.

5. Conclusions

Practical applications of ensemble Kalman filters in large geophysical models must use ensembles that are small compared to the model state vector. Some form of localization of observation impact is then required to avoid poor performance or filter divergence.

A sampling error correction algorithm that computes a localization for the impact of each observation on each state variable has been described. In low-order models, applying this SEC algorithm reduces the sensitivity of assimilation quality to the width of a specified background localization. With the SEC algorithm, small ensembles can produce reasonable results with values of background localization that lead to filter divergence without SEC. In a three-dimensional model of greater complexity, the SEC algorithm reduces sensitivity to background localization and also produces better assimilation results.

If a specified GC localization is very similar to the optimal localization for an application, it is difficult for the SEC to produce assimilations with lower RMSE. Time mean localizations from the SEC are very similar in shape to a GC function in the one-dimensional Lorenz-96 model. In addition, there are only 40 variables in the model so that the number of remote observations for a given state variable is relatively small. The three-dimensional atmospheric dynamical core presents a better opportunity for the SEC algorithm to demonstrate improvement. The time mean values of SEC localization are not as similar to a GC function. There is evidence, both from the SEC results and from group filter results, that localization is a function not only of horizontal distance but also of the latitude of the observation and the latitude, longitude, and vertical displacements of the state variable from the observation. In some cases, the horizontal pattern of the SEC-derived localizations is not closely approximated by a Gaussian in the vicinity of the observation. In fact, even in this simple dynamical core, the structure of the localization suggested by the SEC is quite varied. Also, having three spatial dimensions means that the number of observations impacting a state variable increases as the cube of the localization cutoff. This implies that the number of weakly related observations at a distance is much greater in this model than in the Lorenz-96 model. The ability of the SEC algorithm to minimize the impact of weakly related observations is expected to be more important in three-dimensional models.

There are a number of geophysical applications where a standard GC localization is suboptimal. First, little is known about how to localize in the vertical in atmospheric or oceanic models, so any three-dimensional model may be helped by the SEC algorithm. There are also applications where the correlation of observations and state variables is expected to be heterogeneous in the horizontal. Localization of an observation in the eyewall of a hurricane might require a much stronger horizontal localization, possibly with a complicated vortex-oriented shape, than observations of the background flow outside of the storm. Similarly, observations of convective cells for a cloud-resolving assimilation of severe convection might require localization quite different from observations in a surrounding location without convection. In the ocean, localization of observations in a western boundary current like the Gulf Stream might be different from that needed in the middle of an ocean basin. Experiments applying the SEC to real-time hurricane prediction in the Atlantic basin (R. Torn 2010, personal communication) and to global ocean prediction are under way and will be described in subsequent reports. Finally, the ensemble Kalman smoother where observations impact state variables at other times also requires localization that is quite different from a GC function (Anderson 2007). Further testing of the SEC algorithm is required to see if it can stabilize and improve filter and smoother performance in these applications.

The SEC algorithm independently computes a localization for each observation and state variable pair during an assimilation. One could consider using additional information about other pairs to improve the localization. For instance, at a single time one could do spatial averaging of the localizations for similar pairs of observations and state variables. One could also use information from previous times to do temporal averages of SEC localization. In the most general case, one could build a statistical model of the localization expected for a given observation and state variable using all SEC localization information from previous steps in an assimilation. A relatively simple example of such a statistical model would be to use the time mean localization, like that shown in Figs. 10 and 11, from all previous assimilation steps.
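The temporal and spatial averaging options mentioned above can be sketched in a few lines. In this sketch the time mean is a running mean over assimilation steps, and "similar pairs" at a single time are, purely as a hypothetical criterion, pairs whose nonnegative separation falls in the same distance bin; the class and function names are illustrative, not part of the algorithm described here.

```python
import numpy as np

class RunningMean:
    """Running time mean of SEC localization values for a fixed set of pairs."""

    def __init__(self, shape):
        self.count = 0
        self.mean = np.zeros(shape)

    def update(self, values):
        # Incremental mean: avoids storing the localization from every step.
        self.count += 1
        self.mean += (np.asarray(values, dtype=float) - self.mean) / self.count
        return self.mean

def average_by_separation(loc, separation, bin_width):
    """Spatial average at one time: pairs whose (nonnegative) separation falls in
    the same bin of width bin_width are treated as 'similar' (hypothetical
    criterion). Returns, for every pair, the mean localization of its bin."""
    loc = np.asarray(loc, dtype=float)
    k = np.floor(np.asarray(separation, dtype=float) / bin_width).astype(int)
    sums = np.bincount(k, weights=loc)
    counts = np.bincount(k)
    return sums[k] / counts[k]
```

A richer statistical model of the localization would replace the distance bins with whatever observation and state attributes (type, latitude, vertical level) are found to matter.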

The results discussed here all assumed a relatively uninformative uniform prior for the correlation of an observation and a state variable. It is possible to efficiently compute the SEC algorithm for more informative priors. For instance, for an observation and a state variable that are known a priori to be strongly positively correlated, one could specify a prior correlation distribution that is U(0.8, 1.0). For a pair that is known to be very weakly correlated, a prior of U(−0.5, 0.5) could be used. The additional prior information would result in more constrained values of SEC localization.

The relation between localization and unbalanced analyses requires further consideration. Even for a large ensemble, the application of the SEC resulted in slightly more unbalanced solutions. It is possible that using more informed prior estimates of the correlations could reduce this imbalance.

The SEC assumes that the ensemble estimates of covariance are not systematically biased due to model error or nonlinearity. An algorithm like the SEC, or the group filter, that only examines the prior ensemble cannot correct for these errors. Adaptive inflation algorithms like the ones in Anderson (2009a) make use of both the prior ensemble and the observations when correcting for error. Algorithms that use observations to detect systematic model errors in prior covariance estimates could lead to improved filter performance.

Understanding of why ensembles of O(10) members work so effectively in enormous geophysical models is still lacking. For instance, it is unclear if localized filters are closer to the deterministic Kalman filter limit or to a true Monte Carlo algorithm. The SEC algorithm leads to degraded behavior in the Kalman filter limit while it is expected to be beneficial in the Monte Carlo limit. The low-order GCM results here are apparently not so close to the Kalman filter limit that the SEC degrades performance. Future research on this issue could lead to an ability to estimate a priori the expected performance of a given ensemble size for a given application.

Acknowledgments

Thanks to Kevin Raeder, Pavel Sakov, Peter Bickell, and Doug Nychka for comments that improved the clarity of the manuscript. Jim Hansen, Craig Bishop, and an anonymous reviewer helped to significantly increase the clarity and correctness of the manuscript. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation.

APPENDIX

Sources of Sampling Error

The algorithm described here assumes that all of the sampling error in the computation of the regression coefficient comes from the correlation coefficient. A rough check on the validity of this assumption can be obtained by sampling a related distribution. For a given ensemble size N, correlation r, and standard deviations σx and σy, generate K (K is 10 million for the results here) random samples of size N from a distribution with covariance matrix $\begin{pmatrix} \sigma_x^2 & r\sigma_x\sigma_y \\ r\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}$. For each sample, compute the regression coefficient b and the correlation coefficient r. Then compute the mean and standard deviation of b and r over the K samples. One can then compute Q, the ratio of the mean to the standard deviation, for both the regression coefficient and the correlation coefficient and then use (9) to compute a localization for each. Define the relative error as
eq3
where αb is the localization for the regression and αr is the localization for the correlation. The relative error in this case does not depend on σx or σy.
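The appendix check can be reproduced in outline with a scaled-down Monte Carlo calculation. This sketch makes three assumptions the text does not fix: the samples are bivariate normal, the localization computed from Q (mean divided by standard deviation) takes the form Q²/(1 + Q²) as a stand-in for (9), and the relative error is taken as |α_b − α_r|/α_b in place of the definition above. It uses far fewer than 10 million samples, and the function name is hypothetical.

```python
import numpy as np

def appendix_check(n_ens, r, sx=1.0, sy=1.0, k=20_000, seed=0):
    """Compare localizations implied by sampling error in the regression
    coefficient b (x regressed on y) and in the correlation r."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal((k, n_ens))
    z2 = rng.standard_normal((k, n_ens))
    # Bivariate normal pairs with covariance [[sx^2, r*sx*sy], [r*sx*sy, sy^2]].
    y = sy * z1
    x = sx * (r * z1 + np.sqrt(1.0 - r**2) * z2)
    xa = x - x.mean(axis=1, keepdims=True)
    ya = y - y.mean(axis=1, keepdims=True)
    cxy = (xa * ya).sum(axis=1)
    b = cxy / (ya**2).sum(axis=1)
    rs = cxy / np.sqrt((xa**2).sum(axis=1) * (ya**2).sum(axis=1))

    def localization(samples):
        # Assumed form (stand-in for (9)): Q = mean / std, alpha = Q^2 / (1 + Q^2).
        q = samples.mean() / samples.std()
        return q**2 / (1.0 + q**2)

    alpha_b, alpha_r = localization(b), localization(rs)
    # Hypothetical relative difference; not necessarily the definition above.
    return alpha_b, alpha_r, abs(alpha_b - alpha_r) / alpha_b
```

Because each sampled b scales exactly by σx/σy while r is unchanged, the same random draws give identical localizations for any σx and σy, which mirrors the scale independence noted above.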

This calculation was performed for correlation values ranging from 0.01 to 1.0 every 0.01. For correlations larger than 0.05, the maximum relative error was 14%, 7%, 3%, and 2% for ensemble sizes of 10, 20, 40, and 80, respectively. Somewhat larger errors were found for correlations less than 0.05, but the impact of these observations is already small. The fact that the sampling error is dominated by the correlation term in this case suggests that a similar result holds for the full sampling error problem described in section 3.

REFERENCES

  • Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.

  • Anderson, J. L., 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131, 634–642.

  • Anderson, J. L., 2007: Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. Physica D, 230, 99–111.

  • Anderson, J. L., 2009a: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A, 72–83.

  • Anderson, J. L., 2009b: Ensemble Kalman filters for large geophysical applications. IEEE Control Syst., 29, 66–82.

  • Anderson, J. L., B. Wyman, S. Zhang, and T. Hoar, 2005: Assimilation of surface pressure observations using an ensemble filter in an idealized global atmospheric prediction system. J. Atmos. Sci., 62, 2925–2938.

  • Anderson, J. L., T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Arellano, 2009: The Data Assimilation Research Testbed. Bull. Amer. Meteor. Soc., 90, 1283–1296.

  • Bishop, C. H., and D. Hodyss, 2007: Flow adaptive moderation of spurious ensemble correlations and its use in ensemble based data assimilation. Quart. J. Roy. Meteor. Soc., 133, 2029–2044.

  • Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. Tellus, 61A, 84–96.

  • Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. Tellus, 61A, 97–111.

  • Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724.

  • Campbell, W. F., C. H. Bishop, and D. Hodyss, 2010: Vertical covariance localization for satellite radiances in ensemble Kalman filters. Mon. Wea. Rev., 138, 282–290.

  • Chen, Y., and D. S. Oliver, 2009: Cross-covariances and localization for EnKF in multiphase flow data assimilation. Comput. Geosci., 14, 579–601, doi:10.1007/s10596-009-9174-6.

  • Courtier, P., and Coauthors, 1998: The ECMWF implementation of three-dimensional variational assimilation (3D-Var). I: Formulation. Quart. J. Roy. Meteor. Soc., 124, 1783–1807.

  • Emerick, A., and A. Reynolds, 2010: Combining sensitivities and prior information for covariance localization in the ensemble Kalman filter for petroleum reservoir applications. Comput. Geosci., 15, 251–269, doi:10.1007/s10596-010-9198-y.

  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143–10 162.

  • Fillion, L., H. L. Mitchell, H. Ritchie, and A. Staniforth, 1995: The impact of a digital filter finalization technique in a global data assimilation system. Tellus, 47A, 304–323.

  • Furrer, R., and T. Bengtsson, 2007: Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivar. Anal., 98, 227–255.

  • Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757.

  • GFDL Global Atmospheric Model Development Team, 2004: The new GFDL global atmosphere and land model AM2–LM2: Evaluation with prescribed SST simulations. J. Climate, 17, 4641–4673.

  • Greybush, S. J., E. Kalnay, T. Miyoshi, K. Ide, and B. R. Hunt, 2011: Balance and ensemble Kalman filter localization techniques. Mon. Wea. Rev., 139, 511–522.

  • Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790.

  • Held, I. M., and M. J. Suarez, 1994: A proposal for the intercomparison of the dynamical cores of atmospheric general circulation models. Bull. Amer. Meteor. Soc., 75, 1825–1830.

  • Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811.

  • Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 3269–3289.

  • Kepert, J. D., 2009: Covariance localisation and balance in an Ensemble Kalman Filter. Quart. J. Roy. Meteor. Soc., 135, 1157–1176.

  • Li, H., E. Kalnay, and T. Miyoshi, 2009a: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 523–533.

  • Li, H., E. Kalnay, T. Miyoshi, and C. M. Danforth, 2009b: Accounting for model errors in ensemble data assimilation. Mon. Wea. Rev., 137, 3407–3419.

  • Liu, H., J. L. Anderson, Y.-H. Kuo, and K. Raeder, 2007: Importance of forecast error multivariate correlations in idealized assimilations of GPS radio occultation data with the ensemble adjustment filter. Mon. Wea. Rev., 135, 173–185.

  • Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. J. Atmos. Sci., 55, 399–414.

  • Lyster, P. M., S. E. Cohn, R. Menard, L.-P. Chang, S.-J. Lin, and R. G. Olsen, 1997: Parallel implementation of a Kalman filter for constituent data assimilation. Mon. Wea. Rev., 125, 1674–1686.

  • Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. Mon. Wea. Rev., 128, 416–433.

  • Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance and model-error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.

  • Miyoshi, T., S. Yamane, and T. Enomoto, 2007: Localizing the error covariance by physical distance within a local ensemble transform Kalman filter (LETKF). Sci. Online Lett. Atmos., 3, 89–92.

  • Oke, P. R., P. Sakov, and S. P. Corney, 2007: Impacts of localization in the EnKF and EnOI: Experiments with a small model. Ocean Dyn., 57, 32–45.

  • Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.

  • Tong, M., and M. Xue, 2005: Ensemble Kalman filter assimilation of Doppler radar data with a compressible nonhydrostatic model: OSS experiments. Mon. Wea. Rev., 133, 1789–1807.

  • Whitaker, J. S., G. P. Compo, X. Wei, and T. M. Hamill, 2004: Reanalysis without radiosondes using ensemble data assimilation. Mon. Wea. Rev., 132, 1190–1200.

  • Zhang, Y., and D. S. Oliver, 2010: Improving the ensemble estimate of the Kalman gain by bootstrap sampling. Math. Geosci., 42, 327–345.
