  • Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.

  • Anderson, J. L., 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131, 634–642.

  • Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758.

  • Berre, L., O. Pannekoucke, G. Desroziers, S. E. Stefanescu, B. Chapnik, and L. Raynaud, 2007: A variational assimilation ensemble and the spatial filtering of its error covariances: Increase of sample size by local spatial averaging. Proc. ECMWF Workshop on Flow-Dependent Aspects of Data Assimilation, Reading, United Kingdom, ECMWF, 151–168. [Available online at http://www.ecmwf.int/publications/library/do/references/list/14092007.]

  • Buehner, M., 2005: Ensemble-derived stationary and flow-dependent background error covariances: Evaluation in a quasi-operation NWP setting. Quart. J. Roy. Meteor. Soc., 131, 1013–1043.

  • Courtier, P., J. N. Thepaut, and A. Hollingsworth, 1994: A strategy for operational implementation of 4D-Var, using an incremental approach. Quart. J. Roy. Meteor. Soc., 120, 1367–1387.

  • Dudhia, J., 1989: Numerical study of convection observed during the Winter Monsoon Experiment using a mesoscale two-dimensional model. J. Atmos. Sci., 46, 3077–3107.

  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10143–10162.

  • Evensen, G., and P. J. van Leeuwen, 2000: An ensemble Kalman smoother for nonlinear dynamics. Mon. Wea. Rev., 128, 1852–1867.

  • Fertig, E. J., J. Harlim, and B. R. Hunt, 2007: A comparative study of 4D-VAR and a 4D ensemble filter: Perfect model simulations with Lorenz-96. Tellus, 59A, 96–100.

  • Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757.

  • Gilbert, J. C., and C. Lemarechal, 1989: Some numerical experiments with variable storage quasi-Newton algorithm. Math. Programm., 45B, 407–435.

  • Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919.

  • Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. Mon. Wea. Rev., 133, 3132–3147.

  • Hamill, T. M., C. Snyder, and R. E. Morss, 2000: A comparison of probabilistic forecasts from bred, singular-vector, and perturbed observation ensembles. Mon. Wea. Rev., 128, 1835–1851.

  • Hong, S. Y., Y. Noh, and J. Dudhia, 2006: A new vertical diffusion package with an explicit treatment of entrainment processes. Mon. Wea. Rev., 134, 2318–2341.

  • Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811.

  • Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.

  • Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 3269–3289.

  • Hunt, B. R., and Coauthors, 2004: Four-dimensional ensemble Kalman filtering. Tellus, 56A, 273–277.

  • Kain, J. S., 2004: The Kain–Fritsch convective parameterization: An update. J. Appl. Meteor., 43, 170–181.

  • Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP: A comparison with 4D-VAR. Quart. J. Roy. Meteor. Soc., 129, 3183–3203.

  • Liu, C., Q. Xiao, and B. Wang, 2008: An ensemble-based four-dimensional variational data assimilation scheme. Part I: Technical formulation and preliminary test. Mon. Wea. Rev., 136, 3363–3373.

  • Liu, Z., and F. Rabier, 2003: The potential of high-density observations for numerical weather prediction: A study with simulated observations. Quart. J. Roy. Meteor. Soc., 129, 3013–3035.

  • Meng, Z., and F. Zhang, 2007: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part II: Imperfect model experiments. Mon. Wea. Rev., 135, 1403–1423.

  • Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size and model-error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.

  • Mlawer, E. J., S. J. Taubman, P. D. Brown, M. J. Iacono, and S. A. Clough, 1997: Radiative transfer for inhomogeneous atmosphere: RRTM, a validated correlated-k model for the long-wave. J. Geophys. Res., 102 (D14), 16663–16682.

  • Pu, Z-X., E. Kalnay, J. Sela, and I. Szunyogh, 1997: Sensitivity of forecast errors to initial conditions with a quasi-inverse linear method. Mon. Wea. Rev., 125, 2479–2503.

  • Simmons, A. J., and A. Hollingsworth, 2002: Some aspects of the improvement in skill of numerical weather prediction. Quart. J. Roy. Meteor. Soc., 128, 647–677.

  • Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the Advanced Research WRF version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp.

  • Snyder, C., and F. Zhang, 2003: Assimilation of simulated radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 131, 1663–1677.

  • Xiao, Q., X. Zou, M. Pondeca, M. A. Shapiro, and C. S. Velden, 2002: Impact of GMS-5 and GOES-9 satellite-derived winds on the prediction of a NORPEX extratropical cyclone. Mon. Wea. Rev., 130, 507–528.

  • Xue, M., M. Tong, and K. K. Droegemeier, 2006: An OSSE framework based on the ensemble square root Kalman filter for evaluating the impact of data from radar networks on thunderstorm analysis and forecasting. J. Atmos. Oceanic Technol., 23, 46–66.

  • Zhang, F., 2005: Dynamics and structure of mesoscale error covariance of a winter cyclone estimated through short-range ensemble forecasts. Mon. Wea. Rev., 133, 2876–2893.

  • Zhang, F., C. Snyder, and R. Rotunno, 2002: Mesoscale predictability of the “surprise” snowstorm of 24–25 January 2000. Mon. Wea. Rev., 130, 1617–1632.

  • Zhang, F., Z. Meng, and A. Aksoy, 2006: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part I: Perfect model experiments. Mon. Wea. Rev., 134, 722–736.

  • Zhang, H., J. Xue, S. Zhuang, G. Zhu, and Z. Zhu, 2004: GRAPeS 3D-Var data assimilation system ideal experiments. Acta Meteor. Sin., 62, 31–41.

  • Zupanski, D., 1997: A general weak constraint applicable to operational 4DVAR data assimilation systems. Mon. Wea. Rev., 125, 2274–2292.

  • Zupanski, M., 2005: Maximum likelihood ensemble filter: Theoretical aspects. Mon. Wea. Rev., 133, 1710–1726.

  • Zupanski, M., D. Zupanski, D. F. Parrish, E. Rogers, and G. DiMego, 2002: Four-dimensional variational data assimilation for the Blizzard of 2000. Mon. Wea. Rev., 130, 1967–1988.

Fig. 1. The SLP (every 2 hPa) and 1000-hPa winds (a full barb represents 5 m s⁻¹) at (a) 1200 UTC 24, (b) 0000 UTC 25, and (c) 1200 UTC 25 Jan 2000 from the truth simulation; and the corresponding 300-hPa PV (shaded), geopotential heights (every 80 m), and winds (a full barb represents 5 m s⁻¹) at (d) 1200 UTC 24, (e) 0000 UTC 25, and (f) 1200 UTC 25 Jan 2000.

Fig. 2. The distribution of observations at different times. Each En4DVAR assimilates observations within a 6-h window, and four En4DVAR cycles are designed from 0900 UTC 24 to 1200 UTC 25 Jan 2000: (left) 24 Jan (shaded) and (right) 25 Jan 2000.

Fig. 3. The flowchart of the En4DVAR OSSE design. The initial perturbation (ptb) is added to the control (CTL) initial field to obtain the truth and ensemble initial fields. The simulated observations (obs) are produced by adding normal perturbations to the truth state. The perturbation matrix (𝗫′b) is calculated from the ensemble forecast at every observation time. The control forecast is enhanced by the simulated observations in the assimilation window to obtain the analysis (denoted by A). The ensemble forecast fields are enhanced by the simulated observations so that a set of analysis fields (denoted by EA) is obtained and treated as the ensemble initial fields for the next assimilation cycle. The subscript denotes the time (e.g., 09 is 0900 UTC Jan 2000).

Fig. 4. The response increments from a single observation test using (a) WRF 3DVAR, (b) En4DVAR without localization, and (c) En4DVAR with localization. The figures show increments of the 1000-hPa wind vector (arrows) and temperature (shaded with the scale on the right).

Fig. 5. The vertical profiles of temperature increments at the observation location corresponding to Fig. 4 by En4DVAR-NL (circle-line) and En4DVAR-L (cross-line). The observation level (850 hPa) is marked.

Fig. 6. The horizontal mean absolute correlation coefficient of the background perturbation at the observation level (the 8th η level) at different forecast times. The correlation coefficients of u winds (thick line), temperature (thick plus-line), and humidity (thick square-line) are calculated with 36 members. The correlation coefficient of υ winds is similar to that of u winds (not shown). The dotted line is the mean absolute correlation coefficient of u winds calculated with 24 members, and the dashed line is the mean absolute correlation coefficient of u winds calculated with 12 members. The thin line, thin plus-line, and thin square-line correspond to the thick lines, but with statistics computed at the single observation location of Fig. 4.

Fig. 7. The vertical RMSE profiles of the forecast at 1500 UTC 24 from different analysis times: 0900 UTC 24 (square-line), 1200 UTC 24 (cross-line), and 1500 UTC 24 Jan 2000 (circle-line).

Fig. 8. Variations of (a) the cost function and (b) its gradient with respect to iterations in En4DVAR. The dashed line is from the observation term. The dotted line is from the background term. The solid line is from both the observation and background terms.

Fig. 9. The CTRL forecast error (CTRL minus truth) of SLP (shaded) and 1000-hPa wind vectors (a full barb represents 5 m s⁻¹) at (a) 1200 UTC 24, (b) 0000 UTC 25, and (c) 1200 UTC 25 Jan 2000; and the corresponding 300-hPa potential vorticity (shaded), geopotential heights (every 2 m), and wind vectors at (d) 1200 UTC 24, (e) 0000 UTC 25, and (f) 1200 UTC 25 Jan 2000.

Fig. 10. As in Fig. 9, but for En4DVAR analysis minus truth.

Fig. 11. The vertical profiles of domain-averaged RMSEs in (a) u winds, (b) υ winds, (c) temperature, and (d) humidity at 1200 UTC 24 (cross-dotted line), 0000 UTC 25 (thin line), and 1200 UTC 25 Jan 2000 (thick line). The results of CTRL, the En4DVAR 6-h forecast, and the En4DVAR analysis are denoted by black circles, black lines, and gray lines, respectively.

Fig. 12. The vertical profiles of domain-averaged analysis bias in (a) u winds, (b) υ winds, (c) temperature, and (d) humidity at 1200 UTC 24 (cross-dotted line), 0000 UTC 25 (thin line), and 1200 UTC 25 Jan 2000 (thick line). The results of CTRL and the En4DVAR analysis are denoted by black and gray lines, respectively.

Fig. 13. The variation of domain-averaged RMSE in CTRL (square-line), En3DVAR (circle-line), and En4DVAR (cross-line) with time for (a) u winds, (b) υ winds, (c) temperature, and (d) humidity. The star-line shows the variation of the forecast–analysis spread at different times in En4DVAR.

Fig. 14. The vertical profiles of domain-averaged RMSEs in (a) u winds, (b) υ winds, (c) temperature, and (d) humidity at 1200 UTC 24 (cross-dotted line), 0000 UTC 25 (thin line), and 1200 UTC 25 Jan 2000 (thick line). The results of the En3DVAR and En4DVAR analyses are denoted by black and gray lines, respectively.


An Ensemble-Based Four-Dimensional Variational Data Assimilation Scheme. Part II: Observing System Simulation Experiments with Advanced Research WRF (ARW)

Chengsi Liu, LASG, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China, and ESSL/MMM, National Center for Atmospheric Research,* Boulder, Colorado

Qingnong Xiao, ESSL/MMM, National Center for Atmospheric Research,* Boulder, Colorado

Bin Wang, LASG, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China

Abstract

An ensemble-based four-dimensional variational data assimilation (En4DVAR) algorithm and its performance in a low-dimensional space with a one-dimensional shallow-water model were presented in Part I. This algorithm adopts the standard incremental approach and preconditioning of the variational algorithm but avoids the need for a tangent linear model and its adjoint, so that it can be easily incorporated into variational assimilation systems. The current study explores techniques for applying En4DVAR in real-dimension data assimilation. The EOF decomposed correlation function operator and analysis time tuning are formulated to reduce the impact of sampling errors on the En4DVAR analysis. Observing System Simulation Experiments (OSSEs) are designed with the Advanced Research Weather Research and Forecasting (ARW-WRF) model, and the performance of En4DVAR in real-dimension data assimilation is examined. It is found that the designed En4DVAR localization techniques can effectively alleviate the impacts of sampling errors on the analysis. Most forecast errors and biases in ARW are reduced by En4DVAR compared to those in a control experiment. En3DVAR cycling experiments are used to compare the ensemble-based sequential algorithm with the ensemble-based retrospective algorithm. These experiments indicate that the ensemble-based retrospective algorithm, En4DVAR, produces an overall better analysis than the ensemble-based sequential En3DVAR cycling approach.

Corresponding author address: Dr. Qingnong Xiao, ESSL/MMM, National Center for Atmospheric Research, Boulder, CO 80307-3000. Email: hsiao@ucar.edu

1. Introduction

The incremental approach of four-dimensional variational data assimilation (4DVAR; Courtier et al. 1994) and ensemble Kalman filters (EnKFs; Evensen 1994) are two advanced techniques for atmospheric data assimilation. As a retrospective assimilation algorithm, 4DVAR can provide the optimal trajectory and can effectively assimilate nonsynoptic data in operations (Xiao et al. 2002; Simmons and Hollingsworth 2002). EnKF, on the other hand, can use a flow-dependent background error covariance (𝗕 matrix) calculated from ensemble forecasts and can be easily implemented without tangent linear and adjoint models. In recent years, approaches that combine features of the two data assimilation algorithms have been proposed, including the ensemble Kalman smoother (Evensen and van Leeuwen 2000), the maximum likelihood ensemble filter (Zupanski 2005), 4DEnKF (Hunt et al. 2004; Fertig et al. 2007), and E4DVAR (Zhang et al. 2008, manuscript submitted to Adv. Atmos. Sci.). These approaches use the flow-dependent 𝗕 matrix based on the statistics from ensemble forecasts while maintaining the retrospective character of the assimilation. Berre et al. (2007) demonstrated advantages of coupling the ensemble and variational techniques in their quasi-operational experiments. Recently, Liu et al. (2008) presented an ensemble-based four-dimensional variational data assimilation algorithm (En4DVAR), which uses the flow-dependent 𝗕 matrix constructed from ensemble forecasts and performs 4DVAR optimization. En4DVAR adopts the incremental and preconditioning ideas of the variational algorithm so that it can be easily incorporated into both operational and research variational assimilation systems. In addition, En4DVAR obviates the need for a tangent linear model and its adjoint, which are computationally expensive and difficult to develop and maintain.

Although the ensemble-based 𝗕 matrix has been widely used for flow-dependent analysis in data assimilation algorithms, reducing the effects of sampling errors remains a challenge (Houtekamer and Mitchell 1998; Hamill and Snyder 2000; Lorenc 2003). Because an infinite ensemble can never be obtained, the analysis from an ensemble-based data assimilation approach always contains noise due to sampling errors. Usually, the ensemble size is far smaller than the model dimension, so the rank of the 𝗕 matrix is restricted to a low-dimensional subspace. This rank deficiency makes the problem underdetermined, so the exact optimal solution cannot be obtained.
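
The rank limitation can be checked numerically. The following is a minimal NumPy sketch, not code from the study: the state dimension `n = 50`, the ensemble size `N = 10`, and the helper name `perturbation_matrix` are illustrative assumptions.

```python
import numpy as np

def perturbation_matrix(ensemble):
    """Normalized deviations from the ensemble mean.

    ensemble: array of shape (n, N), one member per column.
    Returns X' of shape (n, N) such that X' X'^T is the sample covariance.
    """
    n_members = ensemble.shape[1]
    mean = ensemble.mean(axis=1, keepdims=True)
    return (ensemble - mean) / np.sqrt(n_members - 1)

rng = np.random.default_rng(0)
n, N = 50, 10                       # hypothetical state dimension and ensemble size
ensemble = rng.standard_normal((n, N))

Xp = perturbation_matrix(ensemble)
B_ens = Xp @ Xp.T                   # n x n ensemble estimate of B

# The columns of X' sum to zero, so they span at most N - 1 directions.
rank_B = np.linalg.matrix_rank(B_ens)
print(rank_B, "of", n)
```

With a state dimension of 50, the estimated covariance can represent errors in only 9 independent directions, which is why some form of localization or hybridization is usually needed.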

Various techniques, including the Schur product operator (Houtekamer and Mitchell 2001; Lorenc 2003; Buehner 2005), local truncation (Houtekamer and Mitchell 1998), inflation (Anderson and Anderson 1999), and hybrid schemes (Hamill and Snyder 2000; Lorenc 2003) have been used to reduce the effect of sampling errors in the analysis of ensemble-based sequential data assimilation. Using such techniques, ensemble-based data assimilation algorithms have proven useful in either global atmospheric data assimilation (Mitchell et al. 2002; Houtekamer and Mitchell 2005; Buehner 2005) or regional mesoscale atmospheric data assimilation (Snyder and Zhang 2003; Xue et al. 2006).

Ensemble-based retrospective data assimilation algorithms (Zupanski 2005; Hunt et al. 2004; Fertig et al. 2007; Liu et al. 2008) are still subject to sampling errors and need to be tested further in real atmospheric data assimilation systems. Knowing how to reduce the effect of sampling errors upon the analysis in ensemble-based retrospective data assimilation schemes is essential in order to successfully implement these schemes in real data assimilation. The ensemble-based retrospective data assimilation algorithm can produce analysis noise from both spatial and temporal sampling errors. Here, the former and latter refer to errors at single and different observation times, respectively. Both of these types of sampling errors produced by spurious correlation are studied in this paper and two techniques are proposed to reduce their effects on the analysis. One is the EOF decomposed correlation function operator, which is similar to spatial localization in ensemble-based three-dimensional variational data assimilation (3DVAR; Buehner 2005). The other is analysis time tuning, which tunes an optimal analysis time to alleviate the temporal sampling errors. Numerical experiments are used to test the ability of both techniques to reduce analysis noise resulting from sampling errors.

In Liu et al. (2008, hereafter Part I), we conducted a preliminary test of our proposed En4DVAR scheme using a simple one-dimensional shallow-water model. The experimental results showed that En4DVAR provided an analysis comparable to those of widely used variational or ensemble-based sequential data assimilation schemes. In the current paper (Part II), we examine the En4DVAR performance in real model space using the Advanced Research Weather Research and Forecasting (ARW-WRF) model (Skamarock et al. 2005) in an Observing System Simulation Experiment (OSSE) framework. The 24–25 January 2000 snowstorm is selected as the OSSE case. To facilitate comparison between the ensemble-based sequential and ensemble-based retrospective data assimilation approaches, we conducted an experiment using ensemble-based three-dimensional variational (En3DVAR) cycling and compared it with En4DVAR.

The paper is organized as follows. In section 2 we review the basic En4DVAR algorithm and then describe the EOF decomposed correlation function operator and the analysis time tuning technique. Section 3 presents the OSSEs and their results. In particular, the snowstorm of 24–25 January 2000 is reviewed in section 3a; the configuration of the OSSE with En4DVAR and the model forecast is introduced in section 3b; En4DVAR spatial localization and analysis time tuning are tested in sections 3c and 3d; the performance of the En4DVAR scheme and its advantages in producing the analysis are evaluated in section 3e; and the comparison among the control experiment, En3DVAR cycling, and En4DVAR is presented in section 3f. A summary and discussion are provided in section 4.

2. Theoretical background of En4DVAR

a. En4DVAR basic algorithm

Variational data assimilation systems typically use the incremental approach (Courtier et al. 1994) and a preconditioning technique (Gilbert and Lemarechal 1989). The cost function in control variable w space is
$$J(w) = \frac{1}{2} w^{\mathrm{T}} w + \frac{1}{2} \sum_{i=1}^{I} (\mathbf{H}_i \mathbf{M}_i \mathbf{U} w - d_i)^{\mathrm{T}} \mathbf{O}_i^{-1} (\mathbf{H}_i \mathbf{M}_i \mathbf{U} w - d_i), \qquad (1)$$
where w is the control variable, I is the total number of time levels on which observations are available, 𝗛 is the tangent linear observation operator, 𝗠 is the tangent linear forecast model, and 𝗢 is the observation error covariance. The innovations at different times (with subscript i) are calculated using
$$d_i = y_i - H_i[M_i(\mathbf{x}_b)], \qquad (2)$$
where 𝘅b is the background state matrix, in which each column represents one member of the background state vector, H is the observation operator, M is the forecast model, and y is the observation vector. The preconditioning matrix, 𝗨, is defined by
$$\mathbf{B} = \mathbf{U} \mathbf{U}^{\mathrm{T}}. \qquad (3)$$
The final analysis is obtained from
$$\mathbf{x}_a = \mathbf{x}_b + \mathbf{U} w. \qquad (4)$$
The idea behind En4DVAR is that the preconditioning matrix in the incremental approach of 4DVAR is replaced with a perturbation matrix. The columns of the perturbation matrix, 𝗫′b, are the normalized deviations from the ensemble mean and are estimated from N ensemble members, namely,
$$\mathbf{X}'_b = \frac{1}{\sqrt{N-1}} (\mathbf{x}_{b1} - \bar{\mathbf{x}}_b, \mathbf{x}_{b2} - \bar{\mathbf{x}}_b, \ldots, \mathbf{x}_{bN} - \bar{\mathbf{x}}_b). \qquad (5)$$
The background error covariance 𝗕 is approximated by
$$\mathbf{B} \approx \mathbf{X}'_b \mathbf{X}'^{\mathrm{T}}_b. \qquad (6)$$
The En4DVAR cost function in control variable space is defined by
$$J(w) = \frac{1}{2} w^{\mathrm{T}} w + \frac{1}{2} \sum_{i=1}^{I} (\mathbf{H}_i \mathbf{M}_i \mathbf{X}'_b w - d_i)^{\mathrm{T}} \mathbf{O}_i^{-1} (\mathbf{H}_i \mathbf{M}_i \mathbf{X}'_b w - d_i). \qquad (7)$$
To avoid tangent linear and adjoint models in calculating the gradient of the cost function, we transform the perturbation matrix to observation space via
[Equation (8)]
The gradient of the cost function is then calculated using
$$\nabla_w J = w + \sum_{i=1}^{I} (\mathbf{H}_i \mathbf{M}_i \mathbf{X}'_b)^{\mathrm{T}} \mathbf{O}_i^{-1} (\mathbf{H}_i \mathbf{M}_i \mathbf{X}'_b w - d_i). \qquad (9)$$
After the minimization iterations, the optimal analysis 𝘅a can be obtained from
$$\mathbf{x}_a = \mathbf{x}_b + \mathbf{X}'_b w. \qquad (10)$$
The dimension of the control vector w in En4DVAR equals the ensemble size, rather than the analysis vector dimension as in 4DVAR. The computational cost of En4DVAR is much lower than that of 4DVAR because the ensemble size is far smaller than the analysis vector dimension. However, as discussed in section 1, reducing the dimension of the control vector w makes the problem underdetermined, and analysis noise is produced. We use localization to address this problem.
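
The steps above can be sketched on a toy problem. This is an illustrative NumPy example under stated assumptions, not the authors' implementation: the linear toy model `A`, the observation operator, the diagonal observation error covariance, and all function names are hypothetical. Deviations in observation space are taken about the mean of the transformed members, which is one possible reading of the transform to observation space. Because the cost function is quadratic in w, the sketch solves the small N × N normal equations directly instead of running an iterative minimizer.

```python
import numpy as np

def obs_space_perturbations(forecast, obs_op, ensemble):
    """Ensemble estimate of H M X'_b for one observation time.

    Every member is passed through the forecast model and observation
    operator; no tangent linear or adjoint code is needed.
    Assumption: deviations are taken about the mean of the transformed members.
    """
    n_members = ensemble.shape[1]
    y_ens = np.column_stack(
        [obs_op(forecast(ensemble[:, k])) for k in range(n_members)]
    )
    return (y_ens - y_ens.mean(axis=1, keepdims=True)) / np.sqrt(n_members - 1)

def cost_and_gradient(w, hmx_list, innov_list, obs_var):
    """J(w) and its gradient for O_i = obs_var * I at every observation time."""
    cost = 0.5 * (w @ w)
    grad = w.copy()
    for hmx, d in zip(hmx_list, innov_list):
        resid = hmx @ w - d
        cost += 0.5 * (resid @ resid) / obs_var
        grad += hmx.T @ resid / obs_var
    return cost, grad

def solve_control_vector(hmx_list, innov_list, obs_var, n_members):
    """Set the gradient to zero: (I + sum Y^T O^-1 Y) w = sum Y^T O^-1 d."""
    lhs = np.eye(n_members)
    rhs = np.zeros(n_members)
    for hmx, d in zip(hmx_list, innov_list):
        lhs += hmx.T @ hmx / obs_var
        rhs += hmx.T @ d / obs_var
    return np.linalg.solve(lhs, rhs)

# Toy setup (all values hypothetical): 8 state variables, 5 members, 2 observation times.
rng = np.random.default_rng(1)
n, N, obs_var = 8, 5, 0.25
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))    # one linear model step

def one_step(x):
    return A @ x

def two_steps(x):
    return A @ (A @ x)

def obs_op(x):
    return x[::2]                                     # observe every other variable

truth = rng.standard_normal(n)
ensemble = truth[:, None] + rng.standard_normal((n, N))
x_b = ensemble.mean(axis=1)
Xp = (ensemble - x_b[:, None]) / np.sqrt(N - 1)

forecasts = [one_step, two_steps]
hmx_list = [obs_space_perturbations(f, obs_op, ensemble) for f in forecasts]
# Error-free observations of the truth, for brevity.
innov_list = [obs_op(f(truth)) - obs_op(f(x_b)) for f in forecasts]

w_opt = solve_control_vector(hmx_list, innov_list, obs_var, N)
x_a = x_b + Xp @ w_opt                                # increment lies in the ensemble subspace
```

The control vector has only `N` entries, so the linear solve is cheap; the price is that the increment `Xp @ w_opt` cannot leave the span of the ensemble perturbations.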

b. Horizontal and vertical localization in En4DVAR

To reduce sampling errors caused by a finite number of ensemble members, Houtekamer and Mitchell (2001) employed the Schur operator (Gaspari and Cohn 1999) in EnKF, whereas Lorenc (2003) and Buehner (2005) used the Schur operator in the ensemble-based variational scheme. In En4DVAR, we introduce an EOF decomposed correlation function operator to modify the perturbation matrix 𝗫′b. This approach is similar to the spatial localization of Buehner (2005). Using the mathematical proofs in appendix A, we obtain the modified perturbation matrix 𝗣′b, which is defined by
[Equation (11)]
In (11), the subscripts h and υ are horizontal and vertical indices, respectively. Subscript number 1 means that the first column replaces all columns of the matrix. Here 𝗘 contains all of the eigenvectors and λ is a diagonal matrix whose diagonal elements are the eigenvalues obtained from an EOF decomposition of the correlation function 𝗖:
$$\mathbf{C} = \mathbf{E} \boldsymbol{\lambda} \mathbf{E}^{\mathrm{T}}. \qquad (12)$$
In defining 𝗖 in (12), we use the compactly supported second-order autoregressive function as the horizontal correlation model (Liu and Rabier 2003):
[Equation (13)]
where s is the spherical separation in degrees between two data points, and s0 and s1 are the correlation scale and the cutoff distance beyond which the correlation becomes zero. Following Zhang et al. (2004), we perform vertical localization using the correlation function,
[Equation (14)]
where Δ log p is the distance between two vertical levels in log p space, and p is pressure.
Using the new perturbation matrix in (11) in En4DVAR is equivalent to modifying the 𝗕 matrix with a correlation function [e.g., through the Schur product operator in the EnKF (see appendix A)]:
$$\mathbf{P}'_b \mathbf{P}'^{\mathrm{T}}_b = \mathbf{C} \circ (\mathbf{X}'_b \mathbf{X}'^{\mathrm{T}}_b). \qquad (15)$$
The correlation function in (13) sets the correlation to zero beyond the cutoff distance. It also smooths the correlation within the cutoff distance. After localization, the perturbation matrix 𝗫′b in (7) is replaced by 𝗣′b, and the new cost function can be rewritten as
$$J(w) = \frac{1}{2} w^{\mathrm{T}} w + \frac{1}{2} \sum_{i=1}^{I} (\mathbf{H}_i \mathbf{M}_i \mathbf{P}'_b w - d_i)^{\mathrm{T}} \mathbf{O}_i^{-1} (\mathbf{H}_i \mathbf{M}_i \mathbf{P}'_b w - d_i). \qquad (16)$$
If all the eigenvectors and eigenvalues are used in (11), 𝗣′b will be a matrix with n rows and n × n × N columns (n is the analysis vector dimension). The dimension of the control vector will be increased to n × n × N. But if a few leading modes can represent the correlation function very well, the dimension of the control vector is reduced to rh × rυ × N, in which rh is the number of retained horizontal eigenvector modes and rυ is the number of retained vertical eigenvector modes. This reduction of dimension can greatly reduce the En4DVAR computational cost in real-dimension data assimilation.
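
The equivalence between a mode-expanded perturbation matrix and a Schur-product-localized covariance can be illustrated numerically. The following NumPy sketch is a simplified one-dimensional analogue, not the paper's operator: it uses a single correlation matrix instead of separate horizontal and vertical ones, and a Gaussian correlation with an arbitrary length scale in place of the compactly supported function of (13). The function names are hypothetical.

```python
import numpy as np

def truncated_sqrt_modes(corr, n_modes):
    """Leading EOF modes of a correlation matrix, scaled by sqrt(eigenvalue).

    Returns an (n, n_modes) array whose k-th column is e_k * sqrt(lambda_k).
    """
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]       # indices of the largest modes
    lam = np.clip(eigvals[order], 0.0, None)          # guard against round-off negatives
    return eigvecs[:, order] * np.sqrt(lam)

def localized_perturbations(Xp, corr, n_modes):
    """Element-wise product of each scaled mode with every column of X'.

    The result has shape (n, n_modes * N).
    """
    modes = truncated_sqrt_modes(corr, n_modes)
    blocks = [modes[:, [k]] * Xp for k in range(modes.shape[1])]
    return np.hstack(blocks)

# Toy 1D grid; the Gaussian correlation and its length scale are assumptions.
rng = np.random.default_rng(2)
n, N = 40, 6
grid = np.arange(n)
corr = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 4.0) ** 2)
Xp = rng.standard_normal((n, N)) / np.sqrt(N - 1)

P_full = localized_perturbations(Xp, corr, n_modes=n)     # all modes
P_trunc = localized_perturbations(Xp, corr, n_modes=10)   # leading modes only

B_schur = corr * (Xp @ Xp.T)                               # Schur (element-wise) product
err_full = np.abs(P_full @ P_full.T - B_schur).max()
err_trunc = np.abs(P_trunc @ P_trunc.T - B_schur).max()
```

With all modes retained, `err_full` is at round-off level, so the expanded perturbations reproduce the localized covariance. Truncating to 10 modes shrinks the control vector from `n * N` to `10 * N` entries at the cost of a nonzero `err_trunc`, which mirrors the trade-off described for rh and rυ.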

c. Analysis time tuning in En4DVAR

In EnKF analysis, the perturbation matrix is calculated at the analysis time. The analysis noise comes from spurious spatial correlations between grid points. The greater the distance between the grid points whose perturbations are correlated, the noisier the analysis (Houtekamer and Mitchell 1998). In contrast to EnKF, En4DVAR needs not only the perturbation matrix at the analysis time, but also the perturbation matrices at all observation times. Therefore, the analysis noise can come from both spatially spurious correlations and temporally spurious correlations in the perturbations.

Adding the time subscript index to (7), the cost function can be rewritten as
$$J(w) = \frac{1}{2} w^{\mathrm{T}} w + \frac{1}{2} \sum_{i=1}^{I} (\mathbf{H}_i \mathbf{M}_{0 \sim i} \mathbf{X}'_{b0} w - d_i)^{\mathrm{T}} \mathbf{O}_i^{-1} (\mathbf{H}_i \mathbf{M}_{0 \sim i} \mathbf{X}'_{b0} w - d_i). \qquad (17)$$
For simplicity, we use the En4DVAR analysis with observations at only a single time to illustrate the problem. With only one observation time, the mathematical expression for the analysis derived from (17) becomes
$$\mathbf{x}_a = \mathbf{x}_{b0} + \mathbf{X}'_{b0} (\mathbf{H}_1 \mathbf{M}_{0 \sim 1} \mathbf{X}'_{b0})^{\mathrm{T}} [(\mathbf{H}_1 \mathbf{M}_{0 \sim 1} \mathbf{X}'_{b0}) (\mathbf{H}_1 \mathbf{M}_{0 \sim 1} \mathbf{X}'_{b0})^{\mathrm{T}} + \mathbf{O}_1]^{-1} d_1. \qquad (18)$$
If the observation and analysis times are identical, (18) is the same as the equation in the EnKF analysis and the 𝗫′b0(𝗛1𝗠0∼1𝗫′b0)ᵀ term becomes 𝗫′b0(𝗛0𝗫′b0)ᵀ. At large distances from observation points, the correlation between 𝗫′b0 and 𝗛0𝗫′b0 should be close to zero. However, some spurious correlations may appear because of the limited number of ensemble members used in calculating the perturbation matrix. These spurious correlations make the analysis noisy and degrade its quality. Similarly, when the observation time is significantly different from the analysis time, 𝗫′b0(𝗛1𝗠0∼1𝗫′b0)ᵀ should be close to a zero matrix because the correlation between 𝗫′b0 and 𝗛1𝗠0∼1𝗫′b0 is very small. With a limited number of ensemble members, temporal analysis noise can arise when the observation times depart significantly from the analysis time.
To reduce temporal analysis noise, the En4DVAR cost function is modified to allow an analysis time as close as possible to the time of most observations, instead of the beginning of the data assimilation window. The cost function can be reformulated as
$$J(w) = \frac{1}{2} w^{\mathrm{T}} w + \frac{1}{2} \sum_{i=1}^{I} (\mathbf{H}_i \mathbf{M}_{\alpha \sim i} \mathbf{X}'_{b\alpha} w - d_i)^{\mathrm{T}} \mathbf{O}_i^{-1} (\mathbf{H}_i \mathbf{M}_{\alpha \sim i} \mathbf{X}'_{b\alpha} w - d_i). \qquad (19)$$

In (19), the subscript α indicates any time within the assimilation window. In 4DVAR, if the analysis time is not at the beginning of the assimilation window, a backward tangent linear model and a forward adjoint model are needed to calculate the gradient. Appendix B shows that the perturbation matrix calculation in (19) is the same as in (17). Therefore, En4DVAR can be implemented at any analysis time within the assimilation window and does not require the backward tangent linear model or the forward adjoint model.
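
The dependence of spurious correlations on ensemble size, which underlies both the spatial and the temporal noise discussed in this subsection, can be illustrated with a small Monte Carlo experiment. This is a hedged sketch rather than a reproduction of any result in the paper: the variables are independent Gaussian draws, so the true correlation is exactly zero, and the function name and number of pairs are arbitrary choices. The ensemble sizes 12, 24, and 36 match those quoted in the caption of Fig. 6.

```python
import numpy as np

def mean_abs_sample_correlation(n_members, n_pairs, rng):
    """Mean |r| between pairs of variables whose true correlation is zero."""
    a = rng.standard_normal((n_pairs, n_members))
    b = rng.standard_normal((n_pairs, n_members))
    a -= a.mean(axis=1, keepdims=True)
    b -= b.mean(axis=1, keepdims=True)
    r = (a * b).sum(axis=1) / np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1))
    return np.abs(r).mean()

rng = np.random.default_rng(3)
spurious = {N: mean_abs_sample_correlation(N, 20000, rng) for N in (12, 24, 36)}
for N, value in spurious.items():
    print(N, round(value, 3))
```

The mean absolute sample correlation shrinks only slowly as members are added, so with a few dozen members a nonzero floor remains even where the true correlation vanishes. Once the true correlation between the analysis-time and observation-time perturbations falls to that floor, the increment is dominated by noise, which is the motivation for moving the analysis time toward the observations.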

3. Observing System Simulation Experiments with the Blizzard of 2000

a. Synoptic overview

During 24–25 January 2000, a major winter storm, termed the “Blizzard of 2000” (Zupanski et al. 2002), affected most of the east coast of the United States. Heavy rain, mixed with snow and freezing rain, fell in North and South Carolina, producing an enormous natural hazard. In the Washington, D.C., area, the heavy snow blocked most roads, causing business closures and significant disruption of commerce. The operational model failed to predict the strong convection and precipitation associated with this snowstorm.

A broad synoptic trough moved over the eastern United States during 24–25 January 2000. At 1200 UTC 24 January, an associated surface low appeared over north Florida. The surface low then moved near the Washington, D.C., area along the eastern coast during the next 24 h. After 0000 UTC 25 January, the convection developed rapidly and heavy snowfalls were produced from the Carolinas through the Washington, D.C., area. The minimum sea level pressure associated with the cyclone showed a rapid drop of 22 hPa from 1200 UTC 24 January to 1200 UTC 25 January (Zhang et al. 2002). This case has been analyzed in conjunction with a variety of topics including mesoscale predictability (Zhang et al. 2002), dynamics and structure (Zhang 2005), 4DVAR experiments (Zupanski et al. 2002), and EnKF performance for mesoscale data assimilation (Zhang et al. 2006; Meng and Zhang 2007).

b. Experimental design

The ARW-WRF (Skamarock et al. 2005) was used for this study. All experiments were conducted over a grid mesh of 94 × 94 with grid spacing of 27 km. The domain covers the eastern half of the United States. In the vertical, there are 27 η layers, and the model top is 50 hPa. The physical processes in all experiments include the Rapid Radiative Transfer Model (RRTM) based on Mlawer et al. (1997) for longwave radiation, the Dudhia (1989) shortwave radiation scheme, the Yonsei University (YSU) PBL scheme (Hong et al. 2006), and the Kain–Fritsch (new Eta) scheme (Kain 2004). The National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) analysis provided the first-guess fields and boundary conditions.

The control run (CTRL) is integrated for 42 h and is initialized using the first-guess fields at 1800 UTC 23 January 2000. Random perturbations are added to the first guess of CTRL to produce 37 ensemble members. The ensemble member that compares most favorably to observations in terms of the location and strength of the cyclone is chosen as the truth simulation, while the other members constitute the reference forecast ensemble. The random perturbations are derived from the background error covariance of the WRF 3DVAR data assimilation system using an approach similar to the initial ensemble generating method in Houtekamer and Mitchell (2005). Therefore, the perturbations are consistent with the background error covariance defined by the WRF 3DVAR data assimilation. We also perturbed the boundary conditions using the Data Assimilation Research Testbed (DART) system (Anderson 2001, 2003).

Figures 1a–c show the sea level pressure (SLP) and 1000-hPa wind vectors from the truth simulation at 12-h intervals from 1200 UTC 24 January to 1200 UTC 25 January 2000. The corresponding potential vorticity (PV), geopotential heights, and wind vectors at 300 hPa are shown in Figs. 1d–f. At 1200 UTC 24 January (Fig. 1a), the location and intensity of the simulated cyclone are similar to the analysis in Zupanski et al. (2002). Twelve hours later (Fig. 1b), the simulated cyclone is slightly weaker than in the GFS analysis (Zupanski et al. 2002). At 1200 UTC 25 January (Fig. 1c), the intensity of the simulated cyclone is over 8 hPa higher than in the analysis, and its location is east of the analyzed cyclone location.

To conduct the En4DVAR OSSEs, we generated simulated observations from 0900 UTC 24 January to 1200 UTC 25 January 2000 from the truth simulation, adding random normal perturbations with variances identical to the observation errors defined in WRF-VAR. Four types of observations [sounding, surface observation, satellite-retrieved winds, and Quick Scatterometer (QuikSCAT) winds] were tested in this study. The simulated observations are located at the sites of the real observations, but their values are replaced by the simulated values. Following the EnKF implementation in Houtekamer and Mitchell (1998), these observations are first perturbed and then assimilated to update the ensemble forecast fields in the En4DVAR cycling procedure.
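The two noise-adding steps in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the WRF-VAR implementation: the function names, the temperature values, and the 1-K error standard deviation are hypothetical, and the truth is assumed to have already been interpolated to the observation sites.

```python
import numpy as np

def simulate_observations(truth_at_obs, obs_error_std, rng):
    """Add zero-mean Gaussian noise with the prescribed observation-error
    standard deviation to truth values interpolated to observation sites."""
    truth_at_obs = np.asarray(truth_at_obs, dtype=float)
    return truth_at_obs + obs_error_std * rng.standard_normal(truth_at_obs.shape)

def perturb_observations(obs, obs_error_std, n_members, rng):
    """Return one independently perturbed copy of the observations per member."""
    obs = np.asarray(obs, dtype=float)
    noise = obs_error_std * rng.standard_normal((n_members, obs.size))
    return obs[None, :] + noise

rng = np.random.default_rng(1)
truth_at_obs = np.linspace(270.0, 290.0, 500)  # hypothetical temperatures (K)
obs_error_std = 1.0                            # hypothetical error std (K)

# Step 1: simulated observations used to update the deterministic forecast.
obs = simulate_observations(truth_at_obs, obs_error_std, rng)
# Step 2: perturbed copies used to update each ensemble member.
member_obs = perturb_observations(obs, obs_error_std, 36, rng)
print(member_obs.shape)
```

Some perturbed-observation implementations also recenter the noise so that its ensemble mean is exactly zero; that refinement is omitted here.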

Figure 2 shows the temporal distributions of the different observations. We designed an assimilation window of 6 h for each update analysis, which resulted in four windows used in the En4DVAR cycling for the OSSEs from 0900 UTC 24 to 1200 UTC 25 January. Figure 3 shows the flowchart of the En4DVAR procedure used in this OSSE. The deterministic forecast is updated by observations, and the ensemble forecast is updated by perturbed observations, which provides the perturbation matrices for En4DVAR. We chose the u and υ wind components, temperature, and humidity as the En4DVAR analysis variables. We use the deterministic forecast as the background field in the En4DVAR, whereas the ensemble mean is used only in calculating the perturbation matrix. One of the main purposes of this study is to examine the effect of the flow-dependent 𝗕 matrix in the 4DVAR. If the ensemble mean were used as the background field, it would be hard to know whether an improvement in the En4DVAR analysis came from the flow-dependent 𝗕 matrix or from the ensemble forecast.

c. Test of the En4DVAR localization technique

To perform localization, we introduced the correlation function operator in the 𝗕 matrix. Since the effects of observations propagate in time, a flow-dependent localization should ideally be used. However, the correlation function operator currently available in the WRF En4DVAR does not have this feature because of the complexity of implementation and the computational cost. Because the perturbation matrix 𝗫′b is estimated from the model ensemble forecast while the correlation operator is fixed in time, the analysis may lose some information from observations that are far from the analysis time. In the future, we will investigate this problem and develop a new perturbation matrix 𝗣′b that contains flow-dependent localization.

When the correlation function operator is applied to En4DVAR, the control vectors are enlarged and the computational cost becomes very high. To reduce the computational cost, EOFs are used to decompose the correlation function and only a limited number of leading modes are retained. Table 1 shows the cumulative relative RMSE corresponding to different horizontal and vertical truncation modes, which is defined as
i1520-0493-137-5-1687-e20
where G is the cumulative relative RMSE, m is the number of retained modes, mt is the total number of modes, and λ is the eigenvalue. In this study, we selected 36 truncation modes in the horizontal and 10 in the vertical. As shown in Table 1, both the horizontal and vertical cumulative relative RMSEs are over 90% with these selections. Although the truncation makes the 𝗕 matrix rank deficient, its rank is still comparable to the number of observations in this experiment.
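How a truncation number might be chosen from the eigenvalue spectrum can be sketched as below. Because the defining equation (20) is not legible in this copy, the sketch assumes that G is the ratio of the sum of the leading m eigenvalues to the sum of all mt eigenvalues; the paper's definition may differ (for example, by a square root). The Gaussian correlation function, the 60-point grid, and the function names are hypothetical.

```python
import numpy as np

def cumulative_fraction(eigenvalues):
    """Assumed form of G: leading-eigenvalue partial sums over the total sum."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off values
    return np.cumsum(lam) / lam.sum()

def modes_needed(eigenvalues, threshold=0.9):
    """Smallest number of leading modes whose cumulative fraction reaches threshold."""
    g = cumulative_fraction(eigenvalues)
    return min(int(np.searchsorted(g, threshold)) + 1, g.size)

# Hypothetical 1D Gaussian correlation on 60 grid points, length scale 5 points.
x = np.arange(60)
corr = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 5.0) ** 2)
lam = np.linalg.eigvalsh(corr)

g = cumulative_fraction(lam)
m90 = modes_needed(lam, 0.9)
print(f"{m90} of {lam.size} modes reach G = {g[m90 - 1]:.3f}")
```

A smooth correlation function concentrates its variance in a few leading modes, which is why a small truncation number can retain more than 90% of the signal.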

We employed a single observation test to examine the En4DVAR localization technique in this study. A single temperature observation at 33.5°N, 78.4°W and 850 hPa is assimilated using WRF 3DVAR, En4DVAR without localization (En4DVAR-NL), and En4DVAR with localization (En4DVAR-L). The resulting wind vector and temperature increments at 1000 hPa are shown in Fig. 4. The temperature analysis increment in WRF 3DVAR (Fig. 4a) reflects the homogeneous and isotropic structure of the WRF 3DVAR 𝗕 matrix. Because of the quasigeostrophic balance constraint in the variable transform of WRF 3DVAR, the wind analysis increments show quasigeostrophic characteristics in the single temperature observation assimilation experiment. On the other hand, Figs. 4b,c demonstrate the flow-dependent 𝗕 matrix structure in the En4DVAR. The temperature analysis increments in En4DVAR extend along the eastern coast, corresponding to the flow direction (Fig. 1a) and isotherms (not shown). In contrast to the variable transform in WRF 3DVAR, the relations among different physical variables in the En4DVAR depend on ensemble statistics. The analysis increments in the En4DVAR (Figs. 4b,c) therefore show the characteristics of the ensemble statistics.

If no localization is performed in the En4DVAR (Fig. 4b), considerable increment noise appears because of sampling errors, as discussed in section 1. The amplitude of such noise can be comparable to the increment signal at observation locations. With localization by the EOF decomposed correlation function operator, the noise almost disappears while the increment signal from the observation is maintained (Fig. 4c). Figure 5 shows the vertical profiles of the temperature analysis increments at the observation location obtained using En4DVAR-NL and En4DVAR-L. At high levels, there is an obvious spurious analysis increment if the localization is not applied, but the noise is filtered out after localization. Although noise can be filtered out by the localization technique, the analysis increment signal is also slightly reduced compared to the one without localization because only a limited number of modes is used. However, the major analysis feature is retained since over 90% of the signal is still maintained.
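The localization used in these tests rests on an identity that can be checked numerically: if the correlation matrix is factored as 𝗖 ≈ 𝗖′𝗖′ᵀ using truncated EOFs, then a matrix whose columns are element-wise products of one column of 𝗖′ with one ensemble perturbation reproduces the Schur product of the correlation matrix with the ensemble covariance. The sketch below uses this standard construction, which may differ in detail from the operator defined in appendix A; the grid, the Gaussian correlation, the ensemble size, and the function name are hypothetical.

```python
import numpy as np

def localized_perturbations(x_pert, c_sqrt):
    """Columns are element-wise products of each column of c_sqrt with each
    column of x_pert: shapes (n, N) and (n, r) give (n, r * N)."""
    n, n_members = x_pert.shape
    n_modes = c_sqrt.shape[1]
    return (c_sqrt[:, :, None] * x_pert[:, None, :]).reshape(n, n_modes * n_members)

rng = np.random.default_rng(2)
n, n_members, n_modes = 40, 12, 8  # hypothetical sizes

# Hypothetical Gaussian correlation and its truncated EOF square root.
x = np.arange(n)
corr = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 4.0) ** 2)
lam, eofs = np.linalg.eigh(corr)  # eigenvalues in ascending order
lam = np.clip(lam, 0.0, None)
c_sqrt = eofs[:, -n_modes:] * np.sqrt(lam[-n_modes:])

# Ensemble perturbations scaled so that x_pert @ x_pert.T is the sample covariance.
members = rng.standard_normal((n, n_members))
x_pert = (members - members.mean(axis=1, keepdims=True)) / np.sqrt(n_members - 1)

p_pert = localized_perturbations(x_pert, c_sqrt)
schur = (c_sqrt @ c_sqrt.T) * (x_pert @ x_pert.T)
print(p_pert.shape, np.allclose(p_pert @ p_pert.T, schur))
```

The control vector grows from N to r × N elements, which is the cost increase that motivates keeping the number of EOF modes small.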

d. Analysis time tuning in En4DVAR

To illustrate the impact of temporal sampling errors in the En4DVAR, we calculate the horizontal mean absolute correlation coefficient of the background perturbation at the observation level (the 8th η level) at different forecast times (Fig. 6). The perturbation correlation between different times decreases as the forecast time increases. This means that an observation far from the analysis time will have little impact on the analysis. To evaluate the sensitivity of the correlation to ensemble size, the correlation coefficients are also calculated with different ensemble sizes. The correlation difference between different ensemble sizes increases with time. That is, a smaller correlation needs more samples to be estimated, so more ensemble members are needed to extract information from observations far from the analysis time.

Comparing the correlation coefficients among different variables, the time scale of the humidity error temporal correlation is the shortest, while that of the winds is the longest. This indicates that the humidity analysis is more sensitive to temporal sampling errors in the En4DVAR. In the unstable flow region, the perturbations usually develop rapidly so that the temporal correlations among perturbations are weaker. Figure 6, which shows the temporal correlation coefficient among perturbations at the same single observation location in Fig. 4 near the cyclone, illustrates this point. The temporal correlation scale there is shorter than that of the horizontal mean. In particular, note that the humidity perturbation correlation coefficient is less than 0.2 after t = 2 h. This indicates that a humidity observation will not provide significant information to the analysis if the observation time differs from the analysis time by 2 h or more.

To evaluate the sensitivity of En4DVAR to analysis time, we choose 0900, 1200, and 1500 UTC 24 January as En4DVAR analysis times, and denote these three experiments as En4D-09, En4D-12, and En4D-15, respectively. The RMSEs obtained from integrating the En4D-09 and En4D-12 analysis fields to 1500 UTC 24 January are compared in Fig. 7. Because of sampling errors, the En4DVAR analyses are very sensitive to the analysis time even within the 6-h assimilation window. Comparing all RMSEs, En4D-12 clearly provides the best analysis because its analysis time is closest to the majority of observations and its temporal analysis noise is the smallest. Except for humidity, the RMSEs in En4D-09 are larger than in En4D-15 (Figs. 7a–c) because the En4D-09 sampling errors increase quickly in a 6-h forecast. The spatial and temporal correlation scales of humidity errors are usually smaller than those of other variables. Therefore, the humidity analysis is more sensitive to sampling errors. Because the humidity observations are at 0900 (surface observation) and 1200 UTC (sounding) 24 January, the humidity analysis in En4D-15 is clearly affected by temporal sampling errors and is even worse than the one in En4D-09 (Fig. 7d).

e. En4DVAR OSSE results

First, we examine the minimization efficiency of WRF En4DVAR. Figure 8 shows the En4DVAR cost function and its gradient for the first analysis time (1200 UTC 24 January). In this experiment, the En4DVAR cost function reaches the WRF-VAR minimization convergence standard after 38 iterations. This indicates that the minimization of WRF En4DVAR converges efficiently. Table 2 compares AO (analysis minus observation) with BO (background minus observation) at analysis times, indicating that the analysis fields agree more closely with the observations than the background fields do.

Figure 9 shows the CTRL forecast errors (forecast minus truth) at 1200 UTC 24 January and 0000 and 1200 UTC 25 January (similar to Fig. 1, but showing the errors between the forecast and the truth). Before the winter storm occurs, the forecast error has small amplitude even after the 18-h forecast from 1800 UTC 23 January (Figs. 9a,d). When the storm develops, the forecast error rapidly increases near the storm region (Figs. 9b,e). With the En4DVAR OSSE configuration described in section 2, the analysis error in Fig. 10 is similar to the forecast error in Fig. 9. The En4DVAR analysis errors are clearly smaller than the background errors. The greatest analysis errors of En4DVAR are mainly over the western Atlantic because the background error develops most rapidly near the cyclone, and the limited synoptic observations in the region are not enough to correct the error.

Figure 11 shows the vertical profiles of domain-averaged RMSEs in the 6-h forecasts by CTRL and En4DVAR, as well as in the En4DVAR analyses at 1200 UTC 24 January, 0000 UTC 25 January, and 1200 UTC 25 January. The RMSE in CTRL increases severalfold after the 12-h forecast from 1200 UTC 24 January, with the error developing rapidly after the onset of the winter storm. After 0000 UTC 25 January, most errors have reached saturation and no longer increase. However, the errors of the V-wind component and humidity in CTRL at lower levels continue to increase because the V-wind component and humidity still change significantly after 0000 UTC 25 January (Figs. 1b,c). In contrast to CTRL, the forecast error in En4DVAR decreases after the first analysis and maintains small amplitude during the 24-h forecast–analysis cycling. Overall, the En4DVAR analysis error is always smaller than the background error. However, the error reduction in the En4DVAR analysis relative to the background is smaller at higher levels than at lower levels. This is because only rawinsonde observations exist at high levels and observations from lower levels have marginal impacts on the high-level analysis. Figure 12 shows the domain-averaged analysis bias in the CTRL and En4DVAR experiments from 1200 UTC 24 January to 1200 UTC 25 January. The biases of winds in CTRL (Figs. 12a,b) are mainly in the upper levels (around the 15th level) and the low levels (below the 5th level), which is similar to the RMSE vertical profile in Fig. 11. This suggests that CTRL has a systematic error in predicting the upper-level trough and the cyclone. After the En4DVAR analysis, however, the upper-level biases are reduced significantly and the low-level biases are near zero. We suspect that the QuikSCAT wind observations have a positive impact on the En4DVAR analysis.

f. Comparison of En4DVAR and En3DVAR cycling

En4DVAR is an ensemble-based retrospective algorithm. If no sampling errors are considered, the En4DVAR analysis fits the optimal trajectory. Unlike the ensemble-based sequential algorithm, it does not necessarily fit observations at individual points. Temporal sampling errors can affect the En4DVAR analysis, as discussed in section 3d. To compare the ensemble-based sequential and ensemble-based retrospective algorithms, we designed an En3DVAR cycling experiment that uses the same configuration as En4DVAR but assimilates the observations from only one time in each analysis, cycling through all the observation times in the assimilation window.

Figure 13 shows the time variations of the domain-averaged RMSE in the analyses of CTRL, En3DVAR cycling, and En4DVAR. The error in CTRL rises suddenly after 1800 UTC 24 January and becomes stable after 2100 UTC 24 January, but the humidity error still increases slowly until 0600 UTC 25 January. CTRL hardly predicts the cyclone development from 1200 UTC 24 January, resulting in the most significant errors near the cyclone location (Fig. 9). When En3DVAR or En4DVAR analysis cycling is applied, the forecast–analysis errors are far smaller than those of CTRL. Meanwhile, the overall performance of En4DVAR is better than that of En3DVAR, indicating that the ensemble-based retrospective algorithm is more robust than the ensemble-based sequential algorithm. The time variation of forecast–analysis spread statistics in En4DVAR is also plotted in Fig. 13. During the first two cycles, the perturbation spread is clearly larger than the RMSE. After the third cycle, however, the perturbation spread is slightly smaller than the RMSE, which indicates that filter divergence also occurs in En4DVAR. The filter divergence seems more serious in the humidity analysis (Fig. 13d) because the humidity perturbations are confined to small amplitude in our experiments to prevent oversaturation or negative humidity. This suggests that a better humidity perturbation method should be explored in the future. Another reason for the humidity analysis divergence is that humidity spatial and temporal scales are smaller and more sensitive to sampling errors. Adding an inflation factor (Anderson and Anderson 1999) can alleviate the filter divergence problem. Figure 14 shows the vertical profiles of the domain-averaged RMSE in the En3DVAR and En4DVAR analyses at 1200 UTC 24 January, 0000 UTC 25 January, and 1200 UTC 25 January. At most altitudes, the RMSEs in En4DVAR are smaller than those in En3DVAR.
However, the RMSEs of the En4DVAR analysis are sometimes larger than those of the En3DVAR analysis at low levels because the temporal error correlation scale is shorter there, and the En4DVAR analysis is seriously affected by temporal sampling errors at low levels. The humidity analysis in En4DVAR is worse than that in En3DVAR in the first analysis. As discussed in section 3d, the humidity analysis is more sensitive to temporal sampling errors. After several forecast–analysis cycles, the humidity RMSE in the En4DVAR forecast–analysis is comparable to that of En3DVAR because the analysis of other variables in En4DVAR is better than in En3DVAR. Therefore, the final En4DVAR humidity analysis is also improved. Since both En4DVAR and En3DVAR use the flow-dependent 𝗕 matrix, we do not expect En4DVAR to result in a significant improvement over En3DVAR for the OSSEs designed in this study. First, the analysis of En4DVAR is affected by temporal sampling errors, which do not exist in En3DVAR. Second, as discussed in section 3c, a fixed localization is used in our experiment. Third, most observations, especially sounding observations, are at the analysis time of En4DVAR; this reduces the benefits of En4DVAR compared with En3DVAR. However, we believe the results of En4DVAR can be further improved relative to those of En3DVAR if a flow-dependent localization technique is adopted and more nonsynoptic observations are used during assimilation cycles.
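The spread-versus-RMSE diagnostic used above to identify filter divergence, and the inflation remedy it suggests, can be sketched as follows. This is a minimal illustration of multiplicative covariance inflation in the spirit of Anderson and Anderson (1999), not the configuration of these experiments; the function names, the synthetic ensemble, and the factor 1.5 are hypothetical.

```python
import numpy as np

def ensemble_spread(ensemble):
    """Root-mean-square ensemble standard deviation; ensemble is (members, state)."""
    return float(np.sqrt(np.mean(ensemble.var(axis=0, ddof=1))))

def rmse(estimate, truth):
    """Root-mean-square error of an estimate against the truth."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

def inflate(ensemble, factor):
    """Scale perturbations about the ensemble mean by a multiplicative factor."""
    mean = ensemble.mean(axis=0)
    return mean + factor * (ensemble - mean)

rng = np.random.default_rng(3)
truth = np.zeros(1000)
# Hypothetical underdispersive ensemble: the mean error (std 1.0) exceeds
# the member spread (std 0.6), mimicking filter divergence.
mean_error = rng.standard_normal(1000)
ensemble = mean_error + 0.6 * rng.standard_normal((36, 1000))

spread_before = ensemble_spread(ensemble)
error = rmse(ensemble.mean(axis=0), truth)
inflated = inflate(ensemble, 1.5)
spread_after = ensemble_spread(inflated)
print(f"rmse = {error:.2f}, spread = {spread_before:.2f} -> {spread_after:.2f}")
```

Inflation leaves the ensemble mean unchanged and multiplies the spread by the chosen factor, so it addresses underdispersion but not errors in the mean itself.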

4. Summary and discussion

In recent years, ensemble-based sequential data assimilation algorithms have yielded many encouraging experimental results (Mitchell et al. 2002; Houtekamer and Mitchell 2005; Buehner 2005; Snyder and Zhang 2003; Xue et al. 2006). Ensemble-based retrospective data assimilation algorithms constitute a new approach whose implementation techniques and performance in numerical weather prediction of the real atmosphere require further research. Liu et al. (2008) proposed an ensemble-based retrospective data assimilation algorithm called En4DVAR, which uses background perturbations in observation space to calculate the gradient during the minimization procedure. This obviates the need for tangent linear and adjoint models in its formulation. In Liu et al. (2008), a one-dimensional shallow-water model was used for preliminary tests of the En4DVAR scheme. These tests showed that En4DVAR could produce an analysis comparable to those obtained from widely used variational or ensemble data assimilation schemes.

For implementation of the En4DVAR in real data assimilation, the EOF decomposed correlation function operator and analysis time tuning are two techniques proposed in this paper to reduce the impact of sampling errors upon the analysis. The purpose of the first technique is to reduce spatial sampling errors, while the purpose of the second one is to reduce temporal sampling errors in the En4DVAR analysis.

Using observing system simulation experiments of the “Blizzard of 2000” snowstorm case, the En4DVAR performance is examined by assimilating simulated rawinsonde, surface, satellite-retrieved wind, and QuikSCAT wind observations derived from a WRF-model “truth” simulation.

The single observation test shows that En4DVAR can provide a flow-dependent analysis structure function, and a reasonable physical relationship among control variables can be determined from the ensemble statistics. Using the EOF decomposed correlation function operator, the analysis increment noise can be reduced effectively, and most analysis increment signals remain in the analysis.

We compared the forecast RMSEs at different analysis times and examined the impact of temporal sampling errors upon the En4DVAR analysis. The experimental results indicated that En4DVAR was very sensitive to temporal sampling errors. Since the En4DVAR analysis time can be tuned to any time within the data assimilation window, there is an optimal analysis time in En4DVAR. Using OSSEs, we concluded that the optimal analysis time is at the middle of the assimilation window.

We found that En4DVAR can reduce most forecast errors and biases compared with the results of CTRL. To compare the ensemble-based sequential algorithm with the ensemble-based retrospective algorithm, we carried out an En3DVAR cycling experiment, which is in fact the En4DVAR at one analysis time with cycling in the assimilation window. Results showed that the En4DVAR produced an overall better analysis. However, at lower levels and in humidity analysis, the performance of En4DVAR is similar to En3DVAR cycling because of the effects of temporal sampling errors.

The OSSE study in this paper has indicated that the En4DVAR can be applied to real atmospheric data assimilation. However, there are issues that should be considered in the implementation for the study of real cases. For example, we assumed that a perfect model is used in the En4DVAR OSSEs. In practice, the impact of model error cannot be neglected in real-world data assimilation. Model errors can make the filter severely divergent in ensemble-based data assimilation. Some techniques for dealing with model errors in ensemble-based data assimilation have been proposed (Hamill and Whitaker 2005; Houtekamer and Mitchell 2005; Meng and Zhang 2007). Since En4DVAR is still constructed in a variational framework, adding model error terms to the objective function seems straightforward (Zupanski 1997). We plan to adopt these techniques and test them in En4DVAR. The OSSE results showed that the humidity analysis in En4DVAR should be given more attention since it is very sensitive to sampling errors. The spatial scale length of the correlation operator for humidity should be shorter than that for the other analysis variables. Humidity observations at significant temporal departures from the analysis time should be excluded in En4DVAR. Although localization and analysis time tuning mitigate spurious correlations in En4DVAR, there are issues that require further study. The inflation factor (Anderson and Anderson 1999) and better perturbation generation methods (Hamill et al. 2000) should be added to En4DVAR. We will report on the En4DVAR performance for a real case study in the future.

Acknowledgments

This research has been supported by NOAA Grant NA05111076. Discussions with Drs. Chris Snyder, Jenny Sun, and Hui Liu were very helpful to this study. We are also very grateful to Drs. Stan Trier, Thomas Auligne, Hans Huang, and Chris Snyder for their constructive suggestions and comments on our initial draft.

REFERENCES

  • Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129 , 28842903.

  • Anderson, J. L., 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131 , 634642.

  • Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127 , 27412758.

    • Search Google Scholar
    • Export Citation
  • Berre, L., O. Pannekoucke, G. Desroziers, S. E. Stefanescu, B. Chapnik, and L. Raynaud, 2007: A variational assimilation ensemble and the spatial filtering of its error covariances: Increase of sample size by local spatial averaging. Proc. ECMWF Workshop on Flow-Dependent Aspects of Data Assimilation, Reading, United Kingdom, ECMWF, 151–168. [Available online at http://www.ecmwf.int/publications/library/do/references/list/14092007.].

    • Search Google Scholar
    • Export Citation
  • Buehner, M., 2005: Ensemble-derived stationary and flow-dependent background error covariances: Evaluation in a quasi-operation NWP setting. Quart. J. Roy. Meteor. Soc., 131 , 10131043.

    • Search Google Scholar
    • Export Citation
  • Courtier, P., J. N. Thepaut, and A. Hollingsworth, 1994: A strategy for operational implementation of 4D-Var, using an incremental approach. Quart. J. Roy. Meteor. Soc., 120 , 13671387.

    • Search Google Scholar
    • Export Citation
  • Dudhia, J., 1989: Numerical study of convection observed during the Winter Monsoon Experiment using a mesoscale two-dimensional model. J. Atmos. Sci., 46 , 30773107.

    • Search Google Scholar
    • Export Citation
  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 , 1014310162.

    • Search Google Scholar
    • Export Citation
  • Evensen, G., and P. J. van Leeuwen, 2000: An ensemble Kalman smoother for nonlinear dynamics. Mon. Wea. Rev., 128 , 18521867.

  • Fertig, E. J., J. Harlim, and B. R. Hunt, 2007: A comparative study of 4D-VAR and a 4D ensemble filter: Perfect model simulations with Lorenz-96. Tellus, 59A , 96100.

    • Search Google Scholar
    • Export Citation
  • Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125 , 723757.

    • Search Google Scholar
    • Export Citation
  • Gilbert, J. C., and C. Lemarechal, 1989: Some numerical experiments with variable storage quasi-Newton algorithm. Math. Programm., 45B , 407435.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128 , 29052919.

  • Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. Mon. Wea. Rev., 133 , 31323147.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., C. Snyder, and R. E. Morss, 2000: A comparison of probabilistic forecasts from bred, singular-vector, and perturbed observation ensembles. Mon. Wea. Rev., 128 , 18351851.

    • Search Google Scholar
    • Export Citation
  • Hong, S. Y., Y. Noh, and J. Dudhia, 2006: A new vertical diffusion package with an explicit treatment of entrainment processes. Mon. Wea. Rev., 134 , 23182341.

    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126 , 796811.

    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129 , 123137.

    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131 , 32693289.

  • Hunt, B. R., and Coauthors, 2004: Four-dimensional ensemble Kalman filtering. Tellus, 56A , 273277.

  • Kain, J. S., 2004: The Kain–Fritsch convective parameterization: An update. J. Appl. Meteor., 43 , 170181.

  • Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP: A comparison with 4D-VAR. Quart. J. Roy. Meteor. Soc., 129 , 31833203.

    • Search Google Scholar
    • Export Citation
  • Liu, C., Q. Xiao, and B. Wang, 2008: An ensemble-based four-dimensional variational data assimilation scheme. Part I: Technical formulation and preliminary test. Mon. Wea. Rev., 136 , 33633373.

    • Search Google Scholar
    • Export Citation
  • Liu, Z., and F. Rabier, 2003: The potential of high-density observations for numerical weather prediction: A study with simulated observations. Quart. J. Roy. Meteor. Soc., 129 , 30133035.

    • Search Google Scholar
    • Export Citation
  • Meng, Z., and F. Zhang, 2007: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part II: Imperfect model experiments. Mon. Wea. Rev., 135 , 14031423.

    • Search Google Scholar
    • Export Citation
  • Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size and model-error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130 , 27912808.

    • Search Google Scholar
    • Export Citation
  • Mlawer, E. J., S. J. Taubman, P. D. Brown, M. J. Iacono, and S. A. Clough, 1997: Radiative transfer for inhomogeneous atmosphere: RRTM, a validated correlated-k model for the long-wave. J. Geophys. Res., 102 , (D14). 1666316682.

    • Search Google Scholar
    • Export Citation
  • Pu, Z-X., E. Kalnay, J. Sela, and I. Szunyogh, 1997: Sensitivity of forecast errors to initial conditions with a quasi-inverse linear method. Mon. Wea. Rev., 125 , 24792503.

    • Search Google Scholar
    • Export Citation
  • Simmons, A. J., and A. Hollingsworth, 2002: Some aspects of the improvement in skill of numerical weather prediction. Quart. J. Roy. Meteor. Soc., 128 , 647677.

    • Search Google Scholar
    • Export Citation
  • Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the Advanced Research WRF version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp.

    • Search Google Scholar
    • Export Citation
  • Snyder, C., and F. Zhang, 2003: Assimilation of simulated radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 131 , 16631677.

    • Search Google Scholar
    • Export Citation
  • Xiao, Q., X. Zou, M. Pondeca, M. A. Shapiro, and C. S. Velden, 2002: Impact of GMS-5 and GOES-9 satellite-derived winds on the prediction of a NORPEX extratropical cyclone. Mon. Wea. Rev., 130 , 507528.

    • Search Google Scholar
    • Export Citation
  • Xue, M., M. Tong, and K. K. Droegemeier, 2006: An OSSE framework based on the ensemble square root Kalman filter for evaluating the impact of data from radar networks on thunderstorm analysis and forecasting. J. Atmos. Oceanic Technol., 23 , 4666.

    • Search Google Scholar
    • Export Citation
  • Zhang, F., 2005: Dynamics and structure of mesoscale error covariance of a winter cyclone estimated through short-range ensemble forecasts. Mon. Wea. Rev., 133 , 28762893.

    • Search Google Scholar
    • Export Citation
  • Zhang, F., C. Snyder, and R. Rotunno, 2002: Mesoscale predictability of the “surprise” snowstorm of 24–25 January 2000. Mon. Wea. Rev., 130 , 16171632.

  • Zhang, F., Z. Meng, and A. Aksoy, 2006: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part I: Perfect model experiments. Mon. Wea. Rev., 134, 722–736.

  • Zhang, H., J. Xue, S. Zhuang, G. Zhu, and Z. Zhu, 2004: GRAPeS 3D-Var data assimilation system ideal experiments. Acta Meteor. Sin., 62, 31–41.

  • Zupanski, D., 1997: A general weak constraint applicable to operational 4DVAR data assimilation systems. Mon. Wea. Rev., 125, 2274–2292.

  • Zupanski, M., 2005: Maximum likelihood ensemble filter: Theoretical aspects. Mon. Wea. Rev., 133, 1710–1726.

  • Zupanski, M., D. Zupanski, D. F. Parrish, E. Rogers, and G. DiMego, 2002: Four-dimensional variational data assimilation for the Blizzard of 2000. Mon. Wea. Rev., 130, 1967–1988.


APPENDIX A

EOF Decomposed Correlation Function Operator in En4DVAR

The background error covariance modified by a correlation function (e.g., the Schur operator; Houtekamer and Mitchell 2001) is defined as
$$\mathbf{P} = \mathbf{C} \circ \mathbf{B} \tag{A1}$$
where 𝗖 is the spatial correlation function, 𝗕 is the background error covariance, and ∘ denotes the Schur (element-wise) product.
In En4DVAR, the background error covariance is approximately calculated using ensemble perturbations:
[Equation (A2)]
Using the correlation function in En4DVAR, the control variables are preconditioned by 𝗣′ instead of 𝗫′b. Here 𝗣′ is given by
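$$\mathbf{P}' = \left(\mathbf{X}'_{b1} \circ \mathbf{C}',\ \mathbf{X}'_{b2} \circ \mathbf{C}',\ \ldots,\ \mathbf{X}'_{bN} \circ \mathbf{C}'\right) \tag{A3}$$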
where 𝗫′b1 is an n-column matrix in which every column equals the first column of 𝗫′b, N is the ensemble size, and n is the dimension of the state vector. Here 𝗖′ is designed to satisfy
$$\mathbf{C} = \mathbf{C}' \mathbf{C}'^{\mathrm{T}} \tag{A4}$$
We can prove that 𝗣′ defined by (A3) satisfies
[Equation (A5)]
Using Eqs. (A1), (A2), and (A4), the element P_{l,m} of the matrix 𝗣 is
[Equation (A6)]
According to the definition of 𝗣′ in (A3), we have
[Equation (A7)]
Therefore, 𝗣′ defined by (A3) satisfies the approximate relation (A5).
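The key algebraic step in this argument, that the column-wise Schur products collected in 𝗣′ reproduce the localized covariance, can be checked numerically. The NumPy sketch below is illustrative only: the grid size, the Gaussian correlation function, the variable names, and the omission of any ensemble normalization factor are assumptions made here, not details taken from the paper.

```python
import numpy as np

def build_localized_perturbations(X, C_sqrt):
    """Stack the blocks X_k ∘ C', where X_k repeats column k of X across all columns."""
    n, N = X.shape
    # X[:, [k]] has shape (n, 1); broadcasting multiplies row l of C' by X[l, k].
    blocks = [X[:, [k]] * C_sqrt for k in range(N)]
    return np.hstack(blocks)

rng = np.random.default_rng(0)
n, N = 6, 4                                  # hypothetical state size and ensemble size
X = rng.standard_normal((n, N))
X -= X.mean(axis=1, keepdims=True)           # ensemble perturbations (mean removed)

# Assumed Gaussian correlation on a 1D grid, used only for illustration.
idx = np.arange(n)
C = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2)

# Square root C' with C = C' C'^T, built from the eigendecomposition of C.
w, E = np.linalg.eigh(C)
C_sqrt = E * np.sqrt(np.clip(w, 0.0, None))  # same as E @ diag(sqrt(w))

P_prime = build_localized_perturbations(X, C_sqrt)
lhs = P_prime @ P_prime.T                    # P' P'^T
rhs = C * (X @ X.T)                          # C ∘ (X X^T), unnormalized
identity_error = np.max(np.abs(lhs - rhs))
print(P_prime.shape, identity_error)
```

The two sides agree to rounding error, and 𝗣′ has n × N columns when the full square root is used, which is the cost issue discussed next.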
Although using 𝗣′ to precondition the control variables in En4DVAR effectively avoids analysis noise caused by sampling errors, it enlarges the control vector to dimension n × N and increases the computational cost. In our implementation, we apply an EOF decomposition to the matrix 𝗖 to reduce this cost. It can be shown that this implementation retains the effect of the correlation function localization. With the EOF decomposition,
$$\mathbf{C} = \mathbf{E} \boldsymbol{\lambda} \mathbf{E}^{\mathrm{T}} \tag{A8}$$
where 𝗘 contains all the eigenvectors and λ is a diagonal matrix whose diagonal elements are the eigenvalues. In our experiment, the EOF decomposition of matrix 𝗖 is performed on a low-resolution grid so that the computational cost is significantly reduced. The eigenvectors and eigenvalues on the high-resolution grid are obtained by spline interpolation. Therefore,
$$\mathbf{C}' = \mathbf{E} \boldsymbol{\lambda}^{1/2} \tag{A9}$$
Thus, (A3) can be rewritten as
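$$\mathbf{P}' = \left(\mathbf{X}'_{b1} \circ \mathbf{E} \boldsymbol{\lambda}^{1/2},\ \mathbf{X}'_{b2} \circ \mathbf{E} \boldsymbol{\lambda}^{1/2},\ \ldots,\ \mathbf{X}'_{bN} \circ \mathbf{E} \boldsymbol{\lambda}^{1/2}\right) \tag{A10}$$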

With the above derivation, 𝗖′ becomes a matrix with n rows and r columns, where r is the number of eigenvectors retained. The dimension of the control vector correspondingly becomes r × N. Because a few eigenvectors can represent the correlation function very well, the computing cost of En4DVAR with the correlation function is greatly reduced.
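The effect of the truncation on the size of the control vector can be illustrated with a small NumPy sketch. It assumes a Gaussian correlation on a one-dimensional grid; the values of n, N, r, the length scale, and the helper name `truncated_sqrt` are all hypothetical choices, and the low-resolution decomposition with spline interpolation described above is not reproduced.

```python
import numpy as np

def truncated_sqrt(C, r):
    """Return the n x r matrix C' built from the r leading EOFs, plus all eigenvalues."""
    w, E = np.linalg.eigh(C)                 # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:r]          # indices of the r largest eigenvalues
    C_sqrt = E[:, order] * np.sqrt(np.clip(w[order], 0.0, None))
    return C_sqrt, w

n, N, r = 40, 10, 8                          # hypothetical sizes
idx = np.arange(n)
C = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 8.0) ** 2)  # assumed correlation

C_sqrt_r, w = truncated_sqrt(C, r)

# Fraction of the total eigenvalue sum carried by the retained modes.
explained = np.sort(w)[::-1][:r].sum() / w.sum()
# Relative error of the rank-r reconstruction C' C'^T.
rel_err = np.linalg.norm(C - C_sqrt_r @ C_sqrt_r.T) / np.linalg.norm(C)

control_dim_full = n * N                     # control vector size without truncation
control_dim_trunc = r * N                    # control vector size with r retained modes
print(explained, rel_err, control_dim_full, control_dim_trunc)
```

For a smooth correlation function such as this one, a handful of modes carries nearly all of the variance, so the control vector shrinks from n × N to r × N with little loss in the represented correlation.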

In the implementation of En4DVAR localization, we separate the three-dimensional spatial correlation function into its horizontal and vertical components:
[Equation (A11)]
Similar to the derivation of (A10), we can obtain (A11).
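
One way to picture the horizontal and vertical separation is as a Kronecker product of two smaller correlation matrices, each truncated independently. That reading is an assumption made for this sketch and is not confirmed by the text above; the grid sizes, length scales, mode counts, and helper names are likewise hypothetical.

```python
import numpy as np

def gaussian_corr(m, length):
    """Assumed Gaussian correlation matrix on a 1D grid of m points."""
    idx = np.arange(m)
    return np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / length) ** 2)

def eof_sqrt(C, r):
    """Square-root factor of C built from its r leading EOFs (shape m x r)."""
    w, E = np.linalg.eigh(C)
    order = np.argsort(w)[::-1][:r]
    return E[:, order] * np.sqrt(np.clip(w[order], 0.0, None))

nh, nv = 30, 10                              # hypothetical horizontal and vertical sizes
rh, rv = 8, 5                                # hypothetical numbers of retained modes
Ch = gaussian_corr(nh, 6.0)
Cv = gaussian_corr(nv, 3.0)

Ch_sqrt = eof_sqrt(Ch, rh)
Cv_sqrt = eof_sqrt(Cv, rv)

# Assumed separable form: C = Cv ⊗ Ch, so C' = Cv' ⊗ Ch' because
# (A ⊗ B)(A ⊗ B)^T = (A A^T) ⊗ (B B^T).
C_sqrt = np.kron(Cv_sqrt, Ch_sqrt)           # shape (nv * nh, rv * rh)
C_full = np.kron(Cv, Ch)
rel_err = np.linalg.norm(C_full - C_sqrt @ C_sqrt.T) / np.linalg.norm(C_full)
print(C_sqrt.shape, rel_err)
```

Under this assumption only the two small matrices need to be decomposed, and the number of columns in 𝗖′ is the product of the horizontal and vertical truncation mode numbers.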

APPENDIX B

Analysis Time Tuning in En4DVAR Perturbation Matrix Calculation

Adding the observation time subscript index i, (8) can be rewritten as
[Equation (B1)]
With the cost function defined by (16) at analysis time α, (8) is reformulated as
[Equation (B2)]
For observations at or after the analysis time (i ≥ α), 𝗛i𝗠αi𝗫′ is calculated by
[Equation (B3)]
For observations before the analysis time (i < α), 𝗠αi is the inverse tangent linear model (Pu et al. 1997), and 𝗛i𝗠αi𝗫′ is calculated by
[Equation (B4)]
Using (B3) and (B4), the En4DVAR perturbation matrix calculation at any analysis time is the same as that at the beginning of the assimilation window.

Fig. 1.

The SLP (every 2 hPa) and 1000-hPa winds (a full barb represents 5 m s−1) at (a) 1200 UTC 24, (b) 0000 UTC 25, and (c) 1200 UTC 25 Jan 2000 from the truth simulation; and the corresponding 300-hPa PV (shaded), geopotential heights (every 80 m), and winds (a full barb represents 5 m s−1) at (d) 1200 UTC 24, (e) 0000 UTC 25, and (f) 1200 UTC 25 Jan 2000.

Citation: Monthly Weather Review 137, 5; 10.1175/2008MWR2699.1

Fig. 2.

The distribution of observations at different times. Each En4DVAR cycle assimilates observations within a 6-h window, and four En4DVAR cycles are designed from 0900 UTC 24 to 1200 UTC 25 Jan 2000: (left) 24 Jan (shaded) and (right) 25 Jan 2000.

Fig. 3.

The flowchart of the En4DVAR OSSE design. The initial perturbation (ptb) is added to the control (CTL) initial field to obtain the truth and ensemble initial fields. The simulated observations (obs) are produced by adding normally distributed perturbations to the truth state. The perturbation matrix (𝗫′b) is calculated from the ensemble forecast at every observation time. The control forecast is updated with the simulated observations in the assimilation window to obtain the analysis (denoted by A). The ensemble forecast fields are likewise updated with the simulated observations, yielding a set of analysis fields (denoted by EA) that serve as the ensemble initial fields for the next assimilation cycle. The subscript denotes the time (e.g., 09 is 0900 UTC Jan 2000).

Fig. 4.

The response increments from a single observation test using (a) WRF 3DVAR, (b) En4DVAR without localization, and (c) En4DVAR with localization. The figures show increments of 1000-hPa wind vector (arrows) and temperature (shaded with the scale on the right).

Fig. 5.

The vertical profiles of temperature increments at the observation location corresponding to Fig. 4 by En4DVAR-NL (circle-line) and En4DVAR-L (cross-line). The observation level (850 hPa) is marked.

Fig. 6.

The horizontal mean absolute correlation coefficient of the background perturbations at the observation level (the 8th η level) at different forecast times. The correlation coefficients of u winds (thick line), temperature (thick plus-line), and humidity (thick square-line) are calculated with 36 members. The correlation coefficient of υ winds is similar to that of u winds (not shown). The dotted line is the mean absolute correlation coefficient of u winds calculated with 24 members, and the dashed line is that calculated with 12 members. The thin line, thin plus-line, and thin square-line correspond to the thick lines, but the statistics are computed at the single observation location of Fig. 4.

Fig. 7.

The vertical RMSE profiles of forecast at 1500 UTC 24 from different analysis times at 0900 UTC 24 (square-line), 1200 UTC 24 (cross-line), and 1500 UTC 24 Jan 2000 (circle-line).

Fig. 8.

Variations of (a) cost function and (b) its gradient with respect to iterations in En4DVAR. The dashed line is from the observation term. The dotted line is from the background term. The solid line is from both the observation and background terms.

Fig. 9.

The CTRL forecast error (CTRL minus truth) of SLP (shaded) and 1000-hPa wind vectors (a full barb represents 5 m s−1) at (a) 1200 UTC 24, (b) 0000 UTC 25, and (c) 1200 UTC 25 Jan 2000; and the corresponding 300-hPa potential vorticity (shaded), geopotential heights (every 2 m), and wind vectors at (d) 1200 UTC 24, (e) 0000 UTC 25, and (f) 1200 UTC 25 Jan 2000.

Fig. 10.

As in Fig. 9, but for En4DVAR analysis minus truth.

Fig. 11.

The vertical profiles of domain-averaged RMSEs in (a) u winds, (b) υ winds, (c) temperature, and (d) humidity at 1200 UTC 24 (cross-dotted line), 0000 UTC 25 (thin line), and 1200 UTC 25 Jan 2000 (thick line). The results of CTRL, En4DVAR 6-h forecast, and En4DVAR analysis are denoted by black circles, black line, and gray line, respectively.

Fig. 12.

The vertical profiles of domain-averaged analysis bias in (a) u winds, (b) υ winds, (c) temperature, and (d) humidity at 1200 UTC 24 (cross-dotted line), 0000 UTC 25 (thin line), and 1200 UTC 25 Jan 2000 (thick line). The results of CTRL and En4DVAR analysis are denoted by black and gray lines, respectively.

Fig. 13.

The variation of domain-averaged RMSE in CTRL (square-line), En3DVAR (circle-line), and En4DVAR (cross-line) with time for (a) u winds, (b) υ winds, (c) temperature, and (d) humidity. The star-line shows variation of forecast–analysis spread at different times in En4DVAR.

Fig. 14.

The vertical profiles of domain-averaged RMSEs in (a) u winds, (b) υ winds, (c) temperature, and (d) humidity at 1200 UTC 24 (cross-dotted line), 0000 UTC 25 (thin line), and 1200 UTC 25 Jan 2000 (thick line). The results of En3DVAR and En4DVAR analysis are denoted by black and gray lines, respectively.

Table 1.

The cumulative relative RMSE with different horizontal and vertical truncation modes (G is the cumulative relative RMSE, m is the selected truncation mode number, and h and υ are the horizontal and vertical subscripts, respectively).

Table 2.

The AO (analysis minus observation) and BO (background minus observation) in zonal wind (m s−1), meridional wind (m s−1), temperature (K), and humidity (g kg−1).

* The National Center for Atmospheric Research is sponsored by the National Science Foundation.