• Aksoy, A., D. C. Dowell, and C. Snyder, 2010: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part II: Short-range ensemble forecasts. Mon. Wea. Rev., 138, 1273–1292, doi:10.1175/2009MWR3086.1.
• Bayley, G. V., and J. M. Hammersley, 1946: The “effective” number of independent observations in an autocorrelated time series. J. Roy. Stat. Soc., 8 (Suppl.), 184–197, doi:10.2307/2983560.
• Berenguer, M., M. Surcel, I. Zawadzki, M. Xue, and F. Kong, 2012: The diurnal cycle of precipitation from continental radar mosaics and numerical weather prediction models. Part II: Intercomparison among numerical models and with nowcasting. Mon. Wea. Rev., 140, 2689–2705, doi:10.1175/MWR-D-11-00181.1.
• Caumont, O., V. Ducrocq, É. Wattrelot, G. Jaubert, and S. Pradier-Vabre, 2010: 1D+3DVar assimilation of radar reflectivity data: A proof of concept. Tellus, 62A, 173–187, doi:10.3402/tellusa.v62i2.15678.
• Chang, W., K.-S. Chung, L. Fillion, and S.-J. Baek, 2014: Radar data assimilation in the Canadian high-resolution ensemble Kalman filter system: Performance and verification with real summer cases. Mon. Wea. Rev., 142, 2118–2138, doi:10.1175/MWR-D-13-00291.1.
• Chung, K.-S., I. Zawadzki, M. K. Yau, and L. Fillion, 2009: Short-term forecasting of a midlatitude convective storm by the assimilation of single-Doppler radar observations. Mon. Wea. Rev., 137, 4115–4135, doi:10.1175/2009MWR2731.1.
• Chung, K.-S., W. Chang, L. Fillion, and M. Tanguay, 2013: Examination of situation-dependent background error covariances at the convective scale in the context of the ensemble Kalman filter. Mon. Wea. Rev., 141, 3369–3387, doi:10.1175/MWR-D-12-00353.1.
• Fabry, F., 2006: The spatial variability of moisture in the boundary layer and its effect on convection initiation: Project-long characterization. Mon. Wea. Rev., 134, 79–91, doi:10.1175/MWR3055.1.
• Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, doi:10.1002/qj.49712555417.
• Jacques, D., and I. Zawadzki, 2014: The impacts of representing the correlation of errors in radar data assimilation. Part I: Experiments with simulated background and observation estimates. Mon. Wea. Rev., 142, 3998–4016, doi:10.1175/MWR-D-14-00104.1.
• Lewis, J. M., S. Lakshmivarahan, and S. Dhall, 2006: Dynamic Data Assimilation: A Least Squares Approach. Cambridge University Press, 680 pp.
• Murphy, J. M., 1988: The impact of ensemble forecasts on predictability. Quart. J. Roy. Meteor. Soc., 114, 463–493, doi:10.1002/qj.49711448010.
• Oliver, D., 1995: Moving averages for Gaussian simulation in two and three dimensions. Math. Geol., 27, 939–960, doi:10.1007/BF02091660.
• Oliver, D., 1998: Calculation of the inverse of the covariance. Math. Geol., 30, 911–933, doi:10.1023/A:1021734811230.
• Purser, R. J., W.-S. Wu, D. F. Parrish, and N. M. Roberts, 2003: Numerical aspects of the application of recursive filters to variational statistical analysis. Part I: Spatially homogeneous and isotropic Gaussian covariances. Mon. Wea. Rev., 131, 1524–1535, doi:10.1175/1520-0493(2003)131<1524:NAOTAO>2.0.CO;2.
• Rennie, S. J., S. L. Dance, A. J. Illingworth, S. P. Ballard, and D. Simonin, 2011: 3D-Var assimilation of insect-derived Doppler radar radial winds in convective cases using a high-resolution model. Mon. Wea. Rev., 139, 1148–1163, doi:10.1175/2010MWR3482.1.
• Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the Advanced Research WRF version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp. [Available online at http://www.mmm.ucar.edu/wrf/users/docs/arw_v2.pdf.]
• Snook, N., M. Xue, and Y. Jung, 2011: Analysis of a tornadic mesoscale convective vortex based on ensemble Kalman filter assimilation of CASA X-band and WSR-88D radar data. Mon. Wea. Rev., 139, 3446–3468, doi:10.1175/MWR-D-10-05053.1.
• Sobash, R. A., and D. J. Stensrud, 2013: The impact of covariance localization for radar data on EnKF analyses of a developing MCS: Observing system simulation experiments. Mon. Wea. Rev., 141, 3691–3709, doi:10.1175/MWR-D-12-00203.1.
• Stratman, D. R., M. C. Coniglio, S. E. Koch, and M. Xue, 2013: Use of multiple verification methods to evaluate forecasts of convection from hot- and cold-start convection-allowing models. Wea. Forecasting, 28, 119–138, doi:10.1175/WAF-D-12-00022.1.
• Sun, J., 2004: Numerical prediction of thunderstorms: Fourteen years later. Atmospheric Turbulence and Mesoscale Meteorology, E. Fedorovich, R. Rotunno, and B. Stevens, Eds., Cambridge University Press, 139–164.
• Sun, J., and Coauthors, 2014: Use of NWP for nowcasting convective precipitation: Recent progress and challenges. Bull. Amer. Meteor. Soc., 95, 409–426, doi:10.1175/BAMS-D-11-00263.1.
• Talagrand, O., and R. Vautard, 1999: Evaluation of probabilistic prediction systems. Proc. Workshop on Predictability, Reading, United Kingdom, European Centre for Medium-Range Weather Forecasts, 1–25.
• Tarantola, A., 2005: Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, 358 pp.
• Wang, H., J. Sun, X. Zhang, X.-Y. Huang, and T. Auligné, 2013: Radar data assimilation with WRF 4D-Var. Part I: System development and preliminary testing. Mon. Wea. Rev., 141, 2224–2244, doi:10.1175/MWR-D-12-00168.1.
• Wilson, J. W., Y. Feng, M. Chen, and R. D. Roberts, 2010: Nowcasting challenges during the Beijing Olympics: Successes, failures, and implications for future nowcasting systems. Wea. Forecasting, 25, 1691–1714, doi:10.1175/2010WAF2222417.1.
• Xu, Q., 2005: Representations of inverse covariances by differential operators. Adv. Atmos. Sci., 22, 181–198, doi:10.1007/BF02918508.
• Yaremchuk, M., and A. Sentchev, 2012: Multi-scale correlation functions associated with polynomials of the diffusion operator. Quart. J. Roy. Meteor. Soc., 138, 1948–1953, doi:10.1002/qj.1896.
• Zieba, A., 2010: Effective number of observations and unbiased estimators of variance for autocorrelated data—An overview. Metrol. Meas. Syst., 17, 3–16, doi:10.2478/v10178-010-0001-0.

The Impacts of Representing the Correlation of Errors in Radar Data Assimilation. Part II: Model Output as Background Estimates

  • 1 McGill University, Montréal, Québec, Canada

Abstract

In data assimilation, analyses are generally obtained by combining a “background,” taken from a previously initiated model forecast, with observations from different instruments. For optimal analyses, the error covariance of all information sources must be properly represented. In the case of radar data assimilation, such representation is of particular importance since measurements are often available at spatial resolutions comparable to that of the model grid. Unfortunately, misrepresenting the covariance of radar errors is unavoidable as their true structure is unknown. This two-part study investigates the impacts of misrepresenting the covariance of errors when dense observations, such as radar data, are available. Experiments are performed in an idealized context. In Part I, analyses were obtained by using artificially simulated background and observation estimates. For the second part presented here, background estimates from a convection-resolving model were used. As before, analyses were generated with the same input data but with different misrepresentation of errors. The impacts of these misrepresentations can be quantified by comparing the two sets of analyses. It was found that the correlation of both the background and observation errors had to be represented to improve the quality of analyses. Of course, the concept of “errors” depends on how the “truth” is considered. When the truth was considered as an unknown constant, as opposed to an unknown random variable, background errors were found to be biased. Correcting these biases was found to significantly improve the quality of analyses.

Corresponding author address: Dominik Jacques, McGill University, BH 945, 805 Sherbrooke St. West, Montreal, QC H3A 0B9, Canada. E-mail: dominik.jacques@mail.mcgill.ca


1. Introduction

This study investigates the impacts of misrepresenting error covariances when the background and observations are available at comparable densities. This situation is typical of storm-scale radar data assimilation, which provides the context and motivation for the experiments that are presented here. To avoid many of the complications associated with “true” radar data assimilation, our experiments were performed in a highly idealized context. As such, the results presented here are applicable not only to radar data assimilation but to any situation where dense observations may be available. As an introduction to this study, we provide a brief description of the current state of storm-scale data assimilation. The results of Jacques and Zawadzki (2014, hereafter Part I) are also discussed followed by a description of the experiments presented here.

In the past two decades, there has been a continuous research effort toward the numerical forecasting of thunderstorms [see a recent review paper by Sun et al. (2014)]. Weather radars, with spatial and temporal resolutions on the order of kilometers and minutes, have always assumed a central role in storm-scale forecasting. Unfortunately, radars provide limited or no information on most atmospheric state variables. This shortcoming is often compensated by the use of “background” estimates obtained from previously initiated forecasts. The process of “assimilating” radar observations aims to bring the background atmospheric state closer to the true atmospheric state.

Most commonly, the assimilation of radar data consists of an “analysis step” and a “forecast step.” During the analysis step, background and observation estimates are combined into one analysis. This can be done in the variational (e.g., Chung et al. 2009; Caumont et al. 2010; Rennie et al. 2011; Wang et al. 2013) or the Kalman filter (e.g., Aksoy et al. 2010; Snook et al. 2011; Chang et al. 2014) frameworks. During the forecast step, an atmospheric model is integrated forward in time using the result of the analysis step as initial conditions. In a process known as “cycling,” the analysis and forecast steps are repeated several times. The intention of performing such cycling is to gradually bring the background state closer to the true atmospheric state.

Unfortunately, the assimilation of radar data is not very effective. Despite advances in the methods used and increases in available computational power, convection-allowing prediction systems remain incapable of accurately forecasting thunderstorms a few hours into the future (Wilson et al. 2010). Even at larger scales, the beneficial influence of radar data assimilation is short lived and, approximately, does not exceed 6 h (Berenguer et al. 2012; Stratman et al. 2013; Sun et al. 2014). These results can be explained partly by the low predictability of thunderstorms; the imperfections of atmospheric models and assimilation systems also play a role.

In this two-part study, we focus on the representation of error covariances at the analysis step of the data assimilation process. For linear systems, the correct representation of error covariances is required to obtain analyses that are minimum variance estimators of the “truth” (Tarantola 2005, p. 68). In practice, lack of knowledge and numerical constraints prohibit the perfect representation of these covariances. Therefore, there is potential for improving analyses through better representations of error covariances. This remains an active research topic. Methods have been proposed to allow the efficient convolution of error fields with various correlation matrices (Oliver 1995; Gaspari and Cohn 1999; Purser et al. 2003) or their inverse (Oliver 1998; Xu 2005; Yaremchuk and Sentchev 2012). In the context of ensemble forecasting, correlations are commonly “localized” using methods varying in sophistication (see references in Sobash and Stensrud 2013).

While theory says that least squares analyses will be obtained with a perfect representation of error covariances, it is unclear how much “better” these analyses would be than those obtained with misrepresented covariances. The situation is particularly interesting in radar data assimilation because both the background and observation estimates are available at comparable spatial densities.

In Part I, we devised an idealized experimental setup where the background and observation error covariances could be perfectly represented. These experiments were designed with radar data assimilation in mind, but their simplicity makes them applicable to any situation where “dense” observations must be assimilated. Only one state variable, the U component of the wind, was estimated on a two-dimensional (2D) domain. Analyses were obtained by minimizing the cost function

J(x) = (1/2)(x − x_b)^T B^-1 (x − x_b) + (1/2)(Hx − y)^T R^-1 (Hx − y)    (1)

in the variational framework. Both the background x_b and the observation estimates y were generated artificially by adding homogeneous and correlated noise to a predefined truth x_t. For simplicity, observations were made available everywhere in the assimilation domain and in the same units as the state variable. This ensured a linear observation operator H. As is usual, the covariances of background and observation errors were represented in the matrices B and R, respectively.
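As a concrete illustration of this analysis step (not the code used in the study; the grid size, length scales, and identity observation operator below are assumptions made for the sketch), a minimal NumPy example is:

```python
import numpy as np

# Minimal sketch of the analysis step of Eq. (1) on a tiny 1D domain.
# Sizes, length scales, and the identity H are illustrative assumptions.
n = 50
grid = np.arange(n, dtype=float)

sigma_b, L_b = 2.5, 5.0        # background error std dev and correlation length
B = sigma_b**2 * np.exp(-np.abs(grid[:, None] - grid[None, :]) / L_b)

sigma_o = 2.5                  # uncorrelated observation errors
R = sigma_o**2 * np.eye(n)
H = np.eye(n)                  # observations everywhere, in the same units

rng = np.random.default_rng(0)
x_t = np.sin(2 * np.pi * grid / n)                     # predefined "truth"
x_b = x_t + rng.multivariate_normal(np.zeros(n), B)    # background estimate
y = x_t + sigma_o * rng.standard_normal(n)             # simulated observations

# Minimizer of Eq. (1): x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b)
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
x_a = x_b + K @ (y - H @ x_b)
print(np.std(x_b - x_t), np.std(x_a - x_t))   # analysis error should be smaller
```

The gain-matrix form used in the sketch is algebraically equivalent to minimizing Eq. (1) when the observation operator is linear.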

With this setup, truly optimal analyses could be obtained. These optimal analyses were then compared with other “suboptimal” analyses obtained by purposely misrepresenting covariances in and . By comparing the two sets of analyses, we could quantify the influence of covariances on the quality of analyses. It was found that, in most cases, even the perfect representation of covariances could only bring small improvements to the quality of analyses. Also, not representing correlations at all proved to be less harmful than representing only the correlations of background errors. This last result contradicts the intuitive notion that some representation of correlations should yield better analyses than none.

The experiments that were performed in Part I are very simple. For the second part of this study we wished to confirm that the previous results could be replicated in a more realistic setup. For the experiments presented here, dense observations were only made available in a limited portion of the assimilation domain as is generally the case with “real” radar observations. Another difference with the previous experiments is that background estimates were taken from the output of a mesoscale atmospheric model. This change required that an ensemble of forecasts be generated. The generation of this forecast ensemble, the choice of the truth and the background estimates are discussed in section 2.

A significant portion of this article is dedicated to the description of background errors in a convective environment. The treatment of errors is delicate as it depends on how the truth is considered. Lewis et al. (2006, p. 228) state that there are two schools of thought for considering x_t. In modern data assimilation, x_t is most commonly considered as an unknown random variable. This sets the assimilation problem in the Bayesian framework, where a refined “posterior” probability density is sought from the combination of a broader “prior” probability density with observations. Alternatively, x_t can be considered as an unknown constant to be recovered. This sets the assimilation problem in Fisher’s framework. In this case, the estimation of x_t is similar to a regression process where a model is “fitted” through a set of “observations” (see the appendix for details). In the context of our experiments the two frameworks are associated with the same cost function. However, the assumptions on the inherent properties of errors are different.

In section 3, we document the heterogeneous nature of background errors. The functional shape of the error distribution is discussed along with its derived mean, variance, and correlation. For simplicity, these errors were considered in Fisher’s framework (with respect to the predefined truth x_t). Because this is an unusual choice, we also discuss how these errors differ from those obtained in the Bayesian framework (with respect to the average of the background ensemble ⟨x_b⟩).

When a forecast ensemble is not available, spatial averages are sometimes used to estimate the variance and covariance of background errors. In section 4, we revisit background error statistics under the homogeneity assumption. This allowed us to document the impact of representing heterogeneous versus homogeneous errors in the matrix B.

Once the background errors have been thoroughly discussed, we proceed to the assimilation experiments. The methodology under which these experiments were performed is described in section 5. By using different error representations in the matrices B and R, we could document the magnitude of errors associated with 1) the representation of homogeneous versus heterogeneous error variances in B, 2) the misrepresentation of error covariances in B and/or R, and 3) the presence of “biases” when errors are considered in Fisher’s framework. The results of these experiments are presented and discussed in sections 6 and 7, respectively. Conclusions are found in section 8.

2. Generation of the forecast ensemble and choice of the truth

In Part I, the background estimates were simulated by adding zero-mean, correlated noise to a predefined truth. Here, both the background estimates and the truth were taken from an ensemble of convection-resolving forecasts. In this section, we discuss the generation of this forecast ensemble and the choice of x_t.

Early work with forecast ensembles (Murphy 1988; Talagrand and Vautard 1999) established that, ideally, both the truth x_t and the background estimates should be realizations of the same probability density. To assess whether this is the case, they suggested verifying that the expected value of the variance between x_t and the ensemble average ⟨x_b⟩ be equal to the expected value of the variance between any member of the background ensemble and the ensemble average. That is,

E[(x_t − ⟨x_b⟩)²] = E[(x_b − ⟨x_b⟩)²],    (2)

with E[ ] representing the expectation operator and the squared differences understood elementwise, so that each side is a vector representing the variance at every location in the analysis domain. The left- and right-hand sides of this equation are, respectively, known as the “skill” and “spread” of a background ensemble. Verifying the equality of Eq. (2) ensures that a given ensemble has sufficient spread to “contain” the truth everywhere in the analysis domain.
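A minimal sketch of such a spread–skill check on a synthetic, perfectly calibrated toy ensemble (sizes and data are assumptions, not the forecast ensemble of section 2) is:

```python
import numpy as np

rng = np.random.default_rng(1)
n_grid, n_members = 2000, 500   # smaller than the 10 201 x 1000 ensemble of the paper

# Toy ensemble: truth and members drawn from the same distribution,
# so the equality of Eq. (2) should hold up to sampling noise.
ensemble = rng.standard_normal((n_grid, n_members))
x_t = rng.standard_normal(n_grid)

x_mean = ensemble.mean(axis=1)
skill = (x_t - x_mean) ** 2                                  # LHS of Eq. (2), per location
spread = ((ensemble - x_mean[:, None]) ** 2).mean(axis=1)    # RHS of Eq. (2), per location

# A domain-averaged ratio near one indicates a well-calibrated ensemble.
print(skill.mean() / spread.mean())
```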

For our idealized assimilation experiments, we selected x_t and the background estimates from the same forecast ensemble. When x_t is considered as a random variable, it is by construction a realization of the same probability density as the background members, and the equality of Eq. (2) is guaranteed. However, when x_t is considered as a constant, the spread–skill relationship will no longer hold. We discuss this issue in section 3.

To determine the magnitude of the ensemble spread to be used in our experiments, we took our guidance from radar data assimilation. In operational contexts, the quality of background estimates varies between two extremes. In the worst cases, the atmospheric conditions prevailing in the background do not allow the occurrence of convection that is compatible with radar observations. In the best cases, the convection occurring in background estimates is found in the same locations as in radar observations. Such background estimates have been obtained after the assimilation of radar data (e.g., Aksoy et al. 2010; Chang et al. 2014). However, even after the assimilation of radar data, it is currently not realistic to expect that individual convective cells occurring in the “real” atmosphere will be accurately resolved in background estimates.

We wished to perform our assimilation experiments using the best background estimates that would still be representative of those obtained in real data assimilation. To this end, we generated a forecast ensemble by applying slight perturbations to the low-level moisture in the initial conditions of each member of our forecast ensemble. The same initial conditions were used for all other state variables. Forecasts were generated using the Weather Research and Forecasting (WRF) Model (Skamarock et al. 2005) for a convective event that occurred near Montréal (Québec, Canada) on 17 July 2010. Forecasts were performed using two nested domains with horizontal resolutions of 3 and 1 km.

We chose to perturb the moisture field because of its strong influence on convection (Sun 2004). The perturbation consisted of three-dimensional fields of zero-mean correlated noise. This noise followed a Gaussian distribution whose correlations were described by a homogeneous and isotropic exponential decay. The rates of decay for the horizontal and vertical correlations were set to 180 and 3 km, respectively. Such decay rates were found to roughly approximate the average autocorrelation of moisture fields found in the larger nesting domain. The standard deviation of the perturbation noise varied in height and was set to 10% of the average standard deviation of moisture found on the different model levels. This is a relatively small perturbation compared to the uncertainty associated with moisture estimates (Fabry 2006).

Simulations were allowed to run for four hours, a period long enough for the model to develop convection. A total of 1001 such forecasts were computed; 1 forecast was chosen as the truth x_t, and the remaining 1000 became the members of our background ensemble.

During the period of model integration, the perturbations to the moisture field propagated to all the other atmospheric fields. Figures 1i–vi are provided as a verification that the background ensemble had the desired property of diverging from x_t only at the convective scales. These figures depict the atmospheric conditions prevailing in the chosen truth and in two members of the background ensemble at an altitude of 2 km. The liquid water mixing ratio is shown in blue, the positive W component of the wind in red, and the water vapor mixing ratio in gray.

Fig. 1. Convection prevailing in the truth and two members of the forecast ensemble used as background estimates. All fields are plotted on a horizontal plane at an altitude of 2 km. Liquid water mixing ratio is shown in blue, vertical wind velocities in red, and water vapor mixing ratio in gray. (i)–(iii) The large-scale similarities between the different ensemble members. (iv)–(vi) The differences occurring at the convective scales.

Figures 1i–iii depict the atmospheric conditions prevailing in the larger nesting domain. We can observe that the location of convection as well as the large-scale features (most visible in the moisture field) are very similar for each of the forecasts displayed. By present-day forecasting standards, a background ensemble displaying such proximity to the true atmospheric state would be considered excellent. In Figs. 1iv–vi we zoom in on a 101 km × 101 km area where convection is found in most members of the forecast ensemble. This area is the domain over which we performed our assimilation experiments. In this smaller domain, we can observe that the convective features differ greatly between the different members of the ensemble.

Figures 1i–vi demonstrate that slight perturbations to the moisture field can have a strong influence on all the other atmospheric variables in the vicinity of convection. We chose to display liquid water, positive vertical velocity, and water vapor because most readers will likely be familiar with these variables. For our experiments, only the U component of the wind (not shown) was retained from each model simulation. This state variable was chosen because of its proximity to Doppler velocity. The reader should keep in mind that all aspects of the assimilation problem discussed in this article pertain to, and only to, the U component of the wind. From this point on, references to this state variable will be implicit.

In the next section, we provide a description of background errors throughout the assimilation domain. For some aspects of this description, we distinguish between the errors near and away from convection. It is convenient, then, to consider Fig. 2, which represents the percentage of ensemble members with reflectivity greater than 15 dBZ as a function of the location within the assimilation domain. This figure shows that convection, associated with intense precipitation, is mostly found along a band that extends from the bottom-left to the top-right corners of the assimilation domain. An arbitrary threshold of 10%, illustrated by a white line, was chosen to delimit the rainy and nonrainy portions of the assimilation domain. This white line will be plotted over some of the error statistics presented in the next sections. The position of four points (labeled A, B, C, and D), two in the rainy area and two outside, is also shown. Details on the distribution and correlation of background errors will be given at these four locations.

Fig. 2. Percentage of background members with reflectivity greater than 15 dBZ. The position of four points of interest, A, B, C, and D, is also depicted along with a white contour line delimiting the area where rain is found in more than 10% of background ensemble members.

3. Heterogeneous statistics for background errors

In this section, we discuss the distribution, bias, correlation, and standard deviation of background errors. These statistics will be used for the construction of different covariance matrices.

To investigate the errors of background estimates, one error vector

ε_b^m = x_b^m − x_ref    (3)

was computed from each member x_b^m of the background ensemble. The symbol x_ref represents the reference against which errors were estimated. In the Bayesian framework, the “errors” of background estimates actually refer to the dispersion of the background probability density around its center. This dispersion can be characterized by letting x_ref = ⟨x_b⟩. In Fisher’s framework, background errors are considered with respect to the constant truth. This difference can be estimated by letting x_ref = x_t. The error statistics shown here were estimated in Fisher’s framework with x_ref = x_t. Because this is an unusual choice, we will discuss how these errors differ from those estimated in the Bayesian framework.
To express different error statistics, it is convenient to concatenate all error vectors into the following matrix:

E_b = [ε_b^1, ε_b^2, …, ε_b^N].    (4)

Columns of the matrix represent the individual error fields that were obtained from each member of our ensemble. They are composed of n = nx × ny = 10 201 elements. Rows of the matrix represent all the error samples available at a given location in the assimilation domain. They comprise N = 1000 elements.

In the following equations, square brackets are used to represent matrix indices. The index i refers to the position in the assimilation domain, while the index m refers to the “number” of a given member of the background ensemble. For example, the expression E_b[i, m] designates the error at the ith position in the domain for the mth member of the ensemble. Sometimes, an asterisk will be used to designate all possible indices over a given row or column of E_b. For example, E_b[i, *] designates the vector containing the errors at the ith position of the domain for every ensemble member.
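A small illustration of this bookkeeping (array names and shapes are assumptions, not the code used in the study):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 10_201, 1000                                 # grid points and ensemble members

x_t = rng.standard_normal(n)                        # stand-in truth
x_b = x_t[:, None] + rng.standard_normal((n, N))    # stand-in background ensemble

x_ref = x_t                                         # Fisher's framework
E_b = x_b - x_ref[:, None]                          # Eqs. (3)-(4): one column per member

err_i_m = E_b[42, 7]      # error at the 43rd grid point for the 8th member
err_i_all = E_b[42, :]    # all N error samples at that grid point (one row)
err_all_m = E_b[:, 7]     # the full error field of the 8th member (one column)
```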

a. Distribution of background errors

One of the reasons for generating a background ensemble with 1000 members was to document the functional form of the distribution of background errors in the vicinity of convection. In Figs. 3i–iv we plot four histograms (in black) that represent the distribution of background errors at rainy (A and B) and nonrainy (C and D) positions of the assimilation domain. For each of the histograms displayed, we also plotted fits of a Gaussian (in orange) and a double-exponential (in blue) distribution. At position A, the distribution of errors is well represented by the double-exponential distribution. At position C, the distribution of errors is well represented by the Gaussian distribution. Such good fits, however, are not the norm, as can be seen at position B, where some skewness can be observed, and at position D, where the distribution is irregular with a hint of bimodality.
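For readers wishing to reproduce this kind of comparison, a minimal SciPy sketch fitting both distributions to a sample of errors (the sample itself is synthetic here, standing in for one row of E_b) is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
errors = rng.laplace(loc=0.0, scale=1.2, size=1000)   # stand-in for E_b[i, :]

# Fit a Gaussian and a double-exponential (Laplace) distribution.
mu_g, sigma_g = stats.norm.fit(errors)
mu_l, b_l = stats.laplace.fit(errors)

# Compare log-likelihoods: the larger value indicates the better fit.
ll_gauss = stats.norm.logpdf(errors, mu_g, sigma_g).sum()
ll_laplace = stats.laplace.logpdf(errors, mu_l, b_l).sum()
print(f"Gaussian: {ll_gauss:.1f}   double exponential: {ll_laplace:.1f}")
```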

Fig. 3. Histograms representing the probability density of background errors at locations A, B, C, and D (see Fig. 2). Fits of Gaussian distributions are plotted in orange and fits of double exponential distributions are plotted in blue. Vertical black lines indicate the position of zero.

b. “Biases” of background errors

When the truth is considered as a random variable, the expected value of background errors will be zero; that is, E[ε_b] = 0. In the Bayesian framework, “biases” are not an issue. However, when the truth is considered as a constant, “biases” occur because, in general, ⟨x_b⟩ ≠ x_t. These “biases” violate assumption 1) for obtaining least squares analyses in Fisher’s framework (see the appendix). Quotation marks are used to emphasize that “biases” occur as a consequence of considering the truth as a constant.

The magnitude of the difference between ⟨x_b⟩ and x_t is illustrated in Fig. 4. This difference is most significant within the rainy area (delimited by the white line), where convection prevails.

Fig. 4. Magnitude of the difference between the background ensemble average ⟨x_b⟩ and the truth x_t.

c. Correlation of background errors

The spatial correlation between background errors at any two locations i and j is represented by the correlation matrix C_b. These correlations were estimated using

C_b[i, j] = Σ_m (E_b[i, m] − μ_i)(E_b[j, m] − μ_j) / √[ Σ_m (E_b[i, m] − μ_i)² · Σ_m (E_b[j, m] − μ_j)² ],  with μ_i = (1/N) Σ_m E_b[i, m].    (5)

Inspection of Eq. (5) reveals that C_b will remain the same irrespective of whether we let x_ref equal x_t or ⟨x_b⟩ in Eq. (3).

Figures 5i–iv depict the correlation between the errors found at the locations A, B, C, and D and the errors everywhere else in the assimilation domain. In the rainy area (A and B), correlations decay rapidly and are approximately isotropic. This is in agreement with similar correlations shown by Chung et al. (2013). Outside of convection (C and D), correlations are significant over much longer distances and are no longer isotropic.

Fig. 5. (i)–(iv) Correlations between the errors at locations A, B, C, and D and errors in the rest of the assimilation domain. Correlations in the rainy area (A and B) decay rapidly and are nearly isotropic. Correlations outside of precipitation (C and D) decay over much longer distances and have a complex spatial structure. (v)–(viii) The same data presented in 2D graphs for better representation of the functional form of the correlation decay. Black lines represent the average correlation as a function of distance; gray shadings represent the variability due to anisotropy. Fits of Gaussian (in orange) and exponential (in blue) decay are also provided. The dashed lines indicate correlations of magnitude e⁻¹.

To better represent the functional form of the correlation decays, the same data have been replotted in Figs. 5v–viii. As for the distribution of background errors, we fitted both a Gaussian and an exponential decay to the correlations being depicted. Gaussian decays (in orange) were fitted for minimum square errors at lags ≤ 2 km. They provide a good representation of correlations between adjacent grid points. At these small lags, it seems reasonable to assume that the correlations of background errors are mainly caused by the model diffusion. Exponential decays (in blue) were fitted for a perfect match at the lag where the average correlation (in black) decayed to a value of e⁻¹. At lags greater than 3 km, the exponential decay provides a good approximation for the correlation of background errors in the rainy area (A and B). In the nonrainy area, neither the Gaussian nor the exponential decay provides a good fit.

In the Bayesian framework [with x_ref = ⟨x_b⟩ in Eq. (3)], the errors of any two members of the background ensemble will not be correlated with one another. However, in Fisher’s framework (with x_ref = x_t), such correlations will be caused by the “biases” discussed above. When it comes to error estimation, error “biases” are indistinguishable from errors that are correlated with one another.

These member-to-member correlations are represented in the matrix C_mem. This matrix was estimated using

C_mem[m, m′] = Σ_i (E_b[i, m] − ν_m)(E_b[i, m′] − ν_m′) / √[ Σ_i (E_b[i, m] − ν_m)² · Σ_i (E_b[i, m′] − ν_m′)² ],  with ν_m = (1/n) Σ_i E_b[i, m].    (6)

The matrix C_mem represents the correlation found between the errors of any two members m and m′ of the background ensemble. Note that C_mem is different from the matrix C_b, which represents correlations between the errors found at any two locations i and j. With x_ref = ⟨x_b⟩, C_mem is equal to the identity matrix down to sampling noise. However, when x_ref = x_t, the matrix C_mem is no longer diagonal. In the next section, we will discuss how these correlations should be taken into account when estimating the standard deviation of background errors.
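Both correlation matrices can be estimated from E_b with standard sample correlations; the sketch below uses NumPy's corrcoef as a stand-in for Eqs. (5) and (6), with synthetic errors in place of the real ensemble:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 500, 1000                        # small grid for illustration
E_b = rng.standard_normal((n, N))       # stand-in for the error matrix of Eq. (4)

# Eq. (5): spatial correlation between errors at any two locations (n x n).
C_spatial = np.corrcoef(E_b)            # rows = locations, samples = members

# Eq. (6): correlation between the errors of any two members (N x N).
C_members = np.corrcoef(E_b.T)          # rows = members, samples = locations

# Adding the same field to every member (the "bias" that appears when
# x_ref = x_t) produces positive member-to-member correlations.
bias = rng.standard_normal(n)[:, None]
C_members_biased = np.corrcoef((E_b + bias).T)

off = ~np.eye(N, dtype=bool)
print(C_members[off].mean(), C_members_biased[off].mean())
```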

d. Standard deviation of background errors

The standard deviation of background errors varies throughout the assimilation domain and is represented in the diagonal matrix
e7
In this equation, a correction factor appears for the estimation of variance in the presence of correlated errors (Bayley and Hammersley 1946; Zieba 2010). Such a factor is also discussed in Part I and is here defined as
e8
with
e9

In the Bayesian framework, the errors of different background members are not correlated and no correction is required. In Fisher’s framework, the errors of the different members of the background ensemble are correlated, such that the correction factor exceeds unity. The variances of background errors estimated with respect to x_t are 2.16 times greater than those estimated with respect to ⟨x_b⟩.
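The idea behind such a correction can be illustrated with a common definition of the effective sample size for correlated data; the convention below (N_eff = N²/Σρ) is an illustration, not necessarily the exact factor of Eqs. (8)–(9):

```python
import numpy as np

def effective_sample_size(corr):
    """N_eff = N**2 / (sum of all correlation entries). With this convention,
    the variance of the mean of N correlated samples of common variance
    sigma**2 is sigma**2 / N_eff (and N_eff = N when corr is the identity)."""
    N = corr.shape[0]
    return N**2 / corr.sum()

# Toy case: N samples that all share a pairwise correlation rho.
N, rho, sigma = 500, 0.5, 1.0
corr = np.full((N, N), rho)
np.fill_diagonal(corr, 1.0)
print(effective_sample_size(corr))            # close to 1/rho = 2, not 500

# Monte Carlo check that var(sample mean) = sigma**2 / N_eff.
rng = np.random.default_rng(5)
L = np.linalg.cholesky(sigma**2 * corr)
means = [(L @ rng.standard_normal(N)).mean() for _ in range(1000)]
print(np.var(means), sigma**2 / effective_sample_size(corr))
```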

The heterogeneous standard deviation of background errors (estimated with x_ref = x_t) is shown in Fig. 6. As expected, the magnitude of errors is greater in the rainy area (within the white contour).

Fig. 6. Standard deviation of background errors (with respect to x_t) estimated using Eq. (7).

4. Homogeneous statistics for background errors

In operational contexts, spatial and temporal averages are often used to characterize the nature of background errors. In this section, we discuss what happens when background errors are assumed to be spatially homogeneous. As before, the distribution, bias, correlation, and standard deviation of background errors are discussed. This will allow us to investigate the impacts of representing homogeneous versus heterogeneous background errors in the matrix B.

a. Average distribution of background errors

Figure 7 shows (in black) the average distribution of background errors with respect to x_t. Gray shadings are indicative of the member-to-member variability about the average distribution. A fitted double-exponential distribution (in blue) centered at zero is mostly contained within this variability.

Fig. 7. Average distribution of background errors (in black). Gray shading indicates the member-to-member variability about the average error distribution. A double-exponential distribution centered at zero (in blue) provides a good fit for the average distribution of background errors.

b. Average “bias” of background errors

In Fig. 7, the domain-averaged distribution of background errors with respect to x_t is approximately centered at zero. This differs from the distributions shown in Fig. 3, which indicated significant differences between ⟨x_b⟩ and x_t. When averaged over the entire assimilation domain, differences of opposite signs balance each other.

c. Average correlation of background errors

The spatially averaged correlation of background errors is shown in Figs. 8i and 8ii. Figure 8i shows the average correlation of errors as a function of lag and direction. This plot reveals that, on average, errors are slightly anisotropic, with the anisotropy depending on the lag distance. In Fig. 8ii, the same data are replotted to better represent the functional form of the correlation decay. In blue, we plotted a fit given by an equally weighted sum of two exponential decays:

ρ(γ) = (1/2)[exp(−γ/α₁) + exp(−γ/α₂)].    (10)
In this expression, γ represents the distance between any two errors, while α₁ and α₂ represent the decay rates of the two exponential functions; both are expressed in kilometers, and their fitted values were used to draw the curve in Fig. 8. This fit was used in the construction of a homogeneous background error correlation matrix.
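A sketch of fitting such a two-exponential decay to an empirical correlation curve with SciPy's curve_fit (the synthetic data and initial guesses are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def two_exp(gamma, alpha1, alpha2):
    """Equally weighted sum of two exponential decays, as in Eq. (10)."""
    return 0.5 * (np.exp(-gamma / alpha1) + np.exp(-gamma / alpha2))

# Stand-in "observed" correlation-vs-distance curve with a little noise.
rng = np.random.default_rng(6)
lags_km = np.linspace(0.0, 50.0, 51)
rho_obs = two_exp(lags_km, 2.0, 20.0) + 0.02 * rng.standard_normal(lags_km.size)

# Fit the two decay rates; p0 is an initial guess.
(a1, a2), _ = curve_fit(two_exp, lags_km, rho_obs, p0=(1.0, 10.0))
print(f"alpha1 ~ {a1:.1f} km, alpha2 ~ {a2:.1f} km")
```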
Fig. 8. Average correlation of background errors obtained by averaging correlations in C_b. (i) Color-coded representation of the average spatial correlation of background errors. (ii) The same data replotted for a better depiction of the functional form of the correlation decay. Gray shadings are indicative of the variability of correlation estimates caused by heterogeneity and anisotropy. In blue is a fit obtained by an equally weighted sum of two exponential decays, Eq. (10).

d. Average standard deviation of background errors

The average standard deviation of errors is a convenient metric for synthesizing the overall magnitude of errors for an entire ensemble of background estimates. The most intuitive way to estimate it is to average the error variances estimated from each member of the background ensemble. This is obtained by computing
e11
with
e12
and
e13

The scalar defined in Eqs. (12) and (13) is also a correction factor for the estimation of variance in the presence of correlated errors. Note that it is different from the factor introduced in Eqs. (8) and (9): the factor in Eq. (11) compensates for the spatial correlation of background errors, whereas the earlier factor compensates for the member-to-member correlations.

The average standard deviation of background errors is equal to 1.7 and 2.5 m s−1 when errors were estimated with respect to ⟨x_b⟩ and x_t, respectively. In both cases, the correction is modest: in Eq. (11), spatial correlations demand that variance estimates be increased by 6%.

The influence of correlation on variance estimates is more important when we compute the average in the reverse order. That is, we average the variances estimated at each point in the domain. This is obtained by computing
e14

As before, the average standard deviation is equal to 1.7 and 2.5 m s−1 when background errors were estimated with respect to ⟨x_b⟩ and x_t, respectively. However, in Eq. (14), it is the correction factor for member-to-member correlations that accounts for the difference.

This second method for estimating the average variance of errors is provided to show the “symmetry” that exists when estimating error variances from the matrix E_b. When the variance of errors is estimated from individual columns (rows) of E_b, the correlation between the rows (columns) of E_b must be considered.

5. Experimental setup

We now describe the methodology by which our assimilation experiments were conducted. For consistency with the results of Part I, only one atmospheric state variable, the U component of the wind, was estimated on a 2D domain of 101 × 101 grid points. Analyses x_a were obtained by combining x_b and y through the minimization of the cost function in Eq. (1). Background estimates x_b and their errors have been thoroughly discussed in sections 2–4. The process by which observations were simulated and the different configurations for obtaining analyses are described next.

a. Simulated observations

Observations were simulated by adding correlated noise to the truth x_t. The reader may consider these simulated observations as coming from an imaginary instrument that directly measures the U component of the wind with known error statistics. Fields of observation errors

ε_o = y − H x_t    (15)

were constructed by generating zero-mean, homogeneous, isotropic, and correlated noise following a multivariate Gaussian distribution. The correlation ζ between observation errors at any two locations i and j was set to an exponential decay:

ζ[i, j] = exp(−γ_ij / α_obs),    (16)

where γ_ij is the distance between the two locations and α_obs determines the rate at which correlations decay. Exponentially correlated noise was obtained by convolving fields of Gaussian white noise with the appropriate 2D kernel provided by Oliver (1995).
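On a small grid, exponentially correlated noise of this kind can also be generated with a Cholesky factor of the target covariance, which is used below as a simple stand-in for the convolution kernel of Oliver (1995); all sizes and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
nx, ny, dx_km = 30, 30, 1.0       # small grid so the full covariance fits in memory
alpha_obs, sigma_o = 5.0, 2.5     # decay rate (km) and error std dev (m/s)

# Pairwise distances between all grid points.
xx, yy = np.meshgrid(np.arange(nx) * dx_km, np.arange(ny) * dx_km)
pts = np.column_stack([xx.ravel(), yy.ravel()])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

# Exponential correlation of Eq. (16), scaled to the observation error covariance.
R = sigma_o**2 * np.exp(-dist / alpha_obs)

# Correlated noise = Cholesky factor applied to Gaussian white noise.
L = np.linalg.cholesky(R + 1e-8 * np.eye(nx * ny))
eps_o = (L @ rng.standard_normal(nx * ny)).reshape(ny, nx)
```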

An ensemble of 100 observation estimates was created for different values of α_obs ranging between 0.1 and 100 km. With α_obs = 0.1 km, observations are effectively uncorrelated and can be considered as white noise. At the other extreme, with α_obs = 100 km, observations are very strongly correlated. Assimilation experiments were thus performed for a wide range of observation error correlations.

We let the standard deviation of observation errors σ_o be equal to 2.5 m s−1, which is the average standard deviation of background errors estimated with respect to x_t (see section 4d). Like “real” Doppler velocity measurements, the simulated observations were only made available where precipitation was found in the truth. An example of the simulated observations is shown in Fig. 9i. For comparison purposes, Fig. 9ii shows the U component of the wind prevailing in x_t.

Fig. 9. (i) One realization of the simulated observations obtained with σ_o = 2.5 m s−1. (ii) The U component of the wind prevailing in x_t throughout the analysis domain.

b. Different analysis ensembles

A number of analysis ensembles have been performed to quantify the influence of the following:

  1. representing homogeneous versus heterogeneous variance of background errors,
  2. neglecting the correlation of background and/or observation errors, and
  3. the presence of “biases” in Fisher’s framework.

The sensitivity to the representation of error variance and correlation can be assessed by comparing analyses obtained with the same input data but with different representations of errors in the matrices B and R. It is convenient, then, to express these matrices as

B = B_s B_C B_s,    (17)
R = R_s R_C R_s,    (18)

where the subscript s indicates diagonal matrices in which the standard deviation of errors is represented and the subscript C indicates nondiagonal matrices in which the spatial correlation of errors is represented.

All the different error matrices that were used in our assimilation experiments are described in Table 1. Here are some notes on the notation: whenever exact error statistics are not represented in B and R, bold subscripts are used in place of s and C. To indicate that homogeneous error statistics are represented, the italic subscripts s and C are used in place of their bold counterparts.

Table 1. The different error matrices used in our experiments.
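As an illustration of the decomposition in Eqs. (17) and (18) (a sketch with assumed names; the actual matrices used in the experiments are those of Table 1):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200                                    # small state size for illustration

# Diagonal matrix of (possibly heterogeneous) error standard deviations.
stddev = 1.5 + rng.random(n)               # m/s, varying from point to point
B_s = np.diag(stddev)

# Correlation matrix with a homogeneous exponential decay (unit diagonal).
dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)
B_C = np.exp(-dist / 10.0)

# Eq. (17): covariance = (std dev) x (correlation) x (std dev).
B = B_s @ B_C @ B_s

# Swapping B_s (heterogeneous vs homogeneous) or B_C (correlated vs identity)
# while keeping the same input data is what distinguishes configurations (a)-(p).
B_no_corr = B_s @ np.eye(n) @ B_s          # correlations neglected
```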

The matrices B and R alone cannot be used to assess the sensitivity of analyses to the presence of the “biases” in Fisher’s framework. To document the influence of these “biases,” we constructed a modified ensemble of background estimates

x̃_b^m = x_b^m − (⟨x_b⟩ − x_t),    (19)

whose average is x_t. In operational contexts, such a perfect correction of “biases” is obviously not possible. Nevertheless, analyses obtained by using the bias-corrected ensemble (instead of the original one) are useful since they allow us to document the potential gains that could be obtained by applying similar corrections to background estimates.

Ensembles of analyses were generated for various combinations of the matrices B and R. The influence of bias correction was also tested. The configurations chosen for our experiments were assigned letters ranging from “a” to “p” and are described in Fig. 10. Figure 10 also provides the legend for interpreting the results presented in Figs. 11–13.

Fig. 10. Description of the different configurations under which analysis ensembles were generated. The “symbol” column on the right-hand side is also the legend by which Figs. 11–13 may be interpreted. Some notes on the symbols and color coding: hollow circles represent analyses obtained with homogeneous representation of background error variance; solid dots indicate heterogeneous variance. The color orange indicates analyses for which the correlations of errors were neglected altogether. The color purple indicates analyses for which correlations were represented in both B and R. The color red indicates analyses for which correlations were represented in B, but not in R. The description of all matrices used here can be found in Table 1.

6. Results

One analysis ensemble with N_a = 100 members was computed for each configuration and for each value of α_obs that was tested. In total, this makes more than 1500 analysis ensembles. It would not be practical to examine the errors of each analysis ensemble as was done in sections 3 and 4. For that reason, we present only averaged error statistics for the correlation and the standard deviation of analysis errors.

The errors discussed here were considered in Fisher’s framework. As before, the error vectors of the different members of a given analysis ensemble,

ε_a^m = x_a^m − x_t,    (20)

were concatenated into the matrix

E_a = [ε_a^1, ε_a^2, …, ε_a^{N_a}]    (21)

of dimension n × N_a. The symbol n is the number of grid points in the analysis domain, while N_a is the number of members in each analysis ensemble.

In a manner similar to Part I, all the results are presented in the form of plots with α_obs, the decay rate of the spatial correlation of observation errors, represented on the abscissa. Doing so allows us to depict the dependence of analysis errors on the correlation of observation errors.

a. Average correlation of analysis errors

It is difficult to synthesize information on the correlation of a large number of analysis ensembles. We chose to represent only the “amount” of correlation, which can be inferred from the effective number of independent pieces of information contained in the errors of a given analysis ensemble. This quantity was estimated using Eqs. (5) and (13) with E_a and N_a in place of E_b and N.

In Fig. 11, the effective number of uncorrelated pieces of information is shown for analysis ensembles obtained under the configurations (b), (d), and (j). Analyses obtained under these three configurations differ only in the correlations that were represented in the matrices B and R. The effective numbers of uncorrelated pieces of information contained in the background and observation estimates are also plotted as references. By construction, the spatial correlation between observation errors decreases at a rate determined by α_obs. For α_obs = 0.1 km, observations are effectively uncorrelated and their effective number of pieces of information (shown in blue) is approximately equal to 3251. This is the number of observation data points. For larger α_obs, observations become more correlated and they contain less information than the same number of uncorrelated estimates. Consequently, the number of effectively uncorrelated pieces of information decreases. For α_obs = 100 km, observations are so correlated that they contain only a little more information than a single “uncorrelated” observation. Regardless of α_obs, the background estimates are the same (shown in gray).

Fig. 11. Effective number of uncorrelated pieces of information of different estimates as a function of α_obs, the decay rate of the spatial correlation of observation errors. For small α_obs, the effective number for the observations (shown in blue) is very close to 3251, the number of observation data points. The larger α_obs, the larger the correlation of observation errors and the smaller the effective number. Irrespective of α_obs, background estimates are the same. The effective number of uncorrelated pieces of information is shown for analyses obtained under the configurations (b), (d), and (j). These analyses differ only in the correlations that were represented in B and R. Under configuration (b), analyses were obtained with no representation of correlations whatsoever. Under configuration (d), the correlations were represented in both B and R. Under configuration (j), correlations were represented in B, but not in R.

In terms of the effective number of independent pieces of information in the analyses, it did not make a big difference whether the correlations of errors were neglected altogether [configuration (b), in orange] or represented in both B and R [configuration (d), in purple]. However, for analyses in which only the correlation of background errors was represented [configuration (j), in red], the effective number of independent pieces of information drops significantly at large α_obs. This decrease can be explained by considering that neglecting the correlation of observation errors artificially increases the influence of observations in the analyses. For large α_obs, the strongly correlated observation errors end up “contaminating” the analyses. Similar results (not shown here) were obtained with all the other configurations.
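This contamination mechanism can be reproduced with a toy analysis in which observation errors are strongly correlated but R is kept diagonal (all numbers are illustrative; H is taken as the identity):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100
grid = np.arange(n, dtype=float)
dist = np.abs(grid[:, None] - grid[None, :])

B = np.exp(-dist / 5.0)                  # background error covariance
R_true = np.exp(-dist / 30.0)            # strongly correlated observation errors

x_t = np.zeros(n)
x_b = rng.multivariate_normal(x_t, B)
y = rng.multivariate_normal(x_t, R_true)

def analyse(B, R, x_b, y):
    K = B @ np.linalg.inv(B + R)         # H = identity
    return x_b + K @ (y - x_b)

x_a_ok = analyse(B, R_true, x_b, y)                      # correlations represented in R
x_a_bad = analyse(B, np.diag(np.diag(R_true)), x_b, y)   # correlations in R neglected
# Neglecting the correlations usually leaves more of the correlated
# observation error in the analysis.
print(np.std(x_a_ok - x_t), np.std(x_a_bad - x_t))
```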

b. Average standard deviation of analysis errors

Data assimilation strives to minimize the variance of the errors between the analyses and the truth. In the present case, these errors are heterogeneous and can only be fully represented in 2D maps such as Fig. 6. To allow comparisons between the different analyses that were generated, only the average standard deviation of analysis errors was considered. This quantity was estimated using Eq. (11), applied to the analysis error matrix E^a and the ensemble size N_e.

Figure 12i shows the average standard deviation of errors for the background, observation, and analysis estimates. The errors of background estimates are shown in gray, while the errors of observation estimates appear in blue. Each solid dot indicates the magnitude of the average standard deviation of errors for one ensemble composed of 100 members. Color shadings indicate the member-to-member variability of the average standard deviation of errors prevailing within different ensembles; in Part I, this second-order error statistic was referred to as the “sampling noise.” The average standard deviation of errors for any one member of a given ensemble will lie within the shaded interval 95% of the time. To reflect this fact, color shadings extend above and below the ensemble values in Figs. 12i–iv. The wider these shadings, the greater the variability of errors within one ensemble. The sampling noise is used to assess the significance of the differences observed between the errors of two analysis ensembles.
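
The exact expression for the sampling noise is given in Part I and is not restated here; as a rough, purely empirical sketch (the percentile bracket and all names below are assumptions of this illustration), the ensemble value and a 95% member-to-member spread could be computed as:

```python
import numpy as np

def avg_error_std_and_spread(errors):
    """Domain-averaged standard deviation of errors and its member-to-member
    variability for one ensemble.

    errors : array (N_g, N_e); column j holds the error field of member j.

    Returns (sigma_bar, lo, hi):
      sigma_bar : RMS error of each member, averaged over the ensemble
      lo, hi    : empirical 2.5th and 97.5th percentiles of the per-member values,
                  a rough 95% bracket analogous to the shaded bands of Fig. 12.
    """
    per_member = np.sqrt(np.mean(errors**2, axis=0))   # one value per member
    sigma_bar = per_member.mean()
    lo, hi = np.percentile(per_member, [2.5, 97.5])
    return sigma_bar, lo, hi
```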
Fig. 12. Average standard deviation of errors for background, observation, and analysis ensembles as a function of the decay rate of the spatial correlation of observation errors. Each circle/dot represents the average standard deviation of errors estimated from an ensemble of 100 members. Color shadings extend above and below these values and indicate the variability of errors within different ensembles. The errors of background estimates and their bias-free counterparts appear in gray. Observation errors appear in blue. The errors of analysis ensembles obtained under configurations (a)–(l) are shown in orange, purple, and red. These configurations, the symbols, and the color coding used here are described in Fig. 10. A detailed description of how these results may be interpreted is provided in the text.

For observation errors, the sampling noise depends on the strength of their correlation: the more correlated the observation errors, the more variable their average. The same ensemble of background estimates was used for all the analysis ensembles depicted in Fig. 12i; the statistics of background errors are therefore depicted only once, on the left-hand side of the figure.

Figure 12i shows the average standard deviation of errors for analyses obtained under configurations (a), (b), (c), and (d). From now on, the short form “analyses (a)” will be used in reference to “all the analysis ensembles obtained under configuration (a).” These four configurations allow us to quantify the impacts of 1) representing homogeneous versus heterogeneous variance of background errors and 2) neglecting the correlation of background and observation errors. The only difference between analyses (a) and (b) is that homogeneous background error variances (indicated by hollow circles) were represented in (a), while heterogeneous background error variances (indicated by solid dots) were represented in (b); the same can be said of analyses (c) and (d). Analyses (a) and (c) differ only in that correlations were neglected altogether in (a) (indicated by the color orange), while correlations were represented in both B and R in (c) (indicated by the color purple); similarly for analyses (b) and (d). Color shadings indicate the magnitude of the sampling noise for analyses (b) and (d); the respective sampling noise of analyses (a) and (c) (not shown for clarity) is of comparable magnitude.

Comparing analyses (a) to (b) and (c) to (d), we observe that representing heterogeneous rather than homogeneous background error variance resulted in only small reductions of the average standard deviation of errors. It is possible that the representation of heterogeneous error variance did yield better analyses in limited portions of the domain, but these improvements did not significantly affect the average value of errors. This highlights the limitations associated with domain-averaged error statistics. Surprisingly, representing the heterogeneous variance in (d) was detrimental to the quality of analyses over part of the range of observation-error correlations considered.

The impacts of representing correlations in both B and R can be quantified by comparing analyses (a) to (c) and (b) to (d). In general, representing correlations resulted in analyses (shown in purple) with smaller errors than those obtained by neglecting correlations (shown in orange). However, the benefits associated with representing correlations in B and R depend on the relative “amount” of correlation found in the background and observation errors. The more similar the correlations of background and observation errors, the smaller the impact of representing them. In the range where the two are most alike, representing correlations leads to only small reductions of analysis errors. This result is consistent with those of Part I.

Figure 12ii shows analyses (e), (f), (g), and (h), obtained using the bias-free background estimates. Otherwise, these analyses are identical to analyses (a), (b), (c), and (d), respectively. The errors of the bias-free background estimates are also displayed. The influence of “biases” in the background estimates can be assessed by comparing the errors shown in Figs. 12i and 12ii: correcting the “biases” results in significantly smaller background and analysis errors.

The influence of these “biases” can also be more subtle. Compare, for example, analyses (c) and (d), obtained with the “biased” background estimates, with (g) and (h), obtained with the unbiased ones. Over part of the range of observation-error correlations, representing correlations in B and R [in analyses (d)] was associated with an increase in analysis errors; no such increase is observed for the “bias free” analyses (h).

In operational data assimilation, it is common to represent the correlation of background errors while neglecting the correlation of observation errors. Analyses obtained under configurations (i), (j), (k), and (l) (all shown in red) were performed to investigate this situation. To avoid cluttering Fig. 12i, the errors of analyses (i) and (j) were plotted in Fig. 12iii, where the errors of analyses (b) and (d) are also replotted. For weakly correlated observation errors, analyses (i) and (j) are almost identical to the more optimal analyses (d); in such cases, the correlation of observation errors could be neglected without consequence. However, as the correlation of observation errors was increased, neglecting these correlations resulted in significant increases in the magnitude of analysis errors. For strongly correlated observation errors, it was preferable to neglect the correlations altogether, as in (b), than to represent only the correlation of background errors, as in (i) and (j). Again, these results are consistent with those obtained in Part I. Analyses (k) and (l) (obtained with bias-free background estimates and shown in Fig. 12iv) behave in a similar manner.

So far, background errors have been represented using either a diagonal covariance matrix or the homogeneous correlation matrix; both matrices are described in Table 1. We wished to determine whether better results could be obtained with a heterogeneous representation of background error correlations. To this end, assimilation experiments under configurations (m), (n), (o), and (p) were performed with the localized correlation matrix described in Table 1.

In Fig. 13i, the errors of analyses (m) appear as purple solid dots. For comparison with previous results, the errors of analyses (d) have also been replotted as a purple dashed line. Analyses (m) differ from analyses (d) only in the use of localized correlations for the representation of background errors. The errors of analyses (m), obtained with localized correlations, are systematically larger than those of analyses (d), obtained with homogeneous correlations. Over part of the range of observation-error correlations, it was preferable to neglect correlations altogether [analyses (b), in orange] than to represent the correlations of both background and observation errors [analyses (m)]. This is yet another demonstration that an imperfect representation of correlations may well prove more detrimental than no representation of correlations at all.

Fig. 13. As in Fig. 12, but for analyses obtained under configurations (m), (n), (o), and (p). Under those configurations, the localized correlation matrix (see Fig. 10 and Table 1) was used to represent the correlation of background errors in B. The errors of analyses (b), (d), (f), and (h) are also shown for comparison purposes. The errors of analyses (d) and (h) appear as dashed lines to avoid possible confusion with analyses (m) and (n).

Figure 13ii shows the errors of analyses (f), (h), and (n). Except for the use of the bias-free background estimates, these analyses are the same as (b), (d), and (m), respectively. The conclusions that can be drawn from Fig. 13ii are the same as those obtained from Fig. 13i. As before, correcting the “biases” of the background estimates leads to a significant decrease of analysis errors.

Figure 13iii shows the errors of analyses (o), obtained with localized background error correlations and neglected observation error correlations. Over part of the range of observation-error correlations, neglecting the correlation of observation errors in (o) yielded analyses with smaller errors than those of analyses (m), in which correlations were represented in both B and R. We explain this result by the fact that the analyses (o) are “smoother” than those obtained in (m); this smoothing diminishes the detrimental influence of the “biases” in the background estimates. This assertion is supported by the errors of analyses (p) and (n), shown in Fig. 13iv and obtained from the bias-free background estimates: in this case, neglecting the correlation of observation errors in (p) does not lead to better analyses than representing them.

7. Discussion

In this study, idealized assimilation experiments were conducted to quantify the consequences of misrepresenting the errors of background and observation estimates. These errors had to be interpreted differently depending on how the truth was considered.

In the Bayesian framework, the truth is considered a random variable and errors refer to the dispersion of probability densities around their respective centers. Given an ensemble of background estimates, one can characterize the matrix B by estimating the variance and covariance of errors with respect to the ensemble average. In this context, the expected value of these errors is zero and “biases” are not an issue.

In Fisher’s framework, the truth is considered a constant and errors refer to the difference from this constant. In our experiments, the truth was chosen from the same forecast ensemble as the background estimates. It is, therefore, not possible for the truth to coincide with the ensemble average, and background errors are “biased.” From the point of view of error estimation, “biased” estimates are indistinguishable from estimates whose errors are correlated with one another. These correlations demand that the variance of errors be increased.

One could, correctly, say that these “biases” are simply the consequence of considering the assimilation problem in a framework that is not compatible with the available data. Why choose Fisher’s framework then?

The main advantage of considering the assimilation problem in Fisher’s framework is its simplicity. In our experimental setup, a truth is chosen from the start; it is used both for the construction of observations and for the verification of analyses. After minimizing the cost function, we ask: how close are the analyses to the truth? The difference from the truth is easy to compute and to analyze.

Had we considered our experiments in the Bayesian framework, we would have had to answer a different question: how plausible is it that the truth is a realization of the estimated posterior probability density? This question is more difficult to answer.

It can also be argued that considering the assimilation problem in Fisher’s framework encourages a proactive attitude toward improving the quality of analyses. In this framework, the zero-mean error assumption (see the appendix) cannot be satisfied for background estimates. This leads one to wonder what would happen if this assumption were satisfied. In section 6, we presented the results of experiments in which the “biases” of background errors had been corrected prior to assimilation. This correction was associated with significant improvements in the quality of the analyses. In an operational context, performing such a correction is far from obvious; the possibility nevertheless exists and may be investigated.
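
The correction applied in section 6 is described there; as a minimal sketch of the idea in this idealized setting, where the truth is known, one could simply remove the ensemble-mean departure from the truth before assimilation (the function name and this particular recipe are assumptions of the illustration, not necessarily the paper’s procedure):

```python
import numpy as np

def remove_background_bias(background, truth):
    """Re-center a background ensemble on the known truth (idealized setting only).

    background : array (N_g, N_e); column j is background member x_j^b.
    truth      : array (N_g,); the member chosen as the truth x^t.

    In Fisher's framework the ensemble mean generally differs from the truth, so
    the errors of every member share a common component (the "bias").  Removing
    the ensemble-mean departure from the truth yields bias-free background estimates.
    """
    bias = background.mean(axis=1) - truth      # common error component
    return background - bias[:, None]           # shift every member by -bias
```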

Conversely, in the Bayesian framework, it is known that minimum-variance analyses will be obtained on the condition that errors are accurately represented in the matrices B and R. Once this is done, we have the “best” analyses that can be obtained given the data at our disposal. In this framework, the only path to improvement is a better representation of errors in B and R.

Some explanations are now provided as to why representing heterogeneous correlations with the localized matrix led to analyses of poorer quality than those obtained with the homogeneous correlation matrix. At first, this result appears counterintuitive: why would representing heterogeneous correlations lead to larger errors than representing homogeneous correlations?

In Fig. 5, it is shown that the short-lag correlations between background errors are well represented by a Gaussian decay, while the long-lag correlations are better represented by an exponential decay. We argued that model diffusion was probably responsible for the short-lag Gaussian correlations and that the atmospheric structure was responsible for the long-lag exponential correlations. The localization process forces the spatial correlations to zero beyond lags greater than or equal to the localization length. As a result, the short-lag Gaussian correlations remain essentially unaltered, while the longer-lag exponential correlations may be significantly trimmed. In the end, the localized matrix mostly represents the Gaussian correlations between model errors at neighboring grid points. Conversely, the homogeneous matrix was constructed to provide a good representation of the long-lag correlations caused by the atmospheric structure (see Fig. 8ii). It appears that the homogeneous matrix, representing atmospheric correlations over appreciably longer distances, provides the better representation of the two.

In multiple trial experiments, we found that increasing the localization parameter L (up to 20 km) did not result in better analyses than those obtained with the homogeneous correlation matrix. We explain this result by noting that, while a larger L leads to a better representation of correlations, it also increases the condition number of the localized correlation matrix. Our hypothesis is that, for the larger values of L, the increased condition number was more detrimental to the quality of analyses than the better representation of correlations was beneficial. Perhaps a more sophisticated localization method would have resulted in better analyses.
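
The construction of the localized matrix follows Table 1 and is not repeated here; the sketch below only illustrates the general mechanism, tapering an assumed short-lag Gaussian plus long-lag exponential correlation model with a Gaspari and Cohn (1999) function (Schur product) and reporting how the condition number of the result responds to the localization half-width L. The correlation model, grid spacing, and half-widths are illustrative assumptions.

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari and Cohn (1999) fifth-order compactly supported taper.
    r : separation normalized by the localization half-width (support ends at r = 2)."""
    r = np.abs(r)
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    x = r[inner]
    taper[inner] = -0.25 * x**5 + 0.5 * x**4 + 0.625 * x**3 - (5.0 / 3.0) * x**2 + 1.0
    x = r[outer]
    taper[outer] = (x**5 / 12.0 - 0.5 * x**4 + 0.625 * x**3 + (5.0 / 3.0) * x**2
                    - 5.0 * x + 4.0 - 2.0 / (3.0 * x))
    return taper

# Illustrative 1D correlation model: short-lag Gaussian plus long-lag exponential
# (loosely inspired by Fig. 5; the length scales and 1-km grid are assumptions).
x = np.arange(0.0, 200.0, 1.0)                   # grid positions (km)
lag = np.abs(x[:, None] - x[None, :])
corr = 0.5 * np.exp(-(lag / 5.0) ** 2) + 0.5 * np.exp(-lag / 50.0)

for L in (5.0, 10.0, 20.0):                      # localization half-widths (km)
    localized = corr * gaspari_cohn(lag / L)     # Schur (element-wise) product
    # Localization trims the long-lag exponential correlations; in this example
    # the condition number of the localized matrix grows as L is increased.
    print(L, np.linalg.cond(localized))
```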

8. Conclusions

In this two-part study, we documented the impacts of error misrepresentation on the quality of analyses obtained by minimizing a simple cost function. In Part I (Jacques and Zawadzki 2014), idealized experiments were performed with simulated background and observation estimates made available everywhere in the assimilation domain. Under such ideal conditions, it was shown that representing correlations could only improve the quality of analyses when the respective correlations of background and observation errors were significantly different. Also, it was generally preferable to neglect correlations altogether than to represent only the correlation of background errors.

In the second part of this study, we verified that the previous results could be replicated in more realistic assimilation experiments. Again, analyses were obtained by minimizing a simple cost function. This time, however, the background estimates were taken from the output of a large ensemble (1000 members) of model forecasts in a convective situation. As before, simulated observations were used, but they were only made available where precipitation was present.

One contribution of this study is a detailed depiction of background errors in a convective environment. These errors are described both in the Bayesian framework (where the “truth” is considered an unknown random variable) and in Fisher’s framework (where the truth is considered an unknown constant). In most locations of the assimilation domain, the functional form of the distribution of background errors was shown to differ from the Gaussian that is usually assumed. In the Bayesian framework, the least squares criterion will be achieved irrespective of the functional form of the probability densities under consideration. However, after mentioning this fact, Tarantola (2005, p. 68) warns the reader that the least squares criterion may not be a good choice when probability densities are not Gaussian.

For the idealized experiments presented here, analyses were computed by minimizing the same cost function regardless of whether the truth is considered in the Bayesian or in Fisher’s framework. However, the assumptions on the nature of errors and their representation in B differ. In Fisher’s framework, background errors are “biased,” which demands an increase in the variance of errors represented in the matrix B. In the context of our experiments, where a truth is selected from the start, considering the assimilation problem in Fisher’s framework allows an easy verification of analyses: we simply quantify the average error variance with respect to the truth.

The assimilation experiments presented here were designed to quantify the influence of three different misrepresentations of errors in the matrices B and R.

First, relatively small differences were observed between analyses obtained with homogeneous versus heterogeneous error variances in B. This result may be a consequence of considering only domain-averaged error statistics.

Second, analyses obtained with and without the representation of spatial correlations in B and R were compared. In this case, the results were similar to those obtained in Part I: the more similar the correlations of background and observation errors, the smaller the impact of representing them. As in Part I, representing only the correlation of background errors was generally more detrimental than not representing correlations at all.

Finally, we compared analyses obtained with and without correction of the “biases,” which occur as a consequence of considering the truth in Fisher’s framework. Significant improvements to the quality of analyses were obtained by correcting these “biases.” This result is not entirely surprising. One expects the quality of analyses to improve when background estimates are closer to the truth. In a more realistic framework, performing such bias correction will be difficult. However, methods for doing so could be investigated.

Acknowledgments

The authors are grateful for a careful review of an early version of this article by Dr. Chris Snyder. His comments had a significant influence on the final form presented here. We also acknowledge the input of Frédéric Fabry and Aitor Atencia on the nature of biases and the computation of errors in the presence of correlation. Finally, thanks to Konstantinos Menelaou and Jonathan Vogel for proofreading this article.

APPENDIX

The Cost Function in Fisher’s Framework

In Fisher’s framework, the “truth” x^t is considered an unknown constant to be recovered. It can be estimated using the method of statistical least squares (Lewis et al. 2006, p. 240). This method is essentially the same as the one used to “fit” a model through a set of “observations” defined as

z = Γ x^t + ε.   (A1)

In the context of our experiments, the errors of the background x_b and of the observations y (ε_b and ε_o, respectively) are uncorrelated, and the above quantities can be defined as

z = [ x_b ; y ],   Γ = [ I ; H ],   ε = [ ε_b ; ε_o ].   (A2)

The following assumptions are made:
  1. E(ε) = 0,
  2. E(ε_b ε_b^T) = B and E(ε_o ε_o^T) = R, and
  3. ε_b and ε_o are uncorrelated.
Given the observations z, one can find the least squares estimate

x_a = (Γ^T W^{-1} Γ)^{-1} Γ^T W^{-1} z,   with W = E(ε ε^T),   (A3)

by minimizing

J(x) = (z - Γ x)^T W^{-1} (z - Γ x)   (A4)
     = (x - x_b)^T B^{-1} (x - x_b) + (y - H x)^T R^{-1} (y - H x).   (A5)
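
As a minimal numerical sketch of this estimate (the matrices, dimensions, and function name below are illustrative assumptions, not the experimental setup of the paper), the minimizer of (A4)–(A5) can be computed directly from its normal equations:

```python
import numpy as np

def least_squares_analysis(xb, y, B, R, H):
    """Statistical least squares (Fisher-framework) analysis.

    Minimizes (x - xb)^T B^{-1} (x - xb) + (y - H x)^T R^{-1} (y - H x),
    whose minimizer satisfies
    (B^{-1} + H^T R^{-1} H) x_a = B^{-1} xb + H^T R^{-1} y.
    """
    Binv = np.linalg.inv(B)
    Rinv = np.linalg.inv(R)
    lhs = Binv + H.T @ Rinv @ H
    rhs = Binv @ xb + H.T @ Rinv @ y
    return np.linalg.solve(lhs, rhs)

# Tiny illustrative example (all numbers are arbitrary):
xb = np.array([1.0, 2.0])                  # background estimate
y = np.array([1.5])                        # one observation of the first variable
H = np.array([[1.0, 0.0]])                 # observation operator
B = np.array([[1.0, 0.5], [0.5, 1.0]])     # background error covariance
R = np.array([[0.25]])                     # observation error variance
xa = least_squares_analysis(xb, y, B, R, H)
print(xa)                                  # analysis drawn toward the observation
```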

REFERENCES

  • Aksoy, A., , D. C. Dowell, , and C. Snyder, 2010: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part II: Short-range ensemble forecasts. Mon. Wea. Rev., 138, 12731292, doi:10.1175/2009MWR3086.1.

  • Bayley, G. V., , and J. M. Hammersley, 1946: The “effective” number of independent observations in an autocorrelated time series. J. Roy. Stat. Soc., 8 (Suppl.), 184197, doi:10.2307/2983560.

  • Berenguer, M., , M. Surcel, , I. Zawadzki, , M. Xue, , and F. Kong, 2012: The diurnal cycle of precipitation from continental radar mosaics and numerical weather prediction models. Part II: Intercomparison among numerical models and with nowcasting. Mon. Wea. Rev., 140, 26892705, doi:10.1175/MWR-D-11-00181.1.

  • Caumont, O., , V. Ducrocq, , É. Wattrelot, , G. Jaubert, , and S. Pradier-Vabre, 2010: 1D+3DVar assimilation of radar reflectivity data: A proof of concept. Tellus,62A, 173–187, doi:10.3402/tellusa.v62i2.15678.

  • Chang, W., , K.-S. Chung, , L. Fillion, , and S.-J. Baek, 2014: Radar data assimilation in the Canadian high-resolution ensemble Kalman filter system: Performance and verification with real summer cases. Mon. Wea. Rev., 142, 21182138, doi:10.1175/MWR-D-13-00291.1.

  • Chung, K.-S., , I. Zawadzki, , M. K. Yau, , and L. Fillion, 2009: Short-term forecasting of a midlatitude convective storm by the assimilation of single–Doppler radar observations. Mon. Wea. Rev., 137, 41154135, doi:10.1175/2009MWR2731.1.

  • Chung, K.-S., , W. Chang, , L. Fillion, , and M. Tanguay, 2013: Examination of situation-dependent background error covariances at the convective scale in the context of the ensemble Kalman filter. Mon. Wea. Rev., 141, 33693387, doi:10.1175/MWR-D-12-00353.1.

  • Fabry, F., 2006: The spatial variability of moisture in the boundary layer and its effect on convection initiation: Project-long characterization. Mon. Wea. Rev., 134, 7991, doi:10.1175/MWR3055.1.

  • Gaspari, G., , and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723757, doi:10.1002/qj.49712555417.

  • Jacques, D., , and I. Zawadzki, 2014: The impacts of representing the correlation of errors in radar data assimilation. Part I: Experiments with simulated background and observation estimates. Mon. Wea. Rev., 142, 39984016, doi:10.1175/MWR-D-14-00104.1.

  • Lewis, J. M., , S. Lakshmivarahan, , and S. Dhall, 2006: Dynamic Data Assimilation: A Least Squares Approach. Cambridge University Press, 680 pp.

  • Murphy, J. M., 1988: The impact of ensemble forecasts on predictability. Quart. J. Roy. Meteor. Soc., 114, 463493, doi:10.1002/qj.49711448010.

  • Oliver, D., 1995: Moving averages for Gaussian simulation in two and three dimensions. Math. Geol., 27, 939960, doi:10.1007/BF02091660.

  • Oliver, D., 1998: Calculation of the inverse of the covariance. Math. Geol., 30, 911933, doi:10.1023/A:1021734811230.

  • Purser, R. J., , W.-S. Wu, , D. F. Parrish, , and N. M. Roberts, 2003: Numerical aspects of the application of recursive filters to variational statistical analysis. Part I: Spatially homogeneous and isotropic Gaussian covariances. Mon. Wea. Rev., 131, 1524, doi:10.1175/1520-0493(2003)131<1524:NAOTAO>2.0.CO;2.

  • Rennie, S. J., , S. L. Dance, , A. J. Illingworth, , S. P. Ballard, , and D. Simonin, 2011: 3D-Var assimilation of insect-derived Doppler radar radial winds in convective cases using a high-resolution model. Mon. Wea. Rev., 139, 1148–1163, doi:10.1175/2010MWR3482.1.

  • Skamarock, W. C., , J. B. Klemp, , J. Dudhia, , D. O. Gill, , D. M. Barker, , W. Wang, , and J. G. Powers, 2005: A description of the advanced research WRF version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp. [Available online at http://www.mmm.ucar.edu/wrf/users/docs/arw_v2.pdf.]

  • Snook, N., , M. Xue, , and Y. Jung, 2011: Analysis of a tornadic mesoscale convective vortex based on ensemble Kalman filter assimilation of CASA X-band and WSR-88D radar data. Mon. Wea. Rev., 139, 3446–3468, doi:10.1175/MWR-D-10-05053.1.

  • Sobash, R. A., , and D. J. Stensrud, 2013: The impact of covariance localization for radar data on EnKF analyses of a developing MCS: Observing system simulation experiments. Mon. Wea. Rev., 141, 3691–3709, doi:10.1175/MWR-D-12-00203.1.

  • Stratman, D. R., , M. C. Coniglio, , S. E. Koch, , and M. Xue, 2013: Use of multiple verification methods to evaluate forecasts of convection from hot- and cold-start convection-allowing models. Wea. Forecasting, 28, 119–138, doi:10.1175/WAF-D-12-00022.1.

  • Sun, J., 2004: Numerical prediction of thunderstorms: Fourteen years later. Atmospheric Turbulence and Mesoscale Meteorology, E. Fedorovich, R. Rotunno, and B. Stevens, Eds., Cambridge University Press, 139–164.

  • Sun, J., and Coauthors, 2014: Use of NWP for nowcasting convective precipitation: Recent progress and challenges. Bull. Amer. Meteor. Soc., 95, 409–426, doi:10.1175/BAMS-D-11-00263.1.

  • Talagrand, O., , and R. Vautard, 1999: Evaluation of probabilistic prediction systems. Proc. Workshop on Predictability, Reading, United Kingdom, European Centre for Medium-Range Weather Forecasts, 1–25.

  • Tarantola, A., 2005: Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, 358 pp.

  • Wang, H., , J. Sun, , X. Zhang, , X.-Y. Huang, , and T. Auligné, 2013: Radar data assimilation with WRF 4D-Var. Part I: System development and preliminary testing. Mon. Wea. Rev., 141, 2224–2244, doi:10.1175/MWR-D-12-00168.1.

  • Wilson, J. W., , Y. Feng, , M. Chen, , and R. D. Roberts, 2010: Nowcasting challenges during the Beijing Olympics: Successes, failures, and implications for future nowcasting systems. Wea. Forecasting, 25, 1691–1714, doi:10.1175/2010WAF2222417.1.

  • Xu, Q., 2005: Representations of inverse covariances by differential operators. Adv. Atmos. Sci., 22, 181–198, doi:10.1007/BF02918508.

  • Yaremchuk, M., , and A. Sentchev, 2012: Multi-scale correlation functions associated with polynomials of the diffusion operator. Quart. J. Roy. Meteor. Soc., 138, 1948–1953, doi:10.1002/qj.1896.

  • Zieba, A., 2010: Effective number of observations and unbiased estimators of variance for autocorrelated data—An overview. Metrol. Meas. Syst., 17, 3–16, doi:10.2478/v10178-010-0001-0.
1 A simple rain water to reflectivity relation [Eq. (9) of Chung et al. 2009] was used.

2 As with a “pure” exponential decay, correlations described by an equally weighted sum of exponentials lead to sparse inverse matrices that can be used in data assimilation at low computational cost.

3 In Part I, simulated observations were not generated for correlation decay lengths greater than 10 km because of truncation errors. This problem was solved here by generating the correlated noise on a grid larger than the assimilation domain.

4 To allow the use of colors, a figure is used here in place of a table.