## 1. Introduction

This study investigates the impacts of misrepresenting error covariances when the background and observations are available at comparable densities. This situation is typical of storm-scale radar data assimilation, which provides the context and motivation for the experiments that are presented here. To avoid many of the complications associated with “true” radar data assimilation, our experiments were performed in a highly idealized context. As such, the results presented here are applicable not only to radar data assimilation but to any situation where dense observations may be available. As an introduction to this study, we provide a brief description of the current state of storm-scale data assimilation. The results of Jacques and Zawadzki (2014, hereafter Part I) are also discussed followed by a description of the experiments presented here.

In the past two decades, there has been a continuous research effort toward the numerical forecasting of thunderstorms [see a recent review paper by Sun et al. (2014)]. Weather radars, with spatial and temporal resolutions on the order of kilometers and minutes, have always assumed a central role in storm-scale forecasting. Unfortunately, radars provide limited or no information on most atmospheric state variables. This shortcoming is often compensated by the use of “background” estimates obtained from previously initiated forecasts. The process of “assimilating” radar observations aims to bring the background atmospheric state closer to the true atmospheric state.

Most commonly, the assimilation of radar data consists of an “analysis step” and a “forecast step.” During the analysis step, background and observation estimates are combined into one analysis. This can be done in the variational (e.g., Chung et al. 2009; Caumont et al. 2010; Rennie et al. 2011; Wang et al. 2013) or the Kalman filter (e.g., Aksoy et al. 2010; Snook et al. 2011; Chang et al. 2014) frameworks. During the forecast step, an atmospheric model is integrated forward in time using the result of the analysis step as initial conditions. In a process known as “cycling,” the analysis and forecast steps are repeated several times. The intention of performing such cycling is to gradually bring the background state closer to the true atmospheric state.
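The analysis, forecast, and cycling steps described above can be illustrated with a toy scalar example. The sketch below is not the system used in any of the cited studies: the inverse-variance analysis, the linear "model" with its `growth` and `var_model` parameters, and all numerical values are assumptions chosen only to show the structure of a cycle.

```python
import numpy as np

def analysis_step(xb, var_b, y, var_o):
    """Combine a scalar background and observation by inverse-variance weighting."""
    gain = var_b / (var_b + var_o)
    xa = xb + gain * (y - xb)
    var_a = (1.0 - gain) * var_b
    return xa, var_a

def forecast_step(xa, var_a, growth=1.0, var_model=0.1):
    """Hypothetical linear model: scale the state and add model-error variance."""
    return growth * xa, growth**2 * var_a + var_model

def cycle(xb, var_b, observations, var_o):
    """Alternate analysis and forecast steps, one cycle per observation."""
    history = []
    for y in observations:
        xa, var_a = analysis_step(xb, var_b, y, var_o)
        history.append((xa, var_a))
        xb, var_b = forecast_step(xa, var_a)  # analysis becomes the next background
    return history

rng = np.random.default_rng(0)
truth = 5.0                                        # assumed constant true state
obs = truth + rng.normal(0.0, 1.0, size=10)        # observations with unit error variance
history = cycle(xb=0.0, var_b=4.0, observations=obs, var_o=1.0)
```

In this toy setting, each analysis has a smaller error variance than either of its inputs, which is the intended effect of cycling.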

Unfortunately, the assimilation of radar data is not very effective. Despite advances in the methods used and increases in available computational power, convection-allowing prediction systems remain incapable of accurately forecasting thunderstorms a few hours in the future (Wilson et al. 2010). Even at larger scales, the beneficial influence of radar data assimilation is short lived and does not exceed approximately 6 hours (Berenguer et al. 2012; Stratman et al. 2013; Sun et al. 2014). These results can be explained in part by the low predictability of thunderstorms. The imperfections of atmospheric models and assimilation systems also play a role.

In this two-part study, we focus on the representation of error covariances at the analysis step of the data assimilation process. For linear systems, the correct representation of error covariances is required to obtain analyses that are minimum variance estimators of the “truth” (Tarantola 2005, p. 68). In practice, lack of knowledge and numerical constraints prohibit the perfect representation of these covariances. Therefore, there is potential for improving analyses through better representations of error covariances. This is currently an active area of research. Methods have been proposed to allow the efficient convolution of error fields with various correlation matrices (Oliver 1995; Gaspari and Cohn 1999; Purser et al. 2003) or their inverse (Oliver 1998; Xu 2005; Yaremchuk and Sentchev 2012). In the context of ensemble forecasting, correlations are commonly “localized” using methods varying in sophistication (see references in Sobash and Stensrud 2013).
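To make the role of error covariances concrete, the following sketch evaluates the textbook linear analysis on a small 1D grid and compares the expected analysis error variance when the gain is built from the true background covariance versus a diagonal one. The symbols `B`, `R`, `H`, and `K`, the grid size, the unit variances, and the correlation length are all assumptions made for illustration; they are not taken from the experiments described in this article.

```python
import numpy as np

def exponential_correlation(n, length):
    """Correlation matrix exp(-|i - j| / length) on a 1D grid of n points."""
    idx = np.arange(n)
    return np.exp(-np.abs(idx[:, None] - idx[None, :]) / length)

def gain(H, B, R):
    """Gain K = B H^T (H B H^T + R)^{-1} from the *assumed* (symmetric) B and R."""
    S = H @ B @ H.T + R
    return np.linalg.solve(S, H @ B).T

def analysis(xb, y, H, K):
    """Linear analysis x_a = x_b + K (y - H x_b)."""
    return xb + K @ (y - H @ xb)

def analysis_error_covariance(K, H, B_true, R_true):
    """Error covariance of the analysis for any gain K, assuming unbiased,
    mutually uncorrelated background and observation errors."""
    I_KH = np.eye(B_true.shape[0]) - K @ H
    return I_KH @ B_true @ I_KH.T + K @ R_true @ K.T

n = 50
H = np.eye(n)                                   # one observation per grid point
B_true = exponential_correlation(n, length=5.0) # correlated background errors
R_true = np.eye(n)                              # uncorrelated observation errors

K_optimal = gain(H, B_true, R_true)             # covariances correctly represented
K_diagonal = gain(H, np.eye(n), R_true)         # background correlations neglected

var_optimal = np.trace(analysis_error_covariance(K_optimal, H, B_true, R_true)) / n
var_diagonal = np.trace(analysis_error_covariance(K_diagonal, H, B_true, R_true)) / n
print(f"mean analysis error variance: optimal={var_optimal:.3f}, diagonal={var_diagonal:.3f}")
```

With equal unit variances, the diagonal gain reduces to a simple average of background and observations, so any further reduction in error variance comes entirely from representing the correlations.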

While theory says that least squares analyses will be obtained with perfect representation of error covariances, it is unclear how much “better” these analyses would be than those obtained with misrepresented covariances. The situation is particularly interesting in radar data assimilation because both the background and observation estimates are available at comparable spatial densities.

*U* component of the wind, was estimated on a two-dimensional (2D) domain. Analyses

With this setup, truly optimal analyses could be obtained. These optimal analyses were then compared with other “suboptimal” analyses obtained by purposely misrepresenting covariances in

The experiments that were performed in Part I are very simple. For the second part of this study we wished to confirm that the previous results could be replicated in a more realistic setup. For the experiments presented here, dense observations were only made available in a limited portion of the assimilation domain as is generally the case with “real” radar observations. Another difference with the previous experiments is that background estimates were taken from the output of a mesoscale atmospheric model. This change required that an ensemble of forecasts be generated. The generation of this forecast ensemble, the choice of the truth and the background estimates

A significant portion of this article is dedicated to the description of background errors in a convective environment. The treatment of errors is delicate as it depends on how the truth is considered. Lewis et al. (2006, p. 228) state that there are two schools of thought for considering *random variable*. This sets the assimilation problem in the Bayesian framework, where a refined “posterior” probability density is sought from the combination of a broader “prior” probability density with observations. Alternatively, *constant* to be recovered. This sets the assimilation problem in Fisher’s framework. In this case, the estimation of

In section 3, we document the heterogeneous nature of background errors. The functional shape of the error distribution is discussed along with its derived mean, variance, and correlation. For simplicity, these errors were considered in Fisher’s framework (with respect to the predefined truth

When a forecast ensemble is not available, spatial averages are sometimes used to estimate the variance and covariance of background errors. In section 4, we revisit background error statistics under the homogeneity assumption. This allowed us to document the impact of representing heterogeneous versus homogeneous errors in the matrix

Once the background errors have been thoroughly discussed, we proceed to the assimilation experiments. The methodology under which these experiments were performed is described in section 5. By using different error representations in the matrices

## 2. Generation of the forecast ensemble and choice of the truth

In Part I, the background estimates were simulated by adding zero-mean, correlated noise to a predefined truth. Here, both the background estimates and the truth were taken from an ensemble of convection-resolving forecasts. In this section, we discuss the generation of this forecast ensemble and the choice of

For our idealized assimilation experiments, we selected

To determine the magnitude of the ensemble spread to be used in our experiments, we took inspiration from radar data assimilation. In operational contexts, the quality of background estimates varies between two extremes. In the worst cases, the atmospheric conditions prevailing in the background do not allow the occurrence of convection that is compatible with radar observations. In the best cases, the convection occurring in background estimates is found in the same locations as in radar observations. Such background estimates have been demonstrated after the assimilation of radar data (e.g., Aksoy et al. 2010; Chang et al. 2014). However, even after the assimilation of radar data, it is currently not realistic to expect that individual convective cells occurring in the “real” atmosphere will be accurately resolved in background estimates.

We wished to perform our assimilation experiments using the best background estimates that would still be representative of those obtained in real data assimilation. To this end, we generated a forecast ensemble by applying slight perturbations to the low-level moisture in the initial conditions of each member of our forecast ensemble. The same initial conditions were used for all other state variables. Forecasts were generated using the Weather Research and Forecasting (WRF) Model (Skamarock et al. 2005) for a convective event that occurred near Montréal (Québec, Canada) on 17 July 2010. Forecasts were performed using two nested domains with horizontal resolutions of 3 and 1 km.

We chose to perturb the moisture field because of its strong influence on convection (Sun 2004). The perturbation consisted of three-dimensional fields of zero-mean correlated noise. This noise followed a Gaussian distribution whose correlations were described by a homogeneous and isotropic exponential decay. The rates of decay for the horizontal and vertical correlations were set to 180 and 3 km, respectively. Such decay rates were found to roughly approximate the average autocorrelation of moisture fields found in the larger nesting domain. The standard deviation of the perturbation noise varied in height and was set to 10% of the average standard deviation of moisture found on the different model levels. This is a relatively small perturbation compared to the uncertainty associated with moisture estimates (Fabry 2006).
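As an illustration of how such perturbations can be produced, the sketch below draws zero-mean Gaussian noise with exponentially decaying correlations on a very small 3D grid. Several choices here are assumptions rather than a description of the method actually used: the 180- and 3-km values are treated as e-folding lengths, the horizontal and vertical correlations are combined as a product, the noise is sampled through a Cholesky factor, and the grid spacing and per-level standard deviations (`level_std`) are hypothetical.

```python
import numpy as np

def exponential_correlation(coords_h, coords_z, L_h, L_z):
    """Separable correlation exp(-d_h / L_h) * exp(-d_z / L_z) between all point pairs."""
    dh = np.linalg.norm(coords_h[:, None, :] - coords_h[None, :, :], axis=-1)
    dz = np.abs(coords_z[:, None] - coords_z[None, :])
    return np.exp(-dh / L_h) * np.exp(-dz / L_z)

def sample_correlated_noise(corr, sigma, n_samples, rng):
    """Draw zero-mean Gaussian fields with correlation `corr` and std `sigma`."""
    # A tiny diagonal jitter guards the factorization against round-off.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(corr.shape[0]))
    white = rng.standard_normal((corr.shape[0], n_samples))
    return sigma[:, None] * (chol @ white)

# Hypothetical grid: 4 levels spaced 1 km apart, 8 x 8 points spaced 30 km apart.
nz, ny, nx = 4, 8, 8
z = np.arange(nz) * 1.0
y = np.arange(ny) * 30.0
x = np.arange(nx) * 30.0
zz, yy, xx = np.meshgrid(z, y, x, indexing="ij")
coords_h = np.stack([xx.ravel(), yy.ravel()], axis=1)
coords_z = zz.ravel()

corr = exponential_correlation(coords_h, coords_z, L_h=180.0, L_z=3.0)

# Noise std set to 10% of a hypothetical per-level moisture std.
level_std = np.array([2.0, 1.5, 1.0, 0.5])
sigma = np.repeat(0.10 * level_std, ny * nx)

rng = np.random.default_rng(1)
noise = sample_correlated_noise(corr, sigma, n_samples=2000, rng=rng)
perturbation = noise[:, 0].reshape(nz, ny, nx)   # one perturbation field
```

A dense Cholesky factorization is only practical for small grids such as this one; it is used here because it makes the link between the correlation matrix and the sampled noise explicit.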

Simulations were allowed to run for four hours, a sufficiently long period for the model to develop convection. A total of 1001 such forecasts were computed, of which 1 forecast was chosen as the truth

During the period of model integration, the perturbations to the moisture field propagated to all the other atmospheric fields. Figures 1i–vi are provided as a verification that the background ensemble had the desired property of diverging from *W* component of the wind in red, and water vapor mixing ratio in gray.

Figures 1i–iii depict the atmospheric conditions prevailing in the larger nesting domain. We can observe that the location of convection as well as large-scale features (most visible in the moisture field) are very similar for each of the forecasts displayed. By present-day forecasting standards, a background ensemble displaying such proximity with the true atmospheric state would be considered excellent. In Figs. 1iv–vi we zoom in on a 101 km × 101 km area where convection is found in most members of the forecast ensemble. This area is the domain over which we performed our assimilation experiments. In this smaller domain, we can observe that the convective features greatly differ between the different members of the ensemble.

Figures 1i–vi demonstrate that slight perturbations to the moisture field can have a strong influence on all the other atmospheric variables in the vicinity of convection. We chose to display liquid water, positive vertical velocity, and water vapor because most readers will likely be familiar with these variables. For our experiments, only the *U* component of the wind (not shown) was retained from each model simulation. This state variable was chosen because of its proximity to Doppler velocity. The reader should keep in mind that all aspects of the assimilation problem discussed in this article pertain to, and only to, the *U* component of the wind. From this point on, references to this state variable will be implicit.

In the next section, we provide a description of background errors throughout the assimilation domain. For some aspects of this description, we distinguish between the errors near and away from convection. It is convenient, then, to consider Fig. 2, which represents the percentage of ensemble members with reflectivity greater than 15 dB*Z*^{1} as a function of the location within the assimilation domain. This figure shows that convection, associated with intense precipitation, is mostly found along a band that extends from the bottom-left to the top-right corners of the assimilation domain. An arbitrary threshold of 10%, illustrated by a white line, was chosen to delimit the rainy and nonrainy portions of the assimilation domain. This white line will be plotted over some of the error statistics presented in the next sections. The positions of four points (labeled A, B, C, and D), two in the rainy area and two outside, are also shown. Details on the distribution and correlation of background errors will be given at these four locations.
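The rainy/nonrainy partition described above amounts to a simple ensemble frequency computation, sketched below. The function name, the array layout (members first), and the synthetic reflectivity values are assumptions made for illustration.

```python
import numpy as np

def rainy_mask(reflectivity_dbz, dbz_threshold=15.0, percent_threshold=10.0):
    """Percentage of members exceeding a reflectivity threshold at each grid point,
    and the mask of points where that percentage exceeds `percent_threshold`.

    reflectivity_dbz is assumed to have shape (n_members, ny, nx).
    """
    percent = 100.0 * np.mean(reflectivity_dbz > dbz_threshold, axis=0)
    return percent, percent > percent_threshold

# Synthetic example: 20 members on a 2 x 2 grid.
refl = np.zeros((20, 2, 2))
refl[:6, 0, 0] = 30.0   # rain in 6 of 20 members (30%)
refl[:1, 0, 1] = 30.0   # rain in 1 of 20 members (5%)
percent, rainy = rainy_mask(refl)
```

Here only the first grid point is classified as rainy, since the others fall below the 10% threshold.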

Fig. 2. Percentage of background members with reflectivity greater than 15 dB*Z*. The position of four points of interest, A, B, C, and D, is also depicted along with a white contour line delimiting the area where rain is found in more than 10% of background ensemble members.

Citation: Monthly Weather Review 143, 7; 10.1175/MWR-D-14-00243.1


## 3. Heterogeneous statistics for background errors

In this section, we discuss the distribution, bias, correlation, and standard deviation of background errors. These statistics will be used for the construction of different covariance matrices.

*n* = *nx* × *ny* = 10 201 elements. Rows of the matrix

In the following equations, square brackets are used to represent matrix indices. The index *i* refers to the position in the assimilation domain, while the index *m* refers to the “number” of a given member of the background ensemble. For example, the expression *i*th position in the domain and for the *m*th member of the ensemble. Sometimes, an asterisk will be used to designate all possible indices over a given row or column of *i*th position of the domain for every ensemble member.

### a. Distribution of background errors

One of the reasons for generating a background ensemble with 1000 members was to document the functional form of the distribution of background errors in the vicinity of convection. In Figs. 3i–iv we plot four histograms (in black) that represent the distribution of background errors in rainy (A and B) and nonrainy (C and D) positions of the assimilation domain. For each of the histograms displayed, we also plotted fits of a Gaussian (in orange) and a double-exponential (in blue) distribution. At position A, the distribution of errors is well represented by the double-exponential distribution. At position C, the distribution of errors is well represented by the Gaussian distribution. Such good fits, however, are not the norm, as can be seen at position B, where some skewness can be observed, and at position D, where the distribution has an irregular shape with a hint of bimodality.
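One way to compare the two candidate distributions at a given location is sketched below. The fitting procedure used for Fig. 3 is not described here, so the maximum-likelihood fits and the log-likelihood comparison are assumptions, and the samples are synthetic.

```python
import numpy as np

def fit_gaussian(errors):
    """Maximum-likelihood Gaussian fit: returns mean, std, and log-likelihood."""
    mu = errors.mean()
    sigma = errors.std()
    loglik = np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2)
                    - (errors - mu)**2 / (2.0 * sigma**2))
    return mu, sigma, loglik

def fit_double_exponential(errors):
    """Maximum-likelihood double-exponential (Laplace) fit:
    location is the median, scale is the mean absolute deviation about it."""
    loc = np.median(errors)
    scale = np.mean(np.abs(errors - loc))
    loglik = np.sum(-np.log(2.0 * scale) - np.abs(errors - loc) / scale)
    return loc, scale, loglik

rng = np.random.default_rng(2)
peaked_sample = rng.laplace(0.0, 1.0, size=5000)   # synthetic, heavy-tailed errors
normal_sample = rng.normal(0.0, 1.0, size=5000)    # synthetic, Gaussian errors

for name, sample in [("peaked", peaked_sample), ("normal", normal_sample)]:
    _, _, ll_gauss = fit_gaussian(sample)
    _, _, ll_dexp = fit_double_exponential(sample)
    best = "double exponential" if ll_dexp > ll_gauss else "Gaussian"
    print(f"{name}: better fit = {best}")
```

A higher log-likelihood indicates the better of the two fits; it does not by itself show that either distribution is adequate, which matters for cases such as positions B and D.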

Fig. 3. Histograms representing the probability density of background errors at locations A, B, C, and D (see Fig. 2). Fits of Gaussian distributions are plotted in orange and fits of double-exponential distributions are plotted in blue. Vertical black lines indicate the position of zero.


### b. “Biases” of background errors

When the truth is considered as a random variable, the expected value of background errors will be zero. That is,

The magnitude of the difference between

Fig. 4. Magnitude of the difference


### c. Correlation of background errors

*i*and

Figures 5i–iv depict the correlation between the errors found at the locations A, B, C, and D and the errors everywhere else in the assimilation domain. In the rainy area (A and B), correlations decay rapidly and are approximately isotropic. This is in agreement with similar correlations shown by Chung et al. (2013). Outside of convection (C and D), correlations are significant over much longer distances and are no longer isotropic.
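Correlation maps of this kind can be estimated directly from an ensemble of error fields, as in the sketch below. The array layout (positions along rows, members along columns) follows the index convention introduced earlier; centering the errors on their ensemble mean is an assumption, and the data are synthetic.

```python
import numpy as np

def correlation_with_point(errors, i):
    """Correlation between the errors at position i and the errors at every position.

    errors is assumed to have shape (n_positions, n_members).
    """
    anomalies = errors - errors.mean(axis=1, keepdims=True)
    std = anomalies.std(axis=1)
    cov = anomalies @ anomalies[i] / errors.shape[1]
    return cov / (std * std[i])

rng = np.random.default_rng(3)
# Synthetic errors: 30 positions, 500 members, mixed to create correlations.
errors = rng.standard_normal((30, 30)) @ rng.standard_normal((30, 500))
i0 = 7
corr = correlation_with_point(errors, i0)
```

Reshaping `corr` to the 2D layout of the domain would give a map comparable to one panel of Fig. 5.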

Fig. 5. (i)–(iv) Correlations between the errors at locations A, B, C, and D and errors in the rest of the assimilation domain. Correlations in the rainy area (A and B) decay rapidly and are nearly isotropic. Correlations outside of precipitation (C and D) decay over much longer distances and have a complex spatial structure. (v)–(viii) The same data presented in 2D graphs for better representation of the functional form of the correlation decay. Black lines represent the average correlation as a function of distance; gray shadings represent the variability due to anisotropy. Fits of Gaussian (in orange) and exponential (in blue) decay are also provided. The dashed lines indicate correlations of magnitude


To better represent the functional form of the correlation decays, the same data have been replotted in Figs. 5v–viii. As for the distribution of background errors, we fitted both a Gaussian and an exponential decay to the correlations being depicted. Gaussian decays (in orange) were fitted for minimum square errors at lags

In the Bayesian framework [with

*m*and

*i*and

### d. Standard deviation of background errors

In the Bayesian framework, the errors of different background members are not correlated and

The heterogeneous standard deviation of background errors (estimated with

Fig. 6. Standard deviation of background errors (with respect to


## 4. Homogeneous statistics for background errors

In operational contexts, spatial and temporal averages are often used to characterize the nature of background errors. In this section, we discuss what happens when background errors are assumed to be spatially homogeneous. As before, the distribution, bias, correlation, and standard deviation of background errors are discussed. This will allow us to investigate the impacts of representing homogeneous versus heterogeneous background errors in the matrix

### a. Average distribution of background errors

Figure 7 shows (in black) the average distribution of background errors with respect to

Fig. 7. Average distribution of background errors (in black). Gray shading indicates the member-to-member variability about the average error distribution. A double-exponential distribution centered at zero (in blue) provides a good fit for the average distribution of background errors.


### b. Average “bias” of background errors

In Fig. 7, the domain-averaged distribution of background errors with respect to

### c. Average correlation of background errors

*γ* represents the distance between any two errors, while the *α*s represent the decay rate of each exponential function. In Fig. 8,

^{2}

Fig. 8. Average correlation of background errors obtained by averaging correlations in


### d. Average standard deviation of background errors

The scalar

The average standard deviation of background errors ^{−1} when errors were estimated with respect to

As before, ^{−1} when background errors were estimated with respect to

This second method for estimating the average variance of errors is provided to show the “symmetry” that exists when estimating error variances from the matrix

## 5. Experimental setup

We now describe the methodology by which our assimilation experiments were conducted. For consistency with the results of Part I, only one atmospheric state variable, the *U* component of the wind, was estimated on a 2D domain of

### a. Simulated observations

*U* component of the wind with known error statistics. Fields of observation errors

*ζ* between observation errors at any two locations,

*i*and

An ensemble of 100 observation estimates was created for different ^{3} With

We let the standard deviation of observation errors, ^{−1}, which is the average standard deviation of background errors estimated with respect to *U* component of the wind prevailing in

Fig. 9. (i) One realization of the simulated observations obtained with ^{−1} and *U* component of the wind prevailing in


### b. Different analysis ensembles

A number of analysis ensembles were generated to quantify the influence of the following:

- representing homogeneous versus heterogeneous variance of background errors,
- neglecting the correlation of background and/or observation errors, and
- the presence of “biases” in Fisher’s framework.

All the different error matrices that were used in our assimilation experiments are described in Table 1. Here are some notes on the notation: whenever exact error statistics are not represented in *s* and *C* are used in place of their bold counterparts.

Table 1. The different error matrices used in our experiments.

Ensembles of analyses were generated for various combinations of the matrices ^{4} Figure 10 also provides the legend for interpreting the results presented in Figs. 11–13.

Fig. 10. Description of the different configurations under which analysis ensembles were generated. The “symbol” column on the right-hand side is also the legend by which Figs. 11–13 may be interpreted. Some notes on the symbols and color coding: hollow circles represent analyses obtained with homogeneous representation of background error variance; solid dots indicate heterogeneous variance. The color orange indicates analyses for which the correlations of errors were neglected altogether. The color purple indicates analyses for which correlations were represented in both


## 6. Results

One analysis ensemble with

In a manner similar to Part I, all the results are presented in the form of plots with the rate of decay of observation errors,

### a. Average correlation of analysis errors

It is difficult to synthesize information on the correlation of a large number of analysis ensembles. We chose to represent only the “amount” of correlation that can be inferred from *N*.

In Fig. 11,

Fig. 11. Effective number of uncorrelated pieces of information,


For

### b. Average standard deviation of analysis errors

Data assimilation strives to minimize the variance of errors between the analyses and the truth. In the present case, these errors are heterogeneous and can only be represented in 2D maps such as Fig. 6. To allow comparisons between the different analyses that were generated, only the average standard deviation of analysis errors, *N*.

Fig. 12. Average standard deviation of errors for background, observation, and analysis ensembles as a function of


For observation errors,

Figure 12i shows

Comparing analyses (a) to (b) and (c) to (d), we observe that representing heterogeneous versus homogeneous background error variance resulted in small reductions of

The impacts of representing correlations in both

Figure 12ii shows analyses (e), (f), (g), and (h) obtained using

The influence of these “biases” can also be more subtle. For example, consider analyses (c) and (d), obtained with the “biased” estimates

In operational data assimilation, it is common to represent the correlations of background errors while neglecting the correlation of observation errors. Analyses obtained under the configurations (i), (j), (k), and (l) (all shown in red) were performed to investigate this situation. To avoid cluttering Fig. 12i, the errors of analyses (i) and (j) were plotted in Fig. 12iii. The errors

So far, the correlations of background errors have either been represented using the diagonal matrix

In Fig. 13i, the errors of analyses (m) appear as purple solid dots. For comparison with previous results, the errors of analyses (d) have also been replotted as a purple dashed line. Analyses (m) differ from analyses (d) only in the use of localized correlations for the representation of background errors. The errors of analyses (m), obtained with localized correlations, are systematically larger than those of analyses (d), obtained with homogeneous correlations. Within the range

Fig. 13. As in Fig. 12, but for analyses obtained under the configurations (m), (n), (o), and (p). Under those configurations, the localized matrix


Figure 13ii shows the errors of analyses (f), (h), and (n). With the exception that the estimates

Figure 13iii shows the errors of analyses (o) obtained with localized background error correlations and neglected observation error correlations. Within the range

## 7. Discussion

In this study, idealized assimilation experiments were conducted to quantify the consequences of misrepresenting the errors of background and observation estimates. These errors had to be interpreted differently depending on how the truth was considered.

In the Bayesian framework, the truth is considered as a random variable and errors actually refer to the dispersion of probability densities around their respective centers. Given an ensemble of background estimates, one can characterize the matrix

In Fisher’s framework, the truth is considered as a constant and errors refer to the difference with this constant. In our experiments, the truth was chosen from the same forecast ensemble as the background estimates. It is, therefore, not possible for the truth to coincide with the ensemble average and background errors are “biased.” From the point of view of error estimation, “biased” estimates are indistinguishable from estimates whose errors are correlated to one another. These correlations demand that the variance of errors be increased.
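The link between these "biases" and an increased error variance can be checked numerically. The sketch below is only an illustration, not the code used in the study: it assumes a hypothetical scalar ensemble of 1000 Gaussian draws, picks one member as the truth, and verifies that the mean squared difference between the remaining members and that truth equals the ensemble variance plus the squared "bias." The names `error_stats`, `background`, and `truth` are illustrative.

```python
import random
import statistics


def error_stats(background, truth):
    """Return (bias, variance about the ensemble mean, mean squared error about the truth)."""
    mean = statistics.fmean(background)
    bias = mean - truth
    var = statistics.fmean((x - mean) ** 2 for x in background)
    mse = statistics.fmean((x - truth) ** 2 for x in background)
    return bias, var, mse


rng = random.Random(0)
# Hypothetical scalar ensemble; the mean and spread are arbitrary choices.
ensemble = [rng.gauss(10.0, 2.0) for _ in range(1000)]

# Fisher's framework: the truth is a fixed member of the same ensemble,
# and the remaining members serve as background estimates.
truth = ensemble[0]
background = ensemble[1:]

bias, var, mse = error_stats(background, truth)
print(f"bias = {bias:.3f}, variance = {var:.3f}, mse = {mse:.3f}")
print(f"variance + bias^2 = {var + bias ** 2:.3f}")
```

Because the truth almost never coincides with the ensemble mean, the error measured against it exceeds the ensemble variance by the squared "bias."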

One could, correctly, say that these “biases” are simply the consequence of considering the assimilation problem in a framework that is not compatible with the available data. Why choose Fisher’s framework then?

The main advantage of considering the assimilation problem in Fisher’s framework is its simplicity. In our experimental setup, a truth is chosen from the start. This truth is used for the construction of observations and the verification of analyses. After minimizing the cost function, we ask the question: how close are the analyses to the truth? The difference with

If we had considered our experiments in the Bayesian framework, we would have had to answer the question: how plausible is it that

It can also be argued that considering the assimilation problem in Fisher’s framework encourages a proactive attitude toward improving the quality of analyses. In this framework, the zero-mean error assumption (see the appendix) cannot be satisfied for background estimates. This leads one to wonder what would happen if this assumption were satisfied. In section 6, we presented the results of experiments in which the “biases” of background errors had been corrected prior to their assimilation. This correction was associated with significant improvements in the quality of analyses. In an operational context, performing such a correction is far from trivial. However, the possibility exists and may be investigated.

Conversely, in the Bayesian framework, it is known that analyses with minimum variance will be obtained on the condition of accurately representing errors in the matrices

Some explanations are now provided as to why representing heterogeneous correlations in the localized matrix

In Fig. 5, it is shown that the short-lag correlations between background errors are well represented by a Gaussian decay, while the long-lag correlations are better represented by an exponential decay. We argued that the model diffusion was probably responsible for the short-lag Gaussian correlations and that the atmospheric structure was responsible for the long-lag exponential correlations. The localization process forces the spatial correlations to zero at lags equal to or greater than

In multiple trial experiments, we found that increasing the localization parameter *L* (up to 20 km) did not result in better analyses than those obtained with *L* leads to a better representation of correlations, it also increases the condition number of the localized correlation matrix. Our hypothesis is that for
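The trade-off between a wider localization and a larger condition number can be illustrated with a short sketch. It rests on several assumptions that are not taken from the study: the taper is the fifth-order piecewise rational function of Gaspari and Cohn (1999), the correlation matrix is built from the taper alone (no ensemble correlations), and the grid is one-dimensional and periodic with unit spacing, so the matrix is circulant and its eigenvalues are cosine sums of its first row. The half-width `c` is a stand-in for the localization parameter and is expressed in grid points.

```python
import math


def gaspari_cohn(distance, c):
    """Fifth-order piecewise rational taper; exactly zero for distance >= 2 * c."""
    r = abs(distance) / c
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r < 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3
                + (5.0 / 3.0) * r**2 - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0


def condition_number(c, n=64):
    """Condition number of the n x n circulant matrix built from the taper.

    For a symmetric circulant matrix, eigenvalue k is the sum over j of
    first_row[j] * cos(2 * pi * j * k / n).
    """
    first_row = [gaspari_cohn(min(j, n - j), c) for j in range(n)]
    eigenvalues = [
        sum(first_row[j] * math.cos(2.0 * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
    return max(eigenvalues) / min(eigenvalues)


for c in (1.0, 2.0, 4.0):
    print(f"c = {c:.0f} grid points: condition number = {condition_number(c):.3g}")
```

In this simplified setting the condition number grows quickly as the taper widens, which is the kind of behavior that can make the inversion of a broadly localized matrix numerically delicate.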

## 8. Conclusions

In this two-part study, we documented the impacts of error misrepresentations on the quality of analyses obtained by minimizing a simple cost function. In Part I (Jacques and Zawadzki 2014), idealized experiments were performed from simulated background and observation estimates made available everywhere in the assimilation domain. Under such ideal conditions, it was shown that representing correlations could only improve the quality of analyses when the respective correlations of background and observation errors were significantly different. Also, it was generally preferable to neglect correlations altogether than to represent only the correlation of background errors.
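A two-point toy problem shows why these Part I results are plausible. The sketch below is an illustration under stated assumptions rather than a reproduction of the experiments: it assumes a quadratic cost function of the form J(x) = (x − x_b)ᵀB⁻¹(x − x_b) + (y − x)ᵀR⁻¹(y − x) with observations at both grid points, unbiased and mutually uncorrelated background and observation errors, unit variances, and a correlation of 0.8 for both error types. All function names are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def covariance(rho, variance=1.0):
    """2 x 2 covariance matrix with equal variances and correlation rho."""
    return [[variance, rho * variance], [rho * variance, variance]]


def gain(b_assumed, r_assumed):
    """Weights K = B (B + R)^-1 that minimize the cost function for the assumed B and R."""
    return matmul(b_assumed, inverse(add(b_assumed, r_assumed)))


def total_error_variance(k, b_true, r_true):
    """Trace of (I - K) B (I - K)^T + K R K^T, the actual analysis error covariance."""
    i_minus_k = [[(1.0 if i == j else 0.0) - k[i][j] for j in range(2)] for i in range(2)]
    a = add(matmul(matmul(i_minus_k, b_true), transpose(i_minus_k)),
            matmul(matmul(k, r_true), transpose(k)))
    return a[0][0] + a[1][1]


b_true = covariance(0.8)    # assumed true background error covariance
r_true = covariance(0.8)    # assumed true observation error covariance
diagonal = covariance(0.0)  # correlations neglected

both = total_error_variance(gain(b_true, r_true), b_true, r_true)
neither = total_error_variance(gain(diagonal, diagonal), b_true, r_true)
background_only = total_error_variance(gain(b_true, diagonal), b_true, r_true)
print(f"both represented: {both:.4f}")
print(f"both neglected:   {neither:.4f}")
print(f"background only:  {background_only:.4f}")
```

When the two correlations are identical, the optimal weights reduce to one half of the identity matrix, so neglecting both correlations costs nothing, whereas representing only the background correlations gives weights that are no longer optimal.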

In the second part of this study, we verified that the previous results could be replicated in more realistic assimilation experiments. Again, analyses were obtained by minimizing a simple cost function. This time, however, the background estimates were taken from the output of a large ensemble (1000 members) of model forecasts in a convective situation. As before, simulated observations were used, but they were made available only in precipitation.

One contribution of this study is a detailed depiction of background errors in a convective environment. These errors are described both in the Bayesian framework (where the “truth” is considered as an unknown random variable) and in Fisher’s framework (where the truth is considered as an unknown constant). In most locations of the assimilation domain, the functional form for the distribution of background errors was shown to differ from the Gaussian that is usually assumed. In the Bayesian framework, the least squares criterion will be achieved irrespective of the functional form of the probability densities under consideration. However, after mentioning this fact, Tarantola (2005, p. 68) warns the reader that the least squares criterion may not be a good choice when probability densities are not Gaussian.

For the idealized experiments presented here, analyses were computed by minimizing the same cost function regardless of whether the truth is considered in the Bayesian or Fisher’s framework. However, the assumptions on the nature of errors and their representation in

The assimilation experiments that are presented here were designed to quantify the influence of three different misrepresentations of errors in the matrices

First, relatively small differences were observed between analyses obtained with homogeneous versus heterogeneous error variance in

Second, analyses obtained with and without the representation of spatial correlations in

Finally, we compared analyses obtained with and without correction of the “biases,” which occur as a consequence of considering the truth in Fisher’s framework. Significant improvements to the quality of analyses were obtained by correcting these “biases.” This result is not entirely surprising. One expects the quality of analyses to improve when background estimates are closer to the truth. In a more realistic framework, performing such bias correction will be difficult. However, methods for doing so could be investigated.

## Acknowledgments

The authors are grateful for a careful review of an early version of this article by Dr. Chris Snyder. His comments had a significant influence on the final form presented here. We also acknowledge the input of Frédéric Fabry and Aitor Atencia on the nature of biases and the computation of errors in the presence of correlation. Finally, thanks to Konstantinos Menelaou and Jonathan Vogel for proofreading this article.

## APPENDIX

### The Cost Function in Fisher’s Framework

*E*() = 0, and and are uncorrelated.

## REFERENCES

Aksoy, A., D. C. Dowell, and C. Snyder, 2010: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part II: Short-range ensemble forecasts. *Mon. Wea. Rev.*, **138**, 1273–1292, doi:10.1175/2009MWR3086.1.

Bayley, G. V., and J. M. Hammersley, 1946: The “effective” number of independent observations in an autocorrelated time series. *J. Roy. Stat. Soc.*, **8** (Suppl.), 184–197, doi:10.2307/2983560.

Berenguer, M., M. Surcel, I. Zawadzki, M. Xue, and F. Kong, 2012: The diurnal cycle of precipitation from continental radar mosaics and numerical weather prediction models. Part II: Intercomparison among numerical models and with nowcasting. *Mon. Wea. Rev.*, **140**, 2689–2705, doi:10.1175/MWR-D-11-00181.1.

Caumont, O., V. Ducrocq, É. Wattrelot, G. Jaubert, and S. Pradier-Vabre, 2010: 1D+3DVar assimilation of radar reflectivity data: A proof of concept. *Tellus*, **62A**, 173–187, doi:10.3402/tellusa.v62i2.15678.

Chang, W., K.-S. Chung, L. Fillion, and S.-J. Baek, 2014: Radar data assimilation in the Canadian high-resolution ensemble Kalman filter system: Performance and verification with real summer cases. *Mon. Wea. Rev.*, **142**, 2118–2138, doi:10.1175/MWR-D-13-00291.1.

Chung, K.-S., I. Zawadzki, M. K. Yau, and L. Fillion, 2009: Short-term forecasting of a midlatitude convective storm by the assimilation of single–Doppler radar observations. *Mon. Wea. Rev.*, **137**, 4115–4135, doi:10.1175/2009MWR2731.1.

Chung, K.-S., W. Chang, L. Fillion, and M. Tanguay, 2013: Examination of situation-dependent background error covariances at the convective scale in the context of the ensemble Kalman filter. *Mon. Wea. Rev.*, **141**, 3369–3387, doi:10.1175/MWR-D-12-00353.1.

Fabry, F., 2006: The spatial variability of moisture in the boundary layer and its effect on convection initiation: Project-long characterization. *Mon. Wea. Rev.*, **134**, 79–91, doi:10.1175/MWR3055.1.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757, doi:10.1002/qj.49712555417.

Jacques, D., and I. Zawadzki, 2014: The impacts of representing the correlation of errors in radar data assimilation. Part I: Experiments with simulated background and observation estimates. *Mon. Wea. Rev.*, **142**, 3998–4016, doi:10.1175/MWR-D-14-00104.1.

Lewis, J. M., S. Lakshmivarahan, and S. Dhall, 2006: *Dynamic Data Assimilation: A Least Squares Approach*. Cambridge University Press, 680 pp.

Murphy, J. M., 1988: The impact of ensemble forecasts on predictability. *Quart. J. Roy. Meteor. Soc.*, **114**, 463–493, doi:10.1002/qj.49711448010.

Oliver, D., 1995: Moving averages for Gaussian simulation in two and three dimensions. *Math. Geol.*, **27**, 939–960, doi:10.1007/BF02091660.

Oliver, D., 1998: Calculation of the inverse of the covariance. *Math. Geol.*, **30**, 911–933, doi:10.1023/A:1021734811230.

Purser, R. J., W.-S. Wu, D. F. Parrish, and N. M. Roberts, 2003: Numerical aspects of the application of recursive filters to variational statistical analysis. Part I: Spatially homogeneous and isotropic Gaussian covariances. *Mon. Wea. Rev.*, **131**, 1524, doi:10.1175/1520-0493(2003)131<1524:NAOTAO>2.0.CO;2.

Rennie, S. J., S. L. Dance, A. J. Illingworth, S. P. Ballard, and D. Simonin, 2011: 3D-Var assimilation of insect-derived Doppler radar radial winds in convective cases using a high-resolution model. *Mon. Wea. Rev.*, **139**, 1148–1163, doi:10.1175/2010MWR3482.1.

Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the advanced research WRF version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp. [Available online at http://www.mmm.ucar.edu/wrf/users/docs/arw_v2.pdf.]

Snook, N., M. Xue, and Y. Jung, 2011: Analysis of a tornadic mesoscale convective vortex based on ensemble Kalman filter assimilation of CASA X-band and WSR-88D radar data. *Mon. Wea. Rev.*, **139**, 3446–3468, doi:10.1175/MWR-D-10-05053.1.

Sobash, R. A., and D. J. Stensrud, 2013: The impact of covariance localization for radar data on EnKF analyses of a developing MCS: Observing system simulation experiments. *Mon. Wea. Rev.*, **141**, 3691–3709, doi:10.1175/MWR-D-12-00203.1.

Stratman, D. R., M. C. Coniglio, S. E. Koch, and M. Xue, 2013: Use of multiple verification methods to evaluate forecasts of convection from hot- and cold-start convection-allowing models. *Wea. Forecasting*, **28**, 119–138, doi:10.1175/WAF-D-12-00022.1.

Sun, J., 2004: Numerical prediction of thunderstorms: Fourteen years later. *Atmospheric Turbulence and Mesoscale Meteorology*, E. Fedorovich, R. Rotuno, and B. Stevens, Eds., Cambridge University Press, 139–164.

Sun, J., and Coauthors, 2014: Use of NWP for nowcasting convective precipitation: Recent progress and challenges. *Bull. Amer. Meteor. Soc.*, **95**, 409–426, doi:10.1175/BAMS-D-11-00263.1.

Talagrand, O., and R. Vautard, 1999: Evaluation of probabilistic prediction systems. *Proc. Workshop on Predictability*, Reading, United Kingdom, European Centre for Medium-Range Weather Forecasts, 1–25.

Tarantola, A., 2005: *Inverse Problem Theory and Methods for Model Parameter Estimation*. SIAM, 358 pp.

Wang, H., J. Sun, X. Zhang, X.-Y. Huang, and T. Auligné, 2013: Radar data assimilation with WRF 4D-Var. Part I: System development and preliminary testing. *Mon. Wea. Rev.*, **141**, 2224–2244, doi:10.1175/MWR-D-12-00168.1.

Wilson, J. W., Y. Feng, M. Chen, and R. D. Roberts, 2010: Nowcasting challenges during the Beijing Olympics: Successes, failures, and implications for future nowcasting systems. *Wea. Forecasting*, **25**, 1691–1714, doi:10.1175/2010WAF2222417.1.

Xu, Q., 2005: Representations of inverse covariances by differential operators. *Adv. Atmos. Sci.*, **22**, 181–198, doi:10.1007/BF02918508.

Yaremchuk, M., and A. Sentchev, 2012: Multi-scale correlation functions associated with polynomials of the diffusion operator. *Quart. J. Roy. Meteor. Soc.*, **138**, 1948–1953, doi:10.1002/qj.1896.

Zieba, A., 2010: Effective number of observations and unbiased estimators of variance for autocorrelated data—An overview. *Metrol. Measure. Syst.*, **17**, 3–16, doi:10.2478/v10178-010-0001-0.

^{1} A simple rain water to reflectivity relation [Eq. (9) of Chung et al. 2009] was used.

^{2} As for “pure” exponential decay, correlations described by an equally weighted sum of exponentials lead to sparse inverse matrices that can be used in data assimilation at low computational costs.

^{3} In Part I, simulated observations were not generated for

^{4} To allow the use of colors, a figure is used here in place of a table.