1. Introduction
Because of their high spatial and temporal resolution, radar observations have great potential for improving atmospheric analyses and the ensuing forecasts. Despite 30 years (Lilly 1990; Sun et al. 1991) of ongoing research, our skill in forecasting mesoscale convection has remained modest. Over continental scales, radar data assimilation was shown to improve forecasts for periods not exceeding 6–8 h (Berenguer et al. 2012; Stratman et al. 2013). Over regional scales, a forecasting system intercomparison by Wilson et al. (2010) found no individual system capable of accurately forecasting convection a few hours into the future. In a context where the resolution of operational models is regularly increased, making the best use of radar observations is as pertinent as ever.
The most common framework for assimilating radar data is to combine a first guess from a previously initiated model forecast (the background) with radar observations in order to obtain an analysis. An “optimal” analysis, the one with minimum error variance, can be found by minimizing a cost function in which the contributions of the background and observation estimates are weighted by the inverse of their variances and covariances.
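The cost function itself [Eq. (3) of the article] is not reproduced in this excerpt; as a point of reference, a generic sketch of the standard variational form is given below, with B and R denoting the background and observation error covariance matrices and H the observation operator (in this study, observations are direct, so H is the identity):

```latex
J(\mathbf{x}) = \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
             + \tfrac{1}{2}\,(\mathbf{y}-H\mathbf{x})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{y}-H\mathbf{x})
```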
Because of the lack of information and limited computational resources, the covariance (or, equivalently, the correlation) of background and observation errors can only be represented in a simplified form. A nonexhaustive list of methods for doing so would include the recursive filter for representing Gaussian correlations (Purser et al. 2003) along with various expressions representing convolutions with different correlation matrices (Oliver 1995; Gaspari and Cohn 1999) or their inverse (Oliver 1998; Xu 2005; Yaremchuk and Sentchev 2012).
In convective situations, radar observations are generally available over significant portions of the analysis domain at a spatial resolution comparable to that of convection-resolving models. While the instrumental errors of radar observations are not correlated (Keeler and Ellis 2000), the representativeness errors associated with the integration of the precipitating medium within a radar volume might be. Because radar integration is reflectivity weighted, gradients in the intensity of precipitation will be a source of errors (Zawadzki 1973). Because precipitation possesses a scaling structure (Fabry 1996), these errors may well be correlated.
Nevertheless, it is not uncommon to neglect the correlations of observation errors in radar data assimilation (see, e.g., Chung et al. 2009). To prevent the negative impact associated with this misrepresentation, it has been suggested to consider only observations sufficiently far apart so that they effectively become uncorrelated (Liu and Rabier 2002, 2003).
In this study, we investigate the impact of representing and misrepresenting the correlation of errors on the quality of analyses. Data thinning and the purposeful misrepresentation of variance to improve the quality of analyses are also examined.
Conceptually, the experiments presented here are similar to those of Liu and Rabier (2002, hereafter LR02). Several assimilation experiments are performed in an idealized context where the correlation of background and observation errors is prescribed. Both the background and observations are made available everywhere in the assimilation domain.
The problem under investigation, which considerably differs from the one examined by LR02, is specified in section 2, followed by theoretical considerations in section 3. Expressions for analysis errors are given for the cases where correlations are either entirely neglected or perfectly represented. Special attention is also given to the precision at which the standard deviation may be estimated from a sample of correlated errors. Analyses obtained with different combinations of background and observation errors are then examined in section 4.
In section 5, we compare the standard deviation of errors for analyses obtained with perfect representation of correlations to those obtained by neglecting correlations altogether. The computational costs of analyses are discussed in section 5a, followed by a short discussion, in section 5b, on cases where the correlation of errors may be neglected with little influence on the quality of analyses. We then consider analyses in which the correlation of only one term is neglected. This situation is examined first without (section 5c) and then with (section 5d) data thinning.
In section 6, a few words are said about the correlation of analysis errors. Results are discussed in section 7, followed by conclusions in section 8.
2. Problem setup
In this section, we describe the framework within which we studied the impact of representing and misrepresenting the correlation of errors in data assimilation.







The assimilation context we just described is much simpler than any realistic radar data assimilation. Only one variable is retrieved, we suppose that direct observations of xt can be made, errors are unbiased, and temporal dependences (such as cycling and the influence of model equations) are not taken into account.
Even for this simple situation, it is very difficult to generalize the impact of representing correlations since assimilation domains usually contain thousands of data points, leading to large and possibly complex covariance matrices.
To reduce the dimensionality of the problem, we investigate the case where the errors ϵb and ϵo are homogeneous, isotropic, and spatially correlated following an exponential decay.
The standard deviations of background and observation errors are then given by the scalars σb = σ(ϵb) and σo = σ(ϵo).

The correlations of background and observation errors are then characterized by the scalar decay rates αb and αo, respectively.
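To make this setup concrete, here is a minimal sketch (ours, not the article's code) of how such homogeneous, isotropic, exponentially decaying error covariance matrices can be built on a small regular grid; the function name and the 20 × 20 domain are illustrative choices only:

```python
import numpy as np

def exponential_covariance(nx, ny, dx, sigma, alpha):
    """Covariance matrix sigma^2 * exp(-d/alpha) for a regular nx-by-ny grid
    with spacing dx (homogeneous, isotropic, exponentially decaying errors)."""
    x, y = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dx, indexing="ij")
    pts = np.column_stack([x.ravel(), y.ravel()])                    # n x 2 coordinates
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)   # pairwise distances
    return sigma**2 * np.exp(-d / alpha)

# e.g., a small 20 x 20 (n = 400) domain at 1-km resolution
B = exponential_covariance(20, 20, 1.0, sigma=2.5, alpha=10.0)   # background errors
R = exponential_covariance(20, 20, 1.0, sigma=2.5, alpha=5.0)    # observation errors
```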
3. Theoretical background
Given the previous simplifications, only four parameters (σb, αb, σo, and αo) are required for a complete description of background, observation, and analysis errors on any assimilation domain. In this section, we give analytical expressions for the errors of analyses obtained with and without the representation of correlations in the error covariance matrices.
Before these expressions are given, we briefly discuss the estimation of error statistics from a sample of correlated errors.




The estimate s(ϵ) may only approximate the true standard deviation of errors σ(ϵ) to a certain precision. For different realizations of ϵ, s(ϵ) will vary about σ(ϵ). Throughout this article, we refer to this variability as the “sampling noise,” represented as σ(s(ϵ)) or σ(s) for short, where parentheses represent “function of.”







While the errors in ϵb and ϵo are spatially correlated, the errors between different realizations of ϵb and ϵo are not. In Eq. (12), s(s) can then be estimated with the “traditional” formula for estimating the standard deviation.
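As an illustration of why correlated errors inflate this sampling noise, here is a small Monte Carlo sketch (ours; Eqs. (11)–(13) are not reproduced in this excerpt) comparing the spread of the estimated standard deviation for uncorrelated and exponentially correlated errors on a 1D domain:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_std_spread(C, n_realizations=500):
    """Average and spread (sampling noise) of the estimated standard deviation
    s(eps) over many realizations of errors drawn from N(0, C)."""
    L = np.linalg.cholesky(C)
    n = C.shape[0]
    s = [np.std(L @ rng.standard_normal(n)) for _ in range(n_realizations)]
    return np.mean(s), np.std(s)

# uncorrelated vs exponentially correlated errors on the same 1D domain
n, dx, sigma = 400, 1.0, 2.5
d = np.abs(np.subtract.outer(np.arange(n) * dx, np.arange(n) * dx))
C_uncorr = sigma**2 * np.eye(n)
C_corr = sigma**2 * np.exp(-d / 10.0)           # decay rate of 10 grid units
print(sample_std_spread(C_uncorr))              # small sampling noise
print(sample_std_spread(C_corr))                # noticeably larger sampling noise
```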
To assess the impact of representing the correlation of errors, we compare analyses obtained by neglecting the correlation of errors (using diagonal error covariance matrices) to analyses obtained with a perfect representation of those correlations.
a. Analyses obtained by neglecting the correlation of errors
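The expressions of this subsection are not reproduced in this excerpt. For direct observations and diagonal matrices, the analysis reduces to the familiar pointwise weighted average; a sketch of the standard result (consistent with the σ(ϵavg) = 1.77 m s−1 quoted in section 5 for σb = σo = 2.5 m s−1):

```latex
\mathbf{x}_{\mathrm{avg}} = \frac{\sigma_o^{2}\,\mathbf{x}_b + \sigma_b^{2}\,\mathbf{y}}{\sigma_b^{2}+\sigma_o^{2}},
\qquad
\sigma^{2}(\epsilon_{\mathrm{avg}}) = \frac{\sigma_b^{2}\,\sigma_o^{2}}{\sigma_b^{2}+\sigma_o^{2}}
```

Note that this pointwise error variance does not depend on the spatial correlation of the errors, which is consistent with σ(ϵavg) remaining constant with αb in Fig. 1.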



b. Analyses obtained by perfectly representing the correlation of errors
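Again, the corresponding expressions are not reproduced in this excerpt. For direct observations and full covariance matrices B and R, the minimizer of the cost function is the standard minimum error variance estimate; a sketch:

```latex
\mathbf{x}_{\mathrm{optim}} = \mathbf{x}_b + \mathbf{B}\,(\mathbf{B}+\mathbf{R})^{-1}(\mathbf{y}-\mathbf{x}_b),
\qquad
\mathbf{A} = \left(\mathbf{B}^{-1}+\mathbf{R}^{-1}\right)^{-1}
```

with A the covariance of analysis errors.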








4. Methodology
In the previous sections, we have shown that the standard deviation, correlation, and sampling noise of analysis errors could be predicted from the errors of xb and y (determined by αb, σb, αo, and σo), and the domain size (from which we get n).
The principal objective of this article is to assess the impact of representing and misrepresenting correlations in idealized assimilation experiments. To do so, we need to show the dependence of analysis errors on the five aforementioned parameters; this is something that cannot easily be represented in one or even a few plots.
Being mostly interested in the impacts of representing the correlation of errors, we chose to let σb = σo = 2.5 m s−1, a plausible value for the errors of horizontal winds in xb as well as Doppler velocity. By setting an equal value for the standard deviation of errors for the two sources of information, any difference in analysis errors will be attributable to correlation.
The analogy with radar data assimilation is useful as a reference point for determining the context in which experiments are performed and the general nature of errors to be tested. However, the experiments conducted here may be representative of any situation where two estimates with correlated errors are combined into an analysis.
The assimilation domain we chose for our experiments consisted of a 2D grid of 100 × 100 points representing a square domain with a side of 100 km at a resolution of 1 km. This fixes n = 10 000.
We wish to describe analysis errors for a wide variety of combinations of αb and αo. We thus fixed the decay rate for the correlation of observation errors to αo = 5 km, a value that involves nonnegligible spatial correlations and yet is comparable to the typical length scales of wind field features in convection. The variable αb was allowed to vary between 0 and 100 km. Note that because the errors of xb and y were of the same nature, the same results would have been achieved by fixing αb and allowing αo to vary.
a. Verification of the assimilation system
Having derived theoretical expressions for analysis errors in the above, the impact of representing correlations could have been assessed without the actual computation of analyses. Analyses were nevertheless computed for verification purposes. By comparing the errors estimated from analyses to those expected from theory, we could verify that our assimilation system and verification procedure were free of errors. Having validated our assimilation system, we could then be confident in the results of experiments (shown in sections 5c and 5d) for which only numerical computations were available.
Here is the four-step procedure by which our assimilation system was tested:
1) For given values of αb and σb, generate exponentially correlated noise and add it to a predefined truth xt to obtain xb. Using the same procedure, obtain simulated observations y from αo and σo. Over 1D domains, exponentially correlated noise can be obtained through a first-order autoregressive process (Ward 2002). In 2D and 3D, it can be obtained by convolving fields of Gaussian white noise with the kernels provided in Tables 1 and 2 of Oliver (1995). The choice of xt is unimportant since this “truth” is removed from x to obtain analysis errors [see Eqs. (22)–(23)]. A minimal sketch of this noise generation is given after this list.
2) Combine xb and y to obtain an analysis x. Two different analyses were computed: xavg was obtained by use of Eq. (14), and xoptim was obtained by minimizing the cost function [Eq. (3)] using the conjugate-gradient algorithm. Knowledge of the inverse error covariance matrices is required for the computation of the cost function and its gradient. Several authors have discussed the sparse nature of inverse exponential matrices. Analytical formulations for the inverse of exponential correlation matrices are provided by Tarantola (2005) in 1D and 3D. Approximations for inverse exponential matrices can also be obtained by use of Taylor expansions in spectral space, as demonstrated by Oliver (1998) and Xu (2005). These expressions, however, require special care near boundaries. For the experiments presented here, on a relatively small 2D domain with background and observation estimates available everywhere, direct numerical inversion was the simplest way to obtain the required inverses.
3) Estimate the standard deviation and correlation of analysis errors from the difference between each analysis and xt.
4) Repeat steps 1 to 3 N times and obtain the average estimated standard deviations of ϵavg and ϵoptim using Eq. (13). As N increases, these averages should converge to σavg and σoptim, respectively. The magnitude of the sampling noise s(s(ϵavg)) and s(s(ϵoptim)) can also be estimated using Eq. (12). As N increases, s(s(ϵavg)) should converge to σ(s(ϵavg)) and s(s(ϵoptim)) to σ(s(ϵoptim)).
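A minimal sketch of step 1 (ours, not the article's implementation): instead of convolving white noise with the Oliver (1995) kernels, the exact exponential covariance of a small 2D domain is factorized with a Cholesky decomposition, which also produces exponentially correlated noise; the 30 × 30 domain and function names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def correlated_noise_2d(nx, ny, dx, sigma, alpha):
    """One field of homogeneous, isotropic, exponentially correlated noise,
    drawn by factorizing the exact covariance (practical for small domains)."""
    x, y = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dx, indexing="ij")
    pts = np.column_stack([x.ravel(), y.ravel()])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    C = sigma**2 * np.exp(-d / alpha) + 1e-10 * np.eye(nx * ny)  # small jitter for stability
    L = np.linalg.cholesky(C)
    return (L @ rng.standard_normal(nx * ny)).reshape(nx, ny)

# step 1 of the verification procedure: truth + correlated noise
x_t = np.zeros((30, 30))                                   # arbitrary "truth"
x_b = x_t + correlated_noise_2d(30, 30, 1.0, 2.5, 10.0)    # background, alpha_b = 10 km
y   = x_t + correlated_noise_2d(30, 30, 1.0, 2.5, 5.0)     # observations, alpha_o = 5 km
```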
5. Impact of representing correlations on the standard deviation of analysis errors
In Fig. 1, we show the standard deviation of background, observation, and analysis errors as a function of αb. The errors expected from theory are displayed by use of solid lines while the errors obtained from numerical estimations are indicated by use of dots and color shadings.

Fig. 1. Standard deviation of background, observation, and analysis errors as a function of αb, the rate of decay for the correlation of background errors. The three other parameters determining the nature of errors were kept constant at σb = 2.5 m s−1, σo = 2.5 m s−1, and αo = 5 km. Background errors are shown in light purple while observation errors are shown in blue. The errors for two sets of analyses are also being compared. The errors of xavg are illustrated in orange while the errors of xoptim are shown in dark purple. For each of the four errors being plotted, thick solid lines indicate the theoretical standard deviations of errors σ(ϵ) expected from the characteristics of input data. Dots represent the average estimated standard deviation of errors, and color shadings the variability observed among realizations.
Observation errors ϵo are represented in light blue. Here σ(ϵo), the standard deviation of observation errors was set to 2.5 m s−1 and is indicated by the horizontal dashed line [also indicating the standard deviation of background errors σ(ϵb) in gray]. Thin blue lines indicating the sampling noise expected from Eq. (11) are plotted 2σ(s(ϵo)) ≈ 0.2 m s−1 above and below σ(ϵo). These lines indicate the range within which we expect the estimated standard deviation of errors s(ϵo) to lie 95% of the time. Because σo and αo were fixed, the expected sampling noise for observations errors does not vary.
We can verify that the simulated observations y had the expected error statistics by considering the blue dots, each indicating the average estimated standard deviation of errors for one ensemble of simulated observations.
Background errors ϵb are displayed in light purple. Again, we can verify that the simulated backgrounds xb had the expected error statistics.
The error statistics for analyses obtained by a weighted average of xb and y are shown in orange. Invariably, s(ϵavg) ≈ σ(ϵavg) = 1.77 m s−1. The magnitude of the sampling noise s(s(ϵavg)) also increases with αb.
Statistics for the errors of optimal analyses, ϵoptim, are indicated in dark purple. The difference between σ(ϵoptim) and σ(ϵavg) represents the magnitude of the improvements to the standard deviation of analysis errors brought by the perfect representation of correlations in the error covariance matrices.
When the correlations of observation and background errors are equal, indicated by the vertical dashed line in black (in Fig. 1), optimal analyses were also obtained using simple weighted averages. This was expected since, in the case where αb = αo, the two terms of the cost function [Eq. (3)] have exactly the same structure and optimal analyses are obtained by a simple weighted average between xb and y. This is also demonstrated analytically in appendix A.
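A one-line sketch of why this happens (consistent with appendix A, with direct observations assumed): when both covariance matrices share the same correlation matrix C, the gain collapses to a scalar weight,

```latex
\mathbf{B} = \sigma_b^{2}\mathbf{C},\quad \mathbf{R} = \sigma_o^{2}\mathbf{C}
\;\Rightarrow\;
\mathbf{B}(\mathbf{B}+\mathbf{R})^{-1}
= \sigma_b^{2}\mathbf{C}\left[(\sigma_b^{2}+\sigma_o^{2})\mathbf{C}\right]^{-1}
= \frac{\sigma_b^{2}}{\sigma_b^{2}+\sigma_o^{2}}\,\mathbf{I}
```

so the optimal analysis is the same pointwise weighted average obtained with diagonal matrices.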
Another interesting feature of the errors of xoptim is the tapering off of the improvements brought by the representation of correlations for αb ≲ 0.5 km. This tapering is a consequence of αb becoming significantly smaller than the grid spacing, which was set to 1 km.
We could attest that our assimilation system and verification procedure were behaving as expected since the error statistics estimated from the analysis ensembles (dots) converged to those expected from theory (solid lines) for all values of αb considered.
a. Impact of representing correlations on the computational costs of analyses
Here are some considerations with respect to the computational costs of xavg versus those of xoptim:
All analyses represented in this study were computed on the same desktop computer. The computational cost of representing the correlations of errors could then be inferred by comparing the time necessary for the generation of different analysis ensembles.
The analyses xavg, consisting of the weighted average of xb and y, could be obtained in a few thousandths of a second. Neglecting the correlation of errors leads to virtually costless analyses.
By contrast, xoptim obtained through the minimization of a cost function required a significant amount of computer time. In Fig. 2, we plot the time required for the generation of analyses as a function of αb. Each dot represents the average time required for the minimization of 100 cost functions with the same error covariance matrices.
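For concreteness, here is a minimal sketch of what each such minimization involves (ours, not the article's code; SciPy's conjugate-gradient minimizer stands in for the implementation actually used, and explicit inverses are only practical on small domains):

```python
import numpy as np
from scipy.optimize import minimize

def optimal_analysis(x_b, y, B, R):
    """Minimize the direct-observation (H = I) cost function with conjugate
    gradients, as in the computation of x_optim; B and R are full matrices."""
    B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)

    def cost(x):
        db, do = x - x_b, x - y
        return 0.5 * db @ B_inv @ db + 0.5 * do @ R_inv @ do

    def grad(x):
        return B_inv @ (x - x_b) + R_inv @ (x - y)

    res = minimize(cost, x0=x_b, jac=grad, method="CG")
    return res.x
```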

Fig. 2. Computation time as a function of αb, the rate of decay for the correlation of background errors. For each αb being tested, 100 optimal analyses xoptim were computed from different realizations of xb and y. Black dots represent the average convergence time for each of these analysis ensembles. Gray shading, extending two standard deviations above and below the average, represents the variability of convergence time. The blue line indicates the condition number (l2) of the matrix involved in the minimization.
The exact amount of time required for convergence is not interesting in itself. Figure 2 shows that optimal analyses are much more expensive than weighted averages. This is especially true for strongly correlated background and observation errors.
Research being our primary objective, the assimilation system that we used had not been optimized with computational time in mind. Such optimization (e.g., preconditioning, parallelization, etc.) is expected in operational systems and will reduce the time required for the generation of analyses. It appears reasonable, however, to assume that for any assimilation system, representing the correlation of both background and observation errors cannot be done at negligible cost.
b. Sampling noise
If the observations are dense and the correlations of ϵo and ϵb are similar, we know from Fig. 1 that nearly optimal analyses may be obtained by neglecting the correlation of errors in the error covariance matrices.
The errors found in any individual analysis xavg are partly due to misrepresentations of correlations and partly due to sampling noise. If the errors caused by misrepresenting correlations are much larger than the variability due to sampling noise, then the extra expense of obtaining xoptim is justified. On the other hand, if sampling noise is expected to dominate the error, then the beneficial impacts of representing the correlations may well go unnoticed.
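Equation (24) is not reproduced in this excerpt; based on the description above and on the caption of Fig. 3, a plausible form of the ratio is

```latex
r = \frac{\sigma(\epsilon_{\mathrm{avg}}) - \sigma(\epsilon_{\mathrm{optim}})}
         {\sigma\!\left(s(\epsilon_{\mathrm{optim}})\right)}
```

that is, the improvement brought by representing correlations measured against the sampling noise expected for optimal analyses.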


Fig. 3. Ratio r [Eq. (24)] of the magnitude of the improvements brought by the correct representation of error correlation vs the magnitude of the sampling noise expected for optimal analyses, as a function of αb. A value of r > 1 (above the horizontal dashed line) indicates values of αb for which the representation of error correlation yields improvements whose magnitude is greater than the sampling noise. In such circumstances, the extra cost of representing the correlation of errors will result in analyses with noticeably smaller errors. For r < 1, the magnitude of the sampling noise exceeds that of the improvements brought by the representation of correlations. In this case, the errors caused by neglecting the correlation of errors may become observable only through the averaging of several analyses obtained in identical conditions.
A ratio r = 1, indicated by the horizontal dashed line, means that the errors made by neglecting correlations are of the same magnitude as the sampling noise. For αb ≲ 3 km and αb ≳ 10 km, r > 1: the errors of individual analyses are dominated by the misrepresentation of error correlations. In these circumstances, representing the correlation of errors will noticeably improve the quality of analyses. For 3 km ≲ αb ≲ 10 km, r < 1: the error is dominated by sampling noise, and neglecting the correlation of errors will have little noticeable impact on the standard deviation of analysis errors.
c. Neglecting the correlation of errors for only one of the two error covariance matrices


In data assimilation, it is not uncommon to represent the correlation of background errors but not that of radar errors. We now assess the impact of neglecting the representation of error correlation for only one term in the cost function.
In Fig. 4, we show the standard deviation of errors for analyses obtained by neglecting correlation in only one of the two error covariance matrices.

Fig. 4. The errors of analyses obtained by neglecting the correlation of only one of background or observation errors, as a function of αb. Analyses for which only the correlation of background errors was neglected are shown in blue; analyses for which only the correlation of observation errors was neglected are shown in red.
Analyses obtained with a diagonal representation for only one of the two error covariance matrices are compared in Fig. 4.
When only the correlation of background errors was omitted (in blue) and background error correlations were weak, nearly optimal analyses could be obtained. For αb ≳ 2 km, the errors of these analyses were greater than σavg (orange line). When only the correlation of observation errors was neglected (in red), the resulting analyses always had errors larger than σavg.
The impact of neglecting the correlation of errors in only one term of the cost function is important. Figure 4 indicates that it is generally better to neglect the correlation of errors altogether than to only partially represent them.
Misrepresented correlations may be compensated by purposefully misrepresenting the variance of errors, as discussed in LR02, a technique that we will refer to as “variance compensation.” In Fig. 4, dashed lines indicate the average standard deviation of errors for analysis ensembles in which variance compensation was applied (see appendix B).
From Fig. 4, we can observe that variance compensation may significantly improve the standard deviation of analysis errors. This is not, however, generally sufficient to yield analyses with errors smaller than σavg.
d. Data thinning
LR02 concluded that the correlation of observation errors could be neglected with negligible impacts on analysis errors when the correlation between neighboring observations does not exceed 0.15. In the present context, where observations are spatially correlated following an exponential decay with αo = 5 km, this criterion is met for observations which are approximately 10 km apart.
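The 10-km figure follows directly from the exponential correlation model used here:

```latex
e^{-d/\alpha_o} = 0.15
\;\Rightarrow\;
d = -\alpha_o \ln(0.15) \approx 1.9\,\alpha_o \approx 9.5\ \mathrm{km} \approx 10\ \mathrm{km}
\qquad (\alpha_o = 5\ \mathrm{km})
```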
In Fig. 5a, we show the errors for analyses obtained by considering only observations that were separated by a distance of 10 km or more. Out of the initial 100 × 100 observations, a grid of 10 × 10 was retained. Only 1 out of every 100 observations was conserved. A simple matrix

Fig. 5. The impact of data thinning. (a) Observations are thinned so that they are never <10 km apart (1 in 100 of the original observations was conserved). In this case, the errors of analyses obtained by neglecting the correlation of observation errors (in green) are virtually indistinguishable from optimal analyses (in pink) obtained with perfect representation of errors. Observations located at least 10 km apart are effectively uncorrelated and analyses do not suffer from the misrepresentation of the observation error correlations. (b) Less drastic thinning was applied by retaining observations at every second grid point (1 in 4 of the original observations was conserved). This time, misrepresenting the correlation of observation errors (in green) leads to analyses with larger errors than those with perfect representation of correlations (in pink). Unless the correlation of errors is perfectly represented in both error covariance matrices, this level of thinning leads to suboptimal analyses.
Because of data thinning, analysis errors were no longer homogeneous. Nevertheless, only average error statistics are presented here. This eases the representation of errors as a function of αb, and makes the comparison with previous experiments possible.
For each pair xb and y, analyses were performed twice: once neglecting the correlation of observation errors with a diagonal observation error covariance matrix, and once representing those correlations perfectly.
In Fig. 5a, we can observe an almost perfect overlap between the errors of the two sets of analyses. As observed by LR02, neglecting the correlation of errors had no impact on the quality of analyses.
For comparison with earlier results, we also plotted (in red) the errors of analyses obtained without thinning while neglecting only the correlation of observation errors.
When background errors were weakly correlated, αb ≲ 1 km, the reduction in observational information caused by data thinning significantly increased the standard deviation of analysis errors. In these conditions, the average standard deviation of analysis errors was approximately the same as that of background errors.
As αb was increased thinning became more and more beneficial. For αb ≳ 5 km, the errors of analyses for which thinning was used (in green and pink) were smaller than those for which thinning was not used (in red). The improvements were small, however, when compared to the sampling noise observed for such correlations.
Keeping only 1 in 100 observations removed a lot of information. In Fig. 5b, we show the errors of analyses for which 1 in every 4 observations was conserved. This corresponds to a situation where the smallest distance between neighboring observations is 2 km.
This time, analyses obtained by neglecting the correlations of errors (in green) have significantly greater errors than those for which correlations are perfectly represented (in pink). This was expected since the errors of observations 2 km apart are still significantly correlated.
It is interesting to note that in case of strong background error correlations, αb ≳ 8 km, when the correlation of errors was perfectly represented (in pink), analysis errors converged to those of optimal analyses obtained without thinning (in dark purple). This was also observed in LR02. When the correlations are strong and perfectly respected, increasing the density of observations does not significantly improve the quality of analyses. It is unfortunate that perfect representation of correlations is improbable in a more realistic context.
Used at its maximum potential, variance compensation (red dashed lines) yields analyses with smaller errors than those obtained with data thinning (green dots). Again, optimal application of variance compensation is unlikely in a realistic context.
When data thinning was used, nearly optimal analyses could only be obtained on the condition that 1) a modest amount of thinning be applied, and 2) the correlations of errors be perfectly represented in both error covariance matrices.
Irrespective of variance compensation or data thinning, every analysis obtained by representing only the correlation of background errors (using a diagonal observation error covariance matrix) had errors larger than those of the simple weighted average.
In operational data assimilation, correlations are not expected to be perfectly represented. Our results suggest that in such cases, keeping all available information but neglecting correlations altogether may well be less damaging than thinning observations prior to their assimilation.
Another commonly used method for dealing with dense radar data is the averaging of a group of observations to obtain a smaller number of “superobservations.” The averaging process by which superobservations are obtained affects the probability density of the errors to be represented in the observation error covariance matrix.
6. Impact of representing the correlation of ϵb and ϵo on the correlation of analysis errors
So far, only the standard deviation of analysis errors has been discussed. We saw how representing the correlation of errors was truly beneficial only in cases where the spatial correlations of ϵb and ϵo were relatively different. Also, the quality of analyses obtained by neglecting correlations altogether was generally better than that of analyses with partial representation of correlations. In this section, we demonstrate that similar conclusions can be reached by examining the correlation of analysis errors.
While the standard deviation of analysis error could be represented by a scalar, the homogeneous and isotropic 2D correlation of analysis errors must be represented by a plot of correlation as a function of the distance, or lag, separating errors.
For the case where αb = αo = 5 km, xavg = xoptim and the correlation of analysis errors follows an exponential decay with αavg = αoptim = 5 km. This situation is not depicted here.
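A minimal sketch (ours, not the article's exact procedure) of one way such a lag-dependent correlation can be estimated from a 2D error field; for simplicity, only lags along the x axis are considered, as for the dots in Fig. 6:

```python
import numpy as np

def correlation_vs_lag(err, max_lag):
    """Autocorrelation of a 2D error field as a function of lag (in grid points)
    along the x axis; a simple estimator for illustration only."""
    eps = err - err.mean()
    var = eps.var()
    corr = []
    for lag in range(1, max_lag + 1):
        a, b = eps[:, :-lag], eps[:, lag:]      # pairs of points separated by `lag`
        corr.append(np.mean(a * b) / var)
    return np.array(corr)

# e.g., lags 1 to 20 grid points for one simulated error field `eps_field`
# rho = correlation_vs_lag(eps_field, 20)
```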
In Fig. 6, we show the correlation of analysis errors for 100 realizations of xavg and xoptim obtained with αb = 0.1, 1, and 10 km. Purple lines in Figs. 6a–c indicate ζ(ϵoptim), the correlation of optimal analyses expected from Eq. (21). The average autocorrelation of errors estimated from the analysis ensembles is indicated by dots, with color shadings representing the variability observed among the 100 error fields.

Fig. 6. Correlation of analysis errors for αb = 0.1, 1, and 10 km. The errors of three types of analyses are being compared: (a)–(c) optimal analyses (in purple), (d)–(f) analyses consisting of a weighted average between xb and y (in orange), and (g)–(i) analyses for which only the correlation of observation errors was neglected (in red). For each of these plots, the 2D autocorrelations of 100 error fields were considered. Dots represent the average correlation for different lags along the x axis of the correlation function. Color shadings represent the variability of correlations observed in the 100 error fields. The correlations of background (in gray) and observation (in blue) errors are also shown as a reference. Neglecting only the correlation of observation errors (in red) has a larger negative impact on the correlation of analysis errors than neglecting correlations altogether (in orange).
The correlation of errors for analyses consisting of a weighted average between xb and y (obtained using diagonal error covariance matrices) is shown in orange (Figs. 6d–f).
For αb = 0.1 km (Figs. 6a,d,g), perfect representation of correlations in the error covariance matrices produced the error correlations shown in Fig. 6a.
As αb was increased, the difference between the correlation of errors in ϵoptim and ϵavg (cf. Figs. 6b,e and 6c,f) became smaller. For αb = 10 km, the correlation of errors for analyses obtained with and without the representation of correlations (Figs. 6c,f) are virtually indistinguishable. Neglecting the correlation of errors results in analyses with stronger error correlations but no real damage is done.
The opposite can be said about analyses for which only the correlation of observation errors was neglected. For increasing αb (Figs. 6h,i), the correlation of analysis errors became more and more different from that of optimal analyses (Figs. 6b,c). For short lags, these correlations even exceeded the correlation of either the background or the observation estimates.
7. Discussion
For increasing αb, the variance of optimal analysis errors was shown to increase until αb = αo, at which point increasing αb further leads to analyses with smaller errors. While this behavior can be explained mathematically, the result is not very intuitive. Why does the representation of error correlations only improve the quality of analyses on the condition that αb ≠ αo?
In an attempt to answer this question, let us consider the extreme situation where αb tends to infinity. In the limit of infinitely long correlations, background errors would be exactly of the same sign and magnitude everywhere in the assimilation domain. In other words, background estimates xb would “measure” xt with great precision but not with great accuracy. Note that the error (singular form intended) of such background estimates would differ between realizations of xb and have an expected value of zero.
In this situation, the respective “merits” of background and observation estimates are different. Background estimates provide very good information on the relative magnitude between different random variables in final analyses while observation estimates provide very good information on the average value of all random variables considered together.
Representing the correlation of errors “tells” the assimilation system how to best combine xb and y in order to take advantage of the respective merits of each source of information. For infinitely long background error correlations, observations would be used solely for the purpose of adjusting the mean value of individual background estimates in order to generate analyses that are both precise and accurate.
Representing the correlation of errors can only be beneficial in cases where the merits of the information contained in xb and y are different. For αb = αo, the two estimates do not bring complementary information into the system, and analyses with the largest error variance are observed.
The results presented in this study may be considered as an extension to those of LR02. We now examine some of their conclusions in the context of our experiments. We also discuss the implications of our experiments to more realistic data assimilation situations.
LR02 concluded that when correlations are perfectly represented, increasing the observation density beyond a certain threshold yields little improvements to the quality of analyses. Our experiments demonstrate that the magnitude of improvements caused by increasing the observation density is strongly dependent on the correlation of background errors. In Figs. 5a,b, we can observe the errors of optimal analyses obtained with (in pink) and without (in dark purple) data thinning. The difference between these two sets of analyses, which indicates the magnitude of improvements associated with increasing the observation density, strongly depends on the correlation of background errors. In Fig. 5b, where moderate thinning was applied, optimal analyses obtained with and without thinning have virtually the same errors for αb ≳ 10 km. For this specific experiment, we can therefore conclude that increasing the observation density beyond 2 km yields little improvements to the quality of analyses on the condition that αb ≳ 10 km. To the conclusion of LR02, we add the requirement for background errors to be sufficiently correlated.
In a more realistic context, the structure of background and observation errors is likely to be much more complex than those examined here. Errors are not expected to be homogeneous and isotropic; their distributions may be poorly represented in terms of their mean and variance alone. There are also significant challenges associated with both the estimation of these errors and their representation in assimilation systems. For these reasons, misrepresenting the correlation of errors is unavoidable and obtaining truly optimal analyses is impossible. In this context, experiments where we neglected the correlation of observation errors are more relevant.
A second conclusion of LR02 is that when the correlations of observation errors are neglected, thinning data such that neighboring grid points have correlations no greater than 0.15 provides the best compromise between the error associated with correlation misrepresentations and the loss of information associated with thinning.
In Fig. 5, errors associated with correlation misrepresentations may be estimated by comparing analyses obtained with (in dark purple) and without (in red) the representation of observation error correlations. The magnitude of these errors shows only a weak dependence on the correlation of background errors.
Errors associated with information loss may be estimated by comparing optimal analyses obtained with (in pink) and without data thinning (in dark purple). The magnitude of these errors is strongly dependent on the correlation of background errors.
In green are the errors of analyses obtained by neglecting the correlation of observation errors and applying data thinning. Thus, they suffer from the two types of errors aforementioned. Data thinning may only alleviate the errors caused by correlation misrepresentations on the condition that background errors are sufficiently correlated. In Fig. 5a, this happens for αb ≳ 5 km; in Fig. 5b, for αb ≳ 2 km. Again, to the conclusion of LR02, we add the requirement for background errors to be sufficiently correlated.
One notable difference between the experiments presented here and those of LR02 is the functional form of the correlations being tested. We performed experiments with exponential correlations while they tested correlation functions closer to a Gaussian decay. With respect to the exponential function, the Gaussian decay imposes a more abrupt decrease of correlation with increasing distance between errors. There are two ways in which this is favorable to data thinning. First, less thinning will be necessary to obtain a correlation of 0.15 between neighboring data points. This will diminish the errors due to information loss. Second, background errors correlated following a Gaussian decay will better propagate observation information to neighboring and nonobserved grid points. We can, therefore, conclude that the 0.15 criterion found by LR02 depends on both the rate of decay and the functional form of background error correlations.
In any circumstance where dense observations are available, data thinning is probably not necessary. In our experiments, analyses obtained by neglecting correlations altogether proved systematically better than those with the representation of only background error correlations. This is convenient as weighted averages only require knowledge of the variance of errors, which may be more easily estimated and represented in data assimilation than correlations. Savings in computational costs are not to be neglected either.
Our experiments have shown that partial representation of correlation generally leads to analyses that are of poorer quality compared to simple weighted averages. This suggests that great care should be taken when it comes to representing the correlation of errors in a realistic context. In case of doubt on the magnitude of the improvements brought by the representation of correlations, neglecting correlations altogether should be considered as a safe option.
Of course, neglecting the correlation of errors is only possible where dense observations are available. If radar observations are spatially scattered, or not available, then the correlation of background errors should be represented to spread observation information to nonobserved areas. In a realistic assimilation context, dense radar measurements are usually available on limited portions of the assimilation domain. The determination of where and when the correlation of errors needs to be represented therefore remains an open practical question.
In our experiments, we set σb = σo = 2.5 m s−1 for simplicity. In a more realistic assimilation context, it is likely that the variances of background and observation errors will differ. In experiments (not shown here) where we reduced σo to 1 m s−1 (a number often quoted for the errors of Doppler velocity), the standard deviation of analysis errors was reduced to ~0.9 m s−1 irrespective of the correlations being represented in the error covariance matrices.
In this respect, we have demonstrated the strong influence of error correlations on the precision at which the variance of errors may be estimated. In our experiments, precise estimates for the standard deviation of errors could be obtained by averaging several error fields with the same error statistics. In an atmospheric context, this may only be done in a climatological sense. The applicability of such estimates to convective situations remains to be determined.
The temporal dimension of data assimilation is yet another aspect that should be investigated. Typically, data assimilation consists of several cycles of analyses followed by periods of model integration. In principle, this should bring the model state closer to the truth and affect both the magnitude and correlation of its errors. Potentially, this could make the correlation of background errors sufficiently different from that of observation errors to justify the representation of these correlations. If and how this happens at the convective scales remains to be documented.
As outlined above, there are a number of major differences between the idealized experiments presented here and the context of operational data assimilation. These experiments are nevertheless interesting as they help us understand the contribution of correlation to the quality of analyses in the ideal case where all the usual assimilation assumptions are fulfilled. It is probably safe to say that in an operational context, where assimilation is conducted in less than ideal conditions, improvements can only be harder to obtain.
The experiments presented here demonstrate that representing only the correlation of background errors may well end up degrading the quality of analyses compared with not representing correlations at all. This conclusion is important since most modern frameworks for performing data assimilation [e.g., the ensemble Kalman filter (EnKF) or hybrid EnKF-variational systems] are oriented toward better representation of background error correlations at the analysis step. We now know that, alone, improving the representation of background errors is not expected to improve the quality of observed variables in analyses.
8. Conclusions
The experiments presented in this study help in understanding the process of optimal estimation in the presence of multivariate and correlated estimates. For simplicity, we only considered the case where background and observation estimates were available everywhere in the assimilation domain with known errors represented by unbiased, homogeneous, and isotropic multivariate Gaussian distributions. Errors were correlated in space following an exponential decay, a “long tailed” correlation function similar to those often found in the atmosphere. With these simplifications, analysis errors could be expressed as a function of only four parameters: the standard deviation and correlation of background and observation errors.
In a first set of experiments, two situations were examined: one in which analyses were obtained by neglecting correlations altogether, a second in which correlations were perfectly represented. When the correlations of errors were neglected, analyses could be obtained at a very low computational cost by a weighted average between background and observation estimates. From a statistical perspective, these analyses are not optimal. They are not those with the smallest expected error variance with respect to the truth. Optimal analyses were obtained with perfect representation of correlations and at a considerably higher computational cost than simple weighted averages. We investigated whether the extra cost associated with the representation of correlations was justified.
By comparing these two sets of analyses, we demonstrated that the more the correlations of background and observation errors differ, the more beneficial representing those correlations is to the analyses. When the correlations of background and observation errors were equal, optimal analyses were obtained with or without the representation of correlations. For this special (and unlikely) situation, the costs associated with the representation of correlations are definitely not justified.
When the correlation of background and observation errors were different, perfectly representing the correlation of errors was always beneficial. The magnitude of the improvements, however, was shown to depend on the difference between the correlation of background and observation errors. When the correlation of background and observation errors did not differ significantly, perfect representation of correlations provided only small improvements compared to suboptimal analyses where correlations were neglected altogether.
Special attention was paid to the precision at which the average standard deviation of errors may be estimated. This second-order error statistic (which we referred to as the sampling noise) was shown to depend on the size of the assimilation domain and on the standard deviation and correlation of the errors being estimated. To determine when correlations may be neglected, we suggested considering the ratio between the improvements brought by the representation of correlations and the expected magnitude of the sampling noise. If the improvements brought by the representation of correlations are small compared to the precision at which the standard deviation of errors may be estimated, then the beneficial effects of representing correlations are likely to go unnoticed. In this case, the computational resources associated with the representation of correlations may be allocated elsewhere.
The presence of correlated errors is not necessarily a bad thing. In the situation where the errors of one estimate are strongly correlated while the errors of a second estimate are not, representing these correlations in the cost function does yield analyses of significantly better quality than simple weighted averages. Determining whether, and to what extent, this situation occurs in realistic conditions demands further investigation.
Analyses for which the correlations of errors were represented in only one of the two error covariance matrices generally had larger errors than the simple weighted averages obtained by neglecting correlations altogether.
Under ideal circumstances, partial representation of error correlations generally does more harm than good. In a more realistic context where misrepresenting the correlation of errors is unavoidable, great care should be taken to ensure that representing correlations will have a beneficial impact on analyses. In case of doubt, neglecting correlations altogether should be considered as a safe alternative. Of course, this is only possible in areas where dense observations are available.
The results presented in this article suggest that common practice in data assimilation, such as only representing the correlation of background errors and data thinning, may not have the expected beneficial impact on the quality of analyses. Operational data assimilation is, however, conducted in conditions which are far from the idealized context in which our experiments were performed. Therefore, additional work is required to fully document the impacts of representing correlations in a more realistic framework.
In the second part of this study, we perform similar experiments using model output as background estimates, with observations made available only in precipitating areas. This will allow us to investigate the challenges that arise when we perform data assimilation from estimates whose error statistics are non-Gaussian, heterogeneous, biased, and misrepresented.
Acknowledgments. The authors are grateful to Peter Houtekamer and Marc Berenguer for providing many useful suggestions that improved preliminary versions of this article.
APPENDIX A
Analytical Expressions for the Analysis Variance–Covariance Matrix




In this appendix, we show that for


Such an operator may be constructed for











In 2D, the operator representing the inverse of an exponential variance-covariance matrix has yet to be derived analytically (Oliver 1998). The method used above is, therefore, not applicable. However, close examination of the average autocorrelation for analysis errors displayed in Fig. 6 showed that they do not follow an exponential decay.




However, in the special case where αo = αb = αsame, analysis errors are exponentially correlated, with Eqs. (A12)–(A14) reducing to Eqs. (A10) and (A9).
APPENDIX B
Variance Compensation
In this appendix, we discuss the use of variance compensation to alleviate the negative impacts associated with the misrepresentation of error correlations.
Liu and Rabier (2003) mention the technique and discuss how variance may be adjusted by considering the value of the cost function J. For the experiments presented here, we were not so much interested in how to adjust the variance as in documenting the maximum benefits that could be obtained using this technique.
Given that we knew xt, we adjusted the variance a posteriori, by computing analyses with different representation of variance and choosing the value that minimized the standard deviation of errors with respect to xt. This is best explained using an example.
Let σo = σb = 2.5 m s−1, αb = 1 km, and αo = 5 km. We are interested in analyses obtained with perfect representation of background error correlations (the “true” correlation of background errors) but with the correlation of observation errors neglected.
The correct representation of background error correlations in the cost function does not, by itself, prevent the misrepresentation of observation error correlations from distorting the relative contributions of the two estimates.
To a certain extent, the “correct” contribution of observation and background estimates can be reestablished by purposefully misrepresenting the variance of errors in the cost function.
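A minimal 1D sketch of this a posteriori tuning (ours, not the article's 2D implementation): the background error correlation is represented exactly, the observation error correlation is neglected, and the represented standard deviation of background errors is scanned for the value that minimizes the resulting analysis error.

```python
import numpy as np

rng = np.random.default_rng(2)

def exp_cov_1d(n, dx, sigma, alpha):
    """Exponential covariance sigma^2 * exp(-d/alpha) on a regular 1D grid."""
    d = np.abs(np.subtract.outer(np.arange(n) * dx, np.arange(n) * dx))
    return sigma**2 * np.exp(-d / alpha)

# true error statistics (1D stand-in for the 2D experiment of appendix B)
n, dx = 200, 1.0
sigma_b, alpha_b = 2.5, 1.0     # background: weakly correlated
sigma_o, alpha_o = 2.5, 5.0     # observations: more strongly correlated
B_true = exp_cov_1d(n, dx, sigma_b, alpha_b)
R_true = exp_cov_1d(n, dx, sigma_o, alpha_o)
L_b, L_o = np.linalg.cholesky(B_true), np.linalg.cholesky(R_true)
C_b = B_true / sigma_b**2       # background error correlation (represented exactly)

def mean_analysis_error(sigma_b_rep, n_real=50):
    """Average analysis-error std when the background keeps its true correlation
    but uses the represented std sigma_b_rep, while observation error
    correlations are neglected (diagonal observation term)."""
    B_rep = sigma_b_rep**2 * C_b
    R_rep = sigma_o**2 * np.eye(n)
    K = B_rep @ np.linalg.inv(B_rep + R_rep)    # gain actually used by the system
    errs = []
    for _ in range(n_real):
        eps_b = L_b @ rng.standard_normal(n)    # background errors
        eps_o = L_o @ rng.standard_normal(n)    # observation errors
        eps_a = eps_b + K @ (eps_o - eps_b)     # resulting analysis errors
        errs.append(np.std(eps_a))
    return np.mean(errs)

# scan the represented background std and keep the value giving the smallest error
candidates = np.linspace(0.5, 10.0, 20)
best = min(candidates, key=mean_analysis_error)
print(best, mean_analysis_error(best))
```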
In Fig. B1, we plot the standard deviation of analysis errors for ensembles of analyses computed with different represented standard deviations of background errors.

Fig. B1. Demonstration of variance compensation improving the standard deviation of analysis errors in a case where the correlation of background errors is perfectly represented but not that of observation errors. For this experiment, background error statistics are given by σb = 2.5 m s−1 and αb = 1 km. As always, observation error statistics are given by σo = 2.5 m s−1 and αo = 5 km. The 100 analyses (of which the errors are displayed in red) were computed for different representations of the standard deviation of background errors.
In the limit where
In Figs. 4 and 5, the dashed lines in red represent analysis ensembles for which variance compensation was applied. For each of these ensembles, the value of the represented standard deviation was chosen a posteriori so as to minimize the standard deviation of analysis errors.
REFERENCES
Bayley, G. V., and J. M. Hammersley, 1946: The “effective” number of independent observations in an autocorrelated time series. Suppl. J. Roy. Stat. Soc., 8, 184–197, doi:10.2307/2983560.
Berenguer, M., and I. Zawadzki, 2008: A study of the error covariance matrix of radar rainfall estimates in stratiform rain. Wea. Forecasting, 23, 1085–1101, doi:10.1175/2008WAF2222134.1.
Berenguer, M., and I. Zawadzki, 2009: A study of the error covariance matrix of radar rainfall estimates in stratiform rain. Part II: Scale dependence. Wea. Forecasting, 24, 800–811, doi:10.1175/2008WAF2222210.1.
Berenguer, M., M. Surcel, I. Zawadzki, M. Xue, and F. Kong, 2012: The diurnal cycle of precipitation from continental radar mosaics and numerical weather prediction models. Part II: Intercomparison among numerical models and with nowcasting. Mon. Wea. Rev., 140, 2689–2705, doi:10.1175/MWR-D-11-00181.1.
Bretherton, C. S., M. Widmann, V. P. Dymnikov, J. M. Wallace, and I. Bladé, 1999: The effective number of spatial degrees of freedom of a time-varying field. J. Climate, 12, 1990–2009, doi:10.1175/1520-0442(1999)012<1990:TENOSD>2.0.CO;2.
Chung, K.-S., I. Zawadzki, M. K. Yau, and L. Fillion, 2009: Short-term forecasting of a midlatitude convective storm by the assimilation of single-Doppler radar observations. Mon. Wea. Rev., 137, 4115–4135, doi:10.1175/2009MWR2731.1.
Fabry, F., 1996: On the determination of scales ranges for precipitation fields. J. Geophys. Res., 101, 12 819–12 826, doi:10.1029/96JD00718.
Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, doi:10.1002/qj.49712555417.
Hollingsworth, A., and P. Lönnberg, 1986: The statistical structure of short-range forecast errors as determined from radiosonde data. Part I: The wind field. Tellus, 38A, 111–136, doi:10.1111/j.1600-0870.1986.tb00460.x.
Hosmer, D. W., Jr., S. Lemeshow, and S. May, 2008: Applied Survival Analysis: Regression Modeling of Time to Event Data. 2nd ed. Wiley, 416 pp.
Keeler, R. J., and S. M. Ellis, 2000: Observational error covariance matrices for radar data assimilation. Phys. Chem. Earth, Part B: Hydrol. Oceans Atmos., 25, 1277–1280, doi:10.1016/S1464-1909(00)00193-3.
Kiefer, J. C., 1953: Sequential minimax search for a maximum. Proc. Amer. Math. Soc., 4, 502–506.
Lilly, D. K., 1990: Numerical prediction of thunderstorms—Has its time come? Quart. J. Roy. Meteor. Soc., 116, 779–798, doi:10.1002/qj.49711649402.
Liu, Z. Q., and F. Rabier, 2002: The interaction between model resolution, observation resolution and observation density in data assimilation: A one-dimensional study. Quart. J. Roy. Meteor. Soc., 128, 1367–1386, doi:10.1256/003590002320373337.
Liu, Z. Q., and F. Rabier, 2003: The potential of high-density observations for numerical weather prediction: A study with simulated observations. Quart. J. Roy. Meteor. Soc., 129, 3013–3035, doi:10.1256/qj.02.170.
Oliver, D., 1995: Moving averages for Gaussian simulation in two and three dimensions. Math. Geol., 27, 939–960, doi:10.1007/BF02091660.
Oliver, D., 1998: Calculation of the inverse of the covariance. Math. Geol., 30, 911–933, doi:10.1023/A:1021734811230.
Purser, R. J., W.-S. Wu, D. F. Parrish, and N. M. Roberts, 2003: Numerical aspects of the application of recursive filters to variational statistical analysis. Part I: Spatially homogeneous and isotropic Gaussian covariances. Mon. Wea. Rev., 131, 1524–1535, doi:10.1175//1520-0493(2003)131<1524:NAOTAO>2.0.CO;2.
Stratman, D. R., M. C. Coniglio, S. E. Koch, and M. Xue, 2013: Use of multiple verification methods to evaluate forecasts of convection from hot- and cold-start convection-allowing models. Wea. Forecasting, 28, 119–138, doi:10.1175/WAF-D-12-00022.1.
Sun, J., D. W. Flicker, and D. K. Lilly, 1991: Recovery of three-dimensional wind and temperature fields from simulated single-Doppler radar data. J. Atmos. Sci., 48, 876–890, doi:10.1175/1520-0469(1991)048<0876:ROTDWA>2.0.CO;2.
Tarantola, A., 2005: Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, 358 pp.
Thiébaux, H. J., and F. W. Zwiers, 1984: The interpretation and estimation of effective sample size. J. Climate Appl. Meteor., 23, 800–811, doi:10.1175/1520-0450(1984)023<0800:TIAEOE>2.0.CO;2.
Trefethen, L. N., and D. Bau, 1997: Numerical Linear Algebra. SIAM, 373 pp.
Ward, L. M., 2002: Dynamical Cognitive Science. The MIT Press, 371 pp.
Wilson, J. W., Y. Feng, M. Chen, and R. D. Roberts, 2010: Nowcasting challenges during the Beijing Olympics: Successes, failures, and implications for future nowcasting systems. Wea. Forecasting, 25, 1691–1714, doi:10.1175/2010WAF2222417.1.
Xu, Q., 2005: Representations of inverse covariances by differential operators. Adv. Atmos. Sci., 22, 181–198, doi:10.1007/BF02918508.
Yaremchuk, M., and A. Sentchev, 2012: Multi-scale correlation functions associated with polynomials of the diffusion operator. Quart. J. Roy. Meteor. Soc., 138, 1948–1953, doi:10.1002/qj.1896.
Zawadzki, I., 1973: The loss of information due to finite sample volume in radar-measured reflectivity. J. Appl. Meteor., 12, 683–687, doi:10.1175/1520-0450(1973)012<0683:TLOIDT>2.0.CO;2.
Zawadzki, I., 1982: The quantitative interpretation of weather radar measurements. Atmos.–Ocean, 20, 158–180, doi:10.1080/07055900.1982.9649137.
Zieba, A., 2010: Effective number of observations and unbiased estimators of variance for autocorrelated data—An overview. Metrol. Measure. Syst., 17, 3–16, doi:10.2478/v10178-010-0001-0.
Multivariate is used here in reference to the many random variables by which these error fields may be described. The influence of correlations may only be studied in the presence of two or more random variables. These statistical random variables are not to be confused with atmospheric state variables, such as temperature, or pressure.
Many reasons justify the choice of exponential correlations. First, previous work on this function allowed us to derive expressions for analysis errors as stated in section 3b and appendix A. Second, we could use existing mathematical tools for testing our assimilation system, a procedure (described in section 4a) that requires the generation of exponentially correlated noise and the use of inverse exponential correlation matrices. Third, long-tailed correlations, similar in nature to the exponential decay, have frequently been observed in atmospheric contexts. For examples, see Hollingsworth and Lönnberg (1986) for error correlations at the global scales, and Berenguer and Zawadzki (2008, 2009) for the correlation of rain-rate errors inferred from radar measurements.
As is often done in statistics, we adopt the convention by which the true standard deviation of errors is represented by the Greek letter σ while estimates of the same quantity are represented by the letter s.
The equivalence between the two forms becomes evident when considering the inverse of Eq. (14) of Bayley and Hammersley (1946) multiplied by n².
Note that the concept of effective degrees of freedom has also been used to describe different statistical properties (e.g., in Bretherton et al. 1999). For the purpose of this article, it only refers to the quantity estimated using Eq. (8).
The accuracy of this approximation has been verified in numerical experiments not shown here. Note also that the simplification given in Eq. (8) differs from that of Zieba (2010), which presented the result of a similar simplification. While we reduced the
The method we used for the generation of exponentially correlated noise relies on a convolution kernel provided by Oliver (1995). For αb ≳ 10 km the tail of the kernel became sufficiently truncated as to prevent the generation of purely exponentially correlated noise. For this reason, numerical results are only provided up to αb = 10 km. The range of most figures presented in this study nevertheless extends to αb = 100 km to emphasize the symmetrical nature of σoptim.