## 1. Introduction

The observation error of representation will be defined here as the difference, in observation space, between an observation taken with a “perfect” instrument, without instrument error, and the model state equivalent in observation space. Such errors arise from the fact that the observation typically measures a higher-resolution state than the filtered state represented by the model. In the case of a point observation of a model variable, the observation error variance of representation would be equal to the variance of all possible subfilter/subgrid variations of this model variable given the same filtered state. When the observation is an integral or average of some aspect of the true state, the observation error variance of representation is again equal to the variance of the average that is observed given the same true filtered state. Typically, the observation error of representation is defined to include errors in the observation operator, which is used to map the model state to the observation location (e.g., Daley 1993; Mitchell and Daley 1997a,b, among others). Van Leeuwen (2015) further discusses that, from a Bayesian point of view, the analysis step of any practical data assimilation scheme requires finding the posterior probability density function of a discrete model state given the observation, not the density function of a true (infinite resolution) atmospheric state given the observation value. A careful examination of the observation likelihood when the observation is conditioned on a lower-resolution state reveals that such terms should be included in the observation error covariance matrix (van Leeuwen 2015; Hodyss and Nichols 2015).

The observation error of representation depends on the true filtered state because differing large-scale flows support differing types of subfilter variations. Consequently, one would expect representation error to depend on the model state and possibly be correlated in time (Mitchell and Daley 1997a,b; van Leeuwen 2015; Janjić and Cohn 2006). However, such flow dependence of the observation error covariance matrix is typically neglected. Frehlich (2006) addressed spatial variations of the observation error variance, which depend on the statistics of small scales, by calculating the observation error covariance based on estimates of local turbulence.

There are several methods that can be used to estimate the full observation error covariance, including the error of representation. Hollingsworth and Lönnberg (1986, hereafter denoted the H–L method) split (observation minus background) innovation statistics into observation and background covariance terms. This method involves building a histogram of innovation covariances binned by separation distance and extrapolating to zero separation by fitting a function. Doing so requires the key assumptions that the observation errors are uncorrelated beyond zero separation and that the observation and background errors are mutually uncorrelated. In this way, the extrapolation to zero separation provides an estimate of the background error variance. The observation error variance is then computed by subtracting the background error variance from the innovation variance. This method requires a dense observing network to bin innovations by distance, and the resulting error variances depend on the function chosen for extrapolation. If spatially correlated observation errors are present, the resulting observation error variance would be too small, since the spatially correlated terms would be aliased onto the background error variance. Another method, due to Desroziers et al. (2005), involves taking the expected value of the matrix outer product of innovations and observation-minus-analysis residuals. This method is easy to implement but can be in error when the data assimilation system used to produce the analysis employs poorly specified background and observation variances and their corresponding spatial correlations (Chapnik et al. 2004; Ménard 2016). The Desroziers method is often thought of as a “consistency check,” as it will be perfectly accurate if the gain matrix used in the data assimilation system is in agreement with the true covariances for background and observation errors (Desroziers et al. 2005).
Therefore, this method will produce errors due to inaccuracies in the prescribed error covariance matrices. A computationally costly iterative procedure, sequentially updating the error covariance matrices and applying the Desroziers diagnostic, is often suggested. Waller et al. (2016) showed that, although this method is subject to prescribed background and observation covariance matrices, a useful solution can often be obtained in a single iteration even when iterative techniques cannot be expected to converge. On the other hand, Hodyss and Satterfield (2017) showed that when the observation is at a higher resolution than the model state, both the Desroziers method and the H–L method will have contributions from representation error as well as errors from resolved scales.
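The core of the H–L calculation can be sketched as follows. This is a minimal illustration, assuming innovation covariances have already been binned by separation distance and adopting a SOAR correlation shape (the shape used later in section 4); the function names and synthetic inputs are illustrative, not data from this study.

```python
import numpy as np
from scipy.optimize import curve_fit

def hl_obs_variance(bin_dists, bin_covs, innov_var, p0=(1.0, 500.0)):
    """Hollingsworth-Lonnberg sketch: fit a correlation-function shape to
    innovation covariances binned by separation distance, extrapolate to
    zero separation to estimate the background error variance, and
    subtract it from the total innovation variance.

    bin_dists : bin-center separation distances (km), zero excluded
    bin_covs  : mean innovation covariance in each distance bin
    innov_var : innovation variance at zero separation
    """
    def soar(r, sig2_b, length):
        # second-order autoregressive (SOAR) covariance shape
        return sig2_b * (1.0 + r / length) * np.exp(-r / length)

    (sig2_b, _length), _ = curve_fit(soar, bin_dists, bin_covs, p0=p0)
    # observation variance = innovation variance - background variance
    return innov_var - sig2_b
```

Subtracting the extrapolated background variance from the innovation variance is exactly the step that aliases any spatially correlated observation error onto the background term.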

Other methods, such as those of Forget and Wunsch (2007) and Oke and Sakov (2008), use only observational data to produce maps of representation error. These methods involve averaging the observation data to model resolution and then interpolating back to high resolution to compute differences with the observations, which can be regarded as estimates of representation error at a particular point in time. Such methods can be viewed as an extension of Mitchell and Daley (1997b) and Liu and Rabier (2002), who defined representation error as the difference between a full state and a spectrally truncated form. Further, such differences could be approximated, in model space, by using a high-resolution model field and its spectrally truncated form; however, the resulting differences would be smaller than the actual representation error for two reasons. First, the model state will likely lack processes measured by the observations. Second, the *effective resolution* of a particular model may be coarser than its nominal spectral truncation because physical parameterizations, diffusion, or numerical processes may act to smooth the field.
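A one-dimensional sketch of this observation-only approach, assuming dense observations and a simple block average as the coarsening filter (the cited works operate on two-dimensional ocean fields; the names here are illustrative):

```python
import numpy as np

def rep_error_realization(obs, obs_x, model_dx):
    """Average high-resolution observations onto a coarse "model" grid,
    interpolate back to the observation locations, and take the residual
    as one realization of representation error (1-D sketch; assumes
    every coarse cell contains at least one observation)."""
    edges = np.arange(obs_x.min(), obs_x.max() + model_dx, model_dx)
    nbin = len(edges) - 1
    idx = np.clip(np.digitize(obs_x, edges) - 1, 0, nbin - 1)
    coarse = np.array([obs[idx == k].mean() for k in range(nbin)])
    centers = 0.5 * (edges[:-1] + edges[1:])
    smooth = np.interp(obs_x, centers, coarse)  # back to obs locations
    return obs - smooth
```

Averaging the squared residuals over many times would yield a variance map analogous to those produced by Oke and Sakov (2008).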

In what follows we explore the spatial structure of representation error, specifically the portion related to smoothed boundaries of large-scale gradients. Since the ensemble variance from a coarse-resolution model could be expected to be higher in regions of uncertain boundaries, we further examine the ability of the ensemble variance to predict such observation representation error variance. In section 2 we provide the necessary mathematical definitions to facilitate our discussion of representation error. In section 3, we use spectrally truncated model output to display the spatial distribution of the observation error of representation. We also investigate the relationship between the error of representation and the ensemble variance. Section 4 shows that both the Desroziers and H–L methods produce estimates of observation error variance, including the portion owing to representation error, which are a function of ensemble variance. Discussion and conclusions follow in section 5.

## 2. Mathematical formulation

To facilitate the following discussion, we follow the formalism of Mitchell and Daley (1997a,b); however, we define our terms in model grid space rather than in spectral space. Such a definition gives some ease to dealing with sparse observations, such as radiosondes. Another difference between our discussion and that of Mitchell and Daley (1997a,b) is that we will differentiate between the observation operator error

*K*-member ensemble as follows:

## 3. Visualizing the structure of representation error

To explore the structure of the representation error described by (3) and visualize such error as a continuous field, we consider the difference between a high-resolution model field and its spectrally truncated form. The idea is that if one had perfect observations of the high-resolution state and our model had a lower resolution, obtainable by filtering the high-resolution state in some way, then the differences between the high-resolution state and the filtered state would provide specific realizations of observation errors of representation. As previously noted, the resulting differences will likely be smaller than the actual representation error, so the results in this section simply aim to gain a *qualitative* understanding of the locations where representation error may be found and its relative magnitude. For the discussion in this section, we compute the spectral truncation by truncating spherical harmonics.

### a. Spatial structure based on ECMWF analyses

To explore error variances due to spectral truncation, we use ECMWF analyses obtained from the THORPEX Interactive Grand Global Ensemble (TIGGE) archive (Richardson 2005). We obtain the ECMWF analyses and deterministic forecasts at the 6-h lead time with initial conditions at 0000 UTC, on a 0.25° × 0.25° regular grid (approximately T639 spectral resolution) for 1 month (January 2015) during Northern Hemisphere (NH) winter and 1 month (July 2013) during NH summer. To model the qualitative properties of the representation error, we truncate spherical harmonics to reconstruct a smoothed field using only the first 100 total wavenumbers (T100). The truncated field is then interpolated to the 0.25° × 0.25° grid, and the differences between the high-resolution and smoothed fields are used to investigate the representation error. We note that these differences will include an interpolation error; however, such errors should be small and we only use these results qualitatively to motivate later discussions. Last, we emphasize that our choice of truncation at T100 is arbitrary and was chosen simply to provide a reasonable baseline to compare against the higher-resolution ECMWF forecasts.

The method described above differs from that of Oke and Sakov (2008) who applied a boxcar filter and then interpolated to observation location. The work of Mitchell and Daley (1997b), Liu and Rabier (2002), and Hodyss and Nichols (2015) show that a spectral truncation will isolate the representation error while the boxcar filter will not. We compute the variance of the difference between the high-resolution field and the spectrally truncated field to *qualitatively* understand the behavior of the representation error.
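As a stand-in for the spherical-harmonic truncation (which requires a spectral transform on the sphere), a one-dimensional Fourier analog illustrates how truncation error concentrates near sharp gradients:

```python
import numpy as np

def truncate(field, n_keep):
    """Zero all Fourier modes above wavenumber n_keep and invert:
    a 1-D analog of truncating spherical harmonics at, e.g., T100."""
    c = np.fft.rfft(field)
    c[n_keep + 1:] = 0.0
    return np.fft.irfft(c, n=field.size)

# A sharp "front": the truncation error is largest near the strong
# gradient, mimicking the behavior seen in Figs. 1-4.
x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
front = np.tanh(10.0 * np.sin(x))      # smoothed step function
rep_err = front - truncate(front, 20)  # realization of truncation error
```

A field containing only wavenumbers at or below the truncation limit passes through unchanged, which is why a spectral truncation, unlike a boxcar filter, cleanly isolates the subtruncation scales.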

The zonally averaged error variances due to truncation of spherical harmonics are obtained by averaging the squared error over the entire month, and are shown in Fig. 1 (July 2013) and Fig. 2 (January 2015) for temperature. We do not show the calculations below the 700-hPa level because the calculation at lower levels is dominated by the spectral truncation of the terrain. Analyses above 200 hPa are not currently available through the TIGGE archive. For the NH summer period, we see peaks in representation error between 200 and 250 hPa, near the jet maximum, extending from 45° to 75°N. The upper-tropospheric peak strengthens and shifts equatorward in the NH winter period (Fig. 2). We also see lower-tropospheric peaks between 35° and 40°N and near 70°–75°N; these peaks become stronger and more elongated in the NH winter season.

Zonally averaged error variance due to spectral truncation of the ECMWF 6-h deterministic forecast for January 2015. Shown for temperature.

Citation: Monthly Weather Review 145, 2; 10.1175/MWR-D-16-0299.1


### b. Ensemble variance as a predictor of representation error variance

In what follows, we will explore the relationship between the ensemble variance and the variance of the error of representation. To motivate such a discussion, we consider fluctuations of the real state about the filtered state, which may be associated with subfilter eddies or turbulence, or it may be related to the inability of the filtered state to represent persistent large-scale gradients. In the application considered in this section, where the difference in resolution of the original 0.25° × 0.25° grid and the T100 smoothed state is not that large and the resolution of the original grid is relatively coarse, we suspect that most of the error of representation is associated with resolving sharp gradients associated with fronts, jets, etc., rather than entirely missing subgrid-scale features. We also note that in this case, the error variance of representation is likely to be associated with representation errors that have a high degree of spatiotemporal correlation.

Relating representation error to jet structure or to synoptic-scale features that, while resolved, may be resolved poorly is a relatively new idea. To further develop this idea, Figs. 3 and 4 show a plan view of the error variance due to the T100 spectral truncation at 200 hPa, again shown for temperature. We see peaks in the representation error in the midlatitude storm-track regions. The representation error is more prominent in the NH winter and shifts poleward during the NH summer, following the jet maximum. The monthly averaged temperature fields for January 2015 and July 2013 (not shown) indicate significant departures from zonal structure, in particular the temperature gradient in January, which tracks from the northeastern coast of the United States to the United Kingdom. This gradient is visible in the structure of the representation error in Fig. 4.

Plan view error variance due to spectral truncation of the ECMWF 6-h deterministic forecast at 200 hPa for July 2013. Shown for temperature.


Plan view error variance due to spectral truncation of the ECMWF 6-h deterministic forecast at 200 hPa for January 2015. Shown for temperature.


These arguments highlight the fact that representation error is partially due to smoothed boundaries of large-scale features. Since the ensemble variance from a coarse-resolution model could be expected to be higher in regions of uncertain boundaries, it seems likely that ensemble variance could be a good predictor of this component of the observation error of representation. To assess this possibility, we bin ensemble variance and representation error (defined as high-minus-low-resolution states) pairs, defined for temperature at each spatial and temporal point, into 10 equally populated bins as a function of ensemble variance. Figure 5 shows the mean ensemble variance (abscissa) for each bin and the variance of the observation representation error for the entire globe for the month of January (2 076 480 points per bin). The line on the figure is the relationship between ensemble variance and representation error variance as determined by linear regression. The linear regression fits the data remarkably well, particularly at and near the jet maximum at the 200–250-hPa levels and near surface, in qualitative agreement with Fig. 2. A mathematical discussion as to why the ensemble variance and the observation error variance of representation should be expected to be related can be found in the appendix. This encourages us to further develop the concept of using the ensemble variance as a predictor of representation error.
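The binning-and-regression diagnostic behind Fig. 5 can be sketched as follows; the synthetic linear variance law in the test is an assumption for illustration, not the fitted relationship from the figure.

```python
import numpy as np

def bin_and_regress(ens_var, rep_err, n_bins=10):
    """Sort (ensemble variance, representation error) pairs into
    equally populated bins by ensemble variance, then regress the
    bin variance of the representation error on the bin-mean
    ensemble variance, as in Fig. 5."""
    order = np.argsort(ens_var)
    bins = np.array_split(order, n_bins)  # equally populated bins
    x = np.array([ens_var[b].mean() for b in bins])
    y = np.array([rep_err[b].var() for b in bins])
    slope, intercept = np.polyfit(x, y, 1)  # linear fit (red line)
    return slope, intercept
```

A positive fitted slope is the signature, used throughout the remainder of the paper, that representation error variance grows with ensemble variance.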

The bin mean ensemble variance vs the bin mean observation error variance of representation due to spectral truncation for 10 equally populated bins based on values of ensemble variances (blue dots). Results are shown for temperature for the month of January. The linear regression is also shown (red line).


## 4. Analysis of the H–L and Desroziers methods

In this section we further explore the predictive relationship between the ensemble variance and the variance associated with representation error, qualitatively illustrated in the previous section. In what follows, we address how well commonly used statistical methods estimate the error of representation and explore the dependence of that estimate on ensemble variance.

### a. Model and observational data

We use innovation statistics obtained from the U.S. Navy Global Environmental Model (NAVGEM; Hogan et al. 2014) at T425 spectral resolution (31 km) and 60 vertical levels for a NH winter season (1 December–4 February 2014) and for a NH summer season (15 May–15 August 2013) at 0000 and 1200 UTC. The data assimilation algorithm used is a hybrid 4DVAR [for details see Kuhl et al. (2013)] in which ensemble and static error covariances are combined at the initial time and a tangent linear model (TLM) and its adjoint are used to linearly propagate the initial ensemble-based covariance matrix across the data assimilation window and no outer loop is employed. Typically, an “alpha” weighting of

To avoid complications of a complex observation operator, correlations introduced by variational bias corrections, and correlations due to dense observations, we restrict our observation type to radiosondes. For this case, the observation operator is simply an interpolation to observation location. Also, to reduce the possibility of instrument errors differing between radiosonde types, we further restrict our study to only Vaisala RS-92 radiosondes. We restrict the radiosonde type to ensure that a regionally varying instrument error due to the use of different radiosondes in different regions is not misinterpreted as representation error.

The geographical distribution of these observations is shown in Fig. 6. For our study, we focus on the Northern Hemisphere (30°–90°N), keeping in mind that the majority of these observations come from stations in western Europe or from European ships in the northern Atlantic. In our setup, the background forecast is run at T425 resolution. The ensemble is run at the lower inner loop resolution of T119L60 (360 × 180 Gaussian grid with 60 vertical levels) and is only used to produce the background error covariance terms.

The geographical distribution of Vaisala RS92 radiosondes at 1000 hPa shown for the NH summer period.


### b. Desroziers and H–L methods

The subscript *D* is used to differentiate the estimated quantity using the Desroziers method from the true value. The background error covariance matrix can be similarly obtained by taking the expected value of the outer product of the analysis increment

As mentioned previously, errors in the assumed background or observation error covariance matrices can lead to errors in the Desroziers estimate (e.g., Ménard 2016; Waller et al. 2016, among others).

The H–L method uses innovation statistics binned by distance and assumes that the observation error is not horizontally correlated. By extrapolating the spatial autocorrelation of the innovations to the origin (zero separation), one can separate the observation error variance from the background error variance. Although the H–L method is less dependent on the assumed covariance matrices, it requires a dense observing network to bin observations by distance and is sensitive to the choice of correlation function. In what follows, we simply use the H–L method as a consistency check for the Desroziers method, to better understand which features of our analysis result from deficiencies in the prescribed covariance models.

One may wonder how well, or if, these methods account for representation error in their estimate of the observation error covariance matrix. Hodyss and Satterfield (2017) explicitly extend the method of Desroziers et al. (2005) to the case where the model state vector is at a lower resolution than the true system state vector. It is shown that both methods provide estimates of observation error variance that include representation error variance, but are also influenced by bias terms and a portion of the resolved-scale error variance, usually associated with the background error covariance matrix.
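For a single observation type with a scalar observation operator (as with the radiosonde temperatures used here), the Desroziers diagnostic reduces to a mean product of residuals; a minimal sketch, with an idealized scalar assimilation as the check rather than NAVGEM output:

```python
import numpy as np

def desroziers_obs_variance(omb, oma):
    """Desroziers et al. (2005) diagnostic for one observation type:
    sigma_o^2 is estimated by the mean product of the
    observation-minus-background (omb) and observation-minus-analysis
    (oma) residuals, with means removed first as in section 4c."""
    omb = omb - omb.mean()
    oma = oma - oma.mean()
    return float(np.mean(omb * oma))
```

In an idealized scalar assimilation with the correct gain, this product recovers the true observation error variance, consistent with the "consistency check" interpretation discussed in section 1.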

### c. Estimates of observation error variance as a function of ensemble variance

Because of the choice of observations, we do not expect instrument error or error in the observation operator to vary as a function of ensemble variance, whereas section 3 strongly suggests that the error of representation is an increasing function of ensemble variance. To test whether such a relationship exists in the real observation error variance of representation, we divide the data into two equally populated bins based on ensemble variance. We perform Desroziers and H–L analyses separately for each bin. For both methods, we first remove the temporal and spatial mean from the innovation, analysis-residual, and increment terms. For the H–L method, we then define 200-km distance bins and fit a second-order autoregressive (SOAR) correlation function to extrapolate to zero separation. (The use of the SOAR function is consistent with the static forecast error covariance model used in NAVDAS-AR.) We tested 200-, 300-, and 400-km bin sizes (not shown) and verified that wider bin widths result in greater smoothing, an increase in the correlation length scale of the background covariance, and a reduction in the magnitude of the background variance at zero separation; this behavior increases the estimated observation error variance with increasing bin size. We use 200-km distance bins to maintain a relatively small bin size while retaining a sufficient number of observations per bin. To obtain confidence intervals for the H–L method, we first obtain the 95% confidence interval from the first (0–200 km) distance bin and fit SOAR functions to the upper and lower bounds of that interval. The extrapolation to zero separation then provides an estimate of the 95% confidence interval of the H–L estimated background error variance. Observation error variance confidence intervals are defined by the difference between the innovation variance and the confidence intervals for the background error variance.

Figures 7 and 8 compare the Desroziers result with the H–L method for observations of temperature. For all experiments, the assumed observation error variance for temperature observations at all latitudes is 1 K^{2} from the surface to 300 hPa, above which the values increase exponentially to 4.14 K^{2} at approximately 10 hPa. For the NH summer season, Fig. 7 shows that, as anticipated, the estimated observation error variance in the 150–300-hPa layer is an increasing function of ensemble variance, indicating the presence of representation error. We again see evidence of lower-tropospheric representation error at 700 and 850 hPa. For the NH winter case, we still see variation in observation error variance estimates near the jet maximum; however, it extends to lower levels in the atmosphere, consistent with Fig. 2. We also see stratospheric indications of representation error, potentially related to the polar night jet. The lower layer between 850 and 925 hPa consistently shows an indication of representation error. In general, the H–L method and the Desroziers method show fairly good agreement, both in their quantitative estimates of observation error and in the atmospheric levels at which observation error variance estimates are an increasing function of ensemble variance, indicating the presence of representation error. The vertical structure of representation error provided by these estimates is qualitatively consistent with the structure illustrated by spectral truncation in Figs. 1 and 2, showing peaks in representation error at jet levels and at lower levels in the atmosphere.

H–L and Desroziers recovered observation error variances for the upper 50% ensemble spread values and the lower 50% of ensemble spread values in the Northern Hemisphere for 17 pressure levels. Results are shown for temperatures during the NH summer period.


H–L and Desroziers recovered observation error variances for the upper 50% ensemble spread values and the lower 50% of ensemble spread values in the Northern Hemisphere for 17 pressure levels. Results are shown for temperatures during the NH winter period.


### d. Potential errors in the Desroziers method when binned by ensemble variance

To better assess potential deficiencies in the prescribed background error covariance we turn our attention to the ensemble variance. The ensemble variance should provide an estimate of the forecast error variance and therefore should approximate

The innovation variance (red) and the estimated forecast error variance (blue) for each of 20 equally populated bins based on ensemble variance. The black line indicates the theoretical case in which the forecast error variance is exactly equal to the ensemble variance. Results are shown for the NH winter period.


Using a toy model, Hodyss and Satterfield (2017) showed that although an overdispersive ensemble would result in an underestimated value for the observation error variance [consistent with Waller et al. (2016) and Ménard (2016) among others], when binning by ensemble variance a negative slope would be expected. This result implies that the findings in section 4c, that observation error variance estimates were an increasing function (positive slope) of ensemble variance, could not result from an overdispersive ensemble. In fact, the separation between low and high ensemble variance bins shown in Figs. 7 and 8 is likely an underestimate.

## 5. Accounting for representation error in the observation error covariance matrix

The purpose of this section is to show how predictive relationships could be derived to model the observation error covariance in a manner that accounts for variations due to representation error. To achieve this, we apply a binning method to temperature innovations (as in Figs. 7 and 8) using five equally populated bins based on ensemble variance. We apply the Desroziers method separately in each bin. As a consistency check, the H–L method is also applied; however, due to limited data, we apply the H–L method in only three bins. Figures 10 (NH summer season) and 11 (NH winter season) show the observation error variance estimates obtained from H–L (blue) and from Desroziers (red). We also show the linear function (solid green line) obtained by regressing the eight bin estimates of observation error variance (five Desroziers bin estimates and three H–L bin estimates) onto the corresponding bin-mean ensemble variance.

Linear regression (solid green line) for observation error variance as a function of ensemble spread based on five equally populated bins using the Desroziers method (red) and three equally populated bins using the H–L method (blue). Shown for the NH summer season.


Linear regression (solid green line) for observation error variance as a function of ensemble spread based on five equally populated bins using the Desroziers method (red) and three equally populated bins using the H–L method (blue). Shown for the NH winter season.

Citation: Monthly Weather Review 145, 2; 10.1175/MWR-D-16-0299.1

Linear regression (solid green line) for observation error variance as a function of ensemble spread based on five equally populated bins using the Desroziers method (red) and three equally populated bins using the H–L method (blue). Shown for the NH winter season.

Citation: Monthly Weather Review 145, 2; 10.1175/MWR-D-16-0299.1

Linear regression (solid green line) for observation error variance as a function of ensemble spread based on five equally populated bins using the Desroziers method (red) and three equally populated bins using the H–L method (blue). Shown for the NH winter season.

Citation: Monthly Weather Review 145, 2; 10.1175/MWR-D-16-0299.1

For these regressions, one could reasonably assume that the *y*-intercept values correspond to the instrument error plus any minimum value of (static) representation error and model error. The *y*-intercept term allows one to prescribe a static error, as is currently done in the observation covariance model, providing a hard minimum for the observation error variance. The slope term then allows representation error to vary as a function of ensemble variance. In an application, one would simply apply a linear interpolation in the vertical, and potentially by region, to the *y* intercept (static observation error variance) and to the slope term, which would then provide a flow-dependent field as an estimate of the representation error variance. As discussed in section 4d, errors in ensemble dispersion may introduce error in both the slope and intercept terms; in such a case these values may need to be tuned.
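A minimal sketch of how the fitted intercepts and slopes could be applied, assuming they have been estimated at a set of pressure levels, is given below. The function name and arguments are illustrative assumptions; only the linear model and the hard-minimum behavior follow the description above.

```python
import numpy as np

def predict_obs_error_variance(p_obs, ens_var_obs, p_levels, intercepts, slopes):
    """Flow-dependent observation error variance from the fitted linear
    model sigma2_o(p) = a(p) + b(p) * (ensemble variance), with the
    intercept a and slope b linearly interpolated in the vertical.
    p_levels must be in increasing order for np.interp.
    The intercept acts as a hard minimum (static) error variance."""
    a = np.interp(p_obs, p_levels, intercepts)   # static component
    b = np.interp(p_obs, p_levels, slopes)       # flow-dependent component
    return np.maximum(a, a + b * ens_var_obs)    # never below the static floor
```

For example, an observation at a fitted level inherits that level's intercept and slope directly, while an observation between levels receives interpolated values; when the ensemble variance is zero, the prediction collapses to the static floor.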

One concern about such a method may be whether including background error covariance information in the prediction of observation error variances would violate the assumption of independence between observation and background errors. We note that this should not be the case: the background information is used only to predict the magnitude of the observation error variance, not the errors themselves, so the forecast errors and observation errors should remain uncorrelated when this method is applied.

It has been found by Kuhl et al. (2013), among others, that the inclusion of ensemble covariances in hybrid 4DVAR data assimilation schemes increases the number of inner loops required for inner loop convergence. It has been argued that this is a direct result of the inclusion of the ensemble covariances greatly increasing the range of the ratio of forecast error variance to observation error variance. Examining Figs. 10 and 11, we note that for the NH winter season at 250 hPa our scheme would produce observation error variances ranging from 0.42 to 2.74 K², a ratio of largest to smallest predicted observation error variance of 6.56, when the full range of ensemble variances is considered. It is of interest to note that implementing our proposed ensemble variance-based observation error variance prediction scheme would narrow the range of the ratio of forecast to observation error variance because it would cause the observation and forecast error variances to move in unison. This should lead to lower condition numbers and inner loop convergence in fewer iterations. We look forward to exploring these possibilities in future work.
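The moderating effect on the forecast-to-observation error variance ratio can be seen with a short numerical illustration; the numbers below are hypothetical, not the values in Figs. 10 and 11.

```python
import numpy as np

# Hypothetical background (ensemble) error variances spanning a wide range.
B = np.array([0.1, 0.5, 1.0, 2.0, 4.0])   # K^2, illustrative only
R_static = 1.0                             # fixed observation error variance
a, b = 0.4, 0.6                            # hypothetical intercept and slope
R_predicted = a + b * B                    # flow-dependent model

ratio_static = B / R_static
ratio_predicted = B / R_predicted

# With static R, the ratio B/R spans the full 40x range of B; with the
# predicted R, it varies far less, since R moves in unison with B.
spread_static = ratio_static.max() / ratio_static.min()
spread_predicted = ratio_predicted.max() / ratio_predicted.min()
```

In the limit of large ensemble variance the predicted ratio saturates near 1/b, which is what bounds the spread and, in turn, should improve the conditioning of the inner-loop minimization.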

## 6. Discussion and conclusions

In this study, we used variances computed by taking differences between model states and their spectrally truncated form to produce maps that qualitatively predict representation error. In this idealized framework, we demonstrated a predictive relationship between the ensemble variance and the variance due to representation error. Using real observations and a cycling operational data assimilation scheme, we demonstrated that *both* the Desroziers and the H–L methods result in observation error variance estimates that are clearly an increasing function of ensemble variance at most of the levels examined. The results strongly support the hypothesis that the observation error variance of representation is an increasing function of the variance of a well-formed ensemble. By limiting the dataset to RS-92 radiosondes, which should not have a flow-dependent instrument error or large observation operator error, and through comparison to the previously mentioned maps of representation error, we were able to determine regions where we believe the spatially varying component of observation error variance to be dominated by representation error. Thus, we outlined a procedure in which several estimation methods can be used in concert to diagnose the error of representation, and we derived a linear relationship between the ensemble variance and the error of representation that could be easily implemented to account for flow dependence in the observation error covariance matrix. Such a procedure is general enough to be applied to other observation types, although potential spatial variations in instrument error as well as correlated errors need to be accounted for.
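The truncation-based estimate described above can be sketched in one dimension with a Fourier analogue of the spherical-harmonic truncation. This is an idealized illustration under our own assumptions (periodic 1-D fields, FFT truncation), not the procedure applied to the ECMWF fields.

```python
import numpy as np

def truncation_residual_variance(fields, k_trunc):
    """Variance of the difference between each (periodic, 1-D) field and
    its spectrally truncated form -- a simple 1-D analogue of the
    spherical-harmonic truncation used to map representation error.

    fields  : (n_samples, n_points) array of model states
    k_trunc : retain wavenumbers |k| <= k_trunc
    """
    n = fields.shape[-1]
    spec = np.fft.fft(fields, axis=-1)
    k = np.fft.fftfreq(n) * n                       # integer wavenumbers
    spec_trunc = np.where(np.abs(k) <= k_trunc, spec, 0.0)
    filtered = np.fft.ifft(spec_trunc, axis=-1).real
    residual = fields - filtered                    # subfilter-scale part
    # Variance of the unresolved part at each grid point, across samples.
    return residual.var(axis=0)
```

For a field built from a resolved large-scale wave plus an unresolved small-scale wave with random phase, the residual variance recovers the variance of the small-scale component alone.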

It was shown in Hodyss and Nichols (2015) that the observation error covariance matrix in the presence of representation error should be calculated using the prior ensemble. We have attempted in this work to verify this possibility, and to the degree that we have, this work opens the door to a new approach to ensemble-enhanced data assimilation in which the observation error variances are adjusted based on the ensemble variance. The variations found in the observation error variance are nonnegligible and would likely reduce the condition number associated with the 4DVAR inner loop. This reduction in condition number would, in principle, reduce the number of iterations required to minimize the 4DVAR cost function.

We acknowledge the possibility that bias and/or model error terms influence our estimation of the observation error variance, and further that the ensemble prediction could reflect such terms. We have therefore limited our discussion to levels where the influence of representation error is clearer: representation error peaked at 200 hPa based on applying spherical-harmonic truncation to ECMWF forecasts, and similar results were seen using the Desroziers and H–L methods. However, there are levels where the behavior is less clear. Further work may more accurately address some of the limitations due to the inability to disentangle the model error terms from the observation error terms in statistical estimation methods.

Although a component of the representation error can be explained by static predictors (e.g., latitude or climatological error variance), the ensemble variance will also incorporate dynamic information into the prediction. We expect the introduction of flow dependence to be highly beneficial, as we have shown that subgrid-scale boundaries associated with strong gradients are a key contributor to the error of representation. We further note that although this study focuses on representation errors related to poorly resolved features, the ensemble may also capture the representation error related to subgrid-scale features, through variance associated with physics parameterizations. However, quantifying the benefits of using a static model of observation error variances, for example, based on climatological error of representation, versus a flow-dependent model is left to future work. Future work is also required to quantify the performance gains realized by predicting the error of representation using ensemble variances and whether changing the gain matrix in the manner suggested by Hodyss and Nichols (2015) is necessary to realize such benefits. Finally, while we focus on the diagonal contribution (sparsely placed radiosondes), such an understanding could be extended to correlation terms as well.

## Acknowledgments

This research is supported by the Chief of Naval Research through the NRL Base Program, PE 0601153N. The forecasts were obtained from the THORPEX Interactive Grand Global Ensemble (TIGGE) data portal at ECMWF. Two anonymous reviewers made several suggestions that helped us improve the presentation of our results.

## APPENDIX

### Observation Error Variance as a Function of the Prior

*β* is a constant then

## REFERENCES

Chapnik, B., G. Desroziers, F. Rabier, and O. Talagrand, 2004: Properties and first application of an error-statistics tuning method in variational assimilation. *Quart. J. Roy. Meteor. Soc.*, **130**, 2253–2275, doi:10.1256/qj.03.26.

Daley, R., 1993: Estimating observation error statistics for atmospheric data assimilation. *Ann. Geophys.*, **11**, 634–647.

Desroziers, G., L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of observation, background and analysis-error statistics in observation space. *Quart. J. Roy. Meteor. Soc.*, **131**, 3385–3396, doi:10.1256/qj.05.108.

Forget, G., and C. Wunsch, 2007: Estimated global hydrographic variability. *J. Phys. Oceanogr.*, **37**, 1997–2008, doi:10.1175/JPO3072.1.

Frehlich, R., 2006: Adaptive data assimilation including the effect of spatial variations in observation error. *Quart. J. Roy. Meteor. Soc.*, **132**, 1225–1257, doi:10.1256/qj.05.146.

Hodyss, D., and N. Nichols, 2015: The error of representation: Basic understanding. *Tellus*, **67A**, 24822, doi:10.3402/tellusa.v67.24822.

Hodyss, D., and E. Satterfield, 2017: The treatment, estimation, and issues with representation error modelling. *Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications*, S. K. Park and L. Xu, Eds., Vol. III, Springer International Publishing, 177–194.

Hogan, T. F., and Coauthors, 2014: The Navy Global Environmental Model. *Oceanography*, **27**, 116–125, doi:10.5670/oceanog.2014.73.

Hollingsworth, A., and P. Lönnberg, 1986: The statistical structure of short-range forecast errors as determined from radiosonde data. Part I: The wind field. *Tellus*, **38A**, 111–136, doi:10.1111/j.1600-0870.1986.tb00460.x.

Janjić, T., and S. E. Cohn, 2006: Treatment of observation error due to unresolved scales in atmospheric data assimilation. *Mon. Wea. Rev.*, **134**, 2900–2915, doi:10.1175/MWR3229.1.

Kuhl, D. D., T. E. Rosmond, C. H. Bishop, J. McLay, and N. L. Baker, 2013: Comparison of hybrid ensemble/4DVar and 4DVar within the NAVDAS-AR data assimilation framework. *Mon. Wea. Rev.*, **141**, 2740–2758, doi:10.1175/MWR-D-12-00182.1.

Liu, Z.-Q., and F. Rabier, 2002: The interaction between model resolution, observation resolution and observation density in data assimilation: A one-dimensional study. *Quart. J. Roy. Meteor. Soc.*, **128**, 1367–1386, doi:10.1256/003590002320373337.

McLay, J., C. H. Bishop, and C. A. Reynolds, 2010: A local formulation of the ensemble-transform (ET) analysis perturbation scheme. *Wea. Forecasting*, **25**, 985–993, doi:10.1175/2010WAF2222359.1.

Ménard, R., 2016: Error covariance estimation methods based on analysis residuals: Theoretical foundation and convergence properties derived from simplified observation networks. *Quart. J. Roy. Meteor. Soc.*, **142**, 257–273, doi:10.1002/qj.2650.

Mitchell, H. L., and R. Daley, 1997a: Discretization error and signal/error correlation in atmospheric data assimilation (I). All scales resolved. *Tellus*, **49A**, 32–53, doi:10.1034/j.1600-0870.1997.00003.x.

Mitchell, H. L., and R. Daley, 1997b: Discretization error and signal/error correlation in atmospheric data assimilation (II). The effect of unresolved scales. *Tellus*, **49A**, 54–73, doi:10.1034/j.1600-0870.1997.t01-4-00004.x.

Oke, P. R., and P. Sakov, 2008: Representation error of oceanic observations for data assimilation. *J. Atmos. Oceanic Technol.*, **25**, 1004–1017, doi:10.1175/2007JTECHO558.1.

Richardson, D., 2005: The THORPEX Interactive Grand Global Ensemble (TIGGE). *Geophys. Res. Abstr.*, **7**, 02815, Abstract EGU05-A-02815. [Available online at http://meetings.copernicus.org/www.cosis.net/abstracts/EGU05/02815/EGU05-J-02815.pdf.]

van Leeuwen, P. J., 2015: Representation errors and retrievals in linear and nonlinear data assimilation. *Quart. J. Roy. Meteor. Soc.*, **141**, 1612–1623, doi:10.1002/qj.2464.

Waller, J. A., S. L. Dance, and N. K. Nichols, 2016: Theoretical insight into diagnosing observation error correlations using observation-minus-background and observation-minus-analysis statistics. *Quart. J. Roy. Meteor. Soc.*, **142**, 418–431, doi:10.1002/qj.2661.