## 1. Introduction

Several recent studies have shown that the short-range forecast errors in state-of-the-art numerical weather prediction (NWP) models are largest in the tropics. This is illustrated by the standard deviation of an ensemble of 6-h forecasts initialized by an ensemble of 4D-Var analyses produced by the operational system of ECMWF on a randomly chosen recent date (initial time for the forecast was 2100 UTC 15 June 2015; Fig. 1). The standard deviation of an ensemble of forecasts (also denoted spread) measures the forecast uncertainty, which in Fig. 1 is largest in the tropical upper troposphere and the lower stratosphere. The analysis uncertainty looks very similar, but the amplitudes are somewhat smaller. Similar results were presented in several recent papers (Žagar et al. 2013; Cardinali et al. 2014; Žagar et al. 2015a). With a permanent maximum of analysis and forecast uncertainty in the tropical upper troposphere, it is not surprising that the description of the tropopause and stratospheric variability, which is maintained by the vertically propagating equatorial waves, is not reliable (e.g., Fujiwara et al. 2012; Podglajen et al. 2014).

At the same time, data assimilation for NWP has traditionally been focused on the reduction of the short-range forecast errors in the synoptic scales in the midlatitude troposphere. From early on, the modeling of the background error covariance matrix, which is crucial for the success of data assimilation, has relied on the understanding that the short-range forecast errors are characterized by balances which govern the underlying dynamics (Daley 1991). Therefore, quasigeostrophic (QG) balance between the errors in temperature and winds has been utilized in several operational global and mesoscale NWP systems (Parrish and Derber 1992; Courtier et al. 1998; Derber and Bouttier 1999; Gustafsson et al. 2001). In the midlatitudes, this approach includes the use of balanced and unbalanced components of temperature and divergent wind increments where balance is defined for example by the linear balance equation on the sphere (Derber and Bouttier 1999). The balance can be only approximate as in the case of statistically defined coupling between streamfunction and temperature variables in the NCEP data assimilation system (Wu et al. 2002) and the balanced increments may be further changed by steps such as the incremental normal-mode procedure (Kleist et al. 2009).

In the tropics, perturbations in the geopotential field are small and they appear to be only weakly coupled to the wind field; this applies to both full dynamical fields and their forecast errors. Coupling of the forecast errors in the tropical wind field and the geopotential height was investigated in Žagar et al. (2004) and Žagar et al. (2005); they demonstrated that the apparent decoupling is a consequence of the mixture of the tropical mass–wind relationships involving a spectrum of the inertio-gravity (IG) waves in addition to the QG coupling of the equatorial Rossby waves. For example, the most important features of the large-scale tropical circulation involve the Kelvin wave, the zonally propagating mode characterized by the geostrophic coupling between the zonal wind and the meridional pressure gradient. Žagar et al. (2004, 2005) showed that the Kelvin mode, with its strong positive coupling between the geopotential height and the zonal wind, acts to reduce the negative coupling between the mass and the zonal wind that is imposed by the QG balance at the equator. Furthermore, a new model for the tropical 4D-Var data assimilation revealed that the presence of other westward- and eastward-propagating equatorial IG modes in the background error-covariance model traps the analysis increments close to the equator and further reduces the mass–wind coupling in the tropics. This illustrates that tropical data assimilation presents challenges that far exceed those in the midlatitudes, where the first-order description of the synoptic-scale dynamics successfully relies on a single mass–wind coupling (i.e., that of the Rossby modes).

As the global NWP models increase their resolution to about 10 km and data assimilation for mesoscale models is now applied at horizontal scales close to 1 km, the dynamical issues currently important for the tropics become equally relevant for mesoscale data assimilation. Moreover, Harlim and Majda (2013) extended the idea of involving the spectrum of IG waves in the background error term for data assimilation to the case with a full moist background covariance model and demonstrated that tropical 3D-Var assimilation with the moist covariances can recover the unobserved tropospheric humidity and precipitation rate. The issue of the tropical mass–wind coupling is even more relevant given the lack of direct wind observations in the tropics as discussed in a number of studies (e.g., Podglajen et al. 2014; Baker et al. 2014).

While the above-mentioned tropical data assimilation studies have been limited to simplified frameworks and the tropical channel, the operational NWP models systematically demonstrate the forecast skill improvement associated with the improvement of the simulation of tropical variability (e.g., Vitart and Molteni 2010). This is a result of systematic improvements of the observing systems, of the data assimilation procedures that turn these observations into improved initial conditions (ICs) for forecasts, and of the models’ physics, dynamics, and numerics (e.g., Magnusson and Källén 2013). Further advancement of NWP will depend on improvements of all these components. In particular, the model error remains a large uncertainty source in NWP; its impact on data assimilation and on the uncertainties in ICs for the medium-range forecasts is currently simulated by parameterizations such as the inflation of the background error covariances (e.g., Houtekamer and Mitchell 2005; Anderson 2009), the stochastically perturbed parameterization tendencies (e.g., Buizza et al. 1999), and the backscatter methods (e.g., Shutts 2005, 2013).

Given the complexity of moist convection and deficiencies of the tropical data assimilation in global NWP models, one wonders about the role of the model errors in producing the observed forecast error maxima in the tropics. Furthermore, we ask whether the structure of short-term forecast errors is different in the case of the ensemble Kalman filter (EnKF) data assimilation using the flow-dependent background error covariances in contrast to the variational framework behind Fig. 1.

These questions are addressed here by carrying out a data assimilation experiment involving a perfect model and flow-dependent background error covariances. The experimental framework is the so-called observing system simulation experiment (OSSE) in which the “truth” or “nature run” is simulated by the same model used for the assimilation. This is also known as the “identical twin” OSSE experiment (e.g., Atlas 1997). By adopting the OSSE framework and the EnKF data assimilation, this study focuses on the scale-dependent information content of the wind and temperature observations in the EnKF in relation to the underlying dynamics. For this purpose, we derive a transformation of the state vector based on the 3D normal-mode function decomposition that provides diagonal background and analysis error covariance matrices and allows the scale-dependent quantification of the prior variance reduction by the assimilated observations. In particular, the perfect-model assumption facilitates discussion of the dynamical properties of data assimilation in the tropics and the role of the balanced and inertio-gravity (unbalanced) dynamics.

The paper is organized as follows. Section 2 presents the methodology for the quantification of the information content of observations as a function of the spatial scales and dynamical balance. The section includes a relationship between the new model and the degrees of freedom for signal (DFS) and entropy reduction concepts for data assimilation. In section 3, we present the modeling framework including the global forecast model and the ensemble Kalman filter data assimilation. Properties of the short-range forecast errors produced by using a perfect model with an EnKF are discussed in section 4. Section 5 discusses the value of observations in terms of their contribution to the reduction of the uncertainty in the state estimate. Discussion and conclusions are given in section 6.

## 2. Scale-dependent representation of the information content of observations in ensemble data assimilation

### a. General framework

Several measures are used to compute the value of various observing systems in operational data assimilation for NWP. They compare the uncertainty of the background (prior) state as described by the probability density

*n* denotes the dimension of the state vector

*t*.

Fisher (2003) showed how to compute the entropy reduction and DFS for a large variational assimilation system. He defined DFS by using a transformation *N* statistically independent directions of the transformed state vector (i.e., as the reduction in each degree of freedom of the background error).

In this study, we present a solution for the matrix

### b. Representation of the error covariance matrix

The representation of the error covariances relies on the idea that the analysis and forecast error covariances could be represented by the normal modes of the model. The assumption that the errors in different model modes are uncorrelated results in the diagonal background and analysis error covariance matrices *β* plane by using the nonlinear shallow-water equations (Daley 1993; Žagar et al. 2004). The present formulation can be seen as the extension of the previous work to the global 3D framework. The 3D framework applies solutions for the three-dimensionally orthogonal normal-mode functions of the primitive equations on terrain-following levels in terms of the discrete spectrum of the Rossby and inertio-gravity waves and their zonally averaged states. The basic state is a state of rest with vertical temperature and stability profiles representing the model average state over the simulated period. In comparison to other orthogonal representations of modes of variability of model forecasts and forecast errors, the normal-mode function decomposition extracts physical wave modes that constitute solutions to the global linear primitive equation system. While it cannot be expected that error covariances in NWP models are described well by diagonal matrices, the application of normal modes serves well to discuss the multivariate nature of some of the leading modes, especially in the tropics.

*λ*,

*φ*, and

*j* in the sigma coordinate system, respectively. The new indices

*σ* coordinates, described by the vertical eigenfunctions and a horizontal Hough function expansion (Kasahara and Puri 1981; Žagar et al. 2015b):

Here, an input data vector

*h* defined on the horizontal regular Gaussian grid and vertical sigma levels at time step

*t*, with the time subscript dropped:

*h*, where

*z* is the height corresponding to the hydrostatic pressure, and

*g* is Earth’s gravity. Also,

*σ* level,

*σ* coordinate system and solved by using the finite-difference method. The Hough harmonics are denoted by

*m*. The scaling matrix

*U*, meridional velocity

*V*, and the geopotential height

*Z* for each set of indices

*u*,

*υ*, and

*h* because the expansion in (6) is complete as the truncation limits

*M* is limited by the number of vertical levels. The maximal number of zonal waves in (6) is denoted by

*K*, starting from

*R* combines equal numbers of balanced modes

*χ*, can be written with the help of a transformation operator

to the model physical space:

Similar to Žagar et al. (2004), we call it “pseudoinverse” because (10) should in principle also include an operator representing the three truncations: the Fourier truncation, the meridional truncation for the Hough vectors, and the vertical truncation. Since such an operator is not invertible, its pseudoinverse (denoted

*χ*

### c. Information content of observations in modal space

can be written as

*χ* becomes

*χ* *ν* as

It describes the relative information content in the observations and the prior ensemble in a spatial scale characterized by the zonal wavenumber

*k*, meridional mode

*n*, and the vertical mode

*m*. The modal index

*ν*, the quantity

*t*. Equations (12) and (13) are expressed in terms of

## 3. Ensemble Kalman filter data assimilation with a perfect-model framework

The modeling framework is based on the ensemble adjustment Kalman filter (EAKF) system in the Data Assimilation Research Testbed (DART; Anderson et al. 2009), which is applied with the global Community Atmosphere Model (CAM; Hurrell et al. 2006). The perfect-model framework provides a unique dataset for studying two important issues in data assimilation: the growth of forecast errors free of the model error component and the flow dependency of forecast errors independent of inhomogeneities of the observation network and the covariance inflation. The latter is ensured by constructing a global, nearly homogeneous network of observations of the model prognostic variables. The DART assimilation system in the perfect-model framework works well without the need for covariance inflation. The implementation of the 80-member ensemble includes the horizontal and vertical localization of Gaspari and Cohn (1999), which limits the length of spatial correlations. The impact of the localization radius on the assimilation efficiency is examined using sensitivity experiments with varying localization radii from Lei and Anderson (2014).

### a. The atmospheric model and nature run

The CAM model is an atmospheric component of the NCAR Community Earth System Model, version 1 (CESM1; Gent et al. 2011). However, the applied dynamical core based on spherical harmonics is not a standard part of the CESM system but it suits our study because of the similarity of the spectral formulation with the representation in terms of the normal-mode functions. Spectral solutions are obtained with a T85 truncation that corresponds to approximately 150-km resolution at the equator. The model physics is the so-called CAM4 physics package (Neale et al. 2013). The model is coupled to the land surface model known as the Community Land Model (CLM; Gent et al. 2011) and prescribed sea surface temperature (SST). The SST is derived from the Hadley Centre (Hurrell et al. 2008); linearly interpolated weekly values are obtained from monthly average datasets that maintain the observed monthly means as near as possible. Time integration is Eulerian and the step is 10 min. The model vertical discretization is based on the hybrid sigma–pressure vertical coordinate and the discretization is the same as in Žagar et al. (2011); that is, 26 model levels include 8 levels above 100 hPa, 3 levels below 900 hPa, and the top model level located at about 3.7 hPa. The spectral dynamical core of CAM4 applies the second-order divergence damping in the top four model levels, its strength increasing with height.

For the nature run, we selected a year without a significant ENSO signal, 2008. In preparation for the OSSE, the atmospheric model is run continuously for 10 months starting from 1 January 2008, with the initial state taken from the NCEP–NCAR reanalyses. The last three months of the simulated period, 1 August–31 October, are used as “nature” for the simulation of observations used in the data assimilation. As the applied normal-mode function (NMF) decomposition relies on the *σ* coordinate, the first step before any projection is the interpolation of data from the hybrid levels to corresponding *σ* levels.
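The interpolation from hybrid to *σ* levels mentioned above can be sketched for a single model column as follows. This is a minimal illustration only: the function name `hybrid_to_sigma`, the coefficient arrays `a_mid` and `b_mid`, the five-level grid, and the choice of linear interpolation in log-pressure are all assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def hybrid_to_sigma(column, a_mid, b_mid, ps, sigma, p0=1.0e5):
    """Interpolate one model column from hybrid levels to sigma levels.

    Levels are ordered from the model top downward, so pressure increases
    with index, as np.interp requires.
    """
    p_hybrid = a_mid * p0 + b_mid * ps   # pressure on hybrid levels (Pa)
    p_sigma = sigma * ps                 # pressure on target sigma levels (Pa)
    # Linear interpolation in log-pressure (an assumed choice); values outside
    # the hybrid range are clamped to the end points by np.interp.
    return np.interp(np.log(p_sigma), np.log(p_hybrid), column)

# Hypothetical five-level column (not the 26-level CAM grid).
a_mid = np.array([0.01, 0.05, 0.10, 0.05, 0.00])
b_mid = np.array([0.00, 0.10, 0.40, 0.80, 0.99])
ps = 9.0e4                               # surface pressure (Pa)
sigma = np.array([0.2, 0.5, 0.8])        # target sigma levels

p_hybrid = a_mid * 1.0e5 + b_mid * ps
temperature = 200.0 + 10.0 * np.log(p_hybrid / 1.0e3)   # profile linear in log(p)
t_sigma = hybrid_to_sigma(temperature, a_mid, b_mid, ps, sigma)
```

Because the test profile is linear in log-pressure, the interpolated values reproduce it exactly at the target levels, which makes the sketch easy to check.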

### b. Observations and assimilation setup

The data assimilation system DART applies the serial EAKF (Anderson 2001; Anderson et al. 2009) to combine synthetic observations with an ensemble of forecasts from CAM4 to produce an ensemble of analyses. The DART/CAM (Raeder et al. 2012) has been applied for a number of studies including the localization problem (Lei and Anderson 2014).
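For readers unfamiliar with the serial EAKF, the update for a single scalar observation can be sketched as below, following the general description in Anderson (2001). This is a simplified illustration and not the DART implementation; the function names and the synthetic 80-member ensemble are assumptions made for the example.

```python
import numpy as np

def eakf_obs_increments(y_prior, y_obs, obs_var):
    """Increments of the prior observation estimates for one scalar observation."""
    prior_mean = y_prior.mean()
    prior_var = y_prior.var(ddof=1)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + y_obs / obs_var)
    # Deterministic update: shift the mean and contract the deviations.
    y_post = post_mean + np.sqrt(post_var / prior_var) * (y_prior - prior_mean)
    return y_post - y_prior

def regress_increments(x_prior, y_prior, dy, localization=1.0):
    """Regress observation increments onto one state variable."""
    cov_xy = np.cov(x_prior, y_prior, ddof=1)[0, 1]
    return localization * cov_xy / y_prior.var(ddof=1) * dy

# Synthetic example (hypothetical values, 80 members as in the experiment).
rng = np.random.default_rng(0)
n_members = 80
y_prior = rng.normal(250.0, 2.0, n_members)                  # prior temperature estimates (K)
x_prior = 0.5 * y_prior + rng.normal(0.0, 1.0, n_members)    # a correlated state variable

dy = eakf_obs_increments(y_prior, y_obs=251.0, obs_var=1.0**2)
y_post = y_prior + dy
x_post = x_prior + regress_increments(x_prior, y_prior, dy)
```

In a serial filter this pair of steps is repeated for each observation in turn, with the `localization` factor supplied by a function of the distance between the observation and the state variable.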

Synthetic observations are simulated over the period August–October from model outputs 12 h apart. A special observation network with radiosonde-like wind and temperature profiles is nearly homogeneous on the sphere (Fig. 2). The number of radiosonde profiles of temperature, zonal, and meridional winds is 600 at each analysis step. The average distance between the observation points is about 920 km, which is considered sufficient for the representation of the large-scale properties of the growth of ensemble spread and the properties of data assimilation in a perfect model without inflation.
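The quoted average spacing can be checked with a back-of-the-envelope estimate: if 600 points share the surface of the sphere equally, each covers an area whose square root is close to 920 km. The sketch below also generates a nearly homogeneous set of points with a Fibonacci lattice; that construction is an assumption made purely for illustration, since the text does not state how the network in Fig. 2 was built.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def mean_spacing_km(n_points, radius_km=EARTH_RADIUS_KM):
    """Square root of the surface area available to each observation point."""
    return np.sqrt(4.0 * np.pi * radius_km**2 / n_points)

def fibonacci_sphere(n_points):
    """Nearly homogeneous points on the sphere (latitude, longitude in degrees)."""
    i = np.arange(n_points) + 0.5
    lat = np.degrees(np.arcsin(1.0 - 2.0 * i / n_points))
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    lon = np.degrees((i * golden_angle) % (2.0 * np.pi))
    return lat, lon

spacing = mean_spacing_km(600)      # roughly 920 km
lat, lon = fibonacci_sphere(600)    # hypothetical network, not the one in Fig. 2
```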

A homogeneous network ensures that the flow-dependent covariances are not influenced by the distribution of observations (although they are influenced by the observation density). At each measurement point, temperature and wind profiles are interpolated to model equivalents of observations on model levels. Observations are simulated from the nature run by adding a random error from a Gaussian distribution of zero mean and with the standard deviation equal to the prescribed observation error for different variables. The observation errors in temperature are 1 K while winds are observed with 2 m s^{−1} errors. Observation errors are vertically homogeneous.
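The simulation of observations described above amounts to perturbing the nature-run values with independent Gaussian noise. A minimal sketch follows; the dictionary layout, the variable keys, and the constant "truth" profiles are assumptions made for the example, while the error standard deviations (1 K and 2 m s^{−1}), the 600 profiles, and the 26 levels are taken from the text.

```python
import numpy as np

# Prescribed observation error standard deviations (vertically homogeneous).
OBS_ERROR_STD = {"T": 1.0, "U": 2.0, "V": 2.0}   # K, m/s, m/s

def simulate_observations(truth, rng):
    """Add zero-mean Gaussian errors to nature-run values at observation points."""
    return {
        name: values + rng.normal(0.0, OBS_ERROR_STD[name], size=values.shape)
        for name, values in truth.items()
    }

rng = np.random.default_rng(42)
n_profiles, n_levels = 600, 26
# Placeholder nature-run values; a real application would interpolate model output.
truth = {
    "T": np.full((n_profiles, n_levels), 250.0),
    "U": np.zeros((n_profiles, n_levels)),
    "V": np.zeros((n_profiles, n_levels)),
}
obs = simulate_observations(truth, rng)
```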

The initial 80-member prior ensemble valid at 0000 UTC 1 August 2008 is produced by advancing an ensemble created by adding negligibly small, normally distributed perturbations to the simulated temperature field. Data assimilation is performed with an 80-member ensemble at 0000 and 1200 UTC starting from 0000 UTC 1 August. The observation operator bilinearly interpolates the model values to the horizontal locations of observations. Observations are assimilated on all levels up to the model top and they influence all model state variables. The covariance inflation, which is used to increase the ensemble spread, is not applied allowing purely dynamical error covariance structures. Examination of the time mean total spread and time mean root-mean-square error of prior estimates of the synthetic observations shows that the ensemble spread and root-mean-square error (RMSE) are comparable for all observation types in all regions and at all heights (figures not shown). A comparison of the state space prior RMSE and prior ensemble spread also shows comparable magnitudes when averaged over time for all regions, variables, and heights indicating that the assimilation system is working well without a need for multiplicative inflation. The implementation of the EAKF includes the vertical and horizontal covariance localization by Gaspari and Cohn (1999) to filter noisy background-error covariances. The half-width radius for the horizontal localization is 0.2 rad (around 1275 km) whereas the vertical localization is 300 hPa.
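The localization weights implied by the quoted half-width can be illustrated with the widely used fifth-order piecewise rational function of Gaspari and Cohn (1999). The sketch below is an assumption about the functional form (the text names the reference but does not reproduce the formula), and the function name is chosen for the example. The weight equals 1 at zero separation, about 0.21 at one half-width, and 0 beyond two half-widths.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def gaspari_cohn(distance, half_width):
    """Fifth-order piecewise rational localization weight (compact support)."""
    r = np.abs(np.atleast_1d(np.asarray(distance, dtype=float))) / half_width
    weights = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    weights[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                      - (5.0 / 3.0) * ri**2 + 1.0)
    ro = r[outer]
    weights[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                      + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return weights

half_width_rad = 0.2
half_width_km = half_width_rad * EARTH_RADIUS_KM     # about 1275 km
cutoff_km = 2.0 * half_width_km                      # weights vanish beyond this
w = gaspari_cohn(np.array([0.0, 0.2, 0.4, 0.5]), half_width_rad)
```

With a half-width of 0.2 rad, an observation therefore has no direct influence on state variables more than about 2550 km away.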

## 4. General properties of the EnKF with a perfect model in physical space

Some basic properties of our OSSE experiment are illustrated in Figs. 3–5, which present ensemble spread in 12-h forecasts and analyses. First of all, Fig. 3 shows time series of the globally integrated spread in the zonal wind variable as well as the spread in zonal wind and temperature variables at four grid points located at the model level close to 225 hPa. Two points are located in the tropics and two in the midlatitudes on the same longitude to illustrate flow-dependent properties of the spread in different regions. The longitude locations correspond to 0° and to 120°; that is, to equatorial Africa (point 1) and the western Pacific (point 2). The main characteristic of the presented curves is a large difference in the magnitude of the spread between the tropics and the midlatitudes. There are large temporal oscillations in the tropical spread on both the short range (a couple of days) and a longer time scale (2–3 weeks). Furthermore, there is no substantial difference in the amplitude of temperature spread in the tropics and midlatitudes whereas the difference in the spread of the zonal wind field is large; this applies especially for the point in the western Pacific where convection is active all the time. Figure 3 shows that it takes about 3 weeks for the system to reach a state in which the prior and posterior ensemble spread oscillate around their respective averages.

The flow-dependent properties of the prior ensemble spread are further illustrated in Fig. 4, which displays the zonal wind spread at three levels in the upper troposphere where the spread is the largest. In the midlatitudes, uncertainty in the 12-h forecast ensemble is associated with developing baroclinic systems on the midlatitude westerly flow, as seen in the eastward propagation of the ensemble spread throughout the troposphere. In contrast, the tropical spread contains both westward- and eastward-propagating features as well as some stationary features. The latter are best seen at the top of the intense convection over the Indian Ocean close to 100 hPa (Fig. 4a). Higher up, the spread emanating from this convective region is propagating eastward (not shown). We can also notice in Fig. 4 that the tropical spread has on average twice the amplitude of the midlatitude spread and that wide areas of tropical oceans are characterized by large forecast uncertainties.

When averaged zonally, the tropical upper troposphere spread appears as a dominant feature of our perfect-model experiment (Fig. 5). In other words, uncertainties in the global perfect-model experiment are on average very similar to those simulated by the state-of-the-art global NWP models including the model error (as presented in Fig. 1). The spread in the zonal wind field in Fig. 5a is strikingly similar to the outputs from the ECMWF model and data assimilation system shown in Žagar et al. (2013, see their Fig. 2) and in Cardinali et al. (2014, see their Fig. 3). The zonally averaged spread of the prior ensemble in Fig. 5 illustrates several properties of global dynamics. The maximum of spread in zonal wind is north of the equator over the ITCZ between 150 and 200 hPa. The spread maximum in the meridional wind is broader and stretched across the equator to the Southern Hemisphere in relation to the cross-equatorial circulation. The tropical maximum of temperature spread is found above the spread maxima in the wind components and there is also a large spread in temperature in the lower troposphere of high latitudes and near the model top.

Finally, we present the average difference between the posterior and prior spread divided by the prior spread for the zonal wind and temperature variables. The result, shown in Fig. 6, describes the average fractional reduction of the prior spread by the assimilated observations. Figure 6 shows that the largest information content of observations as represented by the reduction of the prior ensemble spread is in the midlatitude upper troposphere. The altitude of the maximal impact of observations is different for various state variables; the displacement of the efficiency of observations in reducing the prior spread in temperature and the zonal wind agrees with the thermal wind coupling between the mass and wind field errors. In our experiment with the homogeneous observation network, globally constant observation errors, and the perfect model, the average fractional reduction of the spread appears at least one-third smaller in the tropics than in the midlatitudes. At the same time, the errors in the prior are on average twice as large in the tropics (Figs. 3a, 4, and 5). This is due to the large forecast error growth in the tropics associated with deep convection (e.g., Reynolds et al. 1994) and significant uncertainties in the posterior ensemble (initial state; Fig. 3b).
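The diagnostic shown in Fig. 6, the difference between posterior and prior spread divided by the prior spread, can be computed from two ensembles as sketched below. The array layout (members along the first axis) and the synthetic ensembles are assumptions made for the example; negative values indicate a reduction of the spread by the assimilation.

```python
import numpy as np

def ensemble_spread(ensemble):
    """Standard deviation across ensemble members (first axis)."""
    return ensemble.std(axis=0, ddof=1)

def fractional_spread_change(prior, posterior):
    """(posterior spread - prior spread) / prior spread at every grid point."""
    s_prior = ensemble_spread(prior)
    s_post = ensemble_spread(posterior)
    return (s_post - s_prior) / s_prior

# Synthetic 80-member ensembles at five grid points (hypothetical values).
rng = np.random.default_rng(3)
prior = rng.normal(0.0, 3.0, size=(80, 5))
mean = prior.mean(axis=0)
posterior = mean + 0.8 * (prior - mean)      # deviations contracted by 20%
change = fractional_spread_change(prior, posterior)
```

In practice the result would be averaged over the analysis times of the experiment and, for a figure such as Fig. 6, also zonally.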

## 5. Scale-dependent information content of observations

### a. Time-averaged properties of the prior ensemble spread

As shown in Fig. 3 in physical space, the globally integrated ensemble spread (or variance) does not vary much during the experiment after the first three weeks. If the total variance of the prior ensemble, averaged over the September–October period, is split among the three motion types, the contributions are around 60%, 20%, and 20% for the balanced, the eastward-propagating inertio-gravity (EIG), and the westward-propagating inertio-gravity (WIG) components, respectively. The temporal variability is up to about 5% of the total variance, with variability in the balanced and IG modes characterized by opposite signs (not shown). The detailed investigation of the flow-dependent properties of the prior variance is the subject of a separate paper. Here we present the time-averaged properties, which are relevant for the discussion of the variance reduction by assimilation and for comparison to previous studies.

If the prior variance is integrated vertically, the resulting variance distribution along the *n* = 1–3 and in several zonal wavenumbers centered around *n* = 0 EIG mode) variance in

If the variance is also integrated meridionally, the resulting spectra present the distribution of the variance as a function of the zonal wavenumber (Fig. 9). Figure 9a shows that the largest contribution to the spread comes from the large scales. This figure appears very similar to figures in Žagar et al. (2013) and Cardinali et al. (2014) for the ECMWF ensemble of short-range forecasts evolving from the ensembles of independent 4D-Var analyses; it reflects the dominance of the tropical large-scale uncertainties in analyses growing into the large-scale errors from the forecast start. The balanced spread dominates over the IG spread except at the smallest scales in our model. Furthermore, the EIG spread dominates over the WIG spread on large scales up to around zonal wavenumber 8 (scale around 1700 km at the equator), in agreement with Fig. 7 and the idea of flow-dependent large-scale tropical errors associated with the Kelvin mode. In subsynoptic scales, there is a slight prevalence of the WIG spread. If the variance in each zonal wavenumber is split between the balanced and IG parts, the dominance of the balanced variance on planetary scales becomes more clearly visible (Fig. 9b). The ratio between the balanced and IG variance is at a nearly constant level of 70% and 30% for BAL and IG modes, respectively, for zonal wavenumbers 1–6 (scales larger than 2000 km in the tropics). The relative amount of balanced variance decreases to about 50% at wavenumber 70. Overall, the distribution of the prior uncertainty in Figs. 7–9 is very similar to that in NWP systems, suggesting that model error is not the dominating factor responsible for the tropical large-scale errors in the global short-range forecasts. The 2D distribution of the variance in the posterior ensemble appears very similar to that shown in Fig. 7 and the 1D curves of the posterior variance thus follow the prior variance distribution (Fig. 10). The largest difference between the prior and posterior variance can be seen in the large scales of the balanced modes.

While the prior variance distribution in Fig. 9a is similar to that in the ensemble data assimilation system of ECMWF (Žagar et al. 2013), it is rather different from the variance distribution in our earlier study (Žagar et al. 2011). The initial ensemble in Žagar et al. (2011) originated from 80 various AMIP realizations and thus contained a large spread on the planetary scales. Almost half of the ensemble spread (in both prior and posterior) remained in the zonally averaged state and the discussion of the prior variance was split between the mean zonal state and the remaining wave variance. In the present case, small initial temperature perturbations adjust with the wind field during the first few days of the experiment through the multivariate assimilation and model dynamics. After about two weeks, the ratio between the balanced and IG variance is established at its equilibrated value and only a small part of the variance belongs to the mean state, just like in the ECMWF model.

### b. Scale-dependent information content of observations

Knowing the distribution of the prior ensemble variance in the perfect-model experiment, we can now discuss how the reduction of variance is related to the circulation properties. From Fig. 10 we can infer how the globally integrated variance of the 12-h forecast ensemble in each zonal wavenumber is reduced by the assimilation step. If averaged over all zonal wavenumbers, the reduction of the balanced and IG variance is about 19% and 16%, respectively, of the prior variance. While these percentages may vary significantly depending on the observation network and background errors, the average information content does not vary significantly throughout the period but it varies significantly across scales and motion types. The remaining figures reveal the details.

We start the discussion of the information content of observations by comparing the 2D distribution of the time-averaged variance reduction subtracted from 1 in the leading vertical mode with the entropy reduction [(14) vs (15)]. Figure 11 shows that the distribution in the modal space is qualitatively the same; thus, we shall use the variance reduction and information content of observations as synonyms when discussing the impact of observations in the assimilation.
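The qualitative agreement between the two measures can be understood from a simple sketch. Equations (14) and (15) are not reproduced here, so the code below assumes the standard forms for a diagonal (mode-by-mode) Gaussian description: the variance reduction is one minus the ratio of posterior to prior variance in each mode, and the entropy reduction is half the logarithm of the inverse of that ratio. Under these assumptions one quantity is a monotonic function of the other, so both rank the modes identically. The function name and the synthetic modal coefficients are hypothetical.

```python
import numpy as np

def modal_information(chi_prior, chi_post):
    """Per-mode variance reduction and Gaussian entropy reduction (in nats).

    Inputs hold complex modal coefficients with ensemble members along axis 0.
    """
    var_b = np.var(chi_prior, axis=0, ddof=1)    # prior variance per mode
    var_a = np.var(chi_post, axis=0, ddof=1)     # posterior variance per mode
    variance_reduction = 1.0 - var_a / var_b
    entropy_reduction = 0.5 * np.log(var_b / var_a)
    return variance_reduction, entropy_reduction

# Synthetic coefficients for four modes and 80 members (hypothetical values).
rng = np.random.default_rng(1)
n_members, n_modes = 80, 4
chi_prior = (rng.normal(size=(n_members, n_modes))
             + 1j * rng.normal(size=(n_members, n_modes)))
ratios = np.array([0.90, 0.81, 0.71, 0.75])      # posterior/prior variance
mean = chi_prior.mean(axis=0)
chi_post = mean + np.sqrt(ratios) * (chi_prior - mean)

variance_reduction, entropy_reduction = modal_information(chi_prior, chi_post)
dfs_like = variance_reduction.sum()   # trace of (I - A B^-1) for diagonal A and B
```

The last line illustrates how, for diagonal covariance matrices, a degrees-of-freedom-for-signal type of measure reduces to the sum of the per-mode variance reductions.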

Figures 12 and 13 show that the information content of observations is a strongly scale-dependent quantity with large differences between the horizontal and vertical scales as well as between the balanced and IG components. The quantity presented in these figures is the one defined in (14) subtracted from one; this corresponds to the portion of the prior variance reduced by the observation information. On average, data assimilation is more efficient at reducing the prior variance in balanced modes than in the IG modes. This is especially well seen in Fig. 13, which shows the distribution of the variance reduction in three types of modes along each direction of the modal space. The information content of observations as a function of the zonal wavenumber is maximal in the synoptic scale of the balanced motions. In zonal wavenumbers 4–14 (scales between 5000 and 1400 km at the equator, and between 3500 and 1000 km at 45°N), the variance reduction is on average up to 29%. The peak reduction is in the zonal wavenumber 7 (around 2900 km at the equator and 2000 km at 45°). From its peak at zonal wavenumber 7, the information content of observations nearly symmetrically drops to zonal wavenumbers 1 and 20, which have variance reduction around 25%. Toward the smaller scales, the variance reduction steadily decreases so that at zonal wavenumber 30 (scale around 500 km in the midlatitudes), the efficiency of observations is about one-third smaller than its value in the peak synoptic scale. The key question to be answered is why the applied data assimilation is equally efficient at the 700-km scale as at the scale of 13 000 km at the equator (observation sampling of zonal wavenumber 1 at the equator is shown in Fig. 2). We suspect that the radius of localization may be at least partly responsible for the poorer efficiency of assimilation at planetary scales than at the synoptic scales. In section 5c we investigate to what extent this holds. The relative lack of observation content in planetary scales is confirmed in the 2D presentation of Fig. 12. The smaller information content of observations in planetary scales than in synoptic scales is found in all vertical modes. In particular, baroclinic vertical modes 6–8 are characterized by the maximal variance reduction in the synoptic zonal scales.
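The scales quoted above for zonal wavenumbers 4–14 and for the peak at wavenumber 7 are consistent with taking the "scale" to be half the zonal wavelength at a given latitude. That convention is inferred here from those quoted values rather than stated in the text, and the function name is chosen for the example.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def zonal_scale_km(wavenumber, lat_deg=0.0):
    """Half the zonal wavelength (km) of a given zonal wavenumber at a latitude."""
    circumference = 2.0 * np.pi * EARTH_RADIUS_KM * np.cos(np.radians(lat_deg))
    return 0.5 * circumference / wavenumber

scale_k7_equator = zonal_scale_km(7)          # close to 2900 km
scale_k7_45n = zonal_scale_km(7, 45.0)        # close to 2000 km
scale_k4_equator = zonal_scale_km(4)          # close to 5000 km
scale_k14_equator = zonal_scale_km(14)        # close to 1400 km
```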

There is a large gap in the information content of observations for the balanced and IG modes in all scales until the zonal wavenumber around 50 (scale around 270 km at the equator) as seen in Fig. 13a. This is the scale where the amount of the variance in the prior ensemble sharply drops (Fig. 9). Furthermore, there is also a major difference between the scale distribution of the information content for the EIG and WIG modes. Both WIG and EIG curves show maximal variance reduction in zonal wavenumbers 5–10; the difference is that the curve for the WIG modes appears more similar to the balanced modes (Figs. 13a and 12c). Furthermore, the variance reduction for the WIG modes in the synoptic zonal scales and nearly all meridional modes is significantly greater than for the EIG modes (Figs. 13a,b). The difference in the variance reduction between the EIG and WIG modes is associated with dynamical properties of forecast errors as discussed earlier for the prior variance distribution in Fig. 7. Since a part of the midlatitude variance associated with the eastward-propagating developing baroclinic disturbances projects to the WIG modes, the distribution of information content of observations in Figs. 12 and 13 for the WIG modes has properties similar to both the balanced and the EIG component. Figure 14 illustrates this point by showing the difference between the prior spread from Fig. 8 and the posterior spread; as earlier, we notice that the spread reduction is to a larger degree associated with the WIG modes in the midlatitudes and with the EIG modes in the tropics. A comparison of Fig. 8 and Fig. 14 also illustrates the greater efficiency of data assimilation in the midlatitudes than in the tropics, where the prior spread is largest.

Further dynamical aspects of the information content of observations are illustrated by presenting the variance reduction as a function of the vertical mode index (Fig. 13c). There is a large gap between the variance reduction in the balanced and IG parts for the leading two modes that have barotropic and almost barotropic structure. For modes *m* = 6–8, which have equivalent depths *D* around 370, 250, and 160 m, respectively, the information content of observations in balanced modes has its maximal value. For the IG modes, the maximal variance reduction is in *m* = 10–11, which have equivalent depths around 74 and 53 m. These are the vertical modes with baroclinic tropospheric structure that have the largest amplitudes in the upper troposphere, and they represent the gross error variance in the upper tropical troposphere in all three types of modes (Figs. 12d–f).
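To give the quoted equivalent depths a dynamical scale, the sketch below converts them to the shallow-water gravity wave speed, c = sqrt(gD), which is the standard relation for a fluid layer of depth *D*. The value of *g* and the helper name are assumptions made for this illustration.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2), value assumed here


def gravity_wave_speed(equivalent_depth_m):
    """Shallow-water gravity wave speed c = sqrt(g * D) in m/s."""
    return math.sqrt(G * equivalent_depth_m)


# Vertical mode index and equivalent depth (m) as quoted in the text.
modes = [(6, 370.0), (7, 250.0), (8, 160.0), (10, 74.0), (11, 53.0)]

for m, depth in modes:
    speed = gravity_wave_speed(depth)
    print(f"m = {m:2d}: D = {depth:5.0f} m, c = {speed:4.1f} m/s")
```

With these inputs the speeds range from roughly 60 m/s for *m* = 6 down to about 23 m/s for *m* = 11.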

### c. Impact of the localization on the information content of observations

The Gaspari–Cohn (GC) localization has the basic purpose of removing the spurious large-scale spatial correlations during data assimilation (Houtekamer and Mitchell 2001; Anderson 2007). This process may also cause some imbalance (e.g., Mitchell et al. 2002; Kepert 2009), which, however, is not the subject of the present paper. More general methods for applying scale-dependent localization have been studied (e.g., Buehner 2012; Buehner and Shlyaeva 2015) and may be able to reduce the imbalance in analyzed fields.

Here, we address the impact of the applied half-width of the GC localization function on the relative variance reduction in the balanced and IG modes. For this purpose we use the OSSE experiments presented in Lei and Anderson (2014). Their experiments were prepared with the same observation network, the same observation error variances, and the same method for the generation of the initial ensemble as in the present case. The main difference in the data assimilation setup is the use of the spatially and temporally varying state space adaptive inflation (Anderson 2009). Furthermore, a different dynamical core and a different set of physical parameterizations for the CAM model were used. For details, the reader is referred to Lei and Anderson (2014). The assimilation experiments were done for several values of the half-width of the GC function, including 0.2 rad, which is used in our experiment, 0.4 rad (around 2550 km), and 0.6 rad (around 3820 km).
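For reference, the localization weights implied by these half-widths can be sketched as follows. The code evaluates the compactly supported fifth-order piecewise rational function of Gaspari and Cohn (1999) and converts the half-widths from radians to kilometers. The mean Earth radius of 6371 km and the function names are assumptions of this sketch; it is not the DART implementation.

```python
EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for the rad-to-km conversion


def gaspari_cohn(distance, half_width):
    """Gaspari-Cohn fifth-order piecewise rational localization weight.

    The weight is 1 at zero separation and 0 beyond twice the half-width.
    `distance` and `half_width` must be in the same units.
    """
    r = abs(distance) / half_width
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r <= 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
                - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0


def half_width_km(half_width_rad):
    """Great-circle distance (km) corresponding to a half-width in radians."""
    return half_width_rad * EARTH_RADIUS_KM


separation_rad = 2000.0 / EARTH_RADIUS_KM  # an arbitrary 2000-km separation
for hw in (0.2, 0.4, 0.6):
    weight = gaspari_cohn(separation_rad, hw)
    print(f"half-width {hw:.1f} rad = {half_width_km(hw):6.0f} km, "
          f"weight at 2000 km = {weight:.3f}")
```

Under the assumed radius, 0.4 rad and 0.6 rad correspond to about 2548 km and 3823 km, consistent with the rounded values quoted above. Because the function vanishes at twice the half-width, the 0.2-rad case gives zero weight to observations farther than about 2550 km from a state variable.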

First we show in Fig. 15 the impact of the variance inflation on the zonally averaged variance reduction in physical space for the zonal wind component. Figure 15a can be compared to our Fig. 6a, which is based on the same half-width of the GC function, to see that these two plots largely agree. Therefore, the other two experiments shown in Fig. 15 can be used to discuss the impact of the localization radius despite their use of inflation. They suggest that the impact of localization is significant; in particular, the 0.6-rad case, which is 3 times larger than in our simulation, provides a variance reduction in physical space in the tropics similar to that in the Northern Hemisphere midlatitudes. The comparison suggests that a broader localization improves the variance reduction in the tropics, where correlation scales are on average longer than in the midlatitudes. All three experiments are characterized by gradients near the poles that are even stronger in the temperature field (not shown). The zonally averaged spread of the prior and posterior ensemble appears similar in all experiments, with a strong maximum in the upper tropical troposphere as shown in Fig. 5. A difference between the experiments is the magnitude of the spread; the experiment with the shortest radius has the largest prior spread at all levels and latitudes (not shown).

By verifying these experiments against the simulated truth, Lei and Anderson (2014) found that the experiment applying the half-width of 0.4 rad produced the posterior ensemble mean closest to the truth in the root-mean-square (rms) sense. The best performing experiment is associated with both localization radius and covariance inflation; the latter increases the spread in the case when too broad a localization radius introduces extra noise, thus reducing the accuracy of the ensemble and reducing the spread. When the localization is too narrow, not all useful information from the observations is used, and thus the rms errors would increase again. However, this argument should be seen in light of the dynamics. In the tropics, long correlations are expected in both balanced and IG modes, especially in regions away from intense convection. In contrast, midlatitude correlations are strongest in baroclinic scales, where the balanced variance is also largest (Fig. 7a). As the largest amount of IG prior variance is in tropical large scales, we expect a strong impact of localization in this part of the spectrum. Extratropical baroclinic scales, with the prior variance maximum around

First we should compare Fig. 16a with Fig. 13a, which uses the same localization radius. The difference in the variance reduction between the balanced and IG modes is much smaller in the synoptic scales and much larger in the subsynoptic scales. Another relevant difference is between the EIG and WIG modes in subsynoptic scales; the ensemble spread in WIG modes appears more reduced in Fig. 16 than in Fig. 13a. The differences are due to a different forecast model (dynamical core and physical parameterizations) and due to inflation. In spite of differences with respect to Fig. 13a, Fig. 16a suggests that the covariance inflation does not change the main fact that is motivating this comparison (i.e., a smaller variance reduction in planetary scales than in synoptic scales for both balanced and IG modes).

As the localization radius increases, the variance reduction increases in all scales but especially in the subsynoptic scales for the IG modes (Fig. 16). The variance reduction for the IG modes at zonal wavenumber 50 is more than twice as great for the 0.6-rad localization as for the 0.2-rad case. As discussed above, this does not necessarily lead to a better rms error score. We also find in Fig. 16 a greater variance reduction for the synoptic-scale WIG modes than for the EIG modes in all cases. However, the main finding of this comparison is that the spatial scale of the peak efficiency of data assimilation in the balanced modes is not affected by the applied localization radius; the maximal variance reduction remains in the synoptic scales of balanced modes. This is associated with the best efficiency of data assimilation for the midlatitude Rossby waves, as illustrated in Fig. 14. We cannot, however, rule out the possibility that a localization radius significantly larger than 0.6 rad would flatten the curves in large scales. The difference in variance reduction between the balanced and IG modes decreases with increasing radius. Increasing the radius has the largest importance for the tropics and its IG variance; the curves of variance reduction in planetary scales become nearly flat as the radius becomes larger. With the localization radius half-width equal to 0.6 rad, the variance reduction in

## 6. Discussion and conclusions

This paper discussed the scale-dependent efficiency of observations to reduce the short-range forecast error variances in the ensemble Kalman filter data assimilation and the perfect-model framework. A particular motivation for the discussion was the global forecast error maximum in the tropical upper troposphere characteristic of the current NWP and ensemble prediction systems. The presented experiments that assimilated observations from a nature run simulated by the same model suggest that the observed forecast error maximum is not associated with the errors in the representation of physical processes, numerical truncation, or surface boundary conditions.

The earlier work in tropical data assimilation modeling provides a basic understanding of the relative impact of different equatorially trapped wave solutions in variational analysis in the tropics (Daley 1993; Žagar et al. 2004, 2005; Harlim and Majda 2013). In particular, it showed that the time-averaged information about the variance of tropical waves, built into the background error covariance matrix for data assimilation, can provide analysis increments that appear nearly univariate even though they result from the advanced multivariate assimilation methodology. This is explained as a consequence of complex tropical dynamics in which various IG waves, each characterized by its own coupling between the mass field and the wind field, play different roles in the coupling between convection and the horizontal motions. This is not a problem in the midlatitudes, where the quasigeostrophic dynamics has successfully served as the basic framework for representing short-range errors in the NWP forecasts. Another dynamical constraint that can significantly influence the success of the EnKF data assimilation in the tropics is the fact that growth of forecast errors in the tropics occurs in all scales and in both balanced and inertio-gravity modes. Therefore, the localization of the covariances, which is successful in the midlatitudes, may not be optimal for the tropics.

We discuss an important role of IG modes in tropical large-scale dynamics and associated growth of errors in forecasts by comparing analysis and forecast uncertainties in balanced and IG modes of various horizontal and vertical scales. We derive a methodology that quantifies the information content of observations that is equivalent to the degrees of freedom for signal and entropy reduction concepts for data assimilation. The application of the 3D normal-mode function decomposition facilitates the discussion of the multivariate nature of some of the leading modes of atmospheric variability, especially in the tropics where the role of large-scale IG modes such as the Kelvin waves in the initial state for NWP is still not clearly identified.

The application of the modal decomposition showed that about 40% of the prior variance in the global forecasts is associated with the inertio-gravity modes. On large scales, the unbalanced spread is primarily associated with the tropical IG modes with the Kelvin mode dominant. With the selected setup for the EnKF assimilation, the efficiency of observations to reduce the synoptic-scale unbalanced variance was about one-third smaller than for the balanced variance. Among different planetary-scale IG modes, the Kelvin mode was analyzed the best, especially in the experiments with a larger radius of the covariance localization. This mode is characterized by the geostrophic balance between the meridional geopotential gradient and the zonal wind, a relationship one would hope to be represented by the background covariances. However, even in this case a mixture of the mass–wind couplings of tropical waves can provide univariate analysis increments because of the superposition of the positive and negative signs in different mass–wind couplings. The flow-dependent multivariate assimilation methodology, which treats the resulting mixture of mass–wind relationships, will thus not recognize the errors associated with different waves, not even on the largest scales.

The spatial distribution of analysis and forecast uncertainties, as simulated by the variance of the 80-member ensemble in the EnKF assimilation using the perfect model, is still dominated by the large dynamical errors in the tropical upper troposphere, just like in the operational NWP framework. The performed data assimilation was on average significantly less efficient in the reduction of tropical forecast uncertainties than the uncertainties in the midlatitudes. However, the efficiency of assimilated observations was significantly affected by the applied localization radius. A localization radius that may be suitable for the baroclinic scales in the midlatitudes is shown to be far too small to allow data assimilation to reduce the prior variance in the planetary scales in the tropics. This is particularly harmful for the unbalanced modes. Even with the half-width of the Gaspari–Cohn function around 3800 km, the planetary-scale (

## Acknowledgments

The authors would like to thank Dennis Shea and Damjan Jelić for their help with processing the data in section 5c; Blaž Jesenko for creating Figs. 2, 8, and 14; and Chris Snyder for a discussion. Also, we are very grateful for the concise and insightful comments by two anonymous reviewers. Research of Nedjeljka Žagar is funded by the European Research Council (ERC), Grant 280153. The Centre of Excellence for Space Sciences and Technologies SPACE-SI is an operation partly financed by the European Union, the European Regional Development Fund, and the Republic of Slovenia, Ministry of Higher Education, Science, and Technology.

## REFERENCES

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903, doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.

Anderson, J. L., 2007: Exploring the need for localization in ensemble data assimilation using an hierarchical ensemble filter. *Physica D*, **230**, 99–111, doi:10.1016/j.physd.2006.02.011.

Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. *Tellus*, **61A**, 72–83, doi:10.1111/j.1600-0870.2008.00361.x.

Anderson, J. L., T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. F. Arellano, 2009: The Data Assimilation Research Testbed: A community data assimilation facility. *Bull. Amer. Meteor. Soc.*, **90**, 1283–1296, doi:10.1175/2009BAMS2618.1.

Atlas, R., 1997: Atmospheric observations and experiments to assess their usefulness in data assimilation. *J. Meteor. Soc. Japan*, **75**, 111–130.

Baker, W. E., and Coauthors, 2014: Lidar-measured wind profiles: The missing link in the global observing system. *Bull. Amer. Meteor. Soc.*, **95**, 543–564, doi:10.1175/BAMS-D-12-00164.1.

Buehner, M., 2012: Evaluation of a spatial/spectral covariance localization approach for atmospheric data assimilation. *Mon. Wea. Rev.*, **140**, 617–636, doi:10.1175/MWR-D-10-05052.1.

Buehner, M., and A. Shlyaeva, 2015: Scale-dependent background-error covariance localisation. *Tellus*, **67A**, 28027, doi:10.3402/tellusa.v67.28027.

Buizza, R., M. Miller, and T. N. Palmer, 1999: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. *Quart. J. Roy. Meteor. Soc.*, **125**, 2887–2908, doi:10.1002/qj.49712556006.

Cardinali, C., N. Žagar, G. Radnoti, and R. Buizza, 2014: Representing model error in ensemble data assimilation. *Nonlinear Processes Geophys.*, **21**, 971–985, doi:10.5194/npg-21-971-2014.

Courtier, P., and Coauthors, 1998: The ECMWF implementation of three-dimensional variational assimilation (3D-Var). I: Formulation. *Quart. J. Roy. Meteor. Soc.*, **124**, 1783–1807, doi:10.1002/qj.49712455002.

Daley, R., 1991: *Atmospheric Data Analysis*. Cambridge University Press, 460 pp.

Daley, R., 1993: Atmospheric data analysis on the equatorial beta plane. *Atmos.–Ocean*, **31**, 421–450, doi:10.1080/07055900.1993.9649479.

Derber, J. C., and F. Bouttier, 1999: A reformulation of the background error covariance in the ECMWF global data assimilation system. *Tellus*, **51A**, 195–221, doi:10.1034/j.1600-0870.1999.t01-2-00003.x.

Fisher, M., 2003: Estimation of entropy reduction and degrees of freedom for signal for large variational analysis systems. ECMWF Research Department Tech. Memo. 397, 18 pp. [Available online at http://www.ecmwf.int/en/elibrary/9402-estimation-entropy-reduction-and-degrees-freedom-signal-large-variational-analysis.]

Fujiwara, M., J. Suzuki, A. Gettelman, M. Hegglin, H. Akiyoshi, and K. Shibata, 2012: Wave activity in the tropical tropopause layer in seven reanalysis and four chemistry climate model data sets. *J. Geophys. Res.*, **117**, D12105, doi:10.1029/2011JD016808.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–758, doi:10.1002/qj.49712555417.

Gent, P. R., and Coauthors, 2011: The Community Climate System Model version 4. *J. Climate*, **24**, 4973–4991, doi:10.1175/2011JCLI4083.1.

Gustafsson, N., L. Berre, S. Hörnquist, X.-Y. Huang, M. Lindskog, B. Navascués, K. S. Mogensen, and S. Thorsteinsson, 2001: Three-dimensional variational data assimilation for a limited area model. Part I: General formulation and the background error constraint. *Tellus*, **53A**, 425–446, doi:10.1111/j.1600-0870.2001.00425.x.

Harlim, J., and A. J. Majda, 2013: Test models for filtering and prediction of moisture-coupled tropical waves. *Quart. J. Roy. Meteor. Soc.*, **139**, 119–136, doi:10.1002/qj.1956.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137, doi:10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.

Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. *Quart. J. Roy. Meteor. Soc.*, **131**, 3269–3289, doi:10.1256/qj.05.135.

Hurrell, J., J. Hack, A. Phillips, J. Caron, and J. Yin, 2006: The dynamical simulation of the Community Atmosphere Model version 3 (CAM3). *J. Climate*, **19**, 2162–2183, doi:10.1175/JCLI3762.1.

Hurrell, J., J. Hack, D. Shea, J. Caron, and J. Rosinski, 2008: A new sea surface temperature and sea ice boundary dataset for the Community Atmosphere Model. *J. Climate*, **21**, 5145–5153, doi:10.1175/2008JCLI2292.1.

Kasahara, A., and K. Puri, 1981: Spectral representation of three-dimensional global data by expansion in normal mode functions. *Mon. Wea. Rev.*, **109**, 37–51, doi:10.1175/1520-0493(1981)109<0037:SROTDG>2.0.CO;2.

Kepert, J. D., 2009: Covariance localisation and balance in an ensemble Kalman filter. *Quart. J. Roy. Meteor. Soc.*, **135**, 1157–1176, doi:10.1002/qj.443.

Kleist, D. T., D. F. Parrish, J. C. Derber, R. Treadon, R. M. Errico, and R. Yang, 2009: Improving incremental balance in the GSI 3DVAR analysis system. *Mon. Wea. Rev.*, **137**, 1046–1060, doi:10.1175/2008MWR2623.1.

Lei, L., and J. L. Anderson, 2014: Empirical localization of observations for serial ensemble Kalman filter data assimilation in an atmospheric general circulation model. *Mon. Wea. Rev.*, **142**, 1835–1851, doi:10.1175/MWR-D-13-00288.1.

Magnusson, L., and E. Källén, 2013: Factors influencing skill improvements in the ECMWF forecasting system. *Mon. Wea. Rev.*, **141**, 3142–3153, doi:10.1175/MWR-D-12-00318.1.

Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-error representation in an ensemble Kalman filter. *Mon. Wea. Rev.*, **130**, 2791–2808, doi:10.1175/1520-0493(2002)130<2791:ESBAME>2.0.CO;2.

Neale, R. B., J. Richter, S. Park, P. H. Lauritzen, S. J. Vavrus, P. J. Rasch, and M. Zhang, 2013: The mean climate of the Community Atmosphere Model (CAM4) in forced SST and fully coupled experiments. *J. Climate*, **26**, 5150–5168, doi:10.1175/JCLI-D-12-00236.1.

Parrish, D. F., and J. C. Derber, 1992: The National Meteorological Center’s spectral statistical-interpolation analysis system. *Mon. Wea. Rev.*, **120**, 1747–1763, doi:10.1175/1520-0493(1992)120<1747:TNMCSS>2.0.CO;2.

Phillips, N. A., 1990: Dispersion processes in large-scale weather prediction. Sixth IMO Lecture, World Meteorological Organization Tech. Doc. 700, 126 pp.

Podglajen, A., A. Hertzog, R. Plougonven, and N. Žagar, 2014: Assessment of the accuracy of (re)analyses in the equatorial lower stratosphere. *J. Geophys. Res. Atmos.*, **119**, 11 166–11 188, doi:10.1002/2014JD021849.

Raeder, K., J. L. Anderson, N. Collins, T. J. Hoar, J. E. Kay, P. H. Lauritzen, and R. Pincus, 2012: DART/CAM: An ensemble data assimilation system for CESM atmospheric models. *J. Climate*, **25**, 6304–6317, doi:10.1175/JCLI-D-11-00395.1.

Reynolds, C. A., P. J. Webster, and E. Kalnay, 1994: Random error growth in NMC’s global forecasts. *Mon. Wea. Rev.*, **122**, 1281–1305, doi:10.1175/1520-0493(1994)122<1281:REGING>2.0.CO;2.

Rodgers, C. D., 2000: *Inverse Methods for Atmospheric Sounding: Theory and Practice.* Series on Atmospheric, Oceanic and Planetary Physics, Vol. 2, World Scientific, 238 pp.

Shutts, G. J., 2005: A kinetic energy backscatter algorithm for use in ensemble prediction systems. *Quart. J. Roy. Meteor. Soc.*, **131**, 3079–3100, doi:10.1256/qj.04.106.

Shutts, G. J., 2013: Coarse graining the vorticity equation in the ECMWF Integrated Forecasting System: The search for kinetic energy backscatter. *J. Atmos. Sci.*, **70**, 1233–1241, doi:10.1175/JAS-D-12-0216.1.

Singh, K., A. Sandu, M. Jardak, K. W. Bowman, and M. Lee, 2013: A practical method to estimate information content in the context of 4D-Var data assimilation. *SIAM/ASA J. Uncertainty Quant.*, **1**, 106–138.

Skok, G., J. Tribbia, J. Rakovec, and B. Brown, 2009: Object-based analysis of satellite-derived precipitation systems over the low- and midlatitude Pacific Ocean. *Mon. Wea. Rev.*, **137**, 3196–3218, doi:10.1175/2009MWR2900.1.

Vitart, F., and F. Molteni, 2010: Simulation of the Madden-Julian Oscillation and its teleconnections in the ECMWF forecast system. *Quart. J. Roy. Meteor. Soc.*, **136**, 842–855, doi:10.1002/qj.623.

Wu, W.-S., D. F. Parrish, and R. J. Purser, 2002: Three-dimensional variational analysis with spatially inhomogeneous covariances. *Mon. Wea. Rev.*, **130**, 2905–2916, doi:10.1175/1520-0493(2002)130<2905:TDVAWS>2.0.CO;2.

Žagar, N., N. Gustafsson, and E. Källén, 2004: Variational data assimilation in the tropics: the impact of a background error constraint. *Quart. J. Roy. Meteor. Soc.*, **130**, 103–125, doi:10.1256/qj.03.13.

Žagar, N., E. Andersson, and M. Fisher, 2005: Balanced tropical data assimilation based on a study of equatorial waves in ECMWF short-range forecast errors. *Quart. J. Roy. Meteor. Soc.*, **131**, 987–1011, doi:10.1256/qj.04.54.

Žagar, N., J. Tribbia, J. L. Anderson, and K. Raeder, 2011: Balance of the background-error variances in the ensemble assimilation system DART/CAM. *Mon. Wea. Rev.*, **139**, 2061–2079, doi:10.1175/2011MWR3477.1.

Žagar, N., L. Isaksen, D. Tan, and J. Tribbia, 2013: Balance and flow-dependency of background-error variances in the ECMWF 4D-Var ensemble. *Quart. J. Roy. Meteor. Soc.*, **139**, 1229–1238, doi:10.1002/qj.2033.

Žagar, N., R. Buizza, and J. Tribbia, 2015a: A three-dimensional multivariate modal analysis of atmospheric predictability with application to the ECMWF ensemble. *J. Atmos. Sci.*, **72**, 4423–4444, doi:10.1175/JAS-D-15-0061.1.

Žagar, N., A. Kasahara, K. Terasaki, J. Tribbia, and H. Tanaka, 2015b: Normal-mode function representation of global 3D datasets: Open-access software for the atmospheric research community. *Geosci. Model Dev.*, **8**, 1169–1195, doi:10.5194/gmd-8-1169-2015.

Županski, D., A. Y. Hou, S. Q. Zhang, M. Županski, C. D. Kummerow, and S. H. Cheung, 2007: Applications of information theory in ensemble data assimilation. *Quart. J. Roy. Meteor. Soc.*, **133**, 1533–1545, doi:10.1002/qj.123.