1. Introduction
To obtain the minimum error variance or maximum likelihood state of the atmosphere given a forecast and observations, one requires the true flow-dependent forecast error covariance matrix. For a variety of reasons, this quantity is very difficult to estimate accurately. Until recently, a common approach at operational centers has been to use a static covariance matrix that approximates a climatological average of flow-dependent error covariances (Lewis and Derber 1985; Courtier et al. 1994). In contrast, ensemble Kalman filters (Evensen 1994) have received considerable attention from researchers as a means to generate flow-dependent forecast error covariances. Both approaches are imperfect, so it is of interest to examine the performance of data assimilation schemes that linearly combine these two types of error covariance models, which we term hybrid ensemble/four-dimensional variational data assimilation (4D-Var), or simply hybrid, systems. Here, we report on the performance of a hybrid system that we have built within the framework of the Navy's operational Naval Research Laboratory Atmospheric Variational Data Assimilation System-Accelerated Representer (NAVDAS-AR) four-dimensional data assimilation scheme. As far as the authors are aware, the results presented here are the first from an observation space hybrid ensemble/4D-Var system. They add to a growing body of knowledge about the performance of these hybridizations of static and ensemble covariances within operational data assimilation schemes.
Hybrid error covariance models were first discussed in a three-dimensional variational data assimilation (3D-Var) context by Hamill and Snyder (2000) and Lorenc (2003). In experiments with a barotropic model, Etherton and Bishop (2004) found that the hybrid ensemble/3D-Var formulation was particularly useful when the forecast model was imperfect. Recently, scientists from the National Centers for Environmental Prediction (NCEP) have shown that the introduction of a hybrid error covariance formulation significantly improved the performance of NCEP's 3D-Var gridpoint statistical interpolation (GSI) system (Kleist et al. 2009), both globally at reduced operational resolution (Wang et al. 2013) and for tropical cyclone tracks (Hamill et al. 2011). Using a simple model system, Zhang et al. (2009) developed the first truly coupled hybrid ensemble/4D-Var system [coupled, as used here and by Zhang et al. (2009), is not to be confused with ocean–atmosphere coupling]. Both NCEP and the Met Office now use hybrid background error covariance models in their operational systems.
Experiments by Buehner et al. (2010a,b) with versions of the Canadian operational system found that replacing the static covariances with localized ensemble covariances led to large forecast improvements in the southern extratropics. While their work showed the potential superiority of ensemble-based covariances over static error covariances, it did not address the question of whether linear combinations of flow-dependent and static error covariances would provide superior forecast skill. This question was, however, addressed by Clayton et al. (2013), who found that initial covariances based on a linear combination of static and flow-dependent covariances reduced overall forecast RMS errors, relative to the normal 4D-Var system, by a significant amount even with a relatively small ensemble (24 members). Zhang and Zhang (2012) corroborated these results [extending the work of Zhang et al. (2009) to real-data experiments] using a limited-area weather prediction model. Zhang and Zhang (2012) showed that their hybrid ensemble/4D-Var system outperformed both the ensemble Kalman filter and the 4D-Var system run separately, and in Zhang et al. (2013) they demonstrated that additional advantages may come from using the adjoint in hybrid ensemble/4D-Var rather than hybrid ensemble/3D-Var systems.
The hybrid ensemble/4D-Var data assimilation system we have developed is designed to be a component of the existing operational NAVDAS-AR data assimilation system (Rosmond and Xu 2006; Xu et al. 2005) and the operational ensemble forecasting system (McLay et al. 2008, 2010). The operational ensemble is based on a local formulation of the ensemble transform (ET) technique of Bishop and Toth (1999) and features a short-term cycling ensemble of 80 members. Our implementation enables a range of configurations to be tested using the same code, from using only the static covariances (our baseline experiment, representing the operational system) to using only the flow-dependent ensemble covariances, together with all fractional combinations in between. This flexibility allows experiments in which the relative impact of various combinations of the static and ensemble covariances can be readily measured without changing other components of the data assimilation or forecast system. The 5-day verification forecasts were produced using the same forecast model as the NAVDAS-AR system, the Navy Operational Global Atmospheric Prediction System (NOGAPS; Hogan et al. 1991).
The NAVDAS-AR data assimilation system differs from other major operational 4D-Var implementations in that it is formulated in observation space (the dual form) rather than model space (the primal form; Courtier 1997). As we will explain in section 2, because the formulation is in the dual form, the system does not require the extended control variable technique. A description of the experimental setup is presented in section 3. In section 4, results from a series of 2-month data assimilation experiments, validated with 5-day deterministic forecasts, are presented; the various experiment configurations are compared to highlight the impact of the static (4D-Var) covariances, the flow-dependent (ensemble) covariances, and linear mixtures of the two. Finally, conclusions are given in section 5.
2. Formulation of NAVDAS-AR hybrid
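In the dual form, the analysis increment is obtained by solving a linear system whose dimension is that of the observation vector. A minimal sketch in generic notation is given below; the symbols and the blending convention are illustrative, not necessarily the exact NAVDAS-AR formulation:

\[
\delta\mathbf{x} = \mathbf{P}^{b}\mathbf{H}^{\mathrm{T}}\left(\mathbf{H}\mathbf{P}^{b}\mathbf{H}^{\mathrm{T}}+\mathbf{R}\right)^{-1}\mathbf{d},
\qquad
\mathbf{P}^{b} = \alpha\,\mathbf{B}_{\mathrm{static}} + (1-\alpha)\,\mathbf{C}\circ\mathbf{P}^{f}_{\mathrm{ens}},
\]

where d = y − H(x^b) is the innovation vector, R is the observation error covariance, C ∘ denotes elementwise (Schur product) localization of the ensemble covariance, and, under this convention, α = 1, 0 < α < 1, and α = 0 correspond to the static, hybrid, and flow-dependent modes, respectively. In 4D-Var, the observation operator H includes integration by the forecast model (and its TLM and adjoint) across the assimilation window. Because the blended covariance enters only through the products P^b H^T and H P^b H^T, it can be applied directly in observation space, which is why the dual form does not require an extended control variable.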
Fig. 1. Vertical localization.
The horizontal and vertical covariance localization can be seen in the right-hand column plots of Figs. 2g–i for an observation assimilated at 500 hPa. The 50% covariance localization in the horizontal direction is approximately 20° in both latitude and longitude, which corresponds to about 2000 km. The 50% covariance localization in the vertical ranges from about 425 to 650 hPa, an approximately 225-hPa-thick layer.
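As a rough feel for these scales, the sketch below evaluates a Gaspari–Cohn correlation function whose 50% point sits near 1000 km of separation, assuming the quoted 2000 km denotes the full width of the 50% region. Both the Gaspari–Cohn function and the half-width are stand-ins for illustration; the system itself uses the efficient ensemble covariance localization of Bishop et al. (2011).

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn fifth-order piecewise rational correlation function;
    compactly supported, reaching exactly zero at dist = 2c."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    f = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    f[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                - (5.0 / 3.0) * ri**2 + 1.0)
    f[outer] = ((1.0 / 12.0) * ro**5 - 0.5 * ro**4 + 0.625 * ro**3
                + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0
                - (2.0 / 3.0) / ro)
    return f

# The 0.5 crossing of Gaspari-Cohn sits near r ~ 0.68, so choosing
# c = 1000 km / 0.68 places the 50% point near 1000 km of separation,
# i.e., a 2000-km-wide 50% localization region as quoted in the text.
d = np.linspace(0.0, 4000.0, 4001)   # separation (km)
w = gaspari_cohn(d, c=1000.0 / 0.68)
print("50%% point: %.0f km" % d[np.argmin(np.abs(w - 0.5))])
```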
Fig. 2. Meridional wind response (filled thick contours) to a single meridional wind observation at (a)–(c) the start and (d)–(f) the end of the 6-h 4D-Var window. The time of the observation is at the start of the 6-h 4D-Var window. The ensemble localization in the (g) horizontal and (h),(i) vertical planes, with (h) the latitude–pressure plane and (i) the longitude–pressure plane. The plots in (a)–(g) are all for the same model level.
Along with the localization, Fig. 2 illustrates the resulting meridional wind analysis increment fields through a 6-h 4D-Var assimilation window. A single meridional wind observation, with an innovation of 1 m s−1 and an observation error of 1 m s−1, is assimilated at the middle of the figure domain and at the beginning of the window. The figure shows the wind increments at the beginning (τ = −3 h; Figs. 2a–c) and the end (τ = +3 h; Figs. 2d–f) of the assimilation window. The plots present the increments generated from three weighting cases, ranging from purely static to purely flow-dependent ensemble covariances.
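The role of the weighting can be seen in a toy scalar version of the analysis equation. In the sketch below, a single observation of grid point 0 is assimilated and the blending weight moves the increment between the static response and an artificially "tilted" ensemble response; every covariance shape, and the convention that alpha weights the static term, are invented for illustration.

```python
import numpy as np

# Toy single-observation increment: H picks grid point 0, so the
# increment is K * innovation with K = Pb[:, 0] / (Pb[0, 0] + r_ob).
dist = np.arange(11, dtype=float)
b_col = np.exp(-(dist / 3.0) ** 2)                       # static column of P^b H^T
e_col = np.exp(-(dist / 3.0) ** 2) * np.cos(0.4 * dist)  # toy flow-dependent column
innov, r_ob = 1.0, 1.0                                   # innovation, obs error var

for alpha in (1.0, 0.5, 0.0):                  # static / hybrid / ensemble
    col = alpha * b_col + (1.0 - alpha) * e_col   # blended P^b H^T column
    gain = col / (col[0] + r_ob)                  # Kalman gain column
    print(f"alpha={alpha:.1f}: increment at ob point {gain[0] * innov:+.3f}, "
          f"5 points away {gain[5] * innov:+.3f}")
```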
3. Experimental setup
Two series of experiments were performed using a configuration similar to the operational system: one for July–August 2010 and one for February–March 2011, each preceded by a 30-day spinup. The control forecast model and the data assimilation were run at operational resolution.
The assimilated observations include conventional observations from land surface stations, radiosondes, dropsondes and pilot balloons, aircraft, and buoys. Satellite remotely sensed observations/retrievals include global positioning system (GPS) radio occultation bending angle observations for temperature and water vapor, atmospheric motion vectors (AMVs) derived from both polar-orbiting and geostationary satellites, ocean surface winds from scatterometers and microwave imagers, and integrated water vapor from microwave imagers.
The largest set of observations comes from a wide range of satellite sensors. These include the microwave sounders: the Advanced Microwave Sounding Unit-A (AMSU-A), the Microwave Humidity Sounder (MHS), and the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave Imager/Sounder (SSM/IS); and the advanced infrared sounders: the Atmospheric Infrared Sounder (AIRS) on Aqua and the Infrared Atmospheric Sounding Interferometer (IASI) on Meteorological Operation (MetOp-A). The moisture analysis is mainly affected by radiosondes, surface observations, and microwave imagers/sounders such as the MHS, SSM/IS, the Special Sensor Microwave Imager (SSM/I), and the Naval Research Laboratory's (NRL) WindSat. Because of technical difficulties with the decoders, MHS and Aqua AIRS/AMSU-A were not included in the suite of assimilated observations.
An 80-member ensemble centered on the analysis was generated using a configuration similar to the operational ET ensemble generation scheme (McLay et al. 2008, 2010). The ET scheme transforms 6-h forecast perturbations into analysis perturbations such that (i) each analysis perturbation is a linear combination of the forecast perturbations and (ii) the covariance of the analysis perturbations is, on a globally averaged basis, consistent with an estimate of the analysis error covariance matrix obtained from NAVDAS-AR. The ET cycle selects growing perturbations in much the same way as the breeding method of Toth and Kalnay (1997). Unlike the breeding scheme, however, the ET scheme ensures that variance is maintained in the entire vector subspace spanned by the ensemble. Although the NAVDAS-AR estimate of analysis error covariance depends only on the distribution and accuracy of the observations, and not on the flow or season, the ET analysis covariance is flow dependent because its perturbations are composed of dynamical modes that have recently amplified in response to the flow and stability properties of the recent state of the atmosphere.
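For concreteness, a schematic ET update satisfying properties (i) and (ii) is sketched below; it assumes a prescribed (inverse) analysis error covariance estimate and zero-mean perturbation columns, and it is a sketch of the basic transform, not the operational, locally formulated code.

```python
import numpy as np

def et_transform(Xf, A_inv):
    """Schematic ensemble transform (cf. Bishop and Toth 1999).
    Xf: forecast perturbations (state_dim x K, columns summing to zero).
    A_inv: inverse of a prescribed analysis error covariance estimate.
    Returns Xa = Xf @ T, so each analysis perturbation is a linear
    combination of forecast perturbations (property i), with
    Xa.T @ A_inv @ Xa / (K - 1) = I in the directions the ensemble
    spans, i.e., consistent with A in that subspace (property ii)."""
    K = Xf.shape[1]
    S = Xf.T @ A_inv @ Xf / (K - 1)        # ensemble-space projection of A^-1
    evals, C = np.linalg.eigh(S)
    # The zero-sum constraint on the columns gives one null eigenvalue;
    # the guard below only avoids division warnings, since Xf is zero
    # along that direction and the result Xa is unaffected.
    evals = np.maximum(evals, 1e-12)
    T = C @ np.diag(evals ** -0.5) @ C.T   # transform matrix
    return Xf @ T                          # analysis perturbations
```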
The ET ensemble generation scheme used in our experiments has a climatological 3D-Var-based estimate of analysis error variance for each of the periods 0000, 0600, 1200, and 1800 UTC. Though there is a strong diurnal dependence in the variances due to the variation of the observation network (as discussed in McLay et al. 2008), through experimentation we detected little seasonal dependence. This is understandable because there is little seasonal dependence in the observation network. Thus, we used diurnally varying climatological averages, computed from 20 November 2008 to 31 December 2008, for all of our experiments. Recall that the ET analysis error covariance is determined by a combination of the instability of the recent and current flow regime and the prescribed NAVDAS estimate of analysis error variances; consequently, the ET analysis ensemble variances depend on the flow of the day and the season even though the NAVDAS estimate of analysis error variance does not. Low-resolution experiments comparing climatological and day-to-day varying analysis error variance files showed no statistically significant differences between the results. The initial ensemble, used only at the startup time of the experiment cycle, comprises 80 randomly chosen atmospheric states drawn from a 32-member flow-dependent ensemble system cycled during December 2008. The same initial ensemble was used to start both the winter and summer experiments, which is one reason why a spinup period of 30 days was used.
The second reason for the 30-day spinup period was the need to train the coefficients of the variational radiance bias correction system (VarBC) used for all radiance observations (Derber and Wu 1998; Dee 2004; B. Chua 2012, personal communication). With VarBC, the coefficients for the model-based bias predictors become part of the state vector and are updated each data assimilation cycle. Although satellite VarBC has not yet been put into operations at the Fleet Numerical Meteorology and Oceanography Center (FNMOC), tests have shown that it performs as well as or better than the scheme used in operations at the time of writing, and it is planned for transition to operations in the near future. The current operational system uses the offline two-predictor bias correction approach of Harris and Kelly (2001).
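Schematically, VarBC models each radiance bias as a linear combination of predictors and corrects the innovations accordingly; the coefficients ride along in the control vector and are reestimated every cycle. The sketch below uses a generic predictor array; the predictor set and the interface are hypothetical, not the operational NAVDAS-AR choices.

```python
import numpy as np

def bias_corrected_innovation(y_obs, y_model, predictors, beta):
    """Predictor-based radiance bias correction (cf. Derber and Wu 1998;
    Dee 2004): d = y - H(x) - sum_i beta_i * p_i.
    predictors: (n_obs, n_pred) array, e.g., a constant term, layer
    thicknesses, and scan-angle powers (illustrative choices only).
    beta: (n_pred,) coefficients; in VarBC these are appended to the
    analysis control vector and updated by the minimization itself."""
    return y_obs - y_model - predictors @ beta
```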
4. Results
We use the static mode, in which only the static covariances are employed, as the control for all comparisons; it represents the operational system.
The first set of comparisons is between the static mode and the hybrid mode (sections 4a–c); the second compares the static mode with the flow-dependent mode (section 4d).
The forecast quality evaluations were made with single 5-day deterministic forecasts, launched every 12 h (as is done operationally), comparing mass and wind fields verified against self-analyses and/or radiosonde observations. The verification scores are computed either globally or separately for the two extratropical regions (NH = 20°–80°N and SH = 20°–80°S) and the tropical region (TR = 20°S–20°N). The self-analysis verification is performed at every grid point on a 360 × 180 Gaussian grid and weighted by the cosine of latitude to account for the different area coverage of the grid boxes. The radiosonde verification is performed on a subset of 400 high-quality radiosonde stations scattered around the globe, with 80.5% in the NH, 8.5% in the TR, and 10.8% in the SH (J. Goerss 2012, personal communication).
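A minimal sketch of the cosine-weighted RMS difference used in the self-analysis verification is given below; regional subsetting and the Gaussian-grid details are omitted, and the array shapes are assumptions.

```python
import numpy as np

def weighted_rms(forecast, analysis, lats_deg):
    """Cos(latitude)-weighted RMS difference between two (nlat, nlon)
    fields, compensating for the shrinking area of grid boxes toward
    the poles."""
    w = np.cos(np.deg2rad(lats_deg))[:, None]        # (nlat, 1) weights
    diff2 = (forecast - analysis) ** 2
    return np.sqrt(np.sum(w * diff2) / np.sum(w * np.ones_like(diff2)))
```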
There are advantages and disadvantages to using both radiosonde and self-analysis data for validation. A pitfall of verifying against self-analyses is that one would obtain perfect forecast scores if no observations were assimilated, because the analysis would then be identical to the forecast. It is generally believed that changes to the data assimilation scheme that reduce the magnitude and/or spatial extent of analysis corrections can increase the similarity of forecasts to self-analyses without reducing, and possibly while increasing, the distance of forecasts from the truth. These issues are partially mitigated at longer forecast leads, which is why we report self-analysis verifications only for forecasts longer than 2 days. The primary limitation of verifying against radiosondes alone is that radiosondes cover a relatively small area of the globe, and such an approach ignores information from other observation types such as aircraft and satellite observations. For these reasons, one must be cautious about drawing conclusions from either verification in isolation.
a. RMS vector wind error for the hybrid mode experiment
Fig. 3. (top) Impact and (bottom) significance level of that impact on the verification scores of 5-day forecasts of vector wind from the hybrid mode relative to the static mode, verified against radiosondes, for 0000 UTC 1 Jul 2010–0000 UTC 1 Sep 2010.
Fig. 4. As in Fig. 3, but displaying 0000 UTC 1 Feb–0000 UTC 1 Apr 2011.
Fig. 5. As in Fig. 3, but given for the RMSD from verifying analyses rather than radiosondes. NOGAPS scorecard metrics are highlighted in black. The Met Office scorecard metrics are highlighted in green.
Fig. 6. As in Fig. 5, but displaying 0000 UTC 1 Feb–0000 UTC 1 Apr 2011.
The numerous red boxes in Figs. 3–6 indicate where the improvement of the hybrid over the control is statistically significant. The robustness of the improvement is particularly marked in the stratosphere and at 1–3-day forecast lead times in the troposphere.
It is of interest to note that in Figs. 3 and 4, at pressure levels 30 and 50 hPa in all regions, the static mode experiment is closer to the observations at zero lead time than the hybrid mode. However, for lead times greater than zero, the hybrid mode is closer to the verifying observations than the static mode. The fact that the hybrid mode forecasts are better than the static mode forecasts suggests that the hybrid mode analyses were better than the static mode analyses, even though the hybrid mode analyses are farther from the observations at zero lead time. This is possible because observations are themselves imperfect; hence, an analysis that lies closer to the observations than another is not necessarily better.
The magnitude of the percentage improvements for radiosondes (Figs. 3 and 4) is smaller than that of the self-analysis improvements (Figs. 5 and 6). We have not yet been able to diagnose the cause of this inconsistency, but there are several possibilities. First, radiosonde observations and analyses have independent errors that no amount of improvement of the forecast can remove. Consequently, if observation error variances were larger than analysis error variances, then the percentage reduction in the RMS difference between forecast and verification due to a fixed reduction in forecast error variance would be smaller when the verification is based on radiosonde observations than when it is based on analyses. Second, even though comparisons against self-analyses allow all observation types to contribute to the verification metric, false positives/negatives are possible. If, for example, no observations were assimilated for a period extending beyond 5 days, the verifying analysis would become identical to the 5-day forecast because, in the absence of any observations, our current system sets the analysis equal to the forecast. In future work, we hope to include an independent analysis (from a different operational center) among our verification tools so that we can avoid the possibility of false positives due to comparison against self-analyses.
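The first possibility can be made precise. If the forecast error f − t and the observation error y − t (with t the truth) are independent, then

\[
E\left[(f-y)^{2}\right] = E\left[(f-t)^{2}\right] + \sigma_{o}^{2},
\]

so a fixed reduction in the forecast error variance E[(f − t)²] yields a smaller fractional reduction of the forecast-minus-radiosonde RMS when the observation error variance σ_o² is large, whereas the corresponding "noise" term for self-analysis verification is the (typically smaller) analysis error variance.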
b. Globally averaged radiosonde verifications for the hybrid mode experiment
The globally averaged analysis and forecast verifications versus radiosonde observations are presented in Figs. 7 and 8 for geopotential height (left-hand column), temperature (center column), and vector wind (right-hand column). The results are presented in a style similar to the regionally averaged RMS vector wind error plots in section 4a, but for globally averaged values. The radiosonde verification vector wind plots (right-hand column) of Figs. 7 and 8 are an average over all of the regions of Figs. 3 and 4, with the addition of radiosondes near the poles. These plots highlight the similarities and differences between the results of the different forecast metrics.
Fig. 7. As in Fig. 3, but the RMSD from radiosondes is averaged over the whole globe and is for (left) geopotential height, (middle) temperature, and (right) vector wind. NOGAPS scorecard metrics are highlighted in black.
Fig. 8. As in Fig. 7, but displaying 0000 UTC 1 Feb–0000 UTC 1 Apr 2011.
In general, the results with the different metrics (geopotential height, temperature, and vector wind) are similar (i.e., the hybrid experiment improved the different metrics at most levels and at most forecast lead times greater than zero). One notable exception is temperature at 30 hPa in Fig. 7 (July–August 2010), where the static mode experiment was statistically significantly better than the hybrid mode at lead times >3 days (although the hybrid mode was better in vector wind and geopotential height). We have not yet been able to diagnose the cause of this inconsistency, but there are several possibilities, such as improper specification of the ensemble localization at this level (which is very tight). In future work, we hope to address the localization issue and to try to improve temperature forecasts in the stratosphere without removing the improvements seen in the geopotential heights or vector winds.
c. Geopotential height anomaly correlation for the hybrid mode experiment
Fig. 9. (top) Impact and (bottom) significance level of that impact on the verification scores of 5-day forecasts of the geopotential height anomaly correlation from the hybrid mode relative to the static mode, for 0000 UTC 1 Jul 2010–0000 UTC 1 Sep 2010.
Fig. 10. As in Fig. 9, but displaying 0000 UTC 1 Feb–0000 UTC 1 Apr 2011.
Figures 9 and 10, showing the geopotential height anomaly correlation, demonstrate the importance of using spatially localized, flow-dependent background error covariances rather than stationary, highly parameterized covariances. For both time periods, the hybrid mode experiment improved the geopotential height anomaly correlations of the 2–5-day forecasts in the Southern Hemisphere at all vertical levels and in the Northern Hemisphere below 100 hPa. Many of these areas were statistically significantly closer to 1.0 than the static mode experiment, although the improvements were slight and there were no lead times or vertical levels where the increase exceeded 5%. There is one small area in the Northern Hemisphere, at 100 hPa and the 5-day lead time, where the static mode was closer to 1.0 than the hybrid mode, but this difference was not statistically significant.
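For reference, the anomaly correlation used here is assumed to take the conventional centered form

\[
\mathrm{AC} = \frac{\sum w\,(f-c)(a-c)}{\sqrt{\sum w\,(f-c)^{2}\,\sum w\,(a-c)^{2}}},
\]

where f is the forecast, a the verifying analysis, c the climatology, and w the cos(latitude) area weights; a perfect forecast gives AC = 1, which is why scores "closer to 1.0" indicate improvement.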
d. Flow-dependent mode verifications
Figures 11 and 12 present the radiosonde verification, and Figs. 13 and 14 the self-analysis verification, of the RMS vector wind error results comparing the static mode with the flow-dependent mode.
Fig. 11. As in Fig. 3, but comparing the static mode with the flow-dependent mode.
Fig. 12. As in Fig. 4, but comparing the static mode with the flow-dependent mode.
Fig. 13. As in Fig. 5, but comparing the static mode with the flow-dependent mode.
Fig. 14. As in Fig. 6, but comparing the static mode with the flow-dependent mode.
Comparing Figs. 11–14 with Figs. 3–6, we see very good agreement above 100 hPa in all of the plots. The discrepancy between the hybrid mode results and the flow-dependent mode results lies below 100 hPa. Both the radiosonde and self-analysis verifications agree that the 100% flow-dependent ensemble is inferior to the static background error covariance matrix in the extratropical troposphere. The radiosonde verification (Figs. 11 and 12) disagrees with the self-analysis verification (Figs. 13 and 14) in the tropics, where the radiosonde verification favors the static mode and the self-analysis verification favors the flow-dependent mode. When the two verification statistics disagree, it is hard to identify which is closer to the truth. On one hand, the self-analysis verification is flawed because one would obtain perfect forecast scores if no observations were assimilated, the analysis then being identical to the forecast. On the other hand, the radiosondes cover a relatively small area of the globe, and verifying against them ignores information from other observation types such as aircraft and satellite observations. In future work, we hope to include an independent analysis (from a different operational center) among our verification tools because there is less chance that the analysis error of an independent analysis will be correlated with forecast error.
Our flow-dependent results are contrary to the findings of Buehner et al. (2010b, see their Figs. 6 and 7), who found a positive impact of the 100% flow-dependent ensemble in all regions and at all levels. This suggests that the accuracy of the Canadian ensemble covariance model relative to the Canadian static covariance model is greater than the corresponding ratio for our system. One significant difference between our ET ensemble and the Canadian ensemble is that the Canadian ensemble incorporates samples from a static covariance matrix (Houtekamer et al. 2005) whereas our ET ensemble does not; consequently, the Canadian ensemble covariance model may not benefit as much from being linearly combined with a static covariance model, because it already partially incorporates information from the 3D-Var static covariance model. There are other differences as well, such as ensemble size: Buehner et al. used 96 members whereas we used 80. Also, it is likely that the Canadian EnKF provides a more accurate estimate of the effect of observations on the analysis error covariance than the very approximate ET approach. Finally, Buehner et al. simulated the effect of model error with perturbations to the forecast initial conditions and with different configurations of the model physical parameterizations (Houtekamer et al. 2009), neither of which is included in our ensemble system.
We found, in low-resolution experiments, that the ensemble covariances, and not the ensemble variances, were the major contributor to the improvements of the hybrid and flow-dependent modes over the static mode. For computational reasons, this investigation was performed using a low-resolution version of the system (outer-loop resolution T119/L42, with a 32-member ensemble and inner-loop resolution T47/L42) and only conventional observations (no satellite sensors); running experiments at operational resolution with operational observations costs 2 months of computer time per experiment, whereas low-resolution experiments can be completed in a matter of days. The covariance experiments were performed using mixed configurations, in which the ensemble contribution was restricted to either its variances or its full covariances.
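A sketch of this kind of mixing diagnostic is given below: it builds a covariance that takes its correlation structure from one source and its variances from the other, so that the two contributions can be attributed separately. The matrices and interface are illustrative assumptions, not the experiment code.

```python
import numpy as np

def mix_correlations_and_variances(P_corr_source, P_var_source):
    """Return a covariance with the correlation structure of
    P_corr_source and the variances (diagonal) of P_var_source,
    allowing, e.g., ensemble correlations to be paired with static
    variances and vice versa."""
    s = np.sqrt(np.diag(P_corr_source))
    corr = P_corr_source / np.outer(s, s)    # strip source variances
    v = np.sqrt(np.diag(P_var_source))
    return corr * np.outer(v, v)             # impose target variances
```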
e. Aggregate NOGAPS scorecard
The tool used at the U.S. Navy's operational center (FNMOC) to summarize verification results between a control experiment (normally the operational code) and a comparison experiment is the so-called "NOGAPS scorecard" (R. Pauley 2012, personal communication). The scorecard verification metrics for deterministic forecasts are listed in Table 1 (a different scorecard is used for ensemble forecasts). The scorecard awards positive (or negative) points when forecasts of a specified field have been improved (or degraded); the more positive points awarded, the greater the perceived value of the system change.
Table 1. NOGAPS scorecard. The areas are global (GL), Northern Hemisphere (NH), tropics (TR), and Southern Hemisphere (SH); tau is the forecast lead time in days. For anomaly correlations, the criterion for weighting is that the scores must be statistically different with a confidence level of 95%. For all other score types, the criteria for weighting are that the scores must be at least 5% less and statistically different with a confidence level of 95%. Weights for the control experiment are negative and weights for the comparison experiment are positive. The total score for a control vs comparison experiment is the sum of all category weights, with a maximum of 24 points at stake. An aggregate score of −1 or better is considered a neutral (or better) overall result and is the minimum requirement for a major system change to be considered for operational promotion.
The Northern Hemisphere 500-hPa geopotential height anomaly correlation for the 96-h forecast is given 4 times the weight of the other geopotential height anomaly correlation metrics in the table. The anomaly correlation scores must be statistically higher with a confidence level of 95%. For wind speed and vector wind RMS errors, the errors of the comparison experiment must be at least 5% less than those of the control experiment, with a confidence level of 95%. The aggregate verification score is the sum of all categories, with a maximum of 24 points. An aggregate score of −1 or better is considered neutral (or better) overall and is a minimum requirement to promote a major system upgrade; the allowance of −1 accommodates the small amount of degradation that sometimes accompanies major system changes.
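The weighting logic amounts to a signed, thresholded comparison per category. The sketch below encodes it for error-type metrics (for anomaly correlations, higher is better and the 5% threshold is dropped); the interface is a placeholder, not FNMOC code.

```python
def score_category(control_err, test_err, p_value, weight,
                   needs_5pct=True):
    """Award +weight if the comparison experiment beats the control
    under the scorecard criteria (95% significance, plus a 5% margin
    for RMS-type metrics), -weight if the control wins, else 0."""
    if p_value > 0.05:                      # not significant at 95%
        return 0
    margin = 0.95 if needs_5pct else 1.0
    if test_err < margin * control_err:     # comparison experiment wins
        return +weight
    if control_err < margin * test_err:     # control wins
        return -weight
    return 0
```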
Figures 3–10 include black outlines around the boxes where the NOGAPS scorecard metrics are measured. For both experiments, the buoy verification as well as the tropical cyclone forecast track verification yielded a neutral (+0) scorecard value (not shown).
The aggregate NOGAPS scorecard for the hybrid mode experiments has a score of +1 for the July–August 2010 experiment and +1 for the February–March 2011 experiment. If the 5% improvement constraint is relaxed and only statistical significance is required, then the hybrid system has a total aggregate score of +7 for July–August 2010 and +5 for February–March 2011. Either way, the scores are sufficiently high for the system to be considered for extended testing, tuning, and preoperational trials. As a reference, the NAVDAS (3D-Var) preoperational trials had a scorecard value of +1, while the NAVDAS-AR (4D-Var) preoperational trials had a scorecard value of +4.
The NOGAPS scorecard value for the flow-dependent mode experiment is considerably different: both the July–August 2010 and the February–March 2011 experiments had a total aggregate score of −10. Clearly, running the system in the flow-dependent mode would not warrant operational implementation at this time. However, as can be seen in Figs. 13 and 14, there are areas that performed better in the flow-dependent mode experiment than in the hybrid mode experiment. For example, the flow-dependent mode experiment had greater than a 30% reduction in the tropical vector wind RMS errors against self-analysis at all forecast lead times for all levels at and above 100 hPa (Figs. 13 and 14). In contrast, the hybrid mode experiment (Figs. 5 and 6) had between 5% and 20% improvement for most of those levels and forecast lead times. For this reason, we hypothesize that a spatially varying value of α may yield the best results.
f. Aggregate Met Office scorecard
Figure 15 was created to facilitate intercomparison with the work of Clayton et al. (2013) (see Fig. 10 in their paper). This figure does not include all of the Met Office scorecard metrics [listed in Table 1 of Clayton et al. (2013)]; some of those presented in this paper are highlighted in green in Figs. 5–6, 9–10, and 13–14. Comparison with Clayton et al.'s results shows that our improvement against self-analysis in the tropics is better than theirs but about the same in the extratropics. Presumably, the differences are due to some combination of differences in the ratio of the accuracy of the ensemble covariances to that of the static covariances in each of the systems. Notable differences between the two systems include (i) our ensemble has 80 members whereas theirs had 24 members; and (ii) Clayton et al. localized ensemble covariances of geopotential, streamfunction, and velocity potential whereas we localized ensemble covariances of wind and temperature. As shown by Kepert (2011), their localization approach is better at preserving quasigeostrophic balance. While quasigeostrophic balance is a fair assumption in the extratropics, its value in the tropics is unclear. Our approach to localization ensures that the variance of the wind field is preserved whereas the approach of Kepert (2011) does not: as noted by Kepert, localizing streamfunction and velocity potential causes the modeled wind variance to be larger than the wind variance in the unlocalized ensemble. In the extratropics, the potentially damaging effects of spuriously increased wind variances are compensated for by improved quasigeostrophic balance. However, in the tropics there is no compensating "balance benefit" and, hence, it is possible that localization of streamfunction, velocity potential, and geopotential is worse than localization of wind and temperature. More careful experimentation will be required to assess the validity of this hypothesis.
Fig. 15. Truncated table of Met Office scorecard metrics. Bars represent percentage changes in RMS error of the hybrid mode relative to the static mode.
To check the dynamic balance of the initial conditions obtained in our experiments, we examined the forecast model's global RMS surface pressure tendency (SPT) averaged over 0000 UTC 1 July 2010–0000 UTC 1 September 2010 and 0000 UTC 1 February–0000 UTC 1 March 2011. We found that, in all experiments, the SPT is at its highest in the first 3 h of the forecast integration. We take these elevated values to be a sign of imbalance and use the average SPT over the first 3 h as a measure of imbalance. The flow-dependent mode SPT values were lower than the static mode SPT values, a result that was statistically significant with greater than 99% confidence. This suggests that the states produced by our localized ensemble covariances propagated by the TLM and adjoint were at least as balanced as those produced by our static initial covariance model.1 In considering this result, recall that the balances used in this static covariance are currently fairly simplistic (Daley and Barker 2001). Further improvements to the balance of initial conditions from localized ensemble covariances might be obtained by using localization methods that are better designed to preserve balance, such as those suggested by Kepert (2011). The SPT values of the hybrid mode were also lower than the static mode values, but this difference was not statistically significant at the 95% level. Last, we checked that the SPT values from the static mode were not statistically significantly different from those of the operational mode experiment, which has the time window discretized into 0.5-h intervals and the digital filter turned on.
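For clarity, the imbalance measure described above can be written as a short routine: the cos(latitude)-weighted global RMS of the surface pressure tendency, averaged over the first 3 h of each forecast. Shapes and variable names are assumptions for illustration.

```python
import numpy as np

def spt_imbalance(ps, dt_hours, lats_deg, window_hours=3.0):
    """ps: surface pressure (ntime, nlat, nlon) through the forecast.
    Returns the global RMS surface pressure tendency, cos(latitude)
    weighted, averaged over the first `window_hours` of integration."""
    dpdt = np.diff(ps, axis=0) / dt_hours            # tendency per step
    nsteps = max(1, int(round(window_hours / dt_hours)))
    dpdt = dpdt[:nsteps]
    w = np.cos(np.deg2rad(lats_deg))[None, :, None]  # area weights
    ms = (np.sum(w * dpdt**2, axis=(1, 2))
          / np.sum(w * np.ones_like(dpdt), axis=(1, 2)))
    return float(np.mean(np.sqrt(ms)))
```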
5. Conclusions
We have tested the performance of a new form of the Navy's NAVDAS-AR observation space data assimilation scheme. This new form replaces a purely static initial covariance matrix with a hybrid linear blend of static and ensemble-based covariance matrices. Our results show that the hybrid mode improved forecasts relative to the static mode across most variables, regions, levels, and forecast lead times, whereas the purely flow-dependent mode degraded forecasts in the extratropical troposphere and would not warrant operational implementation at this time.
Acknowledgments
We thank all those responsible for the development of NAVDAS-AR; in particular, we thank the late Roger Daley, who first formulated and initiated the development of NAVDAS-AR. We would also like to thank Liang Xu, who was the PI of the project that ultimately led to the transition of NAVDAS-AR into operations. Boon Chua, Tim Hogan, Ben Ruston, James Goerss, and Pat Pauley also made major contributions to NAVDAS-AR. This research was started while D. D. Kuhl held a National Research Council Research postdoctoral fellowship at the Naval Research Laboratory, Washington, D.C., and continued during his NRL Jerome Karl Fellowship award. T. Rosmond acknowledges support from PMW-120 under Program Element 0603207N. C. H. Bishop acknowledges support from Office of Naval Research base funding via Program Element 0601153N, Task BE033-03-45, and NOPP funding via Program Element 0602435N. NAVDAS-AR was originally developed with ONR and PMW-120 funding under NRL base Program Elements 0601153N and 0602435N.
REFERENCES
Bishop, C. H., and Z. Toth, 1999: Ensemble transformation and adaptive observations. J. Atmos. Sci., 56, 1748–1765.
Bishop, C. H., D. Hodyss, P. Steinle, H. Sims, A. M. Clayton, A. C. Lorenc, D. M. Barker, and M. Buehner, 2011: Efficient ensemble covariance localization in variational data assimilation. Mon. Wea. Rev., 139, 573–580.
Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, and B. He, 2010a: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and single-observation experiments. Mon. Wea. Rev., 138, 1550–1566.
Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, and B. He, 2010b: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part II: One-month experiments with real observations. Mon. Wea. Rev., 138, 1567–1586.
Clayton, A. M., A. C. Lorenc, and D. M. Barker, 2013: Operational implementation of a hybrid ensemble/4D-Var global data assimilation system at the Met Office. Quart. J. Roy. Meteor. Soc., doi:10.1002/qj.2054, in press.
Courtier, P., 1997: Dual formulation of four-dimensional variational assimilation. Quart. J. Roy. Meteor. Soc., 123, 2449–2461.
Courtier, P., J. N. Thepaut, and A. Hollingsworth, 1994: A strategy for operational implementation of 4D-Var, using an incremental approach. Quart. J. Roy. Meteor. Soc., 120, 1367–1387.
Daley, R., and E. Barker, 2001: The NAVDAS sourcebook. Naval Research Laboratory NRL/PU/7530–01-441, 161 pp. [Available online at http://www.dtic.mil/dtic/tr/fulltext/u2/a396883.pdf.]
Dee, D., 2004: Variational bias correction of radiance data in the ECMWF system. Proc. Workshop on Assimilation of High Spectral Resolution Sounders in NWP, Reading, United Kingdom, ECMWF, 97–112.
Derber, J. C., and W. S. Wu, 1998: The use of TOVS cloud-cleared radiances in the NCEP SSI analysis system. Mon. Wea. Rev., 126, 2287–2299.
Etherton, B. J., and C. H. Bishop, 2004: Resilience of hybrid ensemble/3DVAR analysis schemes to model error and ensemble covariance error. Mon. Wea. Rev., 132, 1065–1080.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143–10 162.
Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919.
Hamill, T. M., J. S. Whitaker, D. T. Kleist, M. Fiorino, and S. G. Benjamin, 2011: Predictions of 2010's tropical cyclones using the GFS and ensemble-based data assimilation methods. Mon. Wea. Rev., 139, 3243–3247.
Harris, B., and G. Kelly, 2001: A satellite radiance-bias correction scheme for data assimilation. Quart. J. Roy. Meteor. Soc., 127, 1453–1468.
Hogan, T. F., T. Rosmond, and R. Gelaro, 1991: The NOGAPS forecast model: A technical description. Naval Research Laboratory AD–A247 216, 218 pp. [Available online at http://handle.dtic.mil/100.2/ADA247216.]
Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and M. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Mon. Wea. Rev., 133, 604–620.
Houtekamer, P. L., H. L. Mitchell, and X. X. Deng, 2009: Model error representation in an operational ensemble Kalman filter. Mon. Wea. Rev., 137, 2126–2143.
Kepert, J. D., 2011: Balance-aware covariance localisation for atmospheric and oceanic ensemble Kalman filters. Comput. Geosci., 15, 239–250.
Kleist, D. T., D. F. Parrish, J. C. Derber, R. Treadon, W. S. Wu, and S. Lord, 2009: Introduction of the GSI into the NCEP Global Data Assimilation System. Wea. Forecasting, 24, 1691–1705.
Lewis, J. M., and J. Derber, 1985: The use of adjoint equations to solve a variational adjustment problem with advective constraints. Tellus, 37A, 309–322.
Lorenc, A. C., 1981: A global three-dimensional multivariate statistical interpolation scheme. Mon. Wea. Rev., 109, 701–721.
Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP— A comparison with 4D-Var. Quart. J. Roy. Meteor. Soc., 129, 3183–3203.
McLay, J. G., C. H. Bishop, and C. A. Reynolds, 2008: Evaluation of the ensemble transform analysis perturbation scheme at NRL. Mon. Wea. Rev., 136, 1093–1108.
McLay, J. G., C. H. Bishop, and C. A. Reynolds, 2010: A local formulation of the ensemble transform (ET) analysis perturbation scheme. Wea. Forecasting, 25, 985–993.
Rosmond, T., and L. Xu, 2006: Development of NAVDAS-AR: Non-linear formulation and outer loop tests. Tellus, 58A, 45–58.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319.
Wang, X. G., C. Snyder, and T. M. Hamill, 2007: On the theoretical equivalence of differently proposed ensemble–3DVAR hybrid analysis schemes. Mon. Wea. Rev., 135, 222–227.
Wang, X. G., D. Parrish, D. Kleist, and J. Whitaker, 2013: GSI 3DVar-based ensemble-variational hybrid data assimilation for NCEP Global Forecast System: Single resolution experiments. Mon. Wea. Rev., in press.
Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Elsevier Academic Press, 704 pp.
Xu, L., T. Rosmond, and R. Daley, 2005: Development of NAVDAS-AR: Formulation and initial tests of the linear problem. Tellus, 57A, 546–559.
Zhang, F. Q., M. Zhang, and J. A. Hansen, 2009: Coupling ensemble Kalman filter with four-dimensional variational data assimilation. Adv. Atmos. Sci., 26, 1–8.
Zhang, F. Q., M. Zhang, and J. Poterjoy, 2013: E3DVar: Coupling an ensemble Kalman filter with three-dimensional variational data assimilation in a limited-area weather prediction model and comparison to E4DVar. Mon. Wea. Rev., 141, 900–917.
Zhang, M., and F. Q. Zhang, 2012: E4DVar: Coupling an ensemble Kalman filter with four-dimensional variational data assimilation in a limited-area weather prediction model. Mon. Wea. Rev., 140, 587–600.
1 Experiments not reported on here suggest that the use of the TLM and adjoint in the 4D covariance model improves the balance of the analyzed states.