1. Introduction
In a continuous effort to improve initial conditions for convective-scale numerical weather predictions (NWP), operational centers have developed and implemented high-resolution limited-area data assimilation based on 3D variational [3DVar; Météo France: Brousseau et al. (2011); Japan Meteorological Agency (JMA): Aranami et al. (2015)], 4D variational [4DVar; JMA: Honda et al. (2005); Met Office: Ingleby et al. (2013)], ensemble Kalman filter [EnKF; Deutscher WetterDienst (DWD): Schraff et al. 2016], or 3D ensemble–variational (3D-EnVar) approaches [National Oceanic and Atmospheric Administration (NOAA): Benjamin et al. 2004, 2016; Hu et al. 2006, 2017; Wu et al. 2017]. Gustafsson et al. (2018) provide an extensive survey of the current methods being used and developed.
Following the methodology proposed with the Rapid Update Cycle (RUC) system (Benjamin et al. 2004), most high-resolution limited-area data assimilation algorithms now use a short (1 or 3 h) assimilation time window. Such a short time window may allow updating the convection-permitting models more frequently, but reduces the flow dependency of 4DVar methods when using climatological background error covariances at the beginning of the window. These climatological covariances may be well adapted to describe large-scale balance relationships, such as geostrophy and hydrostaticity (Gauthier et al. 1999), but do not capture finescale correlations within the flow (Bédard et al. 2015). They still appear to be essential within data assimilation schemes that use hybrid covariances (Buehner 2005; Buehner et al. 2015; Caron et al. 2015; Benjamin et al. 2016; Hu et al. 2017; Wu et al. 2017), possibly due to a reduction in the effects of sampling errors from ensemble-based background error statistics. However, the use of climatological error structures can actually produce unbalanced increments in some situations and thereby limit the propagation of the information from dense observation networks (Bédard et al. 2017).
Increased spatial resolution for the forecast model implies a higher number of degrees of freedom and a need for dense observation networks to constrain the model initial state. As at many other NWP centers, only a small fraction of satellite, radar, and surface observations is used in Environment and Climate Change Canada (ECCC) operational systems. For example, the horizontal thinning distance for all assimilated satellite radiances is 150 km, weather radar observations are not assimilated, and the screen-level wind observations are not assimilated over land. Although data assimilation for convective-scale NWP has been the object of intense research lately, the resolution and the quality of background error covariances remain factors limiting the assimilation of dense observations (Gustafsson et al. 2018). Many operational centers are thus now investing in the generation of high-resolution ensembles to provide their assimilation schemes with flow-dependent background error covariances at scales closer to those resolved by the NWP model (e.g., Lu et al. 2017). As a result of improved quality of background error statistics from ensembles, further improvements should be obtainable from reducing the relative contribution of the climatological error component in hybrid schemes.
As the generation of high-resolution ensemble-based background error covariances can be complex and computationally prohibitive, practical and low-cost methodologies are presented and evaluated in this study. Cycled data assimilation experiments are performed to assess the value of the different ensemble generation approaches. The practical and low-cost methodologies are compared with the ECCC operational global EnKF (G-EnKF) and an experimental high-resolution regional EnKF (R-EnKF). The goal is to provide ECCC’s experimental regional deterministic 4D-EnVar scheme with higher-resolution flow-dependent background error covariances that perform at least as well as those from the G-EnKF, when using the same observations within the deterministic data assimilation scheme. The use of these higher-resolution ensemble-based covariances is expected to result in further improvements by enabling more effective use of observations at high temporal and spatial resolutions.
The paper is organized as follows. Section 2 provides an overview of ECCC’s G-EnKF and R-EnKF and introduces the practical and low-cost ensemble generation approaches used to estimate flow-dependent background error covariances. Section 3 presents the experimental regional deterministic data assimilation and prediction system, while also describing the configuration of the experiments performed in this study. An analysis of the computational costs for each ensemble generation method is also given in section 3. The ensemble characteristics from applying the different approaches are assessed through diagnostic results presented in section 4. The impacts of using ensembles generated with the different approaches are then evaluated in section 5 in the context of data assimilation experiments by comparing the resulting deterministic forecasts with in situ observations. Concluding remarks and aspects of future development are presented in section 6.
2. Ensemble-based background error covariances
Various practical ensemble generation approaches are considered from a hierarchy of methods ranging from simple and cheap to complex and expensive. The simplest and cheapest solution considered is using the global ensemble forecasts from the ECCC G-EnKF interpolated to the regional grid, given that this system is already operational and therefore does not require the implementation of an additional ensemble system. In terms of high-resolution ensembles, a computationally cheap and conceptually simple solution is to add balanced perturbations to a deterministic analysis (the so-called filter free approach). A more complex, yet still computationally cheap, approach is to generate a regional ensemble of forecasts initialized with a simplified ensemble square root filter (S-EnSRF), centered on deterministic analyses. Finally, the most expensive approach considered is to generate ensemble forecasts from the ECCC R-EnKF. These ensemble generation methods are presented in the following subsections.
a. Global and regional EnKFs
A G-EnKF was first implemented at ECCC in 2005 as the data assimilation methodology used to initialize medium-range global ensemble forecasts (Houtekamer et al. 2005). A brief history of the G-EnKF and some key aspects can be found in Houtekamer et al. (2014a,b). In a nutshell, ECCC’s EnKF is based on a sequential filter in which observations are randomly perturbed to account for their errors. It employs a multimodel approach in which ensemble members use a variety of different model configurations to sample uncertainty in the physical parameterizations. Still, to cope with insufficient ensemble spread in the forecasts, additive covariance inflation sampled from climatological background error covariances is applied as a system error term. To address sampling errors due to limited ensemble size, spatial localization is applied to the background error statistics by computing the Schur product of the ensemble-based covariances with a compactly supported fifth-order piecewise rational function, as described by Gaspari and Cohn (1999) and Houtekamer and Mitchell (2001).
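The localization step described above can be illustrated with a minimal Python sketch. It evaluates the Gaspari and Cohn (1999) fifth-order piecewise rational function and applies it to a sample covariance matrix through a Schur (element-wise) product. The one-dimensional grid, the ensemble size, the random perturbations, and the 1400-km cutoff are illustrative assumptions only; this is not ECCC's implementation.

```python
import numpy as np

def gaspari_cohn(dist, cutoff):
    """Gaspari-Cohn (1999) fifth-order piecewise rational function.

    dist   : array of separation distances (same units as cutoff)
    cutoff : distance at which the function reaches zero
    """
    c = cutoff / 2.0                                  # half-width parameter
    r = np.abs(np.asarray(dist, dtype=float)) / c
    taper = np.zeros_like(r)                          # zero beyond the cutoff

    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)

    ri = r[inner]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)
    ro = r[outer]
    taper[outer] = ((1.0 / 12.0) * ro**5 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0
                    - (2.0 / 3.0) / ro)
    return taper

# Toy example (all values hypothetical): 20 members on a 1D grid of 50 points.
rng = np.random.default_rng(0)
n_members, n_points = 20, 50
x = np.arange(n_points) * 100.0                       # grid coordinates in km
perts = rng.standard_normal((n_members, n_points))
perts -= perts.mean(axis=0)                           # remove the ensemble mean
cov = perts.T @ perts / (n_members - 1)               # raw sample covariance

dist = np.abs(x[:, None] - x[None, :])                # pairwise distances
loc = gaspari_cohn(dist, cutoff=1400.0)               # localization matrix
cov_loc = loc * cov                                   # Schur product
```

The variances on the diagonal are left unchanged, while covariances between points separated by the cutoff distance or more are set exactly to zero, which removes spurious long-range sample correlations.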
To provide the experimental regional deterministic data assimilation scheme with higher-resolution background error covariances, an experimental R-EnKF has recently been developed. It is essentially based on the G-EnKF algorithm, with the exception that all ensemble members have identical forecast model configuration. Also, being a limited-area system, the R-EnKF uses lateral boundary conditions (LBCs) from the G-EnKF. Details on the experimental configurations of both the G-EnKF and R-EnKF are given in section 3b.
b. Filter free approach
c. Simplified ensemble square root filter
While the analysis procedures of the EnKF and the full EnSRF tend to sharpen the ensemble perturbation correlations when assimilating dense observations with uncorrelated errors, the S-EnSRF performs a pointwise scaling of background perturbations and thus has no effect on the ensemble correlations (both spatial and multivariate), which evolve freely in time. This behavior is similar in some ways to the bred vector approach (Toth and Kalnay 1993, 1997), especially when 3D rescaling is applied (Ma et al. 2014). However, the S-EnSRF updates the ensemble perturbations based on the ensemble background error variance (from the Kalman filtering equations) and therefore controls the ensemble spread in a similar way to the EnKF and EnSRF algorithms.
Like ECCC’s EnKF, the S-EnSRF uses simple additive inflation to cope with insufficient ensemble spread in the forecasts. The sum of the analysis perturbations and the additive inflation perturbations is gradually added to the deterministic analysis using the same procedure as IAU. Each ensemble member is integrated with the same forecast model to span the next assimilation window.
d. Thoughts on ensemble generation methods
Conceptually, the Kalman filter update to ensemble perturbations tends to “flatten” the ensemble spread, and the S-EnSRF is designed to replicate this behavior. To illustrate this effect, Fig. 1 shows the resulting analysis spread (
For very large background spread, Fig. 1 shows that the S-EnSRF analysis spread reaches an asymptotical value determined by the local observation error (
Figure 1 also shows that the filter free approach provides a uniform analysis spread, independent of the background spread. This is a first-order approximation of the analysis spread from an EnKF. In this case, the system relies on the forecast model to dampen/amplify the perturbations in dynamically stable/unstable atmospheric regimes to provide flow-dependent background error covariances to the subsequent analysis cycle. Similar to the filter free approach, simple additive inflation sampled from climatological background error covariances further flattens the ensemble spreads of the EnKFs and S-EnSRF.
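The contrast between the two simplified approaches can be sketched numerically. The S-EnSRF equations are not reproduced in this text, so the sketch below assumes the standard scalar Kalman variance update at each grid point, in which the background spread is multiplied by a pointwise factor that depends only on the local background and observation error standard deviations. The function names and numerical values are hypothetical.

```python
import numpy as np

def pointwise_kalman_analysis_spread(sigma_b, sigma_o):
    """Analysis spread from a scalar Kalman variance update (assumed form).

    sigma_a^2 = sigma_b^2 * sigma_o^2 / (sigma_b^2 + sigma_o^2),
    applied here as a pointwise scaling of the background spread.
    """
    sigma_b = np.asarray(sigma_b, dtype=float)
    alpha = sigma_o / np.sqrt(sigma_b**2 + sigma_o**2)   # scaling factor < 1
    return alpha * sigma_b

def filter_free_analysis_spread(sigma_b, sigma_fixed):
    """Uniform analysis spread, independent of the background spread."""
    return np.full_like(np.asarray(sigma_b, dtype=float), sigma_fixed)

# Hypothetical background spreads, observation error, and fixed spread.
sigma_b = np.array([0.1, 0.5, 1.0, 5.0, 100.0])
sigma_o = 1.0
spread_kalman = pointwise_kalman_analysis_spread(sigma_b, sigma_o)
spread_ff = filter_free_analysis_spread(sigma_b, sigma_fixed=0.5)
```

Under this assumption, the analysis spread stays close to the background spread when the background is much more accurate than the observations, and it approaches the observation error from below as the background spread becomes very large, which is the flattening behavior described above.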
3. Experimental framework
a. An experimental regional numerical weather prediction system
The numerical experiments presented in this study are based on the limited-area version of the Global Environmental Multiscale (GEM) model version 4 (Girard et al. 2014; Zadra et al. 2014) using 80 staggered vertical levels with the lid at 0.1 hPa. The data assimilation procedure and model configuration used in this study are essentially the same as in ECCC’s operational regional deterministic prediction system (referred to as the “continental” system; Caron et al. 2015), but with important differences regarding the model initialization and the forecast grid. As in the operational global deterministic prediction system (GDPS; Buehner et al. 2015; Qaddouri et al. 2015), the experimental system employs a continuous data assimilation cycle, the important physics variables are recycled, and the model is initialized using the 4D-IAU scheme (Buehner et al. 2015). The horizontal domain is the same as used in the high-resolution prediction system described by Milbrandt et al. (2016): it is smaller than that of the continental system and covers most of Canada and the northern United States (including part of Alaska). For the purpose of this study, the forecast model uses a lower resolution (10 instead of 2.5 km) due to computational resource restrictions. The topographic elevation within the domain is presented in Fig. 2.
Deterministic data assimilation experiments were carried out over all of July 2014. The experimental regional 4D-EnVar algorithm employs a 6-h time window, and it is cycled four times per day at 0000, 0600, 1200, and 1800 UTC like the operational systems at ECCC. All observations assimilated in ECCC operational systems are used, provided they are located within the horizontal domain, which includes those from radiosondes, aircraft, land stations, ships, buoys, scatterometers, atmospheric motion vectors, satellite-based radio occultation, precipitable water vapor retrievals from ground-based global positioning systems, and brightness temperature from microwave and infrared satellite sounders/imagers. Nearly one million observations are assimilated every day. The system performs 48-h forecasts twice per day (at 0000 and 1200 UTC) using LBCs provided by the GDPS (Caron et al. 2015).
The background error covariances are computed using ensemble members from different methods. The background error covariances are localized using a horizontal length scale of 1400 km (defined here as the radial distance where the function reaches zero) and a vertical length scale of 1.0 in the natural logarithm of atmospheric pressure. All experiments use the same experimental regional model and configuration. The different background error covariance matrices used are summarized in Table 1.
Table 1. Description of the deterministic assimilation experiments using background error covariances estimated from different ensemble generation methods: G-EnKF, R-EnKF, S-EnSRF, and the filter free approach.
b. Ensemble generation
1) Global ensemble Kalman filter
The experimental configuration of the G-EnKF used in the present study is similar in many ways to the one described by Houtekamer et al. (2014a,b). Both systems use the same methodology and have 256 ensemble members at ~50-km horizontal resolution (800 × 400 global grid). However, to increase its compatibility with the deterministic system, three modifications are applied to the G-EnKF. First, the model lid is raised from 2.0 to 0.1 hPa, and the number of vertical levels is increased from 74 to 80. Second, the IAU is employed to initialize the model, replacing the digital filter initialization technique. Last, brightness temperature observations from a relatively small number of hyperspectral infrared sounder [Atmospheric Infrared Sounder (AIRS), Cross-Track Infrared Sounder (CrIS), and Infrared Atmospheric Sounding Interferometer (IASI)] channels are assimilated. These changes improved the filter performance and will be included in the next operational version of the G-EnKF. As described by Houtekamer et al. (2014a,b), the G-EnKF cycles four times per day (at 0000, 0600, 1200, and 1800 UTC), and the ensemble members are available every hour within the subsequent 6-h assimilation window for use in the deterministic data assimilation system. The localization scheme employed within the G-EnKF forces the background error covariance to zero at a horizontal distance ranging from 2100 km at the surface to 3000 km at the model top and at a natural logarithm of atmospheric pressure difference of 2.0 in the vertical. Finally, the additive inflation is based on the global climatological background error covariances (spectrally truncated at global wavenumber 40) with a scaling factor of 0.33².
2) Regional ensemble Kalman filter
To demonstrate the added value of higher-resolution ensembles in a controlled environment, the R-EnKF is based on the G-EnKF algorithm with the following four exceptions. First, all 256 ensemble members have an identical forecast model configuration. Second, the R-EnKF is a limited-area system and uses LBCs from the 256 G-EnKF members. Third, its domain and model configuration are the same as those of the experimental regional deterministic prediction system. Although the R-EnKF is a limited-area system, its additive inflation is based on the global climatological background error covariances (also truncated at wavenumber 40) for consistency with the G-EnKF algorithm. Finally, compared with the G-EnKF, the covariance scaling factor of the additive inflation is reduced to 0.25² since the higher-resolution model produces a more rapid increase of the ensemble spread in the short-range forecasts.
3) Simplified approaches
Like the R-EnKF, all the ensemble members from the simplified approaches (S-EnSRF and filter free) use identical forecast model configurations. The domain and model configuration are also the same as those from the experimental regional deterministic prediction system at 10-km resolution. To reduce the computational cost of the experiments presented in this paper, only 128 members are generated using the simplified approaches. Thus, only 128 members of the EnKFs are used for evaluation to ensure a proper comparison of the different approaches. To ensure realistic ensemble perturbations early in the model integrations of the filter free approach, limited-area climatological background error covariances (at the same spatial resolution as the model) are used to sample the additive inflation. The S-EnSRF also uses the limited-area covariances to facilitate comparison between the two simplified approaches. This is in contrast with the global and regional EnKFs that both use low-resolution global climatological background error covariances. The S-EnSRF uses a scaling factor of 0.165² (applied to the covariances) to maintain an ensemble spread comparable to that of the EnKF. As the filter free approach is fully based on the perturbations sampled from the limited-area climatological background error covariances, it uses a higher scaling factor (0.50²) than the other methods.
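The role of a scaling factor applied to the covariances can be sketched as follows. Random perturbations are drawn from a climatological covariance matrix through its Cholesky factor and multiplied by the square root of the covariance scaling factor, so that their sample covariance approximates the scaled climatological covariance. The toy covariance matrix (unit variance, exponential correlation on a 1D grid), the function name, and the parameter values are assumptions for illustration and do not correspond to the operational covariances.

```python
import numpy as np

def sample_additive_inflation(b_clim, cov_scaling, n_members, rng):
    """Draw perturbations whose covariance approximates cov_scaling * b_clim.

    b_clim      : (n, n) symmetric positive definite covariance matrix
    cov_scaling : factor applied to the covariances (not the perturbations)
    """
    chol = np.linalg.cholesky(b_clim)                 # b_clim = chol @ chol.T
    xi = rng.standard_normal((b_clim.shape[0], n_members))
    perts = np.sqrt(cov_scaling) * (chol @ xi)        # amplitude ~ sqrt(factor)
    perts -= perts.mean(axis=1, keepdims=True)        # zero ensemble mean
    return perts

# Toy climatological covariance (hypothetical): unit variance with an
# exponential correlation of 5 grid points on a 30-point 1D grid.
n_points = 30
idx = np.arange(n_points)
b_clim = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 5.0)

rng = np.random.default_rng(1)
perts = sample_additive_inflation(b_clim, cov_scaling=0.25,
                                  n_members=128, rng=rng)
```

Because the factor multiplies the covariances, a value of 0.25 halves the standard deviation of the sampled perturbations relative to the climatological standard deviation.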
By design, the ensembles from the simplified approaches are centered on deterministic analyses from a previous regional 4D-EnVar experiment that used background error covariances estimated from the 256 ensemble members of the ECCC operational G-EnKF. This ensures that the ensemble means from the simplified approaches do not suffer from using only 128 ensemble members, because the deterministic analyses on which they are centered were produced with 256 ensemble members, as in both EnKFs. This allows for a fair comparison of the simplified approaches with the EnKFs. The previous regional 4D-EnVar experiment uses LBCs from the GDPS. The ensembles from the simplified approaches also use LBCs from the GDPS (to ensure consistency with the deterministic analysis used to center the perturbations), plus ensemble perturbations from the G-EnKF forecasts. This allows for sufficient spread near the model lateral boundaries.
c. Computational costs
As mentioned in the introduction, the generation of high-resolution ensemble-based background error covariances can be cost prohibitive. Table 2 presents the approximate computational costs (in core hours) for the four ensemble generation methodologies (for 128 members) as measured on the Cray XC40 supercomputer of ECCC.
Table 2. Computational cost (in core hours) as measured on the Cray XC40 supercomputer of ECCC for different ensemble generation methods (128 members): G-EnKF, R-EnKF, S-EnSRF, and filter free.
Considering that the G-EnKF system is run operationally at ECCC for medium- and long-range ensemble predictions, the use of its ensemble forecast is virtually cost free because interpolating global forecasts to the regional grid only takes a few minutes on a single core. On the other hand, an operational implementation of the R-EnKF would add to the overall cost more than 2772 core hours per 6-h cycle. However, by simplifying the ensemble update step, the simplified approaches (S-EnSRF and filter free) reduce the cost of the analysis step to essentially zero. This cost reduction only represents 15% of the total computational costs of the R-EnKF since most of the computational cost is related to the high-resolution ensemble forecasts. Still, the simplicity of the S-EnSRF algorithm could make this method attractive to a center that does not already have an EnKF. Because S-EnSRF and the filter free approach do not use observations to generate perturbations, their computational scheduling in an operational context is also more flexible.
4. Diagnostic results
This section presents diagnostic results to evaluate the different ensemble generation methods. The evolution of the ensemble spread and the background error correlations are studied in the following subsections for a specific case to prevent the results from being dominated by the large signal from Hurricane Arthur. Many cases were studied, and the results from the case presented (0000 UTC 9 July) are generally representative. Results are presented for vertical levels between the surface and 100 hPa, as the troposphere is the main focus of the experimental regional prediction system.
a. Ensemble spread
To assess the overall ensemble spread after 6 h of model integration, Fig. 3 presents mean surface pressure spread along with vertical profiles of the mean zonal wind, temperature, and humidity spread for the different ensemble generation methods. It can be seen that the mean ensemble-spread profiles are similar for all ensemble generation methods. Still, one can note that the limited-area ensemble generation methods (R-EnKF, S-EnSRF, and filter free) have more wind variability than the G-EnKF. The limited-area ensemble generation methods also exhibit slightly larger humidity spread throughout the troposphere, but this behavior changes near the atmospheric boundary layer, where the EnKFs have more spread than the simplified approaches (S-EnSRF and filter free). The simplified approaches display more zonal wind spread near the jet, compared to the EnKF algorithms. However, the EnKFs have slightly larger temperature spread than the simplified approaches in the lower part of the troposphere. Figure 3 also shows that the different ensemble generation methods have similar mean 6-h surface pressure spreads, although it is higher for the simplified approaches than for the EnKFs.
By construction, the analysis spread from the filter free approach is horizontally homogeneous at first and becomes gradually flow dependent during the model integration. Although the mean 6-h spread is similar for the different approaches, their spatial variabilities and temporal evolutions may differ. To illustrate these two features, Fig. 4 presents the spatial variability of the ensemble spread at different forecast lead times for all ensemble generation methods. Only results for 500-hPa zonal wind and temperature are presented because similar results are found throughout the lower part of the troposphere. As Fig. 3 showed that the mean spread from the different ensemble generation methods is comparable, the values in Fig. 4 are normalized with the 9-h median value of each system to focus on the spatial variability of the ensemble spread and its temporal evolution within the assimilation window (at 3–9-h forecast lead times, after all perturbations have been applied during the 6-h IAU period).
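The normalization described above can be sketched in a few lines of Python: the ensemble spread is computed at each grid point as the standard deviation across members, normalized by the median spread at a reference lead time, and summarized by its quartiles as in a boxplot. The data layout (a dictionary of arrays indexed by lead time), the function name, and the synthetic fields are assumptions for illustration.

```python
import numpy as np

def normalized_spread_quartiles(ens_by_lead, ref_lead):
    """Quartiles of the ensemble spread, normalized by a reference median.

    ens_by_lead : dict mapping lead time (h) to an array (n_members, ny, nx)
    ref_lead    : lead time whose median spread is used for normalization
    """
    spread = {lead: ens.std(axis=0, ddof=1)           # spread at each grid point
              for lead, ens in ens_by_lead.items()}
    ref_median = np.median(spread[ref_lead])
    stats = {}
    for lead, s in spread.items():
        q1, med, q3 = np.percentile(s / ref_median, [25, 50, 75])
        stats[lead] = (q1, med, q3)
    return stats

# Synthetic ensemble (hypothetical): perturbation amplitude grows with lead time.
rng = np.random.default_rng(3)
ens_by_lead = {lead: (1.0 + 0.1 * lead) * rng.standard_normal((128, 40, 60))
               for lead in (3, 6, 9)}
stats = normalized_spread_quartiles(ens_by_lead, ref_lead=9)
```

With this normalization, the median at the reference lead time equals one by construction, so differences between systems reflect the spatial variability of the spread and its growth in time rather than its mean amplitude.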
Figure 4 shows that the median spread of all ensembles tends to grow in time. However, the ensemble spread of the simplified approaches (S-EnSRF and filter free) tends to grow faster than that of the EnKFs (with the R-EnKF having the slowest growth of all). This behavior could be related to the resolution of the error covariances used to sample the additive inflation.
When looking at the lower and upper limits of the boxplots in Fig. 4 (the first and third quartiles, respectively), one can see that the spatial variability of the spread of the filter free approach increases more rapidly over time than with the other approaches. This suggests that the forecast model modifies the spread such that it departs from its statistically homogeneous structures and becomes flow dependent. This result is consistent with the results of Snyder et al. (2003), where homogeneous and isotropic perturbations fully lost their initial characteristics only after a day of model integration. The ensembles from the three other approaches also display such behavior because of the homogeneous nature of the additive perturbations used. However, the change in spatial variability is not as large as that of the filter-free ensemble because the initial perturbations are already partially flow dependent. While the current use of a 6-h assimilation window allows sufficient time for the spatial variability in the ensemble spread from the filter free approach to become comparable with the other approaches, this approach is likely unsuited for use with shorter time windows.
To visualize the spatial distribution of the ensemble spread, Fig. 5 presents the 6-h surface pressure ensemble spread over the regional domain for the different ensemble generation methods. Two specific areas related to low pressure systems correspond with areas of high ensemble spread and are identified on the figure (black circles). The first is located in the upper-left part of the domain and is characterized by the early stages of the formation of a low pressure system. The second is located in the lower-right part of the domain and is characterized by a cold front approaching the East Coast of the United States. Focusing on the U.S. East Coast, one can see that the G-EnKF does not have sufficient resolution to represent the structure related to the cold front, as supported by the power spectra presented in Fig. 6. On the other hand, Figs. 5 and 6 suggest that all the high-resolution ensemble generation methods are able to represent such fine scales. From Fig. 5, one can see that the S-EnSRF and filter-free ensembles have very similar spatial structures of ensemble spread for this case, while the R-EnKF has a slightly different structure (more linear structure with slightly stronger ensemble spread). This could be because the ensembles from the simplified approaches are centered on the same deterministic mean, while the R-EnKF ensemble is centered on its own mean.
Now focusing on the upper-left part of the domain, one can see that the S-EnSRF and the EnKFs produce similar spatial patterns of ensemble spread. On the other hand, the filter free approach provides much smaller ensemble spread than the other methods for this specific case [likely because of the relatively short (6 h) model integration]. This suggests that in some cases, a 6-h assimilation window is not sufficient for initially homogeneous perturbations to become fully flow dependent. This is in contrast with the case of the cold front, for which the filter free approach developed a spatial structure of ensemble spread that is nearly indistinguishable from the other approaches.
Figure 5 also shows that the spread is continuous near the model lateral boundaries and is consistent with the spread at the center of the domain, on average, thanks to the perturbed LBCs. The perturbed LBCs also allow the limited-area ensemble to capture the effects of meteorological systems entering the domain. As an example, on 4 July 2014, the perturbed LBCs were effective at seamlessly introducing the uncertainty associated with Hurricane Arthur within the regional domain (not shown).
Finally, the fact that the filter free approach and S-EnSRF give very similar results, although they are generated from very different methods, supports the view that the ensemble mean is an important component of any ensemble. Consistency between the ensemble mean and the deterministic analysis is also desirable to propagate information from observations in a way that is coherent with the background error statistics used in the data assimilation approach.
b. Ensemble correlations
To assess the overall ensemble correlation length scales after 6 h of model integration, Fig. 7 (Fig. 8) presents the mean horizontal (vertical) correlations as a function of distance for the different ensemble generation methods. Again, only results for 500-hPa zonal wind and temperature are presented because similar results are found throughout the lower part of the troposphere.
Figures 7 and 8 show that the correlation length scales are qualitatively similar for all ensemble generation methods. Still, one can see that the limited-area ensembles (R-EnKF, S-EnSRF, and filter free approach) have sharper horizontal correlations than the G-EnKF due to their higher horizontal resolutions. However, in the vertical, the EnKFs produce sharper correlations than the simplified approaches (S-EnSRF and filter free). More specifically, the R-EnKF has sharper zonal wind correlations (in both the horizontal and vertical) than the simplified approaches. For temperature, the R-EnKF and the simplified approaches have similar spatial correlations (in both the horizontal and vertical), although the tails of the correlation functions differ. The R-EnKF temperature correlations likely have a longer tail than that of the simplified approaches because the R-EnKF uses additive inflation sampled from the global climatological background error covariances, while the S-EnSRF and filter free approaches sample the perturbations from ECCC limited-area climatological background error covariances, which have sharper temperature correlations (not shown). Finally, Fig. 7 indicates that the G-EnKF temperature horizontal correlation function has a very long tail. This characteristic is amplified near the surface (not shown). This is likely because the G-EnKF employs a multimodel approach, and the multiple boundary layer parameterizations induce large-scale systematic differences between members near the surface (not shown), which leads to the very long correlations observed in Fig. 7.
To assess the multivariate nature of the covariances, the correlations among different variables, different vertical levels, and different times and locations are examined. The fact that the ensembles are centered on different mean analyses makes it difficult to compare results for finescale meteorological events because the error structures strongly depend on the ensemble mean. To address this issue, a single case occurring during a continental ridge phase is selected. In this case, small ensemble-mean differences do not interfere much with the results because large scales dominate the flow. Figure 9 presents surface pressure correlations with 850-hPa zonal wind centered on the continental ridge (over the Canadian province of Saskatchewan).
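A multivariate point correlation of this kind can be estimated directly from the ensemble members. The sketch below correlates the values of one variable at a reference point (for example, surface pressure) with a second variable at every grid point (for example, 850-hPa zonal wind). The function name, array shapes, and synthetic data are hypothetical and serve only to show the calculation.

```python
import numpy as np

def point_correlation(ref_series, field):
    """Ensemble correlation between a reference-point variable and a field.

    ref_series : (n_members,) values of variable A at the reference point
    field      : (n_members, ny, nx) values of variable B on the grid
    """
    a = ref_series - ref_series.mean()                # perturbations of A
    b = field - field.mean(axis=0)                    # perturbations of B
    cov = np.tensordot(a, b, axes=(0, 0)) / (a.size - 1)
    return cov / (a.std(ddof=1) * b.std(axis=0, ddof=1))

# Synthetic example (hypothetical): 128 members on a 20 x 30 grid.
rng = np.random.default_rng(4)
ps_ref = rng.standard_normal(128)                     # "surface pressure" at a point
u850 = rng.standard_normal((128, 20, 30))             # "850-hPa zonal wind" field
u850[:, 5, 5] = 2.0 * ps_ref                          # perfectly correlated point
u850[:, 6, 6] = -ps_ref                               # perfectly anticorrelated point
corr = point_correlation(ps_ref, u850)
```

Applied without localization, such maps also contain small spurious correlations far from the reference point, whose typical magnitude decreases as the ensemble size increases.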
Figure 9 shows that the different ensembles provide qualitatively similar results, in terms of multivariate correlations. At first glance, it is obvious that the limited-area ensembles provide error correlations with finer structures than the G-EnKF due to their higher resolutions. An examination of ensemble perturbation power spectra (Fig. 6) indicates that the limited-area ensemble perturbations do have higher energy in the fine scales (<200 km), while having similar amounts of energy at the large scales, as compared to the G-EnKF.
Figure 9 indicates that the S-EnSRF and G-EnKF have stronger local multivariate correlations than R-EnKF and the filter free approach for this case. On the other hand, results for this specific case show that the filter free approach produces smaller multivariate correlations than R-EnKF. Throughout the many cases examined, the S-EnSRF systematically generated the strongest multivariate correlations (not shown). Keeping in mind that the S-EnSRF only performs pointwise scaling of the background perturbations and has no effect on the ensemble correlations, it becomes clear that the S-EnSRF has stronger multivariate correlations because it does not sharpen the ensemble perturbation correlations at every cycle as the EnKF analysis step does (assimilation algorithms generally decrease correlations because observations are considered as having uncorrelated errors).
5. Results from data assimilation experiments
The results from the deterministic assimilation experiments carried out over July 2014 are presented in this section. As mentioned in section 3, the experimental regional 4D-EnVar algorithm uses the same observations for all experiments, but employs background error covariances computed using 128 ensemble members from the different ensemble generation methods, as presented in Table 1. The 48-h forecasts produced twice daily are evaluated by computing departures from upper-air and surface observations (radiosondes and SYNOP reports from surface stations, respectively). Biases and standard deviations (STD) of these departures are evaluated, and statistical significance is assessed using Student’s t test (for bias) and Fisher’s exact test (for STD). Upper-air forecasts’ STDs are presented for lead times ranging from 0 to 36 h and for vertical levels between the surface and 100 hPa, as the differences between the experiments above 100 hPa and at 48 h are not statistically significant. Upper-air forecast bias and all surface scores (bias and STD) are also not presented because differences between the experiments are not statistically significant, except for surface pressure (which can be diagnosed from the upper-air geopotential height scores presented). To evaluate the background error covariances’ impact on the accuracy of the precipitation forecasts, the equitable threat score (ETS) is computed for different precipitation accumulation thresholds (over 24-h periods).
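The verification measures named above can be sketched as follows. The bias and STD are computed from the departures between observations and forecasts, and the ETS is computed from the standard 2 × 2 contingency table for a given accumulation threshold. The sign convention of the departures (observation minus forecast), the function names, and the sample values are assumptions; significance testing is not included.

```python
import numpy as np

def departure_stats(forecast, obs):
    """Bias and standard deviation of observation-minus-forecast departures."""
    d = np.asarray(obs, dtype=float) - np.asarray(forecast, dtype=float)
    return d.mean(), d.std(ddof=1)

def equitable_threat_score(fcst_precip, obs_precip, threshold):
    """ETS for precipitation accumulations at or above a threshold."""
    f = np.asarray(fcst_precip) >= threshold          # forecast events
    o = np.asarray(obs_precip) >= threshold           # observed events
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    # Number of hits expected by chance for the same event frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / f.size
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom != 0 else np.nan

# Hypothetical 24-h accumulations (mm) at four stations.
obs_mm = np.array([5.0, 0.0, 5.0, 0.0])
ets_perfect = equitable_threat_score(obs_mm, obs_mm, threshold=1.0)
ets_no_skill = equitable_threat_score(np.array([5.0, 5.0, 0.0, 0.0]),
                                      obs_mm, threshold=1.0)
bias, std = departure_stats(forecast=[1.0, 2.0, 3.0], obs=[2.0, 3.0, 4.0])
```

A perfect forecast yields an ETS of 1, while a forecast whose number of hits equals that expected by chance yields an ETS of 0.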
Upper-air wind, temperature, geopotential height, and humidity variables from the different experiments using limited-area background error covariance matrices (EnVarfilter free, EnVarS-EnSRF, and EnVarR-EnKF, respectively) are compared against those from the EnVarG-EnKF experiment. Figures 10 and 11 indicate that both the EnVarfilter free and EnVarS-EnSRF experiments degrade the analysis fit to the observations, compared to the EnVarG-EnKF experiment, although the ensembles used have similar spread on average. For 12–36-h lead times, Fig. 10 shows that EnVarfilter free generally degrades the forecast STDs. The degradation is small, being statistically significant only for temperatures near 400 hPa, for humidity near the surface, and for winds at or below the jet level. Although the EnVarS-EnSRF experiment degrades the fit to the observations, its forecast results are generally neutral when compared to EnVarG-EnKF (Fig. 11). Figure 12 shows that the EnVarR-EnKF provides a better fit to the observations than the EnVarG-EnKF experiment, but the forecast differences are small and generally not statistically significant. Comparing the regional approaches directly, results show that the EnVarS-EnSRF and EnVarR-EnKF experiments provide upper-air forecasts of similar quality, while the EnVarfilter free experiment slightly degrades the forecasts (not shown).
Higher-resolution background error covariances were expected to provide a better fit to the observations because of their spatially sharper error correlations, provided they have similar spread amplitude. However, the persistence of the homogeneous and isotropic perturbation characteristics within the filter-free ensemble and the stronger multivariate correlations within the S-EnSRF ensemble may be limiting factors to the analysis fit to assimilated upper-air observations. Whereas the EnVarfilter free experiment also slightly degrades the forecasts, the EnVarS-EnSRF experiment, despite its reduced fit to the assimilated observations, does not degrade the forecasts.
The ETS is computed to compare the precipitation forecast accuracy of the four assimilation experiments for different precipitation accumulation thresholds (≥1, ≥2, ≥5, ≥10, ≥25, and ≥50 mm) over 24-h periods. The scores presented are computed using accumulations between 12- and 36-h forecasts for which the impact of the background error covariances is maximal. Similar, but less statistically significant, results are obtained when using accumulations from the first 24 h of the forecasts, whereas the impact is not statistically significant (for most thresholds) when using accumulations between 24- and 48-h forecasts (not shown). Figure 13 shows that EnVarR-EnKF statistically significantly improves the precipitation accumulation predictions over EnVarG-EnKF for all evaluated thresholds, except for the ≥50-mm threshold, which has too few cases (357 events) to provide statistically significant results. Comparing the regional approaches directly, the EnVarR-EnKF improvement over EnVarS-EnSRF and EnVarfilter free is similar to that of Fig. 13 (not shown). Results indicate that the experiment using background error covariances from the R-EnKF provides the best precipitation accumulation predictions over 24-h periods, compared to those based on the G-EnKF, S-EnSRF, and filter free approaches.
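For reference, the ETS at a given threshold can be computed from the forecast/observation contingency table, using the standard definition ETS = (hits − hits_random) / (hits + misses + false alarms − hits_random), with hits_random = (hits + misses)(hits + false alarms) / total. The sketch below is a minimal Python version of that definition; the accumulation values are invented and the function name is hypothetical, so it illustrates the score rather than the verification package used for these experiments.

```python
def equitable_threat_score(forecast_mm, observed_mm, threshold_mm):
    """ETS for events defined as accumulation >= threshold_mm."""
    hits = misses = false_alarms = 0
    for fcst, obs in zip(forecast_mm, observed_mm):
        fcst_event, obs_event = fcst >= threshold_mm, obs >= threshold_mm
        if fcst_event and obs_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif fcst_event:
            false_alarms += 1
    total = len(forecast_mm)
    # Number of hits expected by chance given the event frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return float("nan")  # score undefined (e.g., no events at all)
    return (hits - hits_random) / denominator


# Invented 24-h accumulations (mm) at eight stations.
forecast = [0.5, 3.0, 12.0, 30.0, 0.0, 6.0, 1.5, 55.0]
observed = [0.0, 2.5, 8.0, 26.0, 1.2, 4.0, 0.3, 20.0]

for threshold in (1, 2, 5, 10, 25, 50):
    score = equitable_threat_score(forecast, observed, threshold)
    print(f">= {threshold:2d} mm: ETS = {score:.3f}")
```

A perfect forecast gives an ETS of 1, and a forecast with no more hits than expected by chance gives 0. With few events above a threshold, as for the ≥50-mm category, the score becomes noisy.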
Experimental tests combining the two EnKFs were also carried out to increase the total number of ensemble members at no additional cost by combining the global ensemble members with the ensemble members of the R-EnKF. Mixing two ensembles can potentially be better than using either individually, even if one is of poorer quality. However, the discrepancy between the two ensembles limited the overall quality of the combined ensemble. Even when removing their respective means separately before combining them, the combined ensemble degraded the ensuing deterministic forecasts. This may be due to the difference in the scales resolved by the G-EnKF and R-EnKF, and this option was abandoned at an early stage of development.
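The step of removing each ensemble mean separately before pooling the members can be sketched as follows. This is a minimal Python illustration with invented two-variable states. The function names `remove_mean` and `combine_ensembles` are hypothetical, and any interpolation between the global and regional grids, which a real system would need, is assumed to have been done already.

```python
def remove_mean(ensemble):
    """Return perturbations: each member minus the ensemble mean, per variable."""
    n_members = len(ensemble)
    mean = [sum(values) / n_members for values in zip(*ensemble)]
    return [[x - m for x, m in zip(member, mean)] for member in ensemble]


def combine_ensembles(ens_a, ens_b):
    """Pool the perturbations of two ensembles, removing each mean separately."""
    return remove_mean(ens_a) + remove_mean(ens_b)


# Invented states: a "global" and a "regional" ensemble with offset means.
global_ens = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
regional_ens = [[5.0, 20.0], [7.0, 21.0]]

combined = combine_ensembles(global_ens, regional_ens)
print(len(combined), "pooled perturbations")
for perturbation in combined:
    print(perturbation)
```

Removing the means separately keeps the offset between the two ensemble means from being counted as spread, but it does not reconcile differences in the scales that each ensemble resolves.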
6. Conclusions
The spatial resolution and the quality of background error covariances are factors limiting the assimilation of dense observations, and the generation of a high-resolution ensemble can be cost prohibitive. Cheap and conceptually simple ensemble generation methods are presented and compared with G-EnKF and R-EnKF, aiming at providing limited-area deterministic assimilation schemes with higher-resolution flow-dependent background error covariances that perform at least as well as those from the G-EnKF when using the same observations.
Diagnostic results show that the different methods generate ensembles with similar spread and similar horizontal and vertical correlations, on average. However, higher-resolution ensembles (filter free, S-EnSRF, and R-EnKF) generate finer-scale structures than those from G-EnKF. Results also show that the ensemble mean influences the structure of the background ensemble perturbations and is an important component of the ensembles. Consistency between the ensemble mean and the deterministic forecast is desirable to propagate information from observations in a way that is coherent with the background state used for assimilation.
In the end, very simple methods like the filter free approach and S-EnSRF can develop error structures similar to complex approaches like R-EnKF, provided the ensemble mean is of good quality. However, in some cases, the ensemble spread evolved from homogeneous and isotropic analysis perturbations failed to capture some slowly evolving structures, and the ensemble perturbations would likely benefit from being recycled as in the S-EnSRF or EnKF schemes.
A detailed evaluation of the ensemble spread for a specific case shows that the homogeneous characteristics of the filter-free initial perturbations persist in time. At lead times ranging from 3 to 6 h, the filter-free perturbation structure is more homogeneous (not as flow dependent) than that from S-EnSRF, R-EnKF, and G-EnKF. The shorter the model integration time, the more homogeneous the perturbations are. The filter-free ensemble is thus not suited for assimilation cycles with short time windows.
The different ensembles were also used to provide flow-dependent background error covariances to deterministic data assimilation cycles. Results from upper-air evaluation of the assimilation experiments against radiosonde observations show that the filter-free experiment degrades both the analysis fit to the observations and the subsequent forecasts. On the other hand, the S-EnSRF and R-EnKF experiments provide forecasts of similar quality to those from G-EnKF, although the S-EnSRF degrades the analysis fit to the observations. However, results from precipitation verification indicate that the R-EnKF experiment provides the best precipitation accumulation predictions over 24-h periods.
Finally, the G-EnKF remains a good reference and is necessary to provide LBCs to the limited-area ensembles. However, higher-resolution background error covariances are necessary to make better use of dense observations, and work is already underway on the assimilation of radar data and methods for extracting higher-resolution information from satellite data. The S-EnSRF is a conceptually simple and cost-efficient ensemble generation algorithm (with essentially zero cost for the analysis step) that allows for both additive inflation and recycling of modified background perturbations as in R-EnKF. Although S-EnSRF and R-EnKF provide forecasts of similar quality (except for precipitation), the impact of using limited-area (S-EnSRF) versus global (R-EnKF) error covariances for additive inflation was not tested, but could affect these results. Additional work to optimize the R-EnKF scheme (e.g., tuning localization distances, recentering the ensemble on deterministic analyses that use more observations than the EnKF, or using limited-area climatological background error covariances for additive inflation) may result in improved deterministic forecasts when using the resulting background ensembles, but these options are beyond the scope of the present paper and should be examined in a subsequent study.
Acknowledgments
The authors acknowledge the contributions of their colleagues Mateusz Reszka, Thomas Milewski, and Ervig Lapalme for their technical support with the data assimilation suite. Also, thanks to the anonymous reviewers for providing insightful comments. The first author acknowledges the Natural Sciences and Engineering Research Council of Canada (NSERC) and ECCC Atmospheric Science and Technology Directorate for their financial support.
REFERENCES
Aranami, K., and Coauthors, 2015: A new operational regional model for convection-permitting numerical weather prediction at JMA. Atmos. Oceanic Model., 45, 0505–0506.
Bédard, J., S. Laroche, and P. Gauthier, 2015: A geo-statistical observation operator for the assimilation of near-surface wind data. Quart. J. Roy. Meteor. Soc., 141, 2857–2868, https://doi.org/10.1002/qj.2569.
Bédard, J., S. Laroche, and P. Gauthier, 2017: Near-surface wind observation impact on forecasts: Temporal propagation of the analysis increment. Mon. Wea. Rev., 145, 1549–1564, https://doi.org/10.1175/MWR-D-16-0310.1.
Benjamin, S. G., and Coauthors, 2004: An hourly assimilation–forecast cycle: The RUC. Mon. Wea. Rev., 132, 495–518, https://doi.org/10.1175/1520-0493(2004)132<0495:AHACTR>2.0.CO;2.
Benjamin, S. G., and Coauthors, 2016: A North American hourly assimilation and model forecast cycle: The Rapid Refresh. Mon. Wea. Rev., 144, 1669–1694, https://doi.org/10.1175/MWR-D-15-0242.1.
Bloom, S. C., L. L. Takacs, A. M. da Silva, and D. Ledvina, 1996: Data assimilation using incremental analysis updates. Mon. Wea. Rev., 124, 1256–1271, https://doi.org/10.1175/1520-0493(1996)124<1256:DAUIAU>2.0.CO;2.
Brousseau, P., L. Berre, F. Bouttier, and G. Desroziers, 2011: Background-error covariances for a convective-scale data-assimilation system: AROME–France 3D-Var. Quart. J. Roy. Meteor. Soc., 137, 409–422, https://doi.org/10.1002/qj.750.
Buehner, M., 2005: Ensemble-derived stationary and flow-dependent background-error covariances: Evaluation in a quasi-operational NWP setting. Quart. J. Roy. Meteor. Soc., 131, 1013–1043, https://doi.org/10.1256/qj.04.15.
Buehner, M., and Coauthors, 2015: Implementation of deterministic weather forecasting systems based on ensemble–variational data assimilation at Environment Canada. Part I: The global system. Mon. Wea. Rev., 143, 2532–2559, https://doi.org/10.1175/MWR-D-14-00354.1.
Caron, J., T. Milewski, M. Buehner, L. Fillion, M. Reszka, S. Macpherson, and J. St-James, 2015: Implementation of deterministic weather forecasting systems based on ensemble–variational data assimilation at Environment Canada. Part II: The regional system. Mon. Wea. Rev., 143, 2560–2580, https://doi.org/10.1175/MWR-D-14-00353.1.
Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, https://doi.org/10.1002/qj.49712555417.
Gauthier, P., M. Buehner, and L. Fillion, 1999: Background-error statistics modelling in a 3D variational data assimilation scheme: Estimation and impact on the analyses. Proc. ECMWF Workshop on the Diagnostics of Assimilation Systems, Reading, United Kingdom, ECMWF, 131–145.
Girard, C., and Coauthors, 2014: Staggered vertical discretization of the Canadian Environmental Multiscale (GEM) model using a coordinate of the log-hydrostatic-pressure type. Mon. Wea. Rev., 142, 1183–1196, https://doi.org/10.1175/MWR-D-13-00255.1.
Gustafsson, N., and Coauthors, 2018: Survey of data assimilation methods for convective‐scale numerical weather prediction at operational centres. Quart. J. Roy. Meteor. Soc., 144, 1218–1256, https://doi.org/10.1002/qj.3179.
Honda, Y., M. Nishijima, K. Koizumi, Y. Ohta, K. Tamiya, T. Kawabata, and T. Tsuyuki, 2005: A pre-operational variational data assimilation system for a non-hydrostatic model at the Japan Meteorological Agency: Formulation and preliminary results. Quart. J. Roy. Meteor. Soc., 131, 3465–3475, https://doi.org/10.1256/qj.05.132.
Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137, https://doi.org/10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.
Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Mon. Wea. Rev., 133, 604–620, https://doi.org/10.1175/MWR-2864.1.
Houtekamer, P. L., B. He, and H. L. Mitchell, 2014a: Parallel implementation of an ensemble Kalman filter. Mon. Wea. Rev., 142, 1163–1182, https://doi.org/10.1175/MWR-D-13-00011.1.
Houtekamer, P. L., X. Deng, H. L. Mitchell, S.-J. Baek, and N. Gagnon, 2014b: Higher resolution in an operational ensemble Kalman filter. Mon. Wea. Rev., 142, 1143–1162, https://doi.org/10.1175/MWR-D-13-00138.1.
Hu, M., M. Xue, and K. Brewster, 2006: 3DVAR and cloud analysis with WSR-88D level-II data for the prediction of the Fort Worth, Texas, tornadic thunderstorms. Part I: Cloud analysis and its impact. Mon. Wea. Rev., 134, 675–698, https://doi.org/10.1175/MWR3092.1.
Hu, M., S. G. Benjamin, T. T. Ladwig, D. C. Dowell, S. S. Weygandt, C. R. Alexander, and J. S. Whitaker, 2017: GSI three-dimensional ensemble–variational hybrid data assimilation using a global ensemble for the regional Rapid Refresh model. Mon. Wea. Rev., 145, 4205–4225, https://doi.org/10.1175/MWR-D-16-0418.1.
Ingleby, N. B., A. C. Lorenc, K. Ngan, F. Rawlins, and D. R. Jackson, 2013: Improved variational analyses using a nonlinear humidity control variable. Quart. J. Roy. Meteor. Soc., 139, 1875–1887, https://doi.org/10.1002/qj.2073.
Lu, X., X. Wang, M. Tong, and V. Tallapragada, 2017: GSI-based, continuously cycled, dual-resolution hybrid ensemble–variational data assimilation system for HWRF: System description and experiments with Edouard (2014). Mon. Wea. Rev., 145, 4877–4898, https://doi.org/10.1175/MWR-D-17-0068.1.
Ma, J., Y. Zhu, D. Hou, X. Zhou, and M. Peña, 2014: Ensemble transform with 3D rescaling initialization method. Mon. Wea. Rev., 142, 4053–4073, https://doi.org/10.1175/MWR-D-13-00367.1.
Milbrandt, J. A., S. Bélair, M. Faucher, M. Vallée, M. L. Carrera, and A. Glazer, 2016: The pan-Canadian High Resolution (2.5 km) Deterministic Prediction System. Wea. Forecasting, 31, 1791–1816, https://doi.org/10.1175/WAF-D-16-0035.1.
Qaddouri, A., C. Girard, L. Garand, A. Plante, and D. Anselmo, 2015: Changes to the Global Deterministic Prediction System (GDPS) from version 4.0.1 to version 5.0.0—Yin-Yang grid configuration. Canadian Meteorological Centre Tech. Note, 68 pp., http://collaboration.cmc.ec.gc.ca/cmc/CMOI/product_guide/docs/lib/technote_gdps-500_20151215_e.pdf.
Raynaud, L., and F. Bouttier, 2016: Comparison of initial perturbation methods for ensemble prediction at convective scale. Quart. J. Roy. Meteor. Soc., 142, 854–866, https://doi.org/10.1002/qj.2686.
Schraff, C., H. Reich, A. Rhodin, A. Schomburg, K. Stephan, A. Periáñez, and R. Potthast, 2016: Kilometre‐scale ensemble data assimilation for the COSMO model (KENDA). Quart. J. Roy. Meteor. Soc., 142, 1453–1472, https://doi.org/10.1002/qj.2748.
Snyder, C., T. M. Hamill, and S. B. Trier, 2003: Linear evolution of error covariances in a quasigeostrophic model. Mon. Wea. Rev., 131, 189–205, https://doi.org/10.1175/1520-0493(2003)131<0189:LEOECI>2.0.CO;2.
Todling, R., and A. El Akkraoui, 2014: Hybrid data assimilation without ensemble filtering. NASA Goddard Space Flight Center Tech. Note, 41 pp., https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20140011180.pdf.
Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 2317–2330, https://doi.org/10.1175/1520-0477(1993)074<2317:EFANTG>2.0.CO;2.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319, https://doi.org/10.1175/1520-0493(1997)125<3297:EFANAT>2.0.CO;2.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924, https://doi.org/10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.
Wu, W., D. Parrish, E. Rogers, and Y. Lin, 2017: Regional ensemble–variational data assimilation using global ensemble forecasts. Wea. Forecasting, 32, 83–96, https://doi.org/10.1175/WAF-D-16-0045.1.
Zadra, A., and Coauthors, 2014: Improvements to the Global Deterministic Prediction System (GDPS) (from version 2.2.2 to 3.0.0), and related changes to the Regional Deterministic Prediction System (RDPS) (from version 3.0.0 to 3.1.0). Canadian Meteorological Centre Tech. Note, 88 pp., http://collaboration.cmc.ec.gc.ca/cmc/CMOI/product_guide/docs/lib/op_systems/doc_opchanges/technote_gdps300_20130213_e.pdf.