Toward Unifying Short-Term and Next-Day Convection-Allowing Ensemble Forecast Systems with a Continuously Cycling 3-km Ensemble Kalman Filter over the Entire Conterminous United States

Craig S. Schwartz, National Center for Atmospheric Research, Boulder, Colorado
Glen S. Romine, National Center for Atmospheric Research, Boulder, Colorado
David C. Dowell, NOAA/Earth System Research Laboratory, Boulder, Colorado

Abstract

Using the Weather Research and Forecasting Model, 80-member ensemble Kalman filter (EnKF) analyses with 3-km horizontal grid spacing were produced over the entire conterminous United States (CONUS) for 4 weeks using 1-h continuous cycling. For comparison, similarly configured EnKF analyses with 15-km horizontal grid spacing were also produced. At 0000 UTC, 15- and 3-km EnKF analyses initialized 36-h, 3-km, 10-member ensemble forecasts that were verified with a focus on precipitation. Additionally, forecasts were initialized from operational Global Ensemble Forecast System (GEFS) initial conditions (ICs) and experimental “blended” ICs produced by combining large scales from GEFS ICs with small scales from EnKF analyses using a low-pass filter. The EnKFs had stable climates with generally small biases, and precipitation forecasts initialized from 3-km EnKF analyses were more skillful and reliable than those initialized from downscaled GEFS and 15-km EnKF ICs through 12–18 and 6–12 h, respectively. Conversely, after 18 h, GEFS-initialized precipitation forecasts were better than EnKF-initialized precipitation forecasts. Blended 3-km ICs reflected the respective strengths of both GEFS and high-resolution EnKF ICs and yielded the best performance considering all times: blended 3-km ICs led to short-term forecasts with skill and reliability similar to or better than those initialized from unblended 3-km EnKF analyses and to ~18–36-h forecasts of comparable quality to GEFS-initialized forecasts. This work likely represents the first time a convection-allowing EnKF has been continuously cycled over a region as large as the entire CONUS, and results suggest blending high-resolution EnKF analyses with low-resolution global fields can potentially unify short-term and next-day convection-allowing ensemble forecast systems under a common framework.

© 2021 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Craig Schwartz, schwartz@ucar.edu

1. Introduction

Convection-allowing ensembles (CAEs) produce better precipitation and severe weather forecasts than coarser-resolution, convection-parameterizing ensembles (e.g., Clark et al. 2009; Duc et al. 2013; Iyer et al. 2016; Schellander-Gorgas et al. 2017), are operational at many weather forecasting offices (e.g., Gebhardt et al. 2011; Peralta et al. 2012; Hagelin et al. 2017; Raynaud and Bouttier 2017; Jirak et al. 2018; Klasa et al. 2018), and have proven useful and valuable for various meteorological applications around the world (e.g., Xue et al. 2007; Clark et al. 2012; Evans et al. 2014; Maurer et al. 2017; Zhang 2018; Cafaro et al. 2019; Porson et al. 2019; Schwartz et al. 2019). Thus, as computing power has increased, CAE domains have gradually enlarged, with operational global CAEs on the horizon.

While CAEs can be initialized by downscaling coarser-resolution, convection-parameterizing analyses, convection-allowing numerical weather prediction (NWP) models are typically best when initialized from corresponding convection-allowing analyses, particularly for short-term forecasts (e.g., Ancell 2012; Harnisch and Keil 2015; Johnson et al. 2015; Johnson and Wang 2016; Raynaud and Bouttier 2016; Schwartz 2016; Gustafsson et al. 2018). Therefore, to produce the best possible CAE forecasts over ever-expanding domains, convection-allowing data assimilation (DA) systems over large areas are needed to provide optimal initial conditions (ICs).

However, there are obstacles to implementing convection-allowing DA systems over domains large enough to resolve mesoalpha- to synoptic-scale features, especially when using state-of-the-science ensemble-based DA algorithms like the ensemble Kalman filter (EnKF; Evensen 1994; Houtekamer and Zhang 2016), which produces flow-dependent analysis ensembles and has become popular for initializing CAEs (e.g., Jones and Stensrud 2012; Melhauser and Zhang 2012; Schumacher and Clark 2014; Schwartz et al. 2014, 2015a,b, 2019). One challenge is simply computational expense, which grows directly with domain size,¹ and accordingly, most convection-allowing EnKFs and their associated CAE forecasts have relatively small domains centered on a single European country (e.g., Schraff et al. 2016; COSMO 2020) or a small portion of the conterminous United States (CONUS). For example, NOAA’s experimental “Warn-on-Forecast” (WoF; Stensrud et al. 2009, 2013) system, initialized from 36-member 3-km EnKF analyses, covers less than 1000 km × 1000 km (Wheatley et al. 2015; Jones et al. 2016, 2018, 2020; Skinner et al. 2018).

Fortunately, computing challenges can be overcome with increased resources, and recently, several studies initialized CAE forecasts from 40-member EnKF analyses with 3-km or finer horizontal grid spacing over the entire CONUS (Duda et al. 2019; Gasperoni et al. 2020; Johnson et al. 2020). Similarly, NOAA’s real-time, experimental High-Resolution Rapid Refresh Ensemble (HRRRE) is initialized from CONUS-spanning, 3-km, 36-member EnKF analyses (Dowell et al. 2016; Ladwig et al. 2018). However, 36–40-member EnKFs are likely smaller than desirable, considering that operational global EnKFs run by the United States and Canada, respectively, have 80 and 256 members, and generally, EnKFs benefit from larger ensembles (e.g., Zhang et al. 2013; Houtekamer et al. 2014).

But, even with unlimited resources, there are fundamental scientific concerns that must be addressed to develop stable, high-quality, convection-allowing EnKFs over large regional domains, especially in continuously cycling limited-area EnKFs where external models are relegated to providing boundary conditions. In particular, model physics deficiencies can lead to accumulation of biases throughout EnKF DA cycles, potentially degrading analysis system performance and subsequent forecasts (e.g., Torn and Davis 2012; Romine et al. 2013; Cavallo et al. 2016; Wong et al. 2020). Although all continuously cycling limited-area EnKFs are prone to bias accumulation, this issue may be exacerbated as both model resolution and domain size increase: biases may accumulate more in high-resolution EnKFs than low-resolution EnKFs because of rapid small-scale error growth (e.g., Lorenz 1969; Zhang et al. 2003; Hohenegger and Schär 2007; Judt 2018), and EnKFs over large domains may suffer from bias accumulations more than EnKFs over small domains because of reduced influence from lateral boundaries provided by potentially less biased global models (e.g., Warner et al. 1997; Romine et al. 2014; Schumacher and Clark 2014).

Given these scientific and computing challenges, operational convection-allowing continuously cycling EnKFs and attendant CAEs over Europe have small domains (e.g., Schraff et al. 2016; COSMO 2020), while large-domain convection-allowing EnKFs over the CONUS (e.g., Duda et al. 2019; Gasperoni et al. 2020; Johnson et al. 2020; HRRRE) employ “partial cycling” strategies that periodically discard convection-allowing analysis cycles and replace them with coarser-resolution, large-scale external analyses in hopes of tempering bias accumulations (e.g., Hsiao et al. 2012; Benjamin et al. 2016; Wu et al. 2017). This partial cycling approach over the CONUS seems justified, as Schwartz et al. (2020) showed that a limited-area continuously cycling EnKF with convection-parameterizing resolution did not initialize better CAE precipitation forecasts over the CONUS than downscaled global analyses.

Nonetheless, as discussed at length by Schwartz et al. (2019), continuously cycling EnKFs have many attractive properties for CAE initialization, including the ability to diagnose model biases while simultaneously producing flow-dependent ICs that are dynamically consistent with and span all possible resolvable scales of the convection-allowing forecast model. Thus, despite formidable challenges, it is desirable to further explore and develop continuously cycling EnKFs over large geographic areas at convection-allowing resolutions for CAE initialization purposes.

Accordingly, we produced continuously cycling, 80-member, 3-km EnKF analyses with a 1-h cycling period for 4 weeks over a computational domain spanning the entire CONUS. EnKF analysis ensembles then initialized 36-h, 3-km, 10-member CAE forecasts. For comparison, 3-km CAE forecasts were also initialized by downscaling both 15-km EnKF analyses and global ICs produced for NCEP’s operational Global Ensemble Forecast System (GEFS; Zhou et al. 2017). The impact of assimilating radar observations into the 3-km EnKF was also assessed. Relative to the EnKF described in Schwartz et al. (2020), our EnKFs used more advanced observation processing, an upgraded NWP model, and a shorter cycling period, and inclusion of 3-km EnKF DA was also new. To our knowledge, this work presents the first time convection-allowing continuously cycling EnKF analyses have been produced over the entire CONUS.

Results indicated benefits of EnKF-initialized forecasts with respect to GEFS-initialized forecasts diminished with forecast length, presumably because large-scale fields were better represented in GEFS ICs and became more important at longer forecast ranges. These findings motivated experimentation with a “blending” approach combining large-scale fields from an external (e.g., global) NWP model with small-scale fields from a limited-area model, which can be achieved by augmenting a variational cost function with a global model constraint (e.g., Guidard and Fischer 2008; Dahlgren and Gustafsson 2012; Vendrasco et al. 2016; Keresturi et al. 2019) or using filtering to perform scale separation (e.g., Yang 2005; Wang et al. 2011; Caron 2013; H. Wang et al. 2014; Y. Wang et al. 2014; Hsiao et al. 2015; Zhang et al. 2015; Feng et al. 2020); we used a low-pass filter to combine large scales from GEFS ICs with small scales from EnKF analyses. These previous studies collectively suggested blended limited-area ICs improved forecasts compared to those initialized from unblended limited-area ICs, including for a CAE within a perturbed-observation variational DA framework (Keresturi et al. 2019). However, our application of blending within the context of a large-domain convection-allowing continuously cycling EnKF was unique, and, as described below, blending global fields with high-resolution EnKF analyses can potentially unite short-term and next-day (18–36-h) CAE forecast systems under a common framework.

2. Model configurations, EnKF settings, and experimental design

a. Forecast model

All forecasts were produced by version 3.9.1.1 of the Advanced Research Weather Research and Forecasting (WRF) Model (Skamarock et al. 2008; Powers et al. 2017) over a nested computational domain (Fig. 1a). The horizontal grid spacing was 15 km in the outer domain and 3 km in the nest, and time steps were 60 and 12 s in the 15- and 3-km domains, respectively. Both domains had 51 vertical levels distributed as in the Rapid Refresh model (Benjamin et al. 2016) with a 15-hPa top. Physical parameterizations were identical across the two domains (Table 1), except no cumulus parameterization was employed on the convection-allowing 3-km grid, and all ensemble members used common physics and dynamics options.

Fig. 1.

(a) Computational domain. Horizontal grid spacing was 15 km in the outer domain (415 × 325 points) and 3 km in the nest (1581 × 986 points). Objective precipitation verification only occurred over the red shaded region of the 3-km domain (CONUS east of 105°W). (b) Total accumulated Stage IV (ST4) precipitation (mm) over the verification region between 0000 UTC 25 Apr and 1200 UTC 21 May 2017, which encompasses all possible valid times of the 36-h forecasts. (c)–(e) The 500-hPa wind speed (shaded; kt; 1 kt ≈ 0.51 m s−1) and height (m; contoured every 40 m) from Global Forecast System analyses valid at 0000 UTC (c) 25 Apr, (d) 1 May, and (e) 14 May 2017.

Citation: Weather and Forecasting 36, 2; 10.1175/WAF-D-20-0110.1

Table 1.

Physical parameterizations for all WRF Model forecasts. Cumulus parameterization was only used on the 15-km domain.

b. EnKF DA systems

1) EnKF experiments and configurations

Two primary DA experiments with 80-member ensembles were performed using an ensemble adjustment Kalman filter (Anderson 2001, 2003; Anderson and Collins 2007), a type of EnKF, as implemented in the Data Assimilation Research Testbed (DART; Anderson et al. 2009) software. The first EnKF experiment only produced analyses on the 15-km domain (Fig. 1a), and the 3-km domain was removed during WRF Model advances between EnKF analyses. Conversely, the second EnKF experiment produced separate, independent analyses on both the 15- and 3-km domains, with nested WRF Model forecasts between EnKF analyses. During these nested forecasts, which were ~45 times more expensive than the single-domain 15-km model advances, one-way feedback was employed such that the 15-km EnKF DA system was unaffected by the 3-km EnKF DA system (i.e., 15-km fields in the nested- and single-domain EnKF DA systems were identical), permitting a clean comparison of analysis and forecast sensitivity to EnKF resolution. The 15- and 3-km EnKFs updated identical state variables (Table 2), with hydrometeors included in anticipation of experimentation with radar DA (section 4c).

Table 2.

Summary of EnKF configurations.

Initial 80-member ensembles were produced by interpolating the 0.25° NCEP Global Forecast System (GFS) analysis at 0000 UTC 23 April 2017 onto the 15-km domain and adding random, correlated, Gaussian noise with zero mean (e.g., Barker 2005; Torn et al. 2006) drawn from background error covariances provided by the WRF Model’s DA system (Barker et al. 2012). The randomly produced 15-km ensemble was then downscaled onto the 3-km grid to initialize the 3-km EnKF, ensuring initial 15- and 3-km ensembles were identical aside from interpolation errors. These randomly generated ensembles served as prior (before assimilation) ensembles for the first EnKF analyses, and the posterior (after assimilation) ensembles at 0000 UTC 23 April 2017 initialized 1-h, 80-member ensemble forecasts that became prior ensembles for the next EnKF analyses at 0100 UTC 23 April 2017. Analysis–forecast cycles with a 1-h period continued until 0000 UTC 20 May 2017 (649 total DA cycles). This experimental period (23 April–20 May 2017) was similar to that in Schwartz (2019), which featured several heavy precipitation episodes primarily driven by strong synoptic forcing, a broad overall precipitation maximum centered in Missouri (Fig. 1b), and a variety of flow patterns (Figs. 1c–e).

During EnKF cycles, soil states freely evolved for each member, sea surface temperature was updated daily from NCEP’s 0.12° analyses (e.g., Gemmill et al. 2007), and identical randomly perturbed lateral boundary conditions (LBCs) were applied to the 15-km domain in each DA system, with perturbations for individual members generated using the same method used to produce the initial ensembles at 0000 UTC 23 April 2017. The first two days of cycling were regarded as spinup.

Spurious correlations due to sampling error were mitigated with a sampling error correction scheme (Anderson 2012) and covariance localization [Eq. (4.10) of Gaspari and Cohn (1999)]. Vertical localization limited analysis increments to ±1.0 scale height (in log pressure coordinates) away from an observation in both the 15- and 3-km EnKFs. However, horizontal localizations differed depending on EnKF resolution: 15-km EnKF analysis increments were forced to zero 1280 km from an observation, but to lessen expense and complete 3-km EnKF analyses quickly enough for operational applications, 3-km EnKF analysis increments were forced to zero 640 km from an observation, except rawinsonde observations could produce increments up to 1280 km away (Table 2). The vertical and 15-km EnKF horizontal localization distances were guided by previous experiences with DART (e.g., Romine et al. 2013, 2014; Schwartz et al. 2015a,b, 2019), and while our 3-km EnKF horizontal localization distances were similar to Johnson et al. (2015), they were larger than those in many other convection-allowing EnKFs (e.g., Harnisch and Keil 2015; Yussouf et al. 2015, 2016; Degelia et al. 2018; Gasperoni et al. 2020; Jones et al. 2020). However, these studies with smaller localization distances either used partial cycling strategies or only continuously cycled for a short period (days), and we believed that larger localization distances were necessary to provide stronger observational constraints in a large-domain continuously cycling 3-km EnKF.
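For illustration, the localization weights described above can be sketched with the fifth-order piecewise rational function of Gaspari and Cohn (1999), Eq. (4.10). This is a minimal sketch rather than DART's implementation: the function name `gaspari_cohn` and its argument names are hypothetical, and the function is parameterized by the distance at which increments reach zero (1280 or 640 km here), which is twice the half-width c in the original formula.

```python
import numpy as np

def gaspari_cohn(distance_km, zero_distance_km):
    """Gaspari and Cohn (1999) Eq. (4.10) compactly supported weight.

    zero_distance_km (hypothetical argument name) is the separation at which
    the weight reaches exactly zero, i.e., twice the half-width c.
    """
    c = zero_distance_km / 2.0
    r = np.abs(np.atleast_1d(np.asarray(distance_km, dtype=float))) / c
    weight = np.zeros_like(r)

    inner = r <= 1.0                 # 0 <= distance <= c
    outer = (r > 1.0) & (r < 2.0)    # c < distance < 2c; zero beyond

    ri = r[inner]
    weight[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                     - (5.0 / 3.0) * ri**2 + 1.0)
    ro = r[outer]
    weight[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                     + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0
                     - 2.0 / (3.0 * ro))
    return weight

# Weights for an observation 300 km away under the two horizontal settings.
w_15km_enkf = gaspari_cohn(300.0, 1280.0)
w_3km_enkf = gaspari_cohn(300.0, 640.0)
```

At a given separation, the 3-km EnKF setting gives a smaller weight than the 15-km setting, which is the sense in which the shorter localization distance weakens the observational constraint.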

EnKF spread was maintained by applying covariance inflation to posterior state-space perturbations about the ensemble mean following Whitaker and Hamill (2012)’s “relaxation-to-prior spread” algorithm with an inflation parameter α = 1.06 in both the 15- and 3-km EnKFs. As noted by Schwartz and Liu (2014), α > 1 meant inflated posterior spread was greater than prior spread, which, while counterintuitive, was necessary to maintain reasonable spread given absence of other spread-inducing methods like multiphysics ensembles, additive inflation, or stochastic physics. Several iterative weeklong trials with 15-km EnKFs were performed to settle on α = 1.06, which provided acceptable prior observation-space statistics for the assumed observation errors (section 3).
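The relaxation-to-prior-spread step can be sketched as follows, using the multiplicative form given by Whitaker and Hamill (2012), in which each posterior perturbation is scaled by α(σb − σa)/σa + 1, with σb and σa the prior and posterior ensemble standard deviations. This is a minimal illustration and not DART's code: the function name `rtps_inflate` and the (members, state) array layout are assumptions, and posterior spread is assumed to be nonzero everywhere.

```python
import numpy as np

def rtps_inflate(prior, posterior, alpha=1.06):
    """Relaxation-to-prior-spread inflation (after Whitaker and Hamill 2012).

    prior, posterior: arrays shaped (n_members, n_state); this layout is an
    assumption made for the sketch. Posterior spread must be nonzero.
    """
    prior_sd = prior.std(axis=0, ddof=1)
    post_mean = posterior.mean(axis=0)
    post_sd = posterior.std(axis=0, ddof=1)
    # Scale posterior perturbations; the ensemble mean is left unchanged.
    factor = alpha * (prior_sd - post_sd) / post_sd + 1.0
    return post_mean + (posterior - post_mean) * factor
```

The inflated spread is σa + α(σb − σa). With α = 1 it equals the prior spread exactly, and with α = 1.06 it exceeds the prior spread wherever assimilation reduced the spread, consistent with the counterintuitive behavior noted above.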

2) Observations

Although DART has observation processing capabilities, we instead used NCEP’s operational Gridpoint Statistical Interpolation (GSI) DA system (Kleist et al. 2009; Shao et al. 2016) for observation processing, which, relative to DART, has more sophisticated quality control, observation thinning, and observation error assignment capabilities. In addition, GSI’s observation operators were used instead of DART’s built-in observation operators to produce model-simulated conventional observations. Initially specified observation errors were based on the HRRRE and identical in the 15- and 3-km EnKFs (Fig. 2; Table 3); GSI adjusted these errors to produce “final” observation error standard deviations σo actually used in the assimilation, as described by several texts (e.g., Schwartz and Liu 2014; Developmental Testbed Center 2016; Johnson and Wang 2017). These adjustments often inflated initially specified observation errors (Fig. 2).

Fig. 2.

Initially specified (solid lines) and final (after GSI adjustment; dashed lines) observation error standard deviations as a function of pressure for (a) wind (m s−1), (b) temperature (K), (c) relative humidity (%), and (d) surface pressure (hPa) observations with vertically varying errors averaged over all observations assimilated between 0000 UTC 25 Apr and 0000 UTC 20 May 2017 (inclusive) by both the 15- and 3-km EnKFs. If a particular observation type was not assimilated at a certain pressure level, no value is plotted.

Table 3.

Conventional observations that were assimilated and their outlier check thresholds, time windows, and initially specified observation error standard deviations.

Time windows for the observation platforms varied and were based on Rapid Refresh model (Benjamin et al. 2016) and HRRRE settings, with generally smaller windows for frequently reporting, stationary platforms, like METAR observations (Table 3), and all observations were assumed valid at the analysis time. Moisture observations were initially processed as specific humidity, but because GSI requires moisture observation errors in terms of relative humidity, moisture observations were ultimately converted to and assimilated as relative humidity using the prior ensemble mean saturation specific humidity. Satellite-tracked wind and aircraft observations were thinned such that remaining observations were spaced 25 hPa apart vertically and 30 and 15 km apart horizontally in the 15- and 3-km EnKFs, respectively (Table 2); these different horizontal thinnings were chosen so the 15- and 3-km EnKFs had equal numbers of satellite-tracked wind and aircraft observations within their respective horizontal localization radii. Radiance observations were not assimilated since they generally yield small impacts over the CONUS (Lin et al. 2017) given the multitude of available conventional observations. Additionally, the EnKFs did not assimilate radar observations, although an auxiliary experiment was performed where radar observations were assimilated with a 3-km EnKF (section 4c).
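A rough check of why these thinning distances equalize observation counts can be made under two idealizations that are ours rather than the source's: thinned observations lie on a uniform horizontal lattice with spacing d, and the localization radius R is the distance at which increments reach zero. The number of observations per 25-hPa layer within R of a point is then approximately

$$N \approx \frac{\pi R^{2}}{d^{2}}, \qquad N_{15\,\mathrm{km}} \approx \pi\left(\frac{1280}{30}\right)^{2} = N_{3\,\mathrm{km}} \approx \pi\left(\frac{640}{15}\right)^{2} \approx 5.7 \times 10^{3}.$$

Because halving R and halving d leaves the ratio R/d unchanged (about 42.7 in both EnKFs), the idealized counts match. Real satellite-tracked wind and aircraft observations are irregularly distributed, so this value is best read as an upper bound.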

Observations were subject to numerous quality control procedures, such as excluding observations from specific aircraft with known biases and applying an “outlier check” to reject observations whose ensemble mean innovations² were > aσo, where a varied from 2.5 to 10 depending on observation type and platform (Table 3). These values of a were generally fairly lenient and allowed most observations to pass the outlier check, which, along with our relatively large localization distances, reflected a philosophy that we wanted observations to heavily constrain the 1-h WRF Model forecasts between EnKF analyses. Overall, the EnKFs assimilated 30 000–100 000 conventional observations each cycle, with a relative dearth of overnight observations due to fewer commercial flights and maxima at 0000 and 1200 UTC reflecting the majority of rawinsonde launches (Fig. 3). Ultimately, GSI-provided observations, final observation errors, and prior model-simulated observations for each ensemble member were ingested directly into DART for use in EnKF DA.
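The outlier check amounts to comparing the magnitude of each ensemble mean innovation with a multiple a of the final observation error standard deviation σo. The sketch below illustrates that comparison only. It is not the GSI or DART code, the function name `passes_outlier_check` and the array layout are hypothetical, and taking the absolute value of the innovation is our assumption.

```python
import numpy as np

def passes_outlier_check(obs, prior_sim_obs, sigma_o, a):
    """Return True where an observation would be retained.

    obs:           observed values, shape (n_obs,)
    prior_sim_obs: prior model-simulated observations, shape (n_members, n_obs)
    sigma_o:       final observation error standard deviations, shape (n_obs,)
    a:             threshold multiplier (2.5 to 10 in Table 3)
    """
    innovation = obs - prior_sim_obs.mean(axis=0)  # ensemble mean innovation
    return np.abs(innovation) <= a * sigma_o
```

A larger a retains more observations, which matches the lenient thresholds described above.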

Fig. 3.

Computational domain overlaid with observations assimilated by the 15-km EnKF during the (a) 0000, (b) 0600, (c) 1200, and (d) 1800 UTC 27 Apr 2017 analyses. Values of N in the headers indicate the number of assimilated observations. The inner box represents bounds of the 3-km domain; most observations located within the 3-km domain were also assimilated by the 3-km EnKF at these times.

3) Forecast initialization

EnKF analysis ensembles initialized 36-h 10-member ensemble forecasts over the nested computational domain (Fig. 1a) at 0000 UTC between 25 April and 20 May 2017 (inclusive; 26 forecasts). Although 80 EnKF analysis members were available, due to computing constraints, 36-h forecasts were only initialized from members 1–10; 10-member CAEs are sufficient to provide skillful and valuable probabilistic forecasts (e.g., Clark et al. 2011, 2018; Schwartz et al. 2014) and similar in size to the HRRRE and NCEP’s operational High-Resolution Ensemble Forecast system (Jirak et al. 2018). Choosing members 1–10 was effectively the same as randomly selecting 10 members since all ensemble members had identical configurations (e.g., Schwartz et al. 2014). In principle, free forecasts could have been initialized every hour, but given finite resources, forecasts were solely initialized at 0000 UTC, which allowed us to focus on both short-term and next-day forecast periods featuring active convection.
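The experiment timeline stated in the text (hourly DA cycles from 0000 UTC 23 April through 0000 UTC 20 May 2017, and daily 36-h forecasts from 0000 UTC 25 April through 20 May) can be checked with a few lines of date arithmetic. The variable names below are ours and purely illustrative.

```python
from datetime import datetime, timedelta

first_cycle = datetime(2017, 4, 23, 0)   # first EnKF analysis
last_cycle = datetime(2017, 5, 20, 0)    # final EnKF analysis
n_cycles = (last_cycle - first_cycle) // timedelta(hours=1) + 1

first_init = datetime(2017, 4, 25, 0)    # first 36-h forecast (after spinup)
n_days = (last_cycle - first_init).days + 1
init_times = [first_init + timedelta(days=d) for d in range(n_days)]
last_valid = init_times[-1] + timedelta(hours=36)

print(n_cycles, len(init_times), last_valid)
```

The script gives 649 DA cycles and 26 forecast initializations, as stated in the text, and a final valid time of 1200 UTC 21 May 2017, which matches the end of the Stage IV accumulation window in the Fig. 1 caption.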

When initializing 36-h forecasts from 15-km EnKF analyses, the 3-km nest was initialized by downscaling 15-km EnKF analyses onto the 3-km grid. Conversely, downscaling was unnecessary to initialize 36-h forecasts from the 3-km EnKF; 3-km ICs were simply 3-km EnKF analysis members. For both sets of EnKF-initialized 36-h forecasts, perturbation members 1–10 from the GEFS (Zhou et al. 2017) with 0.5° horizontal grid spacing provided LBCs at 3-h intervals for the 15-km domain, which in turn provided LBCs for the 3-km nest. While random LBCs could have been used for the 36-h forecasts as in the EnKF DA system, we believed it was more appropriate to use flow-dependent LBCs for these longer unconstrained forecasts.

c. Benchmark ensemble

To serve as a benchmark for the EnKF-initialized CAE forecasts, 36-h forecasts on the nested grid (Fig. 1a) with the configurations in section 2a were initialized by interpolating 0.5° ICs from perturbation members 1–10 of the GEFS onto the computational domain at 0000 UTC daily between 25 April and 20 May 2017 (inclusive), with LBCs provided by GEFS forecasts identically as in the EnKF-initialized CAEs. As described by Zhou et al. (2017), GEFS ICs were produced by adding 6-h forecast perturbations from a global EnKF DA system (Whitaker and Hamill 2002) to “hybrid” variational-ensemble analyses produced for NCEP’s deterministic GFS (e.g., Wang and Lei 2014; Kleist and Ide 2015a,b). Relative to the limited-area EnKF analyses, GEFS ICs were much coarser but reflected assimilation of many more observations, including satellite radiances. Overall, comparison of GEFS- and EnKF-initialized CAE forecasts provides insight about whether the vastly more expensive EnKF initialization procedure was warranted.

d. Blending

Based on performance of the EnKF- and GEFS-initialized CAE forecasts (section 4b), additional ensemble ICs were created by “blending” small scales from EnKF analyses with large scales from GEFS ICs. Blending was solely performed at 0000 UTC between 25 April and 20 May 2017 (inclusive) immediately after EnKF DA and before initializing 36-h CAE forecasts; blending was not employed within the context of continuously cycling EnKF DA, as the blended 0000 UTC fields were not used to initialize 1-h WRF Model forecasts that served as priors for the next DA cycle.

Specifically, ICs from corresponding GEFS and EnKF ensemble members were blended on both the 15- and 3-km domains³ to create new initial ensembles using
$$x_i^{\mathrm{blend}} = \left(\mathrm{EnKF}_i - \mathrm{EnKF}_{\mathrm{FILT},i}\right) + \mathrm{GEFS}_{\mathrm{FILT},i},$$
where $x_i^{\mathrm{blend}}$ represents the blended ICs for the ith ensemble member, $\mathrm{EnKF}_i$ is the EnKF analysis for the ith member, and $\mathrm{EnKF}_{\mathrm{FILT},i}$ and $\mathrm{GEFS}_{\mathrm{FILT},i}$ are the low-pass filtered EnKF and GEFS ICs for the ith member, respectively, for i = 1, …, 10. To perform the scale separation, a low-pass, sixth-order implicit tangent filter (e.g., Raymond 1988; Raymond and Garder 1991) as implemented by several studies (e.g., Yang 2005; H. Wang et al. 2014; Hsiao et al. 2015; Feng et al. 2020) and given by
$$H(L) = \left[1 + \tan^{-6}\!\left(\frac{\pi \Delta x}{L_x}\right)\tan^{6}\!\left(\frac{\pi \Delta x}{L}\right)\right]^{-1}$$
was employed (Fig. 4), where Δx is the horizontal grid spacing (either 15 or 3 km), L the wavelength, H(L) the scale-dependent response function, and $L_x$ a specified filter cutoff (km) physically representing the spatial scale (wavelength) where the blended ICs (e.g., $x_i^{\mathrm{blend}}$) had equal contributions from GEFS and EnKF initial states [i.e., when $L = L_x$, H(L) = 0.5]. Blending was applied at all 51 vertical levels to zonal and meridional wind components; perturbation geopotential height, potential temperature, and dry surface pressure; and water vapor mixing ratio, and the cutoff length was the same at all heights and for all variables.
Fig. 4.

Amplitude response (y axis) of a sixth-order implicit tangent filter as a function of wavelength (km) for a specified cutoff length of 960 km. In the context of this study, the curve denotes the contribution of GEFS ICs to blended ICs at a given wavelength (e.g., for wavelengths where the amplitude response is 1, 100% of the blended ICs at those wavelengths were from the GEFS). The dashed vertical and solid horizontal lines illustrate how the amplitude response is 0.5 at the cutoff length.

We produced blended ICs using filter cutoff lengths Lx of 640, 960, and 1280 km, guided by EnKF horizontal localization lengths and previous work suggesting values between 640 and 1280 km were appropriate (e.g., H. Wang et al. 2014; Hsiao et al. 2015; Feng et al. 2020). CAE forecasts initialized from these three sets of blended ICs objectively had similar skill, although Lx = 960 km yielded slightly better results. Therefore, results are shown only for the 960-km cutoff.
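The blending procedure can be illustrated with a short one-dimensional sketch. It is not the implementation used in the study. The cited filter is implicit and applied in grid space, whereas here, as a simplifying assumption, the same response function H(L) is applied spectrally to a periodic signal. The function names and the synthetic "EnKF" and "GEFS" signals are hypothetical.

```python
import numpy as np

def tangent_response(wavelength_km, dx_km, cutoff_km):
    """Response H(L) of the sixth-order tangent filter, for L >= 2*dx."""
    ratio = (np.tan(np.pi * dx_km / np.asarray(wavelength_km, dtype=float))
             / np.tan(np.pi * dx_km / cutoff_km))
    return 1.0 / (1.0 + ratio**6)

def low_pass(field, dx_km, cutoff_km):
    """Apply H(L) spectrally to a periodic 1D field (illustrative assumption)."""
    n = field.size
    freq = np.fft.rfftfreq(n, d=dx_km)   # cycles per km, so 1/L
    ratio = np.tan(np.pi * dx_km * freq) / np.tan(np.pi * dx_km / cutoff_km)
    response = 1.0 / (1.0 + ratio**6)
    return np.fft.irfft(np.fft.rfft(field) * response, n=n)

def blend(enkf, gefs, dx_km, cutoff_km=960.0):
    """Small scales from the EnKF field plus large scales from the GEFS field."""
    return (enkf - low_pass(enkf, dx_km, cutoff_km)) + low_pass(gefs, dx_km, cutoff_km)

# Synthetic example on a 3-km periodic grid spanning 3840 km.
dx = 3.0
x = np.arange(1280) * dx
large_scale = np.sin(2.0 * np.pi * x / 3840.0)        # 3840-km wave
small_scale = 0.3 * np.sin(2.0 * np.pi * x / 60.0)    # 60-km wave
enkf_like = 2.0 * large_scale + small_scale           # different large scale
gefs_like = large_scale                               # no small scales
blended = blend(enkf_like, gefs_like, dx)
```

In this example, the blended field is close to the GEFS-like large-scale wave plus the EnKF-like small-scale wave, and `tangent_response(960.0, dx, 960.0)` returns 0.5, consistent with the definition of the cutoff length.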

3. EnKF performance

To assess EnKF performance, we examined the observation-space bias and relationship between the prior ensemble mean root-mean-square error (RMSE) and “total spread,” the square root of the sum of the observation error variance σo² and ensemble variance of the simulated observations (Houtekamer et al. 2005). Ideally, the ratio of total spread to RMSE [termed the consistency ratio (CR; Dowell and Wicker 2009)] should be near 1.0. To fairly compare the 15- and 3-km EnKFs, we restricted this analysis solely to those observations assimilated by both EnKFs, although overall findings were unchanged when computing identical statistics with inhomogeneous samples. We focused on aircraft and rawinsonde observations because of their large impacts on springtime forecasts over the CONUS (James and Benjamin 2017).
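These observation-space diagnostics can be sketched as follows. The function name `observation_space_stats` and the array layout are hypothetical. Averaging variances and squared errors over all observations before taking square roots is our assumption about the aggregation, which the text does not spell out.

```python
import numpy as np

def observation_space_stats(obs, prior_sim_obs, sigma_o):
    """Bias, RMSE, total spread, and consistency ratio (CR).

    obs:           observed values, shape (n_obs,)
    prior_sim_obs: prior model-simulated observations, shape (n_members, n_obs)
    sigma_o:       final observation error standard deviations, shape (n_obs,)
    """
    obs = np.asarray(obs, dtype=float)
    prior_sim_obs = np.asarray(prior_sim_obs, dtype=float)
    sigma_o = np.asarray(sigma_o, dtype=float)

    ens_mean = prior_sim_obs.mean(axis=0)
    ens_var = prior_sim_obs.var(axis=0, ddof=1)

    bias = np.mean(ens_mean - obs)                    # model minus observations
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))    # ensemble mean RMSE
    total_spread = np.sqrt(np.mean(sigma_o**2 + ens_var))
    return bias, rmse, total_spread, total_spread / rmse
```

A CR below 1.0 indicates that total spread is too small relative to the ensemble mean error, and a CR above 1.0 indicates the opposite, which could reflect either excessive ensemble spread or overly large assumed observation errors.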

Ensemble mean additive biases (model minus observations) and RMSEs aggregated over all prior ensembles (1-h forecasts) between 0000 UTC 25 April and 0000 UTC 20 May 2017 (inclusive) were similar in the 15- and 3-km EnKFs with respect to zonal wind and temperature observations at most levels (Figs. 5a,b,d,e), while biases and RMSEs for moisture were typically smaller in the 3-km EnKF (Figs. 5c,f). Magnitudes of temperature biases were typically <0.1 K, except near the surface and in the upper troposphere for rawinsonde observations (Fig. 5a); the latter is consistent with other continuously cycling EnKFs over the CONUS (e.g., Romine et al. 2013; Schumacher and Clark 2014; Schwartz and Liu 2014; Cavallo et al. 2016; Schwartz 2016) and likely due to closer fits to the more numerous aircraft observations that may have systematically warm biases compared to rawinsonde observations (Ballish and Krishna Kumar 2008). That upper-tropospheric temperature biases relative to aircraft observations (Fig. 5d) were smaller than and opposite the sign of temperature biases relative to rawinsonde observations (Fig. 5a) further supports this reasoning.

Fig. 5.

Ensemble mean additive bias (model minus observations; short-dashed lines), ensemble mean RMSE (solid lines), total spread (long-dashed lines), and consistency ratio (CR; solid lines with circles) for (a) rawinsonde temperature (K), (b) rawinsonde zonal wind (m s−1), (c) rawinsonde relative humidity (%), (d) aircraft temperature (K), (e) aircraft zonal wind (m s−1), and (f) aircraft relative humidity (%) observations aggregated over all prior ensembles (1-h forecasts) between 0000 UTC 25 Apr and 0000 UTC 20 May 2017 (inclusive). These statistics were computed for those observations assimilated by both the 15- and 3-km EnKFs. Sample size at each pressure level is shown at the right of each panel. Vertical lines at x = 0 and x = 1 are references for biases and CRs, respectively.

Citation: Weather and Forecasting 36, 2; 10.1175/WAF-D-20-0110.1

Prior total spreads were similar in both EnKFs (Fig. 5) and CRs were usually between 0.8 and 1.2, although CRs suggest moisture observation errors could potentially be decreased. While more spread may have been expected in the 3-km EnKF because small-scale errors grow rapidly upscale (e.g., Lorenz 1969; Zhang et al. 2003; Hohenegger and Schär 2007), cumulus parameterization in the 15-km DA system may have served as an error source that compensated for missing storm-scale structures, and assimilating copious observations each cycle (Fig. 3) with fairly large localization distances highly constrained the 15- and 3-km EnKFs, limiting spread growth during 1-h WRF Model integrations between analyses. In balance, these factors potentially contributed to the similar 15- and 3-km prior spreads.

Overall, systematic biases were usually small and EnKF performance appeared acceptable. Moreover, after the first two days, prior total spread and ensemble mean biases were steady throughout the cycles (Fig. 6), and observation rejection rates varied little with time (not shown). These results indicate the continuously cycling EnKFs maintained stable climates, which is particularly noteworthy for the 3-km EnKF, as it has not previously been demonstrated that a convection-allowing EnKF can be continuously cycled over a large domain without deleterious consequences like a drifting model climate or filter divergence [see appendix A of Houtekamer and Zhang (2016) for a succinct summary of filter divergence].

Fig. 6.

Prior (1-h forecast) total spread (long-dashed lines) and ensemble mean additive bias (model minus observations; short-dashed lines) for (a) rawinsonde temperature (K), (b) rawinsonde zonal wind (m s−1), (c) aircraft temperature (K), and (d) aircraft zonal wind (m s−1) observations between 150 and 1000 hPa as a function of time. In (c) and (d) values are plotted every hour between 0000 UTC 23 Apr and 0000 UTC 20 May 2017 (inclusive) and smoothed with a 6-h running average, while in (a) and (b) values are plotted every 12 h between 0000 UTC 23 Apr and 0000 UTC 20 May 2017 (inclusive) without smoothing. These statistics were computed for those observations assimilated by both the 15- and 3-km EnKFs. The x-axis labels represent 0000 UTC for a specific month and day in 2017 (e.g., the marker for “0511” denotes 0000 UTC 11 May 2017). Dashed lines at y = 0 are for reference.

4. Precipitation forecast verification

Hourly accumulated precipitation forecasts were verified against Stage IV (ST4) analyses (Lin and Mitchell 2005) produced at NCEP, which were treated as “truth.” Objective evaluations were performed over the CONUS east of 105°W (hereafter the “verification region”; Fig. 1a), where ST4 analyses were most robust (e.g., Nelson et al. 2016). For metrics requiring a common grid for forecasts and observations, we used a budget algorithm (e.g., Accadia et al. 2003) to interpolate forecast precipitation to the ST4 grid (4.763-km horizontal grid spacing). Otherwise, metrics were computed from native grid output.

The following statistics were aggregated over all twenty-six 3-km forecasts initialized at 0000 UTC.

a. Precipitation climatologies

To assess precipitation climatologies, aggregate domain-total precipitation per grid point and fractional coverages of 1-h accumulated precipitation meeting or exceeding various accumulation thresholds (e.g., 2.5 mm h−1) were calculated on native grids over the verification region. Additionally, spatial patterns of total precipitation over all 26 forecasts were examined, which were similar in the various ensembles and generally agreed with observations (e.g., Fig. 1b), including the southwest–northeast-oriented maximum across Missouri and adjacent areas. Although magnitudes of these maxima differed across the ensembles, these differences were reflected in the following domain-average statistics, so spatial variations of precipitation climatologies are not discussed further.
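The two domain-average statistics can be sketched as follows for a single forecast hour and a single ensemble member. The function name, the array layout, and the Boolean mask standing in for the verification region are assumptions made for this example.

```python
import numpy as np

def precip_climatology(precip, mask, thresholds):
    """Aggregate precipitation statistics over a verification region.

    precip     : (n_forecasts, ny, nx) 1-h accumulated precipitation (mm)
                 valid at one forecast hour, on the native grid
    mask       : (ny, nx) Boolean array, True inside the verification region
    thresholds : iterable of accumulation thresholds (mm per hour)
    Returns (mean precipitation per grid point, {threshold: coverage in %}).
    """
    precip = np.asarray(precip, dtype=float)
    values = precip[:, mask]                 # (n_forecasts, n_points_in_region)

    mean_per_point = values.mean()
    # Fraction of (forecast, grid point) pairs meeting or exceeding each threshold.
    coverage = {t: 100.0 * np.mean(values >= t) for t in thresholds}
    return mean_per_point, coverage
```

Applying the same function to the observed analyses on their own grid gives the reference values; comparing coverages computed on different native grids assumes that differences in grid spacing have a small effect on the fractions.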

1) Impact of analysis resolution

Differences between ensembles were largest over the first 12 h, when GEFS-initialized forecasts were spinning up precipitation from coarse 0.5° ICs. While this spinup meant GEFS-initialized forecasts underpredicted total precipitation (Fig. 7) and areal coverages (Fig. 8) over the first 5 h, ultimately, the spinup process yielded too much 6–12-h total precipitation and excessive coverages ≥ 2.5 mm h−1. Forecasts initialized from 15-km EnKF analyses also overpredicted total precipitation over the first 12 h, accompanied by excessive coverages for thresholds ≥ 5.0 mm h−1.

Fig. 7.

Average 1-h accumulated precipitation (mm) per grid point over all twenty-six 3-km forecasts and the verification region (CONUS east of 105°W) computed on native grids as a function of forecast hour. Red, blue, gold, and black shadings represent envelopes of the 10 members comprising the ensembles with 3-km EnKF ICs, 15-km EnKF ICs, GEFS ICs, and blended 3-km ICs, respectively, and darker shadings indicate intersections of two or more ensemble envelopes. Values on the x axis represent ending forecast hours of 1-h accumulation periods (e.g., an x-axis value of 24 is for 1-h accumulated precipitation between 23 and 24 h). ST4 data during the 0–12- and 24–36-h forecast periods were identical except for 1 day (the former included data between 0000 and 1200 UTC 25 Apr–20 May while the latter instead included data between 0000 and 1200 UTC 26 Apr–21 May), and because domain-total ST4 precipitation between 0000 and 1200 UTC 21 May was much larger than that between 0000 and 1200 UTC 25 Apr, average 24–36-h domain-total ST4 precipitation was greater than average 0–12-h domain-total ST4 precipitation.

Fig. 8.

Fractional areal coverage (%) of 1-h accumulated precipitation meeting or exceeding (a) 1.0, (b) 2.5, (c) 5.0, (d) 10.0, (e) 25.0, and (f) 50.0 mm h−1 over the verification region (CONUS east of 105°W), computed on native grids and aggregated over all twenty-six 3-km forecasts as a function of forecast hour. Red, blue, gold, and black shadings represent envelopes of the 10 members comprising the ensembles with 3-km EnKF ICs, 15-km EnKF ICs, GEFS ICs, and blended 3-km ICs, respectively, and darker shadings indicate intersections of two or more ensemble envelopes. Values on the x axis represent ending forecast hours of 1-h accumulation periods (e.g., an x-axis value of 24 is for 1-h accumulated precipitation between 23 and 24 h).

Overall, forecasts initialized from unblended 3-km EnKF analyses had precipitation climatologies best matching observations through 12 h, but there were shortcomings. For example, although at 1 h, unblended 3-km EnKF analyses produced forecasts with areal coverages closest to observations (Fig. 8), coverages rapidly decreased between 2 and 3 h and were further from those observed between 2 and 12 h for the 1.0 and 2.5 mm h−1 thresholds (Figs. 8a,b) compared to forecasts with 15-km or blended 3-km ICs, suggesting poor maintenance of stratiform precipitation regions after initialization. However, forecasts with unblended 3-km ICs had 6–12-h areal coverages at the 5.0 mm h−1 threshold well-matching observations (Fig. 8c) and 2–6-h coverages at the 10.0–50.0 mm h−1 thresholds closer to observations than forecasts with GEFS and 15-km EnKF ICs (Figs. 8d–f). Furthermore, 2–12-h domain-total precipitation was clearly best in forecasts with unblended 3-km ICs (Fig. 7).

Despite differences between the ensembles through 12 h, domain-total precipitation and areal coverages were broadly similar between 18 and 36 h, with too much total precipitation (Fig. 7) and general underprediction and overprediction of areal coverages at the 1.0 and 10.0–50.0 mm h−1 thresholds, respectively (Figs. 8a,d–f). Collectively, for precipitation climatologies, these findings suggest benefits of convection-allowing analyses relative to convection-parameterizing analyses are primarily confined to short-term forecasts and heavier rainfall rates.

2) Impact of blending

With respect to forecasts initialized from unblended 3-km EnKF analyses, forecasts with blended 3-km ICs (using a 960-km cutoff) had similar 18–36-h areal coverages and total precipitation but higher domain-total precipitation and areal coverages over the first 6–12 h that typically compared worse to observations through 3 h (Figs. 7 and 8). Examination of individual forecasts indicated blended 3-km ICs mostly enhanced 1–3-h forecast precipitation within and near precipitation entities also predicted by forecasts with unblended 3-km ICs and that widespread spurious features did not cause the overprediction. This behavior is illustrated by the forecast initialized at 0000 UTC 1 May 2017, which had the largest difference of domain-total precipitation (e.g., Fig. 7) between member 1 in the CAEs with blended and unblended 3-km ICs across all twenty-six 36-h forecasts (Fig. 9). While both 1–3-h precipitation forecasts had similar spatial patterns, blended ICs led to more numerous cells in places with scattered rainfall, and these additional entities were usually erroneous compared to observations (black and gold circles in Fig. 9). Additionally, within features, the forecast with blended ICs had heavier rainfall maxima than ST4 observations and the forecast with unblended ICs (red circles in Figs. 9b,c,e,f,h,i).

Fig. 9.