1. Introduction
Convection-allowing ensembles (CAEs) provide valuable forecast guidance (e.g., Clark et al. 2012; Evans et al. 2014; Schwartz et al. 2019) and are now operational at many meteorological offices (e.g., Gebhardt et al. 2011; Peralta et al. 2012; Hagelin et al. 2017; Raynaud and Bouttier 2017; Jirak et al. 2018; Klasa et al. 2018). Nonetheless, uncertainty remains about optimal CAE design, especially regarding initial condition perturbations (ICPs), which are needed to generate forecast diversity before perturbations from lateral boundary conditions (LBCs) and model error representation schemes engender substantial spread (e.g., Hohenegger et al. 2008; Vié et al. 2011; Peralta et al. 2012; Kühnlein et al. 2014; Romine et al. 2014; Zhang 2019).
There are several broad approaches for producing CAE ICPs. One simple method is to add random noise to a deterministic field (e.g., Hohenegger and Schär 2007; Johnson et al. 2014; Raynaud and Bouttier 2016; hereafter RB16). While this operation is trivial, randomly produced ICPs are not flow dependent, a potential limitation.
Another straightforward method for initial condition (IC) generation is to downscale preexisting coarse-resolution analyses or short-term forecasts from either an ensemble or collection of deterministic numerical weather prediction (NWP) models directly onto the CAE grid (Jones and Stensrud 2012; Duc et al. 2013; Romine et al. 2014; Schumacher and Clark 2014; Schwartz et al. 2015a,b, 2019; Tennant 2015; Clark 2017; Schellander-Gorgas et al. 2017; Jirak et al. 2018; Klasa et al. 2018; Cafaro et al. 2019; Porson et al. 2019). Downscaling means the ICPs directly reflect the NWP model and data assimilation (DA) system underlying the coarse-resolution fields, and finescale details are not introduced into the CAE ICs.
Alternatively, ICPs derived from coarser-resolution models can be recentered about a deterministic analysis of one’s choosing, which could either be interpolated onto or produced directly on the CAE grid (Xue et al. 2007; Kong et al. 2008, 2009; Peralta et al. 2012; Kühnlein et al. 2014; Tennant 2015; RB16; Raynaud and Bouttier 2017; Hagelin et al. 2017). Thus, while these ICPs again reflect the external modeling system, the initial ensemble center could possess finescale structures that are imparted to individual ensemble members during recentering. However, recentering can be complex; European studies examining how recentering affects ensemble forecasts revealed mixed results (e.g., Lang et al. 2015; Tennant 2015; RB16), and recentering analysis ensembles about deterministic “hybrid” variational-ensemble analyses within DA contexts yields little impact (e.g., Clayton et al. 2013; Wang et al. 2013; Pan et al. 2014; Schwartz et al. 2015c).
Still another approach for generating CAE ICPs is to produce them directly on the CAE grid with an ensemble DA system, which provides flow-dependent ICPs fully consistent with the CAE forecast model that span all possible resolvable scales (e.g., Vié et al. 2011; Bouttier et al. 2012; Harnisch and Keil 2015; Wheatley et al. 2015; Johnson and Wang 2016; RB16; Keresturi et al. 2019). While this method is more sophisticated than and theoretically preferable to others, convective-scale DA is still evolving and computationally expensive.
Within each of these overarching methods, there are many options for producing CAE ICs: random noise can be generated in a variety of manners with different correlation length scales; coarse-resolution analyses are available from numerous NWP models with varied resolutions and DA methods; perturbations can be derived from and centered about many potential datasets; and myriad high-resolution DA implementations are possible. Moreover, these various approaches can be combined to produce CAE ICPs (e.g., Zhang 2018, 2019).
Yet, despite the multitude of options for CAE ICP generation, few studies have rigorously examined CAE forecast sensitivity to ICPs. Perhaps the most systematic study devoted to CAE ICPs was RB16, who found that ICPs provided by both correlated random noise and a high-resolution, perturbed-observation variational DA system led to better CAE forecasts than ICPs from downscaled global ensemble analyses through 9–12 h. However, RB16 reported negligible sensitivity to ICP method for 12–36-h forecasts, presumably because LBC information quickly swept through their fairly small France-centered computational domain, and it is unclear how RB16’s results may translate to larger domains that are less prone to LBC impacts (e.g., Warner et al. 1997; Romine et al. 2014; Schumacher and Clark 2014), where sensitivity to ICPs may be detectable beyond 9–12 h.
In addition, several studies have assessed the suitability of limited-area ensemble Kalman filters (EnKFs; Evensen 1994; Houtekamer and Zhang 2016) for CAE initialization in case-study or idealized frameworks, although some did not fully isolate ICP impacts. For example, Harnisch and Keil (2015) suggested a convective-scale EnKF could initialize better CAE forecasts than downscaled ICPs for three forecasts, but forecast differences were not fully attributable to ICPs given discrepancies regarding DA and LBCs between various CAEs. Similarly, although Schumacher and Clark (2014) suggested an EnKF-initialized CAE sometimes outperformed a CAE initialized by downscaling and recentering non-EnKF perturbations about a deterministic analysis for a multiday heavy rainfall case, many differences between the CAEs also limited attribution to ICPs. Conversely, Johnson and Wang (2016) performed an idealized, controlled experiment and noted ICPs produced directly on a convection-allowing grid via EnKF DA led to modestly better 9-h precipitation forecasts than when ICPs were provided by coarser-resolution EnKF analyses, but their “perfect model” framework may not apply to many real-data situations.
More broadly, limited-area EnKFs are attractive for CAE initialization, as EnKFs seamlessly meld ensemble DA and forecasting in a single step to produce analysis ensembles that can initialize CAEs. Furthermore, continuously cycling EnKFs have become increasingly popular for real-time CAE forecast applications. For instance, between 2015 and 2017, the National Center for Atmospheric Research (NCAR) produced experimental, real-time CAE forecasts over the conterminous United States (CONUS) initialized with a continuously cycling EnKF (Schwartz et al. 2015b, 2019), and in 2017, Germany began using a continuously cycling EnKF to initialize their operational CAE (Schraff et al. 2016; Pantillon et al. 2018).
However, using continuously cycling limited-area EnKFs to initialize CAEs has risks, as biases can accumulate through assimilation cycles and degrade forecasts (e.g., Hsiao et al. 2012; Torn and Davis 2012; Romine et al. 2013; Wong et al. 2020); EnKFs over large domains like the CONUS might be more susceptible to this problem than EnKFs over comparatively small European domains where prominent LBC influences may mitigate bias accumulation. Therefore, although NCAR’s experimental CAE forecasts were credible and widely adopted by both researchers and forecasters (Schwartz et al. 2019), it remains unclear whether large-domain, limited-area, continuously cycling EnKFs are optimal for producing CAE ICPs. Furthermore, objective assessments of systematic, controlled experiments designed to isolate impacts of ICPs on real-data CAE forecasts over the CONUS have yet to be reported, although subjective evaluations of two CAEs differing solely by ICPs performed during NOAA’s 2019 Hazardous Weather Testbed Spring Forecasting Experiment suggested ICPs had little impact on severe weather forecasts (Clark et al. 2019).
Accordingly, to further understanding of ICP methods for CAEs, including EnKF-based approaches, this study systematically examined 31 48-h forecasts from several 10-member CAEs over the CONUS, where many of the CAEs differed solely by their ICPs. In addition, to explore the impacts of recentering ICPs, other CAEs differed solely by their central initial states. Thus, differences between various CAE forecasts were fully attributable to either ICPs or central initial state, providing insight about CAE initialization and design that has implications for development of future operational CAEs, such as those at NOAA under the Unified Forecast System (UFS) framework.
2. Model configurations and ICP strategies
a. Model configurations
All CAEs employed forecast model configurations similar to those used in NCAR’s real-time CAE project (Schwartz et al. 2015b, 2019). Specifically, 48-h forecasts were produced by version 3.6.1 of the Advanced Research Weather Research and Forecasting (WRF) Model (Skamarock et al. 2008; Powers et al. 2017) over a two-way nested domain spanning the CONUS and adjacent areas (Fig. 1a). The horizontal grid spacing was 15 km in the outer domain and 3 km in the nest, where time steps were 75 and 18.75 s, respectively. Both domains had 40 vertical levels, a 50-hPa top, and used common physical parameterizations (Table 1), except no cumulus parameterization was employed on the 3-km grid. All ensemble members used identical physics and dynamics.
Table 1. Physical parameterizations for all WRF Model forecasts. Cumulus parameterization was used only on the 15-km domain.
During the 48-h integrations, LBCs were produced for each ensemble member by perturbing forecasts from NCEP’s Global Forecast System (GFS) with random, correlated, Gaussian noise with zero mean (e.g., Barker 2005; Torn et al. 2006) drawn from the default “cv3” background error covariances (BECs) provided by the WRF Model’s DA system (WRFDA; Barker et al. 2012), which were produced with the “NMC method” (Parrish and Derber 1992) based on differences between 48- and 24-h forecasts from a legacy ~100-km configuration of the GFS model. Following Schwartz et al. (2015b), LBC perturbation magnitudes linearly increased throughout the forecasts to promote spread. Identical LBC sets were used for all CAEs.
b. IC generation and experimental design
Five sets of 10-member, 48-h CAE forecasts were produced over May 2015, which was the wettest month ever recorded over the CONUS (e.g., Blunden and Arndt 2016) and featured a broad precipitation maximum over the central CONUS (Fig. 1b). The CAEs were identical except for their ICs, which are now described.
1) Continuously cycling EnKF DA
Two CAEs had ICPs derived from an experimental continuously cycling ensemble adjustment Kalman filter (Anderson 2001, 2003; Anderson and Collins 2007), a type of EnKF, implemented in the Data Assimilation Research Testbed (DART) software (Anderson et al. 2009). The EnKF DA system had 80 ensemble members and produced analyses solely on the 15-km domain (Fig. 1a).
Similar to Schwartz et al. (2015a), an initial 15-km ensemble was created by adding random noise drawn from the WRFDA-provided BECs to the 0.25° 0000 UTC 26 April 2015 GFS analysis; this randomly generated ensemble served as the prior (before assimilation) ensemble for the first EnKF analysis. Then, the 0000 UTC 26 April 2015 posterior (after assimilation) ensemble initialized a 6-h, 80-member ensemble forecast that became the prior ensemble for the second EnKF analysis at 0600 UTC 26 April 2015, and this analysis–forecast cycle with a 6-h period continued until 0000 UTC 31 May 2015. Model configurations and LBC perturbation strategies during the 6-h forecast steps were identical to those described in section 2a, except forecasts were not produced on the 3-km grid. Soil states evolved freely and independently for each member during the entire cycling period, and the first 5 days of cycling were considered spinup and discarded.
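This analysis–forecast cycle is summarized by the following schematic sketch, in which every helper function is a hypothetical placeholder for the actual DART and WRF drivers (none of them are real DART or WRF interfaces):

```python
from datetime import datetime, timedelta

N_MEMBERS = 80
CYCLE = timedelta(hours=6)

# Hypothetical stand-ins for the actual DART and WRF drivers (not real APIs).
def gfs_plus_random_noise(t, n_members): ...   # cold-start ensemble about the GFS analysis
def get_observations(t): ...                   # conventional observations valid at t
def run_enkf_analysis(prior, obs): ...         # DART EnKF update on the 15-km grid
def run_wrf_6h(posterior): ...                 # 80-member 6-h WRF ensemble forecast

t, end = datetime(2015, 4, 26, 0), datetime(2015, 5, 31, 0)
prior = gfs_plus_random_noise(t, N_MEMBERS)    # random correlated noise forms the first prior

while t <= end:
    posterior = run_enkf_analysis(prior, get_observations(t))
    # The first 5 days are spinup; afterward, 0000 UTC posterior members 1-10
    # initialize 48-h nested forecasts (section 2b).
    prior = run_wrf_6h(posterior)              # becomes the next cycle's prior
    t += CYCLE
```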
The EnKF assimilated conventional observations as described by Schwartz et al. (2015b), with the addition of global positioning system radio occultation refractivity observations. Observation errors, preprocessing, and quality control were also detailed by Schwartz et al. (2015b) and included an “outlier check” to reject observations far from the prior ensemble mean, inflating errors of observations near lateral boundaries, rejecting surface observations with mismatched modeled and observed terrain heights, and superobbing aircraft and satellite wind observations.
Specific DA settings were mostly similar to those employed by NCAR during their real-time CAE project (Schwartz et al. 2015b, 2019) and are summarized in Table 2. Compared to NCAR’s real-time EnKF analyses produced in May 2015 (Schwartz et al. 2015b), the biggest differences involved ensemble size and covariance inflation. Specifically, the 80 members and posterior relaxation-to-prior-spread (RTPS) inflation (Whitaker and Hamill 2012) differed from the real-time analyses, which had 50 members and used prior adaptive inflation (e.g., Anderson 2009). The switch to RTPS inflation was based on systematic experimentation that found little precipitation forecast sensitivity to inflation method; because RTPS inflation is simpler, we chose it for this work. Ultimately, the EnKF configuration herein was well tuned in a spread–skill sense (Houtekamer et al. 2005) and initialized significantly better precipitation forecasts than the real-time EnKF analyses (not shown), likely due to the larger ensemble size, which benefits EnKFs (e.g., Zhang et al. 2013; Houtekamer et al. 2014).
Table 2. Configuration details of the DART-based EnKF DA system.
For each 0000 UTC EnKF analysis between 1 and 31 May 2015 (inclusive), the first 10 members of the 15-km analysis ensemble (i.e., members 1–10) initialized 48-h forecasts on the nested grid (Fig. 1a), where 3-km ICs were downscaled from the 15-km analyses, which lacked storm-scale structures. Because the EnKF can be conceived as separately updating a mean and perturbations about the mean, our EnKF-based ICPs were centered about 80-member ensemble mean EnKF analyses (“EnKFEnKF”; Table 3). On average, each EnKF member was equally likely to be closest to “truth,” so choosing the first 10 members to initialize 48-h forecasts was analogous to randomly picking 10 members from the full 80-member analysis ensembles (e.g., Schwartz et al. 2014, 2019), and 10-member CAEs can provide skillful and valuable forecasts (e.g., Clark et al. 2011, 2018; Schwartz et al. 2014).
Table 3. Description of the various CAEs in terms of their initial centers, IC perturbation methods, and initial hydrometeors.
In addition, between 1 and 31 May 2015 (inclusive), 0000 UTC perturbations of zonal and meridional wind, potential temperature, water vapor mixing ratio, and perturbation geopotential and dry surface pressure (U, V, θ, qυ, ϕ, and μ, respectively) from analysis members 1–10 were added to corresponding 0000 UTC 0.25° GFS analyses to create another set of ICs that initialized 48-h forecasts on the nested grid (“GFSEnKF”; Table 3). These ICs had U, V, θ, qυ, ϕ, and μ perturbations identical to those of EnKFEnKF but were centered on GFS analyses, rather than EnKF mean analyses (Table 3), providing insight about sensitivity to IC center and suitability of large-domain regional continuously cycling EnKFs to initialize CAE forecasts. Compared to the EnKF analyses, GFS analyses had coarser resolution but assimilated many more observations, including satellite radiances, and reflected a well-tuned, operational deterministic forecast system. Moreover, standard WRF Model preprocessing discards GFS hydrometeor analyses, such that CAEs with GFS initial centers started with no (zero value) hydrometeors, contrasting those CAEs with EnKF mean initial centers (Table 3).
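To make the recentering concrete, the following minimal numpy sketch expresses the operation for a single perturbation variable; the array shapes and values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins: 80 EnKF analysis members and one GFS analysis of a
# single perturbation variable (e.g., theta) on a common grid.
enkf_members = rng.normal(300.0, 1.0, size=(80, 120, 180))
gfs_analysis = rng.normal(300.0, 1.0, size=(120, 180))

enkf_mean = enkf_members.mean(axis=0)

# EnKF_EnKF: members 1-10 used directly; their center is the EnKF mean.
ic_enkf_center = enkf_members[:10]

# GFS_EnKF: identical perturbations recentered about the GFS analysis.
perturbations = enkf_members[:10] - enkf_mean
ic_gfs_center = gfs_analysis + perturbations

# The two IC sets share perturbations but differ in their ensemble center.
assert np.allclose(ic_gfs_center - gfs_analysis, ic_enkf_center - enkf_mean)
```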
2) Random ICPs
Performing limited-area EnKF DA can be expensive, so other cheaper, pragmatic methods of producing CAE ICPs were also explored. Thus, two additional sets of 10-member 48-h forecasts on the nested grid (Fig. 1a) were initialized by taking random draws of U, V, θ, qυ, ϕ, and μ from the WRFDA-provided BECs and adding them to both 0000 UTC 15-km EnKF mean (“EnKFRAND”) and 0.25° GFS analyses (“GFSRAND”) between 1 and 31 May 2015 (inclusive; Table 3). The random patterns differed for each initialization, but, for a particular initialization and ensemble member, identical random perturbations were added to both EnKF mean and GFS analyses (i.e., member 1 of EnKFRAND and GFSRAND had identical perturbations).
Length scales and variances of random perturbations can be tuned when drawing from BECs, providing many possibilities for specifying initial correlated random noise. However, we only used one set of tuning parameters, where the length scales were empirically reduced by ~85% from those within the WRFDA-provided BECs; this reduction was necessary because of our much finer grid spacing compared to the ~100-km statistics contained in the BECs. Variances were also reduced relative to those in the WRFDA-provided BECs in an attempt to roughly approximate spread of the other initial ensembles. Ultimately, our randomly produced ICPs had mesoscale structures, contrasting RB16, who used a similar method to produce random CAE ICPs but with convective-scale structures. Although subsequent CAE forecasts may be sensitive to length scale and variance of initial random noise, examining this sensitivity was beyond the scope of this work. Rather, the primary purpose of constructing random ICPs was to assess whether they yielded forecast quality comparable to that of flow-dependent EnKF ICPs, but at substantially lower cost, by examining relative performances of CAEs with the same initial center but different ICPs (e.g., EnKFRAND versus EnKFEnKF and GFSRAND versus GFSEnKF; Table 3).
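The actual draws came from WRFDA’s cv3 BECs through its control-variable transforms; as a much-simplified stand-in that captures the two tunable ingredients (length scale and variance), correlated noise can be produced by smoothing white noise and rescaling, as in this sketch (grid size, length scale, and spread values are illustrative only):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correlated_noise(shape, length_scale_km, dx_km, target_std, rng):
    """Gaussian random field with a tunable correlation length scale and
    standard deviation (a simplified stand-in for drawing from BECs)."""
    white = rng.standard_normal(shape)
    smooth = gaussian_filter(white, sigma=length_scale_km / dx_km, mode="wrap")
    return target_std * smooth / smooth.std()  # smoothing shrinks variance; rescale

rng = np.random.default_rng(42)
# e.g., ten mesoscale theta perturbations on an illustrative 15-km grid
perts = [correlated_noise((120, 180), length_scale_km=60.0, dx_km=15.0,
                          target_std=0.5, rng=rng) for _ in range(10)]
```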
3) SREF ICPs
An additional IC set was produced by adding perturbations of U, V, θ, qυ, ϕ, and μ derived from 2100 UTC–initialized 3-h forecasts of NCEP’s Short-Range Ensemble Forecast (SREF; Du et al. 2014) system to 0000 UTC 0.25° GFS analyses (“GFSSREF”; Table 3); these ICs then initialized 48-h forecasts on the nested grid between 1 and 31 May 2015 (inclusive). This inexpensive method was very similar to that used by the Center for Analysis and Prediction of Storms to produce CAE ICPs for many years (e.g., Xue et al. 2007; Kong et al. 2008, 2009; Gallo et al. 2017). Like EnKF perturbations, SREF perturbations were flow dependent, and although the SREF system had 16-km horizontal grid spacing, data available to us had been coarsened to 32 km.
During the experimental period (May 2015) the SREF contained 21 members with diversity provided by varied dynamic cores, physics, and ICs (Du et al. 2014). However, we only needed perturbations from 10 members, which were chosen as the 8 SREF members used to initialize the National Severe Storms Laboratory’s experimental CAE (Clark 2017) plus 2 additional members based on the Advanced Research WRF dynamic core (the “p3” and “n3” SREF members). Contrasting the single-physics, single-dynamics EnKF ICPs, these 10 SREF-based ICPs collectively reflected three dynamic cores, each associated with its own unique IC generation method, and, moreover, some physics schemes varied across SREF members with a common core (Du et al. 2014). Thus, below we refer to SREF ICPs as “multimodel ICPs,” with the understanding that differences between GFSSREF members cannot be fully attributed to the dynamic core, physics, or initialization method encapsulated and entangled within their ICPs.
3. ICP characteristics and spread growth
a. Initial spread characteristics
Mean 700-hPa zonal wind spread over all 31 0000 UTC initial 15-km ensembles highlighted differences between the various ICPs. Specifically, EnKF and SREF ICPs were flow dependent (Figs. 2a,b), with relatively large spread associated with stronger mean height gradients that portend uncertainty, such as over eastern Canada and the central CONUS, and comparatively small spread associated with weaker height gradients over the southeast CONUS and West Coast. Conversely, random ICPs were not flow dependent and yielded nearly uniform mean spread reflecting the tuned BECs (Fig. 2c).
Consistent with Figs. 2a and 2b, EnKF- and SREF-based initial ensembles had comparable spreads for wind (Figs. 3a,b), while SREF-based initial ensembles had larger spread than EnKF-based initial ensembles for temperature and moisture below 250 hPa (Figs. 3c,d); these larger SREF spreads for thermodynamic variables were possibly manifestations of diverse precipitation patterns produced by the multiple models in the unconstrained 3-h SREF forecasts leveraged to obtain ICPs. Except near jet stream level, randomly produced initial ensembles had wind spreads broadly comparable to those of the SREF- and EnKF-perturbed ensembles but with smoother vertical structures (Figs. 3a,b). However, random ICPs had more midtropospheric temperature and low-level moisture spread than the other ICPs (Figs. 3c,d).
b. Spread evolution
1) Spread and error at rawinsonde locations
Ensemble mean RMSEs with respect to rawinsonde observations and standard deviations at rawinsonde locations were computed to assess spread and error growth. The ensembles with EnKF ICPs (EnKFEnKF and GFSEnKF; Table 3) had similar spreads throughout the forecast (Fig. 4), but the CAE initially centered about EnKF mean analyses (EnKFEnKF; gray curves) had smaller RMSEs than the CAE initially centered about GFS analyses (GFSEnKF; blue curves) at the initial time, indicating the EnKF (section 2b) fit rawinsonde observations more closely than GFS analyses. However, EnKFEnKF ensemble mean RMSEs grew quickly and were statistically significantly worse than those from GFSEnKF after initialization, indicating forecast sensitivity to initial center and suggesting GFS analyses were overall better than EnKF mean analyses.
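A minimal sketch of these two diagnostics, assuming forecast values have already been interpolated to rawinsonde locations (all arrays below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative (members, stations) forecast values at rawinsonde locations for
# one variable, level, and lead time, plus the matching observations.
fcst = rng.normal(0.0, 2.0, size=(10, 400))
obs = rng.normal(0.0, 2.0, size=400)

ens_mean = fcst.mean(axis=0)
rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))       # ensemble mean RMSE
spread = np.sqrt(np.mean(fcst.var(axis=0, ddof=1)))  # mean ensemble standard deviation

# In a well-tuned ensemble, spread is comparable to ensemble mean RMSE.
print(f"RMSE = {rmse:.2f}, spread = {spread:.2f}")
```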
The three CAEs with GFS initial centers (GFSEnKF, GFSRAND, GFSSREF; Table 3) typically had similar RMSEs but different spreads (Fig. 4), with initial spreads generally aligned with Figs. 3a–c. In particular, except for 500- and 300-hPa wind, EnKF ICPs had the smallest initial spread at rawinsonde locations due to the restorative effect of assimilating those very observations, and GFSEnKF spread was smaller than GFSRAND and GFSSREF spread from 12 to 48 h.
While GFSEnKF and GFSSREF spread sometimes grew more than GFSRAND spread over the entire 48-h forecast, over the first 12 h GFSSREF spread usually grew faster than GFSEnKF spread, and GFSRAND spread growth rates were typically the highest (Fig. 4). Although GFSRAND initial spread was often relatively large, even when GFSRAND initial spread was comparable to or smaller than that of the other ensembles, rapid spread growth still occurred over the first 12 h (Figs. 4a,b,f,g,j), suggesting GFSRAND forecast spread was not simply modulated by its initial spread.
2) Perturbation power spectra
To further understand spread growth characteristics over the first 12 h, perturbation power spectra were computed using the discrete Fourier transform after applying a Hanning window (e.g., Harris 1978) to enforce periodicity. Random 2-m temperature and 10-m wind ICPs had less power than SREF and EnKF ICPs for scales <500 and 250 km, respectively (Figs. 5a,f), reflecting the specified length scales used to construct random noise. However, random ICPs led to rapid error growth over the first hour (Figs. 5b,g), with larger growth rates than EnKF and SREF ICPs at small scales, suggesting rapid GFSRAND spread increases over the first 12 h (e.g., Figs. 4a,b,f–j) were driven by small-scale perturbation growth ultimately spurred by downscale propagation of random mesoscale errors (e.g., Durran and Gingrich 2014). After the first hour, GFSRAND error growth rates were much slower (Figs. 5c–e,h–j), but by 12 h, at all scales GFSRAND had the most perturbation energy and GFSEnKF the least (Figs. 5e,j), consistent with greater low-level GFSRAND 12-h forecast spread compared to GFSEnKF (Figs. 4a,f).
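A minimal one-dimensional sketch of this spectral computation, averaging row spectra of a single perturbation field (the field and grid spacing are illustrative; the study’s exact implementation is not reproduced here):

```python
import numpy as np

def perturbation_spectrum(pert, dx_km):
    """Row-averaged 1D power spectrum of a perturbation field; a Hanning
    window enforces periodicity before the discrete Fourier transform."""
    ny, nx = pert.shape
    windowed = pert * np.hanning(nx)                 # taper each west-east row
    power = (np.abs(np.fft.rfft(windowed, axis=1)) ** 2).mean(axis=0)
    wavelength_km = nx * dx_km / np.arange(1, nx // 2 + 1)
    return wavelength_km, power[1:]                  # drop the k = 0 (mean) term

rng = np.random.default_rng(2)
pert = rng.standard_normal((120, 180))               # illustrative 2-m temperature perturbation
wavelength_km, power = perturbation_spectrum(pert, dx_km=3.0)
```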
Overall, these spectra illustrate that rapid GFSRAND error growth was insensitive to ICP variance magnitude; GFSRAND surface temperature spread was relatively small (Fig. 3c) while its surface wind spread was relatively large (Figs. 3a,b), yet rapid GFSRAND error growth occurred over the first hour for both variables. Similar evolutions were evident for other vertical levels, and after 12 h, spectra from all three ensembles gradually converged as common LBCs exerted their influence (not shown).
3) Precipitation spread
Precipitation development and spread over the first 18 h was sensitive to initial spread characteristics. Most notably, precipitation variances (about each ensemble’s mean) were largest in the two CAEs with random ICPs, with rapid spread increases over the first 6 h (Fig. 6a) consistent with fast low-level error growth (Figs. 4a,b,f,g, 5). Comparatively, precipitation variances were less sensitive to IC center, although the CAEs initially centered about GFS analyses had less spread than those initially centered about EnKF mean analyses through 18 h, possibly because initial nonzero hydrometeor states in the CAEs initially centered about EnKF mean analyses (Table 3) contributed to spread growth over the first few hours. GFSSREF had larger spread than GFSEnKF through 24 h, possibly due to the multiple models reflected in SREF-based ICPs and generally consistent with greater GFSSREF initial spread (Figs. 3c,d, 4) and spread growth over the first 12 h (Fig. 4). Variances computed after a bias correction (see section 4b) generally behaved similarly to uncorrected variances over the first 12–18 h but with smaller differences among the ensembles (Fig. 6b).
After 18 h, precipitation variances were more similar across all five CAEs than at earlier times. However, of the three CAEs initially centered about GFS analyses, GFSSREF had the most spread between 24 and 33 h for raw variances (Fig. 6a), while bias-corrected variances indicated more spread from random and SREF ICPs between 24 and 42 h compared to EnKF ICPs (Fig. 6b).
c. Forecast example
The forecast initialized at 0000 UTC 11 May 2015 nicely illustrates how different ICPs impacted spread growth. At this time, precipitation was ongoing in the vicinity of tropical depression Ana over southeastern North Carolina and along surface boundaries stemming from a low pressure center over South Dakota, which was associated with an upper-level trough over the Rockies and adjacent plains (Figs. 7s–u). Initial 2-m temperature EnKF and SREF perturbations indicated flow dependence with enhanced spread around these features (Figs. 7a,g,s,t), whereas random perturbations did not reflect these phenomena (Figs. 7m,u).
The 2-m temperature perturbation magnitudes were initially small (Figs. 7a,g,m,s–u), but by 1 h, spread substantially increased. At 1 h, GFSEnKF and GFSSREF spread primarily reflected the surface low pressure system and attendant fronts (Figs. 7b,h), while GFSRAND had not fully developed flow-dependent characteristics (Fig. 7n). However, after 3 h, all spread patterns reflected synoptic-scale features (Figs. 7c,i,o), and by 6–12 h, the three ensembles had comparable structures near the fronts (Figs. 7d–f,j–l,p–r), with GFSSREF spread highest along the boundaries. Conversely, in the weak forcing regime over the southeastern CONUS and Ohio Valley, GFSRAND possessed much more spread than GFSSREF and GFSEnKF that peaked from 6 to 9 h (Figs. 7d,e,j,k,p,q), consistent with Johnson et al. (2014), who suggested random noise was most likely to promote spread growth in weak forcing scenarios. It appears that these random spread patterns were initially organized on small scales (Figs. 7n,o), consistent with perturbation spectra indicating rapid small-scale error growth over the first several hours (Figs. 5a–c,f–h).
Regarding precipitation, GFSEnKF had more spread than GFSRAND and GFSSREF at 1 h (Figs. 8a,f,k), and although GFSRAND 2-m temperature structures were not fully flow dependent at this time (Fig. 7n), GFSRAND precipitation spread represented flow-dependent features (Fig. 8k). This finding was similar to RB16, who noted flow-dependent precipitation structures quickly developed in a CAE with storm-scale random ICPs. By 3 h, precipitation spread had grown substantially in all CAEs (Figs. 8b,g,l), and by 6–12 h, GFSRAND generally had larger and more widespread areas of nonzero spread than GFSEnKF in the vicinity of frontally forced precipitation (Figs. 8c–e,h–j,m–o). Consistent with Fig. 6a, GFSRAND spread peaked at 6 h, and while GFSSREF and GFSEnKF had similar patterns, there was slightly more GFSSREF spread from 3 to 12 h.
Between 3 and 9 h, GFSRAND precipitation spread was particularly large over the weakly forced southeast CONUS and Ohio Valley, whereas GFSEnKF and GFSSREF had much less spread in similar locales (Figs. 8b–d,g–i,l–n). Areas of enhanced GFSRAND precipitation spread often appeared to be preceded by relatively large GFSRAND 2-m temperature perturbations (Figs. 7m–p) and were accompanied by low probabilities of precipitation over wide areas where observed precipitation did not occur (not shown). Thus, at least for this case, random ICPs led to false alarms in some members in areas with weak forcing.
d. Summary
The preceding analyses suggest random ICPs promoted rapid short-term error growth, primarily driven by small-scale perturbations (Fig. 5), while EnKF and SREF ICPs had comparatively slower error growth rates. Accordingly, compared to the other ICP strategies, random ICPs generally yielded more spread over the first 12–18 h (Figs. 4, 6). As the next section shows, this additional spread from random ICPs was sometimes helpful, yet did not always possess favorable characteristics.
4. Precipitation verification
Hourly accumulated precipitation forecasts were objectively compared to NCEP’s Stage IV (ST4) analyses (Lin and Mitchell 2005) over the CONUS east of 105°W (Fig. 1a), where ST4 data were most robust (e.g., Nelson et al. 2016) and were considered “truth.” While some statistics were computed on native grids, many verification metrics require a common grid for forecasts and observations. So, for these metrics, all precipitation forecasts were interpolated to the ST4 grid (4.763-km horizontal grid spacing) using a precipitation-conserving budget interpolation algorithm (e.g., Accadia et al. 2003). We primarily focused on precipitation because it is an important sensible weather field and depends on many physical processes, thus providing an overall summary of model performance.
Statistics presented in this section are aggregated over all 31 forecasts.
a. Bias characteristics
1) Total precipitation
Total precipitation over the verification region (Fig. 1a), normalized by the number of grid points in the verification region, was determined for each member on native grids (Fig. 9). To concisely summarize results, only the mean and range (maximum minus minimum; lines with circle markers) are shown for all five CAEs (Fig. 9a), while individual GFSSREF members are shown in Fig. 9b.
The largest differences between the CAEs occurred over the first 12 h, when the two CAEs with random ICPs spun up precipitation much faster than the other CAEs but grossly overshot observed domain-total precipitation (Fig. 9a). While the CAEs with EnKF and SREF ICPs had broadly similar mean spinups, distinct trifurcation of GFSSREF members occurred based on dynamic core (Fig. 9b), consistent with Johnson et al. (2011) and indicating how ICPs reflecting multiple models can lead to clustering, which is undesirable (e.g., Gowan et al. 2018; Schwartz et al. 2019). In general, spinup appeared more sensitive to ICPs than to IC center, even though initial center determined whether the CAE had initial hydrometeors (Table 3). This finding suggests ICP characteristics influence spinup more than initial hydrometeor state for 0000 UTC–initialized forecasts over the central-eastern CONUS.
Despite varied spinups, all CAEs generally represented diurnal cycle timing well after 18 h, and domain-total precipitation from GFSEnKF and from GFSSREF members with NMM dynamic core ICPs typically best matched observations, including the observed peak around 24 h (Fig. 9). Conversely, the two ensembles with random ICPs had less mean precipitation than observed between ~24 and 42 h, while at the maximum around 24 h EnKFEnKF produced too much precipitation (Fig. 9a). Interestingly, despite overpredicting at 24 h, EnKFEnKF precipitation dramatically decreased thereafter, and EnKFEnKF underpredicted between ~26 and 42 h, perhaps due to insufficient upscale growth of convection (e.g., Schwartz et al. 2015b).
GFSSREF clearly had the widest range throughout the forecast, reflecting its ICPs with multimodel diversity (Fig. 9a). Additionally, the two CAEs with random ICPs typically had wider ranges than those with EnKF ICPs, particularly over the first 12 h, essentially a manifestation of randomness. However, after 12 h, except for GFSSREF, the four other CAEs had fairly similar ranges.
2) Precipitation distributions
Average areal coverages of 1-h accumulated precipitation meeting or exceeding selected accumulation thresholds (e.g., 10.0 mm h−1) were calculated over the verification region on native grids to assess precipitation distributions (Fig. 10). The CAEs generally represented diurnal cycle timing well after the spinup, although there were sometimes biases, particularly for thresholds ≤1.0 mm h−1, where all CAEs usually had mean coverages lower than those observed (Figs. 10a,b).
Areal coverage characteristics for thresholds ≥2.5 mm h−1 (Figs. 10c–f) were broadly consistent with domain-total precipitation statistics, and GFSSREF members again clustered based on dynamic core represented in the ICPs (not shown). Specifically, the two ensembles with random ICPs had lower mean coverages than observed between ~24 and 42 h for thresholds ≥2.5 mm h−1 but clearly overpredicted during the spinup, which contributed to their excessive total precipitation during this period (e.g., Fig. 9a). Before and during the first observed precipitation peak (18–24 h), GFSEnKF and GFSSREF typically had ensemble mean coverages closest to observations, while EnKFEnKF overpredicted for thresholds between 2.5 and 10.0 mm h−1 (Figs. 10c–e). Ensemble ranges of areal coverages (lines with circles in Fig. 10) also resembled those for domain-total precipitation, with GFSSREF having the widest ranges and the CAEs with random ICPs possessing relatively large ranges for the first ~12 h.
Probability density functions (PDFs) further revealed the different spinups engendered by EnKF and random ICPs (Fig. 11). At 1 h, while finescale structures were still developing, CAEs with EnKF ICPs had more heavy precipitation than those with random ICPs, although none of the forecasts could yet reproduce the observed heavy rainfall frequency (Fig. 11a). However, between 1 and 3 h, heavy precipitation rapidly developed in the CAEs with random ICPs (Fig. 11b), with slower development in the CAEs with EnKF ICPs, and by 5–7 h, the CAEs with random ICPs produced too much rainfall >40.0 mm h−1 while PDFs of the CAEs with EnKF ICPs gradually aligned with those observed (Figs. 11c,d).
Collective findings clearly suggested issues when initializing CAEs with random noise for short-term precipitation forecasts (Figs. 8–11), possibly due to gross imbalances in random initial states. Moreover, our results are consistent with Johnson et al. (2014), who found initializing CAEs with correlated random noise led to “spurious precipitation that formed over large areas on many cases” at short forecast ranges, and as Johnson et al. (2014) constructed random ICPs with smaller length scales than those used here, it appears using random noise to initialize CAEs may be challenging regardless of its correlation scale.
b. Ensemble precipitation verification
Areal coverages sometimes indicated biases (Fig. 10), which can hamper interpretation of verification metrics designed to quantify spatial errors (e.g., Baldwin and Kain 2006; Roberts and Lean 2008). Thus, before measures of probabilistic forecast quality were assessed, forecasts were bias corrected with a “probability-matching” approach that forced each ensemble member’s distribution to match the ST4 distribution: the model grid point containing the most precipitation within the verification region was assigned the highest ST4 amount within the verification region, and so on, thus eliminating bias (e.g., Ebert 2001; Clark et al. 2009, 2010a,b; Schwartz et al. 2015a; Loken et al. 2019; Pyle and Brill 2019). Despite replacing model values with observations, this method preserves forecast spatial patterns.
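This rank-ordered replacement can be written compactly; a minimal sketch with illustrative precipitation fields:

```python
import numpy as np

def probability_match(member, st4):
    """Assign each model grid point the ST4 amount of the same rank, forcing
    the member's amplitude distribution to match the observations (zero bias)
    while preserving the forecast's spatial pattern."""
    flat = member.ravel()
    order = np.argsort(flat)                   # model ranks, low to high
    matched = np.empty_like(flat)
    matched[order] = np.sort(st4.ravel())      # same-rank ST4 amounts
    return matched.reshape(member.shape)

rng = np.random.default_rng(3)
member = rng.gamma(0.3, 2.0, size=(200, 300))  # illustrative 1-h precipitation
st4 = rng.gamma(0.3, 1.5, size=(200, 300))
corrected = probability_match(member, st4)
assert np.allclose(np.sort(corrected.ravel()), np.sort(st4.ravel()))
```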
After interpolating precipitation forecasts to the ST4 grid and bias correcting, a “neighborhood” approach (e.g., Theis et al. 2005; Ebert 2008, 2009) was employed to derive probabilistic fields suitable for verification following Schwartz and Sobash (2017). First, ensemble probabilities (EPs) at a particular grid point were determined as the fraction of ensemble members predicting an event at that point, where an event was defined as precipitation meeting or exceeding an accumulation threshold (e.g., 5.0 mm h−1). Then, “neighborhood ensemble probabilities” (NEPs; Schwartz et al. 2010; Schwartz and Sobash 2017) were computed by choosing a neighborhood length scale (r) to define a spatial neighborhood and averaging EPs over all grid points in the neighborhood. NEPs are probabilities of event occurrence at a point given a neighborhood length scale (e.g., Schwartz and Sobash 2017) and are more appropriate for verifying CAEs than point-based probabilities (i.e., EPs) because they incorporate spatial uncertainty and acknowledge that CAEs are inherently inaccurate at the grid scale.
NEPs were produced from all CAEs with r between 5 and 150 km, which represented radii of circular neighborhoods. Following Schwartz and Sobash (2017), NEPs at the ith point were verified against corresponding observations (i.e., ST4) at the ith point, where the ith observed value could either be binary (i.e., 0 or 1) or fractional depending on what the metric required; fractional observations (e.g., Roberts and Lean 2008) at the ith point were obtained by determining the fraction of observed events within its neighborhood, analogously to NEPs.
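A minimal sketch of the EP/NEP calculation and the matching fractional observations; for simplicity a square neighborhood stands in for the study’s circular one, with r = 100 km corresponding to roughly 21 points on the 4.763-km ST4 grid (all fields are illustrative):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def nep_and_obs_fractions(fcst, obs, threshold, r_gridpts):
    """EPs, NEPs, and fractional observations for one accumulation threshold;
    a square neighborhood approximates the study's circular one."""
    ep = (fcst >= threshold).mean(axis=0)      # fraction of members with the event
    size = 2 * r_gridpts + 1
    nep = uniform_filter(ep, size=size, mode="nearest")
    obs_frac = uniform_filter((obs >= threshold).astype(float),
                              size=size, mode="nearest")
    return ep, nep, obs_frac

rng = np.random.default_rng(4)
fcst = rng.gamma(0.3, 2.0, size=(10, 200, 300))   # 10 members on the ST4 grid
obs = rng.gamma(0.3, 2.0, size=(200, 300))
ep, nep, obs_frac = nep_and_obs_fractions(fcst, obs, threshold=5.0, r_gridpts=21)
```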
For brevity, results are shown solely for r = 100 km, but overall findings were unchanged using different r. Additionally, a maximum event threshold of 10.0 mm h−1 was used, as metrics computed at higher thresholds were noisy due to small sample sizes (e.g., Fig. 10).
Statistical significance testing followed Schwartz (2019), who examined performance of several ensembles, and the following description parallels that work. Specifically, statistical significance was determined with a bootstrap technique by randomly drawing paired samples (10 000 times) of daily statistics from two ensembles over all forecast cases to calculate resampled distributions of aggregate differences between two ensembles (e.g., Hamill 1999; Wolff et al. 2014). This procedure assumed individual forecasts, initialized 24 h apart, were independent (e.g., Hamill 1999). Bounds of 90% bootstrap confidence intervals (CIs) were obtained from the distribution of resampled aggregate differences using the bias corrected and accelerated method (e.g., Gilleland 2010). If bounds of a 90% bootstrap CI did not encompass zero, using a one-tailed interpretation, differences between two ensembles were statistically significant at the 95% level or higher.
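A sketch of this paired bootstrap under the stated assumptions; for brevity, simple percentile CI bounds are shown rather than the bias corrected and accelerated bounds used in the study:

```python
import numpy as np

def paired_bootstrap_ci(stat_a, stat_b, n_boot=10_000, alpha=0.10, seed=0):
    """90% bootstrap CI for the aggregate difference between two ensembles'
    daily statistics, resampling forecast cases with replacement."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(stat_a) - np.asarray(stat_b)  # paired by forecast case
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    resampled = diffs[idx].mean(axis=1)
    return tuple(np.percentile(resampled, [100 * alpha / 2, 100 * (1 - alpha / 2)]))

rng = np.random.default_rng(5)
fss_a, fss_b = rng.uniform(0.4, 0.6, 31), rng.uniform(0.4, 0.6, 31)  # illustrative daily scores
lo, hi = paired_bootstrap_ci(fss_a, fss_b)  # CI excluding zero -> significant difference
```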
1) Fractions skill scores
The fractions skill score [FSS; Roberts and Lean (2008)] was used to evaluate spatial placement, where FSS = 1 means a perfect forecast and FSS = 0 indicates no skill. For fixed initial centers, CAEs with flow-dependent ICPs usually had higher FSSs than those with random ICPs, while differences between GFSSREF and GFSEnKF FSSs were usually small and not statistically significant (Fig. 12). These results indicated the value of flow-dependent ICPs relative to random ICPs and minimal benefit from multimodel ICPs. However, regardless of ICPs, the three CAEs initially centered on GFS analyses typically had higher FSSs than the two CAEs initially centered about EnKF mean analyses, demonstrating GFS analyses were generally better than EnKF mean analyses and suggesting initial center is more important than ICPs for achieving high FSSs.
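Given NEPs and fractional observations on a common grid (section 4b), the FSS reduces to a few lines; a minimal sketch:

```python
import numpy as np

def fss(nep, obs_frac):
    """Fractions skill score comparing NEPs with fractional observations on a
    common grid: FSS = 1 - MSE(nep, obs_frac) / reference MSE."""
    num = np.mean((nep - obs_frac) ** 2)
    den = np.mean(nep ** 2) + np.mean(obs_frac ** 2)
    return 1.0 - num / den if den > 0 else np.nan
```

Applied to the nep and obs_frac fields from the earlier sketch, fss(nep, obs_frac) yields the score for one threshold, lead time, and case.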
2) Rank histograms
Rank histograms (e.g., Hamill 2001) based on domain-total precipitation were constructed as in Schwartz et al. (2014). Although rank histograms are sensitive to observation errors (e.g., Hacker et al. 2011), ST4 observation errors are not well known and were not included. The reliability index (RI; Delle Monache et al. 2006) was used to summarize rank histogram flatness; lower values are preferable.
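A minimal sketch of both quantities for domain-total precipitation (values are illustrative; ties between members and observations are ignored for simplicity):

```python
import numpy as np

def rank_histogram(ens_totals, obs_totals):
    """Counts of the observed domain-total precipitation's rank within each
    forecast's ensemble (ties ignored), aggregated over all cases."""
    n_mem = ens_totals.shape[1]
    ranks = (ens_totals < obs_totals[:, None]).sum(axis=1)   # 0 .. n_mem
    return np.bincount(ranks, minlength=n_mem + 1)

def reliability_index(counts):
    """RI: summed absolute deviation of the rank histogram from flatness;
    lower is better."""
    freq = counts / counts.sum()
    return np.abs(freq - 1.0 / counts.size).sum()

rng = np.random.default_rng(6)
ens = rng.gamma(5.0, 1.0, size=(31, 10))    # illustrative domain totals, 31 cases
obs = rng.gamma(5.0, 1.0, size=31)
ri = reliability_index(rank_histogram(ens, obs))
```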
Observations fell within the ensemble more regularly, and values closer to optimal were achieved in most bins, when CAEs had random or SREF ICPs rather than EnKF ICPs (Fig. 13). GFSSREF and GFSRAND RIs were fairly similar and much smaller than GFSEnKF RIs (Figs. 13a,c), and differences between the CAEs were comparatively small after 18 h (cf. Figs. 13a,c with Figs. 13b,d), reflecting generally converging spread with time (e.g., Figs. 4, 6). Nonetheless, results at all forecast ranges suggest that enhanced precipitation spread engendered by random ICPs (e.g., Fig. 6) led to better dispersion characteristics than flow-dependent EnKF ICPs, even though this improved spread was also a manifestation of spinup issues (e.g., Figs. 7–11) and using random ICPs degraded forecast skill as measured by FSSs (e.g., Fig. 12).
3) ROC areas
Ability to discriminate events from climatology was quantified by area under the relative operating characteristic (ROC) curve (Mason 1982; Mason and Graham 2002), which was computed using decision thresholds of 1%, 2%, 3%, 4%, 5%, 10%, 15%, …, 95%, and 100% and a trapezoidal approximation. ROC area > 0.5 indicates better discriminating ability than random forecasts.
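A minimal sketch of this trapezoidal ROC area computation with the thresholds listed above (the probability and event fields are illustrative):

```python
import numpy as np

def roc_area(probs, events, thresholds):
    """Trapezoidal ROC area from hit and false alarm rates at the listed
    probabilistic decision thresholds."""
    p, e = probs.ravel(), events.ravel().astype(bool)
    hr, far = [1.0], [1.0]                    # a decision threshold of zero
    for t in thresholds:
        yes = p >= t
        hr.append((yes & e).sum() / max(e.sum(), 1))
        far.append((yes & ~e).sum() / max((~e).sum(), 1))
    hr.append(0.0)                            # a threshold above 100%
    far.append(0.0)
    hr, far = np.array(hr), np.array(far)
    return abs(np.sum(np.diff(far) * (hr[:-1] + hr[1:]) / 2.0))  # trapezoids

thresholds = np.r_[0.01, 0.02, 0.03, 0.04, 0.05, np.arange(0.10, 1.001, 0.05)]
rng = np.random.default_rng(7)
probs = rng.uniform(0.0, 1.0, size=(200, 300))         # illustrative NEP field
events = rng.uniform(0.0, 1.0, size=(200, 300)) < 0.1  # illustrative observed events
area = roc_area(probs, events, thresholds)             # > 0.5 beats random forecasts
```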
As with FSSs, all three CAEs initially centered on GFS analyses usually had higher ROC areas than the two CAEs initially centered about EnKF mean analyses (Figs. 14a–d), again suggesting GFS analysis superiority to EnKF mean analyses and greater importance of initial center than ICPs. Between ~6 and 18 h, for fixed initial centers, the CAEs with random ICPs had statistically significantly higher ROC areas than those with EnKF ICPs, while before 6 h and after 18 h, EnKF and random ICPs yielded similar ROC areas (Figs. 14a–d). These results differed from FSSs (Fig. 12) that clearly indicated EnKF ICPs were preferable to random ICPs. Outside the 6–18-h period, GFSSREF had the highest ROC areas among the three CAEs with GFS initial centers, often statistically significantly higher than GFSEnKF ROC areas, suggesting benefits of incorporating multimodel diversity within CAE ICPs and contrasting the similar GFSSREF and GFSEnKF FSSs.
Further investigation revealed the relatively poor 6–18-h GFSEnKF ROC areas compared to GFSRAND and GFSSREF were primarily due to insufficient contributions from NEPs < 25%. Specifically, GFSRAND and GFSSREF were less sharp than GFSEnKF, with higher coverages of NEPs for r = 100 km between 5% and 25% and lower coverages of NEPs ≥ 45% at most thresholds (Figs. 15a–d). In general, the GFSRAND distribution differed more from the GFSEnKF distribution than the GFSSREF distribution did; for example, within the 5%–25% bin for the 0.25 and 1.0 mm h−1 thresholds, GFSRAND had ~50% more NEPs than GFSEnKF while the difference between GFSSREF and GFSEnKF was smaller (~15%; Figs. 15a,b). These enhanced low-probability coverages in GFSSREF and GFSRAND reflected their greater spreads relative to GFSEnKF between 6 and 18 h (Fig. 6) that were beneficial (e.g., Fig. 13a) and enabled better detection of low-probability events while not appreciably increasing false alarm rates, boosting ROC areas.
Between 18 and 36 h, even though GFSRAND again had more low probabilities than GFSEnKF (Figs. 15e–h), differences between GFSRAND and GFSEnKF coverages of 5%–25% NEPs were smaller than between 6 and 18 h. Although greater GFSRAND spread was beneficial from a dispersion perspective (e.g., Fig. 13c), GFSRAND spatial placement was significantly poorer than GFSEnKF (Fig. 12), counteracting benefits from enhanced spread and likely leading to comparable GFSRAND and GFSEnKF ROC areas outside of 6–18 h (Figs. 14a–d). Conversely, differences between GFSEnKF and GFSSREF NEP distributions were similar across both forecast intervals (Fig. 15), and as GFSSREF and GFSEnKF FSSs were similar, the combination of good GFSSREF placement and more GFSSREF spread translated into higher GFSSREF ROC areas for most of the forecast relative to GFSEnKF.
Overall, ROC areas indicated more benefit from both random and multimodel ICPs than FSSs did. However, higher ROC areas from these techniques appear related solely to enhanced spread and greater low-probability coverages. In fact, when ROC areas were computed with decision thresholds of 0%, 25%, 30%, 35%, …, 95%, and 100% to explicitly exclude contributions from NEPs < 25%, although ROC areas plummeted, CAEs with EnKF ICPs had higher ROC areas than CAEs with the same initial center but random ICPs, and GFSEnKF had comparable or higher ROC areas than GFSSREF (Figs. 14e–h). These truncated ROC areas provided conclusions similar to FSSs regarding benefits of flow-dependent EnKF ICPs, and it appears that employing multimodel ICPs may be unnecessary for users unconcerned with low-probability decision thresholds.
4) Attributes statistics
Attributes diagrams (Wilks 2011) were constructed with forecast probability bins of 0%–5%, 5%–15%, 15%–25%, …, 85%–95%, and 95%–100% (Fig. 16) to assess calibration, with curves falling on the diagonal indicating perfect reliability. Over the first 18 h, for fixed IC centers, the CAEs with random and SREF ICPs were more reliable than those with EnKF ICPs for most thresholds and probability bins, and GFSRAND was sometimes more reliable than GFSSREF (Figs. 16a–d). The better GFSRAND and GFSSREF reliabilities compared to GFSEnKF were aided by less sharp distributions with fewer high-probability forecasts (e.g., Figs. 15a–d) that diminished overconfidence, again reflecting their greater spreads. Nonetheless, relatively poor GFSRAND FSSs suggest many low probabilities did not correspond well with observations. Initial center again mattered, as for fixed ICPs, the CAEs with GFS initial centers typically had better reliabilities than those with EnKF mean initial centers.
Similar conclusions generally held at later times (18–36 h; Figs. 16e–h), although GFSSREF and GFSRAND had closer reliabilities than at earlier times. Over both periods, most ensembles were overconfident and all CAEs had little or no skill with respect to forecasts of climatology at the 10.0 mm h−1 threshold, indicating challenges persist for making reliable predictions of highly localized events like heavy rainfall.
5. Summary and conclusions
Five sets of 48-h, 10-member, 3-km CAE forecasts were initialized at 0000 UTC each day in May 2015 over the CONUS with various configurations designed to isolate forecast sensitivity to ICPs and central initial state. Sensitivity to ICs extended throughout the 48-h forecasts, contrasting many European studies showing IC impacts through only 6–12 h (e.g., Hohenegger et al. 2008; Vié et al. 2011; Kühnlein et al. 2014; RB16); this disparity is probably attributable to the much larger computational domain used here, and our findings suggest enhanced importance of ICs for large domains.
Specifically, using random mesoscale ICPs yielded undesirable spinup characteristics and relatively poor FSSs compared to employing flow-dependent ICPs provided by both single-physics, single-dynamics 15-km limited-area continuously cycling EnKF analyses and 3-h multimodel SREF forecasts. However, these deleterious characteristics from random ICPs increased spread, leading to less overconfidence and broader low-probability coverages that improved ROC areas, rank histogram flatness, and attributes statistics compared to EnKF—and sometimes SREF—ICPs. Therefore, it appears random ICPs engendered some beneficial properties despite lack of flow dependence, but substantial work is needed to further understand and remedy detrimental impacts of random noise on model spinup.
Compared to EnKF ICPs, SREF ICPs yielded comparable FSSs but improved performance for spread-sensitive metrics. Yet, individual members of the SREF-initialized CAE had different climatologies that undesirably clustered by dynamic core reflected in its ICPs. Thus, although SREF-based and random ICPs often provided improvements over EnKF ICPs, given the challenges associated with multimodel and random ICPs, collective results suggest obtaining “good spread” in CAEs remains elusive, and within future operational CAEs like those being developed under NOAA’s UFS, it may be more fruitful to attempt to recover the helpful, spread-inducing aspects from random and multimodel ICPs by instead using stochastic physics schemes in association with single-physics, single-dynamics, flow-dependent ICPs (e.g., Bouttier et al. 2012; Romine et al. 2014; Jankov et al. 2019).
Additionally, our findings stress the importance of CAE initial center, which was more important than ICPs for achieving high ROC areas and FSSs. Moreover, CAEs initially centered about operational GFS analyses were unequivocally superior to those initially centered on our experimental EnKF mean analyses. These results strongly suggest relative superiority of GFS analyses and lend credence to the “partial cycling” strategy currently employed by NOAA’s limited-area DA systems over the CONUS that periodically discards cycled states and replaces them with fields from a global model (e.g., Benjamin et al. 2016; Wu et al. 2017).
Despite our seemingly discouraging EnKF-based results, continuously cycling EnKFs over large regional domains can potentially be enhanced by decreasing the cycling period (e.g., using 1-h cycles), assimilating more observations, and, likely most importantly, improving the limited-area NWP model (e.g., Romine et al. 2013). In addition, our results documenting very slow perturbation growth over the first 12 h from EnKF ICPs compared to random ICPs suggest that efforts to understand and accelerate this slow growth should be undertaken to improve short-term forecast spread from EnKF ICPs. While increasing EnKF resolution may also help, especially for nowcasting purposes, finer EnKF resolution is likely not a panacea, and it is entirely possible that continuously cycling limited-area EnKFs, despite their many attractive properties, may not currently be optimal for initializing large-domain regional CAEs, particularly for next-day forecasts that are less impacted by spinup. Nonetheless, ongoing research at NCAR is attempting to improve limited-area NWP models (e.g., Wong et al. 2020), with hopes that these efforts will translate into better continuously cycling DA systems over large regional domains.
Acknowledgments
This work was partially funded by NCAR’s Short-term Explicit Prediction (STEP) program and NOAA/OAR Office of Weather and Air Quality Grants NA17OAR4590182 and NA17OAR4590122. Thanks to Adam Clark and two anonymous reviewers for their constructive comments. All forecasts were produced on NCAR’s Yellowstone supercomputer (Computational and Information Systems Laboratory 2016). The National Center for Atmospheric Research is sponsored by the National Science Foundation.
REFERENCES
Accadia, C., S. Mariani, M. Casaioli, A. Lavagnini, and A. Speranza, 2003: Sensitivity of precipitation forecast skill scores to bilinear interpolation and a simple nearest-neighbor average method on high-resolution verification grids. Wea. Forecasting, 18, 918–932, https://doi.org/10.1175/1520-0434(2003)018<0918:SOPFSS>2.0.CO;2.
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903, https://doi.org/10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.
Anderson, J. L., 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131, 634–642, https://doi.org/10.1175/1520-0493(2003)131<0634:ALLSFF>2.0.CO;2.
Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A, 72–83, https://doi.org/10.1111/j.1600-0870.2008.00361.x.
Anderson, J. L., 2012: Localization and sampling error correction in ensemble Kalman filter data assimilation. Mon. Wea. Rev., 140, 2359–2371, https://doi.org/10.1175/MWR-D-11-00013.1.
Anderson, J. L., and N. Collins, 2007: Scalable implementations of ensemble filter algorithms for data assimilation. J. Atmos. Oceanic Technol., 24, 1452–1463, https://doi.org/10.1175/JTECH2049.1.
Anderson, J. L., T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Arellano, 2009: The Data Assimilation Research Testbed: A community facility. Bull. Amer. Meteor. Soc., 90, 1283–1296, https://doi.org/10.1175/2009BAMS2618.1.
Baldwin, M. E., and J. S. Kain, 2006: Sensitivity of several performance measures to displacement error, bias, and event frequency. Wea. Forecasting, 21, 636–648, https://doi.org/10.1175/WAF933.1.
Barker, D. M., 2005: Southern high-latitude ensemble data assimilation in the Antarctic Mesoscale Prediction System. Mon. Wea. Rev., 133, 3431–3449, https://doi.org/10.1175/MWR3042.1.
Barker, D. M., and Coauthors, 2012: The Weather Research and Forecasting Model’s Community Variational/Ensemble Data Assimilation System: WRFDA. Bull. Amer. Meteor. Soc., 93, 831–843, https://doi.org/10.1175/BAMS-D-11-00167.1.
Benjamin, S. G., and Coauthors, 2016: A North American hourly assimilation and model forecast cycle: The Rapid Refresh. Mon. Wea. Rev., 144, 1669–1694, https://doi.org/10.1175/MWR-D-15-0242.1.
Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part I: Tests on simple error models. Tellus, 61A, 84–96, https://doi.org/10.1111/j.1600-0870.2008.00371.x.
Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part II: A strategy for the atmosphere. Tellus, 61A, 97–111, https://doi.org/10.1111/j.1600-0870.2008.00372.x.
Blunden, J., and D. S. Arndt, 2016: State of the Climate in 2015. Bull. Amer. Meteor. Soc., 97, SI–S275, https://doi.org/10.1175/2016BAMSSTATEOFTHECLIMATE.1.
Bouttier, F., B. Vié, O. Nuissier, and L. Raynaud, 2012: Impact of stochastic physics in a convection-permitting ensemble. Mon. Wea. Rev., 140, 3706–3721, https://doi.org/10.1175/MWR-D-12-00031.1.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1–3, https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
Cafaro, C., T. H. A. Frame, J. Methven, N. Roberts, and J. Bröcker, 2019: The added value of convection-permitting ensemble forecasts of sea breeze compared to a Bayesian forecast driven by the global ensemble. Quart. J. Roy. Meteor. Soc., 145, 1780–1798, https://doi.org/10.1002/qj.3531.
Chen, F., and J. Dudhia, 2001: Coupling an advanced land surface–hydrology model with the Penn State–NCAR MM5 modeling system. Part I: Model implementation and sensitivity. Mon. Wea. Rev., 129, 569–585, https://doi.org/10.1175/1520-0493(2001)129<0569:CAALSH>2.0.CO;2.
Clark, A. J., 2017: Generation of ensemble mean precipitation forecasts from convection-allowing ensembles. Wea. Forecasting, 32, 1569–1583, https://doi.org/10.1175/WAF-D-16-0199.1.
Clark, A. J., W. A. Gallus Jr., M. Xue, and F. Kong, 2009: A comparison of precipitation forecast skill between small convection-allowing and large convection-parameterizing ensembles. Wea. Forecasting, 24, 1121–1140, https://doi.org/10.1175/2009WAF2222222.1.
Clark, A. J., W. A. Gallus Jr., and M. L. Weisman, 2010a: Neighborhood-based verification of precipitation forecasts from convection-allowing NCAR WRF Model simulations and the operational NAM. Wea. Forecasting, 25, 1495–1509, https://doi.org/10.1175/2010WAF2222404.1.
Clark, A. J., W. A. Gallus Jr., M. Xue, and F. Kong, 2010b: Growth of spread in convection-allowing and convection-parameterizing ensembles. Wea. Forecasting, 25, 594–612, https://doi.org/10.1175/2009WAF2222318.1.