1. Introduction
Ensemble numerical weather forecasts are now routinely generated at operational meteorological centers and provide realizations that reflect the evolution of atmospheric states (Molteni et al. 1996; Charron et al. 2010; Toth and Kalnay 1997). Hydrological applications rely on precipitation forecasts as inputs (Roulin 2007; Verkade and Werner 2011). However, raw ensemble precipitation forecasts are biased and unreliable because of suboptimal initial conditions, model physics, and insufficient spatial resolution (e.g., Jeworrek et al. 2019; Lopez 2007; Clark et al. 2009). For this reason, the statistical postprocessing of precipitation forecasts—bias-correction, probabilistic calibration, and spatiotemporal consistency reconstruction—is a key step that improves their utility in hydrological forecasting, water resource management (e.g., Buytaert et al. 2010; Ward et al. 2011) and other applications like agriculture (e.g., Robertson et al. 2007; Glotter et al. 2014).
Several types of bias-correction and calibration methods have been developed for precipitation forecasts, including both parametric [e.g., nonhomogeneous regression (Scheuerer 2014; Scheuerer and Hamill 2015); Bayesian model averaging (Sloughter et al. 2007); logistic regression (Hamill et al. 2004)] and nonparametric methods [e.g., rank-histogram-based calibration (Hamill 2001), best-member dressing (e.g., Fortin et al. 2006); quantile regression (Bremnes 2004)].
Analog ensembles (AnEns) (Hamill et al. 2015; Hamill and Whitaker 2006) are one of the nonparametric methods that have been applied to ensemble precipitation postprocessing. For each current forecast lead time and location, the AnEn method identifies similar historical date/times in a reforecast dataset and forms an ensemble composed of the observed or analyzed precipitation amounts at those date/times.
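The analog-search-and-collect structure described above can be sketched in a few lines. The following Python sketch is illustrative only: the function names and the simple absolute-difference similarity metric are assumptions (the metric used in this paper weights several predictors; see section 4), but the structure, ranking reforecast dates by similarity and collecting the analyzed values at the best-analog dates, is the AnEn technique itself.

```python
# Minimal analog-ensemble (AnEn) sketch for one grid point and one
# forecast lead time. The absolute-difference similarity metric and all
# names are illustrative assumptions; the method in this paper combines
# precipitation and precipitable-water predictors in the search.

def analog_ensemble(fcst, reforecast, analyzed, k=5):
    """Return the k analyzed values whose reforecasts best match fcst.

    fcst       : current forecast value (e.g., ensemble-mean precipitation)
    reforecast : dict mapping historical date -> reforecast value
    analyzed   : dict mapping historical date -> analyzed (e.g., ERA5) value
    """
    # Rank historical dates by how closely their reforecast matches the
    # current forecast (smaller absolute difference = better analog).
    ranked = sorted(reforecast, key=lambda d: abs(reforecast[d] - fcst))
    # The AnEn members are the *analyzed* values at the best-analog dates.
    return [analyzed[d] for d in ranked[:k]]
```

Note that the returned ensemble consists of analyzed values, not forecasts; the reforecast archive is used only to find the analog dates.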
The AnEn method leverages a large reforecast archive without requiring a priori distributions and can calibrate the forecasted state into realizations with a flexible ensemble size (Hamill and Whitaker 2006). Compared to more complicated machine learning approaches, AnEns are also easier to implement and can be maintained conveniently with updates of postprocessing routines. These strengths make the AnEn method a good option for postprocessing precipitation forecasts for hydrological applications (e.g., Yang et al. 2020). Although successful in general, AnEns have two notable limitations: 1) they do not account for spatiotemporal consistencies of their target variable (e.g., Sperati et al. 2017), and 2) their performance is impacted by random variations in their reforecast input, which introduce small-scale noise into their output (Hamill et al. 2015, their appendix A). These deficiencies imply that AnEn members can be further adjusted by other methods to improve their quality.
Limitation 1 is not unique to the AnEn method; univariate postprocessing methods typically cannot produce physically realistic outputs when they are applied to locations and forecast lead times independently. Existing studies have proposed solutions to this common issue with copula-based ensemble member shuffling (e.g., Clark et al. 2004; Schefzik et al. 2013; Schefzik 2016). The minimum divergence Schaake shuffle (MDSS; Scheuerer et al. 2017) is one such method that has been applied successfully to precipitation ensembles. The MDSS models the multivariate dependence structure of the forecasted variables by selecting training samples based on the distribution divergence between the postprocessed forecasts and historical analyses.
Limitation 2 connects to the nonparametric nature of the AnEn method and impacts its performance in complex terrain, where orographic precipitation patterns may exhibit larger random variations and errors that can mislead the analog date search.
Convolutional neural networks (CNNs) are deep learning models built on hierarchically assigned spatial operations. They are well suited for gridded, pattern-based learning problems (Aloysius and Geetha 2017; Gu et al. 2018; Goodfellow et al. 2016). CNNs are capable of recovering spatial information from noisy inputs and typically perform better than spatially agnostic statistical models (Tian et al. 2020; Shahdoosti and Rahemi 2019; Cruz et al. 2018). Further, CNNs have been applied successfully to meteorological problems. In particular, encoder–decoder CNNs, such as UNET, are widely used to postprocess gridded forecasts. Chapman et al. (2019) adapted a UNET model to improve atmospheric river forecasts. Grönquist et al. (2021) applied UNET variants to the postprocessing of 500- and 850-hPa prognostic variables; Sha et al. (2020a,b) performed UNET-based gridded downscaling of 2-m temperature and precipitation with transfer learning.
In this research, a CNN with a UNET architecture is applied to adjust the output of the AnEn algorithm and Schaake shuffle, addressing their limitations with the goal of producing physically realistic spatiotemporal precipitation sequences that are more skillful in complex terrain and better calibrated to precipitation extremes.
The proposed AnEn–CNN hybrid scheme is tested primarily in British Columbia (BC), Canada, using Global Ensemble Forecast System (GEFS) precipitation forecasts out to a 7-day lead time. The following research questions are addressed: 1) What is the skill of the AnEn–CNN hybrid relative to a conventional AnEn method? 2) Can the AnEn–CNN hybrid postprocess heavy precipitation events in different hydrologic regions? 3) Does the AnEn–CNN hybrid scheme have practical significance in complex terrain areas such as BC? By answering these questions, the authors aim to develop more skillful precipitation forecasts that support hydrological applications in complex terrain and, more broadly, to introduce CNNs to ensemble forecast postprocessing, inspiring future work.
2. Region of interest
The region of interest of this research is defined as 48.25°–60°N, 141°–113.25°W, including British Columbia (BC), Canada, and southeast Alaska (Fig. 1a; shaded area). This research focuses on three hydrologic regions within this area: South Coast, Southern Interior, and Northeast (Fig. 1b). These regions are monitored with station networks and represent different geographical–climatological conditions, and thus provide an opportunity to verify precipitation postprocessing in regions with disparate precipitation characteristics.
The South Coast of BC is in a maritime climate. Precipitation has a strong seasonal pattern, with dry periods in May–September and persistent rains in October–January (Schnorbus et al. 2014); most of it is in liquid form and related to Pacific frontal systems and coastal orography (Houze 2012; Roe 2005).
The Southern Interior has a continental humid climate. Precipitation in this area has seasonal variations with a winter maximum and summer minimum (Schnorbus et al. 2014). In the drier spring–summer, synoptic-scale moisture transport can be locally modified by orography-related dynamics (e.g., Bruintjes et al. 1994; Cox et al. 2005) and microphysical processes (e.g., Bergeron 1965), yielding convective precipitation.
The Northeast generally features a continental subarctic climate. Precipitation in this area follows a seasonal pattern of summer maximum and spring minimum (Schnorbus et al. 2014). Localized convective events and a lack of data assimilation sources make precipitation forecasts in Northeast BC the poorest among the regions. Few studies have discussed ensemble postprocessing here.
Precipitation forecast postprocessing in BC is also important for the public good. The main electric utility in BC, BC Hydro, generates most of its electricity from hydropower, mostly within the watersheds of the Peace (Northeast BC) and Upper Columbia (Southern Interior) river basins (BC Hydro 2020; Sha et al. 2021). Skillful and localized precipitation forecasts are key inputs for the planning and management of these hydroelectric facilities.
3. Data
a. Forecast data
This research postprocesses GEFS total precipitation and uses GEFS column integrated precipitable water as an additional predictor. The GEFS is an operational weather forecast model maintained by the National Centers for Environmental Prediction (NCEP) (Zhou et al. 2017). This research uses 0.25° GEFS gridded forecasts.
Reforecast data are a valuable resource for developing postprocessing methods (e.g., Hamill et al. 2006; Hamill 2018). In this research, the twelfth-generation GEFS reforecast (National Centers for Environmental Prediction 2021) is used in the training and testing of the AnEn–CNN hybrid scheme and other baseline methods.
Both the GEFS forecast and reforecast data described here are initialized daily at 0000 UTC and contain 3-hourly precipitation forecasts from +9 to +168 h for the historical period of 2000–19. The GEFS reforecast generates five ensemble members that are statistically consistent with the operational GEFS members (Guan et al. 2019), and thus can approximate the forecasted state of their operational counterparts.
b. Gridded reanalysis and elevation
The European Centre for Medium-Range Weather Forecasts Reanalysis version 5 (ERA5) provides hourly, 0.25°, global analyses of meteorological variables, including precipitation.
In this research, the ERA5 total precipitation is applied with several purposes: 1) the training target of the AnEn algorithm—once analogs are identified within the GEFS reforecast, an ensemble is formed from the ERA5 precipitation at those analog date/times; 2) the training target of Schaake shuffle—the copula relationships are estimated based on the rank structure of the ERA5; 3) the training target of the CNN model; and 4) estimating the cumulative distribution functions (CDFs) of monthly precipitation climatologies.1
Many ensemble postprocessing studies apply gridded precipitation analyses as training targets (e.g., Hamill and Whitaker 2006; Hamill et al. 2015; Scheuerer and Hamill 2015; Grönquist et al. 2021). The value of reanalyses in forecast postprocessing has been addressed by comparison studies (e.g., Marzban et al. 2006; Sperati et al. 2017) and reviews (e.g., Haupt et al. 2021). Some further justifications are provided here for the use of ERA5 as postprocessing training and validation targets:
- The ERA5 precipitation has good quality. Several studies [Hersbach et al. 2020 for global averages; Crossett et al. 2020; Xu et al. 2019; Tarek et al. 2020 for regional studies; Odon et al. 2018 for BC (based on ERA-Interim, the predecessor of the ERA5)] show that the ERA5 is capable of representing observed precipitation.
- The authors have statistically compared the ERA5 precipitation and station observations in BC (see supplemental document). Results confirm that ERA5 precipitation is adequate for training postprocessing methods and is more usable than station observations because of its spatiotemporal consistency.
- The ERA5 precipitation can train the AnEn, Schaake shuffle, and CNNs over the entire domain. This is important when new facilities are built and need to receive postprocessed forecasts. In this scenario, no historical observations would be immediately available to support the training of the AnEn and Schaake shuffle algorithms.
That said, there are also implications of ERA5-based training. As a gridded product, an ERA5 value represents an area average over its 0.25° grid cell; it may underestimate the observational variability of the real world (e.g., Allen et al. 2021; Feldmann et al. 2019). For point-based postprocessing, collecting station observations as training targets is beneficial. For the gridded postprocessing models of this research, however, the ERA5 is the best practical choice of training target.
For data preprocessing, ERA5 precipitation from 2000 to 2019 is aggregated to 3-h periods and paired with the preprocessed GEFS reforecast.
This research obtains gridded elevation from the ETOPO1, a 1-arc-min resolution global relief model maintained by the National Geophysical Data Center (NGDC) (Amante and Eakins 2009), regridded to 0.25° through bilinear interpolation.
c. Station observations
The postprocessed GEFS precipitation forecasts are verified against station observations of 84 gauge stations in BC. The station network is maintained by BC Hydro and covers the three hydrologic regions (Fig. 1b; 26 stations in the South Coast; 34 stations in the Southern Interior; 24 stations in the Northeast). These stations are located close to the key hydroelectric watersheds and are an important focus of the forecast verification.
BC Hydro stations use standpipe- and weighing-bucket-type precipitation gauges; they provide real-time bucket height values with instrumental sampling resolutions ranging from 15 min to 1 h, and gauge height precisions of either 0.1 or 1.0 mm. Manual quality control procedures were conducted by BC Hydro (Sha et al. 2021; BC Hydro 2021, personal communication).
BC Hydro station observations are the verification target (but not training and validation targets) of this research, because 1) they represent the best data available for precipitation “ground truth” in BC watersheds, the focus of forecast verification and 2) they are independent of the ERA5—postprocessing methods are trained by the ERA5; verifying them on the same data introduces risks of confirmation bias.
For data preprocessing, BC Hydro station observations were aggregated to 3-h periods and cleaned with a range check.
4. Methods
a. The AnEn–CNN hybrid
Three postprocessing methods are incorporated into the AnEn–CNN hybrid (Fig. 2). First, the AnEn algorithm converts the GEFS forecast and reforecast ensemble means into calibrated, bias-corrected, but not physically realistic AnEn members. Second, a Schaake-shuffle-based method reconstructs AnEn members into sequences with more physically realistic spatiotemporal dependencies. Finally, a CNN model (our contribution) is applied, reducing the small-scale spatial noise at each forecast lead time by taking gridded elevation and precipitation climatology as additional predictors. After training, the same CNN is applied to all forecast lead times and locations, but does not change the locations of precipitation centers. Thus, as a multivariate postprocessing model, the CNN can refine forecasts without negatively impacting the spatiotemporal structures modeled by the Schaake shuffle.
The above three postprocessing methods are trained and validated in succession with the ERA5 precipitation. For the AnEn and Schaake shuffle algorithms, their training and validation periods are 2000–14 and 2015–16, respectively. Both methods need a long and consistent time period to obtain training information, because they are implemented based on initialization times, in which one initialization is converted to one training sample. The training period of the CNN is 2015–16; its validation data are split from the training set randomly. The verification period of the final postprocessing outputs is 2017–19.
1) AnEn with augmented SLs
The AnEn–CNN hybrid scheme begins with a two-step AnEn algorithm. A conventionally used benchmark, described in Hamill et al. (2015, hereafter H15), is adopted by this research and introduced herein.
First, the training data of the AnEn algorithm are augmented with "supplemental locations" (SLs). SLs are searched within a large spatial extent (Fig. 1a, the map extent). For each postprocessed grid point within BC (Fig. 1a, shaded area), its SLs are determined based on the similarity of 1) analyzed monthly precipitation climatology, 2) elevation, 3) facet (i.e., the direction a slope faces), and 4) distance, where similarity in 1 is measured by the Kolmogorov–Smirnov distance of monthly CDFs. The SL search minimizes a linear combination of 1–4, subject to the constraint that each grid point and its SLs do not neighbor each other (H15). Nineteen SLs (i.e., the same number as in H15) were identified for each postprocessed grid point in the BC domain based on the ERA5 monthly precipitation and the ETOPO1 elevation; three example grid points and their SLs in January and July are illustrated in Figs. 1c and 1d, respectively.
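The SL selection criterion above can be sketched as a scored linear combination. The helper names, the equal default weights, and the scalar elevation/facet terms below are illustrative assumptions, not the operational choices of H15; only the two-sample Kolmogorov–Smirnov distance and the weighted-sum structure follow the description in the text.

```python
# Hedged sketch of the supplemental-location (SL) score: a linear
# combination of the four dissimilarity terms named in the text.
# Weights w and the dict attribute names are illustrative assumptions.

def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    pooled = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in pooled)

def sl_score(target, candidate, distance, w=(1.0, 1.0, 1.0, 1.0)):
    """Lower score = better SL candidate for the target grid point.

    target, candidate : dicts with "precip" (monthly sample), "elev", "facet"
    distance          : great-circle distance between the two grid points
    """
    return (w[0] * ks_distance(target["precip"], candidate["precip"])
            + w[1] * abs(target["elev"] - candidate["elev"])
            + w[2] * abs(target["facet"] - candidate["facet"])
            + w[3] * distance)
```

In an actual search, this score would be evaluated for every non-neighboring candidate grid point and the 19 lowest-scoring candidates kept.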
The ensemble mean of APCP and PWAT are used in the analog search. Using the ensemble mean rather than single-deterministic members can improve the performance of AnEn methods, because it reduces the random variation of reforecast inputs that negatively impact the analog search (Hamill et al. 2006; Hamill and Whitaker 2006).2
Linear coefficients of APCP and PWAT in Eq. (1) are optimized based on the validation set performance of continuous ranked probability score (CRPS) for all forecast lead times (validated by the ERA5, not shown). This hyperparameter search was conducted with steps of 0.02, and initial guesses of 0.70 and 0.30 for APCP and PWAT, respectively. Incorporating PWAT also solves ties in similarity measures because it is likely nonzero even if APCP is forecasted as 0 mm (Hamill et al. 2015).
Next, the analog search is performed on grid points and forecast lead times independently, with a ±30-day window around the date of the reforecasts (i.e., t ∈ [tc − 30, tc + 30]). Similar to H15, the reuse of SLs is constrained. For each (x, y), each of its SL(x, y) can be used once per time window. This constraint applies on each (x, y) individually. Different (x, y) may share the same SL; this reuse is not constrained.
Finally, once the analog search is completed, the ERA5 precipitation is used to form 25 AnEn members. The ensemble size was chosen to balance calibration performance and computational load (cf. Eckel and Delle Monache 2016).
2) Minimum divergence Schaake shuffle (MDSS)
AnEn methods calibrate marginal distributions of precipitation independently for each location and forecast lead time. However, they are not regularized by spatiotemporal dependencies of the target variable, and thus, cannot produce physically realistic calibrated outputs (e.g., Sperati et al. 2017; Scheuerer et al. 2017).
The Schaake shuffle (Clark et al. 2004) and its variants (Schefzik et al. 2013; Schefzik 2016; Scheuerer et al. 2017) are nonparametric methods that can restore spatiotemporal consistency to calibrated AnEn members. In this application, given M AnEn members, the Schaake shuffle obtains M physically realistic “dependence templates” from its training data (the ERA5), and re-indexes the M AnEn members based on the rank structure of the dependence templates.
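For a single location and lead time, the re-indexing step described above amounts to sorting the calibrated members and placing them according to the rank order of the template. The sketch below shows that core operation (Clark et al. 2004); the function name is illustrative, and the MDSS template-selection step is omitted.

```python
# Minimal Schaake-shuffle reordering for one location/lead time: the
# calibrated members are sorted, then assigned positions so that the
# output's rank structure matches the historical dependence template.

def schaake_shuffle(calibrated, template):
    """Reorder `calibrated` so its ranks match those of `template`."""
    assert len(calibrated) == len(template)
    sorted_cal = sorted(calibrated)
    # Indices of the template, ordered from its smallest to largest value.
    order = sorted(range(len(template)), key=lambda i: template[i])
    shuffled = [0.0] * len(calibrated)
    for rank, i in enumerate(order):
        # The rank-th smallest calibrated value goes where the template
        # had its rank-th smallest value.
        shuffled[i] = sorted_cal[rank]
    return shuffled
```

Applied jointly across all locations and lead times with the same set of template dates, this reordering transfers the template's spatiotemporal rank correlations to the calibrated ensemble while leaving each marginal distribution unchanged.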
In this research, a state-of-the-art Schaake shuffle variant, the minimum divergence Schaake shuffle (MDSS; Scheuerer et al. 2017) is applied, and converts 25 AnEn members into 25 sequences. MDSS selects its dependence templates from historical analyzed conditions and by minimizing the total divergence (the sum of distribution divergence over all locations and forecast lead times) between templates and AnEn members.
The implementation of the MDSS is similar to Scheuerer et al. (2017), but with coarser CDF quantiles of {0.25, 0.5, 0.7, 0.9, 0.95}. Dependence templates are provided by the 2000–14 ERA5 precipitation. “Template candidates” are selected within a 61-calendar-day window centered on the initialization time of the new forecast. These candidates are discarded heuristically based on the total divergence loss, until 25 candidates remain.
3) CNN-based AnEn adjustments
AnEn methods are a type of k-nearest-neighbors (k-NN) algorithm (e.g., Yang 2019) and inherit its limitations; notably, k-NN can overfit to the random variations of its inputs, degrading testing set performance (Kramer 2013). When applied to precipitation forecasts, AnEn algorithms are especially impacted by this limitation because their reforecast inputs typically contain noise caused by, for example, complex terrain, convective precipitation, and errant forecasts that become increasingly common at longer lead times.3 Aside from the use of the ensemble mean and a large k [both are helpful according to Hamill and Whitaker (2006)], prior research has not tackled this overfitting problem.
The existence of small-scale noise within AnEn members was recognized by H15 who employed a Savitzky–Golay smoothing filter to produce visually interpretable results. Inspired by the use of low-pass convolution filters, this research adopts a CNN as an improved solution; it learns to adjust the output of the AnEn by extracting meteorologically meaningful features and reducing the small-scale noise of AnEn members.
The base architecture of the proposed CNN is UNET 3+ (Huang et al. 2020). This choice is determined from the benchmarking of different UNET-like architectures, including the UNET (Ronneberger et al. 2015), Attention-UNET (Oktay et al. 2018), UNET++ (Zhou et al. 2018), and UNET 3+ (Huang et al. 2020) based on their validation set performance. UNET 3+ is an encoder-decoder CNN with full-scale skip connections and deep supervision, loosely defined under the concept of “fully convolutional networks” (e.g., Ronneberger et al. 2015; Zhou et al. 2018). An encoder-decoder architecture is applied here because it handles de-noising problems well; its encoder compresses noisy inputs into learnable representations, and its decoder reconstructs full-resolution targets based on the encoded representations.
Hyperparameters of the base architecture were investigated through a grid search and determined by validation loss (validated by the ERA5, not shown). The resulting architecture contains four encoding levels; each consists of two convolutional layers. Decoding blocks are formed with full-scale skip connections that extract information from different encoding/decoding levels (Fig. 3a).
In the inference stage, the proposed CNN takes three inputs: 1) postprocessed gridded forecasts at each forecast lead time, 2) the ERA5 monthly precipitation climatology, and 3) elevation. It produces a normalized gridded precipitation forecast as output (Fig. 3b). Inputs 1 and 2 are normalized by logarithm transformations [y = log(x + 1)], and input 3 is normalized by minimum–maximum scaling. The CNN output is further processed by nonnegative correction and de-normalization before use.
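The normalizations named above are simple enough to state exactly; the sketch below implements the logarithm transformation y = log(x + 1), its inverse with the nonnegative correction, and minimum–maximum scaling. Function names are illustrative.

```python
import math

# The input/output normalizations described in the text.

def norm_precip(x):
    """Inputs 1 and 2: logarithm transformation y = log(x + 1)."""
    return math.log(x + 1.0)

def denorm_precip(y):
    """Inverse transform with nonnegative correction for the CNN output."""
    return max(math.exp(y) - 1.0, 0.0)

def norm_minmax(x, lo, hi):
    """Input 3 (elevation): minimum-maximum scaling to [0, 1]."""
    return (x - lo) / (hi - lo)
```

The log transform compresses the heavy right tail of precipitation amounts, which keeps extreme values from dominating the MAE loss during training.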
In the training and validation stage, however, there are several differences from the inference stage:
- ERA5 precipitation at the forecasted time is the training target. AnEn members with forecast lead times of +9 to +36 h are linearly combined with the ERA5 target and applied as the training input (Fig. 3b). It is assumed that AnEn members at short forecast lead times loosely represent the precipitation intensity spectrum of the ERA5 and thus can be mixed into the ERA5 as a source of precipitation noise. Using a linear combination of AnEn members and the ERA5 target as input guides the CNN to preserve precipitation centers while de-noising: the CNN is penalized if its input precipitation centers, which already contain the ERA5 precipitation, are significantly relocated. The weights of this linear combination (k in Fig. 3b) are random draws from the uniform distribution on [0.7, 0.9]. This randomness regularizes the CNN to produce more robust results under different noise levels.
- The input AnEn members are not shuffled by the MDSS (Fig. 3b). Precipitation patterns represented by those sequences are different from the ERA5 targets even at short forecast lead times. Taking shuffled sequences as inputs could mislead CNNs to relocate precipitation centers (not a desirable trait).
- Training inputs are subsetted from the full domain into 32 × 32 and 48 × 48 patches. After subsetting, AnEn members that contain enough nonzero and extreme values are chosen as training inputs, whereas drier regions are discarded. Similar to difference 2 above, this choice also guides the CNN to process localized precipitation centers without relocating them.
The CNN training and validation period is 2015–16; the validation set is split from it randomly. Note that the ERA5 is deterministic, whereas its paired 25 AnEn members form an ensemble, so the training procedure includes an implicit 25-fold data augmentation that ensures a sufficiently large training set. The training is fully supervised with a mean absolute error (MAE) loss and deep supervision (Wang et al. 2015). Adaptive moment estimation (Kingma and Ba 2017) and stochastic gradient descent (Loshchilov and Hutter 2017) are used for optimizing model weights.
b. Postprocessing experiments and baseline methods
The first control method of this research combines the AnEn [with SLs; H15; see section 4a(1)] and MDSS algorithms, but without CNN-based adjustments (Fig. 2). Hereafter, it is named "SL-H15." The Savitzky–Golay filter smoothing of H15 is not implemented, because that step was proposed to smooth calibrated probability maps, not precipitation sequences. We found that when low-pass filters are applied to smooth actual precipitation fields, they reduce heavy precipitation values and produce more drizzle, making the output even less skillful.
The other control of this research is “noSL-H15,” namely, similar to SL-H15 but without SL-based data augmentation. This control is proposed to evaluate the actual benefits of SLs in BC—no existing research has applied SLs in this area.
The two H15 controls above will be contrasted with “SL-CNN” and “noSL-CNN,” respectively, and the resulting skill score differences measure the benefits of CNN-based adjustments.
All methods above rely on MDSS to model spatiotemporal dependencies. For the AnEn–CNN hybrid, the CNN component is applied after MDSS and does not impact the selection of dependence templates (Fig. 2).
In addition to the two H15 controls (SL-H15, noSL-H15) and the AnEn–CNN hybrids (SL-CNN, noSL-CNN), a quantile-mapping-based postprocessing baseline method is applied using forecasted and analyzed monthly CDFs derived from the 2000 to 2014 GEFS reforecast and ERA5, respectively (similar to Hamill et al. 2017 but with climatology-based monthly CDFs). This method quantile maps the five GEFS reforecast members with 3 × 3 stencil grid points to produce a total of 45 calibrated members. They are more skillful than the uncalibrated reforecast, but are not competitively skillful because correlations between the forecasted and analyzed precipitation are relatively weak, especially in terms of their extreme values [more discussion, see Hamill and Whitaker (2006)]. As a more conventional, statistical postprocessing method, the quantile-mapped GEFS is used as the baseline for individual lead time performance (cf. Hamill and Whitaker 2006) (Fig. 2).
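The core of the quantile-mapping baseline above is a CDF-to-CDF transfer: a forecast value is mapped to its non-exceedance probability under the forecast climatology, then to the analyzed value at the same probability. The sketch below shows that transfer with empirical CDFs; the monthly-CDF construction, the 3 × 3 stencil, and the step-function inverse (no interpolation) are simplifying assumptions.

```python
# Minimal quantile-mapping sketch. fcst_sample and anal_sample are
# climatological training samples (e.g., one month of reforecast and
# ERA5 values); interpolation between quantiles is omitted for brevity.

def quantile_map(value, fcst_sample, anal_sample):
    fcst_sorted = sorted(fcst_sample)
    anal_sorted = sorted(anal_sample)
    # Empirical non-exceedance probability of `value` in the forecast
    # climatology.
    p = sum(v <= value for v in fcst_sorted) / len(fcst_sorted)
    # Inverse empirical CDF of the analyzed climatology at probability p.
    idx = min(int(p * len(anal_sorted)), len(anal_sorted) - 1)
    return anal_sorted[idx]
```

Mapping each of the five reforecast members at the 9 stencil grid points through this transfer yields the 45 calibrated members described in the text.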
c. Verification methods
This research verifies results against BC Hydro observations from 2017 to 2019. The gridded values of postprocessed precipitation ensembles are compared to station observations in their corresponding grid cells; that is, nearest-neighbor interpolation is applied to estimate station forecasts. The two verification skill scores involved are the continuous ranked probability skill score (CRPSS; Grimit et al. 2006) and the Brier skill score (BSS; Murphy 1973); they are derived from strictly proper scoring rules, the CRPS and the Brier score (BS), respectively. Climatology values used to calculate skill scores are taken from the 2000–14 ERA5 monthly precipitation climatology at station-location grid points.
CRPSs and BSs are computed for individual initialization days, forecast lead times, and station grid points. The resulting three-dimensional arrays are averaged temporally, and then averaged station-wise. Finally, climatology-based reference strategies are applied to produce CRPSSs and BSSs. For BSSs, the above steps are explained in Hamill and Juras (2006). Three-component decomposition of BSs and reliability diagrams are also computed to attribute the BSS difference; their computation steps follow Murphy (1973) and Hsu and Murphy (1986).
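For one forecast case, the CRPS of an ensemble against a scalar observation can be computed with the standard kernel (energy) form, CRPS = E|X − y| − 0.5 E|X − X′|, and the skill score follows as one minus the ratio to a reference score. The sketch below is this textbook estimator, not necessarily the exact implementation used in the paper.

```python
# CRPS of an ensemble forecast against a scalar observation, using the
# kernel form CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|.

def crps_ensemble(members, obs):
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    term2 = sum(abs(x - y) for x in members for y in members) / (2 * m * m)
    return term1 - term2

def crpss(crps_fcst, crps_ref):
    """Skill score relative to a reference (e.g., climatology);
    positive values mean the forecast beats the reference."""
    return 1.0 - crps_fcst / crps_ref
```

With a single member, the CRPS reduces to the absolute error, which is why it is often described as a probabilistic generalization of the MAE.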
This research does not cross-validate results, but rather splits data into training, validation, and verification periods. This is mainly because BC Hydro observations have limited temporal availability, and are not a temporally consistent verification target. Bootstrap aggregation is applied for all 3-hourly skill score results to minimize the impact of observation uncertainties. Two-sided Wilcoxon signed-rank tests are applied to determine if skill scores are statistically significantly different.
5. Results
a. An example case
A case-based assessment is presented to demonstrate the output of the different postprocessing methods. The forecast is initialized on 1 February 2019 with a +15-h horizon. Based on the ERA5 precipitation at the forecast valid time, two primary precipitation regions are found: one along the South and Central Coast, and the other over the Interior mountains (Fig. 4b).
The AnEn algorithm is applied first; its members loosely capture the location and intensity of precipitation centers, but the spatial distribution of precipitation intensities is physically unrealistic and contains small-scale noise (Fig. 4a). MDSS is then applied to reconstruct AnEn members into more realistic spatiotemporal sequences. This realistic precipitation pattern is evident in Fig. 4c (the SL-H15 control).
The AnEn and MDSS algorithms perform as expected, but there is still too much small-scale spatial noise despite being reshuffled by the MDSS. The 8.5 mm day−1 contour line in Fig. 4c illustrates one impact of this problem—boundaries of different precipitation intensities are not estimated properly. Further, this is not a visual problem only—in this example case, it also introduces a broad range of wet and dry precipitation biases among stations in the South Coast (Fig. 4e). Thus, there is potential for even better results if the remaining small-scale noise is reduced.
CNN-based adjustments (Fig. 4d; SL-CNN) are applied to the example sequence, with additional inputs of monthly precipitation climatology and elevation. Comparing SL-CNN to SL-H15, three performance highlights are evident:
- The two precipitation centers modeled by the MDSS are preserved (cf. color shades in Figs. 4c,d). The CNN also preserves the domain-wise precipitation intensity spectrum (cf. histograms in Figs. 4c,d).
- CNN-based adjustments refine the boundaries of different precipitation intensities. For example, light precipitation in central interior BC (which the ERA5 correctly analyzes as a rain-shadowed region) is reduced (cf. contour lines in Figs. 4c,d). Precipitation patterns around the Coast Mountains are extended eastward; the isolated peak values on the central BC coast are slightly shifted toward the South Coast (cf. color shades in Figs. 4b–d). These changes better align the forecasted precipitation with the precipitation climatology and orography (cf. color shades in Figs. 1c and 4c), which the CNN uses as inputs, and importantly, with the ERA5 target (Fig. 4b).
- CNN-based adjustments improve the station-observation-based deterministic comparisons. For South Coast stations, the range of precipitation bias is narrowed, and some highly underestimated station values are dramatically improved (Fig. 4e). For Southern Interior stations, the median precipitation bias is reduced to zero, which also improves the mean absolute error (MAE; Fig. 4f).
b. CRPSS performance
CRPSS is averaged over all stations and shown for 3-hourly individual forecast lead times. Two sets of results were produced for cool (October–March) and warm (April–September) seasons.
Cool season CRPSSs (Figs. 5a,b) linearly decrease through the forecast period. Warm season CRPSSs are similar in magnitude, but decrease less over the period. Also, they are impacted by a large diurnal cycle: higher skill from 0900 to 1200 UTC (0100–0400 PST; predawn hours), and lower skill from 0000 to 0300 UTC (1600–1900 PST; late afternoon) (Figs. 5c,d). This diurnal cycle is in part explained by diurnal (radiative) heating and resulting orographic convection (Colle et al. 2013). Thermally driven orographic convective precipitation is harder to forecast and is typically triggered on summer afternoons, and thus, introduces periodic signals into the CRPSS curves.
All of the AnEn-based postprocessing methods perform better than the quantile-mapped GEFS baseline (gray solid lines, Figs. 5a–d), indicating that AnEn methods produce more accurate and better-calibrated probabilistic forecasts. Also, all methods have positive CRPSSs, indicating that they are more skillful than the climatology reference.
The performance gains resulting from adding a CNN are measured by comparing SL-CNN and noSL-CNN with SL-H15 and noSL-H15 (Figs. 5e–h). Despite the impact of the diurnal cycle, CNN-based adjustments roughly account for CRPSS gains of 0.03. This performance gain is statistically significant, and does not diminish with increasing forecast lead time. This translates to ∼6% improvement at the earliest lead times, and ∼11% at the longest lead times (cf. Figs. 5c,d,g,h).
The effectiveness of SL-based data augmentation is measured by comparing SL-H15 and SL-CNN with noSL-H15 and noSL-CNN, respectively (Figs. 5i–l). SL-based data augmentation leads to a CRPSS increase at most lead times, but primarily within the first 3–4 forecast days. When SL-CNN is contrasted with noSL-CNN, the CRPSS increase is smaller but more persistent as forecast lead times increase. To explain this finding, the authors hypothesize that the SL-based data augmentation and CNN-based adjustments may contribute overlapping improvements to the AnEn forecasts. SLs are identified based on terrain roughness and precipitation climatology, which are also applied as CNN inputs. Investigating process-based explanations of this overlap and incorporating SLs into CNN training would be a worthwhile future research topic.
c. Heavy precipitation performance by lead time and hydrologic region
In this section, BSS and reliability diagrams are calculated based on a 3-hourly 90th-percentile precipitation event threshold derived from the ERA5 monthly climatology, calculated for each station and each centered 3-month calendar period using all days. This threshold represents heavy precipitation events and forecasts. Percentile-based, rather than value-based, thresholds are preferred because of the dramatic differences in climatological precipitation across the complex terrain of BC. Using fixed threshold values may undesirably down-weight or exclude drier stations and time periods. We present results by hydrologic region.
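The threshold derivation can be sketched as follows (a hypothetical sketch under the stated assumptions; the function name and window handling are ours, not the authors' code). For each station, the 90th percentile is taken over all days in a centered 3-month calendar window:

```python
import numpy as np

def monthly_p90_thresholds(precip, months, percentile=90.0):
    """Per-calendar-month event thresholds for one station.

    precip : 1D array of 3-hourly precipitation values
    months : 1D array of calendar months (1-12), same length as `precip`

    For each month m, pools all values from the centered 3-month window
    (m-1, m, m+1, wrapping around the year) and takes the percentile.
    """
    precip = np.asarray(precip, dtype=float)
    months = np.asarray(months)
    thresholds = np.empty(12)
    for m in range(1, 13):
        window = [(m - 2) % 12 + 1, m, m % 12 + 1]  # m-1, m, m+1 with wraparound
        mask = np.isin(months, window)
        thresholds[m - 1] = np.percentile(precip[mask], percentile)
    return thresholds
```

Note that pooling "all days" (including dry ones) makes the 90th percentile a modest wet-amount threshold at dry stations, which is the behavior the percentile-based choice is intended to provide.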
1) South Coast
The monthly 90th-percentile thresholds of the South Coast stations vary from 20 to 40 mm day⁻¹ in winter and from 5 to 15 mm day⁻¹ in summer (Fig. 6n).
All postprocessing methods show higher BSSs in winter and lower in summer. The seasonal difference is slightly larger for shorter forecast lead times (Figs. 6a–e). This is likely because of the synoptic-scale systems (e.g., Pacific frontal-cyclone systems) in winter. Synoptic-scale precipitation at short forecast lead times has relatively high predictability in the GEFS (Scheuerer and Hamill 2015), and thus, is easier to postprocess than summertime convective heavy precipitation events.
All of the AnEn-based methods outperform the quantile-mapped GEFS baseline. The difference is around 0.05–0.1 in winter–spring and slightly lower in summer (some are statistically insignificant, but mostly still positive) (Figs. 6f–i). Also, AnEn-based methods show mostly positive BSSs at all forecast lead times, indicating more skill over the climatology reference through day 6.
The AnEn–CNN hybrid performance is measured by contrasting SL-CNN and noSL-CNN with SL-H15 and noSL-H15 (Figs. 6j,k). The difference is mostly positive and statistically significant; it ranges from 0 to 0.03 in winter and from 0 to 0.05 in summer. The amount of BSS increase for forecast hours 9–24 has relatively large oscillations, slightly higher in spring–summer, and lower in fall–winter. For forecast days 3–5, the improvement increment is stable around 0.03 in winter and slightly lower in summer. Overall, the AnEn–CNN hybrid method is more skillful than the two H15 controls, bringing a roughly 20% relative BSS increase (∼0.03 BSS increase relative to BSSs of ∼0.15).
Comparing BSSs for SL-H15 with noSL-H15, SL-based data augmentation shows improvements at short forecast lead times and in summer months. For long forecast lead times and winter months, noSL-H15 slightly outperforms SL-H15, indicating that supplemental locations may make some forecasts worse at the South Coast (Fig. 6l). Comparing SL-CNN and noSL-CNN, there are smaller but more consistent improvements using SLs (Fig. 6m). This finding is somewhat similar to the CRPSS verification results (Figs. 5j,l), and implies some redundancy or overlap. That is, the CNN may have corrected some error characteristics that the SL-based data augmentation would otherwise have corrected.
Reliability diagrams in Fig. 7 provide further details regarding heavy precipitation performance at the South Coast. The quantile-mapped GEFS baseline exhibits high resolution, but is not reliable; its calibration curve stays close to the "no skill" reference line. It has high resolution because it frequently issues high probabilities for climatologically rare events. However, it has poor reliability because its overconfident probabilities are often wrong. That is, the conditional probability of observed heavy precipitation events does not increase with the probability of the forecasted events. The H15 controls and AnEn–CNN hybrids are much more skillful than the quantile-mapped GEFS, exhibiting much better reliability while maintaining similar resolution.
The AnEn–CNN hybrids exhibit higher resolution than the two H15 controls, which explains their superior BS and BSS performance. For day-1 forecasts, all AnEn-based methods show good, comparable reliability, but at longer forecast lead times, the AnEn–CNN hybrids are more reliable than the H15 controls.
The BSS improvements from SL-based data augmentation are not large for the South Coast. SL-H15 has a better BS than noSL-H15 for day-1 and day-3 forecasts, but is slightly worse than noSL-H15 for day-5 forecasts. Based on the BS decomposition, SL-H15 is less reliable than noSL-H15 for longer lead times; it performs better than noSL-H15 at short lead times because it has higher resolution. When the CNN is applied, the reliability deficit of SL-H15 is in part solved, and its resolution performance is further improved. As a result, SL-CNN is the best method for calibrating 3-hourly heavy precipitation events at the South Coast, whereas noSL-CNN is second best, outperforming the two H15 controls. The reliability of noSL-CNN is comparable to that of SL-CNN, but its resolution is slightly lower.
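The reliability and resolution attribution above uses the standard Brier score decomposition of Murphy (1973), BS = reliability - resolution + uncertainty. A minimal binned sketch (illustrative only; not the authors' verification code) is:

```python
import numpy as np

def brier_decomposition(prob_forecasts, outcomes, bins=10):
    """Murphy (1973) decomposition: BS = reliability - resolution + uncertainty.

    prob_forecasts : forecast probabilities in [0, 1]
    outcomes       : binary event occurrences (0/1)
    """
    p = np.asarray(prob_forecasts, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    n = len(p)
    obar = o.mean()                                  # climatological base rate
    uncertainty = obar * (1.0 - obar)
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, bins - 1)
    reliability = 0.0
    resolution = 0.0
    for k in range(bins):
        mask = idx == k
        nk = mask.sum()
        if nk == 0:
            continue
        pk = p[mask].mean()                          # mean forecast prob in bin
        ok = o[mask].mean()                          # conditional event frequency
        reliability += nk * (pk - ok) ** 2 / n       # calibration error (lower is better)
        resolution += nk * (ok - obar) ** 2 / n      # discrimination (higher is better)
    return reliability, resolution, uncertainty
```

Lower reliability terms and higher resolution terms both reduce the BS, which is how a method can trade one attribute against the other, as described for SL-H15 above.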
2) Southern interior
The monthly 90th-percentile thresholds for the Southern Interior stations vary from 5 to 20 mm day⁻¹ in winter and from 2 to 30 mm day⁻¹ in summer (Fig. 8n).
The seasonal pattern of BSSs in the Southern Interior is similar to that of the South Coast. Winter–spring precipitation is more synoptically driven and verifies with higher BSSs, whereas local-scale convection in summer suppresses the performance of all methods, causing lower (and here, negative) BSSs at all forecast lead times (Figs. 8a–e). The authors have examined these negative BSS values at individual stations and found that they are commonly due to a mix of consecutive dry days interspersed with isolated extreme values (i.e., isolated convective precipitation). When forecasts incorrectly estimate the timing of isolated 3-hourly extreme values with temporal or spatial shifts, it can result in a so-called "double penalty," and the resulting BS can be worse than the overall dry climatology background. While object-oriented verification can be more lenient in these cases, such methods are not appropriate for some applications. For example, spatial errors that place precipitation just outside a watershed make a critical difference to watershed inflows.
AnEn-based methods mostly outperform the quantile-mapped GEFS baseline, and the AnEn–CNN hybrids do so mostly by a large margin. BSS improvements are clearer and statistically significant in winter–spring and at shorter forecast lead times. One exception is day-2 BSSs in August–October, where noSL-H15 has the worst BSS (Figs. 8f–i).
The AnEn–CNN hybrids perform better than the two H15 controls at all forecast lead times. This performance difference is generally larger and statistically significant in winter–spring; BSS improvements vary from 0% to 40% (Figs. 8j,k). Reliability diagrams show that SL-CNN and noSL-CNN produce both more reliable and higher-resolution forecasts than the two H15 controls.
Comparing SL-H15 with noSL-H15, the contribution of SL-based data augmentation is evident up to day-4 (Fig. 8l). Reliability diagrams suggest that SL-H15 is more reliable than noSL-H15 and can achieve higher resolution. At long forecast lead times, resolution improvement is the main driver of its superior performance. Both BSSs and reliability diagrams suggest that SL-based data augmentation benefits 3-hourly heavy precipitation forecasts in the Southern Interior.
The BSS difference between SL-CNN and noSL-CNN is smaller but still positive (Fig. 8m). SL-CNN exhibits better reliability than noSL-CNN at all lead times, and slightly higher resolution for day-3 and day-5 forecasts (Fig. 9). Overall, SL-CNN is the best performing method for postprocessing 3-hourly heavy precipitation in the Southern Interior.
3) Northeast
In the Northeast, BSSs for precipitation 90th percentiles (Figs. 10a–m), and the 90th-percentile values themselves (Fig. 10n), have summer maxima and spring minima. All methods produce more skillful forecasts in May–October, with poorer BSSs in November–March (Figs. 10a–e). This poor performance is likely attributable to 1) difficulties in postprocessing solid precipitation given the significant observational errors, and 2) GEFS error characteristics in the winter over Northeast BC. Given that the same postprocessing methods performed well in winter for the Southern Interior where precipitation is also commonly solid, reason 2 may play a larger role. Further, validation set performance (relative to the ERA5, not station observations) exhibits very similar poor skill scores (negative BSSs in the cool season). This is more evidence that poor GEFS predictability, not poor observation quality, accounts for most of the performance deficiencies in the postprocessed forecasts.
In May–October, all AnEn-based methods outperform the quantile-mapped GEFS baseline, with day-0 and day-1 forecasts, and summer and fall seasons, showing the largest performance gains (Figs. 10f–i). SL-based data augmentation improves BSSs either with or without CNN-based adjustments (Figs. 10l,m). Reliability diagrams suggest that the use of SLs produces more reliable and higher-resolution forecasts, and a larger resolution improvement for day-3 and day-5 (Fig. 11).
Excepting the challenging cool season, the AnEn–CNN hybrid performs better than the two H15 controls, with a BSS increase of roughly 0.03 for the warm season (May–October; Figs. 10j,k). Given the relatively low BSS in this area, the AnEn–CNN hybrid provides a roughly 30%–60% benefit for short forecast lead times (0.03 improvement for BSSs of 0.05–0.09). Given the relatively consistent (∼0.03) gains across all lead times and decreasing BSSs with lead time, the AnEn–CNN hybrid yields relatively larger gains at longer forecast lead times. This performance increase is confirmed by the reliability diagrams, with improvements in both reliability and resolution (Fig. 11).
d. Accumulated heavy precipitation
Skillful 7-day heavy precipitation total forecasts can support applications like flood risk assessments and volumetric water management (e.g., in hydroelectric operations). They are a good indicator of the usefulness of postprocessing methods in a real-world application (i.e., research question 3), where end users might be planning for a challenging sequence of storms (sometimes called a "storm cycle"). Temporally aggregated precipitation is sensitive to the spatiotemporal covariability of the postprocessed sequences, which the MDSS should assemble realistically. Thus, this part of the verification also shows how well the AnEn–CNN hybrid scheme can produce physically realistic sequences.
Postprocessing outputs of the AnEn–CNN hybrid and the two H15 controls are considered in this verification (quantile-mapped GEFS is not). All of them are as reliable as they were for individual lead times (though resolutions are slightly decreased), indicating that the sequences contain appropriate spatiotemporal variability and are practical for use as 7-day guidance. BSSs are much higher for 7-day accumulations than for 3-hourly forecast windows, an expected result because timing error penalties are largely eliminated (e.g., Jeworrek et al. 2021). All methods perform well at the South Coast, with BSSs ranging from 0.46 to 0.50 (Fig. 12a). Relatively poor BSSs are found in the Southern Interior and Northeast, around 0.2 and 0.1, respectively (Figs. 12b,c). As noted in the verification of 3-hourly heavy precipitation events, this regional performance difference is likely because of the high predictability of synoptically forced precipitation in the winter at the South Coast.
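As an illustrative sketch of this verification step (function names and the constant climatological probability are assumptions, not the authors' code), the 7-day event probability from postprocessed member sequences and its Brier skill score against climatology could be computed as:

```python
import numpy as np

def exceedance_prob(member_seqs, threshold):
    """Probability of a 7-day heavy precipitation event from an ensemble.

    member_seqs : (n_members, n_leads) array of 3-hourly precipitation
                  sequences (e.g., n_leads = 56 for 7 days of 3-hourly steps)
    threshold   : event threshold applied to the 7-day total
    """
    totals = np.asarray(member_seqs, dtype=float).sum(axis=1)  # 7-day total per member
    return np.mean(totals > threshold)

def bss(prob_forecasts, outcomes, clim_prob=0.1):
    """Brier skill score against a constant climatological probability.

    clim_prob = 0.1 mirrors a 90th-percentile event base rate.
    """
    p = np.asarray(prob_forecasts, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    bs = np.mean((p - o) ** 2)
    bs_clim = np.mean((clim_prob - o) ** 2)
    return 1.0 - bs / bs_clim
```

Because the event is defined on the member totals, the resulting probability depends on the spatiotemporal co-variability of the sequences, which is why this diagnostic exercises the MDSS-assembled structure and not just the per-lead-time calibration.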
SL-CNN and noSL-CNN outperform the two H15 controls; both show moderate resolution improvements, while noSL-CNN also largely improves the reliability compared to noSL-H15 in the Southern Interior and Northeast. Overall, SL-CNN shows the best BSSs in all hydrologic regions for 7-day accumulated heavy precipitation events; noSL-CNN and SL-H15 are second best with comparable BSSs.
Results for both 7-day accumulated precipitation and 3-hourly precipitation at individual lead times indicate that noSL-H15 performs poorly in the Southern Interior and Northeast. To investigate this, the authors examined calibrated forecast distributions for several heavy precipitation periods; one example is shown in Fig. 13. The noSL-H15 members are positively skewed, with a lower 90th-percentile value than that of SL-H15; based on station observations, this points to a systematic underestimation (Figs. 13a,b,d,f). This underestimation is found at nearly all inland verification stations, as well as Lower Mainland stations within the South Coast, but is worst in the Southern Interior. Moreover, the performance difference between noSL-H15 and both SL-H15 and noSL-CNN is even larger for 7-day accumulated precipitation than for the 3-hourly forecasts in the Southern Interior and Northeast BC. This is because the underestimations of noSL-H15 accumulate when individual forecast lead time values are summed over 7 days.
Hamill et al. (2015) and Hamill et al. (2017) explain the benefit of SL-based data augmentation for preventing the underestimation of extremes—nonparametric methods like AnEns leverage a large training set for calibration. When data augmentation is added, more precipitation extremes are incorporated into the training set, which prevents it from overfitting to less extreme reforecasts, avoiding the underestimation of extremes. SLs are identified in part using terrain features, so they are likely more effective in interior mountains, where the frontal systems are less organized and precipitation is more tied to the terrain. SLs are less effective at the South Coast, where well-organized Pacific frontal systems have relatively more influence on precipitation, and terrain relatively less.
Next, we examine why SL-CNN consistently performs better than SL-H15 for both 3-hourly and 7-day heavy precipitation, when CNN-based adjustments were originally proposed to reduce the small-scale noise problem of AnEns (e.g., examples in Fig. 4). First, histograms from SL-CNN are typically smoother than those from SL-H15 (Figs. 13b,c,e). Smoother histograms are less impacted by the discretization from a fixed ensemble size, and thus, better approximate the calibrated probability density functions. Second, the AnEn–CNN hybrid produces a slightly wider, flatter histogram with longer tails on both ends (Fig. 13e). Therefore, despite the SL-H15 90th percentile being closer to that of the BC Hydro station observations, the overall histogram shape of SL-CNN is in better agreement with that of the observations. This improves BSSs and reliability over both short and long accumulation periods.
Last, we revisit the question, can the AnEn–CNN hybrid scheme produce practically useful and physically realistic sequences? Our CNN is applied for multivariate postprocessing, in which the same CNN is used for all locations and forecast lead times. We have shown through case studies (Figs. 4 and 13) and verification results that the CNN successfully de-noises precipitation fields while preserving the location of precipitation centers. Thus, as long as the copula relationships are estimated properly—no matter through MDSS or other methods—the CNN does not impact the established multidimensional dependencies. As a result, we see that for the key indicator of 7-day accumulated heavy precipitation, the AnEn–CNN hybrid is as reliable as it is at individual forecast lead times, and maintains its superior performance relative to the H15 controls.
6. Discussion and conclusions
A novel postprocessing method, the AnEn–CNN hybrid, was proposed by incorporating a convolutional neural network (CNN) to refine precipitation forecast sequences produced by an analog ensemble (AnEn) and minimum divergence Schaake shuffle (MDSS). The AnEn–CNN hybrid was tested with GEFS reforecasts of 3-hourly precipitation and verified with station observations from three disparate hydrologic regions—the South Coast, Southern Interior, and Northeast—in British Columbia (BC), Canada, from 2017 to 2019.
This research uniquely focused on a limitation of the AnEn method: AnEns are able to memorize and predict from large training sets, but the way they reassemble forecasts is vulnerable to the random variations, in space and time, of the training set. The MDSS, which Scheuerer et al. (2017) introduced in combination with AnEns, partially addressed the issue of spatiotemporal consistencies, creating realistic forecast sequences. Our research introduced CNNs to address the issue of the remaining random spatial variations, or noise, in precipitation forecast fields. CNNs are good at recovering pattern-based information from noisy fields, and thus, this work adds them to the AnEn postprocessing pipeline.
Both our AnEn–CNN hybrid and the Hamill et al. (2015, H15) benchmark methods outperformed a quantile-mapped GEFS baseline. The AnEn–CNN hybrid also outperformed the H15 benchmark in Continuous Ranked Probability Skill Scores (CRPSSs) by roughly 10%. For 3-hourly heavy precipitation events in all three hydrologic regions, all AnEn-based methods produced generally skillful forecasts. The AnEn–CNN hybrids (SL-CNN and noSL-CNN) showed BSS improvements ranging from 0% to 60% over the H15 benchmark; the improvements were largely statistically significant. While the AnEn–CNN hybrid was reliable, its resolutions exhibited region-specific differences; highest for the South Coast and lowest for the Northeast. However, even in the latter region, the AnEn–CNN hybrid was largely improved compared to the H15 controls (SL-H15 and noSL-H15). For 7-day accumulated forecasts, the AnEn–CNN hybrid maintained the same good reliability and resolution seen across 3-hourly lead times.
Case studies revealed that the AnEn–CNN hybrid reduced the random error of AnEn output and smoothed the precipitation intensity spectra, better aligning them with observations. Lastly, supplemental locations (SLs), a data augmentation technique suggested by Hamill et al. (2015), improved the AnEn forecasts in BC overall, especially in the Southern Interior and Northeast. SL-CNN, the combination of CNN-based adjustments and SLs, was the best-performing method over all hydrologic regions.
Future research could evaluate variations on our methodology. This was an initial attempt at using convolutional neural networks for multivariate postprocessing. It does not apply the CNN to process the entire forecast sequence at once, but rather separately at each forecast lead time. This choice was justified, and no negative impacts were found from using the MDSS when 7-day accumulated heavy precipitation events were examined. However, future research could explore using spatiotemporal neural networks such as recurrent convolutional neural networks (e.g., Shi et al. 2015); they can process grid points and multiple forecast lead times as a whole. Further, other postprocessing methods, aside from AnEn methods, may also introduce undesired noise to their outputs (e.g., ensemble member dressing; Roulston and Smith 2003; Fortin et al. 2006). Given the success of this AnEn–CNN hybrid, other CNN hybrids could explore addressing lingering artifacts left by previous steps in other forecast pipelines.
To our knowledge, no previous research has experimented with a hybrid of the AnEn algorithm and a CNN. Our successful AnEn–CNN hybrid fills the gap between conventional statistical postprocessing and neural networks. More broadly, it also contributes to the growing evidence that deep learning models are useful tools for enhancing and localizing numerical weather prediction results. Once operationalized, this work will be used in hydrometeorological forecasting for reservoir and flood risk management in BC at fine spatial and temporal resolutions.
Throughout this article, the monthly precipitation climatology is computed for each month from 2000 to 2014, using all days in that month and its two surrounding months.
The authors have experimented with applying the same AnEn algorithm with the GEFS control member only, and the resulting validation set performance was suboptimal (validated by the ERA5, not shown). Further, an analog search among all individual members is expensive and operationally impractical.
Another notable limitation of k-NN is its performance downgrade when using multiple and high-dimensional inputs (Kramer 2013). AnEn methods avoid this by incorporating the limited-area hypothesis (van den Dool et al. 2003).
Acknowledgments.
This research is jointly funded by a Four Year Doctoral Fellowship (4YF) program of the University of British Columbia, and the Canadian Natural Science and Engineering Research Council (NSERC). We also thank the National Center for Atmospheric Research (NCAR), their Advanced Study Program (ASP), and the Casper cluster (CISL 2020) for supporting this research. NCAR is operated by the University Corporation for Atmospheric Research (UCAR) and is sponsored by the National Science Foundation. Additional support was provided by MITACS and BC Hydro. The neural network application of this research is available at https://github.com/yingkaisha/keras-unet-collection. The source program of this research is available at https://github.com/yingkaisha/MWR_21_0154. The neural network of this research was trained on a single NVIDIA Tesla V100 GPU (32 GB); it takes roughly 6 hours to complete the training. Other statistical methods were conducted on 4 CPUs (2.3-GHz Intel Xeon Gold 6140) with 8-GB memory; their overall computational time is roughly 50 min per initialization time.
REFERENCES
Allen, S., G. R. Evans, P. Buchanan, and F. Kwasniok, 2021: Accounting for skew when postprocessing MOGREPS-UK temperature forecast fields. Mon. Wea. Rev., 149, 2835–2852, https://doi.org/10.1175/MWR-D-20-0422.1.
Aloysius, N., and M. Geetha, 2017: A review on deep convolutional neural networks. 2017 Int. Conf. on Communication and Signal Processing (ICCSP), Chennai, India, IEEE, https://doi.org/10.1109/ICCSP.2017.8286426.
Amante, C., and B. Eakins, 2009: ETOPO1 arc-minute global relief model: Procedures, data sources and analysis. NOAA Tech. Memo. NESDIS NGDC-24, 25 pp., https://www.ngdc.noaa.gov/mgg/global/relief/ETOPO1/docs/ETOPO1.pdf.
Bergeron, T., 1965: On the low-level redistribution of atmospheric water caused by orography. Suppl. Proc. Int. Conf. Cloud Phys., 1965, 96–100, https://ci.nii.ac.jp/naid/10012388696/.
Bremnes, J. B., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Mon. Wea. Rev., 132, 338–347, https://doi.org/10.1175/1520-0493(2004)132<0338:PFOPIT>2.0.CO;2.
Bruintjes, R. T., T. L. Clark, and W. D. Hall, 1994: Interactions between topographic airflow and cloud/precipitation development during the passage of a winter storm in Arizona. J. Atmos. Sci., 51, 48–67, https://doi.org/10.1175/1520-0469(1994)051<0048:IBTAAC>2.0.CO;2.
Buytaert, W., M. Vuille, A. Dewulf, R. Urrutia, A. Karmalkar, and R. Célleri, 2010: Uncertainties in climate change projections and regional downscaling in the tropical Andes: Implications for water resources management. Hydrol. Earth Syst. Sci., 14, 1247–1258, https://doi.org/10.5194/hess-14-1247-2010.
Chapman, W. E., A. C. Subramanian, L. D. Monache, S. P. Xie, and F. M. Ralph, 2019: Improving atmospheric river forecasts with machine learning. Geophys. Res. Lett., 46, 10 627–10 635, https://doi.org/10.1029/2019GL083662.
Charron, M., G. Pellerin, L. Spacek, P. L. Houtekamer, N. Gagnon, H. L. Mitchell, and L. Michelin, 2010: Toward random sampling of model error in the Canadian Ensemble Prediction System. Mon. Wea. Rev., 138, 1877–1901, https://doi.org/10.1175/2009MWR3187.1.
CISL, 2020: Cheyenne: HPE/SGI ICE XA system (NCAR Community Computing). Computational and Information Systems Laboratory, National Center for Atmospheric Research, https://doi.org/10.5065/d6rx99hx.
Clark, A. J., W. A. Gallus, M. Xue, and F. Kong, 2009: A comparison of precipitation forecast skill between small convection-allowing and large convection-parameterizing ensembles. Wea. Forecasting, 24, 1121–1140, https://doi.org/10.1175/2009WAF2222222.1.
Clark, M., S. Gangopadhyay, L. Hay, B. Rajagopalan, and R. Wilby, 2004: The Schaake Shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeor., 5, 243–262, https://doi.org/10.1175/1525-7541(2004)005<0243:TSSAMF>2.0.CO;2.
Colle, B. A., R. B. Smith, and D. A. Wesley, 2013: Theory, observations, and predictions of orographic precipitation. Mountain Weather Research and Forecasting: Recent Progress and Current Challenges, F. K. Chow, S. F. De Wekker, and B. J. Snyder, Eds., Springer, 291–344.
Cox, J. A. W., W. J. Steenburgh, D. E. Kingsmill, J. C. Shafer, B. A. Colle, O. Bousquet, B. F. Smull, and H. Cai, 2005: The kinematic structure of a Wasatch Mountain winter storm during IPEX IOP3. Mon. Wea. Rev., 133, 521–542, https://doi.org/10.1175/MWR-2875.1.
Crossett, C. C., A. K. Betts, L.-A. L. Dupigny-Giroux, and A. Bomblies, 2020: Evaluation of daily precipitation from the ERA5 global reanalysis against GHCN observations in the Northeastern United States. Climate, 8, 148, https://doi.org/10.3390/cli8120148.
Cruz, C., A. Foi, V. Katkovnik, and K. Egiazarian, 2018: Nonlocality-reinforced convolutional neural networks for image denoising. IEEE Signal Process. Lett., 25, 1216–1220, https://doi.org/10.1109/LSP.2018.2850222.
Eckel, F. A., and L. D. Monache, 2016: A hybrid NWP–analog ensemble. Mon. Wea. Rev., 144, 897–911, https://doi.org/10.1175/MWR-D-15-0096.1.
Feldmann, K., D. S. Richardson, and T. Gneiting, 2019: Grid- versus station-based postprocessing of ensemble temperature forecasts. Geophys. Res. Lett., 46, 7744–7751, https://doi.org/10.1029/2019GL083189.
Fortin, V., A. Favre, and M. Saïd, 2006: Probabilistic forecasting from ensemble prediction systems: Improving upon the best-member method by using a different weight and dressing kernel for each member. Quart. J. Roy. Meteor. Soc., 132, 1349–1369, https://doi.org/10.1256/qj.05.167.
Glotter, M., J. Elliott, D. McInerney, N. Best, I. Foster, and E. J. Moyer, 2014: Evaluating the utility of dynamical downscaling in agricultural impacts projections. Proc. Natl. Acad. Sci. USA, 111, 8776–8781, https://doi.org/10.1073/pnas.1314787111.
Goodfellow, I., Y. Bengio, and A. Courville, 2016: Deep Learning. The MIT Press, 800 pp.
Grimit, E. P., T. Gneiting, V. J. Berrocal, and N. A. Johnson, 2006: The continuous ranked probability score for circular variables and its application to mesoscale forecast ensemble verification. Quart. J. Roy. Meteor. Soc., 132, 2925–2942, https://doi.org/10.1256/qj.05.235.
Grönquist, P., C. Yao, T. Ben-Nun, N. Dryden, P. Dueben, S. Li, and T. Hoefler, 2021: Deep learning for post-processing ensemble weather forecasts. Philos. Trans. A Math. Phys. Eng. Sci., 379, 20200092, https://doi.org/10.1098/rsta.2020.0092.
Gu, J., and Coauthors, 2018: Recent advances in convolutional neural networks. Pattern Recognit., 77, 354–377, https://doi.org/10.1016/j.patcog.2017.10.013.
Guan, H., Y. Zhu, X. Zhou, E. Sinsky, W. Li, and D. Hou, 2019: The design of NCEP GEFS reforecasts to support subseasonal and hydrometeorological applications. Global and Regional-Scale Models: Updates and Center Overviews, Phoenix, AZ, Amer. Meteor. Soc., 7.2, https://ams.confex.com/ams/2019Annual/webprogram/Paper351640.html.
Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550–560, https://doi.org/10.1175/1520-0493(2001)129<0550:IORHFV>2.0.CO;2.
Hamill, T. M., 2018: Practical aspects of statistical postprocessing. Statistical Postprocessing of Ensemble Forecasts, S. Vannitsem, D. S. Wilks, and J. W. Messner, Eds., Elsevier, 187–217.
Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 2905–2923, https://doi.org/10.1256/qj.06.25.
Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, https://doi.org/10.1175/MWR3237.1.
Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447, https://doi.org/10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.
Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2006: Reforecasts: An important dataset for improving weather predictions. Bull. Amer. Meteor. Soc., 87, 33–46, https://doi.org/10.1175/BAMS-87-1-33.
Hamill, T. M., M. Scheuerer, and G. T. Bates, 2015: Analog probabilistic precipitation forecasts using GEFS reforecasts and climatology-calibrated precipitation analyses. Mon. Wea. Rev., 143, 3300–3309, https://doi.org/10.1175/MWR-D-15-0004.1.
Hamill, T. M., E. Engle, D. Myrick, M. Peroutka, C. Finan, and M. Scheuerer, 2017: The U.S. National Blend of Models for statistical postprocessing of probability of precipitation and deterministic precipitation amount. Mon. Wea. Rev., 145, 3441–3463, https://doi.org/10.1175/MWR-D-16-0331.1.
Haupt, S. E., W. Chapman, S. V. Adams, C. Kirkwood, J. S. Hosking, N. H. Robinson, S. Lerch, and A. C. Subramanian, 2021: Towards implementing artificial intelligence post-processing in weather and climate: Proposed actions from the Oxford 2019 workshop. Philos. Trans. A Math. Phys. Eng. Sci., 379, 20200091, https://doi.org/10.1098/rsta.2020.0091.
Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803.
Houze, R. A., 2012: Orographic effects on precipitating clouds. Rev. Geophys., 50, RG1001, https://doi.org/10.1029/2011RG000365.
Hsu, W., and A. H. Murphy, 1986: The attributes diagram: A geometrical framework for assessing the quality of probability forecasts. Int. J. Forecasting, 2, 285–293, https://doi.org/10.1016/0169-2070(86)90048-8.
Huang, H., and Coauthors, 2020: UNet 3+: A full-scale connected UNet for medical image segmentation. arXiv, 2004.08790, https://arxiv.org/abs/2004.08790.
BC Hydro, 2020: Generation system, an efficient, low cost electricity system for B.C. Accessed 20 June 2021, https://www.bchydro.com/energy-in-bc/operations/generation.html.
Jeworrek, J., G. West, and R. Stull, 2019: Evaluation of cumulus and microphysics parameterizations in WRF across the convective gray zone. Wea. Forecasting, 34, 1097–1115, https://doi.org/10.1175/WAF-D-18-0178.1.
Jeworrek, J., G. West, and R. Stull, 2021: WRF precipitation performance and predictability for systematically varied parameterizations over complex terrain. Wea. Forecasting, 36, 893–913, https://doi.org/10.1175/WAF-D-20-0195.1.
Kingma, D. P., and J. Ba, 2017: Adam: A method for stochastic optimization. arXiv, 1412.6980, https://doi.org/10.48550/arXiv.1412.6980.
Kramer, O., 2013: K-nearest neighbors. Dimensionality Reduction with Unsupervised Nearest Neighbors, O. Kramer, Ed., Springer, 13–23.
Lopez, P., 2007: Cloud and precipitation parameterizations in modeling and variational data assimilation: A review. J. Atmos. Sci., 64, 3766–3784, https://doi.org/10.1175/2006JAS2030.1.
Loshchilov, I., and F. Hutter, 2017: SGDR: Stochastic gradient descent with warm restarts. arXiv, 1608.03983, https://doi.org/10.48550/arXiv.1608.03983.
Marzban, C., S. Sandgathe, and E. Kalnay, 2006: MOS, perfect prog, and reanalysis. Mon. Wea. Rev., 134, 657–663, https://doi.org/10.1175/MWR3088.1.
Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119, https://doi.org/10.1002/qj.49712252905.
Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600, https://doi.org/10.1175/1520-0450(1973)012<0595:ANVPOT>2.0.CO;2.
National Centers for Environmental Prediction, 2021: NOAA Global Ensemble Forecast System (GEFS) re-forecast. Accessed 15 March 2021, https://registry.opendata.aws/noaa-gefs-reforecast/.
Odon, P., G. West, and R. Stull, 2019: Evaluation of reanalyses over British Columbia. Part II: Daily and extreme precipitation. J. Appl. Meteor. Climatol., 58, 291–315, https://doi.org/10.1175/JAMC-D-18-0188.1.
Oktay, O., and Coauthors, 2018: Attention U-Net: Learning where to look for the pancreas. arXiv, 1804.03999, https://doi.org/10.48550/arXiv.1804.03999.
Robertson, A. W., A. V. M. Ines, and J. W. Hansen, 2007: Downscaling of seasonal precipitation for crop simulation. J. Appl. Meteor. Climatol., 46, 677–693, https://doi.org/10.1175/JAM2495.1.
Roe, G. H., 2005: Orographic precipitation. Annu. Rev. Earth Planet. Sci., 33, 645–671, https://doi.org/10.1146/annurev.earth.33.092203.122541.
Ronneberger, O., P. Fischer, and T. Brox, 2015: U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, N. Navab et al., Eds., Lecture Notes in Computer Science, Springer, 234–241.
Roulin, E., 2007: Skill and relative economic value of medium-range hydrological ensemble predictions. Hydrol. Earth Syst. Sci., 11, 725–737, https://doi.org/10.5194/hess-11-725-2007.
Roulston, M. S., and L. A. Smith, 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 16–30, https://doi.org/10.3402/tellusa.v55i1.12082.
Schefzik, R., 2016: A similarity-based implementation of the Schaake shuffle. Mon. Wea. Rev., 144, 1909–1921, https://doi.org/10.1175/MWR-D-15-0227.1.
Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci., 28, 616–640, https://doi.org/10.1214/13-STS443.
Scheuerer, M., 2014: Probabilistic quantitative precipitation forecasting using ensemble model output statistics. Quart. J. Roy. Meteor. Soc., 140, 1086–1096, https://doi.org/10.1002/qj.2183.
Scheuerer, M., and T. M. Hamill, 2015: Statistical postprocessing of ensemble precipitation forecasts by fitting censored, shifted gamma distributions. Mon. Wea. Rev., 143, 4578–4596, https://doi.org/10.1175/MWR-D-15-0061.1.
Scheuerer, M., T. M. Hamill, B. Whitin, M. He, and A. Henkel, 2017: A method for preferential selection of dates in the Schaake shuffle approach to constructing spatiotemporal forecast fields of temperature and precipitation. Water Resour. Res., 53, 3029–3046, https://doi.org/10.1002/2016WR020133.
Schnorbus, M., A. Werner, and K. Bennett, 2014: Impacts of climate change in three hydrologic regimes in British Columbia, Canada. Hydrol. Processes, 28, 1170–1189, https://doi.org/10.1002/hyp.9661.
Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2020a: Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part I: Daily maximum and minimum 2-m temperature. J. Appl. Meteor. Climatol., 59, 2057–2073, https://doi.org/10.1175/JAMC-D-20-0057.1.
Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2020b: Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part II: Daily precipitation. J. Appl. Meteor. Climatol., 59, 2075–2092, https://doi.org/10.1175/JAMC-D-20-0058.1.
Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2021: Deep-learning-based precipitation observation quality control. J. Atmos. Oceanic Technol., 38, 1075–1091, https://doi.org/10.1175/JTECH-D-20-0081.1.
Shahdoosti, H. R., and Z. Rahemi, 2019: Edge-preserving image denoising using a deep convolutional neural network. Signal Process., 159, 20–32, https://doi.org/10.1016/j.sigpro.2019.01.017.
Shi, X., Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-C. Woo, 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. arXiv, 1506.04214, https://doi.org/10.48550/arXiv.1506.04214.
Sloughter, J. M. L., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220, https://doi.org/10.1175/MWR3441.1.
Sperati, S., S. Alessandrini, and L. Delle Monache, 2017: Gridded probabilistic weather forecasts with an analog ensemble. Quart. J. Roy. Meteor. Soc., 143, 2874–2885, https://doi.org/10.1002/qj.3137.
Tarek, M., F. P. Brissette, and R. Arsenault, 2020: Evaluation of the ERA5 reanalysis as a potential reference dataset for hydrological modelling over North America. Hydrol. Earth Syst. Sci., 24, 2527–2544, https://doi.org/10.5194/hess-24-2527-2020.
Tian, C., L. Fei, W. Zheng, Y. Xu, W. Zuo, and C.-W. Lin, 2020: Deep learning on image denoising: An overview. Neural Networks, 131, 251–275, https://doi.org/10.1016/j.neunet.2020.07.025.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319, https://doi.org/10.1175/1520-0493(1997)125<3297:EFANAT>2.0.CO;2.
van den Dool, H., J. Huang, and Y. Fan, 2003: Performance and analysis of the constructed analogue method applied to U.S. soil moisture over 1981–2001. J. Geophys. Res., 108, 8617, https://doi.org/10.1029/2002JD003114.
Verkade, J. S., and M. G. F. Werner, 2011: Estimating the benefits of single value and probability forecasting for flood warning. Hydrol. Earth Syst. Sci., 15, 3751–3765, https://doi.org/10.5194/hess-15-3751-2011.
Wang, L., C.-Y. Lee, Z. Tu, and S. Lazebnik, 2015: Training deeper convolutional networks with deep supervision. arXiv, 1505.02496, https://doi.org/10.48550/arXiv.1505.02496.
Ward, E., W. Buytaert, L. Peaver, and H. Wheater, 2011: Evaluation of precipitation products over complex mountainous terrain: A water resources perspective. Adv. Water Resour., 34, 1222–1231, https://doi.org/10.1016/j.advwatres.2011.05.007.
Xu, X., S. K. Frey, A. Boluwade, A. R. Erler, O. Khader, D. R. Lapen, and E. Sudicky, 2019: Evaluation of variability among different precipitation products in the Northern Great Plains. J. Hydrol. Reg. Stud., 24, 100608, https://doi.org/10.1016/j.ejrh.2019.100608.
Yang, C., H. Yuan, and X. Su, 2020: Bias correction of ensemble precipitation forecasts in the improvement of summer streamflow prediction skill. J. Hydrol., 588, 124955, https://doi.org/10.1016/j.jhydrol.2020.124955.
Yang, D., 2019: Ultra-fast analog ensemble using kd-tree. J. Renewable Sustainable Energy, 11, 053703, https://doi.org/10.1063/1.5124711.
Zhou, X., Y. Zhu, D. Hou, Y. Luo, J. Peng, and R. Wobus, 2017: Performance of the new NCEP Global Ensemble Forecast System in a parallel experiment. Wea. Forecasting, 32, 1989–2004, https://doi.org/10.1175/WAF-D-17-0023.1.
Zhou, Z., M. M. Rahman Siddiquee, N. Tajbakhsh, and J. Liang, 2018: UNet++: A nested U-Net architecture for medical image segmentation. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, D. Stoyanov et al., Eds., Lecture Notes in Computer Science, Springer, 3–11.