1. Introduction
a. Motivation
Atmospheric flow structures exist on spatial scales ranging from centimeters to thousands of kilometers. Accurately representing these scales in computational simulations of the atmosphere is a great challenge, especially since processes at differing scales are not generally independent of each other (Judt 2018).
Readily available low-resolution (LR) climate models (ranging from 10- to 100-km horizontal grid spacing) cannot resolve important small-scale processes, particularly for variables strongly influenced by surface heterogeneities (e.g., wind and precipitation) (Whiteman 2000; Frei et al. 2003; Kharin et al. 2007; Stephens et al. 2010; Sillmann et al. 2013; Ban et al. 2015; Torma et al. 2015; Schlager et al. 2019; Song et al. 2020). At horizontal grid spacings of approximately 4 km or finer, high-resolution (HR) convection-permitting models offer improved representations of small-scale variability over LR models owing to the representation of convection and finer orographic resolution (Kopparla et al. 2013; Prein et al. 2016; Innocenti et al. 2019). However, due to their computational cost, convection-permitting models are currently limited in scope either spatially or temporally, and large initial condition or multimodel ensemble experiments, which are highly desirable for climate impact and adaptation studies, are unavailable.
Since fast assessments of meteorological conditions are required for such studies, additional methods that can effectively model small-scale processes have been developed. As one option, statistical downscaling seeks to exploit empirical links between large-scale and small-scale processes using statistical models (Wilby and Wigley 1997; Cannon 2008; Sobie and Murdock 2017; Li et al. 2018). However, standard statistical downscaling approaches are often limited in their ability to model the range of spatiotemporal variability required in many climate impact studies (Maraun et al. 2010).
Machine learning is the branch of artificial intelligence concerned with having computers learn how to perform certain tasks. Deep learning, which is an approach to machine learning based on artificial neural networks (Gardner and Dorling 1998), can be used to implement highly nonlinear, high-dimensional, and flexible statistical models. As one example, convolutional neural networks (CNNs) are a class of deep learning models constructed for image analysis with spatial awareness (Krizhevsky et al. 2012; Karpathy 2022). In the field of image processing, super resolution (SR) aims to develop deep learning models that produce plausible HR details from LR inputs. Owing to their ability to represent spatially organized structures in images, CNNs have led to substantial improvements in SR quality (Dong et al. 2014, 2015; Zhang et al. 2018; Zhu et al. 2020). Even further improvements were found by adopting generative adversarial networks (GANs) (Goodfellow et al. 2014; Mirza and Osindero 2014) using CNNs for SR tasks (Ledig et al. 2017; Zhu et al. 2020). GANs are a machine learning architecture that consists of dueling functions (often CNNs) trained simultaneously with opposing objectives. These objectives shape two networks that communicate with each other, namely, the generator, which aims to generate realistic information, and the discriminator (or critic), which aims to judge or critique this generated information and provide feedback to the generator.
Given the natural parallels between SR and statistical downscaling tasks, researchers have started to apply CNNs to the field of climate downscaling. Several applications have focused on temperature and precipitation. For instance, Sha et al. (2020) employed CNNs to downscale temperature over the continental United States, while Wang et al. (2021) utilized a deep residual network for downscaling daily precipitation and temperature. In another study, Kumar et al. (2021) used the super-resolution convolutional neural network to downscale rainfall data for regional climate forecasting. GANs configured for SR have also shown promise in downscaling for precipitation, wind, and solar irradiance fields (Singh et al. 2020; Stengel et al. 2020; Leinonen et al. 2020; Harris et al. 2022; Price and Rasp 2022).
b. Problem formulation
The focus of this study is the evaluation of GAN- and CNN-based SR methods for downscaling from LR climate model scales to HR scales. Specifically, we assess SR models adapted directly from computer vision for the multivariate statistical downscaling of near-surface winds, encompassing both the u and v wind components simultaneously. HR wind fields are crucial for numerous weather and climate applications, such as fire weather, pollutant dispersal, infrastructure design, and wind turbine siting. In contrast to most applications, we adopt an emulation approach, training the machine learning models on existing pairs of HR fields and covariates from the LR models used to drive them, rather than deriving the LR covariates from the convection-permitting model fields themselves (i.e., through coarsening). The LR and HR fields may therefore contain mismatches on shared scales that result from the internal variability of the convection-permitting model. One benefit of this approach is that it allows us to sample from the distribution of internal variability that is physically consistent with the conditioning fields. The generation of fine-scale features using SR has been compared to the concept of “hallucinating,” where plausible details are generated that may not precisely match the “true” HR features present in the training data (Zhang et al. 2020). This ability to hallucinate these details is considered desirable for climate applications (Bessac et al. 2019).
c. Research questions
To build on existing literature, we focus on three core questions in this paper. (i) How do the generated outputs change when we manipulate the objective functions taken directly from the computer vision literature? (ii) What capacity do the networks have to deal with nonidealized LR–HR pairs? (iii) What role do select LR covariates play in super-resolved near-surface wind fields?
Existing climate applications of SR often overlook the importance of objective functions borrowed from the computer vision field. In this manuscript, we address this issue by focusing on the intersection of SR and statistical downscaling. We conduct experiments to explore various configurations of SR models, including objective functions, data sources, and hyperparameters. Our goal is not to develop highly optimized models for specific configurations but rather to provide insights into the effectiveness and sensitivity of the SR objective function for statistical downscaling. We propose ways to improve the configuration through the assessment of multiple skill metrics and evaluation techniques. Additionally, we investigate the influence of LR inputs and the emulation approach (i.e., with the presence of internal variability) on the generated fields.
We first introduce the existing configurations of SR methods and explain how our chosen configuration and methods relate to them (section 2). Subsequently, we provide details on the training methods and data in section 3. To organize our work, we introduce the methodology required to conduct two experiments we refer to as “experiment 1: frequency separation” (section 3c) and “experiment 2: partial frequency separation” (section 3d), which we use to address research question (i). In section 4, we present the results from experiments 1 and 2. Additionally, we conduct a further analysis in “experiment 3: low-resolution covariates” in section 4d, which addresses research questions (ii) and (iii). We provide a discussion in section 5. Additionally, given the novelty and unique challenges of SR methods for statistical downscaling, we review SR and statistical downscaling in the online supplemental material. We provide an acronym definition list in the supplemental material.
2. Previous work
a. Stochastic versus deterministic GANs
GAN-based SR methods have been successfully applied to the statistical downscaling of climate fields using two main approaches: deterministic and stochastic. In deterministic SR, a single realization is generated for each unique LR input [e.g., wind and solar irradiance fields in Singh et al. (2020) and Stengel et al. (2020)]. Alternatively, stochastic SR allows for the sampling of multiple realizations given single LR conditioning fields by providing noise to the generator network (Leinonen et al. 2020; Harris et al. 2022).
We focus on purely deterministic SR models (i.e., single HR fields for given LR input fields) to simplify the analysis and reduce the number of free parameters (such as how to configure the generator to accept noise). While stochastic SR is an important avenue of research, it introduces complexity, design choices, and additional currently unresolved issues, such as underdispersion in the generated ensembles (Goodfellow et al. 2014; Arjovsky et al. 2017; Harris et al. 2022). Furthermore, stochastic and deterministic SRs are typically optimized using similar (if not identical) objective functions, so we believe that our findings can inform design choices for stochastic approaches. There are, however, some key differences between the two approaches that we will discuss further in section 3.
b. Low-resolution covariates
In addition to developing stochastic and deterministic SR models, existing SR studies have configured the LR input data in numerous ways. For example, some studies (in both computer vision and climate/weather) use LR covariates that are coarsened versions of their HR targets, forming a perfect and idealized LR–HR pair at shared scales (Dong et al. 2015; Ledig et al. 2017; Wang et al. 2018; Singh et al. 2020; Sha et al. 2020; Cheng et al. 2020; Stengel et al. 2020; Wang et al. 2021; Kumar et al. 2021; Adewoyin et al. 2021). More recent studies (e.g., Adewoyin et al. 2021; Harris et al. 2022; Price and Rasp 2022) have considered LR inputs that are synchronous with the HR target but are not perfectly matched because they come from different sources, that is, observations as HR targets with LR reanalyses or forecasts as LR inputs. Systematic biases can exist between the HR and LR fields because of their different sources. Such biases can in principle be addressed by bias corrections. However, since convection-permitting models will develop internal variability that differs from that of the LR driving model, mismatches may occur on shared scales, which cannot be remediated by bias correction techniques (Lucas-Picher et al. 2008). One of the goals of SR for statistical downscaling with the GAN approach is to mimic the internal variability of the convection-permitting model rather than to match HR features exactly. Because we aim to develop a tool for sampling realizations of HR fields conditional on LR fields, the ability to model this internal variability is a strength of our approach.
In computer vision, the process of obtaining nonidealized LR–HR pairs for training poses significant challenges, which in turn hinders the generalization capabilities of state-of-the-art SR methods when applied to real-world images. These methods are trained using idealized inputs that do not exhibit issues commonly faced in photography, such as aliasing effects, sensor noise, and compression artifacts. Consequently, when applied to real-world scenarios, these SR models tend to produce high-frequency artifacts and distortions, as documented in studies like Shocher et al. (2018) and Fritsche et al. (2019). To combat this, Fritsche et al. (2019) design a training method that synthesizes nonidealized LR images from HR input images, essentially developing a training set that contains nonidealized LR and HR pairs. This approach improved the ability of the networks to generalize to real-world (imperfect) data. Interestingly, while nonidealized training image pairs are difficult to come by in computer vision, the analogous configuration for climate fields is more readily available because of the role of internal variability.
For forecasting and observational datasets, Price and Rasp (2022) recognize that systematic error and biases may contribute prominently to shared-scale mismatches for their data. To deal with this challenge, they train the networks to correct the LR input fields to match the HR domain beforehand by including explicit “correction” layers. In our work, we show how idealized versus nonidealized LR–HR pairs influence the generated fields but do not include additional correction techniques; as demonstrated through analyses of model biases, our mismatches are predominantly caused by random internal variability rather than systematic errors.
Additionally, studies have either considered mapping between LR and HR fields of the same physical quantity (e.g., Singh et al. 2020; Stengel et al. 2020; Leinonen et al. 2020) or provided additional input information in the form of LR climate variables or information about the model surface (e.g., Price and Rasp 2022; Harris et al. 2022). However, to our knowledge, the value of including additional covariates has not yet been explicitly addressed. We include additional LR covariate fields and design experiments to measure their influence over the generated fields.
3. Methods and data
a. Objective functions
We adopt the Wasserstein GAN (WGAN; Arjovsky et al. 2017) with gradient penalty (WGAN-GP; Gulrajani et al. 2017) for our SR models, which use a critic network C to estimate the Wasserstein distance between the generated and target distributions (ℙg and ℙr, respectively). Using WGAN-GP, we implement super-resolution GAN (SRGAN) networks from Ledig et al. (2017). More details on the networks we use are provided in section 3f.
GANs enable the SR methods to minimize both distributional distances (i.e., convergence of distributions) and grid point–based metrics (i.e., convergence of realizations) in the training process. When using grid point–based (or pixelwise) metrics, such as the mean-square error (MSE) or mean absolute error (MAE), enforcing strict adherence to pixelwise errors penalizes physically realizable but noncongruent (i.e., mismatched) high-frequency patterns. This problem is similar to the limitations of grid point–based error measures for precipitation fields, known as the double-penalty problem (Rossa et al. 2008; Michaelides 2008; Harris et al. 2022). Using a distributional distance in the objective function helps mitigate the double-penalty problem. Further details on GAN implementation can be found in the supplemental material.
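For concreteness, the sketch below (PyTorch) illustrates the two objectives described above: a WGAN-GP critic loss with gradient penalty, and a generator loss combining an MAE content term with an adversarial (Wasserstein) term. Function and parameter names, and where the relative weight α enters, are illustrative assumptions rather than the exact implementation used in this study.

```python
# Hedged sketch (PyTorch) of the WGAN-GP objectives described above; names and
# the placement of the content/adversarial weight are illustrative assumptions.
import torch
import torch.nn.functional as F


def gradient_penalty(critic, hr_real, hr_fake):
    """Gradient penalty term of WGAN-GP (Gulrajani et al. 2017)."""
    eps = torch.rand(hr_real.size(0), 1, 1, 1, device=hr_real.device)
    interp = (eps * hr_real + (1.0 - eps) * hr_fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()


def critic_loss(critic, hr_real, hr_fake, lambda_gp=10.0):
    # Negative of the Wasserstein estimate plus the gradient penalty; hr_fake
    # should be detached from the generator graph when training the critic.
    wasserstein = critic(hr_real).mean() - critic(hr_fake).mean()
    return -wasserstein + lambda_gp * gradient_penalty(critic, hr_real, hr_fake)


def generator_loss(critic, hr_real, hr_fake, alpha=50.0):
    # Grid point-based (MAE) content term plus adversarial (Wasserstein) term.
    content = F.l1_loss(hr_fake, hr_real)
    adversarial = -critic(hr_fake).mean()
    return alpha * content + adversarial
```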
b. Meteorological datasets
We develop deterministic WGAN-GP SR models that generate 10-m wind component fields (u10 and v10 for the zonal and meridional components, respectively), using simulations by the Weather Research and Forecasting (WRF) Model over subregions in the high-resolution contiguous United States (HRCONUS) domain (Rasmussen and Liu 2017; Liu et al. 2017) as training data. The WRF HRCONUS simulations are at a convection-permitting resolution (4-km grid spacing) and are driven using 6-hourly ERA-Interim (80-km grid spacing) reanalysis output (Dee et al. 2011) during the historical period from October 2000 to September 2013 (18 991 6-hourly fields). Using GANs for SR, we aim to generate HR fields—conditioned on LR reanalysis fields—that are consistent with WRF HRCONUS, effectively emulating the simulated HR WRF wind fields.
Through its boundary forcing and spectral nudging at large scales above the boundary layer, WRF HRCONUS is synchronous in time with ERA-Interim (Rasmussen and Liu 2017). This synchronization creates reasonable agreement at large scales between ERA-Interim and WRF HRCONUS. However, due to upscale energy transfers, they may not match exactly, since smaller scales can evolve freely as a result of WRF’s internal variability. This nonidealized pairing places more responsibility on the GAN to correctly produce details consistent with the convection-permitting model, thereby testing the extent to which the critic captures this internal variability and enables the generator to produce them.
WRF HRCONUS outputs are not provided on a regular latitude–longitude grid, so they are first regridded to a regular grid using nearest-neighbor interpolation. While the native WRF HRCONUS grid spacing is ∼4 km, WRF HRCONUS is regridded to ∼10 km, resulting in a scale factor of 8 with respect to ERA-Interim’s 80-km grid spacing. Nearest-neighbor interpolation is used deliberately to avoid the unintended smoothing introduced by other methods, such as bilinear interpolation.
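As a sketch of this regridding step, the snippet below performs a nearest-neighbor lookup from a (possibly curvilinear) source grid onto a regular target grid with a KD-tree built on latitude–longitude coordinates. Treating degrees as Euclidean coordinates, and all variable names, are simplifying assumptions rather than the exact procedure used here.

```python
# Hedged sketch of nearest-neighbor regridding from a curvilinear source grid
# onto a regular lat/lon grid; degrees are treated as Euclidean coordinates.
import numpy as np
from scipy.spatial import cKDTree


def regrid_nearest(field, src_lat, src_lon, dst_lat, dst_lon):
    """field, src_lat, src_lon: 2D arrays on the source (WRF) grid;
    dst_lat, dst_lon: 1D coordinates of the regular target (~10 km) grid."""
    tree = cKDTree(np.column_stack([src_lat.ravel(), src_lon.ravel()]))
    dlat, dlon = np.meshgrid(dst_lat, dst_lon, indexing="ij")
    _, idx = tree.query(np.column_stack([dlat.ravel(), dlon.ravel()]))
    return field.ravel()[idx].reshape(dlat.shape)
```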
In addition to the LR u10 and v10 ERA-Interim fields, five additional LR fields from reanalysis products are used as covariates. The additional covariates and motivations for their use are as follows:
-
Convective available potential energy (CAPE) is selected for its influence on wind conditions in convective systems.
-
Topography is a coarse digital elevation map, selected for its role in influencing wind speed and direction.
-
Land–sea fraction indicates the ocean-to-land fraction of a coarse grid and influences wind patterns around coastlines.
-
Surface roughness length determines the (generally heterogeneous) strength of surface drag on the flow.
-
Surface pressure plays a role in the surface wind momentum budget through the pressure gradient.
These are provided to the modified generator as extra channels. Due to the unavailability of CAPE in ERA-Interim at 6-hourly time steps, CAPE from ERA5 (Hersbach et al. 2020), interpolated from the native 30-km grid spacing to 80 km, is used instead. ERA5 and ERA-Interim represent the same historical atmospheric conditions, so it is assumed that any mismatches introduced are small between both WRF and ERA5 as well as between ERA-Interim and ERA5.
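As an illustration of how these covariates enter the network, the sketch below stacks the seven LR fields into a multichannel input tensor for a single time step. Field names, ordering, and the absence of any normalization here are assumptions for illustration only.

```python
# Hedged sketch of assembling the seven LR covariates into the generator's
# multichannel input for one time step; names and ordering are illustrative.
import numpy as np
import torch


def build_lr_input(u10, v10, cape, topography, land_sea, roughness, sfc_pressure):
    """Stack LR fields (each a 16 x 16 NumPy array) into a (7, 16, 16) tensor."""
    channels = np.stack(
        [u10, v10, cape, topography, land_sea, roughness, sfc_pressure], axis=0
    )
    return torch.from_numpy(channels).float()
```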
Within the WRF HRCONUS domain, SR models are developed for three subregions with different climatological conditions as follows:
-
The western region, which covers southern British Columbia, Washington State, and Oregon, is characterized by complex topography that includes mountainous terrain and complex shorelines.
-
The central region, which covers North and South Dakota, as well as Minnesota and northern Iowa, southern Manitoba, and the southwestern part of Ontario, has a continental climate, with large lakes and relatively frequent mesoscale convective features.
-
The southeast region, which includes Florida, Cuba, and adjacent waters, is subject to tropical cyclones and frequent mesoscale convective features.
Figure 1 shows a representative instance of the wind speed field over the WRF HRCONUS domain. Each of the three subregions contains 16 × 16 LR grid points and 128 × 128 HR grid cells.
c. Experiment 1: Frequency separation
In deterministic GAN SR, the objective of the MAE content loss in the generator’s objective function is to produce single realizations that look like the conditional median, which drives the outputs to appear smooth. Simultaneously, the goal of the adversarial loss is to ensure that single realizations are drawn from the entire distribution of possible realizations instead of just the conditional median, so it encourages generating possible arrangements of fine-scale features. A challenge with deterministic GAN SR is that the content loss compares fine-scale features from different realizations of the generated and “true” fields, while the adversarial loss compares the distributions of the generated and “true” fields using the critic. It follows that the content/adversarial loss can be viewed as implicitly oppositional (not adversarial) because differences in physically realizable fine-scale features (which are made possible by the adversarial loss) are penalized by the content loss. While training GAN SR models, one can view the two terms as existing in “tension” with one another.
In their work, Fritsche et al. (2019) recognized that this tension in SR tasks can be addressed by delegating spatial frequencies in the images to select terms in the objective functions. The resulting approach, called frequency separation (FS), separates the spatial frequencies of the HR fields into high- and low-frequency pairs, applying adversarial loss and content loss to each frequency range, respectively.
The concept behind FS is to use the generator’s MAE content loss to encourage realization convergence at low frequencies in the fields rather than across the entire range of image frequencies, as in typical SR configurations. As we have discussed, encouraging high-frequency realization convergence is not always appropriate for images or weather and climate fields, as high-resolution features are not determined uniquely by low-resolution ones. In FS, high frequencies are isolated and provided to the generator’s adversarial loss (i.e., the critic), which strives for distributional convergence between training and generated data rather than individual realizations. This approach was found by Fritsche et al. (2019) to yield perceptually improved results with images and is considered in this study to evaluate its effectiveness for wind fields.
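A minimal sketch of FS as applied here is given below: the HR fields are split into low- and high-frequency parts with an N × N moving-average (box) filter, the MAE content loss is applied to the low frequencies, and the adversarial loss is applied to the high frequencies. The box filter, odd N, and the placement of the weight α are assumptions for illustration; the critic would correspondingly be trained on high-pass-filtered real and generated fields.

```python
# Hedged sketch of frequency separation (FS): box-filter split of the HR
# fields, with the content loss on low frequencies and the adversarial loss
# on high frequencies only.
import torch
import torch.nn.functional as F


def low_pass(x, n):
    """N x N moving-average filter applied channel-wise to (B, C, H, W); n odd."""
    kernel = torch.ones(x.shape[1], 1, n, n, device=x.device) / (n * n)
    return F.conv2d(x, kernel, padding=n // 2, groups=x.shape[1])


def fs_generator_loss(critic, hr_real, hr_fake, n=5, alpha=50.0):
    content = F.l1_loss(low_pass(hr_fake, n), low_pass(hr_real, n))
    high_freq_fake = hr_fake - low_pass(hr_fake, n)
    adversarial = -critic(high_freq_fake).mean()   # high frequencies only
    return alpha * content + adversarial
```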
Summary of hyperparameters. Following Gulrajani et al. (2017), we define one GAN epoch as updating the critic once every minibatch (while the generator is updated every fifth minibatch), but define one pure CNN epoch as updating the generator once every minibatch.
d. Experiment 2: Partial frequency separation
There is an interesting yet potentially subtle difference in how SR objective functions can be used for stochastic versus deterministic SR. This difference motivates a second experiment we conduct on our deterministic GANs related to FS, which we call “partial” frequency separation. If multiple realizations are generated, as is done in stochastic SR, the fine scales of the “true” fields can be compared to the ensemble median of the generated realizations using the content loss. As the resulting ensemble median tends to suppress fine-scale features, differences on common scales would be more strongly penalized than differences in fine-scale features between the individual realizations. Furthermore, generated individual realizations would not be encouraged to look like the conditional median. Just like in regular FS, penalizing differences on common scales is a desirable outcome of applying the content loss. Such an approach is taken by Harris et al. (2022) for stochastic SR precipitation fields.
For deterministic GAN SR, while we sample from the distribution of HR fields conditioned on the LR fields, for a given network the same sample is always drawn for the same conditioning fields. To mimic the approach of Harris et al. (2022) in a deterministic setting, low-frequency filters can be applied to suppress fine-scale features in the HR fields instead of computing the ensemble median from several realizations. As done in the FS GANs, the content loss from stochastic SR can be mimicked in deterministic GAN SR by delegating low frequencies to the MAE so that common scales are more strongly penalized. However, unlike the FS GANs, in partial FS the adversarial loss is applied to all frequencies instead of just the high frequencies. We emphasize that we are only mimicking stochastic SR because applying a low-frequency filter to an HR field is not the same as estimating the actual ensemble median. A practical benefit of partial FS is that it can significantly reduce graphics processing unit (GPU) memory requirements (and training time) because the critic network does not need to evaluate several ensemble members.
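Continuing the sketch above, partial FS differs only in which frequencies the adversarial term sees: the content loss is still computed on low-pass-filtered fields, but the critic evaluates the full, unfiltered generated fields. The low_pass helper and parameter names carry over from the FS sketch and remain illustrative assumptions.

```python
# Hedged sketch of partial FS, reusing low_pass() from the FS sketch above:
# content loss on low-pass-filtered fields (mimicking an ensemble median),
# adversarial loss on the full, unfiltered fields.
def partial_fs_generator_loss(critic, hr_real, hr_fake, n=5, alpha=50.0):
    content = F.l1_loss(low_pass(hr_fake, n), low_pass(hr_real, n))
    adversarial = -critic(hr_fake).mean()          # all frequencies
    return alpha * content + adversarial
```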
e. Experiment 3: Low-resolution covariates
As a separate analysis from the frequency-separation experiments, we narrow our focus to the LR covariates to investigate their impact on the performance of the GANs considered in this study. We pay particular attention to how the LR covariates influence the spatial frequency structure. This analysis is organized into two streams that explore 1) how differences between ERA-Interim and WRF HRCONUS may influence GAN performance and 2) what role the additional physically relevant covariates play in generating spatial structures. The details are described below, with results presented in section 4.
1) Idealized covariates
Two additional non-FS GANs are trained using only u10 and v10 fields as LR covariates. One GAN is conditioned with ERA-Interim, without the additional covariates, and the other uses artificially coarsened (by a scale factor of 8) WRF HRCONUS HR wind components. The idealized pairing of original HR and coarsened WRF fields emulates approaches common to both the computer vision and climate literature (Ledig et al. 2017; Singh et al. 2020; Sha et al. 2020; Leinonen et al. 2020; Stengel et al. 2020; Kumar et al. 2021; Wang et al. 2021).
2) Additional covariates
As a further analysis, we compare the spectra of fields produced by the non-FS GANs with all seven covariate fields to the spectra produced by the GAN with only ERA u10 and v10. This is a simple experiment intended to evaluate the collective effect of including these additional covariates; however, it does not illuminate the importance of individual covariates.
To explore how sensitive SR models are to individual covariates, an experiment is devised to randomly shuffle individual covariate fields of the already-trained non-FS GAN and measure across wavenumbers the resulting changes in the spectra. The relative difference (RD) between the power spectra of the modified [Ps(k)] and unmodified baseline [Pb(k)] is quantified, and the resulting variance at each wavenumber for each perturbed covariate is computed. The above approach is known as single-pass permutation importance and is a common feature-importance experiment (McGovern et al. 2019). Further details about our implementation are provided in the supplemental material.
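The sketch below illustrates one pass of this permutation-importance procedure: a single LR covariate channel is shuffled across time steps, the trained generator is re-run, and the relative difference RD(k) = [Ps(k) − Pb(k)]/Pb(k) of the radially averaged spectra is recorded. The generator interface and the rapsd() helper (one possible implementation is sketched in section 4) are assumptions.

```python
# Hedged sketch of single-pass permutation importance: shuffle one LR
# covariate channel across time, rerun the generator, and compute the
# relative difference of the radially averaged power spectra.
import numpy as np
import torch


def permutation_importance(generator, lr_fields, baseline_spectrum, channel, rapsd):
    """lr_fields: (T, C, 16, 16) tensor of LR covariates for T time steps."""
    shuffled = lr_fields.clone()
    perm = torch.randperm(shuffled.shape[0])
    shuffled[:, channel] = shuffled[perm, channel]        # break the time pairing
    with torch.no_grad():
        generated = generator(shuffled).cpu().numpy()
    spectrum = np.mean([rapsd(f) for f in generated[:, 0]], axis=0)  # e.g., u10 channel
    return (spectrum - baseline_spectrum) / baseline_spectrum       # RD(k)
```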
f. Model training
The critic network is adopted from the SRGAN discriminator of Ledig et al. (2017), but without batch normalization layers (Gulrajani et al. 2017) for compatibility with WGAN-GP. We use a generator network similar to SRGAN, but with additional LR inputs and one additional upsampling block. Network details are shown in Fig. 3.
The models are trained using a single NVIDIA GTX 1060 GPU with 6 GB of video RAM (VRAM). The Adam optimizer, a form of stochastic gradient descent (Kingma and Ba 2017), is used to train the models. Of the 18 991 fields in the 2000–13 WRF HRCONUS simulation, 80% are used for training (15 704 fields) and 20% (3287 fields comprising years 2000, 2006, and 2010) are used for testing and evaluation. The network parameters are not updated using any data from the year 2000, 2006, or 2010 test set. We would like to emphasize that most modeling decisions have been made prior to training by adopting hyperparameter and model choices from existing work. As such, we do not specify a separate validation set, since we do not perform hyperparameter tuning. Instead, we focus on performing sensitivity analyses with our experimental configurations.
Each GAN takes approximately 48 h to complete 1000 passes (i.e., epochs) through the entire training set. Hyperparameter values, which are summarized in Table 1, are mostly taken directly from those recommended in the existing literature (e.g., Gulrajani et al. 2017). All results are produced by models after reaching the full 1000 training epochs.
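A minimal sketch of the update schedule implied by the epoch definition in section 3c is shown below: the critic is updated on every minibatch and the generator on every fifth. The critic_loss and generator_loss functions refer to the objective-function sketch in section 3a; the data loader and Adam optimizers are assumed to be constructed elsewhere, and all names are illustrative.

```python
# Hedged sketch of one GAN "epoch": critic updated every minibatch, generator
# every fifth minibatch (cf. Table 1). critic_loss/generator_loss are the
# earlier sketches; loader and optimizers are assumed to exist.
def train_one_epoch(loader, generator, critic, g_opt, c_opt, n_critic=5):
    for i, (lr_fields, hr_real) in enumerate(loader):
        hr_fake = generator(lr_fields)

        c_opt.zero_grad()
        critic_loss(critic, hr_real, hr_fake.detach()).backward()
        c_opt.step()

        if (i + 1) % n_critic == 0:   # generator updated every fifth minibatch
            g_opt.zero_grad()
            generator_loss(critic, hr_real, generator(lr_fields)).backward()
            g_opt.step()
```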
While rescaling
For each of the regions, three different values of N are used for the frequency separation filters.
4. Results
a. Experiment 1: Frequency separation
1) Visual quality of generated fields
A representative set of the u10 and v10 wind field maps produced using the SR models on 1200 UTC 5 October 2000 is shown in Figs. 4 and 5, respectively. While there are broad consistencies at large scales between WRF HRCONUS and ERA-Interim, differences in the locations of certain structures are present in the fields, illustrating the nonidealized nature of the pairing and the internal variability of WRF. For example, the negative u10 wind feature in the northeast part of the southeast region is oriented slightly differently in WRF HRCONUS and ERA-Interim.
Fields produced by the CNN do not contain the fine-scale variability seen in the WRF HRCONUS field for the southeast and central regions. This fact is most obvious in the southeast region, where fine-scale convective features are not produced by the pure CNN, and the fields are too smooth. When comparing individual realizations, we cannot conclude that the lack of detail is worse (given that “smooth” realizations may be physically realizable). However, this “smoothness” is observed systematically over several realizations (Figs. S1–S6) and over each region, demonstrating that the CNN is limited in the spatial structures it can produce. The objective function based on content loss alone does not allow the CNN to “hallucinate” fine-scale features because it constrains realization pairs to be similar, rather than sampling from distributions, as when the adversarial loss is included. This effect can also be observed in the central region, although to a lesser extent. The west region shows generally good agreement between the CNN and WRF HRCONUS, in particular, with the inland topographical features. However, the CNN for the west region is lacking some of the sharp and well-defined topographical details found in WRF HRCONUS.
GANs with and without FS show little perceptual difference for the west and central regions; both show an improvement in fine spatial structure over the pure CNN. In the GAN with
2) Evolution of performance metrics while training
Several metrics were recorded during the training of the SR models. Among them are the MAE, MSE, multiscale structural similarity index (MS-SSIM), and Wasserstein distance. The MS-SSIM, taken from the computer vision field, compares images across multiple spatial scales and is designed to correlate well with perceived image quality in image reconstruction tasks.
Figure 6 shows the evolution of the MAE and Wasserstein distance on the test sets during the training process, while Fig. S7 shows the training evolution on both the test and training sets for the MAE, MSE, MS-SSIM, and Wasserstein distance. While training, the MAE, MSE, and MS-SSIM are computed by comparing pairs of realizations in minibatches, while the Wasserstein distance is approximated between the entire set of realizations in the minibatches.
The MAE over the test data reaches a minimum value after ∼200 epochs for the southeast and central regions. For the pure CNNs, this minimum is more pronounced and occurs earlier than in the GANs because the generator is updated more frequently (see Table 1). The presence of a local minimum in the MAE is indicative of the overfitting of the generator in these two regions, since no minimum is found in the evolution of the MAE on the training set (Fig. S7). At late epochs, the test set MAE does not grow substantially, so overfitting as measured by the MAE is minimal. Interestingly, evidence of overfitting is only present in the MAE/MSE, not the Wasserstein distance. No evidence of overfitting is found in the west region. The slightly larger MAE at later epochs for the southeast and central regions may be indicative of the generator learning large-scale differences between ERA-Interim and WRF HRCONUS in the training set only, while not generalizing to the test set. The topic of large-scale differences will be discussed in more detail later in this section.
For the evolution of the MAE, MSE, and to some extent MS-SSIM (Fig. S7), there is a robust ordering of the performance of the different SR models across the regions. The best-performing model in the MAE sense is the pure CNN, followed by the
By construction, pure CNNs minimize the MAE and as such are expected to perform the best among all models with regard to this metric. The smoothness of the conditional median reduces the impact of the double-penalty problem on the generated fields. The CNN performs similarly well in terms of MSE and MS-SSIM, both of which are also performance measures of realization convergence (Sampat et al. 2009). For the FS GANs, when there is less smoothing, the optimization problem tends to be more similar to that in the pure CNNs, since a large range of frequencies are delegated to the content loss. This explains why the
3) Radially averaged power spectra
Quantifying the perceptual quality of generated realizations using metrics that align with “true” realizations is challenging due to the double-penalty problem. Instead, we shift our focus to evaluating the statistical characteristics of the generated fields. Specifically, we analyze the spatial correlation structures of HR wind fields using power spectra, as suggested by previous studies (Singh et al. 2020; Stengel et al. 2020; Kashinath et al. 2021). Each wind component is assessed separately, and we calculate the radially averaged power spectral density (RAPSD), following the naming convention in Harris et al. (2022) after their application of RAPSD to SR precipitation fields. The RAPSD ratios of SR to WRF HRCONUS are depicted in Fig. 7.
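A possible implementation of the RAPSD for a single 128 × 128 field is sketched below; the binning convention (integer radial wavenumbers, excluding the mean) and normalization are assumptions, and the ratios shown in Fig. 7 would be formed by dividing the SR spectra by the WRF HRCONUS spectra.

```python
# Hedged sketch of a radially averaged power spectral density (RAPSD) for a
# single 2D field; binning and normalization conventions are illustrative.
import numpy as np


def rapsd(field):
    """Radially averaged power spectral density of a 2D field."""
    psd = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    ny, nx = field.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(y - ny // 2, x - nx // 2).astype(int)   # radial wavenumber bins
    kmax = min(ny, nx) // 2
    return np.array([psd[r == k].mean() for k in range(1, kmax)])

# The ratios in Fig. 7 correspond to rapsd(generated) / rapsd(wrf_hrconus).
```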
The RAPSD of the FS GANs shows how the results of the optimization problem change when the spatial frequencies are separated. The CNN shows strong low-variance biases at fine scales, consistent with the oversmooth quality of the generated fields. Each FS GAN shows similar power at large scales to the pure CNN (since the range of wavenumbers for both is optimized using the MAE) until the frequencies are separated and the spectra break from the pure CNN and join the non-FS GAN spectra at higher wavenumbers (since these wavenumbers were optimized using the adversarial component). The non-FS GAN spectra show that the SR models more accurately match WRF HRCONUS when the adversarial loss is provided with the full range of frequencies, despite the increase in pixelwise errors when doing so.
b. Experiment 2: Partial frequency separation
We build on the FS GAN result that the variability of the fields generated by SR models improves when the adversarial loss receives all frequencies, and we consider a partial FS GAN. We also adjust the hyperparameter α, the relative weight of the content and adversarial loss terms, to examine its role in optimizing the variability of the generated fields. The resulting partial FS GAN RAPSD ratios, with α = 50 and α = 500, are presented in Fig. 8.
The partial FS GAN spectra are largely similar to the non-FS GAN spectra; however, there are more fluctuations in the RAPSD ratio for α = 50. For example,
Increasing the value of α for the central and west regions reduces the variability in the RAPSD and enhances the small-scale power. The results of the partial FS GANs, with different values of α and N, show that the RAPSD is strongly influenced by the role of the content loss in the optimization, depending on the region of focus.
c. Comparing and summarizing model performance
We summarize model performance using a range of different metrics (Table 2). The MAE and MS-SSIM are included, as well as biases in the mean, standard deviation, and 90th percentile of the wind speed for each region. The spatial maps of the biases in these statistics are reported in Figs. S8–S16 for the pure CNN, FS GANs, the non-FS GAN, and partial FS GANs (for both values of α). Wind speed bias is selected as a stringent test since low-variance biases in the wind components result in biases in the mean of the wind speed. Systematic low-variance biases are represented as negative spatial averages for these metrics, consistent with the general low-power bias in the RAPSD.
Table 2. Summary of metrics computed for each SR model over test data. The MAE and MS-SSIM are computed for all pixels in u10 and v10 fields. Spatial averages of differences (i.e., biases) between the generated and “true” climatological mean, standard deviation, and 90th percentile of wind speed over the test set are reported as μ, σ, and Q90, respectively. The MSA ξ is also included, as is the MAEP between 6-hourly WRF time steps. Bold text indicates the optimally performing model for the given metric.
Table 2 summarizes the result that the adversarial loss introduces fine-scale variability that contributes to the double-penalty problem. Similar to Harris et al. (2022), we find the MS-SSIM not very useful for evaluating the generated fields. We hypothesize that the MS-SSIM may be more sensitive to noise and artifacts (common to images) rather than to the potentially noncongruent fine-scale convective features of the wind patterns, like those that contribute to the double-penalty problem. The spatial means of the wind speed biases show that GANs generally outperform the pure CNN, especially for the standard deviation and 90th percentile. This is not a surprising result; the GANs are introducing variability consistent with WRF into the generated fields and are better able to represent these climatological statistics.
The MAE of persistence (MAEP), which summarizes the difference between realizations (from either WRF or the SR models) at each time step and those 6 h prior, is also provided in Table 2. The values of the MAE are lower than those of the MAE of persistence for WRF (MAEPWRF), and the persistence of the SR models on the 6-hourly time scale (MAEPSR) is consistent with MAEPWRF. This provides additional evidence that the models are producing realistic results.
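As a small illustration, the persistence baseline can be computed as below for a time-ordered stack of 6-hourly fields; the array layout is an assumption.

```python
# Hedged sketch of the persistence baseline: MAEP compares each field with
# the field one 6-h time step earlier.
import numpy as np


def maep(fields):
    """MAE of persistence for a time-ordered (T, H, W) stack of 6-hourly fields."""
    return np.mean(np.abs(fields[1:] - fields[:-1]))
```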
d. Experiment 3: Low-resolution covariates
1) Idealized covariates
Using idealized coarse covariates results in training evolutions of the MAE and MSE without signs of overfitting (not shown); both metrics plateau at late epochs. This supports the hypothesis that, with ERA-Interim covariates, the SR models overfit large-scale differences in the locations of spatial features between ERA-Interim and WRF HRCONUS in the training set. Figure 9 shows the RAPSD ratio (relative to WRF HRCONUS) of u10 and v10 wind fields with the idealized GAN (GAN with coarsened WRF) and the GAN trained with u10 and v10 ERA-Interim covariates (GAN with ERA).
Due to large-scale differences between WRF HRCONUS and ERA-Interim, a low-power bias at small wavenumbers is found in Fig. 9 for GANs using ERA-Interim covariates. This bias almost entirely vanishes using the idealized covariates. At high frequencies, there is less of a difference seen between the coarsened WRF GAN and ERA GAN with just u10 and v10.
2) Additional covariates
Figure 10 shows the power spectra of the non-FS GAN from earlier, with all seven covariate fields (GAN with ERA+, where “+” is shorthand to indicate that these GANs were trained with the additional covariates discussed in section 3), and compares them to the spectra of the GAN with only ERA u10 and v10. The GAN with ERA+ shows a minor improvement in this bias at large scales and a significant improvement at the small scales. This demonstrates that the additional covariates are robustly improving the generator’s ability to produce high spatial frequency information consistent with WRF for both the u10 and v10 fields for each region.
The results of the single-pass permutation-importance experiment are summarized in Fig. 11. We find that the generated u10 and v10 fields are most sensitive to shuffling of the corresponding coarse u10 and v10 covariates, especially at large scales. This is not surprising, given that the large scales in WRF HRCONUS are synchronous with ERA-Interim, so the networks are making direct use of this large-scale information. This same result can be seen for each region and each wind component.
Interestingly, the degree to which each wind component is sensitive to the other (i.e., the sensitivity of HR v10 fields to LR u10 fields, and vice versa) is less than that seen for some of the other covariates, such as CAPE in the southeast region. CAPE is highly correlated with convective processes that dominate the high spatial frequencies of the generated wind components (Houze 2004) in the southeast region, where fine-scale convective features are common. Surface pressure also plays a moderate role in the southeast region, possibly correlating with weather systems accompanied by small-scale variability caused by squall lines or multicell storms.
The central region shows sensitivity to u10, v10, and CAPE. As in the southeast region, convective features are also common in the central region, which explains the observed sensitivity to CAPE. The frequency of convective systems in WRF HRCONUS in the central region is not expected to be quite as high as in the southeast region, resulting in a slightly lower relative sensitivity to CAPE.
For the west region, generated fields are most sensitive to u10 and v10 but are not sensitive to CAPE or surface pressure. This result can also be understood in the context of the west region’s climatology, where convective storms are rare. The strong influences of land–sea boundaries and complex topography are better predicted by the coarse u10 and v10 fields themselves.
5. Discussion
a. Overview
GANs for SR show impressive capabilities in generating fine-scale variability that is similar in distribution to the “true” variability simulated by the convection-permitting model. Extensive dynamical downscaling by convection-permitting models is operationally infeasible due to computational costs, which makes statistical downscaling using GANs a very attractive and practical alternative, and work to date bodes well for their operational feasibility.
We present three experiments aimed at understanding our research questions regarding the SR objective function and LR covariates in the SR configurations. While we do not propose a “best” performing model, we design experiments that provide potential avenues for fine-tuning future models. A discussion of these three experiments and future avenues is included below.
b. Experiment 1: Frequency separation, and experiment 2: Partial frequency separation
When using FS, results vary for metrics that evaluate the convergence of realizations depending on the value of N. The non-FS GAN captures the variability of WRF HRCONUS well because the adversarial loss considers variability across all scales, unlike FS GANs, which only capture certain scales. Although FS GANs demonstrate a lower MAE, they sacrifice perceptual realism and variability. Spatial correlation measures like RAPSD better reflect the perceptual accuracy of generated fields.
Partial FS can substantially influence the generated spectra for each region (Table 2) by mimicking the use of the conditional mean/median in the content loss in stochastic approaches. As such, compared to stochastic GANs (that require an ensemble), partial FS can significantly decrease the computational requirements. Notably, partial FS GANs can generate more large-scale variability when the content loss is applied to low frequencies only. Potential improvements provided by partial FS GANs compared to non-FS GANs depend on the relative weighting of the content and adversarial losses; further study is required to determine an optimal approach.
The results of both FS experiments helped to address research question (i): how do the generated outputs change when we manipulate the objective functions taken directly from the computer vision literature? Namely, the experiments demonstrated a tension between the convergence of realizations and convergence of distributions that is at the core of the generator’s objective function in SR and also offer useful directions for future tuning to ease this tension.
c. Experiment 3: Low-resolution covariates
1) Low-power biases
There exists a stubborn low-power bias between WRF HRCONUS and the non-FS GAN (GAN with ERA+) (Fig. 10) at small wavenumbers, even with additional covariates and an adversarial loss computed over all frequencies. We provide evidence that this low-power bias originates from differences in the placement of large-scale spatial features between ERA-Interim and WRF HRCONUS and show that idealized coarsening of the HR fields to produce the LR fields for training dramatically reduces it. This result suggests that since the HR-generated fields inherit large-scale information from the LR fields, differences between ERA-Interim and WRF HRCONUS can manifest in the generated fields as low-power biases at large scales.
In existing studies, stochastic methods have demonstrated differences in the performance of the GANs when trained with idealized (Leinonen et al. 2020) and nonidealized (Price and Rasp 2022; Harris et al. 2022) LR–HR pairs. Similar to the present study, Harris et al. (2022) performed an idealized experiment using covariates derived from the HR fields, which revealed that idealized covariates improve the calibration and continuous ranked probability score (CRPS). Moreover, Harris et al. (2022) identified large-scale differences between the LR–HR pairs as the main limiting factor in GAN performance and saw a large improvement in the CRPS and calibration when the GANs ingested coarsened HR fields from the same target HR dataset. For deterministic approaches, nonidealized pairs limit the large-scale variability in the generated fields, as represented by a low-power bias at large scales in the RAPSD. We hypothesize that this difference in deterministic settings comes from differences in internal variability between ERA-Interim and WRF, causing the misalignment of features on shared scales. Conversely, the absence of significant biases in power at large scales for the idealized pairs can be attributed to the lack of internal variability between the LR and HR fields on shared scales. To the extent that systematic differences exist between the nonidealized HR and LR pairs, a bias-correction methodology, similar to Price and Rasp (2022), or inclusion of HR topographical information during training could improve skill.
The results of performing the nonidealized versus idealized experiment, particularly when examining the power spectra, help address research question (ii): what capacity do the networks have to deal with nonidealized LR–HR pairs?
2) Additional covariates
To our knowledge, this study represents the first application of SR to climate fields that demonstrates the importance of including additional physically relevant LR variables, beyond LR versions of the HR target variables, as covariates. Including these additional covariates reduces low-power bias across all frequencies, particularly at high frequencies (Fig. 10). This finding is important for designing SR models for climate and weather fields, as the GANs mirror the physical relationships between covariates and target variables.
The methodology we use to assess the importance of specific covariates finds results consistent with important physical processes in the regions considered. This methodology assumes the independence of covariates. One caveat of the above method is that if the covariates are not independent, certain combinations of covariates may be more important than individual covariates. It should also be noted that the method does not apply to invariant covariates like surface roughness length, topography, or a land–sea mask, which may play a crucial role in representing high-frequency variability. Future work could explore the elimination of these covariates in analyzing the RAPSD of new GANs to determine their importance and also perform multipass permutation importance to measure combined effects (McGovern et al. 2019). Furthermore, HR versions of invariant covariates could also significantly improve model skills. GAN SR with wind variables could leverage HR topography, surface roughness, and a land–sea mask (although a land–sea mask might provide redundant information to topographical information). A similar approach has been used previously in Harris et al. (2022) and quite successfully in Sha et al. (2020). In the generator network, HR invariant information could be embedded in a set of additional LR covariates (Harris et al. 2022), or it could also be concatenated after the upsampling blocks in the generator network for dimensional consistency. One interesting follow-up study to ours would be to measure the importance of invariant HR covariates for GAN SR.
The results of including additional covariates, as well as the power spectrum breakdown of their importance, address research question (iii): what role do select LR covariates play in super-resolved near-surface wind fields?
d. Future GANs
Our deterministic approach examines how the low-power bias is represented in the spectra of the generated fields and, importantly, how appropriately chosen covariates and hyperparameters can reduce this bias. We hypothesize that one additional way to reduce RAPSD power biases may be to introduce a loss/regularization term that directly evaluates the RAPSD of batches of generated and “true” fields. Such an approach was introduced in Kashinath et al. (2021) by replacing the adversarial loss entirely with this spectral loss. This approach performs well, but its implementation in Kashinath et al. (2021) comes with caveats: it was tested with idealized covariates and a scale factor of 4. Moreover, leaving out the adversarial loss may inhibit the ability of the model to sample diverse realizations from the conditional distribution. Rather than replacing components of the generator’s loss function entirely, a spectral loss could serve an auxiliary role that supports realistically generated variability across spatial scales in addition to the content and adversarial components.
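As a purely hypothetical illustration of such an auxiliary term (not something implemented in this study), a batch-mean spectral penalty might look like the sketch below; rapsd_torch denotes an assumed differentiable RAPSD built on torch.fft that returns a (batch, wavenumber) tensor of spectra.

```python
# Hypothetical sketch of an auxiliary spectral penalty: compare batch-mean
# log spectra of generated and "true" fields; would be added to, not replace,
# the content and adversarial terms. rapsd_torch is an assumed helper.
import torch


def spectral_penalty(hr_fake, hr_real, rapsd_torch):
    ps_fake = rapsd_torch(hr_fake).mean(dim=0)   # batch-mean generated spectrum
    ps_real = rapsd_torch(hr_real).mean(dim=0)   # batch-mean "true" spectrum
    return torch.mean((torch.log(ps_fake) - torch.log(ps_real)) ** 2)
```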
Given that the generated fields suffer from these low-power biases, future avenues might also explore 1) the extent to which SR model behavior might change if trained using LR fields from different climate models, or if trained with one LR model and evaluated with a different LR model, and 2) how well our SR models will extrapolate when provided with LR fields from future climate model projections.
6. Conclusions
The SRGANs used within the WGAN-GP framework show promise and feasibility for the downscaling of multivariate wind fields, indicating their potential usefulness in a variety of practical applications. While we do not propose “best” performing models, our results shed light on the SR task when applied to climate fields.
Using SR for statistical downscaling means generating fine-scale features that could exist, rather than those which may have actually existed, resulting in challenges in selecting appropriate error metrics. Using RAPSD, we demonstrated scale-dependent biases in the generated variability, allowing us to more easily compare the SR models. We emphasize that selecting appropriate metrics is vital for the comparability of future SR approaches.
The role of covariates was closely examined in the RAPSD of the generated fields. Specifically, we showed that internal variability in HR fields can result in a low-power bias at large scales in generated fields. We also showed that carefully chosen covariates help reduce low-power biases at all spatial scales, but especially in the fine-scale features. To further investigate the role of our chosen covariates, a sensitivity experiment was conducted to demonstrate the value added to spatial structures in the generated fields by each of the covariates. The importance of the covariates differed between regions, consistent with the relative importance of CAPE in producing small-scale wind variability.
Additionally, we adopted frequency separation (FS) from the computer vision field. While FS did not result in a more skillful GAN (in terms of the power spectra of the generated fields), it did reveal how modifying the generator’s objective function changed the RAPSD of the generated fields. Specifically, we demonstrate the important role the adversarial loss has in generating variability across spatial scales. For deterministic SR, we discuss how the content loss and adversarial loss are implicitly oppositional in their objective to express variability. While stochastic SR can mitigate this problem using ensembles, we introduce partial FS as a simpler and more computationally efficient option. We also show evidence of the sensitivity of generated spatial structures to the hyperparameters α and N in partial FS. A central result of this analysis is the importance to the generated fields of the two kinds of loss terms used in the objective function, adversarial and content, and the scales to which these are applied.
Acknowledgments.
A.H.M. acknowledges the support of the NSERC (Funding Reference RGPIN-2019-204986). We acknowledge helpful discussions with David John Gagne II and Mercè Casas-Prat. We would also like to express our gratitude to the three reviewers for their constructive feedback, which greatly enhanced our manuscript.
Data availability statement.
We have organized our code using two avenues to reproduce our results. 1) The underlying code used for training the Wasserstein GAN models is archived at https://doi.org/10.5281/zenodo.7604242, and 2) due to the challenge of using complicated software with dynamic dependencies, we made efforts to isolate our software environment and data so that our analysis is reproducible. As such, we developed a Docker image (nannau/annau-2023) hosted on Docker Hub and include the corresponding documentation with the source code at https://doi.org/10.5281/zenodo.7604267 (Merkel 2014).
REFERENCES
Adewoyin, R. A., P. Dueben, P. Watson, Y. He, and R. Dutta, 2021: TRU-NET: A deep learning approach to high resolution prediction of rainfall. Mach. Learn., 110, 2035–2062, https://doi.org/10.1007/s10994-021-06022-6.
Arjovsky, M., S. Chintala, and L. Bottou, 2017: Wasserstein GAN. arXiv, 1701.07875v3, https://doi.org/10.48550/arXiv.1701.07875.
Ban, N., J. Schmidli, and C. Schär, 2015: Heavy precipitation in a changing climate: Does short-term summer precipitation increase faster? Geophys. Res. Lett., 42, 1165–1172, https://doi.org/10.1002/2014GL062588.
Bessac, J., A. H. Monahan, H. M. Christensen, and N. Weitzel, 2019: Stochastic parameterization of subgrid-scale velocity enhancement of sea surface fluxes. Mon. Wea. Rev., 147, 1447–1469, https://doi.org/10.1175/MWR-D-18-0384.1.
Cannon, A. J., 2008: Probabilistic multisite precipitation downscaling by an expanded Bernoulli–gamma density network. J. Hydrometeor., 9, 1284–1300, https://doi.org/10.1175/2008JHM960.1.
Cheng, J., J. Liu, Z. Xu, C. Shen, and Q. Kuang, 2020: Generating high-resolution climate prediction through generative adversarial network. Procedia Comput. Sci., 174, 123–127, https://doi.org/10.1016/j.procs.2020.06.067.
Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828.
Dong, C., C. C. Loy, K. He, and X. Tang, 2014: Learning a deep convolutional network for image super-resolution. Computer Vision–ECCV 2014, D. Fleet et al., Eds., Lecture Notes in Computer Science, Vol. 8692, Springer International Publishing, 184–199, https://doi.org/10.1007/978-3-319-10593-2_13.
Dong, C., C. C. Loy, K. He, and X. Tang, 2015: Image super-resolution using deep convolutional networks. arXiv, 1501.00092v3, https://doi.org/10.48550/arXiv.1501.00092.
Frei, C., J. H. Christensen, M. Déqué, D. Jacob, R. G. Jones, and P. L. Vidale, 2003: Daily precipitation statistics in regional climate models: Evaluation and intercomparison for the European Alps. J. Geophys. Res., 108, 4124, https://doi.org/10.1029/2002JD002287.
Fritsche, M., S. Gu, and R. Timofte, 2019: Frequency separation for real-world super-resolution. arXiv, 1911.07850v1, https://doi.org/10.48550/arXiv.1911.07850.
Gardner, M. W., and S. R. Dorling, 1998: Artificial neural networks (the multilayer perceptron)—A review of applications in the atmospheric sciences. Atmos. Environ., 32, 2627–2636, https://doi.org/10.1016/S1352-2310(97)00447-0.
Goodfellow, I. J., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, 2014: Generative adversarial networks. arXiv, 1406.2661v1, https://doi.org/10.48550/arXiv.1406.2661.
Gulrajani, I., F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, 2017: Improved training of Wasserstein GANs. arXiv, 1704.00028v3, https://doi.org/10.48550/arXiv.1704.00028.
Harris, C. R., and Coauthors, 2020: Array programming with NumPy. Nature, 585, 357–362, https://doi.org/10.1038/s41586-020-2649-2.
Harris, L., A. T. T. McRae, M. Chantry, P. D. Dueben, and T. N. Palmer, 2022: A generative deep learning approach to stochastic downscaling of precipitation forecasts. J. Adv. Model. Earth Syst., 14, e2022MS003120, https://doi.org/10.1029/2022MS003120.
Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803.
Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, https://doi.org/10.1029/2004RG000150.
Innocenti, S., A. Mailhot, A. Frigon, A. J. Cannon, and M. Leduc, 2019: Observed and simulated precipitation over northeastern North America: How do daily and subdaily extremes scale in space and time? J. Climate, 32, 8563–8582, https://doi.org/10.1175/JCLI-D-19-0021.1.
Judt, F., 2018: Insights into atmospheric predictability through global convection-permitting model simulations. J. Atmos. Sci., 75, 1477–1497, https://doi.org/10.1175/JAS-D-17-0343.1.
Karpathy, A., 2022: CS231n convolutional neural networks for visual recognition. Stanford University, https://cs231n.github.io/convolutional-networks.
Kashinath, K., and Coauthors, 2021: Physics-informed machine learning: Case studies for weather and climate modelling. Philos. Trans. Roy. Soc., A379, 20200093, https://doi.org/10.1098/rsta.2020.0093.
Kharin, V. V., F. W. Zwiers, X. Zhang, and G. C. Hegerl, 2007: Changes in temperature and precipitation extremes in the IPCC ensemble of global coupled model simulations. J. Climate, 20, 1419–1444, https://doi.org/10.1175/JCLI4066.1.
Kingma, D. P., and J. Ba, 2017: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.
Kopparla, P., E. M. Fischer, C. Hannay, and R. Knutti, 2013: Improved simulation of extreme precipitation in a high-resolution atmosphere model. Geophys. Res. Lett., 40, 5803–5808, https://doi.org/10.1002/2013GL057866.
Krizhevsky, A., I. Sutskever, and G. E. Hinton, 2012: ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, Lake Tahoe, NV, NeurIPS, 1097–1105, https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
Kumar, B., R. Chattopadhyay, M. Singh, N. Chaudhari, K. Kodari, and A. Barve, 2021: Deep learning–based downscaling of summer monsoon rainfall data over Indian region. Theor. Appl. Climatol., 143, 1145–1156, https://doi.org/10.1007/s00704-020-03489-6.
Ledig, C., and Coauthors, 2017: Photo-realistic single image super-resolution using a generative adversarial network. arXiv, 1609.04802v5, https://doi.org/10.48550/arXiv.1609.04802.
Leinonen, J., D. Nerini, and A. Berne, 2020: Stochastic super-resolution for downscaling time-evolving atmospheric fields with a generative adversarial network. IEEE Trans. Geosci. Remote Sens., 59, 7211–7223, https://doi.org/10.1109/TGRS.2020.3032790.
Li, G., X. Zhang, A. J. Cannon, T. Murdock, S. Sobie, F. Zwiers, K. Anderson, and B. Qian, 2018: Indices of Canada’s future climate for general and agricultural adaptation applications. Climatic Change, 148, 249–263, https://doi.org/10.1007/s10584-018-2199-x.
Liu, C., and Coauthors, 2017: Continental-scale convection-permitting modeling of the current and future climate of North America. Climate Dyn., 49, 71–95, https://doi.org/10.1007/s00382-016-3327-9.
Lucas-Picher, P., D. Caya, R. de Elía, and R. Laprise, 2008: Investigation of regional climate models’ internal variability with a ten-member ensemble of 10-year simulations over a large domain. Climate Dyn., 31, 927–940, https://doi.org/10.1007/s00382-008-0384-8.
Maraun, D., and Coauthors, 2010: Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. Rev. Geophys., 48, RG3003, https://doi.org/10.1029/2009RG000314.
McGovern, A., R. Lagerquist, D. J. Gagne II, G. E. Jergensen, K. L. Elmore, C. R. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.
Merkel, D., 2014: Docker: Lightweight Linux containers for consistent development and deployment. Linux J., 2014, 2, https://www.linuxjournal.com/content/docker-lightweight-linux-containers-consistent-development-and-deployment.
Michaelides, S. C., 2008: Precipitation: Advances in Measurement, Estimation and Prediction. 1st ed. Springer, 540 pp., https://doi.org/10.1007/978-3-540-77655-0.
Mirza, M., and S. Osindero, 2014: Conditional generative adversarial nets. arXiv, 1411.1784v1, https://doi.org/10.48550/arXiv.1411.1784.
Morley, S. K., T. V. Brito, and D. T. Welling, 2018: Measures of model performance based on the log accuracy ratio. Space Wea., 16, 69–88, https://doi.org/10.1002/2017SW001669.
Prein, A. F., and Coauthors, 2016: Precipitation in the EURO-CORDEX 0.11° and 0.44° simulations: High resolution, high benefits? Climate Dyn., 46, 383–412, https://doi.org/10.1007/s00382-015-2589-y.
Price, I., and S. Rasp, 2022: Increasing the accuracy and resolution of precipitation forecasts using deep generative models. Proc. Int. Conf. on Artificial Intelligence and Statistics, Online, PMLR, 10 555–10 571, https://proceedings.mlr.press/v151/price22a.html.
Rasmussen, R., and C. Liu, 2017: High resolution WRF simulations of the current and future climate of North America. UCAR/NCAR, accessed 2 January 2017, https://rda.ucar.edu/datasets/ds612.0/.
Rossa, A., P. Nurmi, and E. Ebert, 2008: Overview of methods for the verification of quantitative precipitation forecasts. Precipitation: Advances in Measurement, Estimation and Prediction, Springer, 419–452.
Sampat, M. P., Z. Wang, S. Gupta, A. C. Bovik, and M. K. Markey, 2009: Complex wavelet structural similarity: A new image similarity index. IEEE Trans. Image Process., 18, 2385–2401, https://doi.org/10.1109/TIP.2009.2025923.
Schlager, C., G. Kirchengast, J. Fuchsberger, A. Kann, and H. Truhetz, 2019: A spatial evaluation of high-resolution wind fields from empirical and dynamical modeling in hilly and mountainous terrain. Geosci. Model Dev., 12, 2855–2873, https://doi.org/10.5194/gmd-12-2855-2019.
Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2020: Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part I: Daily maximum and minimum 2-m temperature. J. Appl. Meteor. Climatol., 59, 2057–2073, https://doi.org/10.1175/JAMC-D-20-0057.1.
Shocher, A., N. Cohen, and M. Irani, 2018: “Zero-shot” super-resolution using deep internal learning. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Salt Lake City, UT, IEEE, 3118–3126, https://openaccess.thecvf.com/content_cvpr_2018/html/Shocher_Zero-Shot_Super-Resolution_Using_CVPR_2018_paper.html.
Sillmann, J., V. V. Kharin, X. Zhang, F. W. Zwiers, and D. Bronaugh, 2013: Climate extremes indices in the CMIP5 multimodel ensemble: Part 1. Model evaluation in the present climate. J. Geophys. Res. Atmos., 118, 1716–1733, https://doi.org/10.1002/jgrd.50203.
Singh, A., B. White, A. Albert, and K. Kashinath, 2020: Downscaling numerical weather models with GANs. 19th Conf. on Artificial Intelligence for Environmental Science, Boston, MA, Amer. Meteor. Soc., 2B.7, https://ams.confex.com/ams/2020Annual/meetingapp.cgi/Paper/365409.
Sobie, S. R., and T. Q. Murdock, 2017: High-resolution statistical downscaling in southwestern British Columbia. J. Appl. Meteor. Climatol., 56, 1625–1641, https://doi.org/10.1175/JAMC-D-16-0287.1.
Song, J.-H., Y. Her, S. Shin, J. Cho, R. Paudel, Y. P. Khare, J. Obeysekera, and C. J. Martinez, 2020: Evaluating the performance of climate models in reproducing the hydrological characteristics of rainfall events. Hydrol. Sci. J., 65, 1490–1511, https://doi.org/10.1080/02626667.2020.1750616.
Stengel, K., A. Glaws, D. Hettinger, and R. N. King, 2020: Adversarial super-resolution of climatological wind and solar data. Proc. Natl. Acad. Sci. USA, 117, 16 805–16 815, https://doi.org/10.1073/pnas.1918964117.
Stephens, G. L., and Coauthors, 2010: Dreary state of precipitation in global models. J. Geophys. Res., 115, D24211, https://doi.org/10.1029/2010JD014532.
Torma, C., F. Giorgi, and E. Coppola, 2015: Added value of regional climate modeling over areas characterized by complex terrain—Precipitation over the Alps. J. Geophys. Res. Atmos., 120, 3957–3972, https://doi.org/10.1002/2014JD022781.
Wang, F., D. Tian, L. Lowe, L. Kalin, and J. Lehrter, 2021: Deep learning for daily precipitation and temperature downscaling. Water Resour. Res., 57, e2020WR029308, https://doi.org/10.1029/2020WR029308.
Wang, X., and Coauthors, 2018: ESRGAN: Enhanced super-resolution generative adversarial networks. arXiv, 1809.00219v2, https://doi.org/10.48550/arXiv.1809.00219.
Whiteman, C. D., 2000: Mountain climates of North America. Mountain Meteorology: Fundamentals and Applications, C. D. Whiteman, Ed., Oxford University Press, 11–22, https://doi.org/10.1093/oso/9780195132717.003.0008.
Wilby, R. L., and T. M. L. Wigley, 1997: Downscaling general circulation model output: A review of methods and limitations. Prog. Phys. Geogr., 21, 530–548, https://doi.org/10.1177/030913339702100403.
Zhang, Y., Y. Tian, Y. Kong, B. Zhong, and Y. Fu, 2018: Residual dense network for image super-resolution. arXiv, 1802.08797v2, https://doi.org/10.48550/arXiv.1802.08797.
Zhang, Y., Z. Zhang, S. DiVerdi, Z. Wang, J. Echevarria, and Y. Fu, 2020: Texture hallucination for large-factor painting super-resolution. Computer Vision–ECCV 2020, A. Vedaldi et al., Eds., Lecture Notes in Computer Science, Vol. 12352, Springer, 209–225, https://doi.org/10.1007/978-3-030-58571-6_13.
Zhu, X., L. Zhang, L. Zhang, X. Liu, Y. Shen, and S. Zhao, 2020: GAN-based image super-resolution with a novel quality loss. Math. Probl. Eng., 2020, 5217429, https://doi.org/10.1155/2020/5217429.