1. Introduction
A large portion of research into the physical Earth system relies on general circulation models (GCMs). GCMs are used for weather prediction and for climate studies on time scales of days to years, although their seamless use for both is still rare. They have been central to informing policy makers through the Intergovernmental Panel on Climate Change (IPCC 2013).
The IPCC report relies on a multiplicity of different GCMs, developed by research centers spread across the world. The latest phase of the Coupled Model Intercomparison Project (CMIP6; Eyring et al. 2016) includes output from tens of GCMs. These differ in how the Earth system is discretized, in their approximations for and inclusion of subgrid-scale processes, and even in their integration schemes, all of which introduce uncertainty that is reflected in the spread of results across models. The climate system is highly chaotic, so these modeling differences can lead to detectable differences in GCM outputs (Wang et al. 2014; Maher et al. 2018).
GCM outputs are used to assess the risks associated with different weather events, such as droughts, heatwaves, floods, and wildfire risk. To quantify these risks, researchers must decide which GCMs are fit for this purpose. Persistent biases in climate models exist (Eyring et al. 2021) and need to be addressed by either selecting the best models or by correcting biases. The choice of models can make a quantitative and qualitative difference to estimated climate risks (e.g., Kirchmeier-Young et al. 2017; Herger et al. 2018). A GCM that has a low bias relative to observations in one variable and one geographical location may have a large bias in another (Ridder et al. 2021). This makes compound risks (Leonard et al. 2014), such as simultaneous heatwaves and drought, even harder to assess as they involve multiple variables and may involve multiple geographical areas. Simultaneous crop failure in multiple regions of high agricultural output is one such compound risk that has received recent attention (Gaupp et al. 2020). Particularly persistent biases exist in climate model simulated rainfall patterns, with many models exhibiting a double intertropical convergence zone (ITCZ) and misplaced monsoons (Wang et al. 2020; Tian and Dong 2020).
GCMs are improving, but in the interim, we must use their outputs effectively to understand our current climate and the potential effects of global warming. Therefore, we must devise methods to optimally correct biases in the output of GCMs (Bellprat et al. 2019).
Recently, modern artificial neural network architectures have been developed that can be used for bias correction and statistical downscaling (Moghim and Bras 2017; Steininger et al. 2020; Le et al. 2020; Han et al. 2021; Wang et al. 2021) and that offer some theoretical advantages over classical techniques. In this paper, we will focus on unpaired image-to-image translation networks to perform bias correction between a GCM and observations.
These neural networks use layers of convolutional filters that are applied across multiple climate variables simultaneously. This gives these architectures the capacity to “see” and, therefore, correct both spatial and cross-variable relations simultaneously. This is an advantage over current methods. These networks are presented in more detail later.
Further, quantile mapping (QM) does not use the values of other variables at the same spatial location. Therefore, relationships between variables are severed and may become physically unrealistic. For example, if a pixel is translated from a dry day in the GCM to a wet day in the equivalent observations, the relationships between surface temperature and precipitation (Trenberth and Shea 2005) may not be preserved.
These are limitations that modern neural network architectures could be ideally suited to improve upon. Both of these features would be required to accurately correct for the presence and strength of teleconnections (Yuan et al. 2018; Stan et al. 2017). A specific example is shown in Maraun et al. (2017), where extreme precipitation in Piura, Peru, only occurs during El Niño events in observations. These extreme rainfall events did not occur in the GCM outputs, so when QM was applied to the GCM data the extreme rainfall events no longer co-occurred with El Niño. El Niño events are primarily characterized by warm sea surface temperatures in the equatorial Pacific Ocean (Timmermann et al. 2018), and so conditional bias correction that takes spatial and cross-variable information into account is required.
Previous work has attempted to solve these issues with classical techniques, but incompletely. In Levy et al. (2013), the authors propose optimally stretching simulated precipitation fields to match precipitation patterns in observations. They found that when precipitation features were stretched onto the correct places, human-attributable change in precipitation could be more easily detected, while misplaced features led to poor fingerprints of the expected climate change. However, this technique uses monthly average precipitation and does not allow the use of daily data, which is important for studying extreme risks. It also cannot easily be extended to multiple variables. In Cannon (2018), the authors propose a way to generalize QM to N dimensions. This allows the user to transform multiple variables in multiple spatial locations using daily data. However, as they note, their method cannot be extended to many spatial points in many variables before it becomes computationally limited and prone to overfitting.
We note that this problem of bias correcting GCM outputs is more generally the problem of mapping between two empirical distributions of multichannel images without having any one-to-one corresponding pairs. This is precisely the problem description of unpaired image-to-image translation (Liu and Tuzel 2016).
Some previous work has applied similar neural network architectures to this problem. François et al. (2021) apply an architecture based on a cycle-consistent generative adversarial network (CycleGAN; Zhu et al. 2017) to bias correct temperature and precipitation data in a region over Paris, but only for the winter season. Pan et al. (2021) develop a similar network architecture to bias correct precipitation data over the contiguous United States. They use the dynamical variables (sea level pressure, geopotential height, and specific humidity at 500 hPa) to aid the translations but do not bias correct these variables themselves.
In this paper, we extend and complement these previous studies. We first introduce a different image translation method that has not been applied in the climate domain. We apply this method to bias correct simulations from a GCM of the South Asian monsoon across five variables, using a GCM that has substantial biases in the spatial pattern of the monsoon, and using reanalysis data as a target for the correction. This is a larger geographical region and uses more variables than previous studies, and this region has a known and very significant physical bias in the GCM. We evaluate the method's use by comparing its performance with quantile mapping, and we explore using these two techniques in conjunction to better represent the monsoon and other extreme events in the region. We analyze the translation results focusing on relationships between different variables as well as spatial relationships. Additional details on various parts of the paper are provided in the appendix.
2. A prime use for unpaired image-to-image translation networks
These architectures, such as Unsupervised Image-to-Image Translation (UNIT; Liu et al. 2017), CycleGAN (Zhu et al. 2017), and AlignFlow (Grover et al. 2020), are neural networks that incorporate the architecture of GANs (Goodfellow et al. 2014). The aim of these techniques is to translate images between two domains without requiring paired examples.
These networks were initially used to translate between images of summer and winter driving scenes, and between simulated city driving scenes and real city driving scenes (Liu et al. 2017; Zhu et al. 2017). Further, Hao et al. (2021) use these networks to make video game images look more photorealistic. In Shrivastava et al. (2017), the authors train a related refiner model to create more realistic-looking images of eyes from simulated 3D models. These are all examples of statistically bias correcting spatial fields with multiple variables, as we aim to do with GCMs.
To map between GCM outputs and observations, the ability to translate without pairs is absolutely necessary. Imagine an idealized GCM that captures all of the physics of the Earth system almost perfectly but, due to discretization, accumulates errors when integrating a climate state forward in time. If it were initialized with the exact observed climate state ψGCM(0) = ψobs(0), we should expect that after approximately two weeks (Lorenz 1969; Zhang et al. 2019) the simulation will have diverged from the observations as a result of chaos. Therefore, image pairs from the simulation and observations collected beyond two weeks have no relation to each other. If the GCM is not perfect and hence has a bias, then the initial state matched to observations ψGCM(0) = ψobs(0) may be one that the GCM does not visit often, and the simulation will drift toward its own preferred states. This means we cannot use the image pairs collected during the first two weeks to bias correct the rest of the GCM simulation; that is, we cannot use the pairs [ψGCM(t), ψobs(t)] to learn the correction.
This lack of corresponding pairs makes bias correction distinctly different than perfect prognosis and statistical downscaling (Maraun et al. 2010), so we cannot use the machine learning techniques employed therein. In these settings, predictions are made for short lead times and so gathering corresponding pairs of GCM predictions and observations is possible. Although the application of deep learning to these tasks and statistical nowcasting (Steininger et al. 2020; Vaughan et al. 2022; Ravuri et al. 2021; Sønderby et al. 2020) has been developing recently, it would not be valid to simply use this learned mapping for times after around two weeks.
We note that Wang and Tian (2022) attempt to bias correct temperatures from GCMs by assuming synchronized pairs between observations and simulations over long time frames. However, they admit this to be a limitation of their proposed method and that the dynamics of the GCM may be distorted. They also do not thoroughly test their assumption.
a. A brief overview of unpaired image-to-image translations networks
The UNIT network is composed of multiple subcomponents. A more precise breakdown and diagram of the components in terms of layers is available in section b of the appendix, and also in the original paper (Liu et al. 2017). The main backbone of the network is composed of two variational autoencoders (VAEs) (Kingma and Welling 2013). The encoder EX is a convolutional neural network that maps images from domain X to a region in latent space Z, that is, p(z|x) = EX(x). There is a similar encoder EY that maps images in domain Y into a latent space that, if trained correctly, should be the same as Z. The main purpose of this network is to learn this shared latent space of the two image domains and learn functions to map into and out of it to the two domains. Decoders GX and GY map vectors from the shared latent space Z to images in the domains X and Y, respectively, that is, x = GX(z) and y = GY(z).
These KL divergence losses encourage the images to have a Gaussian distribution when encoded into the shared latent space. During training, a random perturbation is added to the latent space encoding so that z = EX(x) ↦ EX(x) + η, where η is a random vector sampled from a multivariate Gaussian distribution with unit diagonal variance. This so-called reparameterization trick is required to train VAEs (Kingma and Welling 2013) and ensures that the learned latent space Z is smooth; that is, images that are similar in domain X will be mapped to a similar region of the latent space Z.
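As a minimal illustration, the perturbation step can be sketched in a few lines of NumPy; the linear `encode` function here is a hypothetical toy stand-in for the convolutional encoder EX, not the actual UNIT component:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Toy linear "encoder" standing in for the convolutional E_X (hypothetical).
    return x @ W  # mean of p(z|x)

def reparameterize(mu, rng):
    # Reparameterization trick: z = E_X(x) + eta, eta ~ N(0, I)
    # (unit diagonal variance, as described in the text).
    eta = rng.standard_normal(mu.shape)
    return mu + eta

x = rng.standard_normal((4, 8))   # batch of 4 flattened "images"
W = rng.standard_normal((8, 3))   # toy encoder weights
mu = encode(x, W)
z = reparameterize(mu, rng)       # perturbed latent codes used during training
```

Because the gradient passes through the deterministic sum while the noise is sampled externally, the encoder remains trainable by backpropagation.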
b. What do bias corrected simulations represent?
It is important to consider what the translated sequence FGCM→obs[ψGCM(t)] represents, and we refer to Ehret et al. (2012) for a more thorough discussion. The ψGCM(t) and thus FGCM→obs[ψGCM(t)] are driven dynamically by the time evolution operator of the GCM, but each image FGCM→obs[ψGCM(t)] is mapped individually to look like it came from the observations. If we set ψGCM(0) = Fobs→GCM[ψobs(0)] and evolve each of these states forward in time with their own time evolution operator, then we would still expect that FGCM→obs[ψGCM(t)] ≠ ψobs(t). Although each image from FGCM→obs[ψGCM(t)] should be realistic in comparison with observations, the entire sequence need not be. An example of this inconsistency could come when bias correcting wind velocities and precipitation. The wind velocities may be debiased such that it should change the velocity of a precipitation system, but in consecutive frames the precipitation pattern may not reflect this.
So, when using GCM output, we must assume that the GCM has realistic time dynamics, which would not be drastically affected by our moderate bias correction. This also emphasizes that bias correction is no cure for a poorly performing GCM and that capturing the physics of the climate system is key.
3. Bias correcting the South Asian monsoon
We use GCM data from the Climate of the Twentieth Century Plus (C20C+) Project (Folland et al. 2014), particularly from the HadGEM3-A-N216 GCM (Ciavarella et al. 2018), run under a historical recreation scenario (All-Hist/est1). In this dataset, the ocean temperatures are prescribed to their observational estimates and emissions are set to the historical record. Therefore, only the atmosphere component of the model is run.
We attempt to bias correct the HadGEM3 historical recreation data with respect to the ERA5 reanalysis data (Hersbach et al. 2020). Both of these datasets are daily data fields. We choose this over monthly data because extreme events like heatwaves and floods occur on the time scale from days to weeks.
We limit the geographical extent to the area bounded by 8°S–30°N, 44°–121°E. This region was chosen to capture the South Asian monsoon. Many GCMs, including HadGEM3, have a large bias in simulating the South Asian monsoon (Bollasina and Ming 2013; Ashfaq et al. 2017), which is usually placed too far south over the Indian Ocean and leaves the landmass drier than reality (shown in Fig. 1). This bias remains when the model is run with prescribed sea surface temperatures. We chose this region as a hard case to solve for bias correction. We consider the daily 2-m mean, minimum, and maximum temperature; accumulated precipitation; and mean 500-hPa geopotential height at all grid locations. The first four of these variables are important for climate impact studies and are of primary interest. The geopotential height is a dynamical variable that can aid the accuracy of bias correction of the other variables (Pan et al. 2021), although we allow the network to correct this variable also.
Mean precipitation (mm day−1) during peak monsoon months (June–September) for each dataset and their bias with respect to ERA5.
Citation: Artificial Intelligence for the Earth Systems 2, 2; 10.1175/AIES-D-22-0031.1
Before bias correction, we conservatively regrid (Jones 1999) the ERA5 data onto the coarser grid of the HadGEM3 data, which has a resolution of 0.56° latitude and 0.83° longitude. This gave a region of size 68 × 92 grid points. We also limit the two datasets to the time period in which they overlap, 1979–2013 inclusive, giving us 35 complete years and approximately 13 000 daily fields. We split these data into train and test sets, using the odd-numbered years for training each method and the even-numbered years for testing. This extreme split, using nearly 50% of the data for testing, was required to perform adequate analysis of the spatial and cross-variable statistics of the results. All figures to follow are based on the test set, which was not used in training or choosing parameters for any method.
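The alternating-year split can be sketched in a couple of lines (the year range matches the 1979–2013 overlap described above):

```python
import numpy as np

years = np.arange(1979, 2014)         # the 35 overlapping years, inclusive
train_years = years[years % 2 == 1]   # odd-numbered years -> training
test_years = years[years % 2 == 0]    # even-numbered years -> testing
```

In practice the daily fields would then be partitioned by membership of their timestamp's year in each set.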
We applied quantile mapping [Eq. (1)] to both datasets by fitting a 100-point empirical cumulative distribution function to each grid point for each month of the year and for each variable. This equates to ∼75 million points estimated from the data in order to perform the bias correction with quantile mapping. For comparison, the UNIT translation network had ∼38 million parameters. The form of QM we use is generally applied in situations in which the data are stationary, that is, where there is no distribution shift due to global warming. The neural network approach we choose similarly assumes stationarity. Since we limit our datasets to only a 35-yr period and split alternate years into train and test data, we do not expect this assumption to have a significant impact on the results we show, especially as we are considering daily variability rather than monthly averages, so changes in the mean are relatively small in comparison with the variability. The extensions that are added to quantile mapping to allow it to approximately debias data that are not stationary (Cannon et al. 2015) could be applied to the UNIT neural network approach with little modification.
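As a concrete, heavily simplified sketch of this per-grid-point procedure, the NumPy example below fits 100-point empirical CDFs to a model and an observation series at a single grid point and maps the model values into observation space; the gamma-distributed series are illustrative stand-ins, not the actual data:

```python
import numpy as np

def fit_ecdf(x, n_quantiles=100):
    # Fit an n-point empirical CDF (100 points, as in the text).
    probs = np.linspace(0, 1, n_quantiles)
    return probs, np.quantile(x, probs)

def quantile_map(x_model, q_model, q_obs, probs):
    # Map model values to observation space via the two empirical CDFs.
    p = np.interp(x_model, q_model, probs)   # F_model(x)
    return np.interp(p, probs, q_obs)        # F_obs^{-1}(F_model(x))

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 3.0, size=5000)     # stand-in "ERA5" series at one grid point
model = rng.gamma(1.5, 2.0, size=5000)   # stand-in biased "HadGEM3" series

probs, q_model = fit_ecdf(model)
_, q_obs = fit_ecdf(obs)
corrected = quantile_map(model, q_model, q_obs, probs)
```

In the full method this fit is repeated independently for every grid point, calendar month, and variable, which is what produces the ∼75 million estimated points quoted above.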
In the following section, we compare these translations and the original datasets. In particular, we study in detail how they address the large biases in simulations of the South Asian monsoon. To test this more broadly, we also consider their performance in correcting a variety of other extreme events of societal relevance to the region.
4. Results
Early in our experimentation with the UNIT network, we noticed that it was biased against producing extreme values in its translations. This limitation may be because UNIT inherits features of its architecture from GANs, which are known to underrepresent the tails of the distributions they model (Dionelis et al. 2020; Bau et al. 2019; Arora and Zhang 2017). Notably, Ravuri et al. (2021) found that using a conditional GAN for precipitation nowcasting also reduced extreme values in precipitation fields. UNIT also inherits features from VAEs, whose architecture blurs image reconstructions (Snell et al. 2017) and thus reduces extreme values. These extreme values are important for climate research, so we propose to improve the UNIT network performance by following it with QM. In the combined UNIT+QM method, we take the trained UNIT network and use it to translate the entire HadGEM3 dataset; then we train QM between this dataset and the ERA5 dataset. The same trained UNIT instance is used in both the UNIT and the UNIT+QM results presented.
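The composition can be sketched as follows; the `unit_translate` function here is a hypothetical stand-in for the trained network (a simple affine map that under-disperses, mimicking the behavior described), not the actual UNIT model:

```python
import numpy as np

def unit_translate(x):
    # Hypothetical stand-in for the trained UNIT network: shifts the series
    # toward the "observations" but under-disperses, as described in the text.
    return 0.8 * (x - x.mean()) + 300.0

def fit_qm(src, ref, n_quantiles=100):
    # Fit quantile mapping between a source and a reference series.
    probs = np.linspace(0, 1, n_quantiles)
    q_src, q_ref = np.quantile(src, probs), np.quantile(ref, probs)
    return lambda x: np.interp(np.interp(x, q_src, probs), probs, q_ref)

rng = np.random.default_rng(0)
era5 = rng.normal(300.0, 5.0, 5000)      # stand-in observation series
hadgem3 = rng.normal(297.0, 7.0, 5000)   # stand-in biased model series

unit_out = unit_translate(hadgem3)       # step 1: translate with (stand-in) UNIT
qm = fit_qm(unit_out, era5)              # step 2: fit QM between UNIT output and ERA5
unit_qm_out = qm(unit_out)               # combined UNIT+QM correction
```

The QM step restores the marginal dispersion that the (stand-in) translation removed, which is exactly the role it plays in the combined method.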
a. South Asian monsoon
Figure 1 shows the means and biases of precipitation in the peak monsoon months (June–September) in ERA5, HadGEM3, and the UNIT-corrected data. This figure shows the key bias present in HadGEM3’s simulated monsoon. It also shows the effects of the reduction in extremes generated by UNIT, where the average precipitation is reduced everywhere. The results for QM and UNIT+QM are not shown. By the definition of QM, their biases under this plot would be near zero (nonzero only due to sampling error between train and test data), and their means would be the same as the ERA5 field. We confirmed this by plotting.
Figure 2 shows three nonconsecutive days from the HadGEM3 validation data and their bias corrections using the three different methods. These examples show that UNIT can coherently bias correct the structure of the fields, and they show the consequences of QM lacking this ability. In example day 1, UNIT removes the precipitation in HadGEM3 that is characteristic of the HadGEM3 monsoon bias. It also removes the associated minor depression in geopotential height. QM maintains the spatial structure of the fields but simply adjusts the intensity; hence the QM field is left with a muted form of the HadGEM3 monsoon precipitation: the biased spatial structure is still intact, and the field produced is unrealistic. In example day 2, UNIT removes the monsoon precipitation feature along with a stronger geopotential height anomaly. In this case, the dynamical conditions simulated by the GCM are truly biased and do not occur in the observations, so it makes sense to bias correct the dynamical feature of geopotential height as well as the precipitation. This is an advantage over previous studies that assume the dynamical features are correct and use them without scrutiny to aid bias correcting other variables. Example day 3 shows a case where UNIT adds a precipitation system over the Bay of Bengal, showing that the UNIT network can add meteorological features as well as remove them.
Three samples of daily fields from the HadGEM3 data and their bias corrections with three different methods. The fields shown for each day are the accumulated precipitation (plotted with power-law color intensity), 500-hPa geopotential height, and 2-m temperature. The geopotential height and temperature are shown as anomalies and have had their monthly means subtracted. Each plot in a given row shares the color bar plotted at the end of the row.
In Fig. 3, we show the distribution of daily accumulated precipitation at a single grid point located on the southern tip of India. The UNIT, UNIT+QM, and QM datasets are created by taking these HadGEM3 data and debiasing them with each method.
Distributions of daily accumulated precipitation from a single grid point located at the southern tip of India (centered at 8.6°N, 77.9°E). Five distributions for the same location are plotted from the five different data sources. The normalized histogram is on an x scale of the fourth root of precipitation so that the differences in the distributions can be seen. It is also clipped at an upper value of 75 mm day−1.
This figure shows the typical behavior of these bias correction methods. There is a significant bias between the ERA5 and HadGEM3 datasets, with HadGEM3 showing much more light drizzle (Takahashi et al. 2021) with precipitation values below 0.1 mm day−1. Since this is a marginal distribution, QM is expected to perform well. If the datasets were infinitely large, then by definition, QM would perform perfectly here (Maraun et al. 2017). The same is true of the combined UNIT+QM method. The UNIT translation performs reasonably well on the marginal distribution, although it reduces the occurrence of both high and low extremes and shifts more of the distribution toward ERA5’s central peak. Note that QM is explicitly designed to match these one-dimensional distributions at each grid point, while UNIT matches the distributions only as an emergent feature of optimizing its loss function.
Figure 4 shows the joint distribution between temperature and precipitation at the same grid location at the southern tip of India (we also sampled several others and found similar qualitative results). This plot shows that although UNIT underdisperses the data, it can capture the shape of the joint distribution well. We see that quantile mapping does not correct the joint distribution between precipitation and temperature well. UNIT+QM performs the best of the three techniques and captures both the joint distribution and the dispersion of the data. The UNIT step in UNIT+QM corrects the correlation structure, while the QM part corrects the marginal distributions.
Joint distributions of daily accumulated precipitation and daily mean temperature from a single grid point located at the southern tip of India (centered at 8.6°N, 77.9°E) for the five different datasets. Each point is from a single day, and the contours show the joint density as estimated by kernel density estimation. The x scale in all subplots is the fourth root of precipitation, chosen so that the differences in the distributions can be seen. It is also clipped at an upper value of 75 mm day−1.
Continuing this analysis, Fig. 5 shows maps of the JS divergence of precipitation and temperature at all grid points. UNIT performed poorly as a result of how it underdisperses the data, whereas UNIT+QM performed the best. UNIT+QM appears to adjust for the correlations in the data while also maintaining the dispersion.
Maps of estimated JS divergence [Eq. (7)] of the HadGEM3, UNIT, UNIT+QM, and QM datasets with respect to the ERA5 dataset. The JS divergence is calculated at each grid point between the two-dimensional precipitation and temperature distributions using 20 bins in each dimension.
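The per-grid-point JS divergence estimate described in the caption can be sketched as follows for a single grid point, assuming shared histogram bin edges for the two empirical samples (20 bins in each dimension, as stated):

```python
import numpy as np

def js_divergence_2d(x1, y1, x2, y2, bins=20):
    # JS divergence between two empirical 2D distributions (e.g., precipitation
    # and temperature at one grid point), estimated from 2D histograms.
    # Shared bin edges make the two histograms directly comparable.
    xe = np.linspace(min(x1.min(), x2.min()), max(x1.max(), x2.max()), bins + 1)
    ye = np.linspace(min(y1.min(), y2.min()), max(y1.max(), y2.max()), bins + 1)
    p, _, _ = np.histogram2d(x1, y1, bins=[xe, ye])
    q, _, _ = np.histogram2d(x2, y2, bins=[xe, ye])
    p, q = p.ravel() / p.sum(), q.ravel() / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # wherever a > 0, b = (a + other)/2 > 0, so this is safe
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
x, y = rng.normal(size=2000), rng.normal(size=2000)
d_same = js_divergence_2d(x, y, x, y)            # identical samples -> 0
d_diff = js_divergence_2d(x, y, x + 10, y + 10)  # disjoint samples -> ln 2
```

Repeating this calculation at every grid point, with ERA5 as one of the two samples, yields maps like those in Fig. 5.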
To assess the spatial structures of the data and translations, we start locally, looking at the joint distribution of precipitation at two neighboring grid points. We choose the same familiar grid point on the southern tip of India and the cell directly eastward of it. Again, in Fig. 6, we see that UNIT+QM performs the best, improving the structure of the joint distribution and also the dispersion of the data. UNIT reduces the dispersion, and QM does not adequately correct the correlation structure.
Joint distributions of daily accumulated precipitation at a grid point located at the southern tip of India (centered at 8.6°N, 77.9°E) and its neighboring cell directly eastward (centered at 78.8°E) for the five different datasets. Each point is from a single day, and the contours show the joint density as estimated by kernel density estimation. The x and y scale in all subplots is the fourth root of precipitation. This was chosen so that the differences in the distributions can be seen. It is also clipped at an upper value of 75 mm day−1.
We wish to assess whether the daily fields produced by each method of bias correction are realistic and whether the spatial structures observed are biased. To do so, we use the mean structural similarity index measure (SSIM) (Wang et al. 2004), which is a metric designed to measure how structurally similar two images are. SSIM takes into account the difference in mean values, the contrast in the image between high and low values, and the structure in the image, that is, whether high and low values are in the same locations. As is common, we use the mean SSIM, and these comparisons are made for each grid point using an 11 × 11 pixel Gaussian sliding window. Then the average is taken over the image. A larger value shows a closer match between images, with a maximum possible similarity of 1.
To assess whether the datasets produce fields that are spatially realistic, we propose running the following computation: Take one daily field of a single variable from a dataset χ ∈ {HadGEM3, UNIT, UNIT+QM, QM} and find the daily field from the ERA5 test dataset that is most similar using SSIM. This is a similar matching algorithm to flow-analogs as developed in Yiou et al. (2007). Store this optimal value of SSIM. Repeat for all days in dataset χ.
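This best-match search can be sketched as follows; for brevity the sketch uses a simplified single-window SSIM rather than the 11 × 11 Gaussian sliding-window mean SSIM used in the paper, and the random fields are stand-ins for the actual data:

```python
import numpy as np

def ssim_global(a, b, data_range=1.0):
    # Simplified single-window SSIM combining luminance, contrast, and
    # structure terms; the paper uses the standard windowed mean SSIM.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def best_match_ssim(field, reference_fields):
    # Exhaustive search over the ERA5 test fields for the closest match.
    return max(ssim_global(field, ref) for ref in reference_fields)

rng = np.random.default_rng(0)
era5_test = rng.random((50, 16, 16))   # stand-in library of daily ERA5 fields
field = era5_test[7] + 0.01 * rng.standard_normal((16, 16))  # near-copy of one field
score = best_match_ssim(field, era5_test)
```

A near-copy of a library field scores close to the maximum of 1, while unrelated fields score much lower, which is what makes the best-match distribution informative.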
Then we plot the distribution of these best-match SSIM values. Figure 7 shows the distributions calculated by matching on geopotential height, mean daily temperature, and the fourth root of precipitation. We chose to use the fourth root of precipitation because the raw marginal distribution has a very long tail, and we wish to avoid extreme values dominating the comparison; instead, we are trying to focus on the overall spatial structure. In the figure, the farther a distribution is to the right, the more similar the fields in that dataset were to the ERA5 fields. In the figure, we also include the results of comparing fields from the ERA5 train dataset with the test dataset to set a baseline and aid interpretability. The results in this figure tell us that the UNIT translations produced fields that were closer to ERA5 fields than the other translation methods, with UNIT+QM having the next most similar fields.
Distributions of the SSIM between each dataset field and its most similar ERA5 field, shown for the mean 2-m temperature, mean 500-hPa geopotential height, and the fourth root of daily precipitation.
There are some limitations to this matching method that should be noted and can explain why UNIT outperforms UNIT+QM here. The UNIT translations are underdispersed, as we have seen in previous figures. This means each UNIT field lies more toward the center of the ERA5 distribution. Fields from any dataset χ that lie toward the center of the ERA5 distribution are more likely to find an ERA5 field that matches them closely than fields from χ that are toward the tails of the ERA5 distribution. This is simply because there are more ERA5 fields to choose from in dense regions. The best-match SSIM between a field x and the ERA5 data is positively associated with the density of the ERA5 data pERA(x) around the location x. This explains why UNIT performs best in this analysis. We note that UNIT+QM performed second best even though its distribution is not underdispersed like UNIT's. In section i of the appendix, we perform extra analysis that confirms that UNIT's exaggerated performance in this analysis is likely due to underdispersion.
We repeated the computation above using mean absolute difference and mean square difference as alternative metrics to SSIM and achieved qualitatively similar results.
Furthermore, we examine whether the characteristics of these methods hold when we spatially aggregate the data. We aggregate to a few river basins in this region, using data from the World Bank data catalog (World Bank 2019) to define the basin boundaries. We aggregate to the portion of these basins that lie within the spatial extent of our data, and, therefore, two of the basins are clipped (see section f of the appendix for plotted basin masks).
Figure 8 shows the joint distribution between the spatial mean precipitation and temperature in the Mahanadi River basin. This basin was chosen as there is a big difference between the HadGEM3 and ERA5 joint distributions, and, therefore, the bias correction method has a lot to correct. Debiasing this distribution involves correcting spatial correlations (since it is a spatial aggregate) and cross-variable correlations simultaneously. Once again, we can see that UNIT+QM performs well, correcting the cross-variable correlations and the dispersion.
Joint distributions of the mean daily accumulated precipitation and daily mean temperature aggregated over the Mahanadi River basin for the five different datasets. Each point is from a single day, and the contours show the joint density as estimated by kernel density estimation. The x axis has power-law scaling. This was chosen so that the differences in the distributions can be seen. It is also clipped at an upper value of 75 mm day−1.
In Fig. 9, we plot the joint distribution between the spatial mean of precipitation in the Ganges–Brahmaputra and Indus basins. These are two important basins, both impacted by the South Asian monsoon and geographically separated. This kind of long-range bias correction could be important for assessing the risks of multiple breadbasket failure. None of the methods performed particularly well in this analysis.
Joint distributions of mean daily accumulated precipitation across the Ganges–Brahmaputra and the Indus basin for the five different datasets. Each point is from a single day, and the contours show the joint density as estimated by kernel density estimation. The x axis and y axis have power-law scaling. This was chosen so that the differences in the distributions can be seen. It is also clipped at an upper value of 75 mm day−1.
b. Other extremes
In the second part of the results section, we study the performance of these bias correction techniques for other extreme events of societal relevance in the South Asia region. In particular, we study whether the combined UNIT+QM improves the representation of physical extremes. We consider three cases, testing the cross correlation of different variables and spatial areas.
First, we analyze the relationship between temperature and pressure on the hottest day of each year over a region in central India (8°–28°N, 72°–85°E). We select the day with the highest average maximum daily temperature over this region in each year. South Asia experiences some of the most extreme humid heat on the planet (Raymond et al. 2020), which, combined with high population density and vulnerability to such hazards, poses a severe threat to human health (Im et al. 2017). Furthermore, despite the cooling influence of anthropogenic aerosols over the region, recent events such as the heatwave of 2015 have been amplified by climate change (Wehner et al. 2016). Thus, the understanding of such events is becoming increasingly pertinent.
Figure 10 shows the joint distributions between daily maximum temperature and 500-hPa geopotential height at all grid points across all annual hottest days. In this case, UNIT+QM appears to perform the best, though with only 17 annual hottest days to analyze, this result could be quite sensitive to noise.
Joint distributions of the daily maximum temperature and 500-hPa geopotential height at each grid point over central India, bounded by 8°–28°N, 72°–85°E, on the hottest day of each year (defined by the spatially averaged daily maximum temperature) for the five different datasets. Each point is from a single day and grid point, and the contours show the joint density as estimated by kernel density estimation.
Results of the JS divergence for all of the various extreme distributions are presented in Table 1. In this first case, UNIT+QM performs best, though not significantly better than QM.
JS divergence values for each model relative to ERA5 across three joint distributions of extreme events: temperature and pressure on the hottest days of the year over central India (Fig. 10), temperature and precipitation over the wettest continuous 30 days of each year over the Ganges–Brahmaputra basin (Fig. 11), and the standardized precipitation index in the Ganges–Brahmaputra and Indus River basins (Fig. 12). These JS divergence values are all significantly distinct from each other—see section e of the appendix.
Second, we analyze the relationship between temperature and precipitation over the Ganges–Brahmaputra River basin during the wettest continuous 30-day period of each year. Flooding, driven largely by rainfall excesses, has severe impacts on the region. Between 2000 and 2020, floods caused over USD 100 billion in damages and the deaths of more than 49 000 people—almost one-half of the global flood mortality in the period—according to the disaster database EM-DAT (Guha-Sapir et al. 2014). Although this is largely driven by the behavior of the monsoon, impactful rainfall extremes also occur outside of this season and are also influenced by anthropogenic climate change (Rimi et al. 2019). When widespread flooding occurs, simultaneous extreme heat may result in compounded impacts, such as through disrupted water supplies and water- and insect-borne disease (Levy et al. 2016; Moors et al. 2013).
Figure 11 shows the joint distributions between the 30-day averages of daily maximum temperature and precipitation, for the wettest 30-day period of each year, at all grid points in the Ganges–Brahmaputra basin and for all years.
Joint distributions of the daily maximum temperature and accumulated precipitation at each grid point over the Ganges–Brahmaputra River basin over the wettest 30-day period each year, defined by the spatially averaged accumulated precipitation, for the five different datasets. Each point is from a single grid point, and the contours show the joint density as estimated by kernel density estimation.
Across the basin there are two temperature regimes due to the change in altitude from low-lying Bangladesh and northeast India to the Tibetan Plateau. Over the low-lying warmer region, HadGEM3 shows severe discrepancies in rainfall, likely owing to its poor monsoon representation and its tendency to drizzle. UNIT underdisperses the translated data, and both QM and UNIT+QM perform well. The results of JS divergence (Table 1) show that combining the two techniques captures the reanalysis data most effectively, with UNIT+QM significantly outperforming the other methods.
Figure 12 shows the joint distributions between co-occurring SPI values in each river basin for each model and bias correction system. We note that the data points in this plot originate from a 30-day rolling time window instead of selecting independent 30-day periods. To turn our rolling-window data into independent samples would mean selecting 1/30 of the time indices, that is, indices {30i + s}i∈[0, 1, 2, 3, …], where s is the starting index with 0 ≤ s ≤ 29. However, we are only interested in estimating the joint density, and all starting indices s are equally valid. So, we decide not to filter to independent periods in the density plot and effectively marginalize over s when performing kernel density estimation in the figure. However, we note that the data points are oversampled, so we must avoid overinterpreting point clusters. An independently sampled version of this plot is presented in the appendix (Fig. A8) and shows the same overall structure in each case.
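As a minimal sketch of the index selection described above (the synthetic SPI series and names are ours, not the paper's code), one starting offset s yields one independent subsample, while the density plots pool every offset:

```python
import numpy as np

# Illustrative stand-in for a daily 30-day-window SPI series.
rng = np.random.default_rng(0)
n_days = 365
spi = rng.standard_normal(n_days)

def independent_indices(n_days, s, window=30):
    """Time indices {window * i + s} of non-overlapping windows for offset s."""
    return np.arange(s, n_days, window)

# One starting offset s gives one independent subsample ...
sub = spi[independent_indices(n_days, s=0)]

# ... whereas the density plot pools all offsets, i.e., every rolling value,
# effectively marginalizing over s.
pooled = spi
```

Because every offset s is equally valid, pooling them changes only the sampling density, not the estimated joint distribution.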
Joint distributions of the standardized precipitation index at each grid point over the Ganges–Brahmaputra and Indus River basins using a 30-day time window for the five different datasets. Each point is from a single grid point, and the contours show the joint density as estimated by kernel density estimation.
HadGEM3 underestimates the co-occurrence of very wet events where the Ganges–Brahmaputra SPI is greater than 1 and Indus SPI is greater than 2. All translations perform reasonably well at correcting for these edge cases, but again UNIT+QM performs best. This is evident from the JS divergence values and visually; while UNIT underdisperses the data, using UNIT+QM provides an accurate correction for more extreme events.
5. Conclusions
In this study, we examined the appropriateness of unpaired image-to-image translation networks for bias correcting climate data. We found that the UNIT neural network architecture was not sufficient by itself, because it reduced the dispersion of the data and, therefore, led to less extreme values than expected. However, the bias correction produced by UNIT did have desirable properties, such as coherently bias correcting cross-variable correlations and spatial structures. We therefore proposed combining the UNIT translation with quantile mapping, a more traditional bias correction technique—a combination similar to previous work (François et al. 2021). We found that applying these techniques in sequence made up for the shortcomings of each: UNIT was able to bias correct the spatial and cross-variable correlations, while QM corrected UNIT’s tendency to underdisperse the data.
When applied to bias correct the daily minimum, mean, and maximum temperature, the precipitation, and geopotential height, we found that UNIT+QM was able to simultaneously correct all variables. This is an advantage over previous work in which the dynamical feature of geopotential height is assumed to be correct and unbiased and thus is used as a key part of regularizing the bias correction (Pan et al. 2021). In our method, we leave it to the discriminator network to ensure that the temperatures and precipitation are consistent with each other and with the dynamical variable. This allows the network to bias correct the dynamical variable where it is appropriate.
Designing bias correction methods such as UNIT+QM, which can bias correct multivariable joint distributions, is crucial for studying and making risk assessments of compound extreme events. Such cross-variable corrections are also important when the outputs of GCMs are used as boundary conditions of regional climate models (White and Toumi 2013). We have shown that incorporating modern machine learning methods alongside classical techniques can provide us with more powerful tools for bias correction.
Further work on this topic may be needed in several aspects: 1) to consider the performance of this approach relative to newer developments in statistical bias correction such as multivariate quantile mapping (Cannon 2018), alongside computational demands, as well as how the approaches could be further combined to address the issues inherent in both, and 2) to consider bias correction of nonstationary data. We note that this is also an ongoing challenge for classical techniques. Many of the extensions to classical techniques used for nonstationary data, such as detrending either additively or multiplicatively (Cannon et al. 2015), could also be used with UNIT+QM.
Acknowledgments.
Author Fulton was supported by NERC Doctoral Training Partnership Grant NE/L002558/1. Author Clarke was supported by NERC Doctoral Training Partnership Grant NE/L002612/1. Author Hegerl was funded by NERC Large Grant GloSAT (NE/S015698/1). Computing resources for this work were provided by a Microsoft AI for Earth grant. We acknowledge the World Climate Research Programme, which, through its Working Group on Coupled Modelling, coordinated and promoted CMIP6. We thank the climate modelling groups for producing and making available their model output, the Earth System Grid Federation (ESGF) for archiving the data and providing access, and the multiple funding agencies who support CMIP6 and ESGF. We also thank Simon Tett, Massimo Bollasina, and Friederike Otto for helpful discussions.
Data availability statement.
All data used in this work come from the Climate of the 20th Century Plus (C20C+) Project (Folland et al. 2014) and are freely available. The code is available online (https://github.com/dfulu/climateTranslation).
APPENDIX
Network Details and Supporting Analysis
a. Preprocessing the data
As is ubiquitous in training machine learning (ML) models, we preprocessed the data before training the UNIT network.
The daily mean temperature fields, originally in kelvins, were converted to degrees Celsius and then divided by the spatial mean (across all latitudes and longitudes) of the pointwise standard deviation of temperature. This maintained the physically significant value of 0°C while avoiding excessive gradients when updating the model parameters. The daily maximum and minimum temperatures were converted to degrees Celsius above or below the daily mean temperature, which allowed us to choose ReLU activation functions that enforce Tmin < Tmean < Tmax. They were also divided by the spatial-mean standard deviation of temperature.
Precipitation values are extremely skewed and non-Gaussian, which can be challenging for an ML model to learn. In keeping with ML best practices, we transformed the data so that the distribution became closer to Gaussian. We found that the fourth root of the daily accumulated precipitation was reasonably dispersed, so we took the fourth root and then divided the data by its standard deviation. We note that the transforms we used need not make the data exactly Gaussian but merely less extremely distributed.
We simply standardized the 500-hPa geopotential height by subtracting the mean and dividing by the standard deviation. The means and standard deviations used were not calculated pointwise; instead, a single value was used globally.
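The preprocessing steps above can be sketched as follows; the array shapes and synthetic fields are illustrative only, not the paper's actual code:

```python
import numpy as np

# Synthetic stand-ins with shape (time, lat, lon).
rng = np.random.default_rng(1)
t_mean_k = 280.0 + 10.0 * rng.standard_normal((100, 8, 8))  # daily mean T (K)
t_max_k = t_mean_k + rng.uniform(0.0, 5.0, t_mean_k.shape)
precip = rng.gamma(0.5, 2.0, t_mean_k.shape)                # skewed, >= 0

# Mean temperature: to Celsius, then divide by the spatial mean of the
# pointwise standard deviation, preserving the physically meaningful 0 degC.
sigma = t_mean_k.std(axis=0).mean()
t_mean_scaled = (t_mean_k - 273.15) / sigma

# Max temperature: a non-negative offset from the daily mean (so a ReLU output
# activation can enforce t_mean <= t_max); min temperature is analogous.
t_max_offset = (t_max_k - t_mean_k) / sigma

# Precipitation: fourth root to tame the skew, then scale to unit variance.
precip_scaled = precip ** 0.25
precip_scaled = precip_scaled / precip_scaled.std()
```

The fourth root is one of several power transforms that could serve here; the goal is only a less extreme distribution, not exact Gaussianity.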
b. UNIT network structure
Figure A1 shows a diagram of the UNIT architecture used here. We use color and shape to represent different layers and subcomponents of the network.
Architecture of the UNIT network as configured for use in this study.
The encoders E(x) are composed of seven convolutional layers with convolutional downsampling to reduce the spatial dimension. The decoders G(z) are composed of nine convolutional layers with nearest-neighbor upsampling to increase the spatial dimension. Nearest-neighbor upsampling was used instead of transpose convolutions to avoid the checkerboard artifacts (Odena et al. 2016) that are present in the original UNIT network. Some of these layers are organized into residual skip connections (He et al. 2016), which allow more routes for gradient backpropagation to earlier layers.
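A minimal sketch of nearest-neighbor upsampling (illustrative only; the actual decoder applies this within convolutional layers): each value is simply repeated, so, unlike a transpose convolution, every output pixel receives an equal contribution and no checkerboard pattern can arise.

```python
import numpy as np

def nn_upsample(z, factor=2):
    """Nearest-neighbor upsampling of a (H, W) feature map by repetition."""
    return np.repeat(np.repeat(z, factor, axis=0), factor, axis=1)

z = np.array([[1.0, 2.0],
              [3.0, 4.0]])
up = nn_upsample(z)  # shape (4, 4); a convolution would follow in the decoder
```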
We use a multiscale discriminator in each domain. Isola et al. (2017) found that such multiscale discriminators produce images that have more realistic large-scale features and sharper small-scale features. The multiscale discriminator is composed of three discriminator components that assess the plausibility of the translations at separate spatial scales. We use spatial scales that are 1×, 2×, and 4× as coarse as the input image. Each discriminator component had six convolutional layers. For the discriminators that are coarser than the original image, the input image is coarsened by mean pooling.
Batch normalization and leaky ReLU activation functions were used throughout these components, as shown in the diagram. The final layer of the decoder G(z) used different activation functions for the different channels: the mean temperature and geopotential height used none (i.e., x ↦ x), while the precipitation and the maximum and minimum temperatures (expressed as differences from the mean temperature) used ReLU activation functions. Where padding was used, it was replication padding. The number of filters used in each layer is shown in the diagram.
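The output-activation scheme above can be sketched as follows (function names are ours): because the max/min channels pass through a ReLU and are treated as offsets from the mean, the ordering Tmin ≤ Tmean ≤ Tmax holds by construction.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def assemble_temperatures(t_mean_raw, d_max_raw, d_min_raw):
    """Combine raw network outputs into physically ordered temperatures."""
    t_mean = t_mean_raw                # identity activation
    t_max = t_mean + relu(d_max_raw)   # non-negative offset above the mean
    t_min = t_mean - relu(d_min_raw)   # non-negative offset below the mean
    return t_min, t_mean, t_max
```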
The bottom panel of the diagram shows the overall structure of the UNIT network. It also shows the path of an image x from domain X through the network components to generate its reconstruction G[E(x)].
This panel also shows the use of the land mask. Early in the study, we noticed that the UNIT network was producing unphysical translations at land–sea boundaries, such as negative daily temperature ranges (this was noticed before we added some of the preprocessing and activation constraints mentioned in Fig. A1). This occurred exclusively on land–sea border pixels. To aid the network in translating these border regions effectively, the encoder E(x) and discriminator D(x) networks were given the concatenated weather fields and binary land–sea mask [x, mask] as input. The decoder G(z) was trained to reconstruct only the weather fields, not the mask.
c. Training and hyperparameters
The hyperparameters used in training this network were taken from those used in the original UNIT network and then manually adjusted slightly, using only results on the training data. They were not highly optimized, so there may be room for improvement.
We trained the network from scratch using a batch size of 8 and the Adam optimizer (Kingma and Ba 2014) with learning rate of 5 × 10−4 for both the translation network and the discriminators. In the Adam optimizer, we also used weight decay, β1, and β2 values of 0.0001, 0.5, and 0.999, respectively. No explicit regularization was used in training. The only regularization present in the network is the implicit regularization associated with the reparameterization step in the VAEs.
We trained the network for 208 000 iterations. As with many adversarial networks, training was unstable, and so the model was checkpointed regularly (every 2000 iterations) throughout training. Although training continued to 570 000 iterations, we manually chose the 208 000-iteration checkpoint because the loss was reasonably low and stable at that point. The network took a few days to train on a single NVIDIA Tesla T4 graphics processing unit (GPU).
The loss coefficients used in Eq. (6) are shown in Table A1. These are similar to those used in the original UNIT network.
The coefficients used in the UNIT loss function.
d. Empirically calculated JS divergence
In this study, we have used an approximation of the JS divergence [Eq. (7)] that can be used on binned data. Figure A2 shows the sensitivity of the spatial mean of the JS divergence between precipitation and temperature calculated at each grid point (i.e., the mean value across Fig. 5 for each dataset) to the number of bins. This shows that our results are qualitatively robust to the choice of the number of bins in this estimation.
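A sketch of such a binned JS divergence estimator for two 2D samples (our own minimal implementation; the paper's exact binning choices may differ):

```python
import numpy as np

def js_divergence_binned(x1, y1, x2, y2, bins=20):
    """Estimate JS divergence between two 2D samples on a shared histogram grid."""
    edges_x = np.linspace(min(x1.min(), x2.min()), max(x1.max(), x2.max()), bins + 1)
    edges_y = np.linspace(min(y1.min(), y2.min()), max(y1.max(), y2.max()), bins + 1)
    p, _, _ = np.histogram2d(x1, y1, bins=[edges_x, edges_y])
    q, _, _ = np.histogram2d(x2, y2, bins=[edges_x, edges_y])
    p, q = p.ravel() / p.sum(), q.ravel() / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # KL divergence, skipping empty bins of a (where a * log a -> 0).
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

By construction, the result is zero for identical samples and bounded above by ln 2 (in nats); the bin count is the tuning parameter whose sensitivity Fig. A2 explores.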
The spatial mean of the JS divergence between temperature and precipitation at all grid points.
e. Significance of JS divergence values
In the main text, we quote the values of the JS divergence calculated between each of the distributions in Fig. 4. We estimate the significance of these values via bootstrapping.
We randomly sample N fields with replacement from the ERA5 dataset, which is also of length N. We sample from each dataset χ ∈ {HadGEM3, UNIT, UNIT+QM, QM} similarly (we also sample again independently from ERA5 for results in Table A2). Then we calculate the JS divergence between these bootstrapped samples. We repeat this bootstrapping calculation 200 times. The 5th, 50th, and 95th percentiles, as well as the values quoted in the main text that did not use bootstrapping, are collated in Table A2.
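The bootstrap procedure above can be sketched as follows (our own minimal version, with a generic statistic standing in for the full 2D JS divergence for brevity):

```python
import numpy as np

def bootstrap_percentiles(stat, a, b, n_boot=200, seed=0):
    """Resample both datasets with replacement and return the 5th, 50th, and
    95th percentiles of the statistic over the bootstrap replicates."""
    rng = np.random.default_rng(seed)
    n_a, n_b = len(a), len(b)
    reps = np.empty(n_boot)
    for k in range(n_boot):
        reps[k] = stat(a[rng.integers(0, n_a, n_a)],
                       b[rng.integers(0, n_b, n_b)])
    return np.percentile(reps, [5, 50, 95])
```

In the paper's setting, `stat` would be the binned JS divergence and `a`, `b` the daily fields of two datasets.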
The values of the JS divergence calculated between each of the distributions in Fig. 4 as mentioned in the main text and their bootstrapped percentile intervals.
We note that the JS value calculated using all of the data is consistently lower than even the 5th-percentile value from the bootstrapped samples. This is simply because the bootstrapped samples are less diverse than the full dataset, and so the match between the distributions can only be expected to get worse. We consider one data source to be significantly better than another if the 50th-percentile bootstrapped JS divergence of that dataset is lower than the 5th percentile of the other. By this criterion, UNIT+QM is a significant improvement on the other methods in this statistic.
Table A3 shows similarly calculated confidence intervals calculated for the JS divergences presented in Table 1.
f. Basins used
Figure A3 shows the areas used for each basin used in this study. The mask was created by checking whether the center of each grid point is contained within the boundaries of the basin shape downloaded from the World Bank data catalog (World Bank 2019).
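The grid-center containment test described above can be sketched with a simple even-odd ray-casting routine (a self-contained stand-in for the geometry library used to rasterize the basin shapefile; the toy rectangle below is illustrative):

```python
import numpy as np

def point_in_polygon(px, py, poly):
    """Even-odd ray-casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge crosses the horizontal ray's level
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

# Build a mask by testing each grid-cell center against the basin outline.
lons = np.arange(44.0, 48.0, 1.0)
lats = np.arange(8.0, 12.0, 1.0)
basin = [(44.5, 8.5), (47.5, 8.5), (47.5, 11.5), (44.5, 11.5)]  # toy outline
mask = np.array([[point_in_polygon(lon, lat, basin) for lon in lons]
                 for lat in lats])
```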
Geographical extent of the key areas and basins used in this study. The red outline denotes the overall region used in this study. For the basins that extend outside this red-outlined box, we have aggregated to only the area of the basin that lies inside this region. The black bounding box denotes the area bounded by 8°–28°N, 72°–85°E, used to study extreme heat events as in Fig. 10.
As noted, two of these basins are clipped because of the boundaries chosen (8°S–30°N, 44°–121°E). In this study, we only use the results of aggregating these basins as a demonstration of how each method of bias correction performs when aggregated over a spatial area. Therefore, clipping these basins to the area in our domain does not affect the contents of the work presented here.
g. Plots repeated as copulas
To assess the joint distribution of the translations without the influence of the marginal distributions, we recreate some of our figures from the main text after transforming the data values into quantiles. This produces an empirical estimate of the copula of the two-dimensional datasets plotted. Figures A4–A7 are copula alternatives to Figs. 4, 6, 8, and 9, respectively.
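The quantile transform behind these copula plots can be sketched as follows (the synthetic data are ours): each variable is replaced by its empirical quantile within its own dataset, which strips out the marginal distributions and leaves only the dependence structure.

```python
import numpy as np

def to_quantiles(x):
    """Map each value to its empirical quantile within the dataset."""
    ranks = np.argsort(np.argsort(x)) + 1  # ordinal ranks, 1..n
    return ranks / (len(x) + 1)            # quantiles strictly inside (0, 1)

rng = np.random.default_rng(5)
temp = rng.standard_normal(1000)
precip = rng.gamma(0.5, 2.0, 1000) * np.exp(0.5 * temp)  # dependent, skewed

u, v = to_quantiles(temp), to_quantiles(precip)  # empirical copula sample
```

Plotting (u, v) rather than (temp, precip) is exactly the decomposition used in Figs. A4–A7.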
Copula decomposition of Fig. 4—Joint distributions of daily accumulated precipitation and daily mean temperature from a single grid point located at the southern tip of India (centered at 8.6°N, 77.9°E) for the five different datasets. Each point is from a single day, and the contours show the joint density as estimated by kernel density estimation. The x and y scales show the precipitation and temperature as expressed as quantiles calculated with respect to each dataset.
Copula decomposition of Fig. 6—Joint distributions of daily accumulated precipitation at a grid point located at the southern tip of India (centered at 8.6°N, 77.9°E) and its neighboring cell directly eastward (centered at 78.8°E) for the five different datasets. Each point is from a single day, and the contours show the joint density as estimated by kernel density estimation. The x and y scales show the precipitation at the two grid points as expressed as quantiles calculated with respect to each dataset.
Copula decomposition of Fig. 8—Joint distributions of the mean daily accumulated precipitation and daily mean temperature aggregated over the Mahanadi River basin for the five different datasets. Each point is from a single day, and the contours show the joint density as estimated by kernel density estimation. The x and y scales show the precipitation and temperature as expressed as quantiles calculated with respect to each dataset.
Copula decomposition of Fig. 9—Joint distributions of mean daily accumulated precipitation across the Ganges–Brahmaputra and the Indus basin for the five different datasets. Each point is from a single day, and the contours show the joint density as estimated by kernel density estimation. The x and y scales show the precipitation in the two basins as expressed as quantiles calculated with respect to each dataset.
The results of Figs. A4–A7 support what we observed in the figures in the main text. The UNIT+QM method performs the best across the four figures. The UNIT+QM copula is more similar to the ERA5 copula than those of the other methods in all the figures. QM does not adequately correct the copula of the HadGEM3 data in any of the figures.
When a dataset is transformed via regular QM, its copula is unaffected. Therefore, we might expect the copulas of the HadGEM3 and QM datasets to be identical in each of the figures and, similarly, the UNIT and UNIT+QM data to have identical copulas. However, we apply quantile mapping to each month separately, which makes our QM a conditional QM. This method can, therefore, modify the copula, not just the marginal distributions, when considered over the whole year.
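A minimal sketch of per-month (conditional) quantile mapping, using synthetic data and our own simple empirical implementation (not the paper's exact one):

```python
import numpy as np

def quantile_map(model, obs):
    """Plain empirical QM: interpolate each model value from the model's
    quantile function onto the observed one."""
    q = np.linspace(0.0, 1.0, 101)
    return np.interp(model, np.quantile(model, q), np.quantile(obs, q))

def monthly_quantile_map(model, obs, months):
    """Applying QM separately per month makes it conditional QM, which can
    reshape the year-round copula even though QM within one month cannot."""
    out = np.empty_like(model, dtype=float)
    for m in np.unique(months):
        sel = months == m
        out[sel] = quantile_map(model[sel], obs[sel])
    return out
```

Within a single month the mapping is monotone, so it leaves that month's copula untouched; it is only the pooling of twelve different monotone maps that lets the annual copula shift.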
In Figs. A6 and A7, there are no data points in the gap between 0 and around 0.1–0.2 on the x axis. This is because around 10%–20% of the basin-average precipitation values for the Mahanadi and Indus basins are exactly zero, and these data points are all assigned a quantile of zero. The Ganges–Brahmaputra basin is large enough that almost no days have exactly zero precipitation across the basin, so no gap is observed on the y axis in Fig. A7.
h. Independently sampled SPI events
Figure A8 shows independently sampled values of SPI from the Ganges–Brahmaputra and Indus River basins. The overall structure of the distributions is very similar to Fig. 12, in which values are calculated from a rolling time window.
Joint distributions of the standardized precipitation index at each grid point over the Ganges–Brahmaputra and Indus River basins using a 30-day time window for the five different datasets. Each point is from a single grid point and a completely independent 30-day time window.
i. Alternative matching algorithm
We also examine the reverse of the matching algorithm used to produce Fig. 7. Instead of taking fields from dataset χ ∈ {HadGEM3, UNIT, UNIT+QM, QM} and finding the closest match in ERA5, we match the opposite way: we take each ERA5 field and find the closest match in dataset χ. The results are shown in Fig. A9. This figure shows fairly similar results to Fig. 7, with UNIT and UNIT+QM performing the best.
Distributions of the SSIM between each ERA5 field and its best-matched dataset field, shown for the mean 2-m temperature, mean 500-hPa geopotential height, and the fourth root of daily precipitation.
The SSIM match distributions leave open the question of how often each individual sample is chosen as the best match. If the translation is done well, we would expect many different samples to be chosen as the best match, rather than the same sample always being matched. Figure A10 (based on matching the fourth root of precipitation) and Fig. A11 (based on matching temperature) show the results of this analysis. We count how many times each sample is chosen as the best match and then plot the frequencies (y axis) of these best-match counts (x axis). For example, in Fig. A10, we can see that when we take HadGEM3 fields and select the best match among the ERA5 data, around 1500 of the ERA5 fields were chosen as the best match exactly one time, a little over 400 ERA5 fields were chosen as the best match exactly two times, and so on. In the figures, the black lines marked as ERA5 → ERA5 are the results of matching the training ERA5 data to the validation ERA5 data. Each panel of these figures also shows the fraction of data that was matched to at least once.
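The counting procedure above can be sketched as follows; for brevity a mean-squared-error similarity stands in for SSIM, and the synthetic fields are ours, but the counting logic is the same:

```python
import numpy as np

def best_match_counts(source, target):
    """For each source field, find the most similar target field and count how
    many times each target field is chosen as the best match."""
    counts = np.zeros(len(target), dtype=int)
    for field in source:
        errs = ((target - field) ** 2).mean(axis=(1, 2))
        counts[np.argmin(errs)] += 1
    return counts

rng = np.random.default_rng(6)
target = rng.standard_normal((50, 8, 8))              # e.g., ERA5 fields
source = (target[rng.integers(0, 50, 200)]
          + 0.01 * rng.standard_normal((200, 8, 8)))  # fields to match

counts = best_match_counts(source, target)
freq = np.bincount(counts)  # freq[k] = number of target fields matched k times
```

Plotting `freq` against the match count reproduces the kind of histogram shown in Figs. A10 and A11; an underdispersed source would concentrate `counts` on a few target fields.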
Distribution of the number of times matched to fields were chosen as the best match using SSIM to compare the fourth root of precipitation. The x axis is the number of times a field was chosen as the best match (the match count). The y axis is the number of fields with this match count. The figure shows the results of matching in both directions between ERA5 and the bias correction methods. The black line is the result of performing the same SSIM matching based on matching the ERA5 training set to the ERA5 test set and is included for comparison.
As in Fig. A10, but based on matching temperature.
In Fig. A10, we see evidence of the underdispersion of UNIT. When UNIT is matched to ERA5, a few of the ERA5 fields are chosen many times, with one ERA5 field being the best match to around 400 UNIT fields. In the UNIT+QM panel, we see that the match frequencies are similar to the ERA5-to-ERA5 baseline regardless of whether we match ERA5 to UNIT+QM or UNIT+QM to ERA5.
Figure A11 shows similar features to Fig. A10 but with these features less pronounced. This may be because temperature has a less complex spatial joint distribution than precipitation.
REFERENCES
Aadhar, S., and V. Mishra, 2017: High-resolution near real-time drought monitoring in South Asia. Sci. Data, 4, 170145, https://doi.org/10.1038/sdata.2017.145.
Arora, S., and Y. Zhang, 2017: Do GANs actually learn the distribution? An empirical study. arXiv, 1706.08224v2, https://doi.org/10.48550/arXiv.1706.08224.
Ashfaq, M., D. Rastogi, R. Mei, D. Touma, and L. R. Leung, 2017: Sources of errors in the simulation of South Asian summer monsoon in the CMIP5 GCMs. Climate Dyn., 49, 193–223, https://doi.org/10.1007/s00382-016-3337-7.
Bau, D., J.-Y. Zhu, J. Wulff, W. Peebles, H. Strobelt, B. Zhou, and A. Torralba, 2019: Seeing what a GAN cannot generate. arXiv, 1910.11626, https://doi.org/10.48550/arXiv.1910.11626.
Bellprat, O., V. Guemas, F. Doblas-Reyes, and M. G. Donat, 2019: Towards reliable extreme weather and climate event attribution. Nat. Commun., 10, 1732, https://doi.org/10.1038/s41467-019-09729-2.
Bollasina, M. A., and Y. Ming, 2013: The general circulation model precipitation bias over the southwestern equatorial Indian Ocean and its implications for simulating the South Asian monsoon. Climate Dyn., 40, 823–838, https://doi.org/10.1007/s00382-012-1347-7.
Cannon, A. J., 2018: Multivariate quantile mapping bias correction: An N-dimensional probability density function transform for climate model simulations of multiple variables. Climate Dyn., 50, 31–49, https://doi.org/10.1007/s00382-017-3580-6.
Cannon, A. J., S. R. Sobie, and T. Q. Murdock, 2015: Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? J. Climate, 28, 6938–6959, https://doi.org/10.1175/JCLI-D-14-00754.1.
Chandrasekara, S. S. K., H.-H. Kwon, M. Vithanage, J. Obeysekera, and T.-W. Kim, 2021: Drought in South Asia: A review of drought assessment and prediction in South Asian countries. Atmosphere, 12, 369, https://doi.org/10.3390/atmos12030369.
Christian, J. I., J. B. Basara, E. D. Hunt, J. A. Otkin, J. C. Furtado, V. Mishra, X. Xiao, and R. M. Randall, 2021: Global distribution, trends, and drivers of flash drought occurrence. Nat. Commun., 12, 6330, https://doi.org/10.1038/s41467-021-26692-z.
Ciavarella, A., and Coauthors, 2018: Upgrade of the HadGEM3-A based attribution system to high resolution and a new validation framework for probabilistic event attribution. Wea. Climate Extremes, 20, 9–32, https://doi.org/10.1016/j.wace.2018.03.003.
Dionelis, N., M. Yaghoobi, and S. A. Tsaftaris, 2020: Tail of distribution GAN (TailGAN): Generative-adversarial-network-based boundary formation. 2020 Sensor Signal Processing for Defence Conf., Edinburgh, United Kingdom, Institute of Electrical and Electronics Engineers, 41–45, https://doi.org/10.1109/SSPD47486.2020.9272100.
Ehret, U., E. Zehe, V. Wulfmeyer, K. Warrach-Sagi, and J. Liebert, 2012: HESS opinions: “Should we apply bias correction to global and regional climate model data?” Hydrol. Earth Syst. Sci., 16, 3391–3404, https://doi.org/10.5194/hess-16-3391-2012.
Eyring, V., S. Bony, G. A. Meehl, C. A. Senior, B. Stevens, R. J. Stouffer, and K. E. Taylor, 2016: Overview of the Coupled Model Intercomparison Project phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev., 9, 1937–1958, https://doi.org/10.5194/gmd-9-1937-2016.
Eyring, V., and Coauthors, 2021: Human influence on the climate system. Climate Change 2021: The Physical Science Basis, V. Masson-Delmotte et al., Eds., Cambridge University Press, 423–552, https://www.ipcc.ch/report/ar6/wg1/downloads/report/IPCC_AR6_WGI_Chapter03.pdf.
Folland, C., D. Stone, C. Frederiksen, D. Karoly, and J. Kinter, 2014: The international CLIVAR climate of the 20th century plus (C20C+) project: Report of the sixth workshop. CLIVAR Exchanges, No. 19, International CLIVAR Project Office, Southampton, United Kingdom, 57–59, https://portal.nersc.gov/c20c/pub/FollandC_StoneD_etalii_2014.pdf.
François, B., S. Thao, and M. Vrac, 2021: Adjusting spatial dependence of climate model outputs with cycle-consistent adversarial networks. Climate Dyn., 57, 3323–3353, https://doi.org/10.1007/s00382-021-05869-8.
Gaupp, F., J. Hall, S. Hochrainer-Stigler, and S. Dadson, 2020: Changing risks of simultaneous global breadbasket failure. Nat. Climate Change, 10, 54–57, https://doi.org/10.1038/s41558-019-0600-z.
Goodfellow, I. J., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, 2014: Generative adversarial networks. arXiv, 1406.2661v1, https://doi.org/10.48550/arXiv.1406.2661.
Grover, A., C. Chute, R. Shu, Z. Cao, and S. Ermon, 2020: AlignFlow: Cycle consistent learning from multiple domains via normalizing flows. Proc. AAAI Conf. Artif. Intell., 34, 4028–4035, https://doi.org/10.1609/aaai.v34i04.5820.
Guha-Sapir, D., R. Below, and P. Hoyois, 2014: EM-DAT: International disaster database. Université Catholique de Louvain, accessed 1 April 2022, www.emdat.be.
Han, L., M. Chen, K. Chen, H. Chen, Y. Zhang, B. Lu, L. Song, and R. Qin, 2021: A deep learning method for bias correction of ECMWF 24–240 h forecasts. Adv. Atmos. Sci., 38, 1444–1459, https://doi.org/10.1007/s00376-021-0215-y.
Hanlon, H. M., G. C. Hegerl, S. F. B. Tett, and D. M. Smith, 2015: Near-term prediction of impact-relevant extreme temperature indices. Climatic Change, 132, 61–76, https://doi.org/10.1007/s10584-014-1191-3.
Hao, Z., A. Mallya, S. Belongie, and M.-Y. Liu, 2021: GANcraft: Unsupervised 3D neural rendering of Minecraft worlds. 2021 IEEE/CVF Int. Conf. on Computer Vision, Montreal, QC, Canada, Institute of Electrical and Electronics Engineers, 14 052–14 062, https://doi.org/10.1109/ICCV48922.2021.01381.
He, K., X. Zhang, S. Ren, and J. Sun, 2016: Deep residual learning for image recognition. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, Institute of Electrical and Electronics Engineers, 770–778, https://doi.org/10.1109/CVPR.2016.90.
Herger, N., G. Abramowitz, R. Knutti, O. Angélil, K. Lehmann, and B. M. Sanderson, 2018: Selecting a climate model subset to optimise key ensemble properties. Earth Syst. Dyn., 9, 135–151, https://doi.org/10.5194/esd-9-135-2018.
Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803.
Im, E.-S., J. S. Pal, and E. A. B. Eltahir, 2017: Deadly heat waves projected in the densely populated agricultural regions of South Asia. Sci. Adv., 3, e1603322, https://doi.org/10.1126/sciadv.1603322.
IPCC, 2013: Climate Change 2013: The Physical Science Basis. Cambridge University Press, 1535 pp., https://doi.org/10.1017/CBO9781107415324.
Isola, P., J.-Y. Zhu, T. Zhou, and A. A. Efros, 2017: Image-to-image translation with conditional adversarial networks. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, HI, Institute of Electrical and Electronics Engineers, 1125–1134, https://doi.org/10.1109/CVPR.2017.632.
Jones, P. W., 1999: First- and second-order conservative remapping schemes for grids in spherical coordinates. Mon. Wea. Rev., 127, 2204–2210, https://doi.org/10.1175/1520-0493(1999)127<2204:FASOCR>2.0.CO;2.
Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.
Kingma, D. P., and M. Welling, 2013: Auto-encoding variational Bayes. arXiv, 1312.6114v11, https://doi.org/10.48550/arXiv.1312.6114.
Kirchmeier-Young, M. C., F. W. Zwiers, and N. P. Gillett, 2017: Attribution of extreme events in Arctic sea ice extent. J. Climate, 30, 553–571, https://doi.org/10.1175/JCLI-D-16-0412.1.
Le, X.-H., G. Lee, K. Jung, H. An, S. Lee, and Y. Jung, 2020: Application of convolutional neural network for spatiotemporal bias correction of daily satellite-based precipitation. Remote Sens., 12, 2731, https://doi.org/10.3390/rs12172731.
Leonard, M., and Coauthors, 2014: A compound event framework for understanding extreme impacts. Wiley Interdiscip. Rev.: Climate Change, 5, 113–128, https://doi.org/10.1002/wcc.252.
Levy, A. A. L., W. Ingram, M. Jenkinson, C. Huntingford, F. H. Lambert, and M. Allen, 2013: Can correcting feature location in simulated mean climate improve agreement on projected changes? Geophys. Res. Lett., 40, 354–358, https://doi.org/10.1002/2012GL053964.
Levy, K., A. P. Woster, R. S. Goldstein, and E. J. Carlton, 2016: Untangling the impacts of climate change on waterborne diseases: A systematic review of relationships between diarrheal diseases and temperature, rainfall, flooding, and drought. Environ. Sci. Technol., 50, 4905–4922, https://doi.org/10.1021/acs.est.5b06186.
Liu, M.-Y., and O. Tuzel, 2016: Coupled generative adversarial networks. 29th Conf. on Neural Information Processing Systems, Barcelona, Spain, NIPS, 469–477, https://papers.nips.cc/paper/2016/hash/502e4a16930e414107ee22b6198c578f-Abstract.html.
Liu, M.-Y., T. Breuel, and J. Kautz, 2017: Unsupervised image-to-image translation networks. 31st Conf. on Neural Information Processing Systems, Long Beach, CA, NIPS, 700–708, https://proceedings.neurips.cc/paper/2017/file/dc6a6489640ca02b0d42dabeb8e46bb7-Paper.pdf.
Lorenz, E. N., 1969: The predictability of a flow which possesses many scales of motion. Tellus, 21, 289–307, https://doi.org/10.3402/tellusa.v21i3.10086.
Maher, P., G. K. Vallis, S. C. Sherwood, M. J. Webb, and P. G. Sansom, 2018: The impact of parameterized convection on climatological precipitation in atmospheric global climate models. Geophys. Res. Lett., 45, 3728–3736, https://doi.org/10.1002/2017GL076826.
Mao, X., Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, 2017: Least squares generative adversarial networks. Proc. IEEE Int. Conf. on Computer Vision, Venice, Italy, Institute of Electrical and Electronics Engineers, 2813–2821, https://doi.org/10.1109/ICCV.2017.304.
Maraun, D., and Coauthors, 2010: Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. Rev. Geophys., 48, RG3003, https://doi.org/10.1029/2009RG000314.
Maraun, D., and Coauthors, 2017: Towards process-informed bias correction of climate change simulations. Nat. Climate Change, 7, 764–773, https://doi.org/10.1038/nclimate3418.
Mishra, V., S. Aadhar, and S. S. Mahto, 2021: Anthropogenic warming and intraseasonal summer monsoon variability amplify the risk of future flash droughts in India. npj Climate Atmos. Sci., 4, 1, https://doi.org/10.1038/s41612-020-00158-3.
Moghim, S., and R. L. Bras, 2017: Bias correction of climate modeled temperature and precipitation using artificial neural networks. J. Hydrometeor., 18, 1867–1884, https://doi.org/10.1175/JHM-D-16-0247.1.
Moors, E., T. Singh, C. Siderius, S. Balakrishnan, and A. Mishra, 2013: Climate change and waterborne diarrhoea in northern India: Impacts and adaptation strategies. Sci. Total Environ., 468–469, S139–S151, https://doi.org/10.1016/j.scitotenv.2013.07.021.
Odena, A., V. Dumoulin, and C. Olah, 2016: Deconvolution and checkerboard artifacts. Distill, 1, e3, https://doi.org/10.23915/distill.00003.
Pan, B., G. J. Anderson, A. Goncalves, D. D. Lucas, C. J. Bonfils, J. Lee, Y. Tian, and H.-Y. Ma, 2021: Learning to correct climate projection biases. J. Adv. Model. Earth Syst., 13, e2021MS002509, https://doi.org/10.1029/2021MS002509.
Ravuri, S., and Coauthors, 2021: Skillful precipitation nowcasting using deep generative models of radar. Nature, 597, 672–677, https://doi.org/10.1038/s41586-021-03854-z.
Raymond, C., T. Matthews, and R. M. Horton, 2020: The emergence of heat and humidity too severe for human tolerance. Sci. Adv., 6, eaaw1838, https://doi.org/10.1126/sciadv.aaw1838.
Ridder, N. N., A. J. Pitman, and A. M. Ukkola, 2021: Do CMIP6 climate models simulate global or regional compound events skillfully? Geophys. Res. Lett., 48, e2020GL091152, https://doi.org/10.1029/2020GL091152.
Rimi, R. H., K. Haustein, M. R. Allen, and E. J. Barbour, 2019: Risks of pre-monsoon extreme rainfall events of Bangladesh: Is anthropogenic climate change playing a role? [in “Explaining Extreme Events of 2017 from a Climate Perspective”]. Bull. Amer. Meteor. Soc., 100, S61–S65, https://doi.org/10.1175/BAMS-D-18-0152.1.
Shrivastava, A., T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, 2017: Learning from simulated and unsupervised images through adversarial training. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, HI, Institute of Electrical and Electronics Engineers, 2242–2251, https://doi.org/10.1109/CVPR.2017.241.
Snell, J., K. Ridgeway, R. Liao, B. D. Roads, M. C. Mozer, and R. S. Zemel, 2017: Learning to generate images with perceptual similarity metrics. 2017 IEEE Int. Conf. on Image Processing, Beijing, China, Institute of Electrical and Electronics Engineers, 4277–4281, https://doi.org/10.1109/ICIP.2017.8297089.
Sønderby, C. K., and Coauthors, 2020: MetNet: A neural weather model for precipitation forecasting. arXiv, 2003.12140v2, https://doi.org/10.48550/arXiv.2003.12140.
Stan, C., D. M. Straus, J. S. Frederiksen, H. Lin, E. D. Maloney, and C. Schumacher, 2017: Review of tropical-extratropical teleconnections on intraseasonal time scales. Rev. Geophys., 55, 902–937, https://doi.org/10.1002/2016RG000538.
Steininger, M., D. Abel, K. Ziegler, A. Krause, H. Paeth, and A. Hotho, 2020: Deep learning for climate model output statistics. arXiv, 2012.10394v1, https://doi.org/10.48550/arXiv.2012.10394.
Takahashi, H., A. Bodas-Salcedo, and G. Stephens, 2021: Warm cloud evolution, precipitation, and their weak linkage in HadGEM3: New process-level diagnostics using A-Train observations. J. Atmos. Sci., 78, 2075–2087, https://doi.org/10.1175/JAS-D-20-0321.1.
Tian, B., and X. Dong, 2020: The double-ITCZ bias in CMIP3, CMIP5, and CMIP6 models based on annual mean precipitation. Geophys. Res. Lett., 47, e2020GL087232, https://doi.org/10.1029/2020GL087232.
Timmermann, A., and Coauthors, 2018: El Niño–Southern Oscillation complexity. Nature, 559, 535–545, https://doi.org/10.1038/s41586-018-0252-6.
Tirivarombo, S., D. Osupile, and P. Eliasson, 2018: Drought monitoring and analysis: Standardised precipitation evapotranspiration index (SPEI) and standardised precipitation index (SPI). Phys. Chem. Earth, 106, 1–10, https://doi.org/10.1016/j.pce.2018.07.001.
Trenberth, K. E., and D. J. Shea, 2005: Relationships between precipitation and surface temperature. Geophys. Res. Lett., 32, L14703, https://doi.org/10.1029/2005GL022760.
Vaughan, A., W. Tebbutt, J. S. Hosking, and R. E. Turner, 2022: Convolutional conditional neural processes for local climate downscaling. Geosci. Model Dev., 15, 251–268, https://doi.org/10.5194/gmd-15-251-2022.
Wang, B., C. Jin, and J. Liu, 2020: Understanding future change of global monsoons projected by CMIP6 models. J. Climate, 33, 6471–6489, https://doi.org/10.1175/JCLI-D-19-0993.1.
Wang, C., L. Zhang, S.-K. Lee, L. Wu, and C. R. Mechoso, 2014: A global perspective on CMIP5 climate model biases. Nat. Climate Change, 4, 201–205, https://doi.org/10.1038/nclimate2118.
Wang, F., and D. Tian, 2022: On deep learning-based bias correction and downscaling of multiple climate models simulations. Climate Dyn., 59, 3451–3468, https://doi.org/10.1007/s00382-022-06277-2.
Wang, F., D. Tian, L. Lowe, L. Kalin, and J. Lehrter, 2021: Deep learning for daily precipitation and temperature downscaling. Water Resour. Res., 57, e2020WR029308, https://doi.org/10.1029/2020WR029308.
Wang, Z., A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, 2004: Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process., 13, 600–612, https://doi.org/10.1109/TIP.2003.819861.
Wehner, M., D. Stone, H. Krishnan, K. AchutaRao, and F. Castillo, 2016: The deadly combination of heat and humidity in India and Pakistan in summer 2015 [in “Explaining Extreme Events of 2015 from a Climate Perspective”]. Bull. Amer. Meteor. Soc., 97 (12), S81–S86, https://doi.org/10.1175/BAMS-D-16-0145.1.
White, R. H., and R. Toumi, 2013: The limitations of bias correcting regional climate model inputs. Geophys. Res. Lett., 40, 2907–2912, https://doi.org/10.1002/grl.50612.
World Bank, 2019: WB data catalog—major river basins of the world. World Bank, accessed 12 November 2021, https://datacatalog.worldbank.org/search/dataset/0041426.
Yiou, P., R. Vautard, P. Naveau, and C. Cassou, 2007: Inconsistency between atmospheric dynamics and temperatures during the exceptional 2006/2007 fall/winter and recent warming in Europe. Geophys. Res. Lett., 34, L21808, https://doi.org/10.1029/2007GL031981.
Yuan, X., M. R. Kaplan, and M. A. Cane, 2018: The interconnected global climate system—A review of tropical–polar teleconnections. J. Climate, 31, 5765–5792, https://doi.org/10.1175/JCLI-D-16-0637.1.
Zhang, F., Y. Q. Sun, L. Magnusson, R. Buizza, S.-J. Lin, J.-H. Chen, and K. Emanuel, 2019: What is the predictability limit of midlatitude weather? J. Atmos. Sci., 76, 1077–1091, https://doi.org/10.1175/JAS-D-18-0269.1.
Zhu, J.-Y., T. Park, P. Isola, and A. A. Efros, 2017: Unpaired image-to-image translation using cycle-consistent adversarial networks. 2017 IEEE Int. Conf. on Computer Vision, Venice, Italy, Institute of Electrical and Electronics Engineers, 2223–2232, https://doi.org/10.1109/ICCV.2017.244.