Modeling Spatial Asymmetries in Teleconnected Extreme Temperatures

Mitchell L. Krock, Department of Statistics, University of Missouri, Columbia, Missouri (https://orcid.org/0000-0002-1628-1842)

Julie Bessac, Computational Science Center, National Renewable Energy Laboratory, Golden, Colorado, and Department of Mathematics, Virginia Tech, Blacksburg, Virginia

Michael L. Stein, Department of Statistics, Rutgers, The State University of New Jersey, New Brunswick, New Jersey

Open access

Abstract

Combining strengths from deep learning and extreme value theory can help describe complex relationships between variables where extreme events have significant impacts (e.g., environmental or financial applications). Neural networks learn complicated nonlinear relationships from large datasets under limited parametric assumptions. By definition, the number of occurrences of extreme events is small, which limits the ability of the data-hungry, nonparametric neural network to describe rare events. Inspired by recent extreme cold winter weather events in North America caused by atmospheric blocking, we examine several probabilistic generative models for the entire multivariate probability distribution of daily boreal winter surface air temperature. We propose metrics to measure spatial asymmetries, such as long-range anticorrelated patterns that commonly appear in temperature fields during blocking events. Compared to vine copulas, the statistical standard for multivariate copula modeling, deep learning methods show improved ability to reproduce complicated asymmetries in the spatial distribution of ERA5 temperature reanalysis, including the spatial extent of in-sample extreme events.

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Mitchell L. Krock, mk52n@missouri.edu


1. Introduction

In 2022, the United States experienced a record-setting 18 climate disasters, each causing over $1 billion in damage (NOAA 2023). Mitigating risk from environmental extreme events is a pressing issue for science, especially in the face of anthropogenic climate change. An extreme weather event in late December 2022 motivates the statistical research question investigated in this paper. During this time, a severe winter storm produced dangerously cold temperatures and significant snowfall in much of the United States and Canada. Buffalo, New York, received over 3 ft of snow, and 41 people died during the blizzard. This storm was the result of an atmospheric blocking event, a quasi-stationary high-pressure ridge that disrupts the usual zonal circulation of the atmosphere. The map of 500-hPa geopotential height in Fig. 1 illustrates that such a blocking event over the subarctic Pacific preceded the December 2022 storm. Blocking events during the boreal winter in North America are associated with anomalous warmth in Alaska and extreme cold temperatures in the midlatitudes of North America (Carrera et al. 2004). Observe that the temperatures in Fig. 1 are roughly the same in Anchorage, Alaska, and El Paso, Texas.

Fig. 1. Illustrating the December 2022 winter weather storm with ERA5 data products. (left) The contours show the average geopotential height during 21 Dec 2022. There is a very large ridge over the northeast (NE) Pacific and a deep trough over the continental interior. The flow approaching western coastal Alaska is almost southerly. (right) The average 2-m temperature during 23 Dec 2022; the black contour line corresponds to 0°C, roughly separating where precipitation falls as rain vs snow.

Extant models for spatial extremes (e.g., max-stable processes, generalized Pareto processes, and scale mixture models) are not equipped to deal with teleconnections where one location is abnormally warm, while another is abnormally cool (Krock et al. 2023). Specifically, these models are restricted to the joint upper (or lower) tail of the distribution—it is not possible to think about more than one tail of the spatial distribution, which is necessary for opposite-tail teleconnections. This setting motivates the challenging problem of studying extreme values on large spatial scales while being interested in the entire distribution.

We propose to model marginal distributions with the bulk-and-tails distribution, a parametric probability distribution with flexible behavior in its upper and lower tails (Stein 2021). After transforming the marginal distributions to standard uniformity, we apply multivariate copula models to capture opposite-tail dependence. A popular statistical model for this scenario is the vine copula, which uses bivariate building blocks to create a flexible asymmetric multivariate distribution (Czado and Nagler 2022). However, the simple pairwise construction of vine copulas naturally limits their expressiveness. We investigate the ability of deep learning models to replicate complicated aspects of the spatial distribution of daily wintertime temperatures in North America. In particular, we consider a stochastic process model which combines principal components and normalizing flows. Compared to vine copulas, the flow-based model more accurately reproduces bivariate opposite-tail teleconnective patterns as well as other spatial patterns in extreme temperatures.

The paper is structured as follows. In sections 2 and 3, we review concepts from extreme value theory and deep learning, respectively. Section 4 discusses our model and related work at the intersection of these two research areas. In section 5, we compare several probabilistic models in a cross-validation study to assess their ability to fit aspects of ERA5 temperature distribution that are relevant to extremes. Section 6 concludes.

2. Statistical background

We review some basic concepts from extreme value analysis, progressing from one variable to two variables to the general multivariate setting. For a formal introduction, see Coles (2001).

a. Univariate extremes

First, we discuss two celebrated univariate models from extreme value theory. The Fisher–Tippett–Gnedenko theorem motivates using the generalized extreme value (GEV) distribution to model the block maxima of a random variable, and the Pickands–Balkema–De Haan theorem motivates using the generalized Pareto distribution (GPD) to model threshold exceedances of a random variable. Both models have explicit parametric control of the tail behavior, which allows researchers to consider ideas such as the “return level” of an extreme event (i.e., the expected time until another event that is at least equally extreme). However, these models only consider a single tail of the distribution, ignoring the bulk and the other tail of the data distribution. Stein (2021) proposes a seven-parameter distribution designed to model the entire distribution, with flexible behavior in both tails. The cumulative distribution function (CDF) of a bulk-and-tails (BATs) random variable X is P(X ≤ x) = Tν[H(x)], where Tν(⋅) is the CDF of the Student’s t distribution with ν degrees of freedom and H(⋅) is a monotone increasing function with location, scale, and shape parameters for the upper and lower tails. The Julia package BulkAndTails.jl provides an interface to this distribution as well as an extension where the location and scale parameters depend on covariates (Krock et al. 2022).
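To illustrate the composition Tν ∘ H that defines the BATs CDF, here is a minimal Python sketch. The specific monotone function H used below is a hypothetical placeholder, not the seven-parameter BATs form implemented in BulkAndTails.jl; only the structure of the construction is shown.

```python
import numpy as np
from scipy.stats import t as student_t

def bats_style_cdf(x, nu, H):
    """CDF of the form P(X <= x) = T_nu[H(x)] for a monotone increasing H.

    The true BATs H has location, scale, and shape parameters for each tail;
    here H is supplied by the user as a simple placeholder."""
    return student_t.cdf(H(x), df=nu)

# Hypothetical monotone H; smaller nu gives heavier tails in both directions.
H = lambda x: np.sinh(x)
x = np.linspace(-5.0, 5.0, 11)
print(bats_style_cdf(x, nu=2.0, H=H))
```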

b. Copulas and bivariate tail dependence

A copula is a multivariate CDF on the unit hypercube with uniform marginal distributions (Nelsen 2006; Joe 2014). For simplicity, consider the bivariate setting where X and Y are two continuous random variables with joint CDF FX,Y and marginal CDFs FX and FY. Sklar’s theorem (Sklar 1959) says that there exists a unique copula C: [0, 1] × [0, 1] → [0, 1] such that FX,Y (x, y) = C[FX(x), FY(y)]. The copula representation is especially useful in multivariate extreme value statistics, as it provides a straightforward way to construct a valid multivariate distribution while preserving the marginal distributions FX and FY, which presumably have been calibrated to describe the marginal tails of the distribution well. For the remainder of this paper, we primarily focus on multivariate extremes and assume that the marginal distributions can be modeled appropriately with parametric extreme value models.

Sibuya (1960) proposed a tail dependence coefficient to describe the probability of multiple random variables experiencing concurrent extremes. In the bivariate case, there are four tail dependence statistics (Zhang 2008):
$$
\begin{aligned}
\lambda_{UU} &= \lim_{u\to 1} P[F_X(X) > u \mid F_Y(Y) > u] = \lim_{u\to 1} \frac{1 - 2u + C(u, u)}{1 - u},\\
\lambda_{LL} &= \lim_{u\to 1} P[F_X(X) \le 1 - u \mid F_Y(Y) \le 1 - u] = \lim_{u\to 1} \frac{C(1 - u, 1 - u)}{1 - u},\\
\lambda_{LU} &= \lim_{u\to 1} P[F_X(X) \le 1 - u \mid F_Y(Y) > u] = \lim_{u\to 1} \frac{1 - u - C(1 - u, u)}{1 - u},\\
\lambda_{UL} &= \lim_{u\to 1} P[F_X(X) > u \mid F_Y(Y) \le 1 - u] = \lim_{u\to 1} \frac{1 - u - C(u, 1 - u)}{1 - u}.
\end{aligned}
\tag{1}
$$
Applications usually focus on λUU and λLL, yet λUL and λLU have received attention in financial time series (Wang et al. 2013; Chang 2021). Krock et al. (2023) studied the spatial patterns of these four tail dependencies using the ERA5 reanalysis of winter surface air temperature. Atmospheric blocking over the subarctic Pacific produces strong teleconnections between Alaska and the midlatitudes of North America, which is reflected in large values for opposite-tail dependencies λUL and λLU (illustrated later in Fig. 2). Teleconnections are traditionally studied with correlation coefficients (Wallace and Gutzler 1981). Tail dependence coefficients provide an alternative way to measure the strength of teleconnections when one is interested in extremes. In general, X and Y in (1) can be different climatological variables; for example, Singh et al. (2021) use an ensemble pooling approach to analyze compound temperature-precipitation extremes in Canada.
Fig. 2. Correlation and empirical u = 0.95 tail dependencies between a grid box in NW Alaska (black dot) and all other grid boxes. For reference, the black dot corresponds to a grid box in NW Alaska whose lower-left coordinate is (−165.1°, 65.1°) and upper-right coordinate is (−159.1°, 69.1°). Grayscale denotes the gridbox values of the extremal dependence matrix (such that the lower-right corner of a grid box depicts λUL, the tail dependence where that grid box is especially cold and the grid box with the black dot is especially warm).

With these details in mind, the main question is what copula to use. Two common families of copula models are elliptical copulas, which are highly symmetric, and Archimedean copulas, which only possess a single parameter, restricting their utility to low dimensions. These two families also have limitations on the possible values for tail dependencies; in particular, none are capable of producing four distinct values in (1). The bivariate mixture model proposed in Krock et al. (2023) considers a mixture of four 90° rotations of a bivariate Archimedean copula with asymptotic dependence in one corner of the unit square and asymptotic independence in the other three corners. By construction, this model has four different values for the tail dependence coefficients (1), but their sum cannot exceed 1. This limitation is unrealistic since both λUU and λLL will be close to 1 when considering two nearby locations. Moreover, extending this mixture model to d > 2 dimensions is impractical, as it requires estimating a weight and a copula parameter for each of the 2^d d-dimensional tails, yielding a number of parameters that grows exponentially in d.

We may also be interested in the values in (1) at a quantile u < 1, in which case nonparametric empirical estimators are available. For example, given independent and identically distributed realizations (x1, y1), …, (xn, yn) of (X, Y), Reiss and Thomas (2007) define
$$
\hat{\lambda}_{UU}(u) = \frac{1}{n(1-u)} \sum_{i=1}^{n} \mathbf{1}\big(x_i > x_{\lceil un \rceil:n} \ \text{and}\ y_i > y_{\lceil un \rceil:n}\big),
\tag{2}
$$
where x_{⌈un⌉:n} denotes the ⌈un⌉th order statistic of (x1, …, xn) and 1(⋅) is an indicator function that equals one when its argument is true and zero otherwise. Note that, for a fixed amount of data, the quality of these empirical estimates will worsen as u is increased to 1.
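For concreteness, the following is a minimal NumPy sketch of the empirical estimator (2) together with its three rotated analogs; function and variable names are ours and thresholds are taken from empirical quantiles rather than order statistics, which is equivalent up to rounding.

```python
import numpy as np

def empirical_tail_dependence(x, y, u=0.95):
    """Empirical estimates of lambda_UU, lambda_LL, lambda_LU, lambda_UL at level u.

    Exceedances are defined by the empirical u (or 1 - u) quantiles of each
    margin, mirroring the order-statistic thresholds in (2)."""
    n = len(x)
    x_hi, y_hi = np.quantile(x, u), np.quantile(y, u)
    x_lo, y_lo = np.quantile(x, 1 - u), np.quantile(y, 1 - u)
    denom = n * (1 - u)
    lam_uu = np.sum((x > x_hi) & (y > y_hi)) / denom
    lam_ll = np.sum((x <= x_lo) & (y <= y_lo)) / denom
    lam_lu = np.sum((x <= x_lo) & (y > y_hi)) / denom
    lam_ul = np.sum((x > x_hi) & (y <= y_lo)) / denom
    return lam_uu, lam_ll, lam_lu, lam_ul

# Example: for independent margins, all four values are near 1 - u = 0.05.
rng = np.random.default_rng(0)
print(empirical_tail_dependence(rng.normal(size=4000), rng.normal(size=4000)))
```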

c. Multivariate extremes

Li (2009) generalizes the bivariate tail dependencies (1) to arbitrary orthants of a d-dimensional vector. When working with spatial data from a large study region, it can be unrealistic to only consider scenarios where all d variables are simultaneously extreme. In particular, for long-range anticorrelated temperature extremes, there must be a transition region of asymptotic independence between the two regions that experience opposite-tail extremes.

Flexible multivariate copulas with analytic expressions for tail dependencies are scarce. In this work, we take a unique approach and validate our models through comparison with empirical tails of the data distribution. Assessing model fit based on empirical tail dependence coefficients is already challenging due to an inherent lack of data, and this effect is compounded when one is interested in multivariate opposite-tail extremes. Later in section 5, we propose metrics based on the “spatial extent” of an extreme event to assess the strength of opposite-tail dependence.

Besides vine copulas (which we discuss next), there are few other options for constructing asymmetric multivariate copulas. Archimax copulas combine Archimedean copulas with a stable tail dependence function for added flexibility (Charpentier et al. 2014). Ng et al. (2022) developed scalable methods for inference and sampling from the Archimax family and demonstrated improved performance in tail dependence inference compared to several other density estimators. Gong and Huser (2022) proposed a copula model with asymmetric behavior in the joint upper and lower tails, but computations in high dimensions are demanding. A main limitation of multivariate copula models outside the vine family lies in their restriction to capturing tail dependence and asymmetries along the “main diagonal” of the distribution. That is, they are only concerned with modeling extremes in the joint upper (lower) tail, which effectively ignores any sort of negative extremal dependence in the distribution.

Existing models in spatial extremes suffer from the same fundamental limitation. Max-stable processes and generalized Pareto processes generalize the univariate GEV and GPD methodology to the setting of multivariate stochastic processes. Both models are restricted to the joint upper (or lower) tail of the distribution, and the associated tail dependence coefficient is constant over space. Many recent works have proposed nonstationary models that can transition from close-range asymptotic dependence to asymptotic independence as the distance between locations increases; see Huser and Wadsworth (2022) for a review. Teleconnections can cause tail dependence to appear at large distances, potentially between opposite tails of the distribution. Conditional extreme models (Heffernan and Tawn 2004) could be adapted to deal with teleconnections, but this would be a significant departure from the current framework where the conditioning extreme event only considers the joint upper tail and often just a single conditional site (Wadsworth and Tawn 2022).

d. Vine copulas

In general, it is not possible to construct a multivariate distribution that preserves a collection of bivariate marginal distributions (Joe 2014, section 2.7). We can, however, approximate this procedure with vine copulas (Bedford and Cooke 2002; Aas et al. 2009; Czado and Nagler 2022), which construct a valid multivariate copula using d(d − 1)/2 bivariate copulas as building blocks. Specifically, the d-dimensional copula is factorized as a product of d(d − 1)/2 bivariate copulas according to a vine-like1 conditional independence structure. Commonly used canonical vines (C-vines) or drawable vines (D-vines) are special cases of the regular vine (R-vine) family. Given a regular vine V with edge set E, the R-vine copula density is expressed as
$$
f(x_1, x_2, \dots, x_d) = \prod_{e \in E} c_{e_1, e_2; D_e}\big(F_{e_1 \mid D_e}, F_{e_2 \mid D_e}; \boldsymbol{x}_{D_e}\big) \prod_{i=1}^{d} f_i(x_i),
\tag{3}
$$
where De denotes the conditioning variables corresponding to the edge e = (e1, e2)T. An example of a four-dimensional R-vine copula density given by Czado and Nagler (2022) is
$$
\begin{aligned}
f(x_1, x_2, x_3, x_4) ={}& c_{12}[F_1(x_1), F_2(x_2)] \times c_{13}[F_1(x_1), F_3(x_3)] \times c_{14}[F_1(x_1), F_4(x_4)]\\
&\times c_{23;1}[F_{2|1}(x_2|x_1), F_{3|1}(x_3|x_1); x_1] \times c_{24;1}[F_{2|1}(x_2|x_1), F_{4|1}(x_4|x_1); x_1]\\
&\times c_{34;12}[F_{3|12}(x_3|x_1, x_2), F_{4|12}(x_4|x_1, x_2); x_1, x_2] \times \prod_{i=1}^{4} f_i(x_i).
\end{aligned}
\tag{4}
$$
We see that the joint density of a vine copula framework factorizes into three parts: the marginal distributions, a set of baseline copulas that connect the marginal distributions (i.e., c12, c13, c14), and conditional copulas that use the previous edges as leaves (i.e., c23;1, c24;1, c34;12). Joe et al. (2010) showed that these baseline copulas govern the tail dependence of the vine copula. In total, there are d! × 2^[(d−2)(d−3)/2 − 1] R-vine copulas in d dimensions (Morales-Nápoles 2010).

Vine copulas are widely used in finance (Brechmann and Czado 2013; Low et al. 2018) and have also appeared in spatial statistics literature (Gräler 2014; Erhardt et al. 2015). Tail dependence coefficients of a vine copula can be calculated recursively (Joe et al. 2010; Salazar Flores and Díaz-Hernández 2021). In particular, Salazar Flores and Díaz-Hernández (2021) use rotated copulas to extend the recursive derivations of tail dependence functions in Joe et al. (2010) to the setting of counterdiagonal/nonpositive dependence; i.e., they consider an arbitrary joint tail where all d variables are marginally extreme, but not necessarily the joint upper or lower tail. Except in special cases, this recursion will require the numerical computation of high-dimensional integrals. Instead of approximating this integral with repeated Monte Carlo integration, we take a simpler approach and estimate the tail dependence coefficients empirically from simulations from the R-vine copula (Dißmann et al. 2013, Algorithm 2.2). Joe et al. (2010) also showed that a vine copula has tail dependence in the joint upper tail if each of the d − 1 bivariate copulas in the first baseline level of the vine exhibits upper-tail dependence. This reasoning extends directly to other counterdiagonal tails by rotating copulas as in Salazar Flores and Díaz-Hernández (2021). That is, tail dependence exists in one of the 2^d joint tails if the d − 1 bivariate baseline copulas are extreme in appropriate corners of the unit square. For example, if we desire nonzero dependence in the λULLU tail of (4), then the bivariate copula densities c12, c13, and c14 must have nonzero tail dependence in the λUL, λUL, and λUU corners of the unit square, respectively. Thus, considering arbitrary d-dimensional tail dependencies in vine copula models can be difficult since the typical bivariate building blocks do not have flexible tail dependencies in the four corners of the unit square. The relationship between the d-dimensional joint tails and lower-dimensional tails of a vine copula is complicated; see Proposition 4.3 in Joe et al. (2010) and Simpson et al. (2021).

In summary, modeling opposite-tail teleconnected extremes motivates us to search for multivariate distributions with flexible behavior in multiple tails—not just the joint upper tail, which is the setting of nearly all research in multivariate extreme value theory. Vine copulas are suited for this task, but ensuring nonzero tail dependence in a situation where marginal distributions are mixed between the upper tail, lower tail, and bulk is outside the scope of the current literature.

3. Deep learning background

We explore the ability of probabilistic generative models to model complex features of the entire data distribution, with particular focus on asymmetries and extremes. A probabilistic generative model in this context refers to a model which is able to generate unconditional samples from a multivariate probability distribution.

a. Normalizing flow

A normalizing flow uses neural networks to parameterize an invertible map from a simple known distribution to a complex unknown data distribution. Therefore, normalizing flows can be applied in Bayesian settings as prior distributions (Rezende and Mohamed 2015). Specifically, a normalizing flow gθ: ℝ^ℓ → ℝ^ℓ is an invertible function with an easily computable Jacobian determinant such that the transformed random variable Z = gθ⁻¹(C) follows a simple base distribution [e.g., Z ∼ N(0, I), where I is the ℓ × ℓ identity matrix]. Using the change-of-variable formula, the density
$$
f_C(c) = f_Z\big[g_\theta^{-1}(c)\big]\, \left|\det\!\left[\frac{d g_\theta(z)}{d z}\bigg|_{z = g_\theta^{-1}(c)}\right]\right|^{-1}
\tag{5}
$$
is trivial by construction. Note that a composition of normalizing flows is still invertible with tractable Jacobian determinant, so in practice, it is common to compose normalizing flows for increased modeling flexibility. We follow the standard training procedure of neural network models and repeatedly update estimates for θ using stochastic gradient descent where the objective function (likelihood of the model) is evaluated using a subset of the data samples (known as a “batch”). Training happens for a number of epochs, where an epoch is a pass through all batches, updating θ via stochastic gradient descent in each batch. To prevent overfitting, batches are shuffled randomly between epochs. In our implementation, we use the state-of-the-art Neural Spline Flow (Durkan et al. 2019) implemented in the Python package nflows (Durkan et al. 2020). The Neural Spline Flow is a powerful density estimator, particularly when constructed in an autoregressive fashion (Coccaro et al. 2023). We provide a more detailed description of the autoregressive Neural Spline Flow in appendix A; see Kobyzev et al. (2021) and Papamakarios et al. (2022) for an extensive discussion of normalizing flows. A downside of this model is that the user is required to select an autoregressive ordering. In section 4, we propose a solution based on principal components to help with this problem of ordering variables.
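To make the training loop concrete, the sketch below fits an autoregressive rational-quadratic spline flow by maximum likelihood using the nflows package. The architecture, hyperparameters, and the stand-in data are illustrative placeholders rather than the exact configuration used in section 5, and the class and argument names follow our reading of the nflows documentation, so they should be checked against the installed version.

```python
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.permutations import RandomPermutation
from nflows.transforms.autoregressive import (
    MaskedPiecewiseRationalQuadraticAutoregressiveTransform,
)

dim = 25  # dimension of the (projected) copula data
transforms = []
for _ in range(4):  # compose several flows for added flexibility
    transforms.append(RandomPermutation(features=dim))
    transforms.append(
        MaskedPiecewiseRationalQuadraticAutoregressiveTransform(
            features=dim, hidden_features=64, num_bins=8,
            tails="linear", tail_bound=5.0,
        )
    )
flow = Flow(CompositeTransform(transforms), StandardNormal([dim]))

optimizer = torch.optim.Adam(flow.parameters(), lr=1e-4)
data = torch.randn(3940, dim)  # stand-in for copula-scale training data
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(data), batch_size=100, shuffle=True)

for epoch in range(10):  # far fewer epochs than in the paper, for illustration
    for (batch,) in loader:
        loss = -flow.log_prob(batch).mean()  # negative log-likelihood from (5)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

samples = flow.sample(1000)  # draw new simulations from the fitted flow
```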

b. Other generative models

To produce a new simulation from the data distribution, one follows the flow from the noise distribution to the data distribution [i.e., Z ↦ gθ(Z)]. However, this does not require invertibility of the generator gθ(⋅). Instead of a normalizing flow, one could use a standard feedforward neural network gθ: ℝ^L → ℝ^ℓ, where the input dimension L need not equal ℓ, for increased modeling flexibility. To be precise, this gθ(⋅) would be defined recursively via J hidden layers as
$$
g_\theta(\cdot) := g^{(J)}(\cdot) = \sigma\big[W_J\, g^{(J-1)}(\cdot) + b_J\big],
\tag{6}
$$
where σ(⋅) is a nonlinear activation function applied elementwise to its argument, g^(0)(⋅) is the identity function, and the parameters θ are the weight matrices W1, …, WJ and bias vectors b1, …, bJ. For simplicity, we set L = ℓ in this paper and demonstrate that the invertible normalizing flow framework performs as well as the noninvertible framework (6).

If gθ(⋅) is not invertible, the likelihood (5) is unknown and cannot be used as the objective function of the model. Some alternatives to maximum likelihood are adversarial training (Goodfellow et al. 2014) or discrepancy training (Li et al. 2015). Although generative adversarial networks (GANs) have achieved widespread success in many applications, adversarial training is known to be unstable, and we encountered training difficulties in our data analysis (see appendix C). Annau et al. (2023) show that GANs can hallucinate small-scale artifacts within downscaled model predictions of surface wind. We found discrepancy metrics based on proper scoring rules to be more effective than adversarial training. Note that maximum likelihood estimation is equivalent to minimizing a distance—the Kullback–Leibler (KL) divergence—between the data distribution and model distribution. The GeomLoss package (Feydy et al. 2019) provides a PyTorch loss function that calculates the energy distance between distributions. Formally, the squared energy distance between two distributions FA and FB equals 2E‖A1 − B1‖ − E‖A1 − A2‖ − E‖B1 − B2‖, where A1, A2, B1, B2 are independent with A1, A2 ∼ FA and B1, B2 ∼ FB. To minimize this distance in practice, the expectations are approximated with the sample mean, and neural network parameters in the generator are optimized with stochastic gradient descent. These alternative training procedures are also available for normalizing flows (Si et al. 2021; Si and Kuleshov 2022), but for the remainder of this paper, our normalizing flow models are estimated by maximum likelihood.
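As an illustration of discrepancy training, here is a minimal PyTorch sketch of the squared energy distance written directly from the definition above (not the GeomLoss implementation), with a placeholder generator architecture; all names are ours.

```python
import torch

def energy_distance_sq(a, b):
    """Squared energy distance 2 E||A - B|| - E||A - A'|| - E||B - B'||,
    estimated with sample means over two batches (rows are samples).
    The zero diagonals of the within-batch distance matrices add a small
    bias that we ignore in this sketch."""
    return (2 * torch.cdist(a, b).mean()
            - torch.cdist(a, a).mean()
            - torch.cdist(b, b).mean())

# Placeholder generator: noise in R^L mapped to samples in R^d.
L_noise, d = 100, 76
gen = torch.nn.Sequential(torch.nn.Linear(L_noise, 128), torch.nn.ReLU(),
                          torch.nn.Linear(128, d))
opt = torch.optim.Adam(gen.parameters(), lr=1e-4)

data = torch.randn(3940, d)  # stand-in for copula-scale training data
for step in range(200):  # a short illustrative loop
    batch = data[torch.randint(0, data.shape[0], (100,))]
    fake = gen(torch.randn(100, L_noise))
    loss = energy_distance_sq(fake, batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
```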

Diffusion models are a different class of generative models that have achieved state-of-the-art results in image generation (Ho et al. 2020). Implementations rely heavily upon the U-Net architecture (Ronneberger et al. 2015), which inputs (and outputs) a p × p image comprised of p^2 square pixels of equal size. Our application in section 5 considers rectangular grid boxes over the land in North America; the U-Net architecture is not readily available in this case or other scenarios with scattered spatial data. With image data, diffusion models are an appealing option, and we leave the investigation of their utility in generating extremes for future work.

4. Proposed model and related work

To model dependence between variables, we consider the process model Y(s) = ∑_{i=1}^{ℓ} Ci ϕi(s), where the random vector C = (C1, …, Cℓ)^T is modeled using the Neural Spline Flow (Durkan et al. 2019) under different choices of basis functions ϕ1, …, ϕℓ with ℓ ≤ d. Given temperature measurements at locations s1, …, sd, we are interested in the random vector Y where
$$
\boldsymbol{Y} = \begin{bmatrix} Y(s_1) \\ Y(s_2) \\ \vdots \\ Y(s_d) \end{bmatrix}
= \begin{bmatrix}
\phi_1(s_1) & \phi_2(s_1) & \cdots & \phi_\ell(s_1) \\
\phi_1(s_2) & \phi_2(s_2) & \cdots & \phi_\ell(s_2) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_1(s_d) & \phi_2(s_d) & \cdots & \phi_\ell(s_d)
\end{bmatrix}
\begin{pmatrix} C_1 \\ C_2 \\ \vdots \\ C_\ell \end{pmatrix}
= \Phi \boldsymbol{C}.
\tag{7}
$$
Here, Y is implicitly assumed to have standard normal marginals, although this is not enforced in our model. We build basis functions from principal component analysis (PCA), a popular statistical technique for dimension reduction. Constructing Φ from principal components has been effective in other normalizing flow applications (Cunningham et al. 2020; Cramer et al. 2022; Li and Hooi 2022; Klein et al. 2022), although none of these works consider spatial processes or extremes. Jiang et al. (2020) and Drees and Sabourin (2021) conduct principal component analyses for multivariate extremes with respect to the joint upper tail of the distribution. We can also model Y directly with the normalizing flow, i.e., taking ℓ = d and Φ = I. In general, basis function approaches for spatial data can be useful when the number of locations is prohibitively large. Using principal components also adds a notion of spatial continuity which is otherwise missing in the normalizing flow model. An additional motivation in our data analysis is that the first principal component shows a pronounced opposite-tail teleconnection between Alaska and the rest of the study region (see Fig. 3).
Fig. 3. (left) Cumulative percentage of variability explained by PCA. (right) Spatial map of the first principal component, where a strong anticorrelated teleconnective pattern is clear.

Assuming the number of principal components is large enough, the density of (7) can be approximated as
$$
f_Y(y) \approx f_C(\Phi^{\dagger} y)\, \big|\det(\Phi^{\mathsf T} \Phi)\big|^{-1/2},
\tag{8}
$$
where Φ† is the left pseudoinverse of Φ and fC(⋅) is the normalizing flow density. Using a normalizing flow instead of an arbitrary probabilistic generative model ensures that fC(⋅) is available in closed form. Observe that the neural network parameters in (8) only depend on the first term fC(Φ†y), and the projected data Φ†y can be precomputed.
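A minimal NumPy sketch of the basis construction in (7) and the density correction in (8) follows; the flow log-density log fC is left as a stand-in function, and all names and the synthetic data are ours.

```python
import numpy as np

# Copula-scale data with standard normal marginals: d locations x n_dat days.
d, n_dat, ell = 76, 3940, 25
Y = np.random.randn(d, n_dat)              # stand-in for the transformed ERA5 data

# Basis functions from the first ell left singular vectors (principal components).
U, s, _ = np.linalg.svd(Y, full_matrices=False)
Phi = U[:, :ell]                           # d x ell matrix of basis functions
Phi_pinv = np.linalg.pinv(Phi)             # left pseudoinverse, ell x d
C = Phi_pinv @ Y                           # projected data, precomputed once

def log_density_Y(y, log_f_C):
    """Approximate log f_Y(y) via (8): log f_C(Phi^+ y) - (1/2) log det(Phi^T Phi)."""
    _, logdet = np.linalg.slogdet(Phi.T @ Phi)
    return log_f_C(Phi_pinv @ y) - 0.5 * logdet

# Example with a standard normal stand-in for the flow density log f_C.
log_f_C = lambda c: -0.5 * (c @ c + len(c) * np.log(2 * np.pi))
print(log_density_Y(Y[:, 0], log_f_C))
```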

As a model for the entire distribution of multivariate extremes, we use the bulk-and-tails distribution for marginal distributions in conjunction with a normalizing flow for the dependence structure. This framework is a special case of a copula and marginal (CM) flow (Wiese et al. 2019), allowing for flexible multivariate tail modeling up to an approximation of the uniform distribution. It is important to note that an arbitrary normalizing flow used in the copula step of a CM flow is not a true copula model, as its marginal distributions are unknown. However, once training is complete, the marginal distributions can be corrected to more closely resemble a copula in a postprocessing step. Our modeling procedure is as follows:

  1. Marginals: At each of the d locations, fit the observations with a BATs distribution. We allow the scale and location parameters to depend on time-varying covariates to account for seasonal differences between temperatures at the beginning and end of boreal winter. Additionally, to account for climate change, the scale and location parameters depend on the yearly value of greenhouse gas emissions in the form of log CO2 equivalent (Krock et al. 2022).

  2. Copula: Transform marginal distributions to standard normality by applying the estimated BATs CDF followed by the normal quantile function. If desired, project the data onto its principal components; the number of principal components can be chosen according to the percentage of variability explained. The generative model is then fit to the (projected) data by maximizing the likelihood (8).

  3. Correction: Simulate ngen times from the d-dimensional model, with ngen > ndat. For each marginal distribution, estimate a univariate CDF based on the simulations (e.g., empirical CDF or parametric BATs CDF). Then, for each marginal distribution, apply the estimated CDF followed by the normal quantile function.

A copula can equivalently be defined with marginal distributions that are standard normal instead of standard uniform. It is more natural to use the convention of standard normal marginals than to constrain the basis expansion (7) to lie in the unit hypercube. In offline experiments (not shown), we used an empirical CDF for Step 1 and found little difference in terms of the resulting empirical tail dependencies. For Step 3, we can consider parametric and nonparametric corrections for the marginals of the generative model. Ideally, the estimated marginal distributions of the generative model are already close to standard normal, so only minor corrections are needed. Note that applying a strictly increasing parametric function (e.g., the CDF of the BATs distribution) to correct marginals is guaranteed to preserve the copula (Joe 2014).
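The nonparametric version of the Step 3 correction amounts to a probability-integral-transform adjustment of each simulated margin. A minimal sketch is below (names ours), assuming a recent SciPy whose rankdata supports the axis argument.

```python
import numpy as np
from scipy.stats import norm, rankdata

def correct_marginals_to_normal(sims):
    """Step 3 (nonparametric version): for each column of the simulated matrix
    sims (n_gen x d), apply an empirical CDF and then the normal quantile
    function so that each margin is approximately standard normal."""
    n_gen = sims.shape[0]
    # Ranks scaled to (0, 1) act as the empirical CDF evaluated at each sample.
    u = rankdata(sims, axis=0) / (n_gen + 1)
    return norm.ppf(u)

sims = np.random.standard_t(df=5, size=(10000, 76))  # stand-in generator output
corrected = correct_marginals_to_normal(sims)
```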

a. Related work

Boulaguiem et al. (2022) use a similar framework to model the annual maxima of temperature and precipitation in western Europe. Specifically, their model combines GEV marginals with a GAN for dependence. The key difference is that this formulation only considers the joint upper tail of the data. If the GEV marginals in Boulaguiem et al. (2022) were changed to the BATs distribution, it would look similar to our model but with the normalizing flow replaced by a GAN. We found this GAN version of our model very difficult to train on the ERA5 dataset (see appendix C). However, due to the spatial regularity of their observations, Boulaguiem et al. (2022) were able to use convolutional layers in the GAN architecture, which could explain their improved performance. Additionally, convolutions are not generally invertible operators, complicating their use with normalizing flows; Karami et al. (2019) construct invertible convolutions for this purpose.

The closest work to ours is McDonald et al. (2022), who raise the important issue of creating a generative multivariate model with flexible tail dependence. They propose COMET Flows, a specific type of CM flow which combines a normalizing flow for dependence with flexible marginal models for extremes. Their marginal distributions are obtained from a mixture model that combines a kernel density estimate for the bulk of the distribution with two generalized Pareto distributions to describe the lower and upper marginal tails. This type of model has been explored in MacDonald et al. (2011); see Scarrott and MacDonald (2012) for an extensive discussion about threshold models and issues with combining bulk and tails in inference for extremes.

With the marginals specified, McDonald et al. (2022) use SoftFlow (Kim et al. 2020) to model the dependence between variables. The motivation for using SoftFlow is that tail dependence can be viewed as a low-dimensional manifold of feature space that normalizing flows struggle to model since they are invertible. SoftFlow perturbs conditional inputs with noise to better capture the manifold structure in feature space. Despite this justification, we did not find SoftFlow to be an effective solution to modeling multivariate tail dependence (see appendix C). In comparison with McDonald et al. (2022), we focus explicitly on spatial patterns of extremes and consider the basis decomposition (7) to move toward a stochastic process model.

b. Marginal tails

Although this paper focuses on multivariate tails of the distribution, we briefly mention some important results about the marginal tails of probabilistic generative models. Previous work has studied how the marginal tails of the data distribution and the base distribution are related, for both normalizing flows (Jaini et al. 2020; Wiese et al. 2019; Laszkiewicz et al. 2022) and GANs (Huster et al. 2021; Oriol and Miot 2021; Allouche et al. 2022). If, for example, marginals of the data distribution are heavy tailed, but the base distribution is Gaussian noise, then common GANs or normalizing flows will fail. The flow used in our experiments does not possess the limitations described in these works; see appendix A for more discussion. Moreover, we marginally transform the copula to match the standard Gaussian marginals of the base distribution so that, in theory, the marginal modeling step is less difficult.

5. ERA5 data analysis

To illustrate the strengths of our proposed multivariate model, we build on the data analysis from Krock et al. (2023) that used tail dependence coefficients to study teleconnected extremes in the ERA5 temperature reanalysis data product (Hersbach et al. 2020). Tail dependence coefficients provide an alternative way to measure the strength of teleconnected extremes beyond the commonly used correlation coefficient (Wallace and Gutzler 1981). As discussed in section 2c, generalizing the bivariate mixture model from previous work to a high-dimensional setting is unrealistic. We propose several criteria for model performance and conduct a cross-validation study among various probabilistic generative models to see how well they model aspects of the temperature distribution that are relevant to extremes.

We consider the daily average 2-m temperature in December, January, and February from 1979 to 2022 over Canada and the contiguous United States; in total, there are ndat = 3940 temperature measurements at each location. The original ERA5 product lies on a 0.25° longitude/latitude grid, and we perform a coarse spatial averaging, resulting in d = 76 disjoint grid boxes that cover the study region. In other experiments (not shown), we observed that averaging the ERA5 data within each grid box produces stronger teleconnections than representing the grid box by the measurement closest to its centroid. For the remainder of this work, we consider marginal modeling at each of the 76 locations to be complete after fitting marginal BATs models (Krock et al. 2022) and transforming the temperature data to have standard uniform (or normal) marginal distributions. That is, we assume temporal independence over days after applying the marginal transformations to uniformity, giving us samples Y1, …, Y_{ndat} of the d-dimensional random vector Y with known marginals. Although incorporating time dependence is necessary for predicting when an atmospheric blocking will occur, we focus on the spatial distribution of the temperature field during blocking events, for which the notion of simultaneous extremes is more important.

Figure 2 shows the empirical estimates of the four tail dependencies (1) and bivariate correlation between a grid box in northwest (NW) Alaska and all other 75 grid boxes. Spatial patterns are evident, especially the strong opposite-tail dependence between NW Alaska and midlatitude United States corresponding to atmospheric blocking events. Near the primary grid box in NW Alaska, there is strong positive dependence in both the correlation coefficient and the common-tail dependencies λLL and λUU. Moving toward northern Canada, both the correlation and tail dependence coefficients are small. Over most of Canada and the United States, we see the teleconnective patterns of negative correlation and large values for opposite-tail dependence. The strongest teleconnections correspond to atmospheric blocking events where the United States experiences freezing temperatures, while NW Alaska is anomalously warm. In contrast, for central Canada, the larger opposite-tail dependence coefficient corresponds to zonal atmospheric flow where Canada is relatively warm, while Alaska is relatively cool. These asymmetries also exist in the analytic tail dependencies for parametric bivariate mixture models in Krock et al. (2023). The main goal of our multivariate model in this paper is to reproduce these bivariate correlations and (empirical) tail dependencies. Spatial processes based on elliptical distributions can model a wide range of correlations but have opposite-tail symmetry in their bivariate tail dependence coefficients.

As a way to study opposite-tail dependence beyond the bivariate case, we propose statistics based on the spatial extent of an extreme event, defined in Zhang et al. (2022). First, in a preliminary step, we calculate the areas of the d = 76 coarse grid boxes using great circle distance2 and call them A1, …, Ad. For a temperature vector U = (U1, …, Ud)T on the copula scale, we define the statistics
$$
\begin{aligned}
\alpha_{UU} &= \lim_{u\to 1} E\!\left[\sqrt{\textstyle\sum_{i=1}^{d} A_i\, \mathbf{1}(U_i > u)/\pi}\ \Big|\ U_j > u\right], &
\alpha_{LL} &= \lim_{u\to 1} E\!\left[\sqrt{\textstyle\sum_{i=1}^{d} A_i\, \mathbf{1}(U_i \le 1-u)/\pi}\ \Big|\ U_j \le 1-u\right],\\
\alpha_{LU} &= \lim_{u\to 1} E\!\left[\sqrt{\textstyle\sum_{i=1}^{d} A_i\, \mathbf{1}(U_i \le 1-u)/\pi}\ \Big|\ U_j > u\right], &
\alpha_{UL} &= \lim_{u\to 1} E\!\left[\sqrt{\textstyle\sum_{i=1}^{d} A_i\, \mathbf{1}(U_i > u)/\pi}\ \Big|\ U_j \le 1-u\right]
\end{aligned}
\tag{9}
$$
to represent the average radius3 of exceedance (ARE) of spatial extremes in U given that Uj is extreme. Our extension to Zhang et al. (2022) is to consider cases other than the upper-tail extent αUU.

We empirically estimate the tail dependencies (1) and ARE (9) with samples from the probabilistic generative model. Since our primary interest is accurately modeling observed extreme events rather than extrapolation, we use a large number of simulations (i.e., ngen = 10^6) and select a modest quantile level of u = 0.95 to define empirical extremes. Increasing u further suggests that none of the estimated generative models exhibit long-range opposite-tail dependence (see Fig. D8). By setting aside analytic expressions for (1) and (9), we ignore the issue of extrapolating beyond the in-sample behavior, which is a primary role of extreme value theory. Nonetheless, the spatial asymmetries of moderately extreme surface air temperatures in our data example pose an interesting and relevant challenge outside the usual scope of spatial extremes methodology.
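A minimal NumPy sketch of the empirical version of (9) at a finite level u, conditioning on a single grid box j, is shown below for the αUU configuration; the other three configurations are analogous after replacing the exceedance events. The function name, data, and areas are ours.

```python
import numpy as np

def empirical_are_uu(U, areas, j, u=0.95):
    """Empirical average radius of exceedance alpha_UU for grid box j.

    U      : n x d matrix on the copula (uniform) scale
    areas  : length-d vector of grid box areas A_1, ..., A_d
    Conditions on days where U[:, j] > u and averages the radius of a circle
    whose area equals the total area of grid boxes that also exceed u."""
    cond = U[:, j] > u
    exceed_area = (U[cond] > u) @ areas           # total exceedance area per day
    return np.mean(np.sqrt(exceed_area / np.pi))  # average radius over those days

# Example with independent uniforms and hypothetical equal areas (km^2).
rng = np.random.default_rng(1)
U = rng.uniform(size=(4000, 76))
areas = np.full(76, 4.0e4)
print(empirical_are_uu(U, areas, j=0))
```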

Model comparison

We conduct a tenfold cross-validation comparison to judge how well various probabilistic generative models can reproduce correlation, tail dependence, and spatial extent in the ERA5 temperature data. First, we separate the set {1, …, ndat} into 10 disjoint sets I1, …, I10. Each generative model listed below is trained on Y_{−I_k} for 1 ≤ k ≤ 10, where Y_{−I_k} contains all samples of Y except those from the cross-validation fold Ik. See appendix A for specific details about neural network training and architecture. We consider the following models:

  1. R-vine copula estimated using the vinecop function from pyvinecopulib (Nagler and Vatter 2020); a brief fitting sketch is shown after this list. We consider all possible types of bivariate copulas, including their rotations. We also permit nonparametric bivariate copulas based on local likelihood estimators (Geenens et al. 2017) since the cross-validation performance was noticeably worse upon restricting to only parametric copulas. The model selection procedure from Dißmann et al. (2013) automatically chooses the structure of the R-vine, estimates parameters for various bivariate copulas, and then picks the optimal bivariate copula for each vine pair.

  2. Neural Spline Flow (Durkan et al. 2019) using nflows (Durkan et al. 2020). See appendix A for mathematical details.

  3. PCA basis expansion (7) with Neural Spline Flow for the latent vector. Specifically, we create basis functions from the first ℓ eigenvectors of the SVD of the d × ndat copula data matrix, where the data are transformed to have standard normal marginals. We show results for ℓ = 25 principal components, which explained 96.4% of the total variability in the process (see Fig. 3) and provided a large improvement over ℓ = 15. Appendix D shows how our validation results change as the number of principal components varies.

  4. Generative moment matching network (Li et al. 2015) trained with energy distance loss.

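The sketch below fits and simulates an R-vine with pyvinecopulib, as referenced in model 1 above. The constructor and control arguments are written from our understanding of the package's API and may differ across versions; the uniform stand-in data and family list are ours.

```python
import numpy as np
import pyvinecopulib as pv

# u: n_dat x d matrix of copula-scale (uniform) training data (a stand-in here).
u = np.random.uniform(size=(3940, 76))

# Allow a few parametric families plus the nonparametric "tll" family;
# the names below come from the pyvinecopulib BicopFamily enum.
controls = pv.FitControlsVinecop(
    family_set=[pv.BicopFamily.gaussian, pv.BicopFamily.clayton,
                pv.BicopFamily.gumbel, pv.BicopFamily.tll])

# Structure selection and pair-copula estimation (Dissmann et al. 2013).
# Depending on the package version, this step may instead be written as
# cop = pv.Vinecop(u.shape[1]); cop.select(u, controls=controls).
cop = pv.Vinecop(data=u, controls=controls)

sims = cop.simulate(10**5)   # simulations used for empirical tail statistics
print(cop.loglik(u))         # in-sample log-likelihood
```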
Once the training on Y_{−I_k} is complete, we generate a new set of ngen = 10^6 samples (denoted as Ỹ_k) in order to evaluate the model fit with respect to the held-out data Y_{I_k}. For each grid box i ∈ {1, …, d} and j > i, we calculate the following:

  1. (1/10) Σ_{k=1}^{10} [ρ_{i,j}(Y_{I_k}) − ρ_{i,j}(Ỹ_k)], where ρ_{i,j}(Y_I) is the empirical Spearman rank correlation between the ith and jth grid boxes, calculated using realizations Y_I.

  2. (1/10) Σ_{k=1}^{10} [λ_{UU,i,j}(Y_{I_k}) − λ_{UU,i,j}(Ỹ_k)], where λ_{UU,i,j}(Y_I) is the bivariate tail dependence λUU between the ith and jth grid boxes, calculated at the u = 0.95 quantile using realizations Y_I. This is performed for four tail configurations as in (2).

  3. (1/10) Σ_{k=1}^{10} [α_{UU,i}(Y_{I_k}) − α_{UU,i}(Ỹ_k)], where α_{UU,i}(Y_I) is the empirical ARE αUU conditional on grid box i being extreme, calculated at the u = 0.95 quantile using realizations Y_I. This is performed for four tail configurations as in (9).

Note that these values are calculated with respect to the uniform copula, so if the testing data or simulations correspond to standard normal marginals, they are transformed to uniformity via the standard normal CDF. There are d differences in Step 3 and d(d − 1)/2 in Steps 1 and 2 that are used to evaluate and visualize the cross-validated model performance. A simple solution is to average these values and display them in boxplots grouped by the model. This provides a quick summary of model performance but collapses information over space. Instead, we can plot the values as a function of distance. For the pairwise statistics in Steps 1 and 2, there is a natural notion of distance, but each statistic in Step 3 is associated with a single grid box. Therefore, we plot the spatial extent as a function of distance from an arbitrary location, which is chosen to be the grid box in NW Alaska marked with a black dot in Fig. 2. Figures 4–6 show these two types of summary plots for correlation, tail dependence, and spatial extent, respectively.

Fig. 4. (left) Boxplots showing the difference in empirical Spearman rank correlation between ERA5 testing data and the probabilistic generative model, averaged over all cross-validation folds. (right) The same statistics on the y axis, but the values are plotted as a function of pairwise distance along the x axis.

Fig. 5. (left) Boxplots showing the difference in empirical (u = 0.95) tail dependence between ERA5 testing data and the probabilistic generative model, averaged over all cross-validation folds. Note that λUL and λLU are grouped together in this case. (right) The same statistics on the y axis, but the values are plotted as a function of pairwise distance along the x axis.

Fig. 6. (left) Boxplots showing the difference in empirical (u = 0.95) ARE between ERA5 testing data and the probabilistic generative model, averaged over all cross-validation folds. (right) The same statistics on the y axis, but the values are plotted as a function of distance along the x axis. In this case, the distance is taken from the reference point in NW Alaska marked with a black dot in Fig. 2.

In terms of spatial correlation, gmmn performs best, which is unsurprising since the generator is trained to match all moments of the distribution. Meanwhile, nflows tends to underestimate local positive correlations compared to the data distribution; this bias disappears when the flows are combined with principal components, likely because the basis functions add a notion of spatial continuity to the model. The PCA model noticeably overestimates αUU, αLL, λUU, and λLL, which measure dependence along the main diagonal of the distribution. Adding more basis functions does help correct these biases (see Fig. D2). Even so, the PCA model remains competitive with the other models that do not perform any dimension reduction. Similarly, gmmn also overestimates λUU and λLL at nearby locations, but interestingly, it underestimates αUU and αLL along with the other two types of spatial extent. Overall, nflows compares favorably with gmmn, performing slightly better for tail statistics and slightly worse for bulk statistics.

The vine copula is also competitive with the deep learning methods despite its simple formulation. Figure 7 illustrates a weakness of the vine copula in that it markedly underestimates the probability of teleconnected extremes where NW Alaska is abnormally warm and the western United States is abnormally cold. Meanwhile, nflows models this type of dependence relatively well. Since both models have tractable densities, we can compare the cross-validated log-likelihood (1/10) Σ_{k=1}^{10} l(Y_{I_k}; Y_{−I_k}), where l(Y_{I_k}; Y_{−I_k}) is the log-likelihood of a model trained on Y_{−I_k}, evaluated with the testing data Y_{I_k}. The cross-validated log-likelihoods for nflows and the vine copula are 34 081.21 and 31 017.97, respectively. Interestingly, the parametric vine copula produces a higher cross-validated log-likelihood of 33 990.49, but its performance in other cross-validation metrics was worse than the nonparametric vine copula. This data analysis provides an example where moving from traditional statistical methodology like vine copulas to normalizing flows provides better performance in modeling some aspects of extremes.

Fig. 7. Displaying results from the right panel of Fig. 5 in a spatial map.

6. Discussion and conclusions

Teleconnected patterns where distant locations experience oppositely signed temperature extremes motivate the need for multivariate extreme models that consider the entire distribution. This work is an initial step toward the spatial modeling of opposite-tail extremes, primarily addressing the asymmetric patterns in the spatial distribution of surface air temperature. We propose a copula model with bulk-and-tails marginal distributions whose dependence structure is controlled by a normalizing flow. We also propose relevant metrics for quantifying opposite-tail dependence and use them to judge model performance in a 10-fold cross-validation procedure. When compared to a vine copula, the traditional statistical tool in this scenario, our proposal performs better in modeling complicated asymmetric spatial patterns in winter temperatures in North America, such as anticorrelated teleconnective patterns in the bulk and tails of the distribution. Other recent machine learning methods such as GraphCast and Pangu-Weather (Lam et al. 2023; Bi et al. 2023) also outperform conventional approaches in the context of hurricane prediction; however, these approaches are not directly applicable in our case since they consider a conditional distribution of weather forecasting given initial conditions. Designing statistical models for teleconnected extremes will help create resilience against severe weather events caused by atmospheric blocking.

We also considered modeling dependence with a normalizing flow as the stochastic weights of a principal component basis expansion. Ordering the principal components by variance provides a natural solution to the autoregressive structure in the normalizing flow. An issue raised by Jiang et al. (2020) is that principal components are more suitable for describing the bulk of the distribution rather than the joint tails, so it may be worth exploring basis representations, including nonlinear ones (Tagasovska et al. 2019). Principal components add a notion of spatial smoothness that normalizing flows themselves lack, as seen in the biased local correlations in Fig. 4. Flows are more expressive than vine copulas, a standard statistical tool for dependence modeling. This greater flexibility comes with a cost: the flow is only a copula up to an approximation of the uniform distribution, and analytic expressions of tail dependence coefficients are unavailable. The practical implications of the first point are that the marginals estimated in the first step of our copula model will not be exact. We are unaware of any deep learning models that can overcome this limitation. As mentioned in section 4, this bias can be partially mitigated with marginal corrections. With vine copulas, the analytic expression for tail dependence involves recursive numerical computation of multivariate integrals, so tractability is not necessarily better than resorting to empirical estimation of tail dependence coefficients. A common simplifying assumption in the vine copula literature is that the parameters of the conditional copulas are constant with respect to the conditional variables. While there are efforts to construct vine copulas without this limitation (Acar et al. 2012; Stöber et al. 2013; Zhang and Bedford 2018), this type of parameter-level conditional dependence is a natural by-product of normalizing flows. Balancing the expressiveness of neural networks with the interpretability of classical parametric statistical models is a growing focus of spatial deep learning (Wikle and Zammit-Mangion 2023).

A benefit of the bulk-and-tails modeling framework is that it is invariant to the definition of extreme events. Although we focused on blocking events in North America, our method could be used to examine temperature teleconnections in other regions. Moreover, with appropriate4 marginal distributions, our method can be applied to other weather variables where the impact of extremes is critical (e.g., precipitation), and appendix D shows evidence that our proposal can scale to a finer-resolution grid at the cost of increased computation time. The biggest issue with our model from a spatial extremes point of view is that it lacks parameters to control the behavior of the joint tails of the distribution. Except in trivial cases, it is impossible to calculate the tail dependence coefficients of a flow model. With other probabilistic generative models that lack an explicit form for their probability density, the task of parameterizing joint tail behavior seems even more difficult. Indeed, the inability of neural networks to describe out-of-sample events is a prominent obstacle in the deep learning community (Nalisnick et al. 2018; Kirichenko et al. 2020). In this work, we have demonstrated the effectiveness of several nonparametric models in reproducing asymmetries of the (in sample) spatial distribution of surface air temperature, which is a necessary first step toward a spatial process model that is suitable for opposite-tail teleconnective extremal dependence.

1

See appendix B for a formal definition of an R-vine.

2

Gridbox areas are calculated using great circle distance: (π/180) R^2 |sin(lat1) − sin(lat2)| |lon1 − lon2|, where R = 6371 km is the radius of Earth and (lon1, lat1) × (lon2, lat2) is the boundary of the grid box.

3

Note that √(A/π) is the radius of a circle with area A.

4

For example, with precipitation data, we would need a marginal probability distribution that is supported on [0, ∞) with a point mass at zero. BATs could be modified for this setting.

Acknowledgments.

Mitchell Krock and Julie Bessac acknowledge the support from the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), Contract DE-AC02-06CH11357. Thanks to Adam Monahan for useful discussions.

Data availability statement.

ERA5 data are publicly available at https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview.

APPENDIX A

Normalizing Flow

Here, we provide additional details about the parametric form of the normalizing flow gθ: ℝ^ℓ → ℝ^ℓ. Recall that gθ(⋅) must be an invertible function with tractable Jacobian determinant. Flows are often constructed in an autoregressive fashion that takes the noise vector Z ∈ ℝ^ℓ and outputs the ith component of Y = gθ(Z) as Yi = h[Zi; ci(Z<i)], where h(⋅; θ): ℝ → ℝ is a monotonically increasing function whose parameters θ are obtained as the output of a neural network ci(⋅), which takes Z<i = (Z1, …, Zi−1)^T as input. In practice, ci(Z<i) for i = 1, …, ℓ are not the output of separate neural networks but rather the output of a single pass through a neural network that is appropriately “masked” to preserve the autoregressive structure (Papamakarios et al. 2017).

By construction, the Jacobian matrix of this autoregressive transformation is triangular and therefore has a trivial determinant. Moreover, gθ⁻¹(⋅) is readily computed since h(⋅; θ) is a bijection; i.e., Zi = h⁻¹[Yi; ci(Z<i)]. However, calculating this inverse is slow since it requires sequential computation in that ci(Z<i) must be computed before Zi. This typically means that sampling from an autoregressive flow is ℓ times slower than evaluating its likelihood, although in this paper, we have used the convention that gθ(⋅) maps from the noise distribution to the data distribution, while most flows are formulated in the opposite direction.
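To make the forward/inverse asymmetry concrete, here is a toy autoregressive flow with an affine h and a fixed linear conditioner (the paper's flow uses rational quadratic splines and a learned masked network instead). One direction is a single vectorizable pass, while the other must be computed one coordinate at a time; all names are ours.

```python
import numpy as np

ell = 5
rng = np.random.default_rng(0)
W = np.tril(rng.normal(size=(ell, ell)), k=-1)  # strictly lower triangular weights

def conditioner(z):
    """Return (mu, log_sigma) for every coordinate from the preceding ones only."""
    mu = W @ z
    log_sigma = 0.1 * np.tanh(W @ z)
    return mu, log_sigma

def forward(z):
    """y_i = h(z_i; c_i(z_{<i})): one parallel pass, since z is fully known."""
    mu, log_sigma = conditioner(z)
    return mu + np.exp(log_sigma) * z

def inverse(y):
    """z_i = h^{-1}(y_i; c_i(z_{<i})): sequential, z_{<i} must be computed first."""
    z = np.zeros(ell)
    for i in range(ell):
        mu_i = W[i] @ z                       # uses only already-computed z_{<i}
        log_sigma_i = 0.1 * np.tanh(W[i] @ z)
        z[i] = (y[i] - mu_i) * np.exp(-log_sigma_i)
    return z

z = rng.normal(size=ell)
print(np.allclose(inverse(forward(z)), z))  # True: the map is invertible
```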

Early flow models consider simple forms for h(⋅; θ) (e.g., an affine transformation), limiting their expressive power. In the Neural Spline Flow, Durkan et al. (2019) define h(⋅; θ) with rational quadratic splines, improving upon their previous work that uses cubic splines. Specifically, they use monotonically increasing piecewise rational quadratic splines that bijectively map [0, 1] to [0, 1]. Durkan et al. (2019) suggest augmenting this mapping with the identity transformation outside [0, 1] so that the flow can take unbounded inputs. We found that mapping from ℝ^ℓ to [0, 1]^ℓ with the sigmoid function, modeling on [0, 1]^ℓ with autoregressive rational spline transformations, and then transforming back to ℝ^ℓ with the logit function was a more effective way to handle unbounded inputs than using linear tails. Note that strictly increasing componentwise transformations like sigmoid and logit activation functions do not change the copula (Joe 2014). Laszkiewicz et al. (2022) observed that using linear tails outside [0, 1] leads to the same type of restrictive tail behavior seen in triangular affine flows (Jaini et al. 2020) as mentioned in section 4b.

APPENDIX B

Vine Copula

A tree T is an undirected graph in which there is a unique path between any two nodes. Czado and Nagler (2022) define a d-dimensional R-vine as a collection of trees {T1, …, Td−1} such that

  1. T1 has edges E1 and nodes N1 = {1, …, d},

  2. For i ≥ 2, Ti has edges Ei and nodes Ni = Ei−1,

  3. For tree Ti with i ∈ {2, …, d − 1}, if an edge connects nodes a1 and a2, and another edge connects b1 and b2, then |{a1, a2} ∩ {b1, b2}| = 1, where ∩ denotes set intersection and |⋅| denotes the cardinality of a set.

That is, the nodes of Ti are the edges of Ti−1, and two nodes of Ti can be joined by an edge only if the corresponding edges of Ti−1 share a common node (the proximity condition).
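To illustrate the proximity condition, the following small sketch, our own example rather than anything from the paper, checks which pairs of T2 nodes may be joined in a four-dimensional D-vine whose first tree is the path 1–2–3–4:

from itertools import combinations

# T1 of a 4-dimensional D-vine: the path 1-2-3-4.  The nodes of T2 are the edges
# of T1; two of them may be joined in T2 only if they share exactly one element.
T1_edges = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
allowed = [(set(a), set(b)) for a, b in combinations(T1_edges, 2) if len(a & b) == 1]
print(allowed)  # [({1, 2}, {2, 3}), ({2, 3}, {3, 4})]; {1, 2} and {3, 4} cannot be joined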

APPENDIX C

Synthetic Data Analysis

Consider a bivariate Student’s t distribution with ν = 1 degree of freedom (i.e., a bivariate Cauchy distribution) and dependence parameter ρ = 0.8. The tail dependence coefficients of this model are λ_UU = λ_LL = f(ν, ρ) and λ_UL = λ_LU = f(ν, −ρ), where T_ν(⋅) denotes the Student’s t cumulative distribution function with ν degrees of freedom and f(ν, ρ) = 2T_{ν+1}(−√[(ν + 1)(1 − ρ)/(1 + ρ)]). Training data for our model consist of n_dat = 3940 samples from this bivariate Cauchy distribution. Although a sample size of 3940 would be considered small for most deep learning applications, it may be realistic for environmental data corresponding to climate records (e.g., it is the sample size of the ERA5 reanalysis product used in section 5). We train the models for 20 000 epochs using a batch size of 100 and a learning rate of 0.0001 in the ADAM optimizer; this also matches the settings in section 5.
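To make these quantities concrete, the sketch below evaluates the analytic coefficients with SciPy and forms an empirical same-tail estimate at a finite threshold from simulated samples; the simulation uses the standard normal/chi-square construction of the multivariate t and is our illustration, not code from the paper.

import numpy as np
from scipy.stats import t as student_t

nu, rho, ndat = 1.0, 0.8, 3940

def tail_dep(nu, rho):
    # f(nu, rho) = 2 * T_{nu+1}( -sqrt((nu + 1)(1 - rho)/(1 + rho)) )
    return 2.0 * student_t.cdf(-np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho)), df=nu + 1.0)

print(tail_dep(nu, rho), tail_dep(nu, -rho))  # same-tail vs opposite-tail coefficients

# Simulate bivariate Cauchy samples: correlated normals divided by sqrt(chi^2_nu / nu).
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
z = rng.standard_normal((ndat, 2)) @ chol.T
x = z / np.sqrt(rng.chisquare(nu, size=(ndat, 1)) / nu)
u = 0.95
q = np.quantile(x, u, axis=0)
print(np.mean((x[:, 0] > q[0]) & (x[:, 1] > q[1])) / (1.0 - u))  # empirical lambda_UU(u)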

We briefly investigate the role of the input dimension of the noise vector Z to the generator in a gmmn model. To match the framework of the normalizing flow, we consider the gmmn to take a two-dimensional noise vector Z ∈ ℝ^2 as input. We can also use a high-dimensional input space for Z by taking the same neural network architecture but removing the two-dimensional input and instead starting with the second, deeper layer. Despite having fewer parameters, we see improved performance in Fig. C1 when Z ∈ ℝ^100 rather than Z ∈ ℝ^2. As expected, tail dependence beyond the 1 − (1/3940) quantile is underestimated. Without further parametric control, it will be difficult for probabilistic generative models to extrapolate beyond the in-sample tail behavior.
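The two input configurations can be illustrated with the following sketch; the layer widths here are hypothetical placeholders, not the architecture used for the synthetic experiment.

import torch.nn as nn

# Generator taking a two-dimensional noise vector Z as input (hypothetical widths).
gen_lowdim = nn.Sequential(
    nn.Linear(2, 100), nn.ReLU(),
    nn.Linear(100, 200), nn.ReLU(),
    nn.Linear(200, 2))

# Same architecture with the input layer removed, so Z in R^100
# feeds the second, deeper layer directly.
gen_highdim = nn.Sequential(
    nn.Linear(100, 200), nn.ReLU(),
    nn.Linear(200, 2))

print(sum(p.numel() for p in gen_lowdim.parameters()),
      sum(p.numel() for p in gen_highdim.parameters()))  # the R^100 variant has fewer parameters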

Fig. C1. Displaying the estimated tail dependence coefficients and corresponding 95% parametric-bootstrap confidence intervals. (top left) The results under the true model. (top right) The results for nflows. (bottom) The results for gmmn models. The two gmmn models have nearly the same number of parameters, except that (bottom left) the generator takes a two-dimensional noise vector Z as input and (bottom right) Z ∈ ℝ^100. Horizontal black lines show the analytic tail dependence coefficients, and the vertical black line marks the training sample size. Line colors: λ̂_UU red, λ̂_LL blue, λ̂_UL orange, and λ̂_LU purple. Axes are the same in all plots.

APPENDIX D

ERA5 Appendix

a. Neural network architecture

Here, we describe the neural network architectures used in section 5 to model the ERA5 temperature data. These models are based on the PyTorch backend and assume that marginal distributions of the pseudocopula are standard normal rather than standard uniform. Each model is trained for 20 000 epochs using a learning rate of 0.0001 in the ADAM optimizer (Kingma and Ba 2015) and a batch size of 100. We experienced training issues with a learning rate of 0.001 that were resolved upon decreasing it to 0.0001.
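A minimal training-loop sketch consistent with these settings (ADAM, learning rate 0.0001, batch size 100, 20 000 epochs) is given below; model and loss_fn are generic placeholders rather than the paper's code.

import torch
from torch.utils.data import DataLoader, TensorDataset

def train(model, loss_fn, data, epochs=20000, lr=1e-4, batch_size=100):
    # data: a torch.Tensor of pseudocopula observations, one row per day
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(data), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for (batch,) in loader:
            opt.zero_grad()
            loss_fn(model, batch).backward()
            opt.step()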

The generative moment matching network is a standard multilayer perceptron with the following PyTorch definition:

Sequential(Linear(76, 100), ReLU(), Linear(100, 200), ReLU(),

 Linear(200, 400), ReLU(), Linear(400, 400), ReLU(),

 Linear(400, 200), ReLU(), Linear(200, 100), ReLU(),

 Linear(100, 76))

The normalizing flow architecture is defined using the nflows package as follows (the nflows imports are assumed):

from nflows import distributions
from nflows.transforms import (Logit, RandomPermutation, Sigmoid,
                               MaskedPiecewiseRationalQuadraticAutoregressiveTransform)

L = 76
base_dist = distributions.normal.StandardNormal(shape=[L])
num_layers = 4
transforms = []
transforms.append(Sigmoid())
for _ in range(num_layers):
    transforms.append(
        MaskedPiecewiseRationalQuadraticAutoregressiveTransform(
            features=L, hidden_features=32))
    transforms.append(RandomPermutation(features=L))
transforms.append(Logit())

There are 376 676 and 337 744 trainable parameters in the gmmn and nflows models, respectively. See Fig. D1 for an illustration of the gmmn architecture. Creating a similar figure for nflows is not straightforward due to its autoregressive nature. The total number of neural network parameters was chosen to be approximately 100 times larger than n_dat = 3940, the number of measurements at each location.
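As a quick consistency check (assuming the torch.nn imports above), counting the trainable parameters of the gmmn definition reproduces the stated total:

from torch.nn import Sequential, Linear, ReLU

gmmn = Sequential(Linear(76, 100), ReLU(), Linear(100, 200), ReLU(),
                  Linear(200, 400), ReLU(), Linear(400, 400), ReLU(),
                  Linear(400, 200), ReLU(), Linear(200, 100), ReLU(),
                  Linear(100, 76))
print(sum(p.numel() for p in gmmn.parameters()))  # 376676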

Fig. D1. Neural network architecture for the gmmn model in section 5.

b. More model comparisons

We ruled out several models before the cross-validation study in section 5 due to poor performance. First, we disregarded GAN methods due to training difficulties. In particular, we tried two models trained with an adversarial loss: the original GAN (Goodfellow et al. 2014) and the Wasserstein GAN (Arjovsky et al. 2017). The Wasserstein GAN uses a different loss function to avoid issues (e.g., mode collapse, vanishing gradients) that are commonly encountered with the original GAN loss function of Goodfellow et al. (2014). However, both methods performed poorly on the ERA5 data.

The following models were more competitive with the methods presented in the paper. The implicit generative copula (Janke et al. 2021) is a special type of generative moment matching network (Li et al. 2015) that uses a novel neural network activation function to enforce approximately uniform marginal distributions during training. SoftFlow (Kim et al. 2020) was used to model tail dependence in COMET Flows (McDonald et al. 2022); our attempts to fit COMET Flows directly were not successful. Figures D2–D4 are analogous to Figs. 4–6 but without any cross validation; i.e., training and testing are performed on the entire ERA5 dataset. Overall, models based on nflows are the most accurate. SoftFlow consistently overestimates the spatial extent of extremes. Upon further inspection, this is due to poor marginal fitting: SoftFlow does not know how to increase the probability of extremes in the joint distribution, only in the marginal distributions. Finally, we found that the generative moment matching network was faster to train and more accurate than the implicit generative copula, especially in terms of the estimated spatial correlation.

Fig. D2. Boxplots showing the difference in the empirical Spearman rank correlation between ERA5 data and the probabilistic generative model.

Fig. D3. Boxplots showing the difference in empirical (u = 0.95) tail dependence between ERA5 data and the probabilistic generative model. Note that λ_UL and λ_LU are grouped together in this case.

Fig. D4. Boxplots showing the difference in empirical (u = 0.95) ARE between ERA5 data and the probabilistic generative model.

Figures D5–D7 demonstrate how cross-validation performance improves as we increase the number of principal components in (7). Note that even when all 76 principal components are used, the model is different from nflows, as normalizing flows are not invariant to linear transformations.

Fig. D5. Increasing the number of principal components in Fig. 4.

Fig. D6. Increasing the number of principal components in Fig. 5.

Fig. D7. Increasing the number of principal components in Fig. 6.

c. Further into the tails beyond u = 0.95

In Fig. D8, we show the tail dependence of our probabilistic generative models beyond the u = 0.95 quantile. Values are again based on n_gen = 10^6 unconditional samples from the model (without any cross validation). Clearly, the models in their current form do not exhibit long-range asymptotic dependence as u ↑ 1.

Fig. D8. Displaying the estimated tail dependence coefficients at quantiles u = 0.99 and u = 0.999, calculated using 10^6 samples from the generative model.

d. Finer-resolution grid

Finally, we analyzed the ERA5 data on a finer scale, with the gridbox dimensions halved vertically and horizontally. This corresponds to d = 345 grid boxes over North America, nearly a 4.5 times increase from the original setting. Again, each grid box corresponds to a spatial average of ERA5 temperatures on the native 0.25° longitude/latitude grid. In other offline experiments, we observed weakened tail dependencies when each grid box was defined by the observation nearest to its centroid instead of by a spatial average. Marginal distributions are also transformed to uniformity using the empirical cumulative distribution function instead of an estimated BATs distribution.

We fit a normalizing flow with 1 472 924 parameters, which is also around a 4.5 times increase from appendix A. Setup and training follow appendix A but are performed for 80 000 epochs instead of 20 000. Overall, Fig. D9 shows similar results in this fine-scale experiment, but blocking teleconnective patterns in NW America are much less pronounced for nflows than in the corresponding empirical estimates. After the first 20 000 epochs, the discrepancy was even more dramatic, with very similar values for λ_UL and λ_LU. The additional 60 000 epochs of training helped strengthen the opposite-tail dependencies. Training took around 65 h on a 2.10-GHz Intel Xeon Gold 6130 CPU using 32 threads on one core, nearly 7 times longer than the original setting in section 5. We also trained a vine copula model (not shown), which again substantially underestimates long-range tail dependencies.

Fig. D9. (left) Empirical u = 0.95 tail dependence coefficients and (right) corresponding estimates for nflows (calculated with 10^6 samples from the generative model).

REFERENCES

  • Aas, K., C. Czado, A. Frigessi, and H. Bakken, 2009: Pair-copula constructions of multiple dependence. Insur. Math. Econ., 44, 182–198, https://doi.org/10.1016/j.insmatheco.2007.02.001.
  • Acar, E. F., C. Genest, and J. Nešlehová, 2012: Beyond simplified pair-copula constructions. J. Multivar. Anal., 110, 74–90, https://doi.org/10.1016/j.jmva.2012.02.001.
  • Allouche, M., S. Girard, and E. Gobet, 2022: EV-GAN: Simulation of extreme events with ReLU neural networks. J. Mach. Learn. Res., 23 (150), 1–39.
  • Annau, N. J., A. J. Cannon, and A. H. Monahan, 2023: Algorithmic hallucinations of near-surface winds: Statistical downscaling with generative adversarial networks to convection-permitting scales. Artif. Intell. Earth Syst., 2, e230015, https://doi.org/10.1175/AIES-D-23-0015.1.
  • Arjovsky, M., S. Chintala, and L. Bottou, 2017: Wasserstein GAN. arXiv, 1701.07875v3, https://doi.org/10.48550/ARXIV.1701.07875.
  • Bedford, T., and R. M. Cooke, 2002: Vines—A new graphical model for dependent random variables. Ann. Stat., 30, 1031–1068, https://doi.org/10.1214/aos/1031689016.
  • Bi, K., L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, 2023: Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619, 533–538, https://doi.org/10.1038/s41586-023-06185-3.
  • Boulaguiem, Y., J. Zscheischler, E. Vignotto, K. van der Wiel, and S. Engelke, 2022: Modeling and simulating spatial extremes by combining extreme value theory with generative adversarial networks. Environ. Data Sci., 1, e5, https://doi.org/10.1017/eds.2022.4.
  • Brechmann, E. C., and C. Czado, 2013: Risk management with high-dimensional vine copulas: An analysis of the Euro Stoxx 50. Stat. Risk Model., 30, 307–342, https://doi.org/10.1524/strm.2013.2002.
  • Carrera, M. L., R. W. Higgins, and V. E. Kousky, 2004: Downstream weather impacts associated with atmospheric blocking over the northeast Pacific. J. Climate, 17, 4823–4839, https://doi.org/10.1175/JCLI-3237.1.
  • Chang, K.-L., 2021: A new dynamic mixture copula mechanism to examine the nonlinear and asymmetric tail dependence between stock and exchange rate returns. Comput. Econ., 58, 965–999, https://doi.org/10.1007/s10614-020-09981-5.
  • Charpentier, A., A.-L. Fougères, C. Genest, and J. G. Nešlehová, 2014: Multivariate Archimax copulas. J. Multivar. Anal., 126, 118–136, https://doi.org/10.1016/j.jmva.2013.12.013.
  • Coccaro, A., M. Letizia, H. Reyes-Gonzalez, and R. Torre, 2023: Comparative study of coupling and autoregressive flows through robust statistical tests. arXiv, 2302.12024v2, https://doi.org/10.48550/arXiv.2302.12024.
  • Coles, S., 2001: An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics, Springer-Verlag, 209 pp., https://doi.org/10.1007/978-1-4471-3675-0.
  • Cramer, E., A. Mitsos, R. Tempone, and M. Dahmen, 2022: Principal component density estimation for scenario generation using normalizing flows. Data-Centric Eng., 3, e7, https://doi.org/10.1017/dce.2022.7.
  • Cunningham, E., R. Zabounidis, A. Agrawal, M. Fiterau, and D. Sheldon, 2020: Normalizing flows across dimensions. arXiv, 2006.13070v1, https://doi.org/10.48550/arXiv.2006.13070.
  • Czado, C., and T. Nagler, 2022: Vine copula based modeling. Annu. Rev. Stat. Appl., 9, 453–477, https://doi.org/10.1146/annurev-statistics-040220-101153.
  • Dißmann, J., E. C. Brechmann, C. Czado, and D. Kurowicka, 2013: Selecting and estimating regular vine copulae and application to financial returns. Comput. Stat. Data Anal., 59, 52–69, https://doi.org/10.1016/j.csda.2012.08.010.
  • Drees, H., and A. Sabourin, 2021: Principal component analysis for multivariate extremes. Electron. J. Stat., 15, 908–943, https://doi.org/10.1214/21-EJS1803.
  • Durkan, C., A. Bekasov, I. Murray, and G. Papamakarios, 2019: Neural spline flows. arXiv, 1906.04032v2, https://doi.org/10.48550/arXiv.1906.04032.
  • Durkan, C., A. Bekasov, I. Murray, and G. Papamakarios, 2020: nflows: Normalizing flows in PyTorch. Zenodo, https://doi.org/10.5281/zenodo.4296287.
  • Erhardt, T. M., C. Czado, and U. Schepsmeier, 2015: R-vine models for spatial time series with an application to daily mean temperature. Biometrics, 71, 323–332, https://doi.org/10.1111/biom.12279.
  • Feydy, J., T. Séjourné, F.-X. Vialard, S. Amari, A. Trouvé, and G. Peyré, 2019: Interpolating between optimal transport and MMD using Sinkhorn divergences. 22nd Int. Conf. on Artificial Intelligence and Statistics, Naha, Okinawa, Japan, PMLR, 2681–2690, https://proceedings.mlr.press/v89/feydy19a/feydy19a.pdf.
  • Geenens, G., A. Charpentier, and D. Paindaveine, 2017: Probit transformation for nonparametric kernel estimation of the copula density. Bernoulli, 23, 1848–1873, https://doi.org/10.3150/15-BEJ798.
  • Gong, Y., and R. Huser, 2022: Asymmetric tail dependence modeling, with application to cryptocurrency market data. Ann. Appl. Stat., 16, 1822–1847, https://doi.org/10.1214/21-AOAS1568.
  • Goodfellow, I., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, 2014: Generative adversarial nets. Advances in Neural Information Processing Systems 27 (NIPS 2014), Curran Associates Inc., 2672–2680, https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf.
  • Gräler, B., 2014: Modelling skewed spatial random fields through the spatial vine copula. Spat. Stat., 10, 87–102, https://doi.org/10.1016/j.spasta.2014.01.001.
  • Heffernan, J. E., and J. A. Tawn, 2004: A conditional approach for multivariate extreme values (with discussion). J. Roy. Stat. Soc., 66B, 497–546, https://doi.org/10.1111/j.1467-9868.2004.02050.x.
  • Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803.
  • Ho, J., A. Jain, and P. Abbeel, 2020: Denoising diffusion probabilistic models. Proc. 34th Int. Conf. on Neural Information Processing Systems, Vancouver, BC, Canada, Association for Computing Machinery, 6840–6851, https://dl.acm.org/doi/abs/10.5555/3495724.3496298.
  • Huser, R., and J. L. Wadsworth, 2022: Advances in statistical modeling of spatial extremes. Wiley Interdiscip. Rev. Comput. Stat., 14, e1537, https://doi.org/10.1002/wics.1537.
  • Huster, T., J. Cohen, Z. Lin, K. Chan, C. Kamhoua, N. O. Leslie, C.-Y. J. Chiang, and V. Sekar, 2021: Pareto GAN: Extending the representational power of GANs to heavy-tailed distributions. Proc. 38th Int. Conf. on Machine Learning, Online, PMLR, 4523–4532, https://proceedings.mlr.press/v139/huster21a/huster21a.pdf.
  • Jaini, P., I. Kobyzev, Y. Yu, and M. Brubaker, 2020: Tails of Lipschitz triangular flows. Proc. 37th Int. Conf. on Machine Learning, Online, PMLR, 4673–4681, https://proceedings.mlr.press/v119/jaini20a/jaini20a.pdf.
  • Janke, T., M. Ghanmi, and F. Steinke, 2021: Implicit generative copulas. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), Curran Associates Inc., 26 028–26 039, https://proceedings.neurips.cc/paper/2021/hash/dac4a67bdc4a800113b0f1ad67ed696f-Abstract.html.
  • Jiang, Y., D. Cooley, and M. F. Wehner, 2020: Principal component analysis for extremes and application to U.S. precipitation. J. Climate, 33, 6441–6451, https://doi.org/10.1175/JCLI-D-19-0413.1.
  • Joe, H., 2014: Dependence Modeling with Copulas. 1st ed. Taylor and Francis, 462 pp.
  • Joe, H., H. Li, and A. K. Nikoloulopoulos, 2010: Tail dependence functions and vine copulas. J. Multivar. Anal., 101, 252–270, https://doi.org/10.1016/j.jmva.2009.08.002.
  • Karami, M., D. Schuurmans, J. Sohl-Dickstein, L. Dinh, and D. Duckworth, 2019: Invertible convolutional flow. 33rd Conf. on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, NeurIPS, 1–11, https://proceedings.neurips.cc/paper_files/paper/2019/file/b1f62fa99de9f27a048344d55c5ef7a6-Paper.pdf.
  • Kim, H., H. Lee, W. H. Kang, J. Y. Lee, and N. S. Kim, 2020: SoftFlow: Probabilistic framework for normalizing flow on manifolds. arXiv, 2006.04604v4, http://arxiv.org/abs/2006.04604.
  • Kingma, D., and J. Ba, 2015: Adam: A method for stochastic optimization. Proc. Third Int. Conf. on Learning Representations (ICLR), San Diego, CA, ICLR, 1–13, https://dare.uva.nl/search?identifier=a20791d3-1aff-464a-8544-268383c33a75.
  • Kirichenko, P., P. Izmailov, and A. G. Wilson, 2020: Why normalizing flows fail to detect out-of-distribution data. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Curran Associates Inc., 20 578–20 589, https://proceedings.neurips.cc/paper_files/paper/2020/file/ecb9fe2fbb99c31f567e9823e884dbec-Paper.pdf.
  • Klein, N., N. Panda, P. Gasda, and D. Oyen, 2022: Generative structured normalizing flow Gaussian processes applied to spectroscopic data. arXiv, 2212.07554v1, https://doi.org/10.48550/ARXIV.2212.07554.
  • Kobyzev, I., S. D. Prince, and M. A. Brubaker, 2021: Normalizing flows: An introduction and review of current methods. IEEE Trans. Pattern Anal. Mach. Intell., 43, 3964–3979, https://doi.org/10.1109/TPAMI.2020.2992934.
  • Krock, M., J. Bessac, M. L. Stein, and A. H. Monahan, 2022: Nonstationary seasonal model for daily mean temperature distribution bridging bulk and tails. Wea. Climate Extremes, 36, 100438, https://doi.org/10.1016/j.wace.2022.100438.
  • Krock, M. L., A. H. Monahan, and M. L. Stein, 2023: Tail dependence as a measure of teleconnected warm and cold extremes of North American wintertime temperatures. J. Climate, 36, 4461–4473, https://doi.org/10.1175/JCLI-D-22-0662.1.
  • Lam, R., and Coauthors, 2023: Learning skillful medium-range global weather forecasting. Science, 382, 1416–1421, https://doi.org/10.1126/science.adi2336.
  • Laszkiewicz, M., J. Lederer, and A. Fischer, 2022: Marginal tail-adaptive normalizing flows. Proc. 39th Int. Conf. on Machine Learning, Baltimore, Maryland, PMLR, 12 020–12 048, https://proceedings.mlr.press/v162/laszkiewicz22a.html.
  • Li, H., 2009: Orthant tail dependence of multivariate extreme value distributions. J. Multivar. Anal., 100, 243–256, https://doi.org/10.1016/j.jmva.2008.04.007.
  • Li, S., and B. Hooi, 2022: Neural PCA for flow-based representation learning. Proc. 31st Int. Joint Conf. on Artificial Intelligence, Vienna, Austria, IJCAI, 3229–3235, https://doi.org/10.24963/ijcai.2022/448.
  • Li, Y., K. Swersky, and R. Zemel, 2015: Generative moment matching networks. Proc. 32nd Int. Conf. on Machine Learning, Lille, France, PMLR, 1718–1727, https://proceedings.mlr.press/v37/li15.html.
  • Low, R. K. Y., J. Alcock, R. Faff, and T. Brailsford, 2018: Canonical vine copulas in the context of modern portfolio management. Asymmetric Dependence in Finance: Diversification, Correlation and Portfolio Management in Market Downturns, J. Alcock and S. Satchell, Eds., John Wiley and Sons Ltd, 263–289, https://doi.org/10.1002/9781119288992.ch11.
  • MacDonald, A., C. J. Scarrott, D. Lee, B. Darlow, M. Reale, and G. Russell, 2011: A flexible extreme value mixture model. Comput. Stat. Data Anal., 55, 2137–2157, https://doi.org/10.1016/j.csda.2011.01.005.
  • McDonald, A., P.-N. Tan, and L. Luo, 2022: COMET Flows: Towards generative modeling of multivariate extremes and tail dependence. Proc. 31st Int. Joint Conf. on Artificial Intelligence, Vienna, Austria, IJCAI, 3328–3334, https://doi.org/10.24963/ijcai.2022/462.
  • Morales-Nápoles, O., 2010: Counting vines. Dependence Modeling, D. Kurowicka, Ed., World Scientific, 189–218, https://doi.org/10.1142/9789814299886_0009.
  • Nagler, T., and T. Vatter, 2020: Pyvinecopulib. GitHub, https://github.com/vinecopulib/pyvinecopulib/.
  • Nalisnick, E., A. Matsukawa, Y. W. Teh, D. Gorur, and B. Lakshminarayanan, 2018: Do deep generative models know what they don’t know? Seventh Int. Conf. on Learning Representations, New Orleans, LA, ICLR, 1–19, https://openreview.net/forum?id=H1xwNhCcYm.
  • Nelsen, R. B., 2006: An Introduction to Copulas. 2nd ed. Springer, 272 pp.
  • Ng, Y., A. Hasan, and V. Tarokh, 2022: Inference and sampling for Archimax copulas. 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Curran Associates Inc., 17 099–17 116, https://proceedings.neurips.cc/paper_files/paper/2022/file/6d00071564ec447466fc4577743cf1b3-Paper-Conference.pdf.
  • NOAA, 2023: U.S. billion-dollar weather and climate disasters. Accessed 23 October 2023, https://doi.org/10.25921/stkw-7w73.
  • Oriol, B., and A. Miot, 2021: On some theoretical limitations of generative adversarial networks. arXiv, 2110.10915v1, https://doi.org/10.48550/ARXIV.2110.10915.
  • Papamakarios, G., T. Pavlakou, and I. Murray, 2017: Masked autoregressive flow for density estimation. NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., 2335–2344, https://dl.acm.org/doi/10.5555/3294771.3294994.
  • Papamakarios, G., E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan, 2022: Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res., 22, 2617–2680.
  • Reiss, R.-D., and M. Thomas, 2007: Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and other Fields. 3rd ed. Birkhäuser Verlag, 511 pp.
  • Rezende, D. J., and S. Mohamed, 2015: Variational inference with normalizing flows. Proc. 32nd Int. Conf. on Machine Learning, Lille, France, JMLR, 1530–1538, https://proceedings.mlr.press/v37/rezende15.pdf.
  • Ronneberger, O., P. Fischer, and T. Brox, 2015: U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Springer International Publishing, 234–241, https://doi.org/10.1007/978-3-319-24574-4_28.
  • Salazar Flores, Y., and A. Díaz-Hernández, 2021: Counterdiagonal/nonpositive tail dependence in Vine copula constructions: Application to portfolio management. Stat. Methods Appl., 30, 375–407, https://doi.org/10.1007/s10260-020-00527-5.
  • Scarrott, C., and A. MacDonald, 2012: A review of extreme value threshold estimation and uncertainty quantification. REVSTAT-Stat. J., 10, 33–60, https://doi.org/10.57805/revstat.v10i1.110.
  • Si, P., and V. Kuleshov, 2022: Energy flows: Towards determinant-free training of normalizing flows. arXiv, 2206.06672v1, https://doi.org/10.48550/ARXIV.2206.06672.
  • Si, P., A. Bishop, and V. Kuleshov, 2021: Autoregressive quantile flows for predictive uncertainty estimation. arXiv, 2112.04643v3, https://doi.org/10.48550/ARXIV.2112.04643.
  • Sibuya, M., 1960: Bivariate extreme statistics, I. Annals Inst. Stat. Math., 11, 195–210, https://doi.org/10.1007/BF01682329.
  • Simpson, E. S., J. L. Wadsworth, and J. A. Tawn, 2021: A geometric investigation into the tail dependence of vine copulas. J. Multivar. Anal., 184, 104736, https://doi.org/10.1016/j.jmva.2021.104736.
  • Singh, H., M. R. Najafi, and A. J. Cannon, 2021: Characterizing non-stationary compound extreme events in a changing climate based on large-ensemble climate simulations. Climate Dyn., 56, 1389–1405, https://doi.org/10.1007/s00382-020-05538-2.
  • Sklar, M., 1959: Fonctions de répartition à n dimensions et leurs marges (in French). Publ. Inst. Stat. Univ. Paris, 8, 229–231.
  • Stein, M. L., 2021: A parametric model for distributions with flexible behavior in both tails. Environmetrics, 32, e2658, https://doi.org/10.1002/env.2658.
  • Stöber, J., H. Joe, and C. Czado, 2013: Simplified pair copula constructions—Limitations and extensions. J. Multivar. Anal., 119, 101–118, https://doi.org/10.1016/j.jmva.2013.04.014.
  • Tagasovska, N., D. Ackerer, and T. Vatter, 2019: Copulas as high-dimensional generative models: Vine copula autoencoders. 33rd Conf. on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, NeurIPS, 1–13, https://proceedings.neurips.cc/paper_files/paper/2019/file/15e122e839dfdaa7ce969536f94aecf6-Paper.pdf.
  • Wadsworth, J. L., and J. A. Tawn, 2022: Higher-dimensional spatial extremes via single-site conditioning. Spat. Stat., 51, 100677, https://doi.org/10.1016/j.spasta.2022.100677.
  • Wallace, J. M., and D. S. Gutzler, 1981: Teleconnections in the geopotential height field during the Northern Hemisphere winter. Mon. Wea. Rev., 109, 784–812, https://doi.org/10.1175/1520-0493(1981)109<0784:TITGHF>2.0.CO;2.
  • Wang, Y.-C., J.-L. Wu, and Y.-H. Lai, 2013: A revisit to the dependence structure between the stock and foreign exchange markets: A dependence-switching copula approach. J. Banking Finance, 37, 1706–1719, https://doi.org/10.1016/j.jbankfin.2013.01.001.
  • Wiese, M., R. Knobloch, and R. Korn, 2019: Copula and marginal flows: Disentangling the marginal from its joint. arXiv, 1907.03361v1, https://doi.org/10.48550/ARXIV.1907.03361.
  • Wikle, C. K., and A. Zammit-Mangion, 2023: Statistical deep learning for spatial and spatiotemporal data. Annu. Rev. Stat. Appl., 10, 247–270, https://doi.org/10.1146/annurev-statistics-033021-112628.
  • Zhang, L., M. D. Risser, E. M. Molter, M. F. Wehner, and T. A. O’Brien, 2022: Accounting for the spatial structure of weather systems in detected changes in precipitation extremes. Wea. Climate Extremes, 38, 100499, https://doi.org/10.1016/j.wace.2022.100499.
  • Zhang, M., and T. Bedford, 2018: Vine copula approximation: A generic method for coping with conditional dependence. Stat. Comput., 28, 219–237, https://doi.org/10.1007/s11222-017-9727-9.
  • Zhang, M.-H., 2008: Modelling total tail dependence along diagonals. Insur. Math. Econ., 42, 73–80, https://doi.org/10.1016/j.insmatheco.2007.01.002.