## 1. Introduction

Much of our understanding of the climate system is derived from climate model experiments, which also underpin most projections of future climate based on specified scenarios of greenhouse gas emissions and socioeconomic development (Chen et al. 2023). Climate models may be used individually, as when using general circulation models (GCMs) or Earth system models (ESMs) to provide global simulations at a relatively coarse spatial resolution, or in combination, as when using GCMs or ESMs to provide boundary conditions for high-resolution regional climate model (RCM) simulations.

Climate model experiments are associated with multiple sources of uncertainty (Rougier and Goldstein 2014), which are often studied using collections (“ensembles”) of model runs. These include ensembles that explore internal variability, often by varying initial conditions in a single model (e.g., Rodgers et al. 2021; Maher et al. 2021; von Trentini et al. 2020); perturbed physics ensembles (Collins et al. 2006) that vary the representation of key processes, again within a single model; and multimodel ensembles (Meehl et al. 2007; von Trentini et al. 2019) that can be regarded as attempting to characterize the range of outcomes that could be anticipated on the basis of current scientific knowledge, as represented by the models considered.

In any such ensemble, the sampled uncertainties induce variation between ensemble members. It is therefore of interest to characterize this variation: for example, by identifying sources of uncertainty to which the outputs are most sensitive and hence suggesting priorities for future model improvements, or by identifying representative subsets of members for use in applications where it is not feasible to process the entire ensemble. The latter situation can arise when climate projections are used as inputs to other models of climate change impacts: if the impacts models themselves are computationally expensive or if resources are otherwise limited, then this limits the number of ensemble members that can be used (Cannon 2015).

A substantial body of literature is devoted to quantifying the relative importance of different sources of variation within an ensemble. For example, Hawkins and Sutton (2009) applied a heuristic approach to projections from the World Climate Research Programme’s Coupled Model Intercomparison Project phase 3 (CMIP3) multimodel ensemble, to partition the variation into components attributable respectively to GCMs, emissions scenarios, and internal variability of the climate system. This approach was placed on a more formal footing using analysis of variance (ANOVA) techniques by Yip et al. (2011). A drawback of these authors’ fixed-effects ANOVA methodology, however, is that the unambiguous partitioning of variation requires a balanced ensemble, which in this case means that each GCM is run the same number of times with each emissions scenario. The CMIP3 ensemble does not meet this condition, so Yip et al. (2011) discarded some ensemble members and applied their methodology to the largest possible balanced subset. Of course, discarding ensemble members incurs a loss of information: this can be avoided using a random-effects ANOVA (Northrop and Chandler 2014) or by considering the problem as one of a balanced ensemble with some values missing, and estimating (“imputing”) these missing values (Evin et al. 2019, 2021). The latter approaches both use Bayesian methods and require the use of Markov chain Monte Carlo (MCMC) techniques that are computationally intensive. This perhaps disincentivizes their routine use in situations where the outputs of interest are high dimensional (i.e., involve many quantities of interest), for example when analyzing maps involving large numbers of spatial locations.

In high-dimensional settings, a further potential concern with ANOVA approaches is that they are designed to analyze scalar-valued quantities. Most applications of ANOVA in climate science have therefore considered each quantity separately. For example, Christensen and Kjellström (2020) applied fixed-effects ANOVAs to a subset of the EuroCORDEX regional model ensemble (Jacob et al. 2014) separately for each spatial location within the EuroCORDEX study region, and mapped the results. However, a comprehensive analysis of variation in high-dimensional ensemble outputs requires methods that are specifically designed for a high-dimensional setting. This has rarely been attempted to date, a notable exception being Sain et al. (2011), who used functional ANOVA methods to analyze ensembles of maps within a Bayesian framework, which, as noted above, can be computationally expensive. Other exceptions include applications of clustering methods to find representative subsets of ensemble members for use in impacts studies (e.g., Cannon 2015; Casajus et al. 2016): these methods aim to produce a simplified representation of the variation by clustering the ensemble members into groups that are relatively homogeneous and distinct.

Alternative simplified representations can be obtained via dimension reduction, for example by applying principal components analysis (PCA) or empirical orthogonal function (EOF) analysis across ensemble members (e.g., Li and Xie 2012; Yim et al. 2016; Wang et al. 2020). Such “intermodel EOF” analyses seek to identify dominant modes of spatial variation across the ensemble: the cited references demonstrate how these modes may be of interest in their own right, as well as how the associated scores (see below) can be used to identify contrasting models for further study. A potential difficulty, however, is that ensembles often contain structure that EOF analyses are not designed to handle. In regional ensembles for example, structure is induced by the GCM and RCM combinations used to generate each member. Similarly, in the CMIP ensembles, direct EOF analyses will be dominated by the models with the most runs. To avoid this problem when analyzing CMIP5 outputs, Yim et al. (2016) worked with the average of each model’s runs, while Wang et al. (2020) analyzed a single run from each model. However, no formal justification was provided for either approach.

Against this background, the main contribution of the present paper is to formalize the application of dimension reduction techniques to climate model ensembles, while also accounting appropriately for ensemble structure. The resulting approach, which we call *ensemble principal pattern* (EPP) analysis, reduces to intermodel EOF analysis when applied to unstructured ensembles containing a single member per model. It is intended primarily as a tool to enable the rapid exploration of ensembles, either to select contrasting or representative members for use in impacts studies or to highlight distinctive modes of variation that may merit further investigation.

To motivate our proposal and fix ideas subsequently, section 2 presents some outputs from the EuroCORDEX regional ensemble over the United Kingdom and discusses some questions that could be asked about these outputs. The basic intermodel EOF methodology is described in section 3 as a reference. In section 4 the methodology is extended, in conjunction with multivariate ANOVA (MANOVA) techniques, to handle ensembles with more complex structures: the extension is illustrated using the EuroCORDEX example. Section 5 concludes and suggests further potential applications.

## 2. Motivating example: The EuroCORDEX regional ensemble

We consider the bias in simulated summer (JJA) mean daily maximum surface temperature (“tasmax”) across the United Kingdom, over the period 1989–2008, for 64 EuroCORDEX ensemble members (Jacob et al. 2014) that were produced using combinations of 10 RCMs and 10 GCM runs from the CMIP5 experiment (Taylor et al. 2012). The GCM runs, from six unique GCMs, were conditioned on CMIP5 historical forcings until 2005, and on the RCP8.5 emissions scenario from 2006 to 2008. Over this limited period the RCP8.5 forcings deviate very little from the 2005 levels (van Vuuren et al. 2011) and hence can be considered as plausible proxies for the historical values.

To calculate the biases in JJA tasmax, the 1989–2008 mean tasmax values were first calculated for each ensemble member on the native grid of the corresponding RCM. Next, these mean values were regridded to the 12 × 12 km^{2} grid used by the HadUK gridded observational dataset (Perry et al. 2009), using a conservative area-weighting scheme (Jones 1999): The 12-km grid resolution is similar to that of each EuroCORDEX RCM. Finally, the biases for each ensemble member, shown in Fig. 1, were computed by subtracting the 1989–2008 mean HadUK tasmax values.

The columns of Fig. 1 represent GCM runs, and the rows represent RCMs. To simplify the subsequent presentation, the 10 GCM runs will be considered as representing separate models throughout: this is not required for the methodology, however. With this convention, each GCM and RCM pair contributes at most a single member to the ensemble. The ensemble is unbalanced however, with 36 of the 100 possible pairs missing. The missing pairs include four that are available but have been excluded deliberately: three because they are superseded in the ensemble by runs from later versions of the same RCMs driven by the same GCMs, and one because it contains inconsistent metadata regarding the driving GCM.

As noted in section 1, lack of balance causes problems when attributing variation to its potential sources, whence analyses are sometimes restricted to the largest balanced subset of an ensemble. Here, the largest such subset involves eight RCMs and four GCM runs, hence including just 32 of the 64 available ensemble members. It is clearly undesirable to restrict attention to such a small subset.

Apart from the “missing” runs, the most obvious feature of Fig. 1 is the sixth column, corresponding to the HadGEM2-ES GCM: all ensemble members driven by this GCM appear to have a warm bias. Closer inspection also suggests potential cool biases associated with the HIRHAM5, RACMO22E, and RCA4 RCMs (fourth, sixth, and seventh rows): However, it is hard to be confident about this based on a purely visual inspection. Moreover, it is hard to identify any systematic and detailed spatial structure from the maps. This creates difficulties for several potential users of the ensemble. Consider, for example, a regional climate modeler who sees EuroCORDEX as an opportunity to explore the simulation of summer temperatures by different RCMs. Apart from the apparent cool biases noted above, Fig. 1 does not obviously provide useful information to support such a comparison.

Another potential user is the climate impacts modeler with limited computational resources. Such a user might produce the equivalent of Fig. 1, showing future changes in some impact-relevant quantity under a specified emissions scenario, and they may wish to choose a small number of ensemble members spanning the range of potential impacts. This could be done by computing spatial averages of relevant quantities for each ensemble member, and then choosing members spanning the range of the resulting averages. However, spatial averages are not necessarily the most relevant quantities when considering impacts that may be localized in space (e.g., associated with urban areas). It would therefore be helpful to find an alternative visualization that allows users to identify detailed spatial structures in an ensemble.

Our two notional EuroCORDEX users have different interests and priorities. Nonetheless, both would benefit from a visualization that simultaneously (i) is more compact than Fig. 1 and (ii) allows them to identify potentially interesting modes of variation. This is the aim of our EPP methodology—although, as discussed later, its use is not restricted to regional ensembles.

## 3. EPP analysis: The simplest case

To develop the methodology, we start by considering an unstructured ensemble—for example, a multimodel ensemble with a single member per model, or a perturbed physics ensemble with one member per model variant. The ensemble outputs are considered as a collection of vectors {**Y**_{i}: *i* = 1, …, *n*} say, where *n* is the number of members and **Y**_{i} is a vector of length *S* containing a value for the *i*th member at each of *S* spatial locations. Moreover, let **Y** denote the *n* × *S* “ensemble data matrix” with **Y**_{i}^{T} as its *i*th row; let **Ȳ** denote the *S* × 1 overall ensemble mean vector, with the *s*th element being the average of the values at location *s* across all ensemble members; and let **1** be an *n* × 1 vector of ones.

In this case, as noted earlier, the proposed methodology is equivalent to an intermodel EOF analysis which, for present purposes, is most conveniently presented by considering the singular value decomposition (SVD) of the centered data matrix **Ỹ** = **Y** − **1Ȳ**^{T}, which has (**Y**_{i} − **Ȳ**)^{T} as its *i*th row. Since the approach is widely used already (see section 1), we merely sketch it here to prepare for the extension to structured ensembles in section 4. For more details of the underpinning mathematics, see Krzanowski (1988, section 4.1) or Gentle (2007, section 3.10).

When *n* < *S*, the centered matrix has rank at most *n* − 1, so that *n* separate maps are needed to visualize the complete ensemble (the ensemble mean, plus *n* − 1 maps to visualize the centered matrix—or, equivalently, separate maps of each ensemble member). Dimension reduction aims to find an accurate approximation of the centered matrix with rank *d* < *n* − 1, so that the information can be visualized in just *d* + 1 maps including the ensemble mean.

The SVD of the centered data matrix, **Ỹ** say, can be written as

**Ỹ** = **UΛV**^{T},   (1)

where **U** and **V** are matrices of dimension *n* × *n* and *S* × *n*, respectively, and **Λ** is an *n* × *n* diagonal matrix with nonnegative elements sorted in decreasing order. Denote these elements by *λ*_{1} ≥ *λ*_{2} ≥ ⋯ ≥ *λ*_{n} (in fact *λ*_{n} is zero because the rank of **Ỹ** is at most *n* − 1); then (1) can be rewritten as

**Ỹ** = ∑_{j=1}^{n} *λ*_{j}**u**_{j}**v**_{j}^{T},   (2)

where **u**_{j} and **v**_{j} are vectors of length *n* and *S* respectively, containing the *j*th columns of **U** and **V**. Denoting the *i*th element of **u**_{j} by *u*_{ij}, the *i*th row of **Ỹ** is ∑_{j} *λ*_{j}*u*_{ij}**v**_{j}^{T}. The *n* vectors {**v**_{j}} have length *S* and, in the current context, represent spatial patterns: moreover, the patterns are uncorrelated because the columns of **V** are mutually orthogonal. In PCA terminology, the {**v**_{j}} are the principal components, and the {*λ*_{j}*u*_{ij}: *i* = 1, …, *n*; *j* = 1, …, *n*} are the corresponding principal component scores.

Now consider truncating the right-hand side of Eq. (2) to retain just *d* < *n* terms. The result is an *n* × *S* matrix, **Ỹ**^{(d)} say, of rank *d* and with *i*th row ∑_{j=1}^{d} *λ*_{j}*u*_{ij}**v**_{j}^{T}. Among all matrices of rank *d*, **Ỹ**^{(d)} minimizes the sum of squared differences between its (*i*, *s*)th element and the corresponding element of the centered matrix: it is therefore the best rank-*d* approximation of that matrix in a least squares sense. Moreover, the *j*th spatial pattern **v**_{j} accounts for a proportion *λ*_{j}^{2}/∑_{k}*λ*_{k}^{2} of the total variation in the centered ensemble.

As noted above, the convention in PCA is to define the principal components as the columns of **V**, absorbing the {*λ*_{j}} into the corresponding scores. In the current context however, it is arguably more interpretable to multiply the {*λ*_{j}} by the {**v**_{j}} instead: the resulting spatial patterns {*λ*_{j}**v**_{j}} have the same units of measurement as the original quantity of interest and hence can be interpreted directly as contributions to the overall variation of that quantity, while the scores {*u*_{ij}} are dimensionless and hence can be interpreted as “weights” attached to each pattern. We therefore define the EPPs as the patterns {*λ*_{j}**v**_{j}}; and the {*u*_{ij}: *i* = 1, …, *n*; *j* = 1, …, *d*} are the corresponding EPP scores. An ensemble member with a positive (negative) score for the *j*th EPP will tend to have positive (negative) deviations from the ensemble mean in regions where the corresponding pattern **v**_{j} is positive, and vice versa.

A further detail is that the SVD determines each pair (**u**_{j}, **v**_{j}) only up to sign: reversing the signs of both leaves (2) unchanged, so the labeling of scores as positive or negative is relative to an arbitrary sign convention for the patterns.
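To make the construction concrete, the decomposition above can be sketched in a few lines of numpy on synthetic data (an illustration with variable names of our own choosing, separate from the accompanying script):

```python
import numpy as np

rng = np.random.default_rng(42)
n, S = 8, 100                      # ensemble members, spatial locations
Y = rng.normal(size=(n, S))        # synthetic ensemble data matrix

Ybar = Y.mean(axis=0)              # overall ensemble mean (length S)
Yc = Y - Ybar                      # centered data matrix

# SVD: Yc = U @ diag(lam) @ Vt, singular values in decreasing order
U, lam, Vt = np.linalg.svd(Yc, full_matrices=False)

d = 2
epps = lam[:d, None] * Vt[:d]      # EPPs: patterns lam_j * v_j (d x S)
scores = U[:, :d]                  # dimensionless EPP scores u_ij (n x d)

# proportion of variation accounted for by each pattern
prop = lam**2 / np.sum(lam**2)

# best rank-d approximation of the centered matrix
Yc_d = scores @ epps
```

Because the rows of `Yc` have been centered, the smallest singular value is numerically zero, reflecting the rank bound *n* − 1 noted above.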

## 4. EPPs for structured ensembles

As noted in section 1, many ensembles contain structure that is not accounted for by the basic methodology described above. We now extend the methodology to address this, by combining multivariate analysis of variance (MANOVA) techniques with SVDs. For ease of exposition, we focus on regional ensembles involving *R* RCMs and *G* GCMs, in which each RCM and GCM combination has been run at most once. Extensions to more general structured ensembles are straightforward in principle: details can be found in the online supplemental material to the paper.

### a. Regional ensembles: Theory

We denote the ensemble data matrix by **Y**, with rows {**Y**_{rg}^{T}} where *r* and *g* index RCMs and GCMs respectively. The total number of runs is denoted by *n*_{⋅⋅}, so that the dimension of **Y** is *n*_{⋅⋅} × *S*; and *n*_{r⋅} and *n*_{⋅g} denote the numbers of runs involving RCM *r* and GCM *g*, respectively.

The analysis is based on the additive representation

**Y**_{rg} = **μ** + **α**_{g} + **β**_{r} + **ε**_{rg},   (3)

where **μ** represents an overall mean; **α**_{g} and **β**_{r} represent systematic departures from this overall mean for the *g*th GCM and *r*th RCM, respectively; and **ε**_{rg} is a “residual” representing variation that cannot be attributed to systematic contributions from either the GCM or the RCM (e.g., arising from internal variability in the system), treated as though it is drawn independently for each member from a distribution with mean vector **0** and covariance matrix **Σ**. With the exception of the *S* × *S* matrix **Σ**, these quantities are all vectors of length *S*.

Equation (3) defines an additive MANOVA model, which is the multivariate analog of the two-way ANOVA model used in several references from section 1. Least squares estimates of the coefficient vectors {**α**_{g}} and {**β**_{r}} can be obtained from the data matrix: the estimation is particularly simple for a balanced ensemble with a run for every RCM and GCM combination, so that *n*_{r⋅} = *G*, *n*_{⋅g} = *R*, and *n*_{⋅⋅} = *RG*. Let **Ȳ**_{r⋅} and **Ȳ**_{⋅g} denote the mean output vectors over all runs involving the *r*th RCM and *g*th GCM respectively, and let **Ȳ** denote the overall ensemble mean vector as before. Then the least squares estimates of **μ**, **α**_{g}, and **β**_{r} are, respectively, **Ȳ**, **Ȳ**_{⋅g} − **Ȳ**, and **Ȳ**_{r⋅} − **Ȳ**: the estimated effect of GCM *g* is the departure of its mean from the overall mean, and similarly for RCM *r*. However, the appropriate analogs of these quantities are less obvious in unbalanced ensembles with some missing RCM and GCM combinations. In this case, model (3) provides a natural generalization as we now elaborate.

In matrix form, model (3) for the full ensemble can be written as

**Y** = **1μ**^{T} + **X**_{G}**α** + **X**_{R}**β** + **ε** = **Xθ** + **ε**, say,   (4)

where, in addition to the quantities defined already, **1** is now a vector containing *n*_{⋅⋅} ones; **X**_{G} is an *n*_{⋅⋅} × (*G* − 1) matrix in which the *g*th column contains ones in the rows corresponding to runs in which the *g*th GCM was used, negative ones in the rows where the *G*th GCM was used, and zeroes everywhere else; **X**_{R} is an *n*_{⋅⋅} × (*R* − 1) matrix in which the *r*th column contains ones in the rows corresponding to runs in which the *r*th RCM was used, negative ones in the rows where the *R*th RCM was used, and zeroes everywhere else; **α** is the (*G* − 1) × *S* matrix in which the rows are the coefficient vectors {**α**_{g}^{T}: *g* = 1, …, *G* − 1}; **β** is the corresponding (*R* − 1) × *S* matrix containing the coefficient vectors {**β**_{r}^{T}: *r* = 1, …, *R* − 1}; **ε** is the *n*_{⋅⋅} × *S* matrix in which the rows are the vectors {**ε**_{rg}^{T}}; and the matrix **X** = (**1** **X**_{G} **X**_{R}), of dimension *n*_{⋅⋅} × (*R* + *G* − 1), is obtained by placing its component parts side by side. This coding imposes the sum-to-zero constraints ∑_{g}**α**_{g} = **0** and ∑_{r}**β**_{r} = **0**, so that the effects for the *G*th GCM and *R*th RCM are determined by the others. The accompanying python script (see the data availability statement) demonstrates these constructions.
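As an illustration of these constructions (a sketch of our own, not the accompanying script itself), the sum-to-zero coding for a small unbalanced ensemble might look as follows:

```python
import numpy as np

def effects_coding(labels, levels):
    """n x (L-1) sum-to-zero ('effects') coding: column j has +1 where
    label == levels[j], -1 where label == levels[-1], and 0 elsewhere."""
    labels = np.asarray(labels)
    X = np.zeros((labels.size, len(levels) - 1))
    for j, lev in enumerate(levels[:-1]):
        X[labels == lev, j] = 1.0
    X[labels == levels[-1], :] = -1.0
    return X

# toy unbalanced regional ensemble: one (RCM, GCM) label per run
rcm = ["r1", "r1", "r2", "r2", "r3"]
gcm = ["g1", "g2", "g1", "g3", "g2"]
X_R = effects_coding(rcm, ["r1", "r2", "r3"])
X_G = effects_coding(gcm, ["g1", "g2", "g3"])
ones = np.ones((len(rcm), 1))
X = np.hstack([ones, X_G, X_R])    # n x (R + G - 1) design matrix
```

Here *R* = *G* = 3, so the full design matrix has *R* + *G* − 1 = 5 columns.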

The least squares estimate of **θ** is then obtained by solving the normal equations

(**X**^{T}**X**)**θ** = **X**^{T}**Y**,   (5)

in which the unknown **θ** is an (*R* + *G* − 1) × *S* matrix. The cost of solving this system increases linearly in *S*, so that the solution is feasible on modern computers even for large spatial domains.

Computing the least squares estimates yields *G* + *R* maps of estimated GCM and RCM effects, which can be subjected to their own SVD decompositions by stacking them into matrices of dimension *G* × *S* and *R* × *S*, respectively. These matrices are both centered by construction. The SVD decompositions yield two sets of EPPs, summarizing the dominant patterns of variation among the GCMs and RCMs respectively. Moreover, SVDs of the matrix of estimated residuals {**ε**_{rg}} can be used in the same way to summarize the unstructured component of variation: we denote this matrix by **e**^{(2)}; the superscript is for notational consistency with the supplemental material.

A full EPP analysis thus consists of two steps: first, calculation of the GCM, RCM, and residual effects based on (3); and second, application of SVD to each set of effects to obtain the corresponding spatial modes of variation.
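The two steps can be sketched end to end for a small balanced synthetic ensemble, where the least squares estimates reduce to differences of group means (a schematic with names of our own choosing; in the unbalanced case the estimates would instead come from solving the normal equations):

```python
import numpy as np

rng = np.random.default_rng(1)
R, G, S = 4, 5, 60                         # RCMs, GCMs, locations
Y = rng.normal(size=(R, G, S))             # balanced ensemble: one run per pair

# step 1: effect estimates (balanced case: differences of group means)
mu = Y.mean(axis=(0, 1))                   # overall ensemble mean
alpha = Y.mean(axis=0) - mu                # GCM effects, G x S (centered)
beta = Y.mean(axis=1) - mu                 # RCM effects, R x S (centered)
resid = Y - mu - alpha[None] - beta[:, None]   # residual maps

# step 2: SVD of each stacked effect matrix yields the EPPs
Ug, lam_g, Vg = np.linalg.svd(alpha, full_matrices=False)
Ur, lam_r, Vr = np.linalg.svd(beta, full_matrices=False)
gcm_epps = lam_g[:, None] * Vg             # GCM spatial patterns
rcm_epps = lam_r[:, None] * Vr             # RCM spatial patterns
```

Because each stacked effect matrix is centered, its smallest singular value is numerically zero, mirroring the rank reduction noted for the unstructured case.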

### b. Regional ensembles: Partitioning of variation

The development above provides low-dimensional estimates of RCM, GCM, and residual effects in a regional ensemble. To fully understand the ensemble structure however, it is also necessary to quantify the relative importance of these effects. If, for example, the RCMs contribute only a small proportion of the total variation, then the RCM EPPs themselves are arguably of limited interest. We now address this issue. The approach can be regarded either as an extension of the ANOVA methodology of Yip et al. (2011), to handle unbalanced ensembles with multivariate outputs, or as a computationally cheap alternative to the functional ANOVA methodology of Sain et al. (2011).

#### 1) Balanced ensembles

In a balanced ensemble, the deviation from the overall ensemble mean for the (*r*, *g*) run can be written as

**Y**_{rg} − **Ȳ** = (**Ȳ**_{⋅g} − **Ȳ**) + (**Ȳ**_{r⋅} − **Ȳ**) + (**Y**_{rg} − **Ȳ**_{⋅g} − **Ȳ**_{r⋅} + **Ȳ**).   (6)

Summing the outer products of the left-hand side over all runs yields the total *sum of squares and cross-products* (SSCP) matrix

**T** = ∑_{r}∑_{g}(**Y**_{rg} − **Ȳ**)(**Y**_{rg} − **Ȳ**)^{T},   (7)

a matrix of dimension *S* × *S* that plays the same role as the total sum of squares in a standard univariate ANOVA (Krzanowski 1988, section 13.3). To aid interpretation, note that **T**/(*RG* − 1) is the sample covariance matrix between all pairs of locations across the ensemble. Using (6), a little algebra shows that the total SSCP can be decomposed as

**T** = **T**_{G} + **T**_{R} + **T**_{E},   (8)

where **T**_{G} = *R*∑_{g}(**Ȳ**_{⋅g} − **Ȳ**)(**Ȳ**_{⋅g} − **Ȳ**)^{T} and **T**_{R} = *G*∑_{r}(**Ȳ**_{r⋅} − **Ȳ**)(**Ȳ**_{r⋅} − **Ȳ**)^{T} are formed respectively from the least squares estimates of the GCM and RCM effects in model (3), and **T**_{E} is the residual SSCP formed from the terms in the final parentheses of (6). An unbiased estimator of **Σ** is **T**_{E}/(*n*_{⋅⋅} − *R* − *G* + 1), the divisor being the residual degrees of freedom after fitting *R* − 1 independent RCM effects and *G* − 1 independent GCM effects along with the overall mean. Moreover, in the balanced case with *n*_{⋅⋅} = *RG*, (*n*_{⋅⋅} − *R* − *G* + 1) is equal to (*G* − 1)(*R* − 1), so that **Σ** can be estimated as **T**_{E}/[(*G* − 1)(*R* − 1)].

Unfortunately, if the number of locations *S* exceeds 1, there is no unique way to compare the magnitudes of **T**_{G}, **T**_{R}, and **T**_{E} in (8) (if *S* = 1, then each of these matrices is a single number and can be expressed as a proportion of the total variation **T**). One option is to map the diagonal elements of **T**, which measure the total variation at each individual location, along with the proportional contributions of **T**_{G}, **T**_{R}, and **T**_{E} to each of these diagonal elements: these are the “total variability partition” maps in, for example, Fig. 5 of Christensen and Kjellström (2020). Importantly, the diagonal elements can all be calculated without having to compute the SSCPs themselves (see the supplemental material for details), which is helpful because the SSCPs are of dimension *S* × *S* so that storage requirements can be excessive for large *S*.

To supplement the maps just described, the diagonal elements of the matrices in (8) can be summed to obtain their traces. The trace operator is additive, so that trace(**T**) = trace(**T**_{G}) + trace(**T**_{R}) + trace(**T**_{E}) and the total trace is partitioned unambiguously into single-number summaries representing contributions from each source of variation. A potential disadvantage is that the off-diagonal elements of the SSCP matrices do not influence the partitioning: we return to this point in section 5. For the moment however, we highlight a useful alternative interpretation involving the centered data matrix: trace(**T**_{G}) and trace(**T**_{R}) are the squared Frobenius norms of the fitted GCM and RCM components **X**_{G}**α** and **X**_{R}**β** in (4), evaluated at the least squares estimates, while trace(**T**_{E}) is that of the residual matrix **e**^{(2)}. The trace-based partitioning of variation therefore provides an interpretable decomposition focused on the data matrix itself, rather than the SSCPs.

Finally, we note that the traces of the components of (8) are related to the SVDs of the components of (4): in a balanced ensemble, trace(**T**_{G}) is *R* times the sum of squared singular values from the SVD of the stacked GCM effects, trace(**T**_{R}) is *G* times the corresponding sum for the stacked RCM effects, and trace(**T**_{E}) is the sum of squared singular values of the residual matrix **e**^{(2)}. The proposed methodology therefore provides a hierarchical partitioning of the ensemble variation: the squared singular values of the EPPs for each component of (4) sum to the variation explained by that component and, in turn, these componentwise contributions sum to the total variation trace(**T**).
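The trace partitioning can be checked numerically without ever forming an *S* × *S* matrix (a self-contained sketch on synthetic balanced data; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
R, G, S = 4, 5, 50
Y = rng.normal(size=(R, G, S))             # balanced ensemble, one run per pair

mu = Y.mean(axis=(0, 1))
alpha = Y.mean(axis=0) - mu                # GCM effects (G x S)
beta = Y.mean(axis=1) - mu                 # RCM effects (R x S)
resid = Y - mu - alpha[None] - beta[:, None]

# traces of the SSCP components, computed as squared Frobenius norms
tot = np.sum((Y - mu) ** 2)                # trace(T)
tG = R * np.sum(alpha ** 2)                # trace(T_G)
tR = G * np.sum(beta ** 2)                 # trace(T_R)
tE = np.sum(resid ** 2)                    # trace(T_E)
```

In the balanced case the three component traces sum exactly to the total, and each equals the (suitably scaled) sum of squared singular values from the corresponding SVD.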

#### 2) Unbalanced ensembles

If some (*r*, *g*) combinations are missing from a regional ensemble, then Eq. (8) no longer holds and, as discussed in section 1, the ensemble variation cannot be partitioned unambiguously. In this case, instead of discarding or imputing members to obtain a balanced ensemble, one alternative is to determine the range of variation (RoV) that is potentially attributable to each source. This is done by performing two separate analyses, each based on a sequence of statistical models as described in the supplemental material. In the first analysis, the maximum possible variation is attributed to the GCMs, with the RCM information used only to account for variation that cannot otherwise be explained. In the second analysis, the roles of GCMs and RCMs are reversed. The difference between the results indicates the extent to which the partitioning is affected by the lack of balance: in the balanced case, both analyses recover the unique decomposition (8).
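One way to compute such ranges is via sequential least squares fits, attributing to the first factor all the variation that it can explain and to the second factor only the increment. The paper's exact model sequence is given in the supplemental material; the sketch below is a generic stand-in with names of our own choosing:

```python
import numpy as np

def seq_trace_partition(Y, X1, X2):
    """Sequential trace partition of an n x S ensemble: variation attributed
    to the columns of X1 first, then to X2, with the remainder residual."""
    n = Y.shape[0]
    ones = np.ones((n, 1))
    Yc = Y - Y.mean(axis=0)                      # remove the overall mean

    def rss(X):                                  # residual sum of squares
        B = np.linalg.lstsq(X, Yc, rcond=None)[0]
        return np.sum((Yc - X @ B) ** 2)

    tot = np.sum(Yc ** 2)
    r1 = rss(np.hstack([ones, X1]))
    r12 = rss(np.hstack([ones, X1, X2]))
    return tot - r1, r1 - r12, r12               # (X1 share, X2 share, residual)

# toy unbalanced ensemble: 7 runs of 3 RCMs x 3 GCMs with gaps
rng = np.random.default_rng(3)
rcm = np.array([0, 0, 1, 1, 2, 2, 2])
gcm = np.array([0, 1, 0, 2, 1, 2, 0])
S = 40
Y = rng.normal(size=(7, S)) + 1.5 * rng.normal(size=(3, S))[rcm]
X_R = np.eye(3)[rcm]                             # indicator coding: only the
X_G = np.eye(3)[gcm]                             # column space matters here
gcm_first = seq_trace_partition(Y, X_G, X_R)     # maximum attribution to GCMs
rcm_first = seq_trace_partition(Y, X_R, X_G)     # maximum attribution to RCMs
gcm_bounds = (rcm_first[1], gcm_first[0])        # two attributions bounding the GCM share
```

Both orderings explain the same total variation and leave the same residual; only the split between the two factors depends on the order, and in a balanced design the two orderings agree.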

A second alternative is to estimate the partitioning of variation that would have been obtained from a complete, balanced ensemble. Under model (3), the expected total SSCP for such an ensemble is

E(**T**) = *R*∑_{g}**α**_{g}**α**_{g}^{T} + *G*∑_{r}**β**_{r}**β**_{r}^{T} + (*RG* − 1)**Σ**,   (9)

in which the three terms represent the contributions of the GCMs, the RCMs, and the residual variation, respectively, and depend only on the {**α**_{g}}, the {**β**_{r}}, and **Σ**: these can all be estimated from the available runs as described earlier. We therefore denote the three plug-in terms in (9) as **T̃**_{G}, **T̃**_{R}, and **T̃**_{E}, and define the *estimated* SSCP for a complete ensemble as

**T̃** = **T̃**_{G} + **T̃**_{R} + **T̃**_{E}.   (10)

A potential objection to this approach is that it no longer provides an exact decomposition of variation for the observed ensemble. We therefore propose to use it in conjunction with the RoVs, as illustrated in the example below. Importantly, the RoVs are not guaranteed to contain the relative proportions derived from (10): they provide information on potential partitionings of variation in the *observed* ensemble, whereas (10) aims to account additionally for variation associated with *unobserved* ensemble members. If the observed members are unrepresentative in some sense—for example, if GCMs with above-average responses tend to be paired with RCMs that have below-average responses—then this lack of representativeness will feed into the RoVs. If the results from (10) lie outside these ranges therefore, this suggests that the characteristics of the available and missing ensemble runs may differ.

The use of an estimated partitioning of variation can be regarded as a form of imputation (see section 1). By contrast with other imputation schemes however, it does not require estimation of the missing ensemble members: rather, it reweights the contributions to each SSCP to account for undersampled parts of the complete ensemble. This is similar in spirit to the well-established approach, in survey sampling, of reweighting to handle situations in which subgroups of a population are over- or underrepresented (e.g., Little and Rubin 2020, chapter 3).

### c. Regional ensembles: An unbalanced example

The EPP methodology is now applied to the EuroCORDEX U.K. temperature biases of section 2. The data matrix has *n*_{⋅⋅} = 64 rows corresponding to the ensemble members, and *S* = 1652 columns corresponding to grid cells.

Figure 2 shows the estimated partitioning of variation for a complete ensemble based on Eq. (10). Focusing first on the estimated ensemble standard deviations in Fig. 2a, derived from the diagonal elements of the estimated total SSCP in (10): these are largest in two urban areas, where the ensemble members disagree most strongly.

Figures 2b–d use Eq. (10) to probe the sources of variation in the ensemble. The RCMs account for the highest percentage (53%) of the estimated total variation, although the GCMs also account for 38%: unstructured residual variation is relatively unimportant. These figures can be compared with the RoVs derived from the available ensemble members: these analyses reveal that the RCMs contribute between 35% and 62% of the total variation while the GCMs contribute between 29% and 56%. The GCM and RCM estimates from (10) both fall well within the respective RoVs, whence there is no evidence that the available ensemble members are unrepresentative with respect to biases in summer tasmax. A closer inspection of the maps also reveals that in the two urban areas where the ensemble variation is highest, the RCMs account for up to around 70% of it: this perhaps reinforces recent evidence (Lo et al. 2020) that uncertainties in urban heat island effects are primarily attributable to the RCMs.

To understand the variation in more detail, Figs. 3 and 4 show the GCM and RCM EPPs respectively. In both cases, panel a is the estimate of ** μ** in (3). The first GCM EPP accounts for 92% of the GCM-attributed variation and shows a gentle north–south gradient; the second EPP is much less important. By contrast, the first RCM EPP—accounting for 87% of the RCM-related variation—clearly picks out the pattern corresponding to the effects of urban heat islands and topography. This demonstrates that the uncertainty regarding these effects comes predominantly from the RCMs, and in fact that it is the dominant pattern of RCM-related variation in the ensemble. Moreover, the EPP scores associated with this pattern enable us to identify the RCMs with the smallest and largest such effects, which are RACMO22E and REMO2015 respectively. For users with a particular interest in U.K. summer maximum temperatures in urban heat islands, this analysis therefore helps to identify the EuroCORDEX runs spanning the range of relevant historical biases.
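Given the scores, identifying the members at the extremes of a pattern is immediate; for example (with illustrative score values of our own invention, ordered merely to mirror the finding just described, not the actual values from the analysis):

```python
import numpy as np

# illustrative first-EPP scores for four RCMs (made-up numbers)
rcm_names = np.array(["HIRHAM5", "RACMO22E", "RCA4", "REMO2015"])
scores = np.array([-0.3, -0.6, 0.2, 0.7])

smallest = rcm_names[np.argmin(scores)]   # RCM at the low end of the pattern
largest = rcm_names[np.argmax(scores)]    # RCM at the high end of the pattern
```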

## 5. Discussion

### a. Summary of the proposed methodology

EPPs are designed to enable rapid exploration of structured climate model ensembles, particularly when the outputs of interest are high-dimensional. They are descriptive rather than inferential: in particular, we do not attempt to quantify uncertainties about sources of variation or to assess their “statistical significance,” which require more time-consuming methods as reviewed in the introduction. Nor do they aim to estimate the underlying properties of a model or system: this contrasts, for example, with superficially similar techniques that have been used to study the dynamical properties of individual climate models (e.g., Maher et al. 2018; Haszpra et al. 2020; Bódai et al. 2021). Estimation of system properties requires appropriately designed ensembles of sufficient size. By contrast, an EPP analysis is essentially an arithmetical decomposition of the ensemble: for balanced ensembles the decomposition is exact, while for unbalanced ensembles it relies on minimal assumptions corresponding to representations such as Eq. (3).

For the descriptive analysis of regional ensembles, (3) itself is relatively uncontentious: it makes no distributional assumptions, although one potential restriction is the additive structure in which the effect of a given RCM is the same regardless of which GCM is driving it. This assumption can be relaxed if the ensemble contains multiple runs of each GCM and RCM combination: the supplemental material provides details. The model provides interpretable summaries of the GCM and RCM effects via the least squares coefficient estimates, and provides a framework for reweighting contributions to the SSCPs in unbalanced ensembles via Eqs. (9) and (10). This reweighting does not require the estimation of missing ensemble members, nor does it require that members are discarded to obtain a balanced subset. An open question, however, is how to assess the uncertainty associated with the resulting estimated SSCPs. A related question is considered by Christensen and Kjellström (2022), who use heuristic arguments to examine the implications of missing ensemble members for estimation of the coefficient vectors ({**α**_{g}} and {**β**_{r}} in the current context). It is not obvious, however, that these arguments can be extended to examine the effect on the partitioning of variation.

Although EPP analysis is designed primarily for descriptive purposes, it is natural to ask how it is affected by sources of variation that are not considered explicitly—for example, Eq. (3) contains no direct representation of internal variability, because the EuroCORDEX ensemble does not provide multiple runs of each GCM and RCM pair with varying initial conditions. In such cases the associated unmodelled or unrepresented variation is subsumed into the residual term **T**_{E} in (8), or its estimated counterpart in (10).

### b. How many EPPs?

In Fig. 3, most variation in the GCM effects is dominated by the first two EPPs, which represent respectively a spatial monopole and dipole. A reviewer has pointed out that this situation is common, and queried whether higher-order EPPs may reveal more nuances of structure. We have investigated this (details are in the accompanying code—see the data availability statement). The higher-order EPPs are relatively unimportant (e.g., GCM EPPs 3 and 4 contribute respectively 0.9% and 0.3% of the GCM-attributed variation) and, moreover, exhibit no interpretable spatial patterns. These results are typical for GCM EPPs in our experience—although the monopole/dipole pattern is not so typical for RCM EPPs as exemplified by Fig. 4. At some level, a lack of “interesting” higher-order structure is itself noteworthy: for example, it suggests that identification of representative ensemble members can be done using just the first two EPP scores for each source of variation. Further investigation is needed, however, to determine whether these findings can be replicated across different ensembles, regions, time periods, and quantities of interest.

### c. Alternative measures of variation

To summarize the overall magnitudes of the SSCP matrices involved in the partitioning of total variation, the development above uses their traces. Although this approach is appealing for its relationship with the Frobenius norm of the centered data matrix, it neglects the off-diagonal elements of the SSCPs. These elements are related to the correlations between neighboring locations and thus, in principle, contain information about the spatial extent of “typical” differences between ensemble members. Alternative summaries of the SSCPs have been considered in the MANOVA literature (e.g., Huberty and Olejnik 2006, chapter 3), albeit justified by the assumption (not made in the development above) that the residual vectors [the {*ε*_{rg}} in (3)] have multivariate normal (Gaussian) distributions. One such alternative derives from the fact that in the Gaussian case the least squares MANOVA coefficient estimates are also the maximum likelihood estimates (Krzanowski 1988, section 15.2); hence fitted models can be summarized in terms of their maximized log-likelihoods. In particular, the *scaled deviance* for a model is defined as twice the difference between its maximized log-likelihood and the highest log-likelihood attainable (i.e., the log-likelihood from a model that fits the ensemble outputs perfectly). The scaled deviance is a measure of “lack of fit”: for example, in linear regression models it is proportional to the residual sum of squares (Davison 2003, section 10.2).

For a two-way additive MANOVA such as Eq. (3), it can be shown (see the supplemental material) that in a balanced ensemble the scaled deviance can be partitioned into contributions from the GCMs, RCMs, and residuals in exactly the same way as the total trace, and that the proportions of scaled deviance attributable to the GCMs and RCMs are trace(**Σ**^{−1}**T**_{G})/trace(**Σ**^{−1}**T**) and trace(**Σ**^{−1}**T**_{R})/trace(**Σ**^{−1}**T**) respectively, where **T** denotes the total SSCP matrix and **Σ** the covariance matrix of the residual vectors. Replacing **Σ** with a multiple of the identity matrix recovers the trace-based proportions: any differences compared with the trace-based partitioning must therefore be associated with the correlation structure of the **Σ** matrix, implying that the ensemble members contain differing spatial “patches” of above- and below-average values that are associated with inter-RCM variation. This implication is unsurprising and suggests that the trace-based partitioning of variation yields more useful insights than a partitioning based on deviances. Further work is needed to determine whether this conclusion holds in general, or whether it is specific to the ensembles and study region considered in this paper.
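Under the Gaussian working assumption, the two partitionings can be compared numerically along the following lines; the SSCP matrices and residual covariance here are randomly generated stand-ins, not estimates from any real ensemble:

```python
import numpy as np

rng = np.random.default_rng(2)
S = 6
def psd(n):                               # random symmetric PSD matrix
    A = rng.normal(size=(n, S))
    return A.T @ A

T_G, T_R, T_E = psd(4), psd(5), psd(20)   # toy SSCP contributions
T = T_G + T_R + T_E
Sigma = T_E / 20                          # a stand-in residual covariance

# Trace-based proportion: ignores off-diagonal structure entirely.
p_trace = np.trace(T_G) / np.trace(T)

# Deviance-based proportion: weights by the inverse residual covariance,
# so it is sensitive to spatial correlation in Sigma.
W = np.linalg.inv(Sigma)
p_dev = np.trace(W @ T_G) / np.trace(W @ T)

# With Sigma proportional to the identity the two proportions coincide,
# so any discrepancy reflects the structure of Sigma.
W0 = np.eye(S) / Sigma.diagonal().mean()
assert np.isclose(np.trace(W0 @ T_G) / np.trace(W0 @ T), p_trace)
```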

### d. Extensions and other potential applications

Our example considers maps of a single climate index (bias in tasmax). The methodology can also be applied to multiple indices simultaneously: all that is required is a suitably extended ensemble data matrix. For an analysis of both temperature and precipitation, for example, this becomes an *n*_{⋅⋅} × 2*S* data matrix containing the relevant values of both variables. An intermodel EOF analysis along these lines is considered by Zhou et al. (2020). In cases involving variables with different units of measurement, however, it is desirable to standardize the data for each variable prior to analysis so that the results are not dominated by the contributions from individual variables: see Krzanowski (1988, section 2.2) for a discussion of this in the closely related context of principal components analysis. In climate science, it is also common to standardize each index on a per-grid-square basis before calculating SVDs, when performing PCA or similar. The appropriateness of this for an EPP analysis depends on the context. We did not do it, because our goal was to understand the spatial variation in the ensemble: for example, standardizing the EuroCORDEX temperature biases individually for each grid square would have removed the interesting excess variation associated with urban heat islands and topography in the first panel of Fig. 2.
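A sketch of this per-variable standardization, for a hypothetical stacked temperature-and-precipitation matrix; note that each variable gets a single overall scale factor, deliberately not one per grid square, so that spatial structure is preserved:

```python
import numpy as np

rng = np.random.default_rng(3)
n, S = 30, 50                          # ensemble members x grid squares
temp = rng.normal(15.0, 3.0, size=(n, S))      # toy temperature maps
prec = rng.gamma(2.0, 1.5, size=(n, S))        # toy precipitation maps

# One spread measure per variable, computed over the whole centred matrix.
def overall_scale(X):
    return (X - X.mean(axis=0)).std()

# Stack the two indices into a single n x 2S data matrix, scaled so that
# neither variable dominates the SSCPs through its units of measurement.
Y = np.hstack([temp / overall_scale(temp), prec / overall_scale(prec)])
assert Y.shape == (n, 2 * S)
```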

The approach also extends beyond regional ensembles. For a single-scenario CMIP ensemble, for example, Eq. (3) can be replaced by the one-way representation

**Y**_{gi} = **μ** + **α**_{g} + **ε**_{gi}, (11)

where **Y**_{gi} now denotes the output from the *i*th run of the *g*th CMIP GCM (*g* = 1, …, *G*); **μ** represents an overall mean; **α**_{g} represents a systematic departure from this mean for the *g*th GCM; and **ε**_{gi} represents residual variation. Note that (11) is the direct analog of (3): the least squares estimates of **μ** and **α**_{g} are now the overall ensemble mean and the deviation from this of the mean output for the *g*th GCM. An EPP analysis thus focuses on the SVDs of the corresponding SSCP matrices.
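A minimal sketch of this one-way analysis for an unstructured ensemble, with toy array shapes; here the EPPs are obtained directly from an SVD of the matrix of estimated GCM effects, which is equivalent to an eigendecomposition of the corresponding SSCP matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
G, n_runs, S = 6, 3, 10
Y = rng.normal(size=(G, n_runs, S))    # Y[g, i]: i-th run of the g-th GCM

# Least squares estimates for the balanced one-way layout:
mu = Y.mean(axis=(0, 1))               # overall ensemble mean map
alpha = Y.mean(axis=1) - mu            # per-GCM departures from the mean
eps = Y - Y.mean(axis=1, keepdims=True)  # run-to-run (internal) variation

# GCM EPPs: right singular vectors of the matrix of GCM effects.
U, sv, Vt = np.linalg.svd(alpha, full_matrices=False)
epps = Vt                               # rows are spatial patterns (EPPs)
scores = alpha @ Vt.T                   # per-GCM EPP scores

# The scores and patterns together reconstruct the effects exactly.
assert np.allclose(alpha, scores @ Vt)
```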

Moving beyond regional and single-scenario CMIP ensembles, the supplemental material describes extensions to ensembles with more complex structures. For example, the approach could be used to explore the spatial structure of GCM- and scenario-specific variation in the CMIP ensembles, which, as noted above, are often highly unbalanced: a simple application in this setting would use the GCM EPPs to characterize (dis)similarities between models in terms of their spatial patterns of projected future change. A more sophisticated analysis might focus on scenario effects: in such an analysis it may be reasonable to expect that the first scenario EPP will correspond to an overall pattern of change, and for the corresponding scenario-specific EPP scores to be related to some measure of net radiative forcing. Any departures from this expected pattern could yield interesting insights into the dynamics of the models. Other potential applications are to ensembles in which a single model is used to obtain projections for each of a set of emissions scenarios, starting from each of a common collection of carefully chosen initial conditions: here, an EPP analysis of the residual/interaction term in an additive representation of scenario and initial condition effects could potentially reveal information about the state-dependence of climate change signals.

EPPs can also be used to identify gaps in an existing structured ensemble, and hence to identify design priorities for additional runs. For example, in a regional ensemble each member can be summarized using the first EPP scores for the corresponding RCM and driving GCM respectively: the ensemble structure can then be visualized as a scatterplot of the corresponding pairs of scores. Such a plot will reveal combinations of scores—and hence of characteristic modes of behavior—that are not well represented and hence could be prioritized in subsequent ensemble updates.
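Such a scatterplot takes only a few lines of matplotlib; the scores below are random stand-ins for the first GCM and RCM EPP scores of each ensemble member:

```python
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical first EPP scores for each member's driving GCM and RCM.
gcm_score = rng.normal(size=20)
rcm_score = rng.normal(size=20)

fig, ax = plt.subplots()
ax.scatter(gcm_score, rcm_score)
ax.set_xlabel("First GCM EPP score")
ax.set_ylabel("First RCM EPP score")
ax.set_title("Ensemble coverage in EPP-score space")
fig.savefig("epp_gaps.png")            # sparse regions suggest design gaps
```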

Finally, we note that EPP analysis has potential applications beyond climate model ensembles. One such application is to gridded data products providing estimates of quantities that are not observed directly at the locations of interest: in such settings, one way to characterize the estimation uncertainty in the data product is to provide multiple samples from its joint uncertainty distribution. The uptake of such techniques by data product providers is currently low, although they are likely to become more widely available in the future (Chandler et al. 2012). EPPs provide one possible route for data product users to choose informed and representative subsets of samples, enabling uncertainty to be propagated through their subsequent analyses.

## Acknowledgments.

This research was funded under the U.K. Climate Resilience programme, which is supported by the UKRI Strategic Priorities Fund. The programme is co-delivered by the Met Office and NERC on behalf of UKRI partners AHRC, EPSRC, ESRC.

The authors acknowledge the World Climate Research Programme’s Working Group on Regional Climate, and the Working Group on Coupled Modelling, former coordinating body of CORDEX and responsible panel for CMIP5. We also thank the climate modelling groups (see column headings in Fig. 1) for producing and making available their model output. We also acknowledge the Earth System Grid Federation infrastructure, an international effort led by the U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison, the European Network for Earth System Modelling and other partners in the Global Organisation for Earth System Science Portals (GO-ESSP).

Finally, we thank the editor and three reviewers for their careful reading and constructive comments on an earlier draft of the paper.

## Data availability statement.

All figures and analysis can be reproduced using Python scripts and example data linked from https://www.ucl.ac.uk/statistics/research/eurocordex-uk. The scripts also include an additional example, demonstrating the EPP analysis of an unstructured ensemble.

## REFERENCES

Bódai, T., G. Drótos, K.-J. Ha, J.-Y. Lee, and E.-S. Chung, 2021: Nonlinear forced change and nonergodicity: The case of ENSO-Indian monsoon and global precipitation teleconnections. *Front. Earth Sci.*, **8**, 599785, https://doi.org/10.3389/feart.2020.599785.

Cannon, A. J., 2015: Selecting GCM scenarios that span the range of changes in a multimodel ensemble: Application to CMIP5 climate extremes indices. *J. Climate*, **28**, 1260–1267, https://doi.org/10.1175/JCLI-D-14-00636.1.

Casajus, N., C. Périé, T. Logan, M.-C. Lambert, S. de Blois, and D. Berteaux, 2016: An objective approach to select climate scenarios when projecting species distribution under climate change. *PLOS ONE*, **11**, e0152495, https://doi.org/10.1371/journal.pone.0152495.

Chandler, R. E., P. Thorne, J. Lawrimore, and K. Willett, 2012: Building trust in climate science: Data products for the 21st century. *Environmetrics*, **23**, 373–381, https://doi.org/10.1002/env.2141.

Chen, D., and Coauthors, 2023: Framing, context, and methods. *Climate Change 2021: The Physical Science Basis*, V. Masson-Delmotte et al., Eds., Cambridge University Press, 147–286, https://doi.org/10.1017/9781009157896.003.

Christensen, O. B., and E. Kjellström, 2020: Partitioning uncertainty components of mean climate and climate change in a large ensemble of European regional climate model projections. *Climate Dyn.*, **54**, 4293–4308, https://doi.org/10.1007/s00382-020-05229-y.

Christensen, O. B., and E. Kjellström, 2022: Filling the matrix: An ANOVA-based method to emulate regional climate model simulations for equally-weighted properties of ensembles of opportunity. *Climate Dyn.*, **58**, 2371–2385, https://doi.org/10.1007/s00382-021-06010-5.

Collins, M., B. B. B. Booth, G. R. Harris, J. M. Murphy, D. M. H. Sexton, and M. J. Webb, 2006: Towards quantifying uncertainty in transient climate change. *Climate Dyn.*, **27**, 127–147, https://doi.org/10.1007/s00382-006-0121-0.

Davison, A. C., 2003: *Statistical Models*. Cambridge University Press, 726 pp.

Evin, G., B. Hingray, J. Blanchet, N. Eckert, S. Morin, and D. Verfaillie, 2019: Partitioning uncertainty components of an incomplete ensemble of climate projections using data augmentation. *J. Climate*, **32**, 2423–2440, https://doi.org/10.1175/JCLI-D-18-0606.1.

Evin, G., S. Somot, and B. Hingray, 2021: Balanced estimate and uncertainty assessment of European climate change using the large EURO-CORDEX regional climate model ensemble. *Earth Syst. Dyn.*, **12**, 1543–1569, https://doi.org/10.5194/esd-12-1543-2021.

Faraway, J. J., 2014: *Linear Models with R*. 2nd ed. Chapman and Hall/CRC, 286 pp.

Gentle, J. E., 2007: *Matrix Algebra: Theory, Computations, and Applications in Statistics*. Springer, 530 pp.

Haszpra, T., M. Herein, and T. Bódai, 2020: Investigating ENSO and its teleconnections under climate change in an ensemble view—A new perspective. *Earth Syst. Dyn.*, **11**, 267–280, https://doi.org/10.5194/esd-11-267-2020.

Hawkins, E., and R. Sutton, 2009: The potential to narrow uncertainty in regional climate predictions. *Bull. Amer. Meteor. Soc.*, **90**, 1095–1108, https://doi.org/10.1175/2009BAMS2607.1.

Huberty, C. J., and S. Olejnik, 2006: *Applied MANOVA and Discriminant Analysis*. John Wiley and Sons, 528 pp.

Jacob, D., and Coauthors, 2014: EURO-CORDEX: New high-resolution climate change projections for European impact research. *Reg. Environ. Change*, **14**, 563–578, https://doi.org/10.1007/s10113-013-0499-2.

Jones, P. W., 1999: First- and second-order conservative remapping schemes for grids in spherical coordinates. *Mon. Wea. Rev.*, **127**, 2204–2210, https://doi.org/10.1175/1520-0493(1999)127<2204:FASOCR>2.0.CO;2.

Krzanowski, W., 1988: *Principles of Multivariate Analysis*. Oxford University Press, 563 pp.

Li, G., and S.-P. Xie, 2012: Origins of tropical-wide SST biases in CMIP multi-model ensembles. *Geophys. Res. Lett.*, **39**, L22703, https://doi.org/10.1029/2012GL053777.

Little, R., and D. Rubin, 2020: *Statistical Analysis with Missing Data*. 3rd ed. Wiley, 464 pp.

Lo, Y. T. E., D. M. Mitchell, S. I. Bohnenstengel, M. Collins, E. Hawkins, G. C. Hegerl, M. Joshi, and P. A. Stott, 2020: U.K. climate projections: Summer daytime and nighttime urban heat island changes in England’s major cities. *J. Climate*, **33**, 9015–9030, https://doi.org/10.1175/JCLI-D-19-0961.1.

Maher, N., D. Matei, S. Milinski, and J. Marotzke, 2018: ENSO change in climate projections: Forced response or internal variability? *Geophys. Res. Lett.*, **45**, 11 390–11 398, https://doi.org/10.1029/2018GL079764.

Maher, N., S. Milinski, and R. Ludwig, 2021: Large ensemble climate model simulations: Introduction, overview, and future prospects for utilising multiple types of large ensemble. *Earth Syst. Dyn.*, **12**, 401–418, https://doi.org/10.5194/esd-12-401-2021.

Meehl, G. A., C. Covey, T. Delworth, M. Latif, B. McAvaney, J. F. B. Mitchell, R. J. Stouffer, and K. E. Taylor, 2007: The WCRP CMIP3 multimodel dataset: A new era in climate change research. *Bull. Amer. Meteor. Soc.*, **88**, 1383–1394, https://doi.org/10.1175/BAMS-88-9-1383.

Northrop, P. J., and R. E. Chandler, 2014: Quantifying sources of uncertainty in projections of future climate. *J. Climate*, **27**, 8793–8808, https://doi.org/10.1175/JCLI-D-14-00265.1.

Perry, M., D. Hollis, and M. Elms, 2009: The generation of daily gridded datasets of temperature and rainfall for the U.K. NCEI Tech. Rep. 24, 7 pp., https://www.metoffice.gov.uk/binaries/content/assets/metofficegovuk/pdf/weather/learn-about/uk-past-events/papers/cm24_generation_of_daily_gridded_datasets.pdf.

Rao, C. R., 1973: *Linear Statistical Inference and its Applications*. 2nd ed. Wiley, 625 pp.

Rodgers, K. B., and Coauthors, 2021: Ubiquity of human-induced changes in climate variability. *Earth Syst. Dyn.*, **12**, 1393–1411, https://doi.org/10.5194/esd-12-1393-2021.

Rougier, J., and M. Goldstein, 2014: Climate simulators and climate projections. *Annu. Rev. Stat. Appl.*, **1**, 103–123, https://doi.org/10.1146/annurev-statistics-022513-115652.

Sain, S. R., D. Nychka, and L. Mearns, 2011: Functional ANOVA and regional climate experiments: A statistical analysis of dynamic downscaling. *Environmetrics*, **22**, 700–711, https://doi.org/10.1002/env.1068.

Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. *Bull. Amer. Meteor. Soc.*, **93**, 485–498, https://doi.org/10.1175/BAMS-D-11-00094.1.

van Vuuren, D. P., and Coauthors, 2011: The representative concentration pathways: An overview. *Climatic Change*, **109**, 5, https://doi.org/10.1007/s10584-011-0148-z.

von Trentini, F., M. Leduc, and R. Ludwig, 2019: Assessing natural variability in RCM signals: Comparison of a multi model EURO-CORDEX ensemble with a 50-member single model large ensemble. *Climate Dyn.*, **53**, 1963–1979, https://doi.org/10.1007/s00382-019-04755-8.

von Trentini, F., E. E. Aalbers, E. M. Fischer, and R. Ludwig, 2020: Comparing interannual variability in three regional Single-Model Initial-condition Large Ensembles (SMILEs) over Europe. *Earth Syst. Dyn.*, **11**, 1013–1031, https://doi.org/10.5194/esd-11-1013-2020.

Wang, C., Y. Hu, X. Wen, C. Zhou, and J. Liu, 2020: Inter-model spread of the climatological annual mean Hadley circulation and its relationship with the double ITCZ bias in CMIP5. *Climate Dyn.*, **55**, 2823–2834, https://doi.org/10.1007/s00382-020-05414-z.

Yim, B. Y., H. S. Min, and J.-S. Kug, 2016: Inter-model diversity in jet stream changes and its relation to Arctic climate in CMIP5. *Climate Dyn.*, **47**, 235–248, https://doi.org/10.1007/s00382-015-2833-5.

Yip, S., C. A. T. Ferro, D. B. Stephenson, and E. Hawkins, 2011: A simple, coherent framework for partitioning uncertainty in climate predictions. *J. Climate*, **24**, 4634–4643, https://doi.org/10.1175/2011JCLI4085.1.

Zhou, S., G. Huang, and P. Huang, 2020: Inter-model spread of the changes in the East Asian summer monsoon system in CMIP5/6 models. *J. Geophys. Res. Atmos.*, **125**, 2020JD033016, https://doi.org/10.1029/2020JD033016.