Characterising spatial structure in climate model ensembles

: This paper presents a methodology that is designed for rapid exploratory analysis of the outputs from ensembles of climate models, especially when these outputs consist of maps. The approach formalises and extends the technique of ‘inter-model empirical orthogonal function’ analysis, combining multivariate analysis of variance techniques with singular value decompositions (SVDs) of structured components of the ensemble data matrix. The SVDs yield spatial patterns associated with these components, that we call ensemble principal patterns (EPPs). A unique hierarchical partitioning of variation is obtained for balanced ensembles in which all combinations of factors, such as GCM:RCM pairs in a regional ensemble, appear with equal frequency: suggestions are also proposed to handle unbalanced ensembles without imputing missing values or discarding runs. Applications include the selection of ensemble members to propagate uncertainty into subsequent analyses; and diagnosis of modes of variation associated with specific model variants or parameter perturbations. The approach is illustrated using outputs from the EuroCORDEX regional ensemble over the UK.


Introduction
Much of our understanding of the climate system is derived from climate model experiments, which also underpin most projections of future climate based on specified scenarios of greenhouse gas emissions and socioeconomic development (Chen et al. 2021).Climate models may be used individually, as when using general circulation models (GCMs) or earth system models (ESMs) to provide global simulations at a relatively coarse spatial resolution; or in combination, as when using GCMs or ESMs to provide boundary conditions for high-resolution regional climate model (RCM) simulations.
Climate model experiments are associated with multiple sources of uncertainty (Rougier and Goldstein 2014), which are often studied using collections ('ensembles') of model runs.These include ensembles that explore internal variability, often by varying initial conditions in a single model (e.g.Rodgers et al. 2021;Maher et al. 2021;von Trentini et al. 2020); perturbed physics ensembles (Collins et al. 2006) that vary the representation of key processes, again within a single model; and multimodel ensembles (Meehl et al. 2007;von Trentini et al. 2019) that can be regarded as attempting to characterise the range of outcomes that could be anticipated on the basis of current scientific knowledge, as represented by the models considered.
In any such ensemble, the sampled uncertainties induce variation between ensemble members.
It is therefore of interest to characterise this variation: for example, by identifying sources of uncertainty to which the outputs are most sensitive and hence suggesting priorities for future model improvements; or by identifying representative subsets of members for use in applications where it is not feasible to process the entire ensemble.The latter situation can arise when climate projections are used as inputs to other models of climate change impacts: if the impacts models themselves are computationally expensive or if resources are otherwise limited, then this limits the number of ensemble members that can be used (Cannon 2015).
A substantial body of literature is devoted to quantifying the relative importance of different sources of variation within an ensemble.For example, Hawkins and Sutton (2009) applied a heuristic approach to projections from the World Climate Research Programme's Coupled Model Intercomparison Project phase 3 (CMIP3) multimodel ensemble, to partition the variation into components attributable respectively to GCMs, emissions scenarios and internal variability of the climate system.This approach was placed on a more formal footing using Analysis of Variance (ANOVA) techniques by Yip et al. (2011).A drawback of these authors' fixed-effects ANOVA methodology, however, is that the unambiguous partitioning of variation requires a balanced ensemble which, in this case, means that each GCM is run the same number of times with each emissions scenario.The CMIP3 ensemble does not meet this condition, so Yip et al. (2011) discarded some ensemble members and applied their methodology to the largest possible balanced subset.Of course, discarding ensemble members incurs a loss of information: this can be avoided using a random-effects ANOVA (Northrop and Chandler 2014); or by considering the problem as one of a balanced ensemble with some values missing, and estimating ('imputing') these missing values (Evin et al. 2019(Evin et al. , 2021)).The latter approaches both use Bayesian methods and require the use of Markov Chain Monte Carlo (MCMC) techniques that are computationally intensive.
This perhaps disincentivises their routine use in situations where the outputs of interest are highdimensional (i.e.involve many quantities of interest), for example when analysing maps involving large numbers of spatial locations.
In high-dimensional settings, a further potential concern with ANOVA approaches is that they are designed to analyse scalar-valued quantities.Most applications of ANOVA in climate science have therefore considered each quantity separately: for example, Christensen and Kjellström (2020) applied fixed-effects ANOVAs to a subset of the EuroCORDEX regional model ensemble (Jacob et al. 2014), separately for each spatial location within the EuroCORDEX study region, and mapped the results.However, a comprehensive analysis of variation in high-dimensional ensemble outputs requires methods that are specifically designed for a high-dimensional setting.This has rarely been attempted to date, a notable exception being Sain et al. (2011) who used functional ANOVA methods to analyse ensembles of maps within a Bayesian framework which, as noted above, can be computationally expensive.Other exceptions include applications of clustering methods to find representative subsets of ensemble members for use in impacts studies (e.g.Cannon 2015;Casajus et al. 2016): these methods aim to produce a simplified representation of the variation by clustering the ensemble members into groups that are relatively homogeneous and distinct.
Alternative simplified representations can be obtained via dimension reduction, for example by applying principal components analysis (PCA) or empirical orthogonal function (EOF) analysis across ensemble members (e.g.Li and Xie 2012;Yim et al. 2016;Wang et al. 2020).Such 'inter- the cited references demonstrate how these modes may be of interest in their own right, as well as how the associated scores (see below) can be used to identify contrasting models for further study.
A potential difficulty, however, is that ensembles often contain structure that EOF analyses are not designed to handle.In regional ensembles for example, structure is induced by the GCM:RCM combinations used to generate each member.Similarly, in the CMIP ensembles, direct EOF analyses will be dominated by the models with the most runs.To avoid this problem when analysing CMIP5 outputs, Yim et al. (2016) worked with the average of each model's runs, while Wang et al. (2020) analysed a single run from each model.However, no formal justification was provided for either approach.
Against this background, the main contribution of the present paper is to formalise the application of dimension reduction techniques to climate model ensembles, while also accounting appropriately for ensemble structure.The resulting approach, which we call ensemble principal pattern (EPP) analysis, reduces to inter-model EOF analysis when applied to unstructured ensembles containing a single member per model.It is intended primarily as a tool to enable the rapid exploration of ensembles, either to select contrasting or representative members for use in impacts studies or to highlight distinctive modes of variation that may merit further investigation.
To motivate our proposal and fix ideas subsequently, Section 2 presents some outputs from the EuroCORDEX regional ensemble over the United Kingdom, and discusses some questions that could be asked about these outputs.The basic inter-model EOF methodology is described in Section 3 as a reference.In Section 4 the methodology is extended, in conjunction with multivariate ANOVA (MANOVA) techniques, to handle ensembles with more complex structures: the extension is illustrated using the EuroCORDEX example.Section 5 concludes and suggests further potential applications.

Motivating example: the EuroCORDEX regional ensemble
We consider the bias in simulated summer (JJA) mean daily maximum surface temperature ('tasmax') across the United Kingdom, over the period 1989-2008, for 64 EuroCORDEX ensemble members (Jacob et al. 2014) that were produced using combinations of ten RCMs and ten GCM runs from the CMIP5 experiment (Taylor et al. 2012).The GCM runs, from six unique GCMs, were conditioned on CMIP5 historical forcings until 2005, and on the RCP8.5 emissions scenario 5 from 2006 to 2008: over this limited period the RCP8.5 forcings deviate very little from the the 2005 levels (Van Vuuren et al. 2011) and hence can be considered as plausible proxies for the historical values.
To calculate the biases in JJA tasmax, the 1989-2008 mean tasmax values were first calculated for each ensemble member on the native grid of the corresponding RCM.Next, these mean values were regridded to the 12 × 12km 2 grid used by the HadUK gridded observational dataset (Perry et al. 2009), using a conservative area-weighting scheme (Jones 1999): the 12km grid resolution is similar to that of each EuroCORDEX RCM.Finally the biases for each ensemble member, shown in Figure 1, were computed by subtracting the 1989-2008 mean HadUK tasmax values.
The columns of Figure 1 represent GCM runs, and the rows represent RCMs.To simplify the subsequent presentation, the ten GCM runs will be considered as representing separate models throughout: this is not required for the methodology, however.With this convention, each GCM:RCM pair contributes at most a single member to the ensemble.The ensemble is unbalanced however, with 36 of the 100 possible pairs missing.The missing pairs include four that are available but have been excluded deliberately: three because they are superseded in the ensemble by runs from later versions of the same RCMs driven by the same GCMs, and one because it contains inconsistent metadata regarding the driving GCM.
As noted in Section 1, lack of balance causes problems when attributing variation to its potential sources, whence analyses are sometimes restricted to the largest balanced subset of an ensemble.
Here, the largest such subset involves eight RCMs and four GCM runs, hence including just 32 of the 64 available ensemble members.It is clearly undesirable to restrict attention to such a small subset.
Apart from the 'missing' runs, the most obvious feature of Figure 1 is the sixth column, corresponding to the HadGEM2-ES GCM: all ensemble members driven by this GCM appear to have a warm bias.Closer inspection also suggests potential cool biases associated with the HIRHAM5, RACMO22E and RCA4 RCMs (fourth, sixth and seventh rows): however, it is hard to be confident about this based on a purely visual inspection.Moreover, it is hard to identify any systematic and detailed spatial structure from the maps.This creates difficulties for several potential users of the ensemble.Consider, for, example, a regional climate modeller who sees EuroCORDEX as an opportunity to explore the simulation of summer temperatures by different RCMs: apart from the apparent cool biases noted above, Figure 1 does not obviously provide useful information to support such a comparison.
Another potential user is the climate impacts modeller with limited computational resources.Such a user might produce the equivalent of Figure 1, showing future changes in some impactrelevant quantity under a specified emissions scenario; and they may wish to choose a small number of ensemble members spanning the range of potential impacts.This could be done by computing spatial averages of relevant quantities for each ensemble member, and then choosing members spanning the range of the resulting averages.However, spatial averages are not necessarily the most relevant quantities when considering impacts that may be localised in space (e.g.associated with urban areas): it would therefore be helpful to find an alternative visualisation that allows users to identify detailed spatial structures in an ensemble .
Our two notional EuroCORDEX 'users' have different interests and priorities.Nonetheless, both would benefit from a visualisation that simultaneously (i) is more compact than Figure 1 and (ii) allows them to identify potentially interesting modes of variation.This is the aim of our EPP methodology -although, as discussed later, its use is not restricted to regional ensembles.

EPP analysis: the simplest case
To develop the methodology, we start by considering an unstructured ensemble -for example, a multimodel ensemble with a single member per model, or a perturbed physics ensemble with one member per model variant.The ensemble outputs are considered as a collection of vectors {Y  :  = 1, . . ., } say, where  is the number of members and Y  = ( 1  2 • • •   ) ′ is a vector of length  containing a value for the th member at each of  spatial locations.Moreover, let Y be the  ×  'ensemble data matrix' with Y  as its th row; let Y =  −1  =1 Y  denote the  × 1 overall ensemble mean vector, with th element   =  −1  =1   ; and let 1 be an  × 1 vector of ones.In this case, as noted earlier, the proposed methodology is equivalent to an inter-model EOF analysis which, for present purposes, is most conveniently presented by considering the Singular Value Decomposition (SVD) of the centred data matrix Y − 1Y ′ = Y say, with Y  − Y as its th row.Since the approach is widely used already (see Section 1), we merely sketch it here to prepare for the extension to structured ensembles in Section 4. For more details of the underpinning mathematics, see Krzanowski (1988, Section 4.1) or Gentle (2007, Section 3.10).
When  < , the centred matrix Y typically has rank  − 1 so that  separate maps are needed to visualise the complete ensemble (the ensemble mean, plus  − 1 maps to visualise the centred matrix -or, equivalently, separate maps of each ensemble member).Dimension reduction aims to find an accurate approximation of Y of lower rank, say  <  − 1, so that the information can be visualised in just  + 1 maps including the ensemble mean.
The SVD of the centred data matrix takes the form where U and V are orthogonal matrices with dimensions  ×  and  ×  respectively, and  is a  ×  diagonal matrix with non-negative elements sorted in decreasing order.Denote these elements by  1 >  2 > . . .>   = 0 (  is zero because the rank of Ỹ is at most  − 1); then (1) can be rewritten as where u  and v  are vectors, of length  and , containing the th columns of U and V respectively.
Denoting the th element of u  by    , the th row of Y is thus This matrix is a reduced-rank approximation to Y : the corresponding approximation to the original As noted above, the convention in PCA is to define the principal components as the columns of V and to subsume the singular values {  } into the corresponding scores.In the current context however, it is arguably more interpretable to multiply the {  } by the {v  } instead: the resulting spatial patterns {  v  } have the same units of measurement as the original quantity of interest and hence can be interpreted directly as contributions to the overall variation of that quantity, while the scores    are dimensionless and hence can be interpreted as 'weights' attached to each pattern.
We therefore define the EPPs as the patterns {  v  }; and the {   :  = 1, . . ., ;  = 1, . . ., } are the corresponding EPP scores.An ensemble member with a positive (resp.negative) score for the th EPP will tend to have positive (resp.negative) deviations from the ensemble mean in regions where the corresponding pattern v  is positive, and vice versa.
A further detail is that U and V in (1) are not uniquely defined, because replacing U and V with −U and −V respectively does not change the result.We remove this ambiguity by specifying that positive EPP scores are associated overall with above-average values of the quantity of interest.
Operationally, this is achieved by multiplying each EPP by the sign of its average value: if this average is negative, the multiplication reverses the signs of the EPP and the corresponding scores.

EPPs for structured ensembles
As noted in Section 1, many ensembles contain structure that is not accounted for by the basic methodology described above.We now extend the methodology to address this, by combining multivariate Analysis of Variance (MANOVA) techniques with SVDs.For ease of exposition, we focus on regional ensembles involving  RCMs and  GCMs, in which each RCM:GCM combination has been run at most once.Extensions to more general structured ensembles are straightforward in principle: details can be found in the Supplement to the paper.The ensemble structure can be represented using a statistical model: where  =1 α  =  =1 β  = 0.In (3), µ represents an overall mean; α  and β  represent systematic departures from this overall mean for the th GCM and th RCM respectively; and ε  is a 'residual' representing variation that cannot be attributed to systematic contributions from either the GCM or the RCM (for example, arising from internal variability in the system), treated as though it is drawn independently for each member from a distribution with mean vector 0 and covariance matrix .
With the exception of the  ×  matrix , these quantities are all vectors of length .Specifically, (3) can be written as where, in addition to the vector µ defined already, 1 is now a vector containing  •• ones; X  is an  •• × ( − 1) matrix in which the th column contains 1s in the rows corresponding to runs in which the th GCM was used, −1s in the rows where the th GCM was used, and zeroes everywhere else; X  is an  •• × ( − 1) matrix in which the th column contains 1s in the rows corresponding to runs in which the th RCM was used, −1s in the rows where the th RCM was used, and zeroes everywhere else; α is the ( − 1) ×  matrix in which the rows are the coefficient vectors {α  :  = 1, . . .,  − 1}; β is the corresponding ( − 1) ×  matrix containing the coefficient vectors matrix in which the rows are the vectors {ε  }; and the matrix , is obtained by placing its component parts side by side.The accompanying python script (see "Data Availability" statement) demonstrates these constructions.
Under formulation (4), the least-squares coefficient estimates satisfy the equation (Rao 1973) each side of which is an ( +  − 1) ×  matrix.The cost of solving this system increases linearly in  so that the solution is feasible on modern computers, even for large spatial domains.
Computing α = − −1 =1 α and β = − −1 =1 β now yields a complete set of effect estimates for each GCM and RCM in the ensemble, satisfying the constraints  =1 α =  =1 β = 0.These estimates can be mapped individually or, to avoid having to inspect  +  maps, subjected to their own SVD decompositions by stacking them into matrices of dimension  ×  and  ×  respectively.These matrices are both centred by construction.The SVD decompositions yield two sets of EPPs, summarising the dominant patterns of variation among the GCMs and RCMs respectively.Moreover, SVDs of the 'residual' matrix Y − 1 μ′ − X  α − X  β (which is also centred) define spatial modes of variation that are not systematically related to either the GCMs or RCMs.We denote this residual matrix by e (2) : the superscript is for notational consistency with the Supplement.
A full EPP analysis thus consists of two steps: first, calculation of the GCM, RCM and residual effects based on (3); and second, application of SVD to each set of effects to obtain the corresponding spatial modes of variation.

b. Regional ensembles: partitioning of variation
The development above provides low-dimensional estimates of RCM, GCM and residual effects in a regional ensemble.To fully understand the ensemble structure however, it is also necessary to quantify the relative importance of these effects.If, for example, the RCMs contribute only a small proportion of the total variation, then the RCM EPPs themselves are arguably of limited interest.
We now address this issue.The approach can be regarded either as an extension of the ANOVA methodology of Yip et al. (2011), to handle unbalanced ensembles with multivariate outputs; or as a computationally cheap alternative to the functional ANOVA methodology of Sain et al. (2011).

1) Balanced ensembles
Initially, it is again helpful to consider a balanced ensemble with no missing runs.In this case, the output vector for the (, ) run can be written as Consider now the total sum of squares and cross-products (SSCP) which is a matrix of dimension  ×  that plays the same role as the total sum of squares in a standard univariate ANOVA (Krzanowski 1988, Section 13.3).To aid interpretation, note that T /( − 1) is the sample covariance matrix between all pairs of locations across the ensemble.Using (6), a little algebra shows that the total SSCP can be decomposed as where are SSCPs associated with the GCMs and RCMs respectively; and represents residual unstructured variation.The relative magnitudes of these three SSCPs can therefore be used to summarise the relative importance of the three sources of variation within the ensemble.Note that for the balanced case under consideration, T  and T  are formed respectively from the least-squares estimates of the GCM and RCM effects in model (3): T  =   =1 α α′  and T  =   =1 β β′  .Note also that under (3), an unbiased estimator of the residual covariance matrix  is Σ = T  /( •• −  −  + 1): the denominator here is the residual degrees of freedom after estimating an overall mean together with  − 1 independent RCM effects and  − 1 independent GCM effects.Moreover, in the bal- and the right-hand side of equation ( 8) can be rewritten as We will revisit this expression later, when considering unbalanced ensembles.
Unfortunately, if the number of locations  exceeds 1, there is no unique way to compare the magnitudes of T  , T  and T  in (8) (if  = 1, then each of these matrices is a single number and can be expressed as a proportion of the total variation T ).One option is to produce maps showing the diagonal elements of T , and the proportional contributions of T  , T  and T  to each of these diagonal elements: these are the "total variability partition" maps in, for example, Figure 5 of Christensen and Kjellström (2020).Importantly, the diagonal elements can all be calculated without having to compute the SSCPs themselves (see the Supplement for details), which is helpful because they are of dimension  ×  so that storage requirements can be excessive for large .
To supplement the maps just described, the diagonal elements of the matrices in (8) can be summed to obtain their traces.The trace operator is additive, so that trace(T ) = trace(T  ) + trace(T  ) + trace(T  ) and the total trace is partitioned unambiguously into single-number sum-maries representing contributions from each source of variation.A potential disadvantage is that the off-diagonal elements of the SSCP matrices do not influence the partitioning: we return to this point in Section 5.For the moment however, we highlight a useful alternative interpretation involving the centred data matrix Y .It is easy to verify that the total SSCP T can alternatively be written as T = Y ′ Y ; and that trace(T ) is the sum of the squared elements of Y , referred to as the 'total variation' in Section 3.This quantity is the squared Frobenius norm of Y (Gentle 2007, Section 3.9), a standard measure of the overall magnitude of a matrix.Similarly, trace(T  ) and trace(T  ) are the squared Frobenius norms of X  α and X  β respectively (the contributions of the GCMs and the RCMs to the centred data matrix, according to ( 4)); and trace(T  ) is that of the residual matrix e (2) .The trace-based partitioning of variation therefore provides an interpretable decomposition focused on the data matrix itself, rather than the SSCPs.
Finally, we note that the traces of the components of ( 8) are related to the SVDs of the components of ( 4  2) .The proposed methodology therefore provides a hierarchical partitioning of the ensemble variation: the squared singular values of the EPPs for each component of (4) sum to the variation explained by that component and, in turn, these componentwise contributions sum to the total variation trace(T ).

2) Unbalanced ensembles
If some (, ) combinations are missing from a regional ensemble then equation ( 8) no longer holds and, as discussed in Section 1, the ensemble variation cannot be partitioned unambiguously.
In this case, instead of discarding or imputing members to obtain a balanced ensemble, one alternative is to determine the range of variation (RoV) that is potentially attributable to each source.This is done by performing two separate analyses, each based on a sequence of statistical models as described in the Supplement.In the first analysis, the maximum possible variation is attributed to the GCMs, with the RCM information used only to account for variation that cannot otherwise be explained.In the second analysis, the roles of GCMs and RCMs are reversed.The difference between the results indicates the extent to which the partitioning is affected by the lack of balance: in the balanced case, both analyses recover the unique decomposition ( 8).
An alternative to the RoV approach is to provide a partitioning of variation that maintains a hierarchical relationship with the RCM, GCM and residual EPPs, as found in a balanced ensemble.
This can be done via the alternative representation ( 9) of ( 8), because in the unbalanced case we still have estimates of the {α  }, the {β  } and  as described earlier.We therefore denote the three terms in (9) as T †  , T †  and T †  respectively, and define the estimated SSCP for a complete ensemble as The traces of the components T †  , T †  and T †  can now be used to estimate the relative contributions of each source of variation in a complete ensemble; and the corresponding EPPs provide a hierarchical partitioning of (10).
A potential objection to this approach is that it no longer provides an exact decomposition of variation for the observed ensemble.We therefore propose to use it in conjunction with the RoVs, as illustrated in the example below.Importantly, the RoVs are not guaranteed to contain the relative proportions derived from (10): they provide information on potential partitionings of variation in the observed ensemble, whereas (10) aims to account additionally for variation associated with unobserved ensemble members.If the observed members are unrepresentative in some sensefor example, if GCMs with above-average responses tend to be paired with RCMs that have belowaverage responses -then this lack of representativeness will feed into the RoVs.If the results from (10) lie outside these ranges therefore, this suggests that the characteristics of the available and missing ensemble runs may differ.
The use of an estimated partitioning of variation can be regarded as a form of imputation (see Section 1).By contrast with other imputation schemes however, it does not require estimation of the missing ensemble members: rather, it reweights the contributions to each SSCP to account for undersampled parts of the complete ensemble.This is similar in spirit to the well-established approach, in survey sampling, of reweighting to handle situations in which subgroups of a population are over-or under-represented (e.g.Little and Rubin 2020, Chapter 3).

c. Regional ensembles: an unbalanced example
The EPP methodology is now applied to the EuroCORDEX UK temperature biases of Section 2.
The data matrix has  •• 64 rows corresponding to the ensemble members, and  = 1652 columns corresponding to grid cells.
Figure 2 shows the estimated partitioning of variation for a complete ensemble based on equation (10).Focusing first on the estimated ensemble standard deviations in Figure 2(a), derived from T † , the most obvious features are two local areas of substantially enhanced variation.One of these is in the south-east, corresponding to the Greater London area; the other is in the English Midlands which is also a major conurbation.Other areas of enhanced variation are less pronounced, and include non-urban regions (e.g. the Scottish Highlands) as well as upland and industrial areas of northern England.This suggests that as far as summer maximum temperature biases are concerned, the ensemble indicates some uncertainty associated with large urban heat islands and also, to a lesser extent, with topography.10) to probe the sources of variation in the ensemble.The RCMs account for the highest percentage (53%) of the estimated total variation, although the GCMs also account for 38%: unstructured residual variation is relatively unimportant.These figures can be compared with the RoVs derived from the available ensemble members: these analyses reveal that the RCMs contribute between 35% and 62% of the total variation while the GCMs contribute between 29% and 56%.The GCM and RCM estimates from (10) both fall well within the respective ranges from the Type I analyses, whence there is no evidence that the available ensemble members are unrepresentative with respect to biases in summer tasmax.A closer inspection of the maps also reveals that in the two urban areas where the ensemble variation is highest, the RCMs account for up to around 70% of it: this perhaps reinforces recent evidence (Lo et al. 2020) that uncertainties in urban heat island effects are primarily attributable to the RCMs.
To understand the variation in more detail, Figures 3 and 4 show the GCM and RCM EPPs respectively.In both cases, panel (a) is the estimate of µ in (3).The first GCM EPP accounts for 92% of the GCM-attributed variation and shows a gentle north-south gradient; the second EPP is much less important.By contrast, the first RCM EPP -accounting for 87% of the RCM-related variation -clearly picks out the pattern corresponding to the effects of urban heat islands and topography.This demonstrates that the uncertainty regarding these effects comes predominantly from the RCMs, and in fact that it is the dominant pattern of RCM-related variation in the ensemble.
Moreover, the EPP scores associated with this pattern enable us to identify the RCMs with the smallest and largest such effects, which are RACMO22E and REMO2015 respectively.For users with a particular interest in UK summer maximum temperatures in urban heat islands, this analysis therefore helps to identify the EuroCORDEX runs spanning the range of relevant historical biases.

a. Summary of the proposed methodology
EPPs are designed to enable rapid exploration of structured climate model ensembles, particularly when the outputs of interest are high-dimensional.They are descriptive rather than inferential: in particular, we do not attempt to quantify uncertainties about sources of variation or to assess their 'statistical significance', which require more time-consuming methods as reviewed in the Introduction.Nor do they aim to estimate the underlying properties of a model or system: this  ensemble: for balanced ensembles the decomposition is exact, while for unbalanced ensembles it relies on minimal assumptions corresponding to representations such as equation (3).
For the descriptive analysis of regional ensembles, (3) itself is relatively uncontentious: it makes no distributional assumptions, although one potential restriction is the structure in which the effect of a given RCM is the same regardless of which GCM is driving it.This assumption can be relaxed if the ensemble contains multiple runs of each GCM:RCM combination: the Supplement provides details.The model provides interpretable summaries of the GCM and RCM effects via the least-squares coefficient estimates, and provides a framework for reweighting contributions to the SSCPs in unbalanced ensembles via equations ( 9) and ( 10).This reweighting does not require the estimation of missing ensemble members; nor does it require that members are discarded to obtain a balanced subset.An open question, however, is to assess the uncertainty associated with the use of T † , T †  , T †  and T †  in place of T , T  , T  and T  .A related question is considered by Christensen and Kjellström (2022), who use heuristic arguments to examine the implications of missing ensemble members for estimation of the coefficient vectors ( α  and {β  } in the current context).It is not obvious, however, that these arguments can be extended to examine the effect on the partitioning of variation.
Although EPP analysis is designed primarily for descriptive purposes, it is natural to ask how it is affected by sources of variation that are not considered explicitly -for example, equation (3) contains no direct representation of internal variability, because the EuroCORDEX ensemble does not provide multiple runs of each GCM:RCM pair with varying initial conditions.In such cases the associated unmodelled or unrepresented variation is subsumed into the residual term T  in (8), or its estimate T †  in (10).A consequence is that if, as in Figure 2, the residuals account for only a small proportion of the total variation, then the unmodelled sources must themselves contribute relatively little.

b. How many EPPs?
In Figure 3 , most variation in the GCM effects is dominated by the first two EPPs which represent respectively a spatial monopole and dipole.A reviewer has pointed out that this situation is common, and queried whether higher-order EPPs may reveal more nuances of structure.We The higher-order EPPs are relatively unimportant (for example, GCM EPPs 3 and 4 contribute respectively 0.9% and 0.3% of the GCM-attributed variation) and, moreover, exhibit no interpretable spatial patterns.These results are typical for GCM EPPs in our experience -although the monopole / dipole pattern is not so typical for RCM EPPs as exemplified by Figure 4.At some level, a lack of 'interesting' higher-order structure is itself noteworthy: for example, it suggests that identification of representative ensemble members can be done using just the first two EPP scores for each source of variation.Further investigation is needed, however, to determine whether these findings can be replicated across different ensembles, regions, time periods and quantities of interest.

c. Alternative measures of variation
To summarise the overall magnitudes of the SSCP matrices involved in the partitioning of total variation, the development above uses their traces.Although this approach is appealing for its relationship with the Frobenius norm of the centred data matrix, it neglects the off-diagonal elements of the SSCPs.These elements are related to the correlations between neighbouring locations and thus, in principle, contain information about the spatial extent of 'typical' differences between ensemble members.Alternative summaries of the SSCPs have been considered in the MANOVA literature (e.g.Huberty and Olejnik 2006, Chapter 3), albeit justified by the assumption (not made in the development above) that the residual vectors (the {ε  } in (3)) have multivariate normal (Gaussian) distributions.One such alternative derives from fact that in the Gaussian case the least-squares MANOVA coefficient estimates are also the maximum likelihood estimates (Krzanowski 1988, Section 15.2); hence fitted models can be summarised in terms of their maximised log-likelihoods.In particular, the scaled deviance for a model is defined as twice the difference between its maximised log-likelihood and the highest log-likelihood attainable (i.e. the log-likelihood from a model that fits the ensemble outputs perfectly).The scaled deviance is a measure of 'lack of fit': for example, in linear regression models it is proportional to the residual sum of squares (Davison 2003, Section 10.2).
For a two-way additive MANOVA such as equation ( 3), it can be shown (see the Supplement) that in a balanced ensemble the scaled deviance for Model 0 can be partitioned into contributions from the GCMs, RCMs and residuals in exactly the same way as the total trace, and that the proportions 21 temperature biases individually for each grid square would have removed the interesting excess variation associated with urban heat islands and topography in the first panel of Figure 2.
The potential applications of EPP analysis extend beyond the regional ensemble example considered above.Consider, for example, a CMIP ensemble with different numbers of runs for each model (see references in Section 1).The EPP analysis of such an ensemble starts from the representation where Y  now denotes the output from the th run of the th CMIP GCM ( = 1, . . ., ); µ represents an overall mean; α  represents a systematic departure from this mean for the th GCM; and ε  represents residual variation.( 11) is the direct analogue of (3): the least-squares estimates of µ and α  are now μ =  −1  =1 Y  and α = Y  − μ respectively, where Y  is the mean of the ensemble members from the th GCM.An EPP analysis thus focuses on the SVDs of the α and, if relevant, the residuals Y  − μ − α .This is essentially the inter-model EOF approach taken, without formal justification, by Yim et al. (2016) (see Section 1).Moving beyond regional and single-scenario CMIP ensembles, the Supplement describes extensions to ensembles with more complex structures.For example, the approach could be used to explore the spatial structure of GCM-and scenario-specific variation in the CMIP ensembles which, as noted above, are often highly unbalanced: a simple application in this setting would use the GCM EPPs to characterise (dis)similarities between models in terms of their spatial patterns of projected future change.A more sophisticated analysis might focus on scenario effects: in such an analysis it may be reasonable to expect that the first scenario EPP will correspond to an overall pattern of change, and for the corresponding scenario-specific EPP scores to be related to some measure of net radiative forcing.Any departures from this expected pattern could yield interesting insights into the dynamics of the models.Other potential applications are to ensembles in which a single model is used to obtain projections for each of a set of emissions scenarios, starting from each of a common collection of carefully chosen initial conditions: here, an EPP analysis of the residual / interaction term in an additive representation of scenario and initial condition effects could potentially reveal information about the state-dependence of climate change signals.
EPPs can also be used to identify gaps in an existing structured ensemble, and hence to identify design priorities for additional runs.For example, in a regional ensemble each member can be summarised using the first EPP scores for the corresponding RCM and driving GCM respectively: the ensemble structure can then be visualised as a scatterplot of the corresponding pairs of scores.Such a plot will reveal combinations of scores -and hence of characteristic modes of behaviour -that are not well represented and hence could be prioritised in subsequent ensemble updates.
Finally, we note that EPP analysis has potential applications beyond climate model ensembles.
One such application is to gridded data products providing estimates of quantities that are not observed directly at the locations of interest: in such settings, one way to characterise the estimation uncertainty in the data product is to provide multiple samples from its joint uncertainty distribution.
The uptake of such such techniques by data product providers is currently low, although they are likely to become more widely available in the future (Chandler et al. 2012).EPPs provide one possible route for data product users to choose informed and representative subsets of samples, enabling uncertainty to be propagated through their subsequent analyses.
model EOF' analyses seek to identify dominant modes of spatial variation across the ensemble: 4 Accepted for publication in Journal of Climate.DOI 10.1175/JCLI-D-23-0089.1.

Fig. 1 .
Fig. 1.Bias in mean simulated surface temperature ( • C) over the UK, for the period 1989-2008, from 64 members of the EuroCORDEX regional ensemble (see main text for details).Rows and columns correspond to RCMs and GCMs respectively.
These vectors are each of length  and, in the current context, represent spatial patterns: moreover, the patterns are uncorrelated because V is orthogonal.In fact, these patterns are the principal components of Y and the weights {     :  = 1, . . ., ;  = 1, . . ., } are the corresponding principal component scores.Now consider truncating the right-hand side of equation (2) to retain just  <  terms.The result is an  ×  matrix, Y * say, of rank  and with th row  1 a. Regional ensembles: theory We denote the ensemble data matrix by Y as before.Now however, the individual ensemble members are denoted by {Y  }, with  and  indexing RCMs and GCMs respectively.The total 10 Accepted for publication in Journal of Climate.DOI 10.1175/JCLI-D-23-0089.1.number of runs is denoted by  •• so that the dimension of Y is  •• × ; and  • and  • denote the numbers of runs involving RCM  and GCM , respectively.
Equation (3) defines an additive MANOVA model, which is the multivariate analogue of the two-way ANOVA model used in several references from Section 1. Least-squares estimates of the coefficient vectors {α  } and {β  } can be obtained from the data matrix: the estimation is particularly simple for a balanced ensemble with a run for every RCM:GCM combination so that  • = ,  • =  and  •• = .Let Y • =  −1 •  =1 Y  and Y • =  −1 •  =1 Y  denote the mean vectors over all available runs for the th RCM and th GCM respectively; and let Y •• =  −denote the overall ensemble mean.Then, following standard derivations e.g.Davison (2003, Section 9.2.2), in the balanced case the least-squares estimates of µ, α  and β  are respectively μ = Y •• , α = Y • − Y •• and β = Y • − Y •• .These are natural summary measures: for example, the estimated mean field for GCM  is μ+ α = Y •• + Y • − Y •• = Y • , witha corresponding expression for RCM .However, the appropriate analogues of these quantities are less obvious in unbalanced ensembles with some missing RCM:GCM combinations.In this case, model (3) provides a natural generalisation as we now elaborate.Modern treatments of unbalanced (M)ANOVA models (e.g.Faraway 2014, Section 13.2) exploit their alternative representation as linear models with dummy covariates defining group membership.11 Accepted for publication in Journal of Climate.DOI 10.1175/JCLI-D-23-0089.1.

13
Accepted for publication in Journal of Climate.DOI 10.1175/JCLI-D-23-0089.1.Unauthenticated | Downloaded 11/27/23 03:00 PM UTC ): if the singular values of Y = Y − 1 μ′ are  1 ≥ ... ≥   •• = 0 as in Section 3, then trace(T ) = , with corresponding expressions for the remaining traces.The singular values of X  α are just √  times those for the GCM EPPs, while those of X  β are √  times those for the RCM EPPs and trace(T  ) is related to the SVD of the residual matrix e (

Fig. 2 .
Fig. 2. Estimated decomposition of variation in the completed EuroCORDEX ensemble, for bias in summer mean daily maximum surface temperature over the UK from 1989 to 2008.Panel (a) shows the ensemble standard deviations at each grid cell (i.e. the square roots of the diagonal elements of T † /( •• − 1), with T † as in (10)), while panels (b)-(d) show the diagonal elements of T †  , T †  and T †  respectively, as percentages of the corresponding elements of T † .The traces of the respective matrices are also quoted, as percentages of trace(T † ).

Figures 2
Figures 2(b) to 2(d) use equation (10) to probe the sources of variation in the ensemble.The

Fig. 3 .
Fig. 3. EPP analysis for the estimated GCM effects in the 64-member EuroCORDEX ensemble, for bias in summer mean daily maximum surface temperature from 1989 to 2008.Panel (a) represents the ensemble mean; panels (b) and (c) are maps of the first and second GCM EPPs respectively; and panel (d) shows the associated EPP scores for each GCM in the ensemble.Subtitles for panels (b) and (c) indicate the percentage contributions of each EPP to the GCM-attributed variation.

Fig. 4 .
Fig. 4. EPP analysis for the estimated RCM effects in the 64-member EuroCORDEX ensemble, for bias in summer mean daily maximum surface temperature from 1989 to 2008.The panels are directly analogous to those in Figure 3.
have investigated this (details are in the accompanying code -see Data Availability Statement).20 Accepted for publication in Journal of Climate.DOI 10.1175/JCLI-D-23-0089.1.
approximation of Y ; and the th spatial pattern v  accounts for a proportion  2 say.In fact, denoting by  *  the (, )th element of Y * , the sum of squared approximation errors  =1  =1   − *  2 is smaller than under any other rank 9 Accepted for publication in Journal of Climate.DOI 10.1175/JCLI-D-23-0089.1.