## Abstract

It is of fundamental importance to evaluate the ability of climate models to capture the large-scale atmospheric circulation patterns and, in the context of a rapidly increasing greenhouse forcing, the robustness of the changes simulated in these patterns over time. Here we approach this problem from an innovative point of view based on dynamical systems theory. We characterize the atmospheric circulation over the North Atlantic in the CMIP5 historical simulations (1851–2000) in terms of two instantaneous metrics: local dimension of the attractor and stability of phase-space trajectories. We then use these metrics to compare the models to the Twentieth Century Reanalysis version 2c (20CRv2c) over the same historical period. The comparison suggests that (i) most models capture to some degree the median attractor properties, and models with finer grids generally perform better; (ii) in most models the extremes in the dynamical systems metrics match large-scale patterns similar to those found in the reanalysis; (iii) changes in the attractor properties observed for the ensemble-mean 20CRv2c are artifacts resulting from inhomogeneities in the standard deviation of the ensemble over time; and (iv) the long-term trends in local dimension observed among the 56 members of the 20CR ensemble have the same sign as those observed in the CMIP5 multimodel mean, although the multimodel trend is much weaker.

## 1. Introduction

One of the main sources of uncertainty in determining the impact of climate change on extreme events is the forced response of atmospheric dynamics (Shepherd 2014; IPCC 2012). While changes in observables such as surface temperature are easily diagnosed, shifts in the midlatitude atmospheric patterns have proved very difficult to quantify. Some advances have been made by focusing on specific features such as atmospheric blocking (Kay et al. 2015; Cassou and Cattiaux 2016; Faranda et al. 2016b), which in turn influence the occurrence of European cold spells and heat waves, but the broader appreciation of circulation changes is still unsatisfactory. Here we shed some light on this knowledge gap by using a dynamical systems framework. We illustrate the power of such an approach by considering the well-known Lorenz (1963) system, a conceptual model of atmospheric convection consisting of three differential equations:^{1}

where *x*, *y*, and *z* represent, respectively, the convection strength, the temperature difference between the surface and the top of the troposphere, and the asymmetry of the convection cells. The parameters *σ* and *r* are the Prandtl and the Rayleigh numbers, respectively, and *b* is a ratio of critical parameters. A trajectory of the Lorenz (1963) attractor is shown in blue in Fig. 1. Figure 1 consists of 2000 points obtained by iterating the Lorenz equations with *dt* ≃ 0.035, *σ* = 28, *r* = 10, and *b* = 8/3 with a Runge–Kutta scheme of order 4. These are the classical parameters used by Lorenz (1963).

To study the effects of an external forcing, we now increase *σ* by about 2% with respect to the classical value, such that *σ* = 28.5 (violet trajectory in Fig. 1). This new trajectory on average favors higher values of *z*, but the changes relative to the original trajectory depend on the point being considered; some points are not displaced, while some others are mapped elsewhere. Assuming no knowledge of the system other than the trajectories’ paths, is there a way to determine that they belong to Lorenz attractors with different forcings? To answer this question we would need (i) to measure the dynamical properties of an ensemble of trajectories representing the two configurations and (ii) to estimate the distance between these trajectories and determine if the shift has changed the properties of the points in a detectable way. As a further complicating factor, we note that changes in natural systems are often gradual, and a sudden jump from one forcing to a different one is unlikely. A more realistic example discussing a Lorenz attractor with a smoothly varying forcing is presented in section 2.

The atmospheric equivalent of a point on the Lorenz attractor is the ensemble of instantaneous fields describing the atmosphere at a time *t*. Here we study the atmospheric circulation over the North Atlantic and focus on a single field: the sea level pressure (SLP). The SLP field reflects the major modes of variability affecting the North Atlantic (Hurrell 1995; Moore et al. 2013) and can further be used to diagnose a wealth of other atmospheric features, ranging from teleconnection patterns to storm-track activity to atmospheric blocking (e.g., Rogers 1997; Comas-Bru and McDermott 2014). The trajectories of our dynamical systems are the successions of daily SLP fields from 26 CMIP5 models (Taylor et al. 2012) and the Twentieth Century Reanalysis version 2c (20CRv2c) (Compo et al. 2011) over the period 1851–2000. The choice of the North Atlantic domain is motivated by the better observational coverage over the region in the first part of the analysis period compared to other regions of the globe (Krueger et al. 2013). To measure changes in the systems, one must be able to specify at each point (each day) the local (in phase space; daily in time) dynamical properties and track their evolution. Recent contributions to dynamical systems analysis have proven that local properties of the trajectories are characterized by two quantities: the local dimension and the stability of the field considered (Lucarini et al. 2016; Faranda et al. 2017b). They correspond respectively to the rarity and the typical persistence of the configuration. Faranda et al. (2017b) and Messori et al. (2017) have also shown that these two metrics can be connected to the predictability of a given atmospheric state and that their extremes match climate extremes.

In this work we will first assess whether the models and reanalysis present similar attractor properties over the full time period considered. To do this, we compute daily values for the dimension and stability of the SLP fields and study their average and extreme properties. Next, we quantify their changes over the analysis period. We then compare the changes seen in the models to those observed in the reanalysis.

## 2. Data and methods

We use daily SLP model output from the historical simulations of 26 CMIP5 models (Table 1) available from the CMIP5 archive (Taylor et al. 2012). We then compare these simulations to the 20CRv2c dataset (Compo et al. 2011), studying both the ensemble mean and the 56 individual members. The analysis focuses on the region 22.5°–70°N, 80°W–50°E over the period 1851–2000. SLP anomalies are defined as deviations from a daily climatology. For example, the SLP anomaly at a given geographical point on 5 December 2000 is computed relative to the mean SLP value at that same location for all 5 Decembers over the analysis period.

To compute the dynamical systems metrics, we combine the statistical tools of extreme value theory with the results obtained by Freitas et al. (2010) for Poincaré recurrences. The parameters mentioned in the introduction (local dimension *d* and stability 1/*θ*) are computed for the points *ζ* on the attractor obtained as a sequence of states of the system. The dynamical indicators are linked to the probability *p* that a trajectory *x*(*t*) explores a ball centered on *ζ* with diameter 2*ε* (i.e., the recurrence rate of the configuration *ζ*). We briefly outline the physical meaning of these quantities and the way they are computed below.

### a. Local dimension

The Freitas et al. (2010) theorem and its modification in Lucarini et al. (2012) states that the probability *p* of entering a ball centered on *ζ* with a radius *ε* for chaotic attractors obeys a generalized Pareto distribution (Pickands 1975). To compute such probability, we first calculate the series of distances *δ*[*x*(*t*), *ζ*] between the point on the attractor *ζ* and all other points *x*(*t*) on the trajectory. We then put a logarithmic weight on the time series of the distance:

The reason for taking the logarithm is explained by Collet and Eckmann (2009): in the dynamical system setup, the negative logarithm increases the discrimination of small values of *δ*(*x*, *y*), which correspond to large values of *g*[*x*(*t*)]. The probability of entering a ball of radius *ε* centered on *ζ* is now translated into the probability *p* of exceeding a threshold *q*. In the limit of an infinitely long trajectory, such probability is the exponential member of the generalized Pareto distribution:

whose parameters *μ* and *β* depend on the point *ζ* chosen on the attractor. Remarkably, *β*(*ζ*) = 1/*d*(*ζ*), where *d*(*ζ*) is the local dimension around the point *ζ*. This result has been applied to SLP fields in Faranda et al. (2017b) and also to surface temperature and precipitations fields in Faranda et al. (2017a). In this paper we use the 97.5th quantile of the series *g*[*x*(*t*)] to determine *q*. We have further checked the stability of the results against reasonable changes in the quantile.

### b. Local stability

The probability *p* contains information about the geometry of the ball around *ζ* but provides no insight on the temporal evolution of the dynamics around *ζ*. In particular, it is interesting to know the mean residence time of the trajectory within the neighborhood of *ζ*. To measure this quantity, we employ the extremal index *θ* (Freitas et al. 2012; Faranda et al. 2016a), namely, the inverse of this mean residence time. Heuristically, if the trajectory enters the neighborhood of *ζ* for *N* times and at each time *i* the length of the cluster (i.e., the number of successive time steps when the trajectory is within the neighborhood) is *τ*_{i}, a simple estimate is , such that *θ* varies between 0 and 1. The value *θ* = 0 corresponds to a stable fixed point of the dynamics where the observation *ζ* is repeated infinite times (as for a pendulum left in its equilibrium position). A value of *θ* = 1 indicates a point immediately leaving the neighborhood of *ζ*. Since *θ* is the inverse of a persistence time, it depends on the *dt* used. If *dt* is too large, the time dependence structure is hidden. The system will rarely persist more than one time step within the neighborhood of *ζ*, and *θ* will therefore be close to 1 at all points because of the cluster undersampling. In Faranda et al. (2017b) it has been observed that *θ* for SLP fields over the North Atlantic varies between 0.3 and 0.5, when *dt* = 1 day. In this work we use the same *dt*. The extremal index is estimated using the likelihood estimator developed by Süveges (2007).

Figure 2 illustrates the meaning of the indicators for atmospheric flows: the local dimension *d* is the number of degrees of freedom needed to describe the dynamics of the system linearized around the state *ζ* and is therefore proportional to the number of degrees of freedom of *ζ*. In the pictogram, *d* = 2 since we only consider two possible precursors to and evolutions of the state *ζ*. The inverse of the persistence time *θ* is linked to the probability that the trajectory follows a path where each field resembles those of the previous and subsequent days. We present two possibilities: 1) along the green trajectory, the pattern changes every day so that *θ*(*t*) = 1 and 2) along the red trajectory, the same pattern is observed for three consecutive days so that *θ* = 1/3.

To evaluate how close the *d* and *θ* of the different models are to those of the reanalysis, we adopt a number of distance metrics. The simplest metrics that can be defined are *R*(*d*) = Δ(*d*)/max[Δ(*d*)] and *R*(*θ*) = Δ(*θ*)/max[Δ(*θ*)], where Δ represents the difference between the median values of *d* or *θ* of each model and the 20CRv2c values. We further compute a global score:

To check the validity of the global score, we compare *R*_{tot} with the Wasserstein distance , computed as described in Robin et al. (2017).

The Wasserstein distance measures how two multivariate probability density functions *μ* and *η* differ from each other. Probability distributions are like distributions of mass (normalized to 1) across space. The Wasserstein distance is proportional to the minimal work that is needed to transport one distribution of mass into another. It is therefore rooted in optimal transport theory (Villani 2008; Santambrogio 2015). In practice, the two measures *μ* and *η* are discretized into multivariate histograms, which are sums of Dirac masses:

The distribution *μ* is transformed into *η* by the transport plan *γ* such that *γ*_{ij} transports the mass *μ*_{i} at *x*_{i} to *η*_{j} at *x*_{j}. Therefore, the *γ*_{ij} ≥ 0 verify

If Γ is the set of all possible transport plans of *μ* to *η*, then the Wasserstein distance is

where ∥⋅∥ is the usual Euclidean distance.

Before beginning the analysis of the model and reanalysis data, it is necessary to determine if the dynamical properties are sensitive to the changes of the attractor due to continuous modifications of the underlying forcing and if these changes are statistically detectable. There are few theoretical results on nonstationary statistics of dynamical systems, as well as on nonstationary extreme value theory. Luckily, the recurrence approach we use here to estimate *d* and *θ* allows us to bypass most of the technical difficulties linked to nonstationarity because the dynamical properties are measured with respect to each individual state *ζ* of the attractor. If the change affects the neighborhood of a state, it will change its dynamical properties. If most of the states are affected by the changes in the dynamics, then the average dimension of the attractor and the average persistence will change accordingly.

To test this idea, we consider the Lorenz (1963) systems discussed in the introduction and perform a set of 30 realizations (trajectories), where *σ* for each realization varies continuously over 28 < *σ* < 28.54 according to *σ*(*t*) = *σ*(*t* − *dt*) + *δσ*, with *δσ* = 10^{−5}. Each realization consists of 54 000 iterations with time step *dt* = 0.02, starting from random initial conditions in the basin of attraction. The number of time steps and the *dt* are chosen to mimic the persistence properties of the SLP field over the North Atlantic, which displays a median *θ* of around 0.5 (see section 3). If our methodology can indeed detect the gradual change from *σ* = 28 to 28.54, then the *d* and *θ* distributions for the first and second half of the simulations should be significantly different. To provide a visually immediate picture, we show distributions of the medians of (*d*, *θ*) for each half of each simulation (Fig. 3). It is straightforward to verify that the data form two clouds of well-separated median centroids. This analysis therefore confirms that the indicators are sensitive to small changes in the attractor properties, and that we can attempt to use them to detect long-term changes in the dynamical properties of reanalysis and CMIP5 data.

## 3. Aggregate analysis of model and reanalysis attractors

We begin the analysis of the daily SLP fields from 1851 to 2000 by presenting the scatterplot of *d* versus *θ* for the ensemble mean of the 20CRv2c reanalysis (Fig. 4). Hereafter we will call this run 20CR-EM. With respect to the computations done for the Lorenz (1963) system, *ζ* is now a daily SLP map and distances are computed using the Euclidean metric at each grid point as in Faranda et al. (2017b). The average of *d* is proportional to the number of degrees of freedom needed to represent the systems’ dynamics (this quantity is called attractor dimension in dynamical systems theory), while the average of *θ* is the inverse of the mean persistence time of a given SLP configuration. Maxima (minima) of *d* therefore correspond to the most complex (simple) SLP configurations. Maxima (minima) of *θ* correspond to the most unsteady (stable) configurations (Messori et al. 2017; Faranda et al. 2017b). Figures 4a–d show the composite SLP anomalies for dynamical extremes—namely, days beyond the 2.5th and 97.5th quantiles of the *d* and *θ* distributions. These closely—albeit not exactly—resemble the canonical North Atlantic weather regimes (Vautard 1990). In particular, maxima of *θ* (Fig. 4a) reproduce an Atlantic ridge pattern, while minima of *θ* (Fig. 4b) correspond to a negative North Atlantic Oscillation (NAO) phase. Similarly, maxima of *d* (Fig. 4c) correspond to a blocking pattern and minima of *d* (Fig. 4d) correspond to a positive NAO. This is in agreement with previous results from Faranda et al. (2017b). The patterns are stable for definitions of dynamical extremes up to the 20th and 80th percentiles of the relevant distributions, although the magnitude of the composite anomalies reduces when including days corresponding to lower percentiles (not shown). We note that the values of *θ* should not be compared directly to the persistence of the traditional weather regimes defined using clustering algorithms, as the requirement that the flow does not leave the neighborhood of the state *ζ* is a more restrictive condition than continued permanence within a given cluster. For example, intense or frequent mobile synoptic systems leading to substantial day-to-day fluctuations in sea level pressure could cause the dynamics to leave the ball centered on the state *ζ* in phase space while remaining within a weather regime cluster. Indeed, if one considers the typical partition of the North Atlantic atmospheric variability into four weather regimes, the probability of being in one of them is of order 0.25, whereas the probability of being close to *ζ* is set by the threshold *q*, in our case 0.025 (see section 2).

The 20CR-EM data analyzed above are constructed by averaging instantaneous fields from a 56-member ensemble of simulations. The ensemble is less constrained at the beginning of the period, when surface observations were scarce, than at the end. This implies that the 20CR-EM fields are smoother at the beginning of the period than at the end, because the latter are obtained averaging over an ensemble with smaller differences between individual members. This may affect the effective number of degrees of freedom as measured by *d*. For this reason, we also measure *d* and *θ* for the 56 individual members and then average them to obtain a single daily value. We will refer to this quantity as the 20CRv2c mean of the ensemble (20CR-ME). Schematically,

20CR-EM indicates the daily dynamical properties computed for the 20CRv2c ensemble mean, and

20CR-ME indicates the average of daily dynamical properties computed for each single ensemble member.

We can now compare the distributions of (*d*, *θ*) for 20CR-EM to (*d*, *θ*) for 20CR-ME. The scatterplot of *d* versus *θ* for the 20CR-ME is shown in Fig. 5. A number of differences relative to the 20CR-EM appear (cf. Fig. 4). First, the median value of *d* is lower for 20CR-EM than for 20CR-ME (Table 2), which indicates that the averaging of SLP fields in the ensemble mean has suppressed some degrees of freedom. Similarly, the ensemble mean has higher persistence (lower *θ*) because smoother fields tend to have slower variations. Although the numerical values of *d* and *θ* differ, the cross-correlation coefficients between the time series for the two datasets are 0.93 and 0.97, respectively. This suggests that features such as the seasonality and interannual variability of the *d* and *θ* time series are preserved with the EM averaging. We next look at the SLP anomaly fields corresponding to the dynamical extremes of the 20CR-ME (Figs. 5a–d). We find similar patterns to those observed for 20CR-EM (Figs. 4a–d). Indeed, 70% of *d* maxima from 20CR-EM match those from 20CR-ME, while this percentage is 61% for the minima. For *θ* maxima and minima we find values of 81% and 55%, respectively. We further investigate the differences between 20CR-EM and 20CR-ME by looking at the changes of the dynamical properties over time in section 4.

We next compare the (*d*, *θ*) bivariate histograms obtained for the 20CR-EM (Fig. 6a) with those computed for the CMIP5 models. The (*d*, *θ*) histograms and the composite SLP anomalies corresponding to the *d* and *θ* extremes for all CMIP5 models analyzed here are shown in Figs. S1–S26 of the supplemental material. For conciseness, we limit our discussion to the models shown in Figs. 6c and 6e. Two different behaviors emerge: some of the models (e.g., CMCC-CMS; Fig. 6c) yield a unimodal distribution resembling that obtained for the 20CR-EM; other models (e.g., the IPSL-CM5A; Fig. 6e) show bimodal distributions. We find these different behaviors to be related to different seasonal cycles. In Figs. 6b, 6d, and 6f, we plot the (*d*, *θ*) scatterplots for the same models by coloring each point according to the month of the year it occurs in. In the 20CR-EM and the CMCC-CMS model, the different seasons are spread across the cloud, although maxima of *θ* mostly occur in winter and the summer season is biased toward low *d* and *θ*. The IPSL-CM5A displays a much stronger seasonal discrimination, with two distinct (*d*, *θ*) clouds for the winter and for the summer seasons corresponding to the different modes of the bivariate histograms. This implies that both the bulk statistics and the extremes are modified by the seasonal cycle. For a more detailed discussion of the seasonality of the dynamical extremes, we refer the reader to Faranda et al. (2017b).

Given the variety of the possible behaviors, we will analyze separately the mean and the extreme behavior of the CMIP5 dynamical properties. We report the aggregate analysis in Table 2 and in Fig. 7. In Fig. 7, the colored numbers correspond to the median values for each model (numbered as in Table 1), MM (in black) corresponds to the CMIP5 multimodel mean, the number 1 corresponds to the 20CR-EM median, and the blue dots correspond to the medians of each of the 56 members of the 20CRv2c ensemble. For 20CR-EM and 20CR-ME we also draw the ellipses whose semiaxes correspond to the standard deviation of the mean. Figure 7a refers to the whole period, whereas Figs. 7b and 7c show the analysis averaging over two different periods (1851–1978 and 1979–2000, respectively). We note that the median values of the 20CR-ME are so close to each other as to be almost indistinguishable, meaning that the individual members provide a coherent representation of the SLP field. As noted above, the 20CR-EM *d* and *θ* median values are both smaller than those of 20CR-ME. By inspecting Figs. 7b and 7c it is evident that the distance between 20CR-EM and 20CR-ME reduces in the second period. However, the distance between 20CR-ME and models remains about constant. In Table 2 we provide the median values and standard deviations of *d* and *θ* for all the models, and also present the distance metrics *R*(*d*), *R*(*θ*), and *R*_{tot} between models and reanalysis as introduced in section 2. To verify their robustness, we further compare *R*_{tot} to the Wasserstein distance _{2}, also defined in section 2 (Fig. 8). The two indicators provide very similar information (Pearson correlation coefficient *r*_{pear} = 0.90 and Spearman correlation coefficient *r*_{spear} = 0.85; von Storch and Zwiers 2002); from now on we will therefore use the simpler *R*_{tot} when discussing our results. Indeed, the Wasserstein distance would be particularly complex to compute for the dataset analyzed here.

All the models are within one standard deviation of the 20CR-EM (Fig. 7), while five are not within the 20CR-ME ellipse. Both Fig. 7 and Table 2 further indicate that in many cases, models with a higher horizontal resolution have median values closer to those of the reanalysis. We note, however, that the median values in *d* and *θ* are statistically different from those found in 20CR-EM for all models except models 24 and 26 for *d* and model 23 for *θ* when we consider the full 1851–2000 span. For the period 1851–1978, medians are statistically different except model 21 for *d* and model 12 for *θ*. For the second period (1979–2000), all medians are different except for models 13, 15, and 25 for *d* and models 2, 6, 10, 14, 20, 22, 24, 25, and 26 for *θ*.

For 20CR-ME, the medians are different for all models except model 13 in the period 1851–2000. For the period 1851–1978, all the medians are different. For the period 1979–2000 all the medians are different except for models 8 and 14 for *d*. The statistical significance is determined using the Wilcoxon rank sum test (von Storch and Zwiers 2002). The null hypothesis is that the variables are samples from continuous distributions with equal medians. A rejection of the null hypothesis therefore indicates a significant difference between the medians.

We also find that different models have different spreads in *d* and *θ* (Table 2), suggesting investigating the extremes of these quantities and their relation with the weather regimes found in 20CR-ME and 20CR-EM. A quantitative analysis is reported in Table S2 of the supplemental material using the root-mean-square error (RMSE) between 20CR-EM, 20CR-ME, and the CMIP5 SLP composite anomalies. In general, we find the (NAO-like) patterns in Figs. 4b,d and 5b,d to have a higher RMSE, whereas the patterns in Figs 4a,c and 5a,c are better represented. This is counterintuitive, since one might naively expect models to better reproduce low-dimensional rather than high-dimensional patterns. At the same time, the NAO is the dominant mode of variability in the North Atlantic region, which could explain why a higher RMSE is found in that region of the phase space. The patterns in Figs. 4b,d and 5b,d also display stronger gradients than the ones in Figs. 4a,c and 5a,c, suggesting that small displacement errors might lead to larger RMSE values for the former than for the latter.

## 4. Changes in the attractor properties

We now investigate whether the SLP’s dynamical indicators have changed over time by computing 30-yr moving time averages (denoted with angle brackets) for 20CR-EM, 20CR-ME (separately for the 56 members), and the 26 CMIP5 models. These results are presented in Fig. 9. The abscissa shows the final year of each averaging window. In Figs. 9a, 9c, and 9e, we consider the full 150 years and subtract from each model/reanalysis the respective mean values of the indicators in 1880 (i.e., a constant value for each time series given by the average over 1851–80); in Figs. 9b and 9d, we focus on the last 35 years of data and subtract the values of 1979 (i.e., averaged over 1950–79). All time series therefore have a value of 0 on the year the baseline mean values are computed for. In Figs. 9e and 9f, we display the absolute values of *d* and *θ*, without subtracting any baseline values.

In Fig. 9a, 20CR-EM (red) shows an increase in *d* over time, which is opposite in sign and an order of magnitude larger than the changes observed in 20CR-ME (blue), the single members (yellow), the CMIP5 multimodel mean (black), and the single CMIP5 models (color scale, ordered by *R*_{tot}). When focusing on the last 25 years of the datasets, the curves scale similarly and most of them show a moderate decrease of the dimension (Fig. 9b). Figures 9e and 9f show that the spread of the 20CRv2c ensemble reduces with time, contrary to that of the CMIP5 models. The rapid increase of the dimension for 20CR-EM during the presatellite era may then be caused by changes in the variance of the ensemble members, which as discussed in section 3 above become more closely constrained as we approach the present day. In fact, the standard deviation of the ensemble is closely linked to the number of observations incorporated in the analysis (e.g., Krueger et al. 2013). To test this hypothesis, we compute the daily values of the standard deviation of the 56 SLP fields and plot the moving average of such quantity against that of the dimension of 20CR-EM (Fig. 10). The expected near-linear relationship is found.

This points to the increase in 〈*d*〉 for 20CR-EM not being a physical trend but rather an artifact resulting from the scarcity of observations during the first part of the reanalysis period. However, from 1979 onward the variance stabilizes, and both the 20CR-EM and 20CR-ME show a decrease in local dimension. This decrease is also observed in most CMIP5 models, as reflected by the multimodel mean (Fig. 9b). Even though the models are ordered by increasing *R*_{tot}, it is evident that the *best* models for the overall period do not correspond to the models showing a decrease in time in the local dimension. It should be further noted that, when computing the *R*_{tot} statistics for different periods, the best models change and there are no clear *winners*.

The inverse persistence *θ* of 20CR-EM shows a similar behavior (Fig. 9c). Here *θ* increases up to 1950 and then levels off. In terms of persistence time, this increase is small compared to the resolution of the datasets used for the analysis (1 day). The 20CR-ME shows a similar, albeit weaker, trend, while the CMIP5 models mostly oscillate around zero. No systematic trends can be identified after 1970 (Fig. 9d).

We conclude by noting that the changes found in CMIP5 models are mostly of the same sign as those found for 20CR-ME but systematically smaller in magnitude. This is associated with a much larger spread between the different models than between the different members of the reanalysis, with a number of models showing trends of opposite sign to those of the multimodel mean.

## 5. Discussion and conclusions

We have computed the instantaneous dynamical properties of the SLP fields over the North Atlantic region for the 20CRv2c and the CMIP5 historical runs, over the period 1851–2000. The goal of our analysis was to assess whether different models with different physics and spatial resolutions quantitatively represent the *same* dynamical system and therefore possess attractors with similar characteristics. The metrics we use are the local dimension *d* and the inverse of the persistence time *θ*. As described in Faranda et al. (2017b), these two quantities give a complete characterization of the attractor of the system. To take into account the possibility that inhomogeneities in the assimilated data create artifacts in the 20CRv2c ensemble-mean SLP fields, we have computed two sets of dynamical properties. The first is composed of the dynamical properties of the ensemble-mean field (20CR-EM). The second is computed as the average of the instantaneous dynamical properties of each of the 56 ensemble members (20CR-ME).

When the whole analysis period is considered, we find that the models successfully capture many of the dynamical system features identified in the reanalysis, such as the range and variability of the *d* and *θ* metrics. The SLP fields corresponding to extremes in *d* and *θ* are also similar across the models and reanalysis. At the same time, some models exaggerate the effects of the seasonal cycle on the dynamical indicators, and the statistical agreement in the median values of the metrics is generally poor. Models with higher horizontal resolutions mostly perform better.

To detect the changes in the attractor properties with time, we have analyzed the 30-yr moving averages of *d* and *θ* for the models and reanalysis. We find a number of interesting behaviors; up to 1975, the 20CR-EM shows an increase in *d* and *θ* as a result of the averaging among ensemble members, whereas 20CR-ME shows a decrease in *d* and a slower increase in *θ*. After 1950, there is no trend in *θ* and a weakly decreasing trend in *d* for both datasets. The CMIP5 multimodel mean shows a weakly decreasing trend in *d* throughout the analysis period, in agreement with 20CR-ME, and no trend in *θ*. These results suggest that one should be very careful in using ensemble means for studying the atmospheric dynamics of the late-nineteenth and beginning of the twentieth century, even over the North Atlantic (Krueger et al. 2013; Ferguson and Villarini 2012, 2014; Alvarez-Castro et al. 2018). The 20CRv2c ensemble members are increasingly constrained by a growing number of SLP observations as one approaches the present day. This causes a decrease of the ensemble spread with time, since the system is more closely pinned to a specific manifold (the observations) without the possibility of exploring the full phase space. This is the root cause of the upward trend in the 20CR-EM local dimension, which therefore reflects a characteristic of the dataset as opposed to a physical change in the climate system. Our results are therefore in agreement with Wang et al. (2013), Krueger et al. (2014), and Dell’Aquila et al. (2016), who identified spurious trends in the 20CRv2c datasets in cyclone activity, storminess, and synoptic variability. Finally, while in the latter part of the analysis period the 20CR-EM and 20CR-ME values of *d* converge (consistent with the fact that ensemble spread decreases; Fig. 10); this is not the case for *θ* (cf. Figs. 9e and 9f). This may be a signature of recurrent wave activity. Roughly periodic fluctuations do not change *d* but will increase *θ*. In the 20CR-ME, atmospheric waves would therefore lower the persistence (and increase *θ*) more than in 20CR-EM, where they would be smoothed out. This would, however, have a relatively small impact on *d*. Preliminary analysis on a quasigeostrophic two-layer model supports this interpretation, although further analysis would be necessary to demonstrate this is indeed the case for the 20CR dataset.

The next natural question is whether we can trust the results obtained for single reanalysis ensemble members. We believe that the answer is affirmative, as (i) the dataset has a sufficiently high horizontal resolution to obtain a good estimate of the local dimension distribution (Faranda et al. 2017b), and (ii) we focus here on the North Atlantic sector, which can be expected to perform better than elsewhere since most of the observations used to constrain 20CRv2c in the first part of the dataset are located in Europe or eastern North America (Cram et al. 2015). We therefore argue that the results obtained for the 20CR-ME and the multimodel ensemble are valuable, and that the decrease in dimension with time is a real and interesting feature of the atmospheric dynamics that merits a more detailed analysis in further studies. In fact, a decreasing dimension implies a more predictable atmosphere. This conclusion needs to be corroborated by looking at other reanalyses of the modern period that use a fully 3D set of observations, and this will be the subject of a forthcoming study.

We also stress that there is a much stronger dynamical coherence within the 20CRv2c ensemble than among CMIP5 models. In fact, the spread in *d* and *θ* among 20CR members throughout the analysis period is significantly smaller than the differences among models (Fig. 9). Considering a single member of the 20CRv2c ensemble would still provide good estimates of *d* and *θ*, but the same would not always be true of a randomly chosen model. The CMIP5 model analysis further suggests that differences in model setup, resolution, and physics affect the trends in dynamical systems metrics more than the average properties, so that *d* increases for some models contrary to the 20CR-ME.

As a final caveat we note that our analysis does not attempt to separate the forced variability from natural low-frequency oscillations and that, especially during the first part of the analysis period, it is unclear whether the greenhouse forcing can be clearly discerned above the background “climate noise” (Paeth et al. 1999; Lyu et al. 2015). We must therefore take into account the possibility that the data’s internal variability dominates over the long-term forcing trends for the time period considered.

## Acknowledgments

D. Faranda, Y. Robin, and P. Yiou were supported by ERC Grant 338965-A2C2. M. C. Alvarez-Castro was supported by Swedish Research Council Grant C0629701. G. Messori was supported by Swedish Research Council Grant 2016-03724 and a grant from the Department of Meteorology of Stockholm University.

## REFERENCES

*Iterated Maps on the Interval as Dynamical Systems*. Springer Science and Business Media, 248 pp.

*Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation*. Cambridge University Press, 582 pp.

*Extremes and Recurrence in Dynamical Systems*. John Wiley and Sons, 312 pp.

*Optimal Transport for Applied Mathematicians*. Progress in Nonlinear Differential Equations and Their Applications, Vol. 87, Birkhäuser Basel, 353 pp.

*Optimal Transport: Old and New*. Grundlehren der Mathematischen Wissenschaften, Vol. 338, Springer Science and Business Media, 976 pp.

*Statistical Analysis in Climate Research*. Cambridge University Press, 496 pp.

## Footnotes

Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JCLI-D-17-0176.s1.

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

^{1}

A list of symbols used in the manuscript is provided in Table S1 of the supplemental material.