## 1. Introduction

The dynamics of Earth’s climate system is to a large degree a story of interactions between processes on very different scales. It is commonly believed that there are fewer real degrees of freedom in the climate system than there are in a typical physical description because the smallest scales are somehow slaved. Weather prediction on the basis of severely limited observations is indeed possible because of such low dimensionality. If processes on the smallest physical scales in time or space were entirely independent, the butterfly effect would preclude prediction on any useful time scale.

Yet the dimensionality is also clearly greater than the number of independent observations used in weather prediction. The dissipative dynamics are such that assimilating a relatively small number of observations causes the remaining degrees of freedom, while dynamically independent, to partially synchronize with their “real” counterparts. Data assimilation is thus an instance of the dynamical systems paradigm of chaos synchronization (Duane et al. 2006; Yang et al. 2006), wherein two chaotic systems, though sensitively dependent on initial conditions, can be made to synchronize through the exchange, unidirectionally or bidirectionally, of only one or of a few dynamical variables (Fujisaka and Yamada 1983; Afraimovich et al. 1986; Pecora and Carroll 1990; Pecora et al. 1997). Standard data assimilation algorithms prescribe couplings between the systems, here “reality” and “model,” that can also be proven optimal for synchronization and thus for prediction (Duane et al. 2006).

Models can similarly be made to synchronize with *one another* when coupled only loosely, a phenomenon that can also be put to advantage. Unobserved subgrid-scale processes, some of them slaved and others that are dynamically important, are parameterized in different ways in different climate models. If such models are allowed to assimilate data from one another at run time, the models will partially synchronize. The “supermodeling” agenda that relies on the synchronization of alternative models of the same objective process is one of the ultimate applications of the chaos synchronization phenomenon to large real-world systems (Duane et al. 2017). The program is to tune the strengths of the intermodel assimilation for the different variables so that the synchronized variables simulate their counterparts in observations, in the belief that results superior to those of any one model, or to any ex post facto combination of model outputs, can thus be obtained. That hypothesis has been confirmed for supermodels formed from systems of low-order ODEs (van den Berge et al. 2011; Duane 2015; Mirchev et al. 2012; Du and Smith 2017^{1}), quasigeostrophic models, as described herein, versions of the intermediate-complexity SPEEDO model (Selten et al. 2017), and full climate models connected in a rudimentary way, only at the ocean–atmosphere interface (Shen et al. 2016, 2017).

Despite the proliferation of climate models and of subgrid-scale parameterization schemes, the partially synchronized large scales in the trained supermodel afford a superior description of the real climate. The improvement is related to the better known advantage gained in ex post facto averaging of outputs of different IPCC-class climate models, which almost always gives an improved representation of the real climate as compared to that of any single model (Reichler and Kim 2008; T. Reichler 2013, personal communication). It is as though the error in the separate models can be treated as random model error, which is reduced upon averaging. Supermodeling employs this assumption at every intermodel assimilation cycle.

The success of the supermodeling agenda, and of the new effective parameterization scheme on which it is based, should tell us something about the small-scale behavior of the real climate, and about relationships among processes on different scales. Here we provide evidence that supermodeling succeeds because it captures the interscale interactions that are essential to maintaining *critical behavior* that is missed in each of the separate models. Further, because different models commonly err in the same way in missing such behavior, no ex post facto average of the outputs of those models can reproduce it.

It is precisely when models are in critical states that they are most sensitive to the poorly represented small scales. According to the self-organized criticality (SOC) hypothesis of Bak et al. (1987), systems naturally tend toward such states. They are in any case commonly encountered in the climate system, and SOC has been applied to tropical sea surface temperatures previously (Andrade et al. 1995). What the training of supermodels achieves, in this view, is criticality. This is possible because prior modeling experience, essentially performing optimization in various subspaces of the space of all possible models, has been used to narrow the training task to the dimensions in model space along which ambiguity remains. That ambiguity arises because optimization of a model within the confines of any given algorithmic scheme chosen for subgrid-scale parameterization tends to miss the full scope of interscale interactions that come into play near criticality.

The plan of this paper is as follows. In the next section we review the concept and history of supermodeling. Then in section 3 we discuss supermodeling in the case of a toy model with no unresolved processes, but with intermodel coupling in a limited range of scales. It is seen that critical behavior can be achieved where there is none in the constituent models, with an emergent power spectrum that reflects the expected interscale cascade. In section 4, we examine the Shen et al. (2016, 2017) full climate supermodel, with intermodel connections only through a common ocean, from the same point of view. We discuss the possibility of monolithic single-model descriptions of small-scale processes in section 5, and conclude with a reassessment of the self-organized criticality hypothesis in the final section.

## 2. Background: Synchronization of competing models in a supermodel

### a. Single-scale ODE supermodels

The supermodel concept is first illustrated with three Lorenz (1963) systems, each with a different setting of the standard parameters, that we combine to mimic the behavior of a “true” system, defined by a fourth choice of parameters.

In (1), $(x, y, z)$ is the real Lorenz system and $(x_i, y_i, z_i)$, $i = 1, 2, 3$, are the three models. An extra term $\mu$ is present in the models but not in the real system. Because of the relatively small number of variables available in this toy system, all possible directional couplings among corresponding variables in the three Lorenz systems were considered, giving 18 connection coefficients $C^A_{ij}$, $i \neq j$, $A = x, y, z$. The nudging coefficients $K_A$, $A = x, y, z$, are chosen arbitrarily so as to effect “data assimilation” from the “real” Lorenz system into the three coupled “model” systems. The corresponding configuration for a general set of models, in any given assimilation context represented by a gain matrix, …

Adapting the connection coefficients according to (2), where *a* is an arbitrary constant, we find that the three systems synchronize with each other as well as with the “true” system, as seen in Fig. 2a, and that the connection coefficients rapidly asymptote to optimal values. The arrangement is analogous to an idealized form of weather prediction with continuous assimilation of data from “truth” into three alternative forecast models, none of which tracks truth (Figs. 2b–d) as well as the supermodel does.

That Eq. (2) is the correct way to extend synchronization of states to estimation of connection coefficients follows from a result on extending synchronization of states to synchronization of parameters generally (Duane et al. 2007), as explained in detail by Selten et al. (2017). As a heuristic justification, if one considers a time integral of (2), the prescribed change in any intermodel nudging coefficient $C^A_{ij}$, $A = x$, $y$, or $z$, is seen to be proportional to the correlation of the intermodel nudging term and the overall signed error between supermodel and truth. With care to the correct choice of sign in the latter factor, model *j* is then given more weight, through nudging, if such nudging of model *i* toward model *j* tends to reduce the truth–supermodel synchronization error, and conversely, as desired. For models of real processes, the learning rate *a* can be made to depend on the amount of random observational error, as in ordinary data assimilation, so that learning proceeds more slowly when the data are noisy.
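A minimal sketch of such a training loop may make the heuristic concrete. Because (1) and (2) are not reproduced in this excerpt, the code assumes the standard Lorenz (1963) equations for each model, omits the extra term $\mu$, takes the supermodel state to be the average of the three model states, and uses an assumed learning rule consistent with the verbal description above, $\dot C^A_{ij} = a\,(A_j - A_i)\,\big(A - \tfrac{1}{3}\sum_k A_k\big)$, where $A$ without a subscript is the truth. The model parameter values, the nudging strength `k_nudge`, the bound `c_max`, and the function names are hypothetical, and forward Euler stepping is used only for brevity.

```python
import numpy as np


def lorenz_tendency(s, sigma, rho, beta):
    """Lorenz (1963) tendencies for states stored as s[..., (x, y, z)]."""
    x, y, z = s[..., 0], s[..., 1], s[..., 2]
    return np.stack([sigma * (y - x), x * (rho - z) - y, x * y - beta * z], axis=-1)


def train_lorenz_supermodel(n_steps=10000, dt=1e-3, a=0.5, k_nudge=5.0, c_max=20.0, seed=0):
    """Forward-Euler sketch: three imperfect Lorenz models nudged to a 'true' Lorenz run,
    with intermodel coefficients C[A, i, j] adapted by the ASSUMED rule
        dC[A, i, j]/dt = a * (A_j - A_i) * (A_true - mean_k A_k).
    Returns the final coefficients and the RMS truth-supermodel error over the last steps.
    """
    rng = np.random.default_rng(seed)
    true_params = (10.0, 28.0, 8.0 / 3.0)
    # Hypothetical imperfect-model parameters (one entry per model i).
    sigma = np.array([13.25, 7.0, 6.25])
    rho = np.array([19.0, 18.0, 42.0])
    beta = np.array([3.5, 3.7, 1.4])

    truth = np.array([1.0, 1.0, 20.0])
    models = truth + rng.normal(scale=1.0, size=(3, 3))  # models[i, A]
    C = np.zeros((3, 3, 3))                              # C[A, i, j], zero diagonal
    sq_err = []

    for _ in range(n_steps):
        s_t = models.T                                   # s_t[A, i]
        diff = s_t[:, None, :] - s_t[:, :, None]         # diff[A, i, j] = A_j - A_i
        coupling = (C * diff).sum(axis=2).T              # sum_j C[A,i,j](A_j - A_i), as [i, A]
        nudge = k_nudge * (truth - models)               # assimilation of "truth" into each model
        err = truth - models.mean(axis=0)                # signed truth-supermodel error per variable
        d_models = lorenz_tendency(models, sigma, rho, beta) + coupling + nudge
        d_truth = lorenz_tendency(truth, *true_params)
        d_c = a * diff * err[:, None, None]              # diagonal stays zero since diff[A, i, i] = 0

        models = models + dt * d_models
        truth = truth + dt * d_truth
        C = np.clip(C + dt * d_c, 0.0, c_max)            # nonnegative, bounded weights
        sq_err.append(float(np.sum(err ** 2)))

    return C, float(np.sqrt(np.mean(sq_err[-1000:])))


C, rms = train_lorenz_supermodel()
```

Clipping to `[0, c_max]` stands in for whatever mechanism the original scheme uses to keep the coefficients bounded; it also keeps the weights nonnegative. No claim is made here about the particular values the coefficients reach.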

If we turn off the connection to truth and run the trained supermodel freely, so as to represent long-range climate projection rather than weather prediction, we would expect to lose synchronization but still to have a supermodel attractor that matches the true attractor. Results of such a comparison are shown in Figs. 3a–e. A fair degree of attractor matching, still far from perfect, has been achieved. Further, if we modify an ancillary parameter of the models, such as *ρ*, to mimic the change in radiative forcing of all models in a changed climate, the modified supermodel attractor (Fig. 3e) captures the new properties (a shift in *Z* and expansion in the *Y* dimension) of the modified true attractor (Fig. 3d), even though the connections defining the new supermodel were obtained by training with the original value of *ρ* in all models.

Better attractor matching is achieved with algorithms that consider finite trajectory segments, instead of instantaneous states, in setting values of the connection coefficients for models with very different attractors (van den Berge et al. 2011). In the general case of highly imperfect models, algorithms that minimize short-term prediction error are not adequate for matching attractors (Wiegerinck and Selten 2017). The task of optimizing connections in a supermodel such as (1) thus defines a problem in machine learning that can be addressed with a variety of traditional and less traditional methods (Wiegerinck and Selten 2017; Schevenhoven and Carrassi 2022).

### b. Multiscale supermodels: The quasigeostrophic example

Supermodeling with multiscale models, the main subject of this paper, is first illustrated by extending an example that was previously used to study synchronization-based teleconnections—weak ones of mainly theoretical interest—between the Atlantic and Pacific sectors of the midlatitude circulation in a quasigeostrophic channel model (Duane and Tribbia 2001, 2004). It was previously established that two such models, one forced by a jet in the Atlantic and the other by a jet in the Pacific, would synchronize if connected through only a limited range of Fourier components of the total flow. In the present context, we recast the partially synchronized configuration of connected models as a toy supermodel, engaged in three-way synchronization with a “true” system that has a jet in both sectors.

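The channel model governs the potential vorticity $q$ in a two-layer reentrant channel on a $\beta$ plane, where $i = 1, 2$ labels the layer, $\psi$ is streamfunction, and the Jacobian $J(\psi, q) = \partial_x\psi\,\partial_y q - \partial_y\psi\,\partial_x q$ gives the advective part of the material derivative $D/Dt = \partial/\partial t + J(\psi, \cdot)$. The forcing $F$ is a relaxation term designed to induce a jet-like flow near the beginning of the channel. The forcing coefficient $D$, boundary conditions, and other parameter values are given in Duane and Tribbia (2004).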
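The Jacobian is the term on which the intermodel coupling acts, so a small numerical check of its definition may help. The sketch evaluates $J(\psi, q) = \psi_x q_y - \psi_y q_x$ pseudospectrally on a doubly periodic square grid; this is a simplifying assumption, since the actual model is a reentrant channel with walls, and the function names are hypothetical.

```python
import numpy as np


def spectral_gradient(f, length=2 * np.pi):
    """Return (df/dx, df/dy) of a doubly periodic field f[x, y] via FFT."""
    n = f.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")          # axis 0 is x, axis 1 is y
    f_hat = np.fft.fft2(f)
    f_x = np.real(np.fft.ifft2(1j * kx * f_hat))
    f_y = np.real(np.fft.ifft2(1j * ky * f_hat))
    return f_x, f_y


def jacobian(psi, q):
    """J(psi, q) = psi_x q_y - psi_y q_x, the advective term in Dq/Dt."""
    psi_x, psi_y = spectral_gradient(psi)
    q_x, q_y = spectral_gradient(q)
    return psi_x * q_y - psi_y * q_x


n = 32
x = np.arange(n) * 2 * np.pi / n
X, Y = np.meshgrid(x, x, indexing="ij")
J = jacobian(np.sin(X), np.sin(Y))   # analytically cos(X) cos(Y)
```

Because $J(\psi, \psi) = 0$, a field is never advected by itself in this term; only differences between advecting and advected fields matter, which is what makes the coupling terms in the coupled-channel system well posed.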

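The two channel models, A and B, are coupled (in each layer $i$) through forcing terms defined in terms of their spectral components, giving the system (4), where $q_{\mathbf{k}}$ is the wavenumber-$\mathbf{k}$ spectral component of $q$ in three dimensions.^{2}

The adjustments …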
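Coupling only a range of scales amounts to filtering a field in spectral space before it enters a coupling term. The sketch below does this in one periodic dimension for simplicity, whereas the text uses three-dimensional spectral components; the band limits `k_min = 3`, `k_max = 8` and the helper names are hypothetical choices, not the ranges used in the experiments.

```python
import numpy as np


def band_limit(field, k_min, k_max):
    """Keep only Fourier components with k_min <= |k| <= k_max (1-D periodic field)."""
    n = field.shape[-1]
    k = np.abs(np.fft.fftfreq(n, d=1.0 / n))   # integer wavenumbers
    mask = (k >= k_min) & (k <= k_max)
    return np.real(np.fft.ifft(np.fft.fft(field) * mask))


def medium_scale_difference(q_a, q_b, k_min=3, k_max=8):
    """Hypothetical band-limited PV difference q_b - q_a for a medium-scale coupling term."""
    return band_limit(q_b - q_a, k_min, k_max)


n = 64
x = np.arange(n) * 2.0 * np.pi / n
q_a = np.sin(x)
q_b = np.sin(x) + 0.5 * np.cos(5 * x) + 0.2 * np.sin(12 * x)
coupling_field = medium_scale_difference(q_a, q_b)   # only the wavenumber-5 part survives
```

The wavenumber-12 difference is left uncoupled, which is the freedom that lets the small scales of the two channels differ while the medium scales are pulled together.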

The flow fields in the coupled channels governed by (4) are found to synchronize, regardless of differences in initial conditions, as seen in Fig. 4. Synchronization was also found when only the medium scales were coupled, by replacing the advective coupling terms $cJ(\psi^{A,B}, q^{B,A} - q^{A,B})$ in (4) with analogous terms in which only the medium-scale spectral components of $q^{B,A} - q^{A,B}$ are retained. For faster convergence, intermodel nudging terms can be added, in various configurations as described by Duane and Tribbia (2004).

The correspondence between the channels is not exactly the identity, because of the difference in forcings, but is an instance of *generalized synchronization*, as known to occur in pairs of systems of ordinary differential equations (Rulkov et al. 1995) when small differences in parameters are introduced. Likewise here, at $c = 1/2$ we have $\psi^A \approx \psi^B$, with the difference between the flows in the two channels given mostly by a difference in the high-spatial-frequency components, as illustrated in Fig. 5. This control of the small scales by the assimilation of large-scale data agrees with previous studies of such behavior in QG models, e.g., Tanguay et al. (1995). That the smallest scales need not be coupled is thought to reflect the existence of an inertial manifold (Temam 1988) on which they are slaved. The existence of (approximate) inertial manifolds for forced-dissipative systems is the theoretical underpinning of the synchronization-based approach.

It is readily seen that the average of the two channel flows at $c = 1/2$ is the solution of a model with the average forcing, that is, of a model with *two* jets, as in Fig. 4g. (If intermodel nudging terms are included in the forcing, they cancel in the average.) The flow in either channel, $\psi^A \approx \psi^B$, approximates the flow in that two-jet model. As $c$ increases from 0 to 1/2, the dynamics of each channel changes so as to incorporate a “virtual” counterpart of the dynamics of the sector that is forced in the other channel, including blocking behavior in that sector.
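The averaging claim can be spelled out under explicit assumptions. Suppose, consistent with the advective coupling terms $cJ(\psi^{A,B}, q^{B,A} - q^{A,B})$ quoted above (the full form of (4) is not reproduced here), that channel A obeys the equation below and channel B the same equation with A and B interchanged, where $\mathcal{L}$ is a hypothetical linear dissipation operator shared by both channels:

$$\frac{\partial q^A}{\partial t} + J(\psi^A, q^A) + c\,J(\psi^A, q^B - q^A) = F^A + \mathcal{L}\,q^A .$$

At $c = 1/2$, with synchronized large scales $\psi^A \approx \psi^B \approx \psi$, bilinearity of the Jacobian gives

$$J(\psi, q^A) + \tfrac{1}{2}J(\psi, q^B - q^A) = J(\psi, \bar q), \qquad \bar q \equiv \tfrac{1}{2}\left(q^A + q^B\right),$$

so both channels are advected by the same field, and averaging the two equations yields

$$\frac{\partial \bar q}{\partial t} + J(\bar\psi, \bar q) \approx \tfrac{1}{2}\left(F^A + F^B\right) + \mathcal{L}\,\bar q ,$$

where $\bar\psi = \tfrac{1}{2}(\psi^A + \psi^B) \approx \psi$ is the streamfunction of $\bar q$, because potential vorticity depends linearly on streamfunction (up to terms common to both channels). Symmetric nudging terms $\nu(q^B - q^A)$ and $\nu(q^A - q^B)$, with $\nu$ a hypothetical nudging strength, cancel in the sum, as noted above.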

The value $c = 1/2$ that defines a useful supermodel can be obtained by training against a “real” (two-jet) dataset $q_{obs}(\mathbf{x}, t)$ to which the model is nudged, according to a training rule analogous to (2).

### c. Climate supermodels

At an intermediate level of complexity in the hierarchy of models, two SPEEDO models (Severijns and Hazeleger 2010) sharing a common ocean, but with distinct atmospheric components, each with its own parameter settings, were connected by nudging at all atmospheric grid points (Selten et al. 2017). The resulting supermodel exhibited a climatology and a climate response to a CO_{2} increase that was closer to those of a reference “true” model than were those of a standard multimodel average. That result suggests that the toy supermodel results described above will carry over to full climate models.

The manner in which supermodels surpass multimodel averages is, however, more clearly illustrated by the application to full climate models connected to one another only through a common ocean, without interatmosphere connections. We use the COSMOS configuration (Giorgetta et al. 2013) composed of an ECHAM5 atmosphere and an MPIOM ocean. Two versions of the ECHAM5 atmosphere model, “Nordeng” and “Tiedtke,” each with a different parameterization of subgrid-scale convective processes, were coupled to a single MPIOM ocean model (Shen et al. 2016, 2017): Both atmosphere models feel air–sea fluxes based on the same SST field, while the ocean receives a weighted average of the air–sea fluxes, as depicted schematically in Fig. 7. We refer the interested reader to the original sources (Tiedtke 1989; Nordeng 1994) for details of the differences between the two models, which are rather involved.

The weights $\alpha$, $\beta$, and $\gamma$ are used, respectively, for the heat, momentum, and freshwater air–sea fluxes felt by the common ocean. In the equations for the two COSMOS models, $f_N$ and $f_T$ are the dynamics describing the evolution of the atmospheric state vector $\mathbf{A}$ in the Nordeng and Tiedtke models, respectively, which include a dependence on the atmosphere–ocean fluxes $\mathbf{Q}$, $\boldsymbol{\tau}$, and $\mathbf{q}$ for heat, momentum (wind stress), and water (precipitation minus evaporation), respectively. Each flux depends on the states of atmosphere and ocean in a different way for each model. The ocean state $\mathbf{O}$ evolves according to the dynamics $g$, which is the same in both models. The supermodel state is the conjoined state vector $(\mathbf{A}_N, \mathbf{A}_T, \mathbf{O})$, which evolves according to (10), where $\alpha$, $\beta$, and $\gamma$ are free parameters trained with a gradient descent scheme so that the simulated monthly SST climatology is closest, in RMS error, to the observed monthly climatology over the tropical Pacific (10°S–10°N, 160°E–90°W) (Shen et al. 2016).
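The training step can be sketched as an ordinary optimization over three weights. The sketch assumes that each weight enters as a convex combination of the two atmospheres' fluxes, for example $\mathbf{Q} = \alpha\mathbf{Q}_N + (1-\alpha)\mathbf{Q}_T$ (which atmosphere carries $\alpha$ rather than $1-\alpha$ is itself an assumption), and replaces each supermodel run by a cheap, synthetic, hypothetical function `toy_sst` of the weights. The gradient is estimated by central finite differences; the actual scheme of Shen et al. (2016) may differ in every detail.

```python
import numpy as np


def combine_fluxes(flux_n, flux_t, weight):
    """Weighted average of one air-sea flux from the two atmospheres (assumed form)."""
    return weight * flux_n + (1.0 - weight) * flux_t


# Synthetic "observed" SST profile and a hypothetical stand-in for a supermodel run.
X = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
SST_OBS = np.full_like(X, 28.0)


def toy_sst(w):
    """Deliberately nonlinear toy response to (alpha, beta, gamma); not a model of COSMOS."""
    alpha, beta, gamma = w
    return (28.0
            + np.tanh(2.0 * (beta - 0.6)) * np.cos(X)
            + 2.0 * (alpha - 0.3) * np.sin(2.0 * X)
            + 2.0 * (gamma - 0.5))


def mse(w):
    """Mean squared SST error; it has the same minimizer as the RMS error."""
    return float(np.mean((toy_sst(w) - SST_OBS) ** 2))


def train_weights(w0, lr=0.1, n_iter=200, h=1e-5):
    """Projected gradient descent with central finite-difference gradients."""
    w = np.array(w0, dtype=float)
    for _ in range(n_iter):
        grad = np.array([(mse(w + h * e) - mse(w - h * e)) / (2.0 * h) for e in np.eye(3)])
        w = np.clip(w - lr * grad, 0.0, 1.0)   # keep each weight in [0, 1]
    return w


w_opt = train_weights([0.5, 0.2, 0.9])
```

In a real application each call to `mse` would require a multiyear coupled simulation, so the number of objective evaluations, not the arithmetic, dominates the cost.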

The climatological SST and precipitation fields for the two models, the supermodel, and reference observations are shown in Fig. 8. It is seen that while the two models each exhibit the error of a double intertropical convergence zone (ITCZ), which is also found in a large variety of other GCMs (Mechoso et al. 1995; Zhang et al. 2007), the trained supermodel exhibits the single ITCZ that is found in observed tropical Pacific behavior surrounding ENSO.^{3} Importantly, any ex post facto weighted average of the outputs of the two models would also exhibit the double ITCZ error.

To be consistent with our definition of supermodeling as intermodel data assimilation, we note that the configuration described above, given by Eqs. (8)–(10), approximates a configuration of two coupled ocean–atmosphere models—one with a Nordeng atmosphere and an MPIOM ocean, the other with a Tiedtke atmosphere and a separate MPIOM ocean—connected by intermodel nudging between the ocean components.^{4} To see this, imagine that the connections between the two oceans are strong enough to cause them to synchronize, while the interocean connections for different prognostic variables are chosen judiciously. For example, if the atmosphere–ocean heat flux only enters the prognostic equation for ocean temperature, and the momentum flux only enters the prognostic equations for ocean velocity, then different coefficients for the nudging of temperature and velocity between the two oceans would give different weights *α* and *β* for the heat and momentum fluxes, respectively, in the limit of infinite nudging (with nudging coefficients for each prognostic variable *l* = *T*, *u*, *υ*, …), while …
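The infinite-nudging argument can be checked on a linear toy with one scalar “ocean” per model. If ocean 1 obeys $\dot T_1 = Q_1 - \lambda T_1 + k_1(T_2 - T_1)$ and ocean 2 obeys $\dot T_2 = Q_2 - \lambda T_2 + k_2(T_1 - T_2)$, then adding $k_2$ times the first equation to $k_1$ times the second cancels the coupling, so $\bar T = (k_2 T_1 + k_1 T_2)/(k_1 + k_2)$ obeys $\dot{\bar T} = \alpha Q_1 + (1-\alpha) Q_2 - \lambda \bar T$ with $\alpha = k_2/(k_1 + k_2)$. For large $k_1, k_2$ the two oceans synchronize, $T_1 \approx T_2 \approx \bar T$, and the pair behaves like a single ocean forced by the weighted flux. The relaxation rate $\lambda$, the fluxes, and the nudging strengths below are hypothetical.

```python
import numpy as np


def strongly_nudged_pair(q1, q2, k1, k2, lam=1.0):
    """Steady state of two linear 'ocean' temperatures nudged toward each other:
        dT1/dt = q1 - lam*T1 + k1*(T2 - T1)   (ocean forced by atmosphere 1)
        dT2/dt = q2 - lam*T2 + k2*(T1 - T2)   (ocean forced by atmosphere 2)
    """
    A = np.array([[lam + k1, -k1],
                  [-k2, lam + k2]])
    return np.linalg.solve(A, np.array([q1, q2]))


k1, k2 = 1.0e4, 3.0e4
T1, T2 = strongly_nudged_pair(q1=10.0, q2=2.0, k1=k1, k2=k2)
alpha = k2 / (k1 + k2)                        # effective weight on flux q1
T_single = alpha * 10.0 + (1.0 - alpha) * 2.0  # single ocean, weighted flux, lam = 1
```

The weight on model 1's flux is set by how strongly ocean 2 is nudged toward ocean 1, which is why choosing different nudging coefficients for temperature and for velocity yields different effective weights for the heat and momentum fluxes.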

## 3. Relationships between processes on different scales in a quasigeostrophic supermodel

Consider two single-jet channel models, A and B, governed by (11), which differ in forcing strength but use the same $q^*$ to force both models. Introduce weights $w^A, w^B \in [0, 1]$, with $w^B = 1 - w^A$, and define the coefficients of the advective coupling terms in (11) differently for the two channels, specifically $c^A = w^B$ and $c^B = w^A$. Then, combining the two equations in (11) and expanding the advective derivatives $D/Dt$, one obtains an equation for the averaged potential vorticity. For wavenumbers $\mathbf{k}$ in any range of scales for which the models are synchronized, we naturally define the corresponding component of the supermodel field $q^{sm}$ as this average, and likewise for wavenumbers $\mathbf{k}$ for which the external forcing coefficients $D^{A,B}$ are of the same form and combine linearly. So the supermodel field $q^{sm}$ also satisfies (16), the usual potential vorticity equation with the forcing terms $F^{A,B}$ averaged. The contribution of any intermodel nudging terms to the averaged forcing can also be made to vanish for an appropriate ratio of …

The weights are trained with an adaptation rule in which $a$ is a learning rate, with $w^B \equiv 1 - w^A$, and $\upsilon$ is the ratio, held constant, between the strength of nudging to “observations” during training and the strength of external forcing. This adaptation rule is the analog of (2) and (7) for the single-jet two-forcing-strength QG supermodel (see appendix).

Two models that are unforced and strongly forced, respectively, exhibit no blocking, as seen in Fig. 9, while a model with moderate forcing strength exhibits a blocked-zonal index cycle, arguably similar to that found in the real atmosphere, as does the supermodel with the two extreme cases as constituents, after training the connections (Fig. 9e) to simulate “realistic” forcing. With only medium scales connected, the constituent models synchronize with each other and exhibit the usual index cycle. Note that the models cannot synchronize exactly, because they would satisfy different equations if the nudging terms vanished. The differences are taken up in the small-scale components that are not connected between models, as shown for the two-jet supermodel in Fig. 5. That some dynamical components of the constituent models are *not* connected, so that the models remain semiautonomous, is key to the supermodeling program.

With strong large-scale forcing, small-scale dynamics are ineffective in disrupting the jet-like flow (Fig. 9a). With no forcing, no interesting interaction between large and small scales occurs, and only low-amplitude turbulence (Fig. 9b) ensues. The interesting behavior occurs in the intermediate dynamical regime (Fig. 9c). We take the blocked-zonal flow vacillation to be one example of *critical behavior*, of which there are indeed many in the climate system, each characterized by chaotically intermittent vacillation of a climate subsystem between qualitatively different dynamical regimes.

In the SOC picture of Bak et al. (1987), as applied to the model studied, blocked and zonal flows are “minimally stable” states which break down in response to small fluctuations. One can indeed cast blocked and zonal flows as two equilibria that exist simultaneously in a simpler, barotropic system over a narrow range of parameters (Ghil and Childress 1987). Vacillation among minimally stable states, in the Bak et al. view, occurs in a wide variety of open systems that are far from thermodynamic equilibrium, such as the famous example of a sandpile onto which sand is continually poured, giving rise to a fractal pattern of avalanching structures on the surface. The fractal form arises, in the idealized SOC case, because information cascades freely from small scales to large scales when a system is at or near a critical point, implying that there is no preferred scale in the range of scales over which this cascade occurs. In the nonideal case studied here, no fractal form is discerned, but there is interscale interaction across a wide range of scales (e.g., Branstator 1995), so something of the SOC picture will apply.
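The sandpile invoked here is the Bak–Tang–Wiesenfeld model, and a few lines suffice to simulate it. In the sketch below, grains are dropped at random sites of a square grid with open edges; any site holding four or more grains topples, passing one grain to each neighbor, and the number of topplings triggered by one added grain is recorded as the avalanche size. The grid size, number of grains, seed, and function name are arbitrary choices, and the code makes no claim about the exponent of the resulting size distribution.

```python
import numpy as np


def drop_grain(grid, i, j):
    """Add one grain at (i, j) and relax the pile; return (topplings, grains lost at edges)."""
    n_rows, n_cols = grid.shape
    grid[i, j] += 1
    stack = [(i, j)]
    topplings = lost = 0
    while stack:
        a, b = stack.pop()
        if grid[a, b] < 4:
            continue                          # already relaxed by an earlier toppling
        grid[a, b] -= 4
        topplings += 1
        if grid[a, b] >= 4:
            stack.append((a, b))
        for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
            if 0 <= na < n_rows and 0 <= nb < n_cols:
                grid[na, nb] += 1
                if grid[na, nb] >= 4:
                    stack.append((na, nb))
            else:
                lost += 1                     # grain falls off the open boundary
    return topplings, lost


rng = np.random.default_rng(0)
grid = np.zeros((20, 20), dtype=int)
sizes, total_lost = [], 0
for _ in range(3000):
    size, lost = drop_grain(grid, *rng.integers(0, 20, size=2))
    sizes.append(size)
    total_lost += lost
```

Once the pile has filled up to its critical state, a single grain can trigger anything from no toppling at all to an avalanche spanning much of the grid; a histogram of `sizes` on logarithmic axes is the usual way to look for the power-law form discussed next.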

Rather than further describe the specific interscale interactions that give rise to criticality, we investigate the applicability of the power-law form of the fluctuations that was conjectured by Bak et al. (1987) to characterize critical states generally. If $P(s)$ represents the distribution of scales $s$ in a power spectrum, then the lack of a well-defined value for $s$ suggests that $P(s)$ is of the form $N s^{\alpha}$ for some constants $\alpha$ and $N$, so that $\log P(s)$ is linear in $\log s$, with slope $\alpha$. Logarithmic plots of the energy spectrum of the flow in the QG channel model are shown in Fig. 10. The energy falloff with wavenumber in the situation of blocked-zonal flow vacillation, as occurs in “reality” and in the supermodel, has a form closer to that given by a single log-linear range than does the spectrum of either of the two constituent models defined by extreme values of the forcing strength, except, in the case of zero forcing, for a range of energies and of length scales both too small to be of interest. The behavior supports the notion that the cascade of information from the small scales is important in the case of vacillation, and that it is achieved by the supermodel. In the toy model there are no scales that are unresolved by design, but there are relevant processes on many scales that are collectively parameterized by the forcing strength $\mu_0$. We suggest that the demonstrated log-linear form is a vestige of the form that would be found in a more complete model that exhibits the same vacillation.
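Reading an exponent off a log-log spectrum amounts to a least-squares line fit in logarithmic coordinates over a chosen range of scales. The sketch below recovers $\alpha$ and $N$ from a synthetic pure power law; the fitting range, the exponent −3, and the helper name are hypothetical, and real spectra such as those in Fig. 10 would not be expected to follow a single line over all scales.

```python
import numpy as np


def power_law_exponent(scales, power, s_min=None, s_max=None):
    """Least-squares slope of log P versus log s over an optional fitting range."""
    scales = np.asarray(scales, dtype=float)
    power = np.asarray(power, dtype=float)
    keep = np.ones_like(scales, dtype=bool)
    if s_min is not None:
        keep &= scales >= s_min
    if s_max is not None:
        keep &= scales <= s_max
    slope, intercept = np.polyfit(np.log(scales[keep]), np.log(power[keep]), 1)
    return slope, np.exp(intercept)     # (alpha, N) in P(s) = N * s**alpha


k = np.arange(1, 65)
spectrum = 3.0 * k ** -3.0              # synthetic pure power law: alpha = -3, N = 3
alpha, N = power_law_exponent(k, spectrum, s_min=4, s_max=32)
```

Comparing the residuals of such a fit between models is one simple way to quantify how close a spectrum comes to a single log-linear range.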

It is suggested that the representation of such spatiotemporal structures as blocking patterns is at the root of a trained supermodel’s ability to avoid errors present in its constituent models, even when those errors are qualitatively similar in all constituents. Such structures, like the fractal patterns on avalanching sandpiles, appear only near criticality, and thus are absent in the noncritical constituent models. They are also absent in any ex post facto average of the constituent model outputs. In the following sections, further evidence is presented that supermodels can generally surpass the common method of output averaging for the same reason—the unique character of the spatiotemporal patterns that emerge only near a critical state and are missed by all constituent models.

## 4. Trained criticality in the full climate supermodel

We argue that the COSMOS supermodel that was described in section 2, in reaching a critical state, qualitatively different from that of either of its constituent models, exhibits behavior comparable to that of the quasigeostrophic supermodel discussed above. Here the critical state, imagined to coincide with climatology, as characterized by a single ITCZ and a reduced cold tongue, is supported by suppressed ocean upwelling at the equator under conditions where the equatorial and off-equatorial ocean circulations combine in just the right way to suppress the excessive cold tongue found in either model separately, in the manner previously described by Shen et al. (2017).

Briefly, the Shen et al. mechanism relies on an interplay between Ekman pumping of subsurface cold water, upwelled at the equator due to local wind stress, on the one hand, and off-equatorial downwelling driven by wind stress gradients at 5°–10°N and 5°–10°S on the other. The off-equatorial downwelling induces equatorial upwelling via the meridional tropical cells in the ocean, in either hemisphere. The equatorial wind stress and consequent upwelling are stronger in the Nordeng model, while the off-equatorial downwelling and consequent equatorial upwelling are stronger in the Tiedtke model. The two effects are combined in the supermodel, with relative strengths determined by the value of the momentum-flux weight *β* in (10), which determines wind stress. The effects do not themselves combine linearly, due to some of the same nonlinearities that are essential to the ENSO cycle (e.g., Jin 1997, 1998; Zebiak and Cane 1987). That is, the cooling due to upwelling at the equator is not itself a weighted average of the cooling in Nordeng and Tiedtke. Nature has in effect selected a special value of *β* that gives a singular pattern in the tropical Pacific SST, different from the patterns obtained at both higher and lower values.

Following the self-organized criticality analysis of Bak et al. (1987), we examine the power spectra for observations and the different models, as in section 3 for the QG model, but here using temporal rather than spatial frequency. The spectra for both the Nordeng and Tiedtke models, in simulations of the twentieth century, exhibit a falloff with frequency that is too steep at low frequencies (Figs. 12a,b), as compared to observations (Fig. 12d). This behavior agrees with the fact that the ENSO cycle in both models is too rapid, with too much energy concentrated around a period of 2 years, an error commonly encountered in models of ENSO (e.g., Kirtman 1997). The single-ITCZ pattern is a promising indication that the ocean–atmosphere dynamics in the supermodel is a more faithful representation of the true dynamics.
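The temporal spectra compared here can be estimated, in the simplest case, with a periodogram of a monthly SST index. The sketch below applies it to a synthetic 100-year monthly series containing a 4-year and a weaker 2-year oscillation; the series, the normalization, and the function name are assumptions, and published spectra are usually smoothed or tapered, which this sketch omits.

```python
import numpy as np


def periodogram(series, samples_per_year=12):
    """One-sided periodogram of an evenly sampled series; frequencies in cycles per year."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                                     # remove the time mean
    freqs = np.fft.rfftfreq(x.size, d=1.0 / samples_per_year)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    return freqs[1:], power[1:]                          # drop the zero-frequency bin


months = np.arange(1200)                                 # 100 years of synthetic monthly values
index = np.sin(2 * np.pi * months / 48) + 0.3 * np.sin(2 * np.pi * months / 24)
freqs, power = periodogram(index)
peak_period_years = 1.0 / freqs[np.argmax(power)]       # the 4-year component dominates
```

A model whose ENSO cycle is too rapid would show its dominant peak displaced toward the 2-year period, with correspondingly less power at lower frequencies.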

The chaotic vacillation here that is analogous to the vacillation between blocked and zonal flow regimes in the midlatitudes, depicted in Fig. 9d, is that between the warm phases and cold phases of the ENSO cycle, illustrated in Fig. 11. While the climatological precipitation patterns in the two phases are less distinct than the temperature patterns shown in the figure, the single ITCZ in climatology is indeed an average of the two phases. Nordeng, Tiedtke, and other climate models of similar resolution overrepresent the cold (La Niña) phase in which the ITCZ is divided. That the single ITCZ behavior in the supermodel is the result of vacillation, and does not represent a uniform climatological shift, was demonstrated in Shen et al. (2016), where it was shown that the supermodel better reproduces the El Niño phase. The system is in a critical state in regard to the two flow regimes. As with blocking, El Niño events interrupt the background cold phase pattern that is so easily reproduced in climate models.

Analogously to the blocking patterns in the quasigeostrophic model, here it is the spatial structure associated with ENSO that is characteristic of the critical behavior. The extension of the Pacific warm pool, defining an El Niño event, is an extreme form of the suppression of the cold tongue in the supermodel in a climatological average. The vacillation between El Niño and La Niña regimes reduces the size of the cold tongue on average.

The extended warm pool in El Niño events, in this view, is a dissipative structure, as are the blocking patterns in the QG model, of the sort that arises in nonequilibrium forced-dissipative systems. And as with the fractal structure on the surface of Bak et al.’s sandpiles that gives rise to avalanches, the El Niño structure has a unique form, though not fractal, that disappears as one ventures away from the critical state.

Interscale interactions are thought to play a key role in the ENSO cycle and in the form of El Niño events, as they do in blocked/zonal vacillation and in blocking patterns. All models exhibit cascades (Figs. 12a–c), down to the scale of the 4–12 yr^{−1} Madden–Julian oscillation (MJO), which has indeed been hypothesized to trigger El Niño events (Moore and Kleeman 1999). The cascades actually continue to higher frequencies (not shown in the figure) that would include westerly wind bursts with periods of 5–20 days, which have also been thought to play such a triggering role (Harrison and Vecchi 1997). Neither of these phenomena is reproduced well either by the separate models or by the supermodel. The supermodel, however, while giving a falloff rate that is still too steep, slightly steeper than in the separate models at high frequencies, avoids the low-frequency error of the separate models, in each of which the region of rapid falloff extends to below-annual frequencies. In this regard the supermodel spectrum (Fig. 12c) is more like that of observations (Fig. 12d). Thus, it is the form of the criticality in this case that is better represented by the supermodel than by either of the constituent models separately, although the latter qualitatively agree with each other in this regard, as they do in their SST patterns. The improvement in the spectrum is thought to accompany a more realistic transfer of energy between scales that underlies the phenomena producing the observed SST pattern.

## 5. Discussion

### a. Generality of supermodeling

The use of a handful of models to create a useful supermodel rests on the assumption that the set of combinations of the constituent model dynamics includes the true dynamics, or at least comes closer to it than does any one constituent model. It is preferable to consider only combinations formed with nonnegative coefficients or weights, since the use of negative coefficients can lead to instability. If we envision the models as defined in a common, large parameter space, then the constituent models should form a convex hull surrounding the “true” model (e.g., Schevenhoven et al. 2019). A desirable set of constituent models can be created by design, as with the Lorenz system supermodel described in section 2a, and with both of the quasigeostrophic supermodels described in sections 2b and 3, respectively. Alternatively, a desirable model set can be constructed using models that have each been locally optimized in their respective parameter spaces, as with the Nordeng and Tiedtke versions of the COSMOS model, relying on the empirical result that model error in such situations can be viewed as random error in the large model space. That empirical result seems reasonable, as it does for the set of IPCC-class models discussed in the introduction, because the local optimization starts from choices made by different groups of modelers that arguably are equally realistic or nearly so. Some combination of such models will likely come closer to the true dynamics.
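The convex-hull condition can be tested directly when there are $d+1$ constituent models in a $d$-dimensional parameter space: solve for weights that reproduce the true parameters and sum to one, and check that none is negative. For the Lorenz equations the tendencies are linear in $(\sigma, \rho, \beta)$, so at a common state a combination of model tendencies with weights summing to one equals the tendency of a model with the correspondingly combined parameters. The $(\sigma, \rho)$ pairs below are hypothetical, as are the function names.

```python
import numpy as np


def convex_weights(model_params, true_params):
    """Weights w with sum(w) = 1 and sum_i w_i p_i = p_true, for d + 1 models in d dimensions."""
    P = np.asarray(model_params, dtype=float)            # shape (d + 1, d)
    A = np.vstack([P.T, np.ones(P.shape[0])])            # parameter rows plus a sum-to-one row
    b = np.append(np.asarray(true_params, dtype=float), 1.0)
    return np.linalg.solve(A, b)


def in_convex_hull(model_params, true_params, tol=1e-12):
    """True if the true parameters are a nonnegative (convex) combination of the models'."""
    return bool(np.all(convex_weights(model_params, true_params) >= -tol))


models = [(8.0, 24.0), (12.0, 24.0), (10.0, 32.0)]      # hypothetical (sigma, rho) pairs
w = convex_weights(models, (10.0, 28.0))
```

With more models than $d+1$ the weights are no longer unique, and a constrained least-squares or linear-programming formulation would be needed instead of a single linear solve.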

For state-of-the-art climate models, the main differences are in the subgrid-scale parameterization schemes. The improvement obtained by supermodeling is therefore expected to be in the representation of phenomena on these scales. Further, realistic anthropogenic changes in radiative forcing are not expected to have a large impact on the combination of subgrid-scale parameterizations best suited to represent the true dynamics. So the effective parameterizations in supermodels trained on twentieth-century data are expected to be robust against climate change, requiring only the usual change in forcings and not in intermodel connections, as with the Lorenz system example in Fig. 3. Models are most sensitive to behavior on the smallest scales when they are near criticality, irrespective of the specific theory of criticality put forward by Bak et al. (1987) and the power spectra it predicts. Thus, the improvement due to supermodeling is expected to be most pronounced when the system is near criticality, presently or in the projected future.

It remains to argue that similar improvement could not be obtained by averaging the outputs of the separate models, in the general case. We assert, without proof, that the occurrence of such structures as blocking patterns and a single ITCZ near and only near critical points is common. The structures are then absent in any ex post facto average of states in models that do not represent the critical behavior well.

### b. Effective parameterization of the subgrid scales in a climate supermodel

It is natural to ask whether there is some parameterization of the subgrid-scale dynamics, intermediate between the Nordeng and Tiedtke parameterizations, that would yield the same results as the supermodel but with a single atmosphere. Indeed, it is expected that better results in the upper atmosphere and in the extratropics will be obtained by introducing connections between the atmospheres. If the atmospheres themselves then nearly synchronize, it would thus be expected that the desynchronized small scales could be used to define such an intermediate parameterization.

We argue that the lesson of the supermodeling study is that such parameterizations must be more complex, dynamical entities. By contrast, the quasigeostrophic supermodel formed from the unforced and strongly forced models (11)–(14) is equivalent to a single model with a parameterization defined simply by averaging the forcing strengths, as shown in section 3. This equivalence extends to any supermodel formed from models that differ only in parameters that appear linearly in the prognostic equations, perhaps as multipliers of entire tendencies, provided that the only nonlinear terms are the familiar Jacobian terms for advection.
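The algebraic fact behind this equivalence can be checked in a few lines. In the sketch below, the Lorenz (1963) system stands in for models whose parameters multiply entire terms and whose only nonlinearities are quadratic products: a weighted sum of two models' tendencies, with weights summing to one, equals the tendency of a single model with weighted-average parameters. A hypothetical model with its parameter inside a tanh nonlinearity shows that the identity fails otherwise. The parameter values are arbitrary, and the check concerns instantaneous tendencies only, not the synchronized dynamics of the QG supermodel.

```python
import numpy as np

def lorenz63(state, sigma, rho, beta):
    """Lorenz (1963) tendencies; each parameter multiplies a whole term."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def saturating(state, p):
    """Hypothetical model whose parameter sits inside a nonlinearity."""
    return -np.tanh(p * state)

params_a = (10.0, 24.0, 8.0 / 3.0)   # hypothetical "model A"
params_b = (12.0, 32.0, 2.0)          # hypothetical "model B"
w_a = 0.3
w_b = 1.0 - w_a                       # weights sum to one
state = np.array([1.5, -2.0, 20.0])

# Weighted tendencies versus the tendency of one model with weighted parameters
weighted = w_a * lorenz63(state, *params_a) + w_b * lorenz63(state, *params_b)
averaged = lorenz63(state, *[w_a * pa + w_b * pb for pa, pb in zip(params_a, params_b)])
print(np.allclose(weighted, averaged))          # True

weighted_nl = w_a * saturating(state, 0.5) + w_b * saturating(state, 2.0)
averaged_nl = saturating(state, w_a * 0.5 + w_b * 2.0)
print(np.allclose(weighted_nl, averaged_nl))    # False
```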

Averaging of tendencies (e.g., Wiegerinck and Selten 2017) will generally not suffice in the case of climate models with multiple nonlinearities, such as the ECHAM models with different convective parameterizations, coupled to one another by nudging with finite coefficients. With climate models that are fully connected in such standard ways, among which the large scales synchronize, there would be additional degrees of freedom at the smallest scales and, in general, no definition of supermodel variables, known a priori, that would satisfy a modified set of prognostic equations. We recall that not all variables can synchronize in systems that satisfy different equations given by different subgrid-scale parameterizations, if coefficients remain finite. If the large- and medium-scale variables synchronize, the smallest-scale variables cannot. The only recourse might be a limit of infinite nudging coefficients in which all scales are nudged and synchronize, with nudging terms $C_{ij}(\mathbf{x}_i - \mathbf{x}_j)$ that remain finite as $C_{ij} \to \infty$ and $\mathbf{x}_i - \mathbf{x}_j \to 0$, to compensate for the model differences. In the limiting case, such finite terms can be taken as dynamical variables that satisfy their own prognostic equations. Thus, a combined parameterization is only possible, in general, at the cost of introducing extra degrees of freedom, whether we introduce desynchronized variables or use $C_{ij} \to \infty$.

It is interesting to compare our method with that of stochastic parameterization, in which random variables are added (Palmer et al. 2009; Berner et al. 2017). Specifically, one could consider random variables that are sums of multiplicative noise terms, each random term based on one of the models' instantaneous parameterizations of unresolved processes. In our approach, by contrast, the new variables are deterministically related to the entire history of the models' states.
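The limit of infinite nudging invoked above can be illustrated with a toy pair of Lorenz (1963) models that differ only in the parameter ρ (values hypothetical), each nudged toward the other on all variables with a common coefficient C. The sketch integrates the pair with a fourth-order Runge–Kutta scheme and reports the RMS mismatch between the two states; as C grows, the mismatch shrinks roughly as 1/C while the nudging term C(x_A − x_B) remains roughly constant, which is the finite compensating term described in the text. This is an illustration under these assumptions, not a computation with the climate models.

```python
import math

SIGMA, BETA = 10.0, 8.0 / 3.0
RHO_A, RHO_B = 28.0, 32.0   # hypothetical: the two models differ only in rho

def lorenz(x, y, z, rho):
    return SIGMA * (y - x), x * (rho - z) - y, x * y - BETA * z

def coupled(s, c):
    """Two Lorenz models, each nudged toward the other on all variables with coefficient c."""
    xa, ya, za, xb, yb, zb = s
    fa = lorenz(xa, ya, za, RHO_A)
    fb = lorenz(xb, yb, zb, RHO_B)
    return (fa[0] + c * (xb - xa), fa[1] + c * (yb - ya), fa[2] + c * (zb - za),
            fb[0] + c * (xa - xb), fb[1] + c * (ya - yb), fb[2] + c * (za - zb))

def rk4_step(s, c, dt):
    k1 = coupled(s, c)
    k2 = coupled(tuple(v + 0.5 * dt * k for v, k in zip(s, k1)), c)
    k3 = coupled(tuple(v + 0.5 * dt * k for v, k in zip(s, k2)), c)
    k4 = coupled(tuple(v + dt * k for v, k in zip(s, k3)), c)
    return tuple(v + dt / 6.0 * (a1 + 2 * a2 + 2 * a3 + a4)
                 for v, a1, a2, a3, a4 in zip(s, k1, k2, k3, k4))

def rms_mismatch(c, dt=5e-4, t_total=12.0, t_spinup=2.0):
    """RMS of |x_A - x_B| over the run after spin-up."""
    s = (1.0, 1.0, 20.0, 1.5, 0.5, 22.0)
    n_spin = int(t_spinup / dt)
    acc, count = 0.0, 0
    for n in range(int(t_total / dt)):
        s = rk4_step(s, c, dt)
        if n >= n_spin:
            acc += (s[0] - s[3]) ** 2 + (s[1] - s[4]) ** 2 + (s[2] - s[5]) ** 2
            count += 1
    return math.sqrt(acc / count)

results = {c: rms_mismatch(c) for c in (10.0, 100.0, 1000.0)}
for c, e in results.items():
    print(f"C = {c:6.0f}   rms |x_A - x_B| = {e:.4f}   C * rms = {c * e:.2f}")
```

The time step is chosen small enough for the explicit scheme to stay stable at the largest C; the growing stiffness as C increases is one practical reason the limit is better treated analytically than by brute-force integration.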

Extra degrees of freedom are needed, of course, if we assume a need to represent the physical processes occurring at subgrid scales explicitly. The usual approach would be to increase the spatial resolution of the grid. But we note that we have achieved a single ITCZ by at most doubling the number of degrees of freedom through the use of a separate model. A similar increase in dynamical dimension in the increased resolution approach would require that resolution be increased by

Thus, supermodeling gives an economy of extra degrees of freedom. If we assume, for finite nudging, that synchronized and desynchronized variables segregate according to scale, as is common, we can have unique variables to describe the large and medium scales, corresponding to the putatively synchronized variables in our supermodel. This subsystem is analogous to the single ocean in the COSMOS supermodel. But there would be two or more models of the smallest resolved scales, which need not be connected to one another, but would be connected to the larger scales with variable weights. The proposed architecture, schematized in Fig. 13, emerges as the form of effective parameterization that directly extends the supermodel approach to combining models.
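One way the architecture described above (Fig. 13) might be prototyped is sketched below, with the standard two-scale Lorenz-96 toy system standing in for a climate model. A single set of large-scale variables X is shared; two alternative small-scale models, Y_A and Y_B (differing, hypothetically, only in the time-scale ratio c), are each driven by X but not by each other, and their feedbacks on X are combined with weights w_A and 1 − w_A. The toy system, sizes, parameter values, and function names are all assumptions for illustration.

```python
import numpy as np

K, J = 8, 16                # hypothetical sizes: K large-scale variables, J small-scale ones per sector
F, H, B = 10.0, 1.0, 10.0   # forcing, coupling strength, amplitude ratio (common two-scale Lorenz-96 choices)

def large_scale_tendency(X, feedback):
    """dX_k/dt = X_{k-1}(X_{k+1} - X_{k-2}) - X_k + F - feedback_k, cyclic in k."""
    return np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2)) - X + F - feedback

def small_scale_tendency(Y, X, c):
    """Small-scale model with time-scale ratio c, driven by the shared large scales X."""
    adv = -c * B * np.roll(Y, -1) * (np.roll(Y, -2) - np.roll(Y, 1))
    return adv - c * Y + (H * c / B) * np.repeat(X, J)

def feedback_on_large_scales(Y, c):
    """Sum of each sector's J small-scale variables, scaled as in two-scale Lorenz-96."""
    return (H * c / B) * Y.reshape(K, J).sum(axis=1)

def tendency(state, w_a, c_a, c_b):
    X, Ya, Yb = state
    # The two small-scale models do not see each other; only their feedbacks on X are mixed.
    fb = w_a * feedback_on_large_scales(Ya, c_a) + (1.0 - w_a) * feedback_on_large_scales(Yb, c_b)
    return (large_scale_tendency(X, fb),
            small_scale_tendency(Ya, X, c_a),
            small_scale_tendency(Yb, X, c_b))

def rk4_step(state, dt, *params):
    def shifted(k, h):
        return tuple(s + h * ks for s, ks in zip(state, k))
    k1 = tendency(state, *params)
    k2 = tendency(shifted(k1, dt / 2), *params)
    k3 = tendency(shifted(k2, dt / 2), *params)
    k4 = tendency(shifted(k3, dt), *params)
    return tuple(s + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
                 for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4))

rng = np.random.default_rng(1)
state = (F * rng.random(K), 0.1 * rng.standard_normal(K * J), 0.1 * rng.standard_normal(K * J))
for _ in range(2000):
    state = rk4_step(state, 0.001, 0.5, 10.0, 6.0)   # hypothetical w_A = 0.5, c_A = 10, c_B = 6
```

Here the weight is fixed; in the spirit of supermodeling it would instead be trained against observed statistics of the large scales, a step not shown.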

Near criticality, the two-way interaction between the smallest scales and the larger scales is especially important. The supermodel approach, in our construction, is useful for two related reasons: 1) it uses prior modeling experience to effectively isolate the dimensions in a high-dimensional parameter space along which there is ambiguity in how to parameterize the subgrid scales, so that optimization can be restricted to those dimensions—by training intermodel connections, as in conventional supermodeling; 2) it uses those dimensions to define extra degrees of freedom, in a dynamical parameterization, that capture the part of the subgrid-scale dynamics that is relevant to interscale interactions in a critical state.

## 6. Concluding remarks—Beyond self-organized criticality

We have used the Bak et al. (1987) analysis as a reference point for our investigation of criticality in the two systems studied here. However, we are not wedded to the specific log-linear form of the spectrum that is expected in the original concept of self-organized criticality. That form is too specific: it has been criticized as not even describing the behavior of real sandpiles (Jaeger et al. 1989), a primary example in the original proposal. A multifractal form applies in the more general case discussed by Kadanoff et al. (1989), in which the spectrum may be piecewise linear in log–log coordinates or may even have a continuously varying exponent in the putative power law. Correspondingly, the spatial structures that arise intermittently near a critical state have a unique form, but one more general than that implied by the fractal structure on the surface of Bak et al.'s sandpiles, a generalization that has not heretofore received attention. Blocking patterns and extended warm pools can be viewed as examples of the "dissipative structures" that appear in Prigogine's (Nicolis and Prigogine 1977; Prigogine and Stengers 1979) qualitative approach to the nonequilibrium thermodynamics of open forced-dissipative systems, in which fluctuations play a large role near critical points, as here. But general rules about the form of the structures and their intermittency near a critical point have not been elucidated. In the absence of a general theory, our point about the scaling is simply that, however the true power spectrum deviates from the idealized SOC form, the supermodel spectrum is closer to it than is the spectrum of any of the separate models from which the supermodel is constructed.

Specifically, for the toy quasigeostrophic model, the spectrum (Fig. 10) has a form suggestive of the expected log-linear behavior over a significant range of frequencies and energies. For the constituent models, the behavior appears piecewise multifractal (for strong forcing) at best, or has the expected power-law form over energies and length scales that are too small to be of interest (for no forcing). The multifractal form is in fact common in geophysical turbulence (Schertzer and Lovejoy 2011). For the tropical Pacific SST and its detailed dynamical models, the spectrum (Fig. 12) of the real system exhibits log-linear behavior over a range that is more restricted than in simple models. In reality, and in the supermodel, specific dynamical processes appear to limit the power-law form at the low end of that range—processes that are indeed key to the ENSO phenomenon as known. Thus, while any claim to universality of the power-law form is exaggerated, that form provides a benchmark for the comparison of observed and modeled behavior.
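The distinction drawn above between a single log-linear range and piecewise behavior can be quantified by fitting a spectrum in log–log coordinates with two straight lines and scanning the break point. The sketch below does this for a synthetic spectrum with hypothetical exponents −1 and −3 meeting at f = 0.3. It illustrates the fitting idea only and is not the analysis behind Figs. 10 and 12; real spectra would need smoothing and confidence intervals before such a fit is meaningful.

```python
import numpy as np

def line_fit(x, y):
    """Least-squares line through (x, y); returns the slope and the sum of squared residuals."""
    coef = np.polyfit(x, y, 1)
    return coef[0], float(np.sum((np.polyval(coef, x) - y) ** 2))

def two_segment_loglog_fit(freqs, power, min_points=5):
    """Fit log(power) vs log(freqs) with two independent lines split at a scanned break.

    Returns (break_frequency, slope_below, slope_above) for the split that minimizes the
    total squared residual. The two lines are not forced to meet at the break.
    """
    lf, lp = np.log(freqs), np.log(power)
    best = None
    for i in range(min_points, len(lf) - min_points + 1):
        s_lo, r_lo = line_fit(lf[:i], lp[:i])
        s_hi, r_hi = line_fit(lf[i:], lp[i:])
        if best is None or r_lo + r_hi < best[0]:
            best = (r_lo + r_hi, freqs[i], s_lo, s_hi)
    return best[1:]

# Synthetic spectrum: slope -1 below f0 and -3 above, continuous at f0 (hypothetical values)
f0 = 0.3
freqs = np.logspace(-2, 1, 61)
power = np.where(freqs < f0, freqs ** -1.0, f0 ** 2 * freqs ** -3.0)
f_break, slope_lo, slope_hi = two_segment_loglog_fit(freqs, power)
print(f"break near f = {f_break:.3f}, slopes {slope_lo:.2f} and {slope_hi:.2f}")
```

A single log-linear range would show up as nearly equal fitted slopes on either side of any break, with little reduction in residual compared with one line.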

What we seek to preserve of self-organized criticality is the qualitative assertion that natural systems tend toward critical states, and toward unique, temporally intermittent spatial structures associated with those states. For the two examples given, that would mean that processes that are not represented in the models or are not fully resolved effectively set the values of the forcing strength or the flux weights, respectively, to give criticality and associated structures. The power spectra confirm this tendency, in view of the expected scale relationships near such states. Beyond the scope of this paper is a possible connection with the idea that such critical states arise so as to maximize heat transport from the equator to the poles and the associated principle of maximum entropy production (Paltridge 1975), another qualitative principle in climate science that has yet to yield verifiable predictions. As the proponents of SOC commented in response to criticism, their claim was that SOC could explain power-law behavior when it was observed, not that it universally implied power-law behavior (Bak et al. 1989).

In any case, phenomena of interest in weather and climate tend to involve critical states, which explains the need for the methods of this paper. It is to be remembered that the subgrid-scale processes to which critical states are sensitive are dynamical processes that are only approximately, not completely, slaved to the larger scales. Especially near criticality, they cannot be correctly represented by any diagnostic parameterization. Such situations are common if not universal. Criticality was achieved in the present work not by design, but by crudely training a small number of coefficients combining different models, each of them arguably physical, against observed data. It appears that nature has chosen criticality over simplicity in regard to approximate parameterization schemes. Thus, where limited computational resources necessitate a compact representation of subgrid-scale processes, either supermodeling or other use of empirical methods (i.e., learning) to combine alternative parameterizations is in order.

The cross-pollination in time II scheme of Du and Smith (2017) is based on intermodel data assimilation and thus is regarded here as a form of supermodeling, though it was conceived and developed independently.

Following Duane and Tribbia (2001, 2004), periodic boundary conditions are used in the *y* dimension as well as the *x* dimension. Their use in *y* implies that there are *two* flows, in opposite directions, displaced from one another in latitude. Only half of the full domain is shown in the figures in this paper, so as to focus on a single flow. But here the wavenumber **k** refers to the number of waves per domain width or domain length. So here *k _{y}* is the number of waves across the width of the full domain, that is, twice the number of waves across the width of the displayed channel.

It is important to distinguish the trained supermodel behavior from that of the *interactive ensemble* of Kirtman and Shukla (2002), who also obtained improvement in the representation of ENSO by coupling several atmospheres, there with equal weights, to the same ocean. In that work, the atmospheres were different realizations of the *same* model, so the improvement was due to a reduction in weather noise. Here, training is essential, since the double ITCZ remains when equal weights are used (Fig. 8c), and the improvement cannot be attributed to noise reduction. Further, no improvement in SST pattern similar to that obtained here was reported in that previous work. The *multimodel* interactive ensemble of Kirtman et al. (2003), on the other hand, involved swapping of variables between different models and was a forerunner of supermodeling.

As this article goes to press, it is noted that such an ocean-connected supermodel has recently been constructed by Counillon et al. (2023).

The rule that appears in both references is stated for dynamical equations with any form of dependence on the parameters, but a linear dependence, as in (A1) and (A2), was assumed in the derivation, so we can express the rule as (A3) without loss of generality. Selten et al. (2017) address the restricted case where the Lyapunov function *L* has the usual RMS form.

## Acknowledgments.

Michael Ghil is thanked for useful discussions regarding self-organized criticality and synchronization generally. Andrew Mai is thanked for assistance with software modifications to adapt the connection coefficient in the quasigeostrophic supermodel. The original version of the code used herein for a single QG channel model, and the original suggestion for its use, are due to Joe Tribbia, to whom we are grateful. We thank the three anonymous reviewers for their constructive contributions and the editor, Peter Bartello, for his diligent consideration of all points of view. This work was supported by the European Commission (EU FP7 Grant 266722, ERC Grant 648982 and Marie Sklodowska-Curie Actions—Individual Fellowship Grant 658602), the U.S. Department of Energy (Grant DE-SC0005238), and the National Science Foundation (Grant 2015618).

## Data availability statement.

Software and data relevant to this publication are archived at https://zenodo.org/communities/duaneshenjas2022data.

## APPENDIX

### Synchronization-Based Optimization of Connections in a PDE Supermodel

$\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is the $n$-dimensional state vector. The evolution of the model system,

$p$, and an extra term $c_i(\mathbf{x}, \mathbf{x}^M)$ has been added to represent nudging or other form of data assimilation.

$L(\mathbf{x}, \mathbf{x}^M; t)$ represent the error in the model's state at time $t$. Commonly,

$L \to 0$, regardless of initial states, for a correct choice of parameter $p^M = p$, we add an equation for adaptation of the parameter along with states:

$l$ such that the prognostic equation for the state variable $x_l$ contains the parameter $p$, and $a$ is an arbitrary learning rate. The rule (A3) can be rigorously shown to give asymptotic values of the model parameter that allows the states to synchronize (Duane et al. 2007; Selten et al. 2017).^{5}

$L$. For a supermodel composed of $m$ models, two broad categories are to define the error for each constituent model and then average, i.e.,

$c$, the coefficient of the advective coupling term. The same parameter appears in the two Eqs. (4) for every point in space. So there are two terms in the sum in (A3) for each point in space and an integral over space. The Lyapunov function we choose is a sum of error relative to truth over the two models separately, giving factors of $q^A - q_{\mathrm{obs}}$ and $q^B - q_{\mathrm{obs}}$. The cofactors $f^p$ are $J(\psi^A, q^B - q^A)$ and $J(\psi^B, q^A - q^B)$. The resulting parameter adaptation rule is

$p =$

$w_A$ to be determined, having taken $w_B \equiv 1 - w_A$. We assume, as in Duane and Tribbia (2004), that the dissipative terms $D^A$ and $D^B$ are linear and are of the same form, so we can simplify

$w^A$. The cofactor of $w_A$ in (A5), in each large-scale equation where $w_A$ appears, is

$w^A$, we need to include a nudging term in (A5). While such a term would usually be independent of the parameter to be estimated, as in previous examples, in the present case the nudging strength (in the medium scales) must be kept proportional to the external forcing (in the large scales) for dynamical consistency. With nudging to "truth" included in the forcing for the medium scales with a proportionality factor $\nu$, (A5) becomes

$w^A$ are modified as follows:

$a$.
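To make the structure of the adaptation rule concrete, the sketch below applies a rule of the kind described above (for each variable whose equation contains the parameter, the state error multiplies the parameter's cofactor, scaled by minus the learning rate $a$) to a toy problem: a two-model Lorenz (1963) supermodel with weights $w_A$ and $w_B \equiv 1 - w_A$, nudged toward a "truth" run. It assumes $L = \tfrac{1}{2}|\mathbf{x}^M - \mathbf{x}|^2$, so that the error factor is $x_l^M - x_l$, and that the models differ only in ρ, so that the cofactor of $w_A$ in the $y$ equation is $x^M(\rho_A - \rho_B)$. The parameter values, nudging strength, and learning rate are hypothetical, and this is not the PDE implementation of this appendix. Because the true ρ lies midway between the two model values, $w_A$ should approach 0.5.

```python
SIGMA, BETA = 10.0, 8.0 / 3.0
RHO_TRUE = 28.0            # "truth" (hypothetical)
RHO_A, RHO_B = 24.0, 32.0  # two imperfect models bracketing the truth (hypothetical)
K_NUDGE = 10.0             # nudging of the supermodel toward truth (hypothetical)
A_LEARN = 0.005            # learning rate a (hypothetical)

def lorenz(x, y, z, rho):
    return SIGMA * (y - x), x * (rho - z) - y, x * y - BETA * z

def rhs(s):
    x, y, z, xm, ym, zm, w_a = s
    fx, fy, fz = lorenz(x, y, z, RHO_TRUE)
    # Supermodel tendency w_A f_A + w_B f_B with w_B = 1 - w_A; because rho enters
    # linearly, this equals a single Lorenz tendency with the weighted rho.
    rho_eff = w_a * RHO_A + (1.0 - w_a) * RHO_B
    gx, gy, gz = lorenz(xm, ym, zm, rho_eff)
    gx += K_NUDGE * (x - xm)
    gy += K_NUDGE * (y - ym)
    gz += K_NUDGE * (z - zm)
    # Adaptation: -a * (error in y) * (cofactor of w_A in the y equation)
    dw = -A_LEARN * (ym - y) * xm * (RHO_A - RHO_B)
    return (fx, fy, fz, gx, gy, gz, dw)

def rk4_step(s, dt):
    k1 = rhs(s)
    k2 = rhs(tuple(v + 0.5 * dt * k for v, k in zip(s, k1)))
    k3 = rhs(tuple(v + 0.5 * dt * k for v, k in zip(s, k2)))
    k4 = rhs(tuple(v + dt * k for v, k in zip(s, k3)))
    return tuple(v + dt / 6.0 * (a1 + 2 * a2 + 2 * a3 + a4)
                 for v, a1, a2, a3, a4 in zip(s, k1, k2, k3, k4))

state = (1.0, 1.0, 20.0, -3.0, 2.0, 25.0, 0.0)   # truth, supermodel, initial w_A = 0
dt = 0.002
for _ in range(int(30.0 / dt)):
    state = rk4_step(state, dt)
w_a = state[-1]
print(f"learned w_A = {w_a:.4f}, w_B = {1.0 - w_a:.4f}")
```

The learning rate is kept small enough that the weight adjusts more slowly than the nudged states synchronize, which keeps the coupled state–parameter system well behaved.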

## REFERENCES

Afraimovich, V. S., N. N. Verichev, and M. I. Rabinovich, 1986: Stochastic synchronization of oscillation in dissipative systems. *Radiophys. Quantum Electron.*, **29**, 795–803, https://doi.org/10.1007/BF01034476.

Andrade, J. S., Jr., I. Wainer, J. M. Filho, and J. E. Moreira, 1995: Self-organized criticality in the El Niño Southern Oscillation. *Physica A*, **215**, 331–338, https://doi.org/10.1016/0378-4371(95)00004-Q.

Bak, P., C. Tang, and K. Wiesenfeld, 1987: Self-organized criticality: An explanation of 1/*f* noise. *Phys. Rev. Lett.*, **59**, 381–384, https://doi.org/10.1103/PhysRevLett.59.381.

Bak, P., C. Tang, and K. Wiesenfeld, 1989: Comment on "Relaxation at the angle of repose." *Phys. Rev. Lett.*, **62**, 110, https://doi.org/10.1103/PhysRevLett.62.110.

Berner, J., and Coauthors, 2017: Stochastic parameterization: Toward a new view of weather and climate models. *Bull. Amer. Meteor. Soc.*, **98**, 565–588, https://doi.org/10.1175/BAMS-D-15-00268.1.

Branstator, G., 1995: Organization of storm-track anomalies by recurring low-frequency circulation anomalies. *J. Atmos. Sci.*, **52**, 207–226, https://doi.org/10.1175/1520-0469(1995)052<0207:OOSTAB>2.0.CO;2.

Counillon, F., N. Keenlyside, S. Wang, M. Devilliers, A. Gupta, S. Koseki, and M.-L. Shen, 2023: Framework for an ocean-connected supermodel of the Earth system. *J. Adv. Model. Earth Syst.*, **15**, e2022MS003310, https://doi.org/10.1029/2022MS003310.

Du, H., and L. A. Smith, 2017: Multi-model cross-pollination in time. *Physica D*, **353–354**, 31–38, https://doi.org/10.1016/j.physd.2017.06.001.

Duane, G. S., 2015: Synchronicity from synchronized chaos. *Entropy*, **17**, 1701–1733, https://doi.org/10.3390/e17041701.

Duane, G. S., and J. J. Tribbia, 2001: Synchronized chaos in geophysical fluid dynamics. *Phys. Rev. Lett.*, **86**, 4298–4301, https://doi.org/10.1103/PhysRevLett.86.4298.

Duane, G. S., and J. J. Tribbia, 2004: Weak Atlantic–Pacific teleconnections as synchronized chaos. *J. Atmos. Sci.*, **61**, 2149–2168, https://doi.org/10.1175/1520-0469(2004)061<2149:WATASC>2.0.CO;2.

Duane, G. S., J. J. Tribbia, and J. B. Weiss, 2006: Synchronicity in predictive modelling: A new view of data assimilation. *Nonlinear Processes Geophys.*, **13**, 601–612, https://doi.org/10.5194/npg-13-601-2006.

Duane, G. S., D.-C. Yu, and L. Kocarev, 2007: Identical synchronization, with translation invariance, implies parameter estimation. *Phys. Lett.*, **371A**, 416–420, https://doi.org/10.1016/j.physleta.2007.06.059.

Duane, G. S., C. Grabow, F. Selten, and M. Ghil, 2017: Introduction to focus issue: Synchronization in large networks and continuous media—Data, models, and supermodels. *Chaos*, **27**, 126601, https://doi.org/10.1063/1.5018728.

Fujisaka, H., and T. Yamada, 1983: Stability theory of synchronized motion in coupled-oscillator systems. *Prog. Theor. Phys.*, **69**, 32–47, https://doi.org/10.1143/PTP.69.32.

Ghil, M., and S. Childress, 1987: *Topics in Geophysical Fluid Dynamics: Atmospheric Dynamics, Dynamo Theory and Climate Dynamics*. Applied Mathematical Sciences, Vol. 60, Springer-Verlag, 485 pp.

Giorgetta, M. A., and Coauthors, 2013: Climate and carbon cycle changes from 1850 to 2100 in MPI-ESM simulations for the Coupled Model Intercomparison Project phase 5. *J. Adv. Model. Earth Syst.*, **5**, 572–597, https://doi.org/10.1002/jame.20038.

Harrison, D. E., and G. A. Vecchi, 1997: Westerly wind events in the tropical Pacific, 1986–95. *J. Climate*, **10**, 3131–3156, https://doi.org/10.1175/1520-0442(1997)010<3131:WWEITT>2.0.CO;2.

Huang, B., C. Liu, V. Banzon, E. Freeman, G. Graham, B. Hankins, T. Smith, and H.-M. Zhang, 2021: Improvements of the Daily Optimum Interpolation Sea Surface Temperature (DOISST) version 2.1. *J. Climate*, **34**, 2923–2939, https://doi.org/10.1175/JCLI-D-20-0166.1.

Jaeger, H. M., C.-H. Liu, and S. R. Nagel, 1989: Relaxation at the angle of repose. *Phys. Rev. Lett.*, **62**, 40–43, https://doi.org/10.1103/PhysRevLett.62.40.

Jin, F.-F., 1997: An equatorial ocean recharge paradigm for ENSO. Part I: Conceptual model. *J. Atmos. Sci.*, **54**, 811–829, https://doi.org/10.1175/1520-0469(1997)054<0811:AEORPF>2.0.CO;2.

Jin, F.-F., 1998: A simple model for the Pacific cold tongue and ENSO. *J. Atmos. Sci.*, **55**, 2458–2469, https://doi.org/10.1175/1520-0469(1998)055<2458:ASMFTP>2.0.CO;2.

Kadanoff, L. P., S. R. Nagel, C.-H. Liu, and S. M. Zhou, 1989: Scaling and universality in avalanches. *Phys. Rev. A*, **39**, 6524–6537, https://doi.org/10.1103/PhysRevA.39.6524.

Kirtman, B. P., 1997: Oceanic Rossby wave dynamics and the ENSO period in a coupled model. *J. Climate*, **10**, 1690–1704, https://doi.org/10.1175/1520-0442(1997)010<1690:ORWDAT>2.0.CO;2.

Kirtman, B. P., and J. Shukla, 2002: Interactive coupled ensemble: A new coupling strategy for CGCMs. *Geophys. Res. Lett.*, **29**, 1367, https://doi.org/10.1029/2002GL014834.

Kirtman, B. P., D. Min, P. S. Schopf, and E. K. Schneider, 2003: A new approach for CGCM sensitivity studies. COLA Tech. Rep. 154, 50 pp., http://www.m.monsoondata.org/pubs/tech.html.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.

Mechoso, C. R., and Coauthors, 1995: The seasonal cycle over the tropical Pacific in coupled ocean–atmosphere general circulation models. *Mon. Wea. Rev.*, **123**, 2825–2838, https://doi.org/10.1175/1520-0493(1995)123<2825:TSCOTT>2.0.CO;2.

Met Office Hadley Centre, 2013: HadISST1 data. Accessed 5 February 2013, https://www.metoffice.gov.uk/hadobs/hadisst/data/download.html.

Mirchev, M., G. S. Duane, W. K. Tang, and L. Kocarev, 2012: Improved modeling by coupling imperfect models. *Commun. Nonlinear Sci. Numer. Simul.*, **17**, 2741–2751, https://doi.org/10.1016/j.cnsns.2011.11.003.

Moore, A. M., and R. Kleeman, 1999: Stochastic forcing of ENSO by the intraseasonal oscillation. *J. Climate*, **12**, 1199–1220, https://doi.org/10.1175/1520-0442(1999)012<1199:SFOEBT>2.0.CO;2.

Namias, J., 1951: The index cycle and its role in the general circulation. *J. Meteor.*, **7**, 130–139, https://doi.org/10.1175/1520-0469(1950)007<0130:TICAIR>2.0.CO;2.

Nicolis, G., and I. Prigogine, 1977: *Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuations*. Wiley, 491 pp.

Nordeng, T. E., 1994: Extended versions of the convective parametrization scheme at ECMWF and their impact on the mean and transient activity of the model in the tropics. ECMWF Tech. Memo. 206, 42 pp., https://www.ecmwf.int/en/elibrary/75843-extended-versions-convective-parametrization-scheme-ecmwf-and-their-impact-mean.

Palmer, T. N., R. Buizza, F. Doblas-Reyes, T. Jung, M. Leutbecher, G. Shutts, M. Steinheimer, and A. Weisheimer, 2009: Stochastic parametrization and model uncertainty. ECMWF Tech. Memo. 598, 44 pp., www.ecmwf.int/sites/default/files/elibrary/2009/11577-stochastic-parametrization-and-model-uncertainty.pdf.

Paltridge, G. W., 1975: Global dynamics and climate—A system of minimum entropy exchange. *Quart. J. Roy. Meteor. Soc.*, **101**, 475–484, https://doi.org/10.1002/qj.49710142906.

Pecora, L. M., and T. L. Carroll, 1990: Synchronization in chaotic systems. *Phys. Rev. Lett.*, **64**, 821–824, https://doi.org/10.1103/PhysRevLett.64.821.

Pecora, L. M., T. L. Carroll, G. A. Johnson, D. J. Mar, and J. F. Heagy, 1997: Fundamentals of synchronization in chaotic systems, concepts, and applications. *Chaos*, **7**, 520–543, https://doi.org/10.1063/1.166278.

Physical Sciences Laboratory, 2013: GPCP version 2.3 combined precipitation data set. NOAA/OAR/ESRL, accessed 5 February 2013, https://psl.noaa.gov/data/gridded/data.gpcp.html.

Physical Sciences Laboratory, 2016: NOAA OI SST V2 high resolution dataset. NOAA/OAR/ESRL, accessed 21 April 2022, https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.highres.html.

Prigogine, I., and I. Stengers, 1979: *The New Alliance*. Gallimard, 312 pp.

Reichler, T., and J. Kim, 2008: How well do coupled models simulate today's climate? *Bull. Amer. Meteor. Soc.*, **89**, 303–311, https://doi.org/10.1175/BAMS-89-3-303.

Reynolds, R. W., T. M. Smith, C. Liu, D. B. Chelton, K. S. Casey, and M. G. Schlax, 2007: Daily high-resolution-blended analyses for sea surface temperature. *J. Climate*, **20**, 5473–5496, https://doi.org/10.1175/2007JCLI1824.1.

Rulkov, N. F., M. M. Sushchik, and L. S. Tsimring, 1995: Generalized synchronization of chaos in directionally coupled chaotic systems. *Phys. Rev. E*, **51**, 980–994, https://doi.org/10.1103/PhysRevE.51.980.

Schertzer, D., and S. Lovejoy, 2011: Multifractals, generalized scale invariance, and complexity in geophysics. *Int. J. Bifurcation Chaos*, **21**, 3417–3456, https://doi.org/10.1142/S0218127411030647.

Schevenhoven, F., and A. Carrassi, 2022: Training a supermodel with noisy and sparse observations: A case study with CPT and the synch rule on SPEEDO. *Geosci. Model Dev.*, **15**, 3831–3844, https://doi.org/10.5194/gmd-15-3831-2022.

Schevenhoven, F., F. Selten, A. Carrassi, and N. Keenlyside, 2019: Improving weather and climate predictions by training of supermodels. *Earth Syst. Dyn.*, **10**, 789–807, https://doi.org/10.5194/esd-10-789-2019.

Selten, F. M., F. J. Schevenhoven, and G. S. Duane, 2017: Simulating climate with a synchronization-based supermodel. *Chaos*, **27**, 126903, https://doi.org/10.1063/1.4990721.

Severijns, C. A., and W. Hazeleger, 2010: The efficient global primitive equation climate model SPEEDO V2.0. *Geosci. Model Dev.*, **3**, 105–122, https://doi.org/10.5194/gmd-3-105-2010.

Shen, M.-L., N. Keenlyside, F. M. Selten, W. Wiegerinck, and G. S. Duane, 2016: Dynamically combining climate models to "supermodel" the tropical Pacific. *Geophys. Res. Lett.*, **43**, 359–366, https://doi.org/10.1002/2015GL066562.

Shen, M.-L., N. Keenlyside, B. C. Bhatt, and G. S. Duane, 2017: Role of atmosphere-ocean interactions in supermodeling the tropical Pacific climate. *Chaos*, **27**, 126704, https://doi.org/10.1063/1.4990713.

Tanguay, M., P. Bartello, and P. Gauthier, 1995: Four-dimensional data assimilation with a wide range of scales. *Tellus*, **47A**, 974–997, https://doi.org/10.3402/tellusa.v47i5.11967.

Temam, R., 1988: *Infinite-Dimensional Dynamical Systems in Mechanics and Physics*. Springer-Verlag, 650 pp.

Tiedtke, M., 1989: A comprehensive mass-flux scheme for cumulus parameterization in large-scale models. *Mon. Wea. Rev.*, **117**, 1779–1800, https://doi.org/10.1175/1520-0493(1989)117<1779:ACMFSF>2.0.CO;2.

van den Berge, L. A., F. M. Selten, W. Wiegerinck, and G. S. Duane, 2011: A multi-model ensemble method that combines imperfect models through learning. *Earth Syst. Dyn.*, **2**, 161–177, https://doi.org/10.5194/esd-2-161-2011.

Vautard, R., and B. Legras, 1988: On the source of midlatitude low-frequency variability. Part II: Nonlinear equilibration of weather regimes. *J. Atmos. Sci.*, **45**, 2845–2867, https://doi.org/10.1175/1520-0469(1988)045<2845:OTSOML>2.0.CO;2.

Vautard, R., B. Legras, and M. Déqué, 1988: On the source of midlatitude low-frequency variability. Part I: A statistical approach to persistence. *J. Atmos. Sci.*, **45**, 2811–2843, https://doi.org/10.1175/1520-0469(1988)045<2811:OTSOML>2.0.CO;2.

Wiegerinck, W., and F. M. Selten, 2017: Attractor learning in synchronized chaotic systems in the presence of unresolved scales. *Chaos*, **27**, 126901, https://doi.org/10.1063/1.4990660.

Yang, S.-C., and Coauthors, 2006: Data assimilation as synchronization of truth and model: Experiments with the three-variable Lorenz system. *J. Atmos. Sci.*, **63**, 2340–2354, https://doi.org/10.1175/JAS3739.1.

Zebiak, S. E., and M. A. Cane, 1987: A model El Niño–Southern Oscillation. *Mon. Wea. Rev.*, **115**, 2262–2278, https://doi.org/10.1175/1520-0493(1987)115<2262:AMENO>2.0.CO;2.

Zhang, X., W. Lin, and M. Zhang, 2007: Toward understanding the double intertropical convergence zone pathology in coupled ocean-atmosphere general circulation models. *J. Geophys. Res.*, **112**, D12102, https://doi.org/10.1029/2006JD007878.