1. Introduction
An energy balance model (EBM) is a simplified representation of climate where changes in global temperature are explained by imbalances in Earth’s energy budget. Energy balance models are simpler than atmosphere–ocean general circulation models (AOGCMs), which explicitly describe the fluid dynamics of Earth’s atmosphere and oceans. Their simplicity means that EBMs are both analytically tractable and inexpensive to simulate. Compared with purely empirical statistical models, EBMs have two distinct advantages: 1) the choice of model structure is motivated by physical reasoning, and 2) model parameters have physical interpretability. Energy balance models are therefore useful not only for climate forecasting but also for making physical inferences about the climate system.
Energy balance models in the literature vary in complexity. The class of EBM considered here is the k-box model (sometimes called k-layer), which represents the atmosphere and ocean as a set of vertically stacked boxes. The simplest k-box model is the so-called one-box model, which is obtained by a linearization of the Budyko–Sellers model (Budyko 1969; Sellers 1969). The one-box model is known to insufficiently capture thermal inertia in the climate response and has been superseded by the two-box model (Gregory 2000; Held et al. 2010; Geoffroy et al. 2013a). Some recent studies have employed three-box models (Caldeira and Myhrvold 2013; Tsutsui 2016; Proistosescu and Huybers 2017; Fredriksen and Rypdal 2017). By taking the limit as k → ∞ it is possible to approximate continuous vertical heat diffusion.
The k-box energy balance model (Fig. 1) used in this study is defined by the system of k linear differential equations:
The first box represents the atmosphere and uppermost layer of the ocean, while boxes 2 to k together represent the deep ocean. Each box i has a temperature Ti and heat capacity Ci and is coupled to adjacent boxes above and below; T1 is defined to be global mean surface temperature (GMST) anomaly relative to preindustrial. Heat transfer coefficients κi > 0 determine the strength of thermal coupling between boxes i and i − 1. In the literature κ1 is often written as λ and is referred to as the climate feedback parameter (e.g., Geoffroy et al. 2013a). We follow the convention of Fredriksen and Rypdal (2017) and use the letter κ for both climate feedback and heat uptake by the deep ocean. The heat transfer coefficient κk in the equation for box k − 1 is multiplied by a so-called efficacy factor ε > 0, introduced by Held et al. (2010), to simulate variation in the effective strength of κ1 during periods of transient (nonequilibrium) warming. The term F(t) denotes radiative forcing measured at the top of the atmosphere and ξ(t) is a stochastic disturbance (see below). Table 1 contains physical units and a brief description of each parameter.
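For orientation, the description above is consistent with a system of the following form for k ≥ 3. This is a sketch reconstructed from the surrounding text, not a quotation of the displayed equations: it assumes that κi couples boxes i − 1 and i, that ε multiplies κk only in the equation for box k − 1, and that ξ(t) enters the first box alongside F(t).

```latex
\begin{align*}
C_1 \frac{dT_1}{dt} &= F(t) - \kappa_1 T_1 - \kappa_2 (T_1 - T_2) + \xi(t), \\
C_i \frac{dT_i}{dt} &= \kappa_i (T_{i-1} - T_i) - \kappa_{i+1} (T_i - T_{i+1}), \qquad 1 < i < k - 1, \\
C_{k-1} \frac{dT_{k-1}}{dt} &= \kappa_{k-1} (T_{k-2} - T_{k-1}) - \varepsilon \kappa_k (T_{k-1} - T_k), \\
C_k \frac{dT_k}{dt} &= \kappa_k (T_{k-1} - T_k).
\end{align*}
```

Under the same assumptions, the two-box case (k = 2) merges the first and third lines, so that the first box obeys C1 dT1/dt = F(t) − κ1T1 − εκ2(T1 − T2) + ξ(t).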

Fig. 1. Vertical layout of the boxes in the k-box energy balance model. The thickness of each box indicates its heat capacity, and the arrows represent the flow of heat between adjacent boxes. The top of the atmosphere has no heat capacity and so is represented by a horizontal line. The dashed line in the middle is an abbreviation of the intervening boxes.
Citation: Journal of Climate 33, 18; 10.1175/JCLI-D-19-0589.1

Table 1. Parameters of the k-box model with physical units and descriptions.


Natural variability in GMST can be partially explained within the EBM framework using a stochastic process in the radiative forcing term (Hasselmann 1976). To enforce continuity of F in time we model F(t) as a red-noise process:
where Fdet(t) and η(t) are the respective deterministic and stochastic forcing components. Here we assume η(t) to be a Gaussian white-noise (WN) process with mean zero and standard deviation ση. In the limit as γ → ∞ the stochastic forcing becomes white noise, whereas if γ → 0 we have a random walk. Interannual variation in radiative forcing is insufficient to explain all of the natural variability in surface temperature. Residual surface temperature variability is explained here by a Gaussian WN disturbance ξ(t) with mean zero and standard deviation σξ. The term ξ(t) functions like an external forcing but is not measurable at the top of the atmosphere since it represents dynamic variability, which is generated internally.
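The behaviour of the red-noise forcing can be explored numerically. The sketch below assumes the forcing obeys dF/dt = −γ[F(t) − Fdet] + η(t) with a constant Fdet, which matches the limits described above but is an assumption about the exact form; the function name and arguments are hypothetical. Under that assumption the process can be sampled exactly at a fixed time step.

```python
import math
import random


def simulate_red_noise_forcing(f_det, gamma, sigma_eta, n_steps, dt=1.0, seed=0):
    """Sample F(t) every dt, assuming dF/dt = -gamma * (F - f_det) + eta(t).

    Assumes a constant deterministic forcing f_det and gamma > 0.
    Returns a list of n_steps + 1 values, starting at f_det.
    """
    rng = random.Random(seed)
    # One-step autocorrelation of the sampled process.
    phi = math.exp(-gamma * dt)
    # Variance accumulated by the white-noise input over one step.
    step_var = sigma_eta ** 2 * (1.0 - phi ** 2) / (2.0 * gamma)
    step_sd = math.sqrt(step_var)

    f = f_det
    path = [f]
    for _ in range(n_steps):
        # Relax towards f_det, then add the accumulated disturbance.
        f = f_det + phi * (f - f_det) + rng.gauss(0.0, step_sd)
        path.append(f)
    return path
```

For large γ the autocorrelation exp(−γΔt) between successive samples tends to zero (white noise), while for small γ it tends to one (random walk), in line with the limits stated above.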
As parameters of the k-box EBM do not correspond to well-defined physical quantities in the real world, it is not possible to calculate realistic parameter values directly from first principles. Parameter values must instead be estimated empirically from data. In this paper a maximum likelihood method is presented for estimating parameters of k-box models. The structure of the paper is as follows: section 2 provides a summary and critique of some methods previously employed to fit box models; section 3 describes data requirements for successful parameter estimation and the specific data from phase 5 of the Coupled Model Intercomparison Project (CMIP5) used in this study; section 4 outlines the proposed maximum likelihood framework; section 5 describes a software tool created for applying the method described in this paper; section 6 evaluates the robustness of the proposed method in a simulation study; section 7 explains how the method was applied to climate model data from CMIP5 and presents an analysis of the results; and the content of the paper is summarized in section 8.
2. Methods for fitting k-box energy balance models
Maximum likelihood estimation is simple for one-box model parameters: given uniformly sampled data, estimation reduces to an ordinary least squares problem with a closed-form solution (Rypdal and Rypdal 2014). When more boxes are added, latent variables appear and the estimation problem becomes more difficult. Several methods have been proposed in the literature for estimating parameters of box models with k ≥ 2, including least squares curve fitting (Geoffroy et al. 2013a; Caldeira and Myhrvold 2013), frequency-domain regression (Fredriksen and Rypdal 2017), and Bayesian estimation (Proistosescu and Huybers 2017; Jonko et al. 2018). Box models have previously been fitted to the historical record, to paleoclimate reconstructions, and to data from general circulation model experiments. Three examples of existing methods are described below. The first method described, proposed by Geoffroy et al. (2013a), is compared in section 7 with the new method proposed in this paper.
Geoffroy et al. (2013a) derived explicit time-dependent solutions for the two-box model under purely deterministic forcing scenarios. They proposed a procedure for estimating model parameters using measurements of GMST and top-of-the-atmosphere (TOA) net downward radiative flux (see section 3) from the step responses of AOGCMs in CMIP5. Their method uses prior information about characteristic time scales to estimate the model parameters in sequence, with the sum of squared residuals as the criterion to be minimized. The time-dependent solution of the two-box model is a sum of saturating exponentials, and estimating the parameters of such a sum simultaneously by nonlinear least squares is a notoriously difficult problem (Kaufmann 2003); sequential estimation avoids this difficulty. In a companion paper Geoffroy et al. (2013b) added a deep ocean heat uptake efficacy factor ε to their model, requiring the use of iteration in their fitting procedure. Geoffroy et al. (2013a) did not specify an error model; however, their least squares fitting criterion would correspond to maximum likelihood estimation under the assumption that errors are independent and identically distributed (i.i.d.) and Gaussian. We have found this assumption to be inconsistent with time series of residuals obtained by subtracting fitted two-box model trajectories from AOGCM step responses: such residual time series exhibit strong autocorrelation. Without specifying an error model it is also impossible to correctly construct confidence intervals for parameter estimates.
Fredriksen and Rypdal (2017) estimated parameters of a three-box model with natural variability driven by a Gaussian WN process in the forcing term. They proposed an iterative least squares-based fitting algorithm to estimate the model parameters. Their method alternates between fitting the signal (expected temperature series for the first box) in the time domain and fitting the noise (time series of residuals) in the frequency domain. Fredriksen and Rypdal (2017) estimated model parameters using estimates of GMST from HadCRUT4 (Morice et al. 2012) and the Moberg et al. (2005) paleoclimate reconstructions, and forcing estimates from Crowley (2000) and Hansen et al. (2011). Unlike the other studies cited in this section, Fredriksen and Rypdal (2017) estimated parameters of box models (k ≥ 2) without access to measurements of TOA net downward radiative flux, as they were fitting to historical datasets. Only a subset of the model parameters was estimated from data since, without radiative flux measurements, a wide range of possible values for the three characteristic time scales τ1, τ2, and τ3 was found to be equally compatible with the observations. In their analysis three candidate time scale configurations were chosen and the remaining parameters estimated. An important result of Fredriksen and Rypdal (2017) is that the stochastically forced three-box model produces a similar noise spectrum to so-called scale-invariant models, a related class of simple climate model. Parameters of scale-invariant models have been estimated by maximum likelihood (Rypdal and Rypdal 2014) and more recently using Bayesian inference (Rypdal et al. 2018). The method of Rypdal et al. (2018) is generally applicable to linear response models, including box models, although the authors only present results for the scale-invariant model.
Jonko et al. (2018) estimated parameters of the two-box model using Bayesian hierarchical methods. In their model likelihood the variability in observed temperatures T1(t) and TOA net downward radiative flux N(t) are jointly modeled as a vector autoregressive process of order one [VAR(1)]. All VAR(1) correlations are considered free parameters, not constrained by the physical parameters of the EBM. Given prior distributions for parameters to be estimated, Markov chain Monte Carlo (MCMC) is used to form an approximation to the posterior distribution. Jonko et al. (2018) used their method to pool information from 24 AOGCM step responses and produce a joint posterior for equilibrium climate sensitivity (ECS). They also included time series of historical temperature observations in their model likelihood to further constrain estimates of future warming. By opting not to include a stochastic forcing term Jonko et al. (2018) increase the number of parameters to estimate and lose physical motivation for the natural temperature variability in their model.
Of the approaches considered above, none can be considered optimal in the sense of maximum likelihood or sampling from the posterior distribution of the full, stochastic k-box energy balance model. We therefore propose to develop a maximum likelihood method for estimating stochastic k-box models with k ≥ 2. Maximum likelihood estimators are widely used and have known asymptotic sampling properties allowing for simple quantification of uncertainty. Furthermore, optimal complexity of maximum likelihood models can be identified using information criteria.
3. Step response and CMIP5 data
The k-box model is a linear time-invariant system and is therefore completely characterized by its impulse response or alternatively its step response (of which the impulse response is the time derivative). The step response contains information about model behavior on all relevant time scales. The CMIP5 archive includes experiments (Taylor et al. 2012) designed to elicit the step response of AOGCMs by subjecting them to a step forcing of the form
The forcing is achieved by an instant quadrupling of atmospheric carbon dioxide (CO2) concentration. The reasoning behind this choice of forcing is that the amplitude should be large enough that the signal-to-noise ratio is high, but small enough not to induce strongly nonlinear behavior such as tipping points. Ideally the step-forcing experiment would be long enough for the system to stabilize at a new equilibrium temperature and multiple ensemble runs would be available for each AOGCM. However, since Earth system models (ESMs) are expensive to run, the step-forcing experiments in CMIP5 are typically 150 years in length and consist of a single ensemble member. These experiments nevertheless constitute the most information-rich datasets from which to infer the parameters of k-box models and simple climate models in general. The output of an AOGCM step-forcing experiment can even be used on its own to make climate predictions by convolving it with a forcing signal of interest (Good et al. 2011; Lucarini et al. 2017).
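The convolution idea mentioned above can be illustrated with a discrete superposition of step responses. This is a minimal sketch that assumes a linear, time-invariant response and zero forcing before the first sample; the function name and the indexing convention are hypothetical and are not taken from the cited studies.

```python
def respond_to_forcing(step_response, step_amplitude, forcing):
    """Predict a temperature series for an arbitrary forcing series.

    step_response[m] is the temperature m steps after a step forcing of
    size step_amplitude is switched on (m = 0 is the step of onset).
    forcing[t] is the forcing at step t, assumed zero before t = 0.
    """
    n = min(len(step_response), len(forcing))
    # Forcing increments: each one launches a scaled, delayed step response.
    increments = [forcing[0]] + [forcing[t] - forcing[t - 1] for t in range(1, n)]
    temps = []
    for t in range(n):
        total = 0.0
        for j in range(t + 1):
            total += increments[j] * step_response[t - j] / step_amplitude
        temps.append(total)
    return temps
```

Feeding the step forcing itself back through the function returns the original step response, which is a convenient sanity check on the indexing.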
The models in CMIP5 have equilibration times in the thousands of years, meaning that a 150-yr time series of temperatures contains insufficient information to identify all model parameters. Attempting to fit to such datasets results in strongly correlated parameter estimates with correspondingly large uncertainty. This difficulty can be overcome by using measurements of net downward radiative flux at the top of the atmosphere (TOA) to constrain κ1. Using Eqs. (1) and (3) we extract the relation
where N(t) denotes the TOA net downward radiative flux. If the system is in equilibrium at time t, that is, Tk−1(t) = Tk(t), and/or if ε = 1, Eq. (7) reduces to the traditional Gregory relation N(t) = F(t) − κ1T1(t) (Gregory et al. 2004). Note that, since fitting to 4 × CO2 experiments is essentially not feasible without measurements of N(t), fitting to historical temperature observations with all parameters free is unlikely to produce meaningfully constrained estimates.
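A small numerical check makes the two special cases above concrete. The sketch assumes the relation takes the form N = F − κ1T1 + (1 − ε)κk(Tk−1 − Tk), which is what results from summing the heat uptake of all boxes when ε multiplies κk only in the equation for box k − 1; this form is a reconstruction consistent with the stated limits rather than a quotation. The function name and the forcing and temperature values in the usage note are hypothetical.

```python
def toa_net_flux(forcing, temps, kappa1, kappa_k, epsilon):
    """TOA net downward flux for a k-box model with k >= 2.

    Assumed form: N = F - kappa1*T1 + (1 - epsilon)*kappa_k*(T_{k-1} - T_k).
    temps lists the box temperatures from the surface box downward.
    """
    t1 = temps[0]
    t_km1, t_k = temps[-2], temps[-1]
    return forcing - kappa1 * t1 + (1.0 - epsilon) * kappa_k * (t_km1 - t_k)


def gregory_flux(forcing, t1, kappa1):
    """Traditional Gregory relation N = F - kappa1*T1."""
    return forcing - kappa1 * t1
```

With ε = 1, or with equal temperatures in the two deepest boxes, `toa_net_flux` returns the same value as `gregory_flux`. With ε = 1.52, κ1 = 0.632, κ2 = 0.522 and illustrative values F = 6.0, T = (3.0, 1.0), the efficacy term lowers N from 4.104 to about 3.561.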
4. Maximum likelihood framework
Computing the likelihood function for the k-box model is nontrivial. We typically observe the temperature of only the first box and hence for k ≥ 2 at least half of the model state variables are unobserved (latent). In this section we start by obtaining a rigorous state-space formulation of the k-box model. We then show how the likelihood of this state-space representation can be evaluated recursively using the Kalman filter. Numerical maximization of the likelihood is briefly described and a method for constructing confidence intervals given. Finally, we explain how optimal model complexity can be identified using information criteria.
a. Matrix representation
The purely deterministic, homogeneous (externally and internally unforced) k-box model with ε = 1 can be written in matrix form:
where
and
For ε ≠ 1 two entries in the penultimate row of
Analysis of the full inhomogeneous, stochastic k-box model is simplified by the inclusion of radiative forcing F as a state variable. Defining the state vector
we can write the full model
where
and where
and
with
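As a concrete illustration of the matrix form, the sketch below assembles a tridiagonal matrix A such that dT/dt = AT for the unforced deterministic model. It assumes that κi couples boxes i − 1 and i, with κ1 acting as the feedback to space, and that ε multiplies κk only in the row for box k − 1; the function name and the use of a single matrix A are choices made here for illustration and may differ from the notation of the original equations.

```python
def kbox_matrix(heat_caps, kappas, epsilon=1.0):
    """Return A (list of rows) with dT/dt = A T for the unforced k-box model.

    heat_caps = [C1, ..., Ck], kappas = [kappa1, ..., kappak].
    """
    k = len(heat_caps)
    if len(kappas) != k:
        raise ValueError("need one heat transfer coefficient per box")
    a = [[0.0] * k for _ in range(k)]
    for i in range(k):
        c = heat_caps[i]
        # Exchange with the box above (or loss to space for the first box).
        a[i][i] -= kappas[i] / c
        if i > 0:
            a[i][i - 1] += kappas[i] / c
        # Exchange with the box below, if there is one.
        if i < k - 1:
            down = kappas[i + 1]
            if i == k - 2:
                # Efficacy factor applies only in the penultimate row.
                down *= epsilon
            a[i][i] -= down / c
            a[i][i + 1] += down / c
    return a
```

Setting ε ≠ 1 changes exactly two entries, the diagonal and superdiagonal elements of the penultimate row, which agrees with the remark above.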
b. Discretization scheme
The continuous-time model is a system of stochastic differential equations and may be analyzed using the tools of stochastic calculus. However, if observations consist of uniformly spaced discrete samples then it makes sense to discretize the model (see section 7 for details of sampled data used in this study). Assuming constancy of the deterministic forcing input u(t) = Fdet(t) between samples, the model can be discretized exactly (see appendix B)
where
with subscript d denoting discretization. The integral in Eq. (23) can be evaluated via the matrix exponential method described in section 1 of Van Loan (1978).
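The matrix exponential approach can be sketched in a few lines. The example assumes the integral in question is the discrete-time noise covariance Qd = ∫ exp(As) Q exp(Aᵀs) ds over one sampling interval, and evaluates it with the block-matrix construction of Van Loan (1978). The helper names are hypothetical, and the small Taylor-series matrix exponential is included only to keep the example self-contained; it is not the implementation used in this study.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(m, terms=20):
    """Matrix exponential by scaling and squaring with a Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2.0 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms + 1):
        term = [[x / order for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def van_loan_discretize(a, q, dt):
    """Return (A_d, Q_d) with A_d = exp(A dt) and Q_d the integrated noise covariance."""
    n = len(a)
    # Block matrix [[-A, Q], [0, A^T]] scaled by dt.
    block = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            block[i][j] = -a[i][j] * dt
            block[i][n + j] = q[i][j] * dt
            block[n + i][n + j] = a[j][i] * dt
    e = mat_exp(block)
    # Lower-right block is exp(A^T dt); its transpose is A_d.
    a_d = [[e[n + j][n + i] for j in range(n)] for i in range(n)]
    # Upper-right block G satisfies Q_d = A_d G.
    g = [[e[i][n + j] for j in range(n)] for i in range(n)]
    return a_d, mat_mul(a_d, g)
```

In the scalar case A = −γ, Q = ση², the function reproduces the closed-form values Ad = exp(−γΔt) and Qd = ση²[1 − exp(−2γΔt)]/(2γ).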
c. State-space representation
As a linear time-invariant system the k-box model is amenable to powerful numerical techniques from control theory, in particular the Kalman filter (Kalman 1960). By choosing a model for our observation process y(t) we can write the k-box model in state-space form
where matrix
where entries of
but for climate model experiments we assume vd(t) = 0 for all t.
d. Kalman filter
The Kalman filter was originally developed as a minimum mean-square-error (MMSE) estimator of state variables in a noisy linear dynamic system (Kalman 1960). It may also be used to recursively calculate the likelihood of a time series of observations from this class of model (Tusell 2011). The Kalman filter estimates the system state at time t using the information contained in all previous observations up to and including time t, through a recursive procedure iterating over two steps: a prediction step and an update step. For the k-box model in state-space form we can write the Kalman recursions as follows, using the hat/subscript notation of Reid (2001).
1) Prediction step
Given
The predicted error covariance of this a priori estimate is
where
2) Update step
Having then observed yt we update our a priori estimate of xt with this new information to obtain an a posteriori state estimate
which has covariance
Our a posteriori state estimate is simply our a priori estimate
where the shrinkage amplitude is the optimal Kalman gain
The covariance of the a posteriori estimate is
The measurement postfit residual is
In the complete absence of observational noise the recursions may still be computed by setting Σt equal to a diagonal matrix with each diagonal element a very small number.
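The two steps of the recursion can be written compactly for the special case of a single scalar observation, which avoids any matrix inversion. This is a minimal sketch with hypothetical names (`kalman_step`, `a_d`, `q_d`, `h`, `r`): it assumes a state equation x_t = A_d x_{t−1} + w_t with covariance Q_d and an observation y_t = hᵀx_t + v_t with variance r, and it omits the deterministic input for brevity.

```python
def kalman_step(x_post, p_post, a_d, q_d, h, r, y):
    """One prediction/update cycle for a scalar observation y.

    Returns the a posteriori state and covariance, the prefit residual,
    and the variance of the prefit residual.
    """
    n = len(x_post)
    # Prediction step: a priori state estimate and error covariance.
    x_prior = [sum(a_d[i][j] * x_post[j] for j in range(n)) for i in range(n)]
    ap = [[sum(a_d[i][m] * p_post[m][j] for m in range(n)) for j in range(n)]
          for i in range(n)]
    p_prior = [[sum(ap[i][m] * a_d[j][m] for m in range(n)) + q_d[i][j]
                for j in range(n)] for i in range(n)]
    # Prefit residual and its variance.
    resid = y - sum(h[j] * x_prior[j] for j in range(n))
    ph = [sum(p_prior[i][j] * h[j] for j in range(n)) for i in range(n)]
    s = sum(h[i] * ph[i] for i in range(n)) + r
    # Update step: shrink the a priori estimate towards the observation.
    gain = [ph[i] / s for i in range(n)]
    x_new = [x_prior[i] + gain[i] * resid for i in range(n)]
    p_new = [[p_prior[i][j] - gain[i] * ph[j] for j in range(n)] for i in range(n)]
    return x_new, p_new, resid, s
```

Passing a very small positive `r` mimics the device described above for noise-free observations: the gain for the observed component approaches one and the updated estimate reproduces the observation almost exactly.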
e. Model likelihood
Since the k-box model is a causal linear filter (i.e., system states depend on past states and past inputs but not on future states and future inputs), we can factorize the likelihood function of the temperature observations
where θ denotes the vector of model parameters. For numerical stability it is preferable to compute the log-likelihood
which can be calculated recursively using the prefit residuals and their corresponding covariances from the Kalman filter
Evaluation of Eq. (38) requires the distribution of x0|θ, upon which y1 depends, to be known. If we assume that at the beginning of a dataset the system is in a state of preindustrial equilibrium then E(x0) = 0 with some covariance matrix to be derived from the model parameters (see appendix C). For an abrupt 4 × CO2 climate model experiment, the first element of x0 (corresponding to radiative forcing) has an expected value, given the model parameters, of
The Kalman filter log-likelihood is essentially a weighted least squares objective function which penalizes squared one-step-ahead prediction errors (prefit residuals). The weighting applied to each prediction error is determined by its corresponding uncertainty (covariance).
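The weighted least squares interpretation can be seen in a scalar example. The sketch below accumulates the Gaussian log-density of each prefit residual for a hypothetical one-dimensional model x_t = φx_{t−1} + w_t, y_t = x_t + v_t; the function name and this toy model are assumptions chosen for brevity and stand in for the full k-box state-space form.

```python
import math


def kalman_loglik(ys, phi, q, r, m0=0.0, p0=1.0):
    """Log-likelihood of ys under x_t = phi*x_{t-1} + w_t, y_t = x_t + v_t.

    w_t ~ N(0, q), v_t ~ N(0, r), and x_0 ~ N(m0, p0).
    """
    x, p = m0, p0
    loglik = 0.0
    for y in ys:
        # Prediction step.
        x_prior = phi * x
        p_prior = phi * phi * p + q
        # Prefit residual and its variance.
        resid = y - x_prior
        s = p_prior + r
        # Each term penalizes the squared residual, weighted by 1/s.
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + resid * resid / s)
        # Update step.
        gain = p_prior / s
        x = x_prior + gain * resid
        p = (1.0 - gain) * p_prior
    return loglik
```

When r = 0 and the initial state is known exactly, the filter reduces to the familiar conditional likelihood of an AR(1) process, which gives a simple check of the recursion.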
f. Maximum likelihood estimation
The maximum likelihood estimator (MLE) of the model parameters θ is
where
Although the Fisher information
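The route from a numerically evaluated Hessian to an approximate confidence interval can be illustrated in one dimension. This sketch uses a plain central difference rather than Richardson's extrapolation, and a toy Gaussian-mean likelihood rather than the k-box likelihood; all function names are hypothetical.

```python
import math


def neg_loglik_gaussian_mean(mu, data, sigma):
    """Negative log-likelihood of i.i.d. N(mu, sigma^2) data, sigma known."""
    n = len(data)
    return (0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            + sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2))


def observed_information(neg_loglik, theta_hat, h=1e-3):
    """Second derivative of the negative log-likelihood at theta_hat."""
    return (neg_loglik(theta_hat + h) - 2.0 * neg_loglik(theta_hat)
            + neg_loglik(theta_hat - h)) / (h * h)


def wald_interval(theta_hat, info, z=1.96):
    """Approximate 95% interval from the asymptotic normality of the MLE."""
    se = 1.0 / math.sqrt(info)
    return theta_hat - z * se, theta_hat + z * se
```

For this toy problem the observed information equals n/σ², so the interval half-width is 1.96σ/√n. With several parameters the same idea applies, with the standard errors taken from the diagonal of the inverse Hessian.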
g. Optimal model complexity
The number of boxes k offers a natural parameterization of model complexity. When emulating an AOGCM with an EBM it is desirable to fit the most parsimonious model that does not significantly underperform compared to more complex models. Models with different numbers of boxes k can be compared (e.g., Caldeira and Myhrvold 2013) using Akaike’s information criterion (AIC). The AIC score for a fitted model m is defined as
where
Competing models can be compared using the decision rule whereby for a given AOGCM we choose the number of boxes k that minimizes AIC(mk).
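The decision rule is straightforward to apply once each candidate has been fitted. The sketch below uses the standard definition AIC = 2p − 2 log L, where p is the number of estimated parameters and L the maximized likelihood; the function names, log-likelihood values, and parameter counts in the usage note are hypothetical.

```python
def aic(loglik, n_params):
    """Akaike's information criterion from a maximized log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik


def select_by_aic(fits):
    """Return the name of the fit with the smallest AIC.

    fits maps a model name to a (maximized log-likelihood, parameter count) pair.
    """
    return min(fits, key=lambda name: aic(*fits[name]))
```

For example, `select_by_aic({"two-box": (-100.0, 9), "three-box": (-90.0, 11)})` returns `"three-box"`: the gain of 10 in log-likelihood lowers AIC by 20, which outweighs the penalty of 4 for the two extra parameters.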
5. Software implementation
This study has developed a package for the R software environment (R Core Team 2019) for simulation, fitting, filtering, and predicting with k-box EBMs. The package estimates parameters of k-box models from time series of GMST and TOA net downward radiative flux by numerically maximizing the likelihood function. To evaluate the model likelihood we use modern implementations of the matrix exponential (Goulet et al. 2019) and the Kalman filter (Luethi et al. 2018). The likelihood is maximized using an implementation (Johnson 2014; Ypma 2020) of the BOBYQA optimization algorithm (Powell 2009). Confidence intervals for parameter estimates are obtained using the Fisher information, as described in section 4f, where the Hessian of the likelihood function is evaluated numerically using an implementation of Richardson’s extrapolation (Gilbert and Varadhan 2016). The R package, which includes the datasets used in this paper, is available for download at https://github.com/donaldcummins/EBM.
6. Simulation study
a. Methods
A simulation study was performed to investigate the feasibility of fitting k-box models to AOGCM step response data via the proposed maximum likelihood method. The step response of HadGEM2-ES from CMIP5 was used to fit a two-box model and a three-box model (optimal under AIC). HadGEM2-ES was chosen as this model has been used extensively for climate change studies. Data from HadGEM2-ES consisted of annually averaged values (see section 7 for details of CMIP5 data used). Estimated two-box model parameters were γ = 1.58; C1, C2 = 7.73, 89.3 W yr m−2 K−1; κ1, κ2 = 0.632, 0.522 W m−2 K−1; ε = 1.52; ση, σξ, and

Fig. 2. Example simulated dataset from a two-box model with parameters: γ = 1.58; C1, C2 = 7.73, 89.3 W yr m−2 K−1; κ1, κ2 = 0.632, 0.522 W m−2 K−1; ε = 1.52; ση, σξ,

Fig. 3. Pairs plot showing approximate sampling distribution of the maximum likelihood estimator. Each point represents a model fitted to a simulated dataset. Simulated datasets are from a two-box model with parameters: γ = 1.58; C1, C2 = 7.73, 89.3 W yr m−2 K−1; κ1, κ2 = 0.632, 0.522 W m−2 K−1; ε = 1.52; ση, σξ,

Fig. 4. Histograms showing approximate sampling distribution of the maximum likelihood estimator. The thick vertical lines indicate the true value of each parameter. Simulated datasets are from a two-box model with parameters: γ = 1.58; C1, C2 = 7.73, 89.3 W yr m−2 K−1; κ1, κ2 = 0.632, 0.522 W m−2 K−1; ε = 1.52; ση, σξ,
b. Results
Estimator sampling distributions for the two-box and three-box models were examined for excessive bias, variance, and pairwise correlations. Results for the two-box model simulations are discussed below. Analysis of results for the three-box model leads to analogous conclusions.
Pairwise parameter correlations are visible in the estimated sampling distribution of the two-box model estimator (see Fig. 3). The strongest correlation (positive) is between the parameters controlling the stochastic forcing, γ and ση; that is, the whiter the noise, the greater the disturbance needed at each time step to obtain the same overall level of variability. The second strongest correlation (negative) is between the climate feedback parameter κ1 and deep ocean heat capacity C2. The third strongest correlation (positive) is between C1 and σξ. A natural explanation for this is that when the heat capacity of the first box C1 is increased the corresponding temperature T1 has more inertia and hence requires a stronger stochastic disturbance amplitude σξ to maintain the same level of variability. The fourth strongest correlation (negative) is between κ2 and C2. This correlation is related to the time taken for relaxation of the system on the longer time scale τ2. A longer relaxation time can be achieved either by increasing the heat capacity of the second box C2 or by reducing the heat transfer coefficient κ2 between boxes one and two.
Model parameters can be divided, by correlation, into two disjoint sets: set (i), stochastic forcing parameters γ and ση; and set (ii), all remaining parameters. Neither γ nor ση is correlated with any parameter in set (ii), nor does either parameter appear well constrained by the simulated datasets, the consequence being a mutual inflation of uncertainty. The correlations between parameters in set (ii) appear to act favorably: individual parameter uncertainty in set (ii) is uniformly low with coefficients of variation mostly less than 10%. If at least one parameter in set (ii) is well constrained by observations, as appears to be the case, then uncertainty in the other parameters decreases as a result.
Estimated marginal distributions of the two-box model parameters resemble unimodal bell curves (see Fig. 4), with the notable exception of γ and ση. The parameter γ appears poorly bounded from above (hard to rule out very white stochastic forcing) and this uncertainty propagates into ση. The maximum likelihood estimator is asymptotically unbiased but in general has a finite sample bias. Estimates of all two-box model parameters display some bias. Parameters γ and ση have positive relative biases of 21% and 6% respectively, which is unsurprising given the skewness of their marginal distributions. For parameters in set (ii), and for both the two-box and three-box models, the magnitude of the bias is in all cases less than 5% of the parameter’s true value.
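The summary statistics quoted in this section can be computed directly from the collection of estimates obtained across simulated datasets. The sketch below shows the arithmetic for relative bias and coefficient of variation; the function names and the numbers in the usage note are hypothetical, not values from the simulation study.

```python
import statistics


def relative_bias(estimates, true_value):
    """Mean estimation error as a fraction of the true parameter value."""
    return (statistics.fmean(estimates) - true_value) / true_value


def coefficient_of_variation(estimates):
    """Sample standard deviation of the estimates divided by their mean."""
    return statistics.stdev(estimates) / statistics.fmean(estimates)
```

For instance, estimates of 1.0, 1.2, and 1.4 for a parameter whose true value is 1.0 give a relative bias of 20% and a coefficient of variation of about 17%.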
c. Conclusions
The simulation study demonstrates that the proposed maximum likelihood method reliably estimates parameters of two-box and three-box models from the step response of a typical AOGCM from CMIP5. Pairwise correlation and estimator bias were found to influence estimates of stochastic forcing parameters γ and ση; however, other model parameters were not adversely affected.
7. Fitting to CMIP5 climate model simulations
The R package was used to fit two-box and three-box models to the step responses of 16 ESMs from CMIP5 (see Table 2), using the same data as Geoffroy et al. (2013b). The step response data consist of values of GMST T1 and TOA net downward radiative flux N averaged over each of 150 years in the experiment. While it is possible, in practice, to fit four-box models using the methodology described in this paper, it was decided that the upper limit in this study should be k = 3. It was found that fitting a fourth box typically yields an estimated characteristic time scale substantially shorter than one year, which is beyond what might reasonably be extracted from annually averaged data.
For each ESM the fitted box model with lower AIC (see section 4g) was chosen as the optimal k-box emulator. The same procedure was applied to the multimodel mean (MMM) of the 16 step-response datasets. Maximum likelihood parameter estimates are reported for these optimal fits (see Table 3), with corresponding estimates (see Table 4) of characteristic time scales τi, surface temperature response coefficients ai, equilibrium climate sensitivity (ECS), and transient climate response (TCR). It should be noted that, as the shortest time scale of the three-box model is on the order of one year, estimated parameters of the first box will be affected by changes in radiative forcing due to stratospheric and tropospheric “rapid adjustments” (Chung and Soden 2015). We refer to the fits chosen using AIC as optimal k-box emulators for the remainder of this paper.
Table 3. Estimated parameters for optimal k-box emulators of ESMs in CMIP5. In all cases k = 3. For physical units and descriptions of parameters see Table 1. MMM refers to the k-box model fitted to the average of the datasets from all 16 ESMs.


Table 4. Characteristic time scales τi, surface temperature response coefficients ai, equilibrium climate sensitivity (ECS), and transient climate response (TCR) of optimal k-box emulators fitted to ESMs in CMIP5. Column ∆AIC shows the decrease in AIC moving from two to three boxes. For physical units and descriptions of parameters see Table 1. MMM refers to the k-box model fitted to the average of the datasets from all 16 ESMs.


From Table 4 it can be seen that, for all 16 fitted ESMs, three boxes are required for optimal emulation under AIC. According to AIC the multimodel mean also requires three boxes. Figure 5 shows three examples of fitted step responses for optimal k-box emulators. In all fitted models the heat capacities of the boxes increase with depth while, with the exception of GISS-E2-R, the heat transfer coefficients decrease with depth (excluding the feedback parameter κ1). The approximate signal-to-noise ratio, calculated as

Fig. 5. Observed and fitted three-box step responses of three ESMs from CMIP5. (left) Temperature trajectories for each box. (right) TOA net downward radiative fluxes against surface temperature. Gray dots are observations while the black curves are expected box-model responses. Models are (a) GISS-E2-R, (b) MIROC5, and (c) HadGEM2-ES.
Example approximate 95% confidence intervals for parameters of k-box models fitted to HadGEM2-ES. For physical units and descriptions of parameters see Table 1.



(a) Two-box and (b) three-box fits to the step response of IPSL-CM5A-LR. (left) Temperature trajectories for each box. (right) TOA net downward radiative fluxes against surface temperature. Gray dots are observations while the black curves are expected box-model responses.

The number of boxes k influences the impulse responses of the fitted box models, sometimes strongly (see Fig. 7). The mathematical definition of the impulse response is given in appendix D. For all 16 ESMs from CMIP5 the impulse response of the optimal k-box emulator runs hotter in the first few years than that of the corresponding two-box model. Moving from two to three boxes increases the instantaneous sensitivity by between 17% and 174% (see Table 6). This suggests that when modeling the GMST response to impulse-like forcing events such as volcanic eruptions the greater flexibility of a three-box model might prove valuable.

Impulse responses of fitted k-box models. The curves are the expected temperature trajectories of the first box of the fitted models in response to a unit-impulse forcing. The solid and dashed curves correspond to three-box and two-box fits respectively, fitted using maximum likelihood. Models are (a) MIROC5 and (b) IPSL-CM5A-LR.

Instantaneous increase in surface temperature (K) under a unit-impulse forcing scenario. Results are given for two-box and three-box maximum likelihood fits. Also given is the percentage increase moving from two to three boxes. MMM refers to the k-box model fitted to the average of the datasets from all 16 ESMs.


Figure 8 compares two-box model parameter estimates obtained using maximum likelihood with those obtained by Geoffroy et al. (2013b). Maximum likelihood typically yields lower estimates of the heat capacities C1 and C2 but higher estimates of the heat transfer coefficient κ2. This results in shorter estimated characteristic time scales τ1 and τ2 when using maximum likelihood. Estimates of the radiative parameters κ1, ε, and

Maximum likelihood parameter estimates for two-box models compared with corresponding estimates from Geoffroy et al. (2013b). Each point is one of 16 ESMs from CMIP5. The solid lines have equation y = x and show where estimates are the same for both fitting methodologies. Plot axes are logarithmic.

Under the proposed observation model (see section 4c), a fitted k-box model can be combined with temperature and forcing data to filter the (possibly noisy) observations and estimate the temperatures of the unobserved boxes (see Fig. 9). In this way we can see the attenuation of natural variability in temperature with increasing depth. The thermal inertia of the deep ocean boxes with their large heat capacity means that in the CO2 quadrupling experiment the noise in these boxes is dwarfed by the signal. Filtering and hidden state estimation with k-box models is not restricted to step responses or AOGCM experiments, but rather is applicable to any combination of global temperature and radiative forcing data, including the observational record.
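The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this study: it assumes a two-box model with ε = 1, a state vector containing only the box temperatures (the state-space form used in the paper also carries F(t)), a crude one-year Euler discretization, and observation of T1 alone. The function name `kalman_filter` and all numerical values are hypothetical.

```python
import numpy as np

def kalman_filter(y, Ad, bd, u, Qd, r, x0, P0):
    """Kalman filter for scalar observations of the first state component.

    Assumed discrete-time model (illustrative, not the paper's exact form):
        x[t] = Ad @ x[t-1] + bd * u[t] + w[t],   w[t] ~ N(0, Qd)
        y[t] = x[t][0] + v[t],                   v[t] ~ N(0, r)
    where u[t] is the forcing acting over the step that ends at time t.
    Returns the filtered state means, one row per time step.
    """
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P0, dtype=float)
    out = np.empty((len(y), len(x)))
    for t in range(len(y)):
        # Prediction step: propagate mean and covariance through the dynamics.
        x = Ad @ x + bd * u[t]
        P = Ad @ P @ Ad.T + Qd
        # Update step: only the first component (surface temperature) is observed.
        S = P[0, 0] + r                  # innovation variance
        K = P[:, 0] / S                  # Kalman gain
        x = x + K * (y[t] - x[0])
        P = P - np.outer(K, P[0, :])
        out[t] = x
    return out

# Illustrative two-box example; every number below is an assumption.
A = np.array([[-0.2375, 0.0875],
              [0.007, -0.007]])          # continuous-time dynamics (yr^-1)
Ad = np.eye(2) + A                       # crude one-year Euler step
bd = np.array([0.125, 0.0])              # forcing enters box 1 only (1 / C1)
Qd = np.diag([0.01, 0.0])                # process noise on box 1 only
r = 0.01                                 # observation noise variance
n_years = 150
u = np.full(n_years, 7.4)                # constant step forcing (W m^-2)

rng = np.random.default_rng(0)
truth = np.empty((n_years, 2))
x = np.zeros(2)
for t in range(n_years):
    x = Ad @ x + bd * u[t] + np.array([rng.normal(0.0, 0.1), 0.0])
    truth[t] = x
y = truth[:, 0] + rng.normal(0.0, 0.1, n_years)   # noisy surface temperature

est = kalman_filter(y, Ad, bd, u, Qd, r, x0=np.zeros(2), P0=np.eye(2))
# est[:, 0] is the filtered surface temperature;
# est[:, 1] is the reconstructed (unobserved) deep-ocean temperature.
```

In practice the exact discretization of appendix B would replace the Euler step, and the noise variances would be the maximum likelihood estimates rather than fixed guesses.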

Reconstructed three-box model state variables in the MRI-CGCM3 step-forcing experiment. The dots are observed surface temperatures T1(t) while the solid curves are reconstructed time series of the latent variables in their respective units. Latent variables are, from top to bottom, radiative forcing F(t) (in W m−2) and deep ocean box temperatures T2(t) and T3(t) (in K).

8. Summary
The k-box energy balance model in this paper offers a simple but flexible representation of the response of global mean surface temperature to radiative forcing, both deterministic and stochastic, over a range of time scales. Parameter estimation for this class of model is nontrivial: since we can typically observe the temperature of only the first box, we have a situation where for k ≥ 2 at least half of the model state variables are latent. We have shown how, by finding a state-space representation of the linear dynamic system and evaluating the likelihood recursively via the Kalman filter, maximum likelihood estimates of all model parameters may be obtained.
The k-box model is a linear time-invariant system and thus characterized by its response to a step forcing, a forcing scenario that has been simulated in AOGCM experiments. A simulation study has been carried out to investigate the feasibility, reliability, and performance of the proposed method when applied to step-response data. The proposed method has been found to reliably estimate the k-box model parameters.
An important advantage of maximum likelihood estimation is that optimal model complexity can be chosen using information criteria. To demonstrate this, two-box and three-box models were fitted to each of a set of 16 Earth system models from CMIP5 with the optimal number of boxes chosen by Akaike’s information criterion. It was found that for all 16 AOGCMs three boxes are required for optimal k-box emulation. Results obtained via maximum likelihood estimation were compared with equivalent results from the method of Geoffroy et al. (2013b). It was found that estimates of some model parameters differ systematically depending on the choice of fitting method. The number of boxes, k, was found to influence the impulse responses of the fitted models, sometimes strongly. These results suggest that, under impulse-like forcing scenarios, AOGCM responses might be better emulated using three-box models.
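As a small illustration of model selection by information criterion, the following Python sketch compares two candidate fits using AIC = 2p − 2 log L, where p is the number of free parameters and L is the maximized likelihood. The log-likelihoods and parameter counts are hypothetical placeholders, not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion, AIC = 2p - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

def select_boxes(fits):
    """Pick the number of boxes with the smallest AIC.

    fits maps k to a pair (maximized log-likelihood, number of free parameters).
    """
    scores = {k: aic(ll, p) for k, (ll, p) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical values for illustration only.
fits = {2: (-120.0, 8), 3: (-105.0, 10)}
best_k, scores = select_boxes(fits)
delta_aic = scores[2] - scores[3]   # decrease in AIC moving from two to three boxes
```

With these placeholder numbers the three-box fit is preferred because its improvement in log-likelihood outweighs the penalty for two extra parameters.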
Finally, an example has been presented showing how a fitted k-box model can be combined with temperature and forcing data to reconstruct the temperatures of unobserved boxes corresponding to the deep ocean. Noise filtering and hidden state estimation using k-box AOGCM emulators are possible wherever we have a combination of global temperature and radiative forcing data, including the observational record.
Acknowledgments
We thank Chris E. Forest and two anonymous reviewers for their thoughtful and constructive comments, which helped improve the manuscript. We thank Olivier Geoffroy for kindly providing the climate model datasets used in this study. We thank Stefan Siegert for helpful discussions regarding the Kalman filter. We acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table 2 of this paper) for producing and making available their model output. For CMIP the U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.
APPENDIX A
Proof that Eigenvalues of Matrix are Real and Non-Positive when ε = 1
Consider the k × k matrices
and
APPENDIX B
Discretization of the Full k-Box Model
Equation (12) can be rearranged as follows:
Multiplying by the integrating factor
Integrating with respect to time,
so that, multiplying by
As a linear function of Gaussian random variables, x(t) is itself Gaussian and hence fully characterized by its mean and covariance. Since E[w(t)] = 0 for all t,
where, assuming u(s) = u(t − 1) for s ∈ [t − 1, t),
For the covariance we have
where, since w(t) is white noise and hence uncorrelated in time,
For additional information on this type of discretization scheme see section 4.3 of Ljung (1987).
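A discretization of this kind can be illustrated numerically. The sketch below assumes a continuous-time system of the generic form dx/dt = A x + b u + w, with w white noise of intensity Q and u held constant over each time step; this generic form, the function name `discretize`, and the example values are assumptions made for illustration. Van Loan (1978) appears in the reference list for integrals involving the matrix exponential; this sketch instead uses an eigendecomposition, which is valid only when A is diagonalizable and no two eigenvalues sum to zero.

```python
import numpy as np

def discretize(A, b, Q, dt=1.0):
    """Exact discretization of dx/dt = A x + b u + w over a step of length dt.

    Returns (Ad, bd, Qd) such that
        x[t] = Ad @ x[t-1] + bd * u[t-1] + w_d[t],   Cov(w_d) = Qd,
    with
        Ad = exp(A dt),
        bd = A^{-1} (Ad - I) b,
        Qd = integral_0^dt exp(A s) Q exp(A' s) ds.
    """
    n = A.shape[0]
    lam, V = np.linalg.eig(A)
    Vinv = np.linalg.inv(V)
    # Matrix exponential via the eigendecomposition A = V diag(lam) V^{-1}.
    Ad = ((V * np.exp(lam * dt)) @ Vinv).real
    # Integrated forcing term (requires A to be invertible).
    bd = np.linalg.solve(A, (Ad - np.eye(n)) @ b)
    # Noise covariance: integrate each entry of the transformed Q analytically.
    M = Vinv @ Q @ Vinv.T
    s = lam[:, None] + lam[None, :]
    H = (np.exp(s * dt) - 1.0) / s
    Qd = (V @ (M * H) @ V.T).real
    return Ad, bd, Qd

# Illustrative two-box example with epsilon = 1; all numbers are assumptions.
A = np.array([[-0.2375, 0.0875],
              [0.007, -0.007]])
b = np.array([0.125, 0.0])
Q = np.diag([0.04, 0.0])
Ad, bd, Qd = discretize(A, b, Q)
```

A useful sanity check is that the discrete system has the same forced equilibrium as the continuous one, that is, (I − Ad)^{-1} bd = −A^{-1} b.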
APPENDIX C
Marginal Covariance of the Stochastic Response
The k-box model is a linear dynamic system. Therefore the response to a linear combination of inputs is equal to the sum of the responses to individual inputs. In this way we can separate the model responses to deterministic and stochastic forcing components. The stochastic component of the response is driven by a purely stochastic input and may be written
which is a vector autoregressive process of order one [VAR(1)]. The matrix-valued auto-cross-covariance function Γ(h) is defined as
where the lag h is an integer. We seek the marginal auto-cross-covariance matrix Γ(0), which is the a priori covariance of x0 in the Kalman filter. Define the backshift operator B such that
We can write
and
The geometric series converges when the VAR(1) process is stationary (i.e., all eigenvalues of
Since
where δij denotes the Kronecker delta, we have
The infinite series can be computed as follows using the vec operator and the Kronecker product (Luetkepohl 1991).
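As a numerical illustration of this vec and Kronecker-product computation, the sketch below solves Γ(0) = Ad Γ(0) Ad′ + Qd through the standard identity vec(Γ(0)) = (I − Ad ⊗ Ad)^{-1} vec(Qd). The symbols Ad and Qd (discrete-time transition matrix and noise covariance), the function name `stationary_covariance`, and the example values are assumptions for illustration.

```python
import numpy as np

def stationary_covariance(Ad, Qd):
    """Marginal covariance of a stationary VAR(1) process x[t] = Ad x[t-1] + w[t].

    Solves Gamma0 = Ad Gamma0 Ad' + Qd using
        vec(Gamma0) = (I - Ad kron Ad)^{-1} vec(Qd),
    where vec stacks the columns of a matrix.
    """
    n = Ad.shape[0]
    vec_q = Qd.reshape(-1, order="F")                 # column-stacking vec operator
    lhs = np.eye(n * n) - np.kron(Ad, Ad)
    vec_g = np.linalg.solve(lhs, vec_q)
    return vec_g.reshape((n, n), order="F")

# Hypothetical stable transition matrix and noise covariance.
Ad = np.array([[0.76, 0.09],
               [0.01, 0.95]])
Qd = np.diag([0.01, 0.001])
Gamma0 = stationary_covariance(Ad, Qd)
```

The linear solve is well defined whenever the process is stationary, since then no product of two eigenvalues of Ad equals one.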
Note (
APPENDIX D
Analytical Responses under Idealized Forcing Scenarios
a. Unit step forcing
The k-box model response under a unit step-forcing scenario
can be written
where 1 denotes the vector of ones (1, …, 1)′. The unit-forced equilibrium temperature is 1/κ1, which is obtained by setting Eq. (7) equal to zero and solving for
b. Unit impulse forcing
Differentiating xstep(t) with respect to time we obtain the response to a unit-impulse forcing
where δ(t) denotes the Dirac delta function:
This follows from the fact that an impulse is the time derivative of a step forcing.
c. Transient climate response
Integrating xstep(t) with respect to time and scaling appropriately we obtain the transient climate response (TCR)
which is the response to atmospheric CO2 concentration increasing at a rate of 1% yr−1 starting at time t = 0. This follows from the fact that an exponentially increasing CO2 input is equivalent to a sequence of superimposed 1.01 × CO2 step-forcing inputs. By linearity, the k-box model response to this superposition of forcing inputs is a superposition of the corresponding temperature outputs.
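The relationships among the step, impulse, and ramp responses can be checked numerically. The sketch below assumes a temperature-only system dx/dt = A x + b F(t) for a two-box model with ε = 1; it omits the stochastic forcing state used elsewhere in the paper. The function names, the parameter values, the doubling forcing F2x = 3.7 W m−2, and the choice of year 70 as the doubling time under a 1% yr−1 increase are illustrative assumptions, and the scaling used for TCR is one plausible reading of "scaling appropriately."

```python
import numpy as np

def expm_eig(A, t):
    """Matrix exponential exp(A t) via eigendecomposition (A assumed diagonalizable)."""
    lam, V = np.linalg.eig(A)
    return ((V * np.exp(lam * t)) @ np.linalg.inv(V)).real

def step_response(A, b, t):
    """Response to a unit step forcing: A^{-1} (exp(A t) - I) b."""
    n = A.shape[0]
    return np.linalg.solve(A, (expm_eig(A, t) - np.eye(n)) @ b)

def impulse_response(A, b, t):
    """Response to a unit impulse: the time derivative of the step response, exp(A t) b."""
    return expm_eig(A, t) @ b

def ramp_response(A, b, t):
    """Integral of the step response from 0 to t: A^{-1} [A^{-1}(exp(A t) - I) - t I] b."""
    n = A.shape[0]
    inner = np.linalg.solve(A, expm_eig(A, t) - np.eye(n)) - t * np.eye(n)
    return np.linalg.solve(A, inner @ b)

# Illustrative two-box parameters (assumptions): kappa1 = 1.2, kappa2 = 0.7,
# C1 = 8, C2 = 100, so the unit-forced equilibrium temperature is 1 / kappa1.
A = np.array([[-0.2375, 0.0875],
              [0.007, -0.007]])
b = np.array([0.125, 0.0])
F2x = 3.7                                             # assumed forcing for doubled CO2

ecs = F2x * np.linalg.solve(A, -b)[0]                 # equilibrium surface warming
tcr = (F2x / 70.0) * ramp_response(A, b, 70.0)[0]     # linear forcing ramp to year 70
```

Because forcing is approximately logarithmic in CO2 concentration, a 1% yr−1 concentration increase corresponds to a linear forcing ramp, which is why the integrated step response appears in the TCR calculation.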
REFERENCES
Akaike, H., 1974: A new look at the statistical model identification. Selected Papers of Hirotugu Akaike, Springer, 215–222.
Budyko, M. I., 1969: The effect of solar radiation variations on the climate of the earth. Tellus, 21, 611–619, https://doi.org/10.3402/tellusa.v21i5.10109.
Caldeira, K., and N. Myhrvold, 2013: Projections of the pace of warming following an abrupt increase in atmospheric carbon dioxide concentration. Environ. Res. Lett., 8, 034039, https://doi.org/10.1088/1748-9326/8/3/034039.
Chung, E.-S., and B. J. Soden, 2015: An assessment of methods for computing radiative forcing in climate models. Environ. Res. Lett., 10, 074004, https://doi.org/10.1088/1748-9326/10/7/074004.
Crowley, T. J., 2000: Causes of climate change over the past 1000 years. Science, 289, 270–277, https://doi.org/10.1126/science.289.5477.270.
Fredriksen, H.-B., and M. Rypdal, 2017: Long-range persistence in global surface temperatures explained by linear multibox energy balance models. J. Climate, 30, 7157–7168, https://doi.org/10.1175/JCLI-D-16-0877.1.
Geoffroy, O., D. Saint-Martin, D. J. L. Olivié, A. Voldoire, G. Bellon, and S. Tytéca, 2013a: Transient climate response in a two-layer energy-balance model. Part I: Analytical solution and parameter calibration using CMIP5 AOGCM experiments. J. Climate, 26, 1841–1857, https://doi.org/10.1175/JCLI-D-12-00195.1.
Geoffroy, O., D. Saint-Martin, G. Bellon, A. Voldoire, D. J. L. Olivié, and S. Tytéca, 2013b: Transient climate response in a two-layer energy-balance model. Part II: Representation of the efficacy of deep-ocean heat uptake and validation for CMIP5 AOGCMs. J. Climate, 26, 1859–1876, https://doi.org/10.1175/JCLI-D-12-00196.1.
Geršgorin, S., 1931: Über die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk SSSR Ser. Mat., 1, 749–755.
Gilbert, P., and R. Varadhan, 2016: numDeriv: Accurate numerical derivatives, version 2016.8-1. R package, https://CRAN.R-project.org/package=numDeriv.
Good, P., J. M. Gregory, and J. A. Lowe, 2011: A step-response simple climate model to reconstruct and interpret AOGCM projections. Geophys. Res. Lett., 38, L01703, https://doi.org/10.1029/2010GL045208.
Goulet, V., C. Dutang, M. Maechler, D. Firth, M. Shapira, and M. Stadelmann, 2019: expm: Matrix Exponential, Log, etc, version 0.999-4. R package, https://CRAN.R-project.org/package=expm.
Gregory, J. M., 2000: Vertical heat transports in the ocean and their effect on time-dependent climate change. Climate Dyn., 16, 501–515, https://doi.org/10.1007/s003820000059.
Gregory, J. M., and Coauthors, 2004: A new method for diagnosing radiative forcing and climate sensitivity. Geophys. Res. Lett., 31, L03205, https://doi.org/10.1029/2003GL018747.
Hansen, J., M. Sato, P. Kharecha, and K. Schuckmann, 2011: Earth’s energy imbalance and implications. Atmos. Chem. Phys., 11, 13 421–13 449, https://doi.org/10.5194/acp-11-13421-2011.
Hasselmann, K., 1976: Stochastic climate models, Part I. Theory. Tellus, 28, 473–485, https://doi.org/10.3402/tellusa.v28i6.11316.
Held, I. M., M. Winton, K. Takahashi, T. Delworth, F. Zeng, and G. K. Vallis, 2010: Probing the fast and slow components of global warming by returning abruptly to preindustrial forcing. J. Climate, 23, 2418–2427, https://doi.org/10.1175/2009JCLI3466.1.
Johnson, S. G., 2014: The NLopt nonlinear-optimization package. http://github.com/stevengj/nlopt.
Jonko, A., N. M. Urban, and B. Nadiga, 2018: Towards Bayesian hierarchical inference of equilibrium climate sensitivity from a combination of CMIP5 climate models and observational data. Climatic Change, 149, 247–260, https://doi.org/10.1007/s10584-018-2232-0.
Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. J. Basic Eng., 82, 35–45, https://doi.org/10.1115/1.3662552.
Kaufmann, B., 2003: Fitting a sum of exponentials to numerical data. Accessed 21 March 2019, https://arxiv.org/abs/physics/0305019.
Ljung, L., 1987: System Identification: Theory for the User. Prentice-Hall, 519 pp.
Lucarini, V., F. Ragone, and F. Lunkeit, 2017: Predicting climate change using response theory: Global averages and spatial patterns. J. Stat. Phys., 166, 1036–1064, https://doi.org/10.1007/s10955-016-1506-z.
Luethi, D., P. Erb, and S. Otziger, 2018: FKF: Fast Kalman filter, version 0.1.5. R package, https://CRAN.R-project.org/package=FKF.
Luetkepohl, H., 1991: Introduction to Multiple Time Series Analysis. Springer-Verlag, 556 pp., https://doi.org/10.1007/978-3-662-02691-5.
Moberg, A., D. M. Sonechkin, K. Holmgren, N. M. Datsenko, and W. Karlén, 2005: Highly variable Northern Hemisphere temperatures reconstructed from low- and high-resolution proxy data. Nature, 433, 613–617, https://doi.org/10.1038/nature03265.
Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones, 2012: Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 data set. J. Geophys. Res., 117, D08101, https://doi.org/10.1029/2011JD017187.
Powell, M. J., 2009: The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Rep. NA2009/06, University of Cambridge, 39 pp., http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf.
Proistosescu, C., and P. J. Huybers, 2017: Slow climate mode reconciles historical and model-based estimates of climate sensitivity. Sci. Adv., 3, e1602821, https://doi.org/10.1126/sciadv.1602821.
R Core Team, 2019: R: A language and environment for statistical computing, R Foundation for Statistical Computing, https://www.R-project.org/.
Reid, I., 2001: Estimation II [lecture notes]. 44 pp., http://www.robots.ox.ac.uk/~ian/Teaching/Estimation/LectureNotes2.pdf.
Rypdal, M., and K. Rypdal, 2014: Long-memory effects in linear response models of Earth’s temperature and implications for future global warming. J. Climate, 27, 5240–5258, https://doi.org/10.1175/JCLI-D-13-00296.1.
Rypdal, M., H.-B. Fredriksen, E. Myrvoll-Nilsen, K. Rypdal, and S. H. Sørbye, 2018: Emergent scale invariance and climate sensitivity. Climate, 6, 93, https://doi.org/10.3390/CLI6040093.
Sellers, W. D., 1969: A global climatic model based on the energy balance of the Earth–atmosphere system. J. Appl. Meteor., 8, 392–400, https://doi.org/10.1175/1520-0450(1969)008<0392:AGCMBO>2.0.CO;2.
Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498, https://doi.org/10.1175/BAMS-D-11-00094.1.
Tsutsui, J., 2016: Quantification of temperature response to CO2 forcing in atmosphere–ocean general circulation models. Climatic Change, 140, 287–305, https://doi.org/10.1007/s10584-016-1832-9.
Tusell, F., 2011: Kalman filtering in R. J. Stat. Software, 39 (2), 1–27, https://doi.org/10.18637/jss.v039.i02.
Van Loan, C., 1978: Computing integrals involving the matrix exponential. IEEE Trans. Autom. Control, 23, 395–404, https://doi.org/10.1109/TAC.1978.1101743.
Ypma, J., 2020: Introduction to nloptr: An R interface to NLopt. 14 pp., https://cran.r-project.org/web/packages/nloptr/vignettes/nloptr.pdf.