Climate Field Completion via Markov Random Fields: Application to the HadCRUT4.6 Temperature Dataset

Adam Vaccaro, Department of Earth Sciences, University of Southern California, Los Angeles, California
Julien Emile-Geay, Department of Earth Sciences, University of Southern California, Los Angeles, California
Dominique Guillot, Department of Mathematical Sciences, University of Delaware, Newark, Delaware
Resherle Verna, Department of Earth Sciences, University of Southern California, Los Angeles, California
Colin Morice, Met Office Hadley Centre, Exeter, United Kingdom
John Kennedy, Met Office Hadley Centre, Exeter, United Kingdom
Bala Rajaratnam, Department of Statistics, University of California, Davis, Davis, California

Abstract

Surface temperature is a vital metric of Earth’s climate state but is incompletely observed in both space and time: over half of monthly values are missing from the widely used HadCRUT4.6 global surface temperature dataset. Here we apply the graphical expectation–maximization algorithm (GraphEM), a recently developed imputation method, to construct a spatially complete estimate of HadCRUT4.6 temperatures. GraphEM leverages Gaussian Markov random fields (also known as Gaussian graphical models) to better estimate covariance relationships within a climate field, detecting anisotropic features such as land–ocean contrasts, orography, ocean currents, and wave-propagation pathways. This detection leads to improved estimates of missing values compared to methods (such as kriging) that assume isotropic covariance relationships, as we show with real and synthetic data. This interpolated analysis of HadCRUT4.6 data is available as a 100-member ensemble, propagating information about sampling variability available from the original HadCRUT4.6 dataset. A comparison of Niño-3.4 and global mean monthly temperature series with published datasets reveals similarities and differences due in part to the spatial interpolation method. Notably, the GraphEM-completed HadCRUT4.6 global temperature displays a stronger early twenty-first-century warming trend than its uninterpolated counterpart, consistent with recent analyses using other datasets. Known events like the 1877/78 El Niño are recovered with greater fidelity than with kriging, and result in different assessments of changes in ENSO variability through time. Gaussian Markov random fields provide a more geophysically motivated way to impute missing values in climate fields, and the associated graph provides a powerful tool to analyze the structure of teleconnection patterns. We close with a discussion of wider applications of Markov random fields in climate science.

© 2021 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Adam Vaccaro, avaccaro@usc.edu

1. Introduction

The surface temperature record, composed of land surface air temperature (LSAT) measurements and sea surface temperature (SST) readings from around the world, is a critical resource in assessing climate change in the industrial era, helping frame recent trends in the context of past variability (Stocker et al. 2013). Since they are based on in situ temperature measurements, instrumental surface datasets are also fundamental to the validation of remote sensing retrieval algorithms (Karl et al. 2006) and serve as key training datasets for paleoclimatic reconstructions (Tingley et al. 2012; Emile-Geay et al. 2013a,b; Hakim et al. 2016; Neukom et al. 2019). They also play a critical role in the evaluation of climate models (Simmons et al. 2004; Flato et al. 2013).

Because instrumental datasets draw on a variety of observational platforms, the inhomogeneities among the various types of measurements (such as different instrumentation, measurement type, and time of day) can manifest as biases in metrics calculated from the measurements if necessary corrections are not applied (Kennedy et al. 2011b; Williams et al. 2012; Hausfather et al. 2013; Kennedy 2014; Kent et al. 2017). Recent studies suggest that sources of inhomogeneity in the instrumental data that were previously unaccounted for are at least partially responsible for the so-called global warming pause (a slowdown of temperature rise between about 1998 and 2013) apparent in some global temperature estimates (Karl et al. 2015; Chan and Huybers 2019). A separate problem is spatial coverage, which is highly nonstationary, with large parts of the globe lacking observational coverage of any kind over long intervals, especially prior to World War II (Fig. 1). A spatially complete, homogeneous global temperature dataset is preferred by some researchers, but such datasets do not exist for long periods of time [satellite-based estimates are only available since 1979, and they too suffer from biases (e.g., Mears and Wentz 2005; Thorne et al. 2010)]. It would therefore be desirable to fill in the gaps in the surface temperature record in a way that is maximally consistent with available observations, a task known as imputation in the statistical literature (Dempster et al. 1977; Schneider 2006).

Fig. 1.

(top) Spatial distribution of the number of available monthly observations per decade for four periods: the 1850s, 1900s, 1950s, and 2000s. Note that over land, this number measures the number of stations per grid cell, while over ocean points, this number represents the actual observations contributing to the gridcell average. Both give an indication of improving coverage over time. (bottom) Contributions from various regions: NH, SH, global ocean, and polar regions (Arctic, defined as all grid cells poleward of 65°N, and Antarctic, defined as all grid cells poleward of 60°S). Note the different axis for polar contributions (right), which are of order 5% for the Arctic, 2% for Antarctica, and are marked by a strong seasonal cycle associated with the presence of sea ice.

Citation: Journal of Climate 34, 10; 10.1175/JCLI-D-19-0814.1

To this end, let us consider the HadCRUT4 dataset. The Met Office Hadley Centre and the University of East Anglia’s Climatic Research Unit’s global surface temperature dataset (Brohan et al. 2006; Morice et al. 2012) is among the most widely used global instrumental temperature datasets available (Flato et al. 2013; Hausfather et al. 2013). It is derived from near-surface air temperature data from the Hadley Centre and Climatic Research Unit’s (CRU) land-based temperature dataset, version 4 (CRUTEM4; Jones et al. 2012) combined with sea surface temperature data from the Hadley Centre’s SST dataset, version 3 (HadSST3; Kennedy et al. 2011a,b); the result is a global gridded surface temperature dataset dating back to 1850. Indeed, it is the longest of the three major independently produced global surface temperature datasets, which include NOAA/NCDC’s Merged Land–Ocean Surface Temperature Analysis (Vose et al. 2012) and NASA’s GISTEMP (Hansen et al. 2010). The coverage is fragmentary, however, which is a major statistical challenge. Over the entire period, over half (53%) of data entries are missing, most of them at the poles and over Africa; the coverage generally worsens back in time, with notable gaps during the two World Wars (Fig. 1).

This uneven distribution of missing observations introduces a bias into the estimation of the global mean temperature. Undersampling of the fastest warming parts of the globe (the poles) leads to an underestimation of the global mean temperature trend (Simmons et al. 2010; Folland et al. 2013). These coverage gaps are of particular concern over recent decades due to the different rates of warming exhibited between high and low latitudes and between land and ocean points (Hansen et al. 2006). A previous study (Cowtan and Way 2014, hereafter CW14) identified the uneven distribution of unsampled regions around the globe as a source of bias in global temperature trends calculated using the HadCRUT4 dataset. In particular, CW14 found that incomplete sampling of high-latitude polar regions led to an underestimation of the global warming trend over recent decades, that is, the 1998–2016 “warming pause” found in other global temperature reconstructions (Easterling and Wehner 2009; Hansen et al. 2010; Karl et al. 2015). After applying a kriging-based interpolation method (Cressie 1990) to the raw HadCRUT4 data, the authors found that the warming trend had been greatly increased, suggesting that imputation is necessary for a more accurate portrayal of temperature variations in this and other surface temperature datasets. Other studies have replicated CW14’s results and shown that interpolation over Arctic regions eliminates the apparent “pause” found in uncorrected datasets (Huang et al. 2017). The effect of infilling missing data in the Arctic highlights the necessity of addressing missing data in climate fields, and it is used in this study as a means of illustrating the effects of various interpolation methods.
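The coverage bias described above can be illustrated with a toy calculation. The sketch below is not taken from CW14 or from the HadCRUT4 processing chain: the function name `global_mean`, the cosine-of-latitude weighting, and the idealized warming pattern are all assumptions chosen for illustration. It shows that when warming is strongest at high latitudes, dropping the polar grid cells lowers the area-weighted mean.

```python
import numpy as np


def global_mean(field, lat):
    """Area-weighted mean of a (nlat, nlon) field, ignoring NaNs.

    Weights are proportional to cos(latitude), a common approximation
    for the area of cells on a regular latitude-longitude grid.
    """
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.radians(lat))[:, None] * np.ones_like(field)
    valid = ~np.isnan(field)
    return np.sum(np.where(valid, field * weights, 0.0)) / np.sum(weights[valid])


# Hypothetical 5 x 5 degree grid and an idealized, polar-amplified warming.
lat = np.arange(-87.5, 90.0, 5.0)
lon = np.arange(-177.5, 180.0, 5.0)
warming = (1.0 + 2.0 * np.abs(np.sin(np.radians(lat))))[:, None] * np.ones((1, lon.size))

# Remove the cells poleward of 65 degrees, mimicking missing polar coverage.
unsampled = warming.copy()
unsampled[np.abs(lat) > 65.0, :] = np.nan

print("complete coverage :", global_mean(warming, lat))
print("poles unsampled   :", global_mean(unsampled, lat))
```

In this idealized setting the second number is smaller than the first, which is the sense of the bias discussed in the text; the size of the effect in real data depends on the actual coverage and warming pattern.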

Spatial estimation methods have a long history in this field, which we recount more fully in section 2. For now, we note that, whether Bayesian or frequentist, these methods all assume that the data are missing at random, in the sense that the likelihood of a value being missing is independent of the value itself. While there are some similarities between the temporal structure of missing values (Fig. 1) and global warming (e.g., Fig. 8), the patterns are different enough that this is not an issue. The recent, relatively complete period may be used to characterize spatial relationships within the field, and to exploit these relationships to impute missing data at earlier times.

Generally speaking, existing approaches do so in two main ways: 1) local methods parametrically model spatial covariances to infer missing data as a function of distance only, usually within a short radius (e.g., Hansen and Lebedeff 1987; Reynolds and Smith 1994; Smith et al. 1996; Reynolds et al. 2002; Tingley and Huybers 2010; Rohde et al. 2013; CW14); 2) global methods leverage long-range correlations but must regularize the covariance estimation problem either via truncation (Kaplan et al. 1997, 1998, 2003) or model selection (Guillot et al. 2015). Very few methods bridge this dichotomy; Karspeck et al. (2012) offer one example, constructing a nonstationary Matérn model to estimate the North Atlantic sea surface temperature field in a Bayesian framework. Multiresolution lattice kriging (Nychka et al. 2015) offers another, recently applied to surface temperature (Ilyas et al. 2017).

The objective of this paper is to apply the theory of Markov random fields (MRFs; aka graphical models) to this imputation problem. Recently, Guillot et al. (2015, hereafter G15) used MRFs to flexibly model temperature covariance in the context of paleoclimate reconstructions, a closely related problem (Tingley et al. 2012). Here we apply G15’s graphical expectation–maximization (GraphEM) algorithm to produce a 100-member ensemble of spatially complete realizations of the latest HadCRUT4 dataset (HadCRUT4.6). We show that the imputation results in more realistic reconstructions of past climate events, like the 1877/78 El Niño, and stronger historical warming trends than in uninterpolated data.

The rest of this paper is structured as follows: we first describe the data, as well as the methods used to infill the gaps therein (section 2). The methods are benchmarked against synthetic data in section 3. Results on HadCRUT4 data are presented in section 4, followed in section 5 by an analysis of the climate network afforded by the graphical approach. Section 6 discusses the benefits and limitations of our approach and offers conclusions.

2. Data and methods

a. Imputation of missing values

Following Schneider (2001), we consider an incomplete climate dataset with $n$ samples of $p$ variables (say, monthly surface temperature over the past 167 years on a grid with $p = 2592$ points, corresponding to HadCRUT4’s 5° × 5° grid). Let us denote by $X$ the $n \times p$ data matrix. Hence, each row of $X$ corresponds to temperature records at the different grid cells at a given time, and each column corresponds to the temporal record for a given grid cell. Our goal is to fill in the blanks of the dataset in a way that is consistent with the available data. Estimating statistics of incomplete data and imputing missing values are parts of an estimation problem that is generally nonlinear (Schneider 2001). In this work, we assume that each row can be modeled by a $p$-dimensional normal distribution with mean vector $\mu \in \mathbb{R}^{1 \times p}$ and covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$. The normality assumption can be further relaxed to include heavy-tailed distributions. A popular approach for infilling missing values in $X$ is to use the EM algorithm (Dempster et al. 1977). In the multivariate normal setting, given an estimate of the mean $\mu$ and the covariance matrix $\Sigma$ of the dataset, the EM algorithm reduces to regressing the missing values on the available ones, and thereafter updating the estimates of $\mu$ and $\Sigma$. This procedure is iterated until convergence. More precisely, following G15’s notation, for a given row $x \in \mathbb{R}^{1 \times p}$ of $X \in \mathbb{R}^{n \times p}$, let $a, m \subset \{1, \dots, p\}$ denote the sets of indices for which the values of $x$ are available and missing, respectively. Note that the available and missing indices may change for each row. Let $x_a \in \mathbb{R}^{1 \times |a|}$ and $x_m \in \mathbb{R}^{1 \times |m|}$ be the row vectors obtained by restricting $x$ to its available and missing parts. Let $\mu^{(0)}$ and $\Sigma^{(0)}$ be initial estimates of $\mu$ and $\Sigma$. For example, they could be the sample mean and sample covariance calculated after replacing every missing value $x_{i,j}$ in $X$ by the average of the available values in column $j$. A multiple of the identity matrix can be added to the initial sample covariance matrix to ensure it is invertible. The EM algorithm iteratively constructs a sequence of parameters $\mu^{(l)}$, $\Sigma^{(l)}$ as follows. For every $l \geq 0$, the E step consists of a linear regression of the missing values on the available ones in each row $x$ of $X$:

$$\left(x_m - \mu_m^{(l)}\right)^T = B^{(l)} \left[x_a - \mu_a^{(l)}\right]^T, \tag{1}$$

where

$$B^{(l)} = \Sigma_{ma}^{(l)} \left(\Sigma_{aa}^{(l)}\right)^{-1} \in \mathbb{R}^{|m| \times |a|} \tag{2}$$

denotes the matrix of regression coefficients, and

$$\mu^{(l)} = \left(\mu_a^{(l)}, \mu_m^{(l)}\right), \qquad \Sigma^{(l)} = \begin{pmatrix} \Sigma_{aa}^{(l)} & \Sigma_{am}^{(l)} \\ \Sigma_{ma}^{(l)} & \Sigma_{mm}^{(l)} \end{pmatrix}, \tag{3}$$

denote the $l$-th iterates of the mean vector $\mu^{(l)} \in \mathbb{R}^{1 \times p}$ and the covariance matrix $\Sigma^{(l)}$, both partitioned according to the available and missing parts of the given row $x$. Denote by $X^{(l+1)}$ the completed estimate of $X$, obtained after the regression (1) has been performed in order to impute the missing values in each row of $X$. In the M step of the algorithm, the estimates of $\mu$ and $\Sigma$ are updated by

$$\mu_i^{(l+1)} = \frac{1}{n} \sum_{k=1}^{n} X_{ki}^{(l+1)}, \qquad \Sigma_{ij}^{(l+1)} = \frac{1}{n} \sum_{k=1}^{n} \left[ \left(X_{ki}^{(l+1)} - \mu_i^{(l+1)}\right) \left(X_{kj}^{(l+1)} - \mu_j^{(l+1)}\right) \right] + C_{ij}^{(l+1)}, \tag{4}$$

where $C_{ij}^{(l+1)}$ is the covariance of the residuals. Using the same block decomposition as in (3), we have

$$C^{(l+1)} = \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_{mm}^{(l)} - \Sigma_{ma}^{(l)} \left(\Sigma_{aa}^{(l)}\right)^{-1} \Sigma_{am}^{(l)} \end{pmatrix}. \tag{5}$$

The reader is referred to Little and Rubin (2002) and G15 for more details on the EM algorithm.
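The E and M steps described above can be sketched in a few lines of NumPy. This is a minimal, unregularized illustration and not the GraphEM or RegEM code: the function name `em_impute`, the small `ridge` term added at every iteration, and the relative-change stopping rule are assumptions. Because the missing pattern differs from row to row, the sketch averages the residual covariance over rows, which is one reading of the covariance update.

```python
import numpy as np


def em_impute(X, max_iter=100, tol=5e-3, ridge=1e-6):
    """Fill NaNs in an (n, p) array by EM under a multivariate normal model."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)

    # Initial guess: replace each missing value by its column average.
    Xc = np.where(missing, np.nanmean(X, axis=0), X)
    mu = Xc.mean(axis=0)
    # A small multiple of the identity keeps the covariance invertible.
    Sigma = np.cov(Xc, rowvar=False, bias=True) + ridge * np.eye(p)

    for _ in range(max_iter):
        X_new = Xc.copy()
        C = np.zeros((p, p))  # accumulated covariance of the residuals
        for k in range(n):
            m = missing[k]
            a = ~m
            if not m.any():
                continue
            if not a.any():
                # Nothing observed in this row: fall back to the mean.
                X_new[k] = mu
                C += Sigma
                continue
            S_aa = Sigma[np.ix_(a, a)]
            S_ma = Sigma[np.ix_(m, a)]
            # E step: B = S_ma S_aa^{-1}, obtained with a linear solve.
            B = np.linalg.solve(S_aa, S_ma.T).T
            X_new[k, m] = mu[m] + B @ (X[k, a] - mu[a])
            C[np.ix_(m, m)] += Sigma[np.ix_(m, m)] - B @ S_ma.T

        # M step: update mean and covariance from the completed matrix.
        mu = X_new.mean(axis=0)
        D = X_new - mu
        Sigma = (D.T @ D + C) / n + ridge * np.eye(p)

        # Stop when the imputed values change little between iterations.
        step = np.linalg.norm(X_new[missing] - Xc[missing])
        scale = max(np.linalg.norm(X_new[missing]), 1e-12)
        Xc = X_new
        if step / scale < tol:
            break
    return Xc, mu, Sigma
```

The sketch assumes every column has at least one observed value. It is only practical when the number of samples comfortably exceeds the number of variables; otherwise the covariance must be regularized more carefully than with the small ridge term used here.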

The estimate (2) of the regression coefficients implies two central challenges in carrying out an imputation of missing values based on (1): (i) the covariance matrices must be reliably estimated; (ii) some submatrices Σaa must be inverted. Although these problems are obviously linked, they are somewhat distinct in practice, and both pose a number of problems in high dimensions. Indeed, for HadCRUT4, the number of variables p = 2592 exceeds the number of monthly observations (n ~ 2000). This “large p, small n problem” is well known to result in sample covariance matrices that are not only rank deficient (hence noninvertible) but also very imprecise (Johnstone 2001). This imposes a need for some form of regularization to impute the missing data. Note that this need arises universally in spatial estimation problems, regardless of the framework. Mean vectors μ must also be well estimated, but this step is less problematic than the estimation of covariance matrices from incomplete data (Little and Rubin 2002).
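The rank deficiency mentioned above is easy to reproduce on synthetic data. The following sketch uses arbitrary dimensions (n = 20 samples, p = 50 variables) and an arbitrary ridge value of 0.1; neither is taken from the paper. It shows that the sample covariance has rank at most n − 1, and that adding a multiple of the identity, the simplest form of regularization, restores invertibility.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50  # fewer samples than variables ("large p, small n")
X = rng.standard_normal((n, p))

# Sample covariance of the p variables (rows are samples).
S = np.cov(X, rowvar=False)
rank = np.linalg.matrix_rank(S)
print("p =", p, "rank of sample covariance =", rank)

# Adding a multiple of the identity shifts every eigenvalue upward,
# so the regularized matrix is positive definite and invertible.
S_ridge = S + 0.1 * np.eye(p)
smallest_eigenvalue = np.linalg.eigvalsh(S_ridge).min()
print("smallest eigenvalue after ridge:", smallest_eigenvalue)
```

Centering removes one degree of freedom, which is why the rank cannot exceed n − 1 regardless of the data.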

In climate studies, covariance regularization has been accomplished in three main ways: (i) explicit modeling of the spatial dependence of covariances (e.g., Hansen and Lebedeff 1987; Jones and Moberg 2003; Brohan et al. 2006; Tingley and Huybers 2010); (ii) expansion of covariance matrix estimates in a truncated principal component basis to achieve dimension reduction (e.g., Reynolds and Smith 1994; Smith et al. 1996, 2008; Kaplan et al. 1997, 1998, 2000, 2003; Ribes et al. 2013); (iii) filtering of trailing principal components of covariance matrix estimates through Tikhonov regularization/ridge regression (Schneider 2001). In the first case, regularization is achieved by choosing spatial covariance functions that drop sharply with radial distance (e.g., Matérn kernels). This choice is subjective and, so far, has not allowed for anisotropy or inhomogeneities that may be associated with surface features like coastlines or mountain ranges. In the second case, regularization is achieved through the choice of a truncation parameter (the number of principal components to be retained). Heuristics have typically been used for this choice, representing a limitation of this approach as practiced in climate applications. In the third case, regularization is achieved through the choice of a continuous regularization parameter (“ridge parameter”), which measures the strength of the filtering of trailing principal components (Hoerl and Kennard 1970b,a; Tikhonov and Arsenin 1977). Ridge regression with determination of a regularization parameter by generalized cross validation (GCV; Golub et al. 1979; Craven and Wahba 1978) has been used in the regularized expectation maximization (EM) algorithm (Schneider 2001) and was shown to lead to more accurate SST estimates for recent decades than the method of Reynolds and Smith (1994).

A novel approach consists in modeling temperature using a Gaussian graphical model (also known as a Markov random field) (G15). This involves representing conditional independence relations between locales with the aid of a graph G = (V, E), where V are the vertices (temperature grid points, also referred to as nodes of the graph) and E the edges of the graph (connections between grid points). Because this approach will be unfamiliar to most readers, we now provide a brief overview.

b. Graphical covariance modeling

An attractive feature of graph-based methods is their demonstrated link to dynamical features of fluid flow (land–ocean boundaries, orography, or wave-propagation pathways, aka teleconnections), which traditional covariance modeling methods do not capture (Tupikina et al. 2016). This makes such techniques applicable to any geophysical field that may be modeled by a multivariate normal distribution, including those representing passive tracers. This includes surface temperature, which may be regarded as a passive tracer of atmospheric or oceanic flow on the monthly scales of interest in this work (Tupikina et al. 2016).

Let $X = (X_1, \dots, X_p)$ be a normally distributed random vector with mean $\mu$ and covariance matrix $\Sigma$. It is well known that zero entries in $\Sigma$ correspond to marginal independence of the associated random variables. A possible approach to reduce dimensionality in the estimation of $\Sigma$ is thus to seek a sparse representation of the covariance matrix. In practice, however, fields that show long-range correlations (e.g., temperature fields) will show very few true zero entries in $\Sigma$. Nevertheless, they could be quite abundant in the precision matrix $\Omega = \Sigma^{-1}$. Indeed, one can show that a zero in $\Omega = \Sigma^{-1}$ corresponds to the conditional independence of the corresponding variables (Dempster 1972; Lauritzen 1996). Two variables $X_i$ and $X_j$ are said to be conditionally independent if $X_i$ is independent of $X_j$ given $\{X_1, \dots, X_p\} \setminus \{X_i, X_j\}$. Hence, $\Omega_{ij} = 0$ if $X_i$ and $X_j$ are independent given the value of all the other variables. Such conditional independence relations are ubiquitous in temperature fields. For example, while the temperature at two distant locations (say, New York, New York, and Los Angeles, California) may display a nonnegligible level of correlation, the temperature in Los Angeles may not help too much in predicting the temperature in New York given the temperature at closer locations (say, Boston, Massachusetts). In such a case, the corresponding random variables would be conditionally independent given the Boston temperature, and the corresponding entry of $\Omega$ would be 0. Graphical models leverage these zeros in $\Omega$ to reduce the dimensionality of the estimation problem, resulting in a well-conditioned covariance matrix $\Sigma$.

The conditional independence relations of a random vector $(X_1, \dots, X_p)$ may be encoded in an undirected graph $G$ with vertex set $V = \{1, \dots, p\}$, and where two vertices $i, j \in V$ are connected by an edge if and only if the corresponding variables $X_i$ and $X_j$ are not conditionally independent (thus the name graphical model). The graph $G$ is called the concentration graph associated with the random vector. Hence, the lack of an edge between two vertices in the concentration graph of $X = (X_1, \dots, X_p)$ denotes conditional independence of the corresponding random variables.
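A small numerical example may help fix ideas. The sketch below uses a chain of six hypothetical stations whose correlation decays geometrically with separation (a first-order autoregressive structure, chosen purely for illustration, with an arbitrary parameter of 0.7). Every pair of stations is correlated, yet the precision matrix is tridiagonal, so the concentration graph connects only adjacent stations.

```python
import numpy as np

p, phi = 6, 0.7
idx = np.arange(p)

# Covariance with geometric decay: every pair of stations is correlated.
Sigma = phi ** np.abs(idx[:, None] - idx[None, :])

# Precision matrix; entries below a small tolerance are rounding noise.
Omega = np.linalg.inv(Sigma)
Omega[np.abs(Omega) < 1e-10] = 0.0

# Concentration graph: an edge wherever the precision entry is nonzero.
adjacency = Omega != 0
np.fill_diagonal(adjacency, False)
edges = [(i, j) for i in range(p) for j in range(i + 1, p) if adjacency[i, j]]

print("zero entries in Sigma:", int(np.sum(Sigma == 0)))
print("zero entries in Omega:", int(np.sum(Omega == 0)))
print("edges of the concentration graph:", edges)
```

Here the covariance has no zero entries while the precision matrix has twenty, and the only edges are those linking each station to its immediate neighbors, mirroring the New York, Boston, and Los Angeles example in the text.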

Recently, G15 embedded such graphical models within an EM algorithm (Dempster et al. 1977). The resulting method, GraphEM, is a generalization of the regularized EM (RegEM) algorithm (Schneider 2001), and was used to estimate annual temperature over the Common Era (Wang et al. 2015; Neukom et al. 2019) with demonstrably higher spatial fidelity than competing methods (Wang et al. 2014; G15), including RegEM variants. The main difficulty in this approach is to reliably estimate the concentration graph G from a finite dataset in the presence of observational noise.

G15 explored two approaches to graph estimation: 1) modeling G as a neighborhood graph, that is, a graph obtained by declaring all points within a radius R of each vertex to be its neighbors; 2) using the graphical least absolute shrinkage and selection operator (GLASSO; Friedman et al. 2008), an $\ell_1$-penalized likelihood method, to discover the conditional independence relations hiding in the data. Both approaches involve a tuning parameter [the radius R or the penalty parameter ρ; see G15, Eq. (2.1)], and both parameters may be optimized via cross validation. G15 found that for suitable choices of ρ, the GLASSO approach yielded better estimates of the temperature field than the neighborhood graph method. The choice of ρ is a trade-off between including too few neighbors (ρ large), which in the limit makes the graph strictly local, and including too many neighbors (ρ small), some of which are spurious and could raise imputation error. Once a compromise is reached to find a suitable graph G, GraphEM uses it to obtain well-conditioned estimates of Σ and its inverse Ω, which are then used to estimate the missing values via regression (Fig. 2). Starting from initial guesses about μ and Σ, the algorithm proceeds by fitting the given graph to the data, resulting in a graphical covariance estimate $\hat{\Sigma}_G$ (step 2). This well-conditioned estimate is then used to compute the regression coefficients (step 3), which are used to estimate the expected value of missing entries (step 4, expectation step of the EM algorithm). The completed matrix is then used to update the maximum-likelihood estimates of μ and Σ (step 5, maximization step of the EM algorithm). The cycle continues until values reach convergence within a specified numerical tolerance (typically, $5 \times 10^{-3}$). In the following, we apply this variant of GraphEM to the HadCRUT4.6 dataset, which we now describe. The computational steps are outlined in section 2d.
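Of the two graph-estimation approaches described above, the neighborhood graph is the simpler to sketch. The code below is an illustration, not G15's implementation: the function name `neighborhood_graph`, the use of the haversine great-circle distance, and the Earth radius of 6371 km are assumptions.

```python
import numpy as np


def neighborhood_graph(lat, lon, radius_km, earth_radius_km=6371.0):
    """Boolean adjacency matrix linking points closer than radius_km.

    lat and lon are 1D arrays of point coordinates in degrees.
    """
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    # Haversine formula for the great-circle distance between all pairs.
    h = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    dist = 2.0 * earth_radius_km * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))
    adjacency = dist <= radius_km
    np.fill_diagonal(adjacency, False)  # a vertex is not its own neighbor
    return adjacency


# Hypothetical regional example: a 5 x 5 degree grid over the tropical Pacific.
grid_lat, grid_lon = np.meshgrid(np.arange(-12.5, 15.0, 5.0),
                                 np.arange(-177.5, -120.0, 5.0), indexing="ij")
adj = neighborhood_graph(grid_lat.ravel(), grid_lon.ravel(), radius_km=1000.0)
print("points:", adj.shape[0], "edges:", int(adj.sum()) // 2)
```

The all-pairs distance matrix grows as the square of the number of points, so for a full global grid a blockwise computation may be preferable. By construction this graph depends on distance only and cannot represent anisotropic structure, which is the limitation that a GLASSO-estimated graph is meant to address.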

Fig. 2.

Schematic representation of the GraphEM algorithm. Starting from an initial guess of the mean and covariance, as well as some information about the conditional independence structure (i.e., the concentration graph G), the algorithm first computes a graphical estimate of the covariance matrix, uses it to compute regression coefficients, imputes the missing values, and then uses the completed data to update the mean and covariance of the dataset. The algorithm proceeds until convergence, which is always guaranteed for a fixed graph.


c. Source data

The original source of historical temperature data used in this study is HadCRUT4 (version 4.6.0.0, herein HadCRUT4.6). HadCRUT4 is a monthly resolved global surface temperature anomaly dataset placed on a 5° × 5° grid. The anomalies are calculated with respect to a 1961–90 reference period and are available from January 1850 to September 2018. HadCRUT4 is organized as a 100-member ensemble, which samples the distribution of likely surface temperature anomalies (Morice et al. 2012). Structuring the dataset in this manner enables better representation of complex temporal and spatial interdependencies of measurement and bias uncertainties, which allows correlated uncertainties to be considered (Morice et al. 2012). HadCRUT4 is among the most widely used temperature datasets in the climate science community. However, like any surface temperature dataset, HadCRUT4 suffers from incomplete coverage: historical observations are not available for slightly more than half of the grid cells (~53%) over the entire time period (1850–2017) in the HadCRUT4.6 version, but the data show improved coverage (~70%) since 1979 (Fig. 1).

CW14 identified such coverage gaps as a major source of bias in global average temperature derived from the HadCRUT4.6 dataset. In particular, the authors found that incomplete sampling of high-latitude polar regions leads to an underestimation of the global warming trend over recent decades found in other preexisting global temperature reconstructions. The authors performed three reconstructions: null, kriging, and hybrid. The null reconstruction involves setting the missing values to the global mean temperature, whereas kriging allows for the estimation of unknown values by making use of the weighted average of neighboring sampled data. The hybrid reconstruction features the combination of both the kriging method and the University of Alabama in Huntsville’s satellite dataset (Spencer 1990; Christy et al. 2007).

The authors found that in all cases, the kriging and the hybrid approaches outperformed the null reconstruction in cross-validation testing. The authors also found that the hybrid approach outperformed the kriging method in a few cases, notably at high latitudes. However, because the hybrid reconstruction is limited to a shorter time period (1979–2017), we use the kriging solution as a starting point for our work since it covers the entire time span of HadCRUT4.6.

We build on CW14 by using the global temperature series that they obtained using their kriging and hybrid approaches to update the HadCRUT4.6 data and model their covariance structure before infilling the remaining missing values. Our method ingests both the CW14 kriging results (version 1.0, herein HadCRUT4.6CW) and Cowtan and Way’s hybrid short reconstruction version 2.0 that incorporates the MERRA satellite reanalysis dataset [HadCRUT4.6CW(MERRA)], which spans the 1979–2017 period and offers near global spatial coverage (Bosilovich et al. 2016). Our results will be compared against a comprehensive set of surface temperature datasets (Table 1).

Table 1.

Surface temperature datasets used in this study. L+O: land and ocean. O: ocean only.

d. Workflow

Here, we produce a spatially complete global gridded temperature dataset extending back to 1850. We start off by estimating both the precision matrix and the concentration graph of the gridded temperature field using information from the HadCRUT4.6CW(MERRA) reconstruction (Fig. 3). The concentration graph is obtained via the $\ell_1$-penalized maximum likelihood method described in G15 (section 2.1). More specifically, the graphical lasso is used to obtain a sparse estimate of the precision matrix of the field for a given target sparsity level. Once this precision matrix is obtained, the graph can be estimated from the pattern of zeros in the precision matrix. Since HadCRUT4.6CW(MERRA) only covers the 1979–2017 interval, it is assumed that the graph is constant through the entire 1850–2017 period.
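To give a sense of scale for a target sparsity level, the short calculation below converts a sparsity into an edge count and a mean number of neighbors per grid cell. It assumes that sparsity is defined as the fraction of vertex pairs joined by an edge, which is one plausible reading and not a definition stated in the text; the helper name `edges_for_sparsity` is hypothetical. The values p = 2592 and 0.6% are those used in this study.

```python
def edges_for_sparsity(p, sparsity):
    """Edge count and mean degree implied by a target sparsity.

    Assumes sparsity = (number of edges) / (number of vertex pairs).
    """
    n_pairs = p * (p - 1) // 2          # distinct pairs of grid cells
    n_edges = sparsity * n_pairs        # edges retained in the graph
    mean_degree = 2.0 * n_edges / p     # average neighbors per grid cell
    return n_pairs, n_edges, mean_degree


n_pairs, n_edges, mean_degree = edges_for_sparsity(2592, 0.006)
print("vertex pairs:", n_pairs)
print("edges at 0.6% sparsity: about", round(n_edges))
print("mean neighbors per grid cell: about", round(mean_degree, 1))
```

Under this assumption, a 0.6% sparsity corresponds to roughly 20,000 edges among about 3.4 million possible pairs, or about 15 to 16 neighbors per grid cell on average.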

Fig. 3.

Schematic of the workflow used in this study. First, the HadCRUT4.6CW(MERRA) data are used to estimate the covariance matrix of the temperature field. Then, using a target sparsity of 0.6%, the concentration graph of the field is obtained from a greedy search algorithm. The graph is then incorporated into GraphEM to infill the missing values in the raw HadCRUT4 data, using HadCRUT4.6CW as an initial guess.


The choice of an optimal target sparsity parameter for GLASSO is a delicate point. Cross validation was attempted, but resulted in ambiguous choices, likely due to the paucity of fully complete data available for validation. However, testing with higher sparsities showed that despite yielding lower expected prediction error over grid cells that were available in the raw data, the reconstructions tended to show unrealistically large excursions further back in time, when observations are especially scarce. These anomalous values are the result of overfitting the sparsity parameter. To identify a sparsity level that minimized the occurrence of extreme temperature anomalies, we plot kernel density estimates of temperature anomaly probabilities in Fig. 4. Of the sparsity levels tested, 0.6% sparsity shows the least amount of distortion compared to the raw HadCRUT4.6 and yielded the lowest frequency of extreme anomalies. We therefore select 0.6% sparsity as the best compromise.

Fig. 4.

Kernel density estimates of probability density plotted on a logarithmic scale for the raw HadCRUT4.6 median, GraphEM with GLASSO products at various target sparsities, and GraphEM neighborhood graph with 1000-km radius.


Having estimated the concentration graph at a target sparsity of 0.6%, we then apply GraphEM to impute the missing values in HadCRUT4.6. Because GraphEM requires an initial guess for the temperature field before interpolating missing values, we use the HadCRUT4.6CW(MERRA) reconstruction as a starting point rather than setting missing values equal to 0, which is the default behavior when an initial guess is not provided. The results are not sensitive to this choice, though it greatly accelerates computations by providing a “warm” start.
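To make concrete why a sparse precision matrix helps, the sketch below shows only the standard Gaussian conditional-mean step for fixed parameters: missing entries are replaced by their expectation given the observed ones. This is not the GraphEM code itself, which also iterates on the mean and covariance; the function name and toy values are assumptions for illustration.

```python
import numpy as np

def impute_conditional_mean(x, mu, omega):
    """Fill NaNs in x with E[x_mis | x_obs] for a Gaussian with mean mu and precision omega."""
    mis = np.isnan(x)
    obs = ~mis
    filled = x.copy()
    if mis.any():
        # E[x_mis | x_obs] = mu_mis - inv(Omega_mm) @ Omega_mo @ (x_obs - mu_obs)
        rhs = -omega[np.ix_(mis, obs)] @ (x[obs] - mu[obs])
        filled[mis] = mu[mis] + np.linalg.solve(omega[np.ix_(mis, mis)], rhs)
    return filled

# Toy chain graph (hypothetical values): only adjacent variables are linked.
omega = np.array([
    [ 2.0, -1.0,  0.0,  0.0],
    [-1.0,  2.0, -1.0,  0.0],
    [ 0.0, -1.0,  2.0, -1.0],
    [ 0.0,  0.0, -1.0,  2.0],
])
x = np.array([1.0, np.nan, np.nan, 0.5])
filled = impute_conditional_mean(x, np.zeros(4), omega)
print(filled)
```

Only the rows of the precision matrix that belong to missing entries are involved, and each row is nonzero only for graph neighbors, which is what keeps the estimation problem small.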

3. Imputation of synthetic data

We evaluate the performance of our and other infilling algorithms through application to synthetic benchmark datasets. We refer to these tests as “pseudoworld” tests.

a. Pseudoworlds

Near-surface temperature fields from two CMIP5 historical experiments were used to provide physically plausible, globally complete realizations of surface temperature. The two pseudoworlds (PW1–PW2) are derived from surface air temperature and sea surface temperature fields obtained from historical experiments with two CMIP5 models: CCSM4 (Deser et al. 2012) and MPI-ESM-MR (Giorgetta et al. 2013).

SST fields were generated by masking ocean skin temperature fields where simulated sea ice concentrations are greater than 15%. SST anomalies were then calculated, regridded to the HadSST3 (Kennedy et al. 2011a,b) grid and subsampled to observed locations. Measurement and sampling uncertainties were then sampled from the HadSST3 error covariance matrices and added to the subsampled model fields.

To create land air temperature fields, climate model LSAT anomalies were regridded to the same 5° × 5° latitude–longitude grid as the CRUTEM4 dataset (Jones et al. 2012) and global coverage was reduced to match CRUTEM4. Measurement and sampling uncertainty (associated with observational error and estimation of gridbox average temperature anomalies from a finite number of point observations) was sampled and added to the subsampled model fields. Our results show very few differences between methods over land, however, so we omit LSAT results for brevity.

b. Results

The SST component of these two pseudoworlds is then used to benchmark the performance of four infilling algorithms:

  • Null: A null reconstruction is performed by setting missing values equal to the local monthly mean temperature.

  • Kriging: The kriging method employed by CW14.

  • GraphEM, R1000: GraphEM using a neighborhood graph with a cutoff radius of 1000 km.

  • GraphEM, GLASSO: GraphEM using GLASSO with target sparsities ranging from 0.4% to 1.2%.
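The null baseline and the "difference in MSE to the null" used in the figures can be sketched as follows. This assumes that "local monthly mean" means the mean of the available values for the same calendar month at the same grid cell, and that the field is stored as a (months × cells) array starting in January; the function names are hypothetical.

```python
import numpy as np

def null_reconstruction(field):
    """Fill NaNs with the mean of available values for the same calendar month and cell.

    field: array of shape (n_months, n_cells), first row is a January.
    """
    filled = field.copy()
    for m in range(12):
        block = filled[m::12]                  # view: every row of calendar month m
        clim = np.nanmean(block, axis=0)       # per-cell climatology for that month
        rows, cols = np.where(np.isnan(block))
        block[rows, cols] = clim[cols]         # writes through to `filled`
    return filled

def mse_per_cell(truth, recon):
    """Mean-squared error over time for each grid cell."""
    return np.mean((recon - truth) ** 2, axis=0)

def mse_gain_over_null(truth, observed, recon):
    """MSE(recon) minus MSE(null) per cell; negative values mean the method beats the null."""
    return mse_per_cell(truth, recon) - mse_per_cell(truth, null_reconstruction(observed))
```

A cell with no observations at all for a given calendar month would stay NaN (and NumPy would emit a warning), so a real application would need a fallback for such cells.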

We first explore the impact of the target sparsity (Fig. 5). The results suggest marked improvements in mean-squared error (MSE) over several regions, notably the Pacific cold tongue, for all target sparsity levels, particularly for pseudoworld 1. For better observed regions (e.g., the Atlantic) there are no noticeable improvements in either pseudoworld. A bias-variance decomposition (not shown) further shows that these gains are almost entirely driven by the variance component.
Fig. 5.

Impact of graph sparsity in pseudoworld experiments. (first column) MSE of the null reconstruction on the pseudoworlds over ocean points (PW1 SST and PW2 SST). (columns 2–6) Difference in MSE to the null reconstruction of SST fields reconstructed using GraphEM with GLASSO at target sparsities ranging from 0.4% to 1.2%.


These experiments all display comparable improvements over the null, providing little guidance on the choice of an optimal sparsity parameter. For this reason, the choice for the real HadCRUT4 dataset is chiefly guided by the distribution of temperature extremes (Fig. 4). For consistency with HadCRUT4, we hence focus on the 0.6% sparsity level from now on.

Figure 6 compares the various methods. It shows all methods improving comparably on the null, with a slight advantage for GraphEM (with either choice of graph) over kriging in pseudoworld 1. Pseudoworld 2 imputations show smaller improvements over the cold tongue, perhaps because the underlying model (MPI-ESM) displays relatively weaker El Niño–Southern Oscillation (ENSO) variability than CCSM4, whose ENSO cycle is overly active (Deser et al. 2012). Results with this pseudoworld, however, suggest that all methods perform worse than the null (increasing, rather than decreasing, the MSE) close to the Southern Ocean. The effect is particularly pronounced for the kriging method. This is likely an edge effect, due to the masking of anomalies when sea ice concentration is above 15%, and the consideration of SST anomalies alone.

Fig. 6.

Method intercomparison in pseudoworld experiments. (first column) MSE of the null reconstruction on the pseudoworlds over ocean points (PW1 SST and PW2 SST). (columns 2–4) Difference in MSE to null reconstruction of SST fields reconstructed using kriging, GraphEM with a neighborhood graph of 1000-km radius, and GraphEM with GLASSO at 0.6% target sparsity.


Table 2 synthesizes a global metric, the root-mean-squared error (RMSE), for all cases investigated. The lowest global scores are achieved with GraphEM, with only a weak dependence on the sparsity level. However, it is obvious from Fig. 6 that the metric is dominated by certain “hot spots.”
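For reference, an area-weighted RMSE on a regular latitude–longitude grid can be sketched as below. The cosine-of-latitude weighting is the usual convention and is assumed here; the exact weighting used for Table 2 is not spelled out in this section, and the function name is hypothetical.

```python
import numpy as np

def area_weighted_rmse(truth, recon, lats_deg):
    """RMSE over time and space, weighting each cell by cos(latitude).

    truth, recon: arrays of shape (n_time, n_lat, n_lon)
    lats_deg:     cell-center latitudes in degrees, shape (n_lat,)
    """
    weights = np.cos(np.deg2rad(lats_deg))[None, :, None]
    sq_err = (recon - truth) ** 2
    weights = np.broadcast_to(weights, sq_err.shape)
    return float(np.sqrt((weights * sq_err).sum() / weights.sum()))
```

Because a single scalar pools all cells, a few regions with large errors can dominate it, which is why maps of the error remain informative alongside such a score.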

Table 2.

Area-weighted RMSE scores for the pseudoworlds imputations. The term R1000 refers to GraphEM with a 1000-km neighborhood graph; all subsequent columns refer to GraphEM imputations using the graphical lasso with sparsity equal to the column label. Minima for each pseudoworld are indicated in bold.


Taken together, these results suggest that GraphEM offers slight advantages over kriging. However, they would lead us to expect relatively minor differences between the methods, unlike what we will observe when applying them to real datasets (section 4). These benchmarks come with the caveat that the models are, by definition, imperfect representations of reality, particularly regarding their spatial features, and that real errors may be more severe than the modeled ones. These benchmarks therefore cannot be the sole arbiter of methodological choices.

4. Imputation of HadCRUT4 data

Here we examine the results of the GraphEM imputation against published HadCRUT4.6 variants, as well as other surface temperature datasets (Table 1). Because the full dataset is large (2016 × 2592 × 100), we focus for this comparison on two indices: the global mean temperature (GMT) and the Niño-3.4 index, a common metric of El Niño activity (Trenberth and Stepaniak 2001). Spatial context will be given as appropriate.

a. Global variability

We first compare GMT from the GraphEM-infilled dataset to series computed from the raw HadCRUT4.6 median data and from HadCRUT4.6CW. This will illustrate the effect of the GraphEM infilling and quantify the difference between it and the kriging method used by CW14.

Figure 7 compares GMT anomalies from the GraphEM solution to the same series obtained from the raw HadCRUT4.6 median and the CW14 median. Unsurprisingly, the datasets are very close over recent times (post-1950) but diverge markedly back in time. The monthly GMT series from HadCRUT4.6G exhibits much more variability than the raw data pre-1950 and shows a secular warming trend similar to HadCRUT4.6CW. Hints of an enhanced seasonal cycle are also detectable in the two interpolated versions (bottom). These differences are entirely driven by data availability: series using the raw HadCRUT4.6 shrink to the mean, while their interpolated counterparts make use of the few observations that are available and propagate them in space in accordance with their covariance model.

Fig. 7.

Global mean temperature in HadCRUT4.6 variants. (top) Extreme quantiles of the raw HadCRUT4.6 ensemble (gray), the raw HadCRUT4.6 median (blue), HadCRUT4.6CW (red), and HadCRUT4.6G (black). (bottom) As in the top panel, but plotted as difference to the raw HadCRUT4.6 median.


Table 3 gathers warming trends (expressed in °C century−1) over various intervals, calculated as least squares fits using the estimator of Newey and West (1987), as a basis for comparison. It shows that the trend calculated over 1998–2013 in HadCRUT4.6G (0.832°C century−1) lies about midway between the trends found in the raw median (0.548°C century−1) and HadCRUT4.6CW (1.044°C century−1). This confirms CW14’s finding that spatial incompleteness is a major contributor to the global warming “pause” found in previous analyses using the raw HadCRUT4 dataset. Looking at the longest common period (1880–2017), HadCRUT4.6CW displays the strongest trend (0.709°C century−1, with bounds of 0.686 and 0.731°C century−1) of any HadCRUT4.6 variant. So while it is true that trends in interpolated datasets are greater than in the raw HadCRUT4.6 median over all periods analyzed, the type of interpolation is seen to be quite consequential in these estimates.
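A least squares trend with a Newey–West standard error can be sketched in plain NumPy as follows. This is an illustrative implementation with Bartlett weights and no small-sample correction; the lag truncation `max_lag` is a free parameter here, and the value used for Table 3 is not stated in this section.

```python
import numpy as np

def trend_newey_west(t_years, y, max_lag):
    """OLS slope of y on time, with a Newey-West (Bartlett-kernel) standard error.

    Returns (slope, standard_error) in units of y per year.
    """
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(t), t - t.mean()])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    u = X * resid[:, None]                    # per-time-step terms x_t * e_t
    S = u.T @ u                               # lag-0 contribution
    for lag in range(1, max_lag + 1):
        w = 1.0 - lag / (max_lag + 1.0)       # Bartlett weight
        gamma = u[lag:].T @ u[:-lag]          # lag-`lag` cross products
        S += w * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X)
    cov = bread @ S @ bread                   # sandwich covariance of beta
    return float(beta[1]), float(np.sqrt(cov[1, 1]))

# Synthetic monthly series (hypothetical): 1 degC/century trend plus an annual cycle.
t = np.arange(192) / 12.0
y = 0.01 * t + 0.1 * np.sin(2 * np.pi * t)
slope, se = trend_newey_west(t, y, max_lag=12)
print(100 * slope, 2 * 100 * se)  # trend and 2 x SE in degC per century
```

Multiplying the slope and its standard error by 100 converts from per-year to per-century units, and ±2 × SE then gives an interval of the kind described in the Table 3 caption.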

Table 3.

Warming trends (°C century−1) over various intervals. Linear trends and associated confidence intervals, expressed as ±2 × SE, where SE is the standard error, as per the method of Newey and West (1987).


Figure 8 compares the monthly resolved GMT series from HadCRUT4.6G, NASA’s GISTEMP, and NOAAGlobalTemp plotted over the entire interval (top) and over a recent period (1985–present, bottom). The overall shape of all three series is similar, especially the long-term trends, but HadCRUT4.6G’s GMT is sometimes colder by 0.5°–1°C in the 1880s, a large difference that results in stronger secular trends for HadCRUT4.6G prior to the mid-twentieth century. However, HadCRUT4.6G (like the other HadCRUT4.6 variants) is colder over recent decades, so the late twentieth-century trend is larger in GISTEMP and NOAAGlobalTemp (1.136 and 1.118°C century−1, with bounds of 1.052–1.221 and 1.040–1.195°C century−1, respectively). Differences become more apparent at higher resolution (Fig. 8, bottom).

Fig. 8.

(top) Monthly global mean temperature series plotted over their entire time interval for various global surface temperature datasets: GraphEM-infilled HadCRUT4.6 median (black), the GraphEM-infilled ensemble extreme quantile (shaded gray area), NASA’s GISTEMP (yellow), and NOAA’s Global Surface Temperature analysis (brown). (bottom) As in the top panel, but the series are plotted over the 1985–2018 period: GraphEM-infilled HadCRUT4.6 median (black), GraphEM-infilled ensemble extreme quantile (shaded gray area), NASA’s GISTEMP (yellow), and NOAA’s Global Surface Temperature analysis (brown). See Table 1 for details.


A key point from Table 3 is that interpolation methods result in notably different trends between HadCRUT4.6 variants and external datasets (NASA’s GISTEMP, NOAAGlobalTemp). This underlines the importance of interpolation and motivates using the best available methods. Another source of differences between the HadCRUT4.6 variants and GISTEMP/NOAAGlobalTemp is the bias adjustments applied to SSTs, which affect trends across all periods analyzed, particularly the post-1998 “pause” and the period from 1950 on. Kent et al. (2017) cover the longer periods, and Hausfather et al. (2017) cover the more recent periods.

To gain insight into GMT behavior, the zonally averaged temperature evolution from HadCRUT4.6G is charted in Fig. 9. It shows the largest changes over the Arctic. This reinforces the result that missing values over this region were key to the underestimation of recent trends in the raw HadCRUT4.6 dataset (CW14). It also clearly shows that the onset of twentieth-century warming occurred earlier in high northern latitudes. In contrast to the NH polar amplification, in the SH the temperature changes are larger in the tropics and midlatitudes than the polar region.

Fig. 9.

The HadCRUT4.6G temperature field zonally averaged over 10° latitude bands and 5-yr intervals. The interpolated solution shows strong polar amplification and an increased rate of warming over Arctic regions.


b. Regional variability: The case of ENSO

While a comprehensive depiction of spatiotemporal temperature variability is beyond the scope of this paper, we choose to illustrate the impact of data processing on the depiction of ENSO variability. ENSO is the leading mode of global interannual variability, influencing climate and weather over much of the globe (Sarachik and Cane 2010; Trenberth et al. 1998). ENSO can be described by many metrics (Trenberth and Stepaniak 2001), the Niño-3.4 index (average SST over 170°–120°W, 5°S–5°N) being a common one (Trenberth 1997). One motivation for assessing the impact of imputation on this index is that Emile-Geay et al. (2013a,b) found that instrumental trends in Niño-3.4 had a leading-order influence on the amplitude of reconstructed Niño-3.4 variability over the past millennium.
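The Niño-3.4 index defined above can be computed from a gridded anomaly field along the following lines. This is a sketch under stated assumptions: a regular grid with cell-center coordinates, longitudes in degrees east within [−180, 180), cosine-of-latitude weights, and missing cells simply skipped; the function name is hypothetical.

```python
import numpy as np

def nino34_index(sst_anom, lats, lons):
    """Area-weighted mean SST anomaly over 5S-5N, 170W-120W, ignoring NaNs.

    sst_anom: array of shape (n_time, n_lat, n_lon)
    lats, lons: cell-center coordinates in degrees (lons in [-180, 180))
    """
    lat_in = (lats >= -5) & (lats <= 5)
    lon_in = (lons >= -170) & (lons <= -120)
    box = sst_anom[:, lat_in][:, :, lon_in]
    weights = np.cos(np.deg2rad(lats[lat_in]))[None, :, None]
    weights = np.broadcast_to(weights, box.shape)
    valid = ~np.isnan(box)
    weighted_sum = np.where(valid, box * weights, 0.0).sum(axis=(1, 2))
    return weighted_sum / (weights * valid).sum(axis=(1, 2))

# A 5-degree grid like the one used here has 2 x 10 cells inside the box.
lats = np.arange(-87.5, 90.0, 5.0)
lons = np.arange(-177.5, 180.0, 5.0)
```

With only twenty 5° cells in the box, months in which just a handful of cells are observed yield an index driven by very few values, which is the situation discussed for the raw dataset below.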

Niño-3.4 indices computed from each of the HadCRUT4.6 variants are plotted in Fig. 10. Overall, the three datasets show similar trends over recent decades (1960–present) but diverge before 1960. HadCRUT4.6CW shows the least amount of variance further back in time, while HadCRUT4.6G shows the most. This is again a consequence of interpolation: when no interpolation is performed, the index is dominated by the few observations available at that time, which mostly follow shipping tracks between Australia and the United States (Fig. 1; Bunge and Clarke 2009). The CW14 kriging method, which assumes a relatively narrow decorrelation radius, smooths these anomalies down to zero away from ship tracks (i.e., over much of the Niño-3.4 box). In contrast, the GraphEM solution leverages large-scale teleconnections to infer Niño-3.4 temperature (section 5), resulting in stronger anomalies. This is particularly clear for the year 1917, which stands out in differences to the raw HadCRUT4.6 of up to 5 K (Fig. 10, bottom). The large negative anomaly arises from isolated ship observations. Where observations are isolated in this way, the automated quality control is less reliable and occasional large outliers can pass the checks. Reliable interpolation methods could be used to improve the background fields against which quality control checks are made, thereby improving data quality.

Fig. 10.

Niño-3.4 SST in HadCRUT4.6 variants. Conventions identical to Fig. 7.


It is instructive to look at spatial anomalies to understand what is at work. Figure 11 illustrates this for the well-known 1877/78 El Niño, the biggest documented event prior to 1982 (Quinn 1992), which caused widespread famines in China and India that are estimated to have led to the premature death of about 20 million people (Davis 2001). Figure 11 shows a comparison between the raw (uninterpolated) HadCRUT4.6 dataset, HadCRUT4.6CW, and HadCRUT4.6G over various seasonal windows straddling 1877/78. The raw HadCRUT4.6 dataset (top row) is shown to be missing vast swathes of the Pacific. Yet, subtle hints of the equatorial warming associated with the El Niño event can be detected. After applying a kriging-based interpolation method to the dataset (CW, bottom row), the presence of the El Niño event becomes clearer, but the pattern is patchy, with reduced amplitude away from sampled grid points, a natural consequence of distance-based kriging, which produces unphysical features. In contrast, GraphEM can recover the full structure of this known El Niño event, including its far-field effects over Eurasia, either with a neighborhood graph or a GLASSO graph. Thus, the two covariance modeling methods yield very different estimates of past climate variability: HadCRUT4.6CW seems more conservative with outliers like the 1917 anomaly but distorts the spatial patterns and likely overdamps anomalies away from observed locales. On the other hand, HadCRUT4.6G retains more faithful spatial patterns, but can be vulnerable to outliers. In the future, interpolation methods may be used in tandem with quality control checks to better constrain such outliers.

Fig. 11.

Temperature field averaged across each grid cell over DJF, MAM, and JJA months during the 1877/78 El Niño event. (first row) HadCRUT4.6raw, (second row) HadCRUT4.6G with graph estimated using GLASSO and 0.6% target sparsity, (third row) HadCRUT4.6G using a neighborhood graph with a 1000-km cutoff radius, and (fourth row) HadCRUT4.6CW.


Figure 12 compares the HadCRUT4.6G Niño-3.4 to analogous series derived from global SST datasets [ERSSTv5 (Huang et al. 2015; Liu et al. 2015); Centennial Observation-Based Estimates of SST (COBE SST; Folland and Parker 1995; Ishii et al. 2005)], as well as stand-alone Niño-3.4 products (Bunge and Clarke 2009; Kaplan et al. 1998). As before, differences are most pronounced before 1875, and around 1917, though differences of up to 1.5 K are seen through the 2000s for some months (Bunge and Clarke, purple line). These differences are a testament to the difficulty of characterizing ENSO state even with modern observing platforms (Huang et al. 2013) and need to be kept in mind when assessing changes in ENSO variability: the variance among datasets cannot be neglected.

Fig. 12.

(top) Monthly resolved Niño-3.4 series for Kaplan (dark green), Bunge and Clarke (purple), ERSSTv5 (brown), COBE SST (salmon), and HadCRUT4.6G (blue). (bottom) As in the top panel, but series are plotted as differences to the Kaplan Niño-3.4 series. See Table 1 for details on the datasets.


The comparison is summarized in Table 4, which shows the correlations calculated between HadCRUT4.6G and the other series. Most correlations are above 0.9, with the largest differences observed with Kaplan extended at decadal and longer scales.

Table 4.

Comparison of Niño-3.4 indices. Correlations were computed with respect to HadCRUT4.6G for the monthly resolved Niño-3.4 series (N3.4m), an annualized series calculated using a 12-month mean (N3.4ayr), a decadally smoothed series obtained by applying a low-pass filter with a cutoff frequency of 1/120 months to the monthly resolved series (N3.4dyr), an annualized series calculated using only DJF months (N3.4adif), and a decadally smoothed series computed by applying a zero-phase low-pass filter with a cutoff period of 10 years to the annual series calculated using DJF months only (N3.4ddif). The bold font indicates that the correlation is statistically significant.


5. Analysis of the graph

To better understand the reasons underlying GraphEM’s ability to capture far-field climate features like ENSO teleconnections, we now explore the characteristics of the GLASSO-estimated graph G. Figure 13 (top) shows the degree (number of neighbors) of every vertex in the graph, a basic measure of network connectivity. The degree ranges from 5 in parts of the Southern Ocean (where mesoscale eddies control the dynamics) to 60 in the deep tropics, which act as a Rossby wave source (Sardeshmukh and Hoskins 1988) able to project influence at long distances (Horel and Wallace 1981; Simmons et al. 1983).

Fig. 13.

(top) Degree of vertices of G (average number of neighbors per grid cell, logarithmic scale). (middle) Average distance to neighbors in G (km). (bottom) Fraction of common edges between G and G1000. To highlight equator/pole contrasts, only regions where this number falls below 70% are shown.


Figure 13 (middle) displays the average great circle distance to neighbors. It is seen to be as variable as the degree, ranging from a minimum of 150 km in polar and extratropical land regions to over 10 000 km in equatorial land regions. To some extent, this is a consequence of gridding in 5° boxes, which packs many neighbors in small areas near the poles. Nonetheless, the pattern shows marked deviations from zonal symmetry, including marked land–ocean contrasts. Consistent with past work on climate networks (Tsonis et al. 2006; Zerenner et al. 2014), our analysis also finds tropical nodes to be highly connected to the rest of the globe (many neighbors, with large average distances over equatorial forests). However, this picture is highly granular, implying that there is a lot more to this connectivity than latitude.
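The two diagnostics mapped in Fig. 13 (top and middle) can be sketched as follows, assuming the graph is stored as a boolean adjacency matrix and distances are computed with the haversine formula on a sphere of radius 6371 km. The function names and the radius are illustrative assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between points given in degrees (broadcasts)."""
    p1, p2 = np.deg2rad(lat1), np.deg2rad(lat2)
    dlat = p2 - p1
    dlon = np.deg2rad(lon2) - np.deg2rad(lon1)
    a = np.sin(dlat / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def degree_and_mean_distance(adj, lats, lons):
    """Per-vertex degree and mean great-circle distance (km) to graph neighbors."""
    adj = np.asarray(adj, dtype=bool)
    dist = great_circle_km(lats[:, None], lons[:, None], lats[None, :], lons[None, :])
    degree = adj.sum(axis=1)
    total = np.where(adj, dist, 0.0).sum(axis=1)
    # Vertices without neighbors get NaN rather than a division by zero.
    mean_dist = np.divide(total, degree, out=np.full(len(lats), np.nan), where=degree > 0)
    return degree, mean_dist
```

The dense pairwise distance matrix is fine for a 5° grid (2592 cells) but would need a sparse formulation on much finer grids.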

One way to probe the structure of G is to contrast it to a neighborhood graph GR. The rationale for this is threefold: 1) by definition, a neighborhood is radially isotropic and geographically confined; 2) neighborhood graphs have been found to provide a reasonable first guess in estimating Σaa (G15); 3) neighborhood graphs perform similarly in our pseudoworld experiments (section 3). We thus define GR for various values of R. To measure similarity between G and GR at each location l, we restricted G to the points within R km of l, then computed the percentage of edges that are common to that subgraph of G and GR. From this we obtain the fraction of common edges (f). We find its mode to be maximized for R = 1000 km, and henceforth focus on this value of R.
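One possible reading of this similarity measure is sketched below: both graphs are restricted to the vertices within R km of a location, and the shared edges are counted. The choice of denominator (the number of neighborhood-graph edges in the restricted set) is an assumption made for illustration, since the normalization is not spelled out in the text; the function name is hypothetical.

```python
def common_edge_fraction(edges_g, edges_r, nearby):
    """Share of neighborhood-graph edges among `nearby` vertices that also appear in G.

    edges_g, edges_r: iterables of (u, v) vertex pairs (undirected)
    nearby:           vertices within R km of the location of interest
    """
    nearby = set(nearby)

    def restrict(edges):
        # Keep edges whose two endpoints both lie in the nearby set.
        return {frozenset(e) for e in edges if set(e) <= nearby}

    sub_g, sub_r = restrict(edges_g), restrict(edges_r)
    if not sub_r:
        return float("nan")
    return len(sub_g & sub_r) / len(sub_r)

# Toy example with hypothetical vertex labels.
f = common_edge_fraction(
    edges_g=[(0, 1), (2, 1), (0, 3)],
    edges_r=[(0, 1), (1, 2), (0, 2)],
    nearby=[0, 1, 2],
)
print(f)
```

Storing edges as frozensets makes the comparison independent of the order in which the two endpoints are listed.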

Figure 13 (bottom), which shows f over the sphere, reveals some similarity between G and G1000, in the sense that the GLASSO algorithm tends to identify geographical neighbors as climate neighbors. This is particularly true in the tropics, where f ≥ 70% (Fig. 13, bottom and Fig. 14, top left). On the other hand, a more detailed comparison (Fig. 14) shows important differences: while the average distance to neighbors is clustered around 500–600 km in G1000, this metric shows tremendous spread in the GLASSO graph G (Fig. 14, top right), implying very local as well as long-range connections.

Fig. 14.

Network connectivity in G (graphical lasso) vs G1000 (neighborhood graph). (top left) Fraction of common edges between graphs, as raw histogram (f). (top right) Average distance to neighbor in both graphs, as kernel density estimate. (bottom left) Local clustering coefficient C and (bottom right) average network degree k. Solid vertical lines show the average clustering coefficient for each graph. Dashed lines show the clustering coefficients of purely random graphs with the same average degree as G and G1000, for reference.


This has profound consequences for other measures of network topology. The local clustering coefficient C, which measures the local connectedness of a graph’s nodes (Bollobás 1998; Zerenner et al. 2014), is also very different between G and G1000 (Fig. 14, bottom left): while it is tightly clustered around 0.45 in G1000, it ranges from 0.1 to 0.7 in G, with an average of 0.3. In both cases these measures are much larger than the local clustering coefficient of a random graph with identical degree (dashed lines). Similarly, the average shortest pathlength L (Bollobás 1998), which measures how many steps are needed on average to get from one network node to another randomly picked node (Zerenner et al. 2014), is much larger for G1000 than G (18 vs 7). Both are much larger than their random counterparts (around 2 in both cases). Taken together, these large C and L metrics are indicative of small-world networks (Watts and Strogatz 1998), but clearly G is able to capture many more long-range connections than G1000. As Fig. 4 reveals, this is a double-edged sword: too dense a GLASSO graph can make GraphEM rather vulnerable to outliers.
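For readers unfamiliar with these two metrics, a self-contained sketch using only the standard library is given below. It uses the textbook definitions (fraction of connected neighbor pairs, and breadth-first-search path lengths averaged over reachable pairs); the graph representation as a dictionary of neighbor sets and the function names are assumptions for illustration.

```python
from collections import deque
from itertools import combinations

def local_clustering(neighbors, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = neighbors[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return links / (k * (k - 1) / 2)

def average_shortest_path_length(neighbors):
    """Mean number of steps between ordered pairs of distinct, mutually reachable vertices."""
    total, pairs = 0, 0
    for source in neighbors:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from `source`
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("nan")

# Toy graph: a triangle (0, 1, 2) with a pendant vertex 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(toy, 2), average_shortest_path_length(toy))
```

Long-range edges shorten L because a single step can bridge what would otherwise take many hops between geographic neighbors, which is consistent with the smaller L reported for G.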

This is illustrated in Fig. 15. The top-left panel shows the neighbors of a point in the Weddell Sea, its neighbors tightly clustered around it. Compare this graph to the top-right panel, which shows the neighbors of a point in the eastern Pacific cold tongue. There the neighbors are seen to spread along the equatorial waveguide, as well as Baja California, connected by coastal Kelvin wave dynamics (e.g., Cane 1984). The bottom-left panel shows the neighbors of a point in the California Current, closely hugging the western coast of the United States. The neighbors of a point centered in western Europe (bottom right) are very close to a distance-based neighborhood graph.

Fig. 15.

Illustration of the GLASSO graph. Each panel displays the neighbors of a subjectively chosen vertex (black dot) in the GLASSO graph, colored by the values of their correlation to the node. This illustrates the unequal weight of each node’s neighbors in the estimation problem.


Finally, we investigate whether G appears to contain meaningful clusters, that is, a collection of regions where the vertices are well connected within a region, but not among regions. We use spectral clustering (see, e.g., von Luxburg 2007) to produce the clustering and compare it to the same neighborhood graph as above (Fig. 16). Clusters were obtained using the spectral clustering implementation in scikit-learn (Pedregosa et al. 2011) with default parameters, and eight clusters. Because of its isotropy, the neighborhood graph looks the same from any vantage point, so spectral clustering returns a symmetric partition of the sphere. On the other hand, GLASSO delineates very different regions. In particular, it contrasts a North American cluster (light blue) with a Eurasian/North African cluster (cyan). In the Southern Hemisphere, the spectral clustering of G identifies three Antarctic clusters, one located over the East Antarctic ice sheet, one adjacent to the Ross Sea, and one adjacent to the Weddell Sea. Most world oceans form a single cluster (deep blue), which also encompasses tropical landmasses; this likely reflects efficient spreading of climate information by Kelvin and Rossby waves within the tropical waveguide. It is remarkable that clustering the GLASSO graph extracts such climatically relevant features from the data alone, without any other source of information than the number of clusters to retrieve. This suggests that graphs may be used to probe relationships in climate fields.
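The idea behind spectral clustering can be illustrated with its simplest special case, a two-way split based on the sign of the second eigenvector of the graph Laplacian (the Fiedler vector). This NumPy sketch is a deliberately reduced stand-in, not the scikit-learn routine with eight clusters used for Fig. 16, and the toy graph is hypothetical.

```python
import numpy as np

def spectral_bipartition(adj):
    """Split a connected graph in two using the sign of the Fiedler vector."""
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj      # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(laplacian)    # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                         # eigenvector of the 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy graph: two triangles (0, 1, 2) and (3, 4, 5) joined by the single edge 2-3.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0
labels = spectral_bipartition(adj)
print(labels)
```

Which group receives label 0 or 1 is arbitrary (the sign of an eigenvector is not determined), so only the grouping itself is meaningful. Methods with more clusters typically apply k-means to several leading eigenvectors instead of thresholding one.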

Fig. 16.

Spectral clustering of (top) the GLASSO graph G and (bottom) a 1000-km neighborhood graph G1000. Both use eight clusters.


6. Discussion

We have applied Gaussian Markov random fields to the imputation of missing values in a leading surface temperature dataset, HadCRUT4.6. Gaussian graphical models allow flexible modeling of the covariance structure of climate fields, characterized by strong anisotropy resulting from wind or ocean currents, wave-propagation patterns, land–ocean contrasts, or orography. This results in estimates of surface temperature that better capture the known structure of climate patterns (e.g., ENSO, Figs. 6, 11). They do so by encoding the conditional independence structure of the field, identifying which neighbors are most essential to infer the value of the field at one point (Markov property). By ignoring the vast majority (here, over 99%) of the others, they greatly reduce the number of parameters to estimate, thereby resulting in a well-conditioned covariance matrix, thus a well-posed estimation problem.

It is important to note that, while graph neighbors might also be geographical neighbors, this process can carry information along vast distances. An analogy might be useful: the autoregressive process of order 1 [AR(1)] is an example of a Markov process, where, by definition, the value of a series x_t is conditionally independent of all previous values but x_{t−1}. Yet, for large enough values of the autocorrelation parameter (which quantifies the dependence of x_t on x_{t−1}), the process can take many time steps to “forget” a past excursion. Similarly, Markov random fields may capture long-range teleconnections even for purely local graphs (e.g., G1000). The principal advantage of discovering the concentration graph via GLASSO, instead of specifying it via distance-based measures, is therefore in allowing nonisotropic patterns to be pulled out of the data (Fig. 15), something that distance-based methods at the heart of many kriging approaches cannot allow. At the same time, because concentration graphs tend to favor local information, these graphs better prevent the spread of errors across the field than global methods that rely on an eigendecomposition of the covariance matrix, where the leading modes (by definition, the most energetic ones) are also the most global and are thus more vulnerable to observational errors. Methods based on concentration graphs, like GraphEM, therefore offer a useful middle ground between local and global methods.
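The AR(1) analogy can be checked numerically. In the sketch below, the precision matrix of a stationary AR(1) series is tridiagonal (each time step is linked only to its immediate neighbors), yet its inverse, the covariance matrix, is dense, with correlations decaying as the autocorrelation parameter raised to the lag. The parameter value 0.9 and the series length are arbitrary illustrative choices.

```python
import numpy as np

def ar1_precision(n, phi):
    """Precision matrix of n consecutive values of a stationary AR(1) process
    with autocorrelation phi and unit innovation variance."""
    omega = np.zeros((n, n))
    idx = np.arange(n)
    omega[idx, idx] = 1.0 + phi ** 2
    omega[0, 0] = omega[-1, -1] = 1.0        # end points have a single neighbor
    omega[idx[:-1], idx[1:]] = -phi          # links between adjacent time steps only
    omega[idx[1:], idx[:-1]] = -phi
    return omega

phi = 0.9
omega = ar1_precision(50, phi)
corr = np.linalg.inv(omega)
corr /= corr[0, 0]                           # stationary: constant diagonal, so this yields correlations

print(omega[0, 2])    # no direct link two steps apart in the graph
print(corr[0, 10])    # yet the correlation at lag 10 is still sizable
```

The same mechanism operates in space: a graph with only local edges can still imply nonzero covariance between distant grid cells through chains of neighbors.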

This raises another important point: graph-based methods need not be the purview of an EM approach, and in fact, should not be. As pointed out by Schneider (2006), the EM algorithm draws from the center of a distribution (the M in EM corresponds to maximizing the likelihood, which selects the mode), which is known to lower the variance of the dataset (Little and Rubin 2002). Unless explicit steps are taken to utilize the estimates of imputation error output by GraphEM, using only the central estimate will underestimate the true imputation error (in addition to all other sources of uncertainty; Kennedy 2014). There would therefore be value in other inference frameworks that leverage graph-based covariance estimation. Bayesian hierarchical models (Gelman et al. 2013, chapter 5) are one mechanism to accomplish this and have the potential to more fully represent uncertainties, including known biases and imputation error (e.g., Karspeck et al. 2012).

The framework of the Berkeley Earth Surface Temperature dataset (Rohde et al. 2013), which currently uses some form of kriging, would be a natural candidate for graphical covariance estimation. Another realm is data assimilation (Kalnay et al. 1996; Compo et al. 2011; Hakim et al. 2016), where the need for regularizing rank-deficient covariance matrices also arises (Gaspari and Cohn 1999), as it does in optimal detection problems (e.g., Ribes et al. 2013). In addition to atmosphere–ocean applications, another potential domain of application is in geophysical inverse problems like seismic tomography (Dȩbski 2009).

Beyond their use as regularizing tools, concentration graph models also appear useful in characterizing teleconnections within a geophysical field (section 5). This echoes previous work (Tsonis et al. 2006; Paluš et al. 2011; Tupikina et al. 2016) and suggests that concentration graphs should be considered a bona fide analytical tool to be archived alongside the climate fields they served to complete, as done herein (see link to data repository below). One potential application of such graphs would be to characterize the similarity between climate networks simulated by general circulation models and those derived from observational products such as HadCRUT4. This may provide yet another benchmark for model evaluation, one focused on spatial relationships within a field or between fields (e.g., temperature, pressure, and precipitation).

Finally, the results shown here (section 4) testify once again to the large differences in basic climate metrics (e.g., GMT trends) arising from various interpolation methods, or lack thereof. It is therefore particularly important that centers that generate major surface temperature datasets utilize the best available statistical methods to generate complete climate fields. The present work offers one such approach; we hope it will stimulate further research and applications in this area.

Acknowledgments

J.E.G. acknowledges support from Grants AGS 1003818 and DMS 1025465 from the U.S. National Science Foundation. J.K. and C.M. were supported by the Met Office Hadley Centre Climate Programme funded by BEIS and Defra. Code is freely available at https://github.com/advaccaro/hadcrut4.6-graphem. Data will be made available at https://zenodo.org/record/4601616 upon publication.

REFERENCES

  • Bollobás, B., 1998: Random graphs. Modern Graph Theory, Springer, 215–252.

  • Bosilovich, M., R. Lucchesi, and M. Suarez, 2016: MERRA-2: File specification. GMAO Office Note 9, version 1.1, 73 pp., https://gmao.gsfc.nasa.gov/pubs/docs/Bosilovich785.pdf.

  • Brohan, P., J. J. Kennedy, I. Harris, S. F. Tett, and P. D. Jones, 2006: Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. J. Geophys. Res., 111, D12106, https://doi.org/10.1029/2005JD006548.

  • Bunge, L., and A. J. Clarke, 2009: A verified estimation of the El Niño index Niño-3.4 since 1877. J. Climate, 22, 3979–3992, https://doi.org/10.1175/2009JCLI2724.1.

  • Cane, M. A., 1984: Modeling sea level during El Niño. J. Phys. Oceanogr., 14, 1864–1874, https://doi.org/10.1175/1520-0485(1984)014<1864:MSLDEN>2.0.CO;2.

  • Chan, D., and P. Huybers, 2019: Systematic differences in bucket sea surface temperature measurements among nations identified using a linear-mixed-effect method. J. Climate, 32, 2569–2589, https://doi.org/10.1175/JCLI-D-18-0562.1.

  • Christy, J. R., W. B. Norris, R. W. Spencer, and J. J. Hnilo, 2007: Tropospheric temperature change since 1979 from tropical radiosonde and satellite measurements. J. Geophys. Res., 112, D06102, https://doi.org/10.1029/2005JD006881.

  • Compo, G. P., and Coauthors, 2011: The Twentieth Century Reanalysis Project. Quart. J. Roy. Meteor. Soc., 137, 1–28, https://doi.org/10.1002/qj.776.

  • Cowtan, K., and R. G. Way, 2014: Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends. Quart. J. Roy. Meteor. Soc., 140, 1935–1944, https://doi.org/10.1002/qj.2297.

  • Craven, P., and G. Wahba, 1978: Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31, 377–403, https://doi.org/10.1007/BF01404567.

  • Cressie, N., 1990: The origins of kriging. Math. Geol., 22, 239–252, https://doi.org/10.1007/BF00889887.

  • Davis, M., 2001: Late Victorian Holocausts: El Niño Famines and the Making of the Third World. Verso, 464 pp.

  • Dȩbski, W., 2009: Seismic tomography by Monte Carlo sampling. Pure Appl. Geophys., 167, 131–152, https://doi.org/10.1007/S00024-009-0006-3.

  • Dempster, A. P., 1972: Covariance selection. Biometrics, 28, 157–175, https://doi.org/10.2307/2528966.

  • Dempster, A. P., N. M. Laird, and D. B. Rubin, 1977: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Stat. Soc., 39B, 1–22, https://doi.org/10.1111/J.2517-6161.1977.TB01600.X.

  • Deser, C., and Coauthors, 2012: ENSO and Pacific decadal variability in the Community Climate System Model version 4. J. Climate, 25, 2622–2651, https://doi.org/10.1175/JCLI-D-11-00301.1.

  • Easterling, D. R., and M. F. Wehner, 2009: Is the climate warming or cooling? Geophys. Res. Lett., 36, L08706, https://doi.org/10.1029/2009GL037810.

  • Emile-Geay, J., K. Cobb, M. Mann, and A. T. Wittenberg, 2013a: Estimating central equatorial Pacific SST variability over the past millennium. Part I: Methodology and validation. J. Climate, 26, 2302–2328, https://doi.org/10.1175/JCLI-D-11-00510.1.

  • Emile-Geay, J., K. Cobb, M. Mann, and A. T. Wittenberg, 2013b: Estimating central equatorial Pacific SST variability over the past millennium. Part II: Reconstructions and implications. J. Climate, 26, 2329–2352, https://doi.org/10.1175/JCLI-D-11-00511.1.

  • Flato, G., and Coauthors, 2013: Evaluation of climate models. Climate Change 2013: The Physical Science Basis, T. F. Stocker et al., Eds., Cambridge University Press, 741–866, https://doi.org/10.1017/CBO9781107415324.020.

  • Folland, C. K., and D. Parker, 1995: Correction of instrumental biases in historical sea surface temperature data. Quart. J. Roy. Meteor. Soc., 121, 319–367, https://doi.org/10.1002/qj.49712152206.

  • Folland, C. K., A. W. Colman, D. M. Smith, O. Boucher, D. E. Parker, and J.-P. Vernier, 2013: High predictive skill of global surface temperature a year ahead. Geophys. Res. Lett., 40, 761–767, https://doi.org/10.1002/grl.50169.

  • Friedman, J., T. Hastie, and R. Tibshirani, 2008: Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432–441, https://doi.org/10.1093/biostatistics/kxm045.

  • Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, https://doi.org/10.1002/qj.49712555417.

  • Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin, 2013: Bayesian Data Analysis. 2nd ed. Chapman and Hall, 675 pp.

  • Giorgetta, M. A., and Coauthors, 2013: Climate and carbon cycle changes from 1850 to 2100 in MPI-ESM simulations for the Coupled Model Intercomparison Project phase 5. J. Adv. Model. Earth Syst., 5, 572–597, https://doi.org/10.1002/jame.20038.

  • Golub, G. H., M. Heath, and G. Wahba, 1979: Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21, 215–223, https://doi.org/10.1080/00401706.1979.10489751.

  • Guillot, D., B. Rajaratnam, and J. Emile-Geay, 2015: Statistical paleoclimate reconstructions via Markov random fields. Ann. Appl. Stat., 9, 324–352, https://doi.org/10.1214/14-AOAS794.

  • Hakim, G. J., J. Emile-Geay, E. J. Steig, D. Noone, D. M. Anderson, R. Tardif, N. Steiger, and W. A. Perkins, 2016: The Last Millennium Climate Reanalysis Project: Framework and first results. J. Geophys. Res. Atmos., 121, 6745–6764, https://doi.org/10.1002/2016JD024751.

  • Hansen, J., and S. Lebedeff, 1987: Global trends of measured surface air temperature. J. Geophys. Res., 92, 13 345–13 372, https://doi.org/10.1029/JD092iD11p13345.

  • Hansen, J., M. Sato, R. Ruedy, K. Lo, D. W. Lea, and M. Medina-Elizade, 2006: Global temperature change. Proc. Natl. Acad. Sci. USA, 103, 14 288–14 293, https://doi.org/10.1073/pnas.0606291103.

  • Hansen, J., R. Ruedy, M. Sato, and K. Lo, 2010: Global surface temperature change. Rev. Geophys., 48, RG4004, https://doi.org/10.1029/2010RG000345.

  • Hausfather, Z., M. J. Menne, C. N. Williams, T. Masters, R. Broberg, and D. Jones, 2013: Quantifying the effect of urbanization on U.S. Historical Climatology Network temperature records. J. Geophys. Res. Atmos., 118, 481–494, https://doi.org/10.1029/2012JD018509.

  • Hausfather, Z., K. Cowtan, D. C. Clarke, P. Jacobs, M. Richardson, and R. Rohde, 2017: Assessing recent warming using instrumentally homogeneous sea surface temperature records. Sci. Adv., 3, e1601207, https://doi.org/10.1126/sciadv.1601207.

  • Hoerl, A. E., and R. W. Kennard, 1970a: Ridge regression: Applications to non-orthogonal problems. Technometrics, 12, 69–82; Correction, 12, 723.

  • Hoerl, A. E., and R. W. Kennard, 1970b: Ridge regression: Biased estimation for non-orthogonal problems. Technometrics, 12, 55–67, https://doi.org/10.1080/00401706.1970.10488634.

  • Horel, J., and J. Wallace, 1981: Planetary-scale atmospheric phenomena associated with the Southern Oscillation. Mon. Wea. Rev., 109, 813–829, https://doi.org/10.1175/1520-0493(1981)109<0813:PSAPAW>2.0.CO;2.

  • Huang, B., M. L’Heureux, J. Lawrimore, C. Liu, H.-M. Zhang, V. Banzon, Z.-Z. Hu, and A. Kumar, 2013: Why did large differences arise in the sea surface temperature datasets across the tropical Pacific during 2012? J. Atmos. Oceanic Technol., 30, 2944–2953, https://doi.org/10.1175/JTECH-D-13-00034.1.

  • Huang, B., and Coauthors, 2015: Extended Reconstructed Sea Surface Temperature version 4 (ERSST.v4). Part I: Upgrades and intercomparisons. J. Climate, 28, 911–930, https://doi.org/10.1175/JCLI-D-14-00006.1.

  • Huang, J., and Coauthors, 2017: Recently amplified Arctic warming has contributed to a continual global warming trend. Nat. Climate Change, 7, 875–879, https://doi.org/10.1038/s41558-017-0009-5.

  • Ilyas, M., C. M. Brierley, and S. Guillas, 2017: Uncertainty in regional temperatures inferred from sparse global observations: Application to a probabilistic classification of El Niño. Geophys. Res. Lett., 44, 9068–9074, https://doi.org/10.1002/2017GL074596.

  • Ishii, M., A. Shouji, S. Sugimoto, and T. Matsumoto, 2005: Objective analyses of sea-surface temperature and marine meteorological variables for the 20th century using ICOADS and the Kobe collection. Int. J. Climatol., 25, 865–879, https://doi.org/10.1002/joc.1169.

  • Johnstone, I., 2001: On the distribution of the largest eigenvalue in principal components analysis. Ann. Stat., 29, 295–327, https://doi.org/10.1214/aos/1009210544.

  • Jones, P. D., and A. Moberg, 2003: Hemispheric and large-scale surface air temperature variations: An extensive revision and an update to 2001. J. Climate, 16, 206–223, https://doi.org/10.1175/1520-0442(2003)016<0206:HALSSA>2.0.CO;2.

  • Jones, P. D., D. H. Lister, T. J. Osborn, C. Harpham, M. Salmon, and C. P. Morice, 2012: Hemispheric and large-scale land-surface air temperature variations: An extensive revision and an update to 2010. J. Geophys. Res., 117, D05127, https://doi.org/10.1029/2011JD017139.

  • Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471, https://doi.org/10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.

  • Kaplan, A., Y. Kushnir, M. A. Cane, and M. B. Blumenthal, 1997: Reduced space optimal analysis for historical data sets: 136 years of Atlantic sea surface temperatures. J. Geophys. Res., 102, 27 835–27 860, https://doi.org/10.1029/97JC01734.

  • Kaplan, A., M. A. Cane, Y. Kushnir, A. C. Clement, M. B. Blumenthal, and B. Rajagopalan, 1998: Analyses of global sea surface temperature 1856–1991. J. Geophys. Res. Oceans, 103, 27 835–27 860, https://doi.org/10.1029/97JC01736.

  • Kaplan, A., Y. Kushnir, and M. Cane, 2000: Reduced space optimal interpolation of historical marine sea level pressure: 1854–1992. J. Climate, 13, 2987–3002, https://doi.org/10.1175/1520-0442(2000)013<2987:RSOIOH>2.0.CO;2.

  • Kaplan, A., M. A. Cane, and Y. Kushnir, 2003: Reduced space approach to the optimal analysis interpolation of historical marine observations: Accomplishments, difficulties, and prospects. WMO/TD-1081, 199–216, http://www.wmo.ch/web/aom/marprog/Wordpdfs/Jcomm-TR/JCOMM%20TR13%20Marine%20Climatology/JCOMM_TR13.pdf.

  • Karl, T. R., S. Hassol, C. Miller, and W. Murray, 2006: Temperature trends in the lower atmosphere: Steps for understanding and reconciling differences. NOAA, 180 pp.

  • Karl, T. R., and Coauthors, 2015: Possible artifacts of data biases in the recent global surface warming hiatus. Science, 348, 1469–1472, https://doi.org/10.1126/science.aaa5632.

  • Karspeck, A. R., A. Kaplan, and S. R. Sain, 2012: Bayesian modelling and ensemble reconstruction of mid-scale spatial variability in North Atlantic sea-surface temperatures for 1850–2008. Quart. J. Roy. Meteor. Soc., 138, 234–248, https://doi.org/10.1002/qj.900.

  • Kennedy, J. J., 2014: A review of uncertainty in in situ measurements and data sets of sea surface temperature. Rev. Geophys., 52, 1–32, https://doi.org/10.1002/2013RG000434.

  • Kennedy, J. J., N. A. Rayner, R. O. Smith, D. E. Parker, and M. Saunby, 2011a: Reassessing biases and other uncertainties in sea surface temperature observations measured in situ since 1850: 1. Measurement and sampling uncertainties. J. Geophys. Res., 116, D14103, https://doi.org/10.1029/2010JD015218.

  • Kennedy, J. J., N. A. Rayner, R. O. Smith, D. E. Parker, and M. Saunby, 2011b: Reassessing biases and other uncertainties in sea surface temperature observations measured in situ since 1850: 2. Biases and homogenization. J. Geophys. Res., 116, D14104, https://doi.org/10.1029/2010JD015220.

  • Kent, E. C., and Coauthors, 2017: A call for new approaches to quantifying biases in observations of sea surface temperature. Bull. Amer. Meteor. Soc., 98, 1601–1616, https://doi.org/10.1175/BAMS-D-15-00251.1.
