## 1. Introduction

The work presented herein incorporates the approach to prioritizing data types presented in Huang et al. (2011, hereinafter H11). H11 sought to modify the projection of a 50-yr trend in global average temperature using varying lengths of trends in several observation types, both in situ and remotely sensed. The ensemble of models used for climate projection was the model output from the World Climate Research Programme (WCRP)’s phase 3 of the Coupled Model Intercomparison Project (CMIP3) multimodel dataset. Increasing the length of the data record reduced the range of climate prediction and improved its accuracy. Remotely sensed observation types, especially high-spectral-resolution thermal infrared spectra, reduced uncertainty in climate projection better than did the in situ observation types. The conclusiveness of the study was limited, though, by the small size of the climate model ensemble of opportunity used. Using a larger ensemble of climate models, either an ensemble of opportunity or a perturbed physics ensemble, is needed for follow-on research.

Here we expand on H11 by considering spatial patterns in the prospective observation types and both their long-term averages and trends. We also use the multimodel dataset from WCRP’s phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012) in place of the CMIP3 ensemble. By considering resolved spatial patterns in prospective observation types, it is possible that important information about climate useful for climate prediction can be revealed that is obscured in a global average. For example, the global average trend in reflected shortwave radiation is caused by a combination of decreasing albedo in the Arctic, decreasing snow and ice in midlatitudes, and changes in cloud fraction especially in low latitudes (Feldman et al. 2011, 2013); hence, spatially resolving trends in reflected shortwave radiation may help in attributing trends in reflected shortwave to well-known feedbacks in the climate system. Even though it has been customary to consider long-term trends in observations as relevant to testing climate models according to their ability to predict long-term change, it has never been proven that long-term trends are more relevant a metric than long-term averages or the annual cycle. Already there are suggestions that testing climate models against long-term averages of observations can reduce climate prediction uncertainty (Geoffroy et al. 2012) as can an annual cycle (Hall and Qu 2006). Here we consider both long-term trends in observation types and long-term averages in those observation types as prospective metrics in testing climate models. Because we are concerned with the prioritization of climate monitoring observation types and the annual cycle does not necessarily require long time series of data, we do not address the annual cycle as a metric in testing climate models in this work.

In the second section we give a brief description of the method of testing climate models previously presented in H11. The third section explores the spatial signatures in long-term averages and trends of various observable types that can inform multidecadal global average temperature change, and the fourth section presents an eigen-decomposition of those spatial signatures. The fifth summarizes our results and presents conclusions.

## 2. Approach

We use methods of ensemble climate prediction to prioritize observation types for climate monitoring. Ensemble climate prediction results in a probability density function of a future climate based on predictive runs of a wide variety of climate models subjected to a prescribed set of external boundary conditions. Each of the member models of the ensemble has been adapted or modified to match some previously obtained data that its custodian had determined mandatory to reproduce. While different modeling groups can choose different data for the test, it is already common that they test against multidecadal trends in global average surface air temperature and best estimates of the present-day top-of-atmosphere radiative imbalance. Many other datasets, some of questionable quality, are available for use in testing, but it is not obvious which should be considered next. Prioritization of observation types is intended to answer this question.

One way to prioritize observation types is by weighting models according to how well they match observations. First, a probability density function of climate projection is constructed using the forecasts of all the member models of an ensemble. Next, a weight is determined for each member model according to how well its hindcasts of data compare to observed data (with an allowance for uncertainty due to internal variability). The weights are “Bayes factors” (Min and Hense 2006a,b). Finally, an updated probability density function of climate projection is constructed by applying the models’ weight factors to their forecasts. The relative priority of a candidate observation type is determined by the ratio of the width of the unweighted probability density function to the width of the weighted probability density function.
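The weighting procedure described above can be sketched in a few lines. This is a minimal illustration rather than the method of Min and Hense: the Gaussian likelihood used as a weight, the function name `weighted_spread_ratio`, and the parameter `sigma_internal` (the allowance for internal variability) are all assumptions introduced here.

```python
import numpy as np

def weighted_spread_ratio(forecasts, hindcasts, observed, sigma_internal):
    """Ratio of unweighted to weighted forecast spread for one observation type.

    forecasts, hindcasts: one value per member model (hypothetical inputs).
    observed: the observed value that each hindcast is compared against.
    sigma_internal: assumed standard deviation of internal variability.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    hindcasts = np.asarray(hindcasts, dtype=float)
    # Gaussian likelihood of the observation given each hindcast, used here
    # as a simple stand-in for a Bayes factor (an assumption of this sketch).
    log_w = -0.5 * ((hindcasts - observed) / sigma_internal) ** 2
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    # Width of the unweighted and of the weighted forecast distribution.
    prior_std = forecasts.std()
    post_mean = np.sum(weights * forecasts)
    post_std = np.sqrt(np.sum(weights * (forecasts - post_mean) ** 2))
    return prior_std / post_std, weights
```

A ratio well above 1 would indicate an observation type that narrows the projection; a ratio near 1 would indicate one that barely discriminates among the models.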

Another method to prioritize data types is by Bayes’s theorem (Sivia 2006) applied to ensemble prediction. In this method, a large ensemble of climate models is used to populate a joint probability density function of hindcast observation and predicted climate change, and that distribution is modified using a likelihood function in which the data are conditioned on the hindcast (i.e., how likely is it that the data would have been obtained if a specific hindcast were true of the climate system?). Application of Bayes’s theorem enables the generation of a posterior probability density function of ensemble prediction after incorporation of data. The ratio of the prior prediction uncertainty to the modified, posterior prediction uncertainty is used to prioritize the observation types and metrics.

It can be shown that, in the limit of an ensemble with an infinite number of members, the two methods of prioritizing observation types presented above are the same. The former is modification of a climate ensemble by conditioning models on data, which is a simple way of describing Bayes’s theorem. We choose to work with the latter method because it permits us to circumvent the lack of an initial condition ensemble for each model and because it is in common use, having been applied in a number of previous studies attempting to infer climate sensitivity from model ensembles modified by data (Forest et al. 2002; Sexton et al. 2012).

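The hindcast is **x** and the climate prediction is **y**; the ensemble mean hindcast and the ensemble mean climate prediction are *α*_{x} and *α*_{y}, respectively. Across the ensemble, the covariance in the spread of hindcasts is **Σ**_{xx}, the covariance in the spread of climate predictions is **Σ**_{yy}, and the covariance between them is **Σ**_{xy}. We assume that (**x**, **y**) are random and normally distributed, with mean [*α*_{x}, *α*_{y}] and covariance **Σ** as conditioned by the ensemble:

$$
\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \boldsymbol{\alpha}_x \\ \boldsymbol{\alpha}_y \end{pmatrix},\; \boldsymbol{\Sigma} \right), \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{pmatrix}.
$$

The posterior distribution of the climate prediction given data **d** is obtained by application of Bayes’s theorem:

$$
P(\mathbf{y} \mid \mathbf{d}) \propto \int P(\mathbf{d} \mid \mathbf{x})\, P(\mathbf{x}, \mathbf{y})\, d\mathbf{x},
$$

in which *P*(**d** | **x**) is the likelihood function, the probability that the data **d** would have been obtained if the actual climate realized the state **x**.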

The formulation of the likelihood function depends on the specific definition of the hindcast variable. If each member model of the ensemble contributed the mean of many hindcasts started from different initial conditions, then **x** would be defined as a model’s mean hindcast, and the data **d** would be expected to differ from **x** by internal variability as well as by measurement error. Because an initial condition ensemble is not available for each model, this definition of **x** is ill suited to our purposes. If, on the other hand, each member of the ensemble is a single model run, then **x** would be defined as the actual evolution of the climate system, which the data **d** should match perfectly in the absence of measurement error. The ensemble would span uncertainty in model physics and model initial conditions, and either or both could be inferred given appropriate data. On short time scales for prediction, the likelihood function is more sensitive to initial conditions, and so model physics can be treated as a Bayesian “nuisance parameter.” (This is a qualitative description of modern numerical weather prediction with randomized physics.) On long time scales of prediction, the likelihood function is more sensitive to model physics, and so the initial conditions can be treated as nuisance parameters. The latter is the case given the subject matter of this research, and so hereinafter we take the likelihood function to be nonzero only for **x** = **d** (with **d** a single realization of reality). As required of Bayesian nuisance parameters, we integrate over initial conditions by considering a variety of initial conditions within reasonable uncertainty as is explicitly done by the different member models of CMIP5. Ultimately, when testing an individual climate model with actual data, it is necessary to estimate a mean hindcast of the climate model and consider both internal variability and measurement error as the width of the likelihood function. Here we ignore measurement error in prioritizing tests of climate models. The addition of measurement error is unlikely to upset the relative priority of different tests of climate models inasmuch as measurement error should be a fraction of internal variability for climate monitoring missions (Leroy et al. 2008b).

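The posterior mean of the climate prediction is *β*_{y} and its uncertainty covariance is **Π**_{yy}. They are given by

$$
\boldsymbol{\beta}_y = \boldsymbol{\alpha}_y + \boldsymbol{\Sigma}_{yx} \boldsymbol{\Sigma}_{xx}^{-1} (\mathbf{d} - \boldsymbol{\alpha}_x), \tag{5a}
$$

$$
\boldsymbol{\Pi}_{yy} = \boldsymbol{\Sigma}_{yy} - \boldsymbol{\Sigma}_{yx} \boldsymbol{\Sigma}_{xx}^{-1} \boldsymbol{\Sigma}_{xy}, \tag{5b}
$$

where **Σ**_{yx} is the transpose of **Σ**_{xy}. We consider the prediction variable **y** to be a one-element vector, and so the value of the proposed observable **x** for predicting climate change is proportional to |**Σ**_{yy}|/|**Π**_{yy}|. Equation (5b) reduces to

$$
\Pi_{yy} = \Sigma_{yy} \left( 1 - \rho^2 \right) \tag{6}
$$

if the prediction variable **y** is a scalar *y*, in which *ρ* is the correlation coefficient between an element of the hindcast observable **x** and climate prediction *y* across the ensemble. The larger the magnitude of *ρ*, the higher the priority of the hindcast observation type and the test metric.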
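The conditioning step can be illustrated with standard Gaussian algebra. The sketch below assumes sample covariances estimated from the ensemble and a hindcast with fewer elements than there are models, so that the hindcast covariance is invertible; the function names are hypothetical. For a single hindcast element, the posterior variance equals the prior variance multiplied by one minus the squared correlation.

```python
import numpy as np

def condition_prediction(x_ens, y_ens, d):
    """Posterior mean and variance of a scalar prediction y given data d.

    x_ens: (n_models, n_x) hindcasts, with n_x < n_models (assumed).
    y_ens: (n_models,) climate predictions.
    d:     (n_x,) observed data, taken to equal the true hindcast x.
    """
    x_ens = np.asarray(x_ens, dtype=float)
    y_ens = np.asarray(y_ens, dtype=float)
    n = len(y_ens)
    alpha_x = x_ens.mean(axis=0)
    alpha_y = y_ens.mean()
    dx = x_ens - alpha_x
    dy = y_ens - alpha_y
    sigma_xx = dx.T @ dx / (n - 1)
    sigma_xy = dx.T @ dy / (n - 1)
    sigma_yy = dy @ dy / (n - 1)
    # gain = Sigma_xx^{-1} Sigma_xy, computed without forming the inverse
    gain = np.linalg.solve(sigma_xx, sigma_xy)
    beta_y = alpha_y + gain @ (np.asarray(d, dtype=float) - alpha_x)
    pi_yy = sigma_yy - sigma_xy @ gain
    return beta_y, pi_yy, sigma_yy

def fractional_uncertainty_reduction(rho):
    """Fractional reduction in the standard deviation of y for correlation rho."""
    return 1.0 - np.sqrt(1.0 - rho ** 2)
```

With a squared correlation of 0.5, `fractional_uncertainty_reduction` gives a reduction in standard deviation of roughly 30%.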

We anticipate that climate models will always be improved in the context of known trends in surface air temperature over the same time period that a newer highly accurate satellite-based data type is considered. Climate modeling groups already test their models and tune them such that their hindcast decadal temperature trends agree with observed temperature trends over the same period. In this sense, the ensemble of climate models contributed to model intercomparison projects is “calibrated.” Different climate modeling groups, however, tune different model physics to bring their models into agreement with temperature trends, and these different tuning practices beget the remaining broad range of multidecadal climate predictions among the models that we are seeking to reduce. It is safe to assume that climate modeling groups will continue to calibrate their models according to trends in surface air temperature for years to come.

Figure 1 illustrates the experimental design. The hindcasts (**x**) are taken from the “historical” scenario of CMIP5, characterized by prescribed anthropogenic and natural radiative forcing from 1850 through 2005. For the hindcasts we produce temporal averages and trends over the period 1970 through 2005, a period in which models tend to produce a strong linear trend in global average surface air temperature. The hindcasts are resampled onto a 2° × 2° longitude–latitude grid such that regional averages are preserved. The climate predictions (**y**) are based on the output of the representative concentration pathway 4.5 (RCP4.5) scenario, characterized by 4.5 W m^{−2} of stabilized radiative forcing after 2100 over preindustrial conditions (Moss et al. 2010). It is considered a moderate projection of future radiative forcing. The multidecadal climate prediction is the difference between the 2090–99 average of global average surface air temperature and a best estimate for the global average surface air temperature in year 2006. The best estimate in 2006 is the intercept at year 2006 determined by linear regression of global average surface air temperature over the period 2006 through 2045. For each model we take only one run, which is connected seamlessly from its historical to its future RCP4.5 scenario, thus guaranteeing continuous time series.
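The definition of the prediction variable can be made concrete with a short sketch. It assumes an annual time series of global average surface air temperature for a single model run; the function name and argument names are hypothetical.

```python
import numpy as np

def multidecadal_prediction(years, gmst):
    """2090-99 mean temperature minus the regression intercept at year 2006.

    years: calendar years of an annual series (assumed to span 2006-2099).
    gmst:  global average surface air temperature for each year.
    """
    years = np.asarray(years, dtype=float)
    gmst = np.asarray(gmst, dtype=float)
    # Linear regression over 2006-2045, with time measured from 2006 so that
    # the fitted intercept is the best estimate for year 2006.
    fit = (years >= 2006) & (years <= 2045)
    slope, intercept = np.polyfit(years[fit] - 2006.0, gmst[fit], 1)
    late = (years >= 2090) & (years <= 2099)
    return gmst[late].mean() - intercept
```

Using the regression intercept rather than the raw 2006 value keeps interannual variability in a single year from contaminating the baseline.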

Multidecadal climate prediction, by virtue of its time scale, is expected to be governed by transience in the ocean and climate sensitivity as dictated by atmospheric and surface feedbacks. Figure 2 shows the relationship between climate sensitivity and predicted multidecadal temperature change as defined above, as if climate sensitivity were an observation type (*x*). The correlation between sensitivity and prediction is 0.56, pointing toward climate sensitivity as a controlling factor in multidecadal temperature prediction. The normality of the distribution in Fig. 2 is typical of all the distributions considered in this work.

## 3. Remote sensing observable types

Three remote sensing observation types have either been deployed or technically developed for the purpose of climate monitoring, which means on-board traceability to international standards and adequate spatial–temporal coverage. They are radio occultation using the Global Navigation Satellite Systems (GNSS), high-spectral-resolution thermal infrared spectra (Anderson et al. 2004), and high-spectral-resolution reflected shortwave radiance (Wielicki et al. 2013). GNSS radio occultation has already been deployed in the form of missions of opportunity (Wickert et al. 2001) and as the Constellation Observing System for the Meteorology, Ionosphere, and Climate (COSMIC) constellation (Liou et al. 2007); high-accuracy, high-spectral-resolution longwave and shortwave instruments have been developed to the point of space-readiness but have not yet been deployed. For GNSS radio occultation we consider long-term averages and trends of the geopotential height of the 200-hPa surface, which can be retrieved directly from GNSS radio occultation data (Leroy 1997) and is strongly related to upper-air winds (Leroy et al. 2006; Verkhoglyadova et al. 2014). For the radiation fields we consider broadband downward shortwave and longwave radiative flux at the top of the atmosphere because net downward flux warms the climate system. The radiation fields are for all-sky conditions. Net downward longwave radiation is the opposite of outgoing longwave radiation (CMIP’s variable rlut), and net downward shortwave radiation is the difference between insolation (rsdt) and reflected shortwave flux at the top of the atmosphere (rsut). For diagnostic purposes we also consider temporal averages and trends in net total downward radiation at the top of the atmosphere (TOA) as hindcasts even though net total TOA radiation is extremely difficult to measure with desirable accuracy. 

The ability of long-term temporal averages and trends of these observable types to inform multidecadal climate prediction is examined in this section. Figures 3 and 4 show, respectively, the covariances of hindcast and climate prediction (**Σ**_{xy}) for temporal averages and trends of the prospective observation types and the spread in hindcast observations across the ensemble (**Σ**_{xx}), and Fig. 5 shows the squared correlation of hindcast and climate prediction (*ρ*^{2}) for the same observation types.

The covariances **Σ**_{xy} shown in Fig. 3 offer easy physical insight. In those plots several large, coherent structures stand out. When the 200-hPa height field is the hindcast, structures related to the position of the midlatitude jet streams and warming of Arctic upper air are prominent; when the temporal averages of the radiation fields are considered as hindcasts, structures related to clouds and convection stand out; and when trends of the longwave and shortwave fields are considered, the dipole pattern in the equatorial Pacific, which can be explained as a trend in the strength of the Walker circulation, is prominent. Spatially coherent structures such as these in the covariance plots are expected. The hindcasts of climate models differ in ways that should be explainable by acknowledged uncertainties in model physics. Differences in physics emerge over broad regions that are influenced by those physics. If the patterns associated with the uncertain physics correlate strongly with climate predictions, then they may become important metrics for testing climate models.

The spread in hindcast observations **Σ**_{xx} across the ensemble is shown in Fig. 4. Because we retained only one model run for each member model of the ensemble (as discussed in the previous section), the spread in hindcast observations is explained by both intermodel differences in physics and internal variability. In the radiation fields, the spread in hindcast observations is dominated by clouds. For the average net TOA radiation, large intermodel differences are apparent in the equatorial Pacific and over regions of widespread stratocumulus clouds. For the average TOA shortwave, the intermodel differences of tropical and jet-stream clouds are apparent as well as intermodel differences of snow and ice. For the average TOA longwave, differences in clouds in the western and central Pacific are readily apparent. For trends in the radiation fields, both the shortwave and longwave show large differences in the western and central equatorial Pacific clouds, likely associated with trends in the Walker circulation. Differences in trends in shortwave and longwave roughly cancel when the two are added together to form the net TOA total radiation; hence, differences in trends in net TOA radiation are small.

The squared correlations *ρ*^{2} given by

$$
\rho^2 = \frac{\boldsymbol{\Sigma}_{xy}^2}{\mathrm{Diag}(\boldsymbol{\Sigma}_{xx})\, \Sigma_{yy}}
$$

constitute maps wherein the elements of **Σ**_{xy} correspond to individual longitude–latitude grid points, each squared and divided by the diagonal element of **Σ**_{xx}, denoted Diag(**Σ**_{xx}), corresponding to the appropriate longitude–latitude grid point. The squared correlation is related to reduction in predictive uncertainty according to Eq. (6). Many of the plots in Fig. 5 exhibit squared correlations that exceed 0.50, implying the potential to reduce uncertainty in climate prediction by 30% (i.e., 1 − (1 − 0.5)^{1/2} ≈ 0.3). That inference must be tempered, however, because the ensemble used to estimate **Σ**_{xy} and **Σ**_{xx} is limited in size. The quantities contoured in Fig. 5 are estimators *r* of correlations *ρ*, with *ρ* understood as the correlation that would result should an infinite ensemble of climate models have been considered and *r* as the correlation that results from the limited draw of models in the ensemble. If *ρ* = 0 everywhere on the globe, the expected probability density function for estimators of correlation *r* follows a Student’s *t* distribution with

$$
t = r \sqrt{\frac{n_{\mathrm{eff}} - 2}{1 - r^2}},
$$

where *n*_{eff} is degrees of freedom in the computation of *r* (Alder and Roessler 1972). In this case, it also corresponds to the number of “effectively independent” models used to compute the correlation coefficients in each map. While approximately 50 models have contributed to CMIP5, not all have contributed all of the output variables to the historical and RCP4.5 scenarios needed for hindcast and prediction variables in this work. Consequently, we have typically used only 25 of the CMIP5 models to compute *r*. Many of the entries in the CMIP5 ensemble have physics in common, however, and so the number of effectively independent models is less than the number of models used to compute the correlation coefficients. Figure 6 shows plots of histograms of the estimators of correlation coefficients taken from every longitude–latitude grid point of the maps of Fig. 5, each point contributing its area on the sphere to the histogram. For each hindcast observation type, the single parameter of the Student’s *t* distribution *n*_{eff} that best fits the structure of each histogram is inferred by matching the maximum of the Student’s *t* distribution and the maximum of the histogram of *r*. In this way we circumvent difficulties that arise from the large coherent structures of Fig. 5 that appear as bumps superimposed on a background distribution. The number of effectively independent models *n*_{eff} is about 17 for most of the histograms of Fig. 6.
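The correlation maps and the comparison with a zero-correlation background can be sketched as follows. The sketch assumes the standard result that, for uncorrelated normal variables, the *t* statistic of a correlation estimated from *n* samples has *n* − 2 degrees of freedom, with *n* taken here to be the effective number of models; that convention and all function names are assumptions of this example.

```python
import math
import numpy as np

def correlation_map(x_ens, y_ens):
    """Estimator r at each grid point.

    x_ens: (n_models, n_lat, n_lon) hindcast fields; y_ens: (n_models,).
    """
    dx = x_ens - x_ens.mean(axis=0)
    dy = y_ens - y_ens.mean()
    sigma_xy = np.tensordot(dy, dx, axes=(0, 0))   # covariance with y
    diag_xx = (dx ** 2).sum(axis=0)                # pointwise variance of x
    sigma_yy = (dy ** 2).sum()
    return sigma_xy / np.sqrt(diag_xx * sigma_yy)

def null_density_of_r(r, n_eff):
    """Density of r when rho = 0, equivalent to a Student's t with n_eff - 2 dof."""
    r = np.asarray(r, dtype=float)
    b = 0.5 * (n_eff - 2)
    log_beta = math.lgamma(0.5) + math.lgamma(b) - math.lgamma(0.5 + b)
    return np.exp(0.5 * (n_eff - 4) * np.log1p(-r ** 2) - log_beta)

def area_weighted_histogram(r_map, lats_deg, bins=40):
    """Histogram of r in which each grid point contributes its area on the sphere."""
    w = np.broadcast_to(np.cos(np.deg2rad(lats_deg))[:, None], r_map.shape)
    return np.histogram(r_map.ravel(), bins=bins, range=(-1.0, 1.0),
                        weights=w.ravel(), density=True)

def fit_n_eff_from_peak(peak_density, candidates=range(5, 101)):
    """Choose n_eff whose null density at r = 0 best matches the histogram peak."""
    return min(candidates,
               key=lambda n: abs(float(null_density_of_r(0.0, n)) - peak_density))
```

Matching only the peak, rather than fitting the whole curve, mirrors the intent of ignoring the bumps that coherent structures add to the tails of the histogram.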

The number of data points that enter into the histograms of Fig. 6 is the number of grid elements in the maps of Fig. 5, but the effective number of data points is less because of spatial coherence in the plots of Fig. 5. The greater the spatial heterogeneity in a plot of Fig. 5 is, the more closely the corresponding histogram in Fig. 6 should match the Student’s *t* distribution for zero correlation if no correlation were present. The larger the coherent structures in a plot of Fig. 5 are, the fewer the number of independent evaluations of *r*, resulting in a coarser correspondence to the Student’s *t* distribution in Fig. 6 if no correlation were present.

A comparison of the Student’s *t* distributions for zero correlation with the histograms of estimators of the correlation coefficients indicates general consistency with no nonzero correlation being present. Each of the global-scale coherent structures prominent in the plots of Fig. 5 contributes one of the aforementioned bumps in Fig. 6. They are particularly noticeable in Figs. 6c, 6e, and 6g, the time-average TOA radiation fields as hindcast observation types. Whether these are statistically significant correlations requires a more sophisticated analysis, which is presented in the following section.

## 4. Eigenmode decomposition of hindcast data

Metrics for testing climate models using climate monitoring observation types almost certainly point toward eigenmode decomposition rather than point-by-point comparison, because the “modes” of differences between climate models’ hindcasts very likely correspond to differences in the models’ physics. One can expect the signature of differing model physics to occur on global scales. The eigenmodes **e**_{μ} of intermodel differences in hindcast over the ensemble are the eigenvectors of **Σ**_{xx}, with associated eigenvalues *λ*_{μ}. The eigenmodes are orthogonal, and so intuiting the physics responsible based on the spatial patterns of an individual eigenmode is impossible (Monahan et al. 2009). Nonetheless, *m* statistically significant eigenmodes should mean that *m* uncertainties in model physics are being explained. Here we examine whether or not those explained differences in model physics also play a role in intermodel differences in climate prediction.

In plots of the eigenmodes, each eigenmode is scaled by the square root of its eigenvalue *λ*_{μ}, and so the contours have the same dimensions as the hindcast observable **x**.

The correlation coefficient between eigenmode *μ* of the hindcast observation and climate prediction is a scalar:

$$
r_\mu = \frac{\mathbf{e}_\mu^{\mathsf{T}} \boldsymbol{\Sigma}_{xy}}{\left( \lambda_\mu \Sigma_{yy} \right)^{1/2}}. \tag{10}
$$

The square of the modal correlation coefficient *r*_{μ} is the fraction of the variance of the climate predictions *y* in the ensemble that is explained by eigenmode *μ*, and a sum of the squares of modal correlation coefficients gives a cumulative squared correlation that can be inserted into Eq. (6) to estimate reduction of uncertainty in climate prediction. With *n*_{m} independent models, however, all of the variance in climate model prediction is explained by *n*_{m} − 1 eigenmodes,

$$
\sum_{\mu=1}^{n_m - 1} r_\mu^2 = 1,
$$

which seems to suggest reduction of uncertainty in climate prediction to zero, an absurd result. Just as *r* is an estimator of *ρ*, however, *r*_{μ} is an estimator for *ρ*_{μ}, as if the *n*_{m} models that define the ensemble were a limited draw from an infinite pool of climate models. The absurd result arises because the number of models *n*_{m} used to construct the covariance of hindcast uncertainty **Σ**_{xx} is less than the rank of **Σ**_{xx}.
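A short numerical sketch illustrates why the squared modal correlations of a finite ensemble necessarily sum to one. It assumes hindcast fields flattened to vectors with at least as many grid points as models, and obtains the eigenmodes by singular value decomposition of the centered ensemble; the function name is hypothetical.

```python
import numpy as np

def modal_correlations(x_ens, y_ens):
    """Eigenvalues, eigenmodes, and modal correlation estimators r_mu.

    x_ens: (n_models, n_points) hindcasts, n_points >= n_models (assumed).
    y_ens: (n_models,) climate predictions.
    """
    n_m = x_ens.shape[0]
    dx = x_ens - x_ens.mean(axis=0)
    dy = y_ens - y_ens.mean()
    # dx = U S V^T; the rows of V^T are the eigenmodes of Sigma_xx.
    u, s, vt = np.linalg.svd(dx, full_matrices=False)
    keep = n_m - 1                      # centering removes one degree of freedom
    u, s, vt = u[:, :keep], s[:keep], vt[:keep]
    eigenvalues = s ** 2 / (n_m - 1)
    pcs = u * s                         # pcs[i, mu]: projection of model i on mode mu
    r_mu = pcs.T @ dy / np.sqrt((pcs ** 2).sum(axis=0) * (dy ** 2).sum())
    return eigenvalues, vt, r_mu
```

Because the centered predictions lie entirely in the space spanned by the retained principal components, the squares of `r_mu` add to one whatever the data, which is the behavior that motivates a truncation criterion.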

A truncation criterion is needed to limit the number of eigenmodes that can be legitimately retained in uncertainty reduction. There are two possible approaches to limiting the number of eigenmodes: consideration of significance in associated variance *λ*_{μ} and consideration of uncertainty in the associated modal correlation coefficient *r*_{μ}. The criterion for significance in associated variance was presented by North et al. (1982). This criterion holds that the only statistically significant eigenmodes are those with eigenvalues significantly different from zero. The uncertainty in any eigenvalue is the sum of all of the eigenvalues divided by (*n*_{m} − 2)^{1/2}. Therefore, all of those eigenmodes whose eigenvalues explain more than (*n*_{m} − 2)^{−1/2} of the total of the eigenvalues of **Σ**_{xx} can be considered significant modes of intermodel difference in hindcasts. Table 1 contains a list of the eigenvalues *λ*_{μ} and modal correlation coefficient *r*_{μ} for the gravest four modes of intermodel hindcast difference, sorted by decreasing eigenvalue *λ*_{μ}. Application of the criterion of North et al. (1982) yields the conclusion that only one mode of the six hindcast observation types considered is statistically significant.
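The variance criterion as stated in the paragraph above reduces to a one-line threshold test. The sketch below implements only that stated threshold, not the full analysis of North et al. (1982); the function name is hypothetical.

```python
import numpy as np

def significant_modes(eigenvalues, n_m):
    """Indices of modes whose fractional variance exceeds (n_m - 2)**-0.5."""
    lam = np.asarray(eigenvalues, dtype=float)
    fractional_variance = lam / lam.sum()
    threshold = (n_m - 2) ** -0.5
    return np.flatnonzero(fractional_variance > threshold), threshold
```

For an ensemble of 25 models the threshold is about 0.21, so a mode must carry roughly a fifth of the total intermodel variance to pass.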

Table 1. Modal variance and modal correlation coefficient (|*r*_{μ}|) associated with the first four eigenmodes for various hindcast observation types, sorted by decreasing modal variance. Each hindcast observation type is either a temporal average or a trend over the years 1970–2005, as noted. The modal variance *λ*_{μ} is given as a fraction of the total variance of **Σ**_{xx}. The correlation coefficient *r*_{μ} is defined according to Eq. (10).

Uncertainty in the modal correlation coefficient *r*_{μ} is also considered. Table 2 contains a list of estimates of modal correlation coefficients for the eigenmodes of hindcast differences for the same four eigenmodes, sorted by descending |*r*_{μ}|. We consider the *r*_{μ} as estimators of the modal correlation coefficients *ρ*_{μ} of an infinite pool of climate models but based on a subset of *n*_{m} draws taken from the infinite pool. Because even *n*_{m} estimators *r*_{μ} must explain all of the variance in the climate predictions of the ensemble, some of the |*r*_{μ}| are very large even if all *ρ*_{μ} of the infinite pool are zero. Figure 8 shows probability density functions for the largest *r*_{μ} that can be expected for a given data type when *n*_{m} independent models have been used in the formulation of **Σ**_{xx} and all *ρ*_{μ} = 0. This was done by constructing 500 000 scenarios for each value of *n*_{m}. Each member of the 500 000 scenarios contains model principal components *p*_{μ,i} and climate predictions *y*_{i}. The principal component *p*_{μ,i} is best interpreted as the projection of model *i*’s hindcast observation field **x**_{i} onto eigenmode **e**_{μ}:

$$
p_{\mu,i} = \mathbf{e}_\mu^{\mathsf{T}} \left( \mathbf{x}_i - \boldsymbol{\alpha}_x \right),
$$

where *α*_{x} is the ensemble mean hindcast. The *p*_{μ,i} are simulated by first generating an (*n*_{m} + 1)-element vector **x**_{i} for each model *i* using a normal distribution random number generator and then determining the principal components *p*_{μ,i} by singular value decomposition. As is standard for principal components, the *p*_{μ,i} of different eigenmodes are uncorrelated across the models. The *y*_{i} are also generated using a normal distribution random number generator, and the correlation coefficient estimators are computed by

$$
r_\mu = \frac{\sum_i p_{\mu,i} \left( y_i - \bar{y} \right)}{\left[ \sum_i p_{\mu,i}^2 \, \sum_i \left( y_i - \bar{y} \right)^2 \right]^{1/2}},
$$

where *ȳ* is the mean of the *y*_{i} and the sums are taken over the models *i*. The number of eigenmodes is *n*_{m} − 1 for an ensemble of *n*_{m} models, thus *μ* ∈ [1, *n*_{m} − 1]. For each scenario of the 500 000 scenarios, the largest value of |*r*_{μ}| is retained, and the distribution of these largest values gives the thresholds above which the hypothesis that all *ρ*_{μ} = 0 can be rejected with varying degrees of confidence. These values are obtained by integrating under the curves of Fig. 8. The relevant number of degrees of freedom when assessing statistical significance of correlation coefficients is the effective number of degrees of freedom, previously found to be approximately 17.
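The Monte Carlo construction described above can be sketched as follows. The sketch uses far fewer scenarios than the 500 000 of the text so that it runs quickly, and it reads the rejection thresholds from empirical quantiles, which is equivalent to integrating under the simulated density; the function names, the default scenario count, and the seed are assumptions of this example.

```python
import numpy as np

def largest_modal_correlation(n_m, n_scenarios=2000, seed=0):
    """Simulate the largest |r_mu| when every true modal correlation is zero."""
    rng = np.random.default_rng(seed)
    largest = np.empty(n_scenarios)
    for k in range(n_scenarios):
        # Random (n_m + 1)-element hindcast vector for each of n_m models.
        x = rng.standard_normal((n_m, n_m + 1))
        y = rng.standard_normal(n_m)
        dx = x - x.mean(axis=0)
        dy = y - y.mean()
        # Principal components by SVD; only n_m - 1 modes are nontrivial.
        u, _, _ = np.linalg.svd(dx, full_matrices=False)
        u = u[:, : n_m - 1]
        # Correlation of each principal component with y; the singular values
        # cancel between numerator and denominator.
        r_mu = u.T @ dy / np.linalg.norm(dy)
        largest[k] = np.abs(r_mu).max()
    return largest

def rejection_thresholds(largest, confidences=(0.50, 0.75, 0.95, 0.99)):
    """Values of the largest |r_mu| exceeded by chance with probability 1 - c."""
    return {c: float(np.quantile(largest, c)) for c in confidences}
```

Since the squared modal correlations sum to one over *n*_{m} − 1 modes, the largest |*r*_{μ}| in any scenario can never fall below (*n*_{m} − 1)^{−1/2}, which is why large values arise even without any true correlation.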

Table 2. Modal correlation coefficient (|*r*_{μ}|) and modal variance (*λ*_{μ}) associated with the first four eigenmodes for various hindcast observation types, sorted by decreasing modal correlation coefficient (see Table 1). The eigenmodes have been renumbered after resorting.

The values of the largest |*r*_{μ}| above which the hypothesis *ρ*_{μ} = 0 can be rejected with confidences of 50%, 75%, 95%, and 99% for varying numbers of models *n*_{m} (see Fig. 8).

Only three eigenmodes in all the considered hindcast observation types have considerable fractional variance (*λ*_{μ} > 0.01) and correlate significantly with climate prediction (*r*_{μ}) even if marginally, as is apparent in Table 2. The three hindcast observation types are the 1970–2005 trend in the 200-hPa height, the 1970–2005 trend in net TOA downward shortwave radiation, and the 1970–2005 average in net downward longwave radiation. The most prominent is the 1970–2005 trend in the 200-hPa height, with *r*_{μ} = 0.582. The most relevant modes, as defined by large |*r*_{μ}|, are plotted in Fig. 7. The confidence level that the eigenmode in Fig. 7a has a nonzero correlation with climate prediction is approximately 80%.

## 5. Discussion

We have pursued the prioritization of prospective remotely sensed climate monitoring observation types with the goal of improving climate prediction capability, having expanded upon a previous study by considering spatial information and long-term trends versus long-term averages over the same time series. We have performed calculations against a realistic backdrop, namely that climate models will continue to be calibrated with some existing data as they are currently. We anticipate that climate modeling groups will perform additional tests of their models against data, selecting first those data that they deem most important for testing their models. The models they generate will be tuned to agree with those data. In prioritizing observation types and metrics, we are recommending to climate modeling groups what data should be given priority in testing their models and what metrics should be applied in the tests. These observation types should be given priority in a climate monitoring system.

Using a method that has been applied to conditioning large perturbed physics ensembles of a climate model, we sought the hindcast observation types that correlate best with multidecadal change in global average surface air temperature. When resolved spatially, the hindcast observations can seemingly correlate very strongly with long-term climate prediction; however, when many thousands of correlations are estimated, statistical noise is capable of generating occasionally large estimates of correlation coefficients. We have shown that the correlations between hindcast and climate prediction are generally consistent with there being no nonzero correlation present (cf. Fig. 6) with the possible exception of a few prominent large-scale structures in Fig. 5. The candidate hindcast observation types for highest priority are Figs. 7a, 7e, and 7i. Figures 7a and 7e lend themselves to simple physical interpretation: tropospheric expansion in the Arctic as measured by GNSS radio occultation contains multidecadal predictive capability, as does an increase in the strength of the Walker circulation as borne out in net TOA downward shortwave radiation. The Walker circulation can increase low cloud fraction in the central and eastern tropical Pacific because of enhanced subsidence and decrease low cloud fraction in the tropical western Pacific by enhancing rising motion, leading to the signature seen in Fig. 7e. Change in the intensity of the Walker circulation has attracted attention previously (Vecchi et al. 2006).

There are several possible causes for the slight improvements in climate prediction we have found. First, it is possible that the CMIP5 ensemble of climate models is not large enough to arrive at decisive statistics. Approximately 50 models contributed to CMIP5, about 25 of which contributed the scenarios needed in this study, and only about 17 of those can be considered independent. A large perturbed physics ensemble of a single climate model may provide better statistics, but previous efforts with perturbed physics ensembles have had difficulty reducing uncertainty in equilibrium climate sensitivity (Sexton et al. 2012). Second, the climate modeling groups may already have effectively tested their models against data corresponding to the observation types we have considered here. If so, then no correlation between hindcasts of that observation type and climate prediction would be found. This seems unlikely, though, because of the large spreads that exist in most of the hindcast observation types as expressed through **Σ**_{xx}. Third, the spectral information contained in the radiance observation types being considered for the Climate Absolute Radiance and Refractivity Observatory (CLARREO) mission may enable the distinguishing between external radiative forcing and radiative response, and that separation might enable larger correlation coefficients. Radiative response should provide information on radiative feedbacks and hence equilibrium climate sensitivity (Leroy et al. 2008a; Huang et al. 2010; Feldman et al. 2013). On the other hand, the spatial structure of signals in hindcasts can be expected to contain much of the same information as spectral structure: there is little doubt in Fig. 3 where clouds are responsible for correlation and where ice is responsible. In any case, further investigation is warranted to understand multidecadal predictability of climate and to find climate monitoring observation types of the highest priority.

We acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output. For CMIP the U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. The authors thank Profs. Richard Goody and Guido Visconti for their intellectual contributions to this research. Dr. Leroy was supported by Grant NNX11AD01G of the U.S. National Aeronautics and Space Administration. Drs. Redaelli and Grassi were supported in part by the Italian Space Agency (ASI) under the QUITSAT project.

## REFERENCES

Alder, H., and E. Roessler, 1972: *Introduction to Probability and Statistics.* W. H. Freeman, 373 pp.

Anderson, J., J. Dykema, R. Goody, H. Hu, and D. Kirk-Davidoff, 2004: Absolute, spectrally-resolved, thermal radiance: A benchmark for climate monitoring from space. *J. Quant. Spectrosc. Radiat. Transf.*, **85**, 367–383, doi:10.1016/S0022-4073(03)00232-2.

Andrews, T., J. Gregory, M. Webb, and K. Taylor, 2012: Forcing, feedbacks and climate sensitivity in CMIP5 coupled atmosphere–ocean climate models. *Geophys. Res. Lett.*, **39**, L09712, doi:10.1029/2012GL051607.

Feldman, D., C. Algieri, W. Collins, Y. Roberts, and P. Pilewskie, 2011: Simulation studies for the detection of changes in broadband albedo and shortwave nadir reflectance spectra under a climate change scenario. *J. Geophys. Res.*, **116**, D24103, doi:10.1029/2011JD016407.

Feldman, D., D. Coleman, and W. Collins, 2013: On the usage of spectral and broadband satellite instrument measurements to differentiate climate models with different cloud feedback strengths. *J. Climate*, **26**, 6561–6574, doi:10.1175/JCLI-D-12-00378.1.

Forest, C., P. Stone, A. Sokolov, M. Allen, and M. Webster, 2002: Quantifying uncertainties in climate system properties with the use of recent climate observations. *Science*, **295**, 113–117, doi:10.1126/science.1064419.

Geoffroy, O., D. Saint-Martin, and A. Ribes, 2012: Quantifying the sources of spread in climate change experiments. *Geophys. Res. Lett.*, **39**, L24703, doi:10.1029/2012GL054172.

Hall, A., and X. Qu, 2006: Using the current seasonal cycle to constrain snow albedo feedback in future climate change. *Geophys. Res. Lett.*, **33**, L03502, doi:10.1029/2005GL025127.

Huang, Y., S. Leroy, P. Gero, J. Dykema, and J. Anderson, 2010: Separation of longwave climate feedbacks from spectral observations. *J. Geophys. Res.*, **115**, D07104, doi:10.1029/2009JD012766.

Huang, Y., S. Leroy, and R. Goody, 2011: Discriminating between climate observations in terms of their ability to improve an ensemble of climate predictions. *Proc. Natl. Acad. Sci. USA*, **108**, 10 405–10 409, doi:10.1073/pnas.1107403108.

Leroy, S., 1997: Measurement of geopotential heights by GPS radio occultation. *J. Geophys. Res.*, **102**, 6971–6986, doi:10.1029/96JD03083.

Leroy, S., J. Anderson, and J. Dykema, 2006: Testing climate models using GPS radio occultation: A sensitivity analysis. *J. Geophys. Res.*, **111**, D17105, doi:10.1029/2005JD006145.

Leroy, S., J. Anderson, J. Dykema, and R. Goody, 2008a: Testing climate models using thermal infrared spectra. *J. Climate*, **21**, 1863–1875, doi:10.1175/2007JCLI2061.1.

Leroy, S., J. Anderson, and G. Ohring, 2008b: Climate signal detection times and constraints on climate benchmark accuracy requirements. *J. Climate*, **21**, 841–846, doi:10.1175/2007JCLI1946.1.

Liou, Y., A. Pavelyev, S. Liu, A. Pavelyev, N. Yen, C. Huang, and C. Fong, 2007: FORMOSAT-3/COSMIC GPS radio occultation mission: Preliminary results. *IEEE Trans. Geosci. Remote Sens.*, **45**, 3813–3826, doi:10.1109/TGRS.2007.903365.

Min, S., and A. Hense, 2006a: A Bayesian approach to climate model evaluation and multi-model averaging with an application to global mean surface temperatures from IPCC AR4 coupled climate models. *Geophys. Res. Lett.*, **33**, L08708, doi:10.1029/2006GL025779.

Min, S., and A. Hense, 2006b: A Bayesian assessment of climate change using multimodel ensembles. Part I: Global mean surface temperature. *J. Climate*, **19**, 3237–3256, doi:10.1175/JCLI3784.1.

Monahan, A., J. Fyfe, M. Ambaum, D. Stephenson, and G. North, 2009: Empirical orthogonal functions: The medium is the message. *J. Climate*, **22**, 6501–6514, doi:10.1175/2009JCLI3062.1.

Moss, R., and Coauthors, 2010: The next generation of scenarios for climate change research and assessment. *Nature*, **463**, 747–756, doi:10.1038/nature08823.

North, G., T. Bell, R. Cahalan, and F. Moeng, 1982: Sampling errors in the estimation of empirical orthogonal functions. *Mon. Wea. Rev.*, **110**, 699–706, doi:10.1175/1520-0493(1982)110<0699:SEITEO>2.0.CO;2.

Sexton, D., J. Murphy, M. Collins, and M. Webb, 2012: Multivariate probabilistic projections using imperfect climate models. Part I: Outline of methodology. *Climate Dyn.*, **38**, 2513–2542, doi:10.1007/s00382-011-1208-9.

Sivia, D., 2006: *Data Analysis: A Bayesian Tutorial.* Oxford University Press, 246 pp.

Taylor, K., R. Stouffer, and G. Meehl, 2012: An overview of CMIP5 and the experiment design. *Bull. Amer. Meteor. Soc.*, **93**, 485–498, doi:10.1175/BAMS-D-11-00094.1.

Vecchi, G., B. Soden, A. Wittenberg, I. Held, A. Leetma, and M. Harrison, 2006: Weakening of the tropical Pacific atmospheric circulation due to anthropogenic forcing. *Nature*, **441**, 73–76, doi:10.1038/nature04744.

Verkhoglyadova, O., S. Leroy, and C. Ao, 2014: Estimation of winds from GPS radio occultations. *J. Atmos. Oceanic Technol.*, **31**, 2451–2461, doi:10.1175/JTECH-D-14-00061.1.

Wickert, J., and Coauthors, 2001: Atmosphere sounding by GPS radio occultation: First results from CHAMP. *Geophys. Res. Lett.*, **28**, 3263–3266, doi:10.1029/2001GL013117.

Wielicki, B., and Coauthors, 2013: Achieving climate change absolute accuracy in orbit. *Bull. Amer. Meteor. Soc.*, **94**, 1519–1539, doi:10.1175/BAMS-D-12-00149.1.