## 1. Introduction

Optimal detection, otherwise called linear multipattern regression, has been the method of choice for attributing global change to specific causes, natural and anthropogenic. It is the preferred method for addressing problems of attribution of climate change in the Fourth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC). The method is described qualitatively in box 12.1 of the Third Assessment Report (Houghton et al. 2001) and has been presented rigorously elsewhere (Bell 1986; Hasselmann 1993; North et al. 1995; Hasselmann 1997). In the method, a spatiotemporal pattern for a climate signal is predicted by a climate model by subtracting a control run of the climate model from a run subjected to a forcing that was not included in the control. Care is taken to minimize the influence of natural variability in the signal pattern. Then the signal pattern—in both space and time—is multiplied by a scalar to fit a time series of data under the condition that the postfit residuals are consistent with the statistics of naturally occurring, interannual fluctuations of the climate system, as prescribed by a long control run of a climate model (Allen and Tett 1999). The final result is the multiplying scalar along with the uncertainty in its determination that, when taken together, yield a confidence level that a forced signal has been detected.

There are many contributors to uncertainty in optimal detection. Allen and Stott (2003) give a thorough presentation of these contributors. One of them arises from the dependence of the spatiotemporal patterns of externally forced signals on the climate model used to simulate them. The term is first presented in Eq. (2.10) of Bell (1986), is rigorously derived in Allen and Stott (2003), and is first applied in Huntingford et al. (2006). The uncertainty it introduces takes the form of an uncertainty covariance matrix in the spatiotemporal patterns of the signal, as generated by an ensemble of different runs of a climate model or multiple climate models. Intuitively, uncertainty because of the differences in signal pattern should not arise from models that produce the same pattern but with different overall amplitude; thus, the signals produced by different models must first be normalized before a signal pattern uncertainty covariance matrix is estimated. The derivations in Bell (1986) and Allen and Stott (2003) provide no prescription for the normalization of simulated signal patterns before their uncertainty covariance is computed, so Huntingford et al. (2006) implement a sensible but ultimately ad hoc normalization in their computations.

Because the results of optimal detection differ depending on the conditions of its application, and because optimal detection itself can be derived from the rules of conditional probability (Leroy 1998), the problem of normalization of signal patterns may be addressed by a rigorous application of Bayes’s theorem to the problem of optimal detection, using an ensemble of climate model runs to prescribe signal patterns. The same rigorous application may also ascribe physical meanings to the scalars used to multiply the signal patterns to obtain a best fit to data. In this paper, we present such a treatment for optimal detection, and along the way we point out the implicit assumptions that must be made to obtain the equations of optimal detection from a more general application of Bayes’s theorem to the problem of climate signal detection.

The second section of this paper contains a derivation of the equations of optimal detection using an ensemble of climate model runs that use the rules of conditional probability to determine the appropriate method of normalizing signal patterns produced by different model runs. The findings will lead to a more general interpretation of optimal detection than simple attribution to external influences. One additional interpretation is that global datasets can be used to find the underlying climate trends in regional signals for noisy time series. The third section presents two illustrative applications of optimal detection to regional trends, made feasible by two distinct normalizations of the signal patterns as produced by an ensemble of climate model runs. The results will demonstrate the nontriviality of the issue of signal pattern normalization. Finally, the fourth section contains a summary of the preceding sections and a discussion of the implications of this theoretical work.

## 2. Normalizing signal patterns

The equations of optimal detection are

$$\Delta\boldsymbol{\alpha}_{\mathrm{mp}} = \mathsf{F}_{\mathrm{mp}}^{\mathrm{T}}\,\Delta\mathbf{d}, \tag{1}$$

in which the data are modeled as

$$\Delta\mathbf{d} = \mathsf{S}\,\Delta\boldsymbol{\alpha} + \mathbf{n}. \tag{2}$$

Here Δ**d** is a measurement of climate change in a dataset. It can take the form of the average of one period of time subtracted from the average of a later period of time, or it can take the form of a trend in time computed by linear regression of a time series of data. The coefficients Δ**α** are the multipliers of the signals (the columns of 𝗦) thought to emerge in the data as a response of climate to external forcing otherwise obscured by natural variability. A continuum of values of the coefficients Δ**α** is possible; Δ**α**_{mp} are the "most probable" values, though. The postfit residual is Δ**d** − 𝗦Δ**α**_{mp} ≈ **n**, a single realization of a random system, itself described by a Gaussian distribution with covariance **Σ**_{n}. There are no demands on the dimensionality of the data vector nor the data types; in fact, it is possible to mix data types in the construction of Δ**d**. The optimal fingerprints are

$$\mathsf{F}_{\mathrm{mp}} = \boldsymbol{\Sigma}_{n}^{-1}\,\mathsf{S}\,(\mathsf{S}^{\mathrm{T}}\boldsymbol{\Sigma}_{n}^{-1}\mathsf{S})^{-1}, \tag{3}$$

and the error covariance of Δ**α**_{mp} is

$$\boldsymbol{\Sigma}_{\alpha} = (\mathsf{S}^{\mathrm{T}}\boldsymbol{\Sigma}_{n}^{-1}\mathsf{S})^{-1}, \tag{4}$$

valid when the residuals **n** are simply the fluctuations of internal "natural" variability. These are the equations of optimal detection. See Allen and Stott (2003) for an in-depth explanation and exploration of optimal detection.
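Equations (1)–(4) amount to generalized least squares. As a check on the algebra, here is a minimal numerical sketch; the dimensions, signal patterns, and covariance are synthetic stand-ins for illustration, not values from this paper:

```python
import numpy as np

rng = np.random.default_rng(42)

N, k = 50, 2                                  # data dimension, number of signals
S = rng.normal(size=(N, k))                   # synthetic signal patterns (columns)
Sigma_n = np.diag(rng.uniform(0.5, 2.0, N))   # natural-variability covariance

alpha_true = np.array([1.0, -0.5])            # "true" signal amplitudes
n = rng.multivariate_normal(np.zeros(N), Sigma_n)
d = S @ alpha_true + n                        # Eq. (2): data model

Sni = np.linalg.inv(Sigma_n)
Sigma_alpha = np.linalg.inv(S.T @ Sni @ S)    # Eq. (4): error covariance
F = Sni @ S @ Sigma_alpha                     # Eq. (3): optimal fingerprints
alpha_mp = F.T @ d                            # Eq. (1): most probable amplitudes

# The optimal fingerprints are contravariant to the signal patterns
assert np.allclose(F.T @ S, np.eye(k))
```

With these synthetic inputs, `alpha_mp` recovers `alpha_true` to within a few standard deviations given by the diagonal of `Sigma_alpha`.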

By Bayes's theorem, the conditional PDF of the underlying climate change Δ**g** in an observation vector due solely to a specific external forcing is directly proportional to the product of the Bayesian "likelihood" function and the Bayesian prior function:

$$P(\Delta\mathbf{g}\,|\,\Delta\mathbf{d}, M_{\mu}) = \frac{P(\Delta\mathbf{d}\,|\,\Delta\mathbf{g})\,P(\Delta\mathbf{g}\,|\,M_{\mu})}{P(\Delta\mathbf{d}\,|\,M_{\mu})}. \tag{5}$$

In this case, the likelihood function is the first term in the numerator of Eq. (5) and is the probability density function (PDF) of obtaining data Δ**d** if the true climate change is Δ**g**. The prior is the second term in the numerator of Eq. (5) and is the PDF for obtaining climate change Δ**g** using model *M*_{μ}. See Sivia (2006) for a tutorial on Bayesian data analysis. Model *M*_{μ} can be used to determine the secular trend of climate given a specified external forcing. Implicit in this approach is that a secular trend or perturbation from a climate equilibrium state exists in response to an external forcing, distinct from other fluctuations of a climate system in equilibrium, and is not directly observable. This is arguably the central assumption of climate change research. The internal fluctuations on interannual time scales obscure the underlying trends on decadal time scales, and so it is advantageous to suppress the internal variability when evaluating the numerator of Eq. (5). When diagnosing climate models, this is best done by averaging together multiple forced runs of the same climate model.

The following subsections state the assumptions needed to obtain the equations of optimal detection from Bayesian inference (sections 2a–2d) and then state how normalization of signals is accomplished (section 2e).
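The structure of Eq. (5) can be illustrated with a one-dimensional grid computation; all numbers below are arbitrary illustrative assumptions. The posterior is the normalized product of a Gaussian likelihood and a model-based prior:

```python
import numpy as np

# Grid over a scalar climate change Δg
g = np.linspace(-3.0, 5.0, 2001)
dg = g[1] - g[0]

# Likelihood P(Δd | Δg): data Δd = 1.2 with noise standard deviation 1.0
likelihood = np.exp(-0.5 * ((1.2 - g) / 1.0) ** 2)

# Prior P(Δg | M_mu): model predicts Δg = 2.0 with uncertainty 0.5
prior = np.exp(-0.5 * ((g - 2.0) / 0.5) ** 2)

# Eq. (5): posterior ∝ likelihood × prior; the denominator normalizes
posterior = likelihood * prior
posterior /= posterior.sum() * dg

g_mp = g[np.argmax(posterior)]  # most probable underlying climate change
```

For Gaussian likelihood and prior, the posterior mode lands at the precision-weighted mean, here (1 × 1.2 + 4 × 2.0)/5 = 1.84.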

### a. Assumption of a separable prior

Suppose there are *k* components of climate change, *k* ≥ 1, and those changes are linear in well-defined scalars Δ*α*_{i}:

$$\Delta\mathbf{g} = \sum_{i=1}^{k} \frac{\partial\mathbf{g}}{\partial\alpha_{i}}\,\Delta\alpha_{i} = \frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}}\,\Delta\boldsymbol{\alpha}. \tag{6}$$

The *i*th column of the Jacobian ∂**g**/∂**α** is ∂**g**/∂*α*_{i}. Substituting Eq. (6) into Eq. (5) and treating the Jacobian as an uncertain quantity gives the joint posterior

$$P\!\left(\frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}},\,\Delta\boldsymbol{\alpha}\,\Big|\,\Delta\mathbf{d}, M_{\mu}\right) = \frac{P\!\left(\Delta\mathbf{d}\,\Big|\,\frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}},\,\Delta\boldsymbol{\alpha}\right) P\!\left(\frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}},\,\Delta\boldsymbol{\alpha}\,\Big|\,M_{\mu}\right)}{P(\Delta\mathbf{d}\,|\,M_{\mu})}. \tag{7}$$

The first assumption of optimal detection is that the prior is separable; that is, the prior in Eq. (7) becomes

$$P\!\left(\frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}},\,\Delta\boldsymbol{\alpha}\,\Big|\,M_{\mu}\right) = P\!\left(\frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}}\,\Big|\,M_{\mu}\right) P(\Delta\boldsymbol{\alpha}). \tag{8}$$

The patterns ∂**g**/∂**α** are precisely known properties of the climate model *M*_{μ}. Thus, the contingency of ∂**g**/∂**α** upon model *M*_{μ} implies that the corresponding prior function—the first term on the right of Eq. (8)—is singular at ∂**g**/∂**α** = (∂**g**/∂**α**)_{M_μ}. Assuming the prior is separable, the Bayesian formulation for climate change detection and attribution becomes

$$P(\Delta\boldsymbol{\alpha}\,|\,\Delta\mathbf{d}, M_{\mu}) = \frac{1}{P(\Delta\mathbf{d}\,|\,M_{\mu})} \int P\!\left(\Delta\mathbf{d}\,\Big|\,\frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}},\,\Delta\boldsymbol{\alpha}\right) \delta\!\left[\frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}} - \left(\frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}}\right)_{\!M_{\mu}}\right] P(\Delta\boldsymbol{\alpha})\; d\!\left(\frac{\partial\mathbf{g}}{\partial\boldsymbol{\alpha}}\right), \tag{9}$$

where *δ*[···] is a Dirac delta function. That the Bayesian prior is separable is the first assumption.

### b. Assumption of normally distributed residuals

The residuals of the fit to the data, Δ**d** − 𝗦Δ**α** = **n**, are assumed to obey **n** ~ 𝒩(**0**, **Σ**_{n}). Read this notation as—the random fluctuations of natural variability **n** have a Gaussian distribution with zero mean (〈**n**〉 = **0**) and covariance 〈**n** **n**^{T}〉 = **Σ**_{n}. Applying the model for the data, Eq. (2), the likelihood function is also normally distributed:

$$P(\Delta\mathbf{d}\,|\,\mathsf{S},\,\Delta\boldsymbol{\alpha}) \propto \exp\!\left[-\tfrac{1}{2}\,(\Delta\mathbf{d} - \mathsf{S}\,\Delta\boldsymbol{\alpha})^{\mathrm{T}}\,\boldsymbol{\Sigma}_{n}^{-1}\,(\Delta\mathbf{d} - \mathsf{S}\,\Delta\boldsymbol{\alpha})\right], \tag{10}$$

with 𝗦 = (∂**g**/∂**α**)_{M_μ} and **s**_{i} = (∂**g**/∂*α*_{i})_{M_μ} the *i*th column of matrix 𝗦. The assumption of normally distributed residuals is explicit elsewhere in the literature on optimal detection and is the second assumption here.

### c. Assumption of uninformed sensitivity

The third assumption is that the prior for the scalars Δ**α** is an uninformative or flat one:

$$P(\Delta\boldsymbol{\alpha}) = \mathrm{const}. \tag{11}$$

With Eqs. (9)–(11), the posterior PDF is Gaussian in Δ**α**. The solutions are Eqs. (1), (3), and (4). We have not yet accounted for uncertainty in signal patterns.

At this stage, under the condition of just one contingent model *M*_{μ}, optimal detection only depends on how one separates the climate change signal into multiple components and not on how one defines the scalars *α*_{i}. For the problem of detecting anthropogenic warming in global surface air temperature, for instance, the climate signal can be separated into a component due to anthropogenic forcing, another due to volcanic aerosol, another due to sulfate aerosol, and another due to the solar cycle. The scalar assigned to each can be nondimensional, indicative of the global average surface air temperature change or some other quantity. Application of the equations of optimal detection to obtain a most probable Δ**α** and its error covariance **Σ**_{α} will differ because of different definitions of Δ**α**, but the confidence levels of detection will remain the same. This can be checked simply by scaling each *α*_{i} by *q*_{i} (that is, *α*′_{i} = *q*_{i}*α*_{i}), substituting *α*′_{i} for *α*_{i} in the equations of optimal detection, and computing confidence levels of detection.
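That detection confidence is invariant under this rescaling can be verified numerically: scaling each α_i by q_i is equivalent to dividing the ith signal column by q_i, and the ratio of each amplitude to its 1σ uncertainty is unchanged. A sketch with synthetic inputs (all values illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
N, k = 30, 2
S = rng.normal(size=(N, k))                  # synthetic signal patterns
Sigma_n = np.diag(rng.uniform(0.5, 2.0, N))  # synthetic internal variability
d = rng.normal(size=N)                       # any data realization

def detect(S, Sigma_n, d):
    """Most probable amplitudes and error covariance [Eqs. (1), (3), (4)]."""
    Sni = np.linalg.inv(Sigma_n)
    Sigma_a = np.linalg.inv(S.T @ Sni @ S)
    alpha_mp = Sigma_a @ S.T @ Sni @ d
    return alpha_mp, Sigma_a

q = np.array([3.0, 0.25])           # arbitrary rescaling of the scalars
a1, C1 = detect(S, Sigma_n, d)
a2, C2 = detect(S / q, Sigma_n, d)  # alpha' = q*alpha <=> columns of S divided by q

snr1 = a1 / np.sqrt(np.diag(C1))
snr2 = a2 / np.sqrt(np.diag(C2))
assert np.allclose(snr1, snr2)      # confidence of detection is unchanged
assert np.allclose(a2, q * a1)      # amplitudes simply rescale
```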

_{i}### d. Assumption of a continuum of models

By the rules of conditional probability, the posterior PDF for the scalars Δ**α** given the data is the weighted sum of the posterior distribution function for each model, *M*_{μ}:

$$P(\Delta\boldsymbol{\alpha}\,|\,\Delta\mathbf{d}) = \frac{\sum_{\mu} P(\Delta\boldsymbol{\alpha}\,|\,\Delta\mathbf{d}, M_{\mu})\,P(\Delta\mathbf{d}, M_{\mu})}{\sum_{\mu} P(\Delta\mathbf{d}, M_{\mu})}. \tag{12}$$

The weights *P*(Δ**d**, *M*_{μ}) in this equation are new; each is the joint probability of the data and the model *M*_{μ}. Some models' prescriptions of signal pattern will be more consistent with the data than others, and those models will be preferentially weighted. With the law of multiplication for conditional probabilities, it is related to the denominator on the right of Eq. (7):

$$P(\Delta\mathbf{d}, M_{\mu}) = P(\Delta\mathbf{d}\,|\,M_{\mu})\,P(M_{\mu}). \tag{13}$$

Because there is no prior reason to prefer one model over another, *P*(*M*_{μ}) = 1 is appropriate. Putting together Eqs. (9), (11), (12), and (13) yields the conditional probability for underlying trends of the climate given data and an ensemble of models, with the sum over the ensemble written as an integral over a continuum of models. The fourth assumption is that the continuum of models produces a normal distribution for the signal patterns 𝗦 about the intermodel mean 𝗦̄, with covariance Σ_{𝗦}:

$$P(\mathsf{S}) \propto \exp\!\left[-\tfrac{1}{2}\,\mathrm{vec}(\delta\mathsf{S})^{\mathrm{T}}\,\boldsymbol{\Sigma}_{\mathsf{S}}^{-1}\,\mathrm{vec}(\delta\mathsf{S})\right], \tag{14}$$

so that

$$P(\Delta\boldsymbol{\alpha}\,|\,\Delta\mathbf{d}) \propto \int P(\Delta\mathbf{d}\,|\,\mathsf{S},\,\Delta\boldsymbol{\alpha})\,P(\mathsf{S})\;d\mathsf{S}. \tag{15}$$

Carrying out the integral over 𝗦 gives a posterior that is again Gaussian in the residuals,

$$P(\Delta\boldsymbol{\alpha}\,|\,\Delta\mathbf{d}) \propto \exp\!\left[-\tfrac{1}{2}\,(\Delta\mathbf{d} - \overline{\mathsf{S}}\,\Delta\boldsymbol{\alpha})^{\mathrm{T}}\,\boldsymbol{\Sigma}_{r}^{-1}\,(\Delta\mathbf{d} - \overline{\mathsf{S}}\,\Delta\boldsymbol{\alpha})\right], \tag{16}$$

with the most probable values given by

$$\Delta\boldsymbol{\alpha}_{\mathrm{mp}} = \mathsf{F}_{\mathrm{mp}}^{\mathrm{T}}\,\Delta\mathbf{d}, \tag{17a}$$

$$\mathsf{F}_{\mathrm{mp}} = \boldsymbol{\Sigma}_{r}^{-1}\,\overline{\mathsf{S}}\,(\overline{\mathsf{S}}^{\mathrm{T}}\boldsymbol{\Sigma}_{r}^{-1}\,\overline{\mathsf{S}})^{-1}, \tag{17b}$$

$$\boldsymbol{\Sigma}_{r} = \boldsymbol{\Sigma}_{n} + \boldsymbol{\Sigma}_{s}, \tag{17c}$$

$$\boldsymbol{\Sigma}_{s} = \left\langle (\delta\mathsf{S}\,\Delta\boldsymbol{\alpha})\,(\delta\mathsf{S}\,\Delta\boldsymbol{\alpha})^{\mathrm{T}} \right\rangle. \tag{17d}$$

The angle brackets 〈···〉 indicate an ensemble average over models, and *δ*𝗦 is the departure of the fingerprints 𝗦_{μ} for a given model *M*_{μ} from the intermodel mean 𝗦̄. The covariance Σ_{s} in Eq. (17d) can be derived from Eq. (16) using a nontrivial coordinate transformation, which calls for the computation of the marginal PDFs of the subspace of Σ_{𝗦} described by 𝗦Δ**α**. The rest of the space of Σ_{𝗦} can be considered "nuisance" parameters and is easily integrated over as in Sivia (2006).

Equations (17a)–(17d) are the equations of optimal detection, with an accounting for uncertainty in the spatiotemporal patterns of climate signals derived in the context of an ensemble of climate models using Bayesian formalism. For comparison, see Eq. (2.10) in Bell (1986), section 3 of Allen and Stott (2003), and Eqs. (1)–(3) of Huntingford et al. (2006), which precisely give Eq. (17c).

### e. Normalization of signal shapes

It is now possible to define how to normalize signal patterns in the process of accounting for model uncertainty. The normalization is defined according to Eq. (6). The underlying climate change for any time series of data can be written as a linear combination of multiple signals, each of which can be normalized by arbitrary and completely general scalars that are strongly related to the existence of climate trends. The obvious application is to regional detection and attribution. For example, the normalization scalars Δ**α** can be defined as regional surface air temperature changes that result from different external forcings. This places no demand on the data vector Δ**d**, and so the data field is completely general. For example, the data field can be anything from in situ temperature data to calibrated radiances obtained remotely. It may include the region of interest. Optimal detection with an accounting for model uncertainty turns out to be a method to extract information from arbitrary and mixed-type datasets to infer the underlying climate trends for any noisy quantity in the climate system.

Hereafter, we call a normalized signal pattern a climate fingerprint. That there is a PDF of fingerprints with finite width, as defined by a continuum of models, is a consequence of the uncertain physics in climate models, which leads to uncertain connections between a climate response (free of natural variability) and observable data types. The more certain the physical relationships are between a climate trend in a particular variable and an observed data field, the smaller the width of the PDF in fingerprints will be, and optimal detection will consider internal variability as the dominant source of residuals. Likewise, the less certain the physical relationships are between a climate trend in a particular variable and an observed data field, the greater the width of the PDF in fingerprints will be, and optimal detection will consider fingerprint uncertainty as the dominant source of residuals.

We call the columns of the matrix 𝗦 the climate fingerprints and the columns of the matrix 𝗙_{mp} the "optimal fingerprints." The optimal fingerprints can be thought of as linear spatiotemporal filters trained by an ensemble of climate models to infer underlying climate trends. They are the contravariant vectors of the climate fingerprints because 𝗙_{mp}^{T}𝗦̄ = 𝗜, which follows directly from Eq. (17b).

## 3. Illustrative examples

Here, Eqs. (17a)–(17d) are applied twice, using the same data field but targeting different regions, to illustrate how the method described in the preceding section works. We use maps of Northern Hemisphere surface air temperature trends to infer the trend associated with anthropogenic forcing in the regions of the central United States and Northern Europe.

The Coupled Model Intercomparison Project phase 3 (CMIP3) multimodel dataset provides an ensemble of climate model runs subjected to SRES A1B forcing, from which we compute the climate fingerprints and the optimal fingerprint **f**_{mp} (Δ*α*_{mp} = **f**_{mp}^{T}Δ**d** in the single-signal case). It also provides multiple preindustrial control runs. We define a scalar change in the form of a climate trend:

$$\frac{dT_{\mathrm{mp,region}}}{dt} = \mathbf{f}_{\mathrm{mp}}^{\mathrm{T}}\,\frac{d\mathbf{d}}{dt}. \tag{18}$$

The optimal fingerprint **f**_{mp} is determined using the outputs of all the CMIP3 models but one, and the output of that one is used as a stand-in for data to test the analysis method. This method is sometimes referred to as the perfect model test. The lone element of the scalar change Δ*α* is effectively the regional average temperature trend *dT*_{mp,region}/*dt*, and the data change Δ**d** is the long-term trend in the data field *d***d**/*dt*. The major operation of Eq. (18) is linear. Because the trend in the data field *d***d**/*dt* is the time derivative of a time series of data **d**(*t*), the underlying trend in regional surface air temperature can be written also as a time series *T*_{mp,region}(*t*):

$$T_{\mathrm{mp,region}}(t) = \mathbf{f}_{\mathrm{mp}}^{\mathrm{T}}\,\mathbf{d}(t). \tag{19}$$

This time series is not the same as the actual regional average surface air temperature *T*_{region}(*t*), but it is the inferred most probable estimate of the regional surface air temperature trend associated with SRES A1B forcing; that is, the climate response in *T*_{region} without natural variability.

The fingerprint **s** for each model is computed by dividing the 40-yr trend in Northern Hemisphere surface air temperature by the 40-yr trend in the regionally restricted central U.S. surface air temperature [cf. Eq. (6)]. We use 40-yr trends instead of 10-yr trends to reduce the error in **s** due to the interannual variability internal to each climate model. A mean fingerprint 𝘀̄ is computed by averaging together the **s** over the ensemble of CMIP3 models, and the fingerprint uncertainty covariance Σ_{s} is subsequently computed according to Eq. (17d). Internal variability is computed from a long preindustrial control run of a climate model taken from CMIP3 and represents the range of trends in *d***d**/*dt* that can be realized by internal variability without any anthropogenic climate forcing over 10 yr. If the year-to-year internal variability of Northern Hemisphere annual average temperature is Σ_{n}, then the internal variability of a 10-yr trend in Northern Hemisphere annual average temperature Σ_{dn/dt} is related to Σ_{n} approximately by

$$\boldsymbol{\Sigma}_{dn/dt} \approx \frac{12\,\tau_{\mathrm{var}}}{(\Delta t)^{3}}\,\boldsymbol{\Sigma}_{n}, \tag{20}$$

where Δ*t* = 10 yr is the length of the record and *τ*_{var} is the persistence time of the major modes of variability of Northern Hemisphere surface air temperature. We approximate it as 1.4 yr, generally reflecting the variability associated with the El Niño–Southern Oscillation (ENSO). In our examples, Σ_{r} = Σ_{s} + Σ_{dn/dt} in Eq. (17c).

The diagonal elements of Σ_{r} take the form of expected squared postfit residuals. Figure 1 shows the diagonal elements of Σ_{s} and Σ_{dn/dt} for finding underlying trends of climate change in central U.S. surface air temperature using 10 yr of Northern Hemisphere surface air temperature data. Internal variability is greatest in the Arctic, with a noticeable contribution from the Pacific Ocean due to ENSO variability. Fingerprint uncertainty is also largest in the Arctic, conveying the general lack of utility of Arctic temperature trends in providing information on lower-latitude temperature trends. Throughout, the contribution of internal variability to Σ_{r} far outweighs the contribution of signal pattern uncertainty.
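The scaling used above to relate the covariance of a 10-yr trend to year-to-year variability can be checked in the uncorrelated limit, where ordinary least squares gives the trend variance exactly as 12σ²/[n(n² − 1)Δt²] for n samples spaced Δt apart. A Monte Carlo sketch with synthetic white noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 10, 1.0                 # 10 annual samples of unit-variance noise
t = np.arange(n, dtype=float)      # Δt = 1 yr
tc = t - t.mean()

# Exact OLS trend variance for white noise: 12 σ² / [n (n² − 1) Δt²]
var_exact = 12 * sigma**2 / (n * (n**2 - 1))

# Monte Carlo: fit trends to many synthetic noise-only records
Y = rng.normal(0.0, sigma, size=(40000, n))
slopes = Y @ tc / (tc @ tc)        # OLS slope of each record
var_mc = slopes.var()
```

Persistence longer than the sampling interval reduces the number of effectively independent samples, which is how the persistence time enters the approximation in the text.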

Computing the inverse of Σ_{r} requires special care. The equation for the optimal fingerprint **f**_{mp} can be written in the following inner product form:

$$\mathbf{f}_{\mathrm{mp}} = \frac{\sum_{\nu=1}^{m} \lambda_{\nu}^{-1}\,\langle \mathbf{e}_{\nu}, \bar{\mathbf{s}} \rangle\,\mathbf{e}_{\nu}}{\sum_{\nu=1}^{m} \lambda_{\nu}^{-1}\,\langle \mathbf{e}_{\nu}, \bar{\mathbf{s}} \rangle^{2}}, \tag{21}$$

where **e**_{ν} and *λ*_{ν} are the *ν*th eigenvector and eigenvalue of matrix Σ_{r}, and 〈··· , ···〉 is an inner product defined by the same rule used to compute the eigenvectors and eigenvalues, 〈Σ_{r}, **e**_{ν}〉 = *λ*_{ν}**e**_{ν}. The summations must be truncated at some finite *m*. The relationship between the fingerprints and optimal fingerprints becomes 〈**f**_{mp}, 𝘀̄〉 = 1.
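A sketch of the truncated inner-product construction, with a synthetic Σ_r and mean fingerprint 𝘀̄, and the Euclidean inner product standing in for whatever weighted product defines the eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(5)
N, m = 50, 20                        # data dimension, truncation

# Synthetic symmetric positive-definite Σ_r and mean fingerprint s̄
A = rng.normal(size=(N, N))
Sigma_r = A @ A.T + N * np.eye(N)
s_bar = rng.normal(size=N)

lam, E = np.linalg.eigh(Sigma_r)     # ascending eigenvalues, eigenvectors in columns
lam, E = lam[::-1], E[:, ::-1]       # reorder by decreasing eigenvalue

# Truncated inner-product form of the optimal fingerprint
proj = E[:, :m].T @ s_bar                               # ⟨e_ν, s̄⟩ for ν = 1..m
f_mp = E[:, :m] @ (proj / lam[:m]) / np.sum(proj**2 / lam[:m])

assert np.isclose(f_mp @ s_bar, 1.0)  # ⟨f_mp, s̄⟩ = 1 at any truncation
```

The normalization 〈f_mp, s̄〉 = 1 holds at any truncation m, because the denominator equals the numerator's projection onto s̄.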

Figure 2 contains plots of the optimal fingerprints **f**_{mp} for the cases of the central United States and Northern Europe. Each is the optimal fingerprint—the set of coefficients used to multiply a decadal-time-scale trend in the field of Northern Hemisphere surface air temperature—to obtain a most probable estimate for a regional average surface air temperature trend associated with SRES A1B forcing. In the case of the central United States, the optimal fingerprint heavily weights toward the central United States itself, which is expected when the historical regional trend contains strong information on the underlying trend associated with climate change. The component of the fingerprint external to the central United States then contains information that reduces the “noise” of internal variability in the central United States. In the case of Northern Europe, the optimal fingerprint has little weight in Northern Europe. Rather, the optimal fingerprint for Northern Europe heavily weights toward temperature trends in the Arctic, unlike the case of the central United States. The optimal fingerprints for both the central United States and Northern Europe, though, are positive throughout most of the Northern Hemisphere, indicating that both regions can be expected to have positive (negative) trends when the Northern Hemisphere also has a positive (negative) trend. Most importantly, the optimal fingerprints differ greatly, depending on how they are normalized, when accounting for fingerprint uncertainty in optimal detection.

Figure 3 contains plots of the actual regional average surface air temperature, *T*_{region}(*t*) and the most probable inference of the component associated with a long-term trend due to climate forcing SRES A1B, *T*_{mp,region}(*t*). A simulated truth dataset is the first 10 yr of output of a CMIP3 model subjected to SRES A1B forcing and not included in the formulation of the optimal fingerprint **f**_{mp}. The most probable “climate” trend is determined by linear regression of *T*_{mp,region}(*t*) over the first 10 yr of the forced model run. The uncertainty in the trend is the uncertainty determined by standard linear regression error analysis in *T*_{mp,region}(*t*) (von Storch and Zwiers 1999). Using data outside the region of interest suppresses interannual internal variability by a factor of 10 in the central United States and by a factor of 7 in Northern Europe.

To demonstrate the near-term climate-forecasting capability of this method, Fig. 3 also shows the evolution of the area-averaged surface air temperature for years 10–20, taken from the same model run that produced the first 10 yr of data. The 10-yr climate prediction is simply an extrapolation of the linear regression of *T*_{mp,region}(*t*), with a 1-standard deviation error envelope. With 10 yr of data, regional average temperature for the central United States and Northern Europe can be projected with an uncertainty of 0.1 K 10 yr into the future. To compute a PDF for a future prediction, one must convolve the PDF for the climate projection with a PDF describing interannual internal variability for that region.

For Figs. 2 and 3 we computed 21 optimal fingerprints, using 21 different truncations, *m* = 20 through *m* = 40, and averaged the resulting fingerprints together. The slope of the *T*_{mp,region}(*t*) time series is almost completely independent of the truncations *m* = 20 through *m* = 40. After averaging optimal fingerprints together, only the most salient features of the optimal fingerprints remain. Features that are averaged out are small in spatial scale and associated with higher-order eigenvectors of Σ_{r}. Those eigenvectors have small eigenvalues *λ*_{ν}, and so they explain little interannual variability in the time series *T*_{mp,region}(*t*).

## 4. Summary

In optimal fingerprinting, it is necessary to account for uncertainty in the spatiotemporal pattern of evolution of externally forced climate signals, but no rule has previously been given for normalizing the signal patterns before deducing their uncertainty. Normalization is required to avoid penalizing models for differences in overall sensitivity rather than in pattern. Because optimal detection can be formulated using Bayesian statistics, and because an ensemble of models hints at multiple levels of inference, we use Bayesian inference to find a rigorous rule for signal normalization. In the process, we determine that four assumptions are necessary to arrive at the standard equations of optimal fingerprinting that account for signal pattern uncertainty. They are as follows:

(i) The prior in signal pattern and sensitivity is separable [cf. Eq. (8)].

(ii) Postfit residuals are due to internal variability and are normally distributed [cf. Eq. (10)]. This assumption has been made explicit elsewhere.

(iii) The prior in sensitivity is uninformative [cf. Eq. (11)].

(iv) A continuum of models produces a normal distribution for signal patterns [cf. Eq. (14)].

The scalar change Δ**α** is nonuniquely defined. When uncertainty in the signal pattern is disregarded, the choice of Δ**α** is irrelevant in optimal detection. When uncertainty in the signal pattern is considered, however, it is first necessary to define Δ**α** according to a specific interest. There are an infinite number of possible definitions of Δ**α**. Once the choice for Δ**α** is made, normalization of signal patterns becomes unique. According to Eqs. (6) and (17d), one must normalize modeled trends of data Δ**g** by modeled trends in Δ**α**. Qualitatively, signal pattern uncertainty is best understood as a quantification of uncertain model physics: depending on what climate response is being sought, the uncertain physics of a climate model may be more or less relevant to the outcome.

In two illustrative examples, we have shown that when one normalizes the trend in the Northern Hemisphere surface air temperature field by a regional surface air temperature trend, the optimal detection result is the most probable estimate of the underlying climate trend in that region associated with a particular external forcing. Information gained from the Northern Hemisphere outside the region of interest serves to dramatically reduce the fluctuations of internal variability within the region of interest.

The illustrative examples point toward regional climate signal inference and near-term climate forecasting as clear applications for this method of analysis. Of special note is the work of Kharin and Zwiers (2002), which aims at a methodology for attributing regional trends to specific external forcings. Their equations are also those of optimal detection, but the philosophical underpinnings of the approach require that it be restricted to data within the region of interest. Our work does address Kharin and Zwiers (2002) in showing how to normalize fingerprints when accounting for model uncertainty, but our work goes beyond the task of attribution. It explains how to handle arbitrary datasets that can extend well beyond the region of interest to arrive at a most probable inference of the underlying climate trends in a particular region.

The method presented here has already been applied to demonstrate how a time series of infrared spectra can be used to place strong constraints on long-wave feedbacks in the tropics (Leroy et al. 2008a). Although the third assumption most likely only inhibits the precision of the results of optimal detection, the invalidity of any of the other assumptions for a particular problem will probably lead to a breakdown of the method. For example, the method seems well suited to regional surface air temperature trends (as in the illustrative examples), but it may not be as well suited to precipitation because of strong relationships between sensitivity and spatial patterns of precipitation change. In that case, regional detection and near-term projection are better accomplished using a fully Bayesian approach, which does not require the previously stated assumptions.

## Acknowledgments

We acknowledge the modeling groups, the Program for Climate Model Diagnosis and Intercomparison (PCMDI), and the WCRP’s Working Group on Coupled Modelling (WGCM) for their roles in making the WCRP CMIP3 multimodel dataset available. Support of this dataset is provided by the Office of Science, U.S. Department of Energy. We wish to thank Richard Goody for many useful conversations on the topic of testing climate models. This work was supported by Grant ATM-0755099 of the National Science Foundation.

## REFERENCES

Allen, M., and S. Tett, 1999: Checking for model consistency in optimal fingerprinting. *Climate Dyn.*, **15**, 419–434.

Allen, M., and P. Stott, 2003: Estimating signal amplitudes in optimal fingerprinting. Part I: Theory. *Climate Dyn.*, **21**, 477–491.

Bell, T., 1986: Theory of optimal weighting to detect climate change. *J. Atmos. Sci.*, **43**, 1694–1710.

Hasselmann, K., 1993: Optimal fingerprints for the detection of time-dependent climate change. *J. Climate*, **6**, 1957–1971.

Hasselmann, K., 1997: Multi-pattern fingerprint method for detection and attribution of climate change. *Climate Dyn.*, **13**, 601–611.

Houghton, J., Y. Ding, D. Griggs, M. Noguer, P. van der Linden, X. Dai, K. Maskell, and C. Johnson, Eds., 2001: *Climate Change 2001: The Scientific Basis*. Cambridge University Press, 881 pp.

Huntingford, C., P. Stott, M. Allen, and F. Lambert, 2006: Incorporating model uncertainty into attribution of observed temperature change. *Geophys. Res. Lett.*, **33**, L05710, doi:10.1029/2005GL024831.

Kharin, V., and F. Zwiers, 2002: Climate predictions with multimodel ensembles. *J. Climate*, **15**, 795–799.

Leroy, S., 1998: Detecting climate signals: Some Bayesian aspects. *J. Climate*, **11**, 640–651.

Leroy, S., J. Anderson, J. Dykema, and R. Goody, 2008a: Testing climate models using thermal infrared spectra. *J. Climate*, **21**, 1863–1875.

Leroy, S., J. Anderson, and G. Ohring, 2008b: Climate signal detection times and constraints on climate benchmark accuracy requirements. *J. Climate*, **21**, 841–846.

North, G., K. Kim, S. Shen, and J. Hardin, 1995: Detection of forced climate signals. Part I: Filter theory. *J. Climate*, **8**, 401–408.

Sivia, D., 2006: *Data Analysis: A Bayesian Tutorial*. Oxford University Press, 246 pp.

von Storch, H., and F. Zwiers, 1999: *Statistical Analysis in Climate Research*. Cambridge University Press, 484 pp.