## Abstract

Observed 1961–2000 annual extreme temperatures, namely annual maximum daily maximum (TXx) and minimum (TNx) temperatures and annual minimum daily maximum (TXn) and minimum (TNn) temperatures, are compared with those from climate simulations of multiple model ensembles with historical anthropogenic (ANT) forcing and with combined anthropogenic and natural external forcings (ALL) at both global and regional scales using a technique that allows changes in long return period extreme temperatures to be inferred. Generalized extreme value (GEV) distributions are fitted to the observed extreme temperatures using a time-evolving pattern of location parameters obtained from model-simulated extreme temperatures under ANT or ALL forcing. Evaluation of the parameters of the fitted GEV distributions shows that both ANT and ALL influence can be detected in TNx, TNn, TXn, and TXx at the global scale over the land areas for which there are observations, and also regionally over many large land areas, with detection in more regions in TNx. Therefore, it is concluded that anthropogenic forcing has had a detectable influence on extreme temperatures that have impacts on human society and natural systems at global and regional scales. External influence is estimated to have resulted in large changes in the likelihood of extreme annual maximum and minimum daily temperatures. Globally, waiting times for extreme annual minimum daily minimum and daily maximum temperature events that were expected to recur once every 20 yr in the 1960s are now estimated to exceed 35 and 30 yr, respectively. In contrast, waiting times for circa 1960s 20-yr extremes of annual maximum daily minimum and daily maximum temperatures are estimated to have decreased to fewer than 10 and 15 yr, respectively.

## 1. Introduction

Most of the observed increase in global average temperature since the midtwentieth century is very likely due to greenhouse gas increases (Solomon et al. 2007). Human influence on temperature is now detected at continental, subcontinental, and even regional scales (Karoly et al. 2003; Karoly and Stott 2006; Stott 2003; Zwiers and Zhang 2003; Zhang et al. 2006; Stott et al. 2010). As changes in climate extremes are more strongly associated with changes in natural and human systems than are changes in the climatic mean state (e.g., Parmesan et al. 2000; Parmesan and Martens 2009), demonstration of human influence or lack of it on climate extremes is of great importance for assessing social impacts and for the development of adaptation strategies to climate change. Extreme warm days and nights, for example, have been associated with excess mortality in humans (Pirard et al. 2005), and the warming of extremely cold winter nights has been associated with the winter survival of pests, such as forest beetles in western North America (Peterson et al. 2008).

Solomon et al. (2007) concluded that “confidence has increased that some extremes will become more frequent, more widespread and/or more intense during the 21st century.” This conclusion is supported by evidence from the observations, model simulations, and the comparison of observations and model simulations. Analyses of model-simulated extremes suggest that extreme high temperatures are likely to be higher and extreme low temperatures are likely to be less extreme in the future (Kharin and Zwiers 2000, 2005; Kharin et al. 2007). By comparing annual extreme daily temperatures simulated by two general circulation models (GCMs), Hegerl et al. (2004) showed that the signal-to-noise ratio for changes in climate-model-simulated extreme temperatures on broad spatial scales is nearly as large as that for mean temperature, indicating that it may now be possible to detect anthropogenic influence in extreme temperatures. They showed that projected changes in extreme temperatures are influenced by the changes in both the mean and the shape of seasonal temperature distribution.

Indeed, previous studies have detected human influence on some types of temperature extremes at global scales. Christidis et al. (2005) compared long-term variations in observed and model-simulated extreme temperatures for the *N* warmest days and nights of the year (*N* = 30, 10, 5, 1) over the global land area, and they found significant human influence on patterns of change in extremely warm nights. They also detected human influence on cold nights and days, though less robustly. Shiogama et al. (2006) also used an optimal detection method to compare changes in the annual extremes of daily maximum and minimum temperature with those simulated by the Model for Interdisciplinary Research on Climate 3.2, medium-resolution version [MIROC3.2(medres)]. They showed results similar to those of Christidis et al. (2005); however, there was a small difference in scaling parameters, reflecting differences in the climate model being used. Nevertheless, evidence of human activities on temperature extremes at regional scales, which is of strong relevance to climate change adaptation, is still lacking. Also, with limited exceptions, existing studies have not been able to assess changes in rare, high impact, long return period events.

In this study, we consider the causes of observed extreme temperature changes at regional scales over land, using regions (Fig. 1) similar to those defined by Giorgi and Francisco (2000). We use a single-step attribution technique (Hegerl et al. 2010) that exploits the statistical properties of extremes and, as a by-product, allows us to directly estimate changes in the expected return periods of long return period events. This contrasts with multistep attribution approaches, such as that used in Stott et al. (2004) to estimate the anthropogenic influence on the likelihood of a summer as warm as that which was observed in Europe in 2003.

## 2. Model output and observational data

The model output used in this study was obtained from the Coupled Model Intercomparison Project phase 3 (CMIP3) archive at the Program for Climate Model Diagnosis and Intercomparison (PCMDI) Web site. Ensemble simulations from seven GCMs provide daily values of maximum and minimum temperatures for 1961–2000. These models include MIROC3.2(medres), ECHAM and the global Hamburg Ocean Primitive Equation (ECHO-G), and the Meteorological Research Institute Coupled General Circulation Model, version 2 (MRI CGCM2), for which 3, 3, and 5 runs, respectively, are available under a combination of both anthropogenic (ANT) and natural forcings (ALL). In addition, the Canadian Centre for Climate Modelling and Analysis Coupled General Circulation Model, version 3 [CGCM3 (T47)], Commonwealth Scientific and Industrial Research Organisation Mark version 3.0 (CSIRO Mk3.0), CSIRO Mk3.5, and Institute of Atmospheric Physics Flexible Global Ocean–Atmosphere–Land System Model gridpoint (IAP FGOALS) models have available 5, 3, 3, and 3 runs, respectively, under ANT forcing. We also obtained daily values from the CGCM3 (Scinocca et al. 2008) runs for 1951–60 and used these longer runs to examine the influence of sample size on the detection analysis. Annual extreme temperature values were extracted from daily values on the model grid for each year and for each simulation. These values were used to derive extreme temperature responses to ANT and ALL forcings.

The observations are station annual maximum daily maximum (TXx) and minimum (TNx) temperatures and annual minimum daily maximum (TXn) and minimum (TNn) temperatures compiled as a part of an internationally coordinated effort to develop a suite of climate indices (Alexander et al. 2006). The data cover global land areas, mainly over the Northern Hemisphere and Australia, for the period 1946–2000. They are available from the Web site of the Expert Team on Climate Change Detection and Indices (ETCCDI; available online at http://cccma.seos.uvic.ca/ETCCDI), which is a joint activity of the World Meteorological Organization (WMO) Commission for Climatology (CCl), the World Climate Research Programme project on Climate Variability and Predictability (CLIVAR), and the Joint WMO–IOC (Intergovernmental Oceanographic Commission) Technical Commission for Oceanography and Marine Meteorology (JCOMM). Observed annual extremes were calculated on the CGCM3 (T47) model grid (approximately 3.75° × 3.75°) by averaging anomalies of annual extremes relative to their 1961–90 climatologies from available station observations within grid boxes.

Comparison between observed and model-simulated extreme temperatures was conducted on the model grid of CGCM3 over large regions and over the global land area where observational data were available. Using a strict criterion for missing data (station annual temperature extremes are set as missing for a year if 15 or more days of observations at that station in that year are missing), a total of 482 grid boxes are available that have at least 40 of the 50 possible annual extreme values for the 50-yr period (1951–2000) for each of the four types of temperature extremes, including 228 grid boxes with more than three stations. Our assessment was based on gridbox means of annual extremes observed at stations. Extreme temperatures have relatively large spatial covariance structures, as is indicated by the fact that the median value of correlation coefficients between station pairs remains statistically significant for separation distances of up to approximately 2000 km. Thus, even individual stations should represent gridbox mean extremes well, at scales of variation comparable to those simulated by climate models.
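The data screening and gridding described above can be illustrated with a minimal sketch. The function names and data layout (lists of daily values with `None` for missing days, dictionaries keyed by year) are hypothetical choices made for illustration, and the 40-yr availability threshold is applied here to a single type of extreme.

```python
from statistics import mean

def annual_extreme(daily, kind="max", max_missing=14):
    """Annual extreme of one station-year of daily values (None marks a gap).

    The year is set to missing (None) when 15 or more days are missing.
    """
    present = [v for v in daily if v is not None]
    if len(daily) - len(present) > max_missing or not present:
        return None
    return max(present) if kind == "max" else min(present)

def station_anomalies(extremes_by_year, base=(1961, 1990)):
    """Anomalies of annual extremes relative to the station's base-period mean."""
    base_vals = [v for y, v in extremes_by_year.items()
                 if base[0] <= y <= base[1] and v is not None]
    if not base_vals:
        return {}
    clim = mean(base_vals)
    return {y: v - clim for y, v in extremes_by_year.items() if v is not None}

def gridbox_mean(stations, min_years=40):
    """Average station anomalies year by year within one grid box.

    Returns None when fewer than min_years annual values are available.
    """
    years = sorted({y for s in stations for y in s})
    box = {y: mean(s[y] for s in stations if y in s) for y in years}
    return box if len(box) >= min_years else None
```

Averaging anomalies rather than absolute values keeps stations with different elevations or exposures from dominating the gridbox mean.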

## 3. Methods

A detection and attribution analysis typically involves the comparison of observed climate data with model-simulated signals using a regression-based method (Allen and Stott 2003, and references therein). The standard detection method assumes that regression residuals follow a Gaussian distribution. This assumption becomes problematic when dealing with extreme values since they tend to have a skewed distribution. In this paper, we describe a method that takes the underlying distribution of extreme temperatures into consideration. We represent the behavior of observed annual extreme temperatures with a generalized extreme value (GEV) distribution (Smith 1989) that has the following probability density function:

Parameters *ξ*, *μ*, and *σ* (>0) are termed the shape, location, and scale parameters, respectively. When *ξ* = 0, the distribution is also known as the Gumbel distribution (Gumbel 1958).
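For reference, one standard way to write the GEV density in terms of these three parameters is the parameterization of Coles (2001). It is assumed here for illustration, since sign conventions for the shape parameter differ between authors.

```latex
f(x;\mu,\sigma,\xi) =
\begin{cases}
\dfrac{1}{\sigma}\, t(x)^{-1/\xi-1}\exp\!\left[-t(x)^{-1/\xi}\right], & \xi \neq 0,\ t(x) > 0,\\[2ex]
\dfrac{1}{\sigma}\exp\!\left[-\dfrac{x-\mu}{\sigma}-\exp\!\left(-\dfrac{x-\mu}{\sigma}\right)\right], & \xi = 0,
\end{cases}
\qquad
t(x) = 1+\xi\,\frac{x-\mu}{\sigma}.
```

In this form a negative *ξ* gives a distribution with a finite upper bound at *μ* − *σ*/*ξ*.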

One or more of the parameters of the GEV distribution may be treated as a function of covariates that alter the characteristics of extremes (e.g., Kharin and Zwiers 2005; Zhang et al. 2010). For the purpose of a detection and attribution analysis, parameters *ξ*, *μ*, and *σ* can be defined as functions of a climate change signal estimated from climate model simulations, and statistical inference techniques can then be used to determine whether the observations provide evidence for the presence of that signal. Specifically, we assume observed annual extreme temperatures in individual grid boxes follow GEV distributions with location parameters that are influenced by the response to anthropogenic forcing. We further assume that the scale and shape parameters are unaffected by anthropogenic forcing. We recognize that this is a strong assumption that may not hold consistently everywhere. For example, one might anticipate that the shape parameter might change for extreme annual minimum temperatures in regions where the snow line moves, and for extreme annual maximum temperatures in semiarid regions where soil moisture changes. Nevertheless, this assumption keeps the problem of simultaneously fitting a large number of GEV distributions tractable, and it is consistent with the findings of previous research, which has demonstrated that the response in scale and shape parameters is weaker than in the location parameter, both in twentieth-century observations (Brown et al. 2008) and even under strong forcing in the twenty-first century (Kharin and Zwiers 2005). Details of our detection and attribution method are outlined below.

### a. Signal estimation from ensemble simulations

As noted above, we assume that the first-order influence of anthropogenic forcing on gridbox extreme temperatures (TNn, TNx, TXn, TXx) is on the location parameter and that scale and shape parameters are not affected by external forcing. Because of a limited number of ensemble members, we further assume that changes in the location parameter within a 10-yr block are small, thereby allowing us to represent the signal in the extremes through decade-by-decade changes in the location parameter. This corresponds to the common practice of representing the evolution of the signal in the climatic mean as a sequence of decadal ensemble mean anomalies. We therefore estimate the signal in the location parameter due to external forcing for individual models separately from the available *M*-member ensemble of climate simulations with that model, by treating the 10*M* annual extreme temperatures from the *M*-member ensemble simulations within a given 10-yr period as coming from the same distribution. Decades are defined as 1951–60 to 1991–2000 for CGCM3 and as 1961–70 to 1991–2000 for CMIP3 models.

The signal for a given model is estimated separately for individual grid boxes. Let *y _{il}* be annual extreme temperatures (being one of TNn, TNx, TXn, TXx), where *i* = 1, … , *N* represents the *i*th decade corresponding to years 1951–60, 1961–70, … , 1991–2000 (1961–70, … , 1991–2000 for CMIP3 models), respectively; and *l* runs from 1 to 10*M*, representing the 10*M* annual extremes that are available from the *M*-member ensembles in the *i*th decade. We assume, for the moment, that annual extremes {*y _{il}*, *i* = 1, … , *N*, *l* = 1, … , 10*M*} for a given grid box are independent of each other. For model-simulated extreme temperature at a given grid point, we fit a GEV distribution as described in (1) with the following set of parameters:

where *i* = 1, 2, … , *N* corresponds to the *N* decades. The parameters can be estimated by maximizing the likelihood function, which is given by

or equivalently, by minimizing the joint negative log-likelihood function

where {*y _{il}*, *l* = 1, … , 10*M*} is the collection of 10*M* simulated annual temperature extremes at that grid box in decade *i*.
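Written out, the decade-specific parameter set and the two objective functions described in this subsection might take the following form. This is a reconstruction from the surrounding definitions under the Coles (2001) form of the GEV density *f*, valid for *ξ* ≠ 0; the use of unsubscripted *σ* and *ξ* for the shared scale and shape is an assumption.

```latex
\begin{aligned}
&\mu = \mu_i,\qquad \sigma = \sigma,\qquad \xi = \xi,\qquad i = 1,\dots,N,\\[1ex]
&L(\mu_1,\dots,\mu_N,\sigma,\xi) = \prod_{i=1}^{N}\prod_{l=1}^{10M} f(y_{il};\mu_i,\sigma,\xi),\\[1ex]
&\mathrm{nllh} = -\ln L = \sum_{i=1}^{N}\sum_{l=1}^{10M}
\left[\ln\sigma + \left(1+\frac{1}{\xi}\right)\ln t_{il} + t_{il}^{-1/\xi}\right],
\qquad t_{il} = 1+\xi\,\frac{y_{il}-\mu_i}{\sigma} > 0 .
\end{aligned}
```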

The *N* time-evolving location parameter estimates are used to represent model-simulated changes in the annual extreme temperature distributions in the subsequent analysis of observed extremes. This process is repeated for each of the four extreme temperatures and for each model and land grid box of that model separately. The estimated location parameters are then interpolated to the CGCM3 grid. Location parameters from the ALL simulations are averaged to represent extreme temperature responses (signals) to ALL forcing, and those from ANT simulations to ANT forcing.
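As an illustration of the signal-estimation step, the negative log likelihood for one grid box with decade-specific location parameters and shared scale and shape can be sketched as below. The function name and argument layout are hypothetical, and the Coles (2001) parameterization of the GEV is assumed. In practice this function would be handed to a numerical optimizer to obtain the *N* location estimates.

```python
import math

def gev_nllh_decadal(y_by_decade, mu_by_decade, sigma, xi):
    """Negative log likelihood of pooled annual extremes at one grid box.

    y_by_decade[i] holds the 10*M simulated annual extremes of decade i,
    mu_by_decade[i] is that decade's location parameter, and sigma and xi
    are shared by all decades. Returns math.inf outside the GEV support.
    """
    if sigma <= 0:
        return math.inf
    total = 0.0
    for ys, mu in zip(y_by_decade, mu_by_decade):
        for y in ys:
            z = (y - mu) / sigma
            if abs(xi) < 1e-8:
                # Gumbel limit of the GEV as the shape tends to zero
                total += math.log(sigma) + z + math.exp(-z)
            else:
                t = 1.0 + xi * z
                if t <= 0:
                    return math.inf
                total += (math.log(sigma)
                          + (1.0 + 1.0 / xi) * math.log(t)
                          + t ** (-1.0 / xi))
    return total
```

Treating all 10*M* values within a decade as draws from one distribution is what allows a small ensemble to constrain the decadal location parameter.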

### b. Detection analysis

A detection analysis is conducted by fitting GEV distributions to observed extreme temperatures, with location parameters proportional to the estimated signal. A common proportionality coefficient (or scaling factor) is used for all boxes for the region under consideration, but other parameters of the distribution are allowed to vary between grid boxes. The scaling factor is obtained by using a profile likelihood method as described below.

Let *X _{jk}* represent the annual extreme temperature observations during the past *N* decades over a region of *K* grid points, where *j* = 1, 2, … , 10*N* years and *k* = 1, 2, … , *K* grid boxes. Assuming the GEV distribution and that the signal only appears in the location parameters, for each gridpoint *k*, we set the parameters of the GEV distribution of (1) to

where Δ*μ̂ _{jk}* is the estimated signal relative to its 1961–70 value at gridpoint *k* in year *j*, and *β* is the scaling factor common for the whole region under consideration. Since it was assumed that changes in the signal within a decade are small, there are only *N* distinctive Δ*μ̂ _{jk}* values, one for each of the *N* decades. Because we use a common scaling factor for all *K* grid boxes, the parameters for all grid boxes in the region must be estimated jointly. Assuming, for the moment, independence between grid boxes, the joint negative log likelihood for the GEV density function at *K* grid boxes is given by

where nllh* _{k}* is the negative log-likelihood function at grid box *k*,

and maximum likelihood estimates of parameters *β*, *μ _{k}*, *ξ _{k}*, and *σ _{k}*, where *k* = 1, 2, … , *K*, are obtained by minimizing nllh. As the number of parameters to be estimated is large, we use the profile likelihood method to avoid the estimation of the joint likelihood over all grid points in the region. We first set *β* = *β _{c}* in (4) and minimize the negative log likelihood for each gridpoint *k* conditional on *β* = *β _{c}*. We then searched over *β _{c}* values to obtain an estimate of *β* that minimizes nllh. This provides an estimate of the best scaling factor for the region, although this value is not necessarily optimal for every individual grid point. We varied *β _{c}* from −4.5 to 4.5 with an increment of 0.03. This range was determined by considering the smallest and largest values of the scaling factors that were separately estimated for individual grid boxes over the global land area.

Note that the procedure above is not “optimal” in the sense that our estimate of the scaling factor has not taken into account the space–time covariance structure of the observed extremes to maximize the signal-to-noise ratio, as is typical in optimal detection analyses (see, e.g., Allen and Stott 2003; Zwiers 1999). This would be very difficult to do given that extreme value theory does not yet provide a convenient way to take spatial or temporal dependence into account when analyzing time series of fields of extreme values, such as we are doing here.
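The profile likelihood search over candidate scaling factors can be sketched as follows, taking the location at grid box *k* in year *j* to be a box-specific constant plus *β* times the signal anomaly. To keep the sketch free of external dependencies, the shape parameter is fixed at zero (a Gumbel distribution) rather than estimated as in the full GEV fit, so this is a simplified stand-in and not the procedure used in the analysis. All function names are hypothetical, and the golden-section search is one possible choice of one-dimensional minimizer.

```python
import math
import random

def gumbel_profile_nllh(z, iterations=50):
    """Minimum Gumbel negative log likelihood of sample z over (mu, sigma).

    For fixed sigma the best mu is available in closed form, so only sigma
    is searched, by golden-section search on a bracket tied to the spread.
    """
    n = len(z)
    zbar = sum(z) / n
    spread = max(z) - min(z)
    if spread == 0:
        raise ValueError("sample has no spread")

    def nllh(sigma):
        a = [-v / sigma for v in z]
        m = max(a)
        # mu_hat = -sigma * log(mean(exp(-z / sigma))), evaluated stably
        mu = -sigma * (m + math.log(sum(math.exp(v - m) for v in a) / n))
        # at mu_hat the exponential terms of the likelihood sum to n
        return n * math.log(sigma) + n * (zbar - mu) / sigma + n

    lo, hi = 0.01 * spread, 2.0 * spread
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if nllh(c) < nllh(d):
            hi = d
        else:
            lo = c
    return nllh(0.5 * (lo + hi))

def profile_beta(obs, signal, betas):
    """Candidate scaling factor with the smallest summed negative log likelihood.

    obs[k][j] and signal[k][j] are the observed extreme and the signal
    anomaly at grid box k in year j. For each candidate, the per-box
    parameters are fitted separately and the box likelihoods are summed.
    """
    best_beta, best_nllh = None, math.inf
    for b in betas:
        total = sum(
            gumbel_profile_nllh([x - b * s for x, s in zip(xk, sk)])
            for xk, sk in zip(obs, signal)
        )
        if total < best_nllh:
            best_beta, best_nllh = b, total
    return best_beta

def gumbel_sample(mu, sigma, rng=random):
    """Inverse-CDF draw from a Gumbel distribution (for synthetic tests only)."""
    return mu - sigma * math.log(-math.log(rng.uniform(1e-9, 1.0 - 1e-9)))

# Candidate values from -4.5 to 4.5 in steps of 0.03, as in the text.
BETAS = [round(-4.5 + 0.03 * i, 2) for i in range(301)]
```

Because the per-box fits are independent once the candidate scaling factor is fixed, the cost grows linearly with the number of grid boxes instead of requiring one optimization over all parameters at once.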

Nevertheless, it is possible to take temporal and spatial dependence into account when estimating the uncertainty of the scaling factor estimate derived above; we do this by using an appropriately constructed block bootstrap procedure to infer a confidence interval for the scaling factor. The objective is to estimate the uncertainty in the scaling factor that results from natural internal variability in the climate system and from uncertainty in signal estimates, taking into account temporal and spatial dependence between extremes. Generally, in detection and attribution analyses, estimates of uncertainty due to natural internal variability are obtained from long control simulations; however, that is not practically possible in this case given that daily output from a long control simulation was not available. Consequently, it was necessary to use a block bootstrap resampling procedure that attempts to retain as much of the temporal and spatial dependence structure as possible.

This procedure involves the following steps:

1) Residual series are obtained by subtracting the scaled signal from the observed extreme values.

2) To consider interannual variability, we divide the *N* decades into 2*N* 5-yr nonoverlapping blocks (e.g., 1951–55, 1956–60, … , 1996–2000 for CGCM3) and randomly reorder the sequence of 5-yr blocks. The spatial covariance structure is retained by applying the same reordering of 5-yr blocks to all grid points. The residual series is then resampled according to the sequence of the years that results.

3) Bootstrapped data are finally obtained by adding the scaled signal back to the resampled residuals, and a scaling factor is again estimated for the analyzed region by maximum likelihood, as described previously.
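The resampling of residuals described in these steps can be sketched as follows. The function names and the nested-list data layout (grid box first, year second) are hypothetical; the reestimation of the scaling factor in the last step is left to whatever fitting routine is in use.

```python
import random

def block_permutation(n_years, block=5, rng=random):
    """Year indices after randomly reordering consecutive nonoverlapping blocks."""
    starts = list(range(0, n_years, block))
    rng.shuffle(starts)
    return [j for s in starts for j in range(s, min(s + block, n_years))]

def bootstrap_observations(obs, scaled_signal, order):
    """Resample residuals with one shared year order and add the signal back.

    obs[k][j] and scaled_signal[k][j] are indexed by grid box k and year j.
    Using the same order for every grid box keeps the spatial covariance
    of the residuals intact.
    """
    out = []
    for xk, sk in zip(obs, scaled_signal):
        resid = [x - s for x, s in zip(xk, sk)]
        out.append([resid[src] + sk[j] for j, src in enumerate(order)])
    return out
```

A single call to `block_permutation` per bootstrap replicate is shared across all grid boxes in the region; drawing a separate permutation for each box would destroy the spatial dependence that the procedure is meant to preserve.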

Repeating this procedure a number of times would produce a sampling distribution on the estimated scaling factor that accounts for the effects of internal variability. However, we also wanted to account for the effects of signal uncertainty that arise from climate-model-simulated internal variability.

To account for uncertainty in the signal estimates, we use a further bootstrap resampling procedure that includes the following steps:

(i) For each GCM ensemble, 10*M* extreme values are sampled with replacement in 5-yr blocks from model-simulated values for each of the 10-yr blocks, and new signals are estimated from the resampled data.

(ii) Signals for models with ALL forcing and for models with ANT forcing are averaged separately.

(iii) A scaling factor that best fits the new signals to the observations is obtained for each region.

(iv) We then applied steps (i)–(iii) 32 times to account for the influence of natural internal variability conditional upon the signal estimate from step (i).

The steps (i)–(iv) are repeated 32 times, thereby providing a total of 1024 scaling factor estimates, the range of which accounts for sampling uncertainty, including temporal dependence on time scales up to 5 yr, in both the observations and the signals. Since the resampling procedure retains the spatial dependence structure of the observations and the model output (residuals and model output are not resampled in space), the bootstrapped sample of scaling factor estimates also takes spatial dependence into account. The 5th and 95th percentiles of these scaling factors give an approximate 90% confidence interval for the scaling factor. We claim detection if this 90% confidence interval lies above zero.
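The nesting of the two resampling levels and the detection rule can be sketched as below. The three callables passed to `nested_bootstrap` are hypothetical placeholders for the signal resampling, the observation resampling, and the scaling factor estimation described above; the percentile convention is that of Python's `statistics.quantiles`, which is an implementation choice rather than something specified in the text.

```python
from statistics import quantiles

def nested_bootstrap(estimate_beta, resample_signal, resample_obs,
                     n_outer=32, n_inner=32):
    """Collect n_outer * n_inner scaling factor estimates.

    The outer loop redraws the signal; the inner loop redraws the
    observations conditional on that signal.
    """
    betas = []
    for _ in range(n_outer):
        signal = resample_signal()
        for _ in range(n_inner):
            betas.append(estimate_beta(resample_obs(signal), signal))
    return betas

def scaling_factor_interval(betas):
    """5th and 95th percentiles of the bootstrapped scaling factors."""
    cuts = quantiles(betas, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]

def is_detected(betas):
    """Detection is claimed when the whole 90% interval lies above zero."""
    lower, _ = scaling_factor_interval(betas)
    return lower > 0
```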

A concern is that our approach for estimating internal variability may not adequately account for low-frequency internal variability at time scales longer than 5 yr, such as that which might be associated with, for example, the Atlantic multidecadal oscillation (Enfield et al. 2001). Using long control simulations to estimate internal variability would alleviate this concern, assuming that models correctly represent the relationships between low-frequency ocean variability and the responses of land surface air temperature extremes to that variability. Since long simulations with daily sampling were not available to us, we were unable to take this more standard approach to estimating internal variability. However, potential predictability studies with models, which assess the presence of “excess” low-frequency variability, suggest that potential predictability in ocean areas extends only weakly into land areas (e.g., Boer 2004; Boer and Lambert 2008) for both surface temperature and precipitation, although other research (e.g., Zhang et al. 2010) does indicate that variability on the ENSO time scale, which is encompassed within our block bootstrap resampling scheme, can significantly affect precipitation (and presumably temperature) extremes. Furthermore, our goodness-of-fit tests, discussed below, do not indicate that there is a serious problem with misfit of residual variability or that misfit occurs preferentially in regions that might be expected to be influenced by low-frequency ocean variability. Therefore, while we recognize that caution is merited, our judgment is that internal variability is not seriously underestimated. With the exception of TXx, our detection findings remain robust even if our estimates of internal variability, which are reflected in the lengths of scaling factor confidence intervals, were substantially increased.

### c. Goodness-of-fit test

It is not easy to directly test the goodness of fit of the nonstationary GEV distribution. However, the nonstationary component in the extreme temperatures can be removed by transforming the observed extreme temperatures into standardized variables, as was done in Zhang et al. (2010). After the transformation, the tests of goodness of fit for the nonstationary GEV distribution are equivalent to tests of the goodness of fit of stationary Gumbel distributions to the transformed data (Coles 2001). We use a parametric bootstrap Kolmogorov–Smirnov (K–S) test (Kharin and Zwiers 2005) to examine the goodness of fit for a Gumbel distribution to the transformed data.
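The standardization step can be illustrated with a short sketch, assuming the transformation given by Coles (2001) for the parameterization in which a fitted GEV value maps to a standard Gumbel value. The function names are hypothetical. Only the K–S distance is computed here; the parametric bootstrap used to obtain its null distribution, which requires refitting the model to each simulated sample, is omitted.

```python
import math
import random

def to_standard_gumbel(x, mu, sigma, xi):
    """Map a GEV(mu, sigma, xi) value to a standard Gumbel value."""
    if abs(xi) < 1e-8:
        return (x - mu) / sigma
    return math.log(1.0 + xi * (x - mu) / sigma) / xi

def ks_statistic_gumbel(z):
    """Kolmogorov-Smirnov distance between sample z and the standard Gumbel CDF."""
    z = sorted(z)
    n = len(z)
    d = 0.0
    for i, v in enumerate(z):
        cdf = math.exp(-math.exp(-v))
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def gev_sample(mu, sigma, xi, rng=random):
    """Inverse-CDF draw from a GEV with nonzero shape (for synthetic tests only)."""
    u = rng.uniform(1e-9, 1.0 - 1e-9)
    return mu + sigma * ((-math.log(u)) ** (-xi) - 1.0) / xi
```

Because the time-varying location is removed by the transformation, a single test against one reference distribution can be applied to the whole record at a grid box.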

### d. Extreme temperature changes attributable to external forcing

To estimate possible changes in the likelihood of extreme temperatures attributable to external forcing, we estimate the 1990s *waiting time* of the 1960s climate 20-yr return values of extreme temperature from the nonstationary GEV distribution that we have fitted to the observations. Waiting time for an extreme of a given size is defined as one over the probability of occurrence of an annual extreme at least as extreme as the one that is specified. Waiting times are constant in a stationary climate: a value more extreme than the 20-yr return value is expected to recur once every 20 yr on average. In contrast, waiting times have an instantaneous interpretation in a nonstationary climate that is specific to the date for which the waiting time is determined; they indicate the instantaneous rate at which the specified threshold is likely to be exceeded. A circa 1990s waiting time can therefore be interpreted as the projected rate at which the reference value would be exceeded, assuming that the climate remains stationary in its 1990s state. To this end, we first estimate the relevant 20-yr return values using GEV parameters corresponding to 1960s climate. We then estimate the probability that the annual extreme temperature will lie below (for annual minimum temperatures) or above (for annual maximum temperatures) the relevant 20-yr return values in the 1990s climate as predicted with the influence of external forcing. These probabilities are weighted by the size of grid boxes and are averaged for each region separately. The inverse of the regionally averaged probability provides an estimate of the average waiting time (or return period) in the 1990s for the 1960s climate 20-yr return values. The departure of this waiting time from 20 yr provides a measure of the change in the likelihood of extremes that may be attributable to external forcing (Stone and Allen 2005).
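The waiting time calculation for annual maxima can be sketched as follows, assuming the Coles (2001) form of the GEV and a change between the 1960s and 1990s that affects only the location parameter. The function names and the tuple layout for grid boxes are hypothetical. For annual minima the same functions could be applied to the negated series.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV cumulative distribution function."""
    if abs(xi) < 1e-8:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0:
        # beyond the upper bound (xi < 0) or below the lower bound (xi > 0)
        return 1.0 if xi < 0 else 0.0
    return math.exp(-t ** (-1.0 / xi))

def return_value(period, mu, sigma, xi):
    """Value exceeded with probability 1/period in any one year."""
    yp = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-8:
        return mu - sigma * math.log(yp)
    return mu - sigma / xi * (1.0 - yp ** (-xi))

def regional_waiting_time(boxes, period=20.0):
    """Waiting time in the later climate for the earlier climate's return value.

    boxes is an iterable of (weight, mu_early, mu_late, sigma, xi), one
    tuple per grid box, with weight proportional to gridbox area.
    """
    num = den = 0.0
    for w, mu0, mu1, sigma, xi in boxes:
        threshold = return_value(period, mu0, sigma, xi)
        num += w * (1.0 - gev_cdf(threshold, mu1, sigma, xi))
        den += w
    return math.inf if num == 0 else den / num
```

Averaging exceedance probabilities before inverting, instead of averaging gridbox waiting times, prevents a few boxes with very small probabilities from dominating the regional value.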

## 4. Results

Detection results are summarized in Fig. 1. The influence of both ANT and ALL forcings is detected globally in all annual extreme temperatures, including TXx, TXn, TNx, and TNn. The ANT and ALL signals are not detected in TXx in almost all subcontinent regions, with the exception of southern Africa (SAF) and Central America (CAM), and ANT in southern Asia (SAS). However, both ANT and ALL signals are detected in TNx in all subcontinent regions except for ANT in central North America (CNA). Both ANT and ALL signals are detected in most subcontinent regions for TXn and TNn.

Detection results as shown in Fig. 1 clearly indicate that the most detectable influence of external forcings is on annual maximum nighttime low temperatures, and that the least detectable influence is on annual maximum daytime high temperatures. Note that the former results are robust to large increases in the estimates of internal variability. The 90% confidence intervals for scaling factors in regions where the ANT and/or ALL signal is detected generally include one as a plausible scaling factor, suggesting that the magnitudes of model-simulated changes in extremes are comparable to those observed. There are a few exceptions, however. Model-simulated responses to ALL and ANT in global TXx are larger than observed, and the simulated responses in TNn are smaller than observed.

As the daily minimum temperature usually occurs at night and the daily maximum temperature during the day, our detection results indicate that there is a more extensive human influence on the annual minima of nighttime temperature than on the annual maxima of daytime temperature. The reasons for this discrepancy are not well understood, but we speculate that this may be due to differences in the physical mechanisms that control extreme nighttime and daytime temperatures, with the former being constrained largely by the efficiency with which the surface can cool to space and the latter being influenced by both radiative considerations and the local balance between latent and sensible heat production. The fact that models appear to oversimulate the response to anthropogenic forcing in TXx and undersimulate that in TNn suggests that there may be difficulties in simulating the surface energy balance or controlling factors, such as soil moisture, surface albedo, the planetary boundary layer, or low-level clouds.

The results of our detection analysis based on CGCM3 simulations and for the period 1951–2000 are plotted in Fig. 2. Scaling factors are in general smaller than unity, suggesting that CGCM3 simulates larger changes in extreme temperature than observed. This is consistent with the fact that CGCM3 warms too much in the mean. The scaling factor confidence intervals are noticeably smaller than the above multiple model results for 1961–2000. As the difference may come from the difference in sample size and from the difference in the simulations being used, we also conducted detection analysis using CGCM3 simulations for the shorter 1961–2000 period. Results indicate that the confidence intervals for the shorter 1961–2000 period are typically about 1.3 times as wide as those for the longer 1951–2000 period. This suggests that the detection results would benefit from the use of longer samples. The confidence intervals we computed, which involve consideration of the effects of both the internal variability and signal uncertainty, are wider than those computed directly from the profile likelihood (Coles 2001). This is because our approach accounts for the effects of spatial and temporal covariance structure as well as uncertainty in signal estimates, while the profile likelihood–based estimates assume that data are spatially and temporally white, and that signals are known perfectly.

Because different regions have different scaling factors, goodness of fit was assessed for the different temperature extremes, regions, and signals separately. Results of the goodness-of-fit test (Table 1) show that the GEV is an appropriate distribution for extreme temperatures in general. The grid boxes at which the GEV is rejected are distributed quite randomly in general. The rate at which GEV fit is rejected is generally not field significant at the 5% level, even if temperature extremes are assumed to be spatially independent. There are only a few exceptions: there is some evidence that the GEV does not fit well for TNn in SAF for ALL, for TXn in Alaska (ALA) for ANT, and rejection rates were a bit higher overall when CGCM3 data were used to estimate the ANT signal.
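One simple way to judge whether a count of gridbox rejections is field significant under the assumption of spatial independence is a binomial tail probability, sketched below. This is an illustrative assumption about how such a check could be made, not a description of the calculation behind Table 1, and the function names are hypothetical.

```python
from math import comb

def binomial_tail(n_reject, n_boxes, alpha=0.05):
    """P(at least n_reject rejections among n_boxes independent tests at level alpha)."""
    return sum(
        comb(n_boxes, r) * alpha ** r * (1.0 - alpha) ** (n_boxes - r)
        for r in range(n_reject, n_boxes + 1)
    )

def field_significant(n_reject, n_boxes, alpha=0.05, field_level=0.05):
    """True when the rejection count is unlikely to arise by chance alone."""
    return binomial_tail(n_reject, n_boxes, alpha) < field_level
```

For example, with 30 independent grid boxes tested at the 5% level, two rejections are unremarkable, whereas six would be field significant at the 5% level. Positive spatial correlation between boxes would make any given count less surprising than this calculation suggests.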

Figure 3 displays estimates of the expected waiting time in the 1990s for the 20-yr return period extreme temperatures of the 1960s. The changes in the waiting time correspond well with warming. Globally, waiting times are estimated to have changed substantially between the 1960s and 1990s, including increases in waiting times for the recurrence of extreme annual minimum daily maximum temperature to approximately 30 yr, and of extreme annual minimum daily minimum temperature to more than 35 yr. In addition, waiting times for extreme annual maximum daily minimum temperature are estimated to have decreased to fewer than 10 yr and for extreme annual maximum daily maximum temperature to about 15 yr. Regionally, the most significant change in the likelihood of extreme temperatures is observed in the extremes of daily minimum temperatures (which generally correspond to extreme nighttime temperatures); in this case, the 90% confidence interval for the 1990s waiting time of a 1960s 20-yr event excludes 20 yr in almost all regions for TNx and in 11 out of 15 regions for TNn. Note that the detection of a signal for a region and a given type of temperature extreme may not correspond to a significant departure of the waiting time for a 1960s 20-yr event from 20 yr. This is because uncertainty in the return values, especially those farther in the tail, reflects uncertainties not only in the scaling factor but also in other parameters.

## 5. Conclusions and discussion

By fitting GEV distributions to observed extreme temperatures using responses to anthropogenic (ANT) forcing and to combined anthropogenic and natural (ALL) forcing simulated by multiple GCMs, we found that external influence is clearly detectable in global land annual maximum daily minimum temperatures, and annual minimum daily maximum and daily minimum temperatures. This is in agreement with earlier studies (Christidis et al. 2005; Shiogama et al. 2006). We also detected ALL and ANT influences in global land annual maximum daily maximum temperature with the multimodel simulations. In addition, we have obtained some early evidence that external influence on extreme daily temperature is detectable at the regional scale. As external influence has been detected with both ANT and ALL forcings, we conclude that human influence has contributed to the observed changes in the annual extreme daily temperatures, especially the increase in the extremely warm nighttime temperatures and in both the coldest daytime and nighttime temperatures, over the global land area as a whole as well as over the large regions we studied.

Anthropogenic influence is estimated to have increased the waiting time of extreme annual minimum daily maximum and minimum temperature events, and to have substantially decreased the waiting time between extreme annual maximum daily minimum temperature events. Anthropogenic influence has been detected in almost all regions for the annual maximum of daily minimum temperature (TNx). This may be related to the fact that extreme values of TNx typically occur in summer when year-to-year variability tends to be reduced. We note that the CGCM3-simulated ANT signal is not detected in the annual maximum of daily maximum temperature (TXx) over global land but that the ANT signal in TXx simulated by the multimodel ensemble is detected. This may be because CGCM3 warms too quickly, therefore requiring an ANT signal scaling factor that is substantially less than one. Detection results based on the 1951–2000 and 1961–2000 periods clearly indicate the importance of using a longer series, since the uncertainty range in the scaling factor can be reduced substantially. We also note that the fit of the GEV distribution is rejected in fewer regions when signals are estimated from multiple model ensembles than when estimated from a single model, indicating once again (e.g., Gillett et al. 2002; Huntingford et al. 2006) the importance of using a signal that is less influenced by internal variability and model uncertainty in a detection analysis.

We caution that the results presented here are preliminary. Further investigation is required with more models and longer samples to reduce uncertainty. Also, extensions to consider multiple signals are required to increase our confidence in the regional detection of temperature extremes and to confidently attribute change to specific forcing agents as opposed to a specific combination of agents. Greenhouse gas, aerosol, and natural external forcing signal separation, which has not been attempted here, will be more challenging. Nonetheless, the separation of the greenhouse gas and aerosol signals at the regional and continental scales in annual mean temperature (Stott 2003; Zhang et al. 2006) and the fact that the signal-to-noise ratio for changes in extreme temperature over the globe is nearly as large as that for mean temperature (Hegerl et al. 2004) enhance our confidence in these detection results. The evidence of anthropogenic influence in extreme temperature at the regional scale should have important implications when considering climate change adaptation strategies.

Our analysis is not without some caveats. A potential problem is that we were not able to totally account for the effects of spatial covariance and natural low-frequency variability on extremes. Nevertheless, the approach that we have used for estimating the effects of internal variability and signal uncertainty on the scaling factors does partially account for dependence. Goodness-of-fit test results suggest also that the lack of independence is not a large issue for the reliability of our statistical inferences. A second consequence of our inability to fully account for the dependence structure of the extreme values across time and space is that we have not been able to optimize the signal-to-noise ratio, as can normally be done in an optimal detection and attribution analysis (e.g., Hegerl et al. 1997). Also, it has not been possible to reduce the dimensionality of the problem so as to retain the scales at which the model most reliably represents internal variability in extremes (Allen and Tett 1999). In addition, the anthropogenic signal in temperature extremes is likely less well simulated by models than the mean response given the number and complexity of processes that may contribute to the generation of extreme temperature values.

## Acknowledgments

We thank Nathan Gillett and Seung-Ki Min for their comments, which improved an earlier draft of this manuscript. We would also like to thank two anonymous reviewers for their perceptive and helpful comments. We acknowledge the modeling groups, the Program for Climate Model Diagnosis and Intercomparison (PCMDI), and the WCRP’s Working Group on Coupled Modelling (WGCM) for their roles in making available the WCRP CMIP3 multimodel dataset. Support of this dataset is provided by the Office of Science, U.S. Department of Energy.

## Footnotes

*Corresponding author address:* Francis Zwiers, Pacific Climate Impacts Consortium, C182 Sedgewick Building, University of Victoria, P.O. Box 1700 Sta CSC, Victoria BC V8W 2Y2, Canada. Email: fwzwiers@uvic.ca