Observations show the oceans have warmed over the past 40 yr, with appreciable regional variation and more warming at the surface than at depth. Comparing the observations with results from two coupled ocean–atmosphere climate models [the Parallel Climate Model version 1 (PCM) and the Hadley Centre Coupled Climate Model version 3 (HadCM3)] that include anthropogenic forcing shows remarkable agreement between the observed and model-estimated warming. In this comparison the models were sampled at the same locations as gridded yearly observed data. In the top 100 m of the water column the warming is well separated from natural variability, including both variability arising from internal instabilities of the coupled ocean–atmosphere climate system and that arising from volcanism and solar fluctuations. Between 125 and 200 m the agreement is not significant, but then increases again below this level, and remains significant down to 600 m. Analysis of PCM’s heat budget indicates that the warming is driven by an increase in net surface heat flux that reaches 0.7 W m−2 by the 1990s; the downward longwave flux increases by 3.7 W m−2, which is not fully compensated by an increase in the upward longwave flux of 2.2 W m−2. Latent and net solar heat fluxes each decrease by about 0.6 W m−2. The changes in the individual longwave components are distinguishable from the preindustrial mean by the 1920s, but due to cancellation of components, changes in the net surface heat flux do not become well separated from zero until the 1960s. Changes in advection can also play an important role in local ocean warming due to anthropogenic forcing, depending on the location. The observed sampling of ocean temperature is highly variable in space and time, but sufficient to detect the anthropogenic warming signal in all basins, at least in the surface layers, by the 1980s.
Observations show the climate is warming. Nearly a decade’s worth of rigorous detection and attribution studies (e.g., Houghton et al. 2001; International Ad Hoc Detection and Attribution Group 2005), focused mainly on atmospheric temperature, show the warming is outside the range of natural variability. Natural variability includes the effects of both fluctuations that arise within the coupled ocean–atmosphere system due to oscillations or instabilities (“internal” variability), and effects imposed on the climate from natural sources external to the oceans and atmospheres, such as volcanism and solar fluctuations (“external” variability). The warming’s effects can be seen qualitatively in accelerating melting of glaciers (e.g., Thompson 2000, 2003), disruption of Arctic ecosystems [e.g., Tynan and DeMaster (1997)], and bleaching of the world’s coral reefs (Knowlton 2001), to name a few examples.
The ocean’s density and heat capacity are orders of magnitude greater than the atmosphere’s; it is the thermal flywheel that stabilizes the climate system. Indeed, 84% of observed climate warming of the earth system (oceans, atmosphere, cryosphere, continents) over the past 50 yr has gone to heat the oceans; only 3% has gone to heat the atmosphere (Levitus et al. 2005). The global ocean is therefore the logical place to study the nature of observed warming.
Detection and attribution studies of ocean warming using climate models were pioneered by Barnett et al. (2001) using the Parallel Climate Model version 1 (PCM), and Levitus et al. (2001) using the Geophysical Fluid Dynamics Laboratory (GFDL R30) model, who compared simulated ocean warming to the Levitus et al. (2000) observed ocean temperature dataset. Both found a good match between modeled and observed increase in ocean heat content, but only if anthropogenic forcing was included. The conclusion was that observed ocean warming is outside the expected range of natural variability, and attributable to anthropogenic forcing. Subsequently, the issue was examined by Reichert et al. (2002) using the European Centre for Medium-Range Weather Forecasts–Hamburg model– Ocean Isopycnal Model version 3 (ECHAM4–OPYC3), Gent and Danabasoglu (2004) with the Community Climate System Model version 2 (CCSM2), and Gregory et al. (2004) with the Hadley Centre Coupled Climate Model version 3 (HadCM3). All found good agreement between model-predicted and observed ocean warming, although sampling issues, somewhat weak model variability, and the way the signal could be more easily detected in the top part of the water column than at greater depths were generally noted. None of these studies examined in detail the vertical structure of the warming signal. Hansen et al. (2002) and Sun and Hansen (2003), although not formal detection and attribution studies, examined the anthropogenically forced increase in ocean heat content in the Goddard Institute for Space Studies atmosphere model (GISS SI2000) coupled to various ocean models (including the Hybrid Coordinate Ocean Model, HYCOM), and showed globally averaged vertical profiles of anthropogenic ocean warming. Gregory (2000) showed zonal-mean ocean temperature increases in an anthropogenically forced model experiment, and analyzed the role of ocean vertical heat transport (particularly at high latitudes) in producing the surface climate response.
More recently, Barnett et al. (2005, B05 hereafter) revisited this issue with a new version of the observed, gridded ocean temperature dataset (Levitus et al. 2005). Compared to Barnett et al. (2001), B05 also included a basin-scale analysis of the temporal evolution of vertical penetration of heat into the oceans, an estimate of the role of time-averaged surface heat flux and advection in contributing to the temperature change, detection with estimated historic solar and volcanic forcing taken into account, results from a different climate model (HadCM3; Gordon et al. 2000), and a more restrictive sampling scheme that better mimics observations (more on this below). The conclusion of B05 was that human activities are largely responsible for the observed ocean warming over the last 40 yr. Natural variability, either internal to the coupled ocean–atmosphere system or external (solar/volcanic), was incapable of explaining the ocean observations.
Because of their brevity, Barnett et al. (2001) and B05 did not fully address a number of questions attending their main conclusion. Our objective is to more fully explore these questions and determine their potential impact on the detection and attribution results for ocean warming. Accordingly, this paper is organized as follows. In section 2 we briefly describe the data and models used. In section 3 we examine the models’ natural variability and compare it to observations; a reasonably close correspondence is needed if natural variability is to be accepted or ruled out as a potential cause of observed ocean warming. In section 4 we illustrate a model-based “fingerprint” of ocean warming and compare it to both observations and model estimates of natural internal and external variability. In section 5 we illustrate how changes in net surface heat flux and advection together determine changes in local heat storage. In section 6 we examine the effects sampling variability has on the detection and attribution results. Finally, in section 7 we present a summary of our findings and conclusions.
2. Model and observational data used
We use data from two ocean–atmosphere general circulation models (O–AGCMS): the Parallel Coupled Model (PCM; Washington et al. 2000) and HadCM3 (Gordon et al. 2000). These two models have been developed independently for decades, so they have little in common except their purpose. Neither uses flux adjustments. Complete descriptions of both of these models and their simulations are available in the above references, so they will only be summarized briefly here.
The atmospheric component of PCM is the National Center for Atmospheric Research’s Community Climate Model version 3 (CCM3) atmospheric general circulation model (Kiehl et al. 1998), a spectral model used here at T42 resolution (equivalent to about 280 km × 280 km grid spacing), with 18 vertical layers in a hybrid sigma-coordinate scheme. A land surface model with soil moisture, vegetation types, and a simplified runoff scheme is included. The ocean component is the Parallel Ocean Program (POP; Smith et al. 1992; Dukowicz and Smith 1994), with a horizontal resolution of 384 by 288 points on a displaced-pole grid (roughly 2/3° resolution), with 32 vertical levels clustered near the surface to improve the representation of surface mixing processes. A dynamic–thermodynamic sea ice model based on Zhang and Hibler (1997) is included, with an elastic–viscous–plastic rheology for computational efficiency (Hunke and Dukowicz 1997). The ice model is formulated on its own grid, which has a total of 320 by 640 grid points, with a physical grid spacing of roughly 27 km.
To determine the effect of anthropogenic forcing on climate, we used a 12-member ensemble (listed in Fig. 6) forced with estimated historical emissions of greenhouse gasses, ozone, and the direct effect of sulfate aerosols (Washington et al. 2000; Dai et al. 2004). These runs did not include the effects of black carbon, solar fluctuations, or volcanoes. We compared this to a 520-yr control run with greenhouse gas and sulfate aerosol concentrations fixed at 1870 values, which provides an estimate of natural internal climate variability. For isolating the effects of natural variability external to the coupled ocean–atmosphere system, we used a four-member ensemble driven only by estimated solar fluctuations and volcanic forcing over the past 50 yr (Meehl et al. 2003).
HadCM3 is based on the United Kingdom Met Office (UKMO) unified forecast and climate model. The atmospheric resolution is 2.5° × 3.75°, with 19 vertical levels in a hybrid vertical coordinate. It represents effects of minor trace gases in addition to carbon dioxide, water vapor, and ozone. The direct and some indirect effects of aerosols are included (Gordon et al. 2000).
The ocean component of HadCM3 uses a 1.25° × 1.25° horizontal resolution, with 20 vertical levels concentrated in the upper part of the water column to better resolve surface processes. This ocean model shows considerably improved heat budgets compared to earlier, coarser-resolution versions. A simple thermodynamic sea ice model is included, with parameterizations of ice drift and leads. A four-member ensemble was used (listed in Fig. 7) that incorporated the effects of both anthropogenic and natural external (solar, volcanic) forcing. This differs from the PCM anthropogenically forced runs, which did not include solar and volcanic forcing. A 700-yr control run with no external or anthropogenic forcing provided an estimate of natural internal variability. Tett et al. (2002) and Johns et al. (2003) analyze anthropogenic runs using this model.
We use the latest version of the National Oceanographic Data Center’s gridded historical ocean temperature dataset [Levitus et al. (2005); available online at http://www.nodc.noaa.gov/OC5/DATA_ANALYSIS/heat_intro.html, accessed 17 March 2005]. Using this dataset, Levitus et al. (2005) estimated the World Ocean heat content (surface to 3000 m) increased 14.5 × 1022 J between 1955 and 1998.
The data are on a 1° × 1° grid, and has been infilled (i.e., interpolated) to fill gaps where no observations exist. Yearly data are provided on 16 standard depth levels in the top 700 m.
Locations where the ocean is sampled vary considerably from year to year. To help take this into account, two additional fields are supplied with the dataset. The gp mask shows, for each grid box, the number of observations taken within ∼450 km of that grid box at the same depth. [This is a change from the gp field in Levitus et al. (2000), where a radius of ∼900 km was used.] The dd mask shows the number of observations taken within the grid box itself. Both masks vary in latitude, longitude, depth, and time. For most results shown here, only locations the dd mask indicates as sampled are included, for both models and observations. This is a departure from Barnett et al. (2001) and Reichert et al. (2002), who used the gp mask; this change is an attempt to be more conservative, by avoiding using any infilled data. It also motivates changing from total heat content, used in Barnett et al. (2001), to volume-averaged temperature, since the total amount of heat depends on the volume sampled in any particular year. Gregory et al. (2004, their Fig. 2) illustrate how total estimated heat content can vary substantially based on assumptions made to infill data. [Gent and Danabasoglu (2004) used volume-averaged temperature as well, but did so to normalize by varying basin sizes and avoid use of heat content units.] In section 6a we compare the results of using the dd and gp masks.
Pentads (5-yr averaged data) are provided on the same horizontal grid as yearly data, but extended to 3000 m using 28 depth levels. We focus on yearly data in this work, since we use the dd mask to sample the model in the same way as the observations, and potentially having a single sample represent temperature anomalies for 5 yr becomes uncomfortably long, even at depth. The issue of seasonal sampling of the observations, and how it might influence the annual anomalies, are discussed in section 6d.
3. Natural variability: Models versus observations
It is important to estimate the likelihood that observed ocean warming arises from natural climate variability, both internal and external. To do this one must compare observed ocean warming to fluctuations expected without anthropogenic forcings. The latter requires estimates from models; it then becomes important to examine whether the models’ natural variability is realistically strong. We do this by comparing control model results to the detrended observations. The detrended observations also include natural external variability (solar and volcanic forcing) and any nonlinear effects of anthropogenic forcing, unlike model control runs. To address this, we also compare detrended observations to detrended anthropogenically forced model runs. (The separate effect of external forcing in the absence of anthropogenic forcing is shown for PCM in section 4.)
Perhaps the first and most obvious way to compare variability is simply through maps of the standard deviation of annual sea surface temperature (SST) anomalies; this is shown in Fig. 1 for observations (1945–2004) and the long model control runs. Both the observed and model values are detrended by removing the least squares best-fit line (although there is very little trend in the model control run surface temperatures). After detrending, solar and volcanic variability (along with any nonlinear response to anthropogenic forcing) remains in the observations, but is absent from the model control runs. Even so, there is some tendency for the models to overestimate SST variability. For example, PCM has too much North Pacific variability, as the Pacific decadal oscillation in PCM (standard deviation of 1.1°C) is stronger than observed (0.74°C). HadCM3 has more North Atlantic variability than observed. Both models have roughly the observed strength in the central tropical Pacific (associated with ENSO), although PCM misses the high variance along the west coast of South America and HadCM3 extends the variability too far into the tropical Pacific warm pool.
The standard deviation of annually averaged values is useful to examine, but strong interannual variability might mask low-frequency differences. Time series and spectra of PCM and HadCM3 temperatures (0–700 m) from the control runs (internal variability alone) are already compared to detrended observations in B05, and so are not repeated here. However, it is unclear if anthropogenic forcing has any effect on the amplitude or frequency of natural modes of internal climate variability, such as ENSO, the Pacific decadal oscillation, or the North Atlantic Oscillation. Accordingly, in Fig. 2 we show spectra of globally averaged temperature (0–700 m, dd-sampled points only) in observations and anthropogenically forced model runs, both detrended by removing the least squares best-fit line to remove the linear part of any anthropogenic signal. [Prior to detrending, the model values have control model drift removed as described in appendix A; Gent and Danabasoglu (2004) illustrate and discuss drifts that can be found in a control run, albeit for a different model than used here.]
The models and observations have similar power in interannual frequencies (>0.15 cpy), but PCM becomes deficient at longer time scales. This is less a problem with HadCM3, which has more low-frequency variability than PCM, and matches the observed spectrum rather well. A more complete spectral analysis is shown in appendix B, where the same data used to make Fig. 2 are analyzed by basin and depth. The conclusion is low-frequency spectral power in the detrended observations generally falls within the 90% confidence limits of the PCM model ensembles, while HadCM3 tends to have too much variability above 100 m and too little below 400 m. This would have the tendency of making any anthropogenic warming of the surface ocean (where the signal is concentrated) harder to detect than it should be.
Gregory et al. (2004) compared observed low-frequency temperature variability to the HadCM3 control run (natural internal variability only) by computing the standard deviation of volume-averaged, 5-yr running mean temperature at each level from the detrended data (their Fig. 4). They found a peak in observed variability at 400 m that the model did not reproduce. The detailed analysis of standard deviations in appendix B shows that the PCM control run exhibits the same behavior, but also that the observed 400-m “peak in variability” arises not because ocean variability is unusually strong at 400 m, but because it is unusually spatially coherent. This reduces cancellation between uncorrelated points seen to a greater extent above and below 400 m when volume-average temperature is calculated. The models’ average standard deviation of temperature by basin is rather similar to the observations, as shown in Fig. 3. The black dots are observations; the range from HadCM3’s control run is shown as the gray-shaded region, while PCM’s control run is shown as the cross-hatched region. Both models have somewhat weak internal variability in the South Atlantic below 250 m and in the Indian Ocean around 100 m. In other basins and depths the observed curve generally falls within the range of HadCM3 ensembles, while PCM tends to show overly strong variability.
The analyses discussed so far use detrended data, yet any anthropogenic response might be a near-linear trend over the time period. It is therefore useful to show a simple comparison of the linear trends (1960–99) found in the observations, control model runs, and anthropogenic model runs. This is shown in Fig. 4 for PCM and Fig. 5 for HadCM3; all data have been sampled with the dd mask. The results vary by model and basin, but generally the observed 40-yr trends fall within the trends seen in the anthropogenically forced runs, and outside the trends seen in the control run ensemble members in the upper 100 m of the water column. This is confirmed by a Kolmogorov–Smirnov (K–S) test of the trends from the control and anthropogenic ensemble members (results shown in the plot).
In summary, we have performed a number of comparisons of the observed data with the model control and anthropogenic runs to evaluate the models’ ability to reproduce climate variability. Comparing the spectra of the detrended observations to the detrended anthropogenically forced runs shows PCM’s level of variability is in general agreement with observations; HadCM3 tends to have too much variability above 100 m and too little below 400 m. Comparing low-frequency standard deviations of temperature to the observations shows the model control runs bracket the observed values, but lack large-scale coherence in temperature fluctuations around 400 m in the Southern Hemisphere western Pacific. However, B05 (their Fig. 2) show no detectable warming signal at that location and depth, so the implication of this model shortcoming for the detection issue is moot. Comparing simple linear trends in the observations to the control and anthropogenic model runs shows the observed trends are consistent with those found in the anthropogenic runs, and well separated from those found in the control runs in the upper part of the water column. None of these analyses indicate the models have systematically weak variability in the top 100 m (where the ocean warming signal is greatest). We conclude that the models provide a good estimate of natural internal and external variability for detection and attribution studies.
4. Ocean warming fingerprint
The time series for yearly volume average temperature anomaly (0–700 m) are shown for PCM in Fig. 6 and HadCM3 in Fig. 7. Left panels show results from the anthropogenically forced ensemble members, right panels show same-length chunks from the control run. Anomalies are calculated relative to the base period of 1957–90 to match the treatment of the observations (Levitus et al. 2005). For brevity, in the rest of this work we will refer to the yearly, volume-averaged temperature anomaly as the “temperature.” The values in Figs. 6 and 7 are taken from sampled points only, as indicated by the dd mask. The period shown is 1955–99, in deference to PCM runs, which end in 1999. [Levitus et al. (2005, their Fig. 1) show a similar graph for observed heat content that extends through 2003.]
Both PCM and HadCM3 show a great deal of temporal variability, which differs significantly between ensemble members. For instance, PCM realizations B05.03 and B06.08 show near-constant temperatures over the last 12 yr of the record, while B06.28, B06.85, and B06.97 show cooling in the final years of the record. Most ensembles show multiyear periods where temperature is cooler than a previously attained value. Such variability can be seen in observations as well, although the decadal swing from 1970 to 1980 is larger than anything seen in the PCM ensemble members (but comparable to swings seen in HadCM3’s abw1 and abw2). The consistent strong trends seen in the observations and anthropogenically forced runs of both models are not found in the control runs, which lack anthropogenic forcing (right panels in Figs. 6 and 7).
The temperature series by region for PCM are shown in Fig. 8. Individual ensemble members (gray lines) show a great deal of variability, both year to year and between realizations. The observations (thick solid line) are consistent with individual ensemble members in the amplitude of year-to-year variability and in the overall trend by basin. Taking the average over ensembles (thick dashed line) removes much of the higher-frequency variability not of interest here, but still worth showing to illustrate the model generates basinwide variability consistent with observations. The anthropogenic forcing increases gradually and monotonically during this time period, so for the rest of the analysis we average by decade to reduce the effects of high-frequency internal natural climate variability.
a. Definition of fingerprint
We desire to define a signal that concisely expresses both spatial and temporal ocean warming found in the model runs with anthropogenic forcing. Given such a fingerprint, we can examine how likely it is to find this pattern in the observations, unforced control run (natural internal variability), and runs with natural external forcing (solar fluctuations and volcanoes).
We first remove the influence of model drift by differencing temperature in the forced ensemble runs from the simultaneous low-frequency fitted temperature in the control run (details in appendix A). We then define an ensemble common signal (ECS) for an ensemble of n members as follows. For each member we concatenate, by ocean basin, the four decadal temperature values to form C(n, t̂). Decades are taken as 1960–69, etc. Here t̂ can take on 24 values (four decades × six basins; the x axis in Fig. 9). We do this independently for each of the 16 standard depth levels in the top 700 m (the maximum depth at which yearly data are given in the observed dataset). To get the final ECS, we use standard principal component analysis (PCA) to decompose C:
where m is the mode number, E is the empirical orthogonal function, and P is the principal component. We take the leading principal component, P(1, t̂), as the ECS.
The ECS can be formed from any set of ensemble members, such as the PCM or HadCM3 ensemble members alone. Here, we define our ensemble common model fingerprint of ocean warming (or simply “fingerprint”) as the ECS of the 12 PCM ensemble members taken together with the 4 HadCM3 ensemble members. It thus represents the common warming signal in all the ensembles available from the two models. The fingerprint explains 86% of the variance in the surface layers, dropping to 30% of variance at 700 m. As tests, we tried both volume weighting C before PCA analysis, and simply averaging the ensemble members to obtain the fingerprint; neither made much difference. Equally weighting the mean of the 12 PCM ensembles and the mean of the 4 HadCM3 ensembles to form the fingerprint changed it somewhat in the bottom three levels (500–700 m), but had little effect higher in the water column, where the signal is concentrated.
The ensemble common model fingerprint at 50 m is shown in Fig. 9 (dashed line with squares). Also illustrated is the same quantity calculated from observations (thick line with dots), and the 16 individual ensemble members that go into making the model fingerprint (thin gray lines). The observed warming matches the model fingerprint quite well. Consistent differences between oceans can be seen; for example, warming has been greater in the North than South Atlantic.
The amplitude of the fingerprint decreases with depth as shown in Fig. 10. Peak-to-peak values drop from 0.3°C near the surface to 0.05°C at 600 m. Again, differences between basins are apparent; the signal drops less in the North and South Atlantic than in the South Pacific and South Indian Oceans.
The ensemble common model fingerprint is compared to the observations, and to the models’ anthropogenically forced and control run ensemble members, in Fig. 11. The figure is in the same format used in B05, although here the entire World Ocean fingerprint is used instead of individual basins. Briefly, “signal strength” is defined as
where F is the ensemble common model fingerprint at a given level and T is the target being compared to, taken at the same level. The signal strength is similar to a correlation, but without normalization by T (equivalently, it is the correlation between F and T multiplied by the standard deviation of T). It retains information on both the agreement in temporal evolution of the two signals and the strength of T. Retention of signal strength is an important point; we would deem a model unsuccessful at reproducing the target signal if it predicted radically too weak or strong a signal, even with a perfect correlation.
Over most of the water column there is good agreement between the observed signal strength and the anthropogenically forced ensemble members, in both PCM and HadCM3. Moreover, in the upper part of the water column there is a clear separation between the observations and the natural internal variability found in the control runs (cross-hatched region). Consistent with B05, we conclude that the observed warming cannot be explained by natural internal variability, but can be explained by anthropogenic forcing. The results of section 3 should be kept in mind when forming this conclusion; that is, it is important to assess a model’s natural internal variability, and compare it to the available observations, before the model estimate of natural internal variability can sensibly be compared to the observed warming signal. Here, we have done this comparison and found the models’ estimation of natural internal variability to be in accord with the observations.
The triangles in the left panel in Fig. 11 show the signal strength for the ECS of PCM runs forced by estimated solar and volcanic forcing alone, that is, natural external variability (triangles). The signal strength in these runs falls within the range of natural internal variability, and distinctly outside the envelope of the anthropogenically forced runs. The conclusion is that natural external variability cannot explain the observed ocean warming. In sum, natural internal variability and solar/volcanic fluctuations are too weak to explain the observed ocean warming signal, but the warming is consistent with that expected from anthropogenic forcing.
The correlation between the ensemble common model fingerprint and the observations is shown in Fig. 12. The ensemble common model fingerprint explains 80%–90% of the variance of the observed signal over the top 75 m. The correlation drops to a minimum at depths between 125 and 150 m, possibly because of variability in the thermocline. It then increases again below that level, such that the model fingerprint is significantly correlated with the observations and explains ∼35% of the observed variance between 250 and 600 m.
Once the global fingerprint concatenated by ocean has been calculated, it can be split apart again to perform detection individually by basin. This is illustrated in B05, and so will not be repeated here, although we will note that adding an additional seven PCM anthropogenically forced realizations to the five used in B05 made little difference (not shown).
5. Change in heat flux components
The ocean warming in various basins is accomplished by an increase in net surface heat flux (NSHF) due to anthropogenic forcing, the regional details of which do not appear to have been shown before. Several previous studies have shown globally averaged radiative forcing at various atmospheric levels (e.g., Hansen and et al. 2002; Johns et al. 2003), while Pierce (2004) showed global surface fluxes. Sun and Hansen (2003) have an interesting analysis of NSHF and ocean heat storage, as well as changes in ocean heat transport in an anthropogenically forced simulation, but do not separate out the contribution of the various surface heat flux component fields. In this section we show the regional anthropogenic change in surface heat flux components and oceanic heat storage estimated by PCM. (It is not possible to show the same quantities from HadCM3 as not all of the surface heat flux components were saved.)
a. Net surface heat flux
Figure 13 shows PCM’s change in surface heat flux components relative to the 1880–1919 mean (the South Pacific, which is similar to the North Pacific, is omitted). All components have the sign convention that positive values heat the ocean, including LW surf, the longwave flux emitted from the surface (which is then always negative).
Globally averaged (Fig. 13, lower-right panel), the change in NSHF relative to preindustrial conditions reaches about 0.71 ± 0.05 W m−2 by the 1990s. Willis et al. (2004), analyzing satellite altimetric height data with observed in situ temperature profiles, estimated a global ocean warming rate of 0.86 ± 0.12 W m−2 from 1993 to 2003, which suggests the model NSHF might be slightly low. The increase in downward longwave (LW) flux from the atmosphere (3.7 W m−2) is not completely compensated for by an increase in upward LW (2.2 W m−2) from the surface, resulting in a net LW increase by about 1.5 W m−2. Incoming shortwave radiation has a negative trend, due to the effects of increasing sulfate aerosols and, possibly, changes in cloud cover. The trend in latent heat flux is also negative, suggesting increased evaporation. The sensible heat flux increases by about 0.6 W m−2.
The anthropogenically forced changes in individual heat flux components (particularly LW) are distinguishable from the preindustrial mean by the 1920s. However, due to the near cancellation of the competing effects, NSHF (and thus the ocean warming) does not become well separated from the preindustrial mean until the 1960s. This multidecade lag between the effects of anthropogenic forcing on the atmosphere and on NSHF (and hence surface temperatures) does not seem to have been noted before. It should be noted that the anthropogenically forced PCM runs used here do not include solar and volcanic forcing, which might alter the times by which the runs can be distinguished.
Individual oceans each have their own signatures of how flux components evolve. For example, in the North Indian Ocean NSHF is near zero until the 1990s, due to evaporative cooling and a strong decrease in incoming shortwave radiation [cf. Gent and Danabasoglu (2004), who noted the Indian Ocean’s lack of monotonically increasing warming in CCSM2 was an “interesting exception” to the other basins]. There is also considerable variability between ensemble members, as indicated by the whisker plots. For example, the North Indian Ocean shows ensemble members with a negative NSHF change by the 1990s. Not every region is obliged to exactly reproduce the globally averaged trends. This variability prevents the same calculation from being sensibly performed for HadCM3, since the heat flux components were only saved for one ensemble member.
b. Advection and local heat storage
The rate of change of heat storage in a basin depends on NSHF and net heat advection into the basin [cf. Gregory (2000), who examine the effects of vertical ocean heat transport on surface temperature change in an anthropogenically forced model run]. With the anthropogenically forced PCM ensemble members, it is possible to calculate local heat storage’s rate of change and compare it to NSHF, thereby estimating the role changes in net advection play in producing ocean warming.
Figure 14 shows time series (averaged by decade) of NSHF and changes in local heat storage. The contribution of advection is then calculated as a residual between the other two. Values are shown relative to the average over 1880–1919, since we are interested in examining anthropogenically forced changes. To facilitate comparison, changes in heat are normalized by surface area to yield an equivalent surface heat flux (in watts per meters squared). The entire depth of the water column is used (without any masking), so the effect of vertical heat transfers is not included.
Advection’s role in accomplishing the anthropogenic ocean temperature change can be minimal or important, depending on the region. For example, changes in net advective warming play little role in North Atlantic anthropogenic warming, but significantly warm the North Indian Ocean, and cool the South Pacific. The differences between ensemble members (not shown) can be reasonably large.
The trends are also interesting, albeit noisy. The South Pacific shows a steadily decreasing contribution to ocean warming from advection, which partly offsets the NSHF to result in reduced warming. The South Indian Ocean, by contrast, shows increasing warming due to advection, and so warms more than would be expected due to NSHF alone.
The conclusion is that advection plays an important role in local ocean warming due to anthropogenic forcing. It would be interesting to explore how the change in advective heating is accomplished, for example, by changes in mean or eddy circulation, or by changes in the temperature difference between water flowing into and out of the basin. However, this is outside the scope of the present work.
6. Sampling variability and model uncertainty
The fraction of ocean volume sampled to 700 m for various oceans, as given by the dd mask, is shown in Fig. 15. Northern Hemisphere oceans (solid symbols) are noticeably better sampled than Southern Hemisphere oceans (open symbols). Even in the northern oceans, the actual locations sampled can vary wildly from year to year. In this section we explore how sampling might potentially affect the detection and attribution results. Gregory et al. (2004), in their analysis of the possible effects of sampling and data infill on the observed dataset, indicated that caution on this point is warranted, since estimates of ocean heat content can vary significantly depending on how infill data is calculated. Additionally, Sun and Hansen (2003) specifically noted that surface heat fluxes estimated from air–sea temperature differences “do not seem to be consistent with” the large decadal time-scale swing in ocean heat content found in the observational dataset, and the discrepancy might be due to incomplete sampling. Although we avoided using heat content and sampled only at the same places as the observations partly for these reasons, sampling is clearly an important issue that deserves consideration.
a. dd versus gp masking
We have already noted the difference between the observed data’s dd mask (sampled points only; used here and in B05) and the gp mask [points within ∼450 km of a sample, used by Barnett et al. (2001) and Reichert et al. (2002)]. The difference using these masks makes to the detection result (Fig. 11), by level, is shown in Fig. 16. There is little difference between results using either mask. More detailed results with detection by basin (not shown) give differences slightly bigger than seen in Fig. 16, but still not enough to affect the detection and attribution results. Reichert et al. (2002) compared using the gp mask to the full infilled dataset from Levitus et al. (2001) and similarly found it made little difference to the conclusions in a formal detection and attribution study. This does not necessarily imply there is little difference between the sampled and infilled data itself; rather, the signal detection strategy employed here [with the concatenated signal applied to Eq. (2)] is designed to maximize the signal to noise ratio as much as possible, thereby avoiding some problems with using infill data. Further details on the impact of infilling are given in AchutaRao et al. (2006, hereafter AJGR).
b. Accuracy of observed estimates
We have focused on the volume-averaged yearly temperature anomaly in various oceans. How accurately is this known, given the incomplete sampling?
It is straightforward to estimate this from the model. We use PCM, as there are more ensemble members to work with than with HadCM3 (12 versus 4). For each year’s dd sampling mask, we compare the ensemble member’s true basin-averaged temperature to the value estimated after masking. We then average into decades so the effect of changes in sampling over the years can be more easily seen. The 90% confidence intervals for observational error in estimated volume-averaged temperature are shown in Fig. 17, using sampling masks from the 1960s (gray swath) and 1990s (cross-hatched region). The intervals are centered around zero; however, the purpose of constructing such intervals is to compare them to the signal. Values in Fig. 17 are therefore offset by the decade’s mean temperature anomaly with respect to the preindustrial control run, for the 1960s (white squares) and 1990s (black dots).
For example, in the North Atlantic (Fig. 17, top-left panel) there has been warming of about 0.1°C by the 1960s. However, the sampling masks used in that era could have misestimated the basin-average temperature by up to ±0.1°C, making detection of the warming problematical by that time. By the 1990s the warming is greater and sampling better, leading to a clear separation of measured temperature anomalies from zero at all depths. Examination of the other decades (not shown) indicates the confidence intervals for all basins except the South Atlantic separate from zero (at least in the surface levels) starting in the 1970s; the South Atlantic then becomes separated from zero in the 1980s. The overall conclusion is that, given the evolving sampling density and strength of the signal, sampling is sufficient to confidently detect the anthropogenic warming signal in all oceans by the 1980s. Detection would be possible even earlier were a formal mechanism for climate change detection used, such as the fingerprint technique in B05, rather than simple decadal differences.
c. Model heat uptake differences
Different climate models show different ocean heat uptakes (Sokolov et al. 2003). How might this affect the detection and attribution results? The Climate Model Intercomparison Project (CMIP; Meehl et al. 2000) has accumulated a database of coupled climate runs with 1% yearly increasing CO2 forcing that can be used to examine this question.
The ocean warming experienced after 80 yr for a selection of models is shown in Table 1. The difference between the minimum and maximum warming found by basin (expressed here as a ratio) can be large; for example, the model with most warming showed the North Indian Ocean increasing in temperature almost 8 times more than the model with the least.
A crude estimate of the effect this model uncertainty has on the detection and attribution results can be made as follows. If R(b) is the maximum–minimum ratio from Table 1 for basin b, and F(t, b) is the ensemble common model warming fingerprint (a function of decade t and basin), then we construct estimated signals for the “minimum” and “maximum” models as follows:
Here, F is at the midpoint of Smin and Smax so the constructed signals are centered on the ensemble common model fingerprint, and at every point Smax/Smin = R. The constructed signals for the 50-m level are shown in Fig. 18. We can then treat these constructed signals as any other signal, using Eq. (2).
The results are shown in Fig. 19. Observations are shown as circles, the standard ensemble common model fingerprint as squares, and the maximum–minimum constructed signals as the white and black triangles, respectively. The cross-hatched region is the 90% confidence interval of the unforced control run from HadCM3, used here to be conservative as it is a wider region than found in PCM (see Fig. 16). Even with the minimum model signal, the forced result (black triangles) falls outside the 90% confidence interval of the unforced control in the upper part of the water column, indicating that natural internal variability would be rejected as an explanation of the warming. Still, the standard model fingerprint is a considerably better match to observations than either the minimum or maximum constructed signals. If various climate models exhibit the same difference in ocean heat uptake during the historical era as they show in the 1% CO2 increase CMIP runs, this might allow identification of models whose simulations agree poorly with observations. This could then be taken into account when considering their projections of future climate change.
In summary, model uncertainty in heat uptake is significant, but not large enough to affect the detection results presented above and in B05.
d. Seasonality of the observed sampling
There is no difficulty forming annually averaged temperature anomalies from the models, since complete monthly data coverage is available. For the observations, however, samples are made at specific times, and there is unlikely to be uniform seasonal coverage in any particular region. To address whether this could affect our analysis, we consider three questions. 1) Does the observed ocean temperature trend vary by season? If not, then even strongly seasonally biased annual anomalies would have little effect on our analysis of the low-frequency signal. 2) Is the seasonal temperature cycle sufficiently well known to correctly compute anomalies from the value expected at the observed time? This is a prerequisite for accurately computing the annual anomalies. 3) Is the ocean warming fingerprint present only in the upper part of the water column affected by the seasonal cycle? If not, then there is little reason to believe that any seasonal bias is affecting our conclusions. These questions will be considered in turn.
1) Temperature trend by season
The best place to examine the seasonality of observed ocean temperature trends is at the surface, since the amplitude of the seasonal cycle decreases with depth. This also allows the long record of SST to be used. The linear trend in SST anomalies, by season, over the most recent 100 yr (1905–2004) computed from the Kaplan et al. (1998) extended dataset (downloaded from the Climatic Data Center Web site: http://www.cdc.noaa.gov/) is shown in Fig. 20. The seasonal trends differ in details, but there is broad agreement on the larger scales. The pattern correlations between the various seasons range from 0.58 to 0.92, with a mean of 0.76; all are significant at the 99% level. Similar agreement is seen using the most recent 80- and 50-yr periods (not shown). The lack of large systematic differences in low-frequency trend by season suggests that even if seasonally biased sampling were affecting the annual anomaly, it would have minimal impact on the low-frequency signal analyzed here.
2) Representation of annual cycle
The seasonal cycle of ocean temperatures in the World Ocean Atlas 2001 (WOA2001; Stephens et al. 2002), an earlier version of the temperature dataset used here, was examined by Antonov et al. (2004). They found the significant increase in data in WOA2001 compared to an even earlier (1994) version of the dataset made little difference to the depicted seasonal cycle. The implication is that there is sufficient data in WOA2001 to accurately model the seasonal cycle. In fact, they show that 99% of seasonal changes in ocean heat content, in both hemispheres, arise from the annual harmonic alone (their Table 1). Evidently, fine temporal sampling is not required to form an accurate annual cycle on basin scales. The annually averaged version of the dataset we use here is based on even more data than is used in WOA2001 (Levitus et al. 2005). It therefore would be expected to have an even more accurate depiction of the annual cycle, which would allow annual anomalies to be correctly computed even from irregularly sampled data.
3) Detection with depth
If seasonally biased observations were somehow mimicking the models’ anthropogenic warming fingerprint, the correlation between the ensemble common model fingerprint and the observed signal would become insignificant at depths little influenced by the annual cycle. Figure 12 shows this is not the case. The correlation of the model anthropogenic warming fingerprint with the observations is statistically significant, and nearly uniform, from 250 to 600 m. At such depths, temperatures are little influenced by the seasonal cycle (Gleckler et al. 2006). The fact that the signal is detectable at depth is a strong indication that it does not arise from seasonal biases in the observations.
In summary, multidecadal trends in SST do not exhibit large differences by season, previous work suggests that the dataset used here has a good representation of the seasonal cycle, and the model anthropogenic warming fingerprint is significantly correlated with the observations at depths little influenced by the seasonal cycle (250–600 m). We therefore conclude that the results shown here are not influenced by possible seasonal sampling biases in the observations.
The purpose of this work has been to explore a number of issues related to detection and attribution of ocean warming, including an evaluation of natural climate variability (internal and external) in the coupled climate models, the possible influence of sampling on the results, an estimate of the strength of model heat uptake uncertainty (compared to signal strength), the role of the surface heat flux components and advection in accomplishing the ocean warming, and the possible effects of seasonally biased observations. We used two independent coupled climate models, PCM and HadCM3 (neither of which uses flux correction) and compared them to the Levitus et al. (2005) yearly observed, gridded ocean temperature dataset (0–700 m). With few exceptions, we sampled the models in accordance with the observation’s dd sampling mask, which shows when and where an individual 1° × 1° grid box contains an observation. Our main conclusions are as follows.
PCM and HadCM3 show natural variability (internal and external) at levels similar to that observed at time scales shorter than 0.15 cy yr−1, except for having overly weak variability in basin-averaged temperatures between 300 and 500 m. This latter detail is not due to weak variability at a given point, but rather because observations show more coherence in variability at this depth (resulting in lack of cancellation when the basin-average temperature is computed) than the models. This increase in coherent variability at depth in the observations is not a global phenomena, but is rather localized to the Southern Hemisphere western tropical Pacific. Overall, the models provide a good measure of natural climate variability for anthropogenic detection studies.
The ensemble common model fingerprint of ocean warming due to anthropogenic forcing agrees well with observed ocean warming in the top 100 m of the water column, explaining about 80%–90% of the observed variance (globally and decade averaged). The warming is outside the envelope expected from natural variability, both internal and external (solar and volcanic fluctuations) to the climate system. Between 250 and 600 m the correlation is weaker but still statistically significant. The conclusion that the warming is outside the range expected from natural variability depends on the estimate of natural variability employed. Both models’ estimates of natural variability are consistent with the available observations. Since the observed ocean warming is outside this range of natural variability, we conclude the ocean is warming due to anthropogenic causes.
The increase in ocean heat content in PCM is driven by an 0.7 W m−2 increase in net surface heat flux over the world’s oceans since 1960. Downward longwave radiation increases by 3.7 W m−2, which is not completely compensated by an increase in upward longwave radiation of 2.2 W m−2. Net incoming shortwave and latent heat have negative trends. The anthropogenically forced changes in the longwave components become distinguishable from preindustrial values by the 1920s, but near cancellation of terms prevents the net heat flux from becoming well separated from the preindustrial mean until the 1960s, a gap of some decades.
In PCM (where data are available), changes in advection play an important part in the modeled anthropogenic change in ocean temperature in certain regions. For example, in the South Indian Ocean advection increases the overall warming, while it decreases the warming in the South Pacific.
Model estimates suggest the observed sampling is sufficient to estimate basin-averaged temperatures to about ±0.1°C. This is highly dependent on the basin and depth, however. Comparing this uncertainty to the model-projected strength of anthropogenic ocean warming, the historical coverage is sufficient to reliably detect ocean warming in all oceans.
CMIP2 results suggest ocean basin heat uptake in various climate models can vary by a wide margin (a factor of 2 to almost 8, depending on the region). Synthetically constructed ocean warming fingerprints corresponding to the “maximum” and “minimum” heat uptake ocean models suggest that even in the case of minimum heat uptake, the anthropogenically forced model signal falls outside the expected range of natural internal climate variability in the surface layers.
Analysis of ocean temperature trends by season, representation of the seasonal cycle, and detectability with depth indicate our results are unaffected by possible seasonal sampling biases in the observations.
Our conclusion that observed ocean warming arises from anthropogenic sources agrees with earlier authors such as Barnett et al. (2001), Levitus et al. (2001), Reichert et al. (2002), Gent and Danabasoglu (2004), and Gregory et al. (2004). The additional information presented here and in Barnett et al. (2005) on the vertical structure of the signal explains why it is more detectable in the upper part of the water column, as noted in Gent and Danabasoglu (2004). Additionally, illustrating the evolution of net surface heat flux and the role advection plays sheds more light on the processes driving the ocean warming signal. The analysis of detectability and sampling issues as a function of depth shows the limitations of the historical observations and implies how they might be usefully extended in the future. Finally, it should be noted that the two models used here, although producing good climate simulations overall, likely do not sample the full spread of model differences found in global climate models in use today; a fuller analysis that takes more model results into account would be worthwhile. As models with higher resolution or improved physical parameterizations become available, the kind of analysis shown here could reveal ever-more detailed information about how anthropogenic forcing is likely to affect our planet.
This work is a contribution from the International Detection and Attribution Group funded by the NOAA and DOE through NOAA’s CCDD program. We gratefully acknowledge DOE support through Grants DE-FG03-01ER63255 to the Scripps Institution of Oceanography and DOE-W-7405-ENG-48 to PCMDI at LLNL. Work at the Hadley Centre was supported by the U.K. Department for Environment, Food and Rural Affairs under Contract PECD 7/12/37 and by the Government Meteorological Research and Development Programme. We especially wish to thank Syd Levitus for his generosity in making his updated ocean dataset available, and to colleagues at NCAR and the Hadley Centre for performing and making available the model runs used in this work. Kaplan SST data provided by the NOAA–CIRES Climate Diagnostics Center, Boulder, Colorado, from their Web site. This work benefited from the comments of anonymous reviewers, for which we thank them.
Removing Model Drift
Our objective is to remove the model drift from the detection results. In Barnett et al. (2001), we fit a second-order trend to the control run and took the difference between the anthropogenically forced ensembles and fitted trend as the signal. The difference between the unforced control run and the fitted trend was used as the noise estimate. This gets more difficult when using the dd sampling mask, since the sampling mask itself changes significantly from year to year.
s(x, t) be the sampling mask (1 if grid box was sampled, 0 otherwise),
c(x, t) be the unforced control run model data,
e(x, t, n) be the anthropogenically forced ensemble run model data, and
υ(x) be the volume of the grid box,
where x is the location, t is the year (1955–99), and n is the ensemble number.
Then, the volume-averaged temperature for the control run (Tc) and an ensemble member (Te) are time series defined as
If no low-frequency fitting to the control run were desired, the signal for any particular ensemble member n could be defined as Te (t, n) − Tc (t). The problem with this comes when estimating the noise. If a procedure analogous to that used in Barnett et al. (2001) were employed, the noise would be Tc (t) − Tc (t) ≡ 0. So that is not useful.
We instead define γ, the time series of the control run sampled with one fixed mask, where the fixed mask is taken at year τ :
The advantage of this is ∂γ/∂t depends only on the way c(x, t) depends on t. By contrast, dTc(t)/dt depends on the way both c(x, t) and s(x, t) depend on t. This is not helpful, since changes in s with time are completely nonphysical; they might be driven by budgets for observing programs, for example. We want to get at the way c(x, t) changes in time, as it is the model drift to be eliminated.
Since the time dependence of γ only depends on a quantity with a physically sensible time variation [c(x, t)], we can reasonably fit a second-order trend to it over time (t) to produce γ* (t, τ).
Now consider only the diagonal elements of the γ* array:
The two time indices (t and τ) are then redundant, so we drop the τ and refer to this simply as Γ*(t), where Γ* represents the second-order trend-fitted volume-averaged temperature (i.e., retaining only model drift) seen at time t, using the correct sampling mask for that time.
We then estimate the signal for an ensemble as
and the noise as
This estimates the signal as the difference between what is seen at each year and what would be expected to be seen if only model drift were acting at that same year, using the correct sampling mask for the year in question.
A subtle point is that the final quantity reintroduces the temporal dependence on the sampling mask. That is, d/dt[Tc(t) − Γ*(t)] depends on ds/dt again [since Γ*(t) depends on s(t)]. The trend fitting removes the effect of model drift, not the effect of sampling mask changes. Another point worth noting is that after this procedure, the temperatures are anomalies with respect to the control run, which represents preindustrial conditions. We must then compute the anomalies with respect to the observed era (1957–90) to match the treatment of the observations (Levitus et al. 2005).
Variability in Models and Observations
Spectra by basin
Figure 2 shows temperatures globally averaged over all depths, but our primary interest is how ocean warming varies regionally (in the North and South Atlantic, Indian, and Pacific Oceans) and with depth [cf. Lysne and Deser (2002) for a regional model–data comparison of variability in the Pacific]. The data used to generate the global, depth-averaged spectrum in Fig. 2 are broken apart by basin and depth in Fig. B1 for the observations: Fig. B2 for PCM, and Fig. B3 for HadCM3. Only frequencies less than 0.2 cpy are shown since we are interested in examining the slow ocean warming over decades (in section 4 temperatures are averaged by decade before forming the ocean warming fingerprint). The low-frequency power at depth should be considered uncertain due to poor sampling in early decades.
Places where observed values fall outside the model ensemble range are shown in the panels marked “outside” in Figs. B2 and B3. Solid circles indicate the observed power falls above the 90% confidence limit of the ensembles; hollow circles indicate where it falls below. A circle (hollow or solid) will be seen by chance 10% of the time; the actual percentage of locations with circles is noted in the plot title for each basin. For the purposes of detecting anthropogenic ocean warming, solid circles indicate the model is deficient in variability, which could lead to a false detection of anthropogenic influences on the oceans; hollow circles indicate the model has too much variability, which could lead to the false rejection of anthropogenic forcing as the cause of ocean warming.
The spectral power density seen in PCM is close to the values seen in the observations in all basins except the North Atlantic, where PCM tends to exhibit too much variability below 75 m. In all other basins the fraction of observed values falling outside the model ensemble range is near that expected from chance. In HadCM3, there is a tendency to have weak variability at depth, and overly strong variability in the surface layers, especially in the South Atlantic, South Pacific, and North Indian basins.
Analysis of standard deviation
Gregory et al. (2004) compared observed low-frequency temperature variability to the HadCM3 control run (natural internal variability only) by computing the standard deviation of the volume-averaged, 5-yr running mean temperature at each level from the detrended data (their Fig. 4). We will refer to this quantity as SD[Avg(T)]; it is shown for observed temperatures (detrended by removing the best fit line) as the broken line with triangles in Fig. B4. There is a peak in observed variability at 400 m, which neither model control run reproduced (Fig. B5, right panel).
The subsurface peak in observed SD[Avg(T)] gives the impression that were a ship to occupy a station, there would be greater temperature variability measured at 400 m than above or below. This is not true, as can be seen in the global average of individual standard deviations at each point, Avg([SD(T)]. Generally, SD[Avg(T)] ≪ Avg[SD(T)] because significant cancellation occurs when the volume-average temperature is taken. How much cancellation depends on the spatial coherence of temperature fluctuations, so an indication of coherence can be seen by comparing SD[Avg(T)] to Avg[SD(T)].
Figure B4 shows both quantities. The two curves have been normalized by their maximum values so they can be easily compared. They differ at the surface, where there is less cancellation (i.e., more spatial coherence) in ocean temperature than below. Presumably, this is due to the atmosphere’s imposition of its typical length scales on the surface ocean. More importantly, Avg[SD(T)] shows no peak in variability at 400 m. This implies that the observed 400-m “peak in variability” arises not because ocean variability is unusually strong at 400 m, but because it is unusually spatially coherent. This reduces the cancellation between uncorrelated points seen to a greater extent above and below 400 m when the volume-average temperature is calculated.
Figure B5 compares observations to the control model runs for both methods of calculating variability. Data have been detrended as described in section 3. The left panel in Fig. B5 shows Avg[SD(T)]; by this measure the HadCM3 control run is generally consistent with observations, while the PCM control run has somewhat too strong internal variability between 20 and 600 m. The right panel in Fig. B5 shows SD[Avg(T)]; as in Gregory et al. (2004), the main discrepancy is that both models miss the observed maximum at 400 m. Comparing the panels shows the models’ failure to reproduce the peak in SD([Avg(T)] at 400 m happens mainly because they have less spatial coherence at that depth, not because variability at individual points is too weak. Further analysis (not shown) indicates the observed spatial coherence driving the peak at 400 m is not a global phenomena, but rather is confined roughly to the northwest half of a box from 4°S to 42°S, and 105°E to 130°W. As a general characterization, Fig. B5 shows that the model control runs bracket the observed pointwise variability at all depths, but have less spatial coherence than observed (but still realistic pointwise variability) around 400 m.
Corresponding author address: David W. Pierce, Climate Research Division, Scripps Institution of Oceanography, Mail Stop 0224, La Jolla, CA 92093-0224. Email: email@example.com