The paper presents results from a climate change detection and attribution study on the decline of Arctic sea ice extent in September for the 1953–2012 period. For this period three independently derived observational datasets and simulations from multiple climate models are available to attribute observed changes in the sea ice extent to known climate forcings. Here we direct our attention to the combined cooling effect from other anthropogenic forcing agents (mainly aerosols), which has potentially masked a fraction of greenhouse gas–induced Arctic sea ice decline. The presented detection and attribution framework consists of a regression model, namely, regularized optimal fingerprinting, where observations are regressed onto model-simulated climate response patterns (i.e., fingerprints). We show that fingerprints from greenhouse gas, natural, and other anthropogenic forcings are detected in the three observed records of Arctic sea ice extent. Beyond that, our findings indicate that for the 1953–2012 period roughly 23% of the greenhouse gas–induced negative sea ice trend has been offset by a weak positive sea ice trend attributable to other anthropogenic forcing. We show that our detection and attribution results remain robust in the presence of emerging nonstationary internal climate variability acting upon sea ice using a perfect model experiment and data from two large ensembles of climate simulations.
One of the best quantified aspects of climate change in the Arctic is the change in the spatial extent of the sea ice as measured by passive-microwave sensors on board satellite systems since 1978. Satellite observations of the Arctic show a negative trend in sea ice concentration (SIC) in all seasons and all Arctic subregions except the Bering Sea for the 1979–2012 period (Serreze et al. 2007; Comiso et al. 2008; Stroeve et al. 2012b). The minimum sea ice extent (SIE) reached in September 2012 set a new record low, following the earlier record set in 2007. All years after 2001 have SIE minima below the climatological mean for the 1981 to 2012 reference period (Fetterer et al. 2002; Vaughan et al. 2013). Satellite observations of the modern era (from 1979 to the present) show that the downward trend in spatial sea ice extent is smaller in winter and larger in summer (Serreze et al. 2007). The September linear trend of Arctic ice decline stands at −12.4% per decade over the satellite record (Stroeve et al. 2012a). The decrease in spatial extent is accompanied by thinning of the sea ice (Rothrock et al. 1999; Lindsay et al. 2009; Kwok and Rothrock 2009; Laxon et al. 2013). Thinner sea ice is affected more strongly by air–ice–ocean interaction, allowing easier ice breakup and enhanced ice circulation, drift speed, and export rates out of the Arctic basin through the Fram Strait (Kwok et al. 2013). The lateral shrinking and vertical thinning consequently lead to an overall loss in Arctic sea ice volume (Comiso et al. 2008; Bindoff et al. 2013).
The combined effect of direct and indirect radiative processes associated with sulfate aerosols is a net cooling. Measurements of the global atmospheric sulfate aerosol burden imply that this cooling is potentially offsetting a fraction of the recent global warming that increased greenhouse gas (GHG) concentrations would otherwise have caused (Solomon et al. 2011; Sato et al. 1993; Robock 2000). Emissions of sulfur dioxide, the aerosol’s precursor gas, increased from the 1950s to the 1970s and peaked around 1980, after which the atmospheric burden has been declining (Gagné et al. 2017a). Indirect warming from the decreasing aerosol burden after the 1980s and 2000s and continuously increasing greenhouse gas–induced warming have resulted in a net warming effect in the Arctic (Gillett et al. 2000, 2003; Solomon et al. 2011; Fyfe et al. 2013; Gagné et al. 2015; Najafi et al. 2015).
Based on climate model simulations conducted for phase 3 of the Coupled Model Intercomparison Project (CMIP3), previous studies detected and quantified the relative anthropogenic (ANT) influence in the observed decline of Arctic SIE in the presence of natural (NAT) influences, due to solar and volcanic activity, and internally induced climate variability (IV) (Vinnikov et al. 1999; Gregory et al. 2002; Min et al. 2008; Heo and Min 2014).
A fraction of the GHG-induced Arctic sea ice response may be masked by the offsetting aerosol effect. Any future reduction in global aerosol emissions could therefore result in additional Arctic sea ice loss due to reduced aerosol cooling (Gagné et al. 2015). To date, no formal detection and attribution study has addressed the question of a possible offsetting effect in the trend of Arctic sea ice extent from other anthropogenic forcing (OANT), mainly tropospheric aerosols, and whether it can be detected in the observed record. In this detection and attribution study we address this question using extended data records of Arctic sea ice observations (from 1953 to 2012) that combine satellite observations and operational sea ice charts from multiple sources. We use climate simulations from eight models from phase 5 of the Coupled Model Intercomparison Project (CMIP5) under different climate forcing combinations.
2. Data and methods
a. Observations of Arctic sea ice
Because of the extensive size and inhospitable nature of the Arctic region, spatially complete observations of sea ice concentration began with the satellite era. However, multiple records of local SIC conditions and ice edge position exist for the presatellite era. These records come from various sources with different resolutions and various spatial and temporal coverage. Global coupled climate model (GCM) simulations of sea ice under ALL, GHG, and NAT forcing conducted for CMIP5 end in the year 2012 (Stroeve et al. 2012a). In this study we use time series of September Arctic SIE representing the annual minimum for the 1953–2012 period. For this period observations sufficient to estimate sea ice extent and CMIP5 simulations of the response to individual forcings exist. For simulations and observed records of SIC, SIE is calculated as the area sum of grid cells with at least 15% SIC.
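The 15% threshold rule for sea ice extent can be illustrated with a short sketch. The function name, the toy grid, and the uniform cell area below are assumptions made for illustration only; real datasets supply their own grid cell areas and land masks.

```python
import numpy as np

def sea_ice_extent(sic, cell_area, threshold=15.0):
    """Sum the area of grid cells whose concentration (in %) is at least `threshold`."""
    sic = np.asarray(sic, dtype=float)
    cell_area = np.asarray(cell_area, dtype=float)
    # NaN compares as False, so land or missing cells are excluded automatically.
    ice_covered = sic >= threshold
    return float(cell_area[ice_covered].sum())

# Toy 2 x 3 grid: concentrations in percent, cell areas in km^2 (made-up values).
sic = np.array([[0.0, 14.9, 15.0],
                [80.0, np.nan, 100.0]])
area = np.full(sic.shape, 625.0)
extent_km2 = sea_ice_extent(sic, area)
```

Note that extent counts the full area of every cell at or above the threshold, whereas sea ice area would weight each cell by its concentration.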
We use three different publicly available observational datasets that differ in the processing of the raw remote sensing data or the assimilation techniques of data from various sources.
For the presatellite era various sources of information on the Arctic sea ice condition exist in the form of operational charts. Although often limited to specific regions or time periods, operational sea ice charts provide one of the few archives of presatellite sea ice records. Their quality is not only limited by the areal coverage provided but may also be subject to the expertise of the navigation analysts who mapped the local sea ice conditions during cruises (Selyuzhenok et al. 2015). Errors in ice edge location in the Arctic and Antarctic Research Institute (AARI) charts issued before 1998 vary from 2–10 km (Polyakov et al. 2003) to 50 km (Mahoney et al. 2008). For this and other reasons, uncertainties in the sea ice extent are generally larger for the presatellite era. Therefore, we use three independently derived datasets and verify that consistent results are obtained for each. For times and regions where observations from multiple sources exist, previous studies have applied a ranking scheme to determine the best source for use in the final compiled data product (Walsh et al. 2015). We do not extend our study to years prior to 1953, as Arctic-wide coverage can then only be achieved by climatological infilling of data gaps. We use the observational sea ice extent datasets described below.
1) Walsh and Chapman compilation
In 2016 Walsh and Chapman released an updated version of gridded monthly sea ice extent and concentration fields from 1850 onward (version 1.1), called the WC dataset or just WC hereafter (Fetterer et al. 2016). From their monthly latitude × longitude sea ice concentration fields we calculate September sea ice extent for every year. The WC dataset is derived from 16 different individual sources, which can be identified by a data source flag value. It is the only existing Arctic-wide record that incorporates sea ice information in the Russian sector from naval operational ice charts provided by the AARI covering 1933–2006 (Walsh and Johnson 1979; Chapman and Walsh 2001; Mahoney et al. 2008; Walsh et al. 2015). The years 1972–78 are filled with data from the National Ice Center (NIC). For that period NIC ice analysts produced weekly hand-drawn charts, which NIC later digitized. In producing the charts, the NIC analysts used all available sea ice information, which included the Nimbus (single channel) satellite data for the years preceding the multichannel sensors (SMMR, SSM/I) that began operating in late 1978. Walsh and Chapman use climatological infilling to close gaps in the data record, which reduces low-frequency variability in their sea ice record. Therefore, we use multiple independently derived datasets and show that results are not sensitive to this choice.
2) HadISST2
The second set of sea ice observations we consider is the newest update of the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST.220.127.116.11), called HadISST2 hereafter.
HadISST2 provides sea ice concentration fields on a latitude × longitude grid from 1850 onward. Among numerous smaller improvements compared to the previous version (HadISST.18.104.22.168) HadISST2 uses a higher-resolution land mask and therefore has more ocean and ice grid cells than the previous version. Sea ice data for the 1953–1978 presatellite era rely mainly on an earlier version of WC that did not yet include the AARI data. Between 1995 and 2007 NIC sea ice charts were used as reference to adjust SIC biases between the different data sources (Titchner and Rayner 2014).
3) Pirón and Pasalodos compilation
The third dataset we consider (called PP) provides a new time series of September Arctic sea ice extent from 1935 to 2014 (Pirón and Pasalodos 2016). The observational time series produced by Pirón and Pasalodos (2016) consists of monthly mean SIE values. The authors include data for the Siberian sector from the AARI dataset. Furthermore, the dataset includes an adjustment for a discontinuity in the sea ice record at the 1978–79 boundary, which marks the switch from single- to multichannel microwave sensors on board the Nimbus satellite systems (Meier et al. 2012; Pirón and Pasalodos 2016).
b. Climate model simulations
Besides sea ice observations we analyze climate simulations from eight CMIP5 models covering the 1953–2012 period that provide single-forcing simulations under historical GHG, historical NAT, and the combined effect of ALL forcing. The historical ALL simulations end in 2005, and we extended them to the year 2012 with either extended ALL forcing simulations provided by the associated modeling centers (GISS-E2-H, GISS-E2-R, NorESM1-M) or representative concentration pathway 4.5 (RCP4.5) simulations (Miller et al. 2014). We limit our analysis to RCP4.5 since differences in simulated Arctic sea ice extent between the RCPs do not emerge prior to 2020 (Collins et al. 2013), and therefore we do not expect our results to be sensitive to the choice of RCP scenario. An overview of the climate models and corresponding ensemble sizes used in this study is shown in Table 1.
We calculate the CMIP5 multimodel ensemble (MME) mean response by giving equal weight to each model rather than weighting models with larger ensemble sizes more heavily (Table 1). Figure 1 shows an overview of both the observed and model-simulated Arctic sea ice extent response. Visible spikes of increased SIE in ALL simulations correspond well with increased stratospheric sulfate (SO₄) burden after the three most recent major volcanic eruptions from Mt. Agung (1963) and, more distinctly, after El Chichón (1982) and Mt. Pinatubo (1991), indicating that the CMIP5 sea ice models simulate temporary weakening of the spatial sea ice decline due to aerosol cooling. An Arctic SIE increase following the major volcanic eruptions was also reported in simulations and observations by Gagné et al. (2017b). The SIE response to OANT forcing is derived by subtracting the signals from GHG and NAT from simulations under ALL forcing (see Fig. 1). Since most CMIP5 models provide a larger number of historical ALL forcing simulations than those with GHG and NAT forcing, the overall ensemble size of every climate model is reduced to the smallest shared ensemble number under all three forcing experiments. Figure 1 illustrates that in the first half of the observational Arctic sea ice record the GHG and OANT responses oppose each other and that the overall linear trend in Arctic sea ice extent over the 1953–2012 period in response to OANT forcing is positive. When considering the satellite era (1979–2012) only, the linear trend in Arctic sea ice extent in the simulated OANT response is weakly negative (−0.004 × 10⁶ km² yr⁻¹). Using presatellite observations in our detection and attribution analysis therefore ensures that the GHG and OANT responses are not collinear over the full period of analysis, which helps us to distinguish the effects of these signals.
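The equal-weight multimodel mean and the residual derivation of the OANT response can be sketched as follows. This is a minimal illustration, assuming that all series are anomalies relative to a common baseline; the function names and toy numbers are hypothetical and not taken from the study.

```python
import numpy as np

def model_mean(runs):
    """Average the ensemble members of one model (rows = members, columns = years)."""
    return np.asarray(runs, dtype=float).mean(axis=0)

def multimodel_mean(per_model_runs):
    """Equal-weight multimodel mean: average within each model, then across models."""
    return np.mean([model_mean(r) for r in per_model_runs], axis=0)

def oant_response(all_resp, ghg_resp, nat_resp):
    """Residual OANT response under linear additivity: ALL - GHG - NAT."""
    return np.asarray(all_resp) - np.asarray(ghg_resp) - np.asarray(nat_resp)

# Model A has three members, model B only one (made-up anomalies).
model_a = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
model_b = [[3.0, 3.0]]
mme = multimodel_mean([model_a, model_b])

# Made-up response series for the three forcing experiments.
oant = oant_response([0.0, -0.5, -1.0], [0.0, -0.8, -1.4], [0.0, 0.1, 0.0])
```

Averaging within each model first keeps a model with many members from dominating the mean: pooling all four toy runs would give 1.5, while the equal-weight mean is 2.0.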
1) CMIP5 model selection
We use all but one of the CMIP5 models for which at least one ALL, GHG, and NAT simulation is available over the 1953–2012 period. Despite providing sufficient simulations from 1953 to 2012 under GHG, NAT, and ALL forcing, the CSIRO Mk3.6.0 model was excluded from all analyses because of its unrealistic Arctic sea ice simulation (Massonnet et al. 2012; Uotila et al. 2013). The CSIRO Mk3.6.0 model simulates excessive sea ice in the winter months due to a filter parameter that regulates spurious ice–ocean fluxes between the ocean and sea ice models, increasing the amount of sea ice formation and slowing down ice export out of the Arctic throughout the twenty-first century (Gordon et al. 2010).
A notable difference among the eight CMIP5 models is in the treatment of ozone forcing (Table 2). NorESM1-M is the only model that includes time-varying ozone in the GHG simulations, in contrast to all other climate models, which include only well-mixed greenhouse gases in these simulations. For NorESM1-M the climate response to changes in ozone concentration is therefore included in the GHG simulations. In consequence, for NorESM1-M alone the ozone signal is subtracted along with the GHG signal when deriving the OANT response.
2) Large ensemble experiments
Observed variables of the Earth system are subject to naturally generated internal variability. This internal variability arises from the coupled, complex, and highly nonlinear nature of the Earth system including atmospheric and oceanic turbulence as well as various feedback mechanisms between individual system components. Since the observed climate represents only one individual realization of the actual climate system, internal climate variability is more evident in an ensemble of climate simulations. Large ensemble (LE) experiments produce numerous simulations under slightly different initial conditions to produce an unprecedentedly large sample of underlying internal variability of the climate (Kay et al. 2015). The first large ensemble experiment considered here is the CESM Large Ensemble Project. This 35-member large ensemble is produced with CESM1(CAM5) under historical ALL forcing for the 1920–2005 period. Thereafter this large ensemble uses the representative concentration pathway 8.5 (RCP8.5). For the CESM Large Ensemble Project, ensemble spread is generated using round-off differences in the initial conditions of the atmospheric state (Kay et al. 2015). The second large ensemble experiment was conducted by the Canadian Centre for Climate Modelling and Analysis (CCCma) using the CanESM2 model under historical ALL forcing that is extended following RCP8.5 as well. Five initial ensemble members were each branched (split) in the year 1950 into 10 descendent runs, producing a 50-member large ensemble in total (Kirchmeier-Young et al. 2017). The ensemble spread in all members can then be interpreted as a measure of internal variability.
Standard detection and attribution approaches often assume that the covariance structure of internal variability is stationary. However, when studying Arctic sea ice extent this assumption is not necessarily true (e.g., with respect to seasonal ice-free conditions). Here we use the available large ensemble experiments to account for temporal changes in the covariance structure of internal variability and their implications for our detection and attribution results. Comparing two structurally different estimates of internal variability is motivated by the fact that internal variability acting upon spatial SIE may not be stationary over time, especially toward future nearly ice-free Arctic conditions.
To study the sensitivity of our results to the choice of how internal climate variability is prescribed in the fingerprinting method, we estimate two separate sets of climate variability, IV1 and IV2. The first estimate of internal variability, IV1, is constructed from 408 nonoverlapping segments of ~24 500 years of control simulations carried out by 54 CMIP5 models under constant preindustrial [i.e., pre-1850 (piControl)] climate forcing. With the exception of potential model climate drift, the covariance structure of internal variability is stationary within each segment. The second set, IV2, comes from simulations under transient ALL forcing from the two large ensemble experiments (CESM LE, CanESM2 LE) and is produced by centering all 85 large ensemble members by subtracting the large ensemble mean response to ALL forcing (see Fig. 2). Figure 3 illustrates the emergence of increasing internal variability in transient large ensemble runs in comparison to control simulations from the same two climate models. The nonstationarity emerges in the first three decades of the twenty-first century, indicated by variance ratios diverging from a horizontal uniform band, meaning that sea ice variability increases over time as mean sea ice extent decreases.
For CanESM2 it is striking that the variance ratios in the second half of the twentieth century are consistently lower than unity, indicating lower variability than in the control runs. The ratio between the mean variance across all CanESM2 LE members and the variance from a corresponding CanESM2 control run (1096 years) is only 0.77 for the 1950–2000 period, indicating lower variability in the LE historical ALL forcing simulations than in the control simulation under preindustrial forcing. Since the mean Arctic SIE in September for the 1950–2000 period is similar in the two sets of runs (i.e., 5.17 million km² for CanESM2 LE and 5.06 million km² for the CanESM2 control run), the difference in variability cannot be explained by the conditional link between mean SIE and its variance alone and must arise in the simulations themselves (e.g., potential model drift) (Sen Gupta et al. 2013). Differences between temporal estimates of variance from the preindustrial control simulation and ensemble estimates from transient runs may be caused by spurious forced variability if the ensemble spread of the large ensemble has not grown enough over time to reach the equilibrium variance (i.e., CanESM2 LE members are split in 1950 and then run for another 150 years). More detailed discussions of model and observation biases and comparisons of metrics on simulations from CanESM2 and CanESM2 LE are given in Massonnet et al. (2012), Merryfield et al. (2013), Shu et al. (2015), and Bajish et al. (2015).
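The construction of IV2 and the variance ratios can be sketched in simplified form. The example assumes synthetic data, a single large ensemble, and a per-year across-member variance; the function names, ensemble size, and noise levels are illustrative assumptions and do not reproduce the procedure behind Fig. 3.

```python
import numpy as np

def center_ensemble(members):
    """Remove the forced response by subtracting the ensemble mean at every time step."""
    members = np.asarray(members, dtype=float)  # shape: (n_members, n_years)
    return members - members.mean(axis=0, keepdims=True)

def variance_ratio(members, control_variance):
    """Across-member variance for each year divided by the variance of a control run."""
    members = np.asarray(members, dtype=float)
    return members.var(axis=0, ddof=1) / control_variance

# Synthetic ensemble: declining forced response with noise that grows over time.
rng = np.random.default_rng(0)
n_members, n_years = 50, 100
forced = np.linspace(6.0, 2.0, n_years)      # made-up mean SIE
noise_std = np.linspace(0.3, 0.6, n_years)   # made-up, increasing variability
members = forced + noise_std * rng.standard_normal((n_members, n_years))

anomalies = center_ensemble(members)          # IV2-like residual variability
ratios = variance_ratio(members, control_variance=0.3 ** 2)
```

In this toy setup the ratios stay near unity early on and drift upward later, which is the kind of departure from a horizontal band that signals nonstationary variability.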
c. Detection and attribution
Detection and attribution of Arctic sea ice change and its effects on the global climate contributes to our understanding of the physical scientific basis of climate change. In the context of climate change detection and attribution studies detection is the demonstration that a certain index of the climate system has changed over a given time in some specific statistical sense (Hegerl et al. 2010; Bindoff et al. 2013; Zwiers et al. 2014).
Detection, in the detection and attribution formalism, involves the computation of regression parameters, or scaling factors, that adjust model-simulated response patterns under a given forcing (i.e., fingerprints) to best match the observations. If a scaling factor of a corresponding fingerprint is significantly (10% significance level) different from zero and positive, then that signal is detected in the observations. Negative scaling factors do not allow for a physical interpretation even if the associated confidence interval excludes zero but might be an indication that important real world processes are not well captured in the model simulation or that fingerprints from other forcings need to be considered in the detection and attribution study as well.
The second step in the detection and attribution framework is attribution, which assesses the relative contributions of multiple plausible (i.e., in a known physical sense) change signals by assigning weights (scaling factors) to the different change signals and assigning statistical confidence (Hegerl et al. 2010). If a fingerprint exceeds the observed signal (e.g., due to missing moderating forcing components or structural biases between observations and simulations) the analysis may produce a scaling factor that is less than unity to account for that mismatch (Mitchell et al. 2001; Allen and Stott 2003). Similarly, a scaling factor greater than one indicates a modeled response that is smaller than that in observations. If a value of unity for a factor cannot be excluded at a specific confidence level, a statement of attribution is made.
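The decision rules for detection and attribution can be written as a small helper. This is a sketch of the logic as described in the text, assuming a closed 90% confidence interval given by its lower and upper bounds; the function name and the returned keys are hypothetical.

```python
def assess_scaling_factor(lower, upper):
    """Classify a scaling factor from the bounds of its 90% confidence interval."""
    # Detected: the interval excludes zero and lies on the positive side.
    detected = lower > 0.0
    # Consistent with unity: a value of one cannot be excluded.
    includes_unity = lower <= 1.0 <= upper
    return {
        "detected": detected,
        "attributable": detected and includes_unity,
        # Entirely negative intervals have no physical interpretation.
        "negative": upper < 0.0,
    }

consistent = assess_scaling_factor(0.6, 1.3)
not_detected = assess_scaling_factor(-0.2, 0.9)
underestimated = assess_scaling_factor(1.2, 2.0)
```

The third case is detected but not consistent with unity, which corresponds to a simulated response that is smaller than the observed one.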
1) Regularized optimal fingerprinting
In this detection and attribution study the observed changes in Arctic September sea ice extent are related to the relative contribution of three climate forcings: GHG, NAT, and OANT. The detection and attribution framework uses an adaptation of the total least squares (TLS)-based optimal fingerprinting method (Allen and Stott 2003) referred to as regularized optimal fingerprinting (Ribes et al. 2013). This approach accounts for internal variability in both the observational record and the simulated response to forcings (Allen and Stott 2003), based on simulated control variability.
A key aspect of the total least squares approach is to reduce uncertainty in the estimate of the covariance matrix of the internal variability to the extent possible so that signal uncertainty estimates are well characterized. For this reason, we use the regularized covariance matrix estimator that is suggested by Ribes et al. (2013) and Ribes and Terray (2013). We also use essentially all available CMIP5 control runs to construct covariance matrix estimates (24 500 years of control run simulations) and assess the sensitivity to using covariance matrix estimates that are constructed by pooling data from two available large ensembles (equivalent to 5000 years of simulated internal variability).
Similar to the detection and attribution approach applied by Najafi et al. (2015), we derive scaling factors for GHG, OANT, and NAT from simulations under GHG, ALL, and NAT by first regressing observations onto the available response patterns to obtain scaling factors for GHG, ALL, and NAT. In a second step the response to ALL forcing in the initial regression model [Eq. (1)] is decomposed assuming linear additivity
and is substituted with
Then the scaling factors for GHG, OANT, and NAT can be written as
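In an assumed notation, with $y$ the observed SIE series, $X_F$ the simulated response to forcing $F$, and $\varepsilon$ the internal variability, the three steps just described can be sketched as follows. The primed coefficients and all symbol names are illustrative rather than taken from the original equations, and the noise terms that total least squares places on each response pattern are omitted for brevity.

```latex
\begin{align}
  y &= \beta'_{\mathrm{GHG}} X_{\mathrm{GHG}}
     + \beta'_{\mathrm{ALL}} X_{\mathrm{ALL}}
     + \beta'_{\mathrm{NAT}} X_{\mathrm{NAT}} + \varepsilon
     && \text{(initial regression)} \\
  X_{\mathrm{ALL}} &= X_{\mathrm{GHG}} + X_{\mathrm{OANT}} + X_{\mathrm{NAT}}
     && \text{(linear additivity)} \\
  y &= \left(\beta'_{\mathrm{GHG}} + \beta'_{\mathrm{ALL}}\right) X_{\mathrm{GHG}}
     + \beta'_{\mathrm{ALL}} X_{\mathrm{OANT}}
     + \left(\beta'_{\mathrm{NAT}} + \beta'_{\mathrm{ALL}}\right) X_{\mathrm{NAT}}
     + \varepsilon
     && \text{(after substitution)} \\
  \beta_{\mathrm{GHG}} &= \beta'_{\mathrm{GHG}} + \beta'_{\mathrm{ALL}}, \qquad
  \beta_{\mathrm{OANT}} = \beta'_{\mathrm{ALL}}, \qquad
  \beta_{\mathrm{NAT}} = \beta'_{\mathrm{NAT}} + \beta'_{\mathrm{ALL}}
\end{align}
```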
Again, if a scaling factor of a response pattern is statistically significantly (10% significance level) different from zero and positive, then that signal is detected in the observations. If the scaling factor is close to unity it means that the observed change is in close agreement with the estimated model response to ALL, NAT, and GHG forcing respectively.
After estimating the signal amplitudes, the last step in regularized optimal fingerprinting is to conduct a residual consistency test (RCT) to check if the estimated regression residuals are consistent with the climate model noise represented by a second estimate of the covariance matrix of internal variability. The two covariance matrix estimates are constructed by dividing the initial full set of available control run segments in half. After removing all externally forced signals in the regression procedure, the regression residuals are expected to be consistent with internal variability. The computation of the residual consistency test statistic involves an F test that compares the variances of the regression residuals with the internal variability estimated from control simulations. If the test statistic produces p values between 0.05 and 0.95 (10% significance level), the test is passed (Allen and Stott 2003; Rupp et al. 2013). An alternative version of the residual consistency test procedure adapted for regularized optimal fingerprinting that does not rely on parametric distributions is presented in Ribes et al. (2013).
Passing the residual consistency test indicates that the overall assumptions of the statistical model cannot be rejected at a given significance level. The test fails if at least one underlying assumption is violated, such as if the estimated internal variability from models is different from that in observations or if models simulate substantially incorrect response patterns. A detailed description of the residual consistency test is given in Allen and Stott (2003) and Ribes et al. (2013).
2) Sensitivity analysis
To study the sensitivity of our detection and attribution results to nonstationary internal variability, two perfect model experiments were carried out making use of the two available large ensemble datasets. Using only a single model in a perfect model experiment has the benefit of reducing the influence of uncertainty associated with differences in model forcing, observational uncertainty, and structural model differences inherent to multimodel ensembles like CMIP5.
In our perfect model experiment setup we create a one-signal test case where the detection and attribution algorithm is used to detect a fingerprint derived from all-but-one large ensemble members in “observations” that are represented by the remaining one member. First, similar to fingerprinting methods from other detection and attribution studies such as those of Min et al. (2011), Rupp et al. (2013), Planton et al. (2013), and Najafi et al. (2015, 2016), unforced CMIP5 control runs are used to estimate internal variability. Then the experiment is repeated using versions of nonstationary covariance structures constructed from transient large ensemble simulations under ALL forcing. The standard deviation of unforced control simulations from the models under investigation is 0.357 × 10⁶ km² for CanESM2 and 0.270 × 10⁶ km² for CESM1, both of which fall close to the mean standard deviation from 54 other CMIP5 models (0.346 × 10⁶ km²) whose simulations are used to estimate IV1. While a fairer evaluation of the impact of the source of internal variability information would use control simulations from CanESM2 and CESM1 only, doing so would provide only a small fraction of available control run segments compared to the full CMIP5 archive to estimate internal variability. In addition, using control simulations from multiple CMIP5 models to estimate internal variability is common practice in the recent detection and attribution literature including Jones et al. (2013), Knutson et al. (2013), Gillett et al. (2013), Rupp et al. (2013), Wan et al. (2015), and Najafi et al. (2015). Here, we assess this practice and its suitability for detection and attribution in the context of Arctic sea ice extent. The perfect model experiment is repeated for four different analysis periods, the first from 1950 to 2005 and each following period extended by 15-yr increments up to 2050.
This allows the onset of the sensitivity of the detection and attribution results to the choice of the estimate of internal variability to be narrowed down. Differences in the detection and attribution results are manifested in the calculated scaling factors and their 90% confidence intervals (90% CIs) as well as the success rate of the residual consistency test. The expectation is that in a perfect model experiment the failure rate for the residual consistency test (using a 90% CI) should be about 10%.
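The leave-one-out structure of the perfect model experiment can be sketched as below. For brevity the sketch swaps regularized optimal fingerprinting for an ordinary least squares fit of a single signal, so it illustrates only how pseudo-observations and fingerprints are paired; the synthetic ensemble, the function name, and all numbers are assumptions.

```python
import numpy as np

def leave_one_out_scaling(members):
    """Scaling factor for each member regressed onto the mean of all other members."""
    members = np.asarray(members, dtype=float)  # shape: (n_members, n_years)
    n_members = members.shape[0]
    betas = np.empty(n_members)
    for i in range(n_members):
        # The held-out member plays the role of the observations.
        pseudo_obs = members[i] - members[i].mean()
        # The fingerprint is the mean of the remaining members.
        fingerprint = np.delete(members, i, axis=0).mean(axis=0)
        fingerprint = fingerprint - fingerprint.mean()
        # One-signal OLS estimate (a simplification, not the TLS-based method).
        betas[i] = fingerprint @ pseudo_obs / (fingerprint @ fingerprint)
    return betas

# Synthetic 20-member ensemble over 56 "years" with a made-up forced decline.
rng = np.random.default_rng(1)
forced = np.linspace(0.0, -2.0, 56)
ensemble = forced + 0.3 * rng.standard_normal((20, 56))
betas = leave_one_out_scaling(ensemble)
```

Because pseudo-observations and fingerprint come from the same model, the scaling factors are expected to scatter around unity, and the spread of that scatter reflects internal variability alone.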
3. Results
a. Detection and attribution results
The detection and attribution results for the multimodel mean (MMM) response are shown in Fig. 4. All three fingerprints from OANT, GHG, and NAT forcing are detected in all three observed records (HadISST2, WC, PP) at the 10% significance level. The detection results for the MMM are consistent when using the two different estimates of internal variability IV1 (control runs) and IV2 (large ensemble runs). The multimodel regression is found to have residuals consistent with simulated internal variability and the residual consistency test is passed for all three observational datasets at the 10% significance level.
Minimal differences between the WC and PP observational datasets are reflected in the similarity of the estimated scaling factors. The simulated sea ice response under GHG forcing is in closest agreement with the observed changes in September Arctic sea ice extent (i.e., scaling factors closest to unity) for all datasets while exhibiting the narrowest uncertainty ranges of the 90% CI. Consistent with a previous study that examines the role of aerosol forcing in the evolution of Arctic SIE (Gagné et al. 2017a), we find that OANT is well detected with a scaling factor that is only somewhat more uncertain than for GHG. Furthermore, NAT is also detected in all datasets, but with greater uncertainty than for the other signals. Using IV2 for internal variability slightly increases the uncertainty of the estimated scaling factors under all three forcings.
The detection and attribution results for individual CMIP5 models when using IV1 are presented in Fig. 5. Our detection and attribution results vary only slightly when IV2 is used instead (not shown). It is worth noting that signals from some models are estimated from a single simulation or very few simulations, which results in relatively uncertain signal estimates. In particular, the majority of scaling factors with open confidence intervals belong to signals estimated from three or fewer simulations.
b. Perfect model experiment
Results from the perfect model experiment using the two available large ensemble datasets (CanESM2 LE, CESM LE) are presented in Figs. 6 and 7. Both figures are structured into four panels each representing different analysis periods. Dots represent the estimated scaling factor while whiskers illustrate the associated 90% confidence intervals. Black colors are used for results using CMIP5 control runs to estimate internal variability. Red colors are used when internal variability was estimated using large ensemble residual variability. The residual consistency test score (RCTS) indicates the success rate of passing the test for both choices of internal variability.
The uncertainty in the estimated scaling factors is largest for both models when they are based on the first period, 1950–2005, which is the shortest period and, comparably, has the weakest signal. Overall the differences in the detection and attribution results appear to be unsystematic.
In the second analysis period from 1950 to 2020 the signal is detected in all cases for both models. Also a noticeable decrease in uncertainty in the estimated scaling factors is visible across both LEs.
In the third period from 1950 to 2035 the ALL signal from individual large ensemble members is always detected in the ensemble mean for both perfect model experiments. The uncertainty associated with corresponding beta terms is decreased compared to the two previous analysis periods. Only minimal differences in the scaling factors exist across the two models.
For the full available analysis period from 1950 to 2100 the ALL response is detected in all cases and with the smallest uncertainty range. The overall uncertainty ranges for estimated scaling factors in both models are smaller when IV2 is used instead of IV1.
Overall, the CanESM2 perfect model experiment results seem to be independent of the choice of internal variability, while the CESM large ensemble scaling factors are generally in closer agreement with unity when IV2 is used instead of IV1 (see Figs. 6 and 7). This result is consistent with the fact that the CESM large ensemble also shows a stronger nonstationarity in variance.
In all periods analyzed, the perfect model experiment results depend only weakly on the choice of internal variability. However, larger differences exist in the achieved RCTS. Across models, the RCTS is higher in six out of eight cases when transient IV2 is used.
In conclusion, our perfect model experiment results give no strong indication that nonstationary internal variability has any influence on the robustness of the estimated scaling factors in the contemporary detection and attribution analysis period from 1953 to 2012. Differences in the RCTS exist but are rather small over this time. However, the perfect model experiment results also suggest that nonstationarity in Arctic internal sea ice variability is emerging in the most recent decade and that future detection and attribution studies will have to address this issue, potentially making use of transient climate simulations rather than unforced control runs.
c. Attributable trends
In the next step we compare the observed trend of Arctic sea ice decline between 1953 and 2012 with the trend from the multimodel mean response under GHG, ALL, and NAT forcing, from which we calculate the contribution from OANT forcing (see Table 3). Here we only consider the observational record WC since it includes more information than HadISST2 and differs only marginally from the PP data compilation.
Sea ice trends over the 1953–2012 period that are attributable to GHG, OANT, and NAT are calculated by multiplying the trends in the multimodel mean forced responses by the estimated scaling factors from the detection and attribution results. The attributable trends for the CMIP5 multimodel mean response and WC trends are shown in Fig. 8. More detailed results are shown in Table 4.
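The calculation described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the forced response series, and the scaling factor values are hypothetical, and the interval reflects only the uncertainty in the scaling factor, not in the simulated trend itself.

```python
import numpy as np

def linear_trend(series, years):
    """Least-squares slope of series against years (series units per year)."""
    years = np.asarray(years, dtype=float)
    # Centering the years keeps the fit well conditioned.
    return float(np.polyfit(years - years.mean(), series, 1)[0])

def attributable_trend(response, years, beta, beta_lo, beta_hi):
    """Scale the trend of a forced response by a scaling factor and its bounds.

    Returns (best estimate, lower bound, upper bound) of the attributable trend.
    """
    trend = linear_trend(response, years)
    # Sorting handles negative trends, where the larger scaling factor
    # gives the more negative bound.
    lo, hi = sorted((beta_lo * trend, beta_hi * trend))
    return beta * trend, lo, hi

# Hypothetical example: a forced response declining by 0.05 units per year
# and a hypothetical scaling factor of 1.2 with interval [0.8, 1.6].
years = np.arange(1953, 2013)
ghg_response = -0.05 * (years - years[0])
best, lo, hi = attributable_trend(ghg_response, years,
                                  beta=1.2, beta_lo=0.8, beta_hi=1.6)
print(best, lo, hi)
```

Applying the same function to each forced response with its own scaling factor gives one attributable trend per forcing, which can then be compared with the observed trend.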
Figure 8 shows that the best estimate of the trend attributable to NAT forcing is roughly zero, which is consistent with the expectation of weak long-term trends in solar and volcanic forcing over this period. The trend attributable to GHG forcing is more negative than the observed negative trend of Arctic sea ice decline in the WC data. Our results imply that the difference may be explained by a small positive annual sea ice trend attributable to the response to OANT forcing.
Thus we estimate that GHG and NAT forcing alone would have resulted in a significantly (10% significance level) more negative simulated Arctic sea ice extent trend of −0.088 × 10⁶ km² yr⁻¹ (from −0.107 to −0.044 × 10⁶ km² yr⁻¹), compared to −0.063 × 10⁶ km² yr⁻¹ (from −0.080 to −0.006 × 10⁶ km² yr⁻¹) under ALL forcing. We calculate that roughly 23% of the decline has been offset by the combined cooling effect from OANT forcing.
4. Discussion and conclusions
Using the CMIP5 model ensemble, we find that fingerprints from GHG, OANT, and NAT forcing can be robustly detected in all three observational datasets of September Arctic SIE. For the first time we detect the response to OANT and NAT forcing separately in observations of SIE for the full Arctic using a formal detection and attribution approach. Modeled fingerprints are in closest agreement with the observed Arctic SIE in the PP data record (scaling factors closest to unity). The strong similarity between PP and WC sea ice fields is also reflected in the detection and attribution results. When the HadISST2 dataset is used, all fingerprints are also detected. However, in HadISST2 sea ice information from the Russian sector of the Arctic relies on an older version of WC and does not yet include the more refined AARI sea ice information. This additional observational uncertainty may be a reason for stronger deviations between modeled and observed sea ice fields in the case of HadISST2.
The combined cooling effect from other anthropogenic forcing (OANT), mainly aerosols, is detectable in all three available datasets, consistent with previous studies on aerosol offsetting of Arctic temperatures (Najafi et al. 2015). This increases our confidence in the ability of the multimodel ensemble mean signals and the available observational datasets to describe some important features of the evolution of Arctic sea ice extent. Using the 10% significance level, we show that OANT has offset about 23% of the decline that would have been expected in the absence of OANT forcing due to the combined climate response from GHG and NAT forcings. An implication is that any reduction in the global aerosol burden may expose the impact of GHG forcing on Arctic SIE more strongly than has already occurred (Kloster et al. 2010; Levy et al. 2013). Furthermore, our findings strengthen confidence in studies showing that aerosol decrease may contribute substantially to future sea ice decline (Gagné et al. 2015).
The detection of natural external (NAT) forcing in the observed records of September Arctic sea ice extent could potentially be linked to temporary cooling from volcanic forcing and the associated increase of stratospheric aerosols (Fyfe et al. 2013).
We have also shown that the natural variability acting upon SIE is not stationary. Therefore, we estimated the covariance matrix of internal variability, a key element of the detection and attribution framework, in two ways (IV1, IV2) to test the sensitivity of the detection and attribution results to transience in the characteristics of the underlying internal variability. When comparing the detection and attribution results obtained with the two estimates of internal climate variability for the CMIP5 MME, no striking differences can be identified for the analysis period. Furthermore, when looking at individual models, the signal detection is generally not sensitive to the choice of internal variability, and results vary only slightly.
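The difference between a stationary and a transient estimate of internal variability can be illustrated with a short sketch. It assumes, as a simplification, that the stationary estimate comes from non-overlapping segments of an unforced control run and that the transient estimate comes from the residuals of large ensemble members about their ensemble mean; the function names and synthetic data are hypothetical, and any temporal averaging or regularization applied in the actual analysis is not reproduced.

```python
import numpy as np

def cov_from_control(control, seg_len):
    """Stationary estimate: covariance across non-overlapping control segments."""
    control = np.asarray(control, dtype=float)
    n_seg = control.size // seg_len
    segs = control[:n_seg * seg_len].reshape(n_seg, seg_len)
    # Express each segment as anomalies about its own time mean.
    segs = segs - segs.mean(axis=1, keepdims=True)
    # Assumption: unforced anomalies have zero expected value at each step.
    return segs.T @ segs / n_seg

def cov_from_ensemble(ensemble):
    """Transient estimate: covariance of member residuals about the ensemble mean."""
    ensemble = np.asarray(ensemble, dtype=float)
    n_members = ensemble.shape[0]
    resid = ensemble - ensemble.mean(axis=0, keepdims=True)
    # Removing the ensemble mean costs one degree of freedom per time step.
    return resid.T @ resid / (n_members - 1)

# Synthetic example: control noise with constant spread, and an ensemble
# whose spread grows over time (nonstationary variance).
rng = np.random.default_rng(1)
n_years = 60
control_run = rng.normal(0.0, 0.5, size=10 * n_years)
spread = np.linspace(0.2, 1.0, n_years)
ens = rng.normal(0.0, 1.0, size=(50, n_years)) * spread

cov_stationary = cov_from_control(control_run, n_years)
cov_transient = cov_from_ensemble(ens)
print(np.diag(cov_transient)[:10].mean(), np.diag(cov_transient)[-10:].mean())
```

The diagonal of the ensemble-based matrix tracks the change in variance through time, which a control-based estimate cannot capture by construction.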
The perfect model detection and attribution results for the 1950–2020 analysis period have the largest overlap with the real-world detection and attribution analysis period covering 1953–2012. No noticeable differences in the ALL signal detection frequency and accuracy are apparent that could be attributed to the choice of the internal variability estimate. Although increasingly nonstationary internal variability acting upon Arctic sea ice is unfolding in the current decade, its effect on our detection and attribution study covering 1953–2012, while present, is small. However, detection and attribution methods based on stationary linear regression models might be limited in the future, especially in the context of SIE, because of increasingly nonstationary internal variability in the near term, and alternative approaches will become more applicable (Ribes et al. 2017).
Beyond the relevance for the estimation of the covariance matrix of internal climate variability in the detection and attribution formalism, some detection and attribution related studies make use of unforced control simulations for constraining future projections of the climate system (Allen et al. 2000; Stott and Kettleborough 2002) and the likelihood of extreme events (Sun et al. 2014). In both cases the internal variability is assumed to be stationary because unforced control simulations are used to evaluate the uncertainty of projected future climate conditions. This practice will have to be altered if the internal variability is in fact nonstationary. Both approaches could be revisited using large ensemble perfect model approaches to test their working assumptions.
Our study has been limited to periods for which reliable (i.e., spatially and temporally consistent) observations of Arctic sea ice exist. Observationally based data products that use climatological infilling (e.g., early HadISST sea ice data) are generally not suitable for detection and attribution. Hence, all datasets used in this study are limited to the 1953–2012 period, for which temporal and spatial data coverage is sufficient to reliably estimate Arctic sea ice extent. Using recently available compilations, we have extended previous studies that were confined to the satellite era to a much longer period. However, in some cases observations for Arctic subregions prior to 1953 exist that could be explored in future detection and attribution studies to investigate regional climate change signals.
In addition, our analysis has been somewhat limited by the climate models that were available. Climate models represent many of the physical processes that control sea ice formation (e.g., melt, transport, and deformation), but many subgrid-scale processes must be parameterized, and the development of more complex and higher-resolution climate models is far from complete. In this study we analyzed model output from eight CMIP5 models that provided at least one simulation under GHG, ALL, and NAT climate forcing for the 1953–2012 period, and we excluded the CSIRO Mk3.6.0 model because of very large biases in its sea ice simulation.
In future phases of coupled model intercomparison, such as CMIP6, more models will produce more realizations under optimized forcing experiments, including special aerosol-only simulations, which will allow us to follow up on the question of how Arctic sea ice conditions will evolve in the future under changing aerosol emissions. Furthermore, aspects of model selection and model codependencies deserve further reflection (Gillett et al. 2016; Knutti et al. 2017).
Acknowledgments. We acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output. For CMIP the U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. B.M. acknowledges funding from the Canadian Sea Ice and Snow Evolution Network (CanSISE). M. Piron, H. Titchner, and J. Walsh are thanked for their comments on available observational datasets. J. Fyfe, N. Swart, G. Flato, R. Najafi, A. Dirkson, and B. Johnson are thanked for their comments on earlier versions of the manuscript. A.H.M. acknowledges funding from the Natural Sciences and Engineering Research Council of Canada (NSERC).