Attribution of extreme precipitation with updated observations and CMIP6 simulations

While the IPCC 5 th Assessment Working Group I report assessed observed changes in extreme precipitation based on both absolute and percentile-based extreme indices, human influence on extreme precipitation has rarely been evaluated based on percentile-based extreme indices. Here we conduct a formal detection and attribution analysis on changes in four percentile-based precipitation extreme indices. The indices include annual precipitation totals from days with precipitation exceeding the 99th and 95th percentiles of wet-day precipitation in 1961-1990 (R99p and R95p) and their contributions to annual total precipitation (R99pTOT and R95pTOT). We compare theses indices from a set of newly compiled observations during 1951-2014 with simulations from models participating in the Coupled Model Inter-comparison Project Phase 6 (CMIP6). We show that most land areas with observations experienced increases in these extreme indices with global warming during the historical period 1951-2014. The new CMIP6 models are able to reproduce these overall increases, though with considerable over- or underestimations in some regions. An optimal fingerprinting analysis reveals detectable anthropogenic signals in the observations of these indices averaged over the globe and over most continents. Furthermore, signals of greenhouse gases can be separately detected, taking other forcing into account, over the globe and Asia in these indices except for R95p. In contrast, signals of anthropogenic aerosols and natural forcings cannot be detected in any of these indices at either global or continental scales.


Introduction
Changes in extreme precipitation is drawing attention because of extreme precipitation disastrous effects on human life, social economy, agriculture and ecosystem (Stocker et al. 2014). Increases in the frequency and intensity of extreme precipitation have been observed at continental to global scales (Donat et al. 2013;Hartmann et al. 2013;Min et al. 2011;Westra et al. 2013). Globally, precipitation extremes have increased in more regions than have decreased (Alexander 2016;Rhein et al. 2013). At continental scale, there exists an overall increase in the intensity and frequency of heavy precipitation in Europe, Asia, North and South America (de los Milagros Skansi et al. 2013;Donat et al. 2013;Huang et al. 2017).
It is important to understand the causes of the observed changes in extreme precipitation. Some studies have used simulations conducted with the models participated in the Coupled Model Inter-comparison Project Phase 3 (CMIP3) and 5 (CMIP5) to conduct the detection and attribution analysis of extreme precipitation. Min et al. (2011) attributed the observed increases in annual maximum 1-day precipitation (Rx1day) in the Northern Hemispheric land region to anthropogenic influence on the climate. Using CMIP5 simulations and observations with improved spatial and temporal coverage, Zhang et al. (2013) reported that anthropogenic influence on Rx1day and annual maximum 5-day precipitation (Rx5day) was detected in regions such as North America and Europe. Wan et al. (2014) showed that the anthropogenic influence in high-latitude precipitation was detectable.

/ 32
Kirchmeier-Young and Zhang (2020) provided robust evidence that human influence has intensified extreme precipitation in North America. Attribution of changes in extreme precipitation at regional or country scale is less robust. For example, while Chen and Sun (2017) showed that anthropogenic signal could be detected in extreme precipitation over China, Li et al. (2018) indicated that the detection was not robust.
While there is evidence of human influence on extreme precipitation, model simulations suggest important differences in the roles that greenhouse gases and anthropogenic aerosols play in the changes of mean and extreme precipitation. Wu et al. (2013) showed that the lack of discernable trend in global mean precipitation expected from global warming is in part due to the counteracting effects of anthropogenic greenhouse gases and aerosols. Pendergrass et al. (2015) showed that influence of greenhouse gases and aerosols on extreme precipitation can differ substantially from that on mean precipitation. Studies with climate model simulations suggested substantial influence of aerosols on extreme precipitation in East Asia and in northwest Australia (Dey et al. 2019;Lin et al. 2016;Sillmann et al. 2017;Wang 2015). Yet the individual roles of greenhouses gases and anthropogenic aerosols in the observed changes in precipitation extremes have not been studied using a formal detection and attribution framework.
Percentile-based extreme precipitation indices (Zhang et al. 2011) including R99p and R95p and their fractions in the annual total precipitation (R95pTOT and R99pTOT) have been widely used in the studies of past and future changes in extreme Accepted for publication in Journal of Climate. DOI 10.1175/JCLI-D-19-1017.1.

/ 32
precipitation (e.g. Rhein et al. 2013;Dunn et al. 2020) and assessment of changes in extreme precipitation (e.g. Hartman et al, 2013). The R99p and R95p are defined as annual total precipitation accumulations from days with precipitation exceeding respectively the 99th or 95th percentiles of the probability distribution of wet day (defined as days with precipitation amount no less than 1mm) precipitation during a base period (e.g., 1961-1990). The R95pTOT and P99pTOT are defined by the ratio of these values to annual total precipitation during wet days (PRCPTOT). These indices provide a different angle of view than that of Rx1day (Zhang et al. 2011) though it can be difficult to compare these indices with Rx1day (Schär et al. 2016). Rx1day, defined as the maximum amount of daily precipitation occurred in the year has an advantage that it occurs once every year regardless of climates and that it is widely used in many engineering applications. The number of events included in the calculation of R95p and R99p is generally different from one and varies from one region to the other depending on climates. For example, the exceedance of the 99th percentile of wet-day amounts may happen more frequently than once per year in wet climates with more than 100 wet days (precipitation > 1mm) per year, or less frequently than once per year in dry climates. The exceedance of the 95th percentile of wet-day amounts would happen more frequently than once per year in many regions. As a result, the intensity of such events is of different level of rarity than Rx1day, and less rare for R95p in most regions. Different level of rarity of extreme precipitation may respond to warming differently (e.g. Li et al. 2019 about precipitation on more than one day in a year, this may allow the extraction of more information from available data than that from Rx1day, in a way similar to that of the peak-over-threshold method for the analysis of extreme value (e.g. Coles 2001). Previous detection and attribution studies on extreme precipitation have focused mainly on Rx1day and/or Rx5day (e.g., Min et al. 2011, Paik et al. 2020). An exception might be Dong et al. (2020) who conducted detection and attribution analysis on percentile-based extreme precipitation indices for Asia.
Here we examine possible human influence on these percentile-based precipitation indices at both the global and continental scales, and paying particular attention to the possibility of separating the influence of anthropogenic greenhouse gases from other forcings such as aerosols as the availability of simulations from the Detection and Attribution Model Intercomparison Project (DAMIP, Gillett et al. 2016) component of the Coupled Model Intercomparison Project phase 6 (CMIP6) offers a unique opportunity to do so. We consider R99p and R95p. We also consider contributions of annual precipitation from very wet days (R95pTOT) and from extremely wet days (R99pTOT), defined as ratios between R99p or R95p to PRCPTOT (Dunn et al. 2020). This is because the heterogeneous nature of extreme precipitation across space and uneven distribution of available station data both in time and space present a unique challenge when computing mean values for large regions (also see data section below). The use of ratios may reduce such difficulty to some extent. The paper is structured as follows. Section 2 describes observational Accepted for publication in Journal of Climate. DOI 10.1175/JCLI-D-19-1017.1.

/ 32
and CMIP6 model data used in this study, along with an introduction about the data processing and attribution analysis method. We then present the results in section 3 and conclude the study in Section 4.

a. Regions
We examine human influence on extreme precipitation at global and continental scales. While the IPCC 6 th Assessment Working Group I report attempts to use a common set of reference regions (Iturbide et al. 2020), there has not yet an attempt to assign these reference regions with continental boundaries. For this reason, we follow the geographic demarcation of the continents used by Jones et al. (2013;Fig. S1). Here, we consider six continents including Asia (ASI), Europe (EUR), North America (NAM), South America (SAM), Australia (AUS), and Africa (AFR). Among them, observational data coverage in SAM, AUG and AFR is very limited ( Figure S2). .

b. Observations
HadEX2 (Donat et al. 2013) has been a popular global dataset for the analysis of changes in weather and climate extremes including precipitation extremes. The last year in this dataset was 2010. At the time of writing, an update to HadEX2, HadEX3, was not available. We therefore decided to compile a global dataset using HadEX2 as a model. Here we use the daily precipitation dataset compiled by Sun et al. (2020) who examined historical changes in annual maximum 1-day and 5-day precipitation.

/ 32
It extends the HadEX2 collection in time to include more recent years, and also adds more stations in several countries including Australia, Argentina, Canada, China and Russia. These additional data were obtained from relevant national meteorological services in respective countries. In total, it contains observations from 14,796 land-based observing stations with more than 30 years of record across the global land area during 1900-2018. Compared with HadEX3 (Dunn et al. 2020), which is now available, this dataset has a better spatial coverage in some regions especially in Asia, North America, Africa (Fig. S2). More details about this dataset can be found in Sun et al. (2020).
We computed R99p and R95p as annual totals of wet-day precipitation amounts that exceed the 99 th and 95 th percentiles during base period 1961-1990 as was defined in Zhang et al. (2011). We also computed the annual total wet-day precipitation (PRCPTOT). Here after, we refer the ratio between R99p and PRCPTOT as R99pTOT and that between R95p and PRCPTOT as R95pTOT. A wet day is defined as a day with precipitation amount no less than 1mm/day. We only considered long-term stations in our analyses, using stations with at least 70% of years (45 years observational data is poor prior to 1950 and after 2014 (Fig. S3). We conduct our analyses on global and continental averages of the indices. To obtain these averages, we first gridded the station anomalies (relative to the 1961-90 average) of the four indices (R95p, R99p, R95pTOT and R99pTOT) onto 1.875°x 1.25°longitude-latitude grid boxes by averaging all station anomalies within each grid cell if that grid cell has at least one station. This grid size is consistent with that used in HadEX3 (Dunn et al. 2020). It should be noted that the gridded values represent grid-box averages of point estimation of extremes, rather than extremes of grid-box mean precipitation.
We find the global and continental mean series from our data are quite similar to those of HadEX3 for the globe and the northern hemispheric continents, with clear differences in the southern hemispheric continents (Fig. S4). The global and continental mean series for northern hemispheric continents are very consistent between our series and HadEX3, indicating that the two datasets are consistent at such spatial scale despite differences in the underlying station coverage and in the methods of data processing. We also found visible and sometimes sizeable difference in the continental mean series for SAM, AUS, AFR and for Africa in particular between the two datasets. Given that the spatial coverage in the Southern Hemisphere including SAM, AUS and AFR is very limited, with the number of grid boxes less than 100 (not shown), and there exist differences between the two datasets, we conclude that the data coverage for the Southern Hemispheric continents is not sufficiently good for conducting detection and attribution analysis.

/ 32
We thus conduct detection and attribution analyses on global and continental averages in the Northern Hemisphere.
When computing global and continental averages, difficult choices need to be made regarding station density and spatial coverage and representativeness. On one hand, the use of a coarser grid without increasing the minimum number of stations per grid for gridding will result in larger areas with data coverage but at the cost of higher uncertainty in individual grid box values. On the other hand, if higher station density is required which will reduce uncertainty in the estimates for individual boxes there will be a reduction in the number of boxes with data thereby reducing the fractional area covered by grid boxes containing observations. Our choice of using the resolution of 1.875°x1.25° should be considered as more conservative in representing values at the grid box level when compared with HadEX3 data since we do not use an interpolation scheme that enables the in-filling of data for grid boxes without any observations. To explore the sensitivity of detection and attribution result to the choices made, we also gridded station data onto 3.75°x2.5° longitude-latitude grid boxes, again requiring at least one station observation in a box. We find that this difference in gridding methods can have impact in quantitative results, but the qualitative conclusions about whether different forcing signal can be detected are not affected. We choose to present the results based on the 1.  (Gillett et al. 2016). In this study, we use 7 models that have at least three ensemble members for each forcing experiment and 16 models that provide the pre-industrial control experiments. Table 1  indices used in this study is beyond the scope of this study, we will provide some basic comparisons between observation and the simulations to give a flavor of model skill. Note that detection and attribution analysis, which examines how well a model simulated signal fits the observation, is in itself a stringent form of model evaluation. Europe, and East North America. We also produced a Taylor diagram (Taylor 2001) with the spatial correlation and normalized standard deviation for mean annual indices during 1951-2014 from observations and simulations over the global and three continental regions (Fig. S9). In general, spatial correlation between observation and simulation is high, around 0.9, and the normalized standard deviations in the observation and simulations are comparable and ensemble mean is closer to observation than individual model simulations. All these indicate that the CMIP6 simulations reproduce the basic features of changes in extreme precipitation giving the confidence that can be used for the purpose of this study. Dittus et al. (2016) also found that many CMIP5 models can reproduce the observed trends in R95pTOT in many Northern Hemisphere regions.

d. Detection and Attribution
Accepted for publication in Journal of Climate. DOI 10.1175/JCLI-D-19-1017.1.

/ 32
Detection and attribution analysis is conducted with a generalized linear regression (Ribes et al. 2013), which regresses the observations Y onto multi-model simulated signal patterns X, plus internal climate variability ε: Y=(X-v)β+ε, where v represent the influence of natural internal variability in the modeled signal patterns, and the regression coefficients β are unknown scaling factors to be estimated. If the 90% confidence interval of β falls above 0, the corresponding modeled signals are considered to be detected in the observations. In these cases, if the 90% confidence interval of β also includes 1, then the modeled responses are considered to be consistent with the observations (Allen and Stott 2003;Ribes et al. 2013).
For all the observations and model simulations, the gridded anomalies (relative to  of extreme precipitation indices during 1951-2014 at grid boxes are averaged over the globe and in three continental regions. In the detection analyses, the temporal and spatial dimensions are both reduced. For the temporal dimensions, the regional averaged time series is reduced by taking non-overlapping 5-year means, producing 13 temporal points for a given region (i.e., the globe and three continents). 14 / 32 thus only have one spatial dimension for the global and regional detection analyses.
Spatial averaging reduces regional features of responses to aerosol forcing, which may make it more difficult to separate this forcing response from that of other forcings. Allowing higher dimensionality on the other hand, increases the challenges involved in estimating the covariance structure of the noise terms in the total least square (TLS) regression model.

To estimate and test the scaling factors in the detection analyses, we obtain 111
64-year non-overlapping chunks of preindustrial control simulations from 16 CMIP6 climate models to estimate internal variability in the extreme indices (Table 1) historical (ALL) forcing. For two-signal analysis, observed changes are regressed onto responses to anthropogenic forcing (ANT, in which ANT forcing equals to ALL forcing minus natural forcing) and natural (NAT) forcing simultaneously so as to distinguish the individual role of ANT and NAT forcings in driving the observed changes. For three-signal analysis, responses to GHG forcing, AER forcing, and NAT forcing are involved simultaneously in the regression. A residual consistency test is applied to test whether the modeled internal variability is consistent with that observed as represented by regression residuals (Ribes et al. 2013). In these analyses, we assume that the modeled responses of these extreme indices to anthropogenic forcings including GHG and AER forcings, and NAT forcing are additive. The basic idea for the assumption is that all of the forcings over the historical period presumably produced small perturbations to the global radiative balance, thus allowing linear decomposition of the combined response. A few studies (e. g., Shiogama et al. 2013;Marvel et al. 2015) have shown that the response to GHG forcing may be different from that due to AER and NAT forcings. In the former, forcing is via a change in longwave radiation, while for the latter two cases, forcing is dominantly via changes in shortwave radiation. Despite these differences, we find for our analyzed precipitation indices, the linear decompositions are generally reasonable since the response to ALL forcing is very similar to the sum of those from the GHG, AER and NAT forcings (not shown).  observed decreasing trends in Asia and western North America do not appear in the multi-model ensemble mean trends. Possible reasons include the possibility that the observed decreasing trends are induced by internal climate variability, which would have been smoothed out in the ensemble mean trends, or that they may be associated with regional forcings that are not adequately represented in the CMIP6 forcings. The hist-GHG simulations show similar spatial patterns of trends seen in the historical simulations, but generally with much larger trend magnitudes. In contrast, the hist-aer simulations show decreasing trends almost everywhere, with particularly strong decreases in R95p in Asia. The counteracting effects of greenhouse gases and aerosols on extreme precipitation produce the weaker increasing trends in historical simulations than in hist-GHG simulations in most continents. In Europe, the GHG response is smaller than the ALL response in recent decades because of small aerosol response and positive contribution from NAT forcing. We also observe remarkably larger decreases in these two percentile-based indices, especially R95p, in hist-aer simulations in Southeast Asia and the eastern United States, suggesting an important role of aerosols in extreme precipitation in these regions. The hist-nat simulations exhibit mixed decreasing and increasing trends of small magnitudes, suggesting a limited role of natural forcing in driving changes in the percentile-based precipitation extreme indices. Similar findings are seen in the modeled trends in the two fractional indices R99pTOT and R95pTOT, but with more uniform spatial patterns, resulting from the reduced effect of local precipitation climatology differences after normalizing the percentile-based indices by local annual total precipitation and also Accepted for publication in Journal of Climate. DOI 10.1175/JCLI-D-19-1017.1.

/ 32
from better comparability between observations and models. Figure 3 shows the 5-year mean anomaly time series for the observed and modeled precipitation extreme indices averaged over the globe and the continents. It is found that multi-model mean responses to ALL forcings reproduce reasonably well the observed upward trends for all studied indices (red lines), with the observed trends (black lines) falling within the central 90% ranges of the simulated responses of individual model runs to ALL forcings in all regions (light pink envelopes). For all indices, the multi-model mean responses to GHG forcings also simulate upward trends, but the magnitudes tend to be larger than the observed trends in many cases (blue lines). The multi-model mean responses to AER forcings show decreasing trends (brown lines). The NAT forcings responses do not exhibit statistically significant long-term trends in these indices, but they do show weak decadal variations that may reflect the effects of volcanic forcing (green lines). changes in the extreme precipitation changes, which is in line with the results reported in Dong et al. (2020), who used a shorter period  and CMIP5 simulations. In EUR and NAM, the best estimates of scaling factors are greater than unity and thus indicate an underestimate of model simulated changes compared with observations. For most cases, the residual consistency test passed except in EUR.

b. Detection results for extreme precipitation changes
In EUR, the model-simulated variability and the regression residual are not consistent  For the two percentile-based indices, the ANT signals can be detected globally and in northern hemispheric continental regions, while the NAT signals cannot be detected.
The best estimates of ANT scaling factor in GLB and three continental regions are very close to or slightly larger than unity, indicating that the ANT response from analyzed CMIP6 models are consistent with observations or tend to slightly underestimate the observed changes in these regions. In contrast, the NAT signal while the NAT signals cannot be detected. Nevertheless, the confidence intervals of the scaling factors are generally narrower than the two percentile-based indices, especially for the global analysis and in Asia. This suggests that human influence is more detectable in the ratio indices, consistent with our previous findings in Asia (Dong et al. 2020). The best estimates of the ANT scaling factor for R99p and R99pTOT in GLB and three continental regions are quite closed to unity, indicating a decent performance of the available CMIP6 in simulating the ANT signals presented in observations.
Estimates of scaling factors for the three-signal analysis are presented in Fig. 6. When the signals of GHG, AER and NAT are considered simultaneously in the detection analyses, only GHG signal can be detected in several cases. For R99p, the best estimates of the scaling factors for the GHG signal are generally greater than zero for GLB and the northern hemispheric continental regions. For R95p, the GHG signal can be detected only marginally in Asia. For both R99pTOT and R95pTOT, the GHG signals are detected only for GLB and Asia. The AER and NAT signals cannot be detected for most indices in all the regions though the NAT seems to be detected at global scale for R95p but with large confidence intervals. Also, to test the influence of the collinearity between the GHG and AER changes on the detection results, we conduct the global detection analyses based on three space dimensions from the 5-year series in three continents. We find similar results but with the residual consistency test not passed, which may reflect the difficulties to separate the response of

Conclusions and Discussions
In this study, we present a formal detection and attribution analysis of the observed changes in four precipitation extreme indices using the updated observations and the newly available CMIP6 simulations. The four extreme indices are R99p, R95p, R99pTOT and R95pTOT, representing precipitation totals from heavy precipitation days and the contributions of these extreme precipitation days to annual precipitation totals. We find that the influence of anthropogenic forcing can be detected in these extreme indices globally and in the northern hemispheric continental regions. The net influence of anthropogenic greenhouse gases is detectable globally and in some northern hemispheric continental regions such as Asia and Europe. In general, the CMIP6 models forced with all-known historical forcings reproduce the observed changes in all four extreme indices, while they may over-or underestimate the observed changes in some continental regions. For example, models substantially underestimate changes in the fractional indices in Europe and overestimate them in Asia.
Analyzing long-term changes in the percentile-based indices such as those used here provides different yet complementary picture to that of annual maximum 1-day or 5-day precipitation. In particular, it reflects the combined effect in the changes in the Accepted for publication in Journal of Climate. DOI 10.1175/JCLI-D-19-1017.1.

/ 32
frequency and intensity of heavy precipitation events. Unfortunately, this advantage comes with an important caveat: it is difficult to separate the effects of changes in the frequency and from those intensity. As different mechanisms may come into play in frequency change and in that intensity change, it can be harder to understand the physics behind the changes in these percentile indices. For example, it becomes difficult to establish a relatively simple relationship between changes in these indices and global warming levels (Li et al. 2020). While changes in Rx1day closely follow the C-C (Clausius-Clapeyron) relationship both in the observations (Westra et al. 2013), andSun et al. (2020) and in the model simulations (Kharin et al. 2013), the rates of change in R95p per 1°C global warming could be quite different from that in R99p for the same region and across regions of different climates. Table 1 List of multi-model simulations used in this study. Numbers represent the ALL, GHG, AER and NAT simulation ensemble sizes or the number of 64-year chunks for the CTL simulations.