Simulated Tropical Precipitation Assessed across Three Major Phases of the Coupled Model Intercomparison Project (CMIP)

The representation of tropical precipitation is evaluated across three generations of models participating in phases 3, 5, and 6 of the Coupled Model Intercomparison Project (CMIP).


Introduction
The representation of tropical precipitation has never been a strength of global climate models. Some reasons are well known, but have proven difficult to improve with classical climate modeling approaches. This includes the representation of moist convection, which produces the majority of precipitation in the tropics, but is a process that coarse-resolution climate models must parameterize with the help of resolved processes. It is known that model differences in precipitation arising from such an approach can be substantial (e.g., Dai 2006; Stevens and Bony 2013). In reviewing progress over past phases of the Coupled Model Intercomparison Project (CMIP), Stouffer et al. (2017) identify six "particularly important and long-standing biases" that the authors hope will be reduced in CMIP's sixth phase (CMIP6). First among these is the misrepresentation of tropical precipitation, in the form of tropical rainbands being too hemispherically symmetric, something known as the double intertropical convergence zone (ITCZ) bias. Other studies have pointed to further deficiencies, for example in the representation of the summer monsoon (Zhang et al. 2015), modes of internal variability (Ahn et al. 2017), and the intensity distribution and extremes of precipitation (Stephens et al. 2010).
Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/MWR-D-19-0404.s1.
A correct simulation of the tropical climate matters, not only directly for the region, but also indirectly by influencing the response of the general circulation to forcing at global scales (Held 1983; Palmer and Owen 1986; Zhou and Xie 2015). Precipitation is important due to its many impacts, ranging from ecosystems (Cox et al. 2000) to air pollution (Rodhe and Grandell 1972; Baker and Charlson 1990; Bourgeois and Bey 2011). Hence the past decades have witnessed substantial efforts to improve precipitation in climate models, including the representation of the hydrological cycle in the tropics. Despite these efforts, progress has proven unsatisfactory in past CMIP phases (Hawkins and Sutton 2011; Knutti and Sedlácek 2012; Flato et al. 2013), so much so that some have suggested paying the computational price of resolving precipitating convection and abandoning the traditional approach to climate modeling with parameterized convection for studying tropical precipitation (Schär et al. 2020; Palmer and Stevens 2019; Satoh et al. 2019). In evaluating these arguments it seems sensible to ask whether progress in simulating tropical precipitation is as unsatisfactory as past evaluations of CMIP models suggest. This question motivates the present study, which revisits tropical precipitation over the three major phases of CMIP: CMIP3, CMIP5, and now CMIP6.
At first glance, the hope that CMIP6 models would substantially address the long-standing biases in precipitation appears unfulfilled. CMIP6 models continue to show large differences in precipitation compared to observations (Fig. 1). Half of the global precipitation occurs between 30°S and 30°N, a region we refer to as the tropics. Regional model biases relative to data from the Tropical Rainfall Measuring Mission (TRMM; Huffman et al. 2007, 2010) range from −3 to 4 mm day⁻¹ (Fig. 1). These occur partly in regions where the absolute amount is smaller than the tropical mean of 3.58 mm day⁻¹ (e.g., in the southeast Pacific and southern Atlantic). Spatial disagreements include a southward-displaced precipitation maximum over the Atlantic Ocean, a double-ITCZ pattern in precipitation over the Pacific Ocean, and an east-west precipitation anomaly over the Indian Ocean.
The question remains whether biases in tropical precipitation in CMIP6 models have been reduced compared to previous phases of CMIP. By combining the expertise of many authors, we apply here different previously used methods to broadly assess the representation of tropical precipitation across models participating in CMIP6. By applying the same methods to model output from the third and fifth phases of CMIP, we evaluate the extent to which model developments have been successful in improving tropical precipitation.
Much of what we show effectively extends previous studies on tropical precipitation in earlier CMIP models to CMIP6. The novelty of the present study thus lies not in any specific analysis, but rather in our use of existing techniques to develop and take stock of the big picture. Specifically, by looking systematically at the representation of tropical precipitation by three generations of CMIP models, across different regions and scales as measured by various metrics, we assess the status and progress in climate modeling for tropical precipitation.
For the purpose of our study, we collected observations and model output from historical simulations with 3-hourly to monthly resolution from 97 different data sources and applied 14 different analysis approaches. The analyses are based on known methods and chosen for their merit in giving a broad view of different characteristics. Our data and the analysis strategy are introduced in the next section (section 2), followed by the presentation of the results of this analysis, which are distributed across four sections, focusing on the climatology (section 3), natural cycles associated with solar radiative effects (section 4), modes of internal variability (section 5), and long-term trends in the twentieth century (section 6). Opportunities for future research are discussed in section 7. We end with our conclusions in section 8.

1) MODEL OUTPUT
We assess the historical simulations of global coupled climate models produced for the last three major phases of the Coupled Model Intercomparison Project: CMIP3 (Meehl et al. 2007), CMIP5 (Taylor et al. 2012), and CMIP6 (Eyring et al. 2016). In these simulations, the boundary conditions (e.g., irradiation, aerosols, orbital parameters, and greenhouse gas concentrations in the atmosphere) represent those estimated for the historical time period in the CMIP phase and therefore differ slightly from one another. The historical simulations in all phases of CMIP start in 1850 but end in 2000, 2005, and 2014 for CMIP3, CMIP5, and CMIP6, respectively. (Tables S1-S4 in the online supplemental material list the model output used here.) The availability of model output differs across the CMIP phases and the participating models. We therefore chose the data considering the availability and intended analyses as follows: 1991-2000 for subdaily (3-hourly), 1961-2000 for daily, and 1900-2000 for monthly and annual analyses. For CMIP6, we additionally use data in the period 2000-14 for comparison against the current state-of-the-art observational record for the same time period (section 2b). Analyzed variables are total surface precipitation for all output frequencies as well as near-surface winds and top-of-the-atmosphere outgoing longwave radiation for daily to annual time scales. All CMIP data are averages over the given output intervals. CMIP3 and CMIP5 simulation results are summarized in the corresponding chapters of the fourth and fifth IPCC Assessment Reports (Randall et al. 2007; Flato et al. 2013).
FIG. 1. Long-term multimodel means of CMIP6 precipitation. Shown are the spatial distributions of the present-day precipitation statistics of CMIP6 as (a) the multimodel mean, (b) bias over land including small islands compared to gridded station observations from CRU, and (c) the mean bias in the tropics against TRMM. The thick contour indicates the isoline for the tropical mean precipitation of CMIP6 (3.58 mm day⁻¹) for an easier comparison of regional biases to the precipitation amount. Biases are calculated from the monthly climatology for 2000-14. We use ensemble averages for models with several historical simulations.
Access to the CMIP data is facilitated by the Earth System Grid Federation (ESGF; Williams et al. 2016). For practical reasons, we use ESGF-published model output that had already been replicated by the German Climate Computing Center [Deutsches Klimarechenzentrum (DKRZ)] by 1 October 2019. Additionally, we use the not-yet-published model output from MPI-ESM-LR produced by the Max-Planck-Institute for Meteorology for CMIP6.

2) OBSERVATIONS
We use four observational datasets, listed in Table 1 and introduced here. The diversity in estimated precipitation among the datasets is taken as a measure of observational uncertainty, which for some ocean and mountainous regions with a sparse ground-based observation network can be considerable (e.g., for the Asian monsoon region) (Ceglar et al. 2017). The rainfall retrieval product of the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA; Huffman et al. 2007) version 7 provides 3-hourly data for 1998-2019. This dataset, TRMM hereafter, combines data from passive microwave sensors, calibrated by the TRMM precipitation radar, with infrared sensors (Huffman et al. 2010), and is corrected to match rain gauge data. We further use the 3-hourly precipitation estimate from the Climate Prediction Center morphing technique (CMORPH) version 1.0 for 1998-2017 (Joyce et al. 2004). CMORPH uses data from passive microwave measurements and cloud advection vectors from correlated images of infrared sensors. For climate change assessments, we use the gridded precipitation product of the Climatic Research Unit (CRU) time series version 4.03 (Harris et al. 2014) for 1901-2014 with 0.5° spatial resolution, based on gauge networks on land.
To test the observational uncertainty, we additionally use the monthly satellite-gauge product ("3IMERGM") of the Integrated Multisatellite Retrievals for GPM (IMERG; Huffman et al. 2019) from the Global Precipitation Measurement (GPM) mission (Hou et al. 2014). IMERG extends the concept of TRMM but instead uses a dual-frequency precipitation radar paired with more passive microwave and infrared sensors. Overall, the observed mean precipitation rate for 2000-14 ranges from 3 mm day⁻¹ (CMORPH) to 3.5 mm day⁻¹ (IMERG) across our four observational datasets (Table 1). Individual regions can show larger observational differences, with the largest observational ranges exceeding 2 mm day⁻¹ over islands, in the lee of mountain ranges, and in coastal areas (Fig. S1). The products mainly disagree over central Africa (CMORPH wet bias), in the Pacific warm pool (CMORPH dry bias), in the lee of mountain ranges in West India and the Malay Peninsula (CMORPH dry bias), on the Caribbean islands (CRU wet bias), and in Central America (CMORPH dry bias). More details including seasonal differences are provided in the supplemental information. CMORPH and TRMM capture the observational range across the assessed satellite products over land and ocean. We therefore use differences between these two products to measure the observational uncertainty in our analyses.

b. Data analysis strategy
All datasets have been screened and standardized for easy handling. This includes remapping the data to the same horizontal grid between 30°S and 30°N. Typically one would choose the coarsest resolution as the common grid to avoid generating information that the model did not simulate, but this approach would have led to a crude comparison since models in CMIP3 had substantially coarser grids than in CMIP6. As a compromise, we use the T63 grid, which is the native grid of MPI-M's low-resolution configuration of MPI-ESMs in CMIP5 and CMIP6. This grid has 192 points along the equator, and hence a spatial resolution of approximately 200 km. We unify the precipitation unit of all datasets by converting to mm day⁻¹.
In addition to performing an analysis over the entire tropics, separate analyses are performed for tropical land and ocean. For this purpose we use the land-sea mask of MPI-ESM1.2. We count grid cells with more than 50% ocean surface as ocean and otherwise as land. This approach implies that small islands are assigned to ocean regions. All tropical lakes are defined as land. Results from the analyses for tropical land and ocean are shown if relevant.
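The 50% rule described above can be sketched in a few lines. This is only an illustration: the function names, the tiny example grid, and the use of an unweighted mean (real analyses would normally weight by grid-cell area) are assumptions, not taken from the study.

```python
def classify_cells(ocean_fraction):
    """Label a cell 'ocean' if more than 50% of its surface is ocean, else 'land'."""
    return [["ocean" if f > 0.5 else "land" for f in row] for row in ocean_fraction]


def masked_mean(field, labels, target):
    """Unweighted mean of `field` over cells labelled `target` (assumption: no area weights)."""
    vals = [v for frow, lrow in zip(field, labels)
            for v, lab in zip(frow, lrow) if lab == target]
    return sum(vals) / len(vals)


# Hypothetical 2 x 3 grid: ocean fraction per cell and precipitation in mm/day.
ocean_fraction = [[0.9, 0.5, 0.1],
                  [1.0, 0.6, 0.0]]
precip = [[4.0, 2.0, 1.0],
          [5.0, 3.0, 2.0]]

labels = classify_cells(ocean_fraction)
land_mean = masked_mean(precip, labels, "land")
ocean_mean = masked_mean(precip, labels, "ocean")
```

A cell with exactly 50% ocean falls on the land side here, because the rule requires more than 50% ocean surface.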
The output from models that provide more than one simulation for the historical period is averaged before computing the mean of a CMIP phase. By this procedure, we avoid giving too much weight to an individual model that produced particularly many simulations. The model output includes both precipitation contributions from the model's subgrid parameterizations and the fractions associated with atmospheric dynamics explicitly resolved on the model grids.
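A minimal sketch of this two-step averaging follows, assuming each simulation is reduced to a single number (for example a tropical-mean precipitation rate). The model names and values are invented for illustration.

```python
from collections import defaultdict


def multimodel_mean(runs):
    """Average ensemble members per model first, then average across models.

    `runs` is a list of (model_name, value) pairs, one per simulation.
    """
    by_model = defaultdict(list)
    for model, value in runs:
        by_model[model].append(value)
    model_means = {m: sum(v) / len(v) for m, v in by_model.items()}
    phase_mean = sum(model_means.values()) / len(model_means)
    return phase_mean, model_means


# Hypothetical values in mm/day: model_A contributes three members, model_B one.
runs = [("model_A", 3.0), ("model_A", 3.2), ("model_A", 3.4), ("model_B", 4.0)]
phase_mean, per_model = multimodel_mean(runs)

# For contrast: pooling all simulations lets model_A dominate the result.
naive_mean = sum(v for _, v in runs) / len(runs)
```

In this toy case the two-step mean is 3.6 mm/day, whereas pooling all four simulations gives 3.4 mm/day, pulled toward the model with more members.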
As discussed in the introduction, none of the analysis techniques we employ are novel. Most are widely used in the climate modeling community (e.g., statistics over different time and length scales as well as analyses under different meteorological regimes). Some techniques are less familiar (e.g., the standardized precipitation index and the concept of Jennings scaling; Jennings 1950). These are included to present a broader view of how precipitation is represented in models.
We further analyze precipitation associated with a range of different atmospheric features like cloud regimes, monsoons, and intra- and interseasonal variability. The details of these techniques are introduced in the relevant sections.
A comparison of models and observations encompassing different time periods poses a number of challenges. One challenge is the definition of a common time period for the comparison, as not only do the different CMIP phases end in different years, they also overlap differently with satellite datasets. Initially we compared TRMM against CMIP6 for the overlapping time period 2000-14 as validation, and CMIP6 against CMIP5 and CMIP3 for the overlapping period 1900-99 to determine the development across CMIP generations. We found, however, only small differences in the statistics of CMIP6 for 1900-99 and 2000-14 in all our results, consistent with similar long-term mean statistics for CMIP6 (Table 2) and the small past trend in tropical mean precipitation (section 6). For instance, the spatial correlation coefficient of the CMIP6 precipitation climatologies for the twentieth and twenty-first centuries over the tropics is 0.998, much larger than the average correlations between CMIP models and TRMM (Table 3). For simplicity, and because a more temporally consistent comparison adds no new information, we compare TRMM and CMORPH for 2000-14 directly with the different CMIP phases for 1900-99.
Another challenge was to establish to what extent changes across phases of CMIP were simply the result of a different mix of models in each phase. To test this possibility we selected the subset of models that participated in all phases of CMIP and tested to what extent this sample of models influenced our conclusion for the climatological mean. We found that using all the CMIP models, or just the subset participating in all CMIP phases, yielded similar results (not shown). We further tested averaging over related models to account for different processing practices (Abramowitz et al. 2019). To this end, we calculated the standardized precipitation index on averages of related models in CMIP6, and identified only small differences that did not change our conclusions (not shown).

Climatology
a. Tropical mean
There has been, and continues to be, a long-standing discrepancy between energy-budget inferences of precipitation and estimates of precipitation based on observations, whereby the former tend to be larger than the latter (Stephens et al. 2012; Stevens and Schwartz 2012; Wild et al. 2012). The tropical precipitation from the CMIP models assessed here is also larger than the observational estimates. Compared to the tropical mean from TRMM of 3.23 mm day⁻¹, CMIP3 overestimates precipitation by 0.21 mm day⁻¹, and CMIP5 and CMIP6 by 0.34 mm day⁻¹ (Table 2). The tropical means of CMIP5 and CMIP6 are outside of the spread in the satellite observations (Table 1). The intermodel standard deviation is larger than the mean bias for CMIP3, but smaller for CMIP5 and CMIP6. The overestimation is also seen for precipitation averaged over oceans, with CMIP5 and CMIP6 being outside of the observational range (Tables 1 and 2). For land, we find a slight underestimation in CMIP3 and CMIP5, but CMIP6 is in the observational range. The observed land-to-ocean ratios in precipitation of 0.86-0.99 are consistently underestimated in all CMIP means. The land-ocean ratio has, however, slightly increased across the CMIP phases, with CMIP6 (0.82) being the closest to the lower bound of the observational range of the land-ocean ratio (Table 1).
The spatial pattern of tropical precipitation shows a systematic improvement across the CMIP phases, although the values still do not fall within the observational uncertainty. We measure this with the spatial correlations, r, in the annual mean tropical precipitation between CMIP and TRMM, with r = 0.75 in CMIP3, r = 0.79 in CMIP5, and r = 0.84 in CMIP6 for the tropical mean (Table 3). Improvements across CMIP in r are also found for both tropical ocean and land separately, with r being slightly larger over ocean than over land (Table 3). The observed pattern differences, measured by r, are larger over land than over ocean (Fig. 2), but none of the CMIP means fall within the observational uncertainty for r, measured as the spatial correlation between CMORPH and TRMM. Only the two best CMIP6 models for this metric, CESM2 and CESM2-WACCM with r > 0.9 over tropical land, come close to the observational range for r, reflecting a regional improvement in the tropical precipitation pattern of these models (Woelfle et al. 2019). The root-mean-square errors (RMSE) for precipitation compared to TRMM have also decreased on average over the CMIP phases, from 1.85 mm day⁻¹ in CMIP3 to 1.80 mm day⁻¹ in CMIP5 and 1.55 mm day⁻¹ in CMIP6, but again these are larger than the observational uncertainty (Table 3). RMSEs are slightly larger over ocean than over land in both CMIP3 and CMIP5, but this behavior has reversed in CMIP6.
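For readers less familiar with these two metrics, the sketch below computes a spatial (pattern) correlation and an RMSE between two fields flattened to lists. It is unweighted and uses invented numbers. Whether and how the study weights by grid-cell area is not stated in the text above, so the absence of weights is an assumption here.

```python
import math


def pattern_stats(model, obs):
    """Unweighted Pearson correlation and RMSE between two flattened fields."""
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((a - mean_m) * (b - mean_o) for a, b in zip(model, obs))
    var_m = sum((a - mean_m) ** 2 for a in model)
    var_o = sum((b - mean_o) ** 2 for b in obs)
    r = cov / math.sqrt(var_m * var_o)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(model, obs)) / n)
    return r, rmse


# Hypothetical fields in mm/day: the model has the right pattern but is 1 mm/day too wet.
obs = [1.0, 2.0, 3.0, 4.0]
model = [2.0, 3.0, 4.0, 5.0]
r, rmse = pattern_stats(model, obs)
```

The example illustrates why both numbers are useful together: a uniform wet bias leaves the correlation at 1 while the RMSE equals the size of the bias.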
Figure 2 shows the standard deviations of CMIP models in comparison to TRMM for land and ocean. This measure of annual-mean variability is too small in CMIP3, whereas CMIP5 and CMIP6 are closer to observations over tropical land; over tropical ocean, the standard deviation has been similar across the CMIP phases. The difference between the mean precipitation for June-August and December-February is used as a measure of the seasonal amplitude S. The spatial correlations of S between the models and TRMM have improved across the CMIP phases (Fig. 2 and Table 3), but all values fall outside of the observational uncertainty.
We test the hypothesis that improvements in precipitation across the CMIP phases occur in tandem with improvements in sea surface temperature (SST). We do, however, find no clear indication that the large-scale precipitation difference over tropical oceans is tightly linked with model differences in SSTs, either for the entire tropical oceans or for the cold tongue in the Pacific, although some of the SST biases in CMIP6 are smaller than in CMIP3 and CMIP5 by up to 1 K (Figs. S2 and S3).
TABLE 2. Long-term mean statistics for tropical precipitation. Listed (from left to right) are the CMIP phase, the time period, the number of models for calculating the long-term statistics, the means ± 1 standard deviation in precipitation for the tropics (G), tropical land (L), and tropical ocean (O), and the ratio of land and ocean precipitation rates (L/O).
TABLE 3. [...] and the spatial correlation coefficients (r) for the tropics (G), tropical land (L), and tropical ocean (O) as well as the correlation coefficient of the differences between June-August and December-February means as a measure of the seasonal amplitude (S). The top row shows CMIP6 for 1900-99 (20th) against CMIP6 for 2000-14 (21st), followed by TRMM against CMIP6 for 2000-14 and, in the rows below, TRMM against CMIPs (1900-99). The statistics are computed on the multimodel mean precipitation in the three CMIP phases against TRMM.

b. Zonal mean
Despite evidence of improvements in the spatial pattern of precipitation, we find no sign of improvement for the zonal mean precipitation across the CMIP phases (Figs. 3a-c). The zonally averaged annual mean precipitation is remarkably robust across all phases of CMIP. The Northern Hemisphere rainfall maximum is well matched compared to TRMM. In the Southern Hemisphere, the rainfall maximum in CMIP6 and CMIP5 is, however, overestimated compared to both TRMM and CMIP3. This is likely related to the too-pronounced double ITCZ in the models (e.g., Li and Xie 2014) and possibly explains the mean differences in tropical precipitation in the central Pacific (Fig. 1). In previous works, it has been related to biases in the ocean-atmosphere feedbacks in the tropical Pacific (Lin 2007), errors in cloud simulations (Li and Xie 2014), and the cold tongue bias in the tropical Pacific (Samanta et al. 2019).
FIG. 2. Taylor diagrams for tropical precipitation. Shown are the correlation coefficient, spatial standard deviation, and the root-mean-square error following Taylor (2001) of the tropical precipitation over (left) land and (right) ocean. Statistics are calculated on the (a),(b) long-term means and (c),(d) the difference between June-August and December-February means for the models (colored circles) against TRMM (black star). We mark the spread and average for all models per CMIP phase (colored lines) and the average for the selection of those models that participated in all CMIP phases (colored stars). We show CMIP6 model data for 1900-99 only, since the differences in the statistics of CMIP6 for 2000-14 and 1900-99 are small. The observational uncertainty is indicated by calculating the same statistics for CMORPH (gray star) against TRMM.
The double ITCZ in CMIP6 (Figs. 3a-c) shares the same biases as have been previously reported for CMIP3 and CMIP5 (Zhang et al. 2015). As a quantitative comparison, we compute the double-ITCZ index I of Samanta et al. (2019) from three box-mean precipitation rates. Here P_N is the mean precipitation in the northern box (5°-15°N, 160°E-120°W), P_S is the mean precipitation in the southern box (15°-5°S, 160°E-120°W), and P_E is the mean precipitation in the equatorial box (5°S-5°N, 160°E-120°W). The median double-ITCZ index is largely unchanged across different phases of CMIP, with I = 4.3 in CMIP3, I = 3.6 in CMIP5, and I = 4.0 in CMIP6 (Fig. 3d). Compared to the observational estimates of I = 1.5 (CMORPH) and I = 1.7 (TRMM), the median of I is too large by more than a factor of 2 in all CMIP phases. This means that the tropical precipitation over the Pacific Ocean is overestimated (cf. Fig. 1c).
The model spread decreases as we move through the CMIP generations, but this does not reflect an improvement: some models reproduced the observed double-ITCZ index in both CMIP3 and CMIP5, but none do in CMIP6.

c. Intensity distribution
Frequency and intensity are important precipitation characteristics with implications for hydrology and aerosol burden. For instance, a large model spread for surface runoff has been identified in CMIP5 (Lehner et al. 2019), and for aerosol burden in aerosol-climate models (e.g., Baker and Charlson 1990; Textor et al. 2006; Fan et al. 2018). Even models with a relatively accurate representation of the spatial pattern of precipitation may have large biases in the frequency and intensity (e.g., Trenberth et al. 2003; Pendergrass and Hartmann 2014). Models in CMIP3 and CMIP5 are known to produce too-frequent drizzle (e.g., Baker and Huang 2014; Pendergrass and Hartmann 2014; Sun et al. 2015). Here, we test to what extent this behavior has improved using long-term statistics of the frequency of wet 3-h means, the 1-day lag autocorrelation, the number of consecutive dry days, and scaling relationships between precipitation amount and its duration.

1) WET AND DRY FREQUENCY
All CMIP phases consistently produce more frequent wet 3-h means in tropical precipitation than observed (Fig. 4a). This overestimation has been slightly reduced in CMIP6, with 85% of the 3-hourly means being wet, compared to 93% in CMIP3. However, this is still a substantial overestimation of the occurrence of precipitation compared to the observed frequency of 44%-54%. The improvement in tropical precipitation frequency in CMIP6 is primarily explained by the reduction of wet 3-h means over tropical oceans, whereas the frequency over land has only slightly decreased compared to CMIP3 (Figs. S4 and S5). We note, however, a substantial model spread for the frequency of precipitation rates in all CMIP phases (Fig. S6).
We measure the day-to-day variability and spatiotemporal coherence of the tropical precipitation with the 1-day lag autocorrelation (Fig. 4b). A realistic lag autocorrelation is associated with an improved representation of deep convection and convection coupled to equatorial waves, including the Madden-Julian oscillation (Peters et al. 2017; Ma et al. 2019) assessed in section 5a. Atmospheric models with parameterized moist convection are known to have unrealistic day-to-day variability in precipitation (Peters et al. 2017, 2019) due to deficiencies in the physical parameterization schemes that lead to too-frequent triggering of deep convection (Klingaman et al. 2017; Peters et al. 2017). This behavior is characterized by too-large 1-day lag autocorrelations (i.e., wet episodes over several days are not sufficiently interrupted by dry days). We identify a slight improvement in the 1-day lag autocorrelation from CMIP3 and CMIP5 to CMIP6, namely, a reduction from roughly 0.60 in CMIP3 to 0.50 in CMIP6. Since the lag autocorrelation is sensitive to the representation of convection (Klingaman et al. 2017; Peters et al. 2017), this result indicates that the past model development between CMIP phases contributed to a slightly better day-to-day variability of moist convection. However, compared to the observed 1-day lag autocorrelation of 0.35 in both TRMM and CMORPH, CMIP6 models still substantially overestimate this quantity, pointing to too-little intermittency of rainy episodes.
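As a small illustration of what this metric responds to, the following sketch computes a lag-1 autocorrelation for two invented daily series with the same mean and variance. The estimator shown (full-series mean and variance in the denominator) is one common convention and is an assumption here. The study may use a different one.

```python
def lag1_autocorrelation(series):
    """Lag-1 autocorrelation using the full-series mean and variance."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    return cov / var


# Hypothetical daily precipitation in mm/day.
persistent = [0, 0, 0, 0, 5, 5, 5, 5]    # one long wet spell
intermittent = [0, 5, 0, 5, 0, 5, 0, 5]  # wet and dry days alternate

r_persistent = lag1_autocorrelation(persistent)
r_intermittent = lag1_autocorrelation(intermittent)
```

The persistent series yields a clearly positive value and the alternating series a clearly negative one, which is the sense in which a too-large autocorrelation points to wet spells that are not interrupted often enough.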
The maximum number of consecutive dry days (CDD) is used to quantify the length of dryness. Following the Expert Team on Climate Change Detection and Indices (ETCCDI) as used by Frich et al. (2002), CDD is defined as the number of consecutive days within a year that have total daily precipitation amounts of less than 1 mm day⁻¹. This threshold removes days with light drizzle events that are difficult to measure. Although TRMM (TMPA) is better for light rain events than other satellite-based data (e.g., Burdanowitz et al. 2015), it misses light precipitation (Behrangi et al. 2014), affecting the frequency of occurrence of precipitation events (Klepp et al. 2018). By eliminating days with such light drizzle events, we determine the differences in CDD considering more regular to extreme precipitation events. We show the spatial and temporal average of CDD (Fig. 4c) and the probability distribution of CDD across time and space (Fig. 4d). The latter primarily indicates spatial variability for CMIP due to the small year-to-year changes in ensemble-averaged CDD with standard deviations of 0.47-0.69 (not shown).
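The CDD definition above translates directly into a run-length count. The sketch below applies it to one invented series of daily totals at a single grid point. The function name and sample values are illustrative only.

```python
def max_consecutive_dry_days(daily_precip, threshold=1.0):
    """Longest run of days with precipitation below `threshold` (mm/day)."""
    longest = 0
    current = 0
    for p in daily_precip:
        if p < threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a wet day ends the dry spell
    return longest


# Hypothetical daily totals in mm/day; a day with exactly 1.0 mm counts as wet.
sample = [0.0, 0.2, 3.0, 0.9, 0.0, 0.5, 12.0, 1.0, 0.1]
cdd = max_consecutive_dry_days(sample)
```

In practice the function would be applied per year and per grid cell before averaging or building the probability distribution.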
There is an improvement over the three CMIP generations in averaged CDD and their probability of occurrence (Figs. 4c,d), but CMIP6 models still produce shorter dry periods on average than observed (Fig. 4c). The remaining difference from observations stems from the poor representation of extremely long dry episodes (Fig. 4d). For instance, the climate models show too-low probabilities for CDD longer than 130 days in CMIP3, and 200 days in CMIP5 and CMIP6, compared to the observations (Fig. 4d). The underestimation of such extremely long dry episodes is primarily explained by the mismatch in CDD over oceans (Figs. S4 and S5), but this is also the region where the improvement across CMIP generations was largest. Compared to the ocean, the number of CDD over land is generally better captured by CMIP models, except for CDDs longer than 250 days. The probability of occurrence for these extremely long dry episodes has slightly improved from CMIP3 to CMIP6, but the occurrence of more than 300 CDDs in deserts is still underestimated. This finding has implications for other processes in the Earth system (e.g., dust-aerosol emissions, which are influenced by the soil moisture and lack of vegetation cover) (e.g., Shao 2001; Kok et al. 2014).
2) JENNINGS SCALING
Jennings (1950) discovered a scaling law, P ∼ D^α, that describes the global maximum of precipitation, P, observed at rain gauges over land during an interval of some duration, D, with the exponent α ≈ 1/2 for periods of minutes to 1 year. Even earlier research on thresholds of rainfall extremes supports this power-law scaling (Wussow 1922). This type of scaling can be reproduced by simple thermodynamic models whose large-scale input is modulated by stochastic forcing (e.g., Field and Shutts 2009; Zhang et al. 2013b). The scaling relationships described by Jennings, sometimes called maximum depth-duration graphs, have entered textbooks in hydrology, but their application to the evaluation of climate model output is less common (Zhang et al. 2013a), which motivates the present analysis. Moreover, whereas previous studies focused on time periods of minutes to one year, we test here the extension of the Jennings scaling to decades by calculating the slope for data over longer averaging intervals. We find that the Jennings slopes are very similar in TRMM and all phases of CMIP.
The rainfall maximum P for a given D is determined from the spatial distributions of tropical precipitation. The literature typically refers to P as depth. The depth is the maximum across time and space in the running means of daily precipitation over the duration, calculated at every grid point. Durations range here from 1 day to 1 decade, with steps of 1 day, for all datasets, except for CMIP6 and TRMM, where we also use the entire overlapping period 2001-14. We show three examples of the resulting points that fall on a line in the depth-duration space (Fig. 5a). We find that all regression lines closely fit the data points, with R² = 0.97 being the smallest coefficient of determination across all datasets here. The slopes α of these lines are shown in Fig. 5b and are known as the Jennings slope.
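The depth-duration procedure can be sketched for a single time series as follows. Two assumptions are made for illustration: depth is taken as the accumulated amount over the window (the running mean multiplied by the duration), and only a maximum over time is taken, whereas the analysis above also takes the maximum over all grid points. The function names and synthetic series are hypothetical.

```python
import math


def max_depth(daily, duration):
    """Largest accumulated precipitation over any window of `duration` consecutive days."""
    window = sum(daily[:duration])
    best = window
    for t in range(duration, len(daily)):
        window += daily[t] - daily[t - duration]  # slide the window by one day
        best = max(best, window)
    return best


def loglog_slope(durations, depths):
    """Least-squares slope of log(depth) against log(duration)."""
    x = [math.log(d) for d in durations]
    y = [math.log(p) for p in depths]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


durations = [1, 2, 4, 8]
steady = [2.0] * 32                               # rain every day
single_event = [0.0] * 16 + [50.0] + [0.0] * 15   # one isolated wet day

slope_steady = loglog_slope(durations, [max_depth(steady, d) for d in durations])
slope_event = loglog_slope(durations, [max_depth(single_event, d) for d in durations])
```

The two synthetic series bracket the possible behavior: perfectly steady rain gives a slope of 1, a single isolated event gives a slope of 0, and real precipitation, which is intermittent but clustered, falls in between.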
In both the CMIP output and the data from TRMM, α is larger than the value determined from the earlier gauge measurements by Jennings (1950). In addition to the different spatial scales, Jennings (1950) covers minutes to 1 year, while we start with daily precipitation and move to decadal scales for TRMM and CMIP. The line in Fig. 5a indicates that the steeper slopes in TRMM are primarily explained by the interval from 1 year to 1 decade (i.e., there is some curvature in the slope when moving to longer averaging intervals). Paired with the different spatial representation of gauge measurements and the gridded data, this explains why CMIP and TRMM produce slopes that are more similar to one another than to that of Jennings (1950), with CMIP6 following the observations better than the previous phases of CMIP.
There is considerable variability in the estimates of α from the CMIP output, although less in CMIP6 than in previous CMIP phases. The relatively good match between TRMM and CMIP6 based estimates of α suggests that despite biases in the distribution of precipitation, the tendency for long-duration events to be associated with more intense rainfall is well captured by the models.
[...] exclude areas of deep convection, we estimate the observed fractional area coverage of low-level cloud regimes from the daily CERES product (Loeb et al. 2009) to be 68% (not shown). We choose the threshold of 250 W m⁻², corresponding to a brightness temperature of 258 K and similar to other studies (e.g., Masunaga et al. 2005). Note that this includes both low-level cumuli and stratiform clouds. It also includes a fraction of cumulus congestus, which is not distinguishable from low-level clouds with OLR. Sensitivity tests with other thresholds of 240-260 W m⁻², consistent with Stubenrauch et al. (1999), give qualitatively similar results to 250 W m⁻². For the analysis, we use daily OLR and precipitation data, available from 16 CMIP3 models, 32 CMIP5 models, and 14 CMIP6 models, marked by indices 2 and 3 in Tables S1-S4.
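The correspondence between the OLR threshold and the quoted brightness temperature follows from the Stefan-Boltzmann law, T = (OLR/σ)^(1/4). The sketch below checks that number and shows one way a regime fraction could be computed. Treating cells with OLR above the threshold as the low-level cloud regime is an assumption inferred from the stated aim of excluding deep convection, and the equal-area cells and sample values are invented.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant in W m^-2 K^-4


def brightness_temperature(olr):
    """Blackbody temperature in K that emits `olr` in W m^-2."""
    return (olr / SIGMA) ** 0.25


def low_level_regime_fraction(olr_values, threshold=250.0):
    """Fraction of cells with OLR above `threshold` (assumption: equal-area cells)."""
    return sum(1 for v in olr_values if v > threshold) / len(olr_values)


tb = brightness_temperature(250.0)

# Hypothetical daily OLR values in W m^-2 for five grid cells.
fraction = low_level_regime_fraction([180.0, 240.0, 255.0, 270.0, 290.0])
```

The threshold of 250 W m⁻² evaluates to a temperature that rounds to 258 K, in line with the value quoted above.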
The CMIP means have a fractional low-level cloud area similar to the observations, with a slight increase from 65% for CMIP3 to 69% for CMIP6.Despite a similar areal coverage the models differ substantially by 50%-100% in the amount of precipitation associated with low-level clouds (Fig. 6a).There is no clear improvement over the CMIP phases, although the very large outliers evident in CMIP3 and CMIP5, with precipitation fractions associated with low-level cloud regimes larger by a factor of four to five, have reduced in CMIP6.Some models in CMIP6 lie within the observational range for the fractional precipitation amount associated with low-level clouds (10%-14%), namely, BCC-ESM1 (12%), CNRM-CM6-1 (10%), and CNRM-ESM2-1 (11%).These three models, however, tend to underestimate the fractional area coverage of the low-level cloud regimes with 59%, 65%, and 65%, respectively.
We extend the analysis to regimes with deep convection, which we identify with regions of particularly low OLR. In these regimes the observations differ considerably (Fig. 6b). For an OLR of 120 W m⁻² the precipitation rate in CERES-CMORPH (150 mm day⁻¹) is 25% lower than in CERES-TRMM (200 mm day⁻¹), consistent with the lower frequency of these precipitation rates in CMORPH than in TRMM (Fig. S6). CMIP5 and CMIP6 have a better representation of the relationship between OLR and precipitation rate than CMIP3 for OLR of 120-270 W m⁻², and align closely with what is diagnosed from the CERES-TRMM measurements. For more moderate precipitation, between 10 and 100 mm day⁻¹, the observations are more consistent and suggest that the models require deeper convection to produce these rain rates (too-low OLR) across the CMIP phases.
In summary, models produce more precipitation from low-level clouds than is observed, consistent with the persistent overestimation of drizzle in CMIP. For more moderate precipitation rates, the CMIP models are associated with lower OLR, pointing to deeper clouds or more overcast conditions than is observed. For stronger rain rates (p > 100 mm day⁻¹), substantial divergence between the observational datasets makes an evaluation of the models difficult, but CMIP3 clearly lies outside of the observational range, whereas CMIP5 and CMIP6 are closer to the observations.

Solar radiative effects
Model-based climate change projections are essentially an exercise in assessing how a model's climate responds to radiative forcing. In this context, the fidelity of the models' response to known changes in the radiation budget, for instance, as associated with the seasonal and daily cycles of the sun, provides a useful test of their plausibility. The response of precipitation to radiative forcing associated with atmospheric composition changes, as manifest by a global increase in surface temperatures, is likely different from the response to seasonal and daily cycles in irradiance. There is, however, little reason to believe that a model could capture a forced response in precipitation (e.g., to radiative forcing of greenhouse gases) if it poorly represents the observed cycles induced by radiative perturbations as strong as those associated with the seasonal and daily changes of irradiance. This is one of our motivations for this analysis across the CMIP models.

a. Seasonal cycle
The seasonal cycle of tropical precipitation determines the regional climate in many tropical areas (Knoben et al. 2019).Hence, quite apart from being a generic test of how models respond to natural changes in the radiation budget, an ability of CMIP models to reproduce the seasonal cycle of tropical precipitation with fidelity is relevant on its own.Through the influence of precipitation on the regional energy budget, an accurate simulation of tropical precipitation is also influential for other aspects of climate on both regional and global scales.

1) ZONAL MEANS
Models in CMIP3 and CMIP5 are drier than observations early in the wet season and too wet later on in both hemispheres (Seth et al. 2013). Seth et al. (2013) identified two systematic biases in tropical precipitation. First, most CMIP3 and CMIP5 models underestimate the precipitation near the equator between January and June. Others also documented a regional underestimation of precipitation in the 4°-8°N band within the tropical Pacific from March to April (Mechoso et al. 1995; Bellucci et al. 2010). Second, Seth et al. (2013) showed that most CMIP5 models overestimate precipitation at 4°-20°S, particularly strongly between February and May. This is consistent with observations showing more hemispheric asymmetry in the zonal-mean annual precipitation, and a more dominant ITCZ signature north of the equator, than most CMIP models (Fig. 3).
Figure 7 shows that CMIP6 models still do not correctly represent the observed seasonal cycle of zonal-mean precipitation over tropical land and ocean. We find that they are generally wetter than observations in the summer hemisphere by 0.5-2.5 mm day⁻¹. Furthermore, we find a too-dry/too-wet pattern between January and May, explained by a rain belt that is displaced too far to the south. This model behavior might also delay the onset of the summer monsoon in the Northern Hemisphere.

2) SUMMER MONSOONS
Monsoon rainfall dominates the annual variability in the tropics (e.g., Trenberth et al. 2000;Wang and Ding 2008), affecting many tropical regions.Previous studies show that CMIP5 models simulate better monsoonal circulation climatology and variability than CMIP3 (e.g., Sperber et al. 2013), but they still suffer from systematic regional biases.For example, the CMIP5 mean tends to underestimate precipitation over the eastern Indian Ocean, the Bay of Bengal, the equatorial western Pacific, and tropical Brazil, but overestimate precipitation over the Maritime Continent, the Philippines, and high-elevated terrains such as the Andes, Sierra Madre, and the Tibetan Plateau (Lee et al. 2010;Lee and Wang 2014).Despite these regional biases, the CMIP5 mean reproduces the observed monsoon intensity and area (Lee and Wang 2014).
FIG. 7. Seasonal cycle of differences in zonal precipitation. Shown are differences between the CMIP6 multimodel mean and TRMM (shading) and the magnitude of precipitation from CMIP6 in steps of 2 mm day⁻¹ (gray contours).

We assess the monsoon across the CMIP phases with a bulk measure for the monsoon area and intensity following earlier approaches (Wang and Ding 2006; Wang et al. 2011). For each model simulation, the monsoon regions are defined with two criteria: 1) the annual range of precipitation (summer minus winter mean) exceeds 2 mm day⁻¹; and 2) the summertime precipitation contributes at least 55% to the annual total. The monsoon intensity is then defined as the area-weighted average of summer precipitation (i.e., June-August in the Northern Hemisphere and December-February in the Southern Hemisphere) within the monsoon area. The latter is the composite of the regional monsoons in both hemispheres (Fig. S7).
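The two criteria and the intensity definition can be written down compactly. The following is a minimal sketch only: the per-cell dictionary layout and the function name are hypothetical, and the caller is assumed to have already computed the local summer and winter means (mm day⁻¹) and the summer and annual totals (mm) for each grid cell.

```python
def monsoon_area_and_intensity(cells, min_range=2.0, min_fraction=0.55):
    """Return (monsoon area, monsoon intensity) from per-cell seasonal statistics.

    Each cell is a dict with keys (hypothetical layout):
      summer_mean, winter_mean : seasonal-mean precipitation (mm day-1)
      summer_total, annual_total : accumulated precipitation (mm)
      area : grid-cell area weight
    """
    monsoon = [
        c for c in cells
        # criterion 1: annual range (summer minus winter mean) exceeds 2 mm day-1
        if c["summer_mean"] - c["winter_mean"] > min_range
        # criterion 2: summer contributes at least 55% of the annual total
        and c["annual_total"] > 0.0
        and c["summer_total"] / c["annual_total"] >= min_fraction
    ]
    area = sum(c["area"] for c in monsoon)
    if area == 0.0:
        return 0.0, float("nan")
    # intensity: area-weighted mean of summer precipitation inside the monsoon area
    intensity = sum(c["summer_mean"] * c["area"] for c in monsoon) / area
    return area, intensity
```

Restricting `cells` to land points before the call would give the land-only variants of both metrics.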
Based on these measures the monsoon in CMIP6 is not better represented than in previous CMIP phases. Rather, CMIP6 models produce the widest monsoon area among the CMIP phases and the largest mismatch compared to observations (Fig. 8a). Here, both TRMM and CMORPH fall outside the intermodel spread of the monsoon area in the CMIP6 mean. The land-only monsoon area is closer to the observations, with less of a discrepancy between CMIP6 and previous CMIP phases (Fig. 8b). Whereas the simulated monsoon intensity fell within the range of observed values for both CMIP3 and CMIP5, the intensity of the monsoon is too large in the CMIP6 mean (Fig. 8c). This is also true for the land-only monsoon, but the magnitude of the intensity overestimation is less pronounced (Fig. 8d). Based on these metrics, the monsoon in the CMIP6 mean is larger and wetter than in the previous CMIP phases, and therefore agrees less with observations than CMIP5.
The reasonable simulation of the land-only monsoon area and intensity arises in part from regionally compensating biases. Figure 9 shows summertime precipitation differences for the regional monsoons, defined by Kitoh et al. (2013). Overall, there is an apparent reduction in the dry bias across CMIP phases, so much so that some regions (South Asia and western South America) have had their dry bias in CMIP3 replaced by a wet bias in CMIP6. This analysis suggests that the degradation of the global monsoon metrics in CMIP6 (Fig. 8) might result from a diminishment of compensating biases across the regional monsoons (Fig. 9), particularly associated with what in CMIP3 were large dry biases in the North American and South Asian monsoon systems. Altogether, this analysis makes it difficult to refute the hypothesis that the representation of the monsoon systems has not improved in successive CMIP phases.

We cannot identify an overall improvement of the monsoon across the CMIP phases. Monsoon rainfall in some regions improved across the phases, but the global monsoon intensity and area have a larger bias in CMIP6 than in CMIP3. CMIP3 models already showed the basic features of the summer monsoons and the monsoonal teleconnections (Randall et al. 2007), but the location and intensity of the simulated rainfall differed from the observations (Fan et al. 2010). Improvements in CMIP5 were attributed to a more realistic ENSO-monsoon teleconnection (Meehl et al. 2012), but also an improved spatial distribution of intraseasonal variations (Sperber et al. 2013). Possible reasons for monsoon differences in CMIP models are many. For CMIP5 these include too-cold SSTs (e.g., over the Arabian Sea; Levine et al. 2013), too-weak meridional temperature gradients (Joseph et al. 2012), an unrealistic development of the Indian Ocean dipole (Achuthavarier et al. 2012; Boschat et al. 2012), regional differences in rainfall affecting moisture advection (Bollasina and Ming 2013), and a different degree of compensation between thermodynamics and dynamics (D'Agostino et al. 2019).

b. Diurnal cycle
Moist convection causes differences in the diurnal cycle of precipitation over land and ocean (Dai 2001), a signal that models with parameterized convection have historically struggled to represent (Dai et al. 1999).Daytime heating over land triggers deep convection that typically causes a delayed precipitation maximum in the late afternoon to evening (e.g., Yang and Slingo 2001).Over the ocean, the nocturnal cooling at cloud tops is important for the deepening of clouds, causing an early morning precipitation maximum (Kraus 1963;Gray et al. 1977;Sato et al. 2009).Vial et al. (2019) recently documented a similar diurnal cycle for shallow convective clouds.
The diurnal cycle of precipitation is reproduced by both TRMM and CMORPH observations (Fig. 10). Our data processing for the diurnal cycle includes a transformation of the data from UTC to local time to make a meaningful comparison of the diurnal cycles. The resulting data of CMORPH has an offset of 1.5 h ahead of TRMM. This difference is thought to be mostly associated with the different time accumulation periods of the products. For CMORPH and the model output, these are the 3-h periods 0000-0300, 0300-0600, and so on, but for TRMM, the periods are shifted by 1.5 h (i.e., 0130-0430, 0430-0730, and so on) (e.g., see section 2.3 in Rauniyar et al. 2017). Additionally, the time stamp had to be changed in CMORPH's metadata, since it was defined at the beginning of the corresponding periods. The other datasets define the time stamp at the center, and thus we shifted CMORPH's time axis by +1.5 h for a consistent treatment.

FIG. 9. Difference in summer precipitation for the monsoon regions over land compared to TRMM for (a) CMIP3, (b) CMIP5, and (c) CMIP6. The numbers are the precipitation differences averaged for the regional monsoons, defined as in Kitoh et al. (2013). The monsoon and its regional separation is graphically displayed in Fig. S7. The equator separates the northern monsoons [North America monsoon system (NAMS), North Africa (NAF), Southern Asia summer (SAS), East Asian summer (EAS)] from the southern ones [South America monsoon system (SAMS), South Africa (SAF), and Australian-Maritime Continent (AUSMC)], 60°E separates NAF and SAS, and 20°N, 100°E separates SAS and EAS.

We assess the diurnal cycle of precipitation in CMIP through the time of occurrence for the peak rainfall, as well as the frequency of wet 3-h means, and by the 99th percentile of all 3-h means. We use 3-hourly data, available from 7 CMIP3 models, 23 CMIP5 models, and 13 CMIP6 models, marked with index 1 in Tables S1-S4. The only exception here is the diurnal cycle of shallow clouds, for which we had four fewer models due to data availability, marked with indices 1 and 3 in Tables S1-S4. We found no systematic differences for the diurnal cycle when we separated high- and low-resolution configurations (not shown).
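The UTC-to-local-time transformation and the peak-time metric can be sketched as follows. This is an illustration under stated assumptions rather than the processing used in this study: local time is approximated by local solar time (15° of longitude per hour), time stamps are taken at the center of each 3-h period, and the wet threshold is a free parameter because its value is not given here. All function names are hypothetical.

```python
def local_solar_hour(utc_hour, lon_deg):
    """Shift a UTC time stamp (hours) to local solar time, 15 degrees per hour."""
    return (utc_hour + lon_deg / 15.0) % 24.0


def diurnal_peak_local_hour(utc_hours, values, lon_deg):
    """Local solar hour of the 3-h bin with the largest mean value."""
    sums, counts = {}, {}
    for hour, value in zip(utc_hours, values):
        sums[hour] = sums.get(hour, 0.0) + value
        counts[hour] = counts.get(hour, 0) + 1
    peak_utc = max(sums, key=lambda h: sums[h] / counts[h])
    return local_solar_hour(peak_utc, lon_deg)


def wet_frequency(values, wet_threshold):
    """Fraction of 3-h means at or above wet_threshold (threshold is an assumption)."""
    return sum(1 for v in values if v >= wet_threshold) / len(values)
```

A dataset whose time stamps mark the beginning of each period, as described for CMORPH, would need 1.5 h added to `utc_hours` before calling these functions.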
Some clear and systematic biases become apparent through this analysis.One is that the simulated maxima in the diurnal cycle are too strong in the models, for both the precipitation amount and the frequency of wet 3-h means, shown in Fig. 10.This bias is most pronounced over land and over the ocean in regions with low-level clouds.Another is a bias in the simulated phase of the diurnal cycle.Over tropical land, CMIP models typically produce too-early maxima of precipitation amounts.This problem has been extensively studied, and is often attributed to the use of physical parameterization schemes for moist convection that are designed to remove convective instabilities quickly.The quasiequilibrium assumption (Arakawa and Schubert 1974), which in some form is used in most parameterization schemes, links precipitation to the rate at which convective instability is produced, thereby strongly coupling precipitation to surface fluxes (e.g., Bechtold et al. 2004) over land, and net radiative cooling rate over the ocean.This might also explain why the time of occurrence of the maximum tends to appear too early over the ocean, even in the absence of deep convection.
From Fig. 10 it is difficult to discern obvious changes among CMIP phases, let alone systematic improvements. As a quantitative assessment, we therefore compute the mean absolute phase lag of the maxima in the models compared to CMORPH (Table 4). Here, too, there is no evidence of a systematic improvement of the time of occurrence for any of the three metrics of the diurnal cycle across the CMIP phases, with CMIP3 being in the mean closer to the observed time of the maximum than CMIP6. Measured by the mean absolute phase lag of the maxima, the diurnal cycle of the precipitation of low-level clouds over the ocean is also worse in CMIP6 than in the earlier phases. Using TRMM as an alternative observational reference does not change these findings, except for giving systematically larger phase lags for most of the metrics in Table 4.
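Because the time of the maximum lives on a 24-h clock, the lag has to be wrapped before it is averaged. The sketch below shows one way to do this, assuming peak times are given in local hours and following the sign convention of Table 4 (positive when the model peaks earlier than the observations). The function names are hypothetical.

```python
def phase_lag_hours(obs_peak, model_peak):
    """Signed lag in hours, wrapped to (-12, 12]; positive if the model peaks earlier."""
    lag = (obs_peak - model_peak) % 24.0
    return lag - 24.0 if lag > 12.0 else lag


def mean_absolute_phase_lag(obs_peak, model_peaks):
    """Average of the absolute wrapped lags over a set of models."""
    lags = [abs(phase_lag_hours(obs_peak, m)) for m in model_peaks]
    return sum(lags) / len(lags)
```

Wrapping first avoids counting a model peak at 2230 against an observed peak at 0130 as a 21-h error instead of a 3-h lead.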
Although some models do show improvements, and correspond to observations for individual metrics (Fig. 10), none correctly represents the time of occurrence for all three maxima in amount, frequency, and intensity (not shown).Similarly, no model represents both the minimum and maximum of a single metric correctly.When taken together with the poor response of tropical precipitation to the seasonal cycle, these findings suggest that CMIP models should only be used with great caution in studies on the response of tropical precipitation to radiative forcing of atmospheric composition changes.

Modes of internal variability
The CMIP6 output shows more marked improvements in the representation of modes of tropical variability.This statement is based on the analysis of the two most dominant modes of internal variability in the tropics: the Madden-Julian oscillation (MJO) and El Niño-Southern Oscillation (ENSO), which we present here.

a. Madden-Julian oscillation
The MJO is the dominant mode of intraseasonal precipitation variability in the tropics, most pronounced in boreal winter. Its salient feature is a coherent eastward-propagating pattern of enhanced and suppressed convection over the Indian Ocean, the Indo-Pacific warm pool, and the western Pacific Ocean (Madden and Julian 1994). The critical processes that give rise to the MJO remain debated (Maloney et al. 2018) and a realistic MJO in climate models has been a challenge (Kim et al. 2009; Crueger et al. 2013; Jiang et al. 2015). The present analysis focuses on the most obvious characteristic of the MJO, namely, the eastward propagation of suppressed and enhanced precipitation patterns.
TABLE 4. Mean phase lag of the maxima in the amount, the frequency, and the 99th percentile in the 3-hourly tropical precipitation across CMIP phases compared to CMORPH. All lag values are in hours and listed for convection over land (L), convection over ocean (O), and low-level clouds over ocean (low-O). Positive values indicate that the models are leading the observations (i.e., earlier occurrence of the maximum in the models).

CMIP   Amount             Frequency          99th percentile
       L    O    low-O    L    O    low-O    L    O    low-O
3      1.7  0.5  4.0      3.9  1.3  1.0      6.9  6.9  6.5
5      3.8  1.6  3.7      3.0  1.6  1.4      5.7  7.4  6.5
6      3.5  0.8  4.4      3.0  2.0  2.0      6.8  6.0  11.0

Figure 11 shows the ratio of the eastward-propagating spectral power of tropical precipitation to that of its westward-propagating counterpart for the CMIP phases. Each is summed up over the MJO characteristic wavenumbers one to three and periods of 20-100 days for the November to April season between 10°S and 10°N. This quantity is often used as a measure for the MJO (e.g., Crueger et al. 2013; Kim et al. 2014). A ratio larger than 1 indicates more spectral power in the eastward-propagating modes, and thus measures the dominance of an eastward-propagating disturbance. By not measuring other aspects of the MJO, such as its amplitude or composite structure, our analysis sets a relatively low bar for the evaluation of the MJO, but one which is informative. Calculations are performed on daily precipitation data from individual models for 1961-75 and 1976-90, and then averaged to be consistent with the 15-yr period of satellite observations. Almost all models across all CMIP phases show eastward propagation, indicated by ratios larger than 1. Thus, the most important characteristic of the MJO is represented. The mean MJO skill, which only slightly improves in going from CMIP3 to CMIP5, is more substantially improved in CMIP6. Every phase of CMIP has produced individual models that capture the dominance of eastward propagation particularly well, examples being the MRI-ESM for CMIP5 and GFDL-CM4 in CMIP6, with ratios of 3.3 and 3.2, respectively. Every phase of CMIP, and here CMIP6 is no exception, however, also includes models with predominantly westward-propagating disturbances.
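The east-to-west power ratio can be illustrated with a direct wavenumber-frequency projection. The sketch below is not the implementation used for Fig. 11: it assumes a single time-longitude section of daily anomalies that has already been averaged over 10°S-10°N and restricted to one season, uses a plain discrete Fourier sum instead of an FFT, and applies no tapering or detrending. The function name and argument layout are hypothetical.

```python
import cmath
import math


def east_west_power_ratio(field, wavenumbers=(1, 2, 3),
                          min_period=20.0, max_period=100.0):
    """Ratio of eastward to westward spectral power for daily data.

    field[t][x] holds anomalies at day t and longitude index x, with x
    increasing eastward on a regular grid that spans the full latitude circle.
    """
    nt, nx = len(field), len(field[0])
    east = 0.0
    west = 0.0
    for f in range(1, nt // 2 + 1):
        period = nt / f  # period in days for daily sampling
        if not (min_period <= period <= max_period):
            continue
        omega = 2.0 * math.pi * f / nt
        for k in wavenumbers:
            kappa = 2.0 * math.pi * k / nx
            c_east = 0j
            c_west = 0j
            for t in range(nt):
                for x in range(nx):
                    v = field[t][x]
                    # project on exp(i(kx - wt)), a wave moving toward larger x
                    c_east += v * cmath.exp(-1j * (kappa * x - omega * t))
                    # project on exp(i(kx + wt)), a wave moving toward smaller x
                    c_west += v * cmath.exp(-1j * (kappa * x + omega * t))
            east += abs(c_east) ** 2
            west += abs(c_west) ** 2
    return east / west if west > 0.0 else float("inf")
```

For realistic record lengths an FFT-based version would be preferable; the double loop is kept here only to make the sign convention for eastward and westward propagation explicit.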
Our results for the MJO are consistent with Fig. 4b and the expectation that an improved MJO is accompanied by a more realistic 1-day lag autocorrelation (Peters et al. 2017).However, this does not necessarily imply an improvement of the mean state.In fact, a realistic representation of the MJO is often accompanied by an unrealistic mean state of, for example, the ITCZ.Thus, the improvement of the MJO and the large ITCZ bias in CMIP6 (Fig. 3d) found in this study agree with previous results (Kim et al. 2011;Crueger and Stevens 2015).

b. El Niño-Southern Oscillation
El Niño-Southern Oscillation (ENSO) is a coupled ocean-atmosphere oscillation with a dominant period of 3 to 7 years.Its main characteristics are changing winds and sea surface temperatures over the tropical Pacific Ocean.ENSO has major impacts in the tropics and subtropics [e.g., on the monsoon (Kumar et al. 2013)], but is also thought to influence regions farther poleward.
We identify El Niño events using the method by Power et al. (2013). This includes the following processing steps with the Climate Data Operators (Schulzweida 2019) for each dataset. We compute the empirical orthogonal function (EOF) of SST means for June-December within 15°S-15°N, 140°E-100°W, based on detrended and filtered SST time series that do not contain temporal variability on time scales longer than 30 years. El Niño events are then defined as those cases when the principal component of this data is greater than 0.8 times the standard deviation of the principal component. We then select the precipitation anomalies relative to the long-term mean during the detected El Niño events and combine them by computing the average. As an example, we show the resulting pattern of precipitation associated with El Niño events for TRMM (Fig. 12a). It illustrates the classical El Niño pattern with strongly positive anomalies in precipitation over the west Pacific and north of the equatorial Pacific, paired with negative anomalies over the Maritime Continent. We also assessed La Niña events (not shown) and found inverse results to the ones for El Niño, shown next.
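The event selection and compositing step can be sketched in a few lines. The example assumes that the leading principal component has already been computed (for instance with the Climate Data Operators, as in the text) and is oriented so that positive values correspond to warm events. The use of the population standard deviation, the function name, and the list-based data layout are assumptions made for illustration.

```python
from statistics import pstdev


def el_nino_composite(pc, precip_anomalies, factor=0.8):
    """Select El Nino years and average their precipitation anomalies.

    pc               : one principal-component value per year
    precip_anomalies : per-year lists of anomalies, one entry per grid cell
    Returns (indices of event years, composite anomaly per grid cell or None).
    """
    threshold = factor * pstdev(pc)
    events = [i for i, value in enumerate(pc) if value > threshold]
    if not events:
        return events, None
    ncell = len(precip_anomalies[0])
    composite = [
        sum(precip_anomalies[i][j] for i in events) / len(events)
        for j in range(ncell)
    ]
    return events, composite
```

A La Niña composite would follow from the same function after flipping the sign of `pc`.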
Comparing the CMIP results to TRMM points to two clear regions with systematic biases in precipitation associated with El Niño events (Figs. 12b-d). The first one is the too-strong positive anomaly around the Maritime Continent, indicating a westward-displaced precipitation maximum during El Niño events. This behavior might be linked with the too-cold equatorial SST, which also occurs in CMIP6 (Fig. S3), affecting the Walker Circulation such that convective activity is displaced westward (Bayr et al. 2019). The second bias is found over the central Pacific and is consistent with the too-pronounced double ITCZ (see Fig. 3). Both of these features have been identified in earlier climate models (Li and Xie 2014) and we show here that these remain challenges in CMIP6.
Biases in El Niño precipitation composites show values that are in many regions commensurate with the observed pattern. This is especially true for CMIP3, for which the pattern is almost inverse to the observed pattern (Figs. 12a,b). This is due to the small precipitation amounts in CMIP3, which is reflected in the very small standard deviation and weak spatial correlation, shown in the Taylor diagram (Fig. 12e). CMIP5 witnessed a marked improvement in the amplitude of the precipitation signal for the composite El Niño events. This improvement is maintained by CMIP6, which additionally shows a slight improvement in the spatial pattern, reflected by a correlation coefficient of r = 0.65 in CMIP6, compared to r = 0.51 in CMIP5 (Fig. 12e).
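The two quantities read off a Taylor diagram, the standard deviation relative to the observations and the pattern correlation, can be computed as in the following sketch. It treats all grid cells equally, whereas an area-weighted version would be closer to common practice; the function name is hypothetical.

```python
import math


def taylor_statistics(model, obs):
    """Return (normalized standard deviation, pattern correlation r).

    model, obs: flat lists of the same composite field on a common grid.
    Assumption: unweighted statistics, i.e., equal-area grid cells.
    """
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    std_m = math.sqrt(sum((m - mean_m) ** 2 for m in model) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs)) / n
    return std_m / std_o, cov / (std_m * std_o)
```

A composite with too-small precipitation amounts shows up as a normalized standard deviation well below 1, independently of how well the pattern correlates.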
The representation of ENSO likely explains regional precipitation biases over the Pacific Ocean and the Maritime Continent (Fig. 1c). These might be linked with SST biases (Figs. S2 and S3). The regional SSTs have improved in CMIP5 and CMIP6 compared to CMIP3. In contrast, the precipitation bias in the Indian Ocean has increased in both CMIP5 and CMIP6 compared to CMIP3. Overall, and in contrast to the natural cycles of precipitation associated with solar radiative effects, precipitation anomalies from internal variability have improved in the mean from one phase of CMIP to the next.

Climate change
In this section we compare the signal of climate change in tropical precipitation in the twentieth century, when the global near-surface temperature has increased by 0.6 to 0.8 K (Fig. 11 in Hansen et al. 2010).Given the scarcity of precipitation data at the beginning of the twentieth century, strong signals are difficult to identify.For this reason we evaluate any signals that emerge not only for consistency across CMIP phases, but also for consistency with the present understanding of drivers of precipitation changes.

a. Long-term trends
Greenhouse warming of the surface results from additional cooling of the atmosphere, which one expects to be balanced by an increase in precipitation (Mitchell et al. 1987; Fläschner et al. 2016). Atmospheric heating from the direct atmospheric radiative effects of increased aerosol burden and CO2 (Bony et al. 2013) is associated with reduced precipitation. Past studies indicate that the increase in the twentieth-century precipitation associated with the global temperature increase has indeed been widely compensated by precipitation decreases due to atmospheric radiative effects of changes in the atmospheric composition (Thorpe and Andrews 2014; Myhre et al. 2017). Here, we investigate whether the changes in tropical precipitation in the twentieth century across the three CMIP phases are consistent with the current understanding of changes in atmospheric composition.
FIG. 12. Tropical precipitation associated with El Niño events. Shown are precipitation anomalies for El Niño events from (a) TRMM, and the bias thereof for (b) CMIP3, (c) CMIP5, and (d) CMIP6, and (e) the Taylor diagram as in Fig. 2, but using averages over land and ocean from (a)-(d). The ranges of the results were tested from each model in CMIP6 for the shorter period of TRMM, but this gives a possible error due to too-few samples of observed ENSO events. The range of standard deviations in CMIP6 from the same model (average over all models) was 0.10 and it was 0.05 for the correlation coefficient. This agrees well with Wittenberg et al. (2014) and Maher et al. (2018).

Figure 13 presents the mean precipitation across the tropics for the different CMIP phases. For the continental land areas, we also include observations from CRU, which because of its being drawn from a single realization (one as opposed to an average across the models) shows a much higher variability, which requires it to be plotted on a different scale. Note that for this reason, extremes occur in the CRU data over land that do not appear with the same magnitude for the averaged model data (e.g., CRU has minima in 1972/73 and 1987). These minima in CRU occur at times without strong volcanic eruptions. Rather, they are associated with natural variability (e.g., ENSO; Lyon and Barnston 2005; Gu et al. 2007; Gu and Adler 2011), which causes year-to-year changes in precipitation that are comparable to the magnitude of precipitation reduction after strong volcanic eruptions.
We identify a small reduction in precipitation, about −0.4 mm day⁻¹, for a couple of years after the largest explosive volcanic eruptions, namely, Santa Maria, Pinatubo, El Chichon, and Agung (Fig. 13a), in agreement with previous studies (Trenberth and Dai 2007; Gu et al. 2007; Iles and Hegerl 2014). For CMIP5 and CMIP6, similar behaviors are found over land (Fig. 13b). Here, the volcanic eruptions show stronger signatures than in the tropical mean over land and ocean together (note different range in Figs. 13a and 13b). The posteruption reduction in precipitation of the CMIP means over land is stronger than what is indicated by CRU. This behavior suggests a too-strong response of tropical precipitation to volcanic aerosol effects (e.g., potentially due to too-strong volcanic aerosol radiative effects in CMIP models of phases 5 and 6).
For all phases of CMIP, precipitation increases slightly over the first half of the twentieth century (Fig. 13). This is followed by a reduction of precipitation in the second half of the century in CMIP5 and CMIP6, but not in CMIP3. The weaker signal in CMIP3 might be from less pronounced aerosol-climate effects as compared to the later CMIP phases. This aerosol signature, combined with the direct effect of CO2, is thought to explain the absence of a positive trend in precipitation in the models that is to be expected based on near-surface temperature changes alone. For instance, the twentieth-century trend in precipitation from CRU over land is positive, albeit small (Fig. 14b). The precipitation in the models rather indicates a slowdown of the hydrological cycle (Fig. 14), a change that has been attributed to dimming by anthropogenic aerosols (Wild 2009). This is consistent with the reduction of the positive precipitation trends over ocean as we go from CMIP3 to CMIP5 and CMIP6 (Fig. 14c), since anthropogenic aerosols also affect regions offshore of land. The reduction in precipitation over land is, if anything, often stronger across the CMIP5 and CMIP6 ensembles. Such differences could arise from either a tendency toward more negative aerosol forcing in CMIP5 and CMIP6, or a too-weak trend in near-surface temperatures over tropical land for other reasons.

b. Extreme monthly precipitation
Here we assess trends in extreme monthly precipitation sums to characterize changes in meteorological droughts and wet spells. To this end, we employ the standardized precipitation index (SPI; McKee et al. 1993), a widely used index for identifying dry and wet spells across different climates. Examples of previous applications include assessments of extreme precipitation in the past (Bordi et al. 2009; Bothe et al. 2010; Zhu et al. 2011; Sienz et al. 2012). On the monthly time scales we assess here, SPI can be related to groundwater and water reservoir storage.
SPI is a normalized index for the probability of occurrence of a rainfall amount compared to the climatology. SPI units are interpreted as the number of standard deviations that an observed anomaly deviates from the climatological mean, listed in Table S6. The computation follows a standard procedure described in detail elsewhere (e.g., Sienz et al. 2012). SPI is a standardized departure of precipitation from a selected probability distribution function fit to the data by preserving the probabilities. This standardizing transformation ensures that the SPI gives a uniform measure for different climates. Here, we fit three distribution types (gamma, generalized gamma, and Weibull) at all grid points and choose the one that yields the largest p-value, that is, the probability for a value of the test statistic to occur according to the null distribution. The null hypothesis is rejected if the p-value is less than or equal to a given test level (e.g., 5%) and not rejected otherwise, so that selecting the largest p-value favors the best-fitting distribution. Climate change is then determined by the change in the number of grid points (per unit area) that belong to the same SPI classes.
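The probability-preserving transformation at the core of the SPI can be illustrated without a distribution fit. The sketch below deliberately swaps the fitted gamma, generalized gamma, or Weibull distribution for an empirical cumulative probability based on ranks, so it is a simplified stand-in and not the procedure of Sienz et al. (2012). The plotting position (rank − 0.5)/n and the function name are assumptions.

```python
from statistics import NormalDist


def spi_empirical(monthly_totals):
    """Map each monthly total to a standard normal quantile via its rank.

    Assumption: cumulative probabilities come from the plotting position
    (rank - 0.5) / n instead of a fitted distribution; ties keep input order.
    """
    n = len(monthly_totals)
    order = sorted(range(n), key=lambda i: monthly_totals[i])
    standard_normal = NormalDist()
    spi = [0.0] * n
    for rank, i in enumerate(order, start=1):
        probability = (rank - 0.5) / n
        spi[i] = standard_normal.inv_cdf(probability)
    return spi
```

With a fitted distribution, `probability` would instead be the fitted cumulative distribution function evaluated at the monthly total; the mapping to the standard normal quantile stays the same.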
Our SPI analysis of the CMIPs and CRU is shown in Fig. 15.The geographical distribution of SPI trends in CMIP6 (Fig. 15a) indicates a tendency toward dryness in the tropical rain forests of the Amazon, southern Africa, and the subtropical parts of China.Slightly increasing tendencies in SPI are observed for large parts of the Sahel and the Maritime Continent.The CRU data indicate that the number of both extreme wet and dry months has increased (Fig. 15b), while the normal events (N0)-which comprise the majority (68% of the distribution probability)-have slightly decreased.This is indicative of a broadening of the distribution of monthly precipitation anomalies.In the course of the three generations of CMIP, tendencies in the probabilities of simulated dry anomalies (D1-D3) have become systematically more similar to the CRU data.This does not apply to the wet classes.All CMIPs show, in contrast to the CRU data, a downward trend in the number of wet anomalies (W1-W3), which is consistent with a tendency toward dryness of tropical land (e.g., Fig. 15a).

Opportunities for future advancements
Some of the biases in the representation of tropical precipitation have been reduced over successive CMIP phases. These include aspects of the pattern of the mean precipitation as well as the precipitation signature of modes of internal variability. But progress has been uneven. In many metrics, such as in the diurnal or seasonal cycle, there is no clear sign of a continuous improvement in CMIP6. And for some quantities, such as the fraction of precipitation that comes from low-level cloud regimes (warm rain), or in measures of the summer monsoon, the CMIP6 model mean departs more from the observations than do earlier CMIP phases.
Even where progress has been most marked (e.g., for the climatological pattern of precipitation), the model biases are still larger than the signals they are being used to predict (Palmer and Stevens 2019). To illustrate this, Fig. 16 shows the change in precipitation expected at the time of CO2 doubling, for simulations with a 1% yr⁻¹ increase in CO2. Typically the temperature increase at this time, known as the Transient Climate Response, is about +2°C. We assume this to be representative of a magnitude that CMIP6 would project into the future and hence use the associated tropical precipitation response as the signal to be compared to the bias. The analysis illustrates that the regional signals are much smaller than the biases (cf. Fig. 1c), both in the multimodel mean and also for the CMIP6 model MPI-ESM-LR (not shown). Typical ways to argue that models are fit for the purpose of projecting future precipitation changes would be to point to the consistency of their response and the strength of the signal relative to the bias. Although we find some consistency across the models for the sign of the precipitation response in the central Pacific, evaluations of CMIP simulations earlier than CMIP6 already indicated that this does not imply that the regional signal is correct (England et al. 2014; Luo et al. 2018; Cai et al. 2019). Moreover, we do not find regional model agreement everywhere in the tropics. If the models are to be justified as fit for purpose, a more sophisticated argument is required.
One such argument would be the claim that the projected changes do not depend on the model representation of the mean state. There is some support for this claim from precipitation changes without spatial shifts, in the sense that wet regions get wetter and dry regions get drier (Mitchell et al. 1987; Held and Soden 2006). But this does not apply to the more crucial question of regional changes in tropical precipitation that additionally account for spatial shifts, like those shown in Fig. 16. Part of the problem is the poor understanding of how clouds couple to atmospheric dynamics (e.g., reflected by the model diversity in precipitation responses to warming in idealized simulations; Stevens and Bony 2013).
Another unsatisfactory aspect comes from the realization that for many of the metrics we have evaluated, the model biases are well known (e.g., Stouffer et al. 2017). The point of model development is to eliminate biases, but without a comprehensive evaluation it can be hard to know whether climate model improvements arise from overfitting or from an overall better representation of physical processes. If models are overfitted, one expects that improvements in a physical parameterization scheme deteriorate some aspects of the simulated climate, as errors in that parameterization scheme no longer compensate for errors elsewhere in the model. The uneven progress for tropical precipitation that we document is an indication of the models being overfitted, rather than being entirely based on solid physical principles. For instance, MPI-ESM has a poorer representation of the MJO in CMIP6, despite an improvement of the spatial correlation from r = 0.74 in CMIP3 and CMIP5 to r = 0.82 in CMIP6. In contrast, GFDL and CNRM improved the MJO over time, but show no gradual improvement in the spatial correlation. Compared to the observed ratio of eastward- to westward-propagating spectral power of 3.4, GFDL has increased the ratio from 1.75 (CMIP3) and 1.88 (CMIP5) to 3.2 (CMIP6), whereas CNRM decreased the ratio from 5.9 (CMIP3) and 4.1 (CMIP5) to 3.7 (CMIP6). However, the spatial correlations show less of a gradual improvement in these models, with r = 0.84 (CMIP3), r = 0.77 (CMIP5), r = 0.91 (CMIP6) for GFDL and r = 0.68 (CMIP3), r = 0.83 (CMIP5), r = 0.84 (CMIP6) for CNRM.
FIG. 16. Precipitation difference in response to a doubling in atmospheric CO₂ concentrations. Shown are the precipitation changes in the CMIP6 ensemble mean, obtained from model experiments with an annual 1% CO₂ increase as the difference between two 30-yr averages in precipitation that are 70 years apart. Stippled regions indicate where at least 16 of 17 models agree on the sign of the change. We show the isoline of the tropical mean precipitation (thick contour) and use the same color bar as in Fig. 1 for an easier comparison of the regional signals to the mean biases.
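The construction described in the Fig. 16 caption can be sketched for a single grid cell as follows. This is a minimal illustration rather than the analysis code used for the figure: the function names, the convention that the two 30-yr windows start 70 years apart, and the treatment of exactly zero changes as agreeing with neither sign are assumptions made here.

```python
from statistics import fmean


def change_between_windows(annual, first_start=0, window=30, gap=70):
    """Difference between two `window`-year means whose starts are `gap` years apart.

    `annual` holds annual-mean precipitation for one grid cell and one model.
    Interpreting "70 years apart" as the distance between window starts is an
    assumption of this sketch.
    """
    early = annual[first_start:first_start + window]
    late = annual[first_start + gap:first_start + gap + window]
    if len(early) < window or len(late) < window:
        raise ValueError("time series too short for the requested windows")
    return fmean(late) - fmean(early)


def sign_agreement(changes, min_agree=16):
    """True if at least `min_agree` models share the sign of the change.

    `changes` holds one precipitation change per model for the same grid cell.
    Exactly zero changes count toward neither sign (an assumption).
    """
    positive = sum(1 for c in changes if c > 0)
    negative = sum(1 for c in changes if c < 0)
    return max(positive, negative) >= min_agree
```

Applied cell by cell to 17 models, `sign_agreement` would give the stippling mask, and the mean of `changes` the shaded ensemble-mean signal.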
Our documentation of the slow progress across the analyzed CMIP phases indicates difficulties in developing a suite of physical parameterization schemes for fully coupled Earth system models that successfully produce the desired outcome, here just accurate tropical precipitation, let alone the dynamic climate system as a whole. This is due to the complexity of interacting processes and the degrees of freedom for choosing appropriate settings in the various parameterizations, not only for moist convection itself, but also for processes like radiative transfer, aerosols, and turbulent mixing that influence cloud development. Such problems are, for instance, reflected in little progress on bounding radiative forcing from aerosol perturbations (Bellouin et al. 2020) and a poor model representation of the diurnal cycle in near-surface winds (Fiedler et al. 2016). Moreover, potential errors in resolved processes project onto tropical precipitation (e.g., the atmospheric and ocean dynamics that set the large-scale environment in which moist convection develops). Simulating circulation is known as a grand challenge for climate models (e.g., Sandeep et al. 2014; Bony et al. 2015).
There is a tendency to say that parameterization improvements are difficult but important, so we should persevere. But given the slow pace and uneven nature of improvements for tropical precipitation since CMIP3, it at least seems fair to ask why one expects developments to accelerate when continuing along this same path. In the absence of an especially compelling answer and in the light of our own frustration, we discuss other options for future research on tropical precipitation. We focus on superparameterizations and storm-resolving simulations, well aware that not all of them are immediately feasible for the full breadth of research interests.
One way forward to improve tropical precipitation in coarse-resolution climate models is superparameterization. Superparameterization refers to the replacement of the parameterization scheme for convection by a cloud-system-resolving model (e.g., Khairoutdinov and Randall 2001; Dirmeyer et al. 2012). The domain of the embedded model has a sufficiently high resolution for simulating convective systems, including their triggering, organization, and propagation, within each grid box of a coarse-resolution model. This idea has been developed further into ultraparameterizations that resolve large-eddy mixing, requiring resolutions on the order of 20 m (Parishani et al. 2017). Models with a superparameterization can better simulate light and extreme precipitation associated with convective clouds (Li et al. 2012). A superparameterization therefore helps to reduce tropical precipitation biases (e.g., in the diurnal and seasonal cycle as well as the MJO; Randall et al. 2016), despite challenges regarding the correct orientation of the embedded model relative to the wind (Dirmeyer et al. 2012) and a persistent scale gap similar to that of traditional parameterization schemes (Stevens et al. 2020). Given the computational costs of superparameterizations, fast alternatives using deep neural networks (e.g., trained on model simulations with superparameterization) are also being tested (e.g., Rasp et al. 2018), with the caveat that physical processes are replaced with an artificial intelligence.
It is therefore worthwhile to also explore different approaches to study tropical precipitation with global warming. Here global storm-resolving or convection-permitting simulations, with horizontal resolutions of 1 to 5 km, are attractive. Such simulations resolve more of deep convection (Kendon et al. 2017; Hohenegger et al. 2020), and have successfully been used for regional domains including parts of the tropics (e.g., Heinold et al. 2013; Ban et al. 2014; Leutwyler et al. 2016; Heinze et al. 2017; Klocke et al. 2017). Results of global storm-resolving simulations indicate an ability to overcome persistent biases (e.g., in the diurnal cycle, precipitation extremes, and the placement of the ITCZ; Sato et al. 2009; Satoh et al. 2019; Schär et al. 2020; Arnold and Randall 2015; Stevens et al. 2020).
The idea that storm-resolving models could be applied for global climate research is sometimes dismissed as something for the distant future (e.g., Schneider et al. 2017). However, recent studies (Neumann et al. 2019; Düben et al. 2020) suggest that if existing models could be run on today's largest supercomputer, global simulations on grids as fine as 1.5 km could already deliver one simulated year per day. As discussed elsewhere (cf. Stevens et al. 2020), these models are not likely to solve all problems (e.g., their results are sensitive to the representation of cloud microphysical and small-scale mixing processes), but they appear to have a more solid physical basis for simulating processes relevant to tropical precipitation, such as the coupling to mesoscale atmospheric dynamics. We suggest that the past view, whereby traditional parameterizations for moist convection are necessary for all research questions in climate sciences and global storm-resolving models are a method of the distant future, is outdated. If the past pace of progress across the CMIP phases is any indication for the future, then we should not reject the idea that storm-resolving models could offer a quicker route to a better understanding of tropical precipitation within the next decade.

Conclusions
Our assessment of tropical precipitation as measured across a wide range of metrics, and over three generations of CMIP models, shows that the simulations are improving in some respects, but the improvements are uneven. In some metrics the CMIP6 models appear to have larger biases than previous generations of models. Compared to state-of-the-art observations, we identify some improvement in the 1) tropical mean spatial correlations and the root-mean-square error of the climatology of tropical precipitation, 2) day-to-day variability and the number of consecutive dry days, 3) dominant modes of internal variability measured by the Madden-Julian oscillation and El Niño-Southern Oscillation, and 4) trends in dry extremes in the twentieth century. In contrast, we find no clear improvement for tropical precipitation in CMIP6 concerning 1) the seasonal cycle, which shows a persistent double-ITCZ bias and still poorly represented summer monsoons; 2) the precipitation associated with convective cloud regimes, with the largest precipitation events being associated with too-shallow deep clouds; 3) the time of occurrence of diurnal maxima, with still a systematically too-early maximum in precipitation amount and frequency; and 4) the persistent negative trends in wet extremes across all CMIP phases, a change that is opposite to the positive trend found in observations for the twentieth century.
In those metrics where the CMIP6 models better represent the observations as compared to previous phases of CMIP, one can assess the rate of progress in different ways. For instance, as regards the strength of eastward-propagating information associated with the MJO, substantial progress has been made. But even if one assumes that the current rate of progress can be maintained, another two phases (15 years) will be required to match the simulated MJO to within observational uncertainty. For other quantities the time scale of model improvement appears even longer. For instance, biases in the mean climatology are still several times larger than the signals projected in association with a roughly 2°C warming.
Given the efforts that have been devoted to improving global climate models, one is reluctant to be dismissive of the improvements that have been realized. It is also almost certainly the case that some models have made much larger improvements than is apparent when one looks at all the models as a whole. Nonetheless, a sober analysis of progress must be based on the quality of the results rather than the magnitude and importance of the effort. In this regard, and given the interest in using these models to project the response of tropical precipitation to anthropogenic forcing, the poor response to known solar radiative effects, in the form of seasonal and diurnal cycles, is discouraging. When it is considered that these are not independent tests of the models, as the desired response of tropical precipitation is well known, the results are even more discouraging. Given that the present situation arises after model development over three generations of CMIP models, we begin to consider the possibility that coarse-resolution global climate models suffer from structural deficiencies (i.e., the past approach of convective parameterization is ill-posed for representing tropical precipitation). This would suggest that classical climate models with parameterized convection are not necessarily adequate for the purpose of projecting future tropical precipitation changes, with implications for both physical processes that depend on them and impact studies that rely on the model results. For this reason other approaches should be encouraged, in which context storm-resolving simulations are an option.

FIG. 3. Zonal mean precipitation. Shown are annual means across tropical latitudes for (a) CMIP6 compared to TRMM, (b) CMIP5 compared to CMIP6, and (c) CMIP3 compared to CMIP6, with shading indicating the model spread as one standard deviation, and (d) the double-ITCZ index I calculated using tropical precipitation in the regions defined by Samanta et al. (2019) and explained in the text. In (d), the box-and-whisker plots indicate the median, quartiles, and extremes in CMIP3, CMIP5, and CMIP6, and the horizontal lines are the TRMM and CMORPH observational means.
FIG. 4. Wet and dry periods. (a) The frequency of wet 3-h means calculated by flattening 3-hourly CMIP data and observations in time and space into a single dimension and counting the number of precipitation events; (b) the 1-day lag autocorrelation of total daily precipitation, temporally and spatially averaged for CMIP and observations; and (c),(d) the number of consecutive dry days (CDD) as (c) box-and-whisker plots for the time and spatial average of CDD of CMIP, with observations plotted as horizontal lines, and (d) the probability of occurrence of CDD across time and space.
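The two daily statistics named in the Fig. 4 caption can be illustrated for a single grid-point time series. This is a sketch under stated assumptions, not the code behind the figure: the dry-day threshold of 1 mm day⁻¹ and the function names are chosen here for illustration, and the threshold used in the paper may differ.

```python
def dry_spell_lengths(daily_precip, dry_threshold=1.0):
    """Lengths of runs of consecutive dry days in a daily precipitation series.

    A day counts as dry if its precipitation is below `dry_threshold`
    (in mm per day); the default value is an assumption of this sketch.
    """
    lengths, run = [], 0
    for amount in daily_precip:
        if amount < dry_threshold:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # close a dry spell that lasts until the end of the series
        lengths.append(run)
    return lengths


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a series, normalized by the total sum of squares."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    total = sum(d * d for d in dev)
    if total == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / total
```

Averaging the spell lengths over time and grid points gives a mean CDD, while a histogram of the pooled lengths gives a probability of occurrence of the kind shown in panel (d).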
FIG. 5. Jennings scaling. (a) Three examples for the calculated data points that fall on a line in the depth-duration (P-D) space and (b) the Jennings slopes a of that line across the CMIP phases and in TRMM, compared to the gauge measurements used by Jennings (1950). The box-and-whisker plots show the means, quartiles, and extremes across the CMIP phases.

FIG. 6. Precipitation associated with clouds of different depth. (a) The fraction of precipitation associated with low-level clouds, defined as the daily precipitation in regions with daily outgoing longwave radiation (OLR) greater than 250 W m⁻² divided by the total tropical precipitation amount. (b) Tropical mean in daily OLR against daily precipitation amount binned by steps of 10 mm day⁻¹. Shaded areas mark half the standard deviation of the model spread. The probability density functions of individual models are shown in the supplemental material (Fig. S6). In both (a) and (b) the black (gray) line is the precipitation observed by TRMM (CMORPH) and OLR based on CERES. We use here CMIP model data marked with indices 2 and 3 in Tables S1-S4, which is slightly less than in analyses that only use daily precipitation, because of the availability of OLR output.
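The warm-rain fraction defined in panel (a) of Fig. 6 amounts to a thresholded ratio. The sketch below assumes flattened lists of daily grid-cell values and explicit area weights; the function name, the data layout, and the use of area weights are assumptions made for illustration.

```python
def warm_rain_fraction(precip, olr, area_weights, olr_threshold=250.0):
    """Fraction of precipitation falling where daily OLR exceeds `olr_threshold`.

    `precip` (mm per day), `olr` (W per square meter), and `area_weights` are
    equally long sequences with one entry per grid cell and day. The 250 W m-2
    threshold follows the figure caption; the area weighting is an assumption.
    """
    total = 0.0
    warm = 0.0
    for p, o, w in zip(precip, olr, area_weights):
        total += p * w
        if o > olr_threshold:  # high OLR marks low cloud tops (warm rain)
            warm += p * w
    if total == 0:
        raise ValueError("total precipitation is zero; the fraction is undefined")
    return warm / total
```

For example, three equally weighted cells with precipitation 10, 2, and 4 mm day⁻¹ and OLR of 200, 260, and 280 W m⁻² yield a fraction of 6/16 = 0.375.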

FIG. 8. Area-weighted summer monsoon area and intensity. Shown are the values for (a),(c) the monsoon and for (b),(d) the monsoon over land for CMIP3, CMIP5, and CMIP6. Box-and-whisker plots show the median, quartiles, the 99th percentiles, and extremes. Horizontal lines are the means of TRMM and CMORPH, which overlap for the monsoon area.

FIG. 10. Time of occurrence of tropical precipitation. The 24-h clocks show the time of the day (the angle) and the magnitude (the distance from the center) for (left) the precipitation amount, (center) the frequency of wet 3-h means, and (right) the all-hour 99th percentile of all 3-h means as index for the intensity. The analyses are shown for (a)-(c) tropical land, (d)-(f) tropical ocean, and (g)-(i) low-level clouds over tropical ocean, defined by a daily mean OLR > 250 W m⁻². The black solid (gray dashed) lines are the diurnal cycles of TRMM (CMORPH) satellite observations. Thin gray lines indicate the diurnal cycles of individual climate models to illustrate the model spread. The small circles mark the maxima in the observations and three generations of CMIP climate models. Please note that the simulations provide 3-hourly averages and are shown at the middle of the averaging period. We do not show the ensemble-averaged diurnal cycle of the CMIP phases due to the large model spread. (a)-(f) We use the 3-hourly precipitation data, marked by index 1, and (g)-(i) 3-hourly precipitation data paired with daily OLR, marked with indices 1 and 3 in Tables S1-S4.
FIG. 11. Eastward-propagating strength of the Madden-Julian oscillation (MJO). Shown is the ratio of the eastward- and westward-propagating spectral power (r) of tropical precipitation (see section 5a). The dashed line indicates a standing wave. Values larger (smaller) than one are eastward- (westward-) propagating waves. The box-and-whisker plots indicate the median, quartiles, and extremes in the CMIP phases.

FIG. 13. Time series of twentieth-century tropical precipitation for CMIP multimodel means. (a) Tropical mean precipitation anomalies with respect to their long-term mean for 1900-99. (b) As in (a), but for continental land regions only, including CRU data. The ΔP CRU axis is scaled by the fraction of the mean standard deviations of CMIP and CRU to account for differences in the year-to-year variability of ensemble means and observation [right y axis in (b)]. The means are stated by the colored numbers ±1 standard deviation of the CMIP ensemble.
FIG. 14. Precipitation trends over the period 1900-99 for CMIP3 (blue), CMIP5 (orange), CMIP6 (red), and CRU observations (only available over land; black). The histogram values are marked by the horizontal bars and the mean values by the dashed lines. Gray shading marks the 90% confidence interval for no trend based on a bootstrap method, calculating trends in about 10⁴ random sequences of 100 annual means from CMIP5, which currently has the most models.
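The bootstrap described in the Fig. 14 caption can be sketched as follows. The caption does not specify how the random sequences are drawn, so resampling with replacement from a pool of annual means, the least-squares trend estimate, the percentile indexing, and all function names are assumptions of this illustration.

```python
import random


def linear_trend(values):
    """Least-squares slope of `values` against their index (change per time step)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def no_trend_interval(annual_means, n_samples=10_000, length=100, level=0.90, seed=0):
    """Central `level` interval of trends found in randomly ordered sequences.

    Each sequence draws `length` values with replacement from `annual_means`,
    which destroys any temporal ordering; trends outside the returned interval
    are then unlikely to arise without a real trend.
    """
    rng = random.Random(seed)
    trends = sorted(
        linear_trend(rng.choices(annual_means, k=length)) for _ in range(n_samples)
    )
    tail = (1 - level) / 2
    lower = trends[int(tail * n_samples)]
    upper = trends[int((1 - tail) * n_samples) - 1]
    return lower, upper
```

A simulated or observed 1900-99 trend would then be compared against the `(lower, upper)` pair, which plays the role of the gray shading in the figure.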
FIG. 15. SPI over land. (a) The trends of SPI in CMIP6 (SPI units per decade), and (b) the spatially averaged trends per decade for the number of events per SPI class for the CMIP phases and CRU (1900-99).

TABLE 1. Overview of used precipitation observations. Listed are the characteristics of the data and the means for the tropics (G), tropical land (L), and tropical ocean (O), and the ratio of land and ocean precipitation rates (L/O).