1. Introduction
The representation of tropical precipitation has never been a strength of global climate models. Some of the reasons are well known but have proven difficult to address with classical climate modeling approaches. These include the representation of moist convection, which produces the majority of precipitation in the tropics but is a process that coarse-resolution climate models must parameterize in terms of resolved processes. It is known that model differences in precipitation arising from such an approach can be substantial (e.g., Dai 2006; Stevens and Bony 2013). In reviewing progress over past phases of the Coupled Model Intercomparison Project (CMIP), Stouffer et al. (2017) identify six “particularly important and long-standing biases” that the authors hope will be reduced in CMIP’s sixth phase (CMIP6). The first of these is related to the misrepresentation of tropical precipitation, in the form of tropical rainbands being too hemispherically symmetric, known as the double intertropical convergence zone (ITCZ) bias. Other studies have pointed to further deficiencies, for example in the representation of the summer monsoon (Zhang et al. 2015), modes of internal variability (Ahn et al. 2017), and the intensity distribution and extremes of precipitation (Stephens et al. 2010).
A correct simulation of the tropical climate matters, not only directly for the region, but also indirectly by influencing the response of the general circulation to forcing at global scales (Held 1983; Palmer and Owen 1986; Zhou and Xie 2015). Precipitation is important due to its many impacts, ranging from ecosystems (Cox et al. 2000) to air pollution (Rodhe and Grandell 1972; Baker and Charlson 1990; Bourgeois and Bey 2011). Hence the past decades have witnessed substantial efforts to improve precipitation in climate models, including the representation of the hydrological cycle in the tropics. Despite these efforts, progress has proven unsatisfactory in past CMIP phases (Hawkins and Sutton 2011; Knutti and Sedláček 2012; Flato et al. 2013), so much so that it has been suggested to pay the computational price of resolving precipitating convection and to abandon the traditional approach of climate modeling with parameterized convection for studying tropical precipitation (Schär et al. 2020; Palmer and Stevens 2019; Satoh et al. 2019). In evaluating these arguments it seems sensible to ask whether progress in simulating tropical precipitation is as unsatisfactory as past evaluations of CMIP models suggest. This question motivates the present study, which revisits tropical precipitation over the three major phases of CMIP: CMIP3, CMIP5, and now CMIP6.
At first glance, the hope that CMIP6 models would substantially address the long-standing biases in precipitation appears unfulfilled. CMIP6 models continue to show large differences in precipitation compared to observations (Fig. 1). Half of the global precipitation occurs between 30°S and 30°N, a region we refer to as the tropics. Regional model biases relative to data from the Tropical Rainfall Measuring Mission (TRMM; Huffman et al. 2007, 2010) range from −3 to 4 mm day−1 (Fig. 1). These occur partly in regions where the absolute amount is smaller than the tropical mean of 3.58 mm day−1 (e.g., in the southeast Pacific and southern Atlantic). The spatial disagreements include a southward-displaced precipitation maximum over the Atlantic Ocean, a double-ITCZ pattern in precipitation over the Pacific Ocean, and an east–west precipitation anomaly over the Indian Ocean.

Long-term multimodel means of CMIP6 precipitation. Shown are the spatial distributions of the present-day (2000–14) precipitation statistics of CMIP6 as (a) the multimodel mean, (b) bias over land including small islands compared to gridded station observations from CRU, and (c) the mean bias in the tropics against TRMM. The thick contour indicates the isoline for the tropical mean precipitation of CMIP6 (3.58 mm day−1) for an easier comparison of regional biases to the precipitation amount. Biases are calculated from the monthly climatology for 2000–14. We use ensemble averages for models with several historical simulations.
Citation: Monthly Weather Review 148, 9; 10.1175/MWR-D-19-0404.1

The question remains whether biases in tropical precipitation in CMIP6 models have been reduced compared to previous phases of CMIP. By combining the expertise of many authors, we apply a range of previously used methods to broadly assess the representation of tropical precipitation across models participating in CMIP6. By applying the same methods to model output from the third and fifth phases of CMIP, we evaluate the extent to which model developments have been successful in improving tropical precipitation. Much of what we show effectively extends previous studies on tropical precipitation in earlier CMIP models to CMIP6. The novelty of the present study thus lies not in any specific analysis, but rather in our use of existing techniques to develop and take stock of the big picture. Specifically, by looking systematically at the representation of tropical precipitation by three generations of CMIP models, across different regions and scales as measured by various metrics, we assess the status and progress in climate modeling for tropical precipitation.
For the purpose of our study, we collected observations and model output from historical simulations with 3-hourly to monthly resolution from 97 different data sources and applied 14 different analysis approaches. The analyses are based on known methods and chosen for their merit for giving a broad view on different characteristics. Our data and the analysis strategy are introduced in the next section (section 2), followed by the presentation of the results of this analysis, which are distributed across four sections, focusing on the climatology (section 3), natural cycles associated with solar radiative effects (section 4) and modes of internal variability (section 5), and long-term trends in the twentieth century (section 6). Opportunities for future research are discussed in section 7. We end with our conclusions in section 8.
2. Data and methods
a. Data sources
1) Model output
We assess the historical simulations of global coupled climate models produced for the last three major phases of the Coupled Model Intercomparison Project: CMIP3 (Meehl et al. 2007), CMIP5 (Taylor et al. 2012), and CMIP6 (Eyring et al. 2016). In these simulations, the boundary conditions (e.g., irradiation, aerosols, orbital parameters, and greenhouse gas concentrations in the atmosphere) represent those estimated for the historical time period in the CMIP phase and therefore differ slightly from one another. The historical simulations in all phases of CMIP start in 1850 but end in 2000, 2005, and 2014 for CMIP3, CMIP5, and CMIP6, respectively. (Tables S1–S4 in the online supplemental material list the model output used here.)
The availability of model output differs across the CMIP phases and the participating models. We therefore chose the data considering the availability and intended analyses as follows: 1991–2000 for subdaily (3-hourly), 1961–2000 for daily, and 1900–2000 for monthly and annual analyses. For CMIP6, we additionally use data in the period 2000–14 for comparison against the current state-of-the-art observational record for the same time period (section 2b). Analyzed variables are total surface precipitation for all output frequencies as well as near-surface winds and top-of-the-atmosphere outgoing longwave radiation for daily to annual time scales. All CMIP data are averages over the given output intervals. CMIP3 and CMIP5 simulation results are summarized in the corresponding chapters of the fourth and fifth IPCC Assessment Reports (Randall et al. 2007; Flato et al. 2013).
Access to the CMIP data is facilitated by the Earth System Grid Federation (ESGF; Williams et al. 2016). For practical reasons, we use ESGF-published model output that had been replicated by the German Climate Computing Center [Deutsches Klimarechenzentrum (DKRZ)] by 1 October 2019. Additionally, we use the not-yet-published model output from MPI-ESM-LR produced by the Max Planck Institute for Meteorology for CMIP6.
2) Observations
We use four observational datasets, listed in Table 1 and introduced here. The diversity in estimated precipitation among the datasets is taken as a measure of observational uncertainty, which for some ocean and mountainous regions with a sparse ground-based observation network can be considerable (e.g., for the Asian monsoon region) (Ceglar et al. 2017). The rainfall retrieval product of the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA; Huffman et al. 2007) version 7 provides 3-hourly data for 1998–2019. This dataset, TRMM hereafter, combines data from passive microwave sensors, calibrated by the TRMM precipitation radar, with infrared sensors (Huffman et al. 2010), and is corrected to match rain gauge data. We further use the 3-hourly precipitation estimate from the Climate Prediction Center morphing technique (CMORPH) version 1.0 for 1998–2017 (Joyce et al. 2004). CMORPH uses data from passive microwave measurements and cloud advection vectors from correlated images of infrared sensors. For climate change assessments, we use the gridded precipitation product of the Climatic Research Unit (CRU) time series version 4.03 (Harris et al. 2014) for 1901–2014 with 0.5° spatial resolution, based on gauge networks on land.
Overview of used precipitation observations. Listed are the characteristics of the data and the means for the tropics (G), tropical land (L), and tropical ocean (O), and the ratio of land and ocean precipitation rates (L/O).


To test the observational uncertainty, we additionally use the monthly satellite-gauge product (“3IMERGM”) of the Integrated Multisatellite Retrievals for GPM (IMERG; Huffman et al. 2019) from the Global Precipitation Measurement (GPM) mission (Hou et al. 2014). IMERG extends the concept of TRMM but instead uses a dual-frequency precipitation radar paired with more passive microwave and infrared sensors. Overall, the observed mean precipitation rate for 2000–14 ranges from 3 mm day−1 (CMORPH) to 3.5 mm day−1 (IMERG) across our four observational datasets (Table 1). Individual regions can show larger observational differences, with the largest observational ranges exceeding 2 mm day−1 over islands, in the lee of mountain ranges, and in coastal areas (Fig. S1). The products mainly disagree over central Africa (CMORPH wet bias), in the Pacific warm pool (CMORPH dry bias), in the lee of mountain ranges in West India and the Malay Peninsula (CMORPH dry bias), on the Caribbean islands (CRU wet bias), and in Central America (CMORPH dry bias). More details, including seasonal differences, are provided in the supplemental information. CMORPH and TRMM capture the observational range across the assessed satellite products over land and ocean. We therefore use differences between these two products to measure the observational uncertainty in our analyses.
b. Data analysis strategy
All datasets have been screened and standardized for easy handling. This includes remapping the data to the same horizontal grid between 30°S and 30°N. Typically one would choose the coarsest resolution as the common grid to avoid generating information that the model did not simulate, but this approach would have led to a crude comparison since models in CMIP3 had substantially coarser grids than those in CMIP6. As a compromise, we use the T63 grid, which is the native grid of MPI-M’s low-resolution configuration of MPI-ESM in CMIP5 and CMIP6. This grid has 192 points along the equator, and hence a spatial resolution of approximately 200 km. We convert precipitation in all datasets to the common unit mm day−1.
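The unit conversion and the grid-spacing arithmetic can be illustrated with a minimal sketch. It assumes that the input precipitation is a mass flux in kg m−2 s−1, the unit in which CMIP archives typically store precipitation; the function names are hypothetical and are not taken from the processing chain used in this study.

```python
import numpy as np

SECONDS_PER_DAY = 86_400.0
EQUATORIAL_CIRCUMFERENCE_KM = 40_075.0


def flux_to_mm_per_day(pr_flux):
    """Convert a precipitation flux in kg m-2 s-1 to mm day-1.

    One kilogram of liquid water spread over one square metre forms a
    layer 1 mm deep, so only the time unit has to be rescaled.
    ASSUMPTION: the input really is a mass flux in kg m-2 s-1.
    """
    return np.asarray(pr_flux, dtype=float) * SECONDS_PER_DAY


def equatorial_spacing_km(n_lon):
    """Zonal grid spacing at the equator for n_lon equally spaced longitudes."""
    return EQUATORIAL_CIRCUMFERENCE_KM / n_lon


if __name__ == "__main__":
    # A flux of 4e-5 kg m-2 s-1 is a typical tropical-mean magnitude.
    print(round(float(flux_to_mm_per_day(4.0e-5)), 3), "mm/day")
    # A T63 Gaussian grid has 192 longitudes.
    print(round(equatorial_spacing_km(192), 1), "km")
```

The spacing obtained for 192 longitudes is slightly above 200 km, which is why the resolution is quoted as approximate.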
In addition to performing an analysis over the entire tropics, separate analyses are performed for tropical land and ocean. For this purpose we use the land–sea mask of MPI-ESM1.2. We count grid cells with more than 50% ocean surface as ocean and otherwise as land. This approach implies that small islands are assigned to ocean regions. All tropical lakes are defined as land. Results from the analyses for tropical land and ocean are shown if relevant.
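The 50% rule for separating land and ocean can be sketched as follows. The sketch assumes that the mask is available as an ocean-area fraction per grid cell (between 0 and 1, with lakes counted as non-ocean) and that regional means are weighted by the cosine of latitude; both the input layout and the function names are illustrative assumptions.

```python
import numpy as np


def split_land_ocean(ocean_fraction, threshold=0.5):
    """Return boolean (land, ocean) masks from an ocean-area fraction field.

    Cells with MORE than `threshold` ocean count as ocean; everything
    else, including cells with exactly 50% ocean, counts as land.
    """
    ocean_fraction = np.asarray(ocean_fraction, dtype=float)
    is_ocean = ocean_fraction > threshold
    return ~is_ocean, is_ocean


def masked_mean(field, mask, lat_deg):
    """Cosine-latitude-weighted mean of `field` (lat, lon) where `mask` is True.

    ASSUMPTION: cos(latitude) weighting approximates the grid-cell area.
    """
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))[:, None]
    weights = weights * np.ones(field.shape) * mask
    return float(np.sum(field * weights) / np.sum(weights))


if __name__ == "__main__":
    frac = np.array([[1.0, 0.8, 0.5], [0.2, 0.0, 0.95]])
    pr = np.array([[4.0, 4.0, 2.0], [2.0, 2.0, 4.0]])
    land, ocean = split_land_ocean(frac)
    print(masked_mean(pr, land, [-10.0, 10.0]), masked_mean(pr, ocean, [-10.0, 10.0]))
```

A small island that covers less than half of a grid cell ends up in the ocean mask, which is the behavior described above.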
The output from models that provide more than one simulation for the historical period is averaged per model before computing the mean of a CMIP phase. By this procedure, we avoid giving too much weight to an individual model that produced particularly many simulations. The model output includes both the precipitation contributions from the model’s subgrid parameterizations and the fractions associated with atmospheric dynamics explicitly resolved on the model grids.
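The two-step averaging can be sketched as below: ensemble members are first averaged per model, and the multimodel mean is then taken over the per-model means. The input format, a list of (model name, field) pairs, is an assumption made for illustration.

```python
from collections import defaultdict

import numpy as np


def multimodel_mean(simulations):
    """Average ensemble members per model, then average across models.

    `simulations` is an iterable of (model_name, field) pairs, where all
    fields share the same shape. Returns (multimodel_mean, per_model_means).
    """
    members = defaultdict(list)
    for model_name, field in simulations:
        members[model_name].append(np.asarray(field, dtype=float))

    # Step 1: one ensemble mean per model.
    per_model = {name: np.mean(fields, axis=0) for name, fields in members.items()}
    # Step 2: every model enters the phase mean with equal weight.
    phase_mean = np.mean(list(per_model.values()), axis=0)
    return phase_mean, per_model


if __name__ == "__main__":
    sims = [("A", [1.0, 1.0]), ("A", [3.0, 3.0]), ("A", [5.0, 5.0]), ("B", [9.0, 9.0])]
    mean, per_model = multimodel_mean(sims)
    print(mean)  # model A (three members) and model B (one member) weigh equally
```

Averaging all four simulations directly would pull the result toward model A, whereas the two-step mean lies midway between the two models.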
As discussed in the introduction, none of the analysis techniques we employ are novel. Most are widely used in the climate modeling community (e.g., statistics over different time and length scales as well as analyses under different meteorological regimes). Some techniques are less familiar (e.g., the standardized precipitation index and the concept of Jennings scaling; Jennings 1950). These are included to present a broader view of how precipitation is represented in models. We further analyze precipitation associated with a range of different atmospheric features like cloud regimes, monsoons, and intra- and interseasonal variability. The details of these techniques are introduced in the relevant sections.
A comparison of models and observations encompassing different time periods poses a number of challenges. One challenge is the definition of a common time period for the comparison, as not only do the different CMIP phases end in different years, they also overlap differently with satellite datasets. Initially we compared TRMM against CMIP6 for the overlapping time period 2000–14 as validation, and CMIP6 against CMIP5 and CMIP3 for the overlapping period 1900–99 to determine the development across CMIP generations. We found, however, only small differences in the statistics of CMIP6 for 1900–99 and 2000–14 in all our results, consistent with similar long-term mean statistics for CMIP6 (Table 2) and the small past trend in tropical mean precipitation (section 6). For instance, the spatial correlation coefficient of the CMIP6 precipitation climatologies for the twentieth and twenty-first centuries over the tropics is 0.998, much larger than the average correlations between CMIP models and TRMM (Table 3). For simplicity, and because a more temporally consistent comparison adds no new information, we compare TRMM and CMORPH for 2000–14 directly with the different CMIP phases for 1900–99. Another challenge was to establish to what extent changes across phases of CMIP were simply the result of a different mix of models in each phase. To test this possibility we selected the subset of models that participated in all phases of CMIP and tested to what extent this sample of models influenced our conclusion for the climatological mean. We found that using all the CMIP models or just the subset participating in all CMIP phases yielded similar results (not shown). We further tested averaging over related models to account for different processing practices (Abramowitz et al. 2019). To this end, we calculated the standardized precipitation index on averages of related models in CMIP6 and identified only small differences that did not change our conclusions (not shown).
Long-term mean statistics for tropical precipitation. (from left to right) Listed are the CMIP phase, the time period, the number of models for calculating the long-term statistics, the means ± 1 standard deviation in precipitation for the tropics (G), tropical land (L), and tropical ocean (O), and the ratio of land and ocean precipitation rates (L/O).


Long-term mean comparison of tropical precipitation. Listed are the root-mean-square error/difference (RMSE/D) and the spatial correlation coefficients (r) for the tropics (G), tropical land (L), and tropical ocean (O) as well as the correlation coefficient of the differences between June–August and December–February means as a measure of the seasonal amplitude (S). The top row shows CMIP6 for 1900–99 (20th) against CMIP6 2000–14 (21st), followed by TRMM against CMIP6 for 2000–14 and rows below TRMM (2000–14) against CMIPs (1900–99). The statistics are computed on the multimodel mean precipitation in the three CMIP phases against TRMM.


3. Climatology
a. Tropical mean
There has been, and continues to be, a long-standing discrepancy between energy-budget inferences of precipitation and estimates of precipitation based on observations, whereby the former tend to be larger than the latter (Stephens et al. 2012; Stevens and Schwartz 2012; Wild et al. 2012). The tropical precipitation from the CMIP models assessed here is also larger than the observational estimates. Compared to the tropical mean from TRMM of 3.23 mm day−1, CMIP3 overestimates precipitation by 0.21 mm day−1, and CMIP5 and CMIP6 by 0.34 mm day−1 (Table 2). The tropical means of CMIP5 and CMIP6 are outside of the spread in the satellite observations (Table 1). The intermodel standard deviation is larger than the mean bias for CMIP3, but smaller for CMIP5 and CMIP6. The overestimation is also seen for precipitation averaged over oceans, with CMIP5 and CMIP6 being outside of the observational range (Tables 1 and 2). For land, we find a slight underestimation in CMIP3 and CMIP5, but CMIP6 is in the observational range. The observed land-to-ocean ratios in precipitation of 0.86–0.99 are consistently underestimated in all CMIP means. The land–ocean ratio has, however, slightly increased across the CMIP phases, with CMIP6 (0.82) being the closest to the lower bound of the observational range of the land–ocean ratio (Table 1).
The spatial pattern of tropical precipitation shows a systematic improvement across the CMIP phases, although the values still do not fall within the observational uncertainty. We measure this with the spatial correlations, r, in the annual mean tropical precipitation between CMIP and TRMM, with r = 0.75 in CMIP3, r = 0.79 in CMIP5, and r = 0.84 in CMIP6 for the tropical mean (Table 3). Improvements across CMIP in r are also found for both tropical ocean and land separately, with r being slightly larger over ocean than over land (Table 3). The observed pattern differences, measured by r, are larger over land than over ocean (Fig. 2), but none of the CMIP means fall within the observational uncertainty for r, measured as the spatial correlation between CMORPH and TRMM. Only the two best CMIP6 models for this metric, CESM2 and CESM2-WACCM with r > 0.9 over tropical land, come close to the observational range for r, reflecting a regional improvement in the tropical precipitation pattern of these models (Woelfle et al. 2019). The root-mean-square errors (RMSE) for precipitation compared to TRMM have also decreased on average over the CMIP phases, from 1.85 mm day−1 in CMIP3 to 1.80 mm day−1 in CMIP5 and 1.55 mm day−1 in CMIP6, but again these are larger than the observational uncertainty (Table 3). RMSEs are slightly larger over ocean than over land in both CMIP3 and CMIP5, but this behavior has reversed in CMIP6.
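The two pattern statistics used here, the spatial correlation r and the RMSE, can be sketched for fields on a common latitude–longitude grid. The cosine-latitude area weighting is an assumption of this sketch and not necessarily the weighting used for Table 3; the function names are likewise illustrative.

```python
import numpy as np


def area_weights(lat_deg, n_lon):
    """Normalized cos(latitude) weights with shape (n_lat, n_lon)."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))[:, None]
    w = w * np.ones((1, n_lon))
    return w / w.sum()


def pattern_stats(model, obs, weights):
    """Weighted spatial correlation and RMSE between two 2D fields.

    `weights` must be non-negative and sum to one.
    Returns (r, rmse); rmse has the units of the input fields.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m_anom = model - np.sum(weights * model)
    o_anom = obs - np.sum(weights * obs)
    covariance = np.sum(weights * m_anom * o_anom)
    r = covariance / np.sqrt(np.sum(weights * m_anom**2) * np.sum(weights * o_anom**2))
    rmse = np.sqrt(np.sum(weights * (model - obs) ** 2))
    return float(r), float(rmse)


if __name__ == "__main__":
    lat = np.linspace(-30.0, 30.0, 5)
    obs = np.arange(20.0).reshape(5, 4)
    w = area_weights(lat, 4)
    # A uniform wet bias leaves the pattern intact (r = 1) but gives RMSE = bias.
    print(pattern_stats(obs + 1.0, obs, w))
```

The example illustrates why both statistics are reported: a model can reproduce the pattern perfectly while still carrying a mean bias, which only the RMSE detects.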

Taylor diagrams for tropical precipitation. Shown are the correlation coefficient, spatial standard deviation, and the root-mean-square error following Taylor (2001) of the tropical precipitation over (left) land and (right) ocean. Statistics are calculated on the (a),(b) long-term means and (c),(d) the difference between June–August and December–February means for the models (colored circles) against TRMM (black star). We mark the spread and average for all models per CMIP phase (colored lines) and the average for the selection of those models that participated in all CMIP phases (colored stars). We show CMIP6 model data for 1900–99 only, since the differences in the statistics of CMIP6 for 2000–14 and 1900–99 are small. The observational uncertainty is indicated by calculating the same statistics for CMORPH (gray star) against TRMM.
Figure 2 shows the spatial standard deviations of CMIP models in comparison to TRMM for land and ocean. This measure of spatial variability in the annual mean is too small in CMIP3 over tropical land, whereas CMIP5 and CMIP6 are closer to observations; over tropical ocean, the standard deviation has been similar across the CMIP phases. The difference between the mean precipitation for June–August and December–February is used as a measure for the seasonal amplitude S. The spatial correlations of S between the models and TRMM have improved across the CMIP phases (Fig. 2 and Table 3), but all values fall outside of the observational uncertainty.
We test the hypothesis that improvements in precipitation across the CMIP phases occur in tandem with a reduction in large-scale SST biases. Climate models typically underestimate SSTs by several degrees in large parts of the tropical oceans, especially in the cold tongue region in the Pacific Ocean (e.g., Woelfle et al. 2019), while they overestimate SSTs in the upwelling regions at the eastern side of the basins (Li and Xie 2012). We find, however, no clear indication that the large-scale precipitation differences over tropical oceans are tightly linked with model differences in SSTs, either for the entire tropical oceans or for the cold tongue in the Pacific, although some of the SST biases in CMIP6 are smaller than in CMIP3 and CMIP5 by up to 1 K (Figs. S2 and S3).
b. Zonal mean
Despite evidence of improvements in the spatial pattern of precipitation, we find no sign of improvement for the zonal mean precipitation across the CMIP phases (Figs. 3a–c). The zonally averaged annual mean precipitation is remarkably robust across all phases of CMIP. The Northern Hemisphere rainfall maximum is well matched compared to TRMM. In the Southern Hemisphere, the rainfall maximum in CMIP6 and CMIP5 is, however, overestimated compared to both TRMM and CMIP3. This is likely related to the too-pronounced double ITCZ in the models (e.g., Li and Xie 2014) and possibly explains the mean differences in tropical precipitation in the central Pacific (Fig. 1). In previous works, the double ITCZ has been related to biases in the ocean–atmosphere feedbacks in the tropical Pacific (Lin 2007), errors in cloud simulations (Li and Xie 2014), and the cold tongue bias in the tropical Pacific (Samanta et al. 2019).

Zonal mean precipitation. Shown are annual means across tropical latitudes for (a) CMIP6 compared to TRMM, (b) CMIP5 compared to CMIP6, and (c) CMIP3 compared to CMIP6, with shading indicating the model spread as one standard deviation, and (d) the double-ITCZ index I calculated using tropical precipitation in the regions defined by Samanta et al. (2019) and explained in the text. In (d), the box-and-whisker plots indicate the median, quartiles, and extremes in CMIP3, CMIP5, and CMIP6, and the horizontal lines are the TRMM and CMORPH observational means.
The double ITCZ in CMIP6 (Figs. 3a–c) shares the same biases as have been previously reported for CMIP3 and CMIP5 (Zhang et al. 2015). As a quantitative comparison, we compute the double-ITCZ index I (Samanta et al. 2019):
Here P_N is the mean precipitation in the northern box (5°–15°N, 160°E–120°W), P_S is the mean precipitation in the southern box (15°–5°S, 160°E–120°W), and P_E is the mean precipitation in the equatorial box (5°S–5°N, 160°E–120°W). The median double-ITCZ index is largely unchanged across different phases of CMIP, with I = 4.3 in CMIP3, I = 3.6 in CMIP5, and I = 4.0 in CMIP6 (Fig. 3d). Compared to the observational estimates of I = 1.5 (CMORPH) and I = 1.7 (TRMM), the median of I is too large by more than a factor of 2 in all CMIP phases. This means that the tropical precipitation over the Pacific Ocean is overestimated (cf. Fig. 1c). The model spread decreases across CMIP generations, but this is not an improvement: some models reproduced the observed double-ITCZ index in both CMIP3 and CMIP5, but none do in CMIP6.
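How an index of this kind is obtained from a gridded climatology can be sketched from the three box means. The sketch assumes, purely for illustration, that the index is the mean of the two off-equatorial boxes minus the equatorial box, I = (P_N + P_S)/2 − P_E; this functional form, the cosine-latitude weighting, and all function names are assumptions and should be checked against Samanta et al. (2019) before use.

```python
import numpy as np

PACIFIC_LON = (160.0, 240.0)  # 160E to 120W, in degrees east


def box_mean(pr, lat, lon, lat_bounds, lon_bounds):
    """Cosine-latitude-weighted mean of pr (lat, lon) inside a box.

    Bounds are inclusive; longitudes are interpreted in degrees east (0-360).
    """
    pr = np.asarray(pr, dtype=float)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float) % 360.0
    lat_sel = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    lon_sel = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = pr[np.ix_(lat_sel, lon_sel)]
    w = np.cos(np.deg2rad(lat[lat_sel]))[:, None] * np.ones(sub.shape)
    return float(np.sum(sub * w) / np.sum(w))


def double_itcz_index(pr, lat, lon):
    """Off-equatorial minus equatorial precipitation over the Pacific boxes.

    ASSUMED form: I = (P_N + P_S) / 2 - P_E, in the units of pr.
    """
    p_n = box_mean(pr, lat, lon, (5.0, 15.0), PACIFIC_LON)
    p_s = box_mean(pr, lat, lon, (-15.0, -5.0), PACIFIC_LON)
    p_e = box_mean(pr, lat, lon, (-5.0, 5.0), PACIFIC_LON)
    return 0.5 * (p_n + p_s) - p_e


if __name__ == "__main__":
    lat = np.arange(-28.0, 29.0, 2.0)
    lon = np.arange(0.0, 360.0, 4.0)
    pr = np.full((lat.size, lon.size), 2.0)   # dry equatorial background
    pr[(lat > 5) & (lat < 15), :] = 8.0       # northern rainband
    pr[(lat > -15) & (lat < -5), :] = 6.0     # spurious southern rainband
    print(double_itcz_index(pr, lat, lon))
```

Under the assumed form, a stronger southern rainband or a drier equator both increase the index, which matches the qualitative use of I as a measure of the double-ITCZ bias.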
c. Intensity distribution
Frequency and intensity are important precipitation characteristics with implications for hydrology and aerosol burden. For instance, a large model spread for surface runoff has been identified in CMIP5 (Lehner et al. 2019), and for aerosol burden in aerosol–climate models (e.g., Baker and Charlson 1990; Textor et al. 2006; Fan et al. 2018). Even models with a relatively accurate representation of the spatial pattern of precipitation may have large biases in the frequency and intensity (e.g., Trenberth et al. 2003; Pendergrass and Hartmann 2014). Models in CMIP3 and CMIP5 are known to produce too-frequent drizzle (e.g., Baker and Huang 2014; Pendergrass and Hartmann 2014; Sun et al. 2015). Here, we test to what extent this behavior has improved using long-term statistics of the frequency of wet 3-h means, the 1-day lag autocorrelation, the number of consecutive dry days, and scaling relationships between precipitation amount and its duration.
1) Wet and dry frequency
All CMIP phases consistently produce more frequent wet 3-h means in tropical precipitation than observed (Fig. 4a). This overestimation has been slightly reduced in CMIP6 with 85% of the 3-hourly means being wet, compared to 93% in CMIP3. However, this is still a substantial overestimation of the occurrence of precipitation compared to the observed frequency of 44%–54%. The improvement in tropical precipitation frequency in CMIP6 is primarily explained by the reduction of wet 3-h means over tropical oceans, whereas the frequency over land has only slightly decreased compared to CMIP3 (Figs. S4 and S5). We note, however, a substantial model spread for the frequency of precipitation rates in all CMIP phases (Fig. S6).
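The frequency of wet 3-h means can be sketched by flattening the data in time and space and counting the values above a wet threshold. The threshold is not specified here, so the `wet_threshold` parameter (defaulting to zero) is a hypothetical choice, as is the handling of missing values.

```python
import numpy as np


def wet_frequency(pr_3h, wet_threshold=0.0):
    """Fraction of 3-h means that exceed `wet_threshold` (same units as pr_3h).

    The array is flattened over all dimensions (time, lat, lon);
    missing values (NaN) are excluded from the count.
    ASSUMPTION: `wet_threshold` is a free parameter of this sketch.
    """
    values = np.asarray(pr_3h, dtype=float).ravel()
    values = values[np.isfinite(values)]
    return np.count_nonzero(values > wet_threshold) / values.size


if __name__ == "__main__":
    sample = np.array([[0.0, 0.2, np.nan], [1.5, 0.0, 0.0]])
    print(wet_frequency(sample))        # any nonzero value counts as wet
    print(wet_frequency(sample, 0.5))   # a stricter threshold ignores drizzle
```

Because models with parameterized convection tend to produce frequent weak precipitation, the resulting frequency can be quite sensitive to the chosen threshold.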

Wet and dry periods. (a) The frequency of wet 3-h means calculated by flattening 3-hourly CMIP data and observations in time and space into a single dimension and counting the number of precipitation events; (b) the 1-day lag autocorrelation of total daily precipitation, temporally and spatially averaged for CMIP and observations; and (c),(d) the number of consecutive dry days (CDD) as (c) box-and-whisker plot for the time and spatial average of CDD of CMIP and observations plotted as horizontal lines and (d) the probability of occurrence of CDD across time and space.
We measure the day-to-day variability and spatiotemporal coherence of the tropical precipitation with the 1-day lag autocorrelation (Fig. 4b). A realistic lag autocorrelation is associated with an improved representation of deep convection and convection coupled to equatorial waves, including the Madden–Julian oscillation (Peters et al. 2017; Ma et al. 2019) assessed in section 5a. Atmospheric models with parameterized moist convection are known to have unrealistic day-to-day variability in precipitation (Peters et al. 2017, 2019) due to deficiencies in the physical parameterization schemes that lead to too-frequent triggering of deep convection (Klingaman et al. 2017; Peters et al. 2017). This behavior is characterized by too-large 1-day lag autocorrelations (i.e., wet episodes over several days are not sufficiently interrupted by dry days). We identify a slight improvement in the 1-day lag autocorrelation from CMIP3 and CMIP5 to CMIP6, namely, a reduction from roughly 0.60 in CMIP3 to 0.50 in CMIP6. Since the lag autocorrelation is sensitive to the representation of convection (Klingaman et al. 2017; Peters et al. 2017), this result indicates that the past model development between CMIP phases contributed to a slightly better day-to-day variability of moist convection. However, compared to the observed 1-day lag autocorrelation of 0.35 in both TRMM and CMORPH, CMIP6 models still substantially overestimate this quantity, pointing to too-little intermittence of rainy episodes.
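As a minimal sketch of this metric, assuming the daily precipitation of a single grid point is available as a plain Python list, the 1-day lag autocorrelation could be computed as follows. The function name `lag_autocorrelation` and the two sample series are illustrative assumptions and not the processing behind Fig. 4b, which additionally averages over time and space.

```python
from statistics import fmean

def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of a time series at the given lag (in time steps)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = fmean(series)
    anomalies = [x - mean for x in series]
    variance = sum(a * a for a in anomalies)
    if variance == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Lagged covariance normalized by the full-series variance.
    covariance = sum(anomalies[t] * anomalies[t + lag] for t in range(n - lag))
    return covariance / variance

# Hypothetical daily totals (mm/day): a persistent wet spell and an intermittent one.
persistent = [8.0, 9.0, 7.0, 8.0, 0.0, 0.0, 0.0, 0.0]
intermittent = [8.0, 0.0, 9.0, 0.0, 7.0, 0.0, 8.0, 0.0]
print(lag_autocorrelation(persistent), lag_autocorrelation(intermittent))
```

The persistent series yields a positive value and the intermittent one a negative value, which illustrates why wet episodes that are rarely interrupted by dry days push the autocorrelation upward.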
The maximum number of consecutive dry days (CDD) is used to quantify the length of dry periods. Following the Expert Team on Climate Change Detection and Indices (ETCCDI) as used by Frich et al. (2002), CDD is defined as the maximum number of consecutive days within a year that have total daily precipitation amounts of less than 1 mm day−1. This threshold removes days with light drizzle events that are difficult to measure. Although TRMM (TMPA) is better for light rain events than other satellite-based data (e.g., Burdanowitz et al. 2015), it misses light precipitation (Behrangi et al. 2014), which affects the frequency of occurrence of precipitation events (Klepp et al. 2018). By eliminating days with such light drizzle events, we determine the differences in CDD for regular to extreme precipitation events. We show the spatial and temporal average of CDD (Fig. 4c) and the probability distribution of CDD across time and space (Fig. 4d). The latter primarily indicates spatial variability for CMIP due to the small year-to-year changes in ensemble-averaged CDD with standard deviations of 0.47–0.69 (not shown).
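The CDD definition can be illustrated with a short sketch for one grid point and one year. The 1 mm day−1 threshold is taken from the text; the function name `max_consecutive_dry_days` and the sample values are hypothetical.

```python
def max_consecutive_dry_days(daily_precip, threshold=1.0):
    """Longest run of days with precipitation below `threshold` (mm/day)."""
    longest = current = 0
    for amount in daily_precip:
        if amount < threshold:
            current += 1                      # extend the current dry spell
            longest = max(longest, current)
        else:
            current = 0                       # a wet day ends the dry spell
    return longest

# A one-year series would have 365 entries; a short hypothetical sample is used here.
sample = [0.0, 0.4, 3.2, 0.0, 0.9, 0.2, 0.0, 12.5, 0.0]
print(max_consecutive_dry_days(sample))
```

Lowering the threshold, for example to 0.5 mm day−1, shortens the diagnosed dry spells in this sample because drizzle days then count as wet, which is why the choice of threshold matters when comparing models with satellite products.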
There is an improvement over the three CMIP generations in averaged CDD and their probability of occurrence (Figs. 4c,d), but CMIP6 models still produce shorter dry periods on average than observed (Fig. 4c). The remaining difference from the observations stems from the poor representation of extremely long dry episodes (Fig. 4d). For instance, the climate models show too-low probabilities for CDD longer than 130 days in CMIP3, and 200 days in CMIP5 and CMIP6, compared to the observations (Fig. 4d). The underestimation of such extremely long dry episodes is primarily explained by the mismatch in CDD over oceans (Figs. S4 and S5), but this is also the region where the improvement across CMIP generations was largest. Compared to the ocean, the number of CDD over land is generally better captured by CMIP models, except for CDDs longer than 250 days. The probability of occurrence for these extremely long dry episodes has slightly improved from CMIP3 to CMIP6, but the occurrence of more than 300 CDDs in deserts is still underestimated. This finding has implications for other processes in the Earth system (e.g., dust-aerosol emissions, which are influenced by soil moisture and the lack of vegetation cover) (e.g., Shao 2001; Kok et al. 2014).
2) Jennings scaling
Jennings (1950) discovered a scaling law, P ~ D^α, that describes the global maximum of precipitation, P, observed at rain gauges over land during an interval of some duration, D, with the exponent α ~ 1/2 for periods of minutes to 1 year. Even earlier research on thresholds of rainfall extremes supports this power-law scaling (Wussow 1922). This type of scaling can be reproduced by simple thermodynamic models whose large-scale input is modulated by stochastic forcing (e.g., Field and Shutts 2009; Zhang et al. 2013b). The scaling relationships described by Jennings, sometimes called maximum depth–duration graphs, have entered textbooks in hydrology, but their application to the evaluation of climate model output is less common (Zhang et al. 2013a), which motivates the present analysis. Moreover, whereas previous studies focused on time periods of minutes to one year, we test here the extension of the Jennings scaling to decades by calculating the slope for data over longer averaging intervals. We find that the Jennings slopes are very similar in TRMM and all phases of CMIP.
The rainfall maximum P for a given D is determined from the spatial distributions of tropical precipitation. The literature typically refers to P as depth. The depth is the maximum across time and space in the running means of daily precipitation over the duration, calculated at every grid point. Durations range here from 1 day to 1 decade, with steps of 1 day, for all datasets, except for CMIP6 and TRMM, where we also use the entire overlapping period 2001–14. We show three examples of the resulting points that fall on a line in the depth–duration space (Fig. 5a). We find that all regression lines closely fit the data points, with R^2 = 0.97 being the smallest coefficient of determination across all datasets here. The slopes α of these lines are shown in Fig. 5b and are known as Jennings slopes.
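The depth–duration procedure can be sketched as follows, assuming a small set of daily series, one per grid point. The sketch takes the depth as the largest accumulated total over any window of D consecutive days, which is an assumption about the normalization: using running means instead divides each depth by D and therefore lowers the log–log slope by exactly one. The function names `max_depth` and `jennings_slope` are hypothetical.

```python
import math

def max_depth(daily_precip, duration):
    """Largest precipitation total over any window of `duration` consecutive days.

    Raises ValueError if `duration` exceeds the length of the series.
    """
    prefix = [0.0]
    for amount in daily_precip:
        prefix.append(prefix[-1] + amount)    # running cumulative sum
    return max(prefix[i + duration] - prefix[i]
               for i in range(len(daily_precip) - duration + 1))

def jennings_slope(grid_series, durations):
    """Least-squares slope of log(depth) against log(duration).

    `grid_series` holds one daily series per grid point; for each duration the
    depth is the maximum over all grid points and all windows (must be > 0).
    """
    xs, ys = [], []
    for d in durations:
        depth = max(max_depth(series, d) for series in grid_series)
        xs.append(math.log(d))
        ys.append(math.log(depth))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Two limiting cases with hypothetical data (mm/day).
steady_rain = [[1.0] * 20]                          # depth grows linearly with duration
single_burst = [[0.0] * 5 + [10.0] + [0.0] * 5]     # depth does not grow with duration
print(jennings_slope(steady_rain, [1, 2, 4, 8]), jennings_slope(single_burst, [1, 2, 4]))
```

Steady rain gives a slope of one and an isolated burst a slope of zero, so a slope near 1/2 lies between perfectly persistent and perfectly isolated extremes.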

Jennings scaling. (a) Three examples for the calculated data points that fall on a line in the depth–duration (P-D) space and (b) the Jennings slopes α of that line across the CMIP phases and in TRMM, compared to the gauge measurements used by Jennings (1950). The box-and-whisker plots show the means, quartiles, and extremes across the CMIP phases.

In both the CMIP output and the data from TRMM, α is larger than the value determined from the earlier gauge measurements by Jennings (1950). In addition to the different spatial scales, Jennings (1950) covers minutes to 1 year, while we start with daily precipitation and move to decadal scales for TRMM and CMIP. The line in Fig. 5a indicates that the steeper slopes in TRMM are primarily explained by the interval from 1 year to 1 decade (i.e., there is some curvature in the slope when moving to longer averaging intervals). Paired with the different spatial representation of gauge measurements and the gridded data, this explains why CMIP and TRMM produce slopes that are more similar to one another than to that of Jennings (1950), with CMIP6 following the observations better than the previous phases of CMIP.
There is considerable variability in the estimates of α from the CMIP output, although less in CMIP6 than in previous CMIP phases. The relatively good match between TRMM- and CMIP6-based estimates of α suggests that, despite biases in the distribution of precipitation, the tendency for long-duration events to be associated with more intense rainfall is well captured by the models.
d. Low-level and deep clouds
We investigate tropical precipitation associated with different cloud regimes, namely, low-level and deep clouds. Low-level cloud regimes cover large parts of the tropics away from the ITCZ. Using an outgoing longwave radiation (OLR) threshold of >250 W m−2 to exclude areas of deep convection, we estimate the observed fractional area coverage of low-level cloud regimes from the daily CERES product (Loeb et al. 2009) to be 68% (not shown). We choose the threshold of 250 W m−2, corresponding to a brightness temperature of 258 K and similar to other studies (e.g., Masunaga et al. 2005). Note that this includes both low-level cumuli and stratiform clouds. It also includes a fraction of cumulus congestus, which is not distinguishable from low-level clouds with OLR. Sensitivity tests with other thresholds of 240–260 W m−2, consistent with Stubenrauch et al. (1999), give qualitatively similar results to 250 W m−2. For the analysis, we use daily OLR and precipitation data, available from 16 CMIP3 models, 32 CMIP5 models, and 14 CMIP6 models, marked by indices 2 and 3 in Tables S1–S4.
The CMIP means have a fractional low-level cloud area similar to the observations, with a slight increase from 65% for CMIP3 to 69% for CMIP6. Despite a similar areal coverage, the models differ substantially by 50%–100% in the amount of precipitation associated with low-level clouds (Fig. 6a). There is no clear improvement over the CMIP phases, although the very large outliers evident in CMIP3 and CMIP5, with precipitation fractions associated with low-level cloud regimes larger by a factor of four to five, are reduced in CMIP6. Some models in CMIP6 lie within the observational range for the fractional precipitation amount associated with low-level clouds (10%–14%), namely, BCC-ESM1 (12%), CNRM-CM6-1 (10%), and CNRM-ESM2-1 (11%). These three models, however, tend to underestimate the fractional area coverage of the low-level cloud regimes with 59%, 65%, and 65%, respectively.
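The two regime diagnostics, the fractional area and the fraction of precipitation, can be sketched under the assumption that daily precipitation and OLR are paired for every grid cell and day and flattened into equally long sequences. The 250 W m−2 threshold is taken from the text; the function name `low_cloud_fractions` and the area weights (for instance the cosine of latitude) are illustrative assumptions.

```python
def low_cloud_fractions(precip, olr, area_weights, olr_threshold=250.0):
    """Return (area fraction, precipitation fraction) of the low-level cloud regime.

    Inputs are flat, equally long sequences with one entry per grid cell and day.
    A cell-day belongs to the regime when its daily OLR exceeds `olr_threshold` (W m-2).
    """
    total_area = total_precip = regime_area = regime_precip = 0.0
    for p, o, w in zip(precip, olr, area_weights):
        total_area += w
        total_precip += w * p
        if o > olr_threshold:
            regime_area += w
            regime_precip += w * p
    return regime_area / total_area, regime_precip / total_precip

# Hypothetical sample: three cell-days with high OLR and one deep-convective cell-day.
precip = [1.0, 1.0, 8.0, 0.0]          # mm/day
olr = [280.0, 260.0, 180.0, 290.0]     # W m-2
weights = [1.0, 1.0, 1.0, 1.0]
area_fraction, precip_fraction = low_cloud_fractions(precip, olr, weights)
print(area_fraction, precip_fraction)
```

In this sample the regime covers 75% of the area but contributes only 20% of the precipitation, which mirrors the qualitative contrast between coverage and rainfall contribution discussed above.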

Precipitation associated with clouds of different depth. (a) The fraction of precipitation associated with low-level clouds, defined as the daily precipitation in regions with daily outgoing longwave radiation (OLR) greater than 250 W m−2 divided by the total tropical precipitation amount. (b) Tropical mean in daily OLR against daily precipitation amount binned by steps of 10 mm day−1. Shaded areas mark half the standard deviation of the model spread. The probability density functions of individual models are shown in the supplemental material (Fig. S6). In both (a) and (b) the black (gray) line is the precipitation observed by TRMM (CMORPH) and OLR based on CERES. We use here CMIP model data marked with indices 2 and 3 in Tables S1–S4, which is slightly less than in analyses that only use daily precipitation, because of the availability of OLR output.

We extend the analysis to regimes with deep convection, which we identify with regions of particularly low OLR. In these regimes the observations differ considerably (Fig. 6b). For an OLR of 120 W m−2 the precipitation rates are 25% smaller in CERES-CMORPH (150 mm day−1) than in CERES-TRMM (200 mm day−1), consistent with the lower frequency of these precipitation rates in CMORPH than in TRMM (Fig. S6). CMIP5 and CMIP6 have a better representation of the relationship between OLR and precipitation rate than CMIP3 for OLR of 120–270 W m−2, and align closely with what is diagnosed from the CERES-TRMM measurements. For more moderate precipitation, between 10 and 100 mm day−1, the observations are more consistent and suggest that the models require deeper convection to produce these rain rates (too-low OLR) across the CMIP phases.
In summary, models produce more precipitation from low-level clouds than is observed, consistent with the persistent overestimation of drizzle in CMIP. For more moderate precipitation rates, the CMIP models are associated with lower OLR, pointing to deeper clouds or more overcast conditions than is observed. For stronger rain rates (p > 100 mm day−1), substantial divergence between the observational datasets makes an evaluation of the models difficult, but CMIP3 clearly lies outside of the observational range, whereas CMIP5 and CMIP6 are closer to the observations.
4. Solar radiative effects
Model-based climate change projections are essentially an exercise in assessing how a model’s climate responds to radiative forcing. In this context, the fidelity of the models’ response to known changes in the radiation budget, for instance, as associated with the seasonal and daily cycles of the sun, provides a useful test of their plausibility. The response of precipitation to radiative forcing associated with atmospheric composition changes, as manifest by a global increase in surface temperatures, is likely different from the response to seasonal and daily cycles in irradiance. There is, however, little reason to believe that a model could capture a forced response in precipitation (e.g., to radiative forcing of greenhouse gases) if it poorly represents the observed cycles induced by radiative perturbations as strong as those associated with the seasonal and daily changes of irradiance. This is one of our motivations for this analysis across the CMIP models.
a. Seasonal cycle
The seasonal cycle of tropical precipitation determines the regional climate in many tropical areas (Knoben et al. 2019). Hence, quite apart from being a generic test of how models respond to natural changes in the radiation budget, an ability of CMIP models to reproduce the seasonal cycle of tropical precipitation with fidelity is relevant on its own. Through the influence of precipitation on the regional energy budget, an accurate simulation of tropical precipitation is also influential for other aspects of climate on both regional and global scales.
1) Zonal means
Models in CMIP3 and CMIP5 are drier than observations early in the wet season and too wet later on in both hemispheres (Seth et al. 2013). Seth et al. (2013) showed two systematic biases in tropical precipitation. First, most CMIP3 and CMIP5 models underestimate the precipitation near the equator between January and June. Others also documented a regional underestimation of precipitation in the 4°–8°N band within the tropical Pacific from March to April (Mechoso et al. 1995; Bellucci et al. 2010). Second, most CMIP5 models overestimate precipitation at 4°–20°S, particularly strongly between February and May. This is consistent with observations showing more hemispheric asymmetry in the zonal-mean annual precipitation and a more dominant ITCZ signature to the north of the equator than most CMIP models (Fig. 3).
Figure 7 shows that CMIP6 models still do not correctly represent the observed seasonal cycle of zonal-mean precipitation over tropical land and ocean. We find that they are generally wetter than observations in the summer hemisphere by 0.5–2.5 mm day−1. Furthermore, we find a too dry–too wet pattern between January and May, explained by a rain belt that is displaced too far to the south. This model behavior might also delay the onset of the summer monsoon in the Northern Hemisphere.

Seasonal cycle of differences in zonal precipitation. Shown are differences between the CMIP6 multimodel mean and TRMM (shading) and the magnitude of precipitation from CMIP6 in steps of 2 mm day−1 (gray contours).

2) Summer monsoons
Monsoon rainfall dominates the annual variability in the tropics (e.g., Trenberth et al. 2000; Wang and Ding 2008), affecting many tropical regions. Previous studies show that CMIP5 models simulate the climatology and variability of the monsoonal circulation better than CMIP3 (e.g., Sperber et al. 2013), but they still suffer from systematic regional biases. For example, the CMIP5 mean tends to underestimate precipitation over the eastern Indian Ocean, the Bay of Bengal, the equatorial western Pacific, and tropical Brazil, but overestimate precipitation over the Maritime Continent, the Philippines, and high-elevation terrain such as the Andes, Sierra Madre, and the Tibetan Plateau (Lee et al. 2010; Lee and Wang 2014). Despite these regional biases, the CMIP5 mean reproduces the observed monsoon intensity and area (Lee and Wang 2014).
We assess the monsoon across the CMIP phases with a bulk measure for the monsoon area and intensity following earlier approaches (Wang and Ding 2006; Wang et al. 2011). For each model simulation, the monsoon regions are defined with two criteria: 1) the annual range of precipitation (summer minus winter mean) exceeds 2 mm day−1; and 2) the summertime precipitation contributes at least 55% of the annual total. The monsoon intensity is then defined as the area-weighted average of summer precipitation (i.e., June–August in the Northern Hemisphere and December–February in the Southern Hemisphere) within the monsoon area. The latter is the composite of the regional monsoons in both hemispheres (Fig. S7).
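The two criteria and the intensity definition can be sketched for a monthly climatology in mm day−1. The thresholds of 2 mm day−1 and 55% come from the text. The following choices are assumptions of this sketch and are not confirmed by the text: the same June–August and December–February seasons are used for the criteria as for the intensity, months are weighted by a 365-day calendar, and cells on the equator are treated as Northern Hemisphere. All function names are hypothetical.

```python
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
NH_SUMMER = (5, 6, 7)     # June-August as 0-based month indices
SH_SUMMER = (11, 0, 1)    # December-February

def season_total(monthly_mean, months):
    """Accumulated precipitation (mm) over the given months."""
    return sum(monthly_mean[m] * DAYS_IN_MONTH[m] for m in months)

def season_mean(monthly_mean, months):
    """Day-weighted mean precipitation rate (mm/day) over the given months."""
    return season_total(monthly_mean, months) / sum(DAYS_IN_MONTH[m] for m in months)

def is_monsoon_cell(monthly_mean, latitude, min_range=2.0, min_share=0.55):
    """Apply both monsoon criteria to a 12-month climatology."""
    summer, winter = (NH_SUMMER, SH_SUMMER) if latitude >= 0 else (SH_SUMMER, NH_SUMMER)
    annual_total = season_total(monthly_mean, range(12))
    if annual_total <= 0.0:
        return False
    annual_range = season_mean(monthly_mean, summer) - season_mean(monthly_mean, winter)
    summer_share = season_total(monthly_mean, summer) / annual_total
    return annual_range > min_range and summer_share >= min_share

def monsoon_intensity(cells):
    """Area-weighted summer mean over monsoon cells.

    `cells` is an iterable of (monthly_mean, latitude, area_weight) tuples.
    """
    weighted_sum = weight_sum = 0.0
    for monthly_mean, latitude, weight in cells:
        if is_monsoon_cell(monthly_mean, latitude):
            summer = NH_SUMMER if latitude >= 0 else SH_SUMMER
            weighted_sum += weight * season_mean(monthly_mean, summer)
            weight_sum += weight
    return weighted_sum / weight_sum if weight_sum else float("nan")

# Hypothetical climatologies: a strongly seasonal cell and a uniformly wet cell.
seasonal = [0.5] * 5 + [10.0, 12.0, 10.0] + [0.5] * 4
uniform = [5.0] * 12
print(is_monsoon_cell(seasonal, 10.0), is_monsoon_cell(uniform, 5.0))
```

Because the monsoon area is diagnosed separately for each dataset, a model can differ from the observations in area, in intensity, or in both, which is why the two measures are reported side by side.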
Based on these measures, the monsoon in CMIP6 is not better represented than in previous CMIP phases. Rather, CMIP6 models produce the widest area among the CMIP phases and the largest mismatch compared to observations (Fig. 8a). Here, both TRMM and CMORPH fall outside the intermodel spread of the monsoon area in CMIP6. The land-only monsoon area is closer to the observations, with less of a discrepancy between CMIP6 and previous CMIP phases (Fig. 8b). Whereas the simulated monsoon intensity fell within the range of observed values for both CMIP3 and CMIP5, the intensity of the monsoon is too large in the CMIP6 mean (Fig. 8c). This is also true for the land-only monsoon, but the magnitude of the intensity overestimation is less pronounced (Fig. 8d). Based on these metrics, the monsoon in the CMIP6 mean is larger and wetter than in the previous CMIP phases, and therefore agrees less with observations than CMIP5.

Area-weighted summer monsoon area and intensity. Shown are the values for (a),(c) the monsoon and for (b),(d) the monsoon over land for CMIP3, CMIP5, and CMIP6. Box-and-whisker plots show the median, quartiles, the 99th percentiles, and extremes. Horizontal lines are the means of TRMM and CMORPH that overlap for the monsoon area.

The reasonable simulation of the land-only monsoon area and intensity arises in part from regionally compensating biases. Figure 9 shows summertime precipitation differences for the regional monsoons, defined by Kitoh et al. (2013). Overall, there is an apparent reduction in the dry bias across CMIP phases, so much so that some regions (South Asia and western South America) have had their dry bias in CMIP3 replaced by a wet bias in CMIP6. This analysis suggests that the degradation of the global monsoon metrics in CMIP6 (Fig. 8) might result from a diminishment of compensating biases across the regional monsoons (Fig. 9), particularly associated with what in CMIP3 were large dry biases in the North American and South Asian monsoon systems.

Difference in summer precipitation for the monsoon regions over land compared to TRMM for (a) CMIP3, (b) CMIP5, and (c) CMIP6. The numbers are the precipitation differences averaged for the regional monsoons, defined as in Kitoh et al. (2013). The monsoon and its regional separation are graphically displayed in Fig. S7. The equator separates the northern monsoons [North America monsoon system (NAMS), North Africa (NAF), Southern Asia summer (SAS), East Asian summer (EAS)] from the southern ones [South America monsoon system (SAMS), South Africa (SAF), and Australian–Maritime Continent (AUSMC)], 60°E separates NAF and SAS, and 20°N, 100°E separates SAS and EAS.

Altogether, this analysis makes it difficult to refute the hypothesis that the representation of the monsoon systems has not improved in successive CMIP phases. We cannot identify an overall improvement of the monsoon across the CMIP phases. Monsoon rainfall in some regions improved across the phases, but the global monsoon intensity and area have a larger bias in CMIP6 than in CMIP3. CMIP3 models already showed the basic features of the summer monsoons and the monsoonal teleconnections (Randall et al. 2007), but the location and intensity of the simulated rainfall differed from the observations (Fan et al. 2010). Improvements in CMIP5 were attributed to a more realistic ENSO–monsoon teleconnection (Meehl et al. 2012) and to an improved spatial distribution of intraseasonal variations (Sperber et al. 2013). Possible reasons for monsoon differences in CMIP models are many. For CMIP5, these include too-cold SSTs, for example over the Arabian Sea (Levine et al. 2013), too-weak meridional temperature gradients (Joseph et al. 2012), an unrealistic development of the Indian Ocean dipole (Achuthavarier et al. 2012; Boschat et al. 2012), regional differences in rainfall affecting moisture advection (Bollasina and Ming 2013), and a different degree of compensation between thermodynamics and dynamics (D’Agostino et al. 2019).
b. Diurnal cycle
Moist convection causes differences in the diurnal cycle of precipitation over land and ocean (Dai 2001), a signal that models with parameterized convection have historically struggled to represent (Dai et al. 1999). Daytime heating over land triggers deep convection that typically causes a delayed precipitation maximum in the late afternoon to evening (e.g., Yang and Slingo 2001). Over the ocean, the nocturnal cooling at cloud tops is important for the deepening of clouds, causing an early morning precipitation maximum (Kraus 1963; Gray et al. 1977; Sato et al. 2009). Vial et al. (2019) recently documented a similar diurnal cycle for shallow convective clouds.
The diurnal cycle of precipitation is captured by both TRMM and CMORPH observations (Fig. 10). Our data processing for the diurnal cycle includes a transformation of the data from UTC to local time to make a meaningful comparison of the diurnal cycles. The resulting CMORPH data are offset by 1.5 h ahead of TRMM. This difference is thought to be mostly associated with the different time accumulation periods of the products. For CMORPH and the model output, these are the 3-h periods: 0000–0300, 0300–0600, and so on, but for TRMM, the periods are shifted by 1.5 h (i.e., 0130–0430, 0430–0730, and so on) (e.g., see section 2.3 in Rauniyar et al. 2017). Additionally, the time stamp had to be changed in CMORPH’s metadata, since it was defined at the beginning of the corresponding periods. The other datasets define the time stamp at the center, and thus, we shifted CMORPH’s time axis by +1.5 h, for a consistent treatment.
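The two time-axis adjustments can be sketched as below. The text does not state how local time is derived; the sketch assumes mean local solar time with 15° of longitude per hour. The +1.5 h shift that moves a time stamp from the beginning to the center of a 3-h window follows the text. Both function names are hypothetical.

```python
def local_solar_hour(utc_hour, longitude_deg):
    """Convert a UTC hour to mean local solar time, assuming 15 degrees per hour."""
    return (utc_hour + longitude_deg / 15.0) % 24.0

def center_time_stamp(start_hour, window_hours=3.0):
    """Move a time stamp from the start to the center of its averaging window."""
    return (start_hour + window_hours / 2.0) % 24.0

# A 0000-0300 UTC window stamped at its start, observed at 90 degrees east.
centered_utc = center_time_stamp(0.0)
print(centered_utc, local_solar_hour(centered_utc, 90.0))
```

Applying the centering before the longitude shift keeps all datasets on the same convention, so that remaining offsets reflect the accumulation periods of the products rather than their metadata.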

Time of occurrence of tropical precipitation. The 24-h clocks show the time of the day (the angle) and the magnitude (the distance from the center) for (left) the precipitation amount, (center) the frequency of wet 3-h means, and (right) the all-hour 99th percentile of all 3-h means as index for the intensity. The analyses are shown for (a)–(c) tropical land, (d)–(f) tropical ocean, and (g)–(i) low-level clouds over tropical ocean, defined by a daily mean OLR > 250 W m−2. The black solid (gray dashed) lines are the diurnal cycles of TRMM (CMORPH) satellite observations. Thin gray lines indicate the diurnal cycles of individual climate models to illustrate the model spread. The small circles mark the maxima in the observations and three generations of CMIP climate models. Please note that the simulations provide 3-hourly averages and are shown at the middle of the averaging period. We do not show the ensemble-averaged diurnal cycle of the CMIP phases due to the large model spread. (a)–(f) We use the 3-hourly precipitation data, marked by index 1, and (g)–(i) 3-hourly precipitation data paired with daily OLR, marked with indices 1 and 3 in Tables S1–S4.

We assess the diurnal cycle of precipitation in CMIP through the time of occurrence of the peak rainfall, the frequency of wet 3-h means, and the 99th percentile of all 3-h means. We use 3-hourly data, available from 7 CMIP3 models, 23 CMIP5 models, and 13 CMIP6 models, marked with index 1 in Tables S1–S4. The only exception here is the diurnal cycle of shallow clouds, for which we had four fewer models because of limited data availability, marked with indices 1 and 3 in Tables S1–S4. We found no systematic differences for the diurnal cycle when we separated high- and low-resolution configurations (not shown).
Some clear and systematic biases become apparent through this analysis. One is that the simulated maxima in the diurnal cycle are too strong, for both the precipitation amount and the frequency of wet 3-h means (Fig. 10). This bias is most pronounced over land and over the ocean in regions with low-level clouds. Another is a bias in the simulated phase of the diurnal cycle. Over tropical land, CMIP models typically produce maxima of precipitation amounts that occur too early. This problem has been extensively studied, and is often attributed to the use of physical parameterization schemes for moist convection that are designed to remove convective instabilities quickly. The quasi-equilibrium assumption (Arakawa and Schubert 1974), which in some form is used in most parameterization schemes, links precipitation to the rate at which convective instability is produced, thereby strongly coupling precipitation to surface fluxes over land (e.g., Bechtold et al. 2004), and to the net radiative cooling rate over the ocean. This might also explain why the maximum tends to occur too early over the ocean, even in the absence of deep convection.
From Fig. 10 it is difficult to discern obvious changes among CMIP phases, let alone systematic improvements. As a quantitative assessment, we therefore compute the mean absolute phase lag of the maxima in the models compared to CMORPH (Table 4). Here, too, there is no evidence of a systematic improvement in the time of occurrence for any of the three metrics of the diurnal cycle across the CMIP phases, with CMIP3 being on average closer to the observed time of the maximum than CMIP6. Measured by the mean absolute phase lag of the maxima, the diurnal cycle of the precipitation of low-level clouds over the ocean is also worse in CMIP6 than in the earlier phases. Using TRMM as an alternative observational reference does not change these findings, except that it gives systematically larger phase lags for most of the metrics in Table 4.
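Because the time of the maximum lives on a 24-h clock, a phase lag has to be computed as a circular difference. The sketch below shows one way to do this. It is an illustration under stated assumptions rather than the procedure used for Table 4: the function names are hypothetical, and the lag is wrapped into the interval from -12 h (inclusive) to +12 h (exclusive), with positive values meaning that the model maximum occurs earlier than the observed one.

```python
import numpy as np

def phase_lag(obs_peak_hour, model_peak_hours):
    """Signed lag in hours, wrapped onto the 24-h clock.

    Positive values mean the model maximum leads (occurs earlier than)
    the observed maximum; the result lies in [-12, 12).
    """
    model = np.asarray(model_peak_hours, dtype=float)
    return (obs_peak_hour - model + 12.0) % 24.0 - 12.0

def mean_absolute_phase_lag(obs_peak_hour, model_peak_hours):
    """Average magnitude of the wrapped lags over a set of models."""
    return float(np.mean(np.abs(phase_lag(obs_peak_hour, model_peak_hours))))
```

The wrap matters near midnight: a model peaking at 22.5 h against an observed peak at 1.5 h leads by 3 h, not by 21 h.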
Mean phase lag of the maxima in the amount, the frequency, and the 99th percentile in the 3-hourly tropical precipitation across CMIP phases compared to CMORPH. All lag values are in hours and listed for convection over land (L), convection over ocean (O), and low-level clouds over ocean (low-O). Positive values indicate that the models are leading the observations (i.e., earlier occurrence of the maximum in the models).


Although some models do show improvements and agree with observations for individual metrics (Fig. 10), none correctly represents the time of occurrence for all three maxima in amount, frequency, and intensity (not shown). Similarly, no model represents both the minimum and maximum of a single metric correctly. When taken together with the poor response of tropical precipitation to the seasonal cycle, these findings suggest that CMIP models should be used only with great caution in studies of the response of tropical precipitation to radiative forcing from changes in atmospheric composition.
5. Modes of internal variability
The CMIP6 output shows more marked improvements in the representation of modes of tropical variability. This statement is based on the analysis of the two most dominant modes of internal variability in the tropics: the Madden–Julian oscillation (MJO) and El Niño–Southern Oscillation (ENSO), which we present here.
a. Madden–Julian oscillation
The MJO is the dominant mode of intraseasonal precipitation variability in the tropics, and is most pronounced in boreal winter. Its salient feature is a coherent eastward-propagating pattern of enhanced and suppressed convection over the Indian Ocean, the Indo-Pacific warm pool, and the western Pacific Ocean (Madden and Julian 1994). The critical processes that give rise to the MJO remain debated (Maloney et al. 2018), and simulating a realistic MJO in climate models has been a challenge (Kim et al. 2009; Crueger et al. 2013; Jiang et al. 2015). The present analysis focuses on the most obvious characteristic of the MJO, namely, the eastward propagation of suppressed and enhanced precipitation patterns.
Figure 11 shows the ratio of the eastward-propagating spectral power of tropical precipitation to that of its westward-propagating counterpart for the CMIP phases. Each is summed over the zonal wavenumbers 1–3 and periods of 20–100 days that are characteristic of the MJO, for the November to April season between 10°S and 10°N. This quantity is often used as a measure of the MJO (e.g., Crueger et al. 2013; Kim et al. 2014). A ratio larger than 1 indicates more spectral power in the eastward-propagating modes, and thus measures the dominance of an eastward-propagating disturbance. By not measuring other aspects of the MJO, such as its amplitude or composite structure, our analysis sets a relatively low bar for the evaluation of the MJO, but one that is informative. Calculations are performed on daily precipitation data from individual models for 1961–75 and 1976–90, and then averaged to be consistent with the 15-yr period of satellite observations.
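The east/west power ratio can be illustrated with a two-dimensional Fourier transform of a time-longitude section. The following is a minimal sketch under simplifying assumptions, not the analysis code behind Fig. 11: it treats a single segment of daily data that has already been averaged over 10°S–10°N, assumes a complete and evenly spaced longitude circle, applies no tapering, and omits the averaging over seasons and over the two 15-yr periods. The function name and argument names are hypothetical.

```python
import numpy as np

def east_west_ratio(precip, wavenumbers=(1, 3), periods=(20.0, 100.0)):
    """Ratio of eastward to westward spectral power in a wavenumber-period band.

    precip: array of shape (n_days, n_lon) with daily values on a full,
            evenly spaced longitude circle (longitude increasing eastward).
    wavenumbers: inclusive range of zonal wavenumbers.
    periods: inclusive range of periods in days.
    """
    x = np.asarray(precip, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)        # remove the time mean
    n_t, n_x = x.shape
    power = np.abs(np.fft.fft2(x)) ** 2
    freq = np.fft.fftfreq(n_t, d=1.0)            # cycles per day
    wnum = np.rint(np.fft.fftfreq(n_x) * n_x)    # integer zonal wavenumbers
    f, k = np.meshgrid(freq, wnum, indexing="ij")
    in_band = (
        (np.abs(k) >= wavenumbers[0]) & (np.abs(k) <= wavenumbers[1])
        & (np.abs(f) >= 1.0 / periods[1]) & (np.abs(f) <= 1.0 / periods[0])
    )
    # With NumPy's forward-transform sign convention, a wave of the form
    # cos(k*x - omega*t) appears where frequency and wavenumber have
    # opposite signs, so those bins hold the eastward-propagating power.
    east = power[in_band & (f * k < 0)].sum()
    west = power[in_band & (f * k > 0)].sum()
    return east / west
```

In this sketch a pure standing wave yields a ratio of 1, and a field in which the eastward component has twice the amplitude of the westward one yields a ratio of 4, because power scales with the square of the amplitude.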

Eastward-propagating strength of the Madden–Julian oscillation (MJO). Shown is the ratio of the eastward- to westward-propagating spectral power (r) of tropical precipitation (see section 5a). The dashed line indicates a standing wave. Values larger (smaller) than one indicate eastward- (westward-) propagating waves. The box-and-whisker plots indicate the median, quartiles, and extremes in the CMIP phases.
Citation: Monthly Weather Review 148, 9; 10.1175/MWR-D-19-0404.1
