Benchmarking Simulated Precipitation Variability Amplitude across Time Scales

Min-Seop Ahn,a Peter J. Gleckler,a Jiwoo Lee,a Angeline G. Pendergrass,b,c and Christian Jakobd

a PCMDI, Lawrence Livermore National Laboratory, Livermore, California
b Earth and Atmospheric Science, Cornell University, Ithaca, New York
c National Center for Atmospheric Research, Boulder, Colorado
d ARC Centre of Excellence for Climate Extremes, Monash University, Melbourne, Victoria, Australia

Min-Seop Ahn ORCID: https://orcid.org/0000-0002-3308-7793

Abstract

Objective performance metrics that measure precipitation variability across time scales from subdaily to interannual are presented and applied to Historical simulations of Coupled Model Intercomparison Project phase 5 and 6 (CMIP5 and CMIP6) models. Three satellite-based precipitation estimates (IMERG, TRMM, and CMORPH) are used as reference data. We apply two independent methods to estimate temporal variability of precipitation and compare the consistency in their results. The first method is derived from power spectra analysis of 3-hourly precipitation, measuring forced variability by solar insolation (diurnal and annual cycles) and internal variability at different time scales (subdaily, synoptic, subseasonal, seasonal, and interannual). The second method is based on time averaging and facilitates estimating the seasonality of subdaily variability. Supporting the robustness of our metric, we find a near equivalence between the results obtained from the two methods when examining simulated-to-observed ratios over large domains (global, tropics, extratropics, land, or ocean). Additionally, we demonstrate that our model evaluation is not very sensitive to the discrepancies between observations. Our results reveal that CMIP5 and CMIP6 models in general overestimate the forced variability while they underestimate the internal variability, especially in the tropical ocean and higher-frequency variability. The underestimation of subdaily variability is consistent across different seasons. The internal variability is overall improved in CMIP6, but remains underestimated, and there is little evidence of improvement in forced variability. Increased horizontal resolution results in some improvement of internal variability at subdaily and synoptic time scales, but not at longer time scales.

© 2022 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Min-Seop Ahn, ahn6@llnl.gov


1. Introduction

Precipitation variability is a fundamental aspect of our climate and is associated with many weather and climate phenomena. It occurs on a wide range of time scales from subdaily to interannual, including phenomena that have a high impact on human activity such as flood and drought (e.g., Berndtsson and Niemczynowicz 1988; Cristiano et al. 2017; Pendergrass et al. 2020). Precipitation variability is not only an omnipresent characteristic of our environment; it is also critical for agriculture (e.g., Riha et al. 1996; Robinson and Gross 2010; Rowhani et al. 2011; Klink et al. 2014) with a wide range of societal impacts (e.g., O’Loughlin et al. 2014; Shively 2017).

Physically based numerical weather and climate models are important tools for understanding and predicting precipitation. However, weather forecast models continue to suffer from fundamental deficiencies that limit realistic simulation of precipitation (e.g., Sun et al. 2014; Simonin et al. 2017), and many serious biases persist in climate models (e.g., Dai 2006; Fiedler et al. 2020; IPCC 2021). Unlike temperature and most other variables, precipitation is not continuous in time, and its inherent intermittency introduces complexities involving intensity, frequency, and duration. In models, one long-standing deficiency is often referred to as a "drizzling bias" (e.g., Trenberth et al. 2017; Covey et al. 2018; Fiedler et al. 2020; Chen et al. 2021). Other examples include the diurnal cycle (e.g., Covey et al. 2016; Tang et al. 2021), drought characteristics (e.g., Nasrollahi et al. 2015; Knutson and Zeng 2018; Ukkola et al. 2018), and convective precipitation, which is sensitive to grid size and subgrid-scale parameterizations (e.g., Prein et al. 2015; Chen and Dai 2019; Xie et al. 2019; Ma et al. 2022). A recent study by Fiedler et al. (2020) evaluated the representation of tropical precipitation in three CMIP generations (phases 3, 5, and 6) and indicated overall little improvement across CMIP phases for summer monsoons, the double-ITCZ bias, and the diurnal cycle. Improvement of these and other deficiencies in simulated precipitation represents a long-standing challenge in model development that continues to motivate research.

The Coupled Model Intercomparison Project (CMIP; Meehl et al. 2005, 2007; Taylor et al. 2012; Eyring et al. 2016), with well-defined experimental protocols, provides an established framework to evaluate state-of-the-art climate models. Six phases of CMIP spanning several decades include more than 100 climate models of varying complexity, with many of them including multiple realizations. The well-established framework for model evaluation and research opens the possibility for a more systematic and potentially semi-operational benchmarking of model performance (e.g., Flato et al. 2013; Gleckler et al. 2016; Eyring et al. 2019; Pendergrass et al. 2020). This has inspired community efforts to develop methods to routinely produce high-level objective assessments of a variety of large-scale characteristics gauging how well models compare with observations and each other. A few examples among many include performance metrics for the mean state (e.g., Gleckler et al. 2008; Reichler and Kim 2008; Fasullo 2020), El Niño–Southern Oscillation (ENSO; Bellenger et al. 2014; Planton et al. 2021), and extratropical modes of variability (Lee et al. 2019b, 2021; Fasullo et al. 2020), with others targeting a variety of aspects of simulated precipitation (e.g., Pendergrass and Deser 2017; Tang et al. 2021; Sillmann et al. 2013; Wehner et al. 2021; Klingaman et al. 2017). Large-scale summary statistics such as these are often fairly robust and complement regional-scale or process-driven analysis.

Many studies have focused on evaluating simulated precipitation at various spatiotemporal scales, such as continental- to global-scale mean states (e.g., Phillips and Gleckler 2006; Gleckler et al. 2008; Reichler and Kim 2008; Mehran et al. 2014; Nguyen et al. 2017), variability at a specific time scale (e.g., Pendergrass et al. 2017; Brown et al. 2017; Wood et al. 2021; Zhang et al. 2021), precipitation extremes from short-time-scale heavy rain (e.g., Sillmann et al. 2013; Wehner et al. 2020, 2021) to longer-time-scale droughts (e.g., Ukkola et al. 2018; Bonfils et al. 2020), regional impact-related evaluation (e.g., Koutroulis et al. 2016), intensity distribution (e.g., Dai 2006; Pendergrass and Hartmann 2014; Ma et al. 2022), and spatial and temporal scale coherence of precipitation (e.g., Klingaman et al. 2017; Martin et al. 2017). Fiedler et al. (2020) mentioned above provides a comprehensive portrayal of the limited progress in improving a broad range of tropical precipitation characteristics.

Biases in simulated precipitation variability have been objectively quantified on a wide range of time scales including subdaily internal variability (e.g., Trenberth et al. 2017; Covey et al. 2018), the diurnal cycle (e.g., Covey et al. 2016; Tang et al. 2021; Lee and Wang 2021), synoptic variability (e.g., Wang et al. 2015; Hu et al. 2019), subseasonal variability (e.g., Kim et al. 2009; Ahn et al. 2017, 2020), the seasonal cycle (e.g., Dunning et al. 2017; Fiedler et al. 2020), and interannual variability (e.g., Lee et al. 2019a; Zhu and Yang 2021). While collectively these studies target important aspects of precipitation variability, a set of large-scale metrics designed to gauge precipitation variability amplitude across time scales has yet to be brought together in a common framework.

An important factor in evaluating how well models simulate precipitation is the uncertainty associated with measurement-based estimates (e.g., Ruane and Roads 2007; Gehne et al. 2016; Sun et al. 2018; Tang et al. 2021). Sun et al. (2018) found that substantial discrepancies occur across 30 global precipitation products (7 ground based, 13 satellite based, and 10 reanalysis) in basic aspects of precipitation such as the mean state, seasonal cycle, and intensity distribution. Recently, Tang et al. (2021) identified important discrepancies in the diurnal and semidiurnal cycle estimates of five observationally based products, including three satellite-based and two ground-based estimates. The discrepancy between ground-based and satellite-based precipitation products has also been discussed in terms of a variety of characteristics, including seasonal mean amount, frequency, intensity, and diurnal cycle (Dai et al. 2007; Gehne et al. 2016), as well as extremes (Alexander et al. 2019). In an attempt to address the many challenges associated with precipitation datasets, the International Precipitation Working Group (IPWG) and the GEWEX Data and Analysis Panel (GDAP) have recently published a Joint IPWG–GDAP Precipitation Assessment (Roca et al. 2021).

The diversity of topics addressed in the studies highlighted above is indicative of the complex nature of precipitation, the fundamental biases that exist in atmospheric general circulation models, and the significant deficiencies in available observations. Better process-level understanding is required to improve models, and in support of this difficult challenge it is important to be able to robustly gauge the consistency between models and observational products. Large-scale performance metrics (e.g., for the mean state, ENSO, and many other characteristics) can provide a means to quantify the relative strengths and weaknesses of different models, and be used to benchmark model improvements during the model development process or across generations of the CMIP multimodel ensemble. Many of these are holistic in the sense that they do not isolate specific processes or phenomena that might help identify the cause of model errors. Rather, they are designed to provide a robust means to compare the basic strengths and weaknesses of different models and can be used to monitor performance changes as models are further developed. Such high-level performance metrics complement the findings of more targeted process-driven research.

Motivated by the outcome of a workshop dedicated to building a framework for the systematic benchmarking of simulated precipitation (Pendergrass et al. 2020), in the present study we use two independent methods to estimate the precipitation variability amplitude across time scales. Note that this analysis targets only one of many properties that are required to capture the complex and intermittent nature of precipitation (e.g., including intensity, frequency, duration, amount, and type). We focus on results averaged across large-scale domains to facilitate a robust comparison between observations and models, using established well-defined metrics to evaluate simulated precipitation variability for both forced (e.g., annual and diurnal cycles) and internal variability across time scales. Our analysis across time scales for both forced and internal variability is accomplished in a single analysis framework. We examine multiple satellite-based observational datasets to factor into our analysis how the selection of a reference dataset can alter our conclusions of model behavior. We evaluate models that have been contributed to the most recent CMIP phase (CMIP6; Eyring et al. 2016) and compare the results with those from the previous generation (CMIP5; Taylor et al. 2012) to assess the improvement across recent model generations.

The rest of the paper is organized as follows: Sections 2 and 3 describe the data and analysis methods used. Section 4 presents results, including a comparison of variability estimates from the two independent analysis methods, the evaluation of CMIP models, and their improvement over time. Section 5 summarizes and discusses the main findings of this study.

2. Data

a. Satellite-based precipitation products

We rely on three modern satellite-based products that provide rainfall estimates at a frequency of 3 h or higher: the bias-corrected Climate Prediction Center Morphing technique product (Xie et al. 2017; hereafter CMORPH), the Tropical Rainfall Measuring Mission Multisatellite Precipitation Analysis 3B42, version 7, product (Huffman et al. 2007; hereafter simply TRMM), and the Integrated Multi-satellitE Retrievals for GPM, version 6, final run product (Huffman et al. 2020; hereafter simply IMERG). CMORPH combines multiple satellite-based precipitation estimates from passive microwave (PMW) sensors in low Earth orbit (AMSR-E, AMSU-B, MHS, MWRI, SSM/I, SSMIS, and TMI) and infrared sensors in geosynchronous Earth orbit (GEO-IR) with a propagation-and-morphing algorithm. The bias correction for CMORPH is performed through probability density function matching against the CPC daily gauge analysis over land and the GPCP pentad merged analysis over ocean. TRMM combines multiple satellite-based precipitation estimates from PMW sensors in low Earth orbit and infrared sensors in geosynchronous Earth orbit (AMSR-E, AMSU-B, MHS, SSM/I, SSMIS, TMI, and GEO-IR), calibrated by the TRMM combined instrument using a probability-matching algorithm. IMERG is designed to supersede the TRMM products with an improved algorithm at higher temporal and spatial resolutions. Its algorithm fuses the early precipitation estimates collected during the operation of the TRMM satellite (2000–15) with more recent precipitation estimates collected during operation of the Global Precipitation Measurement (GPM) satellite (2014–present). IMERG combines the GPM satellite constellation (Advanced Technology Microwave Sounder, AMSR-2, GMI, MHS, SAPHIR, and SSMIS) with an integrated multisatellite retrieval algorithm. Several recent studies have shown the advantages of IMERG (Rajagopal et al. 2021; Hosseini-Moghari and Tang 2022) and argued that IMERG is more reliable than TRMM and CMORPH (e.g., Wei et al. 2017; Khodadoust Siuki et al. 2017; Zhang et al. 2018). We use IMERG as our default reference to quantify the simulated-to-observed variability ratio for CMIP models but include results from both TRMM and CMORPH as alternate references. CMORPH covers 1998 to the present with 30-min temporal and 8-km horizontal resolutions, TRMM covers 1998–2019 with 3-hourly temporal and 0.25° horizontal resolutions, and IMERG covers 2001 to the present with 30-min temporal and 0.1° horizontal resolutions. Differences between the satellite-based precipitation products are described in more detail in Sun et al. (2018) and Tang et al. (2021). To minimize the sampling uncertainty arising from different periods and temporal and spatial resolutions, we analyze the common 19 years of 2001–19 for the three datasets after averaging to a 3-hourly frequency with the time bounds used by TRMM. We note that, because of the different methods used to sample diurnal variations as well as satellite instrument and constellation changes over time, the observational estimates of variability are imperfect, perhaps especially at subdaily time scales. Using three different products partially mitigates against this but does not remove the need for caution in the interpretation of our results. All three products suffer from inhomogeneities over time, perhaps with some independent changes, but they also depend on many of the same satellite sensors and so may have shifts at similar times. A lack of complete intercalibration might also lead to somewhat enhanced subdaily variability in the observational products as the various input datasets sequentially contribute.
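As an illustration of the temporal aggregation step above, the sketch below averages a 30-min series (the native frequency of IMERG and CMORPH) to a 3-hourly frequency. This is a minimal sketch with NumPy; the function name and the assumption that the series starts on a 3-h boundary are ours, not the authors'.

```python
import numpy as np

def to_3hourly(precip_30min):
    """Average a 30-min precipitation series (time on the first axis) to 3-hourly.

    Assumes the series starts on a 3-h boundary and its length is a
    multiple of 6 (six 30-min samples per 3-h window).
    """
    n = precip_30min.shape[0]
    if n % 6 != 0:
        raise ValueError("series length must be a multiple of 6")
    new_shape = (n // 6, 6) + precip_30min.shape[1:]
    return precip_30min.reshape(new_shape).mean(axis=1)
```

The same reshape-and-average pattern works for gridded (time, lat, lon) arrays because only the leading axis is folded.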

b. CMIP models

In this study, we analyze 3-hourly averaged precipitation obtained from multiple realizations of 21 CMIP5 (Taylor et al. 2012) and 33 CMIP6 (Eyring et al. 2016) models in their historical simulations. The models and the number of their realizations used in this study are listed in Table 1. We have calculated metrics for individual realizations and then averaged the statistics across realizations to yield a representative score for each model. We analyze the period of 1985–2004, which is the most recent 20 years that CMIP5 and CMIP6 models have in common. A common objective of many studies analyzing CMIP simulations is to evaluate if models have improved over time. In this study we do so following a routine approach: average the results (statistics, not model output) across all available realizations of the same model, which makes use of all realizations in the database and ensures that each model has equal influence in a multimodel mean. However, in recent years it has become clear that this may not be sufficient for a fair assessment of generational improvements. One difficulty is an evolving number and diversity of models contributed to each generation of CMIP; more fundamentally, there is the potential of the dependence of different models to obfuscate the value of the traditional model mean (Eyring et al. 2019). We do report results from the traditional multimodel means in this study but emphasize results that we believe are more telling: Evaluating how the performance of individual models has improved across generations.
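The aggregation of statistics described above (metrics computed per realization, averaged within each model, and then combined so each model has equal influence) can be sketched as follows. The dictionary layout and function names are hypothetical illustrations, not the authors' code.

```python
def model_scores(realization_scores):
    """Average a metric across realizations so each model counts once.

    realization_scores: hypothetical mapping of model name -> list of
    per-realization metric values. The per-model means returned here can
    then be averaged to form a multimodel mean in which every model has
    equal influence regardless of its number of realizations.
    """
    return {model: sum(vals) / len(vals)
            for model, vals in realization_scores.items()}

def multimodel_mean(realization_scores):
    """Equal-weight multimodel mean of the per-model averages."""
    per_model = model_scores(realization_scores)
    return sum(per_model.values()) / len(per_model)
```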

Table 1

List of CMIP5 and CMIP6 models used in this study and their horizontal resolutions. The numbers in parentheses indicate the number of realizations used for each model. Models marked with an asterisk are ESM-type models. Note that the horizontal-resolution information is derived from the number of grid points and may vary slightly if the grid spacing is not uniform.


3. Methods

We use two independently established approaches to quantify variability across time scales: power spectra and time averaging. In the process, we address the strengths and weaknesses of each method and discuss how they are complementary. For internal variability (discussed below), we use both power spectra and an independent time-average approach. For the forced variability, our use of selected frequencies within a power spectrum is equivalent to the application of harmonic analysis in previous studies for evaluating the amplitude of the annual and diurnal cycles (e.g., Hsu and Wallace 1976; Kirkyla and Hameed 1989; Gates et al. 1999; Dai et al. 2007; Covey et al. 2016; Tang et al. 2021). Applying our analysis to multiple realizations from individual models, we have confirmed that our approach yields fairly robust results (discussed later).

a. Power spectra–based variability estimation

We estimate precipitation variability across time scales from subdaily to interannual with power spectra derived from 3-hourly precipitation. We apply the power spectra analysis method of Welch (1967) at each grid point with 10-yr Hann window segments and a 5-yr overlap between windows. To test the significance of spectral power, we use a red-noise background with a 95% confidence level (Wilks 1995). A longer window length lowers the minimum detectable frequency of the power spectrum but decreases the extent of noise smoothing. The overlap length also affects the smoothing of the noise signal, and half of the window length is commonly used as the overlap length (e.g., Welch 1967; Trethewey 2000; Weber and Talkner 2001; Pallotta and Santer 2020). The half overlap of the Hann window gives each data point the same weight in the resulting spectrum, except for the data points in the first and last half windows. It is important to note that there is little consensus on the most suitable processing choices, particularly for precipitation, which most closely follows a gamma distribution (e.g., Schuite et al. 2019; Martinez-Villalobos and Neelin 2019). To address this, we compare our results with those derived from a simple time-averaging approach (discussed below). We also perform several sensitivity tests to quantify what impact our processing choices may have on our conclusions (see the appendix) and find that the ratio of simulated to observed variability is not much affected by the different processing choices of the power spectra, supporting the robustness of our metric.
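The Welch configuration described here (3-hourly sampling, 10-yr Hann segments, 5-yr overlap) maps directly onto `scipy.signal.welch`. The sketch below is a minimal illustration for a single grid point; the function name is ours, and the red-noise significance test is omitted.

```python
import numpy as np
from scipy import signal

SAMPLES_PER_DAY = 8                       # 3-hourly data
SEGMENT = 10 * 365 * SAMPLES_PER_DAY      # 10-yr Hann window segments
OVERLAP = SEGMENT // 2                    # 5-yr (half-window) overlap

def precip_spectrum(series_3h):
    """Welch power spectral density for one grid point's 3-hourly series.

    With fs given in samples per day, the returned frequencies are in
    cycles per day (diurnal cycle at 1.0, annual cycle near 1/365).
    """
    freq, psd = signal.welch(series_3h, fs=SAMPLES_PER_DAY, window="hann",
                             nperseg=SEGMENT, noverlap=OVERLAP)
    return freq, psd
```

With 20 years of 3-hourly data, this setup yields three half-overlapping 10-yr segments, matching the windowing described in the text.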

We partition the precipitation variability into forced and internal variability (Fig. 1a). The forced variability, which includes precipitation mainly forced by annual and diurnal cycles of solar insolation and annual cycle of sea surface temperature, can then be obtained from a power spectrum of the original 3-hourly total precipitation time series. The internal variability is obtained from a power spectrum of the 3-hourly anomaly precipitation. The anomaly time series is obtained by subtracting the mean annual and diurnal cycle from each year of the original total time series at each grid point. The mean annual and diurnal cycles are calculated by averaging the 3-hourly total time series across all years. To obtain the precipitation variability at specific time scales from the power spectra results, we pick a frequency (forced) or average a frequency band (internal) of the spectral power. For the forced variability, we choose frequencies representing the annual cycle (around 365 days), the semiannual cycle (around 182.5 days), and the diurnal cycle (1 day). The semidiurnal cycle is excluded in our analysis because the 3-hourly frequency resolves the semidiurnal cycle with only four time points, which is close to the Nyquist frequency (Nyquist 1928) and prone to substantial uncertainty. Because CMIP models use various calendars, including Gregorian, 365-day, and 360-day calendars, we select the frequency that has maximum spectral power near the target frequency for the annual and semiannual cycles. For the internal variability, we define and average frequency bands of interest as the interannual variability (longer than 365 days), seasonal variability (90–365 days), subseasonal variability (20–90 days), synoptic variability (1–20 days), and subdaily variability (shorter than 1 day). 
As mentioned above, because our metric is based on the simulated-to-observed ratio, it is not much affected by the different processing choices of the power spectra, which supports the robustness of the analysis results. Also, we use three observational products (IMERG, TRMM, and CMORPH) as reference datasets, which allows us to gauge what impact the selection of observational data has on any conclusions we may draw from our analysis.
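The anomaly construction and frequency-band averaging described above can be sketched as follows, assuming a 365-day calendar and complete years of 3-hourly data (the function names are ours):

```python
import numpy as np

STEPS_PER_YEAR = 365 * 8     # 3-hourly samples per 365-day year

def split_forced_internal(precip_3h):
    """Split a 3-hourly series into its mean annual + diurnal cycle and anomalies.

    The climatology is the multiyear average of each 3-h slot of the year;
    subtracting it from every year yields the anomaly series whose power
    spectrum gives the internal variability, while the spectrum of the
    original total series carries the forced (annual/diurnal) peaks.
    """
    n_years = precip_3h.size // STEPS_PER_YEAR
    data = precip_3h[:n_years * STEPS_PER_YEAR].reshape(n_years, STEPS_PER_YEAR)
    climatology = data.mean(axis=0)          # mean annual + diurnal cycle
    anomaly = (data - climatology).ravel()   # departures, year by year
    return climatology, anomaly

def band_power(freq, psd, f_lo, f_hi):
    """Average spectral power over a frequency band (cycles per day),
    e.g. synoptic variability with f_lo=1/20, f_hi=1."""
    band = (freq > f_lo) & (freq <= f_hi)
    return psd[band].mean()
```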

Fig. 1.

Schematic diagrams for estimating variability across time scales including (a) forced and internal variability with power spectra analysis and (b) internal variability with the time-average method.

Citation: Journal of Climate 35, 20; 10.1175/JCLI-D-21-0542.1

b. Time-average-based variability estimation

A metric based on a time-averaging method is defined and applied to assess the robustness of the results from the power spectra–based metric. The time-average method has been used to obtain the variability of subdaily intermittent precipitation in previous studies (e.g., Covey et al. 2018). We extend the strategy to variability at longer time scales (Fig. 1b). From the 3-hourly anomaly precipitation time series at each grid point, we produce time-averaged anomaly time series with 1-, 20-, 90-, and 365-day temporal resolutions by averaging the 3-hourly data at each time scale (e.g., the time series averaged to 365-day temporal resolution has 20 time points for 20 years of data). In doing so, the variance of each time series retains only the variability at time scales longer than the averaging period. We obtain the variability for a specific time-scale range by subtracting the variances of two such time series as follows: subdaily variability = VAR3h − VAR1day, synoptic variability = VAR1day − VAR20day, subseasonal variability = VAR20day − VAR90day, and seasonal variability = VAR90day − VAR365day, where VAR indicates the variance of a time series with a specific temporal resolution. The interannual variability is the variance of the time series with 365-day temporal resolution.
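The subtraction rule above can be written directly in NumPy. This is a sketch assuming 3-hourly data on a 365-day calendar (function names ours); note that the five components telescope, so they sum exactly to the total 3-hourly anomaly variance.

```python
import numpy as np

def block_mean(x, n):
    """Average consecutive blocks of n samples (truncating any remainder)."""
    m = x.size // n
    return x[:m * n].reshape(m, n).mean(axis=1)

def variability_by_timescale(anom_3h):
    """Decompose anomaly variance by time scale via successive averaging.

    Implements the subtraction rule from the text, e.g.
    subdaily variability = VAR(3 h) - VAR(1 day); 8 samples per day.
    """
    var = {
        "3h": anom_3h.var(),
        "1day": block_mean(anom_3h, 8).var(),
        "20day": block_mean(anom_3h, 8 * 20).var(),
        "90day": block_mean(anom_3h, 8 * 90).var(),
        "365day": block_mean(anom_3h, 8 * 365).var(),
    }
    return {
        "subdaily": var["3h"] - var["1day"],
        "synoptic": var["1day"] - var["20day"],
        "subseasonal": var["20day"] - var["90day"],
        "seasonal": var["90day"] - var["365day"],
        "interannual": var["365day"],
    }
```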

c. Large-scale performance metrics

We quantify the difference between observed and simulated precipitation variability with the ratio of simulated to observed variability. Our performance metrics are aggregate summary statistics computed over the tropics (30°S–30°N), the extratropics of both hemispheres (30°–50°S and 30°–50°N), and near-globally (50°S–50°N). In each case, we compute results for the combined land–ocean domain as well as for land only and ocean only. We quantify the differences between simulations and observations by comparing results calculated on a common horizontal grid (2° × 2°) using a conservative interpolation method that preserves the area-weighted mean of the domain (regrid2, provided by CDAT). Interpolation has disadvantages (e.g., associated with values of zero precipitation); however, we agree with the decision made in many previous studies that, to fairly intercompare estimates of variability, the best option is usually to evaluate all models and observational products at a common resolution. Chen and Dai (2019) discussed two resolution effects: a grid-aggregation effect related to the increased probability of precipitation as the sampling area increases, and a physical-adjustment effect related to the different behavior of model physics and dynamics as model resolution changes. Our approach here is to mitigate the effects of grid aggregation and focus on the physical-adjustment effects of different model resolutions. For each domain, variability estimates are computed at every grid cell and averaged across the domain with area weighting applied; finally, the domain-averaged simulated/observed ratios are computed.
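The domain aggregation just described can be sketched as follows. This minimal NumPy version uses cosine-of-latitude weights as the standard area proxy on a regular grid; the function name is ours, and land/ocean masking (which would be applied to both fields first) is omitted.

```python
import numpy as np

def domain_ratio(model_var, obs_var, lat, lat_min=-50.0, lat_max=50.0):
    """Area-weighted simulated/observed variability ratio over a domain.

    model_var, obs_var: 2-D (lat, lon) variability fields on a common grid;
    lat: 1-D latitudes in degrees. Each field is averaged over the domain
    with cos(latitude) weights, and the ratio of the two means is returned.
    """
    sel = (lat >= lat_min) & (lat <= lat_max)
    weights = np.cos(np.deg2rad(lat[sel]))[:, None] * np.ones_like(model_var[sel])
    model_mean = np.average(model_var[sel], weights=weights)
    obs_mean = np.average(obs_var[sel], weights=weights)
    return model_mean / obs_mean
```

Changing `lat_min`/`lat_max` selects the tropical (30°S–30°N) or extratropical (30°–50°) bands used in the text.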

4. Results

a. Methodological considerations

A conventional statistic used to gauge precipitation variability is the temporal standard deviation (SD). Figures 2a–e show the spatial pattern of the SD of 3-hourly precipitation for the three satellite-based precipitation datasets as well as the multimodel-mean SD for CMIP5 and CMIP6. For the multimodel mean, each model is equally weighted by first averaging the SD across all realizations of that model. Generally, precipitation variability is stronger over the tropics than the extratropics and stronger over the oceans than over land, similar to the mean-state pattern (Figs. 2f–j). The three satellite-based precipitation datasets show a similar spatial pattern but a large spread in the magnitude of variability. Relative to the 50°S–50°N area-averaged SD of IMERG (11.87), TRMM (10.08) and CMORPH (9.27) show about 15% and 22% weaker variability, respectively, with the difference coming mainly from the tropics. In the mean state, on the other hand, differences between the observations are evident but smaller than those in the SD: TRMM and CMORPH show about 10% and 14% weaker mean precipitation than IMERG, respectively, in 50°S–50°N area-averaged values. In addition to these discrepancies in precipitation estimates, differences in these products have been identified in other baseline characteristics including the seasonal cycle, intensity distribution, and diurnal cycle (e.g., Sun et al. 2018; Tang et al. 2021). Numerous papers have also examined the differences between these products for smaller domains (e.g., Wei et al. 2017; Khodadoust Siuki et al. 2017; Zhang et al. 2018).

Fig. 2.

Spatial patterns of the (a)–(e) standard deviation and (f)–(j) mean state of 3-hourly precipitation from IMERG, TRMM, CMORPH, CMIP5, and CMIP6. Contours indicate the intermodel spread represented by standard deviation calculated across statistics of models. The number in the upper right of each panel indicates the 50°S–50°N-averaged value. The nonlinear color scale is applied to facilitate the visualization of patterns across the wide range of scales observed.


The multimodel means of CMIP5 and CMIP6 precipitation variability are muted in comparison to the variability of satellite-based estimates. Comparing the 50°S–50°N domain-averaged SD, the CMIP5 mean (6.88) and CMIP6 mean (7.39) substantially underestimate the variability of IMERG by about 42% and 38%, respectively. The CMIP6 mean suggests there is little improvement from the CMIP5 mean and still largely underestimates the observed variability. In the mean state, the multimodel means of CMIP5 and CMIP6 are not weaker than the observations. The relationship between the mean state and variability across individual models will be shown later. We note, however, that the comparison of multimodel means may be misleading given the different mix of models contributed to CMIP5 and CMIP6. The large intermodel spread further complicates the comparison of the multimodel averages (contours in Figs. 2d,e,i,j). As discussed in section 2b, we will later (section 4d; Fig. 10) attempt to examine model improvement more explicitly by examining performance changes across generations in each modeling group.

Figure 2 shows the total variability of 3-hourly data, which includes variability at time scales longer than the 3-hourly sampling frequency. Decomposing the variability across time scales helps us better understand the model errors and motivates this study. With power spectra analysis, we can isolate the variability across time scales and examine the model errors at distinct time scales. Section 3a, with Fig. 1a, summarizes the strategy of variability decomposition with power spectra analysis. Figure 3 shows the power spectra of 3-hourly total precipitation time series (left) and anomaly precipitation time series (right) averaged over the tropics (30°S–30°N), Northern Hemisphere extratropics (NHEX; 30°–50°N), and Southern Hemisphere extratropics (SHEX; 50°–30°S). In the power spectra of total precipitation, there are several distinct peaks above the 95% confidence level. The most distinct peak represents the annual cycle, with a second distinct peak associated with the diurnal cycle. Semiannual and semidiurnal peaks are also evident. Hereafter, we refer to the combined spectral power of the annual, semiannual, and diurnal cycles as forced variability. The semidiurnal cycle is excluded from our analysis as discussed in section 3a.

Fig. 3.

Power spectra of 3-hourly precipitation from IMERG (black), TRMM (gray), CMORPH (silver), CMIP5 (blue), and CMIP6 (red) for the (a),(b) Northern Hemisphere extratropics, (c),(d) tropics, and (e),(f) Southern Hemisphere extratropics. The (left) forced and (right) internal variability are obtained from total precipitation and from precipitation anomalies (departures from the spatially varying climatological diurnal and annual cycles), respectively. Shading in different colors indicates the 95% confidence level for each observational product as well as the CMIP5 and CMIP6 means. The logarithmic x axis (frequency) and linear y axis (frequency × power spectral density) ensure that the area below each line is proportional to the total variability.


In the power spectra of the anomaly precipitation, we define several frequency bands: subdaily (shorter than 1 day), synoptic (1–20 days), subseasonal (20–90 days), seasonal (90–365 days), and interannual (longer than 365 days). Consistent with Fig. 2, there are notable differences in the power spectra of the three satellite-based precipitation estimates, especially over the tropics, with IMERG showing the strongest variability over most frequency bands. The CMIP6 models show overall stronger variability across time scales than the CMIP5 models but still underestimate the observed internal variability. In contrast to the internal variability, both CMIP5 and CMIP6 overestimate the forced variability of the annual cycle over the tropics and of the diurnal cycle over the tropics and NHEX. The estimates of the forced variability are statistically significant, but the internal variability across all time scales in Figs. 3b, 3d, and 3f is well below the 95% confidence level, urging a degree of caution in the interpretation of these results. Later, we will examine results for land and ocean separately.
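As a concrete illustration of the band definitions above, the following sketch averages the spectral power of a 3-hourly anomaly series over the five internal-variability bands. It is a simplified stand-in, not the paper's methodology (described in section 3a): scipy's Welch estimator and a simple mean of power within each band are assumptions here.

```python
import numpy as np
from scipy.signal import welch

SAMPLES_PER_DAY = 8  # 3-hourly sampling

# Band edges in cycles per day, following the definitions in the text.
BANDS = {
    "subdaily":    (1.0, SAMPLES_PER_DAY / 2.0),  # shorter than 1 day
    "synoptic":    (1.0 / 20.0, 1.0),             # 1-20 days
    "subseasonal": (1.0 / 90.0, 1.0 / 20.0),      # 20-90 days
    "seasonal":    (1.0 / 365.0, 1.0 / 90.0),     # 90-365 days
    "interannual": (0.0, 1.0 / 365.0),            # longer than 365 days
}

def band_power(anom):
    """Mean spectral power of a 3-hourly anomaly series in each band."""
    freq, psd = welch(anom, fs=SAMPLES_PER_DAY, nperseg=len(anom))
    power = {}
    for name, (lo, hi) in BANDS.items():
        sel = (freq > lo) & (freq <= hi)
        power[name] = float(psd[sel].mean()) if sel.any() else float("nan")
    return power
```

A series with a strong 5-day oscillation, for example, yields a synoptic-band mean power well above that of the neighboring bands.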

To address this limitation of the power spectra–based method, we apply a distinctly different but complementary time-average method (section 3b with Fig. 1b) to help interpret our estimates of internal variability. Figure 4 shows the spatial patterns of the variability estimated by the time-average method. The spatial pattern of the variability is similar to the mean precipitation pattern. There is substantial agreement among the three satellite-based products at longer time scales, but the observational discrepancy grows at shorter time scales. Consistent with the previous figures, IMERG shows the strongest variability across time scales among the three satellite-based products, especially over the tropics and at higher frequencies. The CMIP multimodel means likewise underestimate the variability overall, especially at higher frequencies and over the tropics.
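The time-average method is detailed in the paper's section 3b; a minimal sketch of one common variant (an assumption here, not necessarily the paper's exact formulation) estimates the variability between two time scales as the variance of non-overlapping means at the shorter scale minus the variance at the longer scale:

```python
import numpy as np

def variance_of_means(x, window):
    """Variance of non-overlapping window means (window in time steps)."""
    n = (len(x) // window) * window
    return float(x[:n].reshape(-1, window).mean(axis=1).var())

def band_variance(x, short_window, long_window):
    """Averaging over the longer window removes fluctuations faster than
    that window, so the variance difference isolates the band in between."""
    return variance_of_means(x, short_window) - variance_of_means(x, long_window)

# e.g., synoptic-band (1-20 day) variability of a 3-hourly anomaly series:
# band_variance(anom, short_window=8, long_window=8 * 20)
```

Because it involves only averaging and variances, this estimate avoids the windowing and segmenting choices of spectral analysis.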

Fig. 4.

Spatial patterns of the variability across time scales estimated by the time-average method for (a) IMERG, (b) TRMM, (c) CMORPH, (d) CMIP5, and (e) CMIP6. Contours in (d) and (e) indicate the intermodel spread, represented by the standard deviation calculated across the statistics of individual models. Numbers in the upper right of each panel indicate the 50°S–50°N-averaged variability. Different rows use different color scales.


Figure 5 quantifies the relationship between our results with the power spectra and time-average methods for the internal variability at different time scales. The small dots represent results for individual CMIP5 and CMIP6 models (averaged across all realizations available for each model) and the larger shaded circles the CMIP5 + CMIP6 multimodel mean at specific time scales. Because the variability estimates from the power spectra and time-average methods are quantitatively very different, and because our goal is to evaluate model performance relative to observations, we consider the ratio between the simulated and observed values, although this precludes evaluation of the actual magnitudes. When analyzed in this fashion, we find strong agreement (correlation coefficient above 0.95) between the two methods. This result supports the robustness of our broad measure of variability (averaged across frequencies and large-scale domains) estimated by power spectra, even though the statistical significance was below the 95% confidence level as shown in Fig. 3. The low confidence in the spectral power of internal variability relates to a basic characteristic of precipitation, namely its intermittent rather than continuous nature (Trenberth et al. 2017; Trenberth and Zhang 2018; Covey et al. 2018). Using simple idealized time series with explicitly added variability, we demonstrate how intermittency can substantially alter the spectra and consequently decrease the statistical significance across a broad range of scales (Fig. S1 in the online supplemental material). This implies that the intermittent character of precipitation can decrease the statistical significance of spectral power even when variability signals are explicitly present.
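The agreement check in Fig. 5 reduces to correlating simulated-to-observed ratios from the two methods across models and time scales; a minimal sketch (the function name is an assumption):

```python
import numpy as np

def method_agreement(sim_ps, obs_ps, sim_ta, obs_ta):
    """Correlation between simulated-to-observed variability ratios from the
    power spectra (PS) and time-average (TA) methods, computed across a set
    of models and time scales."""
    ratio_ps = np.asarray(sim_ps, dtype=float) / np.asarray(obs_ps, dtype=float)
    ratio_ta = np.asarray(sim_ta, dtype=float) / np.asarray(obs_ta, dtype=float)
    return float(np.corrcoef(ratio_ps, ratio_ta)[0, 1])
```

Working with ratios makes the comparison insensitive to the very different absolute magnitudes the two methods produce.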

Fig. 5.

Relationship between the internal variability estimated from the power spectra (PS) and time-average (TA)-based metrics over (a) NHEX (30°–50°N), (b) TROPICS (30°S–30°N), and (c) SHEX (50°–30°S). The reference observation for the ratio is IMERG. Color coded by each time scale, the small and large dots respectively represent individual models averaged across realizations (CMIP5 and CMIP6) and the mean across all models at each specific time scale. The numbers with different colors in the upper right of each panel represent the correlation coefficients between the two methods across all models at all time scales (red) and each time scale separately (other colors corresponding to each time scale).


Each of the power spectra and time-average approaches has limitations when used separately. As noted above, estimates of internal variability with the power spectra method are generally not significant and can be sensitive to methodological considerations, diminishing their utility in isolation. In contrast, the simplicity of the time-average method is appealing, but it provides no information on forced variability because it can only measure the variability within a frequency band. Taken in combination, however, and with the power spectra method as our base metric, they provide a simple framework that yields a consistent set of results and builds trust in our approach.

b. Variability evaluation in CMIP5 and CMIP6 models

Our power spectra–based metric measures the forced variability of the annual, semiannual, and diurnal cycles and the internal variability at interannual, seasonal, subseasonal, synoptic, and subdaily time scales. Figure 6 shows the distribution of the three satellite-based products, along with the CMIP5 and CMIP6 results averaged over different latitude ranges (left column) as well as separated into ocean (middle column) and land (right column). The model results are obtained by averaging the metric values across individual realizations of each model. The three satellite-based estimates show substantial differences, as indicated by the difference between the largest and smallest values (shaded gray bar), especially for oceanic regions. In the tropics and NHEX, the observational differences tend to be larger for high-frequency variability (e.g., subdaily) than for low-frequency variability (e.g., annual and interannual). In SHEX, however, the annual variability exhibits the largest differences among the three observations, perhaps because fewer station data are available as a constraint in the satellite products. In those regions and frequencies, the interobservational spread is greater than the intermodel spread. The large interobservational spread limits a quantitatively accurate evaluation of model performance, but the finding that CMIP models overall underestimate the high-frequency variability holds irrespective of which observation is used. Recalling the total variability of 3-hourly precipitation in Fig. 2, IMERG exhibits the largest overall variability among the three observations. However, when the variability is decomposed into different time scales and domains, we find that IMERG does not show the largest variability across all time scales and domains.
In the tropical oceanic region, IMERG shows the largest variability for all time scales, but in the tropical land region TRMM shows larger variability than IMERG at interannual, seasonal, and subdaily time scales. This characteristic is evident in the extratropics as well and is illustrative of complex measurement, processing, and sampling uncertainties in each product that are difficult to reconcile. It is worth noting that TRMM has one snapshot (not an accumulation) per 3 h, while CMORPH and IMERG have six. This makes the TRMM product prone to noise at subdaily time scales.

Fig. 6.

Distribution of forced and internal variability estimated from power spectra–based metric for observations (black) and the CMIP5 (blue) and CMIP6 (red) models. The gray boxes represent the spread of the three observations from the minimum to the maximum values, with the different symbols in the gray box indicating each observation (× for IMERG, − for TRMM, and + for CMORPH). The reference observation for the ratio is IMERG. The blue and red boxes indicate the spread of CMIP models (averaged across individual realizations for each model) from the 25th percentile to the 75th percentile. Individual models (not identified) are shown as thin dashes, the multimodel mean as a wide thick dash, and the multimodel median as an open circle.


In the CMIP5 and CMIP6 models, the intermodel spread is relatively large for the diurnal cycle, especially over the tropical and NHEX land regions. Some models have less than half of the observed variability, while others have variability 3 times larger than observed. Overall, the CMIP5 and CMIP6 models tend to overestimate the forced variability and underestimate the internal higher-frequency variability (e.g., subdaily and synoptic) over the tropics and the NHEX land region. The annual and diurnal cycles are mainly forced by solar insolation, which implies that the convection schemes of CMIP models may overreact to solar heating (e.g., Lee et al. 2007; Xie et al. 2019) or that the net surface solar radiation could be overestimated in CMIP models over land regions (e.g., Ma et al. 2018; Van Weverberg et al. 2018). In the NHEX ocean and SHEX ocean regions, on the other hand, CMIP5 and CMIP6 models generally underestimate all forced and internal variability. This metric measures only the amplitude of variability, not the phase. For a comprehensive evaluation of the forced variability, another metric based on harmonic analysis is needed, which is included in our precipitation benchmarking package (Pendergrass et al. 2020; Tang et al. 2021). The systematic underestimation of internal variability in CMIP models at synoptic and subdaily time scales implies shortcomings in the simulation of the rain-bearing weather systems that create the observed variability at these time scales. In other words, models lack the mesoscale structures, presumably because they are parameterized in coarser-resolution models, that would be fundamental to correctly representing the high-frequency precipitation variability. The relationship between high-frequency variability and model resolution will be shown in the last part of this section.

The interrealization spread is an important factor in assessing the robustness and usefulness of the model evaluation results. Figure 7 shows the interrealization spread for the models, among those we used, that have 10 or more realization members. Across all time scales and domains, the interrealization spread is negligibly small, except for the forced variability over the SHEX and NHEX land areas. The largest interrealization spread appears over the SHEX land area, which could be related to the fact that the SHEX includes only a small fraction of land (e.g., the southern parts of Australia and South America). Also, the forced variability exhibits an overall larger spread than the internal variability, which could be associated with the difference in processing: the forced variability is obtained at a single frequency, whereas the internal variability is obtained by averaging over a frequency range. Compared to the intermodel spread shown in Fig. 6, however, the interrealization spread is negligibly small, supporting the robustness of the results from our metric.

Fig. 7.

As in Fig. 6, but for individual realizations from the subset of models where 10 or more realizations are available. The number to the right of each model name represents the number of realizations. In many cases the thin horizontal bars for individual realizations are not visible because they are indistinguishable from the thicker bar denoting the average across any given model’s realizations.


Figure 8 summarizes the results of our analysis for each individual model. It shows the forced and internal variability as ratios to IMERG for each model as well as for each observation. IMERG exhibits the largest variability in most frequencies and domains, but TRMM shows the largest variability at high frequencies (e.g., diurnal and subdaily) over the NHEX region, especially for land. CMORPH exhibits the smallest variability of the observational products for most frequencies and domains. Consistent with Fig. 6, many CMIP5 and CMIP6 models overestimate the forced variability and underestimate the internal variability. The higher frequencies exhibit larger errors: in the forced variability, the diurnal cycle tends to be more substantially overestimated than the longer time scales, whereas the internal variability at higher frequencies (e.g., subdaily and synoptic) tends to be severely underestimated. Also, errors in the tropical region are larger than those in the extratropics. The simulated-to-observed variability ratio is generally larger over land than over ocean. Thus, the land regions tend to show more overestimated forced variability and a less pronounced underestimation of the internal variability compared to the ocean regions, with some models even overestimating the internal variability over land. ACCESS1.0, ACCESS1.3, CNRM-CM5, EC-EARTH, GISS-E2-H, GISS-E2-R, IPSL-CM5A-LR, and IPSL-CM5A-MR show the largest errors among CMIP5 models, but the new versions of these models in CMIP6 (e.g., ACCESS-CM2, CNRM-CM6-1, EC-EARTH3, GISS-E2-1-G, and IPSL-CM6A-LR) show some improvements. On the other hand, MIROC-ESM and MIROC-ESM-CHEM show the best simulation skill in forced variability among the CMIP5 models, but the next generation of the model, MIROC-ES2L, shows somewhat degraded skill in forced variability, with increased variability. In Fig. S2, the internal variability is shown based on both the power spectra–based metric and the time-average-based metric, with remarkable agreement between the two.

Fig. 8.

Portrait plot of the forced and internal variability estimated from the power spectra–based metric for individual (a) CMIP5 and (b) CMIP6 models. The three satellite products are shown at the bottom of the CMIP5 panel. The reference observation for the ratio is IMERG. The triangles in each box indicate different domains—top: global (50°S–50°N); right: NHEX (30°–50°N); bottom: tropics (30°S–30°N); and left: SHEX (50°–30°S) [see key below (a)]—for the (left) total region, (center) ocean region, and (right) land region. A green dot marks models that fall within the range of the three satellite products. Figure S2 in the online supplemental material demonstrates that the results of this figure are nearly identical when calculated with the spectral and time-average methods used in this study.


c. Seasonality of subdaily variability

The annual mean results of subdaily internal variability (Fig. 8) are analogous to those of Covey et al. (2018). In Fig. 9, we use the time-average method to assess the seasonal changes in subdaily internal variability. In observations, the extratropical land regions show increased variability in summer and decreased variability in winter. Compared to the land regions, the seasonality is more pronounced over the extratropical ocean regions, and its timing is delayed by about three months. Consistent with the muted model variability identified in the annual mean results, Fig. 9 reveals that most models are notably deficient in simulating subdaily precipitation variability for all calendar months over the extratropical ocean regions, although the annual fluctuation of the variability is reasonably well simulated. The deficiency across all calendar months is evident over the tropical ocean as well. Over the tropical and extratropical land regions, the picture is more ambiguous, as there are periods when many models are more consistent with the observations. Figure S3 in the online supplemental material presents the results of Fig. 9 as simulated-to-observed variability ratios, to more clearly quantify model skill.

Fig. 9.

Seasonality of subdaily variability estimated by the time-average method for observations (black), CMIP5 (blue), and CMIP6 (red) models. The gray boxes represent the spread of the three observations from the minimum to the maximum values, with the different symbols in the gray box indicating each observation (× for IMERG, − for TRMM, and + for CMORPH). The blue and red boxes indicate the spread of CMIP models (averaged across individual realizations for each model) from the 25th percentile to the 75th percentile. Individual models are shown as thin dashes, the multimodel mean as a wide thick dash, and the multimodel median as an open circle.


Overall, these results suggest that while there is some seasonality associated with the subdaily scale variability of precipitation, models are demonstrably muted compared to the satellite products for all calendar months. The seasonality of the simulated diurnal cycle in CMIP5 and CMIP6 was recently examined in Tang et al. (2021). Consistent with earlier studies (e.g., Dai et al. 2007; Covey et al. 2016), they demonstrate the substantial seasonality and biases of both the amplitude and phase of the diurnal cycle. More work is needed to understand how or if the biases in the (forced) diurnal cycle and subdaily internal variability are related.

d. Benchmarking variability performance changes from CMIP5 to CMIP6

The improvement in simulated precipitation variability is quantitatively assessed across CMIP generations from phase 5 to phase 6 for applicable modeling groups. The modeling groups and their corresponding CMIP5 and CMIP6 models are listed in Table 1. We assess the improvement in Earth system model (ESM)-type and coupled model (CM)-type models separately, to facilitate consistent comparison between generations and model types. We define an improvement metric as |MC5 − 1| − |MC6 − 1|, where MC5 and MC6 are the means of the precipitation variability metric over the models of each modeling group for CMIP5 and CMIP6, respectively. We assess the statistical significance of the difference between the CMIP5 and CMIP6 means with a t test for unequal sample sizes (Welch 1947) using all available realizations. We do not test significance for cases where only one realization is available for the CMIP5 or CMIP6 models. Figure 10 shows the improvement in CMIP6 relative to CMIP5 for each available modeling group. Across time scales, many CMIP6 models show increased variability compared to their CMIP5 counterparts (plus marks). This drives the overall improvement (green shading) in CMIP6 in the internal variability, which was underestimated in most CMIP5 models. However, CMIP6 models show little improvement in forced variability because the forced variability was already overestimated. The improvement is more evident over the ocean regions than the land regions (Fig. S4). While many CMIP6 models show increased variability, some models show decreased variability of the diurnal cycle (e.g., ACCESS-CM, CNRM-CM, GISS-CM, and IPSL-CM), reducing the overestimation in their CMIP5 versions. FGOALS-CM and IPSL-CM exhibit the most improvement across all time scales among the models analyzed here. A small number of models (e.g., CMCC-CM and MIROC-ESM) show more degradation than improvement.
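The improvement metric and significance test described above can be sketched as follows. This is a minimal illustration, not the paper's code; scipy's `ttest_ind` with `equal_var=False` implements the Welch (1947) test for unequal sample sizes:

```python
from scipy import stats

def improvement(mc5, mc6):
    """|MC5 - 1| - |MC6 - 1|: positive when the CMIP6 simulated-to-observed
    variability ratio is closer to 1 than the CMIP5 ratio."""
    return abs(mc5 - 1.0) - abs(mc6 - 1.0)

def significant_change(ratios_c5, ratios_c6, alpha=0.05):
    """Welch's t test across realizations of the CMIP5 and CMIP6 versions.
    Returns None when either generation has only one realization, mirroring
    the cases where no significance test is performed."""
    if len(ratios_c5) < 2 or len(ratios_c6) < 2:
        return None
    _, p = stats.ttest_ind(ratios_c5, ratios_c6, equal_var=False)
    return bool(p < alpha)
```

Because the metric is a ratio to observations, a move from 0.6 to 0.9 (reduced underestimation) and a move from 1.4 to 1.1 (reduced overestimation) both register as the same positive improvement of 0.3.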

Fig. 10.

Improvement of the forced and internal variability estimated from the power spectra–based metric in CMIP6 relative to CMIP5 for each modeling group. The improvement represents how much closer CMIP6 is to the observations than CMIP5 (see text for details). The triangles in each box indicate different domains—top: global (50°S–50°N); right: NHEX (30°–50°N); bottom: tropics (30°S–30°N); and left: SHEX (50°–30°S) (see key in lower right). Plus signs indicate increased variability in CMIP6 compared to CMIP5, and minus signs indicate decreased variability. The black and white plus/minus signs respectively indicate statistically significant (>95% confidence) and insignificant changes based on a t test using all realization members. The gray plus/minus signs indicate that the significance test is not appropriate because of insufficient sample size (i.e., only one realization). Gray shading indicates that the CMIP5 and CMIP6 simulations both fall within the range of the three satellite products.


An example of a strong decline in skill, especially in internal variability, is CMCC-CM. We note that, unlike other models, the CMCC contribution to CMIP6, CMCC-CM2-SR5 (1.25° longitude × 0.9° latitude), has a lower horizontal resolution than the model that participated in CMIP5, CMCC-CM (0.75° longitude and latitude). This hints at the possibility that horizontal resolution is an important factor in simulating precipitation variability accurately. Generally, horizontal resolution increased from CMIP5 to CMIP6 in each modeling group (Table 1), but most models also contain many other changes, such as updated physics parameterizations. Thus, it is difficult to confirm the impact of horizontal resolution on precipitation variability by simply comparing CMIP5 and CMIP6 models.

To better isolate the impact of horizontal resolution on the simulated precipitation variability, we compare the models that provide both lower- and higher-resolution versions (BCC-CSM1-1, IPSL-CM5A, CNRM-CM6-1, EC-Earth3-Veg, HadGEM3-GC31, and MPI-ESM1-2). Figure 11 shows the relationship between horizontal resolution and precipitation variability across time scales in these models. The statistical significance of the differences in the precipitation variability between model pairs is tested with the same t-test method used for Fig. 10. In subdaily and synoptic variability, all models analyzed here suggest that the higher-resolution version simulates stronger precipitation variability than the lower-resolution version. The subdaily variability increases more than the synoptic variability with the finer horizontal resolution, indicating that higher-frequency variability is more sensitive to the horizontal resolution. The horizontal resolution of this collection of models increases by about a factor of 2 from the lower- to the higher-resolution versions (i.e., BCC-CSM1-1: 2.5 times; IPSL-CM5A: 1.5 times; CNRM-CM6-1: 2.8 times; EC-Earth3-Veg: 1.6 times; HadGEM3-GC31: 2.3 times; and MPI-ESM1-2: 2 times). The subdaily variability increases by about 1.68 times, and the synoptic variability increases by about 1.15 times from the lower-resolution to the higher-resolution version. The relationship between horizontal resolution and higher-frequency variability is more evident over the ocean regions than the land regions (Figs. S5 and S6).

Fig. 11.

Relationship between horizontal resolution and variability estimated from power spectra–based method for (a) annual cycle, (b) semiannual cycle, and (c) diurnal cycle and for (d) interannual, (e) seasonal, (f) subseasonal, (g) synoptic, and (h) subdaily variability over a near-global domain (50°S–50°N). Results are shown only for CMIP contributions that include both higher- and lower-resolution versions of the same model. Arrows indicate the direction from lower- to higher-resolution versions in each model set. The black solid and dashed arrows respectively indicate statistically significant (>95% confidence) and insignificant changes based on a t test using all realization members. The gray arrow indicates that the significance test is not appropriate because of insufficient sample size (i.e., only one realization). Each realization is plotted with small dots, and the number of realizations for each model is represented in the legend. The gray shading indicates the range of the three satellite products.


In the internal variability with time scales longer than synoptic, a systematic relationship between horizontal resolution and variability is not evident—IPSL-CM5A, EC-Earth3-Veg, and MPI-ESM1-2 simulate slightly stronger variability, but the others simulate weaker variability in the higher resolution version. A similar change of the variability is also shown in the annual and semiannual cycles. When the horizontal resolution is increased, IPSL-CM5A, EC-Earth3-Veg, HadGEM3-GC31, and MPI-ESM1-2 simulate stronger variability, but BCC-CSM1-1 and CNRM-CM6-1 models simulate weaker variability. Curiously, the diurnal cycle shows change nearly opposite to the annual and semiannual cycles. As the horizontal resolution increases, most models that show an increased variability in their annual and semiannual cycles exhibit a decreased variability in the diurnal cycle (e.g., IPSL-CM5A, EC-Earth3-Veg, and HadGEM3-GC31).

5. Summary and discussion

Temporal variability is a fundamental characteristic of precipitation across weather and climate scales, but models suffer from fundamental deficiencies that limit its realistic simulation. In this study, we describe the development and application of a framework to assess the large-scale forced and internal variability of simulated precipitation across time scales. Our analysis is based on simulated-to-observed ratios of variance estimated from two independent approaches. In both cases we estimate variability locally (at each grid point) but focus on more robust results by averaging over large spatial scales, including the tropics and extratropics, and over land and ocean separately. For forced variability, we use power spectra of 3-hourly total precipitation and isolate the daily and annual frequencies, which is analogous to the well-established harmonic analysis applied in many previous studies. For estimating internal variability, we use a 3-hourly precipitation anomaly time series and average across selected power spectra frequencies representative of subdaily, synoptic, subseasonal, seasonal–annual, and interannual scales.

Independently, we also estimate internal variability using a method based on time averaging that can be clearly defined without the complication of processing choices associated with power spectra. The time-average method also allows us to examine the seasonality of internal variability at subdaily time scales. The intermittent nature of precipitation complicates the interpretation of power spectra analysis, particularly for internal variability, where we find the variability is often well below the levels of strict statistical significance. To assess the robustness of our large-scale results averaged across selected frequencies, we compare the simulated-to-observed variability ratios resulting from both the power spectra and time-average methods. Encouragingly, we find a close correspondence between the internal variability estimated with the two methods. Although its simplicity is appealing, we do not use the time-average approach as our baseline because it cannot provide estimates of forced variability, which is measured at distinct frequencies (i.e., the diurnal and annual cycles). Rather, we use the time-average approach to demonstrate the consistency of the large-scale results derived from our power spectra analysis and to examine the seasonality of shorter-time-scale variability. We note that both frameworks are based on estimating variance across time scales and, expressed as ratios, enable us to readily identify whether a model's precipitation variability is muted, about right, or overactive.

By averaging results over large domains and across a range of frequencies, we have been able to demonstrate that our relative performance assessments (as gauged by simulated-to-observed variability ratios) are fairly robust. In particular, our conclusions comparing models in the CMIP ensemble are similar irrespective of which realization we examine from any given model. However, when gauging model improvements for an individual model, either during the development process or across CMIP generations, access to multiple realizations helps to quantify performance changes with statistical significance.

We evaluate 21 CMIP5 and 33 CMIP6 models against the three satellite-based precipitation estimates of IMERG, TRMM, and CMORPH. The three observational products show substantial differences in precipitation variability, especially at higher frequencies over the tropical ocean and in the annual cycle amplitude over the southern midlatitude ocean. The IMERG product stands out with substantially larger variability across time scales and domains. Several recent studies have argued that IMERG is the most reliable of the three and as such supersedes the earlier products (e.g., Wei et al. 2017; Khodadoust Siuki et al. 2017; Zhang et al. 2018). While the near-global satellite products have enabled us to partition variability over large domains, several deficiencies in these datasets may to some degree influence our results. Subdaily sampling is of particular concern, but the limited record length may also affect our estimates of interannual variability. Detailed comparison between satellite and in situ products is one avenue to better understand the limitations of estimating variability from satellite products, and indeed some investigators have made a few targeted comparisons (e.g., Wei et al. 2017; Tang et al. 2021). However, this important avenue of cross-validation involves multiple challenges, including comparing variability estimated from pointwise data with grid cells of O(100 km), and the fact that each of the satellite products is constrained by some in situ data. A better understanding of the uncertainties in observations via examination of different classes of data is needed, but it is well beyond the scope of this study.

Noting the above caveats, in general the CMIP models show a much larger spread in results than the satellite-based observational products, particularly for the diurnal cycle over tropical land. Having applied our analysis with all three observational products, we find that in most cases our interpretation of the simulated variability is not dramatically altered by the choice of reference observations, despite the discrepancies between the satellite products. The CMIP models underestimate the total precipitation variability (Fig. 2), but when it is decomposed into different time scales we find that the models tend to overestimate the forced variability over the tropics and underestimate the internal variability more broadly and across time scales (Figs. 6 and 8). The overactive behavior of the simulated forced variability may indicate that the convective parameterizations within the models are too sensitive to the land and sea surface heating driven by the annual and diurnal cycles of solar insolation.

The systematic underestimation of internal variability is overall improved in many CMIP6 models when compared to their CMIP5 counterparts, but there is little or no improvement in forced variability (Fig. 10). To better understand the improvements identified, we examined the relationship between horizontal resolution and precipitation variability using the subset of models that provide both high- and low-resolution versions of the same model. The models with higher horizontal resolution tend to produce more active internal variability at higher frequencies, but no clear influence of resolution is evident at time scales longer than the synoptic scale. The lower-frequency precipitation variability may be more closely tied to large-scale sources of variability (e.g., ENSO) that are not strongly influenced by the horizontal resolution of state-of-the-art climate models (about 1°–2° resolution). For forced variability, horizontal resolution appears to have little impact. Our results suggest that increasing horizontal resolution can reduce errors in internal precipitation variability at synoptic and shorter time scales; however, the improvement we have identified is incremental and for a limited set of models. Several previous studies tested the role of horizontal resolution and convective parameterization on simulated precipitation using a single model (e.g., Johnson et al. 2016; Bush et al. 2015; Ahn and Kang 2018) and suggested that decreasing the portion of convective precipitation has a greater influence on precipitation variability than simply increasing horizontal resolution. Conventional convective parameterizations do not appropriately reduce the portion of parameterized convective precipitation as the horizontal resolution increases, which needs to be addressed with scale-aware convective parameterization (e.g., Arakawa and Wu 2013; Ahn and Kang 2018).
Convection-permitting models offer promise to improve the simulation of precipitation variability because they circumvent the uncertainty associated with the convective parameterization (Prein et al. 2015; Ma et al. 2022).

Although the focus of this study is on variability across time scales, precipitation variability is known to be related to the mean state. This suggests that our domain-averaged variability estimates could be influenced by the domain-averaged mean-state amount. We have therefore compared domain-averaged mean-amount model biases (here represented as a ratio to IMERG) to our variability estimates across time scales (Fig. 12). The mean-state amount tends to be overestimated in the tropics, while it falls more within the observational range in the extratropics. A robust relationship between the mean-state amount and variability is not evident for higher-frequency variability (e.g., the diurnal cycle and subdaily variability), whereas a statistically significant relationship (correlation coefficients of 0.3–0.6) emerges for lower-frequency variability (e.g., the annual cycle, interannual, and seasonal variability). The range in mean-state performance across models is larger over land than over ocean, and the mean state tends to be overestimated over extratropical land (see Fig. S7). The correlation tends to be higher over land than over ocean. We note that the spatial pattern of the mean state is not included in the evaluation, as our metric framework targets domain-averaged statistics. Further work is needed to better understand the complex differences in the variability–mean relationship between regions.
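The correlation-with-significance calculation described above can be sketched as follows, using synthetic stand-ins for the model ratios (hypothetical values, not the paper's data). The two-sided p-value from `scipy.stats.pearsonr` corresponds to the t test on the correlation coefficient with n - 2 degrees of freedom.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic stand-ins for the 54 models' domain-averaged ratios
# (illustrative only): mean-state amount ratio and a lower-frequency
# variability ratio, each relative to a reference observation.
rng = np.random.default_rng(42)
mean_ratio = 1.0 + 0.3 * rng.standard_normal(54)
var_ratio = 0.8 + 0.5 * (mean_ratio - 1.0) + 0.2 * rng.standard_normal(54)

# Pearson correlation across models; p < 0.05 corresponds to the
# ">95% confidence" asterisk convention used in Fig. 12.
r, p = pearsonr(mean_ratio, var_ratio)
significant = p < 0.05
```

In the paper's results this kind of test flags lower-frequency time scales (annual cycle, seasonal, interannual) as significantly correlated with the mean state, while higher-frequency time scales are not.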

Fig. 12.

Relationship between the domain-averaged precipitation mean state and variability across time scales estimated from the power spectra–based method over (a),(b) NHEX (30°–50°N), (c),(d) tropics (30°S–30°N), and (e),(f) SHEX (50°–30°S) for (left) forced and (right) internal variability. The reference observation for the model/obs ratio is IMERG. Dots represent individual models (CMIP5 and CMIP6), color-coded by time scale. The colored numbers in the upper-right corner of each panel are correlation coefficients between the mean state and variability across all models at each time scale. An asterisk following a correlation coefficient indicates statistical significance (>95% confidence) based on a t test. The shaded vertical (mean state) and horizontal (variability) boxes represent the spread of the three observations from the minimum to the maximum values. The different line styles in the gray vertical boxes indicate the different observations (solid for IMERG, dashed for TRMM, and dotted for CMORPH). Variability estimates for individual observations can be identified via comparison with Fig. 6.

Citation: Journal of Climate 35, 20; 10.1175/JCLI-D-21-0542.1

Our large-scale objective tests complement other studies that target particular processes or phenomena. For example, a recent study (Fiedler et al. 2020) provides a thorough examination of tropical precipitation biases across generations of CMIP models. They find improvements across the CMIP phases in simulating the Madden–Julian oscillation (MJO) and ENSO, consistent with our conclusion that the simulated internal variability at intraseasonal and interannual scales is overall improved. Also consistent with our analysis, Fiedler et al. (2020) find that improvement is not uniform across all aspects of simulated precipitation: they find no improvement in summer monsoons and the double-ITCZ bias, while in this study we find little improvement in forced variability. Fiedler et al. (2020), Tang et al. (2021), and this study all demonstrate little progress in the simulated diurnal cycle of precipitation.

It is important to emphasize that our analysis is not designed to yield information about particular phenomena or processes, such as the structure of ITCZ biases, MJO propagation, or the consistency between the simulated and observed characteristics of ENSO. Taking the latter as an example, our large-scale results are not partitioned by ocean basin and simply provide a crude measure of tropical variability including time scales greater than one year. While this includes the dominating signature of ENSO, there are many other analyses that provide more insightful information via explicit examination of ENSO characteristics (e.g., Planton et al. 2021).

The magnitude of precipitation variability is only one of many aspects needed to fully characterize simulated precipitation. Our analysis does not address the phase of forced variability, although it can readily be combined with capabilities that do, as described in other studies. Similarly, our baseline results do not reveal seasonal information without additional processing. More fundamentally, precipitation generally follows a gamma distribution, and its intermittent behavior requires examination of its frequency, intensity, and duration. A well-known and challenging systematic error in models is more persistent drizzle than observed (e.g., Dai 2006; Pendergrass and Hartmann 2014), and one might expect this to be related to the muted variability we have identified at shorter time scales. We have examined this possibility and conclude that the well-known drizzle bias is not readily identified in our analysis (see Fig. S8).

There continues to be considerable interest in reducing the spread of results from the CMIP multimodel ensemble by weighting twenty-first-century model projections of precipitation based on their consistency with observations (e.g., Schaller et al. 2011). This will likely continue to be an active area of research with current approaches emphasizing relationships between present-day performance and future responses via “emergent constraints” (e.g., Caldwell et al. 2018), methods to account for model dependence in the multimodel ensemble (e.g., Sanderson et al. 2017), or focusing on key physical processes (e.g., Fiedler et al. 2020). The results from the present study were not designed for weighting future projections, but providing a high-level objective perspective may be useful when used in conjunction with other metrics or analysis methods for this or other purposes.

State-of-the-art weather prediction and climate models are limited by long-standing deficiencies in simulated precipitation. A systematic examination of simulated precipitation requires a comprehensive suite of performance tests. The framework of the present study provides high-level summary statistics that would be a useful complement to other tests that are designed to gauge essential characteristics of precipitation including mean, intensity, frequency, duration, and type. Efforts are underway to implement an initial suite of baseline summary statistics into a common analysis framework for routine evaluation of simulated precipitation (Pendergrass et al. 2020). This could serve as a useful resource to modelers striving to improve simulated precipitation.

Acknowledgments.

We thank Jill Chengzhu Zhang and Sterling Baldwin (both of LLNL) for assisting with E3SM model data, Shuaiqi Tang (PNNL) for assistance with satellite data, and Ken Sperber for his valuable comments on our analysis. We acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output, the Earth System Grid Federation (ESGF) for archiving the output and providing access, and the multiple funding agencies who support CMIP and ESGF. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. The efforts of the authors were supported by the Regional and Global Model Analysis (RGMA) component of the Earth and Environmental System Modeling Program of the U.S. Department of Energy’s Office of Science and via National Science Foundation IA 1947282. C. Jakob is grateful for the support of the ARC Centre of Excellence for Climate Extremes (CE17010002). This document was prepared as an account of work sponsored by an agency of the U.S. government. Neither the U.S. government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the U.S. government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the U.S. 
government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

Data availability statement.

All of the data used in this study are publicly available. The CMIP data are available at https://esgf-node.llnl.gov/projects/esgf-llnl. TRMM and IMERG are available from the NASA Goddard Space Flight Center at https://gpm.nasa.gov/missions/trmm and https://gpm.nasa.gov/data/imerg, respectively. CMORPH is available from the NOAA National Centers for Environmental Information at https://www.ncei.noaa.gov/products/climate-data-records/precipitation-cmorph. The statistics generated from this study and the interactive portrait plots with access to the underlying diagnostics are available (CMIP5: https://pcmdi.llnl.gov/pmp-preliminary-results/interactive_plot/precip/variability_across_timescales/PS_across_timescales/portrait_PS_pr.3hr_ratio_regrid.180x90_cmip5_ensmean_obsmean_woSD_interactive.html, CMIP6: https://pcmdi.llnl.gov/pmp-preliminary-results/interactive_plot/precip/variability_across_timescales/PS_across_timescales/portrait_PS_pr.3hr_ratio_regrid.180x90_cmip6_ensmean_obsmean_woSD_interactive.html). Results are also available upon request to the authors. The precipitation variability metric presented in this study was released via the Program for Climate Model Diagnosis and Intercomparison (PCMDI) metrics package (PMP; https://doi.org/10.5281/zenodo.5747897).

APPENDIX

Power Spectra Analysis and Sensitivity to Processing Choices

The power spectrum method used in this study is based on Welch (1967) and implemented with the Python library SciPy (https://docs.scipy.org/doc/scipy-1.9.1/reference/generated/scipy.signal.welch.html). The method divides the time series into overlapping segments with specified window and overlap lengths, computes a spectral power estimate for each segment, and averages the estimates. We tested the sensitivity of the power spectra results to several window lengths, overlap lengths, and windowing functions. Figure A1 shows the sensitivity test results for a 10-yr Hann window with 5-yr overlap (w10o5.hann), a 10-yr Hann window with 9-yr overlap (w10o9.hann), a 6-yr Hann window with 3-yr overlap (w6o3.hann), a 6-yr Hann window with 5-yr overlap (w6o5.hann), and a 10-yr boxcar window with 5-yr overlap (w10o5.boxcar). Increasing the overlap length has little impact on the averaged spectral power at specific time scales, but shortening the window length from 10 to 6 years substantially increases the spectral power, by about 50% for internal variability and about 20% for forced variability. The spectral power is also notably sensitive to the choice of windowing function (Fig. A1a). However, the ratio of model to observation, which is the metric we present, is largely insensitive to the window length, overlap length, and windowing function (Fig. A1b). The metric results from the different processing choices agree well with each other, supporting the robustness of our metric.
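A minimal sketch of this procedure with `scipy.signal.welch`, using synthetic 3-hourly data and a much shorter window than the 10-yr segments used in this study (the series, seed, and window length here are illustrative assumptions only):

```python
import numpy as np
from scipy.signal import welch

# Synthetic 3-hourly "precipitation": mean + diurnal cycle + noise,
# an illustrative stand-in for the multiyear records analyzed here.
fs = 8.0                         # samples per day for 3-hourly data
t = np.arange(2 * 365 * 8) / fs  # two years of time, in days
rng = np.random.default_rng(0)
x = 1.0 + 0.5 * np.sin(2.0 * np.pi * t) + 0.2 * rng.standard_normal(t.size)

# Welch: split into overlapping Hann-windowed segments, compute a
# periodogram for each, and average (here a 182-day window with 50%
# overlap; the study's default is a 10-yr window with 5-yr overlap).
nperseg = int(182 * fs)
freq, power = welch(x, fs=fs, window="hann", nperseg=nperseg,
                    noverlap=nperseg // 2)

# The diurnal peak dominates near 1 cycle per day
peak_freq = freq[np.argmax(power[1:]) + 1]  # skip the zero-frequency bin
```

Shortening `nperseg` plays the same role as the 6-yr versus 10-yr window comparison in Fig. A1: it trades frequency resolution for more averaging, which changes the absolute spectral power but largely cancels in the model-to-observation ratio.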

Fig. A1.

Sensitivity of power spectra analysis to window length, overlap length, and windowing function shown with portrait plots. (a) The observational spectral power normalized by the result with 10-yr window, 5-yr overlap, and Hann windowing function (first row of the plot). (b) The ratio of model to observation using the same window length, overlap length, and windowing function.


REFERENCES

  • Ahn, M.-S., and I.-S. Kang, 2018: A practical approach to scale-adaptive deep convection in a GCM by controlling the cumulus base mass flux. npj Climate Atmos. Sci., 1, 13, https://doi.org/10.1038/s41612-018-0021-0.
  • Ahn, M.-S., D. Kim, K. R. Sperber, I.-S. Kang, E. Maloney, D. Waliser, and H. Hendon, 2017: MJO simulation in CMIP5 climate models: MJO skill metrics and process-oriented diagnosis. Climate Dyn., 49, 4023–4045, https://doi.org/10.1007/s00382-017-3558-4.
  • Ahn, M.-S., and Coauthors, 2020: MJO propagation across the Maritime Continent: Are CMIP6 models better than CMIP5 models? Geophys. Res. Lett., 47, e2020GL087250, https://doi.org/10.1029/2020GL087250.
  • Alexander, L. V., and Coauthors, 2019: On the use of indices to study extreme precipitation on sub-daily and daily timescales. Environ. Res. Lett., 14, 125008, https://doi.org/10.1088/1748-9326/ab51b6.
  • Arakawa, A., and C.-M. Wu, 2013: A unified representation of deep moist convection in numerical modeling of the atmosphere. Part I. J. Atmos. Sci., 70, 1977–1992, https://doi.org/10.1175/JAS-D-12-0330.1.
  • Bellenger, H., E. Guilyardi, J. Leloup, M. Lengaigne, and J. Vialard, 2014: ENSO representation in climate models: From CMIP3 to CMIP5. Climate Dyn., 42, 1999–2018, https://doi.org/10.1007/s00382-013-1783-z.
  • Berndtsson, R., and J. Niemczynowicz, 1988: Spatial and temporal scales in rainfall analysis—Some aspects and future perspectives. J. Hydrol., 100, 293–313, https://doi.org/10.1016/0022-1694(88)90189-8.
  • Bonfils, C. J. W., B. D. Santer, J. C. Fyfe, K. Marvel, T. J. Phillips, and S. R. H. Zimmerman, 2020: Human influence on joint changes in temperature, rainfall and continental aridity. Nat. Climate Change, 10, 726–731, https://doi.org/10.1038/s41558-020-0821-1.
  • Brown, J. R., A. F. Moise, and R. A. Colman, 2017: Projected increases in daily to decadal variability of Asian-Australian monsoon rainfall. Geophys. Res. Lett., 44, 5683–5690, https://doi.org/10.1002/2017GL073217.
  • Bush, S. J., A. G. Turner, S. J. Woolnough, G. M. Martin, and N. P. Klingaman, 2015: The effect of increased convective entrainment on Asian monsoon biases in the MetUM general circulation model. Quart. J. Roy. Meteor. Soc., 141, 311–326, https://doi.org/10.1002/qj.2371.
  • Caldwell, P. M., M. D. Zelinka, and S. A. Klein, 2018: Evaluating emergent constraints on equilibrium climate sensitivity. J. Climate, 31, 3921–3942, https://doi.org/10.1175/JCLI-D-17-0631.1.
  • Chen, D., and A. Dai, 2019: Precipitation characteristics in the Community Atmosphere Model and their dependence on model physics and resolution. J. Adv. Model. Earth Syst., 11, 2352–2374, https://doi.org/10.1029/2018MS001536.
  • Chen, D., A. Dai, and A. Hall, 2021: The convective-to-total precipitation ratio and the “drizzling” bias in climate models. J. Geophys. Res. Atmos., 126, e2020JD034198, https://doi.org/10.1029/2020JD034198.
  • Covey, C., P. J. Gleckler, C. Doutriaux, D. N. Williams, A. Dai, J. Fasullo, K. Trenberth, and A. Berg, 2016: Metrics for the diurnal cycle of precipitation: Toward routine benchmarks for climate models. J. Climate, 29, 4461–4471, https://doi.org/10.1175/JCLI-D-15-0664.1.
  • Covey, C., C. Doutriaux, P. J. Gleckler, K. E. Taylor, K. E. Trenberth, and Y. Zhang, 2018: High-frequency intermittency in observed and model-simulated precipitation. Geophys. Res. Lett., 45, 12 514–12 522, https://doi.org/10.1029/2018GL078926.
  • Cristiano, E., M.-C. ten Veldhuis, and N. van de Giesen, 2017: Spatial and temporal variability of rainfall and their effects on hydrological response in urban areas—A review. Hydrol. Earth Syst. Sci., 21, 3859–3878, https://doi.org/10.5194/hess-21-3859-2017.
  • Dai, A., 2006: Precipitation characteristics in eighteen coupled climate models. J. Climate, 19, 4605–4630, https://doi.org/10.1175/JCLI3884.1.
  • Dai, A., X. Lin, and K.-L. Hsu, 2007: The frequency, intensity, and diurnal cycle of precipitation in surface and satellite observations over low- and mid-latitudes. Climate Dyn., 29, 727–744, https://doi.org/10.1007/s00382-007-0260-y.
  • Dunning, C. M., R. P. Allan, and E. Black, 2017: Identification of deficiencies in seasonal rainfall simulated by CMIP5 climate models. Environ. Res. Lett., 12, 114001, https://doi.org/10.1088/1748-9326/aa869e.
  • Eyring, V., S. Bony, G. A. Meehl, C. A. Senior, B. Stevens, R. J. Stouffer, and K. E. Taylor, 2016: Overview of the Coupled Model Intercomparison Project phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev., 9, 1937–1958, https://doi.org/10.5194/gmd-9-1937-2016.
  • Eyring, V., and Coauthors, 2019: Taking climate model evaluation to the next level. Nat. Climate Change, 9, 102–110, https://doi.org/10.1038/s41558-018-0355-y.
  • Fasullo, J. T., 2020: Evaluating simulated climate patterns from the CMIP archives using satellite and reanalysis datasets using the Climate Model Assessment Tool (CMATv1). Geosci. Model Dev., 13, 3627–3642, https://doi.org/10.5194/gmd-13-3627-2020.
  • Fasullo, J. T., A. S. Phillips, and C. Deser, 2020: Evaluation of leading modes of climate variability in the CMIP archives. J. Climate, 33, 5527–5545, https://doi.org/10.1175/JCLI-D-19-1024.1.
  • Fiedler, S., and Coauthors, 2020: Simulated tropical precipitation assessed across three major phases of the Coupled Model Intercomparison Project (CMIP). Mon. Wea. Rev., 148, 3653–3680, https://doi.org/10.1175/MWR-D-19-0404.1.
  • Flato, G., and Coauthors, 2013: Evaluation of climate models. Climate Change 2013: The Physical Science Basis. Cambridge University Press, 741–866.
  • Gates, W. L., and Coauthors, 1999: An overview of the results of the Atmospheric Model Intercomparison Project (AMIP I). Bull. Amer. Meteor. Soc., 80, 29–55, https://doi.org/10.1175/1520-0477(1999)080<0029:AOOTRO>2.0.CO;2.
  • Gehne, M., T. M. Hamill, G. N. Kiladis, and K. E. Trenberth, 2016: Comparison of global precipitation estimates across a range of temporal and spatial scales. J. Climate, 29, 7773–7795, https://doi.org/10.1175/JCLI-D-15-0618.1.
  • Gleckler, P. J., K. E. Taylor, and C. Doutriaux, 2008: Performance metrics for climate models. J. Geophys. Res., 113, D06104, https://doi.org/10.1029/2007JD008972.
  • Gleckler, P. J., C. Doutriaux, P. Durack, K. Taylor, Y. Zhang, D. Williams, E. Mason, and J. Servonnat, 2016: A more powerful reality test for climate models. Eos, 97, https://doi.org/10.1029/2016EO051663.
  • Hosseini-Moghari, S., and Q. Tang, 2022: Can IMERG data capture the scaling of precipitation extremes with temperature at different time scales? Geophys. Res. Lett., 49, e2021GL096392, https://doi.org/10.1029/2021GL096392.
  • Hsu, C.-P. F., and J. M. Wallace, 1976: The global distribution of the annual and semiannual cycles in precipitation. Mon. Wea. Rev., 104, 1093–1101, https://doi.org/10.1175/1520-0493(1976)104<1093:TGDOTA>2.0.CO;2.
  • Hu, Y., Y. Deng, Z. Zhou, H. Li, C. Cui, and X. Dong, 2019: A synoptic assessment of the summer extreme rainfall over the middle reaches of Yangtze River in CMIP5 models. Climate Dyn., 53, 2133–2146, https://doi.org/10.1007/s00382-019-04803-3.
  • Huffman, G. J., and Coauthors, 2007: The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-global, multiyear, combined-sensor precipitation estimates at fine scales. J. Hydrometeor., 8, 38–55, https://doi.org/10.1175/JHM560.1.
  • Huffman, G. J., and Coauthors, 2020: Integrated multi-satellite retrievals for the global precipitation measurement (GPM) mission (IMERG). Satellite Precipitation Measurement, V. Levizzani et al., Eds., Springer, 343–353, https://doi.org/10.1007/978-3-030-24568-9_19.
  • IPCC, 2021: Climate Change 2021: The Physical Science Basis. V. Masson-Delmotte et al., Eds., Cambridge University Press, in press.
  • Johnson, S. J., and Coauthors, 2016: The resolution sensitivity of the South Asian monsoon and Indo-Pacific in a global 0.35° AGCM. Climate Dyn., 46, 807–831, https://doi.org/10.1007/s00382-015-2614-1.
  • Khodadoust Siuki, S., B. Saghafian, and S. Moazami, 2017: Comprehensive evaluation of 3-hourly TRMM and half-hourly GPM-IMERG satellite precipitation products. Int. J. Remote Sens., 38, 558–571, https://doi.org/10.1080/01431161.2016.1268735.
  • Kim, D., and Coauthors, 2009: Application of MJO simulation diagnostics to climate models. J. Climate, 22, 6413–6436, https://doi.org/10.1175/2009JCLI3063.1.
  • Kirkyla, K. I., and S. Hameed, 1989: Harmonic analysis of the seasonal cycle in precipitation over the United States: A comparison between observations and a general circulation model. J. Climate, 2, 1463–1475, https://doi.org/10.1175/1520-0442(1989)002<1463:HAOTSC>2.0.CO;2.
  • Klingaman, N. P., G. M. Martin, and A. Moise, 2017: ASoP (v1.0): A set of methods for analyzing scales of precipitation in general circulation models. Geosci. Model Dev., 10, 57–83, https://doi.org/10.5194/gmd-10-57-2017.
  • Klink, K., J. J. Wiersma, C. J. Crawford, and D. D. Stuthman, 2014: Impacts of temperature and precipitation variability in the northern plains of the United States and Canada on the productivity of spring barley and oat. Int. J. Climatol., 34, 2805–2818, https://doi.org/10.1002/joc.3877.
  • Knutson, T. R., and F. Zeng, 2018: Model assessment of observed precipitation trends over land regions: Detectable human influences and possible low bias in model trends. J. Climate, 31, 4617–4637, https://doi.org/10.1175/JCLI-D-17-0672.1.
  • Koutroulis, A. G., M. G. Grillakis, I. K. Tsanis, and L. Papadimitriou, 2016: Evaluation of precipitation and temperature simulation performance of the CMIP3 and CMIP5 historical experiments. Climate Dyn., 47, 1881–1898, https://doi.org/10.1007/s00382-015-2938-x.
  • Lee, J., Y. Xue, F. De Sales, I. Diallo, L. Marx, M. Ek, K. R. Sperber, and P. J. Gleckler, 2019a: Evaluation of multi-decadal UCLA-CFSv2 simulation and impact of interactive atmospheric–ocean feedback on global and regional variability. Climate Dyn., 52, 3683–3707, https://doi.org/10.1007/s00382-018-4351-8.
  • Lee, J., K. R. Sperber, P. J. Gleckler, C. J. W. Bonfils, and K. E. Taylor, 2019b: Quantifying the agreement between observed and simulated extratropical modes of interannual variability. Climate Dyn., 52, 4057–4089, https://doi.org/10.1007/s00382-018-4355-4.
  • Lee, J., K. R. Sperber, P. J. Gleckler, K. E. Taylor, and C. J. W. Bonfils, 2021: Benchmarking performance changes in the simulation of extratropical modes of variability across CMIP generations. J. Climate, 34, 6945–6969, https://doi.org/10.1175/JCLI-D-20-0832.1.
  • Lee, M.-I., and Coauthors, 2007: An analysis of the warm-season diurnal cycle over the continental United States and northern Mexico in general circulation models. J. Hydrometeor., 8, 344–366, https://doi.org/10.1175/JHM581.1.
  • Lee, Y.-C., and Y.-C. Wang, 2021: Evaluating diurnal rainfall signal performance from CMIP5 to CMIP6. J. Climate, 34, 7607–7623, https://doi.org/10.1175/JCLI-D-20-0812.1.
  • Ma, H.-Y., and Coauthors, 2018: CAUSES: On the role of surface energy budget errors to the warm surface air temperature error over the central United States. J. Geophys. Res. Atmos., 123, 2888–2909, https://doi.org/10.1002/2017JD027194.
  • Ma, H.-Y., S. A. Klein, J. Lee, M. Ahn, C. Tao, and P. J. Gleckler, 2022: Superior daily and sub-daily precipitation statistics for intense and long-lived storms in global storm-resolving models. Geophys. Res. Lett., 49, e2021GL096759, https://doi.org/10.1029/2021GL096759.
  • Martin, G. M., N. P. Klingaman, and A. F. Moise, 2017: Connecting spatial and temporal scales of tropical precipitat