1. Introduction
Earth system models (ESMs) are indispensable tools for studying the climate of the past and present and for providing projections of Earth’s potential future climate. An assessment of how accurately ESMs reproduce present-day climate conditions and their variability is therefore essential. The Coupled Model Intercomparison Project (CMIP) coordinates multimodel climate experiments using state-of-the-art ESMs, conducted by contributing international climate modeling centers. The CMIP experiments are essential for comparing the quality of models from different research institutions and contribute to international climate assessments like the Intergovernmental Panel on Climate Change (IPCC) reports (Flato et al. 2013). The present study analyzes results of two phases of CMIP, phases 3 and 5 (CMIP3 and CMIP5, respectively; Meehl et al. 2007; Taylor et al. 2012).
Model evaluation using quantitative model performance metrics has become a widely used practice in different research communities (Abramowitz et al. 2008; Gleckler et al. 2008; Reichler and Kim 2008; Blyth et al. 2011; Abramowitz 2012; Luo et al. 2012; Gettelman et al. 2012; Anav et al. 2013; Goddard et al. 2013; Brovkin et al. 2013; Eyring et al. 2016). A standardized and regular model evaluation procedure is planned to be implemented in future CMIP phases as well (Meehl et al. 2014).
To evaluate either model processes or the general capabilities of models to represent climate mean states and their variability, the assessment of different geophysical variables is important. In particular, geophysical information derived from satellite observations is used for that purpose, as it provides global data.
Considerable effort has been devoted to improving the atmospheric radiative transfer codes and their parameterizations in CMIP models (e.g., Stevens et al. 2013). Surface solar radiation fluxes are the major energy input for Earth’s surface, and an assessment of their accuracy is therefore both of major importance in itself and an indicator of the quality of the atmospheric radiative transfer codes used in ESMs. Li et al. (2013) provided a first analysis of the accuracy of shortwave and longwave radiation fluxes in CMIP5 simulations, focusing on top-of-the-atmosphere fluxes. Individual assessments of surface water and energy flux components of single ESMs have been reported as well (Hagemann et al. 2013; Brovkin et al. 2013).
Allen et al. (2013) used in situ observations from the Global Energy Balance Archive to evaluate CMIP5 surface solar downwelling radiation fluxes for selected regions. They investigated how CMIP5 models reproduce the regionally observed decrease (dimming) or increase (brightening) of surface solar downwelling radiation fluxes and found that ESMs typically underestimate the multidecadal trends observed at the ground stations.
However, no thorough assessment of surface solar radiation fluxes of CMIP3 and CMIP5 models using state-of-the-art global observational datasets has been conducted so far. The present study therefore provides a complementary assessment of the surface solar radiation fluxes, as it (i) provides a global analysis using multiple observational data records and (ii) analyzes the downwelling as well as upwelling radiation fluxes. Global and long-term satellite products are used as reference. In particular, the following research questions are addressed in the study:
How well do CMIP models simulate the surface radiation fluxes at climatological time scales? The study is focused on providing skill scores for different surface solar radiation fluxes, which provide information on relative model performance compared to chosen reference datasets. The utilized skill scores provide a quantitative measure of how well a model simulates the climatological mean seasonal cycle of chosen geophysical variables.
Do CMIP5 models provide improved surface solar radiation fluxes compared to CMIP3? We compare results from the large ensemble of CMIP5 models with their predecessors of CMIP3 to assess if the accuracy in simulated surface solar radiation fluxes has changed between the two phases of CMIP.
How robust are model rankings, given uncertainties in the observational products? Models can be ranked according to their skill for representing the temporal and spatial patterns of geophysical variables. Such a ranking is, however, also dependent on the uncertainties of the reference data. We therefore analyze how uncertainties in observations affect CMIP model assessments of surface solar radiation fluxes.
How well do CMIP models reproduce the multidecadal trends in surface radiation fluxes? Systematic changes in the surface solar radiation have been reported in the literature, known as global dimming and brightening (Wild 2009). We investigate how these multidecadal changes are represented in CMIP simulations as well as long-term satellite radiation products.
We first briefly introduce the CMIP models and datasets used in section 2. The data (pre)processing steps and evaluation methods are introduced thereafter (section 3). Results of the analysis are provided in section 4.
2. Data
a. CMIP5 model data
Data from CMIP5 simulations were obtained through the Earth System Grid Federation node of the German Climate Computing Centre (http://esgf-data.dkrz.de). The analysis in the present study focuses on two core CMIP5 experiments (AMIP and historical), which are supposed to represent present climate conditions and are therefore expected to be best comparable with satellite observations (Taylor et al. 2012):
Historical (twentieth century) experiments comprise fully coupled (ocean–atmosphere–land) simulations with prescribed greenhouse gas concentrations and are supposed to represent present climate conditions. The period 1979–2005 is analyzed in the present study.
AMIP simulations correspond to an atmosphere–land-only setup. The ocean state (sea surface temperature and sea ice distribution) is prescribed from observational data. The period 1979–2008 is used in the present study.
For each model and experiment, several realizations (ensemble members) are available. The spread among the ensemble members accounts for the internal variability of each model. All CMIP5 models that fulfilled the following criteria were selected: (i) surface downwelling R↓ and upwelling R↑ shortwave radiation fluxes were available; (ii) more than one ensemble member was available; and (iii) simulations for either the AMIP or the historical experiments were provided. Based on these requirements, a total of 54 different models were selected for the present study, which corresponds to 90% of all models providing input to CMIP5. A summary of the used models is provided in Table A1.
b. CMIP3 model data
The predecessor of CMIP5 was CMIP3 (Meehl et al. 2007). CMIP3 data are used in this study in particular to assess if the representation of the surface solar radiation fluxes in CMIP5 has been improved.
For CMIP3, we used all models that provided data for the AMIP experiment and also provided the required surface solar upward and downward radiation fluxes. We focused on AMIP simulations, as these prescribe sea surface temperature and sea ice distributions from observations, which should result in the most realistic surface–atmosphere fluxes. A total of nine models is used. A summary of the selected CMIP3 models is provided in Table A1 as well.
c. Observations
Different observational datasets of surface downwelling R↓ and upwelling R↑ all-sky, broadband (0.25–4 μm) surface solar radiation fluxes are used for the assessment. The surface solar net radiation flux (RN = R↓ − R↑) is used as an additional metric, as it determines the amount of energy available for surface processes. The following observational datasets are used:
The International Satellite Cloud Climatology Project (ISCCP) provides information on surface solar radiation fluxes at spatial scales of approximately 280 × 280 km2 and at 3-hourly temporal resolution. Further details on the estimation of surface fluxes for ISCCP are provided by Zhang et al. (2004). Data from 1989 to 2009 were used for the present analysis.
The NASA Global Energy and Water Cycle Experiment (GEWEX) Surface Radiation Budget (SRB) project aims at the production of long-term datasets of shortwave and longwave surface and top-of-atmosphere radiation fluxes. It provides 3-hourly to monthly global products with a spatial resolution of 1° × 1°. The fluxes are calculated using cloud parameters from ISCCP and meteorological fields from the NASA GMAO reanalysis dataset. Monthly means of the shortwave surface radiation flux products from SRB, version 3.0 (SRB3.0; Stackhouse et al. 2011; Zhang et al. 2013), are used in the present study.
Clouds and the Earth’s Radiant Energy System (CERES), version 2.7 (Ed2.7), surface solar radiation fluxes are derived from measurements onboard the EOS Terra and Aqua satellites (Loeb et al. 2012). The CERES surface fluxes are obtained from the CERES Energy Balanced and Filled (EBAF) surface radiation product (Kato et al. 2013).
The Satellite Application Facility on Climate Monitoring (CM SAF) Cloud, Albedo and Radiation dataset from AVHRR data (CLARA-A1) provides a 28-yr (1982–2009) record of cloud parameters and radiation fluxes. Details on the product and algorithms used are provided by Karlsson et al. (2013).
Major differences between the observations are due to different retrieval algorithms and the origin of the satellite input data (i.e., the type of sensor from which the data products are derived). The GEWEX radiation flux assessment provides a comprehensive review and intercomparison of these flux products (Raschke et al. 2012). The dataset details and references are summarized in Table 1.
Datasets used for surface radiation assessment.
d. Analysis regions
The comparison of models and observations is done at either global or regional scales. For the latter, we used the 26 regions predefined in the IPCC Special Report on Extreme Events (SREX). A map of the defined regions is shown in Fig. B1, and the region labels used subsequently in the manuscript are given in appendix B.
3. Methods
a. Data preprocessing
The evaluation of the CMIP models with a set of observations requires careful data preprocessing and a coherent framework for intercomparison. The following processing steps are consistently applied to all model simulations and observations:
Ensemble mean calculation: CMIP model simulations are provided as an ensemble of realizations for each experiment. The ensemble mean is used for further data analysis.
Monthly means: All model simulations and observations are aggregated to monthly means.
Spatial remapping: The observations and model data are remapped to a common grid (horizontal T63 spectral truncation, ~1.8° × 1.8°) using an energy conservative approach with weights proportional to the grid cell sizes.
Consistent data mask: The observational datasets typically contain gaps in space and time. These gaps were therefore also applied to the model fields to avoid any sampling-related biases in the comparisons between models and observations.
Climatological means: Climatological means are calculated for each grid point from the entire time period of each dataset and are used for further analysis. The effect of different lengths of the individual data records was tested with a sensitivity analysis, which showed only a minor impact on the biases between models and observations.
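The masking and climatology steps above can be sketched as follows. This is a minimal numpy illustration (function and variable names are ours; the actual processing in pyCMBS additionally handles remapping and area weighting):

```python
import numpy as np

def climatology_with_common_mask(model, obs):
    """Apply the observational gap mask to the model field and compute
    climatological monthly means for both (NaN marks missing data).

    model, obs: arrays of shape (n_months, nlat, nlon), where n_months
    is a multiple of 12 and the series starts in January.
    """
    # consistent data mask: sample the model only where obs exist
    model = np.where(np.isnan(obs), np.nan, model)
    nyears = model.shape[0] // 12
    shape = (nyears, 12) + model.shape[1:]
    # climatological mean seasonal cycle, ignoring masked samples
    clim_model = np.nanmean(model.reshape(shape), axis=0)
    clim_obs = np.nanmean(obs.reshape(shape), axis=0)
    return clim_model, clim_obs
```

Applying the observational mask before averaging ensures that model and observation climatologies are built from identical space–time samples, avoiding the sampling biases mentioned above.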
All data analysis and model–data intercomparisons are performed using an open-source and flexible data analysis framework for the analysis of geospatial data and the benchmarking of models (Loew 2015a).
b. Evaluation skill scores
In the following, we define the skill score metrics used to assess the CMIP models. We focus on global skill scores, which provide a first insight into general model performance. Details on spatial and temporal model biases are not provided within this paper, owing to the large number of models analyzed, but can be found in separate reports (see appendix E for details).
1) Global skill scores
The agreement between the simulated and observed climatological mean seasonal cycle of a variable is quantified by the area-weighted root-mean-square difference (RMSD)

E = [Σi wi (mi − oi)² / Σi wi]^(1/2), (1)

where mi and oi denote the model and observation climatologies at grid point and calendar month i, and wi is the grid cell area weight. Following Gleckler et al. (2008), a normalized RMSD is obtained by relating E to the median RMSD Emed of all models for a given variable and reference dataset,

E′ = (E − Emed)/Emed, (2)

such that negative (positive) values of E′ indicate a performance better (worse) than the median model. An overall skill score for each model and variable is then obtained by averaging E′ over all K observational reference datasets,

I2 = (1/K) Σk E′k, (3)

with lower values of I2 indicating better overall agreement with the observations.
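The global skill scores (the normalized RMSD E′ and the overall metric I2) can be sketched as follows, assuming the relative-error normalization of Gleckler et al. (2008); function and variable names here are illustrative:

```python
import numpy as np

def skill_scores(E):
    """Relative errors E' and overall score I2 from RMSD values.

    E: array of shape (n_models, n_obs), the RMSD of each model
    against each observational reference for one variable.
    """
    E = np.asarray(E, dtype=float)
    E_med = np.median(E, axis=0)      # median over models, per reference
    E_rel = (E - E_med) / E_med       # E': negative = better than median model
    I2 = E_rel.mean(axis=1)           # average over observational datasets
    return E_rel, I2
```

Because E′ is centered on the median model, a negative I2 (as found for the CMIP5 AMIP ensemble in section 4a) indicates performance better than the median of the multimodel ensemble.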
2) Consistency of model ranking
Models can be ranked according to the skill score defined by Eq. (2) for each observation. If all observational datasets provided the same coherent picture of a geophysical field, the resulting model ranking would be independent of the observational reference. As the reference datasets differ among themselves, however, the relative model ranking also depends on the choice of the reference. The Spearman rank correlation coefficient rs is used as a measure of the similarity of model rankings obtained using different observational datasets. For a pair of observational references a and b, the difference in ranks of model i is

di = ri,a − ri,b, (4)

and the rank correlation of the n models is

rs = 1 − 6 Σi di² / [n(n² − 1)]. (5)

Identical rankings yield rs = 1, while completely reversed rankings yield rs = −1. In addition, the coefficient of variation

cυ = σE / Ē, (6)

where σE and Ē denote the standard deviation and mean of the RMSD E of a model across the different observational datasets, quantifies how strongly the error estimate, and thus E′, varies with the choice of the reference.
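The Spearman rank consistency, together with a simple std/mean coefficient of variation of a model's errors across references (the cυ used in section 4b), can be sketched as follows (a minimal implementation assuming no tied scores; scipy.stats.spearmanr provides an equivalent rank correlation with tie handling):

```python
import numpy as np

def spearman_rank_corr(e_a, e_b):
    """Spearman rank correlation r_s between the model rankings implied
    by two observational references (e_a, e_b: per-model skill values,
    lower = better; ties are not handled)."""
    rank_a = np.argsort(np.argsort(e_a))
    rank_b = np.argsort(np.argsort(e_b))
    d = (rank_a - rank_b).astype(float)   # rank differences per model
    n = len(e_a)
    return 1.0 - 6.0 * np.sum(d**2) / (n * (n**2 - 1))

def coeff_of_variation(e_row):
    """Spread of one model's error across references: std / mean."""
    e_row = np.asarray(e_row, dtype=float)
    return e_row.std() / e_row.mean()
```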
3) Multidecadal variability
Multidecadal variability is assessed from linear least squares trends of regional mean flux time series over the full record of each dataset. For each region, the regression slope m (W m−2 a−1) and the Pearson product-moment correlation coefficient r between flux and time are calculated; trends are regarded as significant for p < 0.05.
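The regional trend metrics shown in section 4c (slope m and correlation r) can be sketched as follows (an illustrative helper; the paper's exact anomaly and deseasonalization procedure is not restated here):

```python
import numpy as np

def regional_trend(years, flux):
    """Least squares trend m (W m-2 a-1) and Pearson correlation r of an
    annual-mean, regional-mean flux time series."""
    years = np.asarray(years, dtype=float)
    flux = np.asarray(flux, dtype=float)
    m, _intercept = np.polyfit(years, flux, 1)   # linear fit: slope, intercept
    r = np.corrcoef(years, flux)[0, 1]           # correlation of flux with time
    return m, r
```

Multiplying m by 10 converts the trend to W m−2 decade−1, the unit used for the comparison with Allen et al. (2013) in section 4c.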
4. Results and discussion
a. Overview: CMIP3 versus CMIP5
Results of the multimodel comparison are analyzed in the following. Figure 1 shows the distribution of the overall skill score metric I2 [see Eq. (3)] for the different surface solar radiation fluxes and experiments. The boxes correspond to the interquartile range (IQR; 25%–75%) of the I2 values from all models, whereas the whiskers extend to the most extreme data values within 1.5 times the IQR.
Multivariate model skill score I2 for (top) R↓, (middle) R↑, and (bottom) RN radiation fluxes for CMIP5 and CMIP3 models and AMIP and historical experiments. Boxes correspond to the interquartile range (25%–75%), while whiskers correspond to extreme data values within 1.5 times the IQR.
Citation: Journal of Climate 29, 20; 10.1175/JCLI-D-14-00503.1
In general, CMIP5 models have lower I2 values than CMIP3 simulations, which indicates an overall better performance of the new model generation. Best results (lowest I2) are obtained for CMIP5 AMIP experiments with median values for I2 below zero.
The distributions of I2 for the downwelling and net solar radiation fluxes are symmetric, while the results for the upwelling solar radiation flux are positively skewed, which indicates a larger spread and uncertainty of the upwelling flux estimates compared to the observations.
The different CMIP models are ranked according to their normalized RMSD E′ for each observation. This ranking is summarized in Tables C1 and C2 for all variables and experiments. In general, the multimodel mean outperforms the individual models across the different observations and variables for both the AMIP and historical simulations. This indicates that the ensemble of different models is generally in better agreement with the observations than individual models, which is consistent with previous findings (Flato et al. 2013).
Details of the spatial and temporal deviations of each model compared to the observational datasets are provided in additional reports (see appendix E for details). The global mean absolute error is, on average, 7.5, 5.9, and 6.5 W m−2 for R↓, R↑, and RN, respectively, and ranges between 4 and 10 W m−2.
b. Consistency of model ranking
It is expected that the spread in E′ caused by the different observations also affects the relative model ranking. While I2 provides a general metric for the overall performance of a model across different observational datasets, cυ [Eq. (6)] provides an initial measure of the variability of E′. The distributions of cυ are shown in Fig. 2 for the different experiments and CMIP phases. The variability of E′ is typically on the order of 50% for all fluxes and experiments. The largest uncertainties in E′ are observed for R↑, where the distribution of cυ is strongly skewed and the uncertainty of the upward solar radiation flux can be more than twice I2. This variability also affects the relative model ranking.
As in Fig. 1, but for distribution of cυ [Eq. (6)].
The Spearman rank correlation coefficient rs [Eq. (5)] is a measure of the consistency of the model ranks obtained using different observational datasets. Table 2 summarizes the rank correlations between all pairs of observations. For none of the combinations is the same relative model ranking obtained (rs = 1). Nevertheless, good agreement is found for many of them, for both the AMIP and historical experiments. The best agreement for the surface downwelling solar radiation fluxes is observed between the CERES2.7 dataset and the ISCCP (SRB) dataset, with rs = 0.82 (0.92). The CLARA-A1 radiation dataset shows, in general, smaller agreement with the other radiation datasets with respect to the model ranking. Uncertainties in model ranking are, in general, larger for R↑, with 0.59 ≤ rs ≤ 0.85.
Spearman rank correlation coefficients rs of model ranks using different observational datasets for AMIP (historical in parentheses) simulations.
c. Multidecadal variability
The multidecadal trend maps for R↓ are provided in Fig. 3, and similar maps for R↑ are given in Fig. D1. The correlation coefficients r and temporal regression slopes m are provided in Figs. 4 and 5 for each region. The boxes indicate the IQR, and the trend estimates from the observations are provided as markers. Results are discussed for each variable in the following.
Multidecadal regional trends of R↓ (W m−2 a−1) for different observational datasets: (left) ISCCP, (center) SRB3.0, and (right) CLARA-A1. Trends are significant (p < 0.05) for |r| > 0.13.
Multidecadal regional trends of R↓: (top) The Pearson product moment correlation coefficient and (bottom) multidecadal trend (W m−2 a−1) for IPCC regions are shown. Shaded areas correspond to correlations that are not significant (p > 0.05). Box plots represent model results from AMIP experiment, where the box corresponds to the IQR. The following datasets are represented: ISCCP (red dots), SRB3.0 (blue diamonds), and CLARA-A1 (green rectangle).
As in Fig. 4, but for multidecadal regional trends of R↑.
1) Surface solar downward radiation flux
Significant temporal trends (p < 0.05) are obtained only for |r| > 0.13, which is the case for a limited number of regions. The trends of the CMIP models are typically smaller than those obtained from the satellite datasets. This is consistent with results from Allen et al. (2013), who also identified an underestimation of dimming and brightening trends in CMIP models. While the trends from CMIP are typically not statistically significant, the satellite observations show significant positive or negative trends for various regions (see appendix B for region acronyms).
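The significance threshold can be recovered from the sample size: for a two-sided t test of a correlation, the critical value is r_crit = t / sqrt(t² + n − 2). A sketch using the large-sample approximation t ≈ 1.96 (the exact sample sizes behind the quoted |r| > 0.13 are not restated here):

```python
import math

def critical_r(n, t_crit=1.96):
    """Approximate two-sided p = 0.05 critical correlation for a sample
    of n points (large-sample t approximation)."""
    return t_crit / math.sqrt(t_crit**2 + (n - 2))

# e.g., ~21 yr of monthly data (n = 252, as for ISCCP 1989-2009)
# gives r_crit of roughly 0.12-0.13
```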
An increase (brightening) is observed in all datasets over Europe and the Mediterranean (NEU, CEU, and MED), northern Africa (SAH), the Arabian Peninsula (WAS), and central and southern Asia (CAS, TIB, and SAS), as well as parts of Central and North America (CAM and ENA). A reduction of solar radiation (dimming) is observed in all datasets for Siberia (NAS), Southeast Asia (SEA), and western and southern Africa (WAF and SAF). However, the temporal trends of the satellite datasets have, in many cases, different magnitudes and even opposite signs and are often not statistically significant (|r| < 0.13). These trends need to be interpreted very carefully because of the risk of spurious trends caused by, for example, changes in the observing system (cf. section 3).
Allen et al. (2013) provide an independent study of long-term trends (1987–2009) in surface solar irradiance for Europe, China, Japan, and India. We therefore compare the satellite trend estimates with their results for China, Japan, and Europe for the same time period (Table 3). A comparison for India is not possible, as Allen et al. (2013) only provide information on dimming trends prior to 1989. For China and Japan, the increase in surface solar radiation from the ground measurements is 2.6 ± 2.2 and 4.4 ± 2.0 W m−2 decade−1, respectively. While the ISCCP and SRB3.0 datasets even show a decrease in R↓ for the same time period, the CLARA-A1 dataset shows an increase of 2.3 W m−2 decade−1, which is much closer to the reported in situ observations. The European region is divided into three subregions (NEU, CEU, and MED), for which Allen et al. (2013) reported a positive trend of 3.5 ± 1.95 W m−2 decade−1. The trends derived from the satellite observations are, in general, much lower, except for the CLARA-A1 dataset, which shows positive trends for central and southern Europe (CEU and MED) that are comparable to the findings of Allen et al. (2013).
Observed brightening trends (W m−2 decade−1) reported by Allen et al. (2013) and estimated from satellite observations for different regions. Results are provided for IPCC regions defined in Fig. B1.
While the CMIP models in general underestimate the long-term trend, they at least show positive trends for the aforementioned European and Mediterranean regions (NEU, CEU, and MED). This initial comparison is still very limited, as the reliability of the trend estimates from satellite datasets needs further investigation and cross-comparison with in situ data. The trends for CMIP historical experiments are comparable to those obtained for the AMIP experiments and are provided in Fig. 5.
2) Surface upwelling solar radiation flux
As the upwelling solar radiation flux is the product of the downwelling solar radiation flux and the surface albedo α (R↑ = αR↓), it depends on the temporal variations of both R↓ and the albedo. The surface albedo is simulated in different ways in ESMs: while it was common practice in CMIP3 to prescribe the surface conditions using climatological mean values, most land surface schemes in current ESMs use an interactive albedo scheme, which simulates the surface albedo as a function of the model state. A comparison of current albedo schemes used in ESMs and their accuracy was provided by Loew et al. (2014).
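Since R↑ = αR↓, a first-order product-rule decomposition separates an albedo-driven from an irradiance-driven contribution to the R↑ trend: d(αR↓)/dt ≈ R̄↓ dα/dt + ᾱ dR↓/dt. A sketch of this decomposition (illustrative helper, not a procedure from the paper):

```python
import numpy as np

def upwelling_trend_terms(years, albedo, r_down):
    """First-order split of the R_up = albedo * R_down trend into an
    albedo-driven and an irradiance-driven term (W m-2 a-1)."""
    years = np.asarray(years, dtype=float)
    da_dt = np.polyfit(years, albedo, 1)[0]      # albedo trend
    dr_dt = np.polyfit(years, r_down, 1)[0]      # R_down trend
    albedo_term = np.mean(r_down) * da_dt        # driven by surface changes
    irradiance_term = np.mean(albedo) * dr_dt    # driven by R_down changes
    return albedo_term, irradiance_term
```

Such a split helps attribute a simulated R↑ trend to the albedo scheme (interactive vs. prescribed) as opposed to changes in the downwelling flux.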
ISCCP and SRB3.0 show, in general, a decrease in R↑ over North and South America, with a maximum decrease of more than 5 W m−2 decade−1 over the Amazon (Fig. 5, bottom), while an increase is mainly observed in eastern Asia. Large differences between the simulated decadal changes of R↑ and the satellite records are observed: while the models show no significant trend for most of the IPCC regions, the ISCCP and SRB3.0 datasets show significant changes for many areas, often several times larger than those of the models. The spread of trend values among the models is small, and the magnitude of their trends is minor.
5. Conclusions
This study provides an initial assessment of the surface solar radiation fluxes in the CMIP3 and CMIP5 ensembles. The accuracy of the CMIP model simulations in describing the mean state and variability of the investigated variables (R↑, R↓, and RN) is evaluated, and the consistency of using different observational records for the assessment has been investigated. The main findings are as follows:
The multimodel mean outperforms individual models. Across variables and observations, the multimodel mean, in general, outperforms the individual models, which is consistent with previous findings (e.g., Flato et al. 2013). The accuracy of different models shows a large spread on the order of ±20%.
CMIP5 models have improved compared to CMIP3 models. For all investigated surface radiation fluxes, the CMIP5 models show better skills in simulating the spatiotemporal fields of surface solar radiation fluxes.
AMIP experiments are better than historical experiments. The CMIP5 AMIP experiments show better agreement with the observations than the historical experiments. The latter have nevertheless smaller errors than the CMIP3 AMIP simulations.
The choice of observational reference matters. The relative model ranking depends on the choice of the observational reference dataset. It is therefore recommended that multiple observational datasets always be used when evaluating model simulations, which allows one to account for the variability between the observational records.
CMIP models underestimate multidecadal trends in surface radiation fluxes. The CMIP model ensemble underestimates observed multidecadal trends in surface solar radiation fluxes. Significant changes observed in in situ and satellite observations are not reproduced by the CMIP models.
Satellite trend estimates are uncertain. The satellite data records differ in the magnitude, and in some regions even in the sign, of regional trends in surface solar radiation fluxes. Trend estimates from the individual datasets should therefore be interpreted very carefully (cf. section 3). An initial comparison for Europe and China with published trend estimates from Allen et al. (2013) indicated that the CLARA-A1 surface radiation dataset shows multidecadal trends similar to the ground observations.
The main purpose of this study is to provide a general overview of the accuracy of CMIP models in representing climate mean states of surface solar radiation fluxes and their long-term variability. Providing more detailed information on the spatial and temporal biases of individual models is beyond the scope of this paper; such information is available in separate reports (see appendix E for details).
A more detailed analysis of the long-term trends using in situ, satellite, and model data for different regions of the globe should be the subject of further investigation. The current comparison was limited to Europe and China, while previous studies focused only on the comparison of selected satellite and in situ data (Hinkelman et al. 2009; Riihelä et al. 2015; Müller et al. 2015) or of in situ data and models (Allen et al. 2013). As only a single time period (1989–2007) was used in this study, for consistency with Allen et al. (2013), it is recommended that future studies analyze trends over subsequent time periods, as dimming and brightening trends may compensate each other (Hinkelman et al. 2009).
This study has also shown that the choice of the observational reference has an impact on the relative model ranking. It is recommended that, whenever available, different observational datasets are used for model evaluation purposes to quantify this additional variability. In addition, robust quantitative information on observational uncertainties and guidance for users to choose observations for particular applications is needed. Both issues are subjects of ongoing research (e.g., Hollmann et al. 2013; Gregow et al. 2015) and are of high importance to minimize the impact of observation uncertainties in future ESM evaluation exercises.
Acknowledgments
Parts of this research have been supported by Deutscher Wetterdienst (DWD), which is gratefully acknowledged. The main author was further supported through the CliSAP (EXC177) Cluster of Excellence, University of Hamburg, funded through the German Science Foundation (DFG) and the ESA Climate Change Initiative Climate Modelling User Group (Contract 4000100222/10/I-AM). The CM SAF data have been kindly provided by the EUMETSAT Satellite Application Facility on Climate Monitoring. SRB data were obtained from the NASA Langley Research Center Atmospheric Sciences Data Center NASA GEWEX SRB project. We acknowledge the modeling groups, the DOE Program for Climate Model Diagnosis and Intercomparison (PCMDI), and the WCRP’s Working Group on Coupled Modelling (WGCM) for their roles in making available the WCRP CMIP3 and CMIP5 multimodel datasets. Support of this dataset is provided by the U.S. Department of Energy Office of Science.
APPENDIX A
Models
CMIP models and experiments used in this study are in Table A1. Expansions of acronyms are available at http://www.ametsoc.org/PubsAcronymList.
CMIP3 and CMIP5 models for AMIP and historical experiments.
APPENDIX B
Regions
Figure B1 shows the regions from the IPCC SREX (IPCC 2012). The defined regions are Alaska/northwestern Canada (ALA); eastern Canada, Greenland, and Iceland (CGI); western North America (WNA); central North America (CNA); eastern North America (ENA); Central America and Mexico (CAM); Amazon (AMZ); Northeast Brazil (NEB); west coast South America (WSA); southeastern South America (SSA); northern Europe (NEU); central Europe (CEU); southern Europe and Mediterranean (MED); Sahara (SAH); western Africa (WAF); eastern Africa (EAF); southern Africa (SAF); northern Asia (NAS); western Asia (WAS); central Asia (CAS); Tibetan Plateau (TIB); eastern Asia (EAS); southern Asia (SAS); southeastern Asia (SEA); northern Australia (NAU); and southern Australia/New Zealand (SAU).
Definition of regions used for the analysis. The regions are based on those of IPCC (2012), in which the latitude–longitude coordinates of individual regions are also defined.
APPENDIX C
Model Ranking
Tables C1 and C2 summarize the ranking of the CMIP models according to their normalized RMSD E′ for all variables and experiments (cf. section 4a).
APPENDIX D
Multidecadal Variability
The spatial distribution of the observed multidecadal trends for R↑ is shown in Fig. D1. Regional distributions of trends for R↑ (R↓) similar to Fig. 4 (Fig. 5) are given in Fig. D2 (Fig. D3).
Multidecadal regional trends of R↑ (W m−2 a−1) for (left) ISCCP and (right) SRB3.0 observations.
As in Fig. 4, but for historical experiments.
As in Fig. 5, but for historical experiments.
APPENDIX E
Further Information
This paper summarizes the assessment of the CMIP3 and CMIP5 surface solar radiation fluxes but does not provide detailed information on individual model deviations or on the temporal and spatial deviations between individual models and observations.
As the software used for the evaluation of the different models is publicly available [Python Climate Model Benchmarking Suite (pyCMBS); Loew 2015a], as are all datasets, the interested reader can reproduce all results. The software generates comprehensive reports providing detailed information on individual model deviations. The reports corresponding to the processing for the present paper have been published separately and are accessible in Loew (2015b).
REFERENCES
Abramowitz, G., 2012: Towards a public, standardized, diagnostic benchmarking system for land surface models. Geosci. Model Dev., 5, 819–827, doi:10.5194/gmd-5-819-2012.
Abramowitz, G., R. Leuning, M. Clark, and A. Pitman, 2008: Evaluating the performance of land surface models. J. Climate, 21, 5468–5481, doi:10.1175/2008JCLI2378.1.
Allen, R. J., J. R. Norris, and M. Wild, 2013: Evaluation of multidecadal variability in CMIP5 surface solar radiation and inferred underestimation of aerosol direct effects over Europe, China, Japan, and India. J. Geophys. Res. Atmos., 118, 6311–6336, doi:10.1002/jgrd.50426.
Anav, A., G. Murray-Tortarolo, P. Friedlingstein, S. Sitch, S. Piao, and Z. Zhu, 2013: Evaluation of land surface models in reproducing satellite derived leaf area index over the high-latitude Northern Hemisphere. Part II: Earth system models. Remote Sens., 5, 3637–3661, doi:10.3390/rs5083637.
Blyth, E., D. B. Clark, R. Ellis, C. Huntingford, S. Los, M. Pryor, M. Best, and S. Sitch, 2011: A comprehensive set of benchmark tests for a land surface model of simultaneous fluxes of water and carbon at both the global and seasonal scale. Geosci. Model Dev., 4, 255–269, doi:10.5194/gmd-4-255-2011.
Brovkin, V., L. Boysen, T. Raddatz, V. Gayler, A. Loew, and M. Claussen, 2013: Evaluation of vegetation cover and land-surface albedo in MPI-ESM CMIP5 simulations. J. Adv. Model. Earth Syst., 5, 48–57, doi:10.1029/2012MS000169.
Eyring, V., and Coauthors, 2016: ESMValTool (v1.0)—A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP. Geosci. Model Dev., 9, 1747–1802, doi:10.5194/gmd-9-1747-2016.
Flato, G., and Coauthors, 2013: Evaluation of climate models. Climate Change 2013: The Physical Science Basis, T. F. Stocker et al., Eds., Cambridge University Press, 741–866. [Available online at https://www.ipcc.ch/pdf/assessment-report/ar5/wg1/WG1AR5_Chapter09_FINAL.pdf.]
Gettelman, A., and Coauthors, 2012: A community diagnostic tool for chemistry climate model validation. Geosci. Model Dev., 5, 1061–1073, doi:10.5194/gmd-5-1061-2012.
Gleckler, P. J., K. E. Taylor, and C. Doutriaux, 2008: Performance metrics for climate models. J. Geophys. Res., 113, D06104, doi:10.1029/2007JD008972.
Goddard, L., and Coauthors, 2013: A verification framework for interannual-to-decadal predictions experiments. Climate Dyn., 40, 245–272, doi:10.1007/s00382-012-1481-2.
Gregow, H., and Coauthors, 2015: User awareness concerning feedback data and input observations used in reanalysis systems. Adv. Sci. Res., 12, 63–67, doi:10.5194/asr-12-63-2015.
Hagemann, S., A. Loew, and A. Andersson, 2013: Combined evaluation of MPI-ESM land surface water and energy fluxes. J. Adv. Model. Earth Syst., 5, 259–286, doi:10.1029/2012MS20008.
Hinkelman, L. M., P. W. Stackhouse, B. A. Wielicki, T. Zhang, and S. R. Wilson, 2009: Surface insolation trends from satellite and ground measurements: Comparisons and challenges. J. Geophys. Res., 114, D00D20, doi:10.1029/2008JD011004.
Hollmann, R., and Coauthors, 2013: The ESA Climate Change Initiative: Satellite data records for essential climate variables. Bull. Amer. Meteor. Soc., 94, 1541–1552, doi:10.1175/BAMS-D-11-00254.1.
IPCC, 2012: Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation, C. B. Field et al., Eds., Cambridge University Press, 582 pp.
Karlsson, K.-G., and Coauthors, 2013: CLARA-A1: A cloud, albedo, and radiation dataset from 28 yr of global AVHRR data. Atmos. Chem. Phys., 13, 5351–5367, doi:10.5194/acp-13-5351-2013.
Kato, S., N. G. Loeb, F. G. Rose, D. R. Doelling, D. A. Rutan, T. E. Caldwell, L. Yu, and R. A. Weller, 2013: Surface irradiances consistent with CERES-derived top-of-atmosphere shortwave and longwave irradiances. J. Climate, 26, 2719–2740, doi:10.1175/JCLI-D-12-00436.1.
Li, J.-L. F., D. E. Waliser, G. Stephens, S. Lee, T. L’Ecuyer, S. Kato, N. Loeb, and H.-Y. Ma, 2013: Characterizing and understanding radiation budget biases in CMIP3/CMIP5 GCMs, contemporary GCM, and reanalysis. J. Geophys. Res. Atmos., 118, 8166–8184, doi:10.1002/jgrd.50378.
Loeb, N. G., B. A. Wielicki, D. R. Doelling, G. L. Smith, D. F. Keyes, S. Kato, N. Manalo-Smith, and T. Wong, 2009: Toward optimal closure of the earth’s top-of-atmosphere radiation budget. J. Climate, 22, 748–766, doi:10.1175/2008JCLI2637.1.
Loeb, N. G., S. Kato, W. Su, T. Wong, F. G. Rose, D. R. Doelling, J. R. Norris, and X. Huang, 2012: Advances in understanding top-of-atmosphere radiation variability from satellite observations. Surv. Geophys., 33, 359–385, doi:10.1007/s10712-012-9175-1.
Loew, A., 2013: Terrestrial satellite climate data records: How long is long enough? A test case for the Sahel. Theor. Appl. Climatol., 115, 427–440, doi:10.1007/s00704-013-0880-6.
Loew, A., 2015a: pycmbs, version 1.1.0. Zenodo, accessed 11 May 2015, doi:10.5281/zenodo.17486.
Loew, A., 2015b: Comprehensive CMIP5 model evaluation reports: Auxiliary material. Open Data LMU, doi:10.5282/ubm/data.70.
Loew, A., P. M. van Bodegom, J.-L. Widlowski, J. Otto, T. Quaife, B. Pinty, and T. Raddatz, 2014: Do we (need to) care about canopy radiation schemes in DGVMs? Caveats and potential impacts. Biogeosciences, 11, 1873–1897, doi:10.5194/bg-11-1873-2014.
Luo, Y. Q., and Coauthors, 2012: A framework for benchmarking land models. Biogeosciences, 9, 3857–3874, doi:10.5194/bg-9-3857-2012.
Meehl, G. A., C. Covey, K. E. Taylor, T. Delworth, R. J. Stouffer, M. Latif, B. McAvaney, and J. F. B. Mitchell, 2007: The WCRP CMIP3 multimodel dataset: A new era in climate change research. Bull. Amer. Meteor. Soc., 88, 1383–1394, doi:10.1175/BAMS-88-9-1383.
Meehl, G. A., R. Moss, and K. Taylor, 2014: Climate model intercomparisons: Preparing for the next phase. Eos, Trans. Amer. Geophys. Union, 95, 77–78, doi:10.1002/2014EO090001.
Müller, R., U. Pfeifroth, C. Träger-Chatterjee, J. Trentmann, and R. Cremer, 2015: Digging the METEOSAT treasure—3 decades of solar surface radiation. Remote Sens., 7, 8067–8101, doi:10.3390/rs70608067.
Raschke, E., S. Kinne, and P. W. Stackhouse, 2012: GEWEX Radiative Flux Assessment (RFA) volume 1: Assessment. WCRP Tech. Rep. 19/2012, 273 pp. [Available online at http://www.wcrp-climate.org/documents/GEWEX%20RFA-Volume%201-report.pdf.]
Reichler, T., and J. Kim, 2008: How well do coupled models simulate today’s climate? Bull. Amer. Meteor. Soc., 89, 303–311, doi:10.1175/BAMS-89-3-303.
Riihelä, A., T. Carlund, J. Trentmann, R. Müller, and A. Lindfors, 2015: Validation of CM SAF surface solar radiation datasets over Finland and Sweden. Remote Sens., 7, 6663–6682, doi:10.3390/rs70606663.
Stackhouse, P. W., S. K. Gupta, S. J. Cox, T. Zhang, J. C. Mikovitz, and L. M. Hinkelman, 2011: 24.5-year data set released. GEWEX News, Vol. 21, No. 1, International GEWEX Project Office, Silver Spring, MD, 10–12.
Stevens, B., and Coauthors, 2013: Atmospheric component of the MPI-M Earth System Model: ECHAM6. J. Adv. Model. Earth Syst., 5, 146–172, doi:10.1002/jame.20015.
Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498, doi:10.1175/BAMS-D-11-00094.1.
Wild, M., 2009: Global dimming and brightening: A review. J. Geophys. Res., 114, D00D16, doi:10.1029/2008JD011470.
Zhang, T., P. W. Stackhouse, S. K. Gupta, S. J. Cox, J. C. Mikovitz, and L. M. Hinkelman, 2013: The validation of the GEWEX SRB surface shortwave flux data products using BSRN measurements: A systematic quality control, production and application approach. J. Quant. Spectrosc. Radiat. Transfer, 122, 127–140, doi:10.1016/j.jqsrt.2012.10.004.
Zhang, Y., W. B. Rossow, A. A. Lacis, V. Oinas, and M. I. Mishchenko, 2004: Calculation of radiative fluxes from the surface to top of atmosphere based on ISCCP and other global data sets: Refinements of the radiative transfer model and the input data. J. Geophys. Res., 109, D19105, doi:10.1029/2003JD004457.