Abstract

The U.S. Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) program User Facility produces unique, long-term, continuous ground-based measurements of atmospheric state, precipitation, turbulent fluxes, radiation, aerosol, cloud, and the land surface at multiple sites. These comprehensive datasets have been widely used to calibrate climate models and have proven invaluable for climate model development and improvement. This article introduces an evaluation package to facilitate the use of ground-based ARM measurements in climate model evaluation. The ARM data-oriented metrics and diagnostics package (ARM-DIAGS) includes both ARM observational datasets and a Python-based analysis toolkit for computation and visualization. The observational datasets are compiled from multiple ARM data products and specifically tailored for use in climate model evaluation. ARM-DIAGS also includes simulation data from models participating in the Coupled Model Intercomparison Project (CMIP), which allows climate-modeling groups to compare a new, candidate version of their model to existing CMIP models. The analysis toolkit is designed to make the metrics and diagnostics quickly available to model developers.

A set of standard metrics and diagnostics provides an effective way for climate modeling centers to routinely assess their model performance and judge the improvement of model simulations from new parameterizations. In the past, climate model developers have often relied on satellite remote sensing products to calibrate and tune their models. Satellite datasets provide great global coverage; however, it is difficult to apply satellite data in some process studies due to their poor temporal resolution. Therefore, utilizing detailed high-frequency ground-based measurements for a comprehensive collection of quantities can be a complementary test in model evaluation.

Over the past three decades, the U.S. Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) program has established several permanent research sites and deployed a number of ARM Mobile Facilities (AMF) in diverse climate regimes around the world to collect long-term continuous field measurements of clouds, aerosols, and radiation and their associated large-scale environments. These detailed field observations have provided a unique observational basis specifically for understanding cloud and precipitation related processes and evaluating and improving their representations in climate models. However, ARM data have not been extensively utilized in current model development workflows. With the growing interest in the climate modeling community in developing process-oriented metrics and diagnostics to aid parameterization development (Maloney et al. 2019), the high-frequency process-oriented ARM observations should play a more important role in future metrics and diagnostics development.

In this article, we introduce the recently developed ARM data-oriented metrics and diagnostics package (ARM-DIAGS) for the global climate community to facilitate the use of ARM field data in climate model evaluation. The focus is on unique ARM observations of clouds and aerosols, as well as process-oriented diagnostics aimed at improving the representation of cloud and precipitation related processes in climate models, such as those included in the Coupled Model Intercomparison Project (CMIP). The package is publicly available in the hope that it can serve as an easy entry point for climate modelers to compare their models with ARM data and the supplementary CMIP datasets.

Overview of the ARM data-oriented metrics and diagnostics package

The ARM-DIAGS development closely follows the CMIP protocol so that the ARM metrics and diagnostics package can be distributed efficiently, along with other metrics packages, to the CMIP community and other climate modeling centers. For this purpose, the diagnostic toolkit is built with the Python programming language and uses Python libraries for scientific analysis (such as NumPy and matplotlib). Additional Python packages developed by DOE [i.e., the Community Data Analysis Tools (CDAT), https://cdat.llnl.gov/] are also used. Four components are currently included in ARM-DIAGS: 1) a Python-based analysis program; 2) an ARM-based collection of mean, diurnal cycle, and seasonal cycle climatologies as well as high time frequency data for process-oriented diagnostics; 3) a database of simulation data from models contributed to the CMIP project; and 4) relevant technical documentation for ARM-DIAGS.

The observations used to assess model performance primarily rely on the ARM Best Estimate (ARMBE) data products (Xie et al. 2010) and other ARM value-added products (VAPs; www.arm.gov/capabilities/vaps), which are available for all the ARM permanent research sites and some ARM mobile facilities. These data often rely on measurements at the ARM Central Facility (CF) locations (i.e., single point measurements). To improve model–observation comparison, the ARM long-term continuous forcing data (Xie et al. 2004), which represent an average over a global climate model (GCM) grid box, are also used when available. For cloud properties such as cloud liquid and ice water contents, the ARM Cloud Retrieval Ensemble Data (ACRED; Zhao et al. 2012) are used. Detailed information about the ARM data used in the ARM-DIAGS package is listed in Tables 1 and 2. The observational data products consist of hourly averages, diurnal cycles, monthly means, or climatological summaries of the measured quantities, with variable names, units, and vertical dimensions remapped to CMIP conventions. They are currently available for the Southern Great Plains (SGP) site (Table 1) as well as the North Slope of Alaska (NSA) Barrow (now known as Utqiaġvik) site and the Tropical Western Pacific (TWP) Manus, Nauru, and Darwin sites (Table 2). In addition to the ARM observations, ARM-DIAGS includes simulation data from models participating in the CMIP project, which allows climate-modeling groups to compare a new candidate version of their model to existing CMIP models. The full list of metrics and diagnostics is as follows, with a subset demonstrated in the “Facilitating use of ARM data in climate model evaluation” section of this article:

  • a set of basic metrics tables: mean, mean bias, correlation, and root-mean-square error based on annual cycle of each variable;

  • line plots and Taylor diagrams (Taylor 2001) for annual cycle variability of each variable;

  • contour and vertical profiles of annual cycle and diurnal cycle of cloud fraction;

  • line and harmonic dial plots (Covey et al. 2016) of diurnal cycle of precipitation;

  • probability density function (PDF) plots of precipitation rate (Pendergrass and Hartmann 2014); and

  • convection onset metrics showing the statistical relationship between precipitation rate and column water vapor (Schiro et al. 2016).
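As a minimal sketch of the first item in the list above, the basic metrics for one variable could be computed from two 12-month climatologies with NumPy as shown here. The function name `annual_cycle_metrics`, the equal weighting of months, and the synthetic input values are assumptions made for illustration; this is not taken from the ARM-DIAGS source code.

```python
import numpy as np


def annual_cycle_metrics(model, obs):
    """Mean, mean bias, RMSE, and correlation of two 12-month climatologies.

    Hypothetical helper for illustration; months are weighted equally.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if model.shape != obs.shape:
        raise ValueError("model and obs must have the same shape")
    diff = model - obs
    return {
        "model_mean": float(model.mean()),
        "obs_mean": float(obs.mean()),
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "corr": float(np.corrcoef(model, obs)[0, 1]),
    }


# Synthetic monthly climatologies (illustrative values only, not ARM data).
months = np.arange(12)
obs = 15.0 + 12.0 * np.cos(2 * np.pi * (months - 6) / 12)
model = obs + 2.0  # a model that is uniformly 2 units too high
metrics = annual_cycle_metrics(model, obs)
print(metrics)
```

A uniform offset like the one in this toy case produces a nonzero bias and RMSE while leaving the correlation at one, which is why the metrics table reports these quantities side by side.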

Table 1.

Observed quantities selected in the evaluation package, including the quantity names, the data sources, and the temporal and spatial information of the derived data for SGP.

Table 2.

Observed quantities selected in the evaluation package, including the quantity names, the data sources, and the temporal and spatial information of the derived data for NSA and TWP sites.


Facilitating use of ARM data in climate model evaluation

Diagnosis of summertime warm bias.

The data and diagnostics provided through ARM-DIAGS have been used for studying the systematic warm bias in surface temperature found among the climate models in summertime over continental midlatitudes including the ARM SGP site (C. Zhang et al. 2018). The biases are consistent with both overestimated surface shortwave radiation and underestimated evaporative fraction, which contribute to the warm bias as illustrated in Fig. 1. These diagnostics provide an integrated picture with detailed field observations to identify possible model deficiencies in representing cloud, radiation, and land properties, as well as their interactions.

Fig. 1.

Annual cycle of monthly mean (a) surface air temperature, (b) precipitation, (c) surface air relative humidity, (d) surface downward shortwave radiative flux, (e) surface sensible heat flux, and (f) surface latent heat flux over the ARM SGP domain (averaged over 35°–38°N, 99°–96°W) from ARM observations averaged over 1999–2011 (red line with error bars representing one standard deviation of interannual variability) and CMIP5 simulations averaged over 1979–2008 (gray lines for individual CMIP5 models and black line for multimodel mean). JJA mean values are shown in the legend. Plots are modified from C. Zhang et al. (2018).


Diurnal cycle of cloud fraction.

The diurnal cycle of cloud fraction can serve as a critical test of the models’ representation of the physical processes controlling the cloud life cycle. One unique product from ARM is the cloud vertical profile derived from an integration of multiple active remote sensors, including millimeter wavelength cloud radars, laser ceilometers, and micropulse lidars [the Active Remote Sensing of Clouds (ARSCL) product]. Figure 2 compares the observed and simulated diurnal cycle of cloud vertical structure over the ARM midlatitude and tropical sites (i.e., SGP and Manus), where a prominent climatological diurnal cycle of clouds is present. Over the SGP site, the Energy Exascale Earth System Model (E3SM) lacks the summertime [June–August (JJA)] transition from shallow to deep clouds. This is a common model bias, related to model deep convection that is triggered too easily and does not allow low clouds to build up. The Manus site exhibits a strong diurnal cycle, with a maximum in low cloud fraction occurring around local noon, followed by a maximum in high cloud hours later. At Manus, too, the model generally underestimates low cloud and overestimates high cloud, and it also lacks diurnal variability.
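The climatological composite diurnal cycle discussed above can be sketched as an average of hourly cloud-fraction profiles grouped by local hour of day. The function name `diurnal_composite`, the (time, height) array layout, and the synthetic input are assumptions for illustration only; real ARSCL-based data would additionally need quality control and time-zone handling.

```python
import numpy as np


def diurnal_composite(hourly_values, hours_of_day):
    """Average a (time, height) array into a (24, height) composite.

    hours_of_day holds the local hour (0-23) of each time step.
    Hypothetical helper for illustration; missing values are NaN.
    """
    hourly_values = np.asarray(hourly_values, dtype=float)
    hours_of_day = np.asarray(hours_of_day, dtype=int)
    composite = np.full((24,) + hourly_values.shape[1:], np.nan)
    for hour in range(24):
        selected = hourly_values[hours_of_day == hour]
        if selected.size:
            composite[hour] = np.nanmean(selected, axis=0)
    return composite


# Synthetic example: 10 identical days, 3 height levels (not ARM data).
n_days, n_levels = 10, 3
pattern = 0.1 * np.arange(n_levels)[None, :] + 0.01 * np.arange(24)[:, None]
cloud_fraction = np.tile(pattern, (n_days, 1))   # shape (240, 3)
hours = np.tile(np.arange(24), n_days)           # local hour of each step
composite = diurnal_composite(cloud_fraction, hours)
print(composite.shape)
```

The same grouping applied separately to observations and model output yields the pairs of hour-by-height panels that a comparison like Fig. 2 is built from.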

Fig. 2.

Climatological composite diurnal cycle of clouds (left) observed and (right) simulated by E3SM: (top) JJA mean at SGP and (bottom) annual mean at Manus.


Diurnal cycle of precipitation.

The diurnal cycle of precipitation often serves as a benchmark for climate models. The diurnal cycle diagnostics in ARM-DIAGS, which compare the precipitation intensity and its peak time, have been used by the E3SM development team to assess the performance of a newly developed convection triggering mechanism (Xie et al. 2019). Figure 3 shows that none of the climate models, including the default E3SM, captures the observed nocturnal peak, which is often associated with the eastward propagation of mesoscale convective systems. A recently developed convective triggering function, which incorporates an empirical dynamic constraint and allows elevated convection to be captured, begins to capture the early morning peak time, although the intensity is still too weak. These diagnostics are worth repeating routinely, especially when new features in convection parameterizations are implemented.
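The peak time and intensity compared by these diagnostics can be obtained from the first (24-h) harmonic of the composite diurnal cycle. The sketch below assumes 24 hourly-mean values and uses a discrete Fourier projection; the function name `first_harmonic` and the synthetic series are hypothetical and do not reproduce the ARM-DIAGS implementation.

```python
import numpy as np


def first_harmonic(hourly_mean):
    """Amplitude and peak hour of the 24-h harmonic of a diurnal composite.

    Fits p(t) = mean + A * cos(2*pi*(t - t_peak)/24) by projecting onto
    the first Fourier mode. Hypothetical helper for illustration.
    """
    p = np.asarray(hourly_mean, dtype=float)
    n = p.size
    t = np.arange(n) * 24.0 / n          # sample times in hours
    omega = 2.0 * np.pi / 24.0
    coeff = 2.0 / n * np.sum(p * np.exp(-1j * omega * t))
    amplitude = np.abs(coeff)
    peak_hour = (-np.angle(coeff) / omega) % 24.0
    return amplitude, peak_hour


# Synthetic composite peaking at 20 h local time (not ARM data).
hours = np.arange(24)
precip = 3.0 + 2.0 * np.cos(2.0 * np.pi * (hours - 20.0) / 24.0)
amplitude, peak_hour = first_harmonic(precip)

# Polar (dial) coordinates: angle encodes peak time, radius the amplitude.
theta = 2.0 * np.pi * peak_hour / 24.0
print(amplitude, peak_hour, theta)
```

Plotting `(theta, amplitude)` for each model and for the observations on a polar axis gives a harmonic dial of the kind described by Covey et al. (2016).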

Fig. 3.

(left) Black dots are ARM observations. Curves are the first harmonics: gray for CMIP5 model AMIP-type runs. Color curves are from DOE’s E3SM Atmosphere Model (EAM v1) with a standard control run and a run using newly developed convection triggers [a detailed experiment description can be found in Xie et al. (2019)]. (right) Precipitation peak time and amplitude (mm day⁻¹) from the first harmonics, mapped to polar coordinates.


Precipitation distribution.

The PDF analysis for daily mean precipitation at the SGP site during June–August is shown in Fig. 4. This example illustrates that models tend to underestimate heavy rainfall (>10 mm day⁻¹) both in frequency (Fig. 4a) and in the amount contributed to the mean precipitation (Fig. 4b). The overlaid result from GPCP (Global Precipitation Climatology Project One-Degree Daily Precipitation Dataset) confirms this systematic model bias.
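A frequency distribution and its amount-weighted counterpart of the kind shown in Fig. 4 can be sketched as a histogram normalized per unit precipitation rate. The logarithmically spaced bin edges, the use of arithmetic bin centers, the function name `precip_distributions`, and the gamma-distributed synthetic rainfall are all assumptions for illustration; they are not the exact binning or data used in the package.

```python
import numpy as np


def precip_distributions(daily_precip, bin_edges):
    """Frequency PDF and amount contribution of daily precipitation.

    pdf has units of (mm/day)^-1 and is normalized by the total number
    of days, so it integrates to the fraction of days inside the bins.
    amount is the PDF multiplied by the bin-center rate (unitless).
    Hypothetical helper for illustration.
    """
    daily_precip = np.asarray(daily_precip, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    counts, _ = np.histogram(daily_precip, bins=bin_edges)
    widths = np.diff(bin_edges)
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    pdf = counts / (daily_precip.size * widths)
    amount = pdf * centers
    return centers, pdf, amount


# Synthetic daily precipitation in mm/day (not ARM or GPCP data).
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=8.0, size=5000)
edges = np.logspace(-1, 2.5, 30)   # assumed log-spaced bin edges
centers, pdf, amount = precip_distributions(precip, edges)
```

Normalizing by bin width in precipitation rate, rather than in its logarithm, corresponds to a conventional normalization: the area under the curve is a fraction of days, regardless of how the bins are spaced.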

Fig. 4.

(a) Precipitation frequency distribution [(mm day⁻¹)⁻¹] and (b) the contribution to mean precipitation amount (PDF multiplied by precipitation rate, unitless) as a function of precipitation rate based on daily-mean values using observations from ARM (blue line) and GPCP (red line) compared with CMIP5 AMIP simulations shown as gray lines. The black line represents the multimodel mean. The precipitation bin arrangement follows Pendergrass and Hartmann (2014) but a conventional normalization is used (integrated in precipitation rather than a log-precipitation variable).


Convection onset metrics.

Convection onset metrics allow users to compare diagnostics for the behavior of deep convection from ARM observations to model output. The statistics quantify robust relationships between precipitation, column water vapor (CWV), and temperature. This includes the sharp increase or “pickup” in conditional-average precipitation rate above a critical CWV value seen in Fig. 5a, which is easily identifiable for short time averages at tropical ARM sites. The pickup represents the onset of conditional instability yielding strong convective precipitation (Schiro et al. 2016) and is also seen in the probability of precipitation (Fig. 5b). The probability density of CWV and the contribution from precipitating points (Fig. 5c) have a drop in probability density at high CWV corresponding to the regime with high precipitation loss above the critical CWV value. These features are robust to spatial averaging up to about 2° latitude–longitude and time averaging up to about 3 h (Kuo et al. 2018), aside from slight increases in probability (Fig. 5b) with averaging.
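The three statistics described above (conditional-average precipitation, probability of precipitation, and the PDF of CWV with its precipitating contribution) can all be derived from one binning of paired CWV and precipitation samples. The following is a minimal sketch under stated assumptions: the function name `convection_onset_stats`, the 2-mm bin width, and the synthetic data with a critical CWV of 60 mm are hypothetical, and only the 0.5 mm h⁻¹ rain threshold is taken from the article.

```python
import numpy as np


def convection_onset_stats(cwv, precip, bin_edges, rain_threshold=0.5):
    """Bin precipitation (mm/h) by column water vapor (mm).

    Returns the conditional-mean precipitation, the probability of
    precipitation above rain_threshold, and the PDFs of CWV for all
    samples and for precipitating samples. Hypothetical helper.
    """
    cwv = np.asarray(cwv, dtype=float)
    precip = np.asarray(precip, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    n_bins = bin_edges.size - 1

    index = np.digitize(cwv, bin_edges) - 1       # bin index per sample
    valid = (index >= 0) & (index < n_bins)       # drop out-of-range CWV
    index, rain = index[valid], precip[valid]

    counts = np.bincount(index, minlength=n_bins)
    rain_sum = np.bincount(index, weights=rain, minlength=n_bins)
    rainy = np.bincount(
        index, weights=(rain > rain_threshold).astype(float), minlength=n_bins
    )
    widths = np.diff(bin_edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        cond_mean = rain_sum / counts             # NaN where a bin is empty
        prob = rainy / counts
    return {
        "cond_mean": cond_mean,
        "prob": prob,
        "pdf_cwv": counts / (cwv.size * widths),
        "pdf_rainy": rainy / (cwv.size * widths),
    }


# Synthetic samples with a pickup above an assumed critical CWV of 60 mm.
rng = np.random.default_rng(1)
cwv = rng.uniform(20.0, 70.0, 20000)
precip = np.where(cwv > 60.0, cwv - 60.0, 0.0)
edges = np.arange(20.0, 72.0, 2.0)
stats = convection_onset_stats(cwv, precip, edges)
```

Applying the same function to observed and simulated time series at matching temporal resolution allows the critical CWV at which the pickup begins to be compared directly.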

Fig. 5.

(a) Precipitation conditionally averaged on CWV for observations based on ARMBE precipitation and gap-filled Microwave Radiometer Retrievals (MWRRET) of CWV (blue) and E3SM model output (black) over Manus Island. (b) As in (a), but for precipitation probability (the number of CWV observations with rain rates greater than a small threshold, here 0.5 mm h⁻¹, divided by the total number of CWV samples in each bin). (c) The PDFs of CWV for observations (dark blue) and model (black) and of the contribution to this from points with precipitation exceeding 0.5 mm h⁻¹ for observations (light blue) and model (gray).


The statistics discussed here can distinguish between convective parameterizations in models (Kuo et al. 2020). An example of model comparison is given in Fig. 5. An important diagnostic in the model evaluation of convection onset concerns the critical CWV value where the precipitation pickup begins. Many models exhibit a pickup at lower CWV than observations (Kuo et al. 2020), as seen in Fig. 5a for E3SM. This mismatch persists even when temperature dependence (not shown but will be included in a future release of ARM-DIAGS) is included by binning by the saturation water vapor.

Summary and future work

The ARM metrics and diagnostics package is designed and developed to facilitate the use of ARM ground-based in situ measurements in climate model evaluation. Metrics and diagnostics evaluating the simulated atmospheric and cloud fields are generated by running a Python program in a simple software environment based on CDAT. The ARM-DIAGS v2.0 analysis code is publicly available through GitHub (https://github.com/ARM-DOE/arm-gcm-diagnostics) under the ARM User Facility project space. This code package is envisioned as a central place for the community to share analysis scripts that produce metrics and diagnostics based on ARM data. Analysis data, including the ARM observational datasets and the reference CMIP5 AMIP data, can be downloaded through the ARM archive (www.arm.gov/capabilities/vaps/adcme-123). For now, the default requirement for the input model is that the data use CMIP conventions. Anyone interested in applying ARM-DIAGS to a specific model should contact the development team via our GitHub page for specific configurations for a model run.

Future work includes extending ARM-DIAGS to the ARM Eastern North Atlantic (ENA) site (a new fixed site) and ARM AMF sites. CMIP6 data will be included as they become more widely available. Ongoing work includes incorporating the recently developed ARM cloud radar simulator (Y. Zhang et al. 2018) into ARM-DIAGS to improve the comparison between model clouds and ARM cloud radar observations, as well as adding temperature dependence to the convection onset statistics. In addition, using other sources of observations, such as those retrieved from satellites, as supplementary data can help address issues associated with observation uncertainty and data resolution. Moving forward, we will focus particularly on adding process-oriented diagnostics to ARM-DIAGS. The diagnostics suite will be continuously improved in close collaboration with scientists in the field. To make this package accessible and broadly used, we plan to integrate it into other commonly used Python-based metrics packages in the GCM community, such as the PCMDI metrics package (PMP) and the DOE E3SM diagnostics package (E3SM-DIAGS), to provide routine model evaluation at ARM sites.

Acknowledgments

This research was supported by the DOE Atmospheric Radiation Measurement (ARM) program and performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. IM release: LLNL-ABS-789643. Work at UCLA was supported by U.S. Department of Energy Grant DE-SC0011074 and subcontract B634021 and National Science Foundation AGS-1540518, AGS-1936810. We acknowledge the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison for providing coordinating support and leading development of software infrastructure in partnership with the Global Organization for Earth System Science Portals for making CMIP data available.

References

Bond, D., 2005: Soil Water and Temperature System (SWATS) handbook. ARM Tech. Rep. DOE/SC-ARM-TR-063, 24 pp., www.arm.gov/publications/tech_reports/handbooks/swats_handbook.pdf.
Chandra, A. S., C. Zhang, S. A. Klein, and H.-Y. Ma, 2015: Low-cloud characteristics over the tropical western Pacific from ARM observations and CAM5 simulations. J. Geophys. Res. Atmos., 120, 8953–8970, https://doi.org/10.1002/2015JD023369.
Chandra, A. S., and Coauthors, 2001: The ARM Millimeter Wave Cloud Radars (MMCRs) and the Active Remote Sensing of Clouds (ARSCL) Value Added Product (VAP). DOE Tech. Memo. ARM VAP-002.1, 56 pp., www.arm.gov/publications/tech_reports/arm-vap-002-1.pdf.
Cook, D. R., and R. C. Sullivan, 2011a: Energy Balance Bowen Ratio Station (EBBR) handbook. ARM Tech. Rep. DOE/SC-ARM/TR-037, 28 pp., www.arm.gov/publications/tech_reports/handbooks/ebbr_handbook.pdf.
Cook, D. R., and R. C. Sullivan, 2011b: Eddy Correlation Flux Measurement System (ECOR) handbook. ARM Tech. Rep. DOE/SC-ARM/TR-052, 20 pp., www.arm.gov/publications/tech_reports/handbooks/ecor_handbook.pdf.
Covey, C., P. J. Gleckler, C. Doutriaux, D. N. Williams, A. Dai, J. Fasullo, K. Trenberth, and A. Berg, 2016: Metrics for the diurnal cycle of precipitation: Toward routine benchmarks for climate models. J. Climate, 29, 4461–4471, https://doi.org/10.1175/JCLI-D-15-0664.1.
Knootz, A., C. Flynn, G. Hodges, J. Michalsky, and J. Barnard, 2013: Aerosol optical depth value-added product. ARM Tech. Rep. DOE/SC-ARM/TR-129, 32 pp., www.arm.gov/publications/tech_reports/doe-sc-arm-tr-129.pdf.
Kuo, Y. H., K. A. Schiro, and J. D. Neelin, 2018: Convective transition statistics over tropical oceans for climate model diagnostics: Observational baseline. J. Atmos. Sci., 75, 1553–1570, https://doi.org/10.1175/JAS-D-17-0287.1.
Kuo, Y. H., and Coauthors, 2020: Convective transition statistics over tropical oceans for climate model diagnostics: GCM evaluation. J. Atmos. Sci., 77, 379–403, https://doi.org/10.1175/JAS-D-19-0132.1.
Long, C. N., and Y. Shi, 2006: The QCRad value added product: Surface radiation measurement quality control testing, including climatologically configurable limits. ARM Tech. Rep. DOE/SC-ARM/TR-074, 69 pp., www.arm.gov/publications/tech_reports/doe-sc-arm-tr-074.pdf.
Long, C. N., and Y. Shi, 2008: An automated quality assessment and control algorithm for surface radiation measurements. Open Atmos. Sci. J., 2, 23–37, https://doi.org/10.2174/1874282300802010023.
Maloney, E. D., and Coauthors, 2019: Process-oriented evaluation of climate and weather forecasting models. Bull. Amer. Meteor. Soc., 100, 1665–1686, https://doi.org/10.1175/BAMS-D-18-0042.1.
Pendergrass, A. G., and D. L. Hartmann, 2014: Two modes of change of the distribution of rain. J. Climate, 27, 8357–8371, https://doi.org/10.1175/JCLI-D-14-00182.1.
Schiro, K. A., J. D. Neelin, D. K. Adams, and B. R. Lintner, 2016: Deep convection and column water vapor over tropical land versus tropical ocean: A comparison between the Amazon and the tropical western Pacific. J. Atmos. Sci., 73, 4043–4063, https://doi.org/10.1175/JAS-D-16-0119.1.
Taylor, K. E., 2001: Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res., 106, 7183–7192, https://doi.org/10.1029/2000JD900719.
Xie, S., and Coauthors, 2010: Clouds and more: ARM climate modeling best estimate data: A new data product for climate studies. Bull. Amer. Meteor. Soc., 91, 13–20, https://doi.org/10.1175/2009BAMS2891.1.
Xie, S., and Coauthors, 2019: Improved diurnal cycle of precipitation in E3SM with a revised convective triggering function. J. Adv. Model. Earth Syst., 11, 2290–2310, https://doi.org/10.1029/2019MS001702.
Xie, S. C., R. T. Cederwall, and M. H. Zhang, 2004: Developing long-term single-column model/cloud system-resolving model forcing data using numerical weather prediction products constrained by surface and top of the atmosphere observations. J. Geophys. Res., 109, D01104, https://doi.org/10.1029/2003JD004045.
Zhang, C., and S. Xie, 2017: ARM data-oriented metrics and diagnostics package for climate model evaluation value-added product. ARM Tech. Rep. DOE/SC-ARM-TR-202, 19 pp., www.arm.gov/publications/tech_reports/doe-sc-arm-tr-202.pdf.
Zhang, C., S. Xie, S. A. Klein, H.-y. Ma, S. Tang, K. Van Weverberg, C. J. Morcrette, and J. Petch, 2018: CAUSES: Diagnosis of the summertime warm bias in CMIP5 climate models at the ARM Southern Great Plains site. J. Geophys. Res. Atmos., 123, 2968–2992, https://doi.org/10.1002/2017JD027200.
Zhang, Y., and Coauthors, 2018: The ARM cloud radar simulator for global climate models: Bridging field data and climate models. Bull. Amer. Meteor. Soc., 99, 21–26, https://doi.org/10.1175/BAMS-D-16-0258.1.
Zhao, C., and Coauthors, 2012: Toward understanding of differences in current cloud retrievals of ARM ground-based measurements. J. Geophys. Res., 117, D10206, https://doi.org/10.1029/2011JD016792.

Footnotes

*

CURRENT AFFILIATION: Google LLC, Mountain View, California
