Data Length Requirements for Observational Estimates of Land–Atmosphere Coupling Strength

Kirsten L. Findell Geophysical Fluid Dynamics Laboratory, Princeton, New Jersey

Pierre Gentine Department of Earth and Environmental Engineering, and Earth Institute, Columbia University, New York, New York

Benjamin R. Lintner Department of Environmental Sciences, Rutgers, The State University of New Jersey, New Brunswick, New Jersey

Benoit P. Guillod Institute for Atmospheric and Climate Science, ETH Zürich, Zurich, Switzerland


Abstract

Multiple metrics have been developed in recent years to characterize the strength of land–atmosphere coupling in regional and global climate models. Evaluation of these metrics against observations has proven challenging because of limited observations and/or metric definitions based on model experimental designs that are not replicable with observations. Additionally, because observations are limited in time, with only a single realization of the earth’s climate available, metrics of land–atmosphere coupling strength typically assume stationarity and ergodicity, so that an observed time series (or set of time series) can be used in place of an ensemble mean of multiple realizations. The present study evaluates the observational data requirements necessary for robust quantification of a suite of land–atmosphere coupling metrics previously described in the literature. It is demonstrated that the amount of data required to obtain robust estimates of metrics assessing relationships between variables is greater than that necessary to constrain means of directly measured observables. Moreover, while the addition of unbiased noise does not significantly alter the mean of a directly observable quantity, inclusion of such noise degrades metrics based on connections between variables, yielding a unidirectional and negative impact on metric strength estimates. This analysis suggests that longer records of surface observations are required to correctly estimate land–atmosphere coupling strength than are required to estimate mean values of the observed quantities.

Current affiliation: Environmental Change Institute, School of Geography and the Environment, University of Oxford, Oxford, United Kingdom.

Corresponding author address: Kirsten L. Findell, Geophysical Fluid Dynamics Laboratory, 201 Forrestal Road, Princeton, NJ 08540-6649. E-mail: kirsten.findell@noaa.gov

1. Introduction

Much work in the area of land–atmosphere interactions over recent decades has focused on understanding how the land surface state, for example, vegetative and soil moisture conditions, affects the overlying atmospheric state, especially air temperature, humidity, and turbulence, and how these conditions in turn affect convective precipitation (see, e.g., Findell et al. 2011; Spracklen et al. 2012; Taylor et al. 2012; Gentine et al. 2013). Prior work has ranged in temporal and spatial focus from hourly, point-scale processes related to boundary layer dynamics (e.g., Betts and Ball 1995; Betts et al. 2002; Santanello et al. 2005, 2007, 2009) to global signatures (e.g., Betts et al. 1998, 1999, 2003, 2006; Koster et al. 2004, 2006; Dirmeyer 2006, 2011; Dirmeyer et al. 2006, 2009, 2012). Since the observations necessary to quantify land–atmosphere coupling are available at relatively few locations around the world and are usually of short duration, the global-scale studies have, by necessity, relied on models or reanalysis data, the latter of which in turn depend on an underlying modeling framework.

Some studies have used observational data from individual locations [e.g., the Atmospheric Radiation Measurement Program (ARM) Southern Great Plains (SGP) site; Phillips and Klein 2013], from multiple observational stations [e.g., AmeriFlux (Chen and Zhang 2009) and FLUXNET (Guillod et al. 2014) stations], from satellite-based estimates of relevant variables [e.g., Global Land Surface Evaporation: The Amsterdam Methodology (GLEAM) (Miralles et al. 2012) and Atmospheric Infrared Sounder (AIRS) (Ferguson and Wood 2011; Ferguson et al. 2012) data], or from reanalysis or observation–model combination products (Mei and Wang 2012; Liu et al. 2014) to derive observationally based estimates of land–atmosphere coupling strength. Differences between model- and observation-based estimates of this coupling strength could arise from factors such as systematic model biases stemming from inadequate or incomplete representations of complex physical systems or from limitations imposed by the available data, for example, a single (and often short) realization of climate time series. In the present study, we look at the latter factor: we aim to determine how dataset length influences metric estimates. Determining the mean characteristics of the climate system from a single realization ultimately relies on two underlying assumptions: stationarity and ergodicity. Stationarity relates to the time invariance of the underlying statistical properties of a process or system, while ergodicity relates to application of time averages in lieu of ensemble averages under the assumption that a sufficiently large temporal sampling can be taken as representative of the entire set of possible process or system states.

The present study examines the feasibility of calculating various metrics of land–atmosphere coupling from the short-duration (and geographically sparse) observations that are typically available, specifically addressing the following question: how long does an observational record need to be in order to yield a robust measure of land–atmosphere coupling strength at a given location? This question has important ramifications for the design of observational field campaigns, as well as for the use of short-duration datasets of any kind. To tackle this question, we use data from the North American Regional Reanalysis (NARR; Mesinger et al. 2006) dataset as a pseudo-observational network in which we apply multiple sampling strategies, varying the length of the time series from 92 days, equivalent to one summer’s worth of data, to 2300 days, or 25 summers of data. We limit the analysis to summertime, since in the midlatitudes boundary layer growth and development is strongly impacted by flux partitioning at the surface during this season. We further evaluate how a common methodology to assess the statistical confidence of a single time series realization, bootstrapping with replacement, constrains the metric estimates by varying the number of bootstrap samples. While the NARR data are our test bed, we use a sampling methodology to demonstrate that any short dataset is subject to certain statistical constraints that should be considered: our conclusions are not dependent on the exact dataset used. Indeed, output from a general circulation model from the Geophysical Fluid Dynamics Laboratory yields similar conclusions (not shown).

The NARR dataset is briefly described in the next section. In section 3, we detail the sampling strategy, while in section 4 we define the computed metrics and variables used in the study. Results on the impact of sample size on the robustness of metric estimates are presented in section 5, while results on the impact of noise on data requirements are shown in section 6. Finally, conclusions are presented in section 7.

2. Dataset

The NARR dataset is a dynamically consistent 3-hourly gridded dataset spanning more than 34 years at roughly 30-km grid spacing (Mesinger et al. 2006). This dataset is derived from a data assimilation scheme (DAS) with precipitation and other near-surface observations ingested hourly and atmospheric profiles of temperature, winds, and moisture from rawinsondes and dropsondes ingested every 3 h.

The regional reanalysis project, which led to the creation of the NARR data, aimed to create a long-term, consistent, high-resolution climate dataset for North America that improved on earlier global reanalysis datasets in both resolution and accuracy (Mesinger et al. 2006). In the present study, we use NARR as an internally consistent, comprehensive framework for estimating the data length requirements of directly observed variables versus derived metrics of land–atmosphere coupling strength; as such, our analysis does not strongly hinge on the quality of the NARR dataset. Nevertheless, we note that several studies document significant improvements in the components of the NARR system, including the Eta model (Marshall et al. 2003; Berbery et al. 2003), the DAS (Mitchell et al. 2004a), and the Noah land surface model (Ek et al. 2003). Other work has demonstrated clear improvements of NARR relative to other reanalysis products with respect to a diverse array of issues, including simulating the seasonality of dry and wet extremes (Mesinger et al. 2006; Mitchell et al. 2004b), matching radiosonde-based datasets of tropospheric winds and temperatures (Mesinger et al. 2006), closing the water and energy budgets over the Mississippi River basin (Roads et al. 2003), and capturing atmospheric moisture transport over the United States and Mexico (Mo et al. 2005). Comprehensive comparison of the precipitation, moisture flux convergence, and precipitable water characteristics of the NARR data to gridded observations highlights good correspondence between the two, albeit with a slight systematic bias toward more frequent, light precipitation events in NARR, particularly in Florida (Becker et al. 2009). Guillod et al. (2014) further suggest some potential issues in NARR related to the parameterization of interception evaporation, while detailed analyses by Ruane (2010a,b) show that discrepancies between the underlying model estimate of precipitation and the precipitation from the DAS may lead to biases in the evaporation and moisture flux convergence fields. Despite such discrepancies, Ruane notes the utility of NARR for studies of the water cycle.

3. Sampling strategy

We focus here on the analysis of the 34 summers (June–August) of NARR data spanning 1979–2012. Figure 1 depicts the synthetic network of sampling sites analyzed in this study. This network is intended to mimic point-based stations typical of field measurements, albeit with more regular spacing than is typical of observations. Stations are distributed zonally and meridionally every 10 grid points, with a slight deviation in uniform spacing because of the curvature of the Lambert conformal grid. At each station, a moving-block bootstrap routine with 10-, 30-, and 92-day block lengths is applied across the full time series to create a large number of bootstrap data samples for each of 11 different lengths corresponding to 1, 2, 3, 4, 6, 8, 10, 12, 15, 20, and 25 summers’ worth of data (i.e., 92, 184, …, 2300 days), as described in detail below. For the 10-day block lengths series, 1001 bootstrap samples are created, while 300 bootstrap samples are created for each of the 30- and 92-day blocks.

Fig. 1.

Locations of grid points with extensive sampling. Data from circled grid points are shown in Figs. 2 and 3.

Citation: Journal of Hydrometeorology 16, 4; 10.1175/JHM-D-14-0131.1

Bootstrapping is a resampling technique first introduced by Efron (1979) as a means of quantifying the accuracy of estimates of statistics, for example, the mean or variance, derived from a single observational sample. Using the original data, a family of bootstrap members (new sample datasets) is constructed via sampling with replacement from the original series. This family of bootstraps may be used to construct distributions of statistical properties along with confidence intervals and estimates of standard errors.

The moving-block bootstrap technique was proposed by Künsch (1989) to account for serially correlated data. Indeed, the time series of many climate variables often exhibit statistically significant autocorrelations, which effectively reduce the degrees of freedom within the data. In the moving-block bootstrap, the original dataset (of length n) is broken into n − b + 1 overlapping blocks of length b, with the first block containing the original data points from 1 to b, the second block data points from 2 to b + 1, etc., up to the last block with data points from n − b + 1 to n. Individual members of the family of bootstraps are generated by choosing blocks, with replacement, from a uniform random distribution spanning (1: n − b + 1) until the desired sample length is reached. While b is arbitrary, it should be sufficiently long to capture autocorrelation in the time series. Our selection of block lengths (i.e., 10, 30, and 92 days) is intended to capture autocorrelation present on synoptic time scales as well as to replicate field measurements that may comprise distinct intervals of data sampled in short, intensive observational periods over multiple summers, or month-long or summer-long intervals. For the 2300-day samples, each member of the 10-day bootstrap ensemble comprises 230 blocks of 10 days chosen from the original 34-summer-long (1979–2012) blocked time series, while each member of the 92-day bootstrap ensemble is made up of 25 blocks of 92 days. For sample lengths not divisible by the block length, the last block is truncated to yield the appropriate number of days in the bootstrap member, for example, 92-day samples using 10-day blocks are composed of nine full blocks and 2 days from a tenth block.
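The block-selection procedure can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `moving_block_bootstrap`, the use of NumPy, and the treatment of the 34 summers as one concatenated daily series (so that a block may straddle two summers) are all assumptions made for illustration.

```python
import numpy as np

def moving_block_bootstrap(series, block_len, sample_len, rng):
    """Build one bootstrap member of length sample_len from overlapping blocks."""
    series = np.asarray(series)
    n = series.size
    n_starts = n - block_len + 1              # number of overlapping blocks
    n_blocks = -(-sample_len // block_len)    # ceiling division
    # Draw block start positions uniformly, with replacement.
    starts = rng.integers(0, n_starts, size=n_blocks)
    member = np.concatenate([series[s:s + block_len] for s in starts])
    return member[:sample_len]                # truncate the final block if needed

rng = np.random.default_rng(0)
data = np.arange(34 * 92)                     # stand-in for 34 summers of daily values
member = moving_block_bootstrap(data, block_len=10, sample_len=92, rng=rng)
```

With `block_len=10` and `sample_len=92`, ten blocks are drawn and the last is cut to 2 days, matching the truncation rule described above.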

This sampling methodology is shown schematically in Fig. 2, along with a 12-summer rainfall sample from one grid point included for reference. Ten different randomly generated bootstrap members are shown for the three different block lengths and for two different sample lengths. Figure 2 shows that bootstrap sample sizes much smaller than the total dataset length are less likely to sample the full range of observed rainfall states compared to the longer data samples. Additionally, the block length clearly impacts the variability sampled in the resultant value of the observation or metric of interest. Because there is some degree of autocorrelation within the climate system, samples generated from contiguous days (long block length) are more likely than samples generated from noncontiguous days (short block length) to sample a series of days with relatively homogeneous conditions, such as the ~45-day rain-free periods at the beginning of the second or twelfth summers. Thus, an increase in the number of blocks, or a decrease in the block length, should both increase the heterogeneity included in the resultant sample and decrease the spread of estimates provided by the bootstrap members. Figure 3 synthesizes the impact of these sampling decisions on the composite probability distribution functions (PDFs) generated from 300 bootstrap samples of the simplified triggering feedback strength sTFS metric (described in section 4) for each of the three block lengths and for six different sample lengths. This figure is discussed in greater detail in section 5.

Fig. 2.

(a) Blue line shows 12 years of June–August sample daily rainfall data from a grid cell in northeastern New Mexico. Red line shows 31-day moving average of daily values. Below are schematics of bootstrap sampling protocol for block lengths of (b) 10, (c) 30, and (d) 92 days and for total bootstrap sample lengths of (left) 92 and (right) 920 days.

Fig. 3.

For a sample grid cell in northwestern Mexico, impact of block size (left) 10, (middle) 30, or (right) 92 days, and bootstrap sample length (from top to bottom) nobs = 92, 276, 552, 920, 1380, or 2300 days, on the estimate of sTFS. Each histogram contains 300 bootstrap samples for the given block size and sample length.

4. Computed metrics and variables

We compare the impact of sample length on various metrics of land–atmosphere coupling strength to the impact on directly observed variables. The latter include mean daily precipitation p and 2-m air temperature T2m and mean late morning [0900–1200 local time (LT)] sensible heat flux H, latent heat flux λE, evaporative fraction EF, and soil moisture SM in the top 10 cm. Evaporative fraction is the fraction of available energy (net radiation minus ground heat flux) consumed by latent heat flux: EF = λE/(H + λE), where λ is the latent heat of condensation and E is the evaporation rate (e.g., Gentine et al. 2007, 2011).

The metrics of land–atmosphere coupling strength fall into three groups: correlations, two-legged indices, and convective triggering metrics. Two correlation-based metrics are considered, namely, the correlation ρ between latent heat flux and 2-m air temperature [ρ(λE, T2m); Seneviratne et al. 2006], and between latent heat flux and soil moisture [ρ(λE, SM); Dirmeyer et al. 2009]. Values of ρ(λE, T2m) close to −1 indicate strong control of soil moisture on both evapotranspiration and surface air temperature, which is typical under conditions with ample net surface radiative heating but limited availability of soil moisture.

The two-legged index (Dirmeyer 2011) builds on the correlation between latent heat flux and soil moisture. Two-legged refers to the conceptual understanding that the impact of soil moisture on subsequent precipitation can be divided into distinct component influences of (i) soil moisture on the surface energy partitioning and (ii) the surface energy partitioning on the overlying atmosphere. Specifically, the two-legged index IdailyLH captures the first of these components. It is computed as the slope of the correlation between soil moisture and latent heat flux, given by βLH, and the variability of soil moisture:
$$ I^{\mathrm{daily}}_{LH} = \beta_{LH}\,\sigma_{SM}, \tag{1} $$
where σSM is the standard deviation of soil moisture. Large positive values of IdailyLH may indicate either a steep slope in the relationship between soil moisture and latent heat flux or strong variability in soil moisture.
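As a rough illustration, the index can be computed from paired daily values as follows. This sketch reads the index as the product of the least-squares slope of the flux on soil moisture and the sample standard deviation of soil moisture; the function name `two_legged_index` and the choice of `ddof=1` are assumptions, not details taken from Dirmeyer (2011).

```python
import numpy as np

def two_legged_index(soil_moisture, flux):
    """Slope of flux regressed on soil moisture, scaled by soil moisture variability."""
    sm = np.asarray(soil_moisture, dtype=float)
    fx = np.asarray(flux, dtype=float)
    slope = np.polyfit(sm, fx, 1)[0]      # least-squares regression slope
    return slope * sm.std(ddof=1)         # assumed: sample standard deviation

sm = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
index = two_legged_index(sm, 2.0 * sm + 1.0)   # slope of 2 times the std of sm
```

Under this reading the index equals the correlation coefficient multiplied by the standard deviation of the flux, so any noise that weakens the correlation also shrinks the index.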

Dirmeyer (2011) masks out results if the correlation between soil moisture and latent heat flux is not statistically significant; in what follows, all results are retained in order to compare means and variances of different sample populations. Four variations of two-legged coupling indices are considered, measuring the relationships between before-noon soil moisture and latent heat flux ILH, before-noon soil moisture and daily latent heat flux IdailyLH, before-noon soil moisture and sensible heat flux IH, and before-noon soil moisture and EF IEF. Because soil moisture is usually nonnormally distributed, σSM may not be the most appropriate measure of soil moisture variability, though we use it here for consistency with Dirmeyer (2011).

The convective triggering metrics include four variants of the TFS as developed in Findell et al. (2011). Triggering feedback strength measures the sensitivity of afternoon rainfall to before-noon EF and is defined as
$$ \mathrm{TFS} = \sigma_{EF}\,\frac{\partial \Gamma(r)}{\partial \mathrm{EF}}, \tag{2} $$
where σEF is the standard deviation of EF and Γ(r) is the probability of local afternoon (1200–1800 LT) rainfall and the EF considered is the mean over the preceding 3-h (0900–1200 LT) period. Before-noon EF is considered to avoid temporal overlap and because EF is nearly conserved during daylight hours (Crago 1996; Crago and Brutsaert 1996; Gentine et al. 2007, 2011). The computation of TFS in Findell et al. (2011) considers the probability of rainfall in bins determined by three quantities: the convective triggering potential (CTP; Findell and Eltahir 2003a), the low-level humidity deficit (HIlow; Findell and Eltahir 2003a), and EF. CTP and HIlow are low-level atmospheric quantities used to assess the predisposition of the early morning atmosphere to convective triggering (Findell and Eltahir 2003a,b) and require atmospheric profiles of temperature and humidity extending at least 300 hPa above the ground surface. Since observations of these data are also limited, we consider a simplified estimate of triggering feedback strength sTFS, with bins defined only in EF space and with the number of bins ζmax set to three for simplicity:
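$$ \mathrm{sTFS} = \frac{\sigma_{EF}}{\zeta_{\max}-1}\sum_{\zeta=1}^{\zeta_{\max}-1}\frac{\Gamma(r \mid \mathrm{EF}_{\zeta+1})-\Gamma(r \mid \mathrm{EF}_{\zeta})}{\mathrm{EF}_{\zeta+1}-\mathrm{EF}_{\zeta}}, \tag{3} $$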
where Γ(r | EF_ζ) denotes the probability of afternoon rainfall given that EF is in bin ζ. Thresholds for the EF bins are determined separately for each grid point to ensure equal numbers of observations are assigned to each bin. Large values of sTFS indicate a strong sensitivity of rainfall probability to evaporative fraction as well as large variability in evaporative fraction.
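A binned estimate of this kind might be sketched as below. The sketch assumes a simple finite-difference form: the change in afternoon rainfall probability between adjacent equal-count EF bins, divided by the change in bin-mean EF, averaged over bin pairs and scaled by the standard deviation of EF. The exact weighting used in the study may differ, and the function name `simplified_tfs` is hypothetical.

```python
import numpy as np

def simplified_tfs(ef, afternoon_rain, n_bins=3):
    """Finite-difference sensitivity of rain probability to EF, scaled by std(EF)."""
    ef = np.asarray(ef, dtype=float)
    rained = np.asarray(afternoon_rain, dtype=bool)
    order = np.argsort(ef, kind="stable")
    bins = np.array_split(order, n_bins)                  # equal-count bins in EF
    ef_mean = np.array([ef[b].mean() for b in bins])      # representative EF per bin
    p_rain = np.array([rained[b].mean() for b in bins])   # rain frequency per bin
    slopes = np.diff(p_rain) / np.diff(ef_mean)           # assumed finite-difference form
    return ef.std(ddof=1) * slopes.mean()

ef = np.linspace(0.1, 0.9, 9)
rain = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1])
value = simplified_tfs(ef, rain)   # positive: rain is more frequent at high EF
```

Because each bin probability is a frequency computed from only one-third of the days, short records leave the differences between bins poorly constrained.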

In addition to TFS, Findell et al. (2011) introduced the amplification feedback strength AFS as a metric for EF-related rainfall accumulation once triggering has occurred. Findell et al. (2011) show that AFS is close to zero across the domain analyzed, indicating that rainfall amounts are not highly sensitive to EF. Thus, we do not include AFS in this study. On the other hand, Berg et al. (2013) combined TFS and AFS into the combined feedback strength CFS, which accounts for both initial triggering and rainfall quantity. Here we use the simplified combined feedback strength sCFS, for which the probability terms in Eq. (3) are replaced with the expected value of afternoon (1200–1800 LT) rainfall. The sCFS differs from AFS in terms of the days included in the computation. The sCFS and sTFS are computed using the same set of days (days with early morning rain removed prior to calculation, as detailed below), while the AFS is computed using just the subset of the TFS-valid days when afternoon rainfall does occur. We also analyze simplified versions of both the TFS and the CFS for which EF in Eq. (3) is replaced with soil moisture (sTFSSM and sCFSSM).

Prior to computing all the metrics, days with rainfall greater than 1 mm in the 6 h before noon are removed, in order to remove potential influences from large-scale forcing. This differs from the methodology in Findell et al. (2011), in which an additional criterion was used to remove large-scale forcing days that might not be captured by the early morning rain criterion. That is, Findell et al. (2011) excluded days with early morning CTP < 0, since such days are typically so stable that local surface fluxes are not expected to influence convection. Since CTP is not included here, only the early morning rainfall criterion is used. In what follows, the number of days indicated in the calculations of all of the metrics corresponds to the number before eliminating days with morning rainfall. However, calculations for all the metrics were also performed with this early morning rain threshold removed, that is, all days were retained in the analysis, with no qualitative impact on the results. This suggests that the variability from sampling is far larger than the variability that is captured or excluded with this simple early morning rain-day threshold.

5. Results: Impact of time series length on estimated metric values

Figure 3 shows the composite PDFs generated from 300 bootstrap samples of the sTFS at each of the three block lengths and for six of the different bootstrap sample lengths. For short bootstrap sample lengths (Fig. 3, top), the PDFs are broad and relatively flat, indicating that a single 92-day-long observational sample could yield an sTFS value far from the mean of the PDF; in fact, some bootstrap estimates are even of opposite sign relative to the mean, which would imply a fundamentally distinct physical relationship between evaporative fraction and convective triggering. As observational sample length increases, the PDF narrows and tightens around the mean, indicating that estimates far from the mean are substantially less likely to occur with longer samples. This general pattern is repeated for each block length (in each column), indicating that the impact of block length is less critical than the sample length.

Figure 4 synthesizes the results of plots like Fig. 3 for one representative metric from each of the directly observed quantities and land–atmosphere coupling metrics at each of the 126 stations: that is, precipitation for the directly observed variables, the most well-known of the TFS and two-legged-style metrics, and one of the two correlation metrics. (Other metrics within each class produce similar plots and results.) In particular, the x axes in Fig. 4 illustrate the spread among the 1001 bootstrap samples created with 10-day blocks at each station for various sample lengths. The y axes show the mean of the 1001 bootstraps for 25 summers’ (2300 days) worth of data for each station, referred to hereafter as the “Gold Standard,” since we view this as the best estimate of the true value of the quantity assessed. The median values of the bootstrap ensembles for each station are shown with the black circles, with the thick blue lines showing the spread from the 25th to the 75th percentile and the thin blue lines extending across the range of ensemble members not characterized as outliers. Points are drawn as outliers (cyan dots) if they are larger than q3 + w(q3 − q1) or smaller than q1 − w(q3 − q1), where q1 and q3 are the 25th and 75th percentiles, respectively, and w is set to 1.5. While the medians of all the distributions fall along 1:1 lines, the spread is much larger for the short samples, particularly for the three land–atmosphere coupling metrics. The occurrence of such large spreads underscores the uncertainty inherent in estimating these quantities from short observational data samples.
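The outlier rule is the conventional box-plot convention, with fences at q1 − w(q3 − q1) and q3 + w(q3 − q1). A short sketch is given below; the function name `split_outliers` and the use of NumPy's default (linear-interpolation) percentile definition are assumptions, and the plotting software used for Fig. 4 may compute percentiles slightly differently.

```python
import numpy as np

def split_outliers(values, w=1.5):
    """Separate bootstrap estimates into regular points and box-plot outliers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])   # 25th and 75th percentiles
    lower = q1 - w * (q3 - q1)                 # lower fence
    upper = q3 + w * (q3 - q1)                 # upper fence
    is_outlier = (values < lower) | (values > upper)
    return values[~is_outlier], values[is_outlier]

inside, outliers = split_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this example only the value 100 lies beyond the upper fence, so it alone would be drawn as an outlier.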

Fig. 4.

For each of 126 “stations,” the variability of 1001 individual bootstraps (x axis) vs Gold Standard (y axis; the mean value of 1001 bootstraps of 2300-day samples). For each station, the black circle at the center of each line is the median, the inner 50% of data are shown with thick blue lines, the outer 50% of data are shown with thin blue lines, and the outliers are shown with cyan dots. The 1:1 line is in red. Results are from a random sampling of all data during 1979–2012 using a moving-block bootstrap routine with block length of 10 days. (from top to bottom) Mean daily precipitation (mm day−1); correlation between latent heat flux and 2-m air temperature (unitless); two-legged index between soil moisture and latent heat flux (units of soil moisture; kg m−2); and sTFS, measuring the impact of EF variation on the probability of afternoon rain (units of probability of afternoon rain).

To assess the differences between the bootstrap mean of the short-duration bootstraps and the Gold Standard bootstrap mean, we apply the multiple comparison procedure outlined by Stoline (1981). Multiple comparison procedures allow for multiple inferences to be tested simultaneously without artificially inflating the rate of false rejections of a hypothesis as the number of inferences increases. In our case, for each of the 126 stations shown in Fig. 1, we aim to compare the bootstrap distributions of a given quantity (mean precipitation, sTFS, etc.) from each of the shorter sample sets (1, 2, 3, 4, 6, 8, 10, 12, 15, and 20 summers’ worth of data) with those from the Gold Standard sample sets. By the central limit theorem, each set of 1001 bootstrap sample means of directly observed variables should approach a normal distribution, since each bootstrap sample comprises a mean of a large number of observations.

We can summarize the results of the multiple comparison procedure by considering the number of stations at which the null hypothesis of no difference between the bootstrap mean for given sample lengths and that of the Gold Standard is rejected at the 5% significance level (Fig. 5a). For directly observable variables (green curves), single-summer samples yield bootstrap means that are statistically different from the Gold Standard samples at between 4 and 11 of the 126 stations at the 95% confidence level. On the other hand, with just two summers of data, all the stations exhibit population means that are statistically indistinguishable from the population means of the Gold Standard. By contrast, for the metrics assessing relationships between variables, between 65 and 98 of the 126 stations (52%–78%) show bootstrap population means for single-summer samples that differ from the Gold Standard. In fact, 6–8 summers of data are needed for the correlation and two-legged metric values at all stations to converge statistically to the Gold Standard bootstrap mean, while as many as 12 summers are necessary for the triggering metrics to achieve this convergence.

Fig. 5.

Behavior of the bootstrap mean. Data requirements (x axis) as a function of amount of disagreement (y axis) between bootstrap subsamples of data and the 2300-day (25 summers) sample. The disagreement is quantified by the number of stations at which the null hypothesis that the short samples and 2300-day samples do not differ is rejected at the 5% significance level. Total number of stations considered is 126.

Variability of bootstrap members is also influenced by the block length chosen for data sample selection. Figure 6 shows the impact of using 92-day block lengths in the sampling strategy and comparing the PDFs from these samples (i.e., Fig. 3, right) to the Gold Standard determined using 10-day blocks (i.e., bottom of Fig. 3, left). As in Fig. 4, the spread of bootstrap estimates decreases as bootstrap sample length increases. Unlike Fig. 4, however, the median values for the short sample lengths of the three derived metrics are frequently far from the 1:1 (red) line: in fact, for many stations, the median values for short sample lengths are biased high for the correlation metrics and many are biased low for the sTFS and two-legged metrics. These biases largely disappear for record lengths of ~6 summers for the correlation metric and 10 summers for sTFS. For the two-legged index, a few stations continue to show bias in their median values for samples longer than 10 summers.

Fig. 6.

As in Fig. 4, but for 300 individual bootstraps generated with 92-day data blocks (spread along the x axis) vs the Gold Standard for each station (y axis). Here, the Gold Standard is computed as the mean value of the first 300 bootstraps of 2300-day-long samples generated with 10-day data blocks.


To determine if these biases result from comparing bootstrap sample populations made with 92-day blocks to a Gold Standard created with shorter blocks, Fig. 7 shows the spread of the same 92-day block samples (i.e., Fig. 3, right) compared against the mean of the 300 bootstrap samples covering 2300 days and created with 92-day blocks (i.e., bottom of Fig. 3, right). Similar biases are present in Figs. 6 and 7, indicating that individual 92-day blocks can be far from either estimate of the mean. Figures 5b and 5c summarize the multiple comparison procedure results on data shown in Figs. 6 and 7. Comparison of the three plots in the bottom row of Fig. 3 indicates that for this particular station, the means of these long-record PDFs are statistically the same. Figure 5c, however, indicates that this is not true at all stations: for each of the two-legged metrics and for the correlation between latent heat and soil moisture, mean values generated from 300 bootstrap samples of 25-summer-long records are slightly different when the samples are generated with 10-day versus 92-day blocks. Figure 5b shows that even 30-day block lengths can generate some of these mean differences for the two-legged metrics for short sample lengths: the number of stations with a 92-day mean that is statistically different from the Gold Standard mean value is about 70 out of 126 for the two-legged metrics generated from 10-day blocks, 110 for those generated with 30-day blocks, and 120 for those generated with 92-day blocks. These differences are more pronounced in the two-legged metrics than in the TFS-style metrics because the biases present in the two terms contributing to the two-legged metrics (e.g., σSM and the slope of the correlation between soil moisture and latent heat) are both negative, while the two terms contributing to the sTFS have opposing biases: σEF is biased negative for the long-block samples, while the derivative in Eq. (2) is biased positive (not shown).

Fig. 7.

As in Fig. 6, but for the Gold Standard computed as the mean value of the 300 bootstraps of 2300-day-long samples generated with 92-day data blocks.


Figure 8 assesses the severity of the biases in the two-legged and sTFS metrics displayed in Figs. 6 and 7. Figure 8 (top) shows deviations of the median value of the bootstrap samples for each station from the red 1:1 line in the two-legged ILH data shown in Fig. 7. The negative slope of the best-fit regression lines is particularly pronounced for the shortest sample lengths and decreases as sample length increases. These six lines are superimposed in the far-right panel in Fig. 8 (middle), with similar plots for the other variables in Fig. 7 given by the remaining panels. While the correlation metric shows some bias for the single-summer sample, this bias disappears at longer sample lengths. The bias persists longer for sTFS but is not as pronounced as for the two-legged ILH, as discussed above. Similar plots generated with Fig. 6 data are indistinguishable, indicating that the 92-day block length produces similar biases against either Gold Standard.

Fig. 8.

(top) Black dots indicate deviations of median values for each station from the 1:1 line in the Fig. 7 two-legged ILH data. Gold Standard values are on the x axis. Negative deviations indicate that median values from the bootstrap samples are smaller than the Gold Standard value. Red lines are best-fit linear regression lines. (middle) Six regression lines from (top) are shown in the far right for two-legged ILH. Other panels show similar lines for the other variables in Fig. 7. (bottom) As in (middle), but for bootstrap samples with noise (data presented in Fig. 13, described in greater detail below).


Since the results depicted in Fig. 5 are obtained through an assessment of the means of large bootstrap populations of different sample lengths, they cannot be directly related to observational records where only one realization is available. Figure 9 demonstrates that while the mean of the 1001 bootstraps of short duration may not be statistically different from the mean of 1001 bootstraps of long duration, the range of values in the distribution of 1001 bootstraps at each station may be quite sizeable. For example, while Fig. 5 indicates that only a handful of the 126 stations have 92-day sample bootstrap means of surface temperature that are statistically different from the 2300-day sample bootstrap means, the range of 92-day bootstrap estimates exceeds 2°C for all but one of the stations; in fact, many stations exhibit ranges approaching 8°C. Since one does not know if the retrieved value for any individual bootstrap sample is close to the mean or to one of the extremes, the presence of a large range renders it difficult to assess the reliability and robustness of a single estimate. For most applications, an uncertainty of 8°C or even 4°C for June–August mean 2-m air temperature is unacceptable: clearly, longer records would be required here, despite the bootstrap mean behavior plotted in Fig. 5.

Fig. 9.

For nine variables, boxplots showing the difference between the max and the min bootstrap value at each of the 126 stations shown in Fig. 1 from the 1001 bootstrap samples (10-day block length) of a given sample record length. For each record length (x axis), the black circle at the center of each line is the median of the values from the 126 stations, the inner 50% of data are shown with thick blue lines, the outer 50% of data are shown with thin blue lines, and the outliers are shown with dots. Titles on figure panels use SHF and LHF for sensible heat flux H and latent heat flux λE, respectively.


Figure 9 indicates that for most variables the spread among bootstrap members is reduced considerably up to sample lengths of 10 or 12 summers of data; for longer sample lengths, the reduction in spread with increasing sample length is much smaller. Nevertheless, for H, for example, the largest range in bootstrap estimates at any station is about 100 W m−2 at 10 years but closer to 60 W m−2 at 25 years, so even this slow rate of improvement can substantially increase the accuracy of individual bootstrap estimates.

To quantify the relationship between time series length and the reliability of individual observational realizations, we consider the standard error of the mean of a normally distributed variable, $\sigma/\sqrt{n}$, where σ is the population standard deviation and n is the sample length. For directly observable variables such as precipitation, we can determine a minimum value of n needed to estimate the population mean within an acceptable margin of error MOE and for a specified confidence interval. The MOE, like the standard error of the mean, is inversely proportional to the square root of n:
$$\mathrm{MOE} = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \tag{4}$$
where (1 − α) is the confidence level and $z_{\alpha/2}$ is the (positive) Z test value with area (α/2) in the right tail of a standard normal distribution. Equation (4) applies when σ is known and we wish to determine the sample size necessary to establish, with a confidence of (1 − α), the mean value to within ±MOE; in other words, the MOE represents the difference between the observed sample mean and the true population mean that is exceeded α (%) of the time. For our purposes, we assume σ is given by the mean σ of the Gold Standard samples, so we can invert Eq. (4) to obtain the minimum sample size:
$$n_{\min} = \left(\frac{z_{\alpha/2}\,\sigma}{\mathrm{MOE}}\right)^{2}. \tag{5}$$
Figure 10 depicts MOE versus $n_{\min}$ for six quantities, assuming α = 0.05. To account for inaccuracies in the determination of σ, we also depict results for the minimum and maximum σ calculated from all 2300-day bootstrap members (10-day block length) at all stations (dashed lines in Fig. 10). Since daily precipitation and soil moisture are typically lognormally or exponentially distributed, calculations are performed on ln(p) and ln(SM) (see, e.g., Bras 1989). Obviously, what constitutes an acceptable MOE depends on the variable and the application of interest (see, e.g., Entekhabi et al. 2010). For the purposes of Fig. 10, we note that for all six variables, between two and four summers of data are required to ensure that the MOE is less than 5% of the range of the mean values at the 126 stations, that is, the maximum station mean minus the minimum station mean. Some applications may, of course, require much smaller MOEs and thus larger values of $n_{\min}$.
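
Equations (4) and (5) can be evaluated directly, as in the following Python sketch. The values of σ and MOE below are hypothetical placeholders rather than values read from Fig. 10, and n is treated as a count of independent daily samples, which is an assumption: serially correlated data would carry fewer effective samples.

```python
import math
from statistics import NormalDist

DAYS_PER_SUMMER = 92


def margin_of_error(sigma, n, alpha=0.05):
    """Eq. (4): MOE = z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return z * sigma / math.sqrt(n)


def min_sample_size(sigma, moe, alpha=0.05):
    """Eq. (5): smallest integer n with margin_of_error(sigma, n) <= moe."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / moe) ** 2)


sigma = 3.0  # hypothetical population std dev (units of the variable)
moe = 0.5    # hypothetical acceptable margin of error (same units)

n_min = min_sample_size(sigma, moe)
print(f"n_min = {n_min} days, i.e. about {n_min / DAYS_PER_SUMMER:.1f} summers")
print(f"MOE at n_min: {margin_of_error(sigma, n_min):.3f}")
```

Because n enters Eq. (5) through the square of σ/MOE, halving the acceptable MOE quadruples the required record length.
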
Fig. 10.

Data requirements for directly observable variables for a range of acceptable margins of error, relative to the range of mean values across the 126 stations. Solid lines are derived from Eq. (5) with α = 5% and a population std dev taken as the mean over all 126 stations from the full ensemble of 2300-day bootstrap samples; the dashed lines use the min and max σ calculated from all 2300-day bootstrap members at all stations.


Equation (5) for sample size requirements does not directly apply for a derived metric such as sTFS, since all values in a given bootstrap sample are used to compute a single value; that is, regardless of sample length, we obtain only one estimate of the metric. Through bootstrapping we can evaluate the variation between members of a bootstrap sample set. However, we are interested in assessing the degree to which such variation can be attributed to both the length of each bootstrap sample and the number of bootstraps forming the ensemble. Figure 11 addresses this question using a standard score normalized standard deviation NSD of the bootstraps X′, defined as
$$X' = \frac{X - \mu_{GS}}{\sigma_{GS}}, \tag{6}$$
where $\mu_{GS}$ and $\sigma_{GS}$ are the mean and standard deviation of the metric values computed from the 1001 Gold Standard bootstraps at each station. The normalization [Eq. (6)] permits direct comparison of the different variables. For Fig. 11, the 11 black lines shown denote the mean values of X′ at the 126 stations for each of the 11 record lengths sampled, the blue envelopes show the standard score NSD of the bootstraps with 92 days of data for each of the 126 stations, and the green envelopes show this for the bootstraps with 2300 days of data.
Fig. 11.

Number of ensemble members in a bootstrap sample set vs standard score normalized std dev of the bootstraps [X′; Eq. (6)]. The 11 black lines are the mean values of X′ at the 126 stations for each of the 11 record lengths sampled with longer samples associated with lower y-axis values. The blue envelopes show the standard score normalized std dev of the bootstraps with 92 days of data for each of the 126 stations, while the green envelopes show this for the bootstraps with 2300 days of data. For all categories of metrics, the number of observations contributing to each bootstrap sample has a much bigger impact on variability within a sample set than the number of bootstrap members. The variability is largely independent of the number of bootstraps once the ensemble is larger than about 100–300 members.


In general, for each quantity shown in Fig. 11, the horizontal orientation of the black lines indicates that for all sample lengths, the variability is largely independent of the number of bootstraps. For ensembles larger than ~100–300 members, the spread around this mean black line for a given variable does not change. (This result was used to determine that 300 bootstrap members were sufficient for describing the variability produced with 30- and 92-day block lengths.) Ensembles made from longer observational time series have smaller NSDs, meaning that the number of observations contributing to each bootstrap sample has a much bigger impact on variability within a sample set than the number of bootstrap members. The blue envelopes, however, indicate that different variables/metrics have very different spread among bootstraps, even with a large bootstrap number: the three derived metrics manifest 2–3 times the normalized spread of precipitation even as the number of bootstraps approaches 1000. Additionally, all four quantities show that for small numbers of bootstraps, the spread of metric estimates for short sample lengths, that is, the leftmost values in the blue envelope, is highly variable, with the NSD at the smallest x values shown in Fig. 11 ranging from about 1.5 to 11 for precipitation and from 0 to more than 12 for sTFS. (The best estimate of the true value should yield an NSD of 1.) This precludes assessment of whether a metric estimate derived from a short sample with limited realizations is close to the metric’s “true” value, even if the population mean of a large number of realizations of short samples is statistically consistent with the true value.

For all of the metrics analyzed, the shortest sample lengths are associated with the largest spreads, with the mean normalized standard deviation of the bootstraps (black lines in Fig. 11) typically 5–6 times greater for the shortest sample length than for the longest record lengths. This is consistent with the Gold Standard sample length being 25 times the shortest sample length, since, as stated above, the standard error of the mean is given by $\sigma/\sqrt{n}$, so that after standardization a 25-fold reduction in sample length increases the spread by a factor of $\sqrt{25} = 5$. Given that the black lines in Fig. 11 demonstrate that the spread is relatively insensitive to the number of bootstraps in a sample set (for >100–300 bootstraps), we can calculate from the 126 stations the mean and the 5th and 95th percentile values of the normalized standard deviation for a large number of bootstraps (here we use the mean over 950–1000 bootstraps, though results are insensitive to this choice) versus the number of years of data contributing to the samples (Fig. 12).
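
The factor-of-5 scaling can be checked with a short Python sketch. It is only an idealized illustration: independent Gaussian daily values stand in for a directly observable variable, the "metric" is simply the sample mean, and every name and parameter is hypothetical. Each set of sample means is standardized by the mean and standard deviation of the 25-summer set, and the standard deviation of the resulting scores plays the role of the NSD.

```python
import math
import random
import statistics

DAYS_PER_SUMMER = 92
GOLD_SUMMERS = 25


def sample_means(n_summers, n_boot, rng):
    """Means of n_boot synthetic samples of n_summers * 92 iid N(0, 1) days."""
    n_days = n_summers * DAYS_PER_SUMMER
    return [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n_days)])
        for _ in range(n_boot)
    ]


rng = random.Random(0)
n_boot = 300

# Reference ("Gold Standard"-like) set built from the longest records.
gold = sample_means(GOLD_SUMMERS, n_boot, rng)
mu_gs = statistics.fmean(gold)
sigma_gs = statistics.pstdev(gold)

nsd = {}
for n_summers in (1, 4, 9):
    means = sample_means(n_summers, n_boot, rng)
    scores = [(m - mu_gs) / sigma_gs for m in means]  # standard scores
    nsd[n_summers] = statistics.pstdev(scores)        # normalized std dev
    expected = math.sqrt(GOLD_SUMMERS / n_summers)
    print(f"{n_summers} summer(s): NSD = {nsd[n_summers]:.2f}, "
          f"sqrt(25/n) = {expected:.2f}")
```

For such idealized data the NSD should track sqrt(25/n) to within sampling error; departures from that curve for the derived metrics are what point to additional sources of variability.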

Fig. 12.

From the data in Fig. 11 (and similar figures for other variables), the mean (central symbols), 5th (lowest symbols), and 95th (upper-most symbols) percentile values of the normalized std dev from the 126 stations for a large number of bootstraps (mean for 950–1000 bootstraps) vs the number of years of data contributing to the samples. Curved black line is $\sqrt{25/x}$, where x is summers of data, representing the variability stemming from the standard error of the mean and the normalization relative to the 25-summer-long ensemble set (see text for discussion). Symbols are spread out on the x axis for easier readability but are all associated with the nearest integer values along the x axis.


For the single-summer samples, the directly observable quantities exhibit mean standard deviations approximately 5 times that of Gold Standard samples (central symbols on green lines at x = 1). Individual stations have single-summer standard deviations ranging from 4.75 to 5.25 times the mean of the long samples (the low and high symbols on green lines at x = 1). The convection triggering metrics, on the other hand, have a mean standard deviation that is 6–7 times that of 2300-day samples, while individual stations have standard deviations that range from about 5 to 10 times the mean of the long samples. The stations with the 5th percentile convection triggering value (lowest symbol on the black lines) have normalized standard scores equivalent to the mean values from the directly observable variables (middle symbols on the green lines). The correlations and two-legged metrics have individual station standard deviations ranging from about 4.5 to about 6.5 times that of the long samples. The larger values for the derived metrics point to sources of error beyond sample size.

For all variables and metrics, the variability decreases as sample length increases, with the directly observable variables governed by the MOE equation discussed earlier [Eq. (5), shown as the solid black line on Fig. 12]. The range of variability of the correlation metrics and the two-legged metrics across all stations is equivalent to that of the directly observable variables when the data samples exceed 4–6 years, but the convection triggering metrics have only about half of the stations within this range of variability with 6 summers of data, and up to 12 summers are required for the 5th–95th percentile range of variability to match that of the other variable classes.

6. Impact of noise on data requirements

While this sampling exercise with NARR output provides some guidance regarding data length requirements for robust estimates of land–atmosphere coupling metrics, most observational datasets are further subject to instrument errors and sampling biases (see, e.g., Entekhabi et al. 2010). We now investigate the impact of noise on the sample length requirement. Three noise models were applied to generate errors, each performed with sampling in 10-day blocks.

The first noise model involved an additive noise term based on the mean of each directly observable variable (precipitation, soil moisture, latent heat flux, sensible heat flux, and 2-m air temperature). Noise terms for each of these variables were generated from a normal distribution with mean of zero and standard deviations of either 10% or 20% of variable means. This noise model was slightly modified for the flux terms, since for areas with small values of mean sensible heat flux, the evaporative fraction is essentially unchanged by addition of noise. The modification entailed developing a noise term based on the mean value of net radiation, which was then added to λE and subtracted from H. This may, of course, lead to an overestimate of the noise in the sensible heat flux term.

The second noise model involved application of realistic instrument errors, given in Table 1, taken from the work of Phillips and Klein (2013) with data from the ARM Best Estimate dataset [ARMBE; Xie et al. (2010)]. For each daily sampled value of the quantities listed in Table 1, the sample error terms were taken from normal distributions with means of zero and standard deviations equal to the listed RMSE values in Table 1.

Table 1.

Estimated measurement RMSE from Phillips and Klein (2013). Because soil moisture in the NARR dataset is provided in fractional units, a conversion was required: the RMSE for the soil moisture data plotted in Phillips and Klein (2013) is 7%–10% of the range of soil moisture observations, so this 7%–10% was applied to the range of NARR soil moisture data at the location of the ARM site to yield a value of 0.03 fractional units.


Under both the first and second noise models, negative values of positive-definite quantities (precipitation and soil moisture) were rejected, resulting in slight positive biases in the noise for such quantities. Additionally, we removed days with EF outside of (0, 1), leading to some further biases in the noisy datasets for each of these terms.
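
A minimal Python sketch of how rejecting negative values turns otherwise unbiased additive noise into slightly positive noise. It assumes that a rejected value is replaced by redrawing the noise term, which is only one possible reading of the rejection step; the exponential stand-in for daily precipitation and all names are hypothetical.

```python
import random
import statistics


def add_noise_nonnegative(values, noise_sd, rng):
    """Add zero-mean Gaussian noise, redrawing whenever the result is negative."""
    noisy = []
    for v in values:
        candidate = v + rng.gauss(0.0, noise_sd)
        while candidate < 0.0:  # reject negative values of a positive-definite quantity
            candidate = v + rng.gauss(0.0, noise_sd)
        noisy.append(candidate)
    return noisy


rng = random.Random(1)
# Hypothetical stand-in for daily precipitation: exponential with mean 3 (mm/day).
precip = [rng.expovariate(1.0 / 3.0) for _ in range(20000)]

noise_sd = 0.2 * statistics.fmean(precip)  # 20% of the variable mean
noisy = add_noise_nonnegative(precip, noise_sd, rng)

# Mean of the noise that survives the rejection step.
bias = statistics.fmean(noisy) - statistics.fmean(precip)
print(f"noise std dev: {noise_sd:.3f}, mean of retained noise: {bias:.3f}")
```

The retained noise has a positive mean because only values close to zero can be pushed negative, so only their negative draws are discarded.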

To ensure that qualitatively similar results from the noisy experiments are not artifacts of the biases in the noisy samples, a third, multiplicative noise configuration was devised and implemented at two noise amplitudes. For these experiments, we generated normally distributed noisy factor time series fvar of length equal to the length of the observed time series for each directly observed variable var (i.e., soil moisture, precipitation, latent and sensible heat fluxes, and 2-m air temperature), with means of 1.0 and standard deviations of either 0.1 or 0.25. Each observed time series was multiplied by fvar, generating multiplicative noise that is related to the magnitude of the given observation and thus yielding noisy observations that are of the same sign as the original observations. Additionally, this type of noise model avoids generation of unrealistic errors such as a large random noise term on a day with no observed precipitation. For 2-m air temperature, the factor was applied to the anomaly of temperature rather than to its absolute value.
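
The multiplicative configuration can be sketched in a few lines of Python. The synthetic series and function names are hypothetical, and the temperature anomaly is taken relative to the mean of the series, which is an assumption since the reference for the anomaly is not specified above.

```python
import random
import statistics


def multiplicative_noise(values, noise_sd, rng):
    """Multiply each value by a factor drawn from a normal with mean 1."""
    return [v * rng.gauss(1.0, noise_sd) for v in values]


def multiplicative_noise_on_anomaly(values, noise_sd, rng):
    """Apply the factor to anomalies about the series mean (temperature case)."""
    mean = statistics.fmean(values)
    return [mean + (v - mean) * rng.gauss(1.0, noise_sd) for v in values]


rng = random.Random(2)
# Hypothetical daily precipitation with many dry days, and 2-m air temperature (K).
precip = [0.0 if rng.random() < 0.6 else rng.expovariate(1.0 / 5.0)
          for _ in range(2300)]
temp = [295.0 + rng.gauss(0.0, 3.0) for _ in range(2300)]

noisy_precip = multiplicative_noise(precip, 0.25, rng)
noisy_temp = multiplicative_noise_on_anomaly(temp, 0.25, rng)

# A zero observation multiplied by any factor stays zero.
dry_days_preserved = all(n == 0.0 for p, n in zip(precip, noisy_precip) if p == 0.0)
print("dry days stay dry:", dry_days_preserved)
print(f"mean temperature: {statistics.fmean(temp):.2f} -> "
      f"{statistics.fmean(noisy_temp):.2f}")
```

Scaling the noise by the observation itself is what keeps dry days dry, in contrast to an additive term that is independent of the observed value.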

The results presented in Fig. 13 are for the multiplicative noise model of standard deviation 0.25. In fact, all noise models yield qualitatively similar results to those shown in Fig. 13, namely, that the noisy sample populations are not significantly different from the original noise-free populations shown in Fig. 4. However, the lower two quantities, IdailyLH and sTFS, do show a deviation from the 1:1 line, all toward lower median values in the noisy populations than the noise-free populations, suggesting that the introduction of noise degrades derived relationships between observed variables as in these metrics. The two-legged coupling metrics manifest more sensitivity to the noise. Although the deviations from the 1:1 line here are smaller than those for noise-free small sample sizes with consecutive data blocks (Figs. 6, 7), the biases persist even when the sample sizes are very long (25 years), indicating a systematic shift in the metric estimates in the presence of noise. This is clearly demonstrated in Fig. 8 (bottom).
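
The one-sided effect of noise on relationship-based metrics can be illustrated with the classical attenuation of a correlation coefficient. The Python sketch below is not the noise experiment described above: for simplicity it uses additive Gaussian noise on two hypothetical standardized variables, and all names and values are assumptions.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)
n = 20000
# Hypothetical unit-variance variables with a true correlation of 0.8.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in x]

noise_sd = 0.5  # zero-mean noise added independently to both variables
x_noisy = [a + rng.gauss(0.0, noise_sd) for a in x]
y_noisy = [b + rng.gauss(0.0, noise_sd) for b in y]

r_clean = pearson_r(x, y)
r_noisy = pearson_r(x_noisy, y_noisy)
# Attenuation for unit-variance signals: r_noisy ~ r_true / (1 + noise_sd**2).
r_expected = 0.8 / (1.0 + noise_sd ** 2)

print(f"mean of x: {statistics.fmean(x):.3f} -> {statistics.fmean(x_noisy):.3f}")
print(f"correlation: {r_clean:.3f} -> {r_noisy:.3f} (expected about {r_expected:.2f})")
```

The mean is essentially unchanged while the correlation is pulled toward zero, and the shortfall does not shrink as n grows, which mirrors the persistence of the bias at long record lengths.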

Fig. 13.

As in Fig. 4, but for bootstraps with noise (see text for details).


7. Discussion and conclusions

In this work, we apply a systematic sampling framework to the North American Regional Reanalysis (NARR) dataset (Mesinger et al. 2006) in order to quantify the observational data requirements necessary for characterizing a variety of metrics related to land–atmosphere coupling. We show that more data are needed to obtain reliable estimates of metrics assessing the relationship between variables, for example, the triggering feedback strength, which assesses the relationship between before-noon evaporative fraction and afternoon rainfall, than are needed to determine the means of directly observable quantities such as daily precipitation or temperature. Qualitatively similar results are evident for application of the sampling framework to a general circulation model from the Geophysical Fluid Dynamics Laboratory.

We develop estimates of three different classes of metrics of land–atmosphere coupling strength using data from large bootstrap ensembles of differing time series length (ranging from 1 to 25 summers’ worth of data, i.e., 92–2300 days) and multiple sampling methodologies. With these data, we demonstrate that 6–12 summers of observational data are required to yield bootstrap sample means at all stations that do not differ from those derived from the ensembles composed of the longest data records (25 years). Similar analyses with directly observable variables such as daily precipitation or surface temperature suggest that bootstrap population means are consistent with long sample length means with as few as two summers’ worth of data (184 days).

However, for consideration of individual data records of a given sample length, rather than characteristics of means of large bootstrap sample sets, the variability between bootstrap members and acceptable error levels for the application at hand must also be considered. While the mean of large bootstrap sample sets of short data records may be statistically consistent with those of long data records, observed time series provide only a single realization, and the standard error of the mean of directly observable variables is inversely proportional to the square root of the number of observations. Figure 9 shows that this has a dramatic impact on the variability of different bootstrap samples, with the range of estimates from different bootstraps decreasing rapidly as observation time series length increases over the first 10–12 summers, and then decreasing more slowly as the sample length continues to increase.

Derived metrics have additional sources of variability in the relationships between their contributing observed variables, as evidenced by the broader range of population variances seen with derived metrics, particularly from shorter samples (Fig. 12). For the correlation and two-legged classes of derived metrics, the range of the mean of variances of many bootstrap members is similar to that of directly observed variables when the samples have at least 4–6 years of data; for the convection triggering metrics, a broader range of variances is evident for samples shorter than 12 summers (1104 days). Additionally, while unbiased noise does not alter the mean of a directly observable variable, results suggest that it degrades the strength of the connection between variables, yielding a unidirectional, negative impact on metrics assessing this connectivity, which suppresses the estimated metric strength. This reduction of metric strength persists even when the sample size is very large.

This work shows how limited time series length and noise may account for some of the uncertainty in estimates of land–atmosphere coupling. Furthermore, analysis of summer-long (92 day) blocks of data indicates that consecutive data-gathering periods, as opposed to the randomly selected 10-day blocks of data primarily analyzed here, lead to increased variability in all fields, suggesting that our results may be conservative, that is, longer records would be required if the observations of interest are all consecutively collected. This likely stems from single-summer blocks of data spanning less of the parameter space than samples created from multiple data blocks randomly chosen from different summers. We further note that the scale mismatch between point-scale observations and model gridbox averages may also render our results conservative. Indeed, given that the “point” measurements used in this study are really averages over ~30 km × ~30 km grid boxes, one may anticipate even steeper data requirements at the actual point scale. Of course, other potentially significant sources of model–observation difference that we have not considered include deficiencies or errors in model physics or parameterizations, especially the simplified representations of real-world processes that fail to capture all interactions among variables or that introduce spurious relationships. Further research is needed to disentangle these contributions to existing differences between observational and model-based estimates of land–atmosphere coupling.

In terms of designing field campaigns, a choice may be needed between sampling in a single location over long periods versus sampling at multiple locations over short periods. Clearly, this choice is often informed by the types of scientific questions to be addressed as well as practical considerations. In the context of land–atmosphere coupling assessments, we believe our study demonstrates the importance of long-duration observational records. To that end, we note the development of the ARM-SGP test bed of site observations is a key effort in this direction, but long-term observations are also needed at additional sites in different climatological settings. Long-term remote sensing initiatives could begin to address temporal data requirements highlighted by this work while also covering large spatial domains.

Acknowledgments

The authors thank John Lanzante, Ron Stouffer, and Alexis Berg for providing insightful reviews of earlier versions of the manuscript. Helpful conversations with Sergey Malyshev and Keith Dixon were also very much appreciated. Funding support was provided by National Science Foundation Grant NSF-AGS-1035986.

REFERENCES

  • Becker, E., Berbery E. H. , and Higgins R. W. , 2009: Understanding the characteristics of daily precipitation over the United States using the North American Regional Reanalysis. J. Climate, 22, 62686286, doi:10.1175/2009JCLI2838.1.

    • Search Google Scholar
    • Export Citation
  • Berbery, E. H., Luo Y. , Mitchell K. E. , and Betts A. K. , 2003: Eta model estimated land surface processes and the hydrologic cycle of the Mississippi basin. J. Geophys. Res., 108, 8852, doi:10.1029/2002JD003192.

    • Search Google Scholar
    • Export Citation
  • Berg, A., Findell K. L. , Lintner B. R. , Gentine P. , and Kerr C. , 2013: Precipitation sensitivity to surface heat fluxes over North America in reanalysis and model data. J. Hydrometeor., 14, 722–743, doi:10.1175/JHM-D-12-0111.1.

    • Search Google Scholar
    • Export Citation
  • Betts, A. K., and Ball J. H. , 1995: The FIFE surface diurnal cycle climate. J. Geophys. Res., 100, 25 67925 693, doi:10.1029/94JD03121.

    • Search Google Scholar
    • Export Citation
  • Betts, A. K., Viterbo P. , and Wood E. , 1998: Surface energy and water balance for the Arkansas–Red River basin from the ECMWF reanalysis. J. Climate,11, 2881–2897, doi:10.1175/1520-0442(1998)011<2881:SEAWBF>2.0.CO;2.

  • Betts, A. K., Ball J. H. , and Viterbo P. , 1999: Basin-scale surface water and energy budgets for the Mississippi from the ECMWF reanalysis. J. Geophys. Res., 104, 19 29319 306, doi:10.1029/1999JD900056.

    • Search Google Scholar
    • Export Citation
  • Betts, A. K., Fuentes J. , Garstang M. , and Ball J. , 2002: Surface diurnal cycle and boundary layer structure over Rondonia during the rainy season. J. Geophys. Res., 107, 8065, doi:10.1029/2001JD000356.

    • Search Google Scholar
    • Export Citation
  • Betts, A. K., Ball J. H. , Bosilovich M. , Viterbo P. , and Zhang Y. , 2003: Intercomparison of water and energy budgets for five Mississippi subbasins between ECMWF reanalysis (ERA-40) and NASA Data Assimilation Office fvGCM for 1990–1999. J. Geophys. Res.,108, 8618, doi:10.1029/2002JD003127.

  • Betts, A. K., Zhao M. , Dirmeyer P. A. , and Beljaars A. C. M. , 2006: Comparison of ERA40 and NCEP/DOE near-surface data sets with other ISLSCP-II data sets. J. Geophys. Res., 111, D22S04, doi:10.1029/2006JD007174.

    • Search Google Scholar
    • Export Citation
  • Bras, R., 1989: Hydrology: An Introduction to Hydrologic Science. Addison-Wesley, 660 pp.

  • Chen, F., and Zhang Y. , 2009: On the coupling strength between the land surface and the atmosphere: From viewport of surface exchange coefficients. Geophys. Res. Lett., 36, L10404, doi:10.1029/2009GL037980.

    • Search Google Scholar
    • Export Citation
  • Crago, R., 1996: Conservation and variability of the evaporative fraction during the daytime. J. Hydrol., 180, 173194, doi:10.1016/0022-1694(95)02903-6.

    • Search Google Scholar
    • Export Citation
  • Crago, R., and Brutsaert W. , 1996: Daytime evaporation and the self-preservation of the evaporative fraction and the Bowen ratio. J. Hydrol., 178, 241255, doi:10.1016/0022-1694(95)02803-X.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., 2006: The hydrologic feedback pathway for land–climate coupling. J. Hydrometeor., 7, 857867, doi:10.1175/JHM526.1.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., 2011: The terrestrial segment of soil moisture–climate coupling. Geophys. Res. Lett., 38, L16702, doi:10.1029/2011GL048268.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., Koster R. D. , and Guo Z. , 2006: Do global models properly represent the feedback between land and atmosphere? J. Hydrometeor., 7, 11771198, doi:10.1175/JHM532.1.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., Schlosser C. A. , and Brubaker K. L. , 2009: Precipitation, recycling, and land memory: An integrated analysis. J. Hydrometeor., 10, 278288, doi:10.1175/2008JHM1016.1.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., and Coauthors, 2012: Evidence for enhanced land–atmosphere feedback in a warming climate. J. Hydrometeor., 13, 981995, doi:10.1175/JHM-D-11-0104.1.

  • Efron, B., 1979: Bootstrap methods: Another look at the jackknife. Ann. Stat., 7, 1–26, doi:10.1214/aos/1176344552.

  • Ek, M. B., Mitchell K. E. , Lin Y. , Rogers E. , Grunmann P. , Koren V. , Gayno G. , and Tarpley J. D. , 2003: Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta model. J. Geophys. Res., 108, 8851, doi:10.1029/2002JD003296.

  • Entekhabi, D., Reichle R. H. , Koster R. D. , and Crow W. T. , 2010: Performance metrics for soil moisture retrievals and application requirements. J. Hydrometeor., 11, 832–840, doi:10.1175/2010JHM1223.1.

  • Ferguson, C. R., and Wood E. F. , 2011: Observed land–atmosphere coupling from satellite remote sensing and reanalysis. J. Hydrometeor., 12, 1221–1254, doi:10.1175/2011JHM1380.1.

  • Ferguson, C. R., Wood E. F. , and Vinukollu R. K. , 2012: A global intercomparison of modeled and observed land–atmosphere coupling. J. Hydrometeor., 13, 739–784, doi:10.1175/JHM-D-11-0119.1.

  • Findell, K. L., and Eltahir E. , 2003a: Atmospheric controls on soil moisture–boundary layer interactions. Part I: Framework development. J. Hydrometeor., 4, 552–569, doi:10.1175/1525-7541(2003)004<0552:ACOSML>2.0.CO;2.

  • Findell, K. L., and Eltahir E. , 2003b: Atmospheric controls on soil moisture–boundary layer interactions. Part II: Feedbacks within the continental United States. J. Hydrometeor., 4, 570–583, doi:10.1175/1525-7541(2003)004<0570:ACOSML>2.0.CO;2.

  • Findell, K. L., Gentine P. , and Lintner B. , 2011: Probability of afternoon precipitation in eastern United States and Mexico enhanced by high evaporation. Nat. Geosci., 4, 434–439, doi:10.1038/ngeo1174.

  • Gentine, P., Entekhabi D. , Chehbouni A. , Boulet G. , and Duchemin B. , 2007: Analysis of evaporative fraction diurnal behaviour. Agric. For. Meteor., 143, 13–29, doi:10.1016/j.agrformet.2006.11.002.

  • Gentine, P., Entekhabi D. , and Polcher J. , 2011: The diurnal behavior of evaporative fraction in the soil–vegetation–atmospheric boundary layer continuum. J. Hydrometeor., 12, 1530–1546, doi:10.1175/2011JHM1261.1.

  • Gentine, P., Holtslag A. A. M. , D’Andrea F. , and Ek M. , 2013: Surface and atmospheric controls on the onset of moist convection over land. J. Hydrometeor., 14, 1443–1462, doi:10.1175/JHM-D-12-0137.1.

  • Guillod, B. P., and Coauthors, 2014: Land-surface controls on afternoon precipitation diagnosed from observational data: Uncertainties and confounding factors. Atmos. Chem. Phys., 14, 8343–8367, doi:10.5194/acp-14-8343-2014.

  • Koster, R. D., and Coauthors, 2004: Regions of strong coupling between soil moisture and precipitation. Science, 305, 1138–1140, doi:10.1126/science.1100217.

  • Koster, R. D., and Coauthors, 2006: GLACE: The Global Land–Atmosphere Coupling Experiment. Part I: Overview. J. Hydrometeor., 7, 590–610, doi:10.1175/JHM510.1.

  • Künsch, H. R., 1989: The jackknife and the bootstrap for general stationary observations. Ann. Stat., 17, 1217–1241, doi:10.1214/aos/1176347265.

  • Liu, D., Wang G. , Mei R. , Yu Z. , and Gu H. , 2014: Diagnosing the strength of land–atmosphere coupling at subseasonal to seasonal time scales in Asia. J. Hydrometeor., 15, 320–339, doi:10.1175/JHM-D-13-0104.1.

  • Marshall, C. H., Crawford K. C. , Mitchell K. E. , and Stensrud D. J. , 2003: The impact of the land surface physics in the operational NCEP Eta model on simulating the diurnal cycle: Evaluation and testing using Oklahoma Mesonet data. Wea. Forecasting, 18, 748–768, doi:10.1175/1520-0434(2003)018<0748:TIOTLS>2.0.CO;2.

  • Mei, R., and Wang G. , 2012: Summer land–atmosphere coupling strength in the United States: Comparison among observations, reanalysis data, and numerical models. J. Hydrometeor., 13, 1010–1022, doi:10.1175/JHM-D-11-075.1.

  • Mesinger, F., and Coauthors, 2006: North American Regional Reanalysis. Bull. Amer. Meteor. Soc., 87, 343–360, doi:10.1175/BAMS-87-3-343.

  • Miralles, D. G., van den Berg M. J. , Teuling A. J. , and de Jeu R. A. M. , 2012: Soil moisture–temperature coupling: A multiscale observational analysis. Geophys. Res. Lett., 39, L21707, doi:10.1029/2012GL053703.

  • Mitchell, K. E., and Coauthors, 2004a: The multi-institutional North American Land Data Assimilation System (NLDAS): Utilizing multiple GCIP products and partners in a continental distributed hydrological modeling system. J. Geophys. Res., 109, D07S90, doi:10.1029/2003JD003823.

  • Mitchell, K. E., and Coauthors, 2004b: NCEP completes 25-year North American Reanalysis: Precipitation assimilation and land surface are two hallmarks. GEWEX News, Vol. 14, No. 2, International GEWEX Project Office, Silver Spring, MD, 9–12.

  • Mo, K., Chelliah M. , Carrera M. L. , Higgins R. W. , and Ebisuzaki W. , 2005: Atmospheric moisture transport over the United States and Mexico as evaluated in the NCEP regional reanalysis. J. Hydrometeor., 6, 710–728, doi:10.1175/JHM452.1.

  • Phillips, T. J., and Klein S. A. , 2013: Land–atmosphere coupling manifested in warm-season observations on the U.S. Southern Great Plains. J. Geophys. Res. Atmos., 119, 509–528, doi:10.1002/2013JD020492.

  • Roads, J., and Coauthors, 2003: GCIP Water and Energy Budget Synthesis (WEBS). J. Geophys. Res., 108, 8609, doi:10.1029/2002JD002583.

  • Ruane, A. C., 2010a: NARR’s atmospheric water cycle components. Part I: 20-year mean and annual interactions. J. Hydrometeor., 11, 1205–1219, doi:10.1175/2010JHM1193.1.

  • Ruane, A. C., 2010b: NARR’s atmospheric water cycle components. Part II: Summertime mean and diurnal interactions. J. Hydrometeor., 11, 1220–1233, doi:10.1175/2010JHM1279.1.

  • Santanello, J. A., Jr., Friedl M. A. , and Kustas W. , 2005: An empirical investigation of convective planetary boundary layer evolution and its relationship with the land surface. J. Appl. Meteor., 44, 917–932, doi:10.1175/JAM2240.1.

  • Santanello, J. A., Jr., Friedl M. A. , and Ek M. B. , 2007: Convective planetary boundary layer interactions with the land surface at diurnal time scales: Diagnostics and feedbacks. J. Hydrometeor., 8, 1082–1097, doi:10.1175/JHM614.1.

  • Santanello, J. A., Jr., Peters-Lidard C. D. , Kumar S. V. , Alonge C. , and Tao W.-K. , 2009: A modeling and observational framework for diagnosing local land–atmosphere coupling on diurnal time scales. J. Hydrometeor., 10, 577–599, doi:10.1175/2009JHM1066.1.

  • Seneviratne, S. I., Luethi D. , Litschi M. , and Schaer C. , 2006: Land–atmosphere coupling and climate change in Europe. Nature, 443, 205–209, doi:10.1038/nature05095.

  • Spracklen, D. V., Arnold S. R. , and Taylor C. M. , 2012: Observations of increased tropical rainfall preceded by air passage over forests. Nature, 489, 282–285, doi:10.1038/nature11390.

  • Stoline, M. R., 1981: The status of multiple comparisons: Simultaneous estimation of all pairwise comparisons in one-way ANOVA designs. Amer. Stat., 35, 134–141, doi:10.1080/00031305.1981.10479331.

  • Taylor, C. M., de Jeu R. A. M. , Guichard F. , Harris P. P. , and Dorigo W. A. , 2012: Afternoon rain more likely over drier soils. Nature, 489, 423–426, doi:10.1038/nature11377.

  • Xie, S. C., and Coauthors, 2010: ARM climate modeling best estimate data: A new product for climate studies. Bull. Amer. Meteor. Soc., 91, 13–20, doi:10.1175/2009BAMS2891.1.

    • Search Google Scholar
    • Export Citation
Save
  • Becker, E., Berbery E. H. , and Higgins R. W. , 2009: Understanding the characteristics of daily precipitation over the United States using the North American Regional Reanalysis. J. Climate, 22, 62686286, doi:10.1175/2009JCLI2838.1.

    • Search Google Scholar
    • Export Citation
  • Berbery, E. H., Luo Y. , Mitchell K. E. , and Betts A. K. , 2003: Eta model estimated land surface processes and the hydrologic cycle of the Mississippi basin. J. Geophys. Res., 108, 8852, doi:10.1029/2002JD003192.

    • Search Google Scholar
    • Export Citation
  • Berg, A., Findell K. L. , Lintner B. R. , Gentine P. , and Kerr C. , 2013: Precipitation sensitivity to surface heat fluxes over North America in reanalysis and model data. J. Hydrometeor., 14, 722–743, doi:10.1175/JHM-D-12-0111.1.

    • Search Google Scholar
    • Export Citation
  • Betts, A. K., and Ball J. H. , 1995: The FIFE surface diurnal cycle climate. J. Geophys. Res., 100, 25 67925 693, doi:10.1029/94JD03121.

    • Search Google Scholar
    • Export Citation
  • Betts, A. K., Viterbo P. , and Wood E. , 1998: Surface energy and water balance for the Arkansas–Red River basin from the ECMWF reanalysis. J. Climate,11, 2881–2897, doi:10.1175/1520-0442(1998)011<2881:SEAWBF>2.0.CO;2.

  • Betts, A. K., Ball J. H. , and Viterbo P. , 1999: Basin-scale surface water and energy budgets for the Mississippi from the ECMWF reanalysis. J. Geophys. Res., 104, 19 29319 306, doi:10.1029/1999JD900056.

    • Search Google Scholar
    • Export Citation
  • Betts, A. K., Fuentes J. , Garstang M. , and Ball J. , 2002: Surface diurnal cycle and boundary layer structure over Rondonia during the rainy season. J. Geophys. Res., 107, 8065, doi:10.1029/2001JD000356.

    • Search Google Scholar
    • Export Citation
  • Betts, A. K., Ball J. H. , Bosilovich M. , Viterbo P. , and Zhang Y. , 2003: Intercomparison of water and energy budgets for five Mississippi subbasins between ECMWF reanalysis (ERA-40) and NASA Data Assimilation Office fvGCM for 1990–1999. J. Geophys. Res.,108, 8618, doi:10.1029/2002JD003127.

  • Betts, A. K., Zhao M. , Dirmeyer P. A. , and Beljaars A. C. M. , 2006: Comparison of ERA40 and NCEP/DOE near-surface data sets with other ISLSCP-II data sets. J. Geophys. Res., 111, D22S04, doi:10.1029/2006JD007174.

    • Search Google Scholar
    • Export Citation
  • Bras, R., 1989: Hydrology: An Introduction to Hydrologic Science. Addison-Wesley, 660 pp.

  • Chen, F., and Zhang Y. , 2009: On the coupling strength between the land surface and the atmosphere: From viewport of surface exchange coefficients. Geophys. Res. Lett., 36, L10404, doi:10.1029/2009GL037980.

    • Search Google Scholar
    • Export Citation
  • Crago, R., 1996: Conservation and variability of the evaporative fraction during the daytime. J. Hydrol., 180, 173194, doi:10.1016/0022-1694(95)02903-6.

    • Search Google Scholar
    • Export Citation
  • Crago, R., and Brutsaert W. , 1996: Daytime evaporation and the self-preservation of the evaporative fraction and the Bowen ratio. J. Hydrol., 178, 241255, doi:10.1016/0022-1694(95)02803-X.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., 2006: The hydrologic feedback pathway for land–climate coupling. J. Hydrometeor., 7, 857867, doi:10.1175/JHM526.1.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., 2011: The terrestrial segment of soil moisture–climate coupling. Geophys. Res. Lett., 38, L16702, doi:10.1029/2011GL048268.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., Koster R. D. , and Guo Z. , 2006: Do global models properly represent the feedback between land and atmosphere? J. Hydrometeor., 7, 11771198, doi:10.1175/JHM532.1.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., Schlosser C. A. , and Brubaker K. L. , 2009: Precipitation, recycling, and land memory: An integrated analysis. J. Hydrometeor., 10, 278288, doi:10.1175/2008JHM1016.1.

    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., and Coauthors, 2012: Evidence for enhanced land–atmosphere feedback in a warming climate. J. Hydrometeor., 13, 981995, doi:10.1175/JHM-D-11-0104.1.

    • Search Google Scholar
    • Export Citation
  • Efron, B., 1979: Bootstrap methods: Another look at the jackknife. Ann. Stat., 7, 126, doi:10.1214/aos/1176344552.

  • Ek, M. B., Mitchell K. E. , Lin Y. , Rogers E. , Grunmann P. , Koren V. , Gayno G. , and Tarpley J. D. , 2003: Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta model. J. Geophys. Res., 108, 8851, doi:10.1029/2002JD003296.

    • Search Google Scholar
    • Export Citation
  • Entekhabi, D., Reichle R. H. , Koster R. D. , and Crow W. T. , 2010: Performance metrics for soil moisture retrievals and application requirements. J. Hydrometeor., 11, 832840, doi:10.1175/2010JHM1223.1.

    • Search Google Scholar
    • Export Citation
  • Ferguson, C. R., and Wood E. F. , 2011: Observed land–atmosphere coupling from satellite remote sensing and reanalysis. J. Hydrometeor., 12, 12211254, doi:10.1175/2011JHM1380.1.

    • Search Google Scholar
    • Export Citation
  • Ferguson, C. R., Wood E. F. , and Vinukollu R. K. , 2012: A global intercomparison of modeled and observed land–atmosphere coupling. J. Hydrometeor., 13, 739784, doi:10.1175/JHM-D-11-0119.1.

    • Search Google Scholar
    • Export Citation
  • Findell, K. L., and Eltahir E. , 2003a: Atmospheric controls on soil moisture–boundary layer interactions. Part I: Framework development. J. Hydrometeor., 4, 552569, doi:10.1175/1525-7541(2003)004<0552:ACOSML>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Findell, K. L., and Eltahir E. , 2003b: Atmospheric controls on soil moisture–boundary layer interactions. Part II: Feedbacks within the continental United States. J. Hydrometeor., 4, 570583, doi:10.1175/1525-7541(2003)004<0570:ACOSML>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Findell, K. L., Gentine P. , and Lintner B. , 2011: Probability of afternoon precipitation in eastern United States and Mexico enhanced by high evaporation. Nat. Geosci., 4, 434439, doi:10.1038/ngeo1174.

    • Search Google Scholar
    • Export Citation
  • Gentine, P., Entekhabi D. , Chehbouni A. , Boulet G. , and Duchemin B. , 2007: Analysis of evaporative fraction diurnal behaviour. Agric. For. Meteor., 143, 1329, doi:10.1016/j.agrformet.2006.11.002.

    • Search Google Scholar
    • Export Citation
  • Gentine, P., Entekhabi D. , and Polcher J. , 2011: The diurnal behavior of evaporative fraction in the soil–vegetation–atmospheric boundary layer continuum. J. Hydrometeor., 12, 15301546, doi:10.1175/2011JHM1261.1.

    • Search Google Scholar
    • Export Citation
  • Gentine, P., Holtslag A. A. M. , D’Andrea F. , and Ek M. , 2013: Surface and atmospheric controls on the onset of moist convection over land. J. Hydrometeor., 14, 1443–1462, doi:10.1175/JHM-D-12-0137.1.

    • Search Google Scholar
    • Export Citation
  • Guillod, B. P., and Coauthors, 2014: Land-surface controls on afternoon precipitation diagnosed from observational data: Uncertainties and confounding factors. Atmos. Chem. Phys., 14, 83438367, doi:10.5194/acp-14-8343-2014.

    • Search Google Scholar
    • Export Citation
  • Koster, R. D., and Coauthors, 2004: Regions of strong coupling between soil moisture and precipitation. Science, 305, 11381140, doi:10.1126/science.1100217.

    • Search Google Scholar
    • Export Citation
  • Koster, R. D., and Coauthors, 2006: GLACE: The Global Land–Atmosphere Coupling Experiment. Part I: Overview. J. Hydrometeor., 7, 590610, doi:10.1175/JHM510.1.

    • Search Google Scholar
    • Export Citation
  • Künsch, H. R., 1989: The jackknife and the bootstrap for general stationary observations. Ann. Stat., 17, 12171241, doi:10.1214/aos/1176347265.

    • Search Google Scholar
    • Export Citation
  • Liu, D., Wang G. , Mei R. , Yu Z. , and Gu H. , 2014: Diagnosing the strength of land–atmosphere coupling at subseasonal to seasonal time scales in Asia. J. Hydrometeor.,15, 320–339, doi:10.1175/JHM-D-13-0104.1.

  • Marshall, C. H., Crawford K. C. , Mitchell K. E. , and Stensrud D. J. , 2003: The impact of the land surface physics in the operational NCEP Eta model on simulating the diurnal cycle: Evaluation and testing using Oklahoma Mesonet data. Wea. Forecasting, 18, 748768, doi:10.1175/1520-0434(2003)018<0748:TIOTLS>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Mei, R., and Wang G. , 2012: Summer land–atmosphere coupling strength in the United States: Comparison among observations, reanalysis data, and numerical models. J. Hydrometeor., 13, 10101022, doi:10.1175/JHM-D-11-075.1.

    • Search Google Scholar
    • Export Citation
  • Mesinger, F., and Coauthors, 2006: North American Regional Reanalysis. Bull. Amer. Meteor. Soc.,87, 343–360, doi:10.1175/BAMS-87-3-343.

  • Miralles, D. G., van den Berg M. J. , Teuling A. J. , and de Jeu R. A. M. , 2012: Soil moisture–temperature coupling: A multiscale observational analysis. Geophys. Res. Lett., 39, L21707, doi:10.1029/2012GL053703.

    • Search Google Scholar
    • Export Citation
  • Mitchell, K. E., and Coauthors, 2004a: The multi-institutional North American Land Data Assimilation System (NLDAS): Utilizing multiple GCIP products and partners in a continental distributed hydrological modeling system. J. Geophys. Res., 109, D07S90, doi:10.1029/2003JD003823.

    • Search Google Scholar
    • Export Citation
  • Mitchell, K. E., and Coauthors, 2004b: NCEP completes 25-year North American Reanalysis: Precipitation assimilation and land surface are two hallmarks. GEWEX News, Vol. 14, No. 2, International GEWEX Project Office, Silver Spring, MD, 912.

  • Mo, K., Chelliah M. , Carrera M. L. , Higgins R. W. , and Ebisuzaki W. , 2005: Atmospheric moisture transport over the United States and Mexico as evaluated in the NCEP regional reanalysis. J. Hydrometeor., 6, 710728, doi:10.1175/JHM452.1.

    • Search Google Scholar
    • Export Citation
  • Phillips, T. J., and Klein S. A. , 2013: Land–atmosphere coupling manifested in warm-season observations on the U.S. Southern Great Plains. J. Geophys. Res. Atmos., 119, 509–528, doi:10.1002/2013JD020492.

    • Search Google Scholar
    • Export Citation
  • Roads, J., and Coauthors, 2003: GCIP Water and Energy Budget Synthesis (WEBS). J. Geophys. Res., 108, 8609, doi:10.1029/2002JD002583.

  • Ruane, A. C., 2010a: NARR’s atmospheric water cycle components. Part I: 20-year mean and annual interactions. J. Hydrometeor., 11, 12051219, doi:10.1175/2010JHM1193.1.

    • Search Google Scholar
    • Export Citation
  • Ruane, A. C., 2010b: NARR’s atmospheric water cycle components. Part II: Summertime mean and diurnal interactions. J. Hydrometeor., 11, 12201233, doi:10.1175/2010JHM1279.1.

    • Search Google Scholar
    • Export Citation
  • Santanello, J. A., Jr., Friedl M. A. , and Kustas W. , 2005: An empirical investigation of convective planetary boundary layer evolution and its relationship with the land surface. J. Appl. Meteor., 44, 917932, doi:10.1175/JAM2240.1.

    • Search Google Scholar
    • Export Citation
  • Santanello, J. A., Jr., Friedl M. A. , and Ek M. B. , 2007: Convective planetary boundary layer interactions with the land surface at diurnal time scales: Diagnostics and feedbacks. J. Hydrometeor., 8, 10821097, doi:10.1175/JHM614.1.

    • Search Google Scholar
    • Export Citation
  • Santanello, J. A., Jr., Peters-Lidard C. D. , Kumar S. V. , Alonge C. , and Tao W.-K. , 2009: A modeling and observational framework for diagnosing local land–atmosphere coupling on diurnal time scales. J. Hydrometeor., 10, 577599, doi:10.1175/2009JHM1066.1.

    • Search Google Scholar
    • Export Citation
  • Seneviratne, S. I., Luethi D. , Litschi M. , and Schaer C. , 2006: Land–atmosphere coupling and climate change in Europe. Nature, 443, 205209, doi:10.1038/nature05095.

    • Search Google Scholar
    • Export Citation
  • Spracklen, D. V., Arnold S. R. , and Taylor C. M. , 2012: Observations of increased tropical rainfall preceded by air passage over forests. Nature, 489, 282285, doi:10.1038/nature11390.

    • Search Google Scholar
    • Export Citation
  • Stoline, M. R., 1981: The status of multiple comparisons: Simultaneous estimation of all pairwise comparisons in one-way ANOVA designs. Amer. Stat., 35, 134141, doi:10.1080/00031305.1981.10479331.

    • Search Google Scholar
    • Export Citation
  • Taylor, C. M., de Jeu R. A. M. , Guichard F. , Harris P. P. , and Dorigo W. A. , 2012: Afternoon rain more likely over drier soils. Nature, 489, 423426, doi:10.1038/nature11377.

    • Search Google Scholar
    • Export Citation
  • Xie, S. C., and Coauthors, 2010: ARM climate modeling best estimate data: A new product for climate studies. Bull. Amer. Meteor. Soc., 91, 1320, doi:10.1175/2009BAMS2891.1.

    • Search Google Scholar
    • Export Citation
  • Fig. 1.

    Locations of grid points with extensive sampling. Data from circled grid points are shown in Figs. 2 and 3.

  • Fig. 2.

    (a) Blue line shows 12 years of June–August sample daily rainfall data from a grid cell in northeastern New Mexico. Red line shows 31-day moving average of daily values. Below are schematics of bootstrap sampling protocol for block lengths of (b) 10, (c) 30, and (d) 92 days and for total bootstrap sample lengths of (left) 92 and (right) 920 days.

  • Fig. 3.

    For a sample grid cell in northwestern Mexico, impact of block size (left) 10, (middle) 30, or (right) 92 days, and bootstrap sample length (from top to bottom) nobs = 92, 276, 552, 920, 1380, or 2300 days, on the estimate of sTFS. Each histogram contains 300 bootstrap samples for the given block size and sample length.

  • Fig. 4.

    For each of 126 “stations,” the variability of 1001 individual bootstraps (x axis) vs Gold Standard (y axis; the mean value of 1001 bootstraps of 2300-day samples). For each station, the black circle at the center of each line is the median, the inner 50% of data are shown with thick blue lines, the outer 50% of data are shown with thin blue lines, and the outliers are shown with cyan dots. The 1:1 line is in red. Results are from a random sampling of all data during 1979–2012 using a moving-block bootstrap routine with block length of 10 days. (from top to bottom) Mean daily precipitation (mm day⁻¹); correlation between latent heat flux and 2-m air temperature (unitless); two-legged index between soil moisture and latent heat flux (units of soil moisture; kg m⁻²); and sTFS, measuring the impact of EF variation on the probability of afternoon rain (units of probability of afternoon rain).

  • Fig. 5.

    Behavior of the bootstrap mean. Data requirements (x axis) as a function of amount of disagreement (y axis) between bootstrap subsamples of data and the 2300-day (25 summers) sample. The disagreement is quantified by the number of stations at which the null hypothesis that the short samples and 2300-day samples do not differ is rejected at the 5% significance level. Total number of stations considered is 126.

  • Fig. 6.

    As in Fig. 4, but for 300 individual bootstraps generated with 92-day data blocks (spread along the x axis) vs the Gold Standard for each station (y axis). Here, the Gold Standard is computed as the mean value of the first 300 bootstraps of 2300-day-long samples generated with 10-day data blocks.

  • Fig. 7.

    As in Fig. 6, but for the Gold Standard computed as the mean value of the 300 bootstraps of 2300-day-long samples generated with 92-day data blocks.

  • Fig. 8.

    (top) Black dots indicate deviations of median values for each station from the 1:1 line in the Fig. 7 two-legged ILH data. Gold Standard values are on the x axis. Negative deviations indicate that median values from the bootstrap samples are smaller than the Gold Standard value. Red lines are best-fit linear regression lines. (middle) Six regression lines from (top) are shown in the far right for two-legged ILH. Other panels show similar lines for the other variables in Fig. 7. (bottom) As in (middle), but for bootstrap samples with noise (data presented in Fig. 13, described in greater detail below).

  • Fig. 9.

    For nine variables, boxplots showing the difference between the max and the min bootstrap value at each of the 126 stations shown in Fig. 1 from the 1001 bootstrap samples (10-day block length) of a given sample record length. For each record length (x axis), the black circle at the center of each line is the median of the values from the 126 stations, the inner 50% of data are shown with thick blue lines, the outer 50% of data are shown with thin blue lines, and the outliers are shown with dots. Titles on figure panels use SHF and LHF for sensible heat flux H and latent heat flux λE, respectively.

  • Fig. 10.

    Data requirements for directly observable variables for a range of acceptable margins of error, relative to the range of mean values across the 126 stations. Solid lines are derived from Eq. (5) with α = 5% and a population std dev taken from the mean of all 126 stations from the full ensemble of 2300-day bootstrap samples; the dashed lines use the min and max σ calculated from all 2300-day bootstrap members at all stations.

  • Fig. 11.

    Number of ensemble members in a bootstrap sample set vs standard score normalized std dev of the bootstraps [X′; Eq. (6)]. The 11 black lines are the mean values of X′ at the 126 stations for each of the 11 record lengths sampled with longer samples associated with lower y-axis values. The blue envelopes show the standard score normalized std dev of the bootstraps with 92 days of data for each of the 126 stations, while the green envelopes show this for the bootstraps with 2300 days of data. For all categories of metrics, the number of observations contributing to each bootstrap sample has a much bigger impact on variability within a sample set than the number of bootstrap members. The variability is largely independent of the number of bootstraps once the ensemble is larger than about 100–300 members.

  • Fig. 12.

    From the data in Fig. 11 (and similar figures for other variables), the mean (central symbols), 5th (lowest symbols), and 95th (upper-most symbols) percentile values of the normalized std dev from the 126 stations for a large number of bootstraps (mean for 950–1000 bootstraps) vs the number of years of data contributing to the samples. Curved black line is , where x is summers of data, representing the variability stemming from the standard error of the mean and the normalization relative to the 25-summer-long ensemble set (see text for discussion). Symbols are spread out on the x axis for easier readability but are all associated with the nearest integer values along the x axis.

  • Fig. 13.

    As in Fig. 4, but for bootstraps with noise (see text for details).
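
    The moving-block bootstrap referred to in the captions of Figs. 2–4 can be sketched as follows. This is a minimal illustration, not the authors' routine: the function names, the synthetic gamma-distributed "rainfall," and the choice to let blocks start at any day of one concatenated series (rather than being aligned to individual summers) are all assumptions made for the example.

```python
import numpy as np

def moving_block_bootstrap(series, block_len, sample_len, rng):
    """Build one bootstrap sample by concatenating randomly chosen contiguous blocks.

    Hypothetical helper: blocks may overlap and may start on any day of the
    concatenated series, which is a simplification of a summer-by-summer record.
    """
    series = np.asarray(series, dtype=float)
    n = series.size
    if block_len > n:
        raise ValueError("block_len exceeds the length of the series")
    n_blocks = -(-sample_len // block_len)  # ceiling division
    # Start indices are drawn so that every block fits inside the record.
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()
    return series[idx[:sample_len]]  # trim the last block if it overshoots

def bootstrap_means(series, block_len, sample_len, n_boot, seed=0):
    """Mean of each of n_boot moving-block bootstrap samples."""
    rng = np.random.default_rng(seed)
    return np.array([
        moving_block_bootstrap(series, block_len, sample_len, rng).mean()
        for _ in range(n_boot)
    ])

# Synthetic stand-in for 12 summers (12 x 92 days) of daily rainfall; not real data.
demo_rng = np.random.default_rng(42)
rain = demo_rng.gamma(shape=0.5, scale=4.0, size=12 * 92)

# 10-day blocks, total sample lengths of 92 and 920 days (cf. Fig. 2).
short = bootstrap_means(rain, block_len=10, sample_len=92, n_boot=300)
long_ = bootstrap_means(rain, block_len=10, sample_len=920, n_boot=300)
print("spread of bootstrap means, 92-day samples :", short.std())
print("spread of bootstrap means, 920-day samples:", long_.std())
```

    Under these assumptions the spread of the bootstrap means narrows as the total sample length grows, which is the qualitative behavior the histograms in Fig. 3 illustrate for sTFS; a metric based on a relationship between variables would replace the `.mean()` call with the metric of interest.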
