Reanalysis products are widely used to study the land–atmosphere exchanges of energy, water, and carbon fluxes and have been evaluated using in situ data above or below ground. Here, measurements for several years at five flux tower sites in the United States (with a total of 315 576 h of data) are used for the coupled evaluation of both below- and aboveground processes from three global reanalysis products and six global land data assimilation products. All products show systematic errors in precipitation, snow depth, and the timing of the melting and onset of snow. Despite the biases in soil moisture, all products show significant correlations with observed daily soil moisture for the periods with unfrozen soil. While errors in 2-m air temperature are highly correlated with errors in skin temperature for all sites, the correlations between skin and soil temperature errors are weaker, particularly over the sites with seasonal snow. While net short- and longwave radiation flux errors have opposite signs across all products, the net radiation and ground heat flux errors are usually smaller in magnitude than turbulent flux errors. On the other hand, the all-product averages usually agree well with the observations on the evaporative fraction, defined as the ratio of latent heat over the sum of latent and sensible heat fluxes. This study identifies the strengths and weaknesses of these widely used products and helps understand the connection of their errors in above- versus belowground quantities.
Because of a lack of globally consistent observational data, reanalysis products have become invaluable tools in the earth sciences (Kalnay et al. 1996). Reanalysis products are generated by assimilating remotely sensed and in situ data into global models to produce gridded datasets that provide a best possible representation of the state of the climate system (Balsamo et al. 2013; Reichle et al. 2011; Rienecker et al. 2011; Dee et al. 2011; Saha et al. 2010; Rodell et al. 2004). However, they are still subject to errors contained in the model physics. These products are also often used to drive land surface models (LSMs; Qian et al. 2006), which are in turn used for the production of gridded datasets used in climate studies (Bengtsson et al. 2004) and to force hydrologic models (Lauri et al. 2014). The increased use of reanalysis products in the scientific community combined with their potential for inherent errors has dictated the need for a careful evaluation of their skill (Wang and Zeng 2012; Decker et al. 2012; Reichle et al. 2011; Mao et al. 2010).
For instance, Decker et al. (2012) used flux tower measurements to perform a comprehensive evaluation of the 6-hourly and monthly outputs of atmospheric variables used to force LSMs along with near-surface turbulent energy and water fluxes. This was done for reanalysis products from the National Aeronautics and Space Administration (NASA) Global Modeling and Assimilation Office (GMAO), the National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP), and the European Centre for Medium-Range Weather Forecasts (ECMWF). They found that overall the ECMWF interim reanalysis (ERA-Interim) performed the best, with at least part of its accuracy owing to the assimilation of 2-m air temperature. However, they showed that the Climate Forecast System Reanalysis (CFSR) outperforms the ERA-Interim, the Modern-Era Retrospective Analysis for Research and Applications (MERRA), and the Global Land Data Assimilation System (GLDAS) in its representation of 6-hourly latent heat flux; this is attributable to CFSR’s use of observed precipitation in the land data assimilation. Additionally, they showed that GLDAS performs the best for producing precipitation totals because of the use of observed precipitation data as a forcing field for LSMs.
Reichle et al. (2011) assessed the skill (defined as the correlation of the bias of product time series from the observed climatology) of MERRA-Land, MERRA, and ERA-Interim compared with in situ observations. The assessment was conducted for root-zone soil moisture, snow depth, snow water equivalent (SWE), and runoff. They showed that the skill of MERRA and MERRA-Land is typically higher than that of ERA-Interim with MERRA-Land typically performing slightly better than MERRA. These results are attributed to the use of observation-corrected precipitation in MERRA-Land and the model physics improvement compared with MERRA. Wang and Zeng (2012) evaluated daily and monthly values of six reanalysis products over the Tibetan Plateau using 62 weather stations. They found that no one reanalysis product could be considered superior to any other for all variables or time scales and thus suggested that multiple products be used in weather and climate studies of the region.
Evaluation studies conducted so far either included a large collection of reanalyses and focused entirely on atmospheric outputs (Decker et al. 2012) or examined above- and belowground fields with only a small number of products (Reichle et al. 2011). However, the question still remains: how do coupled (surface–atmosphere) and offline (noncoupled) land products compare with one another and with observations in their outputs of above- and belowground fields?
The above evaluation studies focused on the atmospheric variables predominately used in the forcing of LSMs; furthermore, these studies focused on hourly or monthly time scales. However, the output fields from LSMs (e.g., soil moisture and temperature, snow depth, and SWE) have a strong seasonal variability and are of particular importance in studying the phenomena associated with the hydrologic cycle. For instance, Betts et al. (2014) found that, over the Canadian Prairies, snow cover acts as a rapid catalyst for seasonal climatic transitions. They showed that near-surface air temperature falls by 10 K within days of the surface experiencing snow cover in the autumn, with a similar rise in temperature corresponding to the melting of snow in the spring. They also showed that, with a 10% drop in the number of days experiencing snow cover, the mean winter temperature there rose by 1.4 K.
Nigam and Ruiz-Barradas (2006) performed a seasonal evaluation of the 40-yr ECMWF Re-Analysis (ERA-40) along with the NCEP–NCAR reanalyses, the North American Regional Reanalysis (NARR), and several climate models using gridded and remotely sensed data. They found that the difference between summer and winter product errors of surface air temperature is 5–9 K larger than observations for climate models and upward of 4 K larger for reanalysis, suggesting that the reanalyses and models are more sensitive to the seasonal change than nature.
The importance of the seasonal variations highlighted by Betts et al. (2014) combined with inherent errors of reanalysis products highlighted by Nigam and Ruiz-Barradas (2006) suggests that an evaluation of the seasonal behavior of the land surface components of reanalysis products and the recently developed offline land-based reanalyses is needed. More explicitly, two questions arise: Are the known seasonally dependent errors of Nigam and Ruiz-Barradas (2006) present in below-surface fields as well? Are these seasonal errors linked to the transition between snow-covered and snow-free periods?
This study performs just such an evaluation utilizing six reanalysis products: MERRA, MERRA-Land, ERA-Interim, ERA-Land, CFSR, and GLDAS. The evaluation is conducted using flux towers along with stations from the Cooperative Observer Program (COOP) network over the continental United States. The focus is on the seasonal (from summer to winter and vice versa) transition of near-surface variables: air and skin temperatures, precipitation, snow depth, SWE, surface energy and radiation fluxes, and below-surface variables of soil temperature and moisture.
2. Data description and methods
a. In situ observation
In situ measurements for this study were taken from five flux tower sites from the AmeriFlux network within the FLUXNET database (http://www.fluxnet.ornl.gov; Baldocchi et al. 2001). FLUXNET is a global network of flux tower sites that use eddy covariance techniques to make measurements of water, energy, and CO2 exchanges. This study analyzes the non-gap-filled level two data for 2-m air temperature, skin temperature, soil temperature and moisture, and surface energy and radiation fluxes from the FLUXNET sites. Details of the sites used are provided in Table 1, while a map of all sites used can be seen in Fig. S1 of the supplemental material. The sites were chosen to represent a variety of land-cover types while being located in areas of sufficiently homogeneous land cover and terrain to be reasonably representative of an entire model grid box for a given reanalysis product. Sites were also chosen to have at least 4 years of continuous data between 2000 and 2007. All sites have either half-hourly or hourly data output frequencies; these are averaged to create a 6-hourly time series so as to be directly comparable with all reanalysis fields.
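The 6-hourly averaging described above can be sketched as follows. This is only an illustration, not the processing code used in the study: the function name, the alignment of blocks to 0000 UTC, and the `min_fraction` rule for handling gaps in the non-gap-filled data are all assumptions introduced here.

```python
from math import isnan

def six_hourly_means(values, steps_per_hour, min_fraction=0.5):
    """Average a regularly spaced series into 6-hourly blocks.

    values: samples assumed to start at 0000 UTC; gaps are float('nan').
    steps_per_hour: 1 for hourly data, 2 for half-hourly data.
    min_fraction: hypothetical minimum share of valid samples per block.
    """
    block = 6 * steps_per_hour
    means = []
    for start in range(0, len(values) - block + 1, block):
        valid = [v for v in values[start:start + block] if not isnan(v)]
        if len(valid) >= min_fraction * block:
            means.append(sum(valid) / len(valid))
        else:
            means.append(float("nan"))  # too many gaps: keep the block missing
    return means
```

Blocks with too few valid samples stay missing rather than being filled, which keeps the comparison restricted to times with actual observations.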
These flux tower sites, however, do not provide the in situ snow depth and precipitation measurements that are crucial for our analysis. These observations are therefore taken from the National Weather Service (NWS) COOP (data accessed from http://rda.ucar.edu/datasets/ds510.0/; National Weather Service 1989). COOP data incorporate observations from NWS principal climatological stations, the Federal Aviation Administration, the National Park Service, the Bureau of Land Management, and the U.S. Geological Survey. COOP data are available dating back to 1854 and cover the United States, the Virgin Islands, Puerto Rico, and assorted Pacific Islands. Observations of daily snow depth and precipitation are used for this study. The COOP archiving system considers snow depth less than 0.5 in. (12.7 mm) and precipitation less than 0.005 in. (0.127 mm) to be trace amounts and records them as 0 in.
COOP stations employed here are within 60 km of a flux tower and are at an elevation within 105 m of the tower elevation. Stations are also chosen with similar topographic and land-cover profiles as the flux tower. Data from these stations are then averaged together to create a single dataset (Fig. S1 in the supplemental material).
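As an illustration of the distance and elevation criteria above, the following sketch filters candidate stations around a tower. The dictionary keys (`lat`, `lon`, `elev_m`) and function names are hypothetical, and the great-circle (haversine) distance is an assumption, since the text does not state how distance was measured. The topographic and land-cover similarity check is a judgment step that is not encoded here.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def select_stations(tower, stations, max_km=60.0, max_dz_m=105.0):
    """Keep stations within max_km and max_dz_m of the tower."""
    return [
        s for s in stations
        if distance_km(tower["lat"], tower["lon"], s["lat"], s["lon"]) <= max_km
        and abs(s["elev_m"] - tower["elev_m"]) <= max_dz_m
    ]
```

The daily records of the selected stations would then be averaged into a single series, as described in the text.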
To help address the spatial representativeness issue of in situ snow measurements, we also use the gridded snow depth product from the Canadian Meteorological Centre (CMC; Brown and Brasnett 2010). This dataset is on a 24-km polar stereographic grid and covers the period from 1998 to 2013 for the Northern Hemisphere and contains daily fields of snow depth. The dataset is an interpolation of a combination of data from synoptic observations, meteorological aviation reports, and special aviation reports. (Data are available at http://nsidc.org/data/nsidc-0447#.)
b. Reanalysis products
A short description of the surface–atmosphere coupled and land surface offline reanalysis products used in this study follows, highlighting the major differences between the products primarily in the near-surface and subsurface aspects of the products and their assimilation of near surface forcing fields (e.g., observed and satellite-derived precipitation). More detailed descriptions are available from the relevant references.
To address the issue of a lack of assimilated land surface data, ECMWF and GMAO have developed offline (not coupled to the atmosphere) reruns of their reanalysis products: ERA-Interim/Land (hereafter ERA-Land; Balsamo et al. 2013) and MERRA-Land (Reichle et al. 2011), respectively. The reruns utilize atmospheric output fields from the original reanalysis products along with observed or observation-corrected precipitation data to force improved LSMs, in order to better capture surface and subsurface fields and the complete hydrologic cycle. For the same reason, GLDAS products are also used here.
MERRA (available at http://mirador.gsfc.nasa.gov/) is a NASA-produced product covering the satellite era, from 1979 to present, and is available in near–real time. MERRA makes use of the Goddard Earth Observing System, version 5 (GEOS-5), atmospheric general circulation model (AGCM; Rienecker et al. 2011) to produce hourly outputs at a native latitude–longitude resolution of ½° × ⅔°. It also uses a catchment-based land model (Koster et al. 2000).
MERRA-Land is forced entirely with output surface meteorological fields from MERRA, with the exception that the MERRA-Land precipitation is corrected toward gauge measurements via the Global Precipitation Climatology Project, version 2.1 (GPCP v2.1; Reichle et al. 2011). Furthermore, MERRA-Land benefits from a set of improved hydrological treatments, particularly related to rainfall interception that allows a greater amount of rainfall to reach the soil. Covering the same time period with the same spatial and temporal frequency as MERRA, MERRA-Land introduces several additional hydrological and subsurface fields (e.g., soil temperature). (MERRA-Land data can be found along with MERRA data at http://mirador.gsfc.nasa.gov/.)
ERA-Interim (available at http://apps.ecmwf.int/datasets/) is the land–atmosphere coupled reanalysis product from ECMWF (Simmons et al. 2006). It covers the period from 1979 to present, with a spectral resolution of T255 (~79 km). ERA-Interim employs a four-dimensional variational data assimilation (4DVar), which is more advanced than the three-dimensional variational data assimilation (3DVar) employed by all other reanalysis products utilized in this study. Observed in situ 2-m air temperature and humidity are also assimilated, while no observational precipitation rates or amounts are assimilated. ERA-Interim uses a 12-h assimilation period. All variables analyzed here are taken at 6-hourly intervals. Because analysis fields for surface energy fluxes (latent and sensible heat, along with radiative fluxes at the surface) and precipitation are not available, they are taken from the 6-h intervals of forecast fields initiated at 0000 UTC (Simmons et al. 2006).
In much the same style as MERRA-Land, ERA-Land is an offline reanalysis product with meteorological forcing provided by ERA-Interim and operating with the same temporal and spatial resolution as ERA-Interim (Balsamo et al. 2010). As with MERRA-Land, ERA-Land benefits from a bias correction of ERA-Interim precipitation, using GPCP v2.1. ERA-Land also uses a newer version of the LSM within ERA-Interim, with improved snow treatment (Dutra et al. 2010), a satellite-based vegetation climatology (Boussetta et al. 2013), an improved bare soil evaporation scheme (Balsamo et al. 2011), and improved soil hydrology (Balsamo et al. 2009). As with ERA-Interim, fields are used at 6-h intervals, with sensible heat, latent heat, net radiation, and precipitation fields taken from the reanalysis forecast. (ERA-Land can be accessed at http://apps.ecmwf.int/datasets/ along with the ERA-Interim data.)
GLDAS (accessible at http://mirador.gsfc.nasa.gov/) is an offline land data assimilation product that uses a single set of atmospheric forcing fields to drive four different LSMs (Rodell et al. 2004). These forcing data come from a variety of sources, including the Princeton global meteorological forcing dataset (a bias-corrected reanalysis product), and observational rain rates from several satellite observing systems (for a complete list, see http://ldas.gsfc.nasa.gov/gldas/GLDASforcing.php). The four LSMs are Mosaic (Koster and Suarez 1996), the Community Land Model (CLM; Dai et al. 2003), Noah (Ek et al. 2003), and the Variable Infiltration Capacity model (VIC; Liang et al. 1996). GLDAS outputs 3-hourly data covering the satellite era, from 1979 to present. Output is available at several spatial resolutions; however, only the 1° × 1° output for each LSM is analyzed in this study (Rodell et al. 2004).
The reanalysis product from NCEP (CFSR) also covers the period from 1979 to present (Saha et al. 2010). CFSR is an atmosphere–ocean coupled reanalysis at T382 (~38 km) horizontal resolution. CFSR uses gauge-based gridded precipitation data to drive its Noah LSM. The data assimilation process operates on a 6-hourly cycle, and the product outputs forecast data at each hour between the analysis periods (Saha et al. 2010). However, only the 6-hourly analysis outputs are considered for this study. (CFSR data are available from several locations, including http://rda.ucar.edu/datasets/ds093.0/.)
Not every product produces all fields of interest for this study; Table S1 in the supplemental material shows which variables are analyzed from which products. Three separate groups of fields will be analyzed: water cycle including precipitation, snow depth, and soil moisture; temperature including soil, skin, and 2-m air temperatures; and surface energy fluxes, including sensible, latent, and ground heat fluxes, as well as up- and downward short- and longwave radiation fluxes.
3. Results
a. Water cycle
For all sites there are clear patterns in the temporal variation of water cycle variable errors. Products tend to misrepresent both the peak snow depth and the snow period, as can be seen in Figs. 1–3 (top-center). The onset of the snow period is taken as the first day with a snow depth greater than 1.3 cm or SWE greater than 0.13 cm (chosen to correspond to the minimum recorded values at the COOP stations) that is followed by more than 3 days with snow on the ground within the next 14 days. This is done so as to exclude small, isolated early season storms not corresponding to the onset of winter snow cover. The snow period is defined as having ended when the depth or SWE values drop below the above thresholds and stay there for a period of 14 days.
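The onset and end criteria above can be expressed as a short sketch. It reflects one possible reading of the definitions (counting snow days among the 14 days after a candidate onset, and requiring 14 consecutive days at or below the threshold for the end); the function and parameter names are hypothetical and this is not the authors' code.

```python
def snow_period(depth_cm, threshold=1.3, min_snow_days=3, window=14):
    """Return (onset_index, end_index) of the snow period, or None.

    depth_cm: daily snow depth in cm (use threshold=0.13 for SWE in cm).
    """
    snow = [d > threshold for d in depth_cm]
    onset = None
    for i, has_snow in enumerate(snow):
        # onset: a snow day followed by more than min_snow_days snow days
        # within the next `window` days
        if has_snow and sum(snow[i + 1:i + 1 + window]) > min_snow_days:
            onset = i
            break
    if onset is None:
        return None
    for j in range(onset + 1, len(snow) - window + 1):
        # end: first day from which no snow is recorded for `window` days
        if not any(snow[j:j + window]):
            return onset, j
    return onset, None  # snow period does not end within the record
```

With this reading, a single early storm followed by bare ground is skipped, and the onset is assigned to the first day of persistent snow cover.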
Table 2 shows the discrepancies in the snow fields for the three sites with snow [Wisconsin (WI), Montana (MT), and Indiana (IN)]. Because snow depth is only provided for MERRA and ERA, timing of snowfall and snowmelt is taken from the SWE fields of other products. The MERRA and MERRA-Land products have the greatest errors in the timing of first snowfall and snowmelt and consistently overestimate the snow period, with delays on the order of months. The GLDAS products capture the timing of snow well, as they are all driven by the same observed precipitation data. It is still surprising, however, that the different criteria used to separate snow from rain in the four land models (CLM, Mosaic, Noah, and VIC) in GLDAS lead to snowfall onset dates that differ by up to 29 days (over the MT site between CLM and VIC). The peak snow depth error is within 6 cm at MT, but can be as large as 13 cm at IN (by MERRA-Land) and 18 cm in magnitude at WI (by ERA-Interim; Table 2).
While the in situ observations are based on the average of several COOP sites (four at WI, four at MT, and three at IN), they still do not represent the gridbox average. The differences between in situ data and CMC results can be used to roughly estimate the uncertainty due to spatial representativeness issues. On this basis, Table 2 shows that most products capture the first snowfall date within 2 weeks (except MERRA and MERRA-Land, CFSR at WI, and CLM at MT). In contrast, all products except GLDAS-VIC perform poorly (errors of at least 18 days, with several errors exceeding 30 days) in capturing the snow end date over at least one site. Overall, ERA-Land performs best in the maximum snow depth. However, no single product performs best in snowfall, snowmelt, and snow depth at every location, indicating that an analysis of multiple products would be beneficial when conducting studies related to snow duration and depth.
The precipitation differences between various products and the in situ data are presented in Figs. 1–4 (top-left). They are also summarized in Table 3. Some of these differences may be partially caused by the scale mismatch between point measurements and gridbox averages. However, even the differences between different products are very large [e.g., 4.2 mm day−1 over the Florida (FL) site between the different global precipitation products in GLDAS vs CFSR in summer]. As was also shown by Decker et al. (2012) and Bosilovich et al. (2009), the precipitation differences tend to be greater in magnitude for the June–August (JJA) period than for the December–February (DJF) period (Table 3), likely because of large convective storms in summer, as can be seen in Figs. 1–5. Bosilovich et al. (2009) showed similar results using model ensembles (not reanalysis products). This, along with good precipitation data coverage over the United States, suggests that this bias is more a product of model physics than data assimilation.
In general, SWE is driven by snowfall and constrained by sublimation and snowmelt. SWE is linked to snow depth via the snow density. Therefore, large errors in precipitation in a product are likely to indicate large errors in snow depth. However, several products actually show opposite signs in precipitation and snow depth errors. For example, all products show a negative November precipitation bias in MT, but the multiproduct mean shows a positive snow depth bias in November (Fig. 2, top). In contrast, these biases change signs in February/March.
Soil moisture errors tend to be smaller at WI and IN with greater average soil moisture (Fig. 1 and Fig. S3 in the supplemental material) compared with those at MT and Arizona (AZ; Figs. 2, 4). For sites in WI, MT, IN, and AZ (as soil moisture data are unavailable for FL), the average (across all products) correlation coefficient between observed daily soil moisture and product soil moisture for days with unfrozen soil is 0.39, 0.45, 0.65, and 0.66, respectively, all significant at the 99% level. The correlation is largest at the snow-free AZ site and is the smallest at the WI site receiving the most annual snow. The lower correlation at WI is in part due to snow errors in all products.
b. Temperature
An initial stark observation of the temperature fields (Figs. 1–5, middle) is the shift from a cold bias in winter to a warm bias in summer for the two forested sites with snow (WI in Fig. 1 and IN in Fig. 3); this shift is consistent with the seasonal overresponsiveness of the models shown by Nigam and Ruiz-Barradas (2006). Furthermore, all sites with snow (WI, IN, and MT) show a 4.2°C increase in 2-m air temperature observations in the 10 days preceding and following snowmelt, which is likely linked to the fast climate switch noted by Betts et al. (2014). These results suggest that this fast switch is substantially amplified by the reanalysis products here. Note that this seasonal amplification is absent from the other two sites, except for the soil temperature field in FL, and may be linked to a failure in the products’ partitioning of surface energy fluxes at these sites (to be discussed in greater detail in section 3c). The magnitude of the summer warm bias of 2-m air temperature, particularly for WI (Fig. 1), is substantial, around 5°C for all products.
The soil temperature fields of the various reanalysis products are produced for different depths below the surface. For each product, the level that most closely corresponds to 5 cm is chosen and is given in Table S1 in the supplemental material. Soil levels for in situ measurements are given in Table 1. Unless otherwise denoted, the 5- or 4-cm levels are used for analysis.
The monthly mean diurnal cycle for soil temperature at WI is shown in Fig. 6, with other sites shown in Figs. S2–S5 in the supplemental material. For January and February at the forested sites (WI, FL, and IN in Fig. 6 and in Figs. S3 and S4 in the supplemental material), most products show a colder soil temperature than in situ observations. For June and July at the same sites, some products show a greater diurnal variation, and most have a warmer bias. The observed diurnal cycle of soil temperature at nonforested sites of AZ and MT (Figs. S2 and S5 in the supplemental material) tends to be better reproduced by the reanalysis products than at the forested sites, where there is a clear seasonal shift in bias.
In general, soil temperature is a function of soil depth, and this factor should be considered when in situ data are used to evaluate various products. To address this issue, observations at two depths are shown in Fig. 6, and their differences are generally much smaller in magnitude than those between products and observations. This suggests that soil temperature biases of various products are not caused by the soil depth differences between products and measurements in Fig. 6.
The semiarid AZ site has a higher soil temperature gradient with depth and a larger spread of results from the various products (Fig. 4). The largest differences occur in the late afternoon and in midsummer (Fig. S2 in the supplemental material). At these times the difference in observed soil temperature at the two depths of 4 and 8 cm represents at most 35% of the spread of the reanalysis products, and this ratio is much lower for nighttime temperatures. This gives confidence that the findings discussed here are not the result of discrepancies between reanalysis and observational soil depths, particularly for daily and monthly averaged quantities.
Soil temperature shows a shift in the sign of the error from negative in winter to positive in summer at the forested sites of WI (Fig. 1), IN (Fig. 3), and FL (Fig. 5), and from positive in winter to negative in summer at the grassland site in MT (Fig. 2). In contrast, the sign does not change at the shrubland site in AZ (Fig. 4). This shift in sign of the error does not occur with the onset and departure of snow (Figs. 1, 3). Furthermore, this effect is seen not just in the daily mean, but also in the mean diurnal cycle (Fig. 6). Consistent with the seasonal shift of errors over the three forested sites, reanalysis products show a mean soil temperature seasonal cycle amplitude (taken as the average temperature in JJA minus that in DJF) that is 7.2°, 6.2°, and 2.8°C greater than observations for WI, IN, and FL, respectively.
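The seasonal cycle amplitude used above (mean JJA temperature minus mean DJF temperature) and its excess over the observed amplitude can be sketched as follows. The function names and the month-tagged input layout are assumptions for illustration only.

```python
def seasonal_amplitude(months, temps_c):
    """Mean JJA temperature minus mean DJF temperature (deg C)."""
    jja = [t for m, t in zip(months, temps_c) if m in (6, 7, 8)]
    djf = [t for m, t in zip(months, temps_c) if m in (12, 1, 2)]
    return sum(jja) / len(jja) - sum(djf) / len(djf)

def amplitude_excess(months, product_c, observed_c):
    """How much larger the product's seasonal amplitude is than observed."""
    return (seasonal_amplitude(months, product_c)
            - seasonal_amplitude(months, observed_c))
```

A positive excess corresponds to a product that is too cold in winter, too warm in summer, or both, which is the pattern reported for the forested sites.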
Over the forested WI site, the magnitude of the errors increases from skin temperature down to 5-cm soil temperature during the snow-covered period (Fig. 1, middle), and the reanalysis products show a skin temperature seasonal amplitude that is on average 2.8°C greater than observed (vs 7.2°C for soil temperature). For the AZ site (without snow), the average yearly bias is −4.2°C for soil temperature and −6°C for skin temperature.
To understand the above results, we need to recognize that the 2-m air temperature and skin temperature are linked through turbulent mixing, while the skin and soil temperatures are linked through the snow and soil heat transfer. Table 4 quantifies the correlations between the errors of these three temperatures. At the three sites (WI, IN, and MT) with seasonal snow, the correlation of the errors is highest between air and skin temperatures. The weaker correlation between skin and soil temperatures is partly caused by the snow biases, leading to the greater soil temperature bias in magnitude than the skin temperature bias at WI, as mentioned earlier. At AZ, the correlations of air temperature with both skin and soil temperatures are high (Table 4), and the heat transfer in the soil naturally leads to the lower soil temperature bias in magnitude than the skin temperature bias, as mentioned earlier.
Noah and CLM have similar correlations between air and skin temperature errors, because they are both driven by the same air temperature data in GLDAS. In contrast, they have larger differences in the correlations between air (or skin) and soil temperature errors, partly because soil temperature is strongly dependent on land models. Note that, because soil temperature is also dependent on soil moisture that is strongly affected by precipitation and snowmelt, the use of the same land model in GLDAS-Noah and CFSR does not lead to the same correlations between skin and soil temperatures (Table 4).
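The kind of error correlation reported in Table 4 can be illustrated with a minimal sketch. Errors are taken here as product minus observation and correlated with the Pearson coefficient; both choices, along with the function names, are assumptions, since the text does not specify the exact procedure.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def error_correlation(prod_a, obs_a, prod_b, obs_b):
    """Correlate the errors (product minus observation) of two variables."""
    err_a = [p - o for p, o in zip(prod_a, obs_a)]
    err_b = [p - o for p, o in zip(prod_b, obs_b)]
    return pearson_r(err_a, err_b)
```

For example, `error_correlation(air_prod, air_obs, skin_prod, skin_obs)` would give the air–skin error correlation for one product at one site, with all four series sampled at the same times.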
c. Energy fluxes
Before any discussion of energy flux errors is undertaken, it is important to recognize that flux tower measurements do not close the surface energy budget (Wilson et al. 2002). This is partly because the eddy covariance methods used by the towers do not account for all mechanisms by which energy is transported, including subsidence and horizontal advection (Foken et al. 2011). For the four sites (WI, IN, MT, and AZ), the residuals in the energy balance (i.e., the differences between net radiation and the sum of sensible, latent, and ground heat fluxes) are 5.67, 38.1, −14.2, and 5.3 W m−2, respectively. In contrast, all products studied here have a small residual in the energy balance, usually within 2 W m−2 in magnitude. As a result, it is difficult to discern whether the errors discussed below are the result of errors in the reanalysis products, errors in the flux tower observations, or, most likely, a combination of both.
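The energy balance residual defined above (net radiation minus the sum of sensible, latent, and ground heat fluxes) can be computed as in the following sketch. The function name and the use of a simple mean over paired samples are assumptions for illustration.

```python
def mean_residual(rnet, sh, lh, gh):
    """Mean residual over paired samples: Rnet - (SH + LH + GH), in W m-2."""
    res = [r - (s + l + g) for r, s, l, g in zip(rnet, sh, lh, gh)]
    return sum(res) / len(res)
```

A residual near zero indicates a closed budget, as is nearly the case for the products; a large positive or negative value indicates energy that the measured fluxes do not account for.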
Daily ground heat flux (GH) is typically well represented by all reanalysis products (e.g., Figs. 1–5). The average magnitude of GH errors for the various products is comparable and usually within 10 W m−2, with CFSR typically having the poorest representation of GH (Table 5). At the two snow-free sites, daily GH errors average only −3.3 W m−2 (at AZ) and 6.7 W m−2 (at FL), with standard deviations of 3.5 and 0.2 W m−2 among different products, respectively. At the sites with seasonal snow coverage (WI, MT, and IN), the GH error is usually highest around snowmelt time (Figs. 1–3). For instance, the average GH error in the week preceding and following snowmelt increases from the annual mean error by 10.6 W m−2 for MERRA at the WI site and by 18.1 W m−2 for GLDAS-Mosaic. This feature is more likely tied to errors in the products’ peak snow depth than to errors in snowmelt timing, because the spike in GH errors begins just before snowmelt, when the snow depth is greatest, and it is not strongest in MERRA and MERRA-Land even though they have the poorest representation of the snowmelt time.
The errors in latent (LH) and sensible heat (SH) fluxes are generally greater than the GH errors in magnitude (Figs. 1–5, Table 5). In particular, the all-product averages overestimate LH over all sites except AZ (Figs. 1–5), consistent with the finding of Decker et al. (2012). For the sites with snow, these LH errors are generally greater during the snow-free period when LH itself is large because of the increase in net radiative fluxes (Figs. 1–3). However, there is a decrease in the positive bias of nearly all products timed with peak summer rainfall. Not surprisingly there is a significant spike in SH errors (Figs. 1–3, 7) during this time to account for the energy surplus no longer being attributed to LH.
Both LH and SH errors are related to the net radiative flux errors. To isolate the LH and SH errors that are less affected by the radiative flux errors, Fig. 8 shows the evaporative fraction (EF), defined as LH/(SH + LH), for the observations and the products (averaged together for clarity) along with observed daily precipitation. Over the FL, AZ, and MT sites, the all-product averages agree very well with the observations, including the spike in EF during the period of peak summer rains. At the IN and WI sites, while observations show a spike in EF in the early summer and then a nearly constant EF during the summer and early fall, the all-product averages show a nearly constant EF for the whole snow-free period. This indicates that, on average, the products at the WI and IN sites fail to capture the seasonal transition in the partitioning of surface available energy and have higher EF during the winter than observations.
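The evaporative fraction defined above, EF = LH/(SH + LH), can be computed as in this short sketch. The handling of a zero denominator (returning `None`) is an assumption added for robustness and is not described in the text. The second function uses the identity EF = 1/(1 + β), where β = SH/LH is the Bowen ratio.

```python
def evaporative_fraction(lh, sh):
    """EF = LH / (SH + LH); None when the turbulent flux sum is zero."""
    total = sh + lh
    return lh / total if total != 0 else None

def ef_from_bowen(beta):
    """EF expressed through the Bowen ratio beta = SH / LH."""
    return 1.0 / (1.0 + beta)
```

Because EF depends only on the ratio of the two fluxes, scaling SH and LH by a common factor leaves it unchanged, which is why it is less sensitive to errors in the available energy.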
GH, SH, and LH are driven and balanced by the net radiative fluxes (defined as downward minus upward radiative fluxes). In the annual cycle, net shortwave radiation tends to be overestimated while net longwave radiation is underestimated (Table 5); these biases partially cancel out in the net radiation balance. The underestimate of net longwave radiation flux (i.e., the increase in the upward longwave flux) is caused by the summer warm bias (e.g., Figs. 1–4), while the shortwave bias is due to an underestimate of surface albedo (Fig. S7 in the supplemental material). The shortwave bias tends to dominate in most months, while the longwave bias dominates for part of the winter (Fig. S6 in the supplemental material). Moreover, the forest sites (WI and IN) show substantially stronger negative longwave and positive shortwave biases than does the grassland site of MT, which shows less negative albedo bias in the summer months. MERRA-Land is forced by the MERRA downward radiation fields, and the two products have nearly identical errors in upward radiation fluxes, indicating that the precipitation bias correction and model improvement in MERRA-Land do not affect the albedo computation. In contrast, although ERA-Land is forced by the ERA-Interim downward radiation fields, the two products have different upward shortwave radiation flux errors, indicating that the precipitation bias correction and model improvement in ERA-Land affect its albedo computation.
Figure 7 shows the average across all products of the SH, LH, net shortwave (SW), and net longwave (LW) radiation for WI in order to show the temporal evolution of the errors in these fields. For the snow-covered period (from day 321 to day 105) in Fig. 7, the average (across all products) net short- and longwave flux errors are 4.6 and −12.8 W m−2, leading to a net radiative flux error of −8.2 W m−2. The average errors in LH (10.8 W m−2) and SH (−12.8 W m−2) are comparable in magnitude but have opposite signs. For late spring and early summer when the soil is relatively wet (from day 106 to day 175) in Fig. 7, the net radiative flux error of 22.2 W m−2 is more than balanced by the LH error of 45.5 W m−2 (with a smaller compensating SH error of −14.7 W m−2). For late summer and early fall when the soil is drier (from day 176 to day 320), the average SH error (12.7 W m−2) becomes positive, but it is still smaller than the LH error (17.8 W m−2).
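The net radiative flux error quoted for the snow-covered period is the sum of the net shortwave and net longwave errors. A short check using the WI values stated above (the variable names are illustrative):

```python
# All-product mean errors at WI for the snow-covered period (W m-2),
# taken from the text above.
sw_net_error = 4.6
lw_net_error = -12.8

# The net radiation error is the sum of its shortwave and longwave parts;
# rounding to one decimal matches the precision quoted in the text.
net_radiation_error = round(sw_net_error + lw_net_error, 1)
print(net_radiation_error)  # -8.2
```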
As discussed above, the flux tower observations do not maintain closure of the surface energy budget, whereas all products closely satisfy energy closure. The lack of closure by eddy covariance techniques is well known and studied. Twine et al. (2000) have shown that closure is most accurately obtained by assuming that the net radiation and the Bowen ratio β (the ratio of SH to LH) are measured accurately by the flux tower. Therefore, to better quantify errors in the SH and LH fields, we have chosen a simple method to force the monthly energy budget into closure in the flux tower data by multiplying the SH and LH terms by a constant scale factor A for each month such that SW + LW − GH − A(SH + LH) = 0. This method preserves β and EF while providing a closed observational energy budget by which we can better evaluate SH and LH from reanalysis products. The adjustment is shown in Fig. 7. This adjustment would shift the SH errors from the above values to −10.3, −18.1, and 11.9 W m−2 for the snow-covered, moist soil, and dry soil portions of the year described earlier. For the same periods, the LH errors become 11.0, 39.4, and 6.9 W m−2, respectively. The LH and SH errors have similar magnitudes and temporal behaviors before and after adjustment. This suggests that the seasonal variations of errors in the energy fluxes of the reanalysis products discussed above are representative of errors in the reanalysis products rather than the validating flux tower data.
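A minimal sketch of the closure adjustment described above, assuming monthly mean fluxes are available as plain numbers. The function name `close_energy_budget` and the sample values are hypothetical; only the relation SW + LW − GH − A(SH + LH) = 0 comes from the text.

```python
def close_energy_budget(sw_net, lw_net, gh, sh, lh):
    """Scale SH and LH by a common factor A so that
    SW + LW - GH - A * (SH + LH) = 0 for one month.

    Returns (A, adjusted SH, adjusted LH). All fluxes are in W m-2.
    """
    turbulent = sh + lh
    if turbulent == 0:
        raise ValueError("SH + LH is zero; the scale factor is undefined")
    a = (sw_net + lw_net - gh) / turbulent
    return a, a * sh, a * lh

# Illustrative (made-up) monthly means in W m-2
sw_net, lw_net, gh, sh, lh = 180.0, -60.0, 10.0, 40.0, 50.0
a, sh_adj, lh_adj = close_energy_budget(sw_net, lw_net, gh, sh, lh)

residual = sw_net + lw_net - gh - (sh_adj + lh_adj)    # closes to ~0
bowen_before, bowen_after = sh / lh, sh_adj / lh_adj   # beta is unchanged
ef_before = lh / (sh + lh)
ef_after = lh_adj / (sh_adj + lh_adj)                  # EF is unchanged
```

Because SH and LH are multiplied by the same factor, A cancels in both SH/LH and LH/(SH + LH), which is why β and EF are preserved by the adjustment.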
In this study, we have used the in situ data for several years over five flux tower sites (with a total of 315 576 h of data) to evaluate above- and belowground quantities as well as interface quantities from three global reanalysis products (MERRA, ERA-Interim, and CFSR) and six global land data assimilation products (MERRA-Land; ERA-Land; and GLDAS with CLM, Mosaic, Noah, and VIC). These five sites cover major land-cover types over midlatitudes (three forest sites, one grassland site, and one shrubland site) with seasonal snow cover over three sites.
All of the reanalysis products analyzed in this study show systematic errors in their representations of soil moisture, snow depth, and precipitation. Precipitation tends to be underestimated by the reanalyses, with a significant increase in this negative bias occurring with the summer rains. The exception to this pattern is that MERRA and MERRA-Land have slight positive biases during the summer rains at the AZ, FL, and MT sites. Though beyond the scope of this study, a more focused investigation into the data assimilation and bias correction of precipitation data employed by MERRA and MERRA-Land as opposed to the other products could lead to useful insights for correcting the fairly ubiquitous negative summer rain bias in the other products. Consistent with the precipitation bias, snow depth is subject to large errors as well, and the error of −18 cm in maximum snow depth (ERA-Interim in WI) is the greatest in magnitude for any product at any site. While some products (e.g., ERA-Land) realistically simulate the onset and ending dates of the snow season, MERRA and MERRA-Land tend to greatly overestimate the length of time with snow on the ground. This is significant because the disappearance of snow on the ground acts as a fast switch that significantly increases air temperature and reduces mean relative humidity (Betts et al. 2014).
There is a nearly pervasive positive bias in net shortwave radiation that peaks with the melting of snow and a negative bias in net longwave radiation, and the magnitude and seasonal evolution of these biases are linked to land-cover type (e.g., forests vs grassland). These biases partially cancel out and are tied to the negative surface albedo biases and the summer warm bias, respectively. Yet, the errors in surface available energy lead to errors in ground (GH) and turbulent (SH and LH) heat fluxes, with the GH errors usually smaller in magnitude than SH and LH errors. The all-product averages usually overestimate LH and underestimate SH. However, these biases change with season (e.g., for the snow period, spring/early summer with relatively wet soil, and summer/early fall with relatively dry soil). Furthermore, these biases are affected by the use of the original LH and SH measurements versus the adjusted values (to force energy balance in flux tower measurements).
Given the energy flux imbalance issue in tower measurements, a more robust quantity is the EF, defined as LH/(SH + LH). Indeed, the all-product averages usually agree well with the observed EF. For the springtime over WI and IN, however, the all-product averages show a nearly constant EF, which is different from the observed increase of EF with time.
Despite the biases in soil moisture, all products show significant correlations with observed daily soil moisture for the periods with unfrozen soil, with the coefficients lower at the sites with the greatest snowfall (WI and MT). Temperature fields at the forested sites (WI, IN, and FL) show a consistent seasonal shift in the behavior of the errors. Soil temperature error in these products is characterized by an underestimation during the winter and a shift to an overestimation in summer. This leads to a greater seasonal amplitude of soil temperature in these products over the three forested sites. This seasonal pattern is lacking from the 2-m air temperature field. While errors in 2-m air temperature are highly correlated with errors in skin temperature for all sites, the correlations between skin and soil temperature errors are weaker, particularly over the sites with seasonal snow (WI, IN, and MT). For instance, the all-product-averaged positive air temperature error is similar to that of soil temperature at WI in summer, because air temperature is the driver of, and strongly correlated with, soil temperature. In contrast, the negative air temperature error is smaller in magnitude than that of soil temperature in winter, because soil is shielded by snow and model deficiencies in snow treatments become more important (in generating soil temperature errors).
It should be emphasized that there is an inherent uncertainty in using point measurements to evaluate gridded products. While this uncertainty cannot be fully quantified in this study, we have attempted to address this issue from different perspectives. In section 3a, we used several COOP sites for the in situ snow data, used the differences between in situ data and CMC snow results to roughly estimate the uncertainty due to spatial representativeness issues, and identified large precipitation differences between reanalysis products (independent of in situ data). In section 3b, we used soil temperature measurements at two depths (rather than at a single depth) to evaluate reanalysis products, and we evaluated relationships (between quantities) that may be less sensitive to spatial scaling issues than the quantities themselves (e.g., the correlations among air, skin, and soil temperatures and the seasonal shift of temperature errors). In section 3c, we evaluated both EF (which may be less sensitive to scaling issues) and the actual SH and LH values with and without adjustments to close the energy budget in flux tower measurements.
This work was supported by NASA (Award NNX14AM02G) and NSF (Award AGS-0944101). We thank the data centers for providing various datasets for this work. The flux tower data were obtained from the FLUXNET database, and the COOP and CFSR data were from the NCAR Research Data Archive (RDA). The MERRA, MERRA-Land, and GLDAS data were from the Goddard Earth Sciences (GES) Data and Information Services Center (DISC). ERA-Interim and ERA-Land were obtained directly from ECMWF. CMC data were acquired from the National Snow and Ice Data Center (NSIDC). Additionally, we thank the two reviewers for their helpful and insightful comments.
Supplemental information related to this paper is available at the Journals Online website: http://dx.doi.org/10.1175/JHM-D-15-0224.s1.