Consistency and Homogeneity of Atmospheric Energy, Moisture, and Mass Budgets in ERA5

: This study uses advanced numerical and diagnostic methods to evaluate the atmospheric energy budget with the ﬁfth major global reanalysis produced by ECMWF (ERA5) in combination with observed and reconstructed top of the atmosphere(TOA)energyﬂuxesfortheperiod1985–2018.Weassessthemeridionalaswell as ocean–landenergytransport and perform internal consistency checks using mass-balanced data. Furthermore, the moisture and mass budgets in ERA5 are examined and compared with previous budget evaluations using ERA-Interim as well as observation-based estimates. Results show that peak annual mean meridional atmospheric energy transports in ERA5 (4.58 6 0.07 PW in the Northern Hemisphere) are weaker compared to ERA-Interim (4.74 6 0.09 PW), where the higher spatial and temporal resolution of ERA5 can be excluded as a possible reason. The ocean–landenergy transport in ERA5 is reliable at least from 2000 onward ( ; 2.5 PW) such that the imbalance between net TOA ﬂuxes and lateral energy ﬂuxes over land are on the order of ; 1Wm 2 2 . Spinup and spindown effects as revealed from inconsistencies between analyses and forecasts are generally smaller and temporally less variable in ERA5 compared to ERA-Interim. Evaluation of the moisture budget shows that the ocean–land moisture transport and parameterized freshwater ﬂuxes agree well in ERA5, while there are large inconsistencies in ERA-Interim. Overall, the quality of the budgets derived from ERA5 is demonstrably better than estimates from ERA-Interim. Still some particularly sensitive budget quantities (e.g., precipitation, evaporation, and ocean–land energy transport) show apparent inhomogeneities, especially in the late 1990s, which warrant further investigation and need to be considered in studies of interannual variability and trends.


Introduction
Earth's atmosphere exchanges vast amounts of energy with space, the ocean, and land surfaces. Large-scale lateral energy transports balance the arising differential heating, which exhibits a strong meridional gradient (Peixoto and Oort 1992). Moreover, atmospheric energy transports and their divergence are closely linked to spatial patterns of surface energy flux (Liu et al. 2017;Mayer et al. 2017;Trenberth and Fasullo 2017). Thus, atmospheric transports also determine patterns of net heat uptake by the oceans, where most of the excess heat arising from anthropogenic climate change is stored (von Schuckmann et al. 2020). The atmospheric moisture cycle represents the link between the energy budget and the mass budget. Evaporation transfers water vapor and thus mass and latent heat into the atmosphere, which subsequently transports moisture to regions where precipitation exceeds evaporation, where the latent heat is released and warms the atmosphere. Changes to the atmospheric moisture cycle are of direct relevance to society, and climate models predict an enhancement of climatologically dry and wet zones (Held and Soden 2006;Seager et al. 2010;Collins et al. 2013). These trends have been detected indirectly through changes in the oceans (e.g., Boyer et al. 2005;Rhein et al. 2013;Li et al. 2020;Cheng et al. 2020), but direct detection of these changes with observational and reanalysis datasets is challenging due to existing biases (Trenberth et al. 2011) and lack of temporal stability (e.g., Robertson et al. 2016).
Atmospheric reanalyses provide a four-dimensional gridded estimate of the atmospheric state and thus in principle are perfectly suited for the quantification of atmospheric budgets and changes thereof. However, the assimilating models are imperfect, which can introduce biases where observational information is lacking. In addition, the observing system is constantly evolving, which often introduces spurious jumps in reanalyzed quantities that are relevant for budget evaluations (Chiodo and Haimberger 2010;Berrisford et al. 2011;Hersbach et al. 2020). Nevertheless, reanalysis is an iterative process, and every new generation of these products is expected to come with qualitative improvements over its predecessors (Dee et al. 2014;Buizza et al. 2018).
Here we use data from the fifth major global reanalysis produced by ECMWF (ERA5; Hersbach et al. (2020); publicly available via the Copernicus Climate Change Service climate. copernicus.eu; see https://confluence.ecmwf.int/display/CKB/ How1to1download1ERA5 for instructions) to evaluate various aspects of the atmospheric energy, moisture, and mass budgets. Results are validated with observational products wherever Denotes content that is immediately available upon publication as open access. possible and extensive comparison is made with ERA5's predecessor ERA-Interim ) to document progress. Improvements are demonstrated in many climate-related aspects but also in terms of evaluation methods, such as mass adjustment (Trenberth 1991;Mayer and Haimberger 2012) and reduction of spectral noise over topography.
This paper is structured as follows. The input data are introduced in section 2. Section 3 describes numerical and diagnostic methods applied in this study. Section 4 presents results from the evaluation of the atmospheric mass and moisture budget. The evaluation of the atmospheric energy budget using ERA5 and a comparison with ERA-Interim and previous studies is presented in section 5. Summary and an outlook follow in section 6.

Data
The atmospheric mass, moisture, and energy budgets are calculated for the period 1985-2018 using ECMWF's latest atmospheric reanalysis dataset ERA5, which provides global data on a reduced Gaussian grid N320 (equivalent to 0.288 spatial resolution) with 1-hourly temporal resolution and in 137 atmospheric levels up to a pressure of 0.01 hPa. ERA5 is currently available from 1979 onward and consists of analyses and shortrange forecasts initialized from the analyzed fields daily at 0600 and 1800 UTC. It also provides ten low-resolution ensemble members with 3-hourly temporal and N160 (equivalent to 0.568) spatial resolution for uncertainty estimates. All budget quantities are aggregated to monthly means, which can be compared, e.g., to ERA-Interim data as presented by Mayer et al. (2017), that were calculated with 6-hourly time resolution on a reduced N128 Gaussian grid covering the full study period 1985-2018. Net TOA fluxes are taken from the recently published DEEP-C v4.0 dataset (Liu et al. (2020); freely available on https:// researchdata.reading.ac.uk/271/) providing monthly averages on a full Gaussian grid F128. The DEEP-C dataset is a backward extension of the Clouds and the Earth's Radiant Energy System-Energy Balanced and Filled (CERES-EBAF) product in version 4.1 (Loeb et al. 2009(Loeb et al. , 2018 using satellite data, AMIP5 model simulations, and ERA5 TOA flux data for reconstructions prior to the CERES era. Its availability from January 1985 to June 2019 determined the analysis period for this study. Observation-based precipitation estimates are taken from the Global Precipitation Climatology Centre Full Data Monthly V.2018 (GPCC; Becker et al. 2013;Schneider et al.2016) and Global Precipitation Climatology Project v2.3 monthly analysis (GPCP; Adler et al. 2018) products providing monthly averages on a 18 and 2.58 regular grid, respectively. GPCC data are available from 1891 to 2016 using quality-controlled precipitation data from various sources (national weather and hydrological services, and historical data) and the robust SPHEREMAP interpolation method (Willmott et al. 1985) to provide near-global land coverage (with the exception of Antarctica). Nonetheless, sampling errors are still an issue in GPCC varying in space depending on the rain gauge network density (up to 40% in data-sparse regions according to Schneider et al. 2014). GPCP combines GPCC rain gauge data with satellite measurements and provides global data for the full study period.
In this study, ocean averages are referred to the global ocean area including regions covered by sea ice. Inland waters such as the Great Lakes and the Caspian Sea are excluded. Land averages refer to all areas that are excluded in the ocean average, unless otherwise specified.

a. Moisture and mass budget
The conservation of mass and moisture is a fundamental property of planetary physical systems such as Earth's atmosphere. The degree to which this property is satisfied represents an important method to assess the internal consistency and quality of reanalyses. We formulate the vertically integrated moisture budget as where g is the gravitational constant, p S is the surface pressure, q is the specific humidity, v is the horizontal wind vector, and E and P are the evaporation and precipitation, respectively. Evaporation E and precipitation P are flux densities in units of kilograms per square meter per second (kg m 22 s 21 ), which are positive if directed downward. This equation describes the balance between the vertically integrated horizontal moisture flux divergence (left side; denoted as VIWVD hereafter) and the vertically integrated moisture (i.e., total column vapor) tendency (denoted as QTEND) plus surface freshwater fluxes (i.e., P 1 E). Vertical moisture fluxes at the TOA are zero. To quantify the moisture budget in ERA5 and ERA-Interim, we prefer analyzed quantities as they are constrained by observations as much as possible. Hence, we use analyzed hourly VIWVD (denoted as VIWVD AN ) and QTEND fields. Evaporation and precipitation are only available from shortterm forecasts and are taken from twice-daily 12-hourly forecasts started at 0600 and 1800 UTC, respectively. In section 4b we will also take advantage of the availability of twice-daily instantaneous forecast fields of wind and moisture, which allows computation of forecasted moisture flux divergence fields (denoted as VIWVD FC ). We note that the ERA5 archive contains also accumulated moisture flux divergence fields, but they are numerically inaccurate (Paul Berrisford, ECMWF, 2020, personal communication) and hence are not used here. Furthermore, we write the vertical integral of the atmospheric total mass budget as where the expression on the left represents the vertical integral of the total mass flux divergence including dry and moist air masses (denoted as MDIV total ), and the right side consists of the mass tendency (denoted as MTEND), evaporation and precipitation. Analogously to the moisture budget, P and E are taken from forecasts, and MTEND and MDIV total are computed using analyzed fields.
horizontal and vertical vapor fluxes in a consistent manner.
Here, we use the simplified version of their equation, where horizontal and vertical enthalpy fluxes associated with water (and snow) are neglected. The introduced inaccuracies (discussed extensively in Mayer et al. 2017) are negligible in the context of this study. Thus, we formulate the vertically integrated divergence of the dry static plus kinetic energy fluxes (denoted as TEDIV hereafter) as follows: where c a is the specific heat capacity of dry air, T a is the temperature of air measured in kelvin, L y is the latent heat of vaporization, F the geopotential, and k is the kinetic energy. It is important to note that results are truly independent of reference temperature only in the steady state (which was assumed in Mayer et al. 2017), but there are ambiguities arising from local mass variations in the nonsteady state. These ambiguities become larger with larger local mass variations, which generally is the case for shorter averaging periods (see, e.g., Liang et al. 2018). However, in the present manuscript the minimumaveraging period is 12-months (the length of the temporal averaging window for the time series). At this time scale, local differences between results using the kelvin or Celsius scale are on the order of 1.0 W m 22 , and the RMS difference of oceanland transport is 0.1 W m 22 . Due to this acceptably small ambiguity, we opted for the use of the kelvin scale here to ensure consistency with the other datasets used for comparison. TEDIV can be calculated directly using analyzed state quantities, in this case referred to as TEDIV dir . This is the method of choice if the total energy budget equation is used to infer an estimate of net surface energy fluxes F S (sum of turbulent plus radiative heat fluxes) by combining satellite-based net TOA energy fluxes R TOA (here from DEEP-C), vertically integrated atmospheric energy tendency AET, and directly computed TEDIV dir (mass-adjusted as described below; Liu et al. 2015;Trenberth and Fasullo 2017;Mayer et al. 2017). For this application, AET is calculated from analyzed state quantities (denoted as AET AN ), since these are supposed to give the most accurate estimates of AET.
TEDIV can also be estimated indirectly (denoted as TEDIV ind ) from the parameterized state quantities of the right side of Eq. (3) containing the net TOA flux R TOA , the net surface flux F S , and the vertical integral of the forecast total atmospheric energy tendency AET FC . The use of AET FC in the evaluation of TEDIV ind is consistent with forecasts of vertical fluxes (Mayer and Haimberger, 2012). All quantities used for computation of TEDIV ind represent monthly averages of 12-hourly short-term forecasts.

c. Mass adjustment
The calculation of vertically integrated flux divergences requires considerable care. Before individual budget terms are computed, each input field is transformed to a quadratic full Gaussian grid in order to reduce aliasing effects (Durran 2013) at high latitudes caused by the spectral transform method that is applied. Still this does not imply that the right side of Eq. (2) is equal to the left side. Consistency of the mass budget is achieved by iteratively adjusting the horizontal wind field v according to the mass budget residual where parameterized precipitation and evaporation fluxes are approximated by the analyzed divergence and tendency of the vertically integrated water vapor content [i.e., Eq. (1) is substituted into Eq. (2)] in order to get the expression MDIV total 1 MTEND 5 VIWVD AN 1 QTEND (Trenberth 1991;Fasullo and Trenberth 2008;Mayer and Haimberger 2012). That is, we put all terms on the left side such that where RE Mass is an estimate of the error in the divergence of atmospheric dry mass flux. Thus, taking the gradient of the inverted Laplacian of RE Mass yields a two-dimensional field of the associated vertically integrated horizontal erroneous mass flux. The erroneous mass flux is converted to a vertically averaged spurious wind field (through division by local atmospheric mass), which is used to barotropically adjust the horizontal wind vector v. Since this adjustment also affects VIWVD AN on the left side of the equation (which itself is affected by spurious divergent winds, albeit only very weakly), one needs to make 2-3 iterations until RE Mass vanishes and mass consistency is achieved. Please note the magnitude of this correction varies in space and time, as individual terms of Eq. (4) vary. The TEDIVdir and VIWVD AN fields are then computed using the mass-consistent wind which greatly reduces artificial noise over high topography. Also note that the difference between mass-adjusted and unadjusted VIWVD AN is negligible (see, e.g., Berrisford et al. 2011), because moisture flux divergence is mainly driven by advection rather than wind divergence. This approach has already been introduced by Trenberth (1991) and Mayer and Haimberger (2012) in a slightly modified way. Trenberth and Fasullo (2018) proposed a different approach to the mass adjustment, with a barotropic correction in a first step and an additional three-dimensional correction in a second step using the divergent component of moisture fluxes in each model level. That is, the 3D correction step weights the adjustment according to the distribution of the atmospheric moisture. However, it is unclear whether this method further reduces artificial noise over high topography. This will be tested in future studies.
All operations in spectral space as well as spectral transforms use routines used by the Integrated Forecasting System (IFS), which are provided through an openIFS license. This ensures maximum accuracy of our computations.

d. Diagnostics of budget consistency
The degree of closure of the diagnosed energy budget is an important indicator of the credibility of the results and their subsequent applications. Previous budget evaluations have shown substantial inconsistencies, but improvements are made with every new iteration of reanalyses (Dee et al. 2014;Buizza et al. 2018). Here, we explore inconsistencies in the energy budget based on the degree of closure and satisfaction of physical constraints. To study budget closure, we combine parameterized R TOA and F S with TEDIV dir derived from analyses. Consequently, either forecasted or analyzed tendencies can be used to check the degree of closure. Depending on what is used, different aspects of the budget consistency are emphasized as outlined in the following.
First, we define the residual to be the remainder of Eq. (3) using analyzed tendencies AET AN such that Assuming that AET AN is small in the long-term averages, this diagnostic measures the balance between the divergence of horizontal and vertical energy fluxes. It thus emphasizes budget inconsistencies arising from spinup and spindown effects in the parameterized fluxes. Alternatively, forecasted tendencies AET FC can be used to assess self-consistency of the budget, such that This equation essentially compares directly computed TEDIV using analysis fields with the indirectly estimated TEDIV using forecasts (Mayer and Haimberger 2012). In this diagnostic, inconsistencies from spinup or spindown effects are largely cancelled out by the forecasted tendency. Instead, inconsistencies arising from sampling (TEDIV dir uses hourly instantaneous analysis fields, while TEDIV ind uses accumulated forecast fields), the mean drift of the atmospheric circulation in short-term forecasts, and potential inconsistencies in the diagnostic equations are emphasized. In other words, this diagnostic shows differences between energy transports obtained from the analyzed atmospheric state and those obtained from the model physics and dynamics driving the forecasts. We investigate the closure of the mass and moisture budget analogously, but only use analyzed tendencies (i.e., equivalent to the residual). Another approach for the quality assessment of the results is the satisfaction of physical constraints; e.g., the inferred surface energy fluxes over land should be close to zero as the rate of energy storage in landmasses is small (on the order of &0:1 W m 22 as estimated by von Schuckmann et al. 2020).

a. Total mass and moisture budget
In this section, we investigate the atmospheric moisture and mass budget in ERA5. 1985-2018 averages of individual moisture budget terms are shown in Fig. 1 including P 1 E in Fig. 1a, VIWVD AN in Fig. 1b, and the moisture budget residual [difference between left and right side of Eq. (2)] in Fig. 1c. The analyzed moisture tendency is small compared to the moisture fluxes and is thus not shown here. Please note that the scaling in Fig. 1c differs from that in Figs. 1a or 1b. VIWVD AN and P 1 E are in good agreement such that the residual is relatively small across the globe, indicating small spinup and spindown effects in the forecasts. The largest discrepancies can be seen over high topography, e.g., along the Andes, Himalaya, or Rocky Mountains, where spectral noise is present in the VIWVD AN field. Over the tropical ocean, the residual resembles the VIWVD AN pattern, i.e., VIWVD AN is too strong (root-meansquare RMS 5 3.07 mm day 21 for 308N-308S over the ocean) to balance P 1 E (RMS 5 3.03 mm day 21 ) properly, which suggests that the tropical hydrological cycle spins down during the forecasts. Please note that the moisture budget residual has an RMS 5 0.23 mm day 21 for 308N-308S over the ocean, which is 92% smaller than that of P 1 E in the same region.
The total mass flux divergence MDIV total and the residual of the total mass budget are presented in Fig. 2. MDIV total in the left panel is relatively homogeneous over the ocean, especially in the Pacific Ocean basin where no clear P 1 E signal is visible as we would expect it for the long-term mean total (moist plus dry) mass flux divergence. However, a signature of P 1 E can be seen in the zonal average suggesting that its spatial pattern is buried by the spurious patterns of the dry air divergence (note that this diagnostic is based on the unadjusted wind fields). Wavelike structures parallel to coastal lines stem from the spectral method that is used, and the large-scale pattern of artificial noise over high topography may stem from steep gradients of surface pressure, although the field is spectrally truncated at wavenumber 179. The total mass budget residual in Fig. 2b is the difference between MDIV total (plus mass tendency, which is not shown here) and P 1 E (Fig. 1a). It exhibits the artificial noise from the MDIV total field, and a clear P 1 E signal in the equatorial Pacific Ocean as well as in the zonal mean, suggesting that moisture flux divergence and dry mass divergence are inconsistent. Global means and RMS values of the mass and moisture budget terms are summarized in the appendix B.
b. Ocean-land moisture transport Figure 3 presents land (top panel) and ocean averages (bottom panels) of individual moisture budget terms. Figures 3a and 3c focus on P and E as obtained from reanalyses and observational products (see also Hersbach et al. 2018). Figures 3b and 3d assess the consistency between P 1 E from forecasts and VIWVD AN . The degree of agreement between these two terms is a measure of spinup and spindown effects (emphasized by the residual RE AN ) and their temporal variations.
Over land ( Fig. 3a; please note that Antarctica is excluded in the following land averages as there are no GPCC observations in that region), precipitation from ERA5 has a 1985-2018 mean of 2.44 mm day 21 , very similar to that from ERA-Interim, which is 2.41 mm day 21 . Precipitation from ERA5 exhibits a weak but abrupt change around 2000, as seen in various energy budget terms (as will be discussed below), but is temporally stable around 2.41 mm day 21 afterward. Precipitation from ERA-Interim is stable around its climatological mean value so that it is smaller than P from ERA5 before 2002 and slightly larger afterward. We also show observation-based precipitation estimates from GPCP and GPCC. The former has a 1985-2018 land average of 2.31 mm day 21 , which is significantly less precipitation than from any other source. Precipitation from GPCC has a 1985-2016 mean of 2.43 mm day 21 and agrees well with those from reanalysis-based precipitation estimates. After 1997, both reanalysis products lie in between the values from the observation-based products. During this period, the temporal variability of the four precipitation products shown is very similar. Prior to 1997, we find that P from ERA-Interim matches that from GPCC reasonably well, whereas that from ERA5 is clearly larger than the other considered precipitation products. Note that signals such as the precipitation reduction associated with the eruption of Mt. Pinatubo in 1991 as well as strong El Niño-Southern Oscillation (ENSO) events are well captured by all four products. Land averages of evaporation from ERA5 as well as from ERA-Interim exhibit remarkably good temporal stability over the given time span, with no clear trend or change around 2000. They do not differ greatly in terms of temporal variability, but have an offset of ,0.1 mm day 21 .
Moisture transport from the ocean to land is realized by positive (negative) VIWVD over ocean (land), which is balanced by the vertical freshwater flux P 1 E. Over land ( Fig. 3b), El Niños and the Mt. Pinatubo eruption cause a clear signal in the VIWVD AN and P 1 E time series from both ERA5 and ERA-Interim. Nonetheless, there are sizeable differences between P 1 E and VIWVD AN from ERA5 at the beginning of the time series (VIWVD AN is ;17% smaller relative to P 1 E), which get gradually smaller until around 2005. After 2005, these fields agree very well except for an offset of ;0.02 mm day 21 . The 1985-2018 land averages are 20.7 mm day 21 for VIWVD AN and 0.8 mm day 21 for P 1 E.
The results from ERA-Interim, however, show large discrepancies over the whole period. While VIWVD AN steadily decreases until around 2005 and again increases afterward, P 1 E is relatively stable except for the period 2005-15. Compared to ERA5, moisture transport and freshwater fluxes are significantly weaker in ERA-Interim, with a 1985-2018 land average of 20.6 mm day 21 for VIWVD AN and 0.7 mm day 21 for P 1 E. Trenberth and Fasullo (2013) estimated a climatological ocean to land moisture transport of 20.7 mm day 21 using river discharge data from Dai et al. (2009), which is in better agreement with the moisture transport derived from ERA5 than that from ERA-Interim. Furthermore, P 1 E must balance the global river discharge plus changes in the terrestrial water storage (TWSC). Dai and Trenberth (2002) estimated the river discharge of the 912 largest rivers to be about 37.2 3 10 3 km 3 yr 21 . TWSC can be derived from Gravity Recovery and Climate Experiment satellite observations and is roughly 180 km 3 yr 21 for 2002-09 (Llovel et al. 2010). This is equivalent to a total freshwater flux of roughly 0.67 mm day 21 and in good agreement with the long-term mean of P 1 E from ERA-Interim. Over the ocean (Fig. 3c), the spread among different precipitation products is substantially larger than over land. Mean precipitation from GPCP is significantly lower than averages from reanalyses, with a 1985-2018 mean of 2.9 mm day 21 and good temporal stability, while those from both ERA5 and ERA-Interim have a mean of 3.2 mm day 21 . Reanalysis-based precipitation exhibits several discontinuities over the ocean. Precipitation from ERA5 increases in the early 2000s but is relatively stable before and after. Precipitation from ERA-Interim decreases rapidly around 1992, with a further decline until ;2005, when it starts to increase again until 2010. Evaporation products from the reanalyses agree reasonably well, but both exhibit a strong positive trend. Again, ENSO events (e.g., in 1997/98, 2009/10 and 2015/16) are well captured by the products shown, except for P from ERA-Interim, which is mainly dominated by the spurious temporal variability described above. Nogueira (2020) showed that both ERA-Interim and ERA5 overestimate precipitation mainly over the global ocean (especially in the tropics) and over the Himalaya and Andes when using GPCP as reference. ERA5 shows some improvements over ERA-Interim owing to better representation of deep convection. However, its 1985-2018 mean is only slightly closer to GPCP values. Nonetheless, the strong trends in P from ERA5 (leading to a larger discrepancy with GPCP during 2000-18; see Fig. 3c) are not well understood (Hersbach et al. 2020) and need further investigation. On the other hand, precipitation estimates from GPCP (using rain gauge data and satellite measurements) exhibit larger biases in regions with sparse rain gauge coverage (the relative bias is 10.5% over the ocean and 7.5% over land; Adler et al. 2012) and must thus be treated with caution.
VIWVD AN and P 1 E from ERA5 agree well over the ocean area (Fig. 3d), particularly between 1992 and 2015, with a 1985-2018 mean of 0.30 mm day 21 for VIWVD AN and 20.29 mm day 21 for P 1 E. VIWVD AN from ERA5 has a positive trend, with a slope of 1.0 3 10 23 mm day 21 yr 21 (taking autocorrelation into account, confidence intervals are 2.2 3 10 23 and 24.6 3 10 24 mm day 21 yr 21 for a 5 0.05), which means that ocean-land moisture transport in ERA5 suggests a steady increase over the given time span. This would imply a rather strong increase of ;3% per decade, which is at least one order of magnitude larger than the TWSC estimated by Llovel et al. (2010) and several orders of magnitude larger than river discharge trends provided by Su et al. (2018). We FIG. 3. Temporal evolution of (a),(c) precipitation and evaporation as well as (b),(d) VIWVD AN and P 1 E from various products. (top) Land averages and (bottom) ocean averages. Note that the Antarctica is excluded in land averages shown in (a). Solid lines illustrate ERA5 fields and dotted lines ERA-Interim fields, whereas precipitation is colored orange and evaporation is colored blue. Precipitation from GPCP and GPCC are dashed green and cyan, respectively; P 1 E and VIWVD AN are red and black, respectively. VIWVD FC is shown as a solid gray line. Note that the signs of E and P 1 E are inverted in this figure. Each line is smoothed using a 12-month running mean. Units are mm day 21 on the left figure axes and Sv (1 Sv [ 10 6 m 3 s 21 ) on the right axes. To convert mm day 21 to kg m 22 s 21 , divide by 86 400.
conclude that this strong but insignificant trend is unrealistic, since no statistically significant increase in land precipitation is evident in GPCC or other observation-based precipitation products, and there is no evidence that evapotranspiration over land is decreasing (Hartmann et al. 2013). Oceanic P 1 E from ERA5 steadily increases before 2000 and decreases after around 2010 making P 1 E smaller than VIWVD AN before 1992 and after 2015. In ERA-Interim, these fields are much more inconsistent during the entire period, with P 1 E having an exceptionally large increase (by a factor of 4) between 1991 and 1999 and another sharp decrease between 2008 and 2012. VIWVD AN from ERA-Interim is temporally more stable than P 1 E but also has a weak positive trend prior to around 2005 and a negative trend afterward. These spurious trends are associated with changes in satellite measurements of atmospheric water vapor (Berrisford et al. 2011;Dee et al. 2011), and inconsistencies in the moisture budget generally stem from analysis increments that can artificially remove or add moisture (Trenberth et al. 2011).
The VIWVD field from ERA5 is also available as instantaneous forecast (VIWVD FC ) quantity using twice-daily 12hourly forecasts, which is also shown in Figs. 3b and 3d. In contrast to VIWVD AN , no clear trend is visible in VIWVD FC indicating that the ocean-land moisture transport is temporally more stable in short-term forecasts compared to the analyses. While forecasted and analyzed VIWVD agree well in the late 1980s, differences after around 2009 are on the order of 0.1 mm day 21 over land and ;0.04 mm day 21 over the oceans. This results in a noticeable disparity between P 1 E and VIWVD FC over land after 2000, a period when P 1 E agree very well with VIWVD AN . The moisture tendencies from short-term forecasts (not shown), which can be interpreted as a result of the model drifting toward its own climate, indeed balance the disparity between P 1 E and VIWVD FC qualitatively well (over land, for example, stronger drift before 2000, weaker drift after 2000), but not exact, probably because VIWVD FC represents instantaneous rather than accumulated fields (a note on accumulated fields follows below). Hence, we conclude that P 1 E agrees better with VIWVD AN after around 2000, especially over land, indicating small model drift and a good balance between observations and model climate. Prior to 2000, however, differences between analyzed and forecast VIWVD are smaller so that the difference to P 1 E is similarly large for both quantities, which overall indicates the presence of drift in the short-term forecasts.
We note that the ERA5 archive also offers accumulations of forecasted moisture flux divergence. However, this field is computed with numerically less accurate methods (P. Berrisford, ECMWF, 2020, personal communication) that do not satisfy global properties such as a global zero mean (see Gutenstein et al. 2021), and thus is not suited to obtain a best estimate of ocean-land transport and hence is not used in the present study.
In summary, interpretation of temporal variations in the moisture budget in ERA5 is not straightforward. VIWVD AN from ERA5 agrees well with P 1 E after 2000 indicating a good balance of the assimilation system during that time, but it exhibits a strong and spurious trend, which is not seen in VIWVD FC . This indicates that temporally varying spinup and spindown effects in P (visible in unrealistic trends in P 1 E) tend to be compensated rather by temporally varying drift in atmospheric moisture than by forecasts of the horizontal moisture transport. At the same time, we see an effect of the varying analysis increments on the analyzed moisture transports as they suffer a rather strong and unrealistic trend.
Overall, these diagnostics indicate that the forecast quantity VIWVD FC may indeed be better suited for long-term studies than VIWVD AN because of its superior temporal stability. However, it should be kept in mind that we here consider averages over large areas. The conclusions may be different for more regional moisture budgets, but further investigations are beyond the scope of this study.

c. Moisture budget residuals
In this section, we investigate the internal consistency of the moisture budget. Keep in mind that we use moisture tendencies calculated from analyzed state quantities to evaluate the moisture budget. Analyzed tendencies are small compared to the flux terms such that the offset between VIWVD AN and P 1 E in Fig. 3 can be considered a good indicator for the magnitude of moisture budget residuals on a global average. Figure 4 presents zonally averaged anomalies of the moisture budget residuals. Over the land area, ERA5 residuals (Fig. 4a) are generally of the same sign with latitude, with positive anomalies (directed downward) before the discontinuity around 2000 and negative anomalies afterward. To evaluate the location of this discontinuity more precisely, we apply a 4-years moving average standard normal homogeneity test (SNHT; Alexandersson and Moberg 1997; not shown) on times series of zonally averaged residual anomalies. It shows multiple breaks between 1997 and 2003 for the region within 6308 latitude that might be caused by changes in the observing system, particularly the increased amount of SSM/I wind observations and the assimilation of ERS-1/2 data in the mid-1990 (Hersbach et al. 2020;Robertson et al. 2020). At higher latitudes, no significant breaks can be identified by the SNHT.
ERA-Interim residuals (Fig. 4b) exhibit a prominent dipole structure at low latitudes, with negative anomalies along the equator and positive anomalies at higher latitudes, suggesting temporally varying strengths of spinup and spindown effects of P 1 E in the tropics. This structure is, however, inverted in the early 2000s and becomes weaker afterward.
Over the global ocean, ERA5 (Fig. 4c) is more stable than ERA-Interim and features large discrepancies only near the equator following the seasonal cycle of the intertropical convergence zone, which also moves regions of strongest anomalies (between 1995 and 2003 and after 2016). The ERA-Interim budget residual over the ocean area (Fig. 4d) has a smaller mean than that from ERA5, and is in general similar to that over land. It does not exhibit this prominent dipole structure, but variations of the residual are also strongest along the equator. Moreover, anomalies of the ERA-Interim moisture budget residual have a 24% (70%) larger RMS value over land (ocean) compared to those from ERA5. Hence, the small mean of the ERA-Interim residual over ocean may be a result of taking the mean of positive and negative values with similar magnitude. To conclude, VIWVD AN and P 1 E agree relatively well in ERA5, while there are large discrepancies in ERA-Interim. In both products moisture budget residuals are largest along the equator.

Atmospheric energy budget
a. Divergence of total atmospheric energy flux Evaluations of TEDIV are central in this paper, since it can be used for indirectly estimating the net surface energy flux F S but also for consistency estimates (RE AN and RE FC ) as defined in Eqs. (5) and (6), which are useful indicators for consistency problems and inhomogeneities. Here we focus on the consistency aspect. Updated indirect estimates of F S fields as presented, e.g., in Mayer et al. (2017) and Trenberth and Fasullo (2018) will be discussed in future work.
We present 1985-2018 averages of different TEDIV products in Fig. 5. Figure 5a displays the TEDIV dir derived from ERA5 using advanced numerical and diagnostic methods introduced above, and Fig. 5b shows the same field but from previous budget evaluations using ERA-Interim. Additionally, we depict the indirectly estimated TEDIV ind from parameterized fluxes as they are stored in ERA5. Each field is spectrally truncated at wavenumber 179. Regions of positive TEDIV are indicative of divergent lateral energy fluxes, which is generally the case in the tropics, but most notably over western boundary currents (Sverdrup 1947;Stommel 1948;Seager and Simpson 2016) where the atmosphere is supplied with energy from warm water masses that are transported from the equator poleward, e.g., along the Gulf Stream and Kuroshio in the Northern Hemisphere (NH), and the Aghulas and East Australian Currents in the Southern Hemisphere (SH). Convergent lateral energy fluxes (negative TEDIV) can primarily be found in polar regions where Earth is losing energy to space. It is also noted that the gradients along sea ice edges, western boundary currents, and coastal lines are sharper in the novel ERA5 fields (Figs. 5a and 5c; see also Mayer et al. (2019) for closeups) than in previous assessments, where stronger truncations had to be applied to remove the much stronger spectral noise (see, e.g., Trenberth and Fasullo 2018). Moreover, the fields in Fig. 5 are much smoother over high topography (e.g., along the Andes, Rocky Mountains, and Himalayas) reducing the RMS by at least ;10% relative to that from ERA-Interim. Note that this improvement not only comes from the enhanced methods that are used, but also from the higher spatial resolution of ERA5. That is, the truncation to T179 reduces the native spatial resolution of ERA5 by ;70%, while it reduces that of ERA-Interim by only ;30%, i.e., the relative truncation is stronger in ERA5 which may contribute to the stronger noise reduction in ERA5. However, there is still some residual noise over the Andes and Himalaya as these regions exhibit the strongest artificial noise. TEDIV ind looks similar to TEDIV dir derived from ERA5, both are very smooth over land compared to TEDIV dir from ERA-Interim. TEDIV ind has a 1985-2018 global mean of 22.9 W m 22 , which indicates an inconsistency between forecast vertical fluxes and energy tendency. Note that this discrepancy is larger than that found in ERA-Interim using the same method (see Mayer and Haimberger 2012). The reason for this is unclear and warrants further investigation.
The benefit of 1-hourly temporal resolution for budget evaluations from ERA5 is highlighted in Fig. 6. The left panel shows monthly mean TEDIV dir derived from ERA5 for January 2010 based on 1-hourly evaluation. The right panel depicts the same, but is evaluated with a reduced temporal resolution of 6 h. While the large-scale pattern remains unaffected by the temporal resolution, regional differences attain values up to ;300 W m 22 , especially in midlatitudes where fast moving cyclones cannot be sampled properly. Considering annual averages these differences are still on the order of ;90 W m 22 (not shown). Note that both fields are presented at full spectral resolution T639 such that the artificial noise over high topography is not filtered out, which is already reduced (due to the iterative mass correction) compared to the noise in mass-unadjusted TEDIV dir fields (not shown). A further reduction of this noise will be addressed in future work.
In Fig. 7, Hovmöller plots of TEDIV dir anomalies from ERA5 (left panels) and ERA-Interim (right panels) are presented. The upper panels show anomalies for the global land area and the lower panels show anomalies for the global ocean. Over land, anomalies of TEDIV dir derived from ERA5 are dominated by an abrupt change of sign in the late 1990s, with primarily positive anomalies before and negative anomalies afterward. The ERA-Interim analog in Fig. 7b does not exhibit this abrupt change, but exhibits a meridional dipole structure with positive anomalies at the equator and negative anomalies along subtropical latitudes after around 2005, and vice versa before. The noise at high latitudes mainly stems from regions where the sparse number of grid points contributing to the zonal mean are dominated by the artificial noise over high topography.
Over the ocean, TEDIV dir anomalies from ERA5 are similar to those from ERA-Interim. However, there are some differences. The largest disparity can be seen between 2002 and 2010 along 108N, where ERA-Interim is mostly negative and ERA5 has no uniform pattern. There are some minor differences in the amplitude of individual peaks at the equator representing the ENSO, e.g., the positive anomaly at the beginning of 1998 and the subsequent negative anomaly are more pronounced in ERA5. Also, TEDIV dir from ERA-Interim is temporally less stable in the tropical region (308N-308S) after about 2010 (RMS 5 5.8 W m 22 in this region whereas that from ERA5 is RMS 5 4.9 W m 22 for the 2010-18 period) as a decreasing number of satellite data were assimilated in ERA-Interim during this period [see, e.g., Mayer et al. (2018)]. TEDIV dir anomalies derived from ERA5 are slightly more negative over the tropical ocean before 2000 and positive or close to zero afterward. The TEDIV dir ocean average for 308N-308S is 34.1 W m 22 for 1985-1999 and 35.8 W m 22 for 2000-18, i.e., the changes in the TEDIV over the global ocean mainly stem from changes in the tropics. This is coincident with a sudden increase of ERA5's 10-m wind speed and evaporation in the eastern tropical Pacific during the late 1990s documented by Robertson et al. (2020), which they attributed to changes in the observing system. In summary, TEDIV dir anomalies over the global ocean are very similar in ERA5 and ERA-Interim. Over land, ERA5 exhibits a persistent and spatially relatively uniform transition from positive to negative anomalies in the late 1990s, whereas ERA-Interim has some spatial discrepancies as well.

b. Meridional energy transport
In this section, we estimate the atmospheric meridional energy transport (AMET) for the period 2010-18 using various TEDIV fields derived from ERA5 as well as ERA-Interim (Fig. 8). Meridional transports are obtained by inverting the Laplacian applied to TEDIV and computing the meridional derivative of the resulting potential function.
The maximum AMET derived from 1-hourly evaluated TEDIV dir using ERA5 is 4.58 PW (peak value standard deviation s peak 5 0.07 PW and peak latitude s lat 5 0.288) in the NH and 25.19 PW (s peak 5 0.07 PW, s lat 5 0.548) in the SH both around 408 latitude, whereas that from ERA-Interim is considerably stronger with 4.74 PW (s peak 5 0.09 PW, s lat 5 0.178) and 25.28 PW (s peak 5 0.10 PW, s lat 5 0.418). To rule out the differing spatial and temporal resolution of these products as potential source for this discrepancy, we also derive the AMET from ERA5 products with reduced resolution. That is, we also evaluate TEDIV dir 1) with 6-hourly temporal but same spatial resolution, and 2) using one ERA5 ensemble member with 6hourly temporal and reduced spatial (N160) resolution. The former has a maximum transport of 4.55 PW in the northern and 25.13 PW in the Southern Hemisphere, whereas the latter is at 4.53 PW and 25.11 PW. All three ERA5 products agree well within ;0.1 PW, as Fig. 8 shows. Thus, the discrepancy between ERA5 and ERA-Interim does not result from differences in spatial or temporal resolution. Moreover, this result suggests that the AMET is quite robust to changes of the spatial resolution of the reanalysis and that 6-hourly temporal resolution is sufficient to capture relevant eddy heat fluxes, at least when considering zonal integrals of the transports.
Additionally, we derive the AMET from the indirectly estimated TEDIV ind , which is slightly weaker than that from the aforementioned ERA5 products. It has a maximum of 4.46 PW in the northern and 25.28 PW in the Southern Hemisphere. Note that TEDIV ind is uniformly adjusted to zero global mean by subtracting its 2010-18 average of 23.06 W m 22 before the AMET is computed.
We also show the total (atmosphere plus ocean) meridional energy transport derived from CERES and parameterized net TOA fluxes from ERA5 and ERA-Interim, which are also uniformly adjusted to zero global mean (2010-18 averages are 1.07, 0.64, and 21.93 W m 22 , respectively). The total energy transport derived from CERES has a peak annual mean of 5.90 PW in the NH and 25.77 PW in the SH. The total energy transports derived from parameterized net TOA fluxes are substantially smaller, with maxima of 5.33 and 25.34 PW in ERA5 and 5.11 and 25.27 PW in ERA-Interim. This is a 10 (14) % weaker poleward transport of energy in ERA5 (ERA-Interim) in the NH when using CERES as reference. In the Southern Hemisphere, differences are in general smaller (7% weaker in ERA5 and 9% weaker in ERA-Interim). This indicates that net TOA flux biases in ERA5 are qualitatively similar to those in ERA-Interim, with too little energy absorption in the tropics and too little energy loss in high latitudes (consistent with Mayer and Haimberger 2012), but they are reduced in ERA5, leading to a more realistic total meridional energy transports in this product.
The cross-equatorial total energy transport derived from CERES-EBAF net TOA fluxes is northward (115 TW). Combining this with our estimates of cross-equatorial AMET that range between 2380 TW (low-resolution ERA5 ensemble member) and 2540 TW (TEDIV ind from ERA5) yields an oceanic cross-equatorial northward energy transport of about 480 6 80 TW (derived from the average of our AMET estimates), which is in good agreement with estimates provided by Liu et al. (2020).
In contrast to the CERES-based estimate of 15 TW, the total cross-equatorial energy flux is southward when using reanalysis data (2151 TW in ERA5 and 2361 TW in ERA-Interim), which implies a hemispheric asymmetry in the radiation bias of the reanalyses. To understand this better, we show in Fig. 8b zonal averages of the absorbed shortwave (ASR) and outgoing longwave radiation (OLR) at TOA from ERA5, ERA-Interim, and CERES. ASR and OLR from ERA5 are at almost all latitudes closer to the CERES radiation (RMS difference is 7.8 W m 22 for ASR and 3.2 W m 22 for OLR) than those from ERA-Interim (RMS differences are 10.7 and 6.1 W m 22 , respectively), i.e., ERA5 exhibits a stronger (weaker) ASR (OLR) in the tropics. At higher latitudes, both ASR and OLR are weaker in ERA5 than in ERA-Interim. This results in generally stronger and more realistic (when using CERES as reference) R TOA (R TOA 5 ASR 2 OLR) fluxes in ERA5, particularly at low latitudes and in the SH. An outstanding feature of Fig. 8b is the positive ASR bias in both ERA-Interim and ERA5 over the Southern Ocean, which represents a long-standing bias common to many atmospheric models (see, e.g., Hyder et al. 2018), and likely contributes to the negative bias in cross-equatorial energy flux discussed above.
In addition to the stronger net TOA input in the tropics, ERA5 also exhibits an increase in parameterized net downward surface energy fluxes in the tropics (mainly arising from stronger shortwave absorption and weaker evaporation; not shown) compared to ERA-Interim, which is more pronounced than the changes in R TOA . This results in a weaker convergence of vertical energy fluxes (R TOA 2 F S ) in ERA5 in the tropics and the SH, and thus in a weaker atmospheric meridional energy transport in ERA5.

c. Ocean-land energy transport
Here, we assess the temporal evolution of the ocean-land energy transport and other relevant energy budget terms. The transport from ocean to land is realized by positive TEDIV over the ocean (or convergence over land). Energy transports associated with river discharge are of order ;1.2 3 10 6 m 3 s 21 (Dai and Trenberth 2002;Dai et al. 2009), which is equivalent to an enthalpy flux of ;0.3 W m 22 for an assumed temperature difference of 10 K between the river and ocean, and are thus neglected here. Figure 9 shows time series of parameterized as well as reconstructed net TOA fluxes, TEDIV dir derived from ERA5, and F S inferred from the latter two. In addition, we show the inferred F S using R TOA from CERES-EBAF v4.1 for the period 2001-18 as well as the mass-unadjusted TEDIV, as it is stored in ERA5, and TEDIV ind . The left column presents absolute fluxes using a 12-month running mean and the right column displays corresponding anomalies using a 13-month Gaussian filter (Trenberth et al. 2007;Solomon et al. 2007), whereas land averages are at the top and ocean averages at the bottom.
Parameterized and reconstructed net TOA fluxes are in good agreement from a global perspective. Parameterized TOA fluxes have a global mean of 0.51 W m 22 and those from observations and reconstructions 0.45 W m 22 for the full period. Both products capture ENSO events with stronger (weaker) energy input during La Niña (El Niños). ERA5 uses realistic aerosol forcing (Hersbach et al. 2020) which is reflected in the negative R TOA anomalies associated with the Mt. Pinatubo eruption in 1991 and was not represented in ERA-Interim (e.g., Dee et al. 2011). However, the impact on parameterized TOA fluxes appears to be longer lasting compared to DEEP-C. It is also noteworthy that the El Niño event in 2009/10, which was significant in terms of moisture (see Fig. 3) but only moderate in terms of SST anomalies, exhibits a strong TOA signal in both products. We also applied the standard normal homogeneity test to the DEEP-C TOA fluxes, which did not reveal clear jumps neither over ocean nor land.
While both net TOA flux products are stable over the full period and show no significant trend, we find a gradual change between 1996 and 2003 in all TEDIV time series shown (Fig. 9a). TEDIV dir derived from ERA5 has a 1985-2018 mean of 215.7 W m 22 over land, with positive anomalies of around ;1.5 W m 22 before the discontinuity and 21.0 W m 22 afterward. TEDIV ind is very similar to TEDIV dir but has an offset of about 2.0 W m 22 , i.e., ocean to land energy transport is stronger when derived from TEDIV ind . The mass-unadjusted TEDIV exhibits an even stronger discontinuity resulting in a stronger (weaker) ocean-land transport after (before) 2000 compared to our mass-adjusted TEDIV dir . Decomposition of the energy transports (not shown) reveals that the latent heat term accounts for the majority (about 220 W m 22 ) of TEDIV when averaged over the land. Enthalpy and geopotential transports are of opposite sign and approximately balance each other (the kinetic energy term is an order of magnitude smaller). In the non-mass-adjusted TEDIV, however, we find a disparity between these two terms of about 6 W m 22 before and 2 W m 22 after 2000 (over land) explaining the gradual change in TEDIV, i.e., the temporal discontinuities of opposite sign in the two terms do not cancel out. In our mass-adjusted TEDIV dir this imbalance still exists but is reduced by ;50%.
This gradual change might stem from the changing observing system. The ERA5 assimilation system was initially tested in the period after 2000, when generally more observational data (such as GPS-RO and AMSU) have been available than before. We also note that ocean-land energy transports in ERA-Interim (not shown; see Trenberth and Fasullo (2013) for a figure) are temporally more stable, with a temporal standard deviation of ;0.3 W m 22 compared to ;0.5 W m 22 in ERA5. Reasons for the differences between ERA-Interim and ERA5 may be the fact that some observational data that are assimilated by ERA5 have been reprocessed compared to the versions used in ERA-Interim (e.g., SSM/I microwave imager observations; see Fig. 5 in Hersbach et al. 2020), and the use of different SST products before 2009. Before that time, typical SST differences between the two products (measured as temporal standard deviations of monthly mean differences) range between ;0.2 K in the tropics and ;0.5 K in the Southern Ocean, western boundary currents, and in the high north (not shown). It is likely that these differences in SST contribute to differences in surface fluxes and also have an effect on assimilated state quantities away from the surface and consequently analyzed transports.
Consequently, this gradual change in TEDIV dir is also present in our indirectly estimated F S product (derived from TEDIV dir and R TOA from DEEP-C) such that its land average is 24.3 W m 22 prior to 2000 and 21.4 W m 22 afterward, which is in reasonable agreement with observation-based estimates of &0:1 W m 22 from von Schuckmann et al. (2020). Aside from the large discontinuity, F S is relatively stable around those values having a standard deviation of 0.7 W m 22 before and 0.4 W m 22 after the year 2000. The inferred F S using R TOA from CERES-EBAF v4.1 exhibits the same temporal variability as F S derived from DEEP-C, but has a 2000-2018 land average of only 21.2 W m 22 (see Fig. 9a) owing to the 0.2 W m 22 weaker net TOA flux in CERES-EBAF over land, which is a result of a different global anchoring value in the two products (Liu et al. 2020). Consequently, its ocean average is about 0.2 W m 22 larger when using CERES-EBAF (long-term mean is 1.8 W m 22 ; see Fig. 9c) instead of DEEP-C (see further below). This is a fairly small difference and can be attributed to observational uncertainties [e.g., von Schuckmann et al. (2020)] and different averaging periods for the choice of anchoring values for DEEP-C and CERES-EBAF TOA fluxes.
El Niño events and volcanic eruptions do not only have an impact on the surface and net TOA fluxes, but also on the oceanland transport. Figures 9b and 9d highlights the strengthened ocean-land transport during these events, e.g., in 1992, 1998, 2010, and 2016, with the largest anomalies during the strong 1997/98 El Niño. Over the ocean (Fig. 9c), variability of F S is coherent with the R TOA variability so that it is also dominated by the signals of El Niños and volcanic eruptions. The discontinuity in TEDIV dir (see also Fig. 7  Please keep in mind that the calibration of DEEP-C R TOA fluxes incorporates, among others, information about long-term OHC trends. This is similar to CERES-EBAF, but DEEP-C is adjusted to an updated ocean heat uptake estimate based on Argo data (see Liu et al. 2020). Nevertheless, the good agreement between F S over ocean and global ocean OHC trends suggests a realistic balance between TOA fluxes and TEDIV over ocean.
We also computed the ocean to land energy transport for 2000-06 using the new ERA5.1 dataset (not shown; Simmons et al. 2020), where a cold bias in the stratospheric temperature is corrected, but this has virtually no effect on the global energy transport. Differences in the ocean to land energy transport are on the order of ;0.1 W m 22 .
As for the moisture transport, there are forecasts available for the TEDIV (note that these are not mass-adjusted; not shown) exhibiting the same characteristics as the TEDIV fields (based on analyses) presented above, i.e., a gradual change with the same magnitude around the year 2000. We also see this trend in the AET FC and parameterized F S over the ocean (not shown) indicating an altered balance of the forecasted energy budget likely caused by changes in the observing system.
In summary, ocean to land energy transport in ERA5 exhibits a discontinuity of ;3 W m 22 (0.45 PW) in the late 1990s so that inferred F S after 2000 agrees within 1.5 W m 22 with recent estimates of ocean and land heat uptake, but disagreement is significantly larger before 2000 (all long-term averages are summarized in appendix B).

d. Internal consistency of the atmospheric energy budget
We now assess the internal consistency of the ERA5 and ERA-Interim energy budget using the residual (computed with analyzed tendencies) and self-consistency diagnostics (computed with forecasted tendencies) as described in section 3. Again, we want to emphasize that these assessments of internal consistency are independent of observation-based vertical energy fluxes, i.e., we use parameterized R TOA and F S fluxes from reanalyses. Figure 10 shows global 1985-2018 averages of the self-consistency in the upper and residuals in the lower panels. Results from ERA5 are shown in the left and those from ERA-Interim in the right column.
The self-consistency check, i.e., the agreement between TEDIV dir and TEDIV ind , in ERA5 (Fig. 10a) shows rather uniform values over both ocean and land. Largest inconsistencies can be found over high topography, e.g., the Andes and Himalaya, where spectral noise is present in TEDIV dir (cf. Figs. 5a,c), and in the Southern Ocean. The latter is possibly related to our neglect of snow and ice in our Thus, there is an imbalance between the vertical energy fluxes and AET FC (the global average of TEDIV dir is 0), which suggests that some energetic effects are missing in our diagnostics. The selfconsistency of results from ERA-Interim (Fig. 10b) exhibits comparatively strong artificial noise over land originating from the directly computed TEDIV dir (see Fig. 5b), but is as smooth over ocean as in ERA5. Its global mean is 20.9 and 20.8 W m 22 for the aforementioned periods, which is significantly lower than in ERA5. This is a surprising result given the evaluations are carried out analogously for both ERA5 and ERA-Interim. The energy budget residuals from both ERA5 and ERA-Interim (Figs. 10c,d; emphasizing the balance between analysis-based horizontal and forecast-based vertical energy fluxes) are spatially more heterogeneous compared to their self-consistency, with generally larger RMS values (cf. RMS 5 11.1 W m 22 for ERA5 and RMS 5 27.0 W m 22 for ERA-Interim with self-consistency RMS 5 8.5 W m 22 for ERA5 and RMS 5 25.2 W m 22 for ERA-Interim). The selfconsistency metric for ERA5 (Fig. 10c) shows relatively high values where precipitation dominates freshwater fluxes (positive values in Fig. 1a), especially in the ITCZ. This appears related to the negative ASR and positive OLR biases (see Fig. 8b), resulting in a negative bias of net TOA fluxes in the tropics. Other inconsistencies may stem from different sources and need further investigation. Nonetheless, compared to other energy budget residual estimates in the literature, a value of 11.1 W m 22 for a multiannual RMS at T179 spectral resolution (equivalent to 18) is remarkably small. It is substantially improved compared to estimates from 9.4 W m 22 at T63 (Mayer et al. 2017), and 13.0 W m 22 at T63 (Hantel and Haimberger 2016)]. ERA-Interim energy budget residuals (Fig. 10d)  , which is significantly more negative than the global mean of the self-consistency diagnostic (22.9 and 20.9 W m 22 , respectively). This (i) demonstrates the improved balance between analyses and short-term forecasts in ERA5 and (ii) confirms that forecasted tendencies (as used in the self-consistency diagnostic) largely cancel spinup and spindown effects of the vertical fluxes.
In Fig. 11, we show the temporal evolution of the zonally averaged self-consistency as well as budget residual as anomalies, which is arranged in the same way as Fig. 10. We find that self-consistency in ERA5 (Fig. 11a) is temporally homogeneous, with an RMS 5 1.5 W m 22 and no clear patterns and good temporal stability. In ERA-Interim (Fig. 11b), however, regions and times of positive and negative anomalies are present, which change sign on decadal time scales, e.g., negative anomalies around 308S prior to the year 1997 and positive anomalies afterward, resulting in an RMS 5 2.3 W m 22 .
The temporal evolution of the ERA5 budget residual (Fig. 11c) is clearly dominated by the discontinuity in the late 1990s, with RMS 5 3.8 W m 22 . Except for the polar regions, it has uniformly negative anomalies before 2000 and positive anomalies afterward. The SNHT shows significant jumps in the tropical region (6158) in 1997/98 and at 308S in 1995, which are at least two years earlier than the discontinuities in ERA5's moisture budget residuals (see Fig. 4a; note that both land and ocean anomalies of the energy budget residual exhibit the same discontinuity as the global anomalies in Fig. 11c). In ERA-Interim (Fig. 11d), we again see the spatially more heterogeneous pattern with alternating signs on decadal time scales, but with larger anomalies compared to its self-consistency (RMS 5 4.1 W m 22 compared to 2.3 W m 22 ).
In summary, inconsistencies caused by the model spinup and spindown effect are small in ERA5 and buried by the discontinuity around the year 2000 (see Fig. 11c). Consequently, anomalies based on the period 2000-2018 (not shown) are even smaller, with an RMS 5 2.3 W m 22 (compared to 3.6 W m 22 for ERA-Interim). In ERA-Interim, spinup effects are causing large temporal and spatial inconsistencies, indicating a degraded regional balance between forecasted and analyzed energy fluxes (see Fig. 11d). On a global average, these inconsistencies are well compensated by the forecasted tendency such that the self-consistency global mean is smaller (i.e., better consistency between model forecasts and analyses) in ERA-Interim than in ERA5. Nonetheless, the regional discrepancies between forecasted and analyzed energy transports (recall RE FC 5 TEDIV ind 2 TEDIV dir ) obviously vary temporally and spatially in ERA-Interim, leading to temporal and spatial variations in the self-consistency metric for this reanalysis (see Fig. 11b). In ERA5, however, these inconsistencies as well as the discontinuity around 2000 are effectively cancelled out by the forecasted tendency resulting in a very uniform but more negative self-consistency (see Fig. 11a).

Discussion
We investigate the atmospheric energy budget in ECMWF's latest reanalysis dataset ERA5. We use a mass-consistent formulation of the energy budget in combination with advanced numerical and diagnostic methods to compute the divergence of total atmospheric energy fluxes with unprecedented accuracy. These fields are used to 1) estimate the atmospheric meridional energy transport, 2) compute the ocean-land energy transport, 3) indirectly estimate surface energy fluxes, and 4) assess the consistency of the energy budget. Furthermore, we evaluate the mass and moisture budget in ERA5 and compare individual terms with those from previous evaluations using ERA-Interim. We also compare parameterized precipitation fluxes from ERA5 and ERA-Interim with observation-based precipitation estimates. The main outcomes of this study are the following: d Freshwater fluxes from ERA5 are generally well balanced by the analyzed moisture flux divergence, with a moderate spindown effect in the tropics. Moreover, global ocean to land moisture transports agree well with the continental freshwater flux. Also, the terms of the moisture budget in ERA5 are temporally more stable than those from ERA-Interim. However, globally averaged moisture transports from analyses exhibit an unrealistic positive trend over the considered period. This spurious trend is not seen in moisture transports derived from short-term forecasts, which makes those better suited for long-term studies. However, it remains unclear whether this conclusion also holds for more regional moisture budgets.
d Precipitation, evaporation, and ocean to land energy transport exhibit an unrealistically strong increase in the equator region during the period 1995-2005, particularly in 1997/98, which likely stems from changes in the observing system. d TEDIV from ERA5 is much smoother than in earlier evaluations, thanks to ERA5's enhanced spatial and temporal resolution. Results can thus be used at much higher resolution than before. The very likely spurious increase of forecast precipitation and evaporation and also the shift in the energy budget residual in the late 1990s warrants further discussion. ERA5 uses the 41r2 cycle version of the Integrated Forecasting System operational in 2016, and the ERA5 assimilation system was initially tested in the period after 2000, when e.g., GPS-RO and AMSU data have been available. This means that it uses the model and data assimilation at that time, which was tuned to work best with the observing system operational at that time (Simmons et al. 2020). This is also reflected in Fig. 8c in Hersbach et al. (2020), which clearly shows that the energetic consistency of ERA5 (in terms of the balance between global mean net flux at TOA and the surface) is best in the late 2000s and early 2010s. Hence, it is not surprising that inferred surface energy fluxes over land (see Fig. 9) are more realistic after 2000 than before.
Furthermore, almost all satellite observations that were assimilated by ERA-Interim have also been used by ERA5, but many of them are reprocessed, such as SSM/I microwave observations [see Fig. 5 in Hersbach et al. (2020)]. This could also have an effect on the transports as the spurious trend in ERA5's energy budget around 2000 is larger compared to that in ERA-Interim (see, e.g., Trenberth and Fasullo 2013). In addition, we also note that ERA5 uses a different SST product than ERA-Interim before 2009, which likely contribute to the found differences in surface fluxes and subsequently atmospheric transports.
Nonetheless, our novel TEDIV dir fields show unprecedented accuracy, with sharp gradients along coastal lines, ice edges, and western boundary currents such that regionally more detailed evaluations are possible. Earlier studies using older reanalysis datasets usually truncated results at T63 or below to obtain reasonably smooth fields (see, e.g., Trenberth and Fasullo 2008;Mayer et al. 2016;Trenberth and Fasullo 2017), but this obviously prevents a realistic representation of sharp gradients (see Fig. 5 in Mayer et al. (2019) for a comparison of different truncations). Even at full spatial resolution (see, e.g., Fig. 6a), our fields are very smooth over ocean (RMS 5 62.3 W m 22 ), while, e.g., those archived from ERA5 show extensive patterns of artificial noise (RMS 5 159.0 W m 22 ; not shown). At 18 resolution (truncation at wavenumber 179; see Fig. 5a) and below, our TEDIV dir field unfolds its full potential, where almost no artificial noise is left over near high topography. Some remainders are visible only along the Andes, while other regions show no artificial noise at all, e.g., in North America, Greenland, and Siberia.
We also want to briefly highlight the indirectly estimated surface energy fluxes F S using TEDIV dir and reconstructed R TOA fields from DEEP-C. The F S over ocean (1.6 W m 22 for 2000-18) is in much better agreement with the long-term mean ocean heat uptake of ,1.0 W m 22 (Wild et al. 2012;L'Ecuyer et al. 2015;von Schuckmann et al. 2020) than widely used satellite-derived surface flux products (uncertainties of ;20 W m 22 ; Rhein et al. 2013;Mayer et al. 2017). Over land, we find good agreement with a recent observation-based estimate of land heat uptake from von Schuckmann et al. (2020). We also note that the F S global mean is unbiased by construction (global average of TEDIV dir is 0) and thus well-suited for climate studies and model evaluations. Future work will include a comprehensive evaluation of F S and additional in-depth studies.
We used the RMS value of the self-consistency and residual diagnostic as metric for the accuracy of our budget evaluations. The RMS values of both self-consistency and residual can be reduced by ;60% using ERA5 instead of ERA-I. We recommend using the RMS metric in other budget evaluations to objectively measure the quality of results.
In conclusion, the performance of ERA5 is clearly improved compared to its predecessor, but some fields contain unrealistically strong trends and should be used with caution. While the reason for the strong gradual change in the late 1990s needs further investigation, the presented fields of TEDIV dir are stable from 2000 onward and thus useful for quantitative assessments of large-scale climate variability and trends. In general, climate studies will benefit from the improved temporal stability and higher useful resolution of ERA5 energy budgets. The latter will allow for regionalized studies at higher accuracy. Another area of future work will be a backwardextension of the presented evaluations, which will become possible after the release of ERA5 data back to 1950.