Representation of Climate in Reanalyses: An Intercomparison for Europe and North America

Atmospheric reanalysis datasets are a major tool to assess climate and its variability for past decades in a broad spectrum of applications. Nowadays, global reanalysis systems are available in third- and fourth-generation versions. They benefit from continuous improvements of numerical models and data assimilation schemes and utilize a growing number of observing systems. In the past decade, global reanalyses have been complemented by several regional reanalyses with enhanced horizontal resolution and appropriate model physics. Due to the considerable technical overhead associated with the comparison of reanalyses, it is difficult for users to identify an optimal dataset for their needs. Therefore, this study strives to provide a comprehensive intercomparison of the representation of climate in various reanalyses. The study compares four global reanalyses and three regional reanalyses over Europe and North America with in situ observations. The focus lies on the representation of near-surface temperature and precipitation expressed through various climate indices. The comparison shows that the performance of the reanalyses varies considerably depending on the parameter(s) of interest, the region, and the type of aggregation. The main findings are that regional reanalyses provide added value, especially with regard to precipitation. While ECMWF's newest reanalysis ERA5 is in better agreement with observations than the other global reanalyses for most climate indices over Europe, it is still outperformed by its predecessor ERA-Interim over North America. To support users in identifying an appropriate reanalysis for their needs, the study is complemented by comprehensive supplemental material.


Introduction
The major aim of reanalysis efforts is to generate the best possible estimate of the atmospheric state given an observational dataset and a numerical modeling system. In general, such a system comprises a fixed state-of-the-art numerical weather prediction (NWP) model (note that acronyms are expanded in the appendix) and a corresponding data assimilation scheme (e.g., Kalnay et al. 1996). Compared to gridded observational datasets, the procedure allows for generating a physically consistent spatiotemporal reconstruction of the (atmospheric) system's state. Further, information from various observational sources can effectively be merged into the reanalysis. Through the use of the physical NWP model, reanalyses also allow for estimates of nonmeasurable quantities. Hence, reanalysis data considerably improve the ability to investigate climate and its variability compared to datasets created based solely on observational data or model simulations (Trenberth et al. 2008).
Over the last two decades, several generations of global atmospheric reanalyses have been produced by various meteorological centers. The recent third generation includes well-established datasets such as ERA-Interim (Dee et al. 2011), MERRA-2 (Gelaro et al. 2017), and JRA-55 (Kobayashi et al. 2015). Horizontal resolutions of global reanalyses commonly lie between 100 and 200 km for the first- and second-generation datasets, at or above 50 km for third-generation reanalyses, and at approximately 30 km for ECMWF's new ERA5 (Hersbach et al. 2020), the first fourth-generation reanalysis.
With the increasing number of reanalyses, intercomparison of the datasets has become increasingly important. In particular, users depend on guidance in selecting a suitable dataset for their respective application. In this regard, efforts such as the WCRP Task Team on the Intercomparison of Reanalyses (TIRA) or the SPARC Reanalysis Intercomparison Project (S-RIP; Fujiwara et al. 2017) have emerged.
In general, there are three approaches to the evaluation and comparison of reanalyses. The first is the intercomparison of reanalyses with each other without using a reference dataset. These studies aim at identifying similarities and differences between the reanalyses in order to determine the uncertainty in reanalysis estimates and to identify outliers (e.g., Auger et al. 2018; Torralba et al. 2017; Martineau et al. 2018). Research using this intercomparison method often focuses on parameters that are not observed or are insufficiently observed (e.g., Trenberth et al. 2011; Bosilovich et al. 2017; Boisvert et al. 2018).
The second approach employs gridded observational datasets, which have the advantage of a spatially continuous representation of the parameter to be evaluated. Several recent studies use such data as a reference to compare reanalyses with regard to specific parameters globally (e.g., Simmons et al. 2017; Sun et al. 2018) or for a certain region (e.g., Ghodichore et al. 2018; Gupta et al. 2020; Krauskopf and Huth 2020). As reanalyses use a grid representation, the comparison with gridded observations seems straightforward in theory, since both should reflect area averages. However, the area-average representation of gridded datasets strongly depends on the density of observations (e.g., Prein and Gobiet 2017; Zandler et al. 2019). Further, the interpolation may lead to representation errors in grid cells both with and without observations (e.g., Kyselý and Plavcová 2010). To mitigate these effects, the grid size of gridded observational data is often increased, which in turn enhances the resolution error in the data (e.g., Willmott and Johnson 2005). Therefore, regional features and the spatial variability of a parameter are not reproduced on smaller scales, as can, for example, be seen in Alexander et al. (2020). Moreover, a gridded observational dataset is essentially the result of a statistical simulation employing certain assumptions on the input data and their distribution. In contrast to model-based estimates, gridded datasets do not take into account physical dependencies between different meteorological parameters.
The third approach uses point-based observations, which in general exhibit high quality with respect to errors in their representation of the observed parameter. In comparison to gridded observational datasets, local characteristics of the parameter under consideration are preserved. Recent intercomparisons of reanalyses with point-based data have been conducted for various parameters, such as near-surface wind speed (Stopa and Cheung 2014; Ramon et al. 2019), cloud cover (Free et al. 2016), snow depth (Wegmann et al. 2017; Orsolini et al. 2019), temperature and precipitation (Lader et al. 2016; Diaconescu et al. 2018; Sheridan et al. 2020), and surface solar radiation (Slater 2016; Zhang et al. 2016).
All of the aforementioned studies focus on global reanalyses. However, the scales and processes resolved in the global datasets do not necessarily allow for an assessment of regional or local climate (e.g., Vidal et al. 2010), especially with regard to complex topography or surface heterogeneity. In this respect, regional reanalyses have been developed that apply a limited area model over specific regions. Due to their increased horizontal resolution, regional reanalyses are expected to better represent the characteristics of local climate. The North American Regional Reanalysis (NARR; North America at 32 km; Mesinger et al. 2006) was the first such approach, produced when horizontal resolutions in global reanalyses were quite coarse (>100 km). During the last decade, several new regional reanalyses at higher horizontal resolutions have emerged for various regions, including ASRv2 (Arctic at 15 km; Bromwich et al. 2018), COSMO-REA6 (Europe at 6 km; Bollmeyer et al. 2015), HARMONIE (Europe at 11 km; Dahlgren et al. 2016), and BARRA (Australia, New Zealand, and Southeast Asia at 12 km; Su et al. 2019). The added value of regional reanalyses has been shown especially for spatially heterogeneous parameters such as precipitation (Bukovsky and Karoly 2007; Wahl et al. 2017), near-surface and boundary layer wind speed (Kaiser-Weiss et al. 2015; Moore et al. 2016; Frank et al. 2020), temperature (Kaiser-Weiss et al. 2019; Scherrer 2020), or integrated water vapor (Steinke et al. 2019). A shortcoming of the aforementioned studies is that most of them evaluate the regional reanalyses only against their respective forcing global reanalysis. Further, the comparisons for regional reanalyses mostly focus on general evaluation metrics and do not specifically take into account the representation of climate and its variability.
In summary, there are a number of open questions regarding the intercomparison of reanalyses: 1) How do current reanalyses perform with respect to user-relevant aspects of local climate and its variability? 2) Does ERA5 further increase the quality of reanalysis estimates compared to its predecessor ERA-Interim, which has often been one of the best performing reanalyses in past intercomparisons? 3) Do regional reanalyses provide added value compared to global products in representing various aspects of local climate? Until now, there has been no comprehensive intercomparison study addressing these questions, especially with a focus on local climate characteristics for continental regions.
In this regard, our study contributes a comprehensive evaluation and intercomparison of seven reanalysis systems (including three regional reanalyses) using point-based station observations in order to investigate the representation of local climate and its variability. The intercomparison is performed for two continents with sound observational data coverage (Europe and North America) and two parameters relevant to users (temperature and precipitation). Besides a general evaluation with standard statistical tools, we provide a regional ranking of the reanalyses that supports users in choosing a suitable dataset for their specific needs.
The paper is organized as follows. Section 2 describes the reanalysis datasets and verifying observations. Section 3 provides an overview of the intercomparison approach, the applied ranking method, and the climate indices used in this study. The results of the analysis are shown in section 4 and summarized and discussed in section 5. Section 6 provides concluding remarks on the results. Comprehensive supplemental material containing tables and maps complements the study and provides useful insights and guidance for users of reanalysis datasets.

a. Reanalysis data
For the intercomparison in this study, we use data from four global reanalyses, two European regional reanalyses, and one North American regional reanalysis for the investigation period from 1995 to 2018, which corresponds to the period of the shortest available reanalysis dataset. The datasets used in the study are described in the following subsections and in Table 1. Information about data sources can be found in the data availability statement at the end of the article.

1) ERA-Interim
ERA-Interim (Dee et al. 2011) is a global reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). It utilizes ECMWF's atmospheric model IFS (version cy31r2) in combination with a four-dimensional variational data assimilation system (4D-Var). ERA-Interim's native representation is spectral, which corresponds to a horizontal grid size of about 70-80 km in the midlatitudes. Although it was originally intended as an interim version between the ERA-40 reanalysis and ECMWF's latest reanalysis installment (ERA5), it became immensely popular, with several tens of thousands of users worldwide (Hersbach et al. 2019), owing to its quality in representing the atmospheric state. The dataset is available from 1979 to mid-2019.

2) ERA5
ERA5 is the new global reanalysis product of ECMWF and the successor of ERA-Interim. It is based on a more recent version of ECMWF's atmospheric model (cy41r2) with a hybrid incremental 4D-Var system. It contains a deterministic realization at 31 km horizontal resolution and is the first ECMWF reanalysis to also provide an ensemble (10 members), albeit at a reduced resolution. Once completed, the ERA5 reanalysis will cover the time period from 1950 onward. Details about the ERA5 implementation as well as improvements and differences with respect to ERA-Interim can be found in Hersbach et al. (2020).

3) MERRA-2
The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2; Gelaro et al. 2017) is NASA's current reanalysis and replaces the first version of MERRA. MERRA-2 is produced with the Goddard Earth Observing System (GEOS) model and a three-dimensional variational data assimilation (3D-Var) scheme. As a distinct feature, it is the only current global reanalysis to assimilate atmospheric aerosols. It has a midlatitude horizontal resolution of approximately 50 km and is available from 1980 onward. MERRA-2 provides observation-corrected precipitation fields (Reichle et al. 2017), which are used in this study. The correction is based on a merged satellite-gauge precipitation product (CPCU; Chen et al. 2008), with a full correction for latitudes below 42.5°, a linear tapering between CPCU and the MERRA-2 model precipitation between 42.5° and 62.5°, and no correction for latitudes at 62.5° and higher.
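The latitude dependence of the correction can be illustrated with a small sketch. It assumes that the taper acts on absolute latitude (both hemispheres) and that the blend is a simple weighted average of gauge-based and model precipitation; the function names are hypothetical, and the operational procedure described by Reichle et al. (2017) is more involved.

```python
def merra2_correction_weight(lat_deg: float) -> float:
    """Weight of the gauge-based (CPCU) precipitation in a hypothetical blend.

    1.0 means fully corrected, 0.0 means pure model precipitation.
    Thresholds (42.5 and 62.5 degrees) follow the text; applying them to
    absolute latitude is an assumption of this sketch.
    """
    a = abs(lat_deg)
    if a < 42.5:
        return 1.0
    if a >= 62.5:
        return 0.0
    # Linear taper from 1 at 42.5 degrees to 0 at 62.5 degrees.
    return (62.5 - a) / (62.5 - 42.5)


def blended_precip(lat_deg: float, p_obs: float, p_model: float) -> float:
    """Illustrative weighted average of corrected and model precipitation."""
    w = merra2_correction_weight(lat_deg)
    return w * p_obs + (1.0 - w) * p_model
```

At 52.5° latitude, for example, the sketch gives equal weight to both sources, so stations in central and northern Europe see a partially corrected product.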

4) JRA-55
The JRA-55 reanalysis dataset is produced by the Japanese 55-Year Reanalysis project carried out by the Japan Meteorological Agency (JMA). It is a global reanalysis based on the JMA global spectral model (GSM) and uses a 4D-Var assimilation system; it was implemented as a follow-up to the JRA-25 reanalysis (Kobayashi et al. 2015; Harada et al. 2016). JRA-55 provides reanalyzed data for a long time period starting in 1958 at an approximate horizontal resolution of 55 km in the midlatitudes.

5) COSMO-REA6
Recently, regional reanalysis datasets for Europe have become available. These include the COSMO-REA6 reanalysis (Bollmeyer et al. 2015), which was developed by the Climate Monitoring and Diagnostics group of the Hans-Ertel-Centre for Weather Research (HErZ). The reanalysis is produced using the limited area model COSMO at a horizontal grid resolution of 6.2 km on the EURO-CORDEX-11 domain and is forced at the lateral boundaries by ERA-Interim. It uses a continuous nudging scheme, which only allows for a straightforward inclusion of prognostic variables; thus, the assimilated observational data are restricted to conventional observations. The nudging scheme is complemented by separate soil moisture, sea surface temperature, and snow analysis modules. The COSMO-REA6 dataset is available from 1995 to mid-2019 and is an operational climate monitoring product at the Deutscher Wetterdienst (DWD; German Meteorological Service), with a new version currently under development.
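To illustrate what nudging means in principle, the following toy sketch relaxes a single prognostic variable toward observations by adding a term g (y - x) to the model tendency. The function `nudge`, the relaxation coefficient `g`, and the explicit Euler time stepping are illustrative assumptions; operational schemes such as the one in COSMO typically also weight the relaxation in space and time, which is omitted here.

```python
from typing import Callable, List, Sequence


def nudge(
    x0: float,
    tendency: Callable[[float], float],
    obs: Sequence[float],
    g: float,
    dt: float,
) -> List[float]:
    """Toy Newtonian relaxation (nudging) of one prognostic variable.

    Each explicit Euler step adds the relaxation term g * (y - x) to the
    model tendency, pulling the state x toward the observed value y.
    """
    x = x0
    trajectory = [x]
    for y in obs:
        x = x + dt * (tendency(x) + g * (y - x))
        trajectory.append(x)
    return trajectory
```

Because only the tendency of a model variable is modified, the observation must correspond to a prognostic variable, which is why such a scheme cannot directly ingest, for example, satellite radiances.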

6) HARMONIE
The HARMONIE regional reanalysis system was developed at the Swedish Meteorological and Hydrological Institute (SMHI) in the framework of the EU FP7-funded project UERRA. The system is based on the HARMONIE NWP setup (system cycle 38h1.1) with the ALADIN physics scheme (Dahlgren et al. 2016). It applies a 3D-Var data assimilation scheme on its 11-km European model grid, with data available since 1961. Between 1961 and 2001, the reanalysis is based on the observational dataset of ECMWF's ERA-40 reanalysis complemented by Swedish and French data. After 2001, the system uses the available operational conventional observations. Lateral boundary conditions are obtained from ERA-40 for the years 1961-79 and from ERA-Interim after 1979. While this change in boundary forcing may have a considerable impact on the consistency of the dataset as a whole, it does not affect our period under investigation.

7) NORTH AMERICAN REGIONAL REANALYSIS
The North American Regional Reanalysis (NARR; Mesinger et al. 2006), developed by the National Centers for Environmental Prediction (NCEP), is the first regional reanalysis dataset ever produced. The reanalysis system is based on a 2003 version of the Eta Model Data Assimilation System (EDAS), which uses 3D-Var, at a horizontal grid size of 32 km, with boundary data provided by the NCEP-DOE Reanalysis (R2). In addition to conventional observations, the data assimilation scheme is supplied with various other observational data, such as cloud drift winds from satellites, lake surface temperatures and ice data, snow cover, and hourly precipitation products. The data are available from 1979 onward and are still being continued to near present.

b. Observations
As a reference for the quality of the reanalysis estimates, we calculate climate indices for a large number of observation sites providing daily measurements. These are obtained from the European Climate Assessment and Dataset (ECA&D; e.g., Klein Tank et al. 2002) database for Europe as well as the Global Historical Climatology Network, version 4 (GHCNv4; Menne et al. 2018) for North America.
All reanalyses assimilate large amounts of atmospheric observations. The global reanalysis systems comprise many observational datasets, including multiple satellite platforms. COSMO-REA6 and HARMONIE only employ conventional observations (SYNOP, ship, buoy, airplane, balloon, radiosonde, etc.) in their data assimilation schemes. In contrast, NARR is the only regional reanalysis that also includes satellite observations. However, the reanalysis estimates are not completely independent of the verifying observations. All reanalysis systems except MERRA-2 use 2-m temperature observations in some form in their data assimilation. Rain gauge observations are independent for COSMO-REA6, HARMONIE, ERA-Interim, and JRA-55, as well as for ERA5 over Europe. NARR assimilates precipitation over the continental United States, Mexico, and Canada via a gridded analysis of rain gauge observations. Since mid-2009, ERA5 has assimilated the NCEP Stage IV precipitation product, a combination of radar and station observations, over the United States. The corrected precipitation fields of MERRA-2 are based mainly on the CPCU product [see section 2a (3)] over both Europe and North America.

Methodology
In this section, we describe our approach to compare the various reanalysis datasets with the observations with respect to the representation of climate.

a. Intercomparison approach
First, we build daily precipitation and minimum and maximum temperature time series from each reanalysis for each of the observation locations. To this end, we employ a straightforward nearest neighbor approach (i.e., for every observation site, we take the value of the reanalysis grid box in which the site is located). For temperature, we intentionally do not apply a height correction from the reanalysis orography to the station height, as we do not expect the majority of users, especially in nonscientific applications, to go to such lengths rather than take the data as is. Further, for precipitation such a correction would not be trivial in any case due to the highly nonlinear processes involved.
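A minimal sketch of the nearest neighbor extraction, assuming a regular latitude-longitude grid described by 1D arrays of cell-centre coordinates and a data array ordered as (time, lat, lon). The function names are hypothetical. On such a grid, the cell containing a site is the one whose centre is nearest in latitude and, separately, in longitude; for rotated or projected grids, as used by several regional reanalyses, the search would instead run over 2D coordinate arrays.

```python
import numpy as np


def nearest_grid_index(lats, lons, site_lat, site_lon):
    """(i, j) of the grid cell containing a site on a regular lat-lon grid."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    i = int(np.argmin(np.abs(lats - site_lat)))
    # Wrap longitude differences into [-180, 180) so that, e.g., 359.8 and 0.0
    # are recognized as neighbours across the dateline/prime meridian seam.
    dlon = (lons - site_lon + 180.0) % 360.0 - 180.0
    j = int(np.argmin(np.abs(dlon)))
    return i, j


def extract_series(field, lats, lons, site_lat, site_lon):
    """Daily time series of the containing grid box from a (time, lat, lon) array."""
    i, j = nearest_grid_index(lats, lons, site_lat, site_lon)
    return np.asarray(field)[:, i, j]
```

No vertical adjustment is applied, matching the choice described above to use the data as is.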
While reanalyses always provide complete time series, the observational data exhibit gaps and missing values. To ensure a sound analysis, we only use stations that provide annual time series with more than 360 daily observations for at least 15 of the 24 years in the comparison period.
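The completeness criterion can be expressed as a short filter. This sketch assumes that missing observations are simply absent from the input; the function name `station_is_usable` and its default arguments are illustrative and encode the thresholds stated above (more than 360 valid days in at least 15 of the 24 years 1995-2018).

```python
from collections import Counter
from datetime import date
from typing import Iterable


def station_is_usable(
    obs_dates: Iterable[date],
    first_year: int = 1995,
    last_year: int = 2018,
    min_days_per_year: int = 361,
    min_years: int = 15,
) -> bool:
    """Return True if a station passes the completeness criterion.

    obs_dates: dates with a valid (non-missing) daily observation.
    A year is complete if it has more than 360, i.e. at least 361, valid days.
    """
    per_year = Counter(d.year for d in obs_dates if first_year <= d.year <= last_year)
    complete_years = sum(1 for n in per_year.values() if n >= min_days_per_year)
    return complete_years >= min_years
```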
From each of the reanalysis and observational time series, we calculate annual values for selected climate indices (cf. section 3c). Then, we employ two standard metrics, namely bias and correlation, to evaluate the performance of the reanalyses at the observation locations. Specifically, the bias at a single location is calculated for each climate index as

$$\mathrm{bias} = \overline{CI}_{\mathrm{REA}} - \overline{CI}_{\mathrm{OBS}},$$

where $\overline{CI}_{\mathrm{REA}}$ and $\overline{CI}_{\mathrm{OBS}}$ are the average values of a climate index calculated from the reanalysis and observational data, respectively. The correlation for each location is calculated as the Pearson correlation coefficient

$$r = \frac{\mathrm{cov}(CI_{\mathrm{REA}}, CI_{\mathrm{OBS}})}{\sigma_{\mathrm{REA}}\,\sigma_{\mathrm{OBS}}},$$

with $\mathrm{cov}(\cdot)$ denoting the covariance and $\sigma_{\mathrm{REA}}$ and $\sigma_{\mathrm{OBS}}$ the standard deviations of the respective annual climate index time series.
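For illustration, the two metrics for one station and one climate index could be computed as follows with NumPy. The sketch assumes that both annual series are already aligned on the same years; the function names are hypothetical. Population moments are used in both the numerator and the denominator of r, so the normalization cancels.

```python
import numpy as np


def bias(ci_rea, ci_obs):
    """Mean annual index of the reanalysis minus that of the observations."""
    return float(np.mean(ci_rea) - np.mean(ci_obs))


def pearson_r(ci_rea, ci_obs):
    """Pearson correlation of two aligned annual index time series."""
    a = np.asarray(ci_rea, dtype=float)
    b = np.asarray(ci_obs, dtype=float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))  # population covariance
    return float(cov / (a.std() * b.std()))         # population standard deviations
```

A negative bias thus indicates that the reanalysis underestimates the index, which is the sign convention used when discussing the results.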

b. Ranking
To compare the performance of the reanalysis datasets across stations, we rank the reanalyses for each station, climate index, and metric. This ranking takes into account that the value of a metric at a station may differ numerically between reanalyses while these differences may not be statistically significant. The specific steps to obtain the ranking for a certain climate index are the following: 1) Determine the best reanalysis REA1 (with the smallest bias or the largest correlation) at each station location. 2) Perform a t test between REA1 and every other reanalysis to assess whether the average (expected) value of the metric differs significantly between the two samples (reanalyses). 3) Assign the first rank for this climate index, station, and metric to REA1 and to all other reanalyses for which the null hypothesis of equal means cannot be rejected at the 1% significance level.
Due to the fact that multiple reanalyses can occupy the first rank, the sum over all reanalyses for a certain rank may well exceed 100%.
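The ranking for a single station can be sketched with SciPy's two-sample `scipy.stats.ttest_ind`. The text does not state which samples enter the t test; this sketch covers only the bias metric and assumes that the samples are the annual absolute errors |CI_REA - CI_OBS| of each reanalysis, so that the best reanalysis is the one with the smallest mean absolute error. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def first_rank(annual_errors, alpha=0.01):
    """Reanalyses sharing the first rank (bias metric) at one station.

    annual_errors: dict mapping a reanalysis name to its annual absolute
    errors |CI_REA - CI_OBS| (one value per year; an assumption of this sketch).
    """
    means = {name: float(np.mean(e)) for name, e in annual_errors.items()}
    best = min(means, key=means.get)  # step 1: smallest mean absolute error
    winners = {best}
    for name, errors in annual_errors.items():
        if name == best:
            continue
        # step 2: two-sample t test of the best reanalysis against this one
        p_value = stats.ttest_ind(annual_errors[best], errors).pvalue
        # step 3: no significant difference at level alpha -> shares rank 1
        if p_value >= alpha:
            winners.add(name)
    return winners


def first_rank_percentages(stations, alpha=0.01):
    """Percentage of stations at which each reanalysis holds the first rank.

    Ties are allowed, so the percentages can sum to more than 100 %.
    """
    counts = {}
    for annual_errors in stations:
        for name in annual_errors:
            counts.setdefault(name, 0)
        for name in first_rank(annual_errors, alpha):
            counts[name] += 1
    return {name: 100.0 * c / len(stations) for name, c in counts.items()}
```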

c. Climate indices
Climate indices are an essential tool to describe and monitor the state of the climate system. In this regard, the WMO Expert Team on Climate Change Detection and Indices (ETCCDI) has defined a core set of 27 indices based on daily temperature and precipitation which describe frequency, amplitude, and persistence of certain climatic characteristics (Klein Tank et al. 2009).
In our intercomparison approach estimates of climate indices are derived from the reanalyses and compared to those calculated with observation time series. To allow for the inclusion of various aspects, a set of different climate indices for temperature and precipitation is employed and described in the following.
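The temperature and precipitation indices referred to in the results (SD, HD, FD, ID, TR, RX1D, RX5D, CDD, and CWD) are threshold- and spell-based annual statistics. The sketch below assumes the standard ETCCDI definitions (summer days: Tmax > 25 °C; frost days: Tmin < 0 °C; ice days: Tmax < 0 °C; tropical nights: Tmin > 20 °C; wet day: at least 1 mm), which may differ in detail from the definitions used in this study. The hot-day threshold of 30 °C is a hypothetical choice, since the ETCCDI core set contains no hot-day index, and spells crossing the year boundary are ignored for simplicity.

```python
from typing import Callable, Iterable, Sequence


def count_days(values: Iterable[float], condition: Callable[[float], bool]) -> int:
    """Number of days on which the condition holds."""
    return sum(1 for v in values if condition(v))


def annual_temperature_indices(tmax: Sequence[float], tmin: Sequence[float],
                               hot_threshold: float = 30.0) -> dict:
    """Annual day counts from one year of daily Tmax/Tmin in degrees Celsius."""
    return {
        "SD": count_days(tmax, lambda t: t > 25.0),            # summer days
        "HD": count_days(tmax, lambda t: t >= hot_threshold),  # hot days (assumed threshold)
        "FD": count_days(tmin, lambda t: t < 0.0),             # frost days
        "ID": count_days(tmax, lambda t: t < 0.0),             # ice days
        "TR": count_days(tmin, lambda t: t > 20.0),            # tropical nights
    }


def longest_run(flags: Iterable[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def annual_precipitation_indices(pr: Sequence[float], wet_threshold: float = 1.0) -> dict:
    """Annual extremes and spell lengths from one year of daily precipitation in mm."""
    rx5 = max(sum(pr[k:k + 5]) for k in range(len(pr) - 4))  # max 5-day total
    return {
        "RX1D": max(pr),
        "RX5D": rx5,
        "CDD": longest_run(p < wet_threshold for p in pr),   # longest dry spell
        "CWD": longest_run(p >= wet_threshold for p in pr),  # longest wet spell
    }
```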

3) HEAT AND COLD WAVE INDEX
More complex indices describing extreme temperature anomalies (i.e., heat or cold waves) are given by Lavaysse et al. (2018). The indices are based on the daily maximum temperature $T_{\max}$ and the daily minimum temperature $T_{\min}$. Heat waves are identified as periods in which the daily maximum and minimum temperatures exceed $T_{\max}^{q90}$ and $T_{\min}^{q90}$, respectively, on at least three consecutive days in the extended Northern Hemisphere summer (April-September). In the same way, cold waves are defined as periods with daily maximum and minimum temperatures below $T_{\max}^{q10}$ and $T_{\min}^{q10}$ in the extended Northern Hemisphere winter (October-March). Following Lavaysse et al. (2018), two heat (cold) waves are merged into one if they are separated by just one day. The reference thresholds are defined as the climatological 90% and 10% quantiles, determined for each location and each day of the year separately using an 11-day window centered on the calendar date under consideration. The climatological dataset used for these calculations is taken from historic observations in the time period before 1995, requiring at least 110 observations in the window (i.e., the equivalent of 10 years).
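The day-of-year thresholds can be sketched as follows. Every historic observation is pooled into all calendar days within ±5 days of its date, so that each calendar day collects the values of an 11-day window from every year; a threshold is only returned if at least 110 values are available. The function name and the dict-based input are hypothetical, and NumPy's default linear interpolation is assumed for the quantile estimate.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Optional, Tuple

import numpy as np


def daily_quantile_thresholds(
    series: Dict[date, float],
    q: float,
    half_window: int = 5,
    min_count: int = 110,
) -> Dict[Tuple[int, int], Optional[float]]:
    """Day-of-year quantile thresholds from a historic daily record.

    series: maps each date with a valid observation to its value (e.g. Tmax).
    q: quantile in [0, 1], e.g. 0.9 for the 90% threshold.
    Returns a dict keyed by (month, day); the value is None where fewer than
    min_count values fall inside the (2 * half_window + 1)-day window.
    """
    pooled = defaultdict(list)
    for d, value in series.items():
        # The observation on date d belongs to the window of every calendar
        # day within +/- half_window days of d (also across year boundaries).
        for offset in range(-half_window, half_window + 1):
            target = d + timedelta(days=offset)
            pooled[(target.month, target.day)].append(value)
    return {
        key: float(np.quantile(values, q)) if len(values) >= min_count else None
        for key, values in pooled.items()
    }
```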
With this definition, we obtain the following two indices: HWI (heat wave days), the total number of days in a year on which a heat wave is detected, and CWI (cold wave days), the total number of days in a year on which a cold wave is detected.
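Given daily flags that indicate whether both temperatures exceed (or fall below) their thresholds, heat or cold wave days can be counted as in the following sketch. Two details are assumptions because the text does not specify them: the three-day minimum length is checked before merging, and the single gap day between two merged waves is counted as a wave day. The function name `wave_days` is hypothetical.

```python
from typing import Sequence


def wave_days(exceed: Sequence[bool], min_length: int = 3, max_gap: int = 1) -> int:
    """Number of heat (or cold) wave days in one season.

    exceed[k] is True when both Tmax and Tmin on day k exceed (heat waves) or
    fall below (cold waves) their day-of-year thresholds.
    """
    # 1) Collect runs of qualifying days that last at least min_length days.
    events = []
    start = None
    for k, flag in enumerate(list(exceed) + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            if k - start >= min_length:
                events.append([start, k - 1])
            start = None
    # 2) Merge events separated by at most max_gap non-qualifying days;
    #    the gap days are counted as part of the merged wave (assumption).
    merged = []
    for event in events:
        if merged and event[0] - merged[-1][1] - 1 <= max_gap:
            merged[-1][1] = event[1]
        else:
            merged.append(event)
    return sum(last - first + 1 for first, last in merged)
```

For HWI the flags would cover April-September and for CWI October-March of each year.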

4) STANDARDIZED PRECIPITATION INDEX
In addition to the other precipitation indices, we also apply the more complex standardized precipitation index (SPI; McKee et al. 1993), which is designed to indicate periods with drier or wetter conditions compared to climatology and is often employed to detect droughts. Following Guttman (1999), the SPI is calculated at each station by 1) determining the empirical cumulative distribution function (ECDF) for a specific time scale (e.g., annual, seasonal) from long-term historic precipitation records as a reference, 2) transforming the obtained ECDF into a standard normal distribution, 3) finding the corresponding ECDF value for the accumulated precipitation in each year, and 4) assigning these values their respective standard normal transformed values as the SPI.
The SPI is given in terms of standard deviations of observed precipitation with respect to the climatological mean. Positive (negative) SPI values indicate wetter (drier) conditions compared to climatology at the respective location for a given time scale. In this study, the analysis is focused on summer, winter, and annual precipitation amounts. To obtain a reasonable climatological reference, the respective ECDFs are estimated from the available historic observations at each station with at least 30 years of complete data for the respective time scale (including the evaluation period).
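A minimal sketch of the empirical SPI for one time scale, using only the Python standard library. The empirical non-exceedance probability is estimated with the plotting position (rank + 0.5)/(n + 1), a hypothetical choice that keeps the probability strictly between 0 and 1 so that the inverse normal transform is always defined; the estimator used in this study may differ. The function name is illustrative.

```python
from statistics import NormalDist
from typing import Sequence


def empirical_spi(reference: Sequence[float], value: float, min_years: int = 30) -> float:
    """Empirical SPI of one accumulated precipitation total.

    reference: historic totals for the same time scale (e.g. all summer sums).
    value: total for the year or season being evaluated.
    """
    n = len(reference)
    if n < min_years:
        raise ValueError(f"need at least {min_years} reference values, got {n}")
    # Empirical non-exceedance probability; the (rank + 0.5) / (n + 1)
    # plotting position keeps p strictly inside (0, 1).
    rank = sum(1 for r in reference if r <= value)
    p = (rank + 0.5) / (n + 1)
    # Map the probability onto the standard normal distribution.
    return NormalDist().inv_cdf(p)
```

A value at the median of the reference yields an SPI of zero, and values above (below) it yield positive (negative) SPI, matching the interpretation given above.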

Results
The climate indices derived from the reanalyses are compared to the indices calculated with the observation time series. Note that the number of stations for evaluation also varies with the climate index in consideration, as indices such as hot days or ice days cannot be observed in the far north or south, respectively, of the domains.
The following subsections depict the results of the intercomparison and outline their main characteristics from an overall (section 4a) as well as spatially differentiated (section 4b) point of view. Section 5 then summarizes and discusses the key findings.

a. Overall agreement of reanalyses and observations
To provide an overview of the performance of the reanalyses with respect to climate, we calculate the metrics bias and correlation at each station location for all climate indices and reanalyses as described in section 3a. The results are presented as box plots in Figs. 1, 2, 5, and 6. The box represents the interquartile range with the horizontal line depicting the median, whereas the whiskers denote the respective 10th and 90th percentiles.

1) TEMPERATURE-BASED INDICES
Figure 1 shows box plots over all European stations for each reanalysis with respect to bias (upper plot) and correlation (lower plot) of the temperature-based indices. Negative systematic biases can be found for all reanalyses except HARMONIE for summer days (SD) and hot days (HD), and also at most stations for the heat wave index (HWI), while positive biases prevail for ice days (ID). In general, the reanalysis estimates vary considerably between the different reanalyses for the same index. Further, the spread of bias among the stations also varies considerably from one reanalysis to another. For example, the median COSMO-REA6 bias for HWI is -1 day with an interquartile range of about ±4 days. In contrast, the HARMONIE reanalysis exhibits a strong cold bias with a median of -6 days but with an interquartile range of only about ±2 days. For the correlation (lower plot in Fig. 1), we also find large variations over stations for each reanalysis as well as among the different reanalyses. For tropical nights (TR), all reanalyses have a large interquartile range of about 0.35, with the 10th percentile going down to correlations close to 0.2. High correlations can especially be found for ID, with interquartile values above 0.9 for most reanalyses. Considerably lower correlations than for the other reanalyses can be seen for SD, HD, and the cold wave index (CWI) with JRA-55 and for HWI, ID, and to some extent frost days (FD) with MERRA-2.
Results for temperature indices over North America are shown in Fig. 2 (please note the larger y-axis limits compared to Fig. 1). Here, mostly negative biases occur for SD, HD, FD, and CWI, and positive biases for TR, while for HWI the median bias of all reanalyses is close to zero. While variations among stations are quite large, a median close to zero can be found for ERA-Interim for most indices and for NARR for some indices. With respect to the correlation of the temperature-based climate indices over North America (lower plot in Fig. 2), large variations over the stations can be observed for each reanalysis, with slightly higher correlations for NARR than for the global reanalyses. In contrast to Europe, the correlations are more homogeneous across the reanalyses for North America but exhibit larger interquartile ranges over stations of about 0.25-0.3 for all indices. However, the correlation values are in general lower for North America (except for TR), with median correlations mostly below 0.8, while the median is generally above 0.8 for most indices and reanalyses over Europe.
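The summary statistics drawn in the box plots can be derived from the per-station metric values of one reanalysis and one index as in this sketch. The function name is hypothetical, and NumPy's default linear interpolation between order statistics is assumed; plotting software may use a different percentile definition.

```python
import numpy as np


def box_stats(station_values):
    """Summary of a metric over stations, as drawn in the box plots.

    Box: 25th-75th percentile (interquartile range); line: median;
    whiskers: 10th and 90th percentiles.
    """
    p10, p25, p50, p75, p90 = np.percentile(station_values, [10, 25, 50, 75, 90])
    return {
        "p10": float(p10),
        "q1": float(p25),
        "median": float(p50),
        "q3": float(p75),
        "p90": float(p90),
        "iqr": float(p75 - p25),
    }
```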
It is rather difficult to draw useful conclusions for specific applications from the results shown above. Therefore, we now address the major question for reanalysis users: Which reanalysis best represents the parameter(s) of interest? To this end, we employ the ranking described in section 3b for each climate index, metric, and station. Figures 3 and 4 indicate how often a reanalysis performs best (i.e., populates the first rank) for each of the temperature-based climate indices. Specifically, the plots show for each reanalysis the percentage of stations where it obtains the first rank for bias (top) and correlation (bottom). Hence, longer colored bars indicate a better performance in the respective reanalysis-climate index combination. Because multiple reanalyses can occupy the first rank, the sum over all reanalyses for a climate index can be larger than 100%.
Looking at the results of the ranking for Europe in Fig. 3, the regional reanalyses perform well, and for SD, HD, and CWI at least one of them outperforms the global reanalyses, even the new ERA5 with its recent NWP and data assimilation systems and increased horizontal resolution. The best overall performance can be observed for HARMONIE for SD, COSMO-REA6 for HD and CWI, ERA5 for HWI and FD, ERA-Interim for ID, and MERRA-2 for TR. In general, JRA-55 performs poorly compared to the other reanalyses, with the lowest or second-lowest percentages for six of the seven indices for bias and for all indices for correlation. Further, ERA5 outperforms its predecessor ERA-Interim for six of the seven indices for bias and for all indices for correlation.
For North America (cf. Fig. 4), an unexpected result for the temperature-based indices is that ERA-Interim shows the highest or second-highest percentages for six of the seven indices in bias and correlation and often outperforms its successor ERA5. The other major finding is that NARR is the best or second-best reanalysis for all indices in bias and correlation despite using a rather old forecasting and data assimilation system (frozen in 2003) and having a coarser horizontal resolution (approximately 32 km) than more current regional reanalyses. The other global reanalyses (ERA5, MERRA-2, and JRA-55) mostly lie close together, between 10% and 20% for bias and around 30% for correlation.

2) PRECIPITATION-BASED INDICES
The results for precipitation-based indices are shown as box plots in Fig. 5 (Europe) and Fig. 6 (North America). For Europe, all reanalyses except COSMO-REA6 exhibit a wet bias, with a negative bias for dry spell length (CDD) and a positive bias for wet spell length (CWD); COSMO-REA6 has a median bias close to zero for both indices. Further, all reanalyses underestimate the occurrence of extreme precipitation events, as indicated by negative biases for maximum 1-day precipitation (RX1D) and maximum 5-day precipitation (RX5D), with COSMO-REA6 being closest to zero. With respect to the SPIs, ERA-Interim performs well, with a median bias close to zero for all three indices, while the other global reanalyses again show a wet bias with mostly positive SPI values. COSMO-REA6 has a dry bias in summer, which has also been observed by Wahl et al. (2017) as an underestimation of monthly precipitation in the summer season. With nearly no overall bias in winter SPI, COSMO-REA6 consequently also exhibits negative SPI values on the annual time scale. In comparison, HARMONIE performs quite well for the annual SPI, but only because an overestimation of summer precipitation and an underestimation of winter precipitation compensate each other. Regarding correlation (lower plot in Fig. 5), the performance is more homogeneous among the reanalyses, with considerably lower overall correlation values for RX1D and CWD. COSMO-REA6 has somewhat lower correlations for most indices than the other reanalyses, which may be a result of the double penalty problem (cf. section 5c).
The results for North America, shown in Fig. 6, indicate systematic biases similar to those over Europe (negative values for CDD, positive values for CWD), but with larger variability among stations and a stronger underestimation of extreme precipitation events in RX1D and RX5D (note the different y-axis range compared to Fig. 5). In terms of the median bias over all stations, NARR is closest to zero for RX1D, RX5D, and CWD, while JRA-55 exhibits a very good performance for CDD. With respect to SPI, which evaluates aggregated long-term precipitation, MERRA-2 shows very good overall performance, while JRA-55 (NARR) generally overestimates (underestimates) annual and seasonal precipitation. ERA5 performs well in terms of bias for annual SPI, but only because an overestimation of winter precipitation compensates for an underestimation of summer precipitation. ERA-Interim underestimates annual precipitation as a consequence of underestimating summer precipitation. Correlations (lower plot of Fig. 6) are slightly lower than for the European stations, with JRA-55 showing the highest correlations for RX1D, RX5D, and CDD, while NARR performs best for CWD and the SPI-based indices.
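Why a positive SPI bias means too much precipitation can be shown with a short sketch. The standard SPI fits a gamma distribution to aggregated precipitation and maps it to a standard normal variable; purely to stay dependency-free, the sketch swaps in a nonparametric (empirical CDF) mapping. It further assumes that reanalysis totals are expressed relative to the observed station climatology, which is not restated in this section. Under that assumption, a reanalysis that overestimates precipitation obtains a positive mean SPI bias. Function names and numbers are hypothetical.

```python
from statistics import NormalDist, fmean

# Hypothetical, dependency-free SPI sketch. The usual SPI fits a gamma
# distribution; here an empirical CDF of a reference sample is used instead.


def empirical_spi(value, reference):
    """Map a precipitation total to a standard-normal quantile (SPI-like).

    reference: observed totals for the same season/aggregation period.
    Ties count as half below; the +0.5 and n+1 terms keep the probability
    strictly inside (0, 1) so that inv_cdf is always defined.
    """
    n = len(reference)
    below = sum(r < value for r in reference)
    ties = sum(r == value for r in reference)
    prob = (below + 0.5 * ties + 0.5) / (n + 1)
    return NormalDist().inv_cdf(prob)


def spi_bias(reanalysis_totals, observed_totals):
    """Mean SPI of the reanalysis minus mean SPI of the observations,
    both evaluated against the observed climatology (an assumption)."""
    rea = [empirical_spi(x, observed_totals) for x in reanalysis_totals]
    obs = [empirical_spi(x, observed_totals) for x in observed_totals]
    return fmean(rea) - fmean(obs)


observed = [100, 200, 300, 400, 500]  # e.g. annual totals in mm (made up)
print(spi_bias([120, 240, 360, 480, 600], observed) > 0)  # wetter reanalysis
print(spi_bias([80, 160, 240, 320, 400], observed) < 0)   # drier reanalysis
```

Both lines print `True`: the wetter series yields a positive and the drier series a negative SPI bias.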
Turning again to the ranking of reanalyses at each station as described for the temperature-based indices in the previous subsection, results for the percentages of the first rank for precipitation-based indices are shown in Fig. 7 for Europe and Fig. 8 for North America.
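The first-rank statistic can be sketched as a per-station tally. This is a simplified stand-in: Figs. 7 and 8 count a reanalysis only if it lies in the significantly independent top rank at the 1% significance level, whereas the hypothetical function below credits every reanalysis that ties for the best score and applies no significance test, so its percentages may sum to more than 100%. Station names and scores are made up.

```python
from collections import Counter


def first_rank_percentages(scores_by_station, higher_is_better=False):
    """Percentage of stations at which each reanalysis ranks first.

    scores_by_station: {station: {reanalysis: score}}, where score is e.g.
    the absolute bias (lower is better) or the correlation (higher is better).
    Simplification: tied best scores all count as first rank; no
    significance test is applied.
    """
    sign = -1.0 if higher_is_better else 1.0
    wins = Counter()
    names = set()
    for scores in scores_by_station.values():
        names.update(scores)
        best = min(sign * s for s in scores.values())
        wins.update(name for name, s in scores.items() if sign * s == best)
    n = len(scores_by_station)
    return {name: 100.0 * wins[name] / n for name in sorted(names)}


abs_bias = {  # made-up absolute biases (e.g. days) at four stations
    "st1": {"ERA5": 0.5, "NARR": 0.2},
    "st2": {"ERA5": 0.1, "NARR": 0.3},
    "st3": {"ERA5": 0.6, "NARR": 0.4},
    "st4": {"ERA5": 0.9, "NARR": 0.7},
}
print(first_rank_percentages(abs_bias))  # {'ERA5': 25.0, 'NARR': 75.0}
```

For correlation, passing `higher_is_better=True` turns the same tally into a "highest correlation" ranking.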
For Europe, the regional reanalyses (especially COSMO-REA6) outperform the global reanalyses for all indices except summer SPI in terms of bias and except RX5D, annual SPI, and summer SPI in terms of correlation. While the results are more evenly distributed across the reanalyses for correlation, the performance of the reanalyses varies much more between indices for bias. Here, each reanalysis has the highest or second-highest percentage for at least one index. COSMO-REA6 shows the highest percentages in bias for five indices and leads by a large margin for RX1D, CDD, and CWD, where it also exhibits the highest percentages in correlation. HARMONIE and COSMO-REA6 are close together for bias of the SPI-based indices and are the best reanalyses for annual and winter SPI. Overall, ERA5 performs well for correlation, with the highest or second-highest percentages for five of the seven indices.
For North America, NARR outperforms the global reanalyses, similar to the temperature-based indices, with the highest or second-highest percentages of first rank for all indices but CDD. For this index, JRA-55 clearly outperforms the other reanalyses, consistent with its significantly smaller bias for CDD shown in Fig. 6. Further, MERRA-2 performs well for the longer-term SPI indices in both bias and correlation. ERA-Interim outperforms its successor ERA5 for the three indices RX1D, RX5D, and CDD, while ERA5 has higher bias and correlation percentages for the SPI indices.

b. Spatial agreement of reanalyses and observations
While the previous evaluations focus on an aggregated evaluation over all stations of a continent, this section examines the spatial distribution of climate representation in reanalyses. To that end, we split the stations into 93 regions for Europe and 95 regions for North America.
For Europe, these regions are taken from the European Union NUTS-1 level definitions based on socioeconomic features and are complemented by country-level regions for non-EU countries. However, due to the availability of data, the number of stations varies considerably among the regions and climate indices. For Europe, the number of stations per region ranges between 1 and 191 for temperature-based and between 1 and 1091 for precipitation-based indices. For North America, the regions are obtained as the federal states for the United States and Mexico and the provinces and territories for Canada. Here, the number of stations per region lies between 1 and 466 for temperature-based and between 1 and 383 for precipitation-based indices. Details on the number of stations per region can be found in Tables S1-S28 in the online supplemental material.

FIG. 5. As in Fig. 1, but for climate indices related to precipitation over Europe. For better readability, the upper plot is split in two with RX1D, RX5D, CDD, and CWD on the left (left axis ticks; dimensions are mm and days) and with SPI-related indices on the right (right axis ticks; dimension is standard deviations).
To illustrate the spatial heterogeneity of the reanalyses' performance, example maps indicating the best-performing reanalysis in each region for bias and correlation of HD and RX5D are shown in Figs. 9-16. The performance is estimated using the same ranking as introduced before; that is, for each reanalysis, the average rank over all stations in a region is calculated for bias as well as correlation. A reanalysis performs best if it exhibits the lowest average rank over all stations within a region. Maps and comprehensive tables with results for all indices can be found in the supplemental material. Figure 9 shows the best-ranking reanalyses for HD bias in European regions; regions where two reanalyses share the same average rank are indicated by hatched shading. COSMO-REA6 is the best reanalysis in most regions, with MERRA-2 and HARMONIE exhibiting the best average rank in the rest. For HD correlation (cf. Fig. 10), the large majority of regions goes to either ERA5 (mainly eastern Europe) or COSMO-REA6 (mainly western Europe). The remaining regions are divided among MERRA-2 (6 regions), HARMONIE (6 regions), and ERA-Interim (3 regions).
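The regional aggregation just described can be sketched as follows: rank the reanalyses at each station, average the ranks per region, and report every reanalysis that attains the lowest average rank, so that more than one name corresponds to a hatched region. The station-level tie rule (tied scores share the better rank) and all names and values are assumptions made for illustration, not the study's code.

```python
from collections import defaultdict
from statistics import mean


def station_ranks(scores, higher_is_better=False):
    """Rank reanalyses at one station (1 = best); tied scores share a rank."""
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {name: ordered.index(s) + 1 for name, s in scores.items()}


def best_per_region(scores_by_station, region_of, higher_is_better=False):
    """Return {region: [reanalyses with the lowest average rank]}."""
    ranks = defaultdict(lambda: defaultdict(list))
    for station, scores in scores_by_station.items():
        for name, rank in station_ranks(scores, higher_is_better).items():
            ranks[region_of[station]][name].append(rank)
    best = {}
    for region, by_name in ranks.items():
        averages = {name: mean(r) for name, r in by_name.items()}
        lowest = min(averages.values())
        # more than one entry means a tie (hatched shading in the maps)
        best[region] = sorted(n for n, a in averages.items() if a == lowest)
    return best


abs_bias = {  # made-up absolute HD biases (days)
    "s1": {"ERA5": 0.3, "COSMO-REA6": 0.1},
    "s2": {"ERA5": 0.2, "COSMO-REA6": 0.4},
    "s3": {"ERA5": 0.5, "COSMO-REA6": 0.2},
}
region_of = {"s1": "R1", "s2": "R1", "s3": "R2"}
print(best_per_region(abs_bias, region_of))
# {'R1': ['COSMO-REA6', 'ERA5'], 'R2': ['COSMO-REA6']}
```

Region R1 ends in a tie (both average ranks are 1.5), which in the maps would appear as hatched shading.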
For North America, the HD ranking is shown in Figs. 11 and 12. Two reanalyses, NARR and ERA-Interim, hold the best average rank in most regions. For bias, NARR performs best along the Pacific coast, in the northwestern regions, northeastern Canada, and southern Mexico, whereas ERA-Interim is superior along the U.S. Atlantic coast, from the south-central United States into Mexico, and in some states of the U.S. Midwest and central Canada. Of the other reanalyses, JRA-55 performs best for some states along the U.S. Gulf Coast and around Lake Michigan, with ERA5 (4 regions) and MERRA-2 (2 regions) claiming the remaining regions.
The results are somewhat reversed for HD correlation (cf. Fig. 12), as ERA-Interim performs best along the central and southern Pacific coast and in parts of the northwestern regions, while NARR is on average the best reanalysis from the south-central United States across the Midwest into the Great Lakes region. No clear pattern emerges for the remaining regions, which are split among JRA-55 (6 regions), ERA5 (5 regions), and MERRA-2 (2 regions).
Looking at the bias of RX5D over Europe (Fig. 13), COSMO-REA6 performs best in a majority of regions, including Scandinavia, but is outperformed by JRA-55 in the northern German states and parts of eastern Europe, while ERA5 performs best over the Iberian Peninsula, western England and Wales, and several regions in central Europe. For the correlation of RX5D (cf. Fig. 14), COSMO-REA6 has the best rank in only three regions, whereas ERA5 (Scandinavia and central Europe) and MERRA-2 (southern and eastern Europe) exhibit an overall better performance than the other reanalyses.

FIG. 6. As in Fig. 2, but for climate indices related to precipitation over North America. For better readability, the upper plot is split in two with RX1D, RX5D, CDD, and CWD on the left (left axis ticks; dimensions are mm and days) and with SPI-related indices on the right (right axis ticks; dimension is standard deviations).
For North America, it is difficult to point out a clear winner with regard to the bias of RX5D (Fig. 15). The best average rank in regions is mostly split between ERA-Interim and NARR, followed by MERRA-2. ERA-Interim performs best along the U.S. Gulf Coast, most of the Atlantic coast, and the northwestern part of Canada; NARR along the U.S. Pacific coast, in the north-central United States, and to the north and east of the Great Lakes; and MERRA-2 from the Mexican Pacific coast northeastward into the central United States as well as in northern Canada. ERA5 and JRA-55 each perform best in four distinct regions. A very different pattern emerges for RX5D correlation (Fig. 16), as JRA-55 performs best over the continental United States as well as in a few regions of Mexico and Canada. The observing systems used offer no obvious reason for this good performance over the United States; it might instead be related to the quality of the JRA-55 forecast model in representing extreme precipitation (S. Kobayashi 2020, personal communication). In the other regions, the best average rank is often taken by ERA-Interim (Mexico and parts of Canada), NARR (western Canada and Alaska), and MERRA-2 (northeastern Canada).

Summary: Intercomparison
In this section, we summarize the key findings of the intercomparison in three categories: (i) intercomparison of global reanalyses, (ii) improvement of ERA5 over ERA-Interim, and (iii) added value of regional reanalyses in Europe and North America.

a. Global reanalyses intercomparison
With respect to a general intercomparison of the global reanalyses over the European continent, ERA5 shows the best correlation for all temperature-based indices. In terms of bias, ERA5 is only outperformed by MERRA-2 among global reanalyses for indices related to warm temperatures (SD, HD, TR) and by ERA-Interim for ID. However, the best performance for precipitation-based indices over Europe varies between the four global datasets. Each system performs best among global reanalyses for at least one index either in terms of bias or correlation.
For North America, ERA-Interim is superior to the other global systems for nearly all temperature-based indices and is only outperformed by ERA5 for ID. With regard to precipitation, the results are again quite diverse: in terms of bias, ERA-Interim performs best for extreme precipitation (RX1D, RX5D), JRA-55 for CDD and CWD, and MERRA-2 for long-term aggregated precipitation (SPI). For correlation, the best performance among global reanalyses is split between JRA-55 (RX1D, RX5D, CDD, CWD) and MERRA-2 (SPI). In North America, ERA5 never performs best for precipitation indices, either in terms of bias or correlation.

b. ERA5 and ERA-Interim
With ERA5, ECMWF released the first fourth-generation global reanalysis dataset, which was expected to outperform its predecessor ERA-Interim. ERA5 was therefore highly anticipated, as ERA-Interim has been quite popular among a broad range of users. Over the European continent, the results indicate a clear improvement of ERA5 in representing climate for nearly all temperature-based indices in both bias and correlation. Only for ID does ERA-Interim have a smaller bias; here it performs better than all other reanalyses, which suggests that daily maximum temperatures in winter are well captured by ERA-Interim. With regard to precipitation over Europe, ERA5 again shows better correlations for nearly all indices and better performance in terms of bias for extreme precipitation (RX1D, RX5D) and CWD. However, aggregated precipitation is better represented by ERA-Interim, with an SPI bias close to zero, while ERA5 overestimates annual precipitation (positive bias in SPI) despite having a slightly better correlation. For North America, the results are quite different. ERA-Interim shows a better performance for nearly all temperature-based indices in terms of bias and correlation. It also represents extreme precipitation (RX1D, RX5D) and dry spell length (CDD) better than its successor ERA5. The reason for this performance deficit of ERA5 is unclear, and no obvious explanation is known to ECMWF yet (H. Hersbach 2020, personal communication). However, ERA5 outperforms ERA-Interim for wet spell length (CWD) and seasonal and annual aggregated precipitation (SPI), which might be due to the assimilation of a ground-based radar-gauge composite over the United States starting in 2009.

FIG. 7. Percentage of European stations where each reanalysis is in the significantly independent top rank (with respect to a significance level of 1%) of (top) lowest bias and (bottom) highest correlation for each climate index related to precipitation.
FIG. 8. Percentage of North American stations where each reanalysis is in the significantly independent top rank (with respect to a significance level of 1%) of (top) lowest bias and (bottom) highest correlation for each climate index related to precipitation.

c. Benefit of (high-resolution) regional reanalyses
With respect to regional reanalyses, we find that they add value in representing climate in Europe and North America compared to global reanalysis datasets, depending on the index under consideration. While the results are quite similar for both continents, we assume that the underlying reasons differ, as horizontal resolutions, data assimilation schemes, and included observing systems differ significantly.
For temperature, COSMO-REA6 and HARMONIE outperform the global reanalyses in Europe at least for SD, HD, and CWI, and NARR outperforms the global reanalyses for SD, ID, and CWI in North America. For precipitation-based indices, the added value of regional reanalyses is much more pronounced. COSMO-REA6 outperforms the global reanalyses for RX1D, RX5D (only bias), CDD, and CWD. Further, COSMO-REA6 and HARMONIE show the highest percentages of first rank for annual (only bias) and winter SPI. Similar results are obtained for North America, where NARR outperforms the global reanalyses for most of the precipitation-based indices.
The benefit for temperature-related indices from COSMO-REA6 and HARMONIE is quite surprising, since both reanalysis systems assimilate only conventional surface and upper-air variables and, unlike the global reanalyses, no satellite data. Further, the nudging data assimilation scheme used in COSMO-REA6 allows for the assimilation of 2-m humidity but not temperature. Therefore, a possible reason for the good performance in temperature indices, at least with respect to bias, is the better representation of surface orography and the corresponding decrease in systematic biases for surface temperatures. Similar findings are reported by Scherrer (2020), especially over complex terrain. He shows that high-resolution datasets indeed improve the representation of surface temperature, but also assumes that the assimilation of surface temperature can be problematic over elevated regions. This is indicated by the fact that COSMO-REA6, which does not directly assimilate surface temperature, is superior to HARMONIE and MESCAN-SURFEX in his study.

FIG. 11. As in Fig. 9, but for North America.
FIG. 12. As in Fig. 11, but for correlation of index HD.
FIG. 13. As in Fig. 9, but for bias of index RX5D.
For precipitation-based indices, improvements in COSMO-REA6 and HARMONIE are mainly expected due to the higher horizontal resolution and the associated enhanced representation of nonlinear, small-scale processes. In contrast, NARR assimilates additional observations from satellites and uses latent heat nudging, which might explain the good performance in precipitation indices despite the coarser horizontal resolution.
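For readers unfamiliar with nudging, the generic Newtonian relaxation form is sketched below; it is a textbook form, not the exact COSMO-REA6 or NARR formulation. The model tendency $F(\psi)$ of a prognostic variable $\psi$ is augmented by a relaxation term that pulls the model state toward observations $\psi^{\mathrm{obs}}_k$, with a nudging coefficient $G_{\psi}$ and weights $W_k$ that decrease with the spatial and temporal distance between the model point and observation $k$:

$$
\frac{\partial \psi}{\partial t} = F(\psi) + G_{\psi} \sum_{k} W_k \left( \psi^{\mathrm{obs}}_k - \psi(\mathbf{x}_k, t) \right)
$$

Roughly speaking, latent heat nudging applies the same idea to precipitation: the model's latent heating is adjusted so that the modeled precipitation moves toward the observed rates.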
In general, the smaller grid spacing is assumed to allow for a better representation of smaller scales; however, higher-resolution reanalyses may be at a disadvantage in the evaluation of their performance. The expected benefits may not become fully apparent in an evaluation due to the well-known double penalty problem (e.g., Gilleland et al. 2009). With decreasing grid cell size, point-to-point agreement with observations is presumed to deteriorate, as small displacement errors in the analysis fields are penalized twice: once for issuing a false alarm and again for missing the observed event.
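A toy one-dimensional example makes the double penalty concrete. In this hypothetical setup, a sharp field containing the right rain amount but displaced by one grid cell is compared with a smeared field, using mean absolute error as the point-wise metric; all values are invented for illustration.

```python
def mean_abs_error(field, observed):
    """Point-to-point mean absolute error over grid cells."""
    return sum(abs(f - o) for f, o in zip(field, observed)) / len(observed)


observed = [0.0, 0.0, 10.0, 0.0, 0.0]       # one observed rain cell (mm)
sharp_shifted = [0.0, 0.0, 0.0, 10.0, 0.0]  # correct amount, one cell off
smeared = [0.0, 2.0, 2.0, 2.0, 0.0]         # smooth, low-resolution-like field

# The shifted field is penalized twice: a miss at cell 2, a false alarm at 3.
print(mean_abs_error(sharp_shifted, observed))  # 4.0
print(mean_abs_error(smeared, observed))        # 2.4
```

Although the shifted field reproduces the intensity of the rain cell, it scores worse than the smeared field, which illustrates how a higher-resolution reanalysis can be disadvantaged by point-wise statistics.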

Conclusions
The representation of climate in reanalyses varies significantly not only between the different reanalyses but also between the parameters (i.e., the climate indices) and regions. Therefore, there is no general advice for users on the best reanalysis dataset. Rather, the choice of reanalysis strongly depends on the specific application and location. To support users in their decisions, we provide a ranking system for the reanalyses with respect to the representation of climate indices. The comprehensive results of our analysis are provided as online supplemental material. The information may serve as a guideline for choosing an appropriate reanalysis for a specific region and parameter.

FIG. 14. As in Fig. 9, but for correlation of index RX5D.
FIG. 15. As in Fig. 11, but for bias of index RX5D.
FIG. 16. As in Fig. 11, but for correlation of index RX5D.
In general, reanalyses assimilating either 2-m temperature observations or precipitation would be expected to benefit with respect to the representation of the respective climate indices. This would imply an advantage in representing temperature for all reanalyses except MERRA-2 and, to some extent, COSMO-REA6, and an advantage in representing precipitation for NARR and, partly, ERA5. Further, the higher horizontal resolution and the resulting enhanced model representation of the regional reanalyses are anticipated to give them an edge over the coarser-resolution global reanalyses.
As the results show, regional reanalyses are indeed able to outperform global reanalyses with respect to the representation of climate. For precipitation, this is a result of a better representation of small-scale processes and/or latent heat nudging (cf. Wahl et al. 2017). Regarding temperature, the apparent added value is likely an effect of the more detailed surface orography due to the higher horizontal resolution (e.g., Jacob et al. 2014; Scherrer 2020). For Europe, ERA5 still outperforms the regional reanalyses for some parameters owing to its sophisticated data assimilation scheme and comprehensive observational dataset. This effect is mostly visible in the correlation of the indices, where assimilation is clearly beneficial. In contrast, the bias of the reanalyses depends much more on the respective model physics.
Our study focuses on the representation of precipitation and temperature in a climatological context. We are aware of some limitations in our approach that should be addressed in future research. In our intercomparison, we concentrate on the continents of Europe and North America. For these regions, large high-quality observational datasets as well as continental-scale regional reanalyses are available. For other continents, these prerequisites are not yet fulfilled, although an Asian regional reanalysis is under development.
Moreover, our focus is on the most recent model-based atmospheric reanalysis systems, which describe the four-dimensional atmospheric state with a large set of physically consistent variables. It should be noted that other reanalysis products derived from these "full" reanalyses exist. These products aim at enhancing the quality of the original estimates for a single parameter or a set of parameters by applying some form of postprocessing. Examples of such datasets are ERA5-Land, which is a downscaling using the ERA5 land component, and the MESCAN-SURFEX surface reanalysis based on HARMONIE on a 5-km grid. Current developments of such reanalysis-based postprocessing datasets include MERRA-2-Land (M. Bosilovich 2020, personal communication) and a surface reanalysis scheme for COSMO-REA6. A follow-on study including or focusing on these data is intended.
As in most intercomparison studies, we evaluate parameters separately, neglecting the statistical or physical dependence between them. However, high-impact weather and extreme events often involve multiple parameters (i.e., compound events). With their physically consistent design, reanalyses provide the best data to represent these events correctly. Therefore, future reanalysis intercomparisons should also include compound events such as wildfires, heat stress, droughts, and floods. With ongoing developments, better climate estimates from reanalyses can be expected in the future. Efforts are under way to implement regional reanalyses for other areas of the world. Newer versions of the HARMONIE and COSMO-REA6 datasets are currently being developed, and these data are expected to further enhance the representation of climate.