Atmospheric circulation types, blockings, and cyclones are central features of the extratropical flow and key to understanding the climate system. This study intercompares the representation of these features in 10 reanalyses and in an ensemble of 30 climate model simulations between 1980 and 2005. Both modern, full-input reanalyses and century-long, surface-input reanalyses are examined. Modern full-input reanalyses agree well on key statistics of blockings, cyclones, and circulation types. However, the intensity and depth of cyclones vary among them. Reanalyses with higher horizontal resolution show higher cyclone center densities and more intense cyclones. For blockings, no strict relationship is found between frequency or intensity and horizontal resolution. Full-input reanalyses contain more intense blocking, compared to surface-input reanalyses. Circulation-type classifications over central Europe show that both versions of the Twentieth Century Reanalysis dataset contain more easterlies and fewer westerlies than any other reanalysis, owing to their high pressure bias over northeast Europe. The temporal correlation of annual circulation types over central Europe and blocking frequencies over the North Atlantic–European domain between reanalyses is high (around 0.8). The ensemble simulations capture the main characteristics of midlatitudinal atmospheric circulation. Circulation types of westerlies to northerlies over central Europe are overrepresented. There are too few blockings in the higher latitudes and an excess of cyclones in the midlatitudes. Other characteristics, such as blocking amplitude and cyclone intensity, are realistically represented, making the ensemble simulations a rich dataset to assess changes in climate variability.
Accurate representation of weather systems and atmospheric circulation features in datasets such as reanalyses and climate models is crucial to better understand climate variability and impacts related to weather. Accurate modeling of weather variability is a prerequisite to assessing subtle changes in that variability, such as from climate change or decadal variability. Placing recent variations of weather variability in the context of decadal to multidecadal climate variability requires centennial or longer model simulations or reanalysis datasets; the latter have become available only recently (e.g., Compo et al. 2011; Poli et al. 2016; Laloyaux et al. 2017).
Reanalyses have become widely used datasets in geosciences and are used well beyond research applications. They are the preferred datasets to study variability in atmospheric circulation features due to their standardized spatiotemporal resolution and completeness, their coherency, and the long time periods they cover (e.g., Raible et al. 2008; Neu et al. 2013). Despite several different reanalyses being available, studies evaluating climate model data often make use of only one of these products (see Flato et al. 2013). However, different assimilation schemes, different input datasets, and different numerical weather prediction (NWP) models are used to produce reanalysis datasets; thus, discrepancies between reanalyses are to be expected. Multiple projects have compared several reanalyses, providing a rich set of analysis tools [e.g., the Web-Based Reanalyses Intercomparison Tools (WRIT; Smith et al. 2014) and the Stratosphere–Troposphere Processes and Their Role in Climate (SPARC) Reanalysis Intercomparison Project (S-RIP; Fujiwara et al. 2017)].
Despite these comparison efforts, the newer reanalyses, and especially the recent centennial reanalyses [Twentieth Century Reanalysis (20CR), Twentieth Century Reanalysis version 2c (20CRv2c), European Centre for Medium-Range Weather Forecasts (ECMWF) twentieth century reanalysis (ERA-20C), and Coupled ECMWF Re-Analysis of the twentieth century (CERA-20C); see Table 1], are still inadequately evaluated with respect to their ability to represent the most important midlatitude atmospheric features, such as cyclones and blockings, and recurrent weather patterns described by circulation types (CTs).
Of these three atmospheric features that are the focus of this study, CTs reduce the continuum of possible atmospheric flow situations to a few distinctive classes (Huth et al. 2008), serving as an important diagnostic of past weather and climate events (Auchmann et al. 2012; Hofer et al. 2012), particularly since CT classifications can be extended far back in time (Lamb 1972; Jones et al. 1993, 2013; Schwander et al. 2017). Numerous studies link specific CTs with more frequent extreme events, such as storms, floods, or hail (e.g., Kunz et al. 2009; Pinto et al. 2010; Riediger and Gratzki 2014; Nisi et al. 2016).
CTs allow for evaluation of model performance by quantifying biases in the frequency and intensity of recurring weather regimes (e.g., Demuzere et al. 2009; Rohrer et al. 2017). They are used to adjust accompanied biases in surface variables for subsequent (impact) studies (Addor et al. 2016). Numerous different CT classifications exist. We use two CT classifications provided by the COST 733 Action (Philipp et al. 2010, 2016).
An important aspect of weather variability is blocking. Blockings are responsible for a considerable share of midlatitudinal weather variability and are defined as quasi-stationary, vertically coherent, and persistent high pressure systems (e.g., Rex 1950; Schwierz et al. 2004). They divert the eastward propagation of pressure systems and often lead to extreme events associated with persistent weather conditions, such as floodings, heat waves, cold spells, and droughts (e.g., Black et al. 2004; Cattiaux et al. 2010; Barriopedro et al. 2011; Buehler et al. 2011; Dole et al. 2011; Lau and Kim 2012).
Several algorithms to detect blocking exist (Barriopedro et al. 2006). The spatial structure and frequency of blocking can vary considerably, depending on the blocking index used. Barnes et al. (2014) compared three different blocking algorithms and four different reanalyses. Overall, they found that spatial and temporal features of blockings are similar in all reanalyses, but differences are evident on regional scales.
Besides blocking, extratropical cyclones determine the weather in the midlatitudes. They convey a large part of the total precipitation to continents (Pfahl and Wernli 2012; Catto and Pfahl 2013; Dowdy and Catto 2017) and are linked to extreme events, such as heavy precipitation or storms (Shaw et al. 2016). Therefore, the accurate representation of cyclones in climate models is essential for subsequent impact studies, especially if the studies involve hydrological applications.
Numerous recent studies have intercompared cyclone characteristics in different reanalyses (e.g., Raible et al. 2008; Ulbrich et al. 2009; Hodges et al. 2011; Tilinina et al. 2013; Wang et al. 2006, 2016). In general, their results agree that the more modern reanalyses (e.g., ERA-Interim, MERRA, and CFSR) converge in their representation of cyclones, that is, in their spatial distribution and cyclone frequencies. Reanalyses with a higher horizontal resolution show more intense cyclones and a larger number of them. Hodges et al. (2011), Tilinina et al. (2013), and Wang et al. (2016) found deeper cyclones in MERRA than in any other reanalysis dataset. Wang et al. (2016) found that differences are larger in winter than in summer and larger in the Southern than in the Northern Hemisphere. Century-long reanalyses (20CR and ERA-20C) are not well constrained in the Southern Hemisphere and the Pacific when going back in time, which has implications for cyclones and their characteristics. Furthermore, the cyclone-tracking algorithm influences cyclone characteristics, which should be kept in mind when interpreting results [see Raible et al. (2008) and Neu et al. (2013) for a review of different detection and tracking methods applied to the ERA-Interim dataset].
In summary, CTs, blockings, and cyclones are important atmospheric phenomena, and a systematic evaluation of their representation across available reanalysis datasets is still missing. In this paper, we add to the intercomparison endeavor and systematically compare a set of 10 different reanalyses (Table 1), as well as an ensemble of 30 simulations with slightly perturbed initial conditions with a general circulation model (GCM; Bhend et al. 2012) spanning the last 400 years. We aim to benchmark these GCM simulations as to their suitability for later studies. Our evaluation has the following aims:
Systematically compare reanalyses in terms of spatial patterns (climatology), magnitude, variability, and interannual correlation of midlatitudinal weather patterns. A focus is placed on the recently released centennial reanalysis datasets, as they are less thoroughly evaluated than other reanalyses. In doing so, we investigate whether it is sufficient to use only one reanalysis to evaluate a model simulation.
Evaluate a 30-member ensemble of 400-yr-long GCM simulations (Bhend et al. 2012) with respect to climatologies and variability of the three features.
The different reanalyses examined (Table 1) can be subdivided into two groups: reanalyses using only surface observations (20CR, 20CRv2c, ERA-20C, and CERA-20C) and reanalyses also assimilating data from other sources, such as satellites, aircraft, balloon soundings, and other conventional platforms. We follow the terminology of Fujiwara et al. (2017) and hereafter refer to these reanalyses as surface-input reanalyses and full-input reanalyses, respectively. Fujiwara et al. (2017) summarized most of the reanalyses extensively and provided extensive intercomparison tables. Here, we briefly introduce each reanalysis used. Note that 6-hourly data are always used, even if the dataset has a higher temporal resolution.
Full-input reanalyses depend on the availability of satellite data; thus, their extension back in time is limited to 1979. Using only conventional data sources (e.g., using surface and upper-air in situ measurements), some reanalyses reach back until 1948. Surface-input reanalyses are comparatively new. Compo et al. (2006) showed the feasibility of a surface-input reanalysis to extend back to the nineteenth century.
Subsequently, Compo et al. (2011) produced the 20CRv2 dataset back to 1871, based on the assimilation of surface and sea level pressure from the International Surface Pressure Database (ISPD; Cram et al. 2015), version 2, using an ensemble Kalman filter (EKF). 20CRv2 consists of 56 ensemble members, each of which is equally consistent with observations. To study weather events, the use of the individual ensemble members, rather than the ensemble mean, is advised (e.g., Brönnimann et al. 2012). The data are available in 2° × 2° resolution.
The updated 20CRv2c extends back to 1851. Issues concerning the sea ice concentration have been fixed, and new boundary conditions for sea surface temperature (SST; Giese et al. 2016) and sea ice concentration (Hirahara et al. 2014), as well as an updated set of observations (ISPD, version 3.2.9; Cram et al. 2015), have been used. The model resolution and number of ensemble members are identical to 20CRv2.
Three reanalyses from ECMWF are used in this study. All of them use four-dimensional variational data assimilation (4D-Var). ERA-20C is a surface-input reanalysis that spans the years 1900 to 2010 with a temporal resolution of 3 h and a horizontal spectral resolution of T159 (corresponding to 1.125° × 1.125°; Poli et al. 2016). Only surface and sea level pressure and surface wind observations over the ocean were assimilated [ISPD, version 3.2.6, and International Comprehensive Ocean–Atmosphere Data Set (ICOADS), version 2.5.1; Woodruff et al. 2011].
The recently generated successor CERA-20C (Laloyaux et al. 2016, 2017) assimilates the same observations as ERA-20C but is coupled to an ocean model (which assimilates oceanic variables). A 10-member ensemble is provided to address uncertainties related to observations and the model.
ERA-Interim data from 1979 to 2015 are used (Dee et al. 2011; the initially available T255 spectral resolution was interpolated to 1° × 1° regular latitude–longitude grid). ERA-Interim is a widely used full-input reanalysis, is well tested, and is chosen as a reference for this study.
The Japan Meteorological Agency (JMA) has produced the Japanese 55-year Reanalysis (JRA-55; Ebita et al. 2011; Kobayashi et al. 2015), which goes back to 1958, when regular radiosonde observations became broadly available. It uses a 4D-Var data assimilation. Here, the 1.25° × 1.25° horizontal resolution data are used before remapping to the resolutions described in the method section.
The Modern-Era Retrospective Analysis for Research and Applications (MERRA; Rienecker et al. 2011) and its recent update, MERRA version 2 (MERRA-2; Bosilovich et al. 2015), are produced by the National Aeronautics and Space Administration (NASA). MERRA assimilates observations using a gridpoint statistical interpolation (GSI) three-dimensional variational data assimilation (3D-Var) analysis and provides data from 1979 to the end of February 2016. It has since been replaced by MERRA-2, which goes back to 1980 and is updated through 2017.
For historical reasons, and because of its wide use in the scientific community, the National Centers for Environmental Prediction–National Center for Atmospheric Research reanalysis (NNR; Kalnay et al. 1996) that used 3D-Var is also included in this study. NNR could be considered a reduced-input reanalysis because it only assimilates satellite-derived temperatures, rather than radiances, and does not include Global Navigation Satellite System radio occultation observations.
Additionally, the Climate Forecast System Reanalysis (CFSR; Saha et al. 2010) is included. CFSR uses a coupled atmosphere–ocean–land surface–sea ice system similar to the Climate Forecast System, version 2 (CFSv2). The reanalysis is available from 1979 to 2010 at a spatial resolution of 0.5° × 0.5°. The dataset is now expanded using the updated CFSv2 analysis system (Saha et al. 2014), which serves as a quasi continuation of CFSR with some changes (Fujiwara et al. 2017).
The different reanalysis products are compared to GCM simulations [chemical climate change over the past 400 years (CCC400); Bhend et al. 2012] produced using the ECHAM5.4 atmospheric model (Roeckner et al. 2003), with a spectral truncation of T63, corresponding to an approximate horizontal resolution of 1.875°, and 31 vertical levels. The CCC400 dataset encompasses the years 1600 to 2005 and 30 model members, resulting in a total of 12 180 years. Additionally, one control simulation spanning the same period was performed (CCC400_corr) to assess the impact of an erroneous implementation of the reconstructed land surface conditions from Pongratz et al. (2008): a misrepresentation of the land surface classes affected transient land surface parameters, such as albedo and surface roughness. CCC400_corr uses the same setup but with correct handling of the land surface classes. It was found to improve the simulation in the Southern Hemisphere to some extent but did not detectably alter the circulation in the Northern Hemisphere.
CCC400 is forced with reconstructed annual mean SSTs (Mann et al. 2009), augmented by El Niño–Southern Oscillation–dependent intra-annual variability according to the reconstructed Niño-3.4 index of E. R. Cook et al. (2008, meeting presentation). Sea ice is prescribed by the Hadley Centre Sea Ice and Sea Surface Temperature dataset, version 1.1 (HadISST1.1; Rayner et al. 2003). After 1870, HadISST reconstructed monthly sea ice is used; before 1870, the HadISST monthly climatology between 1871 and 1900 is used.
In CCC400, several radiative forcings are included. The radiative effects of volcanic eruptions are prescribed on the basis of reconstructions by Crowley et al. (2008), long-lived greenhouse gas concentrations are prescribed according to Yoshimori et al. (2010), and tropospheric aerosols are implemented following reconstructed loadings by Koch et al. (1999). Total solar irradiance is included based on the reconstructions of Lean (2000).
a. Circulation types
We use two CT classifications over the central European domain (41°–52°N, 3°–20°E), namely, the Grosswetter-types (GWT) and cluster analysis of principal components (CAP) classifications (Weusthoff 2011; Rohrer et al. 2017). They are in accordance with conventions by the COST 733 Action “Harmonisation and Applications of Weather Type Classifications for European Regions” CT classification catalog (Philipp et al. 2010, 2016) and were introduced by Schiemann and Frei (2010) for operational use at MeteoSwiss. Daily averaged data are bilinearly remapped to a 1° × 1° resolution. A brief synoptic description is given in Table 2.
GWT is a correlation-based classification scheme calculating an index for the zonality, meridionality, and cyclonicity of a flow using sea level pressure (SLP) or geopotential height at 500 hPa (Z500). Based on these indices, the flow situation is separated into CTs representing the wind direction and/or the cyclonicity.
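To make the correlation-based idea concrete, the following sketch correlates a daily SLP field with idealized prototype patterns and assigns the best-matching class. The prototype fields, the grid, and the winner-takes-all decision rule are illustrative stand-ins; the operational GWT indices for zonality, meridionality, and cyclonicity are more elaborate.

```python
import numpy as np

def gwt_like_classify(slp, lat, lon):
    """Assign a daily SLP field to a coarse flow class by correlating it
    with idealized prototype patterns. The prototypes and the decision
    rule are illustrative, not the operational GWT definitions."""
    LON, LAT = np.meshgrid(lon, lat)
    zonal = -LAT                 # pressure decreasing poleward -> westerly flow
    meridional = -LON            # pressure decreasing eastward -> southerly flow
    cyclonic = (LAT - LAT.mean())**2 + (LON - LON.mean())**2  # central low

    def corr(a, b):
        return np.corrcoef(a.ravel(), b.ravel())[0, 1]

    scores = {"W": corr(slp, zonal),
              "S": corr(slp, meridional),
              "C": corr(slp, cyclonic)}
    return max(scores, key=scores.get)

# A field dominated by a south-to-north pressure decrease is classified "W".
lat = np.linspace(41, 52, 12)
lon = np.linspace(3, 20, 18)
LAT = np.meshgrid(lon, lat)[1]
field = 1015.0 - LAT + 0.1 * np.random.default_rng(0).standard_normal((12, 18))
print(gwt_like_classify(field, lat, lon))  # W
```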
CAP combines a principal component analysis of SLP with a subsequent k-means cluster analysis. Here, in order to compare different datasets, every day is assigned to the most similar CT centroid of the MeteoSwiss classification, established using ERA-40, according to the lowest Euclidean distance.
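The assignment of each day to a fixed, pre-existing classification can be sketched as follows. The random centroids and the 12 × 18 grid are stand-ins for the ERA-40-based MeteoSwiss CAP9 centroids on the remapped 1° × 1° domain.

```python
import numpy as np

def assign_to_centroid(slp_day, centroids):
    """Assign one daily SLP field to the index of the nearest CT centroid
    by Euclidean distance, mapping days from any dataset onto a fixed,
    pre-existing classification."""
    distances = np.linalg.norm(centroids - slp_day.ravel(), axis=1)
    return int(np.argmin(distances))

# Random stand-ins for the nine CAP9 centroids on a 12x18 grid
rng = np.random.default_rng(1)
centroids = rng.standard_normal((9, 12 * 18))
# A day resembling centroid 4 plus weak noise is assigned to type 4.
day = (centroids[4] + 0.05 * rng.standard_normal(12 * 18)).reshape(12, 18)
print(assign_to_centroid(day, centroids))  # 4
```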
Blockings are defined as reversals of the meridional Z500 gradient ΔZ500/Δφ, with Δφ being the change in latitude. This approach was introduced by Lejenäs and Økland (1983) and later refined by Tibaldi and Molteni (1990) and Tibaldi et al. (1994). As suggested by Scherrer et al. (2006), the algorithm is extended to find blockings in a two-dimensional space using the following criteria:
geopotential height (GPH) gradient (GPHG) toward the pole, GPHGP = [Z500(φ + 14°) − Z500(φ)]/14° < −10 gpm (° lat)−1; and
GPH gradient toward the equator, GPHGE = [Z500(φ) − Z500(φ − 14°)]/14° > 0 gpm (° lat)−1.
The latitude φ varies from 36° to 76° in 2° latitude intervals. All datasets are bilinearly remapped to a 2° × 2° resolution.
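Under these definitions, the detection of candidate blocking latitudes can be sketched as follows; the idealized ridge, the five-longitude grid, and the handling of boundary latitudes are assumptions made for illustration.

```python
import numpy as np

def gphg_reversal(z500, lat):
    """Flag grid points with a reversed meridional Z500 gradient according
    to the two criteria above. A sketch: lat is assumed ascending in
    2-degree steps, so 14 degrees corresponds to 7 grid rows; z500 has
    shape (lat, lon) in geopotential meters."""
    step = 7  # 14 degrees on a 2-degree grid
    mask = np.zeros(z500.shape, dtype=bool)
    for i, phi in enumerate(lat):
        if not 36 <= phi <= 76:
            continue
        if i - step < 0 or i + step >= len(lat):
            continue
        gphg_p = (z500[i + step] - z500[i]) / 14.0  # gradient toward the pole
        gphg_e = (z500[i] - z500[i - step]) / 14.0  # gradient toward the equator
        mask[i] = (gphg_p < -10.0) & (gphg_e > 0.0)
    return mask

# An idealized ridge centered at 60N on a 2-degree grid with 5 longitudes
lat = np.arange(20.0, 90.0, 2.0)
z500 = np.tile((5600.0 - 2.0 * (lat - 60.0) ** 2)[:, None], (1, 5))
blocked = gphg_reversal(z500, lat)
print(lat[blocked.any(axis=1)])  # latitudes 56-66N are flagged for this ridge
```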
Whether a meridional Z500 reversal qualifies as a blocking is determined following the approach of Schwierz et al. (2004), who defined blockings as spatiotemporally connected anomalies. A blocking is detected if the spatial overlap of the reversed GPHG area between consecutive time steps is at least 70% of At (i.e., At ∩ At+1 ≥ 0.7At, where At denotes the area of a blocking at time step t) and if the GPHG reversal persists for at least five days (20 time steps).
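A minimal sketch of this overlap-and-persistence test, assuming equal-area grid cells and a precomputed boolean reversal mask per 6-hourly step:

```python
import numpy as np

def persistent_block(masks, min_steps=20, min_overlap=0.7):
    """Check whether a sequence of 6-hourly reversal masks qualifies as one
    blocking episode: consecutive masks must overlap by at least 70% of the
    earlier area (At intersect At+1 >= 0.7 At), and the sequence must span
    at least 20 steps (five days). Grid cells are treated as equal-area
    for simplicity."""
    if len(masks) < min_steps:
        return False
    for a, b in zip(masks[:-1], masks[1:]):
        area_t = a.sum()
        if area_t == 0 or (a & b).sum() < min_overlap * area_t:
            return False
    return True

# A stationary 4x4 patch lasting 20 steps qualifies as a blocking...
base = np.zeros((10, 10), dtype=bool)
base[3:7, 3:7] = True
print(persistent_block([base.copy() for _ in range(20)]))  # True
# ...but the same patch jumping to a distant location does not.
moved = np.roll(base, 5, axis=1)
print(persistent_block([base] * 10 + [moved] * 10))  # False
```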
The cyclone-tracking algorithm of Blender et al. (1997) is used to detect and track the position and intensity of individual cyclones from genesis to lysis. Every dataset is first remapped to T63 spectral resolution for better comparability between datasets. Note that sensitivity tests show that more cyclones are detected with higher resolution; for example, CFSR shows 11% higher cyclone center densities on its original 0.5° × 0.5° resolution, compared to T63 spectral resolution.
In case of reanalyses providing fields on a regular longitude–latitude grid, the remapping to first a Gaussian grid and then spectrally truncating the Gaussian grid at T63 may introduce differences in the cyclone center density and other properties of a cyclone. However, we find that these differences are minor, compared to the differences between datasets and their original spatial resolution.
A cyclone is defined as a local minimum in the 1000-hPa GPH field (Z1000) within the eight neighboring grid points. This local Z1000 minimum is required to have a Z1000 mean gradient greater than 20 m (1000 km)−1 in the surrounding 1000 × 1000 km2 area. This is a rather weak criterion that allows tracking cyclones already in their juvenile state. The Z1000 mean gradient must be greater than 60 m (1000 km)−1 at least once in the lifetime of a cyclone.
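The detection step can be sketched as below. The min_depth parameter is a hypothetical stand-in: it replaces the true mean-gradient criterion, which is evaluated over a 1000 × 1000 km2 area on the real grid, with a simple height excess of the eight neighbors.

```python
import numpy as np

def find_cyclone_centers(z1000, min_depth=2.0):
    """Return (i, j) indices of grid points that are strict Z1000 minima
    within their eight neighbors. The mean-gradient criterion of the real
    scheme, 20 m (1000 km)^-1 over a 1000 x 1000 km2 area, is replaced
    here by an illustrative stand-in: the eight neighbors must lie on
    average more than min_depth meters above the center."""
    centers = []
    ni, nj = z1000.shape
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            window = z1000[i - 1:i + 2, j - 1:j + 2].ravel()
            center = window[4]
            neighbors = np.delete(window, 4)
            if center < neighbors.min() and neighbors.mean() - center > min_depth:
                centers.append((i, j))
    return centers

# A synthetic Z1000 field with a single depression centered at (5, 5)
yy, xx = np.mgrid[0:11, 0:11]
z = 100.0 + 5.0 * ((yy - 5) ** 2 + (xx - 5) ** 2)
print(find_cyclone_centers(z))  # [(5, 5)]
```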
Cyclone tracks are determined by a nearest-neighbor search in an area with a radius of roughly 480 km without assuming preferred propagation direction or speed. Blender et al. (1997) showed that this criterion is sufficient for 6-hourly data. Cyclones are tracked only if they occur for at least one day, are shorter than 10 days, and do not traverse elevated terrain over 1000 m. The extrapolation from the surface level to the Z1000 level over orography can lead to artifacts that may be detected as long-lasting, quasi-stationary cyclones.
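A greedy nearest-neighbor linking of detected centers across 6-hourly frames can be sketched as follows; the flat-earth distance and the greedy matching are simplifications for illustration, not the Blender et al. (1997) implementation.

```python
import numpy as np

def link_tracks(frames, max_dist_km=480.0, min_steps=4):
    """Link cyclone centers across 6-hourly frames (each frame is a list of
    (lat, lon) positions). Each active track is extended by the closest
    center of the next frame within max_dist_km, with no preferred
    propagation direction or speed; tracks shorter than min_steps (one day
    at 6-hourly resolution) are discarded. Distances use a simple
    flat-earth approximation purely for illustration."""
    km_per_deg = 111.0
    tracks, active = [], []
    for frame in frames:
        unused = list(frame)
        still_active = []
        for track in active:
            lat0, lon0 = track[-1]
            if unused:
                dists = [km_per_deg * np.hypot(la - lat0,
                                               (lo - lon0) * np.cos(np.radians(lat0)))
                         for la, lo in unused]
                k = int(np.argmin(dists))
                if dists[k] <= max_dist_km:
                    track.append(unused.pop(k))
                    still_active.append(track)
                    continue
            tracks.append(track)                   # no match: track ends (lysis)
        still_active.extend([p] for p in unused)   # unmatched centers: genesis
        active = still_active
    tracks.extend(active)
    return [t for t in tracks if len(t) >= min_steps]

# A single cyclone moving ~2 degrees east per step yields one 6-step track.
frames = [[(55.0, -40.0 + 2.0 * t)] for t in range(6)]
tracks = link_tracks(frames)
print(len(tracks), len(tracks[0]))  # 1 6
```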
Results for the North Atlantic–European region (NAE; 40°–76°N, 70°W–10°E) are mainly presented in this study. Results for the North Pacific (NPA; 40°–76°N, 150°–230°E) and South Pacific (SPA; 40°–76°S, 170°–290°E) are included where relevant, and associated figures are shown in the supplemental material. For cyclones in the NAE region, the area 60°–76°N, 70°–20°W, is removed because the topography of Greenland obfuscates the results. CT results cover the Alpine domain (41°–52°N, 3°–20°E). The overlapping period of all reanalyses and the model simulation, 1980–2005, is presented throughout this study. For multimember datasets, we do not use the ensemble mean but treat every member separately.
The following characteristics are investigated in the results section.
CT frequency denotes how often a CT occurs per year.
Blocking frequency is defined as the fraction of blocked 6-hourly time steps per year.
As a measure of blocking intensity, the maximum geopotential height (maxGPH) amplitude is determined by the maximum of −GPHGP during a blocking.
Cyclone center density is a temporally and spatially normalized quantity measuring the cyclone center frequency per grid point.
The minimum Z1000 value determines the depth of a cyclone, and the mean Z1000 gradient around the minimum Z1000 value is used to define the cyclone intensity.
Cumulative distribution functions (CDFs) are shown for seasonal blocking frequency, seasonal cyclone center density, and annual CT frequency.
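Two of these diagnostics, the blocking frequency and its empirical CDF, can be computed as sketched below (1460 six-hourly steps per 365-day year; leap days are ignored in this sketch).

```python
import numpy as np

def blocking_frequency(blocked_steps, steps_per_year=1460):
    """Fraction of blocked 6-hourly time steps per year from a boolean
    series (True = domain blocked at that step). 1460 = 365 days x 4
    steps per day."""
    blocked_steps = np.asarray(blocked_steps).reshape(-1, steps_per_year)
    return blocked_steps.mean(axis=1)

def empirical_cdf(values):
    """Sorted values and their plotting positions, as used for CDF figures."""
    x = np.sort(np.asarray(values))
    p = np.arange(1, len(x) + 1) / len(x)
    return x, p

# Two synthetic years with 10% and 20% of time steps blocked
series = np.r_[np.repeat([True, False], [146, 1314]),
               np.repeat([True, False], [292, 1168])]
freq = blocking_frequency(series)
x, p = empirical_cdf(freq)
print(np.round(x, 2))  # [0.1 0.2]
```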
a. Circulation types
To evaluate the atmospheric circulation over the Alpine region, we begin with CTs, as they provide an important overview characterizing the variability. Figure 1 shows the CDF of the annual frequency for each CT and dataset for the GWT classification with 10 types (GWT10) using SLP (GWT10SLP) for the overlapping period over the Alps. The SLP composite map is drawn at the upper-left corner of each CT.
In general, reanalyses agree well with each other, all showing that westerlies (W), northeasterlies (NE), and easterlies (E) are most abundant (note the different x axes). In some cases, the spread may be large, particularly for the purely anticyclonic and cyclonic CTs (A and C, respectively). Some reanalyses show discrepancies from other reanalyses for certain CTs. 20CR and 20CRv2c exhibit fewer westerlies [including southwesterlies (SW) and northwesterlies (NW)] and more easterlies [including southeasterlies (SE)], compared to other datasets, as also denoted by the shaded 10th–90th-percentile range in Fig. 1 for 20CRv2c. Both MERRA reanalyses show fewer purely A situations over the Alpine region.
Examining the model ensemble, we find that CCC400 overrepresents the NW and northerly (N) CTs, compared to all reanalyses, visible by the rightward shift of the green CDF in Fig. 1. Contrarily, the E and SE CTs are underrepresented, compared to all reanalyses. Similarly, southerly (S) and NE CTs tend to be less frequent in CCC400 than in any reanalysis dataset. In these cases, the 10th–90th-percentile range (green shaded area) is mostly not overlapping with reanalyses. The four other CTs (W, SW, C, and A) are simulated well within the range of reanalyses.
Results for CAP with nine classes (CAP9) are very similar (Fig. S1 in the supplemental material). Hence, two different CT classifications agree with each other, adding to the confidence of the results. GWT10 performed at the Z500 level (GWT10Z500) shows substantially smaller differences between datasets, including the CCC400 model simulation (Fig. S2 in the supplemental material).
Figure 2 shows the mean differences in frequency between two datasets for GWT10 on SLP. Reanalyses from the same institution tend to have similar CT frequencies. This is particularly evident for 20CRv2c and 20CR (both from NOAA CIRES); MERRA and MERRA-2 (NASA); and CERA-20C, ERA-20C, and ERA-Interim (ECMWF). Also, NNR and CFSR (both from NCEP) show rather small differences, although they use different NWP models and assimilation schemes.
Lower mean differences are discernible when using the CAP9 classification (Fig. S3 in the supplemental material) than when using GWT10SLP. The main findings from GWT10SLP are, however, also evident in CAP9, enhancing the robustness of the results. Examining GWT10Z500 (Fig. S4 in the supplemental material) reveals that the mean differences between reanalyses are considerably lower than at the surface. On this level, surface-input reanalyses are almost as different to full-input reanalyses as the model simulation. Generally, full-input reanalyses and the reduced-input reanalysis NNR agree very well with each other.
Because of the NW–N overestimation and E–SE underestimation of the CCC400 model simulation, compared to all reanalyses, the mean difference between the model and all reanalyses is larger than the deviations between the individual reanalyses. On the Z500 level, CCC400 is closer to all reanalyses, compared to the SLP-based CT, indicating that the midtropospheric circulation is simulated more accurately than the atmospheric circulation at the surface.
Figure 3a and Figs. S5a and S6a in the supplemental material show the correlation of annual CT counts between 1980 and 2005 averaged over all CTs for GWT10SLP, CAP9, and GWT10Z500, respectively. CCC400 is not shown because we expect the correlations to be near zero for such a free-running global simulation. With the exception of 20CR, all reanalyses correlate at least 0.75 with each other. CERA-20C outperforms ERA-20C and shows similar correlation coefficients to full-input reanalyses.
The average annual blocking frequency is shown in Fig. 4 for each dataset for the overlapping period. ERA-Interim (Fig. 4, top left) is used as the reference, and all other datasets are shown as differences with respect to it. For datasets with several ensemble members, the blocking frequency for each member is calculated separately, and only thereafter is the ensemble mean computed.
In agreement with, for example, Barriopedro et al. (2006) or Berrisford et al. (2007), Fig. 4 shows that all datasets contain three centers of high blocking frequency in the North Atlantic–European region, in the North Pacific, and less pronounced in the South Pacific. However, notable differences in the blocking frequency exist among the datasets.
All four surface-input reanalyses (20CR, 20CRv2c, ERA-20C, and CERA-20C) contain fewer blocking episodes than ERA-Interim in almost all regions. As an exception, both 20CR reanalyses show higher blocking frequencies than ERA-Interim over the Alps. Between 1980 and 2005, 657 blocking episodes are identified in ERA-Interim in the NAE, while in CERA-20C, only 603 blocking episodes are detected, on average. ERA-20C and the mean of both 20CR reanalyses contain between 639 and 645 blocking episodes (Table 3).
While modern full-input reanalyses, except MERRA, agree very well on the spatial distribution and the frequency of blockings, the NNR contains a much lower blocking frequency. Most full-input reanalyses show between 626 (MERRA-2) and 657 (ERA-Interim) blocking episodes in the NAE domain, while MERRA contains only 582 blocking episodes. NNR has even fewer, with 538 blocking episodes.
In contrast to the NAE, even recent full-input reanalyses do not agree particularly well on the number of SPA blocking episodes. Here, the number of blocking episodes between 1980 and 2005 ranges from 304 (MERRA) to 443 (ERA-Interim and JRA-55). NNR produces only 209 blocking episodes. The surface-input reanalyses are in the range of recent full-input reanalyses, with 20CR and 20CRv2c containing more blocking episodes, compared to ERA-20C and CERA-20C (399 and 407 vs 313 and 317; Table 3).
For CCC400, there is a tendency toward an underrepresentation (overrepresentation) at the high (low) latitudes in the Northern Hemisphere (Fig. 4). The Southern Hemisphere is poorly represented in the model simulation, with too high a blocking frequency. This is related to a misrepresentation of the atmospheric circulation over Antarctica (not shown). On average, CCC400 contains 582 blocking episodes (with a minimum of 552 and a maximum of 605) in the NAE domain, which is similar to MERRA but lower than other full-input reanalyses.
Among all datasets considered, the percentage of long-lasting blocking episodes (lifetimes >9 days) with respect to all detected blocking episodes between 1980 and 2005 is globally highest in CCC400, with 33.0% (Table 3). Reanalyses show lower percentages, between 26.6% (ERA-20C) and 29.9% (ERA-Interim). While different realizations of CCC400 vary between 31.7% and 34.7% long-lasting blockings, 20CRv2c encompasses a range between 26.9% and 30.3%. The spread among different realizations of 20CR and CERA-20C is very similar to that of 20CRv2c, and thus we conclude that the model simulation CCC400 significantly overrepresents the number of long-lasting blockings.
Figure 5 illustrates the simultaneous comparison of two blocking characteristics for the NAE domain: the maxGPH amplitude as a proxy for the strength of a blocking and the duration of a blocking. A long-lasting, strong blocking would be located at the top-right corner of Fig. 5.
The duration of blockings in this region is right-skewed in all datasets; that is, the distribution has a long tail toward long-lasting blockings. In general, all datasets show a similar behavior in both blocking duration and intensity. There is a significant positive relationship between blocking length and intensity in all datasets, as determined by a linear regression. This relationship is strongest in the NAE domain and weaker in NPA and SPA (not shown).
The median blocking duration for the NAE region is consistently around 7.75 days, with a few datasets varying by 0.25 days. Only NNR shows a lower median of 7.25 days. CFSR and JRA-55 tend to have more long-lasting blockings, as evident by the median, as well as by the distinctive bulge of the 50th-percentile contour toward long-lasting blockings.
The blocking intensity for the NAE is more variable between datasets. MERRA and, to some degree, MERRA-2 have more intense blockings, compared to other reanalyses. On the other hand, ERA-20C shows the weakest blockings. The successor CERA-20C is closer to other reanalyses. In general, surface-input reanalyses contain weaker blockings than full-input reanalyses. For all datasets, the mode of the distribution of blocking intensity is not well defined. It is rather flat and differs considerably between datasets. Some have bimodal distribution features even at the 2.5th-percentile contour (Fig. 5).
CCC400 agrees well with the reanalyses with respect to both blocking duration and intensity. Thus, the main reason for the underestimation of NAE blockings is the too-low total number of blocking episodes.
Figures S7 and S8 in the supplemental material show the results for NPA and SPA, respectively. While NPA is similar to NAE, reanalyses show larger discrepancies in terms of blocking amplitude in SPA. Both ERA-20C and CERA-20C contain less-intense blockings in SPA. MERRA contains by far the strongest blockings, while MERRA-2 is similar to CFSR, JRA-55, and ERA-Interim.
Next, we focus on the seasonal variability of blocking. Figure 6 shows the CDF of the blocking frequency in the NAE domain for all datasets. Blockings are most frequent in winter (DJF) and spring (MAM) and rarest in summer (JJA). The underrepresentation of blockings in NNR is most apparent during autumn (SON) and summer. ERA-20C estimates higher blocking frequencies than any other reanalysis in spring and in the lower percentiles of the CDF for winter (i.e., winters with few blockings). CERA-20C is closer to other full-input reanalyses in this regard. Both 20CR reanalyses contain higher blocking frequencies than other reanalyses in summers with high blocking frequencies (upper part of panel). Full-input reanalyses show relatively similar CDFs. The 10th–90th-percentile range of CERA-20C and 20CRv2c demonstrates that the uncertainty among different reanalyses may be as large as the variations among different members of the same reanalysis.
Figures S9 and S10 in the supplemental material show the results for NPA and SPA. ERA-20C shows fewer blockings in summer over NPA, compared to other reanalyses. CERA-20C is closer to full-input reanalyses. 20CR and 20CRv2c contain fewer blockings in spring. In SPA, CCC400 largely overrepresents blockings in all seasons.
Correlation coefficients between the annual blocking frequencies of different reanalyses (Fig. 3b) show that ERA-20C has, generally, somewhat lower correlations (around 0.6), compared to CERA-20C (around 0.8) and 20CR and 20CRv2c (around 0.7) in NAE. Full-input reanalyses (ERA-Interim, JRA-55, MERRA, MERRA-2, and CFSR) are highly correlated (around 0.85). NNR shows comparable correlations with more recent full-input reanalyses.
CCC400 represents the blocking frequency well, except in summer, when a tendency toward too few blockings is discernible in Fig. 6. The 10th–90th-percentile range encompasses the reanalysis datasets; hence, no significant deviation can be detected.
Figure 7 shows the representation of the cyclone center density using the period from 1980 to 2005. All datasets are compared to ERA-Interim. The main storm tracks are located in the western North Pacific, northern North Atlantic, and around Antarctica in all datasets, which is in agreement with, for example, Neu et al. (2013).
The climatology, defined by the cyclone center density, agrees well in the extratropics among full-input reanalyses. ERA-Interim, JRA-55, CFSR, MERRA, MERRA-2, and, additionally, the two surface-input reanalyses ERA-20C and CERA-20C show few notable differences, except in the proximity of orography. For example, ERA-Interim shows stationary cyclones east of the Andes and the Atlas Mountains. This needs to be considered when using ERA-Interim as the baseline.
Reanalyses with a resolution coarser than or equal to 2° × 2° contain lower cyclone center densities. Globally, NNR is 40%, and both 20CR reanalyses are 20%, below the cyclone center density averaged over all full-input reanalyses. CFSR tends to show the highest cyclone center density overall, with globally averaged values 10% higher than in ERA-Interim. However, the largest differences occur close to orography, and thus these results should not be overinterpreted. Both 20CR reanalyses have the imprint of a Gibbs-type phenomenon (e.g., Hoskins 1980) visible over the Southern Hemisphere oceans. The low cyclone center density in NNR and 20CR is partly due to the remapping from a regular longitude–latitude grid to a Gaussian grid at T63 spectral resolution. Although both datasets are interpolated to a very high resolution before the spectral remapping, the cyclone center density is lower than at their original resolution.
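The smoothing effect of spectral remapping, and the ringing associated with a Gibbs-type phenomenon, can be illustrated with a one-dimensional toy example (the field shape, domain, and cutoff wavenumber are invented for illustration): truncating a sharp, deep feature to its lowest wavenumbers makes it shallower and introduces small oscillations away from it.

```python
import numpy as np

def spectral_truncate(field, keep):
    """Keep only wavenumbers 0..keep of a periodic 1-D field."""
    coeffs = np.fft.rfft(field)
    coeffs[keep + 1:] = 0.0          # discard all higher wavenumbers
    return np.fft.irfft(coeffs, n=field.size)

# Idealized narrow, ~300-m-deep low on a periodic domain
n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
z = -300.0 * np.exp(-((x - np.pi) / 0.05) ** 2)

z_t = spectral_truncate(z, 63)       # crude analog of a T63 truncation
```

After truncation the minimum of the toy low is shallower and low-amplitude ripples appear away from it, consistent with the reduced cyclone depth and the Gibbs imprint noted above.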
Globally, CCC400 simulates cyclone center densities around 10% higher than the high-resolution reanalyses. Regionally, cyclone center densities are overrepresented in the Northern Hemisphere main storm track regions and over continental Europe, while they are mostly underrepresented in the high latitudes. In the Southern Hemisphere, a similar pattern is visible, with the storm track of CCC400 shifted equatorward compared to the reanalyses.
Figure 8 displays the distribution of the intensity of cyclones measured by the mean Z1000 gradient around the cyclone center (Fig. 8a), the depth of cyclones given by the minimum Z1000 (Fig. 8b), and the cyclone lifetime (Fig. 8c). While the mean Z1000 gradient shows that NNR has less intense cyclones, the minimum Z1000 reveals that NNR mainly contains fewer shallow cyclones than other reanalyses and only slightly underrepresents very deep cyclones. MERRA contains the most intense and deepest cyclones of all reanalyses; MERRA-2 ranks second in both measures and is thus closer to the other reanalyses. In general, the higher the original resolution before remapping, the more intense the cyclones estimated in a reanalysis. A similar tendency exists for the depth of cyclones, but there the relationship is less discernible.
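The two cyclone measures used here, intensity as the mean Z1000 gradient around the center and depth as the minimum Z1000, can be sketched for an idealized axisymmetric low (the grid spacing, search radius, and Gaussian shape are illustrative assumptions, not the study's detection algorithm):

```python
import numpy as np

def cyclone_metrics(z1000, dx, center, radius):
    """Mean |grad Z1000| (intensity) and minimum Z1000 (depth)
    within `radius` (m) of a cyclone center on a uniform grid."""
    dzdy, dzdx = np.gradient(z1000, dx)      # gradients along axis 0, axis 1
    grad = np.hypot(dzdx, dzdy)
    ny, nx = z1000.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    dist = np.hypot(yy - center[0], xx - center[1]) * dx
    mask = dist <= radius
    return grad[mask].mean(), z1000[mask].min()

# Idealized 300-m-deep low with a 500-km e-folding radius on a 100-km grid
n, dx = 61, 100e3
yy, xx = np.mgrid[0:n, 0:n]
r = np.hypot(yy - n // 2, xx - n // 2) * dx
z = -300.0 * np.exp(-((r / 500e3) ** 2))

intensity, depth = cyclone_metrics(z, dx, (n // 2, n // 2), 1000e3)
```

Multiplying `intensity` by 1e6 expresses it in m (1000 km)⁻¹, the unit used for the intensity gradients discussed in this section.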
CCC400 is well within the range of reanalyses for both the intensity and the depth of cyclones, especially considering the relatively low resolution of the model simulation. The exception is the intensity distribution for gradients greater than 450 m (1000 km)⁻¹, where CCC400 deviates noticeably from the reanalyses.
Figure 8c indicates that the datasets show a similar cyclone lifetime distribution, with a spike at five time steps (the minimum) followed by an exponentially decreasing cyclone count with increasing lifetime. NNR and, less pronounced, both 20CR reanalyses show fewer (more) short-lived (long-lived) cyclones than the other datasets.
Results for NPA and SPA are similar to those for NAE (Figs. S11 and S12 in the supplemental material), indicating that the cyclone characteristics of a specific reanalysis hold in all storm track regions.
The cumulative distribution functions of cyclone center density seasonally averaged over the NAE domain are shown in Fig. 9. The substantially lower cyclone count in NNR and the lower counts in both 20CR reanalyses are very distinct, as already observed in Fig. 7. A seasonal cycle is evident for all datasets, with a maximum in summer and a minimum in winter. Datasets with coarse resolution show particularly few cyclones in summer. The model simulation agrees relatively well with high-resolution reanalyses in terms of cyclone frequency. CCC400 tends to simulate more cyclones than the reanalyses do, especially in years with a low cyclone frequency, as discernible from the green shading in Fig. 9. CCC400 also tends to produce too many cyclones in NPA and SPA, as shown in Figs. S13 and S14 in the supplemental material, respectively.
The correlations of the annually averaged cyclone center densities among datasets for the NAE domain between 1980 and 2005 (Fig. 10a) are consistently significant at the 5% level (which, in our case, corresponds to a correlation of 0.39). If only deep cyclones (reaching a core geopotential height below −300 m at least once in their lifetime) are considered (Fig. 10b), then correlation coefficients are generally, but not uniformly, higher than for all cyclones. With the exception of ERA-20C, correlation coefficients among reanalyses often exceed 0.9. Figures S15a and S16a in the supplemental material show that correlation coefficients for interannual cyclone center densities are somewhat lower in NPA than in NAE, whereas in SPA we find considerably lower correlations than in the Northern Hemisphere, indicating that reanalyses are less constrained there. In many cases, the correlation between two datasets is not significant in SPA. Considering only deep cyclones (<−300-m geopotential height) generally yields higher correlations among reanalyses (Figs. S15b and S16b).
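The 0.39 significance threshold quoted above follows from the standard t-test for a Pearson correlation with n = 26 annual values (24 degrees of freedom). A short check, using scipy (the function name is ours):

```python
from scipy.stats import t

def r_crit(n, alpha=0.05):
    """Two-tailed critical Pearson correlation coefficient for sample size n."""
    tc = t.ppf(1.0 - alpha / 2.0, df=n - 2)        # critical t value
    return tc / (n - 2 + tc ** 2) ** 0.5           # invert t = r*sqrt(n-2)/sqrt(1-r^2)

# 26 annual values (1980-2005) -> threshold of about 0.39
threshold = r_crit(26)
```

Longer records lower the threshold, e.g. `r_crit(100)` is roughly 0.2, which is one reason centennial datasets can resolve weaker covariability.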
5. Discussion and conclusions
In this study, we examined 1) the representation of extratropical atmospheric flow features in 10 reanalyses and 2) the representation of the same flow features in GCM ensemble simulations (CCC400). Table 4 summarizes the peculiarities found in the different reanalysis products.
We first discuss the performance of surface-input reanalyses, compared to full-input reanalyses, as these are still inadequately evaluated. CERA-20C and 20CRv2c in particular perform well in most statistics between 1980 and 2005. Despite some reanalysis-dependent peculiarities, such as the more frequent easterlies over the Alpine region in 20CR and 20CRv2c and the generally weak blocking intensity in surface-input reanalyses, surface-input reanalyses succeed in capturing the midlatitudinal circulation at the surface as well as in the midtroposphere. This might not be expected, considering that these reanalyses assimilate only surface observations. Both 20CR reanalyses show a low overall cyclone center density and intensity, compared to full-input reanalyses. As also reported by, for example, Blender and Schubert (2000), Jung et al. (2006), Tilinina et al. (2013), and Wang et al. (2016), these two measures depend on the original horizontal resolution of the reanalysis before remapping. Interestingly, the cyclone depth (i.e., the minimum geopotential height) is found not to depend strictly on the horizontal resolution, with 20CR producing some very deep cyclones. The more frequent easterly CTs for both 20CR reanalyses over the Alpine region stem from a high pressure anomaly in 20CR over continental Eurasia (see also van den Besselaar et al. 2011), which locally translates to a high pressure anomaly north of the Alps (not shown). This is in line with the higher blocking frequency detected over central Europe, compared to other datasets.
For the NAE and NPA domains, modern full-input reanalyses agree well among each other, with the notable exception of MERRA, which shows fewer blockings and has more intense and deeper cyclones. Several other studies have identified MERRA as an outlier in its circulation statistics. Barnes et al. (2014) noted different seasonal blocking frequencies for MERRA, compared to other datasets, for one specific blocking algorithm. Hodges et al. (2011), Tilinina et al. (2013), and Wang et al. (2016) found that MERRA shows more intense and deeper cyclones, compared to other reanalyses, each using a different cyclone-tracking algorithm. MERRA-2 is closer to the other full-input reanalyses (CFSR, ERA-Interim, and JRA-55) but still contains deeper and more intense cyclones than they do.
The older full-input reanalysis NNR shows rather different statistics for blockings and cyclones. NNR contains fewer and shorter blockings, compared to any other dataset examined in this study. Its cyclones are less intense, and fewer very shallow cyclones are detected, compared to other reanalyses. In contrast to the results presented here, Davini et al. (2012) found that NNR, ERA-40, and ERA-Interim are very similar (5%–10% difference at most) in their representation of blockings with a similar blocking algorithm. We clearly find more blockings in ERA-Interim than in NNR (Table 3 and Fig. 4).
Although we potentially penalize NNR in the case of cyclones as a result of the interpolation to a higher-resolution grid, we recommend, based on the results presented in this study, using a more modern full-input reanalysis; JRA-55 reaches back to 1958 and covers almost the same time period. For periods before 1958, it may be more appropriate to use a surface-input reanalysis, which showed better overall performance than NNR in this study. However, our results focus on the period 1980–2005, and good agreement during this period does not necessarily imply good agreement farther back in time, especially in regions with sparse observations. One advantage of reanalyses is the constant NWP model; however, changes in the number of observations and/or observation systems may lead to artificial trends in reanalyses (e.g., Bengtsson et al. 2004; Brönnimann et al. 2012).
Results for reanalysis products from the same institution (using the same NWP model and assimilation scheme) are very similar for circulation types. Such an institutional dependency cannot be consistently found for either cyclones or blockings; the reason behind this finding is unclear. For both MERRA reanalyses, which show stronger blockings and cyclones, the reason potentially lies in their nonspectral NWP model, as suggested by Tilinina et al. (2013). This may also explain the high (low) frequency of purely cyclonic (anticyclonic) CTs, which may be related to the different handling of the extrapolation from surface pressure to SLP in the MERRA reanalyses. Additionally, we note that the discrepancies among modern full-input reanalyses are comparable to the variations among different members of the multimember reanalyses 20CRv2c and CERA-20C.
The good agreement in NAE and NPA is only partly found in SPA. Table 3 shows, as an example, that the number of blockings between 1980 and 2005 varies greatly among reanalyses (from 209 in NNR to 443 in ERA-Interim and JRA-55). Here, the choice of reanalysis potentially has a major impact on the result. We also find rather low agreement in the temporal correlation of cyclones among datasets in SPA. Considering only deep cyclones leads to higher correlation coefficients in most cases, in line with, for example, Raible et al. (2008), Neu et al. (2013), and Chang and Yau (2016).
The CCC400 model simulations reasonably capture several aspects of the midlatitudinal atmospheric circulation in the Northern Hemisphere. The simulation of the Southern Hemisphere is significantly hampered by an overrepresentation of blockings and an accompanying equatorward shift of cyclones. Many studies have shown that GCMs tend to overestimate westerlies at the expense of easterlies in the midlatitudes (e.g., van Ulden and van Oldenborgh 2006), and this study, evaluating ECHAM5.4, confirms this finding. However, here, the largest overestimation (underestimation), compared to the reanalyses, is found for northwesterlies (southeasterlies).
Blockings are underrepresented in the high latitudes of the Northern Hemisphere in CCC400, while they are overrepresented in lower latitudes. This agrees with Lenggenhager (2013), who found that the subtropical high pressure belt in CCC400 is too strong. The overestimation (underestimation) of northerly–westerly (easterly–southerly) CTs fits very well into this picture. Summer blockings are underrepresented, while the other seasons agree better with reanalyses. The internal variability in CCC400 was shown to be very large for blocking, cyclone, and CT frequency. Therefore, even though discrepancies between the model and reanalyses are large, the large variability among CCC400 model members often inhibits the detection of significant biases.
Kreienkamp et al. (2010) found that ECHAM5.4 succeeds in simulating blockings in the midlatitudes, while polar blockings are underrepresented, compared to the NNR. This is qualitatively in agreement with the findings of this study; however, NNR shows fewer blockings than the ECHAM5.4-driven CCC400 model simulation in the NAE domain. Most full-input reanalyses contain more blockings between 1980 and 2005 than CCC400 does. This underlines that the selection of the reanalysis may play a role in the outcome of a study. This is especially true for studies focusing on the Southern Hemisphere, where discrepancies among reanalyses are largest.
Dunn-Sigouin et al. (2013) found that CMIP5 models generally underrepresent short blockings with a lifetime shorter than nine days and overrepresent longer blockings. They used the NNR to assess the performance of CMIP5 models. NNR shows fewer long blockings, compared to modern full-input reanalyses, and, thus, exaggerates the overrepresentation of long blockings in CMIP5 models. However, CCC400 using the ECHAM5.4 model still overrepresents long blockings, compared to all reanalyses.
Considering its spectral T63 horizontal resolution, CCC400 simulates cyclone intensity and lifetime reasonably well in the Northern Hemisphere. Examination of the cyclone center density statistics revealed an overrepresentation of cyclones in the midlatitudes. The distributions of cyclone intensity, depth, and lifetime of CCC400 are within the range of modern reanalyses. Similarly, Pinto et al. (2007) and Löptien et al. (2008) found that ECHAM5.4 simulates cyclone characteristics reasonably, although both studies found some discrepancies in the location of the storm tracks and in the intensity of cyclones.
To summarize, we find that modern full-input reanalyses, with the exception of MERRA, generally agree well in their representation of CTs, blockings, and cyclones in the Northern Hemisphere. In particular, ERA-Interim, CFSR, and JRA-55 are, overall, very similar. Despite satellite, aircraft, and other remote observation systems, the Southern Hemisphere shows substantial discrepancies among the datasets between 1980 and 2005. The smaller the feature examined (e.g., cyclone depth), the larger the discrepancies among reanalyses. Reanalysis intercomparisons are thus most important for statistics relying on small-scale features and for the Southern Hemisphere; model evaluations may likewise profit from knowledge of the reanalysis uncertainty under these circumstances. NNR may no longer be suitable for model evaluations and should preferably be replaced by, or at least intercompared with, a more recent reanalysis dataset. Surface-input reanalyses show promising results in the recent past and in the midtroposphere, demonstrating the viability of reanalysis projects based solely on surface observations.
Acknowledgments. This study was funded by the Swiss National Science Foundation via the project EXTRA-LARGE (Contract 143219). Circulation types were calculated with the COST 733 classification software provided by the COST 733 Action "Harmonisation and Applications of Weather Type Classifications for European Regions." The EU FP7 project ERA-CLIM2 is acknowledged. The CCC400 simulations were performed at the Swiss Supercomputer Centre CSCS. We are grateful to all the institutions producing reanalysis datasets and making them publicly available. The Twentieth Century Reanalysis Project datasets are supported by the U.S. Department of Energy (DOE) Office of Science Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program and the Office of Biological and Environmental Research (BER), and by the National Oceanic and Atmospheric Administration Climate Program Office. The authors thank Paul Poli for useful comments, which greatly helped to improve this study. Three anonymous reviewers also helped to greatly improve the quality of the manuscript.
Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JCLI-D-17-0350.s1.