North Atlantic Winter Storm Activity in Modern Reanalyses and Pressure-Based Observations

This study analyzes changes in extratropical windstorms over the North Atlantic during the last decades. We assessed and compared North Atlantic winter storm activity in a comprehensive approach from three different data sources: modern reanalysis datasets, a dynamically downscaled high-resolution global atmospheric climate simulation, and observations. The multidecadal observations comprise both a storm index derived from geostrophic wind speed triangles and an observational record of low pressure systems counted from weather analyses. Both observational datasets have been compared neither to the most recent reanalyses nor to the downscaled global climate simulation with respect to North Atlantic winter storms before. The similarity of the geostrophic wind speed storm index to reanalyzed high wind speed percentiles and storm numbers confirms its suitability to describe storm frequencies and intensities for multidecadal time scales. The results show that high wind speeds, storm numbers, and spatial storm track distributions are generally alike in high-resolution reanalyses and downscaled datasets and they reveal an increasing similarity to observations over time. Strong decadal and multidecadal variability emerged in high wind speed percentiles and storm frequency, but no long-term changes for the last decades were detected.


Introduction
Winter storms over the North Atlantic are extreme weather events, which have a large impact on the coastal population, offshore and onshore industry, shipping, buildings, and other objects of value. Understanding windstorm changes on multidecadal to centennial or even longer time scales is a prerequisite for analyzing future variability of storm activity. Over the North Atlantic, modes of natural variability need to be taken into account when analyzing long-term storm variability. Walz et al. (2018) analyzed the impact of various large-scale drivers on European winter windstorm counts and their seasonal clustering. The main drivers they found for the North Atlantic are the North Atlantic Oscillation (NAO) and the second prominent mode of low-frequency variability over the North Atlantic, the east Atlantic pattern. For the Norwegian Sea and farther north, the polar pattern, which is linked to the strength of the circumpolar circulation, is the prevailing teleconnection pattern.
The most dominant teleconnection pattern, the NAO, describes the strength of the pressure difference between the Icelandic low and the Azores high. It is strongly related to the location of the polar jet stream and thus to the strength and direction of the extratropical storm track (Hanna and Cropper 2017;Rogers 1997;Wettstein and Wallace 2010). The NAO is very important for weather conditions over the North Atlantic and western Europe in winter (Pinto and Raible 2012). The pattern shows interannual to multidecadal variability. On shorter time scales variability is associated with meridional shifts of the jet and storm track and on longer time scales with their strength (Woollings et al. 2015). The dominant variability on time scales of decades to centuries was shown to be nonstationary in time (Appenzeller et al. 1998). The NAO is linked to the storm activity over the North Atlantic with increasing similarity over time; after about 1950 correlations are high with values of about 0.7. Between 1951 and 1990 the NAO shows a positive trend, which generally resembles geostrophic storm indices (see Fig. 4 of Krueger et al. 2019), but still the time series vary in their amplitudes and the NAO is not the only contributing factor. In this article, we address the question of how extratropical storms over the North Atlantic have changed during the last decades in a comprehensive approach that integrates observation-based data, the most recent full reanalysis datasets, and a global high-resolution hindcast.
Observations such as wind speed measurements often suffer from inhomogeneities over time as they are very dependent on station location, its surroundings, and measurement techniques (e.g., Lindenberg et al. 2012;von Storch et al. 2017). To overcome these potential inhomogeneities, geostrophic wind speed proxies based on surface pressure measurements can be used. As pressure is a large-scale variable that is mostly independent of station location, surroundings, or measurement technique and frequency, it provides homogeneous data on long time scales von Storch 2011, 2012;Feser et al. 2015). The pressure information can be utilized to derive geostrophic Denotes content that is immediately available upon publication as open access.
Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JCLI-D-20-0529.s1. wind speed, which acts as a proxy for storm activity. Time series of such geostrophic wind speed were first analyzed by Schmidt and von Storch (1993) for the German Bight and later for the eastern North Atlantic by, for example, Alexandersson et al. (1998Alexandersson et al. ( , 2000, Krueger et al. (2019), and Wang et al. (2009).
Most long-term gridded observation-based reanalysis datasets are available for the last 40-70 years, although some cover more than 100 years, such as 20CR (Compo et al. 2011;Slivinski et al. 2019) and ERA-20C (Poli et al. 2016). Efforts were made to take changes in station surroundings or measurement techniques into account. However, over time, reanalyses use a changing number of stations (mostly increasing), newer measurement techniques, and new products, such as satellite data, which were introduced at the end of the 1970s. This improvement in data frequency and station numbers as well as in measurement methods is adding value to the reanalysis data and provides increasing quality, but it also leads to potential inhomogeneities over time for trend analysis (Bengtsson et al. 2004;Krueger et al. 2013;Ferguson and Villarini 2012;Wang et al. 2006Wang et al. , 2016. The very long reanalyses, 20CR and ERA-20C, were found to be inconsistent in terms of long-term trends (Krueger et al. 2013;Befort et al. 2016;Wohland et al. 2019;Rohrer et al. 2019) due to an increasing station density over time and thus a higher chance of detecting storms in more recent times (Krueger et al. 2013(Krueger et al. , 2014Chang and Yau 2016). We therefore excluded these longterm reanalyses from our study and limit our analyses to the time frame of multidecadal reanalyses starting in 1948 as longterm trends in storm activity derived from reanalysis data were found to be consistent with observations after about 1930 (Krueger et al. 2013). Most of the recent full reanalyses start in 1979 or 1980, after the introduction of satellite data, which helps to reduce this problem. Nevertheless, longer time series are needed to assess changes in winter storm activity. Some of the older, longer-term reanalysis data were analyzed for storm tracks and extratropical cyclones. Raible et al. (2008) tracked cyclones in NCEP (Kalnay et al. 1996;Kistler et al. 2001) and ERA-40 (Uppala et al. 2005) and found an increase in winter cyclone frequency and intensity for the high-latitude North Atlantic and a decrease in storm numbers for the midlatitude North Atlantic. The same result was reported by Wang et al. (2006) for cyclone activity and by Trigo (2006) for cyclone numbers. Also some of the more recent reanalyses were examined for synoptic-scale extratropical cyclones (e.g., Hodges et al. 2011;Wang et al. 2016). Hodges et al. (2011) found that the numbers and spatial distributions of extratropical cyclones of the Northern Hemisphere compared well between ERA-Interim (Dee et al. 2011), MERRA (Rienecker et al. 2011), NCEP CFSR (Saha et al. 2006(Saha et al. , 2010, and JRA-25 (Onogi et al. 2007). Larger differences occurred for their maximum intensities, which were mostly resolution-dependent. Higher cyclone counts for higher resolutions were also reported by Tilinina et al. (2013) and Wang et al. (2016). Wang et al. (2016) compared nine reanalysis datasets using an objective tracking algorithm. They found large similarity in temporal variability, especially for the most recent decades after 1979, and for deep cyclone statistics. MERRA showed the highest mean cyclone counts and intensity, while NCEP-CFSR featured the lowest intensity.
The longer full reanalyses available are quite coarse in resolution, which limits the representation of regional features of extreme events. To gain long-term data at a higher resolution, a multidecadal weather hindcast for the last decades was computed. For that a global high-resolution atmospheric climate model was spectrally nudged at large spatial scales toward reanalysis data. Spectral nudging is a well-established method to constrain the large spatial scales of climate models for climate hindcast simulations (von Storch et al. 2000;Waldron et al. 1996;von Storch et al. 2018). The climate model remains free to develop regional-scale weather phenomena and processes, while larger-scale features are nudged toward reanalyses or global model data. The idea is that those large scales are well resolved by the coarser dataset and should thus not be altered by the higher-resolution model. In the beginning, the approach was applied to regional climate models. Later global spectral nudging was introduced by Yoshimura and Kanamitsu (2008) in a high-resolution atmospheric general circulation model, which was spectrally nudged toward the NCEP-2 reanalysis. Long-term simulations using this technique were performed by Kim and Hong (2012) and Schubert-Frisius et al. (2017) for 31 and 67 years. Some first case studies (Schubert-Frisius et al. 2017) lead to the conclusion that global spectral nudging helps to simulate extratropical and tropical cyclones at the time and place they had occurred. A recent study demonstrated the suitability of the hindcast setup in combination with the storyline method for climate change extreme event attribution (van Garderen et al. 2021).
In this article, we provide a holistic assessment of changes in North Atlantic storm activity during the last decades in the most recent full reanalysis datasets, in observation-based data, and in the global hindcast. The most recent reanalyses applied in this study are ERA5 (Hersbach et al. 2020), JRA-55 (Kobayashi et al. 2015), MERRA-2, and NCEP CFSR. Added to that list are the longer-term ERA-Interim and NCEP I, the latter being one of the first reanalysis datasets available. NCEP I served as input data for our global spectrally nudged hindcast. The observations we analyze here have neither been compared to recent reanalyses nor to the global hindcast in terms of North Atlantic storminess so far. As the observations cover near-surface wind speed and sea level pressure we will focus on these variables for our analyses. The overall question is this: How did extratropical winter storms over the North Atlantic change during the last decades in these various up-to-date datasets? Another important question that has not been raised and that we would like to tackle is how well the observed geostrophic wind speed index represents both storm frequency and intensity. Moreover, we will analyze the capability of the global spectrally nudged multidecadal hindcast to simulate extratropical cyclones over the North Atlantic.

a. Reanalyses
In this study, a number of different modern reanalysis datasets were analyzed:  (Saha et al. 2006(Saha et al. , 2010.

b. Downscaled high-resolution simulation
The general circulation model (GCM) used for the atmospheric hindcast of this study is ECHAM6 (Giorgetta et al. 2013) in its version ECHAM-6.1.00 as described in Schubert-Frisius et al. (2017). This state-of-the-art global climate model features the prognostic variables logarithm of surface pressure, temperature, mixing ratio of different water species, divergence, and vorticity. It is a spectral model coupled to a land vegetation model; sea surface temperatures and sea ice concentration are prescribed from the NCEP-NCAR reanalysis (Kalnay et al. 1996;Kistler et al. 2001). With this setup, a multidecadal hindcast for the past about 70 years from 1948 until April 2015 is computed at a resolution of T255L95. The horizontal fields are expanded into spherical harmonics up to total wavenumber 255, which corresponds to a horizontal grid distance of about 55 km and in the vertical 95 hybrid terrainfollowing levels up to 0.01 hPa (about 80 km) are chosen.
A GCM in an AMIP-type configuration will use sea surface temperatures and sea ice concentrations based on gridded observation-based products or on reanalysis data. Nevertheless, this still leaves a lot of freedom for the model to simulate various equivalent weather states, which could all have happened with the same probability. Here, we also use additional simulations where the observed large-scale circulation patterns are simulated by applying spectral nudging (von Storch et al. 2000). This means that large-scale weather states of the GCM are adjusted to be more similar to the forcing reanalysis data. The underlying idea is that the reanalysis can resolve atmospheric weather patterns at large spatial scales, such as high and low pressure systems or fronts, with high quality, whereas smaller spatial scales cannot be depicted realistically due to the coarse grid spacing. Synoptic scales, which can be resolved reliably by the reanalysis, should not be altered by the GCM in order to gain large-scale weather situations, which are close to the observed ones. By using such realistic large-scale information, a GCM at high resolution can add value by simulating regional scales and thus adding mesoscale details, which could not be resolved by the reanalysis. Spectral nudging provides realistic global hindcasts of past weather records, at resolutions much higher in both space and time than the reanalysis they are spectrally nudged to. This approach represents a simplified and computationally less demanding set up for high-resolution hindcast studies in comparison with the full assimilation schemes generally used for reanalyses. For the hindcast presented in this article (Schubert-Frisius et al. 2017), only vorticity and divergence are spectrally nudged toward NCEP I. Spectral nudging is not applied below 750 hPa, as most regional-scale processes take place here, and not above 3 hPa, as NCEP I does not provide data at such height. In between, a plateau-shaped profile is applied for the spectral nudging. Spectral nudging is applied for a wavenumber truncation of 30, so that wave lengths of approximately 1300 km and larger are nudged.

c. Resolution and domain
The different spatial resolutions of both the reanalyses and the GCM datasets are depicted in Fig. 1, which shows a snapshot of 10-m wind speed at 0600 UTC 1 January 2009. For further analyses, both the reanalysis and the global hindcast datasets were spatially interpolated to a common grid of 0.78, using a distance-weighted average remapping of the four nearest neighbor values. The common area chosen, denoted North Atlantic area in the following, extends from 808W to 408E and from 308 to 858N. A smaller subset of this region, called the northeast Atlantic area in the following, extending from 248W to 158E and from 508 to 688N (marked as a white box in the first panel of Fig. 1), was used for comparisons with the observationbased geostrophic wind speed index, which is described in more detail in the following subsection. Six-hourly data were selected for all datasets, which were linearly interpolated to 3-hourly data for storm tracking. For our analysis, we looked at extended winter seasons from October to March. The common period of all datasets includes the winter seasons from 1980/81 until 2014/15.

1) THE FRANKE TIME SERIES
The reanalysis and climate model results were compared to observations. One is a multidecadal time series of counts of intense low pressure system (,950 hPa) over the North Atlantic counted from weather analysis charts (Franke 2009). This time series was derived through the same method by Richard Franke from the German Weather Service from 1956 until 2018. As the low pressure systems are counted by eye the method is subjective, but as it was applied by the same meteorologist for all these decades any potential inhomogeneities of this long-term record are presumably not introduced by the method itself, but rather by changes in the making of the weather charts, such as by assimilating more measurements over time.

2) GEOSTROPHIC WIND SPEED DATASET
In addition, geostrophic wind speed indices based on historical air pressure readings from the International Surface Pressure Databank (ISPD; Compo et al. 2015;Cram et al. 2015) and from the Danish (Cappelen et al. 2018a,b) and Norwegian Meteorological Institutes (downloaded from MET Norway 2018) were analyzed [as explained in Krueger et al. (2019)]. The list of meteorological stations and observational periods is provided in Table S1 in the supplemental material. Pressure observations that were not at mean sea level height were reduced using the barometric formula and standard atmospheric values following Alexandersson et al. (1998). The data were temporally synchronized using a cubic spline interpolation.
Over the North Atlantic, as a first-order approximation, wind speed can be derived from pressure gradients if ageostrophic effects are not taken into account. Therefore, the method works best over flat terrain, such as the open ocean. The advantage of pressure readings is that long and homogeneous time series are available for a number of stations in the vicinity of the North Atlantic. To get spatial information out of the point measurements, triangles can be formed from these stations within which geostrophic wind speed statistics and thus storm activity can be calculated (Schmidt and von Storch 1993; Alexandersson et al. 2000; Krueger et al. 2019;Wang et al. 2009). A total number of 10 triangles were spanned covering the northeast Atlantic area from 248W to 158E and from 508 to 688N (see Fig. S4 and Table S2 in the online supplemental material). This area is depicted as a white box in the first panel of Fig. 1. The geostrophic wind dataset covers the period 1875-2016. The method used to derive the geostrophic wind speed index for this study is described in more detail in the supplemental material and in Krueger et al. (2019).

3) STORM TRACKING
Storms were detected using an objective tracking algorithm (Feser and von Storch 2008;Barcikowska et al. 2018) that detects minimum sea level pressure (SLP) and maximum nearsurface wind speed in a circular distance around the minimum pressure. The located minima were then connected to form cyclone tracks whereby certain criteria were applied. Those include a maximum wind speed of at least 17.2 m s 21 (Beaufort force 8) during the cyclone's lifetime, a pressure minimum that has to reach or exceed the selected pressure threshold at least once during the storm's lifetime, and a storm duration of at least 2.5 days, so that only intense storms were taken into account. The automated tracking approach was applied with the same settings for all datasets on the common grid (0.78) for the common period (winter seasons from 1980/81 until 2014/15) to achieve maximum comparability.

4) STANDARDIZATION
The standardization for the analyses of this study was done in the following way: extended winter season variables (e.g., wind speed percentiles or storm frequencies) minus timeaveraged (over all extended winter seasons) variables were divided by the standard deviation of the variables of all extended winter seasons.

a. Spatial fields
A snapshot of 10-m wind speed at 0600 UTC 1 January 2009, featuring several extratropical cyclones, is presented in Fig. 1. The different spatial resolution of the datasets is obvious; the lowest resolution is used by NCEP I and the highest by ERA5, while the other datasets show values in between. The three wind speed maxima in the western Atlantic and at the southern tip of Greenland are included in all datasets, and while the general spatial wind pattern is quite similar, the intensity varies, especially for the high wind speeds. The highest wind speeds can be seen for NCEP-CFSR and JRA-55, followed by ECHAM6 and MERRA2. The lowest wind speeds are visible for NCEP I, ERA-Interim, and ERA5. Figure S1 shows the SLP for the same snapshot. It shows that the wind maximum at the southern tip of Greenland is related not to low pressure, but rather to high orographic differences and katabatic winds flowing over elevated ice sheets toward the ocean. Generally the SLP pattern is very similar between the different datasets, with smaller differences than for the near-surface wind speed. Winter storms feature high wind speeds and low pressure, from which we analyze the 95th and 5th percentiles, respectively. Figure 2 shows the 95th percentiles of wind speed for the extended winter storm seasons (October-March) averaged from 1980/81 to 2014/15 over the North Atlantic area for different reanalysis datasets. To illustrate the discrepancies between the individual datasets, red isolines were added that show the differences between each reanalysis dataset and the mean over all datasets. The general pattern looks rather similar in all datasets; it shows the storm track over the North Atlantic, extending from the northeast coast of the United States in a northeasterly direction toward Greenland, Iceland, the British Isles, and the Norwegian Sea. The datasets with the highest wind speed percentiles are JRA-55, ECHAM6, and NCEP-CFSR. Relatively low wind speeds for the storm track area are given by ERA5, ERA-Interim, MERRA-2, and NCEP I. The largest differences between the datasets can be seen for the coast of Greenland and in its vicinity, indicating that they are based on the varying orographies of the different reanalyses.
The lowest 5th percentiles of SLP for the extended winter storm seasons (October-March) of 1980/81-2014/15 over the North Atlantic area for different reanalysis datasets are presented in Fig. 3. Again red isolines illustrate the differences between each reanalysis dataset and the mean over all datasets. As SLP is a rather large-scale variable with a spatial decorrelation length generally much longer than for wind speed, the patterns are more similar to each other than the ones for the highest wind speed percentiles. Low pressure patterns can be detected over the North Atlantic with a minimum between the southeastern tip of Greenland and Iceland and another one over the Norwegian Sea. The largest differences between the datasets of up to 8 hPa occur over Greenland and are presumably orography-dependent like the differences for the high wind speed percentiles. NCEP I shows slightly higher pressure values, which are more intensified in the ECHAM hindcast, which features the lowest pressure of all datasets.

b. Results for the northeast Atlantic
The results of the following subsections cover the smaller area of the northeast Atlantic (see white box in Fig. 1). This region is located at the end of the North Atlantic storm track and is thus influenced by changes in the large-scale circulation. This could be, for instance, a displacement of the storm track due to a phase shift of the North Atlantic Oscillation, which would then lead to changes in storm numbers entering the domain.

c. Temporal wind speed evolution
To relate these spatial high wind speed and low pressure patterns to changes in storm activity over time, time series of the highest wind speed percentiles are examined. Figures 4a  and 4b show the 95th and 99th percentiles, respectively, of wind speed for each storm season (October-March) over the northeast Atlantic area for different reanalyses and the storm index derived from geostrophic wind speed triangles (OBS). The observed storm index is based on station data, which cover the northeast Atlantic area from 248W to 158E and from 508 to 688N and thus represent a smaller region than the common area chosen for our study. This comparison was therefore computed for this smaller region. Wind speed percentiles were area averaged for this region and standardized. The time series show large year-to-year as well as decadal to multidecadal variability and no long-term trends. The temporal evolution of the high wind speed percentiles is very similar for the different datasets, especially after 1979, when satellite data were introduced to the reanalyses and most modern reanalyses start. For the earlier decades, only JRA-55 (starting in 1958), NCEP I, ECHAM6 (both starting in 1948), and the geostrophic wind speed observations are available. For these early times, the largest discrepancies occur. NCEP I and ECHAM6 (which was spectrally nudged toward NCEP I) are more similar to each other than JRA-55, which in general shows higher values before 1979. Afterward, NCEP I shows the highest values and, in contrast to the earlier decades, JRA-55 the lowest ones while the other datasets are close to each other. The large similarity between the datasets shows that the geostrophic wind speed index, which was solely derived from pressure measurements, is representative for high wind speeds and their highest percentiles over the northeast Atlantic. Since the geostrophic wind speed index is based on homogeneous, quality-checked, longdated pressure measurements, it can be assumed to well depict long-term changes in geostrophic wind speed. As the reanalyses show largest similarity with this wind speed index calculated from SLP observations for the most recent decades, our findings suggest that this is a result either of higher data availability in the reanalyses after introduction of satellite data to the assimilation process or of changes in the observing network density. For the earlier decades, larger variability between the simulated datasets indicates greater uncertainty based on smaller measurement density.
In a next step, the observation-based storm index is now recalculated based on the reanalyses and the GCM data. For that, the SLP was extracted from the simulated and reanalyzed data for those grid points, which encompass the 10 meteorological stations used for the observed storm index. Ten triangles covering the northeast Atlantic area were spanned in the same way as done for the observed index (see Krueger et al. 2019) to provide spatial information on storm activity within these triangles. Based on the simulated SLP values, geostrophic wind speed information was calculated for those triangles. Figure 5 depicts the storm index (standardized) based on geostrophic wind speed triangles (95th and 99th percentiles) derived from modeled SLP for each storm season (October-March) over the northeast Atlantic area for different reanalyses (colored time series) and the same storm index derived from SLP measurements (black lines). Figure S2  shows the winter season time series, and Fig. 5 shows the 10-yr running mean for the 95th (Fig. 5a) and 99th (Fig. 5b) percentiles. The curves are all close to each other, especially for the most recent decades, while slightly larger differences occur for the first decades. For this time, ECHAM6 and JRA-55 are closer to the observations than NCEP I. For the later decades, all modeled datasets are close to each other and deviate only slightly from the observations. In accordance with other studies (e.g., Feser et al. 2015;Krueger et al. 2019;Tilinina et al. 2013), the 10-yr running mean time series show multidecadal variability for the North Atlantic storm activity, a decreasing trend until the 1960s, a subsequent increasing trend until the begin of the 1990s, and a decrease back to average values for the most recent years.

d. Storm tracks
An objective tracking algorithm was applied to identify storms within the modeled datasets. The result is shown in Fig. 6 and Fig. S3, which depict standardized tracked storm frequencies (tracked with thresholds 950, 960, and 980 hPa) for extended winter storm seasons (October-March) over the northeast Atlantic area of the different reanalysis datasets (colored time series) in comparison to the storm index derived from geostrophic wind speed triangles (black solid line shows the 95th percentile; black stippled line shows the 99th percentile). The gray areas show the range of uncertainty based on station availability, whereby a large spread indicates larger uncertainty due to fewer stations available, as described in Krueger et al. (2019). In the early years of the record, missing data from the Aberdeen station from 1948 until 1956 lead to higher uncertainty values as five geostrophic wind speed triangles, which contribute to the storm index, are based on that data. After 1960 uncertainty decreases further due to higher data availability (Krueger et al. 2019). Figure S3 shows winter season time series and Fig. 6 their 10-yr running means. The observed storm index shows some differences to the modeled track numbers, especially for the early decades. Also between the modeled datasets the numbers vary, with the largest differences occurring for the very first decades. For 950 and 960 hPa, ECHAM6 shows the highest storm frequencies and is mostly outside of the uncertainty range of the observed index, while JRA-55 features lower numbers until about 1975. Afterward JRA-55 and NCEP show the largest values, which are often above the uncertainty range. However, differences between the datasets generally decrease after the introduction of satellite data at the end of the 1970s and the reanalyzed data lie largely within the uncertainty range of the observed storm index. The gray line in Fig. 6 shows the NAO index, standardized for the common period. The correlation between the NAO and storm frequencies from reanalyses or storm indices of geostrophic wind speed is lowest for earlier decades and increases for the most recent ones. The correlations are provided in Fig. S5, ranging between 20.11 and 0.97. Correlations are on average 0.6 before 1980 and 0.84 afterward. Lower pressure thresholds and thus more intense cyclones show higher correlations with the NAO; also, the storm index for the 99th percentiles features higher correlations than the one for the 95th percentiles.
Correlations between the observed wind speed index and reanalyses are provided in Fig. 7a) for the 95th percentiles and in Fig. 7b) for the 99th percentiles of geostrophic wind speed. The figures show the first period until 1980 and the most recent decades separately as the first decades are only available for the three longest simulated datasets, ECHAM6, NCEP I, and JRA-55. Usage of different tracking pressure thresholds does lead to different results. The lowest correlations occur for a threshold of 980 hPa, which indicates that the geostrophic wind speed index is representative for more intense storms, both for the 95th and 99th percentiles. For the first decades, ECHAM6 shows the highest correlations, which are still below 0.4. Figures 7a and 7b clearly show that correlations are much higher for the most recent decades. Here, ERA-Interim shows the highest correlations with values up to 0.71 for 970 hPa, followed by ERA5. ECHAM6 and MERRA-2 have the lowest correlations for the most recent decades. For 950 and 960 hPa the simulated storm frequency time series generally follow the observed geostrophic wind speed index (see Fig. 6) and show the monitored multidecadal decrease until the mid-1960s, a subsequent increase until the beginning of the 1990s, and a decrease to values around average afterward. The same temporal evolution for cyclone numbers with an increase until 1990 and a subsequent decline was described by Tilinina et al. (2013) for deep cyclones (,960 hPa). This comparison demonstrates that, even though the differences are larger than for the high wind speed percentiles shown in Fig. 4, the observed SLP-based storm index is also representative of storm frequency, especially for the most intense storms.

e. Results for the North Atlantic
We add another type of observation, namely a counting of intense low pressure systems below 950 hPa over the North Atlantic in weather analysis charts. This observational record is unique in how it was derived by the very same colleague of the German Weather Service using the very same method for more than 50 years [extended from Franke (2009)]. One point we would like to make is that the observed counts of low pressure systems is not identical to storm tracks derived with our tracking algorithm. To be defined as a storm track, several criteria have to be fulfilled, concerning, for instance, wind speed or storm duration, and not just the minimum pressure threshold. However, we think that the comparison still provides interesting insights. The observed time series (pink) is shown in Fig. 8, together with the storm frequencies derived from the reanalyses and GCM (now tracked with a tracking threshold of 950 hPa; colored time series). The observed low pressure curve is covering the whole North Atlantic and therefore the modeled datasets were now tracked for the common North Atlantic area used for this study. Please note that the black line no longer shows observations, but rather the ensemble average of all simulated datasets. We use the ensemble average as we consider all datasets to provide equally probable realizations of storminess. Figure 8a shows winter FIG. 5. Storm index (standardized) based on geostrophic wind speed triangles derived from modeled SLP for each storm season (October-March) over the northeast Atlantic area for different reanalyses (colored time series) and the same storm index derived from SLP measurements (black lines). Shown are 10-yr running mean time series. The storm index is shown based on the (a) 95th and (b) 99th percentiles of geostrophic wind speed. season time series, while Fig. 8b shows their 10-yr running means. In this analysis we show raw numbers and not standardized ones, and therefore the resolution dependency of the reanalysis (Tilinina et al. 2013) can be seen in the results. All reanalyses were interpolated to the common grid of 0.78 before applying the objective tracking algorithm. However, also after interpolation, the storm frequencies still largely depend on the original reanalyses resolution as, for example, for higher resolutions generally more intense storms will be simulated. After interpolation to a lower resolution these cyclones will still feature higher wind speeds and lower pressure values than the storms of a lower-resolution dataset and those more intense storms may thus lead to higher track numbers for certain tracking thresholds. NCEP I provides the lowest storm numbers, while ECHAM6 generally features the highest track numbers. JRA-55 shows increasing storm numbers for the first decades, from the low level of NCEP I up to the higher ECHAM6 values. For the more recent decades the modern reanalyses are close to each other; the highest values can be seen for MERRA-2 [which is in accordance with Tilinina et al. (2013), who found the highest cyclone numbers for MERRA] and NCEP-CFSR. Especially for the first decades until winter 1988/89, the observed low pressure system numbers match quite well with the reanalyses and ECHAM6. Afterward the absolute observed low pressure counts increase remarkably; the reason for this is still unknown. However, this increase does not resemble any of the simulated or reanalyzed datasets and we suggest that a change in data availability or modified technique in creating the weather analysis charts might have led to the higher numbers in the most recent decades (Karl et al. 1993). Still both the observed curve and the reanalyses show decreasing numbers starting in the early 1990s until the mid-2000s. During the most recent years the peak of winter 2013/14 and the subsequent drop back to values around average is represented both in the observations and simulated datasets. The correlations between the time series are given in Fig. 7c, and also for other tracking thresholds of 960-980 hPa. Not surprisingly, the observed 950-hPa low pressure system time series correlates best with simulated storm numbers derived with low tracking thresholds of 950 and 960 hPa. For 950 hPa, for the first decades until 1980 ECHAM6 shows highest correlations; also, the other two long-term reanalyses show high values larger than 0.5. For the most recent decades, ERA-Interim and MERRA-2 feature the highest correlations with observations with values up to 0.69.
We now consider storm parameters along the storm tracks. For reasons of better comparability with other studies and structural unity, we chose 980 hPa as a common pressure threshold in the following analyses. Figure 9a shows the storm season averaged minimum pressure of tracked storms over the North Atlantic area for different reanalysis datasets and their ensemble mean as 10-yr running mean time series. The ensemble mean is shown because no gridded measurements, which could be tracked for storms, are available. The ensemble mean does not represent a solution closest to observations, but it exemplifies that all reanalysis datasets are treated as being of generally similar quality. NCEP I generally features higher minimum pressures than the other reanalyses, the lowest pressure values are reached by ECHAM6 and MERRA-2. The remaining reanalyses are close to each other and to the ensemble mean. Some multidecadal trends can be seen for the minimum pressure. After an increase for the very first years, from the mid-1960s on pressure decreases by approximately 3 hPa until around 1990, followed by a sharp increase of about 2.5 hPa until 2005 and a subsequent drop of almost 1 hPa afterward. The long-term decrease in minimum pressure from the 1960s until the 1990s resembles, in terms of storm intensification, the increase in geostrophic wind speed (Fig. 5) and storm numbers for the most intense storms (Figs. 6b,c). The subsequent increase in minimum pressure, and thus storm weakening, fits these analyses as well. Near-surface maximum wind speed values are compared in Fig. 9b), which depicts storm season averaged maximum wind speed of tracked storms over the North Atlantic area for different reanalysis datasets and their ensemble average for 10-yr running mean time series. Larger differences are apparent between the datasets. The lowest wind speed maxima are present in NCEP I and ERA5 and the highest in JRA-55 and NCEP-CFSR, while the other simulated datasets are close to the ensemble average. The low wind speeds in ERA5 are in contrast to its high resolution; we assume that a possible explanation for the low wind speed in ERA5 is the new waveatmospheric coupling. ERA-Interim also features a wave coupling, though not as advanced as the one for ERA5. Ramon et al. (2019) analyzed near-surface wind speed of different reanalyses in comparison to observations and found ERA5 to provide results closest to the observations. However, they focused on wind at turbine hub heights and use the ERA5 wind speed at a height of 100 m, where the wave coupling is less important than close to the surface. Carvalho (2019) compared wind speed at a height of 10 m for different reanalyses, namely FIG. 7. (a) Correlations between standardized storm frequencies from reanalyses and standardized storm index (see Fig. 6) derived from 95th percentiles of geostrophic wind speed for extended winter storm seasons (October-March) 1948/49-1979/80 (shown at left) and 1980/81-2014 Fig. 9d for their 10-yr running mean time series. All time series show large year-to-year variability and range between 3.6 and 4.1 days. NCEP I and NCEP-CFSR show slightly longer storm durations than ECHAM6 and MERRA-2. NCEP I and ECHAM6 show a decrease in storm duration until around 1990 and an increase afterward back to values around average. But overall variability is large and thus no conclusions can be drawn on long-term changes in storm durations. Figure 10 shows the maximum near-surface wind speed for all grid boxes covered by storm tracks for the extended winter storm seasons 1980/81-2014/15 over the North Atlantic area for different reanalyses and simulated datasets. In accordance with Fig. 2, all simulated datasets feature the North Atlantic storm track, but some differences between the datasets are notable. The highest wind speed values along the tracks are apparent in NCEP-CFSR and JRA-55. NCEP I and ERA5 show, in accordance with Fig. 9b, the lowest wind speeds. Also note that the coarser resolutions of NCEP I and JRA-55 become noticeable visible artifacts in the depiction of maximum near-surface wind speeds as the grid configuration used in this study is finer than the native resolution of these datasets.
For a more systematic comparison, track densities for the extended winter storm seasons 1980/81-2014/15 over the North Atlantic area for different reanalysis datasets were computed (Fig. 11). All reanalyses and the GCM show the North Atlantic storm track in quite similar patterns, but the numbers vary. The general track density pattern compares well with the one shown for NCEP in Pinto et al. (2009, see their Fig. 4a) and the one for ERA5 in Priestley et al. (2020, see their Fig. 1a), with a maximum southeast of Greenland and southwest of Iceland. Highest track densities can be seen for NCEP-CFSR and JRA-55. Lower track densities occur for ERA5, ECHAM6, and MERRA-2. The red contour lines show track origin densities, only the origin points of each storm track were taken into account. Again, NCEP-CFSR shows the highest track densities while ERA5, ECHAM6, NCEP I, and MERRA-2 feature the lowest ones.

Summary and conclusions
In this study, changes in extratropical windstorm activity over the North Atlantic were analyzed in observations, recent reanalysis datasets, and a global, high-resolution atmospheric hindcast for the last decades. The reanalyses chosen for this assessment were ERA5, ERA-Interim, JRA-55, MERRA-2, NCEP I, and NCEP-CFSR. The global hindcast was simulated with the GCM ECHAM6, spectrally nudged toward the NCEP I reanalysis. A geostrophic wind speed index based on sea level pressure measurements and a counting of low pressure systems over the North Atlantic serve as observations. We tackled the question how extratropical windstorms over the North Atlantic changed during the last decades in these different databases. The general storm track pattern and large-scale fields of high wind speeds and low pressure are similar among the simulated datasets. The temporal evolution of simulated high wind speed percentiles follows the well-described (Alexandersson et al. 2000;Krueger et al. 2019;Feser et al. 2015;Wang et al. 2009; WASA Group 1998) observed multidecadal pattern of a decreasing trend until the 1960s, a subsequent increasing trend until the begin of the 1990s, and a decrease back to average values for the most recent years. Large similarity can be seen between the geostrophic wind speed index derived from pressure measurements and the reanalyses. This similarity is even higher for reconstructed simulated geostrophic wind speed indices. Here, differences generally occur only for the very first decades, long before the introduction of satellite data. This is presumably a result of data sparsity for the reanalyses' assimilation process in these early decades.
The simulated storm numbers match the observation-based storm index well. Highest correlations are visible for the most recent decades and the more intense storms. We conclude that the similarity of the geostrophic wind speed storm index to reanalyzed high wind speed percentiles and storm numbers confirms its suitability to describe storm activity for multidecadal time scales. Another long-term observation was provided by the German Weather Service and consists of a manual low pressure count for intense lows , 950 hPa over the North Atlantic. This observation shows generally high correlation with the most intense simulated storm numbers, although FIG. 9. Storm season averaged, 10-yr running mean (a) minimum pressure (hPa), (b) maximum wind speed (m s 21 ), (c) storm track lengths (km), and (d) storm durations (days) of tracked storms (tracked with a threshold of 980 hPa) for extended winter storm seasons (October-March) over the North Atlantic area for different reanalysis datasets (colored time series) and their ensemble average (black line). simulated storms have to fulfill certain tracking algorithm criteria in order to be defined as a storm instead of just featuring low pressure. However, the multidecadal trends of both observations and simulated data generally agree and are close to the variability of high geostrophic wind speed and the storm indices, although the absolute numbers differ.
Parameters along the storm tracks were analyzed in more detail. Storm season averaged minimum pressure along the track reveal higher pressure values for NCEP I than for the other reanalyses, which are close to each other. Multidecadal trends of minimum pressure are almost opposite to the ones for high wind speeds and storm frequency, with a long decrease from the 1960s until around 1990 and a subsequent increase. This inverse relationship was also described by Krueger and von Storch (2012) and Matulla et al. (2007). Near-surface maximum wind speed shows the largest differences with lowest wind speeds of NCEP I and ERA5 and highest of JRA-55 and NCEP-CFSR. As the low wind speeds of ERA5 are in contrast to its high resolution it is concluded that inherent processes in computing the reanalysis (e.g., the wave-atmosphere coupling) lead to the lower wind speeds in comparison with other reanalyses that have no internal coupling. Storm track lengths are longest in NCEP I and shortest in ECHAM6 and MERRA-2. Only NCEP I and ECHAM6 show a decreasing trend from the 1960s until the 1990s and a subsequent increase while the other datasets are more variable. Variability is even higher for storm durations; the time series show large year-to-year variability and no long-term trends could be detected. Track densities show a common pattern with a maximum southeast of Greenland and southwest of Iceland; the highest densities can be seen for NCEP-CFSR and JRA-55, which is also the case for storm track origin densities.
In this article, we showed that recent reanalysis datasets are capable to represent extratropical winter storms over the North Atlantic and their changes over time. The multidecadal hindcast computed with a GCM is a high-resolution reconstruction that overall lead to results closer to observations and high-resolution reanalyses than the coarse forcing reanalysis. It thus presents a more feasible solution to downscale long-term reanalysis data of lower resolution without using a full assimilation scheme. Two independent observational datasets confirm the temporal storm activity evolution and changes in FIG. 11. Weighted (square root of the cosine of the latitude) track densities (threshold 980 hPa) computed over 48 unit spheres for the extended winter storm seasons (October-March) 1980/81-2014/15 over the North Atlantic area for different reanalysis datasets. The red contour lines show weighted track origin densities (contour interval is 20) for the same threshold, unit spheres, period, and geographical region. storm frequency and high wind speeds on decadal to multidecadal time scales. In the long term large variability occurs, which requires even longer homogeneous measurement records in order to derive long-term trends for changes in windstorms over the North Atlantic.