## Abstract

Current estimates of the European windstorm climate and their associated losses are often hampered by either relatively short, coarse resolution or inhomogeneous datasets. This study tries to overcome some of these shortcomings by estimating the European windstorm climate using dynamical seasonal-to-decadal (s2d) climate forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF). The current s2d models have limited predictive skill of European storminess, making the ensemble forecasts ergodic samples on which to build pseudoclimates of 310–396 yr in length. Extended winter (October–April) windstorm climatologies are created using scalar extreme wind indices considering only data above a high threshold. The method identifies up to 2363 windstorms in s2d data and up to 380 windstorms in the 40-yr ECMWF Re-Analysis (ERA-40). Classical extreme value analysis (EVA) techniques are used to determine the windstorm climatologies. Differences between the ERA-40 and s2d windstorm climatologies require the application of calibration techniques to result in meaningful comparisons. Using a combined dynamical–statistical sampling technique, the largest influence on ERA-40 return period (RP) uncertainties is the sampling variability associated with only 45 seasons of storms. However, both maximum likelihood (ML) and L-moments (LM) methods of fitting a generalized Pareto distribution result in biased parameters and biased RP at sample sizes typically obtained from 45 seasons of reanalysis data. The authors correct the bias in the ML and LM methods and find that the ML-based ERA-40 climatology overestimates the RP of windstorms with RPs between 10 and 300 yr and underestimates the RP of windstorms with RPs greater than 300 yr. A 50-yr event in ERA-40 is approximately a 40-yr event after bias correction. Biases in the LM method result in higher RPs after bias correction although they are small when compared with those of the ML method. The climatologies are linked to the Swiss Reinsurance Company (Swiss Re) European windstorm loss model. New estimates of the risk of loss are compared with those from historical and stochastically generated windstorm fields used by Swiss Re. The resulting loss-frequency relationship matches well with the two independently modeled estimates and clearly demonstrates the added value by using alternative data and methods, as proposed in this study, to estimate the RP of high RP losses.

## 1. Introduction

Extreme winds associated with midlatitude cyclones represent a major loss potential for reinsurance companies (Swiss Re 2000). To quantify this risk many insurance companies use a combination of historical analyses of windstorms and stochastic and/or dynamical models to generate hypothetical windstorms. Generally, all data sources, whether they are based on historical observations or dynamical models, have various advantages and disadvantages. Historical datasets are relatively short, have coarse spatiotemporal resolution, or suffer from inhomogeneities, whereas dynamical models have biases associated with their physics and parameterizations and their use is often a trade-off between spatiotemporal resolution and integration length.

Reinsurance companies are especially interested in the frequency (or return period) of very rare extreme wind events on continental and synoptic scales and their associated losses. The primary influence on the accuracy of the estimation of the frequency of rare events is the number of observations of such events. Della-Marta et al. (2009) used the 40-yr European Centre for Medium-Range Weather Forecasting (ECMWF) reanalysis (ERA-40) to estimate the return period (RP) of extreme winds at continental and regional scales. They found that the uncertainties associated with long RPs (approximately 30 yr) are in the range between −60% and +200% of the RP estimate. This places limitations on the use of these data for reinsurance risk analysis applications.

The main aim of this study is to reduce the uncertainty associated with estimating the RP of very rare windstorm events. Previous studies investigating the frequency of extreme winds or extreme cyclones in the Atlantic–European domain use a number of different data and methodologies depending on the aim of the study (see Della-Marta et al. 2009). Generally all such observation-based studies are limited by the length and/or resolution of the data used. Improvements have been made using statistical resampling techniques (e.g., Dukes and Palutikof 1995; Palutikof et al. 1999), allowing more accurate inferences about the nature of the parent distribution. These studies suffer from incomplete knowledge of the dynamics that lead to such extremes. Alternatively, long ensemble integrations of dynamical coupled atmosphere–ocean general circulation models (AOGCMs) can be used as a dynamical sample to estimate the statistics of extreme events. Applications include the estimation of extreme synoptic-scale winds (van den Brink et al. 2004a), coupling of extreme atmospheric and ocean conditions with hydrological models (van den Brink et al. 2004b, 2005), and the statistics of extreme temperatures (Vannitsem 2007). Broadly speaking, these types of data are referred to as seasonal-to-decadal (s2d) data [see Schwierz et al. (2006) for an overview].

The second major aim of this study is to investigate the feasibility of coupling s2d windstorm climatologies with the propriety loss model of the Swiss Reinsurance Company (Swiss Re) to gain a better understanding of high RP losses to a Swiss Re portfolio. In this respect, this study is similar to Schwierz et al. (2009), who investigated windstorm-related losses in expected future climate scenarios. Other studies that look at current and future extreme wind-related damage are Klawa and Ulbrich (2003), Heneka et al. (2006), Leckebusch et al. (2007), and Pinto et al. (2007). However, these studies are either regionally focused or use limited sample size datasets or reanalyses as the reference current windstorm climate.

It is perhaps worth briefly explaining why we are interested in using dynamical models (e.g., s2d) for the estimation of extreme value statistics. Observations of the climate system represent a sample of its dynamics whose elements interact on different time scales. The fact that some of these time scales exceed the length of our observations (e.g., ocean processes) suggests that observation-based extreme value statistics probably do not sample the entire range of possible values. While statistical resampling helps to overcome the observation length limitations it cannot reproduce properties of dynamical systems such as bifurcations, fixed points, etc. A key assumption of classical extreme value theory is that the distribution function is continuous and differentiable (Gumbel 1958). Balakrishnan et al. (1995) show that this is not valid for simple dynamical systems, which display deterministic chaos. Moreover, Vannitsem (2007) shows that the statistics of temperature extremes in a simplified general circulation model (GCM) only weakly converge (as a function of integration time) to a stable extreme value distribution.

We follow a similar approach to van den Brink et al. (2004b, 2005) in this study by using AOGCMs that are run in seasonal forecast mode (Anderson et al. 2003; Palmer et al. 2004; Anderson 2008). These models are initialized with perturbed initial conditions based on observations at the beginning of each season and then integrated freely with boundary conditions thereafter (such as anthropogenic forcings; Liniger et al. 2007). At the point in forecast lead time where the skill of forecasts is close to zero the ensemble members can be treated as independent samples of the model climate system (i.e., ergodic samples). A possible drawback of using such models to estimate the current (and future) climate is that the results are likely to be model dependent because of many model uncertainties (see, e.g., Kharin and Zwiers 2000; Schwierz et al. 2006). These uncertainties need to be addressed when we use the Swiss Re loss model, which relies on observations for its calibration.

Below we elaborate the specific aims of this study:

to quantify the predictability of storminess over the western European domain in the current generation of s2d climate forecasts from the ECMWF;

to assess the usefulness of s2d data as a surrogate (pseudo) climate to estimate the frequency and intensity of windstorms over Europe and to quantify the amount to which ERA-40 may under- or overestimate the severity and frequency of windstorms;

to apply the new storm climate to a user application model, namely, the Swiss Re windstorm loss model, to compare the loss-frequency estimates based on the Swiss Re and s2d windstorm climatologies.

We start with an overview of the data and models used in this study and the calculation of extreme wind indices (EWI) that will form the basis of estimating the windstorm climate. The following sections describe the methodology and the main results, including an intercomparison of the extreme wind climatologies and modeled loss estimates, followed by some discussion and conclusions.

## 2. Data and model description

In this section we describe the datasets used in this study and the necessary background details about the Swiss Re loss model.

### a. S2d model data and ERA-40 reanalysis data

In this study we use the ECMWF seasonal climate forecasts, which are also known as s2d forecasts (e.g., Anderson 2008), and ERA-40 (Uppala et al. 2005). The 10-m-level wind gust speed is a useful parameter for assessing wind extremes at the surface that are responsible for damage to infrastructure and the loss of human life. A previous study by Della-Marta et al. (2009) demonstrated that the wind gust parameterization in ERA-40 produces unrealistic wind gust estimates in the areas of either steep orographic gradients or coastal regions of Europe. The same parameterization has been used in earlier versions of the ECMWF s2d forecasts. The latest version of the ECMWF s2d forecast model (system 3; ECMWF 2007; Anderson et al. 2007) has an improved wind gust parameterization and therefore could be used in this study; however, to maximize the use and transferability of analysis methods, we use geostrophic wind speed (GW) at the 850-hPa level. By using GW we lose both information about real surface winds affected by, for example, surface roughness, and possible intensification associated with mesoscale frontal features. On the other hand we gain a wind climatology that is independent of the various parameterizations and one that focuses model differences on those associated with large-scale circulation. The latter represents an advantage for model intercomparisons.

We use three different sets of s2d hindcasts from the ECMWF (an overview is given in Table 1). The latest s2d model is known as system 3 (SYS3) and is based on the physics documented in ECMWF (2007). Other datasets are system 2 (SYS2) and the ECMWF model used in the Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER) project and the first phase of the European Union Sixth Framework Programme (EU FP6) Ensemble-Based Predictions of Climate Changes and their Impacts (ENSEMBLES) project (EC-DEM; Palmer et al. 2004) both based on similar physics (ECMWF 2003). GW is calculated from the model geopotential height (GPH) at 850 hPa (Holton 2004). Our analysis concentrates on the extended winter season, October–April, and we focus our results on the eastern North Atlantic and western Europe (35°–73°N, 15°W–30°E) domain. Figures 1a,b present the domain and example wind fields as represented in the Swiss Re dataset and ERA-40 GW data, respectively. Note that the October–April season of s2d data is formed using a composite of data from two different forecast initialization dates. Rationale for this is found in the predictability assessment presented in section 3a.

### b. Extreme wind indices

Della-Marta et al. (2009) used a number of different scalar indices (time series, known as EWI) to summarize a windstorm’s magnitude and spatial extent over a given domain. Three of the five indices used in the study (Mean, Q95, and Sw3q90) are based on absolute measures of the wind speed, whereas two (Sfq95 and Sfq95q99) are based on wind speeds relative to local climatology (see appendix B for their exact definition). They found that the EWIs—Mean, Sfq95, and Sfq95q99—result in similar RP estimates for individual storms in ERA-40 and recommend that these indices be used when more weight to local winds relative to their climatology is needed and to find storms affecting regions not subject to regular extreme wind events. The EWIs Sw3q90 and Q95 result in similar RP estimates for individual storms. These indices should be used when more weight to the absolute magnitude of a windstorm is needed, regardless of the local wind climatology. Throughout the remainder of the paper we will concentrate on presenting results of the Sw3q90 index, which is the sum of wind speed cubed above the extended winter season 90th percentile, and the Sfq95q99 index, which is the sum of the fraction of extreme wind divided by the length of the distribution tail at each grid point. See appendix B for a complete specification of these indices and Della-Marta et al. (2009) for more details, including an intercomparison of the RPs of historical storms based on these indices.

### c. Swiss Re windstorm data

Swiss Re bases its estimate of European windstorm risk on a dataset consisting of around 150 historical windstorm fields from 1947 to the present (SR Hist) and a derived, stochastically generated windstorm field dataset (SR Stoch). Each field represents the maximum wind gust at 10-m level over the lifetime of the storm. An example is shown in Fig. 1b. The creation of a historical wind field is a multistep process starting with the digitization of surface air pressure and the position of frontal features from weather charts. Quasigeostrophic principles are used to create a first guess of the sustained wind field. Frontal gust correction factors are then calculated based on the linear relationship between the quasigeostrophic wind speeds and measured gust wind speed at the positions of the digitized fronts. Then a third and final stage is the correction of the wind field by adding a bilinearly interpolated grid of observed 10-m wind gusts values in the regions where the storm had its highest impacts. The SR Hist dataset has undergone many iterations to improve its quality, from specific corrections in Alpine regions to country-specific validation/corrections with in situ wind measurements. The process of developing such a dataset is labor intensive and has been focused on mapping wind fields that have affected Swiss Re’s business. For this reason, the dataset represents a subset of the extreme wind climate over Europe and is not necessarily a homogeneous sample of the storminess over the larger domain considered here (see Della-Marta et al. 2009 for more discussion). To overcome these shortcomings Swiss Re has created a stochastically generated windstorm dataset, using the historical fields as a basis. Each historical windstorm field has been resampled by perturbing its intensity and position within physically constrained boundaries. The position of each historical storm is perturbed either by a rigid motion or a rotation with the bounds of ±50–100 km and ±5° (along the major axis of storm movement), respectively. Each position perturbation is considered equally likely as the original historical wind field on which it is based; however, the wind intensity of the historical wind fields are altered subject to the constraint of the observed frequency–intensity relationship derived from a storm severity index (Lamb and Fydendahl 1991; similar the Sw3q90 EWI). In other words, each intensity perturbed storm is assigned a frequency depending on its observed frequency at a given intensity. Frequencies for the most extreme perturbed windstorms, where there are few observed storm severity index values, are assigned by using an extreme value distribution fitted to the observed index data. This ensures consistency in the frequency–intensity in SR Hist and SR Stoch datasets. The SR Stoch dataset represents around 3000 years of synthetic windstorm climate and contains 10 010 events. See Schwierz et al. (2009) for more details.

### d. Swiss Re loss model

The proprietary loss model of the Swiss Re is based on four modules (Zimmerli 2003; Schwierz et al. 2009): (i) the hazard module is essentially the SR Hist and SR Stoch datasets described above (Fig. 1b). (ii) The vulnerability module models a measure of damage (loss) as a function of the intensity of an event. A vulnerability curve (see Fig. 2a) is derived from loss data or based on engineering considerations for each risk type (e.g., residential, commercial, agricultural). (iii) The value distribution describes the amount (total sum insured), location, and risk type of the insured values (see Fig. 2b). (iv) The insurance conditions comprise all the information on the specific details of particular contracts and determine the proportion of the loss that is insured. In this study (as also used in Schwierz et al. 2009) a slightly simplified version of the operational Swiss Re model (called catXos) is employed. It uses an average vulnerability curve for all risk types and locations (Fig. 2a). The western European property insurance portfolio (Fig. 2b) is chosen to be representative of the western European market.

## 3. Methodology

In the following section we describe the most important details of methods used throughout the paper. It starts with the methodology associated with and the results of assessing the prediction skill of seasonal western European storminess using s2d models. It continues with an overview of the storm selection process and extreme value analysis (EVA) techniques, followed by methods of calibration applied to storm fields or an EWI. All analyses of uncertainty are presented as a 95% confidence interval (CI) unless otherwise stated.

### a. Prediction skill for winter storminess

We start with investigating the large-scale bias of s2d models relative to ERA-40 for the mean extended winter season GPH at 850 hPa. Figure 3a shows the mean pattern of GPH in ERA-40 and Figs. 3b,c,d show the bias of the s2d models relative to ERA-40. SYS3 GPH differences display an enhanced westerly flow compared to ERA-40 (Fig. 3b), with a deeper Icelandic low and a stronger Azores high, whereas both SYS2 and EC-DEM display a general positive GPH bias over the North Atlantic region compared to ERA-40. Biases in trends are not considered here but can be assumed to be of minor relevance for GPH in the extratropics (Liniger et al. 2007).

The skill in predicting extended winter season storminess is assessed in terms of evaluating the Q95 index (appendix B) from s2d model data (specified in Table 1) against that from the ERA-40 data. Two skill metrics are used: (i) correlation skill score, which correlates the ERA-40 Q95 with the ensemble mean s2d Q95 for the months November–April (May) [NDJFMA(M)] and (ii) debiased ranked probability skill score (RPSSd; Weigel et al. 2007), which measures the probabilistic predictability of quantile forecasts (here we use terciles). The skill metrics are computed from monthly mean Q95 values. Sampling uncertainty is assessed using a bootstrap with replacement 1000 times.

Figure 4 shows the results of each skill metric applied to each s2d model. A notable feature is significant positive skill for the first month of each of the models’ forecasts (except EC-DEM RPSSd; Fig. 4f). Using either skill metric shows that the predictability of the first month of the EWI increases with model development. All models show little or no skill after November. Generally the widths of bootstrapped CIs are smaller for EC-DEM skill metrics than the other models because of the larger sample (Table 1, fourth column). If both the correlation skill and the RPSSd skill are not significantly different from zero, there is no predictive skill and the ensemble members are statistically independent (Weigel et al. 2008b). Therefore it is justified that these data are used as a pseudoclimate once the first month of each forecast is removed (see Table 1).

### b. Windstorm selection and extreme value analysis

The EWI time series are the basis of our storm selection procedure. From these series it is possible to identify and quantify the intensity of many of the well-documented windstorms that have affected this area over the ERA-40 period (Della-Marta et al. 2009). We use the peaks over threshold approach (e.g., Davison and Smith 1990). Their distribution is modeled with a generalized Pareto distribution (GPD) whose parameters are fitted using either maximum likelihood (ML) or L-moments (LM; Hosking 1990), and we assume a Poisson occurrence for the events exceeding the threshold (see also Palutikof et al. 1999 and Della-Marta et al. 2009). The independent peaks over threshold series is obtained using a standard runs declustering (Coles 2001; Stephenson and Gilleland 2006) where the minimum separation between exceedances of the threshold is set to 3 days. Threshold selection and model evaluation follow the typical graphical diagnostic methods outlined in Coles (2001) and an objective goodness-of-fit (GOF) test based on the Anderson–Darling statistic detailed in Choulakian and Stephens (2001). The declustered peaks over threshold series (DPOT) of each EWI represent the occurrence and intensity of windstorms over the western European domain.

To introduce notation that will be used throughout the paper we rewrite the relevant formulae from Coles (2001) and Palutikof et al. (1999). The GPD can be written in terms of a generic variable *w* as

conditional on *w* > *u* and *ξ* ≠ 0 where *u* is the selected threshold. The GPD is characterized by two parameters: *ξ* the shape parameter and *σ* the scale parameter. If *ξ* > 0 then the maximum of the GPD is unbounded, whereas if *ξ* < 0 the tail has a finite extent, if *ξ* = 0 then the GPD reduces to the exponential distribution and is unbounded in the limit *ξ* → 0.

Equation (1) can be rewritten in terms of probabilities, which leads to the calculation of the *N*-year return level (RL) *w _{N}*, which is exceeded on average once every

*N*years (the RP) and is given by

where *λ* is the mean number of threshold exceedances per extended winter season. When the ML method is used we estimate the GPD fit uncertainty using the profile log-likelihood method recommended by Coles (2001) where accurate information about the uncertainty of derived parameters (such as the RP) are needed. Where the LM method is used we adopt a parametric resampling-based approach to estimating the uncertainty (Frei et al. 2006; Della-Marta et al. 2009). The declustering method behaves differently for different temporal resolutions of the EWI time series. For example, the EC-DEM dataset has a temporal resolution 24 h, whereas ERA-40 has a time resolution of 6 h. Since the difference in time resolution is of the same order as a typical windstorm development and translation time across the domain, it is probable that a certain amount of aliasing of the windstorm signal is present in the EC-DEM EWI relative to ERA-40 EWI (see Della-Marta et al. 2009). To compare windstorm climatologies the time resolution of ERA-40 EWI is degraded to match that of the s2d dataset with which we compare it (Table 1); we refer to this as aliasing.

### c. Calibration

There are two reasons why we use calibration techniques in this study: 1) each s2d model has circulation biases with respect to ERA-40 (Fig. 3) that are also present in s2d extreme wind climatologies (after calibration the s2d extreme wind climatologies are referred to as C_s2d_to_ERA); 2) to derive meaningful loss estimates comparable to real market losses. CatXos is calibrated for use with SR Hist, therefore ERA-40 and s2d windstorm fields must be calibrated to this dataset (referred to as C_ERA_to_SR, and C_s2d_to_SR, respectively).

We apply the techniques to data of the form *w*(*t*), which represents the scalar extreme wind index, for example, Q95(*t*) (C_s2d_to_ERA), or data in the form of a wind field, *w*(*x*, *y*, *t _{s}*) = max{

*w*(

*x*,

*y*,

*t*):

*t*∈ {

*t*, … ,

_{s}*t*}}, where

_{e}*t*and

_{s}*t*denote the start and end times of the windstorm, respectively. Note that C_s2d_to_ERA, C_ERA_to_SR, and C_s2d_to_SR are performed on a wind field, where all grid points’ wind speeds are pooled together. C_s2d_to_ERA is also applied to EWI and uses all values of the time series. For each of the calibration steps, C_s2d_to_ERA, C_ERA_to_SR, and C_s2d_to_SR, there are three different methods used to perform the calibration. The percentile-based calibration approach (PERC), whether it is applied to EWI or wind fields, is given by the general form

_{e}where *p* are probabilities at which the quantile functions are evaluated and *S*(·) represents a cubic spline function. In the case of calibrating wind fields, the quantile function is calculated empirically from *F*^{−1}(*p*) = min{*w*: *p* ≤ *F*_{*}(*w*)}, where *F*_{*}(*w*) is the (latitude weighted) cumulative distribution function where *p* ∈ {0.01, 0.02, … , 0.99, 0.991, 0.992, … , 0.999}. In the case of calibrating EWI each quantile function is approximated using a piecewise approach. The quantile function for index values below the chosen GPD threshold is calculated empirically, using a sequence of *p* ∈ {0.01, 0.02, … , *u*}, whereas the quantile function above the threshold is taken from fitting a GPD of the form in Eq. (1) to the DPOT EWI values using the same methodology given in section 3b. Essentially, the distribution functions of each data source are matched (Della-Marta and Wanner 2006); however, the added benefit is that we use an extreme value distribution to model the extremely high EWI values, reducing the possibility of sampling errors and allowing the estimation of extreme quantiles for which there exist few or no observations. Therefore this approach also removes the possibility for a different windstorm climate (different from ERA-40) to be obtained.

The other two calibration methods are special cases of the PERC method detailed above and adjust all data based on the difference in one quantile in the distribution. For example, choosing *p* in Eq. (3) with a single number such as 0.95 calibrates the differences in the 95th percentile from each set dataset (95PERC). The *p* could also represent the probability which corresponds to the mean of the data (MEAN). Such approaches allow varying magnitude differences in the climatologies to be maintained.

#### 1) Calibration sampling error estimation

Each of the calibration curves [Eq. (3)] is subject to sampling error. We assess the uncertainty using a bootstrap technique (Efron and Gong 1983) combined with kernel smoothers (to improve efficiency). For C_s2d_to_SR the technique consists of resampling with replacement the windstorm fields *w*(*x*, *y*, *t _{s}*) 200 times. All 200 storm fields are combined to a single vector before the probability density function is estimated using a kernel smoother. We use a Gaussian kernel (R Development Core Team 2005), which gives good representations of the densities without over smoothing (not shown).

### d. Sampling error estimation

Given that the ERA-40 dataset provides only 45 seasons compared with around 300 seasons from the s2d data (Table 1, column 6), we test to what extent the uncertainty in the ERA-40 storm RP estimates is due to sampling uncertainty using two different approaches. The first approach calculates the rate of convergence of GPD parameters as a function of the number of seasons used to fit the GPD (similar to Vannitsem 2007). For example, a random ordered sample of the total number of seasons from the DPOT series is taken (approximating chaotic interannual variability). Then a GPD is fit successively to a gradually increasing number of seasons. This is repeated 1000 times (resampling with replacement) to visualize the rate of convergence to the s2d climatology. We repeat this general procedure using pseudorandom numbers conditioned by the GPD parameters and the threshold exceedance frequency *λ* fitted to each EWI. The random draws from the fitted EWI GPD represent scalar metrics of windstorm fields and allow us to test hypotheses concerning possible biases in the ML or LM method of fitting the GPD. We compare the magnitude of possible biases in parameters to the magnitude of the interquartile range (IQR) according to *R* = 100.0||bias||/IQR to determine their relative importance to sampling variability.

The second approach focuses on 45-season samples and two different types of uncertainty: (i) sampling uncertainty, assessed by taking random 45-season samples of the DPOT series from s2d and looking at the range of fitted GPDs, and (ii) fit uncertainty, the uncertainty associated with fitting the GPD to each of the 45-season samples. These two uncertainties are not mutually exclusive, however; the sampling uncertainty can be thought of as dynamical uncertainty associated with having only 45 seasons of data, whereas the fit uncertainty can be thought of as the derived statistical uncertainty associated with having only 45 seasons of data. The sampling uncertainty calculation consists of taking 1000 different sets of 45 seasons of the DPOT series (resampling with replacement). The fit uncertainty calculation uses the same samples as above; however, we fit a GPD to each and calculate the median of the differences between the fit and the upper and lower bounds of the CI. The median fit uncertainty series are then added back to the best-fit ERA-40 GPD.

### e. Bias correction of the estimated climatologies

Both ML and LM are known to result in biased parameter estimates when used on limited samples and for various ranges of possible parameter settings; generally the biases being highest in magnitude for small samples (e.g., Cox and Snell 1968; Pandey et al. 2003). These biases can be corrected using resampling and bootstrap techniques (e.g., Zhang and Stephens 2009; Pandey et al. 2003) or directly using analytical techniques (Cox and Snell 1968). Using a general method to correct statistical model parameter biases (Cox and Snell 1968), Giles and Feng (2009) derive a first-order analytical bias correction formula using the log-likelihood equation of the GPD and show that the analytical approach is more efficient than a bootstrap approach. However, at the time of writing their correction scheme was not entirely applicable to our ML-fitted GPD parameters since many of our GPDs have a negative shape parameter where their scheme does not work efficiently. Therefore we have simply corrected the shape and scale parameters by bias estimated using the dynamical–statistical resampling technique.

## 4. Results

The section starts with a summary of the storm selection process and extreme wind climatologies, followed by the main results, the comparison of estimates of the European windstorm climate, and the associated losses to a Swiss Re portfolio.

### a. Windstorm identification, extreme value analysis, and climatology

Figure 5 summarizes the storm selection process. The first step 1) calculate the EWI series, 2) decluster the peaks over threshold series (Fig. 5a), 3) fit the GPD (Fig. 5b), and 4) assess the quality of the fit using various diagnostics such as a quantile–quantile (qq) plot (Fig. 5c) combined with objective GOF tests (Fig. 5d). Figure 5a shows the effectiveness of the Sw3q90 in measuring the difference in magnitude of extreme winds and the effectiveness of the declustering method in identifying major windstorm events. Well-known storms such as Daria, which occurred on 24 January 1990, are identified as a strong peak in the Sw3q90 series (seventh gray triangle from the left of Fig. 5a), whereas the storms Vivian and Wiebke, which were separated by only a short amount of time (25 and 27 February 1990, respectively) are identified as one cluster maxima in the DPOT series. The latter is a limitation of the scalar index approach to separate storms that occur close in time; however, the statistics of GPD fit do not seem to be overly affected as discussed in Della-Marta et al. (2009). The GPD fit (Fig. 5b) to ERA-40 Sw3q90 is bounded, that is, a negative shape parameter. Other indices generally display less negative shape parameters; however, all shape parameters are less than zero and remain so for all tested thresholds (see below). The qq plot (Fig. 5c) indicates that the GPD fit is reasonable apart from a slight departure from the GPD distribution in the middle tail. The GOF test (Choulakian and Stephens 2001) suggests that the chosen GPD threshold (97.5% quantile) results in a good GPD fit (Fig. 5d, solid black line). According to Choulakian and Stephens (2001), *p* values greater then 0.1 indicate a sufficient GOF to the GPD distribution. For comparison, GOF results for the SYS3 Sw3q90 climatology are shown as a dashed line in Fig. 5d. The 97.5% threshold was also chosen for this data–index combination because of the jump in GOF at the 97.5% quantile. Not all data–index combinations resulted in the same threshold being chosen for each dataset (see Table 2), and sometimes the GOF statistic did not increase in a general monotonic way with increasing threshold, which would be expected if the index data were asymptotically Pareto distributed. We tested the sensitivity of our results for each data, EWI, fitting method, and threshold combination (using the 95%, 96%, 97%, 97.5%, 98%, 99%, and 99.5% quantiles). The LM method results in generally higher GOF *p* values than the ML method, indicating a better GOF using LM. Sfq95q99 scores consistently high GOF *p* values across all thresholds and all datasets, indicating that the normalizing effect of this index makes it a stable choice for comparisons between datasets. The three magnitude-based indices—Mean, Q95, and Sw3q90—all display GOF at various thresholds (Table 2), which are generally higher for SYS2 and EC-DEM, Sw3q90 being the exception where the threshold for all datasets was set close to the 97.5% quantile. The index Sfq95 displays high GOF fitted to ERA-40 but low GOF to SYS2 and SYS3. Depending on the EWI, dataset, and threshold, the storm selection method finds between 263 and 380 windstorms in ERA-40 and between 367 and 2363 storms in the s2d data (Table 2).

A comparison of the ERA-40 windstorm climatology with the s2d windstorm climatologies reveals large differences in the EWI absolute values (Fig. 6) and their corresponding model parameter values (Tables 2 and 3) for the Mean, Q95, and Sw3q90 EWI. The climatologies using EWI based on local wind climatologies (Sfq95 and Sfq95q99) are quite similar in magnitude because of the normalizing effect inherit in these indices. Differences in the mean circulation are reflected in the overall differences between the climatologies (see Fig. 3), with SYS3 having higher intensity values than ERA-40 in all intensity-based EWI. SYS2 and EC-DEM are lower in wind intensity than ERA-40 but are similar to each other, which is expected since these models have similar physics and resolution (Figs. 6a–f). The widths of the CIs for each of the s2d models are much smaller than those of ERA-40, and there is clearly a less negative shape parameter in the climatologies of the s2d models using intensity-based EWI relative to ERA-40 (Table 3). As stated above, the LM GPD fits have higher GOF. The GOF test is more sensitive to lack of fit in the extreme tail of the GPD than it is to the lower tail (Choulakian and Stephens 2001). The high GOF with LM is somewhat contrary to the observed quality of the fits in the extreme tails. Comparing Figs. 6b,d,f with Figs. 6a,c,d, the LM-fitted GPD does not seem to fit the data as well as it does using the ML method in the range of approximately 20–300 yr. The ML fits seems to take more of a “middle path” through the observations, whereas the LM fits seem to be conditioned more by the lower RP EWI observations. It could be that the sensitivity of the GOF test diminishes with a high number of samples, such as used here.

Other differences (Fig. 6) lie in the position along the *x* axis of the first DPOT (windstorms, colored dots). The varying position indicates 1) that in some cases a different threshold was chosen (based on the GOF test; see Table 2) or 2) that the mean number of storm occurrences per season *λ* is different for each dataset (Table 2). Histograms in Fig. 7 show the empirical and theoretical distribution of the number of windstorms identified in each season using Sw3q90 choosing a 97.5% quantile threshold for all datasets. All distributions show some underdispersion. This has implications for the clustering (in time) of storm events (Mailier et al. 2006).

The reasons for the systematic differences in both the atmospheric circulation at 850 hPa and extreme wind climatologies are not fully known, but certainly some of the differences are due to model resolution (Jung et al. 2006). SYS2 and EC-DEM have a horizontal resolution of T95, whereas SYS3 and ERA-40 have a resolution of T159; this may explain why both ERA-40 and SYS3 have higher wind speeds (cf. Fig. 6). Model physics are also likely to play a role in the differences shown here; SYS2, EC-DEM, and ERA-40 all have similar physics (ECMWF 2003), whereas SYS3 has substantially revised physics (ECMWF 2007). Overall, these results indicate that a direct comparison of windstorm climatologies is not possible and that a calibration technique must be used.

### b. Estimates of windstorm return periods

#### 1) Calibration of s2d and ERA-40 windstorm climatologies

The three different calibration techniques, PERC, 95PERC, and MEAN, were applied to calibrate each s2d EWI series to the corresponding ERA-40 climatology, obtaining C_s2d_to_ERA. Figure 8 shows the calibration curves using the PERC method and the Sw3q90 series. The PERC curve for SYS3 Sw3q90 suggests increasingly negative adjustments to SYS3 with increasing Sw3q90 magnitude to be compatible with ERA-40 (0–80 nondimensional units). Both SYS2 and EC-DEM display mostly positive adjustments (2–5 nondimensional units) over most of the distribution; however, at higher percentiles (above the 95% quantile) the adjustments change quickly from being positive to negative. This occurs since the EWI climatologies of both SYS2 and EC-DEM cross ERA-40 at around 145 nondimensional units (Fig. 6e). Note in Figs. 8c,f there are minor discontinuities at the point where the calibration technique changes from the empirical method to the GPD method. While this is not desirable, it does not affect the subsequent analysis since the storm climatologies are based on the extreme tail of the distribution.

Applying the calibration curves and then going through the storm selection and GPD fitting procedure again results in calibrated climatologies. Figure 9 contrasts the various EVA climatologies of Sw3q90. Figures 9a,c,e compare SYS3 and SYS2 with 12-hourly aliased ERA-40 for each of the calibration techniques: PERC, 95PERC, and MEAN, respectively. Figures 9b,d,f compare EC-DEM with 24-hourly aliased ERA-40 for each of the calibration techniques, as for Figs. 9a,c,e, respectively. The PERC technique results in SYS2 and SYS3 and EC-DEM windstorm climatologies being almost exactly the same (Figs. 9a,b) as the aliased ERA-40 climate (see Table 4). The 24-hourly aliased ERA-40 climate (Fig. 9b) is very different from the 12-hourly aliased ERA-40 climate (Fig. 9a). The 24-hourly aliasing of ERA-40 has a high impact on the shape of the fit, resulting in a much shorter tail (Fig. 9b, black line, compared to Fig. 9a, black line). The aliasing of ERA-40 to 24-hourly resolution has resulted in similar windstorms being selected, that is, major storms such as Daria are identified; however, the maximum intensity of each storm in the aliased series is sometimes lower. The 95PERC calibration allows greater differences in the windstorm climatologies to remain. Both SYS3 and SYS2 (Fig. 9c) indicate that the RP of storms in ERA-40 are underestimated. The best-fit SYS2 curves lies within the CI of ERA-40 12-hourly aliased series, whereas the SYS3 estimate lies outside of the CI. Again we see the effects of the aliasing and calibration technique in Fig. 9d. Given these results we will not consider climatologies based on EC-DEM any further in this study. Calibrating using the MEAN technique results in an even more extreme windstorm climate based on SYS3 and SYS2 (Figs. 9e,f) than for the 95PERC calibration compared to ERA-40, and it is likely that this type of calibration does not result in a meaningful comparison.

#### 2) Sampling and fit uncertainty estimates

We chose to investigate SYS3 PERC-calibrated results in more detail since the model resolution is the same as ERA-40 [shown in Jung et al. (2006) to have a high impact on extratropical cyclone numbers and in Weisheimer et al. (2003) to have an influence on large-scale circulation modes]. However, we could have equally decided to use SYS2 results since SYS2 and ERA-40 have similar physics and SYS3 physics display different biases to SYS2 and EC-DEM (Fig. 3).

In Fig. 9a the windstorm climatology based on SYS3 is almost exactly the same as ERA-40. Effectively the PERC method has removed any differences between the SYS3 EWI climatologies and the ERA-40 climatology. Therefore, at this stage we are no closer to assessing whether the ERA-40 climate is close to the true climate; that is, we cannot answer the question whether the ERA-40 is a representative, unbiased sample of the true winter windstorm climate. One way to help answer this question is to utilize the longer s2d data by performing sampling experiments. Figures 10a,b show the bias and rate of convergence of the scale and shape parameter of the ML-fitted Sw3q90 SYS3 GPD as a function of the number of seasons (of DPOT data) used in the fit. Neither the scale nor the shape parameter has converged for a 45-season climatology, and they are different from the long-term parameter estimates. The magnitude of the biases represent approximately +16% and −22% of the IQR of samples taken using 45 season lengths for the scale and shape parameters, respectively. Approximately 65% of the shape and scale parameters lie below and above the long-term (316 seasons) SYS3 climatology, respectively (similar results are found for SYS2). A more negative shape parameter combined with a higher scale parameter than the converged values implies a shorter-tailed climatology when only 45 seasons are used. Figures 10c–f reveal that there is also a lack of convergence of the RPs [Eqs. (1) and (2)] for 45-season samples. For example, Figs. 10c–e all show that ERA-40 length datasets are likely to overestimate the RP of 30-, 45-, and 100-yr RP events by approximately 7, 11, and 21 yr, respectively. For RPs greater than approximately 300 yr ERA-40 length datasets tend to underestimate RP (Fig. 10f). Note, however, that the spread of possible RP estimates at 45-season lengths is large (IQR approximately 30–120 yr) compared to the apparent bias of approximately 11 yr. Therefore the issue of bias for 45-season length storm climatologies is secondary in importance to sampling variability. Figure 11 shows the same analyses as in Fig. 10 but using LM method of fitting the GPD. The LM-fitted scale and shape parameters are less obviously biased (Figs. 11a,b) than their ML counterparts. The small biases in these parameters result in small negative biases in the 30- and 45-yr RP around 1.1 and 2.1 yr, respectively. At higher RPs the negative biases become proportionally larger in magnitude than for the lower RPs, with the biases at 100- and 500-yr RP being 14 and 220 yr, respectively. Investigation of these biases using random storms simulated from the fitted ML and LM GPDs are shown in Figs. 12 and 13. These figures are very similar to Figs. 10 and 11, respectively. Almost all of the biases we see in the scale, shape, and RP estimates are due entirely to biases inherit to the ML and LM methods of fitting the GPD. Similar calculations reveal that these biases are common to all other EWI, although the magnitude of the RP biases using ML with the index Sfq95q99 are much lower than the RP biases in the magnitude-based EWI. This seems to be a function of the magnitude of the shape parameter: the more negative the shape parameter is, the larger is the ML bias (Zhang and Stephens 2009).

Assuming that the bias can be explained by the GPD fitting method we have applied a bias correction to the ERA-40 estimated *ξ* and *σ* of the form (*ξ̃*, *σ̃*)′ = (*ξ*, *σ*)′ − *B*′, where *B* = Bias(*ξ*, *σ*) and is measured from the difference between the converged SYS3 EWI GPD parameter value and the median SYS3 EWI GPD parameter value at 45 seasons (e.g., shown as text in Figs. 10a,b and Figs. 11a,b). Applying the bias correction we obtain new estimates of the European windstorm climate as defined by each EWI and each fitting method presented in Table 5. All ML-fitted EWI windstorms have lower RPs after bias correction (Table 5, column 6). The magnitude of the bias in RP is greatest for the Mean EWI with a 50-yr event bias corrected becoming a 37.5-yr event, whereas the magnitude of the bias in RP is lowest for the Sfq95q99 index showing only a small reduction in a 50-yr event to a 47.6-yr event. According to the results of Zhang and Stephens (2009) the magnitude of the bias correction to the shape parameter in their experiments increases for decreasing values of *ξ*. This may explain the results since the shape parameter of the Sfq95q99 is less negative (−0.03) than, say, the shape parameter of Sw3q90 (−0.28). All LM-fitted EWI windstorms have a higher RP after bias correction (Table 5, column 12) except the Sw3q90 index, which results in slightly negative corrections. As for the ML method the Mean index displays the largest change in RP: a 50-yr event becomes a 55.5-yr event after LM bias correction. All other LM RP differences are small. This indicates that the use of LM results in less biased RP estimates of the European windstorm climate. Possible reasons for Sw3q90 displaying slightly negative RP adjustments using the LM method (Table 5), when results presented in Figs. 11c–f indicate that positive RP adjustments may be needed, especially for higher RPs, is believed to be a function of the heteroscedasticity of the relationship between the dynamically resampled scale and shape parameters, which increases for more negative shape parameters. This has a nonlinear effect on the RP calculation and leads to the observed bias in Figs. 11c–f. This information is lost when only the median bias in the shape and scale parameters is used to correct the RPs.

Summarizing the results: if SYS3 windstorms are representative of the parent distribution (i.e., when calibrated) then dynamical–statistical sampling implies that ERA-40, by virtue of its length only and the ML method of fitting the GPD, is probably underestimating the severity (by overestimating the RPs of storms) of the current windstorm climate for windstorms with RPs between 10 and 300 yr and overestimating the severity of the windstorm climate (underestimating the RPs of storms) for storms with RPs longer than 300 yr. Whereas, if the ERA-40 storm climate is fitted using LM, the climatology is relatively well represented except at higher RPs where the method tends to overestimate the severity of the windstorm climate.

Figure 14 presents the sampling uncertainty and fit uncertainty (described in section 3d) of ERA-40 based on calibrated SYS3 for two EWI, Sw3q90 and Sfq95q99, contrasting the two parameter estimation methods. Focusing first on the gray shaded regions of Figs. 14a,c, representing the dynamical sampling uncertainty, we see that the ERA-40 and SYS3 windstorm climatologies overlap each other and lie within, but always in the upper part of, the 45%–55% percentile range of randomly sampled 45-season SYS3 GPDs (Figs. 14a,c; also shown in Fig. 10d). This result confirms that ML estimates of the windstorm climatology are on average biased. Interestingly, approximately 20% of the randomly sampled 45-season SYS3 GPDs (Figs. 14a,c, medium gray shading) are lower than the ERA-40 GPD fit lower CI (black dot–dashed line). This indicates that the ML fit uncertainty (profile log-likelihood method) of ERA-40 may be underestimating the possibility for less extreme wind climatology than what is possible by drawing a random 45-season sample from the SYS3 climatology. Similarly, the upper boundary of the gray shading is much lower than the ERA-40 GPD fit upper CI (black dot–dashed line). This indicates that the ERA-40 GPD fit uncertainty overestimates the width of the upper windstorm intensity uncertainty. In other words, ERA-40 ML fit uncertainties seem to underestimate the possibility for a less intense windstorm climate and seem to overestimate the possibility for a more intense windstorm climate (these results are also true based on SYS2 PERC-calibrated data, not shown). Another conclusion of these experiments is that the median SYS3 fit uncertainty, denoted by the yellow lines in Figs. 14a,c, indicates that the ERA-40 GPD fit uncertainty (black dot–dashed lines) is a good estimate of the typical CI widths associated with the GPDs fitted to the randomly sampled 45-season SYS3 climatologies; that is, the yellow lines more or less match the edge of the medium gray shaded regions. This signifies that, given the sample of the observed climate (i.e., ERA-40), the fit uncertainty estimates are not biased once the overall bias in the GPD fit is removed. All results described above remain valid for the other ML-fitted EWI for the range of thresholds higher than those presented in Table 2. In contrast to the ML-fitted climatologies, the 45-season LM-fitted climatologies of Sw3q90 and Sfq95q99 are not biased (Figs. 14b,d); that is, the ERA-40 and calibrated SYS3 climatologies fall in the middle of the middle 10% of 45-season sampled climatologies (light gray shaded region). However, the width of the 45-season sampled climatologies (gray shaded areas) cover a wider range of RP estimates than the ML estimates. The wider range of LM climatologies shows that the LM method is less accurate in determining the long-term windstorm climate given a 45-season sample than the ML bases climatologies, which have a lower range (Figs. 14a,c gray shaded regions compared with Figs. 14b,d shaded regions). The larger spread of LM-based climatologies is being driven by the wider range of fitted scale and shape parameters compared with ML (Figs. 10a,b compared with Figs. 11a,b). Therefore the use of ML results in more representative estimates of the long-term storm climate given 45 seasons than the LM-based climates after the ML bias has been removed. On the other hand, the LM-based climates are not biased but the spread of possible sampled climates is wider given 45 seasons of data. Similar conclusions can be drawn from the SYS3 climatology based on the 95PERC technique, yet the proposed windstorm climate is far more extreme than ERA-40, implying that the 95PERC has not adequately accounted for model differences. Therefore these 95PERC-based results (not shown) should only be seen as speculative.

The sampling experiment results give us details about how reliable the LM and ML methods are. Alternatively, the s2d climatologies can be used to simply quantify the reduction in GPD fit uncertainty gained from having a larger sample. The results are demonstrated in Fig. 15. Here the width of the ML return level (nondimensional units) and RP (yr) uncertainties (CIs) are plotted as functions of the RP estimated from ERA-40. S2d RL uncertainties are approximately one-third of the size of ERA-40 RL uncertainties (Fig. 15a). The upper bound of the return level CI is larger in magnitude than the lower bound of the return level CI, showing the asymmetry in the profile log-likelihood uncertainty estimates. Figure 15b shows the improvements in the width of the RP CI by using s2d data. For a given RP, say 20 yr, the s2d upper (lower) bound is approximately 16% (40%) of ERA-40 upper (lower) bound. Clearly, the greater length of the s2d climatologies results in markedly reduced CI widths.

### c. Estimates of windstorm-related loss

Windstorm fields identified in ERA-40 and s2d datasets are used as input to catXos. The first step in using a windstorm field is to calibrate it with the Swiss Re windstorm dataset (SR Hist). The calibration is described in section 3c as C_s2d_to_SR. The C_ERA_to_SR PERC adjustment curve (Fig. 16a, thick gray line) shows that ERA-40 GW needs to be adjusted negatively, which is expected since the GW is a measure of the free atmosphere wind. Generally the negative adjustment increases with GW and displays a rather unusual behavior between ERA-40 GW values of 10–30 m s^{−1}. The cause of this “bump” is related to the way in which the Swiss Re windstorm fields are created. Generally, only a limited number of pressure and wind gust speeds observations were available in the wind field reconstruction, usually concentrated in regions where the windstorm was known to have caused damage. These wind gust speeds are generally above 20 m s^{−1}. This has had the effect that the Swiss Re windstorm fields may be unreliable for wind gust speeds less than 20 m s^{−1} in regions that were not affected by the windstorm. While this has no effect on the actual loss values (see Fig. 2a), it exposes differences between the datasets. The C_ERA_to_SR adjustment uncertainty (Fig. 16a, thin vertical black lines) calculated using resampled kernel density estimates, shows some nonsymmetric behavior in the adjustment amounts especially at higher percentiles. Figures 16b–d show the C_s2d_to_ERA PERC adjustment curves for each of the s2d datasets. The calibration of the windstorm fields show similar adjustment curves to the calibration of the Sw3q90 (Fig. 8); however, all curves show a monotonic increasing (SYS3) or decreasing (SYS2) adjustment amount with increasing GW unlike their EWI equivalents in Figs. 8b,c,e,f for SYS2 and EC-DEM.

Figure 17 presents a comparison between loss estimates derived from s2d, ERA-40, and Swiss Re windstorm datasets. In presenting these results we wish to advise readers that the loss figures derived from ERA-40 and s2d are modeled loss estimates, which are critically dependent on methods chosen to compare the model climates and are not calibrated against loss data. In contrast, the Swiss Re loss estimates are based on and tested against independent loss data. Figure 17a compares the SR Hist and SR Stoch loss-frequency curve(s) (LFC) with those derived from ERA-40 windstorms. First, the C_ERA_to_SR method brings the raw LFC of ERA-40 (black solid and dashed lines in the top left of Fig. 17a) to lie between the Swiss Re historic and stochastic LFCs (green lines). The ERA-40 LFC based on 121 windstorms (labeled SR ERA-40 C_ERA_to_SR, the subset of storms in ERA-40 that match in time with the Swiss Re storms; black dashed line) is generally below the MeteoSwiss (MS) ERA-40 C_ERA_to_SR and could be due to some storms missing from SR Hist, in terms of their potential loss. It is noticeable that each of the ERA-40 LFCs and the Swiss Re historical LFC reach a point where the LFC is a horizontal straight line. LFCs produced by catXos have not been modeled and represent empirical loss-frequency estimates. The LFC curves remain horizontal at a RP equal to the length of the dataset. This is also the reason why the LFCs tend to display “jumps” at higher RPs because of sampling. Figure 17b is similar to Fig. 17a but shows the LFCs based on each of the s2d C_s2d_to_SR datasets. The combination of C_ERA_to_SR and C_s2d_to_ERA (C_S2d_to_SR) shows that the s2d LFCs are close to the ERA-40 LFC (black line) for RPs between 1 and 45 yr. Improved loss estimates are gained from the s2d datasets in the range of 45–300 yr and are comparable but lower than the SR Stoch results.

Figures 17c,d quantify the uncertainties; first in Fig. 17c by exploring the sampling errors in loss estimates due to a limited length of ERA-40 data. We use a similar resampling technique as described in section 3d (see Fig. 14). Taking 20 (a limited number because of computing and data storage requirements) random 45-season samples of windstorms from SYS3 to calculate their expected loss. The approximate spread of the LFCs is shown by gray shading in Fig. 17c and quantifies the sampling uncertainty in loss estimates due to using a dataset that is only 45 seasons in length. The width of the approximate confidence interval encompasses both the SR Hist and SR Stoch LFCs (green lines). This figure demonstrates the need to use alternative methods to estimate the RP of very extreme wind-related loss events, such as those currently used by Swiss Re (i.e., stochastic datasets) or the alternative that we propose using s2d models.

Figure 17d compares the spread in the LFCs by resampling the C_ERA_to_SR and C_s2d_to_ERA adjustment curves (Fig. 16) within their uncertainty estimates. Taking 20 random samples from the 200 resampled kernel density estimates of the SYS3 adjustment curve (described in section 3a). This results in the LFCs (gray shaded area, Fig. 17d), which also encompass the two Swiss Re LFCs. This figure demonstrates one of the caveats for using s2d data, namely, that there are high uncertainties associated with calibration procedure which should not be ignored. The approximate interquartile range of the LFCs, shown by the dark gray shading, is lower than the best-estimate SYS3 LFC (blue line). This is caused by the distribution of the calibration curve error (Fig. 16b) having a skewed distribution with higher densities at lower adjustment values. It is not clear whether the kernel estimators are biased, especially at such high percentiles.

## 5. Discussion

The choice of threshold used for the GPD can have a large influence on the windstorm climatology, especially if the data are not sampled from the extreme tail of the distribution (Coles 2001). We performed extensive experiments on the sensitivity of our results to the choice of threshold combined with qualitative and quantitative measures of the GOF the GPD has to the EWI data. We found that all conclusions remain valid for thresholds above those shown in Table 2 and up to the 99.5% quantile threshold. The magnitude of biases in ML- and LM-based climatologies varies in magnitude (nonmonotonically with increasing threshold); however, the sign of the biases remains the same as those shown in Table 5 and in Figs. 10 –13.

To use s2d climatologies to estimate wind-induced loss we applied two stages of calibration. The first to make ERA-40 comparable with SR Hist (C_ERA_to_SR) and the second to make s2d comparable with ERA-40 (C_s2d_to_ERA). Conceivably we could remove the need for C_ERA_to_SR by using ERA-40 to define the vulnerability function (cf. Fig. 2a). However, some ERA-40 parameters may not have the accuracy or the resolution for the purposes of modeling wind-related loss (e.g., ERA-40 wind gust; Della-Marta et al. 2009). There remains the need to apply C_s2d_to_ERA if meaningful loss estimates are to be produced.

Previous work on the impact of GCM resolution on cyclone frequencies (Jung et al. 2006) and interannual–decadal variability of large-scale circulation (Weisheimer et al. 2003) suggests that the statistics of extreme winds are likely to be affected; therefore we place more emphasis on SYS3 results and conclusions. There is strong evidence in Jung et al. (2006) that horizontal model resolution has a large impact on the number and intensity of cyclones simulated in the ECMWF atmospheric model. They note that the dynamical effect of changing horizontal resolution dominates over the truncation effect for intense cyclones, whereas the truncation effect dominates for shallow cyclones. They also suggest that the ECMWF atmospheric model, tuned to be particularly skillful in medium-range weather forecasting, may have defects when it comes to extended-range integrations at relatively low horizontal resolution. They base this argument on Bengtsson et al. (2006), who found that their GCM produced a realistic representation of extratropical cyclone characteristics, despite the relatively coarse resolution used (T63). All climate models—whether they be oceanic, atmospheric, or coupled models—display inconsistencies with observations. Sometimes these inconsistencies are in the climate of the model and/or its variability. Key components of the model may be parameterized, leading to unresolved scales of interaction within the model, leading to biases or misrepresentations. Unfortunately, given the complexity of dynamical models, it is often difficult to pinpoint the exact reason for a large-scale bias, for example, in circulation, as seen in Fig. 3 of this study. Readers are encouraged to reference Anderson et al. (2003, 2007) for a more complete analysis of ECMWF season forecast model biases and ECMWF (2003, 2007) for a full specification of the model physics.

The 850-hPa GW is suitable for the intercomparison of different s2d data since it is independent of surface boundary layer parameterization. It is available for basically any GCM run. However, the GW at 850 hPa is not fully representative of the wind gusts important for modeling of wind-related loss. GW is an analyzed variable, leading to aliasing of wind intensity (Della-Marta et al. 2009). Thus, we must consider the loss estimation figures with caution.

The aliasing technique was necessary because of the sensitivity of the storm selection method to the temporal resolution of the EWI. Using another storm selection technique (e.g., Lagrangian-based storm tracking) probably would not have helped since these methods are also sensitive to spatial and temporal resolution (Bhend 2005; Raible et al. 2008). In the absence of a reliable integrated model parameter (e.g., maximum wind gust) the spatial and temporal sampling must be adequate to analyze the variability of interest.

Do the s2d hindcasts represent the current climate? Each of the extended winter seasons are constructed from seasonal forecasts initialized at different times (see Table 1, e.g., September and November forecasts combined), possibly conditioning the dynamics by the observed initial states during the hindcast period. The longer the hindcast the more the dynamics are able to sample a wider range of initial states leading to a fuller range of possible simulated weather. Likewise, an unequal number of ensembles within hindcasts could lead to dynamics biased to the initial states. The limited resolution of s2d models indicates that the statistics of extremes should only be inferred for synoptic-scale features.

A related point to that above is the assumption that our seasonal resampling (cf. Figs. 10 –14, which is white in spectrum) represents likely interannual climate variability. For instance, it would be useful to test the influence of multiannual and decadal climate variability on the convergence of the parameters shown in Figs. 10 –14. Resampling with a reddened spectrum would probably result in a slower convergence of the parameters to their long-term values.

In Fig. 7 we showed a certain amount of underdispersion in the Poisson fits to the extended winter season storm frequency. On the contrary, Mailier et al. (2006) show that storms tend to cluster in time (i.e., the Poisson distribution is overdispersed) in this region. Further investigation and comparison with Della-Marta et al. (2009, see their Fig. 9a) show that it is related to the choice of domain and the continental perspective of the EWI used in this study. Extreme winds events in the Mediterranean region exhibit a clear tendency not to cluster in time as much as in the northwestern European and adjacent Atlantic Ocean regions.

## 6. Conclusions

This study has investigated the use of seasonal-to-decadal AOGCM forecast data to estimate the current climate of extreme winds associated with synoptic-scale cyclones over the eastern North Atlantic and western European domain. Differences between the physics, spatial and temporal resolution, and the hindcast length result in different windstorm climatologies to ERA-40 (cf. Figs. 3 and 6). After the application of a percentile calibration technique the s2d EWI-based windstorm climates were made very similar to ERA-40 windstorm climate (cf. Figs. 9a,b). This effectively removed the possibility for a different estimate of the current windstorm climate as defined by ERA-40. However, dynamical sampling experiments, utilizing the long s2d datasets, revealed that both the ML and LM methods of estimating the parameters of the GPD [Eq. (1)] contain biases at equivalent sample sizes (around 300 storms; cf. Table 2) from 45 seasons of reanalysis data.

New estimates of the European windstorm climate that have been calibrated to ERA-40 and then bias corrected suggest that the windstorm climate estimated from ERA-40 using ML method underestimates the severity of the windstorm climate, whereas using the LM method overestimates the severity of the windstorm climate at high RP. This suggests that, for example, a windstorm event that is estimated by ERA-40 to have an RP of 50 yr is more likely to be approximately a 40-yr event (cf. Table 5) when the ML method is used. Evidence to support this hypothesis is seen in the lack of convergence of the GPD parameters using datasets, which are 45 seasons in length (cf. Fig. 10). Experiments using random storms (cf. Fig. 12) indicate that the likely cause of this bias is inherent in the ML GPD parameter estimation method. Thus, by virtue of its length and the ML method, ERA-40-based windstorm climatologies (e.g., Della-Marta et al. 2009) are probably underestimating the severity of the current windstorm climate for windstorms with RPs between 10 and 300 yr and overestimating RPs of storms greater than 300 yr. However, these biases are relatively small compared to the role of dynamical sampling variability, which indicates, given 45 seasons to estimate the windstorm climate, the biases represent only around 20% (5% for L-moments) of the interquartile range of possible climates.

Using s2d allows various sampling experiments to quantify uncertainties associated with having ERA-40 length datasets. The ML ERA-40 climatology lies within the middle 10% of 45-season subsampled SYS3 windstorm climatologies (cf. Figs. 14a,c). This implies that the GPD fit to ERA-40 is accurate given that it is only 45 seasons in length, yet the windstorm climate of ERA-40 is not likely (∼65% chance) to have converged to the parent distribution (cf. Figs. 14a,c, blue line). The second conclusion from this analysis is that the profile log-likelihood ERA-40 GPD fit uncertainty overestimates the width of the lower bound of the RP of windstorms. Conversely, we also find that the upper bound of the ERA-40 RP fit uncertainty is underestimated. For the LM method the width of the 45-season sampled climatologies are not biased but cover a wider range of RP estimates than the ML estimates. The wider range of LM climatologies implies (cf. Figs. 14b,d) that the LM method is less accurate in determining the long-term windstorm climate given a 45-season sample than the ML-based climatologies.

We have shown that there is possibly useful skill in the prediction of the intensity of wind over the western European domain from the ECMWF s2d models during the first month of the November initialized forecasts (cf. Fig. 4) with little skill thereafter. Therefore, there is potential value in monthly forecast systems predicting four weeks in advance (Weigel et al. 2008a).

Our analysis has demonstrated that it is possible to use s2d data to gain useful estimates of the current windstorm climate and related loss. These data give opportunities to compliment the Swiss Re stochastically generated windstorm events with a completely independent set of physically consistent simulations of extreme wind-related weather in the current climate. New loss uncertainty estimates associated with sampling 45 seasons from SYS3 advocate the use of either a stochastically generated windstorm dataset as used by Swiss Re, or a dynamical ensemble-based dataset to improve the estimates of loss associated with high RP events. While there remains considerable uncertainty in long RP loss estimates because of the need to propagate calibration errors through the loss model, we demonstrate that s2d models are a powerful tool for exploring the climate of extremes and offer another perspective on the windstorm climate over the eastern North Atlantic and western Europe.

## Acknowledgments

We acknowledge Swiss Reinsurance Company who substantially sponsored this research through the Swiss National Science Foundation and the National Centre for Competence in Research Climate (NCCR-Climate). We thank Christoph Frei for discussions on the ML and LM methods as well as providing extreme value analysis software, and Stefan Wunderlich, Malcolm Haylock, and Marc Wüest for useful discussions and input to this paper. We thank three reviewers who provided detailed suggestions and comments that greatly improved the content of the paper. Paul Della-Marta was partially sponsored by the 6th EU framework program ENSEMBLES contract number GOCE-CT-2003-505539 and by the Swiss National Science Foundation and the National Centre for Competence in Research Climate (NCCR-Climate).

## REFERENCES

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

^{4}-year surge level estimates using data of the ECMWF seasonal prediction system.

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

### APPENDIX A

#### List of Acronyms

95PERC The 95th percentile method of calibration

AOGCMs Atmosphere–ocean general circulation models

C_ERA_to_SR The calibration of ERA-40 wind climatology to the SR Stoch wind climatology

C_s2d_to_ERA The calibration of s2d wind climatology to the ERA-40 wind climatology

C_s2d_to_SR The calibration of s2d wind climatology to the SR Stoch wind climatology

catXos Educational version of the Swiss Re windstorm loss model

CI Confidence interval (in most cases the 95% confidence interval)

DPOT Declustered peaks over threshold series

EC-DEM ECMWF s2d model used in the DEMETER project

ECMWF European Centre for Medium-Range Weather Forecasts

ERA-40 40-yr ECMWF Re-Analysis

EVA Extreme value analysis

EWI Extreme wind index

GCM General circulation model

GOF Goodness of fit (assessed using the Anderson–Darling test statistic)

GPD Generalized Pareto distribution

GPH Geopotential height

GW Geostrophic wind speed

IQR Interquartile range

LFC Loss-frequency curve

LM The L-moments method

Mean An EWI defined in appendix B

MEAN The mean method of calibration

ML The maximum likelihood method

PERC The percentile method of calibration

Q95 An EWI defined in appendix B

RL Return level

RP Return period

RPSSd The debiased ranked probability skill score

s2d Seasonal-to-decadal climate forecasts

Sfq95 An EWI defined in appendix B

Sfq95q99 An EWI defined in appendix B

SR Hist Swiss Re historical wind field dataset

SR Stoch Swiss Re stochastic wind field dataset

Sw3q90 An EWI defined in appendix B

Swiss Re Swiss Reinsurance Company

SYS2 ECMWF system2 s2d model

SYS3 ECMWF system 3 s2d model

### APPENDIX B

#### Mathematical Notation of Extreme Wind Indices

This appendix describes each of the extreme wind indices in mathematical notation for the benefit of readers who may wish to implement such indices. The content of this appendix has been taken from Della-Marta et al. (2009). The indices are denoted in terms of a generic wind variable *W*. Where possible we tried to take into account the unequal areas of each grid box by weighting sums and multipliers by the cosine of the latitude of each grid point.

##### a. Mean: Mean wind (m s^{−1})

where *κ* are the individual gridpoint weights, which only depend on *y*, *κ*(*x*, *y*) = cos[latitude (*y*)], *N _{κδ}* = ∑

_{x,y∈δ}

*κ*(

*x*,

*y*), and

*δ*denotes the domain.

##### b. Q95: The spatial 95% quantile wind (m s^{−1})

where *p* = 0.95 and *F*_{*} is the latitude-weighted empirical cumulative distribution function of {*w*(*x*, *y*, *t*):(*x*, *y*) ∈ *δ*}.

where .

##### c. Sw3q90: Cube root of the sum of wind cubed above the domain climatological 90% quantile (nondimensional)

where . The domain mean quantile function q90 is given by

where q90(*x*, *y*) = *F*^{−1}(*p*) = min{*w*:*p* ≤ *F*(*W*)}, *p* = 0.90; *F* is the empirical cumulative distribution function of {*w*(*x*, *y*, *t*): *t* ∈ ONDJFMA}.

##### d. Sfq95: Sum of the fraction of wind divided by the gridpoint climatological 95% quantile (nondimensional)

##### e. Sfq95q99: Sum of the fraction of extreme wind divided by the length of the distribution tail (nondimensional)

## Footnotes

*Corresponding author address:* Dr. Paul M. Della-Marta, Partner Reinsurance Company, 36 Bellerivestrasse, Zurich 8034, Switzerland. Email: paul.della-marta@partnerre.com