## Abstract

Sea level (SL) forecasting for the city of Venice, Italy, is of paramount importance for the management and maintenance of this historical city and for operating the movable barriers that are presently being built for its protection. In this paper, an ensemble prediction system (EPS; based on an ensemble of 50 simulations) for operational forecasting of storm surge in the northern Adriatic Sea is presented and applied to 10 relatively high storm surge events that occurred in the year 2010. It is shown that storm surge peaks correspond to the maxima of uncertainty (as described by the spread of the EPS members), which increases linearly with the forecast range. Further, the uncertainty in storm surge level is shown to be linked to the uncertainty of the forcing meteorological fields. The quasi-linear dynamics of the storm surges plays a minor role in the evolution of uncertainty, except that it makes the uncertainty oscillate with a period associated with that of the 11-h seiche of the basin. The error of the ensemble mean forecast (EMF) is correlated with the EPS spread. For these cases, the EMF accuracy is very close to that of the high-resolution deterministic forecast (DF) and is more robust than the DF (meaning that its error is consistently smaller than the error of the DF, as the lead time of the forecast varies).

## 1. Introduction

Storm surges pose severe problems for Venice, Italy, and its lagoon, damage monuments and buildings, and affect the daily life of Venetians and tourism. The potential severity of this hazard is shown by the event that occurred on 4 November 1966 [see De Zolt et al. (2006) for its description], when water rose almost 2 m above the conventional reference level^{1}, causing damage of approximately EUR 400 million (present-day value). Moreover, future sea level rise is expected to dramatically worsen the problem, increasing the frequency of floods (Scarascia and Lionello 2013; Lionello 2012).

An accurate and fully informative prediction of sea level (SL) in the time range from a few hours to several days is an essential tool for the management of the city, in particular for efficiently operating the movable dams that the Italian government is presently building as an important component of the general plan for the safeguard of Venice (Eprim et al. 2005). The movable dams are a system of electromechanical underwater barriers called Modulo Sperimentale Elettromeccanico [MOSE (Electromechanical Experimental Module)] that will be raised across the lagoon inlets before high storm surge events. Preventing the flooding of the Venice city center requires stopping the SL increase inside the lagoon before it reaches 84 cm above present mean SL (corresponding to 110 cm above the reference level, as explained in footnote 1). The timing of the decision to close the lagoon inlets should account for operational requirements (Eprim et al. 2005). A constraint is given by the leakage of the barriers, which will produce a small but continuous sea level rise in the lagoon even when they are closed. For long-lasting surges (up to 36 h), this problem requires raising the barriers 5 h before the water level crosses the 110-cm threshold. Three additional hours are needed to stop the ship traffic and 30 min for actually raising the barriers and closing the lagoon inlets. Considering that there is a cost both to raising the barriers on a false alarm and to not raising them when they are needed, an accurate and fully informative forecast is clearly important.

On the short time scale (from several hours to a few days), the sea level variations in the northern Adriatic Sea are caused by three factors: astronomical tide, storm surge, and seiches.^{2} Astronomical tide in the Adriatic Sea is predicted with an accuracy of about 1 cm (Comune di Venezia et al. 2013) using eight components (M_{2}, S_{2}, N_{2}, K_{2}, K_{1}, O_{1}, P_{1}, and S_{1}). Storm surges are caused by surface winds and mean sea level pressure (MSLP) variations that alter the periodic astronomical tidal oscillations and can produce the positive anomalies that cause the flooding of Venice (the well-known “Acqua Alta”; Robinson et al. 1973). The morphology of the Adriatic Sea (about 800 km long and less than 200 km wide) favors the action of the Sirocco, a strong wind that blows to the northwest along the axis of the basin and accumulates water at the closed end of the Adriatic Sea. The Sirocco’s effect is reinforced by the action of the MSLP gradient. Our study follows the consolidated operational practice of computing storm surge independently from the astronomical tide (e.g., Massalin et al. 2007), which assumes that nonlinear interactions between these two components are negligible sources of error, mainly because of the small tidal range (about 1 m). However, there are indications that nonlinearities affect the interaction of large surges with the astronomical tide (Pirazzoli et al. 2007) and that this source of error deserves to be carefully analyzed in future studies. Seiches are free oscillations of sea level with fundamental periods of about 11 and 22 h (e.g., Lionello et al. 2005) that are triggered by an initial storm surge event and successively attenuated by dissipation. In this manuscript the superposition of storm surge and seiches (which is the SL without astronomical tide) is referred to as the surge residual (SR).
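The additive decomposition adopted here (SL as astronomical tide plus SR, with no nonlinear interaction) can be illustrated with a minimal sketch. The constituent periods are standard astronomical values, rounded; the amplitudes and phases passed to the functions are hypothetical placeholders and not the harmonic constants used operationally for Venice, and the function names are illustrative only.

```python
import math

# Periods (hours) of the eight constituents named in the text
# (standard astronomical values, rounded to four decimals).
PERIOD_H = {
    "M2": 12.4206, "S2": 12.0000, "N2": 12.6583, "K2": 11.9672,
    "K1": 23.9345, "O1": 25.8193, "P1": 24.0659, "S1": 24.0000,
}


def astronomical_tide(t_hours, amplitude_cm, phase_rad):
    """Sum of one cosine per constituent, evaluated at time t_hours.

    amplitude_cm and phase_rad are dicts keyed by constituent name;
    missing entries are treated as zero (hypothetical inputs).
    """
    total = 0.0
    for name, period in PERIOD_H.items():
        omega = 2.0 * math.pi / period  # angular frequency in rad/h
        amp = amplitude_cm.get(name, 0.0)
        phase = phase_rad.get(name, 0.0)
        total += amp * math.cos(omega * t_hours - phase)
    return total


def sea_level(t_hours, surge_residual_cm, amplitude_cm, phase_rad):
    """SL = astronomical tide + SR, neglecting nonlinear interaction."""
    return astronomical_tide(t_hours, amplitude_cm, phase_rad) + surge_residual_cm
```

Under this assumption a forecast SR can simply be added to a tide series computed separately, which is the practice followed in the rest of the paper.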

The forecast center of the Venice municipality [Istituzione Centro Previsioni e Segnalazioni Maree (ICPSM), a center for tide prediction and warning] operates a set of models for SL prediction. Initially, a linear statistical autoregressive model (Tomasin 1972) was used for operational forecasting of SL in Venice. This model, which is calibrated using observed sea level time series [the tide gauge station “Punta Salute” (PS) has been operating for over a century in the Venice city center], predicts the water level in the lagoon using observed local sea level and MSLP at stations along the Adriatic Sea. Its later development (called BIGSUMDP) is still in use today with very good results (Canestrelli and Pastore 2000). However, this autoregressive model loses reliability when prediction over a time range longer than one day is required (Canestrelli and Moretti 2004). ICPSM has, therefore, adopted two hydrodynamical models that are directly based on the “shallow water” equations and that compute the evolution of current and SR from a sequence of MSLP and surface wind fields. These two models are the Shallow Water Hydrodynamic Finite Element Model (SHYFEM; Umgiesser et al. 2004), based on the finite element method, and the Hydrostatic Padua Sea Elevation and Adjoint Model (HYPSE-AM; Lionello et al. 2006), a finite-difference model that includes a data assimilation procedure based on the adjoint method. These models allow a reliable forecast over a longer time range, and their accuracy is determined mostly by their spatial resolution and by the quality of the forcing meteorological fields. While, in general, for hourly sea level a hydrodynamical model has a lower bias than a model based on the statistical autoregressive approach, the latter produces better results if only high storm surges are considered (see Table 1). However, hydrodynamic models provide information on SR along the whole coast (also where observations are not available), while the statistical approach is restricted to a single point in the Venice city center. Further, the results of hydrodynamic models are expected to benefit from increasing the grid resolution and the accuracy of wind fields. In fact, at present, a main source of uncertainty of the forecast carried out with hydrodynamic models is the error of the input wind fields, which in the shallow northern Adriatic Sea are the main forcing of the storm surge (Bargagli et al. 2002). These fields present sharp and irregular mesoscale structures, produced by the steep mountains on both sides of the Adriatic Sea, that are difficult to predict with sufficient precision by meteorological models (Zecchetto and Cappa 2001; Cavaleri and Bertotti 2004).

This study investigates the possibility and utility of complementing the single (“deterministic”) prediction of the hydrodynamic model with an ensemble prediction, including an estimate of the forecast uncertainty and the computation of a probability value for SR thresholds (“probabilistic” forecast). Our approach is to use the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble prediction system (EPS) for forcing a hydrodynamic model and producing a corresponding SR-EPS in the Adriatic Sea. The ECMWF EPS (operational since 1992; e.g., Molteni et al. 1996; Buizza et al. 1999) is a consolidated tool that provides a probabilistic weather prediction. It estimates the probability distribution function of forecast states and forecast uncertainty. The conceptual background of EPS is chaos theory, which describes the behavior of dynamical systems that are highly sensitive to the initial condition (the so-called chaotic systems, such as the atmosphere; Lorenz 1965). The ECMWF EPS consists of a set of different forecasts based on a set of different initial conditions (representing uncertainties inherent in the operational analysis) and a slightly different set of equations that is close, but not identical, to the best estimate of the model equations (thus representing also the influence of model uncertainties on forecast error). Different initial conditions are designed to include those perturbations that grow most rapidly in time and are formalized mathematically using the singular vector technique (Buizza and Palmer 1995). In this study, each member of the ECMWF EPS is used for obtaining a corresponding forecast of the SR, and this set of SR forecasts is used for a probabilistic prediction and an estimate of the SR forecast uncertainty.

Multimodel storm surge ensemble prediction has been performed for New York and the North Sea (Di Liberto et al. 2011; Siek and Solomatine 2011). Both studies found that the prediction accuracy of the multimodel ensemble is considerably improved in comparison to the one achieved by single models. However, our study follows the approach of Flowerdew et al. (2009, 2010, 2012), who implemented an EPS for storm surges in the North Sea, showed that the ensemble spread is a reliable indicator of the uncertainty associated with large surge events, and obtained a skilled probabilistic forecast. These studies led to the operational implementation of the EPS system. The output of the ECMWF EPS was also used by de Vries (2008) for a probability forecast of SL at the coast of the Netherlands. This study is meant to investigate the effectiveness and utility of the EPS approach in a different situation, when the wind fields are strongly affected by local features (Zecchetto and Cappa 2001), the SL oscillations have a lower range than in the North Sea, the SL dynamics includes large seiches, and a probabilistic prediction can provide very important information for the management of coastal defenses.

The paper is organized in the following way. Section 2 describes the shallow water model, data, events, and methods used in this study. Section 3 describes the results of the storm surge forecast using the EPS and discusses the rate at which the storm surge forecast uncertainty grows in time. Section 4 summarizes the conclusions of this study.

## 2. Data and methods

The SR simulations carried out in this study are based on HYPSE (Lionello et al. 2005), which is a standard single-layer nonlinear shallow water model whose equations are derived from the depth-averaged momentum equations. It adopts an orthogonal C grid and uses the leapfrog time integration scheme with the Asselin filter to prevent time splitting. In this implementation the model uses a rectangular mesh grid of variable size that has the highest resolution in the northern part of the Adriatic Sea, where the minimum step is 0.03°. Starting from this value, the grid spacing increases with a logarithmic increment (which uses a 1.01 factor) in both latitude and longitude. In practice, its resolution varies in the range from 3.3 to 7 km. This grid has been shown to produce more accurate results than other grids. The model domain and the locations of the tide gauges used in this study are shown in Fig. 1. A fixed sea level is imposed along the open boundary south of the Otranto Strait (Fig. 1). Model details and validation are described in Lionello et al. (2005, 2006).
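The variable grid spacing can be sketched as follows, under the assumption (not stated explicitly in the text) that the 1.01 factor is applied once per grid cell moving away from the high-resolution area. The conversion of about 111.2 km per degree of latitude is a rounded standard value, and the function names are illustrative.

```python
def grid_spacings_deg(min_step_deg=0.03, factor=1.01, n_cells=10):
    """Spacings (degrees) of successive cells, growing geometrically.

    Assumption: the spacing is multiplied by `factor` at every cell.
    """
    return [min_step_deg * factor ** k for k in range(n_cells)]


def deg_lat_to_km(step_deg, km_per_degree=111.2):
    """Approximate meridional length (km) of a step given in degrees."""
    return step_deg * km_per_degree
```

With these values the minimum step of 0.03° corresponds to roughly 3.3 km in the meridional direction, consistent with the finest resolution quoted above.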

In this study, 10 events that occurred in the year 2010 are simulated using 3-hourly ECMWF wind and MSLP as a forcing of the HYPSE model. The ECMWF EPS produces 50 different forecasts, which are used for producing a corresponding SR-EPS, and a high-resolution meteorological forecast [deterministic forecast (DF)], which is used for a corresponding SR deterministic prediction^{3} (see Table 2). This 1-yr-long period has been selected in order to have the same EPS and high-resolution forcing fields in all case studies. In fact, in January 2010 the operational version of ECMWF changed, increasing the resolution of the deterministic forecast and analysis from T799^{4} (~25 km) to T1279 (~16 km) and of the initial 10 days of EPS from T399 (~50 km) to T639 (~31 km). In this study, all the ECMWF fields have been extracted as disseminated by ECMWF (0.125° for T1279, 0.25° for T639) and linearly interpolated on the HYPSE grid. After the 2010 resolution increase of the ECMWF model system, one might expect that the quality of the meteorological forcing fields has improved, because the Adriatic area is very sensitive to topographic effects (Zecchetto and Cappa 2001).

The HYPSE model is exactly the same in all 10 cases and 51 forecasts. In each case all simulations start from the same initial condition, which is obtained with a 6-day simulation (analysis) during which HYPSE is forced by the ECMWF high-resolution analysis. In other words, each of the 51 simulations is split into two parts: a 6-day analysis (which is identical in all 51 simulations) and a 6-day forecast, each presenting a different SR evolution because of the 51 different forcings (see Table 2 for a short overview of all simulations). The 50-member SR ensemble is meant to provide information on sea level uncertainty (probabilistic prediction) and the DF is meant to be the best available individual forecast.

The analysis of the results is mainly based on comparing the model hourly data with the hourly observed SR at the Istituto di Scienze Marine (ISMAR)–Consiglio Nazionale delle Ricerche (CNR) platform, which is located 15 km offshore of the Venetian littoral. Further, data from Trieste, Italy, and the Croatian cities of Rovinj, Split, and Dubrovnik are considered in the analysis. The ISMAR-CNR tide gauge is preferred to the historical gauge in the Venice city center (called Punta Salute), because it is not affected by the internal hydrodynamics of the lagoon, which introduces a small delay (about 1 h), slightly modifies the SL signal, and cannot be reproduced at the resolution used by the HYPSE model. However, differences of SR maxima between ISMAR-CNR and Punta Salute are very small (see Table 3), except for event 10, when the difference is 13 cm (about 16% of the surge). For each event five prediction ranges have been considered, corresponding to forecasts launched at 0000 central Europe time (CET), approximately 24, 48, 72, 96, and 120 h before the observed storm surge. Therefore, the study is based on 10 × 5 × 51 = 2550 simulations of HYPSE. Each simulation reproduces the SR (storm surge and following seiches). The astronomical tide (computed separately) has to be added to obtain the actual SL, in agreement with the current operational practice at ICPSM.

The 10 events selected in this study are listed in Table 3. The list includes the 10 largest events within the selected 1-yr-long period with no large preexisting seiche perturbing the initial sea level state. Events 9 and 10 were of great practical relevance (more than 50% of Venice was flooded), while events 1 and 6 produced a negligible effect. For each event, Table 3 shows the date; the observed maximum SR at ISMAR-CNR, Trieste, Rovinj, Split, and Dubrovnik; the time of the SL maximum; the percent of Venice flooded; and SL and SR in the Venice city center. In several figures event 10 has been selected as an example, because it produced the highest SL in the Venice city center among the considered surges, but other events would have produced similar plots.

Figure 2 (top) shows for event 10 the 50 simulations of the EPS launched approximately 2 days before the maximum surge (thin colored lines), together with the DF, the ensemble mean forecast (EMF; the mean of the 50 EPS simulations), and the observations (thick green, red, and blue lines, respectively). The DF and the entire set of EPS forecasts start from the same initial condition, because of the common previous 6-day analysis, whose last day is marked with the black vertical line. Figure 2 (top) clearly shows that the EPS member forecasts diverge with time and that their divergence increases during the actual storm surge event on the second day of the simulations.

### Preliminary normalization of the time series

The time series of the SR during the 10 events have been normalized to reduce them to a dimensionless index $\hat{S}_k$, which facilitates their comparison in spite of their different amplitudes:

$$\hat{S}_k = \frac{S_k - R_X}{S_{\max} - R_O}. \tag{1}$$

The observed SR maximum during each event is given by $S_{\max}$; $S_k$ is the *k*th hourly value of the SR, where *k* denotes the time step; and *R*_{O} and *R*_{X} are the two background sea level values, which are meant to account for the mean level of the Adriatic Sea at the beginning of the forecast. Variable *R*_{X} is replaced with *R*_{O} or *R*_{A}, depending on whether Eq. (1) is used for observations or model simulations, respectively.

The value of *R*_{O} is associated with steric effects and changes of Adriatic total water mass, which cannot be described by a barotropic model such as HYPSE. In fact, HYPSE cannot describe the changes of SL due to changes of temperature and salinity of the water column (steric effects), because it adopts a fixed and uniform value for water density. Further, it cannot compute changes of the total mass of the Adriatic Sea, because it has no information on fluxes of mass across the Otranto Strait, which connects the basin to the rest of the Mediterranean Sea. In general, the physical mechanisms changing the background sea level (heat and evaporation fluxes, persistent MSLP gradients at a scale larger than that of the Adriatic Sea) can offset for a long time the level of the whole Adriatic and act on a time scale much larger than the storm surge. The computation of *R*_{O} and *R*_{A} is a practical way of removing the bias produced by steric and mass effects and of compensating for their absence in the forecast. In this study, *R*_{O} is the value reached at the end of the analysis time window by the linear interpolation to the observed hourly SR data^{5} [it is computed using data during the last day of the analysis period; see Fig. 2 (top)]. Variable *R*_{A} is computed analogously (for ensuring consistency when comparing observations and model results), considering the linear interpolation of the model SR data [see Fig. 2 (top)]. Anyway, *R*_{A} is always small with respect to both $S_{\max}$ and *R*_{O}. For each event, the same $S_{\max}$ is used for computing $\hat{S}_k$, both in the 51 models and the observed time series. Figure 2 (bottom) shows the effect of the normalization procedure for event 10. In all events, after this normalization, the peak of observed time series has a value equal to 1 (dimensionless), while the peak of the forecasts can be larger/smaller than 1, depending on whether they over- or underestimate the observed storm surge peak. Figure 2 (top) reports the values of *R*_{O}, *R*_{A}, and $S_{\max}$. The continuation of this paper considers normalized SR time series except in section 3b.
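The normalization can be sketched in a few lines. The sketch assumes that the index is obtained by subtracting the appropriate background level from each hourly SR value and dividing by the difference between the observed SR maximum and the observed background level, so that the observed peak equals 1. It also assumes that the "linear interpolation" over the last day of the analysis is a least-squares straight line; both function names are hypothetical.

```python
def linear_fit_end_value(values):
    """Value at the last sample of a least-squares line through `values`.

    `values` are equally spaced (hourly) SR data from the last day of
    the analysis window; the result plays the role of R_O or R_A.
    """
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    return mean_y + slope * ((n - 1) - mean_x)


def normalize(series, r_x, s_max, r_o):
    """Dimensionless index (S_k - R_X) / (S_max - R_O) for each S_k.

    r_x is R_O for observations and R_A for model output; s_max and
    r_o always come from the observations of the same event.
    """
    scale = s_max - r_o
    return [(s - r_x) / scale for s in series]
```

Applied to the observations of an event (with `r_x` equal to `r_o`), the largest normalized value is 1 by construction; applied to a forecast, it is above or below 1 according to whether the forecast over- or underestimates the peak.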

## 3. Results

### a. Accuracy of deterministic and ensemble mean forecast

Table 4 considers the forecasts launched approximately 24 h before the observed SR peak at the ISMAR-CNR platform. For the 10 events, it shows the (normalized) DF, the EMF, the maximum and minimum SR peaks of the EPS simulations, their difference, and the error of the DF and of the EMF. For the DF and EPS simulations, the peak SR is actually the maximum value within a 3-h time window centered at the time of the observed peak. However, in 97% of the runs this value is the actual SR maximum of the simulation. The value 1 would characterize a perfect forecast of the storm surge peak. Table 4 shows that, in terms of SR peak, the average behavior of the EMF has the same accuracy as the DF (the EMF has a slightly smaller bias, but the difference with respect to the DF is not statistically significant). Further, it shows that the EPS range always includes the observed value.
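The two quantities compared in this section can be sketched as follows. The 3-h window is taken here as the observed peak hour plus or minus one hourly step, which is an assumption about how the window is discretized; function names are illustrative.

```python
def ensemble_mean(members):
    """EMF: mean over members at each time step.

    `members` is a list of equally long hourly series, one per member.
    """
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


def peak_in_window(series, obs_peak_index, half_width=1):
    """Maximum of `series` within obs_peak_index +/- half_width steps."""
    lo = max(0, obs_peak_index - half_width)
    hi = min(len(series), obs_peak_index + half_width + 1)
    return max(series[lo:hi])
```

Because the window is applied to the EMF series after averaging, timing differences among members lower the EMF peak, which is the sensitivity to timing errors noted in the discussion of Fig. 3.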

Similar information is graphically summarized in Fig. 3, which shows the mean (average of the set of 10 events) results for the 24-, 48-, and 72-h forecasts, considering the normalized peak values at the ISMAR-CNR platform. The quality of the DF and of the EMF is very similar. Both the mean DF and EMF peak values systematically underestimate the observed peak levels (mean values of the forecasts are lower than 1 for all forecast ranges), and the mean absolute error (MAE) of the SR peak increases slightly (but the increase is not statistically significant) with the forecast time range, being slightly lower for the EMF than for the DF. Note that, because of the 3-h time window used for identifying the peak, this figure is not sensitive to forecast errors in the time of the SR peak of the DF. Instead, when considering the peak of the EMF, the SR error is sensitive to the timing errors of the SR peak of the ensemble members. Figure 3 shows that the range in the EPS members (the difference between the highest and the lowest EPS member peak values) increases with the forecast range. The mean absolute error, the EPS member range, and the standard deviation of the 50 EPS members increase consistently (but slightly, and not at a high statistical significance level) with the forecast range.

Figure 4 shows the mean of the normalized time series for the set of 10 events at the ISMAR-CNR platform. Besides the normalization procedure, the time series have been shifted in time so that 0 corresponds to the observed peak of the surge. The figure is meant to represent the average (in some sense, idealized) evolution of a typical surge in the northern Adriatic and of its prediction. Individual events may deviate substantially from this idealized average evolution (e.g., event 10 in Fig. 2). It is clear from Fig. 4 that the characteristic time scale for the evolution of the surge is about a half day and that the initial peak is followed by seiches oscillating with a time period of about 22 h. It is evident that EMF and DF have in general a very similar evolution. Though Fig. 4 suggests that they both tend to underestimate the actual surge peak level, their difference with respect to observations is not statistically significant at any relevant level (>90%).

Figure 5a shows the mean absolute error of the 24-h forecast considering the normalized time series of the 10 simulated events as a function of time at the ISMAR-CNR platform. All time series have been shifted as in Fig. 4. The mean absolute error increases with time for both DF and EMF, but it is almost always lower for the EMF. Figure 5b is similar to Fig. 5a, except it shows the maximum error of the 10 simulations as a function of time. Most of the time, the maximum error of the EMF simulations is much lower than the maximum error of the DF. Results for the longer forecast ranges (48 and 72 h) are similar, as the EMF has consistently lower absolute and maximum errors than the DF. In both Figs. 5a and 5b, the difference between the EMF and DF mean and maximum errors is plotted (violet line). Positive values indicate that the EMF has lower errors. Figures 5c and 5d show the difference between the EMF and DF mean and maximum errors, respectively, for all five gauges used for model validation, and they confirm that at all stations the EMF is more robust than the DF. It is not obvious how to estimate the confidence level of the difference between the mean absolute and the maximum errors of DF and EMF, because it depends on the estimate of the number of degrees of freedom. However, even a conservative estimate that considers that differences are based on 10 independent events shows that the reduction of the maximum and mean absolute errors of EMF with respect to DF is statistically significant well beyond the 90% confidence level.

### b. Probabilistic forecast information

The main goal of SR-EPS is to provide an estimate of the forecast error and of the probability of exceeding fixed threshold values. In none of the 10 events considered in this study (for the 24-, 48-, and 72-h forecasts) is the observed maximum SR outside the range of values produced by the EPS. Therefore, the observed peak is always within the range of the possible values produced by the EPS and a nonzero probability is assigned to reaching it.

Figure 6a shows an example (the 48-h EPS forecast for event 10) of the estimated probability (pink line) of exceeding a fixed SR threshold (this example uses 69 cm, which is the observed SR peak) at the ISMAR-CNR platform. Probability values have been derived from a Gaussian fitted to the EPS discrete distribution. Note that the maximum probability (about 17%) of reaching such a threshold does not coincide with the SR peak of the EMF, because the increase of the spread among the EPS members in this case has a larger effect than the value of the EMF. In other words, the distribution of EPS member SR values at 0000 CET 24 December has a slightly lower mean than in the previous hours, but the distribution is wider, so that the number of members above the 69-cm threshold is larger. Figure 6b also displays the probability of exceeding the 110-cm level, above which the MOSE barriers would be raised. Considering the future situation in which the MOSE barriers will be operational, Fig. 6b shows that the EPS would have provided a large probability (≥90%) of exceeding the 110-cm level and would have likely prompted the authorities to correctly lift them. Considering the present situation, the EPS would have provided a very small probability of exceeding the 140-cm level (about 5%), which would have probably deterred the Civil Protection from delivering an unjustified warning. Therefore, this example suggests that the EPS would have helped the authorities to take correct action.
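The exceedance probability derived from a Gaussian fit can be sketched with the Python standard library. Fitting by the sample mean and sample standard deviation of the member values is an assumption about the fitting method, and the function name is illustrative.

```python
from statistics import NormalDist, mean, stdev


def exceedance_probability(member_values, threshold):
    """P(value > threshold) under a Gaussian fitted to the members.

    Assumption: the Gaussian is fitted with the sample mean and the
    sample standard deviation of the ensemble values at a given time.
    """
    mu = mean(member_values)
    sigma = stdev(member_values)
    if sigma == 0:
        # Degenerate ensemble: all members identical.
        return 1.0 if mu > threshold else 0.0
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)
```

For instance, with a 69-cm threshold, a hypothetical ensemble of 46, 58, and 70 cm (lower mean, wider spread) gives a higher exceedance probability than one of 55, 60, and 65 cm, which is the mechanism by which the probability maximum can occur away from the EMF peak.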

In 5 out of the 10 analyzed cases, the SL actually reached the 110-cm critical threshold and the decision to raise the barrier would have been correct. To test the potential of EPS to support the management of the MOSE barriers, a simple procedure has been devised that consists of deciding to raise the barriers when the forecasted probability of exceeding the critical SL threshold is above a reference level (and otherwise not to raise them). If the probability reference level is tentatively fixed at 25%, this procedure produces the right decision for 6, 8, and 6 out of 10 events using the 24-, 48-, and 72-h forecast, respectively, and a simple skill score (the Heidke skill score has been used; Von Storch and Zwiers 1999) shows that it has a skill above the random forecast. Clearly, for operational applications such a probability reference level should be determined on a risk versus cost basis. If the 10% level were used, then no events would have been missed; however, the barriers would have been raised without reason (with the associated costs) 3, 4, and 3 times for the 24-, 48-, and 72-h forecast, respectively. If the 50% level were used, then 2, 3, and 3 events would have been missed, exposing residents to risks that could have been avoided.
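The decision procedure and its verification can be sketched as follows. The Heidke skill score is written in its standard 2 × 2 contingency-table form; the function names and the strict "greater than" comparison with the reference level are assumptions of this sketch, and any probabilities fed to it would be hypothetical inputs rather than the values of this study.

```python
def decide_to_raise(probabilities, reference_level=0.25):
    """Raise the barriers when the exceedance probability is above the level."""
    return [p > reference_level for p in probabilities]


def heidke_skill_score(decisions, outcomes):
    """Heidke skill score for paired yes/no forecasts and observations.

    Returns 1 for a perfect forecast, 0 for no skill over chance,
    and negative values for forecasts worse than chance.
    """
    hits = sum(d and o for d, o in zip(decisions, outcomes))
    false_alarms = sum(d and not o for d, o in zip(decisions, outcomes))
    misses = sum(o and not d for d, o in zip(decisions, outcomes))
    correct_neg = sum(not d and not o for d, o in zip(decisions, outcomes))
    n = hits + false_alarms + misses + correct_neg
    # Number of correct decisions expected by chance.
    expected = ((hits + false_alarms) * (hits + misses)
                + (misses + correct_neg) * (false_alarms + correct_neg)) / n
    if n == expected:
        return 0.0  # degenerate table: score undefined, report no skill
    return ((hits + correct_neg) - expected) / (n - expected)
```

Lowering the reference level trades missed events for false alarms, which is why the choice of the level is a risk versus cost decision rather than a purely statistical one.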

All panels of Fig. 6 consider data at the ISMAR-CNR platform rather than at Punta Salute, but the differences in SR and SL between these two locations are generally small (see Table 3), though they should be taken into account for a precise forecast. In this case (event 10) the actual maximum level at Punta Salute was 144 cm, while at the ISMAR-CNR platform it was 132 cm.

Of course, the EPS members can be used for computing the probability for any SL value and time. Figure 6c shows, according to the 48-h EPS forecast, the probability of exceeding an SL threshold (*x* axis) at 0000 CET 24 December 2010. The figure also reports the SL values of the deterministic forecast (green bullet), the EMF (red bullet), and the observed value (blue bullet). The observed value was within the predicted range, with an (approximately) 15% probability of exceeding it.

Though, because of the small set of events considered in this study, the evaluation of the accuracy of the EPS probability distribution has a limited statistical value, some analysis has been carried out anyway. Figure 7 shows the statistical distribution of the peak values index $I_{l,m}$:

$$I_{l,m} = \frac{P_{l,m} - \overline{P}_m}{\sigma_m}, \tag{2}$$

where $P_{l,m}$ is the peak value in the *l*th member of the EPS in the *m*th event, and $\overline{P}_m$ and $\sigma_m$ are the EMF and standard deviation of EPS member peak values for the *m*th event, respectively. The normalized indices follow the Gaussian distribution with a 0.95 confidence level according to the test statistics. Figure 7 also reports the 10 observed normalized peak indices (computed using the respective $\overline{P}_m$ and $\sigma_m$ for each event) and the Gaussian distribution with their mean and variance. If the EPS probability distribution were correct, then the 10 observed peak indices could not be distinguished from the individual members of the ensemble. In such a case, the two Gaussians, which represent the probability distribution predicted by the EPS (red dotted line) and the probability distribution derived from the observations (blue dotted line), should coincide. In fact, the mean and variance of the observed peak index are 0.48 and 0.91, respectively, while the variance of the EPS ensemble is 0.95. Though Fig. 7 visually suggests that observed SR peaks are more likely to have higher values and are distributed over a wider range than those produced by the EPS, these differences are not statistically significant at any relevant confidence level. Therefore, one cannot conclude (on the basis of these 10 events) that, when a storm surge is observed, the EPS is affected by a negative bias and that it underestimates the uncertainty of the forecast.
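The peak index can be sketched as a standardization of each peak value by the mean and standard deviation of the EPS member peaks of the same event. The use of the sample standard deviation is an assumption, as are the function and variable names.

```python
from statistics import mean, stdev


def peak_indices(member_peaks, observed_peak):
    """Standardized peak indices for one event.

    Returns (member_indices, observed_index), where each index is
    (peak - ensemble mean of peaks) / (std dev of member peaks).
    """
    mu = mean(member_peaks)
    sigma = stdev(member_peaks)  # assumption: sample standard deviation
    members = [(p - mu) / sigma for p in member_peaks]
    observed = (observed_peak - mu) / sigma
    return members, observed
```

Pooling the member indices of all events gives a distribution with zero mean by construction, so a positive mean of the observed indices corresponds to observed peaks that tend to lie above the ensemble mean.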

### c. The spread among the EPS member results

The spread among the EPS simulations is expected to represent a measure of the uncertainty of prediction and to be linked to the forecast error, so that cases with the largest spread are those with the highest uncertainty and where a large error of the ensemble mean (and also of the deterministic forecast) is more likely. Figure 3 has shown the tendency of error and spread to increase with the forecast range. This concept is reinforced by the scatterplot in Fig. 8, which shows the EMF absolute error (AE) versus the spread of the corresponding EPS members. Figure 8 considers the normalized peak indices and includes all 10 cases, all forecast ranges, and all tide gauges. The black line represents a smooth interpolation to all data, and it shows the tendency of the error to increase with the spread of the results of the EPS members.

Figure 9 shows the time evolution of the EPS spread for the 10 events, comparing its behavior for different forecast ranges. Each time series shows the EPS spread after it has been normalized with its mean value during the whole forecast, so that in all events its evolution has a comparable magnitude though the actual value can vary significantly. In Fig. 9a each of the 10 time series has been shifted in time, so that the peak of the surge occurs at 24 h. In practice they represent simulations that depending on the event were launched from 24 to 48 h before the storm surge peak. The red line represents the average normalized spread of all 10 simulations. Figures 9b–e show the same information, except the forecasts were launched from 2 to 6 days in advance, so that the peak of the surge occurs at 48, 72, 96, and 120 h, in this order. Figure 9f compares the evolution of the mean spread as shown in Figs. 9a–e. The blue line is the mean of all the red lines and represents the mean behavior of the spread in the absence of a surge. This is actually only approximately true, because this mean includes also the day with the peak for one time series out of six. Figure 9f shows that, in general, the spread increases gradually (and approximately linearly) with time, but that the presence of the SR peak corresponds to a temporary overshoot with respect to its normal evolution, meaning that uncertainty has a maximum at the time of the storm surge peak. Further, Fig. 9 (particularly Fig. 9b) suggests that the forecast spread oscillates with a period qualitatively corresponding to that of the second seiche of the Adriatic Sea (11 h; e.g., Lionello et al. 2005). This peculiar behavior of the forecast uncertainty is caused by the dynamics of the sea level in this semienclosed, elongated basin.
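The normalization of the spread used for Fig. 9 can be sketched as follows, taking the spread at each time step as the standard deviation across members (an assumption, since the exact definition of the spread is not restated here) and dividing by its mean over the whole forecast. Function names are illustrative.

```python
from statistics import mean, stdev


def ensemble_spread(members):
    """Standard deviation across members at each time step."""
    return [stdev(column) for column in zip(*members)]


def normalized_spread(members):
    """Spread divided by its mean value over the whole forecast."""
    spread = ensemble_spread(members)
    mean_spread = mean(spread)
    if mean_spread == 0:
        raise ValueError("all members are identical; spread is zero")
    return [s / mean_spread for s in spread]
```

The normalized series has a mean of 1 for every event, so events with very different absolute spreads can be averaged and compared on the same scale.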

To investigate the causes of the forecast uncertainty, the spread of the SR simulations has been compared (Fig. 10) with that of the MSLP and wind speed at platform ISMAR-CNR. The spreads of the wind speed and the MSLP have been normalized with the same method that has been applied to the SR in Fig. 9. The spread of the SR follows very closely that of the wind speed for the 24-, 48-, and 72-h forecasts, while it follows that of the MSLP for the 96- and 120-h forecasts. This behavior suggests that storm surge dynamics play a role in determining which atmospheric variable the SR uncertainty follows at short versus long lead times. Nevertheless, these results show that the quasi-linear dynamics of storm surge in the Adriatic Sea has only a small effect on the evolution of the sea level forecast uncertainty, which largely follows that of the meteorological forecast, so that the maxima of uncertainty at the time of the SL peak appear to be the effect of the maxima of uncertainty of the meteorological forcing fields.

## 4. Conclusions

An EPS for storm surges in the northern Adriatic Sea has been implemented using the ECMWF EPS for the wind and MSLP forcing fields of a hydrodynamical SR model. The analysis of the results is focused on the peak values (and not on hourly values) of 10 relatively intense events that occurred during the year 2010. A further study that simulates the operational practice and collects a set of events representative of all conditions (and not only of observed large storm surge events) is needed for assessing the overall performance of the EPS, for computing scores such as the Brier score, and for drawing reliability diagrams and rank histograms. However, this study has already produced interesting outcomes on the reliability of the EPS for storm surge forecasts in the Adriatic Sea and on its capability of describing the uncertainty of the predictions of relatively large storm surges.
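For reference, the Brier score mentioned above is the mean squared difference between the forecast probability of an event and its binary outcome. The sketch below uses the standard definition with invented values; it is not a result of this study, which does not have a sample suitable for this computation.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and outcomes.

    probabilities: forecast probabilities of the event, each in [0, 1]
    outcomes: 1 if the event occurred, 0 otherwise
    """
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Hypothetical forecasts of "peak exceeds a threshold" and what happened
forecast_probabilities = [0.9, 0.1, 0.5]
observed_outcomes = [1, 0, 1]
score = brier_score(forecast_probabilities, observed_outcomes)
```

A score of 0 corresponds to a perfect probabilistic forecast, and lower values are better.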

The EPS probability distribution of peak storm surge values is shown to be realistic. Though it is narrower and has a lower mean than the observed distribution, the difference between the predicted and observed distributions does not reach any relevant level of statistical significance. Note that in this study the predicted and observed distributions should not necessarily be expected to match and that the EPS may actually perform better than these results show. In fact, the simulated cases have been selected on the basis that a large storm surge actually occurred, while those where the ensemble would have produced a surge in some members, but none occurred in reality, have not been considered. This choice is expected to skew the statistics of the observation–simulation comparison, so that the EPS simulations have a negative bias. Finally, these possible shortcomings (negative bias and low spread) of the EPS could be mitigated with simple first- and second-moment corrections based on an adequate sample size (which is not available on the basis of this limited set of simulations).
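One simple form that first- and second-moment corrections could take is to shift the ensemble by an estimated bias and to rescale the member deviations about the ensemble mean. The sketch below is an assumption about such a scheme, not a method applied in this study; the bias and inflation values are invented, whereas in practice they would have to be estimated from an adequate sample.

```python
from statistics import mean, pstdev

def moment_correct(members, bias, inflation):
    """Shift the ensemble mean by -bias and rescale deviations by inflation.

    bias: estimated mean of (forecast - observation); negative if the
          ensemble underpredicts (hypothetical value supplied by the user)
    inflation: factor applied to member deviations about the ensemble mean
    """
    m = mean(members)
    return [(m - bias) + inflation * (x - m) for x in members]

# Hypothetical member peak values (cm) and correction parameters
raw = [70.0, 74.0, 78.0, 82.0]
corrected = moment_correct(raw, bias=-4.0, inflation=1.5)
```

With these toy values the ensemble mean moves from 76 to 80 cm and the spread grows by a factor of 1.5, while the ordering of the members is preserved.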

The EPS spread can be used for estimating the uncertainty of the forecast. In fact, it is correlated with the error of the EMF, meaning that events with large EPS spread are more likely to produce large errors in the EMF (and in the DF as well).

This study shows that the uncertainty of the storm surge forecast originates largely from the forcing meteorological fields. Results show that the EPS spread increases linearly with time and that it is proportional to the spread of the forcing meteorological fields. It is shown that the time of the storm surge peak corresponds to a maximum of uncertainty for the SR prediction. The quasi-linear dynamics of storm surge in the Adriatic does not add uncertainty to the SL prediction, which is mainly determined by that of the weather forecast, except that it introduces fluctuations of the EPS spread with a periodicity close to that of a resonant mode of the basin (the 11-h seiche). Obviously, there are other sources of uncertainty (to be analyzed in future studies) that are related to the inaccuracy of the hydrodynamical model and to errors in the initial condition of the SR forecast.

The EMF represents a reliable prediction of the SR peak. Its accuracy is similar to that of the DF (both marginally underestimate the actual maximum of the storm surge), though the EPS uses forcing fields at a much lower resolution. Since an EPS is not necessarily helpful in representing errors arising from poor resolution of sharp orographic features, because they are likely at least in part systematic rather than random errors, it is not surprising that the EMF does not improve its prediction with respect to a higher-resolution DF. However, the prediction of the EMF is more robust than that of the DF, meaning that EMF hourly predictions have consistently lower mean absolute and maximum errors than the DF.

The quality of the EPS would very likely be improved by increasing the accuracy of the meteorological forcing (e.g., by increasing its resolution) and/or adopting a hydrodynamic model with an unstructured grid (which could better reproduce the surge dynamics close to the coast). Further, a previous study (Lionello et al. 2006) has shown the advantages of implementing a data assimilation procedure to reduce the error of the initial condition of the storm surge forecast. Therefore, using it for initializing the EPS members will likely be very beneficial for the EPS-based probability forecast. Though this study has shown that the EPS provides reliable information for the storm surge forecast and its uncertainty, these results, admittedly, are not sufficient for assessing whether this EPS is adequate for managing coastal defenses. In the specific case of the MOSE barriers, the evaluation of the costs of false alarms (raising the barriers without actual need) and of partial flooding of the city center (caused by moderate events that may be assigned a low probability by the EPS) is needed. In other words, investigating how to use the probabilistic information of the EPS for making practical decisions is required for assessing whether it is actually adequate for operational purposes. This will require the analysis of details such as the optimal reference probability threshold for deciding to raise the MOSE barriers. However, this study has shown that the EPS probabilistic approach can be effectively used for obtaining useful and accurate information on the forecast of storm surge peaks and on its uncertainty, providing a necessary premise for its future operational use.
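How a reference probability threshold might enter the decision can be illustrated with the classical static cost-loss model, in which protective action is taken when the forecast probability of the event exceeds the ratio of the cost of protection to the loss avoided. This is a textbook simplification adopted here only as an assumption, not the decision rule of the MOSE operators; the cost and loss values, the member peaks, and the function names are hypothetical, and the 110-cm level is the flooding threshold quoted in the Introduction.

```python
def exceedance_probability(member_peaks_cm, threshold_cm=110.0):
    """Fraction of ensemble members whose peak reaches the threshold."""
    hits = sum(1 for peak in member_peaks_cm if peak >= threshold_cm)
    return hits / len(member_peaks_cm)

def should_raise_barriers(probability, closure_cost, flood_loss):
    """Cost-loss rule: act when probability exceeds cost / loss."""
    return probability > closure_cost / flood_loss

# Hypothetical member peak sea levels (cm above the reference level)
peaks = [95.0, 102.0, 108.0, 111.0, 115.0, 99.0, 120.0, 104.0, 109.0, 113.0]
p_flood = exceedance_probability(peaks)

# Hypothetical costs in arbitrary units: closure costs 1, flooding costs 5
decision = should_raise_barriers(p_flood, closure_cost=1.0, flood_loss=5.0)
```

In this toy case 4 of 10 members exceed 110 cm, so the estimated probability (0.4) is above the cost-loss ratio (0.2) and the rule would call for closure. A realistic assessment would also have to account for the lead-time constraints of the barrier operations and for the calibration of the EPS probabilities.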

## Acknowledgments

The authors thank Dr. Dario Conte (CMCC, Italy) for his assistance in solving computer problems and managing the input fields. The hourly SL observations in the CNR platform and Venice “Punta Salute” were provided by ICPSM of the Venice City Council; observations in Trieste by Dr. Fabio Raicich of ISMAR-CNR (Italy); and observations in Rovinj, Split, and Dubrovnik by Dr. Nenad Leder of the Croatian Hydrographic Institute.

## REFERENCES

*Modellistica del Sistema Lagunare, Studio di Impatto Ambientale,* Vol. 2,

*La Ricerca Scientifica per Venezia: Il Progetto Sistema Lagunare Veneziano,* Istituto Veneto di Scienze Lettere e Arti,

*Atti Classe di Scienze Fisiche, Matematiche e Naturali,* Atti dell’Istituto Veneto di Scienze Lettere ed Arti, Book 162,

*Flooding and Environmental Challenges for Venice and Its Lagoon: State of Knowledge,* C. Fletcher and T. Spencer, Eds., Cambridge University Press, 267–277.

**139,** 184–197, doi:.

*Méditerranée,* **108,** 59–68. [Available online at http://mediterranee.revues.org/170.]

*J. Coastal Res.,* **64,** 1184–1188.

*Statistical Analysis in Climate Research*. Cambridge University Press, 494 pp.

## Footnotes

^{1}

In Venice, SL is measured with respect to the mean SL in 1897. Nowadays, because of local subsidence and SL rise, the mean SL is estimated to be about 26 cm above this reference (Comune di Venezia et al. 2013).

^{2}

There is a further contribution due to the wind-wave setup that is generally negligible, but in some extreme cases it may be relevant (e.g., De Zolt et al. 2006).

^{3}

This single deterministic forecast produced with HYPSE is presently used for surge forecast in Venice together with the forecasts produced by SHYFEM and BIGSUMDP.

^{4}

The label Tn is used for denoting that the spherical harmonic series of the global meteorological model adopts a triangular truncation with max index *n*.

^{5}

Taking the mean value instead of the linear interpolation would change the results very little.