The 2013/14 Thames Basin Floods: Do Improved Meteorological Forecasts Lead to More Skillful Hydrological Forecasts at Seasonal Time Scales?

The Thames basin experienced 12 major Atlantic depressions in winter 2013/14, leading to extensive and prolonged fluvial and groundwater flooding. This exceptional weather coincided with highly anomalous meteorological conditions across the globe. Atmospheric relaxation experiments, whereby conditions within specified regions are relaxed toward a reanalysis, have been used to investigate teleconnection patterns. However, no studies have examined whether improvements to seasonal meteorological forecasts translate into more skillful seasonal hydrological forecasts. This study applied relaxation experiments to reforecast the 2013/14 floods for three Thames basin catchments with different hydrogeological characteristics. The tropics played an important role in the development of extreme conditions over the Thames basin. The greatest hydrological forecasting skill was associated with the tropical Atlantic and less with the tropical Pacific, although both captured seasonal meteorological flow anomalies. Relaxation applied over the northeastern Atlantic produced confident ensemble forecasts, but hydrological extremes were underpredicted; this was unexpected with relaxation applied so close to the United Kingdom. Streamflow was most skillfully forecast for the catchment representing a large drainage area with high peak flow. Permeable lithology and antecedent conditions were important for skillfully forecasting groundwater levels. Atmospheric relaxation experiments can improve our understanding of extratropical anomalies and the potential predictability of extreme events such as the Thames 2013/14 floods. Seasonal hydrological forecasts differed from what was expected from the meteorology alone, and thus knowledge is gained by considering both components. In the densely populated Thames basin, considering the local hydrogeological context can provide an effective early alert of potential high-impact events, allowing for better preparedness.


Introduction
The prediction of water availability over seasonal time scales is beneficial for many aspects of the water sector, including flood forecasting, water supply, hydropower generation, and navigation. For contingency planners, skillful seasonal hydrological forecasts (SHFs) of river and groundwater levels have the potential to provide an indication of possible flood events weeks or months in advance, allowing for more optimal and consistent decisions to be made (Arnal et al. 2017). The operational use of SHFs, however, remains a challenge because of uncertainties posed by the initial hydrologic conditions (e.g., soil moisture, groundwater levels) and seasonal climate forcings (mainly forecasts of precipitation and temperature) that lead to a decrease in skill with increasing lead times (Svensson 2016).
Across the United Kingdom and Europe, seasonal streamflow and groundwater forecast methods are currently being developed for application, for example, the U.K. Met Office Global Seasonal Forecast System (GloSea5), the Hydrological Outlook UK, and the Copernicus European Flood Awareness System (MacLachlan et al. 2015; Mackay et al. 2015; Svensson 2016), supported by Copernicus projects including Service for Water Indicators in Climate Change Adaptation (SWICCA) and End-to-End Demonstrator for Improved Decision-Making in the Water Sector in Europe (EDgE; Copernicus 2017a,b). Recent U.K. developments in SHFs stem from a prolonged period of drought beginning in 2010, which changed rapidly to widespread flooding during the winter of 2013/14. Driven by the consecutive formation of 12 major Atlantic depressions, the period between December 2013 and February 2014 (DJF 2014) was the wettest in the United Kingdom since records began in 1910 (Huntingford et al. 2014; Kendon and McCarthy 2015; Muchan et al. 2015) and the stormiest for at least 143 years when measured by cyclone frequency and intensity (Matthews et al. 2014). Individual storm events did not yield exceptional rainfall, but accumulated levels over the period led to extensive flooding nationwide, the costs of which were estimated at £1.3 billion (Environment Agency 2015a). The Thames River basin (southeast United Kingdom) received more than half a year's typical rainfall during DJF 2014, which led to concurrent fluvial, pluvial (surface water/flash), and groundwater flooding, so-called compound or coincident flood events (Thorne 2014).
To date, seasonal hydrological forecasting studies in the Thames and other lowland catchments have primarily identified initial hydrologic conditions as a dominant source of predictability (see Svensson et al. 2015; Svensson 2016). This is because flow regimes are dominated by slowly released groundwater, and forecast skill can be derived largely from the hydrogeological memory of antecedent conditions. Research conducted elsewhere, however, has found that driving hydrological models with more skillful meteorological inputs can capture observed flood events (Yossef et al. 2013) and improve hydrological prediction skill for streamflow (Shukla and Lettenmaier 2011; Svensson et al. 2015) and groundwater levels (Almanaseer et al. 2014). The contribution of meteorological forcing to SHF skill has also been found to outweigh that provided from initial hydrologic conditions during times of transition from dry to wet climate conditions (Wood et al. 2016). In the United Kingdom, there is demand to develop and improve the characterization and skill of meteorological inputs to improve hydrological forecasting skill at longer lead times and during winter months (e.g., see Li et al. 2009; Shukla and Lettenmaier 2011; Thober et al. 2014). This is recognized as being particularly important for predicting groundwater levels during extreme events as currently, inadequacies in seasonal rainfall forecasts have resulted in low groundwater forecast skill in all but the most quickly responding U.K. catchments (Mackay et al. 2015). Considering better predictions of meteorological conditions, alongside studies focusing on the role of initial hydrologic conditions, will help ascertain which skill improvements have the greatest potential to benefit hydrological forecasts across the Thames basin.
There has been much discussion regarding the meteorological factors that led to the DJF 2014 floods in southern England. Huntingford et al. (2014) proposed various driving mechanisms for the precipitation anomalies, with the North Atlantic Oscillation (NAO) providing the strongest relationship. A positive NAO, characterized by an atmospheric pressure difference between the Azores and Iceland, is associated with increased delivery of rain-bearing cyclonic weather systems into northern Europe during the winter months (Wilby 2001; Svensson et al. 2015). The importance of the NAO, however, was disputed by van Oldenburg et al. (2015), who stated that a pressure pattern bearing a low to the west of Scotland (as opposed to Iceland) accounted for substantially more of the variance in precipitation during this event and that combinations of major large-scale modes of variability are likely to have caused the stormy conditions (see also Knight et al. 2017).
The exceptional conditions of DJF 2014 have thus also been attributed to a hemispheric pattern of severe weather. Relative to the December 1981-February 2010 ERA-Interim climatology, sea surface temperatures (SSTs) in the tropical Pacific were warmer than usual, which disturbed wind patterns over the northeast Pacific and deflected the Atlantic jet stream northward. This brought cold air to North America while eastern Europe was anomalously warm (Palmer 2014; Watson et al. 2016); this temperature gradient strengthened the jet stream and provided conditions for the continued formation of depressions that affected the United Kingdom (Slingo et al. 2014). These anomalous conditions, however, were not skillfully forecast by the European Centre for Medium-Range Weather Forecasts (ECMWF) Operational System 4 (S4) seasonal meteorological forecasting system (Molteni et al. 2011). As a result, the ECMWF conducted a set of hindcast atmospheric relaxation experiments (AREs) to better understand the role of tropical sea surface temperatures in forcing the extratropical circulation response. The AREs relaxed the atmosphere toward the ERA-Interim reanalysis state within specified domains highlighted by negative Rossby wave source anomalies (see Rodwell et al. 2015; Magnusson 2017), forcing S4 to more accurately represent the cyclonic weather conditions prevailing in winter 2013/14. The results provided convincing evidence that the temperature and precipitation anomalies in Europe and North America were embedded within a hemispheric regime that was partly forced by tropical and underlying sea surface temperatures via Rossby wave source forcing (associated with its convection and divergent outflow) and that increased precipitation may have acted to reinforce the upstream wave.
This paper will use the ECMWF's AREs from Rodwell et al. (2015) to relate seasonal hydrological forecasting skill to the forecasting skill of meteorological input and its traceability from different atmospheric domains. Seasonal hydrological reforecasts for DJF 2014 were conducted using the European Flood Awareness System (EFAS) with seasonal meteorological input generated from the unforced S4 and three AREs. Specifically, we seek to identify 1) which seasonal meteorological reforecasts perform best, 2) whether increased skill in seasonal meteorological input translates through to more accurate streamflow and groundwater reforecasts for the 2013/14 compound flood event in the Thames River basin, and 3) how hydrological response differs for catchments with different geological and land-use characteristics. We discuss the potential for improvements to seasonal meteorological and hydrological forecasts and the practical value of more skillful seasonal flood forecasts for stakeholders to assist with decision-making in the Thames River basin.

a. Study catchments
The Thames River basin (containing 18 tributary catchments) covers approximately 16 200 km² in the southern United Kingdom. The western side is predominantly rural, comprising agriculture and woodland with rolling hills and wide, flat floodplains. Toward the center and east, the basin becomes increasingly urbanized, encompassing the towns of Reading, Slough, and Greater London. The source of the River Thames is located in the west (elevation up to 350 m MSL) and flows 230 km to Teddington Lock, which is the official upper tidal limit (elevation 4 m MSL; Fig. 1). The basin encompasses a diverse range of lithologies that greatly influence the flow regime of the Thames and its tributaries, from seasonally spring-fed streams to chalk aquifers with high baseflow and clay-based rivers that are characterized by a flashy response to storm events and high levels of surface runoff (Bloomfield et al. 2009). Anthropogenic channel modifications, abstraction from major aquifers, and discharge points into the river also influence the flow regime; abstraction specifically represents a 5%-12% reduction in typical annual peak flow (Thames Water 2010). Recent estimates identified more than 200 000 properties at risk of flooding from a "100 year" event across the basin (Environment Agency 2009).
For the purposes of forecasting fluvial and groundwater floods, this study focuses on three catchments with contrasting geological and physical characteristics upstream of Teddington Lock that experienced compound flood events during DJF 2014. The Evenlode is a relatively small (429 km²) rural agricultural headwater catchment dominated by a limestone aquifer. The Loddon (682 km²) comprises a rural-urban gradient and variable geology. The area referred to in Fig. 1 as Lower Thames (324 km²) is the farthest point downstream before Teddington Lock and is small and heavily urbanized, with a densely populated floodplain largely overlaying impervious London clay deposits (Fig. 1, Table 1).

b. ECMWF atmospheric relaxation experiments
The rationale behind the AREs was to investigate teleconnection patterns from specified forcing regions. The concept nudges the forecast toward the "true state" in a predefined area during the forecast integration, allowing the downstream impacts from the region to be investigated. The nudging involves adding an extra term to the prognostic equations of the model. Further details about the relaxation technique can be found in section 2.2 in Magnusson (2017). The source regions in this study were selected based on their strong and persistent seasonal-mean forcing on the Rossby waveguide (Rodwell et al. 2015). As the forcing patterns in these source regions were potentially predictable at long lead times, the AREs were expected to affect predictability in other parts of the world.
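As a hedged illustration of the extra term described above, relaxation (nudging) is commonly written as a Newtonian damping toward a reference state. The notation below is an assumption made for illustration only; the exact formulation used in the AREs is documented in Magnusson (2017) and is not reproduced here.

```latex
\frac{\partial x}{\partial t} = F(x) - \lambda \left( x - x^{\mathrm{ref}} \right)
```

Here x is a prognostic model variable, F(x) is the unforced model tendency, x^ref is the reference state (in this study, the ERA-Interim reanalysis), and λ is a relaxation coefficient (an inverse time scale) that is nonzero only inside the relaxation region. Outside that region λ = 0 and the model evolves freely, which is what allows the downstream impact of the relaxed region to be isolated.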
This paper used three ARE model runs (AR_NPAC, AR_WATL, and AR_EATL), each representing a different source of atmospheric relaxation (see Figs. 2d-f and Figs. A1a-c in appendix A). The source regions were chosen where a strong average forcing on the northern midlatitude flow during DJF 2014 was identified, using a Rossby wave source as the diagnostic [see Rodwell et al. (2015) for details]. The AR_NPAC region (centered at 35°N, 150°W) can be physically explained as the region where forcing from the northeast tropical Pacific acted on the midlatitude flow (Fig. 2d). AR_WATL (western Atlantic) was the source region for Atlantic cyclones (35°N, 75°W; Fig. 2e). AR_EATL was over the northeast Atlantic (55°N, 15°W) and was associated with the heavy precipitation experienced during DJF 2014, although not directly linked to any underlying SST anomaly (Fig. 2f). In each of the regions, the atmosphere was relaxed toward the ERA-Interim reanalysis state to determine the impact of each region.
All seasonal meteorological ensemble forecasts (S4 has 51 ensemble members and AR_NPAC, AR_WATL, and AR_EATL each have 28 members) were produced by the ECMWF Integrated Forecasting System (IFS) coupled atmosphere-ocean-land model. The atmospheric model was run at T255 horizontal resolution (~80 km) with 60 vertical levels (91 for S4), and the NEMO ocean model with 1° horizontal resolution in middle latitudes and higher resolution near the equator. Because of model updates, AREs used a more recent atmospheric model version (CY40R1) than that which is used operationally in S4 (CY36R4). ECMWF produced a 28-member ensemble of unforced control runs (NO_AR; Figs. 2c and A1d); neither cycle was able to predict the observed planetary wave anomaly in DJF 2014.

c. Seasonal hydrological modeling
Hydrological reforecasts were produced using the EFAS seasonal hydrological forecasting suite. EFAS aims to increase preparedness for floods in large European river basins based on operational probabilistic flood forecasts (Thielen et al. 2009; Smith et al. 2016). The hydrological model used in EFAS is LISFLOOD, a hybrid between a conceptual and a physical rainfall-runoff model combined with a river routing module and run on a 5 km × 5 km grid (van der Knijff et al. 2010; Alfieri et al. 2014). LISFLOOD is calibrated using nonnaturalized data. A new seasonal outlook for EFAS was recently developed by the ECMWF that uses seasonal meteorological ensemble forecasts from S4 as input to LISFLOOD to extend the EFAS flood forecast horizon up to 7 months (Arnal et al. 2018). A reference daily simulation, termed the EFAS water balance (EFAS-WB), which starts from the initial conditions of the previous day and is forced with the most recent observed meteorological fields (interpolated point measurements of precipitation and temperature), is also run. EFAS-WB is used as initial conditions from which the seasonal forecasts are started and provides a best estimate of the hydrological states at a given time for a given grid point; that is, it represents the theoretical upper limit of the model performance.
This study used the seasonal meteorological forecasts from ECMWF's S4 and the three ARE model runs as input to LISFLOOD. All hydrological forecasts were initiated on 1 November 2013 and ran for 4 months with a daily time step to provide ensemble reforecasts for streamflow (routed river flow measured in m³ s⁻¹) and groundwater level (storage in upper groundwater zone measured in mm). Catchment-averaged daily cumulative precipitation reforecasts (mm day⁻¹) were also produced.
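The catchment-averaged cumulative precipitation series can be sketched as follows. This is a minimal illustration, assuming a simple arithmetic mean over the grid cells that fall inside a catchment; the function name and the toy input values are hypothetical and are not taken from EFAS.

```python
from itertools import accumulate


def areal_cumulative_precip(daily_grids):
    """Return cumulative catchment-mean precipitation (mm).

    daily_grids: one entry per day; each entry lists the daily
    precipitation (mm/day) of every grid cell inside the catchment.
    """
    # Arithmetic mean over the catchment's grid cells for each day
    daily_means = [sum(cells) / len(cells) for cells in daily_grids]
    # Running total over the forecast period
    return list(accumulate(daily_means))


# Hypothetical three-day example with four grid cells in the catchment
example = [
    [0.0, 2.0, 4.0, 2.0],  # day 1: catchment mean 2.0 mm
    [1.0, 1.0, 1.0, 1.0],  # day 2: catchment mean 1.0 mm
    [0.0, 0.0, 0.0, 0.0],  # day 3: dry
]
print(areal_cumulative_precip(example))  # [2.0, 3.0, 3.0]
```

Applying the same function to each ensemble member would give one cumulative curve per member, from which the ensemble median and spread can be drawn.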
Raw daily observation data for streamflow (m³ s⁻¹) and groundwater level [m above ordnance datum (AOD)] were obtained from National River Flow Archive (NRFA) gauging stations and Environment Agency (EA) groundwater boreholes. Within each study catchment, one gauging station on the main river and one groundwater borehole were chosen (Fig. 1). All observation points provided a complete daily record over the 4-month reforecasting period, plus data extending back 20 years (or as far back since the start of records) to identify probability exceedance thresholds for that location. Streamflow and groundwater level reforecasts were obtained for the 5-km EFAS grid tile within which the NRFA gauging station and EA groundwater borehole were located to ensure spatial consistency when comparing between forecasts and observations (Fig. 1). Areal precipitation reforecasts were calculated using the arithmetic mean for each catchment. Continuous ranked probability scores (CRPSs; Hersbach 2000) were used as a measure of streamflow forecast sharpness and accuracy, comparing forecasts against the simulated water balance (EFAS-WB) and river gauge observation data. To ensure consistency when comparing against different-sized ensembles, the relative percentage difference between the 28- and 51-member CRPS values for S4 was calculated; values ranged from 0% (no difference in the Evenlode) to 0.83% (in the Lower Thames; Fig. B1 in appendix B).
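For readers unfamiliar with the CRPS, the sketch below shows one common way to compute it for an ensemble forecast against a single verifying value. It uses the empirical-distribution form of the score rather than the decomposition set out in Hersbach (2000), and it is not the code used in this study; function names and inputs are hypothetical. Lower values indicate a sharper and more accurate forecast.

```python
def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one verifying value.

    Empirical form: mean |x_i - y| - 0.5 * mean |x_i - x_j|.
    """
    m = len(members)
    # Average absolute error of the members against the verifying value
    term_obs = sum(abs(x - obs) for x in members) / m
    # Half the average absolute difference between all member pairs
    term_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term_obs - term_spread


def mean_crps(forecasts, observations):
    """Average the daily CRPS over a forecast period."""
    scores = [crps_ensemble(f, o) for f, o in zip(forecasts, observations)]
    return sum(scores) / len(scores)


# Hypothetical two-member streamflow forecast (m3/s) verified against 2.0
print(crps_ensemble([1.0, 3.0], 2.0))  # 0.5
```

For a single-member (deterministic) forecast the spread term vanishes and the score reduces to the absolute error, which is why the CRPS is often described as a probabilistic generalization of the mean absolute error.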
Spearman's rank correlation coefficient ρ was used to compare the median forecasted groundwater level against the simulated EFAS-WB and against borehole groundwater observations. Spearman's rank is a nonparametric measure of temporal rank correlation, which accounted for groundwater levels being expressed in different units.
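The sketch below illustrates why a rank-based coefficient suits series expressed in different units: Spearman's coefficient is the Pearson correlation of the ranks, so only the ordering of values matters. This is a minimal pure-Python illustration with hypothetical data, not the implementation used in the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical series: modeled storage (mm) and borehole level (m AOD)
forecast_mm = [10.0, 12.0, 15.0, 14.0, 20.0]
borehole_m = [60.1, 60.3, 61.0, 60.8, 63.5]
print(spearman(forecast_mm, borehole_m))  # 1.0
```

The two hypothetical series rise and fall on the same days, so the coefficient is 1.0 even though their magnitudes and units differ.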
Finally, the EFAS-WB was compared against gauged daily streamflow observations and borehole groundwater observations as an evaluation of the capability of LISFLOOD to accurately forecast the events in each catchment; this was achieved using Pearson's correlation coefficient r (to test EFAS-WB streamflow performance) and Spearman's rank ρ (groundwater performance). A workflow of all the forecasts, models, methods, and analyses used in the paper is shown in Fig. 3.

a. Meteorological forcing
Severe weather conditions did not originate from a single event, but from a number of events between late December 2013 and the end of February 2014, as supported by the negative seasonal average anomaly of the 500-hPa geopotential height (z500) over the northeastern Atlantic, with the United Kingdom located at the southeastern edge (Fig. 2a). For a seasonal forecasting system, capturing this structure was key to predicting the wet anomaly over the United Kingdom, but no anomaly was present in the ensemble mean averaged over the whole season for the S4 forecast (Fig. 2b).
Figures 2d-f show the results from the three AREs. By applying the atmospheric relaxation over the northeastern Pacific (AR_NPAC), the z500 anomalies over the western hemisphere were improved with a negative node over Canada and a positive node over the western Atlantic. There was also a negative anomaly present over the northeastern Atlantic, with a similar position to the analysis but weaker in magnitude (Fig. 2d).
Relative to AR_NPAC, the seasonal anomaly over the eastern Atlantic was better captured both in position and magnitude, with relaxation applied over the eastern part of the United States and the western Atlantic (AR_WATL; Fig. 2e). In the final experiment with the relaxation applied over the eastern Atlantic (AR_EATL), the negative anomaly was inside the relaxation box. However, the magnitude was less than in the analysis and AR_WATL, and the southern extent (outside the box) was not captured (Fig. 2f). The time series of accumulated precipitation shows AR_EATL underpredicted through late December but captured the rainfall better in January and February.
b. Hydrological response to meteorological forcing

1) OVERVIEW
Patterns in simulated EFAS-WB cumulative areal precipitation values (pink line) were consistent across all catchments (Figs. 4-6). There were a few wet days in early November and a dry period into mid-December followed by higher-than-average rainfall conditions, with extreme precipitation events corresponding with Atlantic depressions recorded in mid- to late December, early January, late January, and early February. Over the 4 months, total cumulative areal precipitation (EFAS-WB, pink line) was greatest in the Loddon catchment at 541.2 mm and lower in the Evenlode and Lower Thames at 494.1 and 454.1 mm, respectively (Figs. 4-6).
During early to mid-December, observed gauged daily streamflow (black line) fell below the median (Q50) daily flow record (Table 1) in all three catchments (based on daily flow records from 1994 to 2014; NRFA 2017). Observed streamflow in all catchments then exceeded the Q10 threshold (the flow exceeded only 10% of the time, i.e., the 90th percentile of daily flows) from mid-December through to the end of the study period. Observed borehole daily groundwater levels (black line) exceeded Q50 (EA 2017) in the Loddon toward the end of January but did not reach the Q10 level of 65.9 mm (not shown in Fig. 5). Observed groundwater levels exceeded Q10 in the Evenlode by mid-December and Lower Thames by mid-January (Figs. 4, 6).
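The Q10 and Q50 thresholds referred to above can be derived from a long daily record as percentiles. The sketch below is illustrative only: the linear interpolation between sorted values is an assumed convention (the method used for the NRFA and EA records is not stated here), and the record is hypothetical.

```python
def exceedance_threshold(record, exceed_pct):
    """Value exceeded `exceed_pct` percent of the time in `record`.

    Q10 (exceeded 10% of the time) corresponds to the 90th percentile;
    Q50 is the median. Linear interpolation is an assumed convention.
    """
    data = sorted(record)
    # Fractional index of the (100 - exceed_pct)th percentile
    position = (1 - exceed_pct / 100) * (len(data) - 1)
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    frac = position - lower
    return data[lower] + frac * (data[upper] - data[lower])


# Hypothetical daily flow record (m3/s)
flows = [float(v) for v in range(1, 12)]  # 1.0, 2.0, ..., 11.0
q50 = exceedance_threshold(flows, 50)
q10 = exceedance_threshold(flows, 10)
print(q50, q10)
```

A daily observation or forecast value above `q10` would then be flagged as a high-flow day in the sense used in the text.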
Comparing observations against EFAS-WB (model performance; Fig. 7), LISFLOOD was capable of predicting streamflow and groundwater levels with reasonably high accuracy in all three catchments; positive correlation coefficients ranged from a moderately strong 0.70 for Lower Thames groundwater to a near-perfect 0.98 for Lower Thames streamflow (Fig. 7).

2) FORECASTS
Visual improvements to areal precipitation forecasts and streamflow forecasts identified by CRPS followed the general pattern (worst) S4 > AR_NPAC > AR_EATL > AR_WATL (best) (Figs. 4-6, 8a). This trend was similar for groundwater correlations; all three ARE model runs demonstrated marked improvement compared with S4 forecasts that showed negative correlation with simulated EFAS-WB and borehole groundwater observations in each catchment (Fig. 8b).
S4 forecasted a linear increase in rainfall from 1 November that failed to pick up the low rainfall conditions from the end of November to early December or the extreme precipitation events in mid-December and beyond. S4 also substantially underpredicted the total amount of precipitation forecast over the 4-month period. The resulting streamflow forecasts showed minimal forecast skill across all catchments; the median did predict above-average streamflow conditions (up to 90th percentile at times) and low numbers of ensemble members forecast some extremes, but the timing and magnitude of peak events were largely incorrect, notably during the first 6 weeks (Figs. 4-6). S4 forecasted decreasing groundwater levels over the 4 months, leading to negative correlations with borehole observations; this was most pronounced in the Evenlode, where observations recorded a 6.10-m increase in groundwater levels in the aquifer (Figs. 4, 8b).
FIG. 4. Precipitation, streamflow, and groundwater levels: a comparison between S4 and the three ARE model runs AR_NPAC, AR_WATL, and AR_EATL for the Evenlode catchment. Forecast shading shows the minimum, 5th, 25th, 75th, 95th, and maximum of the ensemble in all cases. Areal precipitation (mm) = catchment-averaged cumulative daily forecast median values (gray). Streamflow (m³ s⁻¹) = daily forecast median (light blue) at the river gauging station. Groundwater level (mm) = daily forecast median (dark blue) at the groundwater borehole. Observations (black) and simulated EFAS-WB (pink) in all cases. Q10 (long dash) and Q50 (short dash) show exceedance thresholds (based on 1994-2014 observation records or the longest available record).
FIG. 5. As in Fig. 4, but for the Loddon catchment.
The AR_NPAC precipitation forecast was similar to S4, although sharper with less spread about the median, leading to minor improvements in streamflow and groundwater forecasts. The timing of peak streamflow events was more accurately represented, and the magnitude was picked up by the ensemble maximum in many cases. There remained poor forecast quality during the first 6 weeks. The groundwater forecast median showed weak to moderate positive correlation with borehole observations and EFAS-WB (Fig. 8b), although there was a large ensemble spread.
Areal precipitation forecast by the AR_EATL model run was sharp with a good correlation but was underpredicted with respect to the simulated EFAS-WB values in all catchments. Subsequent streamflow forecasts demonstrated accuracy and sharpness but underprediction and reduced reliability for high extremes. Groundwater forecasts were sharper than S4 and AR_NPAC but also underpredicted against the EFAS-WB (Figs. 4-6).
AR_WATL produced the best areal precipitation forecast in all catchments; the forecast median traced the simulated EFAS-WB cumulative rainfall patterns with relatively high accuracy until mid- to late December, when accuracy trailed off. Precipitation forecasts remained sharper than S4 and AR_NPAC, and total rainfall was matched by the forecast maximum in the Evenlode and Lower Thames (Figs. 4-6). Low CRPS and strong positive correlation values indicate a marked improvement for all streamflow and groundwater forecasts (Figs. 8a,b). Extreme streamflow events were missed from late December to early January in all catchments, which coincided with the decreased accuracy in the rainfall forecast. Groundwater forecasts showed regular oscillations in all three catchments (also apparent in AR_EATL and AR_NPAC forecasts; Figs. 4-6).

3) CATCHMENT VARIATION
Observed gauged streamflow patterns (black line), although of different orders of magnitude, were similar for the Evenlode and Lower Thames with consistently high flows from mid-December onward with 5-6 clearly defined peaks (Figs. 4, 6). LISFLOOD successfully modeled the flow dynamics in the Lower Thames (r_sens = 0.98; Fig. 7). The flow pattern was quite accurate in the Evenlode, but overall model performance was lower (r_sens = 0.81) as the simulated EFAS-WB did not capture the flow pattern between mid-December and the end of January (Figs. 4, 6, and 7). The Loddon had a much flashier response with eight clearly defined peaks (black line), coupled with the shortest time to peak of 9.81 h (Table 1). Model performance was the lowest of the three catchments (r_sens = 0.73) as LISFLOOD failed to detect peaks around 17 December and underpredicted the extreme events from late December to mid-January (Figs. 5, 7). Observed borehole groundwater levels (black line) increased in all three catchments: Evenlode +6.10 m, Loddon +0.90 m, and Lower Thames +1.13 m (Figs. 4-6). The Loddon and Lower Thames recorded average or just below average (Q50) groundwater levels until mid-December, when levels showed a consistent and steady rise. Groundwater levels recorded in the Evenlode were more responsive following precipitation events and mirrored streamflow dynamics (Fig. 4). LISFLOOD was best able to model groundwater levels in the Evenlode (ρ_sens = 0.92) but was oversensitive in the Lower Thames (ρ_sens = 0.70; Fig. 7).
With respect to streamflow and groundwater level forecasting skill compared against observations in each catchment, CRPS and Spearman's rank ρ indicated that AR_WATL provided the best forecast skill in all catchments (solid bars in Figs. 8a,b). CRPS_obs for the Loddon and Lower Thames followed the aforementioned pattern S4 > AR_NPAC > AR_EATL > AR_WATL; however, the AR_EATL model run performed worst in the Evenlode (CRPS_obs = 6.32). Groundwater level forecast skill was consistent across catchments, with S4 performing worst and AR_WATL best (Fig. 8b).

Discussion
The winter of 2013/14 was exceptional in regard to the large number of Atlantic depressions that affected the United Kingdom; the Thames basin saw record precipitation levels that led to widespread and prolonged fluvial and groundwater flooding (Kendon and McCarthy 2015), the impacts of which have been well documented (e.g., Slingo et al. 2014; Thorne 2014; Muchan et al. 2015). The drivers of these extreme conditions have also been debated, with papers seeking to identify the atmospheric influences via reviews of multiscale model simulations investigating factors such as atmosphere, ocean, land use and demographics (see Huntingford et al. 2014), correlation analyses (van Oldenburg et al. 2015) and relaxation experiments (Rodwell et al. 2015; Watson et al. 2016; Knight et al. 2017). It is largely accepted that a combination of global meteorological influences was important, but studies that link different meteorological inputs and how these translate through to hydrological forecasting skill have not been conducted. Below, we discuss how identification of skill through the meteorological (ARE) and hydrological (EFAS) seasonal forecasting chain may provide an indication as to the origins of extreme events and the level of predictability that can be gained if the evolution in parts of the system is known. We also highlight the value of more skillful hydrological forecasts during extreme events for stakeholders, taking into account the variation in catchment properties that exists across the Thames basin.

a. Translating meteorological improvements into more skillful hydrological forecasts
From a meteorological perspective, AR_EATL was expected to give the rainfall closest to that observed during DJF 2014 due to the location of southern England at the edge of the relaxation box. Although this experiment provided the most confident hydrometeorological ensemble forecasts, their value was limited because of underprediction, likely because AR_EATL missed the southward extent of the atmospheric trough and hence did not fully capture the details of the flow anomaly affecting southern England.
Atmospherically speaking, AR_NPAC captured the representation of large-scale flow over the northern Atlantic better than S4, yet this did not translate into an improved precipitation forecast, resulting in low hydrological forecasting skill over the Thames basin. As AR_NPAC gave a stronger anomaly in geopotential height over the eastern Atlantic, one could speculate that systematic model errors affected the Rossby wave train from the Pacific to the Atlantic, leading to misplacement of the anomaly over the northeastern Atlantic. Given the relationship between the tropical Pacific and El Niño-Southern Oscillation (ENSO; Doblas-Reyes et al. 2013), there was hope that seasonal hydrological predictability could be improved in the future with a better modeled teleconnection from ENSO. Rather, the results point to the importance of the western Atlantic and pose the open question about whether the forcing into this box is linked with the Pacific and/or tropical Atlantic.
The best hydrological forecasts were obtained by the AR_WATL experiment. Climatologically, the eastern United States and Gulf Stream is the most active region for cyclogenesis in the Atlantic (Hoskins and Hodges 2002), and the representation of the anomaly in this region also captured the downstream anomalies over the northern Atlantic. Whether this is a result of the cold anomaly over North America giving a strong temperature contrast (baroclinicity) over the Gulf Stream or related to the anomaly in the divergent flow from South America as discussed in Knight et al. (2017) has yet to be confirmed. Nonetheless, with future improvements to coupled models, there is scope for an improvement in the teleconnections whereby the results for this study could be revised (Magnusson et al. 2013).
b. More skillful hydrological forecasts, but missed events, oscillations, and uncertainty
All three AREs led to improvements in meteorological input, which translated through to more skillful streamflow and groundwater level reforecasts compared to S4, with AR_WATL performing best. However, there were consistent trends observed for all ARE model runs across the three catchments. Poor representation of the hydrological variables during the first 6 weeks coincided with the end of the drought period that preceded the extreme wet conditions. Wood et al. (2016) found that this climatological transition period produced the lowest seasonal predictability as initial hydrologic conditions provide minimal contribution, an effect that may have been heightened in the Thames basin, which is largely groundwater driven (Svensson et al. 2015).
Streamflow forecasts also missed peak streamflow events observed between the end of December and mid-January; these coincided in time with the point at which precipitation forecasts diverged from the simulated EFAS-WB, indicating potential meteorological forcing errors. This was likely due to the extreme nature of the rainfall experienced at this time, which was undetected in the meteorological forecast, propagating the error into the hydrological forecast (Davolio et al. 2008), coupled with the uncertainty prevalent at longer lead times. Structural issues in LISFLOOD, however, cannot be ruled out, as the EFAS-WB also failed to capture these peak streamflow events in the Evenlode and Loddon catchments. Factors including the variable density of the rain gauge network, lack of horizontal flow (from pixel to pixel) of water in the topsoil and subsoil, and inability to represent finescale geological and morphological characteristics in smaller subbasins, for example, may have limited forecast skill and model performance in these catchments. Nonetheless, recovery of the EFAS-WB toward observations later in the seasonal streamflow forecast (mid-January onward) suggests that these missed events may relate more strongly to meteorological forcing errors.
Groundwater-level forecasts showed oscillations of increasing amplitude as precipitation forecasts improved (most obvious for AR_WATL), with troughs corresponding with the November dry period and missed rainfall events, and peaks shortly following periods of intense precipitation. A rapid response to rainfall has been observed for aquifer recharge rates and groundwater level time series (Lee et al. 2006; Bloomfield and Marchant 2013), indicating that groundwater forecasts can be sensitive to meteorological forcing data. Here, we investigated the LISFLOOD upper-groundwater-zone response, where processes represent a mix of fast groundwater, including preferential flow rates and subsurface flow through soil macropores, and thus a quicker response to rainfall following a dry period was expected (Mahmood-ul-Hassan and Gregory 2002; Lee et al. 2006). The cyclical dynamic of the forecast may also represent model processes whereby outflow from the upper zone is released once the amount of water being stored reaches a threshold (van der Knijff et al. 2010). As such, it is likely that the observed oscillations represent combined effects of the LISFLOOD model setup and sensitivity to rainfall input.

c. Catchment controls on the variation in hydrological skill improvements
There were differences in the observed hydrological response and model performance between catchments, likely explained by the EFAS setup, plus local weather conditions and geographical differences acting at the catchment level. Simulated EFAS-WB streamflow values were low compared to observations in the largely groundwater-driven Evenlode and flashy-responding Loddon (discussed previously), but were well captured in the Lower Thames, where peak streamflow observations exceeded approximately 500 m³ s⁻¹. This increased performance may be attributed to the fact that the Thames basin in LISFLOOD is calibrated using gauged daily flow records from the Lower Thames (at Kingston/Teddington Lock). The geographical position of the Lower Thames also represents drainage from the entire upstream catchment, essentially representing a larger basin for which LISFLOOD was designed. The greater coverage of impervious surfaces where LISFLOOD assumes no soil or groundwater storage may also have played a role (Burek et al. 2013). By contrast, groundwater levels were most accurately modeled and forecast in the Evenlode. Despite its small size and position at the headwater, this suggests that LISFLOOD is well set up to capture upper-zone processes in rural land-use catchments dominated by chalk and limestone lithology (see also Mansour et al. 2013). Antecedent dry conditions are also likely to have played an important role, allowing percolation into the aquifer, as explained by the 6.10-m increase in observed groundwater levels (Svensson et al. 2015). By contrast, the relatively small yet linear increase in observed groundwater levels in the Loddon and Lower Thames could be attributed to the locations of boreholes within less permeable lithologies, and in the case of the Lower Thames, a heavily urbanized area (Macdonald et al. 2012).
Bloomfield and Marchant (2013) recognized clear effects of fractured chalk and granular sandstone aquifer characteristics on saturated flow and storage during U.K. drought conditions, and it is not unreasonable to expect differences to carry forward into a period of extreme rainfall. The cumulative effects of upstream groundwater abstractions not accounted for in LISFLOOD may also explain the notable difference between simulated EFAS-WB and borehole observations in the Lower Thames, an effect less prevalent in the Loddon and Evenlode, where boreholes are located toward the top of the catchments. Interestingly, the observed groundwater storage in the Loddon and Lower Thames was more consistent with that of the lower (saturated) zone in LISFLOOD (not considered in this study because of data issues), where water is either stored or enters the channel via baseflow, producing a very slow, seasonally linear response to meteorological forcings (van der Knijff et al. 2010; Mackay et al. 2015). Whether the oversensitivity of the simulated upper-zone response in these catchments (notably the Lower Thames, where the EFAS-WB captured groundwater variability at an entirely different frequency) is a result of finer-scale geological and land-use heterogeneity not captured by LISFLOOD (Svensson et al. 2015) or the saturated nature of the impervious deposits that may be better represented by lower-zone processes requires further work.

MORE SKILLFUL HYDROLOGICAL FORECASTS
There is currently considerable focus on improving operational flood forecasts at seasonal time scales, and extreme events such as those experienced in DJF 2014 raise important questions about whether there are elements of predictability that are being missed by seasonal forecasting systems (Scaife et al. 2014; van Oldenborgh et al. 2015; Watson et al. 2016; Knight et al. 2017). Driving hydrological models with inputs from atmospheric relaxation experiments provides a valid indication of what can be achieved from an operational forecasting system if the determinants of prolonged seasonal mean forcing, for example, ENSO, could be captured in the future. While S4 was unable to skillfully capture the seasonal average forcing for DJF 2014, updates such as the fifth-generation ECMWF seasonal forecasting system (SEAS5), which will shortly replace S4, indicate substantial improvements to SST bias in the tropical Pacific, increased model resolution, and a greater ensemble size (Lucas 2017) that may go some way to improving seasonal hydrological predictions.

2) DIFFERENT HYDROLOGICAL MODEL SETUP TO EXTRACT SKILL
While the uncertainty in the forecasts appears to be largest, further analysis might consider adjusting the LISFLOOD model parameters through a process of model calibration (Shi et al. 2008) and/or comparing results with those obtained from a local-scale hydrological model that better captures streamflow and groundwater dynamics in smaller basins. Use of multiple different hydrological models could also help capture a fuller representation of the uncertainty that comes from the hydrology and land surface (e.g., see EDgE; Copernicus 2017b).

ADVANCE
Based on numerical weather prediction, the fluvial flood events of DJF 2014 were well forecast at a lead time of 2-3 days and with reasonable accuracy up to 2 weeks ahead of time. Groundwater floods, which acted over a longer time scale and were triggered by exceptional aquifer recharge and saturation of permeable deposits, were not well predicted due to the complex dynamics and interactions of the groundwater system with atmosphere and land processes (Mackay et al. 2015). The EA is responsible for managing flood risk in the United Kingdom. Taking the Loddon December floods as an example, a flood alert based on the EA streamflow thresholds at the river gauging station would have been triggered from a value of 11 m³ s⁻¹: in the case of S4 and AR_NPAC, the forecast median did not cross this threshold, although maximum extremes of the ensemble did. For AR_WATL and AR_EATL, a flood alert for the local area would have been triggered with 6 weeks' lead time (1 November) based on the forecast median. This would have allowed mitigation strategies and low-cost preventative actions to be carried out well in advance while also highlighting an ''area to watch'' as the season progressed.
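The distinction drawn above between the forecast median and the ensemble extremes crossing an alert threshold can be sketched in a few lines of Python. This is a minimal illustration only, not the EA's operational procedure or the method used in this study: the function name `alert_summary` and the ensemble values are hypothetical, and only the 11 m³ s⁻¹ threshold is taken from the text.

```python
from statistics import median

# Flood-alert threshold for the Loddon quoted in the text (m3/s).
ALERT_THRESHOLD = 11.0


def alert_summary(ensemble_flows, threshold=ALERT_THRESHOLD):
    """Summarise how an ensemble of forecast flows compares to a threshold.

    ensemble_flows: forecast streamflow values (m3/s), one per ensemble
    member, all valid for the same date.
    """
    n_exceeding = sum(1 for q in ensemble_flows if q >= threshold)
    return {
        # Alert based on the central estimate of the ensemble.
        "median_exceeds": median(ensemble_flows) >= threshold,
        # Alert based on the most extreme member only.
        "max_exceeds": max(ensemble_flows) >= threshold,
        # Share of members at or above the threshold.
        "fraction_exceeding": n_exceeding / len(ensemble_flows),
    }


# Hypothetical ensemble (m3/s), invented purely for illustration.
example_ensemble = [6.0, 8.5, 9.0, 10.5, 12.0]
summary = alert_summary(example_ensemble)
```

In this invented case the median stays below the threshold while the maximum member crosses it, which mirrors the situation described for S4 and AR_NPAC; reporting the fraction of exceeding members alongside the median is one way to keep that tail information visible.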
The importance of SHF for advance warning should not be underestimated in densely populated areas such as the Thames basin. Increasing pressures for urban development, intensification of agriculture, and clean water demand a more spatially and temporally integrated approach to management of the water sector (Mansour et al. 2013; Lewis et al. 2015). There is also growing evidence to support an increasing likelihood of Atlantic storms that take a more southerly track akin to DJF 2014 (Slingo et al. 2014), and while the contribution of climate change cannot be definitively related to changes in the U.K. hydrological response (Hannaford 2015), even a small shift in mean climate variability could substantially shorten the return periods of such events (Knight et al. 2017). Further studies that trace the meteorological input improvements right through the meteorological-hydrological forecasting chain are therefore strongly advocated.

Conclusions
Atmospheric relaxation experiments can improve our understanding of extratropical anomalies and the potential predictability of extreme events such as DJF 2014. Our results highlight that there is meteorological knowledge to be gained by considering the hydrology; that is, although large-scale seasonal flow anomalies were picked up in the meteorology, these did not always translate through to more skillful hydrological forecasts. Extreme events such as DJF 2014 are difficult to predict with confidence at seasonal time scales, but considering the local hydrogeological context for streamflow and groundwater levels can provide an effective early alert of potentially high-impact events, allowing for better preparedness and greater confidence in forecasts as an event approaches.

FIG. B1. CRPS for S4 run using 2-51 ensemble members: (a) forecast against streamflow observations (CRPS_obs) and (b) forecast against simulated EFAS-WB (CRPS_sim). Tables outline relative percentage differences in CRPS achieved with 28- and 51-member ensembles.
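For readers less familiar with the continuous ranked probability score (CRPS) reported in Fig. B1, the following Python sketch shows one standard empirical estimator for an ensemble forecast, CRPS = mean|x_i - y| - 0.5 mean|x_i - x_j|, where lower values indicate a better forecast. It is offered as an assumption-laden illustration: the estimator actually used in this study is not stated in this section, and the function name `ensemble_crps` is hypothetical.

```python
def ensemble_crps(members, observation):
    """Empirical CRPS of an ensemble forecast against a single value.

    members: forecast values, one per ensemble member.
    observation: the verifying value (an observation, or a simulated
    reference such as a water-balance run).
    """
    n = len(members)
    # Mean absolute error of the members against the verifying value.
    term_obs = sum(abs(x - observation) for x in members) / n
    # Mean absolute difference between all pairs of members (spread).
    term_spread = sum(abs(xi - xj) for xi in members for xj in members) / (n * n)
    return term_obs - 0.5 * term_spread


# Two-member toy ensemble centred on the verifying value.
score = ensemble_crps([1.0, 3.0], 2.0)
```

For a single-member ensemble the spread term vanishes and the score reduces to the absolute error, which is one reason CRPS is convenient for comparing ensembles of different sizes, such as the 28- and 51-member cases mentioned in the caption.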