Atmospheric rivers (ARs) are global phenomena that transport water vapor horizontally and are associated with hydrological extremes. In this study, the Atmospheric River Skill (ATRISK) algorithm is introduced, which quantifies AR prediction skill in an object-based framework using Subseasonal to Seasonal (S2S) Project global hindcast data from the European Centre for Medium-Range Weather Forecasts (ECMWF) model. The dependence of AR forecast skill is globally characterized by season, lead time, and distance between observed and forecasted ARs. Mean values of daily AR prediction skill saturate around 7–10 days, and seasonal variations are highest over the Northern Hemispheric ocean basins, where AR prediction skill increases by 15%–20% at a 7-day lead during boreal winter relative to boreal summer. AR hit and false alarm rates are explicitly considered using relative operating characteristic (ROC) curves. This analysis reveals that AR forecast utility increases at 10-day lead over the North Pacific/western U.S. region during positive El Niño–Southern Oscillation (ENSO) conditions and at 7- and 10-day leads over the North Atlantic/U.K. region during negative Arctic Oscillation (AO) conditions and decreases at a 10-day lead over the North Pacific/western U.S. region during negative Pacific–North America (PNA) teleconnection conditions. Exceptionally large increases in AR forecast utility are found over the North Pacific/western United States at a 10-day lead during El Niño + positive PNA conditions and over the North Atlantic/United Kingdom at a 7-day lead during La Niña + negative PNA conditions. These results represent the first global assessment of AR prediction skill and highlight climate variability conditions that modulate regional AR forecast skill.
Atmospheric rivers (ARs) are narrow plumes of strong horizontal water vapor transport that are typically found in the midlatitudes ahead of the cold front of an extratropical cyclone (Zhu and Newell 1998; Ralph et al. 2004; Neiman et al. 2008; Cordeira et al. 2013; American Meteorological Society 2017). They can intensify downstream precipitation and influence flooding, snowpack, and water availability (e.g., Ralph et al. 2004, 2005, 2006; Neiman et al. 2008; Dettinger et al. 2011; Lavers and Villarini 2015). The abundance or deficit of ARs strongly influences the availability or lack of freshwater in a given water year (Guan et al. 2010; Dettinger et al. 2011; Dettinger 2013). In addition, heavy precipitation events, flooding conditions, and hazardous winds are strongly correlated to the occurrence of ARs in many regions around the globe (Ralph et al. 2006; Ralph and Dettinger 2011; Ralph and Dettinger 2012; Sodemann and Stohl 2013; Lavers and Villarini 2015; Nayak et al. 2016; Ralph et al. 2016; Waliser and Guan 2017).
ARs can have both beneficial (e.g., replenishing water reservoirs) and detrimental (e.g., causing damaging floods and landslides) impacts on regional economics and public safety. Consequently, there is significant incentive at the intersection of the research, resource management, hazard preparedness and response, and policymaking communities to understand the linkage between AR landfall and subsequent impacts. This includes the need to quantify the skill of predicting AR occurrence, especially in terms of landfall, and to identify large-scale climate variability conditions during which forecasts of ARs are more or less skillful.
Given the importance of forecasting AR activity, recent studies have assessed prediction skill in ensemble forecasts of ARs. Wick et al. (2013) assessed AR landfall prediction skill of five operational ensemble forecast systems [National Centers for Environmental Prediction (NCEP), European Centre for Medium-Range Weather Forecasts (ECMWF), the Met Office (UKMO), the Canadian Meteorological Centre (CMC), and the Japan Meteorological Agency (JMA)] over the North Pacific/western U.S. region from 2008 to 2011 and found that AR occurrence is relatively well forecast out to a 10-day lead time (probability of detection > 84% at all lead times, and false alarms < 12% at all lead times), but that AR landfall position is subject to significant errors of over 800 km at a 10-day lead time. Nayak et al. (2014) used five operational control forecasts [four of the five operational models that were used in Wick et al. (2013)] from 2007 to 2013 and concluded that the models are not skillful in forecasting AR activity beyond a 7-day lead time over the central United States due to errors in both AR occurrence and AR landfall location. Note that for each of these studies, only the control forecast was used. In an ensemble forecast framework, which is used in this paper, particular ensemble members may have greater ability than demonstrated by the control forecast.
Considerable progress has also been made recently in understanding limits of prediction skill of AR-related quantities such as integrated vapor transport (IVT) and precipitation (as opposed to explicitly detecting or defining AR objects). Sukovich et al. (2014) evaluated extreme quantitative precipitation forecasts (QPFs) and found higher skill of seasonal extreme QPFs during boreal winter over the contiguous United States. Moore et al. (2015) verified precipitation forecasts from the NCEP Global Ensemble Forecast System from 1985 to 2014 and demonstrated that strong-IVT (likely AR) extreme precipitation events over North America are associated with higher forecast skill than weak-IVT (likely not AR) events out to a 6-day forecast lead time. Other studies have focused on predictability of these AR-related meteorological parameters. Lavers et al. (2014) and Lavers et al. (2016a) distinguished between forecasts of IVT and precipitation and found higher predictability in IVT out to a 9-day lead time over both Europe and the western United States.
The study presented here extends these studies of AR (and AR related) prediction skill and predictability by focusing on a global (rather than regional) domain and by examining the ability of numerical weather prediction models to predict ARs out to a 14-day lead time, with an added focus on forecasts that are initialized during certain conditions of large-scale climate variability.
2. Data and methods
a. ECMWF ensemble hindcasts: A subset of the S2S database
To assess the prediction skill of ARs, global hindcast data from the newly created Subseasonal to Seasonal (S2S) Project database (Table 1 of Vitart et al. 2017) are used. The analysis is limited to one of the S2S models, the ECMWF model, over the period from 1996 to 2013. The model data are retrieved at a 1.5° × 1.5° grid, which is the default archive grid for ECMWF hindcasts in the S2S database. Hindcasts are made twice a week, and there are 10 perturbed forecast members available. Ocean coupling is included in the ECMWF hindcasts, but sea ice coupling is not. Hindcasts are made “on the fly,” which means that ECMWF updates the version of their model at various times during a given year. The ECMWF hindcasts used in the present study were produced with model versions between May 2015 and May 2016 and therefore used mostly Integrated Forecast System (IFS) model versions 41r1 (41r2 after March 2016).
The ECMWF interim reanalysis (ERA-I) dataset (Dee et al. 2011a) is used as a proxy observation. The reanalysis data are available on a 0.75° × 0.75° grid and are bilinearly interpolated to match the coarser hindcast grid.
Ten perturbed ECMWF hindcast ensemble members are analyzed (the control hindcast is not included because it was not available for download at the time of data retrieval), as is ERA-I data from 1996 to 2013 at 0000 UTC using specific humidity q and zonal u and meridional υ winds at 300-, 500-, 700-, 850-, and 1000-hPa vertical levels. The zonal and meridional components of the IVT [defined as the vertical integral of the product of specific humidity and zonal/meridional wind; see Zhu and Newell 1998, their Eqs. (7a) and (7b)] are calculated first. Subsequently, the Guan and Waliser (2015, hereafter GW2015) AR detection algorithm, which identifies ARs objectively as contiguous regions (objects) where IVT at each grid cell exceeds the 85th percentile specific to the region and season and with the length of the object > 2000 km and length-to-width ratio > 2, is applied on these IVT fields. This algorithm detects ARs globally in the manner described above as a function of ensemble member and lead time (1–14 days at a 1-day step). Focus is placed on the November–March (NDJFM) and May–September (MJJAS) seasons.
b. Object-based detection for forecast verification
Object-based forecast verification involves evaluating a model’s ability to predict sporadic, episodic atmospheric phenomena that are heterogeneous in space and time (Ebert and McBride 2000; Davis et al. 2006). Recent studies have explained how traditional methods of verifying numerical weather forecasts (e.g., root-mean-square error) of relatively smooth fields (e.g., upper-tropospheric winds) may not be suitable for verification of episodic, localized features (e.g., atmospheric rivers) that have considerable spatiotemporal variability (Ebert 2008; Mittermaier et al. 2015).
The GW2015 algorithm employs an object-based approach and detects ARs based on synoptic features that represent a rough consensus of AR characteristics (IVT, geometry, etc.). The algorithm introduced in this study takes these AR objects as input, identifies the latitude and longitude coordinates of their IVT-weighted centroid, and compares the distance between any grid cell and all AR object centroids to a user-defined distance threshold. This process is described in more detail in sections 2c and 3b(1).
c. ATRISK algorithm description
The Atmospheric River Skill (ATRISK) algorithm helps us achieve two primary goals in this paper. The first goal is to compute the percentage of ECMWF ensemble members that can skillfully predict an observed AR (i.e., AR hit percentage) as a function of season, location, lead time, and an acceptable (great circle) distance threshold between the centroid of the observed and forecasted AR (1000, 500, and 250 km tested in this study). The great circle distance D between the centroid of a given grid cell and the centroid of an AR object is calculated as
where R = radius of Earth in kilometers and
for latitude φ and longitude λ.
The second goal is to compute relative operating characteristic (ROC) curves (Hanley and McNeil 1982) for ARs, which account for hits (observed AR and forecasted AR), misses (observed AR, but no forecasted AR), false alarms (no observed AR, but forecasted AR), and correct rejections (no observed AR and no forecasted AR). ROC curves have been previously used in applied weather and climate studies that seek to discern the utility of operational forecasts of atmospheric and/or oceanic fields (e.g., Mason and Graham 2002) and have recently been used to evaluate how well IVT can discriminate extreme precipitation events (Lavers et al. 2016b). Hit rates are calculated as [number of hits/(number of hits + number of misses)] and are equivalent to probability of detection (POD). False alarm rates are calculated as [number of false alarms/(number of false alarms + number of correct rejections)] over a given time period of interest. This quantity is distinct from false alarm ratio (FAR), which is often implemented as an operational forecast verification tool and is calculated as [number of false alarms/(number of false alarms + number of hits)] (see Barnes et al. 2009). Area under ROC curve values (ROC scores), which are introduced in section 3b(2), are proportional to the critical success index (CSI), which is defined as [number of hits/(number of hits + number of misses + number of false alarms)]. The ROC sensitivity to season, location, lead time, and states of various modes of climate variability is further examined.
The first goal diagnoses the ability of ECMWF to forecast an observed AR 1–14 days in advance, while the second objective assesses AR forecast utility in ECMWF by explicitly considering all contingency table values (hits, misses, false alarms, correct rejections). The ROC is introduced to achieve the second objective as a way to expand the scope of this study and more thoroughly assess model performance. Note that if one only uses the hit percentage from the first part of this study and does not include ROC analysis (or a similar metric), a model that forecasts ARs everywhere at all times will appear to be perfect. It is only by considering false alarms, misses, and correct rejections (in addition to hits) that this can be adequately accounted for. Both approaches together allow us to quantify sensitivities of AR prediction skill to region, lead time, season, and background phases of climate variability at hindcast initiation.
ATRISK follows the steps below to achieve the first goal. Additional steps are given in section 3b(1) that allow for accomplishing the second goal.
Retrieve observed (ERA-I) and modeled (ECMWF) AR centroid longitude and latitude values. For ECMWF, this is done for all lead times (1–14 days) and ensemble members 1–10.
For each day, assign the value of 1 to all grid cells whose distance is within D kilometers of the centroid of an observed AR object present at that given time step. Other grid cells are assigned the value of 0.
For each modeled hindcast (at a given lead time and for a given ensemble member), assign the value of 1 to all grid cells across the globe whose centroid is within D kilometers of the nearest AR object at that given time step. Shaded grid cells are saved with values of 1.
Using the 10 ensemble member shaded fields from step 3, compute an ensemble-average field for each hindcast and lead time at each grid cell.
As an example, consider all 1996–2013 4 January hindcasts in the ECMWF dataset, and consider the 7-day lead time. In step 3, global maps are created with shaded values of 1 that correspond to grid cells that are within D kilometers of the nearest AR object centroid. This is done for each ensemble member, yielding 10 global maps with shaded orbs (which have values of 1), similar to the orbs shown in Fig. 1b. Step 4 then creates an ensemble-averaged field from these 10 maps, with “ensemble-average orbs” ranging between 0 and 1.
A visual summary of the GW2015 and ATRISK algorithms is shown in Fig. 1. AR objects at 0000 UTC 2 March 1996, as detected by the algorithm introduced in GW2015, are shown in Fig. 1a and are shaded in blue. ATRISK objects within 1000 km of the AR objects are shown in Fig. 1b and are also shaded in blue. The method of determining if a predicted AR is a “hit” or “miss” relative to an observed AR is shown in Fig. 2. In this example, the prediction of AR1 is considered skillful (a hit) since its centroid falls within the distance threshold of the observed AR, while the prediction of AR2 is not considered skillful (a miss) since its centroid falls outside the distance threshold of the observed AR.
To achieve the first goal, all observed AR objects are identified, and their centroid longitude and latitude are compared to each ensemble member AR object (as a function of lead time). These results are described in section 3a and with Figs. 2–4. Achieving the second goal is more complex, and the steps taken to create Figs. 5–8 are described in section 3b.
3. Results and discussion
a. Global climatology of AR hits and sensitivity to lead time, season, and distance threshold
The average 1996–2013 NDJFM percentage of ECMWF ensemble member AR hits at a 7-day lead time is shown globally in Fig. 3 using a 1000- and 500-km distance threshold between observed and forecasted ARs. The blue polygons drawn in Fig. 3a denote the area average domains used throughout this study. This is the first approach used to assess global AR prediction skill (i.e., focusing only on hits, namely, the fraction of ensemble members that forecast an AR within the distance threshold).
For a given distance threshold, hit percentages are generally higher over ocean basins than over land regions, and hit percentage decreases by 25%–35% in most locations for the 500-km threshold relative to the 1000-km threshold. Because ARs are sporadic phenomena, the global hit percentage map for a single hindcast would contain some grid cells with missing hit percentage values, since the presence of observed and forecasted ARs is less likely in a smaller sample of data. Using all hindcasts from a single season (as in Figs. 3a,b) has the effect of smoothing the statistics in space, since more observed and forecasted AR pairs are captured in a larger sample. There are still areas in Figs. 3a and 3b where NA (missing) values are present due to a lack of observed and forecasted ARs. This might occur, for example, in regions of low climatological AR frequency (e.g., eastern equatorial Pacific; see GW2015, their Fig. 7a), or because the twice-a-week hindcast frequency omits some observed ARs. During boreal summer, hit percentages are highest in the Southern Hemisphere and decrease by 20%–30% in most North Hemispheric regions for both 1000- and 500-km distance thresholds (not shown).
Area-averaged AR hit percentages are shown as a function of lead time, season (colors), and distance threshold (1000 km, solid lines; 500 km, dashed lines; 250 km, dotted lines) in Figs. 4a–d for the North Pacific/western United States (35°–45°N, 150°–125°W), North Atlantic/United Kingdom (45°–60°N, 30°W–0°), South Pacific/Australia (30°–40°S, 145°–170°E), and South Pacific/Chile (30°–40°S, 110°–70°W) regions during NDJFM and MJJAS. These regions are chosen because they coincide with midlatitude storm tracks, are upstream of heavily populated regions, and include coastlines where AR landfalls are frequent (GW2015, their Fig. 8a). GW2015 identified NDJFM as a climatologically active AR season for the Northern Hemispheric regions and MJJAS as a climatologically active AR season for the Southern Hemispheric regions (see their Fig. 9b). Mundhenk et al. (2016) and Eiras-Barca et al. (2016) identified boreal winter (DJF) as a climatologically active season for ARs relative to boreal summer over the North Pacific/western U.S. and North Atlantic/U.K. regions, respectively, used in this study. In addition, for all regions, AR landfalls can have major socioeconomic consequences in association with extreme precipitation and winds (e.g., Waliser and Guan 2017). NDJFM and MJJAS are used instead of DJF and JJA in order to increase the sample size of AR hits for a given season, thus increasing the signal-to-noise ratio. In addition, for the west coast of North America and northwestern Europe, ARs are quite typical during this extended winter season.
Horizontal solid lines represent the observed ERA-I 1996–2013 reference climatologies of AR frequency for the given region, season, and distance threshold (DT). These lines represent the long-term chance of a grid cell (or in this case, a regional average of grid cells) being within DT kilometers of an observed AR (with DT = 1000, 500, or 250 km), or put another way, the skill based on using climatology as the forecast model. Therefore, the hit percentages shown in Fig. 4 can be thought of as skillful (or useful) relative to the observed climatological AR frequency (i.e., the long-term chance of a grid cell being within DT kilometers of an observed AR) if their values at a given lead time are above the horizontal line (the observed AR frequency climatology). A calculation of a similar reference climatology is made for ECMWF at all lead times and shows a very close agreement with the observed reference climatology in all regions, indicating that there are no significant model biases in AR occurrence in these regions (even out to 28 days, not shown).
AR hit percentage is most variable over the North Pacific/western U.S. region, while both Southern Hemispheric regions show little seasonal variability. In all regions, hit percentages asymptote to a “noise floor” around 10-, 8-, and 6-day lead times for the 1000-, 500-, and 250-km thresholds, respectively. The noise floor for each distance threshold and seasonal pairing is very close to the observed reference climatology, which is consistent with the interpretation that the values of the asymptotes represent the odds of a random AR falling within the associated distance threshold of an observed AR.
The lead time at which AR hit percentage equals 50% as a function of distance threshold is shown in Fig. 5a for the North Pacific/western U.S. and South Pacific/Chile regions during NDJFM (black) and MJJAS (blue). Such information is an important consideration if the landfall location and associated landfall distance error is relevant, for example, for reservoir management or hazard response. For example, if a decision-maker, impacted by North Pacific/western U.S. ARs, is able to make a decision based on at least 50% hit percentage but requires distance errors to be no larger than 250 km, then the forecast may not generally be reliable in this sense until 2–3 days before the landfall occurs. On the other hand, if the decision can be made from a consideration of 1000-km landfall error, then the forecast may be useful at a lead time of 7 days.
The AR hit percentage at a 7-day lead time as a function of distance threshold for the same regions and seasons is shown in Fig. 5b, along with reference climatology values. This information is relevant for medium-range weather forecasting or reservoir management. For example, if a water reservoir manager in California can improve his or her management by knowing that the climatological chance of a forecasted AR occurring within 500 km of an observed AR at their reservoir based exceeds 20%, only boreal winter (NDJFM) forecasts would be reliable compared to boreal summer (MJJAS) forecasts.
Results from previous studies generally corroborate the findings presented here, though no studies to date have estimated prediction skill on a global scale at lead times extending out to 14 days of the AR objects themselves in the manner of Figs. 2–4. Wick et al. (2013) computed the correlation coefficient between three boreal winter seasons (2008–11) of observed integrated water vapor (IWV) fields [from Special Sensor Microwave Imager (SSM/I) satellite data] and five numerical weather prediction model control forecasts of AR-related IWV over the North Pacific at 1-, 3-, 5-, 7-, and 10-day lead times. In most models, the correlation coefficient reduces by approximately 30% between 7- and 10-day leads. Though not directly comparable due to differences in methodology, record length, and domain size, the results in Fig. 4a are generally consistent, as the North Pacific/western U.S. hit percentages during NDJFM decrease by approximately 30% of their initial values at around 7- to 8-day lead times. Lavers et al. (2016a) estimated predictability of IVT over the eastern North Pacific/western U.S. region from the NCEP Global Ensemble Forecast System (GEFS) hindcast data from 1984 to 2015 and found indications of predictability out to a 9-day lead time (and out to a 7-day lead time for precipitation). Note that in this study, prediction skill is quantified rather than estimating predictability, and thus skill values are expected to be less than predictability values.
An additional calculation of AR prediction skill is now made using ROC curves, which explicitly account for false alarms in addition to hits. To illustrate the need for this consideration in simple terms, consider a forecast model that forecasts ARs everywhere all the time. Its hit percentage would be perfect all the time, yet it would actually be a poor forecast model as it would be overly fraught with false alarms.
b. Assessing AR prediction skill using ROC curves
The ROC methodology mentioned in section 2c is now employed as a second approach to assessing global AR prediction skill. To account for both hits and false alarms, ROC curves and areas are calculated for ARs using the ERA-I and ECMWF hindcast ensemble datasets.
To calculate ROC curves for a given great circle distance threshold of D kilometers, the following steps are used:
Follow steps 1–4 outlined in section 2c.
Use a range of trigger thresholds above which decision-makers might issue warnings or enact contingency plans for an impending impactful event. Here, the trigger thresholds are 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0. They represent the fraction of ECMWF ensemble members that agree on a forecast of an AR. Then, compare each trigger threshold to the fields created in steps 1–4 from section 2c. If the average forecast probability shaded field is greater than the trigger threshold value, store a value of “1.” If not, store a value of “0.”
Compute contingency table fields as follows, as a function of time and trigger threshold, at each grid cell:
Hit field: store value of 1 if the observed field = 1 (observed AR present within D kilometers), and the model field = 1 (modeled AR present within D kilometers at the given threshold). Otherwise, store a 0.
Miss field: store value of 1 if the observed field = 1 (observed AR present within D kilometers), and the model field = 0 (no modeled AR present within D kilometers at the given threshold). Otherwise, store a 0.
False alarm: store value of 1 if the observed field = 0 (no observed AR present within D kilometers), and the model field = 1 (modeled AR present within D kilometers at the given threshold). Otherwise, store a 0.
Correct rejection: store value of 1 if the observed field = 0 (no observed AR present within D kilometers), and the model field = 0 (no modeled AR present within D kilometers at the given threshold). Otherwise, store a 0.
Compute hit rates and false alarm rates and create ROC curves and global maps of the area under the ROC curves.
2) Long-term mean ROC curves
The 1996–2013 average ROC curve for NDJFM over the North Pacific/western U.S. region at various lead times is shown in Fig. 6a. Triangles denote hit rate and false alarm rate values at a given trigger threshold. The threshold labels below the x axis correspond to the triangles on the plot. At short (3 and 5 days) lead times, the ROC curves are closer to the top-left corner of the graph, indicating that forecasts of ARs at those lead times are useful (i.e., high hit rates and small false alarm rates). At longer (14 days) lead times, the ROC curves are closer to the solid black line, which means that a forecast of an AR at those lead times is only marginally more useful than a random guess (i.e., equal probability of hit and false alarm). This representation of AR prediction skill improves the understanding of the extent to which an AR forecast is more skillful than a random guess and distinguishes actual skill from stochastic chance. We will introduce a more formal statistical significance test in section 3b(5) to quantify the extent to which ROC curves composited on various modes of climate are significantly different from long-term average ROC curves. This comparison more explicitly distinguishes a forecast ROC curve from a random guess. The 1996–2013 average area under the ROC curve calculated at each grid cell globally at a 7-day lead time is shown in Figs. 6b and 6c using a 1000- and 500-km distance threshold. Area under ROC curve values (or “ROC scores”) for the 1000-km threshold uniformly range between 0.7 and 0.9 at most places, while values at the 500-km threshold are smaller by about 0.1–0.2 and noisier.
Area-averaged ROC scores are shown in Fig. 7 as a function of lead time, season (colors), and distance threshold (as in Fig. 4). The bold horizontal line at 0.5 represents the baseline for a useful forecast, since ROC scores that equal 0.5 indicate an equal probability of hits and false alarms. ROC scores, which explicitly account for both AR hits and AR false alarms, are higher during boreal winter in the North Pacific/western U.S. and North Atlantic/U.K. regions, especially at 1000- and 500-km distance thresholds.
3) Motivation for assessing AR prediction skill conditioned on various states of the climate
A conditional verification is now undertaken using hindcasts that are initialized during positive and negative phases of several modes of climate variability. The purpose of this analysis is to test whether or not additional forecast skill and utility can be gained during certain phases of climate mode variability. The relationship between AR forecast utility and climate variations is examined here using the El Niño–Southern Oscillation (ENSO), the Arctic Oscillation (AO), and the Pacific–North America (PNA) teleconnection pattern. These climate variations are a focus because AR frequency is sensitive to the phase of each of these modes over particular regions around the globe (GW2015). In addition, ENSO phase can be a useful predictor not only for seasonal forecasts of precipitation (Pezzi and Cavalcanti 2001; Coelho et al. 2002; Taschetto et al. 2016), but for subseasonal and synoptic time-scale forecasts as well (e.g., Li and Robertson 2015). Black et al. (2017) also examined subseasonal prediction of the modes of climate variability themselves [namely, AO, PNA, and the North Atlantic Oscillation (NAO)]. Their results suggest that the AO and the PNA, along with the NAO, can be predicted with statistically significant skill by a statistical model based on partial least squares regression up to lead times of 4–5 weeks relative to climatology and persistence.
Emphasis is also placed on phase-locked ENSO–PNA conditions because there is some evidence for useful prediction skill of ENSO on monthly-to-seasonal time scales (Dayan et al. 2014; Barnston et al. 2012), and for a strong relationship between PNA circulation anomalies and El Niño (Abid et al. 2015). Muñoz et al. (2015) also suggest that skillful prediction of extreme precipitation can be increased during particular phase-locking configurations of ENSO, the Atlantic meridional mode (AMM), and other potential predictor modes of variability. In addition, it will be shown in Fig. 9 that AR frequency anomalies are magnified during these phase-locked conditions relative to the individual mode responses.
The conditional ROC curves shown in this section might be a useful reference, for example, in issuing a forecast over a given region where AR activity projects strongly onto precipitation. If a forecaster knows that AR frequency anomalies are positive or negative during a certain mode phase, it may be useful to know the relative utility of AR forecasts during those phases (as quantified by the conditional ROC curves) as a parameter to consider when making a forecast.
A more detailed overview of atmospheric teleconnections related to these modes of variability is provided below.
4) Overview of atmospheric teleconnections and description of climate indices used
Large-scale extratropical midlatitude dynamics can be strongly influenced by remote variations in tropical Pacific sea surface temperature (SST) associated with ENSO (Bjerknes 1969; Rasmusson and Wallace 1983; McCabe and Dettinger 1999; Meehl et al. 2007; DeFlorio et al. 2013). This connection between climate mode variability (e.g., ENSO) and atmospheric circulation can have significant impacts on the orientation and frequency of ARs and resultant extreme precipitation (Cordeira et al. 2013; GW2015). In the North Pacific, variations in extratropical geopotential height anomalies that define the PNA are associated with distinct weather variations over the United States and have been shown to respond linearly to changes in tropical Pacific SST (Tsonis et al. 2008; Wallace and Gutzler 1981). Negative PNA conditions are associated with a weakening of the Aleutian low (Renwick and Wallace 1996) and more zonal flow over the North Pacific/western U.S. midlatitudes (Leathers et al. 1991, their Fig. 1).
The following climate indices, obtained from the National Oceanic and Atmospheric Administration’s Climate Prediction Center, are used in this study: 1) the Niño-3.4 index (mean SST anomalies in the central/eastern equatorial Pacific between 5°S and 5°N and between 190° and 240°E) and 2) daily values of the AO and PNA teleconnection indices, which are computed by projecting daily geopotential height anomalies at 1000 hPa (for AO) and 500 hPa (for PNA) onto their empirical orthogonal function loading patterns.
5) Climate-conditioned ROC curves
ROC curves for positive and negative phases of each mode (defined as 0.5 standard deviation above or below zero based on their respective indices) at hindcast initialization over the North Pacific/western U.S. region during DJF (for ENSO), the North Atlantic/U.K. region during DJF (for AO), and the North Pacific/western U.S. region during DJF (for PNA) at 3-day (solid line), 7-day (dashed line), and 10-day (dotted line) lead times are shown Figs. 8a–c. Focus is restricted to DJF in an attempt to capture periods where teleconnected climate variability is most prevalent. To estimate the statistical significance of the results, a bootstrap process is repeated 1000 times with replacement to create ROC score distributions (shown to the right of each ROC curve plot) by using resampling of the positive mode days (red box plots), negative mode days (blue box plots), and “all days” distribution (white boxplots). If the difference between the actual ROC score of the all-days distribution and either positive or negative mode distribution lies outside of the 5th or 95th percentile of the distribution of bootstrap ROC score differences, then the ROC curve circles (representing each trigger threshold value) are filled. If the difference is insignificant at the 90% confidence level, then the ROC curve circles are unfilled. ROC curves that have large ROC score differences between positive and negative phases in the bootstrapped samples are shown in Fig. 8.
These results indicate that AR hindcasts from the ECMWF model are more useful (i.e., relatively higher ratio of hits to false alarms) during El Niño conditions over the North Pacific/western U.S. region at a 10-day lead time in boreal winter, and during negative AO conditions at a 7-day lead time over the North Atlantic/U.K. region boreal winter. Hindcasts are less useful (i.e., relatively lower ratio of hits to false alarms) during negative PNA conditions at a 10-day lead time over the North Pacific/western U.S. region in boreal winter. Observed AR frequency anomalies are relatively large for the region and mode pairings examined here (shown in GW2015), which is why these pairings are shown. AR hindcast utility is also higher during La Niña and negative PNA conditions over the North Atlantic/U.K. region at a 7-day lead time in boreal winter (Fig. S1 in the supplemental material).
Observed anomalies of AR occurrence (1996–2013, ERA-I) are shown in Fig. 9 for phase-locked conditions of ENSO and PNA (El Niño and +PNAin Fig. 9a; La Niña and −PNA in Fig. 9b). AR occurrence is defined as the number of ARs over a 14-day time window covering a grid cell. Anomalies are computed as the AR occurrence values on phase-locked ENSO/PNA days minus the long-term mean AR occurrence values. The spatial pattern of the individual mode AR occurrence anomalies is reminiscent of the individual mode AR frequency anomalies shown in GW2015. It is clear from Fig. 9 that over the 1996–2013 observed period, phase-locked AR occurrence anomalies are enhanced relative to the individual mode magnitudes over certain regions shown in GW2015 (e.g., North Pacific/western United States).
ROC curves for positive and negative phases of the phase-locked mode conditions are shown in Fig. 10 at 3-day (solid line), 7-day (dashed line), and 10-day (dotted line) lead times during DJF over the North Pacific/western U.S. region and over the North Atlantic/U.K. region, along with the ROC scores from the bootstrapped distributions for each phase-locked mode conditions at 10- and 7-day lead times, respectively. Mean area under ROC curve scores for bootstrapped samples shown in Figs. 8 and 10 are displayed in Table 1. Over the North Pacific/western U.S. region at a 10-day lead, the difference between El Niño and +PNA and La Niña and −PNA mean ROC score shown in Fig. 10a is twice as large as the difference between either the El Niño and La Niña mean ROC score shown in Fig. 8a or the +PNA and −PNA mean ROC score shown in Fig. 8c. Similarly, over the North Atlantic/U.K. region, the difference shown in Fig. 10b is larger than the differences between the individual mode mean ROC scores shown in Fig. S1. These results indicate that the difference in AR forecast utility on positive and negative phase-locked conditions can be even larger than on positive and negative conditions of the individual modes themselves, suggesting the potential for “forecasts of opportunity” when a forecast is initialized during these phase-locked periods.
Although the influence of ENSO and PNA on North Pacific/western U.S. climate is well known, the linkage between these modes and circulation over the North Atlantic/U.K. region is less obvious. Brönnimann (2007) summarized the impact of ENSO on European climate and noted that in late winter, El Niño (La Niña) conditions generally resemble negative (positive) NAO conditions, but that the signal is different in early winter and spring. This review paper also notes that the mean ENSO signal over Europe is not strong, but can vary and is more important if the local atmosphere is preconditioned in a particular flow regime. Stan et al. (2017) note that in general, a primary response of the extratropical circulation to organized tropical convection is modified Rossby wave propagation and midlatitude storm track amplitude and orientation. Song et al. (2009) found an anticorrelation between the PNA and NAO during boreal winter that they attributed to anomalous Rossby wave breaking events associated with the PNA pattern. Similarly, Drouard et al. (2015) found that geopotential height anomalies over the North Pacific/western U.S. region associated with ENSO and/or the PNA pattern modify the direction of downstream wave propagation and subsequent wave breaking over the North Atlantic region. It is therefore plausible that individual and phase-locked ENSO and PNA conditions could affect climate (and therefore prediction skill of ARs, which is intimately related to regional circulation) over the North Atlantic/U.K. region via anomalous upstream Rossby wave breaking events in the midlatitudes (ENSO projects onto midlatitude North Pacific climate by modulating the phase or amplitude of the subtropical jet, which also influences the PNA pattern, and these wave perturbations are then felt downstream over Europe days later).
This study builds on recent advances in objective detection of atmospheric rivers that can be applied to both observations (GW2015) and global weather and climate models (Guan and Waliser 2017) and the availability of subseasonal hindcasts from operational models (Vitart et al. 2017) to quantify global prediction skill of ARs using two decades of ECMWF hindcast data at lead times ranging from 1 to 14 days. AR prediction skill is first quantified by diagnosing the ability of the ECMWF model to predict an observed AR as a function of lead time and distance threshold between the centroid of an observed and forecasted AR, averaged over all hindcasts. The average percentage of ECMWF ensemble members that skillfully predict an AR shows highest variability during boreal winter in the North Pacific/western United States (Fig. 4). Southern Hemispheric regions exhibit little seasonal dependence in AR forecast skill. Over the North Pacific/western U.S. region, the lead time at which AR hit percentage equals 50% increases by 2.5 days during NDJFM relative to MJJAS using a 1000-km threshold, and the AR hit percentage at a 7-day lead time increases by 15%–20% during NDJFM relative to MJJAS using a 1000-km distance threshold.
This study also examines 1996–2013 average ROC curves, which explicitly account for both hits (forecasted AR and observed AR) and false alarms (forecasted AR but no observed AR). The long-term mean curves are shown in Fig. 6 and are composited on various phases of climate variability at hindcast initialization in Figs. 8 and 10. This alternative methodology quantifies the extent to which an AR forecast is more useful than a random guess. Evidence for statistical increases in AR forecast utility (i.e., relatively higher ratio of hits to false alarms) at a 10-day lead is demonstrated during El Niño conditions over the North Pacific/western U.S. region in boreal winter and at a 7-day lead during negative AO conditions over the North Atlantic/U.K. region in boreal winter. Evidence for statistically significant decreases in AR forecast utility (i.e., relatively lower ratio of hits to false alarms) is demonstrated at a 10-day lead during negative PNA conditions over the North Pacific/western U.S. region in boreal winter. It is also shown that the difference in AR forecast utility between positive and negative phase-locked conditions of ENSO and PNA is at least twice as large over the North Pacific/western U.S. region and North Atlantic/U.K. region compared to the positive and negative individual mode conditions, suggesting the potential for “forecasts of opportunity” when a forecast is initialized during these phase-locked periods of climate variability.
Future studies must address several aspects of quantifying AR prediction skill that are not explored in this study. Multimodel assessment of AR prediction skill is needed in order to test these results with respect to sensitivities in model configuration, hindcast frequency, ensemble size, and record period. Additional features of the ARs themselves (e.g., orientation with respect to coastline and orography) might also be considered, as they can have significant impacts on resultant extreme precipitation. More targeted metrics of AR skill (e.g., landfall location) might be considered in future studies to more directly target water resource management and decisions applications. The composite ROC results should be calculated using different time periods to account for potential nonstationarity of climate mode teleconnections (e.g., Diaz et al. 2001). Subsequent efforts might also quantify potential predictability limits (e.g., Waliser et al. 2003; Lavers et al. 2016a) of AR objects on subseasonal time scales, as well as AR frequency of occurrence, intensity, and landfall location. Finally, future studies might evaluate forecasts of greater than 7 days using time windows greater than 24 h, which could yield greater skill than the values reported here.
We gratefully acknowledge the availability of the S2S hindcast database that makes this work possible. S2S is a joint initiative of the World Weather Research Programme (WWRP) and the World Climate Research Program (WCRP). The original S2S database is hosted at ECMWF as an extension of the “TIGGE database.” We would like to acknowledge support from the NASA Energy and Water Cycle Program and the California Department of Water Resources. We thank the two anonymous reviewers for greatly improving the quality of the manuscript. M.D.’s and D.W.’s contributions to this study were carried out on behalf of the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.
Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JHM-D-17-0135.s1.