1. Introduction
Coherent, cross-isentropically ascending warm and moist airstreams associated with midlatitude low pressure systems are often referred to as warm conveyor belts (WCBs; e.g., Carlson 1980). These WCBs are not only highly relevant for total and extreme precipitation in many parts of the extratropics (Pfahl et al. 2014), they also have a major effect on the atmospheric dynamics in midlatitudes (e.g., Wernli and Davies 1997; Pomroy and Thorpe 2000; Grams et al. 2011; Madonna et al. 2014b; Binder et al. 2016) and on the amplification of forecast errors (e.g., Lamberson et al. 2016; Martínez-Alvarado et al. 2016; Baumgart et al. 2018; Grams et al. 2018; Rodwell et al. 2018; Berman and Torn 2019; Maddison et al. 2019). Furthermore, the physical and dynamical processes associated with WCBs and their representation in numerical models are important sources of forecast uncertainty (e.g., Joos and Wernli 2012; Joos and Forbes 2016; Maddison et al. 2020). Thus, an adequate representation of WCBs is desirable in NWP and climate models.
First introduced by Browning et al. (1973) and Harrold (1973), WCBs are defined as cyclone-relative airstreams that ascend from the planetary boundary layer to the upper troposphere along vertically sloping isentropic surfaces. Assuming the absence of nonconservative forces, early studies identified WCBs using cyclone-relative streamlines on a wet-bulb potential temperature surface (e.g., Harrold 1973; Browning and Roberts 1994). To quantify this concept of WCBs, Wernli and Davies (1997) introduced a Lagrangian method based on three-dimensional kinematic trajectory calculations. With this approach, WCBs are defined as a set of trajectories along which the specific humidity decreases in 48 h by at least 10 g kg−1 (Wernli and Davies 1997) or which ascend in 48 h by at least 600 hPa (Madonna et al. 2014b). This Lagrangian approach is widely used and the analysis of physical processes along the trajectories has significantly advanced our understanding of WCBs and their effect on the large-scale flow (e.g., Eckhardt et al. 2004; Grams et al. 2011; Madonna et al. 2014b; Martínez-Alvarado et al. 2016).
The inflow of WCBs is located in a cyclone’s warm sector ahead of the cold front (label 1 in Fig. 1). At this stage, air parcels still reside predominantly in the planetary boundary layer. WCB inflow is typically characterized by strong moisture flux convergence and a band of high water vapor transport that supplies moisture to the base of the WCB (Wernli 1997; Dacre et al. 2019). The WCB starts to ascend across the warm front of the cyclone (label 2 in Fig. 1), collocated with a region of upper- and lower-tropospheric quasigeostrophic forcing for ascent (Binder et al. 2016). WCB ascent is accompanied by strong latent heat release due to stratiform and convective cloud and precipitation formation (Neiman and Shapiro 1993; Oertel et al. 2019), which is sensitive to the moisture supply in the warm sector (e.g., Field and Wood 2007; Boutle et al. 2011; Schäfler and Harnisch 2015; Berman and Torn 2019; Dacre et al. 2019). Overall, the latent heating increases the potential temperature of the air parcels on average by 20 K (Eckhardt et al. 2004; Madonna et al. 2014b). Below and close to the level of maximum latent heating, a cyclonic potential vorticity (PV) anomaly is produced, which may affect the subsequent life cycle of the associated midlatitude low pressure system (e.g., Kuo et al. 1991; Rossa et al. 2000; Dacre and Gray 2013; Binder et al. 2016). In contrast, above the level of maximum latent heating PV is destroyed (Wernli and Davies 1997; Pomroy and Thorpe 2000). Though the net PV change from WCB inflow to WCB outflow is approximately zero, the latent heat release during the WCB ascent leads to a net cross-isentropic transport of lower-tropospheric low-PV air into the upper troposphere (Grams et al. 2013; Madonna et al. 2014b; Methven 2015) where it generates anticyclonic PV anomalies (label 3 in Fig. 1). 
When this WCB outflow, along with its diabatically driven divergent flow, impinges on the midlatitude waveguide and jet stream, the waveguide is deflected poleward, and a ridge is built (e.g., Pomroy and Thorpe 2000; Grams et al. 2011; Teubler and Riemer 2016; Bosart et al. 2017; Quinting and Reeder 2017). This ridge building may trigger the development and downstream propagation of a baroclinic Rossby wave packet (Röthlisberger et al. 2018), and may contribute to the onset and maintenance of blocking anticyclones (e.g., Pfahl et al. 2015; Grams and Archambault 2016; Steinfeld and Pfahl 2019).

Fig. 1. Conceptual illustration of a WCB and its relation to the large-scale synoptic situation. “L” indicates the center of a low pressure system with its cold front (blue line) and warm front (red line) attached to it. Shaded arrow denotes typical WCB trajectory with colors showing the PV along the trajectory (low PV values in blue, high PV values in red). The gray arrow is the shadow of the WCB on the ground. Blue hatching indicates the upper-tropospheric anticyclonic PV anomaly that is associated with a high pressure system (labeled “H”) at the surface. Numbers indicate WCB 1) inflow, 2) ascent, and 3) outflow stages.
Citation: Journal of the Atmospheric Sciences 78, 5; 10.1175/JAS-D-20-0139.1

Since WCBs alter the large-scale flow, systematic forecast errors associated with WCBs may project onto the representation of the large-scale flow in NWP and climate models. As an example, the blocking frequency during Northern Hemisphere winter has been shown to be underestimated in NWP as well as in climate models (e.g., Tibaldi and Molteni 1990; Matsueda 2009; Masato et al. 2013; Hamill and Kiladis 2014; Davini et al. 2017). Though increasing the model resolution generally improves the representation of blocking in the Northern Hemisphere (e.g., Matsueda et al. 2009; Schiemann et al. 2017), state-of-the-art NWP models still underestimate the wintertime blocking frequency over the Atlantic–European sector (Matsueda 2009; Quinting and Vitart 2019). Furthermore, the structure and propagation of upper-level Rossby waves have been shown to be systematically misrepresented in several NWP models (Gray et al. 2014; Quinting and Vitart 2019). Since the representation of blocking is closely related to the representation of upper-level Rossby waves (e.g., Altenhoff et al. 2008; Martínez-Alvarado et al. 2018), it would be worthwhile to identify the processes that degrade the representation of Rossby waves. Recent studies have shown that errors in the representation of WCBs may lead to errors in the downstream Rossby wave pattern (e.g., Madonna et al. 2015; Lamberson et al. 2016; Martínez-Alvarado et al. 2016; Baumgart et al. 2018; Grams et al. 2018; Rodwell et al. 2018; Berman and Torn 2019; Maddison et al. 2019). These results suggest that WCBs project and amplify forecast errors from small scales to the large-scale flow, in particular errors related to latent heat release. Accordingly, an adequate representation of WCBs could reduce systematic forecast errors in the Northern Hemisphere large-scale flow.
The only study so far toward a systematic evaluation of WCBs is that of Madonna et al. (2015), who investigated operational medium-range European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecasting System (IFS) forecasts for the three Northern Hemisphere winters of 2002/03, 2006/07, and 2010/11. Although they did not find a systematic bias in WCB activity, they showed that several particularly poor forecasts were indeed related to errors in WCBs. However, a more systematic evaluation of the representation of WCBs in numerical models is still missing. Ensemble reforecast datasets (e.g., Hamill and Kiladis 2014; Vitart et al. 2017; Pegion et al. 2019) that are run with a fixed numerical model provide a particularly good opportunity for such an endeavor. The lack of a systematic evaluation of WCBs in NWP models is likely because the input data required for the trajectory calculations that objectively identify WCBs are insufficient. The calculation requires data at a high spatial [
First, we aim to objectively identify independent predictor variables for the predictands WCB inflow, ascent, and outflow.
Second, we develop and evaluate statistical frameworks that identify gridded binary footprints for WCB inflow, ascent, and outflow using the identified predictor variables.
Potential predictor variables are chosen according to meteorological parameters available in the subseasonal-to-seasonal (S2S) prediction project database that contains model weather forecasts from 11 different NWP centers (Vitart et al. 2017). The choice of parameters available in the S2S database is representative for other large datasets at comparably low spatiotemporal resolution such as NWP reforecasts, climate hindcasts, and climate projections. The representation of WCBs in the S2S prediction project database is documented in a companion study.
The paper is organized as follows: Section 2 introduces the data on which the statistical frameworks are built. The predictor selection and the development of the statistical models are described in section 3 including a discussion of the reasons for choosing logistic regression and the limitations of this approach. In section 4 we evaluate the performance of the models during Northern Hemisphere winter and demonstrate their applicability to an operational ECMWF ensemble forecast of a WCB event during January 2011. The study ends with concluding remarks and an outlook in section 5.
2. Data
a. Predictor dataset
The predictor selection as well as the development and evaluation of the logistic regression models are based on ECMWF’s interim reanalysis data (ERA-Interim; Dee et al. 2011). In this study, we use data for the period 1 March 1979–30 November 2016. The data are output 6 hourly at 0000, 0600, 1200, and 1800 UTC, are remapped from their original T255 spectral resolution to a regular 1° × 1° latitude–longitude grid, and are available on both the original 60 model levels and several pressure levels. As the regression models are designed to be applicable to output of S2S prediction models, we only use selected pressure levels that match those that are available in the reforecast dataset of the S2S prediction project database. These levels are 1000, 925, 850, 700, 500, 300, and 200 hPa. Further, the regression models are based on temperature T, geopotential height ϕ, specific humidity q, horizontal wind components u and υ, and quantities derived from them (Table 1). Thus, variables that are direct output of a model parameterization (e.g., convective precipitation) are not considered even though they would potentially be valuable predictors. Given that the parameterized variables depend on the aforementioned physical parameters T, ϕ, q, u, and υ, including parameterized variables as predictors in the logistic regression models might make it more difficult to evaluate the representation of WCBs in NWP and climate models.
Table 1. List of parameters chosen for stepwise forward selection, with standard variable identifiers applied and vh denoting the horizontal wind vector. Some parameters are scaled with a power of 10 to ease the interpretation of regression coefficients. Parameters are evaluated at single pressure levels of 1000, 925, 850, 700, 500, 300, and 200 hPa. Outflow predictor parameters have also been evaluated vertically integrated between 500–200 and 300–200 hPa, respectively (see Figs. 2f and 4c). Crosses in the three right columns indicate whether the variable was considered as a potential predictor for the respective WCB layer. The definition of the baroclinic moisture flux can be found in McTaggart-Cowan et al. (2017). Here we use their Eq. (11) without considering the translation velocity of the system.


b. Predictand dataset
The predictand fields of gridded WCB inflow, ascent, and outflow are derived from the Lagrangian WCB trajectory data of Madonna et al. (2014b) extended to 2016 (Sprenger et al. 2017), which in turn used model level data from the ERA-Interim dataset as described above (section 2a). Madonna et al. (2014b) identify WCBs based on 48-h kinematic forward trajectories computed with the Lagrangian Analysis Tool (LAGRANTO; Wernli and Davies 1997; Sprenger and Wernli 2015). To find the starting points of possible WCBs, trajectories are started globally in the lower troposphere from an equidistant grid every 80 km in the horizontal and vertically every 20 hPa from 1050 to 790 hPa at 0000, 0600, 1200, and 1800 UTC. As a necessary condition for WCBs, only those trajectories are considered that ascend in 48 h by at least 600 hPa from the lower to the upper troposphere (Madonna et al. 2014b). As an additional criterion, the trajectories need to be matched with an extratropical cyclone identified by the methodology of Wernli and Schwierz (2006) at least once during this 48-h period. Tropical cyclones are excluded by not considering cyclones between 25°S and 25°N for the matching with the potential WCB trajectories. After the WCB trajectories are identified, all WCB parcel locations at a given time are binned into three vertical layers (Schäfler et al. 2014). The first layer extends from the surface to 800 hPa and is referred to as WCB inflow. The second layer comprises air masses between 800 and 400 hPa and is referred to as WCB ascent. Last, the WCB outflow includes air masses above 400 hPa. After binning the data into the three layers, the trajectories are gridded on a 1° × 1° regular latitude–longitude grid. Grid points without (with) a WCB trajectory are labeled as 0 (1), yielding the dichotomous dependent predictands for WCB inflow, ascent, and outflow, respectively.
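The ascent criterion, the layer binning, and the gridding described above can be sketched in a few lines of Python. This is a minimal illustration and not the LAGRANTO-based processing chain: the function names, the treatment of parcels lying exactly at 800 or 400 hPa, and the nearest-grid-point assignment are assumptions made here, and the matching with extratropical cyclones is omitted.

```python
import math

LAYERS = ("inflow", "ascent", "outflow")

def ascends_enough(pressure_hpa, min_ascent_hpa=600.0):
    """Check the ascent criterion for one 48-h trajectory.

    pressure_hpa holds the parcel pressure at successive times, starting at
    the trajectory start. Ascent is measured as the largest pressure decrease
    relative to the start (an assumption of this sketch).
    """
    return pressure_hpa[0] - min(pressure_hpa) >= min_ascent_hpa

def layer_of(p_hpa):
    """Assign a parcel to a WCB layer from its pressure.

    Assumption: parcels at exactly 800 hPa count as ascent and parcels at
    exactly 400 hPa count as outflow.
    """
    if p_hpa > 800.0:
        return "inflow"
    if p_hpa > 400.0:
        return "ascent"
    return "outflow"

def nearest_cell(lat, lon):
    """Nearest point of a regular 1 x 1 degree grid, longitude in 0..359."""
    lat_idx = math.floor(lat + 0.5)
    lon_idx = math.floor(lon % 360.0 + 0.5) % 360
    return lat_idx, lon_idx

def grid_masks(parcels):
    """Return, per layer, the set of grid cells labeled 1.

    parcels is an iterable of (lat, lon, p_hpa) for all WCB parcels at one
    time. Every cell not contained in a set is implicitly labeled 0.
    """
    masks = {name: set() for name in LAYERS}
    for lat, lon, p_hpa in parcels:
        masks[layer_of(p_hpa)].add(nearest_cell(lat, lon))
    return masks
```

Calling `grid_masks` once per time step on the parcels of all trajectories that satisfy `ascends_enough` yields one binary field per layer, which plays the role of the dichotomous predictand in this sketch.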
Madonna et al. (2014b) show that the Northern Hemisphere WCB activity is most pronounced during December, January, and February (DJF). For this reason, our analysis focuses on DJF only, but results for other seasons are also presented in the online supplementary material. The climatological occurrence frequency of WCBs during DJF is strongly related to the midlatitude storm tracks (cf. Madonna et al. 2014b; Sprenger et al. 2017). During DJF, WCB inflow is most frequent over the western North Pacific and the western North Atlantic (Fig. 2a) located on the warm side of the main baroclinic zones and south of the winter storm track entrance regions (e.g., Wernli and Schwierz 2006; Sprenger et al. 2017). In general, the climatological WCB ascent frequency (Fig. 2c) is less than the inflow frequency likely due to the short time that trajectories reside in this layer. WCB ascent occurs most frequently to the north of the main inflow regions. Over the North Atlantic, the region of highest WCB ascent frequency exhibits a southwest to northeast tilt, which corresponds to the tilt of the midlatitude storm track in this region (e.g., Wernli and Schwierz 2006). On the other hand, WCB outflow is most frequent over the central North Pacific and central North Atlantic (Fig. 2e), i.e., downstream of the main WCB inflow and ascent regions.

Fig. 2. Climatological frequency of WCB (a) inflow, (c) ascent, and (e) outflow for DJF in the period 1 Dec 1979–28 Feb 2016 (shading; in %) based on the Lagrangian dataset. Most important predictors according to forward selection scheme for DJF logistic regression models of WCB (b) inflow, (d) ascent, and (f) outflow. Black dots and crosses in (f) indicate grid points at which predictors were identified as being most important when being integrated between 300–200 and 500–200 hPa, respectively. Black contour denotes a climatological WCB frequency of 1% at the respective WCB stage.
3. Logistic regression models
This section provides some background information on logistic regression in general as well as a detailed description of the predictor selection and the development of the logistic regression models. For computational reasons, the initial predictor selection is performed for individual grid points on a 5° × 5° latitude–longitude grid. Predictor selection and model development are performed for each season (i.e., DJF, MAM, JJA, SON) separately and are based on ERA-Interim data at 0000 UTC. For each season, we always use continuous 3-month periods within the available data period 1 March 1979–30 November 2016. We choose 0000 UTC for consistency with the S2S prediction project database (Vitart et al. 2017) in which most NWP centers provide only forecasts valid at 0000 UTC.
a. Background
The above discussion of Eqs. (1) and (2) already reveals a major advantage of the logistic regression approach compared to other classification techniques such as nonlinear support vector machines (Vapnik 1963) or deep learning methods (McGovern et al. 2019): the regression coefficients directly indicate the importance of each predictor. Thus, logistic regression models can be used quite intuitively to examine the relationship between the predictands and the independent predictor variables, and they allow the model’s plausibility to be checked (Dreiseitl and Ohno-Machado 2002). A further advantage is that logistic regression outputs probabilities for a certain feature instead of giving a final classification as, for example, in decision trees (Quinlan 1986). Thus, if a predictor set yields a higher probability for one class than another predictor set for the same class, one can deduce which set is more accurate for the problem. A limitation of logistic regression is that complex, nonlinear relationships are difficult to capture. More complex algorithms such as artificial neural networks often outperform logistic regression (Dreiseitl and Ohno-Machado 2002; Gagne et al. 2019; Chattopadhyay et al. 2020). Given that the focus of this study is the predictor selection and its meteorological interpretation rather than the mere classification performance, logistic regression is chosen.
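To make the interpretability argument concrete, the following Python sketch evaluates a logistic regression model in its standard textbook form and converts a coefficient into an odds ratio. It assumes the usual logistic link with an intercept and standardized predictors; the function names and the example numbers are illustrative and are not taken from the fitted WCB models.

```python
import math

def wcb_probability(beta0, betas, predictors):
    """Conditional probability from a logistic regression model.

    beta0 is the intercept, betas are the regression coefficients, and
    predictors are the (standardized) predictor values at one grid point.
    """
    z = beta0 + sum(b * x for b, x in zip(betas, predictors))
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(beta):
    """Factor by which the odds change per unit increase of one predictor."""
    return math.exp(beta)

# Hypothetical example: intercept -3 and two predictors.
p_low = wcb_probability(-3.0, [1.2, 0.4], [0.0, 0.0])
p_high = wcb_probability(-3.0, [1.2, 0.4], [2.0, 0.0])
```

In this form a positive coefficient raises the probability monotonically, and `odds_ratio(1.2)` states how strongly the odds respond to a one-standard-deviation increase of the first predictor, which is the kind of direct reading that more complex classifiers do not offer.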
b. Stepwise predictor selection
The choice of independent predictor variables is essential not only for the development of the logistic regression models but also for potential future developments of more advanced statistical techniques. We choose initial lists of predictors for WCB inflow, ascent, and outflow based on existing literature on the structure of WCBs and physical considerations (section 1; Table 1). Starting from the lists in Table 1, we determine the most important predictors via a stepwise forward selection scheme (e.g., Hosmer and Lemeshow 2000; Leroy and Wheeler 2008; Slade and Maloney 2013; Mohr et al. 2015; McGovern et al. 2019). Note that for each potential predictor we test all available pressure levels and for some also layer averages as detailed in the caption of Table 1. The predictor selection is based on daily 0000 UTC ERA-Interim data for the period 1 March 1979–30 November 2016.
In stepwise forward selection, predictor variables are sequentially added to an initial intercept-only model, i.e., a model without predictors, according to the following iterative scheme. Starting with only one predictor, we determine the regression coefficient β1 using 10 times 10-fold cross validation, which provides insights on how the models will generalize to an independent dataset. In this approach, the sets of independent (ERA-Interim based predictors) and dependent observations (WCB inflow, ascent, and outflow regions determined from the trajectory analysis) are divided randomly into 10 folds of approximately equal size. The first fold is used as the validation set, and the model itself is fit on the remaining 9 folds. For each iteration of the cross validation, the training data are standardized by removing the mean and by scaling to unit variance. The training data are then used to fit the model. For each iteration of the cross validation the log-likelihood for the intercept-only model L0 and for the model with one predictor variable
Though 10 predictors could be manageable with reanalysis data, computational costs would increase considerably when applying the models to ensemble reforecasts on S2S time scales or to climate model projections. The log-likelihood ratio test reveals that for all four seasons and for all three stages of the WCB, sets of four predictors are significant at the 95% confidence level (Fig. 3). Given that this is well above the minimum threshold of 80%–85% recommended by Hosmer and Lemeshow (2000), we consider four predictors appropriate. For more predictors, the confidence level drops rapidly as shown by increasing p values. Especially for WCB outflow the p value increases markedly with an increasing number of predictors (Fig. 3c); WCB outflow is therefore the limiting factor for choosing four predictors.
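A log-likelihood ratio test of the kind referred to above can be sketched as follows. The sketch assumes the common textbook setup in which one predictor is added per step, so that twice the gain in log-likelihood is compared against a chi-square distribution with one degree of freedom; the function names are illustrative, and the exact test configuration used for Fig. 3 is not reproduced here.

```python
import math

def log_likelihood(labels, probabilities):
    """Bernoulli log-likelihood of binary labels given predicted probabilities."""
    return sum(
        math.log(p) if y else math.log(1.0 - p)
        for y, p in zip(labels, probabilities)
    )

def lr_test_p_value(ll_reduced, ll_full):
    """p value of the likelihood ratio test for one added predictor.

    The test statistic is 2 * (ll_full - ll_reduced). For a chi-square
    distribution with one degree of freedom, the survival function is
    erfc(sqrt(statistic / 2)).
    """
    statistic = max(0.0, 2.0 * (ll_full - ll_reduced))
    return math.erfc(math.sqrt(statistic / 2.0))

# Intercept-only model: the predicted probability is the event frequency.
labels = [1, 0, 0, 0]
ll_intercept_only = log_likelihood(labels, [0.25] * len(labels))
```

A p value below 0.05 then corresponds to the 95% confidence level quoted in the text, and a predictor that adds no information leaves the log-likelihood unchanged and yields a p value of 1.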

Fig. 3. Significance levels of the log-likelihood ratio test for WCB (a) inflow, (b) ascent, and (c) outflow as a function of the number of predictors for all seasons. Lines show the median p value across all 5° × 5° latitude–longitude grid points, dots denote the mean p value, and gray shading denotes the interquartile range.
It should be noted that the confidence level is used to evaluate whether the unique effect of a certain predictor on the conditional probability is zero. If the predictors are highly correlated with each other, then there is little unique effect so that their incorporation into the model does not yield a significant improvement in the model’s performance. Accordingly, variables that are highly correlated with already selected predictors are likely to be filtered out during the forward predictor selection. This is confirmed by calculation of the variance inflation factor (VIF; e.g., Alin 2010) for the final sets of four predictor variables. The VIF measures how much the variances of the logistic regression coefficients increase compared to conditions when the predictor variables are uncorrelated. The larger the VIF, the more the variances are inflated. VIF values in excess of 5–10 are usually considered as indicators of strong correlation among the predictors (e.g., Rogerson 2001; Drobot and Maslanik 2002). For all final sets of four predictors of WCB inflow and ascent, and at all grid points where a predictor selection was performed, the VIF ranges between 0.5 and 2 (not shown). It is only for WCB outflow that the VIF reaches values of 3.5 in regions with the climatologically highest WCB outflow frequency. Based on the range of the VIF values we consider the predictors for each of the three WCB stages as uncorrelated.
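For readers unfamiliar with the VIF, the sketch below computes it in the standard way, as the diagonal of the inverse of the predictor correlation matrix, which is equivalent to 1/(1 − R²) from regressing each predictor on the others. It is a generic, dependency-free illustration with hypothetical function names and is not the code used to obtain the values quoted above.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long predictor columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def variance_inflation_factors(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]
```

For two uncorrelated predictors the function returns values of 1, while two nearly collinear predictors produce values far above the 5–10 range mentioned in the text.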
The most important WCB inflow predictors, as defined by the highest significance level for a given 5° × 5° latitude–longitude grid point, are quantities related to lower-tropospheric moisture flux and its convergence (Fig. 2b). The only exception is the western to central North Pacific (20°–40°N, 140°–170°E) where thickness advection and temperature advection are identified as the primary predictors. The regression model’s suggested link between moisture transport and WCBs is in line with previous studies that used analytical models (Field and Wood 2007), idealized simulations (Boutle et al. 2011), and analysis as well as forecast data (e.g., Berman and Torn 2019; Dacre et al. 2019).
In contrast, the spatial distribution of the most important predictors for WCB ascent is less homogeneous. South of 60°N, the predictor variables of relative vorticity, relative humidity, specific humidity, and quantities related to horizontal moisture flux are identified (Fig. 2d). In general, WCB ascent is associated with saturation and condensation, which is reflected by relative and specific humidity being important predictors. The ascent further redistributes vorticity via stretching so that cyclonic vorticity increases in the lower troposphere. Thus, the most important predictors for WCB ascent identified by the stepwise forward selection scheme are in line with the general concept of the ascent stage of WCBs. Poleward of 60°N and over the central North Pacific, temperature advection and thickness advection are the most important predictors, which is in line with the general concept that WCBs ascend in regions of quasigeostrophic forcing (Binder et al. 2016).
The most important predictor variables for WCB outflow are relative humidity and the irrotational wind (Fig. 2f). Although relative humidity is most important in regions of climatologically high WCB outflow occurrence frequency over the North Pacific and the North Atlantic, the irrotational wind is most important north of 70°N and south of 35°N over the Atlantic, and north of 60°N over the Pacific. The fact that relative humidity and irrotational wind components are identified as the most important predictors matches the conceptual picture of the WCB outflow stage, which is characterized by a broad cirrus shield (e.g., Carlson 1980; Wernli et al. 2016) and an irrotational flow that may interact with the midlatitude waveguide (e.g., Grams and Archambault 2016; Berman and Torn 2019).
The previous analysis identified the most important predictors individually at each 5° × 5° grid point compared to the intercept-only model. The regression models themselves are again developed individually for each grid point. For computational efficiency we aim to use a set of four identical predictor variables for all logistic regression models. Therefore, we still need to determine the globally most important predictors of the multivariate logistic regression models. The four most important predictor variables are selected by ranking all predictor variables by largest weighted number of occurrences aggregated across all pressure levels and choosing the pressure level that has the largest weighted number of occurrences. The number of occurrences is weighted with the cosine of latitude at the corresponding grid point since otherwise predictors at high latitudes would receive more weight. Also, as we are interested in a good performance of the regression models in regions of climatologically high WCB occurrence frequency, the number of occurrences is additionally weighted with the climatological WCB frequency at the respective grid point.
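The weighted ranking described above can be illustrated with a short Python sketch. The data layout (dictionaries keyed by grid point), the function name, and the variable labels in the example are assumptions made for illustration; only the weighting by the cosine of latitude and by the climatological WCB frequency, and the choice of the best pressure level per variable, follow the description in the text.

```python
import math
from collections import defaultdict

def rank_predictors(selections, wcb_frequency, n_top=4):
    """Rank predictor variables by weighted number of occurrences.

    selections maps (lat, lon) to the list of (variable, level) pairs that
    were selected at that grid point. wcb_frequency maps (lat, lon) to the
    climatological WCB frequency. Returns the n_top variables, each paired
    with the level that has the largest weighted count for that variable.
    """
    by_variable = defaultdict(float)
    by_variable_level = defaultdict(float)
    for (lat, lon), chosen in selections.items():
        # Area weight times climatological frequency weight.
        weight = math.cos(math.radians(lat)) * wcb_frequency[(lat, lon)]
        for variable, level in chosen:
            by_variable[variable] += weight          # aggregated over levels
            by_variable_level[(variable, level)] += weight
    ranked = sorted(by_variable, key=by_variable.get, reverse=True)
    result = []
    for variable in ranked[:n_top]:
        levels = [lvl for (var, lvl) in by_variable_level if var == variable]
        best = max(levels, key=lambda lvl: by_variable_level[(variable, lvl)])
        result.append((variable, best))
    return result

# Hypothetical selections at three grid points.
selections = {
    (30, 0): [("mfc", 1000), ("rh", 700)],
    (60, 0): [("mfc", 925), ("ta", 700)],
    (45, 10): [("rh", 700)],
}
frequency = {(30, 0): 0.10, (60, 0): 0.02, (45, 10): 0.05}
top_two = rank_predictors(selections, frequency, n_top=2)
```

In this toy example the high-latitude grid point contributes little because both its area weight and its climatological frequency are small, which is the intended effect of the two weights.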

Fig. 4. Predictors for DJF logistic regression models of WCB (a) inflow, (b) ascent, and (c) outflow ranked according to the number of occurrences in the Northern Hemisphere as one of the top four predictors. In (c) the vertical layers over which WCB outflow predictors were integrated are given by subscripts (in hPa). Regions where these vertically integrated predictors are most important are highlighted by black dots and crosses in Fig. 2f.
c. Logistic regression model training
After having determined the four most important predictors for WCB inflow, ascent, and outflow in the Northern Hemisphere, we now calculate the regression coefficients separately for individual grid points with a 1° × 1° horizontal grid spacing. This grid spacing allows us to account for local variability and to compare the results to the Lagrangian climatology, which is on the same grid. To ensure a representative sample size only those grid points are considered at which the observed climatological WCB frequency reaches at least 1%. The calculation of the regression coefficients for DJF is based on daily ERA-Interim data at 0000 UTC for the period 1 December 1979–28 February 1999. This 20-yr period is chosen to be shorter than the period for the predictor selection (cf. section 3b) in order to validate the regression models on an independent validation period covering the years 1 December 1999–28 February 2016 in section 4. Each season is treated separately and we choose the respective months during the indicated data periods. Similar to the process for selecting the predictors, we employ 10 times 10-fold cross validation to compute the regression coefficients, which are shown for WCB inflow (Fig. 5), WCB ascent (Fig. 6), and WCB outflow (Fig. 7), all during DJF. Note that the intercept values β0,i (Fig. 5a), β0,a (Fig. 6a), and β0,o (Fig. 7a) of the logistic regression models do not have a physical interpretation, since they represent the mean of the dependent variables when setting all independent variables to zero. The regression coefficients for DJF are qualitatively similar to those computed during the other seasons (MAM, JJA, and SON), which are shown and provided in netCDF format in the supplemental material (Figs. S1–S9).
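The per-grid-point training can be sketched as below. This is a simplified illustration under stated assumptions: the coefficients are obtained by plain gradient ascent on the log-likelihood rather than by the optimizer actually used, the 10 times 10-fold cross validation is omitted, and the data layout and function names are hypothetical. The standardization of the training data and the 1% frequency threshold follow the description in the text.

```python
import math

def standardize(column):
    """Remove the mean and scale to unit variance (column must not be constant)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

def fit_logistic(rows, labels, steps=2000, learning_rate=0.1):
    """Fit [beta0, beta1, ...] by gradient ascent on the mean log-likelihood."""
    n = len(labels)
    beta = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        gradient = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            error = y - 1.0 / (1.0 + math.exp(-z))
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        beta = [b + learning_rate * g / n for b, g in zip(beta, gradient)]
    return beta

def train_grid(samples, min_frequency=0.01):
    """Fit one model per grid point.

    samples maps a grid point to (rows, labels), where each row holds the
    predictor values of one day and labels are the binary WCB predictands.
    Grid points whose event frequency is below min_frequency are skipped.
    """
    coefficients = {}
    for point, (rows, labels) in samples.items():
        if sum(labels) / len(labels) < min_frequency:
            continue
        columns = [standardize(list(col)) for col in zip(*rows)]
        coefficients[point] = fit_logistic(list(zip(*columns)), labels)
    return coefficients
```

A grid point with no WCB events in the training period receives no model in this sketch, which mirrors the restriction of the coefficient maps to regions where the climatological frequency reaches at least 1%.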

Regression coefficients (a) β0,i, (b) β1,i, (c) β2,i, (d) β3,i, and (e) β4,i for the DJF WCB inflow model (shading). Contours denote the DJF climatological Lagrangian WCB inflow frequency in the period 1 Dec 1999–28 Feb 2016 (black contours at 2%, 4%, 6%, 8%, and 10%). Regression coefficients are only calculated and shown for regions where the climatological WCB inflow frequency reaches at least 1%.

Regression coefficients (a) β0,a, (b) β1,a, (c) β2,a, (d) β3,a, and (e) β4,a for the DJF WCB ascent model (shading). Contours denote the DJF climatological Lagrangian WCB ascent frequency in the period 1 Dec 1999–28 Feb 2016 (black contours at 2%, 4%, 6%, 8%, and 10%). Regression coefficients are only calculated and shown for regions where the climatological WCB ascent frequency reaches at least 1%.

Regression coefficients (a) β0,o, (b) β1,o, (c) β2,o, (d) β3,o, and (e) β4,o for the DJF WCB outflow model (shading). Contours denote the DJF climatological Lagrangian WCB outflow frequency in the period 1 Dec 1999–28 Feb 2016 (black contours at 2%, 4%, 6%, 8%, and 10%). Regression coefficients are only calculated and shown for regions where the climatological WCB outflow frequency reaches at least 1%.
For WCB inflow, β1,i is positive in most regions with values ranging from 0.01 north of 45°N to 0.06 south of 30°N (Fig. 5b). This shows that positive 700-hPa thickness advection, in particular south of 30°N, is associated with an increased conditional probability of WCB inflow [cf. Eq. (3)]. Likewise, β2,i and β3,i are positive at nearly all grid points (Figs. 5c,d). Thus, a poleward moisture flux at 850 hPa as well as moisture flux convergence at 1000 hPa correspond to an increased probability of WCB inflow. In contrast to β1,i, β2,i, and β3,i, β4,i is negative in most regions except for the subtropical western North Pacific (south of 30°N and from 120°E to 180°; Fig. 5e). In this region, positive β4,i indicates that increased moist potential vorticity (PV) at 500 hPa is related to an increased probability of WCB inflow. In other regions of the midlatitudes, an increase in moist PV decreases the probability of WCB inflow.
Positive β1,a (Fig. 6b) indicates that cyclonic relative vorticity at 850 hPa is characteristic of an environment in which WCBs ascend [cf. Eq. (4)]. A physical explanation for this statistical relation is the redistribution of vorticity via stretching in ascending WCBs. Only north of 50°N and in regions of high topography (e.g., Greenland, Kamchatka Peninsula, Rocky Mountains) is β1,a locally negative. Meanwhile, the highest values of β1,a are located south of 40°N. As one would expect, β2,a and β3,a are positive (Figs. 6c,d) and correspond well to the ascent stage of a WCB, which is characterized by saturation (high relative humidity) and warm-air advection (forcing for quasigeostrophic ascent). In addition, the WCB is also characterized by the transport of moisture poleward and upward (e.g., Boutle et al. 2011), which is reflected by the positive regression coefficient β4,a associated with 500-hPa moisture transport in all regions (Fig. 6e). The highest values of β4,a (up to 150) occur at the northern edges of the climatological WCB ascent regions.
For WCB outflow, positive β1,o (Fig. 7b) reveals that high relative humidity at 300 hPa is associated with an increased probability of WCB outflow [cf. Eq. (5)]. This relation is evident at nearly all grid points except for the subtropical North Atlantic (south of 35°N and from 40° to 30°W) and subtropical North America (30°N, 100°W). Likewise, a higher irrotational wind speed at 300 hPa is linked to an increased probability of WCB outflow as highlighted by positive β2,o (Fig. 7c). Positive β3,o indicates that a higher static stability at 500 hPa corresponds to an increased outflow probability (Fig. 7d), which might be counterintuitive to WCB outflow expected in upper-tropospheric ridges of predominantly low static stability. We find two possible explanations: First, if WCB outflow is advected downstream along the flanks of an anticyclonically breaking ridge, the outflow may become collocated with a PV streamer underneath (e.g., Figs. 1g–i in Madonna et al. 2014a). Interestingly, β3,o is largest over the eastern North Atlantic where PV streamers characterized by high static stability occur frequently (Wernli and Sprenger 2007). Second, in situations with upright WCB ascent the outflow may be located above regions of enhanced static stability resulting from midtropospheric latent heat release. This physical interpretation does not apply to the subtropical North Pacific (south of 35°N and from 140°E to 140°W), the western North Pacific (45°–80°N, 130°–150°E), and the high-latitude North Atlantic (north of 80°N) where β3,o is negative. Finally, β4,o is negative in most regions (Fig. 7e), indicating that WCB outflow occurs in regions characterized by negative relative vorticity at 300 hPa, which is typical for upper-tropospheric ridges. 
In contrast, β4,o is positive over the subtropical North Pacific (south of 35°N and from 130°E to 120°W) and the western North Atlantic (30°–45°N, 90°–60°W), which indicates that negative 300-hPa relative vorticity actually decreases the conditional probability there.
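The link between the sign of a regression coefficient and the conditional probability discussed above can be illustrated with a short Python sketch. It assumes the standard logistic form P = 1/(1 + exp[−(β0 + β1x1 + … + β4x4)]). The coefficient and predictor values are hypothetical and are not taken from Figs. 5–7.

```python
import math

def wcb_probability(intercept, coefficients, predictors):
    """Conditional probability from a logistic model with the standard logistic link."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inflow-like model: three positive coefficients and one negative
# coefficient (the role moist PV plays in most regions).
beta0 = -4.0
betas = [0.05, 0.8, 1.2, -0.6]

neutral = [0.0, 0.0, 0.0, 0.0]        # all predictors zero: only the intercept acts
favorable = [10.0, 1.0, 1.0, -0.5]    # positive advection and fluxes, low moist PV
high_pv = [10.0, 1.0, 1.0, 1.5]       # same, but with enhanced moist PV

p_neutral = wcb_probability(beta0, betas, neutral)
p_favorable = wcb_probability(beta0, betas, favorable)
p_high_pv = wcb_probability(beta0, betas, high_pv)
print(round(p_neutral, 3), round(p_favorable, 3), round(p_high_pv, 3))
```

With a negative coefficient, raising the fourth predictor lowers the probability while all other predictors are held fixed, which is the behavior described for moist PV outside the subtropical western North Pacific.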
4. Evaluation of the logistic regression model performance
The models for WCB inflow, ascent, and outflow are evaluated by comparing them against the respective gridded WCB trajectory data introduced as predictands in section 2b. We test the reliability for each WCB stage in the following additional configurations of the models. For consistency, each of them is based on a 20-yr-long training period and a 16-yr-long evaluation period.
Our standard models are trained on ERA-Interim data for DJF in the period 1 December 1979–28 February 1999 at 0000 UTC as outlined above and we evaluate their output for 0000, 0600, 1200, and 1800 UTC in DJF during the period 1 December 1999–28 February 2016. The output of the standard models is referred to as ec.erai_1979–1999. The period for model evaluation is chosen to be different from the training period in order to test for possible overfitting. In the case of overfitting, the models would perform well on the training data but poorly on the unseen testing data. We consider these models as the baseline models, use them for most of the analyses in this study, and provide their regression coefficients for the seasons MAM, JJA, and SON in the supplemental material as maps and in netCDF format.
In addition, we train the models on ERA-Interim data for DJF in the period 1 December 1995–28 February 2015 at 0000 UTC and evaluate their output for 0000, 0600, 1200, and 1800 UTC in DJF during the period 1 December 1979–28 February 1996. The output is referred to as ec.erai_1995–2015. The purpose of the later training period compared to the standard ec.erai_1979–1999 model is to assess the sensitivity of the model performance to long-term trends related to climate change or interdecadal variations.
The standard ec.erai_1979–1999 models, which are trained on ERA-Interim data for DJF in the period 1 December 1979–28 February 1999 at 0000 UTC, are instead evaluated on Japanese 55-year Reanalysis data (JRA-55; Kobayashi et al. 2015; Harada et al. 2016) covering DJF in the period 1 December 1999–28 February 2016 at 0000, 0600, 1200, and 1800 UTC. This approach is intended to provide insight into the reliability of the logistic regression models when they are trained on one dataset but applied to a different dataset (i.e., trained on ERA-Interim but evaluated on JRA-55). The output is referred to as jra55_1979–1999.
In contrast to the output in jra55_1979–1999, the predictors are recalibrated prior to applying the regression models by subtracting the seasonal mean difference between JRA-55 and ERA-Interim data averaged over DJF from the period 1 December 1999–28 February 2016. The output, which is a recalibrated form of the previous jra55_1979–1999 model, is referred to as jra55_recalibrated.
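The recalibration step can be sketched in a few lines of Python. The sketch assumes that the correction is a constant offset per predictor and grid point, defined as the JRA-55 seasonal mean minus the ERA-Interim seasonal mean. The sign convention, the function name, and the sample values are assumptions made for illustration only.

```python
def recalibrate(target_values, target_seasonal_mean, reference_seasonal_mean):
    """Subtract the seasonal-mean difference (target minus reference) from each value."""
    offset = target_seasonal_mean - reference_seasonal_mean
    return [v - offset for v in target_values]

# Hypothetical relative humidity values (%) at one grid point for four time steps.
jra55_rh = [82.0, 91.0, 76.0, 88.0]
erai_rh = [79.0, 89.0, 72.0, 86.0]

jra55_mean = sum(jra55_rh) / len(jra55_rh)
erai_mean = sum(erai_rh) / len(erai_rh)

jra55_rh_recalibrated = recalibrate(jra55_rh, jra55_mean, erai_mean)
recalibrated_mean = sum(jra55_rh_recalibrated) / len(jra55_rh_recalibrated)
print(jra55_rh_recalibrated, recalibrated_mean)
```

After the shift, the seasonal mean of the recalibrated predictor matches that of the reference dataset, while the variability about the mean is left untouched.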
a. Reliability
The average agreement between the modeled WCB probabilities and the observed WCB frequencies can be assessed via reliability diagrams. For this purpose, the observed WCB frequencies are plotted against the predicted probabilities, where the range of predicted probabilities is divided into 19 regular bins from 0.05 to 0.95. The reliability curve of a perfect model would follow the solid diagonal line in Fig. 8. A model is considered to underestimate (overestimate) the observed WCB frequency when the model’s curve lies above (below) the solid diagonal line. Generally, the reliability of ec.erai_1979–1999 and that of ec.erai_1995–2015 hardly differ, which suggests that the performance of the models is insensitive to long-term trends in the predictor variables related to interdecadal variability or climate change. For WCB inflow, the two models ec.erai_1979–1999 and ec.erai_1995–2015 follow the solid diagonal for observed frequencies less than 0.4 (Fig. 8a). For observed frequencies greater than 0.4, the regression models overestimate the WCB inflow frequency. The overestimation is more pronounced if the ec.erai_1979–1999 model is applied to JRA-55 data (see jra55_1979–1999 curve) and is due to systematic differences between ERA-Interim and JRA-55 in the predictor variables. The jra55_recalibrated model, which tries to account for the systematic differences between JRA-55 and ERA-Interim, improves upon the reliability of jra55_1979–1999. Notably, the jra55_recalibrated model is characterized by improved reliability for observed frequencies up to 0.45, which is closer to the reliability of ec.erai_1979–1999 than to that of jra55_1979–1999. As the purpose of this work is to introduce logistic regression models rather than to evaluate reanalysis data, an in-depth analysis of the differences between JRA-55 and ERA-Interim is beyond the scope of this study. Still, the improved reliability reveals that recalibrating the predictor variables may be worthwhile when the regression models are applied to data that are not based on ERA-Interim.
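The binning behind a reliability diagram can be sketched as follows. The sketch interprets the 19 regular bins as bins centered on 0.05, 0.10, …, 0.95 and assigns each modeled probability to the nearest center, which is an assumption about the binning convention. The function name and the sample data are hypothetical.

```python
def reliability_curve(probabilities, outcomes, n_bins=19, first_center=0.05, step=0.05):
    """Return (bin center, mean modeled probability, observed frequency, count)
    for every non-empty bin."""
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of probabilities, sum of outcomes, count]
    for p, o in zip(probabilities, outcomes):
        b = int(round((p - first_center) / step))  # index of the nearest bin center
        b = min(max(b, 0), n_bins - 1)             # clamp values outside 0.05-0.95
        bins[b][0] += p
        bins[b][1] += o
        bins[b][2] += 1
    curve = []
    for b, (p_sum, o_sum, n) in enumerate(bins):
        if n > 0:
            curve.append((first_center + b * step, p_sum / n, o_sum / n, n))
    return curve

# Hypothetical modeled probabilities and observed WCB flags (1 = WCB present).
modeled = [0.04, 0.06, 0.11, 0.52, 0.48, 0.49]
observed = [0, 0, 1, 1, 0, 1]

curve = reliability_curve(modeled, observed)
for center, mean_p, obs_freq, count in curve:
    # A point above the diagonal (obs_freq > mean_p) means the model underestimates.
    print(round(center, 2), round(mean_p, 3), round(obs_freq, 3), count)
```

Plotting the observed frequency against the mean modeled probability of each bin gives one reliability curve. Bins with few samples should be interpreted with care.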

Reliability diagrams for (a) inflow, (b) ascent, and (c) outflow during DJF. Modeled probabilities (x axis) and observed frequencies (y axis) are binned into 19 bins based on the modeled probabilities. The dots represent the average value for each bin. The four curves show the reliability of the four model outputs as described in the text and labeled. The perfect modeled probability and a 10% interval about the perfect model are shown by the solid and dashed diagonals, respectively.
As for WCB inflow, the reliability curve for WCB ascent follows the solid diagonal up to a modeled probability of 0.4 (Fig. 8b). For observed frequencies greater than 0.4, the modeled WCB ascent frequency is overestimated. Interestingly, the reliability curves of all four model outputs align better with each other than for WCB inflow. This indicates that the models for WCB ascent are less sensitive to the training period and that the predictor variables for WCB ascent are more similar in ERA-Interim and JRA-55 than for WCB inflow. The recalibration of the predictor variables improves the reliability of jra55_recalibrated by up to 0.02 so that its reliability curve nearly aligns with the ec.erai_1979–1999 curve.
Likewise, good agreement between the four regression model outputs is found for WCB outflow (Fig. 8c). For observed frequency values below 0.5, all four regression models underestimate the observed frequency of WCB outflow. In contrast, observed frequencies greater than 0.5 are overestimated by all four regression models, and this overestimation is more pronounced for WCB outflow than for WCB inflow or WCB ascent. There are two reasons why the modeled probabilities in the upper bins (> 0.5) are too high for all three WCB stages. The first reason is related to the way WCBs are defined in our training dataset. As described in section 2b, the WCB criterion in the predictand data is only fulfilled if the trajectories are matched with an extratropical cyclone at least once during their lifetime (Madonna et al. 2014b). However, the logistic regression models do not account for this criterion. Thus, cases with properties similar to a WCB and high modeled probabilities are identified as WCB even though they are not associated with an extratropical cyclone in a strict sense as diagnosed by the intersection with a cyclone mask. In particular for WCB outflow, the WCB footprints as predicted by the regression models matched fewer cyclone masks than in the trajectory-based dataset (not shown). The second reason is related to the strict WCB ascent criterion of at least 600-hPa ascent within 48 h. Air masses that ascend slightly less than 600 hPa within 48 h exhibit characteristics very similar to WCBs. Accordingly, the regression models predict high conditional probabilities although no WCB is identified in the predictand dataset.
b. Model skill
The calculation of TP, TN, FP, and FN requires dichotomous predictions. Thus, the modeled probabilities are converted to dichotomous predictions by applying a decision threshold above which a modeled probability is considered as WCB inflow, ascent, or outflow. To account for the fact that the chosen predictors are not optimal for each grid point (see Figs. 2b,d,f), the threshold is chosen to be gridpoint dependent. The logistic regression models must be able, first, to identify WCBs at instantaneous time steps and, second, to adequately represent the climatological WCB occurrence frequency. For the latter aspect, we tune the models’ output in the sense that we set the decision threshold to a value minimizing the climatological bias. The bias is defined as the difference between the modeled climatological WCB occurrence frequency and the observed climatological frequency. As an alternative approach, we set the decision threshold to a value at which the models reach their maximum F1 score defined as the harmonic mean of precision and recall. A comparison of the two approaches reveals that the skill of the models in terms of the Matthews correlation coefficient [Eq. (6)] is independent of the approach (not shown). However, the ability of the models to represent the climatological occurrence frequency deteriorates considerably when using the F1 score. Alternative methods to determine the decision threshold are often based on the receiver operating characteristic. These methods such as the Youden index (Youden 1950) or the maximum of the Peirce skill score (Manzato 2007; Mohr et al. 2015) may be misleading with imbalanced data (Davis and Goadrich 2006) and are therefore not applicable in the present study.
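The two ways of choosing the decision threshold can be contrasted in a small Python sketch for a single grid point. The probability series, the observations, and the candidate thresholds (0.01 to 0.99 in steps of 0.01) are hypothetical, and the function names are illustrative only.

```python
def climatological_bias(probs, obs, threshold):
    """Modeled minus observed occurrence frequency for a given decision threshold."""
    modeled = sum(1 for p in probs if p > threshold) / len(probs)
    observed = sum(obs) / len(obs)
    return modeled - observed

def f1_score(probs, obs, threshold):
    """Harmonic mean of precision and recall for a given decision threshold."""
    tp = sum(1 for p, o in zip(probs, obs) if p > threshold and o == 1)
    fp = sum(1 for p, o in zip(probs, obs) if p > threshold and o == 0)
    fn = sum(1 for p, o in zip(probs, obs) if p <= threshold and o == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical modeled probabilities and observed WCB flags at one grid point.
probs = [0.02, 0.05, 0.08, 0.12, 0.15, 0.22, 0.28, 0.35, 0.41, 0.60]
obs = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1]

candidates = [i / 100 for i in range(1, 100)]

# Approach 1: threshold with the smallest absolute climatological bias.
thr_bias = min(candidates, key=lambda t: abs(climatological_bias(probs, obs, t)))
# Approach 2: threshold with the largest F1 score.
thr_f1 = max(candidates, key=lambda t: f1_score(probs, obs, t))

print(thr_bias, climatological_bias(probs, obs, thr_bias))
print(thr_f1, climatological_bias(probs, obs, thr_f1))
```

In this toy series the F1-based threshold is lower than the bias-based one and therefore yields too many WCB predictions. This only illustrates that the two criteria can select different thresholds and does not reproduce the evaluation reported here.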
Based on the previous considerations, the threshold minimizing the climatological bias in WCB occurrence frequency is considered optimal to define dichotomous WCB predictions at instantaneous time steps. In general, the magnitude of the decision threshold scales with the climatological occurrence frequency of WCBs (Figs. 9a,c,e). For all WCB stages, the optimal decision thresholds are less than the commonly used threshold of 0.5. This is due to the imbalance of the two classes, which favors zeros (no WCB) at the expense of the ones (WCB) so that P(WCB|x) is underestimated (King and Zeng 2003). For WCB inflow, the decision threshold ranges from 0.1 in regions where the climatological frequency is less than 2% to peak values between 0.25 and 0.35 over the western North Atlantic and western North Pacific (Fig. 9a). Similar decision thresholds are found for WCB ascent (Fig. 9c). In most areas over the North Atlantic, the optimal decision threshold exceeds 0.2 and reaches local maxima over the western North Atlantic and the western North Pacific. For WCB outflow, the decision threshold also scales with the climatological occurrence frequency (Fig. 9e). In the main outflow regions over the central North Atlantic and the western to central North Pacific, the threshold exceeds 0.3. For the other meteorological seasons, the decision thresholds also scale with the climatological occurrence frequency of WCBs (Figs. S13–S15 in the online supplemental material), with the lowest decision thresholds during boreal summer.

Decision threshold (shading) based on minimum climatological bias for WCB (a) inflow, (c) ascent, and (e) outflow. Matthews correlation coefficient (shading) of the logistic regression model ec.erai_1979–1999 for (b) inflow, (d) ascent, and (f) outflow. Both metrics are shown for DJF. Contours denote the DJF climatological Lagrangian WCB frequency in the period 1 Dec 1999–28 Feb 2016 (black contours every 2%) for the respective WCB stage.
After applying the above decision thresholds, the WCB predictions by the logistic regression models are characterized by positive MCC values, which indicate better than random predictions (Figs. 9b,d,f). For the WCB inflow models, the MCC mostly lies in the range of 0.2 to 0.3 in regions where the climatological occurrence frequency is 2%–4% (Fig. 9b). The highest values are found in the core regions of WCB inflow occurrence where the coefficient reaches 0.4 to 0.5 (30°–45°N, 80°–60°W and 25°–45°N, 130°E–180°). This is partly due to the predictor selection during which more weight is given to predictors in regions of climatologically high WCB frequency (section 3b). In terms of MCC, the regression models for WCB ascent perform best compared to WCB inflow and outflow models. In areas where the WCB frequency exceeds 2%, the MCC for WCB ascent ranges from 0.3 to 0.6, with the highest MCC values concentrated over the western North Atlantic (30°–50°N, 90°–50°W) and the western North Pacific (30°–50°N, 130°E–180°). In contrast, WCB outflow is characterized by the lowest MCC among the three WCB stages (Fig. 9f). Though it reaches values of 0.3 to 0.5 over the western North Atlantic (45°–60°N, 65°–50°W) and western North Pacific (40°–60°N, 140°–170°E), the MCC lies below 0.3 over the eastern part of the two basins.
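For reference, the Matthews correlation coefficient can be computed from the four entries of the contingency table as in the following sketch, which uses the standard definition of the coefficient. The counts are hypothetical and chosen to mimic an imbalanced dataset with a WCB frequency of about 5%.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from contingency-table counts.

    Returns 0.0 when a row or column of the table is empty, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Hypothetical counts for 1000 time steps at one grid point.
mcc_example = matthews_corrcoef(tp=30, tn=925, fp=20, fn=25)
mcc_perfect = matthews_corrcoef(tp=55, tn=945, fp=0, fn=0)
mcc_never = matthews_corrcoef(tp=0, tn=945, fp=0, fn=55)  # model never predicts a WCB
print(round(mcc_example, 2), mcc_perfect, mcc_never)
```

Because the coefficient uses all four entries of the table, a model that never predicts a WCB is not rewarded for the large number of true negatives, which is why it suits rare events.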
c. Model frequencies
Next, we analyze to what degree the climatological occurrence frequency of WCBs based on trajectories can be represented by the logistic regression models (Fig. 10). By design, the observed frequency and that of the regression models coincide well. For WCB inflow and ascent, the highest occurrence frequencies as predicted by the logistic regression models are collocated with the observed WCB inflow and WCB ascent regions, respectively (Figs. 10a,c). Negative biases for WCB inflow of −5% to −1% are found over the western North Pacific (north of 30°N and from 130° to 150°E) and western North Atlantic (north of 30°N and from 80° to 50°W in Fig. 10b). Positive biases of similar magnitude occur over the central to eastern North Pacific (south of 40°N and from 150°E to 150°W) and eastern North Atlantic (45°–60°N, 40°–10°W). The dipoles of positive and negative biases, which are less than 20% of the trajectory-based climatological frequency (not shown), suggest that the regression models tend to shift the main WCB inflow regions southward over the North Atlantic and southeastward over the North Pacific.

Climatological occurrence frequency of WCB (a) inflow, (c) ascent, and (e) outflow (shading, in %) based on the logistic regression model ec.erai_1979–1999 and the trajectory-based definition (contours, every 2%). Climatological bias of the logistic regression model for WCB (b) inflow, (d) ascent, and (f) outflow (shading, in %) compared to the trajectory-based definition (contours, every 2%) for DJF in the period 1 Dec 1999–28 Feb 2016. Green boxes indicate regions for which the distributions of predictor values are shown in Fig. 11.
For WCB ascent, the biases are in general the smallest (Fig. 10d) compared to WCB inflow and outflow, which is consistent with the highest MCC values (Fig. 9d). Similar to the biases for WCB inflow, dipoles of negative and positive biases are found over the North Atlantic (40°–60°N, 70°–20°W in Fig. 10d) and the western North Pacific (30°–45°N, 140°–170°E). Thus, the regression model tends to shift the climatological main WCB ascent regions southward, which is consistent with the southward shift of WCB inflow. The biases in WCB ascent are in most regions less than 20% of the trajectory-based climatological frequency (not shown).
Largest discrepancies between the regression model and the trajectory model are found for WCB outflow over the southern North Pacific and western North America (Figs. 10e,f). Here, the climatological occurrence frequency of WCB outflow is overestimated by up to 10%, which corresponds to a normalized difference of 80%–100% compared to the trajectory-based WCB outflow climatology (not shown). Similar to WCB inflow and ascent, spatially coherent dipoles of positive and negative biases also occur for WCB outflow. These are located over the western North Pacific (30°–60°N, 130°–170°E) and the western North Atlantic (30°–60°N, 60°–30°W). It is notable that the dipoles of positive and negative biases occur quite consistently for the three WCB stages of inflow, ascent, and outflow over the western North Pacific and western North Atlantic. In all, this might suggest that the differences are connected to the typical track that cyclones take with WCB inflow along the east coast of Japan and the U.S. East Coast, WCB ascent over the baroclinic zones related to the Kuroshio and the Gulf Stream, and outflow farther northeast intersecting the upper-tropospheric jet stream.
d. Physical reasons for frequency biases
We investigate the physical reasons for the negative biases in WCB inflow, ascent, and outflow over the western North Atlantic and western North Pacific (regions are highlighted by green boxes in Figs. 10b,d,f) by computing the distribution of the predictor variables for grid points with false negatives (black lines in Fig. 11) and true positives (gray lines in Fig. 11). In contrast, physical reasons for positive WCB outflow biases over the eastern North Pacific are analyzed based on predictors for grid points with false positives and true positives.

Median predictor values (dots) for the three WCB stages of (a)–(d) inflow, (e)–(h) ascent, and (i)–(l) outflow in the green regions highlighted in Figs. 10b, 10d, and 10f. The predictors are as in Eqs. (3)–(5). Gray (black) dots indicate the median conditions for true positives (false negatives) and error bars indicate the interquartile range. The only exception is in (i)–(l), where black dots and black error bars indicate the median conditions and interquartile range for false positives over the east Pacific. True positives, false positives, and false negatives are evaluated for the model ec.erai_1979–1999 in the period 1 Dec 1999–28 Feb 2016.
The physical reasons for grid points with false negatives of WCB inflow, i.e., the models do not detect a WCB even though it occurred, are similar for both the western North Atlantic and western North Pacific. The predictor variables are characterized by reduced 700-hPa thickness advection (Fig. 11a), weaker 850-hPa meridional moisture flux (Fig. 11b), a median negative 1000-hPa moisture flux convergence (i.e., moisture flux divergence; Fig. 11c) and enhanced 500-hPa moist PV (Fig. 11d). Since the regression coefficients are positive for thickness advection, positive for meridional moisture flux, positive for moisture flux convergence and negative for moist PV (cf. Fig. 5), the conditional probability decreases under these conditions. Also from a dynamical perspective and the overall conceptual model of the WCB inflow stage outlined in the introduction, the false negatives are plausible under these conditions.
For the WCB ascent stage, false negatives occur in environments that are similar over the western North Atlantic and western North Pacific (Figs. 11e–h). Compared to the true positives, grid points with false negatives are characterized by reduced 850-hPa relative vorticity (Fig. 11e), lower 700-hPa relative humidity (Fig. 11f), weaker 300-hPa thickness advection (Fig. 11g), and weaker 500-hPa meridional moisture flux (Fig. 11h). However, overlapping interquartile ranges indicate that the differences of the median values between the false negatives and true positives are not significant, which may explain the inability of the regression models to correctly predict WCB ascent in these situations.
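The comparison of medians and interquartile ranges used above can be sketched with the Python standard library. The predictor samples are hypothetical, and the quartiles follow the default ("exclusive") method of `statistics.quantiles`, which may differ slightly from the convention used for Fig. 11. An overlap check of this kind is a descriptive diagnostic and not a formal significance test.

```python
import statistics

def median_and_iqr(values):
    """Return the median and the (lower quartile, upper quartile) pair."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q2, (q1, q3)

def iqr_overlap(sample_a, sample_b):
    """True if the interquartile ranges of the two samples overlap."""
    _, (a_low, a_high) = median_and_iqr(sample_a)
    _, (b_low, b_high) = median_and_iqr(sample_b)
    return a_low <= b_high and b_low <= a_high

# Hypothetical 700-hPa relative humidity (%) at grid points with true positives
# and with false negatives of WCB ascent.
rh_true_positives = [88, 90, 92, 94, 95, 96, 97, 98]
rh_false_negatives = [80, 84, 87, 89, 91, 93, 94, 96]

median_tp, iqr_tp = median_and_iqr(rh_true_positives)
median_fn, iqr_fn = median_and_iqr(rh_false_negatives)
ranges_overlap = iqr_overlap(rh_true_positives, rh_false_negatives)
print(median_tp, iqr_tp, median_fn, iqr_fn, ranges_overlap)
```

In this example the false negatives have a lower median relative humidity, but the interquartile ranges overlap, so the predictor alone separates the two groups only weakly.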
False negatives of WCB outflow over the western North Pacific and western North Atlantic are characterized by reduced 300-hPa relative humidity, lower 300-hPa irrotational wind speed, and higher 300-hPa relative vorticity. The differences in 500-hPa static stability are of opposite sign for the western North Atlantic and the western North Pacific. This is due to the opposite sign of the corresponding regression coefficients (Fig. 7d) so that the differences in static stability and the other predictors still lead to an overall reduction of the conditional probability. In cases of false positives over the eastern North Pacific, i.e., the models detect a WCB even though there is none in the reference data, the 300-hPa relative humidity is higher (Fig. 11i), the 300-hPa irrotational wind speed is stronger (Fig. 11j), the 500-hPa static stability is higher (Fig. 11k), and the 300-hPa relative vorticity is more anticyclonic (Fig. 11l) than for true positives. Since the corresponding regression coefficients are positive for relative humidity (Fig. 7b), positive for irrotational wind speed (Fig. 7c), positive for static stability (Fig. 7d), and negative for relative vorticity (Fig. 7e), the conditional probability for WCB outflow increases under these conditions. Accordingly, the logistic regression models falsely predict WCB outflow in the prescribed area. One possible, but untested, hypothesis is that the falsely predicted outflow over the eastern North Pacific is associated with cyclones that occur equatorward of 25°N and are thus not included in the trajectory-based dataset.
For all regions analyzed above, the identification of the synoptic systems that are associated with the identified conditions would be intriguing but would require further in-depth analysis, which is beyond the scope of this study. Still, the above discussion reveals that a statistical analysis of the physical properties associated with WCB inflow, ascent, and outflow may yield different results depending on whether it is based on the Lagrangian WCB definition or on the logistic regression models. For example, WCB inflow over the western North Atlantic as identified by the regression models would be characterized by stronger low-level moisture flux convergence than in the Lagrangian definition since the trajectory model identifies WCB inflow in regions with considerably lower and even negative low-level moisture flux convergence (Fig. 11c). Thus, if one attempted to calculate heating rates associated with the WCB from the low-level moisture flux convergence, these would likely be higher when using the regression model approach. Beyond this one example, careful consideration should be given when using the regression models in process-oriented studies on the physical properties of WCBs. We recommend that the trajectory approach be preferred for process-oriented studies of WCBs, whereas the regression model approach facilitates systematic model evaluation in large datasets.
e. Case study
Finally, the ability of the logistic regression models to identify WCBs instantaneously and their usefulness for investigating forecast data is illustrated for a case study of a cyclone event during 22–24 January 2011. For this case, Martínez-Alvarado et al. (2016) showed that the 5-day forecast error associated with an upper-tropospheric ridge over the North Atlantic was linked to earlier forecast errors associated with a WCB within operational ECMWF forecasts. More specifically, the WCB forecast error was characterized by an overestimation of the number of ascending trajectories and a WCB outflow too far to the southeast. This incorrect WCB forecast resulted in an upper-tropospheric ridge too far to the south.
The sequence of events that led to the formation of the upper-tropospheric ridge is illustrated in Fig. 12 using ERA-Interim data. At 0600 UTC 22 January 2011, the large-scale flow situation was characterized by a low pressure system west of Newfoundland (black contours in Fig. 12a; mean sea level pressure of 970 hPa at 65°W and 50°N), a deep trough over the central North Atlantic (red contour in Fig. 12a; trough base at 35°W and 35°N) and anticyclonic wave breaking over the far eastern North Atlantic. The WCB that was associated with the forecast error emerged from the warm sector of the low pressure system west of Newfoundland (blue trajectory starting points in Fig. 12a) and ascended in a northeasterly direction during the following 48 h (red trajectory end points in Fig. 12a). The WCB inflow identified by the trajectory-based approach was located at around 35°N and between 80° and 60°W (blue shading in Fig. 12b). The logistic regression model ec.erai_1979–1999 depicts this inflow region quite well (red shading in Fig. 12b); however, further inflow is identified northeast of this region, which only partly matches the trajectory-based inflow region. Forward trajectories that are initialized every 20 hPa below 790 hPa (section 2b) in the inflow region identified by the regression model ascend on average by 384 hPa in 48 h, compared to 432 hPa in 48 h for trajectories initialized from the same pressure levels but in the Lagrangian inflow mask. This illustrates that WCB trajectories do not necessarily emerge from all pressure levels in the vertical layer represented by either inflow mask definition.

Synoptic evolution for (a)–(c) 0600 UTC 22 Jan, (d)–(f) 0600 UTC 23 Jan, and (g)–(i) 0600 UTC 24 Jan 2011. (a),(d),(g) ERA-Interim-based 48-h WCB trajectories colored by their height (in hPa), dynamic tropopause (2 PVU on the 320-K isentropic surface; red contour), and mean sea level pressure (black contours, every 10 hPa from 980 to 1010 hPa). (a) All WCB trajectories that are located in the inflow layer (height below 800 hPa) at 0600 UTC 22 Jan, (d) all WCB trajectories that are located in the ascent layer (height between 800 and 400 hPa) at 0600 UTC 23 Jan, and (g) all WCB trajectories that are located in the WCB outflow layer (height above 400 hPa) at 0600 UTC 24 Jan. (b),(e),(h) ERA-Interim-based dynamic tropopause (2 PVU on the 320-K isentropic surface; red contour) and WCB mask for (b) inflow, (e) ascent, and (h) outflow based on Lagrangian definition (blue shading) and logistic regression model ec.erai_1979–1999 (red shading). All fields are based on ERA-Interim. (c),(f),(i) Red contours are as in (b), (e), and (h). ECMWF IFS operational ensemble forecast initialized at 1200 UTC 19 Jan 2011 of dynamic tropopause (2 PVU on the 320-K isentropic surface; thick black contour is ensemble mean, thin black contours denote the individual ensemble members) and ensemble probability of WCB (c) inflow, (f) ascent, and (i) outflow (shading, in %) calculated with the regression model ec.erai_1979–1999.
Citation: Journal of the Atmospheric Sciences 78, 5; 10.1175/JAS-D-20-0139.1
At 0600 UTC 23 January 2011, a broad upper-tropospheric ridge has developed over the western North Atlantic with its apex at 50°N, 50°W (Fig. 12d). On its western flank, ahead of a short-wave upper-level trough, and in a region of a deepening low pressure system (mean sea level pressure of 992 hPa at 35°N, 70°W), the trajectories highlight the ascending WCB. The WCB ascent mask of the trajectory-based definition (blue shading in Fig. 12e) and that of the logistic regression model (red shading in Fig. 12e) match to a large degree.
The ridge over the North Atlantic amplifies rapidly and extends toward the southern tip of Greenland at 0600 UTC 24 January 2011 (Fig. 12g). The mask of the trajectory-based WCB outflow definition extends from Newfoundland to Iceland, covering large parts of the ridge (blue shading in Fig. 12h). The core region of the WCB outflow over the central North Atlantic and poleward of 50°N is captured by the regression model (red shading in Fig. 12h). Areas that are not captured by the regression model are parts of a WCB outflow west of the ridge (40°–50°N, 50°–60°W). The outflow in this region is collocated with the cyclonically breaking upstream trough [2 PVU (1 PVU = 10⁻⁶ K kg⁻¹ m² s⁻¹) contour in Fig. 12h] and positive 300-hPa relative vorticity (not shown). The positive relative vorticity reduces the conditional probability (negative β4,o; Fig. 7e) such that this region is not identified as WCB outflow by the logistic regression model. Likewise, WCB outflow that is not identified as such by the regression model is located on the eastern flank of the ridge (25°–50°N, 30°–40°W). In this area the 300-hPa relative vorticity is higher than in the core region of the WCB (not shown), which reduces the conditional probability for WCB outflow. Further, the WCB outflow is advected far to the south and reaches regions with a low climatological WCB frequency (cf. Fig. 10e) and comparably poor skill of the logistic regression model (cf. Fig. 9f).
In the remainder of this section, we give an example of how the logistic regression model may be used to evaluate the representation of WCB footprints in numerical weather forecasts. To this end we have retrieved the operational ECMWF IFS ensemble forecast data initialized at 1200 UTC 19 January 2011, which is the same initialization time as in Martínez-Alvarado et al. (2016). Since the forecast is compared to ERA-Interim, the ensemble data are remapped from their original T399 spectral resolution to a regular 1° × 1° latitude–longitude grid. For both reanalysis and forecast data, the WCB footprints are calculated via the logistic regression models ec.erai_1979–1999. Ensemble mean probabilities are constructed by applying the regression models to each ensemble member (51 members) separately and by converting the conditional probabilities to dichotomous predictions. The average over all ensemble members yields the ensemble mean probability. The computation time for the entire ensemble forecast with 6-hourly output and 240-h lead time is approximately 9 min. In comparison, it takes about 3 h to derive the same trajectory-based products, even in an optimized and parallelized setup, which has been used in earlier work (Schäfler et al. 2014).
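The construction of the ensemble probability described above can be sketched in a few lines. The sketch assumes that each member's conditional probabilities are already available as a small two-dimensional grid and that a single scalar decision threshold applies everywhere; the function names, the threshold value, and the four-member toy ensemble are hypothetical.

```python
def dichotomize(prob_field, threshold):
    """Convert a 2D field of conditional probabilities to a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_field]


def ensemble_probability(member_probs, threshold):
    """Fraction of members that predict the WCB stage at each grid point."""
    masks = [dichotomize(field, threshold) for field in member_probs]
    n_members = len(masks)
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(mask[i][j] for mask in masks) / n_members for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy ensemble: four members on a 2 x 2 grid (the text uses 51 members).
members = [
    [[0.50, 0.10], [0.40, 0.20]],
    [[0.60, 0.35], [0.10, 0.20]],
    [[0.20, 0.05], [0.45, 0.10]],
    [[0.70, 0.00], [0.50, 0.25]],
]
THRESHOLD = 0.3  # hypothetical decision threshold

ens_prob = ensemble_probability(members, THRESHOLD)
print(ens_prob)
```

Because the members are dichotomized before averaging, the result is the fraction of members that predict the WCB stage, not the mean of the modeled conditional probabilities.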
For the 66-h ensemble forecast valid at 0600 UTC 22 January 2011, the structure of the dynamic tropopause is still very similar between ERA-Interim (red contour in Fig. 12c) and the ensemble forecast (black contours in Fig. 12c). In terms of WCB inflow, the main inflow region identified in ERA-Interim by the regression model (red shading in Fig. 12b) is captured by the ensemble forecast as indicated by ensemble mean probabilities of more than 70% (shading in Fig. 12c). One day later, major differences become apparent over the western North Atlantic. Whereas the axis of the short-wave trough in ERA-Interim is located at 75°W, the trough in the ensemble mean is still located at 80°W (Fig. 12f). Likewise, the low pressure system, which is more than 10 hPa weaker than in the reanalysis (not shown), is located southwest of the low in ERA-Interim. This phase shift also becomes apparent in the ensemble mean WCB probability, which indicates WCB ascent extending farther west than in ERA-Interim (red shading in Fig. 12e). The differences between ERA-Interim and the ensemble forecast amplify rapidly until 0600 UTC 24 January 2011. A cyclonic breaking of the upstream trough is hardly visible in the ensemble forecast and there is considerable uncertainty concerning the position of the dynamic tropopause. This uncertainty is reflected by almost 20° ensemble variance in the cyclone’s latitudinal position and 9-hPa ensemble variance in the intensity of the cyclone, which in the ensemble mean is about 20 hPa weaker than in ERA-Interim (not shown). Farther downstream over the central North Atlantic, the dynamic tropopause is located farther south in the forecast than in ERA-Interim (Fig. 12i). This clearly represents a misplacement of the crest of the ridge in the forecast. Especially over the Labrador Sea (60°N, 60°W), ERA-Interim is outside the full ensemble distribution.
The uncertainty in both the cyclone position and ridge location likely reduces the ensemble probability of WCB outflow, which has decreased to values of 30%–40%. In addition, the region of highest ensemble mean outflow probability (35°–60°N, 30°–60°W) is located farther south than in ERA-Interim (red shading within 45°–65°N, 35°–65°W in Fig. 12h). This misplacement between ERA-Interim and the operational ensemble forecast is in line with the misplacement between the operational analysis and the ECMWF IFS high-resolution forecast reported by Martínez-Alvarado et al. (2016).
5. Concluding discussion
The present study introduces a statistical framework that allows the identification of WCB footprints from Eulerian fields that are routinely available from NWP and climate models such as those provided in the S2S database (Vitart et al. 2017). First, we identify predictor variables for the three WCB stages of WCB inflow, WCB ascent, and WCB outflow via a stepwise forward predictor selection. These predictor variables are solely derived from temperature, geopotential height, specific humidity, and horizontal wind components. Second, we develop gridpoint-specific multivariate logistic regression models using dichotomous (0 or 1) data of WCB occurrence for the inflow, ascent, or outflow stages as predictands that are based on gridded Lagrangian WCB trajectories. Both the predictands and predictors are taken from ERA-Interim for the period 1 March 1979–30 November 2016.
During the stepwise forward predictor selection based on the ERA-Interim data and the Lagrangian WCB trajectories, the regression models’ predictors are shown to be in general agreement with the conceptual understanding of the three WCB stages. For WCB inflow these predictors are 700-hPa thickness advection, 850-hPa meridional moisture transport, 1000-hPa moisture flux convergence, and 500-hPa moist PV. This is in line with diagnostic studies of WCBs showing that they emerge from regions of strong poleward moisture transport and quasi-geostrophically forced ascent (e.g., Wernli 1997; Binder et al. 2016; Dacre et al. 2019). Similar to WCB inflow, WCB ascent is also characterized by the predictors of 500-hPa meridional moisture transport and 300-hPa thickness advection, the latter of which might indicate quasigeostrophic forcing for ascent. The importance of poleward moisture transport for WCB inflow and WCB ascent suggests that most WCBs are characterized by poleward, ascending motion of air. However, Grams et al. (2014) noted that, for example, over the European Alps equatorward ascending WCBs may occur, which are captured by the trajectory-based approach. In its current form, the logistic regression model is unable to identify these systems since an equatorward moisture transport (negative sign) combined with positive regression coefficients would actually decrease the probability of WCB ascent [see Eqs. (1) and (2)]. We accept this limitation of the regression models since equatorward ascending WCBs account in most regions for less than 10% of the yearly WCB number (Grams et al. 2014). Further predictors for WCB ascent are 700-hPa relative humidity and 850-hPa relative vorticity. These two predictors reflect two WCB characteristics: saturation of the ascending air masses and the redistribution of lower-tropospheric relative vorticity via stretching (e.g., Wernli and Davies 1997; Madonna et al. 2014b).
For WCB outflow, 300-hPa relative humidity, the 300-hPa irrotational wind speed, and 300-hPa relative vorticity are important predictors. This is in line with the general concept of WCB air masses reaching the upper troposphere, which are characterized by a broad cloud shield and a divergent outflow into an upper-tropospheric ridge (e.g., Carlson 1980; Grams and Archambault 2016; Wernli et al. 2016).
The multivariate regression models are trained for each meteorological season and each grid point. Though the regression coefficients are only shown for DJF, they exhibit the same sign and are of comparable magnitude for the other seasons. A drawback of the season-based approach is that the modeled probabilities for dates close to the transition from one season to the next may exhibit discontinuities due to changing regression coefficients. The identified predictors may provide guidance for the development of even more advanced frameworks. We are currently working on such an advanced framework based on convolutional neural networks, which have been successfully used to identify synoptic-scale flow features (e.g., Liu et al. 2016; Biard and Kunkel 2019; Lagerquist et al. 2019). An expected advantage compared to the logistic regression models of this study is that the deep learning approach takes into account the information of neighboring grid points.
We evaluate the logistic regression models’ ability to represent WCB inflow, ascent, and outflow during Northern Hemisphere winter (DJF) using the ec.erai_1979–1999 models. These models are trained on ERA-Interim for the period 1 December 1979–28 February 1999, and their predictions for the period 1 December 1999–28 February 2016 are evaluated against WCB stages identified by the Lagrangian trajectory-based approach applied to the ERA-Interim dataset. Sensitivity tests concerning the training period reveal that the performance of the regression models is insensitive to possible long-term trends related to interdecadal variability and climate change. However, when being applied to datasets other than ERA-Interim, the predictor variables should be recalibrated to improve the models’ reliability.
For all three WCB stages the models are reliable for low modeled probabilities but they tend to overestimate the frequency of WCBs for high modeled probabilities. This is possibly related to the way WCBs are defined in our training dataset. As described in section 2b, the trajectory-based WCB definition is only fulfilled if the trajectories are matched with an extratropical cyclone at least once during their lifetime. However, we do not pursue including a requirement for nearby extratropical cyclones in the regression model training since this would increase the computational costs and limit the ability of the models to capture other relevant midlatitude airstreams.
The modeled probabilities are converted to dichotomous predictions by choosing a decision threshold that minimizes the climatological bias of the models. The models reach highest skill as measured by the Matthews correlation coefficient in regions where the climatological frequency is highest. This characteristic is due to the predictor selection, which gives more weight to predictors in regions of climatologically high WCB frequency. For WCB ascent, the skill is generally higher than for WCB inflow and outflow. The climatological occurrence frequency of WCBs is well represented in most regions. However, dipoles of positive and negative biases over the western North Pacific and western North Atlantic indicate a southward shift of WCB inflow, ascent, and outflow compared to the trajectory-based climatology. These dipoles suggest that the differences are connected to the typical track that midlatitude cyclones take with WCB inflow along the east coast of Japan and U.S. East Coast, WCB ascent over the baroclinic zones related to the Kuroshio and the Gulf Stream, and outflow farther northeast reaching the upper-tropospheric jet stream. Accordingly, these biases should be considered when studying interactions between WCBs, cyclones, the midlatitude waveguide and blocking since the dynamical impact of WCBs might be estimated differently than with the trajectory-based approach.
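The threshold selection and the skill measure can be sketched for a single grid point as follows. The sketch assumes that the climatological bias is measured by the frequency bias (number of predicted events divided by number of observed events, ideally 1) and that the threshold is chosen from a small candidate list; both choices, as well as the toy data and function names, are illustrative assumptions.

```python
import math


def contingency(probs, observed, threshold):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for p, o in zip(probs, observed):
        predicted = p >= threshold
        if predicted and o:
            tp += 1
        elif predicted:
            fp += 1
        elif o:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def frequency_bias(tp, fp, fn, tn):
    """Predicted event count over observed event count (needs tp + fn > 0)."""
    return (tp + fp) / (tp + fn)


def matthews_cc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0 if any marginal sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def least_biased_threshold(probs, observed, candidates):
    """Candidate threshold whose frequency bias is closest to 1."""
    return min(
        candidates,
        key=lambda t: abs(frequency_bias(*contingency(probs, observed, t)) - 1.0),
    )


# Toy time series at one grid point: modeled probabilities and observed WCB.
probs = [0.9, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
observed = [1, 1, 0, 1, 0, 0, 0, 0]

best = least_biased_threshold(probs, observed, [0.2, 0.3, 0.5, 0.8])
mcc = matthews_cc(*contingency(probs, observed, best))
print(best, round(mcc, 3))
```

A threshold chosen this way matches the observed event frequency by construction, but individual events can still be misclassified, which is why skill is assessed separately with the Matthews correlation coefficient.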
An exemplary case study shows that the logistic regression models may be used to evaluate the representation of WCBs in NWP data. The logistic regression models identify footprints of WCB inflow and ascent with the limitation that WCBs are also identified in regions that do not match with an extratropical cyclone. The core region of WCB outflow into an upper-tropospheric ridge is successfully identified. However, the logistic regression models fail to capture WCB outflow west of the ridge and in the vicinity of a cyclonically breaking trough as well as WCB outflow that is being advected far to the south and away from the apex of the main ridge. Currently, we are using the logistic regression models to verify the representation of WCBs in ECMWF’s S2S reforecasts, which will be published in a companion paper (Wandel et al. 2021, manuscript submitted to J. Atmos. Sci.). With the comparably low computational costs and reduced data output required compared to the expensive trajectory-based models, the statistical models are applicable to data of climate model projections. For such studies, the logistic regression coefficients and decision thresholds for all seasons are available in netCDF format in the online supplemental material.
Acknowledgments
This work was funded by the Helmholtz Association as part of the Young Investigator Group “Sub-Seasonal Predictability: Understanding the Role of Diabatic Outflow” (SPREADOUT; Grant VH-NG-1243). We are grateful to Suzanne Gray, Ron McTaggart-Cowan, Dominik Büeler, Moritz Pickl, Jan Wandel, and two anonymous reviewers for valuable comments on this project. Sincerest thanks to the Atmospheric Dynamics group at ETH Zurich, in particular to Michael Sprenger and Heini Wernli, for sharing the trajectory-based WCB data. ECMWF, Deutscher Wetterdienst, and MeteoSwiss are acknowledged for granting access to the ERA-Interim dataset and operational ECMWF ensemble forecasts. This research was partially embedded in the subprojects A8 and B8 of the Transregional Collaborative Research Center SFB/TRR 165 “Waves to Weather” (https://www.wavestoweather.de) funded by the German Research Foundation (DFG).
Data availability statement
Monthly trajectory-based WCB data can be requested from http://eraiclim.ethz.ch/. ERA-Interim data are freely available at https://apps.ecmwf.int/datasets/data/interim-full-daily/ and JRA-55 data were retrieved from https://doi.org/10.5065/D6HH6H41 (Japan Meteorological Agency 2013).
REFERENCES
Alin, A., 2010: Multicollinearity. Wiley Interdiscip. Rev.: Comput. Stat., 2, 370–374, https://doi.org/10.1002/wics.84.
Altenhoff, A. M., O. Martius, M. Croci-Maspoli, C. Schwierz, and H. C. Davies, 2008: Linkage of atmospheric blocks and synoptic-scale Rossby waves: A climatological analysis. Tellus, 60A, 1053–1063, https://doi.org/10.1111/j.1600-0870.2008.00354.x.
Baumgart, M., M. Riemer, V. Wirth, F. Teubler, and S. T. K. Lang, 2018: Potential vorticity dynamics of forecast errors: A quantitative case study. Mon. Wea. Rev., 146, 1405–1425, https://doi.org/10.1175/MWR-D-17-0196.1.
Berman, J. D., and R. D. Torn, 2019: The impact of initial condition and warm conveyor belt forecast uncertainty on variability in the downstream waveguide in an ECWMF case study. Mon. Wea. Rev., 147, 4071–4089, https://doi.org/10.1175/MWR-D-18-0333.1.
Biard, J. C., and K. E. Kunkel, 2019: Automated detection of weather fronts using a deep learning neural network. Adv. Stat. Climatol. Meteor. Oceanogr., 5, 147–160, https://doi.org/10.5194/ascmo-5-147-2019.
Billet, J., M. DeLisi, B. G. Smith, and C. Gates, 1997: Use of regression techniques to predict hail size and the probability of large hail. Wea. Forecasting, 12, 154–164, https://doi.org/10.1175/1520-0434(1997)012<0154:UORTTP>2.0.CO;2.
Binder, H., M. Boettcher, H. Joos, and H. Wernli, 2016: The role of warm conveyor belts for the intensification of extratropical cyclones in Northern Hemisphere winter. J. Atmos. Sci., 73, 3997–4020, https://doi.org/10.1175/JAS-D-15-0302.1.
Bosart, L. F., B. J. Moore, J. M. Cordeira, and H. M. Archambault, 2017: Interactions of North Pacific tropical, midlatitude, and polar disturbances resulting in linked extreme weather events over North America in October 2007. Mon. Wea. Rev., 145, 1245–1273, https://doi.org/10.1175/MWR-D-16-0230.1.
Boutle, I. A., S. E. Belcher, and R. S. Plant, 2011: Moisture transport in midlatitude cyclones. Quart. J. Roy. Meteor. Soc., 137, 360–373, https://doi.org/10.1002/qj.783.
Bowman, K. P., J. C. Lin, A. Stohl, R. Draxler, P. Konopka, A. Andrews, and D. Brunner, 2013: Input data requirements for Lagrangian trajectory models. Bull. Amer. Meteor. Soc., 94, 1051–1058, https://doi.org/10.1175/BAMS-D-12-00076.1.
Browning, K. A., and N. M. Roberts, 1994: Structure of a frontal cyclone. Quart. J. Roy. Meteor. Soc., 120, 1535–1557, https://doi.org/10.1002/qj.49712052006.
Browning, K. A., M. E. Hardman, T. W. Harrold, and C. W. Pardoe, 1973: The structure of rainbands within a mid-latitude depression. Quart. J. Roy. Meteor. Soc., 99, 215–231, https://doi.org/10.1002/qj.49709942002.
Carlson, T. N., 1980: Airflow through midlatitude cyclones and the comma cloud pattern. Mon. Wea. Rev., 108, 1498–1509, https://doi.org/10.1175/1520-0493(1980)108<1498:ATMCAT>2.0.CO;2.
Chattopadhyay, A., P. Hassanzadeh, and S. Pasha, 2020: Predicting clustered weather patterns: A test case for applications of convolutional neural networks to spatio-temporal climate data. Sci. Rep., 10, 1317, https://doi.org/10.1038/s41598-020-57897-9.
Dacre, H. F., and S. L. Gray, 2013: Quantifying the climatological relationship between extratropical cyclone intensity and atmospheric precursors. Geophys. Res. Lett., 40, 2322–2327, https://doi.org/10.1002/grl.50105.
Dacre, H. F., O. Martínez-Alvarado, and C. O. Mbengue, 2019: Linking atmospheric rivers and warm conveyor belt airflows. J. Hydrometeor., 20, 1183–1196, https://doi.org/10.1175/JHM-D-18-0175.1.
Davini, P., S. Corti, F. D’Andrea, G. Rivière, and J. von Hardenberg, 2017: Improved winter European atmospheric blocking frequencies in high-resolution global climate simulations. J. Adv. Model. Earth Syst., 9, 2615–2634, https://doi.org/10.1002/2017MS001082.
Davis, J., and M. Goadrich, 2006: The relationship between precision-recall and ROC curves. Proc. 23rd Int. Conf. on Machine Learning, Pittsburgh, PA, ACM, 233–240, https://doi.org/10.1145/1143844.1143874.
Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828.
Dreiseitl, S., and L. Ohno-Machado, 2002: Logistic regression and artificial neural network classification models: A methodology review. J. Biomed. Inform., 35, 352–359, https://doi.org/10.1016/S1532-0464(03)00034-0.
Drobot, S. D., and J. A. Maslanik, 2002: A practical method for long-range forecasting of ice severity in the Beaufort Sea. Geophys. Res. Lett., 29, 1213, https://doi.org/10.1029/2001GL014173.
Eckhardt, S., A. Stohl, H. Wernli, P. James, C. Forster, and N. Spichtinger, 2004: A 15-year climatology of warm conveyor belts. J. Climate, 17, 218–237, https://doi.org/10.1175/1520-0442(2004)017<0218:AYCOWC>2.0.CO;2.
Field, P. R., and R. Wood, 2007: Precipitation and cloud structure in midlatitude cyclones. J. Climate, 20, 233–254, https://doi.org/10.1175/JCLI3998.1.
Gagne, D. J., S. E. Haupt, D. W. Nychka, and G. Thompson, 2019: Interpretable deep learning for spatial analysis of severe hailstorms. Mon. Wea. Rev., 147, 2827–2845, https://doi.org/10.1175/MWR-D-18-0316.1.
Grams, C. M., and H. M. Archambault, 2016: The key role of diabatic outflow in amplifying the midlatitude flow: A representative case study of weather systems surrounding western North Pacific extratropical transition. Mon. Wea. Rev., 144, 3847–3869, https://doi.org/10.1175/MWR-D-15-0419.1.
Grams, C. M., and Coauthors, 2011: The key role of diabatic processes in modifying the upper-tropospheric wave guide: A North Atlantic case-study. Quart. J. Roy. Meteor. Soc., 137, 2174–2193, https://doi.org/10.1002/qj.891.
Grams, C. M., S. C. Jones, C. A. Davis, P. A. Harr, and M. Weissmann, 2013: The impact of Typhoon Jangmi (2008) on the midlatitude flow. Part I: Upper-level ridgebuilding and modification of the jet. Quart. J. Roy. Meteor. Soc., 139, 2148–2164, https://doi.org/10.1002/qj.2091.
Grams, C. M., H. Binder, S. Pfahl, N. Piaget, and H. Wernli, 2014: Atmospheric processes triggering the central European floods in June 2013. Nat. Hazards Earth Syst. Sci., 14, 1691–1702, https://doi.org/10.5194/nhess-14-1691-2014.
Grams, C. M., L. Magnusson, and E. Madonna, 2018: An atmospheric dynamics perspective on the amplification and propagation of forecast error in numerical weather prediction models: A case study. Quart. J. Roy. Meteor. Soc., 144, 2577–2591, https://doi.org/10.1002/qj.3353.
Gray, S. L., C. M. Dunning, J. Methven, G. Masato, and J. M. Chagnon, 2014: Systematic model forecast error in Rossby wave structure. Geophys. Res. Lett., 41, 2979–2987, https://doi.org/10.1002/2014GL059282.
Hamill, T. M., and G. N. Kiladis, 2014: Skill of the MJO and Northern Hemisphere blocking in GEFS medium-range reforecasts. Mon. Wea. Rev., 142, 868–885, https://doi.org/10.1175/MWR-D-13-00199.1.
Harada, Y., and Coauthors, 2016: The JRA-55 Reanalysis: Representation of atmospheric circulation and climate variability. J. Meteor. Soc. Japan, 94, 269–302, https://doi.org/10.2151/jmsj.2016-015.
Harrold, T. W., 1973: Mechanisms influencing the distribution of precipitation within baroclinic disturbances. Quart. J. Roy. Meteor. Soc., 99, 232–251, https://doi.org/10.1002/qj.49709942003.
Hosmer, D. W., and S. Lemeshow, 2000: Applied Logistic Regression. 2nd ed. Wiley, 373 pp.
Japan Meteorological Agency, 2013: JRA-55: Japanese 55-year Reanalysis, daily 3-hourly and 6-hourly data. National Center for Atmospheric Research Computational and Information Systems Laboratory, accessed 1 February 2021, https://doi.org/10.5065/D6HH6H41.
Joos, H., and H. Wernli, 2012: Influence of microphysical processes on the potential vorticity development in a warm conveyor belt: A case-study with the limited-area model COSMO. Quart. J. Roy. Meteor. Soc., 138, 407–418, https://doi.org/10.1002/qj.934.
Joos, H., and R. M. Forbes, 2016: Impact of different IFS microphysics on a warm conveyor belt and the downstream flow evolution. Quart. J. Roy. Meteor. Soc., 142, 2727–2739, https://doi.org/10.1002/qj.2863.
King, G., and L. Zeng, 2003: Logistic regression in rare events data. J. Stat. Software, 8, 137–163, https://doi.org/10.18637/jss.v008.i02.
Kobayashi, S., and Coauthors, 2015: The JRA-55 Reanalysis: General specifications and basic characteristics. J. Meteor. Soc. Japan, 93, 5–48, https://doi.org/10.2151/jmsj.2015-001.
Kuo, Y.-H., M. A. Shapiro, and E. G. Donall, 1991: The interaction between baroclinic and diabatic processes in a numerical simulation of a rapidly intensifying extratropical marine cyclone. Mon. Wea. Rev., 119, 368–384, https://doi.org/10.1175/1520-0493(1991)119<0368:TIBBAD>2.0.CO;2.
Lagerquist, R., A. M. McGovern, and D. J. Gagne, 2019: Deep learning for spatially explicit prediction of synoptic-scale fronts. Wea. Forecasting, 34, 1137–1160, https://doi.org/10.1175/WAF-D-18-0183.1.
Lamberson, W. S., R. D. Torn, L. F. Bosart, and L. Magnusson, 2016: Diagnosis of the source and evolution of medium-range forecast errors for extratropical cyclone Joachim. Wea. Forecasting, 31, 1197–1214, https://doi.org/10.1175/WAF-D-16-0026.1.
Leroy, A., and M. C. Wheeler, 2008: Statistical prediction of weekly tropical cyclone activity in the Southern Hemisphere. Mon. Wea. Rev., 136, 3637–3654, https://doi.org/10.1175/2008MWR2426.1.
Liu, Y., and Coauthors, 2016: Application of deep convolutional neural networks for detecting extreme weather in climate datasets. arXiv, http://arxiv.org/abs/1605.01156.
Maddison, J. W., S. L. Gray, O. Martínez-Alvarado, and K. D. Williams, 2019: Upstream cyclone influence on the predictability of block onsets over the Euro-Atlantic region. Mon. Wea. Rev., 147, 1277–1296, https://doi.org/10.1175/MWR-D-18-0226.1.
Maddison, J. W., S. L. Gray, O. Martínez-Alvarado, and K. D. Williams, 2020: Impact of model upgrades on diabatic processes in extratropical cyclones and downstream forecast evolution. Quart. J. Roy. Meteor. Soc., 146, 1322–1350, https://doi.org/10.1002/qj.3739.
Madonna, E., S. Limbach, C. Aebi, H. Joos, H. Wernli, and O. Martius, 2014a: On the co-occurrence of warm conveyor belt outflows and PV streamers. J. Atmos. Sci., 71, 3668–3673, https://doi.org/10.1175/JAS-D-14-0119.1.
Madonna, E., H. Wernli, H. Joos, and O. Martius, 2014b: Warm conveyor belts in the ERA-Interim dataset (1979–2010). Part I: Climatology and potential vorticity evolution. J. Climate, 27, 3–26, https://doi.org/10.1175/JCLI-D-12-00720.1.
Madonna, E., M. Boettcher, C. M. Grams, H. Joos, O. Martius, and H. Wernli, 2015: Verification of North Atlantic warm conveyor belt outflows in ECMWF forecasts. Quart. J. Roy. Meteor. Soc., 141, 1333–1344, https://doi.org/10.1002/qj.2442.
Manzato, A., 2007: A note on the maximum Peirce skill score. Wea. Forecasting, 22, 1148–1154, https://doi.org/10.1175/WAF1041.1.
Martínez-Alvarado, O., E. Madonna, S. L. Gray, and H. Joos, 2016: A route to systematic error in forecasts of Rossby waves. Quart. J. Roy. Meteor. Soc., 142, 196–210, https://doi.org/10.1002/qj.2645.
Martínez-Alvarado, O., J. W. Maddison, S. L. Gray, and K. D. Williams, 2018: Atmospheric blocking and upper-level Rossby-wave forecast skill dependence on model configuration. Quart. J. Roy. Meteor. Soc., 144, 2165–2181, https://doi.org/10.1002/qj.3326.
Masato, G., B. J. Hoskins, and T. Woollings, 2013: Winter and summer Northern Hemisphere blocking in CMIP5 models. J. Climate, 26, 7044–7059, https://doi.org/10.1175/JCLI-D-12-00466.1.
Matsueda, M., 2009: Blocking predictability in operational medium-range ensemble forecasts. SOLA, 5, 113–116, https://doi.org/10.2151/sola.2009-029.
Matsueda, M., R. Mizuta, and S. Kusunoki, 2009: Future change in wintertime atmospheric blocking simulated using a 20-km-mesh atmospheric global circulation model. J. Geophys. Res., 114, D12114, https://doi.org/10.1029/2009JD011919.
Matthews, B. W., 1975: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta, 405, 442–451, https://doi.org/10.1016/0005-2795(75)90109-9.
McGovern, A., R. Lagerquist, D. J. Gagne, G. E. Jergensen, K. L. Elmore, C. R. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.
McTaggart-Cowan, R., J. R. Gyakum, and R. W. Moore, 2017: The baroclinic moisture flux. Mon. Wea. Rev., 145, 25–47, https://doi.org/10.1175/MWR-D-16-0153.1.
Methven, J., 2015: Potential vorticity in warm conveyor belt outflow. Quart. J. Roy. Meteor. Soc., 141, 1065–1071, https://doi.org/10.1002/qj.2393.
Mohr, S., M. Kunz, and K. Keuler, 2015: Development and application of a logistic model to estimate the past and future hail potential in Germany. J. Geophys. Res. Atmos., 120, 3939–3956, https://doi.org/10.1002/2014JD022959.
Neiman, P. J., and M. A. Shapiro, 1993: The life cycle of an extratropical marine cyclone. Part I: Frontal-cyclone evolution and thermodynamic air–sea interaction. Mon. Wea. Rev., 121, 2153–2176, https://doi.org/10.1175/1520-0493(1993)121<2153:TLCOAE>2.0.CO;2.