Moisture boundaries, or drylines, are common over the southern U.S. high plains and are one of the most important airmass boundaries for convective initiation over this region. In favorable environments, drylines can initiate storms that produce strong and violent tornadoes, large hail, lightning, and heavy rainfall. Despite their importance, there are few studies documenting climatological dryline location and frequency, or performing systematic dryline forecast evaluation, which likely stems from difficulties in objectively identifying drylines over large datasets. Previous studies have employed tedious manual identification procedures. This study aims to streamline dryline identification by developing an automated, multiparameter algorithm, which applies image-processing and pattern recognition techniques to various meteorological fields and their gradients to identify drylines. The algorithm is applied to five years of high-resolution 24-h forecasts from Weather Research and Forecasting (WRF) Model simulations valid April–June 2007–11. Manually identified dryline positions, which were available from a previous study using the same dataset, are used as truth to evaluate the algorithm performance. Generally, the algorithm performed very well. High probability of detection (POD) scores indicated that the majority of drylines were identified by the method. However, a relatively high false alarm ratio (FAR) was also found, indicating that a large number of nondryline features were also identified. Preliminary use of random forests (a machine learning technique) significantly decreased the FAR, while minimally impacting the POD. The algorithm lays the groundwork for applications including model evaluation and operational forecasting, and should enable efficient analysis of drylines from very large datasets.
1. Introduction and motivation
Drylines occur most frequently over the central U.S. high plains [i.e., from western Nebraska to western Oklahoma and Texas; e.g., Fujita (1958) and Rhea (1966)] and mark an intersection where relatively warm, moist air originating from the Gulf of Mexico meets a relatively hot, dry air mass originating over the elevated terrain of the southwestern United States and northern Mexico. Afternoon dewpoint gradients of 10 K (100 km)−1 are common with drylines, and in particularly strong drylines, extreme moisture gradients up to 10 K km−1 have been documented (Pietrycha and Rasmussen 2001). The strong localized convergence and resulting ascent, along with abundant moisture that is often associated with drylines, make them one of the most important airmass boundaries for convective initiation over this region. When other favorable environmental parameters (e.g., vertical wind shear and instability) are present, drylines can serve as the initiation mechanism for thunderstorms that produce hazardous weather, including strong and violent tornadoes, large hail, lightning, and heavy rainfall.
Despite the importance of drylines, there are only a few studies documenting the climatological dryline location and frequency. Rhea (1966) and Schaefer (1973) documented drylines during April–June of the periods 1959–62 and 1966–68, respectively. Peterson (1983) documented drylines over a 10-yr period (1970–79), but only focused on western Texas. To date, the most comprehensive climatological dryline study is Hoch and Markowski (2005). They established a 30-yr dryline climatology using 0000 UTC surface analyses during April–June 1973–2002, finding drylines on 32% of days, with peak frequency occurring from mid- to late May. Schultz et al. (2007) and Coffer et al. (2013) also find similar dryline frequencies over shorter time periods.
There are even fewer studies evaluating the performance of numerical weather prediction (NWP) models in predicting the occurrence and location of drylines for large sets of cases. Most model evaluation studies have focused on single cases (e.g., Ziegler et al. 1997; Hane et al. 2001). In fact, to our knowledge the only studies that have systematically diagnosed the performance of NWP models in forecasting dryline occurrence and location are Coffer et al. (2013) and Clark et al. (2015). In Coffer et al. (2013), 24-h forecasts of dryline position from a 4-km grid-spacing experimental version of the Weather Research and Forecasting (WRF) Model (Skamarock et al. 2008), as well as the 12-km grid-spacing North American Mesoscale Forecast System (NAM; Rogers et al. 2009), were evaluated for the period April–June 2007–11. It was found that the NAM had no systematic eastward biases while the WRF had significant eastward biases. In Clark et al. (2015), the sensitivity of 24-h forecast dryline position to boundary layer parameterizations in 4-km grid-spacing WRF Model simulations was examined for the cases that occurred during the 2010–12 NOAA/Hazardous Weather Testbed Spring Forecasting Experiments (e.g., Clark et al. 2012). Significant differences in average forecast dryline position were found among the different boundary layer parameterizations with the Mellor–Yamada–Nakanishi–Niino (MYNN; Nakanishi 2000, 2001; Nakanishi and Niino 2004, 2006) scheme generally having the largest eastward position errors.
The small number of studies documenting observed dryline climatologies, as well as performing systematic NWP model forecast evaluation, is likely related to several factors. First, the relatively coarse observing network makes identifying the precise location of the dryline difficult. For example, Hoch and Markowski (2005) used objective analyses of conventional, synoptic-scale surface observations obtained using a two-pass Barnes (1964) technique, and estimated that the density of observations allowed them to determine the maximum eastward extent of dryline positions only to within about 0.5°. Recent development and improvement of high-resolution analysis and assimilation systems such as NCEP’s hourly 5-km Real-Time Mesoscale Analysis (RTMA; De Pondeca et al. 2011) system and the hourly 12-km grid-spacing Rapid Refresh (RAP; Brown et al. 2012) model should help produce more accurate and precise observed dryline positions. Second, only within the last decade or so have operational mesoscale models had adequate resolution to realistically resolve the sharp moisture gradients associated with drylines. Finally, further difficulties with identifying and tracking drylines occur because of their complex spatial and temporal characteristics. For example, drylines can jump (e.g., Hane et al. 2001) when a large section of the moist sector mixes out all at once, multiple moisture gradients are sometimes present (e.g., Hane et al. 1993; Crawford and Bluestein 1997), and other types of airmass boundaries, such as outflow boundaries and cold fronts, have some characteristics similar to drylines. Because of these complexities, careful examination of multiple meteorological fields is necessary to properly identify drylines in both model and observational datasets, which is exceedingly laborious, especially for large datasets of dryline cases. For example, the manual dryline identification conducted in both the Coffer et al. (2013) and Clark et al. (2015) studies took many months to complete. As a result, Coffer et al. (2013) recommend the development of automated approaches for identifying, tracking, and visualizing drylines simulated in high-resolution models, and recognize that the increasing use of high-resolution ensembles will make automation increasingly valuable.
This study aims to address some of the recommendations of Coffer et al. (2013) by developing an automated, multiparameter dryline identification algorithm, which applies image-processing and pattern recognition techniques to various fields (and their gradients) to identify drylines. The algorithm is applied to five years of high-resolution 24-h forecasts from WRF Model simulations valid April–June 2007–11, which is the same dataset evaluated by Coffer et al. (2013). Thus, manually identified forecast dryline positions from the Coffer et al. (2013) study are available as a truth dataset to evaluate the algorithm performance. Additionally, to test the application of a machine learning technique known as random forests (RF; Breiman 2001) using the 2007–11 dataset as a training period, the dryline algorithm is also applied to 24-h National Severe Storms Laboratory (NSSL) WRF forecasts from April to June 2012. For evaluation purposes, a truth dataset was created for the 2012 cases using the manual identification procedures of Coffer et al. (2013). The remainder of this study is organized as follows. Section 2 describes the data and methods, including the configuration of the WRF Model simulations, the methodology used to obtain the manually identified dryline positions in Coffer et al. (2013), and the details of the automated dryline identification algorithm. Section 3 describes the results, and section 4 offers a summary, conclusions, and recommendations for further algorithm improvements and applications.
2. Data and methods
a. NSSL-WRF configuration
The dryline algorithm is applied to 0000 UTC initialized, 24-h forecasts from a 4-km grid-spacing configuration of the WRF Model that is run by the NSSL using a computing allocation on the Jet High-Performance Computing (HPC) cluster (Raytheon–Aspen Systems) in Boulder, Colorado. This permanent experimental modeling framework is known as the NSSL-WRF and was developed to provide storm-scale guidance to Storm Prediction Center forecasters and serve as a testing ground for the development of storm-scale model diagnostics (e.g., Kain et al. 2010). Before 9 June 2009, WRF Model, version 2.2, was used for the NSSL-WRF with a domain encompassing most of the United States except for portions of the west. After 9 June 2009, the domain was expanded to encompass the entire CONUS and the model was updated to a newer version. Physical parameterizations include the Mellor–Yamada–Janjić (MYJ; Mellor and Yamada 1982; Janjić 2002) boundary layer scheme, the WRF single-moment 6-class microphysics scheme (WSM6; Hong and Lim 2006), the Noah land surface model (Chen and Dudhia 2001), and the Dudhia (1989) shortwave and RRTM (Mlawer et al. 1997) longwave radiation schemes. Initial and lateral boundary conditions (3-h updates) are from 0000 UTC initializations of the 12-km grid-spacing NAM (interpolated onto a 40-km grid).
b. Manual dryline identification
The manually identified 24-h forecast dryline positions that are used to evaluate the dryline algorithm were obtained from the Coffer et al. (2013) study, which included 116 dryline cases during the period April–June 2007–11. Including the 19 additional dryline cases that were identified for the April–June 2012 period gives 135 total cases, which are listed in Table 1. The details on the criteria and procedures for manual dryline identification are described in Coffer et al. (2013), and also summarized as follows. The primary dryline criterion was an unambiguous boundary between relatively moist and dry air along boundary length scales of O(100) km, where moisture boundaries were identified using 24-h forecasts of 2-m specific humidity. It was required that at some point along the boundary the specific humidity gradient magnitude was at least 3 g kg−1 (100 km)−1. Additionally, 2-m temperature fields were used to distinguish drylines from cold fronts, and a shift in 10-m wind direction from a dry to a moist source region was also required. When drylines were identified, a Grid Analysis and Display System (GrADS; http://www.iges.org/grads/) script was used to manually draw a series of points along the axis of maximum specific humidity gradient magnitude. Straight-line segments connecting these points composed the dryline, and the latitude–longitude coordinates were saved in text files, which are used for the dryline algorithm evaluation.
c. Description of automated dryline algorithm
A flowchart with each step of the algorithm is provided in Fig. 1. Additionally, Figs. 2 and 3 illustrate the impact of each step for a dryline case that occurred at 0000 UTC 26 April 2012. This was a particularly challenging case for dryline identification because moisture gradients, although clearly present, were relatively weak, and several nondryline boundaries were present. Additionally, the case provides a clear demonstration of the impact of nearly each step in the algorithm. The synoptic weather regime was characterized by a weak ridge axis extending along the Rocky Mountains with weak [15–25 knots (kt; where 1 kt = 0.51 m s−1)] westerly to west-northwesterly midtropospheric flow downstream over the southern high plains. The remainder of this section describes each step of the dryline algorithm.
The first part of the algorithm is designed to identify any moisture gradient for potential dryline classification. It begins by smoothing the raw specific humidity (Fig. 2a) and dewpoint (Fig. 2b) fields using a Gaussian filter with σ = 10 km (or 2.5 grid points), which is used to reduce noise while retaining dryline-scale features. Similar filtering was found to very effectively aid in the manual identification performed by Coffer et al. (2013). Next, the gradients of the smoothed moisture fields are calculated using a 3 × 3 Sobel operator, which is a finite-difference method commonly used in image processing for edge detection (Figs. 2c,d; Lakshmanan 2012). As with all finite-difference methods, the technique is particularly sensitive to high-frequency noise, making the previous smoothing step key to obtaining reasonable results. Then, the magnitudes of the two moisture gradient fields are thresholded to provide a preliminary estimate of possible dryline locations (Figs. 2e,f). In other words, each contiguous region of points exceeding the specified threshold represents a possible dryline. This initial thresholding is performed using optimistic (i.e., lower) thresholds designed to reduce the risk of breaking apart desired features early on at the cost of including spurious dryline regions.
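The smoothing, gradient, and thresholding steps described above can be sketched with SciPy's image-processing routines. This is a minimal illustration under stated assumptions, not the code used in the study: the function name, the 4-km default grid spacing, and the threshold passed in are all hypothetical, since the threshold values are not listed in this section.

```python
import numpy as np
from scipy import ndimage

def moisture_gradient_mask(field, threshold, sigma_pts=2.5, dx_km=4.0):
    """Smooth a 2-D moisture field, compute its Sobel gradient, and threshold it.

    threshold is a hypothetical value in field units per km.
    Returns (boolean mask, gradient magnitude).
    """
    smoothed = ndimage.gaussian_filter(np.asarray(field, dtype=float),
                                       sigma=sigma_pts)
    # ndimage.sobel is unnormalized: its 3x3 kernel responds with 8x the
    # per-grid-point slope, so divide by 8*dx to obtain units per km.
    gy = ndimage.sobel(smoothed, axis=0) / (8.0 * dx_km)
    gx = ndimage.sobel(smoothed, axis=1) / (8.0 * dx_km)
    magnitude = np.hypot(gx, gy)
    return magnitude >= threshold, magnitude

# Example: a uniform west-east ramp of 0.05 g/kg per grid point.
ramp = 0.05 * np.tile(np.arange(60.0), (60, 1))
mask, mag = moisture_gradient_mask(ramp, threshold=0.01)
```

Each contiguous region of True values in the returned mask would then be a candidate dryline region in the sense described above.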
Although previous studies (e.g., Hoch and Markowski 2005) recommend using specific humidity gradient magnitude for dryline identification because specific humidity is not sensitive to elevation differences (unlike dewpoint temperature), we find the best results when requiring that both filtered dewpoint and specific humidity gradient magnitude fields exceed a specified threshold. Thus, once potential dryline regions are identified from thresholding both the specific humidity and dewpoint gradient fields, only the points that exceeded the specified thresholds for both of the fields are considered for dryline classification (i.e., binary AND; Fig. 2g). These regions are then required to contain at least one grid point above a vapor pressure gradient magnitude threshold and be of a certain size (Fig. 2h). The vapor pressure criterion removes a large number of nondryline features in addition to smaller disconnected regions, while the size criterion prevents the inclusion of small, noiselike features regardless of their moisture gradient.
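A sketch of the binary AND and the subsequent region filtering follows, assuming contiguous regions are found by connected-component labeling. The function name, the 4-connectivity (SciPy's default), and both threshold arguments are assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage

def filter_regions(q_mask, td_mask, e_grad_mag, e_threshold, min_size):
    """Keep regions that pass both moisture masks, contain at least one point
    at or above the vapor pressure gradient threshold, and are large enough.

    e_threshold and min_size are hypothetical tuning parameters.
    """
    both = np.asarray(q_mask, dtype=bool) & np.asarray(td_mask, dtype=bool)
    labels, n_regions = ndimage.label(both)      # 4-connected by default
    if n_regions == 0:
        return np.zeros(both.shape, dtype=bool)
    ids = np.arange(1, n_regions + 1)
    sizes = np.bincount(labels.ravel())[1:]      # grid points per region
    peaks = ndimage.maximum(e_grad_mag, labels, index=ids)
    keep_ids = ids[(sizes >= min_size) & (peaks >= e_threshold)]
    return np.isin(labels, keep_ids)

# Example: one 4-point region that passes and one 2-point region that is too small.
q = np.zeros((6, 8), dtype=bool)
td = np.zeros((6, 8), dtype=bool)
q[1, 1:6] = True
td[1, 0:5] = True
q[4, 6:8] = True
td[4, 6:8] = True
e = np.zeros((6, 8))
e[1, 2] = 5.0
e[4, 6] = 5.0
kept = filter_regions(q, td, e, e_threshold=1.0, min_size=3)
```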
The next section of the algorithm begins independently of the previous step and is designed to identify the lines of strongest moisture gradient along which any potential drylines will be placed. The process is based on the application of nonmaximum suppression (NMS; Sun and Vallotton 2009) to the specific humidity gradient field. The NMS method requires the x and y components of the gradient and is applied by performing a local maximum only on the examined grid point and its neighbors that are perpendicular to the gradient direction. The result is continuous isolines of maxima or minima in the gradient field. While the algorithm also identifies regions of noise, these are limited to thin and very fragmented lines. One weakness of the NMS method is that it does not allow for branching (i.e., two isolines of maxima merging or splitting from one), which creates the potential for identified isolines to follow smaller-scale, nondryline features. Two steps are taken to prevent these deviations. First, the specific humidity field used for this process is smoothed using a Gaussian filter with σ = 6 grid points (24 km), which removes some of the smaller branches, increases the spacing between identified isolines, and greatly reduces noise caused by small variations in gradient intensity. Second, a number of binary dilations (e.g., Weeks 1996) are used to enlarge the isolines identified by NMS, which helps connect closely spaced branches. An example of the smoothed specific humidity field and corresponding application of NMS with binary dilations for 0000 UTC 26 April is shown in Figs. 3a and 3b, respectively.
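The thinning step can be illustrated with a Canny-style nonmaximum suppression. This sketch assumes that the local-maximum test compares each grid point with its two neighbors across the boundary (i.e., along the quantized direction of the moisture gradient); that reading, the four-sector quantization, and the function name are assumptions rather than details confirmed by the text. The binary dilations that follow NMS are not shown.

```python
import numpy as np

def non_maximum_suppression(gx, gy):
    """Keep points whose gradient magnitude is >= both neighbors along the
    quantized gradient direction (gx: d/dx along columns, gy: d/dy along rows)."""
    mag = np.hypot(gx, gy)
    # Quantize direction into four sectors: 0, 45, 90, and 135 degrees.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    sector = np.round(angle / 45.0).astype(int) % 4
    # (row, column) offset of the forward neighbor for each sector.
    offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}
    padded = np.pad(mag, 1, mode="constant", constant_values=0.0)
    ny, nx = mag.shape
    keep = np.zeros(mag.shape, dtype=bool)
    for s, (dj, di) in offsets.items():
        ahead = padded[1 + dj:1 + dj + ny, 1 + di:1 + di + nx]
        behind = padded[1 - dj:1 - dj + ny, 1 - di:1 - di + nx]
        keep |= (sector == s) & (mag >= ahead) & (mag >= behind) & (mag > 0)
    return keep

# Example: a north-south oriented gradient maximum centered on column 5.
x = np.arange(11.0)
profile = np.exp(-0.5 * ((x - 5.0) / 2.0) ** 2)
gx = np.tile(profile, (7, 1))
gy = np.zeros_like(gx)
ridge = non_maximum_suppression(gx, gy)
```

Because ties are retained, plateaus in the gradient magnitude would produce lines more than one grid point wide in this sketch.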
The final portion of the algorithm begins by combining the binary masks created in the previous two steps using a binary AND (Fig. 3c). The mask from the first step removes the spurious isolines found by NMS, restricting the output to regions where a drylinelike environment is present. Similarly, the NMS mask limits the spurious deviations identified during the initial thresholding process by removing all regions that do not fall along a line of maximum moisture gradient. The resulting output mask often has small gaps present because of minor discrepancies in the two input grids. The holes are removed using binary closing, which recursively expands and contracts the identified regions to fill in the empty spaces (Fig. 3d). The vapor pressure gradient and size thresholds are then reapplied, removing any undesired features that may have broken off during the above process (Fig. 3e). Finally, a set of masks is applied to remove regions near large bodies of water (e.g., Great Lakes and the Gulf of Mexico), and a latitudinal mask excludes all values outside of the area between 30° and 42°N to be consistent with the area used for manual dryline identification in Coffer et al. (2013). The resulting image forms the final output of the algorithm (Fig. 3f). The dryline algorithm was applied to the 135 dryline cases, as well as 344 cases that did not contain drylines (479 total cases) in order to assess the performance, including both events and nonevents.
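The mask combination, gap filling, and latitudinal masking can be sketched as follows. The 3 × 3 square structuring element and the single closing iteration are assumptions; the reapplied vapor pressure gradient and size thresholds and the water-body masks are omitted for brevity.

```python
import numpy as np
from scipy import ndimage

def finalize_dryline_mask(threshold_mask, nms_mask, lats, n_close=1,
                          lat_min=30.0, lat_max=42.0):
    """Combine the threshold and (already dilated) NMS masks, fill small gaps,
    and keep only points between lat_min and lat_max (degrees north)."""
    combined = np.asarray(threshold_mask, dtype=bool) & np.asarray(nms_mask, dtype=bool)
    # Binary closing = dilation followed by erosion; a square element is
    # assumed here so that one-point gaps in thin lines are filled.
    square = np.ones((3, 3), dtype=bool)
    closed = ndimage.binary_closing(combined, structure=square,
                                    iterations=n_close)
    in_band = (lats >= lat_min) & (lats <= lat_max)
    return closed & in_band

# Example: a line at 35N with a one-point gap, plus a line at 27N to be masked out.
thr = np.zeros((11, 15), dtype=bool)
thr[5, 2:13] = True
thr[1, 2:13] = True
nms = thr.copy()
nms[5, 7] = False
lats = np.repeat(np.linspace(25.0, 45.0, 11)[:, None], 15, axis=1)
final = finalize_dryline_mask(thr, nms, lats)
```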
3. Results
a. Assessment of algorithm performance
First, to quantify the correspondence between the manually and objectively identified dryline positions, frequency histograms of the shortest distance between objectively identified dryline points and the manually identified dryline positions of Coffer et al. (2013) are presented in Fig. 4. It is important to note that only the objective dryline points that fall within 30 km of a manually identified dryline are considered in this analysis. The 30-km buffer was applied so that nondryline features that were mistakenly identified by the dryline algorithm (discussed later) would not skew the results. In other words, for objectively identified drylines that matched a manually identified dryline, Fig. 4 aims to quantify the location differences. Figure 4a indicates that 75% of the objective dryline points are within 10 km of a manually identified dryline, and 90% are within 15 km. The same data are used in Fig. 4b, but the shortest relative east–west distances between the manually and objectively identified drylines are presented, with negative (positive) values indicating westward (eastward) differences relative to the subjectively identified drylines. The differences are approximately normally distributed with a mean near zero, indicating a lack of systematic east–west differences in manual versus objective dryline positions.
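The distance statistics can be illustrated with a short sketch. It assumes the manual dryline is represented by densely spaced (latitude, longitude) points and that distances are great-circle distances; both choices, and the function names, are assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def nearest_distance_km(obj_pts, man_pts):
    """Great-circle distance (km) from each objective point to the nearest
    manual point; points are (lat, lon) pairs in degrees."""
    obj = np.radians(np.asarray(obj_pts, dtype=float))[:, None, :]
    man = np.radians(np.asarray(man_pts, dtype=float))[None, :, :]
    dlat = obj[..., 0] - man[..., 0]
    dlon = obj[..., 1] - man[..., 1]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(obj[..., 0]) * np.cos(man[..., 0]) * np.sin(dlon / 2.0) ** 2)
    return (2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))).min(axis=1)

def fraction_within(dist_km, limit_km, buffer_km=30.0):
    """Fraction of matched points (those within buffer_km) that lie within limit_km."""
    matched = dist_km[dist_km <= buffer_km]
    return np.mean(matched <= limit_km) if matched.size else np.nan

# Example: a manual dryline along 35N and three objective points north of it.
manual = [(35.0, lon) for lon in np.arange(-100.0, -99.0, 0.01)]
objective = [(35.05, -99.5), (35.2, -99.5), (36.0, -99.5)]
dist = nearest_distance_km(objective, manual)
```

In this example the third point lies well outside the 30-km buffer and is excluded before the fraction is computed.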
Some of the larger distances in Fig. 4 can be attributed to the dryline algorithm following a different moisture boundary than in the subjective analyses, an example of which is illustrated over central Texas in Fig. 5. Because NMS follows the moisture boundary with the largest gradient magnitude, when there are two moisture boundaries meeting dryline criteria present, the dryline algorithm will choose the stronger one. In the case of Fig. 5, it could be argued that the objectively identified dryline is actually a better depiction of the dryline than the manually identified one.
Another factor leading to some of the larger distances in Fig. 4 is that the dryline algorithm tends to extend drylines farther along moisture boundaries than the subjective analyses. This effect can be seen on the northernmost sections of the dryline in Fig. 5, with the manually identified dryline ending in Colorado while the objective dryline extends farther north into the Nebraska Panhandle. The lengthening is likely due to the combination of initial thresholds and NMS. The isolines created by NMS rely on the threshold mask to provide a stopping criterion. Without the threshold mask, the isolines extend as far as possible while following the cross-boundary gradient maximum regardless of gradient intensity. However, the initial thresholds are optimistic (i.e., intentionally low), ensuring that dryline regions are retained at the cost of keeping additional features. While the subsequent vapor pressure gradient and size thresholds aid in removing disconnected features, they cannot affect extensions attached to the dryline regions. When the threshold mask is applied to the NMS output, the isolines are allowed to travel into the lower-gradient regions, resulting in overly long drylines. A second thresholding has been examined for use in reducing this effect; however, we were unable to achieve consistent results. A variety of iterative endpoint removal techniques were also investigated, with similar inconsistent results.
To assess the dryline algorithm over all cases, neighborhood-based forecast verification metrics based on contingency tables were computed. These metrics require two datasets on common grids. The output of the dryline algorithm is a gridded mask of zeros and ones, where the ones are the points composing the dryline, so it fits the requirement for a gridded dataset. However, the manually defined drylines are composed of a series of connected points that are not on a grid. A gridded mask was created from these connected points by defining a series of additional points along each manually defined dryline segment at 0.05° increments. Then, any point on the 4-km grid from the automated dryline detection that fell within 0.075° of these additional points was assigned a value of one, while all other points were assigned zero. Thus, a gridded mask of manually detected drylines was created, which could be compared to the automatically detected drylines.
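The gridding of the manually defined drylines can be sketched as below. The sketch assumes that distances are measured as simple Euclidean distances in latitude–longitude space (the thresholds are stated in degrees); the function names and the example grid are hypothetical.

```python
import numpy as np

def densify(points, step_deg=0.05):
    """Insert points roughly every step_deg along each straight segment of a
    polyline given as (lat, lon) pairs."""
    points = np.asarray(points, dtype=float)
    dense = [points[:1]]
    for start, end in zip(points[:-1], points[1:]):
        n = max(int(np.ceil(np.hypot(*(end - start)) / step_deg)), 1)
        frac = np.arange(1, n + 1)[:, None] / n
        dense.append(start + frac * (end - start))
    return np.vstack(dense)

def rasterize_dryline(points, grid_lat, grid_lon, radius_deg=0.075):
    """Flag grid points within radius_deg of any densified dryline point."""
    mask = np.zeros(grid_lat.shape, dtype=bool)
    for lat, lon in densify(points):
        mask |= np.hypot(grid_lat - lat, grid_lon - lon) <= radius_deg
    return mask

# Example: one west-east segment on a grid with about 0.04 degree spacing.
lats = np.arange(34.5, 35.5, 0.04)
lons = np.arange(-100.5, -98.5, 0.04)
grid_lon, grid_lat = np.meshgrid(lons, lats)
manual_mask = rasterize_dryline([(35.0, -100.0), (35.0, -99.0)],
                                grid_lat, grid_lon)
```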
Considering only individual grid points, the 2 × 2 contingency tables of possible forecast outcomes consist of hits (points at which both automated and manual procedures have identified a dryline), misses (points where the manual procedure identified a dryline but the automated did not), false alarms (points where the algorithm identified a dryline but the manual procedure did not), and correct negatives (points where neither of the procedures identifies a dryline). These contingency table elements are extended to a neighborhood-based framework following the procedures outlined in Clark et al. (2010). Thus, if a manually defined dryline exists at a point, this is a hit if an automatically detected dryline exists at the point or at any grid point within a radius r of the point. Similarly, if an automatically detected dryline exists at a point, the point is considered a hit if a manually defined dryline exists at the point or at any point within r of the point. A miss is assigned when a manually defined dryline exists at a point and none of the points within r contain an automatically detected dryline. A false alarm is assigned when an automatically detected dryline exists at a point and none of the points within r contain a manually defined dryline. Finally, correct negatives are assigned in the same way as for the traditional computation of contingency table elements (i.e., a manually or automatically detected dryline is neither forecast nor observed at a single point).
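The neighborhood-based contingency table elements can be computed from the two gridded masks with a distance transform. This is a sketch of the definitions given above, not the code used in the study; the function names and the 4-km default grid spacing are assumptions.

```python
import numpy as np
from scipy import ndimage

def _within_radius(mask, r_pts):
    """True where some mask point lies within r_pts grid points."""
    if not mask.any():
        return np.zeros(mask.shape, dtype=bool)
    # distance_transform_edt gives the distance to the nearest zero element,
    # so the mask is inverted to measure distance to the nearest dryline point.
    return ndimage.distance_transform_edt(~mask) <= r_pts

def neighborhood_counts(auto, manual, radius_km, dx_km=4.0):
    """Hits, misses, false alarms, and correct negatives for radius r."""
    auto = np.asarray(auto, dtype=bool)
    manual = np.asarray(manual, dtype=bool)
    near_auto = _within_radius(auto, radius_km / dx_km)
    near_manual = _within_radius(manual, radius_km / dx_km)
    return {
        "hits": int(np.sum((manual & near_auto) | (auto & near_manual))),
        "misses": int(np.sum(manual & ~near_auto)),
        "false_alarms": int(np.sum(auto & ~near_manual)),
        "correct_negatives": int(np.sum(~manual & ~auto)),
    }

# Example: an automated dryline displaced one grid point east of the manual
# one, plus a single spurious automated point far from any manual dryline.
manual = np.zeros((10, 10), dtype=bool)
manual[2:8, 4] = True
auto = np.zeros((10, 10), dtype=bool)
auto[2:8, 5] = True
auto[0, 9] = True
grid_scale = neighborhood_counts(auto, manual, radius_km=0.0)
neighborhood = neighborhood_counts(auto, manual, radius_km=4.0)
```

With r = 0 km the displaced dryline produces only misses and false alarms, whereas with r = 4 km all of its points become hits and only the spurious point remains a false alarm.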
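Using the neighborhood-based contingency table elements, probability of detection (POD), false alarm ratio (FAR; e.g., Wilks 1995), and critical success index (CSI; e.g., Doswell et al. 1990) are computed, where

$$\mathrm{POD} = \frac{\text{hits}}{\text{hits} + \text{misses}}, \quad \mathrm{FAR} = \frac{\text{false alarms}}{\text{hits} + \text{false alarms}}, \quad \mathrm{CSI} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{false alarms}}.$$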
Radii of 0, 4, 8, 12, 16, 20, 24, 32, and 40 km are considered, and the contingency table elements are summed over all cases. Note that a radius of 0 km simply reduces to the traditional gridpoint-based version of the metrics. POD indicates the fraction of manually defined dryline points that were correctly identified by the automated dryline algorithm, FAR indicates the fraction of automatically detected drylines that did not correspond to manually detected drylines, and CSI measures how well the automatically detected drylines correspond to the manual ones. The range of CSI is from zero to one, with zero indicating no skill and one indicating a perfect correspondence.
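As a worked illustration, the scores can be computed from contingency table elements summed over cases, using the standard definitions of POD, FAR, and CSI. The per-case counts below are hypothetical and serve only to show the arithmetic.

```python
def verification_scores(hits, misses, false_alarms):
    """Return (POD, FAR, CSI) from summed contingency table elements."""
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return pod, far, csi

# Hypothetical (hits, misses, false alarms) for three cases.
cases = [(40, 5, 30), (30, 5, 20), (20, 0, 10)]
hits, misses, false_alarms = (sum(column) for column in zip(*cases))
pod, far, csi = verification_scores(hits, misses, false_alarms)
```

Summing the elements before computing the scores, rather than averaging per-case scores, gives cases with longer drylines proportionally more weight.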
To compare and visualize the location and extent of manually and automatically detected drylines, dryline frequencies at each grid point over all of the April–June 2007–11 cases (both dryline and nondryline days) are displayed in Fig. 6. From Fig. 6, it is very clear that the automated dryline detection algorithm identifies drylines much more frequently than the manual detection procedure. However, the area over which the automated algorithm identifies drylines most frequently (an axis from western Texas and Oklahoma to southwestern Kansas) matches quite well with the higher frequencies from the manual dryline identification. The overdetection in the automated algorithm is related to several factors. First, as discussed above, the algorithm often extends drylines farther than the manual procedures. Second, the algorithm often identifies other boundaries connected to the dryline, such as cold fronts and outflow boundaries, as part of the dryline. Finally, despite the many steps that were taken to prevent the algorithm from detecting nondryline boundaries, such boundaries were still identified as drylines quite frequently.
To quantify the skill of the automated detection algorithm, Fig. 7 displays POD, FAR, and CSI as a function of the radius of influence (ROI). Although the POD is relatively low at the grid scale (~0.68), it rises sharply with increasing ROI and is greater than 0.90 at ROIs ≥8 km with an asymptote of about 0.95. This result implies that, when accounting for small spatial errors, the automated detection algorithm correctly identifies about 95% of the manually identified drylines. On the other hand, the FAR is very high at the grid scale (~0.85), but falls slightly with increasing ROI. The FAR appears to asymptote at about 0.70, which implies that, when accounting for small spatial errors, about 70% of automatically detected drylines do not correspond to a manually detected dryline. This result is consistent with Fig. 6, which clearly shows the overdetection of drylines.
b. Application and assessment of a machine learning algorithm
To address the issue of overdetection by the dryline identification algorithm, random forests (Breiman 2001) were applied to the automatically detected drylines using the Waikato Environment for Knowledge Analysis (WEKA; Hall et al. 2009) toolkit. RFs are a machine learning technique that uses an ensemble of decision trees (another machine learning structure) to output a predictand for a given sample. Decision trees are generally formed by recursively splitting their training dataset along an attribute threshold chosen to maximize the reduction of information remaining in the dataset. The process is repeated until a predictand is selected or too few samples remain. For application to the automatically detected drylines, each dryline point is assigned a probability that it corresponds to a manually detected dryline. Each decision tree in the RFs is perturbed by denying it access to certain attributes during its creation. As applied herein, the RF uses 100 trees with a maximum depth of 5.
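The splitting step of a single decision tree can be illustrated as follows, assuming that the "reduction of information" is measured as the decrease in Shannon entropy (information gain) for a two-class problem. The function names and example values are hypothetical, and this sketch is not the WEKA implementation.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)

def best_threshold(values, labels):
    """Return (threshold, information gain) of the best single split on one
    attribute; candidate thresholds are midpoints between sorted values."""
    parent = entropy(labels)
    best = (None, 0.0)
    candidates = sorted(set(values))
    for low, high in zip(candidates[:-1], candidates[1:]):
        thr = 0.5 * (low + high)
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        child = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(labels)
        if parent - child > best[1]:
            best = (thr, parent - child)
    return best

# Example: an attribute that separates the two classes perfectly.
threshold, gain = best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

A tree applies this search recursively to each resulting subset, and a random forest repeats it over many trees that each see only a subset of the attributes.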
The RF technique was chosen for three reasons. First, the structure of RFs is well suited for classification (although they can be applied to regression problems). Second, the importance of the attributes being provided to the RF is unknown. There has been no assessment of how useful each may be to classification; however, the perturbations of the RF’s component decision trees, combined with the algorithm’s ensemble approach to prediction, enable the technique to handle unimportant attributes without significant impact. Finally, the decision trees of RFs are human readable, which allows for easier understanding of the selection process formed by the learning algorithm and subsequent examination of the underlying causes of its selections (i.e., why certain attribute–threshold combinations were or were not important).
The performance of RFs is commonly assessed using cross validation. The full dataset is broken into n subsets under the assumption that the contained samples are independent. The RF is then trained (i.e., created) on n − 1 of the subsets and tested (i.e., its performance is assessed) on the remainder. However, this method cannot be used here as any two sample points may be drawn from neighboring locations on the same dryline, which breaks the independence assumption. Therefore, a single training dataset and a single testing dataset are used. The training dataset consists of the April–June 2007–11 cases analyzed in the previous section, while the testing dataset consists of cases from the period April–June 2012. To form the training dataset, repeated selections of random points within randomly selected drylines from the automated dryline algorithm were made. A total of 219 attributes were assessed, including pressure, moisture, and thermal variables. Variances of variables along the boundary using a number of distance thresholds were also assessed, along with dry and moist sector values of variables, which were found by using the specific humidity gradient orientation as a proxy for dryline orientation. The process was repeated until 10 000 points were sampled from both automatically detected drylines that corresponded to a manually detected dryline and those that did not correspond to a manually detected dryline. Then, the automated dryline detection algorithm was applied to the testing dataset with and without the RF.
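The two-stage sampling used to build a balanced training set can be sketched as below. The sketch assumes sampling with replacement (so a point may be drawn more than once); the function name, argument layout, and example values are hypothetical.

```python
import numpy as np

def sample_training_points(dryline_ids, is_match, n_per_class, seed=0):
    """Pick a random dryline, then a random point on it, until n_per_class
    points have been drawn from both matched and unmatched classes."""
    rng = np.random.default_rng(seed)
    dryline_ids = np.asarray(dryline_ids)
    is_match = np.asarray(is_match, dtype=bool)
    if is_match.all() or not is_match.any():
        raise ValueError("both classes must be present")
    unique_ids = np.unique(dryline_ids)
    chosen = {True: [], False: []}
    while min(len(chosen[True]), len(chosen[False])) < n_per_class:
        dryline = rng.choice(unique_ids)
        point = int(rng.choice(np.flatnonzero(dryline_ids == dryline)))
        label = bool(is_match[point])
        if len(chosen[label]) < n_per_class:
            chosen[label].append(point)
    return np.array(chosen[True]), np.array(chosen[False])

# Example: three detected drylines with five points each.
ids = [0] * 5 + [1] * 5 + [2] * 5
match = [True] * 5 + [False] * 5 + [True, False, True, False, True]
matched_idx, unmatched_idx = sample_training_points(ids, match, n_per_class=20)
```

Keeping the training years (2007–11) separate from the test year (2012), as described above, avoids placing neighboring points from the same dryline in both sets.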
To compare and visualize the location and extent of manually and automatically detected drylines for the test cases, dryline frequencies at each grid point over all the April–June 2012 cases are displayed in Fig. 8. Comparing Figs. 8a and 8b, which show the automatically detected drylines without and with the RF, respectively, reveals that the RF eliminates many spurious drylines, but retains most of the drylines that correspond to the manually identified drylines shown in Fig. 8c. The verification scores shown in Fig. 9 confirm the positive impact on the automatically detected drylines by applying the RF. At the largest ROI, the FAR is dramatically reduced from about 60% to 20%, while the POD is only reduced from 95% to 90%. This results in a much higher CSI, which, at the largest ROI, increases from 0.37 to 0.70 after RF application.
To illustrate the types of spurious dryline features recognized and removed by applying the RF, Fig. 10 displays six different example cases from the test dataset. Figures 10a, 10b, 10d, and 10e are all cases in which boundaries intersecting the dryline (i.e., cold fronts) were correctly removed by the RF. For the case on 16 April 2012 (Fig. 10c), no drylines were present and the automated dryline detection mistakenly identified a weak cold front as a dryline. In this case, the RF removed the entire feature. For the case on 5 June 2012 (Fig. 10f), the automated algorithm identified three different nondryline features and the RF removed all but a small portion of these features.
4. Summary and conclusions
Drylines are a very common and important climatological feature of the U.S. high plains during the spring as they are often responsible for convective initiation. Additionally, the passage of the dryline can produce an abrupt transition from a warm and moist air mass to a hot, dry, and breezy air mass, which can cause a sudden increase in the danger of fire starts and rapid fire spread. Despite the importance of drylines, very few studies exist that document climatological dryline location and frequency, or perform forecast verification, which likely stems from difficulties in identifying drylines over large datasets. Thus, this study aimed to streamline dryline identification by developing an automated, multiparameter identification algorithm using image-processing and pattern recognition techniques.
The algorithm, which was described herein, was applied to five years of 4-km grid-spacing 24-h forecasts from the WRF Model for the period April–June 2007–11. Manually identified dryline positions, which were available from a previous study (Coffer et al. 2013) that used the same data, were used as truth to evaluate the algorithm performance. Neighborhood-based verification metrics revealed that the algorithm was very effective at identifying drylines with a POD of about 95% when accounting for small spatial errors. However, drylines were overdetected, which resulted in a very high FAR of about 70%. Visual inspection of the automatically detected drylines revealed that the overdetection was related to several factors. First, boundaries that intersect a dryline, such as convective outflow and cold fronts, were often identified as drylines. Second, drylines frequently extended too far. Third, many moisture gradient features that were not at all related to drylines were often detected.
Preliminary use of random forests (a machine learning technique) significantly decreased the FAR, while minimally impacting the POD. The algorithm lays the groundwork for a final product with the potential to provide significant contributions to a variety of meteorological applications ranging from model evaluation to operational forecasting. Future plans involve experimental implementation within a real-time modeling framework to examine ways in which the algorithm could aid operational forecasting. For example, the algorithm could help efficiently visualize forecast uncertainty in dryline location by application to members of an ensemble, with drylines from all members shown in one plot, similar to the spaghetti charts that are often used to view fields such as 500-hPa geopotential height in global ensembles.
Funding was provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA–University of Oklahoma Cooperative Agreement NA11OAR4320072, U.S. Department of Commerce. In addition, three anonymous reviewers provided helpful comments to improve the manuscript.
The radius of 0.075° used to grid the manually defined drylines for verification was chosen because it resulted in a manually defined dryline width that approximately matched that of the automatically detected drylines.