• Akaike, H., 1974: A new look at the statistical model identification. IEEE Trans. Automat. Contr., 19, 716–723, https://doi.org/10.1109/TAC.1974.1100705.

  • Austin, P. C., and E. W. Steyerberg, 2014: Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat. Methods Med. Res., 26, 796–808, https://doi.org/10.1177/0962280214558972.

  • Brown, D., and J. Rothfuss, 1998: An approach to waterspout forecasting for south Florida and the Keys. NOAA/NWS Internal Rep., Miami, FL, https://www.weather.gov/mfl/waterspout_fcsting.

  • Craven, J. P., and H. E. Brooks, 2004: Baseline climatology of sounding derived parameters associated with deep, moist convection. Natl. Wea. Dig., 28, 13–24.

  • Davies, J. M., 1993: Hourly helicity, instability, and EHI in forecasting supercell tornadoes. Preprints, 17th Conf. on Severe Local Storms, St. Louis, MO, Amer. Meteor. Soc., 107–111.

  • Golden, J., 1974: The life cycle of Florida Keys’ waterspouts. I. J. Appl. Meteor., 13, 676–692, https://doi.org/10.1175/1520-0450(1974)013<0676:TLCOFK>2.0.CO;2.

  • Golden, J., 1977: An assessment of waterspout frequencies along the U.S. East and Gulf Coasts. J. Appl. Meteor., 16, 231–236, https://doi.org/10.1175/1520-0450(1977)016<0231:AAOWFA>2.0.CO;2.

  • Golden, J., 2003: Waterspouts. Encyclopedia of Atmospheric Sciences, G. North, J. Pyle, and F. Zhang, Eds., Elsevier, 2510–2525.

  • Golden, J., and H. B. Bluestein, 1994: The NOAA–National Geographic Society Waterspout Expedition (1993). Bull. Amer. Meteor. Soc., 75, 2281–2288, https://doi.org/10.1175/1520-0477(1994)075<2281:TNNGSW>2.0.CO;2.

  • Heffter, J. L., 1980: Air Resources Laboratories Atmospheric Transport and Dispersion Model (ARL-ATAD). NOAA Tech. Memo. ERL ARL 81, 29 pp., https://www.arl.noaa.gov/documents/reports/arl-81.pdf.

  • Jolliffe, I. T., and D. B. Stephenson, 2008: Proper scores for probability forecasts can never be equitable. Mon. Wea. Rev., 136, 1505–1510, https://doi.org/10.1175/2007MWR2194.1.

  • Keul, A. G., M. V. Sioutas, and W. Szilagyi, 2009: Prognosis of central-eastern Mediterranean waterspouts. Atmos. Res., 93, 426–436, https://doi.org/10.1016/j.atmosres.2008.10.028.

  • Kuiper, J., and M. Van der Haven, 2007: A new index to calculate risk of waterspout development. Fourth European Conf. on Severe Storms, Trieste, Italy, European Severe Storms Laboratory, https://www.essl.org/ECSS/2007/abs/06-Forecasts/1179250265.kuiper.pdf.

  • MacInnes, J., 2017: An Introduction to Secondary Data Analysis with IBM SPSS Statistics. Sage Publications, 336 pp.

  • McCann, D. W., 1994: WINDEX—A new index for forecasting microburst potential. Wea. Forecasting, 9, 532–541, https://doi.org/10.1175/1520-0434(1994)009<0532:WNIFFM>2.0.CO;2.

  • Miller, R. C., 1972: Notes on analysis and severe storm forecasting procedures of the Air Force Global Weather Central. Air Weather Service Tech. Rep. 200, 184 pp., http://www.dtic.mil/dtic/tr/fulltext/u2/744042.pdf.

  • Palmer, T. N., C. Brankovic, and D. S. Richardson, 2000: A probability and decision-model analysis of PROVOST seasonal multi-model ensemble integrations. Quart. J. Roy. Meteor. Soc., 126, 2012–2033, https://doi.org/doi:10.1002/qj.49712656703.

  • Peduzzi, P., J. Concato, E. Kemper, T. R. Holford, and A. R. Feinstein, 1996: A simulation study of the number of events per variable in logistic regression analysis. J. Clin. Epidemiol., 49, 1373–1379, https://doi.org/10.1016/S0895-4356(96)00236-3.

  • Rossow, V., 1970: Observations of water spouts and their parent clouds. NASA Tech. Note D-5854, 63 pp., https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19700020540.pdf.

  • Schwiesow, R. L., 1981: Horizontal velocity structure in waterspouts. J. Appl. Meteor., 20, 349–360, https://doi.org/10.1175/1520-0450(1981)020<0349:HVSIW>2.0.CO;2.

  • Shewchuk, J., 2017: The Universal Rawinsonde Observation Program (version 6.8). Eosonde Research Services, http://www.raob.com.

  • Sioutas, M. V., and A. G. Keul, 2007: Waterspouts of the Adriatic, Ionian and Aegean Sea and their meteorological environment. Atmos. Res., 83, 542–557, https://doi.org/10.1016/j.atmosres.2005.08.009.

  • Spratt, S. M., and B. K. Choy, 1994: Employing the WSR-88D for waterspout forecasting. Postprints, NEXRAD Users Conf., Norman, OK, NWS, https://www.weather.gov/media/mlb/research/EmployingtheWSR88DforWaterspoutForecasting.pdf.

  • Swets, J. A., 1988: Measuring the accuracy of diagnostic systems. Science, 240, 1285–1293, https://doi.org/10.1126/science.3287615.

  • Szilagyi, W., 2009: A waterspout forecasting technique. Fifth European Conf. on Severe Storms, Landshut, Germany, European Severe Storms Laboratory, https://www.essl.org/ECSS/2009/preprints/O05-14-sziladgyi.pdf.

  • Watson, L. W., 2011: Upgrade summer severe weather tool. NASA Contractor Rep. CR-2011-216299, 32 pp.

  • Wheeler, M. M., 2009: Severe weather and weak waterspout checklist in MIDDS. NASA Contractor Rep. CR-2009-214760, 15 pp., https://science.ksc.nasa.gov/amu/final-reports/svr-wx-wksht-midds.pdf.

  • Wilks, D. S., 2006: Statistical Methods in Atmospheric Sciences. 2nd ed. Elsevier, 627 pp.

  • World Meteorological Organization, 1999: Methods of interpreting numerical weather prediction output for aeronautical meteorology. WMO Tech. Note 195, 124 pp., https://library.wmo.int/opac/doc_num.php?explnum_id=3447.

    Waterspout day frequency by month for the period 2006–14, with the median (thick black line) and mean (diamond) shown. Whiskers show the 10th and 90th percentiles. The values for 2015 (red lines) and 2016 (blue lines) are added to show the monthly frequency in the independent verification period. Data are extracted from NWS local storm reports from the field office in Key West.

    Locations of all reported waterspouts during June–September 2006–16, extracted from NWS local storm reports from the field office in Key West. The distribution of waterspout distances (km) from the Key West sounding location is shown on the bottom right. The median, mean, and standard deviation of these distances are, respectively, 33.2, 47.3, and 42.1 km.

    Waterspout report frequency (%) by local hour of the day for the years 2006–14 during June–September. Data are extracted from NWS local storm reports from the field office in Key West. The 1200 UTC soundings are released at about 0700 eastern daylight time (EDT).

    Log10 of the NPS (vertical axis) of up to predictors that can be selected from a set of n (horizontal axis) candidate predictors. Graph obtained by calculating as a function of . The red circle, triangle, and cross indicate the number of such subsets that can be selected, respectively, from a set of , , and candidate predictors.

    Standardized coefficients (black) for the LRM with 95% confidence intervals (red lines).

    Frequency distribution (gray bars), conditional frequency of waterspout reports (red dots), and single-variable logistic regression fit (black line) for (a) 100-mb temperature (K), (b) 1000–700-mb lapse rates (K km−1), (c) the Craven significant severe parameter (×103 m3 s−3), (d) the TT index (unitless), and (e) 0–3-kft wind speed (kt, where 1 kt = 0.51 m s−1), (f) 100-mb U-wind sign (unitless), (g) transport mean U-wind sign (unitless), and (h) 0–10-kft V-wind sign (unitless).

    Boxplot of the LRM and CWI modeled waterspout probability on report vs no-report days from (a) cross validation and (b) independent verification. Whiskers show the 10th and 90th percentiles. The notch around the median (thick black line) indicates the 95% confidence interval for the median’s true value. The horizontal dashed line indicates the climatological frequency of waterspouts. The median value for each box-and-whisker plot is shown along the top of the plot, and the mean value is shown in parentheses beneath it.

    Reliability diagram for modeled waterspout probability for the (a) cross-validated LRM, (b) cross-validated CWI, (c) independent verification LRM, and (d) independent verification CWI. The point size reflects the forecast bin frequency, also denoted as a percentage. Values for the Brier score decomposition into reliability (Rel), resolution (Res), and uncertainty (Unc), as well as the BSS, are printed in the bottom-right corner of each panel.

    BSSs for individual folds of the 20-times fivefold cross validation for the LRM (red) and CWI (blue), as well as the BRI (green), KHI (orange), and SWI (cyan).

    ROC curves for the (a) cross validation and (b) independent verification of waterspout probability for the LRM (red) and for the CWI model (blue). AUC values are indicated in the bottom-right corner of each panel. In (a), the dashed lines show the ROC curves for individual folds, and solid lines are for the entire cross validation.

    AUC for individual folds of the 20-times fivefold cross validation for the LRM (red) and CWI (blue), as well as the BRI (green), KHI (orange), and SWI (cyan).

    Wet season composite soundings for the years 2006 (red), 2010 (blue), and 2016 (green). The dashed line is the dewpoint temperature, and the solid line is the temperature. Note the divergence in dewpoint temperatures above 400 mb, reflecting the change in radiosonde instruments in 2010 and 2012.

    Monthly mean 100-mb dewpoint temperature (°C) from the rawinsonde observations at Key West. Note the discontinuity between 2006–10 and 2012–16 reflecting instrumentation changes.

Statistical Prediction of Waterspout Probability for the Florida Keys

  • 1 NOAA/National Weather Service, Key West, Florida
  • 2 Center for Ocean–Atmospheric Prediction Studies, Florida State University, Tallahassee, Florida

Abstract

A statistical model of waterspout probability was developed for wet-season (June–September) days over the Florida Keys. An analysis was performed on over 200 separate variables derived from Key West 1200 UTC daily wet-season soundings during the period 2006–14. These variables were separated into two subsets: days on which a waterspout was reported anywhere in the Florida Keys coastal waters and days on which no waterspouts were reported. Days on which waterspouts were reported were determined from the National Weather Service (NWS) Key West local storm reports. The sounding at Key West was used for this analysis since it was assumed to be representative of the atmospheric environment over the area evaluated in this study. The probability of a waterspout report day was modeled using multiple logistic regression with selected predictors obtained from the sounding variables. The final model containing eight separate variables was validated using repeated fivefold cross validation, and its performance was compared to that of an existing waterspout index used as a benchmark. The performance of the model was further validated in forecast mode using an independent verification wet-season dataset from 2015–16 that was not used to define or train the model. The eight-predictor model was found to produce a probability forecast with robust skill relative to climatology and superior to the benchmark waterspout index in both the cross validation and in the independent verification.

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Andrew Devanas, andrew.devanas@noaa.gov

1. Introduction

The frequency of nonsupercell waterspouts in the Florida Keys is higher than at any other location in the United States and may be one of the highest in the world, with hundreds of waterspouts occurring each year in the waters surrounding the Florida Keys (Golden 2003). Waterspouts are a marine hazard and occasionally come on shore, resulting in damage to coastal structures. Objective waterspout guidance for the Key West area is limited, and forecasters rely mainly on subjective techniques and empirical rules of thumb to forecast the possibility of waterspout development. Reliable objective guidance that can be used in combination with available subjective tools would likely aid the forecast process.

Only a fraction of waterspouts are observed and recorded. During the Lower Keys Waterspout Project in 1969, at least 400 waterspouts were documented (Golden 1974). Golden (1977) estimated that between 50 and 500 waterspouts occur each year over the Florida Keys coastal waters. Given that approximately 25 waterspouts, on average, are reported during the wet season (estimated from 2006–14 reports discussed in section 2), waterspouts may be underreported by up to an order of magnitude, a conjecture also made by Golden and Bluestein (1994). Florida Keys waterspouts have been reported in every month of the year, although less frequently from October to May. Waterspout formation is noticeably more frequent during the wet season, June–September, with a primary maximum in June (Fig. 1), consistent with Golden (2003). The months of May and October are considered transition months (dry-to-wet and wet-to-dry seasons, respectively), during which midlatitude influences can vary greatly from year to year. Therefore, the transition months were excluded from this study. During the wet-season months in the present study waterspouts were reported on approximately 19% of all days.

Fig. 1. Waterspout day frequency by month for the period 2006–14, with the median (thick black line) and mean (diamond) shown. Whiskers show the 10th and 90th percentiles. The values for 2015 (red lines) and 2016 (blue lines) are added to show the monthly frequency in the independent verification period. Data are extracted from NWS local storm reports from the field office in Key West.

Citation: Weather and Forecasting 33, 2; 10.1175/WAF-D-17-0100.1

Rossow (1970) noted that it is not possible to distinguish a likely waterspout day from one that is not, because the weather is nearly the same during the wet season. The atmospheric environment over the Florida Keys exhibits mainly incremental day-to-day variations in the temperature and moisture profile. Because of this, it is difficult to readily differentiate between days that are more favorable for waterspout development and days that are less favorable simply by examining soundings, sounding climatologies, or individual parameters or stability indices derived from the soundings. Nonetheless, the atmosphere over the Florida Keys does experience a degree of variability. For example, tropical features such as easterly waves and tropospheric upper-level trough cells traverse the area surrounding the Keys throughout the wet season. Changes in the position and strength of the Bermuda high, as well as the flow off the mainland and Cuba, may result in changes of wind patterns and temperature and moisture profiles.

Schwiesow (1981), as well as Sioutas and Keul (2007), noted that traditional stability indices in isolation did not perform well as indicators of a favorable waterspout development environment. Several previous efforts to quantify favorable conditions for waterspout development have been performed using sounding parameters as well as synoptic-scale analysis. Golden and Bluestein (1994) noted that the thermodynamic soundings originating from the NWS Weather Forecast Office in Key West on waterspout days revealed relatively light low-level winds (5–8 m s−1), light shear, and abundant moisture in the boundary layer with drier air above. Golden (2003) expressed concern that a mean sounding climatology, or the NWS Key West sounding, could not adequately capture the superadiabatic lapse rates observed in the subcloud layer of waterspout-producing cumulus observed in his studies.

Spratt and Choy (1994) examined data from the period 1979–92, identifying days with at least one waterspout report in conjunction with soundings from Tampa, West Palm Beach, and Cape Canaveral. The results determined that light winds at discrete levels through the lower troposphere (up to 500 mb; 1 mb = 1 hPa) and total precipitable water vapor levels in excess of 4.1 cm (1.6 in.) produced favorable conditions for waterspout development. They combined these sounding-derived parameters with WSR-88D radar information to build a five-step waterspout forecast strategy.

Brown and Rothfuss (1998) developed a basic waterspout index for the Florida Keys and south Florida that included mean wind direction and speed below 600 mb, as well as a qualifier for whether a waterspout occurred on the previous day. Each value obtained was assigned a score of 0.5 or 1 according to a table, and a total score of 2 or greater was considered favorable for waterspout development.

An index developed by Kuiper and Van der Haven (2007) used the product of four weighted parameters (0–3-km wind shear, 0–500-m lapse rate, 0–1-km average humidity, and 10-m wind speed) to indicate the risk of waterspout development over the shallow waters of the North Sea off the coast of the Netherlands. The Kuiper–Van der Haven spout index has been implemented in the locally run High Resolution Limited Area Model (HIRLAM) at the Royal Netherlands Meteorological Institute.

A study of waterspouts over the Adriatic, Ionian, and Aegean Seas by Sioutas and Keul (2007) examined several thermodynamic indices derived from proximity sounding data in relation to waterspout development. They found that “fair weather” waterspouts were mainly a result of a preexisting unstable air mass and low wind shear. Of the thermodynamic indices examined in the study, they stated that the K index and the total totals index were good indicators of an unstable environment and useful in identifying favorable tornadic and fair weather waterspout environments. Other thermodynamic indices they examined did not perform as well from a predictive standpoint.

A follow-up study (Keul et al. 2009) examined the performance of the Szilagyi waterspout forecasting technique (Szilagyi 2009) as a predictor for Mediterranean waterspout occurrence. The Szilagyi technique uses a nomogram to compare the difference between sea surface temperature and the 850-mb temperature against the convective cloud depth and is also available as an index (Szilagyi waterspout index). Values of the Szilagyi waterspout index range from −10 to 10, where values greater than zero indicate a favorable waterspout environment.

The NASA Applied Meteorology Unit (AMU) at the Kennedy Space Center in Florida used a combination of four factors (monthly climatology, vertical wind profile, precipitable water, and persistence) to determine the weak waterspout (fair weather waterspout) potential for the day (Wheeler 2009). The value produced by the methodology is compared to a table to ascertain whether the threat level for the day is low, medium, or high. An AMU follow-up study (Watson 2011) used a logistic regression technique similar to that in the present study to evaluate the probability of severe weather occurrence at the Kennedy Space Center. However, it did not include a reevaluation of the weak waterspout methodology outlined by Wheeler (2009). Watson (2011) found that using logistic regression did not show an improvement over their existing severe weather likelihood methodology.

A common thread running through many studies (Spratt and Choy 1994; Brown and Rothfuss 1998; Golden 2003; Kuiper and Van der Haven 2007; Sioutas and Keul 2007; Wheeler 2009) is the presence of weak lower-tropospheric winds and/or weak lower-tropospheric shear as a favorable environment for waterspout development. Additionally, some of these studies noted the importance of lapse rates at various levels in the lower troposphere (Golden 2003; Kuiper and Van der Haven 2007). These conditions were also found to be significant in the present study.

The commercial RAOB software created by Environmental Research Services, LLC (Shewchuk 2017), provides a waterspout index developed by P. Mohlin and F. Alsheimer of the NWS Forecast Office in Charleston, South Carolina. This index assigns risk points based on sounding-derived variables meeting certain thresholds (Table 1; F. Alsheimer 2016, personal communication). The total number of risk points forms the Charleston waterspout index. The value of the index is used to qualitatively classify the risk for waterspout development as high, moderate, low, or none.

Table 1. Parameters used in the Charleston waterspout index with thresholds and associated risk points. The EHI is defined by Davies (1993), and the Wind Index (WINDEX) is defined by McCann (1994).

The goal of the present work was to develop a quantitative representation of the probability of waterspouts as a function of sounding-derived parameters for the Florida Keys wet season, June–September, and to compare the results with existing benchmarks. For this comparison, a logistic regression model (referred to hereafter as CWI) was constructed to convert the RAOB-provided Charleston waterspout index described in Table 1 to a waterspout probability value. The CWI regression model was constructed using training period data (see section 2) with waterspout report days as the predictand and the Charleston waterspout index as the predictor. Similarly, logistic regression modeling during the training period was used to convert the Brown–Rothfuss index, the Kuiper–Van der Haven spout index, and the Szilagyi waterspout index to waterspout probabilities (hereafter referred to as BRI, KHI, and SWI, respectively). Of these four existing indices, the CWI produced the best-performing probabilistic model, and as such, it was chosen as the primary comparison benchmark.
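As a rough illustration of how a single index can be converted to a probability, the following Python sketch fits a one-predictor logistic regression by Newton-Raphson iteration. The index values and report flags are invented for illustration and are not the Key West data; the function names are hypothetical, and the routine is a generic maximum likelihood solver, not necessarily the software used in this study.

```python
import math

def predict(b0, b1, x):
    """Logistic model: probability of a report day given index value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood fit of (b0, b1) by Newton-Raphson iteration."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = predict(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # entries of X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system (X^T W X) d = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical training sample: index value and report flag (1 = report day)
index_values = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4
report_day = [0, 0, 0, 0,  0, 0, 0, 1,  0, 0, 1, 1,  0, 1, 1, 1,  0, 1, 1, 1]

b0, b1 = fit_logistic(index_values, report_day)
for value in range(5):
    print(value, round(predict(b0, b1, value), 3))
```

Because the model includes an intercept, the fitted probabilities sum to the number of observed report days, so the mean forecast probability matches the sample climatology of the training data.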

Details of the waterspout and sounding observations are discussed in section 2. The statistical methodology for model development and the metrics for assessing model performance are described in section 3. The evaluation of performance for the proposed forecast model and a comparison with that of the benchmark CWI model is presented in section 4 and further discussed in section 5. Finally, some directions for future work are proposed in section 6.

2. Data

The official NWS local storm reports (obtained from the National Centers for Environmental Information database) from the field office in Key West, which covers the Florida Keys nearshore coastal waters, were used to identify days on which waterspouts were reported. Days with one or more waterspout reports anywhere along the Florida Keys coastal waters were classified as waterspout report days. Days with no reports were classified as no-waterspout report days. Figure 2 shows the geospatial distribution of waterspout reports along the Florida Keys from 2006 to 2016. These locations should be considered approximate, as they are often visually estimated at the time of observation.

Fig. 2. Locations of all reported waterspouts during June–September 2006–16, extracted from NWS local storm reports from the field office in Key West. The distribution of waterspout distances (km) from the Key West sounding location is shown on the bottom right. The median, mean, and standard deviation of these distances are, respectively, 33.2, 47.3, and 42.1 km.

Inherently, reports based on visual sightings have limitations. It is likely that a larger proportion of waterspouts are reported near the densely populated cities of Key West and Marathon. Conversely, there may be underreporting along the more sparsely populated islands. Many waterspouts are reported by mariners; therefore, waterspout reports would be more common in regions with large marinas and popular boating routes (both recreational and commercial) and underreported elsewhere. Waterspouts sighted in areas outside of cellular phone or marine radio reception are more likely to go unreported, at least in real time. Also, few waterspouts have been reported in the hours between dusk and dawn. This is due to two main reasons. Physically, waterspouts are less likely to occur during evening hours because the generation of cumulus cloud lines from which waterspouts can spawn is less likely as a result of the loss of differential heating along the Keys islands. From an observation perspective, it is difficult to spot and identify waterspouts in the dark. Waterspout reports received via social media may at some point lead to an increase in the overall number of reports, but as of yet no such increase has been observed. This may in part be a result of the short data record since the onset of social media as a tool for reporting weather phenomena.

The 1200 UTC soundings were obtained at Key West for the wet seasons (June–September) of 2006–16. The first 9 years (2006–14) of the data were used as a training dataset to build a multiple logistic regression model of waterspout report probability. Once the model was finalized, data for the two withheld years (2015–16) were used as a testing dataset for independent forecast verification. Years prior to 2006 were not included in this study because of inconsistent reporting and waterspout identification practices. The 1200 UTC soundings were selected because of their predictive potential for the daylight hours, which is when most waterspouts are reported (Fig. 3). The 1200 UTC Key West sounding was considered proximate to waterspout occurrences because of the generally spatially uniform structure of the atmosphere over the Florida Keys on a given wet-season day. The Key West location is representative of a tropical marine boundary layer, without landmass influences such as near-surface radiation inversions in the morning sounding or strong land–sea-breeze interactions. The sounding at Key West is within 73 km (about 46 mi) of 75% of the waterspouts in the dataset (see Fig. 2), and within 115 km (about 72 mi) of 90% of reported waterspouts.
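The day classification and the training/testing split described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the report dates are invented, the function names are hypothetical, and every wet-season calendar day is assumed to have a usable sounding, which may not hold for the actual dataset.

```python
from datetime import date, timedelta

TRAIN_YEARS = range(2006, 2015)  # 2006-14, training period
TEST_YEARS = range(2015, 2017)   # 2015-16, independent verification

def wet_season_days(years):
    """Yield every calendar day from 1 June to 30 September of each year."""
    for year in years:
        day = date(year, 6, 1)
        while day <= date(year, 9, 30):
            yield day
            day += timedelta(days=1)

def label_days(days, report_dates):
    """Map each day to 1 (one or more reports) or 0 (no report)."""
    reported = set(report_dates)  # several reports on one day count once
    return {day: int(day in reported) for day in days}

# Hypothetical report dates: two on the same day, one outside the wet season
reports = [date(2006, 6, 3), date(2006, 6, 3),
           date(2014, 10, 2), date(2015, 7, 10)]

train = label_days(wet_season_days(TRAIN_YEARS), reports)
test = label_days(wet_season_days(TEST_YEARS), reports)
print(len(train), sum(train.values()))
print(len(test), sum(test.values()))
```

Splitting by whole years, rather than by randomly sampled days, keeps the verification period fully separate from the data used to select predictors and fit coefficients.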

Fig. 3. Waterspout report frequency (%) by local hour of the day for the years 2006–14 during June–September. Data are extracted from NWS local storm reports from the field office in Key West. The 1200 UTC soundings are released at about 0700 eastern daylight time (EDT).

Some of the limitations of using sounding data include the inability to capture localized mesogamma phenomena such as local sources of vorticity generated by cumulus cloud lines along the Keys, subcloud superadiabatic lapse rates, and convergence resulting from differential heating along the islands, to mention a few. These likely all play a role in waterspout development, but the data are not available for proper evaluation of these elements. Soundings are used in the hope of finding parameters that may be precursors, or proxies, to an overall favorable waterspout development environment.

Dewpoint temperatures at and above 400 mb were omitted from this study because of discrepancies in the moisture measurements aloft among the different radiosondes used from 2006 through 2016. From the beginning of the study period in 2006 until 22 March 2010, the Sippican VIZ-B2 radiosonde was used at Key West. It was replaced by the Lockheed Martin Mark IIA radiosonde, which in turn was replaced on 27 June 2012 by the Vaisala RS92-NGP GPS radiosonde, which remains in use to date. As is evident in the appendix (see Fig. A1), the upper-tropospheric moisture measured by the various radiosondes begins to diverge above 400 mb. The appendix also illustrates the monthly mean wet-season 100-mb dewpoint temperature by year (Fig. A2), showing a clear discontinuity in the data before 2010 and after 2012, as different radiosondes were introduced. Since the dewpoint temperature differences result from different radiosondes and not atmospheric changes, inclusion of the data could lead to statistically anomalous results. Including the data would have required a statistical scheme to remove the instrument bias, which is beyond the scope of this study.

Supercell waterspouts not associated with tropical cyclones during the wet-season months are relatively rare. One waterspout, which occurred on 1 June 2007 in association with Tropical Storm Barry moving past the Florida Keys, was removed from the dataset since this study is designed to investigate nonsupercell waterspouts, or fair weather waterspouts. This was the only waterspout in the dataset that was associated with a tropical cyclone.

The Key West radiosonde dataset included wind speed and direction at various levels or layers. This information was used to calculate the signs of the U and V components of the winds as additional derived variables that take values of +1, 0, or −1. The final dataset at that point comprised 238 sounding-derived parameters (listed in Table A1) that were considered as potential predictors for developing a waterspout probability model, as described in the following section.
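The sign variables described above can be sketched in Python as follows. The sketch assumes the standard meteorological convention in which the reported direction is the direction the wind blows from, measured in degrees clockwise from north. The function names and the small tolerance used to treat near-zero components as zero are assumptions made for illustration; how calm or purely zonal/meridional winds were handled in the study is not stated.

```python
import math

def wind_components(speed, direction_deg):
    """Return (u, v) for a wind blowing FROM direction_deg (degrees from north)."""
    rad = math.radians(direction_deg)
    u = -speed * math.sin(rad)  # positive u: wind blowing toward the east
    v = -speed * math.cos(rad)  # positive v: wind blowing toward the north
    return u, v

def sign(value, tol=1e-9):
    """Return +1, 0, or -1; tol absorbs floating-point round-off near zero."""
    if value > tol:
        return 1
    if value < -tol:
        return -1
    return 0

def uv_signs(speed, direction_deg):
    """Signs of the U and V components for one level or layer."""
    u, v = wind_components(speed, direction_deg)
    return sign(u), sign(v)

print(uv_signs(10.0, 90.0))   # easterly wind
print(uv_signs(5.0, 225.0))   # southwesterly wind
print(uv_signs(0.0, 0.0))     # calm
```

Reducing each component to its sign discards wind speed but lets the regression treat flow direction regimes (for example, easterly versus westerly) as simple categorical-like predictors.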

3. Methodology

a. Modeling

The presence of a waterspout report can be considered a reliable indication of waterspout occurrence. However, as previously discussed, the absence of a waterspout report cannot be considered a reliable indication of waterspout nonoccurrence. This discrepancy between occurrence and report is an unavoidable property of the dataset. Given the known limitations of waterspout observations and of single-point radiosonde soundings as representations of the atmospheric environment for the Florida Keys and surrounding waters, it is not a priori obvious that it would be possible to establish a statistical relationship between sounding variables and (reported) waterspout days. The present work, however, assumes that such a relationship can be established. The validity of this assumption is eventually evaluated and confirmed by demonstrating that the resulting model does indeed have skill above that of a climatological prediction.

A logistic regression approach was employed to model the probability of a waterspout report as a function of environmental variables derived from morning (1200 UTC) soundings. This approach required finding an appropriate subset of sounding variables that would serve as predictors in the final model. The large number of candidate predictors (in our case, $n = 238$ separate parameters derived from the sounding) posed a problem for selecting an optimal subset of predictors. Since there are $2^n$ possible subsets that can be obtained from a set of $n$ elements, an exhaustive search for an optimal subset of predictors would require the calculation and comparison of $2^n$ separate models. If the selection of subsets is restricted to require exactly $k$ elements (out of $n$), the number of possible subsets is given by “$n$ choose $k$,” or
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$
Building a robust logistic regression model for binary outcomes requires a minimum of 10–15 events per predictor variable (Peduzzi et al. 1996), although for achieving predictive accuracy some 20–40 events per variable may be required (Austin and Steyerberg 2014). In the present case, there are 202 events (waterspout observations), which could accommodate up to approximately 13–20 predictor variables under the first rule, or 5–10 under the second. If the optimal predictor set is thus restricted to contain no more than 10 predictors, the number of possible subsets (NPSs) is given by
$$\mathrm{NPS} = \sum_{k=1}^{10} \binom{n}{k}.$$
A graphical representation of the relationship between NPS and the number of candidate predictors n is shown in Fig. 4. If all 238 variables are considered as candidate predictors, the resulting NPS (Fig. 4, circle symbol) is still a very large number. An exhaustive search through that many combinations would be computationally prohibitive, so a definitive “best model” cannot be identified directly.
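As an illustration of how quickly the subset count grows, the following minimal Python sketch evaluates the NPS for the three candidate-list sizes discussed in this section (238, 81, and 24). It assumes that the sum runs over subset sizes 1 through 10; the function name `nps` and the choice to exclude the empty subset are assumptions made for this sketch, not details taken from the study.

```python
from math import comb, log10


def nps(n: int, k_max: int = 10) -> int:
    """Count the non-empty predictor subsets of size 1..k_max drawn from n candidates.

    Assumption for this sketch: the empty subset (k = 0) is not counted.
    """
    return sum(comb(n, k) for k in range(1, min(k_max, n) + 1))


if __name__ == "__main__":
    # Candidate-list sizes mentioned in the text: all variables, after the
    # t-test screening, and after the correlation screening.
    for n in (238, 81, 24):
        print(f"n = {n:3d}: log10(NPS) = {log10(nps(n)):.1f}")
```

The base-10 logarithm is printed because that is the quantity plotted on the vertical axis of Fig. 4.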
Fig. 4.

Log10 of the NPS (vertical axis) of up to 10 predictors that can be selected from a set of n (horizontal axis) candidate predictors. Graph obtained by calculating NPS as a function of n. The red circle, triangle, and cross indicate the number of such subsets that can be selected, respectively, from a set of 238, 81, and 24 candidate predictors.

Citation: Weather and Forecasting 33, 2; 10.1175/WAF-D-17-0100.1

It is therefore necessary to shorten the list of candidate predictors to a more manageable number. As a first step in this process, the statistical significance of the difference of means (using a t test) for each variable between waterspout report days and no-report days was calculated (see Table A1, where the candidate predictors are arranged in order of increasing p value). If the means for a particular variable were significantly different at the 95% confidence level (p value of 0.05) or higher, that variable was added to a shortened candidate predictors list. This testing excluded 157 variables and retained 81 variables (Table A1, column N0), reducing the NPS accordingly (Fig. 4, triangle symbol). To further reduce the number of candidate predictors, and to reduce the risk of multicollinearity in the final model, a requirement was added that no two candidate predictors be highly correlated. A commonly used rule of thumb for the level of correlation between regression model predictors that must not be exceeded is 0.6 (MacInnes 2017). This requirement was implemented as follows. Sequentially moving through the shortened list of candidate predictors, a variable was retained only if its absolute correlation with every previously retained variable was less than the defined threshold of 0.6. To illustrate this point, the first variable on the list (0–3 kft AGL wind speed) was retained by default, since it is at the top of the list and there are no preceding entries. The next variable (1000-mb wind speed) was not retained because of its high correlation with the previously retained variable (0–3 kft AGL wind speed). Similarly, the 3rd–14th variables [925-mb wind speed, transport mean speed, surface wind speed, 0–10 kft AGL wind speed, fire danger index, shear 1, 850-mb wind speed, 925–70-mb average wind speed, 3–10 kft AGL wind speed, energy helicity index (EHI), transport peak speed, fog, and fog stability index (FSI)] were not retained for the same reason.
The 15th variable (0–10 kft AGL V-wind direction) was retained because its correlation with the only previously retained variable (0–3 kft AGL wind speed) did not reach the specified threshold. The 16th variable (1000–500-mb lapse rate) did not reach the specified correlation threshold with either of the two previously retained variables (0–3 kft AGL wind speed and 0–10 kft AGL V-wind direction) and was thus also retained. This process was continued until the 81st variable (the last on the significant candidate predictors list) was reached. At the end of the process, 24 variables were retained, forming the further shortened candidate predictors list (Table A1, column N1 and boldface). The NPS for this set is much more manageable (Fig. 4, cross symbol).
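The sequential correlation screening described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the candidates arrive already ordered by increasing p value and that the Pearson correlation is the measure used; the function names and the toy series are hypothetical and are not taken from the study's data or code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def greedy_correlation_filter(ranked, threshold=0.6):
    """Keep a variable only if |r| < threshold with every variable kept so far.

    ranked: list of (name, values) pairs ordered by increasing p value, so the
    first entry is always retained.
    """
    retained = []
    for name, values in ranked:
        if all(abs(pearson(values, kept)) < threshold for _, kept in retained):
            retained.append((name, values))
    return [name for name, _ in retained]


if __name__ == "__main__":
    # Toy series: "b" is nearly a multiple of "a" and is dropped; "c" is kept.
    toy = [
        ("a", [1, 2, 3, 4, 5, 6]),
        ("b", [2, 4, 6, 8, 10, 13]),
        ("c", [1, -1, 1, -1, 1, -1]),
    ]
    print(greedy_correlation_filter(toy))
```

Because the list is processed in order of significance, the outcome depends on that ordering: a variable rejected early is never reconsidered, even if the variable that displaced it is later found to be less useful.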

These 24 variables were then supplied as candidate predictors to an exhaustive search for a subset of up to 10 predictors that minimized the Akaike information criterion (AIC; Akaike 1974). The AIC was chosen as a selection criterion because it rewards good fit while penalizing the number of predictors, thus reducing the risk of overfitting. The search was performed using the “bestglm” package in R (https://www.r-project.org/) and resulted in a final eight-variable model (Table A1, column N2 and boldface and italic).
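To make the AIC-based subset search concrete, the sketch below enumerates every predictor subset up to a maximum size, fits a logistic regression to each, and keeps the subset with the lowest AIC (here computed as 2 × number of fitted parameters − 2 × log-likelihood). It is a pure-Python illustration and not the bestglm algorithm: the plain gradient-ascent fit, the function names, and the step settings are all assumptions of this sketch, and the fit presumes predictors scaled to roughly unit magnitude.

```python
from itertools import combinations
from math import exp, log


def sigmoid(z):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + exp(-z)) if z >= 0 else exp(z) / (1.0 + exp(z))


def predict(beta, row):
    """Modeled probability for one row; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))


def fit_logistic(X, y, steps=2000, lr=0.5):
    """Fit intercept and slopes by gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            err = yi - predict(beta, row)
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def aic(beta, X, y):
    """AIC = 2 * (number of parameters) - 2 * log-likelihood."""
    ll = 0.0
    for row, yi in zip(X, y):
        p = min(max(predict(beta, row), 1e-12), 1 - 1e-12)
        ll += yi * log(p) + (1 - yi) * log(1 - p)
    return 2 * len(beta) - 2 * ll


def best_subset(columns, y, k_max):
    """Exhaustively search subsets of size 1..k_max; return (lowest AIC, names).

    columns: dict mapping a predictor name to its list of values.
    """
    best = None
    for k in range(1, k_max + 1):
        for names in combinations(columns, k):
            X = [list(row) for row in zip(*(columns[nm] for nm in names))]
            score = aic(fit_logistic(X, y), X, y)
            if best is None or score < best[0]:
                best = (score, names)
    return best
```

Called as `best_subset(columns, y, k_max=10)` on a dictionary of 24 predictors, the loop would visit every subset counted by the NPS, which is why the candidate list has to be shortened before a search of this kind is practical.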

During the predictor selection phase described above, days with missing data for any of the initial 238 variables were temporarily excluded from consideration. Once the predictors were finalized, the model was constructed using all days for which none of the final predictors contained missing data.

The contribution of individual predictors to the informational content (measured by AIC and residual deviance) of the final eight-variable logistic regression model (LRM) is illustrated in Table 2. To construct this table, as a first step, separate logistic regressions (k = 1 in Table 2) were carried out for each individual predictor, and the residual deviance and AIC were documented. The predictor with the lowest value of AIC (representing the biggest improvement over the null model, i.e., the one with intercept only) was selected. For the next step (k = 2), the best single predictor, the average wind speed at 0–3 kft AGL, was combined in a logistic regression with each of the other predictors one at a time. Next, the resulting two-predictor model with the lowest value of AIC was retained, and the remaining predictors were added, again one at a time, to determine the best three-predictor (k = 3) model, and so forth, until the full k = 8 model is reached. The first step of the procedure illustrates that out of the set of eight, when considered separately, the best predictors are, in order 1) average wind speed at 0–3 kft AGL (0–3kSpd), 2) sign of the V-wind component at 0–10 kft AGL (0–10kV+/−), 3) Craven significant severe parameter (Craven; Craven and Brooks 2004), 4) 1000–700-mb lapse rate (1000–700LR), 5) sign of the U-wind component of the 100-mb wind (100U+/−), 6) sign of the U-wind component of the transport mean wind (TransU+/−), 7) temperature at 100 mb (100T), and 8) total totals index (TT; Miller 1972). The transport mean wind as defined here is the average wind in a layer from the surface to the mixing height, with the mixing height found using the Heffter method (Heffter 1980). An estimate of the relative contribution of each predictor in the full model can be seen in the standardized coefficients for the LRM (Fig. 5). 
According to these standardized coefficients, waterspout probability is largest for weak 0–3 kft AGL average wind speeds, stronger negative 1000–700-mb lapse rates, and larger values of the Craven parameter, total totals, and 100-mb temperature; the most favorable flows have positive U components of the transport mean winds and the 100-mb winds, as well as a negative V component in the lowest 10 kft of the atmosphere.

Table 2.

Residual deviance (RES) and AIC for models with k predictors built from subsets of the final eight predictors using a sequential selection approach based on AIC. Individual predictors are abbreviated as in the text. The lowest values of RES and AIC for each k are marked in boldface. The null deviance is 1032.5.

Fig. 5.

Standardized coefficients (black) for the LRM with 95% confidence intervals (red lines).


The nature of these relationships holds when the predictors are considered separately (Fig. 6). For each individual quantitative variable, the frequency distribution over the training dataset, the conditional frequency of waterspout report days in each of 15 equally populated bins, and the single-variable logistic regression fit are shown for each predictor of the LRM.

Fig. 6.

Frequency distribution (gray bars), conditional frequency of waterspout reports (red dots), and single-variable logistic regression fit (black line) for (a) 100-mb temperature (K), (b) 1000–700-mb lapse rates (K km⁻¹), (c) the Craven significant severe parameter (×10³ m³ s⁻³), (d) the TT index (unitless), (e) 0–3-kft wind speed (kt, where 1 kt = 0.51 m s⁻¹), (f) 100-mb U-wind sign (unitless), (g) transport mean U-wind sign (unitless), and (h) 0–10-kft V-wind sign (unitless).


b. Performance metrics

Three complementary sets of metrics were used to assess the performance of the LRM and compare the results with those of the benchmark model (CWI) and climatology, specifically (i) the distribution of predictions on days with and without waterspout reports; (ii) the reliability diagram and associated reliability, resolution, and Brier skill score of the predictions; and (iii) their receiver operating characteristics (ROCs) and the areas under the curve (AUCs). The first set of metrics addresses the question of how different the forecast distributions are for days with and without waterspout reports and whether the differences between these distributions are statistically significant. The reliability diagram and the associated Brier skill decomposition and score (Wilks 2006) address the question of how reliable the forecasts are (i.e., an event with a predicted probability of X% occurs in X% of cases), how well resolving they are (i.e., the degree to which the model produces predictions away from climatology), and whether they are, overall, more skillful than a climatological forecast. Finally, the ROC illustrates how the hit (true positive) rates and false alarm (false positive) rates compare at different classification thresholds. In other words, the ROC diagram shows, for any given value of probability taken as a threshold to differentiate between a categorical “yes” and a “no” forecast, the hit and false alarm rates for such forecast. AUC is the area under the ROC curve. For a random (no skill) forecast, the hit and false alarm rates are equal at any forecast probability threshold. For such forecasts, the ROC follows the diagonal line of the diagram, connecting the bottom-left and top-right corners, and its AUC value is 0.5. For a perfect categorical forecast, the ROC is made of two straight lines connecting the bottom-left, top-left, and top-right corners of the diagram, with an AUC value of 1. 
A value of AUC of 0.7 is often accepted as a threshold for moderately discriminating forecasts (Swets 1988). Unlike the reliability diagram analysis, the ROC/AUC does not penalize miscalibration, reflecting the fact that even miscalibrated (biased) forecasts can, in practice, still provide useful guidance and can be improved by proper calibration.
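The construction of the ROC curve and its AUC can be illustrated with a short Python sketch. It assumes a “yes” forecast is issued whenever the predicted probability is at or above the threshold and that the area is obtained with the trapezoidal rule; the function names are hypothetical, and the sketch requires at least one event and one nonevent in the sample.

```python
def roc_points(probs, labels):
    """Return (false alarm rate, hit rate) pairs, one per distinct threshold.

    A categorical "yes" is forecast when the probability is >= the threshold.
    labels holds 1 for event days and 0 for nonevent days.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every forecast: no "yes" forecasts
    for t in sorted(set(probs), reverse=True):
        hits = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        false_alarms = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        points.append((false_alarms / neg, hits / pos))
    return points


def auc(points):
    """Trapezoidal area under a ROC curve given as ordered (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

A forecast that ranks every event day above every nonevent day traces the left and top edges of the diagram, whereas a constant forecast yields only the two corner points and therefore the diagonal.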

The performance of the LRM and the benchmark model was evaluated both in a repeated cross-validation setting using data from 2006 to 2014 and in an independent verification mode using the withheld data for 2015–16. The repeated cross validation (20 times five folds) consists of randomly splitting the training dataset into five “folds” of similar size; one fold is withheld for validation, while the remaining four are used for calculating model coefficients; this process is repeated for each of the five folds; this random splitting procedure is repeated 20 times. A similar repeated fivefold cross validation was performed for the logistic regression built using the CWI. The reason fivefold cross validation was selected is that the length of each fold (approximately 213 days) is similar in length to the independent verification dataset (235 days), allowing for a comparison of several independent verification measures with those based on training-period results for a dataset of similar length.
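The repeated fivefold splitting can be sketched as a small index generator. This is a minimal illustration under stated assumptions: the function name, the fixed random seed, and the sample count of 1065 used in the usage note (chosen only so that each fold holds 213 days) are hypothetical and are not taken from the study.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=20, seed=0):
    """Yield (train_indices, validation_indices) for every fold of every repeat.

    Each repeat reshuffles the sample indices and deals them into n_folds
    folds of nearly equal size; each fold is withheld once for validation.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::n_folds] for i in range(n_folds)]
        for i, held_out in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, held_out
```

With `repeated_kfold(1065)`, the generator produces 100 train/validation pairs (20 repeats of five folds), each validation fold containing 213 indices; model coefficients would be estimated on the training indices and the verification scores computed on the withheld fold.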

4. Results

a. Comparison of model distributions

The distribution of modeled probabilities of waterspouts on report versus no-report days is shown in Fig. 7 as box-and-whisker diagrams. Here, and in subsequent diagrams, colored boxes represent the interquartile range (25th–75th percentile), whiskers span from the 10th to the 90th percentile, and the notch around the median represents the 95% confidence interval for the median’s true value. Only the CWI results are shown here as a benchmark for the LRM performance, since the BRI, KHI, and SWI demonstrated no appreciable skill in differentiating between report and no-report days.

Fig. 7.

Boxplot of the LRM and CWI modeled waterspout probability on report vs no-report days from (a) cross validation and (b) independent verification. Whiskers show the 10th and 90th percentiles. The notch around the median (thick black line) indicates the 95% confidence interval for the median’s true value. The horizontal dashed line indicates the climatological frequency of waterspouts. The median value for each box-and-whisker plot is shown along the top of the plot, and the mean value is shown in parentheses beneath it.


The climatological probability of waterspout reports for the wet season, based on the training period, 2006–14, is 19.0%. On no-report days, the cross-validated LRM predicted a median probability of 14.3% (mean 17.4%), while on waterspout report days the median predicted probability was 22.8% (mean 26.0%) (Fig. 7a, red). The difference in means across the two groups is statistically significant (p value < 2.2 × 10⁻¹⁶). The 95% confidence intervals of the medians of the forecasts on no-report and report days do not overlap, indicating statistically significant differences in the medians as well.

The cross-validated CWI model median probability on no-report days was 17.8% (mean 18.5%), and on report days it was 20.6% (mean 20.8%) (Fig. 7a, blue), and the difference in means was also statistically significant (p value < 2.2 × 10⁻¹⁶). Here, too, the 95% confidence intervals of the medians of the forecasts on no-report and report days do not overlap, indicating statistically significant differences in the medians as well.

Also statistically significant are the differences between the mean and median forecasts produced by the two models (LRM and CWI) on no-report days (p value < 2.2 × 10⁻¹⁶ for the difference of means), as well as on report days (p value < 2.2 × 10⁻¹⁶ for the difference of means). The comparison in Fig. 7 shows that there is a wider range of probabilities produced by the LRM relative to the CWI model on both no-report and report days, and that the two CWI distributions are less distinguishable from each other than are the two LRM distributions. The 10th–90th percentile range for the LRM spans from 5.7% to 33.7% on no-report days and from 10.8% to 45.7% on report days. For the CWI model, the 10th–90th percentile range spans from 10.6% to 27.6% on no-report days and from 13.8% to 28.5% on report days.

The average probability of waterspout reports for the wet season, based on the 2015–16 dataset, was roughly 22.5%, somewhat higher than that of the training period (see Fig. 1 for the monthly mean frequencies during the verification versus the training period). On no-report days, the independent verification with the LRM predicted a median probability of 14.2% (mean 16.7%), while on waterspout report days the median predicted probability was 21.5% (mean 24.1%) (Fig. 7b, red). The difference in means across the two groups is statistically significant (p value = 9.1 × 10⁻⁵), and the 95% confidence intervals for the median forecast values on report versus no-report days are clearly separated.

The independent verification with the CWI model for no-report days had a median probability of 19.6% (mean 19.9%), while on report days it was 21.4% (mean 21.8%) (Fig. 7b, blue), and the difference in means had a weaker statistical significance (p value = 0.04). The 95% confidence intervals for the median forecast values on report versus no-report days are marginally separated.

Also statistically significant are the differences between the mean and median forecasts produced by the two models (LRM and CWI) on no-report days (p value = 3.0 × 10⁻⁴ for the difference of means). The difference between the two models on report days is not clearly statistically significant, with a p value = 0.22 for the difference of means test and some overlap between the 95% confidence intervals for the medians. Similarly to the cross-validation results, the two CWI distributions are less distinguishable from each other than are the two LRM distributions. The 10th–90th percentile range for the LRM spans from 5.9% to 30.4% on no-report days and from 10.6% to 42.4% on report days. For the CWI model, the 10th–90th percentile range spans from 12.9% to 27.5% on no-report days and from 14.5% to 29.6% on report days.

The above comparison of model distributions demonstrates that both the LRM and CWI outperform climatology, and that the LRM outperforms the CWI in terms of differentiating between days with no waterspout reports and days with waterspout reports. The average predicted frequency on days during which no waterspouts are reported is lower in the LRM model than in the CWI model. Conversely, on days for which waterspouts are reported, the average predicted frequency is higher in the LRM than in the CWI. The CWI forecast frequency distribution has a similar and highly overlapping range on report versus no-report days and is more clustered around climatology. The LRM distributions demonstrate a greater difference in ranges and a smaller overlap between report and no-report days and are less clustered around climatology. The above results are seen both in cross-validation and independent verification mode.

b. Reliability diagram and Brier skill score

For a complementary look at the two models’ performances, reliability diagrams were constructed (Fig. 8). Here, the predicted probabilities were separated into evenly spaced bins between 0 and 1 (horizontal axis) with width equal to ⅔ of the climatological probability for 2006–14, and the forecast probability in each bin was assigned its midpoint value. The observed probability (vertical axis) for each bin was calculated as the mean of the corresponding validating observations. The Brier score decomposition (Wilks 2006) into reliability, resolution, and uncertainty terms, and the Brier skill score (BSS) relative to climatology, are indicated in each panel in Fig. 8. The uncertainty term is determined purely by the observations’ distribution and is not influenced by the forecast. Note that a more skillful forecast will have a smaller reliability term (indicating well-calibrated forecasts) and a larger resolution term (indicating the presence of forecasts farther from the climatological mean). The Brier skill score, on the other hand, is larger for more skillful forecasts and indicates skill improvement relative to a forecast of climatology. The BSS is 0 for a forecast as skillful as climatology, negative for forecasts less skillful than climatology, and 1 for a perfect deterministic forecast. This theoretical upper limit of the BSS is, in practice, not a realistic target for probabilistic forecasts (Palmer et al. 2000), and even small positive values can be indicative of forecast skill (Jolliffe and Stephenson 2008).
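The binned decomposition can be sketched as follows. This minimal Python example assumes the standard relations BS = reliability − resolution + uncertainty and BSS = (resolution − reliability)/uncertainty, with each forecast replaced by the midpoint of its bin as described above; the function name and the toy numbers in the usage note are hypothetical, and the sketch requires that the observations contain both events and nonevents.

```python
def brier_decomposition(probs, obs, bin_width):
    """Return (reliability, resolution, uncertainty, BSS) for binned forecasts.

    probs: forecast probabilities in [0, 1]; obs: 1 for event, 0 for nonevent.
    Every forecast is replaced by the midpoint of the bin it falls into.
    """
    n = len(obs)
    base_rate = sum(obs) / n
    bins = {}
    for p, o in zip(probs, obs):
        bins.setdefault(int(p // bin_width), []).append(o)
    rel = res = 0.0
    for idx, outcomes in bins.items():
        forecast = (idx + 0.5) * bin_width        # bin midpoint
        observed = sum(outcomes) / len(outcomes)  # conditional event frequency
        rel += len(outcomes) * (forecast - observed) ** 2
        res += len(outcomes) * (observed - base_rate) ** 2
    rel /= n
    res /= n
    unc = base_rate * (1.0 - base_rate)  # depends on the observations only
    return rel, res, unc, (res - rel) / unc
```

For example, ten forecasts of 0.1 verified by one event and ten forecasts of 0.5 verified by five events, with a bin width of 0.2, are perfectly reliable (reliability term of 0), so the skill score reduces to the ratio of the resolution term to the uncertainty term.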

Fig. 8.

Reliability diagram for modeled waterspout probability for the (a) cross-validated LRM, (b) cross-validated CWI, (c) independent verification LRM, and (d) independent verification CWI. The point size reflects the forecast bin frequency, also denoted as a percentage. Values for the Brier score decomposition into reliability (Rel), resolution (Res), and uncertainty (Unc), as well as the BSS, are printed in the bottom-right corner of each panel.


For comparison, we show that for the repeated cross validation the LRM (Fig. 8a) is an improvement over the CWI model (Fig. 8b). The LRM reliability term is slightly smaller than that of the CWI, indicating a slightly better-calibrated model. On the graph this is evidenced by the closer proximity of the LRM forecast/observation data points to the diagonal (perfect reliability) line. The resolution term of the LRM model is larger than that of the CWI model, indicating that the LRM produces more frequent forecasts that are farther away from the climatological mean. On the graph this is evidenced by the size of the dots representing the forecast/observation data points (and specified as a percentage of all forecasts), which reiterates that the CWI model is more conservative, with most forecasts lying near the climatological probability of waterspout occurrence. In contrast, the LRM more frequently, and reliably, predicts probabilities above or below the climatological value. Only 37.5% of LRM forecasts fall into the “climatology” bin, compared to some 66.4% of CWI forecasts. The Brier skill score of the LRM model (0.067) is several times larger than that of the CWI (0.010).

Box-and-whisker diagrams (Fig. 9) of the Brier skill scores were obtained from the individual folds of the 20-times repeated fivefold cross validation for the LRM (red) and CWI (blue). Also shown for reference are the Brier skill scores of similarly obtained results for the three other models (BRI, KHI, and SWI). There is a clear separation between the Brier skill score distributions for the two models, as illustrated by the lack of overlap in the interquartile ranges. The difference in means is statistically significant (p value < 2.2 × 10⁻¹⁶), and the 95% confidence intervals for the medians are well separated. The 10th–90th percentile range of Brier skill scores for single folds of the LRM cross validation spans from 0.014 to 0.121 with a median value of 0.065, while for the CWI model this span is from −0.014 to 0.039 with a median value of 0.010. The range for BRI spans from 0 to 8.5 × 10⁻⁵ with a median of 4 × 10⁻¹¹, for KHI it is from −6 × 10⁻¹⁰ to 9 × 10⁻⁵ with a median of 4 × 10⁻¹¹, and for SWI it is from −9 × 10⁻³ to 4 × 10⁻³ with a median of 9 × 10⁻⁴.

Fig. 9.

BSSs for individual folds of the 20-times fivefold cross validation for the LRM (red) and CWI (blue), as well as the BRI (green), KHI (orange), and SWI (cyan).


The results from independent verification of the LRM and CWI in terms of reliability diagrams and Brier skill scores (Figs. 8c,d) are generally similar to those from the cross validation. Although the reliability term for the LRM is larger than that for the CWI, indicating a better calibration of the latter model, this is counterbalanced by the improved (larger) resolution term in the LRM compared to the CWI. The overall Brier skill score for the LRM is 0.092, larger than that of the CWI (0.024). These values are within the range of values for individual folds of the cross validation (recall that the length of data in an individual fold is similar to the length of data in the independent verification period) seen in Fig. 9, confirming that, when equitably compared, the model performance in independent verification mode is within the range of performance seen in cross validation.

The above comparison of the reliability diagrams demonstrates that both the LRM and CWI are well calibrated, but the resolution of the LRM forecasts is an improvement over that of the CWI forecasts, that is, that the LRM more frequently (and statistically correctly) produces forecasts that are farther away from climatology. This is reflected in the improvement of the Brier skill score of the LRM relative to the CWI. These results are seen both in cross-validation and independent verification mode. The performance of the remaining three models evaluated here (BRI, KHI, and SWI) is significantly below that of both the LRM and CWI and only slightly above climatology.

c. Receiver operating characteristic and area under the curve

Examination of the ROC curves (Fig. 10) provides a complementary comparison of the performance of the LRM and CWI models, in terms of their ability to discriminate between the two outcomes (report versus no-report days). The ROC/AUC for repeated cross validation demonstrates that the discrimination of the LRM (Fig. 10a, solid red line) is an improvement over CWI (Fig. 10a, solid blue line), since the LRM ROC is above and to the left of the CWI ROC. The AUC for the LRM is 0.70, greater than the CWI AUC value of 0.61. The ROC values for individual cross-validation folds are depicted in Fig. 10a with dashed red lines for the LRM and dashed blue lines for the CWI, thus illustrating the variability within subsets of the data and providing a reference range for the independent verification period.

Fig. 10.

ROC curves for the (a) cross validation and (b) independent verification of waterspout probability for the LRM (red) and for the CWI model (blue). AUC values are indicated in the bottom-right corner of each panel. In (a), the dashed lines show the ROC curves for individual folds, and solid lines are for the entire cross validation.


The box-and-whisker plot of AUC values for individual folds is depicted in Fig. 11. There is a clear separation between the AUC distributions for the two models, as illustrated by the lack of overlap in the interquartile ranges. The difference in means is statistically significant (p value < 2.2 × 10⁻¹⁶), and the 95% confidence intervals for the medians are well separated. The 10th–90th percentile range of AUC for single folds of the LRM cross validation spans from 0.66 to 0.74, with a median value of 0.70, while for the CWI model this span is from 0.55 to 0.66, with a median value of 0.62. The BRI spans from 0.47 to 0.59, with a median value of 0.54; the KHI spans from 0.49 to 0.58, with a median value of 0.54; and the SWI spans from 0.48 to 0.57, with a median value of 0.52.

Fig. 11.

AUC for individual folds of the 20-times fivefold cross validation for the LRM (red) and CWI (blue), as well as the BRI (green), KHI (orange), and SWI (cyan).


The results from independent verification of the LRM and CWI in terms of ROC/AUC (Fig. 10b) are generally similar to those from the cross validation. The LRM ROC is almost entirely above and to the left of the CWI ROC. The AUC for the LRM is 0.70, greater than the value for the CWI of 0.59. These values are within the range of values for individual folds of the cross validation (recall that the length of data in an individual fold is similar to the length of data in the independent verification period) seen in Fig. 11, illustrating that the model performance in independent verification mode is within the range of performance obtained during cross validation.

The ROC/AUC results demonstrate that both the LRM and CWI models are an improvement over a random forecast, and that the performance of the LRM is superior to that of the CWI in both the cross validation and independent verification. The LRM AUC is at the threshold of what is considered a moderately well discriminating model, whereas the CWI fails to meet that threshold. The performance of the remaining three models evaluated here (BRI, KHI, and SWI) is significantly below that of both the LRM and CWI and only slightly above climatology.

5. Summary and discussion

More than 200 separate variables were extracted from 1200 UTC Key West soundings for the period 2006–14 for the wet-season months from June through September. The variables were separated into two subsets: days when at least one waterspout was reported and days with no waterspout reports. The number of variables considered needed to be reduced in order to obtain a workable number of candidate predictors for logistic regression analysis. Each variable was examined for a statistically significant difference in the means between the two subsets. Variables for which the difference in means between the two subsets met or exceeded the 95% confidence level were retained for further analysis, and the rest were excluded from further consideration. The number of candidate predictors was further reduced by requiring that no two variables had an absolute correlation exceeding 0.6. AIC was used as a criterion to perform an exhaustive search to select a best model based on this reduced set of predictors. This resulted in an eight-predictor model, the LRM. The LRM obtained in this manner is but one model solution that resulted from the particular choice of the model selection procedure, and other model configurations, obtained by alternative means, may perform equally well or better.

The impact of each predictor in the LRM was examined by comparing against the null model and ensuring that the inclusion of each successive predictor, in order of importance, reduced the model AIC and residual deviance. The LRM was then compared against the benchmark model (CWI) using repeated cross validation, and its potential for operational forecasting was further evaluated by building the logistic regression model with the 2006–14 wet-season data and applying the resulting model to an independent, unused set constructed with data from the 2015–16 wet seasons. These evaluations demonstrated that the multivariable logistic regression, LRM, built using a small set of predictor variables derived from morning sounding data at Key West, produced a quantitative probability forecast with robust skill relative to climatology and yielded results that exceeded those of an existing operational index, CWI, validated using the same methodology. The models’ performances were compared in terms of several complementary measures: discrimination, reliability, resolution, Brier skill score, receiver operating characteristic (ROC) curves, and area under the curve (AUC). It was found that the LRM outperformed the benchmark model in terms of all evaluated metrics for both cross validation and independent verification, thus providing a more skillful probabilistic forecast with greater potential utility.

Even though the LRM performance exceeds that of the CWI model, it is important to note that the results make no statement about the performance of the CWI for the Charleston coastal environment, or elsewhere. The CWI was used here only as a benchmark because the two models have similar origins, being built from sounding-derived parameters, and the CWI demonstrates an improvement over climatology. Three additional indices described in the literature were considered as potential benchmarks, but most of these results are not shown since their performance was only marginally better than climatology.

The LRM models the chance of a waterspout being reported, rather than directly the chance of a waterspout occurring. It is assumed, however, that the probability of a waterspout being reported is proportional to the probability of a waterspout occurring. This leaves the potential for bias since waterspouts occurring farther offshore may be witnessed and/or reported at a lesser rate. If indeed waterspouts farther offshore are underrepresented in the reports relative to nearshore waterspouts, the model may favor flow regimes that are preferentially associated with nearshore waterspouts.

The LRM is trained for waterspout development over the waters surrounding the Florida Keys and likely will not perform as well over other coastal waters since, for example, the shape of the coastline is reflected in favorable surface wind flow, and the upper troposphere may not carry the same importance in other locations as it does over the Florida Keys. Additionally, the climatological occurrence of waterspouts will differ by location, which will affect the training of the model. One possible impediment to evaluating the LRM methodology in other locations would be a low occurrence rate of waterspouts. Given a low climatological occurrence in other locations, the resulting model may not be statistically stable (World Meteorological Organization 1999, section 3.3.2). However, the methodology used to produce the LRM is expected to produce favorable results in other areas and will be examined in further work.

The LRM may prove useful in daily operational forecasting of waterspout development, either alone or in conjunction with other guidance tools. We found that when the LRM forecast was above climatology, the probability of a waterspout being observed was about 29% (35%) in cross-validation (independent verification) mode (Table 3), at least about 1.5 times the climatological probability. When the LRM forecast was below climatology, the corresponding probability of waterspouts was about 12% (15%). For CWI forecasts, these numbers were about 25% (26%) and 14% (18%), respectively. When both LRM and CWI forecasts were above climatology, waterspouts were observed in 30% (37%) of cases for cross validation (independent verification). When both forecasts were below climatology, these numbers were about 10% (15%). When only forecasts exceeding approximately 1.5 times the climatological probability were considered, however, there was no benefit to considering the CWI in addition to the LRM. This suggests that in some circumstances, the specifics of which merit further investigation, there may be a marginal utility in considering both models in combination. Such guidance would likely be most beneficial when used in conjunction with long-established forecaster rules of thumb.
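The conditional frequencies quoted above can be computed by splitting the forecast days at a probability threshold and tabulating how often a waterspout was reported in each group. The following Python sketch assumes a hypothetical helper name (`conditional_frequencies`) and made-up toy data; only the threshold of 0.19 is taken from the climatological value given in the Table 3 caption.

```python
def conditional_frequencies(probs, reported, threshold):
    """Forecast frequency and report frequency above and at or below a threshold."""
    above = [r for p, r in zip(probs, reported) if p > threshold]
    below = [r for p, r in zip(probs, reported) if p <= threshold]

    def summarize(group):
        return {
            # Share of all forecast days that fall in this group.
            "forecast_frequency": len(group) / len(probs),
            # Share of days in this group with a waterspout report.
            "report_frequency": sum(group) / len(group) if group else None,
        }

    return {"above": summarize(above), "below": summarize(below)}


# Toy data (made up): eight forecast probabilities and report flags.
probs = [0.10, 0.25, 0.30, 0.15, 0.40, 0.05, 0.22, 0.12]
reported = [0, 1, 0, 0, 1, 0, 0, 1]
table = conditional_frequencies(probs, reported, threshold=0.19)
print(table)
```

The same function can be applied to the days on which two models jointly exceed the threshold by filtering the inputs to those days before the call.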

Table 3.

Relative forecast frequency and corresponding waterspout report frequency for the following forecast outcomes: the LRM and CWI individually or jointly above/below climatology (0.19) and individually or jointly above 0.30. Cross-validation results and independent verification results (in parentheses).

6. Future work

One of the surprises to come from this research was the apparent influence of the upper troposphere, which was uncovered statistically in the development of the LRM. Waterspouts were previously thought to be a shallow phenomenon, with cloud tops generally around 3 km (Rossow 1970), and Golden (2003) alluded to the importance of weak lower-tropospheric troughs on days when large waterspouts or multiple waterspouts were reported. A statistical evaluation of favorable synoptic conditions through the depth of the troposphere should be examined, with an emphasis on transient upper-tropospheric conditions as well as the potential effects of tropical waves and/or perturbations in the easterly flow. Additionally, examination of waterspout breakout days (e.g., subjectively defined as days with three or more reports) may be helpful in defining an overall favorable synoptic pattern or relevant intraseasonal and interseasonal changes.

It may be worth expanding the dataset to years prior to 2006. Care will have to be taken to ensure that doing so does not degrade the quality of the dataset. One of the waterspout reporting practices employed prior to 2006 was to identify waterspouts as funnel clouds if a spray ring was not spotted, or if the funnel did not descend to the horizon. Therefore, reports of funnel clouds over water, in “fair weather” conditions, would likely need to be considered waterspout reports. Even then, occasionally a storm report would not be issued for a brief funnel cloud.

Even though the radiosonde sounding at the Miami, Florida, NWS office is well inland (approximately 15 km), and may not be representative of the marine boundary layer, it should be examined for utility in developing future iterations of the LRM. Along these lines, the use of modeled soundings (with an emphasis on high-resolution mesoscale models) may also be useful in identifying upcoming favorable waterspout days. This would be helpful in extended outlook products, such as the NWS’s hazardous weather outlooks.

Acknowledgments

The authors are grateful for the help of Kennard Kasper, Sean Daida, and Melody Lovin at the National Weather Service in Key West for their work on graphics used in this paper as well as edits and comments. The authors also acknowledge the work of Hollings Scholar student Taylor Adams of Valparaiso University on background material and additional research. This work was funded by the United States Government, Department of Commerce, National Weather Service.

APPENDIX

Details of the Variables Considered in the Construction of the LRM

Shown in Table A1 is a list of the candidate predictors (parameters obtained from raob) that were used as a starting point for building the LRM. Also indicated in Table A1 are the candidate predictors that were retained at each step of the selection procedure described in section 3a. Figures A1 and A2 are provided to support the motivation for excluding raob-derived values of dewpoint temperatures above 400 mb from consideration, as described in section 2.

Table A1.

Comprehensive list of all candidate variables obtained from raob (numbered in column N), candidate variables retained after the t-test screening (numbered in column N0), candidate variables retained after subsequent correlation screening (numbered in column N1), and final list of variables after subsequent best model selection based on AIC criterion (numbered in column N2). The abbreviated names for candidate variables retained after the correlation screening are shown in boldface, and the names for the final variables are shown in boldface italics. Also listed are the mean values of each variable on no-report vs report days and the p values for the t test of the difference of those means.

Fig. A1.

Wet season composite soundings for the years 2006 (red), 2010 (blue), and 2016 (green). The dashed line is the dewpoint temperature, and the solid line is the temperature. Note the divergence in dewpoint temperatures above 400 mb, reflecting the change in radiosonde instruments in 2010 and 2012.

Citation: Weather and Forecasting 33, 2; 10.1175/WAF-D-17-0100.1

Fig. A2.

Monthly mean 100-mb dewpoint temperature (°C) from the rawinsonde observations at Key West. Note the discontinuity between 2006–10 and 2012–16 reflecting instrumentation changes.

REFERENCES

  • Akaike, H., 1974: A new look at the statistical model identification. IEEE Trans. Automat. Contr., 19, 716–723, https://doi.org/10.1109/TAC.1974.1100705.

  • Austin, P. C., and E. W. Steyerberg, 2014: Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat. Methods Med. Res., 26, 796–808, https://doi.org/10.1177/0962280214558972.

  • Brown, D., and J. Rothfuss, 1998: An approach to waterspout forecasting for south Florida and the Keys. NOAA/NWS Internal Rep., Miami, FL, https://www.weather.gov/mfl/waterspout_fcsting.

  • Craven, J. P., and H. E. Brooks, 2004: Baseline climatology of sounding derived parameters associated with deep, moist convection. Natl. Wea. Dig., 28, 13–24.

  • Davies, J. M., 1993: Hourly helicity, instability, and EHI in forecasting supercell tornadoes. Preprints, 17th Conf. on Severe Local Storms, St. Louis, MO, Amer. Meteor. Soc., 107–111.

  • Golden, J., 1974: The life cycle of Florida Keys’ waterspouts. I. J. Appl. Meteor., 13, 676–692, https://doi.org/10.1175/1520-0450(1974)013<0676:TLCOFK>2.0.CO;2.

  • Golden, J., 1977: An assessment of waterspout frequencies along the U.S. East and Gulf Coasts. J. Appl. Meteor., 16, 231–236, https://doi.org/10.1175/1520-0450(1977)016<0231:AAOWFA>2.0.CO;2.

  • Golden, J., 2003: Waterspouts. Encyclopedia of Atmospheric Sciences, G. North, J. Pyle, and F. Zhang, Eds., Elsevier, 2510–2525.

  • Golden, J., and H. B. Bluestein, 1994: The NOAA–National Geographic Society Waterspout Expedition (1993). Bull. Amer. Meteor. Soc., 75, 2281–2288, https://doi.org/10.1175/1520-0477(1994)075<2281:TNNGSW>2.0.CO;2.

  • Heffter, J. L., 1980: Air Resources Laboratories Atmospheric Transport and Dispersion Model (ARL-ATAD). NOAA Tech. Memo. ERL ARL 81, 29 pp., https://www.arl.noaa.gov/documents/reports/arl-81.pdf.

  • Jolliffe, I. T., and D. B. Stephenson, 2008: Proper scores for probability forecasts can never be equitable. Mon. Wea. Rev., 136, 1505–1510, https://doi.org/10.1175/2007MWR2194.1.

  • Keul, A. G., M. V. Sioutas, and W. Szilagyi, 2009: Prognosis of central-eastern Mediterranean waterspouts. Atmos. Res., 93, 426–436, https://doi.org/10.1016/j.atmosres.2008.10.028.

  • Kuiper, J., and M. Van der Haven, 2007: A new index to calculate risk of waterspout development. Fourth European Conf. on Severe Storms, Trieste, Italy, European Severe Storms Laboratory, https://www.essl.org/ECSS/2007/abs/06-Forecasts/1179250265.kuiper.pdf.

  • MacInnes, J., 2017: An Introduction to Secondary Data Analysis with IBM SPSS Statistics. Sage Publications, 336 pp.

  • McCann, D. W., 1994: WINDEX—A new index for forecasting microburst potential. Wea. Forecasting, 9, 532–541, https://doi.org/10.1175/1520-0434(1994)009<0532:WNIFFM>2.0.CO;2.

  • Miller, R. C., 1972: Notes on analysis and severe storm forecasting procedures of the Air Force Global Weather Central. Air Weather Service Tech. Rep. 200, 184 pp., http://www.dtic.mil/dtic/tr/fulltext/u2/744042.pdf.

  • Palmer, T. N., C. Brankovic, and D. S. Richardson, 2000: A probability and decision-model analysis of PROVOST seasonal multi-model ensemble integrations. Quart. J. Roy. Meteor. Soc., 126, 2012–2033, https://doi.org/10.1002/qj.49712656703.

  • Peduzzi, P., J. Concato, E. Kemper, T. R. Holford, and A. R. Feinstein, 1996: A simulation study of the number of events per variable in logistic regression analysis. J. Clin. Epidemiol., 49, 1373–1379, https://doi.org/10.1016/S0895-4356(96)00236-3.

  • Rossow, V., 1970: Observations of water spouts and their parent clouds. NASA Tech. Note D-5854, 63 pp., https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19700020540.pdf.

  • Schwiesow, R. L., 1981: Horizontal velocity structure in waterspouts. J. Appl. Meteor., 20, 349–360, https://doi.org/10.1175/1520-0450(1981)020<0349:HVSIW>2.0.CO;2.

  • Shewchuk, J., 2017: The Universal Rawinsonde Observation Program (version 6.8). Eosonde Research Services, http://www.raob.com.

  • Sioutas, M. V., and A. G. Keul, 2007: Waterspouts of the Adriatic, Ionian and Aegean Sea and their meteorological environment. Atmos. Res., 83, 542–557, https://doi.org/10.1016/j.atmosres.2005.08.009.

  • Spratt, S. M., and B. K. Choy, 1994: Employing the WSR-88D for waterspout forecasting. Postprints, NEXRAD Users Conf., Norman, OK, NWS, https://www.weather.gov/media/mlb/research/EmployingtheWSR88DforWaterspoutForecasting.pdf.

  • Swets, J. A., 1988: Measuring the accuracy of diagnostic systems. Science, 240, 1285–1293, https://doi.org/10.1126/science.3287615.

  • Szilagyi, W., 2009: A waterspout forecasting technique. Fifth European Conf. on Severe Storms, Landshut, Germany, European Severe Storms Laboratory, https://www.essl.org/ECSS/2009/preprints/O05-14-sziladgyi.pdf.

  • Watson, L. W., 2011: Upgrade summer severe weather tool. NASA Contractor Rep. CR-2011-216299, 32 pp.

  • Wheeler, M. M., 2009: Severe weather and weak waterspout checklist in MIDDS. NASA Contractor Rep. CR-2009-214760, 15 pp., https://science.ksc.nasa.gov/amu/final-reports/svr-wx-wksht-midds.pdf.

  • Wilks, D. S., 2006: Statistical Methods in Atmospheric Sciences. 2nd ed. Elsevier, 627 pp.

  • World Meteorological Organization, 1999: Methods of interpreting numerical weather prediction output for aeronautical meteorology. WMO Tech. Note 195, 124 pp., https://library.wmo.int/opac/doc_num.php?explnum_id=3447.
