“Dendrology” in Numerical Weather Prediction: What Random Forests and Logistic Regression Tell Us about Forecasting Extreme Precipitation

Gregory R. Herman, Department of Atmospheric Science, Colorado State University, Fort Collins, Colorado

and
Russ S. Schumacher, Department of Atmospheric Science, Colorado State University, Fort Collins, Colorado

Abstract

Three different statistical algorithms are applied to forecast locally extreme precipitation across the contiguous United States (CONUS) as quantified by 1- and 10-yr average recurrence interval (ARI) exceedances for 1200–1200 UTC forecasts spanning forecast hours 36–60 and 60–84, denoted, respectively, day 2 and day 3. Predictors come from nearly 11 years of reforecasts from NOAA’s Second-Generation Global Ensemble Forecast System Reforecast (GEFS/R) model and derive from a variety of thermodynamic and kinematic variables that characterize the meteorological regime in addition to the quantitative precipitation forecast (QPF) output from the ensemble. In addition to encompassing nine different atmospheric fields, predictors also vary in space and time relative to the forecast point. Distinct models are trained for eight different hydrometeorologically cohesive regions of the CONUS. One algorithm supplies the GEFS/R predictors directly to a random forest (RF) procedure to produce extreme precipitation forecasts; the second also employs RFs, but the predictors instead undergo principal component analysis (PCA), and extracted leading components are supplied to the RF. In the last algorithm, dimension-reduced predictors are supplied to a logistic regression (LR) algorithm instead of an RF. A companion paper (Herman and Schumacher 2018) investigated the quality of the forecasts produced by these models and other RF-based forecast models. This study is an extension of that work and explores the internals of these trained models and what physical and statistical insights they reveal about forecasting extreme precipitation from a global, convection-parameterized model.
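The three algorithms summarized above can be sketched with scikit-learn (Pedregosa et al. 2011). This is a minimal illustration under stated assumptions, not the authors' configuration: the predictors and precipitation are synthetic, a single fixed 1-yr ARI threshold stands in for the location-dependent thresholds, and the hyperparameters (100 trees, `min_samples_leaf=20`, standardization before PCA, 10 retained components) and the helper name `rf` are arbitrary illustrative choices. The sketch fits all three models, scores them with the Brier score, and then looks inside the trained RF and LR models.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for GEFS/R-derived predictors, not real data.
# A few hidden "regime" factors project onto many correlated predictor columns,
# loosely mimicking fields sampled at several offsets in space and time.
n_samples, n_predictors, n_latent = 2000, 90, 5
latent = rng.normal(size=(n_samples, n_latent))
loadings = rng.normal(size=(n_latent, n_predictors))
X = latent @ loadings + 0.5 * rng.normal(size=(n_samples, n_predictors))

# Hypothetical observed 24-h precipitation and a fixed 1-yr ARI threshold;
# the binary target is whether the observation exceeds that threshold.
obs_precip = np.exp(0.8 * latent[:, 0] + 0.4 * latent[:, 1]
                    + rng.normal(scale=0.5, size=n_samples))
ari_threshold = 4.0
y = (obs_precip > ari_threshold).astype(int)

def rf():
    # Illustrative hyperparameters only.
    return RandomForestClassifier(n_estimators=100, min_samples_leaf=20, random_state=0)

models = {
    # Algorithm 1: raw predictors supplied directly to a random forest.
    "RF": rf(),
    # Algorithm 2: standardize, keep leading principal components, then an RF.
    "PCA+RF": make_pipeline(StandardScaler(), PCA(n_components=10), rf()),
    # Algorithm 3: the same dimension reduction feeding logistic regression.
    "PCA+LR": make_pipeline(StandardScaler(), PCA(n_components=10),
                            LogisticRegression(max_iter=1000)),
}

n_train = 1500
probs, brier = {}, {}
for name, model in models.items():
    model.fit(X[:n_train], y[:n_train])
    probs[name] = model.predict_proba(X[n_train:])[:, 1]  # P(exceedance)
    brier[name] = brier_score_loss(y[n_train:], probs[name])

# Looking inside the trained models:
# impurity-based RF importances, one value per raw predictor column...
rf_importance = models["RF"].feature_importances_
# ...and LR weights mapped back from PC space onto the standardized predictors.
lr_pipe = models["PCA+LR"]
lr_weights = (lr_pipe.named_steps["logisticregression"].coef_
              @ lr_pipe.named_steps["pca"].components_).ravel()

for name, score in brier.items():
    print(f"{name:7s} Brier score: {score:.4f}")
print("Top RF predictors:", np.argsort(rf_importance)[::-1][:5])
print("Top |LR| predictors:", np.argsort(np.abs(lr_weights))[::-1][:5])
```

Impurity-based importances such as `feature_importances_` can be biased toward some kinds of predictors (Strobl et al. 2007), and importance rankings can be misleading when predictors are strongly correlated (Strobl et al. 2008), so `sklearn.inspection.permutation_importance` is one alternative worth comparing. Mapping the LR coefficients back through `components_` works because PCA is a linear projection, so the PCA+LR model remains linear in the standardized predictors.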

Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/MWR-D-17-0307.s1.

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Gregory R. Herman, gherman@atmos.colostate.edu

References

  • Ahijevych, D., J. O. Pinto, J. K. Williams, and M. Steiner, 2016: Probabilistic forecasts of mesoscale convective system initiation using the random forest data mining technique. Wea. Forecasting, 31, 581–599, https://doi.org/10.1175/WAF-D-15-0113.1.

  • Applequist, S., G. E. Gahrs, R. L. Pfeffer, and X.-F. Niu, 2002: Comparison of methodologies for probabilistic quantitative precipitation forecasting. Wea. Forecasting, 17, 783–799, https://doi.org/10.1175/1520-0434(2002)017<0783:COMFPQ>2.0.CO;2.

  • Bermingham, A., and A. Smeaton, 2011: On using Twitter to monitor political sentiment and predict election results. Proc. Workshop on Sentiment Analysis Where AI Meets Psychology (SAAIP 2011), Chiang Mai, Thailand, SAAIP, 2–10.

  • Bonnin, G. M., D. Todd, B. Lin, T. Parzybok, M. Yekta, and D. Riley, 2004: Precipitation-Frequency Atlas of the United States. NOAA Atlas 14, Vol. 1, 271 pp.

  • Bonnin, G. M., D. Martin, B. Lin, T. Parzybok, M. Yekta, and D. Riley, 2006: Precipitation-Frequency Atlas of the United States. NOAA Atlas 14, Vol. 2, 301 pp.

  • Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, https://doi.org/10.1023/A:1010933404324.

  • Bremnes, J. B., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Mon. Wea. Rev., 132, 338–347, https://doi.org/10.1175/1520-0493(2004)132<0338:PFOPIT>2.0.CO;2.

  • Cao, L.-J., and F. E. H. Tay, 2003: Support vector machine with adaptive parameters in financial time series forecasting. IEEE Trans. Neural Networks, 14, 1506–1518, https://doi.org/10.1109/TNN.2003.820556.

  • Clark, A. J., W. A. Gallus Jr., and M. L. Weisman, 2010: Neighborhood-based verification of precipitation forecasts from convection-allowing NCAR WRF Model simulations and the operational NAM. Wea. Forecasting, 25, 1495–1509, https://doi.org/10.1175/2010WAF2222404.1.

  • Collobert, R., J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, 2011: Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12, 2493–2537.

  • DeMaria, M., and J. Kaplan, 1994: A Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, https://doi.org/10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

  • Doswell, C. A., III, H. E. Brooks, and R. A. Maddox, 1996: Flash flood forecasting: An ingredients-based methodology. Wea. Forecasting, 11, 560–581, https://doi.org/10.1175/1520-0434(1996)011<0560:FFFAIB>2.0.CO;2.

  • Friedman, J. H., 2001: Greedy function approximation: A gradient boosting machine. Ann. Stat., 29, 1189–1232, https://doi.org/10.1214/aos/1013203451.

  • Gagne, D. J., 2016: Coupling data science techniques and numerical weather prediction models for high-impact weather prediction. Ph.D. thesis, University of Oklahoma, 204 pp.

  • Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, https://doi.org/10.1175/WAF-D-13-00108.1.

  • Gagne, D. J., A. McGovern, S. E. Haupt, R. A. Sobash, J. K. Williams, and M. Xue, 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea. Forecasting, 32, 1819–1840, https://doi.org/10.1175/WAF-D-17-0010.1.

  • Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, https://doi.org/10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

  • Gneiting, T., A. E. Raftery, A. H. Westveld III, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, https://doi.org/10.1175/MWR2904.1.

  • Grams, J. S., W. A. Gallus Jr., S. E. Koch, L. S. Wharton, A. Loughe, and E. E. Ebert, 2006: The use of a modified Ebert–McBride technique to evaluate mesoscale model QPF as a function of convective system morphology during IHOP 2002. Wea. Forecasting, 21, 288–306, https://doi.org/10.1175/WAF918.1.

  • Hall, T., H. E. Brooks, and C. A. Doswell III, 1999: Precipitation forecasting using a neural network. Wea. Forecasting, 14, 338–345, https://doi.org/10.1175/1520-0434(1999)014<0338:PFUANN>2.0.CO;2.

  • Hamill, T. M., G. T. Bates, J. S. Whitaker, D. R. Murray, M. Fiorino, T. J. Galarneau Jr., Y. Zhu, and W. Lapenta, 2013: NOAA’s Second-Generation Global Medium-Range Ensemble Reforecast Dataset. Bull. Amer. Meteor. Soc., 94, 1553–1565, https://doi.org/10.1175/BAMS-D-12-00014.1.

  • Herman, G. R., and R. S. Schumacher, 2016a: Extreme precipitation in models: An evaluation. Wea. Forecasting, 31, 1853–1879, https://doi.org/10.1175/WAF-D-16-0093.1.

  • Herman, G. R., and R. S. Schumacher, 2016b: Using reforecasts to improve forecasting of fog and visibility for aviation. Wea. Forecasting, 31, 467–482, https://doi.org/10.1175/WAF-D-15-0108.1.

  • Herman, G. R., and R. S. Schumacher, 2018: Money doesn't grow on trees, but forecasts do: Forecasting extreme precipitation with random forests. Mon. Wea. Rev., 146, 1571–1600, https://doi.org/10.1175/MWR-D-17-0250.1.

  • Hershfield, D. M., 1961: Rainfall frequency atlas of the United States. Weather Bureau, Department of Commerce Tech Paper 40, 65 pp.

  • Jones, T. A., D. Cecil, and M. DeMaria, 2006: Passive-microwave-enhanced Statistical Hurricane Intensity Prediction Scheme. Wea. Forecasting, 21, 613–635, https://doi.org/10.1175/WAF941.1.

  • Larrañaga, P., and Coauthors, 2006: Machine learning in bioinformatics. Brief. Bioinform., 7, 86–112, https://doi.org/10.1093/bib/bbk007.

  • Lin, Y., and K. E. Mitchell, 2005: The NCEP Stage II/IV hourly precipitation analyses: Development and applications. 19th Conf. on Hydrology, San Diego, CA, Amer. Meteor. Soc., 1.2, https://ams.confex.com/ams/Annual2005/techprogram/paper_83847.htm.

  • Lorenz, E. N., 1956: Empirical orthogonal functions and statistical weather prediction. Statistical Forecasting Project, Department of Meteorology, MIT Science Rep. 1, 49 pp.

  • Marzban, C., and G. J. Stumpf, 1996: A neural network for tornado prediction based on Doppler radar-derived attributes. J. Appl. Meteor., 35, 617–626, https://doi.org/10.1175/1520-0450(1996)035<0617:ANNFTP>2.0.CO;2.

  • Marzban, C., and A. Witt, 2001: A Bayesian neural network for severe-hail size prediction. Wea. Forecasting, 16, 600–610, https://doi.org/10.1175/1520-0434(2001)016<0600:ABNNFS>2.0.CO;2.

  • McGovern, A., K. L. Elmore, D. J. Gagne, S. E. Haupt, C. D. Karstens, R. Lagerquist, T. Smith, and J. K. Williams, 2017: Using artificial intelligence to improve real-time decision-making for high-impact weather. Bull. Amer. Meteor. Soc., 98, 2073–2090, https://doi.org/10.1175/BAMS-D-16-0123.1.

  • Mercer, A. E., C. M. Shafer, C. A. Doswell III, L. M. Leslie, and M. B. Richman, 2012: Synoptic composites of tornadic and nontornadic outbreaks. Mon. Wea. Rev., 140, 2590–2608, https://doi.org/10.1175/MWR-D-12-00029.1.

  • Miller, J., R. Frederick, and R. Tracey, 1973: Precipitation-Frequency Atlas of the Western United States. NOAA Atlas 2, Vol. 3, 35 pp.

  • Pedregosa, F., and Coauthors, 2011: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.

  • Perica, S., and Coauthors, 2011: Precipitation-Frequency Atlas of the United States. NOAA Atlas 14, Vol. 6, 233 pp.

  • Perica, S., and Coauthors, 2013: Precipitation-Frequency Atlas of the United States. NOAA Atlas 14, Vol. 9, 163 pp.

  • Peters, J. M., and R. S. Schumacher, 2014: Objective categorization of heavy-rain-producing MCS synoptic types by rotated principal component analysis. Mon. Wea. Rev., 142, 1716–1737, https://doi.org/10.1175/MWR-D-13-00295.1.

  • Ralph, F. M., P. J. Neiman, G. N. Kiladis, K. Weickmann, and D. W. Reynolds, 2011: A multiscale observational case study of a Pacific atmospheric river exhibiting tropical–extratropical connections and a mesoscale frontal wave. Mon. Wea. Rev., 139, 1169–1189, https://doi.org/10.1175/2010MWR3596.1.

  • Richardson, L. F., 2007: Weather Prediction by Numerical Process. Cambridge University Press, 237 pp.

  • Richman, M. B., 1986: Rotation of principal components. Int. J. Climatol., 6, 293–335, https://doi.org/10.1002/joc.3370060305.

  • Roebber, P. J., 2013: Using evolutionary programming to generate skillful extreme value probabilistic forecasts. Mon. Wea. Rev., 141, 3170–3185, https://doi.org/10.1175/MWR-D-12-00285.1.

  • Rosten, E., and T. Drummond, 2006: Machine learning for high-speed corner detection. Ninth European Conf. on Computer Vision, Graz, Austria, ECCV, 430–443, https://doi.org/10.1007/11744023_34.

  • Rozas-Larraondo, P., I. Inza, and J. A. Lozano, 2014: A method for wind speed forecasting in airports based on nonparametric regression. Wea. Forecasting, 29, 1332–1342, https://doi.org/10.1175/WAF-D-14-00006.1.

  • Rutz, J. J., W. J. Steenburgh, and F. M. Ralph, 2014: Climatological characteristics of atmospheric rivers and their inland penetration over the western United States. Mon. Wea. Rev., 142, 905–921, https://doi.org/10.1175/MWR-D-13-00168.1.

  • Schumacher, R. S., and R. H. Johnson, 2005: Organization and environmental properties of extreme-rain-producing mesoscale convective systems. Mon. Wea. Rev., 133, 961–976, https://doi.org/10.1175/MWR2899.1.

  • Schumacher, R. S., and R. H. Johnson, 2006: Characteristics of U.S. extreme rain events during 1999–2003. Wea. Forecasting, 21, 69–85, https://doi.org/10.1175/WAF900.1.

  • Sloughter, J. M. L., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220, https://doi.org/10.1175/MWR3441.1.

  • Stevenson, S. N., and R. S. Schumacher, 2014: A 10-year survey of extreme rainfall events in the central and eastern United States using gridded multisensor precipitation analyses. Mon. Wea. Rev., 142, 3147–3162, https://doi.org/10.1175/MWR-D-13-00345.1.

  • Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn, 2007: Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics, 8, 25, https://doi.org/10.1186/1471-2105-8-25.

  • Strobl, C., A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis, 2008: Conditional variable importance for random forests. BMC Bioinformatics, 9, 307, https://doi.org/10.1186/1471-2105-9-307.

  • Thompson, D. W., and J. M. Wallace, 1998: The Arctic Oscillation signature in the wintertime geopotential height and temperature fields. Geophys. Res. Lett., 25, 1297–1300, https://doi.org/10.1029/98GL00950.

  • Wang, S.-Y., T.-C. Chen, and S. E. Taylor, 2009: Evaluations of NAM forecasts on midtropospheric perturbation-induced convective storms over the U.S. northern plains. Wea. Forecasting, 24, 1309–1333, https://doi.org/10.1175/2009WAF2222185.1.

  • Wheeler, M. C., and H. H. Hendon, 2004: An all-season real-time multivariate MJO index: Development of an index for monitoring and prediction. Mon. Wea. Rev., 132, 1917–1932, https://doi.org/10.1175/1520-0493(2004)132<1917:AARMMI>2.0.CO;2.

  • Wick, G. A., P. J. Neiman, F. M. Ralph, and T. M. Hamill, 2013: Evaluation of forecasts of the water vapor signature of atmospheric rivers in operational numerical weather prediction models. Wea. Forecasting, 28, 1337–1352, https://doi.org/10.1175/WAF-D-13-00025.1.

  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. International Geophysics Series, Vol. 100, Academic Press, 704 pp.

  • Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, https://doi.org/10.1007/s10994-013-5346-7.

  • Zou, H., T. Hastie, and R. Tibshirani, 2006: Sparse principal component analysis. J. Comput. Graph. Stat., 15, 265–286, https://doi.org/10.1198/106186006X113430.
