A Comparison of AI Weather Prediction and Numerical Weather Prediction Models for 1–7-Day Precipitation Forecasts

Jacob T. Radford, Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, Colorado, and NOAA/Global Systems Laboratory, Boulder, Colorado (https://orcid.org/0000-0001-6824-8967)

Imme Ebert-Uphoff, Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, Colorado

Jebb Q. Stewart, NOAA/Global Systems Laboratory, Boulder, Colorado

Abstract

Pure artificial intelligence (AI)-based weather prediction (AIWP) models have made waves within the scientific community and the media, with claims of superior performance to numerical weather prediction (NWP) models. However, these models often lack impactful output variables such as precipitation. One exception is Google DeepMind’s GraphCast, which became the first mainstream AIWP model to predict precipitation, but whose precipitation forecasts have received only limited verification. We present an analysis of ECMWF’s Integrated Forecasting System (IFS)-initialized (GRAPIFS) and NCEP’s Global Forecast System (GFS)-initialized (GRAPGFS) GraphCast precipitation forecasts over the contiguous United States and compare them to results from the GFS and IFS models using 1) grid-based, 2) neighborhood, and 3) object-oriented metrics verified against the fifth major global reanalysis produced by ECMWF (ERA5) and the NCEP/Environmental Modeling Center (EMC) stage IV precipitation analysis datasets. We affirmed that GRAPGFS and GRAPIFS outperform the GFS and IFS in terms of root-mean-square error and the stable equitable error in probability space, but the GFS and IFS precipitation distributions more closely align with the ERA5 and stage IV distributions. Equitable threat score also generally favored GraphCast, particularly at lower accumulation thresholds. Fractions skill score improved more with increasing neighborhood size for the GFS and IFS than for GraphCast, suggesting the NWP models may have a better handle on intensity but struggle with placement. Object-based verification of GraphCast found positive area biases at low accumulation thresholds and large negative area biases at high accumulation thresholds. GRAPGFS saw performance gains similar to GRAPIFS when compared with their respective NWP counterparts, but initializing with the less familiar GFS conditions appeared to lead to an increase in forecast light precipitation.
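
To make the grid-based and neighborhood metrics named above concrete, the sketch below shows one common way to compute an equitable threat score and a fractions skill score for a single accumulation threshold. It is a minimal illustration, assuming `fcst` and `obs` are 2D precipitation-accumulation grids (mm) on a common grid; the array names, the 10-mm threshold, and the 9-gridpoint neighborhood are illustrative choices, not the authors’ exact configuration.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def equitable_threat_score(fcst, obs, thresh):
    """Grid-based ETS (Gilbert skill score) for one accumulation threshold."""
    f = fcst >= thresh
    o = obs >= thresh
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    n = f.size
    # Hits expected by chance, given the forecast and observed event counts.
    hits_random = (hits + false_alarms) * (hits + misses) / n
    denom = hits + false_alarms + misses - hits_random
    return (hits - hits_random) / denom if denom != 0 else np.nan

def fractions_skill_score(fcst, obs, thresh, neighborhood):
    """Neighborhood FSS: compare event fractions within square windows."""
    f_frac = uniform_filter((fcst >= thresh).astype(float), size=neighborhood)
    o_frac = uniform_filter((obs >= thresh).astype(float), size=neighborhood)
    mse = np.mean((f_frac - o_frac) ** 2)
    mse_ref = np.mean(f_frac ** 2) + np.mean(o_frac ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

# Example usage (threshold and neighborhood size are illustrative):
# ets = equitable_threat_score(fcst, obs, 10.0)
# fss = fractions_skill_score(fcst, obs, 10.0, 9)
```

As the neighborhood size grows, FSS forgives displacement errors while still penalizing intensity/frequency errors, which is why comparing its rate of improvement across models hints at whether errors are primarily in placement or in amount.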

Significance Statement

Pure artificial intelligence (AI)-based weather prediction (AIWP) has exploded in popularity with promises of better performance and faster run times than numerical weather prediction (NWP) models. However, less attention has been paid to these models’ ability to predict impactful, sensible weather such as precipitation, precipitation type, or specific meteorological features. We seek to address this gap by comparing the precipitation forecast performance of an AI model called GraphCast with that of the Global Forecast System (GFS) and Integrated Forecasting System (IFS) NWP models. While GraphCast does perform better on many verification metrics, it has some limitations for intense precipitation forecasts; in particular, it predicts intense precipitation events less frequently than the GFS or IFS. Overall, this article emphasizes the promise of AIWP while also stressing the need for robust verification by domain experts.

© 2025 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Jacob Radford, jacob.radford@noaa.gov
