A Comparison of AI Weather Prediction and Numerical Weather Prediction Models for 1–7-Day Precipitation Forecasts

Jacob T. Radford (a, b), Imme Ebert-Uphoff (a), and Jebb Q. Stewart (b)

a Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, CO, USA
b NOAA Global Systems Laboratory, Boulder, CO, USA

Abstract

Pure AI-based weather prediction (AIWP) models have made waves within the scientific community and the media, claiming superior performance to numerical weather prediction (NWP) models. However, these models often lack impactful output variables such as precipitation. One exception is Google DeepMind’s GraphCast model, which became the first mainstream AIWP model to predict precipitation, although its precipitation forecasts have received only limited verification. We present an analysis of the ECMWF’s Integrated Forecast System (IFS)-initialized (GRAPIFS) and the NCEP’s Global Forecast System (GFS)-initialized (GRAPGFS) GraphCast precipitation forecasts over the contiguous United States and compare them with forecasts from the GFS and IFS models using 1) grid-based, 2) neighborhood, and 3) object-oriented metrics verified against the ECMWF Reanalysis v5 (ERA5) and the NCEP/EMC Stage IV precipitation analysis datasets. We affirmed that GRAPGFS and GRAPIFS perform better than the GFS and IFS in terms of root mean squared error and stable equitable error in probability space, but the GFS and IFS precipitation distributions more closely align with the ERA5 and Stage IV distributions. Equitable threat score also generally favored GraphCast, particularly for lower accumulation thresholds. Fractions skill score for increasing neighborhood sizes showed greater gains for the GFS and IFS than for GraphCast, suggesting the NWP models may have a better handle on intensity but struggle with location. Object-oriented verification for GraphCast found positive area biases at low accumulation thresholds and large negative biases at high accumulation thresholds. GRAPGFS saw similar performance gains to GRAPIFS when compared to their NWP counterparts, but initializing with the less familiar GFS conditions appeared to lead to an increase in light precipitation.

© 2025 American Meteorological Society. This is an Author Accepted Manuscript distributed under the terms of the default AMS reuse license. For information regarding reuse and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Jacob Radford, jacob.radford@noaa.gov
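
For readers unfamiliar with two of the metrics named in the abstract, the following is a minimal sketch of the equitable threat score (a grid-based metric) and the fractions skill score (a neighborhood metric), using their standard textbook definitions. It is not the authors' verification code. The function names, the square neighborhood, the zero padding at the domain edges, and the toy 9 x 9 fields are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def equitable_threat_score(forecast, observed, threshold):
    """Equitable threat score for exceedance of an accumulation threshold."""
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    # Hits expected by chance, given the forecast and observed event counts.
    hits_random = (hits + false_alarms) * (hits + misses) / f.size
    denom = hits + false_alarms + misses - hits_random
    return float((hits - hits_random) / denom) if denom != 0 else float("nan")


def neighborhood_fractions(binary, size):
    """Fraction of event cells in a size x size window around each grid cell.

    Assumption: cells outside the domain count as non-events (zero padding).
    """
    if size % 2 != 1:
        raise ValueError("size must be odd so the window is centered")
    half = size // 2
    padded = np.pad(np.asarray(binary, dtype=float), half, mode="constant")
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


def fractions_skill_score(forecast, observed, threshold, size):
    """Fractions skill score for one threshold and one neighborhood size."""
    pf = neighborhood_fractions(np.asarray(forecast) >= threshold, size)
    po = neighborhood_fractions(np.asarray(observed) >= threshold, size)
    mse = np.mean((pf - po) ** 2)
    # Largest possible MSE for these fraction fields (no overlap at all).
    reference = np.mean(pf ** 2) + np.mean(po ** 2)
    return float(1.0 - mse / reference) if reference > 0 else float("nan")


# Toy example (hypothetical data): one wet cell, forecast displaced by one cell.
observed = np.zeros((9, 9))
observed[4, 4] = 10.0
forecast = np.zeros((9, 9))
forecast[4, 5] = 10.0

print("ETS:", equitable_threat_score(forecast, observed, threshold=5.0))
for n in (1, 3, 5):
    print(f"FSS (n={n}):", fractions_skill_score(forecast, observed, 5.0, n))
```

In this toy case the forecast has the right intensity but the wrong location, so the equitable threat score is near zero while the fractions skill score rises as the neighborhood grows. This is the kind of behavior the abstract refers to when it compares gains in fractions skill score across neighborhood sizes.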
