Search Results
atmosphere’s scale-dependence behavior appropriately, shortcomings in the numerics or parameterizations are likely. In the case of kinetic energy, the evaluation of scaling exponents has provided valuable insights into model performance (Skamarock 2004; Hamilton et al. 2008; Bierdel et al. 2012; Fang and Kuo 2015). For water vapor, Schemann et al. (2013) investigated the scaling behaviors of a GCM, an NWP model, and a large-eddy simulation (LES) and the implications for cloud parameterizations
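The scaling exponents referred to here are typically estimated as the log-log slope of a power spectrum over a chosen wavenumber range. Purely as an illustration of that idea (not the procedure of any of the cited studies; the function and the synthetic signal are assumptions), a minimal Python sketch for a 1-D field:

```python
import numpy as np

def spectral_slope(field, dx=1.0):
    """Estimate a scaling exponent as the log-log slope of the
    power spectrum of a 1-D field sampled at spacing dx."""
    spec = np.abs(np.fft.rfft(field - field.mean()))**2
    k = np.fft.rfftfreq(field.size, d=dx)
    mask = k > 0                      # exclude the mean (k = 0)
    slope, _ = np.polyfit(np.log(k[mask]), np.log(spec[mask]), 1)
    return slope

# hypothetical usage on a synthetic red-noise signal
print(spectral_slope(np.cumsum(np.random.randn(4096))))
```

In practice the fit would be restricted to the scale range of interest (e.g., the mesoscale), and the resulting slope compared against the observed spectral behavior.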
-permitting (CP) version of the Met Office (UKMO) model performed better than the global model over East Africa, especially for extreme events. However, they also noted that the CP model often overestimates extreme rainfall compared to observations. Stellingwerf et al. (2021) showed that the ECMWF model performed best when compared with models from other centers. At the subseasonal time scale, de Andrade et al. (2021) evaluated reforecasts from three global forecasting centers over Africa and found that
analysis are therefore made up of values obtained from different lead times of the forecasts. This is acceptable to us since we are not interested in evaluating model performance at a fixed forecast horizon. We have found that the statistical robustness of our results is considerably enhanced when combining consecutive ensemble runs compared to an analysis based on a single ensemble only. Fig. 1. For the sensitivity analysis in section 3, we combine four consecutive forecast ensembles, here
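As a schematic illustration of this pooling (array names, shapes, and the stand-in data are hypothetical, not taken from the study), consecutive ensemble runs valid at the same time can simply be stacked along the member dimension:

```python
import numpy as np

# Stand-in for four consecutive forecast ensembles, all valid at the
# same time but initialized at successive start dates (hence different
# lead times); shape: (members, lat, lon).
rng = np.random.default_rng(0)
runs = [rng.normal(size=(40, 10, 10)) for _ in range(4)]

# Pooling along the member axis enlarges the sample from 40 to 160
# values per grid point for the sensitivity analysis.
combined = np.concatenate(runs, axis=0)
print(combined.shape)  # (160, 10, 10)
```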
distributions on the performance of the SEC is evaluated in section 5e. Figure 1 shows the sampling-error-corrected correlation $\hat{r}_{\mathrm{sec}}$ as a function of the sample correlation $\hat{r}$ for different ensemble sizes and a uniform prior. For example, applying the SEC using a 40-member ensemble, a sample correlation of 0.5 is corrected to approximately 0.42. This study mainly uses the SEC table provided by the Data Assimilation Research Testbed (DART; Anderson et al. 2009) that is based on a uniform prior
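The DART table itself is precomputed offline; purely to illustrate the underlying idea (a posterior-mean correction of the sample correlation under a uniform prior on the true correlation), a crude Monte Carlo sketch in Python might look as follows. The ensemble size, bin width, and sample counts are arbitrary assumptions, and the result only roughly approximates the published table:

```python
import numpy as np

def sec_curve(n_ens, n_rep=500, seed=0):
    """Crude Monte Carlo estimate of a sampling-error-correction curve:
    draw n_ens-member samples from bivariate normals whose true
    correlation is (approximately) uniform in (-1, 1), then average the
    true correlation within bins of the sample correlation."""
    rng = np.random.default_rng(seed)
    true_r = np.linspace(-0.99, 0.99, 199)          # uniform prior support
    r_hat, r_true = [], []
    for r in true_r:
        cov = [[1.0, r], [r, 1.0]]
        draws = rng.multivariate_normal([0.0, 0.0], cov, size=(n_rep, n_ens))
        for sample in draws:
            r_hat.append(np.corrcoef(sample[:, 0], sample[:, 1])[0, 1])
            r_true.append(r)
    r_hat, r_true = np.asarray(r_hat), np.asarray(r_true)
    bins = np.linspace(-1.0, 1.0, 101)
    idx = np.digitize(r_hat, bins) - 1
    centers = 0.5 * (bins[:-1] + bins[1:])
    corrected = np.array([r_true[idx == i].mean() if np.any(idx == i) else np.nan
                          for i in range(centers.size)])
    return centers, corrected

centers, corrected = sec_curve(n_ens=40)
# the corrected value for a sample correlation near 0.5 is shrunk toward zero
print(corrected[np.argmin(np.abs(centers - 0.5))])
```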
GPD was fitted to the subsets of extreme events (i.e., >95th percentile) in the RG and SREs datasets. The extremes in the SREs were obtained in a similar way to the RG extremes. To make the stations and all the rainfall products comparable, we normalized the modeled return values by the RG-modeled return values at the stations, then averaged over all stations for each dataset. The normalized return values of the RG data were taken as the reference for evaluating the SREs. The performance of
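A minimal sketch of the peaks-over-threshold step described here, using scipy's genpareto; the threshold choice, return period, series names, and normalization are illustrative assumptions rather than the study's exact code:

```python
import numpy as np
from scipy.stats import genpareto

def return_level(daily_rain, years, T=10.0, q=0.95):
    """Fit a GPD to exceedances above the q-th percentile and return
    the T-year return level (peaks-over-threshold)."""
    u = np.quantile(daily_rain, q)
    exc = daily_rain[daily_rain > u] - u
    xi, _, sigma = genpareto.fit(exc, floc=0.0)   # shape, (fixed) loc, scale
    lam = exc.size / years                        # mean exceedances per year
    if abs(xi) < 1e-6:                            # xi -> 0 limit
        return u + sigma * np.log(lam * T)
    return u + (sigma / xi) * ((lam * T) ** xi - 1.0)

# hypothetical usage: normalize a satellite estimate by the rain-gauge value
# rl_sre = return_level(sre_series, years=20)
# rl_rg  = return_level(rg_series, years=20)
# normalized_rl = rl_sre / rl_rg
```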
predictive performance; we thus focus on maximum likelihood-based methods instead. 7 To account for the intertwined choice of scoring rules for model estimation and evaluation (Gebetsberger et al. 2017), we have also evaluated the models using LogS. However, as the results are very similar to those reported here and computation of LogS for the raw ensemble and QRF forecasts is problematic (Krüger et al. 2016), we focus on CRPS-based evaluation. 8 For example, see https://colab.research.google.com/ .
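For reference, the CRPS of a raw ensemble against a scalar observation can be computed directly from its empirical form. A minimal Python sketch (the study's own implementations are in R; the function and variable names here are illustrative):

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Empirical CRPS of an ensemble forecast `ens` (1-D array) for a
    scalar observation: E|X - y| - 0.5 * E|X - X'|."""
    ens = np.asarray(ens, dtype=float)
    term_obs = np.mean(np.abs(ens - obs))
    term_spread = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term_obs - term_spread

# hypothetical usage: mean CRPS over all forecast cases
# mean_crps = np.mean([crps_ensemble(e, y) for e, y in zip(ens_forecasts, obs_values)])
```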
statistical postprocessing methods, whose predictive performance is evaluated in section 4. A meteorological interpretation of what the models have learned is presented in section 5. Section 6 concludes with a discussion. R code (R Core Team 2021) with implementations of all methods is available online (https://github.com/benediktschulz/paper_pp_wind_gusts).

2. Data and notation

a. Forecast and observation data

Our study is based on the same dataset as Pantillon et al. (2018) and we
performance of current operational systems with respect to tropical rainfall calls for alternative approaches, ranging from convection-permitting resolution (Pante and Knippertz 2019) to methods from statistics and machine learning (Shi et al. 2015; Rasp et al. 2020; Vogel et al. 2021). Before developing and evaluating new models and approaches, it is essential to establish benchmark forecasts in order to systematically assess forecast improvement. Rasp et al. (2020) recently proposed
). Conventional observations such as surface stations and weather balloons are scarce at low latitudes, particularly over the vast tropical oceans. Consequently, the observing system is dominated by satellite data, which are heavily skewed toward measuring atmospheric mass variables rather than wind (e.g., Baker et al. 2014). However, data denial experiments for periods with a much enhanced radiosonde network during field campaigns over West Africa have shown a relatively small impact on model performance
–STV relationship over the Asian–Pacific–American region is still unclear. In addition, phase 6 of the Coupled Model Intercomparison Project (CMIP6; Eyring et al. 2016) has recently been released. Whether the models of the new version can produce a more realistic ENSO–STV simulation than the previous generation (CMIP5) also needs to be evaluated. In this study, we first aim to examine the relationship between ENSO and STV over the Asian–Pacific–American region with CMIP5/6 models in a historical simulation and