Search Results
subpixel scales. To bridge the gap between satellite and ground-based observations, airborne measurements of spectral irradiance are a useful tool for direct model evaluation. In this regard, several airborne campaigns have been conducted, operating instruments that measure the spectral solar radiation reflected by clouds, e.g., Wendisch and Keil (1999), Wendisch et al. (2005, 2007), Jacob et al. (2010), or Smith et al. (2017). In spite of that, the natural variability of clouds might not be covered
atmosphere’s scale-dependent behavior appropriately, shortcomings in the numerics or parameterizations are likely. In the case of kinetic energy, the evaluation of scaling exponents has provided valuable insights into model performance (Skamarock 2004; Hamilton et al. 2008; Bierdel et al. 2012; Fang and Kuo 2015). For water vapor, Schemann et al. (2013) investigated the scaling behavior of a GCM, an NWP model, and a large-eddy simulation (LES) and the implications for cloud parameterizations
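To make the notion of a scaling exponent concrete, the following is a minimal sketch, not taken from any of the cited studies, of how a spectral slope can be estimated from a one-dimensional model field by a log-log least-squares fit to its variance spectrum; the field, grid spacing, and wavenumber band are hypothetical.

```python
# Sketch (not from the cited papers): estimating a spectral scaling exponent
# from a one-dimensional model field, as one might do to compare simulated
# and observed variance spectra. Field and grid spacing are hypothetical.
import numpy as np

def spectral_slope(field, dx, k_min=None, k_max=None):
    """Fit the power-law exponent of the variance spectrum of a 1D field."""
    field = np.asarray(field, dtype=float)
    n = field.size
    # Discrete power spectrum via FFT (drop the mean / zero wavenumber).
    spec = np.abs(np.fft.rfft(field - field.mean()))**2
    k = np.fft.rfftfreq(n, d=dx)
    k, spec = k[1:], spec[1:]
    # Optionally restrict the fit to an inertial-range-like band of wavenumbers.
    mask = np.ones_like(k, dtype=bool)
    if k_min is not None:
        mask &= k >= k_min
    if k_max is not None:
        mask &= k <= k_max
    # Least-squares fit in log-log space: log E(k) ~ slope * log k + const.
    slope, _ = np.polyfit(np.log(k[mask]), np.log(spec[mask]), 1)
    return slope

# Example with synthetic data: white noise has an approximately flat spectrum.
rng = np.random.default_rng(0)
print(spectral_slope(rng.standard_normal(4096), dx=1.0))
```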
-permitting (CP) version of the Met Office (UKMO) model performed better than the global model over East Africa, especially for extreme events. However, they also noted that the CP model often overestimates extreme rainfall compared with observations. Stellingwerf et al. (2021) showed that the ECMWF model performed best when compared to models from other centers. At the subseasonal time scale, de Andrade et al. (2021) evaluated reforecasts from three global forecasting centers over Africa and found that
nominal level of α = 0.05 of the corresponding one-sided tests. The (i, j) entry in the ith row and jth column indicates the fraction of cases in which the null hypothesis of equal predictive performance is rejected by the corresponding one-sided DM test in favor of the model in the ith row when compared with the model in the jth column. The difference between 100% and the sum of the (i, j) and (j, i) entries is the fraction of cases for which the score differences are not significant. Fig. 7
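As an illustration of the procedure behind such a matrix, here is a hedged sketch of a one-sided Diebold-Mariano (DM) test on per-case score differences, using a simple variance estimate without autocorrelation correction; it is not the authors' implementation, and the score arrays are synthetic.

```python
# Illustrative sketch of a one-sided DM test for equal predictive performance.
# Inputs are per-case scores (e.g., CRPS), where lower scores are better.
import numpy as np
from scipy.stats import norm

def dm_test_one_sided(scores_a, scores_b):
    """H0: equal performance; H1: model A has lower (better) scores than model B."""
    d = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    n = d.size
    dm_stat = d.mean() / np.sqrt(d.var(ddof=1) / n)
    # One-sided p-value: small values support model A over model B.
    p_value = norm.cdf(dm_stat)
    return dm_stat, p_value

# Example: reject H0 at alpha = 0.05 if p_value < 0.05.
rng = np.random.default_rng(1)
a, b = rng.gamma(2.0, 1.0, 500), rng.gamma(2.1, 1.0, 500)
stat, p = dm_test_one_sided(a, b)
print(f"DM statistic {stat:.2f}, one-sided p-value {p:.3f}")
```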
analysis are therefore made up of values obtained from different lead times of the forecasts. This is acceptable for our purposes since we are not interested in evaluating model performance at a fixed forecast horizon. We have found that the statistical robustness of our results is considerably enhanced by combining consecutive ensemble runs rather than analyzing a single ensemble only. Fig. 1. For the sensitivity analysis in section 3, we combine four consecutive forecast ensembles, here
distributions on the performance of the SEC is evaluated in section 5e. Figure 1 shows the sampling-error-corrected correlation $\hat{r}_{\mathrm{sec}}$ as a function of the sample correlation $\hat{r}$ for different ensemble sizes and a uniform prior. For example, applying the SEC using a 40-member ensemble, a sample correlation of 0.5 is corrected to approximately 0.42. This study mainly uses the SEC table provided by the Data Assimilation Research Testbed (DART; Anderson et al. 2009) that is based on a uniform prior
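To illustrate how such a table might be applied in practice, the sketch below interpolates a sampling error correction table; the table values are placeholders chosen only so that the example reproduces the 0.5 → 0.42 mapping quoted above, and this is not DART's actual implementation.

```python
# Minimal sketch, not DART's code: applying a precomputed sampling error
# correction (SEC) table by linear interpolation. The table values below are
# placeholders for illustration only; a real table maps sample correlations to
# corrected correlations for a given ensemble size and prior.
import numpy as np

# Hypothetical table for a 40-member ensemble: sample correlation -> corrected value.
sample_corr_grid = np.linspace(-1.0, 1.0, 201)
corrected_grid = 0.84 * sample_corr_grid          # placeholder damping factor

def apply_sec(r_hat, grid=sample_corr_grid, table=corrected_grid):
    """Map a sample correlation to its sampling-error-corrected value."""
    return np.interp(r_hat, grid, table)

print(apply_sec(0.5))  # with a real 40-member table this would be ~0.42 (see text)
```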
GPD was fitted to the subsets of extreme events (i.e., >95th percentile) in the RG and SREs datasets. The extremes in the SREs were obtained in a similar way to the RG extremes. To make the stations and all the rainfall products comparable, we normalized the modeled return values by the RG-modeled return values at the stations and then averaged over all stations for each dataset. The normalized return values of the RG data were taken as the reference for evaluating the SREs. The performance of
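As a rough illustration of this procedure, the sketch below fits a GPD to exceedances above the 95th percentile of a synthetic daily rainfall series and computes a peaks-over-threshold return level; the data, names, and return period are hypothetical, and this is not the study's code.

```python
# Hedged sketch: fit a generalized Pareto distribution (GPD) to exceedances
# above the 95th percentile and compute an m-observation return level.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(2)
rain = rng.gamma(shape=0.8, scale=6.0, size=20 * 365)   # synthetic daily rainfall

threshold = np.percentile(rain, 95)
excesses = rain[rain > threshold] - threshold

# Fit the GPD to the excesses; loc is fixed at 0 because excesses start at the threshold.
shape, _, scale = genpareto.fit(excesses, floc=0)

def return_level(m, u, xi, sigma, zeta_u):
    """Level exceeded on average once every m observations (POT formulation)."""
    if abs(xi) < 1e-6:                       # near-zero shape: exponential limit
        return u + sigma * np.log(m * zeta_u)
    return u + (sigma / xi) * ((m * zeta_u) ** xi - 1.0)

zeta = excesses.size / rain.size             # empirical exceedance probability
print(return_level(10 * 365, threshold, shape, scale, zeta))  # ~10-yr return level
```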
predictive performance; we thus focus on maximum likelihood-based methods instead. 7 To account for the intertwined choice of scoring rules for model estimation and evaluation (Gebetsberger et al. 2017), we have also evaluated the models using LogS. However, as the results are very similar to those reported here and computation of LogS for the raw ensemble and QRF forecasts is problematic (Krüger et al. 2016), we focus on CRPS-based evaluation. 8 For example, see https://colab.research.google.com/.
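For reference, the CRPS of a raw ensemble can be computed from the standard kernel (energy-form) representation, as in the generic sketch below; this is not the implementation used in the study, and the ensemble values are made up.

```python
# Generic sketch of the CRPS for an ensemble forecast using the kernel
# representation: CRPS = E|X - y| - 0.5 * E|X - X'|.
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an empirical ensemble forecast against a scalar observation."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

print(crps_ensemble([8.2, 9.1, 10.4, 7.8, 9.6], obs=9.0))  # hypothetical gust values
```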
statistical postprocessing methods, whose predictive performance is evaluated in section 4. A meteorological interpretation of what the models have learned is presented in section 5. Section 6 concludes with a discussion. R code (R Core Team 2021) with implementations of all methods is available online (https://github.com/benediktschulz/paper_pp_wind_gusts).
2. Data and notation
a. Forecast and observation data
Our study is based on the same dataset as Pantillon et al. (2018) and we
performance of current operational systems with respect to tropical rainfall calls for alternative approaches, ranging from convection-permitting resolution (Pante and Knippertz 2019) to methods from statistics and machine learning (Shi et al. 2015; Rasp et al. 2020; Vogel et al. 2021). Before developing and evaluating new models and approaches, it is essential to establish benchmark forecasts in order to systematically assess forecast improvement. Rasp et al. (2020) recently proposed