1. Introduction
Ensemble forecasting techniques have been developed over recent decades and are now widely used across the Earth sciences, for example in weather, ocean, climate, hydrological, and space weather forecasting (Kalnay 2019; Murray 2018; Palmer 2019; Troin et al. 2021). In these disciplines, stochastic differential equations with many degrees of freedom are neither derived nor solved. Instead, a large number of forecasts (an ensemble forecast) is produced by integrating physically based equations from slightly different initial conditions and with stochastic parameters to estimate probabilistic future states (Kalnay 2019; Palmer 2019). The probabilistic verification of continuous ensemble forecasts is therefore an essential task supporting various forecast activities.
Numerous methods and scores have been proposed for the probabilistic verification of ensemble forecasts. Among them, Roulston and Smith (2002) have proposed an information-based, probabilistic score called the ignorance score (IS) or logarithmic score. The IS measures the probabilistic forecast skill using a variant of the Kullback–Leibler (KL) divergence or relative entropy, which is a fundamental quantity in probability theory and information theory (Benedetti 2010; Roulston and Smith 2002; Weijs et al. 2010).
The IS and the information gain (IG) can also be defined for continuous probability forecasts (Roulston and Smith 2002; Peirolo 2011). Extending the IS from categorical to continuous forecasts may seem straightforward, but caution is required. Specifically, the computation of the continuous version of the IS requires an additional bin-size term to avoid divergence to infinity as the bin size approaches zero (Peirolo 2011).
Although wider use of information-based scores (e.g., the IS) has been recommended because of their advantages (Casati et al. 2022), the IS is used less in weather and climate forecasting than other standard scores such as the continuous ranked probability score (CRPS) and its normalized counterpart, the continuous ranked probability skill score (CRPSS; Hersbach 2000). One of the reasons may be that, as seen in Eq. (1), the IS diverges to infinity when the forecast misses the occurrence of the event (0% forecast probability for the realized category). The IS can be computed for continuous ensemble forecasts with the rank histogram method of Roulston and Smith (2002) or the binning method of Peirolo (2011), but it diverges to infinity if the verification lies outside the ensemble forecast range or, in the case of the binning method, if no ensemble member falls in the bin of the outcome. Some studies (Bröcker and Smith 2008; Peirolo 2011) have avoided this problem by blending a small portion of the climatological forecast probability into the probability of the ensemble forecasts, but this treatment may be somewhat subjective. In addition, Tödter and Ahrens (2012) proposed the continuous ranked ignorance score, which modifies the IS by using a cumulative distribution function and avoids this problem in a manner similar to the CRPS. However, a score based on a cumulative distribution function is not local, and it can be affected by the probability in the tails of the distribution in the same way that the CRPS is. Our proposed method deals with this divergence problem in a different way.
To compute the IS and IG of continuous ensemble forecasts, one needs to estimate the local probability density at the verification (the observed value) from continuous ensemble data. In general, there are several ways to estimate the local probability density from samples of continuous data, including binning (Scott 1979), kernel plug-in estimators (Wand and Jones 1995), and k-nearest-neighbor (kNN) methods (Loftsgaarden and Quesenberry 1965). For verification purposes, a method should be as simple and consistent as possible. The binning method may be the simplest and most straightforward, but it suffers from the divergence problem mentioned above. Kernel plug-in methods are more complicated and require choices of a kernel function and parameters, and the computed score depends on these choices. Probably for these reasons, these methods have not been widely used for verification. This study considers another relatively simple and objective method, fixed-radius near-neighbors (NN) estimation, a variant of kNN estimation, for ensemble verification.
In weather and climate forecasting, there is a need for probabilistic verification of vector data. However, the probabilistic verification of vector variables of ensemble forecasts has been studied less than that of scalar variables. Some scores have been proposed for multivariate verification, such as the energy score (Gneiting et al. 2008), a multivariate extension of the CRPS; the Dawid–Sebastiani score (Dawid and Sebastiani 1999; Gneiting and Raftery 2007; Wilks 2020); and the variogram score (Scheuerer and Hamill 2015). There are also other approaches to multivariate verification, for example, the minimum spanning tree (Gombos et al. 2007; Smith and Hansen 2004) and the circular CRPS (Grimit et al. 2006). These scores and methods have limitations, such as complex computational procedures and/or a limited ability to measure the accuracy of ensemble forecasts. In this paper, we propose a new, relatively simple approach for measuring the accuracy of two-dimensional vector ensemble forecasts.
For demonstration, this paper presents an example of the application of the proposed scores to two-dimensional data: the Madden–Julian oscillation (MJO) index. The MJO index represents the tropical variability of the MJO using two indices (called RMM1 and RMM2, where RMM stands for real-time multivariate MJO) based on the two leading modes of a principal component analysis (Wheeler and Hendon 2004). Although numerous efforts have been made to establish a standard verification method for the MJO forecast (Gottschalck et al. 2010), its probabilistic verification has been elusive because of the two-dimensionality of the MJO index. For deterministic verification of the MJO index, a bivariate correlation score has often been used (Gottschalck et al. 2010), but it is unclear how the ensemble mean forecasts are made and verified (Matsueda and Endo 2011). More recently, Marshall et al. (2016) proposed an approach for verifying probabilistic forecasts of the MJO: they computed CRPSs for the RMM1 and RMM2 indices separately and summed them to obtain an aggregated score, and they also applied a ranked probability score to the MJO phase and the CRPS to the MJO amplitude. This study takes an alternative approach, using a single information-based score for the probabilistic verification of the two-dimensional ensemble forecast data.
This paper is structured as follows. Section 2 describes the data we used and our approach to computing the IS and IG from continuous ensemble forecast data. Section 3 examines the basic characteristics of the proposed scores using idealized Monte Carlo simulations and presents a practical example of MJO verification using the proposed scores. Section 4 summarizes the conclusions of this study.
2. Data and methods
a. Data
For the demonstration, we used the MJO indices of hindcasts of the European Centre for Medium-Range Weather Forecasts (ECMWF) subseasonal forecast system (version Cy47R3). We analyzed the 46-day ECMWF ensemble hindcasts with 11 members for forecast periods during extended winters (initial dates during 1 November–28 February) from 2001/02 to 2020/21. These MJO index data were obtained from the Subseasonal-to-Seasonal Prediction Project (S2S) data archive (Vitart et al. 2017). The MJO indices were computed following the procedure used by Gottschalck et al. (2010), except that we used forecast anomalies relative to the climatology of the model. The MJO indices of the hindcasts were verified against those of the ERA5 reanalysis (Hersbach et al. 2020), which was also obtained from the S2S data archive.
b. Nonparametric estimation of the ignorance score
In this study, we consider fixed-radius near-neighbors estimation of the IS, in which the search radius, rather than the number of neighbors k, is held fixed. The reason is that fixing k allows the radius to change for different lead times or models, which makes it difficult to compare the prediction skills of models or lead times whose forecast dispersion (and thus the radius of the kNN n-dimensional sphere) may differ. It is important to note that kNN estimation based on the Kozachenko and Leonenko entropy estimator may be applied suitably to low-dimensional data, but it is not well suited for high-dimensional data because of the degradation of the accuracy of the kNN estimator (the so-called “curse of dimensionality”; Pestov 2000). Based on this consideration and the relatively small sample size of ensemble forecasts, we recommend that our method be used for low-dimensional data (n ≤ 2) unless a sufficiently large number of ensemble forecasts is available. We also note that the accuracy of the estimation can be improved by using methods more sophisticated than the kNN entropy estimator (e.g., Lombardi and Pant 2016; Pérez-Cruz 2008) or by density-ratio estimators (e.g., Sugiyama et al. 2008). However, because these approaches complicate the computation of the scores and introduce additional arbitrary parameters, they would likely impede simple and objective verification. For these reasons, these possible improvements are beyond the scope of this study.
When we compute ISNN, we first need to determine the fixed radius R of the n-dimensional sphere centered at the verification (observation) x. We then count the number of ensemble forecasts kf that fall within this search domain of fixed radius (Fig. 1b; Hino and Murata 2013). Using kf, the ensemble size Nf, and the volume of the ball, the score can be computed by Eq. (4). It is noteworthy that ISNN is a nonparametric score because it does not assume a particular probability distribution.
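To make the counting procedure concrete, the following minimal Python sketch illustrates the fixed-radius NN estimate described above. It assumes the standard near-neighbors density estimate p̂ = kf/(Nf VR), where VR is the volume of the n-dimensional ball of radius R, and a base-2 logarithm; Eq. (4) itself is not reproduced here, and the kf = 0 case (discussed below) is simply flagged. The function names are illustrative.

```python
import numpy as np
from scipy.special import gamma

def ball_volume(n, R):
    """Volume of an n-dimensional ball of radius R."""
    return np.pi ** (n / 2) / gamma(n / 2 + 1) * R ** n

def is_nn(forecasts, obs, R):
    """Fixed-radius near-neighbors estimate of the ignorance score for one case.

    forecasts : (Nf, n) array of ensemble members (1-D input is treated as n = 1)
    obs       : length-n verification vector (or scalar for n = 1)
    R         : fixed search radius
    """
    x = np.asarray(forecasts, dtype=float)
    x = x.reshape(len(x), -1)                     # ensure shape (Nf, n)
    Nf, n = x.shape
    # count the ensemble members kf inside the n-dimensional ball centered at obs
    kf = int(np.sum(np.linalg.norm(x - np.asarray(obs, dtype=float), axis=1) <= R))
    if kf == 0:
        return np.inf                             # requires the special treatment described in the text
    p_hat = kf / (Nf * ball_volume(n, R))         # near-neighbors density estimate
    return -np.log2(p_hat)                        # ignorance in bits (base-2 log assumed)
```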
Fig. 1. A schematic diagram of the fixed-radius NN approach. (a) Forecasts and observations of the MJO indices RMM1 and RMM2 for a forecast case from an initial date of 22 Nov 2014. Ensemble forecasts of day 1, 5, 10, 15, and 20 (colored circles) and corresponding observations (black circles) are plotted. The color shading indicates the probability density covering 90% of the historical observations during extended winters (1 Nov–31 Mar) from 1980/81 to 2020/21 by kernel density estimation. The search circles of radius R = 0.5 and 1 are shown only for day 10. MJO phases (P1–P8) are denoted in the diagram. (b) An enlargement of (a) around the circle domain of day 10. The historical observations are plotted with light blue circles. Please refer to the text for the procedure for counting using the fixed-radius NN approach.
A special treatment is required when there is no ensemble member within the ball of fixed radius R, i.e., the number of forecasts kf equals zero. In such a case, ISNN diverges to infinity because of the logarithmic function in the first term of Eq. (4). To avoid this problem when kf equals zero, we compute the score as follows. We search a
We note that because the probability density is not necessarily homogeneous within the domain of radius R (Fig. 1b), this estimate is susceptible to error caused by the inhomogeneity of the data samples. However, our objective here is not to assess the local probability density accurately but to introduce an information-theoretic concept into probabilistic verification. We thus consider the ISNN computed by our procedure to be valid for the purpose of verification.
For verifying forecast sets with multiple forecast cases (i = 1, 2, …, N) and multiple grid points (j = 1, 2, …, M), ISNN is averaged over all N × M cases, i.e.,
$$\overline{\mathrm{IS}}_{\mathrm{NN}} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \mathrm{IS}_{\mathrm{NN}}(\mathbf{x}_{i,j}).$$
c. Nonparametric estimation of the ignorance gain
In our approach, there is one arbitrary parameter, R, to be configured. A larger R gives a smaller variance of IGNN but a lower sensitivity to the skill, and vice versa. The range of kf(xi,j)/Nf and the percentage of totally missed cases (no ensemble forecast in the domain) may be used to set an appropriate value of R. Although the score can be computed for a small radius R, our rule of thumb is to choose R so that the percentage of totally missed cases is less than about 10%–15% for the least skillful case to be verified. If the chosen radius is too small, the radius can no longer be considered fixed, and the IS may be heavily penalized by missed cases. For normalized one-dimensional (two-dimensional) data, we suggest R = 0.5 (1.0) for 10 members, R = 0.3 (0.8) for 20 members, and R = 0.2 (0.65) for 30 members, based on the results of idealized Monte Carlo simulations (section 3). The choice of R may also depend on the number of historical observations if that number is relatively small and a smooth climatological probability density function is difficult to obtain. One can assign several values to R to verify forecasts at different lead times if necessary. Setting multiple values of R corresponds to the current verification practice of choosing a range of categories based on the forecast skill: a narrower range of categories (e.g., deciles or quintiles) is used to verify relatively high-skill forecasts (e.g., shorter-range forecasts up to two weeks), and a wider range of categories (e.g., terciles) is used to verify relatively low-skill forecasts (e.g., subseasonal and longer forecasts).
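As an illustration only, the sketch below counts the ensemble members (kf of Nf) and the historical climatological observations (kc of Nc) within the same ball and combines them under the assumption that IGNN can be written as log2[(kf/Nf)/(kc/Nc)], i.e., the ignorance of the climatological reference minus that of the ensemble forecast, so that the ball volume cancels; the paper's exact equations are not reproduced here.

```python
import numpy as np

def ig_nn(forecasts, climatology, obs, R):
    """Sketch of the fixed-radius NN information gain relative to climatology,
    assuming IG_NN = log2[(kf/Nf) / (kc/Nc)] so that the ball volume cancels."""
    def count_in_ball(samples):
        s = np.asarray(samples, dtype=float)
        s = s.reshape(len(s), -1)                 # shape (N, n)
        inside = np.linalg.norm(s - np.asarray(obs, dtype=float), axis=1) <= R
        return int(np.sum(inside)), len(s)

    kf, Nf = count_in_ball(forecasts)             # ensemble members within radius R of obs
    kc, Nc = count_in_ball(climatology)           # historical observations within the same ball
    if kf == 0 or kc == 0:
        return np.nan                             # zero counts need the special treatments discussed in the text
    return np.log2((kf / Nf) / (kc / Nc))
```

With this convention, IGNN > 0 whenever kf/Nf exceeds kc/Nc, consistent with the case ratios discussed in section 3b, and the rule of thumb above can be checked by monitoring how often kf = 0 for a candidate R.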
3. Results
a. Basic characteristics of the IS and IG
Before discussing real applications of the proposed scores, we illustrate the basic characteristics of the ISNN and IGSNN with idealized Monte Carlo simulations of one- and two-dimensional data. For ease of understanding the new scores, we first compare the ISNN and IGSNN with the Pearson correlation coefficient of a single-member forecast, which is one of the most basic verification scores. We also examine how the proposed scores evaluate the performance of ensemble forecasts in terms of mean and dispersion biases.
We conducted an idealized Monte Carlo simulation, which is a versatile method for investigating the characteristics of forecast scores (Kumar 2009; Vitart and Takaya 2021). We used the approach of Vitart and Takaya (2021) and conducted Monte Carlo simulations by assuming that both the verification and ensemble forecast data followed a Gaussian distribution. Here we briefly describe the simulation method.
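For readers who wish to reproduce the qualitative behavior discussed below, the following is a minimal sketch of a Gaussian signal-plus-noise Monte Carlo generator of this kind; the particular variance partitioning (Var(s) = r1), the bias parameter β, and the dispersion parameter α are our assumptions about how such an experiment can be configured rather than the exact setup of Vitart and Takaya (2021).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def simulate_cases(r1, m, alpha=1.0, beta=0.0, n_cases=10_000):
    """Signal-plus-noise Monte Carlo generator for verification/ensemble pairs.

    obs      = s + eps_obs
    member_j = beta + s + alpha * eps_j
    with Var(s) = r1 and Var(eps) = 1 - r1, so that for alpha = 1 and beta = 0
    a single member correlates with the observation at about r1 (assumed setup).
    """
    sigma_s = np.sqrt(r1)
    sigma_n = np.sqrt(1.0 - r1)
    s = rng.normal(0.0, sigma_s, size=n_cases)                 # predictable signal
    obs = s + rng.normal(0.0, sigma_n, size=n_cases)           # verification
    eps = rng.normal(0.0, sigma_n, size=(n_cases, m))          # member noise
    ens = beta + s[:, None] + alpha * eps                      # ensemble forecasts
    return obs, ens
```

Averaging the scores defined in section 2 over such simulated cases should reproduce the qualitative dependence on r1, ensemble size, β, and α examined below.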
We first examine the correspondence of ISNN and IGSNN to the single-member correlation score (r1). We computed ISNN, IGSNN, and the Pearson correlation coefficients of the ensemble mean forecasts (rm, where m is the ensemble size) under the perfect model assumption (α = 1 and β = 0). Figure 2 shows ISNN, IGSNN, and rm as a function of r1. It is apparent that both ISNN and IGSNN are better for higher r1, which corresponds to a larger signal-to-noise ratio (Kumar 2009). The fact that ISNN and IGSNN also improve as the ensemble size increases is consistent with other scores investigated in previous studies (Leutbecher 2019; Siegert et al. 2019, and references therein). The added value of ensemble forecasting captured by the information-based scores is particularly apparent in the relatively low range of r1 (r1 < 0.6). A previous study has pointed out that the dependence on ensemble size can be adjusted and has proposed an ensemble-adjusted IS (Siegert et al. 2019). It is also noteworthy that both ISNN and IGSNN remain more sensitive than rm to ensemble size even when the correlation is relatively high (e.g., r1 > 0.8) and rm becomes saturated. We emphasize that the proposed scores yield seemingly reasonable values even for low-skill situations and small ensemble sizes (r1 < 0.1, m = 10), where the average ratio of totally missed cases can be approximately 10%. The implication is that the treatment of totally missed cases described in section 2b works reasonably well over a wide range of forecast skill.
Fig. 2. ISNN, IGSNN, and the correlation coefficient scores of the ensemble mean forecasts as a function of the single-member correlation score (r1). Plots are shown for a radius R = 0.5. Solid lines and closed circles indicate (a) ISNN and (b) IGSNN, and crosses indicate correlation coefficients of ensemble mean forecasts for reference. Colors indicate results for various ensemble sizes. The correlation coefficients of ensemble mean forecasts are averages of correlations calculated from 10 000 iterations with Fisher's z transformation to avoid biases of average correlations.
We next examine whether the scores behave as proper scoring rules (Bröcker and Smith 2007). To investigate the sensitivity to the mean bias error, we computed ISNN while varying the bias parameter (β) (Fig. 3). Here, the single-member correlation score r1 was set to 0.4. As expected, ISNN is best (smallest) when β = 0 for various ensemble sizes; that is, ISNN is best when the probability distributions of the forecasts and verification coincide. This result suggests that ISNN is proper with respect to the mean bias error. We note that these basic characteristics did not change for different correlation coefficients (r1).
Fig. 3. The dependency of ISNN on the mean bias error (β). Colors indicate results for various ensemble sizes.
We next examine the sensitivity of the ISNN to the forecast dispersion. This sensitivity is assessed by computing ISNN as a function of the forecast dispersion parameter (α) (Fig. 4). A value of α greater (smaller) than 1 corresponds to an overdispersive (underdispersive) forecast. For comparison, we also calculated another theoretically proper score for continuous probabilistic forecasts, the continuous ranked probability score (CRPS), computed with Hersbach's algorithm (Hersbach 2000). For a proper score, the best value must be obtained when the dispersion of the noise component of the ensemble forecast coincides with that of the verification (in this case, α = 1).
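For reference, the total CRPS of the ensemble's empirical distribution can also be written in a compact kernel form, sketched below; for the piecewise-constant ensemble CDF this gives the same total value as Hersbach's algorithm, but it is the standard (not the bias-corrected "fair") version and does not provide the reliability-resolution decomposition.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of the empirical ensemble distribution in kernel form:
    mean |x_i - obs| - 0.5 * mean |x_i - x_j| (standard normalization)."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))                          # distance to the observation
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))    # ensemble spread term
    return term1 - term2
```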
Fig. 4. The dependency of (a) ISNN, (b) the continuous ranked probability score (CRPS), and (c) the correlation coefficient of ensemble mean forecasts on the forecast dispersion parameter (α). Colors indicate results for various ensemble sizes. In (a), the radius R is 0.5 except for the purple data, for which R is 0.2. In (c), correlation coefficients of the 10 000 iterations were averaged using Fisher's z transformation.
We found that for small ensembles the scores can be biased and hence somewhat misleading (Fig. 4a). For example, in the 25-member case, underdispersive forecasts (α ∼ 0.9) can yield better scores than the perfect forecast (α = 1). This problem is also apparent for the CRPS (Fig. 4b), a result consistent with the finding of Ferro (2014), who noted that the CRPS favors ensembles sampled from underdispersed distributions. Overdispersive forecasts (α > 1) are correctly given low scores by both the ISNN and the CRPS. With ensemble sizes greater than 50, both the ISNN and the CRPS become more nearly proper (the α of the best score is closer to 1). Although the ISNN of a 200-member ensemble with R = 0.5 is still slightly biased, this bias can be reduced by setting R = 0.2 (Fig. 4a); the larger ensemble size makes the smaller radius possible while keeping the percentage of totally missed cases small. Because the ISNN is more sensitive to the underdispersion error than the CRPS, the ISNN may be more helpful than the CRPS for detecting the dispersion error in very underdispersive forecasts (α < 0.8). The Monte Carlo simulations also revealed that the correlation coefficient of ensemble mean forecasts, which is widely used as a deterministic verification score, is not a proper score with respect to forecast dispersion (Fig. 4c). In particular, with smaller ensemble sizes, an underdispersive forecast can yield a higher (better) correlation score.
These results have important implications for current verification practice, because the score characteristics associated with propriety have usually been considered theoretically, without any constraint on ensemble size (i.e., for an infinite ensemble; Roulston and Smith 2002; Bröcker and Smith 2007). Although Ferro (2014) pointed out the improper behavior of the Brier score and the CRPS with respect to dispersion (total variance) for a small ensemble and proposed adjusted scores, this improper behavior of the original scores has not been widely recognized. The results also imply that current subseasonal-to-seasonal hindcast configurations with relatively small ensembles (m < 25) require attention when underdispersion is to be detected with theoretically proper probabilistic scores (whether local or nonlocal). We have not yet found a way to avoid the improper behavior of ISNN with respect to forecast dispersion for small ensemble forecasts. This aspect merits further study.
Figure 5 displays ISNN, IGSNN, and the bivariate correlation coefficient (Gottschalck et al. 2010) with respect to the single-member correlations r1 and r2. We found that ISNN and IGSNN are low and high, respectively, only if both r1 and r2 are high. This characteristic reflects the fact that the number of forecast members in the near-neighbors domain decreases sharply if either of the correlations (r1 or r2) decreases. By contrast, the bivariate correlation coefficient is less sensitive than ISNN and IGSNN to a low value of r1 or r2; in other words, the bivariate correlation coefficient can be relatively high if either r1 or r2 is high. We note that the model we used assumes that the variances are the same in both dimensions. If the variability differs between the dimensions, the dependency on the single-member correlation coefficient in each dimension may differ from the results presented here: the single-member correlation coefficient of the dimension with the larger variance would have a greater influence on ISNN and IGSNN. This situation may occur, for instance, in unnormalized horizontal wind vectors. A more detailed analysis may be presented in future studies that deal with the verification of such variables.
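For completeness, the bivariate correlation used as the reference in Fig. 5 is commonly computed from the two components as in the sketch below (following the standard definition used in MJO verification, e.g., Gottschalck et al. 2010); this is an illustrative implementation, not necessarily the exact code used for Fig. 5.

```python
import numpy as np

def bivariate_correlation(f1, f2, o1, o2):
    """Bivariate correlation between forecast (f1, f2) and observed (o1, o2)
    components, accumulated over forecast cases (standard MJO verification form)."""
    f1, f2, o1, o2 = (np.asarray(a, dtype=float) for a in (f1, f2, o1, o2))
    numerator = np.sum(f1 * o1 + f2 * o2)
    denominator = np.sqrt(np.sum(f1**2 + f2**2)) * np.sqrt(np.sum(o1**2 + o2**2))
    return numerator / denominator
```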
Fig. 5. (a) ISNN, (b) IGSNN, and (c) the bivariate correlation coefficient scores of the ensemble mean forecasts as a function of the single-member correlation scores (r1, r2). Plots are shown for the case of a radius R = 1.0 and m = 25.
b. Probabilistic verification of the MJO index
We now demonstrate the practical use of IGNN and IGSNN for the probabilistic verification of the MJO index. Figure 1a shows an example of an MJO diagram and the probability density of the observed climatology, which is used as the reference forecast, for extended winters from 1980/81 to 2020/21. We have approximately 6200 samples of historical observations during this period.
Figure 6 shows IGNN and IGSNN versus lead time for the MJO hindcasts of the ECMWF forecast system during the hindcast period, for radii of 0.5 and 1. The quartile ranges over all cases are shown to indicate the variability of the scores among the cases (Figs. 6a,c). The 5% significance level was assessed using Monte Carlo sampling (by shuffling the order of the observation samples). It is apparent from Figs. 6a and 6c that the lower quartile of the IGSNN exceeds zero at lead times of <15 (25) days for a radius of 0.5 (1), and that the IGSNN scores indicate statistically significant skill up to roughly 40 (45) days for a radius of 0.5 (1). These choices of radius (R = 0.5 and 1) are used to verify the skill of forecast ranges at short (<10 days) and long (>10 days) lead times. We found that both IGNN and IGSNN estimate the probabilistic forecast skill reasonably well, and, as expected, IGNN and IGSNN are higher for forecasts with shorter lead times. The increase of IGNN with R = 0.5 at short lead times (≤2 days) likely reflects the overconfidence of the forecast system or relatively large errors (Fig. 6a). Figures 6b and 6d show the ratio of the number of cases for which kf/Nf exceeded kc/Nc to the total number of cases, i.e., the fraction of forecast cases that are more skillful than the climatological forecast. Averages of kf/Nf are also shown to indicate the probability of detection, another simple measure of probabilistic forecast skill.
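One way the shuffling-based significance test can be implemented is sketched below; the resampling size, the quantile, and the generic score_fn argument (e.g., a wrapper around the ig_nn sketch in section 2) are illustrative assumptions, not the exact design used for Fig. 6.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def shuffled_significance_level(forecast_sets, observations, score_fn,
                                n_shuffle=1000, quantile=0.95):
    """Estimate a null significance level for a case-mean score by re-pairing
    forecasts with randomly shuffled observations (illustrative sketch).

    forecast_sets : sequence of (Nf, n) ensemble arrays, one per forecast case
    observations  : (N, n) array of the verifying observations
    score_fn      : callable(ensemble, obs) -> score; should return finite values
    """
    null_means = []
    for _ in range(n_shuffle):
        shuffled = rng.permutation(observations)   # destroy the forecast-observation pairing
        scores = [score_fn(f, o) for f, o in zip(forecast_sets, shuffled)]
        null_means.append(np.mean(scores))
    return np.quantile(null_means, quantile)       # e.g., the 95th percentile of the null distribution
```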
Fig. 6. The IGNN and IGSNN vs lead time for the MJO ensemble hindcasts of the ECMWF forecast system starting from November to February from 2001/02 to 2020/21. Results for radii of (a),(b) 0.5 and (c),(d) 1. The blue shadings indicate quartile ranges of IGNN for all cases. The dashed lines in (a) and (c) indicate the significance level of p = 0.05 based on Monte Carlo sampling. In (b) and (d), ratios of the number of cases of IGNN > 0 to the total number of cases are shown. Averages of kf/Nf are also shown to indicate the probability of detection. The dashed lines in (b) and (d) indicate the ratio of totally missed cases to all cases.
One important feature of the IGNN score is that it can evaluate the forecast skill of individual cases. One can therefore examine the conditional forecast skill by stratifying the cases with particular conditions. Figure 7a illustrates the dependency of the IGNN score on the initial MJO phase. The results indicate that, in the ECMWF model, the forecasts starting from MJO phases 2–3 and 3–4 tend to have slightly higher scores in week 2 and week 1, whereas the forecasts starting from MJO phases 4–5 tend to have lower scores in week 3, as previous studies have reported (Kim 2017; Kim et al. 2018; Vitart and Molteni 2010). The lower skill of the initial MJO phases 4–5 presumably reflects the model’s difficulty in predicting the propagation of MJOs through the Maritime Continent (Kim 2017; Vitart and Molteni 2010). It should be noted that the dependency of the MJO forecast skill on the initial MJO phase may differ among models (Kim et al. 2014, 2018; Lim et al. 2018).
Fig. 7. The dependency of the IGNN score on the initial MJO phase and La Niña conditions in the ECMWF model. (a) Difference of IGNN scores in different initial MJO phases from the average of the IGNN scores for all cases. (b) Difference of IGNN scores between La Niña winters (extended winters of 2005/06, 2007/08, 2010/11, 2017/18, and 2020/21) and all winters (from 2001/02 to 2020/21) for each initial MJO phase. All the IGNN scores were computed with a radius of 1 and averaged for forecast cases if their initial MJO amplitudes (
The MJO forecast skill is possibly modulated by the interannual variability of El Niño–Southern Oscillation (ENSO). Figure 7b shows the modulation of IGNN in La Niña winters for different initial MJO phases. We found that forecasts starting from MJO phases 5–6 (8–1) tend to have higher (lower) scores during La Niña winters. These higher (lower) scores are associated with larger (smaller) MJO amplitudes in the observations and forecasts during La Niña winters (not shown). This result reflects the flow-dependent predictability of the MJO due to ENSO in the ECMWF model. In contrast, we found no clear change of IGNN during El Niño winters (not shown). Further study is needed to determine whether these are common features across S2S models, but such detailed analysis is beyond the scope of this paper. As demonstrated here, the proposed approach enables evaluation of conditional probabilistic forecast skill.
4. Conclusions
In this paper we proposed the use of information-based probabilistic verification scores named the fixed-radius near-neighbors ignorance score (ISNN) and information gain (IGNN). In the proposed method, a nonparametric, near-neighbors estimator with a fixed search radius was used to compute the ISNN and IGNN of continuous quantities of ensemble forecasts.
The characteristics of the proposed scores were investigated using idealized Monte Carlo simulations. The correspondence of the proposed scores to the Pearson correlation coefficient was illustrated, and the sensitivity of the proposed scores to ensemble size was investigated. The scores improved slightly with larger ensemble sizes, as do many other scores. The proper-scoring characteristics were examined in terms of the mean and dispersion biases. We found that both the ISNN and the CRPS could be biased with respect to forecast dispersion if the ensemble size was small. It appears to be impractical to use currently proposed local (ISNN) and nonlocal (CRPS) proper scores to detect the underdispersion error with small ensemble forecasts, so care should be taken in the interpretation of these scores. The ISNN becomes more reliable when the ensemble size is large and the radius parameter is small. This information facilitates practical application of the proposed scores.
One of the advantages of the proposed scores is that they can be naturally extended to vector variables of multiple dimensions (the practical restriction to a few dimensions is due only to the limited ensemble size of current meteorological forecasts). Comparisons were made between the proposed scores and bivariate correlation coefficients in idealized, two-dimensional Monte Carlo simulations, and a notable difference was found in the score characteristics: the ignorance gain skill score (IGSNN) is high only if the single-member correlations for both dimensions are high, whereas the bivariate correlation can be relatively high if the single-member correlation for either dimension is high.
This paper illustrated how the proposed method can be applied to subseasonal forecasts of the MJO index, which consists of two-dimensional data. The proposed approach enabled assessment of the probabilistic forecast skill of the MJO index, and the results demonstrated that the IGNN score can be useful for assessing the accuracy of ensemble forecasts. Another advantage of the proposed scores is that they can be computed for individual cases; we showed, for example, the dependency of the probabilistic MJO forecast skill on the initial MJO phase and the ENSO phase in the ECMWF model. We consider that the new approach for probabilistic forecast verification can support various forecast activities not only in the Earth sciences but also in other disciplines.
Acknowledgments.
The MJO forecast data were provided by the WWRP/WCRP S2S project. This work was supported by the Arctic Challenge for Sustainability II (ArCS II) program, Grant JPMXD1420318865, the MEXT program for the advanced studies of climate change projection (SENTAN), Grants JPMXD0722680395 and JPMXD0722680734, and Japan Society for the Promotion of Science KAKENHI Grant JP22H03653.
Data availability statement.
The MJO index data of the ECMWF subseasonal reforecasts, real-time forecasts, and ERA5 reanalysis are available from the institutional repository of the Meteorological Research Institute (https://climate.mri-jma.go.jp/pub/archives/Takaya-et-al_MJO-S2S/).
REFERENCES
Benedetti, R., 2010: Scoring rules for forecast verification. Mon. Wea. Rev., 138, 203–211, https://doi.org/10.1175/2009MWR2945.1.
Box, G. E. P., and M. E. Muller, 1958: A note on the generation of random normal deviates. Ann. Math. Stat., 29, 610–611, https://doi.org/10.1214/aoms/1177706645.
Bröcker, J., and L. A. Smith, 2007: Scoring probabilistic forecast: The importance of being proper. Wea. Forecasting, 22, 382–388, https://doi.org/10.1175/WAF966.1.
Bröcker, J., and L. A. Smith, 2008: From ensemble forecasts to predictive distribution functions. Tellus, 60A, 663–678, https://doi.org/10.1111/j.1600-0870.2008.00333.x.
Casati, B., M. Dorninger, C. A. S. Coelho, E. E. Ebert, C. Marsigli, M. P. Mittermaier, and E. Gilleland, 2022: The 2020 International Verification Methods Workshop Online: Major outcomes and way forward. Bull. Amer. Meteor. Soc., 103, E899–E910, https://doi.org/10.1175/BAMS-D-21-0126.1.
Dawid, A. P., and P. Sebastiani, 1999: Coherent dispersion criteria for optimal experimental design. Ann. Stat., 27, 65–81, https://doi.org/10.1214/aos/1018031101.
Ferro, C. A. T., 2014: Fair scores for ensemble forecasts. Quart. J. Roy. Meteor. Soc., 140, 1917–1923, https://doi.org/10.1002/qj.2270.
Gneiting, T., and A. E. Raftery, 2007: Strictly proper scoring rules, prediction, and estimation. J. Amer. Stat. Assoc., 102, 359–378, https://doi.org/10.1198/016214506000001437.
Gneiting, T., L. I. Stanberry, E. P. Grimit, L. Held, and N. A. Johnson, 2008: Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. TEST, 17, 211–235, https://doi.org/10.1007/s11749-008-0114-x.
Gombos, D., J. A. Hansen, J. Du, and J. McQueen, 2007: Theory and applications of the minimum spanning tree rank histogram. Mon. Wea. Rev., 135, 1490–1505, https://doi.org/10.1175/MWR3362.1.
Goria, M. N., N. N. Leonenko, V. V. Mergel, and P. L. Novi Inverardi, 2005: A new class of random vector entropy estimators and its applications in testing statistical hypotheses. J. Nonparametric Stat., 17, 277–297, https://doi.org/10.1080/104852504200026815.
Gottschalck, J., and Coauthors, 2010: A framework for assessing operational Madden–Julian oscillation forecasts. Bull. Amer. Meteor. Soc., 91, 1247–1258, https://doi.org/10.1175/2010BAMS2816.1.
Grimit, E. P., T. Gneiting, V. J. Berrocal, and N. A. Johnson, 2006: The continuous ranked probability score for circular variables and its application to mesoscale forecast ensemble verification. Quart. J. Roy. Meteor. Soc., 132, 2925–2942, https://doi.org/10.1256/qj.05.235.
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570, https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.
Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803.
Hino, H., and N. Murata, 2013: Information estimators for weighted observations. Neural Networks, 46, 260–275, https://doi.org/10.1016/j.neunet.2013.06.005.
Kalnay, E., 2019: Historical perspective: Earlier ensembles and forecasting forecast skill. Quart. J. Roy. Meteor. Soc., 145, 25–34, https://doi.org/10.1002/qj.3595.
Kim, H.-M., 2017: The impact of the mean moisture bias on the key physics of MJO propagation in the ECMWF reforecast. J. Geophys. Res. Atmos., 122, 7772–7784, https://doi.org/10.1002/2017JD027005.
Kim, H.-M., P. J. Webster, V. E. Toma, and D. Kim, 2014: Predictability and prediction skill of the MJO in two operational forecasting systems. J. Climate, 27, 5364–5378, https://doi.org/10.1175/JCLI-D-13-00480.1.
Kim, H.-M., F. Vitart, and D. E. Waliser, 2018: Prediction of the Madden–Julian oscillation: A review. J. Climate, 31, 9425–9443, https://doi.org/10.1175/JCLI-D-18-0210.1.
Kozachenko, L. F., and N. N. Leonenko, 1987: Sample estimate of entropy of a random vector. Probl. Info. Transm., 23, 95–101.
Kraskov, A., H. Stögbauer, and P. Grassberger, 2004: Estimating mutual information. Phys. Rev., 69E, 066138, https://doi.org/10.1103/PhysRevE.69.066138.
Kumar, A., 2009: Finite samples and uncertainty estimates for skill measures for seasonal predictions. Mon. Wea. Rev., 137, 2622–2631, https://doi.org/10.1175/2009MWR2814.1.
Leutbecher, M., 2019: Ensemble size: How suboptimal is less than infinity? Quart. J. Roy. Meteor. Soc., 145, 107–128, https://doi.org/10.1002/qj.3387.
Lim, Y., S.-W. Son, and D. Kim, 2018: MJO prediction skill of the subseasonal-to-seasonal prediction models. J. Climate, 31, 4075–4094, https://doi.org/10.1175/JCLI-D-17-0545.1.
Loftsgaarden, D. O., and C. P. Quesenberry, 1965: A nonparametric estimate of multivariate density function. Ann. Math. Stat., 36, 1049–1051, https://doi.org/10.1214/aoms/1177700079.
Lombardi, D., and S. Pant, 2016: Nonparametric k-nearest-neighbor entropy estimator. Phys. Rev., 93E, 013310, https://doi.org/10.1103/PhysRevE.93.013310.
Marshall, A. G., H. H. Hendon, and D. Hudson, 2016: Visualizing and verifying probabilistic forecasts of the Madden–Julian Oscillation. Geophys. Res. Lett., 43, 12 278–12 286, https://doi.org/10.1002/2016GL071423.
Matsueda, M., and H. Endo, 2011: Verification of medium-range MJO forecasts with TIGGE. Geophys. Res. Lett., 38, L11801, https://doi.org/10.1029/2011GL047480.
Murray, S. A., 2018: The importance of ensemble techniques for operational space weather forecasting. Space Wea., 16, 777–783, https://doi.org/10.1029/2018SW001861.
Palmer, T., 2019: The ECMWF ensemble prediction system: Looking back (more than) 25 years and projecting forward 25 years. Quart. J. Roy. Meteor. Soc., 145, 12–24, https://doi.org/10.1002/qj.3383.
Peirolo, R., 2011: Information gain as a score for probabilistic forecasts. Meteor. Appl., 18, 9–17, https://doi.org/10.1002/met.188.
Pérez-Cruz, F., 2008: Estimation of information theoretic measures for continuous random variables. Advances in Neural Information Processing Systems: Proceedings of the First 12 Conferences, M. I. Jordan, Y. LeCun, and S. A. Solla, Eds., Vol. 21, Curran Associates Inc., 1257–1264.
Pestov, V., 2000: On the geometry of similarity search: Dimensionality curse and concentration of measure. Info. Process. Lett., 73, 47–51, https://doi.org/10.1016/S0020-0190(99)00156-8.
Roulston, M. S., and L. A. Smith, 2002: Evaluating probabilistic forecasts using information theory. Mon. Wea. Rev., 130, 1653–1660, https://doi.org/10.1175/1520-0493(2002)130<1653:EPFUIT>2.0.CO;2.
Scheuerer, M., and T. M. Hamill, 2015: Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Wea. Rev., 143, 1321–1334, https://doi.org/10.1175/MWR-D-14-00269.1.
Scott, D. W., 1979: On optimal and data-based histograms. Biometrika, 66, 605–610, https://doi.org/10.1093/biomet/66.3.605.
Siegert, S., C. A. T. Ferro, D. B. Stephenson, and M. Leutbecher, 2019: The ensemble-adjusted ignorance score for forecasts issued as normal distributions. Quart. J. Roy. Meteor. Soc., 145, 129–139, https://doi.org/10.1002/qj.3447.
Singh, H., N. Misra, V. Hnizdo, A. Fedorowicz, and E. Demchuk, 2003: Nearest neighbor estimates of entropy. Amer. J. Math. Manage. Sci., 23, 301–321, https://doi.org/10.1080/01966324.2003.10737616.
Smith, L. A., and J. A. Hansen, 2004: Extending the limits of ensemble forecast verification with the minimum spanning tree. Mon. Wea. Rev., 132, 1522–1528, https://doi.org/10.1175/1520-0493(2004)132<1522:ETLOEF>2.0.CO;2.
Sugiyama, M., T. Suzuki, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe, 2008: Direct importance estimation for covariate shift adaptation. Ann. Inst. Stat. Math., 60, 699–746, https://doi.org/10.1007/s10463-008-0197-x.
Tödter, J., and B. Ahrens, 2012: Generalization of the ignorance score: Continuous ranked version and its decomposition. Mon. Wea. Rev., 140, 2005–2017, https://doi.org/10.1175/MWR-D-11-00266.1.
Troin, M., R. Arsenault, A. W. Wood, F. Brissette, and J.-L. Martel, 2021: Generating ensemble streamflow forecasts: A review of methods and approaches over the past 40 years. Water Resour. Res., 57, e2020WR028392, https://doi.org/10.1029/2020WR028392.
Vitart, F., and F. Molteni, 2010: Simulation of the Madden–Julian Oscillation and its teleconnections in the ECMWF forecast system. Quart. J. Roy. Meteor. Soc., 136, 842–855, https://doi.org/10.1002/qj.623.
Vitart, F., and Y. Takaya, 2021: Lagged ensembles in sub-seasonal predictions. Quart. J. Roy. Meteor. Soc., 147, 3227–3242, https://doi.org/10.1002/qj.4125.
Vitart, F., and Coauthors, 2017: The Subseasonal to Seasonal (S2S) prediction project database. Bull. Amer. Meteor. Soc., 98, 163–173, https://doi.org/10.1175/BAMS-D-16-0017.1.
Wand, M. P., and M. C. Jones, 1995: Kernel Smoothing. Springer, 224 pp.
Weijs, S. V., R. van Nooijen, and N. van de Giesen, 2010: Kullback–Leibler divergence as a forecast skill score with classic reliability–resolution–uncertainty decomposition. Mon. Wea. Rev., 138, 3387–3399, https://doi.org/10.1175/2010MWR3229.1.
Wheeler, M. C., and H. H. Hendon, 2004: An all-season real-time multivariate MJO index: Development of an index for monitoring and prediction. Mon. Wea. Rev., 132, 1917–1932, https://doi.org/10.1175/1520-0493(2004)132<1917:AARMMI>2.0.CO;2.
Wilks, D. S., 2020: Regularized Dawid–Sebastiani score for multivariate ensemble forecasts. Quart. J. Roy. Meteor. Soc., 146, 2421–2431, https://doi.org/10.1002/qj.3800.