Information-Based Probabilistic Verification Scores for Two-Dimensional Ensemble Forecast Data: A Madden–Julian Oscillation Index Example

Yuhei Takaya,a Kensuke K. Komatsu,a,b Hideitsu Hino,c and Frédéric Vitartd

a Meteorological Research Institute, Japan Meteorological Agency, Tsukuba, Ibaraki, Japan
b Atmosphere and Ocean Research Institute, The University of Tokyo, Kashiwa, Chiba, Japan
c The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan
d The European Centre for Medium-Range Weather Forecasts, Reading, Berkshire, United Kingdom

Abstract

Probabilistic forecasting is a common activity in many fields of the Earth sciences. Assessing the quality of probabilistic forecasts—probabilistic forecast verification—is therefore an essential task in these activities. Numerous methods and metrics have been proposed for this purpose; however, the probabilistic verification of vector variables of ensemble forecasts has received less attention than others. Here we introduce a new approach that is applicable for verifying ensemble forecasts of continuous, scalar, and two-dimensional vector data. The proposed method uses a fixed-radius near-neighbors search to compute two information-based scores, the ignorance score (the logarithmic score) and the information gain, which quantifies the skill gain from the reference forecast. Basic characteristics of the proposed scores were examined using idealized Monte Carlo simulations. The results indicated that both the continuous ranked probability score (CRPS) and the proposed score with a relatively small ensemble size (<25) are not proper in terms of the forecast dispersion. The proposed verification method was successfully used to verify the Madden–Julian oscillation index, which is a two-dimensional quantity. The proposed method is expected to advance probabilistic ensemble forecasts in various fields.

Significance Statement

In the Earth sciences, probabilistic future states are estimated by running a large number of forecasts (called ensemble forecasts) based on physical equations with slightly different initial conditions and stochastic parameters. The verification of probabilistic forecasts is an essential part of forecasting and modeling activity in the Earth sciences. However, there is no information-based probabilistic verification score applicable to vector variables of ensemble forecasts. The purpose of this study is to introduce a novel method for verifying scalar and two-dimensional vector variables of ensemble forecasts. The proposed method offers a new approach to probabilistic verification and is expected to advance probabilistic ensemble forecasts in various fields.

© 2023 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Komatsu’s current affiliation: Osaka Regional Headquarters, Japan Meteorological Agency, Osaka, Japan.

Corresponding author: Yuhei Takaya, yuhei.takaya@mri-jma.go.jp


1. Introduction

Ensemble forecasting techniques have been developed in recent decades and are now widely used in the Earth sciences for weather, ocean, climate, hydrological, and space weather forecasts (Kalnay 2019; Murray 2018; Palmer 2019; Troin et al. 2021). In these disciplines, stochastic differential equations with many degrees of freedom are neither derived nor solved. Instead, a large number of forecasts (called ensemble forecasts), produced by integrating physical equations with slightly different initial conditions and stochastic parameters, are used to estimate probabilistic future states (Kalnay 2019; Palmer 2019). The probabilistic verification of continuous ensemble forecasts is therefore an essential task required to support various forecast activities.

Numerous methods and scores have been proposed for the probabilistic verification of ensemble forecasts. Among them, Roulston and Smith (2002) have proposed an information-based, probabilistic score called the ignorance score (IS) or logarithmic score. The IS measures the probabilistic forecast skill using a variant of the Kullback–Leibler (KL) divergence or relative entropy, which is a fundamental quantity in probability theory and information theory (Benedetti 2010; Roulston and Smith 2002; Weijs et al. 2010).

For categorical forecasts, the IS computes KL divergence by setting an observed probability equal to one in a category where the event occurred (hereafter realized category) and zero in the others. The IS is defined as
IS(X) = −log[Prf(X ∈ C)],   (1)
where Prf(X ∈ C) is the forecast probability that the event X falls in the realized category C. This quantity can also be readily interpreted as the information content (surprisal) of the ensemble forecast for a given realized category. The smaller the IS, the more skillful the forecast, because a small IS means that the forecast probability of the realized outcome is greater.
Bröcker and Smith (2008) and later Peirolo (2011) have introduced a score called the relative ignorance or the information gain (IG), which quantifies the gain of the IS from a reference forecast (e.g., a climatological probability forecast). The IG is defined as
IG(X) = ISc(X) − ISf(X) = log[Prf(X ∈ C)/Prc(X ∈ C)],   (2)
where Prc(X ∈ C) is the probability of the reference forecast (e.g., the climatological probability for a realized category C), and Prf(X ∈ C) is the probability from the ensemble forecast being verified. A larger IG means a more skillful forecast.
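As a minimal numerical illustration (not from the paper; the probabilities and function name here are hypothetical), the following Python snippet evaluates Eqs. (1) and (2) for a tercile forecast verified against a climatological reference:

```python
import numpy as np

def ignorance_score(probabilities, realized_category):
    """IS = -log of the probability assigned to the realized category."""
    return -np.log(probabilities[realized_category])

# Hypothetical tercile forecast (below, near, above normal); "above normal" occurred.
p_forecast = np.array([0.2, 0.3, 0.5])      # ensemble-derived category probabilities
p_climatology = np.array([1/3, 1/3, 1/3])   # reference (climatological) probabilities
realized = 2

is_f = ignorance_score(p_forecast, realized)     # ~0.69
is_c = ignorance_score(p_climatology, realized)  # ~1.10
ig = is_c - is_f                                 # ~0.41 > 0: more skillful than climatology
print(is_f, is_c, ig)
```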

The IS and IG can also be defined for continuous probability forecasts (Roulston and Smith 2002; Peirolo 2011). Extending the IS for categorical forecasts to the IS for continuous forecasts may seem straightforward, but care is required. Specifically, the computation of the continuous version of the IS requires an additional bin-size term to avoid divergence to infinity as the bin size approaches zero (Peirolo 2011).

Although wider use of the information-based score (e.g., the IS) has been recommended because of its advantages (Casati et al. 2022), it is used less in weather and climate forecasting than other standard scores such as the continuous ranked probability score (CRPS) and its normalized form, the continuous ranked probability skill score (CRPSS; Hersbach 2000). One of the reasons may be that, as seen in Eq. (1), the IS diverges to infinity when the forecast misses the occurrence of the event (0% forecast probability for a realized category). The IS can be computed for continuous ensemble forecasts by a rank histogram method (Roulston and Smith 2002) or a binning method (Peirolo 2011), but the IS diverges to infinity if the verification lies outside the ensemble forecast range or, in the case of the binning method, no ensemble forecast falls in the bin of the outcome. Some studies (Bröcker and Smith 2008; Peirolo 2011) have avoided this problem by blending a small portion of the probability of the climatological forecast into the probability of the ensemble forecasts, but this treatment may be somewhat subjective. In addition, a previous study has proposed the use of the continuous ranked ignorance score, which modifies the IS by using a cumulative distribution function and avoids this problem in a manner similar to the approach used by the CRPS (Tödter and Ahrens 2012). However, the score based on a cumulative distribution function is not local, and it can be affected by the probability in the tails of the distribution in the same way that the CRPS is affected. We deal with this divergence problem in a different way in our proposed method.

To compute the IS and IG of continuous ensemble forecasts, one needs to estimate the local probability at the verification (observed value) using continuous ensemble data. In general, there are several ways to evaluate the local probability from samples of continuous data. Those methods include binning (Scott 1979), kernel plug-in (Wand and Jones 1995), and k-nearest neighbor (kNN) methods (Loftsgaarden and Quesenberry 1965). For verification purposes, a method should be as simple and consistent as possible. The binning method may be the simplest and most straightforward. However, this method has the divergence problem mentioned before. The kernel plug-in methods may be more complicated and require choices of a function and parameters, and the computed score depends on these choices. Probably for these reasons, these methods have not been widely used for verification purposes. This study considers another relatively simple and objective method of fixed-radius near-neighbors (NN) estimation, a variant of the kNN estimation, for ensemble verification.

In weather and climate forecasting, there is a need for probabilistic verification of vector data. However, the probabilistic verification of vector variables of ensemble forecasts has been less studied than that for scalar variables. Some scores have been proposed for multivariate verification, such as the energy score (Gneiting et al. 2008), which is a multivariate extension of the CRPS, the Dawid–Sebastiani score (Dawid and Sebastiani 1999; Gneiting and Raftery 2007; Wilks 2020), and the variogram score (Scheuerer and Hamill 2015). There are also some approaches for multivariate verification, for example, the minimum spanning tree (Gombos et al. 2007; Smith and Hansen 2004) and the circular CRPS (Grimit et al. 2006). These scores and methods have limitations, such as complex computational procedures and/or a limited ability to measure the accuracy of ensemble forecasts. In this paper, we propose a relatively simple new approach that is applicable to measuring the accuracy of two-dimensional vector ensemble forecasts.

For demonstration, this paper presents an example of the application of the proposed scores for two-dimensional data. The example treated in this paper is the Madden–Julian oscillation (MJO) index. The MJO index represents the tropical variability of the MJO using two indices (called RMM1 and RMM2, where RMM stands for real-time multivariate MJO) based on the two leading modes of a principal component analysis (Wheeler and Hendon 2004). Although numerous efforts have been made to establish a standard verification method for the MJO forecast (Gottschalck et al. 2010), its probabilistic verification has been elusive because of the two-dimensionality of the MJO index. For deterministic verification of the MJO index, a bivariate correlation score has often been used (Gottschalck et al. 2010), but it is unclear how the ensemble mean forecasts are made and verified (Matsueda and Endo 2011). More recently, Marshall et al. (2016) have proposed an approach for verifying probabilistic forecasts of the MJO. They computed CRPSs for the RMM1 and RMM2 indices separately and summed them to obtain an aggregated score. They also applied a ranked probability score for the MJO phase and CRPS for the MJO amplitude. This study takes an alternative approach, using a single information-based score of the probabilistic verification for the two-dimensional ensemble forecast data.

This paper is structured as follows. Section 2 describes the data we used and our approach to computing the IS, IG, and IGS scores from continuous ensemble forecast data. Section 3 examines the basic characteristics of the proposed scores using idealized Monte Carlo simulations. Section 3 also presents a practical example of MJO verification using the proposed scores. Section 4 summarizes the conclusions of this study.

2. Data and methods

a. Data

For the demonstration, we used the MJO indices of hindcasts of the European Centre for Medium-Range Weather Forecasts (ECMWF) subseasonal forecast system (version Cy47R3). We analyzed the 46-day ECMWF ensemble hindcasts with 11 members for forecast periods during extended winters (initial dates during 1 November–28 February) from 2001/02 to 2020/21. These MJO index data were obtained from the Subseasonal-to-Seasonal Prediction Project (S2S) data archive (Vitart et al. 2017). The MJO indices were computed following the procedure used by Gottschalck et al. (2010), except that we used forecast anomalies relative to the climatology of the model. The MJO indices of the hindcasts were verified against those of the ERA5 reanalysis (Hersbach et al. 2020), which was also obtained from the S2S data archive.

b. Nonparametric estimation of the ignorance score

Before introducing our approach, we begin with the definition of the IS for a continuous probability forecast. The IS is defined similarly to Eq. (1) as
ISf(x) = −log[pf(x)],   (3)
where pf is the probability density function of the ensemble forecast. The true probability density function po of the observation is seldom obtained for a particular case because there is only one observed verification. The derivation of the continuous IS based on the KL divergence is therefore not trivial. It is more straightforward to interpret the IS as the information content at a given verification x. The expectation of the IS is a minimum when the forecast probability density function (PDF) coincides with the true (verification) PDF. The IS is thus a strictly proper information-based score (Bröcker and Smith 2007; Roulston and Smith 2002). Bröcker and Smith (2007) have stressed that the IS is the only strictly proper local score for continuous probability among the scores that they considered.
Now we describe our procedure for computing the fixed-radius, near-neighbors estimate of the ignorance score (hereafter ISNN). One classical way to assess the differential (continuous) entropy of a set of continuous samples is to use the kNN estimation based on the Kozachenko and Leonenko entropy estimator (Goria et al. 2005; Kozachenko and Leonenko 1987; Kraskov et al. 2004; Singh et al. 2003). By analogy with the Kozachenko and Leonenko entropy estimator, we estimate the ignorance score, ISNN, of the probability at a given verification x for the n-dimensional case by
ISNN(x) = −log(k) + log(N) + log[V(x)],   (4)
where V(x) is the volume of the kNN n-dimensional sphere (ball) centered at x, N is the total sample size, k is the number of ensemble members in the ball, and log(⋅) denotes the natural logarithm. For a Euclidean (L2) norm, the volume of the kNN n-dimensional ball is V(x) = υn R(x)^n, where υn is the n-dimensional volume of the Euclidean unit ball, π^(n/2)/[Γ(1 + n/2)]. The term Γ(⋅) denotes Euler's gamma function, and R(x) is the radius of the ball (the Euclidean distance between x and its kth nearest forecast). For instance, the volume of a ball V(x) with radius R(x) is simply 2R(x) in one dimension and πR(x)² in two dimensions. One may compute ISNN by assigning a value to k, as is usually done when estimating the kNN entropy estimator, or by assigning a fixed value to R, as described below.
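The following Python sketch (our illustration, not the authors' code; the function names are ours) computes the fixed-radius estimate of Eq. (4) for a small two-dimensional ensemble, assuming a Euclidean norm:

```python
import numpy as np
from math import gamma, pi

def unit_ball_volume(n):
    """Volume of the n-dimensional Euclidean unit ball, pi^(n/2) / Gamma(1 + n/2)."""
    return pi ** (n / 2) / gamma(1 + n / 2)

def is_nn(members, x, radius):
    """Fixed-radius near-neighbors ignorance score at the verification x (cf. Eq. 4).

    members : (N, n) array of ensemble members (or historical observations)
    x       : (n,) verification vector
    radius  : fixed search radius R
    Returns np.inf when no member lies inside the ball; that case is handled by
    the special treatment described below in this subsection.
    """
    members = np.atleast_2d(members)
    x = np.asarray(x, dtype=float)
    N, n = members.shape
    k = np.sum(np.linalg.norm(members - x, axis=1) <= radius)
    if k == 0:
        return np.inf
    return -np.log(k) + np.log(N) + np.log(unit_ball_volume(n) * radius ** n)

# Toy example: an 11-member two-dimensional ensemble verified against one observation.
rng = np.random.default_rng(1)
ens = rng.standard_normal((11, 2))
obs = np.array([0.3, -0.2])
print(is_nn(ens, obs, radius=1.0))
```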

In this study, we adopt the fixed-radius near-neighbors estimation of the IS. The reason is that fixing k allows the radius to change between lead times or models, which makes it difficult to compare the prediction skills of different models or lead times whose forecast dispersion (and thus the radius of the kNN n-dimensional sphere) may differ. It is important to note that the kNN estimation based on the Kozachenko and Leonenko entropy estimator may be applied suitably to low-dimensional data, but it is not well suited for high-dimensional data because of the degradation of the accuracy of the kNN estimator (the so-called "curse of dimensionality"; Pestov 2000). Based on this consideration and the relatively small sample size of ensemble forecasts, we recommend that our method be used for low-dimensional data (n ≤ 2) unless a sufficiently large number of ensemble forecasts is available. We also note that the accuracy of the estimation can be improved by using methods more sophisticated than the kNN entropy estimator (e.g., Lombardi and Pant 2016; Pérez-Cruz 2008) or a density-ratio estimator (e.g., Sugiyama et al. 2008). However, because these approaches complicate the computation of the scores and introduce additional arbitrary parameters, they likely impede simple and objective verification. For these reasons, these possible improvements are beyond the scope of this study.

When we compute ISNN, we first need to determine the fixed radius R of the n-dimensional sphere centered at the verification (observation) x. We then count the number of forecasts kf within the search domain with that fixed radius (Fig. 1b; Hino and Murata 2013). Using kf, the ensemble size Nf, and the volume of the ball, the score can be computed by Eq. (4). It is noteworthy that ISNN is a nonparametric score because it does not assume a particular probability distribution.

Fig. 1. A schematic diagram of the fixed-radius NN approach. (a) Forecasts and observations of the MJO indices RMM1 and RMM2 for a forecast case from an initial date of 22 Nov 2014. Ensemble forecasts of day 1, 5, 10, 15, and 20 (colored circles) and corresponding observations (black circles) are plotted. The color shading indicates the probability density covering 90% of the historical observations during extended winters (1 Nov–31 Mar) from 1980/81 to 2020/21 by kernel density estimation. The search circles of radii (R = 0.5 and 1) are shown only for day 10. MJO phases (P1–P8) are denoted in the diagram. (b) An enlargement of (a) around the circle domain of day 10. The historical observations are plotted with light blue circles. Please refer to the text for the procedure for counting using the fixed-radius NN approach.

A special treatment is required when there is no ensemble member within the ball of fixed radius R, i.e., the number of forecasts kf equals zero. In such a case, ISNN diverges to infinity because of the logarithmic function in the first term of Eq. (4). To avoid this problem when kf equals zero, we compute the score as follows. We search for the k′f th nearest forecast member, at a distance R′(x) from the verification x, where k′f is a small number (typically 1). We then compute IS′NN(x) by using the equation IS′NN(x) = −log(k′f) + log(Nf) + log[υn R′(x)^n]. We also compute the ignorance score of a reference forecast [climatological forecast; ISNNc(x)] with a total number of historical observations Nc. We count the number of climatological forecasts kc within the same fixed radius R and obtain ISNNc(x). We then compute the ISNN(x) score using the equation ISNN(x) = max[IS′NN(x), ISNNc(x)]. The reason we use the maximum of IS′NN(x) and ISNNc(x) is that we found that IS′NN(x) can yield a lower (better) score than ISNNc(x) in some cases, especially for ensemble forecasts with a small ensemble size; nevertheless, ISNN(x) should be higher (worse) than ISNNc(x) when no member falls within the fixed radius. This treatment may be more objective than previous methods that assign an arbitrary, small probability to pf(x) because it uses a consistent way to compute the penalty term based on the probability density calculated using the kNN estimation.
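A minimal sketch of this fallback, under the same assumptions as the snippet above (our function names; k′f defaults to 1), might look as follows:

```python
import numpy as np
from math import gamma, pi

def is_nn_forecast(forecasts, climatology, x, radius, k_prime=1):
    """ISNN of an ensemble forecast, including the empty-ball fallback of section 2b.

    If no member lies within the fixed radius R, the score is estimated from the
    distance R' to the k'-th nearest member and then floored by the climatological
    score computed with the same fixed radius R (the max[...] rule in the text).
    """
    forecasts = np.atleast_2d(forecasts)
    climatology = np.atleast_2d(climatology)
    x = np.asarray(x, dtype=float)
    Nf, n = forecasts.shape
    Nc = climatology.shape[0]
    vn = pi ** (n / 2) / gamma(1 + n / 2)   # n-dimensional unit-ball volume

    dist_f = np.sort(np.linalg.norm(forecasts - x, axis=1))
    kf = np.sum(dist_f <= radius)
    if kf > 0:
        return -np.log(kf) + np.log(Nf) + np.log(vn * radius ** n)

    # Expanded-radius estimate from the k'-th nearest member (k' = 1 by default).
    r_prime = dist_f[k_prime - 1]
    is_expanded = -np.log(k_prime) + np.log(Nf) + np.log(vn * r_prime ** n)

    # Climatological score with the same fixed radius R; assumes kc >= 1,
    # which holds in practice for a dense climatological sample.
    kc = np.sum(np.linalg.norm(climatology - x, axis=1) <= radius)
    is_clim = -np.log(kc) + np.log(Nc) + np.log(vn * radius ** n)

    return max(is_expanded, is_clim)
```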

We note that because the probabilistic density is not necessarily homogeneous within the domain of radius R (Fig. 1b), this estimate is susceptible to an error because of the inhomogeneity of data samples. However, our objective here is not to assess the local probability density accurately but to introduce an information-theoretic concept to the probabilistic verification. We thus consider that the ISNN computed by our procedure is valid for the purpose of verification.

For verifying forecast sets with multiple forecast cases (i = 1, 2, …, N) and multiple grid points (j = 1, 2, …, M), ISNN is averaged over all the N × M cases, i.e., ISNN = (1/NM) Σi,j ISNN(xi,j). A smaller score means better forecasts, and the best possible score is log(υn R^n).

c. Nonparametric estimation of the ignorance gain

As in the IG for categorical forecasts [Eq. (2)], the IG for continuous probability forecasts is accordingly defined as
IG(x) = ISc(x) − ISf(x) = log[pf(x)/pc(x)],   (5)
where pc is the probability density function of the reference forecast, which is often a climatological forecast. The IG is interpreted as the difference in the information content between the ensemble forecast and the reference forecast for a given verification x.
We compute the IG using the near-neighbors search (IGNN) from the ISs of ensemble forecasts (ISNNf) and reference forecasts (climatological forecasts; ISNNc) with the same fixed radius R. As in the ISNN calculation, we first determine the fixed radius R of an n-dimensional sphere centered at the verification x. We then count the number of ensemble members (kf) within the search domain with that fixed radius (Fig. 1b; Hino and Murata 2013). We also count the number of reference forecasts (kc) in the same domain. In accord with the definition of the IS, we then compute ISNNc(x) using Nc and kc, and we calculate ISNNf(x) using Nf and kf. If no forecast falls in the domain, we obtain the IS with the method described in section 2b. Using ISNNc(x) and ISNNf(x), we compute the value of IGNN(x) for a given verification x as
IGNN(x) = ISNNc(x) − ISNNf(x).   (6)
Note that the volume term of Eq. (4) cancels in Eq. (6) because the volumes for ISNNc(x) and ISNNf(x) are the same, unless no ensemble forecast falls in the domain, in which case we apply the special treatment. This definition of the IG is consistent with that of Peirolo (2011), who used the same bin size (an n-dimensional spherical volume in our case) for the ensemble and reference forecasts.
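Continuing the sketches above (and reusing the hypothetical is_nn_forecast from section 2b for the empty-ball case), the IGNN of Eq. (6) for a single verification can be computed as follows; when at least one member lies in the ball, the volume terms cancel and the score reduces to log[(kf/Nf)/(kc/Nc)]:

```python
import numpy as np
from math import gamma, pi

def ig_nn(forecasts, climatology, x, radius):
    """IGNN(x) = ISNNc(x) - ISNNf(x), both computed with the same fixed radius R."""
    forecasts = np.atleast_2d(forecasts)
    climatology = np.atleast_2d(climatology)
    x = np.asarray(x, dtype=float)
    n = climatology.shape[1]
    log_vol = np.log(pi ** (n / 2) / gamma(1 + n / 2) * radius ** n)   # log(vn * R^n)

    kc = np.sum(np.linalg.norm(climatology - x, axis=1) <= radius)
    is_c = -np.log(kc) + np.log(len(climatology)) + log_vol

    kf = np.sum(np.linalg.norm(forecasts - x, axis=1) <= radius)
    if kf > 0:
        is_f = -np.log(kf) + np.log(len(forecasts)) + log_vol
    else:
        # Empty ball: apply the section 2b fallback (is_nn_forecast in the sketch above).
        is_f = is_nn_forecast(forecasts, climatology, x, radius)

    return is_c - is_f   # equals log[(kf/Nf) / (kc/Nc)] when kf > 0
```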

In our approach, there is one arbitrary parameter R to be configured. A larger R gives a smaller variance of IGNN but a smaller sensitivity to the skill, and vice versa. The range of kf(xi,j)/Nf and percentage of totally missed cases (no ensemble forecast in the domain) may be used to set an appropriate value of R. Although the score can be computed for a small radius R, our rule of thumb is to choose the radius R so that the percentage of totally missed cases is less than about 10%–15% for the least skillful case to be verified. If the chosen radius is too small, the radius can no longer be considered fixed, and the IS may be largely penalized by missed cases. For a normalized, one-dimensional (two-dimensional) data case, we suggest R = 0.5 (1.0) for 10 members, R = 0.3 (0.8) for 20 members, and R = 0.2 (0.65) for 30 members, based on results of idealized Monte Carlo simulations (section 3). The choice of R may also depend on the number of historical observations if that number is relatively small and a smooth climatological probability density function is difficult to obtain. One can assign several values to R to verify forecasts at different lead times if necessary. Setting multiple values to R corresponds to the current verification practice of choosing a range of categories based on the forecast skill. Namely, a smaller range of categories (e.g., decile or quintile) is used to verify relatively high-skill forecasts (e.g., shorter-range forecasts up to two weeks), and a wider range of categories (e.g., tercile) is used to verify relatively low-skill forecasts (e.g., subseasonal and longer forecasts).
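As a rough diagnostic for this choice, one can tabulate the fraction of totally missed cases for a candidate radius; a minimal sketch (our function name, following the rule of thumb above) is:

```python
import numpy as np

def missed_case_fraction(forecast_sets, verifications, radius):
    """Fraction of cases with no ensemble member inside the search ball.

    A rough guide for choosing R: the text suggests keeping this fraction below
    roughly 10%-15% for the least skillful forecasts being verified.
    """
    missed = 0
    for members, x in zip(forecast_sets, np.atleast_2d(verifications)):
        dist = np.linalg.norm(np.atleast_2d(members) - np.asarray(x), axis=1)
        missed += int(np.sum(dist <= radius) == 0)
    return missed / len(forecast_sets)
```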

For scoring multiple forecast cases (i = 1, 2, …, N) and multiple grid points (j = 1, 2, …, M), we average IGNN for all the N × M cases as
IGNN = (1/NM) Σi,j IGNN(xi,j).   (7)
Moreover, a scaled skill score may be more intuitive than the raw IGNN. The best score of IGNN, log(Nc) − log[kc(xi,j)], is obtained when all ensemble members fall in the domain of the radius R, where kc(xi,j) is the number of climatological forecast (historical observation) samples in the domain, and Nc is the total number of climatological forecast samples. An ignorance gain skill score (IGSNN) can be defined by scaling the IGNN with the best score as follows:
IGSNN = Σi,j IGNN(xi,j) / Σi,j {log(Nc) − log[kc(xi,j)]}.   (8)
A larger IGSNN indicates a more skillful forecast, and IGSNN = 1 corresponds to a perfect forecast.
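For completeness, a sketch of the aggregation in Eqs. (7) and (8), reusing the hypothetical ig_nn function above, could look like this:

```python
import numpy as np

def igs_nn(forecast_sets, climatology, verifications, radius):
    """Ignorance gain skill score aggregated over cases, in the spirit of Eq. (8).

    forecast_sets : sequence of (Nf, n) member arrays, one ensemble per case
    verifications : (ncases, n) array of verifying observations
    Assumes a dense climatological sample (kc >= 1 for every case). IGSNN = 1 would
    require every member of every case to fall inside the ball around its verification.
    """
    climatology = np.atleast_2d(climatology)
    Nc = len(climatology)
    numerator = 0.0      # sum of IGNN over cases
    denominator = 0.0    # sum of the best attainable IGNN, log(Nc) - log(kc)
    for members, x in zip(forecast_sets, np.atleast_2d(verifications)):
        kc = np.sum(np.linalg.norm(climatology - np.asarray(x), axis=1) <= radius)
        numerator += ig_nn(members, climatology, x, radius)
        denominator += np.log(Nc) - np.log(kc)
    return numerator / denominator
```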

3. Results

a. Basic characteristics of the IS and IG

Before discussing real applications of the proposed scores, we illustrate the basic characteristics of the ISNN and IGSNN with idealized Monte Carlo simulations of one- and two-dimensional data. For ease of understanding the new scores, we first compare the ISNN and IGSNN with the Pearson correlation coefficient of a single-member forecast, which is one of the most basic verification scores. We also examine how the proposed scores evaluate the performance of ensemble forecasts in terms of mean and dispersion biases.

We conducted an idealized Monte Carlo simulation, which is a versatile method for investigating the characteristics of forecast scores (Kumar 2009; Vitart and Takaya 2021). We used the approach of Vitart and Takaya (2021) and conducted Monte Carlo simulations by assuming that both the verification and ensemble forecast data followed a Gaussian distribution. Here we briefly describe the simulation method.

In the one-dimensional idealized simulations, the verification (observation) and ensemble forecasts were generated as follows:
oi = √r1 Si + √(1 − r1) Ni,
fij = √r1 Si + α√(1 − r1) Nij + β,   (9)
where oi and fij are the verification and the jth ensemble forecast of the ith case, respectively; r1 is the expectation value of the single-member Pearson correlation coefficient; Si is the signal component of the ith case; Ni and Nij are noise components; α is a parameter of the forecast dispersion; and β is a parameter of the bias. The terms Si, Ni, and Nij are independent, identically distributed random variables drawn from the normal (Gaussian) distribution with mean μ = 0 and variance σ² = 1. The random variables were created with the Box–Muller method (Box and Muller 1958) from uniformly distributed random numbers. We present the results for a radius R = 0.5 and a total number of historical observations N = 5000. We performed a total of 10 000 iterations of 100-case forecast sets and plotted the medians.
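As an illustration only (not the authors' code), the construction of Eq. (9) can be sketched in Python as follows; the √r1 and √(1 − r1) weights reproduce an expected single-member correlation of r1, and numpy's Gaussian generator is used in place of the Box–Muller transform:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_case_1d(r1, m, alpha=1.0, beta=0.0):
    """One idealized case following the construction of Eq. (9).

    r1    : expected single-member correlation
    m     : ensemble size
    alpha : dispersion parameter (>1 overdispersive, <1 underdispersive)
    beta  : mean bias
    """
    s = rng.standard_normal()   # signal S_i shared by observation and forecasts
    obs = np.sqrt(r1) * s + np.sqrt(1 - r1) * rng.standard_normal()
    fcst = np.sqrt(r1) * s + alpha * np.sqrt(1 - r1) * rng.standard_normal(m) + beta
    return obs, fcst

# Example: one 100-case forecast set with r1 = 0.4 and 25 members.
cases = [generate_case_1d(0.4, 25) for _ in range(100)]
```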

We first examine the correspondence of ISNN and IGSNN to the single-member correlation score (r1). We computed ISNN, IGSNN, and the Pearson correlation coefficients of the ensemble mean forecasts (rm, where m is ensemble size) under the perfect model assumption (α = 1 and β = 0). Figure 2 shows ISNN, IGSNN, and rm as a function of r1. It is apparent that the ISNN and IGSNN scores are better for higher r1, which corresponds to a larger signal-to-noise ratio (Kumar 2009). The fact that the scores for ISNN and IGSNN also improve as the ensemble size increases is consistent with other scores investigated in previous studies (Leutbecher 2019; Siegert et al. 2019, and references therein). The added value of the information-based scores determined by ensemble forecasting is particularly apparent in the relatively low range of r1 (r1 < 0.6). A previous study has pointed out that the dependence on ensemble size can be adjusted and has proposed an ensemble-adjusted IS (Siegert et al. 2019). It is also noteworthy that both the ISNN and IGSNN appear more sensitive than rm to ensemble size even when the correlation is relatively high (e.g., r1 > 0.8) and rm becomes saturated. We emphasize that the proposed scores yield seemingly reasonable values, even for low-skill situations and small ensemble sizes (r1 < 0.1, m = 10) where the average ratio of totally missed cases can be approximately 10%. The implication is that the treatment for the totally missed cases described in section 2b works reasonably well for a wide range of forecast skill.

Fig. 2. ISNN, IGSNN, and the correlation coefficient scores of the ensemble mean forecasts as a function of the single-member correlation score (r1). Plots are shown for a radius R = 0.5. Solid lines and closed circles indicate (a) ISNN and (b) IGSNN, and crosses indicate correlation coefficients of ensemble mean forecasts for reference. Colors indicate results for various ensemble sizes. The correlation coefficients of ensemble mean forecasts are averages of correlations calculated from 10 000 iterations with the Fisher's z transformation to avoid biases of average correlations.

We next examine the properties of the proper scoring rules (Bröcker and Smith 2007). To investigate the sensitivity to the mean bias error, we computed ISNN by changing the bias parameter (β) (Fig. 3). Here, the single-member correlation score r1 was set to 0.4. As expected, the ISNN is best (smallest) when β = 0 for various ensemble sizes. Thus, ISNN is best when the probabilistic distributions of the forecasts and verification coincide. This result suggests that the ISNN is proper with respect to the mean bias error. We note that the basic characteristics did not change when the correlation coefficients (r1) differed.

Fig. 3. The dependency of ISNN on the mean bias error (β). Colors indicate results for various ensemble sizes.

We next examine the sensitivity of the ISNN to the forecast dispersion. This sensitivity is assessed by computing ISNN as a function of the forecast dispersion parameter (α) (Fig. 4). A value of α greater (smaller) than 1 corresponds to an overdispersive (underdispersive) forecast. For comparison, we calculated another theoretically proper score for continuous probabilistic forecasts, the continuous ranked probability score (CRPS), computed with Hersbach's algorithm (Hersbach 2000). For a proper score, the best value must be obtained when the dispersion of the noise component of the ensemble forecast coincides with that of the verification (in this case, α = 1).

Fig. 4. The dependency of (a) ISNN, (b) the continuous ranked probability score (CRPS), and (c) the correlation coefficient of ensemble mean forecasts on the forecast dispersion parameter (α). Colors indicate results for various ensemble sizes. In (a), the radius R is 0.5 except for the purple data, for which R is 0.2. In (c), correlation coefficients of the 10 000 iterations were averaged using Fisher's z transformation.

We found that in small ensemble cases, the scores can be biased and hence somewhat misleading (Fig. 4a). For example, in the 25-member case, underdispersive forecasts (α ∼ 0.9) can yield better scores than the perfect forecast (i.e., α = 1). We found that this problem is also apparent for the CRPS (Fig. 4b). This result is consistent with the finding of Ferro (2014), who noted that the CRPS favors ensembles that are sampled from underdispersed distributions. Overdispersive forecasts (α > 1) are correctly given low scores by both the ISNN and CRPS. With ensemble sizes greater than 50, both the ISNN and CRPS become more proper (the α of the best score is closer to 1). Although the ISNN of a 200-member ensemble with R = 0.5 is still slightly biased, this bias can be reduced by setting R = 0.2 (Fig. 4a). The increase of the ensemble size makes this possible with a smaller percentage of totally missed cases. Because the ISNN is more sensitive to the underdispersion error than the CRPS, the ISNN may be more helpful than CRPS in detecting the dispersion error in very underdispersive forecasts (α < 0.8). We also note that Monte Carlo simulations revealed that the correlation coefficient of ensemble mean forecasts, which is widely used as a deterministic verification score, is not a proper score for the indicator of forecast dispersion (Fig. 4c). In particular, with smaller ensemble size, an underdispersive forecast can yield a higher (better) forecast score.

These results have important implications for the current verification practice, because the score characteristics associated with being proper have usually been considered theoretically, without any constraint on ensemble size (i.e., infinite ensemble size; Roulston and Smith 2002; Bröcker and Smith 2007). Although Ferro (2014) pointed out the improper nature of the Brier score and CRPS with respect to the dispersion (total variance) for a small ensemble and proposed adjusted scores, the improper nature of the original scores has not been widely recognized. The results also imply that current subseasonal-to-seasonal hindcast configurations with a relatively small ensemble (m < 25) require attention to the detection of underdispersion by theoretically proper probabilistic scores (either local or nonlocal scores). We have not yet found a way to avoid the improper behavior of ISNN with small ensemble forecasts with respect to forecast dispersion. This aspect merits further study in the future.

Last, we show the correspondence of ISNN and IGSNN to the single-member correlation score in a two-dimensional case. In this example, the observation (verification) and ensemble forecasts were generated in a manner similar to the one-dimensional example as follows:
o1i = √r1 S1i + √(1 − r1) N1i,   o2i = √r2 S2i + √(1 − r2) N2i,
f1ij = √r1 S1i + √(1 − r1) N1ij,   f2ij = √r2 S2i + √(1 − r2) N2ij,   (10)
where o1i and o2i are the verification, f1ij and f2ij are the jth ensemble forecasts of the ith case, r1 and r2 are expectations of single-member Pearson correlation coefficients, S1i and S2i are signal components of the ith case, and N1i, N2i, N1ij, and N2ij are independent noise components. The first subscript (1 or 2) indicates the dimension. This idealized simulation mimics the MJO verification discussed in section 3b.
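A corresponding two-dimensional sketch (again illustrative only, mirroring the one-dimensional snippet above) generates one case of Eq. (10) as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_case_2d(r1, r2, m):
    """One idealized two-dimensional case (cf. Eq. 10), mimicking (RMM1, RMM2)."""
    s1, s2 = rng.standard_normal(2)   # independent signals for the two dimensions
    obs = np.array([
        np.sqrt(r1) * s1 + np.sqrt(1 - r1) * rng.standard_normal(),
        np.sqrt(r2) * s2 + np.sqrt(1 - r2) * rng.standard_normal(),
    ])
    fcst = np.column_stack([
        np.sqrt(r1) * s1 + np.sqrt(1 - r1) * rng.standard_normal(m),
        np.sqrt(r2) * s2 + np.sqrt(1 - r2) * rng.standard_normal(m),
    ])
    return obs, fcst
```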

Figure 5 displays ISNN, IGSNN, and the bivariate correlation coefficients (Gottschalck et al. 2010) with respect to the single-member correlations, r1 and r2. We found that ISNN and IGSNN are low and high, respectively, only if both r1 and r2 are high. This characteristic reflects the fact that the number of forecast members in the near-neighbors domain sharply decreases if one of the correlations (r1 or r2) decreases. In contrast, the bivariate correlation coefficient is less sensitive than ISNN and IGSNN to a low value of r1 or r2. In other words, the bivariate correlation coefficient can be relatively high if either r1 or r2 is high. We note that the model we used assumes that the variances are the same in both dimensions. If the variability differs between the dimensions, the dependency on the single-member correlation coefficient in each dimension may differ from the results presented here. Namely, the single-member correlation coefficient of the dimension with the larger variance has a greater influence on ISNN and IGSNN. This situation may occur, for instance, in unnormalized horizontal wind vectors. More detailed analysis may be presented in future studies that deal with the verification of such variables.

Fig. 5. (a) ISNN, (b) IGSNN, and (c) the bivariate correlation coefficient scores of the ensemble mean forecasts as a function of the single-member correlation scores (r1, r2). Plots are shown for the case of a radius R = 1.0 and m = 25.

b. Probabilistic verification of the MJO index

We now demonstrate the practical use of IGNN and IGSNN for the probabilistic verification of the MJO index. Figure 1a shows an example of an MJO diagram and the probability density of the observed climatology, which is used as the reference forecast, during the hindcast period of extended winters from 1980/81 to 2020/21. We have approximately 6200 samples of data from historical observations during the corresponding hindcast period.

Figure 6 shows IGNN and IGSNN versus lead time for the MJO forecasts of the ECMWF forecast system during the hindcast period. Scores using radii of 0.5 and 1 are shown. The quartile ranges of all cases are shown to indicate the variability of the scores among the cases (Figs. 6a,c). The significance level of 5% was assessed using Monte Carlo sampling (by shuffling the order of observation samples). It is apparent from Figs. 6a and 6c that the lower quartile of the IGSNN exceeds zero at a lead time of <15 (25) days for a radius of 0.5 (1), and the IGSNN scores indicate statistically significant skill up to roughly 40 (45) days for a radius of 0.5 (1). These choices of the radii (R = 0.5 and 1) are used to verify the skill of forecast ranges for short (<10 days) and long (>10 days) lead times. We found that both IGNN and IGSNN can estimate the probabilistic forecast skill reasonably, and, as expected, IGNN and IGSNN are higher for forecasts with a shorter lead time. The increase of IGNN with R = 0.5 at a short lead time (≤2 days) likely reflects the overconfidence of the forecast system or relatively large errors (Fig. 6a). Figures 6b and 6d show the ratio of the number of cases for which kf/Nf exceeded kc/Nc to the total number of cases. This ratio is the ratio of the forecast cases that are more skillful than the climatological forecasts. Averages of kf/Nf are also shown to indicate the probability of detection, which is another simple measure of the probabilistic forecast skill.

Fig. 6. The IGNN and IGSNN vs lead time for the MJO ensemble hindcasts of the ECMWF forecast system starting from November to February from 2001/02 to 2020/21. Results for radii of (a),(b) 0.5 and (c),(d) 1. The blue shadings indicate quartile ranges of IGNN for all cases. The dashed lines in (a) and (c) indicate the significance level of p = 0.05 based on Monte Carlo sampling. In (b) and (d), ratios of the number of cases of IGNN > 0 to the total number of cases are shown. Averages of kf/Nf are also shown to indicate the probability of detection. The dashed lines in (b) and (d) indicate the ratio of totally missed cases to all cases.

One important feature of the IGNN score is that it can evaluate the forecast skill of individual cases. One can therefore examine the conditional forecast skill by stratifying the cases with particular conditions. Figure 7a illustrates the dependency of the IGNN score on the initial MJO phase. The results indicate that, in the ECMWF model, the forecasts starting from MJO phases 2–3 and 3–4 tend to have slightly higher scores in week 2 and week 1, whereas the forecasts starting from MJO phases 4–5 tend to have lower scores in week 3, as previous studies have reported (Kim 2017; Kim et al. 2018; Vitart and Molteni 2010). The lower skill of the initial MJO phases 4–5 presumably reflects the model’s difficulty in predicting the propagation of MJOs through the Maritime Continent (Kim 2017; Vitart and Molteni 2010). It should be noted that the dependency of the MJO forecast skill on the initial MJO phase may differ among models (Kim et al. 2014, 2018; Lim et al. 2018).

Fig. 7. The dependency of the IGNN score on the initial MJO phase and La Niña conditions in the ECMWF model. (a) Difference of IGNN scores in different initial MJO phases from the average of the IGNN scores for all cases. (b) Difference of IGNN scores between La Niña winters (extended winters of 2005/06, 2007/08, 2010/11, 2017/18, and 2020/21) and all winters (from 2001/02 to 2020/21) for each initial MJO phase. All the IGNN scores were computed with a radius of 1 and averaged over forecast cases whose initial MJO amplitudes (√(RMM1² + RMM2²)) exceeded 1. The cross hatching indicates that the difference is statistically significant at the 90% confidence level. Numbers in parentheses denote the number of forecast cases.

The MJO forecast skill is possibly modulated by the interannual variability of El Niño–Southern Oscillation (ENSO). Figure 7b shows the modulation of IGNN in La Niña winters for different initial MJO phases. We found that forecasts starting from MJO phases 5–6 (8–1) tend to have higher (lower) scores during La Niña winters. These higher (lower) scores are associated with larger (smaller) MJO amplitudes in the observations and forecasts during La Niña winters (not shown). This result reflects the flow-dependent predictability of the MJO due to ENSO in the ECMWF model. In contrast, we found no clear change of IGNN during El Niño winters (not shown). A further study will determine if these are common features across S2S models, but such detailed analysis is beyond the scope of this paper. As demonstrated here, the proposed approach enables evaluation of the conditional probabilistic forecast skill.

4. Conclusions

In this paper we proposed the use of information-based probabilistic verification scores named the fixed-radius near-neighbors ignorance score (ISNN) and information gain (IGNN). In the proposed method, a nonparametric, near-neighbors estimator with a fixed search radius was used to compute the ISNN and IGNN of continuous quantities of ensemble forecasts.

The characteristics of the proposed scores were investigated using idealized Monte Carlo simulations. The correspondence of the proposed scores to the Pearson correlation coefficient was illustrated. The sensitivity of the proposed scores to ensemble size was also investigated. The scores were slightly better with larger ensemble sizes, as is the case for many other scores. The proper scoring characteristics were examined in terms of the mean and dispersion biases. We found that both the ISNN and CRPS could be biased for forecast dispersion if the ensemble size was small. It appears to be impractical to use the currently proposed local (ISNN) and nonlocal (CRPS) proper scores to detect the underdispersion error with small ensemble forecasts; care should be taken in the interpretation of these scores. The ISNN becomes more reliable when the ensemble size is large and the radius parameter is small. This information facilitates practical application of the proposed scores.

One of the advantages of the proposed scores is that they can be naturally extended to vector variables of multiple dimensions (the practical restriction to a few dimensions is due only to the limited ensemble size of current meteorological forecasts). Comparisons were made between the proposed scores and bivariate correlation coefficients for idealized, two-dimensional Monte Carlo simulations. A notable difference was found in the score characteristics. The ignorance gain skill score (IGSNN) is high only if the single-member correlations for both dimensions are high. In contrast, the bivariate correlation can be relatively high if the single-member correlation for either dimension is high.

This paper illustrated how the proposed method can be applied to subseasonal forecasts of the MJO index, which consists of two-dimensional data. The proposed approach enabled assessment of the probabilistic forecast skill of the MJO index. The results demonstrated that the IGNN score can be a useful score to assess the accuracy of ensemble forecasts. Another advantage of the proposed scores is that they can be computed for individual cases. We also showed the dependency of the probabilistic MJO forecast skill on the initial MJO phases and ENSO phases in the ECMWF model. We consider that the new approach for probabilistic forecast verification can support various forecast activities not only in the Earth sciences but also in other disciplines.

Acknowledgments.

The MJO forecast data were provided by the WWRP/WCRP S2S project. This work was supported by the Arctic Challenge for Sustainability II (ArCS II) program, Grant JPMXD1420318865, the MEXT program for the advanced studies of climate change projection (SENTAN), Grants JPMXD0722680395 and JPMXD0722680734, and Japan Society for the Promotion of Science KAKENHI Grant JP22H03653.

Data availability statement.

The MJO index data of the ECMWF subseasonal reforecasts, real-time forecasts, and ERA5 reanalysis are available from the institutional repository of the Meteorological Research Institute (https://climate.mri-jma.go.jp/pub/archives/Takaya-et-al_MJO-S2S/).

REFERENCES

  • Benedetti, R., 2010: Scoring rules for forecast verification. Mon. Wea. Rev., 138, 203211, https://doi.org/10.1175/2009MWR2945.1.

  • Box, G. E. P., and M. E. Muller, 1958: A note on the generation of random normal deviates. Ann. Math. Stat., 29, 610611, https://doi.org/10.1214/aoms/1177706645.

    • Search Google Scholar
    • Export Citation
  • Bröcker, J., and L. A. Smith, 2007: Scoring probabilistic forecast: The importance of being proper. Wea. Forecasting, 22, 382388, https://doi.org/10.1175/WAF966.1.

    • Search Google Scholar
    • Export Citation
  • Bröcker, J., and L. A. Smith, 2008: From ensemble forecasts to predictive distribution functions. Tellus, 60A, 663678, https://doi.org/10.1111/j.1600-0870.2008.00333.x.

    • Search Google Scholar
    • Export Citation
  • Casati, B., M. Dorninger, C. A. S. Coelho, E. E. Ebert, C. Marsigli, M. P. Mittermaier, and E. Gilleland, 2022: The 2020 International Verification Methods Workshop Online: Major outcomes and way forward. Bull. Amer. Meteor. Soc., 103, E899E910, https://doi.org/10.1175/BAMS-D-21-0126.1.

    • Search Google Scholar
    • Export Citation
  • Dawid, A. P., and P. Sebastiani, 1999: Coherent dispersion criteria for optimal experimental design. Ann. Stat., 27, 6581, https://doi.org/10.1214/aos/1018031101.

    • Search Google Scholar
    • Export Citation
  • Ferro, C. A. T., 2014: Fair scores for ensemble forecasts. Quart. J. Roy. Meteor. Soc., 140, 19171923, https://doi.org/10.1002/qj.2270.

    • Search Google Scholar
    • Export Citation
  • Gneiting, T., and A. E. Raftery, 2007: Strictly proper scoring rules, prediction, and estimation. J. Amer. Stat. Assoc., 102, 359378, https://doi.org/10.1198/016214506000001437.

    • Search Google Scholar
    • Export Citation
  • Gneiting, T., L. I. Stanberry, E. P. Grimit, L. Held, and N. A. Johnson, 2008: Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. TEST, 17, 211235, https://doi.org/10.1007/s11749-008-0114-x.

    • Search Google Scholar
    • Export Citation
  • Gombos, D., J. A. Hansen, J. Du, and J. McQueen, 2007: Theory and applications of the minimum spanning tree rank histogram. Mon. Wea. Rev., 135, 14901505, https://doi.org/10.1175/MWR3362.1.

    • Search Google Scholar
    • Export Citation
  • Goria, M. N., N. N. Leonenko, V. V. Mergel, and P. L. Novi Inverardi, 2005: A new class of random vector entropy estimators and its applications in testing statistical hypotheses. J. Nonparametric Stat., 17, 277297, https://doi.org/10.1080/104852504200026815.

    • Search Google Scholar
    • Export Citation
  • Gottschalck, J., and Coauthors, 2010: A framework for assessing operational Madden–Julian oscillation forecasts. Bull. Amer. Meteor. Soc., 91, 12471258, https://doi.org/10.1175/2010BAMS2816.1.

    • Search Google Scholar
    • Export Citation
  • Grimit, E. P., T. Gneiting, V. J. Berrocal, and N. A. Johnson, 2006: The continuous ranked probability score for circular variables and its application to mesoscale forecast ensemble verification. Quart. J. Roy. Meteor. Soc., 132, 29252942, https://doi.org/10.1256/qj.05.235.

    • Search Google Scholar
    • Export Citation
  • Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559570, https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 19992049, https://doi.org/10.1002/qj.3803.

    • Search Google Scholar
    • Export Citation
  • Hino, H., and N. Murata, 2013: Information estimators for weighted observations. Neural Networks, 46, 260275, https://doi.org/10.1016/j.neunet.2013.06.005.

    • Search Google Scholar
    • Export Citation
  • Kalnay, E., 2019: Historical perspective: Earlier ensembles and forecasting forecast skill. Quart. J. Roy. Meteor. Soc., 145, 2534, https://doi.org/10.1002/qj.3595.

    • Search Google Scholar
    • Export Citation
  • Kim, H.-M., 2017: The impact of the mean moisture bias on the key physics of MJO propagation in the ECMWF reforecast. J. Geophys. Res. Atmos., 122, 77727784, https://doi.org/10.1002/2017JD027005.

    • Search Google Scholar
    • Export Citation
  • Kim, H.-M., P. J. Webster, V. E. Toma, and D. Kim, 2014: Predictability and prediction skill of the MJO in two operational forecasting systems. J. Climate, 27, 53645378, https://doi.org/10.1175/JCLI-D-13-00480.1.

    • Search Google Scholar
    • Export Citation
  • Kim, H.-M., F. Vitart, and D. E. Waliser, 2018: Prediction of the Madden–Julian oscillation: A review. J. Climate, 31, 94259443, https://doi.org/10.1175/JCLI-D-18-0210.1.

    • Search Google Scholar
    • Export Citation
  • Kozachenko, L. F., and N. N. Leonenko, 1987: Sample estimate of entropy of a random vector. Probl. Info. Transm., 23, 95101.

  • Kraskov, A., H. Stögbauer, and P. Grassberger, 2004: Estimating mutual information. Phys. Rev., 69E, 066138, https://doi.org/10.1103/PhysRevE.69.066138.

    • Search Google Scholar
    • Export Citation
  • Kumar, A., 2009: Finite samples and uncertainty estimates for skill measures for seasonal predictions. Mon. Wea. Rev., 137, 26222631, https://doi.org/10.1175/2009MWR2814.1.

    • Search Google Scholar
    • Export Citation
  • Leutbecher, M., 2019: Ensemble size: How suboptimal is less than infinity? Quart. J. Roy. Meteor. Soc., 145, 107128, https://doi.org/10.1002/qj.3387.

    • Search Google Scholar
    • Export Citation
  • Lim, Y., S.-W. Son, and D. Kim, 2018: MJO prediction skill of the subseasonal-to-seasonal prediction models. J. Climate, 31, 40754094, https://doi.org/10.1175/JCLI-D-17-0545.1.

    • Search Google Scholar
    • Export Citation
  • Loftsgaarden, D. O., and C. P. Quesenberry, 1965: A nonparametric estimate of multivariate density function. Ann. Math. Stat., 36, 10491051, https://doi.org/10.1214/aoms/1177700079.

    • Search Google Scholar
    • Export Citation
  • Lombardi, D., and S. Pant, 2016: Nonparametric k-nearest-neighbor entropy estimator. Phys. Rev., 93E, 013310, https://doi.org/10.1103/PhysRevE.93.013310.

    • Search Google Scholar
    • Export Citation
  • Marshall, A. G., H. H. Hendon, and D. Hudson, 2016: Visualizing and verifying probabilistic forecasts of the Madden–Julian Oscillation. Geophys. Res. Lett., 43, 12 27812 286, https://doi.org/10.1002/2016GL071423.

    • Search Google Scholar
    • Export Citation
  • Matsueda, M., and H. Endo, 2011: Verification of medium-range MJO forecasts with TIGGE. Geophys. Res. Lett., 38, L11801, https://doi.org/10.1029/2011GL047480.

    • Search Google Scholar
    • Export Citation
  • Murray, S. A., 2018: The importance of ensemble techniques for operational space weather forecasting. Space Wea., 16, 777783, https://doi.org/10.1029/2018SW001861.

    • Search Google Scholar
    • Export Citation
  • Palmer, T., 2019: The ECMWF ensemble prediction system: Looking back (more than) 25 years and projecting forward 25 years. Quart. J. Roy. Meteor. Soc., 145, 1224, https://doi.org/10.1002/qj.3383.

    • Search Google Scholar
    • Export Citation
  • Peirolo, R., 2011: Information gain as a score for probabilistic forecasts. Meteor. Appl., 18, 917, https://doi.org/10.1002/met.188.

  • Pérez-Cruz, F., 2008: Estimation of information theoretic measures for continuous random variables. Advances in Neural Information Processing Systems: Proceedings of the First 12 Conferences, M. I. Jordan, Y. LeCun, and S. A. Solla, Eds., Vol. 21, Curran Associates Inc., 1257–1264.

  • Pestov, V., 2000: On the geometry of similarity search: Dimensionality curse and concentration of measure. Info. Process. Lett., 73, 4751, https://doi.org/10.1016/S0020-0190(99)00156-8.

    • Search Google Scholar
    • Export Citation
  • Roulston, M. S., and L. A. Smith, 2002: Evaluating probabilistic forecasts using information theory. Mon. Wea. Rev., 130, 16531660, https://doi.org/10.1175/1520-0493(2002)130<1653:EPFUIT>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Scheuerer, M., and T. M. Hamill, 2015: Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Wea. Rev., 143, 13211334, https://doi.org/10.1175/MWR-D-14-00269.1.

    • Search Google Scholar
    • Export Citation
  • Scott, D. W., 1979: On optimal and data-based histograms. Biometrika, 66, 605610, https://doi.org/10.1093/biomet/66.3.605.

  • Siegert, S., C. A. T. Ferro, D. B. Stephenson, and M. Leutbecher, 2019: The ensemble-adjusted ignorance score for forecasts issued as normal distributions. Quart. J. Roy. Meteor. Soc., 145, 129139, https://doi.org/10.1002/qj.3447.

    • Search Google Scholar
    • Export Citation
  • Singh, H., N. Misra, V. Hnizdo, A. Fedorowicz, and E. Demchuk, 2003: Nearest neighbor estimates of entropy. Amer. J. Math. Manage. Sci., 23, 301321, https://doi.org/10.1080/01966324.2003.10737616.

    • Search Google Scholar
    • Export Citation
  • Smith, L. A., and J. A. Hansen, 2004: Extending the limits of ensemble forecast verification with the minimum spanning tree. Mon. Wea. Rev., 132, 15221528, https://doi.org/10.1175/1520-0493(2004)132<1522:ETLOEF>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Sugiyama, M., T. Suzuki, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe, 2008: Direct importance estimation for covariate shift adaptation. Ann. Inst. Stat. Math., 60, 699746, https://doi.org/10.1007/s10463-008-0197-x.

    • Search Google Scholar
    • Export Citation
  • Tödter, J., and B. Ahrens, 2012: Generalization of the ignorance score: Continuous ranked version and its decomposition. Mon. Wea. Rev., 140, 20052017, https://doi.org/10.1175/MWR-D-11-00266.1.

    • Search Google Scholar
    • Export Citation
  • Troin, M., R. Arsenault, A. W. Wood, F. Brissette, and J.-L. Martel, 2021: Generating ensemble streamflow forecasts: A review of methods and approaches over the past 40 years. Water Resour. Res., 57, e2020WR028392, https://doi.org/10.1029/2020WR028392.

  • Vitart, F., and F. Molteni, 2010: Simulation of the Madden–Julian Oscillation and its teleconnections in the ECMWF forecast system. Quart. J. Roy. Meteor. Soc., 136, 842–855, https://doi.org/10.1002/qj.623.

  • Vitart, F., and Y. Takaya, 2021: Lagged ensembles in sub-seasonal predictions. Quart. J. Roy. Meteor. Soc., 147, 3227–3242, https://doi.org/10.1002/qj.4125.

  • Vitart, F., and Coauthors, 2017: The Subseasonal to Seasonal (S2S) prediction project database. Bull. Amer. Meteor. Soc., 98, 163–173, https://doi.org/10.1175/BAMS-D-16-0017.1.

  • Wand, M. P., and M. C. Jones, 1995: Kernel Smoothing. Springer, 224 pp.

  • Weijs, S. V., R. van Nooijen, and N. van de Giesen, 2010: Kullback–Leibler divergence as a forecast skill score with classic reliability–resolution–uncertainty decomposition. Mon. Wea. Rev., 138, 3387–3399, https://doi.org/10.1175/2010MWR3229.1.

  • Wheeler, M. C., and H. H. Hendon, 2004: An all-season real-time multivariate MJO index: Development of an index for monitoring and prediction. Mon. Wea. Rev., 132, 1917–1932, https://doi.org/10.1175/1520-0493(2004)132<1917:AARMMI>2.0.CO;2.

  • Wilks, D. S., 2020: Regularized Dawid–Sebastiani score for multivariate ensemble forecasts. Quart. J. Roy. Meteor. Soc., 146, 2421–2431, https://doi.org/10.1002/qj.3800.

  • Fig. 1.

    A schematic diagram of the fixed-radius NN approach. (a) Forecasts and observations of the MJO indices RMM1 and RMM2 for a forecast case initialized on 22 Nov 2014. Ensemble forecasts for days 1, 5, 10, 15, and 20 (colored circles) and the corresponding observations (black circles) are plotted. The color shading indicates the probability density region covering 90% of the historical observations during extended winters (1 Nov–31 Mar) from 1980/81 to 2020/21, obtained by kernel density estimation. Search circles of radii R = 0.5 and 1 are shown for day 10 only. MJO phases (P1–P8) are denoted in the diagram. (b) An enlargement of (a) around the day-10 search circle. The historical observations are plotted as light blue circles. See the text for the counting procedure of the fixed-radius NN approach; an illustrative sketch of this counting step is given after the figure captions.

  • Fig. 2.

    ISNN, IGSNN, and the correlation coefficient scores of the ensemble mean forecasts as a function of the single-member correlation score (r1). Plots are shown for a radius R = 0.5. Solid lines and closed circles indicate (a) ISNN and (b) IGSNN, and crosses indicate the correlation coefficients of the ensemble mean forecasts for reference. Colors indicate results for various ensemble sizes. The correlation coefficients of the ensemble mean forecasts are averages of correlations from 10 000 iterations, computed using Fisher's z transformation to avoid biases in the averaged correlations.

  • Fig. 3.

    The dependency of ISNN on the mean bias error (β). Colors indicate results for various ensemble sizes.

  • Fig. 4.

    The dependency of (a) ISNN, (b) the continuous ranked probability score (CRPS), and (c) the correlation coefficient of the ensemble mean forecasts on the forecast dispersion parameter (α). Colors indicate results for various ensemble sizes. In (a), the radius R is 0.5 except for the purple data, for which R is 0.2. In (c), the correlation coefficients of the 10 000 iterations were averaged using Fisher's z transformation.

  • Fig. 5.

    (a) ISNN, (b) IGSNN, and (c) the bivariate correlation coefficient scores of the ensemble mean forecasts as a function of the single-member correlation scores (r1, r2). Plots are shown for the case of a radius R = 1.0 and m = 25.

  • Fig. 6.

    IGNN and IGSNN vs lead time for the MJO ensemble hindcasts of the ECMWF forecast system initialized from November to February in the winters from 2001/02 to 2020/21. Results are for radii of (a),(b) 0.5 and (c),(d) 1. The blue shading indicates the interquartile range of IGNN over all cases. The dashed lines in (a) and (c) indicate the significance level of p = 0.05 based on Monte Carlo sampling. In (b) and (d), the ratio of the number of cases with IGNN > 0 to the total number of cases is shown. Averages of kf/Nf are also shown to indicate the probability of detection. The dashed lines in (b) and (d) indicate the ratio of totally missed cases to all cases.

  • Fig. 7.

    The dependency of the IGNN score on the initial MJO phase and La Niña conditions in the ECMWF model. (a) Difference of the IGNN scores for each initial MJO phase from the average IGNN score over all cases. (b) Difference of the IGNN scores in La Niña winters (extended winters of 2005/06, 2007/08, 2010/11, 2017/18, and 2020/21) from those in all winters (2001/02 to 2020/21) for each initial MJO phase. All IGNN scores were computed with a radius of 1 and averaged over forecast cases whose initial MJO amplitudes, √(RMM1² + RMM2²), exceeded 1. The cross hatching indicates that the difference is statistically significant at the 90% confidence level. Numbers in parentheses denote the number of forecast cases.
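The Fig. 1 caption above refers to counting points inside a search circle around the verifying observation. The following Python sketch illustrates only that counting step for a single forecast case in the (RMM1, RMM2) plane. The function name fixed_radius_counts, the synthetic data, and the crude count-based density estimates are illustrative assumptions introduced here; they are not the ISNN/IGSNN definitions of the main text, which may apply additional corrections (e.g., for zero counts or finite ensemble size).

```python
import numpy as np

def fixed_radius_counts(ensemble, climatology, obs, radius=0.5):
    """Count ensemble members and historical observations that fall within a
    circle of the given radius centred on the verifying observation in the
    (RMM1, RMM2) plane, as sketched in Fig. 1 (illustrative only)."""
    k_f = int(np.sum(np.linalg.norm(ensemble - obs, axis=1) <= radius))
    k_c = int(np.sum(np.linalg.norm(climatology - obs, axis=1) <= radius))
    return k_f, k_c

# Purely synthetic, illustrative data (not the hindcast data used in the paper).
rng = np.random.default_rng(0)
obs = np.array([1.2, -0.4])                      # verifying (RMM1, RMM2) observation
ens = obs + 0.8 * rng.standard_normal((25, 2))   # hypothetical 25-member ensemble
clim = rng.standard_normal((1000, 2))            # hypothetical climatological sample
k_f, k_c = fixed_radius_counts(ens, clim, obs, radius=0.5)

# Crude count-based density estimates at the observation: the fraction of points
# inside the circle divided by the circle area. Treat these as conceptual only;
# the paper's score definitions may differ in their handling of small counts.
area = np.pi * 0.5**2
p_fcst = max(k_f, 1) / (ens.shape[0] * area)
p_clim = max(k_c, 1) / (clim.shape[0] * area)
ignorance = -np.log(p_fcst)            # ignorance (logarithmic) score
info_gain = np.log(p_fcst / p_clim)    # information gain over the climatological reference
print(k_f, k_c, round(ignorance, 3), round(info_gain, 3))
```

In this toy setting the information gain is positive whenever the ensemble places more probability mass near the observation than the climatological sample does; averaging such values over many forecast cases is, conceptually, what the IGNN curves in Fig. 6 summarize, subject to the caveats noted above.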
