1. Introduction
In the past two decades, numerical weather forecasting has developed rapidly and—besides model improvements—has evolved from traditional single deterministic forecasts to ensemble forecasting (Gneiting et al. 2005; Bauer et al. 2015). Forecast systems differ in their overall architecture, spatial resolution, choice of initial conditions, data assimilation technology, and the physical parameterization schemes used in the numerical models. Multimodel ensemble (MME) forecasting is an effective way to make use of the forecasts from different ensemble prediction systems (EPSs), with the goal of reducing systematic deviations from observations and thus improving the overall prediction skill. Building on The Observing System Research and Predictability Experiment (THORPEX) program, which provides forecasts from different operational numerical weather prediction centers, MME forecasting is now widely used. Many studies have shown that MME forecasts are superior to the forecasts of an individual (one-model-based) EPS (Krishnamurti et al. 1999; Fraley et al. 2010; Zhi et al. 2012; Zhang et al. 2015; He et al. 2015; Ji et al. 2019).
Besides the equally weighted MME, more complex MME methods, such as linear regression (Krishnamurti et al. 1999, 2000), Bayesian model averaging (BMA; Raftery et al. 2005; Vrugt et al. 2006), ensemble MOS (EMOS; Scheuerer 2014; Scheuerer and Hamill 2015) and artificial neural networks (Yuan et al. 2007; Bakhshaii and Stull 2009) have been proposed and are already widely used for precipitation forecasting (Tebaldi et al. 2004; Ke et al. 2008), typhoon forecasts (Vijaya Kumar et al. 2003; Jordan et al. 2008), and regional climate predictions (Kharin and Zwiers 2002; Yun et al. 2005). Many studies suggest that unequally weighted MME forecasts can achieve better skill than equally weighted ones (Chen et al. 2010; Zhang and Zhi 2015; Kim and Chan 2018). Peng et al. (2002) and Ke et al. (2009) show, however, that they are not always better and may even be worse than the best individual EPS forecast.
Unequally weighted MME methods often determine the weight of each contributing EPS by their relative performance during a training period, which assumes a certain temporal stability of their forecast performance. Most methods use scores derived from point-to-point comparisons between forecasts and observations, for example, the weighted ensemble mean (WEMN; Nohara et al. 2006), the bias-removed ensemble mean (BREM; Kharin and Zwiers 2002), and the superensemble (SUP; Krishnamurti et al. 2000), which uses the mean absolute error (MAE) during a training period.
However, point-to-point verification scores [e.g., MAE or the equitable threat score (ETS)] provide only limited information about the quality of a precipitation forecast because they compare observations and predictions point by point without taking, for example, the resemblance of spatial patterns into account (Mass et al. 2002; Baldwin and Kain 2006; Gilleland et al. 2009). Precipitation is highly discontinuous in space and time. Thus, even a nearly perfect forecast of, for example, the shapes and sizes of precipitation systems may receive poor point-to-point scores because even small spatial deviations produce many false alarms and misses, an effect known as the “double penalty.” However, the correct prediction of spatial features such as the shape, size, and approximate location of extended precipitation fields is important because it can serve as valuable guidance for improving forecasts, especially of extreme weather.
Several methods have been developed to overcome the limitations of point-by-point verification; they can be categorized into filtering and displacement methods. Filtering methods generally apply smoothing or scale separation to evaluate the forecast at different spatial scales (Marsigli et al. 2006; Roberts and Lean 2008; Ebert 2008, 2009; Casati et al. 2004; Casati 2009; Zepeda-Arce et al. 2000; Harris et al. 2001; Mittermaier 2006; Marzban and Sandgathe 2009), while displacement methods identify discrete features or objects in the forecast and the observations and quantify their respective displacements in terms of location or other attributes (Ebert and McBride 2000; Baldwin and Lakshmivarahan 2003; Keil and Craig 2007; Marzban and Sandgathe 2008; Gilleland et al. 2010).
The Method for Object-Based Diagnostic Evaluation (MODE) developed by Davis et al. (2006) is adopted for calculating verification scores in this study. MODE is a typical feature-based displacement approach and an example of a spatial diagnostic technique. MODE attempts to mimic the way a human would subjectively evaluate a forecast by setting a precipitation threshold and spatially convolving (scale-dependent averaging) the precipitation field. The median of maximum interest (MMI; Davis et al. 2009) and the object-based threat score (OTS; Johnson et al. 2011a) are two scores calculated from the attributes of the objects detected in the forecast and in the observations (for more detail see sections 2c and 2d), and they are sensitive to different aspects of forecast accuracy (Johnson and Wang 2013).
Although MODE has been most commonly used for the verification of high-resolution model forecasts of convective storms, it can also be applied to lower-resolution numerical weather forecasts, regional climate simulations, or chemistry model simulations (e.g., Brown et al. 2007; Wolff et al. 2014; Li et al. 2015). The 24-h accumulated precipitation over areas of hundreds of kilometers exhibits characteristic spatial patterns that should be reproduced by model forecasts. Object-based methods allow us to evaluate whether this is indeed the case. In this study, we focus on daily precipitation forecasts with lead times of 1–7 days and aim to improve the prediction of their shape, size, and/or location with a new MME approach that employs weights derived from object-based scores. We compare its quality with the predictions of the individual EPSs, with an equally weighted MME forecast, and with an MME forecast whose weights are based on a point-to-point metric, the MAE, computed between the precipitation forecasts and the observations during a training period.
First, we evaluate ensemble forecasts of 24-h accumulated precipitation produced by five ensemble prediction systems [EPSs; i.e., those of the European Centre for Medium-Range Weather Forecasts (ECMWF), the National Centers for Environmental Prediction (NCEP), the Met Office (UKMO), the Japan Meteorological Agency (JMA), and the China Meteorological Administration (CMA)]. For each ensemble member forecast of each individual EPS, MODE is used to obtain the attributes of every identified object. The attributes of an identified object are compared to those of the best corresponding object in the observations; the performance of each EPS is then represented by the object attribute differences averaged over all objects identified for each ensemble member of that EPS.
Second, three MME predictions are computed and their forecast accuracy is evaluated using the spatial object-based measures MMI and OTS, but also using the fractions skill score (FSS) as an independent skill score that was not used for calculating the weights in any of the three MME forecasts. We compare the three MME techniques in order to investigate whether the MME forecast with object-based weights provides more accurate spatial information in the precipitation forecast.
The remainder of this paper is structured as follows. Section 2 briefly describes the datasets that were used and introduces MODE. In section 3, we present the performance evaluation of the five individual EPSs and of the three MME precipitation forecasting methods. A discussion and major conclusions are provided in section 4.
2. Data and methods
a. Data
We used 24-h accumulated precipitation ensemble forecasts produced by ECMWF, NCEP, UKMO, JMA and CMA at 0.5° × 0.5° resolution initialized daily at 1200 UTC for lead times of 1–7 days (Table 1). The data is available from the TIGGE-ECMWF portal (http://apps.ecmwf.int/datasets/data/tigge). TIGGE (The THORPEX Interactive Grand Global Ensemble) is a key component of the THORPEX program; it contains ensemble forecast data from 10 global model prediction centers and has been widely used for scientific research on ensemble forecasting, predictability and the development of products to improve the prediction of severe weather (Breivik et al. 2014; Loeser et al. 2017; Parsons et al. 2017). We analyzed the data for a 4-month period from 1 May to 31 August 2013 and over an area located in East Asia covering the region 15.05°–58.95°N, 70.15°–139.95°E.
Ensemble forecast systems used in this study. Here, SPPT is “stochastically perturbed parameterization tendencies,” SKEB is “stochastic kinetic energy backscatter,” ETKF is “ensemble transform Kalman filter,” and EDA is “ensemble of data assimilations.”
For forecast validation we selected a high-resolution gridded dataset of hourly precipitation that merges the precipitation analyses of the U.S. National Oceanic and Atmospheric Administration Climate Prediction Center morphing technique (CMORPH), provided at a spatial resolution of 8 km, with a Chinese gauge-based precipitation analysis derived from about 30 000 automatic weather stations. This merged gauge–satellite precipitation product (available at http://data.cma.cn/data/detail/dataCode/SEVP_CLI_CHN_MERGE_CMP_PRE_HOUR_GRID_0.10/) has a resolution of 0.1° × 0.1°, is constructed by optimal interpolation using the probability density functions of both products, and has been shown to be superior to other similar international products over China (Xie and Xiong 2011; Pan et al. 2012). The verification data were interpolated to 0.5° × 0.5° resolution by bilinear interpolation (Rauscher et al. 2010; Kopparla et al. 2013; Ahmed et al. 2019).
b. Method for Object-Based Diagnostic Evaluation (MODE)
MODE sets weight and confidence coefficients for predefined precipitation object attributes and calculates a total interest function based on a fuzzy logic approach, which quantifies the similarity between any two objects (Davis et al. 2006; Johnson et al. 2013). The predefined attributes are chosen by a particular user for a particular application. In general, MODE consists of four steps: identifying objects, calculating object attributes, finding matching objects between observations and predictions, and assessing the similarity of their attributes.
1) Identifying objects and object attributes
To extract the spatial boundary of an object, the original precipitation field is spatially smoothed with a convolution radius R (in grid points). Then an intensity threshold T [mm (24 h)−1] is used to define the boundaries of precipitation objects (Davis et al. 2006). The original precipitation field within these boundaries then defines the precipitation objects, which are thus solely determined by the selection of the convolution radius R, which is related to the precipitation scale, and the threshold T, which is related to the precipitation intensity. These two parameters can be chosen based on the scales of interest. The result of each step is demonstrated in Fig. 1.
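As an illustration only (not the MODE code itself), the identification step can be sketched in Python; the function name identify_objects and the use of SciPy's ndimage routines with a disk-shaped averaging kernel are our own choices:

```python
import numpy as np
from scipy import ndimage

def identify_objects(precip, radius=4, threshold=10.0):
    """Sketch of MODE-style object identification on a 2-D precipitation field.

    precip    : 2-D array of 24-h accumulated precipitation [mm]
    radius    : convolution radius R in grid points
    threshold : intensity threshold T in mm (24 h)^-1
    """
    # Circular (disk) averaging kernel with radius R grid points
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x**2 + y**2 <= radius**2).astype(float)
    kernel /= kernel.sum()

    # Step 1: convolution (scale-dependent smoothing) of the raw field
    smoothed = ndimage.convolve(precip, kernel, mode="constant", cval=0.0)

    # Step 2: threshold the smoothed field to delimit object boundaries
    mask = smoothed >= threshold

    # Step 3: label connected regions; the raw precipitation within each
    # labeled region defines an object (cf. Davis et al. 2006)
    labels, n_objects = ndimage.label(mask)
    return labels, n_objects
```

Everything downstream (attribute calculation, matching, and scoring) then operates on the labeled objects returned by such a step.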
Observed objects for 24-h accumulated precipitation on 2 Jun 2013: (a) original precipitation field before smoothing, (b) convoluted precipitation field after smoothing with a 4-gridpoint averaging radius, and (c) filtered precipitation field with the precipitation intensity greater than or equal to 10 mm.
We usually pay attention to the overall location of a precipitating system, its size and its shape, especially when dealing with more extreme weather (Johnson and Wang 2013). Therefore, the specific attributes used in our study are the area coverage of precipitation objects, their aspect ratio (the ratio of minor axis to major axis; i.e., 1.0 for a circular object and <1 otherwise) and orientation angle (the orientation of the major axis in degrees counterclockwise starting at zonal orientation), and their centroid location. For matched object pairs [introduced in section 2b(2)], attribute differences in the four mentioned object attributes (Table 2) are calculated.
Weights and confidence values for pair attributes of matched objects used in this study. Here, CD and CDI denote the centroid distance and centroid distance interest, respectively; AR is the area ratio {AR = min[area(obs), area(mod)]/max[area(obs), area(mod)]}; and K is the aspect ratio. This table is adapted from Johnson and Wang (2013).
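For illustration, the four attributes listed above could be approximated from the labeled objects of the previous sketch using scikit-image's ellipse-fit region properties; this is an illustrative stand-in, not necessarily how MODE computes them:

```python
import numpy as np
from skimage.measure import regionprops

def object_attributes(labels):
    """Approximate the four object attributes used in this study from a
    labeled object field (e.g., the output of identify_objects above)."""
    attrs = []
    for region in regionprops(labels):
        major = region.major_axis_length
        minor = region.minor_axis_length
        attrs.append({
            "area": region.area,           # object area in grid points
            "centroid": region.centroid,   # (row, col) centroid location
            # aspect ratio = minor/major axis (1.0 for a circular object)
            "aspect_ratio": minor / major if major > 0 else 1.0,
            # ellipse orientation converted to degrees; skimage measures it
            # relative to the row axis, so a conversion to the zonal
            # convention used in the text may still be required
            "orientation": np.degrees(region.orientation),
        })
    return attrs
```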
2) Object matching
Object matching creates a pair consisting of one object in the forecast field and one object in the observed field. Here, we followed Davis et al. (2006), who determined paired objects solely based on their centroid distance D and their areas: if D is small compared with the sizes of the two objects, they are declared a matched pair.
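Assuming the commonly used form of this rule, with the size of an object measured by the square root of its area (our reading of Davis et al. 2006), two objects are declared a matched pair when

\[ D < \sqrt{A_f} + \sqrt{A_o}, \]

where \(A_f\) and \(A_o\) are the areas of the forecast and the observed object, respectively.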
c. Quantification of similarity of matched object pairs
For a matched pair, its total interest I is computed via
where ci and ωi are the confidence value and the weight of the attribute i, respectively, and n is the number of attributes used. While the weight depends only on the particular attribute, the confidence value varies with the sizes and distances of the paired objects (Table 2). Gi is the interest value of the matched objects in terms of attribute i; it quantifies the degree of similarity between the objects for that attribute as a monotonic function decreasing from 1 to 0 as the attribute dissimilarity increases (Fig. 2).
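Following the standard MODE formulation (Davis et al. 2006; Johnson and Wang 2013), the total interest presumably takes the form

\[ I = \frac{\sum_{i=1}^{n} c_i\,\omega_i\,G_i}{\sum_{i=1}^{n} c_i\,\omega_i}. \]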
Interest function Gi for (a) area ratio, (b) centroid distance, (c) aspect ratio difference, and (d) angle difference. This figure is adapted from Johnson and Wang (2013).
d. Quantification of object-based forecast accuracy
The median of maximum interest (MMI; Davis et al. 2009) and the fuzzy object-based threat score (OTS; Johnson et al. 2011a) are two metrics used to quantify the similarity of the objects in the forecast and observed fields. The MMI proposed by Davis et al. (2009), called the standard MMI in the following, is the median of the maximum total interests in the forecast and observed fields, to which all objects contribute equally regardless of size. The MMI calculated in our study will be slightly larger than the standard MMI because we first determine the matched objects by their centroid distance and areas, and the total interest I is then calculated only for the matched pairs. Thus, unmatched objects are not considered.
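In the standard formulation of Davis et al. (2009), every object (forecast or observed) is assigned the maximum total interest it attains against all objects in the other field, and the MMI is the median of these maxima; schematically (our notation),

\[ \mathrm{MMI} = \operatorname{median}_{i}\Bigl\{ \max_{j} I_{ij} \Bigr\}, \]

where \(I_{ij}\) is the total interest between object i in one field and object j in the other field. In our variant, only matched pairs contribute to the median.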
The OTS is the fraction of the area of all objects that is contained in matched objects, multiplied by their total interests:
where P is the total number of object pairs; Af and Ao are the total areas of all objects in the forecast and the observed field, respectively; and the remaining terms are the total interest and the areas of the pth matched pair.
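Assuming the standard fuzzy OTS definition of Johnson et al. (2011a), which matches the verbal description above, the score reads

\[ \mathrm{OTS} = \frac{1}{A_f + A_o} \sum_{p=1}^{P} I_p \left( a_p^{f} + a_p^{o} \right), \]

where \(I_p\) is the total interest of the pth matched pair and \(a_p^{f}\) and \(a_p^{o}\) are the areas of its forecast and observed objects, respectively.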
The MMI of each EPS is the median computed from the 51 members of ECMWF, 21 members of NCEP, 24 members of UKMO, 51 members of JMA, and 15 members of CMA, respectively. The OTS of each EPS is the average OTS of its ensemble members. Both EPS scores computed over a training period are used as weights to construct the object-based MME prediction as an alternative to point-to-point metrics as used in the classical approach (see next section).
e. Different multimodel ensemble types
1) Traditional gridpoint-based multimodel ensemble
Superensembles have the potential to improve weather and climate forecast skill beyond that of individual ensemble forecasts (Kim et al. 2004; Johnson et al. 2014; Krishnamurti et al. 2016). They automatically remove the bias between the observations and the model forecasts estimated during a training period, which contributes to the improved prediction skill of multimodel forecasting. In this study, the point-to-point weighted multimodel ensemble forecast is defined as follows:
where the weight of each contributing EPS is derived from its MAE relative to the observations during the training period.
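As a rough sketch only, assuming that the weight of each EPS is inversely proportional to its training-period MAE and that its training-period mean bias \(\bar{b}_i\) is subtracted (both our assumptions), the forecast could be written as

\[ S = \sum_{i=1}^{N} w_i \left( Y_i - \bar{b}_i \right), \qquad w_i = \frac{1/\mathrm{MAE}_i}{\sum_{j=1}^{N} 1/\mathrm{MAE}_j}, \]

where N is the number of EPSs and \(Y_i\) is the ensemble mean of the ith EPS.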
2) Object-based multimodel ensemble
In this study, the weights for the MME forecasts are also calculated from object-based scores (i.e., MMI and OTS). As described above, first, precipitation objects and their attributes are identified by MODE. Second, objects in the observed field are matched to objects in the forecast field according to the matching criteria. Third, the similarity of each matched pair is determined from the differences in their attributes. Fourth, the similarity values are used to calculate the object-based metrics MMI and/or OTS, from which the object-based score of each EPS is obtained. The performance of each EPS during a training period determines its weight. Tests identified a sliding window of 30 days before the forecast period as the optimal training period. During the training period, each ensemble member of a given EPS is evaluated by MODE, and the MMI and/or OTS of this EPS is calculated as the median and/or mean over all ensemble members. Finally, the multimodel ensemble forecasts MMEMMI or MMEOTS are obtained by multiplying the ensemble mean of each contributing EPS by the weight calculated for the training period as follows:
where N is the number of EPSs, Yi is the ensemble mean of the ith EPS, and the weight of each EPS is derived from its MMI or OTS over the training period.
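Consistent with the description above, and assuming that the weights are the training-period scores normalized to sum to one (our assumption), the object-based MME forecast can be written as

\[ \mathrm{MME}_{S} = \sum_{i=1}^{N} w_i\, Y_i, \qquad w_i = \frac{S_i}{\sum_{j=1}^{N} S_j}, \]

where \(S_i\) denotes the MMI or OTS of the ith EPS over the 30-day training window.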
f. Fractions skill score
Besides the MMI and OTS, the fractions skill score (FSS; Roberts and Lean 2008; Roberts 2008), which is not used to generate the weights in the tested MMEs, is applied to evaluate the forecast skill of the individual EPSs and the MME forecasts. This spatial verification score quantifies forecast skill over different spatial scales. The FSS is calculated from the fractional coverage within a square neighborhood centered on each grid point: for a given spatial scale s, the forecast and observed areal fractions Mi and Oi of precipitation above a given threshold are computed at each grid point, and the FSS is calculated for an area divided into N subareas of size s × s as follows:
FBSworst is the largest fractions Brier score (FBS), which indicates the case when there are no common nonzero fractions between predictions and observations. The FSS ranges between 0 and 1; 0 stands for a totally mismatched forecast and 1 for a perfect forecast.
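In this notation, the definitions of Roberts and Lean (2008) are

\[ \mathrm{FBS} = \frac{1}{N} \sum_{i=1}^{N} \left( M_i - O_i \right)^2, \qquad \mathrm{FBS}_{\mathrm{worst}} = \frac{1}{N} \left( \sum_{i=1}^{N} M_i^2 + \sum_{i=1}^{N} O_i^2 \right), \qquad \mathrm{FSS} = 1 - \frac{\mathrm{FBS}}{\mathrm{FBS}_{\mathrm{worst}}}. \]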
3. Results
a. Individual objects
Convolution radius R and threshold value T are the only two parameters that influence object recognition and thus affect quantities such as the object number, area, and centroid location. In this section, we analyze the effect of the choice of R and T on the object attributes. Since the effective resolution of the model dynamics is about seven grid points (Skamarock 2004) and precipitation is generated gridpointwise in the model by the action of parameterizations, we have chosen 3 grid points as the minimum convolution radius R as a compromise. Since larger R values will smooth out especially the interesting heavy precipitation areas, we analyze the impact of different R in the intermediate range between 3 and 6 grid points. Only the results for 24-h forecasts are shown; results for other lead times are qualitatively similar.
The variation of the number of objects and their areas with precipitation threshold and convolution radius in the observations and the forecasts of the ECMWF EPS is displayed in Fig. 3. Observations and forecasts exhibit similar behavior; for example, the number of objects first increases with the precipitation threshold T up to 5 mm, then gradually decreases up to 25 mm, beyond which it decreases strongly. Generally, the forecasts produce fewer objects than the observations, suggesting that the model is more inclined to predict larger continuous precipitation areas. This bias also leads to a large number of false alarms in point-to-point statistics (not shown). The number of objects understandably decreases with increasing convolution radius R in both observations and forecasts. The average object areas also have similar dependencies on T and R for observations and forecasts, but there are differences as well. The average object areas are smaller for the forecasts than for the observations for precipitation thresholds below 5 mm, whereas the forecast areas are slightly larger than or equal to the observed areas for higher thresholds. In contrast to the number of objects, the average precipitation area is relatively insensitive to the choice of the convolution radius, presumably because the convolution enlarges objects but also flattens them, so that the fixed threshold largely compensates for the smoothing. For a given precipitation threshold T, the number of objects decreases with increasing object area for both the observations and the ECMWF EPS (Fig. 4). This decrease becomes steeper for higher precipitation thresholds. For larger precipitation thresholds the forecasts produce significantly fewer objects with larger areas than the observations (note that only the forecasts can have object numbers below 1 because these values are averages over the ensemble members). The effect of R and T on object number and area is qualitatively similar for the other four models (not shown).
(a) The total number of objects and (b) the average object area for the observations (solid lines) and the ECMWF EPS (dashed lines) for different convolution radii (colors) and precipitation thresholds (abscissa).
Average number of precipitation objects vs average object area for different precipitation thresholds T (line color and type) for (left) the observations and (right) the forecasts of the 51 members of the ECMWF EPS, and for increasing convolution radii R of (a),(b) 3; (c),(d) 4; and (e),(f) 5 grid points.
In Fig. 5 we compare the distributions of several object attributes between observations and 24-h predictions by all ensemble members of all five EPSs for a convolution radius R = 4 grid points (~220 km) and a precipitation threshold T = 10 mm as an example. This qualitative analysis of the observed and forecast daily precipitation distributions is performed in order to investigate whether the different numerical models behind the different EPSs capture the observed spatial features. The number of objects decreases rapidly with increasing area (Fig. 5a), with the models producing fewer very small areas. The object aspect ratio distributions are broad and peak around 0.6 for the observations and around 0.7 for the model forecasts (Fig. 5b). Most objects have an orientation angle between −30° and 30°, with the largest number of objects found around 15°, especially for the forecasts, which also have secondary peaks at 90° (Fig. 5c). More objects are found in the southern part of the domain, which is even more pronounced in the forecast fields, while the east–west distribution is more even for both observations and model forecasts (Figs. 5d,e). In general, the forecast and observed distributions are qualitatively similar, which demonstrates that the spatial features of 24-h accumulated precipitation are captured reasonably well by the numerical models.
Distribution of objects with specific attribute values as a fraction of the total number of objects for convolution radius R = 4 grid points and precipitation threshold T = 10 mm for observations (black bars) and 24-h lead time predictions from all members of all EPS (white bars): (a) object area, (b) object aspect ratio, (c) object orientation angle, (d) zonal grid point of object centroid, and (e) meridional grid point of object centroid.
b. Comparison of matched object attributes
We now compare the object attributes of centroid location, aspect ratio, and orientation angle for the matched object pairs. Figures 6 and 7, respectively, show the mean zonal (i.e., east–west) and meridional (i.e., north–south) centroid differences of the objects for the five EPSs compared to the observations for different convolution radii and precipitation thresholds. The mean zonal and meridional centroids of the forecast objects are generally within 0 to 2 grid points of the observed ones for all EPSs. The forecast objects from all EPSs are located west of the observed objects for thresholds below 10 mm, but for larger thresholds the objects of the NCEP, UKMO, and JMA EPSs are shifted eastward relative to the observed ones. The predicted objects are consistently shifted southward relative to the observed ones for all thresholds and convolution radii. The aspect ratio deviations between model predictions and observations are always positive but small (Fig. 8), indicating that the forecast objects are more circular than the observed objects. The orientation angle differences are on average within 0° to 10°, except for the large convolution radii at a threshold of 50 mm (Fig. 9). The positive average deviations between forecast and observed objects indicate a more meridional orientation of the former. In summary, the forecast objects are more circular, displaced to the southwest, and more meridionally oriented than the observed objects. These characteristics of the forecast objects are most likely attributable to model dynamics and physics (Johnson et al. 2011b; Johnson and Wang 2013).
The objects’ mean zonal centroid location of the individual members of the five EPS in comparison with the observation for (a) ECMWF, (b) NCEP, (c) UKMO, (d) JMA, and (e) CMA.
As in Fig. 6, but for meridional centroid location.
As in Fig. 6, but for aspect ratio.
As in Fig. 6, but for orientation angle.
Since the four main attributes are not very sensitive to the choice of R (Figs. 6–9) and the difference of object numbers between the forecast and the observation becomes small when R is larger than 3 grid points (Fig. 3), we choose R = 4 grid points to investigate the performance of object-based MME forecasting. We have chosen a threshold value of 10 mm for 24-h accumulated precipitation in order to focus on moderate to strong precipitation and exclude light precipitation, which is usually overpredicted in frequency especially by non-convection-permitting model simulations (Giorgi et al. 1992; Golding 2000; Dravitzki and McGregor 2011).
c. Multimodel ensemble forecasting
Many studies have shown advantages of MME predictions over predictions from a single EPS (Candille 2009; Beck et al. 2016; Wanders and Wood 2016; Samuels et al. 2018). We first calculate the weights based on the MAE from point-to-point statistics and on the MMI or OTS from MODE; the resulting forecasts are hereinafter abbreviated as SUP, MMEMMI, and MMEOTS, respectively. All three MME predictions are weighted ensemble mean forecasts and thus deterministic forecasts, which we evaluate with MODE using a threshold of 10 mm and a convolution radius of 4 grid points.
The object-based scores for both the individual EPSs and the three MME predictions (i.e., MMEMMI, MMEOTS, and SUP) are compared in Fig. 10. As expected, the forecast skill generally decreases with lead time for all predictions. The ECMWF EPS is more skillful than the other EPSs in terms of MMI and OTS and thus contributes relatively more to MMEMMI and MMEOTS (Fig. 11). The UKMO EPS performs well for lead times of 1–4 days, and the NCEP EPS is better for longer lead times. The relative performance of each EPS is reflected in its weight. The CMA EPS has the lowest scores and thus contributes the least (Fig. 11). The MME predictions weighted by the MMI and OTS metrics perform similarly well, and they perform better than both the individual EPSs and the traditional gridpoint-based MME prediction based on the point-to-point MAE metric for almost all lead times.
(a) MMI and (b) OTS for the five individual EPSs and the three multimodel forecasts with R = 4 grid points and T = 10 mm for lead times of 1–7 days.
Weights of the five EPSs with lead times of 1–7 days calculated by (left) MMI and (right) OTS.
To understand why the MME forecasts based on MMI and OTS are better than the single-model ensemble forecasts and the traditional point-to-point MME, we analyze the differences in the four main attributes (aspect ratio, orientation angle, and zonal and meridional centroid location) between the observed and forecast fields for all lead times (Fig. 12). For all lead times the forecast objects are on average more circular than the observed ones (Fig. 12a). The object orientation angles resulting from the traditional superensemble forecast are somewhat closer to the observations. The orientation angle of the forecast objects is on average larger than that of the observed objects; thus, the forecast objects have on average a more meridional orientation (Fig. 12b). For aspect ratio and orientation angle, the MMEMMI and MMEOTS forecasts are on average not better than the individual models and the traditional point-to-point superensemble forecast, whereas the centroid locations—both latitude and longitude—are better reproduced by both MMEMMI and MMEOTS. Given the higher weight that the centroid locations receive in Eq. (1), this is the main reason for their overall better performance. The traditional point-to-point superensemble forecast is unable to predict the location well in our case, especially the meridional centroid location, but it still beats some individual EPSs for lead times of 3 days and longer (Figs. 12c,d). The average bias of these four attributes in the MME forecasts is qualitatively similar to the bias of the individual models because all models exhibit similar error characteristics. Accordingly, an MME forecast will suffer from the same errors.
The average difference between the forecast (individual EPSs and multimodel ensemble forecasts) and observed object attribute distributions with R = 4 grid points and T = 10 mm as a function of lead time for (a) aspect ratio, (b) orientation angle, (c) zonal grid point of centroid, and (d) meridional grid point of centroid.
We evaluate the equally weighted MME mean forecast (EMME) and the two unequally weighted MME forecasts MMEOTS and SUP with the FSS, which is not used for weight determination during the training periods (Fig. 13). The results for MMEMMI are similar to those of MMEOTS and are thus not displayed in Fig. 13. The FSS always increases with scale; accordingly, it is easier to predict precipitation probabilities for larger areas. For all spatial scales and lead times, the object-based MMEOTS forecast is slightly better than the equally weighted EMME. The gridpoint-based MME (SUP) provides the best predictions when evaluated with the FSS. There may be two main reasons for this result. First, precipitation objects often have complicated shapes that are not sufficiently represented by the MODE attributes. In this study, only the orientation angle and aspect ratio are used to describe the shape of a precipitation object; thus, other meaningful precipitation information may be missed. Second, the gridpoint-based superensemble removes the bias in precipitation intensity between the observations and the model forecasts, while the object-based MME in our study reduces the spatial bias (e.g., centroid location) but not the precipitation intensity bias.
FSS against forecast lead days for spatial scales s of 1 grid point (dots), 2 grid points (asterisks), and 3 grid points (triangles) for the multimodel ensemble predictions that are based on MAE (SUP), equally weighted multimodel ensemble mean (EMME), and multimodel ensemble predictions that are based on object-based scores (MMEOTS).
4. Summary and discussion
Traditional point-to-point verification methods neglect important spatial information and are usually insensitive to errors in precipitation location and shape. MODE regards precipitation areas as objects and identifies object attributes such as their number, area, shape, and centroid location. The differences in object attributes between the model forecasts and the observations can provide important diagnostic information about prediction biases and help forecasters make better use of model forecast products.
In this study, the ensemble forecasts from five EPSs (ECMWF, NCEP, UKMO, JMA, and CMA EPS) available from the TIGGE datasets are evaluated via object attributes based on MODE. In addition, we investigate an MME technique based on object-based scores and compare it with the equally weighted multimodel ensemble mean and superensemble forecasts based on the point-to-point metric MAE.
We first analyze the impact of the convolution radius R and precipitation threshold T on the attributes of the derived precipitation objects. The number of detected objects decreases with increasing convolution radius and precipitation threshold. For all precipitation fields the number of detected objects decreases with increasing object area.
In general, the numerical models capture the distributions of the attributes of the observed precipitation objects, and their forecast skill decreases as expected with lead time. The objects' aspect ratios vary between 0.3 and 0.9, and the orientation angles are within ±30°. More objects are found in the eastern/central and southern portions of the domain than in other parts. In addition, for matched objects, the centroids of the forecast objects from all individual model ensembles are displaced southward and westward relative to the observed ones. Forecast objects tend to be more circular and more southwest–northeast oriented relative to the observed ones. The causes of these features of the forecast objects are probably related to dynamical errors and model physics.
Among the five EPSs used in this study, the ECMWF EPS performs best. When evaluated using the object-based metrics, the MME weighted by the spatial metrics outperforms all single-model EPSs and the traditional point-to-point superensemble forecast, mainly because of the better forecast object centroid locations. When all EPSs have similar error characteristics, MMEs will not help much. Thus, the causes of such biases—most probably related to model dynamics and parameterization physics—must be found and the models improved accordingly.
When evaluated with the gridpoint-based (i.e., nonobject) metric FSS, the object-based MME still performs somewhat better than the equally weighted ensemble mean but is not as good as the gridpoint-based MME prediction. This is probably attributable to the small number of attributes used in our MODE implementation and to the inherent bias removal built into the traditional superensemble. MME performance strongly depends on how the ensemble is generated, and additional metrics may be used to determine the weights for MME forecasting. Possibly, forecast skill may be further improved by combining different postprocessing methods.
The small differences between object-based and equally weighted MME forecasts, in terms of MMI and OTS (not shown), are probably due to similar model biases of the five EPSs in our study domain and suggest an extension of such studies to other domains.
The precipitation objects are identified in our study from the raw ensemble forecasts without any bias correction. Thus, the object-based scores may be improved by an appropriate bias correction. Alternatively, appropriate measures of the objects' precipitation intensity could be developed and added as object attributes, both for object pair identification and for EPS weight determination, potentially improving the forecast skill of the object-based MME beyond that of the purely gridpoint-based MME even when evaluated by gridpoint-based metrics. The object-based MME prediction results may also be further improved by excluding the EPSs that perform worst in the training period. In addition, the FSS metric could also be employed to determine the weights of the contributing EPSs. Because precipitation structures become increasingly complex as resolution increases, features such as shape and orientation are hard to define at high resolution; thus, the FSS might be an alternative to MODE.
Acknowledgments
This study was supported by the National Key Research and Development Project (Grant 2017YFC1502000), the National Natural Science Foundation of China (Grant 41575104), the Postgraduate Research and Practice Innovation Program of Jiangsu Province (Grant SJKY19_0934), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
REFERENCES
Ahmed, K., S. Shahid, N. Nawaz, and N. Khan, 2019: Modeling climate change impacts on precipitation in arid regions of Pakistan: a non-local model output statistics downscaling approach. Theor. Appl. Climatol., 137, 1347–1364, https://doi.org/10.1007/s00704-018-2672-5.
Bakhshaii, A., and R. Stull, 2009: Deterministic ensemble forecasts using gene-expression programming. Wea. Forecasting, 24, 1431–1451, https://doi.org/10.1175/2009WAF2222192.1.
Baldwin, M. E., and S. Lakshmivarahan, 2003: Development of an events-oriented verification system using data mining and image processing algorithms. Preprints, Third Conf. on Artificial Intelligence, Long Beach, CA, Amer. Meteor. Soc., 4.6, http://ams.confex.com/ams/pdfpapers/57821.pdf.
Baldwin, M. E., and J. S. Kain, 2006: Sensitivity of several performance measures to displacement error, bias, and event frequency. Wea. Forecasting, 21, 636–648, https://doi.org/10.1175/WAF933.1.
Bauer, P., A. Thorpe, and G. Brunet, 2015: The quiet revolution of numerical weather prediction. Nature, 525, 47–55, https://doi.org/10.1038/nature14956.
Beck, J., F. Bouttier, L. Wiegand, C. Gebhardt, C. Eagle, and N. Roberts, 2016: Development and verification of two convection-allowing multi-model ensembles over Western Europe. Quart. J. Roy. Meteor. Soc., 142, 2808–2826, https://doi.org/10.1002/qj.2870.
Breivik, Ø., O. J. Aarnes, S. Abdalla, J.-R. Bidlot, and P. A. E. M. Janssen, 2014: Wind and wave extremes over the world oceans from very large ensembles. Geophys. Res. Lett., 41, 5122–5131, https://doi.org/10.1002/2014GL060997.
Brown, B. G., L. Holland, J. E. Halley Gotway, R. Bullock, D. A. Ahijevych, E. Gilleland, and C. A. Davis, 2007: Application of the MODE object-based verification tool for the evaluation of model precipitation fields. 22nd Conf. on Weather Analysis and Forecasting /18th Conf. on Numerical Weather Prediction, Park City, UT, Amer. Meteor. Soc., 10A.2, https://ams.confex.com/ams/22WAF18NWP/techprogram/paper_124856.htm.
Candille, G., 2009: The multiensemble approach: The NAEFS example. Mon. Wea. Rev., 137, 1655–1665, https://doi.org/10.1175/2008MWR2682.1.
Casati, B., 2009: New developments of the intensity-scale technique within the Spatial Verification Methods Intercomparison Project. Wea. Forecasting, 25, 113–143, https://doi.org/10.1175/2009WAF2222257.1.
Casati, B., G. Ross, and D. B. Stephenson, 2004: A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteor. Appl., 11, 141–154, https://doi.org/10.1017/S135048270400123.
Chen, C. H., C. Y. Li, Y. K. Tan, and T. Wang, 2010: Research of the multi-model super-ensemble prediction based on crossvalidation. J. Meteor. Res., 68, 464–476.
Davis, C. A., B. G. Brown, and R. Bullock, 2006: Object-based verification of precipitation forecasts. Part I: Methodology and application to mesoscale rain areas. Mon. Wea. Rev., 134, 1772–1784, https://doi.org/10.1175/MWR3145.1.
Davis, C. A., B. G. Brown, R. Bullock, and J. H. Gotway, 2009: The method for Object-Based Diagnostic Evaluation (MODE) applied to numerical forecasts from the 2005 NSSL/SPC spring program. Wea. Forecasting, 24, 1252–1267, https://doi.org/10.1175/2009WAF2222241.1.
Dravitzki, S., and J. McGregor, 2011: Predictability of heavy precipitation in the Waikato River basin of New Zealand. Mon. Wea. Rev., 139, 2184–2197, https://doi.org/10.1175/2010MWR3137.1.
Ebert, E. E., 2008: Fuzzy verification of high-resolution gridded forecasts: A review and proposed framework. Meteor. Appl., 15, 51–64, https://doi.org/10.1002/met.25.
Ebert, E. E., 2009: Neighborhood verification: A strategy for rewarding close forecasts. Wea. Forecasting, 24, 1498–1510, https://doi.org/10.1175/2009WAF2222251.1.
Ebert, E. E., and J. L. McBride, 2000: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179–202, https://doi.org/10.1016/S0022-1694(00)00343-7.
Fraley, C., A. E. Raftery, and T. Gneiting, 2010: Calibrating multi-model forecast ensembles with exchangeable and missing members using Bayesian model averaging. Mon. Wea. Rev., 138, 190–202, https://doi.org/10.1175/2009MWR3046.1.
Gilleland, E., D. Ahijevych, B. G. Brown, B. Casati, and E. E. Ebert, 2009: Intercomparison of spatial forecast verification methods. Wea. Forecasting, 24, 1416–1430, https://doi.org/10.1175/2009WAF2222269.1.
Gilleland, E., D. Ahijevych, B. G. Brown, and E. E. Ebert, 2010: Verifying forecasts spatially. Bull. Amer. Meteor. Soc., 91, 1365–1376, https://doi.org/10.1175/2010BAMS2819.1.
Giorgi, F., G. T. Bates, and S. J. Nieman, 1992: Simulation of the arid climate of the southern Great Basin using a regional climate model. Bull. Amer. Meteor. Soc., 73, 1807–1822, https://doi.org/10.1175/1520-0477(1992)073<1807:SOTACO>2.0.CO;2.
Gneiting, T., A. E. Raftery, A. H. Westveld III, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, https://doi.org/10.1175/MWR2904.1.
Golding, B. W., 2000: Quantitative precipitation forecasting in the UK. J. Hydrol., 239, 286–305, https://doi.org/10.1016/S0022-1694(00)00354-1.
Harris, D., E. Foufoula-Georgiou, K. K. Droegemeier, and J. J. Levit, 2001: Multiscale statistical properties of a high-resolution precipitation forecast. J. Hydrometeor., 2, 406–418, https://doi.org/10.1175/1525-7541(2001)002<0406:MSPOAH>2.0.CO;2.
He, C. F., X. F. Zhi, Q. L. You, B. Song, and K. Fraedrich, 2015: Multi-model ensemble forecasts of tropical cyclones in 2010 and 2011 based on the Kalman filter method. Meteor. Atmos. Phys., 127, 467–479, https://doi.org/10.1007/s00703-015-0377-1.
Ji, L. Y., X. F. Zhi, S. P. Zhu, and K. Fraedrich, 2019: Probabilistic precipitation forecasting over East Asia using Bayesian model averaging. Wea. Forecasting, 34, 377–392, https://doi.org/10.1175/WAF-D-18-0093.1.
Johnson, A., and X. Wang, 2013: Object-based evaluation of a storm-scale ensemble during the 2009 NOAA Hazardous Weather Testbed Spring Experiment. Mon. Wea. Rev., 141, 1079–1098, https://doi.org/10.1175/MWR-D-12-00140.1.
Johnson, A., X. Wang, F. Kong, and M. Xue, 2011a: Hierarchical cluster analysis of a convection-allowing ensemble during the Hazardous Weather Testbed 2009 Spring Experiment. Part I: Development of the object-oriented cluster analysis method for precipitation fields. Mon. Wea. Rev., 139, 3673–3693, https://doi.org/10.1175/MWR-D-11-00015.1.
Johnson, A., X. Wang, F. Kong, and M. Xue, 2011b: Hierarchical cluster analysis of a convection-allowing ensemble during the Hazardous Weather Testbed 2009 Spring Experiment. Part II: Ensemble clustering over the whole experiment period. Mon. Wea. Rev., 139, 3694–3710, https://doi.org/10.1175/MWR-D-11-00016.1.
Johnson, A., X. Wang, F. Kong, and M. Xue, 2013: Object-based evaluation of the impact of horizontal grid points on convection-allowing forecasts. Mon. Wea. Rev., 141, 3413–3425, https://doi.org/10.1175/MWR-D-13-00027.1.
Johnson, B., V. Kumar, and T. N. Krishnamurti, 2014: Rainfall anomaly prediction using statistical downscaling in a multimodel superensemble over tropical South America. Climate Dyn., 43, 1731–1752, https://doi.org/10.1007/s00382-013-2001-8.
Jordan, M. R., T. N. Krishnamurti, and C. A. Clayson, 2008: Investigating the utility of using cross-oceanic training sets for superensemble forecasting of eastern Pacific tropical cyclone track and intensity. Wea. Forecasting, 23, 516–522, https://doi.org/10.1175/2007WAF2007016.1.
Ke, Z. J., W. J. Dong, and P. Q. Zhang, 2008: Multimodel ensemble forecasts for precipitations in China in 1998. Adv. Atmos. Sci., 25, 72–82, https://doi.org/10.1007/s00376-008-0072-y.
Ke, Z. J., W. J. Dong, P. Q. Zhang, J. Wang, and T. B. Zhao, 2009: An analysis of the difference between the multiple linear regression approach and the multimodel ensemble mean. Adv. Atmos. Sci., 26, 1157–1168, https://doi.org/10.1007/s00376-009-8024-8.
Keil, C., and G. C. Craig, 2007: A displacement-based error measure applied in a regional ensemble forecasting system. Mon. Wea. Rev., 135, 3248–3259, https://doi.org/10.1175/MWR3457.1.
Kharin, V. V., and F. W. Zwiers, 2002: Climate predictions with multimodel ensembles. J. Climate, 15, 793–799, https://doi.org/10.1175/1520-0442(2002)015<0793:CPWME>2.0.CO;2.
Kim, M. K., I.-S. Kang, C.-K. Park, and K.-M. Kim, 2004: Superensemble prediction of regional precipitation over Korea. Int. J. Climatol., 24, 777–790, https://doi.org/10.1002/joc.1029.
Kim, O. Y., and J. C. L. Chan, 2018: Cyclone-track based seasonal prediction for South Pacific tropical cyclone activity using APCC multi-model ensemble prediction. Climate Dyn., 51, 3209–3229, https://doi.org/10.1007/s00382-018-4075-9.
Kopparla, P., E. M. Fischer, C. Hannay, and R. Knutti, 2013: Improved simulation of extreme precipitation in a high-resolution atmosphere model. Geophys. Res. Lett., 40, 5803–5808, https://doi.org/10.1002/2013GL057866.
Krishnamurti, T. N., C. M. Kishtawal, T. E. LaRow, D. R. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285, 1548–1550, https://doi.org/10.1126/science.285.5433.1548.
Krishnamurti, T. N., C. M. Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, E. Williford, S. Gadgil, and S. Surendran, 2000: Multimodel ensemble forecasts for weather and seasonal climate. J. Climate, 13, 4196–4216, https://doi.org/10.1175/1520-0442(2000)013<4196:MEFFWA>2.0.CO;2.
Krishnamurti, T. N., V. Kumar, A. Simon, A. Bhardwaj, T. Ghosh, and R. Ross, 2016: A review of multimodel superensemble forecasting for weather, seasonal climate, and hurricanes. Rev. Geophys., 54, 336–377, https://doi.org/10.1002/2015RG000513.
Li, J., K. Hsu, A. AghaKouchak, and S. Sorooshian, 2015: An object-based approach for verification of precipitation estimation. Int. J. Remote Sens., 36, 513–529, https://doi.org/10.1080/01431161.2014.999170.
Loeser, C. F., M. A. Herrera, and I. Szunyogh, 2017: An assessment of the performance of the operational global ensemble forecast systems in predicting the forecast uncertainty. Wea. Forecasting, 32, 149–164, https://doi.org/10.1175/WAF-D-16-0126.1.
Marsigli, C., A. Montani, and T. Paccagnella, 2006: Verification of the COSMOLEPS new suite in terms of precipitation distribution. COSMO Newsletter, No. 6, Consortium for Small-Scale Modeling, Offenbach, Germany, 134–141, http://www.cosmo-model.org/content/model/documentation/newsLetters/newsLetter06/default.htm.
Marzban, C., and S. Sandgathe, 2008: Cluster analysis for object-oriented verification of fields: A variation. Mon. Wea. Rev., 136, 1013–1025, https://doi.org/10.1175/2007MWR1994.1.
Marzban, C., and S. Sandgathe, 2009: Verification with variograms. Wea. Forecasting, 24, 1102–1120, https://doi.org/10.1175/2009WAF2222122.1.
Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? Bull. Amer. Meteor. Soc., 83, 407–430, https://doi.org/10.1175/1520-0477(2002)083<0407:DIHRPM>2.3.CO;2.
Mittermaier, M. P., 2006: Using an intensity-scale technique to assess the added benefit of high-resolution model precipitation forecasts. Atmos. Sci. Lett., 7, 36–42, https://doi.org/10.1002/asl.127.
Nohara, D., A. Kitoh, M. Hosaka, and T. Oki, 2006: Impact of climate change on river discharge projected by multimodel ensemble. J. Hydrometeor., 7, 1076–1089, https://doi.org/10.1175/JHM531.1.
Pan, Y., Y. Shen, J. J. Xu, and P. Zhao, 2012: Analysis of the combined gauge-satellite hourly precipitation over China based on the OI technique (in Chinese). Acta Meteor. Sin., 70, 1381–1389.
Parsons, D. B., and Coauthors, 2017: THORPEX research and the science of prediction. Bull. Amer. Meteor. Soc., 98, 807–830, https://doi.org/10.1175/BAMS-D-14-00025.1.
Peng, P., A. Kumar, H. van den Dool, and A. G. Barnston, 2002: An analysis of multimodel ensemble predictions for seasonal climate anomalies. J. Geophys. Res., 107, 4710, https://doi.org/10.1029/2002JD002712.
Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, https://doi.org/10.1175/MWR2906.1.
Rauscher, S. A., E. Coppola, C. Piani, and F. Giorgi, 2010: Resolution effects on regional climate model simulations of seasonal precipitation over Europe. Climate Dyn., 35, 685–711, https://doi.org/10.1007/s00382-009-0607-7.
Roberts, N., 2008: Assessing the spatial and temporal variation in the skill of precipitation forecasts from an NWP model. Meteor. Appl., 15, 163–169, https://doi.org/10.1002/met.57.
Roberts, N., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Wea. Rev., 136, 78–97, https://doi.org/10.1175/2007MWR2123.1.
Samuels, R., and Coauthors, 2018: Evaluation and projection of extreme precipitation indices in the Eastern Mediterranean based on CMIP5 multi-model ensemble. Int. J. Climatol., 38, 2280–2297, https://doi.org/10.1002/joc.5334.
Scheuerer, M., 2014: Probabilistic quantitative precipitation forecasting using ensemble model output statistics. Quart. J. Roy. Meteor. Soc., 140, 1086–1096, https://doi.org/10.1002/qj.2183.
Scheuerer, M., and T. M. Hamill, 2015: Statistical postprocessing of ensemble precipitation forecasts by fitting censored, shifted gamma distributions. Mon. Wea. Rev., 143, 4578–4596, https://doi.org/10.1175/MWR-D-15-0061.1.
Skamarock, W. C., 2004: Evaluating mesoscale NWP models using kinetic energy spectra. Mon. Wea. Rev., 132, 3019–3032, https://doi.org/10.1175/MWR2830.1.
Tebaldi, C., L. O. Mearns, D. Nychka, and R. L. Smith, 2004: Regional probabilities of precipitation change: A Bayesian analysis of multimodel simulations. Geophys. Res. Lett., 31, L24213, https://doi.org/10.1029/2004GL021276.
Vijaya Kumar, T. S. V., T. N. Krishnamurti, M. Fiorino, and M. Nagata, 2003: Multimodel superensemble forecasting of tropical cyclones in the Pacific. Mon. Wea. Rev., 131, 574–583, https://doi.org/10.1175/1520-0493(2003)131<0574:MSFOTC>2.0.CO;2.
Vrugt, J. A., M. P. Clark, C. G. H. Diks, Q. Duan, and B. A. Robinson, 2006: Multi-objective calibration of forecast ensembles using Bayesian model averaging. Geophys. Res. Lett., 33, L19817, https://doi.org/10.1029/2006GL027126.
Wanders, N., and E. F. Wood, 2016: Improved sub-seasonal meteorological forecast skill using weighted multi-model ensemble simulations. Environ. Res. Lett., 11, 94007, https://doi.org/10.1088/1748-9326/11/9/094007.
Wolff, J. K., M. Harrold, T. Fowler, J. H. Gotway, L. Nance, and B. G. Brown, 2014: Beyond the basics: Evaluating model-based precipitation forecasts using traditional, spatial, and object-based methods. Wea. Forecasting, 29, 1451–1472, https://doi.org/10.1175/WAF-D-13-00135.1.
Xie, P. P., and A. Y. Xiong, 2011: A conceptual model for constructing high-resolution gauge-satellite merged precipitation analyses. J. Geophys. Res., 116, D21106, https://doi.org/10.1029/2011JD016118.
Yuan, H., X. Gao, S. L. Mullen, S. Sorooshian, J. Du, and H. H. Juang, 2007: Calibration of probabilistic quantitative precipitation forecasts with an artificial neural network. Wea. Forecasting, 22, 1287–1303, https://doi.org/10.1175/2007WAF2006114.1.
Yun, W. T., L. Stefanova, A. K. Mitra, T. S. V. Vijaya Kumar, W. Dewar, and T. N. Krishnamurti, 2005: A multi-model superensemble algorithm for seasonal climate prediction using DEMETER forecasts. Tellus, 57A, 280–289, https://doi.org/10.3402/tellusa.v57i3.14699.
Zepeda-Arce, J., E. Foufoula-Georgiou, and K. K. Droegemeier, 2000: Space–time rainfall organization and its role in validating quantitative precipitation forecasts. J. Geophys. Res., 105, 10 129–10 146, https://doi.org/10.1029/1999JD901087.
Zhang, H. B., and Coauthors, 2015: Study of the modification of multi-model ensemble schemes for tropical cyclone forecasts. J. Trop. Meteor., 21, 389–399.
Zhang, L., and X. F. Zhi, 2015: Multi-model consensus forecasting of low temperature and icy weather over central and southern China in early 2008. J. Trop. Meteor., 21, 67–75.
Zhi, X. F., H. X. Qi, Y. Q. Bai, and C. Z. Lin, 2012: A comparison of three kinds of multi-model ensemble forecast techniques based on the TIGGE data. Acta Meteor. Sin., 26, 41–51, https://doi.org/10.1007/S13351-012-0104-5.