1. Introduction
Many different techniques for constructing ensembles have been developed and tested, including the use of perturbations of an initial state (e.g., Toth and Kalnay 1997; Palmer et al. 1992; Molteni et al. 1996; Houtekamer et al. 1996), different combinations of physical parameterizations (e.g., Stensrud et al. 2000; Du et al. 2004), different numerical models (e.g., Du et al. 2004; Wandishin et al. 2001), and combinations of these techniques (e.g., Du et al. 2006). These methods attempt to increase the skill of the ensemble by introducing independent information so that the range of possible future atmospheric states is sampled, but the best methods for many applications, including precipitation forecasting, are still under investigation (Roebber et al. 2004). In addition, precipitation forecasts from current research ensemble systems are underdispersive; in other words, the observed state of the atmosphere too often falls outside the probability distribution function (PDF) spanned by the ensemble output (Fritsch and Carbone 2004).
One method to improve our understanding of model dispersion and ensemble forecast skill is to attempt to isolate the error sources by using different perturbation strategies for forecasts conducted over an extended period of time. This is the method used in the present study, which isolates model error using the “perfect analysis” assumption and isolates analysis error using the “perfect model” assumption (e.g., Houtekamer et al. 1996; Stensrud et al. 2000). It builds on the recent success of the National Centers for Environmental Prediction’s (NCEP) Short-Range Ensemble Forecast (SREF) system (Du et al. 2003), which has shown that short-range ensemble forecasts can provide valuable information similar to operational medium and long-range ensemble forecast systems [e.g., the Global Forecast System (GFS) Ensemble Prediction System (Toth and Kalnay 1993) and the European Centre for Medium-Range Weather Forecasts (ECMWF) Ensemble Prediction System (Molteni et al. 1996)] by adding mixed physical parameterization schemes (mixed physics) and different model formulations.
Perturbing the initial conditions (ICs) has long been the strategy for medium- and long-range global ensembles like the GFS and ECMWF Ensemble Prediction Systems, but this strategy is known to produce inadequate spread in the short range (Buizza 1997; Hamill and Colucci 1997, 1998; Stensrud et al. 2000), before error growth on the synoptic scale becomes nonlinear (Gilmour et al. 2001). Intuitively, ignoring model error and perturbing only the initial conditions should lead to severe underestimation of forecast error (Houtekamer et al. 1996). However, unlike IC errors, model error does not require a complete covariance description in an ensemble prediction system; instead, plausible realizations of model error focusing on known model deficiencies (e.g., convection, microphysics, orography, etc.) can be used (Houtekamer et al. 1996). Recent work has shown that this strategy can produce significantly more spread than IC perturbations within the first 12 h of a forecast and that it is most effective in weak forcing regimes during the warm season (Stensrud et al. 2000). In addition, mixed-model formulations can contribute further spread and skill to an ensemble (Wandishin et al. 2001; Du et al. 2004; Eckel and Mass 2005). However, even in an ensemble with perturbed ICs, mixed models, and mixed-physics parameterizations, spread is still inadequate, particularly for the sensible, mesoscale weather phenomena most important to human interests, such as precipitation (Eckel and Mass 2005).
Additional problems arise when running ensembles of limited-area models with mixed physics and/or differing model formulations because the inward sweep of the lateral boundary conditions (LBCs) suppresses the growth of spread (e.g., Errico and Baumhefner 1987; Warner et al. 1997). The time it takes for LBC infiltration to begin having adverse effects depends on the cross-domain advection time, which in turn is a function of domain size and large-scale flow (Vukicevic and Paegle 1989). In addition, because upper-atmospheric short waves entering the domain can travel faster than the ambient winds, adverse effects from the lack of LBC perturbations may occur even sooner than the cross-domain advection time would suggest. Perturbing the ICs and LBCs using members of a global ensemble is one good way to counter the loss of spread caused by using unperturbed LBCs (Hou et al. 2001), but in some applications this may not be practical or cost-efficient. For example, as local National Weather Service offices obtain the resources to run limited-area model ensembles over their county warning areas, the additional time it takes for data from global ensembles to become available, along with the extra time it takes to initialize each ensemble member with unique LBCs, could render a short-range forecast useless. In addition, even when perturbations are added to the LBCs by using members of a global ensemble, spread is still limited because of coarsely resolved and temporally interpolated LBCs (Nutter et al. 2004). It is clear that other methods for generating LBCs in regional-scale ensembles, such as those proposed by Torn et al. (2006), are needed.
The purpose of this paper is to isolate model and IC/LBC errors in two Weather Research and Forecasting (WRF; Skamarock et al. 2001; Michalakes et al. 2001) model ensembles and to compare the contributions of each error source to the spread and skill of precipitation forecasts. Each ensemble has eight members, with one ensemble composed of members with mixed physics and different model formulations with unperturbed ICs and LBCs (MP ensemble) and one ensemble composed of members using perturbed ICs and LBCs from a global ensemble with identical physical parameterizations (PILB ensemble). This is similar to using the perfect analysis and perfect model assumptions, respectively. By isolating the error sources in a regional ensemble, the window of forecast lead time before which unperturbed ICs/LBCs start to cause degradation in ensemble skill can be examined. The skill of deterministic forecasts derived using a statistical procedure known as probability matching and the spread and skill of probabilistic forecasts are examined.
The remainder of the paper is organized as follows: section 2 includes a description of the data and methodology, section 3 includes the results, and section 4 contains a summary and recommendations for future work.
2. Data and methodology
a. Ensemble member specifications
The domain of the ensemble system covers a large portion of the central United States with dimensions 1575 km × 1800 km (Fig. 1). The 16 WRF (version 2.1.1) members were initialized at 0000 UTC and integrated for 120 h on a 15-km grid with LBCs updated every 6 h for 72 cases during the following dates: 27, 29, and 31 January; 1–3, 7, and 9–10 February; and 13 February to 17 April. These dates were chosen for this study because archived forecast rainfall data were available from simulations that were conducted in real time at Iowa State University to assist forecasters. To show how the low-level temperatures and precipitation compared to climatology during this period, composites of 850-hPa temperature and precipitation rate anomalies (not shown) were constructed for the period 27 January to 17 April 2006 over the forecast domain using an interactive plotting tool on the Climate Diagnostics Center Web site (available online at http://www.cdc.noaa.gov/Composites/Day/), which utilizes variables from the NCEP–National Center for Atmospheric Research (NCAR) reanalysis (Kalnay et al. 1996). The composites showed that 850-hPa temperature anomalies were in the range 0.2°–2.4°C and average precipitation rates were above normal in an approximately 350-km-wide corridor in the center of the domain extending from northeast Wyoming to west-central Indiana. In most locations north and south of this corridor, average precipitation rates were slightly below normal. Archived storm reports available from the Storm Prediction Center Web site (available online at http://www.spc.noaa.gov/archive/) revealed that March and April were characterized by very active convective weather, with severe weather outbreaks occurring within the domain on 11 and 30 March and 2, 5–7, 11, 13, and 15–16 April. Also, a major snowstorm affected parts of Nebraska, Kansas, and Iowa on 20–21 March. In summary, a wide variety of weather events, many of which were associated with very strong synoptic-scale forcing, occurred during this relatively active period. Because IC/LBC errors should grow faster in strong forcing regimes than in weak forcing regimes (e.g., Stensrud et al. 2000), the PILB ensemble may have an extra advantage over the MP ensemble during this particular time period. However, because of the frequent convective activity that occurred in the domain during the time period analyzed, forecast precipitation is likely extremely sensitive to the different microphysics and convective schemes used in the MP ensemble, possibly giving the MP ensemble an extra advantage over the PILB ensemble.
The eight MP ensemble members use ICs and LBCs from the operational version of NCEP’s GFS model (Environmental Modeling Center 2003). The Advanced Research WRF (ARW) dynamic core (Wicker and Skamarock 2002; Skamarock et al. 2005) was used in six of the MP members and the Nonhydrostatic Mesoscale Model (NMM) dynamic core (Janjić 2003) was used in two of the MP members. The ARW (NMM) members had 31 (38) vertical levels. Microphysical and convective parameterization schemes were the only physics schemes varied because varying them has been shown to be the most efficient way to substantially increase spread (e.g., Jankov et al. 2005, 2007). However, because including perturbations to other sources of model uncertainty (e.g., boundary layer schemes, radiation schemes, etc.) would also increase spread, the MP ensemble captures only a portion of the model error. Also, it should be noted that the two WRF–NMM members in the MP ensemble have physics packages identical to those of two of the WRF–ARW members in the MP ensemble. Gallus and Bresch (2006) showed that dynamic core changes could cause rainfall forecast spread roughly similar to that caused by physics changes in a set of warm season cases. Because a large portion of the time period analyzed in this study fell during March and April, which were characterized by above-normal temperatures and convective episodes typical of the warm season, it is expected that dynamic core changes will have an impact. The Kain–Fritsch (KF; Kain and Fritsch 1993) and Betts–Miller–Janjić (BMJ; Betts 1986; Betts and Miller 1986; Janjić 1994) cumulus parameterization schemes were used in the MP ensemble; the Lin et al. (1983), Ferrier et al. (2002), and WRF single-moment six-class (Skamarock et al. 2005) microphysical parameterization schemes were used as well.
A more evenly balanced ensemble (diversified further through mixed model formulations) was not possible: six WRF–ARW and two WRF–NMM members were used because the version of the WRF–NMM core available at the time of the simulations had only one microphysical parameterization scheme and two convective parameterization schemes. Therefore, the maximum number of mixed-physics members possible using the NMM core is two, whereas the ARW core, which had nine microphysical schemes and three convective schemes available, could supply 27 mixed-physics members.
The eight PILB members are all run using the NCEP NMM core with the KF cumulus parameterization and the Ferrier et al. (2002) microphysics scheme. The NMM core was chosen because of its reduced computational cost relative to the ARW. All eight members have unique ICs and LBCs that consist of four positive and four negative bred perturbations (Toth and Kalnay 1993) from the GFS Ensemble Prediction System. A complete summary of all ensemble member specifications is contained in Table 1.
b. Evaluation of the skill and spread of the precipitation forecasts
This study will focus on forecasts of 3-, 6-, 12-, and 24-h accumulated rainfall. The observations used for verification are from the Stage IV (Baldwin and Mitchell 1997) multisensor rainfall estimates. It should be noted that the Stage IV multisensor data have been found to slightly overestimate (underestimate) rainfall amounts below (above) 0.25 in. in 24 h when compared with gauge-only data (Schwartz and Benjamin 2000). This may result in a slight artificial decrease (increase) in the biases calculated for the model output at thresholds below (above) 0.25 in. To perform the verification, the Stage IV, WRF–NMM, and WRF–ARW rainfall data were remapped to a 10-km Lambert conformal grid. The remapping of the WRF–NMM was done using postprocessing software (Chuang and Manikin 2001) that is included with the model code to convert from the E-staggered grid the WRF–NMM uses to a standard grid. The remappings for the Stage IV rainfall data from its 4-km polar stereographic grid and the WRF–ARW rainfall data from its 15-km nonstaggered A-grid were done using a neighbor-budget interpolation that conserves the total volume of liquid in the domain (a procedure typically used at NCEP). Verification was performed for both deterministic and probabilistic forecasts derived from the ensemble system. Although recent work to improve quantitative precipitation forecasts (QPFs) has stressed the need for probabilistic guidance because of its ability to express uncertainty directly (e.g., Fritsch and Carbone 2004), deterministic forecasts are also examined in this study because there are still many forecasters and other users who require a “best estimate” of the future weather (Ebert 2001). Because our postprocessed precipitation output was rounded to the nearest millimeter, objective measures are evaluated using thresholds of 0.5, 2.5, 6.5, and 25.5 mm. The skill measures used for each type of forecast are described in the following two sections.
1) Verification of deterministic forecasts
Deterministic forecasts of precipitation are calculated from the ensemble system using a statistical procedure known as probability matching. This technique can be used to blend data types with different spatial and temporal properties and is especially useful when one data type has a better spatial representation while the other has greater accuracy (Ebert 2001). A detailed description of the calculation of model forecast rainfall using probability matching is contained in Ebert (2001). Basically, if we assume that the best spatial representation of rainfall is given by the ensemble mean and the best frequency distribution of rain amounts is given by the individual model QPFs, we can reassign the rain amounts from the ensemble mean using values randomly selected from the distribution of individual model QPFs. This corrects for the large bias in rain area and the underestimation of rain amounts caused by the averaging process in fields like the ensemble mean, and it results in forecast rain fields that are much more realistic. Using an ensemble consisting of models run at different operational centers, Ebert (2001) concluded that the probability-matched forecast is the most useful deterministic ensemble rainfall forecast for forecasters.
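To make the procedure concrete, the following is a minimal NumPy sketch of a probability-matched mean in the spirit of Ebert (2001). All array names are hypothetical, and the rank-based reassignment shown here is a common implementation of the value-reassignment idea described above; it is a sketch, not the exact operational algorithm.

```python
import numpy as np

def probability_matched_mean(ens_mean, member_qpfs):
    """Sketch of a rank-based probability-matched mean.

    ens_mean: hypothetical (ny, nx) ensemble-mean rainfall field.
    member_qpfs: hypothetical (n_members, ny, nx) member QPFs.
    """
    n_members = member_qpfs.shape[0]
    flat_mean = ens_mean.ravel()
    # Pool all member amounts, sort them in descending order, and keep
    # every n_members-th value so the pool has one amount per grid point.
    pooled = np.sort(member_qpfs.ravel())[::-1][::n_members]
    # Rank the ensemble-mean grid points from wettest to driest and give
    # the kth-largest pooled amount to the kth-ranked point: the mean's
    # spatial pattern is kept while the members' amount distribution
    # replaces the smoothed amounts of the mean.
    order = np.argsort(flat_mean)[::-1]
    pm_field = np.empty_like(flat_mean)
    pm_field[order] = pooled
    return pm_field.reshape(ens_mean.shape)
```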
Equitable threat score (ETS; Schaefer 1990) and bias are used to verify the deterministic forecasts and are computed by constructing a contingency table composed of elements representing all possible forecast scenarios: hits (the model predicts an event that was observed), misses (an event occurs that was not predicted by the model), false alarms (the model predicts an event that does not occur), and correct negatives (the model correctly forecasts that an event does not occur). For a complete description of ETS and bias in terms of contingency table elements, the reader is referred to Hamill (1999). ETSs range from −⅓ to 1; scores of 0 or below indicate no skill and 1 represents a perfect score. Bias values range from 0 to infinity. Values of bias significantly higher (lower) than 1 indicate that the model notably overpredicted (underpredicted) areal coverage.
Average ETSs and bias scores were calculated by summing the contingency table elements from all of the forecasts and computing the scores from the summed elements. This method gives a greater weight to large precipitation events than would result from simply averaging the measures valid for each case.
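As an illustration of the verification just described, a minimal NumPy sketch of the contingency-table construction and the resulting ETS and bias is given below; the array names are hypothetical, and summing the table elements over all cases before computing the scores implements the weighting described above.

```python
import numpy as np

def contingency(fcst, obs, thresh):
    """Contingency-table elements for one forecast at one threshold."""
    f, o = fcst >= thresh, obs >= thresh
    return np.array([np.sum(f & o),     # hits
                     np.sum(~f & o),    # misses
                     np.sum(f & ~o),    # false alarms
                     np.sum(~f & ~o)])  # correct negatives

def ets_and_bias(table):
    """ETS (Schaefer 1990) and bias from (summed) table elements."""
    hits, misses, fa, _ = table
    # Hits expected by chance; subtracting them makes the score equitable.
    hits_rand = (hits + misses) * (hits + fa) / table.sum()
    ets = (hits - hits_rand) / (hits + misses + fa - hits_rand)
    bias = (hits + fa) / (hits + misses)
    return ets, bias

# Summing the elements over all cases, then scoring, weights large events:
# total = sum(contingency(f, o, 6.5) for f, o in case_pairs)  # hypothetical
```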
To determine the times at which differences in ETSs and biases were statistically significant, Hamill’s (1999) resampling methodology was used at the standard significance level α = 0.05. This procedure was strictly followed and repeated 1000 times for comparisons of ETSs and biases at each forecast hour for precipitation accumulated at 3-, 6-, 12-, and 24-h intervals.
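A hedged sketch of this resampling test, reusing ets_and_bias from the previous sketch, is shown below; tables_a and tables_b are hypothetical per-case contingency tables for the two ensembles.

```python
import numpy as np

def resampling_test(tables_a, tables_b, n_iter=1000, seed=0):
    """Sketch of a Hamill (1999)-style resampling significance test.

    tables_a, tables_b: hypothetical (n_cases, 4) arrays of per-case
    contingency-table elements for the two ensembles being compared.
    Returns the 2.5th and 97.5th percentiles of the null distribution
    of ETS differences (alpha = 0.05, two sided).
    """
    rng = np.random.default_rng(seed)
    n_cases = tables_a.shape[0]
    null_diffs = np.empty(n_iter)
    for i in range(n_iter):
        # Under the null hypothesis the two ensembles are exchangeable,
        # so randomly swap their tables case by case before summing.
        swap = rng.random(n_cases) < 0.5
        g1 = np.where(swap[:, None], tables_b, tables_a).sum(axis=0)
        g2 = np.where(swap[:, None], tables_a, tables_b).sum(axis=0)
        null_diffs[i] = ets_and_bias(g1)[0] - ets_and_bias(g2)[0]
    return np.percentile(null_diffs, [2.5, 97.5])

# The observed ETS difference is deemed significant when it falls
# outside the returned percentile range.
```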
2) Verification of probability forecasts
The area under the relative operating characteristic curve (ROC score; Mason 1982) will be used to evaluate the probabilistic forecasts from the MP and PILB ensembles. The ROC score is closely related to the economic value of a forecast system (e.g., Mylne 1999; Richardson 2000, 2001). Its purpose is to provide information on the characteristics of forecast systems upon which management decisions can be made. The derivation of the ROC score is based on the elements of a contingency table for probabilistic forecasts. To construct the ROC curve, the probability of detection (POD) is plotted against the probability of false detection (POFD) at each forecast probability of the forecast system. The area under the ROC curve, which begins at (0, 0) and ends at (1, 1), is calculated using the trapezoidal method, which is applied by adding the areas of the trapezoids formed by connecting the points on the ROC curve. For a complete description of the ROC score, POD, and POFD in terms of contingency table elements, the reader is referred to Wandishin et al. (2001). The range of values for the ROC score is 0 to 1. A score of 1 represents a perfect forecast, a score of 0.5 or below indicates no skill, and a score of 0.7 is said to represent the lower limit of a useful forecast (Buizza et al. 1999). Hamill’s (1999) resampling methodology, as described in the previous section, was used to test statistical significance.
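A minimal sketch of this construction follows, assuming an 8-member ensemble so that forecast probabilities are multiples of 1/8; prob and event are hypothetical flattened arrays of forecast probabilities and observed occurrences.

```python
import numpy as np

def roc_area(prob, event):
    """Sketch: area under the ROC curve by the trapezoidal method."""
    pofd, pod = [0.0], [0.0]
    # Sweep the decision threshold from strictest (p >= 1) to most
    # permissive (p >= 0); the curve then runs from (0, 0) to (1, 1).
    for p in np.arange(1.0, -1e-6, -0.125):  # steps of 1/8 (8 members)
        yes = prob >= p
        pod.append(np.sum(yes & event) / max(np.sum(event), 1))
        pofd.append(np.sum(yes & ~event) / max(np.sum(~event), 1))
    # Integrate POD over POFD by summing trapezoid areas.
    return np.trapz(pod, pofd)
```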
3) Evaluation of statistical consistency and ensemble spread
An ensemble system should be designed to exhibit statistical consistency; in other words, the mean-square error (MSE) of the ensemble mean should match the ensemble variance (Talagrand et al. 1999). For the formal definition of statistical consistency used to evaluate the ensemble systems in this study, the reader is referred to Eckel and Mass (2005).
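For reference, a commonly used small-ensemble form of this condition, which we believe is consistent with the definition in Eckel and Mass (2005), is

```latex
% Statistical consistency for an n-member ensemble:
% mean-square error of the ensemble mean equals (n+1)/n times the
% average ensemble (sample) variance.
\overline{\left(\bar{f}-o\right)^{2}} \;=\; \frac{n+1}{n}\,\overline{s^{2}}
```

where f̄ is the ensemble mean, o the verifying observation, s² the sample variance of the members, and the overbars denote averages over many forecasts.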
In addition to the ensemble variance, the spread ratio is also used to evaluate ensemble spread. The spread ratio is defined as the ratio of the union of two or more fields to the intersection of the fields and is formally defined by Stensrud and Wandishin (2000). Because the spread ratio is measured at different rainfall thresholds, it can provide additional information to an analysis of spread in an ensemble system that metrics such as ensemble variance or root-mean-square difference (Lorenz 1969) are not able to provide. Also, it is especially useful for evaluating discontinuous fields such as precipitation because it can be used to evaluate the divergence of the forecast fields with time (Stensrud and Wandishin 2000).
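As a sketch of the definition above (union of the member exceedance areas divided by their intersection), with a hypothetical member_qpfs array:

```python
import numpy as np

def spread_ratio(member_qpfs, thresh):
    """Spread ratio at one rainfall threshold: the area covered by ANY
    member (union) divided by the area covered by ALL members
    (intersection). member_qpfs: hypothetical (n_members, ny, nx)."""
    exceed = member_qpfs >= thresh
    union = np.any(exceed, axis=0).sum()
    intersection = np.all(exceed, axis=0).sum()
    # Identical member fields give a ratio of 1; fields with no common
    # overlap give an unbounded ratio.
    return union / intersection if intersection > 0 else np.inf
```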
Rank histograms (Hamill 2001) are an additional tool used to assess ensemble spread. Rank histograms are constructed by repeatedly tallying the rank of the rainfall observation relative to forecast values from an ensemble sorted from highest to lowest. Generally, a flat rank histogram is a sign of reliability; a u-shaped rank histogram indicates a lack of spread in the ensemble; and an n-shaped rank histogram indicates too much spread in the ensemble (Hamill 2001).
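A minimal sketch of the tallying step is given below with hypothetical arrays; the sketch sorts the members in ascending order, which merely mirrors the histogram without changing its shape, and it omits the random tie-breaking that Hamill (2001) recommends for, e.g., zero-rain points.

```python
import numpy as np

def rank_histogram(member_qpfs, obs):
    """Tally of the rank of each observation among the sorted member
    forecasts. member_qpfs: hypothetical (n_members, npts) array;
    obs: hypothetical (npts,) array of verifying rainfall."""
    n_members, npts = member_qpfs.shape
    counts = np.zeros(n_members + 1, dtype=int)
    sorted_members = np.sort(member_qpfs, axis=0)
    for j in range(npts):
        # Rank = number of member values below the observation; ties
        # (common for zero rainfall) should be broken randomly in
        # practice (Hamill 2001), a step omitted in this sketch.
        counts[np.searchsorted(sorted_members[:, j], obs[j])] += 1
    return counts
```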
3. Results
a. Deterministic forecasts
The skill of the deterministic forecasts derived from each ensemble using the probability matching method is compared by constructing time series of ETSs from each ensemble for 3-, 6-, 12-, and 24-h intervals at the 0.5-, 6.5-, and 25.5-mm rainfall thresholds (Fig. 2). Generally, the ETSs increase for the longer accumulation periods because the scores for the longer accumulation periods are affected less by timing errors (Wandishin et al. 2001). For example, if a model forecast predicted rainfall 4 h too early, the 3-h accumulation period may miss the entire event while the 6-h period may capture most of the event. Also note the slopes of the ETS time series. Intuitively, a general decreasing trend caused by the model and initial condition errors growing with time is expected. However, when the time series reaches a constant value it is assumed that the model has reached its limit of predictability and any observed skill is equal to that of climatology. Therefore, the 0.5-mm threshold ETS time series flattens out at higher values than the 6.5-mm ETS time series because rain events above 0.5 mm occur more frequently than rain events above 6.5 mm. Also, the time at which the ETS time series flattens occurs later in the forecast for the longer accumulation periods because timing errors are minimized, as discussed above.
The diurnal cycle of rainfall has an impact on the scores, as is evident from the relative maxima (minima) occurring at approximately 1200 UTC (0000 UTC) in some of the scores every 24 h (Fig. 2). The maxima (minima) correspond to the time at which the propagating component of the diurnal cycle in the Midwest is at its maximum (minimum) amplitude (Carbone et al. 2002), which is most clearly seen at the 0.5- and 6.5-mm rainfall thresholds for 3- and 6-h accumulation intervals (Figs. 2a and 2b). There is no signal from the diurnal cycle in the ETSs at the 25.5-mm rainfall threshold because errors have already completely saturated the scores at this threshold before the first 24-h forecast period.
As noted in the introduction, a goal of this study is to examine the window of forecast lead time before which unperturbed ICs/LBCs start to cause degradation in ensemble forecast skill. Because the MP ensemble does not use perturbed ICs/LBCs, a decrease in its ETSs relative to those of the PILB ensemble is expected at some forecast lead time. This expected degradation in skill from the MP ensemble relative to the PILB ensemble was not observed (Fig. 2). Although the MP ensemble tended to have slightly higher ETSs near the beginning of the forecast period and the PILB ensemble tended to have slightly higher ETSs near the end of the forecast period, the differences were not significant at any forecast hour.
However, statistically significant differences in bias were present at the majority of forecast lead times at the 0.5- and 6.5-mm rainfall thresholds, and at a few of the forecast lead times at the 25.5-mm rainfall threshold, with the MP ensemble having a higher bias than the PILB ensemble in all cases. Because Hamill (1999) notes that comparing ETSs from forecasts with differing biases can give the forecast with the higher bias an unfair advantage, the ETSs from the MP ensemble may have been artificially inflated from the high biases. As will be shown later, the higher biases in the MP ensemble are caused by the choice of dynamic core.
b. Probabilistic forecasts
The skill of the probabilistic forecasts from each ensemble is compared by constructing time series of ROC scores from each ensemble for 3-, 6-, 12-, and 24-h accumulation periods at the 0.5- and 2.5-mm rainfall thresholds (Fig. 3). At all of the accumulation periods and rainfall thresholds for which ROC scores are calculated, the MP ensemble appears to have higher scores for a period at the beginning of the forecast and the PILB ensemble appears to have higher scores for the rest of the forecast. As will be discussed in the next section, the spread behaves similarly. Differences in ROC scores at forecast lead times in which the MP ensemble had higher average scores were not statistically significant for any of the accumulation intervals or rainfall thresholds examined. However, differences in ROC scores at forecast lead times in which the PILB ensemble had higher average scores were statistically significant at all accumulation intervals and at both the 0.5- and 2.5-mm rainfall thresholds. The earliest forecast lead time for which the PILB ensemble ROC scores were higher with statistical significance occurred at the 0.5-mm rainfall threshold for 3-h accumulation periods at forecast hour 72 (Fig. 3a).
At all accumulation periods examined, it appears that the MP ensemble ROC scores are higher, relative to the PILB ensemble ROC scores, at the 0.5-mm threshold than at the 2.5-mm threshold. In other words, the difference between the ROC scores [ROC(PILB) − ROC(MP)] appears to be smaller at the 0.5-mm threshold than at the 2.5-mm threshold. Because the difference in bias [bias(MP) − bias(PILB)] is larger at the 0.5-mm threshold than at the 2.5-mm threshold for 3- and 6-h accumulation intervals (not shown), it is speculated that the higher relative bias in the MP ensemble may be artificially inflating the MP ensemble ROC scores. A similar artificial increase in forecast skill using other metrics such as ETS has been shown by Hamill (1999). However, at 12- and 24-h accumulation intervals, the difference in bias [bias(MP) − bias(PILB)] is actually smaller at the 0.5-mm threshold than at the 2.5-mm threshold (not shown). Thus, the MP ensemble may truly have better forecast skill relative to the PILB ensemble for the 0.5-mm threshold.
Because it has been shown that forecast skill metrics can be sensitive to differing bias, it is useful to diagnose the sensitivity of areal precipitation coverage to different types of model perturbations. To examine the sensitivity of areal precipitation coverage above 0.5 and 2.5 mm to the physics choice, the number of grid points forecast to exceed 0.5 and 2.5 mm of precipitation by all KF and BMJ members of the MP ensemble that use the ARW dynamic core (three KF versus three BMJ members) are summed for each forecast hour over all 72 cases (Fig. 4a). To examine the sensitivity of areal precipitation coverage to dynamic core choice, a similar procedure is used to sum the number of grid points forecast to exceed 0.5 and 2.5 mm of precipitation by the NMM and ARW members of the MP ensemble that use the same physics packages (two NMM versus two ARW members; see Fig. 4b). Note that when multiple members from a subset forecast precipitation at one grid point, that grid point is only counted once. Figures 4a and 4b show both that the total areal precipitation coverage above 0.5 and 2.5 mm, respectively, appears to be more sensitive to the choice of dynamic core than to the choice of physics and that the inclusion of the ARW members in the MP ensemble causes the larger areal precipitation coverage above 0.5 mm in the MP ensemble. Trends in areal coverage are similar to trends in rain volume (which will be shown later) and agree with what was found by Gallus and Bresch (2006). Note that small oscillations with a 6-h period at the lightest rainfall threshold can be seen in Fig. 4b. The oscillations appeared only after the WRF–NMM data were remapped to a standard grid by the WRF–NMM postprocessor, and this signal is thus spurious. The oscillations do not affect the results.
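The counting step amounts to a union over the member exceedance masks; a minimal sketch with a hypothetical member_subset array for one forecast hour:

```python
import numpy as np

def areal_coverage(member_subset, thresh):
    """Number of grid points at which ANY member of the subset exceeds
    the threshold, so a point hit by several members counts once.
    member_subset: hypothetical (n_members, ny, nx) rainfall array."""
    return int(np.any(member_subset >= thresh, axis=0).sum())

# Totals such as those in Fig. 4 would follow from summing this count
# over all 72 cases at each forecast hour (hypothetical usage).
```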
c. Statistical consistency and ensemble spread
1) Statistical consistency
To analyze and compare the statistical consistency of each ensemble, the MSE of the ensemble mean and the ensemble variance calculated for 3-, 6-, 12-, and 24-h accumulation periods are examined (Fig. 5). In both ensembles a general lack of statistical consistency is very apparent. In the MP ensemble, the ensemble variance stops increasing shortly after forecast hour 24, but the MSE continues to increase, resulting in an increasing lack of statistical consistency with forecast lead time. However, in the PILB ensemble, the ensemble variance and MSE both increase throughout the entire forecast period, with the ensemble variance increasing at a faster rate than the MSE, resulting in increasing statistical consistency with increasing forecast lead time. In fact, near the end of the forecast period at 3-, 6-, and 12-h accumulation intervals, the ensemble variance and MSE in the PILB ensemble appear to be nearly equal. However, note that the ensemble variance and MSE in the PILB ensemble only appear to become very close near minima in the diurnal precipitation cycle. When the entire diurnal cycle is considered in the 24-h accumulation intervals, there continues to be an apparent lack of statistical consistency. It is encouraging that the PILB ensemble appears to come close to being statistically consistent near the end of the forecast period, a trend indicating good reliability, or agreement between forecast probability and mean observed frequency. However, by the time the PILB ensemble becomes reliable, the resolution (i.e., the ability of the forecast to discriminate between “events” and “nonevents”) as measured by the ROC scores (Fig. 3) is considerably lower than at the beginning of the forecast, even though the ROC scores are still above the 0.7 minimum threshold (Buizza et al. 1999) for a useful forecast.
At all of the accumulation periods, the MP ensemble variances appear to start off higher than those of the PILB ensemble during the first 24 h of the forecast and then appear to become lower at approximately the same time the MP ensemble variance stops increasing (Fig. 5). However, at virtually all forecast lead times the MSE of the MP ensemble appears to be greater than that of the PILB ensemble. Because the MSE is sensitive to errors in heavy rainfall amounts and the MP ensemble has a higher bias than the PILB ensemble, the higher MSE in the MP ensemble is not unexpected. In addition, because the ensemble variance is also sensitive to heavy precipitation amounts, heavy amounts will increase the ensemble variance as long as they are not exactly collocated among the members. Thus, it is very possible that a bias correction procedure (e.g., Eckel and Mass 2005) applied to the MP ensemble forecasts would result in the MP ensemble variance remaining lower than the PILB ensemble variance, even during the first 24 h of the forecast.
Average error growth rates can be approximated by fitting a least-squares line to the MSE results in Fig. 5. This procedure yields average error growth rates for the PILB (MP) ensemble of −0.2% (0.8%), 0.1% (6.9%), 13.2% (51.1%), and 126% (285%) for the 3-, 6-, 12-, and 24-h accumulation periods, respectively. These results are important to consider for ensemble design because any perturbation strategy needs to capture the different error growth rates to be reliable. The lack of error growth at the short accumulation intervals likely occurs because small scales (i.e., convection) are being captured and the error growth at these small scales saturates quickly. At the longer accumulation periods, larger-scale phenomena, for which the error growth does not saturate within the 120-h forecast period examined, are being captured, resulting in larger average error growth rates than at the shorter accumulation periods. The higher average error growth rates in the MP ensemble relative to the PILB ensemble at each accumulation interval are likely caused by the lack of spread after around forecast hour 24 in the MP ensemble.
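The fitting step itself is straightforward; because the text does not state how the fitted slope is normalized into a percentage, the conversion below is one plausible reading and is flagged as an assumption. The arrays are hypothetical.

```python
import numpy as np

def average_growth_rate(hours, mse, accum_hours):
    """Least-squares linear fit to an MSE time series.

    hours: hypothetical array of forecast hours; mse: the corresponding
    domain-averaged MSE values; accum_hours: accumulation period (h).
    The percentage normalization (slope per accumulation period,
    relative to the fitted value at the first hour) is an assumption,
    not the paper's stated definition.
    """
    slope, intercept = np.polyfit(hours, mse, 1)
    start = intercept + slope * hours[0]
    return 100.0 * slope * accum_hours / start
```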
A coherent diurnal oscillation is evident in the ensemble variance and MSE from both ensembles at 3-, 6-, and 12-h accumulation periods (Figs. 5a–c). Because the amplitude of the oscillations in MSE is greater than that of the oscillations in ensemble variance in both the MP and PILB ensembles, it is likely that the ensembles do not represent the diurnal cycle of precipitation well. The inability of the ensembles to represent the diurnal cycle of precipitation [a deficiency that has been well documented in other numerical models (e.g., Davis et al. 2003; Clark et al. 2007)] is confirmed by a comparison of time series of average domain rain volume for the ensemble means, observations, and all ensemble members (Fig. 6). The amplitude of the diurnal precipitation cycle in both ensemble means as well as in all ensemble members is less than the amplitude of the observed diurnal precipitation cycle (Fig. 6). Also, on average, all of the ARW members forecast more rainfall than the NMM members, matching the findings of Gallus and Bresch (2006). If the representation of the diurnal cycle by the ensemble members were improved, perhaps through the use of convection-resolving models (e.g., Clark et al. 2007) or postprocessing calibration (e.g., Eckel and Mass 2005), the statistical consistency would also improve.
2) Spread ratio
The spread ratio provides additional information regarding ensemble spread because it can be calculated for different rainfall thresholds. Thus, to compare the spread ratios of the MP and PILB ensembles, time series of spread ratio are computed from both ensembles at the 0.5- and 2.5-mm rainfall thresholds (Fig. 7). Generally, these time series reveal that the spread ratio increases more at the 2.5-mm threshold than at the 0.5-mm threshold. This difference arises simply because areas of heavier rainfall are smaller than areas of lighter rainfall, so small displacements of the rainfall areas produce larger increases in spread ratio at the higher thresholds. Also, the effects of the unperturbed ICs/LBCs on the MP ensemble forecasts become very apparent. For all accumulation periods and rainfall thresholds, spread ratios level out after about forecast hour 24 in the MP ensemble, while the spread ratios in the PILB ensemble become much larger, a direct result of using perturbed (unperturbed) ICs/LBCs in the PILB (MP) ensemble.
Although the time series of ensemble variance show that the MP ensemble had greater spread during the first 24 h of the forecast, the spread ratios did not indicate this. At all forecast hours and rainfall thresholds, the PILB ensemble had higher spread ratios than the MP ensemble. This discrepancy may result from the higher bias in the MP ensemble relative to the PILB ensemble affecting the ensemble variance and spread ratio in different ways. As discussed previously, increased bias should result in increased ensemble variance when areas of heavier rainfall are not exactly collocated. However, an increased bias may actually decrease the spread ratio. Consider two idealized circular rain areas separated by some distance so that the circles nearly overlap. Initially, the spread ratio would be equal to infinity, but if the areas of the circles increase while remaining in the same location (i.e., increasing bias), the spread ratio will begin to decrease from infinity and approach one as both circles nearly completely overlap each other. Thus, if the MP ensemble spread ratios were recalculated after a bias correction was applied, the spread ratios should actually increase.
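This geometric argument is easy to verify numerically with the spread_ratio function sketched in section 2; the grid and disk geometry below are hypothetical.

```python
import numpy as np

yy, xx = np.mgrid[0:200, 0:200]

def disk(cx, cy, r):
    # 1 mm of "rain" inside a disk of radius r, 0 mm outside
    return ((xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2) * 1.0

# Two disks whose centers are 20 grid lengths apart: growing the radius
# (increasing "bias") drives the spread ratio from infinity toward 1.
for r in (9, 15, 40, 90):
    fields = np.stack([disk(90, 100, r), disk(110, 100, r)])
    print(r, spread_ratio(fields, 0.5))  # spread_ratio as in section 2
```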
3) Rank histograms
To gain more information on how well each ensemble represents the forecast uncertainty, rank histograms are constructed for each ensemble for 3-, 6-, 12-, and 24-h accumulation periods at forecast hours 24, 48, 72, 96, and 120 (Fig. 8). At each accumulation interval, with increasing forecast lead time, the PILB ensemble rank histograms become flatter, indicating an increase in reliability, while the MP ensemble rank histograms become increasingly u-shaped, indicating a decrease in reliability. These results are consistent with the analysis of statistical consistency (Fig. 5). In fact, when MSE and ensemble variance appear to be nearly equal at forecast hour 120 in the PILB ensemble for 3-, 6-, and 12-h accumulation periods (Figs. 5a–c), the corresponding rank histograms are nearly flat (neglecting the right-skewed appearance).
The right-skewed appearance of the rank histograms from both ensembles indicates that most of the ensemble members are overpredicting precipitation. Both ensembles become more heavily right skewed as accumulation intervals increase, with the MP ensemble being slightly more right skewed than the PILB ensemble, likely a result of the inclusion of ARW members in the MP ensemble.
d. Contributions to PILB ensemble variance: ICs versus LBCs
While the PILB ensemble contains both IC and LBC perturbations, it is important to realize that the perturbed LBCs likely contribute the majority of the ensemble variance. Previous works illustrating that perturbed LBCs have a greater impact on error growth in limited-area models are discussed in a thorough literature review contained in Nutter et al. (2004).
To gain further insight on contributions to error growth from IC and LBC perturbations in our study, an additional set of simulations was performed for seven cases using an ensemble with nonperturbed ICs and perturbed LBCs (NIC ensemble). To approximate the contribution of perturbed ICs to the PILB ensemble variance, the NIC ensemble variance, which represents the variance contributed by perturbed LBCs, was subtracted from the PILB ensemble variance. Then, the fraction of the total variance from the perturbed ICs at each forecast hour was calculated. These fractions can be expressed as [Var(PILB) − Var(NIC)]/Var(PILB) and are plotted in Fig. 9 at 3-h intervals for precipitation and 500- and 850-hPa geopotential height. For the seven cases analyzed, the impact of perturbed ICs decreases to around 50% of the PILB ensemble variance by forecast hour 12 for rainfall and by forecast hour 6 for 500- and 850-hPa geopotential heights. By forecast hour 36, the contribution to the PILB ensemble variance from perturbed ICs decreases to around 0% in all variables analyzed. Thus, it is very likely that the perturbed LBCs are, in fact, contributing a majority of the PILB ensemble variance throughout most of the 120-h forecast period. In addition, these results imply that a lack of perturbed LBCs, not ICs, is likely the primary reason for the lack of spread in the MP ensemble after forecast hour 24.
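The fraction plotted in Fig. 9 can be written compactly as follows, with hypothetical arrays of domain-averaged ensemble variance at each forecast hour:

```python
import numpy as np

def ic_variance_fraction(var_pilb, var_nic):
    """Fraction of PILB ensemble variance attributable to the IC
    perturbations: [Var(PILB) - Var(NIC)] / Var(PILB). The NIC
    variance approximates the contribution of the perturbed LBCs
    alone, as described above."""
    return (np.asarray(var_pilb) - np.asarray(var_nic)) / np.asarray(var_pilb)
```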
4. Summary and future work
An experiment was designed to examine the contributions of IC/LBC errors and model errors to the spread and skill of 120-h precipitation forecasts from two 15-km grid-spacing WRF ensembles composed of eight members each with a domain centered over the central United States. The forecasts were conducted for a period during late winter/early spring 2006. In one ensemble the perfect analysis assumption was made, isolating model errors by using different physical parameterizations and dynamic cores while using unperturbed ICs and LBCs. In the other ensemble the perfect model assumption was made, isolating IC/LBC errors by using perturbed ICs and LBCs from GFS ensemble members while using the same dynamic core and physics parameterizations. By isolating the error sources, the window of forecast lead time over which the unperturbed ICs/LBCs in the MP ensemble begin to cause degradation in ensemble forecast skill can be examined.
Verification was performed on deterministic forecasts computed from each ensemble using the probability matching method. Time series of ETSs for a number of rainfall thresholds revealed that the MP and PILB ensembles exhibited similar skill with differences in scores that were not statistically significant at any of the forecast lead times or rainfall thresholds examined. However, the differences in biases between the two ensembles were statistically significant at many of the forecast lead times at all rainfall thresholds examined, with the MP ensemble always having a bias greater than the PILB ensemble. The greater bias in the MP ensemble compared to the PILB ensemble was caused by the inclusion of the ARW members in the MP ensemble. The PILB ensemble was only composed of members using the NMM dynamic core.
Verification performed on the probabilistic forecasts computed from each ensemble using time series of ROC scores revealed that the MP ensemble appeared to have higher average scores than the PILB ensemble during the first 24 h of the forecasts at all accumulation periods, but these differences were not statistically significant. The PILB ensemble appeared to have higher ROC scores than the MP ensemble at the majority of the forecast lead times after hour 36, with the differences being statistically significant at many times after forecast hour 69. The degradation in forecast skill in the MP ensemble, as measured by ROC scores, was a direct result of using unperturbed ICs/LBCs. Because this degradation in forecast skill was not observed in a comparison of deterministic forecasts generated using the probability matching method, the results of this study indicate that, because of the extreme difficulties associated with making deterministic forecasts of precipitation, improvements in ensemble design may only be reflected in the probabilistic forecasts of precipitation produced by the ensemble and not in the deterministic forecasts. This suggests that efforts to improve ensemble design may be better realized if probabilistic rather than deterministic forecast skill is emphasized.
An analysis of statistical consistency showed that both ensembles were underestimating forecast uncertainty; in other words, the ensembles were underdispersive. There was a trend toward (away from) statistical consistency with increasing forecast lead time in the PILB (MP) ensemble. Also, the inability of both ensembles to represent the diurnal cycle of precipitation was shown to contribute to the lack of statistical consistency.
Unlike the ensemble variance, spread ratios indicated that the PILB ensemble had greater spread at all forecast lead times, even during the first 24 h, when previous studies (e.g., Stensrud et al. 2000) have shown that mixing the physical parameterization schemes leads to much greater spread than in an ensemble with only perturbed ICs and LBCs. It is speculated that the discrepancy between the spread indicated by the ensemble variance and that indicated by the spread ratio can be attributed to the higher biases in the MP ensemble, which increase the ensemble variance and decrease the spread ratio.
The rank histograms for each ensemble were consistent with the statistical consistency plots. In addition, the right-skewed appearance of the rank histograms for both the PILB and MP ensembles indicated a tendency to overpredict precipitation.
Overall, results indicated that unperturbed ICs/LBCs in an ensemble using mixed physics and dynamic cores began to constrain the growth of spread around forecast hour 24. However, ensemble forecast skill as measured by ROC scores did not become significantly lower in the MP ensemble than in the PILB ensemble until after forecast hour 69. It is important to note that these results are specific to the domain and time period analyzed in this study, because the time it takes for the lateral boundaries to infiltrate the domain depends on the domain size and large-scale flow. Thus, a different-sized domain would likely have produced a different length of time before significant differences between the forecast skill of the MP and PILB ensembles were observed. Also, it is likely that improvements in the forecasts could be made through postprocessing calibration (i.e., bias correction), as shown by Eckel and Mass (2005). Such calibration would likely decrease the spread in the MP ensemble but not in the PILB ensemble, resulting in a more pronounced difference in forecast skill and spread between the two ensembles.
It is also important to note that by varying only the cumulus and microphysics schemes along with the dynamic core in the MP ensemble, only a portion of the model error is captured. A more complete representation of model error would likely produce higher values of ensemble variance before leveling off near forecast hour 24, and, in turn, result in better forecast skill.
Future work should analyze the effects of bias correction and also examine forecast periods during the warm season, because other studies (e.g., Jankov et al. 2005; Alhamed et al. 2002; Wandishin et al. 2001; Stensrud et al. 2000) have suggested that mixed-physics and mixed-model formulations are a more effective technique for increasing spread during the warm season than in the cool season. In addition, because Gallus and Bresch (2006) show that spread from the use of different dynamic cores is a function of the physics schemes used and can be comparable to that from differing physics, future studies should be performed as more physics options become available in the WRF–NMM. Also, future work should investigate the effects on precipitation forecasts of changing the physics and perturbing the LBCs over a similar forecast period.
Acknowledgments
The authors thank two anonymous reviewers for their comments, which helped to improve the manuscript. In addition, Jon Hobbs at Iowa State University (ISU) and Isidora Jankov at the Global Systems Division of the Earth System Research Laboratory assisted with the computational work. This research was funded by Baker Endowment Fund 497-01-78-3803 and NSF Grant ATM-0537043. Model simulations were conducted on the 64-processor computing cluster in the meteorology program at ISU.
REFERENCES
Alhamed, A., S. Lakshmivarahan, and D. Stensrud, 2002: Cluster analysis of multimodel ensemble data from SAMEX. Mon. Wea. Rev., 130, 226–256.
Baldwin, M. E., and K. E. Mitchell, 1997: The NCEP hourly multisensor U.S. precipitation analysis for operations and GCIP research. Preprints, 13th Conf. on Hydrology, Long Beach, CA, Amer. Meteor. Soc., 54–55.
Betts, A. K., 1986: A new convective adjustment scheme. Part I: Observational and theoretical basis. Quart. J. Roy. Meteor. Soc., 112, 677–691.
Betts, A. K., and M. J. Miller, 1986: A new convective adjustment scheme. Part II: Single column tests using GATE wave, BOMEX, ATEX, and Arctic air-mass data sets. Quart. J. Roy. Meteor. Soc., 112, 693–709.
Buizza, R., 1997: Potential forecast skill of ensemble prediction and spread and skill distributions of the ECMWF ensemble prediction system. Mon. Wea. Rev., 125, 99–119.
Buizza, R., A. Hollingsworth, F. Lalaurette, and A. Ghelli, 1999: Probabilistic predictions of precipitation using the ECMWF Ensemble Prediction System. Wea. Forecasting, 14, 168–189.
Carbone, R. E., J. D. Tuttle, D. A. Ahijevych, and S. B. Trier, 2002: Inferences of predictability associated with warm season precipitation episodes. J. Atmos. Sci., 59, 2033–2056.
Chuang, H., and G. Manikin, 2001: The NCEP Meso Eta Model post processor: A documentation. NCEP Office Note 438, NOAA/NWS, 52 pp.
Clark, A. J., W. A. Gallus, and T-C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473.
Davis, C. A., K. W. Manning, R. E. Carbone, S. B. Trier, and J. D. Tuttle, 2003: Coherence of warm-season continental rainfall in numerical weather prediction models. Mon. Wea. Rev., 131, 2667–2679.
Du, J., G. DiMego, M. S. Tracton, and B. Zhou, 2003: NCEP Short Range Ensemble Forecasting (SREF) system: Multi-IC, multi-model and multi-physics approach. Research Activities in Atmospheric and Oceanic Modelling, J. Cote, Ed., Rep. 33, WMO/TD 1161, CAS/JSC Working Group on Numerical Experimentation (WGNE), 5.09–5.10. [Available online at http://www.emc.ncep.noaa.gov/mmb/SREF/srefWMO_2003.pdf.].
Du, J., and Coauthors, 2004: The NOAA/NWS/NCEP Short Range Ensemble Forecast (SREF) system: Evaluation of an initial condition versus multiple model physics ensemble approach. Preprints, 20th Conf. on Weather Analysis and Forecasting/16th Conf. on Numerical Weather Prediction, Seattle, WA, Amer. Meteor. Soc., 21.3. [Available online at http://ams.confex.com/ams/pdfpapers/71107.pdf.].
Du, J., J. McQueen, G. DiMego, Z. Toth, D. Jovic, B. Zhou, and H. Chuang, 2006: New dimension of NCEP Short-Range Ensemble Forecasting (SREF) system: Inclusion of WRF members. Preprints, WMO Expert Team Meeting on Ensemble Prediction System, Exeter, United Kingdom, WMO, 5 pp. [Available online at http://www.emc.ncep.noaa.gov/mmb/SREF/WMO06_full.pdf.].
Ebert, E. E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. Mon. Wea. Rev., 129, 2461–2480.
Eckel, F. A., and C. F. Mass, 2005: Aspects of effective mesoscale, short-range ensemble forecasting. Wea. Forecasting, 20, 328–350.
Environmental Modeling Center, 2003: The GFS atmospheric model. NCEP Office Note 442, NCEP/NWS, 14 pp. [Available online at http://www.emc.ncep.noaa.gov/officenotes/newernotes/on442.pdf.].
Errico, R., and D. Baumhefner, 1987: Predictability experiments using a high-resolution limited-area model. Mon. Wea. Rev., 115, 488–504.
Ferrier, B. S., Y. Jin, Y. Lin, T. Black, E. Rogers, and G. DiMego, 2002: Implementation of a new grid-scale cloud and rainfall scheme in the NCEP Eta Model. Preprints, 15th Conf. on Numerical Weather Prediction, San Antonio, TX, Amer. Meteor. Soc., 280–283.
Fritsch, J. M., and R. E. Carbone, 2004: Improving quantitative precipitation forecasts in the warm season: A USWRP research and development strategy. Bull. Amer. Meteor. Soc., 85, 955–965.
Gallus, W. A., Jr., and J. F. Bresch, 2006: Comparison of impacts of WRF dynamic core, physics package, and initial conditions on warm season rainfall forecasts. Mon. Wea. Rev., 134, 2632–2641.
Gilmour, I., L. A. Smith, and R. Buizza, 2001: Linear regime duration: Is 24 hours a long time in synoptic weather forecasting? J. Atmos. Sci., 58, 3525–3539.
Hamill, T. M., 1999: Hypothesis tests for evaluating numerical precipitation forecasts. Wea. Forecasting, 14, 155–167.
Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550–560.
Hamill, T. M., and S. J. Colucci, 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 1312–1327.
Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711–724.
Hou, D., E. Kalnay, and K. K. Droegemeier, 2001: Objective verification of the SAMEX ’98 ensemble forecasts. Mon. Wea. Rev., 129, 73–91.
Houtekamer, P. L., L. Lefaivre, J. Derome, H. Ritchie, and H. L. Mitchell, 1996: A system simulation approach to ensemble prediction. Mon. Wea. Rev., 124, 1225–1242.
Janjić, Z. I., 1994: The step-mountain Eta coordinate model: Further developments of the convection, viscous sublayer, and turbulence closure schemes. Mon. Wea. Rev., 122, 927–945.
Janjić, Z. I., 2003: A nonhydrostatic model based on a new approach. Meteor. Atmos. Phys., 82, 271–285.
Jankov, I., W. A. Gallus Jr., M. Segal, B. Shaw, and S. E. Koch, 2005: The impact of different WRF model physical parameterizations and their interactions on warm season MCS rainfall. Wea. Forecasting, 20, 1048–1060.
Jankov, I., W. A. Gallus Jr., M. Segal, and S. E. Koch, 2007: Influence of initial conditions on the WRF–ARW model QPF response to physical parameterization changes. Wea. Forecasting, 22, 501–519.
Kain, J. S., and J. M. Fritsch, 1993: Convective parameterization for mesoscale models: The Kain–Fritsch scheme. The Representation of Cumulus Convection in Numerical Models, Meteor. Monogr., No. 46, Amer. Meteor. Soc., 165–170.
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.
Lin, Y-L., R. D. Farley, and H. D. Orville, 1983: Bulk parameterization of the snow field in a cloud model. J. Climate Appl. Meteor., 22, 1065–1092.
Lorenz, E., 1969: The predictability of a flow which possesses many scales of motion. Tellus, 21, 289–307.
Mason, I., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag., 30, 291–303.
Michalakes, J., S. Chen, J. Dudhia, L. Hart, J. Klemp, J. Middlecoff, and W. Skamarock, 2001: Development of a next generation regional weather research and forecast model. Developments in Teracomputing: Proceedings of the Ninth ECMWF Workshop on the Use of High Performance Computing in Meteorology, W. Zwieflhofer and N. Kreitz, Eds., World Scientific, 269–276.
Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119.
Mylne, K. R., 1999: The use of forecast value calculations for optimal decision making using probability forecasts. Preprints, 17th Conf. on Weather Analysis and Forecasting, Denver, CO, Amer. Meteor. Soc., 235–239.
Nutter, P., D. Stensrud, and M. Xue, 2004: Effects of coarsely resolved and temporally interpolated lateral boundary conditions on the dispersion of limited-area ensemble forecasts. Mon. Wea. Rev., 132, 2358–2377.
Palmer, T. N., F. Molteni, R. Mureau, R. Buizza, P. Chapelet, and J. Tribbia, 1992: Ensemble prediction. ECMWF Research Dept. Tech. Memo. 188, 45 pp.
Richardson, D. S., 2000: Applications of cost-loss models. Proc. Seventh ECMWF Workshop on Meteorological Operational Systems, Reading, United Kingdom, ECMWF, 209–213.
Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quart. J. Roy. Meteor. Soc., 127, 2473–2489.
Roebber, P. J., D. M. Schultz, B. A. Colle, and D. J. Stensrud, 2004: Toward improved prediction: High-resolution and ensemble modeling systems in operations. Wea. Forecasting, 19, 936–949.
Schaefer, J. T., 1990: The critical success index as an indicator of warning skill. Wea. Forecasting, 5, 570–575.
Schwartz, B. E., and S. G. Benjamin, 2000: Verification of RUC2 precipitation forecasts using the NCEP multisensor analysis. Preprints, Fourth Symp. on Integrated Observing Systems, Long Beach, CA, Amer. Meteor. Soc., 182–185.
Skamarock, W. C., J. B. Klemp, and J. Dudhia, 2001: Prototypes for the WRF (Weather Research and Forecasting) model. Preprints, Ninth Conf. on Mesoscale Processes, Fort Lauderdale, FL, Amer. Meteor. Soc., J15. [Available online at http://ams.confex.com/ams/pdfpapers/23297.pdf.].
Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the Advanced Research WRF, version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp. [Available online at http://www.mmm.ucar.edu/wrf/users/docs/arw_v2.pdf.].
Stensrud, D. J., and M. S. Wandishin, 2000: The correspondence ratio in forecast evaluation. Wea. Forecasting, 15, 593–602.
Stensrud, D. J., J. Bao, and T. T. Warner, 2000: Using initial condition and model physics perturbations in short-range ensemble simulations of mesoscale convective systems. Mon. Wea. Rev., 128, 2077–2107.
Talagrand, O., R. Vautard, and B. Strauss, 1999: Evaluation of probabilistic prediction systems. Proc. ECMWF Workshop on Predictability, Reading, United Kingdom, ECMWF, 1–25.
Torn, R. D., G. J. Hakim, and C. Snyder, 2006: Boundary conditions for limited-area ensemble Kalman filters. Mon. Wea. Rev., 134, 2490–2502.
Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 2317–2330.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319.
Vukicevic, T., and J. Paegle, 1989: The influence of one-way interacting lateral boundary conditions upon predictability of flow in bounded numerical models. Mon. Wea. Rev., 117, 340–350.
Wandishin, M. S., S. L. Mullen, D. J. Stensrud, and H. E. Brooks, 2001: Evaluation of a short-range multimodel ensemble system. Mon. Wea. Rev., 129, 729–747.
Warner, T. T., R. A. Peterson, and R. E. Treadon, 1997: A tutorial on lateral boundary conditions as a basic and potentially serious limitation to regional numerical weather prediction. Bull. Amer. Meteor. Soc., 78, 2599–2617.
Wicker, L. J., and W. C. Skamarock, 2002: Time-splitting methods for elastic models using forward time schemes. Mon. Wea. Rev., 130, 2088–2097.

Domain of the WRF ensemble members.
Citation: Monthly Weather Review 136, 6; 10.1175/2007MWR2029.1

Domain of the WRF ensemble members.
Citation: Monthly Weather Review 136, 6; 10.1175/2007MWR2029.1
Domain of the WRF ensemble members.
Citation: Monthly Weather Review 136, 6; 10.1175/2007MWR2029.1

Fig. 2. Time series of average ETSs computed for precipitation accumulated at (a) 3-, (b) 6-, (c) 12-, and (d) 24-h intervals, at rainfall thresholds of 0.5, 6.5, and 25.5 mm, for the MP and PILB ensembles. ETS averages over all times, along with the corresponding average bias scores, are displayed at the top right.
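For readers reproducing the verification, a minimal sketch of how the ETS and bias values summarized in Fig. 2 could be computed from gridded forecast and observed accumulations is given below. The function name, array layout, and use of Python/NumPy are illustrative assumptions, not the study's actual code.

import numpy as np

def ets_and_bias(forecast, observed, threshold):
    # Binary exceedance fields at one rainfall threshold (e.g., 0.5 mm).
    f = forecast >= threshold
    o = observed >= threshold
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    total = f.size
    # Hits expected by chance, which makes the threat score "equitable."
    hits_random = (hits + false_alarms) * (hits + misses) / total
    ets = (hits - hits_random) / (hits + false_alarms + misses - hits_random)
    bias = (hits + false_alarms) / (hits + misses)  # frequency bias
    return ets, bias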

Fig. 3. Time series of average ROC scores at the 0.5-mm threshold for (a) 3-, (b) 6-, (c) 12-, and (d) 24-h intervals and at the 2.5-mm threshold for (e) 3-, (f) 6-, (g) 12-, and (h) 24-h intervals. Asterisks near the x axis of each plot denote the times at which the differences between the ROC scores of the MP and PILB ensembles are statistically significant at the α = 0.05 level.
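Assuming the ROC score in Fig. 3 is the area under the ROC curve traced by sweeping the ensemble probability threshold (a common convention for ensemble verification; the names and array shapes here are hypothetical), the calculation might look as follows.

import numpy as np

def roc_area(members, observed, threshold):
    # members: (n_members, n_points); observed: (n_points,)
    n_members = members.shape[0]
    prob = np.mean(members >= threshold, axis=0)  # exceedance probability
    event = observed >= threshold
    hit_rate, false_alarm_rate = [1.0], [1.0]
    # Warn wherever at least k of n members exceed the threshold.
    for k in range(1, n_members + 1):
        warn = prob >= k / n_members
        hit_rate.append(np.sum(warn & event) / max(np.sum(event), 1))
        false_alarm_rate.append(np.sum(warn & ~event) / max(np.sum(~event), 1))
    hit_rate.append(0.0)
    false_alarm_rate.append(0.0)
    h = np.array(hit_rate)
    f = np.array(false_alarm_rate)
    # Trapezoidal area; f decreases from 1 to 0 along the sweep.
    return float(np.sum(0.5 * (h[:-1] + h[1:]) * (f[:-1] - f[1:])))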

Fig. 4. Time series at 3-h intervals of areal precipitation coverage above 0.5 and 2.5 mm forecast by (a) the KF and BMJ members within the MP ensemble that use the ARW dynamic core, and (b) the NMM and ARW members within the MP ensemble with the same physics.
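The areal coverage plotted in Fig. 4 can be read as the fraction (or percentage) of domain grid points at or above each threshold; a one-line sketch under that assumption:

import numpy as np

def areal_coverage(forecast, threshold):
    # Fraction of grid points with 3-h precipitation >= threshold (mm).
    return float(np.mean(forecast >= threshold))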

Fig. 5. Ensemble variance and MSE of the MP and PILB ensemble mean precipitation forecasts for accumulation periods of (a) 3, (b) 6, (c) 12, and (d) 24 h.
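A sketch of the spread–skill comparison in Fig. 5, assuming the ensemble variance is taken about the ensemble mean and averaged over the domain (the array names are hypothetical):

import numpy as np

def spread_and_skill(members, observed):
    # members: (n_members, n_points); observed: (n_points,)
    ens_mean = members.mean(axis=0)
    # Domain-averaged variance of the members about their own mean.
    variance = ((members - ens_mean) ** 2).mean()
    # MSE of the ensemble-mean forecast against the verifying analysis.
    mse = ((ens_mean - observed) ** 2).mean()
    return variance, mse

For a statistically consistent ensemble, the variance should track the MSE of the mean; variance persistently below the MSE indicates underdispersion.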

Fig. 6. Time series of average rainfall volume forecast at 3-h intervals by each member of the (a) PILB and (b) MP ensembles, along with the ensemble mean (ENS_MEAN) and the observed rainfall volume (OBS). Ensemble member abbreviations in the legends are defined in Table 1.

Fig. 7. Time series of average spread ratios at the 0.5-mm threshold for (a) 3-, (b) 6-, (c) 12-, and (d) 24-h intervals and at the 2.5-mm threshold for (e) 3-, (f) 6-, (g) 12-, and (h) 24-h intervals.

Fig. 8. Rank histograms for the MP and PILB ensembles calculated at forecast hours 24, 48, 72, 96, and 120 for accumulation periods of (a) 3, (b) 6, (c) 12, and (d) 24 h.
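A minimal sketch of the rank-histogram calculation underlying Fig. 8, counting where the verifying observation falls among the sorted member values at each grid point (ties, common at zero precipitation, would need randomized breaking in practice; names are illustrative):

import numpy as np

def rank_histogram(members, observed):
    # members: (n_members, n_points); observed: (n_points,)
    n_members = members.shape[0]
    # Rank = number of members strictly below the observation (0..n).
    ranks = np.sum(members < observed, axis=0)
    return np.bincount(ranks, minlength=n_members + 1)

A flat histogram indicates a well-calibrated ensemble; the U shape typical of underdispersive systems piles counts into the extreme bins.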

Fig. 9. Time series of the fractional ensemble variance contributed to the PILB ensemble by perturbed ICs, calculated as [Var(PILB) − Var(NIC)]/Var(PILB), from seven cases for (a) rainfall, (b) 500-hPa geopotential height, and (c) 850-hPa geopotential height. Each line in (a)–(c) corresponds to one of the seven cases; gray dotted lines mark ratios of 0.5 and 0.0.
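The ratio in Fig. 9 follows directly from the caption's formula; a sketch, assuming NIC denotes the companion ensemble without perturbed ICs and that variances are computed about each ensemble's own mean and then averaged over the domain:

import numpy as np

def fractional_ic_variance(pilb_members, nic_members):
    # Arrays of shape (n_members, n_times, n_points).
    def domain_variance(members):
        # Variance across members, averaged over grid points -> (n_times,)
        return members.var(axis=0).mean(axis=-1)
    var_pilb = domain_variance(pilb_members)
    var_nic = domain_variance(nic_members)
    return (var_pilb - var_nic) / var_pilb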
Table 1. Model physics options, dynamics options, and ICs and LBCs for all 16 ensemble members. The first eight members are the mixed-physics members; the last eight are the perturbed-IC and -LBC members. The “Ensemble member” column gives the unique name of each member, and the “WRF core” column specifies the dynamic core each member uses. The microphysics schemes Lin, Ferrier, and WSM6 refer to the Lin et al. (1983), Ferrier et al. (2002), and WRF single-moment 6-class schemes, respectively. The last column, “ICs and LBCs,” specifies the initial and lateral boundary conditions used for each member. GFS denotes the model run operationally at NCEP; n# GFS and p# GFS denote GFS ensemble members with negative and positive bred perturbations, respectively.