A methodology is proposed to investigate the scale dependence of the predictability of precipitation patterns at the mesoscale. By applying it to two or more precipitation fields, either modeled or observed, a decorrelation scale λ₀ can be defined such that all scales smaller than λ₀ are fully decorrelated. For precipitation forecasts from a radar data–assimilating storm-scale ensemble forecasting (SSEF) system, λ₀ is found to increase with lead time, reaching 300 km after 30 h. That is, for λ < λ₀, the ensemble members are fully decorrelated. Hence, there is no predictability of the model state for these scales. For λ > λ₀, the ensemble members are correlated, indicating some predictability by the ensemble. When applied to characterize the ability to predict precipitation as compared to radar observations by numerical weather prediction (NWP) as well as by Lagrangian persistence and Eulerian persistence, λ₀ increases with lead time for most forecasting methods, while it is constant (300 km) for non–radar data–assimilating NWP.
Comparing the different forecasting models, it is found that they are similar in the 0–6-h range and that none of them exhibits any predictive ability at meso-γ and meso-β scales after the first 2 h. On the other hand, the radar data–assimilating ensemble exhibits predictability of the model state at these scales, thus causing a systematic difference between the λ₀ corresponding to the ensemble and the λ₀ corresponding to model and radar. This suggests either that the ensemble does not have sufficient spread at these scales or that the forecasts suffer from biases.
As shown by Lorenz (1963), two solutions of a system of nonlinear differential equations that differ only slightly in the specification of the initial conditions will diverge with time until they become as similar as two random states. This is an intrinsic property of nonlinear deterministic systems and leads to deterministic chaos. Because atmospheric processes are considered nonlinear and deterministic, the results of Lorenz (1963) impose a finite limit on the intrinsic predictability of the atmosphere, even if a perfect model for prediction existed and even if the initial-condition errors were much smaller than those in current atmospheric models (Lorenz 1996). Furthermore, this finite limit is highly dependent on the initial conditions; an excellent illustration of this behavior for the Lorenz system is provided in Fig. 4 of Palmer (1993). In practice, the atmospheric state is predicted using an imperfect model initialized with initial conditions that contain considerable errors. Therefore, the practical predictability of the atmosphere (defined as the extent to which prediction is possible given current forecasting methods; Lorenz 1996) is affected by, but not equivalent to, the intrinsic predictability of the atmosphere. Moreover, estimates of practical atmospheric predictability are highly dependent on the model used for prediction and on the initial state (just as for the Lorenz system). These considerations have made evident the need to assess the uncertainty associated with a given forecast, which can be done through ensemble forecasting. Ensemble forecasting techniques for medium- to long-range weather prediction in midlatitudes, which is mostly affected by synoptic-scale baroclinic instabilities, are by now well established (Kalnay 2003, section 6.5). On the other hand, the mesoscale details of weather, such as the evolution of precipitation systems, are affected by moist convective processes.
Since computational resources became available in the early 2000s to allow the simulation of mesoscale phenomena at high horizontal resolution, significant effort has been devoted to understanding the processes responsible for error growth in such models. The various studies on error growth at the mesoscale (e.g., Walser et al. 2004; Zhang et al. 2002, 2003, 2006, 2007; Hohenegger and Schär 2007a,b; Bei and Zhang 2007; Melhauser and Zhang 2012; Wu et al. 2013) indicated that moist convection is the primary mechanism promoting the growth of small initial-condition errors. Moreover, it has been shown that small errors saturate at the convective scale and then grow upscale through geostrophic adjustment or cold pool dynamics to limit predictability at the mesoscale within the time of interest of such forecasts (about 24 h), resulting in error growth rates for convection-allowing models that are much higher than for large-scale models (Hohenegger and Schär 2007a). However, just as for the very simple Lorenz system, the exact limit of predictability is highly case dependent (Done et al. 2012; Hohenegger et al. 2006; Walser et al. 2004). Moreover, Hohenegger et al. (2006) showed that even cases with apparently similar intensity of moist convection might exhibit different predictability depending on the relation between the moist convection and the larger-scale flow.
Given these considerations, the formulation of proper ensemble techniques at the storm scale remains very difficult (Johnson et al. 2014). A significant effort in developing ensemble forecasting strategies at convection-allowing resolutions is represented by the National Oceanic and Atmospheric Administration (NOAA) Hazardous Weather Testbed (HWT) Spring Experiments, which have been taking place every spring since 2007 (http://hwt.nssl.noaa.gov/spring_experiment/). As part of this experiment, ensemble forecasts were produced for 30–48 h at very high resolution (dx = 4 km) using the storm-scale ensemble forecasting (SSEF) system developed at the Center for Analysis and Prediction of Storms (CAPS; Xue et al. 2008; Kong et al. 2008). While the ensemble configuration has changed over the years, each year a set of members with perturbed initial conditions (IC) and lateral boundary conditions (LBC), different model physics, and mesoscale data assimilation (DA; including radar) was produced. The ICs and LBCs for these members were derived from the operational short-range ensemble forecast (SREF) system run at the National Centers for Environmental Prediction (NCEP; Du et al. 2009). SREF forecasts are produced with grid spacing of 32–45 km (depending on the member), and thus the IC–LBC perturbations do not include information at very small scales. A recent study by Johnson et al. (2014) compared the effect of this type of perturbation to smaller-scale IC–LBC perturbations. It was found that the relative importance of the two types of errors is case dependent, but that on average, the small-scale perturbations are less important than the larger-scale errors for precipitation forecasts at medium and large scales (64–4096 km in their study). Furthermore, they concluded that the current CAPS SSEF configuration samples the primary sources of error. However, the evaluation by Clark et al.
(2011) of ensemble quantitative precipitation forecasts (QPFs) from the 2009 Spring Experiment showed that the same type of ensemble is underdispersive for lead times between 6 and 18 h. On the other hand, by investigating the filtering effect of ensemble averaging for precipitation forecasts from the 2008 Spring Experiment, Surcel et al. (2014) indicated that QPFs from the ensemble members become fully decorrelated at larger scales with increasing lead time.
The objective of this paper is to further investigate the scale dependence of precipitation predictability by the 2008 CAPS SSEF by extending the analysis of Surcel et al. (2014) to determine how the range of scales over which the ensemble QPFs are fully decorrelated evolves with forecast lead time for 22 cases during spring 2008. As mentioned by Surcel et al. (2014), the complete decorrelation of the ensemble forecasts can be regarded as a lack of predictability of precipitation patterns by the ensemble at those scales. As the purpose of ensemble forecasts is to provide information on the uncertainty in the forecast, it is desirable to compare the predictability by the ensemble to the actual ability of the ensemble members to forecast precipitation at those scales as quantified by comparison to observations.
Therefore, in this paper we will provide quantitative estimates of the loss of precipitation predictability with spatial scale and forecast lead time by particular NWP model configurations. These estimates will be obtained in two ways: (i) by analyzing the differences between forecasts from the CAPS ensemble and (ii) by comparing forecasts from the CAPS SSEF to observations. As the CAPS SSEF has both IC–LBC perturbations derived from a regional-scale ensemble and varied model physics, both estimates correspond to the loss of the "practical predictability" of precipitation, as defined by Lorenz (1996) and discussed by Zhang et al. (2006). As explained in previous work by Zhang et al. (2006), Bei and Zhang (2007), and Melhauser and Zhang (2012), practical predictability is influenced by the intrinsic predictability of the atmosphere, that is, by the growth of small IC errors due to the chaotic nature of the atmosphere. While understanding how the intrinsic predictability of precipitation for our dataset affects the estimates of predictability loss that we obtain is very important, it is outside the scope of the current study and is left for future work. The objective of this paper is to present quantitative estimates of the loss of practical predictability and to intercompare the estimates obtained through the two methods discussed above for a dataset consisting of 22 precipitation cases during spring 2008. Therefore, to facilitate the discussion of the comparison presented in this paper, we will refer to the loss of practical predictability as estimated from the differences among forecasts from the ensemble as the loss of "the predictability of the model state." This depends on the numerical weather prediction (NWP) model, the method of generating the ensemble forecasts, and the metric used to quantify the variability among the members.
Herein, the predictability of the model state will be estimated by the decorrelation scale corresponding to QPFs from the CAPS SSEF. On the other hand, when estimates of the loss of practical predictability are obtained by comparing the outputs of NWP models or of other forecasting methods to observations, we will refer to them as estimates of “the model predictability of the atmospheric state” (in this paper, we assume that observations closely describe the atmospheric state, although in section 4d we provide an assessment of the effect of this assumption on our results). These estimates of predictability depend both on the prediction model and on the particular metric of model–observation comparison. The model predictability of the atmospheric state will be quantified by the decorrelation scale between precipitation forecasts and precipitation observations. Previous verification studies of precipitation forecasts have reported that forecasting skill shows scale dependence (Casati et al. 2004; Gilleland et al. 2009; Roberts and Lean 2008; Surcel et al. 2014), with a loss of useful skill at larger scales with increasing forecast lead time (Germann et al. 2006; Roberts 2008). Therefore, we aim to determine how the range of scales with a complete lack of skill (no model predictability of the atmospheric state) compares to the range of scales lacking predictability of the modeled precipitation for the 2008 CAPS SSEF. Ideally, the predictability of the model state and the model predictability of the atmospheric state should be consistent with each other when estimated over sufficiently large datasets. That is, it is desirable, on average, for small ensemble spread to correspond to good forecast skill, and vice versa. Therefore, our results could provide insight on whether the perturbations currently employed in the CAPS SSEF are sufficient to represent uncertainty as a function of scale.
In addition to dynamical models for precipitation forecasting, statistical models are still often employed for the very-short-term prediction of precipitation (nowcasting). The simplest one is Eulerian persistence (EP), which simply assumes no evolution of the current state (in this case, of the precipitation field). Evidently, this is a very poor model, as a cursory examination of radar imagery demonstrates that rainfall is neither stationary nor steady. However, the EP model remains a baseline for validating predictions from more complex models.
Another more appropriate statistical method for short-term precipitation forecasting is Lagrangian persistence (LP), which assumes persistence in the reference frame of the moving precipitation system. Therefore, to obtain a forecast, it suffices to characterize the motion of the precipitation patterns in the immediate past and to extrapolate the current precipitation field using this motion field. This method is known to work well for very-short-term (0–6 h) precipitation forecasting when used in radar-based extrapolation algorithms (Berenguer et al. 2012; Lin et al. 2005; Turner et al. 2004). Furthermore, LP precipitation forecasts were found to outperform the deterministic forecasts from the CAPS SSEF radar data–assimilating members for about 3 h (Berenguer et al. 2012). To complement the results of their analysis, we will also investigate here how the scale dependence of the lack of predictability of rainfall compares between the radar data–assimilating members and LP forecasting.
Our study will show the potential of using the decorrelation scale introduced by Surcel et al. (2014) as an evaluation metric for quantifying the scale dependence of the predictability of precipitation patterns. Furthermore, it will offer a quantitative estimate of the loss of predictability of precipitation by both dynamical and statistical forecasting methods, as a function of scale and forecast lead time, for a set of 22 cases during spring 2008. By analyzing the scale dependence of precipitation predictability for a set of cases, rather than adopting a case study approach, we attempt to generalize and thus complement the results obtained by previous predictability studies. The results of our study are also relevant to forecasting applications that postprocess model output, such as some ensemble averaging and blending applications (Atencia et al. 2010; Ebert 2001; Kober et al. 2012), as they indicate which components of the very detailed two-dimensional picture provided by the model do not contain useful information and therefore should not be used in such applications.
The paper is organized as follows. Section 2 describes the precipitation forecasts used in the analysis. Section 3 explains the methodology used to derive the decorrelation scale. Section 4 presents the results. Section 5 offers a discussion on the predictability of precipitation at the mesoscale and suggestions for future work, and section 6 presents the conclusion.
Both precipitation forecasts and precipitation observations are used in the study, and they are described next. All the forecasts and observations have been remapped onto a common grid using a nearest-neighbor interpolation method, and the analysis is performed on a domain covering most of the central and eastern United States, extending from 32° to 45°N and from 103° to 78°W, as illustrated in Fig. 1. The dataset consists of 22 precipitation cases from 18 April to 6 June 2008. Both hourly rainfall accumulation fields and instantaneous reflectivity fields (simulated or observed) were available for each case, and the entire analysis has been performed on both types of fields, with consistent results. However, in this paper, only the results corresponding to hourly rainfall accumulation fields are presented.
a. Precipitation forecasts
1) CAPS SSEF forecasts
The SSEF system was developed at CAPS, and it was run during the 2008 NOAA HWT Spring Experiment (Xue et al. 2008; Kong et al. 2008). It uses the Advanced Research version of the Weather Research and Forecasting (WRF-ARW) Model (Skamarock et al. 2008), version 2.2, and consists of 10 members with different physical schemes, mesoscale data assimilation including radar in 9 out of the 10 members, and perturbed ICs and LBCs. The background ICs are interpolated from the North American Model (NAM; Janjić 2003) 12-km analysis, and the IC–LBC perturbations are directly obtained from the SREF system run operationally at NCEP (Du et al. 2009). The SREF members are based on different dynamic cores [Eta, WRF Nonhydrostatic Mesoscale Model (WRF-NMM), and WRF-ARW] and are run with grid spacing of 32 or 45 km. Therefore, the IC–LBC perturbations do not have variability at the scale at which the 4-km members are run. In addition to IC–LBC perturbations, the ensemble members have different microphysical schemes varying among Thompson (Thompson et al. 2004), WRF single-moment 6-class (WSM6; Hong and Lim 2006), and Ferrier (Ferrier et al. 2002); different planetary boundary layer (PBL) schemes varying between Mellor–Yamada–Janjić (Mellor and Yamada 1982; Janjić 2001) and Yonsei University (YSU; Noh et al. 2003); and different shortwave radiation schemes varying between Goddard (Tao et al. 2003) and Dudhia (1989). Thirty-hour forecasts on a 4-km grid were performed almost daily in April–June 2008. Two of the members (control members C0 and CN) do not have SREF-based IC–LBC perturbations and have identical model configurations. However, convective-scale observations from radar [from the Weather Surveillance Radar-1988 Doppler (WSR-88D) network] and surface stations are assimilated only within CN. 
The assimilation of mesoscale observations was performed using the Advanced Regional Prediction System (ARPS) three-dimensional variational data assimilation (3DVAR) and cloud analysis package (Gao et al. 2004; Hu et al. 2006a,b; Xue et al. 2003). Radar reflectivity, surface data, and visible and 10.5-μm infrared data from the Geostationary Operational Environmental Satellite (GOES) were processed by the cloud analysis scheme to retrieve hydrometeor information. Radar radial velocity data and data from the Oklahoma Mesonet, METAR, and wind profiler networks were assimilated with the ARPS 3DVAR (Johnson et al. 2014).
2) MAPLE LP forecasts
The LP forecasts analyzed in this paper were produced with the McGill Algorithm of Precipitation Forecasting by Lagrangian Extrapolation (MAPLE; Germann and Zawadzki 2002). These are very-short-term precipitation forecasts produced using an extrapolation-based technique that employs the variational echo tracking (VET) algorithm (Laroche and Zawadzki 1995) to estimate the motion field of precipitation and a modified semi-Lagrangian backward scheme for advection. MAPLE was run using the National Severe Storms Laboratory (NSSL) 2.5-km height rainfall maps described below to generate 8-h forecasts initialized every hour with a temporal resolution of 15 min. For the analysis of hourly rainfall accumulations, maps of radar reflectivity Z are converted into rain rate R according to Z = 300R^1.5, and the resulting instantaneous rain-rate maps, available every 15 min, are averaged to obtain radar-derived hourly rainfall accumulations.
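The conversion and averaging steps can be sketched as follows (a minimal illustration, not MAPLE code; the function names are ours, and reflectivity is assumed to be in linear units of mm^6 m^-3 rather than dBZ):

```python
import numpy as np

def reflectivity_to_rainrate(z_linear):
    """Invert Z = 300 * R**1.5 to obtain rain rate R (mm/h) from
    linear reflectivity Z (mm^6 m^-3)."""
    return (np.asarray(z_linear, dtype=float) / 300.0) ** (1.0 / 1.5)

def hourly_accumulation(rates_15min):
    """Average the four instantaneous 15-min rain rates (mm/h) within an
    hour to obtain the hourly accumulation (mm)."""
    return np.mean(rates_15min, axis=0)
```

Reflectivity given in dBZ would first need to be converted to linear units via Z = 10^(dBZ/10) before applying the inversion.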
b. Precipitation observations
The precipitation observations used in this study are U.S. radar reflectivity mosaics at 2.5-km altitude generated by NSSL (Zhang et al. 2005) every 5 min and mapped with a spatial resolution of 1 km. For the analysis of hourly rainfall accumulations, observed reflectivity maps every 15 min have been processed to obtain maps of hourly rainfall accumulations as in the case of MAPLE.
Given N precipitation fields Xi (i = 1, …, N) whose variances are defined, the following holds:

Var(X1 + … + XN) = Σ_{i=1}^N Var(Xi) + Σ_{i≠j} Cov(Xi, Xj).   (1)
If the fields Xi are fully decorrelated, then Cov(Xi, Xj) = 0 for all i ≠ j. It follows that

Var(X1 + … + XN) = Σ_{i=1}^N Var(Xi).   (2)
Note that if (2) holds for two fields (N = 2), the variance of the error between the fields exceeds the variance of the mean of the fields, since Var(X1 − X2) = Var(X1) + Var(X2) = 4 Var[(X1 + X2)/2]. We are interested in the scale dependence of the predictability of precipitation. Hence, we will investigate whether there are scales at which various precipitation forecasts are fully decorrelated. To do so, we verify whether (2) holds for the variances of the precipitation fields at given wavelengths. That is, let Var_λ(Xi) be the variance of field Xi at scale λ. We define the power ratio as

Ratio(λ) = Σ_{i=1}^N Var_λ(Xi) / Var_λ(X1 + … + XN),   (3)
where λ denotes scale, and we search for scales λ at which Ratio(λ) = 1. To obtain the variance of a precipitation field at a given scale, we compute the power spectrum using the discrete cosine transform (DCT; Denis et al. 2002). This transform is equivalent to the fast Fourier transform (FFT), but it is preferred because it eliminates the problems associated with discontinuities at the boundaries of the domain. The values of Ratio(λ) vary between 1, which represents complete decorrelation between the fields Xi at scale λ, and 1/N, which represents perfect resemblance between the fields at scale λ. To illustrate this, Fig. 2a shows Ratio(λ), where X1, …, X9 are forecasts of hourly accumulations from the nine radar data–assimilating ensemble members at 1000 UTC 24 April 2008. While the curve is noisy, the figure indicates that Ratio(λ) = 1 for λ < λ₀ (red circle in Fig. 2a), indicating complete decorrelation between the ensemble members over this range of scales. Also, for λ > λ₀, the ratio decreases toward 1/N without reaching this value. Surcel et al. (2014) mentioned that the decorrelation scale increases with forecast lead time. Therefore, Fig. 2b shows λ₀ as a function of lead time for 24 April 2008. The value of λ₀ was determined by finding the largest λ for which Ratio(λ) ≥ 0.95. The threshold of 0.95 was chosen rather than 1 because it was found to eliminate some of the noise in determining the decorrelation scale without introducing any significant bias. Alternatively, λ₀ could be determined as the intersection of two linear fits: Ratio(λ) = 1 for λ < λ₀ and a linear fit of Ratio(λ) for λ > λ₀. However, this method gave very similar results while being more sensitive to the noise in the Ratio(λ) curves. While the λ₀(t) curve is somewhat noisy, it shows that the decorrelation scale increases with forecast lead time, reaching around 300 km at the end of the forecast period. The decorrelation scale can be determined for each precipitation event, and averages of λ₀ together with its standard deviation can be presented to estimate the variability of the λ₀ values.
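The computation of the power ratio in (3) and the threshold-based extraction of λ₀ can be sketched as follows (a simplified illustration using a 1D FFT in place of the 2D DCT of Denis et al. 2002, with one spectral coefficient per scale; the function names are ours):

```python
import numpy as np

def band_variance(field):
    """Variance contributed by each nonzero wavenumber of a 1D field
    (the mean is removed so that only fluctuations contribute)."""
    coeffs = np.fft.rfft(field - field.mean())
    return np.abs(coeffs[1:]) ** 2

def power_ratio(fields):
    """Ratio of Eq. (3): the sum of the members' variances at each scale
    divided by the variance of the members' sum at that scale.  Equals
    1/N for identical fields and fluctuates about 1 for fully
    decorrelated fields."""
    numerator = sum(band_variance(f) for f in fields)
    denominator = band_variance(np.sum(fields, axis=0))
    return numerator / denominator

def decorrelation_scale(ratio, wavelengths, threshold=0.95):
    """lambda_0: the largest wavelength at which Ratio >= threshold."""
    above = wavelengths[ratio >= threshold]
    return above.max() if above.size else None
```

For N identical fields the ratio is exactly 1/N at every scale, while for independent fields it scatters around 1; this sampling noise is why the 0.95 threshold is applied instead of requiring Ratio(λ) = 1 exactly.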
To eliminate some of the noise in the λ₀(t) curves, a 3-h running mean is applied to each of the plots of λ₀ versus t presented in the paper.
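The smoothing step amounts to a centered three-point average over the hourly λ₀ series (a sketch; the endpoint handling shown is our assumption, as the text does not specify it):

```python
import numpy as np

def smooth_3h(lam0_hourly):
    """Centered 3-h running mean of an hourly series of decorrelation
    scales; the two endpoints are left unchanged."""
    lam0 = np.asarray(lam0_hourly, dtype=float)
    out = lam0.copy()
    out[1:-1] = (lam0[:-2] + lam0[1:-1] + lam0[2:]) / 3.0
    return out
```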
This methodology can be applied to any number of precipitation fields, although the fewer the fields, the noisier the estimates. We will use it to examine not only the decorrelation scale among ensemble members (predictability of the model state), but also the decorrelation between forecasts and observations (predictability of the atmospheric state by a given model). Section 4 presents the results of applying this methodology to the 2008 dataset.
The decorrelation scale λ₀ represents the upper limit of the range of scales over which there is a complete lack of predictability using a certain forecasting method. This methodology does not provide any information about the degree of predictability at scales larger than λ₀; it simply shows that there is some predictability at those scales. A measure of the predictability at scales λ > λ₀ could be the value of Ratio(λ) in Fig. 2a, since this value depends on the covariance term on the right-hand side of (1).
The decorrelation scale could similarly be obtained by computing the correlation between two forecasts as a function of scale. However, that would involve decomposing the precipitation fields into different scale components using methods such as the DCT, the FFT, or the Haar wavelet transform (Casati et al. 2004; Germann et al. 2006; Johnson et al. 2014). DCT or FFT bandpass-filtered rainfall fields are strongly affected by the Gibbs effect, which renders such computations impractical. Haar filtering is less prone to Gibbs effects (Turner et al. 2004), but the Haar transform imposes a coarse sampling in scale. Predictability estimates for LP precipitation forecasts were obtained in this way by Germann et al. (2006), and we will discuss the comparison to their results in section 4c.
This section presents the results obtained by applying the above methodology to precipitation forecasts and observations from 22 cases during spring 2008. This set of cases has been previously analyzed by Surcel et al. (2010) and Berenguer et al. (2012), who have shown that the period was dominated by large-scale precipitation systems that nonetheless exhibited a marked diurnal cycle. Figures 3 and 4 illustrate the 22 cases in terms of the evolution of the power spectra of hourly rainfall fields derived from radar and in terms of radar-derived total accumulations for the entire time period (30 h). As shown by the power spectra (left panels), most cases show a clear diurnal cycle in the evolution of the statistical properties of the precipitation fields, with variance decreasing at all scales with time from 0000 UTC, reaching a minimum during early afternoon, and then beginning to increase again through the evening. This diurnal signal can affect evaluation metrics, as evident in Johnson et al. (2014, their Figs. 6, 8, and 11), and can make it difficult to analyze the evolution of skill with lead time. This problem is avoided in our analysis, as the decorrelation scale is computed from a power ratio, thus removing the effect of the large changes in the variance of a precipitation field with the time of day. The results presented in this section are usually averaged over the entire dataset, but the case-to-case variability is also addressed wherever relevant.
a. Predictability of the model state for the CAPS SSEF
As mentioned in the introduction, by predictability of the model state, we mean the extent to which forecasts from models with slight differences in the model formulation and in the IC–LBCs resemble each other. Usually, the predictability of the model state is characterized in terms of the ensemble spread. In this sense, computing the decorrelation scale for the entire ensemble is equivalent to determining the range of scales over which the ensemble has as much spread as that of an ensemble of random precipitation fields.
Figure 5a shows the power ratios corresponding to the forecasts from all the radar data–assimilating ensemble members:

Ratio(λ) = Σ_{i=1}^9 Var_λ(Xi) / Var_λ(X1 + … + X9),   (4)
where X1, …, X9 are the 2D precipitation fields from the nine radar data–assimilating ensemble members, shown for all forecast lead times (colors) and averaged over the 22 cases. After the first 3 h, there is a range of scales for which the ratios are 1, meaning that the forecasts are fully decorrelated at those scales. Furthermore, the range of scales over which Ratio(λ) = 1 clearly widens with forecast lead time. Following the methodology described in section 3, Fig. 5b shows λ₀ as a function of forecast lead time. The black line and the gray shading represent the mean and standard deviation, respectively, of λ₀ for all cases, while the blue line shows the decorrelation scales derived from the average power-ratio curves in Fig. 5a. The value of λ₀ increases with forecast lead time following a power law (red line in Fig. 5b; t in hours and λ₀ in kilometers). Figure 5b is interpreted as follows: for lead times and scales under the λ₀(t) curve, the ensemble members are fully decorrelated and there is thus no predictability of modeled precipitation: QPFs from the ensemble members resemble each other as much as any nine random precipitation fields do. For lead times and scales above the curve, there is some predictability, although nothing can be said from this plot about its quality. According to this result, on average for the 22 cases under study, the sources of uncertainty considered in this ensemble are sufficient to cause a loss of predictability at meso-γ scales (2–20 km) after the first hour and at meso-β scales (20–200 km) after the first 18 h. While this result might sound surprising from the point of view of operational forecasting, it is in agreement with results obtained by Walser et al. (2004), Zhang et al. (2003), Bei and Zhang (2007), and Cintineo and Stensrud (2013). To illustrate the great variability between the members at these scales, Fig. 6 shows a snapshot of the precipitation fields from the ensemble members at a lead time of 24 h for one event, over a subdomain of 1300 km × 1300 km (left panel) and a subdomain of 300 km × 300 km (right panel). In the left panel, the eye focuses on the large-scale patterns, which are similar among the members and with the observations. The lack of similarity becomes evident in the right panel, where we focus on the detail at scales smaller than 300 km.
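A power-law fit of the kind shown in Fig. 5b can be obtained by ordinary least squares in log–log space (a sketch with synthetic data; the coefficients used below are illustrative, not the values fitted in the paper):

```python
import numpy as np

def fit_power_law(t_hours, lam0_km):
    """Fit lambda_0 = a * t**b by linear least squares on the
    log-transformed data; returns (a, b).  An exponent b < 1 means the
    growth of the decorrelation scale slows with lead time."""
    b, log_a = np.polyfit(np.log(t_hours), np.log(lam0_km), 1)
    return np.exp(log_a), b

# Illustrative only: synthetic lambda_0 values growing as 40 * t**0.6 km.
t = np.arange(1.0, 31.0)
a, b = fit_power_law(t, 40.0 * t ** 0.6)
```

The same routine applied to the hourly λ₀ series of each curve recovers the exponent directly, which is convenient for comparing error growth rates between member pairs.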
The decorrelation scale varies with time following a power law with an exponent smaller than 1, meaning that the error growth rate decreases with increasing lead time. The ensemble members have radar DA, IC–LBC perturbations, and different model physics, so the decorrelation between the members can be caused by any of these sources of error. However, three of the ensemble members differ from each other in fewer respects: the control member CN; the C0 member, which has the same configuration as CN but lacks the radar DA; and N2, which has the same model configuration as the control member but has IC–LBC perturbations. The effects of radar DA and of IC–LBC perturbations can therefore be investigated by computing the power ratios between CN and C0 and between CN and N2 as

Ratio(λ) = [Var_λ(X_CN) + Var_λ(X_C0)] / Var_λ(X_CN + X_C0),   (5)

with the analogous expression for CN and N2,
and then computing λ₀ as a function of forecast lead time. Figure 7a shows λ₀ for CN–C0 (orange), CN–N2 (blue), and the ensemble (black), averaged over all cases and with the variability around the mean values (shading), together with the equations of the power-law fits to each curve. The figure shows that the decorrelation scales for the two pairs of forecasts are similar after the first 15 h (blue and orange curves). Similar to what was obtained for the entire ensemble, λ₀ for CN–N2 increases with forecast lead time following a power law. The IC–LBC perturbations affect increasingly larger scales with increasing lead time, causing a complete lack of predictability at meso-β scales after the first 10 h, with an error growth rate that decreases with lead time. Because of our limited dataset, we cannot at this time investigate the reasons for the case-to-case variability of λ₀. However, this evolution of IC–LBC error is consistent with the conceptual model of Zhang et al. (2007), despite differences in the structure of the IC–LBC perturbations and despite our analyzing precipitation rather than temperature and wind fields. According to their multistage model of error growth, small initial-condition errors rapidly grow at convective scales, saturating on time scales of O(1) h. Once errors saturate at convective scales, having been amplified by moist convective processes, they grow upscale through mechanisms such as geostrophic adjustment and cold-pool dynamics, resulting in a loss of predictability at the mesoscale after 20–30 h. The balanced component of the error could further grow upscale through baroclinic instability to limit predictability at larger scales (Zhang et al. 2007). Also, both Zhang et al. (2007) and Hohenegger and Schär (2007b) report error growth rates that decrease with increasing scale, as shown as well by our results. Moreover, λ₀ appears to evolve similarly for N2 and the other ensemble members, despite the additional physics perturbations in the other members.
It seems that mixed physics, as an addition to IC–LBC perturbations, does not affect the scale dependence of precipitation predictability on average for the 22 cases. However, Stensrud et al. (2000) have shown that the relative importance of IC errors and mixed physics is highly case dependent, with model physics playing a larger role in cases with weaker large-scale forcing. To examine the difference in λ₀ caused by IC–LBC perturbations alone compared with IC–LBC perturbations plus perturbed physics, Fig. 7b shows a scatterplot of λ₀ for CN–N2 against λ₀ for CN–N1 at different lead times. Unlike CN and N2, the N1 member uses the Ferrier microphysics scheme and the YSU PBL scheme. The figure shows that for lead times up to 6 h, adding physics perturbations results in larger λ₀, while after 12 h the decorrelation scale no longer seems to depend on the type of perturbation. This agrees with the results of Hohenegger and Schär (2007b), who reported that precipitation ensemble spread becomes similar after 11 h for ensembles using different perturbation methodologies.
On the other hand, the decorrelation scale between CN and C0 is constant throughout the forecast period. This is particularly clear in Fig. 7c, which shows the scatterplot between λ0 values corresponding to CN–C0 and λ0 values corresponding to CN–N2 for all cases and for different lead times. For lead times of less than 6 h, the points are below the 1:1 line, with λ0 for CN–C0 being the larger (purple and black symbols); for lead times of 12 and 18 h, they are situated around the 1:1 line (blue and green symbols); while after 24 h, λ0 for CN–N2 is larger than for CN–C0 (orange and red symbols). For the same forecasting system, but run during 2009 and 2010, Stratman et al. (2013) showed that the difference between members CN and C0 decreases with forecast lead time in terms of several metrics. That is, with forecast lead time, the effect of radar DA is lost and the members become increasingly similar. However, we see that the scales at which the error saturates are never recovered. Therefore, it appears that the assimilation of radar data has a transient effect at larger scales, while its effect at the mesoscale is long lived. As discussed by Craig et al. (2012), it could be that the effect of radar DA is manifested very clearly in the precipitation fields because of the assimilation of radar reflectivity, but not as much in the dynamical fields that control the forecast evolution, and thus it would appear to wash out with forecast time. Furthermore, it is possible that the lack of upscale growth of perturbations induced by radar DA is due to CN and C0 having identical LBCs. It is well known that apparent “enhanced” mesoscale predictability is often caused by a lack of LBC perturbations (Zhang et al. 2007). Understanding the reasons for the scale variability of the effect of radar data assimilation is of great importance, but it would require the analysis of data other than precipitation. This is outside the scope of this paper and is left for future work.
This subsection only deals with the effect of certain sources of error on the forecasts themselves, not on forecasting skill. The next section discusses the SSEF’s predictability of the atmospheric state as evaluated against radar rainfall estimates.
b. CAPS SSEF’s model predictability of the atmospheric state
This methodology can also be applied to forecast–observation pairs by computing the power ratio

Ratio_radar,Xi(λ) = P_radar−Xi(λ) / [P_radar(λ) + P_Xi(λ)],
where Xi is the rainfall field corresponding to an ensemble member and P(λ) denotes the power at scale λ. Since P(λ) gives the variance associated with each scale, the comparison of model output with observations using the equation above can be seen as a measure of model skill. The decorrelation scale curves derived from the power ratios for each radar data–assimilating ensemble member and averaged over all days are shown in Fig. 8a (gray lines), whereas the black line represents the average over all the ensemble members. This line will be used as the representative curve for the predictability of the model state by this radar data–assimilating ensemble. The decorrelation scale increases rapidly during the first six forecast hours and more slowly, following a power law, afterward. The rapid increase at the beginning seems associated with the loss of the effect of the radar DA. This high-resolution, convection-allowing, radar data–assimilating ensemble shows no predictability of the atmospheric state at meso-β scales after the first 3 h.
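As an illustration, the power-ratio calculation can be sketched as follows. This is a minimal sketch, assuming a simple radial binning of the 2D FFT power; the function names and binning choices are ours, not the paper's.

```python
import numpy as np

def radial_power_spectrum(field):
    """Variance of a 2D field per radial-wavenumber bin, via the FFT."""
    ny, nx = field.shape
    power = np.abs(np.fft.fft2(field - field.mean())) ** 2
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    k = np.hypot(kx, ky).ravel()
    kbins = np.linspace(0.0, 0.5, nx // 2 + 1)
    idx = np.digitize(k, kbins)
    spec = np.array([power.ravel()[idx == i].sum() for i in range(1, len(kbins))])
    return kbins[1:], spec

def power_ratio(x1, x2):
    """Power of the difference field over the sum of the individual powers.
    A ratio near 1 at a given scale means the two fields are decorrelated
    there; a ratio near 0 means they are highly correlated."""
    k, p_diff = radial_power_spectrum(x1 - x2)
    _, p1 = radial_power_spectrum(x1)
    _, p2 = radial_power_spectrum(x2)
    return k, p_diff / (p1 + p2)
```

With this convention, the decorrelation scale λ0 would be the largest scale (smallest wavenumber) at which the ratio is statistically indistinguishable from 1: two independent noise fields give a ratio near 1 at every scale, while identical fields give a ratio of 0.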
To better characterize the effect of assimilating radar observations, Fig. 8b shows λ0 for radar–CN (blue line), radar–C0 (orange line), and CN–C0 (black line). The figure also indicates the power-law fit for each line, even though for the radar–CN curve the fit is poor during the first 6 h, as the growth is faster than the power law. As indicated by this figure, the non–radar data–assimilating member C0 has no predictive skill at scales lower than 300 km throughout the forecast lead time. Also, after the first 15 h, the C0 and CN members are similar in terms of the range of scales they can predict. As mentioned before, radar data assimilation affects the forecast at all scales lower than 200 km throughout the forecast lead time, as suggested by λ0 corresponding to CN–C0; but, as shown in Fig. 8b, a model without radar DA shows no predictability of the atmospheric state at scales lower than 300 km throughout the forecast period, while radar DA leads to some gain in predictive ability at scales lower than 300 km during the first 15 h, on average, for the 22 cases. However, we note that all three lines lie within the error bars of one another after the first 3 h.
Our results are comparable to those of Roberts (2008), who found that even at the beginning of the forecast, useful skill is exhibited only at scales larger than 100 km for forecasts of localized rainfall. This result appears grim in terms of the ability of convection-allowing models to predict precipitation. However, this metric does not measure the actual predictive skill that these models have at scales where they exhibit some predictability. In fact, our results suggest that to properly evaluate the importance of radar data assimilation for model skill, all features in the precipitation field occurring at scales lower than λ0 should be filtered out before performing the evaluation. The recent study by Stratman et al. (2013) shows that the positive effect of radar DA in terms of skill is manifested not at small scales but at scales between 40 and 320 km. They suggest that the assimilation of radar observations with a 3DVAR cloud analysis results in gains in skill only at larger scales and out to 12 h.
c. Predictability by EP and LP
The simplest model of prediction is EP, which states that any future state of a system is identical to the current state. In the case of precipitation, as mentioned by Germann et al. (2006), an EP precipitation forecast can be obtained from the current radar precipitation map:

R̂(t0 + τ) = R(t0),
where R̂(t0 + τ) is the 2D rainfall forecast at lead time τ and R(t0) is the observed 2D rainfall field at initial time t0. Therefore, looking at the decorrelation between EP forecasts and observations is equivalent to computing the temporal autocorrelation of observed rainfall. Figure 9a shows the power ratios between EP and radar for forecasts initialized at 0000 UTC for lead times up to 12 h, averaged over all cases. The clear progression of colored lines indicates that here, as well, λ0 increases with lead time. Figure 9b shows λ0 averaged over all cases (black line) together with the uncertainty around the mean (gray shading), the λ0 derived from the average power ratio curves in Fig. 9a (blue line), and the power-law fit to the average (red line). Indeed, λ0 increases with forecast lead time following a power law, resulting in a complete loss of predictability at meso-β scales after the first forecast hour and confirming that EP is indeed a poor method of predicting rainfall. Finally, Fig. 9c shows λ0 for forecasts initialized at different times of the day (colored lines). It appears from the progression of the lines that the decorrelation scale is lower and changes more slowly with forecast lead time for the forecasts initialized around 0000 UTC (dark red to dark blue lines). For the forecasts initialized later in the day, the decorrelation scale is slightly higher, but the rate of change with time is similar to that of the forecasts initialized around 0000 UTC (blue and green lines). The change of λ0 with time is largest for the forecasts initialized around 1800 UTC. According to Surcel et al. (2010), during spring 2008, the average diurnal cycle of precipitation indicates that 1800 UTC is marked by the initiation of precipitation on the lee side of the Rockies and hence by the rapid evolution of the rainfall field. This can explain both the poorer performance of EP and the greater variability of the performance with lead time.
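The power-law fits to the λ0 curves discussed above amount to a least-squares fit in log-log space. A sketch with synthetic numbers (the lead times and coefficients below are purely illustrative, not the fitted values from the paper):

```python
import numpy as np

# Illustrative data: decorrelation scale (km) at several lead times (h).
tau = np.array([1.0, 2.0, 3.0, 6.0, 9.0, 12.0])
lam0 = 80.0 * tau ** 0.5  # synthetic power law for demonstration

# Fitting lambda0 = a * tau**b is linear in log space:
#   log(lambda0) = log(a) + b * log(tau)
b, log_a = np.polyfit(np.log(tau), np.log(lam0), 1)
a = np.exp(log_a)
```

Because the model is exactly linear after the log transform, `np.polyfit` with degree 1 recovers the exponent `b` and prefactor `a` directly.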
A better model for precipitation nowcasting is LP. Radar-based extrapolation algorithms using this principle are commonly used for very-short-term forecasting (0–6 h). Figure 10 shows the power ratios as a function of scale (Fig. 10a) and the decorrelation scale as a function of lead time (Fig. 10b), averaged over all cases, for hourly accumulation forecasts initialized at 0000 UTC and produced by MAPLE. The decorrelation scale increases with lead time for LP forecasts as well, following a power law, but it increases less rapidly than for EP forecasts. However, after 2 h, scales smaller than 200 km are no longer predictable.
Figure 10c shows λ0 for forecasts initialized every hour (colored lines). From the differences between the lines, it appears that λ0 increases more rapidly with time for the forecasts initialized later in the day (1600–2300 UTC), even though there is some variability around the lines, as shown in Fig. 10b. It seems reasonable for the temporal evolution of λ0 to show sensitivity to initialization time, as it was shown previously by Berenguer et al. (2012) that predictability by MAPLE is affected by the diurnal cycle of precipitation.
The predictability of precipitation by LP has been thoroughly studied by Germann and Zawadzki (2002, 2004) and Germann et al. (2006). Despite using a different methodology, Germann et al. (2006) nonetheless obtained results similar to ours, with predictability estimates of about 3 h for scales of O(100) km, as quantified in terms of the lifetime of bandpass scales.
d. The effect of observational uncertainty on predictability estimates
The entire analysis presented in this paper is based on the comparison of forecasts to a particular set of radar-derived quantitative precipitation estimates (QPEs) described in section 2b. While this verification dataset has been chosen for its quality as mentioned by Surcel et al. (2010), it is still reasonable to question how the uncertainty of these products affects the estimates of the decorrelation scale. Radar QPE is affected by many sources of error, and a proper error characterization is complicated, as shown by Berenguer and Zawadzki (2008, and references therein). Therefore, rather than attempting to characterize the error of the radar QPEs used here, we investigate the effect of observational uncertainty simply by comparing our verification dataset to another set of rainfall estimates. The additional verification dataset is NCEP’s Stage IV multisensor precipitation product (Baldwin and Mitchell 1997), available as hourly rainfall accumulations on a 4-km polar stereographic grid. The Stage IV data were obtained from the NCEP website (www.emc.ncep.noaa.gov/mmb/ylin/pcpanl/) and were remapped on the analysis grid (section 2) using nearest-neighbor interpolation. The two precipitation datasets are compared in terms of the decorrelation scale.
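The nearest-neighbor remapping step can be sketched as follows; this is our own minimal implementation for grids described by 1D coordinate axes, and the actual regridding of the Stage IV product onto the polar stereographic analysis grid may differ in detail:

```python
import numpy as np

def nearest_neighbor_remap(src, src_x, src_y, dst_x, dst_y):
    """Remap src, defined on coordinate axes (src_y, src_x), onto the
    destination grid by copying the nearest source value (no averaging,
    so precipitation intensities are preserved exactly)."""
    ix = np.abs(src_x[None, :] - dst_x[:, None]).argmin(axis=1)
    iy = np.abs(src_y[None, :] - dst_y[:, None]).argmin(axis=1)
    return src[np.ix_(iy, ix)]
```

Unlike bilinear interpolation, nearest-neighbor remapping does not smooth the field, which matters when the remapped product is subsequently compared scale by scale.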
Figure 11a shows the power ratios between radar precipitation and Stage IV precipitation for different times, averaged over the 22 cases. The average ratio is never 1, but there are individual cases for which the ratio reaches 1. Figure 11b shows the average λ0 for those cases. According to this figure, it is impossible to infer anything about precipitation predictability at scales lower than 12 km because of the uncertainty in the observed precipitation fields.
5. Discussion on the comparison between various forecast methods
Figure 12a summarizes the results previously presented about the predictability of precipitation for lead times of 0–30 h. This figure shows λ0 corresponding to the ensemble (predictability of the model state; green), to the radar–radar-DA-model pair (model predictability of the atmospheric state for a model with radar DA; black), and to the radar–non-radar-DA-model pair (model predictability of the atmospheric state for a model without radar DA; purple). The nonshaded areas correspond to regions of a complete lack of predictability. As emphasized by this figure, after the effect of the radar DA wears out, there is no predictability of the atmospheric state by NWP at meso-γ and meso-β scales. Furthermore, after approximately 5 h, the radar data–assimilating and the non–radar data–assimilating models become equivalent in terms of the range of scales they are unable to predict (black and purple lines are superimposed), and both types of models exhibit some predictability at scales larger than 300 km throughout the forecast period. However, as found by Stratman et al. (2013), the effect of radar DA seems to be present at larger scales for longer lead times. Future work should focus on determining the case dependence (and the reasons thereof) of the effect of radar DA across scales.
The region in Fig. 12a between the two lines represents scales where there is some predictability of the modeled state, but no predictability of the atmospheric state by NWP. Ideally, the two lines should lie together, as the truth, approximated here by observations, should be a sample of the ensemble probability density function. We have explained in the previous section that the evolution of λ0 for the predictability of the model state results mostly from the growth of IC–LBC errors and is consistent with the conceptual model of Zhang et al. (2007). But when it comes to λ0 between forecasts and observations, it is more difficult to hypothesize what causes its evolution. It has been shown previously by Durran et al. (2013), Nuss and Miller (2001), and Bei and Zhang (2007) that small IC errors at the large scale significantly impact mesoscale predictability. In addition, Zhang et al. (2002) showed that simulations run with reanalyzed conventional observations resulted in an improvement in QPF skill with respect to the operational forecast. Therefore, it is possible that improving the way the analysis data are utilized could lead to improving the forecast, thus decreasing the values of λ0 between model and radar and bringing the black and green curves closer together. On the other hand, the assimilation of mesoscale observations does positively affect forecast skill, even though it is not clear whether this is simply the effect of assimilating reflectivity, which leads to the morphing of the model precipitation fields into the observed field at the initial time, or the effect of radar DA actually changing the initial conditions. Our analysis results indicate that the average differences between the predictability of the model state and the model predictability of the atmospheric state are consistent from one case to another (not shown). Given this systematic behavior, and the results of Johnson et al.
(2014) on the minimal effect of adding small-scale perturbations to the IC–LBC perturbations derived from the NCEP SREF, we believe that more benefits may be gained by accounting for larger-scale IC uncertainties in the ensemble design, which hopefully would increase the values of the decorrelation scale between the ensemble members. Of course, another possible reason for the systematic difference between the predictability of the model state and the model predictability of the atmospheric state could be model bias. Therefore, our results highlight the need to perform experiments that focus on determining the reasons for this apparent lack of spread at meso-β scales.
Figure 12b focuses on the time range (0–6 h) of very-short-term precipitation forecasts and adds the decorrelation scale curves for LP and EP. This figure shows that after the first two hours, no forecasting method exhibits any predictability, as characterized with respect to radar observations, at meso-γ and meso-β scales. Furthermore, while the decorrelation scale corresponding to LP is lower than that of the model at the beginning of the forecast period, with a crossover time of about 3 h, the two forecasting methods (NWP and LP) are very similar during this time. Also, both the CAPS SSEF and the MAPLE LP algorithm exhibit better predictability than simple EP in terms of the scales that they can predict. It is therefore clear that EP is no longer necessary as a baseline for evaluating precipitation forecasts from LP algorithms or from radar data–assimilating models, as these other methods consistently outperform EP.
The dashed lines in the two figures illustrate the effect of the uncertainty in our verification data on the results, which is exhibited at scales smaller than 12 km. As this scale is very low compared to the model predictability limits, there is a large range of scales over which improvement is necessary before reaching this “observational” limit.
This paper builds on the results of Surcel et al. (2014) to propose and use a methodology for analyzing the scale dependence of the predictability of precipitation fields over the continental United States during spring 2008 by various forecasting methods. There have been many efforts to understand mesoscale predictability in the past few decades, and our study contributes to this by
offering a quantitative measure of the evolution of the decorrelation scale, and hence of the range of scales at which a given method exhibits a lack of predictability, with forecast lead time;
computing this measure for precipitation forecasts and observations for a dataset of a reasonable size (22 cases during spring 2008), rather than for only a few cases, thus verifying and complementing the results obtained by previous predictability studies that used a case study approach (Walser et al. 2004; Zhang et al. 2002, 2003, 2006, 2007; Bei and Zhang 2007);
using the decorrelation scale to intercompare the predictability of the model state to the model predictability of the atmospheric state, hence providing a measure of ensemble consistency as a function of scale for a storm-scale ensemble; and
intercomparing the predictive ability of statistical and dynamical methods for short-term precipitation forecasting as a function of scale.
Our results show that for all forecasting methods there is a range of scales over which the method displays a complete lack of predictability of the atmospheric state, the upper limit of which is the decorrelation scale λ0. For all forecasting methods, λ0 increases with forecast lead time. The rate of change of λ0 is fastest for EP, followed by LP. On the other hand, the decorrelation scale between radar and C0 has a constant value of about 300 km throughout the forecast lead time. Also, λ0 increases very rapidly during the first 5 h for radar and CN, showing that the effect of radar data assimilation in terms of improving predictability at smaller scales is rapidly washed out. On the other hand, comparison between CN and C0 shows that radar DA affects scales lower than 200 km throughout the forecast time. In agreement with previous studies (Roberts 2008; Germann et al. 2006; Surcel et al. 2014), none of the forecasting systems analyzed here show any predictability of precipitation at meso-γ and meso-β scales after the first 2 h. Therefore, to properly intercompare these methods in terms of QPF skill, the unpredictable scales should be filtered out before performing the verification.
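The filtering of unpredictable scales suggested above could be done, for example, with a sharp Fourier low-pass filter. This is a sketch only; the cutoff and grid spacing below are illustrative, and in practice a smoother filter response might be preferable:

```python
import numpy as np

def filter_unpredictable_scales(field, cutoff_km, dx_km=4.0):
    """Remove all Fourier components with wavelengths shorter than cutoff_km,
    keeping only the scales at which the forecast retains predictability."""
    ny, nx = field.shape
    ky = np.fft.fftfreq(ny, d=dx_km)[:, None]
    kx = np.fft.fftfreq(nx, d=dx_km)[None, :]
    k = np.hypot(kx, ky)
    # A component with radial wavenumber k has wavelength 1/k (km);
    # keep the mean (k == 0) and all wavelengths >= cutoff_km.
    keep = (k == 0) | (k <= 1.0 / cutoff_km)
    return np.real(np.fft.ifft2(np.fft.fft2(field) * keep))
```

Verification scores computed on the filtered fields then reflect only the scales at which the forecasts carry information, rather than penalizing displacement errors in unpredictable small-scale detail.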
The comparison among EP, LP forecasts, and radar DA models, meant to complement the study of Berenguer et al. (2012), confirms that, given the better performance shown by radar DA models, a better baseline for model evaluation would be LP rather than EP.
On the other hand, we found that the uncertainties currently accounted for in the CAPS SSEF appear not to generate sufficient spread at meso-β scales for lead times shorter than 18 h, as demonstrated by the difference between λ0 for the ensemble and λ0 for model and radar, and that this behavior was systematic for our dataset. Recent research has contributed greatly to the understanding of error growth in convection-allowing models (e.g., Zhang et al. 2002, 2003, 2007; Hohenegger et al. 2006; Walser et al. 2004), and the growth of IC–LBC perturbations in our study seems to be consistent with the error-growth model proposed by Zhang et al. (2007). However, significant case dependence of precipitation predictability has often been reported in the literature (e.g., Germann et al. 2006; Walser et al. 2004; Zhang et al. 2006; Hohenegger et al. 2006; Done et al. 2012), while our results indicate that the difference between the predictability of the model state and the model predictability of the atmospheric state (the difference between the green line and the purple and black lines in Fig. 12a) is consistent among the cases. Therefore, it is possible that this apparent lack of spread at meso-β scales could be a consequence of model bias (Clark et al. 2011) or of large-scale IC errors, which are known to contribute most to forecasting skill (Durran et al. 2013).
The predictabilities discussed here correspond to what Lorenz (1996) and later studies (e.g., Zhang et al. 2006; Bei and Zhang 2007; Melhauser and Zhang 2012) refer to as practical predictability. For example, the value of the decorrelation scale between forecasts and observations is more likely due to large IC and model errors than to the amplification of small IC errors through nonlinear dynamics. In the case of the decorrelation scale between the ensemble members as well, the initial perturbations were derived from a regional-scale ensemble, and it can therefore be expected that the predictability limit for this ensemble might be different from that of an ensemble that samples only very small IC errors. The intrinsic predictability of the atmosphere would have an effect on the estimates of practical predictability if the model captures the appropriate nonlinear dynamics. For example, cases with intense moist convection (highly nonlinear processes) are usually more unpredictable from a practical point of view as well. On the other hand, even for cases that exhibit strong intrinsic predictability, it is possible that model deficiencies and analysis errors might lead to poor forecast results. Investigating the intrinsic predictability for our set of cases would have required setting up additional experiments demanding significant computational resources, and is therefore better suited for future work. However, the results that we have obtained qualitatively agree with those of Bei and Zhang (2007) and Melhauser and Zhang (2012), who looked at the relationship between practical and intrinsic predictability for two case studies. By reducing the IC errors considered in their ensemble simulations, they noted linear gains in predictability, but they also found that the effect of moist convective processes on error growth sets an inherent predictability limit at the mesoscale.
Our study provides a quantitative estimate of the range of spatial scales over which the very detailed information that a forecasting method can provide is in fact unpredictable given the errors both in the modeling approach and in the initial conditions. However, this decorrelation scale focuses on the agreement between entire two-dimensional precipitation maps, and therefore it is sensitive to displacement errors. In an operational setting, forecasters might still find useful the information provided by a model in terms of storm characteristics at scales lower than 200 km, and our methodology does not account for these cases. However, our methodology is useful for the many applications that use all of the information in a two-dimensional QPF, such as blending applications, ensemble averaging, or hydrological modeling. For these applications, it is useful to know that scales lower than λ0 are unpredictable and should therefore be treated in a stochastic manner.
Finally, we remind the reader that the results presented in this study are dependent on the forecasting systems under study. We are in the process of extending the methodology herein to the Spring Experiment data of 2009–13. Our preliminary results indicate that the sensitivity to the IC–LBC perturbations analyzed here is consistent from year to year and that the decorrelation scale shows great sensitivity to the type of perturbations; that is, different errors propagate differently upscale. A paper describing these new findings is in preparation.
We are greatly indebted to Ming Xue and Fanyou Kong from CAPS for providing us the ensemble precipitation forecasts. The CAPS SSEF forecasts were produced mainly under the support of a grant from the NOAA CSTAR program, and the 2008 ensemble forecasts were produced at the Pittsburgh Supercomputer Center. Kevin Thomas, Jidong Gao, Keith Brewster, and Yunheng Wang of CAPS made significant contributions to the forecasting efforts. M. Surcel acknowledges the support received from the Fonds de Recherche du Québec–Nature et Technologies (FRQNT) in the form of a graduate scholarship. This work was also funded by the Natural Science and Engineering Research Council of Canada (NSERC) and Hydro-Québec through the IRC program. We acknowledge the comments and suggestions of three anonymous reviewers that helped improve the paper.
In fact, the NOAA HWT Spring Experiments have been taking place since 2000, but ensemble forecasts have been produced as part of the experiments only since 2007.