In their comment, Žagar and Szunyogh raised concerns about a recent study by Zhang et al. that examined the predictability limit of midlatitude weather using two up-to-date global models. Zhang et al. showed that deterministic weather forecasts could, at best, be extended by 5 days, assuming that initial-condition uncertainty could be reduced to a minimal level (e.g., 10% of its current operational value) with a nearly perfect model. Žagar and Szunyogh questioned the methodology and experiments of Zhang et al. Specifically, they raised issues regarding the effects of model error on the growth of forecast uncertainty. They also suggested that estimates of the predictability limit could be obtained using a simple parametric model. This reply clarifies the misunderstandings in Žagar and Szunyogh and demonstrates that the experiments conducted by Zhang et al. are reasonable. In our view, the model error concern raised by Žagar and Szunyogh does not apply to the intrinsic predictability limit, which is the key focus of Zhang et al., and the simple parametric model described by Žagar and Szunyogh does not serve the purpose of Zhang et al.
Predictability is a fundamental concept in numerical weather prediction (NWP). As a numerical forecast proceeds, dynamical instabilities and chaotic nonlinear interactions cause the forecast uncertainty to increase until the differences between individual ensemble members are statistically indistinguishable from random draws from the model climate. After that point, NWP provides no information that could not be readily obtained from a previously generated model climate; that is, predictability has been lost for this NWP system. The time interval between the initial time of the integration and the time at which predictability is lost is usually called the practical predictability limit of an NWP system (see, e.g., Buizza and Leutbecher 2015). If the model and/or initial-condition errors can be reduced, this practical predictability limit can be extended. Yet Lorenz (1969, hereafter L69) showed that the atmosphere likely has a finite intrinsic predictability limit that cannot be extended further, owing to very rapid error growth at small scales combined with nonlinear interactions between different scales (the so-called butterfly effect). This intrinsic predictability limit has been supported by follow-up studies (e.g., Tribbia and Baumhefner 2004; Froude et al. 2013; Selz and Craig 2015; Judt 2018) and is now widely accepted.
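As a rough illustration of the definition above, the sketch below treats predictability as lost once the ensemble variance first reaches a fixed fraction of the climatological (saturation) variance. This threshold test is a simplification assumed here for illustration only; it is not a statistical test of indistinguishability and not the procedure used in Z19 or ZS. The function name, the 95% threshold, and the sample numbers are all hypothetical.

```python
def predictability_limit(lead_times, ens_variance, clim_variance, frac=0.95):
    """Return the first lead time at which the ensemble variance reaches
    frac * clim_variance, or None if that never happens.

    Hypothetical threshold proxy for the point at which ensemble members
    become indistinguishable from random draws from the model climate.
    """
    for t, var in zip(lead_times, ens_variance):
        if var >= frac * clim_variance:
            return t
    return None


# Hypothetical ensemble variances (m^2 s^-2) every 2 days, compared with a
# hypothetical climatological variance of 100 m^2 s^-2.
days = [0, 2, 4, 6, 8, 10, 12]
variance = [0.5, 6.0, 30.0, 70.0, 92.0, 96.0, 99.0]
print(predictability_limit(days, variance, clim_variance=100.0))  # → 10
```

Reducing initial-condition or model errors would shift such a variance curve later in time and thus move the crossing point, which is the sense in which the practical limit can be extended.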
Zhang et al. (2019, hereafter Z19) studied the predictability limit of midlatitude weather using global convection-permitting models, with a focus on the intrinsic predictability limit of day-to-day synoptic weather systems. In other words, Z19 tried to address a long-standing question raised by L69: how much further can NWP skill be extended? The methodology used in Z19 is similar to identical-model twin experiments, in which the same numerical model is integrated from nearly identical initial states until predictability is lost. Under a perfect-model assumption, this kind of experiment provides an estimate of the intrinsic predictability limit. This is a common approach in predictability studies (e.g., Tribbia and Baumhefner 2004; Zhang et al. 2003, 2007; Mapes et al. 2008; Judt 2018). Given the perfect-model assumption, the accuracy of this estimate largely depends on how realistically the model simulates the real atmosphere. There is growing evidence that moist physics plays a critical role in upscale error growth (e.g., Zhang et al. 2003; Selz and Craig 2015; Sun and Zhang 2016). Therefore, the ability of a model to resolve small-scale moist physics is critical for examining error growth and for providing an accurate estimate of the finite intrinsic range of predictability. To address this problem, Z19 used the most up-to-date (at that time) high-resolution full-physics global models: the 9-km version of the operational model at ECMWF and a uniform 3-km experimental finite-volume cubed-sphere model developed at GFDL during the Next Generation Global Prediction System (NGGPS) project in the United States. These models provide state-of-the-art estimates of intrinsic predictability.
Two types of perturbation experiments were conducted in Z19. One ensemble was started from the first 10 of the 21 available members of the operational ensemble of 4DVar analyses (EDA) at ECMWF, whereas the other was started from the same analyses after the analysis perturbations were reduced by a factor of 10. The former ensemble (EDA) represents current-day initial-condition errors, while the latter (EDA0.1) was designed to address the intrinsic predictability limit. The ECMWF and U.S. global convection-permitting models in Z19 produced very consistent results, implying that current-day deterministic weather forecasts may be extended by up to 5 days.
In their comment on Z19, Žagar and Szunyogh (2020, hereafter ZS) raised several concerns about Z19’s methodology and conclusions. They first questioned the perturbations used in Z19, arguing that EDA-only perturbations and the rescaling of these perturbations “do not provide realistic simulations of either the current-day operational, or the future ideal evolution of the forecast errors.” Another major issue they raised concerns the impact of model error on the growth of forecast uncertainty. They also suggested using a parametric model to estimate the extension of practical predictability.
We thank ZS for the opportunity to further clarify our findings in Z19, as many of the concerns in ZS stem from a misunderstanding of the aims of Z19 and of the approach adopted there. In this reply, section 2 shows that the use of EDA perturbations in Z19 is reasonable and consistent with its aims. Section 3 then shows that the model error argument and the parametric model suggested in ZS are invalid. A summary is given in section 4.
2. Representation of the analysis and forecast uncertainties
a. EDA-only perturbations
ZS pointed out that the operational ECMWF ensemble prediction system (ENS) needs to supplement the EDA initial-condition perturbations with singular vector initial-condition perturbations and a parameterization of the effect of model errors to realistically simulate the evolution of forecast uncertainty (Isaksen et al. 2010; Leutbecher et al. 2017; Haiden et al. 2018). Therefore, ZS suspected that including only a subset of 10 of the EDA perturbations, without the singular vector perturbations and the parameterization of model-error effects, may have prevented the ensemble in Z19 from fully capturing the forecast uncertainty at longer forecast times.
We first note that this argument of ZS applies more to the practical predictability scenario (the EDA case in Z19). The study of Z19, however, focuses on the intrinsic predictability scenario (the EDA0.1 case). Motivated by L69, the key question Z19 aims to address is “With a nearly perfect model, what limit could the predictability horizon of NWP approach as the initial error approaches zero?” In this case, the use of EDA-only perturbations guarantees that no model error is introduced into the integration, which ensures that the initial growth of forecast uncertainty in EDA0.1 is largely due to the butterfly effect originally proposed by Lorenz. In contrast, both singular vector perturbations and model error schemes were developed in part to address deficiencies of the model, which runs counter to the perfect-model assumption of the EDA0.1 experiment. Had they been adopted, it would be hard to know whether the initial growth of spread in the EDA0.1 scenario was due to these model error schemes or to the upscale error propagation we focused on in Z19. Hence, not using singular vectors and model error schemes is justified in experiments targeting intrinsic predictability.
By using a subset of EDA perturbations, Z19 not only aimed to “observe the process of forecast uncertainty growth without the influence of the unique dynamics of singular vectors,” as mentioned in ZS, but also tried to be consistent with the intrinsic predictability scenario (the main focus of Z19). While we acknowledge that using only 10 EDA members is not the best way to estimate practical predictability, we show here that the use of 10 members is also reasonable for the study of practical predictability in Z19. As we do not have the computing resources to rerun the experiments, we instead use the THORPEX Interactive Grand Global Ensemble (TIGGE) dataset to better represent forecast uncertainty growth at the operational centers (Bougeault et al. 2010). TIGGE has collected ensemble forecast data from 10 global NWP centers in real time since October 2006 and makes them available for scientific research. The ECMWF contribution to TIGGE consists of one control forecast and 50 perturbed forecast members, generated using EDA and singular vector perturbations to simulate initial uncertainties (Buizza et al. 2008), and also includes stochastic parameterizations to simulate the effect of model errors (Palmer et al. 2009).
Figure 1 shows the evolution of the ensemble variance of 500-hPa winds for the 50 perturbed ensemble members in the TIGGE dataset, calculated using the same method as in Fig. 3 of Z19 for the same summer case. Forecast uncertainty in the TIGGE dataset does grow faster during the first few days, which is expected given the nature of singular vector perturbations and the added representation of model error. We also see that the ensemble variance computed from only the first 10 of the 50 available TIGGE members is almost identical to that computed from all 50 members, implying that a 10-member ensemble is adequate for the metric considered in Z19.
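A minimal sketch of the kind of calculation behind this comparison follows. It assumes the metric is the sample variance of the 500-hPa u and v components across members, summed and averaged over grid points without area weighting; the exact definition used for Fig. 3 of Z19 (domain, area weighting, normalization) may differ. The function name is hypothetical, and the random fields are synthetic stand-ins, not TIGGE data.

```python
import numpy as np


def ensemble_wind_variance(u, v):
    """Grid-point-mean ensemble variance of the horizontal wind.

    u, v: arrays of shape (n_members, n_points), e.g. 500-hPa wind components.
    Assumed metric: sample variance across members (ddof=1) of u plus that
    of v, averaged over grid points with no area weighting.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.mean(u.var(axis=0, ddof=1) + v.var(axis=0, ddof=1)))


# Synthetic stand-in for a 50-member ensemble (NOT TIGGE data).
rng = np.random.default_rng(0)
u50 = rng.normal(scale=3.0, size=(50, 1000))
v50 = rng.normal(scale=3.0, size=(50, 1000))

full = ensemble_wind_variance(u50, v50)
first10 = ensemble_wind_variance(u50[:10], v50[:10])
print(f"all 50 members: {full:.2f}  first 10 members: {first10:.2f}")
```

With synthetic, statistically identical members the subset agrees with the full ensemble by construction; whether this holds for a real ensemble is an empirical question, which is what the TIGGE comparison in Fig. 1 addresses.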
It is worth pointing out that, although slower initially, the growth rate of ensemble variance in Z19 differs little from that of the 50-member TIGGE ensemble after 2 days (it is even slightly larger because of a weaker saturation effect). As nonlinear processes become more important and the linearly computed singular vectors lose their optimality, the difference between Z19 and TIGGE diminishes further. In summary, we agree that singular vector perturbations could produce faster initial error growth in the EDA case, but we argue that this has a small impact on the estimate of the practical predictability limit, provided that the estimated limit lies beyond this short forecast range, as reported by Z19. Concerning the inclusion of model error schemes in Z19’s ensembles for the practical predictability scenario, we also argue, based on Fig. 1, that including stochastic parameterizations would not have fundamentally changed the results of Z19. Results from ECMWF (see, e.g., Palmer et al. 2009) have indeed shown that the inclusion of model error schemes has a small impact, of about 5%–10% on average globally.
To investigate the intrinsic predictability limit of midlatitude weather systems, Z19 reduced the EDA perturbations by a constant factor of 10 to obtain nearly perfect initial conditions. Because EDA-type perturbations have larger amplitudes in the tropics and at larger scales, this rescaling leads, in an absolute sense, to larger reductions at larger scales and in the tropics. ZS therefore suspected that “reduced initial-condition uncertainties in the tropics and at the large scales must have played a major role in the increased predictability in the midlatitudes in the experiments of Z19.” This is speculative. Rescaling the perturbations is one way to test the sensitivity to reducing initial-condition errors to a minimum, and thus to assess the intrinsic predictability limit. It is hard to separate the downscale propagation of large-scale errors from local error growth at small scales, because both processes act on very short time scales. In fact, for the intrinsic predictability limit, as long as the perturbation amplitude is sufficiently small, the scale dependence of the minute perturbations is not very important. Sun and Zhang (2016) showed that minute large-scale perturbations follow an evolution very similar to that in the noise-type perturbation experiment, a result also seen in Durran and Weyn (2016).
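A minimal sketch of the uniform rescaling described above, assuming perturbations are defined about the ensemble mean (whether Z19 rescaled about the ensemble mean or about an unperturbed analysis is not stated here). Because every perturbation is multiplied by the same factor, the ensemble variance drops by the square of that factor at every point, so the absolute reduction is largest where the original spread is largest, which is the point made above about the tropics and large scales. The function name, array shapes, and values are hypothetical.

```python
import numpy as np


def rescale_perturbations(members, factor=0.1):
    """Shrink perturbations about the ensemble mean by a constant factor.

    members: array of shape (n_members, ...) holding analysis states.
    Member x_i becomes mean + factor * (x_i - mean): the ensemble mean is
    unchanged and the spread is multiplied by `factor` at every point.
    """
    members = np.asarray(members, dtype=float)
    mean = members.mean(axis=0)
    return mean + factor * (members - mean)


# Hypothetical 10-member ensemble at 500 points whose spread is larger in the
# second half, mimicking regions where analysis uncertainty is larger.
rng = np.random.default_rng(1)
spread = np.where(np.arange(500) < 250, 0.5, 2.0)
eda = 280.0 + spread * rng.standard_normal((10, 500))
eda01 = rescale_perturbations(eda, factor=0.1)

var_before = eda.var(axis=0)
var_after = eda01.var(axis=0)
# Variance scales by factor**2 everywhere ...
print(np.allclose(var_after, 0.01 * var_before))
# ... so the absolute reduction is larger where the original spread is larger.
print((var_before - var_after)[250:].mean() > (var_before - var_after)[:250].mean())
```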
c. β term
In Eq. (1), which Z19 used to fit the growth of the ensemble spread, ε(t) is the normalized error.¹ One difference between Z19 and some previous studies (e.g., Dalcher and Kalnay 1987; Magnusson and Källén 2013; Žagar et al. 2017) is that ε(t) in Z19 is the ensemble spread of the 10 ensemble members, which contains no model error at all. Hence, model error contributes nothing to the growth of ε(t) in Z19; β is the “superexponential” term that represents intrinsic error growth that persists even when the model is perfect and the initial-condition error ε(0) is zero. If β = 0, Eq. (1) describes simple exponential error growth (before saturation), in which case forecast skill could be improved without limit by continually reducing the initial-condition error ε(0). We agree with ZS that similar superexponential error growth can be found in coarse-resolution full-physics models (Harlim et al. 2005); otherwise, there would be no intrinsic predictability limit for weather systems in these coarse-resolution models. Nonetheless, as pointed out in Z19 and in the introduction, moist physics might be the key to this intrinsic predictability limit. For example, Sun and Zhang (2016) showed that Gaussian white noise perturbations added to a dry model decayed during the first 36 h, in contrast to their rapid growth in the moist model. Therefore, to obtain an accurate estimate of the intrinsic predictability limit, we believe that a global convection-permitting model, which at least partially resolves moist convection, is the best tool. As stated in Buizza (2010) and Bengtsson et al. (2008), although such simple error-growth models are useful for investigating forecast error growth, they have difficulty describing error growth at short forecast ranges and thus should be used with care.
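To make the role of β concrete, the sketch below assumes that Eq. (1) has the Dalcher–Kalnay-type form dε/dt = (αε + β)(1 − ε). Under that assumption, partial fractions give the time for ε to grow from ε(0) = ε0 to a threshold εT as t = ln{[(αεT + β)(1 − ε0)] / [(αε0 + β)(1 − εT)]}/(α + β). With β > 0 this time stays finite as ε0 → 0 (an intrinsic limit), whereas with β = 0 it diverges logarithmically, which is the unlimited improvement described above. The parameter values and the threshold of 0.9 are hypothetical, not the values fitted in Z19; an RK4 integration is included only as a numerical cross-check of the closed form.

```python
import math


def time_to_threshold(eps0, eps_t, alpha, beta):
    """Time for eps to grow from eps0 to eps_t (0 <= eps0 < eps_t < 1) under
    d(eps)/dt = (alpha*eps + beta) * (1 - eps), integrated by partial fractions."""
    if beta == 0 and eps0 == 0:
        return math.inf  # with beta = 0, an exactly zero error never grows
    num = (alpha * eps_t + beta) * (1.0 - eps0)
    den = (alpha * eps0 + beta) * (1.0 - eps_t)
    return math.log(num / den) / (alpha + beta)


def integrate_rk4(eps0, alpha, beta, t_end, dt=1e-3):
    """Numerical cross-check: classical RK4 integration of the same equation."""
    def rhs(e):
        return (alpha * e + beta) * (1.0 - e)

    n = max(1, math.ceil(t_end / dt))
    h = t_end / n  # step size chosen so that n steps land exactly on t_end
    eps = eps0
    for _ in range(n):
        k1 = rhs(eps)
        k2 = rhs(eps + 0.5 * h * k1)
        k3 = rhs(eps + 0.5 * h * k2)
        k4 = rhs(eps + h * k3)
        eps += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return eps


alpha, beta = 0.5, 0.01  # hypothetical normalized growth parameters (per day)
for eps0 in (1e-1, 1e-2, 1e-4, 1e-8, 0.0):
    with_beta = time_to_threshold(eps0, 0.9, alpha, beta)
    without_beta = time_to_threshold(eps0, 0.9, alpha, 0.0)
    print(f"eps0={eps0:g}: beta>0 -> {with_beta:.2f} d, beta=0 -> {without_beta:.2f} d")
```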
3. Model versus initial-condition improvements and the parametric model
a. Model error
While the practical predictability limit of an NWP system is strongly affected by model errors and model bias, it is common to adopt the perfect-model assumption when estimating the intrinsic predictability limit. As pointed out in the introduction and in Z19, the accuracy of the estimated intrinsic predictability limit depends strongly on how well the model captures the physical processes of the real world. This is also why Z19 chose two of the most up-to-date global models available at the time. The consistency between these two very different models strengthens our confidence in the results reported in Z19. Moreover, some systematic model biases do not necessarily change the intrinsic predictability of the system. For example, slightly misplacing small mountains will certainly introduce a model bias, but it should not change the intrinsic predictability of the flow. Nonetheless, the scenario and the estimated intrinsic limit could still differ once the grid spacing of global models drops below about 1 km and moist convection becomes fully resolved. New experiments with more advanced models are needed to answer this question.
Z19 stated that achieving the “up to 5 days” potential gain “requires coordinated efforts by the entire community to design better numerical weather models [emphasis added], to improve observations, and to make better use of observations with advanced data assimilation and computing techniques.” Although the need to reduce model error is implied by this statement and by the perfect-model assumption in Z19, Z19 did not state it explicitly. We welcome the opportunity to do so here to avoid misunderstandings such as those in ZS: reducing initial-condition error alone will certainly not achieve the potential “up to 5 days” gain reported in Z19.
The best way to evaluate the impact of model error is to run experiments with different models from the same initial and boundary conditions and then compare their performance (Magnusson et al. 2019). Such experiments are usually conducted as part of continuous operational model development (Haiden et al. 2018). Yet, because model development generally proceeds piece by piece and it takes years, sometimes a decade, to see significant improvements in the forecast skill of an NWP system, very few systematic results have been published.
b. Parametric model in ZS
We agree with ZS that parametric analytical models for the average growth of the forecast uncertainty are useful. Equation (1), which is used in Z19, also serves this purpose. Yet one needs to be cautious not to overinterpret the results of these parametric models, as the parameters are fitted and could be affected by many factors.
For convenience, the model used in Žagar et al. (2017) is listed here [their Eq. (12)]:
where E is the forecast uncertainty, s is a fitted parameter, and Emax and Emin are the maximal and minimal values of the model function, with E ∈ [Emin, Emax] (Žagar et al. 2017). Eq. (2) is just a variant of Eq. (1); the two are mathematically equivalent. Substituting α = sEmax and γ = −sEmaxEmin and normalizing E by Emax transforms Eq. (2) into the form of Eq. (1).
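The equivalence can be checked in a few lines of algebra. The derivation below assumes, consistent with the substitutions just quoted, that Eq. (2) has the logistic form dE/dt = s(E − Emin)(Emax − E) and that Eq. (1) has the form dε/dt = (αε + β)(1 − ε), with ε = E/Emax and β normalized by Emax as described for Z19; both forms are our reading rather than equations reproduced from the cited papers.

```latex
\begin{aligned}
\frac{dE}{dt} &= s\,(E - E_{\min})(E_{\max} - E)
              = s E_{\max}\,(E - E_{\min})\Bigl(1 - \frac{E}{E_{\max}}\Bigr) \\
              &= \bigl(s E_{\max} E - s E_{\max} E_{\min}\bigr)\Bigl(1 - \frac{E}{E_{\max}}\Bigr)
              = (\alpha E + \gamma)\Bigl(1 - \frac{E}{E_{\max}}\Bigr), \\
\frac{d\varepsilon}{dt} &= \frac{1}{E_{\max}}\,\frac{dE}{dt}
              = \Bigl(\alpha\varepsilon + \frac{\gamma}{E_{\max}}\Bigr)(1 - \varepsilon)
              = (\alpha\varepsilon + \beta)(1 - \varepsilon),
\qquad \varepsilon = \frac{E}{E_{\max}},\quad \beta = \frac{\gamma}{E_{\max}} = -s E_{\min}.
\end{aligned}
```

Under this reading, a positive β corresponds to Emin < 0: the lower root of the logistic lies below zero, so the modeled error grows even from E = 0.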
Whereas similar mathematical models are used in ZS and Z19, the application of the models and the interpretation are very different. Given that Žagar et al. (2017) and ZS applied this model to the operational dataset, model error plays a role in their study. In the experiment of Z19, as discussed above, the growth of the forecast uncertainty has no contribution from model error; therefore, β is mainly interpreted as the intrinsic upscale error propagation from small scales.
ZS applied the model separately to each horizontal scale (each horizontal wavenumber). Figure 1 of ZS also compared their fitted results for June 2018 and May 2015, claiming that more improvement has been achieved at subsynoptic and even smaller scales. However, this is not a fair comparison. The forecast skill of operational models varies considerably from month to month, with a strong seasonal cycle (Haiden et al. 2018), and the seasonal cycle signal is much larger than the improvement achieved over the past few years. The differences between the June 2018 and May 2015 results shown in Fig. 1 of ZS are likely due to the different large-scale flow patterns and forcing in these two periods. A valid comparison would be to rerun the updated model from the May 2015 initial conditions to show the impact of model improvement. Alternatively, the 2015 version of the model could be run from the 2018 initial conditions to better understand the impact of reduced initial-condition errors.
It is also questionable to use this parametric model to predict the effect of reducing initial-condition errors, as shown in Fig. 2 of ZS. ZS processes each horizontal scale separately. However, given the strong interactions between different horizontal scales, it is almost meaningless to examine the effect of reducing the initial-condition error at a single horizontal scale (as in Fig. 2 of ZS) without information about the initial-condition errors at other scales. For example, if we reduced the initial-condition error only at k = 35 while keeping the initial-condition errors at all other horizontal scales unchanged, we would most likely see no improvement in forecast skill at that same scale (k = 35): strong cross-scale interactions would very quickly fill the initial gap at k = 35, making the reduction at this particular scale negligible. This example also indicates that the error growth rate at a given horizontal scale (e.g., k = 35) is affected by the errors at this scale and at other horizontal scales. Therefore, reducing the initial-condition errors will necessarily change the error growth rates and the associated parameters of the parametric model, which makes estimates based on fixed parameters, as in ZS, less convincing.
A similar yet more meaningful question would be: How much gain should we expect at each scale if the initial-condition errors are reduced by a certain percentage (e.g., 75%) at all scales? This would be close to the example given in Fig. 2 of ZS.² However, it is very difficult to answer this kind of question with a simple parametric equation because of strong cross-scale interactions, as we briefly illustrate next. The calculation in ZS used fixed parameters for the parametric model, so the evolution of the errors in their model depends only on the error amplitude. The additional gain calculated in ZS is therefore simply the time needed for the reduced errors to grow back to their original values. Suppose we have such estimates for different horizontal scales, as calculated in ZS, denoted t1, t2, …, tn for wavenumbers k1, k2, …, kn. Errors at all scales are initially reduced (e.g., by 75%). After a certain time tm (e.g., the median of t1, t2, …, tn), errors at the scales whose estimated times are less than tm will have grown larger than their original values according to the fixed-parameter model, whereas errors at the other scales will still be smaller than their original values. The error distribution would then be very different from the original one unless t1 = t2 = … = tn. Because of cross-scale interactions, the subsequent error growth rates would soon differ substantially, which would in turn change the parameters of the parametric model. Hence, estimates based on fixed parameters become invalid. In fact, the best way to answer this question is with numerical models, as in Z19. Z19 also provided qualitative answers to this kind of question, implying that we should see more gain at relatively larger scales, consistent with a longer intrinsic predictability limit at larger scales.
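The fixed-parameter bookkeeping described in this paragraph can be sketched as follows. The sketch uses the normalized form dε/dt = (αε + β)(1 − ε) as a stand-in for the ZS model (the two forms being equivalent, as noted above) and three hypothetical wavenumbers, labeled k1, k2, and k3, with made-up parameters; it is not ZS's actual calculation. It computes each scale's "gain" (time to regrow from 25% of the original error back to the original error), takes the median gain as tm, and then compares each scale's error at tm with its original value.

```python
import math
import statistics


def eps_at(t, eps0, alpha, beta):
    """Closed-form solution of d(eps)/dt = (alpha*eps + beta)(1 - eps)."""
    q = (alpha * eps0 + beta) / (1.0 - eps0) * math.exp((alpha + beta) * t)
    return (q - beta) / (alpha + q)


def regrowth_time(eps0, reduction, alpha, beta):
    """Fixed-parameter 'gain': time for an error reduced to (1 - reduction)*eps0
    to grow back to eps0."""
    start = (1.0 - reduction) * eps0
    num = (alpha * eps0 + beta) * (1.0 - start)
    den = (alpha * start + beta) * (1.0 - eps0)
    return math.log(num / den) / (alpha + beta)


# Hypothetical wavenumbers k1, k2, k3 with made-up (alpha, beta, original eps0);
# these are NOT fitted values from ZS or Z19, and the scales do not interact.
scales = {
    "k1": (0.4, 0.005, 0.05),
    "k2": (0.8, 0.02, 0.10),
    "k3": (1.6, 0.08, 0.20),
}
reduction = 0.75
gains = {k: regrowth_time(e0, reduction, a, b) for k, (a, b, e0) in scales.items()}
t_m = statistics.median(list(gains.values()))

# Error at t_m, starting from the reduced error, relative to the original error.
ratios = {
    k: eps_at(t_m, (1.0 - reduction) * e0, a, b) / e0
    for k, (a, b, e0) in scales.items()
}
for k in scales:
    print(f"{k}: gain {gains[k]:.2f} d, error ratio at t_m {ratios[k]:.2f}")
```

At tm the scale with the shortest gain already exceeds its original error while the scale with the longest gain is still well below it, so the error distribution across scales has changed. Because the parameters are held fixed and each wavenumber evolves independently, nothing in this calculation responds to that change; the missing feedback is the cross-scale interaction argued above to invalidate fixed-parameter estimates.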
After all, one should never expect a deterministic model to forecast the genesis of a tornado (with horizontal scales of hundreds of meters) at a specific location 1 day ahead.
Z19 investigated the predictability limit of midlatitude weather systems, with a focus on the intrinsic predictability limit of a deterministic forecast. Using the 9-km model currently running operationally at ECMWF and the 3-km experimental model developed at GFDL, Z19 concluded that “assuming the current-generation state-of-the-science NWP models could capture the most essential physical processes in the real world, we can further improve the forecast accuracy of day-to-day weather events by up to 5 days.” Achieving this additional potential gain requires continued coordinated efforts by the entire community to design more accurate and better NWP models (reducing model error), improve and enhance observing techniques and networks, and make better use of observations with advanced data assimilation (reducing initial-condition error).
ZS criticized Z19 for using only a subset of 10 of the EDA perturbations and for not including singular vector perturbations and parameterizations of model errors. We think that ZS’s comments apply more to practical predictability than to intrinsic predictability, which was the focus of Z19. For the intrinsic predictability limit (the EDA0.1 experiment in Z19), adding singular vector perturbations and stochastic schemes would complicate the interpretation of the initial error growth and is not relevant to our central question. Adopting a perfect-model assumption and running global convection-permitting models is, in our opinion, the best way to obtain an accurate estimate of the intrinsic predictability limit. We further note that adopting a perfect-model assumption in the intrinsic predictability experiments also means that the “up to 5 days” potential gain reported in Z19 for midlatitude synoptic weather systems requires reducing both model error and initial-condition errors. For practical predictability (the EDA experiment in Z19), we agree with ZS that the choice of perturbation method and model error have an impact. Indeed, ECMWF continues to use singular vectors to generate its ensemble’s initial perturbations to improve ensemble reliability in the short forecast range, as documented in several papers. Nonetheless, through comparison with the TIGGE dataset for the same event, we argue that the use of 10 EDA members in Z19 is reasonable, and we show that different perturbation methods and model errors mainly influence the forecast uncertainty in the first 2 days. Finally, we disagree with the suggestion in ZS to use their parametric model to predict the effects of reducing initial-condition errors, because of strong interactions between different scales.
The ECMWF services are gratefully acknowledged for the access to the ENS data. The authors thank Baoqiang Xiang and Kun Gao for providing useful comments on an earlier version. Y. Qiang Sun is funded under Award NA18OAR4320123 from the National Oceanic and Atmospheric Administration, U.S. Department of Commerce.
The original article that was the subject of this comment/reply can be found at http://journals.ametsoc.org/doi/abs/10.1175/JAS-D-18-0269.1.
¹ We normalize ε(t) as ε(t) = E/Emax, where E and Emax are defined in Eq. (2); β here is also normalized by Emax. Z19 did not report the value of Emax. However, the top panels of Figs. 3 and 4 in Z19 show that the errors in both the EDA and EDA0.1 experiments approach the same saturation value at the end of the integration. Emax is calculated as the average of E in EDA and EDA0.1 over the last day (day 20). Emax is larger in winter than in summer and is on the order of 100 m² s⁻² (Figs. 3a,b of Z19). The same Emax is used to fit both the EDA and EDA0.1 experiments in Z19.
² Figure 2 of ZS shows their results for each horizontal scale when the initial-condition errors are reduced by 75% and 99%.