Understanding the predictability limit of day-to-day weather phenomena such as midlatitude winter storms and summer monsoonal rainstorms is crucial to numerical weather prediction (NWP). This predictability limit is studied using unprecedented high-resolution global models with ensemble experiments of the European Centre for Medium-Range Weather Forecasts (ECMWF; 9-km operational model) and identical-twin experiments of the U.S. Next-Generation Global Prediction System (NGGPS; 3 km). Results suggest that the predictability limit for midlatitude weather may indeed exist and is intrinsic to the underlying dynamical system and instabilities even if the forecast model and the initial conditions are nearly perfect. Currently, a skillful forecast lead time of midlatitude instantaneous weather is around 10 days, which serves as the practical predictability limit. Reducing the current-day initial-condition uncertainty by an order of magnitude extends the deterministic forecast lead times of day-to-day weather by up to 5 days, with much less scope for improving prediction of small-scale phenomena like thunderstorms. Achieving this additional predictability limit can have enormous socioeconomic benefits but requires coordinated efforts by the entire community to design better numerical weather models, to improve observations, and to make better use of observations with advanced data assimilation and computing techniques.
Weather forecasting has improved dramatically since the introduction of numerical weather prediction (NWP) nearly six decades ago (Bauer et al. 2015). This has been accomplished through ever-increasing computing power, improved models running at ever-increasing resolution with more accurate representation of atmospheric physical processes, and more sophisticated four-dimensional data assimilation algorithms that can better ingest ever-increasing volumes and quality of in situ and remotely acquired observations (WMO 2015). A widely used measure of global NWP forecast quality is the anomaly correlation coefficient (ACC) of 500-hPa geopotential height between the forecasts and observations. In practice, 60% is usually used as the threshold for a skillful synoptic-scale weather forecast. The evolution of ACC (Fig. 1) shows that useful deterministic forecasts by arguably the most advanced NWP model at the European Centre for Medium-Range Weather Forecasts (ECMWF) can at best be made up to around 10 days; this number was 7 days 30 years ago (Simmons and Hollingsworth 2002; Bauer et al. 2015). More improvements can be seen in the Southern Hemisphere where the traditional observing network is sparser but which has now been densely covered by satellite observations.
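As an illustration of the ACC metric mentioned above, the following minimal sketch assumes the commonly used centered form of the anomaly correlation, with anomalies taken relative to a climatology. The function name, the optional weights (e.g., area weights), and the sample height values are hypothetical and are not taken from any operational verification system.

```python
import math

def anomaly_correlation(forecast, analysis, climatology, weights=None):
    """Centered anomaly correlation between a forecast and the verifying analysis.

    All inputs are flat lists over the same grid points. `weights` (e.g. area
    weights) is optional; equal weights are assumed if it is omitted.
    """
    n = len(forecast)
    if weights is None:
        weights = [1.0] * n
    wsum = sum(weights)
    # Anomalies relative to climatology
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    a_anom = [a - c for a, c in zip(analysis, climatology)]
    # Remove the (weighted) mean anomaly: the "centered" form
    f_mean = sum(w * x for w, x in zip(weights, f_anom)) / wsum
    a_mean = sum(w * x for w, x in zip(weights, a_anom)) / wsum
    cov = sum(w * (x - f_mean) * (y - a_mean)
              for w, x, y in zip(weights, f_anom, a_anom))
    var_f = sum(w * (x - f_mean) ** 2 for w, x in zip(weights, f_anom))
    var_a = sum(w * (y - a_mean) ** 2 for w, y in zip(weights, a_anom))
    return cov / math.sqrt(var_f * var_a)

# Hypothetical 500-hPa geopotential heights (m) at five grid points
climatology = [5500.0, 5520.0, 5540.0, 5560.0, 5580.0]
analysis = [5450.0, 5530.0, 5600.0, 5540.0, 5570.0]
forecast = [5470.0, 5525.0, 5580.0, 5550.0, 5590.0]

acc = anomaly_correlation(forecast, analysis, climatology)
is_skillful = acc >= 0.6  # the 60% threshold quoted in the text
```

A forecast identical to the analysis gives an ACC of 1, while a forecast whose anomalies are unrelated to the observed ones gives a value near 0.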
Improved NWP can have significant socioeconomic benefits by better predicting the occurrence of natural disasters, saving lives, and protecting property. For example, improved NWP is largely credited with the dramatic improvement in tropical cyclone prediction worldwide. The present tropical cyclone track forecast accuracy at the U.S. National Hurricane Center on average has gained almost a day lead time per decade (e.g., Zhang and Weng 2015): the yearly averaged 5-day lead-time track forecast error for the Atlantic basin in 2016 is smaller than the 2-day lead-time forecast error in 1990, which may have saved billions of dollars (Katz and Murphy 2015).
Yet improvement of NWP has limits. From the perspective of predictability, this concept of “atmospheric predictability limit” can be grossly categorized into intrinsic versus practical predictability (Lorenz 1996; Melhauser and Zhang 2012). As discussed in Ying and Zhang (2017), intrinsic predictability refers to “the ability to predict given nearly perfect representation of the dynamical system (by a forecast model) and nearly perfect initial/boundary conditions, an inherent limit due to the chaotic nature of the atmosphere and cannot be extended by any means” (Lorenz 1963, 1969; Zhang et al. 2003, 2007; Sun and Zhang 2016). Practical predictability, also commonly referred to as our weather prediction skill, is “the ability to predict given realistic uncertainties in both the forecast model and initial and boundary conditions” (Lorenz 1982, 1996; Zhang et al. 2002, 2007). This practical predictability can be extended through reduction in key limiting factors of the forecast errors, including initial-condition errors, boundary condition errors, and model errors. All these factors, especially the initial-condition errors, have been greatly and could be further reduced with better NWP models ingesting high-accuracy observations using advanced data assimilation approaches along with advanced computing power (e.g., Zhang et al. 2009; Zhang and Weng 2015; Emanuel and Zhang 2016). Nevertheless, given our desire for better weather forecasting at all temporal and spatial scales, it is natural to ask whether an intrinsic predictability of the midlatitude weather exists. If yes, what is this inherent limit given nearly perfect NWP models with nearly perfect initial conditions? This is a crucial question that meteorologists have sought to answer ever since the beginning of NWP (e.g., Thompson 1957; Lorenz 1969; Leith 1971). 
Answering this question could provide guidance to society in decisions to enhance observing networks, improve models, and better assimilate observations into the forecast models.
Excellent work on this subject area has been pioneered by Lorenz, who first introduced the concept of “butterfly effect,” which described the existence of the intrinsic predictability limit using a spectral turbulence model (Lorenz 1969). Lorenz showed that, for flow whose spectral slope is shallower than −3, error-doubling time decreases with decreasing scales, which led to an upscale error spreading and could provide an effective intrinsic limit to the predictability of the flow. For flow with a slope steeper than −3, unlimited predictability might be achieved. This butterfly effect concept also inspires many subsequent studies using a hierarchy of turbulence models, which further confirmed Lorenz’s theory (e.g., Leith and Kraichnan 1972; Rotunno and Snyder 2008; Durran and Gingrich 2014). While it remains unclear how these turbulence model results relate to our real atmosphere, it is widely accepted that the real atmosphere very likely also has an intrinsic limit of predictability (Palmer et al. 2014).
Estimates of this intrinsic predictability limit for a deterministic forecast can be made based on numerical integrations of model equations from two (identical-twin experiments) or more rather similar or even identical initial states (Lorenz 1963, 1969). The limit will occur at a time when the spread between these nearly identical runs starts to saturate and becomes as much as the spread among some randomly selected, but dynamically and statistically possible states. The accuracy of this kind of estimate is dependent on the accuracy of the forecast model used (Lorenz 1996). Earlier studies have used models of increasing complexity to investigate this intrinsic predictability and the error growth behavior of our atmosphere (e.g., Leith 1971; Daley 1981; Zhang et al. 2003, 2007; Mapes et al. 2008; Morss et al. 2009; Ngan et al. 2009). While these studies all agree on the existence of an intrinsic predictability limit for the respective weather systems, detailed error growth behavior differs among different models and different weather systems being studied. For example, in addition to an error cascade from smaller to larger scales (upscale growth; e.g., Lorenz 1969; Morss et al. 2009), some recent studies also show errors could grow spontaneously at all scales (up magnitude) without saturating at smaller scales (e.g., Mapes et al. 2008; Durran and Gingrich 2014).
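The identical-twin procedure described above can be illustrated on the three-variable Lorenz (1963) system instead of a full NWP model. The sketch below is only a toy analog: the perturbation size, the time step, and the use of the climatological standard deviation of x as the saturation level are assumptions made for illustration, and time is in nondimensional model units.

```python
import math

def lorenz63_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz (1963) system by one fourth-order Runge-Kutta step."""
    def tendency(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = tendency(state)
    k2 = tendency(shifted(state, k1, dt / 2))
    k3 = tendency(shifted(state, k2, dt / 2))
    k4 = tendency(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def twin_experiment(perturbation=1e-8, dt=0.01, n_steps=6000):
    """Run a control and a minutely perturbed twin; return the saturation time."""
    control = (1.0, 1.0, 1.0)
    twin = (1.0 + perturbation, 1.0, 1.0)
    control_x, distances = [], []
    for _ in range(n_steps):
        control = lorenz63_step(control, dt)
        twin = lorenz63_step(twin, dt)
        control_x.append(control[0])
        distances.append(math.dist(control, twin))
    # Saturation level (an assumption): climatological spread of x in the control run
    mean_x = sum(control_x) / n_steps
    clim_std = math.sqrt(sum((x - mean_x) ** 2 for x in control_x) / n_steps)
    for step, d in enumerate(distances, start=1):
        if d >= clim_std:
            return step * dt, clim_std, distances
    return None, clim_std, distances

t_sat, clim_std, distances = twin_experiment()
```

The twin runs stay nearly identical at first and then diverge until their separation is as large as the typical spread of states on the attractor, which is the sense in which the limit is defined in the text.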
Given that there is a degree of model dependency, many studies now tend to explore atmospheric predictability under more realistic frameworks with either regional (e.g., Zhang et al. 2003, 2007; Selz and Craig 2015; Ying and Zhang 2017) or global (e.g., Simmons and Hollingsworth 2002; Tribbia and Baumhefner 2004; Froude et al. 2013) NWP models. Regional models, which require boundary conditions, generally constrain longer-term error growth and propagation within the domain boundaries. Previous global predictability studies, on the other hand, usually do not have sufficient model resolutions to explicitly resolve mesoscale processes and moist convections, which have been shown to be critical for the initial error growth (Zhang et al. 2003, 2007; Selz and Craig 2015; Sun and Zhang 2016). Indeed, there has been increasing evidence that mesoscale error growth shows similarity with the turbulence case under the shallower −5/3 kinetic energy spectrum, which is not well simulated in most coarse-resolution global NWP models (Augier and Lindborg 2013; Sun and Zhang 2016; Weyn and Durran 2017).
With recent advancement in computing capability, we now have entered a new era of global convection-permitting NWP models (Putman and Suarez 2011; Skamarock et al. 2014). Mapes et al. (2008) examined the predictability behavior of the atmosphere using global 7-km aquaplanet identical-twin simulations, with a focus on the tropics. Judt (2018) studied the atmospheric predictability through a pair of convection-permitting identical-twin simulations with the newly developed global Model for Prediction Across Scales (MPAS; Skamarock et al. 2014). Building on the findings of previous theoretical and modeling studies, our work here seeks to estimate the intrinsic limit of day-to-day weather predictability using ensemble simulations with the most advanced global NWP models at both ECMWF and U.S. NOAA. Our particular emphasis will be synoptic-scale weather systems dominated by baroclinic instability in the midlatitudes, where most of the world population resides. In particular, we showcase the practical versus intrinsic predictability limits of the global midlatitude weather during two periods in boreal winter and summer, respectively. These periods also included two recent hazardous regional weather events: a wintertime cold-surge event affecting northern Europe in early January 2016, and a summertime rainfall-flooding event in China during July 2016. The choice of these two events is rather subjective and somewhat random, with the intent to represent the typical midlatitude predictability while also covering some notable weather events in recent years. Nevertheless, neither of these two cases falls into the “forecast bust” cases using the criteria identified by Rodwell et al. (2013). 
Moreover, to the best of our knowledge, there were no severe weather outbreaks during these two periods in the midlatitude atmosphere of the Southern Hemisphere whose predictability will be simultaneously examined and compared with the Northern Hemispheric midlatitudes that have notable weather events.
Section 2 of this paper introduces the experiment design of our work, including the models we used and the perturbations added for each ensemble. Analyses of the ensemble spread from different perspectives are given in section 3, together with a physical interpretation of the results and the resulting estimate of the predictability limit. Discussions on the limitations of current work and concluding remarks are presented in section 4.
2. Experimental design
This study adopts the established methodologies described in the introduction for studying atmospheric predictability using perfect-model, identical-twin experiments in which ensemble members with minute initial-condition differences are compared. A series of ensemble simulations with the state-of-the-science global NWP model at ECMWF [viz., the Integrated Forecast System (IFS)], and U.S. Next-Generation Global Prediction System (NGGPS) with finite-volume cubed-sphere dynamical core (FV3) are designed to address the following two key questions: 1) What is the intrinsic predictability limit of multiscale midlatitude weather assuming a perfect model with nearly perfect initial conditions? 2) How much longer can the practical predictability be increased by reducing initial-condition uncertainties to different degrees of accuracy?
a. Model details
1) ECMWF/IFS model
The IFS control and ensemble forecasts presented herein use the latest upgrade (cycle 41r2) of the ECMWF IFS, the highest-resolution (~9 km) global operational NWP model to date. More details of this model upgrade can be found on the official website of ECMWF (http://www.ecmwf.int/). Different from previous versions, this new ECMWF IFS model implements a cubic octahedral reduced Gaussian grid (with spectral truncation denoted by TCO1279) instead of the linear reduced Gaussian grid. With this cubic reduced Gaussian grid, the shortest resolved wave is represented by four rather than two grid points. The octahedral grid is also globally more uniform than the linear reduced Gaussian grid. In the vertical, the ECMWF model has 137 levels and a model top at 0.01 hPa. This corresponds to over 900 million grid points in total after this resolution upgrade.
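The quoted total of over 900 million grid points can be checked with a short calculation. The sketch below rests on two assumptions that are not stated in the text: that TCO1279 corresponds to 1280 latitude rows between pole and equator, and that the octahedral grid has 20 points on the row nearest each pole with 4 more points on each successive row toward the equator.

```python
def octahedral_points_per_level(n_rows_per_hemisphere):
    """Horizontal grid points of an octahedral reduced Gaussian grid.

    Assumed layout: 20 points on the row nearest the pole, increasing by 4
    on every row toward the equator, mirrored in both hemispheres.
    """
    per_hemisphere = sum(20 + 4 * i for i in range(n_rows_per_hemisphere))
    return 2 * per_hemisphere

N_ROWS = 1280   # assumed for TCO1279 (truncation wavenumber + 1)
N_LEVELS = 137  # vertical levels, from the text

points_per_level = octahedral_points_per_level(N_ROWS)
total_points = points_per_level * N_LEVELS
```

Under these assumptions each level holds about 6.6 million points, and 137 levels give roughly 904 million points, consistent with the figure quoted above.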
In addition to the resolution increase, the realism of the kinetic energy spectrum is also significantly improved, with more energy in the smaller scales due to a reduction of the diffusion and removal of the dealiasing filter, enabled by the change to using a cubic truncation for the spectral dynamics. The semi-Lagrangian departure point iterations used to solve the primitive equations are also increased in the new model to remove numerical instabilities. The integration time step was changed accordingly to 450 s. As intrinsic predictability implies the upper limit for our weather prediction given a nearly perfect model, no perturbation is applied to any model parameter and no stochastic physics scheme is adopted.
2) U.S. fvGFS system
The newly developed Geophysical Fluid Dynamics Laboratory (GFDL) FV3 with Global Forecast System (GFS) physics (fvGFS) modeling system (Zhou et al. 2019; Hazelton et al. 2018; Chen et al. 2018) is used to further cross-examine the sensitivity of multiscale predictability to different model parameterizations and resolutions under future global convection-permitting NWP. This system was built during the NGGPS phase II, using the nonhydrostatic FV3 coupled to physical parameterizations from the National Centers for Environmental Prediction’s GFS (NCEP/GFS). The GFDL FV3 was recently chosen as the dynamical core for the U.S. NGGPS as detailed in an online report (https://www.weather.gov/sti/stimodeling_nggps_implementation_atmdynamics); a report on this NGGPS development can also be found in Voosen (2017). In this study, we used the global uniform 3-km fvGFS configuration without ocean coupling. This model has 63 vertical layers and the model top is set at 0.6 hPa. The physical parameterizations include the Rapid Radiative Transfer Model for GCMs (RRTMG; Iacono et al. 2008) and the GFDL 6-class single-moment microphysics scheme (Chen and Lin 2011, 2013; Zhou et al. 2019). No cumulus scheme is adopted.
b. Ensemble experiments: EDA and EDA0.1
We first perform two types of ensemble experiments (denoted as EDA and EDA0.1, with explanations later in this subsection) with the current operational 9-km IFS model, running 10-member ensembles for 20 days beginning at 6 different times (3 consecutive days of 24–26 December 2015 and 3 consecutive days of 24–26 June 2016). All simulations are initialized at 0000 UTC. The initial condition and perturbations for the EDA ensembles are derived directly from the first 10 of 21 available operational ensemble four-dimensional variational data assimilation (4DVar) analyses (Bonavita et al. 2012) that represent the current realistic initial-condition uncertainties by the best-performing global NWP model (i.e., IFS at ECMWF). The design of the EDA ensemble using realistic initial-condition uncertainties is to explore more on the practical predictability side of the atmosphere as an assurance that the model used for this study could capture synoptic-scale dynamics and has typical predictive skills during the event periods selected for this study. Note that the EDA system uses the covariances derived from a coarser-resolution (TCO639; ~16 km) ensemble forecast and thus small scales are not strongly constrained by observations. When initializing the model at higher resolutions, there would be a transient adjustment process (within hours; see Skamarock et al. 2014) to the small-scale energy spectrum. This adjustment process will potentially excite a spurious cascade, which might bring faster initial error growth at these smaller scales. However, the impact of this process is expected to be small at synoptic scales and will be neglected in particular for the current study with a perfect-model assumption.
In comparison, the initial conditions for the EDA0.1 ensembles are perturbed with only 10% of the initial perturbations in the corresponding EDA ensembles centered at the control operational analysis of the IFS. With perturbation kinetic energy error only 1% of the current-day state-of-the-science analysis uncertainties, the EDA0.1 ensembles can be regarded as using nearly perfect initial conditions. The use of nearly perfect initial conditions, along with the use of the same model without physics perturbations, is in the spirit of perfect-model identical-twin experiments, which are designed to understand the intrinsic predictability limit of the atmosphere. Although the number of ensemble runs is still limited, to the best of our knowledge, this is the first time such a high-resolution global-model ensemble performing at the convection-permitting resolution is used for exploring the intrinsic limit of atmospheric predictability.
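The relation between the 10% perturbation amplitude and the 1% perturbation kinetic energy follows from the quadratic dependence of kinetic energy on wind. The sketch below is a minimal illustration: it assumes that each EDA perturbation is the departure of a member from the EDA ensemble mean, and the function names and wind values are hypothetical.

```python
def make_reduced_member(control, eda_member, eda_mean, factor=0.1):
    """Recenter a scaled EDA perturbation on the control analysis."""
    return [c + factor * (m - mu)
            for c, m, mu in zip(control, eda_member, eda_mean)]

def mean_perturbation_ke(u_pert, v_pert):
    """Domain-mean perturbation kinetic energy, 0.5 * (u'^2 + v'^2)."""
    return sum(0.5 * (u * u + v * v) for u, v in zip(u_pert, v_pert)) / len(u_pert)

# Hypothetical 500-hPa winds (m/s) at four grid points
u_ctl, v_ctl = [20.0, 22.0, 18.0, 25.0], [3.0, -2.0, 1.0, 0.0]
u_mean, v_mean = [20.5, 21.5, 18.2, 24.6], [2.8, -1.7, 1.1, 0.3]
u_eda, v_eda = [21.5, 20.9, 19.0, 23.8], [3.6, -2.5, 0.4, 1.1]

u_red = make_reduced_member(u_ctl, u_eda, u_mean)
v_red = make_reduced_member(v_ctl, v_eda, v_mean)

ke_eda = mean_perturbation_ke([m - mu for m, mu in zip(u_eda, u_mean)],
                              [m - mu for m, mu in zip(v_eda, v_mean)])
ke_reduced = mean_perturbation_ke([r - c for r, c in zip(u_red, u_ctl)],
                                  [r - c for r, c in zip(v_red, v_ctl)])
ke_ratio = ke_reduced / ke_eda  # about 0.01: 10% amplitude means 1% energy
```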
For the NGGPS FV3 model experiment, at 3-km grid spacing, computational costs permit us to run only one pair of identical-twin simulations starting at 0000 UTC 24 December 2015 for the Northern Hemispheric winter event: one initialized with the control member from the IFS model and the other initialized with the same initial perturbations as in member 1 of EDA0.1.
Although only one pair of identical-twin 3-km FV3 simulations can be afforded computationally for this study, it does offer a direct comparison of the error growth to the same pair of identical-twin simulations using the operational IFS model that has a different dynamical core and different resolution. In the meantime, for typical midlatitude synoptic systems of 5000 km in horizontal wavelength, there are about 5–10 such concurrent synoptic weather events in either hemisphere. In essence, this single pair of identical-twin experiments could represent a predictability estimate of multiple events under more general global statistics.
3. Predictability limit
To exemplify the limit of intrinsic predictability of day-to-day weather, we first select the January 2016 cold-surge event during which most areas of northern Europe experienced temperature anomalies below −5°C, as shown in the observational analysis (Fig. 2a). Near-normal temperature is observed over most of the contiguous United States and Canada except for a moderate warm anomaly over the Great Lakes region. The corresponding 15-day control forecast (Fig. 2b) by the ECMWF 9-km operational model IFS initialized at 0000 UTC 24 December 2015 failed to predict the northern Europe cold anomaly while it underpredicted the surface temperature over most of contiguous United States and overpredicted temperature over most of Canada.
A 10-member ensemble (EDA0.1), constructed by perturbing the control forecast with minute initial perturbations that are an order of magnitude smaller than the current analysis uncertainty, produced drastically different 15-day forecasts, each of which is nearly indistinguishable from a random sample of the climatology of this day. For example, member 1 of this reduced-perturbation ensemble (Fig. 2c) initialized also at 0000 UTC 24 December 2015 predicted a slightly above normal temperature (instead of the observed cold surge) over northern Europe while forecasting extremely cold conditions over most of the contiguous United States (instead of the observed normal to slightly warmer anomalies). The differences in predicted synoptic flow patterns between EDA0.1 member 1 and the unperturbed control forecast are comparable to the differences between the control run and the observational analyses represented by the sea level pressure maps in Fig. 2, except for the quasi-stationary planetary low pressure centers over the northern Atlantic and Pacific Oceans typical of climatological mean patterns. Failure of the control forecast (compared to observational analysis) and drastic forecast divergence between the control forecast and EDA0.1 ensemble member 1 that is perturbed with hypothetical minute initial perturbations (likely beyond the reach of future analysis accuracy) suggests a complete loss of predictability at the 15-day lead time (i.e., the intrinsic limit of day-to-day midlatitude weather predictability may not be extended beyond 2 weeks, at least in this case).
a. Evolution of ensemble spread
As mentioned in the introduction, the forecast uncertainty and the limit of predictability can be more systematically quantified by the evolution of the spread between the ensemble members and the time when it starts to saturate. Figure 3 shows midlatitude mean ensemble variance of the 500-hPa winds (a measure of ensemble kinetic energy spread) from two ensemble hindcasts initialized on three consecutive days (24–26 December). The 500-hPa winds are chosen because they are directly linked to the kinetic energy spectrum, which will be discussed later. Nonetheless, metrics using geopotential height give very consistent results (not shown here). The EDA ensemble sets in Fig. 3 are initialized with the current realistic analysis uncertainties represented by the ECMWF ensemble of 4DVar analyses, while EDA0.1 ensemble sets are initialized with nearly perfect initial conditions (initial kinetic energy error is 1% of that in EDA). As shown in Fig. 3a (normalized results shown in Fig. 3c), the spread of the EDA ensembles with realistic initial-condition uncertainties grows nearly two orders of magnitude larger before saturating at approximately 10–12 days, while the spread of the EDA0.1 ensembles, with minute initial perturbations (i.e., nearly perfect initial conditions), grows nearly four orders of magnitude larger before saturating at the same level as the EDA ensemble around 14–15 days (as a strong indication of the intrinsic predictability limit).
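A minimal sketch of the spread metric described above is given here, assuming an unweighted average over grid points and the unbiased (n − 1) variance estimator. The text does not specify the area weighting or the normalization, so these choices and the function name are assumptions.

```python
def ensemble_wind_variance(u_members, v_members):
    """Domain-mean ensemble variance of the horizontal wind.

    u_members[m][p] and v_members[m][p] hold the wind components of member m
    at grid point p. The u and v variances are summed at each point and then
    averaged over all points.
    """
    n_mem = len(u_members)
    n_pts = len(u_members[0])
    total = 0.0
    for p in range(n_pts):
        for field in (u_members, v_members):
            values = [member[p] for member in field]
            mean = sum(values) / n_mem
            total += sum((x - mean) ** 2 for x in values) / (n_mem - 1)
    return total / n_pts

# Hypothetical two-member, two-point example (m/s)
u_members = [[1.0, 2.0], [3.0, 2.0]]
v_members = [[0.0, 0.0], [0.0, 0.0]]
spread = ensemble_wind_variance(u_members, v_members)
```

Evaluating this quantity at each forecast lead time and dividing by its saturated value gives a normalized error curve of the kind discussed in the text.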
Similar quantitative statistics, representing intrinsic versus practical predictability limits assuming perfect model, can also be inferred from the same pairs of ensembles for the Southern Hemisphere (Fig. 4), as well as from pairs of Northern Hemisphere midlatitude 20-day 10-member global ensemble (Figs. 3b and 3d) initialized from three consecutive summer days (24–26 June) in 2016. During the 20-day simulation period in June, vast areas of the Yangtze River basin of China observed historical flooding (NASA; https://earthobservatory.nasa.gov/NaturalHazards/view.php?id=88467). Moreover, calculation of the ACC between the ensemble forecast and the observations over the global midlatitudes also gives quantitatively similar estimates for both the practical and intrinsic predictability limits (Fig. 5), with correlation dropping to 60% at around 10 days for the EDA ensemble and 13–15 days for EDA0.1 ensemble.
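The lead time at which the ACC falls to the 60% threshold can be estimated by linear interpolation between forecast output times. The sketch below uses made-up ACC curves whose shapes are only loosely patterned on the behavior described above; the numbers are not data from Fig. 5.

```python
def lead_time_at_threshold(lead_days, acc_values, threshold=0.6):
    """First lead time (days) at which the ACC drops below the threshold.

    Linear interpolation is used between the two bracketing output times.
    Returns None if the curve never drops below the threshold.
    """
    for i in range(len(lead_days) - 1):
        t0, t1 = lead_days[i], lead_days[i + 1]
        a0, a1 = acc_values[i], acc_values[i + 1]
        if a0 >= threshold > a1:
            return t0 + (a0 - threshold) / (a0 - a1) * (t1 - t0)
    return None

# Hypothetical ACC curves sampled every 2 days (illustrative values only)
lead_days = [0, 2, 4, 6, 8, 10, 12, 14, 16]
acc_eda = [1.00, 0.98, 0.93, 0.85, 0.72, 0.62, 0.47, 0.35, 0.27]
acc_eda01 = [1.00, 0.995, 0.98, 0.95, 0.89, 0.80, 0.69, 0.57, 0.44]

limit_eda = lead_time_at_threshold(lead_days, acc_eda)
limit_eda01 = lead_time_at_threshold(lead_days, acc_eda01)
```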
The growth of the ensemble variance, representative of the forecast error, fits surprisingly well with the simple error growth model that was originally proposed in Lorenz (1982), modified later (Dalcher and Kalnay 1987; Reynolds et al. 1994) and here as
$$\frac{d\varepsilon}{dt} = (\alpha\varepsilon + \beta)(1 - \varepsilon). \qquad (1)$$
Here ε(t) is the normalized error where ε ~ 1 means it reaches the maximum or becomes saturated, and α is the synoptic-scale error growth rate. Previous studies (e.g., Magnusson and Källén 2013) usually use β as a measure for model error. Given that we are comparing between different ensemble members using the same forecast model, β here represents the error growth rate induced by the intrinsic upscale error propagation such as from small-scale moist processes (e.g., convection) even when we have nearly perfect initial condition (Sun and Zhang 2016). Figures 3c, 3d, 4c, and 4d show the evolution of normalized error averaged for both the winter and summer cases, respectively, as well as the fitted error growth curves from Eq. (1).
Figures 3 and 4 show that exponential error growth (quasi-linear line in the logarithmic plot) dominates the first few days of the EDA ensembles, with a growth rate determined by α. The β term has little impact on the error growth curve for the EDA experiment due to relatively large initial-condition error. However, compared with EDA, much faster initial error growth is observed for the EDA0.1 ensembles. We can also deduce that the error growth rate (slope of the error growth curve in Figs. 3 and 4) in EDA0.1 will increase with decreasing ε, implying that there will eventually be diminishing returns from further reducing the initial-condition errors. This “superexponential” initial error growth in EDA0.1 is caused by the presence of the β term (representing the intrinsic upscale error growth and propagation from small scales) in Eq. (1). For example, the green line in Figs. 3c and 3d shows the predicted error growth curve derived from Eq. (1) when the initial-condition error is reduced to 1.0 × 10^−10. It is nearly identical to the blue line, which means there is not much more room for improvement. In other words, if Eq. (1) holds, further reduction in the initial-condition or model error would not help extend our forecast lead time much longer (maybe only in hours or even minutes).
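The diminishing returns described above can be reproduced with a short calculation. The sketch assumes that the error growth model takes the Dalcher and Kalnay (1987) form dε/dt = (αε + β)(1 − ε), which integrates in closed form because ln[(αε + β)/(1 − ε)] grows linearly in time at rate α + β. The values of α and β, the initial errors, and the 0.95 level used to mark saturation are hypothetical and are not the fitted values from Figs. 3 and 4.

```python
import math

def time_to_reach(eps0, eps_target, alpha, beta):
    """Days for the normalized error to grow from eps0 to eps_target.

    Closed-form solution of d(eps)/dt = (alpha*eps + beta) * (1 - eps):
    ln[(alpha*eps + beta) / (1 - eps)] increases linearly at rate alpha + beta.
    """
    def f(eps):
        return math.log((alpha * eps + beta) / (1.0 - eps))
    return (f(eps_target) - f(eps0)) / (alpha + beta)

ALPHA = 0.65       # hypothetical synoptic-scale growth rate (per day)
BETA = 0.001       # hypothetical upscale error source (per day)
SATURATION = 0.95  # assumed level of eps treated as "saturated"

t_eda = time_to_reach(1e-2, SATURATION, ALPHA, BETA)     # EDA-like initial error
t_eda01 = time_to_reach(1e-4, SATURATION, ALPHA, BETA)   # 1% of the EDA error energy
t_tiny = time_to_reach(1e-10, SATURATION, ALPHA, BETA)   # nearly zero initial error

gain_first = t_eda01 - t_eda    # gain from the first hundredfold reduction
gain_further = t_tiny - t_eda01 # gain from a further millionfold reduction
```

With these illustrative parameters the first reduction buys about 3 days of lead time, whereas the further reduction buys only a few hours, because the β term dominates the growth once ε is much smaller than β/α.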
The errors in EDA0.1 grow to an amplitude similar to the EDA initial ensemble spread in 3–4 days. Subsequent error growth and saturation in the EDA0.1 ensembles mimic those of the EDA ensembles except for a 3–4-day delay in forecast lead times. The overall reference error kinetic energy saturates (ε ~ 1) at around 10–12 days for all the EDA ensembles and 14–15 days for all the EDA0.1 ensembles. This remains true for different initialization times and for both the winter and summer days of the Northern Hemisphere and the Southern Hemisphere.
These unprecedented high-resolution 9-km global ensembles of a state-of-the-science NWP model, initialized with both realistic and nearly perfect initial-condition uncertainties, suggest that the ultimate limit of midlatitude day-to-day weather predictability is about 2 weeks, but there is still a potential of 3–5 more days of additional forecast lead time to be gained through improving the current practical predictability, which is about 9–10 days. Such improvements may be gained from reducing initial-condition and model uncertainties through better observations, better data assimilation, and better forecast models running at higher resolution with ever-increasing computing capability.
b. Spectral analysis
While 3–5 days serves as the estimated potential for extended weather forecast lead time, the atmospheric predictability limit is also scale dependent. For example, small-scale thunderstorms are much less predictable than the synoptic system in which they are embedded. Therefore, it is important to examine the scale dependence of the predictability limit. The spectral decomposition of perturbation kinetic energy across all zonal wavenumbers, averaged over the midlatitudes (40°–60°N) for both winter and summer periods, is displayed in Fig. 6. The corresponding spectra for the Southern Hemisphere midlatitudes (40°–60°S) are shown in Fig. 7.
The kinetic energy spectrum here is calculated as in Skamarock (2004). We have chosen to compute the one-dimensional (1D) spectrum of the velocity fields along the zonal direction. The advantage of this 1D spectrum is that we can fully utilize the periodicity of the global model in the zonal direction while focusing on the midlatitudes only. Let u and v denote the zonal and meridional velocity components for the nth ensemble member, subtracting the ensemble mean fields first if we are calculating the kinetic energy spectra for the perturbations. For the spectra in Fig. 8, the differences between the perturbed run and the unperturbed run are used. The Fourier transforms of the velocity components, û and v̂, are then computed along the zonal direction for each ensemble member and all the meridional j indices. Then the kinetic energy spectral density can be written as
where Nx is the number of grid points along the zonal direction of the model. The asterisk denotes the complex conjugate. We can then average over j and n to get the kinetic energy spectrum for the full ensemble and the latitude band of interest (40°–60°N for the midlatitudes; the results are not very sensitive to this choice; the 30°–60°N average gives very similar plots). When the spectrum of the perturbation kinetic energy (amplitude of “noise”) at a given wavelength reaches the reference background spectral kinetic energy (signal to be predicted), it is saturated, after which no single deterministic forecast will have any predictive skill.
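The spectral calculation described above can be sketched as follows. The normalization is an assumption: the density is scaled by 1/(2 Nx²) and negative wavenumbers are folded onto positive ones, so that the spectrum sums to the mean kinetic energy (Parseval's relation). The prefactor used in the study may differ. A plain discrete Fourier transform is used to keep the example dependency-free; a fast Fourier transform would be used in practice.

```python
import cmath
import math

def dft(values):
    """Plain discrete Fourier transform along one latitude circle."""
    n = len(values)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(values))
            for k in range(n)]

def zonal_ke_spectrum(u_rows, v_rows):
    """1D zonal kinetic energy spectrum averaged over latitude rows.

    u_rows[j][i], v_rows[j][i]: wind components at latitude row j and
    longitude index i (pass perturbation winds for perturbation spectra).
    Returns energy per zonal wavenumber k = 0 .. nx // 2.
    """
    nx = len(u_rows[0])
    n_rows = len(u_rows)
    spectrum = [0.0] * (nx // 2 + 1)
    for u, v in zip(u_rows, v_rows):
        u_hat, v_hat = dft(u), dft(v)
        for k in range(nx // 2 + 1):
            density = ((u_hat[k] * u_hat[k].conjugate()).real
                       + (v_hat[k] * v_hat[k].conjugate()).real) / (2 * nx ** 2)
            if 0 < k < nx / 2:
                density *= 2.0  # fold in the matching negative wavenumber
            spectrum[k] += density / n_rows
    return spectrum

# Hypothetical test field: a single zonal wavenumber-3 wave of amplitude 2 m/s
NX = 16
u_rows = [[2.0 * math.cos(2 * math.pi * 3 * i / NX) for i in range(NX)]] * 2
v_rows = [[0.0] * NX] * 2
spectrum = zonal_ke_spectrum(u_rows, v_rows)
```

Averaging the result over ensemble members as well as latitude rows gives the ensemble spectrum, and comparing the perturbation spectrum with the full-field spectrum at each wavenumber indicates which scales have saturated.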
Consistent with Fig. 3, Fig. 6 also shows that it takes slightly more than 3 days for the perturbation kinetic energy in the reduced-perturbation ensemble (EDA0.1) to grow two orders of magnitude across all resolvable wavelengths to a level comparable with the realistic analysis uncertainty represented by the EDA for the periods of the winter and summer events, respectively. Also, Fig. 6 shows that the perturbation spectral kinetic energy from the EDA0.1 ensemble saturates at the amplitude of the reference kinetic energy across all synoptic scales by 15 days, again consistent with the overall intrinsic predictability limit estimated from Fig. 3.
Moreover, saturation time for different scales is different. With reduced initial-condition uncertainties, as in EDA0.1, forecast error first saturates at smaller scales, then subsequently grows rapidly in magnitude and in scale, consistent with past regional modeling studies (Zhang et al. 2007; Selz and Craig 2015; Sun and Zhang 2016). A simple estimation from Fig. 6 shows that the forecast error saturation time (and thus intrinsic limit of predictability) is less than 3 days for horizontal scales less than 200 km, less than 5 days for horizontal scales less than 400 km, and less than 10 days for horizontal scales less than 1000 km. Also worth noting, Fig. 6 illustrates the synoptic-scale predictability in terms of 500-hPa horizontal winds. Much more limited predictability is expected for vertical velocity and instantaneous precipitation rate forecasts (Bei and Zhang 2007, 2014), which possess very different reference energy spectra.
c. Sensitivity study using U.S. NGGPS model
Although we could only afford to perform one pair of 10-day forecasts using the U.S. NGGPS model based on the FV3 dynamical core with 3-km convection-permitting horizontal grid spacing, the results (Fig. 8) show that for both winter- and summer-hemispheric midlatitudes, such limits are rather insensitive to the forecast model or resolution, and likely arise from the intrinsic dynamics of the atmosphere (Zhang et al. 2007; Rotunno and Snyder 2008; Sun and Zhang 2016). Due to the use of a higher horizontal resolution, the NGGPS FV3 model better resolves the small-scale atmospheric motions, as can be seen by the extended background energy spectrum at smaller scales (Fig. 8). Yet the evolution of the forecast error in the NGGPS FV3 model, as reflected by the perturbation kinetic energy spectrum at 500 hPa, does not differ significantly from that in the ECMWF model. Once again, we find that after 3–5 days the differences between two initially nearly identical runs are comparable with our current operational analysis uncertainty (Fig. 8).
This consistency between two completely different models (with different dynamical cores, different physics, and different resolutions) also strengthens our confidence that these two state-of-the-science NWP models are “appropriate” to assess the intrinsic predictability limit, at least for the periods examined in this study. It is safe to say that minute uncontrollable initial-condition uncertainties originating from convective and mesoscale instabilities can grow upscale and will eventually limit the predictability of various weather systems at increasingly larger scales. The impacts of the background governing dynamics and instabilities on the limits of intrinsic predictability may also be inferred from the differences in the values, fitted with Eq. (1), of α (synoptic-scale error growth rate likely controlled by synoptic instabilities such as baroclinicity) and β (upscale error growth rate likely controlled by small-scale instabilities and moist physics including convection). As denoted in Figs. 3c and 3d, a larger value of α and a smaller value of β are derived from the winter cold-surge event than those derived from the summer flooding event. This is consistent with stronger baroclinicity and weaker convective instability in the winter than in the summer, although more research is needed to further quantify such relationships (Reynolds et al. 1994; Magnusson and Källén 2013).
d. Possible dynamic processes
Given that both models agree on the forecast lead time we could gain from reducing initial-condition uncertainty, the question then arises, What dynamic processes control the error growth and eventually limit the predictability of midlatitude weather? While this important question needs future research (Rosinski and Williamson 1997; Magnusson 2017), the evolution of the ensemble spread during specific cases may offer some insight and guidance.
Figure 9 shows the evolution of the ensemble spread of 500-hPa meridional winds for the first 3 days in EDA0.1 integrated from 0000 UTC 26 December 2015, focusing on a developing extratropical cyclone over the west coast of the United States. The blue contour is the 500-hPa geopotential height and the gray contour is the region with 12-h precipitation greater than 0.1 mm. At day 1 the ensemble spread first shows up in the precipitating region. The spread then increases and moves with the synoptic system while propagating both upstream and downstream. After 3 days, the ensemble spread can be found anywhere in the midlatitude bands, although with a maximum in the synoptic storms. The key idea that minute perturbations will first generate errors in small-scale moist convective systems and that the errors then grow upscale is consistent with previous studies (Zhang et al. 2007; Sun and Zhang 2016). We can also take a first look at the propagation of the ensemble spread using the Hovmöller diagram, which is plotted in Fig. 10 for the EDA0.1 ensembles initialized at 0000 UTC 24 December 2015. As marked subjectively by different types of arrows, there exist at least three characteristic pathways for the error to evolve and grow through time globally. The dotted arrows, which appear mainly in the first few days, have an approximate eastward speed of 10–15 m s−1 and are consistent with the phase speed of individual synoptic weather systems. The double-dashed arrows, which have an approximate eastward speed of 25–30 m s−1, likely follow the downstream energy propagation of different baroclinic wave packets, and the thick solid arrows with a slow westward progression signal the error enhancement near the quasi-stationary planetary-scale low pressure centers. These error growth pathways expand beyond the multistage error growth mechanisms identified in a previous regional-scale predictability study (Zhang et al. 2007) and will be examined in more detail in our future study.
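The propagation speeds quoted above correspond to slopes in a time–longitude diagram. The following minimal sketch shows one way such a diagram could be built from ensemble spread and how a slope could be converted to a zonal speed. The latitude band, array layout, and function names are assumptions for illustration and are not taken from the construction of Fig. 10.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6
SECONDS_PER_DAY = 86400.0


def hovmoller(spread, lats, lat_min=30.0, lat_max=60.0):
    """Average a (time, lat, lon) field over a latitude band.

    Uses cos(latitude) weights; the 30-60 degree band is a hypothetical
    choice. Returns an array of shape (time, lon).
    """
    lats = np.asarray(lats, dtype=float)
    band = (lats >= lat_min) & (lats <= lat_max)
    weights = np.cos(np.deg2rad(lats[band]))
    return np.average(spread[:, band, :], axis=1, weights=weights)


def zonal_speed(dlon_deg, dt_days, lat_deg):
    """Convert a Hovmoller slope to a zonal speed in m/s.

    dlon_deg is the longitude displacement (positive eastward) over
    dt_days, evaluated along the latitude circle lat_deg.
    """
    metres_per_degree = (
        np.deg2rad(1.0) * EARTH_RADIUS_M * np.cos(np.deg2rad(lat_deg))
    )
    return dlon_deg * metres_per_degree / (dt_days * SECONDS_PER_DAY)
```

For example, at 45° latitude a feature that moves about 11° of longitude per day travels at roughly 10 m s−1, which falls in the range associated with the dotted arrows.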
4. Concluding remarks
The promising finding of the current study is that, assuming the current-generation state-of-the-science NWP models capture the most essential physical processes in the real world, we can extend the skillful forecast lead time of day-to-day weather events such as the ones we discussed by up to 5 days if we reduce the initial-condition uncertainties by a factor of 10. In particular, we examined the predictability of weather forecasts in two showcase studies and looked at the multiscale midlatitude error evolution across different spatial scales. Our study suggests that we are currently still quite far from the ultimate limit of predictability, and that there is ample room for further improvement in day-to-day weather prediction, likely for decades ahead.
More quantitatively, it can be inferred from Figs. 3 and 6 that reducing the current initial-condition error represented by EDA by about 20% (~40% smaller in error kinetic energy) can potentially lead to a gain of one more day of predictable forecast lead time, and reducing it by about 50% to a gain of two more days. Achieving this additional predictability can greatly benefit society by saving lives and property but requires continued coordinated efforts by the entire meteorology community and beyond to design more accurate NWP models running at refined resolutions, to improve and enhance the observing techniques and networks, and to make better use of observations with advanced data assimilation and computing techniques.
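The conversion between error amplitude and error kinetic energy in the parenthetical above follows if the error kinetic energy is taken to scale with the square of the error amplitude. That scaling is an assumption based on the quadratic definition of kinetic energy, and the symbols ε (error amplitude as a fraction of the EDA value) and E′ (error kinetic energy) are introduced here only for illustration:

```latex
\begin{align}
E' &\propto \epsilon^{2}, \\
\frac{E'(\epsilon = 0.8)}{E'(\epsilon = 1)} &= (1 - 0.2)^{2} = 0.64
  \quad \text{(36\% smaller, i.e., roughly 40\%)}, \\
\frac{E'(\epsilon = 0.5)}{E'(\epsilon = 1)} &= (1 - 0.5)^{2} = 0.25
  \quad \text{(75\% smaller)}.
\end{align}
```

Under the same assumption, a factor-of-10 reduction in initial-condition error amplitude corresponds to a factor-of-100 reduction in initial error kinetic energy.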
It is possible that these two individual local weather events (cold surge and floods) have slightly different predictability than more typical weather patterns, but our findings here are based on mean error growth statistics of the global midlatitudes averaged over many wavelengths and multiple initialization times, not just the regions of the localized hazardous events. Additional calculations that exclude these two local events show consistent results (Fig. 11), as do the calculations in the opposing hemispheric midlatitudes that have no remarkable severe weather events during the same periods examined. Nonetheless, further research is needed to extend the findings to more case studies during all seasons, preferably with further refined, convection-permitting model resolutions to capture more realistic rapid initial error growth from small-scale moist physics including convection. Although limited sensitivity experiments suggest that the predictability horizon of day-to-day midlatitude weather is controlled by the dynamics and instabilities of the atmosphere and is not particularly dependent on the specific numerical model, it remains possible that the estimates may change if future improved models have different perturbation growth characteristics. While the incorporation of additional unresolved scales and phenomena could actually lead to an increase in the upper bound of predictability (Lorenz 1982), it is generally acknowledged that improved models will resolve more small-scale instabilities and thus that error growth is likely to increase further, at least at smaller scales, in which case the current estimate of the intrinsic predictability limit may be on the optimistic side.
Also, despite the use of ensembles, the current study focuses on whether a limit of deterministic forecast longer than 2 weeks can be reached if we concentrate on the instantaneous weather that we experience every day. If we define the “forecast skill horizon” as the lead time when ensemble forecasts cease to be, statistically, more skillful than a climatological distribution, the predictability horizon can be longer than 2 weeks for some variables at large synoptic and planetary scales and with longer periods and lower frequencies (Buizza and Leutbecher 2015; Shukla 1998; Palmer 2017). It is beyond the scope of this study to determine the intrinsic limit of probabilistic prediction. It is also beyond the scope of the current study to determine the predictability horizon for lower-frequency oscillations such as the Madden–Julian oscillation (Madden and Julian 1971; Zhang 2005; Zhang et al. 2017) or the background-mean weather regimes that potentially have predictive skill at the seasonal-to-intraseasonal time scales and beyond. Moreover, even at the convection-permitting resolution, the current sets of ensemble experiments might be insufficient to fully reproduce multiscale tropical waves coupled with moist convection, and thus future studies on the error growth dynamics and predictability limits for tropical systems and their interactions with midlatitude systems (e.g., Ying and Zhang 2017) are warranted.
This study is partially supported by NSF Grants AGS-1305798 and 1712290 and ONR Grant N000140910526. Computing was performed at ECMWF, the NOAA ESRL jet cluster, and the Texas Advanced Computing Center. Review comments from the editor and three anonymous reviewers, and discussions with Richard Rotunno, Dale Durran, Robert Nystrom, James Doyle, and many other researchers on related subjects, were beneficial.