1. Introduction
In dynamical systems theory, predictability is often characterized by the largest Lyapunov exponent of the system. This characterization is based on studying the evolution of initially small perturbations to a nonlinear trajectory, assuming that a numerically computed, sufficiently long trajectory can explore the small neighborhood of all possible states of the system (e.g., Ott 2002). Such a characterization may not apply to finite-time forecasts and is especially inappropriate when the dimensionality of the dynamics is so high that exploration of the attractor by a typical trajectory takes a very long time. This is the case for a high-dimensional weather prediction model that mimics the evolution of the atmosphere.
Patil et al. (2001) introduced the ensemble dimension (E dimension) [originally called bred vector dimension (BV dimension)] to characterize the spatiotemporally changing complexity of the dynamics for a physically extended large system, such as a state-of-the-art numerical weather prediction model. The E dimension is a local, spatiotemporally evolving measure of complexity (Patil et al. 2001; Oczkowski et al. 2005). The calculation of this measure is based on the singular value decomposition of an ensemble-based estimate of the analysis (or forecast) error covariance matrix in a local region. Heuristically, the E dimension measures the evenness of the distribution of the variance among the principal components of the ensemble-based estimate of the forecast error covariance matrix. The lowest possible value of the E dimension, which is one, occurs when the estimated variance is confined to a single spatial pattern of uncertainty. The highest possible value of the E dimension, which is equal to the number of ensemble members N, occurs when the variance is evenly distributed among N independent patterns of uncertainty.
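For concreteness, the E dimension can be computed from the singular values $\sigma_i$ of the matrix whose $k$ columns are the local perturbation vectors, as $\psi = (\sum_i \sigma_i)^2 / \sum_i \sigma_i^2$, the form used by Patil et al. (2001) and Oczkowski et al. (2005). The minimal Python sketch below assumes the perturbations are already expressed in the chosen local coordinates; the function name `e_dimension` is hypothetical. When the columns are deviations from the ensemble mean they sum to zero, so the attainable maximum is then $k-1$ rather than $k$.

```python
import numpy as np


def e_dimension(perturbations: np.ndarray) -> float:
    """E dimension of a local ensemble (hypothetical helper).

    perturbations: array of shape (n_state, k), one column per ensemble
    perturbation in the local region.
    """
    # Singular values of the perturbation matrix; their squares are
    # proportional to the eigenvalues of the local covariance estimate.
    sigma = np.linalg.svd(perturbations, compute_uv=False)
    # (sum sigma)^2 / sum sigma^2: close to 1 when a single pattern
    # dominates, equal to the number of columns when all singular
    # values are equal.
    return float(sigma.sum() ** 2 / np.sum(sigma ** 2))
```

A value near 1 indicates that the local uncertainty is dominated by one pattern; a value near the ensemble size indicates that it is spread evenly over many patterns.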
Patil et al. (2001) applied the E dimension diagnostic to operational forecast ensembles of the National Centers for Environmental Prediction (NCEP). They found an intriguing relationship between the regions of low E dimensionality and the magnitude of the ensemble perturbations: the lowest-dimensional regions were often the regions of largest estimated forecast uncertainties. Patil et al. (2001) hypothesized that there was a large potential for analysis and forecast improvements in the regions of low E dimensionality due to the simple structure of potential analysis and forecast error patterns in those areas. Most importantly, this result motivated the development of the local ensemble Kalman filter (LEKF; Ott et al. 2004) data assimilation scheme.
While the results of Patil et al. (2001) with the NCEP forecast ensembles were encouraging, they could not be considered conclusive due to some important limitations of the ensemble used in the study. Most importantly, there were only five independent ensemble members available for the calculation. Second, the NCEP ensembles were initialized with the breeding algorithm (Toth and Kalnay 1993, 1997), which tends to force the initial ensemble perturbations toward a few dominant error patterns (e.g., Szunyogh et al. 1997). These limitations of the Patil et al. (2001) study motivated Oczkowski et al. (2005) to repeat the calculations of Patil et al. (2001) with much larger ensembles. Oczkowski et al. (2005), who also employed local energetics diagnostics to identify the atmospheric dynamical processes that led to the development of local low dimensionality, confirmed the earlier result that local low dimensionality was often the result of strong local instabilities that led to the rapid growth of simple error patterns.
The study of Oczkowski et al. (2005) was also based on a bred-vector ensemble. As mentioned earlier, the main problem with this approach is that extreme low dimensionality tends to occur in the initial ensemble as a result of the ensemble generation technique. The main goal of the present study is to investigate the role that changes in the complexity of the local dynamics play in predictability, using an ensemble of initial perturbations that has high E dimension and is consistent with the estimated analysis uncertainties. To achieve this goal, we take advantage of our previous work, in which we tested an implementation of the LEKF on the NCEP Global Forecast System (GFS) model (Szunyogh et al. 2005, SEA05 hereafter). We investigate the evolution of the E dimension and the role it plays in predictability in forecasts started from analysis ensembles of SEA05. For a 40-member bred-vector ensemble, the typical values of the E dimension vary between 5 and 25 (Oczkowski et al. 2005), but for a 40-member LEKF ensemble the E dimension is never smaller than 25 and is typically larger than 30 (SEA05).
We carry out experiments for the perfect model scenario: a “true” nonlinear trajectory is generated by a long integration of the model from a realistic Northern Hemisphere winter initial condition. Then, imperfect (perturbed) initial conditions are obtained by assimilating simulated noisy observations of the true states with the LEKF data assimilation system. An important feature of the hypothetical observing network is that the observations are randomly distributed. Thus, unlike a real observing network, the simulated observing network may be assumed to have no effect on the geographical distribution of the analysis and forecast uncertainties (provided that the observational network is not too sparse). Here, the focus is on the spatiotemporal evolution of the forecasts and the forecast uncertainties started from the analyses of SEA05. Although the unique features of the LEKF algorithm make the close relationship between local dimensionality, error growth, and the skill of the ensemble in capturing the space of forecast uncertainties especially transparent, we believe that our results could be reproduced with any suitably formulated ensemble-based Kalman filter scheme (e.g., Anderson 2001; Bishop et al. 2001; Houtekamer and Mitchell 2001; Evensen 2003; Keppenne and Rienecker 2002; Whitaker and Hamill 2002). In addition, we hope that our results help strengthen the theoretical foundation of the operational practice of using small ensembles to predict the evolution of uncertainties in high-dimensional operational numerical weather prediction models (e.g., Kalnay 2003).
The analysis–forecast system used in our experiments, as well as the experimental design, is described briefly in section 2. Section 3 investigates the geographical distribution and typical evolution of the forecast errors. This section also provides a detailed account of a case of explosive error growth. Section 4 investigates the relationship between the E dimension, forecast error growth, and the skill of the ensemble in tracking the space of the spatiotemporally evolving forecast uncertainties. Section 5 is a summary of our main conclusions.
2. Experimental design
The LEKF scheme is a model-independent algorithm to estimate the state of a large spatiotemporally chaotic system (Ott et al. 2004). The term “local” refers to an important feature of the scheme: it solves the Kalman filter equations locally in model grid space. More precisely, the state estimate at a grid point P is obtained independently from the state estimates at the other grid points, considering the observations and the background state only from a local cube centered at P. The LEKF scheme also provides an estimate of the analysis uncertainty at P and generates an ensemble of analysis perturbations that represent the estimated uncertainty at P. When the LEKF is applied to the assimilation of observations in a perfect-model setting, we use a 4% multiplicative variance inflation (Anderson and Anderson 1999) at each analysis step, which increases the estimated analysis uncertainty to compensate for the loss of ensemble variance due to sampling errors and the effects of nonlinearities. In addition to the variance inflation coefficient, the scheme has two tunable parameters: the number of grid points in the local cube and the number of ensemble members.
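The local analysis step can be illustrated with a short sketch. As a stand-in for the LEKF of Ott et al. (2004), it uses the closely related ensemble-space formulation (LETKF) of Hunt et al. (2007), in which the analysis for one local region is computed in the $k$-dimensional space of ensemble weights. It assumes a linear observation operator whose output is passed in directly as `yb`, a diagonal observation-error covariance, and, as is common in LETKF implementations, a multiplicative inflation factor of 1.04 applied to the background covariance. The function and argument names are hypothetical, and this is not the code used in the experiments.

```python
import numpy as np


def local_ensemble_analysis(xb, yb, yo, r_var, inflation=1.04):
    """Ensemble-space Kalman update for ONE local region (LETKF form).

    xb: (n, k) background ensemble of the local state vector
    yb: (m, k) background ensemble in observation space (H applied to xb)
    yo: (m,) local observations
    r_var: (m,) observation-error variances (R assumed diagonal)
    inflation: multiplicative factor on the background covariance
    """
    k = xb.shape[1]
    xb_mean = xb.mean(axis=1)
    yb_mean = yb.mean(axis=1)
    Xb = xb - xb_mean[:, None]            # background perturbations
    Yb = yb - yb_mean[:, None]
    C = Yb.T / r_var                      # Yb^T R^{-1}, shape (k, m)
    # Analysis covariance in ensemble (weight) space; dividing (k-1)I by
    # the inflation factor inflates the background covariance.
    Pa = np.linalg.inv((k - 1) * np.eye(k) / inflation + C @ Yb)
    # Symmetric square root of (k-1) Pa gives the perturbation weights.
    evals, evecs = np.linalg.eigh((k - 1) * Pa)
    Wa = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    wa_mean = Pa @ C @ (yo - yb_mean)     # weights for the mean update
    # Each analysis member = background mean + Xb @ (mean + member weights).
    return xb_mean[:, None] + Xb @ (Wa + wa_mean[:, None])
```

In a full system this update would be repeated independently for the local cube centered at every grid point P, keeping only the analysis at P, which is what allows the computation to be done in parallel.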
Here, as well as in SEA05, the LEKF is implemented on a reduced-resolution version of the 2001 operational implementation of the NCEP GFS model. With the exception of the resolution, which is reduced to T62 in the horizontal direction and to 28 levels in the vertical direction, the model we use is identical to the full operationally implemented version of the 2001 NCEP GFS (detailed documentation of the model can be found online at http://www.emc.ncep.noaa.gov/modelinfo).
A time series of true states was generated by a 60-day integration of the model starting from the operational NCEP analysis at 0000 UTC 1 January 2000. The two components of the horizontal wind vector and the temperature were observed at all model levels, and the associated surface pressure was also observed. The assumed observational errors were normally distributed with zero mean and standard deviations of 1 m s−1, 1 K, and 1 hPa, respectively. Initially, observations were generated at all 17 848 horizontal gridpoint locations. Then, reduced observational networks were created by gradually removing observational locations at randomly selected grid points. This approach was applied to construct three additional observational networks that take vertical soundings of the atmosphere at 2000, 1000, or 500 fixed locations every 6 h.
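One simple way to realize the reduced networks described above is to draw a single random ordering of the 17 848 horizontal grid points and keep its first 2000, 1000, or 500 entries, so that each smaller network is a subset of the larger ones. Whether the original networks were nested in exactly this way is an assumption of the sketch below, and `nested_random_networks` is a hypothetical helper.

```python
import numpy as np


def nested_random_networks(n_points, sizes, seed=0):
    """Return {size: sorted grid-point indices} for nested random networks."""
    rng = np.random.default_rng(seed)
    # One random removal order for the whole grid: keeping the first s
    # entries of the same permutation makes smaller networks subsets of
    # larger ones.
    order = rng.permutation(n_points)
    return {s: np.sort(order[:s]) for s in sorted(sizes, reverse=True)}


networks = nested_random_networks(17848, [2000, 1000, 500])
```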
In what follows, we investigate the subsequent evolution of the distribution of the forecast errors. Most of the results presented here are for a configuration of the LEKF that consists of a 40-member ensemble, 7 × 7 × υ gridpoint local cubes (υ is the number of vertical grid points in the cube and changes with altitude; see SEA05 for details), and 2000 simulated vertical soundings. We note that seven grid points is equivalent to a distance of 13.4° in the meridional direction and to a distance of 13.1° in the zonal direction. The initial ensemble perturbations are generated by adding random noise to the operational NCEP background forecast, truncated to the resolution used in this paper, at 0000 UTC 1 January 2000. The distribution of the random noise is identical to that of the simulated observations. That is, except for the effects of statistical fluctuations and truncation errors, the initial background is identical to the operational NCEP background at 0000 UTC 1 January 2000, and in the initial estimate of the background error covariance matrix the error variance is about the same as the observational error variance, while the errors of the different variables at the different gridpoint locations are uncorrelated.
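The initial ensemble described above (background plus uncorrelated Gaussian noise with the observational-error standard deviations) can be sketched as follows. The dictionary-of-arrays layout and the name `initial_ensemble` are hypothetical; the standard deviations of 1 m s−1, 1 K, and 1 hPa are taken from the text.

```python
import numpy as np

# Observational-error standard deviations from the text:
# u, v in m/s, temperature in K, surface pressure in hPa.
SIGMA = {"u": 1.0, "v": 1.0, "t": 1.0, "ps": 1.0}


def initial_ensemble(background, k=40, seed=0):
    """Return k members, each the background plus independent noise.

    background: dict mapping variable name to a gridded array
    (hypothetical layout). Noise is uncorrelated between variables and
    grid points, matching the distribution of the simulated observations.
    """
    rng = np.random.default_rng(seed)
    return [
        {name: field + SIGMA[name] * rng.standard_normal(field.shape)
         for name, field in background.items()}
        for _ in range(k)
    ]
```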
a. Datasets
A state estimate is obtained every 6 h by assimilating the simulated observations with the LEKF scheme. Deterministic forecasts are started from the 0000, 0600, 1200, and 1800 UTC ensemble mean analyses each day. An ensemble of forecasts is also started every 12 h, using the analysis ensemble provided by the LEKF as the initial conditions. Forecast error statistics are generated by comparing the deterministic forecasts to the true states. (The only exceptions are the results presented in section 4c, where the ensemble-mean forecast is compared to the true states.) The forecast error statistics are computed for the 40-day period that starts at the 15th day along the true trajectory. We refer to time using the 40-day period as reference, that is, the first forecast that we verify starts at 0000 UTC on day 1, and the last forecast we verify starts at 1200 UTC on day 40. The model outputs are processed on a 2.5° × 2.5° resolution grid. We present error statistics in the following formats:
Snapshots of errors are presented by mapping the difference between the forecast and the true state on the grid.
Maps of the time-mean absolute error are generated by first computing the absolute value of the difference between the forecasts and the true states at the grid points, then computing the 40-day mean of the absolute values.
The error for a geographical region is obtained by computing the root-mean-square (rms) of the error over all grid points in the geographical region. Plots showing time series of the errors are based on this information. Errors are shown for three geographical regions: NH extratropics (30°N–90°N), Tropics (30°S–30°N), and SH extratropics (90°S–30°S).
The spectrally filtered errors for a geographical region are obtained by first spectrally filtering the gridpoint values along each latitude, based on the zonal wavenumbers, then computing the rms over the region.
The time-mean absolute error for a geographical region is obtained by computing the 40-day mean of the root-mean-square error for the given geographical region.
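The error statistics listed above reduce to a few array operations. The sketch below assumes errors stored on the 2.5° × 2.5° latitude–longitude grid as arrays indexed `[time, lat, lon]` or `[lat, lon]`, uses an unweighted rms because the text does not mention area weighting, and implements the spectral filter with a real FFT along each latitude circle. All function names are hypothetical.

```python
import numpy as np


def time_mean_abs_error_map(fcst, truth):
    """40-day mean of |forecast - truth| at each grid point; (time, lat, lon)."""
    return np.mean(np.abs(fcst - truth), axis=0)


def regional_rms(err, lats, lat_min, lat_max):
    """Unweighted rms of err[lat, lon] over lat_min <= lat <= lat_max."""
    mask = (lats >= lat_min) & (lats <= lat_max)
    return float(np.sqrt(np.mean(err[mask, :] ** 2)))


def zonal_band_filter(err, k_min, k_max):
    """Keep only zonal wavenumbers k_min..k_max along each latitude circle."""
    coeffs = np.fft.rfft(err, axis=1)
    k = np.arange(coeffs.shape[1])
    coeffs[:, (k < k_min) | (k > k_max)] = 0.0
    return np.fft.irfft(coeffs, n=err.shape[1], axis=1)


def time_mean_regional_rms(fcst, truth, lats, lat_min, lat_max):
    """Mean over verification times of the regional rms error."""
    return float(np.mean([regional_rms(f - t, lats, lat_min, lat_max)
                          for f, t in zip(fcst, truth)]))
```

For example, the k = 11–20 band error for the NH extratropics would be `regional_rms(zonal_band_filter(err, 11, 20), lats, 30.0, 90.0)`.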
3. Evolution of the forecast errors
The simulations in SEA05 found that the largest wind and temperature analysis errors were in the main regions of deep convection in the Tropics, while the smallest analysis errors were found in the midlatitude storm track regions. Figure 1 illustrates the rapid change in the geographical distribution of the errors as the forecasts progress, showing the time mean of the forecast errors for the meridional component of the wind vector at the 500-hPa level (the figure shows the time mean over all 160 forecast cycles). There seems to be a relationship between the errors in the region of deep convection and the early amplification of the errors in the North Pacific storm track region. Then the errors propagate eastward along the upper-tropospheric waveguides. A clear indication of rapidly growing errors in the North Atlantic and Southern Hemisphere storm track regions can first be seen at the 48-h forecast lead time, and by the 72-h forecast lead time the storm track regions become the location of the dominant error patterns in the extratropics.
a. Dependence on the geographical region
The difference between the error growth characteristics in the extratropics and the Tropics becomes obvious by investigating the time evolution of the root-mean-square forecast errors for the different geographical regions (shown by closed squares in Figs. 2 and 3). The most striking difference between the extratropics and the Tropics is in the functional dependence of the error growth on the forecast lead time. (Notice that although the vertical scale in Fig. 2 is logarithmic, the vertical scale in Fig. 3 is linear.) In the extratropics, the root-mean-square of the forecast error is approximately an exponential function of the forecast lead time for the first 72 h, that is, $z_f(t) = z_a e^{rt}$, where the scalar $r$ denotes the exponential error growth rate. After about 72 h, the error growth starts slowing down, indicating an initial stage of nonlinear error saturation. In contrast, in the Tropics, the root-mean-square of the forecast error, $z_f(t)$, is a linear function of the forecast lead time, that is, $z_f(t) = bt + z_a$, where $z_a \approx z_f(0)$ is the root-mean-square analysis error and the scalar $b$ is the linear error growth rate.
We obtain estimates of the parameters $z_a$ and $r$ by calculating their values for the curves that best fit the forecast errors for the first 72 h in the least squares sense. Although the initial errors are very slightly larger in the SH extratropics (not shown) than in the NH extratropics (0.42 versus 0.39 m s−1), the forecast errors grow a bit more slowly in the SH (not shown) than in the NH extratropics; the error doubling time $T = r^{-1}\ln 2$ is 38.5 h in the SH extratropics versus 34.7 h in the NH extratropics.
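The parameter estimation and the doubling-time formula can be illustrated with a short sketch. Because the text does not say which form of least squares was used, the sketch assumes that the exponential curve is fitted as a straight line to the logarithm of the rms error; a nonlinear least-squares fit to the errors themselves would weight the lead times differently. For reference, $T = 34.7$ h corresponds to $r = \ln 2 / 34.7 \approx 0.020\ \mathrm{h}^{-1}$, and $T = 38.5$ h to $r \approx 0.018\ \mathrm{h}^{-1}$. The function names are hypothetical.

```python
import numpy as np


def fit_exponential(t_hours, rms):
    """Fit z(t) = z_a * exp(r t) by linear least squares on log z (assumed)."""
    r, log_za = np.polyfit(t_hours, np.log(rms), 1)  # slope, intercept
    return float(np.exp(log_za)), float(r)


def fit_linear(t_hours, rms):
    """Fit z(t) = b t + z_a, the form found in the Tropics."""
    b, za = np.polyfit(t_hours, rms, 1)
    return float(za), float(b)


def doubling_time(r):
    """Error doubling time T = ln 2 / r (same time unit as 1/r)."""
    return float(np.log(2.0) / r)
```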
Interestingly, the functional dependence of the error growth is independent of the spatial scale in both regions: except for the zonal mean term (k = 0), the initial error grows exponentially for all wavenumber ranges in the extratropics (Fig. 2), while the error grows linearly for all wavenumber ranges in the Tropics (Fig. 3). The linear error growth rate b and the initial exponential growth rate r are larger for the wavenumber ranges k = 1–10 and k = 11–20 than for the range k = 21–40. Also, the errors tend to start saturating earlier for the smaller scales (larger wavenumbers).
b. Dependence on the LEKF parameters
We have carried out experiments to test the sensitivity of the forecast results to the free parameters of the analysis scheme (results are not shown). We find that, within a reasonable range of the parameters, the forecast errors depend only weakly on the parameters. More precisely, the small initial differences between the analyses for 5 × 5 × υ, 7 × 7 × υ, and 9 × 9 × υ local cubes show negligible growth in the forecast phase. Likewise, for a 5 × 5 × υ local region size, the advantage of the 80-member ensemble filter over the 40-member ensemble filter is negligible in the first 72 h. Since the dominant errors grow exponentially in the extratropics, our result shows that differences in the analysis due to changes of the free parameters have only a very small projection on the dominant instabilities. This indicates that, when the parameters of the LEKF scheme are chosen from a reasonable range, the scheme can efficiently remove the growing error components. This is a nontrivial result since the scheme corrects errors that were growing before the analysis time, while the forecast errors are governed by errors that are growing after the analysis time. An important practical consequence of the weak sensitivity to the tunable parameters is that it greatly increases the generality of our predictability assessment.
c. Dependence on the number of observations
In sharp contrast to the aforementioned weak sensitivity to the tunable parameters, the observational density has a significant influence on the accuracy of the forecasts. Increasing the number of observations substantially improves the accuracy of the forecasts in all geographical regions (results are not shown).
In the Tropics, the improvement is essentially constant in time, due to a weak dependence of the linear error growth rate on the number of observations. This result suggests that increasing the number of observations in the Tropics leads to a reduction of the magnitude of the forecast errors, but it does not change the characteristics of error growth. Likewise in the extratropics, the influence of the observational density on the exponential error growth rate is modest, although the error growth is slightly faster for the higher observational density (Table 1).
d. Temporal variability of the forecast errors
Among the three geographical regions considered in this paper, the temporal variability of the forecast errors is highest in the NH extratropics and lowest in the Tropics (Fig. 4). The high variability in the NH extratropics is due to episodes of unusually large forecast errors. The first such episode is a pattern of extremely large errors in forecasts started between 1200 UTC on day 4 and 0000 UTC on day 7. We find (results not shown) that improving the accuracy of the analysis, by adding more observations and/or increasing the ensemble size, leads to minuscule reductions in the forecast errors at these verification times. This indicates that the unusually large forecast errors in this case are more associated with low predictability of the atmospheric states than with the accuracy of the analyses. An inspection of the atmospheric flow regimes reveals that the relatively low predictability of these states is associated with the rapid amplification of errors in the presence of an unusually strong jet stream in the North Atlantic storm track region (further details on this event are provided in sections 3e and 4b).
The second episode involves a pattern of unusually large analysis errors between about day 16 and day 24, which lead to a proportionally elevated level of forecast errors at the associated verification times. An inspection of the spatiotemporal evolution of the errors for this period (not shown) reveals that the relatively large errors are due to exceptionally large analysis errors in the region of Indonesia that later propagate into the NH extratropics. The visible propagation of the time-mean forecast errors from the Tropics to the extratropics shown in Fig. 1 is associated with this episode.
e. A case of explosive error growth
To gain a better understanding of the processes that lead to the explosive error growth in the aforementioned first episode, we select the forecast started at 1200 UTC on day 6 for further inspection. Maps of the forecast errors show that the explosive error growth at the 36-h lead time occurs in a very localized region off the coast of Newfoundland (Fig. 5).
For the next 24 h, the dominant error pattern is characterized by an eastward-propagating, rapidly amplifying dipole structure. This structure and the fast propagation speed indicate that the dominant error pattern takes the shape of a packet of synoptic-scale Rossby waves. This conclusion can be confirmed by calculating the packet envelope of the forecast errors for the 4- to 9-wavenumber range with a Hilbert transform-based method (Zimin et al. 2003, 2006). Using the technique of Zimin et al. (2006), Fig. 6 depicts an amplifying eastward-extending envelope of errors. An inspection of the vertical cross section of the errors (not shown) also confirms that the error growth starts in the jet layer with an overestimation of the wind speed in the core of the jet and a small distortion of the upper-tropospheric wave near the core of the jet. Although downstream development [an initial divergence of the ageostrophic fluxes that triggers a baroclinic energy conversion; see Orlanski and Chang (1993) and Orlanski and Sheldon (1995)] leads to the development of a closed low associated with the upper-tropospheric wave, the largest forecast errors occur further downstream, near the leading edge of the wave packet shown in Fig. 6. Such propagation of the dominant errors was documented and analyzed in detail in Persson (2000), Szunyogh et al. (2000, 2002), Zimin et al. (2003), and Hakim (2005) and was foreseen long ago by the pioneers of numerical weather prediction (Rossby 1949; Charney 1949; Phillips 1990).
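The basic idea of the Hilbert transform-based envelope can be sketched along a single latitude circle: retain only the positive zonal wavenumbers 4 to 9 of the error field, double them, and transform back; the modulus of the resulting complex (analytic) signal is the local amplitude of the wave packet. This is a minimal sketch assuming a regular, periodic longitude grid, not the full procedure of Zimin et al. (2003, 2006); `wave_packet_envelope` is a hypothetical name.

```python
import numpy as np


def wave_packet_envelope(field, k_min=4, k_max=9):
    """Envelope of zonal wavenumbers k_min..k_max along each latitude circle.

    field: array (n_lat, n_lon) on a regular, periodic longitude grid.
    """
    n_lon = field.shape[-1]
    coeffs = np.fft.fft(field, axis=-1)
    k = np.fft.fftfreq(n_lon, d=1.0 / n_lon)  # integer wavenumbers
    # Analytic signal of the band: keep positive wavenumbers in the band
    # (doubled to account for the discarded negative ones).
    keep = (k >= k_min) & (k <= k_max)
    analytic = np.fft.ifft(np.where(keep, 2.0 * coeffs, 0.0), axis=-1)
    return np.abs(analytic)
```

For example, for $\cos 5\lambda + \cos 7\lambda = 2\cos\lambda\cos 6\lambda$ the function returns $2|\cos\lambda|$, the envelope of the carrier wave $\cos 6\lambda$.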
In our example of rapid error growth, the atmospheric instability that drives the propagation of the errors is a growing uncertainty in the characteristics (phase and amplitude) of finite amplitude waves generated by an earlier downstream baroclinic development. (Here the term “instability” is used in the mathematical sense, i.e., it refers to a growing uncertainty in the solution due to an uncertainty in the initial condition.) The potential importance of an instability process, in which an earlier baroclinic or barotropic instability leads to uncertainties in the characteristics of the developing finite-amplitude waves, was first pointed out by Snyder (1999). That the dominant errors propagate along the upper-tropospheric waveguides (Fig. 1 and related discussion earlier) suggests that this may be the most important instability in the model solutions (forecasts). The importance of this instability process, in which temporal evolution and spatial propagation play equally important roles, reinforces our view that the atmosphere should always be approached as a spatiotemporally chaotic system.
4. The role of local dimensionality
SEA05 found that the efficiency of the LEKF algorithm decreased with increasing E dimension. More precisely, a strong negative correlation was found between the gridpoint values of the time-mean E dimension and the gridpoint values of the time mean of the explained variance. The explained variance measures the portion of the error that is captured by the ensemble. In what follows, we investigate the relationship between E dimension, explained variance, and the magnitude of forecast errors.
a. E dimension, explained variance, and forecast error
While the choice of the coordinates of the state vector does not affect the state estimates, it has a profound effect on the singular value decomposition (SVD) of the error covariance matrices. Thus, the choice of coordinates has an important effect on such SVD-based diagnostics as the E dimension. We follow the strategy of Oczkowski et al. (2005) and transform the ensemble perturbations so that the square of the Euclidean norm of the transformed perturbations has dimensions of energy. The local state vector is defined by all gridpoint variables in a local volume that contains a 5 × 5 horizontal grid (at 2.5° resolution in both directions) and the entire model atmosphere in the vertical direction.
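A sketch of such a transformation, assuming a dry total-energy norm of the form $u'^2 + v'^2 + (c_p/T_r)T'^2 + R_d T_r (p_s'/p_r)^2$, is given below. The reference values $T_r = 280$ K and $p_r = 1000$ hPa, the omission of any vertical (pressure-thickness) weighting, and the helper name are assumptions; Oczkowski et al. (2005) should be consulted for the exact norm.

```python
import numpy as np

CP = 1004.0     # J kg^-1 K^-1, specific heat of dry air at constant pressure
RD = 287.0      # J kg^-1 K^-1, gas constant of dry air
T_REF = 280.0   # K, reference temperature (assumed value)
P_REF = 1000.0  # hPa, reference surface pressure (assumed value)


def energy_scaled_local_vector(du, dv, dT, dps):
    """Stack local perturbations into one energy-scaled vector.

    du, dv, dT: arrays (n_lev, 5, 5) for a 5 x 5 column of grid points
    dps: array (5, 5), surface-pressure perturbation in hPa
    The squared Euclidean norm of the result equals
    sum(du^2 + dv^2) + (cp/Tr) sum(dT^2) + Rd Tr sum((dps/pr)^2),
    which has units of m^2 s^-2 (energy per unit mass).
    """
    return np.concatenate([
        du.ravel(),
        dv.ravel(),
        np.sqrt(CP / T_REF) * dT.ravel(),
        np.sqrt(RD * T_REF) * dps.ravel() / P_REF,
    ])
```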
This definition of the local state vector differs from that used for the calculation of the E dimension in SEA05. There, the local volume was defined by the local volume used in the LEKF algorithm, in which only a few model levels were included in the vertical direction and the number of model levels in the vertical layers was height dependent. The rationale for this change is that in SEA05 the goal was to evaluate the assumptions made in the implementation of the LEKF on the NCEP GFS; here, the goal is to study the role of local dimensionality in shaping the local predictability.
As expected based on the results of SEA05, the E dimension is typically higher in the Tropics (Fig. 7) than in the extratropics for the entire 5-day forecast range. While the E dimension decreases with increasing forecast time over the entire globe, the decrease of the dimension is much faster in the storm track regions than elsewhere. One may wonder whether this effect is associated with an inherent property of the model dynamics or arises from an unexpected collapse of the ensemble due to some unforeseen problem with the ensemble generation technique. To answer this question, we apply the explained variance diagnostic (see SEA05) to the forecast error and the forecast ensemble. The explained variance diagnostic measures the portion of the forecast error that lies in the space spanned by the evolving ensemble perturbations. (Formally, it is calculated by projecting the forecast error onto the space of the ensemble and taking the squared norm of the projection, which is then normalized by the squared norm of the forecast error.) In the extreme cases, when the ensemble perfectly captures the space in which the forecast error evolves, the explained variance is one, and when the forecast error falls entirely outside of the ensemble space, the explained variance is zero. The close relationship between the typical regions of low dimensionality and the typical regions of high explained variance can be deduced subjectively by comparing Figs. 7 and 8. This observation motivates us to assess the relationship between the two quantities in a more quantitative way. In addition, we would like to know whether such a strong relationship exists only for the temporal means of the two quantities or is also present for the spatiotemporally evolving fields. To achieve these two objectives, we study the joint probability distribution of the E dimension and the explained variance in the NH extratropics (Fig. 9) and the Tropics (Fig. 10).
(The joint probability distribution for the SH extratropics is similar to that for the NH extratropics, thus it is not shown.)
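Following the formal definition above, the explained variance can be computed by projecting the local error onto an orthonormal basis of the ensemble space obtained from an SVD of the perturbation matrix. The sketch assumes that the perturbations and the error are expressed in the same (energy-scaled) local coordinates; the rank tolerance and the function name are illustrative choices.

```python
import numpy as np


def explained_variance(perturbations, error, rtol=1e-10):
    """Fraction of the squared error norm lying in the ensemble space.

    perturbations: (n, k) local ensemble perturbations
    error: (n,) local forecast error in the same coordinates
    """
    u, s, _ = np.linalg.svd(perturbations, full_matrices=False)
    # Orthonormal basis of the span; drop numerically zero directions
    # (e.g., the one removed by subtracting the ensemble mean).
    basis = u[:, s > rtol * s.max()]
    proj = basis.T @ error           # coordinates of the projection
    return float(proj @ proj / (error @ error))
```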
The joint probability distribution function is obtained by counting the number of cases when a pair of values for the E dimension and the explained variance falls into a bin defined by a small interval ΔE of the E dimension and a small interval ΔEV of the explained variance. Then the number of cases is normalized by ΔE × ΔEV × n and the bin is color shaded based on the result. The total sample size n is equal to the total number of grid points in the given geographical region times the number of verification times, 160, on which the sample is based. This normalization ensures that the integral of the plotted values over all bins is equal to one.
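The normalization described above is what `numpy.histogram2d` with `density=True` computes when every sample lies inside the bin range. The sketch below makes it explicit; the bin layout (40 bins of E dimension over 0 to 40 and 20 bins of explained variance over 0 to 1) is an assumption, not taken from the paper.

```python
import numpy as np


def joint_pdf(e_dim, expl_var, n_e_bins=40, e_max=40.0, n_ev_bins=20):
    """Counts per bin divided by (dE * dEV * n), so the pdf integrates to one.

    e_dim, expl_var: 1D arrays of paired samples (all grid points in the
    region times all verification times).
    """
    e_edges = np.linspace(0.0, e_max, n_e_bins + 1)
    ev_edges = np.linspace(0.0, 1.0, n_ev_bins + 1)
    d_e = e_edges[1] - e_edges[0]
    d_ev = ev_edges[1] - ev_edges[0]
    counts, _, _ = np.histogram2d(e_dim, expl_var, bins=[e_edges, ev_edges])
    pdf = counts / (d_e * d_ev * e_dim.size)
    return pdf, e_edges, ev_edges
```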
The most important common feature of the joint probability distributions for the NH extratropics and the Tropics is that the smaller the E dimension, the larger the smallest attainable value of the explained variance. In other words, the lower the E dimension, the higher the confidence we can have that the ensemble captures the actual forecast error. In addition, as the forecast time increases, the lowest possible value of the E dimension decreases, and the lowest values of the E dimension become an increasingly sharp predictor of a high explained variance. We also note that the boundary between the NH extratropics and the Tropics is not sharp: when the two figures are merged (not shown) there is no visible jump in the probability distribution, because the high E dimension end of the distribution for the NH extratropics and the low E dimension end of the distribution for the Tropics are populated by values from the transient region between the two areas.
What makes the close relationship between low E dimension and high explained variance potentially valuable from a forecasting point of view in the extratropics is that fast error growth always leads to low E dimension. (We note that the converse is not true: the forecast error can be small for a case of low E dimension at any forecast time.) That is, we can have the highest confidence in the ability of the ensemble to predict the space of possible errors when the errors are the largest. This property of the ensemble is illustrated by Figs. 11 and 12. Figure 11 shows the joint probability distribution for the analysis and forecast errors and the explained variance in the NH extratropics. It can be seen that as the forecast lead time increases, the ensemble captures an increasingly larger portion of the forecast errors for the cases of large errors. This can be explained by the fact that the fast error growth always leads to low E dimension, that is, to high explained variance (Fig. 12).
The picture is very different for the Tropics (Figs. 13 and 14). In this region, the magnitude of the forecast error is more directly related to the magnitude of the analysis error due to the linear nature of the error growth. Since the analysis errors are smaller for the lower E dimensions, the forecast errors are also small for the low E dimensions. (We can start seeing a shift of the larger errors toward the smaller E dimensions only after 72 h.) Thus the highest explained variance occurs for relatively small errors.
b. Local low dimensionality and explosive local error growth
So far we have shown that there is a close statistical relationship between E dimension, explained variance, and forecast error. Here we illustrate this close relationship using the example of the explosive forecast error growth described in section 3e. In this case, the overlap between the regions of large errors and low dimensionality is almost perfect (Fig. 15), especially at and after the 36-h forecast lead time. Likewise, the explained variance rapidly grows in the regions of rapidly decreasing dimensionality, where the explained variance exceeds 90% at and beyond the 24-h forecast lead time (Fig. 16).
c. Local low dimensionality and the spread–skill relationship
It has long been thought that the spread (the second moment) of a suitably prepared ensemble forecast can be used as a predictor of the skill of the ensemble-mean forecast (Leith 1974). It has also been observed, however, that the positive correlation between the spread and the forecast error is disappointingly small; even in the perfect model scenario, the correlation was found to be less than 0.5 (Barker 1991). The theoretical explanation for this result was provided by Houtekamer (1993) and Whitaker and Loughe (1998) using a simple stochastic model of the spread–skill relationship: a large correlation can be expected only when the temporal variability of the forecast (or analysis) error is large. This rule explains the behavior of the spread–skill relationship for the LEKF system shown in Fig. 17: (i) initially the correlation increases due to the increasing variability of the forecast errors as the forecast time increases (see Fig. 4); (ii) the correlation peaks at a level slightly below 0.5 at the 72-h forecast lead time in all three geographical regions; and (iii) the maximum value of the correlation is the largest in the NH extratropics, the region where variability of the forecast errors is the largest. The low initial correlation can be explained by the fact that an ensemble-based data assimilation system, such as the LEKF, is designed to remove that part of the analysis error that is successfully captured by the ensemble. The only surprising feature in Fig. 17 is the relatively high initial correlation in the Tropics. The only plausible explanation for this is that in the Tropics, the location of the dominant analysis errors is better captured by the ensemble than the structure of the errors. This result reinforces our earlier conjecture, drawn in section 3c, that the assimilation of observations in the Tropics reduces the magnitude of the errors in the state estimate but does not drastically change the structure of the errors.
This indicates that there is no strong relationship between errors at the different grid points in the Tropics.
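The stochastic argument above can be illustrated numerically. The sketch below is in the spirit of Houtekamer (1993) and Whitaker and Loughe (1998) but is not claimed to reproduce either paper's exact formulation. It assumes that the spread s varies from case to case as a lognormal variable, ln s ~ N(0, beta^2), and that the error of the ensemble mean is drawn from N(0, s^2), so that the ensemble is perfectly calibrated. The parameter name `beta` and both function names are illustrative. For this model, the correlation rho between s and the absolute error has the closed form rho^2 = (2/pi)(1 - exp(-beta^2)) / (1 - (2/pi) exp(-beta^2)), which follows from the moments of the lognormal distribution; the Monte Carlo function checks it by sampling.

```python
import numpy as np


def analytic_spread_skill_corr(beta: float) -> float:
    """Correlation between spread s and |error| when ln(s) ~ N(0, beta**2)
    and error | s ~ N(0, s**2) (assumed toy model, perfectly calibrated)."""
    q = np.exp(-beta ** 2)
    c = 2.0 / np.pi
    return float(np.sqrt(c * (1.0 - q) / (1.0 - c * q)))


def sampled_spread_skill_corr(beta: float, n_cases: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same correlation."""
    rng = np.random.default_rng(seed)
    spread = np.exp(rng.normal(0.0, beta, n_cases))  # case-to-case varying spread
    error = spread * rng.standard_normal(n_cases)    # error drawn with std = spread
    return float(np.corrcoef(spread, np.abs(error))[0, 1])


for beta in (0.1, 0.25, 0.5, 1.0):
    print(f"beta={beta:4.2f}  analytic={analytic_spread_skill_corr(beta):.3f}  "
          f"sampled={sampled_spread_skill_corr(beta):.3f}")
```

Even with a perfectly calibrated ensemble, rho stays small when beta is small and approaches sqrt(2/pi), about 0.80, only as the case-to-case variability of the spread becomes large, which is consistent with the rule stated above.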
The joint probability distribution function for the ensemble spread and the error in the ensemble-mean forecasts is shown in Fig. 18. This figure indicates that the ensemble spread is typically smaller than the error in the ensemble mean. This finding is not unexpected since, as was shown earlier (e.g., Fig. 8), part of the forecast error is not captured by the ensemble. (For short forecast lead times, the ensemble-mean forecast and the forecast started from the analysis mean are nearly identical because the initial evolution of the ensemble perturbations is nearly linear.) In addition, the ensemble spread predicts the upper bound of the error most reliably at locations where the E dimension is the smallest (Fig. 19). In contrast to the single deterministic forecast, where the largest errors occur for the smallest E dimensions, the errors in the ensemble-mean forecast are relatively small in the regions of the smallest E dimensions. This is due to the error-filtering effect of ensemble averaging in regions where the ensemble captures the space of uncertainties well, that is, in regions of high explained variance.
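A diagnostic of the type shown in Figs. 18 and 19 can be sketched as follows. The sketch assumes that the ensemble spread, the error in the ensemble-mean forecast, and the E dimension are already available as arrays on the same grid points and verification times. The function names are hypothetical, the default bin widths are taken from the caption of Fig. 18, and binning the absolute value of the error is an assumption.

```python
import numpy as np


def _edges(values, width):
    """Regular bin edges starting at zero that cover max(values)."""
    n_bins = int(np.floor(values.max() / width)) + 1
    return width * np.arange(n_bins + 1)


def joint_pdf_and_mean_edim(spread, error, edim, d_spread=0.005, d_error=0.4):
    """Return (joint relative frequency, mean E dimension per bin,
    spread edges, error edges) for samples binned by spread and |error|."""
    spread = np.asarray(spread, dtype=float).ravel()
    error = np.abs(np.asarray(error, dtype=float)).ravel()
    edim = np.asarray(edim, dtype=float).ravel()

    s_edges = _edges(spread, d_spread)
    e_edges = _edges(error, d_error)

    # Number of samples in each (spread, error) bin.
    counts, _, _ = np.histogram2d(spread, error, bins=[s_edges, e_edges])
    # Sum of the E dimension over the samples that fall in each bin.
    edim_sum, _, _ = np.histogram2d(spread, error, bins=[s_edges, e_edges],
                                    weights=edim)

    pdf = counts / counts.sum()
    with np.errstate(invalid="ignore", divide="ignore"):
        # Empty bins get NaN so that they can be masked when plotted.
        mean_edim = np.where(counts > 0, edim_sum / counts, np.nan)
    return pdf, mean_edim, s_edges, e_edges


pdf, mean_edim, s_edges, e_edges = joint_pdf_and_mean_edim(
    [0.001, 0.001, 0.007], [0.1, -0.2, 0.5], [2.0, 4.0, 5.0])
print(pdf)
print(mean_edim)
```

Accumulating the E dimension with the `weights` argument and dividing by the counts gives the bin means without an explicit loop over samples.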
5. Conclusions
In this paper, we assess atmospheric predictability with the help of a state-of-the-art numerical weather prediction model (at a reduced resolution) and the local ensemble Kalman filter data assimilation scheme. Our experimental design addresses the issue of determining the degree to which uncertainty in the knowledge of the initial state influences the predictability of a high-dimensional, spatiotemporally chaotic system. We assume that the numerical model provides a perfect representation of the true atmospheric dynamics. Our main findings are as follows:
For this specific choice of the model and data assimilation system, the forecast errors grow exponentially in the extratropics and linearly in the Tropics. As exponential growth has been found in many previous studies that considered different types of uncertainties in the knowledge of the true initial conditions, the dominance of exponentially growing features seems to be an important property of predictability in the extratropics. Our earlier research indicates that these dominant instabilities are closely related to the synoptic-scale local generation and propagation of eddy kinetic energy. Since these processes can be simulated well by the models, there are good reasons to believe that exponentially growing instabilities dominate real atmospheric dynamics in the extratropics. The linear growth of errors in the Tropics is a less familiar result of our experiments. While this result may be an artifact of the model dynamics, which rely heavily on parameterized physical processes in the Tropics, we tend to believe that the real atmosphere behaves similarly.
The explained variance is always highest for the lowest E dimension, independently of the geographical region and the forecast lead time. (As was shown in SEA05, this guarantees that the analysis errors are the smallest for the smallest E dimension independently of the geographical region.)
In the extratropics, large forecast errors gradually become more likely to occur in regions of low E dimension as the forecast time increases. Thus, the ensemble gradually becomes more likely to capture a large portion of the forecast error as the forecast time increases. The larger the forecast error, the larger the portion of the forecast error that the ensemble captures with high certainty.
Since the ensemble captures a larger portion of the forecast error with high certainty in the regions of low E dimension, in those regions ensemble averaging becomes an efficient error filter and the ensemble spread provides an accurate prediction of the upper bound of the error in the ensemble-mean forecast.
In the Tropics, because the error growth is linear, the magnitude of the forecast error is closely tied to the magnitude of the analysis error. Since the analysis errors are small for the smallest E dimensions, the forecast errors are also small for the smallest E dimensions. In our experiments, this pattern begins to break down beyond a forecast lead time of 72 h.
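The distinction drawn above between exponential error growth in the extratropics and linear error growth in the Tropics can be checked, for a time series of time-mean forecast errors, by comparing two least-squares fits. The sketch below is an illustration, not the procedure used in this study: the function name, the use of the residual sum of squares as the comparison criterion, and the doubling time computed as ln 2 divided by the fitted exponential growth rate are all assumptions. The error values in the usage lines are synthetic.

```python
import numpy as np


def compare_growth_fits(lead_h, err):
    """Fit err(t) = a + b*t and err(t) = c*exp(lam*t) to time-mean errors.

    Returns the fitted linear slope, the exponential growth rate, the
    corresponding doubling time (hours), and the residual sum of squares
    of each fit, both measured in the original error units.
    """
    t = np.asarray(lead_h, dtype=float)
    e = np.asarray(err, dtype=float)

    b, a = np.polyfit(t, e, 1)                # linear model: slope, intercept
    lam, log_c = np.polyfit(t, np.log(e), 1)  # exponential model, fitted in log space

    rss_linear = float(np.sum((e - (a + b * t)) ** 2))
    rss_exp = float(np.sum((e - np.exp(log_c + lam * t)) ** 2))
    doubling_h = float(np.log(2.0) / lam) if lam > 0 else float("inf")
    return {"slope": float(b), "rate": float(lam), "doubling_time_h": doubling_h,
            "rss_linear": rss_linear, "rss_exponential": rss_exp}


lead = np.arange(0, 121, 12)  # forecast lead times in hours
exp_fit = compare_growth_fits(lead, 1.5 * 2.0 ** (lead / 36.0))  # synthetic exponential growth
lin_fit = compare_growth_fits(lead, 1.5 + 0.02 * lead)           # synthetic linear growth
print(exp_fit)
print(lin_fit)
```

In practice, such fits are sensitive to the range of lead times used; including lead times at which the errors approach saturation would bias the exponential fit toward slower growth.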
Do these results have any practical use when the forecast model is not perfect? First of all, it is safe to assume that the local dimensionality of the true atmosphere is higher than in our global forecast model. This would degrade the ability of the model-based ensemble to capture the space of forecast uncertainties. We note that, in principle, the LEKF algorithm could be used to estimate the effect of model errors on the E dimension. The extension of the LEKF algorithm described in Baek et al. (2006) provides an estimate of the model errors in addition to the estimate of the state. More precisely, it provides an estimate of the augmented state, where the state is augmented by the parameters that describe the model errors. The E dimension could be determined by using the augmented state to define the local background covariance matrix. It is yet to be seen, however, whether the model errors can be efficiently parameterized for a complex weather prediction model, such as the NCEP GFS.
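The augmented-state idea can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of Baek et al. (2006). Each column of `state_perts` is assumed to be one ensemble member's local state perturbation, each column of `bias_perts` is the same member's perturbation of the model-error parameters, and `bias_weight` is a hypothetical scaling that makes the two blocks commensurate. The E dimension is computed from the singular values s_i of the stacked perturbation matrix as (sum s_i)^2 / sum s_i^2, the form we assume from Patil et al. (2001).

```python
import numpy as np


def e_dimension_from_perturbations(perts):
    """E dimension (sum s_i)^2 / sum s_i^2 from the singular values s_i of a
    local perturbation matrix whose columns are ensemble members."""
    s = np.linalg.svd(np.asarray(perts, dtype=float), compute_uv=False)
    return float(s.sum() ** 2 / np.sum(s ** 2))


def augmented_e_dimension(state_perts, bias_perts, bias_weight=1.0):
    """Hypothetical augmented-state E dimension: stack the (weighted)
    model-error-parameter perturbations below the state perturbations."""
    state_perts = np.asarray(state_perts, dtype=float)
    bias_perts = np.asarray(bias_perts, dtype=float)
    if state_perts.shape[1] != bias_perts.shape[1]:
        raise ValueError("both blocks need one column per ensemble member")
    augmented = np.vstack([state_perts, bias_weight * bias_perts])
    return e_dimension_from_perturbations(augmented)


# Two members whose state perturbations point in the same direction.
state = np.array([[1.0, 1.0],
                  [0.0, 0.0]])
print(augmented_e_dimension(state, np.zeros((1, 2))))  # no model-error spread
print(augmented_e_dimension(state, [[1.0, -1.0]]))     # members differ in model error
```

In this toy case, the two members differ only in their model-error parameters, so the augmented ensemble is two-dimensional while the state-only ensemble is one-dimensional.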
Local low dimensionality is a property that eventually breaks down with increasing forecast lead time. Eventually, predictability is completely lost, and the predictive value of the ensemble becomes the same as that of a set of randomly drawn samples from the much larger set of climatologically realizable states of the model. The larger the magnitude of the initial ensemble perturbations, the earlier the breakdown of local low dimensionality occurs. For instance, Oczkowski et al. (2005) observed such breakdowns at forecast lead times of as little as 24 to 48 h when investigating the evolution of a set of bred vectors. In our experimental design, the magnitude of the analysis uncertainty is small (presumably an order of magnitude smaller than in an operational weather analysis), so our results are not affected by an overall breakdown of local low dimensionality in the first 120 h of model integration. Our plan is to investigate the process of the breakdown of low dimensionality in a future paper for both simulated and real observations.
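The contrast between a locally low-dimensional ensemble and a set of randomly drawn states can be illustrated with the two local diagnostics used in this paper. The sketch assumes the E dimension form (sum s_i)^2 / sum s_i^2 of Patil et al. (2001), with s_i the singular values of the local perturbation matrix, and defines the explained variance as the fraction of the squared norm of a local error vector that lies in the space spanned by the ensemble perturbations. The synthetic ensembles, the local dimension of 1000, the 40 members, and the random seed are arbitrary illustrative choices, not the configuration of the experiments.

```python
import numpy as np


def e_dimension(perts):
    """(sum s_i)^2 / sum s_i^2 from the singular values of the perturbations."""
    s = np.linalg.svd(perts, compute_uv=False)
    return float(s.sum() ** 2 / np.sum(s ** 2))


def explained_variance(perts, err):
    """Fraction of |err|^2 in the column space of perts (least-squares projection)."""
    coeffs, *_ = np.linalg.lstsq(perts, err, rcond=None)
    proj = perts @ coeffs
    return float(proj @ proj / (err @ err))


rng = np.random.default_rng(0)
n_local, n_members = 1000, 40

# Nearly one-dimensional ensemble: every member is a multiple of one pattern plus small noise.
pattern = rng.standard_normal(n_local)
low_dim = (np.outer(pattern, rng.standard_normal(n_members))
           + 0.01 * rng.standard_normal((n_local, n_members)))
# Ensemble of independent random states, standing in for climatological samples.
random_ens = rng.standard_normal((n_local, n_members))

# Error dominated by the same pattern as the low-dimensional ensemble.
err = 2.0 * pattern + 0.01 * rng.standard_normal(n_local)

print(e_dimension(low_dim), explained_variance(low_dim, err))
print(e_dimension(random_ens), explained_variance(random_ens, err))
```

With these settings, the first ensemble has an E dimension close to one and explains almost all of the error, whereas the random ensemble has an E dimension close to the number of members and explains only a small fraction of the error, roughly the ratio of the number of members to the local dimension.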
Acknowledgments
The authors thank one of the anonymous reviewers for insightful comments. D. K. was supported by a Rising Star Fellowship from the National Institute of Aerospace, Hampton, Virginia. This work was also supported by a National Oceanic and Atmospheric Administration THORPEX Grant, the Army Research Office, a James S. McDonnell 21st Century Research Award, the NPOESS Integrated Program Office (IPO), the Office of Naval Research (Physics), and the National Science Foundation (Grants 0104087 and PHYS 0098632). E. J. K. gratefully acknowledges support from the National Science Foundation (Grant DMS-0408102).
REFERENCES
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.
Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758.
Baek, S.-J., B. R. Hunt, E. Kalnay, E. Ott, and I. Szunyogh, 2006: Local ensemble Kalman filtering in the presence of model bias. Tellus, 58A, 293–306.
Barker, T. W., 1991: The relationship between spread and forecast error in extended-range forecasts. J. Climate, 4, 733–742.
Bishop, C. H., B. J. Etherton, and S. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436.
Charney, J. G., 1949: On a physical basis for numerical prediction of large-scale motions in the atmosphere. J. Meteor., 6, 371–385.
Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367.
Hakim, G. J., 2005: Vertical structure of midlatitude analysis and forecast errors. Mon. Wea. Rev., 133, 567–578.
Houtekamer, P. L., 1993: Global and local skill forecasts. Mon. Wea. Rev., 121, 1834–1846.
Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.
Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation, and Predictability. Cambridge University Press, 341 pp.
Keppenne, C. L., and M. M. Rienecker, 2002: Initial testing of a massively parallel ensemble Kalman filter with the Poseidon isopycnal ocean general circulation model. Mon. Wea. Rev., 130, 2951–2965.
Leith, C. E., 1974: Theoretical skill of Monte Carlo forecasts. Mon. Wea. Rev., 102, 409–418.
Oczkowski, M., I. Szunyogh, and D. J. Patil, 2005: Mechanisms for the development of locally low-dimensional atmospheric dynamics. J. Atmos. Sci., 62, 1135–1156.
Orlanski, I., and E. K. M. Chang, 1993: Ageostrophic geopotential fluxes in downstream and upstream development. J. Atmos. Sci., 50, 212–225.
Orlanski, I., and J. P. Sheldon, 1995: Stages in the energetics of baroclinic systems. Tellus, 47A, 605–628.
Ott, E., 2002: Chaos in Dynamical Systems. 2d ed. Cambridge University Press, 490 pp.
Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.
Patil, D. J., B. R. Hunt, E. Kalnay, J. A. Yorke, and E. Ott, 2001: Local low dimensionality of atmospheric dynamics. Phys. Rev. Lett., 86, 5878–5881.
Persson, A., 2000: Synoptic-dynamic diagnosis of medium range weather forecast systems. Proc. Seminars on Diagnosis of Models and Data Assimilation Systems, Reading, United Kingdom, ECMWF, 123–137.
Phillips, N. A., 1990: Dispersion processes in large-scale weather prediction. WMO-No. 700, World Meteorological Organization, 126 pp.
Rossby, C.-G., 1949: On the dispersion of planetary waves in a barotropic atmosphere. Tellus, 1, 54–58.
Snyder, C., 1999: Error growth in flows with finite-amplitude waves or coherent structures. J. Atmos. Sci., 56, 500–506.
Szunyogh, I., E. Kalnay, and Z. Toth, 1997: A comparison of Lyapunov and optimal vectors in a low-resolution GCM. Tellus, 49A, 200–227.
Szunyogh, I., Z. Toth, R. E. Morss, S. J. Majumdar, B. J. Etherton, and C. H. Bishop, 2000: The effect of targeted dropsonde observations during the 1999 Winter Storm Reconnaissance Program. Mon. Wea. Rev., 128, 3520–3537.
Szunyogh, I., Z. Toth, A. V. Zimin, S. J. Majumdar, and A. Persson, 2002: Propagation of the effect of targeted observations: The 2000 Winter Storm Reconnaissance Program. Mon. Wea. Rev., 130, 1144–1165.
Szunyogh, I., E. J. Kostelich, G. Gyarmati, D. J. Patil, B. R. Hunt, E. Kalnay, E. Ott, and J. Yorke, 2005: Assessing a local ensemble Kalman filter: Perfect model experiments with the NCEP global model. Tellus, 57A, 528–545.
Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 2317–2330.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319.
Whitaker, J. S., and A. F. Loughe, 1998: The relationship between ensemble spread and ensemble mean skill. Mon. Wea. Rev., 126, 3292–3302.
Whitaker, J. S., and T. H. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924.
Zimin, A. V., I. Szunyogh, D. J. Patil, B. R. Hunt, and E. Ott, 2003: Extracting envelopes of Rossby wave packets. Mon. Wea. Rev., 131, 1011–1017.
Zimin, A. V., I. Szunyogh, D. J. Patil, B. R. Hunt, and E. Ott, 2006: Extracting envelopes of nonzonally propagating Rossby wave packets. Mon. Wea. Rev., 134, 1329–1333.
Fig. 1. Time-mean absolute error in forecasts of the meridional wind component at the 500-hPa pressure level at different forecast lead times.
Citation: Journal of the Atmospheric Sciences 64, 4; 10.1175/JAS3885.1
Fig. 2. Dependence of the time-mean forecast error on the forecast lead time for the meridional wind component at the 500-hPa level in the NH extratropics. The evolution of the forecast error is shown for different ranges of the zonal wavenumber k.
Fig. 3. Dependence of the time-mean forecast error on the forecast lead time for the meridional wind component at the 500-hPa level in the Tropics. The evolution of the forecast error is shown for different ranges of the zonal wavenumber k.
Fig. 4. Time series of the root-mean-square forecast error for different forecast lead times. Shown is the forecast error for the meridional wind component at the 500-hPa level.
Fig. 5. Time evolution of the errors in the forecast started at 1200 UTC on day 7. Shown are the errors (color shades) and the “true” state of the geopotential height of the 500-hPa pressure level.
Fig. 6. Time evolution of the wave packet envelope of errors in the forecast started at 1200 UTC on day 6. The wave packet envelope is calculated based on errors in the prediction of the meridional component of the wind vector in the zonal wavenumber range from 4 to 9. Notice the change in the color scheme between the 36- and 48-h forecast lead times.
Fig. 7. Time-mean E dimension at different forecast lead times.
Fig. 8. Time-mean explained variance at different forecast lead times.
Fig. 9. Joint probability distribution of the E dimension and the explained variance in the NH extratropics. The bins are defined by ΔE = 0.2 and ΔEV = 0.005.
Fig. 10. Joint probability distribution of the E dimension and the explained variance in the Tropics. The bins are defined by ΔE = 0.2 and ΔEV = 0.005.
Fig. 11. Joint probability distribution of the explained variance and the magnitude of the error in the forecast of the meridional component of the wind at the 500-hPa level in the extratropics. The bins are defined by ΔEV = 0.005 and ΔER = 0.4, where ΔER is the interval for the forecast error.
Fig. 12. Mean E dimension for the bins shown in Fig. 11.
Fig. 13. Joint probability distribution of the explained variance and the magnitude of the error in the forecast of the meridional component of the wind at the 500-hPa level in the Tropics. The bins are defined by ΔEV = 0.005 and ΔER = 0.4.
Fig. 14. Mean E dimension for the bins shown in Fig. 13.
Fig. 15. Shown are the E dimension (color shades) and the geopotential height forecast error at the 500-hPa level in the forecasts started at 1200 UTC on day 6.
Fig. 16. Shown are the E dimension (color shades) and the explained variance (contours) in the forecasts started at 1200 UTC on day 6. The contour interval is 0.1 and values smaller than 0.7 are not shown.
Fig. 17. Correlation between ensemble spread and error in the ensemble-mean forecast as a function of forecast time.
Fig. 18. Joint probability distribution of the ensemble spread and the magnitude of the error in the ensemble-mean forecast of the meridional component of the wind at the 500-hPa level in the NH extratropics. The width of the bins is 0.005 for the ensemble spread and 0.4 for the forecast error.
Fig. 19. Mean E dimension for the bins shown in Fig. 18.
NH extratropics root-mean-square analysis error, za, and error doubling time for the meridional wind component at the 500-hPa level at different observational densities. While these values are slightly different for the other model variables, they show the same tendencies.