## Abstract

Nonlinear local Lyapunov vectors (NLLVs) are developed to indicate orthogonal directions in phase space with different perturbation growth rates. In particular, the first few NLLVs are considered to be an appropriate orthogonal basis for the fast-growing subspace. In this paper, the NLLV method is used to generate initial perturbations and implement ensemble forecasts in simple nonlinear models (the Lorenz63 and Lorenz96 models) to explore the validity of the NLLV method.

The performance of the NLLV method is compared comprehensively and systematically with other methods such as the bred vector (BV) and the random perturbation (Monte Carlo) methods. In experiments using the Lorenz63 model, the leading NLLV (LNLLV) captured a more precise direction, and with a faster growth rate, than any individual bred vector. It may be the larger projection on fastest-growing analysis errors that causes the improved performance of the new method. Regarding the Lorenz96 model, two practical measures—namely the spread–skill relationship and the Brier score—were used to assess the reliability and resolution of these ensemble schemes. Overall, the ensemble spread of NLLVs is more consistent with the errors of the ensemble mean, which indicates the better performance of NLLVs in simulating the evolution of analysis errors. In addition, the NLLVs perform significantly better than the BVs in terms of reliability and the random perturbations in resolution.

## 1. Introduction

The atmosphere is a chaotic system, and even negligible initial errors will give rise to gradual deviation of the forecast state from the true path, eventually resulting in chaos (Lorenz 1963a,b, 1965; Li and Chou 1997). This means that the weather has a predictability limit beyond which forecasts will lose all skill. Based on this, any single forecast is simply an estimate of the future state of the atmosphere within a stochastic framework but provides no information regarding its reliability. Ensemble prediction offers one approach to generate probabilistic forecasts of the future state of the system based on a statistical sampling approach (Leith 1974).

The ensemble method provides an approximate description of the probability density function (PDF) of the forecast state based on a finite number of samples. In a generally perfect model, if a group of initial states sample the uncertainty related to the initial value reasonably well, the states of their individual integration will have the ability to represent the uncertainty of the forecast states. In contrast with a single forecast, ensemble averaging provides a nonlinear filter that reduces error (Toth and Kalnay 1997). However, the ensemble mean simply treats the ensemble forecast as a single forecast, which does not indicate how reliable the result is. The biggest advantage of ensemble prediction is the product of the second moment of the ensemble, or the ensemble spread (Whitaker and Loughe 1998; Zhu et al. 2002). With this additional information, the quality of forecasts can be significantly enhanced.

Initially, the use of ensemble techniques focused on random samples (Monte Carlo forecasting) as a description of the probability distribution of initial states (Epstein 1969; Leith 1974). However, the atmosphere is an extremely complex system, which has a very high phase-space dimension; that is, the number of random samples must be sufficiently large. Consequently, the cost of performing the requisite number of predictions presents a significant challenge. However, as more and more intensive studies of data assimilation procedures have been completed, analysis errors are no longer considered to be totally unknown. Pure random perturbations may introduce irrelevant directions to a chosen initial state (Houtekamer and Derome 1994). In the early 1990s, ensemble prediction systems were developed and used operationally by weather prediction centers. Different schemes are used to generate the initial perturbations in the control analysis: for example, the bred vector (BV) method (Toth and Kalnay 1993, 1997) at the National Centers for Environmental Prediction (NCEP), the singular vector (SV) method (Molteni and Palmer 1993; Molteni et al. 1996; Buizza 1997) at the European Centre for Medium-Range Weather Forecasts (ECMWF), and the perturbed-observation approach (Houtekamer et al. 1996) at the Canadian Meteorological Centre (CMC). The BV and SV methods have been widely used in predictability and probabilistic prediction of weather and climate (Cai et al. 2003; Yang et al. 2009; Cheng et al. 2010; Buizza 2010). Recently, some methods based on the ensemble transform concept to generate the initial perturbations, such as the ensemble transform Kalman filter (ETKF), ensemble transform (ET), and ensemble transform with rescaling (ETR) methods, have been developed and attempted to improve operational forecasts (Bishop et al. 2001; Wang and Bishop 2003; Wei et al. 2006, 2008; Descamps and Talagrand 2007; Pu and Hacker 2009).

Of several operational ensemble generation schemes, the breeding method is the most time efficient (Wang and Bishop 2003). Toth and Kalnay (1993, 1997) introduced the breeding cycle as a procedure to simulate the behavior of growing modes in the analysis cycle (see Fig. 1). Their rationale was that in a data assimilation cycle, random errors introduced at each analysis stage would evolve into fast-growing directions in the atmospheric flow; that is, the ratio of fast-growing errors to the total error would significantly increase in the control analysis (Toth and Kalnay 1997). Based on these properties of analysis errors, the breeding method was proposed to simulate the analysis cycle and generate fast-growing error modes. In this process, random perturbations are superposed on the initial condition, evolve directly in a nonlinear model of dynamical flow, and are then scaled down to the same size as the initial perturbations at regular intervals. The final perturbations corresponding to each initial state are called BVs, which are added and subtracted from the control analysis to generate the ensemble samples. A more complete description is given in appendix A.

The BVs are a nonlinear extension of the local Lyapunov vectors (LLVs) (Toth and Kalnay 1997; Kalnay 2003). Through the breeding cycles, the BVs all generally turn toward the fastest-growing modes. Toth and Kalnay (1997) state that the BVs have strong similarities in some local regions but are quasi-orthogonal globally. However, the high correlation between the local structures of the BVs may degrade the global orthogonality to some extent. Wang and Bishop (2003) analyzed the orthogonal properties of BVs, concluding that “nearly all of the ensemble forecast variance is contained in a single direction for both the simple breeding and the masked breeding.” This result means that BVs may not span the fast-growing subspace efficiently. However, short-term error growth is a local phenomenon. The high correlation of BVs over local regions may limit forecast skill in those regions (Annan 2004, hereafter A04). Several methods have been recently proposed to prevent BVs from collapsing into one dominant vector. For example, A04 attempted to obtain the fastest-growing modes using orthogonal BVs. Primo et al. (2008) introduced a new logarithmic bred vector (LBV) to increase the diversity of the ensemble. Corazza et al. (2003) and Greybush et al. (2013) attempted to keep BVs less correlated by adding a small random perturbation to the perturbed run. These methods improve to some extent the performance of the diversity of the ensemble perturbations. However, they did not effectively eliminate the dependence among the initial perturbations.

Early investigators developed the theory of Lyapunov exponents [or the Lyapunov exponent spectrum (LEs)] to quantitatively estimate the expanding or contracting nature of different directions in the phase space of nonlinear dynamical systems (Benettin 1980; Wolf et al. 1985; Fraedrich 1987; Trevisan and Legnani 1995). The LEs are global properties of the attractor of the dynamical system. Furthermore, local or finite-time Lyapunov exponents have been proposed (Yoden and Nomura 1993; Ziehmann et al. 2000) to measure the short-term growth rate of small initial perturbations. However, these local or finite-time Lyapunov exponents, which are similar to the global Lyapunov exponents, are established based on the assumption that initial perturbations are infinitesimal and therefore can be controlled by linear error growth dynamics. To understand the nonlinear evolution of initial errors, the nonlinear local Lyapunov exponent (NLLE) was proposed to study the predictability of an -dimensional chaotic system or a single variable within the system (Li et al. 2006; Ding and Li 2007; Rui-Qiang et al. 2008; Ding et al. 2008, 2010; Li and Wang 2008; Li and Ding 2009, 2011, 2013). The NLLE depends on the initial state, initial error, and evolution time. For low-order chaotic systems, the error growth rate mainly depends on the fastest-growing direction. While for high-dimensional chaotic systems, such as the atmosphere, other fast-growing directions also make large contributions to the evolution of initial errors. Therefore, to further reveal the dynamical properties of the error evolution of complex nonlinear systems, the maximal NLLE was expanded in a high-dimensional situation and the NLLE spectrum (NLLEs) was proposed to investigate the nonlinear evolution behaviors of initial perturbations along different directions in phase space (Ding 2007; Li and Wang 2008). This group of orthogonal initial perturbations was defined as the nonlinear local Lyapunov vectors (NLLVs). The first NLLV is referred to as the leading NLLV (LNLLV). NLLVs are a set of orthogonal vectors spanning the fast-growing subspace and would be expected to be a quick and efficient ensemble generation scheme.

Traditional BVs are obtained through natural breeding cycles. However, natural breeding may not be efficient for two reasons. First, the BVs span a relatively small part of the subspace of fast-growing directions owing to their dependence of perturbations. Second, BVs naturally evolved from random perturbations may not be able to provide a stable indication of the fastest-growing direction after a finite breeding period. We attempted to use the comparison and orthogonalization methods to obtain NLLVs to overcome these two problems. The comparison method, which was used by A04, attempts to capture the fastest-growing direction by comparing the growth rate of orthogonal BVs; it is, therefore, more effective than a single breeding cycle. Here, we will use this comparison method for the acquisition of the LNLLV. The additional NLLVs can be obtained successively by using Gram–Schmidt reorthonormalization (GSR) (Wolf et al. 1985; Li and Wang 2008). A detailed description of the GSR and comparison methods is given in section 2. The first few NLLVs are obtained in this way by optimally selecting the fastest-growing modes from the initial perturbations; they are strictly orthogonal and thereby provide an appropriate orthogonal basis for the fast-growing subspace. The NLLV ensemble would be expected to outperform the BV ensemble. A comparison of the performances of the BV and NLLV methods in a perfect system is our primary aim here. Results using the Monte Carlo method are also included for reference.

The remainder of this paper is arranged as follows. Section 2 provides a brief description of the NLLE, NLLEs, and NLLVs, while section 3 introduces the idealized models and the initialization of the analysis states. A description of the BV and NLLV ensemble generation schemes is given in section 4, and the performance of the different ensemble methods are quantitatively compared in section 5. The discussion and our conclusions are presented in section 6.

## 2. Defining the nonlinear local Lyapunov vector

Consider a general *n*-dimensional nonlinear dynamical system whose evolution is governed by

where is the state vector at time , the superscript is the transpose, and represents the dynamics. The evolution of a small error vector , superimposed on a state **x**, is governed by the nonlinear equations:

where are the tangent linear terms and are the high-order nonlinear terms of the error . Integrating Eq. (2) from to , we obtain

where is defined as the nonlinear propagator (Ding and Li 2007), which differs from the linear propagator obtained from the tangent linear equation of Eq. (2) (Yoden and Nomura 1993; Ziehmann et al. 2000). In a chaotic system, after several regular rescalings, any initial perturbation started a short time before will converge to the fastest-growing direction , which is defined as the LNLLV.

Once has been acquired, the maximal NLLE can be computed from the rate of change of its norm using Eq. (3):

where depends on the initial state in phase space, the initial error vector , and the evolution time .

If LNLLV along the fastest-growing direction has been obtained, additional NLLVs [] along other directions can be successively obtained by the GSR (Li and Wang 2008). In GSR procedure, the direction of the first vector is never affected. The second vector is projected onto the subspace orthogonal to the first vector, and iteratively the *n*th vector is projected onto the subspace orthogonal to the previous *n* − 1 vectors. If are *m* perturbations through breeding procedure, the GSR would provide the following orthogonal set :

where (⋅, ⋅) signifies the inner product. In practice, are orthogonal to each other, representing the vectors along the directions from the fastest-growing direction to the fastest-shrinking direction. The first few of them will be utilized as ensemble initial perturbations. We may also determine the *i*th NLLE directly from the growth rate of vector :

The NLLEs depend on , and the sum of them determines the divergence or expansion of a phase-space volume element surrounding the initial states. Therefore, the first few NLLVs corresponding to the largest NLLEs properly span the unstable subspace of initial states. The LNLLV is derived through the breeding method; therefore, it presents some similarity to the BV. However, the largest difference between the NLLV method and the BV method is that besides the LNLLV other orthogonal fast-growing directions are also considered in the generation of ensemble perturbations. Because the first few NLLVs point to independent-growing directions, the perturbation growth of analysis error would be much more likely to be simulated.

To obtain the fastest-growing LNLLV more effectively than a single BV, we adapted the comparison method of A04 before orthogonalizing the perturbations (as illustrated in Fig. 2). This method includes two steps: a comparison of the growth rates of perturbations in each breeding cycle and a reordering of the perturbations by their growth rate from the largest to the smallest. The NLLVs bear some similarities to the orthogonal BVs as proposed by A04 in the use of the comparison method. However, some differences still exist between them. First, the orthogonal BVs in A04 are mainly used in the local region to increase local diversity, while the NLLVs are globally orthogonal and represent a nonlinear extension of the LLVs based on the theory of Lyapunov stability. Second, the generated ensemble perturbations in A04 are much less correlated than the BVs but are not strictly orthogonal. A04 reported that the mean correlation between any two ensemble perturbations is 0.17 (i.e., not zero) at the end of the integration time in the Lorenz96 model, implying that these perturbations are not strictly orthogonal. However, the NLLVs are strictly orthogonal (i.e., zero correlation) and so form an appropriate orthogonal basis for the fast-growing subspace.

## 3. Experimental setup

To determine the differences among these ensemble methods as clearly as possible, we made our comparison based on two ideal models, namely, the Lorenz63 (Lorenz 1963a) and Lorenz96 models (Lorenz 1996). The former can be regarded as a one-point model, while the latter can been seen as resembling a low-order real NWP model. We have taken special account of the generation of the control analyses to ensure that they have no partiality for any initialization methods. Anderson (1997) simply used random perturbations as analysis errors to compare dynamically constrained (i.e., BV and SV) and unconstrained perturbation methods, but these kinds of random perturbations in dynamical systems were recognized to be the nongrowing type on average, which would reduce initially (Toth and Kalnay 1993). Therefore, growing perturbations (i.e., BV and SV) may have no advantage in simulating such analysis errors. Houtekamer and Derome (1994) even used any of the BVs as an analysis error to investigate the improvement of the breeding method to control the forecast. Our experiments found that the higher correlation between the BVs and the analysis error give some unfair advantage to the breeding method over other ensemble methods (not shown). To perform a thorough comparison of the different initialization methods, following Descamps and Talagrand (2007), the control analyses in the BV, NLLV, and random perturbation methods were obtained using the ensemble Kalman filter (EnKF) data assimilation method (Evensen 2003, 2004). The detailed description of this assimilation method is shown in appendix B.

The BV and NLLV cycles are implemented in relation to the analysis cycle. In each analysis cycle, an analysis state is updated by combining the observations and forecast states from a numerical model, as shown in Fig. 2. The forecast error that evolves in time through the atmospheric flow will naturally project into the growing directions. Therefore, growing components will develop within the analysis cycle and dominate the analysis errors. The breeding and NLLV cycles are implemented to simulate the development of growing errors in the analysis cycle and to obtain fast-growing modes associated with the evolving state of the atmosphere. This allows the BVs and NLLVs to sample efficiently the probability of analysis error.

The first model is the Lorenz63 model (Lorenz 1963a):

where , , and , and for which the well-known butterfly attractor exists. We selected a fourth-order Runge–Kutta time integration scheme with a time step equal to 0.01 time units. The linear Lyapunov exponents of this simple model were calculated to be 0.906, 0, and −14.572, under the given parameters (Wolf et al. 1985). This means that there is only one fast-growing dimension on the attractor. Consequently, we performed a two-member experiment of ensemble prediction (*N* = 2, where *N* denotes the number of the ensemble member in a forecast example) for this model.

The Lorenz96 model is a 40-variable (*m* = 40, where *m* denotes the dimension of the state vector of the ensemble forecast model) model that has been used by various authors as a low-order proxy for atmospheric prediction and assimilation studies (e.g., Lorenz and Emanuel 1998; Anderson 2001). We denote the state variables ( = 1, 2, …, 40), which are governed by the equations

In the case *F* = 8, the model solves Eq. (8) also using a fourth-order Runge–Kutta scheme with a time step of 0.05 time units. This model can be seen as the time evolution of an arbitrary one-dimensional quantity on a constant latitude circle in which a time increment of 0.2 nondimensional units corresponds to about 1 day in terms of the error growth rate (Lorenz 1996).

Here, all experiments are perfect model experiments in which the output of a long-period integration will be regarded as references of the forecasts once the system has reached its attractor. The observations to be assimilated are obtained by adding a random noise to the true state. We assume that there are observations on every grid. If a vector **x** denotes the true state vector, and **y** is the observation vector, then

where is the observation operator that defines the projection of prognostic variables **x** to the space of observational data **y**. The added noise vector has a Gaussian distribution of for both the Lorenz63 and Lorenz96 models, where denotes the sample space of the Gaussian distribution, is the expectation, and is the standard deviation.

To make the experimental setup clearer, two tables listing the experiment setup of EnKF data assimilation and the breeding procedures are included (Tables 1 and 2, respectively). The statistical performances of the forecasts are compared, for each initialization scheme, by averaging over a large number of experiments. The number of experiments was *M* = 10 000 with different initial analysis states for the Lorenz63 model and *M* = 500 for the Lorenz96 model. To ensure that the different experiments were conducted independently, they had different initial states and different observations. For the Lorenz63 model observations were assimilated every 0.1 time units and the performing time was 0.4 time units in each case, while for the Lorenz96 model the assimilation cycle was 0.05 time units (6 h) and was repeated over 10 time units (50 days). Under the parameters as set here, the amplitudes of the initial analysis errors were within the bounds 5%–10% of the natural variability (about 14 for the Lorenz63 model and 2.8 for the Lorenz96 model; in the norm). The mean root-mean-square error (RMSE) of global analysis is about 0.98 for the Lorenz63 model and about 0.17 for the Lorenz96 model.

Once the analysis states are generated, the ensemble perturbations produced by the random perturbation, BV and NLLV schemes will be added and subtracted from them. The BVs and NLLVs are generated through breeding cycles with regular rescaling. For the Lorenz63 model, the length of a breeding cycle of the NLLV and BV methods is 0.1 time units and the process is repeated over 0.4 time units. For the Lorenz96 model, the length of a breeding cycle of both methods is 0.2 time units (1 day) and the breeding time is 2 time units (10 days). The orthogonalization process for the NLLVs is performed during each breeding cycle. As for the random perturbation method, the random noise is from a Gaussian distribution of . The ensemble perturbations of the three methods have the same size as that of the analysis errors for both models. The ensemble members were integrated for 4 time units for the Lorenz63 model to implement ensemble forecasts and 2 time units (10 days) for the Lorenz96 model. The initial states to start assimilation in each sample were drawn at an interval of 0.05 time units (0.2 time units) for the Lorenz63 model (the Lorenz96 model) after a first spinup of 1000 time units.

## 4. Nonlinear model results

### a. Perturbation growth rate

Before providing a detailed assessment of the quality of the ensemble predictions, the error growth of the different initial schemes will be investigated. For the Lorenz63 model, the growth rates of the vectors from the three schemes were compared by calculating the ratio of the final to initial error (in the norm) at the integrated interval (i.e., the time of a breeding cycle 0.1 time units after the initial state). The parameters used to generate the initial perturbations were those referred to above. However, we fixed the rescaling size to 1.0, which means that the final size of the error vector is simply the growth rate. Three random vectors are used to generate the LNLLV. All results presented here were averaged over 10 000 samples. Similar to the results from A04, the standard breeding method results in an average ratio of 1.26, while the LNLLV has an average ratio of 1.44. As for the random vector, it initially points to a nongrowing direction on average and thus has an average ratio of only 0.96.

The process for the Lorenz96 model was similar to that for the Lorenz63 model. The rescaling factor was 0.17 (the average RMSE of analysis error). After a breeding period of 2 time units (10 days), the first EOF explains 52% of the variance of the bred vectors, indicating that these modes have a certain degree of similarity in patterns that represent the unstable modes determined by the background flow (Corazza et al. 2003). A group of 10 vectors were used for the generation of the NLLVs and BVs. The comparison of growth rates from the different schemes is shown in Fig. 3. Here, the Lyapunov exponent was used to measure the perturbation growth.

If the size of the perturbation at time is expressed by , then the Lyapunov exponential growth can be calculated as

where denotes the initial time. Only the results of the first five NLLVs (LNLLV to NLLV_{5}) are shown here. Each performance is the average of 500 samples. The LNLLV appears to remain the fastest during the first 4 days. The NLLVs represent different directions ordered by their growth rates. Among the first five NLLVs, the growth rates of the first two both over take those of the BVs during the first 2 days, while the growth rates of the last three NLLVs are slightly smaller than that of the BV. These results indicate that the NLLVs span the fast-growing subspace well. After that period, with the growth of the perturbations, the growth rates of all perturbations slow and become very close. The random perturbations are initially characterized by negative growth rates; the growth becomes enhanced (superexponential) (Trevisan and Legnani 1995) after about 1 day. This may be related to the fast growth of the small-scale errors. After small-scale error saturation had occurred in a short time, the growth rate gradually slowed down and mainly depended on the error growth in the large scales (Lacarra and Talagrand 1988; Molteni and Palmer 1993). We chose three pairs of members (*N* = 6) to implement ensemble forecasts for the Lorenz96 model. The use of more ensemble members only limitedly improves the prediction skill for both the BV and NLLV methods.

### b. The Lorenz63 model

The RMSE and pattern anomaly correlation (PAC) offer good measures of overall forecast performances (Buizza et al. 2005). Figures 4a and 4b shows the mean error (in the norm) and the mean correlation (cosine of the angle of forecast state vector and true state vector), respectively, as a function of forecast lead time. The corresponding measurements of control forecasts are also shown for comparison. The ensemble averaging significantly improves the performance of the forecasts as a whole, although different ensemble methods, because of their distinct initialization schemes, have different forecast skills. On account of the analysis cycles that produce the initial states, growing errors have gradually accounted for a significant proportion of the analysis errors. Random perturbations, as nongrowing errors, simulate the probabilistic distribution of analysis errors in the suboptimal directions and thus have the worst ensemble performance. The LNLLV has a larger projection onto the fast-growing component of the analysis errors, and so behaves better than the traditional BV, especially for medium and long-range forecasts.

### c. The Lorenz96 model

The Lorenz96 model is a relatively complex model that resembles a real NWP model and can be used to assess different aspects of forecast performance. As, in the real world, uncertainties in the initial states are inevitable, we assumed that the control analysis is the most real initial state we could obtain. Here, all results were derived from the comparison with true references.

#### 1) Overall measures of ensemble performance

Similarly to the Lorenz63 model, the mean RMSE and PAC were also used here to assess overall performance (Fig. 5). Besides, the ensemble spread is added in Fig. 5a as a comparison with the RMSE of ensemble mean. The spread is defined as the standard deviation of the perturbed ensemble members. The three ensemble schemes evidently outperform the single control forecast in ensemble skill. The gain in predictability from running an ensemble is about 12–24 h on forecast days 7–10. For the first few days (about 4 days), ensemble averages are indistinguishable from the control forecasts. The initial interval can be considered to be a linear stage, during which positive and negative twin perturbations cancel each other out almost completely. However, when the errors gradually increase and enter the nonlinear stage, ensemble averaging plays a much more important role in nonlinear filtering to reduce error growth.

The ensemble skill of the BVs did not show an evident improvement over that of the random vectors. The reason for this may be that during the initial linear stage, the random vectors have already pointed to growing directions through the dynamical flow during the forecast stage, which also spans the subspace of growing errors. However, random vectors have a lower correlation with the analysis errors than bred vectors, and so perform slightly less well than bred vectors, especially in the PAC score. Compared with these two schemes, the ensemble mean of NLLVs is more skillful for 5–10-day forecasts. This can be explained by the fact that the NLLVs sample analysis errors in more mutually orthogonal directions than do the other two schemes. The probability distribution of the analysis errors are more likely to be well simulated by the orthogonal basis the NLLVs form.

The spread is a measure of ensemble reliability. For a statistically reliable ensemble system, ensemble spread should be close to the error of the ensemble mean (Buizza et al. 2005). Figure 5 shows that the growth of the RMSE exceeds that of the spread for all three methods. The use of the comparison method in generating the ensemble perturbations allows the NLLV method to exhibit the largest perturbation growth during the first 2 days; after this, the random method exhibits the largest spread overall, but the spatial distribution of its spread has the lowest correlation with that of the ensemble forecast error (Fig. 6). The ensemble spread of the NLLVs remained larger than that of the BVs from days 1 to 10. This was attributed to the NLLVs sampling the analysis errors in directions much less correlated than used in the breeding schemes.

To distinguish the respective effects of the comparison and the orthogonalization processes on ensemble forecasting, three experiments were implemented and their results were listed in Table 3. By comparing experiments 1 and 2, the only use of the orthogonalization procedure could reduce about 3%–4% of the RMSE of ensemble mean relative to the BV method. The further use of the comparison method as in experiment 3 could have another nearly 1% improvement for the ensemble-mean skill. These results indicate that the global orthgonality is more critical than the comparison method for improving ensemble forecasting in the Lorenz96 model. But the two methods may have different relative performance based on different models and different assimilation systems. This needs a further investigation in future research.

We also compared the forecast error for the BV and NLLV methods at day 7 with the same number of ensemble members (Fig. 6). It can be seen that the ensemble-mean skill for the BVs and NLLVs are very close when two ensemble members are used. However, with the increase of the number of ensemble members, the forecast error of the NLLV and BV schemes are both decreased, and the performance of NLLVs is more significant than the BVs. It indicates that the addition of the rest NLLVs have larger contributions to the improvement of prediction skill than the BVs. It can be attributed to the orthogonality of the NLLVs that better spanning the fast-growing subspace though the rest of them point to mild-growing modes.

#### 2) The validation scores

Reliability and resolution are considered to be two important attributes of ensemble predictions (Murphy 1973). Here, two scores are used to assess the performances of the ensemble forecasts in statistical reliability and resolution. One is the spread–skill correlation and the other is the classical Brier score.

The first measure of statistical reliability focuses on spread and skill correlations. The average error of the ensemble-mean forecasts from the three schemes is used as a reference field to ensure fair comparisons; that is, the forecast skill is defined as the average distance of the respective ensemble-mean forecasts from the true state. Regions with high forecast skill (small average error) regularly have a relatively small spread and vice versa (Buizza et al. 2005). The spread–error correlation over the whole field (40 grids) was computed as a measure of transient consistency. The day-to-day variations of this measure for the BV, random perturbations, and NLLV methods are shown in Fig. 7. As a comparison, the results of the linear LLVs are also given. Similar to the NLLVs, they are also orthogonal perturbations, but they are obtained using an integration of the tangent linear model (TLM) of Lorenz96.

The forecast error field at time 0 can be considered to be the analysis error amplitude. As the ensemble perturbations are designed to represent analysis errors, the ensemble spread should have a similar pattern to the analysis distance field. As is indicated in Fig. 7, the NLLV scheme has the highest correlation coefficient at the initial time, which means that this method performed best at simulating the initial analysis errors. Furthermore, this method maintains its advantage through the whole forecast period. Because the effects of nonlinear terms were introduced in the NLLVs, the spread of the NLLV ensemble better reflected the nonlinear error growth of the ensemble mean than did the LLVs from 1 to 10 days. It is worth noting that the initial coefficient of the Monte Carlo method approximates to zero, which implies that random perturbations are almost incapable of capturing the mode of the analysis errors. The BV method performs better than the Monte Carlo method. The initial perturbations of these four schemes evolve to become increasingly similar to the forecast errors under the driving effects of background flow. The correlations between the spread and skill of these four schemes spontaneously peaks at about day 5, which may be due to reaching the end of the linear stage and the increasingly strong nonlinear effects. At long forecast ranges, the correlation between spread and skill must decrease to zero (Whitaker and Loughe 1998).

The other score used was the classical Brier score, which measures both the reliability and resolution of the ensemble predictions. This score assesses the performance of the probability forecast of the occurrence of a binary event . In the experiments with the Lorenz96 model, the evaluation was made on each grid . The Brier score is computed for the event (the climatological mean to the distance of one standard deviation), which can be assumed to be normal for warm events. It means that if the value of falls between 2.3 and 5.9, the observed event occurs and vice versa. The final score was calculated as the average of 500 samples. Conventionally, the basic Brier score () can be decomposed into its reliability () and resolution () components (Murphy 1973; Stephenson et al. 2008). Further detail regarding this score is given in appendix B. In brief, and were negatively oriented, which means that smaller values indicate a better performance, while for the opposite was the case. The results are presented in Fig. 8.

As is shown in Fig. 8a, for the basic Brier score, the NLLVs performed better than either the BVs or random vectors, and the advantage of BVs over random vectors was very small. Figure 8b presents the evolution of the two components of BS as a function of time. Overall, the best performance was achieved by the NLLV ensemble, and the figure also indicates that the superiority of the NLLVs over the BVs comes mainly from their reliability. It may be explained that the stronger correlations and smaller ensemble diversity of BVs result in synergic deviation of ensemble forecast states from the observed state. So the BV ensemble may not describe the probability of real events efficiently. On the other hand, the superiority of the NLLVs over random vectors comes mainly from their resolution. This phenomenon may arise because random members sample the true state from a largely unconstrained distribution, which, over longer periods, approximately provides the mean probability of the event occurring. Overall, the relative performance can be ranked as NLLV > BV > Random.

## 5. Discussion and conclusions

The NWP models have extremely high degrees of freedom, which leads to huge difficulties in sampling all of the directions in the phase spaces. Therefore, one can argue that it is appropriate to sample the directions in a subspace that is relatively more important. One proposal is to employ a breeding cycle similar to the analysis cycle to generate perturbations largely projecting onto the growing components of the analysis errors (Toth and Kalnay 1997). This method dramatically improved the level of the ensemble prediction using finite members and has become a useful operational forecast tool. However, independent breeding cycles may cause a certain degree of global similarity of bred vectors, especially in local regions, where the stronger correlations are in fact repeated sampling, which degrades the effectiveness of the ensemble. In the present paper, the NLLV ensemble generation technique was proposed to sample the probability distribution of analysis errors using a set of independent and orthogonal directions. Generally, compared to the BVs, the globally orthogonality of the NLLVs can be helpful to reduce the dependence among perturbations and increase the ensemble spread. The comparison method could also be used to optimally select the fastest-growing perturbations. Therefore, the NLLVs performed better than the BVs in simulate the analysis errors in the fastest-growing subspace.

The search for the fastest-growing subspace is improved by using more initial perturbations in the NLLV scheme than in the BV scheme. This increases the computing time (to about 4 times that required by the BV scheme for the Lorenz96 model). However, the breeding cycles of the NLLVs are independent of each other before orthogonalization and thus can be implemented in parallel to save computational time. Therefore, the increased computation cost of the NLLVs in comparison with the BVs is mainly attributed to the comparison and orthogonalization procedures. Our experiment using the Lorenz96 model has a computational expense for generating the NLLV initial perturbations that is only about 8% greater than that of generating the BV initial perturbations.

For the low-order models discussed here, the NLLV method has shown a relatively better performance than those of the BV and the Monte Carlo methods. The Monte Carlo method generally needs many more samples because of its defects in the recognition of the specification of initial errors, which leads to a large computational cost. On the other hand, random sampling degrades its resolution for actual events (Fig. 8b). The significant advantage of BVs is the ease of implementation. However, in our experiments with the Lorenz96 model, the BVs, which have stronger global similarity and smaller ensemble spread, may not describe the probability of real events efficiently and thus have lower reliability.

Our experiments provide a preliminary demonstration of the GSR method as a quick and effective method for separating independent ensemble modes in the Lorenz 3- and 40-variable models. A high-dimensional numerical model will greatly increase the dimension of the ensemble perturbations and also bring certain difficulties in utilizing the GSR method. However, the dimension of the fast-growing subspace is much smaller than the model dimension. Orthogonal perturbations can capture this subspace to the greatest extent, and orthogonalization is only needed to generate the first few fastest-growing NLLVs, thus greatly saving computation time. The present computing power is sufficient to satisfy the requirements of the orthogonalization in high-order models. Therefore, the NLLV scheme could be expected to be quick and effective for generating ensemble perturbations in a high-dimension numerical model. Nevertheless, some other techniques mentioned in the introduction have been proposed to increase the diversity of traditional BVs. These schemes may be able to improve the prediction skill of the traditional BV method. It is worthwhile investigating the performance of NLLVs and comparing with these methods in more complex models.

## Acknowledgments

We wish to thank Dr. Zhina Jiang for many useful comments and assistance with this work. This research was jointly supported by the NSFC projects (41375110, 41175069) and the 973 projects of China (2012CB955200 and 2010CB950400).

### APPENDIX A

#### Ensemble Generation Scheme of Bred Vector

Initially, Toth and Kalnay (1993) used a global rescaling factor to generate each ensemble perturbation when using the BV method. In general, the value of the factor is the same size as the empirically determined global analysis RMSE (Wang and Bishop 2003). However, global rescaling is not optimal for reflecting the characteristics of the observations over local regions because it appears that regions where observations are limited usually have larger analysis errors, and larger regional perturbation amplitudes are therefore expected, and vice versa. Based on this relationship, Toth and Kalnay (1997) introduced the smooth regional rescaling factor.

Here, only simple breeding is considered using a global rescaling factor since all the variables in the Lorenz96 model are equivalent and the observations have the same level for each grid point. The procedure can be easily expressed by the equation

where and are the perturbation vectors of forecast and analysis, respectively, and represents the amplitude of the , while is the value of the rescaling factor. The detailed processes used to generate BVs are schematically described in Fig. 1 (Toth and Kalnay 1993, 1997). The initial analysis state (in this paper, the true state was used) was perturbed by adding a random vector of the same size as the analysis error. The model was then integrated from the corresponding perturbed states. At the end of each cycle, the perturbation was rescaled to the initial size and again superposed on the new reference to integrate. This breeding cycle is repeated to the initial state for forecasting. In this paper, the time of the breeding cycle of the Lorenz63 and Lorenz96 models was 0.1 and 0.2 time units (1 day), respectively.

### APPENDIX B

#### The Ensemble Kalman Filter

The generation of initial analysis states in these experiments was based on the algorithm described by Evensen (2003, 2004). For a given observation time, the ensemble of forecast states are obtained as a reference. The ensemble matrix is then defined as

Here, if is denoted as the mean of the ensemble, then an ensemble perturbation matrix can be expressed as

The variance covariance matrix is

The is used to update the Kalman gain , which is written as

Here, is actually a weighting according to the ratio of the forecast and observational error covariance; is the observational error covariance matrix. The observations are assimilated to produce a new analysis of the state:

The is a set of perturbed observations that corresponds to each previous forecast of the ensemble. They are defined as

The values are independent realizations of the noise and are equivalent to in section 3. Here, the average of all analysis states of the ensemble (i.e., ) is regarded as the initial analysis state when performing the forecasts. To avoid the problem of undersampling, a 7% variance inflation factor was applied to for both of the Lorenz models. The ensemble size of the EnKF assimilation for the Lorenz63 and Lorenz96 models is much larger than the model dimension that has been efficient to estimate the covariance matrix of the background error. Therefore, the localization method for the EnKF system of both models was not used.

### APPENDIX C

#### Brier Score Decomposition

The algorithm of the Brier score used here is based on Stephenson et al. (2008). Assume that the unit probability interval is divided to bins. The total forecasts whose number is are placed in corresponding bins by their forecast probability. Denote the probability forecasts that have fallen in the *k*th bin, then . In the *k*th bin, the probability of forecasts and observations read and (0 or 1), , respectively. The basic Brier Score can be written

The Brier score can be decomposed into its reliability, resolution, and the uncertainty components:

where is the average forecast probability and is the relative frequency that the observed event occurred in the *k*th bin; is the mean probability for the event to occur. They are calculated using the equations

It is , , and that are used in the present paper. The first two are negatively oriented and the third is positively oriented.

## REFERENCES

*Nonlinear Error Dynamics and Predictability Study*(in Chinese). Chinese Academy of Sciences, 192 pp.

**40A,**81–95, doi:.

*Sci. China,*

**40D,**215–224.

*Review and Prospects of the Developments of Atmosphere Sciences in Early 21st Century,*China Meteorology Press, 96–104.

*Trans. New York Acad. Sci. Ser. II,*

**25,**409–433.

*Proc. Seminar on Predictability,*Vol. I, Reading, United Kingdom, ECMWF,

*Wea. Forecasting,*

**23,**752–757, doi:.