## 1. Introduction

The singular vectors of a chemical transport model are the directions of fastest perturbation growth over a finite time interval. Singular vector analysis was introduced in meteorology by Lorenz (1965), who computed the largest error growth rates in an idealized model of the atmosphere. The adjoint technique was used by Molteni and Palmer (1993) and Mureau et al. (1993) to compute singular vectors for meteorological models. Singular vector analysis of general circulation models with millions of variables is now possible (see e.g., Buizza and Palmer 1995; Li et al. 2005).

Applications of singular vector analysis in numerical weather prediction include the following: 1) normal mode analysis of atmospheric flow instability, estimation of error growth, and the assessment of atmospheric predictability (Borges and Hartmann 1992; Ehrendorfer and Tribbia 1997; Molteni and Palmer 1993; Oortwijn 1998); 2) initialization of ensemble forecasts (Molteni et al. 1996; Mureau et al. 1993); and 3) estimation of the optimal placement of adaptive observations (Buizza and Montani 1999; Gelaro et al. 1998; Lorenz and Emanuel 1998; Palmer et al. 1998).

Numerous studies have shown that the structure of singular vectors in atmospheric general circulation models is determined by the following: 1) the atmospheric episode under consideration (Farrell 1988; Barkmeijer et al. 2001); 2) model physics (e.g., the treatment of boundary layer processes; Barkmeijer et al. 2001; Buizza and Montani 1999; Buizza and Palmer 1995; Ehrendorfer et al. 1999; Mahfouf 1999); 3) model resolution (Buizza and Montani 1999; Buizza and Palmer 1995; Ehrendorfer et al. 1999; Mahfouf 1999); and 4) the particular choice of error norms (Kuang 2004).

The objective of this work is to study the singular vectors for atmospheric chemical transport models. The distinguishing feature of these models is the presence of chemical interactions between tracer species, which leads to a very stiff, nonlinear system of partial differential equations. This poses nontrivial challenges in the computation of singular vectors, and allows interesting interpretations stemming from the complex interactions between emission sources, chemical transformations, transport, and deposition processes.

While in the study of atmospheric dynamics the dominant singular vectors are associated with unstable modes, in the study of chemical transport systems the dominant singular vectors are useful to describe the uncertainty in a limited subdomain (e.g., where the model prediction needs to be improved). In this paper we illustrate the use of singular vectors to describe uncertainties in the initial conditions. Other very important sources of uncertainty in air quality models that need to be quantified in real applications are the emissions, the meteorological fields, the deposition velocities, and the top and lateral boundary conditions for regional models.

The paper is organized as follows. In section 2 we introduce the chemical transport model singular vectors and discuss two of their possible applications in the context of air pollution modeling. Computational aspects of chemical singular vectors are discussed in section 3. An introduction to chemical transport modeling and the formulation of the tangent linear and adjoint models is presented in section 4. Several possible perturbation norms are discussed in section 5. Numerical results from a simulation of air pollution in East Asia are shown in section 6. Section 7 summarizes the main findings of this work.

## 2. Singular vectors and chemical transport models

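A chemical transport model propagates the initial state **x**(*t*_{0}) to the final state **x**(*t*_{F}). With 𝗠 denoting the tangent linear model operator that propagates perturbations from *t*_{0} to *t*_{F}, 𝗠* its adjoint, and 𝗤 the covariance of the model errors, the error covariance matrix evolves from 𝗣(*t*_{0}) to 𝗣(*t*_{F}) according to

$$
\mathsf{P}(t_F) = \mathsf{M}\,\mathsf{P}(t_0)\,\mathsf{M}^{*} + \mathsf{Q}. \tag{4}
$$

### a. Singular vectors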

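The perturbation magnitude at the initial time *t*_{0} is measured in the *L*^{2} norm defined by a symmetric positive definite matrix 𝗔, ‖δ**x**(*t*_{0})‖_{𝗔}^{2} = ⟨δ**x**(*t*_{0}), 𝗔 δ**x**(*t*_{0})⟩. Similarly, the perturbation magnitude at the final time *t*_{F} is measured in a seminorm defined by a semipositive definite matrix 𝗕, ‖δ**x**(*t*_{F})‖_{𝗕}^{2} = ⟨δ**x**(*t*_{F}), 𝗕 δ**x**(*t*_{F})⟩. The ratio between the perturbation magnitudes at *t*_{F} and *t*_{0} offers a measure of error growth:

$$
\sigma^{2} = \frac{\|\delta\mathbf{x}(t_F)\|_{\mathsf{B}}^{2}}{\|\delta\mathbf{x}(t_0)\|_{\mathsf{A}}^{2}} = \frac{\langle \delta\mathbf{x}(t_0),\, \mathsf{M}^{*}\mathsf{B}\mathsf{M}\,\delta\mathbf{x}(t_0)\rangle}{\langle \delta\mathbf{x}(t_0),\, \mathsf{A}\,\delta\mathbf{x}(t_0)\rangle}. \tag{7}
$$

The singular vectors are the directions **s**_{k}(*t*_{0}) that maximize the ratio *σ*^{2} in (7). These directions are the solutions of the generalized eigenvalue problem:

$$
\mathsf{M}^{*}\mathsf{B}\mathsf{M}\,\mathbf{s}_k(t_0) = \sigma_k^{2}\,\mathsf{A}\,\mathbf{s}_k(t_0). \tag{8}
$$

Using the square root of 𝗔, the vectors **v**_{k}(*t*_{0}) = 𝗔^{1/2}**s**_{k}(*t*_{0}) are the left singular vectors in the singular value decomposition of the scaled operator 𝗔^{−1/2}𝗠*𝗕^{1/2}.

The singular vectors **s**_{k} are 𝗔 orthogonal at *t*_{0} and 𝗕 orthogonal at *t*_{F}: ⟨**s**_{j}(*t*_{0}), 𝗔**s**_{k}(*t*_{0})⟩ = 0 and ⟨**s**_{j}(*t*_{F}), 𝗕**s**_{k}(*t*_{F})⟩ = 0 for *j* ≠ *k*. The singular value decomposition of the linear operator 𝗠_{t0→tF}, with the 𝗔 scalar product at *t*_{0} and the 𝗕 scalar product at *t*_{F}, has the left singular vectors **s**_{k}(*t*_{0}) and the right singular vectors **s**_{k}(*t*_{F}). If the same norms are used at the initial and at the final times the singular values *σ*_{k} can be interpreted as the error amplification factors along each direction **s**_{k}.

A special choice of norms is 𝗕 = 𝗜 and 𝗔 = 𝗣(*t*_{0})^{−1}. In this case the resulting singular vectors **s**_{k}(*t*_{0}) evolve into the leading eigenvectors **s**_{k}(*t*_{F}) of the forecast error covariance matrix 𝗣(*t*_{F}):

$$
\mathsf{P}(t_F)\,\mathbf{s}_k(t_F) = \sigma_k^{2}\,\mathbf{s}_k(t_F). \tag{12}
$$

The vectors **s**_{k} are called analysis error covariance singular vectors (Ehrendorfer and Tribbia 1997; Barkmeijer et al. 1999). Since the leading eigenvectors of 𝗣(*t*_{F}) are the directions of maximum variance of forecast error, the singular vectors define the directions along which we must do a good job of analysis in order to minimize the forecast error at *t*_{F}. We assume that the model error in (4) is negligible over the period [*t*_{0}, *t*_{F}]. From (12) it follows that the singular vectors are the solutions of the following generalized eigenvalue problem:

$$
\mathsf{M}^{*}\mathsf{M}\,\mathbf{s}_k(t_0) = \sigma_k^{2}\,\mathsf{P}(t_0)^{-1}\,\mathbf{s}_k(t_0). \tag{13}
$$

The Hessian of the cost function *J* in the variational analysis system is an estimate of the inverse of the analysis covariance matrix. This motivates the name Hessian singular vectors for the solutions **s**_{k}(*t*_{0}) of the eigenproblem in (13).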
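The generalized eigenvalue problem in (8) can be illustrated on a small dense example. The sketch below is only a toy: it assumes that 𝗠, 𝗔, and 𝗕 are available as explicit NumPy matrices, whereas the computations in this paper are matrix-free. The function name `singular_vectors` and the Cholesky-based reduction to a standard symmetric eigenproblem are illustrative choices, not the method used in the paper.

```python
import numpy as np

def singular_vectors(M, A, B, k):
    """Solve M^T B M s = sigma^2 A s for the k dominant pairs (dense toy sketch)."""
    L = np.linalg.cholesky(A)              # A = L L^T, A symmetric positive definite
    Linv = np.linalg.inv(L)
    # With v = L^T s the problem becomes (L^-1 M^T B M L^-T) v = sigma^2 v.
    S = Linv @ (M.T @ B @ M) @ Linv.T
    S = 0.5 * (S + S.T)                    # remove round-off asymmetry
    evals, V = np.linalg.eigh(S)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]    # keep the k largest
    sigma = np.sqrt(np.clip(evals[order], 0.0, None))
    s0 = Linv.T @ V[:, order]              # columns are s_k(t0), A-orthonormal
    return sigma, s0
```

Applying `M` to the returned columns gives the evolved vectors, whose squared 𝗕 norms equal the corresponding values of *σ*_{k}^{2}.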

In the study of atmospheric dynamics the dominant singular vectors are associated with unstable modes. In the study of chemical transport systems the dominant singular vectors are useful to describe the uncertainty in a limited subdomain (e.g., where the model prediction needs to be improved). Limited subdomain studies are also of interest in numerical weather prediction (Hersbach et al. 2003).

### b. Initialization of ensemble forecasts

Consider a set of observations **y** at *t*_{F} (assumed, for simplicity, to be a linear function of model state, **y** = 𝗛**x**). The extended Kalman filter uses the forecast state and its covariance {**x**(*t*_{F}), 𝗣(*t*_{F})} and the observations and their covariance {**y**, 𝗥} to produce an optimal ("analyzed") estimation of the model state and its covariance {**x**_{A}(*t*_{F}), 𝗣_{A}(*t*_{F})}:

$$
\begin{aligned}
\mathbf{x}_A(t_F) &= \mathbf{x}(t_F) + \mathsf{P}(t_F)\mathsf{H}^{T}\left[\mathsf{R} + \mathsf{H}\mathsf{P}(t_F)\mathsf{H}^{T}\right]^{-1}\left[\mathbf{y} - \mathsf{H}\mathbf{x}(t_F)\right],\\
\mathsf{P}_A(t_F) &= \mathsf{P}(t_F) - \mathsf{P}(t_F)\mathsf{H}^{T}\left[\mathsf{R} + \mathsf{H}\mathsf{P}(t_F)\mathsf{H}^{T}\right]^{-1}\mathsf{H}\mathsf{P}(t_F).
\end{aligned}
$$

The computational expense of the extended Kalman filter is extremely large because one has to invert the matrix 𝗥 + 𝗛𝗣(*t*_{F})𝗛^{T} and apply the tangent linear model to each column and the adjoint model to each row of the covariance matrix (Fisher 2001). The commonly used method to reduce the computational cost is to propagate (only) the projection of the covariance matrix onto a low-dimensional subspace (span {**s**_{1}, . . . , **s**_{k}}). The ensemble Kalman filter (Houtekamer and Mitchell 2001) uses a Monte Carlo approach to define this subspace and to approximate the time-evolving covariance matrix.

The subspace (i.e., the ensemble of perturbations at the analysis time *t*_{F}) should contain the directions **s**_{k}(*t*_{F}) along which the error has the maximal growth. Consequently the initial ensemble should be defined based on the singular vectors **s**_{k}(*t*_{0}). This approach is used at the European Centre for Medium-Range Weather Forecasts (ECMWF) to generate initial perturbations for ensemble forecasts (Buizza et al. 2000; Buizza and Palmer 1995; Hamill et al. 2003).
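The idea of propagating only a low-dimensional projection of the covariance can be sketched on a dense toy problem. The example assumes 𝗔 = 𝗣(*t*_{0})^{−1}, 𝗕 = 𝗜, and negligible model error, so that the evolved covariance is 𝗠𝗣(*t*_{0})𝗠^{T}. The function names, and the symmetric plus/minus pairing used to build ensemble members, are illustrative assumptions and do not describe the operational ECMWF procedure.

```python
import numpy as np

def evolved_covariance_lowrank(M, P0, k):
    """Rank-k approximation of M P0 M^T built from k singular vectors (A = P0^-1, B = I)."""
    L = np.linalg.cholesky(P0)             # P0 = L L^T
    U, sigma, Vt = np.linalg.svd(M @ L)    # M L = U diag(sigma) V^T
    s0 = L @ Vt[:k].T                      # s_k(t0), orthonormal in the P0^-1 inner product
    sF = M @ s0                            # evolved vectors, equal to sigma_k * u_k
    return sigma[:k], s0, sF @ sF.T

def initial_ensemble(x0, s0, amplitude=1.0):
    """Members x0 +/- amplitude * s_k(t0); the symmetric pairing is an assumption."""
    x0 = np.asarray(x0)[:, None]
    return np.concatenate([x0 + amplitude * s0, x0 - amplitude * s0], axis=1)
```

With `k` equal to the state dimension the low-rank matrix reproduces 𝗠𝗣(*t*_{0})𝗠^{T} exactly; with smaller `k` it retains the `k` directions of largest forecast error variance.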

### c. Targeted observations

Adaptive observations placed in well-chosen locations can reduce the initial condition uncertainties and decrease forecast errors. A number of methods were proposed to “target observations,” that is, to select areas where additional observations are expected to improve considerably the skill of a given forecast. Singular vectors identify sensitive regions of the atmospheric flow and can be used to optimally configure the observational network.

Singular vectors can identify the most sensitive regions of the atmosphere for targeted observations as long as the linearity assumption of error propagation holds (Hansen and Smith 2000). Majumdar et al. (2002) compare the singular vector approach for observation targeting with the ensemble transform Kalman filter. Palmer et al. (1998) argue that for predictability studies an appropriate metric is the perturbation energy. Daescu and Navon (2004) discuss the adaptive observation problem in the context of 4DVAR data assimilation. Leutbecher et al. (2002) and Leutbecher (2003) use Hessians to optimally place the adaptive observations.

## 3. Computation of singular vectors for chemical models

In this section we discuss the computational challenges associated with the chemical singular vectors and propose an approach for calculating them accurately. A numerical eigenvalue solver applied to (8) requires a symmetric matrix 𝗠*𝗕𝗠 in order to successfully employ Lanczos iterations, and to guarantee that the numerical eigenvalues are real. There are two approaches to computing adjoints: continuous and discrete. In the continuous approach the adjoint of the continuous differential equations is derived, then solved numerically. In the discrete approach the numerical solution is (considered to be) the forward model and its adjoint is constructed. The symmetry requirement imposes the use of the discrete adjoint 𝗠* of the tangent linear operator 𝗠 in (8). The computation of discrete adjoints for stiff systems is a nontrivial task (Sandu et al. 2003). In addition, computational errors (which can destroy symmetry) have to be small.

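The symmetry can be checked with two random perturbation vectors **u**(*t*_{0}) and **v**(*t*_{0}) that are propagated forward in time to **u**(*τ*) and **v**(*τ*) by the tangent linear model. The symmetry residual **r**(*τ*) is the difference between ⟨**u**(*τ*), 𝗠*𝗠**v**(*τ*)⟩ and ⟨**v**(*τ*), 𝗠*𝗠**u**(*τ*)⟩, where 𝗠 propagates perturbations from *τ* to *t*_{F}. If 𝗠* is exactly the discrete adjoint of 𝗠 then **r**(*τ*) = 0 for all *τ*. However, both 𝗠 and 𝗠* are evaluated numerically and in practice we expect the symmetry residual **r**(*τ*) to have small (but nonzero) values.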
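A symmetry check of this kind can be sketched as follows. The tangent linear and adjoint models are passed in as callables, here small dense matrices standing in for the real models. The name `symmetry_residual` and the perturbed-adjoint comparison in the usage note are illustrative assumptions.

```python
import numpy as np

def symmetry_residual(apply_tlm, apply_adj, u, v):
    """Return <u, M* M v> - <v, M* M u>; zero if apply_adj is the exact adjoint of apply_tlm."""
    return u @ apply_adj(apply_tlm(v)) - v @ apply_adj(apply_tlm(u))
```

For a matrix `M`, calling the function with `lambda x: M @ x` and `lambda x: M.T @ x` gives a residual at round-off level. Replacing the adjoint with a slightly different matrix mimics an inexact adjoint and gives a visibly larger residual.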

To illustrate possible problems with losing the symmetry we consider the SAPRC-99 atmospheric gas-phase reaction mechanism (Carter 2000), which has 93 species and 235 reactions. The forward, tangent linear, and adjoint models are implemented using the Kinetic Preprocessor (KPP), an automatic code generator (Damian et al. 2002; Daescu et al. 2003; Sandu et al. 2003). Several numerical experiments revealed that the magnitude of the symmetry residual depends on the choice of numerical integrator. Among the Rosenbrock integrators available in KPP we selected Rodas4 (Sandu et al. 2003), which performs best with respect to symmetry. The variation of **r**(*τ*) with time is shown in Fig. 1a (solid line). Surprisingly, the symmetry is lost during the stiff transient at the beginning of the integration interval, where the symmetry residual jumps from 10^{−16} to 10^{−2}.

To understand this behavior we consider a singular perturbation model problem that describes the evolution of the slow component *y* and of the fast component *z*. This model problem is widely used in the theoretical study of the behavior of stiff systems and of stiff numerical methods (Hairer and Wanner 2004).

For *ϵ* → 0, the perturbation vectors in (18) are of the form **x** = [Δ_{y}, Δ_{z}]^{T}, which do not satisfy (19). These vectors are the initial conditions for the tangent linear model and are propagated forward, then backward through the adjoint model, in order to evaluate the matrix-vector products 𝗠*𝗕𝗠**x**. Strong, artificial transients appear in the tangent linear model because the initial perturbations are away from the slow manifold described by (19).

Numerical tests revealed that a small number of projection steps are sufficient in practice to substantially enhance symmetry. Figure 1b presents the evolution of the symmetry residual with the number of projection steps. The symmetry is markedly improved after only two projection steps.

Figure 1a (dashed lines) presents the evolution of the symmetry residual when six projection steps are performed with the very small step size of 10^{−9} s. The symmetry error during the stiff transient is reduced to 10^{−11}. Note that the projection time step is of the order of the fastest scales in the system, and is much smaller than the numerical integration time step. We next extend these results to 3D chemical transport models.
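The effect of a few very small projection steps can be illustrated on a linear toy problem. Everything in this sketch is an assumption made for illustration: the model *y*′ = −*y*, *ϵz*′ = *y* − *z* (whose linearized slow manifold is Δ*z* = Δ*y*), the implicit Euler scheme, and the parameter values. It is not the SAPRC-99 mechanism, the Rodas4 integrator, or the projection operator used in the paper.

```python
def projection_steps(dy, dz, eps, h, nsteps):
    """Damp the fast transient of the toy tangent linear model with tiny implicit Euler steps.

    Toy model (an assumption): y' = -y, eps * z' = y - z, so perturbations on the
    slow manifold satisfy dz = dy.
    """
    a = h / eps
    for _ in range(nsteps):
        dy = dy / (1.0 + h)               # implicit Euler for dy' = -dy
        dz = (dz + a * dy) / (1.0 + a)    # implicit Euler for eps * dz' = dy - dz
    return dy, dz

def manifold_residual(dy, dz):
    """Distance of a perturbation from the linearized slow manifold dz = dy."""
    return abs(dz - dy)
```

Starting from Δ*y* = 1, Δ*z* = −1 with `eps=1e-10` and `h=1e-9`, each step shrinks the distance to the manifold by roughly a factor of 1 + *h*/*ϵ* while leaving the slow component essentially unchanged.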

## 4. 3D chemical transport models

Chemical transport models solve the mass-balance equations for concentrations of trace species in order to determine the fate of pollutants in the atmosphere. In this section we briefly describe the governing mass-balance equations and the tangent linear and adjoint models; a detailed discussion is given in Sandu et al. (2005). Tangent linear and adjoint models of transport are discussed in Daley (1995) and Vukicevic and Hess (2000).

Let *c*_{i} be the mole-fraction concentration of chemical species *i*, *Q*_{i} the rate of surface emissions, *E*_{i} the rate of elevated emissions, **V**^{dep}_{i} the deposition velocity, and *f*_{i} the rate of chemical transformations. Furthermore, the inflow, outflow, and ground boundaries of the computational domain are denoted by Γ^{in}, Γ^{out}, and Γ^{ground}, respectively; **u** is the wind field vector; *K* is the turbulent diffusivity tensor; and *ρ* is the air density. The evolution of *c*_{i} is described by the mass-balance equations (21), which we refer to as the *forward model*.
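For orientation, a standard advection–diffusion–reaction form that is consistent with the symbols defined above can be written as follows. The exact terms and boundary conditions are an assumption here and should be checked against the formulation in Sandu et al. (2005); the symbols *c*_{i}^{0} (initial concentration) and *c*_{i}^{in} (inflow concentration) are introduced only for this sketch.

$$
\begin{aligned}
\frac{\partial c_i}{\partial t} &= -\mathbf{u}\cdot\nabla c_i + \frac{1}{\rho}\nabla\cdot\left(\rho K \nabla c_i\right) + \frac{1}{\rho} f_i(\rho c) + E_i, \qquad t_0 \le t \le t_F,\\
c_i(t_0, x) &= c_i^{0}(x),\\
c_i(t, x) &= c_i^{\mathrm{in}}(t, x) \quad \text{on } \Gamma^{\mathrm{in}},\\
K \frac{\partial c_i}{\partial n} &= 0 \quad \text{on } \Gamma^{\mathrm{out}},\\
K \frac{\partial c_i}{\partial n} &= V_i^{\mathrm{dep}} c_i - Q_i \quad \text{on } \Gamma^{\mathrm{ground}}.
\end{aligned}
$$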

A perturbation Δ*c*^{0} of the initial conditions will result in perturbations Δ*c*(*t*) of the concentration field at later times. The evolution of these perturbations is governed by the *tangent linear model* (22) associated with the forward model (21). Here 𝗙 = ∂*f*/∂*c* denotes the Jacobian of the chemical rate function *f*, and **F**_{i,*} is its *i*th row.

The *continuous adjoint model* associated with the forward model (21) [or, more exactly, the adjoint of the tangent linear model (22)] describes the evolution of the adjoint variables *λ*_{i}. In formulating the adjoint we assume that **u** · **n** = 0 at ground level. The forcing function *ϕ*_{i} depends on the particular cost functional under consideration (Sandu et al. 2005).

The *tangent linear model* of the discrete forward model (24) is constructed from the tangent linear transport 𝗧 and chemistry 𝗖 operators. As explained in section 3, a projection onto the chemical slow manifold Π is applied before each linearized chemistry step.

The *discrete adjoint model* is based on the discrete adjoints of the transport 𝗧* and chemistry 𝗖* numerical schemes. A chemical adjoint projection Π* is applied after each adjoint chemistry step.
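The relationship between a split tangent linear step and its discrete adjoint can be sketched with small dense matrices standing in for the operators. The ordering used here (transport, then projection, then chemistry) is an assumption chosen only to respect "projection before chemistry" and "adjoint projection after adjoint chemistry"; the actual splitting sequence in the model may differ, and the function names are hypothetical.

```python
import numpy as np

def tlm_step(T, C, Pi, dx):
    """One split tangent linear step: transport, projection, then linearized chemistry."""
    return C @ (Pi @ (T @ dx))

def adj_step(T, C, Pi, lam):
    """Discrete adjoint of tlm_step: adjoint chemistry, adjoint projection, adjoint transport."""
    return T.T @ (Pi.T @ (C.T @ lam))
```

Because `adj_step` applies the transposed operators in reverse order, the dot-product identity ⟨𝗠**u**, **v**⟩ = ⟨**u**, 𝗠***v**⟩ holds to round-off, and the composite operator 𝗠*𝗠 is symmetric.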

## 5. Error norms

In numerical weather prediction models, variables have different physical units (wind velocity, temperature, air density, etc.). The energy norms provide a unified measure for the magnitude of perturbations in variables of different dimensions.

In chemical transport models the *L*^{2} norms will provide a reasonable measure of the magnitude of the perturbation. Here *c*^{s}_{i,j,k} denotes the concentration of chemical species *s* at the grid point (*i, j, k*) in the discrete model, and Δ*c*^{s}_{i,j,k} is its perturbation.

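One is often interested in the directions of maximal *relative error* growth [i.e., the directions that maximize the ratio in (29), in which each perturbation Δ*c*^{s}_{i,j,k} is divided by the concentration *c*^{s}_{i,j,k}]. This corresponds to using the logarithms of the concentrations *c*^{s}_{i,j,k} as model variables. In practice it is advantageous to approximate the relative errors by the absolute errors Δ*c*^{s}_{i,j,k} scaled by "typical" concentration values *w*^{s}_{i,j,k}, that is, by Δ*c*^{s}_{i,j,k}/*w*^{s}_{i,j,k}. The weights *w*^{s}_{i,j,k} can be chosen to be bounded away from zero. More importantly, having the weights independent of the system state *c* keeps the maximization problem in (29) equivalent to a generalized eigenvalue problem.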
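The scaled norm and a limited-region seminorm can both be represented by diagonal matrices, as in the following sketch. The array layout (*N*_{x}, *N*_{y}, *N*_{z}, *N*_{spec}), the `floor` parameter that keeps the weights away from zero, and all function names are assumptions made for illustration.

```python
import numpy as np

def scaled_norm_weights(w, floor):
    """Diagonal of A for the scaled norm sum((dc / w)^2), with weights bounded below by floor."""
    return 1.0 / np.maximum(w, floor) ** 2

def region_seminorm_diag(shape, ix, iy, iz, species):
    """Diagonal of B: 1 for the chosen species inside the optimization region, 0 elsewhere."""
    b = np.zeros(shape)
    b[ix, iy, iz, species] = 1.0
    return b.ravel()

def growth_ratio(dc0, dcF, a_diag, b_diag):
    """Squared B seminorm of the final perturbation over squared A norm of the initial one."""
    return np.sum(b_diag * dcF**2) / np.sum(a_diag * dc0**2)
```

Because both diagonals are fixed before the eigenvalue computation and do not depend on the evolving state, maximizing `growth_ratio` over initial perturbations remains a generalized eigenvalue problem.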

## 6. Numerical results

The numerical tests use the state-of-the-art regional atmospheric chemical transport model (STEM; Carmichael et al. 2003). The simulation covers a region of 7200 km × 4800 km in East Asia and the simulated conditions correspond to March 2001. More details about the forward model simulation conditions and comparison with observations are available in Carmichael et al. (2003).

The computational grid has *N*_{x} × *N*_{y} × *N*_{z} nodes with *N*_{x} = 30, *N*_{y} = 20, *N*_{z} = 18, and a horizontal resolution of 240 km × 240 km. The chemical mechanism is SAPRC-99 (Carter 2000), which considers the gas-phase atmospheric reactions of volatile organic compounds and nitrogen oxides in urban and regional settings. The meteorological fields have been computed using the Regional Atmospheric Modeling System (more information available online at http://rams.atmos.colostate.edu/), and analyzed offline (data assimilation of the meteorological observations has been performed before the chemical transport simulations). The initial and boundary conditions have been obtained from a long run of the model before the start time of the current computations; details can be found in Carmichael et al. (2003). While the simulations described in Carmichael et al. (2003) use a grid resolution of 80 km × 80 km, in the current paper we use a coarser grid in order to reduce the CPU time needed by the singular vector calculations.

The adjoint of the comprehensive model STEM is discussed in detail in Sandu et al. (2005). Both the forward and adjoint chemical models are implemented using KPP (Damian et al. 2002; Daescu et al. 2003; Sandu et al. 2003). The forward and adjoint models are parallelized using PAQMSG (Miehe et al. 2002). PARPACK (available online at http://www.caam.rice.edu/~kristyn/parpack_home.html) was used to solve the symmetric generalized eigenvalue problems.

The singular vectors **s**(*N*_{x}, *N*_{y}, *N*_{z}, *N*_{spec}) in (8) are represented by four-dimensional arrays. To visualize them we separately consider the vector sections corresponding to different chemical species. Further, each three-dimensional section is reduced to a two-dimensional "top" view by adding the values in each vertical column, or to a two-dimensional "south" view by adding the values in each north–south column.
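The reduction to "top" and "south" views amounts to summing a species section along one axis. The sketch below assumes that the flattened vector is stored in C order with axes (*x*, *y*, *z*, species) and that *y* is the north–south direction; the function name `section_views` is hypothetical.

```python
import numpy as np

def section_views(s, shape, species):
    """Return the (top, south) 2D views of one species section of a singular vector."""
    field = np.asarray(s).reshape(shape)[..., species]   # 3D section, axes (x, y, z)
    top = field.sum(axis=2)      # add the values in each vertical column: (Nx, Ny)
    south = field.sum(axis=1)    # add the values in each north-south column: (Nx, Nz)
    return top, south
```

For the grid used here, `shape` would be `(30, 20, 18, n_spec)`, giving a 30 × 20 top view and a 30 × 18 south view for each species.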

Numerical results for different optimization regions, optimization species, simulation intervals, meteorological data, and error norms are presented next.

### a. Singular vectors for different simulation intervals

We first consider the case where the optimization criterion is the ground-level ozone concentration in a 720 km × 960 km area covering Korea. The singular vector analysis presented next will help answer the following questions:

In which areas will small changes in the initial conditions grow fastest to impact the ozone levels over Korea after 12, 24, and 48 h and what is their rate of growth?

How should the initial perturbations be constructed for ensemble simulations in order to properly describe the uncertainty in ground ozone predictions over Korea after 12, 24, and 48 h?

Where are additional observations needed the most in order to improve 12-, 24-, and 48-h predictions of ground-level Korean ozone?

The meteorological conditions are an important factor in determining the singular vectors. The 2-km level wind fields for the simulation interval under consideration are shown in Fig. 3.

The top and south views for O_{3} sections of the dominant singular vectors for the 12-, 24-, and 48-h simulations are presented in Fig. 4. Singular vectors are localized near the optimization area in both the horizontal and the vertical directions. As expected, for longer simulation intervals the singular vectors spread farther away from the optimization region. The singular vectors are not confined to the lowest layers, but also show important regions located between 1 and 3 km. This is because of the transport processes, which exchange material from the surface into the free troposphere, and which bring free tropospheric air back to the surface. This has important implications for the design of measurement systems, since it shows that surface measurements alone are not sufficient for a correct representation of ground-level concentrations.

Several dominant singular vectors for the 12-, 24-, and 48-h simulations starting at 0000 UTC 1 March 2001 are shown in Figs. 4, 5 and 6 (O_{3} sections), Fig. 7 (NO_{2} sections), and Fig. 8 (HCHO sections). The NO_{2} and HCHO are important species involved in the photochemical production of ozone. They also are species that are directly emitted into the atmosphere as a result of combustion processes.

A close look at the structure of the dominant singular vectors reveals the following:

Since different singular vectors are orthogonal they contain different information about the areas of maximal error growth.

The eigenvectors evolve in time as the length of the simulation interval increases. They tend to expand farther away from the optimization area, illustrating that perturbations in a wider area at earlier times impact the optimization area.

The shapes and the magnitudes of the O_{3}, NO_{2}, and HCHO sections show subtle differences, illustrating the different influences that these species have on ground-level O_{3} after 12, 24, and 48 h.

Additional O_{3} measurements may have to be placed in a different location than additional NO_{2} or HCHO observations. Moreover, the optimal location of observations changes in time and drifts away from the optimization area for longer intervals. In conclusion, what is needed is a well thought out distribution of sites measuring many parameters simultaneously.

### b. Evolved singular vectors

The perturbations initialized along each dominant singular vector develop in time, becoming "evolved singular vectors." We are interested in the shape of these perturbations at the end of the 24-h simulation interval. The evolved singular vectors (scaled to have the A norm equal to 1) are displayed in Fig. 9. The largest values of the evolved singular vectors are clustered above the optimization region. This is expected since the singular vectors are constructed to optimize the final-time perturbation norm in the optimization region.

### c. The linearity assumption

Inherent in the singular vector calculation is the assumption that small perturbations propagate according to the tangent linear model dynamics (Tanguay et al. 1997). To assess the validity of this linearity assumption we perturb the initial state with scaled versions of each of the first five singular vectors. The scaling is chosen such that the ground-level ozone perturbations are ∼5%–10% of the reference ozone values. The perturbed initial state is propagated forward for 24 h using the full, nonlinear model. The perturbation at the final time is the difference between the perturbed and the reference final states. The B norms of the evolved perturbations are divided by the A norms of the initial perturbations. The results shown in Table 1 reveal that the perturbation magnitude ratios approximate well the singular values.

The structure of the nonlinearly evolved perturbations at the final time is shown in Fig. 10. The evolved perturbation structure is similar to that of the evolved singular vectors shown in Fig. 9. Since both the magnitude and the structure of the linearly evolved perturbations match those of the nonlinearly evolved perturbations we conclude that the linearity assumption holds (at least) for the 24-h simulation interval under consideration.

We next consider a random perturbation vector **r** with components drawn from a uniform distribution with amplitude ±10% of the initial concentrations, *r*^{s}_{i,j,k}(*t*_{0}) ∈ [0.9 *x*^{s}_{i,j,k}(*t*_{0}), 1.1 *x*^{s}_{i,j,k}(*t*_{0})]. The perturbation was evolved for 24 h and its B norm was computed. The perturbation components along each of the singular vectors, **r**_{i}(*t*_{0}) = ⟨**r**(*t*_{0}), 𝗔**s**_{i}(*t*_{0})⟩ **s**_{i}(*t*_{0}), evolve into **r**_{i}(*t*_{F}) = *σ*_{i} ⟨**r**(*t*_{0}), 𝗔**s**_{i}(*t*_{0})⟩ **s**_{i}(*t*_{F}), with **s**_{i}(*t*_{F}) scaled to unit B norm. The total squared B norm of the first *n* components of the perturbation is Σ^{n}_{i=1} *σ*^{2}_{i} ⟨**r**(*t*_{0}), 𝗔**s**_{i}(*t*_{0})⟩^{2}. The results in Table 1 (last row) show that the evolved perturbation components along the first 12 singular vectors account for virtually all the B norm of the random perturbation at the final time. This result confirms the fact that perturbation effects can be captured using only a small subspace of dominant singular vectors.
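The fraction of the final-time B norm captured by the leading singular vectors can be computed as in the following dense toy sketch. It assumes explicit matrices for 𝗠, 𝗔, and 𝗕 and a linear evolution of the perturbation; the function name and the Cholesky-based eigenvalue solve are illustrative choices, not the procedure used to produce Table 1.

```python
import numpy as np

def captured_b_norm_fraction(M, A, B, r0):
    """Cumulative fraction of ||M r0||_B^2 explained by the first n singular vectors."""
    L = np.linalg.cholesky(A)
    Linv = np.linalg.inv(L)
    S = Linv @ (M.T @ B @ M) @ Linv.T
    evals, V = np.linalg.eigh(0.5 * (S + S.T))
    order = np.argsort(evals)[::-1]            # dominant singular vectors first
    sigma2 = np.clip(evals[order], 0.0, None)  # squared singular values
    s0 = Linv.T @ V[:, order]                  # A-orthonormal s_i(t0)
    coeff = s0.T @ (A @ r0)                    # <r(t0), A s_i(t0)>
    rF = M @ r0
    total = rF @ B @ rF                        # squared B norm of the evolved perturbation
    return np.cumsum(sigma2 * coeff**2) / total
```

When 𝗕 selects a small optimization region it has low rank, so the returned fractions reach 1 after only as many singular vectors as there are selected variables.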

### d. Singular vectors versus adjoints

To illustrate the difference between the information conveyed by the singular vectors and by the adjoint variables we consider again the ground-level O_{3} in the Korea optimization area and focus on the 24-h simulation starting at 0000 UTC 1 March 2001. The cost function in the adjoint calculation is the sum of squared ground-level O_{3} concentrations in the optimization area. The adjoint variables are computed through a 24-h backward integration and are shown in Fig. 11.

To assess the relationship between the adjoint variable *λ*(*t*_{0}) and different singular vectors **s**_{k}(*t*_{0}) we consider the correlation coefficients *ρ*_{k} = ⟨*λ*(*t*_{0}), 𝗔**s**_{k}(*t*_{0})⟩/(‖*λ*(*t*_{0})‖_{A} ‖**s**_{k}(*t*_{0})‖_{A}). Specifically, we compute the correlations between individual (and homologous) sections of *λ*(*t*_{0}) and **s**_{k}(*t*_{0}). The results are shown in Fig. 12. For all sections the correlation of the adjoint and the first singular vector is the strongest. The O_{3} section of the adjoint in particular is very weakly correlated with the remaining singular vectors.
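The correlation coefficient in the 𝗔 inner product, evaluated section by section, can be sketched as follows. The sketch assumes a diagonal 𝗔 stored as a vector and a C-ordered (*N*_{x}, *N*_{y}, *N*_{z}, *N*_{spec}) layout; both function names are hypothetical.

```python
import numpy as np

def a_correlation(lam, s, a_diag):
    """Correlation <lam, A s> / (||lam||_A ||s||_A) for a diagonal A."""
    num = np.sum(lam * a_diag * s)
    return num / np.sqrt(np.sum(a_diag * lam**2) * np.sum(a_diag * s**2))

def section_correlations(lam, s, a_diag, shape):
    """One correlation coefficient per species section of the flattened vectors."""
    lam4, s4, a4 = (np.asarray(x).reshape(shape) for x in (lam, s, a_diag))
    return np.array([a_correlation(lam4[..., p], s4[..., p], a4[..., p])
                     for p in range(shape[-1])])
```

A value near ±1 for a given species indicates that the adjoint and the singular vector sections have nearly the same spatial structure, while a value near 0 indicates that they carry largely independent information.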

The comparison of adjoint variables with the singular vectors (Figs. 4–8) points to the following conclusions.

The adjoint has a structure similar to that of the first singular vector. The next dominant singular vectors (the second, the third, etc.) carry additional information about the areas where changes have a high impact on the optimization area. This additional information is not captured by the adjoint.

The adjoint covers a wider area following the flow pattern, while the singular vectors remain localized, even for longer simulation intervals.

### e. Influence of the meteorological conditions

To assess the influence of different meteorological conditions on the singular vectors we perform a 24-h simulation starting at 0000 UTC 26 March 2001, for the same optimization criterion (ground-level O_{3} in the Korea area). The 2-km level wind fields on 26 March are shown in Fig. 13. A comparison with the conditions present on 1 March (shown in Fig. 3) reveals that the meteorological conditions were considerably different during these two days.

The O_{3}, NO_{2}, and HCHO sections of the four dominant singular vectors are shown in Fig. 14. There are clear differences between the structure of the singular vectors on 26 March and on 1 March (Figs. 4–8). This shows the important role that meteorology plays in the distribution of pollutant concentrations.

### f. Influence of the optimization region

In the study of chemical transport systems the dominant singular vectors are useful to describe the uncertainty in a limited subdomain. We now analyze how the choice of the optimization region impacts the singular vectors.

The O_{3}, NO_{2}, and HCHO sections of the dominant eigenvector for another 24-h, 1 March simulation are shown in Fig. 15. The optimization criterion is ground-level O_{3} over a region of the same area, but located in southeast China (the gray area on the map). As expected, singular vectors are localized over the optimization area.

Another numerical test is performed for an optimization area that covers 24 grid cells over Japan, Korea, and southeast China. The magnitude of the largest eigenvalues decreases at a slower rate (as shown in Fig. 18). This is because the optimization area is larger than in the previous numerical experiments. About 30 eigenvalues are needed for a two orders of magnitude decrease in the magnitude of the eigenvalues; therefore about 30 singular vectors are needed to accurately capture the uncertainty. The O_{3}, NO_{2}, and HCHO sections of the dominant singular vectors are shown in Fig. 16. Singular vectors are localized over China and Korea; there are no lobes localized over Japan. Because of the westerly flows, changes in the initial concentration fields over Japan do not significantly impact ground O_{3} concentrations over the optimization region after 24 h.

To further show the influence of the optimization region we consider a very large area with over 100 cells covering parts of China, Korea, and Japan (the gray area on the map in Fig. 17). The magnitude of the largest 40 eigenvalues is shown in Fig. 18. The decrease of eigenvalue magnitude is slower for the larger regions, and therefore, more eigenvectors are needed to capture the uncertainty. The ratio of the largest to the smallest computed eigenvalues is *λ*_{1}/*λ*_{40} = 22 for the 100-cell region, compared with *λ*_{1}/*λ*_{40} = 472 for the 24-cell region. The O_{3}, NO_{2}, and HCHO sections of the dominant singular vectors are shown in Fig. 17. The dominant singular vectors are also localized to Korea and southeast China. Note that the first few dominant singular vectors are insufficient to accurately describe the uncertainty.

The singular vectors also depend on the choice of error norms at the initial (*A*) and final (*B*) times. Additional numerical tests (not shown here) have been performed using the ground-level concentrations of 66 long-lived species over Korea. The new singular vectors displayed clear differences from the ones computed based on ground-level ozone over Korea.

## 7. Conclusions

In this work we study the computational aspects of singular vector analysis of chemical transport models. Singular vectors span the directions of maximal error growth in a finite time, as measured by specific error norms.

To maintain the symmetry of the tangent linear–adjoint operator 𝗠*𝗠 it is necessary to employ discrete adjoints. A projection method is proposed to preserve the symmetry of 𝗠*𝗠 operators for stiff chemical systems. The application of this technique is extended to 3D chemical transport models. Different definitions of the perturbation error norms are discussed.

Numerical results are presented for a 3D chemical transport simulation of atmospheric pollution in East Asia in March 2001. The assumption of linear propagation of perturbations, intrinsic in the singular vector calculation, was checked numerically for a 24-h simulation interval. The singular values and the structure of the singular vectors depend on the length of the simulation interval, the meteorological data, the location of optimization region and the selection of optimized species, the choice of error norms, and the size of the optimization region.

The predictions of air quality models are corrupted by uncertainties coming from the initial distribution of the chemical fields, the boundary conditions, the rates of emission of pollutants, and the meteorological fields. In this paper we illustrate the use of singular vectors to quantify the propagation of uncertainties from the initial conditions; but the other sources need to be accounted for in a real data assimilation setting. The decrease of the singular values for longer simulation intervals is a result of the fact that, as time progresses, the final solution is driven more by emissions and less by the initial conditions. Consequently, the effect of uncertainties in emission sources on the final state becomes more important.

Most of the uncertainty in the optimization region at the final time is determined by the uncertainty along the dominant singular vectors at the initial time. The uncertainty (error) growth rates along each direction are given by the corresponding singular values. For limited optimization regions the singular values decrease rapidly, and a few dominant singular vectors are sufficient to capture most of the uncertainty. For large optimization regions the singular values decrease slowly. The areas of influence are no longer localized, and uncertainty from all over the computational domain contributes to the uncertainty in the optimization area at the final time. As a consequence for data assimilation, small ensembles are sufficient if the observations are localized, or if one seeks improved predictions over a relatively small, well-defined region.
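To make the contrast between rapidly and slowly decaying spectra concrete, the following sketch computes the cumulative fraction of final-time error variance carried by the leading singular vectors. It assumes uncorrelated, unit-variance initial errors, so that the variance along the k-th direction is λ_k = σ_k². The two spectra are hypothetical and are not taken from the experiments reported here.

```python
import numpy as np

def captured_fraction(singular_values):
    """Cumulative fraction of final-time error variance explained by the
    leading singular vectors, for uncorrelated unit-variance initial errors."""
    lam = np.sort(np.asarray(singular_values, dtype=float))[::-1] ** 2
    return np.cumsum(lam) / lam.sum()

# Hypothetical spectra (illustrative only).
fast_decay = 2.0 ** -np.arange(10)               # mimics a small optimization region
slow_decay = 1.0 / (1.0 + 0.1 * np.arange(10))   # mimics a large optimization region

frac_fast = captured_fraction(fast_decay)
frac_slow = captured_fraction(slow_decay)

# Number of leading singular vectors needed to reach 95% of the variance.
k_fast = int(np.searchsorted(frac_fast, 0.95) + 1)
k_slow = int(np.searchsorted(frac_slow, 0.95) + 1)
print(k_fast, k_slow)
```

For the rapidly decaying spectrum a handful of vectors reach the 95% threshold, while the slowly decaying spectrum requires nearly all of them, which is the reason small ensembles suffice only for localized targets.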

To improve predictions within the optimization region, additional observations are needed in the areas described by the dominant singular vectors. Additional O_{3} measurements have to be placed in different locations than additional NO_{2} or HCHO observations. The optimal location of observations changes in time and drifts away from the optimization area for longer intervals.

The dominant singular vector has a similar structure to the adjoint variable. The next singular vectors carry additional information about the high sensitivity areas, which is not captured by a simple adjoint analysis.

The computation of singular vectors is expensive. In our experiments 40–100 iterations were necessary for PARPACK to converge (taking between 8 and 16 h of CPU time for a parallel run on 30 Opteron processors). Each iteration includes one forward and tangent linear model run and one adjoint run. The cost of one forward and tangent linear model run (using a direct-decoupled approach and reusing the matrix factorizations) followed by one backward adjoint integration is less than 3 times the cost of the forward trajectory calculation (Sandu et al. 2005). The calculation of singular vectors is at least as expensive as a full 4DVAR data assimilation cycle, where 20–30 iterations are typically sufficient for a substantial decrease in the cost function (Sandu et al. 2005). Note that the cost of performing the chemical projections accounts for only a small percentage of the total computational time.
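The structure of one iteration (a tangent linear run followed by an adjoint run) can be sketched in matrix-free form. The example below uses plain power iteration on 𝗠*𝗠, which is a simpler substitute for the Arnoldi/Lanczos iteration implemented in PARPACK and converges more slowly. The operator is a small synthetic matrix with a prescribed spectrum, and the function name `dominant_singular_pair` is introduced here for illustration only.

```python
import numpy as np

def dominant_singular_pair(apply_tlm, apply_adj, n, iters=100, seed=0):
    """Power iteration on M* M using only operator applications."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = apply_adj(apply_tlm(v))   # one tangent linear run, then one adjoint run
        v = w / np.linalg.norm(w)
    sigma = np.linalg.norm(apply_tlm(v))  # growth factor along v
    return sigma, v

# Synthetic operator M = U diag(s) V^T with known singular values (assumption).
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((30, 30)))
V, _ = np.linalg.qr(rng.standard_normal((50, 50)))
s = 5.0 * 0.6 ** np.arange(30)
M = U @ np.diag(s) @ V[:, :30].T

sigma, v = dominant_singular_pair(lambda x: M @ x, lambda y: M.T @ y, n=50)
print(sigma)
```

In a chemical transport model the two lambdas would be replaced by full model integrations, so the iteration count dominates the cost. With the figures quoted above, 40–100 iterations at less than 3 forward-run equivalents each amount to fewer than roughly 120–300 forward-run equivalents.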

As the field of chemical weather forecasting grows, it can be anticipated that singular vectors will find many applications. The results presented in this paper are a first step in this direction.

## Acknowledgments

This work was supported by the National Science Foundation through Award NSF ITR AP&IM 0205198. Sandu’s work was also partially supported by the Award NSF CAREER ACI-0413872. We thank Virginia Tech’s Laboratory for Advanced Scientific Computing (LASCA) for the use of the Anantham cluster. The authors also thank the anonymous reviewers for their constructive comments, which helped improve this work.

## REFERENCES

Barkmeijer, J., M. van Gijzen, and F. Bouttier, 1998: Singular vectors and estimates of the analysis error covariance metric. *Quart. J. Roy. Meteor. Soc.*, **124**, 1695–1713.

Barkmeijer, J., R. Buizza, and T. N. Palmer, 1999: 3D-Var Hessian singular vectors and their potential use in the ECMWF Ensemble Prediction System. *Quart. J. Roy. Meteor. Soc.*, **125**, 2333–2351.

Barkmeijer, J., R. Buizza, T. N. Palmer, K. Puri, and J. Mahfouf, 2001: Tropical singular vectors computed with linearized diabatic physics. *Quart. J. Roy. Meteor. Soc.*, **127**, 685–708.

Borges, M., and D. Hartmann, 1992: Barotropic instability and optimal perturbations of observed non-zonal flow. *J. Atmos. Sci.*, **49**, 335–354.

Buizza, R., 1994: Localization of optimal perturbations using a projection operator. *Quart. J. Roy. Meteor. Soc.*, **120**, 1647–1681.

Buizza, R., and T. N. Palmer, 1995: The singular-vector structure of the atmospheric general circulation. *J. Atmos. Sci.*, **52**, 1434–1456.

Buizza, R., and A. Montani, 1999: Targeting observations using singular vectors. *J. Atmos. Sci.*, **56**, 2965–2985.

Buizza, R., J. Barkmeijer, T. Palmer, and D. Richardson, 2000: Current status and future developments of the ECMWF ensemble prediction system. *Meteor. Appl.*, **7**, 163–175.

Carmichael, G. R., and Coauthors, 2003: Regional-scale chemical transport modeling in support of the analysis of observations obtained during the TRACE-P experiment. *J. Geophys. Res.*, **108**, 10649–10671.

Carter, W., 2000: Implementation of the SAPRC-99 chemical mechanism into the Models-3 framework. Tech. Rep., U.S. Environmental Protection Agency, 215 pp.

Daescu, D., and I. M. Navon, 2004: Adaptive observations in the context of 4D-Var data assimilation. *Meteor. Atmos. Phys.*, **85** (4), 205–226.

Daescu, D., A. Sandu, and G. Carmichael, 2003: Direct and adjoint sensitivity analysis of chemical kinetic systems with KPP: II—Numerical validation and applications. *Atmos. Environ.*, **37**, 5097–5114.

Daley, R., 1991: *Atmospheric Data Analysis*. Cambridge University Press, 457 pp.

Daley, R., 1995: Estimating the wind field from chemical constituent observations: Experiments with a one-dimensional extended Kalman filter. *Mon. Wea. Rev.*, **123**, 181–198.

Damian, V., A. Sandu, M. Damian, F. Potra, and G. Carmichael, 2002: The kinetic preprocessor KPP—A software environment for solving chemical kinetics. *Comput. Chem. Eng.*, **26**, 1567–1579.

Ehrendorfer, M., and J. Tribbia, 1997: Optimal prediction of forecast error covariance through singular vectors. *J. Atmos. Sci.*, **54**, 286–313.

Ehrendorfer, M., R. Errico, and K. Raeder, 1999: Singular vector perturbation growth in a primitive equation model with moist physics. *J. Atmos. Sci.*, **56**, 1627–1648.

Farrell, B., 1988: Optimal excitation of neutral Rossby waves. *J. Atmos. Sci.*, **45**, 163–172.

Fisher, M., cited 2001: Assimilation techniques (5): Approximate Kalman filters and singular vectors. ECMWF Meteorological Training Course Lecture Notes. [Available online at http://www.ecmwf.int/newsevents/training/rcourse_notes.]

Gelaro, R., R. Buizza, T. Palmer, and E. Klinker, 1998: Sensitivity analysis of forecast errors and the construction of optimal perturbations using singular vectors. *J. Atmos. Sci.*, **55**, 1012–1037.

Hairer, E., and G. Wanner, 2004: *Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems*. 3d ed. Springer Series in Computational Mathematics, Vol. 14, Springer, 617 pp.

Hamill, T., C. Snyder, and J. Whitaker, 2003: Approximate analysis error covariance singular vectors in a simple GCM. *Abstracts, EGS–AGU–EUG Joint Assembly*, Nice, France, EGS–AGU–EUG, Abstract 13876.

Hansen, J., and A. Smith, 2000: The role of operational constraints in selecting supplementary observations. *J. Atmos. Sci.*, **57**, 2859–2871.

Hersbach, H., R. Mureau, J. D. Opsteegh, and J. Barkmeijer, 2003: Developments of a targeted ensemble prediction system. *Quart. J. Roy. Meteor. Soc.*, **129**, 2027–2048.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137.

Jazwinski, A., 1970: *Stochastic Processes and Filtering Theory*. Academic Press, 376 pp.

Kuang, Z., 2004: The norm dependence of singular vectors. *J. Atmos. Sci.*, **61**, 2943–2949.

Leutbecher, M., 2003: A reduced rank estimate of forecast error variance changes due to intermittent modifications of the observing network. *J. Atmos. Sci.*, **60**, 729–742.

Leutbecher, M., J. Barkmeijer, T. N. Palmer, and A. J. Thorpe, 2002: Potential improvement to forecasts of two severe storms using targeted observations. *Quart. J. Roy. Meteor. Soc.*, **128**, 1641–1670.

Li, Z., I. M. Navon, and M. Hussaini, 2005: Analysis of the singular vectors of the full-physics FSU global spectral model. *Tellus*, **57A**, 560–574.

Lorenz, E., 1965: A study of the predictability of a 28 variable atmospheric model. *Tellus*, **17**, 321–333.

Lorenz, E., and K. Emanuel, 1998: Optimal sites for supplementary observations: Simulation with a small model. *J. Atmos. Sci.*, **55**, 399–414.

Mahfouf, J., 1999: Influence of physical processes on the tangent-linear approximation. *Tellus*, **51**, 147–166.

Majumdar, S., C. Bishop, R. Buizza, and R. Gelaro, 2002: A comparison of ensemble transform Kalman filter targeting guidance with ECMWF and NRL total-energy singular vector guidance. *Quart. J. Roy. Meteor. Soc.*, **128**, 2527–2549.

Miehe, P., A. Sandu, G. Carmichael, Y. Tang, and D. Daescu, 2002: A communication library for the parallelization of air quality models on structured grids. *Atmos. Environ.*, **36**, 3917–3930.

Molteni, F., and T. Palmer, 1993: Predictability and finite-time instability of the northern winter circulation. *Quart. J. Roy. Meteor. Soc.*, **119**, 269–298.

Molteni, F., R. Buizza, T. Palmer, and T. Petroliagis, 1996: The new ECMWF ensemble prediction system: Methodology and validation. *Quart. J. Roy. Meteor. Soc.*, **122**, 73–119.

Mureau, R., F. Molteni, and T. Palmer, 1993: Ensemble prediction using dynamically-conditioned perturbations. *Quart. J. Roy. Meteor. Soc.*, **119**, 299–323.

Oortwijn, J., 1998: Predictability of the onset of blocking and strong zonal flow regimes. *J. Atmos. Sci.*, **55**, 973–994.

Palmer, T., R. Gelaro, J. Barkmeijer, and R. Buizza, 1998: Singular vectors, metrics, and adaptive observations. *J. Atmos. Sci.*, **55**, 633–653.

Sandu, A., D. Daescu, and G. Carmichael, 2003: Direct and adjoint sensitivity analysis of chemical kinetic systems with KPP: I—Theory and software tools. *Atmos. Environ.*, **37**, 5083–5096.

Sandu, A., D. Daescu, G. Carmichael, and T. Chai, 2005: Adjoint sensitivity analysis of regional air quality models. *J. Comput. Phys.*, **204**, 222–252.

Tanguay, M., S. Polavarapu, and P. Gauthier, 1997: Temporal accumulation of first-order linearization error for semi-Lagrangian passive advection. *Mon. Wea. Rev.*, **125**, 1296–1311.

Vukicevic, T., and P. G. Hess, 2000: Analysis of tropospheric transport in the Pacific basin using the adjoint technique. *J. Geophys. Res.*, **105**, 7213–7230.

The dominant eigenvalues (*λ*_{k} = *σ*^{2}_{k}) for the 12-, 24-, and 48-h simulations. The rapid decrease in magnitude indicates that uncertainty in the optimization region can be captured by only a few singular vectors.

Citation: Monthly Weather Review 134, 9; 10.1175/MWR3158.1

The 2-km level wind fields during the 12-, 24-, and 48-h simulations starting at 0000 UTC 1 Mar 2001. The optimization area is shaded.

Dominant singular vectors (SV; O_{3} sections) for the 12-, 24-, and 48-h simulations starting at 0000 UTC 1 Mar 2001. The optimized criterion is the ground-level O_{3} value in the shaded area.

Same as in Fig. 4, but for the second dominant SVs (O_{3} sections).

Same as in Fig. 4, but for the third and fourth dominant SVs (O_{3} sections).

Same as in Fig. 4, but for the first and second dominant SVs (NO_{2} sections).

Same as in Fig. 4, but for the first and second dominant SVs (HCHO sections).

The evolved dominant SVs after the 24-h simulation (i.e., at 0000 UTC 2 Mar 2001).

The evolved perturbations computed with the full nonlinear model match the evolved singular vectors well after 24 h.

Adjoints for the 24-h simulation starting at 0000 UTC 1 Mar 2001.

Correlations between homologous sections of the adjoint variable and of the dominant SVs.

The 2-km level wind fields during the simulation starting at 0000 UTC 26 Mar 2001. The optimization area is shaded.

Dominant SVs for the 24-h simulation starting at 0000 UTC 26 Mar 2001.

Dominant SVs for the 24-h simulation starting at 0000 UTC 1 Mar 2001. The optimized criterion is ground-level O_{3} over the gray region in southeast China.

Dominant SVs for the 24-h simulation. The optimization area is the 24-cell area covering Korea, Japan, and east China.

Dominant SVs for the 24-h simulation over a large optimization area.

The dominant eigenvalues for simulations on 24-cell and large optimization areas.

Singular values are well approximated by the ratios of perturbation energies for initial perturbations along the dominant SVs and also for random perturbations.