## 1. Introduction

A semi-implicit time-stepping scheme (Robert 1969) is commonly applied to the terms responsible for fast waves in large-scale weather prediction and general circulation models. This removes the associated time step restrictions and makes computation practical. More recently, semi-implicit integrators have also been applied at mesoscale resolutions (Tapp and White 1976; Cullen 1990; Tanguay et al. 1990; Golding 1992; Yeh et al. 2002; Grabowski and Smolarkiewicz 2002) and convective scales (Robert 1993; Grabowski and Smolarkiewicz 2002). The resulting elliptic partial differential equations are nontrivial. They are poorly conditioned, nonseparable, contain cross-derivative terms, and are typically nonsymmetric. This is due to domain anisotropy, effects of planetary rotation, ambient stratification, the use of general curvilinear coordinates in the governing equations [e.g., the Gal-Chen and Somerville (1975) terrain-following transformation], or the imposition of partial-slip conditions along an irregular lower boundary. Among the most effective methods reported for solving such problems are the preconditioned nonsymmetric conjugate-gradient-type (alias Krylov subspace, hereafter Krylov) iterative schemes (e.g., Kapitza and Eppel 1992; Skamarock et al. 1997; Thomas et al. 1998, 2000). A number of alternative nonsymmetric Krylov solvers are used in computational research and engineering (Axelsson 1994; Greenbaum 1997). Although this paper focuses on a particular scheme, the issues discussed are relevant to other iterative methods as well. Our method of choice is the restarted generalized conjugate residual [GCR(*k*)] algorithm^{1} akin to the popular generalized minimum residual [GMRES(*k*)] solver (Eisenstat et al. 1983; Saad and Schultz 1986; Smolarkiewicz and Margolin 1994, 1997; Smolarkiewicz et al. 
1997).^{2} For the reader's convenience, GCR(*k*) is summarized in appendix A, where we also introduce some terminology, notation, and a notion of preconditioning (left, as opposed to right) used throughout this paper. A brief discussion of line-relaxation preconditioners is included in appendix B.

Designing a suitable preconditioner is important, as it can dramatically accelerate solver convergence, thereby reducing the overall computational expense of a model. In principle, the preconditioner 𝒫 can be any linear operator such that ℒ𝒫^{−1} is definite, where ℒ denotes the elliptic operator of the problem at hand. The goal is to replace the original problem ℒ(*ψ*) − *Q* = 0 with an auxiliary problem 𝒫^{−1}[ℒ(*ψ*) − *Q*] = 0 that converges faster (than the original problem) because of a closer clustering of the eigenvalues of the auxiliary elliptic operator 𝒫^{−1}ℒ; here *ψ* and *Q* symbolize the dependent variable and the rhs, respectively. For the preconditioner to be useful, the convergence of the auxiliary problem must be sufficiently rapid to overcome the effort associated with “inverting” the preconditioner itself [i.e., computing 𝒫^{−1}(·)]. In general, the closer the preconditioner approximates the original operator, the faster the solver converges but the more difficult it is to compute 𝒫^{−1}(·).
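The trade-off described above can be illustrated with a minimal sketch of a restarted, preconditioned GCR(*k*) iteration. It is not the MC2 implementation: the 1D variable-coefficient Helmholtz operator, the constant-coefficient tridiagonal preconditioner, and all function names (`gcr`, `helmholtz_1d`, `tridiag_solve`) are assumptions chosen for illustration. The sketch compares iteration counts with no preconditioner, with an approximate preconditioner, and with a preconditioner equal to the operator itself.

```python
import math

def tridiag_matvec(lower, diag, upper, x):
    """y = T x; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(1, n):
        y[i] += lower[i] * x[i - 1]
    for i in range(n - 1):
        y[i] += upper[i] * x[i + 1]
    return y

def tridiag_solve(lower, diag, upper, rhs):
    """Thomas algorithm for T x = rhs (T assumed diagonally dominant)."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gcr(apply_A, b, apply_Pinv=None, k=4, tol=1e-8, max_iter=5000):
    """Restarted GCR(k) for A x = b, starting from x = 0; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)                          # residual b - A x
    target = tol * math.sqrt(dot(b, b))
    dirs = []                            # (p, A p) pairs since the last restart
    for it in range(1, max_iter + 1):
        z = apply_Pinv(r) if apply_Pinv else list(r)
        p, Ap = z, apply_A(z)
        for pj, Apj in dirs:             # orthogonalize A p against earlier A p_j
            coef = dot(Ap, Apj) / dot(Apj, Apj)
            p = [pi - coef * q for pi, q in zip(p, pj)]
            Ap = [ai - coef * q for ai, q in zip(Ap, Apj)]
        step = dot(r, Ap) / dot(Ap, Ap)  # minimizes ||r - step * A p||
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * ai for ri, ai in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= target:
            return x, it
        dirs.append((p, Ap))
        if len(dirs) == k:
            dirs = []                    # restart
    return x, max_iter

def helmholtz_1d(n, coeff, sigma):
    """Diagonals of -d/dx(c d/dx) + sigma, unit spacing, Dirichlet ends."""
    lower, diag, upper = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        cw, ce = coeff(i - 0.5), coeff(i + 0.5)
        lower[i] = -cw if i > 0 else 0.0
        upper[i] = -ce if i < n - 1 else 0.0
        diag[i] = cw + ce + sigma
    return lower, diag, upper

n, sigma = 64, 0.01
A = helmholtz_1d(n, lambda s: 1.0 + 0.3 * math.sin(2 * math.pi * s / n), sigma)
P = helmholtz_1d(n, lambda s: 1.0, sigma)   # constant-coefficient approximation of A
b = [math.sin(0.3 * i) + 0.1 * (i % 7) for i in range(n)]
apply_A = lambda v: tridiag_matvec(*A, v)

x_plain, it_plain = gcr(apply_A, b)
x_prec, it_prec = gcr(apply_A, b, apply_Pinv=lambda v: tridiag_solve(*P, v))
x_exact, it_exact = gcr(apply_A, b, apply_Pinv=lambda v: tridiag_solve(*A, v))
print(it_plain, it_prec, it_exact)
```

In this toy setting the preconditioner is cheap to invert, so the iteration count is the whole story; in a 3D model the cost of each application of the preconditioner has to be weighed against the iterations it saves.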

There is no general method for designing an optimal preconditioner (Axelsson 1994, section 7). Kadioglu and Mudrick (1992) argued that basic point-Jacobi relaxation provides an effective preconditioner for GMRES(*k*) applied to atmospheric flow problems. However, their conclusion is derived from solving the diagnostic quasigeostrophic omega equation in a periodic channel, a fairly simple elliptic problem that does not reflect the full complexity of a multiscale atmosphere. In particular, it does not extrapolate to prognostic problems when the shallowness of the earth's atmosphere dictates condition numbers *κ*(ℒ) ∼ *O*(10^{10}) for the elliptic operator ℒ; recall that *κ*(ℒ) ≡ ‖ℒ‖ ‖ℒ^{−1}‖. Because the asymptotic convergence rate of conjugate-gradient-type solvers scales as *κ*(ℒ)^{−1/2}, a direct preconditioner in the vertical is the “categorical imperative” of an effective iterative solver for all-scale atmospheric models (e.g., Marshall et al. 1997; Skamarock et al. 1997; Thomas et al. 1998; Smolarkiewicz et al. 2001). In this paper we take this preconditioning strategy in the vertical for granted and focus on the optimization of the horizontal component of the preconditioner 𝒫.
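The link between a shallow domain and a large condition number can be checked with a rough estimate. The sketch below is an illustration only: it assumes a separable constant-coefficient 3D Laplacian with homogeneous Neumann boundaries and uniform spacing in each direction, and it takes the ratio of the largest eigenvalue to the smallest nonzero one. The function names and the use of a uniform vertical spacing are assumptions; the grid dimensions are those quoted for the convective and synoptic experiments in section 4.

```python
import math

def neumann_eigs(n, d):
    """Magnitudes of the eigenvalues of the 1D second-difference operator
    with homogeneous Neumann ends on n cells of width d."""
    return [4.0 / d**2 * math.sin(math.pi * k / (2 * n)) ** 2 for k in range(n)]

def condition_estimate(nx, ny, nz, dx, dy, dz):
    """Largest eigenvalue over smallest nonzero eigenvalue of the 3D operator."""
    ex, ey, ez = neumann_eigs(nx, dx), neumann_eigs(ny, dy), neumann_eigs(nz, dz)
    lam_max = ex[-1] + ey[-1] + ez[-1]
    # Smallest nonzero mode: first mode in one direction, constant in the others.
    lam_min = min(ex[1], ey[1], ez[1])
    return lam_max / lam_min

# Isotropic convective-scale grid: 100 x 100 x 150 points, 10 m spacing.
kappa_convective = condition_estimate(100, 100, 150, 10.0, 10.0, 10.0)
# Synoptic grid: 200 x 160 x 19 points, 50 km horizontal spacing; the 72 m
# near-surface vertical spacing is applied uniformly here (a crude assumption).
kappa_synoptic = condition_estimate(200, 160, 19, 50e3, 50e3, 72.0)

print(f"convective: {kappa_convective:.2e}, synoptic: {kappa_synoptic:.2e}")
```

The anisotropic case is larger by several orders of magnitude because the largest eigenvalue is set by the fine vertical spacing while the smallest is set by the horizontal domain size.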

Guided by the development of elliptic solvers for early anelastic models, we extend a fast direct preconditioning strategy to three spatial dimensions via horizontal spectral decomposition. This offers an alternative to alternating-direction-implicit (ADI) iterative methods (Skamarock et al. 1997), of which the line Jacobi preconditioner is a special case (appendix B). The use of the FFT approach has a long history in the design of elliptic solvers. Because FFTs are effective for separable problems, early elliptic solvers for meteorological models often employed operator decomposition (into a separable part and the residual) and a *stationary* block-iteration with the nonseparable residual lagging behind (cf. Schumann and Volkert 1987 and Schumann and Sweet 1988, and references therein). Bernardet (1995) investigated an alternative approach (conceptually the opposite) where the full operator is retained in a *nonstationary* Krylov iteration, but FFTs are used to invert the “flat” preconditioner obtained by neglecting orography. He used the classical, symmetric, conjugate-gradient solver of Hestenes and Stiefel (1952), thereby necessitating careful modifications to the boundary conditions (in order to assure a self-adjoint elliptic operator). His development addressed 2D mesoscale problems in the *x*–*z* plane. Because of the inherent difficulties with extending this approach to 3D, Lafore et al. (1998) adopted an FFT-preconditioned Richardson iteration. Recently, Elman and O'Leary (1998) analyzed the performance of nonsymmetric Krylov solvers for indefinite 3D Helmholtz equations with Sommerfeld radiation boundary conditions and found that FFT methods can result in effective preconditioners even on parallel architectures. Because of the indefiniteness, they addressed a more difficult problem than those typically encountered in meteorological applications. 
On the other hand, their Helmholtz operator contained the standard Laplacian on isotropic grids and had constant coefficients. Both the coefficients and solutions were smooth.

Although there exists a substantial body of evidence indicating the potential of FFT preconditioners, a systematic study in the context of all-scale meteorological applications is needed. This is because modeling natural flows leads to elliptic problems characterized by highly inhomogeneous anisotropic coefficients and full spectra forced on the rhs. We complement earlier work with a comprehensive study based on a fully compressible nonhydrostatic all-scale atmospheric model (Thomas et al. 1998). A semi-implicit time discretization leads to a nonsymmetric, but definite, Helmholtz problem. In order to design an effective FFT preconditioner, we derive a separable constant-coefficient approximation to the semi-implicit Helmholtz equation by dropping cross-derivative terms and averaging metric coefficients across the domain. Neumann boundary conditions are specified to close the problem, and thus a discrete cosine transform (DCT) is required to diagonalize the discrete horizontal Laplacian component of the preconditioner.
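The statement that a cosine transform diagonalizes the Neumann Laplacian can be verified directly in one dimension. The sketch below assumes cell-centred unknowns with mirror (ghost-cell) boundaries, for which the eigenvectors are the DCT-II cosine modes; the exact staggering used in the model may differ, and the function names are illustrative.

```python
import math

def neumann_laplacian(u, d):
    """Second difference with homogeneous Neumann ends (cell-centred values)."""
    n = len(u)
    out = []
    for j in range(n):
        left = u[j - 1] if j > 0 else u[j]        # mirror ghost cell
        right = u[j + 1] if j < n - 1 else u[j]   # mirror ghost cell
        out.append((left - 2.0 * u[j] + right) / d**2)
    return out

def cosine_mode(k, n):
    """k-th DCT-II basis vector sampled at cell centres."""
    return [math.cos(math.pi * k * (j + 0.5) / n) for j in range(n)]

def eigenvalue(k, n, d):
    """Eigenvalue of the Neumann second difference for mode k."""
    return -4.0 / d**2 * math.sin(math.pi * k / (2 * n)) ** 2

n, d = 16, 0.5
worst = 0.0
for k in range(n):
    u = cosine_mode(k, n)
    Lu = neumann_laplacian(u, d)
    lam = eigenvalue(k, n, d)
    worst = max(worst, max(abs(a - lam * b) for a, b in zip(Lu, u)))
print("largest eigen-residual:", worst)
```

Because every cosine mode is an exact eigenvector, transforming a field into this basis turns the horizontal operator into a multiplication by known eigenvalues.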

Robustness is vital for the elliptic solver when a numerical model admits a broad range of scales of motion. We assess the relative efficiency of line relaxation versus DCT preconditioners in both linear and nonlinear flow regimes (prototypes for, respectively, laminar and turbulent flows) throughout the range *O*(10^{1})–*O*(10^{7}) m of spatial scales. Our study encompasses the simulation of shallow thermal convection, small and large mesoscale mountain flows, and synoptic-scale winter storm prediction. We measure the performance of the two preconditioners by comparing the amount of work (the iteration count and total CPU time of the model) required to achieve a specified convergence threshold. Because the residual error of the elliptic problem has a sense of Δ*t*-normalized first-order spatial partial derivatives of velocity, the convergence criterion is based on the magnitude of the local Courant number and its variations. This specifies both the absolute admissible error and its magnitude compared to the flow. In effect, both numerical and physical error metrics are employed (cf. Smolarkiewicz et al. 1997).

The paper is organized as follows: Section 2 summarizes the governing equations of the model along with their discretization and formulation of the elliptic problem. Section 3 presents the derivation of a spectral preconditioner. Section 4 discusses the results of our simulations. Remarks in section 5 conclude the paper.

## 2. Model description

### a. Analytic formulation

[…] **Ω**, it consists of mass, momentum, and entropy evolution laws, and the equation of state:

[equations missing in source]

where *D*/*Dt* = ∂/∂*t* + **v** · ∇ is the material-derivative operator, **v** the velocity vector, *p* thermodynamic pressure, *ρ* density, and **g** combines gravitational and centrifugal accelerations. Also, Θ = *T* exp(−*κq*) is the potential temperature, where *T* denotes the temperature, *κ* = *R*/*c*_{p} (with *R* and *c*_{p} being the gas constant and specific heat at constant pressure), and *q* = ln(*p*/*p*_{00}), with *p*_{00} = 100 kPa.^{3}

[…] *ρ* using the gas law, the governing evolution equations take the form

[equation missing in source]

where *B* = *gT*′/*T*_{0} denotes the buoyancy, and *f* = 2Ω sin *ϕ* is the Coriolis parameter at latitude *ϕ*. Also, *γ* = *c*_{p}/*c*_{υ} (with *c*_{υ} = *c*_{p} − *R* denoting the specific heat at constant volume), and *N*_{0} and *c*_{0} are, respectively, the Brunt–Väisälä frequency and speed of sound:

[equation missing in source]

[…] *ϕ*_{0} (cf. Tanguay et al. 1990). The resulting equations take the form

[equation (6) missing in source]

[…] (*X*, *Y*) conformal coordinate-based form of *D*/*Dt*, horizontal divergence *D*_{H}(*U*, *V*) of the wind image (*U*, *V*), pseudokinetic energy *K*, and metric coefficient *S* are all detailed in appendix C for the sake of completeness. To account for the irregularity of the lower boundary, the system in (6) is transformed to the terrain-following curvilinear framework of Gal-Chen and Somerville (1975). Optionally, variable spacing of vertical levels is allowed by composing the Gal-Chen transformation with a smooth stretching (Thomas et al. 1998). Rather than complicating (6) with details of the composition of these mappings, all metric terms associated with the vertical transformations are absorbed into the definitions of the partial derivatives (appendix C).

### b. Numerical approximations

The semi-Lagrangian approach employed for approximating (6) on a discrete mesh (Robert 1993) aims at second-order accuracy in space and time.^{4} A staggered Arakawa C grid (Arakawa and Lamb 1977) is employed in the horizontal, and a staggered Tokioka B grid (Tokioka 1978) in the vertical. The associated spatial partial-derivative discrete operators are detailed in appendix D.
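As a concrete picture of the horizontal staggering, the sketch below evaluates a divergence on an Arakawa C grid, where each velocity component lives on the cell faces normal to it and the divergence lands at cell centres. It assumes plain Cartesian coordinates with uniform spacing; the model's operators (appendix D) also carry map-scale and terrain metric factors that are omitted here, and the array layout is an assumption made for illustration.

```python
def c_grid_divergence(u, v, dx, dy):
    """Divergence at cell centres from face-normal velocities.

    u[i][j]: x-velocity on the west face of cell (i, j); shape (nx + 1, ny)
    v[i][j]: y-velocity on the south face of cell (i, j); shape (nx, ny + 1)
    """
    nx, ny = len(v), len(u[0])
    return [[(u[i + 1][j] - u[i][j]) / dx + (v[i][j + 1] - v[i][j]) / dy
             for j in range(ny)]
            for i in range(nx)]

# Nondivergent deformation flow u = x, v = -y sampled on the faces.
nx, ny, dx, dy = 4, 3, 0.5, 0.25
u = [[i * dx for j in range(ny)] for i in range(nx + 1)]
v = [[-j * dy for j in range(ny + 1)] for i in range(nx)]
div = c_grid_divergence(u, v, dx, dy)
print(max(abs(d) for row in div for d in row))
```

The two-point differences span exactly one cell, so no averaging is needed to form the divergence at the point where the pressure variable is stored.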

[…] *ψ* stands for any of the kinematic or thermodynamic dependent variables of the model, and *F*_{ψ} for the associated rhs. Then, semi-Lagrangian schemes can be interpreted as approximations to the trajectory integrals of the governing equations:

*ψ* = *ψ*_{0} + ∫_{Γ} *F*_{ψ} *dt*,

where *ψ* and *ψ*_{0} denote, respectively, the values of *ψ*(**X**, *t*) at the two endpoints of the trajectory Γ connecting [**X**_{0}(**X**, *t*_{1}), *t*_{0}] and (**X**, *t*_{1}). For our study (of spectral preconditioners), we adopt the Mesoscale Compressible Community (MC2) model (Tanguay et al. 1990; Benoit et al. 1997; Laprise et al. 1997; Thomas et al. 1998, and references therein), which implements the semi-implicit semi-Lagrangian algorithm of Robert (1981). In the context of the integral representation, this scheme blends midpoint three-time-level and trapezoidal two-time-level quadrature rules. The midpoint and trapezoidal approximations result, respectively, in explicit (leapfrog) and implicit (Crank–Nicolson) time discretizations of the various forcing terms. In order to concisely present the resulting finite-difference equations, we define the discrete operators

[equation (9) missing in source]

where *ψ*^{n+1}_{i} is the value of *ψ* at the grid point (**X**_{i}, *t*^{n+1}), and *ψ*^{n−1}_{0} is the value of *ψ* at the foot (**X**_{0}, *t*^{n−1}) of the trajectory arriving at (**X**_{i}, *t*^{n+1}), evaluated using cubic interpolation from the neighboring grid points. With the definitions in (9), the compact discretized form of the equations (6) becomes (after some regrouping)

[equation missing in source]

[…] *D*_{X}, *D*_{Y}, *D*_{Z}, and *μ*_{Z}) are defined in appendix D, and the *F*^{n}_{ψ} […] (**X**, *t*^{n}), with **X** ≡ (**X**_{i} + **X**_{0})/2 [cf. Thomas et al. (1998) for a discussion of the trajectory scheme].
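The role of the trajectory foot and of cubic interpolation can be seen in a stripped-down example. The sketch below advances a passive scalar with zero forcing on a periodic 1D grid using a single two-time-level step and a constant wind, so the departure point is known exactly; the three-time-level blending and the iterative trajectory calculation used in the model are deliberately left out. All names are illustrative.

```python
import math

def cubic_lagrange(f0, f1, f2, f3, s):
    """Cubic Lagrange interpolation through equally spaced values f0..f3
    located at s = -1, 0, 1, 2, evaluated at offset s (normally 0 <= s <= 1)."""
    return (-s * (s - 1) * (s - 2) / 6.0 * f0
            + (s + 1) * (s - 1) * (s - 2) / 2.0 * f1
            - (s + 1) * s * (s - 2) / 2.0 * f2
            + (s + 1) * s * (s - 1) / 6.0 * f3)

def semi_lagrangian_step(psi, u, dt, dx):
    """One step of D(psi)/Dt = 0 on a periodic grid with constant wind u."""
    n = len(psi)
    new = []
    for i in range(n):
        x_dep = i - u * dt / dx      # trajectory foot, in grid units
        j = math.floor(x_dep)        # grid point immediately to its left
        s = x_dep - j                # fractional offset in [0, 1)
        new.append(cubic_lagrange(psi[(j - 1) % n], psi[j % n],
                                  psi[(j + 1) % n], psi[(j + 2) % n], s))
    return new

psi = [math.exp(-((i - 10) / 3.0) ** 2) for i in range(32)]
shifted = semi_lagrangian_step(psi, u=1.0, dt=2.0, dx=1.0)    # Courant number 2
smeared = semi_lagrangian_step(psi, u=1.0, dt=0.4, dx=1.0)    # Courant number 0.4
print(max(shifted), max(smeared))
```

With a Courant number of 2 the foot coincides with a grid point and the field is shifted without error, which illustrates why the semi-Lagrangian step itself is not limited by the advective Courant number.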

### c. Elliptic problem

[…] *unknown* *t*^{n+1} terms are grouped on the lhs, and all *known* *t*^{n} and *t*^{n−1} terms are separated on the rhs. Denoting the rhs by *Q* and dropping all temporal superscripts (as there is no ambiguity at this point), the resulting linear system of algebraic equations (valid at each grid point) is written as follows:

[equations (11)–(15) missing in source]

[…] *q*′, all remaining dependent variables are eliminated from (11)–(15). Combining (14) and (15) eliminates the buoyancy *B* and results in

[equation (16) missing in source]

[…] *D*_{H}(*U*, *V*) in (11), […]

[equations missing in source; surviving fragments: *Q*^{′}_{q′}, *Q*_{q′}, *tγD*_{H}, *Q*_{U}, *Q*_{V}, *t*^{2}*N*^{2}_{0}]

[…] *w* in (16) and substituting the result in (18), we arrive at the discrete Helmholtz equation,

[equation missing in source; surviving fragments: *Q*^{*}_{q′}, *Q*^{′}_{q′}, *tD*^{(2)}_{Z}, ^{−1}, *Q*^{′}_{w}]

### d. Boundary conditions

[…] *z*_{top}, the terrain-following coordinate *Z* becomes flat [cf. (C7) in appendix C], whereupon the contravariant “vertical” velocity becomes identical to the vertical component of the physical velocity and they both vanish:

*Ż*|_{ztop} = *w*|_{ztop} = 0.

[…] *Ż*|_{0} = 0), implying that the physical vertical velocity *w* follows coordinate surfaces according to

[equation missing in source; surviving fragments: *w*|_{0}, *GS*, *G*^{13}*U*, *G*^{23}*V*, |_{0}]

[…] *q*′ associated with (23) and (24) are derived by substituting the momentum equations (12) and (13) into, respectively, (23) and (24) and requiring that the resulting relations be consistent with (16). Then

[equation missing in source; surviving fragments: *Q*^{′}_{w}, *D*^{(1)}_{Z}]

[…] *Z* = 0 and *Z* = *z*_{top}.^{5}

## 3. Spectral preconditioner

[equation missing in source; surviving fragments: *q*, *Q*^{*}_{q′}]

[…] *D*_{H}(*D*_{X}, *D*_{Y}) embedded in […] *X*–*Z* and *Y*–*Z* derivatives and averages. The first step in constructing an effective preconditioner […]

[equation (27) missing in source; surviving fragments: *e*, *r*]

[…] ∇^{2}_{H} […] *δ*_{XX} + *δ*_{YY} denotes the standard discrete horizontal Laplacian; *e* is the current iterate (of the Krylov solver) estimation of the solution error *e* = *q*′ − *q* […]; *r* is the current iterate value of the residual error *r* = […] *q*′ − *Q*^{*}_{q′} […] *S* = *S*(*X*, *Y*) is a function of both *X* and *Y*.^{6}

In general, a separable operator is obtained by a cross-directional homogenization of metric coefficients (for example, by averaging *S* that multiplies the *δ*_{XX} and *δ*_{YY} parts of ∇^{2}_{H} in, respectively, the *Y* and *X* directions), whereupon the tensor-product method (Lynch et al. 1964) can be applied to construct a direct solver. This requires computing separately the eigenvalues and eigenvectors of the matrices representing the discrete operators *S*^{Y}*δ*_{XX} and *S*^{X}*δ*_{YY}, and diagonalization follows by combining two unitary similarity transformations. Insofar as stand-alone solvers for separable problems are concerned, this approach requires *O*(*n*_{z}*n*^{3}) operations and is not competitive with ADI schemes [an *O*(*n*_{z}*n*^{2} log^{2}*n*) operation count] except when a fast transform [e.g., FFT with *O*(*n*_{z}*n*^{2} log *n*) operations] is available (Lynch et al. 1964). Here, *n* = *n*_{x} = *n*_{y} refers to the number of mesh points in either horizontal direction, and *n*_{z} is the number of points in the vertical.

Because (27) is merely a preconditioner, *S* can be globally averaged or extremized (Bernardet 1995; Smolarkiewicz and Margolin 2000), thereby enabling application of the FFT. Here, we replace *S* in (27) with the global mean; flat topography is specified to assure constant coefficients in the vertical derivatives. The discrete operator ∇^{2}_{H}, with *homogeneous* Neumann boundary conditions,^{7} is then readily diagonalized by the real part of the Fourier transform (via the discrete cosine transform; Sweet 1973). This results in *n*_{x} × *n*_{y} independent tridiagonal vertical problems:

[equation missing in source]

[…] ∇^{2}_{H} […] *X* and *Y* directions, and […]
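The structure of such a spectral preconditioner (cosine transform in the horizontal, followed by independent tridiagonal solves in the vertical) can be sketched in a reduced *X*–*Z* setting. The operator below, *a* *δ*_{XX} + *b* *δ*_{ZZ} − *c* with constant coefficients and homogeneous Neumann boundaries, is a generic stand-in and not the paper's equation (27); the coefficients, the naive *O*(*n*^{2}) transform, and all function names are assumptions made to keep the example short and dependency-free.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its rows are the Neumann cosine modes."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n)]
            for k in range(n)]

def thomas(lower, diag, upper, rhs):
    """Tridiagonal solve; lower[0] and upper[-1] are ignored."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def apply_operator(e, a, b, c, dx, dz):
    """(a d_XX + b d_ZZ - c) e with mirror (Neumann) boundaries; e[level][column]."""
    nz, nx = len(e), len(e[0])
    out = [[0.0] * nx for _ in range(nz)]
    for m in range(nz):
        for j in range(nx):
            west = e[m][j - 1] if j > 0 else e[m][j]
            east = e[m][j + 1] if j < nx - 1 else e[m][j]
            below = e[m - 1][j] if m > 0 else e[m][j]
            above = e[m + 1][j] if m < nz - 1 else e[m][j]
            out[m][j] = (a * (west - 2 * e[m][j] + east) / dx**2
                         + b * (below - 2 * e[m][j] + above) / dz**2
                         - c * e[m][j])
    return out

def dct_precondition(r, a, b, c, dx, dz):
    """Solve (a d_XX + b d_ZZ - c) e = r: DCT in X, tridiagonal solves in Z."""
    nz, nx = len(r), len(r[0])          # nz >= 2 assumed
    C = dct_matrix(nx)
    r_hat = [[sum(C[k][j] * row[j] for j in range(nx)) for k in range(nx)]
             for row in r]
    e_hat = [[0.0] * nx for _ in range(nz)]
    off = b / dz**2
    for k in range(nx):                 # one vertical problem per wavenumber
        lam = -4.0 / dx**2 * math.sin(math.pi * k / (2 * nx)) ** 2
        diag = [a * lam - c - 2 * off] * nz
        diag[0] += off                  # Neumann ghost cell folded into diagonal
        diag[-1] += off
        col = thomas([off] * nz, diag, [off] * nz,
                     [r_hat[m][k] for m in range(nz)])
        for m in range(nz):
            e_hat[m][k] = col[m]
    # C is orthogonal, so the inverse transform is its transpose.
    return [[sum(C[k][j] * e_hat[m][k] for k in range(nx)) for j in range(nx)]
            for m in range(nz)]

a, b, c, dx, dz = 1.0, 1.0, 0.05, 1.0, 0.1
nx, nz = 12, 8
e_true = [[math.sin(1.3 * m + 0.7 * j) + 0.1 * j for j in range(nx)]
          for m in range(nz)]
r = apply_operator(e_true, a, b, c, dx, dz)
e = dct_precondition(r, a, b, c, dx, dz)
err = max(abs(e[m][j] - e_true[m][j]) for m in range(nz) for j in range(nx))
print("max error of the direct solve:", err)
```

When the full operator coincides with this constant-coefficient form, the preconditioner is an exact solver; with orography or a varying map factor it is only an approximation, and the Krylov iteration supplies the correction.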

## 4. Numerical experiments

[…]^{8} is designed to compare the performance of the DCT preconditioner against the line Jacobi preconditioner, proven effective in meteorological applications (Thomas et al. 1998, 2000). The relative efficiency of the two preconditioners is measured with two Jacobi/DCT ratios [hereafter the performance ratios (PR)]: the ratio of total GCR solver iterations and the ratio of the total model CPU times. Both ratios are informative. For a given convergence criterion, the first one estimates the computational cost of the solver, independent of the implementation and machine, whereas the second assesses the total work required for the application at hand. Values of PR > 1 show efficiency gains by the DCT preconditioner. Because the residual error of the Helmholtz problem in (21) combines Δ*t*-normalized first-order spatial partial derivatives of the velocity components, it motivates the stopping criterion:

[equation (29) missing in source; surviving fragments: *q*, *Q*^{*}_{q′}, *C*, *L*]

where *C* and *L* represent the Courant ‖Δ*t* **v**/Δ**X**‖ and Lipschitz ‖Δ*t*(∂**v**/∂**x**)‖ numbers, which measure, correspondingly, the flow magnitude and its variations. Typically, in order to control the stability/accuracy of the Eulerian and semi-Lagrangian computations, *C* ∼ *L* ∼ *O*(1), whereupon ε specifies both the absolute admissible error and its magnitude compared to the flow.
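The two nondimensional numbers entering the stopping criterion are straightforward to evaluate on a grid. The sketch below computes them for a 1D velocity sample using the maximum norm and centred differences; the choice of norm and of the difference formula are assumptions, and the precise form of the criterion (29) is not reproduced here.

```python
def courant_number(v, dt, dx):
    """Maximum of |dt * v / dx| over the grid."""
    return max(abs(vi) for vi in v) * dt / dx

def lipschitz_number(v, dt, dx):
    """Maximum of |dt * dv/dx|, with centred differences at interior points."""
    return max(abs(v[i + 1] - v[i - 1]) / (2.0 * dx)
               for i in range(1, len(v) - 1)) * dt

# Linear shear v = 2 x sampled at x = 0, 0.5, ..., 5.
dx, dt = 0.5, 0.1
v = [2.0 * i * dx for i in range(11)]
C = courant_number(v, dt, dx)
L = lipschitz_number(v, dt, dx)
print(C, L)
```

Both numbers scale with Δ*t*, so a threshold expressed relative to them ties the admissible residual to the time step as well as to the flow.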

The test cases considered here are designed to evaluate the robustness and efficiency of the DCT preconditioner over a wide range of meteorological applications. The first test is very special; it probes the *reflexivity* of our preconditioned Krylov approach. In all problems where the preconditioner is an exact solver, GCR should converge in one iteration to machine precision, regardless of the value of ε. In the MC2 model, all small- to mesoscale problems with map scale factor *m* ≡ 1 in (C1) and flat lower boundary with *h*(*X,* *Y*) ≡ const in (C7) belong to this class, which constitutes a broad range of research applications including studies of fully developed turbulence and deep moist convection. For illustration, we consider a full-spectrum turbulent thermal convection simulation.

The physical scenario is a buoyant thermal rising in a neutrally stratified environment, a 3D extension of the earlier experiment in Smolarkiewicz and Pudykiewicz (1992). The grid employed consists of 100 × 100 × 150 points with constant interval Δ*X* = Δ*Y* = Δ*Z* = 10 m; Δ*t* = 2.5 s. Initially, a spherical temperature anomaly of radius *r*_{0} = 250 m and Θ′ = 0.5 K is centered in the horizontal, *r*_{0} + Δ*Z* above the lower boundary. Figure 1 shows potential temperature (Θ) *xz* and *xy* cross sections, through the center of the convective bubble and at *z* = 800 m, respectively, at the end of the integration, *t* = 180Δ*t*, by which time the flow becomes turbulent and exhibits a full range of scales (see Fig. 2).

The solution in Figs. 1 and 2 is for the DCT preconditioner. It is independent of the stopping criterion in (29). This is documented in Fig. 3, which compares the rms divergence of the solution using the DCT and line Jacobi preconditioners for a range of ε. Following Skamarock et al. (1997), this diagnostic is used to measure the solution convergence in elastic models. For the Jacobi preconditioner, as ε → 0 the curve flattens, indicating the critical value ε = 10^{−3}, below which the solution can be considered physically converged; that is, further solver iterations do not contribute to the overall accuracy of the results. Because the DCT preconditioner is a direct solver of the elliptic problem at hand, convergence is reached in a single GCR iteration regardless of the specified value of ε. The latter is not the case for the Jacobi preconditioner, which requires an increasing number of GCR iterations as ε → 0. Figure 4 demonstrates this by displaying the PR for iteration count and total CPU time as a function of ε. Here, the DCT-preconditioned solver converges in one iteration, and the PR iteration count is equivalent to the average number of GCR iterations (per model time step) using the Jacobi preconditioner. The PR CPU time documents that, even with the overhead associated with computing FFTs, the DCT preconditioner is at least an order of magnitude more efficient for all values of ε representing the convergent solutions.

The second test simulates a smooth, vertically propagating 3D mountain wave, identical to those in Saito et al. (1998) and Thomas et al. (2000). The resulting flow is characterized by small-amplitude gravity wave response at Froude number *F* ≡ *U*/*Nh* = 4.4, where the linear theory of Smith (1980) is qualitatively valid. To contrast with the buoyant thermal case, the linear flow regime is chosen because it emphasizes a low signal-to-noise ratio. Here, unlike in the previous case, the energy spectrum of a fully developed solution is compact, and the grid aspect ratio significantly departs from unity, reflecting the anisotropy of the problem. To quantify the effect of domain anisotropy (shallow versus deep flow scenarios) on solver performance, both hydrostatic and nonhydrostatic variants are addressed [cf. Skamarock et al. (1997) for a discussion]. The parameters of the experiment are as follows.

[…] *a* = 3Δ*X* = 3Δ*Y* and the height *h*_{0} = 100 m is centered at the (30, 30) grid point of the 61 × 61 horizontal mesh. In the vertical, 31 exponentially stretched levels extend to 19 km, resulting in vertical grid increments ranging from 40 to 1180 m, respectively, for the bottom and top thermodynamic layers. The model is integrated over 480 time steps, by which time the solution is considered to have reached an approximate steady state. A constant inflow velocity of *U* = 8 m s^{−1} and an isothermal atmosphere with buoyancy frequency *N* = 0.018 s^{−1} are assumed. The Coriolis force and the curvature of the earth are neglected. No subgrid-scale parameterizations are employed. The Davies (1976) relaxation scheme is used at the lateral boundaries to minimize wave reflections.

In the hydrostatic case, the grid length Δ*X* = 2000 m and time step Δ*t* = 60 s result in the advective Courant number *C* = 0.24. The horizontal scale of the mountain is *a* = 6000 m, so *aN*/*U* = 13.5. In the nonhydrostatic case, Δ*X* = 400 m and Δ*t* = 15 s, giving *C* = 0.3. The horizontal scale decreases to *a* = 1200 m, so that *aN*/*U* = 2.7, whereupon nonhydrostatic effects are significant. For illustration, Fig. 5 displays the converged solution. Vertical velocity cross sections are shown through the vertical center plane and on the horizontal plane at *z* = 2100 m (3/4 of the vertical wavelength) for both cases. The maximum vertical velocities at this level are 0.06 and 0.2 m s^{−1} in the hydrostatic and nonhydrostatic cases, respectively. Note the downstream tilt of the wave packet (manifested by the downstream shift of the maxima) in the nonhydrostatic case. The corresponding energy spectra in the mean flow direction at *z* = 1800 m are shown in Fig. 6.
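The nondimensional parameters quoted for the two mountain-wave configurations follow from the stated dimensional values, as the short check below shows. The helper names are illustrative, and the vertical wavelength uses the standard linear hydrostatic estimate 2π*U*/*N*, which is an assumption about how the 3/4-wavelength level was chosen.

```python
import math

def courant(U, dt, dx):
    """Advective Courant number U * dt / dx."""
    return U * dt / dx

def nondim_width(a, N, U):
    """Nondimensional mountain width a * N / U."""
    return a * N / U

def froude(U, N, h):
    """Froude number U / (N * h)."""
    return U / (N * h)

def vertical_wavelength(U, N):
    """Hydrostatic vertical wavelength of a linear mountain wave (m)."""
    return 2.0 * math.pi * U / N

U, N, h0 = 8.0, 0.018, 100.0
hydrostatic = (courant(U, 60.0, 2000.0), nondim_width(6000.0, N, U))
nonhydrostatic = (courant(U, 15.0, 400.0), nondim_width(1200.0, N, U))
print("hydrostatic     C, aN/U:", hydrostatic)
print("nonhydrostatic  C, aN/U:", nonhydrostatic)
print("Froude number:", froude(U, N, h0))
print("3/4 vertical wavelength (m):", 0.75 * vertical_wavelength(U, N))
```

The large value of *aN*/*U* in the first configuration and the value of order one in the second are what separate the hydrostatic and nonhydrostatic regimes.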

The iteration count of the Krylov solver using the DCT (solid line) and line Jacobi (dashed line) preconditioners is compared in Fig. 7 as a function of the convergence threshold ε in (29). The rms divergence (thick solid line) shows the critical value of ε = 10^{−5}, below which the solution is considered physically converged.^{9} It is apparent that the number of iterations per time step required for the line Jacobi preconditioner is about 5(10) times that required for the DCT scheme in the hydrostatic (nonhydrostatic) case. However, this does not imply that the efficiency of the DCT preconditioner follows the same pattern. The computational cost of a single GCR iteration with the DCT preconditioner is much higher than with line Jacobi, whereupon the CPU time performance ratio is about 1.5 for the hydrostatic result and somewhat over 2 for the nonhydrostatic solution (see Fig. 8). The trend of decreasing relative efficacy of the DCT preconditioner with increasing hydrostacy of the problem is consistent with that observed for ADI preconditioners (Skamarock et al. 1997). It is likely related to the propagation characteristics of gravity waves present in the system.

Next, the complex Scandinavian orography is used for the lower boundary to probe the solver performance for natural mesoscale flow scenarios containing many scales of motion. For this case, Δ*X* = Δ*Y* = 10 km, and Δ*Z* = 300 m, giving a grid aspect ratio of 0.03. The total horizontal domain size is 2000 km × 2000 km (200 × 200 horizontal grid points), with the model lid placed at 15 km. A uniform northwesterly ambient flow with speed of 10 m s^{−1} and buoyancy frequency *N* = 0.01 s^{−1} is assumed. Unlike in the previous cases, a *β*-plane approximation is employed, with the map scale factor *m* varying accordingly in (C1). The model is integrated over 24 h with 288 5-min time steps. The converged wind solution on the lowest momentum level (approximately 150 m above the ground) is shown in Fig. 9, for illustration. Based on experience gained from the earlier test cases, a safe value ε = 10^{−6} is assumed to assure physical convergence. Performance-ratio values are 4.04 and 5.46 for CPU time and iteration count, respectively, documenting much higher gains for the DCT preconditioner compared to the isolated mountain case discussed above. This is not surprising because compact spectra, such as those of the isolated mountain case, limit the number of significant eigenmodes present in the solution, thereby accelerating the convergence of Krylov solvers even with a simple preconditioner. For the sake of completeness, a 3D ADI preconditioner was also tested in this case. The number of GCR iterations required (for the given convergence threshold) is the same as with the DCT preconditioner, but the computational overhead is substantial, owing to multiple ADI iterations within the preconditioner itself.

Finally, we test the performance of the DCT preconditioner on a synoptic-scale, weather-prediction problem. An arbitrarily chosen forecast from 0000 UTC 16 September 2002 is run using initial and boundary conditions supplied by the National Centers for Environmental Prediction (NCEP) Eta Model. The MC2 model is integrated over 48 h with 480 6-min time steps. The computational domain is 200 × 160 × 19, with Δ*X* = Δ*Y* = 50 km on a polar stereographic projection true at 60°N, and the vertically stretched grid ranges from Δ*Z* = 72 m near the surface to 3800 m near the top at 23 km. The resulting condition number of the Helmholtz operator is extreme, *O*(10^{12}), and the energy spectrum full. As in the Scandinavian case, ε = 10^{−6} is assumed. The resulting performance ratios are 1.93 and 4.90 for CPU time and iteration count, respectively. The DCT preconditioner again requires fewer iterations, and the model integration rate is about twice as fast. This demonstrates that the DCT is far superior to line Jacobi, despite the large departure of the preconditioner from the full Helmholtz operator emphasized by a synoptic-scale forecast.

To assess the relative merits of alternative Krylov methods, we also implemented the right-preconditioned flexible GMRES (FGMRES) algorithm of Saad (1993).^{10} For comparison, we ran the MC2 model using FGMRES(4), with DCT preconditioner, for the Scandinavian topography simulation and winter storm synoptic forecast. In both cases, we found that FGMRES requires additional iterations because it tends to converge more slowly than GCR after a restart. The model run times were nearly identical, with FGMRES giving a slight advantage in the synoptic case. To compare solutions, we examined the maximum Courant numbers (equivalent to a maximum norm) and residual *L*_{2} norm (least squares error) of the elliptic problem over the computational domain for the GCR and FGMRES solvers. In both cases, these agree to six digits, and thus we conclude that the solutions are the same.

## 5. Concluding remarks

We have developed a spectral preconditioner for a generalized conjugate residual (GCR) Krylov solver applied to the Helmholtz problem arising in the semi-implicit time discretization of a compressible nonhydrostatic atmospheric model. Our approach offers an alternative to the more standard and much simpler line-relaxation, stationary block-iteration-type preconditioners used in meteorology. Despite substantial departures of the spectral preconditioner from the governing Helmholtz operator, we observed dramatic performance gains over these simpler schemes. The relative improvement in the model integration rate ranges from a factor of 2 to one order of magnitude, depending on the problem at hand.

The relative performance gains are, of course, greatest when the DCT preconditioner happens to be a direct solver for the governing Helmholtz problem, that is, for zero orography and a constant map scale factor, which are characteristic of small-scale dynamics applications. In a mesoscale flow scenario over Scandinavian topography, with a broad range of scales of motion present, a factor of 4 improvement was achieved. For smooth idealized mountain wave flows, frequently used to benchmark numerical model performance, the relative gains are more modest. But extrapolating these results to natural problems underestimates the potential gains because idealized flows often have compact spectra that favor simpler schemes. For synoptic weather prediction, where the shallowness of the atmosphere implies an extremely poorly conditioned problem, the performance improvement is still a factor of 2.

Spectral methods rely on global basis functions, and computing their expansion coefficients requires multiple evaluations of global sums. Consequently, spectral transform methods are criticized as being inappropriate for modern, distributed-memory parallel architectures. Nevertheless, in the context of FFT-based preconditioners for iterative Krylov methods, Elman and O'Leary (1998) documented overall parallel speedups despite the large data communication overheads. In meteorology, the vertical direction is physically distinct and typically not partitioned across the processors in numerical models. Thus, a massively parallel implementation of spectral preconditioners can be based on a data-transposition strategy (Thomas et al. 2002) similar to that used in the European Centre for Medium-Range Weather Forecasts (ECMWF) weather prediction model Integrated Forecast System (IFS; Skålin 1997; Barros and Kauranne 1994).

The performance of the next-generation (all-scale, nonhydrostatic semi-implicit, weather and climate prediction/research) numerical models in meteorology depends critically on the performance of the elliptic solvers employed. It is widely recognized that the state-of-the-art methods rely on preconditioned nonsymmetric Krylov algorithms. Despite the extensive body of mathematical literature, relatively little numerical experience exists with such solvers in advanced meteorological applications (cf. Navon and Cai 1993). We have advocated a spectrally preconditioned GCR solver in the context of a particular model, representative of current trends in NWP. Further studies with a focus on alternative numerics, mesh refinement, complex geometries, and extreme meteorological scenarios will provide demanding challenges and benchmarks for the advocated approach.

## Acknowledgments

The authors are grateful to Claude Girard and Michel Desgagné of RPN and Environment Canada, for maintaining and improving the MC2 model, and to Xingxiu Deng and Henryk Modzelewski from UBC, for preparing the synoptic case and adept computer administration, respectively. This work was supported by the National Science Foundation, the Department of Energy Climate Change Prediction Program (CCPP), the Canadian Natural Science and Engineering Council, RPN and Environment Canada, the Canadian Foundation for Innovation, BC Knowledge Development Fund, the Geophysical Disaster Computational Fluid Dynamics Center, and UBC Research Endowments. The comments of two anonymous referees helped to improve the presentation.

## REFERENCES

Arakawa, A., and V. Lamb, 1977: Computational design of the basic dynamical processes of the UCLA general circulation model. *Methods in Computational Physics*, Vol. 17, J. Chang, Ed., Academic Press, 174–265.

Axelsson, O., 1994: *Iterative Solution Methods*. Cambridge University Press, 654 pp.

Barros, S., and T. Kauranne, 1994: On the parallelization of global spectral weather models. *Parallel Comput.*, **20**, 1335–1356.

Benoit, R., M. Desgagné, P. Pellerin, S. Pellerin, Y. Chartier, and S. Desjardins, 1997: The Canadian MC2: A semi-Lagrangian, semi-implicit wide-band atmospheric model suited for fine-scale process studies and simulations. *Mon. Wea. Rev.*, **125**, 2382–2415.

Bernardet, P., 1995: The pressure term in the anelastic model: A symmetric elliptic solver for an Arakawa C grid in generalized coordinates. *Mon. Wea. Rev.*, **123**, 2474–2490.

Bourke, W., 1974: A multi-level spectral model. I. Formulation and hemispheric integrations. *Mon. Wea. Rev.*, **102**, 687–701.

Cullen, M., 1990: A test of a semi-implicit integration technique for a fully compressible nonhydrostatic model. *Quart. J. Roy. Meteor. Soc.*, **116**, 1253–1258.

Davies, H., 1976: A lateral boundary formulation for multi-level prediction models. *Quart. J. Roy. Meteor. Soc.*, **102**, 405–418.

Eisenstat, S., H. Elman, and M. Schultz, 1983: Variational iterative methods for nonsymmetric systems of linear equations. *SIAM J. Numer. Anal.*, **2**, 345–357.

Elman, H., and D. O'Leary, 1998: Efficient iterative solution of the three-dimensional Helmholtz equation. *J. Comput. Phys.*, **142**, 163–181.

Freund, R., 1993: A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems. *SIAM J. Sci. Stat. Comput.*, **14**, 470–482.

Gal-Chen, T., and R. Somerville, 1975: On the use of a coordinate transformation for the solution of the Navier–Stokes equations. *J. Comput. Phys.*, **17**, 209–228.

Golding, B., 1992: An efficient nonhydrostatic forecast model. *Meteor. Atmos. Phys.*, **50**, 89–103.

Grabowski, W., and P. Smolarkiewicz, 2002: A multiscale model for meteorological research. *Mon. Wea. Rev.*, **130**, 939–956.

Greenbaum, A., 1997: *Iterative Methods for Solving Linear Systems*. Society for Industrial and Applied Mathematics, 220 pp.

Hestenes, M., and E. Stiefel, 1952: Method of conjugate gradients for solving linear systems. *J. Res. Natl. Bur. Stand. (U.S.)*, **49**, 409–436.

Kadogliu, H., and S. Mudrick, 1992: On the implementation of the GMRES(*m*) method to elliptic equations in meteorology. *J. Comput. Phys.*, **102**, 348–359.

Kapitza, H., and D. Eppel, 1992: The nonhydrostatic mesoscale model GESIMA. Part I: Dynamical equations and tests. *Beitr. Phys. Atmos.*, **65**, 129–146.

Lafore, J., and Coauthors, 1998: The Meso–NH atmospheric modeling system. Part I: Adiabatic formulation and control simulations. *Ann. Geophys.*, **16**, 90–109.

Laprise, R., D. Caya, G. Bergeron, and M. Giguère, 1997: The formulation of the André Robert MC2 (Mesoscale Compressible Community) model. *Atmos.–Ocean*, **35**, 127–152.

Lynch, R., J. Rice, and D. Thomas, 1964: Direct solution of partial difference equations by tensor product methods. *Numer. Math.*, **6**, 185–199.

Machenhauer, B., and R. Daley, 1972: A baroclinic primitive equation model with a spectral representation in three dimensions. Tech. Rep. 4, Institute of Theoretical Meteorology, University of Copenhagen, 63 pp.

Marshall, J., C. Hill, L. Perelman, and A. Adcroft, 1997: Hydrostatic, quasi-hydrostatic and non-hydrostatic ocean modeling. *J. Geophys. Res.*, **102**, 5733–5752.

Prusa, J., P. Smolarkiewicz, and R. Garcia, 1996: On the propagation and breaking at high altitudes of gravity waves excited by tropospheric forcing. *J. Atmos. Sci.*, **53**, 2186–2216.

Roache, P., 1972: *Computational Fluid Dynamics*. Hermosa, 446 pp.

Robert, A., 1969: Integration of a spectral model of the atmosphere by the implicit method. *Proc. WMO/IUGG Symp. on NWP*, Vol. 7, Tokyo, Japan, Japan Meteorological Agency, 19–24.

Robert, A., 1981: A stable numerical integration scheme for the primitive meteorological equations. *Atmos.–Ocean*, **19**, 35–46.

Robert, A., 1993: Bubble convection experiments with a semi-implicit formulation of the Euler equations. *J. Atmos. Sci.*, **50**, 1865–1873.

Robert, A., T. Yee, and H. Ritchie, 1985: A semi-Lagrangian semi-implicit numerical integration scheme for multilevel atmospheric models. *Mon. Wea. Rev.*, **113**, 388–394.

Saad, Y., 1993: A flexible inner-outer preconditioned GMRES algorithm. *SIAM J. Sci. Stat. Comput.*, **14**, 461–469.

Saad, Y., 1996: *Iterative Methods for Sparse Linear Systems*. PWS, 447 pp.

Saad, Y., and M. Schultz, 1986: GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. *SIAM J. Sci. Stat. Comput.*, **7**, 856–869.

Saito, K., U. Schättler, and U. Steppler, 1998: 3D mountain waves by the Lokal-Modell of DWD and the MRI mesoscale nonhydrostatic model. *Pap. Meteor. Geophys.*, **49**, 7–19.

Schumann, U., and H. Volkert, 1987: Three-dimensional mass- and momentum-consistent Helmholtz-equation in terrain-following coordinates. *Notes on Numerical Fluid Mechanics*, W. Hackbusch, Ed., Vol. 10, Springer-Verlag, 109–131.

Schumann, U., and R. Sweet, 1988: Fast Fourier transforms for direct solution of Poisson's equation with staggered boundary conditions. *J. Comput. Phys.*, **75**, 123–137.

Skålin, R., 1997: Scalability of parallel gridpoint limited-area atmospheric models. Part II: Semi-implicit time-integration schemes. *J. Atmos. Oceanic Technol.*, **14**, 442–455.

Skamarock, W., P. Smolarkiewicz, and J. Klemp, 1997: Preconditioned conjugate-residual solvers for Helmholtz equations in nonhydrostatic models. *Mon. Wea. Rev.*, **125**, 587–599.

Smith, R., 1980: Linear theory of stratified hydrostatic flow past an isolated mountain. *Tellus*, **32**, 348–364.

Smolarkiewicz, P., and J. Pudykiewicz, 1992: A class of semi-Lagrangian approximations for fluids. *J. Atmos. Sci.*, **49**, 2082–2096.

Smolarkiewicz, P., and L. Margolin, 1994: Variational solver for elliptic problems in atmospheric flows. *Appl. Math. Comput. Sci.*, **4**, 527–551.

Smolarkiewicz, P., and L. Margolin, 1997: On forward-in-time differencing for fluids: An Eulerian/semi-Lagrangian nonhydrostatic model for stratified flows. *Atmos.–Ocean*, **35**, 127–152.

Smolarkiewicz, P., and L. Margolin, 2000: Variational methods for elliptic problems in fluid models. *Proc. ECMWF Workshop on Developments in Numerical Methods for Very High Resolution Global Models*, Reading, United Kingdom, ECMWF, 137–159.

Smolarkiewicz, P., and J. Prusa, 2002: VLES modelling of geophysical fluids with nonoscillatory forward-in-time schemes. *Int. J. Numer. Methods Fluids*, **39**, 779–819.

Smolarkiewicz, P., V. Grubišić, and L. Margolin, 1997: On forward-in-time differencing for fluids: Stopping criteria for iterative solutions of anelastic pressure equations. *Mon. Wea. Rev.*, **125**, 647–654.

Sweet, R., 1973: Direct methods for the solution of Poisson's equation on a staggered grid. *J. Comput. Phys.*, **12**, 422–428.

Tanguay, M., A. Robert, and R. Laprise, 1990: A semi-implicit semi-Lagrangian fully compressible regional forecast model. *Mon. Wea. Rev.*, **118**, 1970–1980.

Tapp, M., and P. White, 1976: A nonhydrostatic mesoscale model. *Quart. J. Roy. Meteor. Soc.*, **102**, 277–296.

Thomas, S., C. Girard, R. Benoit, M. Desgagné, and P. Pellerin, 1998: A new adiabatic kernel for the MC2 model. *Atmos.–Ocean*, **36**, 241–270.

Thomas, S., C. Girard, G. Doms, and U. Schättler, 2000: Semi-implicit scheme for the DWD Lokal-Modell. *Meteor. Atmos. Phys.*, **73**, 105–125.

Thomas, S., J. Hacker, M. Desgagné, and R. Stull, 2002: An ensemble analysis of forecast errors related to floating point performance. *Wea. Forecasting*, **17**, 898–906.

Tokioka, T., 1978: Some considerations on vertical differencing. *J. Meteor. Soc. Japan*, **56**, 98–111.

van der Vorst, H., 1992: Bi-CGStab: A fast and smoothly converging variant of Bi-CG for the solution of non-symmetric linear systems. *SIAM J. Sci. Stat. Comput.*, **12**, 631–644.

Yeh, K-S., J. Cote, S. Gravel, A. Methot, A. Patoine, M. Roch, and A. Staniforth, 2002: The CMC–MRB Global Environmental Multiscale (GEM) model. Part III: Nonhydrostatic formulation. *Mon. Wea. Rev.*, **130**, 339–356.

## APPENDIX A

### Generalized Conjugate-Residual Approach

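This appendix summarizes the GCR(*k*) algorithm used in this study. In general, we consider a linear elliptic problem,

$$
\mathcal{L}(\Psi) \equiv \sum_{I=1}^{3} \frac{\partial}{\partial x^{I}} \left( \sum_{J=1}^{3} C^{IJ} \frac{\partial \Psi}{\partial x^{J}} + D^{I}\Psi \right) - A\Psi = Q, \tag{A1}
$$

with variable coefficients $A$, $C^{IJ}$, $D^{I}$, $Q$, and either periodic, Dirichlet, or Neumann boundary conditions, and adopt the following notation: the discrete representation of a field on the grid is denoted by the subscript $\mathbf{i}$; the discrete representation of the elliptic operator on the lhs of (A1) is denoted by $\mathcal{L}_{\mathbf{i}}(\Psi)$; and the inner product is $\langle \xi\zeta \rangle \equiv \sum_{\mathbf{i}} \xi_{\mathbf{i}}\zeta_{\mathbf{i}}$. The preconditioner $\mathcal{P}$ is a linear operator that approximates $\mathcal{L}$ but is easier to invert; its inverse is denoted $\mathcal{P}^{-1}$. In this paper, we are primarily concerned with “left” preconditioning that substitutes (A1) with an auxiliary problem $\mathcal{P}^{-1}[\mathcal{L}(\Psi) - Q] = 0$. Its accelerated convergence exploits spectral properties of $\mathcal{P}^{-1}\mathcal{L}$. “Right” preconditioning instead substitutes (A1) with $\mathcal{L}\mathcal{P}^{-1}[\mathcal{P}(\Psi)] = Q$, and its convergence relies on reducing the spectral radius of $\mathcal{L}\mathcal{P}^{-1}$. Left preconditioning assumes […].

The GCR(*k*) method of Eisenstat et al. (1983) may be derived via variational arguments (cf. Smolarkiewicz and Margolin 1994, 2000). In essence, we augment (A1) with a *k*th-order damped oscillation equation:

$$
\frac{\partial^{k}\mathcal{P}(\Psi)}{\partial \tau^{k}} + \frac{1}{T_{k-1}} \frac{\partial^{k-1}\mathcal{P}(\Psi)}{\partial \tau^{k-1}} + \cdots + \frac{1}{T_{1}} \frac{\partial \mathcal{P}(\Psi)}{\partial \tau} = \mathcal{L}(\Psi) - Q, \tag{A2}
$$

discretize it in the pseudotime $\tau$ to form the affine discrete equation for the progression of the residual errors $r$, and determine the optimal parameters $T_{1}, \ldots, T_{k-1}$ and integration increment $\Delta\tau$ (variable in $\tau$) that assure minimization of the residual errors in the norm defined by the inner product $\langle rr \rangle$. This leads to the following algorithm.

For any initial guess $\Psi^{0}_{\mathbf{i}}$, set $r^{0}_{\mathbf{i}} = \mathcal{L}_{\mathbf{i}}(\Psi^{0}) - Q_{\mathbf{i}}$, $p^{0}_{\mathbf{i}} = \mathcal{P}^{-1}_{\mathbf{i}}(r^{0})$; then iterate:

$$
\begin{aligned}
&\text{For } n = 1, 2, \ldots \text{ until convergence do}\\
&\quad \text{for } \nu = 0, \ldots, k-1 \text{ do}\\
&\qquad \beta = -\frac{\langle r^{\nu}\mathcal{L}(p^{\nu}) \rangle}{\langle \mathcal{L}(p^{\nu})\mathcal{L}(p^{\nu}) \rangle},\\
&\qquad \Psi^{\nu+1}_{\mathbf{i}} = \Psi^{\nu}_{\mathbf{i}} + \beta p^{\nu}_{\mathbf{i}},\\
&\qquad r^{\nu+1}_{\mathbf{i}} = r^{\nu}_{\mathbf{i}} + \beta \mathcal{L}_{\mathbf{i}}(p^{\nu}),\\
&\qquad \text{exit if } \|r^{\nu+1}\| \le \varepsilon,\\
&\qquad e_{\mathbf{i}} = \mathcal{P}^{-1}_{\mathbf{i}}(r^{\nu+1}),\\
&\qquad \text{evaluate } \mathcal{L}_{\mathbf{i}}(e),\\
&\qquad \alpha_{l} = -\frac{\langle \mathcal{L}(e)\mathcal{L}(p^{l}) \rangle}{\langle \mathcal{L}(p^{l})\mathcal{L}(p^{l}) \rangle}, \quad l = 0, \ldots, \nu,\\
&\qquad p^{\nu+1}_{\mathbf{i}} = e_{\mathbf{i}} + \sum_{l=0}^{\nu} \alpha_{l} p^{l}_{\mathbf{i}},\\
&\qquad \mathcal{L}_{\mathbf{i}}(p^{\nu+1}) = \mathcal{L}_{\mathbf{i}}(e) + \sum_{l=0}^{\nu} \alpha_{l} \mathcal{L}_{\mathbf{i}}(p^{l}),\\
&\quad \text{end do,}\\
&\quad \text{reset } [\Psi, r, p, \mathcal{L}(p)]^{k}_{\mathbf{i}} \text{ to } [\Psi, r, p, \mathcal{L}(p)]^{0}_{\mathbf{i}},\\
&\text{end do.}
\end{aligned}
$$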

For the convergence, the GCR(*k*) algorithm above requires […]^{−1} to be negative definite^{A1} but not necessarily self-adjoint.^{A2} Direct evaluation of the elliptic operator on the grid takes place only once per iteration, following the preconditioning step $e = \mathcal{P}^{-1}(r^{\nu+1})$, which provides an estimate of the solution error $e^{\nu+1} = \Psi^{\nu+1} - \Psi_{\text{exact}}$.
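As an illustration of the iteration summarized in this appendix, the following is a minimal Python/NumPy sketch of a restarted, preconditioned GCR(*k*) loop; it is not the solver used in the model. It keeps the sign convention *r* = ℒ(Ψ) − *Q*. The function name `gcr_k`, the callables `apply_L` and `apply_Pinv`, the small nonsymmetric 1D test operator, and the Jacobi (diagonal) preconditioner in the usage lines are all assumptions made for the example.

```python
import numpy as np

def gcr_k(apply_L, apply_Pinv, Q, psi0, k=3, eps=1e-10, max_restarts=500):
    """Restarted preconditioned GCR(k) for L(psi) = Q, with residual r = L(psi) - Q."""
    psi = np.array(psi0, dtype=float)
    r = apply_L(psi) - Q
    if np.linalg.norm(r) <= eps:
        return psi, 0
    p = [apply_Pinv(r)]      # search directions p^0 ... p^nu
    Lp = [apply_L(p[0])]     # stored images L(p^l), so L is never re-applied to them
    iters = 0
    for _ in range(max_restarts):
        for nu in range(k):
            # Step length that minimizes <r r> along the direction p^nu.
            beta = -np.dot(r, Lp[nu]) / np.dot(Lp[nu], Lp[nu])
            psi = psi + beta * p[nu]
            r = r + beta * Lp[nu]
            iters += 1
            if np.linalg.norm(r) <= eps:
                return psi, iters
            e = apply_Pinv(r)    # preconditioning step: estimate of the solution error
            Le = apply_L(e)      # the only operator evaluation in this iteration
            # Orthogonalize L(p^{nu+1}) against all stored L(p^l).
            alpha = [-np.dot(Le, Lpl) / np.dot(Lpl, Lpl) for Lpl in Lp]
            p.append(e + sum(a * pl for a, pl in zip(alpha, p)))
            Lp.append(Le + sum(a * Lpl for a, Lpl in zip(alpha, Lp)))
        # Restart: the last direction becomes p^0 of the next cycle.
        p, Lp = [p[-1]], [Lp[-1]]
    return psi, iters

# Usage on a hypothetical nonsymmetric, negative-definite tridiagonal operator.
n = 40
A = (np.diag(np.full(n, -4.0))
     + np.diag(np.full(n - 1, 1.3), 1)
     + np.diag(np.full(n - 1, 0.7), -1))
Q = np.sin(np.linspace(0.0, np.pi, n))
psi, iters = gcr_k(lambda x: A @ x,              # operator L
                   lambda res: res / np.diag(A),  # Jacobi preconditioner P^{-1}
                   Q, np.zeros(n), k=3)
residual_norm = np.linalg.norm(A @ psi - Q)
```

Storing ℒ(*p*) alongside each direction is what keeps the cost at one operator evaluation per iteration, at the price of holding up to *k* + 1 pairs of vectors in memory.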

## APPENDIX B

### Line-Relaxation Preconditioners

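The line-relaxation preconditioners derive from an implicit Richardson iteration,

$$
\frac{e^{\mu+1} - e^{\mu}}{\Delta\tilde{\tau}} = \mathcal{L}^{z}(e^{\mu+1}) + \mathcal{L}^{h}(e^{\mu}) - r^{\nu+1}, \tag{B1}
$$

for the preconditioning step $e = \mathcal{P}^{-1}(r^{\nu+1})$ of the GCR(*k*) solver summarized in appendix A. Here, $\mathcal{L}^{h}$ and $\mathcal{L}^{z}$ are the horizontal and the vertical counterparts of the operator $\mathcal{L}$; $\Delta\tilde{\tau}$ is the pseudotime increment, bounded by the stability of the explicit $\mathcal{L}^{h}$ term [viz., linear stability analysis of (B1)]; $\mu$ numbers successive Richardson iterations; and $\nu$ numbers the outer iterations of the Krylov solver. Equation (B1) leads to a linear problem, $(\mathcal{I} - \Delta\tilde{\tau}\mathcal{L}^{z})\,e^{\mu+1} = \tilde{R}^{\mu}$, with $\tilde{R}^{\mu} \equiv e^{\mu} + \Delta\tilde{\tau}[\mathcal{L}^{h}(e^{\mu}) - r^{\nu+1}]$, that can be solved readily using the well-known tridiagonal algorithm (cf. appendix A in Roache 1972). Alternating implicit discretization between the vertical and horizontal counterparts of $\mathcal{L}$ […] $\Delta\tilde{\tau}\mathcal{L}^{h}$ to the diagonally preconditioned DuFort–Frankel type implicit algorithm, […] $\mathcal{L}^{h}$ on the grid. Note that adding the relaxation term on the rhs of (B1) has the effect of replacing the $e^{\mu}$ term with […] $e^{\mu+1}$ in $\mathcal{L}^{h}(e^{\mu})$ without complicating flux boundary conditions imposed in constructing $\mathcal{L}^{h}(e)$ (cf. Smolarkiewicz and Margolin 2000 for discussions). In the limit $\Delta\tilde{\tau}$ […]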
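The tridiagonal solve referred to in this appendix can be sketched as follows. This is a minimal Python/NumPy version of the standard forward-elimination and back-substitution (Thomas) algorithm, applied to one vertical column of a system of the form (ℐ − Δτ̃ ℒ^z) *e* = *R̃*. The second-difference stand-in for ℒ^z, the homogeneous Dirichlet ends, the function name `solve_tridiagonal`, and all parameter values are illustrative assumptions rather than the model's discretization.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    lower[i] multiplies x[i-1] (lower[0] is unused) and upper[i] multiplies
    x[i+1] (upper[-1] is unused). No pivoting is done, so the matrix is
    assumed to be diagonally dominant.
    """
    n = len(diag)
    c = np.zeros(n)   # modified upper-diagonal coefficients
    d = np.zeros(n)   # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# Usage: one column with a hypothetical second-difference operator for L^z.
nz, dz, dtau = 20, 100.0, 5.0e3          # levels, spacing, pseudotime increment
s = dtau / dz**2
diag = np.full(nz, 1.0 + 2.0 * s)        # diagonal of (I - dtau * L^z)
lower = np.full(nz, -s)
upper = np.full(nz, -s)
lower[0] = 0.0
upper[-1] = 0.0
R = np.sin(np.linspace(0.0, np.pi, nz))  # stand-in for the right-hand side
e_new = solve_tridiagonal(lower, diag, upper, R)
```

The cost is linear in the number of levels, which is why treating only the vertical direction implicitly keeps each Richardson sweep inexpensive.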

## APPENDIX C

### Coordinate Systems

[…] $\phi_{0}$, $m$ is the map scale factor and $S = m^{2}$: […]. The components ($U$, $V$) of the wind “image” are defined in terms of the scaled and rotated components of the true horizontal velocity ($u$, $\upsilon$) in local ($x$, $y$, $z$) coordinates, […] where $\lambda$ represents longitude. The ($X$, $Y$) components of the conformal coordinates are scaled and rotated according to […]. In terms of ($X$, $Y$, $z$), the Euler equations take the following form: […] where $K = (U^{2} + V^{2})/2$ is the pseudohorizontal kinetic energy per unit mass. The material derivative is written as […] ($U$, $V$, $w$) is given by […] ($U$, $V$) on the rhs is denoted by $D_{H}(U, V)$ throughout the body of the paper.

[…] $Z$. Throughout this paper we employ the classical transformation of Gal-Chen and Somerville (1975), where […] (C7). Since $Z$ in (C7) is assumed to be time independent [cf. Prusa et al. (1996) for time-dependent extensions], the resulting $Z$-coordinate system represents a nondeformable system with surfaces of constant $Z$ fixed in physical space, in contrast to various pressure-based vertical coordinates used in hydrostatic weather-prediction models. The mapping from conformal ($X$, $Y$, $z$) coordinates to curvilinear ($X$, $Y$, $Z$) coordinates is represented by the metric coefficients $G^{IJ}$ of the transformation: […]. $Z$ increases from the bottom of the model domain to the top, and the $Z$ surface corresponding to the lower boundary is conformal to the terrain. Thus, by definition, $G$ is always positive. Partial derivatives of any variable $\psi$ are to be interpreted for terrain-following coordinates as follows: […]. The contravariant vertical velocity $W \equiv \dot{Z}$ is given in terms of $G^{-1}$, $w$, $S$, $G^{13}U$, and $G^{23}V$ by […] (C10).

## APPENDIX D

### Numerical Operators

The averaging $\mu_{\zeta}$ and differencing $\delta_{\zeta}$ operators are defined as […] where $\psi$ is a dependent model variable that may be defined at the center of a grid cell or on the grid cell faces. Multiple averages are denoted by […]. The operator $D$, acting on the contravariant wind components, can be written as follows: […] $G$ of the coordinate transformation is located at grid cell vertices and $D(U, V, W)$ at cell centers. The contravariant velocity component $W$, defined in (C10), is computed from […] $D(U, V, W) = D_{H}(U, V) + \delta_{z}w$, where […]
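To make the staggered-grid notation concrete, here is a small Python/NumPy sketch of two-point averaging and differencing operators in the spirit of μ_ζ and δ_ζ. The two-point forms, the uniform grid spacing, the sample fields, and the function names `mu` and `delta` are assumptions for illustration; they are the conventional centered definitions, not a transcription of the operators used in the model.

```python
import numpy as np

def mu(psi, axis=0):
    """Two-point average along `axis`: maps n staggered values to n - 1 midpoints."""
    a = np.moveaxis(np.asarray(psi, dtype=float), axis, 0)
    return np.moveaxis(0.5 * (a[1:] + a[:-1]), 0, axis)

def delta(psi, dzeta, axis=0):
    """Two-point difference along `axis`, divided by the grid spacing `dzeta`."""
    return np.diff(np.asarray(psi, dtype=float), axis=axis) / dzeta

# Usage: faces at x_f, cell centers halfway between them (hypothetical uniform grid).
dx = 0.25
x_f = np.arange(0.0, 2.0 + dx / 2, dx)   # face positions
x_c = mu(x_f)                            # center positions
slope = delta(x_f**2, dx)                # centered difference of x^2, valid at x_c

# A double average moves a field held at cell vertices to cell centers in 2D.
g_vertices = np.add.outer(x_f, 2.0 * x_f)          # sample field g = x + 2y
g_centers = mu(mu(g_vertices, axis=0), axis=1)
```

Each application of either operator shifts the result by half a grid interval, which is why a field stored at vertices needs two averages (one per horizontal direction) before it can be combined with a quantity stored at cell centers.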

Spanwise (a) *x*-average and (b) *y*-average 1D power spectra at three height levels of the solution in Fig. 1. The thin solid line shows the “−5/3” slope for reference

Citation: Monthly Weather Review 131, 10; 10.1175/1520-0493(2003)131<2464:SPFNAM>2.0.CO;2

The rms divergence as a function of ε in Eq. (29) for the convective bubble case shown in Fig. 1. Results with both the Jacobi (solid line) and the DCT (dashed line) preconditioners are shown

Performance ratios for CPU and iteration count, as a function of ε, for the convective bubble case shown in Fig. 1.

Vertical velocity (left) *x*–*z* and (right) *x–y* cross sections through the mountain wave simulations at time step 480: (top) hydrostatic case and (bottom) nonhydrostatic case. The *x*–*y* cross sections are at *z* = 2100 m. Contour intervals are 0.01 m s^{−1} for the hydrostatic case and 0.04 m s^{−1} for the nonhydrostatic case

One-dimensional power spectra through the centers of the right-hand panels in Fig. 5. Solid and dashed lines denote the hydrostatic and nonhydrostatic case, respectively

The rms divergence and average number of GCR iterations per time step for the (a) hydrostatic and (b) nonhydrostatic mountain wave cases. Results with both the Jacobi (JAC) and DCT preconditioner are shown

The CPU time performance ratio as a function of convergence criterion ε for both the hydrostatic and nonhydrostatic mountain wave cases

Wind solution for the first model level (*Z* = 150 m) over the Scandinavian topography at 24 h with the DCT preconditioner. Topography is also shown at 500-m intervals

^{2}

Other popular alternatives include the biconjugate gradients stabilized (BiCGStab; van der Vorst 1992) and transpose-free quasi-minimal residual (TFQMR; Freund 1993) algorithms.

^{3}

Using the auxiliary “pressure” variable *q,* rather than the pressure *p* itself, facilitates isolating the 1/*ρ*∇*p* nonlinearity in the numerical-model equations, analogous to *q* = ln *p*_{s} (surface pressure *p*_{s}) in hydrostatic primitive equation models (cf. Machenhauer and Daley 1972; Bourke 1974; Robert et al. 1985; Tanguay et al. 1990). The Exner function Π = (*p*/*p*_{00})^{κ} is an alternative (Skamarock et al. 1997).

^{4}

Numerical dissipative terms absent from the analytic model (1)—absorbers, filters, subgrid-scale turbulence parameterizations, etc.—are split and integrated to first order.

^{5}

The reader interested in implicit formulations of boundary conditions in the context of Krylov solvers is referred to Smolarkiewicz and Margolin (2000) for a discussion.

^{6}

In models with *X*–*Y* cross differentiation [e.g., due to the coordinate mapping for the sake of mesh adaptivity (Smolarkiewicz and Prusa 2002)] either ADI or spectral preconditioners would require neglecting these cross terms.

^{7}

Nonperiodic boundary conditions are always homogeneous in the preconditioner (Smolarkiewicz and Margolin 2000, section 3).

^{8}

All calculations reported were performed with 64-bit real arithmetic.

^{9}

For the exact steady-state solution, the rms divergence should vanish.

^{10}

Computational complexity estimates for the various algorithms are provided by Saad (1996).

^{A1}

An operator $\mathcal{L}$ is definite if $\langle \xi\,\mathcal{L}(\xi) \rangle$ is either strictly positive (positive definite) or strictly negative (negative definite) for all $\xi$.

^{A2}

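An operator $\mathcal{L}$ is self-adjoint if $\langle \xi\,\mathcal{L}(\zeta) \rangle = \langle \zeta\,\mathcal{L}(\xi) \rangle$ for all $\xi$ and $\zeta$.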