Abstract

Fast, accurate computation of geophysical fluid dynamics is often very challenging. This is due to the complexity of the PDEs themselves and their initial and boundary conditions. There are several practical advantages to using a relatively new numerical method, the spectral-element method (SEM), over standard methods. SEM combines spectral-method high accuracy with the geometric flexibility and computational efficiency of finite-element methods.

This paper is intended to augment the few descriptions of SEM that aim at audiences besides numerical-methods specialists. Advantages of SEM with regard to flexibility, accuracy, and efficient parallel performance are explained, including sufficient details that readers may estimate the benefit of applying SEM to their own computations.

The spectral element atmosphere model (SEAM) is an application of SEM to solving the spherical shallow-water or primitive equations. SEAM simulated decaying Jovian atmospheric shallow-water turbulence up to resolution T1067, producing jets and vortices consistent with Rhines theory. SEAM validates the Held–Suarez primitive equations test case and exhibits excellent parallel performance. At T171L20, SEAM scales up to 292 million floating-point operations per second (Mflops) per processor (29% of supercomputer peak) on 32 Compaq ES40 processors (93% efficiency over using 1 processor), allocating 49 spectral elements/processor. At T533L20, SEAM scales up to 130 billion floating-point operations per second (Gflops) (8% of peak) and 9 wall clock minutes per model day on 1024 IBM POWER3 processors (48% efficiency over 16 processors), allocating 17 spectral elements per processor. Local element-mesh refinement with 300% stretching enables conformally embedding T480 within T53 resolution, inside a region containing 73% of the forcing but 6% of the area. Thereby the authors virtually reproduced a uniform-mesh T363 shallow-water computation, at 94% lower cost.

1. Introduction

Our purpose is to describe the spectral element atmosphere model (SEAM), with attention to the qualities that make it an attractive alternative to existing models. These qualities include its high-resolution accuracy, efficiency on parallel computers, and simple provision for local resolution enhancement by local mesh refinement (LMR). In particular, we show how LMR can be used to study regional dynamics within a global model, without the usual recourse to interpolation, boundary-value or boundary-flux fixing, etc. To help the reader judge the suitability of this model and the spectral-element method (SEM) to solve a given system of PDEs, we include most of the mathematical formulation details in appendixes or refer to appropriate literature. The mathematical formulation includes numerical interpolation and quadrature, coordinate maps, and representation of differential operators, etc. The research presented here extends the research presented by Taylor et al. (1997, 1998, hereafter TTI and TLT) and Fournier et al. (2000b), which formed the basis of recent extensions by Thomas et al. (2000, 2001) and Thomas and Loft (2002).

a. Background

1) Numerical solution of PDEs

Perhaps the most straightforward, intuitive methods to discretize and solve PDEs are finite-difference methods (e.g., Morton and Mayers 1994). Spectral methods (SMs) generally are more accurate, and their operator diagonalizations make implicit time steps more efficient (Gottlieb and Orszag 1977; Canuto et al. 1988; Boyd 2000). More generally, both these and other methods may be described in a variational framework, that is, as a minimization of the root-mean-square error (rmse) w.r.t. coefficients defining the approximate solution.

It is well known that in some fluid dynamics problems there is very strong interaction over a large range of spatial scales. In the case of the earth, these scales range from the planetary scale at least down to the mesoscale, five decades smaller. These interactions tend to be localized in physical and wavenumber space, yet most operational models tend to use uniform resolution. Increasing computer power will probably continue to make it more possible to simulate a larger range of scales using uniform resolution, but that power will also be needed for additional physical processes and parallel ensemble prediction. Ideally, we would like models to economize by devoting computations to regions where smaller scales play a more important role. This is the chief motivation behind LMR methods.

Geophysical problems need fields to be represented on the sphere. Spherical-harmonic functions (e.g., Mathews and Walker 1971) offer all the usual SM advantages, as well as a uniquely isotropic representation (i.e., a given triangular truncation of modes that approximate a spherical field and also approximate any rotation of that field with the same accuracy). The isotropy property is very important for building spherical symmetry into operators, and thus is also important for the success of sensitive calculations such as maintaining a balanced steady state. Although spherical harmonics are most efficient per digit of accuracy for globally smooth fields, to represent locally varying fields without “ringing,” they require many constructively or destructively interfering global modes, and thus may not be practically more efficient or accurate than a filtered fourth-order method (Spotz et al. 1998).

Spherical coordinates introduce additional difficulties. Nonlinear terms are most easily computed on grid points, so SM models usually transform between spectral and gridpoint representations during each time step. On a latitude–longitude mesh, curves of constant longitude λ converge as latitude ϕ approaches the poles. The zonal grid length

 
formula

so the Courant–Friedrichs–Lewy stability-limited explicit time step ΔtCFL → 0 also. We shall refer to this, and the fact that spherical metric factors ∝secϕ diverge, together as the spherical pole problem, part of the more general group of sphere-meshing problems.

In effecting the transform between representations, the longitudinal coordinate λ permits the use of the FFT, which costs O(Nλ logNλ) for Nλ nodes, but the latitudinal coordinate ϕ does not permit such a fast transform. The associated Legendre transform from Nϕ latitude circles normally costs O(N2ϕ) for each of the O(Nλ) modes on every circle. This calculation is the most expensive in GCMs at sufficiently high global resolution. Parallel implementation is doable but can be inefficient for large numbers of processing elements (PEs) (Foster et al. 1992; Foster and Worley 1997). This implementation has been accomplished recently, up to resolution T170L18 on 512 PEs (Worley 2002), T799L90 on 704 PEs (Hamrud et al. 2003), and T1279L96 on 5120 PEs (Shingu et al. 2002).

So there are currently at least three needs: locally increased resolution, avoiding pole problems, and efficient parallel implementation. In this paper we address all these needs.

2) Historical approaches to locally increased resolution

Historically there have been many approaches to locally increased resolution. The best known and most flexible is LMR using the finite-element method (e.g., Morton and Mayers 1994). For example, adaptive LMR has been implemented by a multigrid method and applied by Stevens et al. (1999) to large-eddy simulation and boundary layer convection. Mashkovich (1994) described a dynamic grid adaptation based on a given measure of “local space variability of the meteorological field,” for example, geopotential Laplacian. Fox-Rabinovitz et al. (2001) created a “stretched grid” regional-climate model and carefully evaluated various dependencies of the simulation quality on stretching parameters. McGregor (1999) has another approach to grid stretching that he applied to regional-climate and tracer-transport modeling. A stretched finite-element method has been successfully applied to regional-climate modeling by Tanguay et al. (1989), albeit encumbered with spurious lateral walls. Williamson and Rosinski (2000) demonstrated a treatment of the pole problem using reduced grids that vary Δλ with ϕ such that truncated spherical-harmonic series retain a specified global accuracy. Iselin et al. (2002) employ continuously moving grid points to increase local resolution and achieved 90% diffusion-error reduction for 2D linear advection of a passive tracer. They extended the method to 3D Navier–Stokes solving (Prusa et al. 2001). Recently, promising new methods have been introduced that use wavelets (Dahmen et al. 1997; Beylkin et al. 1998; Alpert et al. 2002). All of these approaches have useful qualities, but to our knowledge, few have been applied to calculations comparable in sophistication to full 3D spherical atmospheric dynamics, and of those that have been so applied, none offers all the advantages of SEM.

3) General remarks about SEM

SEM was first proposed by Patera (1984). General introductions to SEM include the review by Maday and Patera (1989) and the books by Funaro (1997) and Karniadakis and Sherwin (1999). It should be noted that SEM is technically slightly less accurate than SM in terms of formal convergence estimates. However, in all our numerical experiments, including both traditional and new test cases, comparative accuracy has never indicated preferring SM over SEM. In general, differential, integral, and other operators are not diagonal in the SEM representation; rather, differential operators are block diagonal plus a global continuity constraint that amounts to a block tridiagonal of block rank one. Besides avoiding the spherical pole problem, SEM enjoys a mathematical structure that is ideal for implementation on massively parallel processing computers (see section 2c).

4) Other uses of SEM

Recently SEM has been successfully applied in a number of geophysics areas (Iskandarani et al. 2002), including atmosphere (Giraldo and Rosmond 2004) and ocean (Haidvogel and Beckmann 1999) circulation and seismology (Capdeville et al. 2002; Komatitsch et al. 2002), as well as such diverse applications as flame simulation (Feng and Mavriplis 2002), mechanical vibrations (Wang and Wereley 2002), amorphous film growth (Hoppe and Nash 2002), and others. Blackburn (1998) showed that SEM may be used for large-eddy simulation. Approaches to dynamically adaptive SEM are described by Henderson (1999), Barosan et al. (2001), Fournier (2001), Feng and Mavriplis (2002), and Fournier et al. (2003). There is also a wavelet-based SEM by Canuto et al. (2000).

b. Review of atmospheric-circulation models

1) Shallow-water models

The shallow-water equations (SWEs; e.g., Pedlosky 1987) are a very useful test system for numerical methods for spherical geophysical fluid dynamics because their solutions include nonlinear effects and wave structures similar to those of the full primitive equation (PrE) system. Let units of measure be such that the earth radius, angular rotation frequency, and gas constant are all unity. Then the SWE may be written as Newton's second law:

 
formula

combined with redistribution of mass geopotential Φ:

 
t
(2)

where 𝒩Φ ≡ (Φ − Φs) + fΦ includes flux Φ ≡ −∇ · Φu of Φ by fluid velocity u, as well as effects of surface topography ∝Φs. Various standard test cases and experiments may provide physical-process parameterizations (fu, fΦ). Subgrid-scale processes can optionally be included by the simple turbulence closure model 𝒟μ2, although in practice we usually set the eddy diffusivity μ = 0. Other symbols are standard or are defined in appendix A, including relating the usual spherical coordinates (λ, ϕ) to general nonorthogonal 2D coordinates x and defining differential operators.

2) The primitive equations

We discretize the PrE in the normalized-pressure vertical coordinate σ ∈ [0, 1], following Williamson et al. (1987). This yields Nσ levels of stacked, coupled SWE, with surface-pressure tendency ∂tP replacing ∂tΦ:

 
formula

where subscript k ∈ {1, … Nσ} denotes evaluation on surface σk of pressure σkP and thickness Δσk. Vertical advection is discretized to level k by

 
formula

Optional vertical hyperdiffusion uses a fourth difference with symmetrized boundaries:

 
formula

The additional temperature field T has a tendency driven by diabatic heating fT:

 
formula

Geopotential Φ is diagnosed by hydrostasis:

 
formula

For a vertically integrated column of fluid, mass conservation diagnoses the two vertical motions:

 
formula

The boundary conditions are just σ̇1/2 = σ̇Nσ+1/2 = 0.

2. Numerical method

a. Gnomonic mapping of sphere to cube

Figure 1 shows gnomonic (from the center, like a γνώμων = sun dial pointer) projections. Lines on the unit cube

 
formula

are mapped to curves on the unit sphere {r : |r| = 1}. It is convenient to discuss these projections first at the level of cube faces s ∈ {0, … 5} (associated with maps λs in this section) and then at the level of quadrangular elements , ℓ ∈ {1, … Ngh} that uniformly tile the cube (maps ϑ, section 2b). We refer the reader to appendixes B and C for mathematical details.

Fig. 1.

Cubed-sphere mesh, with continental outlines (gray), element boundaries (thick black), and node lines (thin black curves) for Nh = 4, Np = 6. Alphabetic labels are discussed in the text

Fig. 1.

Cubed-sphere mesh, with continental outlines (gray), element boundaries (thick black), and node lines (thin black curves) for Nh = 4, Np = 6. Alphabetic labels are discussed in the text

In Fig. 1, points A, D, and G indicate cube-face-tangent coordinates (λs, ϕs) [see appendix C, Eq. (C1)], for faces s = 0, 3, and 4, respectively. Unfolding the cube along the equator and Greenwich meridian maps the face indexes to

 
formula

and the Fig. 1 labels to (λ, ϕ) pairs as in Fig. 2.

Fig. 2.

Mapping under λs/π (C2) of nine points AI in Fig. 1 and similar points. Lon–lat pairs (λ, ϕ)/π are shown schematically on the cube faces.

Fig. 2.

Mapping under λs/π (C2) of nine points AI in Fig. 1 and similar points. Lon–lat pairs (λ, ϕ)/π are shown schematically on the cube faces.

All lines of constant x component on a cube face are projected by λs [see appendix C, Eq. (C2)] to great-circle segments on the sphere. For example, in Fig. 1 the images in faces s = 0, 3, and 4 of the x1 axis are the arcs AB, DE, and GH, and of the x2 axis, the arcs AC, DF, and GI, respectively. The cube edges are mapped by λs to the Fig. 1 curves as detailed in appendix C. The cube-edge curves are punctuated by the x-coordinate-line cusps in the (λ, ϕ) maps (as will be seen in the left panels of Fig. 10 in section 3). Alternative maps are described by Purser and Rančić (1998) and references therein.

b. Gauss–Lobatto–Legendre interpolation and quadrature

The next mathematical tools that must be employed to implement SEM are Gauss–Lobatto–Legendre collocation and quadrature. These are built using the nodes ξj and interpolating polynomials ϕj(ξ) (appendix B), shown in Fig. 3 for degree p = 7. Let j denote the space of polynomials of degree ≤j; then, an important classical result is that continuum integrals of the form

 
formula

may be evaluated exactly by the discrete sum

 
formula

over the nodes, provided that the integrand satisfies the condition u2p−1. In section 2c, this result leads to a collocation method, that is, an approximation of the continuum equations by equations solved only at the ξj. Conversely, using the interpolation operation u (appendix B), one estimates u(ξ) at arbitrary ξ ∈ [−1, 1], given only the node values u(ξj). Derivatives are evaluated at the nodes by Eq. (B2) (see appendix B)

 
formula
Fig. 3.

Cardinal functions (from bottom to top) ϕ7, ϕ1, ϕ5, and ϕ3 for p = 7, vertically offset by 0, 1/2, 1, and 3/2, respectively, as a function of ξ. The square markers indicate ξj, j ∈{0, … 7}. Other functions are derivable by ϕj(ξ) = ϕ7−j(−ξ).

Fig. 3.

Cardinal functions (from bottom to top) ϕ7, ϕ1, ϕ5, and ϕ3 for p = 7, vertically offset by 0, 1/2, 1, and 3/2, respectively, as a function of ξ. The square markers indicate ξj, j ∈{0, … 7}. Other functions are derivable by ϕj(ξ) = ϕ7−j(−ξ).

In 2D, interpolation commits an error at any point ξ, due to the parts ∉p of u in both directions, given classically by

 
formula

where the 1D residual operator is

 
formula

and ξ′ and ξ″ are two ξ-dependent points in the standard element ≡ {ξ : ξα ∈ [−1, 1]} ≡ [−1, 1]2 (Isaacson and Keller 1994, section 6.6).

We then use maps ϑ to the “physical” cube-face coordinates x from the “standard” coordinates ξ, as described in appendixes B and C. Each map ϑ is defined by a quadrangular element ; that is, by its four corners located at x = cℓ,k, k ∈ {0, … 3}. The global index ℓ ∈ {1, … Ngh} labels elements. Taking uniform subdivision of each cube face s as an example, then ℓ runs over rows and columns of length Nh, the number of elements along each cube edge: ℓ = 1 + lrow + Nhlcol + N2hs. In Fig. 1 the points A, D, and G are the centers of faces s = 0, 3, and 4 and (since Nh = 4) are the center cross points of sets

 
formula

of element indexes ℓ, respectively.

c. Integral form and collocation of the governing equations

A variational method consists of minimizing rmse, in the form of certain domain integrals, w.r.t. the unknown fields on which the integrands depend. Specifically, an integrand ɛ2h,p is the squared residual error between evaluations at true fields and at fields computed with polynomials of degree p in elements of size h. As we now show, SEM enables a variational method to be coded as a collocation method, potentially with spectral accuracy (rmse decaying with increasing p exponentially, i.e., faster than any power pq) for sufficiently smooth true solutions. For example, if ɛh,p is the SEM deviation from true solution u of the 1D forced Helmholtz equation with frequency Λ, then it is bounded as

 
formula

for some constant Cq (e.g., Karniadakis and Sherwin 1999, section 2.3.6), assuming that the forcing f is bounded as ‖(d/dx)q−1f2 < ∞. The exponent q measures u smoothness and q = ∞ if u and f are analytic, in which case rmse decays exponentially. De Frutos and Novo (2000) extended a similar result to the case of incompressible Navier–Stokes equations.

To derive the SEM collocation, the governing equations (1)–(2) or (3)–(8) are all multiplied by global test functions υ(λ, ϕ) and integrated in the horizontal, with two changes of variables defined in appendix C. The first change of variables is to (λ, ϕ) by λs from coordinates x in cube face s. The second is to x in each element ℓ′, by ϑℓ′ from the standard coordinates ξ. In other words, taking the term a to represent any expression, then a and the map compositions aλs and aλsℓ′ϑℓ′ represent the same expression with arguments (λ, ϕ), x, and ξ, respectively, where s denotes the cube face containing element ℓ. The integral form of any equation is partitioned into sums over six cube faces, then sums over the Ngh elements:

 
formula

introducing Jacobian factors into the integrands for both changes of variables.

Now let υυn,0 be continuous over the whole sphere and vanish at every node except node n in element ℓ, denoted by λn,λs(xn,), and possibly at identical nodes λn′,ℓ′ = λn, in abutting elements ℓ′ ≠ ℓ. To make this concrete, let us briefly consider the case of 1D intervals [cl, cl+1[ with maps x = ϑl(ξ) ≡ 2−1[cl + cl+1 + (cl+1cl)ξ]. The desired 1D test function υj,l(x) is given by υj,lϑl = ϕjδl,l for j ∉ {0, p}, and since ϕj(ξ) = ϕpj(−ξ), ensuring υj,l0 simply amounts to prepending west-edge reflections υ0,lϑl = ϕpδl−1,l + ϕ0δl,l, or appending east-edge reflections υp,lϑl = ϕpδl,l + ϕ0δl+1,l. To generalize this 1D reflection to the cubed sphere, there are two cases to consider: boundary nodes on the four element edges,

 
formula

and internal nodes. If xn, is an internal node then it is unique, and υn,0 is an interpolating-polynomial tensor product inside and vanishes outside :

 
formula

where Np is the number of nodes along an element edge. But if xn, is on the boundary of Nb neighboring elements with connectivity index sets b (containing nodes that are shared) and b (containing elements that share those nodes) as described in appendix D, then in order to ensure υn,0, υn, is also nonzero in all those neighbors,

 
formula

Inserting such a test function υn, and evaluating the integrals by the quadrature (9) eliminates coupling between distinct nodes and so generates a diagonal mass matrix δn,nδℓ,ℓ′Mn, (appendix C). Then terms such as (13) reduce to Mn,a(λn,ℓ, ϕn,), so effectively we arrive at the original dynamics (1)–(2) or (3)–(8) collocated at discrete nodes λn,. Of course there is a quadrature-error term such as (11) due to the part ∉2p−1 of each integrand of (13).

Continuity is enforced at the bth of all the Nbn shared boundary nodes, using the element-connectivity sets b, b, by applying the continuity-projection operation Φ → 𝒫0Φ, defined by

 
formula

The connectivity is also input to the software package METIS (http://www-users.cs.umn.edu/~karypis/metis) in order to allocate spectral elements to PEs.

d. Explicit time stepping and spatiotemporal filtering

Referring to (2) as a paradigm for the complete dynamics, SEAM employs second-order leapfrog time stepping for the nondiffusive nonlinear forced part (⁠𝒩⁠) and forward Euler if there is horizontal or vertical diffusion (⁠𝒟⁠):

 
formula

Here the subscript R denotes a 2Δt Robert filter,

 
formula

and BV denotes τf-periodic Boyd–Vandeven filtering [see appendix B, Eq. (B3)],

 
formula

This filtering controls both instabilities due to aliasing or possibly other nonlinear effects and any spurious modes. It acts by progressively damping components of higher Legendre spectral index. See appendix B or TTI for more details.

e. Implementing LMR

SEAM offers a major advantage by enabling LMR. Another strength is that LMR entails no code modifications other than reconstructing elements and a one-time PE reallocation using METIS. LMR is implemented as follows. One defines refinement matrices 𝗥m,d that transform the square-element matrix 𝗰 (as defined in appendix C) to several irregular quadrangles 𝗰𝗥m,d for m ∈ {1, … Ld} by subdividing each element edge k into dk equal pieces. The refinement satisfies

 
formula

(neglecting shared boundaries) for any (ℓ, d), so that every x is still contained exactly once. Note that Ngh is increased by Ld − 1 for each element to be refined, and that locally h and therefore globally ΔtCFL are reduced by roughly a factor of

 
formula

typically ∈{½, ⅓}. The overall effect resembles picture framing, which is what we call it. It is a simple first approach to conformal LMR, and is only adaptive to the extent that the user exploits a priori information about localized forcing, topography, etc. For production codes we would suggest investigating the more sophisticated packages such as CUBIT (http://endo.sandia.gov/cubit). For nonconformal LMR, we refer to Funaro (1997), Karniadakis and Sherwin (1999), Fournier (2001), Feng and Mavriplis (2002), Fournier et al. (2003), and references therein. The latter two documents also include discussions of dynamic adaptivity. Table 1 lists the 𝗥m,d matrices for the refinements discussed in section 3d.

Table 1.

Refinement transformations. Overbar denotes repeated decimal. Note that each column sums to one. To preserve integrals, is symmetric, where  Sk ,kδk ,k ′+1 mod 4

Refinement transformations. Overbar denotes repeated decimal. Note that each column sums to one. To preserve integrals,  is symmetric, where  Sk ,k ′ ≡ δk ,k ′+1 mod 4
Refinement transformations. Overbar denotes repeated decimal. Note that each column sums to one. To preserve integrals,  is symmetric, where  Sk ,k ′ ≡ δk ,k ′+1 mod 4

f. Performance estimates for the method

SEAM's high parallel performance is due to the connectivity structure (which simplifies PE communication) and the explicit time stepping (which requires no operator inversions). Only data on the element boundary nodes that are shared between PEs (identical physical locations mapped to elements on different PEs) need to be communicated between those PEs. This occurs in (14). All computations on individual elements are just size N2p dense matrix multiplies, such as (10) or (15), etc., that easily fit in a modern PE cache. Thus we have very low surface-to-volume (shared-to-local) data ratio, which is very efficient. Finite-difference schemes can also be very efficient but are usually low order, whereas SEM affords very high order p (up to p = 31 in TTI). We note that compact-difference schemes (e.g., Chang and Shirer 1985) also afford somewhat high order (typically fourth or sixth), using stencils that can improve parallel efficiency (e.g., Dixon and Tan 2003).

Upon model initialization, the Legendre-transform representations such as (10) or (15) are built into the operators that act on function values at collocation points λn,. Unlike the most common parallel implementations of SM models, there are no run-time transformations between point values and spectral coefficients, thus, no data transpositions between coordinates in different directions and no global communication for that purpose. This is another factor contributing to SEAM's relative efficiency.

The stability-limited time step ΔtCFL = O(N−1hN−2p). This estimate includes the contributions to spatial discretization length that come from element size h (Nh ≡ 2h−1) and from node clustering (Np = p + 1). Computation costs O(N2hN3p) per time step, including O(N2h) to touch every element once and O(N3p) for matrix operations in any element. Thus, as a trend, “h refinement” (decreasing h; algebraic convergence) costs less and constrains ΔtCFL less than does “p refinement” (increasing p; exponential convergence), referring to (12). However, as observed by an anonymous reviewer, p refinement can be better, depending on solution properties and desired accuracy. SWE experiments by TTI show that the greatest computational efficiency is obtained for Np ∈ {8, … 16}, and Nh increased as needed for accuracy. In contrast, the cost for SM models with O(Ns) zonal and O(Ns) polar wavenumbers is O(N3s). If Np is fixed, then at high-enough resolution SEM can cost less per degree of freedom (dof) than do fully SM models. For example, taking into account a fully SM model's larger Δt (apparently a factor of 10 for Np = 8) and all prefactors, the complexity analysis by TTI shows that the SEM solver becomes more cost effective per grid point at resolutions T ≥ 169. The larger Δt for semi-implicit stepping was a motive for the improvements by Thomas et al. (2000, 2001) and Thomas and Loft (2002).

3. Numerical experimental results

a. High-resolution Jupiter SWE simulation

Using the SWE version of the SEAM, we carried out integrations of decaying turbulence on Jupiter with very high resolution and very weak dissipation (Baer et al. 2001). Four different truncations are used ranging from Te171 to a maximum of Te1067, where in order to compare with SM models, we let Te denote the roughly equivalent dealiased triangular spectral resolution of SEAM. (The value Te counts the 4NhNp equatorial nodes and ÷3; thus, Te1067 represents 6N2h = Ngh = 60 000, N2p = 8 × 8, that is, 3200 equatorial latitude-circle points.) SEAM ran very efficiently on the Cray T3E with 128 PEs (Table 2) and at that resolution produced multiple jets from pole to pole as anticipated by Rhines. The intensity of the jets, as well as the developed vortices, may be seen in Fig. 4. The jet-streak number and vortex intensity increase as resolution increases. It was encouraging to find from these initial experiments that SEAM is capable of solving SWE efficiently at exceptionally high resolution on a massively parallel computer. Indeed there are few if any reports of other global models capable of running at the high resolution reported here (Iacono et al. 1999; Yoden et al. 1999; Shingu et al. 2002).

Table 2.

Description of supercomputers. The ratio of peak Mflops to CPU speed is just the number (2 or 4) of floating-point operations each PE can perform per cycle. Bold font indicates supercomputers used for SEAM experiments. Additional information sources are listed in the footnotes

Description of supercomputers. The ratio of peak Mflops to CPU speed is just the number (2 or 4) of floating-point operations each PE can perform per cycle. Bold font indicates supercomputers used for SEAM experiments. Additional information sources are listed in the footnotes
Description of supercomputers. The ratio of peak Mflops to CPU speed is just the number (2 or 4) of floating-point operations each PE can perform per cycle. Bold font indicates supercomputers used for SEAM experiments. Additional information sources are listed in the footnotes
Fig. 4.

Decaying Jovian shallow-water turbulence simulation (using 128 PEs of the Cray T3E; Table 2). Potential vorticity contours at Jovian time t = 276 days at approximate resolutions as labeled.

Fig. 4.

Decaying Jovian shallow-water turbulence simulation (using 128 PEs of the Cray T3E; Table 2). Potential vorticity contours at Jovian time t = 276 days at approximate resolutions as labeled.

b. SEAM PrE trials with Held–Suarez forcing

After the SWE experiments reported by TTI and Baer et al. (2001), SEAM has also been applied to PrE, as described by TLT and Fournier et al. (2000a,b). The first step in validating a PrE solver as a prospective GCM dynamical core (dycore) is to pass the test of Held and Suarez (1994, hereafter HS), as reported for SEAM by TLT. SEAM results, which derive from final 1000-day time averages from 1200-day runs, are shown in Figs. 5 and 6. We present these diagnostics mainly to validate SEAM by showing it produces virtually identical results to those of a spherical-harmonic SM (SHSM) model. This is not surprising since the models use a similar vertical discretization, and the horizontal discretizations have been shown by TTI to agree to several digits.

Fig. 5.

Eddy kinetic energy (ordinate; m2 s−2) vs maximum zonal wavenumber (abscissa) (after Fig. 2 of TLT). Values for a finite-difference model (triangles) and a SHSM model (squares) are taken from HS. Results from SEAM (circles) are in close agreement with the SHSM model.

Fig. 5.

Eddy kinetic energy (ordinate; m2 s−2) vs maximum zonal wavenumber (abscissa) (after Fig. 2 of TLT). Values for a finite-difference model (triangles) and a SHSM model (squares) are taken from HS. Results from SEAM (circles) are in close agreement with the SHSM model.

Fig. 6.

The zonal mean uλ (top; 4 m s−1 contours) and T eddy variance (bottom; 5 K2 contours) as a function of ϕ (abscissa) and σ (ordinate) (Figs. 3 and 4 of TLT). The mean is computed from the last 1000 days of a 1200-day run at 157e-km resolution with Nσ = 20. The forcing is symmetric about ϕ = 0, so that differences between the hemispheres indicate the mean variability.

Fig. 6.

The zonal mean uλ (top; 4 m s−1 contours) and T eddy variance (bottom; 5 K2 contours) as a function of ϕ (abscissa) and σ (ordinate) (Figs. 3 and 4 of TLT). The mean is computed from the last 1000 days of a 1200-day run at 157e-km resolution with Nσ = 20. The forcing is symmetric about ϕ = 0, so that differences between the hemispheres indicate the mean variability.

Figure 5 shows the eddy kinetic energy as a function of zonal resolution for SEAM for a gridpoint model and a SHSM model. For the SHSM and gridpoint models the results were taken from HS, which contains a more complete description of them. One can see that the SM model and SEAM have very similar results.

Figure 6 shows the zonal mean wind (top) and temperature eddy variance (bottom) from SEAM. Both of these plots are shown for the gridpoint model and SM model in HS. Consulting that paper shows that all the models produce similar results. We conclude that the SM model and SEAM results are almost identical and differ slightly from the results of the finite-difference model.

c. Experimental computational performance results

One advantage of the dycore benchmark of HS over the shallow-water test cases is that it represents a complete component of a GCM. It allows us to compute meaningful performance measures since the efficiency of a dycore on this benchmark directly measures how efficient it will be when inserted into a GCM. For all these tests SEAM uses Nσ = 20 vertical levels (denoted by L20), and all cases use an N2p = 8 × 8 quadrature mesh within each element, corresponding to local spectral representation up to degree p = 7. For this configuration SEAM uses less than 1 kbyte of cache per dependent variable, per element, per level. The global resolution varies only with Ngh.

Various supercomputer systems are described in Table 2. Parallel performance on these systems may be quantified in terms of the efficiency Ef,I in increasing the rate R(I) in Gflops when the number I of PEs is scaled up by a factor f > 1. We may conveniently define it as Ef,IR(fI)/fR(I).

In Fig. 7a, we present the total Gflops rates obtained for Te171L20 (77e km) or Te533L20 (24e km) resolution on up to 1024 PEs, where subscript e on a value denotes resolution length estimated by the average node separation:

 
formula

The results of Schaffer and Suárez (2000) are inserted for comparison. At resolution Te533L20, using IBM POWER3 PEs, SEAM achieves a computation rate of R(1024) = 130 Gflops with 17 spectral elements allocated per PE. This corresponds to 8% of peak Gflops and efficiency E64,16 = 48%. The scalability and efficiency of the method is better illustrated in Fig. 7b by R(I)/I, the Mflops per PE. At resolution Te171L20, SEAM achieves 292 Mflops per PE (29% peak) on 32 Compaq ES40 PEs (E32,1 = 93%) with 49 elements allocated per PE. We also may infer from TLT that R(64) on the HP SPP2000 varied by ≤19% for resolutions 2 or 4 times coarser than Te171, which is too little variance to show clearly in Fig. 7. Overall, SEAM Mflops per PE rates and model-state size are comparable to the results of Anderson et al. (1999) and Gropp et al. (2001), who use unstructured tetrahedral control volumes to implicitly solve for Euler flow.

Fig. 7.

Parallel computation performance of SEAM and other dycores at various resolutions and on various supercomputers described in Table 2, as indicated in the figure legend by CPU make and CPU speed (MHz). (a) Gflops (log-scale ordinate) achieved by SEAM and Aries (Schaffer and Suárez 2000) vs number of PEs (log-scale abscissa).

Fig. 7.

Parallel computation performance of SEAM and other dycores at various resolutions and on various supercomputers described in Table 2, as indicated in the figure legend by CPU make and CPU speed (MHz). (a) Gflops (log-scale ordinate) achieved by SEAM and Aries (Schaffer and Suárez 2000) vs number of PEs (log-scale abscissa).

More important than Mflops is the actual time to solution. These numbers are given in Fig. 7c. Compared to other models, SEAM at resolution 77e km on the SGI Origin 2000 is slightly faster per dof than the speed reported by DAO (2000), who discuss runs at roughly 250-km resolution on the SGI Origin 2000, of the model by Lin and Rood (1999). Also on this computer, SEAM's E2,16 = 90% is about 15 points above the efficiency inferred from DAO (2000). Figure 7c shows that SEAM performs at least 13 times faster per level on thirty-two 200-MHz IBM CPUs than do the National Center for Atmospheric Research (NCAR) Community Atmosphere Model 2.0 (CAM2.0) dycores (Eulerian, semi-Lagrangian, and finite-volume) and the Globel Model Europa (GME) dycore, on forty 375-MHz IBM CPUs at similar high horizontal resolutions. It should be noted that the three NCAR CAM2.0 dycore and the GME dycore runs were only compiler optimized and that their timings include initialization and input/output and hence are only representative (C. Jablonowski 2003, personal communication).

Table 3 shows E2,I for all the Fig. 7 SEAM data. For the Te533L20 problem using IBM POWER3 PEs, note that E2,I decreases from 98% gradually to 69% as I increases from 64 to 512, indicating very good performance for a fixed-size problem. For Te171L20 on every system, the performance E2,I ≥ 78%.

Table 3.

Parallel performance efficiency E 2, I (%) for doubling number I of PEs (right columns) for various supercomputers in Table 2, with rows sorted by CPU and model resolution. Values are inferred for Aries from Schaffer and Suárez (2000, their Fig. 3), for fv [finite-volume dycore of Lin and Rood (1998)] from Sawyer (2001, p. 27), and for geo (geodesic grid) from Randall et al. (2002, their Fig. 8). Values ≥90% are in bold face

Parallel performance efficiency E 2, I (%) for doubling number I of PEs (right columns) for various supercomputers in Table 2, with rows sorted by CPU and model resolution. Values are inferred for Aries from Schaffer and Suárez (2000, their Fig. 3), for fv [finite-volume dycore of Lin and Rood (1998)] from Sawyer (2001, p. 27), and for geo (geodesic grid) from Randall et al. (2002, their Fig. 8). Values ≥90% are in bold face
Parallel performance efficiency E 2, I (%) for doubling number I of PEs (right columns) for various supercomputers in Table 2, with rows sorted by CPU and model resolution. Values are inferred for Aries from Schaffer and Suárez (2000, their Fig. 3), for fv [finite-volume dycore of Lin and Rood (1998)] from Sawyer (2001, p. 27), and for geo (geodesic grid) from Randall et al. (2002, their Fig. 8). Values ≥90% are in bold face

Figure 8 shows the percentage of total wall clock time spent on message passing between PEs of the HP SPP2000 (Table 2), for three different horizontal resolutions: Ngh = 96, 384, and 1536 elements, equivalent to spectral resolutions of Te43, Te85, and Te171, and average grid spacing of 319e, 157e, and 77e km, respectively. As expected, message passing takes a steadily greater proportion of time as we keep the resolution (problem size) fixed while using more and more PEs. However, message passing also takes a lesser proportion of time as we increase the problem size while using a fixed number of PEs. This is due to smaller surface-to-volume ratio of data to be communicated, since each PE acquires more spectral elements as resolution is increased. The higher-resolution curves are close together, suggesting that performance becomes less sensitive to Ngh. As observed by an anonymous reviewer, this implies that SEAM's scalability is a simple function of NghN2pNσ and communication costs.

Fig. 8.

Percentage of total SEAM run time spent message passing (ordinate) for three resolutions as indicated in the legend vs number of PEs (log-scale abscissa) (after Fig. 7 of TLT).

Fig. 8.

Percentage of total SEAM run time spent message passing (ordinate) for three resolutions as indicated in the legend vs number of PEs (log-scale abscissa) (after Fig. 7 of TLT).

d. Static LMR in SWE

The shallow-water test cases (Williamson et al. 1992) were used to demonstrate the accuracy and efficiency of SEAM, as reported by TTI. Another success of SEAM is to incorporate LMR, presented here.

Figure 9a (left) shows the simulation of test case 4 (“forced nonlinear system with a translating low”) of Williamson et al. (1992) using a uniform mesh with Ngh = 150 (resolution Te53, 242e km). One way to correct the errors (right) near the low pressure feature is to uniformly triple the resolution, yielding 1350 elements (Te160, 81e km), as shown in Fig. 9b. A more cost-effective way (Fig. 9c) is to use a locally refined Ngh = 442 element mesh, whose resolution varies between those of the other two meshes, using successive 300% local stretching factors. As seen in Table 4, a small loss of accuracy can be tolerated to compute the solution with the same Δt and a savings in effort in proportion to the numbers Ngh of elements, or 442/1350 ⇒ 67% savings.

Fig. 9.

Element-mesh refinement in Northern Hemisphere. SEAM applied to test case 4 of Williamson et al. (1992) with u0 = 20 m s−1, t = 5 days, contour interval (left) 50 m for height and (right) 1 m for error. Every element contains an N2p = 8 × 8 node mesh (not shown). Element numbers and time steps (Ngh, Δt) are (a) (6 × 52; 90 s) and (b) (6 × 152; 30 s) for uniform meshes and (c) (442; 30 s) for a locally refined mesh

Fig. 9.

Element-mesh refinement in Northern Hemisphere. SEAM applied to test case 4 of Williamson et al. (1992) with u0 = 20 m s−1, t = 5 days, contour interval (left) 50 m for height and (right) 1 m for error. Every element contains an N2p = 8 × 8 node mesh (not shown). Element numbers and time steps (Ngh, Δt) are (a) (6 × 52; 90 s) and (b) (6 × 152; 30 s) for uniform meshes and (c) (442; 30 s) for a locally refined mesh

Table 4.

Error norms relative to field norms (Williamson et al. 1992, section 3.1) for test cases 4 (vs true solution) and 5 (vs N gh = 6 × 34 2 run with Δ t = 20 s)

Error norms relative to field norms (Williamson et al. 1992, section 3.1) for test cases 4 (vs true solution) and 5 (vs N gh = 6 × 34 2 run with Δ t = 20 s)
Error norms relative to field norms (Williamson et al. 1992, section 3.1) for test cases 4 (vs true solution) and 5 (vs N gh = 6 × 34 2 run with Δ t = 20 s)

Another time to use LMR is when localized topography is important. In Fig. 10a (center) we show SWE flow forced by a smooth topographic feature that is 76% (by mass) spatially localized to 6% of the sphere (Williamson et al. 1992, their test case 5: “zonal flow over an isolated mountain”). Note that many finescale solution gradients have been corrupted by ringing (spectral-truncation oscillations) due to insufficient resolution. These spurious oscillations weakly persist at higher resolution (Fig. 10b). By locally refining the element mesh (Fig. 10c), nearly the same accuracy solution (Table 4) as a highest-resolution computation (Fig. 10d) can be computed with only 310/(6 × 342) ≈ 1/22 the number of elements and 3/4 the time step (Fournier et al. 2000b). This implies only 1/17 the computational cost, or a 94% savings. Additionally there is no degradation of the global solution, as can be seen in Figs. 10e–h.

Fig. 10.

(center) Zoom view of test case 5 (Williamson et al. 1992) zonal-deviation height field at t = 15 days (selected 2-m contours, resting depth 5 km). (left) Spectral-element mesh and 500-m contours of a 3-km-tall mountain. The N2p = 82 node mesh in every element is not shown. (right) Error fields w.r.t. a Ngh = 6 × 342 run (1-dm contours). Resolution Ngh varies over (a) 6 × 92, (b) 6 × 172, (c) 310 (a locally refined mesh), and (d) 6 × 342. (e)–(h) As in (a)–(d), but for the full domain

Fig. 10.

(center) Zoom view of test case 5 (Williamson et al. 1992) zonal-deviation height field at t = 15 days (selected 2-m contours, resting depth 5 km). (left) Spectral-element mesh and 500-m contours of a 3-km-tall mountain. The N2p = 82 node mesh in every element is not shown. (right) Error fields w.r.t. a Ngh = 6 × 342 run (1-dm contours). Resolution Ngh varies over (a) 6 × 92, (b) 6 × 172, (c) 310 (a locally refined mesh), and (d) 6 × 342. (e)–(h) As in (a)–(d), but for the full domain

Figure 9c also illustrates the use of the refinement matrices 𝗥m,d. The refinement from the tropical to the extratropical zone uses a zone with L3,2,1,2 = 7 refined elements dividing each length h from the south (k = 2) by d0 = 3 going north and d1 = d3 = 2 going east and west. There follows a uniformly refined zone of L3,3,3,3 = 9 to the north. The less-resolved polar zone is alternately connected using the same L3,2,1,2 toward the cube edges, or else L3,3,2,2 = 6 refined elements toward the cube corners. Finally, in Figs. 10c and 10g we use these refinements again, in addition to a refinement by L2,2,1,1 = 3 elements.

4. Concluding remarks

We conclude by summarizing the main features of the spectral-element method and its application to the Spectral Element Atmosphere Model (SEAM).

  • SEAM's cubed-sphere coordinates eliminate the pole problem (Fig. 1). That is, there are no mesh-point clustering, worsening Courant–Friedrichs–Lewy stability restriction, or coordinate singularities as latitude |ϕ| → π/2, as there are in explicit spectral-transform codes on latitude–longitude grids.

  • SEAM produces ultra-high-resolution shallow-water simulation of Jupiter's atmosphere (Fig. 4).

  • SEAM validates the dynamical-core tests of Held and Suarez (1994) (Figs. 5 and 6).

  • The subdivision of the problem domain across elements lends itself naturally to efficient parallel computation when using explicit time stepping. SEAM uses less than 1 kbyte of processing element (PE) cache per dependent variable, per element, per level. Solving the primitive equations at resolution T171L20, SEAM achieved 292 Mflops (106 floating-point operations per second) per PE (29% peak) on 32 Compaq ES40 PEs (efficiency E32,1 = 93%) with 49 elements allocated per PE (Fig. 7b). At T533L20, SEAM hit 130 Gflops (8% peak) on 1024 IBM POWER3 PEs (efficiency E64,16 = 48%) with 17 spectral elements allocated per PE (Fig. 7a). SEAM compares well against several other dynamical cores w.r.t. wall clock time per model day (Fig. 7c).

  • Local mesh refinement is practical and can afford significant computational savings, while focusing higher resolution on regions of dynamic and/or climatological interest (Figs. 9 and 10).

  • It is not easy to construct semi-implicit or semi-Lagrangian time-stepping algorithms that are efficient on parallel machines, but this has been recently accomplished by our collaborators Thomas and Loft (2002). Their semi-implicit scheme used a Helmholtz solver to time march 30 decoupled shallow-water layers roughly twice as quickly as did their explicit scheme, at ≈T85 on up to 384 IBM-SP PEs, although the average solver iteration count grew by ≈40% as resolution was doubled.

  • Most recently, H. Wang et al. (2003, unpublished manuscript) are coupling SEAM to the NCAR Community Climate System Model.

Fig. 7.

(Continued) (b) As in (a), but for Mflops per PE vs number of PEs.

Fig. 7.

(Continued) (b) As in (a), but for Mflops per PE vs number of PEs.

Fig. 7.

(Continued) (c) As in (a), but for simulation rate (left) in seconds and (right log-scale ordinate) in minutes, per model day, achieved by SEAM, Aries, NCAR CAM2.0 (Eulerian, semi-Lagrangian, and finite-volume), and GME dycores, vs number of PEs (log-scale abscissa)

Fig. 7.

(Continued) (c) As in (a), but for simulation rate (left) in seconds and (right log-scale ordinate) in minutes, per model day, achieved by SEAM, Aries, NCAR CAM2.0 (Eulerian, semi-Lagrangian, and finite-volume), and GME dycores, vs number of PEs (log-scale abscissa)

Acknowledgments

We thank C. Jablonowski, M. J. Suárez, and P. H. Worley for providing and discussing data and two anonymous reviewers for thorough reading and many useful comments. Our thanks go to F. Baer for guidance and support and to S. Levis for etymological assistance. This research was supported by the Office of Science (BER), U.S. Department of Energy, Grants DEFG0298ER62612 and DEFG0201ER63247 to the University of Maryland. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract DE-AC03-76SF00098. Acknowledgment is made to the National Center for Atmospheric Research for computing time used in this research. The National Center for Atmospheric Research is operated by the University Corporation for Atmospheric Research under sponsorship of the National Science Foundation. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

REFERENCES
Alpert
,
B.
,
G.
Beylkin
,
D.
Gines
, and
L.
Vozovoi
,
2002
:
Adaptive solution of partial differential equations in multiwavelet bases.
J. Comput. Phys.
,
182
,
149
190
.
Anderson
,
W. K.
,
W. D.
Gropp
,
D. K.
Kaushik
,
D. E.
Keyes
, and
B. F.
Smith
,
1999
:
Achieving high sustained performance in an unstructured mesh CFD application.
Proc. Supercomputing, Portland, OR, IEEE, 1–13. [Available online at http://www.supercomp.org/sc99/proceedings/papers/anderson.pdf.]
.
Baer
,
F.
,
J. J.
Tribbia
, and
M.
Taylor
,
2001
:
Global and regional atmospheric modelling using spectral elements.
Fluid Mech. Appl.
,
61
,
81
86
.
Barosan
,
I.
,
F. N.
van de Vosse
,
P. D.
Anderson
, and
H. E. H.
Meijer
,
cited 2001
:
hN mesh-refinement techniques 2D.
Eindhoven University of Technology Internal Poster. [Available online at http://www.mate.tue.nl/mate/showemp.php/54.]
.
Beylkin
,
G.
,
J. M.
Keiser
, and
L.
Vozovoi
,
1998
:
A new class of time discretization schemes for the solution of nonlinear PDES.
J. Comput. Phys.
,
147
,
362
387
.
Blackburn
,
H. M.
,
1998
:
Channel flow LES with spectral elements.
Proc. 13th Australasian Fluid Mechanics Conf., Melbourne, Australia, Monash University, 989–992
.
Boyd
,
J. P.
,
2000
:
Chebyshev and Fourier Spectral Methods. 2d ed.
Dover, 688 pp
.
Canuto
,
C.
,
M. Y.
Hussaini
,
A.
Quarteroni
, and
T. A.
Zang
,
1988
:
Spectral Methods in Fluid Dynamics.
Springer-Verlag, 557 pp
.
Canuto
,
C.
,
A.
Tabacco
, and
K.
Urban
,
2000
:
The wavelet element method. Part II: Realization and additional features in 2D and 3D.
Appl. Comput. Harm. Anal.
,
8
(
)
123
165
.
Capdeville
,
Y.
,
C.
Larmat
,
J. P.
Vilotte
, and
J. P.
Montagner
,
2002
:
A new coupled spectral element and modal solution method for global seismology: A first application to the scattering induced by a plume-like anomaly.
Geophys. Res. Lett., 29, 1318, doi:10.1029/2001GL013747
.
Chang
,
H. R.
, and
H. N.
Shirer
,
1985
:
Compact spatial differencing techniques in numerical modeling.
Mon. Wea. Rev.
,
113
,
409
423
.
Dahmen
,
W.
,
P.
Oswald
, and
A. J.
Kurdila
,
1997
:
Multiscale Wavelet Methods for Partial Differential Equations.
Academic Press, 570 pp
.
DAO
,
2000
:
Algorithm theoretical basis document, Version 2.00.
Data Assimilation Office Tech. Rep., NASA Goddard Space Flight Center, 33 pp
.
de Frutos
,
J.
, and
J.
Novo
,
2000
:
A spectral element method for the Navier–Stokes equations with improved accuracy.
SIAM J. Numer. Anal.
,
38
,
799
819
.
Dixon
,
M.
, and
K.
Tan
,
2003
:
Distributed solution of high-order compact difference schemes for multidimensional convection–diffusion equations.
ICCSA03 Conference Proceedings, V. Kumar, M. L. Gavrilova, C. J. K. Tan, and P. L'Ecuyer, Eds., Lecture Notes in Computer Science, Vol. 2669, Springer-Verlag, 226–235
.
Feng
,
H.
, and
C.
Mavriplis
,
2002
:
Adaptive spectral element simulations of thin premixed flame sheet deformations.
J. Sci. Comput.
,
17
,
385
395
.
Foster
,
I. T.
, and
P. H.
Worley
,
1997
:
Parallel algorithms for the spectral transform method.
SIAM J. Sci. Comput.
,
18
,
806
837
.
Foster
,
I. T.
,
W.
Gropp
, and
R.
Stevens
,
1992
:
The parallel scalability of the spectral transform method.
Mon. Wea. Rev.
,
120
,
835
850
.
Fournier
,
A.
,
cited 2001
:
Adaptive spectral-element method for the shallow-water equations.
.
Fournier
,
A.
,
M. A.
Taylor
,
L. M.
Polvani
, and
R.
Saravanan
,
2000a
:
Spectral element method. Part 2: Numerical simulations.
Proc. Eighth Annual Conf., Montreal, QC, Canada, CFD Society of Canada, 181–188. [Available online at http://www.asp.ucar.edu/gtp/fournier/files/publications/AINu.pdf.]
.
Fournier
,
A.
,
M. A.
Taylor
, and
J.
Tribbia
,
2000b
:
Spectral element method. Part 1: Numerical algorithms.
Proc. Eighth Annual Conf., Montreal, QC, Canada, CFD Society of Canada, 173–180. [Available online at http://www.asp.ucar.edu/gtp/fournier/files/publications/NuSi.pdf.]
.
Fournier
,
A.
,
B.
Beylkin
, and
V.
Cheruvu
,
2003
:
Multiresolution adaptive space refinement in geophysical fluid dynamics simulation.
Adaptive Mesh Refinement—Theory and Application, T. Plewa, T. Linde, and V. G. Weirs, Eds., Lecture Notes in Computational Sciences and Engineering, Springer-Verlag, in press
.
Fox-Rabinovitz
,
M. S.
,
L. L.
Takacs
,
R. C.
Govindaraju
, and
M. J.
Suarez
,
2001
:
A variable-resolution stretched-grid general circulation model: Regional climate simulation.
Mon. Wea. Rev.
,
129
,
453
469
.
Funaro
,
D.
,
1997
:
Spectral Elements for Transport-Dominated Equations.
Springer-Verlag, 211 pp
.
Giraldo
,
F. X.
, and
T. E.
Rosmond
,
2004
:
A scalable spectral element Eulerian atmospheric model (SEE-AM) for NWP: Dynamical core tests.
Mon. Wea. Rev.
,
132
,
133
153
.
Gottlieb
,
D.
, and
S. A.
Orszag
,
1977
:
Numerical Analysis of Spectral Methods: Theory and Applications.
SIAM, 170 pp
.
Gropp
,
W. D.
,
D. K.
Kaushik
,
D. E.
Keyes
, and
B. F.
Smith
,
2001
:
High-performance parallel implicit CFD.
Parallel Comput.
,
27
,
337
362
.
Haidvogel
,
D. B.
, and
A.
Beckmann
,
1999
:
Numerical Ocean Circulation Modeling.
Imperial College Press, 300 pp
.
Hamrud
,
M.
,
S.
Saarinen
, and
D.
Salmond
,
2003
:
Implementation of the IFS on a highly parallel scaler system.
Realizing TeraComputing—Tenth Workshop on the Use of High Performance Computing in Meteorology, W. Zwieflhofer and N. Kreitz, Eds., World Scientific, 74– 87. [Available online at http://www.ecmwf.int/publications/library/ecpublications/proceedings/high_performance_computing_2002.]
.
Held
,
I. M.
, and
M. J.
Suarez
,
1994
:
A proposal for the intercomparison of the dynamical cores of atmospheric general circulation models.
Bull. Amer. Meteor. Soc.
,
75
,
1825
1830
.
Henderson
,
R. D.
,
1999
:
Dynamic refinement algorithms for spectral element methods.
Comput. Methods Appl. Mech. Eng.
,
175
,
395
411
.
Hoppe
,
R. H. W.
, and
E. M.
Nash
,
2002
:
A combined spectral element/finite element approach to the numerical solution of a nonlinear evolution equation describing amorphous surface growth of thin films.
J. Numer. Math.
,
10
(
)
127
136
.
Iacono
,
R.
,
M. V.
Struglia
,
C.
Ronchi
, and
S.
Nicastro
,
1999
:
High-resolution simulations of freely decaying shallow-water turbulence on a rotating sphere.
Nuovo Cimento
,
22C
,
813
821
.
Isaacson
,
E.
, and
H. B.
Keller
,
1994
:
Analysis of Numerical Methods.
Dover, 541 pp
.
Iselin
,
J. P.
,
J. M.
Prusa
, and
W. J.
Gutowski
,
2002
:
Dynamic grid adaptation using the MPDATA scheme.
Mon. Wea. Rev.
,
130
,
1026
1039
.
Iskandarani
,
M.
,
D. B.
Haidvogel
,
J. C.
Levin
,
E.
Curchitser
, and
C. A.
Edwards
,
2002
:
Multiscale geophysical modeling using the spectral element method.
Comput. Sci. Eng.
,
4
(
5
)
42
48
.
Karniadakis
,
G. E.
, and
S. J.
Sherwin
,
1999
:
Spectral/hp Element Methods for Computational Fluid Dynamics.
Oxford University Press, 404 pp
.
Komatitsch
,
D.
,
J.
Ritsema
, and
J.
Tromp
,
2002
:
The spectral-element method, Beowulf computing, and global seismology.
Science
,
298
,
1737
1742
.
Lin
,
S. J.
, and
R. B.
Rood
,
1998
:
A flux-form semi-Lagrangian general circulation model with a Lagrangian control-volume vertical coordinate.
Proc. Rossby-100 Symp., Stockholm, Sweden, Dept. of Meteorology, Stockholm University
.
Lin
,
S. J.
, and
R. B.
Rood
,
1999
:
Development of the joint NASA/NCAR general circulation model.
Preprints, 13th Conf. on Numerical Weather Prediction, Denver, CO, Amer. Meteor. Soc., 115–119
.
Maday
,
Y.
, and
A. T.
Patera
,
1989
:
Spectral element methods for the incompressible Navier–Stokes equations.
State-of-the-Art Surveys on Computational Mechanics, A. K. Noor and J. T. Oden, Eds., American Society of Mechanical Engineers, 71–143
.
Mashkovich
,
S. A.
,
1994
:
Application of a continuous dynamic grid adaptation technique to numerical weather forecasting.
Russ. Meteor. Hydrol.
,
11
,
1
6
.
Mathews
,
J.
, and
R. L.
Walker
,
1971
:
Mathematical Methods of Physics. 2d ed.
Addison-Wesley, 501 pp
.
McGregor
,
J. L.
,
1999
:
Regional modeling at CAR: Recent developments.
Parallel computing in meteorology and oceanography, BMRC Research Rep. 75, Bureau of Meteorology Research Centre, 43–48
.
Morton
,
K. W.
, and
D. F.
Mayers
,
1994
:
Numerical Solution of Partial Differential Equations: An Introduction.
Cambridge University Press, 239 pp
.
Patera
,
A. T.
,
1984
:
A spectral element method for fluid dynamics: Laminar flow in a channel expansion.
J. Comput. Phys.
,
54
,
468
488
.
Pedlosky
,
J.
,
1987
:
Geophysical Fluid Dynamics. 2d ed.
Springer-Verlag, 710 pp
.
Prusa
,
J. M.
,
P. K.
Smolarkiewicz
, and
A. A.
Wyszogrodzki
,
2001
:
Simulations of gravity wave induced turbulence using 512 pe Cray T3E.
Int. J. Appl. Math. Comput. Sci.
,
11
,
883
897
.
Purser
,
R. J.
, and
M.
Rančić
,
1998
:
Smooth quasi-homogeneous gridding of the sphere.
Quart. J. Roy. Meteor. Soc.
,
124
,
637
647
.
Randall
,
D. A.
,
T. D.
Ringler
,
R. P.
Heikes
,
P.
Jones
, and
J.
Baumgardner
,
2002
:
Climate modeling with spherical geodesic grids.
Comput. Sci. Eng.
,
4
(
5
)
32
41
.
Sawyer
,
W.
,
2001
:
Challenges in the dynamics of atmospheric modelling.
Seminar at the Institut für Angewandte Mathematik II, Freiberg, Germany, TU Bergakademie Freiberg, 1–37. [Available online at http://www.iac.ethz.ch/staff/sawyer/freibergJun2001/sld001.htm.]
.
Schaffer
,
D. S.
, and
M. J.
Suárez
,
2000
:
Design and performance analysis of a massively parallel atmospheric general circulation model.
Sci. Programm.
,
8
,
49
57
.
Shingu
,
S.
, and
Coauthors
,
2002
:
A 26.58 Tflops global atmospheric simulation with the spectral transform method on the Earth Simulator.
Supercomputing 2002: From Terabytes to Insights, R. Lucas, Ed., IEEE Computer Society and ACM SIGARCH. [Available online at http://sc-2002.org/paperpdfs/pap.pap331.pdf.]
.
Spotz
,
W. F.
,
M. A.
Taylor
, and
P. N.
Swarztrauber
,
1998
:
Fast shallow-water equation solvers in latitude–longitude coordinates.
J. Comput. Phys.
,
145
,
432
444
.
Stevens
,
D. E.
,
A. S.
Almgren
, and
J. B.
Bell
,
1999
:
Adaptive simulations of trade cumulus convection.
University of California Tech. Rep. UCRL-JC-133201, 23 pp
.
Tanguay
,
M.
,
A.
Simard
, and
A.
Staniforth
,
1989
:
A three-dimensional semi-Lagrangian scheme for the Canadian regional finite-element forecast model.
Mon. Wea. Rev.
,
117
,
1861
1871
.
Taylor
,
M.
,
J.
Tribbia
, and
M.
Iskandarani
,
1997
:
The spectral element method for the shallow water equations on the sphere.
J. Comput. Phys.
,
130
,
92
108
.
Taylor
,
M.
,
R.
Loft
, and
J.
Tribbia
,
1998
:
Performance of a spectral element atmospheric model (SEAM) on the HP Exemplar SPP2000.
NCAR Tech. Rep. TN-439 + EDD, 16 pp
.
Thomas
,
S. J.
, and
R. D.
Loft
,
2002
:
Semi-implicit spectral element atmospheric model.
J. Sci. Comput.
,
17
,
339
350
.
Thomas
,
S. J.
,
R.
Loft
,
W. F.
Spotz
, and
A.
Fournier
,
2000
:
Semi-implicit scheme for the Spectral Element Atmospheric Model.
Proc. Eighth Annual Conf., Montreal, QC, Canada, CFD Society of Canada, 231–238
.
Thomas
,
S. J.
,
R.
Loft
,
A.
Fournier
, and
J.
Tribbia
,
2001
:
Parallel spectral element dynamical core for atmospheric general circulation models.
Proc. Ninth Annual Conf., Waterloo, ON, Canada, CFD Society of Canada, 69–74
.
Wang
,
G.
, and
N. W.
Wereley
,
2002
:
Spectral finite element analysis of sandwich beams with passive constrained layer damping.
Trans. ASME, J. Vib. Acoust.
,
124
,
376
386
.
Williamson
,
D. L.
, and
J. M.
Rosinski
,
2000
:
Accuracy of reduced grid calculations.
Quart. J. Roy. Meteor. Soc.
,
126
,
1619
1640
.
Williamson
,
D. L.
,
J. T.
Kiehl
,
V.
Ramanathan
,
R. E.
Dickinson
, and
J. J.
Hack
,
1987
:
Description of the NCAR community climate model (CCM1).
NCAR Tech. Rep. 285, 112 pp
.
Williamson
,
D. L.
,
J. B.
Drake
,
J. J.
Hack
,
R.
Jakob
, and
P. N.
Swarztrauber
,
1992
:
A standard test set for numerical approximations to the shallow water equations in spherical geometry.
J. Comput. Phys.
,
102
,
211
224
.
Worley
,
P. H.
,
2002
:
Performance studies using CCM/MP-2D.
Oak Ridge National Laboratory Tech. Rep. [Available online at http://www.csm.ornl.gov/~worley/studies/ccm-mp-2d-platforms.html.]
.
Yoden
,
S.
,
K.
Ishioka
,
Y-Y.
Hayashi
, and
M.
Yamada
,
1999
:
A further experiment on two-dimensional decaying turbulence on a rotating sphere.
Nuovo Cimento
,
22C
,
803
812
.

APPENDIX A

Curvilinear Coordinates and Operators

Spherical coordinates are longitude λ = λ1 ∈ [0, 2π[ and latitude ϕ = λ2 ∈ [−π/2, π/2]. The 3D eastward, northward, and vertical (spherical normal) unit vectors are

 
formula

where ≐ is read “is represented by.” Two-dimensional coordinates in the standard square are given by x ≐ (x1, x2)T ≡ {ξ : ξα ∈ [−1, 1]} ≡ [−1, 1]2. On cube face s the map (λ1, λ2) = λs(x) (appendix C) induces two biorthogonal bases tangent to the sphere:

 
formula

(assuming summation convention over duplicated Greek coordinate indexes α, β ∈ {1, 2}), where ∂β ≡ ∂/∂xβ and Jα,β ≡ |∇λa|−1βλa (for index a ∈ {1, 2}) is the rescaled Jacobian. Biorthogonal means gα · gβ = δαβ. The Jacobian's inverse is J−1β,α = |∇λa|(∂xβ/∂λa), and its determinant is

 
formula

The metric tensor is

 
formula

with inverse

 
formula

and determinant J2. The Christoffel symbol is

 
formula

The wind velocity, or any arbitrary vector, tangent to the sphere, may be expressed as u = uλe1 + uϕe2 = uαgα = uβgβ. Here uαu · gα = gα,βuβ is the covariant and uβu · gβ = gα,βuα is the contravariant u component, with transformation law (secϕuλ, uϕ) = (u1, u2)(∂/∂x)λ. Similarly uλu · e1 = J1,βuβ is the longitudinal and uϕu · e2 = J2,βuβ is the latitudinal u component, with covariant components u · ∂λe3 = cosϕuλ and u · ∂ϕe3 = uϕ and contravariant components u · ∇λ = secϕuλ and u · ∇ϕ = uϕ.

The horizontal (sphere tangent) gradient operator is gαα ≡ ∇ = e1 secϕλ + e2ϕ, where ∇αgα,ββ. Thus one defines the vertical (sphere normal) vorticity or u curl (a pseudodivergence) to be

 
formula

where e3 × u = J−1uβɛβ,αgα. Similarly one defines u divergence as

 
formula

The final expression equivalents are in spherical coordinates. In SEAM, the expressions in x coordinates involving metric terms are used.

APPENDIX B

Polynomial Spaces and Operators

For ξ a standard coordinate ∈ [−1, 1], let

 
formula

denote the Legendre polynomial of degree j. The span of Lj for all j ∈ {0, … p} is a polynomial space of dimension Npp + 1. The quadrature formula (9) employs the Gauss–Lobatto node [−1, 1] ξj = −ξpj = jth-greatest root of (1 − ξ)(1 + ξ)(d/)Lp and the Gauss–Lobatto weight wj ≡ 2(pNpL2p,j)−1 (Boyd 2000; Canuto et al. 1988). Then corresponding to ξj, the cardinal function

 
formula

is alternately given by

 
formula

where γj ≡ 〈L2j−1GL. Using (9), one constructs the interpolation operator as

 
formula

The derivative matrix is

 
formula

where Lj,j ≡ (d/)Lj(ξj) ∈ 2−1j(j + 1)[−1, 1]. The Boyd–Vandeven filter matrix of order 12 and length Npjf = max(3, ⅔Np) is given by

 
formula

where Zq(Ω) ≡ [ln(1 − Ω2)q]1/2 sgnΩ. This expression is just the representation at nodes ξj of the operation of attenuating Legendre coefficients of degree ≥jf by an effective viscosity μf times the bracketed expression (∈[0, 1]).

Let ξ ≡ (ξ1, ξ2)T be the 2D coordinate in . At the 2D node index n ∈ {0, … N2p − 1} one has the 2D Gauss–Lobatto node ξn given by ξj+Npj ≡ (ξj, ξj)T.

APPENDIX C

Construction of the Element Mesh

Each 1D cube edge is subdivided into Nh ≡ 2h−1 elements of length h, with lower element edges located at cl ≡ tan[(π/4)(hl − 1)] ∈ [−1, 1[ for l ∈ {0, … Nh − 1}. Then for a uniform 2D mesh, element ℓ ∈ {1, … Ngh} (where the global number of elements Ngh = 6N2h), corner k ∈ {0, … 3} is located at cℓ,k which represents the columns of

 
formula

Now introduce the bilinear map

 
formula

from to the quadrangle with corners Ck. Generally the inverse map ϑ−1(𝗖, x) is not bilinear and involves square roots. In the rectangular, linear case

 
formula

may be inverted trivially. Thus one defines the ℓth element ϑ() with four boundaries

 
formula

where ϑ(ξ) ≡ ϑ(𝗰, ξ) and has four boundaries

 
formula

Define longitude and latitude

 
formula

at which the sphere is tangent to the center of cube face s. Then the local 3D Cartesian basis vectors are just

 
formula

Vector eιs is parallel to cube face s for ι ≤ 2, and e3s is perpendicular. These define 3D location

 
formula

on cube face s and distance

 
formula

from the sphere center. Then the maps λ−1s and λs are given by

 
formula

respectively, where xn,ϑ(ξn) ∈ is the global node and s denotes the cube face containing element ℓ.

The cube edges {x : x2 = ±1(|x1| ≤ 1)} are mapped by λs to the curves

 
formula

while the edges {x : x1 = ±1(|x2| ≤ 1)} go to the curves

 
formula

for s < 4 (“equatorial faces”) and to the curves

 
formula

for s ≥ 4 (“polar faces”).

The value of the metric tensor at xn, is 𝗴n, ≡ (𝗝n,)T𝗝n,, where

 
formula

are the rescaled Jacobians at xn,ℓ. Enforcing continuity by (14) employs the mass matrix Mn,wjwjJn,|∂ϑ(ξn)/∂ξ| for n = j + Npj′, including determinants of mappings Jacobians.

APPENDIX D

Element-Connectivity Notation

First make a correspondence between 2D indexes n and nodes on the four element boundaries:

 
formula

Then obtain 3D distance squared between spherical points as

 
formula

We can thereby define element-connectivity index sets

 
formula

(both of size Nb) by a maximum 3D distance Δ(λn,, ϕn,, λn′,ℓ′, ϕn′,ℓ′) ≤ 10−9 for all (n, n′) ∈ 2b, (ℓ, ℓ′) ∈ 2b, and ℓ ≠ ℓ′, where the number of elements sharing the bth boundary node is

 
formula

These relationships bear a unique global boundary-node index b ∈ {1, … Nbn}, where the global number of possible element-boundary nodes on the cubed sphere is Nbn[≡2 + (2p − 1)Ngh for uniform subdivision].

Footnotes

Corresponding author address: Dr. Aimé Fournier, NCAR, P.O. Box 3000, Boulder, CO 80307-3000. Email: fournier@ucar.edu

*

The National Center for Atmospheric Research is sponsored by the National Science Foundation.