1. Introduction
o3o3, an alternative local Galerkin method (LGM; Steppeler and Klemp 2017), is a generalization of the third-order spectral element method (SEM3; Taylor et al. 1997; Taylor and Fournier 2010). It uses piecewise cubic (third-degree) polynomials to represent both the fields and the fluxes. Here, “onom” abbreviates an LGM in which the fields are represented by polynomials of degree n and the fluxes by polynomials of degree m. The SEM3 and o3o3 methods use the same representation of fields. Steppeler and Klemp (2017) defined the schemes o3o3 and o2o3 without investigating their accuracy and stability. o3o3 is somewhat related to the family of multimoment finite-volume methods described by Chen and Xiao (2008), although the latter do not assume a definite set of basis functions.
In this study, LGM is used in both methods to obtain the discretization. For SEM3, this is the quadrature approximation. The details of the LGM procedure for o3o3 are described in section 3. Both o3o3 and SEM3 cover the computational area with cells, which in this study are squares of size 3dx × 3dz. For SEM3, we use dx and dz to denote the average distance between the degrees of freedom of each field; for a uniformly spaced method such as o3o3, dx and dz are also the exact distances. Within the cells, the fields are represented by third-order serendipity interpolation (Steppeler 1976; Ahlberg et al. 1967). This means that the corner points define a bilinear spline. A third-order field component, based on amplitudes on the edges of cells, is added to this. In one dimension (1D), this means that fields and fluxes are represented as third-degree piecewise polynomials. For simplicity, we express the theory in 1D. In each cell, a field h(x) is represented by a third-degree polynomial, and the polynomials belonging to different cells fit together continuously. With a regular grid xi = i × dx (i = 0, 1, 2, 3, …), the cells for both SEM3 and o3o3 are the intervals (xi, xi+3) (i = 0, 3, 6, 9, …). The coefficients of the third-degree polynomials for each cell can equivalently be described by gridpoint values at four points
SEM3 and the SEM schemes of polynomial degree higher than 3 have frequently been considered for application in realistic models. These schemes modify the concept of the classical Galerkin scheme [see Steppeler (1987) for a review] to be local with explicit time schemes and are therefore suitable for massively parallel computers (Taylor et al. 1997). The suitability of SEM for parallel computing with explicit time schemes comes from avoiding nonlocality in the computation of derivatives. For finite-element methods (FEMs) that use the classical Galerkin scheme on piecewise polynomial spaces, nonlocality is caused by a global mass matrix. SEM avoids nonlocality by using a lumped mass matrix obtained from inexact quadrature. It has other interesting properties, such as preservation of conservation laws (Giraldo 2001; Taylor and Fournier 2010). SEM3 provides a fourth-order uniform approximation, which means that the approximation order does not drop below 3 at any point. Tomita et al. (2001) gave an example of how a low-order approximation at some points, such as poles, can lead to grid imprinting, meaning that the grid is visible in the numerical solution. Models with a uniform order of approximation weaken such grid imprinting. Baumgardner and Frederickson (1985) gave an example of a model on the sphere with a second-order uniform approximation, and Steppeler et al. (2008) defined a shallow-water scheme on the sphere of uniform third order. Both models weakened grid imprinting but were nonconserving. One of the advantages of SEM schemes of all polynomial degrees is that their order is uniform. They can weaken grid imprinting and are suitable for irregular grids (Taylor et al. 1997). SEM combines a high-order approximation with conservation. This is considered a major advantage of SEM, as older high-order models, such as that of Kalnay et al. (1977), are nonconserving.
While o3o3 may share all of the mentioned advantages with SEM3, in the present paper we investigate only the conservation of first-order moments by o3o3 and the uniform approximation order of at least 3. A regular collocation grid is considered a major advantage for physical parameterization (Herrington et al. 2019). We describe the continuity properties and alternative LGM schemes in section 2. Section 3 outlines the o3o3 and SEM3 methods and presents the governing equations and their discretizations in flux form employed in all subsequent experiments. Section 4 describes the performance of a time step. Section 5 illustrates the results of the homogeneous advection test for accuracy and stability of the o3o3 method. The practical performance of o3o3 in 2D and 3D is discussed in section 6 and the study concludes in section 7.
2. Continuity properties and alternative local Galerkin schemes
This study investigates the o3o3 scheme (Steppeler and Klemp 2017) as an example of an alternative to the SEM scheme. The latter is currently the only LGM scheme near practical application in realistic models for forecast and climate simulation (Dennis et al. 2012). Like SEM, o3o3 uses continuous basis functions, but it employs an alternative discretization that results in stability on the uniform grid.
The approximation space consisting of continuous piecewise polynomial functions is the basis of a number of important numerical approaches, such as FEMs, SEMs, and third-degree methods (Giraldo 2001; Ahlberg et al. 1967; Steppeler 1976). However, this is not the only possible choice, and this section looks at other possibilities. In the interior of cells, a polynomial representation is used. In this respect, the basis function space is characterized by the order of the polynomial representation. It is possible to approximate the flux in a higher polynomial space than the field. Therefore, two orders n and m characterize the approximation spaces, named onom, with n being the polynomial order for the field and m that of the fluxes. All methods tested so far use n = m, but Steppeler and Klemp (2017) suggested the untested method o2o3, which approximates the flux to third order while the field is approximated by second-order polynomials. The resulting methods would then be third order. The regularity of fields at the cell boundaries is another distinguishing element of approximation spaces. We consider C0 as the space of continuous functions, C1 as the space of differentiable functions, C−1 as the space of discontinuous functions, and so on (Evans 1998). In connection with polynomial representations in the interiors of cells, the C−1 space is used to obtain approximations (Cockburn and Shu 2001). Such methods are called discontinuous Galerkin methods and are not considered in this paper. For the regularity of fields and fluxes at cell boundaries we call the methods cn′cm′. The total characteristic of an approximation space becomes onomcn′cm′. The special case o3o3 considered here could more precisely be called o3o3c0c0. The method o2o3, defined but not explored by Steppeler and Klemp (2017), would in this more extensive notation be called o2o3c0c1.
For the special case of o3o3, the definition of grids, function systems and related LGM approximations will be given in sections 3 and 4. The LGM to be used with a space onom is not uniquely determined. The o3o3 scheme described in the following is given in two versions, and many more versions are possible. The algebraic appearance of a scheme based on a space onom may also differ. SEM schemes normally define the polynomial in the interior of a cell by collocation grid points (Taylor et al. 1997; Giraldo 2001). This representation is also used with o3o3, but second and third derivatives at cell centers are used as alternatives in this paper and are called the spectral space. An alternative to second and third derivatives would be one-sided field derivatives at cell boundaries, an option not pursued in this paper. Such alternative spectral spaces would give an equivalent alternative arithmetical form of the LGM procedure. Such alternative arithmetic forms can lead to different numerical efficiencies for the same scheme.
The impact that the different ways of organizing the calculations have on computational efficiency is not investigated in this paper. However, it is clear that with the use of the second-order and third-order derivatives of the fields as amplitudes (hxx,i and hxxx,i) and performance of the time step in gridpoint space, SEM3 involves the same transformation effort as o3o3. The spectral calculations may be marginally cheaper with o3o3. Therefore, it may be expected that one o3o3 time step costs about the same as one SEM3 time step. This is marginally more expensive than the standard classical fourth-order (o4) scheme [see Steppeler et al. (2008) and Kalnay et al. (1977) for applications]. Performing o3o3 in spectral space for the edge amplitudes would bring the cost of o3o3 to the order of that of the o4 scheme. This latter option would require that the physics scheme be performed at corner points only, which would save computer time by itself. o2o3 is a method using the space c0c1 and its analysis is left for future work. With that said, a calculation by this study’s authors suggested a CFL condition of 1.8. Although the time-stepping procedure of o2o3 is somewhat more complicated than that of o3o3, this work indicates that there remains a large family of possible methods in need of further study. For example, it is possible to use C1 as approximation space for h(y). If the fluxes are also approximated in C1, the result of differentiation would be in C0; a continuous function must then be approximated by a C1 function, which is differentiable at corner points. This would lead to a new version of an o3o3 scheme. It is also possible to approximate the flux in a higher space, allowing for continuous derivatives, meaning that the second derivative exists at corner points. Obviously such approximations for the flux can be done in C2. Therefore, the resulting scheme would be classified as o3o5.
There is no indication that such modified o3o3 or o3o5 would not work. Currently there exists no investigation into the accuracy and stability of such schemes.
Our investigations concentrate on rather low polynomial degrees. The LGM (Steppeler and Klemp 2017), SEM3 and o3o3 are to a large part related and these schemes will be described in the following section together.
3. The SEM3 and o3o3 schemes













(a) o3o3 and (b) SEM3 grids.
Citation: Monthly Weather Review 147, 6; 10.1175/MWR-D-18-0288.1






For the collocation grid Eq. (4), weights w0, w1, w2, w3 = (1/6)dx′, (5/6)dx′, (5/6)dx′, (1/6)dx′ are also defined. The weights can be used to form integrals of the fields, such as for the computation of mass. o3o3 does not use a gridpoint formula such as Legendre–Gauss–Lobatto to compute integrals. Instead, integrals are computed in spectral space by integrating the polynomials analytically.
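As an illustration of why the two ways of integrating agree for cubic fields, the following sketch applies the four-point Legendre–Gauss–Lobatto rule with the weights quoted above and compares it with the analytic integral of a cubic polynomial. It works on the reference interval [−1, 1], that is, with the scale factor dx′ set to 1, which is an assumption made only for this example; the function names are hypothetical.

```python
import math

# Four-point Legendre-Gauss-Lobatto rule on the reference interval [-1, 1].
# Nodes: the two end points plus the interior points +-1/sqrt(5).
# Weights: 1/6, 5/6, 5/6, 1/6 (the values quoted in the text, with dx' = 1).
LGL_NODES = [-1.0, -1.0 / math.sqrt(5.0), 1.0 / math.sqrt(5.0), 1.0]
LGL_WEIGHTS = [1.0 / 6.0, 5.0 / 6.0, 5.0 / 6.0, 1.0 / 6.0]


def lgl_integral(f):
    """Gridpoint (quadrature) approximation of the integral of f over [-1, 1]."""
    return sum(w * f(x) for x, w in zip(LGL_NODES, LGL_WEIGHTS))


def analytic_cubic_integral(a, b, c, d):
    """Analytic integral of a + b*x + c*x**2 + d*x**3 over [-1, 1].

    The odd powers integrate to zero on the symmetric interval.
    """
    return 2.0 * a + 2.0 * c / 3.0


# Arbitrary cubic used only for the comparison.
a, b, c, d = 1.0, -2.0, 0.5, 3.0
quad = lgl_integral(lambda x: a + b * x + c * x**2 + d * x**3)
exact = analytic_cubic_integral(a, b, c, d)
print(quad, exact)
```

For a cubic field both routes give the same mass up to rounding, so the choice between gridpoint quadrature and analytic integration in spectral space is one of organization rather than of accuracy in this case.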








Equations (7)–(12) define a basis function representation of h(x′). The basis function representation is used in the same way for SEM3 and o3o3. The polynomial basis



The spectral space is the same for FEM3, SEM3, and o3o3. It is formed by the amplitudes hi, hxx,i+(3/2) and hxxx,i+(3/2) (i = 0, 3, 6, …). Because the collocation points differ between these methods, the transformation formulas between gridpoint space and spectral space are different for SEM3 and o3o3. The formulas have the same form but use different coefficients. As the values hi (i = 0, 3, 6, …) are gridpoint values and spectral coefficients at the same time, we need transformation formulas for i + 1 and i + 2 only.











While Eqs. (14)–(16) give the transformation from gridpoint space to spectral space, the transformation from spectral space to gridpoint space can be done using Eqs. (7)–(12), when the spectral coefficients hi, hxx,i+(3/2), and hxxx,i+(3/2) are given.
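The two directions of the transformation can be sketched for the equally spaced (o3o3) collocation grid. The formulas below are not copied from Eqs. (7)–(16); they are derived here from the single assumption that the four values h0, h1, h2, h3 at spacing dx lie on one cubic polynomial and that the spectral amplitudes are its second and third derivatives at the cell center. The function names are hypothetical.

```python
def grid_to_spectral(h0, h1, h2, h3, dx):
    """Gridpoint values on one cell -> (h0, h3, hxx, hxxx).

    hxx and hxxx are the second and third derivatives at the cell centre
    of the cubic passing through the four equally spaced values.
    """
    hxx = (h0 - h1 - h2 + h3) / (2.0 * dx**2)
    hxxx = (h3 - 3.0 * h2 + 3.0 * h1 - h0) / dx**3
    return h0, h3, hxx, hxxx


def spectral_to_grid(h0, h3, hxx, hxxx, dx):
    """Inverse transformation: recover the interior values h1 and h2.

    The cubic is written as a + b*s + c*s**2 + d*s**3 with
    s = (x - x_centre) / dx, so the four points sit at s = -1.5, -0.5, 0.5, 1.5.
    """
    c = 0.5 * hxx * dx**2
    d = hxxx * dx**3 / 6.0
    a = 0.5 * (h0 + h3) - 2.25 * c      # from h(-1.5) + h(1.5)
    b = (h3 - h0) / 3.0 - 2.25 * d      # from h(1.5) - h(-1.5)
    h1 = a - 0.5 * b + 0.25 * c - 0.125 * d
    h2 = a + 0.5 * b + 0.25 * c + 0.125 * d
    return h0, h1, h2, h3


# Check with h(x) = x**3 on the cell [0, 3*dx], dx = 0.5.
dx = 0.5
values = tuple((i * dx) ** 3 for i in range(4))
spec = grid_to_spectral(*values, dx)
back = spectral_to_grid(*spec, dx)
print(spec, back)
```

As in the text, the corner values pass through unchanged, and only the two interior points i + 1 and i + 2 are actually transformed.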

The time stepping with a time step dt is done using the fourth-order Runge–Kutta (RK4) time scheme, as described by Steppeler et al. (2008). The Runge–Kutta scheme can be performed if the spatial derivative of the flux fl(x) is approximated according to Eqs. (7)–(12) in the same space as h(x).
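A minimal sketch of one classical RK4 step is given below. The tendency function is deliberately a stand-in: it uses second-order centered differences on a periodic grid, not the o3o3 or SEM3 flux derivative, because those formulas are not reproduced here. Grid size, time step and function names are assumptions for this example only.

```python
import math


def rk4_step(h, tendency, dt):
    """Advance the list h by one classical RK4 step of dh/dt = tendency(h)."""
    k1 = tendency(h)
    k2 = tendency([hi + 0.5 * dt * ki for hi, ki in zip(h, k1)])
    k3 = tendency([hi + 0.5 * dt * ki for hi, ki in zip(h, k2)])
    k4 = tendency([hi + dt * ki for hi, ki in zip(h, k3)])
    return [hi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for hi, a, b, c, d in zip(h, k1, k2, k3, k4)]


def centered_advection_tendency(u0, dx):
    """Stand-in tendency -u0 * dh/dx (2nd-order centered, periodic grid)."""
    def tendency(h):
        n = len(h)
        return [-u0 * (h[(i + 1) % n] - h[i - 1]) / (2.0 * dx)
                for i in range(n)]
    return tendency


n, dx, dt, u0 = 64, 1.0, 0.5, 1.0
h_init = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
h = list(h_init)
tend = centered_advection_tendency(u0, dx)
for _ in range(int(n * dx / (u0 * dt))):   # one full revolution of the domain
    h = rk4_step(h, tend, dt)

max_err = max(abs(p - q) for p, q in zip(h, h_init))
mass_change = abs(sum(h) - sum(h_init))
print(max_err, mass_change)
```

Replacing `centered_advection_tendency` by a function that evaluates the flux derivative of the scheme under study leaves `rk4_step` unchanged, which is the sense in which the time scheme only requires the spatial derivative in the same space as h(x).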














The computation of the spatial derivative of fl(x) given in Eqs. (20)–(22) is used for SEM3. The discontinuous representation Eq. (21) is mass conserving by construction. From the Legendre–Gauss–Lobatto integration formula, it follows that Eqs. (20)–(22) are also mass conserving. Equation (21) means that the high-order part of ht is constructed in spectral space and transformed to gridpoint space. Other choices of collocation points with SEM3 can violate mass conservation. Instability is possible with explicit time integration methods such as leapfrog or RK4 schemes [however, see Ullrich (2014b) for stable alternatives].
Equations (20)–(22) give the spatial derivative flx(x) of fl(x) at all collocation points, and the RK4 time step for SEM3 can therefore be performed in the collocation gridpoint space. Equations (20)–(22) are FDM equations in gridpoint space with the unusual feature that they are different for each of the grid points i, i + 1, i + 2. Difference schemes that use more than one difference equation for different grid points are called inhomogeneous FDM schemes (Chen and Xiao 2008; Ullrich 2014b). Therefore, SEM3 can be written as an inhomogeneous FDM scheme, while schemes such as centered differences or classical fourth-order spatial differences (Steppeler et al. 2008) are homogeneous FDM schemes.
o3o3 in the version discussed here also computes the time derivatives of h(x) at the collocation points, which are equally spaced [see Eq. (5)] in the case of o3o3. The RK4 time step can then be performed in gridpoint space. Equations (20)–(22) will not be used for computing spatial derivatives. The time derivatives for the high-order part hh(x) will rather be computed in spectral space and then transformed to gridpoint space. The discontinuous form Eq. (22) of ht(x) will not be used with o3o3 to compute derivatives.







Equation (30) uses a rather large stencil, including five grid intervals. The particular approximations used in this study are made for easy implementation and not for optimal performance. Optimal performance is not explored here, as the intention is only to give an example of alternative continuous Galerkin schemes. Inside a grid interval, the collocation points in this formulation are always regular. However, the grid sizes are allowed to be irregular to accommodate the irregular grid structures that may occur with approximations on the sphere. Equation (26) would be valid for irregular grids as it is. The FDM schemes Eqs. (22) and (29) are formulated for regular grids. For irregular meshes, these formulas must be replaced by differentiation on the irregular grid.
4. Performance of a time step
The time step procedure for the field h(x) is done in gridpoint space, using the formulas for flux divergence derived in section 3. The gridpoint space to be used is not necessarily the regular grid xi (i = 0, 1, 2, 3, …) but the
RK4 requires the computation of the spatial derivatives hx(x). The collocation grid is redundant for the description of continuous fields [see Eq. (3)], as corner points occur twice. In the collocation grid, discontinuous fields can be represented, which in this paper occur as derivatives. In the redundant collocation grid, according to field values in Eq. (20),







To perform an RK4 time step, a transformation into spectral space is done obtaining the spectral amplitudes hi, hxx,i+(3/2), hxxx,i+(3/2) (i = 0, 3, 6, 9, …) from Eqs. (13)–(16) for both SEM3 and o3o3. These coefficients are used to compute the values
5. Results


Figure 2 shows the initial values representing the peak solution, as defined above, and its transport over distances of 300dx and 30 000dx for standard nonconserving o4 spatial differences (Figs. 2b,c). The transport over the larger distance of 30 000dx shows an increased dispersion error. Figures 2d–i show the same results for standard o3o3, o3o3 with spectral differences at corner points, and SEM3. Figure 3 is the same as Fig. 2 for the smooth solution with f length = 4.

Runs with the peak initial condition and transport over 300 points with different spacing for dx = 1 and dt = 1: (a) initial values, (b) standard o4 spatial difference, (d) o3o3 standard difference, (f) o3o3 spectral difference, and (h) SEM3; transport over 30 000 points: (c) standard o4 spatial difference, (e) o3o3 standard difference, (g) o3o3 spectral difference, and (i) SEM3 at all points.

As in Fig. 2, but for a smooth solution.
For the transport over 30 000dx with dt = 1, the maximum of h(x) is given in Table 1 for the four schemes investigated. SEM3 is the best by a small margin and the two versions of o3o3 are rather similar in accuracy to the classical o4 scheme. As a quantitative measure of accuracy for the transport over the distance of 30 000dx, Table 1 shows the forecasted maximum of h(x) divided by four for solutions using dt = 1, dx = 1. The higher the maximum, the more accurate the scheme. The accuracies of the four schemes are rather similar. For o4 and o3o3, the results are also shown for the larger time steps dt = 2 and dt = 2.5. The rather strong dependence of the results on the time step is remarkable. The spread of the results with dt is such that we cannot conclude more than that the fourth-order schemes are similar in accuracy.
Maximum of field h(x) after a transport over 30 000dx. The higher the value, the more accurate is a scheme.

SEM3 and the two versions of o3o3 presented here have the advantage over standard o4 differences that they are mass-conserving schemes. The standard o4 scheme was used in a realistic model by Kalnay et al. (1977) and the lack of conservation was considered to be a disadvantage of that model.
To see the dependence of the accuracy on the resolution, the solution of Figs. 2d–g (o3o3 with f length = 4, dt = 1, dx = 1) is repeated for a forecast time of 600 with dx = 1/64, 1/32, 1/16, 1/8, 1/4, 1/2, 1 and dt = 1/16, and the results are shown in Fig. 4 for both the o3o3 method and the o3o3 method with spectral differentiation. The error of both schemes does not fall below 5 × 10⁻³. This is caused by the presence of a computational mode within this bell-shaped solution. Plotting on a smaller scale (not shown here) shows this to be a small-scale wave of this amplitude, which is connected to the dispersion properties investigated later. The convergence experiment was repeated using one sine wave over 192 points. Such a pure wave has no computational-mode component. For the prediction of one wave cycle the convergence was fourth order and the error went down to 2 × 10⁻⁵ for the resolution dx = 1 and wavelength 192 (not shown here).
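How an observed convergence order is read off from errors at two resolutions can be sketched as follows. The example is a stand-in, not the o3o3 experiment: it measures only the spatial error of the classical fourth-order centered derivative applied to one sine wave of wavelength 192, without any time integration. Function names are hypothetical.

```python
import math


def o4_derivative_error(dx, wavelength=192.0):
    """Maximum error of the classical 4th-order centered derivative
    of sin(2*pi*x/wavelength) on a periodic grid with spacing dx."""
    n = int(round(wavelength / dx))
    k = 2.0 * math.pi / wavelength
    h = [math.sin(k * i * dx) for i in range(n)]
    err = 0.0
    for i in range(n):
        approx = (8.0 * (h[(i + 1) % n] - h[i - 1])
                  - (h[(i + 2) % n] - h[i - 2])) / (12.0 * dx)
        err = max(err, abs(approx - k * math.cos(k * i * dx)))
    return err


def observed_order(err_coarse, err_fine, refinement=2.0):
    """Convergence order estimated from two errors whose grid spacings
    differ by the factor `refinement`."""
    return math.log(err_coarse / err_fine) / math.log(refinement)


e_coarse = o4_derivative_error(1.0)
e_fine = o4_derivative_error(0.5)
order = observed_order(e_coarse, e_fine)
print(e_coarse, e_fine, order)
```

For a full scheme the same `observed_order` function would be applied to forecast errors instead of derivative errors; a computational mode of fixed amplitude, as described above, would show up as an error that stops decreasing under refinement.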

Maximum error of o3o3 standard (dotted scatters) and o3o3 spectral differences (square scatters) for different grid spacings: dx = 1/64, 1/32, 1/16, 1/8, 1/4, 1/2, 1.








Spectral diagrams of (left) imaginary and (right) real parts of eigenvalues dependent on k, where o3o3 standard is in the first row and o3o3 with spectral differences at corner points is in the second row.
For o3o3, a large part of the spectrum is stationary (phase velocity is 0). These stationary modes arise when the discrete fourth derivative at corner points evaluates to zero and hi+3 = hi, in which case Eqs. (29) and (30) evaluate to zero. For another part of the spectrum, the phase velocity is negative. While this clearly indicates that there is room for improvement, the comparison with the two control runs o4 and SEM3 shows that o3o3 is competitive in accuracy.
Ullrich (2014a) defines a measure of the effective resolution, indicating the smallest scale of a method to obtain useful forecasts. This is the smallest wave for which the absolute error of phase velocity is less than 1%. The effective resolution comes out to be 7.5dx for o3o3, 7.9dx for o4, and 8.4dx for SEM3. The large value for SEM3 is caused by the spectral gap (Ullrich et al. 2018). Notably, o3o3 and o4 do not have a spectral gap. For SEM3 the solution can be improved by hyper-viscosity. The impact of hyper-viscosity is not investigated for o3o3. All four schemes investigated are neutral, in having no implicit diffusion.
The largest imaginary eigenvalue for o3o3 is 3.29 (dx = 1/3), resulting in a CFL condition of 2.53 under the RK4 scheme. The CFL conditions for the methods investigated here are given in Table 2. For comparison, RK4 with centered differences in space has a CFL condition of 2.8. The CFL condition of o3o3 with spectral differences at corner points turned out to be higher than that of standard o4 differences by 0.2. For standard o3o3, the CFL condition was higher than that of standard o4 by 0.5, while for SEM3 it was lower by 0.5. Standard o3o3 had a CFL condition higher than that of SEM3 by a factor of 1.67, which resulted in a shorter runtime of the program.
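The link between the largest imaginary eigenvalue and the CFL condition can be sketched numerically. The code below finds the RK4 stability limit on the imaginary axis by bisection and divides it by the largest eigenvalue magnitude of two standard difference operators (for dx = 1 and unit velocity). The o3o3 and SEM3 eigenvalues come from their evolution matrices and are not recomputed here; the symbols used for the centered schemes are the textbook ones, and all function names are hypothetical.

```python
import math


def rk4_amplification(y):
    """|R(i*y)| for classical RK4, R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24."""
    z = complex(0.0, y)
    return abs(1.0 + z + z**2 / 2.0 + z**3 / 6.0 + z**4 / 24.0)


def rk4_imaginary_limit(tol=1e-12):
    """Largest y with |R(i*y)| <= 1, located by bisection on [1, 4]."""
    lo, hi = 1.0, 4.0                    # stable at 1, unstable at 4
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rk4_amplification(mid) <= 1.0 + 1e-14:
            lo = mid
        else:
            hi = mid
    return lo


def max_symbol(symbol, samples=20001):
    """Largest |eigenvalue| * dx of a periodic difference operator,
    sampled over wavenumbers theta = k*dx in [0, pi]."""
    return max(abs(symbol(math.pi * j / (samples - 1)))
               for j in range(samples))


def centered2(theta):
    """Symbol of 2nd-order centered differences."""
    return math.sin(theta)


def centered4(theta):
    """Symbol of classical 4th-order centered differences."""
    return (8.0 * math.sin(theta) - math.sin(2.0 * theta)) / 6.0


limit = rk4_imaginary_limit()            # equals 2*sqrt(2)
cfl_o2 = limit / max_symbol(centered2)   # about 2.83
cfl_o4 = limit / max_symbol(centered4)   # about 2.06
print(limit, cfl_o2, cfl_o4)
```

The value for second-order centered differences reproduces the 2.8 quoted above. For any other scheme, the same division applies once the largest imaginary eigenvalue of its evolution matrix, rescaled to dx = 1, is known.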
The CFL condition of RK4 with spatial centered difference is 2.8.

Using the measure of effective resolution, the accuracy of o3o3 is similar to that of the other third- or fourth-order schemes used for the control runs. SEM3 and o3o3 are mass conserving. An advantage of o3o3 over SEM3 is the regular collocation grid and the larger time step. In 1D there are only small differences in the cost of a time step. In 2D, o3o3 has a considerable advantage, as the sparse grid used in higher-dimensional spaces is potentially much more economical than the full grid (see details in section 6).
6. Practical considerations for 2D and 3D
The practical usefulness of o3o3 or any other numerical scheme depends very much on factors such as the number of operations necessary per grid point, the suitability for scaling well on multiprocessing computers, and the data volume per grid point that needs to be communicated to other points. In this section, we give only a preliminary answer, as these factors depend very much on the implementation of a scheme and can realistically be investigated only by systematic experiments on different computer architectures. Such investigations in the field of informatics are beyond the scope of this study. For 1D calculations and a rather coarse estimate, we do not consider the small differences in performance between the schemes SEM3, o3o3, and o4. These methods should be rather similar in performance per grid point. Although SEM3 and o3o3 are simpler than o4 in the performance of a time step, they need an extra (small) spectral transformation that is not necessary with o4. As a rough estimate, we can expect a similar performance per grid point. In this paper, we do not investigate computational advantages from using larger time steps, which can be chosen somewhat higher with o3o3. However, in 2D, there may be computational advantages for o3o3 from using sparse grids. The use of sparse grids with o3o3 is similar to the situation with FEMs (Ahlberg et al. 1967). With sparse grids, some of the points of a regular 2D or 3D grid are not predicted in time, and a corresponding saving of computer time is achieved. Therefore, considerations of numerical efficiency are incomplete without using more than one dimension.
We use tensor function spaces to expand to more than 1D and investigate an example of gaining computational efficiency from sparse grids. Lagrange elements and simplices are not considered in this paper. SEM3 implementations use the full grid (xi, zk) = (i × dx, k × dz), where i, k = 1, 2, 3, …. Third-order FEMs use a reduced or sparse grid obtained from the full grid by omitting points for which neither i nor k is a multiple of 3 (Ahlberg et al. 1967). For the 2D case, the full and reduced grids are shown in Fig. 6. All points shown in Fig. 6a together form the full grid. The points shown in white in Fig. 6b are unused for dynamics. The points in black form the sparse grid and are called dynamic points. The fields in the interior of the cell are obtained by bilinear interpolation when plotting (see details in Ahlberg et al. 1967; Steppeler 1987). The sparseness factor, being the ratio of the number of points of the sparse grid to that of the full grid, is S = 5/9 (about 1/2). With the o3o3 scheme, most of the calculations are done on coordinate lines. Operations defined in 1D can be used on the coordinate lines.
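The selection rule for dynamic points and the resulting sparseness factor can be checked with a short sketch. It assumes a periodic grid made of whole 3 × 3 cells, so that each cell contributes exactly nine points to the count; the function names are hypothetical.

```python
def is_dynamic(i, k, cell=3):
    """A point is dynamic if it lies on a coordinate line of the cell grid,
    i.e. if i or k is a multiple of the cell width."""
    return i % cell == 0 or k % cell == 0


def sparseness_factor_2d(n_cells, cell=3):
    """Ratio of dynamic points to all points on a periodic grid
    of n_cells x n_cells cells."""
    n = n_cells * cell
    dynamic = sum(is_dynamic(i, k, cell) for i in range(n) for k in range(n))
    return dynamic / (n * n)


# Small picture of the sparse grid: '#' dynamic point, '.' unused point.
for k in range(6):
    print("".join("#" if is_dynamic(i, k) else "." for i in range(6)))

S = sparseness_factor_2d(4)
print(S)
```

The picture shows full rows and columns of dynamic points separated by 2 × 2 blocks of unused points, which is the pattern described for Fig. 6b.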

Computational grid in 2D. (a) full grid and (b) sparse grid where unused points are indicated in white.



The grid point amplitudes of h(x, z) and the fluxes have two indices i, k. As in the 1D case, the corner amplitudes hi,k (i, k = 0, 3, 6, …) are both gridpoint values and spectral amplitudes and are not transformed. The spectral amplitudes for the 2D case are hxx,i+(3/2),k, hxxx,i+(3/2),k and hzz,i,k+(3/2), hzzz,i,k+(3/2). As in the sparse grids all points are on coordinate lines x = xi and z = zk, the spectral amplitudes can be computed by 1D operations along these coordinate lines, as described in sections 3 and 4.




According to the serendipity interpolation, flx,2(x, z) is a cubic spline in the z direction and linear in the x direction. The flx,2(x, z) is also a small field. When approximating analytic functions, the maximum of flx,2(x, z) goes to 0 in fourth order. This can be used to design approximations, and the motivation of the sparse grid also comes from sparseness. The basis functions to be used with the unused points would be squares of the basis functions used with flx,2(x, z). This means that the terms neglected in forming the sparse grid are small.




Figure 7 gives the result of advection for a square grid of 300 × 300 points with dx = dz = 1.0, dt = 1.0 s. The velocity components u0 and w0 are (1, 1) for the first 100 time steps, (−1, 0) for the second 100 time steps and (0, −1) for the last 100 time steps. This velocity field is changed in time to obtain a solution in which the initial field is reproduced after 300 time steps. In Fig. 7, 50 time steps are done between plots, and the plots are drawn so as to show the sparse grid structure. The unused points are assigned a time derivative of 0. Therefore, the initial values at such points remain there at all times. In the area of the forecasted field, the unused points have the value 0. To show the field structure better, blowups at the 50th time step are given for the areas of the initial values and of the forecasted field (Fig. 7c). The forecasted field has holes at places where there are maxima in the area of the initial conditions. When the forecast arrives at the initial position, the two figures combine to give a smooth field again. This way of plotting is chosen to give a graphical illustration of sparseness. When interpolating to the unused points before plotting, ordinary smooth fields are obtained.
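That the prescribed velocity sequence returns the field to its starting position can be verified with a few lines of arithmetic. The sketch below only integrates the three constant-velocity phases listed above with dt = 1; the function names are hypothetical.

```python
# (u0, w0, number of time steps) for the three phases described in the text.
PHASES = [(1.0, 1.0, 100), (-1.0, 0.0, 100), (0.0, -1.0, 100)]


def position_at_step(phases, dt, step):
    """Displacement (x, z) of a point advected through the phases
    after `step` time steps of length dt."""
    x = z = 0.0
    remaining = step
    for u0, w0, n_steps in phases:
        n = min(remaining, n_steps)
        x += u0 * n * dt
        z += w0 * n * dt
        remaining -= n
        if remaining == 0:
            break
    return x, z


def net_displacement(phases, dt):
    """Displacement after all phases have been completed."""
    total_steps = sum(n for _, _, n in phases)
    return position_at_step(phases, dt, total_steps)


print(position_at_step(PHASES, 1.0, 50))    # location at the first plot
print(net_displacement(PHASES, 1.0))        # back at the origin
```

After 50 steps the field has moved 50 grid intervals in each direction, which is why the blowups of the initial and the forecasted areas at that time do not overlap.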

2D results: advection with a homogeneous velocity field changing in time. (a) The plots are done to indicate the sparse grid. The unused points are plotted with amplitude 0 for the forecasted fields and at the position of the initial values the unused amplitudes have their original values. The plot gives a blowup of the fields belonging to (b) the initial time and (c) the 50th time step to show that these diagrams are like the positive and negative pictures of the same structure.
The example is included here to discuss the potential gain in computer efficiency associated with sparse grids. A substantial gain of computer efficiency may be obtained from the sparse grid, which comes naturally with o3o3. The sparseness factor in 2D is the ratio of the number of dynamic points to all points: S = 5/9. If the computational cost of o3o3 per grid point is the same as that of an FDM scheme on the full grid, this promises a saving of computer time in 2D by the factor of S. SEM schemes, such as that of Giraldo (2001), often use SEM only for the horizontal. However, using the sparse grid in 3D creates a larger potential for saving from sparseness. With o3o3 in 3D, we have S = 7/27 (about 1/4) for cubic cells.
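Both sparseness factors follow from one counting rule, sketched below for a single periodic cell. The rule used, namely that a point is dynamic when at most one of its indices is not a multiple of 3 (the point lies on a cell edge), is an assumption inferred from the quoted values 5/9 and 7/27; it is not stated in this form in the text. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def sparseness_factor(dim, cell=3):
    """Fraction of dynamic points in one periodic cell of a dim-dimensional grid.

    Assumed rule: a point is dynamic if at most one of its indices
    is not a multiple of `cell`, i.e. it lies on an edge of the cell.
    """
    kept = sum(1 for idx in product(range(cell), repeat=dim)
               if sum(1 for j in idx if j % cell != 0) <= 1)
    return Fraction(kept, cell**dim)


def closed_form(dim, cell=3):
    """Same count in closed form: one corner plus (cell - 1) edge points
    along each of the dim coordinate directions."""
    return Fraction(1 + dim * (cell - 1), cell**dim)


S2 = sparseness_factor(2)
S3 = sparseness_factor(3)
print(S2, S3)
```

Under the stated assumption of equal cost per dynamic point, the estimated cost relative to a full-grid scheme is simply S, so the potential saving grows from roughly a factor of 2 in 2D to roughly a factor of 4 in 3D.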
For time stepping, we use RK4 and this scheme is not totally uncompetitive (Durran 2012). In practical modeling RK3 is often preferred to RK4 (Klemp et al. 2007). Also, the different arithmetic forms of o3o3 can have an impact on the numerical performance of a model. While for a realistic model in practical use such small differences in performance may be worth investigating, in the present paper we do not investigate this.
It should be mentioned that the two versions of o3o3 presented here are not the only possibilities, but rather examples of alternative LGMs. It is not likely that the schemes presented are optimal. The procedures in this paper were chosen for simplicity of programming and not for optimality. In Eq. (30), a rather wide stencil, spreading over ten points, was used for the computation of ht,xxx,i+(3/2). A narrower stencil for the computation of this quantity is an option that could be investigated. A large number of options exist that could be used for optimization.
7. Conclusions
The two versions of the o3o3 LGM scheme share with SEM3 the properties of being at least uniform fourth order and conserving first-order moments. The accuracy of the simulation was rather similar for the schemes tested. In this study, it was not investigated whether other interesting properties of SEM3 are shared by o3o3. o3o3 differs from SEM3 by having a regular collocation grid. One of the o3o3 schemes had a CFL condition about 25% higher than standard o4 differences according to Table 2. SEM3 had a more restrictive CFL condition than either of the o3o3 schemes.
Two versions of o3o3 were investigated. These are examples of LGM schemes that are alternatives to the currently popular quadrature approximation with SEM schemes. There are more options, such as using other FDM schemes at the corner points and employing more regular polynomial spaces, for example spaces that are differentiable at the corner points. The investigation of this large family of LGM schemes is beyond the scope of this paper and could potentially lead to a further increase of efficiency.
For 2D, o3o3 was implemented on the sparse serendipity grid, where not all points of the corresponding regular grid are used as dynamic points. In 2D, the sparseness factor, which is the proportion of the dynamic points to all points is 5:9 on square grids, which promises a corresponding increase of computational efficiency. The sparseness factor in 3D is 7:27.
On the negative side, our particular implementation of o3o3 has a rather large stencil of 16 points and a null space. Such features might be improved by investigating other options for o3o3. The dynamic equations are not solved on the unused points of the sparse grid, which promises a considerable saving of computer time.
No author reported any potential conflicts of interest. This work is jointly supported by the National Key Research and Development Program of China (Grant 2017YFC0209800), the China Postdoctoral Science Foundation (Grant 2016M601101), and the National Key Research and Development Program of China (Grant 2016YFC1401705). The authors thank the cities of Erquy, Brittany, France and Bad Orb, Germany for providing office space for this cooperation. Dr. Fang is grateful for the support of the EPSRC grant: Managing Air for Green Inner Cities (MAGIC) (EP/N010221/1).
APPENDIX
Computation of the Evolution Matrix 
REFERENCES
Ahlberg, J. H., E. N. Nilson, and J. L. Walsh, 1967: The Theory of Splines and Their Applications. Academic Press, 296 pp.
Baumgardner, J. R., and P. O. Frederickson, 1985: Icosahedral discretization of the two-sphere. SIAM J. Numer. Anal., 22, 1107–1115, https://doi.org/10.1137/0722066.
Chen, C., and F. Xiao, 2008: Shallow water model on cubed-sphere by multi-moment finite volume method. J. Comput. Phys., 227, 5019–5044, https://doi.org/10.1016/j.jcp.2008.01.033.
Cockburn, B., and C. W. Shu, 2001: Runge–Kutta discontinuous Galerkin methods for convection-dominated problems. J. Sci. Comput., 16, 173–261, https://doi.org/10.1023/A:1012873910884.
Dennis, J. M., and Coauthors, 2012: CAM-SE: A scalable spectral element dynamical core for the community atmosphere model. Int. J. High Perform. Comput. Appl., 26, 74–89, https://doi.org/10.1177/1094342011428142.
Durran, D., 2012: Numerical Methods for Fluid Dynamics with Applications to Geophysics. Springer, 465 pp.
Evans, L. C., 1998: Partial Differential Equations. Graduate Studies in Mathematics, Vol. 19, American Mathematical Society, 749 pp.
Giraldo, F. X., 2001: A spectral element shallow water model on spherical geodesic grids. Int. J. Numer. Methods Fluids, 35, 869–901, https://doi.org/10.1002/1097-0363(20010430)35:8<869::AID-FLD116>3.0.CO;2-S.
Herrington, A. R., P. H. Lauritzen, M. A. Taylor, S. Goldhaber, B. E. Eaton, J. T. Bacmeister, K. A. Reed, and P. A. Ullrich, 2019: Physics–dynamics coupling with element-based high-order Galerkin methods: Quasi equal-area physics grid. Mon. Wea. Rev., 147, 69–84, https://doi.org/10.1175/MWR-D-18-0136.1.
Kalnay-Rivas, E., A. Bayliss, and J. Storch, 1977: The 4th order GISS model of the global atmosphere. Beitr. Phys. Atmos., 50, 299–311.
Klemp, J. B., W. C. Skamarock, and J. Dudhia, 2007: Conservative split-explicit time integration methods for the compressible nonhydrostatic equations. Mon. Wea. Rev., 135, 2897–2913, https://doi.org/10.1175/MWR3440.1.
Steppeler, J., 1976: The application of the second and third degree methods. J. Comput. Phys., 22, 295–318, https://doi.org/10.1016/0021-9991(76)90051-6.
Steppeler, J., 1987: Galerkin and Finite Element Methods in Numerical Weather Prediction. Duemmler, 120 pp.
Steppeler, J., and J. B. Klemp, 2017: Advection on cut-cell grids for an idealized mountain of constant slope. Mon. Wea. Rev., 145, 1765–1777, https://doi.org/10.1175/MWR-D-16-0308.1.
Steppeler, J., P. Ripodas, and S. Thomas, 2008: Third-order finite-difference schemes on icosahedral-type grids on the sphere. Mon. Wea. Rev., 136, 2683–2698, https://doi.org/10.1175/2007MWR2182.1.
Taylor, M. A., and A. Fournier, 2010: A compatible and conservative spectral element method on unstructured grids. J. Comput. Phys., 229, 5879–5895, https://doi.org/10.1016/j.jcp.2010.04.008.
Taylor, M., J. Tribbia, and M. Iskandarani, 1997: The spectral element method for the shallow water equations on the sphere. J. Comput. Phys., 130, 92–108, https://doi.org/10.1006/jcph.1996.5554.
Tomita, H., M. Tsugawa, M. Satoh, and K. Goto, 2001: Shallow water model on a modified icosahedral geodesic grid by using spring dynamics. J. Comput. Phys., 174, 579–613, https://doi.org/10.1006/jcph.2001.6897.
Ullrich, P. A., 2014a: A global finite-element shallow-water model supporting continuous and discontinuous elements. Geosci. Model Dev., 7, 3017–3035, https://doi.org/10.5194/gmd-7-3017-2014.
Ullrich, P. A., 2014b: Understanding the treatment of waves in atmospheric models. Part I: The shortest resolved waves of the 1d linearized shallow-water equations. Quart. J. Roy. Meteor. Soc., 140, 1426–1440, https://doi.org/10.1002/qj.2226.
Ullrich, P. A., D. Reynolds, J. E. Guerra, and M. A. Taylor, 2018: Impacts and importance of diffusion on the spectral element method: A linear analysis. J. Comput. Phys., 375, 427–446, https://doi.org/10.1016/j.jcp.2018.06.035.