## Abstract

Several transport schemes developed for spherical icosahedral grids are based on the piecewise linear approximation. The simplest one among them uses an algorithm where the tracer distribution in the upwind side of a cell face is reconstructed using a linear surface. Recently, it was demonstrated that using second- or fourth-order reconstructions instead of the linear one produces better results. The computational cost of the second-order reconstruction method was not much larger than the linear one, while that of the fourth-order one was significantly larger. In this work, the authors propose another second-order reconstruction scheme on the spherical icosahedral grids, motivated by some ideas from the piecewise parabolic method. The second-order profile of a tracer is reconstructed under two constraints: (i) the area integral of the profile is equal to the cell-averaged value times the cell area and (ii) the profile is the least squares fit to the cell-vertex values. The new scheme [the second upwind-biased quadratic approximation (UQA-2)] is more accurate than the preceding second-order reconstruction scheme [the first upwind-biased quadratic approximation (UQA-1)] in most of the tests in this work. Solutions of UQA-2 are sharper than those of UQA-1, although with slightly larger phase errors. The computational cost of UQA-2 is comparable to UQA-1.

## 1. Introduction

Spherical icosahedral grids are usually generated by grid partitioning, starting from the icosahedron (e.g., Heikes and Randall 1995). Recently, they are beginning to be widely adopted for atmosphere and ocean models (e.g., Ringler et al. 2000; Satoh et al. 2008; Skamarock et al. 2012). One reason may be their quasi-uniform grid structure, which enables higher computational efficiency on massively parallel computers. A global high-resolution simulation, which covered the entire sphere with cells of several-kilometer scale and, thus, required huge computing power, was realized on the grid (e.g., Miura et al. 2007). Although durations of such high-resolution simulations are currently limited from 1 to 3 months so far, increases in computer speed will hopefully enable such high-resolution climate simulations in the near future. In terms of climate simulations, accurate transports of water substances, aerosols, and chemical species are crucial for a realistic hydrological cycle and accurate radiation balance. Therefore, accurate transport schemes are preferable to enhance credibility of climate simulations.

Research on transport schemes may be regarded as one of the central topics in numerical modeling on spherical icosahedral grids. Stuhne and Peltier (1996) and Majewski et al. (2002) developed transport schemes using the advective form equations and second-order polynomial fitting. It may be possible to develop higher-order schemes by making higher-order polynomial fits with larger stencils. However, it is very difficult for transport schemes in the advective form to ensure conservation of transported quantities, and thus, they might be unsuitable for long-term simulations. In contrast, transport schemes in the flux form are well suited for conservation. The finite-volume method is widely adopted to develop flux-form transport schemes on the spherical icosahedral grids because it can deal with unstructured grids in a relatively straightforward manner. There is a large amount of literature on the finite-volume method on unstructured grids and some approaches appear attractive for future transport-scheme developments (e.g., Iske and Sonar 1996; Friedrich 1998), but it is not the purpose of this study to give an overall review of these schemes. Here, we restrict out attention to the transport schemes developed for the spherical icosahedral grids.

For multistage time-stepping schemes, Masuda and Ohnishi (1986) developed a spatial discretization method that was equivalent to a second-order-centered scheme on the regular rectangular grid, and Heikes and Randall (1995), Tomita et al. (2001), and Ringler and Randall (2002) followed their approach with some modifications. Lee and MacDonald (2009) and Weller et al. (2009) used upwind-biased polynomial interpolations to have improved results. Recently, Skamarock and Gassmann (2011) provided third- and fourth-order discretization methods. In contrast, for single-stage forward-in-time schemes, Thuburn (1997) extended the Uniformly Third-Order Polynomial Interpolation Algorithm (UTOPIA) scheme (Leonard et al. 1993; Rasch 1994) to spherical icosahedral grids although distortions of grid cells were not taken into account; the interpolation constants derived on the perfect hexagonal grid were used. Lipscomb and Ringler (2005, hereafter LR05) and Yeh (2007) applied the piecewise linear approximation of van Leer (1977) and reconstructed piecewise linear profiles inside hexagonal or pentagonal cells. Miura (2007b, hereafter M07b) introduced two simplifications, which will be explained in the next section, to avoid complex conditional branching in the methods of LR05 and Yeh (2007). Recently, Skamarock and Menchaca (2010, hereafter SM10) improved the scheme of M07b by replacing the linear reconstruction by second- or the fourth-order reconstructions. SM10 showed that the fourth-order reconstruction scheme was less diffusive than the second-order one, but it had much higher computational cost. We will present a review of the M07b and SM10 schemes in section 2.

In this paper, following M07b and SM10, we propose another second-order reconstruction-based scheme that produces more accurate simulations with only a moderate increase in computational cost. The second-order profile is reconstructed under two constraints. One is that the area integral of the profile inside a hexagonal or pentagonal cell is equal to the cell-center value multiplied by the cell area. The other is that the profile is the least squares fit to the cell-vertex values. We did not choose the fourth-order reconstruction because of its high computational cost demonstrated by SM10.

Section 2 describes about the schemes of M07b and SM10 first, and then, introduces the algorithm of the new scheme. Results from the new scheme are compared to those from the schemes of M07b, SM10, and other schemes in section 3. They are subjected to a cosine-bell advection test (Williamson et al. 1992), the slotted-cylinder advection test of LR05, and a deformational flow test from Nair and Lauritzen (2010). Computational performance is also compared on three different computer architectures: Apple iMac, HP ProLiant, and HITACHI HA8000. Section 4 presents the summary.

## 2. Transport schemes

The prognostic equation for a tracer amount transported by velocity can be written in the flux form as

where is density and is mixing ratio. Assuming a uniform distribution of , we obtain a form that is often considered in the researches of transport schemes as

If flow is divergent and density distribution is nonuniform, it may be required to consider the consistency between the discrete continuity and the transport equations (Gross et al. 2002; Niwa et al. 2011). The consistency with continuity is beyond our scope here and we limit our attention to (1). In this section, we first review the spatial discretization methods of the flux-divergence operator proposed by M07b and SM10, and then we introduce our new method.

The finite-volume method is used for a hexagonal or pentagonal cell to discretize the flux-divergence operator of (1) for the zeroth cell that includes the node point (Fig. 1a) as

where is the mixing ratio at ; is the area of the zeroth cell; is the number of the cells surrounding the zeroth cell; is the length of the *i*th cell face shared by the zeroth cell and the *i*th surrounding cell; is the unit vector normal to the *i*th cell face that points outward from the zeroth cell; and and are the mixing ratio and the velocity vector, respectively, estimated at the center of the *i*th cell face . We use a single-stage forward-in-time scheme to discretize the left-hand side of (2). The rest of this section describes about three methods to determine .

### a. Upwind-biased linear approximation (ULA) by M07b

It is assumed that the flow is outward from the zeroth cell along the *i*th cell face (). One of the two assumptions introduced by M07b is that the tracer amount swept out from the zeroth cell through the *i*th cell face during a time step is equal to the tracer amount contained in a parallelogram configured in the upwind side. The parallelogram is defined by shifting the *i*th cell face by , and an example is schematically illustrated by the gray shaded areas in Figs. 1a,b. This assumption gives the following form for :

where is the area of the parallelogram and means the area integral of a tracer profile inside the parallelogram. M07b and SM10 assumed the first- and the second-order distributions for , respectively, in a neighborhood of :

The other assumption introduced by M07b is that the tracer distribution about represents the tracer distribution inside the parallelogram even if the parallelogram overlaps other cells. These two assumptions hold not only for the schemes of M07b and SM10 but also for our new scheme.

The least squares fit on the local two-dimensional coordinate, following Stuhne and Peltier (1996) and Majewski et al. (2002), is also a common procedure to determine the first- or the second-order profile. One simple method to project the position of the node point of the *i*th surrounding cell onto the local *x–y* coordinate with an origin is

where and constitute a pair of the orthonormal basis, one choice of which can be the eastward and the northward tangential unit vectors except for the poles. Assuming about , we have the following matrix formula for (5):

where

We determine the coefficient vector using for the pentagonal cell or using for the hexagonal cell. Here, and are the inverse and pseudoinverse matrices of , respectively, which can be computed, for example, by using the singular vector decomposition algorithm (e.g., Golub and Reinsch 1970).

M07b reset the coefficients of the quadrature terms (, , and ) to zero for the linear reconstruction (4). The area integral in (3) about this linear profile can be computed as

where can be computed by substituting

which is the center of the parallelogram and corresponds to the Gauss quadrature point, in place of in (6). Substituting (8) into (3), we obtain a formula for as

We call this method the upwind-biased linear approximation (ULA) in this paper.

### b. The first upwind-biased quadratic approximation (UQA-1) by SM10

SM10 used (5) after modifying it to satisfy a constraint that the area integral of the profile *f* inside a hexagonal or a pentagonal cell is equal to the tracer amount contained in the cell. We confirmed that phase error was reduced and accuracy was improved by this modification (not shown). About the zeroth cell in Fig. 1, this constraint is written as

where means the area integral of over the zeroth cell. To satisfy this constraint, the second-order profile is shifted, as schematically drawn in Fig. 1c, by including a correction term, that is, by adjusting the constant in the polynomial. The new profile becomes

where the subscript *c* means “corrected.” Because the method to compute the correction term was not explicitly provided in SM10 and we use the same method in our new scheme later, it is worth noting our algorithm here.

It is known that the area integral of a second-order profile over a triangle can be calculated by using three Gauss quadrature points located at the center of each edge of the triangle. From this fact, the area integral of (5) inside the triangle (Fig. 2), with vertices , and , is

where is the area of the triangle; and , , and are the coordinates of the quadrature points , , and , respectively, in the local coordinate system. Here is at the edge midpoint between the cell node and the *i*th cell vertex , and is at the edge midpoint between and . Substituting (11) into (10) and using , we have , and the correction term is

where

Here it is assumed that the subscript *i* is cyclic for , and thus, . Because the weights *w _{i}* can be precomputed before the time loop, the computational cost of this correction is small.

The parallelogram on the upwind side of a cell face can be evenly split into two triangles (Fig. 1b), and thus the area integral of the polynomial (11) over the parallelogram can be calculated as

where , , , and are the coordinates of the Gauss quadrature points,

respectively; and is (9). Compared to another set of the Gauss quadrature points chosen by SM10, our choice requires one more point, but the computational cost is similar because the point is fixed and can be precomputed before the time loop. A merit of our choice is that is shared with ULA and the program code is also shared. Substituting (15) into (3), we obtain a formula to determine as

We call this method the first upwind-biased quadratic approximation (UQA-1).

### c. The second upwind-biased quadratic approximation (UQA-2)

Next, we introduce a new scheme called the second upwind-biased quadratic approximation (UQA-2). UQA-2 might be regarded as a variant of the piecewise parabolic method (PPM) of Colella and Woodward (1984) because some ideas of PPM are used in its reconstruction procedure. The original PPM requires in its initial stage that the quadratic profiles be reconstructed for each cell so that they are continuous at the cell interfaces in smooth parts of solution (discontinuities at the cell interfaces are introduced later for monotonicity). The cell interface values are determined by an interpolation and, if the grid is equally spaced, they constitute a fourth-order transport scheme under uniform flow. On the spherical icosahedral grids, we do not have appropriate methods to reconstruct quadratic profiles that are continuous along all cell faces for any tracer distribution. Therefore, we consider reducing discontinuities at cell vertices instead. As a result, two differences exist between the algorithms of UQA-1 and UQA-2. The first is that UQA-2 uses mixing ratios at the cell vertices (Fig. 3a) in the least squares fitting. The second is in the correction procedure for the second-order profile, the algorithm that we explain below. A schematic illustration is shown in Fig. 3c.

The coordinate of the cell vertices (Fig. 3a) can be computed by

The mixing ratio at the *i*th cell vertex, denoted by , is computed by combining two linear interpolations about the smaller and larger triangles schematically illustrated in Fig. 3b as

where we assume that the subscript *i* is cyclic so that for . Here the linear interpolation operator *I*, determining a scalar value at a point _{0} from the scalar values , , and at the vertices _{1}, _{2}, and _{3}, respectively, of a triangle (Fig. 2b), is defined by

where , , and are the areas of the inner triangles _{0}_{2}_{3}, _{0}_{3}_{1}, and _{0}_{1}_{2}, respectively. If the hexagonal grid is regular, (20) could constitute a fourth-order accurate gradient operator and a fourth-order accurate transport scheme under uniform flow (Miura 2007a). We use the pair of the constants, and −½, for all vertices of the spherical icosahedral grid, but this choice may be a weak point of UQA-2 because most of the hexagonal cells of the spherical icosahedral grid are somewhat distorted (e.g., Miura and Kimoto 2005).

Using at the cell vertices in (5), we have a matrix equation similar to (7) defining the least squares fit: . Here and in are those of the cell vertices and where T means transpose. In contrast to the UQA-1 scheme where the constant in the polynomial is adjusted after the least squares fit to satisfy the constraint in (10), we satisfy the constraint in the UQA-2 scheme by adjusting the cell-center value by and using a least squares fit to the adjusted cell-vertex values . The resulting polynomial for the UQA-2 scheme can be expressed as

where for pentagonal cells and for hexagonal cells. The constant is determined by integrating the polynomial over the cell in the application of the constraint (10):

where is the column vector of size ,

and the summation is over the quadrature points defined by (14). Introducing the row vector

we modify (22) to be

Using the fact that from , this equation can be solved for as

Using this in (21), the second-order profile is defined. To summarize, the polynomial (21) is constructed using a least squares fit of the polynomial to the adjusted vertex values , the full polynomial equals at the cell node point , and its area integral over the cell equals .

The Gauss quadrature points are the same as those of UQA-1. We can precompute , , and before the time loop, and thus, the computational cost of UQA-2 is only slightly larger than UQA-1 about the profile reconstruction. It should be noted, however, that UQA-2 requires larger stencils (13 for hexagons and 11 for pentagons) than UQA-1 (7 for hexagons and 6 for pentagons) because of the interpolation to the cell vertices (20). In a test shown in the next section, the computational cost of the data transfer, which accompanied the interpolation in our code, was not negligible when a larger number of processor cores were used.

## 3. Test results

The three transport schemes, ULA, UQA-1, and UQA-2, have been subjected to a cosine-bell advection test (Williamson et al. 1992), a slotted-cylinder advection test of LR05, and a deformational flow test (Nair and Lauritzen 2010) to compare their behaviors on the spherical icosahedral grids. The grids were generated by the iterative splitting from the icosahedron (Heikes and Randall 1995) and optimized by the spring method of Tomita et al. (2001) that was slightly modified by Miura and Kimoto (2005). After adjusting the positions of the grid nodes, the cell vertices were positioned at the barycenter of each triangle configured by three neighboring nodes. Computations were performed on grids having 2562, 10 242, 40 962, 16 382, and 655 362 hexagonal/pentagonal cells on the sphere. Instead of the “glevel” notation introduced by Tomita et al. (2001), we label them as H16, H32, H64, H128, and H256 from the number of hexagons plus one pentagon on each edge of the original icosahedron that is projected onto the sphere. Because it was incorporated into the derivations of ULA, UQA-1, and UQA-2, the single-stage forward-in-time scheme was used. The flux limiter of Thuburn (1995, 1996) was applied for monotonicity where not noted explicitly. The ZM-grid arrangement (Ringler and Randall 2002) was used; mixing ratios on the cell centers and flow velocities on the cell vertices.

### a. Advection of a cosine-shaped bell

Williamson et al. (1992) proposed a cosine-bell advection test as one of their seven test cases for shallow-water models. Most transport schemes on the icosahedral grids were subjected to this test (e.g., Heikes and Randall 1995; Thuburn 1997; LR05; Yeh 2007; M07b; Mittal et al. 2007; Lee and MacDonald 2009; SM10; Skamarock and Gassmann 2011). Because results were only weakly sensitive to the flow angle, the angle between the axis of the solid-body rotation and the pole axis of the sphere was set to zero. The definitions of the and norms to evaluate accuracy are as follows:

and

where and are the computed and exact solutions of the *i*th cell node, respectively, and is the area of the cell.

First, we examine the dependence of the solution errors on the time interval. The grid used was H64 and the time intervals were = 1800, 900, 450, 225, 100, and 50 s. Figure 4 indicates that the solution of ULA degrades rapidly as the time interval decreases, while the solution of UQA-1 is relatively insensitive to the time interval. These are the same features as those shown in Fig. 6 of SM10 and the magnitudes of the error norms of ULA and UQA-1 are very similar to their first- and the second-order reconstruction schemes. As it was discussed by van Leer (1977) and M07b, an overestimation of phase error with a Courant number smaller than 0.5 is a common feature of the schemes using the piecewise linear approximation, so not only ULA but also the schemes of LR05 and Yeh (2007) probably suffer from this error, as in Fig. 8a of LR05. UQA-2 is more accurate than UQA-1 for all of the time intervals although the error norms of UQA-2 are slightly more sensitive to the time interval than UQA-1.

Next, we compare the convergence properties of ULA, UQA-1, and UQA-2. The grids used were H16, H32, H64, H128, and H256 and the time intervals were = 7200, 3600, 1800, 900, and 450 s, respectively. Figure 5 includes not only the and norms of ULA, UQA-1, and UQA-2 but also the norm of the second- and the fourth-order reconstruction schemes of SM10 (Fig. 3 of SM10). Without the flux limiter (Fig. 5a), UQA-2 is obviously more accurate than ULA and UQA-1 because UQA-2 is less diffusive than the others (Figs. 6g–i), but convergence of the error norms is almost second order commonly. With the flux limiter (Fig. 5b), convergence of the norms of ULA and UQA-1 is greater than the second order as already shown by M07b and SM10, while that of UQA-2 is only slightly greater than the second order. As a result, the norm of UQA-2 becomes almost equivalent to that of UQA-1 for H256.

The reason for the slower convergence is probably explained as follows. The solution of UQA-2 is more accurate than ULA and UQA-1 for coarser grids because UQA-2 is much less diffusive than ULA and UQA-1 as is confirmed in the test results on the H32 grid (Figs. 6a–c). Comparing Fig. 6d with Fig. 6g, Fig. 6e with Fig. 6h, and Fig. 6f with Fig. 6i, we see that the flux limiter corrects diffusive error effectively, but does not correct phase error. It is suggested that UQA-2 accompanies significantly smaller diffusive error but slightly larger phase error than UQA-1.

Compared to preexisting schemes, phase error of UQA-2 is much smaller than that depicted in Fig. 6 of Heikes and Randall (1995) and Fig. 1 of Lee and MacDonald (2009), and is almost equivalent to that in Fig. 7 of Thuburn (1997), Fig. 5 of Mittal et al. (2007), Fig. 3 of Skamarock and Gassmann (2011), and Fig. 4 of SM10. The diffusive error of UQA-2 is somewhat smaller than all of the schemes listed above. It can be seen in Fig. 5 that UQA-1 is almost equivalent to the second-order reconstruction scheme of SM10, and UQA-2 is generally more accurate than the fourth-order reconstruction scheme of SM10. The norm of UQA-2 is less sensitive to the use of the flux limiter than the other schemes. With the flux limiter, however, convergence of the norm of UQA-2 is slower than the fourth-order reconstruction scheme of SM10. This suggests that phase error of UQA-2 is slightly larger than the fourth-order reconstruction scheme of SM10.

### b. Advection of a slotted cylinder

Following LR05, a harder test that includes sharp discontinuities has been performed. The cosine bell in the previous test was replaced with a slotted cylinder that had an initial height of 1000 m. Two series of tests were performed with different settings in the time interval. One series used 7200, 3600, 1800, 900, and 450 s, respectively, for H16, H32, H64, H128, and H256. The other series used 50 s for all the grids.

Figure 7 shows that the norms of ULA, UQA-1, and UQA-2 are almost the same for all resolutions regardless of the time interval. This lack of convergence in the norm is due to the discontinuities along the edge of the slotted-cylinder. Although the norms decrease as the grid becomes finer, their convergence rates are all less than first order as was shown in Fig. 7 of LR05. While ULA is as good as UQA-1 in the first series (Fig. 7a), it is the worst in the second series (Fig. 7b) because of the deformations of the solutions (Figs. 8d,g) similar to Fig. 9d of LR05. The norm of UQA-2 is almost insensitive to and always smaller than those of ULA and UQA-1 (Fig. 7) because discontinuities around the slotted cylinder are more sharply captured by UQA-2. The solution of UQA-2 on H32 (Fig. 8f) appears similar to that of UQA-1 on H64 (Fig. 8h). The norm of UQA-2 on H32 is almost equivalent to that of UQA-1 on H64 (Fig. 7b). Thus, the spatial resolution of UQA-2 is better than that of ULA and UQA-1 when discontinuities are present.

### c. Advection by a deformational flow

A test using a nondivergent deformational flow proposed by Nair and Lauritzen (2010) has been performed. The initial distribution of the tracer is given by a superposition of two Gaussian hills. The flow field is composed of a superposition of a deformational flow and a zonal background flow. These scalar and velocity fields were computed by using Eqs. (14), (31), and (32) in Nair and Lauritzen (2010). We set parameters not given in their paper as same as in Harris et al. (2011). The time intervals were 3600, 1800, 900, 450, and 225 s on H16, H32, H64, H128, and H256 grids, respectively.

The initial pair of the Gaussian hills is strongly deformed until day 6, as depicted in Fig. 9a. Our result is similar to the reference solution given by Nair and Lauritzen (2010). UQA-2 produces a less diffusive solution than those of ULA and UQA-1 for H64 after 12 days (Figs. 9b–d). With the flux limiter (Fig. 10b), convergence rates of the norm are less than the second order for ULA, and between the second order and the third order for UQA-1 and UQA-2. Convergence of the norms is less than the second order for all of the three schemes. Without the flux limiter (Fig. 10a), the and norms of UQA-2 converge nearly in the third order from H32 to H64, but their convergences degrade as the grid becomes finer.

The reason for this degradation in convergence may be explained by considering diffusive and phase errors. When the horizontal resolution is not sufficient to resolve deformations of the Gaussian hills, the dominant source of error may be implicit diffusion. As suggested in the previous tests, UQA-2 appears to reduce diffusive error faster than phase error as the grid becomes finer, thus the dominant source of error may become phase error on finer grids. It is speculated that convergence of phase error is almost in the second order, while that of diffusive error is greater than the second order. Some signatures of this phase error are seen for UQA-2 (Fig. 9f), but not for UQA-1 (Fig. 9e).

### d. Computational cost

In this subsection, we compare the computational costs of ULA, UQA-1 and UQA-2. It should be noted that this comparison was made under limited conditions. Results may strongly depend not only on the environment, such as architectures and compilers, but also on coding skills. A single FORTRAN program coded for this work was run on three different computers. The computing environments were as follows. An Apple iMac with Intel Core i7 processor was used for a single process run. An Intel FORTRAN Compiler was used with -*fast* option. An HP ProLiant with Intel Xeon processors was used for a small multiprocess run. A PGI FORTRAN Compiler was used with *-fastsse* option and OpenMPI was used for parallelization. A HITACHI HA8000 with AMD Opteron processors was used for a large multiprocess run. A HITACHI FORTRAN Compiler was used with *-Oss -noparallel* options and MPICH-MX was used for parallelization. The numbers of processor cores were 10 and 160 for the small and the large multiprocess runs, respectively.

The pair of deformational flow tests from section 3c was performed with and without the flux limiter and was repeated three times. The grids used were H32 for iMac and ProLiant and H256 for HA8000. Figure 11 shows the averages of the computing times. *Preprocess* includes computations of the normal velocity, determinations of quadrature point(s) and evaluations of the upwind side. *Flux divergence* includes interpolations to the vertices (UQA-2 only), profile reconstructions, estimations of the cell face values, and computations of the flux divergence and the flux limiter (if used).

Figure 11 indicates that ULA is the fastest and UQA-2 is the slowest, as expected. On iMac and ProLiant, computing costs were only weakly sensitive to the differences in architecture, compiler, and parallelization. *Preprocess* of UQA-1 and UQA-2 consumed about 40% more time than ULA because of the additional quadrature points. Without the flux limiter, the cost of UQA-1 was about 40% larger than ULA in total, and that of UQA-2 was about 10% larger than UQA-1. With the flux limiter, those differences became smaller because the significant cost of the flux limiter was common for all. In this test, the cost of UQA-1 was about 25% larger than that of ULA in total, and that of UQA-2 was about 5% larger than that of UQA-1. The cost of UQA-2 is comparable to UQA-1 if the number of process cores is small and if the flux limiter is used. On HA8000, *flux divergence* of UQA-2 was obviously time consuming, comparing to UQA-1. This is because the interpolations to the cell vertices and necessary data transfers are included in *flux divergence*. To reduce this cost, we may need to improve this code for more efficient data transfer.

## 4. Summary

This study proposed a new upwind-biased forward-in-time transport scheme [second upwind-biased quadratic approximation (UQA-2)] for the spherical icosahedral grids. UQA-2 basically follows the ideas of ULA by M07b and UQA-1 by SM10 and also benefits from basic ideas of PPM (Colella and Woodward 1984). The second-order tracer distribution on the upwind side of a cell face is reconstructed by imposing two constraints. The first one is that the second-order polynomial is the least squares fit to the interpolated cell-vertex values. The second one is that the area integral of the second-order polynomial over the cell located in the upwind side is equal to the cell-averaged value times the cell area. By fitting the second-order polynomial to the cell-vertex values (that have been interpolated from the cell-center values), we significantly minimize the discontinuity at the cell edges and vertices; PPM enforces this continuity in its unlimited formulation.

Accuracy of UQA-2 was compared with those of ULA and UQA-1 through a cosine-bell advection test (Williamson et al. 1992), a slotted-cylinder advection test of LR05, and a deformational flow test (Nair and Lauritzen 2010). UQA-2 was more accurate than ULA and UQA-1 in most of the tests. UQA-2 showed nearly third-order convergence of the error norms for a C-infinity function in a lower-resolution range although convergence rates degraded as the grid becomes finer. For discontinuities, UQA-2 reproduced sharper solutions than ULA and UQA-1. Because of its higher spatial resolution, UQA-2 is more suitable than ULA and UQA-1 for high-resolution atmospheric simulations that contain sharp boundaries between cloudy and cloud-free regions.

Computational cost of UQA-2 was compared with those of ULA and UQA-1 on three different architectures using different compilers. Single process and multiprocess runs were also compared. General features of the results were not sensitive to the differences in architectures, compilers, and parallelization, but performance of UQA-2 degraded as a result of the cost of data transfers when many processors were used. Without a flux limiter, UQA-2 was more costly than ULA and UQA-1 by about 50% and about 10%, respectively. With a flux limiter, the cost differences were less because of the significant cost of the flux limiter, but UQA-2 was still more costly by about 30% compared to ULA and by about 5% compared to UQA-1.

## Acknowledgments

Hiroaki Miura thanks Prof. David Randall for supporting his visit to Colorado State University; a part of this work was done during that visit. Dr. Takanobu Yamaguchi and Dr. Ross Heikes are also acknowledged for fruitful discussions. This work was supported by the Grant-in-Aid for Young Scientists (B) of MEXT (22740310). The HA8000 supercomputer of The University of Tokyo was used in a test.

## REFERENCES

*Short- and Medium-Range Numerical Weather Prediction,*T. Matsuno, Ed., Japan Meteorological Society, 317–326.