1. Introduction
Large-scale parallel computing has the potential to alter the landscape of turbulence simulations in the atmospheric and oceanic planetary boundary layers (PBLs), as increased computer power using O(10⁴–10⁵) or more processors (National Science Foundation 2007) will permit large-eddy simulations (LESs) of turbulent PBLs coupling small and large scales in realistic outdoor environments. Applications include atmosphere–land interactions (Patton et al. 2005), boundary layers with surface water wave effects (Sullivan and McWilliams 2010; Sullivan et al. 2007, 2008), weakly stable nocturnal flows (Beare et al. 2006), flow in complex terrain (Lundquist et al. 2010), stratocumulus clouds (Stevens et al. 2005), tropical boundary layers beneath deep convection (Moeng et al. 2009), and coupling with mesoscale weather events (Bryan et al. 2003), to mention just a few.
Given the prominent and important role of LES in studying boundary layer dynamics (Wyngaard 1998), it is important to examine the quality of LES solutions, and in particular their dependence on the grid mesh, subgrid-scale (SGS) parameterizations, numerical discretizations, and surface boundary conditions. Assessing the numerical convergence and the quantification of uncertainty in LES, induced by modeling and numerical errors, is compounded by the significant computational expense needed to carry out meaningful grid refinement for a three-dimensional time-dependent turbulent flow (Pope 2000). The subgrid-scale model and numerical discretization errors are intertwined since both depend explicitly on the mesh spacing (Chow and Moin 2003; Meyers et al. 2007; Geurts and Fröhlich 2002). The effective Reynolds number associated with the subgrid-scale model can vary widely so that LES solutions can be either deterministic or stochastic (Bryan et al. 2003; Wyngaard 2004a). When the effective Reynolds number is sufficiently large, resolved turbulence is supported and LES solutions are stochastic, which requires that time- and space-averaged statistics be examined in order to judge convergence. Designing metrics to assess solution error is not obvious (Celik et al. 2006). Meyers et al. (2007) propose a framework for LES model evaluation using large- and small-scale metrics that are both physics and mathematics based. They are able to extract LES discretization errors for idealized homogeneous isotropic turbulence simulations with the Smagorinsky model but rely on a direct numerical simulation (DNS) as ground truth in their evaluations, which is not available for the high-Reynolds number PBL.
Here, we investigate one aspect of assessing the quality of LES solutions, namely the sensitivity and convergence of LES solutions as the grid mesh is substantially varied for a particular choice of subgrid-scale model. The physical problem investigated is a very weakly sheared daytime convective PBL similar to that studied by Schmidt and Schumann (1989). There have been a few previous investigations that explored some aspects of the convergence of LES solutions, mainly focused on an intercomparison of different codes on a similar mesh [e.g., see LES intercomparison studies by Beare et al. (2006), Stevens et al. (2005), Bretherton et al. (1999), Andren et al. (1994), Nieuwstadt et al. (1993), and Fedorovich et al. (2004)]. Bryan et al. (2003) examined the resolution requirements to simulate convective weather events and found that the statistical properties of squall lines are still not converged with a grid spacing of 125 m. Past investigations have been carried out with the intent of clarifying the behavior of LES for different PBL flows. Nieuwstadt et al. (1993) report on the first intercomparison of simulation codes for the convective PBL using coarse 40³ meshes. Andren et al. (1994) examined neutrally stratified PBLs, Beare et al. (2006) considered the behavior of the stable PBL, and Bretherton et al. (1999) studied radiatively driven entrainment in a smoke cloud. Previous work aligned with the present study is documented by Mason and Brown (1999). They examined a modest range of domain size, grid resolutions, and subgrid-scale model constants but were particularly interested in the influence of filter-scale CsΔf; Cs is the Smagorinsky constant and Δf is a characteristic subgrid length scale.
The outline of the paper is as follows: section 2 is a brief introduction to the LES equations appropriate for a high-Reynolds number PBL; section 3 describes the LES grid refinement experiments; results are presented in section 4; section 5 provides a summary of the findings; and the appendix provides technical details about the LES code parallelization and performance.
2. LES equations










An important difference between smooth and rough wall LES is the specification of surface boundary conditions. As is common practice with geophysical flows, we impose rough wall boundary conditions based on a drag rule where the surface transfer coefficients are determined from Monin–Obukhov similarity functions (Moeng 1984; Moeng and Sullivan 1994). A high Reynolds number model for viscous dissipation is used in (1c) [see discussion near (6)]. Thus, molecular viscosity and diffusivity do not appear in the LES equation set. The sidewall (x, y) boundary conditions are periodic and a radiation boundary condition (Klemp and Durran 1983) is used at the top of the domain.
In our LES code, (1) are integrated in time using a fractional step method. The spatial discretization is second-order finite difference in the vertical direction and pseudospectral in the horizontal planes. The resolved vertical flux
3. Design of LES experiments
A suite of simulations on a fixed computational domain with varying grid resolutions is performed to examine the convergence of the LES equations given in section 2 using the parallel algorithm described in the appendix. A canonical daytime convective PBL is simulated in a computational domain (Lx, Ly, Lz) = (5120, 5120, 2048) m. Six simulations are performed with grid meshes of 32³, 64³, 128³, 256³, 512³, and 1024³, and for each mesh the spacing is held constant in the three (x, y, z) directions (see Table 1). The PBL is driven by a constant surface buoyancy flux Q* = 0.24 K m s⁻¹ and weak geostrophic winds (Ug, Vg) = (1, 0) m s⁻¹. Other external inputs are surface roughness z0 = 0.1 m, Coriolis parameter f = 1 × 10⁻⁴ s⁻¹, and initial inversion height zi ~ 1024 m. In terms of the initial PBL height, the computational domain is (Lx, Ly, Lz)/zi = (5, 5, 2), which is sufficient to allow fully turbulent flow fields to develop independently of the periodic sidewall boundary conditions (e.g., Schmidt and Schumann 1989). At long time scales (t ≥ 8 h) the horizontal domain should be expanded to accommodate the very large structures that can develop under persistent forcing, as discovered by Jonker et al. (1999) and de Roode et al. (2004).
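The grid spacings implied by this setup follow from simple division. The sketch below is illustrative only: it assumes uniform spacing equal to the domain length divided by the number of points in each direction, and it uses the initial inversion height for the ratio zi/Δz (the evolved zi reported for each run will differ). The function and variable names are hypothetical.

```python
# Domain size (m), initial inversion height (m), and mesh sizes quoted in the text.
LX, LY, LZ = 5120.0, 5120.0, 2048.0
ZI_INITIAL = 1024.0
MESH_SIZES = [32, 64, 128, 256, 512, 1024]


def grid_spacings(n):
    """Return (dx, dy, dz) in metres for an n^3 mesh, assuming spacing = L / n."""
    return (LX / n, LY / n, LZ / n)


def spacing_table():
    """Build one row per mesh: (n, dx, dy, dz, zi_initial / dz)."""
    rows = []
    for n in MESH_SIZES:
        dx, dy, dz = grid_spacings(n)
        rows.append((n, dx, dy, dz, ZI_INITIAL / dz))
    return rows


if __name__ == "__main__":
    for n, dx, dy, dz, ratio in spacing_table():
        print(f"{n:5d}^3  dx={dx:7.2f} m  dy={dy:7.2f} m  dz={dz:6.2f} m  zi/dz={ratio:6.1f}")
```

Under these assumptions the horizontal spacing is 2.5 times the vertical spacing on every mesh, so the grid aspect ratio is the same across the whole suite.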
Table 1. Simulation grid spacings.



Grid resolution tests with LES are demanding since the resolved turbulent motions are always 3D and time dependent. For rough-wall LES of a given domain size, the number of mesh points in a single direction is N ~ Lx/Δx, and hence the total number of points is N³ ~ (Lx/Δx)³, assuming equal spacing in all three directions. However, refining the mesh also lowers the acceptable time step owing to the limits imposed by a CFL constraint; that is, CFL = |u|max Δt/Δx. Thus, as the grid spacing decreases, the number of time steps needed to advance the solutions to the same time further increases by the factor M ~ Lx/Δx (see, e.g., Pope 2000, p. 348). The total computational work for a complete simulation is then M · N³ ~ (Lx/Δx)⁴. As an example of the steep climb in work with increasing resolution, the computational effort on a mesh with 1024³ grid points is approximately 4096 times greater than the work required on a mesh with 128³ grid points. This estimate is low by a factor of about 2 since our computations are dominated by FFT work, which scales as N log N in both x and y.
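The work estimate above can be checked with a few lines of arithmetic. The sketch below is a rough illustration, not a performance model: `work_simple` is the (Lx/Δx)⁴ estimate, and `work_fft` assumes the per-step cost is proportional to Nz · Mx · My with M = N log N, which is one reading of the FFT work measure used in the appendix. Both function names are hypothetical.

```python
import math


def work_simple(n):
    """Work ~ N^3 grid points times N time steps, i.e. (Lx/dx)^4."""
    return float(n) ** 4


def work_fft(n):
    """Work with FFT cost M = N log N in both x and y.

    Assumption: per-step work ~ Nz * Mx * My and the number of steps ~ N.
    The base of the logarithm only changes a constant factor.
    """
    m = n * math.log2(n)
    return n * m * m * n


def ratio(work, n_fine, n_coarse):
    """Relative cost of the fine mesh compared with the coarse mesh."""
    return work(n_fine) / work(n_coarse)


if __name__ == "__main__":
    simple = ratio(work_simple, 1024, 128)
    with_fft = ratio(work_fft, 1024, 128)
    print(f"(Lx/dx)^4 estimate:    {simple:.0f}")
    print(f"with N log N factors:  {with_fft:.0f}")
    print(f"extra factor from FFT: {with_fft / simple:.2f}")
```

With these assumptions the extra factor is (log 1024 / log 128)² = (10/7)², or roughly 2, consistent with the factor quoted in the text.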
4. Results
In the analysis of the LES solutions we discuss the variation of statistics and vertical profiles as a function of the mesh resolution ratio zi/Δf or zi/Δz; here zi is the PBL depth and Δf is the LES filter width, which is related to the mesh spacings Δxi, as discussed below. In the interior of the PBL, away from the surface layer and entrainment zone, numerous observational and LES studies find that zi is a characteristic scale of the energy containing eddies in the convective PBL (e.g., Deardorff 1972a; Lenschow et al. 1980; Lothon et al. 2009; Jonker et al. 1999). Thus, the nondimensional ratio zi/Δf can be interpreted as a measure of the scale separation between the energy-containing eddies and those near the filter cutoff. When the SGS closure is the Smagorinsky model, Mason and Brown (1999) and Pope (2000) prefer to interpret the LES set of equations as a numerical system with the degrees of freedom limited by a low-pass “Smagorinsky filter.” The cutoff scale of the filter is CsΔf, with Cs equal to the Smagorinsky constant. Muschinski (1996) builds on this interpretation and discusses the properties of a non-Newtonian LES fluid with a Smagorinsky viscosity. To place our simulations in the context of this alternate interpretation, we also present the results as a function of the resolution ratio zi/(CsΔf). In either interpretation, when zi/Δf ≫ 1 LES solutions have a wide separation between the energy-containing eddies and those near the filter cutoff scale. Observations of subgrid-scale turbulence in the atmospheric surface layer demonstrate that a similar ratio of scales Λw/Δf, where Λw is the scale of the peak in the vertical velocity spectrum, is a useful dimensionless parameter that collapses the variation of subgrid-scale turbulence over a range of stratification and filter widths (Sullivan et al. 2003).
A summary of bulk PBL properties generated from the various simulations is provided in Table 2. Entries in this table are PBL depth zi, convective velocity scale w*, normalized entrainment rate ratio we/w*, large-eddy Reynolds number at mid-PBL Reℓ, friction velocity ratio u*/w*, bottom and top of the entrainment zone (δb, δt)/zi, and the ratio of PBL depth to filter width and vertical resolution zi/(Δf, CsΔf, Δz). Note that δb and δt are the endpoints of the entrainment zone defined as the region where the total vertical temperature flux is negative. A broad look at the tabulated results shows that w* is almost invariant with the mesh resolution, while the friction velocity shows a slight downward trend of ~10% as the mesh varies. Our values of u*/w* ~ 0.08 for zi/z0 ~ 10⁴ are close to those predicted by Schmidt and Schumann (1989). Meanwhile, the entrainment rate and entrainment zone depth vary substantially on the coarser meshes. The variations of the bulk properties and the vertical profiles of selected flow variables are discussed below.
Table 2. Bulk simulation properties.
a. Inertial subrange scaling






Fig. 1. Variation of large-eddy Reynolds number Reℓ with mesh resolution at heights z/zi = 0.1, 0.5, and 0.9 denoted by symbols □, ⋄, and ○, respectively; Reℓ is computed from (6). Inertial subrange scaling is obeyed when the solid line becomes flat. Note the bottom and top x axes show the resolution ratios of zi to Δf and to CsΔf, respectively.
Citation: Journal of the Atmospheric Sciences 68, 10; 10.1175/JAS-D-10-05010.1
An alternate but equivalent statement of the high Reynolds number scaling Reℓ ~ (zi/Δf)^(4/3) is that the dissipation
b. Temperature profiles and entrainment statistics
The vertical structure of the mean temperature
Fig. 2. Vertical profile of virtual potential temperature
Fig. 3. Vertical profile of total temperature flux
The response of the temperature flux profiles to the varying mean θ profiles, shown in Fig. 3, is interesting. Despite the radical changes to the overlying temperature structure with varying mesh, all the temperature flux profiles decrease linearly over the boundary layer, reaching a minimum (negative) value near and below zi. Note that Fig. 3 shows the total temperature flux (i.e., the sum of resolved plus subgrid-scale fluxes where the latter is retrieved from the SGS eddy viscosity model
The temporal variation of the boundary layer inversion height zi(t), shown in Fig. 4, is a strong measure of solution convergence. Here zi is determined using the maximum vertical gradient in temperature; that is, for each x, y gridpoint we search along a vertical column to find the location of the maximum in
Fig. 4. Variation of the boundary layer height zi with nondimensional time t/T; the large-eddy time scale T = zi/w*. The labels A–F correspond to the grid resolutions 32³, 64³, 128³, 256³, 512³, and 1024³, respectively. The high-resolution runs (D–F) overlap. The simulation marked with an open square uses a mesh of 64³ but with no monotone vertical temperature flux; that marked with an open circle uses a mesh of 256³ and is identical to simulation D but uses a filter width Δf equal to simulation B.




The couplings among mean temperature, temperature flux, and temperature variance in (8) are subtle and complex and apparently depend critically on the mean temperature gradient. This in turn impacts the overall entrainment predicted by LES. To illustrate the influence of
Based on our LES experiments we conclude that to generate grid-independent solutions the mesh needs to have sufficiently fine vertical resolution to capture both the mean temperature gradients in the overlying inversion and the turbulence. However, vertical refinement requires a comparable refinement of the horizontal grid in order to maintain reasonable aspect ratio grids; grid isotropy impacts inertial range SGS constants (e.g., Scotti et al. 1993). Generally, the impact of grid anisotropy Δx ≠ Δy ≠ Δz on LES solutions is not well understood (e.g., Kaltenbach 1997; Silva Lopes and Palma 2002). We note, however, that in all our computations Δx = Δy and hence the explicit (dealiasing) filtering used in horizontal x–y planes is isotropic. Tong et al. (1998) show that 2D (isotropic) filtering, as used here, is nearly equivalent to 3D filtering.
These mesh resolution experiments have implications for LES studies of entrainment. There is a subtle interplay among mesh resolution, the overlying inversion, the minimum temperature flux, and the entrainment rate. Insufficient vertical resolution weakens the inversion and increases the entrainment rate while maintaining nearly the same minimum temperature flux. A first-order entrainment jump model (Betts 1974) shows how a finite inversion thickness contributes to the entrainment rate (see Sullivan et al. 1998). Linearity of the temperature (or heat) flux profile and minimum temperature flux approximately equal to −0.2Q* are relatively insensitive to the mesh resolution and thus are insufficient to judge the convergence of LES solutions for the convective boundary layer. The variation of the entrainment rate we = dzi/dt is a much more sensitive indicator of LES solution convergence.
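Since the entrainment rate we = dzi/dt is singled out as the more sensitive convergence indicator, it may help to show one way of estimating it from a time series of inversion heights. The sketch below fits a least-squares line to (t, zi) samples; this fitting choice, the function name, and the sample data are assumptions for illustration. The numbers are synthetic and are not LES output.

```python
def entrainment_rate(times, heights):
    """Least-squares slope dzi/dt (m/s) of inversion-height samples."""
    n = len(times)
    if n < 2 or n != len(heights):
        raise ValueError("need at least two matching (t, zi) samples")
    t_mean = sum(times) / n
    z_mean = sum(heights) / n
    cov = sum((t - t_mean) * (z - z_mean) for t, z in zip(times, heights))
    var = sum((t - t_mean) ** 2 for t in times)
    if var == 0.0:
        raise ValueError("all sample times are identical")
    return cov / var


# Synthetic series (made up for illustration): zi grows linearly at 0.02 m/s.
times = [600.0 * k for k in range(10)]          # seconds
heights = [1024.0 + 0.02 * t for t in times]    # metres

w_star = 2.0  # hypothetical convective velocity scale (m/s)
we = entrainment_rate(times, heights)
print(f"we = {we:.4f} m/s, we/w* = {we / w_star:.4f}")
```

Comparing we/w* obtained this way across meshes, over the same averaging window, gives a single number per run that can be tracked as the grid is refined.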
c. Convergence of variance statistics
Fig. 5. Effect of mesh resolution on the (left) total turbulent kinetic energy (TKE) and (right) total temperature variance Θ². TKE is normalized by
Inspection of the vertical profiles of total vertical variance
Fig. 6. Total variance (resolved plus SGS contributions) of the (left) vertical and (right) horizontal velocities. The horizontal variance uh is the sum of the u and υ components; see definition below (9).



d. Spectral analysis
Figure 7 shows two-dimensional spectra of the vertical and horizontal velocity at nondimensional heights z/zi = (0.9, 0.5, 0.1) for varying mesh resolutions. These spectra are functions of the horizontal wavenumber vector
Fig. 7. Two-dimensional energy spectra of (left) vertical velocity w and (right) horizontal velocity u in the PBL for varying meshes. The spectra are functions of the magnitude of the horizontal wavenumber vector kh = |k|. The groups of spectra at the top, middle, and bottom in each plot correspond to the heights z/zi = 0.9, 0.5, and 0.1, respectively. For clarity, the spectral amplitudes in each group are multiplied by the numerical factor on the left-hand side of the plot. The dashed line has slope
In the upper boundary layer, z/zi = 0.9, all the meshes capture the peak in the vertical velocity spectrum reasonably well and also display a
The spectrum of horizontal velocity displays an intriguing behavior at z/zi = 0.1, and to a lesser extent at z/zi = 0.9. Its peak energy is clearly at a lower wavenumber compared to the vertical velocity, and the finest-resolution run hints at a two-slope character (i.e., it displays a slope transition near khzi ~ 25). This behavior reflects the redistribution of energy near the lower surface because of the wall presence. This is exposed more clearly in Fig. 8, where we show the z variation of the spectra from the 1024³ simulation as the lower boundary is approached. We notice a smooth gradual decrease in the magnitude of the vertical velocity spectrum at low wavenumbers accompanied by a gradual shift in the peak toward higher wavenumbers as z/zi decreases. A slope of
Fig. 8. Two-dimensional energy spectrum of (left) vertical velocity w and (right) horizontal velocity u near the lower boundary at various heights z/zi = 0.1, 0.2, 0.3, and 0.5 for a simulation with 1024³ grid points. The dashed line has slope
e. High-order moments
Velocity and scalar moments higher than second order appear in ensemble average TKE and flux budgets and are used in the interpretation of PBL dynamics (e.g., Mironov 2009). Often LES flow fields are used to compute high-order moments, but it is unknown how grid resolution impacts these estimates. Moeng and Rotunno (1990) identify the vertical velocity skewness Sw as a critical parameter in boundary layer dynamics. In convective PBLs, Sw is an indicator of the updraft–downdraft distribution, provides clues about vertical transport, and is utilized in dispersion studies (Weil 1988, 1990). Further, Moeng and Rotunno (1990) find that vertical velocity skewness is sensitive to the type of surface boundary conditions and also varies with Reynolds number in direct numerical simulations.
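For reference, the resolved vertical velocity skewness is conventionally formed from the third and second central moments, Sw = ⟨w′³⟩/⟨w′²⟩^(3/2), with the average taken over a horizontal plane. The sketch below is a minimal illustration of that standard definition; the function name and the toy sample are hypothetical, and SGS contributions to the moments are not included.

```python
def skewness(w_plane):
    """Skewness <w'^3> / <w'^2>^(3/2) of samples from one horizontal plane.

    Fluctuations w' are taken about the plane average.
    """
    n = len(w_plane)
    if n == 0:
        raise ValueError("empty sample")
    mean = sum(w_plane) / n
    m2 = sum((w - mean) ** 2 for w in w_plane) / n
    m3 = sum((w - mean) ** 3 for w in w_plane) / n
    if m2 == 0.0:
        raise ValueError("zero variance: skewness is undefined")
    return m3 / m2 ** 1.5


# Toy sample (not LES data): a few strong updrafts among many weak downdrafts.
sample = [3.0] * 2 + [-0.75] * 8
print(f"Sw = {skewness(sample):.3f}")
```

A sample with narrow, strong updrafts and broad, weak downdrafts gives positive skewness, which is the sense in which Sw indicates the updraft–downdraft distribution.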

Vertical profiles of
Fig. 9. Effect of mesh resolution on resolved vertical velocity skewness



To evaluate the importance of the SGS moments
Vertical profiles of skewness and SGS moments constructed from the filtered 1024³ simulation (referred to as case Ff) are presented in Fig. 10; results obtained from filtering case E are similar. The skewness estimates from Ff are similar to the comparable 64³ coarse simulation result (i.e., small in the surface layer and large near the inversion) but exhibit important quantitative differences. In the surface layer, the skewness from case Ff is always positive except very near the ground, in contrast to simulation B. This is in agreement with our physical expectation. Also the skewness from Ff matches the high-resolution result in the mid-PBL. The SGS moments in Fig. 10b illustrate the shortcomings of the coarse 64³ simulation (case B). In the surface layer the triple moment
Fig. 10. (a) Skewness from the 1024³ simulation (solid), the 1024³ simulation filtered in horizontal planes to 64² resolution (dotted), and the 64³ simulation (dashed). (b) Third- and second-order SGS moments computed from the 1024³ simulation, showing
Fig. 11. Comparison of third- and second-order resolved vertical velocity moments from the 1024³ simulation (solid), the 1024³ simulation filtered in horizontal planes to 64² resolution (dotted), and the 64³ simulation (dashed), showing (a) normalized
The turbulent transport (term T) in (9a) and (9b) depends on the vertical divergence of the third-order moments
Fig. 12. Effect of mesh resolution on resolved third-order moments (left)
f. Flow visualization
A complete discussion of the impact of mesh resolution on the formation and dynamics of coherent structures and their connection to the statistical moments in the convective PBL is beyond the scope of the present work. Here we briefly illustrate one aspect of large- and small-scale interaction that can occur in high-resolution LES. In Fig. 13, we observe the classic formation of plumes in a convective PBL. Vigorous thermal plumes near the top of the PBL can trace their roots through the middle of the PBL down to the surface layer. Convergence at the common corners of the hexagonal patterns in the surface layer leads to the formation of strong updrafts that evolve into large-scale plumes that fill and dominate the dynamics of the daytime PBL. Near the inversion a descending shell of motion readily develops around each plume.
Fig. 13. Visualization of the vertical velocity field in a convective PBL at different heights from the 1024³ simulation: z/zi = (top left) 0.04, (top right) 0.1, (bottom left) 0.5, and (bottom right) 0.9. The gray scale color bar changes between the panels and is in units of m s⁻¹.
Closer inspection of the large-scale flow patterns in Fig. 13 also reveals coherent smaller-scale structures. This is demonstrated in Fig. 14, where we track the evolution of 10⁵ particles over about 400 s. Over the limited region where the particles are released the flow is dominated by a persistent line of larger-scale upward convection. On either side of the convection line descending motion develops, and near the surface these downdrafts turn laterally and converge. This surface layer convergence spawns many small-scale vertically oriented vortices that resemble dust devils. These rapidly rotating vortices are readily observed, persist in time, and rotate in both clockwise and counterclockwise directions. Often the vortices coalesce in a region where a coherent thermal plume erupts. Coarse-mesh LES hints at these coherent vortices, but fine-resolution simulations allow a detailed examination of their dynamics within the larger-scale flow. Previously, Kanak (2005) observed the formation of dust devils in convective simulations, but in small computational domains O(750 m).
Fig. 14. Visualization of 10⁵ particles randomly released in a convective PBL at z/zi ~ 0.01 over a limited horizontal extent from the 1024³ simulation of convection. The viewed area, ~3.8% of the total horizontal domain, is the topmost left corner from the top-left panel of Fig. 13. Notice the evolution of the larger-scale line of convection into small-scale vortical motions that resemble dust devils. Time advances from left to right beginning along the top row of images. The images are 71.6, 151, and 390 s after the initial release. Vertical vorticity
5. Summary
A highly parallel large-eddy simulation (LES) code for the atmospheric boundary layer is developed based on a high-Reynolds number Boussinesq flow model with a fully rough lower boundary. The numerical scheme employs pseudospectral differencing in horizontal planes and solves an elliptic pressure Poisson equation utilizing 2D domain decomposition. Despite these global operations, the code exhibits both weak and strong scaling over a wide range of problem sizes; scaling tests are carried out using as many as 16 384 processors (see the appendix).
This code is used to carry out a grid sensitivity study of a daytime convective PBL for a wide range of meshes varying from 32³ to 1024³. Based on the variation of the second-order statistics, spectra, and entrainment statistics we find that the 3D time-dependent LES solutions numerically converge as the mesh is refined for this canonical problem. In the boundary layer interior (0.1 < z/zi < 0.9, where zi is the boundary layer height), the total variances and temperature flux have effectively converged when the mesh resolution is 256³ or greater. The convergence of the total vertical velocity is very good. For our mesh of 256³, the ratio zi/Δf > 60 or zi/(CsΔf) > 310, where Δf is the LES filter width and Cs is the Smagorinsky constant. In this regime, the scale separation between the energy containing eddies and the filter cutoff scale is sufficiently wide that the large-eddy Reynolds number Reℓ ~ (zi/Δf)^(4/3) and the parameterized viscous dissipation
The entrainment rate determined from the time variation of the boundary layer height we = dzi/dt is a sensitive measure of the LES solution convergence. The LES estimates of entrainment velocity become mesh independent when the vertical grid resolution is able to capture both the mean structure of the overlying inversion and the turbulence. The entrainment rate increases with decreasing mesh resolution because of inadequate resolution of the mean temperature gradients in the inversion. For all mesh resolutions used, the vertical temperature flux varies linearly over the boundary layer with the minimum temperature flux ≈ −0.2 of the surface flux. Thus, these scalar-flux properties are not adequate to judge the convergence of LES solutions.
The variation of third-order moments, often used to interpret PBL dynamics, depends on the grid resolution; skewness of resolved vertical velocity
The criterion zi/(CsΔf) > 310 proposed here for simulations of convective boundary layers needs to be tested for simulations of boundary layers dominated by shear, stable stratification, cloudy boundary layers, and boundary layers with surface heterogeneity where the energy containing eddies are concentrated at scales smaller than the boundary layer height zi.
Acknowledgments
We thank Chin-Hoh Moeng, Harm Jonker, and Jeff Weil for their insights and suggestions, which improved the present work. The comments by the anonymous reviewers are appreciated. PPS was partially supported by the Office of Naval Research and by the National Science Foundation through the National Center for Atmospheric Research. EGP acknowledges partial support from the Army Research Office, the National Science Foundation’s Science and Technology Center for Multi-Scale Modeling of Atmospheric Processes, and NCAR’s BEACHON program. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract DE-AC02-05CH11231. Computer time was also provided by NCAR and the Department of Defense.
APPENDIX
Algorithm Parallelization
a. Domain decomposition
The parallelization of the LES algorithm is based on the following criteria: 1) to accomplish 2D domain decomposition using solely the Message Passing Interface (MPI) (Aoyama and Nakano 1999); 2) to preserve pseudospectral differencing in x–y planes using fast Fourier transforms (FFTs); and 3) to maintain a Boussinesq incompressible flow model. The ability to use 2D domain decomposition is a significant advantage in pseudospectral simulation codes as it allows direct numerical simulations of isotropic turbulence on meshes of 2048³ or more (Pekurovsky et al. 2006). A sketch of the domain decomposition layout that conforms to our constraints is given in Fig. A1. We mention that 2D domain decomposition in x–y planes is often used with low-order finite-difference schemes (Raasch and Schröter 2001) and mesoscale codes that adopt compressible equations (Michalakes et al. 2005).
Fig. A1. The 2D domain decomposition on nine processors: (a) base state with y–z decomposition, (b) x–z decomposition used for computation of y derivatives and 2D planar FFT, and (c) x–y decomposition used in the tridiagonal matrix inversion of the pressure Poisson equation.
forward x to y transpose,
FFT derivative ∂f^T/∂y, and
inverse y to x transpose ∂f^T/∂y → ∂f/∂y.
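The three steps listed above (forward transpose, FFT derivative, inverse transpose) can be illustrated on a single process. The sketch below is not the parallel algorithm: the in-memory `transpose` stands in for the MPI-based transposes, and a naive O(N²) discrete Fourier transform stands in for an FFT library so that the example has no dependencies. All function names and the test field are hypothetical.

```python
import cmath
import math


def transpose(a):
    """Swap the two indices of a list-of-lists array."""
    return [list(col) for col in zip(*a)]


def spectral_derivative(line, length):
    """Derivative of one periodic line of samples via a naive DFT."""
    n = len(line)
    # Forward transform (normalised by n).
    coeffs = [
        sum(line[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
        for k in range(n)
    ]
    deriv = []
    for k in range(n):
        ks = k if k < n / 2 else k - n      # signed wavenumber index
        if n % 2 == 0 and k == n // 2:
            ks = 0                          # drop the Nyquist mode
        deriv.append(1j * (2.0 * math.pi * ks / length) * coeffs[k])
    # Inverse transform; the result is real for real input.
    return [
        sum(deriv[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)).real
        for j in range(n)
    ]


def ddy(f, ly):
    """y derivative of f[iy][ix], where each stored row is a line in x."""
    f_t = transpose(f)                                    # 1) x to y transpose
    df_t = [spectral_derivative(row, ly) for row in f_t]  # 2) derivative along y
    return transpose(df_t)                                # 3) y to x transpose


# Check against f = (1 + ix) * sin(2*pi*y/Ly), whose y derivative is known.
ny, nx, ly = 8, 4, 2.0
f = [[(1 + ix) * math.sin(2 * math.pi * iy / ny) for ix in range(nx)] for iy in range(ny)]
df = ddy(f, ly)
max_err = max(
    abs(df[iy][ix] - (2 * math.pi / ly) * (1 + ix) * math.cos(2 * math.pi * iy / ny))
    for iy in range(ny)
    for ix in range(nx)
)
print(f"max error = {max_err:.2e}")
```

The point of the transposes is that the transform is always applied along an index that is complete in local memory; in the parallel setting the same idea determines which data must be exchanged between processors.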







With these enhancements our new algorithm allows a very large number of processors O(104) or more to be utilized. No global communication between processors is required; that is, we do not call MPI’s ALL_TO_ALL routine. Instead, the MPI routine SENDRECV is wrapped with FORTRAN statements to accomplish the desired communication pattern. The scheme outlined above introduces more communication but the send–receive messages are smaller and hence large numbers of grid points can be used. Also, the total number of processors is not limited by the number of vertical grid points. This flexibility allows simulations in boxes with large horizontal and small vertical extents. The transpose routines are general and allow arbitrary numbers of mesh points, although the best performance is of course realized when the load is balanced across processors.
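To make the remark about arbitrary numbers of mesh points concrete, the sketch below shows one common way to split N points over P processors when N is not a multiple of P, and applies it to a y–z decomposition like the base state in Fig. A1. It is a generic block partition, not the scheme used in the code; the rank numbering and the function names are assumptions.

```python
def block_range(n_points, n_procs, rank):
    """Half-open index range [start, stop) owned by `rank`.

    The first n_points % n_procs ranks receive one extra point, so the
    per-processor counts differ by at most one.
    """
    if n_procs <= 0 or not 0 <= rank < n_procs:
        raise ValueError("rank must satisfy 0 <= rank < n_procs")
    base, extra = divmod(n_points, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def decompose_yz(ny, nz, np_xy, np_z, rank):
    """(y range, z range) for `rank` on an np_z x np_xy processor grid.

    Assumption: ranks are numbered with the horizontal index varying fastest.
    """
    rank_z, rank_xy = divmod(rank, np_xy)
    return block_range(ny, np_xy, rank_xy), block_range(nz, np_z, rank_z)


if __name__ == "__main__":
    # Nine processors arranged 3 x 3, with sizes that do not divide evenly.
    for rank in range(9):
        print(rank, decompose_yz(ny=10, nz=7, np_xy=3, np_z=3, rank=rank))
```

When N is a multiple of P every rank gets the same number of points, which is the balanced case in which the best performance is expected.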
b. Scaling
The performance of the code for varying workload as a function of the total number of processors NP is provided in Figs. A2 and A3 for three different machine architectures (NP = NPz × NPxy, where NPz and NPxy are the number of processors in the vertical and horizontal directions, respectively). In each figure, the vertical axis is total computational time t × NP divided by total work. Also, Nz is the number of vertical levels and Mx,y is proportional to the FFT work (i.e., Mx,y = Nx,y log Nx,y, with Nx,y being the number of grid points in the x and y directions). Ideal scaling corresponds to a flat line with increasing number of processors. The timing tests illustrate that the present scheme exhibits both strong scaling (i.e., where the problem size is held fixed and the number of processors is increased) and weak scaling (i.e., where the problem size grows as the number of processors increases so the amount of work per processor is held constant) over a wide range of problem sizes and is able to use as many as 16 384 processors (i.e., the maximum number available to our application). Further, the results are robust for varying combinations of (NPz, NPxy). Generally, the performance only begins to degrade when the number of processors exceeds about 8 times the minimum of (Nx, Ny, Nz) because of increases in communication overhead.
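The normalized timing measure described above can be written down directly. In the sketch below, "total work" is taken to be Nz · Mx · My, which is an assumption since the text does not spell out the product; the timings in the example are invented to illustrate ideal strong scaling and are not measurements. Function names are hypothetical.

```python
import math


def fft_work(n):
    """M = N log N, the FFT work measure quoted in the text."""
    return n * math.log(n)


def time_per_work(wall_time, n_procs, nx, ny, nz):
    """Normalised cost t * NP / (Nz * Mx * My).

    Assumption: total work is the product Nz * Mx * My.
    """
    return wall_time * n_procs / (nz * fft_work(nx) * fft_work(ny))


def within_scaling_limit(n_procs, nx, ny, nz, factor=8):
    """Rule of thumb from the text: degradation begins once NP exceeds
    about `factor` times the minimum of (Nx, Ny, Nz)."""
    return n_procs <= factor * min(nx, ny, nz)


if __name__ == "__main__":
    # Invented timings: halving the wall time when NP doubles is ideal strong
    # scaling, so the normalised cost stays flat.
    a = time_per_work(100.0, 1024, 1024, 1024, 1024)
    b = time_per_work(50.0, 2048, 1024, 1024, 1024)
    print(a, b)
    print(within_scaling_limit(8192, 1024, 1024, 1024))
    print(within_scaling_limit(16384, 1024, 1024, 1024))
```

Plotting this quantity against NP for a fixed problem size gives the strong scaling view, and plotting it for a fixed number of points per processor gives the weak scaling view.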
Fig. A2. Computational time per grid point for different combinations of problem size and 2D domain decomposition for the Cray XT4 (an example of strong scaling), showing problem sizes (a) 512³ (⋄), (b) 1024³ (○), (c) 2048³ (□), and (d) 3072³ (Δ). For a given number of total processors NP the symbols are varying vertical and horizontal decompositions [i.e., different combinations (NPz, NPxy)].
Fig. A3. Computational time per grid point for a fixed amount of work per processor (an example of weak scaling). Shown are 60 000 points per processor for the Cray XT4 (○), dual core IBM SP5+ (⋄), and single core IBM SP5 (□), and 524 288 points per processor for the Cray XT4 (Δ). For a fixed number of total processors NP multiple symbols are different combinations of (NPz, NPxy).
REFERENCES
Andren, A., A. R. Brown, P. J. Mason, J. Graf, U. Schumann, C.-H. Moeng, and F. T. M. Nieuwstadt, 1994: Large-eddy simulation of a neutrally stratified boundary layer: A comparison of four computer codes. Quart. J. Roy. Meteor. Soc., 120, 1457–1484.
Aoyama, Y., and J. Nakano, 1999: RS/6000 SP: Practical MPI programming. Tech. Rep. IBM Redbook SG24-5380-00, International Business Machines, 221 pp.
Beare, R. J., and Coauthors, 2006: An intercomparison of large-eddy simulations of the stable boundary layer. Bound.-Layer Meteor., 118, 247–272.
Beets, C., and B. Koren, 1996: Large-eddy simulation with accurate implicit subgrid-scale diffusion. Department of Numerical Mathematics Rep. NM-R9601, Utrecht University, 24 pp.
Betts, A. K., 1974: Reply to comment on the paper ‘Non-precipitating cumulus convection and its parameterization.’ Quart. J. Roy. Meteor. Soc., 100, 469–471.
Brasseur, J. G., and T. Wei, 2010: Designing large eddy simulation of the turbulent boundary layer to capture law-of-the-wall scaling. Phys. Fluids, 22, 021303, doi:10.1063/1.3319073.
Bretherton, C. S., and Coauthors, 1999: An intercomparison of radiatively driven entrainment and turbulence in a smoke cloud, as simulated by different numerical models. Quart. J. Roy. Meteor. Soc., 125, 391–423.
Bryan, G. H., J. C. Wyngaard, and J. M. Fritsch, 2003: Resolution requirements for the simulation of deep moist convection. Mon. Wea. Rev., 131, 2394–2416.
Celik, I., M. Klein, M. Freitag, and J. Janicka, 2006: Assessment measures for URANS/DES/LES: An overview with applications. J. Turbul., 7, 1–27.
Chow, F. K., and P. Moin, 2003: A further study of numerical errors in large-eddy simulations. J. Comput. Phys., 184, 366–380.
Davis, K. J., N. Gamage, C. R. Hagelberg, C. Kiemle, D. H. Lenschow, and P. P. Sullivan, 2000: An objective method for deriving atmospheric structure from airborne lidar observations. J. Atmos. Oceanic Technol., 17, 1455–1468.
Deardorff, J. W., 1972a: Numerical investigation of neutral and unstable planetary boundary layers. J. Atmos. Sci., 29, 91–115.
Deardorff, J. W., 1972b: Three-dimensional numerical modeling of the planetary boundary layer. Workshop on Micrometeorology, D. A. Haugen, Ed., Amer. Meteor. Soc., 271–311.
Deardorff, J. W., 1980: Stratocumulus-capped mixed layers derived from a three-dimensional model. Bound.-Layer Meteor., 18, 495–527.
de Roode, S. R., P. G. Duynkerke, and H. J. J. Jonker, 2004: Large eddy simulation: How large is large enough? J. Atmos. Sci., 61, 403–421.
Fedorovich, E., F. T. M. Nieuwstadt, and R. Kaiser, 2001: Numerical and laboratory study of a horizontally evolving convective boundary layer. Part I: Transition regimes and development of the mixed layer. J. Atmos. Sci., 58, 70–86.
Fedorovich, E., and Coauthors, 2004: Entrainment into sheared convective boundary layers as predicted by different large eddy simulation codes. Preprints, 16th Symp. on Boundary Layer and Turbulence, Portland, ME, Amer. Meteor. Soc., P4.7.
Geurts, B. J., 2001: Modern Simulation Strategies for Turbulent Flow. R. T. Edwards, 327 pp.
Geurts, B. J., and J. Fröhlich, 2002: A framework for predicting accuracy limitations in large-eddy simulation. Phys. Fluids, 14, L41–L44.
Gibbs, W. R., 2004: A parallel/recursive algorithm. J. Comput. Phys., 201, 573–585.
Hatlee, S. C., and J. C. Wyngaard, 2007: Improved subfilter-scale models from the HATS field data. J. Atmos. Sci., 64, 1694–1705.
Hunt, J. C. R., J. C. Kaimal, and J. E. Gaynor, 1988: Eddy structure in the convective boundary layer—New measurements and new concepts. Quart. J. Roy. Meteor. Soc., 114, 827–858.
Jonker, H. J. J., P. G. Duynkerke, and J. W. M. Cuijpers, 1999: Mesoscale fluctuations in scalars generated by boundary layer convection. J. Atmos. Sci., 56, 801–808.
Kaltenbach, H.-J., 1997: Cell aspect ratio dependence of anisotropy measures for resolved and subgrid scale stresses. J. Comput. Phys., 136, 399–410.
Kanak, K. M., 2005: Numerical simulation of dust devil–scale vortices. Quart. J. Roy. Meteor. Soc., 131, 1271–1292.
Klemp, J., and D. Durran, 1983: An upper boundary condition permitting internal gravity wave radiation in numerical mesoscale models. Mon. Wea. Rev., 111, 430–444.
Koren, B., 1993: A robust upwind discretization method for advection, diffusion and source terms. Notes on Numerical Fluid Mechanics, Vol. 45, C. B. Vreugdenhil and B. Koren, Eds., Vieweg-Braunschweig, 117–138.
Lele, S. K., 1992: Compact finite difference schemes with spectral-like resolution. J. Comput. Phys., 103, 16–42.
Lenschow, D. H., J. C. Wyngaard, and W. T. Pennell, 1980: Mean-field and second-moment budgets in a baroclinic, convective boundary layer. J. Atmos. Sci., 37, 1313–1326.
Lenschow, D. H., J. Mann, and L. Kristensen, 1994: How long is long enough when measuring fluxes and other turbulence statistics? J. Atmos. Oceanic Technol., 11, 661–673.
Lenschow, D. H., M. Lothon, S. D. Mayor, P. P. Sullivan, and G. Canut, 2011: A comparison of higher-order vertical velocity moments in the convective boundary layer from lidar with in situ measurements and LES. Bound.-Layer Meteor., doi:10.1007/s10546-011-9615-3, in press.
Lilly, D. K., 1967: The representation of small-scale turbulence in numerical simulation experiments. Proc. IBM Scientific Computing Symp. on Environmental Sciences, Yorktown Heights, NY, International Business Machines, 195–210.
Lothon, M., D. H. Lenschow, and S. D. Mayor, 2009: Doppler lidar measurements of vertical velocity spectra in the convective planetary boundary layer. Bound.-Layer Meteor., 132, 205–226.
Lothon, M., D. H. Lenschow, G. Canut, S. D. Mayor, and P. P. Sullivan, 2010: Measurements of higher-order turbulence statistics in the daytime convective boundary layer derived from a ground-based Doppler lidar. Proc. Int. Symp. for the Advancement of Boundary Layer Remote Sensing, Paris, France, ISARS. [Available online at http://www.isars2010.uvsq.fr/images/stories/PosterExtAbstracts/P_TUR03_Lothon.pdf.]
Lundquist, K. A., F. K. Chow, and J. K. Lundquist, 2010: An immersed boundary method for the Weather Research and Forecasting model. Mon. Wea. Rev., 138, 796–817.
Mason, P. J., and A. R. Brown, 1999: On subgrid models and filter operations in large-eddy simulations. J. Atmos. Sci., 56, 2101–2114.
Meneveau, C., and J. Katz, 2000: Scale-invariance and turbulence models for large-eddy simulations. Annu. Rev. Fluid Mech., 32, 1–32.
Meyers, J., B. J. Geurts, and P. Sagaut, 2007: A computational error-assessment of central finite-volume discretizations in large-eddy simulation using a Smagorinsky model. J. Comput. Phys., 227, 156–173.
Michalakes, J., J. Dudhia, D. Gill, T. Henderson, J. Klemp, W. Skamarock, and W. Wang, 2005: The Weather Research and Forecast Model: Software architecture and performance. Proceedings of the Eleventh ECMWF Workshop on the Use of High Performance Computing in Meteorology, W. Zwieflhofer and G. Mozdzynski, Eds., World Scientific, 156–168.
Mironov, D. V., 2009: Turbulence in the lower troposphere: Second-order closure and mass-flux modelling frameworks. Interdisciplinary Aspects of Turbulence, W. Hillebrandt and F. Kupka, Eds., Lecture Notes in Physics, Vol. 756, Springer-Verlag, 161–221.
Mironov, D. V., V. M. Gryanik, C.-H. Moeng, D. J. Olbers, and T. H. Warncke, 2000: Vertical turbulence structure and second-moment budgets in convection with rotation: A large-eddy simulation study. Quart. J. Roy. Meteor. Soc., 126, 477–515.
Moeng, C.-H., 1984: A large-eddy-simulation model for the study of planetary boundary-layer turbulence. J. Atmos. Sci., 41, 2052–2062.
Moeng, C.-H., and J. C. Wyngaard, 1988: Spectral analysis of large-eddy simulations of the convective boundary layer. J. Atmos. Sci., 45, 3573–3587.
Moeng, C.-H., and R. Rotunno, 1990: Vertical velocity skewness in the buoyancy-driven boundary layer. J. Atmos. Sci., 47, 1149–1162.
Moeng, C.-H., and P. P. Sullivan, 1994: A comparison of shear- and buoyancy-driven planetary boundary layer flows. J. Atmos. Sci., 51, 999–1022.
Moeng, C.-H., and P. P. Sullivan, 2002: Large-eddy simulation. Encyclopedia of Atmospheric Sciences, J. R. Holton, J. Pyle, and J. A. Curry, Eds., Academic Press, 1140–1150.
Moeng, C.-H., M. A. LeMone, M. F. Khairoutdinov, S. K. Krueger, P. A. Bogenschutz, and D. A. Randall, 2009: The tropical marine boundary layer under a deep convection system: A large-eddy simulation study. J. Adv. Model. Earth Syst., 1 (16), doi:10.3894/JAMES.2009.1.16.
Muschinski, A., 1996: A similarity theory of locally homogeneous and isotropic turbulence generated by a Smagorinsky-type LES. J. Fluid Mech., 325, 239–260.
National Science Foundation, 2007: Cyberinfrastructure vision for 21st century discovery. Tech. Rep. NSF 07–28, NSF Cyberinfrastructure Council. [Available online at http://www.nsf.gov/pubs/2007/nsf0728/index.jsp.]
Nieuwstadt, F. T. M., P. J. Mason, C.-H. Moeng, and U. Schumann, 1993: Large-eddy simulation of the convective boundary layer: A comparison of four computer codes. Turbulent Shear Flows 8, F. Durst, Ed., Springer-Verlag, 343–367.
Patton, E. G., P. P. Sullivan, and C.-H. Moeng, 2005: The influence of idealized heterogeneity on wet and dry planetary boundary layers coupled to the land surface. J. Atmos. Sci., 62, 2078–2097.
Pekurovsky, D., P. K. Yeung, D. Donzis, W. Pfeiffer, and G. Chukkapalli, 2006: Scalability of a pseudospectral DNS turbulence code with 2D domain decomposition on Power4+/Federation and Blue Gene systems. ScicomP12 and SP-XXL, Boulder, CO, International Business Machines. [Available online at http://www.spscicomp.org/ScicomP12/Presentations/User/Pekurovsky.pdf.]
Pope, S. B., 2000: Turbulent Flows. Cambridge University Press, 771 pp.
Raasch, S., and M. Schröter, 2001: PALM—A large-eddy simulation model performing on massively parallel computers. Meteor. Z., 10, 363–372.
Schmidt, H., and U. Schumann, 1989: Coherent structure of the convective boundary layer derived from large-eddy simulations. J. Fluid Mech., 200, 511–562.
Scotti, A., C. Meneveau, and D. K. Lilly, 1993: Generalized Smagorinsky model for anisotropic grids. Phys. Fluids A, 5, 2306–2308.
Silva Lopes, A., and J. M. L. M. Palma, 2002: Numerical simulation of isotropic turbulence using a collocated approach and a nonorthogonal grid system. J. Comput. Phys., 175, 713–738.
Spalart, P. R., R. D. Moser, and M. M. Rogers, 1991: Spectral methods for the Navier–Stokes equations with one infinite and two periodic directions. J. Comput. Phys., 96, 297–324.
Sreenivasan, K. R., A. Bershadskii, and J. J. Niemela, 2002: Mean wind and its reversal in thermal convection. Phys. Rev. E, 65, 056306, doi:10.1103/PhysRevE.65.056306.
Stevens, B., and Coauthors, 2005: Evaluation of large-eddy simulations via observations of nocturnal marine stratocumulus. Mon. Wea. Rev., 133, 1443–1462.
Sullivan, P. P., and J. C. McWilliams, 2010: Dynamics of winds and currents coupled to surface waves. Annu. Rev. Fluid Mech., 42, 19–42.
Sullivan, P. P., J. C. McWilliams, and C.-H. Moeng, 1994: A subgrid-scale model for large-eddy simulation of planetary boundary-layer flows. Bound.-Layer Meteor., 71, 247–276.
Sullivan, P. P., J. C. McWilliams, and C.-H. Moeng, 1996: A grid nesting method for large-eddy simulation of planetary boundary layer flows. Bound.-Layer Meteor., 80, 167–202.
Sullivan, P. P., C.-H. Moeng, B. Stevens, D. H. Lenschow, and S. D. Mayor, 1998: Structure of the entrainment zone capping the convective atmospheric boundary layer. J. Atmos. Sci., 55, 3042–3064.
Sullivan, P. P., T. W. Horst, D. H. Lenschow, C.-H. Moeng, and J. C. Weil, 2003: Structure of subfilter-scale fluxes in the atmospheric surface layer with application to large-eddy simulation modeling. J. Fluid Mech., 482, 101–139.
Sullivan, P. P., J. C. McWilliams, and W. K. Melville, 2007: Surface gravity wave effects in the oceanic boundary layer: Large-eddy simulation with vortex force and stochastic breakers. J. Fluid Mech., 593, 405–452.
Sullivan, P. P., J. B. Edson, T. Hristov, and J. C. McWilliams, 2008: Large-eddy simulations and observations of atmospheric marine boundary layers above non-equilibrium surface waves. J. Atmos. Sci., 65, 1225–1245.
Tong, C., J. C. Wyngaard, S. Khanna, and J. G. Brasseur, 1998: Resolvable- and subgrid-scale measurement in the atmospheric surface layer: Technique and issues. J. Atmos. Sci., 55, 3114–3126.
Townsend, A. A., 1976: The Structure of Turbulent Shear Flow. Cambridge University Press, 429 pp.
Weil, J. C., 1988: Dispersion in the convective boundary layer. Lectures on Air Pollution Modeling, A. Venkatram and J. Wyngaard, Eds., Amer. Meteor. Soc., 167–227.
Weil, J. C., 1990: A diagnosis of the asymmetry in top-down and bottom-up diffusion in a Lagrangian stochastic model. J. Atmos. Sci., 47, 501–515.
Werne, J., and D. C. Fritts, 1999: Stratified shear turbulence: Evolution and statistics. Geophys. Res. Lett., 26, 439–442.
Wyngaard, J. C., 1998: Boundary-layer modeling: History, philosophy, and sociology. Clear and Cloudy Boundary Layers, A. A. M. Holtslag and P. G. Duynkerke, Eds., Royal Netherlands Academy of Arts and Sciences, 325–332.
Wyngaard, J. C., 2004a: Changing the face of small-scale meteorology. Atmospheric Turbulence and Mesoscale Meteorology, E. Fedorovich, R. Rotunno, and B. Stevens, Eds., Cambridge University Press, 17–34.
Wyngaard, J. C., 2004b: Toward numerical modeling in the “terra incognita.” J. Atmos. Sci., 61, 1816–1826.
Wyngaard, J. C., 2010: Turbulence in the Atmosphere. Cambridge University Press, 393 pp.