1. Introduction
An explicit free surface method has been developed for the z-coordinate Geophysical Fluid Dynamics Laboratory (GFDL) Modular Ocean Model (MOM) (for documentation of the model, see Pacanowski and Griffies 2000). The scheme allows for the use of a free surface with substantially improved stability properties over the older approach of Killworth et al. (1991, referred to as KSWP in the following), as well as for greatly improved conservation of tracers, such as heat and salt, relative to either the KSWP or Dukowicz and Smith (1994) approach. The result is an algorithm suitable for climate as well as regional ocean models.
The purpose of this paper is to present the method and to discuss its physical and numerical properties. Solution examples are provided of the Goldsbrough–Stommel circulation (see Huang 1993) driven by surface freshwater forcing, a baroclinic adjustment problem similar to that considered by Marsigli in the seventeenth century (see Gill 1982), and a global ocean simulation.
Before detailing the algorithm and showing examples, it is useful to provide some context for the present work.
a. Tracer conservation and freshwater forcing
Many ocean models employ a constant volume. For example, the rigid-lid approximation sets ∂tη = 0, thereby eliminating fast barotropic waves. With this assumption, volume conservation via Eq. (1) leads to the balance qw = ∇ · U, which forms the basis for the work of Huang (1993). The corresponding barotropic dynamics require both a streamfunction and velocity potential, thus introducing two elliptic problems. Elliptic problems are generally inefficient to solve, especially in the presence of realistic geometry, topography, and surface forcing, and more so on parallel computers (see section 5). Bryan (1969) made the further assumption that the vertically integrated velocity has zero divergence, resulting in a zero vertical velocity at the ocean surface: w(z = 0) = −∇ · U = 0. This assumption eliminates the velocity potential, yet at the cost of precluding a direct freshwater forcing.
In principle, implementation of tracer forcing in flux form is sufficient to ensure that tracer is globally conserved. However, with time-dependent cell thicknesses, details related to the discrete time stepping scheme, including the use of time filters necessary with the three-time-level leapfrog scheme, serve to preclude strict conservation. Additionally, global conservation is neither necessary nor sufficient to ensure locally conservative behavior. For example, consider an ocean at rest with globally constant tracer concentration. Then allow a wind to blow so as to initiate velocity and surface height fluctuations, yet do not introduce any tracer surface fluxes. In addition to global tracer content remaining fixed, the tracer concentration locally at every point in the ocean will maintain the same constant value. This behavior provides an example of “local” tracer conservation, which is not a necessary result of global conservation. Instead, it arises so long as there is compatibility between tracer and volume budgets.
b. Splitting between fast and slow dynamics
For computational efficiency, primitive equation ocean models aim to exploit the timescale split between the fast barotropic gravity waves, which can propagate at some 200 m s−1, and the much slower (some 100 times slower) remaining dynamics. Such models approximate the barotropic mode by the depth-averaged motion, and the baroclinic mode is approximated by the deviation from the depth average. In a rigid-lid ocean model, vertical averaging provides a clean split between the fast and slow modes. In contrast, vertical averaging does not completely separate out the fast dynamics in a free surface model, since in this case the barotropic mode is weakly depth dependent. The papers by KSWP, Dukowicz and Smith (1994), Higdon and Bennett (1996), Higdon and de Szoeke (1997), and Hallberg (1997) provide discussions of these issues, which can be quite subtle yet important for the purpose of providing a stable split between the fast and slow modes in realistic ocean models. Additionally, details of this split have crucial implications for the tracer conservation properties of the model.
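As a rough check on the speeds quoted above, the following sketch evaluates the long gravity wave speed c = sqrt(gH). The depth of 4000 m and the value of g are illustrative assumptions and are not taken from the text; the factor of 100 is the ratio quoted above.

```python
import math

G = 9.8  # gravitational acceleration in m s^-2 (assumed value)


def barotropic_wave_speed(depth_m: float) -> float:
    """Speed of long surface gravity waves, c = sqrt(g * H), in m s^-1."""
    return math.sqrt(G * depth_m)


# Assumed open-ocean depth of 4000 m.
c_fast = barotropic_wave_speed(4000.0)
# The text quotes the remaining dynamics as roughly 100 times slower.
c_slow = c_fast / 100.0
print(f"barotropic: {c_fast:.0f} m/s, slower dynamics: about {c_slow:.1f} m/s")
```

With these assumed values the fast speed comes out close to the 200 m s−1 cited above, which is what motivates taking many short barotropic steps per baroclinic step.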
c. Modeling context and summary of key results
MOM has traditionally been a rigid-lid ocean model, where the streamfunction algorithm of Bryan (1969) was used to solve for the barotropic component of the dynamics. In the 1990s, important alternatives were added, including the explicit free surface of KSWP, the rigid lid–surface pressure of Smith et al. (1992) and Dukowicz et al. (1993), and the implicit free surface of Dukowicz and Smith (1994). Version 3 of MOM (MOM 3) includes all of these algorithms within a single model framework. There are numerous additional developments using a free surface in z-coordinate ocean models, such as that used in the Ocean Circulation and Climate Advanced Modelling project (OCCAM; Webb et al. 1998), where the KSWP scheme is refined, as well as the Hamburg Ocean Primitive Equation (Wolff et al. 1997), Massachusetts Institute of Technology (Marshall et al. 1997), and Océan Parallélisé (OPA; Roullet and Madec 2000) ocean models, each of which employs an implicit time stepping algorithm. The present paper is the result of an effort to assess each of these free surface schemes, and possible alternatives such as those of Blumberg and Mellor (1987) and Mellor (1996), who use a terrain-following coordinate model, and Bleck and Smith (1990) and Hallberg (1997), who employ isopycnal layered models.
When testing the various free surface methods, we kept the following goals in mind: 1) MOM’s dynamical core should provide a faithful representation of large-scale ocean dynamics in an efficient manner for use in both coarse (>1°) and fine (<1°) resolution global and regional experiments. 2) Because of the increasing importance of parallel computation, the model should scale well as the number of computer processors increases. 3) The input of freshwater should occur naturally and allow tracer conservation. As described in this paper, our preferred approach is to use an explicit free surface method, which is shown here to satisfy these three goals.
There are important differences between the free surface method as described in this paper and the other free surface methods currently in use with z-coordinate models. First, the KSWP, Dukowicz and Smith (1994), Wolff et al. (1997), and Marshall et al. (1997) approaches assume the top model box to have fixed volume for purposes of the tracer and baroclinic momentum budgets, and so do not satisfy the third goal above. Second, the OCCAM model (Webb et al. 1998) relaxes this constraint, although details of their method are not documented. As shown in this paper, allowing the top box to change in time is necessary, but not sufficient, to globally and locally conserve tracer when using a free surface method. Details of the time stepping scheme are crucial and warrant careful analysis. Third, the OPA model also relaxes the fixed top cell volume constraint, yet it employs an implicit approach that is arguably less straightforward than an explicit approach and, in addition, does not tend to perform as well on parallel machines [though advances are available, as documented by Guyon et al. (2000)].
d. Contents of this paper
This paper consists of the following sections. Section 2 describes the model’s discrete tracer equations. It highlights points concerning tracer conservation in the presence of a time-dependent surface cell. Section 3 describes the model’s discrete equations of motion and details the time stepping algorithm for the barotropic and baroclinic systems. Section 4 exhibits results from idealized numerical examples that provide some practical illustrations of the method. Section 5 discusses issues related to using the method in a parallel computational environment. Section 6 finishes the paper with conclusions.
2. Discrete tracer budget
The purpose of this section is to derive the discrete tracer budgets within the free surface ocean model with a time-dependent top cell volume. For orientation and notation, Fig. 1 details a zonal-depth cross section of the rectangular control volumes over which the model’s surface equations are discretized. Notably, the position of a grid point within a cell is assumed to be fixed in time, hence maintaining the Eulerian nature of MOM. However, the corresponding grid cell volume generally changes according to temporal undulations of the surface height. We do not make the approximation that the surface height is small relative to any other length scale in the model, including the ocean depth.
a. Continuous time tracer budget
Specification of the surface tracer flux involves details of how the air–sea interface is modeled. A simple “closure” for the turbulence term, which is often used for ocean-only simulations, is a restoring condition
b. Time discretization
The question arises whether it is preferable to time step the thickness-weighted tracer concentration htT or the tracer concentration T. Tests with both approaches were performed and showed negligible differences. Notably, as shown below, both approaches have the same temporal order of accuracy, and so choosing one is a matter of convenience. As time stepping T involved the least changes to the existing MOM code, we detail that approach in the following. A similar reasoning applies to the momentum equation time discretization discussed in section 3.
c. Compatibility between volume and tracer budgets
Recall the example considered in section 1a, in which an ocean at rest with an initially constant tracer concentration is perturbed via a wind stress yet without tracer sources. Again, the solution should maintain the same constant tracer concentration locally at each grid cell, even as the surface cells undulate. To do so, it is necessary for there to be compatibility between the volume and tracer budgets. In particular, upon setting the tracer concentration to a constant, the tracer budget in the surface grid cells must reduce to the discretized surface height equation (1). Compatibility between the budgets thus imposes a constraint on how to time step the surface height, as detailed in section 3c.
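The compatibility requirement can be illustrated with a minimal one-dimensional sketch. It is not the MOM discretization: it assumes a periodic grid, a single forward Euler step rather than the leapfrog scheme, centered face values, and arbitrary made-up transports. The function `step` and all parameter values are hypothetical. The compatible branch uses the same fluxes and thicknesses in both budgets, whereas the other branch holds the thickness fixed in the tracer budget only.

```python
import numpy as np


def step(h, T, U_face, dx, dt, compatible=True):
    """Advance thickness h and tracer T by one forward Euler step.

    U_face[i] is the volume transport (m^2 s^-1) through the left face of
    cell i on a periodic 1D grid. Tracer on faces is a centered average.
    """
    T_face = 0.5 * (T + np.roll(T, 1))             # tracer at left faces
    div_U = (np.roll(U_face, -1) - U_face) / dx     # volume flux divergence
    div_UT = (np.roll(U_face * T_face, -1) - U_face * T_face) / dx
    h_new = h - dt * div_U
    if compatible:
        # Tracer budget uses the same fluxes and thicknesses as the volume budget.
        T_new = (h * T - dt * div_UT) / h_new
    else:
        # Thickness held fixed in the tracer budget only (incompatible).
        T_new = T - dt * div_UT / h
    return h_new, T_new


rng = np.random.default_rng(0)
n, dx, dt = 32, 1.0e5, 100.0
h = np.full(n, 25.0)                 # surface cell thickness in m (assumed)
T = np.full(n, 25.0)                 # uniform tracer concentration
U = 5.0 * rng.standard_normal(n)     # arbitrary transports in m^2 s^-1

_, T_compat = step(h, T, U, dx, dt, compatible=True)
_, T_fixed = step(h, T, U, dx, dt, compatible=False)
err_compat = np.max(np.abs(T_compat - 25.0))
err_fixed = np.max(np.abs(T_fixed - 25.0))
print(err_compat, err_fixed)
```

In the compatible branch, setting T to a constant reduces the tracer update to the thickness update, so the uniform field is preserved to roundoff. In the fixed-thickness branch the divergent transports generate spurious tracer anomalies.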
d. Total tracer content in the discrete model
e. Local compatibility versus global conservation
The compatibility condition, which leads to a locally conservative scheme, and global tracer conservation cannot be simultaneously maintained with a stable three-time-level discretization. For example, use of a time filter on either the tracer (via the Robert filter) or the surface height (via a time averaging procedure described in section 3) precludes strict conservation of the total tracer content given by Eq. (10). However, these filters do not preclude local compatibility between volume and tracer budgets. Alternatively, omitting the time filters on the surface height and tracer in the tracer budget (11) allows for exact global conservation, yet doing so breaks the local compatibility so long as a time filter is used in the surface height equation (11). We provide two examples in section 4 that aim to quantify these points. Given the inability to provide exact discrete local and global conservation properties with an undulating free surface and the leapfrog scheme, we generally believe it prudent that conservation properties should be assessed on a case-by-case basis.
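For readers unfamiliar with the Robert filter, the sketch below shows a commonly used form in which the middle of three leapfrog time levels is nudged toward the average of its neighbors. The coefficient alpha = 0.1 and the function name are assumptions made for illustration; the form and value used in MOM are not given in this section. Because the filter alters a stored value after the flux-form update has been computed, the change in tracer content need no longer match the accumulated fluxes.

```python
def robert_filter(x_prev, x_now, x_next, alpha=0.1):
    """Return the time-filtered middle value of three leapfrog time levels."""
    return x_now + alpha * (x_prev - 2.0 * x_now + x_next)


# A two-time-step oscillation (the leapfrog computational mode) is damped...
damped = robert_filter(1.0, -1.0, 1.0)
# ...whereas a field varying linearly in time is left unchanged.
unchanged = robert_filter(0.0, 1.0, 2.0)
print(damped, unchanged)
```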
Alternative time stepping schemes exist, such as two-time-level schemes (e.g., Hallberg 1995, 1997), which provide potential remedies to the issues raised here. However, pursuit of such alternatives goes beyond our present scope.
f. Comments on surface fluxes
Assuming that the tracer concentration in the freshwater is the same as in the surface cell, Tw ≈ T1, makes the tracer time tendency independent of any explicit freshwater forcing. Instead, the dependence is restricted to the effects that freshwater has on the convergence −∇ · U. This approximation simplifies the setup of boundary conditions for the tracer flux. However, in general Tw and T1 are distinct and so Tw may need to be specified explicitly from data or another component model.
g. The special case of salt
3. Discrete momentum budget
In this section, we formulate the discrete momentum budget, which shares much in common with the tracer budget just discussed, and then present a solution algorithm for the model’s approximated barotropic and baroclinic modes.
a. Continuous time momentum budget
For Boussinesq models, it is common to approximate the surface pressure as ps ≈
b. General strategy for the time stepping
c. Time stepping algorithm
The focus in this section is on time and depth discretization, with Fig. 2 summarizing the following algorithm. For purposes of brevity, the horizontal spatial discretization discussed in the previous subsection will not be exposed. Discrete baroclinic times and time steps will be denoted by the Greek τ and Δτ, respectively, whereas the barotropic analogs will use the Latin t and Δt.
1) Algorithm basics
Allowing the grid cell thicknesses to evolve introduces a fundamentally new element to the traditional algorithms relevant for constant cells in z-coordinate models (e.g., Bryan 1969; Semtner 1974; Cox 1984; Killworth et al. 1991; Dukowicz and Smith 1994). However, since we impose positive cell thicknesses for all cells, including the surface, modifications to the constant cell approach should be relatively modest. That is a goal of the proposed algorithm.
This baroclinicity operator projects out the depth-independent part of a field and keeps its approximate baroclinic portion, where the projection is based on the distribution of cell thicknesses at time τ. Introduction of two baroclinic time labels τ and τ′ to Eq. (22) is necessitated by the freedom afforded the ocean depth to change in time. That is, τ′ represents the baroclinic time of the full velocity field uk(τ′), whereas τ represents the time used to define the baroclinicity operator Bkm(τ) and which defines the split.
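The action of such an operator on a single column can be sketched as follows, under the assumption that the depth-independent part is the thickness-weighted vertical mean. The function name and the sample values are hypothetical, and the precise form of Eq. (22) is not reproduced here.

```python
import numpy as np


def remove_depth_average(u, h):
    """Subtract the thickness-weighted vertical mean from a column profile.

    u: velocities at cell centers, surface first (m s^-1)
    h: cell thicknesses at the time defining the split (m)
    """
    u = np.asarray(u, dtype=float)
    h = np.asarray(h, dtype=float)
    u_bar = np.sum(h * u) / np.sum(h)   # approximate barotropic part
    return u - u_bar                    # approximate baroclinic part


u_full = np.array([0.30, 0.10, -0.05, -0.02])   # u_k(tau') for one column
h_tau = np.array([25.4, 50.0, 100.0, 200.0])    # top cell includes eta(tau)
u_hat = remove_depth_average(u_full, h_tau)
print(np.sum(h_tau * u_hat))  # thickness-weighted sum vanishes to roundoff
```

If the thicknesses were taken from a different time level than the one used to form the weighted sum, the result would no longer vanish exactly, which is one way to see why the two time labels must be tracked separately.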
2) Computing ûk(τ, τ + Δτ)
3) Computing u (τ, τ + Δτ)
The computation of
4) Computing η(τ + Δτ) and u (τ + Δτ)
d. Comments on the algorithm
1) Generality
Although the immediate application of the algorithm is for the free surface method, the algorithm allows any thickness within a vertical column to vary in time, so long as it remains positive. In particular, it is thought that this approach will find use for models with a time-varying bottom boundary layer thickness, such as that proposed by Killworth and Edwards (1999).
2) Stability
As expected, the baroclinic and barotropic time steps are determined by the usual Courant–Friedrichs–Lewy constraints set by waves and advection. Notably, time averaging has not been found to adversely affect the propagation of barotropic waves relative to the results using the KSWP approach. We comment further on this point when presenting a version of the Marsigli problem in section 4c.
Time averaging over the barotropic steps has been found to stabilize the model so that stretching of tracer time steps to values larger than the baroclinic time step is readily available. The stretching of tracer time steps is ubiquitous when spinning up coarse-resolution, rigid-lid, z-coordinate ocean models to thermodynamic equilibrium (e.g., Bryan 1984; Killworth et al. 1984; Danabasoglu et al. 1996). Therefore, we consider the added stability of the present scheme to be of great practical value.
3) Computational timing
Because of the ability to match the baroclinic and tracer time steps to those commonly used in rigid-lid models, the present scheme has been found to be computationally comparable to the rigid lid on a single computer processor. Furthermore, due to the absence of the topographic instability present in rigid-lid models, which was described by Killworth (1987) and Dukowicz et al. (1993), the free surface is more economical with nontrivial topography. Section 4a provides an example to support these comments, and section 5 presents a discussion of computational aspects relevant for parallel machines.
Relatedly, when using partial bottom cells of Pacanowski and Gnanadesikan (1998), as commonly used now in MOM for representing the bottom topography, use of the time-dependent top model cells engenders only a trivial added cost relative to the case of constant top model cell volumes. This experience is in contrast to the 10% added cost of allowing the surface cell to vary in the implicit method of Roullet and Madec (2000).
4) Grid splitting
As pointed out in KSWP, the surface height on a B grid is prone to grid splitting, which manifests as a checkerboard pattern. The present scheme is less prone to this splitting than other algorithms we tried, and numerous tests indicate that the splitting is easily suppressed with mild filtering as described by KSWP or other filters described in Pacanowski and Griffies (2000). Notably, as the nonlinear free surface described here incorporates the surface height undulations into the tracer and momentum equations, suppression of the grid splitting is more important than in the linearized free surface methods.
5) Volume conservation
To conserve the model volume, it is sufficient to ensure that the surface height η evolves in a conservative manner. Both explicit time stepping discretizations (28) and (38) trivially satisfy such conservation. Consistent with this, all model tests indicate that volume is conserved to within computer roundoff.
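The reason a flux-form update conserves volume can be seen in a minimal sketch, which is not a transcription of Eqs. (28) or (38). It assumes a periodic one-dimensional grid, no freshwater flux, and arbitrary made-up transports: whatever leaves one cell enters its neighbor, so the domain sum of η changes only by roundoff.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dx, dt = 64, 1.0e5, 30.0
eta = 0.1 * rng.standard_normal(n)       # surface height in m
U_face = 10.0 * rng.standard_normal(n)   # transport through left faces, m^2 s^-1

# Flux-form update on a periodic grid: face fluxes cancel in the domain sum.
eta_new = eta - dt * (np.roll(U_face, -1) - U_face) / dx

volume_change = np.sum(eta_new - eta) * dx   # per unit width, m^2
print(volume_change)
```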
6) Energetic consistency
Energetic consistency of the methods used here has been shown elsewhere. First, the arguments given by Bryan (1969) account for the momentum advection terms, which are discretized using second-order centered differences. The result is a redistribution of local kinetic energy, yet a preservation of its global integral. Second, the arguments in the appendix of Pacanowski and Gnanadesikan (1998) show that the partial cell methods ensure that the change in energy due to horizontal pressure forces balances potential energy change when density is linearly dependent on temperature and salinity. Importantly, this consistency is realized only when incorporating the undulating surface grid cell thickness into both the tracer and baroclinic velocity equations, as done here. Third, appendix B of Dukowicz and Smith (1994) provides a complementary analysis of the energetic consistency within their implicit free surface method, much of which is relevant for the present considerations.
7) Vertical velocities
Vertical velocities are diagnosed at baroclinic time steps using volume conservation within a grid cell. In particular, volume conservation over a surface cell indicates that the vertical velocity at the bottom face of this cell arises from the horizontal convergence of volume in this cell, time tendencies in the thickness of this cell, and volume passing across the top face from freshwater fluxes.
Given an expression for the convergence −∇ · U, diagnosing wk=1 in this manner allows for the remaining interior vertical velocities to be successively found through further integration of the continuity equation downward through a vertical column. On a B grid, −∇ · U is centered on a tracer point. Hence, Eq. (44) yields the vertical velocity on the bottom face of the surface tracer cell. The vertical velocity on the surrounding velocity cells is constructed as a volume conserving average of the surrounding tracer cell vertical velocities.
In MOM, the bottom of the ocean on tracer cells is a flat surface, representing the “lopped off” surfaces of topography. Hence, the vertical velocity must vanish at this location. A self-consistency check on how accurately the model’s numerics conserve volume amounts to testing how well this property is satisfied when integrating downward from the ocean surface, starting from the vertical velocity given by Eq. (44). Adding freshwater to the model provides a nontrivial test of these properties. The present scheme produces zero vertical velocities at the ocean bottom on tracer cells, to within computer roundoff, regardless of the topography or surface forcing.
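The downward integration and the bottom check described above can be sketched for a single tracer column. The sign conventions are assumptions made for this illustration (w positive upward, q_w positive into the ocean, and div_k the horizontal divergence of thickness-weighted velocity in cell k), and the function `diagnose_w` is hypothetical rather than a transcription of Eq. (44).

```python
import numpy as np


def diagnose_w(div_k, dh1_dt, q_w):
    """Vertical velocity (positive upward) on the bottom face of each cell.

    div_k:  horizontal divergence of thickness-weighted velocity per cell,
            surface first (m s^-1)
    dh1_dt: time tendency of the surface cell thickness (m s^-1)
    q_w:    freshwater volume flux per unit area entering the ocean (m s^-1)
    """
    div_k = np.asarray(div_k, dtype=float)
    w = np.empty_like(div_k)
    # Surface cell: thickness tendency, horizontal divergence, and freshwater.
    w[0] = dh1_dt + div_k[0] - q_w
    # Interior cells have fixed thickness: integrate continuity downward.
    for k in range(1, div_k.size):
        w[k] = w[k - 1] + div_k[k]
    return w


div_k = np.array([2.0e-6, -1.0e-6, 3.0e-6, -0.5e-6])   # made-up values
q_w = 1.0e-7
# Surface height equation: d(eta)/dt = -sum_k div_k + q_w.
dh1_dt = q_w - div_k.sum()
w = diagnose_w(div_k, dh1_dt, q_w)
print(w[-1])  # bottom value should vanish to roundoff
```

When the surface thickness tendency is consistent with the vertically summed divergence and the freshwater flux, the integration returns zero at the bottom face. A thickness tendency that is inconsistent with the volume budget would show up as a nonzero bottom velocity.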
8) Order of the solution method
The method presented above solves for the updated “quasi-baroclinic” velocity ûk(τ, τ + Δτ) prior to the updated quasi-barotropic velocity
4. Numerical examples
For the purpose of numerically illustrating the algorithm, we present selected results from four experiments. The first and second are from an idealized, flat-bottom, coarse-resolution, sector model driven by time-independent buoyancy and momentum forcing. Details of the configuration are provided in the caption to Fig. 3. The third example is that of a higher-resolution regional model with two basins connected by a channel with a shallow sill. This configuration is shown in detail later (Fig. 5). The final example is a coarse-resolution global model that quantifies the issues of local versus global conservation.
a. Free surface compared to rigid lid
We start by providing a direct comparison between the explicit free surface and the rigid-lid streamfunction method in the sector model. Each experiment used the same tracer time step of 1 day and the same baroclinic time step of 1 h. The rigid-lid experiment used a 1 h time step for the streamfunction, whereas the free surface used 300 s for the barotropic time step. With the time averaging used with the free surface method, there are a total of 24 barotropic time steps integrated for each baroclinic time step. Both the free surface and rigid-lid methods used a virtual salt flux and zero freshwater forcing.
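The substep count quoted above follows from simple arithmetic. The time step values are taken from the text, whereas the factor of two is an inference: the count of 24 is consistent with the barotropic system being integrated across two baroclinic steps for the time average.

```python
baroclinic_dt = 3600.0   # s, from the text (1 h)
barotropic_dt = 300.0    # s, from the text

steps_per_baroclinic = baroclinic_dt / barotropic_dt
# Assumption: the time average spans two baroclinic steps.
steps_with_averaging = 2 * steps_per_baroclinic
print(steps_per_baroclinic, steps_with_averaging)
```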
The experiments were run for 4000 tracer years, at which point both reached a quasi-equilibrium. When run on a single processor on GFDL’s Cray T90, the free surface method took a few percent longer than the rigid-lid method. Although this comparison is a function of model configuration and computer details, as well as elliptic solver algorithm, the key point is that such a comparison is impressive for the free surface model, since the rigid-lid model is ideally suited for such a simple configuration. That is, the flat-bottom, time-independent forcing, and absence of mesoscale eddies means that the number of elliptic solver scans can be quite low, often requiring only a single scan to update the streamfunction.
More realistic bottom topography, surface forcing, and mesoscale eddies will greatly increase the number of elliptic solver scans required by the rigid-lid model. In practice, more realistic situations often prompt one to greatly reduce or loosen the criteria used for determining a solution to the elliptic problem. Indeed, in some cases it is difficult to argue that a “solution” has even been found. In contrast, the ratio of barotropic to baroclinic wave speeds is independent of model resolution and surface forcing. Rather, it is determined by the bottom topography and stratification. Consequently, for the explicit free surface method, there is no issue regarding convergence to a solution, as convergence is guaranteed by construction.
As with KSWP and Dukowicz and Smith (1994), we check numerical integrity of the free surface solution by comparing with the more highly tested rigid-lid results. For example, Fig. 3 provides a sample of the free surface solution; the rigid-lid results are nearly identical. Since the vertical velocity at the ocean surface vanishes at equilibrium in a constant forced, coarse-resolution free surface model without freshwater forcing, the two methods should indeed provide nearly identical answers. A detailed comparison (not shown) reveals negligible differences for all aspects of the simulations. Other model configurations also have been run, with similar comparisons.
We conclude from these tests that the explicit free surface method maintains the convenience of the rigid lid for purposes of idealized coarse-resolution modeling. The solutions are likewise nearly the same when run under the same forcing. Both of these conclusions have remained valid with more realistic climate model experiments now routinely run at GFDL.
b. Goldsbrough–Stommel circulation
When integrating the model to reach the solution in Fig. 4, we used 1-day tracer and 1-h baroclinic velocity time steps as in the previous experiment. Although neither salt nor water was added to the model, the total salt does not remain precisely constant during the integration. The reason, as discussed in section 2, is that the tracer and baroclinic momentum time steps were unequal. The consequences on the equilibrium solution reached at 4000 yr have not been assessed. Doing so requires rerunning the experiment with equal tracer and baroclinic time steps.
c. Marsigli problem
The purpose of this section is to directly compare the new free surface method to that of KSWP. For this purpose, we consider an unforced baroclinic adjustment problem motivated from the system first discussed for the Bosporus by Marsigli in 1681 (see Gill 1982, p. 96).
The experimental details are given in the caption to Fig. 5. A strong baroclinic pressure gradient arising from a salinity front rapidly piles up a sea level difference of about 40 cm between the two basins. After about two model days, barotropic and average baroclinic pressure gradients balance each other, and the sea level difference becomes roughly time independent. A weak cross-channel geostrophic circulation persists that prevents a salinity exchange between the two basins. Although the setup appears artificial, a similar balance can be observed in the Baltic Sea during periods of weak wind forcing. Because the sea level changes rapidly and there is a strong interbasin salinity gradient, this adjustment problem provides a severe test for how well the model conserves total salt.
Figure 5 shows the sea level and barotropic current after 1 day using the nonlinear free surface method. The current between the basins is deflected to the right by the Coriolis force. There is a strong dipole sitting on top of the sill, with a high to the southwest and a low to the northeast. The barotropic flow at the outer edges of the subbasins is directed northward in both basins, which indicates that this current is the result of barotropic Kelvin waves encircling the subbasins. Such waves, and the remaining currents, cannot be represented in a rigid-lid model.
In general, the solutions for the sea level, velocity, temperature, and salinity differ only slightly between the KSWP and the new approach during the adjustment. The close agreement of these features provides an important positive assessment of the new method. In particular, it indicates that the time averaging approach used here does not overly damp the barotropic waves, at least as compared to the KSWP algorithm.
As there are no surface water fluxes, the total volume of the ocean domain should remain constant, as indeed it does to within computer roundoff (not shown). Likewise, total salt and domain-averaged salinity (total salt divided by total volume) should remain constant, where total salt is computed via Eq. (10). An accounting of the domain-averaged salinity is shown in Fig. 6. Shown are four time series over a 3-month period that track how salinity deviates from its initial value.
The time series l1 results from running the free surface of KSWP, with a constant cell thickness in the tracer and baroclinic velocity equations. The tracer and baroclinic time steps are both 240 s and the barotropic time step is 60 s. During the first two model days the sea level in the western basin is decreasing rapidly but the sea level in the eastern basin is increasing. Since the upper model boundary with the KSWP scheme is at z = 0, saline water is entering the model area in the western basin but brackish water leaves the model area through the level z = 0 in the eastern basin. Consequently, the domain-averaged salinity in the model area is increasing. If the sea surface salinity were uniform, the salt gain in the western basin and the salt loss in the eastern basin would be canceled exactly by a sea level variation in the opposite direction. However, such cancellation is not general, and so nonconservation of total salt is the norm.
Since the sea surface height is known, the salt between the upper model boundary at z = 0 and the sea level z = η can be included in the diagnostic for domain tracer content. As an example, the time series l2 employs exactly the same linearized free surface scheme as l1 but uses this improved tracer diagnostic. Compared with l1, the conservation of salt is better; however, there is a clear trend in the domain-averaged salinity. This gain of salt stems from undulations of the sea level in correspondence with a varying sea surface salinity.
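The difference between the two diagnostics can be sketched with a two-column toy domain. The function name, array layout, and all numbers are hypothetical. The only point illustrated is that counting the water between z = 0 and z = η changes the diagnosed domain-averaged salinity when the sea level and the surface salinity both vary horizontally.

```python
import numpy as np


def domain_mean_salinity(S, dz, eta, area, include_eta):
    """Total salt divided by total volume.

    S:    salinity, shape (nk, n), surface level first (psu)
    dz:   resting cell thicknesses, shape (nk,) (m)
    eta:  surface height, shape (n,) (m)
    area: horizontal cell areas, shape (n,) (m^2)
    """
    h = np.repeat(dz[:, None], S.shape[1], axis=1).astype(float)
    if include_eta:
        h[0] += eta          # count the water between z = 0 and z = eta
    vol = h * area
    return np.sum(S * vol) / np.sum(vol)


S = np.array([[35.0, 10.0], [35.0, 12.0]])   # saline and brackish columns
dz = np.array([10.0, 20.0])
area = np.array([1.0e8, 1.0e8])
eta = np.array([-0.2, 0.2])                  # sea level tilted between columns
s_fixed = domain_mean_salinity(S, dz, eta, area, include_eta=False)
s_eta = domain_mean_salinity(S, dz, eta, area, include_eta=True)
print(s_fixed, s_eta)
```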
The time series n1, which shows minor deviation from zero, uses the new free surface discussed in this paper with tracer and baroclinic time steps both equal to 240 s, with a 60-s barotropic time step. The conservation of tracer is improved substantially when compared with the linearized free surface results. The deviation from zero is less than 10−5 psu.
The time series n2 uses the same free surface, yet with unequal baroclinic and tracer time steps, with a tracer time step of 720 s, and baroclinic time step of 240 s. As discussed in section 2, tracer conservation is not ensured in this case, as is indeed shown by the roughly 0.04 psu salinity drift.
d. Quantifying local versus global conservation
The Marsigli example provided an illustration of the global conservation properties of the new algorithm when using a discretization that maintains local compatibility between the volume and tracer budgets. As mentioned in section 2e, there is an alternative approach that is available for maintaining more exact global conservation properties, yet at the cost of sacrificing the local compatibility. We present here an example that compares the two approaches and concludes that the scheme that performs better for global conservation is preferable for climate modeling purposes.
We consider a coarse-resolution global model run with two idealized conditions and using two different forms for the discretization of the T∂tη term appearing in the tracer concentration budget (6). Discretization (a) uses T(τ)[η(τ + Δτ) −
The global model has roughly three degrees of horizontal resolution, uses 15 vertical levels down to 5000-m depth, and is run with a 30-min time step for the tracer and baroclinic momentum and 30-s barotropic time step. The first experiment initialized the ocean at rest with constant 25°C water and then added a climatological wind stress yet no heat fluxes. As presented in section 2c, the solution should maintain constant 25°C water everywhere. For an indefinite integration period, discretization a indeed remains precisely at 25°C. In contrast, discretization b shows water after 30 days with a temperature of 24.999 95°C. Although not exact, this is quite a small deviation even when extrapolated to integrations of centuries to millennia.
The second experiment initialized the temperature field with a zonally averaged version of the Levitus (1982) analysis. The surface flux consisted of the same climatological wind stress as the previous example as well as a uniform and constant 10 W m−2 surface heat flux. We assess here whether the heat input through the ocean surface is equivalent to the heat absorbed by the ocean as deduced by measuring the ocean’s total heat content via Eq. (10). We focus on a single model day since this is a typical time period over which winds and surface tracer flux are held fixed in coupled climate models.
After 1 day for discretization a, the heat input through the ocean surface minus the change in ocean heat content was equivalent to a spatially averaged error over the globe of −0.5 W m−2. Note that over the first half of the day, the error was −1.6 W m−2, which diminished to −0.37 W m−2 after 2 days. These are nontrivial errors, upward of 10%–20% of the imposed heating from the atmosphere. In contrast, discretization b, along with Robert filtering on the tracer, reduced the error from −0.5 W m−2 to a negligible −0.003 W m−2. Removing the Robert filter brings the error to 10−9 W m−2, which arises from computer roundoff.
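To put the quoted errors in proportion, the short calculation below expresses each one as a percentage of the imposed 10 W m−2 heating. All values are taken from the text; the dictionary labels are only shorthand introduced here.

```python
imposed_flux = 10.0   # W m^-2, uniform surface heating from the text

errors = {
    "a, first half day": -1.6,
    "a, 1 day": -0.5,
    "a, 2 days": -0.37,
    "b, with Robert filter": -0.003,
}
relative = {k: abs(v) / imposed_flux * 100.0 for k, v in errors.items()}
for label, pct in relative.items():
    print(f"{label}: {pct:.2f}% of imposed heating")
```

The largest value, from the first half day with discretization a, is 16% of the imposed flux, whereas discretization b with the Robert filter gives 0.03%.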
Both solution methods showed negligible qualitative differences for the structure of the surface height over the 30-day integration period. However, given the small errors incurred by discretization b for both the first and second set of experiments, we conclude that it is a preferable approach for longer integrations where conservation is key.
5. Computational aspects
The purpose of this section is to provide an overview of computational issues related to the use of a particular barotropic algorithm in a parallel computational environment. The papers by Dukowicz et al. (1993), Webb (1996), Webb et al. (1997), Marshall et al. (1997), and Guyon et al. (1999) are complementary to the following.
a. Fundamental considerations
The dominant feature of current supercomputing technology is the large and rapidly growing disparity between processor clock speeds and the memory bandwidth and latency. The rate at which processors are able to execute their instructions upon data far outstrips the ability of memory to deliver data to the processor. All current research into supercomputing architecture (see, e.g., Culler and Singh 1998) is directed at resolving this problem.
Traditional supercomputers, such as the Cray vector machines, are largely dependent on prohibitively expensive custom memory to deliver data at a rate sufficient to keep vector pipelines busy. The microprocessor-based computers currently being produced rely instead on commodity memory, but implement a deep memory hierarchy, where multiple levels of smaller and faster data caches are placed between the main memory and the processor. For performance comparable to vector systems, scalable cache-based microprocessor supercomputers achieve speedup through massive parallel partitioning of the problem, where the problem is distributed among many processors working concurrently and exchanging data on a network. This architectural model leads to measures of algorithm performance substantially different from the vector model. Notably, the following issues become central.
Temporal locality. Given the relatively high cost of memory access, algorithms that have a high rate of data reuse are preferable. The computational intensity, defined as the ratio of floating-point operations to memory requests, should be as high as possible. The match between the size of the data subset allotted to a processor and the size of its cache is a key element in this aspect of performance.
Spatial locality. In the interests of minimizing communication among processors, algorithms that maintain spatial locality of data references are most useful.
Communication profile. The communication performance of the interprocessor hardware interconnect is measured in terms of its latency (startup time per message) and bandwidth (the sustained data transmission rate). Communication latency can be a cause for concern on most commercially available machines today, barring a few exceptions such as the Cray T3E. The amount of data that needs to be transferred between processors is a function of the spatial locality of the algorithm and the problem size. Algorithms that can accomplish the maximum data transfer in the fewest possible messages are preferable.
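The preference for few, large messages can be illustrated with the common latency-plus-bandwidth cost model. The sketch below is an assumption-laden illustration rather than a measurement: the function name and the latency and bandwidth values are hypothetical, and bandwidth is taken here as bytes transferred per second.

```python
def transfer_time(n_bytes, n_messages, latency, bandwidth):
    """Estimated time (s) to move n_bytes split across n_messages.

    latency   : startup time per message (s)
    bandwidth : sustained transfer rate (bytes per second)
    """
    # Every message pays the startup cost once; the payload cost
    # depends only on the total volume of data.
    return n_messages * latency + n_bytes / bandwidth


# Hypothetical interconnect parameters, for illustration only.
LATENCY = 1.0e-5      # s per message
BANDWIDTH = 3.0e8     # bytes per second
PAYLOAD = 1.0e6       # bytes to exchange

one_message = transfer_time(PAYLOAD, 1, LATENCY, BANDWIDTH)
many_messages = transfer_time(PAYLOAD, 1000, LATENCY, BANDWIDTH)
```

Under this model the same payload costs more when split into many messages, and the extra cost is entirely latency.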
b. Algorithms available in MOM 3
As mentioned in the introduction, there are three basic algorithms in MOM 3 for solving the barotropic mode: the rigid-lid streamfunction of Bryan (1969), with modifications to the rigid-lid surface pressure method of Smith et al. (1992) and Dukowicz et al. (1993); the implicit free surface method of Dukowicz and Smith (1994); and the explicit free surface method of KSWP as well as that discussed here.
The rigid-lid model, although parallelized (Redler et al. 1998), has not been the focus of recent algorithm research at GFDL, largely because of the physical issues previously raised in this paper as well as problems with global data dependencies related to island integrals [see Bryan (1969) as well as Smith et al. (1992) for discussion].
The implicit free surface method, in contrast, has a relatively long history of use on parallel machines. This algorithm solves the barotropic system implicitly in time, hence allowing for the barotropic mode time step to be lengthened to that of the baroclinic mode. As such, there is no subcycling of the barotropic mode as required with the explicit approach. In so doing, however, the algorithm requires the solution of an elliptic problem.
The elliptic solvers used in the implicit free surface method are variants of the basic iterative Jacobi solver, where methods are used to accelerate the rate of convergence. For example, the conjugate gradient (CG) solver computes the optimal vector along which to converge. Computation of this vector involves a global sum across the entire domain; hence, the data dependencies are global, and involve global communication at each step in the solver iteration. It is observed in MOM that the number of iterations for the CG method is ∼100 for a 1° model with static boundary conditions, and goes up with increasing resolution. Additionally, the iteration count is highly dependent on the model’s representation of bottom topography and surface forcing, with more realistic topography and time-dependent surface forcing generally requiring much larger iteration counts. Parallel implementation of the CG requires communication at each iteration. Other, faster methods, with better parallel performance, such as multigrid methods, are known to exist, but also require increased steps as the resolution goes up.
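The global data dependency of the CG method can be seen in a minimal sketch. This is a textbook unpreconditioned conjugate gradient iteration, not the MOM solver: the tridiagonal operator is a hypothetical one-dimensional stand-in for the barotropic elliptic operator, and all function names are assumptions. Each call to `dot` corresponds to a sum over the whole domain, which on a parallel machine becomes a global reduction.

```python
def dot(u, v):
    # On a distributed domain this sum spans all processors.
    return sum(a * b for a, b in zip(u, v))


def apply_operator(x):
    """Hypothetical stand-in operator: 1D stencil (-1, 2, -1), zero boundaries."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)  # local, nearest-neighbor data only
    return y


def conjugate_gradient(b, tol=1e-10, max_iter=1000):
    """Solve A x = b; return (x, iterations, number of global sums)."""
    x = [0.0] * len(b)
    r = list(b)          # residual for the zero initial guess
    p = list(r)          # initial search direction
    rr = dot(r, r)
    n_global_sums = 1
    for iteration in range(1, max_iter + 1):
        Ap = apply_operator(p)
        alpha = rr / dot(p, Ap)                       # global sum
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)                            # global sum
        n_global_sums += 2
        if rr_new ** 0.5 < tol:
            return x, iteration, n_global_sums
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter, n_global_sums


rhs = [1.0] * 50
solution, iterations, global_sums = conjugate_gradient(rhs)
```

The stencil application needs only neighboring values, whereas the two inner products per iteration require communication across the entire domain. The total communication cost therefore grows with the iteration count.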
The explicit free surface has improved data locality, since the computations involve only local derivatives and no elliptic problem. The ratio of the baroclinic to the barotropic time step is given by the ratio of the phase speeds of the barotropic gravity wave to the first baroclinic wave. This ratio is determined by ocean depth and stratification and, so, is independent of grid resolution. Hence, the method’s computational performance is generally independent of grid resolution.
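The resolution independence of the time step ratio can be checked with a short sketch. It assumes that each time step is limited by a CFL-type condition of the form Δt ∝ Δx/c, with the order-one stability constant omitted, and that the barotropic gravity wave speed is (gH)^1/2. The 5000-m depth is that of the global model above; the 3 m s−1 first baroclinic wave speed and all function names are assumptions made for illustration.

```python
import math

GRAVITY = 9.8  # m s-2


def barotropic_speed(depth_m):
    """External gravity wave speed sqrt(g H) in m s-1."""
    return math.sqrt(GRAVITY * depth_m)


def cfl_time_step(dx_m, wave_speed):
    """Largest time step (s) for a wave crossing one grid cell (constant omitted)."""
    return dx_m / wave_speed


def time_step_ratio(dx_m, depth_m, baroclinic_speed):
    """Baroclinic time step divided by barotropic time step."""
    dt_baroclinic = cfl_time_step(dx_m, baroclinic_speed)
    dt_barotropic = cfl_time_step(dx_m, barotropic_speed(depth_m))
    return dt_baroclinic / dt_barotropic


DEPTH = 5000.0            # m
BAROCLINIC_SPEED = 3.0    # m s-1, assumed first baroclinic wave speed

ratio_coarse = time_step_ratio(4.0e5, DEPTH, BAROCLINIC_SPEED)  # roughly 4 degree spacing
ratio_fine = time_step_ratio(1.0e5, DEPTH, BAROCLINIC_SPEED)    # roughly 1 degree spacing
```

The grid spacing cancels, so both calls return the same ratio, about 74 for these assumed values. That is of the same order as the 4320 s to 60 s ratio of 72 used in the 1° configuration of Table 2.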
c. Examples of performance on scalable systems
MOM 3 is parallelized in one dimension, along latitude rows only. Scalability is thus measured in terms of latitude rows per processor. We show scaling results from a Cray T3E at two different resolutions using the explicit free surface algorithm documented here. Table 1 shows scaling results for a Southern Hemisphere configuration using a 4° Mercator grid. The superlinear scaling above seven rows per processing element (PE) is a result of arrays becoming small enough to fit in cache. Table 2 shows results for the same domain at 1° resolution.
The results for the 1° model are more appropriate for current and future needs of parallel ocean modeling. They indicate that one can increase the number of processors in MOM 3 until 8 rows/PE without significant loss of scalability. Note that with 4 rows/PE, scalability is 80%, whereas it has been found to be about 60% for MOM’s version of the implicit free surface method. Furthermore, as noted previously, absolute times for the implicit free surface method are highly dependent on the conjugate gradient iteration count needed to achieve convergence to the subjectively chosen tolerance level.
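The scaling measure used in Tables 1 and 2, scale = T1/(N TN), can be written as a one-line function. The run times below are hypothetical and were chosen only to reproduce the 80% figure quoted for 4 rows/PE. The processor count follows from the 112 latitude rows of the 1° grid.

```python
def scaling(t1, tn, n):
    """Parallel scaling T1 / (N * TN).

    t1 : total run time on one processor (s)
    tn : total run time on n processors (s)
    n  : number of processors
    """
    return t1 / (n * tn)


LATITUDE_ROWS = 112                       # rows in the 1 degree configuration
ROWS_PER_PE = 4
n_pe = LATITUDE_ROWS // ROWS_PER_PE       # 28 processors

# Hypothetical run times, for illustration only.
scale_4_rows = scaling(t1=1120.0, tn=50.0, n=n_pe)   # 0.8, i.e. 80%
superlinear = scaling(t1=1120.0, tn=35.0, n=n_pe)    # exceeds 1 when TN < T1/N
```

A value of 1 is ideal linear scaling. Values above 1, as in the coarse-resolution case, indicate superlinear scaling such as that produced by cache effects.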
6. Conclusions
Some key conclusions of this paper were noted in the introduction, with the central ones highlighted here.
By adapting the partial cell framework of Pacanowski and Gnanadesikan (1998) to the undulating top model grid cell, the explicit free surface scheme described here fully incorporates the effects of the surface height into the tracer and baroclinic momentum equations. Doing so allows for the surface boundary conditions to be formulated in a physically consistent and meaningful manner, to treat tracer and freshwater fluxes consistently and quasi-conservatively, and to maintain energetic consistency in which buoyancy effects in the density equation balance the work done by pressure from the momentum equation.
The explicit free surface method is stable and allows long tracer time steps. This result represents a key practical feature of the algorithm for use in coarse-resolution climate studies. Relatedly, when run on parallel computers, the absence of an elliptic problem, present in the rigid-lid and implicit free surface methods, enhances the processor scaling of the explicit free surface method.
Acknowledgments
We thank members of the Ocean, Climate, and Prediction Groups at GFDL for testing this scheme, and earlier versions, in various model configurations. In particular, we thank Jeff Anderson, Bob Hallberg, Matt Harrison, George Mellor, Thomas Neumann, Young-Gyu Park, Igor Polyakov, Tony Rosati, Torsten Seifert, Mike Spelman, Eli Tziperman, David Webb, Mike Winton, and Bruce Wyman for enjoyable conversations and useful suggestions. Bob Hallberg and Igor Polyakov deserve special thanks for extensive suggestions and useful critiques. Comments from the anonymous reviewers are also greatly appreciated. We thank Jerry Mahlman, the director of GFDL, for his support and encouragement.
REFERENCES
Barnier, B., 1998: Forcing the oceans. Ocean Modeling and Parameterization, E. P. Chassignet and J. Verron, Eds., NATO Advanced Study Institute, Kluwer Academic, 45–80.
Beron-Vera, F. J., J. Ochoa, and P. Ripa, 1999: A note on boundary conditions for salt and freshwater balances. Ocean Modelling, 1, 111–118.
Bleck, R., and L. T. Smith, 1990: A wind-driven isopycnic coordinate model of the north and equatorial Atlantic Ocean. 1. Model development and supporting experiments. J. Geophys. Res., 95 (C3), 3273–3285.
Blumberg, A. F., and G. L. Mellor, 1987: A description of a three-dimensional coastal ocean circulation model. Three-Dimensional Coastal Ocean Models, N. Heaps, Ed., Coastal and Estuarine Sciences, Vol. 4, Amer. Geophys. Union, 1–16.
Bryan, F. O., 1987: Parameter sensitivity of primitive equation ocean general circulation models. J. Phys. Oceanogr., 17, 970–985.
Bryan, K., 1969: A numerical method for the study of the circulation of the world ocean. J. Comput. Phys., 4, 347–376.
——, 1984: Accelerating the convergence to equilibrium of ocean-climate models. J. Phys. Oceanogr., 14, 666–673.
——, and M. D. Cox, 1972: An approximate equation of state for numerical models of the ocean circulation. J. Phys. Oceanogr., 2, 510–514.
Cox, M. D., 1984: A primitive equation, 3-dimensional model of the ocean. GFDL Ocean Group Tech. Rep. 1, 143 pp.
——, and K. Bryan, 1984: A numerical model of the ventilated thermocline. J. Phys. Oceanogr., 14, 674–687.
Culler, D. E., and J. P. Singh, 1998: Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, 1100 pp.
Danabasoglu, G., J. C. McWilliams, and W. G. Large, 1996: Approach to equilibrium in accelerated global oceanic models. J. Climate, 9, 1092–1110.
Dewar, W. K., and R. X. Huang, 1996: On the forced flow of salty water in a loop. Phys. Fluids, 8, 954–970.
——, Y. Hsueh, T. J. McDougall, and D. Yuan, 1998: Calculation of pressure in ocean simulations. J. Phys. Oceanogr., 28, 577–588.
Dukowicz, J. K., and R. D. Smith, 1994: Implicit free-surface method for the Bryan–Cox–Semtner ocean model. J. Geophys. Res., 99, 7991–8014.
——, ——, and R. C. Malone, 1993: A reformulation and implementation of the Bryan–Cox–Semtner ocean model on the connection machine. J. Atmos. Oceanic Technol., 10, 195–208.
Gill, A. E., 1982: Atmosphere–Ocean Dynamics. Academic Press, 662 pp.
Gordon, C., C. Cooper, C. A. Senior, H. Banks, J. M. Gregory, T. C. Johns, J. F. B. Mitchell, and R. A. Wood, 2000: The simulation of SST, sea ice extents and ocean heat transports in a version of the Hadley Centre coupled model without flux adjustments. Climate Dyn., 16, 147–168.
Griffies, S. M., and R. W. Hallberg, 2000: Biharmonic friction with a Smagorinsky viscosity for use in large-scale eddy-permitting ocean models. Mon. Wea. Rev., 128, 2935–2946.
Guyon, M., G. Madec, F. X. Roux, M. Imbard, C. Herbaut, and P. Fronier, 1999: Parallelization of the OPA ocean model. Calculateurs Paralleles, 11, 499–517.
Hallberg, R., 1995: Some aspects of the circulation in ocean basins with isopycnals intersecting the sloping boundaries. Ph.D. thesis, University of Washington, Seattle, WA, 244 pp.
——, 1997: Stable split time stepping schemes for large-scale ocean modeling. J. Comput. Phys., 135, 54–65.
Haltiner, G. J., and R. T. Williams, 1980: Numerical Prediction and Dynamic Meteorology. John Wiley, 477 pp.
Higdon, R. L., and A. F. Bennett, 1996: Stability analysis of operator splitting for large-scale ocean modeling. J. Comput. Phys., 123, 311–329.
——, and R. A. de Szoeke, 1997: Barotropic–baroclinic time splitting for ocean circulation modeling. J. Comput. Phys., 135, 30–53.
Holland, W. R., J. C. Chow, and F. O. Bryan, 1998: Application of a third-order upwind scheme in the NCAR ocean model. J. Climate, 11, 1487–1493.
Huang, R. X., 1993: Real freshwater flux as a natural boundary condition for the salinity balance and thermohaline circulation forced by evaporation and precipitation. J. Phys. Oceanogr., 23, 2428–2446.
Jackett, D. R., and T. J. McDougall, 1995: Minimal adjustment of hydrographic profiles to achieve static stability. J. Atmos. Oceanic Technol., 12, 381–389.
Killworth, P. D., 1987: Topographic instabilities in level model OGCM’s. Ocean Modelling (unpublished manuscript), 75, 9–12.
——, and N. R. Edwards, 1999: A turbulent bottom boundary layer code for use in numerical models. J. Phys. Oceanogr., 29, 1221–1238.
——, J. M. Smith, and A. E. Gill, 1984: Speeding up ocean circulation models. Ocean Modelling (unpublished manuscript), 56, 1–5.
——, D. Stainforth, D. J. Webb, and S. M. Paterson, 1991: The development of a free-surface Bryan–Cox–Semtner ocean model. J. Phys. Oceanogr., 21, 1333–1348.
Large, W. G., and S. Pond, 1981: Open ocean flux measurements in moderate to strong winds. J. Phys. Oceanogr., 11, 324–336.
Leonard, B. P., 1979: A stable and accurate convective modelling procedure based on quadratic upstream interpolation. Comput. Methods Appl. Mech. Eng., 19, 59–98.
Levitus, S., 1982: Climatological Atlas of the World Ocean. NOAA Prof. Paper 13, U.S. Government Printing Office, Washington, DC, 173 pp.
Marshall, J., A. Adcroft, C. Hill, L. Perelman, and C. Heisey, 1997: A finite-volume, incompressible Navier–Stokes model for studies of the ocean on parallel computers. J. Geophys. Res., 102, 5753–5766.
Mellor, G. L., 1996: User’s guide for a three-dimensional, primitive equation, numerical ocean model. Program in Atmospheric and Oceanic Studies, Princeton University, Princeton, NJ, 40 pp. [Available from Program in Atmospheric and Oceanic Sciences, Princeton University, Princeton, NJ 08542.]
Pacanowski, R. C., and A. Gnanadesikan, 1998: Transient response in a z-level ocean model that resolves topography with partial cells. Mon. Wea. Rev., 126, 3248–3270.
——, and S. M. Griffies, 2000: The MOM 3.1 manual. NOAA/Geophysical Fluid Dynamics Laboratory, Princeton, NJ, 680 pp.
——, K. Dixon, and A. Rosati, 1991: The GFDL modular ocean model user guide. GFDL Ocean Group Tech. Rep. 2, Geophysical Fluid Dynamics Laboratory, Princeton, NJ, 16 pp.
Redler, R., K. Ketelsen, J. Dengg, and C. W. Böning, 1998: A high-resolution numerical model for the circulation of the Atlantic Ocean. Proceedings of the Fourth European CRAY-SGI MPP Workshop, H. Lederer and F. Hertweck, Eds., Max-Planck-Institut für Plasmaphysik, 95–108.
Roullet, G., and G. Madec, 2000: Salt conservation, free surface and varying volume. A new formulation for Ocean GCMs. J. Geophys. Res., 105, 23 927–23 947.
Semtner, A. J., Jr., 1974: An oceanic general circulation model with bottom topography. Numerical Simulation of Weather and Climate, Tech. Rep. 9, Department of Meteorology, University of California, Los Angeles.
Smith, R. D., J. K. Dukowicz, and R. C. Malone, 1992: Parallel ocean general circulation modeling. Physica D, 60, 38–61.
Webb, D. J., 1995: The vertical advection of momentum in Bryan–Cox–Semtner ocean general circulation models. J. Phys. Oceanogr., 25, 3186–3195.
——, 1996: An ocean model code for array processor computers. Comput. Geophys., 22, 569–578.
——, A. C. Coward, B. A. de Cuevas, and C. S. Gwilliam, 1997: A multiprocessor ocean general circulation model using message passing. J. Atmos. Oceanic Technol., 14, 175–183.
——, B. A. de Cuevas, and A. C. Coward, 1998: The first main run of the OCCAM global ocean model. Southampton Oceanography Centre Internal Doc. 34, 44 pp.
Wolff, J.-O., E. Maier-Reimer, and S. Legutke, 1997: The Hamburg Ocean Primitive Equation Model HOPE. DKRZ Tech. Rep. 13, 98 pp.
Table 1. MOM 3 performance on a Cray T3E-900 for a coarse-resolution model. The model has one-dimensional domain decomposition along latitude rows, runs with the MOM 3 memory window, and uses Cray-SHMEM communication. Results here are from a Southern Hemisphere configuration using a 4° Mercator grid extending from the equator to 74°S with 90 × 28 × 40 grid points. The model used a tracer time step of 43 200 s, baroclinic time step of 4320 s, and explicit free surface time step of 216 s. Model run times are given in seconds. Shown are the number of T3E computer processors (N), number of computed latitude rows per processor (j/N), time for the main model loop (main), time for the free surface portion of the model (fs), total model run time (total), scaling (scale), and the megaflops per second (Mflop/s). Scaling is defined to be the total run time taken for an experiment to complete on one processor, divided by the time taken for N processors as scaled by the number of processors: i.e., scale = T1/(NTN).
Table 2. Same as Table 1 but for a more refined grid of 1° Mercator resolution with 360 × 112 × 40 grid points, tracer time step of 4320 s, baroclinic time step of 4320 s, and explicit free surface time step of 60 s.
Additionally, the constant Boussinesq density, ρo, is not set to 1 g cm−3, as previously done in the Bryan–Cox–Semtner model (Bryan 1969; Cox 1984; Semtner 1974) or previous versions of MOM (Pacanowski et al. 1991). Instead, ρo = 1.035 g cm−3, from which density in the World Ocean generally deviates by less than 2% (Gill 1982, p. 47); the older choice ρo = 1.0 g cm−3 is less accurate.
The bottom velocity cell generally does not sit on the ocean bottom, and so can support a vertical velocity due to sloping topography. Details of how MOM handles this velocity are given in Webb (1995) as well as in Pacanowski and Griffies (2000).