Fig. 1. Bottom topography of the Baltic Sea including Kattegat and Skagerrak [data from Seifert and Kayser (1995)]. The model domain of RCO is limited with open boundaries in the northern Kattegat (dashed line).

Fig. 2. Simulated ice-covered area (in 10⁹ m²) for the period Jul 1980 until Jun 1993. Tick marks denote 1 Jul of the corresponding year. In this simulation RCO has a 6-nm horizontal resolution. Squares denote observed maximum ice extent. The dashed line denotes the Baltic Sea surface area including Kattegat (420 560 km²).

Fig. 3. Simulated ice thickness (in cm) (a) on 25 Feb 1993 and (b) on 16 Mar 1987. In this simulation RCO has a 2-nm horizontal resolution.

Fig. 4. Processor map for 16 processors (16 slaves and 1 master) with bounding boxes. The largest box is emphasized with a thick frame and defines the total memory requirement. The horizontal and vertical axis scales are grid-point indices.

Fig. 5. CPU time per time step as a function of the weight ratio with sea ice (dashed) and without sea ice (solid). The results are calculated with 32 processors using 2-nm horizontal resolution.

Fig. 6. (a) Average and (b) maximum over all processors of the CPU time per time step spent in the whole time step (solid), wait-on-receive (dotted), the baroclinic part (dashed), the barotropic part (dash-dotted), and the ice part (dash-dotted with three dots). The results are calculated with 16 processors using 6-nm horizontal resolution. The period is from 1986 to 1989, starting on 2 Dec 1986.

Fig. 7. Relative performance of RCO with (dashed) and without ice (dotted) and the ideal speedup (solid). Speedup is calculated relative to 9 slave processors. Numbers are determined for 9, 16, 32, 63, and 128 processors.


Performance Analysis of a Multiprocessor Coupled Ice–Ocean Model for the Baltic Sea

  • 1 Rossby Centre, Swedish Meteorological and Hydrological Institute, Norrköping, Sweden
  • 2 National Supercomputer Centre, Linköping University, Linköping, Sweden

Abstract

Within the Swedish Regional Climate Modelling Programme (SWECLIM) a 3D coupled ice–ocean model for the Baltic Sea has been developed to simulate physical processes on timescales of hours to decades. The model code is based on the global ocean GCM of the Ocean Circulation Climate Advanced Modelling (OCCAM) project and has been optimized for massively parallel computer architectures. The Hibler-type dynamic–thermodynamic sea ice model utilizes elastic–viscous–plastic rheology resulting in a fully explicit numerical scheme that improves computational efficiency. A detailed performance analysis shows that the ice model causes generic workload imbalance between involved processors. An improved domain partitioning technique minimizes load imbalance, but cannot solve the problem completely. However, it is shown that the total load imbalance is not more than 13% for a mild winter and about 8% for a severe winter. With respect to parallel processor performance, the code makes the best use of available computer resources.

Corresponding author address: Dr. H. E. Markus Meier, Rossby Centre, Swedish Meteorological and Hydrological Institute, SE-60176 Norrköping, Sweden. Email: markus.meier@smhi.se


1. Introduction

The Baltic Sea is one of the world's largest brackish-water sea areas, with a total surface area, excluding the Danish Straits (Great Belt and the Sound), of 377 400 km² and a corresponding volume of 21 200 km³ (Fig. 1). The mean water depth amounts to 56 m and the maximum depth to 451 m (Landsort Deep). The highly variable bottom topography divides the water masses into a series of basins delimited by high sills. In particular, the restricted water exchange with the North Sea through the Danish Straits greatly influences the hydrography of the Baltic Sea. The width of the narrowest part of the Sound between Denmark and Sweden is only 4 km, and the sill depths of the Great Belt and the Sound are 18 and 8 m, respectively.

A pronounced feature of the Baltic is the seasonal sea ice cover. Sea ice acts as a relatively rigid insulating film between the air and the sea, which modifies the air–sea exchange of momentum, heat, and material and influences local meteorological conditions. With respect to the ocean, sea ice influences the temperature and salinity characteristics of the water masses and the circulation of the Baltic Sea. Normally the ice season lasts 5–7 months, from November to May, with large interannual variability of ice extent. During a mild winter ice occurs only in the Bothnian Bay (the northernmost basin), but during a cold winter the whole Baltic Sea becomes ice covered (Fig. 2).

Gridded 3D Baltic Sea models therefore require, on the one hand, high vertical and horizontal resolution with a correspondingly short time step to resolve the bottom topography and, on the other hand, coupling with sophisticated dynamic–thermodynamic sea ice models to perform realistic multiyear simulations. The longest timescale of the system is the diffusive timescale of the ocean, which is about 30 yr. Hence, climate studies for the Baltic Sea using 3D coupled ice–ocean (or even coupled atmosphere–ice–ocean) models pose special technical challenges requiring, for example, massively parallel computer architectures.

Nowadays, a number of ocean general circulation models have parallel versions of their code (Table 1). However, most of them are neither adapted to distributed-memory machines like the CRAY T3E nor coupled with a parallel thermodynamic–dynamic ice model. Instead, purely thermodynamic ice models or thermodynamic ice models together with the free-drift assumption are often used. To the authors' knowledge the following parallel ocean models are coupled with a fully thermodynamic–dynamic parallel ice model: Modular Ocean Model version 2 (MOM2; e.g., Gerdes et al. 2001, unpublished manuscript), Ocean Circulation Climate Advanced Modelling Programme [OCCAM; e.g., Rossby Centre Ocean model (RCO); see Meier et al. 1999], Parallel Ocean Program [POP; e.g., Parallel Climate Model (PCM); see Washington et al. 2000], Parallel Ocean Climate Model (POCM; e.g., Zhang and Hunke 2001), High Resolution Operational Model for the Baltic Sea (HIROMB; Wilhelmsson and Schüle 1999), S-Coordinate Primitive Equation Model [SPEM; e.g., Bremerhaven Regional Ice Ocean Simulations (BRIOS); see Timmermann et al. 2001, submitted to J. Geophys. Res.], Miami Isopycnic Coordinate Ocean Model (MICOM; e.g., Liasæter et al. 2001, submitted to J. Geophys. Res.),1 and Ocean Isopycnic Model (OPYC)2 (Oberhuber 1993b). These efforts have been developed in parallel during recent years. As far as we know, publications on improved parallelization strategies to avoid load imbalance within coupled ice–ocean models are not yet available.

Within the Swedish Regional Climate Modelling Programme (SWECLIM) a 3D coupled ice–ocean model for the Baltic Sea has been developed to gain understanding by simulating long-term climate changes and natural variability of the Baltic Sea. These simulations require a state-of-the-art supercomputer, which is available for SWECLIM in the form of a CRAY T3E-600 with 272 processors at the Swedish National Supercomputer Centre (NSC) in Linköping, Sweden. Three years ago, when SWECLIM began operations, none of the available Baltic Sea models were suitable for parallel computing. Hence, RCO has been developed using the Ocean Circulation Climate Advanced Modelling program version (OCCAM; James Rennell Division, Southampton Oceanography Centre, Southampton, United Kingdom; see Webb et al. 1997) of the Bryan–Cox–Semtner primitive equation ocean model with a free surface. Since the OCCAM project focuses on global scales, it was necessary to add parameterizations important to the Baltic Sea. A two-equation turbulence closure scheme, open boundary conditions, and a sea ice model were the main features that had to be implemented for our purposes. Due to limited computer resources, an additional effort has been undertaken to optimize model performance, which is reported as follows: in the second section the model is described briefly; general features of the parallelization strategy are summarized in the third section. A new algorithm to calculate domain decompositions automatically and to optimize workload balance is presented in the fourth section. In the fifth and sixth sections, code instrumentation and results of the performance analysis, with special emphasis given to the impact of ice on performance as well as parallel speedup, are discussed. The paper ends with a summary and conclusions that may be important to future code development of massively parallel applications similar to the one presented here.3

2. Model description

The Bryan–Cox–Semtner model is one of the most widely used general circulation models of the ocean (Bryan 1969; Semtner 1974; Cox 1984). The code is now available as the Geophysical Fluid Dynamics Laboratory (GFDL) MOM for use in different computer architectures (MOM 3.0; Pacanowski and Griffies 2000) and in a multiprocessor version for the global ocean (OCCAM; Webb et al. 1997). The OCCAM version includes an explicit free surface (Killworth et al. 1991), improved vertical and horizontal advection schemes (Webb 1995; Webb et al. 1998), a quadratic law for bottom friction (Cox 1984), harmonic horizontal viscosity and diffusivity, and a third-order polynomial approximation (Bryan and Cox 1972) for the equation of state, as proposed by the Joint Panel on Oceanographic Tables and Standards (UNESCO 1981). The conservation equations of momentum, mass, potential temperature, and salinity are discretized horizontally in spherical coordinates on a staggered grid (Arakawa B grid; cf. Mesinger and Arakawa 1976) and vertically in nonuniform geopotential levels.

RCO is a further development of the OCCAM code applied to the Baltic Sea. As the model domain of RCO is limited (contrary to the global OCCAM) with open boundaries in the northern Kattegat (Fig. 1), open boundary conditions, as developed by Stevens (1990, 1991) for the Bryan–Cox–Semtner model, have been reimplemented. In the case of inflow, temperature and salinity values at the boundaries are nudged toward observed climatological profiles. In the case of outflow, a modified Orlanski radiation condition is utilized (Orlanski 1976). Sea level elevation at the boundaries is prescribed from hourly tide gauge data.
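The boundary logic can be illustrated with a toy one-dimensional sketch; the arrays, the nudging timescale, and the simple upstream rule that stands in for the modified Orlanski condition are illustrative assumptions only, not the RCO code.

```python
import numpy as np

# Toy 1D sketch of the open-boundary treatment described above (hypothetical
# arrays and time scale, not the RCO code). On inflow the boundary tracer is
# nudged toward a climatological value; on outflow a simple upstream rule
# stands in for the modified Orlanski radiation condition.
def apply_obc_tracer(tracer, u_boundary, clim, dt, nudge_time=10 * 86400.0):
    t_boundary, t_interior = tracer[0], tracer[1]
    if u_boundary > 0.0:                       # inflow into the model domain
        t_boundary += dt / nudge_time * (clim - t_boundary)
    else:                                      # outflow: carry interior value outward
        t_boundary = t_interior
    tracer[0] = t_boundary
    return tracer

salinity = np.array([20.0, 21.0, 22.0])        # boundary point plus interior points
print(apply_obc_tracer(salinity, u_boundary=0.1, clim=33.0, dt=600.0))
```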

In RCO a two-equation turbulence closure, the k–ε model (Svensson 1978; Rodi 1980), is embedded. Two prognostic equations, for turbulent kinetic energy (k) and for its dissipation (ε), have to be solved additionally at every grid point of the 3D model. Appropriately chosen flux boundary conditions are included to take into account the effect of breaking surface gravity waves, which results in enhanced turbulence in the surface layer (Craig and Banner 1994). The k–ε model is extended to include a parameterization for breaking internal waves (Stigebrandt 1987). In addition to the turbulent vertical transports, the divergence of the absorbed intensity of penetrating shortwave radiation heats the water column. The solar intensity is parameterized using two extinction lengths according to Paulson and Simpson (1977).
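The two-band absorption profile has the functional form of Paulson and Simpson (1977); the short sketch below evaluates it with the commonly quoted Jerlov type I coefficients, which serve only as an illustration and are not necessarily the values used in RCO.

```python
import numpy as np

# Two-band shortwave absorption after Paulson and Simpson (1977): a fraction r
# decays with extinction length zeta1, the remainder with zeta2. The
# coefficients below are the commonly quoted Jerlov type I values and are used
# here only for illustration.
def shortwave_fraction(z, r=0.58, zeta1=0.35, zeta2=23.0):
    """Fraction of surface shortwave irradiance remaining at depth z (m)."""
    return r * np.exp(-z / zeta1) + (1.0 - r) * np.exp(-z / zeta2)

interfaces = np.array([0.0, 3.0, 6.0, 9.0, 12.0])   # e.g. upper level interfaces
frac = shortwave_fraction(interfaces)
absorbed_per_layer = -np.diff(frac)                  # divergence heating each layer
print(absorbed_per_layer)
```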

The ocean model in RCO is coupled with a Hibler-type (Hibler 1979) two-level (open water and ice) dynamic–thermodynamic sea ice model. An extension of the widely used viscous–plastic rheology with an elastic component (Hunke and Dukowicz 1997) leads to a fully explicit numerical scheme that improves computational efficiency, particularly on high-resolution grids, and easily adapts to parallel computer architectures. A first version of the sea ice model of the OCCAM project has been adopted and significantly modified to simulate seasonal ice in the Baltic. Within each time step, the dynamic component needs to be subcycled several times to damp elastic waves. As described in Hunke and Zhang (1999), the elastic term initially makes a prediction for the ice stress, which is then “corrected” toward the viscous–plastic solution by means of subcycling. In choosing the number of subcycles (N), a compromise has to be made between an energetic solution that quickly adjusts during rapidly changing forcing conditions (small N) and a solution that does not differ significantly from the viscous–plastic one on longer timescales (large N). The equations of the ice model are discretized on the same Arakawa B grid as used for the ocean.
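The role of the subcycling can be illustrated with a deliberately simplified scalar toy (not the actual EVP equations): each subcycle relaxes the predicted stress toward the viscous–plastic target, so a small N leaves a more energetic, less converged state, while a large N approaches the viscous–plastic solution; the damping factor below is hypothetical.

```python
# Deliberately simplified scalar toy (not the actual EVP equations): each
# subcycle relaxes the predicted ice stress toward the viscous-plastic target
# stress. The damping factor is a hypothetical illustration.
def evp_subcycle(sigma, sigma_vp, n_subcycles, damping=0.1):
    for _ in range(n_subcycles):
        sigma += damping * (sigma_vp - sigma)   # explicit "correction" step
    return sigma

sigma_vp = 1.0                                  # target viscous-plastic stress
for n in (5, 40, 100):                          # cf. the choice N = 40 vs N = 100
    print(n, round(evp_subcycle(0.0, sigma_vp, n), 3))
```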

The ice thermodynamics are based on Semtner's layer models (Semtner 1976) for thick ice/snow (multiple layers) and thin ice/snow (“zero” layer), using characteristic discrimination thicknesses for ice (25 cm) and snow (15 cm). In RCO thick ice consists of one or two layers, and thick snow consists of one layer. The reason for the discrimination between thick and thin ice/snow is numerical stability. The zero-layer models for ice and snow are based on simple heat budgets.

The model depths are based on realistic bottom topography data (Seifert and Kayser 1995), as shown in Fig. 1. RCO makes use of 41 vertical levels with layer thicknesses from 3 m close to the surface to 12 m near the bottom. Within the upper 99 m the layer thickness is constant; at greater depths it increases with a cosine-shaped profile toward the bottom. The maximum depth in RCO is limited to 250 m to avoid small time steps. In this paper two different horizontal resolutions are used, 2 and 6 nautical miles (nm), corresponding to Δϕ = 2′, Δλ = 4′ and Δϕ = 6′, Δλ = 12′ with latitude ϕ and longitude λ.

RCO is forced by monthly river runoff data (Bergström and Carlsson 1994) and surface fluxes calculated with standard bulk formulas from 3-h gridded atmospheric observations developed at the Swedish Meteorological and Hydrological Institute (SMHI). Data from all available synoptic stations (about 700–800) covering the whole Baltic Sea drainage basin are interpolated on a 1° regular horizontal grid (L. Meuller 2001, personal communication). A 2D univariate optimum interpolation scheme is used.

A more detailed model description of RCO has been presented by Meier et al. (1999) and Meier (2000) and will not be repeated here. Preliminary results of multiyear simulations can be found in Meier (1999), and a validation analysis is presented by Meier et al. (2001, submitted to J. Geophys. Res.). Selected results are shown in Figs. 2 and 3: the simulated ice-covered area for the period July 1980 until June 1993 and the ice thickness distributions at maximum ice extent for a normal and a severe winter. The agreement in Fig. 2 between model results and observations is quite good. RCO reproduces the large interannual variability of ice extent well. A comparison of the snapshots (Fig. 3) with corresponding ice chart data from the SMHI ice service (not shown) reveals that convergences and divergences in the ice pack, which depend on the highly variable wind forcing, are simulated realistically.

3. Parallelization strategy

RCO was started as a subset of the fully global parallel OCCAM model and keeps the same strategy for parallelization. A new method for automatic domain decomposition of the 3D longitude–latitude–depth (i, j, k) grid is used for mapping the Baltic Sea (see next section), which results in almost optimum data distribution.

RCO utilizes a master–slave concept. The master processor controls the run, initializes all slave processors, and does all I/O. The “real” computation is performed by the slave processors. Within the main time-stepping subroutine, the baroclinic (internal mode), barotropic (external mode), and ice calculations are performed, each with their own time-stepping loop. The baroclinic time-stepping loop includes a loop over vertical sea depth columns, one for each ocean latitude and longitude index. The barotropic and the ice time step loops are for the ocean surface points only. The time-stepping algorithm is described in detail by Webb et al. (1997). Table 2 lists the time steps used in RCO.
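The nesting of the three parts can be summarized schematically as below; only idealized work units are counted, and the sub-step counts are hypothetical placeholders rather than the actual values of Table 2.

```python
# Schematic of the nested time stepping on one slave processor. Only idealized
# "work units" are counted; real physics routines would replace the counters.
# The sub-step counts are hypothetical placeholders (cf. Table 2).
def model_time_step(columns_kmt, n_barotropic=30, n_ice=40):
    work = {"baroclinic": 0, "barotropic": 0, "ice": 0}
    for kmt in columns_kmt:            # baroclinic: every wet column,
        work["baroclinic"] += kmt      # innermost loop over its kmt levels
    n_surface = len(columns_kmt)
    work["barotropic"] += n_barotropic * n_surface   # surface points only
    work["ice"] += n_ice * n_surface                 # surface points only
    return work

print(model_time_step([41, 35, 12, 3]))
```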

The horizontal model domain is partitioned into several regions, one for each processor. Each processor domain consists of a core region surrounded by a ring of “inner halo” points, which in turn is surrounded by a layer of “outer halo” points [cf. Fig. 3 by Webb et al. (1997)]. The two halo regions represent those points involved in transferring data between processors. At the end of each time step, model variables from the halo regions are sent to the neighboring processes for use during the next time step. The message passing is performed between neighboring processors using the CRAY SHMEM library on the T3E.

The code supports arbitrarily irregular domains, thus allowing for fine tuning with respect to load balancing. A master processor map is used to describe the partitioning. This map is simply a two-dimensional array having zeros for land points and, for each sea grid point, the number of the processor responsible for that particular grid point. Only sea points participate in the computation, which is important for performance, because only about 9% of the points of the 3D array spanning the entire model domain are sea points (corresponding to 409 464 active grid points for the 2-nm resolution). This is possible since the order of the loops has been changed compared to the earlier GFDL code. The inner loop runs over the k index (vertical levels) and the outer loop over all surface grid points of the processor domain. By contrast, the earlier GFDL code performs calculations on horizontal squares and masks land points after the calculation.
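A minimal sketch of this loop order, using small invented pmap and kmt arrays (0 marks land), is given below.

```python
import numpy as np

# Minimal sketch of the "sea points only" loop order. pmap (0 = land, otherwise
# owning processor) and kmt (number of wet levels) are small invented arrays.
pmap = np.array([[0, 1, 1],
                 [0, 1, 2],
                 [0, 0, 2]])
kmt = np.array([[0, 12, 30],
                [0, 25, 41],
                [0,  0, 18]])

my_rank = 1
for i, j in zip(*np.nonzero(pmap == my_rank)):   # outer loop: owned surface points
    for k in range(kmt[i, j]):                   # inner loop: vertical levels
        pass                                     # column physics for level k here
```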

The earlier effort to parallelize MICOM also implemented the computationally efficient strategy of considering only sea points (Bleck et al. 1995). As shown by Bleck et al. (1995), the performance of the message-passing version of MICOM (MP-MICOM) improves if, for an equal-sized rectangular domain partitioning, those processors whose subdomains contain only land points are removed.

The parallelization strategy of HIROMB is based on a subdivision of the computational grid into a smaller set of rectangular grid blocks of any size, which are distributed onto the processor elements. Thus, all algorithms and numerics in the parallel version are the same as in the serial version. According to Wilhelmsson and Schüle (1999), however, inactive grid points within a block incur computational costs that cannot be ignored if good load balance is to be achieved.

Figure 4 shows a typical RCO processor map for a 16-processor run (16 slaves and 1 master). For high-resolution models, memory requirements restrict the maximum dimensions of the processor regions, which sets the lower limit on the number of processors that can run the model. The total memory requirement is given by the largest box surrounding the irregularly shaped processor domains (Fig. 4). Although this is not strictly necessary, RCO allocates the memory of a virtual box with dimensions given by the maximum length of all bounding boxes in the longitudinal direction times the maximum length in the latitudinal direction.
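The bounding boxes and the resulting common array size can be derived directly from a processor map, as in the following sketch (again with a small invented map).

```python
import numpy as np

# Sketch of the per-processor bounding boxes (cf. Fig. 4) and the common array
# size RCO allocates. pmap is a small invented processor map (0 = land).
pmap = np.array([[0, 1, 1],
                 [0, 1, 2],
                 [0, 0, 2]])

boxes = {}
for p in np.unique(pmap[pmap > 0]):
    ii, jj = np.nonzero(pmap == p)
    boxes[int(p)] = (ii.ptp() + 1, jj.ptp() + 1)     # box extent (lat, lon)

# the virtual box: maximum extent over all bounding boxes in each direction
ni = max(extent[0] for extent in boxes.values())
nj = max(extent[1] for extent in boxes.values())
print(boxes, (ni, nj))
```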

4. Processor map generator (load balancing)

The Baltic Sea is irregular in shape and has a highly variable bottom topography. In order to achieve the best load balance and minimize processor communication, a tool has been developed to create processor maps automatically for various processor configurations. This tool is based on a new partitioning strategy utilizing graph theory (Rantakokko 1998). New processor maps can now be generated on the fly with no constraint on the number of processors, as long as the available memory is sufficient. Since the model has a checkpoint-restart facility, we can, in principle, use this tool to generate new processor maps automatically and repeatedly, even within one run and without recompilation. To assess the work for each vertical column belonging to the surface grid point (i, j), the following workmap or weighting function w is used:
w_{i,j} = α + β kmt_{i,j},
where kmt is the number of vertical levels at (i, j). The idea of using two weights (α and β) is that one can adjust to the fact that the work in each column is not just a function of depth, but also a function of the ratio between the work in the baroclinic part (which is a function of depth), the barotropic part (calculated only for the surface points), and, if present, the ice part. For a given workmap w and for any given number of processors, the software package first calculates the corresponding graph and then the processor maps using multilevel graph partitioning. The public-domain graph partitioning package used (METIS) is highly optimized for unstructured grids such as in our application (Karypis and Kumar 1995). It provides high-quality partitions, is extremely fast, and provides low fill orderings. For an optimal processor map, the load is even and the number of communication points between the processor domains is at a minimum.
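The sketch below shows how such a workmap and the grid adjacency can be assembled into the weighted-graph arrays that a METIS-style partitioner consumes; the small kmt array and the weights α = 10, β = 1 are illustrative, and the call to the partitioner itself is omitted.

```python
import numpy as np

# Build node weights w(i,j) = alpha + beta*kmt(i,j) and a CSR-style adjacency
# (xadj, adjncy) for the wet surface points. The small kmt array and the
# weights are illustrative; the actual call to a METIS-like partitioner is
# omitted here.
alpha, beta = 10, 1
kmt = np.array([[0, 12, 30],
                [0, 25, 41],
                [0,  0, 18]])

sea = list(zip(*np.nonzero(kmt > 0)))            # wet surface points
index = {pt: n for n, pt in enumerate(sea)}      # graph node numbering

vwgt = [alpha + beta * int(kmt[i, j]) for i, j in sea]   # vertex (work) weights
xadj, adjncy = [0], []
for i, j in sea:
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if (i + di, j + dj) in index:            # edge to a neighbouring sea point
            adjncy.append(index[(i + di, j + dj)])
    xadj.append(len(adjncy))

print(vwgt, xadj, adjncy)
```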

Empirically, we have found a clear performance optimum for the ratio β/α. Figure 5 shows the results for a 32-processor run with 2-nm resolution. Only the CPU time necessary to integrate the prognostic model equations is considered. Without sea ice, the best performance is obtained using α = 10 and β = 1 (ratio 0.1; see lower curve). During a simulation of an ice situation as depicted in Fig. 3 (maximum ice extent for the mild winter of 1992/93), the minimum is shifted toward α = 20 and β = 1 (ratio 0.05; see upper curve). Using the improved processor maps leads to a performance gain of 23% in summer and 10% in winter, compared to corresponding runs with a processor map assigning the same number of surface grid points to each slave (α = 1, β = 0). The latter would be the ideal outcome of the interactive procedure of the original OCCAM utility program for generating processor maps. Indeed, the improvements are even more pronounced compared to the first RCO runs, because the hand-made processor maps used earlier were far from perfect, especially for higher processor numbers. The average performance of the 6-nm resolution for the 13-yr integrations, with realistic seasonal sea ice included, increased by 25% using the improved processor maps.

A further extension was tested. Instead of using the function above, we collected actual performance data for each vertical column and used them as weighting factors to generate a new processor map for a subsequent run. However, performance improved only slightly.

Further analysis showed that there is still a remaining load imbalance that could not be removed by further refinement of the processor map. Load imbalance in any of the three subphases (baroclinic, barotropic, and ice) causes some slaves to take longer to complete their task, generating a delay in the nearest-neighbor communication that is performed after each time step. Both the barotropic and the ice dynamic parts have multiple time steps, and we found that the delay propagates through the processor mesh, since slave after slave becomes idle, waiting for a message from a neighboring processor. We call this waiting time “wait-on-receive.”

Therefore, in order to achieve load balance, the load must be balanced not only in the sum of the three subphases, but also in each of them. This cannot be achieved with a single processor map, since the amount of work at each grid point differs between phases. For example, ice in the Baltic develops in a very localized manner, spreading from north to south. Instead, we would need a different processor map for each phase, each generated based on the computation done in that phase. Switching between processor maps means frequent data redistribution between processors, which in many cases involves a large number of grid points. We have not investigated this approach further, but we estimate that the communication overhead would be prohibitive and that computation time would increase rather than decrease.

5. Code instrumentation for performance analysis

For performance analysis purposes, the code has been modified to include calls to a profiling library that collects run-time statistics as well as hardware performance counters for different sections of the code. The new performance analysis tool is based on our own development and on unsupported libraries from CRAY.

Performance statistics are averaged over blocks of I time steps, where I is a user-defined environment variable. For each block of I time steps we collect, for each processor:

  • Time spent in user defined sections of the code.
  • Number of floating point operations and on-chip secondary data cache misses, using the hardware performance counters of the EV5 processor on the T3E.
From these measurements, the following data are computed and printed out:
  • Average time per complete time step for the I time step period.
  • Maximum, minimum, and average of the time spent by each processor in, and the number of calls to, user-defined code sections (in this paper: barotropic, baroclinic, ice, and wait-on-receive).
  • Average millions of floating point operations (MFLOP) per second (for each processor) for the last period and average over all previous periods.
  • Number of seconds that each processor spent waiting for data from memory (assuming a memory latency of 84 cycles per cache miss) for the last period and average over all previous periods.
All runs were performed on the 272-processor T3E-600 at NSC. The processor clock frequency is 300 MHz.
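A minimal sketch of this kind of per-section accounting is given below; the section names and block length are arbitrary examples, and the real tool additionally reads the hardware counters and gathers maximum, minimum, and average statistics across processors.

```python
import time
from collections import defaultdict

# Minimal sketch of per-section timing accumulated over blocks of I time steps.
# Section names and the block length are arbitrary examples; the real tool also
# reads hardware performance counters and gathers cross-processor statistics.
class SectionTimer:
    def __init__(self, report_every):
        self.report_every = report_every
        self.acc = defaultdict(float)
        self.steps = 0

    def add(self, section, seconds):
        self.acc[section] += seconds

    def end_step(self):
        self.steps += 1
        if self.steps % self.report_every == 0:
            avg = sum(self.acc.values()) / self.report_every
            print(f"avg time/step over {self.report_every} steps: {avg:.4f} s",
                  dict(self.acc))
            self.acc.clear()

timer = SectionTimer(report_every=2)
for _ in range(4):                               # stands in for the time stepping
    for section in ("baroclinic", "barotropic", "ice", "wait-on-receive"):
        t0 = time.perf_counter()
        time.sleep(0.001)                        # stands in for the real work
        timer.add(section, time.perf_counter() - t0)
    timer.end_step()
```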

6. Results of performance analysis

In order to analyze the performance of the RCO code, and especially to take a closer look at how ice influences performance, we have focused on the 3-yr simulation for 1986–89 using the 6-nm resolution model. This period has a large variation in ice extent from year to year (Fig. 2). Winter 1986/87 had a maximum ice extent about three times larger than that of 1988/89, while 1987/88 lies somewhere in between. Using the 2-nm resolution model, we have simulated a 1-yr period covering the fairly mild winter of 1992/93.

Figure 6 shows CPU time for various parts of the code as a function of time steps. For each full time step (involving one baroclinic and several barotropic and ice dynamic time steps) the CPU time for the baroclinic, barotropic, ice, and wait-on-receive part is measured for each processor. The CPU time is then plotted in two different ways: the average over all processors (Fig. 6a) and the maximum from all the processors (Fig. 6b).

In Fig. 6a, we can clearly distinguish the winter seasons from the peaks in computation time. In all runs the number of ice dynamic subcycles has been set to N = 40 (cf. Table 2), because no obvious differences from the case with N = 100, as recommended by Hunke and Zhang (1999), have been observed. As the computation costs for the ice dynamics are much higher than those for the ice thermodynamics, the total computation cost of ice depends approximately linearly on N (not shown). The additional computation time required for ice is about 1.5 times larger for the severe winter of 1986/87 than for the mild winter of 1988/89, which does not correlate well with the factor of 3 seen in the ice extent (Fig. 2). However, the variation of the average time spent in the ice routines correlates rather well with the ice extent curves. It is striking that the time spent in wait-on-receive mode is substantial and almost identical for the 3 yr. It is also fairly constant over the entire winter period. The decrease of the computation time in the ice routines in early spring is not followed by a similar drop in wait-on-receive until a sharp drop occurs when the ice has completely melted. Maximum CPU time is of special interest in ice situations, since we often have the case that at least one processor does not have any ice in its domain. The delay caused by load imbalance (discussed in section 4) should thus imply that the processor with no ice spends the same amount of time waiting (maximum time for wait-on-receive) as the CPU time the processor with the most ice needs for the ice dynamic calculations (maximum time for ice). Indeed, this is the case, as seen in Fig. 6b, in which there is a very close correlation between the amplitudes of the curves for ice and wait-on-receive.

Figure 6 can best be explained by a load balancing analysis. Even in a mild winter some ice occurs in the north. As the Baltic Sea extends mainly in the north–south direction, this means that a limited number of processors will be responsible for the ice computations. The rest will be idle, waiting for the ice part to be computed before moving on to the next time step. A mild winter will only produce enough ice to cover the area of a few processors. The more severe the winter, the farther south the ice extends and the more processors are involved. However, the total computation time will not increase, since these processors are now computing their own ice instead of waiting for others to finish. In Fig. 6b, the maximum time one processor spends on the ice calculations corresponds well with the maximum idle time of another processor.

The impact that the existence of ice has on the total computation time is illustrated in Table 3, which lists the computation time integrated over the different years. The ideal computation time for ice assumes perfect load balance and is the result of integrating the average time for ice only (Fig. 6a). Load imbalance is calculated from the ratio of the actual computation time to the sum of the computation times without ice and for ice only; a small worked example of these ratios is given after the following list. Table 3 shows the following.

  • The difference in computation time between a mild and a severe winter is merely 6%.
  • The total overhead of including ice (ratio of actual computation time and computation time without ice) is about 24%–31% depending on the severity of the winter (6 nm). In the 2-nm version, the overhead is similar and amounts to 21%.
  • The load imbalance is about 8% (severe winter) to 13% (mild winter) of the total computation time independent of the resolution.
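To make the two ratios concrete, the following small example evaluates them for purely illustrative timings chosen to resemble the mild-winter 6-nm case; the numbers are not taken from Table 3.

```python
# Purely illustrative timings, chosen to resemble the mild-winter 6-nm case;
# the numbers are not taken from Table 3.
t_no_ice = 100.0      # integrated computation time without ice
t_ice_ideal = 10.0    # ice time assuming perfect load balance (avg of Fig. 6a)
t_actual = 124.0      # measured total computation time

overhead = t_actual / t_no_ice - 1.0                    # total cost of including ice
imbalance = t_actual / (t_no_ice + t_ice_ideal) - 1.0   # part due to load imbalance
print(f"overhead {overhead:.0%}, load imbalance {imbalance:.0%}")   # 24%, 13%
```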
In order to analyze parallel performance, we have computed the parallel speedup versus the number of processors for the 2-nm resolution (Fig. 7). The number of processors ranges between 9 and 128, since simulations using fewer than nine processors require more memory per processor than is available on the T3E-600 used. Speedup was computed for days with ice as well as for days without ice. The data for days with ice are from early March 1993, just after the occurrence of maximum ice extent for the winter of 1992/93 (Fig. 3). The same amount of work was done in all runs; hence, as the processor number increased, the size of the domain assigned to each processor decreased. The parallel speedup is quite impressive, especially for days without ice, for which 128 processors are more than 11 times faster than 9 processors, corresponding to a parallel efficiency of 0.78. Parallel efficiency is calculated here as the ratio of actual to ideal speedup. With ice, the entire code has a parallel efficiency of 0.55; the ice dynamics alone have a parallel efficiency of 0.37.
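For reference, the efficiency figures follow directly from this definition; the value 11.1 below is only an illustrative stand-in for "more than 11 times faster".

```python
# Parallel efficiency = actual speedup / ideal speedup, with speedup measured
# relative to the 9-processor run. The value 11.1 is only an illustrative
# stand-in for "more than 11 times faster".
n_procs, baseline = 128, 9
ideal_speedup = n_procs / baseline            # about 14.2
actual_speedup_no_ice = 11.1
print(f"efficiency without ice ~ {actual_speedup_no_ice / ideal_speedup:.2f}")  # ~0.78
```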

Although ice creates significant load imbalance, it does not prevent substantial speedup even at large processor counts. This is very much a function of how the processor maps are generated. As long as an increase in the total number of processors generates a sufficient decrease in the maximum ice extent on any single processor, speedup can be obtained.

Aggregate processor performance ranges from 0.33 GFLOP (10⁹ FLOP) per second (9 processors) to 3.63 GFLOP per second (128 processors) for runs without ice using the weight ratio β/α = 0.1 (Table 4). No “empty” flops (operations on zeros) are performed, since only water grid points are calculated.

We noticed that the increased memory bandwidth available through the T3E STREAMS hardware prefetching buffers had no impact on performance. This came as a surprise, since the number of secondary data cache misses is very high, indicating that the EV5 processor spends significant time waiting for data from memory.

Further analysis of the per-processor performance (not discussed in this paper) showed that a major limiting factor is the lack of compact storage of the model variables in memory. Even though the model supports arbitrarily irregular grids, in which only sea points are considered, the memory allocation uses the same array sizes for all slaves. The dimensions of the arrays are based on the maximum number (over all processors) of grid points in longitude, latitude, and depth. The highly irregular topography of the Baltic Sea produces processor maps with a large variation in all three dimensions, resulting in much larger memory regions being allocated than are actually used. This in turn leads to poor cache and memory usage.

7. Summary and conclusions

A coupled ice–ocean model for the Baltic Sea has been developed, and its parallel processor performance on a CRAY T3E-600 has been analyzed systematically using a new performance analysis tool. This tool has also been utilized to optimize the code during model development. For example, the number of subcycles of the ice dynamic part has been chosen so that viscous–plastic ice behavior is approximated sufficiently well at optimal performance.

In addition, a new processor map generator has been implemented to guarantee the best possible workload balance and to minimize communication between processors. Multiyear simulations of the 6-nm version using the modified processor maps are about 25% faster. It has been explained why even more sophisticated processor maps did not result in further significant improvements. The inclusion of ice in the model has a very dramatic effect on the load balance, since in a mild winter ice can be present in only a small part of the computational grid.

In order to analyze the impact of ice on performance, we have done a series of runs during both mild and severe winters. The total computational overhead for the entire model including ice is around 24% for a mild winter and 31% for a severe winter. We found that a mild winter has much more load imbalance than a severe winter. Fortunately, the total load imbalance is not more than 13% for a mild winter and about 8% for a severe winter. Although the load imbalance increases with the number of processors involved, we could show that there is still substantial speedup, even for 128 processors, in the worst case of partial ice cover as shown in Fig. 3.

Parallelization strategies using different processor maps for the computation of the baroclinic, barotropic, and sea ice parts have not been explored because of the expected message-passing overhead resulting from frequent redistribution of data between the processors.

Further investigations to reduce the residual workload imbalance are not our primary concern, because it is not planned within SWECLIM to run the model on more than 128 processors. Instead, future work should concentrate on single-node performance to improve cache and memory usage. Nevertheless, the evaluation shows that RCO is a fast, state-of-the-art coupled ice–ocean model for the Baltic Sea that is suitable for climate studies and makes satisfactory use of the available computer resources.

Acknowledgments

The SWECLIM program and the Rossby Centre are funded by MISTRA (Foundation for Strategic Environmental Research) and by SMHI. The testing and running of RCO has been done on the CRAY T3E-600 at NSC. The setup of RCO and the ongoing work within SWECLIM are supported by the OCCAM core team in Southampton, especially by Andrew Coward. The program for the calculation of the improved processor maps and a MATLAB visualization tool have been written by Jarmo Rantakokko. A first version of the partitioning software package has been developed for HIROMB. Very helpful comments on an earlier draft were made by René Redler.

REFERENCES

  • Backhaus, J. O., 1983: A semi-implicit scheme for the shallow water equations for application to shelf sea modeling. Cont. Shelf Res., 2, 234–254.
  • Backhaus, J. O., 1985: A three-dimensional model for the simulation of shelf-sea dynamics. Dtsch. Hydrogr. Z., 38, 165–187.
  • Beare, M. I., and D. P. Stevens, 1997: Optimisation of a parallel ocean general circulation model. Ann. Geophys., 15, 1369–1377.
  • Bergström, S., and B. Carlsson, 1994: River runoff to the Baltic Sea: 1950–1990. Ambio, 23, 280–287.
  • Berntsen, J., 2000: Users guide for a modesplit σ-coordinate numerical ocean model. Department of Applied Mathematics, University of Bergen, Tech. Rep. 135, 48 pp.
  • Bleck, R., and D. Boudra, 1981: Initial testing of a numerical ocean circulation model using a hybrid (quasi-isopycnic) vertical coordinate. J. Phys. Oceanogr., 11, 755–770.
  • Bleck, R., H. P. Hanson, D. Hu, and E. B. Kraus, 1989: Mixed layer/thermocline interaction in a three-dimensional isopycnal model. J. Phys. Oceanogr., 19, 1417–1439.
  • Bleck, R., C. Rooth, D. Hu, and L. T. Smith, 1992: Salinity-driven transients in a wind- and thermohaline-forced isopycnic coordinate model of the North Atlantic. J. Phys. Oceanogr., 22, 1486–1505.
  • Bleck, R., S. Dean, M. O'Keefe, and A. Sawdey, 1995: A comparison of data-parallel and message-passing versions of the Miami Isopycnic Coordinate Ocean Model (MICOM). Parallel Comput., 21, 1695–1720.
  • Blumberg, A. F., and G. L. Mellor, 1987: A description of a three-dimensional coastal ocean circulation model. Three-Dimensional Coastal Ocean Models, N. S. Heaps, Ed., Coastal Estuarine Series, Vol. 4, American Geophysical Union, 208 pp.
  • Bryan, K., 1969: A numerical method for the study of the circulation of the World Ocean. J. Comput. Phys., 4, 347–376.
  • Bryan, K., and M. D. Cox, 1972: An approximate equation of state for numerical models of ocean circulation. J. Phys. Oceanogr., 2, 510–514.
  • Cox, M. D., 1984: A primitive equation 3-dimensional model of the ocean. Geophysical Fluid Dynamics Laboratory Ocean Group Tech. Rep. 1, Princeton University, 141 pp.
  • Craig, P. D., and M. L. Banner, 1994: Modeling wave-enhanced turbulence in the ocean surface layer. J. Phys. Oceanogr., 24, 2546–2559.
  • de Szoeke, R. A., 2000: Equations of motion using thermodynamic coordinates. J. Phys. Oceanogr., 30, 2814–2829.
  • de Szoeke, R. A., S. R. Springer, and D. M. Oxilia, 2000: Orthobaric density: A thermodynamic variable for ocean circulation studies. J. Phys. Oceanogr., 30, 2830–2852.
  • Dukowicz, J. K., R. D. Smith, and R. C. Malone, 1993: A reformulation and implementation of the Bryan–Cox–Semtner Ocean Model on the connection machine. J. Atmos. Oceanic Technol., 10, 195–208.
  • Eigenheer, A., and H. Dahlin, 1998: Quality assessment of the High Resolution Operational Model for the Baltic Sea—HIROMB. ICES Statutory Meeting, Lisbon, Portugal, ICES, 13 pp.
  • Griffies, S. M., C. Böning, F. O. Bryan, E. P. Chassignet, R. Gerdes, H. Hasumi, A. Hirst, A.-M. Treguier, and D. Webb, 2000: Developments in ocean climate modelling. Ocean Modelling, 2, 123–192.
  • Haidvogel, D. B., J. L. Wilkin, and R. E. Young, 1991: A semi-spectral primitive equation ocean circulation model using vertical sigma and orthogonal curvilinear horizontal coordinates. J. Comput. Phys., 94, 151–185.
  • Hallberg, R., 1997: HIM: The Hallberg isopycnal coordinate primitive equation model. NOAA GFDL Tech. Rep., Princeton University, 39 pp.
  • Hasumi, H., 2000: CCSR Ocean Component Model (COCO) version 2.1. CCSR Report 13.
  • Hibler, W. D., 1979: A dynamic thermodynamic sea ice model. J. Phys. Oceanogr., 9, 817–846.
  • Hunke, E. C., and J. K. Dukowicz, 1997: An elastic-viscous-plastic model for sea ice dynamics. J. Phys. Oceanogr., 27, 1849–1867.
  • Hunke, E. C., and Y. Zhang, 1999: A comparison of sea ice dynamics models at high resolution. Mon. Wea. Rev., 127, 396–408.
  • Karypis, G., and V. Kumar, 1995: METIS—Unstructured graph partitioning and sparse matrix ordering system, version 2.0. Department of Computer Science Tech. Rep., University of Minnesota, 16 pp.
  • Killworth, P. D., D. Stainforth, D. J. Webb, and S. M. Paterson, 1991: The development of a free-surface Bryan–Cox–Semtner ocean model. J. Phys. Oceanogr., 21, 1333–1348.
  • Madec, G., P. Delecluse, M. Imbard, and C. Lévy, 1998: OPA 8.1 Ocean General Circulation Model reference manual. Note du Pôle de modélisation XX, Institut Pierre-Simon Laplace, France, 91 pp.
  • Marshall, J., C. Hill, L. Perelman, and A. Adcroft, 1997a: Hydrostatic, quasi-hydrostatic, and nonhydrostatic ocean modeling. J. Geophys. Res., 102, 5733–5752.
  • Marshall, J., A. Adcroft, C. Hill, L. Perelman, and C. Heisey, 1997b: A finite-volume, incompressible Navier Stokes model for studies of the ocean on parallel computers. J. Geophys. Res., 102, 5753–5766.
  • Meier, H. E. M., 1999: First results of multi-year simulations using a 3D Baltic Sea model. Swedish Meteorological and Hydrological Institute Reports Oceanography 27, 48 pp.
  • Meier, H. E. M., 2000: The use of the k–ε turbulence model within the Rossby Centre regional ocean climate model: Parameterization development and results. Swedish Meteorological and Hydrological Institute Reports Oceanography 28, 81 pp.
  • Meier, H. E. M., R. Döscher, A. C. Coward, J. Nycander, and K. Döös, 1999: RCO—Rossby Centre regional Ocean climate model: Model description (version 1.0) and first results from the hindcast period 1992/93. Swedish Meteorological and Hydrological Institute Reports Oceanography 26, 102 pp.
  • Mesinger, F., and A. Arakawa, 1976: Numerical Methods Used in Atmospheric Models. GARP Publications Series, Vol. 1, World Meteorological Organisation, 64 pp.
  • Oberhuber, J. M., 1993a: The OPYC ocean general circulation model—The description of a coupled snow, sea-ice, mixed layer and isopycnal ocean model. Deutsches Klimarechenzentrum Tech. Rep. 7, 130 pp.
  • Oberhuber, J. M., 1993b: Simulation of the Atlantic circulation with a coupled sea ice–mixed-layer–isopycnal general circulation model. Part I: Model description. J. Phys. Oceanogr., 23, 808–829.
  • Oberpriller, W. D., A. Sawdey, M. T. O'Keefe, and S. Gao, 1999: Parallelizing the Princeton Ocean Model using TOPAZ. Parallel Computer Systems Laboratory, Department of Electrical and Computer Engineering Tech. Rep., University of Minnesota, 21 pp.
  • Orlanski, I., 1976: A simple boundary condition for unbounded hyperbolic flows. J. Comput. Phys., 21, 251–269.
  • Pacanowski, R. C., and S. M. Griffies, 2000: MOM 3.0 Manual (draft). NOAA Geophysical Fluid Dynamics Laboratory, Princeton University, 680 pp.
  • Paulson, C. A., and J. J. Simpson, 1977: Irradiance measurements in the upper ocean. J. Phys. Oceanogr., 7, 952–956.
  • Rantakokko, J., 1998: A framework for partitioning structured grids with inhomogeneous workload. Parallel Algorithms Appl., 13, 135–152.
  • Rodi, W., 1980: Turbulence Models and their Application in Hydraulics—A State-of-the-Art Review. International Association for Hydraulic Research, 104 pp.
  • Seifert, T., and B. Kayser, 1995: A high resolution spherical grid topography of the Baltic Sea. Meereswiss. Ber., Warnemünde, 9, 73–88.
  • Semtner, A. J., 1974: A general circulation model for the World Ocean. Department of Meteorology Tech. Rep. 9, University of California, Los Angeles, 99 pp.
  • Semtner, A. J., 1976: A model for the thermodynamic growth of sea ice in numerical investigations of climate. J. Phys. Oceanogr., 6, 379–389.
  • Semtner, A. J., and R. M. Chervin, 1992: Ocean general circulation from a global eddy-resolving model. J. Geophys. Res., 97, 5493–5550.
  • Smith, R. D., J. K. Dukowicz, and R. C. Malone, 1992: Parallel ocean general circulation modeling. Physica D, 60, 38–61.
  • Song, Y., and D. B. Haidvogel, 1994: A semi-implicit ocean circulation model using a generalized topography-following coordinate system. J. Comput. Phys., 115, 228–244.
  • Stammer, D., R. Tokmakian, A. J. Semtner, and C. Wunsch, 1996: How well does a 1/4° global circulation model simulate large-scale oceanic observations? J. Geophys. Res., 101, 25779–25811.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Stevens, D. P., 1990: On open boundary conditions for three dimensional primitive equation ocean circulation models. Geophys. Astrophys. Fluid Dyn, 51 , 103133.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Stevens, D. P., 1991: The open boundary condition in the United Kingdom fine-resolution Antarctic model. J. Phys. Oceanogr, 21 , 14941499.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Stigebrandt, A., 1987: A model of the vertical circulation of the Baltic deep water. J. Phys. Oceanogr, 17 , 17721785.

  • Svensson, U., 1978: A mathematical model of the seasonal thermocline. Department of Water Resources and Engineering Report 1002, University of Lund, Lund, Sweden, 187 pp.

    • Search Google Scholar
    • Export Citation
  • UNESCO, 1981: Tenth report of the joint panel on oceanographic tables and standards. UNESCO Technical Papers in Marine Science 36, UNESCO, 25 pp.

    • Search Google Scholar
    • Export Citation
  • Washington, W. M., and and Coauthors, 2000: Parallel climate model (PCM) control and transient simulations. Climate Dyn, 16 , 755774.

  • Webb, D. J., 1995: The vertical advection of momentum in Bryan-Cox-Semtner Ocean General Circulation models. J. Phys. Oceanogr, 25 , 31863195.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Webb, D. J., 1996: An ocean model code for array processor computers. Comput. Geophys, 22 , 569578.

  • Webb, D. J., , Coward A. C. , , de Cuevas B. A. , , and Gwilliam C. S. , 1997: A multiprocessor ocean circulation model using message passing. J. Atmos. Oceanic Technol, 14 , 175183.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Webb, D. J., , de Cuevas B. A. , , and Richmond C. S. , 1998: Improved advection schemes for ocean models. J. Atmos. Oceanic Technol, 15 , 11711187.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Wilhelmsson, T., , and Schüle J. , 1999: Running an operational Baltic Sea model on the T3E. Proc. Fifth European SGI/CRAY MPP Workshop, CINECA, Bologna, Italy, 10 pp.

    • Search Google Scholar
    • Export Citation
  • Zhang, Y., , and Hunke E. C. , 2001: Recent Arctic sea ice change simulated with a coupled ice-ocean model. J. Geophys. Res, 106 , 43694390.

Fig. 1. Bottom topography of the Baltic Sea, including the Kattegat and Skagerrak [data from Seifert and Kayser (1995)]. The RCO model domain is bounded by open boundaries in the northern Kattegat (dashed line).

Fig. 2. Simulated ice-covered area (10⁹ m²) for the period Jul 1980 to Jun 1993. Tick marks denote 1 Jul of the corresponding year. In this simulation RCO has a 6-nm horizontal resolution. Squares denote the observed maximum ice extent. The dashed line denotes the Baltic Sea surface area including the Kattegat (420 560 km²).
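
The ice-covered area plotted in Fig. 2 is a simple diagnostic of the simulated ice field. The sketch below is illustrative only, not the RCO diagnostic code: the array names and the 10% concentration threshold are assumptions.

```python
import numpy as np

def ice_covered_area(concentration, cell_area, threshold=0.1):
    """Total area (m^2) of grid cells whose ice concentration exceeds a threshold.

    concentration : 2D array of ice fraction (0-1), land points masked as NaN
    cell_area     : 2D array of grid-cell areas (m^2)
    threshold     : concentration above which a cell counts as ice covered
    """
    covered = np.nan_to_num(concentration, nan=0.0) > threshold
    return float(np.sum(cell_area[covered]))

# Convert to the units of Fig. 2 (10^9 m^2), e.g.:
# area_1e9_m2 = ice_covered_area(aice, dxdy) / 1.0e9
```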

Fig. 3. Simulated ice thickness (in cm) on (a) 25 Feb 1993 and (b) 16 Mar 1987. In this simulation RCO has a 2-nm horizontal resolution.

Fig. 4. Processor map for 16 processors (16 slaves and 1 master), with bounding boxes. The largest box is highlighted by a thick frame and defines the total memory requirement. The horizontal and vertical axes give grid-point indices.
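
The memory argument behind Fig. 4 can be made concrete: if each slave allocates its arrays for the axis-aligned bounding box of the grid points it owns, the largest box (thick frame) sets the per-processor array size. The sketch below is illustrative only and is not the RCO partitioner; the owner array and function names are hypothetical.

```python
import numpy as np

def bounding_boxes(owner, n_procs):
    """Axis-aligned bounding box (i0, i1, j0, j1) of each processor's grid points.

    owner : 2D int array; owner[i, j] is the processor id of a wet point, -1 for land.
    """
    boxes = {}
    for p in range(n_procs):
        ii, jj = np.where(owner == p)
        if ii.size:
            boxes[p] = (ii.min(), ii.max(), jj.min(), jj.max())
    return boxes

def max_box_size(boxes):
    """Number of grid columns in the largest bounding box, i.e. the array size
    every processor must be able to hold (the thick frame in Fig. 4)."""
    return max((i1 - i0 + 1) * (j1 - j0 + 1) for i0, i1, j0, j1 in boxes.values())
```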

Fig. 5. CPU time per time step as a function of the weight ratio, with sea ice (dashed) and without sea ice (solid). The results are calculated with 32 processors using 2-nm horizontal resolution.
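
Figure 5 varies the weight ratio used to balance the extra cost of ice-covered grid points against open water: expensive columns are given a larger weight before the grid is split into pieces of roughly equal total weight. The sketch below illustrates the idea with a simple 1D strip partition by cumulative weight; it is not the partitioning scheme actually used in RCO, and the weight_ratio parameter and column costs are assumptions.

```python
import numpy as np

def weighted_strip_partition(is_ice_column, n_procs, weight_ratio=2.0):
    """Split a 1D sequence of water columns into n_procs contiguous strips
    of approximately equal total weight.

    is_ice_column : 1D bool array, True where sea ice (extra work) is expected
    weight_ratio  : cost of an ice column relative to an open-water column
    Returns the index at which each strip starts.
    """
    weights = np.where(is_ice_column, weight_ratio, 1.0)
    cum = np.cumsum(weights)
    targets = cum[-1] * np.arange(1, n_procs) / n_procs
    # first index at which the cumulative weight reaches each target
    splits = np.searchsorted(cum, targets)
    return np.concatenate(([0], splits))
```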

Fig. 6. (a) Average and (b) maximum over all processors of the CPU time per time step spent on the whole time step (solid), wait-on-receive (dotted), the baroclinic part (dashed), the barotropic part (dash-dotted), and the ice part (dash-dotted with three dots). The results are calculated with 16 processors using 6-nm horizontal resolution. The period is from 1986 to 1989, starting on 2 Dec 1986.
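
The curves in Fig. 6 are per-part timings reduced over the slave processors; the gap between the maximum and the average is a direct measure of load imbalance. A minimal sketch of such a reduction, assuming the per-processor, per-part timings have already been gathered into a single array (hypothetical layout):

```python
import numpy as np

def timing_summary(times):
    """Average and maximum over processors of the CPU time per part.

    times : array of shape (n_procs, n_parts); times[p, k] is the CPU time per
            time step that processor p spent in part k (barotropic, baroclinic,
            ice, wait-on-receive, ...).
    Returns (average, maximum, imbalance) with imbalance = maximum / average.
    """
    avg = times.mean(axis=0)
    mx = times.max(axis=0)
    return avg, mx, mx / avg
```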

Fig. 7. Relative performance of RCO with ice (dashed) and without ice (dotted), together with the ideal speedup (solid). Speedup is calculated relative to 9 slave processors; values are determined for 9, 16, 32, 63, and 128 processors.
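
The speedup in Fig. 7 is relative: with T(N) the measured time per time step on N slave processors, the curves plot S(N) = T(9)/T(N), and the ideal line is N/9. A worked sketch with placeholder timings (the numbers below are illustrative, not the measured RCO values):

```python
# Relative speedup as used in Fig. 7: S(N) = T(N_ref) / T(N), ideal = N / N_ref.
slaves = [9, 16, 32, 63, 128]
time_per_step = {9: 1.00, 16: 0.58, 32: 0.31, 63: 0.18, 128: 0.11}  # hypothetical

n_ref = 9
for n in slaves:
    speedup = time_per_step[n_ref] / time_per_step[n]
    ideal = n / n_ref
    print(f"{n:4d} slaves: speedup {speedup:5.2f} (ideal {ideal:5.2f})")
```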

Table 1. Selected list of parallel ocean model codes, including documentation sources. An overview of ocean models currently developed and supported for climate-related studies is given by Griffies et al. (2000).
Table 2. Time steps used in RCO for two different horizontal resolutions.
Table 3. Computation time (s) for three 6-nm runs using 16 processors for the years 1986–89 and for one 2-nm run using 32 processors for 1992/93.
Table 4. Aggregate processor performance in 10⁹ floating-point operations per second.

1

The Norwegian MICOM application uses OpenMP directives, and the ice dynamics runs on only one processor (H. Drange 2001, personal communication).

2

OPYC is no longer supported at the Max Planck Institute in Hamburg (L. Bengtsson 2001, personal communication).

3

For research purposes, the RCO code and utility programs can be obtained on CD-ROM from the corresponding author.
