The Second Workshop on Coupling Technologies for Earth System Models (CW2013) continued the momentum of the first workshop held two years prior (Valcke and Dunlap 2011a, b) by bringing together researchers and practitioners from over 20 institutions across the globe. Coupling technologies are software packages designed to instantiate a system of interacting components by offering support for parallel data exchange, repartitioning of distributed data structures across processor sets, grid interpolation and remapping, control flow management, and other utilities. The goals of the workshop were to update participants on recent developments in coupling technologies, share experiences deploying existing coupling infrastructure into new contexts, discuss common challenges and goals, and identify possible avenues for community convergence via software interoperability, benchmarking, and sharing infrastructure. The three-day workshop included 31 talks in five major sessions: coupling technology overviews and recent developments, experience reports, interoperability, benchmarking, and performance of new and emerging architectures. After hearing all talks, participants discussed common themes that emerged during the workshop and identified possible next steps. The workshop program, including presentation slides and abstracts, is available on the CW2013 wiki (https://wiki.cc.gatech.edu/CW2013).
What: A total of 50 participants from the Earth system modeling communities of North America, Europe, and Asia discussed new technologies and implementation challenges for coupling software.
When: 20–22 February 2013
Where: Boulder, Colorado
The talks during the coupling technology overviews and recent developments session described design decisions of existing coupling technologies and provided an update on new and emerging features. The technologies covered included CPL7; the Model Coupling Toolkit (MCT); Ocean Atmosphere Sea Ice Soil, version 3 (OASIS3-MCT); Yet Another Coupler (YAC); OpenPALM; the Earth System Modeling Framework (ESMF) and its National Unified Operational Prediction Capability (NUOPC) layer; the Bespoke Framework Generator (BFG); the Community Surface Dynamics Modeling System (CSDMS); the Mesh-Based Parallel Code Coupling Interface (MpCCI); the Multidimensional Common Remapping Software for Earth System Models (CoR); and the C-Coupler.
The talks in the experience reports session described particular deployments of coupling technologies and identified key challenges that require future research. The talks covered various implementations based on the OASIS series of couplers, ESMF-based models, and couplings with the Community Earth System Model (CESM) CPL7. Experiences with OASIS included coupling of the Action de Recherche Petite Echelle Grande Echelle (ARPEGE) atmosphere model to the Nucleus for European Modeling of the Ocean (NEMO) ocean model with a high-resolution regional grid, the evolution of the Met Office's Unified Model (UM) from a single executable with customized coupling code to a multiple-executable architecture with a combined atmosphere–chemistry–land model and a combined ocean–ice model, the multiscale NEMO–Weather Research and Forecasting (WRF) coupling, and the EC-Earth Consortium (EC-EARTH3) coupling. Experiences implementing the ESMF NUOPC abstractions inside U.S. Navy Earth system models (ESMs) were also reported, although they were presented during the interoperability session. Other coupling architectures were presented, such as integration of the NEMO ocean model into the CESM CPL7 architecture.
Nearly all talks identified current and expected technical challenges involved with deploying coupling technologies. Work is currently being done to enhance performance of existing deployments via improved load balancing and use of separate I/O servers. Many challenges arise because of the need to couple an increasing number of constituents, to support diverse scientific objectives, and to maintain multiple configurations. Several speakers noted a desire for increased abstraction, improved model interfaces, a reduction in bespoke coupling code, and overall architectural unity.
The interoperability session included a diverse set of talks about ESM architectures and how constituent models and coupling technologies interact. Members of the ESMF team described several interoperability efforts, including the use of web services to enable interoperability with hydrology models executing on external platforms, and NUOPC, an abstraction layer that defines a set of conventions to facilitate interoperability within a common model architecture. Options for interconnecting frameworks were identified, including functional split, external specification, cross-registration, wrapping, and codevelopment. One talk recognized the long-term limitations of the current scientific culture in which constituent models tend to be owned by a “top model” instead of retaining a truly independent identity. It was argued that this paradigm limits interoperability because the top model then dictates the development processes of other models that need to interact. Other important topics include using visual representations to facilitate comparison of climate model architectures and the use of domain-specific languages and compilers to increase modeler productivity through code generation.
The interoperability session included a plenary roundtable discussion. Major discussion points included the importance of the modularization of component models as a precursor to interoperability; the role of rules and conventions to enable interoperability, such as precisely defining the behavior of a model initialization procedure; the impact of implicit coupling across component boundaries, which requires sequential execution of parts of different component models; the need for common metadata formalisms; and how the existence of multiple abstract data types—for example, for defining grid structures—can impede interoperability.
The benchmarking session began with a talk describing the benefits of benchmarking efforts on scientific communities in general, linked especially to community convergence on the identification of a set of agreed-upon and worthwhile problems. Members of the Met Office then described a planned rewrite of the UM to enable scaling to 10^5 cores and to replace the current latitude–longitude grid structure with a globally quasi-uniform grid. The team desires to build upon state-of-the-art coupling technologies, and their requirements were considered as potential features for developing functional benchmarks for couplers and framework systems.
With the background on the advantages of benchmarking and the set of representative requirements derived from the UM rewrite, the workshop participants divided into three groups to discuss the development of a coupling technology benchmark. Discussions were driven by two sets of questions:
What are the scientific and technical requirements, including functional (data exchange, regridding, etc.) and nonfunctional (performance, flexibility, etc.) aspects, to build a geophysical coupled system from independent models?
What are the qualities that should be assessed in a coupling technologies benchmark and how should those qualities be measured? As a community, how can we progress in the realization of such a benchmark? What existing resources can we leverage to bootstrap the development of a community benchmark?
Results of the discussions are briefly summarized in the conclusions.
The final session, performance of new and emerging architectures, featured talks focused on I/O (which can be viewed as a type of coupling), scaling to ultra-high resolutions, and performance modeling. Two I/O libraries were presented: the Parallel I/O library (PIO) and the XML I/O Server (XIOS). I/O remains a significant performance bottleneck; potential solutions include asynchronous I/O, processors dedicated to I/O, parallel data compression, and on-the-fly postprocessing. To illustrate the challenges of component scalability, the architecture of the Model for Prediction Across Scales-Atmosphere (MPAS-A) nonhydrostatic atmosphere model was presented, including its unique centroidal Voronoi tessellation grid structure, its decomposition and parallel communication schemes, and strategies for minimizing communication via shared halo exchanges, improved assignment of decomposition blocks to Message Passing Interface (MPI) tasks, and improved overlap of computation and halo exchanges. In another presentation, some early work was discussed on using performance models to improve locality between the land and atmosphere data decompositions in CPL7. Finally, work was presented on a configuration of CESM for high-resolution runs using over 23,000 cores; challenges include scaling issues due to MPI communication bottlenecks, suboptimal partitioning of components, suboptimal scaling of CPL7, and I/O overhead.
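The idea of asynchronous I/O with dedicated I/O resources can be illustrated with a minimal sketch. It assumes a toy model in which a Python thread stands in for the processors reserved for I/O and a list stands in for the file system; the names `io_server` and `run_model` are hypothetical, and this is not the PIO or XIOS interface.

```python
import queue
import threading


def io_server(inbox, sink):
    """Dedicated writer: drains the queue until it receives the None sentinel."""
    while True:
        item = inbox.get()
        if item is None:
            break
        step, field = item
        # Stand-in for a slow write, with a simple on-the-fly reduction (the mean).
        sink.append((step, sum(field) / len(field)))


def run_model(n_steps):
    """Toy model loop that hands output to the writer and keeps computing."""
    inbox = queue.Queue()
    sink = []  # stands in for the output file
    writer = threading.Thread(target=io_server, args=(inbox, sink))
    writer.start()

    field = [0.0, 1.0, 2.0]
    for step in range(n_steps):
        field = [x + 1.0 for x in field]  # "compute" phase
        inbox.put((step, list(field)))    # hand off a copy; do not wait for the write

    inbox.put(None)  # tell the writer that no more output is coming
    writer.join()
    return sink


print(run_model(2))
```

The model only pays the cost of copying the field into the queue; the write itself proceeds concurrently. In a production setting the hand-off would typically be an MPI message to separate ranks rather than an in-process queue.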
Multiple coupling strategies exist; each has advantages and disadvantages, and each supports different use cases. The prominent strategies fall into two categories: those that include a driver with explicit calls to constituent model interfaces, and those that feature distributed control and synchronize via specialized communication calls. The first strategy typically involves compiling all constituent models into a single binary, while the second strategy allows models to be coupled as separate binaries. Because they address different needs, both strategies should be maintained and further developed for the foreseeable future.
The integrated strategy, with a top-level driver, enables sharing data via memory since multiple models can share an address space and also offers increased flexibility to execute constituents sequentially, concurrently, or in a hybrid mode. The explicitness of the execution schedule may improve model developers' ability to understand the overall sequencing of the constituents and can facilitate debugging. The use of a driver for calling constituent models gives impetus to define explicit interfaces and improve code modularity.
The distributed control strategy allows binary independence of the constituent models and allows asynchronous communication calls to be flexibly placed in models with minimal intrusion into the coding architecture. However, executing constituents sequentially in a common address space is not possible—a limitation that could reduce efficiency. OASIS, which uses the distributed control approach, is currently used with only two to three constituent models at most institutions, and it is unknown how well this approach will scale to a larger number of constituents.
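The contrast between the two strategies can be made concrete with a small sketch. Everything in it is an assumption made for illustration: `ToyModel` is a hypothetical constituent that relaxes toward the value it receives, threads stand in for separate executables, and in-process queues stand in for the communication layer. It does not reproduce the API of OASIS, ESMF, or CPL7.

```python
import queue
import threading


class ToyModel:
    """Hypothetical constituent model that relaxes toward a forcing value."""

    def __init__(self, name, state):
        self.name = name
        self.state = state

    def step(self, forcing):
        # Move halfway toward the value received from the partner model.
        self.state += 0.5 * (forcing - self.state)


def run_with_driver(n_steps):
    """Integrated strategy: a top-level driver calls each model explicitly."""
    atm, ocn = ToyModel("atm", 10.0), ToyModel("ocn", 0.0)
    for _ in range(n_steps):
        # Fields are shared through memory; the schedule is visible in one place.
        atm_out, ocn_out = atm.state, ocn.state
        atm.step(ocn_out)
        ocn.step(atm_out)
    return atm.state, ocn.state


def run_distributed(n_steps):
    """Distributed control: each model owns its time loop and synchronizes
    only through put/get-style communication calls."""
    to_atm, to_ocn = queue.Queue(), queue.Queue()
    results = {}

    def model_main(model, outbox, inbox):
        for _ in range(n_steps):
            outbox.put(model.state)  # "put": send the coupling field
            forcing = inbox.get()    # "get": block until the partner's field arrives
            model.step(forcing)
        results[model.name] = model.state

    atm, ocn = ToyModel("atm", 10.0), ToyModel("ocn", 0.0)
    threads = [
        threading.Thread(target=model_main, args=(atm, to_ocn, to_atm)),
        threading.Thread(target=model_main, args=(ocn, to_atm, to_ocn)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["atm"], results["ocn"]


print(run_with_driver(3), run_distributed(3))
```

Both functions produce the same coupled result. In the driver version the execution order is readable in a single loop, whereas in the distributed version it is implied by where each model places its put and get calls.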
Work should be done to compare and optimize both approaches. Generative technologies like BFG can unify the two apparently disparate approaches through the use of code generators that can produce custom framework code. Other loosely coupled approaches are also emerging, such as the ability to access ESMF components as a web service. This may be particularly beneficial for integrating models from different communities with different drivers and constraints, such as the ESM and land surface communities. The heterogeneous computing community may have solutions to offer, especially in cases where it is not necessary to compile everything into a single executable, or even have the constituent models executing on the same platform.
The community identified multiple motivations for interoperability. There were discussions about what software modules should interoperate. Coarse-grained interoperability such as the ability to select from a number of compatible model components is already supported by some ESMs, but no solution provides completely automated interoperability. Further, fine-grained interoperability of parts of a model, such as the ability to share physics parameterizations between models, is also of interest. However, even if it is technically possible to design mechanisms for “plug and play,” the technical compatibility of constituent models does not guarantee the scientific validity of the resulting coupled model.
There are some key challenges that encumber interoperability. In porting the MPAS atmosphere dynamical core into the Community Atmosphere Model (CAM), one developer stated that the most difficult part was understanding the CAM source code in order to determine how to integrate the two systems. Mechanisms that promote program understanding and improve code readability can contribute to interoperability by reducing the implementation burden when manual changes are required. The success of integrating NEMO into the CESM CPL7 architecture, despite their different approaches to coupling, shows that convergence on model interface definitions reduces the development burden when integrating new components into an existing system.
There are benefits to sharing the infrastructure libraries that all coupled models require regardless of the overall architecture, such as data exchange, interpolation, and regridding functions. Interoperability among infrastructure libraries is reduced owing to the use of heterogeneous abstract data types that represent grid structures and physical fields. Lack of common abstract data types is a key reason why many groups have developed custom functions instead of using existing libraries. Options for dealing with abstract data-type inconsistencies include translating among them (which may require data and metadata copies and hence a performance penalty), standardizing abstract types across models, or eliminating abstract types by creating coupling operators that work over Fortran primitive types.
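The last option, a coupling operator that works over primitives, might look like the following sketch. It assumes that interpolation weights have already been generated and are stored as three flat arrays (destination index, source index, weight), so that regridding reduces to a sparse matrix-vector product. The function name `apply_weights` and the example grid are hypothetical, and Python lists stand in for plain Fortran arrays.

```python
def apply_weights(src_field, rows, cols, weights, n_dst):
    """Regrid src_field onto n_dst destination cells.

    rows[k], cols[k], weights[k] describe one nonzero entry of the sparse
    interpolation matrix: destination cell, source cell, and weight.
    Only flat arrays of integers and floats cross the interface.
    """
    dst_field = [0.0] * n_dst
    for r, c, w in zip(rows, cols, weights):
        dst_field[r] += w * src_field[c]
    return dst_field


# Example: four source cells averaged pairwise onto two destination cells.
rows = [0, 0, 1, 1]
cols = [0, 1, 2, 3]
weights = [0.5, 0.5, 0.5, 0.5]
src = [1.0, 3.0, 5.0, 7.0]

print(apply_weights(src, rows, cols, weights, n_dst=2))  # [2.0, 6.0]
```

Because no grid or field object crosses the interface, any model that can expose its data as flat arrays could call such an operator. The trade-off is that metadata such as units and grid identity must be tracked by the caller.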
Modeler productivity is tied to the complexity of the build process. In some cases, the build process is becoming the “showstopper” because of the increased number of constituents in coupled models and the large number of infrastructure pieces required. Coupling technologies themselves have complex build processes because of their dependencies on external libraries and, in many cases, the need to compile some parts of the coupling technology into the models. Additional sources of complexity include the use of multiple programming languages, limited support for shared libraries on high-performance computing platforms, the requirement to support multiple versions of dependent libraries, and the use of different tools such as make, autoconf, and CMake. Together, these obstacles hinder defining a single, unified build process.
Build complexity can be reduced by eliminating or better managing software dependencies. Dependencies may be managed better by enabling selective use of individual parts of coupling technologies. Most coupling technologies are packaged as a single, all-inclusive distribution. An alternative approach is to enable fine-grained reuse by offering smaller packages that can be downloaded and installed selectively.
There is general interest in developing a set of benchmarks for coupling technologies. However, there were competing ideas about both the goals of the benchmark and what qualities should be measured. Some coupling tasks are amenable to traditional performance comparisons, such as generation of interpolation weights, parallel communication, and regridding operations. Other qualities such as flexibility and intrusiveness of coupling technologies are both difficult to define precisely and difficult to measure directly. However, the community would benefit from rigorous assessment of these somewhat less tangible qualities because they directly impact the ease with which a coupling technology can be implemented in new contexts.
Facilitating the sharing of coupling software could be an important outcome of a benchmarking effort. Although a benchmark will not necessarily reduce the number of coupling technologies to one, it can at least offer a path of convergence by facilitating the identification of the functions offered by the different coupling technologies and giving a quantitative assessment of how well they implement these functions. Moreover, a community-wide benchmark suite could be used to identify important use cases for coupling technologies and to set development priorities for future releases.
An international group of participants volunteered to begin organizing the benchmarking activities. A first step will be to review the outcome of the group discussions on benchmarking carried out at the workshop.
Investing in shared software infrastructure.
Improving scientific productivity will continue to be the main driver for decisions about the future of coupling technologies. Although there is basic agreement that software infrastructure should be shared and that the continuing diversity of approaches results in some duplication of effort, there are still significant barriers to sharing infrastructure. The merger of OASIS and MCT into OASIS3-MCT is an example of a successful collaboration: MCT provides low-level building blocks, while OASIS provides a layer that facilitates interfacing with existing model implementations and offers an external coupling configuration control. Use of ESMF interpolation weight generation in CESM is another collaboration that has had significant impact. Successful collaborations such as these indicate the potential advantages of sharing coupling infrastructure. As future partnerships emerge, we expect the geoscience communities to reap the benefits of a new generation of robust, efficient, and high-quality coupling technologies.
Funding for the Second Workshop on Coupling Technologies for Earth System Models was provided by the National Center for Atmospheric Research Climate and Global Dynamics Division and by the Infrastructure for the European Network for Earth System Modelling (IS-ENES).