The range of possible results inherent in climate simulations can be examined using ensembles of simulations to explore the influence of climate model formulation as well as of changes in initial and boundary conditions. Multimodel ensembles are now being used both in seasonal-to-interannual forecasts and, on longer time scales, under specified external forcing, as well as for projections of climate change. The range of results produced by different models can be partly attributed to the chaotic nature of the climate system itself, but the range can also be dominated by differences in model formulation. Today, the differences and similarities between models are not well documented nor widely understood, either by the research community or by potential users of climate model output. Here we describe a project that seeks to collect better documentation of these climate models and their simulations. Such documentation is a necessary step toward understanding how the diversity of existing models influences the differences and uncertainty in their predictions.
THE NEED TO UNDERSTAND STRUCTURAL UNCERTAINTY AMONG CLIMATE MODELS.
The climate system features a complex interplay of processes and feedbacks, ranging from ecosystems to radiative transfer. Numerical climate simulations therefore involve several “component models” that are coupled together into an “Earth system model” to simulate the atmosphere, ocean, sea ice, land surface, and land ice, and that can account for processes governing the physical, chemical, and biological behavior of the system. The complexity of these models, gauged in terms of the number of processes represented, continues to grow. Each individual process is scientifically challenging enough that there is sometimes disagreement as to the best way to represent it in models. Without an obvious correct choice, the corresponding “structure” of a model must be considered uncertain. This is referred to as “structural uncertainty” (see also Palmer 2012). In different climate models, the interactions of clouds with the large-scale atmospheric circulation and with radiation, for example, have been encoded in a number of different ways, which partially accounts for the structural uncertainty evident in the climate projections produced by these models.
In this context, multimodel ensembles are now being used to explore the range of results arising from structural uncertainty. Different models perform the same numerical experiments under agreed-upon common protocols that specify—for instance, in the case of twentieth-century simulations—the imposed external forcing (solar, volcanic, and anthropogenic). Such comparative studies of models, internationally organized under formal model intercomparison projects, have now become standard practice. They have spawned new lines of research into the evolution and genealogy of models and into the skill of the model ensemble mean relative to the individual models. Studies of this nature require precise descriptions of models so that their features can be contrasted and compared.
THE EXPANDING USER COMMUNITY OF CLIMATE MODEL OUTPUT.
The results of multimodel climate simulations are also of increasing importance to broad segments of society. Not only scientists and researchers in the climate change impacts and adaptation fields, but also nonspecialists such as policy makers, local government officials, and the general public now have a need to locate and understand the implications of climate simulations. Climate simulation data are stored in huge and complex digital repositories. For the fifth phase of the Coupled Model Intercomparison Project (CMIP5), organized by the Working Group on Coupled Modelling (WGCM) on behalf of WMO's World Climate Research Programme (WCRP), more than a million individual datasets and several petabytes of data will be generated. Locating and making sense of this community resource requires accurate and complete metadata. This is especially important for climate model simulation output, since many different variants of each of the component models exist, and it is essential to document which model version produced each dataset. A model version may differ from others, for example, not only in the values specified in the parametric representations of various processes, but also in the algorithms incorporated in the source code itself.
THE NEED FOR CLIMATE MODELING GROUPS TO MAKE METADATA AVAILABLE.
Until now, much of the critical information needed to describe the model configurations and the experiments could only be found in the notebooks of individual climate scientists and the “comment statements” found in their computer codes; hence, it was largely inaccessible to the broader community. Multimodel databases provided the first strong incentive for developing a common approach to recording descriptions. When dealing with multimodel databases, scientists and other stakeholders are increasingly seeking information about the suitability of available data for their purposes. Prior to CMIP5, this information was difficult to obtain. Thanks to recent community efforts, it should now be possible to obtain answers to questions like: What differences are there between the GFDL CM2.0 and GFDL CM2.1 models? (The only difference is the atmospheric dynamical core.) Which simulations of the twentieth century have daily output data and use turbulent kinetic energy (TKE) vertical mixing in the ocean? What is the grid resolution near the equator or over Europe in the IPSL-CM5A model? Are volcanoes included in the MIROC5 model, and how? The current set of initial questions has for now (during this proof of concept phase) a scope targeted to climate scientists, but, as described below, extension to other fields and stakeholders will be natural.
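To illustrate the kind of query that standardized metadata makes possible, the sketch below filters a toy catalogue of simulation records to answer a question like the TKE example above. The record layout and field names are invented for illustration; they are not the actual CIM schema or CMIP5 vocabulary.

```python
# Toy metadata catalogue; the structure and field names are hypothetical.
catalogue = [
    {"model": "ModelA", "experiment": "historical",
     "output_frequencies": {"daily", "monthly"},
     "ocean_vertical_mixing": "TKE"},
    {"model": "ModelB", "experiment": "historical",
     "output_frequencies": {"monthly"},
     "ocean_vertical_mixing": "KPP"},
    {"model": "ModelC", "experiment": "piControl",
     "output_frequencies": {"daily"},
     "ocean_vertical_mixing": "TKE"},
]

def find_simulations(catalogue, experiment, frequency, mixing_scheme):
    """Return the models whose records match all three criteria."""
    return [rec["model"] for rec in catalogue
            if rec["experiment"] == experiment
            and frequency in rec["output_frequencies"]
            and rec["ocean_vertical_mixing"] == mixing_scheme]

# "Which 20th-century simulations have daily output and use TKE ocean mixing?"
print(find_simulations(catalogue, "historical", "daily", "TKE"))  # → ['ModelA']
```

Without agreed field names and vocabularies, each modeling group would describe the same choices in incompatible ways and no such query could run across groups.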
Many climate model configuration choices are determined by experimental requirements, but these still usually leave room for some differences in how a particular simulation is performed by each model. Hence, in addition to detailed documentation of the models themselves, the experiment conditions must also be fully documented. This information is not only important for scientific interpretation but, under increasing scrutiny from society, it is also demanded of a science that purports to be mature, credible, and open. As a consequence, in the early planning stages of CMIP5, the climate modeling community committed to collecting a comprehensive and standardized set of metadata for the climate model simulations. An important additional benefit of archiving such information is that it ensures that the conditions under which model simulations were performed will be understood well into the future (decades and beyond), when the data may still be relevant not only scientifically but possibly also historically. This “data curation” activity depends on such comprehensive information being collected when the data are produced.
THE METAFOR PROJECT AND CMIP5 QUESTIONNAIRE.
Acknowledging these challenges, a part of the climate-research community committed itself to achieving the ambitious goal of defining, collecting, and making accessible model and experiment metadata for CMIP5. The aim was to make generally available an unprecedented level of detail in describing the models and simulations. Funding for this international effort is being provided by the European Union (http://metaforclimate.eu, http://enes.org) and the United States (http://earthsystemcurator.org), and guidance and encouragement are being provided by the WCRP. In the initial phase of the project, climate and information technology experts worked together to identify the information that would need to be collected to describe models and their simulations. The various types of metadata of interest were then organized into a new conceptual model, called the CIM (Common Information Model). This conceptual model was applied to the specific needs of CMIP5, and a metadata entry tool was developed to collect the information (the CMIP5 questionnaire, located at http://q.cmip5.ceda.ac.uk).
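Conceptually, a model description in such a scheme is a tree of named components, each carrying scientific properties. The dataclass sketch below conveys that idea only; the component names, properties, and structure are invented and are a deliberate simplification, not the actual CIM specification.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a component hierarchy in the spirit of the CIM:
# a model is a tree of components, each with named scientific properties.
# All names and values here are invented for illustration.

@dataclass
class Component:
    name: str
    properties: dict = field(default_factory=dict)
    subcomponents: list = field(default_factory=list)

    def walk(self, depth=0):
        """Yield (depth, component) pairs over the whole tree, parent first."""
        yield depth, self
        for sub in self.subcomponents:
            yield from sub.walk(depth + 1)

atmosphere = Component("Atmosphere", {"dynamical_core": "spectral"})
ocean = Component("Ocean", {"vertical_mixing": "TKE"},
                  [Component("SeaIce", {"rheology": "EVP"})])
model = Component("ExampleESM", subcomponents=[atmosphere, ocean])

# Render the hierarchy as an indented outline, one line per component.
for depth, comp in model.walk():
    print("  " * depth + comp.name, comp.properties)
```

A questionnaire front end can then present one page per component, while the stored document remains a single machine-readable tree.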
Climate scientists and modelers were initially expected to have the most interest in metadata that would help them understand the differences among the various simulations performed by different models. Hence, the content and structure of the model description section of the CMIP5 questionnaire largely reflect the needs of these groups. The input from climate modelers was obtained through direct interviews. Besides identifying the appropriate questions to include, a list of standardized responses was also developed. Converging on a first version of the questionnaire proved relatively straightforward, and disagreements among experts were usually easy to address. Care was taken not to impose uniformity in areas where consensus within the climate research community has not yet been achieved and where agreed “standards” have yet to emerge. In addition, because of the sheer complexity of climate modeling and the finite resources available, a decision was made to limit the scope of the questions, leaving some areas of interest—such as the description of specific model tuning approaches or the choices of metrics used in model evaluation—for future development.
THE METAFOR PROJECT
The Common Metadata for Climate Modelling Digital Repositories (METAFOR, http://metaforclimate.eu, 2008–11) project is a Europe–U.S. collaboration that addressed the problems associated with metadata (data describing data) identification, assessment, and usage. This EU-funded 2.5M-euro project, which involved 12 institutions, was led by the U.K.'s National Centre for Atmospheric Science (NCAS) at the University of Reading. METAFOR has developed a Common Information Model (CIM, currently at version 1.9) to standardize descriptions of climate data and the models that produce them. METAFOR has secured a mandate from the World Climate Research Programme's Working Group on Coupled Modelling (WGCM) to define and collect model and experimental metadata for the Coupled Model Intercomparison Project Phase 5 (CMIP5) project. METAFOR is taking the first step in doing for climate data what search engines have done for the Internet: putting users of climate data in touch with the information they need. Following the completion of the project in late 2011, the European Commission review characterized METAFOR as “a very successful project, which should become a blueprint-element for other projects in the data infrastructure domain.” The funding for the continued development of METAFOR activities is now secured under the EU IS-ENES2 project (http://enes.org), starting early 2013.
The modeling groups involved in CMIP5 are now entering information into a metadata catalogue, and the documentation for about 20 models and hundreds of simulations has already been recorded by the model developers through the web interface questionnaire. Data portals can harvest the information contained in the resulting machine-readable files and render it in a form more usable to humans. Given that different users will want to explore this database in different ways, it is essential to engage directly with various individual communities so that tools can be developed to address their specific needs. Such tools will depend on standard technical interfaces, which have already been developed, and the first tools aimed at displaying and searching the metadata are also now becoming available. Further development planned both in Europe and the United States will eventually enable more complex analysis of the metadata (e.g., determining the differences between two model versions).
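One such analysis, determining what changed between two model versions, can be sketched as below, under the assumption that each version's harvested metadata has been flattened into a dictionary of property paths; the field names and values are invented, echoing the CM2.0/CM2.1 example, and do not reflect the real harvested records.

```python
def diff_metadata(version_a, version_b):
    """Return the properties that differ between two flattened metadata
    records, mapping each differing key to its (old, new) value pair."""
    keys = sorted(set(version_a) | set(version_b))
    return {k: (version_a.get(k), version_b.get(k))
            for k in keys if version_a.get(k) != version_b.get(k)}

# Hypothetical flattened records for two versions of the same model.
cm_v1 = {"atmosphere.dynamical_core": "finite difference",
         "ocean.vertical_mixing": "KPP"}
cm_v2 = {"atmosphere.dynamical_core": "finite volume",
         "ocean.vertical_mixing": "KPP"}

print(diff_metadata(cm_v1, cm_v2))
# Only the atmospheric dynamical core differs between the two versions.
```

Such a comparison is only meaningful because both records use the same standardized keys and controlled terms; free-text model descriptions could not be diffed this way.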
CONCLUSIONS AND FUTURE OUTLOOK.
As described above, this undertaking by the climate modeling community to collect and make accessible metadata in support of CMIP5 will provide the most comprehensive set of multimodel climate simulation metadata to date. Beyond CMIP5, the intention is that the CIM and the associated standards should become increasingly adopted by climate modeling frameworks in much the same way that promotion of the climate and forecast (CF) conventions (see http://cf-pcmdi.llnl.gov) led to standardization of climate model output. To ensure continuity while allowing evolution, a governance structure to maintain and further develop the CIM and the associated “controlled vocabularies” is being proposed that will build on the structure already in place for governing the CF conventions. The term “controlled vocabularies” refers to the predefined and limited set of words, phrases, and names that comprise the metadata description.
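In practice, a controlled vocabulary can be enforced by validating each metadata field against its permitted terms before a record enters the archive. The sketch below assumes invented vocabulary entries, not the actual governed CMIP5 lists.

```python
# Sketch of controlled-vocabulary validation: each metadata field may only
# take values from a predefined, governed list. Entries here are invented.

CONTROLLED_VOCAB = {
    "ocean_vertical_mixing": {"TKE", "KPP", "constant"},
    "output_frequency": {"3-hourly", "daily", "monthly"},
}

def validate(record):
    """Return (field, value) pairs whose value is not in the vocabulary
    for that field; an empty list means the record is valid."""
    return [(f, v) for f, v in record.items()
            if f in CONTROLLED_VOCAB and v not in CONTROLLED_VOCAB[f]]

print(validate({"ocean_vertical_mixing": "TKE",
                "output_frequency": "weekly"}))  # → [('output_frequency', 'weekly')]
```

Governance of the vocabulary then amounts to a controlled process for adding, deprecating, or renaming the permitted terms, which is exactly the role the CF-style structure described above is meant to play.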
Several funded projects (listed at http://es-doc.org) have already begun to build on this initial effort and to extend its scope and use in several directions (e.g., developing new tools, addressing the needs of additional user communities). A possible extension, for example, is to link such model and simulation descriptions to the suite of model performance metrics now being devised by WCRP (http://www-metrics-panel.llnl.gov/wiki). Such metrics measure how the models perform compared to observations; linking these with detailed model descriptions might help in determining the origin of some model errors.
USING THE CMIP5 MODEL AND SIMULATION DESCRIPTION PORTALS
The CMIP5 climate modeling metadata is being exploited by the Earth System Documentation (ES-DOC) project, which aims to supply high-quality Earth system documentation tools and services to the international community. Initially such tools will support documentation viewing and comparison, but will be extended in due course to support documentation creation and visualization. The tools are designed to be easily integrated into third-party portals so as to ensure that the documentation is as widely disseminated as possible.
One such third-party portal already leveraging the ES-DOC documentation viewer tool is the Earth System Grid Federation Web front end (http://pcmdi9.llnl.gov/esgf-web-fe). When performing a CMIP5 dataset search, a “model metadata” link is displayed beside each returned dataset result. Clicking on this link opens the ES-DOC documentation viewer, thereby allowing the user to view model, experiment, simulation, platform, and quality documentation related to the dataset in question. The model documentation details the full model component hierarchy along with associated citations, contacts, and scientific properties. The experiment documentation details all experimental requirements. The simulation documentation details, among other things, ensemble and forcing information. The platform documentation details the machine upon which the simulation was run. Finally, the quality documentation details the automated quality-control checks run by the DKRZ. All this documentation is derived from the Metafor CMIP5 Questionnaire.
In due course, the CMIP5 metadata information will also be accessible through the main ES-DOC portal (http://purl.org/org/es-doc). This portal will provide a documentation search engine that will more closely follow the structure of the CIM documents. Here the user will be able to independently explore models, grids, simulations, etc. The main ES-DOC portal will also provide access to comparison and visualization tools built upon aggregated views of the CMIP5 metadata archive.
It should be noted that the CMIP5 metadata information, as provided by the Metafor CMIP5 Questionnaire, is already being used to generate model and experiment description tables in support of the IPCC 5th Assessment Report (scheduled for publication in 2013).
Another project has succeeded in linking the CIM metadata for the EU ENSEMBLES simulations to the University of Cantabria downscaling portal (www.meteo.unican.es/downscaling/ensembles), thereby helping to meet the needs of the impacts community. It is expected that new tools will be developed to provide a synthesis of information in various CIM documents and to produce easily configurable, scientifically meaningful summaries of, for example, the differences between two models or two simulations. In time, it is expected that a diversity of users would engage their own experts to devise alternative ways of melding climate model output and documentation to best meet their needs.
Continuing developments and investments to record and archive climate model metadata are only part of a longer-term effort that should provide ongoing benefits to the community of users accessing climate model output. For instance, this model metadata archive will provide a much more comprehensive and up-to-date description of climate models than is typically available in journal articles or reports. Beyond the raw documentation, these community-managed metadata repositories will spur development of analysis tools for a wide range of stakeholders, providing a form of “Google advanced search” suitable for finding and exploiting simulations of the Earth system.
ACKNOWLEDGMENTS
METAFOR has been funded by the EU 7th Framework Programme as an e-infrastructure (Project #211753). We thank Ron Stouffer, Sandrine Bony, and Jerry Meehl for the strong support provided via the CMIP and WGCM panels. Mark Elkington from the Met Office/Hadley Centre provided extensive comments on the beta testing of the CMIP5 metadata questionnaire. We acknowledge the enthusiasm, dedication, inspiration, and hard work of the Metafor and Earth System Curator teams and, in particular, Charlotte Pascoe, Gerry Devine, Hans Ramthun, Allyn Treshansky, Marie-Pierre Moine, Frank Toussaint, Rupert Ford, and Sophie Valcke.
FOR FURTHER READING
Carlson, D., 2011: A lesson in sharing. Nature, 469, 293, doi:10.1038/469293a.
Gleckler, P. J., K. E. Taylor, and C. Doutriaux, 2008: Performance metrics for climate models. J. Geophys. Res., 113, D06104, doi:10.1029/2007JD008972.
Guilyardi, E., and the METAFOR group, 2011: The CMIP5 model and simulation documentation: A new standard for climate modelling metadata. CLIVAR Exchanges, 56, 42–46.
Hawkins, E., and R. Sutton, 2009: The potential to narrow uncertainty in regional climate predictions. Bull. Amer. Meteor. Soc., 90, 1095–1107, doi:10.1175/2009BAMS2607.1.
Kleiner, K., 2011: Data on demand. Nature Climate Change, 1, 10–12, doi:10.1038/nclimate1057.
Lawrence, B. N., and Coauthors, 2012: Describing Earth system simulations with the Metafor CIM. Geosci. Mod. Dev. Disc., 5(2), 1669–1689, doi:10.5194/gmdd-5-1669-2012.
Masson, D., and R. Knutti, 2011: Climate model genealogy. Geophys. Res. Lett., 38, L08703, doi:10.1029/2011GL046864.
Overpeck, J. T., G. A. Meehl, S. Bony, and D. R. Easterling, 2011: Climate data challenges in the 21st century. Science, 331, 700–702, doi:10.1126/science.1197869.
Palmer, T., 2012: Towards the probabilistic Earth-system simulator: A vision for the future of climate and weather prediction. Quart. J. Roy. Meteor. Soc., 138, 841–861.
Reichler, T., and J. Kim, 2008: How well do coupled models simulate today's climate? Bull. Amer. Meteor. Soc., 89, 303–312.
Santer, B. D., and Coauthors, 2009: Incorporating model quality information in climate change detection and attribution studies. Proc. Nat. Acad. Sci., 106, 14 778–14 783, doi:10.1073/pnas.0901736106.
Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498, doi:10.1175/BAMS-D-11-00094.1.
Williams, D. N., B. N. Lawrence, M. Lautenschlager, D. Middleton, and V. Balaji, 2011: The Earth System Grid Federation: Delivering globally accessible petascale data for CMIP5. Proc. of the 32nd Asia-Pacific Advanced Network Mtg, New Delhi, doi:10.7125/APAN.32.15.