Observations are the foundation for understanding the climate system. Yet, currently available land meteorological data are highly fractured into various global, regional, and national holdings for different variables and time scales, from a variety of sources, and in a mixture of formats. Added to this, many data are still inaccessible for analysis and usage. To meet modern scientific and societal demands as well as emerging needs such as the provision of climate services, it is essential that we improve the management and curation of available land-based meteorological holdings. We need a comprehensive global set of data holdings, of known provenance, that is truly integrated both across essential climate variables (ECVs) and across time scales to meet the broad range of stakeholder needs. These holdings must be easily discoverable, made available in accessible formats, and backed up by multitiered user support. The present paper provides a high-level overview, based upon broad community input, of the steps that are required to bring about this integration. The significant challenge is to find a sustained means to realize this vision. This requires a long-term international program. The database that results will transform our collective ability to provide societally relevant research, analysis, and predictions in many weather- and climate-related application areas across much of the globe.
We cannot predict what is not observed, and we cannot analyze what is not archived. Fully integrated land meteorological records are essential to advance understanding of weather and climate.
Meteorological observations at stations over land areas have been taken for several centuries. Early measurements were performed by the scientists of the Enlightenment exploring their environment, by medical doctors trying to understand illnesses, by agronomists aiming at improving yields, or by colonial administrators documenting the wealth of their assigned colonies. These early observations used a broad range of instrumentation, emerging measurement systems, and disparate measurement scales (e.g., Réaumur, Fahrenheit, Celsius, Paris lines, inches, millimeters, hectopascals; Brázdil et al. 2010). From the midnineteenth century onward, many nation states started the operation of meteorological networks in the framework of a new administration style, new responsibilities, and technical innovations (e.g., telegraph) (Edwards 2011). By the late nineteenth century, station coverage (albeit sparse) extended across all inhabited continents sufficiently that, today, we can calculate estimates of global-mean change in several important parameters, such as temperature (e.g., Lawrimore et al. 2011; Morice et al. 2012), dating back to the mid-to-late nineteenth century. The late nineteenth and early twentieth centuries saw a drive to standardize methods of observations, data formats, and metadata under the World Meteorological Organization (WMO) and its predecessors as the scientific and societal value of these observations became more and more apparent (Parker 1994). But change continues to this day, for example, with an increasing shift toward unmanned methods of observation.
These meteorological data constitute the foundation of our knowledge of the climate system. Without long records of observations, there can be no viable pathway to understanding climatic processes, climate variability, climate and weather extremes, or climate change. Historical observations have been used to create datasets for many essential climate variables (ECVs; Bojinski et al. 2014) that have enabled assessments of changing climate system properties (e.g., Blunden and Arndt 2016; Hartmann et al. 2013). They tell us about a world that has changed and warmed over the last 150 or so years (Kennedy et al. 2010), and their continuation is important in monitoring the effectiveness of the recent Paris agreement (Dolman et al. 2016). They can also be used (along with many other observations) to derive reanalysis products (Saha et al. 2010; Compo et al. 2011; Rienecker et al. 2011; Dee et al. 2011, 2014; Kobayashi et al. 2015) and assess performance of climate models (e.g., Flato et al. 2013 and references therein).
Despite the importance of land-based meteorological observations, their data management is currently very fragmented. While for marine observations the International Comprehensive Ocean–Atmosphere Data Set (ICOADS; Freeman et al. 2017) provides integrated access to a wide range of surface marine data, the same cannot be said for the land-based observations. The absence of a coordinated global program for data rescue and provision, data management, data curation, and data usage means that we are not currently extracting the full scientific and societal benefits from the observations that we know of (and this ignores the unknown knowns—the uncataloged, the unshared, and the forgotten). Such data have potential to be used seamlessly across a range of local, regional, and global products and applications. However, this problem is multifaceted with no simple solutions. Data sharing is complex, involving both data rescue and data policy. National and global meteorological data management has historically been highly fractured, such that today we have distinct holdings for hourly, daily, and monthly data managed by different groups in disparate ways. We have also developed several ECV-specific sets of holdings for many ECVs such as surface pressure (Cram et al. 2015), temperature (Rennie et al. 2014), and precipitation (Adler et al. 2003; Schneider et al. 2014). These holdings use distinct data formats, carry different station identifiers, often disagree in simple facets such as station names and even coordinates, and have differing levels of data completeness. Different sources may have had distinct quality control applied. Even more confusingly, daily and monthly averages derived from the same underlying data may differ if processed in different ways. Further compounding the issues, available data discovery tools are often rudimentary, and traceability to the original source is lacking. 
Hence, it is at best hard and at worst impossible for users to reconcile available holdings.
While some efforts to create merged holdings exist regionally—for example, the European Climate Assessment and Dataset (ECA&D; Klein Tank et al. 2002; Klok and Klein Tank 2009) and International Climate Assessment and Dataset (ICA&D) components (Van Den Besselaar et al. 2015)—and globally—for example, Global Historical Climatology Network-Daily (GHCN-D; Menne et al. 2012)—they are limited in geographical domain (ECA&D, ICA&D), temporal integration (GHCN-D), and often to a narrow subset of variables (all to some extent). This globally piecemeal approach to data management is a major impediment to optimal usage of historical land meteorological holdings. What we need is a coordinated, sustained, and international effort to better access, manage, and integrate these holdings.
The challenge is not simply that the holdings are suboptimally managed for present-day applications, but also that there are emerging needs that are currently ill-served (Allan et al. 2012). In the era of climate services, there is a growing and changing demand for local information, for information on daily and subdaily aspects of climate change (not least extremes, impacts, and risks), and for data that are “open” and free from restrictions and conditions. Many of these emerging applications also require consideration of the full suite of surface ECVs (Bojinski et al. 2014), as well as measurements that are not currently routinely captured or used from multiple sources beyond the remit of many national meteorological services (NMSs) (e.g., agriculture, transport, energy, amateur observers). Furthermore, there is an increasing need for high-quality discovery and observational metadata: indicators of quality and uncertainties, in addition to known changes in measurement techniques, practices, locations, and siting environments. Such data and metadata are essential in provision of scientifically robust climate services (Van Den Besselaar et al. 2015).
Put bluntly, we keep being confronted with climate situations that are seemingly unprecedented. This is likely in part because we are currently unable to comprehensively represent the climate of our recent past owing to the present disorganized state of land-based holdings. To be able to better assess and respond to climate-induced challenges, it is crucial that we gain a better understanding not only of large-scale climatic trends, but also of local climate, past conditions and variability, and short-term events such as flooding, heat waves, and storms. We are in a fortunate position of being able to draw upon decades and, in many cases, centuries of scientific heritage in the form of rich meteorological archives, but this invaluable heritage is not currently capable of being used to its full potential. To be able to research the scope and causes of climatic events effectively, it is imperative that the many millions of observations taken at some tens of thousands of locations, including emerging methods and modes of observation, be integrated and consolidated into a coherent database.
In this paper we outline a high-level roadmap for a comprehensive land-based international meteorological observation databank (CLIMOD) and offer examples (via boxes) of how valuable a database like this could be. We also consider the numerous logistical hurdles to overcome. We begin by outlining the high-level vision of the approach to be undertaken. Then, in turn, we discuss in more detail aspects around data sourcing, data rescue and collection (including metadata), data management, data serving, and governance (including data policy).
HIGH-LEVEL VISION FOR CLIMOD.
The combined policies, practices, and projects since systematic observations of weather were instigated have led to a massively fractured set of land-based meteorological archives ranging from single-station, single-ECV, single-time-resolution series to large global multi-ECV, multi-time-scale holdings. Stations often have contributed to multiple archives, sometimes at a range of time averages, such that there is gross duplication and resulting confusion in the available records. CLIMOD will first collect all digital data holdings and associated metadata, augment them with rescued and newly discovered data as they become available, and combine them into a single database of all commonly observed ECVs at land meteorological observing sites. To ensure provenance (data lineage), these collected data should be as close to their original format and source as possible. This basic consolidation is a key and necessary first phase, but it is not sufficient to facilitate widespread usage, except perhaps by a handful of experts who can properly assess and use such complex holdings.
What is next required is analogous to a satellite “level 1b” product (Campbell and Wynne 2011) that is geolocated, with duplicates removed, in consistent geophysical units, with metadata directly associated and with gross errors flagged. This would provide a set of openly available land–meteorological holdings that is integrated, consistently formatted, and discoverable. In this phase, the different ECVs should be collated and served together; to quote Aristotle, “the whole is greater than the sum of its parts” (as illustrated by the three boxes outlining potential applications). When these data are served in this manner, their potential utility will be substantially increased. Figure 1 illustrates the change between a set of consolidated (all available) holdings and a set of merged (unique only) holdings for the particular case of monthly averaged surface temperatures. Note the reduction in apparent station count that relates to duplicates and records of uncertain quality or provenance.
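As an illustration of the "consistent geophysical units" step, the following minimal Python sketch converts temperature and pressure values from a few historical scales to degrees Celsius and hectopascals while keeping the value as originally supplied. The function name, the unit labels, and the record layout are hypothetical and are not part of any CLIMOD specification. The mercury-column factors assume a standard column, whereas real barometer readings would also need temperature and gravity corrections.

```python
from dataclasses import dataclass

# Temperature scales mapped to degrees Celsius.
TEMPERATURE_TO_CELSIUS = {
    "celsius": lambda v: v,
    "fahrenheit": lambda v: (v - 32.0) * 5.0 / 9.0,
    "reaumur": lambda v: v * 1.25,
}

# Pressure units mapped to hectopascals by a multiplicative factor.
# Assumption: standard mercury column (0 degC, standard gravity); historical
# barometer readings need further corrections that are not attempted here.
PRESSURE_TO_HPA = {
    "hpa": 1.0,
    "mmhg": 1.333224,
    "inhg": 33.8639,
}


@dataclass(frozen=True)
class ConvertedValue:
    value: float            # value in the common unit
    unit: str               # "degC" or "hPa"
    original_value: float   # value exactly as supplied
    original_unit: str      # unit label exactly as supplied


def to_common_units(variable: str, value: float, unit: str) -> ConvertedValue:
    """Convert one value to the common unit, retaining the original for provenance."""
    key = unit.strip().lower()
    if variable == "temperature":
        if key not in TEMPERATURE_TO_CELSIUS:
            raise ValueError(f"unknown temperature unit: {unit!r}")
        return ConvertedValue(TEMPERATURE_TO_CELSIUS[key](value), "degC", value, unit)
    if variable == "pressure":
        if key not in PRESSURE_TO_HPA:
            raise ValueError(f"unknown pressure unit: {unit!r}")
        return ConvertedValue(value * PRESSURE_TO_HPA[key], "hPa", value, unit)
    raise ValueError(f"unsupported variable: {variable!r}")
```

Keeping the original value and unit alongside the converted value means that a conversion can be revisited later without returning to the source.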
Another problem is that records exist in the archives at a mixture of subdaily, daily, and monthly averaged time scales. Hence, next it is required to ensure time-scale consistency. The subdaily data—where they exist—should be consistent with the derived daily values, which should in turn be consistent with the monthly values. Consistency implies that, for example, daily averages and ranges are consistent with the respective reported maxima and minima recognizing national reporting norms that vary over time and between countries. This will allow analyses of extremes and events to be consistent with studies of mean change. CLIMOD should thus, in addition, contribute to envisaged cross-domain and cross-ECV analysis systems (Kent et al. 2017, their recommendation 7). Figure 2 provides a schematic overview of the proposed CLIMOD holdings and their subsequent analysis.
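A time-scale consistency check of this kind might be sketched as follows. This Python example compares a reported monthly mean with the mean derived from the daily values for the same month. The tolerance, the minimum number of days, and the use of a simple arithmetic mean are assumptions made for illustration only; as noted above, national reporting norms vary over time and between countries, so an operational check would need to take the documented averaging method into account.

```python
from statistics import fmean


def check_monthly_against_daily(daily_means, reported_monthly_mean,
                                tolerance=0.1, min_days=25):
    """Compare a reported monthly mean with the mean of the daily values.

    daily_means may contain None for missing days. The thresholds are
    illustrative assumptions, not recommended values.
    """
    valid = [v for v in daily_means if v is not None]
    if len(valid) < min_days:
        # Too few daily values to make a meaningful comparison.
        return {"status": "insufficient_daily_data",
                "derived_mean": None, "difference": None}
    derived = fmean(valid)
    difference = derived - reported_monthly_mean
    status = "consistent" if abs(difference) <= tolerance else "inconsistent"
    return {"status": status, "derived_mean": derived, "difference": difference}
```

An "inconsistent" result would not by itself say which value is wrong; it would only mark the month for closer inspection.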
Historical data can be considered in three classes: i) data that are digitized and are already made available with workable or no usage restrictions, ii) data that are digitized but are either not made available or made available with restrictive usage conditions, and iii) data that are only available presently as either hard copy or, at best, non-machine-readable image formats. Imperfect individual and community knowledge leads to a fourth class of “unknown knowns” or “forgottens”: data that exist but that the present-day scientific community, for a variety of reasons, does not know about. These data should become substantively less elusive with a successful CLIMOD simply as a result of raised community awareness as to the existence of a recognized repository for such data.
Data that are digitized and available for, at a minimum, academic and research usage are served from a broad variety of local, national, regional, and international repositories. This includes national meteorological services, regional efforts such as ECA&D, and global repositories such as GHCN-D. Experience with the International Surface Temperature Initiative (ISTI; Rennie et al. 2014) and the Atmospheric Circulation Reconstructions over the Earth (ACRE; Allan et al. 2016) initiative points to additional, nontraditional, sources that could greatly improve the density of available observations. For example, through Argentinean members of the ISTI databank working group, access was secured to a number of long-term rural stations from the Argentinean Agricultural Ministry.
Many of the currently served holdings are actually amalgamations of a broad range of underlying sources. For example, GHCN-D is an amalgamation of over 50 sources, many of which themselves (such as ECA&D) are the result of amalgamating underlying sources. It would be preferable to start from the underlying sources in subsequent efforts, described in the next section, to merge holdings together. Starting from the underlying sources provides better provenance and data-processing transparency.
Initially, CLIMOD would start from as many of the available holdings with minimal usage restrictions as could be collected. Efforts would be made to capture the contributing data sources from amalgamated holdings. This collation and cataloging of available data will require substantive international efforts and cooperation and will build on the partnerships already developed by the currently served holdings.
There are several known data holdings that are available digitally but not currently available for academic and research use. The prevalence of such holdings has strong geographical patterns, with data policy being a particular issue for many of the areas where the community currently has access to the fewest data. In some cases, such as in the Pacific and Southeast Asian regions (Page et al. 2004), these amount to several thousand stations. Efforts are already underway to advocate for more relaxed data policies and for allowing access to these data (Lorrey 2011; www.apn-gcr.org/resources/items/show/1666; Williamson et al. 2015). CLIMOD could facilitate such efforts, but the issue is vexing and could take decades to resolve. We return to data policy in “Governance and modus operandi.”
Data rescue activities have provided major improvements in the availability of historical meteorological data and must be sustained. There are, by conservative estimates, at least as many data to be rescued as are currently available in digital archives for the period prior to 1950 (Allan et al. 2011). There are growing international efforts such as ACRE and the International Environmental Data Rescue Organization (IEDRO), and funded projects such as European Reanalysis of the Global Climate System (ERA-CLIM2) and Uncertainties in Ensembles of Regional ReAnalyses (UERRA), that have attempted to mobilize, catalog, and coordinate data rescue activities. For example, in 2014, WMO, ACRE, and the Global Framework for Climate Services (GFCS) developed the Indian Ocean Data Rescue (INDARE) initiative (www.wmo.int/pages/prog/wcp/wcdmp/documents/INDAREimplementationPlan.pdf). More recently, the WMO Commission for Climatology formed a task team on data rescue and instigated a portal to encourage coordination hosted by the Royal Netherlands Meteorological Institute (KNMI; www.idare-portal.org). These efforts have substantially increased historical meteorological digital data availability. They also identified key requirements essential to long-term preservation of the rescued data, including cataloging, having an institutional champion, and provision of necessary equipment to capture the data (Brönnimann et al. 2006; Tan et al. 2004).
Many different types of data can be valuable in addition to NMS stations (Allan et al. 2016). For example, data have been rescued from a French eighteenth-century doctor’s network (Yiou et al. 2014) and colonial administrations (White et al. 2018). Early records, and those from countries without a mature NMS, are most likely to come from such nontraditional sources; for example, the National Oceanic and Atmospheric Administration (NOAA)’s Climate Data Modernization Program resulted in the keying of 450 stations across the United States based on data from the U.S. Army Surgeon network, the Smithsonian Institution, and the U.S. Signal Service spanning the 1820–92 period. Discovering and securing access to such data is a long-term process that is already ongoing and mature. The broader aspects of CLIMOD can add enormously to the long-term value of data rescue work by providing a recognized repository that is well used, permitting the easy identification of data that have already been digitized, and highlighting vital applications for the rescued data. Experience from the marine ICOADS effort attests to the value of such holdings in soliciting additional contributions.
DATA SOURCE AMALGAMATION.
To ensure consistency of processing for derived analyses and applications, CLIMOD will work from the most original basic data sources available. Where this consists of images of original data, a link back to the image and source should be retained and such “gold standard” data provenance should be an aspiration for all sources. To minimize transcription errors occurring on conversion to the archive data model, data and metadata should be ingested and archived in perpetuity in the native format of the data supplier with relevant provenance metadata to ensure future usability. CLIMOD should take all relevant data regardless of quality or format. However, certain minimum data standards are likely to be required for inclusion in public-facing holdings—at least sufficient discovery metadata (name, latitude, longitude, and ideally elevation) to enable users to appropriately contextualize the data. Such discovery metadata might become available later so it is important that all data are retained and archived even if not currently useable. This data ingest model permits reprocessing and reconversion of the data and also sets the lowest possible hurdle to data submission. This is a substantial advantage because data will be submitted from a wide variety of organizations and programs. If higher-level products exist (e.g., post quality control or homogenization) these should be stored in parallel. However, preference should be given to data closer to “original,” such that multiple groups of analysts can subsequently consider the challenging issues of quality control, homogenization, and interpolation (e.g., Venema et al. 2012), to create datasets and data products.
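The distinction between data that are archived and data that are eligible for public-facing holdings could be expressed as a simple check on the minimum discovery metadata. The Python sketch below is illustrative only: the field names, the status labels, and the longitude convention (−180 to 180) are assumptions. Elevation is treated as desirable rather than required, and a failed check never causes data to be discarded.

```python
def classify_for_serving(metadata):
    """Return (status, missing_or_invalid, notes) for one station record.

    status is "public" when name, latitude, and longitude are usable, and
    "archive_only" otherwise. Field names are hypothetical.
    """
    problems = []

    name = metadata.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name")

    latitude = metadata.get("latitude")
    if not isinstance(latitude, (int, float)) or not -90.0 <= latitude <= 90.0:
        problems.append("latitude")

    longitude = metadata.get("longitude")
    if not isinstance(longitude, (int, float)) or not -180.0 <= longitude <= 180.0:
        problems.append("longitude")

    # Elevation is ideal but not mandatory, so it only generates a note.
    notes = [] if metadata.get("elevation_m") is not None else ["elevation missing"]

    status = "public" if not problems else "archive_only"
    return status, problems, notes
```

A record classified as "archive_only" would simply be reassessed whenever additional discovery metadata are submitted.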
The data lineage (provenance) of observations should be carried through the entire data archive and each stage of processing recorded in the metadata associated with the observation. The next logical step is thus to convert the multiple sources to a flexible and extendable common data model of consistent formatting and metadata conventions that retains all metadata and allows full provenance tracking through the use of unique identifiers (UIDs). Each separately provided element of data or information should be assigned a unique identifier (data, station metadata, general source format documentation, etc.) and all associated information linked. This should indicate alternative sources of data at a given time step if they exist. Converting to a common data model (such as espoused by the Copernicus Climate Change Service; https://climate.copernicus.eu) makes intercomparison of the various sources easier but still requires the user to understand the different data sources and use them appropriately. While a necessary step, stopping at this stage is an impediment to usage for all but the most expert users. We then need to create a merged set of holdings.
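One way to picture provenance tracking through UIDs is a registry in which every element (data, station metadata, source format documentation) receives its own identifier and records the identifiers of the elements it was derived from. The following Python sketch rests on several assumptions: the class names, the element kinds, and the use of random UUIDs are all hypothetical, and a real common data model would be considerably richer.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str                     # e.g. "source_format_doc", "station_metadata", "observation"
    description: str
    derived_from: tuple = ()      # UIDs of the elements this one depends on
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)


class ProvenanceRegistry:
    def __init__(self):
        self._elements = {}

    def register(self, kind, description, derived_from=()):
        """Assign a UID to a new element and link it to its parent elements."""
        for parent in derived_from:
            if parent not in self._elements:
                raise KeyError(f"unknown parent UID: {parent}")
        element = Element(kind, description, tuple(derived_from))
        self._elements[element.uid] = element
        return element.uid

    def lineage(self, uid):
        """Return the element and all of its ancestors, each listed once."""
        seen, order, stack = set(), [], [uid]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            element = self._elements[current]
            order.append(element)
            stack.extend(element.derived_from)
        return order
```

With such links in place, any served value can be traced back step by step to the material originally supplied.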
A large fraction of observing platforms are found in more than one existing data holding, because of past splitting of multivariate observations into different archives and subsequent repeated splitting and merging of different archives (e.g., Rennie et al. 2014). In creating the merged CLIMOD data archive, the source holdings will be prioritized and then merged sequentially. In-depth understanding of each data source will be required to enable a clear decision on the priority order to be made. This may depend on some or all of the following nonexhaustive list: data provenance, data quality level, temporal completeness, number of ECVs supplied, reporting resolution, metadata detail, most recent update, consistency over time, and quality control applied. Any duplication of particular observing platforms across sources will allow quality to be assessed and might fill gaps in the record.
Merging of the source data will be a vital step in producing a final archive valuable to the wider community. It will by necessity be iterative, merging each prioritized source in turn (Fig. 3). A high level of automation will be required, using the metadata and data similarity across all variables to appropriately decide whether a given platform is unique or to be merged. However, some manual intervention is foreseen, using local knowledge wherever possible. The merging process will capture the decisions made into the metadata to keep the data provenance and audit trail, including for any manual process. Knowledge gained during manual intervention will be used to update the automatic procedure. Experts would be able to use the resulting metadata to perform a different merge or ensemble of merges matching their own specific requirements.
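The iterative, priority-ordered merge can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the procedure envisaged for CLIMOD: it matches platforms on station name and location only, whereas the text above calls for data similarity across all variables as well. The thresholds, field names, and decision labels are hypothetical.

```python
import math
from difflib import SequenceMatcher


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance from the haversine formula (spherical Earth)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def same_platform(a, b, max_km=5.0, min_name_ratio=0.8):
    """Metadata-only test; both thresholds are illustrative assumptions."""
    close = distance_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return close and ratio >= min_name_ratio


def merge_sources(sources):
    """Merge (source_name, priority, stations) tuples, lowest priority number first.

    Returns the merged stations and a log of every decision taken.
    """
    merged, decisions = [], []
    for source_name, _, stations in sorted(sources, key=lambda s: s[1]):
        for station in stations:
            match = next((m for m in merged if same_platform(m, station)), None)
            if match is None:
                merged.append({**station, "data": dict(station["data"]),
                               "sources": [source_name]})
                decisions.append((source_name, station["name"], "added_as_unique"))
            else:
                # Values already present came from a higher-priority source and
                # are kept; the new source only fills gaps.
                for timestamp, value in station["data"].items():
                    match["data"].setdefault(timestamp, value)
                match["sources"].append(source_name)
                decisions.append((source_name, station["name"],
                                  "merged_into:" + match["name"]))
    return merged, decisions
```

The decision log plays the role of the audit trail: rerunning the merge with different thresholds or a different priority order would yield an alternative member of an ensemble of merges.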
Merging ECVs and merging time resolutions are both challenging. Merging ECV holdings will require careful attention to ensure physical consistency between variables (supersaturation, rain at <0°C, or snow at >10°C, etc.). To have consistency across the temporal range (where monthly values are to be based on daily values, which themselves are based on the synoptic reports), extra checks will be required—for example, that the subdaily report values of temperature do not exceed the daily minimum/maximum. These will be easier when additional metadata have been included with the original reports (e.g., the definition of the climatological/meteorological day, daily averaging method, details on instrumentation). Data supplied as temporal averages will only be taken as primary sources in the absence of any information at higher temporal resolution.
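The physical consistency checks named above lend themselves to a compact illustration. The Python sketch below flags supersaturation (dewpoint above air temperature), rain reported below 0°C, snow reported above 10°C, and a subdaily temperature that falls outside the reported daily minimum and maximum. The field names and flag labels are hypothetical, and the flags mark values for review rather than removing them.

```python
def physical_consistency_flags(obs, tolerance=0.0):
    """Return a list of flag names for one subdaily report.

    obs is a dict; any missing field simply disables the checks that need it.
    tolerance (degC) is an illustrative allowance for reporting precision.
    """
    flags = []
    t = obs.get("air_temperature_c")
    td = obs.get("dewpoint_c")
    ptype = obs.get("precipitation_type")
    tmin = obs.get("daily_tmin_c")
    tmax = obs.get("daily_tmax_c")

    if t is not None and td is not None and td > t + tolerance:
        flags.append("supersaturation")
    if t is not None and ptype == "rain" and t < 0.0 - tolerance:
        flags.append("rain_below_0c")
    if t is not None and ptype == "snow" and t > 10.0 + tolerance:
        flags.append("snow_above_10c")
    if t is not None and (
        (tmin is not None and t < tmin - tolerance)
        or (tmax is not None and t > tmax + tolerance)
    ):
        flags.append("subdaily_outside_daily_range")
    return flags
```

Whether a flagged report reflects a genuine error often depends on the metadata mentioned above, such as the definition of the climatological day.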
The final output will undergo quality assurance for gross errors. Any errors identified (e.g., unphysical values) will be removed from the primary data stream but retained within the archive. However, a comprehensive quality control system is not envisaged as part of this primary data product. Quality control processes (manual or automated; e.g., Dunn et al. 2012; Durre et al. 2010; Schneider et al. 2014) are not able to correctly classify an observation in every instance. No matter how carefully the process is designed, errors that either retain bad data points or exclude valid data points will occur, even if infrequently. Furthermore, in some instances the available information will not allow clear decisions to be made. Enforcing one specific set of choices (e.g., 4 sigma versus 4.5 sigma trimming) removes an important means of quantifying uncertainty and risks biasing all downstream applications. Each user of the data product will have different requirements for their application of the data. We encourage the development of multiple quality control suites, each tailored to the end data use, which should be made available alongside the primary CLIMOD holdings.
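The sensitivity to the choice of threshold can be made concrete with a small example. In the Python sketch below, values are flagged (not deleted) when they lie more than a chosen number of standard deviations from a reference mean, and the same series is assessed at both 4 and 4.5 sigma. The reference mean and standard deviation are assumed to be supplied from elsewhere (for example, a station climatology); the function names are hypothetical.

```python
def flag_outliers(values, reference_mean, reference_std, n_sigma):
    """Return one boolean per value: True where |z| exceeds n_sigma.

    Values are only flagged, never removed, so other thresholds can be
    applied later to the same primary data.
    """
    if reference_std <= 0:
        raise ValueError("reference_std must be positive")
    return [abs(v - reference_mean) / reference_std > n_sigma for v in values]


def flag_ensemble(values, reference_mean, reference_std, thresholds=(4.0, 4.5)):
    """Apply several thresholds and return the flags keyed by threshold."""
    return {n: flag_outliers(values, reference_mean, reference_std, n)
            for n in thresholds}
```

A value that is flagged under one threshold but not the other is exactly the kind of case in which a single enforced choice would hide uncertainty from downstream users.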
CLIMOD will become the one-stop shop for all land meteorological data, ready for scientists and other interested parties to develop climate data records and derived products. It is expected to be a dataset roughly the same size as the current ICOADS holdings. As such, it is essential that the combined CLIMOD be built on a strong base of machine-readable metadata to enable powerful searching and subsetting capabilities across time scales and ECVs. The ability to ingest, update, and communicate a range of new data quickly will facilitate the development of timely climate data products and services. This includes access to and timely (with little delay) quality assessments of synoptic reports, daily summaries (including daily CLIMATs when these become available), and monthly CLIMAT summaries exchanged via the WMO Information System. CLIMOD should be capable of providing multiple formats to suit user needs. To build user confidence in the database, full provenance metadata will need to be servable alongside the data.
The proposed CLIMOD holdings represent a requisite high-quality database for the production of climate datasets and value-added products (see sidebars). These subsequent uses would have to consider questions of quality control, quality assurance, and record homogeneity, as discussed in the previous section. Using the database without taking into account these issues would be actively discouraged. To aid such applications, metadata may be appended where available, giving measurement uncertainties, findings of analysts, and other aspects similar to the ICOADS Value Added Database (IVAD; Freeman et al. 2017).
Mosquito-borne diseases such as dengue fever, Zika virus, West Nile virus, and malaria are each transmitted by different mosquito species. Each species has a certain climatic range that serves to limit the disease range. Mosquitos require standing water sources to reproduce and can only survive in certain thermal ranges. Hence, at a minimum, temperature and precipitation data are required (at least away from areas with human mediated standing water sources) to help understand disease outbreaks. Studies have investigated links between meteorology, climate, and outbreaks of various mosquito-borne diseases (e.g., Parham and Michael 2010; McMichael et al. 2006; Hales et al. 1996; Hay et al. 2002). However, climate is only one factor in historical (and future) prevalence of such diseases. Epidemiological studies would substantively benefit from CLIMOD enabling access to and an analysis of historical frequency of favorable conditions for the different mosquito species to improve understanding of past outbreaks. This does not address remaining issues surrounding access to other required information such as disease occurrence monitoring that also need to be addressed by other expert communities.
The study of meteorological drought is a complex issue. Drought is directly caused by a deficit of rainfall but is more than simply a rainfall deficit. There is a large variety of drought definitions (for example, meteorological, agricultural, hydrological, or socioeconomic drought), each of which has distinct impacts (Dai 2011). Observational records of drought are uncertain, and the IPCC Fifth Assessment Report concluded that “confidence is low for a global-scale observed trend in drought or dryness (lack of rainfall) since the middle of the 20th century, owing to lack of direct observations, methodological uncertainties and geographical inconsistencies in the trends” (Hartmann et al. 2013, p. 162). Most drought indices make use of two or more ECVs (Zargar et al. 2011). The availability of multivariate surface holdings through CLIMOD that included drought-relevant parameters such as temperature, humidity, and winds in addition to rainfall would greatly facilitate historical and ongoing observationally based drought research activities and help to reconcile apparent discrepancies in historical estimates of drought prevalence that depend both on data sources considered (Trenberth et al. 2014; Dai and Zhao 2017) and choice of specific indices (Dai and Zhao 2017; Fig. SB1).
The effect of heat waves on flora and fauna can be studied using solely dry-bulb air temperature. However, moisture in the air is an important factor in the thermal comfort of animals (including humans) that rely on the evaporation of water (sweating or panting) to thermoregulate. High relative humidity at high temperatures limits the rate at which evaporation can occur, leading to heat stress or, in extreme circumstances, fatal overheating. Conversely, at low temperatures, moister air can make the body feel cooler through more efficient conduction of heat away from the skin and more energy required to warm the moist air close to the skin. An additional ECV that has value in the study of heat stress is the local wind speed.
A common measure of heat stress is the temperature–humidity index (THI), which was initially developed for humans but has subsequently been shown to correlate strongly with the impact of heat stress on dairy cattle (Berry et al. 1964). By combining air and dewpoint temperature from the subdaily multivariable Hadley Centre Integrated Surface Database (HadISD) dataset, Dunn et al. (2014) investigated the changes in the THI for U.K. dairy cattle since 1973. Although on average there are only a few days per year where cattle would experience heat stress, during the heat wave years of 2003 and 2006, this exceeded 5 days or more (Fig. SB2). A shift in the distribution of THI between the early and late periods was also shown. CLIMOD would allow these kinds of studies to be conducted on a larger scale for a wider range of environments and adaptations, including the assessment of urban heat stress, heat sensitivity in crops, and vulnerability thresholds for plants and animals.
To be sustainable, CLIMOD will need to be well used. Primary users are likely to be experts and practitioners accessing the raw observations and deriving data products suitable for specific applications. These users should be engaged from the outset to ensure that CLIMOD meets their needs. Users may inform prioritization with regard to data rescue and sourcing, data formats, data access options, and data provenance. Broad usage will require data discovery metadata and associated data exploration tools. This may include UIDs and tools to subset and visualize holdings by region, time scale, ECV, source, and provision of inventories. Discovery metadata should be ISO 19115 compliant. Existing resources such as the National Center for Atmospheric Research (NCAR) Research Data Archive (http://rda.ucar.edu/) would be able to perform user-defined subsetting operations. Metadata exploration could be enabled through the WMO Observing Systems Capability Analysis and Review tool (https://oscar.wmo.int/surface), which would also ensure compliance with emerging WMO Integrated Global Observing System (WIGOS) metadata standards, or the NOAA/National Centers for Environmental Information (NCEI) Historical Observing Metadata Repository (www.ncdc.noaa.gov/homr) interface. Although initially machine-readable metadata may be extremely limited for many stations, in the longer term such tools may facilitate appending of richer metadata for stations.
A range of download tools will be required to cater to diverse users. Regional mirrors would aid data provision and also ensure accessibility. Some users will prefer web-based access tools, whereas frequent users are likely to need service via FTP, OPeNDAP, or similar. Many applications, including an increasing number for climate services, will require timely access to data updates (within days to a month), so a regular update cycle will be needed.
It is likely that users will find errors in the data and their presentation. A mechanism will be required for users to report data issues. This should include a response system that allows improvements to the database to be enacted in a timely manner. The NOAA/NCEI Datzilla ticket system (https://datzilla.srcc.lsu.edu/datzilla/) is an example of such a facility and has led to the correction of issues including poor station merge decisions and bad data segments. Users also need to be made aware of updates. This requires a clear version control protocol. One or more of monthly status reports, blog posts highlighting updates and applications, and provision of educational resources would help build a loyal user community.
GOVERNANCE AND MODUS OPERANDI.
CLIMOD clearly constitutes a long-term effort that would benefit substantially from sustained international cooperation. Such an explicitly international effort would reduce the risk of a single point of failure (whereby the decision of a single funder or host organization may imperil continued function) and also make it clear that the set of holdings belongs to the global research and applications communities. This might reduce anxieties over data intellectual property rights among potential data contributors. A broadly based international steering committee of recognized experts would be advisable, answering to a body such as the Global Climate Observing System (GCOS), World Climate Research Programme, or WMO Commission for Climatology. The steering committee should provide advice on data rescue, policy, processing, and serving to cover the full range of activities.
Such international and sustained collaboration is only likely to result if sufficient resource support is forthcoming to facilitate the work on a long-term basis. Regular forums or meetings of participants would be required to ensure a coordinated approach, to discuss progress and challenges, and to assure continued cooperation. The effort will require the input of scientists, software engineers, data access rights experts, database/data life cycle experts, project managers, and associated support staff. Although initial work may benefit from a research project, long-term continuity will be best achieved through sustained support from NMSs or large international programs, such as Copernicus. To that end, we strongly welcome the recently awarded Copernicus Climate Change Service contract (led by the lead author of this paper and in partnership with NOAA/NCEI), which shall, over the coming four years, start to make CLIMOD a reality and, as a further benefit, provide integrated access to land and surface marine observations using a common data model.
Data policy needs to be clearly articulated and to have broad buy-in. Internationally, efforts have been made to secure access to observations, resulting in WMO Resolutions 40 and 60, which pertain to real-time and historical data exchange, respectively. These govern the routine global exchange of meteorological data by all nations, but are far from perfect since they allow institutions to place restrictions on data use and reuse. CLIMOD must operate within these boundaries while working with others to advocate for improved access to data. Usage policy must be agreed upon and clearly articulated to end users. It may be that different data streams could have different levels of restriction on use and reuse. There is no value in publicly serving data that are not usable at a minimum for academic and research purposes. While recognizing the substantial data policy hurdles to be overcome, the long-term aspiration should be that data be shared openly and without restriction, with a share-and-share-alike onward usage policy. Regardless, acknowledgment by citation and DOI (to a list of contributors) and, where necessary, citation of relevant value-added products will be required.
We have outlined a framework for the construction of a comprehensive database of land meteorological holdings that are integrated across time scales and ECVs. Such integration is essential to understand and address the challenges of climate change and variability. Observations are the foundation upon which datasets, products, and services are built and upon which decisions can be made. The time is right to address the challenge. However, outlining and agreeing on a broadscale vision is necessary but not sufficient. A sustained and sufficiently resourced effort is needed, covering the entire processing chain from data sourcing (including rescuing) through to amalgamation and provision to the community. The need is clearly articulated in the latest GCOS Implementation Plan (GCOS 2016). The question that remains is whether sufficient organizations will “step up to the plate” on a sustained basis to take on the meteorological databank challenge.
The work was facilitated by Grant 16/CW/3801 from Science Foundation Ireland. Dr. Jan Rigby (Maynooth University) provided expert advice on the mosquito-borne diseases sidebar. Prof. Aiguo Dai (SUNY-Albany) provided useful input and a figure for the drought sidebar. Deb Misch (Telesolv Consulting) and Sara Veasey of NOAA/NCEI graphics team are acknowledged for substantial help in creating publication-ready figures. Prof. Rob Allan is supported by a combination of funding from the Joint BEIS/Defra Met Office Hadley Centre Climate Programme (GA01101), the European Union’s Seventh Framework Programme (FP7) European Reanalysis of Global Climate Observations 2 (ERA-CLIM2) project, and the Climate Science for Service Partnership (CSSP) China under the Newton Fund. He also acknowledges the University of Southern Queensland, Toowoomba, Australia, where he is an adjunct professor. Robert Dunn and Kate Willett were supported by the Joint BEIS/Defra Met Office Hadley Centre Climate Programme (GA01101). Linden Ashcroft is supported by EU-FP7-SPACE-2013-1 project Uncertainties in Ensembles of Regional Reanalyses (UERRA; Grant Agreement 607193). Elizabeth Kent was supported by NERC under Grant NE/J020788/1. John Coll acknowledges funding provided by the Irish Environmental Protection Agency under project 2012-CCRP-FS.11. Gilbert Compo is supported by the NOAA Climate Program Office and the U.S. Department of Energy Office of Science Biological and Environmental Research Program. Petra Pearce was supported by NIWA Core Funding under the Climate Present and Past project (CAOA1701), the New Zealand Deep South National Science Challenge (BDS16101), and NOAA. Stefan Brönnimann acknowledges funding from the Swiss National Science Foundation (Project 205121_169676) and was supported by the European Commission H2020 Grant EUSTACE (Grant Agreement 640171). Jun Matsumoto was supported by the KAKENHI Grant (26220202) from the Japan Society for the Promotion of Science (JSPS).