Central Europe has a vital and extensive meteorological research community comprising national weather services, universities, and research organizations and institutes. Nearly all of them are involved in the open scientific questions regarding clouds and precipitation processes. The research activities include observations (from in situ measurements, ground-based remote sensing, and radio soundings to satellite-based observations), model development on all scales (from direct numerical simulations to global climate models), and other activities. With Germany as an example, our first objective is to show the large amount and the diversity of observations regarding clouds and precipitation. The goal is to give an overview of existing measurements and datasets and to show the benefit of combining the different information from a variety of observations. Up to now, access to and usage of these datasets from different sources has not been straightforward, because common data and archiving standards for observational data were missing. This motivates our second objective, which is to introduce our solution for this issue: the novel Standardized Atmospheric Measurement Data archive (SAMD). SAMD is one of the outcomes of the German research initiative High Definition Clouds and Precipitation for Advancing Climate Prediction [HD(CP)2]. The goal of SAMD is an easy-to-use approach for both data producers and archive users. Therefore the archive provides observational data in the common Climate Forecast (CF) Conventions format and makes it available to the broader public. SAMD offers highly standardized quality-controlled data and metadata for a wide range of instruments, with open access, which makes this novel archive important for the research community.
A multitude of cloud and precipitation observations in Germany are now available in the new Standardized Atmospheric Measurement Data archive.
A generation ago, as the contours of clouds began to impose themselves on the climate prediction enterprise (Cess et al. 1989), the scientific community began to realize that improvements in the treatment of cloud and radiation in global-scale circulation models required a meticulous testing of their parameterizations against special observations describing the physical processes. Consequently, the United States Department of Energy responded by setting up the Atmospheric Radiation Measurement (ARM) program (Turner and Ellingson 2016). The idea was to simultaneously characterize the optical properties of the atmosphere (determined in large part by clouds and water vapor) and the ensuing distribution of radiation. These measurements, it was hoped, would help tame wild differences in the representation of clouds by climate models (Ackerman and Stokes 2003).
The first approach called for a characterization of the physical properties within an atmospheric column concurrent with radiation measurements. Ackerman and Stokes (2003) explained the idea using the analogy of a soda straw, which facilitates vertical exchanges in a manner that is not directly dependent on what happens in neighboring columns. This measurement concept has been enormously influential, shaping how a generation of climate modelers thinks about clouds and radiation (Randall et al. 2003; Schneider et al. 2017).
Since the advent of ARM, Moore’s law has allowed for a steady increase of the computational throughput by a factor of 10,000, if not more. Large-eddy simulations, which at that time were run for a few hours over idealized domains of a few kilometers (Moeng et al. 1996), are now being performed for periods of days, for domains spanning 1,000 km (Heinze et al. 2017). This is not the only change. Precipitation, like clouds and radiation, has come to be appreciated as an important physical process that climate models inadequately represent (Dai et al. 1999), which is also less compellingly described by the soda-straw approach. A further change, important at least for the present narrative, is that routine meteorological measurements, which in the past (especially in Europe) were proprietary and difficult to obtain, are now more widely available.
These changes were very much in our minds when we proposed the German national project “High Definition Clouds and Precipitation for Advancing Climate Prediction” [HD(CP)2], to evaluate the ability of large-eddy simulation (LES) to lessen the climate-prediction cloud and precipitation problem by reducing the scale gap between the relevant processes and the comparatively coarse resolution of the atmospheric models. The project sought to answer the question: If it were possible to perform large-eddy simulations of the global atmosphere, would it be better than traditional approaches? How would we know? To provide the answer we proposed to collect and organize observations relevant to clouds, radiation, and precipitation from across central Europe into a central archive using a standardized format with a rich metadata description. In essence, the idea was to turn Germany, if not central Europe, into an ARM site. The purpose of this article is to describe the results of this effort, a data archive called Standardized Atmospheric Measurement Data (SAMD). With SAMD we have established a new infrastructure, which stands for a new culture in data archiving: highly standardized, easy to use, and freely available data for the whole research community, in line with the new paradigm proposed by Overpeck et al. (2011). SAMD focuses on long-term observations as well as campaign data, meteorological observatories as well as network data, and operationally used instruments as well as new and experimental instruments. In doing this work we hope to encourage the use of its data, but also to encourage others to adopt the methods, quality control procedures, and data description methods we developed, to expand the archive for use by future generations.
HD(CP)2 is a project funded by the German Federal Ministry of Education and Research. It is a cooperative project with more than 100 members from 19 participating institutions and universities all over Germany. HD(CP)2 employs in total about 35 postdocs and 12 Ph.D. students who are involved in the six modules that make up the project and deal with very different science questions. The topics covered within HD(CP)2 range from the influence of aerosols and questions concerning microphysics to the organization of convection. The project aims to improve the understanding of cloud and precipitation processes and to circumvent the use of parameterizations, as these still represent one of the largest uncertainties in current climate prediction models. To this end, HD(CP)2 uses two of the six modules as so-called infrastructure. The first is a high-resolution large-eddy model (ICON-LEM) that was developed in the first phase (2012–15) of the project and provides a previously unattained horizontal resolution of up to 156 m. In the second project phase (2016–19) this model is used for 24-h-long hindcast simulations of various synoptic situations, ranging from convective systems with thunderstorms to cloudless days. The model output is compared to observational data to allow a critical assessment. Alongside the model development, the second infrastructure module in HD(CP)2 organizes the observational activities to provide measurement data at the same resolution as the model data. As part of the observational module, a project data archive for standardized measurements was built up. In the second project phase, the SAMD archive emerged from this project archive and is now open to the whole research community.
THE GERMAN CLOUD AND PRECIPITATION RESEARCH LABORATORY.
Central Europe, in particular Germany, has a multifaceted research community in atmospheric physics, and in clouds and precipitation processes in particular. This community consists of universities, research centers, and the national weather services, all of them having different foci. Germany’s National Meteorological Service (DWD) runs, among other things, operational observing systems such as a C-band radar network, a ceilometer network, and observatories (e.g., the Richard Aßmann Observatory in Lindenberg, east of Berlin). Their measurements are part of the operational forecasting system. Universities and research centers have their focus on research. This includes the operation of supersites, the development of new instruments and measurement techniques, the development of new satellite products, and the realization of measurement campaigns, to name only a few. Furthermore, there are several cooperations and collaborations both within Germany and with international institutions, such as the Royal Netherlands Meteorological Institute (KNMI).
The same institutions are part of a vital modeling community in Germany, including a model cascade from cloud-resolving models, LES models, weather forecast models, and regional models to global circulation climate models. The development of new methods to resolve clouds, precipitation, and related processes in LES models and the development of new parameterizations of those processes for large-scale climate models are only a few of the current science questions of this community.
The existence and cooperation of the various communities, institutions, and research centers are the key that turns Germany into one research laboratory for clouds and precipitation. The collaborations also allow national research projects like the HD(CP)2 project, including 19 different institutions. This research initiative is addressing the lack of understanding of cloud and precipitation processes, which is arguably the foremost challenge of current climate simulations.
With regard to observations, we are focusing on three main pillars. The first one is formed by the meteorological observatories (see the next subsection). These measurement sites provide temporally highly resolved, long-term observations of single columns of the atmosphere. The second pillar is satellites and ground-based networks, with spatially resolved, long-term observations of specific variables of the atmosphere (see the section “Full domain”). Last but not least, the short-term observations from measurement campaigns and intensive observational periods (IOPs) are the third pillar (see the section “Measurement campaigns and IOPs”). All three types of observations provide specific information about the atmospheric conditions. Our goal is to collect this considerable quantity of observations and to make these data available and usable for the whole research community.
Datasets of these different observation types are shown as examples for 11 May 2013 in the upcoming subsections. On that day, Germany was mainly influenced by a ridge of high pressure with its center at the Azores. During the day, a weak cold front passed from western to northern Germany, causing frontal rain along its path. Low wind speeds dominated, together with atmospheric instability, over most parts of Germany. Broken cloudiness with convectively induced rain showers was mainly present during the day due to strong atmospheric instability and heating by the sun. Together with a weak upper-level flow, clouds and precipitation were predominantly induced by local effects. Therefore, this day is ideal for investigating cloud and precipitation processes in detail with the various observations available within central Europe.
Many institutions are operating measurement sites with a large variety of different instruments. Such a measurement site could be a permanent meteorological observatory, like the Richard Aßmann Observatory in Lindenberg (near Berlin), operated by the DWD, or a so-called supersite, like the Jülich Observatory for Cloud Evolution Core Facility (JOYCE-CF; in Jülich, near Cologne) (Löhnert et al. 2015). Furthermore, mobile facilities are operated during measurement campaigns, like the Leipzig Aerosol and Cloud Remote Observation System (LACROS), from the Leibniz Institute for Tropospheric Research in Leipzig. Figure 1 and Table 1 show an overview of the locations and the main focus of these meteorological observatories.
An important characteristic of such a site is the continuous observation of the atmosphere with a suite of different instruments. In particular, the combination of remote sensing (active and passive), near-surface, and tower measurements creates a unique added value that makes a meteorological observatory, both fixed and mobile, vital for meteorological research and operational weather forecasts. The high density of instruments and measurements allows a very precise description of the state of the atmosphere above the measurement site, with a high temporal resolution, and, mostly, over long time periods. A clear advantage is the combination of different instruments and measurement methods, which allows the development of synergy products that provide additional information like cloud thickness. One example is the Cloudnet data product, which combines observations of a cloud radar, a microwave radiometer, and a ceilometer (Illingworth et al. 2007) into synergistic information about the cloud properties. Compared with single-sensor measurements, these data products are of high value and allow for a more advanced evaluation of weather forecast models, such as the evaluation of liquid water path (LWP).
We present JOYCE-CF as an example of a supersite that contributes data to SAMD. JOYCE-CF is located in Jülich and jointly operated by the Universities of Cologne and Bonn as well as the Research Center Jülich. It recently became a core facility funded by the German Research Foundation (DFG), which includes the aim of long-term, continuous, and sustainable observations.
The focus of JOYCE-CF is on the detailed description of processes around clouds and precipitation by ground-based remote sensing (Löhnert et al. 2015). JOYCE-CF contains the column observations at the research center Jülich as well as the two scanning X-band radars at the Sophienhöhe and in Bonn (the main instruments are shown in Fig. 2). All these instruments are measuring continuously with a high temporal resolution of mostly 30 s or better. Because of the excellent cloud observing capabilities, JOYCE-CF was chosen as the location for the HD(CP)2 Observation Prototype Experiment (HOPE) in 2013 (Macke et al. 2017; see also the section “Measurement campaigns and IOPs” herein).
The measurement example of the 24-h time series for 11 May 2013 (Fig. 3) shows the complementary information that can be obtained by the various observation techniques. The Cloudnet target classification combines observations of a cloud radar, microwave radiometer (MWR), and ceilometer to identify the dominant type of hydrometeors or aerosol at different height levels. There were several rain showers during the day, starting between 1000 and 1100 UTC (Fig. 3a). The Doppler lidar allows the quantification of the boundary layer wind profile (Fig. 3b). Observations of the microwave radiometer present the temperature profile of the boundary layer, which clearly responded to the first rain shower at 1100 UTC with a cooling of the lowest 1,000 m (Fig. 3c). Around 1400 UTC a short dry spell moved over the area, indicated by a drop in integrated water vapor (IWV) of approximately 4 kg m⁻². Altogether, the development of the atmospheric state in the column at JOYCE on 11 May 2013 can be well described by the instrumentation.
Full domain (ground-based networks and satellite products).
The observatories are able to yield a large amount of information with a very high temporal resolution, but just for one location and profile. Ground-based networks and satellites gather information about specific single variables only, but provide a spatially resolved view of the atmosphere. Table 2 gives a short overview of the networks that are currently available in SAMD.
The DWD operates a large number of meteorological instruments and sensor networks. The network of C-band rain radars is especially important in this context. The positions of the 17 radar instruments (Fig. 1) guarantee an almost 100% coverage across Germany (Helmert et al. 2014). The operational calibration procedure Radar-Online-Aneichung (RADOLAN; Weigl et al. 2004) generates temporally and spatially highly resolved quantitative precipitation data through a combination of radar and rain gauge measurements (an example is given in Fig. 4a). The resulting data are used, among other things, for assimilation into the operational weather forecast model Consortium for Small-Scale Modeling (COSMO-DE; see Stephan et al. 2008). The data show the precipitation along the front (at 1200 UTC in the western part of Germany) and the small-scale convective precipitation over large parts of Germany, as described in the section “The German cloud and precipitation research laboratory.” Furthermore, the DWD operates a high-density network of ceilometers. More than 160 ceilometer instruments are distributed across Germany, collecting information about the vertical structure of the atmosphere. One important product is the cloud-base height (see Fig. 4d for the position of the ceilometers and the observed cloud-base height for the example day: 11 May 2013).
The Helmholtz Centre Potsdam operates a network of roughly 500 Global Navigation Satellite Systems (GNSS) stations, distributed across the country. One resulting meteorological data product is the integrated water vapor (Fig. 4c), which is used, for example, for the operational assimilation of regional weather forecast models and for several scientific investigations (e.g., Ning et al. 2016; Zus et al. 2015).
In addition to ground-based networks, satellites offer a vast number of valuable observations. These observations from space have been part of operational weather forecasting for more than 30 years and therefore give the opportunity to analyze long-term spatial data on a global scale. Satellites in lower orbits scan the Earth with their comparatively narrow swaths and cross a certain location of the Earth at certain times only. These satellites carry active and passive instruments, such as radars, lidars, and radiometers at different wavelengths. The swath width of such scanning satellites ranges from less than 2 km, such as for the nadir-pointing Cloud Profiling Radar (Durden and Boain 2004) on CloudSat, to some thousand kilometers for instruments like the Moderate Resolution Imaging Spectroradiometer (MODIS) or the Advanced Very High Resolution Radiometer (AVHRR). Naturally, the resolution of the instruments covers a broad range, from the order of 1 km to the order of 10 km.
In contrast, geostationary satellites stay at fixed positions over a certain area of the Earth at a height of about 35,800 km. Such satellites continuously observe the surface and the atmosphere in that area, but have a coarser spatial resolution due to their height. One important example of geostationary satellites is the Meteosat Second Generation (MSG) series with, among others, the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on board. Figure 4b shows one observed variable, the SEVIRI cloud-top height. The comparison of the SEVIRI cloud-top height with the precipitation rate and the integrated water vapor allows a more detailed interpretation than each single dataset.
Measurement campaigns and IOPs.
Most of the above-mentioned instruments and techniques provide continuous data streams. Those are, however, typically derived for standard operation modes for which the locations and periods of measurements, as well as the measured parameters, are fixed. This ensures the recording of long-lasting, well-defined datasets with the drawback that the spatiotemporal coverage of observations or of the observed parameters cannot be easily modified or adapted to custom requirements. Nonetheless, flexible instrumental setups are frequently required to address certain key questions of atmospheric sciences appropriately. Such key questions are usually devoted either to enhancing the understanding of specific meteorological processes or to the evaluation of atmospheric models and the parameterizations used therein. The deployment of instruments in order to address such specific requirements is usually realized by means of field experiments or IOPs. A selection of recently performed field experiments, which incorporated multiple German research institutions, is presented in Table 3. LAUNCH was dedicated to the provision of a dataset for data assimilation experiments using high-resolution water vapor profiling systems in regional numerical weather prediction (NWP) modeling. The obtained dataset was used to demonstrate that regional modeling of the water vapor field improves significantly when continuous observations of three Raman water vapor lidars are incorporated in addition to regularly launched radiosondes (Grzeschik et al. 2008). COPS was dedicated to collecting datasets of high accuracy and resolution, as demanded for the improvement of quantitative precipitation forecasts. The focus of the campaign, conducted in the German Black Forest, was set on orographically induced precipitation (Wulfmeyer et al. 2011).
The objective of HOPE was to improve the representation of clouds and precipitation in high-resolution numerical modeling, as aspired to within the framework of HD(CP)2 (see the next subsection). HOPE took place in April and May 2013 near Jülich (JOYCE-CF). The instrumentation included a radio sounding station, around 30 remote sensing instruments, 99 pyranometers, and 6 energy balance stations operated at different sites, some of them in synergy (Macke et al. 2017).
Field experiments are not necessarily restricted to ground-based observations. Airborne and shipborne observation platforms provide a high flexibility in terms of required spatiotemporal coverage and instrumental needs. Research institutes like the Alfred Wegener Institute Bremerhaven (AWI) and the German Aerospace Center (DLR) operate different research vessels (RV), like Polarstern and Sonne, and research aircraft, like the High Altitude and Long Range Research Aircraft (HALO) Gulfstream G-550 and the Dassault Falcon 20E–D-CMET (Falcon for short). Recent prominent examples of airborne campaigns are the Midlatitude Cirrus (ML-CIRRUS) experiment (Voigt et al. 2017) and the Next-Generation Aircraft Remote Sensing for Validation (NARVAL) studies I and II (Klepp et al. 2014).
The atmospheric observations collected during field experiments, research flights, and ship cruises are in general of outstanding relevance for state-of-the-art atmospheric research. The project partners are thus usually required to provide all data records and obtained results in publicly available databases in order to make the measurements available to the scientific community. This requirement is additionally substantiated by the large amount of funding required for such intensive observation periods, giving further motivation for storing the measured data for the long term. Future usability of such datasets, however, relies considerably on the availability of the data in databases that are easy to use for both data providers and users.
For all meteorological observations, one big challenge is to guarantee high-quality, controlled, and long-term observations, which are necessary for climate studies. A further challenge is the management of the large amount of data from these measurements, including different product levels. All these data have to be prepared for reuse, collected, stored, and made available with a precise description with defined metadata—in a sustainable way.
A NEW ARCHIVE FOR METEOROLOGICAL MEASUREMENT DATA: SAMD.
The starting point: Supporting the big leap in climate modeling—Large-eddy resolving simulations.
The biggest uncertainties of the latest atmospheric models are still caused by the lack of knowledge with regard to parameterizations of cloud and precipitation processes (IPCC 2007; Forster et al. 2007). The German research initiative HD(CP)2 aims at a big leap to overcome some of the necessary parameterizations by performing high-resolution large-eddy simulations.
The novel, unified Icosahedral Non-Hydrostatic (ICON) model was jointly developed by the German Weather Service and the Max Planck Institute for Meteorology (MPI). This model framework can be used for large-scale climate simulations, weather predictions, and down to finescale LES studies (Dipankar et al. 2015), all sharing the same code base (Zängl et al. 2015). Realistic LES simulations with a horizontal resolution of up to 156 m are conducted for all of Germany (900 km × 800 km) for several days within the HD(CP)2 project without any convection parameterization. These novel high-resolution experiments provide an unprecedented three-dimensional insight into clouds and precipitation, but it is important to demonstrate their realism. For that reason, a variety of new appropriate observations at similar scales are necessary for a comprehensive model evaluation, model development, and cloud and precipitation process studies at these scales. Therefore, all datasets have to be easily accessible and comprehensible even for non–observational experts, but simultaneously they have to account for all sensor characteristics. This led to the initial idea to create the new standardized SAMD archive from which further challenges and demands arose.
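To give a feeling for the scale of such a simulation, the following back-of-the-envelope sketch estimates the number of horizontal grid columns for the domain and resolution quoted above. It assumes square cells on a rectangular domain, which is a simplification: ICON uses a triangular grid, so the real cell count will differ. The 2.8-km comparison grid and the function name are assumptions made only for illustration.

```python
# Rough estimate only: assumes square cells on a rectangular domain.
# ICON itself uses a triangular grid, so the true cell count differs.


def horizontal_columns(domain_x_km: float, domain_y_km: float, spacing_m: float) -> int:
    """Return the approximate number of grid columns covering the domain."""
    nx = domain_x_km * 1000.0 / spacing_m
    ny = domain_y_km * 1000.0 / spacing_m
    return round(nx * ny)


# Domain size and resolution as quoted in the text.
n_les = horizontal_columns(900, 800, 156)

# Hypothetical coarser grid, chosen here only for comparison.
n_coarse = horizontal_columns(900, 800, 2800)

print(f"156 m spacing: about {n_les / 1e6:.1f} million columns")
print(f"2.8 km spacing: about {n_coarse / 1e3:.0f} thousand columns")
print(f"ratio: about {n_les / n_coarse:.0f}")
```

Under these assumptions the fine grid has a few hundred times more columns than the coarse one, which illustrates why observations at comparably fine scales are needed for the evaluation.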
For example, the multitude of different sensors, observation strategies, data formats, and institutions involved in, for instance, the corresponding HOPE campaign made it necessary to develop one common HD(CP)2 observational product standard, based on the general Climate Forecast (CF) conventions (Gregory 2003) and considering all instruments. The common standard is crucial to make this large dataset accessible and useful for the whole community, which was also the motivation for all data providers and users to participate in the creation of this standard. A common naming scheme for each variable, sensor, and dataset was developed, which has since become the basis for other German research projects [e.g., Urban Climate under Change (UC2)]. The new and highly standardized database SAMD was thus set up using the common data format (see Lammert et al. 2018), making observational data as easily accessible and usable as model data in the Coupled Model Intercomparison Project (CMIP). These advantages help to increase the usage of observations throughout the whole community and to bridge the gaps between model developers, model users, and data providers.
Data format and description: Observations in the context of clouds and precipitation cover a wide range of different instruments with different temporal and spatial resolutions, data formats, and processing levels. Therefore, finding a common and long-lasting data format is one of the key challenges in building a long-term archive. The SAMD product standard (Lammert et al. 2018) was developed, involving more than 20 participating institutions, in order to define a common data and metadata specification using the well-established Network Common Data Form (netCDF; Unidata 2016). This data format is commonly known and allows a very flexible storage of all relevant metadata directly in the header of each data file. The SAMD standard defines, for the first time in Germany, an overarching and highly standardized format for a variety of cloud and precipitation measurements.
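As an illustration of what "metadata directly in the header" can mean in practice, the following sketch assembles a small self-describing header and prints it in a text form loosely modeled on the CDL notation used for netCDF files. It is not taken from the SAMD product standard: the attribute selection, the CF version, the institution, and the file name are placeholders chosen for this example, and no netCDF library is used so that the sketch stays self-contained.

```python
from typing import Dict, Tuple

# Illustrative placeholders only; these are not real SAMD records.
DIMENSIONS: Dict[str, int] = {"time": 2880}  # 24 h sampled every 30 s (assumed)

GLOBAL_ATTRS: Dict[str, str] = {
    "Conventions": "CF-1.6",  # assumed CF version
    "title": "Integrated water vapor, example file",
    "institution": "Example University",  # hypothetical
    "source": "microwave radiometer",
}

# name -> (data type, dimensions, attributes)
VARIABLES: Dict[str, Tuple[str, Tuple[str, ...], Dict[str, str]]] = {
    "time": ("double", ("time",), {
        "standard_name": "time",
        "units": "seconds since 1970-01-01 00:00:00",
    }),
    "iwv": ("float", ("time",), {
        "standard_name": "atmosphere_mass_content_of_water_vapor",
        "long_name": "integrated water vapor",
        "units": "kg m-2",
    }),
}


def render_header(name: str) -> str:
    """Render dimensions, variables, and attributes as CDL-like text."""
    lines = [f"netcdf {name} {{", "dimensions:"]
    for dim, size in DIMENSIONS.items():
        lines.append(f"    {dim} = {size} ;")
    lines.append("variables:")
    for var, (vtype, dims, attrs) in VARIABLES.items():
        lines.append(f"    {vtype} {var}({', '.join(dims)}) ;")
        for key, value in attrs.items():
            lines.append(f'        {var}:{key} = "{value}" ;')
    lines.append("// global attributes:")
    for key, value in GLOBAL_ATTRS.items():
        lines.append(f'        :{key} = "{value}" ;')
    lines.append("}")
    return "\n".join(lines)


header = render_header("example_iwv")
print(header)
```

Because units, names, and provenance travel with the data, a user can interpret a file without consulting the instrument operator, which is the property the standard relies on.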
Data policy: The high number of participating institutions also means different solutions regarding the official data policy for observational data. An exceptional achievement for SAMD is the common SAMD data policy accepted by all the diverse data providers, offering open data access to all noncommercial users. This supports the existing open data initiatives worldwide and promotes innovation by sharing data, which is one of the leading principles of SAMD.
Data quality management: All SAMD data are verified by an extensive two-stage quality-control system to guarantee a high level of standardization and quality. In the first step, an automatic quality check examines all newly uploaded files for compliance with the SAMD product standard and the netCDF CF conventions as well as for the correct file and variable naming scheme. In the second step, the datasets undergo a scientific check by the data management scientist regarding plausibility and consistency. Only this quality management system ensures the high level of standardization provided by SAMD (Fig. SB1).
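The first, automatic stage of such a quality check could look roughly like the sketch below. It is a minimal illustration, not the actual SAMD checker: the file naming pattern, the lists of required attributes, and the function name are all hypothetical stand-ins for the rules defined in the SAMD product standard (Lammert et al. 2018).

```python
import re
from typing import Dict, List

# Hypothetical naming scheme, for illustration only:
# <project>_<institution>_<instrument>_l<level>_<variable>_v<NN>_<YYYYMMDDhhmmss>.nc
FILENAME_PATTERN = re.compile(
    r"^[a-z0-9]+_[a-z0-9]+_[a-z0-9]+_l[0-9]_[a-z0-9]+_v[0-9]{2}_[0-9]{14}\.nc$"
)

# Assumed attribute requirements; the real standard may ask for more.
REQUIRED_GLOBAL_ATTRS = ("Conventions", "title", "institution", "source")
REQUIRED_VARIABLE_ATTRS = ("units", "long_name")


def automatic_check(
    filename: str,
    global_attrs: Dict[str, str],
    variable_attrs: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems: List[str] = []
    if not FILENAME_PATTERN.match(filename):
        problems.append(f"file name does not follow the naming scheme: {filename}")
    for key in REQUIRED_GLOBAL_ATTRS:
        if key not in global_attrs:
            problems.append(f"missing global attribute: {key}")
    conventions = global_attrs.get("Conventions")
    if conventions is not None and not conventions.startswith("CF-"):
        problems.append(f"Conventions attribute is not a CF version: {conventions}")
    for var, attrs in variable_attrs.items():
        for key in REQUIRED_VARIABLE_ATTRS:
            if key not in attrs:
                problems.append(f"variable {var} lacks attribute: {key}")
    return problems


good = automatic_check(
    "hope_xyz_mwr00_l2_iwv_v00_20130511000000.nc",
    {"Conventions": "CF-1.6", "title": "t", "institution": "i", "source": "s"},
    {"iwv": {"units": "kg m-2", "long_name": "integrated water vapor"}},
)
bad = automatic_check("iwv_final_new.nc", {"title": "t"}, {"iwv": {"units": "kg m-2"}})
print(good)
print(len(bad), "problems found")
```

Collecting all problems in one pass, rather than stopping at the first, lets a data provider fix a rejected upload in a single round. The second stage (plausibility and consistency) requires scientific judgment and is not covered by this sketch.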
The reason for SAMD.
The demand for easily usable reference data for the new high-resolution simulations of the HD(CP)2 project was the primary reason to set up the SAMD archive. However, during the conception phase, the importance of a highly standardized database for all cloud and precipitation observations in central Europe became more and more obvious. For example, nowadays there are more atmospheric observations collected every second by various instruments and devices than ever before, with an incredible growth rate. New sensors, technological developments, and upcoming satellite missions (Illingworth et al. 2015), as well as planned measurement campaigns, will even speed up this growth. State-of-the-art observation systems cover almost all relevant scales on which clouds and precipitation mainly occur, for example through global long-term satellite records, high-resolution ground-based precipitation radar networks, or in situ cloud measurements gathered by research aircraft. Nevertheless, finding suitable observations for a certain analysis, region, and/or period of time has never been as complicated as it is now, owing to the diversity of sources and to technical, bureaucratic, and other obstacles in accessing the datasets.
Most cloud and precipitation measurements are conducted by a wide variety of institutions across central Europe like universities, public and private research institutes, and national, international, and European agencies [e.g., the German National Meteorological Service or the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT)], as well as many more data providers. Most of those institutions have their own data policies, data portals, and formats. Many datasets are moreover not publicly accessible at all. The data are often only stored on local servers or are spread over several different web portals like Cloudnet and ACTRIS. Furthermore, some web portals have limitations or are not freely accessible at all.
All these restrictions make working with the data more difficult for researchers and slow down the overall scientific progress. Furthermore, many valuable long-term and/or operational datasets, measurement campaign data, and small regional observations are still undiscovered by the community. Therefore, they cannot be used so far although those datasets can be very worthwhile to look at for many aspects like process understanding or the optimization of future field campaigns.
Easy access and well-standardized data are required to increase our knowledge about atmospheric principles, help to develop synergistic datasets like the Cloudnet products, and support the work on novel measurement techniques. Advancements in the accessibility, availability, and quality of observational data for the whole research community, and not only for the providers, will be the basis for an increased usage of observational data in the future. Moreover, standardized and easily accessible measurement data are crucial for an overarching model evaluation, as shown by Heinze et al. (2017), to improve current weather and climate prediction models.
Even though the need for standardization and quality control of all observational datasets creates lots of additional work for the data providers, many providers have experienced themselves the advantages during their own work with the data later on. Additionally, increasing observations require more and more standards to be able to work with different data sources at the same time. Upcoming data publishing regulations and research policies will further require well-formatted and published data, which are some incentives for contributing data to SAMD. Nevertheless, first datasets were contributed within the HD(CP)2 framework, which was of great value to prove the concept and its advantages. Nowadays, more and more external providers like the Barbados Cloud Observatory are joining this initiative.
The data in SAMD.
The SAMD archive provides easy and standardized access to a variety of observational datasets, making work with the data as easy as possible. In addition, all datasets are available under the same Creative Commons License CC BY-NC-SA, which allows free usage for noncommercial purposes and has to be confirmed only once by the user. Achieving a single license was only possible because of personal negotiations as well as generally changing data policies of national weather services and research institutions. Ground-based, airborne, and spaceborne instrument data are included in the archive, incorporating point and area measurement data. Long-term observations and ongoing, completed, and short-term measurement campaign data are available. Some of the datasets, like the Cloudnet products, are also available via other data portals. These datasets are additionally stored in the SAMD format and fulfil the SAMD requirements (like continuous time series for higher-level products) for user support. Most datasets cover the area of Germany or central Europe more broadly, but data from other regions like the tropics (e.g., Barbados Cloud Observatory; see Stevens et al. 2016) are also available via the SAMD interface. Table 4 outlines a few examples of long-term records and Table 5 some of the short-term measurements that are available. The whole database contains more than 170 different datasets with more than 200,000 files, starting in 2007, and is distributed over three different data centers (Stamnas et al. 2016).
The satellite products currently contained in SAMD are liquid and ice water path, cloud mask, cloud optical thickness, the effective radius of cloud liquid and cloud ice particles, and cloud-top altitude from SEVIRI on MSG (Roebeling et al. 2006; Bley et al. 2016). Furthermore, surface albedo and vegetation index (16-day averages), as well as surface temperature, surface longwave emissivity, and sea surface temperature (daily), from MODIS (Wan 2008) on the Terra and Aqua satellites are available. These datasets, which are also freely available (https://modis.gsfc.nasa.gov/data/), are stored for the region of central Europe in the SAMD data format as a service for the SAMD users.
In addition to existing satellite products, SAMD also contains higher-level products that were developed by research groups outside the operational framework. Such products are often difficult to obtain or contain insufficient metadata because they are stored only locally at the respective institution. SAMD makes them available to the international research community and ensures that the datasets comply with the SAMD quality standards. One example of a newly developed satellite product in SAMD is cloud thickness, which combines data from DWD’s ceilometer network with data from SEVIRI on MSG and is provided in a joint effort by the Leibniz Institute for Tropospheric Research and the University of Cologne.
All datasets mentioned in the previous sections highlight some specific type of data and can be freely downloaded from the SAMD archive (www.samd-archive.org). These observations build the basis for an easy and reliable model evaluation, proving, for example, the physical consistency of a model. The whole database will be extended by ongoing measurements and new data sources as well as through further collaborations with data providers and other databases. Future improvements (see the section “Future plans”) are planned within the upcoming months.
An easy-to-use approach.
One of the key design principles for the new SAMD archive was to use common technologies at all stages to ensure scalability, readiness for upcoming demands of even higher-resolution observations, and interoperability with other data portals and services. Therefore, all datasets are stored on distributed instances of the Unidata Thematic Real-time Environmental Distributed Data Services (THREDDS) Data Server (TDS), with a centralized control server and web portal collecting all metadata (Stamnas et al. 2016). The TDS technology allows the integration of an arbitrary number of servers and additional datasets without any complications for the users of the archive. The physical storage location of a dataset, whether on a certain local server or in a cloud computing data center, is not crucial at all, enabling almost unlimited scalability and continuous growth of SAMD. The TDS technology ensures interoperability with most other data portals, making SAMD ready for future developments.
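The location independence described above can be illustrated with a small lookup: a central index maps each dataset to whichever server hosts it, and the access URL is derived from that entry. This is a minimal sketch under stated assumptions, not the SAMD implementation. The host names, dataset identifiers, and the `access_url` and `move_dataset` helpers are hypothetical, and the two path prefixes are the conventional default TDS service bases for plain HTTP download and OPeNDAP, which a given installation may configure differently.

```python
from urllib.parse import quote

# Conventional default TDS service base paths (an installation may differ).
SERVICE_BASES = {
    "http": "/thredds/fileServer/",
    "opendap": "/thredds/dodsC/",
}

# Hypothetical central index: dataset identifier -> hosting server.
DATASET_HOSTS = {
    "ceilometer_network": "https://node-a.example.org",
    "rain_radar": "https://node-b.example.org",
}


def access_url(dataset_id: str, file_path: str, service: str = "http") -> str:
    """Build an access URL without the caller knowing where the file lives."""
    host = DATASET_HOSTS[dataset_id].rstrip("/")
    base = SERVICE_BASES[service]
    # Percent-encode the path but keep the slashes between path segments.
    return host + base + quote(file_path.lstrip("/"))


def move_dataset(dataset_id: str, new_host: str) -> None:
    """Relocating a dataset only changes the index entry, not user code."""
    DATASET_HOSTS[dataset_id] = new_host
```

In this sketch a user always calls `access_url("rain_radar", ...)`; if the dataset is later moved with `move_dataset`, the same call simply resolves to the new server.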
SAMD is accessible through one data portal (www.samd-archive.org; Fig. 5), offering various possibilities to search for datasets, explore new data, or get all necessary information about them. Datasets can be selected by region, instrument, variable group, time, and/or other parameters. The data browser (see Fig. 5) allows the collection of datasets via a shopping cart function, which generates a wget script for easy download of the chosen data (wget is a common tool for retrieving files via HTTP, HTTPS, and FTP).
Access is granted through the widely used OpenID system of the Earth System Grid Federation (ESGF), which ensures long-term user management and little extra organizational work for users and archive operators. The acceptance of the SAMD data policy for all datasets is also controlled centrally by the OpenID system. Users therefore have to confirm the policy just once to gain data access, without any additional barriers.
Moreover, the web portal provides extensive, standardized, and human-readable information on each available dataset. Contact addresses, associated literature, and additional information and links make it possible to investigate the data in more detail. The extensive file naming and metadata standards of the database enable automatic searching by third-party databases and services, making the uploaded observations available to an even broader community. Planned collaborations and metadata exchange with other data portals can thus be easily integrated.
Guaranteed long-term storage, a comprehensive web portal, provider registration, user management, and straightforward open data access are only a few of the prime advantages of the SAMD archive for data providers and users. These advantages help to convince more and more providers to participate and thus make the archive even more useful for the whole community, as previously inaccessible and/or hidden datasets become available to everyone through SAMD. Unique and high-resolution data gathered during the HOPE campaign (Macke et al. 2017) were among the first available datasets and were made easily accessible to more than a hundred scientists of the HD(CP)2 project.
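The selection and shopping cart workflow can be sketched as follows. This is only an illustration under assumptions: the `Dataset` record, the `select` and `wget_script` helpers, the sample catalog, and the example.org URLs are all hypothetical and do not reflect the portal's actual data model or the exact script it generates. The only wget option used is `--continue`, which resumes partially downloaded files.

```python
import shlex
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Dataset:
    name: str
    region: str
    instrument: str
    start: date
    end: date
    urls: tuple  # download URLs of the files belonging to this dataset


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    Dataset("ceilometer_a", "Germany", "ceilometer",
            date(2013, 4, 1), date(2013, 5, 31),
            ("https://example.org/data/ceilometer_a_201304.nc",
             "https://example.org/data/ceilometer_a_201305.nc")),
    Dataset("rain_radar_b", "Barbados", "radar",
            date(2010, 1, 1), date(2016, 12, 31),
            ("https://example.org/data/rain_radar_b.nc",)),
    Dataset("cloud_radar_c", "Germany", "radar",
            date(2014, 1, 1), date(2014, 12, 31),
            ("https://example.org/data/cloud_radar_c.nc",)),
]


def select(datasets, region=None, instrument=None, day=None):
    """Return the datasets matching every criterion that is not None."""
    cart = []
    for ds in datasets:
        if region is not None and ds.region != region:
            continue
        if instrument is not None and ds.instrument != instrument:
            continue
        if day is not None and not (ds.start <= day <= ds.end):
            continue
        cart.append(ds)
    return cart


def wget_script(cart):
    """Render a shell script with one download command per file in the cart."""
    lines = ["#!/bin/sh", "# Download script for the selected datasets"]
    for ds in cart:
        for url in ds.urls:
            # shlex.quote guards against shell metacharacters in URLs.
            lines.append("wget --continue " + shlex.quote(url))
    return "\n".join(lines) + "\n"
```

For example, `wget_script(select(CATALOG, region="Germany", day=date(2013, 5, 1)))` returns a script that downloads only the two files of the hypothetical `ceilometer_a` dataset.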
Extensive atmospheric measurement data are gathered every day by various devices and institutions. It will thus become even more challenging and important in the future to standardize these data and make them accessible to the whole research community. In addition, future evaluation of high-resolution models requires very detailed observations to take account of the new scales. Therefore, further observational datasets will be integrated into the steadily growing SAMD archive. New measurements will be updated more frequently, and almost real-time availability is planned for several operational datasets of, for example, national weather services. The integration of observations with high temporal resolution (e.g., up to 20 Hz for wind measurements) and high spatial resolution is targeted for the upcoming development of SAMD to support the evaluation of future high-resolution modeling activities. Upcoming sensors and datasets like AllSky cameras, X-band radars, or airborne spectrometers rapidly generate large datasets of several terabytes. They are of great interest but are even more challenging to serve in a timely manner. Measurements of the Barbados Cloud Observatory, operated by the Max Planck Institute for Meteorology, will also be extended in SAMD. First datasets of the S-band rain radar and the weather station are already available (see Table 4). This will enhance the archive with valuable cloud and precipitation records from beyond the borders of Germany. Beyond ground-based and satellite measurements, further data from the German research aircraft HALO and other German airplanes are planned to be included in the SAMD archive, introducing a completely new dimension to the database.
Future collaborations, data interoperability, and metadata exchange with other Earth system databases worldwide will further increase the outreach of the available SAMD datasets. At the same time, the SAMD archive needs to establish itself through its advantages and unique features compared to the numerous other new and existing archives. The collaborations additionally require standardized interfaces and exchange formats, which still have to be negotiated and implemented by all databases. One good example of an existing collaboration is the cooperation with the World Data Center for Climate (WDCC), hosted by the German Climate Computing Center (DKRZ) in Hamburg. All closed datasets will be sustainably stored in the WDCC with a digital object identifier (DOI) reference number for long-term archiving. One of the biggest challenges in the future will be to make the rapidly increasing number of different archives, datasets, and sensors easily searchable and accessible, so that users can gain new insights from upcoming high-resolution observations. In this regard the standardized datasets and metadata of SAMD will help other archives and users to rapidly find their datasets.
In summary, the new SAMD archive offers the following benefits:
Collection, storage, and provision of meteorological measurement data
Provision of a multitude of instruments, methods, resolutions, and variables
Consistent data format for all datasets (netCDF)
Extensive documentation of SAMD data standards
Two-stage quality management
Ease of use for data providers and archive users
Sustainable research data management and infrastructure (TDS, web portal)
Thus, the SAMD archive provides a sustainable, easy-to-use, and long-term archive infrastructure for meteorological measurements, promoting their use by an even larger research community.
This work is funded by the German Federal Ministry of Education and Research within the framework program “Research for Sustainable Development” (FONA; www.fona.de), under FKZ: 01LK1209A-E, 01LK1210A-E, 01LK1211A-C, and 01LK1212A-F. We gratefully acknowledge the efforts of numerous project scientists within HD(CP)2 who have, since 2013, actively contributed to development of SAMD. The authors thank all institutions (weather services, research organizations, universities, etc.) that provided their observational data to SAMD. The SAMD archive uses the THREDDS Data Server (TDS) technology developed by UCAR/Unidata (http://doi.org/10.5065/D6N014KG).
Thanks to Annika Jahnke-Bornemann, Remon Sadikni, and Magnus Bornemann (Integrated Climate Data Center, University Hamburg) for their great support. Last but not least, thanks to the Data Management Team of the German Climate Computing Center for the support regarding the long-term archiving in WDCC.