1. Introduction
Interlaboratory comparison (ILC) is a tool that serves a number of purposes, and its use is increasing internationally. ILC results are usually used to evaluate the performance of laboratories for specific measurements and to monitor laboratories’ continuing performance. Not only the laboratories and their customers but also other parties, such as governmental bodies, national accreditation bodies, and nongovernmental organizations, are interested in the quality of performance of laboratories providing calibration and testing services. The most important climatological parameters measured by the meteorology community are temperature, relative humidity, and pressure, and measurements of these parameters need international recognition and mutual acceptance among all countries.
Manual and automatic observations made by different types of weather stations are covered by the World Meteorological Organization’s (WMO’s) general instrument requirements. These requirements differ depending on the applications and services (weather forecasting and nowcasting, modeling of weather phenomena, historical analyses, airport weather systems, etc.) provided by national meteorological and hydrological services (NMHSs). Despite the growing development and importance of remote sensing technology provided by satellite networks, surface-based observations remain of utmost importance in climate analysis. The quality of surface-based observations depends on a number of factors, such as the equipment used, the metrological traceability, and the quality of the NMHS system and staff. The best tool for evaluating the quality of surface-based observations is to perform ILCs for the critical measured parameters, which provides evidence for one part of the quality assurance system implemented at accredited or nonaccredited NMHSs.
According to the requirements of the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 17025:2005 standard for calibration and testing laboratories, ILCs should be performed regularly, at least once every five years, as a crucial element for confirming the quality of measurement results.
This ILC was organized by the laboratory of the Slovenian Environment Agency (ARSO), a Regional Instrument Centre (RIC) in WMO Regional Association (RA) VI (Europe), which acted as the coordinator and as the reference laboratory for pressure, temperature, and humidity. The University of Ljubljana, Faculty of Electrical Engineering, Laboratory of Metrology and Quality (UL-FE/LMK), which was an accredited interlaboratory comparison (proficiency testing) provider according to ISO/IEC 17043:2010 and the designated institute within the National Metrology Institute, also acted as a reference laboratory for temperature and humidity and was the data analysis coordinator. The MIRS/UL-FE/LMK, as the national laboratory for thermodynamic temperature and humidity, is an associated member of EURAMET (see http://www.euramet.org). The procedure and protocol for this ILC were developed within the European Metrology Research Programme (EMRP) of EURAMET, in the joint research project “MeteoMet2” (Metrology for Meteorology; Merlone et al. 2018).
Participating laboratories were national meteorological laboratories from 17 countries: the Croatian Meteorological and Hydrological Service (HR), the Department of Meteorology (CY), the Estonian Environment Agency (EE), the Finnish Meteorological Institute (FI), Météo France (a RIC; FR), Deutscher Wetterdienst–Hamburg (DE1), Deutscher Wetterdienst–Oberschleissheim (DE2), the Hungarian Meteorological Service (HU), Met Éireann (IE), the Latvian Environment, Geology and Meteorology Centre (LV), the Lithuanian Hydrometeorological Service under the Ministry of Environment (LT), the Institute of Meteorology and Water Management, National Research Institute, Poland (PL), the Republic Hydrometeorological Service of Serbia (RS), Slovenský hydrometeorologický ústav (a RIC; SK), the Slovenian Environment Agency (a RIC; SI1), the University of Ljubljana, Faculty of Electrical Engineering (SI2), Agencia Estatal de Meteorología (ES), and the Turkish State Meteorological Service (TR). To anonymize the results, each laboratory received a random two-digit code. The coding system was known only to the data analysis coordinator.
It was agreed that the assigned reference values would be determined from the results of the UL-FE/LMK (except for pressure) and the three RIC laboratories (Météo France, the Slovakian Hydrometeorological Institute, and the Slovenian Environment Agency). The ILC was conducted according to the requirements of the ISO/IEC 17043:2010 standard. A protocol was prepared in advance, and a final report was issued at the end (Bojkovski et al. 2018).
2. Specification of the interlaboratory comparison
The purpose of the interlaboratory comparison was to compare the results of the participating laboratories during calibrations of thermometers, hygrometers, and barometers. Because of the large number of participating laboratories, two loops were organized. Each loop had the same type of instrumentation but different serial numbers. The circulating items were the following:
Keysight/Agilent/Hewlett Packard 34420A digital readout with two ELPRO Pt100s; calibration range from −30° to 40°C.
Vaisala HMP155 digital hygrometer; calibration range from 10% to 95% RH at room temperature.
Vaisala PTB220 digital barometer; calibration range from 800 to 1100 hPa.
The reported expanded uncertainty of measurement was stated as the standard uncertainty of measurement multiplied by the coverage factor k = 2, which for a normal distribution corresponds to a coverage probability of approximately 95%. The standard uncertainty of measurement was determined in accordance with the publication “Evaluation of measurement data—Guide to the expression of uncertainty in measurement” (JCGM 100 2008). Prior to the calibration, test measurements were performed to assess the stability of the instruments and to detect any problems that could have occurred as a consequence of transport. From these measurements it was concluded that all the instruments were stable enough and that their short-term stability did not influence the final results of the interlaboratory comparison. Calibrations were carried out at a nominal ambient temperature of 23°C. The ambient temperature, relative humidity, and air pressure during calibration had to be reported. The results were reported electronically in a special report form, in which the participants were also asked to fill in details about the applied method, equipment, and traceability.
The interlaboratory comparison was organized in two loops. In the first loop the equipment circulated through the laboratories SI1, SI2, FR, IE, DE1, DE2, SK, ES, HU, HR, CY, NL, SI1, and SI2, and in the second loop through SI1, SI2, FR, FI, EE, LV, LT, PL, SK, RS, TR, SI1, and SI2. In both loops there were regional instrument center laboratories at the beginning, in the middle, and at the end of the loop. As a result, it was possible to monitor the performance of the circulating instruments continuously and to react if any instability was noted. Each participating laboratory had 3 weeks for calibration, including transport to the next laboratory. The deadline for reporting the results was 4 weeks after the equipment had left the laboratory. The duration of the ILC was approximately 40 weeks. The measurements were performed from June 2016 until March 2017.
For temperature, the subject of the ILC was the calibration of two Pt100s in combination with the Keysight/Agilent/Hewlett Packard 34420A. The HP 34420A is a digital multimeter (readout) that can measure resistance with a four-wire ohm measurement and a 1-mA test current. It combines low-noise nanovolt input circuits with a high-stability current source to provide precise low-level resistance measurements. In this ILC it was used as the digital readout of the measured temperature. The calibration was performed at the following measurement points, within tolerances of ±0.2°C, using standard laboratory procedures: −30°, −20°, −10°, 0°, 10°, 20°, 30°, and 40°C.
For relative humidity, the subject of the ILC was the calibration of the capacitive hygrometer Vaisala HMP155. The calibration was performed at the following measurement points, within tolerances of ±3% RH and at a temperature of 20°C, using standard laboratory procedures: 10%, 20%, 35%, 55%, 75%, 90%, and 95% RH.
For air pressure, the subject of the ILC was the calibration of the digital barometer Vaisala PTB220 ACA2A3A1AB. The calibration started at the minimum calibration point, proceeded in steps of increasing pressure, and returned in steps of decreasing pressure. The calibration was performed at the following measurement points, within tolerances of 20 hPa, using standard laboratory procedures: 800, 850, 900, 950, 1000, 1050, and 1100 hPa.
3. Measurement data and uncertainties processing
a. The uncertainties of assigned values

Fig. 1. Comparing two laboratories from two different loops via linking laboratories. There are four (three for pressure) linking laboratories that participated in both loops.
Citation: Journal of Atmospheric and Oceanic Technology 36, 2; 10.1175/JTECH-D-18-0131.1

The uncertainty analysis was done in accordance with the GUM (Guide to the Expression of Uncertainty in Measurement; JCGM 100 2008). The uncertainty contributions taken into account included the stability of the measured reference thermometers; the determined drift of the measured references (platinum resistance thermometers, PRTs) was added to the total uncertainty as a contribution with a rectangular distribution. All uncertainties are stated at the 95% confidence level (k = 2).
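The treatment of drift as a rectangular contribution can be illustrated with a minimal sketch. It assumes that the drift is expressed as symmetric limits of ± a half-width, that the contributions are uncorrelated, and that they are combined in quadrature as in the GUM; the function names and the numerical values are hypothetical and are not taken from the ILC data.

```python
import math

def rectangular_standard_uncertainty(half_width):
    """Standard uncertainty of a rectangular distribution with limits +/- half_width."""
    return half_width / math.sqrt(3.0)

def expanded_uncertainty(standard_uncertainties, k=2.0):
    """Combine uncorrelated standard uncertainties in quadrature and expand by k."""
    combined = math.sqrt(sum(u * u for u in standard_uncertainties))
    return k * combined

# Hypothetical budget for one temperature point, in degrees Celsius.
u_calibration = 0.010      # assumed standard uncertainty of the calibration itself
drift_half_width = 0.006   # assumed drift, treated as limits of +/- 0.006 degC
u_drift = rectangular_standard_uncertainty(drift_half_width)

U = expanded_uncertainty([u_calibration, u_drift], k=2.0)
print(f"u_drift = {u_drift:.4f} degC, U (k = 2) = {U:.4f} degC")
```

Dividing the half-width by the square root of 3 converts the limits of the rectangular distribution into a standard uncertainty before it enters the quadrature sum.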
b. Criteria for performance evaluation
The assigned values xref were calculated as the weighted mean of the results of the reference laboratories (RICs and UL-FE/LMK for temperature and humidity; RICs for pressure). The uncertainty of the assigned value Uref was calculated as the uncertainty of the weighted mean, using the uncertainties of the reference laboratories at each calibration point, as described in Eqs. (1)–(10).
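As an illustration of the weighted-mean approach, the following sketch computes an assigned value and its expanded uncertainty from a set of reference results. It uses the conventional inverse-variance weighted mean for uncorrelated results (as in Cox 2002) and should not be read as a reproduction of Eqs. (1)–(10), which may also cover the linking of the two loops. The function name and the input values are hypothetical.

```python
import math

def weighted_mean(values, expanded_uncertainties, k=2.0):
    """Inverse-variance weighted mean of uncorrelated results.

    values: results of the reference laboratories at one calibration point.
    expanded_uncertainties: their expanded uncertainties (coverage factor k).
    Returns (x_ref, U_ref), with U_ref expanded by the same k.
    """
    if not values or len(values) != len(expanded_uncertainties):
        raise ValueError("need one uncertainty per value")
    # Convert expanded to standard uncertainties, then weight by 1/u^2.
    weights = [1.0 / (U / k) ** 2 for U in expanded_uncertainties]
    total = sum(weights)
    x_ref = sum(w * x for w, x in zip(weights, values)) / total
    u_ref = 1.0 / math.sqrt(total)
    return x_ref, k * u_ref

# Hypothetical corrections (hPa) reported by three reference laboratories.
x_ref, U_ref = weighted_mean([0.02, 0.05, 0.03], [0.10, 0.15, 0.12])
print(f"x_ref = {x_ref:.4f} hPa, U_ref (k = 2) = {U_ref:.4f} hPa")
```

With this weighting, a reference laboratory that reports a smaller uncertainty has a proportionally larger influence on the assigned value.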
It is common to use a critical value of 1.0 with the En number because it is calculated using expanded uncertainties in the denominator of Eq. (12) instead of standard deviations. It has to be emphasized that it is crucial that the participants of the interlaboratory comparison understand and properly evaluate all the uncertainty sources in a uniform way. Only then can a value of
4. Results of interlaboratory comparison
In this section the results for pressure, relative humidity, and temperature are presented in graphical form, together with the En values and the expanded uncertainty of the difference (Rx − AV).
Figure 2 shows an example of the results for pressure calibration, given as Rx − AV (the result of each laboratory minus the assigned value calculated in accordance with Eq. (10)), together with the expanded uncertainties, for measurements made at the measuring point of 800 hPa. The laboratories coded 22, 41, 85, and 91 did not perform a calibration at this measuring point. The result of laboratory 53 is unsatisfactory (the result together with its expanded uncertainty does not overlap with 0), and the result of laboratory 58 is on the limit of acceptability. Figure 3 presents a summary of the En numbers for all calibration points from 800 to 1100 hPa. Laboratory 21 and laboratory 58 each have one unsatisfactory result, at 1100 hPa, while laboratory 53 has the majority of its En values above 1.
Figure 4 shows an example of the results for relative humidity calibration, given as Rx − AV together with the expanded uncertainties, for measurements made at the measuring point of 55% RH at room temperature. All the laboratories performed a calibration at this measuring point. The results of laboratory 21 are unsatisfactory. Figure 5 presents a summary of the En numbers for all calibration points from 10% to 95% RH. Laboratory 21 had all En values smaller than −1, and all of its results were therefore unsatisfactory. Further analysis showed that this was an example of unreasonably small calibration and measurement capabilities (CMCs), which resulted in large |En| values. The laboratory did not take into account a number of important uncertainty sources, which resulted in an expanded uncertainty at the level of the uncertainty of the reference meter used. Laboratory 96 had one measurement, at 10% RH, with En close to −1, but it was still satisfactory. Laboratory 60 had all |En| values smaller than 0.2. Further analysis showed that this was an example of unreasonably large CMCs, which resulted in small En values. Because of a lack of experience, the laboratory overestimated some of the uncertainty sources, which resulted in an expanded uncertainty that was too large for the equipment used in the calibration process.
Figure 6 shows an example of the results for temperature calibration, given as Rx − AV together with the expanded uncertainties, for measurements made at the measuring point of 0°C. All laboratories except laboratory 53 performed a calibration at this measuring point. The results of all laboratories are satisfactory; however, the uncertainties of laboratory 96 are significantly larger than those of the other laboratories. Figure 7 presents a summary of the En numbers for all calibration points from −30° to 40°C. Laboratory 21 had one En value larger than 1, at 20°C.
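How the two presentations of the results relate can be sketched in a few lines. The example assumes the standard form of the En number used in proficiency testing (ISO 13528, ISO/IEC 17043), that is, the difference Rx − AV divided by the root sum of squares of the two expanded uncertainties. Equation (12) is not reproduced here, so this form is an assumption, and the function names and values are hypothetical.

```python
import math

def en_number(x_lab, U_lab, x_ref, U_ref):
    """En number: (Rx - AV) divided by the expanded uncertainty of the difference."""
    return (x_lab - x_ref) / math.sqrt(U_lab ** 2 + U_ref ** 2)

def overlaps_zero(x_lab, U_lab, x_ref, U_ref):
    """True if the interval (Rx - AV) +/- U_diff contains 0, as read from the plots."""
    difference = x_lab - x_ref
    U_diff = math.sqrt(U_lab ** 2 + U_ref ** 2)
    return difference - U_diff <= 0.0 <= difference + U_diff

# Hypothetical pressure results at one calibration point (hPa).
x_ref, U_ref = 800.00, 0.05
for code, x_lab, U_lab in [("A", 800.03, 0.08), ("B", 800.20, 0.10)]:
    en = en_number(x_lab, U_lab, x_ref, U_ref)
    verdict = "satisfactory" if abs(en) <= 1.0 else "unsatisfactory"
    print(f"lab {code}: En = {en:+.2f} ({verdict}), "
          f"overlaps 0: {overlaps_zero(x_lab, U_lab, x_ref, U_ref)}")
```

Under this definition the two checks agree: a difference whose expanded-uncertainty bar reaches 0 in the difference plots corresponds to |En| ≤ 1 in the summary plots.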

Fig. 2. Difference Rx − AV at 800 hPa. Each result is presented with accompanying measurement uncertainty.

Fig. 3. Summary of En values for all pressure calibrations.

Fig. 4. Difference Rx − AV at 55% RH. Each result is presented with accompanying measurement uncertainty.

Fig. 5. Summary of En values for all relative humidity calibrations.

Fig. 6. Difference Rx − AV at 0°C. Results are presented for thermometers PT 100 1 and PT 100 2. Each result is presented with accompanying measurement uncertainty.

Fig. 7. Summary of En values for all temperature calibrations for PT 100 1.
5. Comments and conclusions
The ILC had 18 participating laboratories from 17 countries. The equipment used in both loops exhibited a level of stability and uncertainty that enabled full evaluation of the participating laboratories’ capabilities.
Although the majority of the laboratories that participated in the ILC are not accredited, the final results of the ILC are very good. A large majority of the results in all three fields are satisfactory (|En| ≤ 1): 98% of the 270 submitted results in the field of temperature, 96% of the 117 submitted results in the field of relative humidity, and 95% of the 784 submitted results in the field of atmospheric pressure. These results are comparable to the results of typical ILCs performed among calibration laboratories accredited in accordance with the ISO/IEC 17025 standard. To improve the quality of their measurements, laboratories with unsatisfactory results were informed and asked to check the traceability of their equipment, their uncertainty budget, and their results for any potential systematic error.
Further analysis of the unsatisfactory results showed that, in one case, the traceability of the equipment used was not established at the time of the measurements. After the ILC, the participating laboratory sent its equipment for calibration. A significant drift between the two calibrations was determined, which influenced the laboratory’s results in the ILC. Another laboratory immediately carried out a repeat bilateral comparison with the National Metrology Institute, and the results of this second comparison were satisfactory. It was concluded that the laboratory probably did not fully follow the calibration procedure for the traveling artifact and that, as a consequence, its results in the WMO RA VI interlaboratory comparison were not satisfactory. It is of utmost importance to follow fully the instructions provided by the organizer of the ILC. Finally, a further cause of unsatisfactory results was underestimated or even overlooked uncertainty contributions. A lack of full knowledge and understanding of the uncertainty sources can lead to wrong final results. Some of the uncertainties of the participating laboratories were at the level of the best national metrology laboratories or even smaller. For the current state of the art in the calibration of any instrument, it is worthwhile to check the Bureau International des Poids et Mesures (BIPM) Key Comparison Database (KCDB) annex C for the CMCs of the best national metrology laboratories.
ILCs among meteorology laboratories are not frequently organized. This ILC was the largest-scale interlaboratory comparison ever organized in the meteorological community in Europe, including not just WMO regional instrument centers but also national meteorological and hydrological services. The concept of this ILC has been extended to currently ongoing ILCs in WMO Regional Associations II (Asia) and V (southwest Pacific), with six participating laboratories, which represent subloops linked to the ILC in Europe.
The results of the interlaboratory comparison in Europe can give participating laboratories a high level of confidence in their CMCs. The CMCs depend on the quality of the laboratory’s technical infrastructure (equipment, personnel, facilities, etc.). Evaluation using the En number has some disadvantages. When a laboratory does not use its best CMCs in the comparison, or does not fully understand and evaluate all the sources of uncertainty, the results can be misleading. For example, if a laboratory reports a large uncertainty in the ILC, its En number will be rather small. The best results are those for which a laboratory has both a small |En| and a small uncertainty compared with the other laboratories and the assigned values.
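The sensitivity of the En number to the reported uncertainty can be shown with a short numerical sketch. It keeps the deviation from the assigned value fixed and varies only the expanded uncertainty declared by the laboratory, again assuming the standard root-sum-of-squares form of the En denominator. All values are hypothetical and are not taken from the ILC data.

```python
import math

def en_number(deviation, U_lab, U_ref):
    """En for a given deviation Rx - AV and the two expanded uncertainties."""
    return deviation / math.sqrt(U_lab ** 2 + U_ref ** 2)

deviation = 0.5   # hypothetical Rx - AV, in % RH
U_ref = 0.3       # hypothetical expanded uncertainty of the assigned value, in % RH
reported_U = [0.2, 0.5, 1.0, 3.0]   # same result, increasingly large declared U

en_values = [en_number(deviation, U, U_ref) for U in reported_U]
for U, en in zip(reported_U, en_values):
    verdict = "satisfactory" if abs(en) <= 1.0 else "unsatisfactory"
    print(f"U_lab = {U:.1f} % RH -> En = {en:.2f} ({verdict})")
```

The same measurement error moves from unsatisfactory to comfortably satisfactory purely through the declared uncertainty, which is why a small |En| is informative only when the uncertainty itself is realistic.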
Data processing did not include any analysis of potential common sources of traceability, such as the same traceability provider calibrating the reference instruments of two or more participating laboratories. Further investigation could provide more insight into the traceability chains and their influence on the results of the calibrations.
Acknowledgments
The authors thank all the participating laboratories for their cooperation and excellent technical assistance. Furthermore, the authors would like to thank the EMRP MeteoMet2 project partners and EURAMET. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
REFERENCES
Bojkovski, J., J. Drnovsek, D. Groselj, and G. Beges, 2018: Report on intercomparison in the field of temperature, humidity and pressure (MM-ILC-2015-THP). WMO IOM Rep. 128, 93 pp., http://www.wmo.int/pages/prog/www/IMOP/publications-IOM-series.html.
Cox, M., 2002: The evaluation of key comparison data. Metrologia, 39, 589–595, https://doi.org/10.1088/0026-1394/39/6/10.
Cox, M., 2007: The evaluation of key comparison data: Determining the largest consistent subset. Metrologia, 44, 187–200, https://doi.org/10.1088/0026-1394/44/3/005.
Heinonen, M., 2010: Report to the CCT on Key Comparison EUROMET.T-K6 (EUROMET Project no. 621): Comparison of the realizations of local dew/frost-point temperature scales in the range −50°C to +20°C. Metrologia, 47 (Tech. Suppl.), 03003, https://doi.org/10.1088/0026-1394/47/1A/03003.
ISO 5725-2:1994, 1994: Accuracy (trueness and precision) of measurement methods and results–Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method, International Organization for Standardization, https://www.iso.org/standard/11834.html.
ISO 13528:2015, 2015: Statistical methods for use in proficiency testing by interlaboratory comparison, International Organization for Standardization, https://www.iso.org/standard/56125.html.
JCGM 100, 2008: Evaluation of measurement data—Guide to the expression of uncertainty in measurement, Joint Committee for Guides in Metrology, 120 pp., https://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf.
Lorefice, S., and Coauthors, 2008: EUROMET.M.D-K4/EUROMET Project 702: Comparison of the calibrations of high-resolution hydrometers for liquid density determinations. Metrologia, 45 (Tech. Suppl.), 07008, https://doi.org/10.1088/0026-1394/45/1A/07008.
Merlone, A., and Coauthors, 2018: The MeteoMet2 project—Highlights and results. Meas. Sci. Technol., 29, 025802, https://doi.org/10.1088/1361-6501/aa99fc.
Nielsen, L., 2000: Evaluation of measurement intercomparisons by the method of least squares. DFM Tech. Rep. DFM-99-R39, Danish Institute of Fundamental Metrology, 17 pp., https://www1.bipm.org/utils/common/pdf/JCGM/nielsen_final.pdf.