Abstract

Near-real-time quality control procedures for temperature profiles collected from ships of opportunity were implemented during the 1980s in oceans across the world and from the 1990s in the Mediterranean. In this sea, the procedures were originally based on seven steps (detection of end of profile, gross range check, position control, elimination of spikes, Gaussian smoothing and resampling at 1-m intervals, general malfunction control, and comparison with climatology), complemented with initial and final visual checks. The quality of data derived from a comparison with historical data (namely, climatology) depends on the availability of a huge amount of data that can statistically represent the mean characteristics of the seawater. A significant amount of data has been collected, and the existing temperature database in the Mediterranean can now provide more information on temporal and spatial variability at monthly and mesoscales, and an improved procedure for data quality control has now been adopted. New “best” estimates of monthly temperature profiles are calculated by using a maximum likelihood method. It has been found that more than one “best estimate” temperature can be defined in particular areas and depths, as a consequence of climate variability. Additional near-real-time control procedures have been included in order to provide information on long-term variability associated with data. This information is included in metafiles to be used for reanalysis and studies on long-term variability and changes.

1. Introduction

Forecast, analysis, and reanalysis in ocean science all need data that are delivered in real time, as well as high-quality archived data. Real-time data are assimilated into numerical models, providing analysis of the state of the ocean and forecasts of future ocean characteristics. Quality control (QC) of such data normally involves a small number of working procedures compared to QC for archived data. In operational oceanography (as well as in meteorology) from time to time a reanalysis is performed, in order to check numerical results against more complete datasets. Reanalysis requires careful data quality control, during which well-defined procedures are applied.

Data quality management entails the establishment and deployment of roles, responsibilities, policies, and procedures concerning the acquisition, maintenance, dissemination, and disposition of data. Good data can be obtained by following precise methodologies and protocols during all phases of data collection, from preparation of surveys and instruments to postprocessing. These activities can be divided into two broad categories: 1) quality assurance and 2) quality control.

Quality assurance procedures include training of personnel, testing of instruments, calibration/intercomparison, and control of data and instruments during acquisition. For real-time data, the Quality Assurance of Real Time Oceanographic Data (QARTOD) project (Babin et al. 2009) defined the following seven management guidelines to ensure the quality of data:

  1. Every real-time observation distributed to the ocean community must be accompanied by a quality descriptor.

  2. All observations should be subject to some level of automated real-time quality test.

  3. Quality flags and quality test descriptions must be sufficiently described in the accompanying metadata.

  4. Observers should independently verify or calibrate a sensor before deployment.

  5. Observers should describe their methodology/calibration in the real-time metadata.

  6. Observers should quantify the level of calibration accuracy and associated expected error bounds.

  7. Manual checks on the automated procedures, the real-time data collected, and the status of the observing system must be provided by the observer on a time-scale appropriate to ensure the integrity of the observing system.

These guidelines contain some important points regarding quality assessment and control, and they express the need to include as much information as possible in the metadata. This is of paramount importance, since the data can be used, for example, for assimilation in forecast models (e.g., Pinardi and Coppini 2010) or climate variability studies (e.g., Levitus et al. 2000; Rixen et al. 2005), and for any such use information on error bounds, precision of instruments, and significance of the measurements must be provided.

Data are deemed of high quality if they correctly represent the real-world construct to which they refer. Data are affected by errors due to (among others) changes in technology and instrument precision, or the lack (in some cases) of precise quality assurance and quality control procedures. In particular, there are uncontrolled errors associated with expendable bathythermographs (XBTs), which represent a major percentage of temperature data in the Mediterranean as well as in the world’s oceans (see Gouretski and Koltermann 2007). Data errors affect assessments of climate variability/changes and climatologies (e.g., Levitus et al. 2000). For the Mediterranean, the use of data affected by errors is even more problematic because of the large seasonal and interannual variability of the seawater characteristics.

In situ observations are still far from the stage of “data deluge” that is typical of remotely sensed observations; however, different sensors are collecting data in an operational way, increasing data quantity in a manner that was not foreseen a decade ago. Although the procedures for quality assurance–quality control are well established, “good use of data” requires “best practices” involving the management of instruments, data, and metadata: there is a need to continually update and implement procedures and software.

A particular aspect of quality control normally performed is data profiling, that is, the process of examining the data available in an existing data source (e.g., a database or a file) and collecting statistics and information about those data. This requires a continuous implementation of the estimation of the “distance” between data and expected values, which is normally evaluated by averaging data across regions or in grid cells. This was done by Manzella et al. (2003), who defined the quality of data in terms of distance, measured in standard deviations, between data and mean profiles calculated in 1° × 1° cells. Maillard et al. (2001) suggested the same approach but defined some homogeneous areas in terms of similarity among profiles. This control has required the construction of monthly or seasonal climatologies (depending on data availability) using different methodologies, such as calculation of mean values in predefined areas (e.g., Manzella et al. 2003; Belkin 2009) or application of inverse methods (MEDAR Group 2002). Advances can be made in the calculation of “climatological values” by considering the statistical distribution of data. Figure 1a shows an example of profiles in a 1° × 1° cell (position 40°–41°N, 12°–13°E) during the month of February. The profiles have different vertical resolutions, and the quantity of data decreases with depth: there are more than 50 values at the surface but only about 20 values at 500 m. This means that the data analysis is carried out on a set of observations whose statistical significance varies with depth. Another example of data distribution is provided in Fig. 1b, in which the data for temperature intervals of 0.1°C in a 1° × 1° cell in the Tyrrhenian Sea are plotted. In the case shown in Fig. 1b, there is only one maximum in the discrete data distribution, which is therefore a unimodal distribution.
However, analyzing the data, it has been observed that multimodal data distributions do exist in different areas and depths (see sections 4 and 7 for further discussion). This could be due to the influence of different dynamic characteristics of the sea. For example, changes in thermohaline circulation of the eastern Mediterranean were observed during the mid-1990s (Klein et al. 1999). Former conditions were restored at the end of the decade. The “signature” of this “eastern Mediterranean transient” (EMT) is present in some areas of the Mediterranean and represents a possible state of the sea. In other words, the data can have a discrete distribution with more than one maximum and a mean value could, in principle, not be representative of any physical characteristics of an area.

Fig. 1.

(a) Example of temperature data distribution along the vertical in an area of 1° × 1°. The data were collected with different technologies and different vertical resolution. (b) Statistical distribution of temperature values in intervals of 0.1°C. The maximum frequency of data is at 14.1°C, while the mean value is 14.3°C. The calculated mean value is indicated with a dashed black line.

This paper presents a methodology for the estimation of reference temperature profiles by considering the statistical distribution of data. These new “climatological estimates” are used for the implementation of quality control procedures.

Section 2 presents the databases and the methods to construct the best estimates of vertical profiles. The methodology has been developed for the temperature profile, a parameter for which it is possible to have a large quantity of data, but it can be expanded to other variables, such as salinity. In section 3, results are compared with previous climatologies. Vertical estimated profiles are presented in section 4, while their temporal variability is discussed in section 5. New steps for quality control are presented in section 6, and conclusions are presented in section 7.

2. Materials and methods

a. Materials

Data used for the construction of the temperature climatology are principally derived from Mediterranean Data Archeology and Rescue/Mediterranean Hydrographic Atlas (MEDAR/MEDATLAS) (MEDAR Group 2002), Mass Transfer and Ecosystem Response (MATER) (Monaco and Peruzzi 2002), and Mediterranean Forecasting System (MFS) (Manzella et al. 2007). Additional data from Mediterranean Global Ocean Observing System (MedGOOS) (Santinelli et al. 2008) were also added. The data were collected over 7132 cruises and provide 24 641 980 unique seawater temperature values from September 1900 to October 2009. Temporal distribution of all available casts is shown in Fig. 2 and Table 1.

Fig. 2.

Temporal distribution of data in the entire Mediterranean. Significant amounts of data were collected from the end of the Second World War.

Table 1.

Temporal data distribution.

When historical data from different sources are used, there is a risk of including duplicates. The free software Ocean Data View (http://odv.awi.de; Schlitzer 2012) was used to check the existence of stations having position and time within predefined intervals (e.g., 0.01° for position and 0.5 days for time).
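The duplicate criterion can be illustrated with a short sketch. It is not the Ocean Data View procedure; it only flags pairs of stations whose latitude, longitude, and time all differ by less than the example tolerances quoted above. The function name `find_duplicate_pairs` and the representation of time as decimal days are assumptions made for the illustration.

```python
import numpy as np

def find_duplicate_pairs(lat, lon, time_days, pos_tol=0.01, time_tol=0.5):
    """Return index pairs (i, j), i < j, of stations closer than the tolerances.

    pos_tol is in degrees and time_tol in days (example values from the text).
    Hypothetical helper for illustration only.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    t = np.asarray(time_days, dtype=float)
    order = np.argsort(t)  # sort by time so only stations close in time are compared
    pairs = []
    for a in range(len(order)):
        i = int(order[a])
        for b in range(a + 1, len(order)):
            j = int(order[b])
            if t[j] - t[i] > time_tol:
                break  # all later stations are even further away in time
            if abs(lat[i] - lat[j]) <= pos_tol and abs(lon[i] - lon[j]) <= pos_tol:
                pairs.append((min(i, j), max(i, j)))
    return pairs
```

Flagged pairs would still need inspection, since two distinct casts can legitimately be taken close together in space and time.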

Historical data have been collected for over a century with very different methodologies and technologies [such as bottles, salinity–temperature–depth (STD) probes, conductivity–temperature–depth (CTD) probes, mechanical bathythermographs (MBTs), and XBTs], each of them with a different level of precision, summarized in Table 2 for temperature sensors.

Table 2.

Accuracy of instruments normally used for temperature collection. Temperature units are in degrees Celsius, Z stands for depth, and FS for full scale (Stewart 2008; Flierl and Robinson 1977; Howe and Tait 1965; Krause 1986).

Temperature and salinity data for the Mediterranean have been gathered from the beginning of the twentieth century. From a historical point of view, highlights to remember include the 1908–10 Danish “Thor” expedition, and the “Dana” expeditions, which were part of the worldwide “Carlsberg Foundation Oceanographical Expedition Round the World 1928–30” (the official name of the expedition). During a detour into the Mediterranean, the equipment on board the Dana was tried out and data were collected at 18 stations. Other important initiatives were promoted by the Experimental Thalassographic Institute in Trieste, an Austro-Hungarian scientific institution founded in 1841 and later assimilated to the Italian Thalassographic Committee (founded in 1909) after World War I.

A long period without expeditions lasted for about 5 years during World War I. The data collected in the period 1921–24 were essentially part of an exploration conducted locally in the Alboran Sea and the area around the Strait of Messina. A significant amount of data was collected in the Mediterranean starting from 1943, in an area between Gibraltar and Sicily. This data collection was probably related to the preparation for the invasion of Sicily by the Allied Naval Forces. In 1944 data were collected throughout the western Mediterranean, probably for similar wartime reasons. There were also some important programs that studied the characteristics of water masses: the International Geophysical Year (1957–58), established within the framework of the International Council for Science (ICSU); the Gibraltar and Méditerranée occidentale (MEDOC) experiments (1968–72; e.g., Lacombe et al. 1985; MEDOC Group 1970), established within the framework of the International Commission for the Exploration of the Mediterranean Sea (CIESM); the Western Mediterranean Circulation Experiment (1985–86; e.g., La Violette 1989); and finally, the Physical Oceanography of the Eastern Mediterranean (1986–87; e.g., Robinson et al. 1991), conducted within the framework of the Intergovernmental Oceanographic Commission programs. On the basis of the methodologies used at the times in question, it can be inferred that the data collected with bottles during the Thor and Dana expeditions, as well as the Austro-Hungarian and International Geophysical Year data, have quite good quality and a temperature accuracy of about 0.01°C. The data collected during the Second World War and the MEDOC experiments during the 1970s were probably less accurate (0.1°–0.05°C, depending on the instrument used).

It is not easy now to establish the correct instrument accuracy to assign to the data from each single cruise, since such information (nowadays called metadata) has been lost. In a general way, an approximate precision can be assigned based on the time the data were collected. Table 3 provides a rough estimation of the accuracy associated with data.

Table 3.

Estimated accuracy during different decades.

The XBT data have a depth value that is normally calculated using the Integrated Global Ocean Services System (IGOSS) formulas (Hanawa and Yasuda 1992). Several analyses on the quality of XBT temperature profiles have been published, and many contributing factors have been identified [for a review, see Reseghetti et al. (2007)]. Unfortunately, in the case of historical data, it was not possible to correct the XBT data for biases because the model and fall depth equation used are not known. This correction was done only with data collected since 1999 (Reseghetti et al. 2007).

b. Method

To achieve a reliable set of temperature profile best estimates, the Mediterranean Sea was subdivided into 1° × 1° spatial cells, with a temporal interval of 1 month. The historical data have different vertical resolutions. Bottle stations sampled the water column at predefined discrete depths. Until the 1970s, the MBT and XBT data were derived from graphs, and their quality depended on the ability of the technician who transformed the graphs into digital records. In some cases the original data were lost and only data interpolated at standard levels were available.

Because of the presence of profiles whose quality is questionable, the temperature data were checked for quality and consistency, applying a threshold rejection algorithm: data outside selected ranges [initially a gross range check of 5°–35°C at basin level and subsequently, regional range checks as defined in MEDAR Group (2002)] were deleted. The number of deleted profiles was about 1% of the entire original dataset.
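As a minimal sketch of the basin-level threshold rejection, the check below marks values outside the 5°–35°C gross range. The function name `gross_range_ok` is hypothetical, the regional ranges of MEDAR Group (2002) are not reproduced, and whether single values or whole profiles are rejected is left to the caller.

```python
import numpy as np

def gross_range_ok(temperature, t_min=5.0, t_max=35.0):
    """Boolean mask: True where a temperature lies inside the gross range.

    Defaults are the basin-level limits quoted in the text; missing values
    (NaN) are reported as failing the check.
    """
    t = np.asarray(temperature, dtype=float)
    return (t >= t_min) & (t <= t_max)
```

A regional check could reuse the same function with cell-specific limits passed as `t_min` and `t_max`.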

Subsequently, for each cell a calculation was made of the number of data existing in intervals of temperature and depth of δT = 0.025°C and ΔZ = 2 m, respectively, and a discrete distribution was obtained.
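The counting step can be sketched as a two-dimensional histogram over depth and temperature for the data of one cell and one month. The bin sizes follow the values given above; the temperature range, the maximum depth, and the function name are assumptions made for the illustration.

```python
import numpy as np

def temperature_depth_histogram(temp, depth, dT=0.025, dZ=2.0,
                                t_min=5.0, t_max=35.0, z_max=500.0):
    """Count data in bins of dT (deg C) and dZ (m) for one cell and month.

    Returns counts[z_bin, t_bin] together with the depth and temperature edges.
    t_min, t_max and z_max are illustrative assumptions.
    """
    n_t = int(round((t_max - t_min) / dT))
    n_z = int(round(z_max / dZ))
    t_edges = t_min + dT * np.arange(n_t + 1)  # integer counts avoid drifting edges
    z_edges = dZ * np.arange(n_z + 1)
    counts, _, _ = np.histogram2d(depth, temp, bins=[z_edges, t_edges])
    return counts, z_edges, t_edges
```

For a given depth bin `k`, `t_edges[np.argmax(counts[k])]` gives the lower edge of the most populated temperature interval, that is, the position of the main maximum of the discrete distribution at that depth.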

Figure 1b shows an example of temperature data distribution with a single maximum. However, there were cells in which the number of temperature values shows two or more maxima, and this entailed a more detailed examination of the data distribution in each box and each depth. An index of the dispersion of data points in each cell can be provided by calculating an average distance between each data value and its nearest neighboring values. An expected distance is calculated using Eq. (1) (Davis 1973), defined as

 
$$\bar{d}_{E} = 0.5\sqrt{\frac{A}{N}} + \left(0.0514 + \frac{0.041}{\sqrt{N}}\right)\frac{P}{N}, \tag{1}$$

where $A$ is the area of the cell, $P$ is the length of the perimeter of the cell, and $N$ is the number of points within it. The expected distance is the average distance between neighboring points in a hypothetical random distribution. The “nearest neighbor index” $R$ is expressed as the ratio of the observed distance $\bar{d}_{O}$ divided by the expected distance, as follows:

$$R = \frac{\bar{d}_{O}}{\bar{d}_{E}}. \tag{2}$$

If the index is less than 1, then the pattern exhibits clusters, and in particular, zero indicates that all data are represented by a single point. An index of 1 indicates a random distribution; if the index is greater than 1, then the trend is toward perfectly uniformly dispersed points. The maximum value possible is 2.15, indicating absolutely uniform dispersion. In the case of uniform data distribution, there would be no difference between the best estimate presented in this paper and the mean profile. This method has been applied to the temperature data, and Fig. 3 visually illustrates the concept behind the distribution index. This has been calculated for each cell, depth, and month to provide an indication of representativeness of the estimated temperature profile. For the calculation of the area and perimeter of the cells, it must be considered that each cell is 1° in latitude and longitude.
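A minimal sketch of the index calculation for points in a rectangular cell is given below. The expected distance is taken here to be the commonly used edge-corrected form that depends on area, perimeter, and number of points; this specific expression, the brute-force neighbor search, and the function name are assumptions of the sketch rather than a statement of the exact implementation used in this work.

```python
import numpy as np

def nearest_neighbor_index(x, y, width=1.0, height=1.0):
    """Ratio of observed to expected mean nearest-neighbor distance.

    x, y are point coordinates inside a width x height rectangle.
    The edge-corrected expected distance below is an assumed form.
    """
    pts = np.column_stack([x, y]).astype(float)
    n = len(pts)
    if n < 2:
        raise ValueError("at least two points are needed")
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)       # ignore the distance of a point to itself
    d_obs = dist.min(axis=1).mean()      # observed mean nearest-neighbor distance
    area = width * height
    perimeter = 2.0 * (width + height)
    d_exp = 0.5 * np.sqrt(area / n) + (0.0514 + 0.041 / np.sqrt(n)) * perimeter / n
    return d_obs / d_exp
```

With all points coincident the function returns 0, and a regular lattice returns a value above 1, in line with the interpretation of the index given in the text.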

Fig. 3.

Example of data (points) distribution in a cell. The points could be (a) accumulated around a single value, (b) clustered, (c) randomly distributed, or (d) uniformly distributed in a cell. The data distribution index [Eq. (2)] gives an indication of the data dispersion in the cell (from 0 to 2.15).

In practice the following steps were applied:

  • Calculate the dispersion index and adopt it as an indication of the distribution of data in each 1° × 1° cell for all calendar months across all years.

  • Analyze the index value to ascertain whether the data distribution has more than one mode.

In a discrete unimodal data distribution, the temperature values corresponding to the maxima at all depths were assumed as a first guess. The best estimates of the data profiles were then obtained by best fits in the least squares sense, that is, by minimizing the sum of squared residuals, with the additional requirement of smooth gradients in the vertical. The quality of the results was checked by comparison with previous climatologies. An example of the results is shown in Fig. 4, in which the original data and the best estimate (heavy line) are shown.
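One possible reading of a least squares fit with smooth vertical gradients is a penalized least squares problem, sketched below. The first-difference penalty, the weight `lam`, the optional per-level `weights`, and the function name are assumptions made for the illustration and are not stated in the text.

```python
import numpy as np

def smooth_profile_fit(first_guess, weights=None, lam=1.0):
    """Fit a profile p minimizing
        sum_i w_i (p_i - g_i)^2 + lam * sum_i (p_{i+1} - p_i)^2,
    where g is the first-guess profile on a regular depth grid.
    """
    g = np.asarray(first_guess, dtype=float)
    n = g.size
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    D = np.diff(np.eye(n), axis=0)       # (n-1, n) first-difference operator
    A = np.diag(w) + lam * D.T @ D       # normal equations of the penalized problem
    return np.linalg.solve(A, w * g)
```

With `lam=0` the first guess is returned unchanged; larger values trade fidelity to the first guess for a smaller vertical gradient.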

Fig. 4.

Example of original temperature profiles in an area of 1° × 1° and best estimate (heavy gray line), obtained by calculating the curve that minimizes the root-mean-square difference and the vertical gradients among the many estimates that can be found.

In the case of a significant bimodal distribution, two independent “temperature best estimates” were calculated, by dividing the two different data populations as follows: the dataset was searched for the minimum number of data between the two maxima. Then, the two datasets on the left and on the right of the calculated minimum were selected, and for each set the respective best estimate was calculated with the method described for the unimodal distribution.
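The separation of the two populations can be sketched for the temperature values of one cell at one depth. The peak detection used here (the two most populated local maxima of a histogram) and the bin width are simplifying assumptions, and the function name is hypothetical.

```python
import numpy as np

def split_bimodal(values, bin_width=0.1):
    """Split values at the least populated bin between the two main maxima.

    Returns (left_values, right_values, threshold); threshold is None and
    right_values is empty when fewer than two maxima are found.
    """
    v = np.asarray(values, dtype=float)
    n_bins = max(int(np.ceil((v.max() - v.min()) / bin_width)), 1)
    counts, edges = np.histogram(v, bins=n_bins)
    padded = np.concatenate([[-1], counts, [-1]])
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])
    peaks = np.flatnonzero(is_peak)
    if peaks.size < 2:
        return v, np.array([]), None
    # keep the two most populated maxima, ordered by temperature
    lo, hi = np.sort(peaks[np.argsort(counts[peaks])[-2:]])
    # least populated bin strictly between the two maxima
    cut_bin = lo + 1 + int(np.argmin(counts[lo + 1:hi]))
    threshold = 0.5 * (edges[cut_bin] + edges[cut_bin + 1])
    return v[v < threshold], v[v >= threshold], threshold
```

Each returned subset would then be treated as a unimodal population when its own best estimate and standard deviation are calculated.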

In the case of multimodal distribution, the mean value was assumed as a first guess and again the minimization procedure was applied.

The temperature profiles obtained were smoothed with a Gaussian filter and standard deviations were calculated. The monthly standard deviations were derived from differences between the estimated temperature values and the original profiles. Since the original profiles did not have a regular vertical sampling, the standard deviations lacked values at certain depths in some profiles; in these cases, the standard deviation was calculated by interpolation. For a bimodal distribution, two standard deviations were calculated for the two datasets on the left and on the right of the calculated minimum (see above).

It must be emphasized that the results depend on the choice of the weight assigned to profiles gathered using different instrumentation, and on the requirement for best estimates providing a smoothed field (minimization of the vertical gradient).

The distribution index for two particular months (February and August) is shown in Fig. 5. The index ranges from about 0.3 to 1.5. In this work the data distribution was analyzed carefully in order to define a management strategy for multimodal distributions. It was assumed that only two physical states are significant in the Mediterranean (see the discussion of the eastern Mediterranean transient in the introduction). The presence of more than two maxima is therefore not taken as indicative of a variety of physical states but is attributed to high temporal variability and a small signal-to-noise ratio. This point will be further discussed in section 7. We found that for an index lower than 0.5, the data distribution can be considered unimodal, whereas a distribution with an index ranging from 0.5 to 0.8 can be considered bimodal. For random or uniformly distributed data, a simple mean value was calculated. A specific discussion on the methodology will be provided in the conclusions.
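The decision rule based on the index can be written compactly as follows. The thresholds are those quoted above; treating every index above 0.8 as a case for the simple mean, and the inclusive upper bound at 0.8, are assumptions of this sketch, as is the function name.

```python
def classify_distribution(index, unimodal_max=0.5, bimodal_max=0.8):
    """Map a data distribution index to the treatment applied to the cell."""
    if index < unimodal_max:
        return "unimodal"      # one best estimate from the single maximum
    if index <= bimodal_max:
        return "bimodal"       # two best estimates (modes 1 and 2)
    return "simple_mean"       # random or uniform data: simple mean (assumed for all index > 0.8)
```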

Fig. 5.

Maps of data distribution index for (top) February and (bottom) August at surface (0–2-m depth). The index indicates how the temperature is distributed in a 1° × 1° area. Index = 0 indicates that all data have a unique value, index = 1 indicates that the data are randomly distributed, and index = 2.15 indicates that data are uniformly distributed.

3. Monthly temperature fields

The Mediterranean Sea was subdivided into grid cells of 1° × 1° in latitude and longitude, and monthly profiles, standard deviations, and data distribution indices were calculated for each cell. In those areas where bimodal distributions were found, secondary mean profiles and standard deviations were also calculated by applying the method described in section 2. In this paper “mode 1” refers to best estimates calculated by using the first maximum of the bimodal distribution, corresponding to the relatively lower temperature, and “mode 2” refers to the secondary maximum having a higher temperature. Below, the results for February and August obtained for mode 1 are shown and discussed, making comparisons with previous climatologies.

The temperature estimates at the surface during February are shown in Fig. 6a. There are some clear indications of well-known circulation features, such as the cold water in the Gulf of Lion with a pattern of isolines extending toward the Bonifacio Straits and relatively warm water in the easternmost part of the Mediterranean. The coastal waters have a very high temporal and spatial variability, and in particular a significant year-to-year variability can also be expected; there, the estimates calculated with the method described in this paper cannot be assumed to be representative of the monthly characteristics. In any case, it is possible to comment on some features, such as the presence of cold water in the northern Adriatic Sea (<12°C) during February (Fig. 6a), which could be due to the strong cold and dry wind prevailing in that area during winter (bora wind). The relatively warm water offshore of Venice (>17°C) is not considered representative. The original MEDAR/MEDATLAS climatology provides a very smooth temperature field, without details on variability on a spatial scale of a hundred kilometers, so a comparison with it is not useful. For this reason, the World Ocean Atlas 2009 (WOA09) climatology (Locarnini et al. 2010; Boyer et al. 2009) is used for comparison, although even this appears too smooth (Fig. 6b). The general west–east and south–north gradients are present in both fields, but Fig. 6a shows a higher spatial variability.

Fig. 6.

(a) Mode 1 (see text) best estimate of surface temperature (0–2 m) in February obtained in this work. Contour interval is 0.5°C. The very high SST (>17°C) between Venice and the Istrian Peninsula can be explained by the relatively warm terrestrial effluents. The very low SST (<12°C) over and west of the Kvarner Gulf is likely caused by cold bora winds. (b) Climatology of surface (0 m) temperature in February from the WOA09 (Boyer et al. 2009). Contour interval is 0.5°C. The map has been obtained with the same parameters as used in Fig. 6a.

In August, the temperature gradients in the present field are higher than in February, and there are some typical patterns of the Mediterranean circulation, such as the relatively cold water in the Gulf of Lion, and the relatively warm water in the Tyrrhenian Sea (Fig. 7a). Here, too, the WOA09 climatology (Fig. 7b) shows a higher spatial variability compared to the MEDAR/MEDATLAS, but it is still smoother than the present one.

Fig. 7.

(a) Mode 1 (see text) best estimate of surface (0–2 m) temperature in August obtained in this work. Contour interval is 1°C. The map has been obtained with the same parameters as used in Fig. 5. The scale has been shifted toward higher temperatures. (b) Climatology of surface (0 m) temperature in August from the WOA09 (Boyer et al. 2009). Contour interval is 1°C. The same parameters in Fig. 7a have been used.

4. Bimodal distributions of data

In the previous section, the main characteristics of temperature fields obtained from the mode 1 best estimates were described for selected months.

We have found that for an index below 0.5, the distribution can be considered unimodal, while a distribution with an index in the range of 0.5–0.8 can be considered bimodal.

The data also revealed bimodal distributions, indicating that two dynamic phenomena are of importance in some areas. Figure 8a shows the profiles of the two principal estimates for the month of February. It must be stressed that only about 20 out of 256 cells have bimodal distributions.

Fig. 8.

(a) The best profiles estimated from statistical data distribution during February. Temp1 is the estimate with the largest number of data in the temperature distribution. Temp2 is the second maximum obtained in those areas where a bimodal distribution was found. (b) As in (a), but during August. Temperature is in degrees Celsius.

In August, about 30 out of 264 cells had a significant bimodal distribution. An example is shown in Fig. 8b.

The comparison between the two modes was carried out for the month of February (Fig. 9). It can be seen that there are some changes in the temperature distribution, especially in the Levantine Basin and the Aegean Sea. The EMT is associated with lower temperatures. This means that the mode 1 estimates reflect the physical state of the Mediterranean associated with the EMT. From Fig. 9 it appears that one effect of this transient is to shift the temperature isoline southward.

Fig. 9.

Comparison between the best estimates obtained from bimodal distribution at surface (0–2 m). In mode 1 the first maximum (having the relatively lower temperature) is used; in mode 2 the second maximum with the higher temperature is used. Significant differences are in the Levantine Basin and Aegean Sea, where the water properties were more affected by the EMT. Mode 1 has relatively lower temperatures, probably related to the EMT. Temperature is in degrees Celsius.

A comparison between profiles of different modes is presented in Fig. 10. The main differences are in the intermediate and surface layers (down to about 500 m). The figure also shows that two modes are mainly found in the eastern Mediterranean. Conversely, two modes are present in only a few areas of the western Mediterranean.

Fig. 10.

The figures of the Mediterranean show the grid points where the (top) mode 1 and (bottom) mode 2 estimates have been produced for August. For mode 2, the majority of points are in the eastern Mediterranean, where the effects of the EMT have been stronger. (right) The profiles show modes 1 and 2 on the point indicated by black dots. (left) Shading indicates the shelf seas. Temperature is in degrees Celsius.

5. Temporal variability

The best estimates shown in Figs. 6, 7, and 9 are alternative “climatologies” of the Mediterranean Sea. However, in the case of temperature, it is also possible to provide information on the temporal variability of the thermal characteristics of the sea. The amount of data for each month of each year is in fact sufficient for a statistical calculation of the mean monthly values for each year. With a spatial representation of expected temperature values and their temporal variability, it is possible to provide information on, for example, climate evolution and to obtain a better assessment of new data to be included in a high-quality database.

Using the best estimates described in sections 3 and 4, the data falling within three standard deviations of the best estimates were retained. Subsequently, only for areas with at least 25 vertical profiles, a mean value was calculated in three depth layers (0–50, 50–250, and 500–800 m). The variability during February and August in these three layers is shown in Figs. 11a and 11b for the entire Mediterranean. The linear trends were calculated only to demonstrate that there is no consistent trend across the different depths.
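
The retention and averaging rule described above can be illustrated with a minimal Python sketch. It is not the software used in this work; the function name `layer_means`, the callables `best` and `sigma`, and the half-open treatment of the layer boundaries are assumptions made for illustration only.

```python
from statistics import mean

MIN_PROFILES = 25                          # minimum profiles per area (from the text)
N_SIGMA = 3.0                              # retention window in standard deviations (from the text)
LAYERS = [(0, 50), (50, 250), (500, 800)]  # depth layers in metres (from the text)


def layer_means(profiles, best, sigma):
    """Mean temperature per layer for one area, month, and year.

    profiles: list of profiles, each a list of (depth_m, temperature) pairs.
    best, sigma: hypothetical callables returning the best estimate and its
    standard deviation at a given depth.
    Returns None if the area holds fewer than MIN_PROFILES profiles.
    """
    if len(profiles) < MIN_PROFILES:
        return None
    means = {}
    for top, bottom in LAYERS:
        # Assumption: layers are half-open, [top, bottom).
        kept = [
            t
            for profile in profiles
            for z, t in profile
            if top <= z < bottom and abs(t - best(z)) <= N_SIGMA * sigma(z)
        ]
        means[(top, bottom)] = mean(kept) if kept else None
    return means
```

Repeating the call for each year of a given month would give a time series of the kind plotted in Fig. 11.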

Fig. 11.

(a) The temporal variability of the temperature at surface, subsurface, and intermediate layers (0–50, 50–250, 500–800 m, respectively) during February in the entire Mediterranean. (b) The temporal variability of the temperature at surface, subsurface, and intermediate layers during August in the entire Mediterranean.

6. New steps for quality control

A temperature data file consists of two parts: general information on projects, platforms, instruments, etc. (metadata) and the data values themselves, with associated quality flags. The metadata also contain information on the quality of the entire profile (e.g., if the data are reliable for more than, say, 80% of the total record, then the profile has a special flag).

A near-real-time quality control of temperature profiles collected from available navigation was implemented in the Mediterranean by the MFS Pilot Project (Manzella et al. 2003). Over the last decade, a significant amount of data has been collected, giving greater detail on the temporal and spatial variability of temperature and making more sophisticated quality control procedures possible. The quality control procedure now consists of the following 10 steps:

  1. date, position control [using the 1-min gridded elevations/bathymetry for the world (ETOPO1), for both near-real-time and historical data, although the operator is requested also to check with a good hydrographic map], and control of vessel speed

  2. elimination of spikes

  3. interpolation at 1-m intervals (if necessary, using software provided with instruments)

  4. Gaussian smoothing (for XBT)

  5. general malfunction control

  6. regional range check

  7. comparison with best estimates and overall profile quality control

  8. comparison with historical data to assess the temporal variability

  9. property/property scatter (when two parameters are measured) to assess the consistency of controlled data with other historical values

  10. visual check, confirming the final validity of profiles

Details of these procedures are given below.

  1. The position control assumes that the first cast position is correct; the other drop positions are checked as follows: the distance and time interval between two consecutive stations are calculated and the corresponding ship velocity is derived. If this is less than the maximum nominal ship speed, then the position is considered good; otherwise, a “negative” flag is written to the output file. The procedure also includes correction of “wrong” positions by interpolation, which is done after checking the survey cruise report completed by technicians on board the ships.

  2. Spikes are identified by computing the median value for temperature in a chosen interval (3 m) of the profile and comparing this median value with the original profile value at the central point of the interval. The spike is detected and removed if the difference between the value and the median is greater than a 0.1°C tolerance.

  3. In cases where software is not provided by the instrument provider, the data are resampled at a 1-m interval by means of a polynomial fit.

  4. In cases where software is not provided by the instrument provider, smoothing is carried out with a Gaussian filter of 4-m e-folding depth; this eliminates high-frequency noise.

  5. This test checks whether the temperature gradient between adjacent sample points exceeds a given threshold. It does not eliminate any part of the profile, but it does provide a “warning.” The data manager can decide to keep or delete the data where the vertical gradient is very high. This check is active where significant changes of temperature occur. The difference between spike removal and the general malfunction check is that in the first case relatively few data are anomalous, while in the second case a significant part of the profile has apparently anomalous temperature values.

  6. A gross range check is done in subregions defined previously in MEDAR/MEDATLAS (MEDAR Group 2002).

  7. A comparison is carried out between the temperature profile and the “best estimates.” In the new procedure, the overall consistency can take into consideration the existence of other dynamic states in some areas (e.g., the temperature profile derived from the two maxima in the statistical distributions). Step 7 provides the metadata, with a flag indicating the quality of the entire profile.

  8. A comparison with temporal series shown in Figs. 9 and 11 (as examples) is carried out. Information on how the new data fit the temporal evolution can be added to metafiles if necessary. This qualitative assessment also includes the evaluation of the technology used for data acquisition. This step could provide useful information for reanalysis and studies on climate variability.

  9. Using the Ocean Data View software, temperature–salinity (TS) diagrams of historical data in the area and of new data are drawn. This visual inspection provides information on the consistency of new data with historical data.

  10. Using the Ocean Data View software, sections are drawn and a visual inspection provides information on consistency in new data.
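
Steps 2, 4, and 5 above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FORTRAN software used in this work: the function names are hypothetical, the Gaussian weight is assumed to be exp(-(dz/L)^2) with L = 4 m and truncated at 3L, windows with fewer than three samples are left untested, and the gradient threshold `max_grad` is a free parameter whose value is not given in the text.

```python
import math
from statistics import median

SPIKE_WINDOW_M = 3.0  # median window (from the text)
SPIKE_TOL_C = 0.1     # spike tolerance in degrees Celsius (from the text)
EFOLD_M = 4.0         # Gaussian e-folding depth (from the text)


def despike(depths, temps, window=SPIKE_WINDOW_M, tol=SPIKE_TOL_C):
    """Step 2: drop samples that differ from the local median by more than tol."""
    half = window / 2.0
    keep_z, keep_t = [], []
    for zi, ti in zip(depths, temps):
        local = [t for z, t in zip(depths, temps) if abs(z - zi) <= half]
        # Assumption: windows with fewer than 3 samples (profile ends) are not tested.
        if len(local) >= 3 and abs(ti - median(local)) > tol:
            continue
        keep_z.append(zi)
        keep_t.append(ti)
    return keep_z, keep_t


def gaussian_smooth(depths, temps, efold=EFOLD_M):
    """Step 4: weighted mean with weights exp(-(dz/efold)**2), truncated at 3*efold."""
    smoothed = []
    for zi in depths:
        num = den = 0.0
        for z, t in zip(depths, temps):
            dz = z - zi
            if abs(dz) <= 3.0 * efold:
                w = math.exp(-((dz / efold) ** 2))
                num += w * t
                den += w
        smoothed.append(num / den)
    return smoothed


def gradient_warnings(depths, temps, max_grad):
    """Step 5: depths where |dT/dz| between adjacent samples exceeds max_grad."""
    flagged = []
    for i in range(1, len(depths)):
        dz = depths[i] - depths[i - 1]
        if dz > 0 and abs(temps[i] - temps[i - 1]) / dz > max_grad:
            flagged.append(depths[i])
    return flagged
```

In line with the description of step 5, `gradient_warnings` only reports depths and leaves the decision to keep or delete the data to the data manager.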

The procedures have been automated and are applied to both near-real-time and delayed-mode data. The software was developed in FORTRAN, and visualization uses Ocean Data View. An example is provided using data collected by the National Research Council Institute of Marine Sciences in La Spezia (CNR-ISMAR SP) in the Strait of Sicily during November 2009 and November 2005 on board the R/V Urania. Table 4 shows the output of the software that checks ship speed. The alert must be evaluated on the basis of the cruise logbook, since a long time interval can be due to bad weather or other causes. The software also checks the position (land or sea) and compares the water depth with the last data value or the depth given in the metadata. This is done using the ETOPO1 dataset. Table 5 shows an example of this check. The software only provides a comparison between the last measurement depth and the bottom depth. In the event of any inconsistency, the operator has to decide what to do.
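
The ship speed check can be illustrated with the following Python sketch. It is not the operational software: the names `distance_km` and `speed_flags`, the use of the haversine formula on a spherical Earth, and the choice of knots as the speed unit are assumptions made for illustration.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0       # mean Earth radius (spherical approximation)
KM_PER_NAUTICAL_MILE = 1.852


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two positions (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_flags(casts, max_speed_knots):
    """Check the ship speed implied by consecutive casts.

    casts: list of (datetime, lat, lon) tuples in time order.
    Returns one (speed_knots_or_None, ok) pair per cast. The first cast
    is assumed correct, as in step 1 of the procedure.
    """
    flags = [(None, True)]
    for (t0, la0, lo0), (t1, la1, lo1) in zip(casts, casts[1:]):
        hours = (t1 - t0).total_seconds() / 3600.0
        if hours <= 0:
            # Non-increasing time stamps cannot yield a speed: flag for the operator.
            flags.append((None, False))
            continue
        knots = distance_km(la0, lo0, la1, lo1) / KM_PER_NAUTICAL_MILE / hours
        flags.append((knots, knots < max_speed_knots))
    return flags
```

A flag of `False` is only an alert; as noted above, it has to be interpreted against the cruise logbook.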

Table 4.

Output of the QC software (e.g., ship speed check). Interpretation of the speed requires a logbook.

Table 5.

Output of the QC software, depth, and land check. The software warns the manager that two casts were too deep with respect to the ETOPO1 database. An additional check with bathymetric charts showed that the casts were made on the Malta escarpment, where the 1-min resolution of the ETOPO1 bathymetry is inadequate.

Table 6 shows a comparison between the newly checked data and historical data. This gives the operators an indication of whether the new data follow the normal trend or whether an anomaly occurred at the time of data collection.

Table 6.

Example of comparison of the temperature in the upper 50 m at station MEDOC5 (in parentheses) with the mean historical value in the central Mediterranean (Strait of Sicily). The mean value at the MEDOC5 station was obtained by averaging the data from 4- to 50-m depths. The comparison can be interpreted as part of the spatial variability in the area. In any case the MEDOC5 data are within two standard deviations. Further investigations could provide more insight into the circulation variability. Numbers in italics indicate the MEDOC5 mean temperature, standard deviation, and number of data used for the calculation of the values.

Regional checks and comparison with best estimates are performed and the results are viewed using the Ocean Data View software. In this way it is possible to compare the profile in question, not only with the closest grid point of the best estimates, but also with neighboring points (Fig. 12). Furthermore, the operator can compare the data with both the first and second mode best estimates, and can add comments to the metadata arising from the comparison.
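
The comparison of a profile with the first and second mode best estimates can be sketched as follows. This is only an illustration: the function names, the default window of three standard deviations, and the use of the 80% figure (quoted earlier as an example for the overall profile flag) as the default `min_fraction` are assumptions, and the profile and the best estimates are assumed to be given on the same depth levels.

```python
def fraction_consistent(temps, best, sigma, n_sigma=3.0):
    """Fraction of samples within n_sigma standard deviations of a best-estimate profile."""
    if not temps:
        raise ValueError("empty profile")
    ok = sum(1 for t, b, s in zip(temps, best, sigma) if abs(t - b) <= n_sigma * s)
    return ok / len(temps)


def compare_with_modes(temps, modes, n_sigma=3.0, min_fraction=0.8):
    """Compare a profile with each available mode.

    modes: dict mapping a mode name to (best_profile, sigma_profile).
    Returns (closest_mode, fraction, profile_ok), where profile_ok is True
    if more than min_fraction of the samples agree with the closest mode.
    """
    scores = {
        name: fraction_consistent(temps, best, sigma, n_sigma)
        for name, (best, sigma) in modes.items()
    }
    closest = max(scores, key=scores.get)
    return closest, scores[closest], scores[closest] > min_fraction
```

The returned mode name and fraction could then be recorded as a comment in the metadata, leaving the final judgment to the operator.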

Fig. 12.

Sample temperature profile examined during the QC process; two closest best-estimate profiles and the standard deviation profile. Clicking on various grid points allows the user to compare the sample profile with other best estimates. The data user can have an indication of the quality of the temperature values (dashed line) by comparing with the best estimates and the use of the standard deviation. Temperature is in degrees Celsius.

Mean values in different regions are calculated for each month in the different years in order to have an estimate of temporal variability. From these it is possible to assess how the data contribute to the temporal variability. This, of course, is a qualitative assessment that can be included as a comment in metafiles.

It must be emphasized that the procedure relies heavily on the expertise of the data manager. Normally, the amount of data received from ships of opportunity or from survey cruises in the Mediterranean varies from 10 to 50 profiles (in the case of surveys) per day, and this amount of data can be checked in 1 h.

7. Conclusions

The initial objective of this project was to improve quality control for data collected operationally. As discussed in the introduction, data are in many cases gathered opportunistically, which makes it difficult to define precisely the spatial and temporal variability in any marine region of the world. In the Mediterranean, however, the number of temperature profiles collected during the last decade is large enough that it is now possible to define a more precise climatology and to assess climate variability. The work started with the analysis of the statistical distribution of monthly temperatures in 1° × 1° cells. It was found that data in some areas and depths are clustered around one or more values. Best estimates of temperature in each area and each month were computed by looking for maxima in the data distribution and by minimizing root-mean-squares and vertical gradients. As remarked in section 2b, the results depend on the choice of the weight assigned to profiles gathered using different instrumentation, and on the requirement that the best estimates provide a smoothed field.

The presence of two maxima suggested the calculation of two modes, the first associated with lower temperatures than the second. A comparison between profiles of different modes is presented in Fig. 10. The main differences are in the intermediate and surface layers (down to about 500 m). The figure also shows that two modes are mainly found in the eastern Mediterranean. Conversely, the two modes are present in only a few areas of the western Mediterranean.

Studies have demonstrated that waters with different characteristics can be formed in various areas of the Mediterranean in different years (e.g., Lacombe et al. 1985; Klein et al. 1999; Sparnocchia et al. 1995). This is due to the strong air–sea interaction existing in the basin, a place where dense water formation occurs every year. The physical significance of multimodal data distribution was not considered as a component of this work and needs further investigation.

With regard to the distribution of data within each cell, there is another point that must be discussed. In some cells, data representing different phenomena may coexist. In this case the high variability at mesoscale and submesoscale existing in the Mediterranean (e.g., Millot 2005) influences the distribution of the data, which then appear to be random. This is what happens in the western Mediterranean, where the spatial and temporal variability is very high. In the eastern Mediterranean the effect of the eastern Mediterranean transient was very evident, and it is therefore possible to find a bimodal distribution that is truly indicative of two different states.

A simple comparison between the two modes obtained in this work and other climatologies provides only a partial view of the similarities and differences between them. The concepts behind the modes are substantially different. This work does not calculate a “mean state” of the Mediterranean, but it emphasizes the possible “states” that can exist in different periods of time. Two clear examples of unimodal and bimodal distributions during February are shown in Fig. 13, the first being from data at 10-m depth in the Tyrrhenian Sea and the second from data at 650-m depth in the Levantine Basin.

Fig. 13.

Examples of unimodal and bimodal distributions during February in the (top) Tyrrhenian Sea at 10-m depth and (bottom) Levantine Basin at 650-m depth. The continuous line is a best fit of the data points. Temperature is in degrees Celsius.

These temperature estimates are used for the quality control of near-real-time and delayed data. The high-quality data can be released a short time after data collection, and in any case within the 12 h defined as a target for operational systems.

Identical procedures have also been developed for salinity; however, the temporal and spatial coverage is not sufficient for the production of monthly estimates. In this case seasonal estimates have been produced. The situation is even worse for other parameters, but the procedures can nevertheless be used.

Acknowledgments

This work was carried out as part of the MyOcean and SeaDataNet projects, financially supported by the European Commission. We thank the many scientists, technicians, data center staff, and data managers for their contributions of data to the SeaDataNet system, which allowed us to compile the database used in this work. We also thank Nadia Pinardi (Bologna University) for the support provided to the authors and Igor M. Belkin (University of Rhode Island) for reviewing this paper; his suggestions have greatly improved its quality. We also thank an anonymous reviewer for the many comments, which allowed us to improve the scientific aspects of the paper.

REFERENCES

Babin, B., J. Bosch, B. Burnett, M. Bushnell, J. Fredericks, S. Kavanaugh, and M. Tamburri, 2009: QARTOD V final report. NOAA, 136 pp. [Available online at http://nautilus.baruch.sc.edu/twiki/pub/Main/WebHome/QARTODVReport_Final2.pdf.]
Belkin, I. M., 2009: Rapid warming of large marine ecosystems. Prog. Oceanogr., 81, 207–213, doi:10.1016/j.pocean.2009.04.011.
Boyer, T. P., and Coauthors, 2009: World Ocean Database 2009. S. Levitus, Ed., NOAA Atlas NESDIS 66, 216 pp.
Davis, J. C., 1973: Statistics and Data Analysis in Geology. Wiley, 550 pp.
Flierl, G., and A. R. Robinson, 1977: XBT measurements of thermal gradients in the mode eddy. J. Phys. Oceanogr., 7, 300–302.
Gouretski, V., and K. P. Koltermann, 2007: How much is the ocean really warming? Geophys. Res. Lett., 34, L01610, doi:10.1029/2006GL027834.
Hanawa, K., and T. Yasuda, 1992: New detection method for XBT depth error and relationship between the depth error and coefficients in the depth-time equation. J. Oceanogr., 48, 221–230.
Howe, M. R., and R. I. Tait, 1965: An evaluation of an in-situ salinity-temperature-depth measuring system. Mar. Geol., 3, 483–487.
Klein, B., W. Roether, B. B. Manca, D. Bregant, V. Beitzel, V. Kovacevic, and A. Luchetta, 1999: The large deep water transient in the eastern Mediterranean. Deep-Sea Res. I, 46, 371–414.
Krause, G., 1986: STD/CTD-instrumentation. Numerical Data and Functional Relationships in Science and Technology, Series Landolt-Börnstein, Group V Geophysics, Vol. 3a, Springer Berlin Heidelberg, 202–208.
La Violette, P. E., 1989: WMCE Western Mediterranean Circulation Experiment: A preliminary review of results. Eos, Trans. Amer. Geophys. Union, 70, 746.
Lacombe, H., P. Tchernia, and L. Gamberoni, 1985: Variable bottom water in the western Mediterranean basin. Prog. Oceanogr., 14, 319–338.
Levitus, S., J. I. Antonov, T. P. Boyer, and C. Stephens, 2000: Warming of the World Ocean. Science, 287, 2225–2229.
Locarnini, R. A., A. V. Mishonov, J. I. Antonov, T. P. Boyer, H. E. Garcia, O. K. Baranova, M. M. Zweng, and D. R. Johnson, 2010: Temperature. Vol. 1, World Ocean Atlas 2009, NOAA Atlas NESDIS 68, 184 pp.
Maillard, C., M. Fichaut, and H. Dooley, 2001: MEDAR-MEDATLAS protocol: Part I; Exchange format and quality checks for observed profiles. Ifremer Rep. R.INT.TMSI/IDM/SISMER/SIS00-084, 48 pp.
Manzella, G. M. R., E. Scoccimarro, N. Pinardi, and M. Tonani, 2003: Improved near real-time data management procedures for the Mediterranean Ocean Forecasting System Volunteer Observing Ship program. Ann. Geophys., 21, 49–62.
Manzella, G. M. R., and Coauthors, 2007: The improvements of the Ships of Opportunity program in MFS-TEP. Ocean Sci., 3, 245–258, doi:10.5194/os-3-245-2007.
MEDAR Group, 2002: MEDATLAS/2002 database: Mediterranean and Black Sea database of temperature, salinity and bio-chemical parameters. Climatological Atlas, IFREMER, CD-ROM. [Available online at http://www.ifremer.fr/medar/cdrom_database.htm.]
MEDOC Group, 1970: Observation of formation of deep water in the Mediterranean Sea, 1969. Nature, 227, 1037–1040.
Millot, C., 2005: Circulation in the Mediterranean Sea: Evidences, debates and unanswered questions. Sci. Mar., 69 (Suppl.), 5–21.
Monaco, A., and S. Peruzzi, 2002: The Mediterranean Targeted Project MATER – A multiscale approach of the variability of a marine system – Overview. J. Mar. Syst., 33–34, 3–21, doi:10.1016/S0924-7963(02)00050-7.
Pinardi, N., and G. Coppini, 2010: Operational oceanography in the Mediterranean Sea: The second stage of development. Ocean Sci., 6, 263–267, doi:10.5194/os-6-263-2010.
Reseghetti, F., M. Borghini, and G. M. R. Manzella, 2007: Factors affecting the quality of XBT data – Results of analyses on profiles from the western Mediterranean Sea. Ocean Sci., 3, 59–75, doi:10.5194/os-3-59-2007.
Rixen, N., and Coauthors, 2005: The Western Mediterranean Deep Water: A proxy for climate change. Geophys. Res. Lett., 32, L12608, doi:10.1029/2005GL022702.
Robinson, A. R., and Coauthors, 1991: The eastern Mediterranean general circulation: Features, structure and variability. Dyn. Atmos. Oceans, 15, 215–240.
Santinelli, C., A. Ribotti, R. Sorgente, G. P. Gasparini, L. Nannicini, S. Vignudelli, and A. Seritti, 2008: Coastal dynamics and dissolved organic carbon in the western Sardinian shelf (western Mediterranean). J. Mar. Syst., 74, 167–188.
Schlitzer, R., cited 2012: Ocean data view. [Available online at http://odv.awi.de.]
Sparnocchia, S., P. Picco, G. M. R. Manzella, A. Ribotti, S. Copello, and P. Brasey, 1995: Intermediate water formation in the Ligurian Sea. Oceanol. Acta, 18, 151–162.
Stewart, R. H., 2008: Introduction to Physical Oceanography. Texas A&M University, 345 pp. [Available online at http://oceanworld.tamu.edu/resources/ocng_textbook/PDF_files/book_pdf_files.html.]