1. Introduction
Performing meteorological measurements and managing and storing the resulting data is a delicate process that is never free of errors, despite the efforts and care invested in the task. For any meaningful use of these meteorological data, it is important to ensure, as far as possible, the validity of the observations. The procedures used for this purpose constitute the so-called quality control (QC; e.g., Wade 1987; Gandin 1988; DeGaetano 1997; Shafer et al. 2000; Fiebrich et al. 2010; see section 1 in Lucio-Eceiza et al. 2017).
Some QC tests focus on the detection of issues related to data transcription and collection, or on errors introduced during data manipulation, such as the duplication of data sequences. The standardization of practices that vary across institutions, such as measurement units or reference times, can likewise be important for databases built from data of various source institutions. All of these checks address data management issues. Other tests assess the temporal or spatial consistency of the data and are designed to detect errors produced at the moment of sampling as a result of instrumental malfunction, calibration, or exposure problems. These errors are generally local in nature and are less likely to depend on the procedures established by the source institution. We refer to these cases as measurement errors.
The present work summarizes the second part of a QC process applied to a historical data compilation of surface wind observations across northeastern North America (WNENA). Lucio-Eceiza et al. (2017, hereafter Part I) reports on data management issues, whereas the procedures described herein (hereafter Part II) focus on the detection and removal/correction of measurement errors. Part I demonstrated that the problems related to data management had a very important impact on the surface wind data: more than 90% of the data were modified during the unification of data transcription, collection, and storage, and ~0.1% of faulty records were deleted, mainly because of erroneous intersite duplications of data sequences.
The goal of this paper is to analyze the problems related to measurement errors, to alleviate them with the help of data flagging/correction protocols, and to compare their extent with the data management issues detected in Part I. Since measurement errors are, by their very nature, independent of the dataset, the procedures presented herein are of universal applicability and thus easily translatable to other datasets. As in Part I, for each test the behavior of the suspect records is addressed together with the statistics of occurrence in space, time, and data source. In both parts, an evaluation of the impact of the errors on the statistics of the data is also provided. The final purpose of this work is to construct a surface wind speed and wind direction database of robust quality and wide spatial and temporal extent that can later be used for the analysis of phenomena specific to this region, such as extreme values (Cheng 2014), wind variability at different time scales and its relationship to large-scale modes of circulation (e.g., Jiménez et al. 2008; García-Bustamante et al. 2012), high-resolution model validation (e.g., Jiménez et al. 2010a), or long-term electricity production estimation (e.g., García-Bustamante et al. 2013), among others.
Section 2 briefly describes the observational database. Section 3 describes the methodologies of the QC process undertaken in this manuscript. Section 4 provides an account of the results obtained at each phase of the QC procedure. The impact of the suppressed data is discussed in section 5, and conclusions are given in section 6. The purpose of sections 5 and 6 is twofold: they present the results specifically concerning the treatment of measurement errors, while also offering a general view of the whole QC process in which the results obtained here are discussed in the perspective of those attained in Part I.
2. Observational wind data
As extensively described in Part I (section 2), WNENA integrates the observations of 526 sites: 486 land stations distributed over eastern Canada (New Brunswick, Newfoundland and Labrador, Nova Scotia, Nunavut, Prince Edward Island, Ontario, and Quebec) and five northeastern U.S. states (Maine, Massachusetts, New Hampshire, New York, and Vermont), as well as 40 buoys distributed between the east coast of Canada and the Canadian Great Lakes.
3. QC methodology
The QC that has been applied in WNENA is structured into six phases that deal with different issues (numbered in Fig. 1): 1) compilation; 2) duplication errors; 3) physical consistency in the ranges of recorded values; 4) temporal consistency, regarding abnormally high/low variability in the time series; 5) detection of long-term biases; and 6) removal of isolated records. The first three phases deal with data management issues and were addressed in Part I. This manuscript focuses on issues that involve the last three phases in Fig. 1. These can be regarded as measurement errors that are often related to instrumentation problems (temporal consistency and isolated records, phases 4 and 6, respectively), instrument calibration, and siting (bias detection, phase 5).
Diagram describing the six phases of the QC process. Magenta (green) highlights checks that are applied only to wind speed (direction). Blue indicates tests applied to both variables. This paper deals only with measurement errors (last three phases); issues related to data management are treated in Part I.
The QC process follows a sequential structure designed to minimize potential overlap between the different phases. Some checks are common to both wind speed and direction, while others specifically address one of the two variables. In Part I, data identified as erroneous, such as duplicated chains of values, were removed. In Part II, by contrast, erroneous values are only flagged for posterior removal (FR) or correction (FC). During the process, the FR data are temporarily removed in order to establish the thresholds of subsequent steps but are restored at the time of applying them, so that each record can be flagged by more than one step. The FC data, however, are kept corrected permanently, as, for instance, in the case of documented height changes. All the flags are stored in a separate track file that codifies each step in a unique way for easy identification and eventual reversal. This section provides a methodological description of each phase, while the presentation of results and the illustration of specific cases are addressed in the next section. A summary of the procedures is given in Tables 1 and 2.
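As an illustration of this bookkeeping, the following minimal sketch (in Python rather than the Fortran/shell of the actual implementation; see the note at the end of the paper) shows one possible layout of a track-file entry. The field names and step codes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrackEntry:
    """One track-file entry; each QC step is codified uniquely so that
    any flag can be identified and, for FR data, eventually reverted."""
    site: str             # station/buoy identifier
    index: int            # position of the record in the time series
    step: str             # hypothetical unique step code, e.g., "P5-WD-BIAS"
    action: str           # "FR" (flagged for removal) or "FC" (corrected)
    old_value: float      # value before the QC step
    new_value: Optional[float] = None   # only meaningful for FC entries

# An FC entry for a corrected wind direction record:
log = [TrackEntry("7040812", 1024, "P5-WD-BIAS", "FC", 120.0, 130.0)]
```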
Summary of the procedures carried out in Part II related to abnormally low and high variability (phase 4). The meaning of the abbreviations/symbols is given at the bottom of Table 2. For a more detailed explanation, refer to section 3a.
Summary of the procedures carried out in Part II related to bias detection and isolated values (phases 5 and 6, respectively). The meaning of the abbreviations/symbols is given at the bottom of this table. For a more detailed explanation, please refer to sections 3b and 3c.
a. Phase 4: Temporal consistency
These checks (phase 4 in Fig. 1) analyze the consistency of the temporal variability of the wind series. They target two different kinds of extreme behavior within the time series: periods with abnormally low or abnormally high variability.
1) Abnormally low variability
Periods with an inordinately small variability in wind speed and direction are typically the result of damaged instruments, caused by dust, corrosion, or icing conditions, or of faulty communications between an instrument and the datalogger (Shafer et al. 2000). Various approaches have been taken to identify such errors. Some studies look for low-variability periods at relatively long time scales (e.g., 24 h in Shafer et al. 2000; one month in Hubbard et al. 2005). They compare the standard deviation of data in a given predefined moving window with a previously established threshold value. Periods with standard deviation values below this limit are flagged. Other studies search for constant data sequences (i.e., zero variance) of suspicious length at shorter time scales (e.g., minutes in Jiménez et al. 2010b; hourly in Meek and Hatfield 1994; 3-hourly in DeGaetano 1997). The unrealistically long constant-value chains can be identified by establishing a threshold length. Maximum threshold lengths can be either arbitrarily imposed (Meek and Hatfield 1994; Durre et al. 2010; Dunn et al. 2016) or estimated from the sample statistics (Jiménez et al. 2010b). Alternatively, the faulty constant sequences can be identified with the help of an auxiliary variable (e.g., pressure; DeGaetano 1997).
The approach presented herein targets constant data sequences. Direction sequences consisting of 0° records are excluded from the analysis, since this value was imposed for records with zero wind speed (calms; see Part I).
The following methodology consists of three tests: the first two address wind direction and both high and low wind speeds, while the third targets only low wind speeds/calm periods. Prior to their application, the sequences are classified into 12 different resolution × precision (RP) groups that are evaluated separately: hourly/3-hourly/6-hourly for resolution, and 360/36/16/8 points of the compass for direction or 0.1/0.3/0.5/1 m s−1 for wind speed.
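As a minimal sketch of the preliminary extraction step, the following Python fragment (illustrative, not the original implementation) collects the maximal constant runs of a series before they are pooled by RP group; the array values and the `min_len` parameter are assumptions.

```python
import numpy as np

def constant_runs(values, min_len=2):
    """Return (start, length) pairs of maximal runs of identical valid
    values; NaNs (missing records) break a run. Wind direction records
    of 0 degrees (the calm placeholder) should be masked beforehand."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        same = (i < len(values)
                and not np.isnan(values[start])
                and values[i] == values[start])
        if not same:
            if i - start >= min_len and not np.isnan(values[start]):
                runs.append((start, i - start))
            start = i
    return runs

# Runs are then pooled per resolution x precision (RP) group, since a
# coarse precision (e.g., 8 sectors) naturally produces longer constant
# chains than a fine one (e.g., 360 sectors; cf. Fig. 2a).
print(constant_runs(np.array([10., 10., 10., np.nan, 20., 20., 30.])))
# -> [(0, 3), (4, 2)]
```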
(a) Example of the effect of precision degradation on the detection of constant wind direction periods for buoy c44141, located at the Laurentian Fan (east coast; DFO). The original precision (purple) has been successively degraded to lower precisions typical of the database (other colors). The longest constant period at the lowest precision, occurring between 28 Dec 1995 and 4 Jan 1996 (159 records), was nonexistent at the highest precision. (b) Example of an erroneous calm episode that extends over a year at buoy c44141 (Laurentian Fan, highlighted in red). (c) Examples of calm detection at sites 8400301 (Badger, Newfoundland; EC; left) and 7026042 (Piedmont, Quebec; EC; right). The target series (light blue) and the regional series (black) are indicated. The analyzed calm periods are highlighted (yellow bars), and the elements for the calculation of Eq. (1) are highlighted in pink.
The first test searches for constant sequences surrounded by a large proportion of missing data, which can be indicative of operational problems. The test evaluates the percentage of missing data during each constant sequence and during the preceding and following 24-h intervals. The sequence is flagged as erroneous if any two of these three percentages exceed a given threshold. The threshold was set at 90%, as it was observed that lower values tended to erroneously flag correct sequences at sites limited to daylight measurements (see Part I, section 2).
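A minimal sketch of this criterion, assuming the series has been placed on a regular time axis with a boolean missing-data mask (all names are illustrative; the run may contain embedded gaps when delimited by its first and last timestamps):

```python
import numpy as np

def missing_context_flag(missing, run_start, run_len, recs_per_day,
                         thresh=0.90):
    """Test 1: a constant run is flagged when at least two of the three
    windows (the run itself and the 24 h before/after it) contain a
    fraction of missing records >= thresh."""
    def frac(a, b):
        seg = missing[max(a, 0):min(b, missing.size)]
        return seg.mean() if seg.size else 1.0   # off-series: all missing
    windows = (frac(run_start - recs_per_day, run_start),
               frac(run_start, run_start + run_len),
               frac(run_start + run_len, run_start + run_len + recs_per_day))
    return sum(f >= thresh for f in windows) >= 2
```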
The second test evaluates the statistical likelihood of a constant data sequence depending on its length. The suspicious periods can span from several hours to months, such as the example shown in Fig. 2b of an erroneous calm that lasted around a year. Prior to the evaluation, the constant sequences are segregated by site × resolution × precision (SRP) into 12 RP groups for each site of the database (i.e., 526 × 12) and sorted by their length. For each distribution, a nonparametric threshold is established based on the distance of the longest sequences to the bulk of their SRP length distribution; sequences beyond this threshold are flagged.
A third test is applied only to constant sequences at low wind speeds. It is based on the spatial consistency of the wind variability between a site and its neighbors, and it is able to detect periods that, albeit erroneous, were overlooked by the previous test because of their short length. In real calm situations, well-chosen neighbor stations should also experience a decline in wind speed. A regional reference is constructed by selecting the five closest and best correlated neighboring sites; a calm sequence is flagged when the regional reference does not support near-zero wind speed at the target site [Eq. (1); see the examples in Fig. 2c].
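The sketch below illustrates the construction of the regional reference; since the exact form of Eq. (1) is not given here, a simple placeholder criterion (a fixed regional support threshold) stands in for it, and all names and the `support` value are assumptions.

```python
import numpy as np

def regional_reference(target, neighbors, n_best=5):
    """Mean series of the n_best neighbors best correlated with the
    target site (the five closest, best correlated sites in the text)."""
    scores = []
    for series in neighbors:
        ok = ~np.isnan(target) & ~np.isnan(series)
        scores.append(np.corrcoef(target[ok], series[ok])[0, 1]
                      if ok.sum() > 2 else -1.0)
    best = np.argsort(scores)[-n_best:]
    return np.nanmean(np.stack([neighbors[i] for i in best]), axis=0)

def suspect_calm(regional, calm_slice, support=1.0):
    """Placeholder standing in for Eq. (1): records of a calm sequence
    are suspect when the regional reference does not support near-zero
    wind speed; `support` (in the data's speed units) is illustrative."""
    return regional[calm_slice] > support
```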
2) Abnormally high variability
These errors, typically a consequence of technical issues like loose wires or datalogger problems (Shafer et al. 2000), are in general less common than erroneous low-variability records. A common method for detecting them is the so-called step check (Meek and Hatfield 1994; Hubbard et al. 2005), which compares the differences between sequential observations to a threshold value in search of steplike behavior (see Fig. 3a). For differences greater than the threshold, both values are regarded as erroneous. A somewhat more sophisticated approach is the blip test. This test looks for spikes and dips (Fig. 3a), that is, successive increases and decreases in values (Fiebrich et al. 2010), and unlike the step check, it is able to discern the faulty records from the good ones. The thresholds for both tests can be single values fixed for the whole time series (Meek and Hatfield 1994; Fiebrich et al. 2010), variable thresholds dependent on the month of the year (e.g., Vejen 2002; Dunn et al. 2016), or thresholds framed within the behavior of the day (e.g., DeGaetano 1997).
(a) Conceptual illustration of high-variability errors. A normally behaving sample (solid line). The various types of detectable errors, namely, spikes, dips, steps, and long periods, are indicated as deviations (dashed lines; figure based on Fiebrich et al. 2010). (b) The wind speed bias detection method applied to station 8202550 (Inverness, Cape Breton, Nova Scotia; EC). The 15-day moving averages of the daily means of the anomalies (red line, right y axis), standard deviations (blue lines), and coefficients of variation (orange lines); horizontal lines of the same color depict their corresponding thresholds. The detected erroneous values of mean and standard deviation are highlighted, for easier visualization, by points of the same color (lower part of the plot). The original daily means (gray, left y axis). (c) Wind direction bias, the number of detected cases in the year-to-year analysis vs the angle of the shift (absolute value). The y axis is in logarithmic scale.
The method applied in this work uses a blip (or temporal) test complemented with a spatial check. The combination of both tests allows for identification of three different error typologies: spikes and dips, steps, and long episodes, schematically represented in Fig. 3a.
The blip test compares wind speed differences between valid consecutive observations with thresholds that are specifically defined for each station, so the analysis is run individually site by site. The differences that exceed the given thresholds are considered suspect. Since the series contain missing data and different time resolutions, consecutive observations may be separated by different time intervals. Therefore, different thresholds are obtained for each site from the distributions of the differences between pairs of observations at each time separation.
Additionally, each record is independently evaluated spatially using available wind speed data from the closest 10–40 sites located within a distance of less than 300 km and an elevation difference of less than 500 m (Dunn et al. 2016) of the target series. The record is considered suspect if it exceeds its spatial threshold.
The suspect data may be classified into one of the aforementioned three categories (Fig. 3a). The blip test by itself is able to detect erroneous spikes (dips) when a positive (negative) suspect difference is followed by a suspect negative (positive) one. In these cases, only the middle value is flagged. The erroneous steps are identified when a suspect positive (negative) difference involves also spatially suspect data. In those cases both values are flagged. Finally, all the values between flagged positive and/or negative steps will also be flagged if they fail the spatial test, constituting erroneous long periods. The validation of suspicious cases has been carried out by comparing them to auxiliary anemometers when possible, as in the case of DFO buoys (see section 2). For periods longer than a day, an additional 24-h window has been preemptively flagged.
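The temporal part of the check can be sketched as follows: gap-dependent thresholds are drawn from the distributions of consecutive differences, and spike/dip patterns are flagged. The quantile `q` and all names are illustrative, and the spatial test and the step/long-period classification are omitted.

```python
import numpy as np

def blip_flags(hours, speed, q=0.999):
    """Flag spikes/dips: a suspect difference followed by a suspect
    difference of opposite sign brackets the flagged middle record.
    Thresholds are site specific and depend on the time gap between
    consecutive valid records; the quantile q is illustrative."""
    valid = np.flatnonzero(~np.isnan(speed))
    dt = np.diff(hours[valid])                  # gaps, e.g., 1, 3, 6 h
    dv = np.diff(speed[valid])
    thr = {gap: np.quantile(np.abs(dv[dt == gap]), q)
           for gap in np.unique(dt)}            # one threshold per gap
    suspect = np.abs(dv) > np.array([thr[g] for g in dt])
    flags = np.zeros(speed.size, dtype=bool)
    for k in range(len(dv) - 1):
        if suspect[k] and suspect[k + 1] and dv[k] * dv[k + 1] < 0:
            flags[valid[k + 1]] = True          # the middle (blip) value
    return flags
```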
b. Phase 5: Bias detection
The previous section targets erroneous periods of constant values or high-variability errors that are a few days long at most. However, longer intervals of time, such as weeks or months, that have systematic unusual values of mean and/or standard deviation will not be identified by the preceding analysis. The fifth phase of the QC (Fig. 1) deals with the detection of systematic errors (or biases) in both wind speed and direction. These errors, common to any meteorological variable, are related to a great variety of factors, such as changes in the measuring devices, different averaging methods, changes in anemometer heights, or changes in exposure or site relocation (e.g., Alexandersson 1986; Begert et al. 2003; Thomas et al. 2005; Wan et al. 2010). This work considers biases at different time scales. We correct for long-term wind speed biases from documented changes in anemometer heights. Regarding wind direction, we target the detection and correction of biases, specifically shifts in direction, which may affect interannual to multidecadal time scales. We also look for errors caused by many other, often unknown, factors that affect the behavior of wind speed records for periods ranging from several weeks to months, longer than those targeted in the previous QC steps but shorter than long-term inhomogeneities.
1) Wind speed
The documented biases due to anemometer height changes are corrected by referring each wind speed record to the reference (last documented) anemometer height of its site, assuming a neutral logarithmic wind profile:

$$u(z_{\mathrm{ref}}) = u(z)\,\frac{\ln(z_{\mathrm{ref}}/z_{0})}{\ln(z/z_{0})}, \qquad (2)$$

where $u(z)$ is the wind speed recorded at the documented height $z$, $z_{\mathrm{ref}}$ is the reference height, and $z_{0}$ is the local surface roughness length.
The measurement heights were compiled from several different metadata sources. The height information for Canadian land stations was acquired mainly from a database that gathers information from numerous climate station inspection reports (SIRs) from Environment Canada's National Climate and Data Information Archives (Wan and Wang 2006). This information has been supplemented with additional individual digitized SIRs obtained from EC. For U.S. sites, the heights have been extracted from individual annual local climatological data (LCD) publication files obtained from the NCDC Image and Publication System (http://www.ncdc.noaa.gov/IPS/lcd/lcd.html). Finally, for the moored buoys, the information has been obtained from the Meteorological Service of Canada (MSC) buoy status reports archived by the DFO (http://www.meds-sdmm.dfo-mpo.gc.ca/isdm-gdsi/waves-vagues/index-eng.htm; Table 2; Part I). The value of the roughness length at each site was estimated from land cover information (Loveland et al. 2000).
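A minimal sketch of the height correction, assuming the logarithmic profile of Eq. (2); the roughness length used here is an illustrative open-terrain value, not one taken from the database.

```python
import numpy as np

def refer_to_height(u, z_obs, z_ref=10.0, z0=0.03):
    """Rescale a wind speed from the documented anemometer height z_obs
    to the reference height z_ref with a neutral logarithmic profile
    [cf. Eq. (2)]; z0 = 0.03 m is an illustrative open-terrain roughness
    length, not a value taken from the database."""
    return u * np.log(z_ref / z0) / np.log(z_obs / z0)

# A 5 m s-1 record measured at 6 m, referred to the standard 10 m:
print(round(refer_to_height(5.0, z_obs=6.0), 2))   # 5.48
```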
The flagging of nondocumented wind speed biases is done using daily averages. The method screens for anomalous behavior beyond the limits of the temporal consistency tests (section 3a). For this purpose, 15-day moving windows of the mean, standard deviation, and coefficient of variation of the daily series are compared with upper and lower thresholds defined for each site (Fig. 3b, horizontal lines), and the segments that exceed these limits are flagged.
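A sketch of the moving-window screening (the 15-day window is from the text; the centering, gap handling, and threshold placement are illustrative):

```python
import numpy as np

def moving_window_stats(daily_means, win=15):
    """Centered 15-day moving mean, standard deviation, and coefficient
    of variation of a series of daily means; windows containing any
    missing day are left undefined (NaN)."""
    n = daily_means.size
    mean = np.full(n, np.nan)
    std = np.full(n, np.nan)
    half = win // 2
    for c in range(half, n - half):
        w = daily_means[c - half:c + half + 1]
        if not np.isnan(w).any():
            mean[c], std[c] = w.mean(), w.std()
    cv = std / mean                     # meaningful where mean > 0
    return mean, std, cv

# Windows whose mean or std persistently exceeds the site thresholds
# (horizontal lines in Fig. 3b) are flagged; a shifted mean with an
# unchanged coefficient of variation instead suggests an undocumented
# height change (cf. Vautard et al. 2010) rather than a malfunction.
```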
2) Wind direction
The detection of biases in wind direction addresses the identification and correction, not the removal, of shifts in direction. The procedure considered herein is one of the few attempts reported in the literature (Begert et al. 2003; Petrovic 2006; Gruber and Haimberger 2008) and searches for temporal changes in wind roses. As in the case of biases in wind speed, the approach presented here is univariate: each series is treated individually, and comparisons with regional references are avoided. The rationale is that wind direction at the surface is shaped by orography, complicating the identification of good reference neighbors within the intersite distances of our database. The method is based on the comparison of annual wind roses in search of relative shifts and is run individually for each series. In the first step, the method compares consecutive yearly wind roses. This is done by rotating them to find the relative angle at which their root-mean-square difference (RMSD) is minimal. This approach assumes that the distribution of wind direction at a site, that is, its wind rose, remains approximately stable through time with minor year-to-year variations, varying only slightly because of long-term changes in the atmospheric circulation. This is a realistic assumption, as shown in Fig. 3c, where about 90% of the year-to-year comparisons did not register any shift and only around 4% involved shifts larger than 10°. Sudden shifts are presumed to be caused by changes in measurement practices, location, anemometer heights, the surrounding environment, or artificial biases of any other nature. RMSDs are calculated for all the possible relative angles between the wind roses, depending on the precision with which wind direction is recorded (8, 16, 36, or 360 sectors). Precisions of 360 sectors are in practice reduced to 36 for this analysis in order to make the results less noisy. Only years with at least 75% data availability are considered; this is sufficient to avoid a large drop in usable years while still providing robust estimates of the wind rose, that is, reducing the number of false positives related to subsampling. After this first check of year-to-year steps, only rotations larger than 10° are retained for the second round.
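The core of the first step, finding the rotation that minimizes the RMSD between two annual wind roses, can be sketched as follows (the sector count and the example offset are illustrative):

```python
import numpy as np

def best_rotation(rose_a, rose_b):
    """Signed shift (in sectors) that minimizes the RMSD between two
    wind roses given as relative-frequency histograms with the same
    number of sectors (36 in the analysis of the text)."""
    n = rose_a.size
    rmsd = [np.sqrt(np.mean((np.roll(rose_a, k) - rose_b) ** 2))
            for k in range(n)]
    k = int(np.argmin(rmsd))
    return k if k <= n // 2 else k - n

# A 36-sector rose compared against itself shifted by 3 sectors (30 deg):
rng = np.random.default_rng(0)
rose = rng.random(36)
rose /= rose.sum()
print(best_rotation(rose, np.roll(rose, 3)) * 10, "deg")   # 30 deg
```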
The comparison of wind roses is expanded in a second step to the time intervals between the previously flagged years. The longer samples used to estimate wind roses in this step make the results more reliable and allow for discarding previously estimated changes that might have been due to data paucity. The resulting cases are individually inspected. Rotations in wind roses with equiprobable wind directions are discarded. The remaining cases are corrected by adding/subtracting the rotated angle to match the position of the most recent time interval. Where possible, the corrections are checked for consistency with the surrounding orography. The available metadata have also been consulted, both to validate our findings and to identify problems that are too short or involve angles too small to be detected. The analysis is limited to complete years, defined from 1 January to 31 December, which means that there is a chance of erroneously modifying some correct months or overlooking erroneous isolated months belonging to mostly correct years. This problem should, however, involve a relatively small number of months, as the amount of data needed to discernibly alter the wind rose is generally large and tends to be even larger for smaller shifts (not shown).
c. Phase 6: Isolated records
Once the previous steps of the QC are completed, a final step (phase 6 in Fig. 1) is conducted to flag isolated suspicious data. After the application of a number of tests, it is not uncommon to find short groups of isolated data between relatively long segments of missing or flagged data. This can also happen in the original data series between relatively long periods of missing observations. The reliability of these data is questionable (Lawrimore et al. 2011). The criterion followed herein is that any sequence of observations 24 h long or shorter that is surrounded by intervals of missing observations of 24 h or longer is flagged.
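A sketch of this criterion on a regular time axis (the boolean `valid` mask and `recs_per_day` are illustrative assumptions about the data layout):

```python
import numpy as np

def flag_isolated(valid, recs_per_day):
    """Flag data islands <= 24 h long bounded on both sides by >= 24 h
    of missing records; leading/trailing gaps are taken at their actual
    length. `valid` is a boolean mask on a regular time axis."""
    v = np.concatenate(([False], valid, [False]))
    starts = np.flatnonzero(v[1:] & ~v[:-1])    # island start indices
    ends = np.flatnonzero(~v[1:] & v[:-1])      # one past island ends
    flags = np.zeros(valid.size, dtype=bool)
    for n, (a, b) in enumerate(zip(starts, ends)):
        gap_before = a - (ends[n - 1] if n > 0 else 0)
        gap_after = (starts[n + 1] if n + 1 < starts.size else valid.size) - b
        if (b - a) <= recs_per_day and \
           gap_before >= recs_per_day and gap_after >= recs_per_day:
            flags[a:b] = True
    return flags

# Hourly axis: a 3-h island inside two >= 24-h gaps gets flagged.
mask = np.zeros(72, dtype=bool)
mask[30:33] = True
print(flag_isolated(mask, 24).sum())   # 3
```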
4. Results
This section reports on the results of the QC process and provides some additional technical details and examples of the identified error typologies. The number of the affected records in each phase is presented in Table 3, where all the phases of Part I and Part II are sequentially shown, offering a general view.
Number of affected data during each phase of the QC (Fig. 1) for wind speed (WS) and wind direction (WD), and in total. Percentages, in parentheses, are given with reference to the initial number of WS/WD records (53 956 328 records each). Column 4 refers to both WS and WD as if the removal (Part I) or FR (Part II) had been applied, since the elimination of a WS or WD record implies the loss of the WS–WD pair. For FC data, column 4 is a simple sum of columns 2 and 3. Since some records may have been FR more than once, the total of the last row does not correspond to the sum of individual steps. Phases 1–3 synthesize the results in Table 5 of Part I. Also refer to Fig. 9.
a. Phase 4: Temporal consistency
1) Abnormally low variability
Low-variability checks address the detection of suspicious constant data sequences (see section 3a), making a distinction between noncalm sequences (constant wind direction, or constant nonzero wind speed) and calms (zero wind speed).
Spatial distribution of the affected stations after checking for (a) long constant noncalm sequences and (b) calm corrections. Symbols indicate the variable (wind direction/speed) and source institution, and color fill indicates the number of affected data. (c) Example of simultaneously flagged constant periods for both wind speed (red) and direction (blue) at the same station (station CXPC, Parry Sound, Ontario; NCAR). (d) Absolute frequency distribution of the number of calms (left y axis) vs their length in number of records: those evaluated with a spatial comparison (blue), those that had at least one flagged record (red), and the percentage of flagged calms over the total evaluated (black, right y axis). Vertical (horizontal) bars mark lengths of 12, 24, and 48 records (the 75% level) for easier visualization.
The total number of handled constant sequences for tests 1 and 2 adds up to around 602 000 in direction; the number of detected cases and the associated false-positive ratios are summarized in Table 4.
Summary of the number of cases detected in each test of Part II. Results are provided in total and also separately for each dataset/institution. The range of lengths of these cases is indicated in number of records for abnormally low and high variability, and in hours/weeks/years for biases and isolated values. False-positive ratios, obtained after manual scrutiny of all the outcomes, are indicated in parentheses when possible.
The third test is based on spatial consistency and was applied only to calm situations. It accounts for the largest part of the flagged data: circa 32 000 of a total of 97 000 candidate calms showed at least one flagged record. Although the method targets calms of any length, the percentage of flagged sequences increases with length, to more than 75% for sequences longer than 24 records (Fig. 4d; total calms in blue, flagged calms in red, and percentages in black). The majority of the flagged records within the intradaily sequences are concentrated during diurnal hours, when the winds tend to be higher and it is less likely that the regional series support zero wind speed at the test site (not shown). The percentage of flagged calm sequences remains more or less constant throughout the whole period of the database (not shown).
2) Abnormally high variability
Regarding high-variability errors, 2082 (<0.01%) wind speed records were flagged, affecting 160 sites (Fig. 5a): 82 EC sites (528 records), three DFO buoys (3), and 75 NCAR sites (1551). The number of cases and associated false-positive ratios can be found in Table 4. The errors are more or less homogeneously distributed over time as shown in Fig. 5b and increase with the addition of new sites. NCAR sites show an abundance of flagged records, a consequence of the lack of QC processes applied to them. The majority involves isolated records, placed well above the typical range of variability of the site, as shown in Fig. 5c for a site located in Massachusetts. Although less common, longer faulty periods can also be found in the database, such as the one in New Brunswick, Canada (EC, Fig. 5d).
(a) Spatial distribution of high-variability errors. Symbols indicate the dataset, and the color scale indicates the number of affected data. (b) Temporal distribution of flagged values. The number of operating stations per year (dashed lines, right ordinate axis), and the number of erased data per year (solid lines, left ordinate axis). Colors indicate the source institution. (c) Example of high-variability flagged data (red points) corresponding to site NZW (South Weymouth Naval Air Station, Massachusetts; NCAR). (d) As in (c), but for site 8104201 (Point Lepreau climatological station, New Brunswick; EC).
b. Phase 5: Bias detection
1) Wind speed
The systematic biases in wind speed records are divided into two groups depending on their causes: those attributable to documented anemometer height changes; and those that, ranging from weeks to months, are caused by unknown/undocumented factors [see section 3b(1)].
The first group of biases was corrected following Eq. (2), which makes use of the available metadata on anemometer height changes. Documentation about the heights was available for 220 sites in total: 166 Canadian sites (EC + NCAR), all of the 40 DFO buoys, and 14 U.S. (NCAR) sites (Fig. 6a). Despite involving only 40% of the sites, these include 106 of the 125 sites longer than 20 years (85%; Fig. 6a), which are the most prone to suffer changes. The corrections have been applied to the 91 sites with at least one height change (Fig. 6b). The number of documented local changes ranges between one and seven, with the longest sites suffering the most changes. Nevertheless, the comparatively shorter moored buoy records can accumulate between one and four changes per site because of a combination of two factors: 1) some buoys changed their hull type through time (Table 2 in Part I), and 2) each time series was constructed by combining the information of two channels (see section 2 in Part I), belonging in most cases to anemometers located at different heights (Table 2 in Part I). A total of 8 563 779 (15.87%) records have been modified (Table 3). Figure 6c shows the temporal distribution of the documented heights for the 1953–2010 period. Measuring heights broadly range from 6 to 37.19 m for land sites (3.3–10 m for moored buoys). Before the late 1960s/early 1970s there was no preferred height, as evidenced by the larger diversity of heights and the uniform distribution of stations over them (Fig. 6c). After the 1970s, however, a tendency to adopt the standard 10-m height develops (Klink 1999; Wan et al. 2010), albeit with some notable exceptions (e.g., 37.19 m at the Greater Binghamton Airport, Binghamton, New York). The decrease in the percentage of 10-m-height sites after the late 1990s parallels the increase of heights below 10 m, which is mainly related to the appearance of moored buoys. Figure 6d shows an example of one of the longest time series in the database (Goose Bay, Labrador, Canada; EC) with seven documented height changes. The records were corrected to its reference, last documented, height of 10 m.
(a) Spatial distribution of anemometer heights (m). The indicated height is the reference, last known, height. Sites for which individual information was not available (pink). The length of the time series is indicated by the size of the symbols. (b) Spatial distribution of sites with at least one change in height. The amount of modified records is indicated by color filling, the number of documented height changes by the symbol, and the dataset by the color of the border. (c) Temporal distribution of known anemometer heights in the database. The heights (left axis) are rounded to the closest meter for clarity. The total number of active stations with documented height at each moment (blue line, right axis). The color bar shows the percentage of stations from the total that are placed at a given height (in logarithmic scale). (d) Monthly wind speed at Goose Bay (Labrador; EC) before and after the correction of its seven documented height changes.
Regarding nondocumented errors, 103 562 records (0.19%) were flagged (Table 3), affecting 78 sites in total (Fig. 7a): 37 EC (58 255 records), eight DFO (32 793 records), and 33 NCAR sites (12 514 records). Information about the number of detected cases and the associated false-positive ratios can be found in Table 4. The most abundant flagged periods are under 4–5 weeks long (Fig. 7b). Many of the shortest periods correspond to NCAR sites and coincide with high-variability errors detected previously. Two additional examples (Figs. 7c,d) show periods of about a month at Parry Sound (Ontario, Canada; NCAR) and about 3 months at Laterrière (Quebec, Canada; EC). The flagged cases from DFO tend to involve extremely low values, in contrast to the NCAR cases. The longest identified case (Fig. 3b) is likely due to an undocumented height change, as the coefficient of variation is unaffected (Vautard et al. 2010). The temporal distribution of flagged segments of data with biases (Fig. 7e) shows that, despite the successive increase in the number of sites, EC has maintained a stable or even declining number of erroneous periods in recent years, likely as a result of improvements in both site maintenance and data processing/QC methodologies. These numbers are comparable to those of DFO and NCAR in spite of the higher number of EC stations.
(a) Spatial distribution of the sites affected by undocumented wind speed biases, symbols indicate the dataset, and colors indicate the number of affected data. (b) Number of cases vs their approximate length in weeks. Vertical bars indicate some time scales for better visualization. Examples of flagged periods of (c) almost 1 month at site CXPC [Parry Sound Canadian Coast Guard (CCG), Ontario; NCAR] and (d) 3 months at site 7064181 (Laterrière). (e) Temporal distribution of the flagged values. The number of operating stations per year (dashed lines, right y axis), and the number of flagged data per year (solid lines, left y axis). Colors indicate the source institution.
2) Wind direction
Detection of wind direction biases involved the correction of 931 842 (1.73%) records (Table 3). In total, 36 stations were affected by vane shifts greater than 20° (Fig. 8a), with lengths spanning from one to several years. Most of these sites were affected by one or two shifts (circles). The metadata files (triangles) provided information to correct four additional sites, all of them with periods shorter than a year or with angles smaller than 20°, and thus indiscernible by our method. In total, 23 EC sites were affected with 670 798 records, three DFO buoys with 21 876, and 14 NCAR sites with 239 168 records. Figures 8b and 8c show two cases detected with our method: the first corresponds to a site located at the Port Hastings Canal (Nova Scotia, Canada) with five changes, and the second to a station located at the Greenville Maine Forestry Service (Maine) with only one shift. An example of a shift detected from the metadata is shown in Fig. 8d. It corresponds to a site located in Blanc Sablon (Quebec) with a 10° angle shift that lasted from 20 March to 21 October 1974.
(a) Spatial distribution of biases in wind direction. Colors of symbol contours indicate the dataset, symbols indicate that the shift has been detected with the method (circles) and metadata (triangles), symbol size indicates the number of vane changes, and the color scale indicates the number of affected data (×1000). (b) Wind rose showing wind direction bias before and after correction, corresponding to station 8204481 (Port Hastings Canal, Nova Scotia; EC) with five changes, each indicated by a line of a different color. The roses are shifted to match the last time interval that is considered the reference period. The indication for the rotated angle in the inset is provided also with respect to the last time interval. (c) As in (b), but for site KGNR (Greenville Maine Forestry Service; NCAR). (d) As in (b), but for site 7040812 (Blanc Sablon, Quebec; EC).
c. Phase 6: Isolated records
The last step in the QC process involved flagging 6906 (0.01%) pairs of records (Table 3) distributed among 6089 events of varying lengths of 24 h or shorter over 269 sites: 157 EC sites (2007 records), 22 DFO sites (291 records), and 90 NCAR sites (4608 records).
5. Impact
This section describes the impact of the whole QC procedure (Part I and Part II) on the statistics of the observational time series. Figures 9a,b show, for wind speed and direction, the type of error that had the largest implications at each site in terms of corrected/FC or deleted/FR data (excluding the phase related to compilation and the redefinition of calms and true north in Part I). From an initial number of 526 stations, 501 have been affected by one or more of the nine analyzed error typologies in the case of wind speed. The most common error is related to unrealistic calms, being the most relevant one at 300 sites. The standardization of documented changes in height follows as the most important at 89 sites, followed by long-term errors at 52 sites and isolated values at 46. In wind direction, 310 stations were affected by at least one of the six analyzed error types. The most relevant error at the majority of the stations (209) is related to isolated values, followed by constant periods (42) and biases in direction measurements (40).
Overview of the errors involving the largest amount of data at each site for (a) wind speed and (b) wind direction. Results for both Part I and Part II are included. The errors are indicated with colors, and the symbols indicate the data source institution (see legends). The symbols are given in decreasing sizes for easier visualization of close sites. (c) Distribution of the percentage of total deleted data at each site after all the steps of the QC. The stations with percentages over 90% were removed from the initial database. (d) Distribution of the percentage of total modified data at each site in Part II. (e), (left) Number of erased (Part I)/flagged (Part II) data (bars, left y axis) for the whole database (blue) and for each dataset (other colors), at each of the steps during the whole QC (x axis). The affected number of sites are in lines (right y axis). (e), (right) As in (left), but for percentages corresponding to total amount of data/sites (gray) or divided by dataset (rest). For the meaning of the abbreviations, refer to Table 3, phases 2–6.
Regarding wind speed and considering the different datasets, those affected by one or more issues include 328 EC sites (95% of EC sites), 35 DFO buoys (87%), and 138 NCAR sites (98%). Regarding direction, 183 EC sites were affected (53%), 27 DFO sites (67%), and 100 NCAR sites (69%).
The total accumulated percentage of removed/FR data over all the tests is shown in Fig. 9c. In total, 501 sites were affected, although for the vast majority of them (416) less than 1% of the data were involved. Only six sites presented percentages above 10%, four of them buoys mostly affected by undocumented biases and very long calm situations, with 25%–50% of their data flagged. Site CWVY, located in Lemieux (Quebec; NCAR; see Part I, section 2b), had all its data removed, as it was found to have been constructed with data from two other nearby sites. Fewer sites, 123, were affected by data corrections (accumulated percentages in Fig. 9d), but with higher percentages of affected data than in the previous case: most of these sites (107) presented percentages above 10%, and 33 sites above 50%. Table 5 summarizes the number of FR/removed and FC/corrected data per dataset and in total.
Summary of FR/removed and FC/corrected records (wind speed + wind direction) by dataset and in total. The records that have been FR by multiple tests are counted only once. The percentages relative to the size of each dataset are in parentheses.
Figure 9e categorizes the results by test and dataset, both in raw numbers and in percentages. Although the EC dataset shows a higher number of flagged records and more affected sites than NCAR and DFO, the situation is reversed in terms of percentages: NCAR tends to show more problems with isolated segments of data, unphysical measurements, vane orientation, and high variability, whereas DFO registers more problems related to low-variability measurements and long-term biased periods. In percentages, DFO (NCAR) has 8 (4) times as many removed/FR data as EC (Table 5).
The impact of the correction/removal of records on the shape parameters of the statistical distribution of the data is shown in Figs. 10 and 11. These parameters are the mean, standard deviation, skewness, and kurtosis obtained from the first- to fourth-order moments (von Storch and Zwiers 2003). Despite not being optimal estimators for non-Gaussian distributions, they have nevertheless been used, as they offer valuable information about the changes in the wind distributions before and after applying the QC. The differences in the mean and the ratios of the standard deviations before and after the QC are shown for wind speed (Figs. 10a,c) and wind direction (Figs. 10b,d), the latter computed with directional statistics (Mardia and Jupp 2009).
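For wind direction, the first two moments are computed with circular statistics; a minimal sketch of the directional mean and standard deviation (Mardia and Jupp 2009) is given below.

```python
import numpy as np

def circular_mean_std(dir_deg):
    """Directional mean and standard deviation (Mardia and Jupp 2009)
    of wind directions in degrees; ordinary arithmetic moments are not
    meaningful for a circular variable."""
    rad = np.deg2rad(dir_deg[~np.isnan(dir_deg)])
    c, s = np.cos(rad).mean(), np.sin(rad).mean()
    r = np.hypot(c, s)                           # mean resultant length
    mean = np.rad2deg(np.arctan2(s, c)) % 360.0
    std = np.rad2deg(np.sqrt(-2.0 * np.log(r)))  # circular std (degrees)
    return mean, std

# 350 and 10 degrees average to north (0 deg), not to 180 deg:
print(circular_mean_std(np.array([350.0, 10.0])))   # (0.0, ~10.0)
```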
Spatial distribution of differences in (a),(b) mean and (c),(d) standard deviation ratios for (a),(c) wind speed and (b),(d) wind direction (using directional statistics, Mardia and Jupp 2009) before and after QC changes.
Spatial distribution of the (a),(b) skewness and (c),(d) kurtosis for wind speed (a),(c) before and (b),(d) after the QC.
The skewness of the wind speed time series, a measure of the asymmetry of the distribution (Figs. 11a,b), is significantly reduced after the QC process, a sign of the effect on the tails of the distributions of erasing unrealistically high records. It nevertheless remains positive, which is characteristic of this type of variable. It is noteworthy that all sites now show skewness values within the narrow range [0.25, 1.75]. The kurtosis, a measure of the peakedness of the distribution (Figs. 11c,d), is also drastically reduced at stations with a greater number of high values but is still generally leptokurtic; here the reference value of kurtosis (excess kurtosis) is 0 for normal distributions. The stations now show a narrow range in kurtosis, [−0.5, 5].
6. Conclusions
This paper describes the second part of a semiautomatic QC procedure designed to identify and correct erroneous records of a surface wind speed and direction database of opportunity located in northeastern North America (WNENA), compiled from heterogeneous sources that had been subjected to previous quality treatments of varying depth. There are relatively few works covering these types of meteorological variables, especially at such depth (e.g., DeGaetano 1997; Graybeal et al. 2004; Dunn et al. 2016). The vast array of tests described herein provides an overview of data quality issues and offers some guidelines on how to improve a starting position that may not be optimal in terms of data quality but is nevertheless representative of what one can commonly access or acquire in the course of a time-bounded study. Most of these tests are either improved versions of those in previous studies or have been newly developed for this work. The tests described in Part II (Fig. 1) are focused on the detection of measurement errors: errors produced at the moment of the measurement and related to faulty instrumental performance, calibration, or siting and exposure. In contrast, the first three phases, described in Part I (Fig. 1, shaded), were centered on issues related to data management: problems originating in the compilation and subsequent unification of databases from institutions that follow different criteria, and in data manipulation procedures.
Most of the tests presented herein are based on simple principles and are computationally affordable, offering admissible false-positive ratios (Table 4).
As a result of the whole QC process (Part I and Part II), about 0.5% of wind speed and 0.16% of wind direction records have been identified as erroneous and removed/FR (0.49% and 0.03%, respectively, corresponding to Part II alone; see Table 3), resulting in a total of 0.65% of discarded data pairs. Additionally, 15.87% of wind speed and 1.73% of wind direction records have been corrected after testing for biases (Part II), and more than 90% of the records were modified in one way or another during the compilation (Part I). The results of the different procedures provide evidence of the inferior initial data quality of the NCAR and DFO datasets used in this study, with a large majority of high-variability-related errors at the NCAR sites and low-variability errors at the DFO sites. These datasets present overall a larger percentage of removed/FR data than EC (Table 5).
Some general considerations related to the assessment of measurement errors can be extracted from the development and application of the procedures described herein. Regarding low-variability problems, the length of purported calms or, in general, of sequences during which wind speed or direction remains constant can vary greatly depending on the resolution and instrument precision. The importance of segregating sequences according to resolution and precision when assessing the distributions of their lengths has been shown (Fig. 2a). For unrealistically long calm periods, threshold analysis allows for singling out obvious erroneous cases. Identifying shorter erroneous calm sequences with plausible lengths is a challenging task; the spatial comparison has been shown to be useful in these situations (Fig. 2c). The tests for high-variability errors have been successful in flagging erroneous clusters of data (Fig. 5d) and extreme events regardless of the recording time resolution of the sites. It would be interesting to adapt a similar technique for wind direction in the future, a topic hardly addressed in depth in the literature (DeGaetano 1997). The test employed to look for undocumented long-term errors in wind speed has been effective at bridging the time-scale gap between the targets of traditional QC processes (hourly to weekly) and the statistical methodologies devised for homogenization problems (interannual and above; Fig. 7b). The use of metadata has been decisive in identifying a large number of changes in anemometer heights and their associated disturbances of long-term wind speed trends (Fig. 6d). Most of the corrections affected the longest sites, which are more prone to successive height changes, and buoys, with plentiful changes in hull types and transmission channels. Finally, a wind rose correction procedure (Figs. 8a,b) has been proposed with satisfactory results, a topic barely treated in the literature. The method still poses some limitations regarding the minimum unit length for correction (one year) and the minimum detectable angle of rotation (effective only for shifts larger than 20°).
The development and/or application of techniques to correct wind speed inhomogeneities of an undocumented nature, including long-term biases and/or drifts, has not been considered herein (e.g., Wan et al. 2010). Also, problems such as buoy tilt or wave sheltering in the DFO records (e.g., Gower 1996; Skey et al. 1998) are beyond the scope of the current work.
After the QC, WNENA consists of 525 sites. This database has a relatively homogeneous distribution of sites through time (Fig. 12a). The oldest (and longest) stations are those starting in 1953. The database grows considerably after 1978 with the inclusion of some NCAR stations, and also during the 1990s with the aggregation of DFO buoys and new EC and NCAR stations. The last 15 years of data show a spatially homogeneous and temporally stable coexistence of around 300 stations. A considerable number of stations (more than 200) were still active in 2010 (Fig. 12b), which would allow for expansion of the database in the future. Figure 12c shows the spatial distribution of the mean wind speeds and wind directions, and the standard deviations of the database. The winds, predominantly westerlies, reach their maximum values along the coast of Labrador, the island of Newfoundland, and the Gulf of St. Lawrence. Figure 12d shows the effects of the QC on the wind speed distribution: before (red), after Part I (blue), and after Part II (black). As a result of the QC process, the highest retained wind speed values have been reduced from 100 to 53.5 m s−1.
Spatial distribution of availability of observations in the final database: (a) dates of first recordings at each site (colors) and the number of years with available data (symbol size), and (b) dates of last recordings (color scale). (c) Spatial distribution of mean winds (arrows) and standard deviations (isolines). The arrows give the direction from which the mean wind is blowing. The wind speed is given by the arrow size and color. The topography of the area is presented in grayscale. (d) Wind speed histogram comparing the pre-QC database (red) with the database after Part I (blue) and Part II (black). (inset) Seasonal distribution of the number of wind speed records
Acknowledgments
EELE was supported by the Agreement of Cooperation 4164281 between the UCM and St. Francis Xavier University, and projects CGL2014-59644-R and PCIN-2014-017-C07-06 of the MINECO (Spain). Funding for 4164281 was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC DG 140576948), the Canada Research Chairs Program (CRC 230687), and the Atlantic Innovation Fund (AIF-ACOA). HB holds a Canada Research Chair in Climate Dynamics. JN and JFGR were supported by projects PCIN-2014-017-C07-03, PCIN-2014-017-C07-06, CGL2011-29677-C02-01, and CGL2011-29677-C02-02 of the MINECO (Spain). JC was supported by Global Forecasters until March 2014. This research has been conducted under the Joint Research Unit between UCM and CIEMAT, by the Collaboration Agreement 7158/2016. The research has also received funding from the European Union's Horizon 2020 Programme (2014–2020) and from Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP) under the HPC4E Project (www.hpc4e.eu), Grant Agreement 689772. We wish to thank the people of Environment and Climate Change Canada, Department of Fisheries and Oceans Canada, and the National Center for Atmospheric Research for providing us with the original data used in this study and for their kindness in responding to all the questions that arose during the development of this work and the review process. Special thanks to Gérard Morin and Hui Wan for the metadata of EC sites; Bruce Bradshaw, Mathieu Ouellet, and Bridget Thomas for information regarding moored buoys; and Douglas Schuster for information regarding the ds461.0 and ds464.0 datasets. We thank J. Álvarez-Solas, A. Hidalgo, and P.A. Jiménez for the helpful discussions. Finally, we would also like to thank the reviewers for the many suggestions and useful information they offered us.
Note: A first version of this database will be made available to the public. The QC procedures in this manuscript have been developed using Linux shell scripting and Fortran programming. Potential users interested in having the code are invited to contact the corresponding author.
REFERENCES
Alexandersson, H., 1986: A homogeneity test applied to precipitation data. Int. J. Climatol., 6, 661–675, https://doi.org/10.1002/joc.3370060607.
Begert, M., G. Seiz, T. Schlegel, M. Musa, G. Baudraz, and M. Moesch, 2003: Homogenisierung von Klimamessreihen der Schweiz und Bestimmung der Normwerte 1961-1990. MeteoSchweiz Tech. Rep. 67, 170 pp.
Cheng, C. S., 2014: Evidence from the historical record to support projection of future wind regimes: An application to Canada. Atmos.–Ocean, 52, 232–241, https://doi.org/10.1080/07055900.2014.902803.
DeGaetano, A., 1997: A quality-control routine for hourly wind observations. J. Atmos. Oceanic Technol., 14, 308–317, https://doi.org/10.1175/1520-0426(1997)014<0308:AQCRFH>2.0.CO;2.
DeGaetano, A., 1998: Identification and implications of biases in U.S. surface wind observation, archival, and summarization methods. Theor. Appl. Climatol., 60, 151–162, https://doi.org/10.1007/s007040050040.
Dunn, R. J. H., K. M. Willett, D. E. Parker, and L. Mitchell, 2016: Expanding HadISD: Quality-controlled, sub-daily station data from 1931. Geosci. Instrum. Methods Data Syst., 5, 473–491, https://doi.org/10.5194/gi-5-473-2016.
Durre, I., M. J. Menne, B. E. Gleason, T. G. Houston, and R. S. Vose, 2010: Comprehensive automated quality assurance of daily surface observations. J. Appl. Meteor. Climatol., 49, 1615–1633, https://doi.org/10.1175/2010JAMC2375.1.
Fiebrich, C. A., C. R. Morgan, A. G. McCombs, P. K. Hall, and R. A. McPherson, 2010: Quality assurance procedures for mesoscale meteorological data. J. Atmos. Oceanic Technol., 27, 1565–1582, https://doi.org/10.1175/2010JTECHA1433.1.
Gandin, L., 1988: Complex quality control of meteorological observations. Mon. Wea. Rev., 116, 1137–1156, https://doi.org/10.1175/1520-0493(1988)116<1137:CQCOMO>2.0.CO;2.
García-Bustamante, E., J. F. González-Rouco, J. Navarro, E. Xoplaki, P. A. Jiménez, and J. P. Montávez, 2012: North Atlantic atmospheric circulation and surface wind in the Northeast of the Iberian Peninsula: Uncertainty and long term downscaled variability. Climate Dyn., 38, 141–160, https://doi.org/10.1007/s00382-010-0969-x.
García-Bustamante, E., and Coauthors, 2013: Relationship between wind power production and North Atlantic atmospheric circulation over the northeastern Iberian Peninsula. Climate Dyn., 40, 935–949, https://doi.org/10.1007/s00382-012-1451-8.
Gower, J. F. R., 1996: Intercalibration of wave and wind data from TOPEX/POSEIDON and moored buoys off the west coast of Canada. J. Geophys. Res., 101, 3817–3829, https://doi.org/10.1029/95JC03281.
Graybeal, D., A. DeGaetano, and K. Eggleston, 2004: Complex quality assurance of historical hourly surface airways meteorological data. J. Atmos. Oceanic Technol., 21, 1156–1169, https://doi.org/10.1175/1520-0426(2004)021<1156:CQAOHH>2.0.CO;2.
Gruber, C., and L. Haimberger, 2008: On the homogeneity of radiosonde wind time series. Meteor. Z., 17, 631, https://doi.org/10.1127/0941-2948/2008/0298.
Hubbard, K., S. Goddard, W. Sorensen, N. Wells, and T. Osugi, 2005: Performance of quality assurance procedures for an applied climate information system. J. Atmos. Oceanic Technol., 22, 105–112, https://doi.org/10.1175/JTECH-1657.1.
Jiménez, P., E. García-Bustamante, J. González-Rouco, F. Valero, J. Montávez, and J. Navarro, 2008: Surface wind regionalization in complex terrain. J. Appl. Meteor. Climatol., 47, 308–325, https://doi.org/10.1175/2007JAMC1483.1.
Jiménez, P., J. González-Rouco, E. García-Bustamante, J. Navarro, J. Montávez, J. de Arellano, J. Dudhia, and A. Muñoz-Roldan, 2010a: Surface wind regionalization over complex terrain: Evaluation and analysis of a high-resolution WRF simulation. J. Appl. Meteor. Climatol., 49, 268–287, https://doi.org/10.1175/2009JAMC2175.1.
Jiménez, P., J. González-Rouco, J. Navarro, J. Montávez, and E. García-Bustamante, 2010b: Quality assurance of surface wind observations from automated weather stations. J. Atmos. Oceanic Technol., 27, 1101–1122, https://doi.org/10.1175/2010JTECHA1404.1.
Klink, K., 1999: Climatological mean and interannual variance of United States surface wind speed, direction and velocity. Int. J. Climatol., 19, 471–488, https://doi.org/10.1002/(SICI)1097-0088(199904)19:5<471::AID-JOC367>3.0.CO;2-X.
Lawrimore, J., M. Menne, B. Gleason, C. Williams, D. Wuertz, R. Vose, and J. Rennie, 2011: An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3. J. Geophys. Res., 116, D19121, https://doi.org/10.1029/2011JD016187.
Loveland, T., B. Reed, J. Brown, D. Ohlen, Z. Zhu, L. Yang, and J. Merchant, 2000: Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data. Int. J. Remote Sens., 21, 1303–1330, https://doi.org/10.1080/014311600210191.
Lucio-Eceiza, E. E., J. F. González-Rouco, J. Navarro, and H. Beltrami, 2017: Quality control of surface wind observations in north eastern North America. Part I: Data management issues. J. Atmos. Oceanic Technol., https://doi.org/10.1175/JTECH-D-16-0204.1, in press.
Mardia, K. V., and P. E. Jupp, 2009: Directional Statistics. Wiley Series in Probability and Statistics, Vol. 494, John Wiley & Sons, 456 pp., https://doi.org/10.1002/9780470316979.
Meek, D., and J. Hatfield, 1994: Data quality checking for single station meteorological databases. Agric. For. Meteor., 69, 85–109, https://doi.org/10.1016/0168-1923(94)90083-3.
MSC, 2013: MANOBS: Manual of surface weather observations. 7th ed. Amendment 18, Meteorological Service of Canada Tech. Rep. En56-238/2-2012E-PDF, 488 pp.
NCEP ADP OGSO, 1980: NCEP ADP operational global surface observations. National Center for Atmospheric Research Computational and Information Systems Laboratory Research Data Archive. Subset: February 1975–February 2007, accessed 1 January 2010, http://rda.ucar.edu/datasets/ds464.0/.
NCEP ADP OGSO, 2004: NCEP ADP global surface observational weather data, continuing from October 1999. National Center for Atmospheric Research Computational and Information Systems Laboratory Research Data Archive, accessed 1 January 2010, http://rda.ucar.edu/datasets/ds461.0/.
Petrovic, P., 2006: Detection of inhomogeneities in wind direction and speed data. Proc. Fifth Seminar for Homogenization and Quality Control in Climatological Databases, WCDMP-71, Budapest, Hungary, WCDMP, 83–90.
Plante, M., S.-W. Son, E. Atallah, J. Gyakum, and K. Grise, 2015: Extratropical cyclone climatology across eastern Canada. Int. J. Climatol., 35, 2759–2776, https://doi.org/10.1002/joc.4170.
Pryor, S. C., and Coauthors, 2009: Wind speed trends over the contiguous United States. J. Geophys. Res., 114, D14105, https://doi.org/10.1029/2008JD011416.
Shafer, M., C. Fiebrich, D. Arndt, S. Fredrickson, and T. Hughes, 2000: Quality assurance procedures in the Oklahoma Mesonetwork. J. Atmos. Oceanic Technol., 17, 474–494, https://doi.org/10.1175/1520-0426(2000)017<0474:QAPITO>2.0.CO;2.
Skamarock, W., and Coauthors, 2008: A description of the Advanced Research WRF version 3. NCAR Tech. Rep. NCAR/TN-475+STR, 113 pp., http://dx.doi.org/10.5065/D68S4MVH.
Skey, S. G. P., K. Berger-North, and V. R. Swail, 1998: Measurement of winds and waves from a NOMAD buoy in high sea states. Preprints, Fifth Int. Workshop on Wave Hindcasting and Forecasting, Melbourne, FL, Environment Canada, 163–175, http://waveworkshop.org/5thWaves/C4.pdf.
Thomas, B. R., and V. R. Swail, 2011: Buoy wind inhomogeneities related to averaging method and anemometer type: Application to long time series. Int. J. Climatol., 31, 1040–1055, https://doi.org/10.1002/joc.2339.
Thomas, B. R., E. Kent, and V. Swail, 2005: Methods to homogenize wind speeds from ships and buoys. Int. J. Climatol., 25, 979–995, https://doi.org/10.1002/joc.1176.
Vautard, R., J. Cattiaux, P. Yiou, J.-N. Thépaut, and P. Ciais, 2010: Northern Hemisphere atmospheric stilling partly attributed to an increase in surface roughness. Nat. Geosci., 3, 756–761, https://doi.org/10.1038/ngeo979.
Vejen, F., Ed., 2002: Quality control of meteorological observations: Automatic methods used in the Nordic countries. Norwegian Meteorological Institute KLIMA Tech. Rep. 8/2002, 109 pp.
von Storch, H., and R. W. Zwiers, 2003: Statistical Analysis in Climate Research. Cambridge University Press, 484 pp., https://doi.org/10.1017/CBO9780511612336.
Wade, C. G. N., 1987: A quality control program for surface mesometeorological data. J. Atmos. Oceanic Technol., 4, 435–453, https://doi.org/10.1175/1520-0426(1987)004<0435:AQCPFS>2.0.CO;2.
Wan, H., and X. L. Wang, 2006: Canadian special metadata database for climate data homogenization. Environment Canada Internal Rep., 29 pp.
Wan, H., X. L. Wang, and V. R. Swail, 2010: Homogenization and trend analysis of Canadian near-surface wind speeds. J. Climate, 23, 1209, https://doi.org/10.1175/2009JCLI3200.1.
WMO, 1950: Provisional guide to international meteorological instrument and observing practice. World Meteorological Organization Tech. Rep. WMO-8, chapter 6.4, 68 pp.
WMO, 1969: Measurement of surface wind. Guide to meteorological instruments and methods of observation, 3rd ed. Secretariat of the World Meteorological Organization Tech. Rep. WMO-8, chapter 6.4, 10 pp.
WMO, 1983: Measurement of surface wind. Guide to meteorological instruments and methods of observation, 5th ed. Secretariat of the World Meteorological Organization Tech. Rep. WMO-8, chapter 6.6.2, 14 pp.
WMO, 2008: Guide to meteorological instruments and methods of observation. 7th ed. World Meteorological Organization Tech. Rep. WMO-8, 716 pp.