1. Introduction
Any analysis that makes use of meteorological or climatological observations relies on the quality of the data. Errors can be introduced at any stage of data handling, from the recording, storage, and distribution of the measurements until they reach the end users. Therefore, it is important to establish procedures to ensure the quality of the observations. Errors can be classified into three different types (Gandin 1988): random, systematic, and rough errors. Random errors are unavoidably inherent to all data; they are independent of the measured value and follow a zero-centered normal distribution. Systematic errors are distributed asymmetrically, usually persist in time, and have multiple origins (e.g., instrument bias, calibration drifts, exposure problems; Wade 1987). They can easily be mistaken for random errors unless there is a priori information about them. Last, the malfunctioning of measuring devices and mistakes during data processing, transmission, and reception (Gandin 1988) can lead to the third type of error, the so-called rough (or large) error. The majority of rough errors are caused by the malfunctioning of measuring devices or are communication related, introduced when the data are recorded, pass through, or emerge from communication channels. Although usually only a very small part of all the data is affected, the distortion caused by rough errors can be large enough to greatly impact subsequent analyses.
The procedures and protocols targeting the flagging and elimination, or eventual correction, of those systematic and rough errors are traditionally known as quality control (QC; e.g., DeGaetano 1997) or quality assurance (QA; e.g., Shafer et al. 2000) procedures. Both terms are frequently used in the literature, often with the same meaning (e.g., Meek and Hatfield 1994; Eischeid et al. 1995; Graybeal et al. 2004; Wan et al. 2007; Lawrimore et al. 2011), which can lead to ambiguity. According to the Guide to Meteorological Instruments and Methods of Observation (WMO 2008), however, the difference is clear: QA is the framework designed to prevent errors throughout the various meteorological activities, whereas QC comprises the procedures to detect them. Therefore, we will refer to the analysis presented herein as QC.
QC processes can be designed to fulfill different goals. Some evaluate operational (real time) data by focusing on single stations (Meek and Hatfield 1994) or involving station networks (Wade 1987; Gandin 1988; DeGaetano 1997; Shafer et al. 2000; Fiebrich et al. 2010). Others focus their attention on assuring the quality of previously compiled historical databases (Graybeal et al. 2004; Jiménez et al. 2010b) that, in turn, may have been previously subjected to a QC process. Many quality protocols address several meteorological variables at the same time and are able to exploit cross information from each parameter (Gandin 1988; Meek and Hatfield 1994; Shafer et al. 2000; Fiebrich et al. 2010; Dunn et al. 2016), while in other cases they specifically focus on one parameter, which is often temperature or precipitation (Gandin 1988; Eischeid et al. 2000; González-Rouco et al. 2001; Lanzante et al. 2003; Lawrimore et al. 2011). Comparatively few studies address the detection of erroneous data and their correction or suppression in wind variables (DeGaetano 1997; Graybeal 2006; Jiménez et al. 2010b). Detection or correction protocols usually involve a battery of tests or checks, each of them focused on a specific potential problem. In some of these tests, a comparison with a neighbor or reference station is essential. In other cases the tests are carried out individually for each station.
The limit or plausibility checks search for individual measurements outside a certain physically or statistically admissible range of values (e.g., Meek and Hatfield 1994; Graybeal et al. 2004; Lawrimore et al. 2011; Woodruff et al. 2011; Dunn et al. 2016). Temporal consistency checks account for excessive variability or unrealistically steady behaviors (e.g., DeGaetano 1997; Jiménez et al. 2010b). Internal consistency checks cross-compare multiple variable types or a variable type from redundant sensors (e.g., Shafer et al. 2000; Graybeal et al. 2004). Spatial checks evaluate the records of a site in relation to those obtained at neighboring locations (e.g., Barnes 1964; Gandin 1988; DeGaetano 1997; Hubbard et al. 2005; Durre et al. 2010; Steinacker et al. 2011). Duplication error checks identify segments that could have been artificially duplicated within a station’s lifetime or between different sites (e.g., Kunkel et al. 1998; Guttman 2002; Durre et al. 2010; Jiménez et al. 2010b; Lawrimore et al. 2011; Dunn et al. 2016). Finally, typographical error checks look for errors related to human mistakes made when the observations were recorded on paper and later transcribed during digitization efforts (e.g., DeGaetano 1997; Kunkel et al. 1998; Guttman 2002; Graybeal et al. 2004; Dunn et al. 2016).
In addition to these tests, which mainly deal with rough errors, there are also different procedures focused on the detection of systematic errors or biases (e.g., Klink 1999; Begert et al. 2003; Thomas et al. 2005; Jiménez et al. 2010b; Wan et al. 2010). These problems tend to affect longer time intervals than those discussed above. When changes are documented, corrections can be straightforward, as in the standardization of records after known measurement height changes (Klink 1999; Thomas et al. 2005). These methods straddle the often fuzzy border between QC and data homogenization procedures, which are focused on the detection (and eventual correction) of artificial breaks in long-term means, standard deviations, or trends (e.g., Alexandersson 1986; González-Rouco et al. 2001; Begert et al. 2003; Wan et al. 2010).
At the stage of implementing corrections, some studies treat the aforementioned checks independently and thus decide about the quality of the data at the end of each test that is applied sequentially (DeGaetano 1997; Jiménez et al. 2010b; Lawrimore et al. 2011). Other studies use the so-called complex procedures: flagging the data after each test and making the final decision based on the results of all tests (Gandin 1988; Meek and Hatfield 1994; Eischeid et al. 1995; Shafer et al. 2000; Graybeal et al. 2004; Wan et al. 2007). Once the data are flagged, they can be eliminated or corrected through near- or fully automatic processes (Gandin 1988; Dunn et al. 2016) or with the help of human intervention (Wan et al. 2007; Jiménez et al. 2010b; Lawrimore et al. 2011). The data can also be merely flagged (Meek and Hatfield 1994; Eischeid et al. 1995; DeGaetano 1997; Shafer et al. 2000; Graybeal et al. 2004; Durre et al. 2010; Dunn et al. 2016), leaving the ultimate decision regarding corrections/removal to the end user.
Many of the tests cited above seek to detect either rough or systematic errors that can be introduced at different moments between the generation and the archival of meteorological information. These types of erroneous records are in general of a local nature and are in principle not related to the institutional data sources. We will refer to these types of errors broadly as measurement errors.
On the other hand, during the operation and management of meteorological networks, the institutions in charge often adopt a set of criteria that assure the internal coherence of their data regarding the way the variables are measured and postprocessed (e.g., WMO 2008; MSC 2013). The criteria can, however, differ from one institution to another and pose challenges when unifying data from different sources into a common database. Additionally, there are errors, generally related to data manipulation, that can systematically affect multiple series that originate from a common source (e.g., duplication errors). These cases will be regarded as issues related to data storage and management.
The present work summarizes a QC process applied to a historical data compilation of onshore and offshore surface wind observations across the east coast of Canada and the northeastern United States. The observations have been collected from three different sources. The sources were selected on the basis of their availability for this study. The combined time span of the records covers almost 60 years with varying time resolutions; uneven measurement units; and changing measuring procedures, instrumentation, and heights. The level of QC procedures applied to the series prior to our compilation can be very different (Thomas and Swail 2011; MSC 2013). Therefore, the potential number of existing errors could be high and may have a nonnegligible impact on any future analysis.
The large-scale dynamics favor the transit of cyclones of tropical origin over the region of interest during the summer season (Landsea 2007) and of even more intense extratropical cyclones during winter (Hart and Evans 2001; Plante et al. 2015). Such extratropical cyclones are frequently responsible for extreme weather events (Richards and Abuamer 2007; Cheng 2014). A large coastal perimeter and complex orography pose challenges for downscaling strategies oriented to the understanding of wind variability at a range of time scales, from intra- and interannual to long-term trends. So far, this area has received relatively little attention (e.g., Cheng et al. 2008, 2012; Martinez et al. 2013), and future analyses of the database provided here may focus on regional wind variability and trends, as has been done in other regions (e.g., Najac et al. 2009; Jiménez et al. 2010a; García-Bustamante et al. 2012; Pryor and Barthelmie 2014). This may be of scientific and societal relevance, as the government of Canada has shown a growing interest in building wind farms on the peninsula of Nova Scotia and in the neighboring provinces (e.g., Hughes et al. 2006; Hughes 2007; Hughes and Chaudhry 2011). For the credibility of such analyses, however, it is paramount to handle observational databases in which the quality of the different sources is brought to a common ground so that the data can later be used with confidence regardless of their provenance.
The objective of this work is to analyze and improve the quality of a set of surface wind data across northeastern North America obtained from a variety of sources and ultimately to develop a database useful for the analysis of surface wind variability. This study is divided into two parts. The goal of this first part is to analyze the occurrence of the various issues related to data management errors and their impact. Some of the issues treated herein have been discussed in previous works, addressing, for instance, eventual site relocations (e.g., Vautard et al. 2010), duplication errors (e.g., Dunn et al. 2016), or checks related to physical limits (e.g., Durre et al. 2010), among others. Some of the tests used in this work, however, are new, like those targeting site relocations or duplication errors, and can be useful workarounds for situations where metadata are not available (e.g., duplication errors). For each test, a description of the type of problem is provided, together with a report on the statistics of occurrence in space and time and other details, such as the data source. This helps to illustrate the different factors that can contribute to the appearance and occurrence of management errors. Although the specifics of some of the developed tests (especially during the compilation) have been tailored to the data sources used, the issues presented herein are nonetheless common to many different kinds of datasets, and most of the described procedures can be applied broadly.
The second part of this study (Lucio-Eceiza et al. 2017, hereafter Part II) is focused on measurement errors; the procedures presented therein are of universal applicability, as these errors are independent of the dataset. As in Part I, attention will be paid to illustrating the factors that influence the occurrence of errors. In both parts, an evaluation of the impact of errors on the statistics of the data is provided.
The remainder of the present paper is structured as follows. Section 2 describes the observational database. Section 3 describes the methodologies of the QC process for issues related to data management. Section 4 provides an account of the results obtained during each step of the QC applied herein. The impact of the suppressed data is discussed in section 5. The conclusions and some discussion are provided in section 6.
2. Observational wind data
The QC described herein focuses on a surface wind database of northeastern North America (WNENA) that integrates 526 stations. WNENA is the result of an aggregation of three different datasets (Fig. 1a), each one provided by a different institution: Environment Canada [EC; now known as Environment and Climate Change Canada (ECCC)], the Department of Fisheries and Oceans Canada Integrated Science Data Management division (DFO), and the operational global surface observations (NCEP ADP OGSO 1980, 2004) archived at the National Center for Atmospheric Research (NCAR). WNENA has an uneven distribution of stations, with higher spatial density across the southern area and along the coast, and lower density northward and inland. The database spans almost 60 years of hourly, 3- and 6-hourly measurements recorded using a variety of time references (Fig. 1b). Only simultaneous valid data pairs of both wind direction and speed are kept. Additionally, only sites with valuable data from the climatological perspective were selected, keeping those that had a good representation of at least one annual cycle or partial information over more than one season. For land stations, only those that had at least one year with >90% of nonmissing records or 3 years with >50% of nonmissing records were selected. For moored buoys the conditions were less stringent, as these are more prone to having data gaps (Thomas and Swail 2011), specifically those from the Great Lakes, which are only seasonally operated during ice-free months (B. Bradshaw 2009, personal communication). Only buoys that had at least one year with >85% of nonmissing records or 2 years with four operating months were kept. These conditions reduced the initial size of the database from ~700 sites to the actual number of 526 and accounted for an approximate loss of 2 400 000 pairs of records. As these numbers result from an initial decision, made prior to the compilation (section 3a), to remove potentially problematic sites or sites of lower value, they have not been included in the statistics and results described herein.
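The completeness screening can be sketched as follows; this is a minimal Python illustration (the QC itself was implemented in Fortran and shell scripts; see the note preceding the references), assuming the series has been reindexed to a complete hourly time axis, with all names illustrative:

```python
# Minimal sketch of the land-station completeness rule: keep a site if it has
# one year with >90% valid records or three years with >50%. The buoy rule
# (>85% in one year, or 2 years with four operating months) is analogous.
import pandas as pd

def keep_land_station(valid: pd.Series) -> bool:
    """valid: boolean Series on a complete hourly DatetimeIndex,
    True where a simultaneous wind speed/direction pair exists."""
    frac = valid.groupby(valid.index.year).mean()  # valid fraction per year
    return bool((frac > 0.90).any() or (frac > 0.50).sum() >= 3)
```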
Fig. 1. (a) Distribution of available station data and original wind speed units according to the source institution (see legend). (b) Distribution of the regional time zones (local standard times/daylight saving times; shading, see legend) and recording time references of the stations (symbols) as provided by their source institutions. For the latter, stars indicate sites where more than one time reference was used throughout their operational history; in those cases the color indicates the last operational time reference. Daylight saving time (DST; LST + 1 h) has not been used in data recording. (c) Number of active stations over time; each row corresponds to a site, and the sources are identified with different colors. (d) Example of the MSE pairwise comparison between station 702327R, Île Charron, at UTC-5 and three neighboring sites located in Quebec already set to UTC (symbols). MSE values are given on a logarithmic scale and the corresponding hour-lag difference in the color filling. (e) Example of a site with two different LSTs (station 7041166, Grand lac des Îles, Quebec; EC data). The original recording time (LST; blue) and the corrected one (UTC; red) are indicated. The gray area shows the difference in hours between the LST (either AST or EST) and UTC. The orange line indicates the changepoint from AST to EST.
EC is the primary data source, originally comprising more than 400 sites; after the minimum length constraint, the number was reduced to 343 land stations distributed across the east coast of Canada (see Table 1), encompassing the provinces of New Brunswick (40 sites), Newfoundland (48) and Labrador (16), Nova Scotia (66), Prince Edward Island (19), and Quebec (154). The data have been gathered from HLY01 (hourly weather) and HLY15 (wind) ASCII individual files. These sites have been, to various degrees, previously quality controlled in both real-time and delayed mode by Environment Canada (MSC 2013). The files were acquired in subsequent batches in May 2008, February 2009, and March 2009. The series span from 1 January 1953, with 44 sites available, to 4 March 2009, with 193 sites available (see Fig. 1c).
Table 1. The regions (first two columns), the number of sites per region from each of the three data-providing institutions (columns 3–5), and the total number of sites (last row). Numbers in parentheses correspond to sites in each country.
A database of this spatiotemporal extent draws on a great variety of anemometer types and averaging methods through time (Table 2), both automatic and manually operated, which is a source of potential data issues. The most widely used anemometer types are the U2A (HLY01 and HLY15) and 45B (HLY15) for manned stations, and in recent times the 78D digital automatic systems that incorporate U2A equipment. The original measurements for U2A and 45B are 2-min averages ending at the time of recording and have been reported to the nearest nautical mile (1.852 km) per hour since 1996. Prior to that date, 1-min averages to the nearest land mile (1.609 km) per hour were used. The 78D system, in turn, provides averages ranging between 2 and 10 min (Richards and Abuamer 2007; Wan et al. 2010). All the data have been provided in kilometers per hour. The direction has been recorded at 8 (HLY15), 16 (HLY01), or 36 (HLY01, HLY15) points of the compass, with the transition from 16 to 36 points taking place at the end of 1970 (Environment and Climate Change Canada 2017). The records with 36 points of the compass are provided to the closest decagrade (0–36), while those given in 8 (16) points store their measurements in alternate intervals of four or five (two or three) decagrades (MSC 2013). The standard measuring height should follow, in theory, the international convention of 10 m (WMO 1950, 1969, 1983, 2008; MSC 2013). However, in practice many sites have experienced changes through time, particularly in the 1950s and 1960s, when it was not rare to install the instrumentation on rooftops to attain better exposure (Klink 1999; Wan et al. 2010). Only after the 1970s can the heights be assumed, with some confidence, to be at the standard 10 m [Wan et al. 2010; see Table 3; for more information see Part II, section 4b(1)].
Table 2. List of known anemometer models and types, wind speed operating ranges, and length of the measurement recording period for each source institution. The transition from 16 to 36 points of the compass for U2A anemometers was done from December 1970 to January 1971.
Table 3. Number of anemometers per site and height (m) for each institution. For DFO buoys, the hull type is also provided. Heights with an asterisk are nominal; the known height range is given between brackets.
The records are given in local standard time (LST), which usually follows province boundaries (Fig. 1b): the eastern time zone (ETZ) at coordinated universal time − 5 h [UTC-5; eastern standard time (EST), red], the Atlantic time zone (ATZ) at UTC-4 [Atlantic standard time (AST), orange], and the Newfoundland time zone (NTZ, purple) at UTC-3.5, although there the observations were made at 30 min past the hour, thus effectively at AST. The data have been archived at hourly time resolution in most cases, although a few sites reported data at 3-hourly resolution [1960s–1980s; e.g., sites 704470 (Manicouagan) and 8200640 (Canso)] or at synoptic intervals [until the 1960s; e.g., 7043000 (Harrington Harbour) and 8401000 (Cape Race)], and some even only during daylight hours [e.g., 7052605 (Gaspe) and 705C2G9 (Îles de la Madeleine)].
The DFO dataset, archived by the Fisheries and Oceans Canada Integrated Science Data Management (ISDM) division (http://www.meds-sdmm.dfo-mpo.gc.ca/isdm-gdsi/waves-vagues/index-eng.htm), consisted originally of 22 moored weather buoys from Environment Canada covering the east coast of Canada (10) and the Canadian Great Lakes (12) that, after the first step of the compilation (section 3a), resulted in 40 fixed positions (Table 1). The meteorological raw data were gathered from individual CSV files and had not received any quality control (Thomas and Swail 2011). The files had a single flag that applied to wave data and sometimes indicated whether the buoys were at the right position. The data corresponding to buoys flagged as being off position, adrift, or in dock under repair, or recorded at the end of each measuring season, have not been considered. The data were accessed during June 2008, and consequently the time span ranges from 2 December 1988 to 25 June 2008 (see Fig. 1c). The data are provided at different heights depending on the hull type of the buoy (Table 3). Some time series have been recorded using different hulls and thus at different heights. In buoys with two anemometers, the highest one is usually considered the primary source of information, or primary channel, while the second one is used as a backup when the first one is faulty. The historical MSC buoy status reports (available on the aforementioned ISDM website) give information on the channel used for data transmission, which corresponds to the highest anemometer by default. The time series of each buoy has been constructed by combining the information of both channels, either by choosing the transmitted channel or by visually rejecting the channel with erroneous data when the metadata were not available (e.g., before 22 June 1997). The periods when the sensors were unserviceable are also indicated in the metadata and were removed from our series. The records are 10-min-average samples ending at the time of their recording. Most of the measurements have been performed with R. M. Young anemometers, although since 2007 Vaisala WS425 ultrasonic anemometers have also been installed in secondary positions (Table 2; Thomas et al. 2005; Thomas and Swail 2011), with wind speed and direction recorded in meters per second and degrees, respectively. The data were provided in UTC, mostly at hourly resolution but also at 3-hourly resolution [e.g., 44139 (Banquereau Bank)], rounded to the closest hour upon collection. The reported time in the CSV files corresponds to the end of the wave sample, which for east coast buoys (including the Gulf of St. Lawrence) occurs 45 min before the end of the meteorological sample. The reporting times were delayed accordingly to match the meteorological measurements. The reported times for the Great Lakes are given at the end of the meteorological measurements and were left as provided (AXYS Environmental Consulting Ltd. 1996; M. Ouellet 2015, personal communication).
NCAR provided 143 additional series. From an original set of ~700 NCAR sites located in the region, only those longer than a year and located farther than 0.05° from any EC station were chosen, both to improve the density of sites across eastern Canada and to introduce some information across the southern part of our area of interest. Ninety-one new sites across eastern Canada, involving stations in Nova Scotia (2), Nunavut (2), Ontario (78), and Quebec (9), and 52 sites across adjacent lands in the United States, including the states of Maine (14), Massachusetts (8), New Hampshire (9), New York (18), and Vermont (3), were added (Fig. 1a; Table 1). The dataset combines data from synoptic observations (SYNOP), aviation routine weather reports (METARs), Automated Weather Observing Systems (AWOS), and Automated Surface Observing Systems (ASOS), transmitted by the Global Telecommunication System (GTS) and stored in the ds464.0 [office note (ON) 124 format] and ds461.0 (WMO BUFR format) databases. The data were downloaded on 1 January 2010. The series span from 1 January 1978 to 31 December 2009 (see Fig. 1c). Following the recommendations from NCAR, only ds461.0 was used from April 2000 onward. There is no evidence of any QC process applied by NCEP to either ds461.0 or ds464.0 land surface wind data. The sampling resolution varies from 1–2 to 10 min before the hour. The data were recorded in knots for wind speed and in degrees with a resolution of 36 compass points for wind direction, and were provided at UTC mainly at hourly, 3-hourly, and synoptic resolution, rounded to the closest hour during collection.
As a compilation of both Canadian and U.S. sites, the measurements have been carried out with a great variety of anemometers and sampling techniques. For example, the anemometers at the sites located in Canada are likely to be of the 45B/U2A type, while the ASOS sites in the United States are equipped with the Belfort F420 series (Table 2; Nadolski 1998), although they have been transitioning to the Vaisala NWS 425 ice-free wind sensor (IFWS) ultrasonic anemometers since late 2005 (NOAA 2003; Schmitt IV 2009). The anemometer heights at these sites, although theoretically at 10 m (WMO 1969, 1983, 2008), in reality may have varied considerably (Wieringa 1980; Klink 1999; Pryor et al. 2009), and only for ASOS data from the mid-1990s onward can a 10-m height be assumed with certain reliability (Table 3; Nadolski 1998).
3. Quality control methodology
The QC that has been applied is structured into six phases that deal with the detection of various issues in data quality (numbered in Fig. 2): 1) compilation; 2) duplication errors; 3) physical consistency in the ranges of recorded values; 4) temporal consistency, regarding abnormally high/low variability in the time series; 5) detection of long-term biases; and 6) removal of isolated records. The first three phases deal with issues often related to data recording and management. The issues discussed in the compilation phase are divided into two steps. The first one is related to the way the information is stored in the different datasets. The second step is related to issues that arise at the moment of compiling data from different sources and involves the unification of criteria owing to different institutional practices. The same applies to the consistency in values phase regarding redefinitions such as those of true north and calms. The duplication errors and consistency in values phases are mostly related to data management issues, although instrumental faults can also influence the latter. The last three phases (phases 4–6) deal with measurement errors related to instrumental problems, like untrustworthy performance, calibration, siting, changes in the exposure of the surrounding environment, or others. This manuscript describes the issues related to data management (phases 1–3 in Part I, Fig. 2), while measurement errors will be addressed in Part II (phases 4–6 in Fig. 2).
Fig. 2. Diagram describing the six phases of the QC process. Magenta (green) highlights checks that are applied only to wind speed (wind direction). Blue indicates tests applied to both variables. This paper deals only with issues associated with data management (first three phases); the measurement errors are described in Part II.
The QC process follows a sequential structure designed to minimize potential overlap among the various phases. Most of the checks are common to both wind speed and direction, although some specifically address only one of the variables. The steps outlined in this manuscript (Part I) are designed to remove all the data regarded as erroneous after each phase, where the elimination of a speed or direction record implies the loss of the pair of both variables. In Part II, however, the erroneous records will only be flagged, without further removal. Some of the procedures, as discussed in the introduction of each test within these papers, are to some extent based upon those developed in Jiménez et al. (2010b). However, improvements to them and many new steps have also been introduced herein. This section describes the first three phases, while the presentation of results and the illustration of specific cases will be addressed in the next section. Likewise, section 3 in Part II will deal with the last three phases in Fig. 2.
a. Phase 1: Compilation
The compilation phase (phase 1 in Fig. 2) is divided into two steps. In the first step a series of procedures was independently applied to each separate data source in order to detect errors introduced during the data transcription or collection process (see typographical error checks, section 1). Tests were run to detect and correct measurements out of chronological order (Guttman 2002; Graybeal et al. 2004) and dates that have been entered/stored more than once (Guttman 2002). From February 2001 to July 2002, the ds461.0 dataset stored wind speed data erroneously, an issue that is extensively reported on its documentation page. Wind speed records from raw SYNOP reports, received via the GTS, were incorrectly converted when the ADP BUFR records and files were created: numerous raw SYNOP wind speed reports were assumed to be in units of meters per second when they were actually in knots. This error spread over various sites, and the data were patched with a corrected batch.
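As an illustration, the repeated-date handling (described in section 4a: keep a single copy when the repeated entries are identical, set the date to missing when they conflict) could be sketched as follows; the single-station data frame layout and column names are assumptions for illustration:

```python
# Sketch of the chronological-order and repeated-date checks for one station.
import numpy as np
import pandas as pd

def fix_dates(df: pd.DataFrame) -> pd.DataFrame:
    """df: columns 'time', 'speed', 'direction' (illustrative names)."""
    df = df.sort_values("time")                                # chronological order
    # Dates repeated with identical observations: keep a single copy.
    df = df.drop_duplicates(subset=["time", "speed", "direction"])
    # Dates still repeated carry conflicting observations: set them to missing.
    conflict = df.duplicated(subset="time", keep=False)
    df.loc[conflict, ["speed", "direction"]] = np.nan
    return df.drop_duplicates(subset="time")
```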
Additionally, station displacements have also been taken into account in all datasets. The DFO buoys suffer changes in their moored position from time to time. Each buoy time series has been split into several parts, each one corresponding to a period of stable position after a displacement took place. Some NCAR sites also show displacements (Vautard et al. 2010), albeit for different reasons. This happens because the code identifiers of the stations are reused each time a station ceases to exist or is moved (e.g., to a different site within the same airport), which leads to cases where a code may combine data from different locations through time. To identify when a relocation could appreciably affect the wind behavior, the percentage of change of mean wind speeds was calculated for subsequent periods before and after each location changepoint reported in the data files. The changes were compared to those experienced at randomly selected dates. This allowed for identifying the range of change that can be ascribed to natural variability and for flagging the shifts that produced comparably excessive changes.
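The relocation check can be sketched as below: the percentage change in mean wind speed across a reported displacement is compared with changes computed at randomly selected dates, which sample the natural variability. Window length, sample size, and the percentile threshold are illustrative assumptions rather than the published configuration:

```python
# Sketch of the relocation screening on an hourly wind speed array.
import numpy as np

def pct_change(speed, idx, window):
    """Percentage change in mean speed between the windows before and after idx."""
    before = np.nanmean(speed[max(0, idx - window):idx])
    after = np.nanmean(speed[idx:idx + window])
    return 100.0 * abs(after - before) / max(before, 1e-6)

def relocation_suspect(speed, change_idx, window=8760, n_rand=500, seed=0):
    """speed: hourly numpy array long enough to hold the comparison windows."""
    rng = np.random.default_rng(seed)
    observed = pct_change(speed, change_idx, window)
    random_idx = rng.integers(window, len(speed) - window, size=n_rand)
    null = np.array([pct_change(speed, i, window) for i in random_idx])
    # Shifts beyond the range attributable to natural variability are flagged.
    return observed > np.nanpercentile(null, 95)
```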
The second step of the compilation phase deals with the standardization of the diversity of measurement units, formats, and dates described in section 2 into a common frame (e.g., Haylock et al. 2008; Durre et al. 2010). Wind speed has been set to meters per second for all datasets and wind direction to degrees. The recording time of all the sites has been set to UTC with the help of a metadata file provided by EC that contained the LST of all the EC stations. The information contained in the metadata has been independently validated through a pairwise comparison between each EC (target) site and its neighbors via the mean square error (MSE). The comparison was carried out by shifting the target site 5 h forward and backward with respect to its pair and looking for the time lag with minimum MSE, a procedure similar to that followed by Haylock et al. (2008). Section 4a describes the data that have been modified at this stage.
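The lag validation can be sketched as below; for a station recorded at UTC-5, the minimum-MSE lag against a neighbor already set to UTC should be +5 h, consistent with the example in Fig. 1d (function names are illustrative):

```python
# Sketch of the pairwise MSE validation of the recording times.
import numpy as np
import pandas as pd

def best_lag(target: pd.Series, neighbor: pd.Series, max_lag: int = 5) -> int:
    """Hour lag applied to the target that minimizes the MSE against the
    neighbor; both series are wind speeds indexed by datetime."""
    mse = {}
    for lag in range(-max_lag, max_lag + 1):
        shifted = target.copy()
        shifted.index = shifted.index + pd.Timedelta(hours=lag)
        pair = pd.concat([shifted, neighbor], axis=1, join="inner",
                         keys=["t", "n"]).dropna()
        if len(pair):
            mse[lag] = float(np.mean((pair["t"] - pair["n"]) ** 2))
    return min(mse, key=mse.get)  # assumes some temporal overlap exists
```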
b. Phase 2: Duplication errors
The tests performed during this second phase (Fig. 2) identify periods of data that might have been accidentally duplicated during data retrieval, transmission, and archival (Kunkel et al. 1998; Durre et al. 2010; Jiménez et al. 2010b; Lawrimore et al. 2011; Dunn et al. 2016). These errors can take place within the same series (intrasite duplications) or arise from the accidental transfer of data from one series to another (intersite duplications). The checks have been applied first within each single time series and then to target intersite duplications. Both cases are handled in a similar manner.
The initial phase of the test locates any data chain, of any length, that is repeated in any other period within the same time series. For intersite duplications, the detection is done for chains that are repeated between site pairs and at any time. The intersite process is conducted systematically by comparing each site with every other site in the database; that is, for N sites the process is repeated N(N − 1)/2 times.
All the flagged repeated chains are subjected to a final inspection before any corrective decision is taken. The duplicated data chains from each of the two different time intervals at a given site (intrasite case) or from each of the two sites involved (intersite case) are compared with data from neighboring stations via the Pearson correlation coefficient whenever this is possible. For wind direction sequences, directional statistics are applied to the correlation (Mardia and Jupp 2009). If this comparison provides enough evidence, the correct data interval is identified and preserved, and the erroneous one erased. Otherwise, both data intervals are removed. For intersite duplications the comparison has been extended to other time intervals and neighboring sites to identify whether the flagged repetitions can be attributed to a meteorological/natural origin.
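Both ingredients of this phase can be sketched as follows: a search for repeated chains within one series (here a brute-force version with the minimum chain length of three values used in section 4b; the operational implementation relied on more efficient bookkeeping) and a circular correlation coefficient for comparing wind direction sequences (of the Jammalamadaka–Sarma type covered in Mardia and Jupp 2009):

```python
import numpy as np

def repeated_chains(values, min_len=3):
    """Start-index pairs (i, j) of identical chains of length min_len within one
    series; in practice these seeds are grown into maximal duplicated chains."""
    n = len(values)
    hits = []
    for i in range(n - min_len + 1):
        chain = values[i:i + min_len]
        if np.any(np.isnan(chain)):
            continue
        for j in range(i + min_len, n - min_len + 1):
            if np.array_equal(chain, values[j:j + min_len]):
                hits.append((i, j))
    return hits

def circular_corr(a_deg, b_deg):
    """Circular correlation between two wind direction samples (degrees)."""
    a, b = np.deg2rad(a_deg), np.deg2rad(b_deg)
    abar = np.arctan2(np.sin(a).mean(), np.cos(a).mean())  # circular means
    bbar = np.arctan2(np.sin(b).mean(), np.cos(b).mean())
    num = np.sum(np.sin(a - abar) * np.sin(b - bbar))
    den = np.sqrt(np.sum(np.sin(a - abar) ** 2) * np.sum(np.sin(b - bbar) ** 2))
    return num / den
```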
c. Phase 3: Consistency in values
The purpose of phase 3 (Fig. 2) is twofold: 1) to unify the criteria to consistently define calm and true north values in the database, and 2) to identify unrealistic observations within each time series.
The original data sources did not use a common criterion for wind direction either in calm situations (wind speed = 0) or in true north conditions, when the wind speed is different from zero. Therefore, wind direction has herein been set to match the criteria established in DeGaetano (1997): 0° for calm cases and 360° for true north cases.
Unrealistic measurements are those that fall outside of some defined recording range. The range can be derived from statistics calculated at different time scales, from extreme events based on historical records (e.g., Graybeal et al. 2004; Dunn et al. 2016), or from the limits given by the specifications of the sensor (e.g., Meek and Hatfield 1994). In our case the limits are intended to be consistent with the limited metadata information of the instruments used in the observational networks (Table 2). Wind direction records that fall beyond the compass range of 0°–360° and wind speed records that exceed the operational ranges of the instruments are regarded as unrealistic and removed.
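A combined sketch of this phase is given below; the 0°–360° compass window follows section 4c, while the single wind speed ceiling of 100 m s−1 (section 5) is an illustrative stand-in for the instrument-dependent operational limits of Table 2:

```python
import numpy as np

def consistency_in_values(speed, direction, max_speed=100.0):
    """Recode calms/true north and remove unrealistic pairs (illustrative)."""
    speed = np.asarray(speed, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = np.where(speed == 0.0, 0.0, direction)        # calms -> 0 deg
    direction = np.where((speed > 0.0) & (direction == 0.0),  # true north -> 360 deg
                         360.0, direction)
    bad = ((speed < 0.0) | (speed > max_speed) |
           (direction < 0.0) | (direction > 360.0))
    speed[bad] = np.nan                                       # a bad value removes
    direction[bad] = np.nan                                   # the whole pair
    return speed, direction
```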
4. Results
This section reports on the results of the first three phases of the QC process by showing the spatial and temporal distributions of each error type and illustrating them with specific examples. A schematic description of each test is listed in Table 4. The number of affected records in each phase is presented in Table 5. The numbers in columns 2 and 3 correspond strictly to the data affected in either wind speed or wind direction, and the percentages are given with respect to the initial amount of data (53 956 328 records). The totals in column 4 refer to the affected wind speed and wind direction pairs (107 912 656 records), since the elimination of a speed or direction record implies herein the loss of the pair of both variables.
Table 4. Summary of the procedures carried out in Part I. The meaning of the abbreviations/symbols is given at the bottom of the table.
Table 5. Quantity of affected data during each phase of Part I of the QC (Fig. 2) for wind speed and wind direction, and in total. The corresponding percentage, in parentheses, is given with reference to the initial number of records (53 956 328 wind speed/direction records). The steps regarding data transcription and collection (phase 1) and the records that have been modified instead of erased (e.g., true north in section 3c) are marked with an asterisk and are not taken into account to calculate the total of deleted data in the last row. For the first phase of the QC, where the modifications affect the data of specific institutions, the name of the institution is indicated.
a. Phase 1: Compilation
The checks applied are aimed at detecting changes in the time sequence of data and repetitions of record entries for a given time step (section 3a). The compiled series did not show any cases of measurements out of chronological order. However, many repeated record entries for given dates were detected, affecting exclusively the NCAR dataset. According to their documentation page, these duplications may happen, for instance, when station METARs fall on the same time as SYNOP reports and are archived twice. Figure 3a shows the spatial distribution of the affected stations, 116 out of 143, some with up to nearly 8000 repeated entries, totaling 258 321 records. However, only 2261 of these cases, belonging to 43 sites, involved entries with differing wind speed or direction values (Fig. 3a, color bar). For dates with several entries containing the same observations, only one entry was kept. For those presenting different observations, the date was set as missing.
Fig. 3. (a) Spatial distribution of NCAR stations with repeated date entries; repeated entries with the same data are in white, and the color bar indicates those with differing records. (b) Spatial distribution of NCAR sites with modified data as a result of decoding errors (regular triangles) and removed data as a result of internal displacements (inverted triangles). The color bar indicates the amount of affected data. (c) Wind speed decoding issue (red) and correction (blue) for site 71621 located in Trenton. (d) Percentages of change in mean wind speed before and after each location change indicated in the NCAR dataset vs the distance of the displacement: changes at real, acknowledged shifts (red dots); values estimated from randomly selected dates in wind speed series for which no shift was reported (blue dots); and the range of values attributable to natural variability (gray line). (e) Wind speed measurements at hourly (red) and monthly (black) resolution for site 71432 located in Port Weller, highlighted in (d). The date of occurrence of the shift is marked (blue bar).
Additionally, a unit conversion issue related to the decoding/encoding of the ds461.0 wind speed data was also amended. The data were erroneously stored in meters per second when they were actually in knots (section 3a). This error affected 41 of our sites, with a total of 1 003 991 patched records, shown in Fig. 3b with regular triangles. Figure 3c shows an example for a site located in Trenton (Ontario).
Regarding displacements, the DFO moored buoys have been split, constituting 40 independent buoy series of stable positions (section 3a): 23 series for the east coast of Canada (from the initial 10) and 17 for the Canadian Great Lakes (from the initial 12). In the case of the NCAR stations, 30 out of the 143 sites showed relocations. Most of the relocations were by less than 3 km (Fig. 3d) and entailed shifts in mean wind speed below 10% (gray line). This range is comparable to shifts calculated from randomly selected periods with no reported relocations (blue dots), which can be attributed to natural variability. The displacements showing larger ratios, or those that took place over distances above 3 km, were more thoroughly analyzed. For the four cases meeting this condition, the period after the change was removed. In total 84 062 records were erased. The affected sites are shown in Fig. 3b with inverted triangles. Figure 3e shows an example of a displacement of 1.38 km at station 71432, located in Port Weller (Ontario).
Regarding the standardization step, Fig. 1a shows the spatial distribution of the original measurement units for wind speed. All records from EC (provided in kilometers per hour) and from NCAR (provided in knots) were converted to meters per second, the unit in which the DFO data were already supplied. Likewise, the recording times of the EC series, originally given in LST (Fig. 1b), were set to UTC. The metadata-based time conversion was supported by the pairwise MSE comparison described in section 3a; Figs. 1d and 1e show two illustrative examples.
b. Phase 2: Duplication errors
The search for inter- and intrasite duplicated chains has been undertaken for periods of 12 h and longer, so that even the 6-hourly series yield chains of at least three values.
Fig. 4. Absolute frequency distribution of repeated chains according to their length for (a) intrasite repetitions of wind direction (green bars) and (b) intersite repetitions of wind speed (red bars). The chains duplicated at equal/similar dates (blue) are given as percentages with respect to the total number of duplicated chains; chains with percentages greater than 50% (dashed line) are regarded as suspect (shaded area). The repetitions with periods longer than 50 values have been clustered for easier visualization; note the change in scale on the x axis. Spatial distribution of (c) intrasite and (d) intersite repetitions and the involved amount of erroneous data. The contour colors indicate the institution, the regular (inverted) triangles indicate duplications in wind direction (speed), and the fill color represents the amount of duplicated data (on a logarithmic scale). (e) Example of intrasite wind speed duplication for station 8101600 (Fredericton; EC). Red (blue) lines/dots follow the red (blue) timeline. The duplicated chains are highlighted (gray shading). (f) Intersite wind direction duplication involving station CWVY at Villeroy, provided by NCAR (blue), with 7018766 (red) and 701Q009 (orange), both located in Lemieux, provided by EC. Relative distances are illustrated at the bottom of the plot.
A total of 16 (17) sites were affected by erroneous intrasite wind direction (wind speed) duplications, marked as regular (inverted) triangles in Fig. 4c, affecting nine buoys and eight EC land sites and involving a total of 5640 records (~0.01%).
A total of 1689 candidate intersite chains (976 wind direction, 713 wind speed) were flagged for later evaluation. These duplications correspond to only 9 (19) site pairs involving 15 (28) sites for wind direction (speed) out of the approximately 64 000 (38 000) site pairs that share any number of repeated chains. A comparison with neighboring stations and over different periods allowed us to identify the duplications caused by similar meteorological behaviors, which were spared. This is the case, among others, of eight sites located on Prince Edward Island (PEI) with a few sporadic duplications lasting around a day that occurred either simultaneously or with a difference of 1–2 h (not shown). PEI is a territory with a gently rolling landscape whose highest point of land lies at only 152 m above sea level, which favors similar undisrupted wind flows all over the area. After the analysis, only duplications corresponding to four different sites were considered erroneous (Fig. 4d), one from NCAR and three from EC, totaling 138 864 (0.13%) records. The comparison with neighbors allowed us to identify the site that inherited the duplicated data in each case. The longest duplicated period corresponds to a site in Villeroy (Quebec, NCAR; Fig. 4f) that duplicates the wind direction data of two other nearby sites (EC) for almost seven consecutive years. The differing institutional calm definitions (see section 3c) resulted in the detection of fragmented chains instead of a continuous long chain. Duplications in speed were not detected, probably because of successive unit conversions of wind speed before our compilation process, presumably at retrieval by NCAR for the ds464.0/ds461.0 set. These two sites, both in Lemieux (Quebec) and separated by 500 m, are located 17 km from Villeroy. This was the only cross-source duplication we detected, but it is nevertheless a reminder of the care that is needed when merging information from different sources (Dunn et al. 2016).
c. Phase 3: Consistency in values
The new direction criteria consist of assigning 0° to the wind direction when the wind speed is 0 m s−1 (calms) and 360° when the wind blows from true north (section 3c). Figure 5a shows the temporal distribution of the different criteria used by the source institutions prior to this unification.
Fig. 5. (a) Temporal distribution of the different wind direction criteria used for calms (wind speed of 0 m s−1) according to the source institution. (b) Wind speed distribution of the database (blue bars) together with the operational limits of the instruments (dark blue vertical bars), the approximate wind speeds during cyclonic events of tropical origin derived from the Canadian Hurricane Centre (gray bars), and the corresponding WNENA records (red). (c) As in (b), but for extreme winds associated with winter storms.
The removal of unrealistic wind speed records is hampered, as noted in section 3c, by the lack of extensive metadata on the use of the different anemometer types and their variety of operational ranges (Table 2). This makes the establishment of a confident upper limit for wind speed elusive. As can be seen in Fig. 5b, some of these operational instrumental limits (dark blue vertical bars) fall within the tails of the wind speed distribution (blue bars). However, the wind speed records within this range bear physical realism. During the summer season, cyclones of tropical origin (Landsea 2007) induce very high winds over the region. A comparison between the approximate wind speeds during the cyclonic events of 1954–2010, derived from the Canadian Hurricane Centre (CHC, gray bars), and those recorded by WNENA (red) is also shown in Fig. 5b. The information about the cyclonic events and their approximate wind speeds has been constructed from the storm-track images and the complementary information provided by the tropical cyclone season summaries (http://www.ec.gc.ca/ouragans-hurricanes/default.asp?lang=en&n=23B1454D-1). The midlatitude storms are an even larger contributor to extreme winds, with wind speeds that can match or exceed hurricane intensity (Richards and Abuamer 2007). These storms usually occur during winter and are responsible for the majority of the extreme winds in our area of interest, as shown in Fig. 5c. Data from the site located on Mount Washington (New Hampshire), with a mean of ~15 m s−1, further illustrate that very high wind speed records are not necessarily erroneous (the unrealistic values that were nevertheless detected at this site are shown in Fig. 6b).
Fig. 6. (a) Spatial distribution of stations with detected unrealistic wind speed (inverted triangles) and direction (regular triangles) values. Examples of unrealistic (b) wind speed (72613, Mount Washington; NCAR) and (c) direction (71624, Toronto Pearson International Airport; NCAR) records are provided in the insets.
Regarding wind direction, 181 unrealistic records (<0.01%, Table 5) corresponding to 14 stations were detected, all belonging to the NCAR dataset (Fig. 6a). An example is shown in Fig. 6c, corresponding to a site located at Toronto Pearson International Airport (Ontario; NCAR), with a record of 990° that probably corresponds to a miscoded missing value.
5. Impact
This section summarizes the extent of the issues related to data management. The impact of the modifications on the statistics of the sites (mean wind speed and direction, standard deviation, kurtosis, and skewness) will be presented in Part II for the whole quality control process. Many sites have undergone profound modifications during the compilation processes described herein. For example, 43 NCAR sites (out of 143) were affected by duplicated date entries whose conflicting records were erased; 41 NCAR sites presented unit conversion issues that had to be corrected; and 4 sites showed relocations with significant changes in the behavior of the time series, which implied the removal of the shorter period in each instance. Figure 7a shows the most relevant issue at each of the affected 70 NCAR sites in terms of the largest amount of modified data. The whole DFO dataset suffered periodically from buoy relocations, which required restructuring the original dataset into a more manageable one composed of static locations. Regarding EC, all the sites had to be transformed from their LST dates to UTC. Finally, the datasets as a whole had to be standardized in their measurement units and in their true north and calm criteria.
Fig. 7. (a) Overview of the procedures involving the largest amount of affected data at each site during the compilation phase of the NCAR dataset. Overview of the errors involving the largest amount of removed data at each site during the duplication errors and consistency in values phases for (b) wind speed and (c) wind direction. The colors indicate the error type that is dominant at each site, and the symbols identify the data source institution (see legends). For the meaning of the abbreviations, refer to Table 5 (phases 1–3, bold text in the first column). (d) Distribution of the percentage of total deleted data at each site after the first three phases of the QC. (e) Wind speed histogram comparing the pre-QC database (red) with the database after the Part I tests (blue).
The duplication errors and consistency in values phases had, with a few exceptions, a lesser impact on the sites than the compilation, but as they address QC issues that are commonly treated in other works, they are presented separately in Fig. 7. Figures 7b and 7c show the error types with the largest implication in terms of deleted data at each site, for both wind speed (Fig. 7b) and direction (Fig. 7c). Only 43 (12%) sites were affected by one or more of the three analyzed error typologies: 29 in the case of wind speed and 34 for wind direction. The most widespread error for wind speed (direction) records is related to intrasite data sequence duplications (purple), which affected 17 (16) sites, half of them during a simultaneous failure that affected nine buoys (stars) over the course of a day. The unrealistic measurements (yellow), which affected exclusively the NCAR dataset (triangles), with nine sites for wind speed (14 for direction), involved few records, in many cases as a result of the miscoding of missing values. The intersite duplications (pink) affected only three sites for wind speed (four for wind direction), but they involved the total suppression of one NCAR site (site CWVY, Fig. 4f).
Figure 7d shows the total accumulated percentage of deleted data. Of the 43 affected sites (out of 526), 24 were barely affected, with less than 0.01% erroneous records, and 17 had percentages ranging from 0.1% to 1%. One NCAR site presented errors in more than 1% of its data, all of them related to unrealistic speeds; another one, the aforementioned site CWVY, had all of its records removed. The impact of the tests on the wind speed distribution can be seen in Fig. 7e, with the distribution before phases 2 and 3 in red and after them in blue. As a result of the application of these first three phases, the maximum wind speed values have been restricted to 100 m s−1.
6. Conclusions
This text describes the first part of a QC procedure designed to identify and correct erroneous records of surface wind speed and direction observations. In this work we describe the first phases of the compilation of a database and the subsequent QC tests focused on data management issues. This database, with almost 54 million simultaneous pairs of wind speed and direction records distributed over 526 sites and spanning almost 60 years, combines observations from three different institutions: EC, DFO, and NCAR.
It is worth noting that some potentially useful data sources were not included in the compilation phase as a result of a lack of awareness of them at the time. For instance, regarding the United States, additional data can be acquired via NOAA’s National Centers for Environmental Information (NCEI; https://www.ncei.noaa.gov). Data from moored buoys, on the other hand, can additionally be retrieved from EC’s ship-format reports, archived in the International Comprehensive Ocean–Atmosphere Data Set (ICOADS; available online at http://icoads.noaa.gov). The data obtained from national climate archive organizations offer the advantage of having been subjected to some level of quality control in delayed mode and are more likely to be accompanied by metadata. Regarding the Canadian stations, although the data are commonly shared in LST, nowadays they can also be acquired in UTC by request. These datasets should spare some of the painstaking steps taken during the compilation, but they might pose new, unknown challenges. Future developments of WNENA will hopefully integrate these additional sources of information.
Phases 2 and 3, which are more general in nature than phase 1, had a lesser impact on the database, as only a small fraction of the records was ultimately deleted (Table 5).
The procedures described herein are focused on the establishment of a manageable, internally consistent, and spatially well-characterized database composed of climatologically relevant sites. For instance, the tests devoted to the chronological sorting and the detection of duplicated dates ensure the temporal coherence that is indispensable for all the subsequent tests applied in both Part I and Part II and for any data analysis in general. The data completeness criteria select sites with climatological value. The procedures that identify internal site displacements ensure that the stored time series do not merge spurious information belonging to different locations. These procedures are subsequently supplemented with the tests devoted to the detection of erroneously duplicated data. Finally, the detection of unrealistic data removes clearly impossible records, in contrast to the flagging process carried out during the detection of improbable measurements in Part II. In general, the issues dealt with in Part I have a comparatively lower impact on the number of affected data and on the wind statistics than those addressed in Part II. However, they are crucial for the phases dealing with measurement errors described in Part II.
Acknowledgments
EELE was supported by the Agreement of Cooperation 4164281 between the UCM and St. Francis Xavier University, and projects CGL2014-59644-R and PCIN-2014-017-C07-06 of the MINECO (Spain). Funding for 4164281 was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC DG 140576948), the Canada Research Chairs Program (CRC 230687), and the Atlantic Innovation Fund (AIF-ACOA). HB holds a Canada Research Chair in Climate Dynamics. JN and JFGR were supported by projects PCIN-2014-017-C07-03, PCIN-2014-017-C07-06, CGL2011-29677-C02-01, and CGL2011-29677-C02-02 of the MINECO (Spain). This research has been conducted under the Joint Research Unit between UCM and CIEMAT, by the Collaboration Agreement 7158/2016. We wish to thank the people of Environment and Climate Change Canada, the Department of Fisheries and Oceans Canada, and the National Center for Atmospheric Research for providing us with the original data used in this study and for their kindness in responding to all the questions that arose during the development of this work and the review process. Special thanks to Gérard Morin and Hui Wan for the metadata from the EC sites; Bruce Bradshaw, Mathieu Ouellet, and Bridget Thomas for information regarding moored buoys; and Douglas Schuster for information regarding the ds461.0 and ds464.0 datasets. We thank J. Álvarez-Solas, A. Hidalgo, and P. A. Jiménez for the helpful discussions. Finally, we would also like to thank the reviewers for the many suggestions and useful information they offered us.
Note: A first version of this database will be made available to the public. The QC procedures in this manuscript have been developed using Linux shell scripting and Fortran programming. Potential users interested in having the code are invited to contact the corresponding author.
REFERENCES
Alexandersson, H., 1986: A homogeneity test applied to precipitation data. Int. J. Climatol., 6, 661–675, https://doi.org/10.1002/joc.3370060607.
AXYS Environmental Consulting Ltd., 1996: Meteorological and oceanographic measurements from Canadian weather buoys. AXYS Tech. Rep., 80 pp.
Barnes, S., 1964: A technique for maximizing details in numerical weather map analysis. J. Appl. Meteor., 3, 396–409, https://doi.org/10.1175/1520-0450(1964)003<0396:ATFMDI>2.0.CO;2.
Begert, M., G. Seiz, T. Schlegel, M. Musa, G. Baudraz, and M. Moesch, 2003: Homogenisierung von Klimamessreihen der Schweiz und Bestimmung der Normwerte 1961-1990. MeteoSchweiz Tech. Rep. 67, 170 pp.
Cheng, C. S., 2014: Evidence from the historical record to support projection of future wind regimes: An application to Canada. Atmos.–Ocean, 52, 232–241, https://doi.org/10.1080/07055900.2014.902803.
Cheng, C. S., G. Li, Q. Li, and H. Auld, 2008: Statistical downscaling of hourly and daily climate scenarios for various meteorological variables in South-central Canada. Theor. Appl. Climatol., 91, 129–147, https://doi.org/10.1007/s00704-007-0302-8.
Cheng, C. S., G. Li, Q. Li, H. Auld, and C. Fu, 2012: Possible impacts of climate change on wind gusts under downscaled future climate conditions over Ontario, Canada. J. Climate, 25, 3390–3408, https://doi.org/10.1175/JCLI-D-11-00198.1.
DeGaetano, A., 1997: A quality-control routine for hourly wind observations. J. Atmos. Oceanic Technol., 14, 308–317, https://doi.org/10.1175/1520-0426(1997)014<0308:AQCRFH>2.0.CO;2.
Dunn, R. J. H., K. M. Willett, D. E. Parker, and L. Mitchell, 2016: Expanding HadISD: Quality-controlled, sub-daily station data from 1931. Geosci. Instrum. Methods Data Syst., 5, 473–491, https://doi.org/10.5194/gi-5-473-2016.
Durre, I., M. J. Menne, B. E. Gleason, T. G. Houston, and R. S. Vose, 2010: Comprehensive automated quality assurance of daily surface observations. J. Appl. Meteor. Climatol., 49, 1615–1633, https://doi.org/10.1175/2010JAMC2375.1.
Eischeid, J. K., C. B. Baker, T. R. Karl, and H. F. Diaz, 1995: The quality control of long-term climatological data using objective data analysis. J. Appl. Meteor., 34, 2787–2795, https://doi.org/10.1175/1520-0450(1995)034<2787:TQCOLT>2.0.CO;2.
Eischeid, J. K., P. Pasteris, H. Diaz, M. Plantico, and N. Lott, 2000: Creating a serially complete, national daily time series of temperature and precipitation for the western United States. J. Appl. Meteor., 39, 1580–1591, https://doi.org/10.1175/1520-0450(2000)039<1580:CASCND>2.0.CO;2.
Environment and Climate Change Canada, 2017: Technical documentation: Digital archive of Canadian climatological data. ECCC, 35 pp., ftp://ftp.tor.ec.gc.ca/Pub/Documentation_Technical/Technical_Documentation.pdf.
Etkin, D., S. E. Brun, A. Shabbar, and P. Joe, 2001: Tornado climatology of Canada revisited: Tornado activity during different phases of ENSO. Int. J. Climatol., 21, 915–938, https://doi.org/10.1002/joc.654.
Fiebrich, C. A., C. R. Morgan, A. G. McCombs, P. K. Hall, and R. A. McPherson, 2010: Quality assurance procedures for mesoscale meteorological data. J. Atmos. Oceanic Technol., 27, 1565–1582, https://doi.org/10.1175/2010JTECHA1433.1.
Gandin, L., 1988: Complex quality control of meteorological observations. Mon. Wea. Rev., 116, 1137–1156, https://doi.org/10.1175/1520-0493(1988)116<1137:CQCOMO>2.0.CO;2.
García-Bustamante, E., J. F. González-Rouco, J. Navarro, E. Xoplaki, P. A. Jiménez, and J. P. Montávez, 2012: North Atlantic atmospheric circulation and surface wind in the Northeast of the Iberian Peninsula: Uncertainty and long term downscaled variability. Climate Dyn., 38, 141–160, https://doi.org/10.1007/s00382-010-0969-x.
González-Rouco, J., J. Jiménez, V. Quesada, and F. Valero, 2001: Quality control and homogeneity of precipitation data in the southwest of Europe. J. Climate, 14, 964–978, https://doi.org/10.1175/1520-0442(2001)014<0964:QCAHOP>2.0.CO;2.
Graybeal, D., 2006: Relationships among daily mean and maximum wind speeds, with application to data quality assurance. Int. J. Climatol., 26, 29–43, https://doi.org/10.1002/joc.1237.
Graybeal, D., A. DeGaetano, and K. Eggleston, 2004: Complex quality assurance of historical hourly surface airways meteorological data. J. Atmos. Oceanic Technol., 21, 1156–1169, https://doi.org/10.1175/1520-0426(2004)021<1156:CQAOHH>2.0.CO;2.
Guttman, N. B., 2002: Digitization of historical daily cooperative network data. Preprints, 13th Conf. on Applied Climatology, Portland, OR, Amer. Meteor. Soc., 2.8, https://ams.confex.com/ams/13ac10av/techprogram/paper_38849.htm.
Hart, R. E., and J. L. Evans, 2001: A climatology of the extratropical transition of Atlantic tropical cyclones. J. Climate, 14, 546–564, https://doi.org/10.1175/1520-0442(2001)014<0546:ACOTET>2.0.CO;2.
Haylock, M., N. Hofstra, A. Klein Tank, E. Klok, P. Jones, and M. New, 2008: A European daily high-resolution gridded data set of surface temperature and precipitation for 1950–2006. J. Geophys. Res., 113, D20119, https://doi.org/10.1029/2008JD010201.
Hubbard, K., S. Goddard, W. Sorensen, N. Wells, and T. Osugi, 2005: Performance of quality assurance procedures for an applied climate information system. J. Atmos. Oceanic Technol., 22, 105–112, https://doi.org/10.1175/JTECH-1657.1.
Hughes, L., 2007: Energy security in Nova Scotia. Canadian Center for Policy Alternatives Rep., 74 pp.
Hughes, L., and N. Chaudhry, 2011: The challenge of meeting Canada’s greenhouse gas reduction targets. Energy Policy, 39, 1352–1362, https://doi.org/10.1016/j.enpol.2010.12.007.
Hughes, L., M. Dhaliwal, A. Long, and N. Sheth, 2006: A study of wind energy use for space heating in Prince Edward Island. Proc. Second Int. Green Energy Conf. (IGEC-2), Oshawa, ON, Canada, International Association for Green Energy, 322–332.
Jiménez, P., J. González-Rouco, E. García-Bustamante, J. Navarro, J. Montávez, J. de Arellano, J. Dudhia, and A. Muñoz-Roldan, 2010a: Surface wind regionalization over complex terrain: Evaluation and analysis of a high-resolution WRF simulation. J. Appl. Meteor. Climatol., 49, 268–287, https://doi.org/10.1175/2009JAMC2175.1.
Jiménez, P., J. González-Rouco, J. Navarro, J. Montávez, and E. García-Bustamante, 2010b: Quality assurance of surface wind observations from automated weather stations. J. Atmos. Oceanic Technol., 27, 1101–1122, https://doi.org/10.1175/2010JTECHA1404.1.
Klink, K., 1999: Climatological mean and interannual variance of United States surface wind speed, direction and velocity. Int. J. Climatol., 19, 471–488, https://doi.org/10.1002/(SICI)1097-0088(199904)19:5<471::AID-JOC367>3.0.CO;2-X.
Krause, P. F., and K. L. Flood, 1997: Weather and climate extremes. U.S. Army Corps of Engineers Tech. Rep. TEC-0099, 89 pp.
Kunkel, K. E., and Coauthors, 1998: An expanded digital daily database for climatic resources applications in the midwestern United States. Bull. Amer. Meteor. Soc., 79, 1357–1366, https://doi.org/10.1175/1520-0477(1998)079<1357:AEDDDF>2.0.CO;2.
Landsea, C. W., 2007: Counting Atlantic tropical cyclones back to 1900. Eos, Trans. Amer. Geophys. Union, 88, 197–202, https://doi.org/10.1029/2007EO180001.
Lanzante, J. R., S. A. Klein, and D. J. Seidel, 2003: Temporal homogenization of monthly radiosonde temperature data. Part II: Trends, sensitivities, and MSU comparison. J. Climate, 16, 241–262, https://doi.org/10.1175/1520-0442(2003)016<0241:THOMRT>2.0.CO;2.
Lawrimore, J., M. Menne, B. Gleason, C. Williams, D. Wuertz, R. Vose, and J. Rennie, 2011: An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3. J. Geophys. Res., 116, 1–18, https://doi.org/10.1029/2011JD016187.
Lucio-Eceiza, E. E., J. F. González-Rouco, J. Navarro, H. Beltrami, and J. Conte, 2017: Quality control of surface wind observations in northeastern North America. Part II: Measurement errors. J. Atmos. Oceanic Technol., https://doi.org/10.1175/JTECH-D-16-0205.1, in press.
Mardia, K. V., and P. E. Jupp, 2009: Directional Statistics. Wiley Series in Probability and Statistics, Vol. 494, John Wiley & Sons, 456 pp., https://doi.org/10.1002/9780470316979.
Martinez, Y., W. Yu, and H. Lin, 2013: A new statistical–dynamical downscaling procedure based on EOF analysis for regional time series generation. J. Appl. Meteor. Climatol., 52, 935–952, https://doi.org/10.1175/JAMC-D-11-065.1.
Meek, D., and J. Hatfield, 1994: Data quality checking for single station meteorological databases. Agric. For. Meteor., 69, 85–109, https://doi.org/10.1016/0168-1923(94)90083-3.
MSC, 2013: MANOBS: Manual of surface weather observations. 7th ed. Amendment 18, Meteorological Service of Canada Tech. Rep. En56-238/2-2012E-PDF, 488 pp.
Nadolski, V., 1998: Automated Surface Observing System (ASOS) user’s guide. NOAA, Department of Defense, Federal Aviation Administration, U.S. Navy Tech. Rep., 67 pp.
Najac, J., J. Boé, and L. Terray, 2009: A multi-model ensemble approach for assessment of climate change impact on surface winds in France. Climate Dyn., 32, 615–634, https://doi.org/10.1007/s00382-008-0440-4.
NCEP ADP OGSO, 1980: NCEP ADP operational global surface observations. National Center for Atmospheric Research Computational and Information Systems Laboratory Research Data Archive. Subset: February 1975–February 2007, accessed 1 January 2010, http://rda.ucar.edu/datasets/ds464.0/.
NCEP ADP OGSO, 2004: NCEP ADP global surface observational weather data, continuing from October 1999. National Center for Atmospheric Research Computational and Information Systems Laboratory Research Data Archive, accessed 1 January 2010, http://rda.ucar.edu/datasets/ds461.0/.
Newark, M. J., 1981: Tornadoes in Canada for the period 1950 to 1979. Atomic Energy Control Board Research Rep., 88 pp.
NOAA, 2003: ASOS product improvement implementation plan (addendum III) for ice free wind. NOAA Tech. Rep., 76 pp.
Plante, M., S.-W. Son, E. Atallah, J. Gyakum, and K. Grise, 2015: Extratropical cyclone climatology across eastern Canada. Int. J. Climatol., 35, 2759–2776, https://doi.org/10.1002/joc.4170.
Pryor, S. C., and R. J. Barthelmie, 2014: Hybrid downscaling of wind climates over the eastern USA. Environ. Res. Lett., 9, 024013, https://doi.org/10.1088/1748-9326/9/2/024013.
Pryor, S. C., and Coauthors, 2009: Wind speed trends over the contiguous United States. J. Geophys. Res., 114, D14105, https://doi.org/10.1029/2008JD011416.
Richards, W. G., and Y. Abuamer, 2007: Atmospheric hazards: Extreme wind gust climatology in Atlantic Canada 1955–2000. Meteorological Service of Canada Science Report Series EN57-36/2007-1E-PDF, 47 pp.
Schmitt, C. V., IV, 2009: A quality control algorithm for the ASOS ice free wind sensor. 13th Conf. on Integrated Observing and Assimilation Systems for Atmosphere, Oceans, and Land Surface, Phoenix, AZ, Amer. Meteor. Soc., 12A.3, https://ams.confex.com/ams/89annual/webprogram/Paper145755.html.
Shafer, M., C. Fiebrich, D. Arndt, S. Fredrickson, and T. Hughes, 2000: Quality assurance procedures in the Oklahoma Mesonetwork. J. Atmos. Oceanic Technol., 17, 474–494, https://doi.org/10.1175/1520-0426(2000)017<0474:QAPITO>2.0.CO;2.
Steinacker, R., D. Mayer, and A. Steiner, 2011: Data quality control based on self-consistency. Mon. Wea. Rev., 139, 3974–3991, https://doi.org/10.1175/MWR-D-10-05024.1.
Thomas, B. R., and V. R. Swail, 2011: Buoy wind inhomogeneities related to averaging method and anemometer type: Application to long time series. Int. J. Climatol., 31, 1040–1055, https://doi.org/10.1002/joc.2339.
Thomas, B. R., E. Kent, and V. R. Swail, 2005: Methods to homogenize wind speeds from ships and buoys. Int. J. Climatol., 25, 979–995, https://doi.org/10.1002/joc.1176.
Vautard, R., J. Cattiaux, P. Yiou, J.-N. Thépaut, and P. Ciais, 2010: Northern Hemisphere atmospheric stilling partly attributed to an increase in surface roughness. Nat. Geosci., 3, 756–761, https://doi.org/10.1038/ngeo979.
Wade, C. G., 1987: A quality control program for surface mesometeorological data. J. Atmos. Oceanic Technol., 4, 435–453, https://doi.org/10.1175/1520-0426(1987)004<0435:AQCPFS>2.0.CO;2.
Wan, H., X. L. Wang, and V. R. Swail, 2007: A quality assurance system for Canadian hourly pressure data. J. Appl. Meteor. Climatol., 46, 1804–1817, https://doi.org/10.1175/2007JAMC1484.1.
Wan, H., X. L. Wang, and V. R. Swail, 2010: Homogenization and trend analysis of Canadian near-surface wind speeds. J. Climate, 23, 1209–1225, https://doi.org/10.1175/2009JCLI3200.1.
Wieringa, J., 1980: Representativeness of wind observations at airports. Bull. Amer. Meteor. Soc., 61, 962–971, https://doi.org/10.1175/1520-0477(1980)061<0962:ROWOAA>2.0.CO;2.
WMO, 1950: Provisional guide to international meteorological instrument and observing practice. World Meteorological Organization Tech. Rep. WMO-8, 422 pp.
WMO, 1969: Measurement of surface wind. Guide to meteorological instruments and methods of observation, 3rd ed. Secretariat of the World Meteorological Organization Tech. Rep. WMO-8, 10 pp.
WMO, 1983: Measurement of surface wind. Guide to meteorological instruments and methods of observation, 5th ed. Secretariat of the World Meteorological Organization Tech. Rep. WMO-8, 14 pp.
WMO, 2008: Guide to meteorological instruments and methods of observation. 7th ed. World Meteorological Organization Tech. Rep. WMO-8, 716 pp.
Woodruff, S. D., and Coauthors, 2011: ICOADS Release 2.5: Extensions and enhancements to the surface marine meteorological archive. Int. J. Climatol., 31, 951–967, https://doi.org/10.1002/joc.2103.