The North American Soil Moisture Database (NASMD) was initiated in 2011 to provide support for developing climate forecasting tools, calibrating land surface models, and validating satellite-derived soil moisture algorithms. The NASMD has collected data from over 30 soil moisture observation networks providing millions of in situ soil moisture observations in all 50 states, as well as Canada and Mexico. It is recognized that the quality of measured soil moisture in NASMD is highly variable because of the diversity of climatological conditions, land cover, soil texture, and topographies of the stations, and differences in measurement devices (e.g., sensors) and installation. It is also recognized that error, inaccuracy, and imprecision in the data can have significant impacts on practical operations and scientific studies. Therefore, developing an appropriate quality control procedure is essential to ensure that the data are of the best quality. In this study, an automated quality control approach is developed using the North American Land Data Assimilation System, phase 2 (NLDAS-2), Noah soil porosity, soil temperature, and fraction of liquid and total soil moisture to flag erroneous and/or spurious measurements. Overall results show that this approach is able to flag unreasonable values when the soil is partially frozen. A validation example using NLDAS-2 multiple model soil moisture products at the 20-cm soil layer showed that the quality control procedure had a significant positive impact in Alabama, North Carolina, and west Texas. It had a greater impact in colder regions, particularly during spring and autumn. A total of 433 NASMD stations have been quality controlled using the methodology proposed in this study, and the algorithm will be implemented to control data quality from the other ~1200 NASMD stations in the near future.
In situ soil moisture is valuable for validating soil moisture products such as offline land surface models (Robock et al. 2003; Fan et al. 2006; Liu et al. 2011; Meng et al. 2012; Xia et al. 2014) as well as coupled numerical weather and climate prediction models (de Goncalves 2006; De Rosnay et al. 2009; Fan et al. 2011; Su et al. 2013). It also has been widely used for validating remote sensing–based soil moisture products (Zribi et al. 2008; Gruhier et al. 2010; Jackson et al. 2010; Parrens et al. 2012), for monitoring drought (see http://www.cpc.ncep.noaa.gov/products/monitoring_and_data/topsoil.shtml), and for studying the dynamics of soil moisture in space and time (Entin et al. 2000; Famiglietti et al. 2008; Brocca et al. 2010). However, networks that measure in situ soil moisture utilize different measurement instruments (sensors), calibration techniques, and installation methods. Therefore, in situ soil moisture observations in the United States lack harmonization through standardized quality control methods and protocols. The absence of consistent calibration and measurement standards among observation networks makes comparison of data collected by different networks very difficult. To overcome this limitation, the International Soil Moisture Network (ISMN; http://www.ipf.tuwien.ac.at/insitu) was initiated in 2010 to serve as a centralized data hosting facility where globally available in situ soil moisture measurements from operational networks and validation campaigns are collected, harmonized, and made available to users through a single access point (Dorigo et al. 2011). The ISMN includes more than 1400 stations internationally and offers soil moisture data at an hourly time scale. In 2011, the North American Soil Moisture Database (NASMD; http://soilmoisture.tamu.edu/) was initiated (Ford and Quiring 2013a,b; Ford et al. 
2014) as a database for harmonized and quality controlled daily soil moisture that can be used to investigate land–atmosphere interactions, validate the accuracy of soil moisture simulations in global land surface models and satellite remote sensing, and document how soil moisture influences climate on seasonal to interannual time scales. The NASMD includes 32 networks and more than 1600 stations throughout North America and provides data at daily or longer time scales. Observations from networks measuring soil moisture at subdaily scales are aggregated to a daily resolution in the NASMD. As with the ISMN, datasets incorporated into the NASMD are heterogeneous in terms of measurement technique, measurement depth, spatial extent, and degree of automation. Therefore, quality control of measured soil moisture data is crucial to maintain a harmonized database across North America. Several studies have developed quality control algorithms for meteorological variables such as air temperature and precipitation (Hubbard et al. 2005), solar radiation (Journée and Bertrand 2011), sea surface temperatures (Merchant et al. 2008), and ocean salinity (Ingleby and Huddleston 2007); however, considerably fewer have focused on in situ soil moisture quality control (Illston et al. 2008; You et al. 2010). Recently, Dorigo et al. (2013) developed a soil moisture quality control methodology to flag erroneous soil moisture measurements in the ISMN. With this algorithm, soil moisture observations are flagged if 1) the volumetric water content is less than 0 m3 m−3; 2) the value is larger than 0.6 m3 m−3; 3) the value is larger than the saturated point (porosity); 4) the soil temperature is negative; 5) soil moisture increases without a precipitation event; and 6) the time series has a spike, break, or plateau. This flagging process removed about 13% of the measurements in the ISMN. 
The NASMD has also developed a quality control algorithm that flags values outside the 0.0–0.6 m3 m−3 range as well as flagging values that deviate from values of the previous and subsequent days by three or more standard deviations and streaks of more than 10 days of the same volumetric water content value (Quiring et al. 2013, manuscript submitted to Bull. Amer. Meteor. Soc.). However, because soil temperature is not observed at all stations in the NASMD, quality control tests based on frozen/unfrozen soil status have not been developed. Furthermore, soil porosity varies considerably between sites in North America, making the use of an arbitrary maximum value such as 0.6 m3 m−3 inadequate for robust soil moisture quality control. Therefore, this study is developed to provide an addition to the NASMD automated quality control methodology based on soil temperature and soil porosity for each individual observing station.
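The existing NASMD checks described above (range, deviation from neighbouring days, and repeated-value streaks) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the NASMD implementation: the function name `nasmd_baseline_flags` is hypothetical, and the text does not specify how the standard deviation is computed or whether a value must deviate from both neighbouring days, so the sketch assumes the whole-series standard deviation and requires a deviation from both the previous and the subsequent day.

```python
import numpy as np

def nasmd_baseline_flags(vwc, n_sigma=3.0, max_streak=10):
    """Return boolean flags (True = suspect) for a daily series of
    volumetric water content (m3 m-3). Hypothetical helper."""
    vwc = np.asarray(vwc, dtype=float)
    n = vwc.size
    valid = ~np.isnan(vwc)

    # Range test: values outside 0.0-0.6 m3 m-3.
    out_of_range = valid & ((vwc < 0.0) | (vwc > 0.6))

    # Deviation test (assumed form): the value differs from BOTH the
    # previous and the subsequent day by at least n_sigma standard
    # deviations of the whole series.
    spike = np.zeros(n, dtype=bool)
    sigma = np.nanstd(vwc) if valid.any() else np.nan
    if n >= 3 and sigma > 0:
        d_prev = np.abs(vwc[1:-1] - vwc[:-2])
        d_next = np.abs(vwc[1:-1] - vwc[2:])
        spike[1:-1] = (d_prev >= n_sigma * sigma) & (d_next >= n_sigma * sigma)

    # Streak test: more than max_streak consecutive days with the same value.
    streak = np.zeros(n, dtype=bool)
    start = 0
    for i in range(1, n + 1):
        if i == n or vwc[i] != vwc[start]:
            if valid[start] and (i - start) > max_streak:
                streak[start:i] = True
            start = i

    return out_of_range | spike | streak
```

Missing values (NaN) are never flagged by this sketch; they simply break a streak and are skipped by the deviation test.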
The North American Land Data Assimilation System, phase 2 (NLDAS-2; Xia et al. 2012a,b), used the Noah model (a community land surface model commonly used as the land component of coupled weather and climate models) to produce a >34-yr (1979–2014) hourly and ⅛° soil temperature dataset using the Climate Prediction Center’s gauge-based precipitation, bias-corrected with the PRISM monthly precipitation (Daly et al. 2008). The Noah-simulated soil temperature has high accuracy as it has been comprehensively evaluated using in situ soil temperature observations (Xia et al. 2013). In addition, a recent comparison between observed soil texture and NLDAS-2 texture shows good correspondence (Xia et al. 2015b). Therefore, the NLDAS-2 soil temperature and porosity will be used to flag soil moisture measurements that are observed in frozen soils (negative soil temperature) and those that are higher than the soil porosity. In addition, the Noah-simulated fraction of liquid and total soil moisture will also be tested to flag erroneous soil moisture observations in the NASMD. We recognize that spatial-scale mismatch issues still exist in this study when the gridded soil temperature, soil porosity, and fraction of liquid and total soil moisture are used to quality control site data. This will continue to be a challenge for land surface modeling until the model spatial resolution is comparable to the spatial footprint of in situ soil moisture sensors. The following results are presented with this caveat in mind. The automated quality control (QC) methodology developed in this study is based on existing experience with the ISMN automated QC system. We first describe the NASMD and NLDAS-2 generally and include a more detailed description of the datasets that are used for developing and testing the QC procedures. 
The sources of in situ soil moisture measurement errors are described in section 3, the quality control methodology is discussed in section 4, and summary and conclusions are given in section 5.
The data used in this study include observed daily soil moisture provided by the NASMD (Table 1 and Fig. 1), simulated daily soil temperature values from the NLDAS-2, the fraction of liquid and total soil moisture (a sum of liquid and frozen soil moisture), and soil porosity. The NASMD hosts data from 32 networks representing more than 1600 measurement stations. Figure 1 shows the spatial distribution of these stations in the NASMD database, while Table 1 lists station density and measurement information for each network. The networks used in this study include the Michigan Automated Weather Network, North Carolina Environment and Climate Observing Network (ECONet), Oklahoma Mesonet, Nebraska Automated Weather Data Network (NAWDN), SNOTEL, Soil Climate Analysis Network (SCAN), and West Texas Mesonet (Fig. 2). The number of stations in each network ranges from 22 to several hundred, and each network is managed by a single organization or institution. Each network has its own objectives with a different focus on soil moisture measurements. Therefore, the measurement techniques, sensor types, and depths at which soil moisture is observed vary from network to network. For example, networks included in the NASMD measure or estimate soil moisture content using Stevens Water HydraProbe, CS616 water content reflectometer, Theta probes, Vitel probes, cosmic-ray observing systems, and heat dissipation sensors. Errors vary between different measurement techniques, such that estimated measurement error of soil moisture will vary from 0.03 to 0.06 m3 m−3 (Xia et al. 2014). Note that the measurement errors may be underestimated as they are close to the manufacturer’s estimate only when sensors are appropriately calibrated to in situ soils. In practice, most installed sensors are not well calibrated to their respective soils and actual errors likely range from 0.05 to 0.10 m3 m−3.
Soil moisture measurements are often problematic in winter as most soil moisture measuring devices malfunction when soils are frozen. This leads to significantly lower recorded soil moisture (Hallikainen et al. 1985) and these data are not appropriate for scientific use. Usually, in situ soil temperature measurements have been tested to detect and flag spurious soil moisture observations in the ISMN (Dorigo et al. 2011) and in the U.S. Soil Climate Analysis Network (Liu et al. 2011). The results are very encouraging. However, use of in situ soil temperature measurements is not universally applicable for NASMD as soil temperature is not measured by most of the networks. Furthermore, soil temperature observations may be missing when the corresponding soil moisture measurements exist at a given station. Dorigo et al. (2013) used a Global Land Data Assimilation System (GLDAS; Rodell et al. 2004)–Noah soil temperature dataset with a spatial resolution of 0.25° (2000–present) and 1° (before 2000) to flag soil moisture measurements when modeled soil temperature dropped below 0°C. Their results show that the use of the GLDAS–Noah soil temperature for flagging spurious values performs well when compared to the results obtained using in situ soil temperature data. However, coarse-resolution (40–100 km) GLDAS–Noah soil temperatures may lead to inaccuracies or inconsistencies in flagging soil moisture. The NLDAS provides a finer-resolution alternative to GLDAS. NLDAS-2 is a multi-institutional collaboration project from which a long-term (1979–present), fine-spatial-resolution (0.125°), and consistent Noah soil temperature product is generated. PRISM bias-corrected gauge-based precipitation, bias-corrected downward solar radiation, and the North American Regional Reanalysis (Mesinger et al. 2006) are all integrated into the NLDAS-2 suite of products. 
The NLDAS-2 Noah soil temperature has been comprehensively evaluated using in situ soil temperature observations over the United States (Xia et al. 2013), and it was further improved by upgrades to the Noah model (Xia et al. 2015c). In this study NLDAS-2 Noah soil temperature is used to flag spurious soil moisture values. In addition, daily fractions of liquid and total soil moisture simulated from the NLDAS-2 Noah model are also used to flag spurious soil moisture values when soils are frozen.
In situ soil moisture observations from 12 SCAN stations were checked using extensive QC steps that included automatic detection of problematic observations and a visual inspection of the time series, including erroneous increases and decreases in soil moisture. Unrealistic measured values (i.e., data beyond a reasonable range, data with large inconsistencies associated with sensor calibration and installation) and measurements taken under frozen conditions were excluded (Liu et al. 2011). These observations were used as benchmarks to evaluate how well the quality control method proposed in this study works.
Dorigo et al. (2013) reported that about 1% of total measured soil moisture records exceed the saturation point. Unfortunately, in situ porosity was not available for any of the validation datasets. If in situ soil texture data are available from the NASMD, they are set as observed soil classes. If soil classes were not provided by the observation network, these parameters were estimated from the Soil Survey Geographic (SSURGO) Database generated from the U.S. Department of Agriculture. Recent work from the Environmental Modeling Center (EMC) Land-Hydrology Group compared the NASMD soil classes with NLDAS-2 soil classes. The results show that they are quite similar. Therefore, the NLDAS-2 porosity at 0.125° resolution is used to flag spurious soil moisture values that exceed the NLDAS-2 soil saturation point. Figure 2 shows the spatial distribution of NLDAS-2 porosity (25-cm soil temperature on 10 February 2007) with 12 SCAN stations and 435 NASMD stations.
The NASMD soil moisture is measured at different soil depths, depending on the network. NLDAS-2 Noah soil temperature and fraction of liquid and total soil moisture are simulated at four Noah soil layers (5 cm for 0–10-cm soil layer, 25 cm for 10–40-cm soil layer, 70 cm for 40–100-cm soil layer, and 150 cm for 100–200-cm soil layer). The layers in which soil moisture is measured may not match the layers in which soil temperature and fraction of liquid are simulated. To reasonably use NLDAS-2 Noah products, we apply a simple linear interpolation to vertically translate NLDAS-2 Noah products from the Noah soil layers to the NASMD measurement depths, which vary by network. This simple method has been used for evaluation of NLDAS-2 soil temperature (Xia et al. 2013) and soil moisture (Xia et al. 2014). We acknowledge that soil temperature and moisture profiles may not vary linearly with soil depth and that their shape depends on soil, depth, climate, and season. However, because of the lack of soil temperature and moisture profile information, a simple linear interpolation is deemed to be the most appropriate approach.
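The vertical interpolation described above can be illustrated with a short sketch. It assumes that the four Noah layers are represented by the depths quoted in the text (5, 25, 70, and 150 cm) and, as a further assumption not stated in the text, that sensors shallower than 5 cm or deeper than 150 cm simply take the value of the nearest layer. The function name `interp_to_sensor_depth` is hypothetical.

```python
import numpy as np

# Representative depths (cm) of the four Noah soil layers quoted in the text.
NOAH_DEPTHS_CM = np.array([5.0, 25.0, 70.0, 150.0])

def interp_to_sensor_depth(noah_values, sensor_depth_cm):
    """Linearly interpolate Noah layer values to one sensor depth.

    noah_values: array whose last axis has length 4 (one value per Noah
    layer), e.g. shape (4,) for one day or (n_days, 4) for a time series.
    """
    values = np.asarray(noah_values, dtype=float)
    # Assumption: clamp depths outside 5-150 cm to the nearest layer.
    z = float(np.clip(sensor_depth_cm, NOAH_DEPTHS_CM[0], NOAH_DEPTHS_CM[-1]))
    # Index of the Noah layer at or above the sensor depth.
    upper = int(np.searchsorted(NOAH_DEPTHS_CM, z, side="right")) - 1
    upper = min(upper, len(NOAH_DEPTHS_CM) - 2)
    z0, z1 = NOAH_DEPTHS_CM[upper], NOAH_DEPTHS_CM[upper + 1]
    w = (z - z0) / (z1 - z0)  # weight of the deeper layer
    return (1.0 - w) * values[..., upper] + w * values[..., upper + 1]
```

For example, a 10-cm sensor takes 75% of its value from the 5-cm layer and 25% from the 25-cm layer, so a profile of −2°C at 5 cm and 2°C at 25 cm yields −1°C at 10 cm.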
3. Sources of errors in soil moisture measurements
All observations of physical quantities are subject to measurement error. The measurement of a physical variable (e.g., soil moisture) includes two components: 1) a numerical value giving the best estimate possible of the quantity measured, and 2) the degree of error associated with this estimated value. In general, there are two types of errors: 1) systematic errors and 2) random errors. Systematic errors are constant and always of the same sign and thus cannot be reduced by averaging over many observations. Examples of systematic errors in soil moisture observations include consistent over- or underestimation of actual soil moisture content caused by poor sensor calibration or preferential flow due to incorrect installation procedures. Random errors are produced by any one of a number of unpredictable and unknown variations in the experiment. If soil moisture sensors are calibrated and installed correctly, the majority of measurement errors should be random. The error sources for soil moisture in the NASMD are mostly attributable to the diversity of climatological conditions, vegetation types, soil classes, and topographies of the stations, as well as differences in measurement devices and installation. In this study, three kinds of errors are identified in NASMD soil moisture data. The first is outliers that are outside the geophysical range of 0.0–0.6 m3 m−3 (Dorigo et al. 2013). The second is underestimation of actual soil moisture content, which in situ sensors exhibit systematically in frozen soils. These errors are due to the fact that the dielectric constant of ice is significantly lower than that of liquid water, and therefore most measuring devices malfunction when soils are frozen. These errors are systematic and can be reasonably flagged with the aid of the NLDAS-2 Noah soil temperature and the fractions of liquid and total soil moisture. The third kind of error occurs when measured soil moisture values exceed soil saturation. 
There are at least two possible causes of this error. The first is when the observed soil type is different from the soil texture used in NLDAS-2. For example, clay is identified as loam when the texture from NLDAS-2 is used at a given site. In this case, the NLDAS-2 porosity is not representative of porosity at the point soil moisture measurement. This leads to the use of lower porosity for this site when higher-porosity soil exists. The second potential source of this error is measurements taken immediately after or within a few hours of heavy rainfall or irrigation; however, these conditions typically do not persist for more than a few days. Because high-resolution measurements of rainfall and/or irrigation are not available in either NLDAS-2 or NASMD, these errors are difficult to detect.
4. QC methodology
The most thorough quality control of measured soil moisture records is exclusion of spurious and erroneous values through manual inspection of each station, each measurement depth, and each day. However, this is not practical for large volumes of data and integrative databases such as the NASMD. The more feasible solution is to develop an automatic methodology to identify and flag spurious observations without visual inspection. Dorigo et al. (2013) developed an automated methodology to exclude the spurious and erroneous soil moisture records in the ISMN using GLDAS precipitation, GLDAS–Noah soil temperature, and several statistical QC checks. We developed an automated QC methodology based on the QC algorithm of the ISMN by 1) ensuring that measurements are within a geophysical range (GR) of 0.0–0.6 m3 m−3 (GR check), 2) flagging measurements using the NLDAS-2 Noah soil temperature, 3) flagging measurements using the NLDAS-2 fraction of liquid and total soil moisture, and 4) flagging measurements that exceed the NLDAS-2 soil porosity (SP check). To test our QC approach, we first selected 12 stations (Fig. 2) from the 121 SCAN sites used in Liu et al. (2011) and Xia et al. (2014), as soil moisture measurements at these 12 stations have been flagged using the corresponding in situ soil temperature observations and from visual inspection. The observed soil moisture data at these 12 stations were used as a reference dataset for benchmarking. After the QC approach was tested at these 12 stations, we then repeated this testing at 421 stations from seven networks in NASMD, covering eight states (Fig. 2).
The soil porosity for each of the 12 test stations and 421 NASMD stations was obtained using a nearest-neighbor interpolation method from the ⅛° NLDAS-2 Noah soil porosity grid field. We use the 10-cm soil porosity from Noah to quality control soil moisture values in the shallow (<60 cm) soil layers. At a given station, when the measured soil moisture value is larger than the saturated porosity at this station, the value is flagged. To apply the NLDAS-2 Noah soil temperature to the QC procedure, daily soil temperature is first calculated from NLDAS-2 hourly output (Xia et al. 2013) for all four soil layers (i.e., 5, 25, 70, and 150 cm). Daily soil temperature values at each station for the four Noah soil layers are generated over a 14-yr period (1999–2012) using the nearest-neighbor method to match the long-term measured daily soil moisture from the seven networks (Table 2). For a given soil layer at a given station, a simple linear method is used to vertically interpolate the soil temperature from the four Noah soil layers into the specific soil layer in which the sensor lies. When the soil temperature is below 0°C at a given station, the corresponding soil moisture observation is flagged because of the effect of frozen soils.
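The three checks described above (GR, SP, and soil temperature below 0°C) can be sketched for a single station and sensor depth as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the daily soil temperature is assumed to be the arithmetic mean of 24 hourly values (the text says only that it is calculated from hourly output), and the inputs are assumed to have already been extracted at the nearest NLDAS-2 grid cell and interpolated to the sensor depth.

```python
import numpy as np

GR_MIN, GR_MAX = 0.0, 0.6   # geophysical range (m3 m-3)
SP_MAX_DEPTH_CM = 60.0      # SP check is applied only to shallow layers

def daily_mean_from_hourly(hourly, hours_per_day=24):
    """Assumed aggregation: arithmetic mean of each complete day."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = hourly.size // hours_per_day
    trimmed = hourly[: n_days * hours_per_day]
    return trimmed.reshape(n_days, hours_per_day).mean(axis=1)

def qc_flags(vwc, soil_temp_c, porosity, sensor_depth_cm):
    """Return a dict of boolean flag arrays (True = flagged).

    vwc: daily observed soil moisture (m3 m-3)
    soil_temp_c: daily modelled soil temperature at the sensor depth (deg C)
    porosity: modelled soil porosity at the station (m3 m-3)
    """
    vwc = np.asarray(vwc, dtype=float)
    soil_temp_c = np.asarray(soil_temp_c, dtype=float)

    # GR check: outside the geophysical range.
    gr = (vwc < GR_MIN) | (vwc > GR_MAX)

    # SP check: above porosity, shallow (<60 cm) sensors only.
    if sensor_depth_cm < SP_MAX_DEPTH_CM:
        sp = vwc > porosity
    else:
        sp = np.zeros(vwc.shape, dtype=bool)

    # Frozen-soil check: modelled soil temperature below 0 deg C.
    st0 = soil_temp_c < 0.0

    return {"GR": gr, "SP": sp, "ST0": st0, "any": gr | sp | st0}
```

Keeping the three flags separate, rather than returning only their union, makes it possible to report the percentage of values flagged by each check individually.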
a. Development of an automated quality control based on 12 SCAN stations
Table 3 summarizes the overall percentages of values flagged for 12 SCAN stations. The only depth at which values are flagged as outside the geophysical range of soil moisture is at 50 cm (0.1% values flagged). In contrast, the percentages of values exceeding NLDAS-2 soil porosity are larger, as 0.2%, 0.4%, 0.2%, 0.8%, and 9.06% of values are flagged at the 5-, 10-, 20-, 50-, and 100-cm soil depths, respectively. The first four values are quite comparable to those reported by Dorigo et al. (2013). However, the relatively large fraction of soil moisture observations flagged at the 100-cm depth suggests that the observations may be incorrectly flagged. To examine this in more detail, we analyze raw observations for 12 stations and find that most of the flagged observation records come from the Nebraska and New York SCAN sites. An example is the 8-yr (2002–09) averaged soil moisture at the 100-cm soil depth at SCAN station NY_2011 near Geneva, New York (Fig. 3). Almost 50% of the measured soil moisture values at the station exceed the NLDAS-2 soil porosity (0.465, clay loam). However, the measured soil moisture seems to be quite reasonable. The Noah model uses the same soil class and porosity for all soil layers, but in fact soil porosity may be different for shallow and deep soil layers. So the use of the same value may lead to a mismatch between the NLDAS-2 soil class and the observed soil class at this site. In addition, maximum porosity in NLDAS-2 Noah is 0.476, which is smaller than observed soil moisture values (>0.50) at this site. Therefore, for deep soil moisture measurements, we do not use the SP approach to flag observations. For frozen soils, the percentages flagged for the 12 SCAN stations are 19.8%, 19.57%, 18.37%, 16.86%, and 13.01% for the 5-, 10-, 20-, 50-, and 100-cm soil depths, respectively, when the NLDAS-2 Noah soil temperatures are used. 
These values are comparable when the fractions of NLDAS-2 Noah liquid and total soil moisture (<80%) are used (Table 3). As the soil depth increases, the percentage of flagged values decreases because the deep soil freezes less frequently. Figure 4 compares the spatially averaged soil moisture climatology at 5-, 10-, and 20-cm soil depths against the quality controlled SCAN observations (Liu et al. 2011), raw observations, and the remaining data after removing the flagged raw observations when the NLDAS-2 Noah soil temperatures are below 0°C (ST0) or when the fractions of NLDAS-2 Noah liquid and total soil moisture are smaller than 80% (F80). We excluded the comparison for 50- and 100-cm soil depths as few SCAN stations have quality controlled data at either of those depths. The results show that the ST0 and F80 can flag unrealistically small soil moisture values in winter, early spring, and late fall when the soil is still frozen for all three soil layers when compared with the quality controlled observations. Figure 5 compares the quality controlled observations, raw observations, and the observations flagged through ST0 and F80 at station WA_2021 (Washington State; see Fig. 2a) for all five soil layers. In the top three soil layers, ST0 performs slightly better than F80 when compared to quality controlled observations. In the fourth soil layer, ST0 has a larger effect than F80, and in the fifth soil layer, ST0 and F80 have similar effects. The results also show that ST0 and F80 flag fewer values as the layer depth increases, which is consistent with the results in Table 3. As NLDAS-2 Noah soil temperatures have been comprehensively evaluated using many observations over the United States (Xia et al. 
2013, 2015c) and NLDAS-2 Noah liquid soil moisture has not yet been evaluated because of the lack of observations, it is more appropriate to flag soil moisture observations using the NLDAS-2 Noah soil temperature than using F80, although the F80 also performs quite well. Figure 6 shows a similar comparison at the 20-cm depth for 10 SCAN stations when the ST0 is used. The results show that ST0 flags relatively few values in warmer climates (e.g., Alabama or South Carolina). However, it flags many values at stations in higher latitudes, in particular for winter, early spring, and late fall. Determining if the soil moisture observations are valid is more difficult in spring and fall, as these seasons experience days with frozen and unfrozen soils. Overall, ST0 did a good job of flagging spurious observations at the 12 SCAN stations, although it has difficulty flagging some spurious measurements during the late spring transition period at two cold stations (MN_2002, ND_2020). This is attributable to two factors: 1) the soil is not completely thawed although the soil temperatures are above 0°C, and 2) the simulated soil temperatures may be warmer than in situ soil temperature at these two stations. By incorporating the GR check, SP check, and negative NLDAS-2 Noah soil temperature, an automated QC methodology to flag spurious soil moisture observations was developed and tested at 12 SCAN stations. The results suggest that this QC approach is able to identify most of the spurious observations while not falsely flagging any valid observations. However, it remains unclear how the QC approach performs when more stations are used. The next section will assess the QC approach using 421 stations covering eight states with different climates, soil classes, and vegetation types.
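The F80 criterion compared with ST0 above can be sketched as a ratio test on the modelled liquid and total soil moisture. This is a minimal sketch under stated assumptions: the function names `f80_flags` and `flag_agreement` are hypothetical, both inputs are assumed to be daily values already interpolated to the sensor depth, and days with zero or missing total soil moisture are assumed to be left unflagged.

```python
import numpy as np

def f80_flags(liquid, total, threshold=0.8):
    """Flag days on which the liquid fraction of total soil moisture
    is below the threshold (0.8 corresponds to the F80 criterion)."""
    liquid = np.asarray(liquid, dtype=float)
    total = np.asarray(total, dtype=float)
    frac = np.full(total.shape, np.nan)
    # Compute the ratio only where total soil moisture is positive;
    # elsewhere the fraction stays NaN and the day is not flagged.
    np.divide(liquid, total, out=frac, where=total > 0)
    return frac < threshold

def flag_agreement(flags_a, flags_b):
    """Fraction of days on which two flag series agree."""
    a = np.asarray(flags_a, dtype=bool)
    b = np.asarray(flags_b, dtype=bool)
    return float(np.mean(a == b))
```

A simple agreement fraction such as `flag_agreement(f80, st0)` is one way to quantify how similar the two criteria are at a station; it is offered here only as an illustration and is not a statistic reported in the text.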
b. Applying the automated QC to 421 NASMD stations
Table 4 shows percentages of flagged values based on eight states and seven networks included in the NASMD for different soil depths. The number of the measurement records varies from 118 302 in North Carolina to 1 253 665 in Oklahoma, depending on the number of soil depths measured, number of stations, and years of record (Table 4). In general, during the cold season (winter, early spring, and late fall), the data are more reliable in states with unfrozen soils than in those with cold and/or mountainous climates as the frozen soil affects measurement accuracy. This is particularly obvious in Table 4. The GR and SP tests together flag less than 2% of values in Colorado, Utah, and Nebraska. The ST0 check, in contrast, flags 66.16%, 48.93%, and 24.87% of values in these three states, respectively. In warmer locations like Alabama, North Carolina, and west Texas, less than 2% of the measured records were flagged by the ST0 test (Table 4). The GR check flags the lowest percentage of measured records when compared to the SP and ST0 checks, varying from 0.0% in Oklahoma to 3.92% in Michigan. The percentage flagged by SP check varies from 0.0% in Nebraska to 7.89% in North Carolina. Overall, soil moisture observations in Oklahoma and west Texas had the smallest percentage of flagged values (<2.5%), benefiting from unfrozen soils year-round. Including the ST0 test, the reliability of soil moisture observations was lowest in Colorado (66.16%), Utah (48.93% flagged), and Michigan (33.47% flagged). However, when we remove the effects of the ST0 test, the states with the lowest percentage of values flagged are Nebraska (0.01%), Oklahoma (0.15%), and Utah (0.91%), while the states with the highest percentage of values flagged are Alabama (6.66%), Michigan (10.18%), and North Carolina (10.32%). 
Although states in high-latitude and high-elevation locations had, in general, higher percentages of values flagged, it is important to separate the cumulative effects of the GR and SP checks from the ST0 check. The GR and SP checks evaluate the quality of the data with respect to physical limitations of the soil, while the ST0 test evaluates the reliability of the data during the cold season. Figure 7 shows the comparison of observations without (RAW OBS) and with the QC procedure (QC flag) for Alabama, North Carolina, Oklahoma, and west Texas. We selected these regions because frozen soils are not common. Most of the flags over these states are due to the GR and SP check, in particular for 50-cm soil moisture in Alabama, 20-cm soil moisture in North Carolina, and 20-cm soil moisture in west Texas, where the checks significantly reduced soil moisture contents when compared to the raw observations. In most cases, the effects of the QC checks on raw observations are very small. In contrast, for states with colder climates such as Nebraska, Michigan, Colorado, and Utah, much of the soil is frozen in winter, early spring, and late fall. Figure 8 shows the effect of frozen soils on raw observations when the QC procedure is used, in particular for the stations in the SNOTEL network, which are located in the mountains (Figs. 8g–l). For shallow soil layers (<50 cm) over Michigan, Nebraska, Colorado, and Utah, most of the raw observations are flagged in winter, early spring, and late fall as expected. Fewer raw observations are flagged in the deeper layers as soil temperatures become warmer (see Figs. 8d–f). In Michigan, more than 10% of the raw observations are flagged based on the GR and SP checks, making the QC observations lower (drier soils) than the raw observations (Figs. 8a,b). In Utah, there are some spikes for all three soil layers because of unreasonably large values of the original data. 
The reason for this may be sensor failure or erroneous readings; however, a simple GR check can easily remove them. Overall, the results show that the QC procedure, particularly the GR and SP checks, largely reduces the soil moisture content when compared with the raw observations without the QC procedure.
To demonstrate how the QC procedure affects the evaluation of multimodel-simulated soil moisture in the NLDAS-2, we validated simulations of 20-cm soil moisture in Alabama, 20-cm soil moisture in North Carolina, and 20-cm soil moisture in west Texas for four NLDAS-2 land surface models [i.e., Noah, Mosaic, the Sacramento soil moisture accounting model (SAC), and VIC]. The Taylor skill score (Taylor 2001) is used because it is an integrated statistical measure that accounts for bias and correlation. The Taylor skill score is defined as
where R is correlation between the simulated and observed daily soil moisture, R0 is the theoretical maximum correlation (assumed to be 1 in this study), and σ is the standard deviation of the simulated daily soil moisture normalized by the standard deviation of the observed daily soil moisture. Table 5 compares the S scores calculated from the simulated and observed daily soil moisture at the 20-cm soil layer for Alabama and west Texas, and the 20-cm soil layer in North Carolina against observations using the raw data and using the quality controlled data. The S scores increased in all three regions, especially in North Carolina and west Texas. This increase is a result of both increased correlation and reduced biases (i.e., mean absolute error, root-mean-square error), suggesting that the data flags had a positive influence on the statistical measures. This partially explains why we found relatively low S values at the 20-cm soil layer in west Texas in our recent study (Xia et al. 2015a). Note that we did not evaluate remotely sensed soil moisture products such as the Advanced Scatterometer (ASCAT; Bartalis et al. 2007), AMSR-E (Owe et al. 2008), SMOS (Mecklenburg et al. 2012), SMAP (Entekhabi et al. 2010), and Soil Moisture Operational Products System (SMOPS; Zhan et al. 2011) as an assessment of remotely sensed products is beyond the scope of this study. The evaluation of these remote sensing products using raw and QC observations is ongoing and will be addressed in a future paper.
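For readers who wish to reproduce the S scores, the sketch below computes a Taylor skill score from the quantities defined above (R, R0, and the normalized standard deviation σ). Taylor (2001) proposes two closely related forms, S = 4(1 + R) / [(σ + 1/σ)^2 (1 + R0)] and a variant in which both (1 + R) and (1 + R0) are raised to the fourth power; which form was used in this study is not stated in the surrounding text, so the exponent is exposed here as a parameter and defaults to the first form as an assumption. The function name `taylor_skill` is hypothetical.

```python
import numpy as np

def taylor_skill(sim, obs, r0=1.0, power=1):
    """Taylor (2001) skill score from paired daily series.

    sim, obs: simulated and observed soil moisture; pairs with a NaN in
              either series (e.g. QC-flagged observations) are skipped.
    r0:       theoretical maximum correlation (1 in this study).
    power:    1 or 4, selecting between the two forms in Taylor (2001);
              the choice of 1 as default is an assumption.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = ~(np.isnan(sim) | np.isnan(obs))
    sim, obs = sim[ok], obs[ok]

    r = np.corrcoef(sim, obs)[0, 1]       # correlation R
    sigma = np.std(sim) / np.std(obs)     # normalized standard deviation
    return (4.0 * (1.0 + r) ** power
            / ((sigma + 1.0 / sigma) ** 2 * (1.0 + r0) ** power))
```

Setting flagged observations to NaN before calling the function gives the score against quality controlled data, and passing the unmodified series gives the score against raw data, which mirrors the raw versus QC comparison in Table 5.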
Although NLDAS-2 soil porosity can be reasonably used to screen out obvious anomalies, this flag should be applied carefully because it is based on a soil porosity map obtained from the ⅛° NLDAS-2 Noah database (http://www.emc.ncep.noaa.gov/mmb/nldas/LDAS8th/soils/LDASsoils.shtml; Mitchell et al. 2004) and may not correctly represent soil properties at fine scales. In addition, the NLDAS-2 soil porosity may be too small in high-latitude wet regions and too large in arid/semiarid regions. However, soil porosity observations are available for only a few stations. Therefore, the use of a regionally consistent dataset is preferred over in situ porosity observations, as the latter would lead to an inconsistent quality flag for the NASMD. It is expected that a more reasonable soil porosity dataset that varies with soil depth will be developed in the future based on the SSURGO Database from the U.S. Department of Agriculture, Natural Resources Conservation Service (NRCS 2013; http://websoilsurvey.sc.egov.usda.gov/App/HomePage.htm).
Note that 10-cm soil porosity from the Noah model is applied throughout the shallow layers (<60 cm) in this study. However, soil type and compaction, and therefore porosity, are typically not consistent throughout the profile. The SP check may therefore introduce unintended biases because of the variability in soil porosity within the top 60 cm. Hydraprobe sensors operated by multiple soil moisture monitoring networks in the United States also report soil temperature, which could be used to implement the quality control procedures. However, soil moisture sensor type varies by network, and therefore in situ soil temperature measurements are not ubiquitous. The NLDAS-2 soil temperature dataset has been exhaustively validated with in situ observations (Xia et al. 2013) and can provide a high-quality product for soil moisture quality control across the United States. The quality control process that is used in this study, similar to those developed in other studies (Dorigo et al. 2013), is more likely to flag observations taken in cold weather or in more northern-latitude regions. This may introduce unintended biases with regard to flagged observations, which could affect applications of soil moisture observations in cold conditions.
An automated QC methodology is developed for controlling in situ soil moisture observations in the North American Soil Moisture Database using a geophysical range check, a soil porosity check, and a soil temperature check. We used 12 example stations (Liu et al. 2011) to verify the performance of the QC approach. The soil moisture observations validated carefully by using in situ soil temperature and visual inspection were assumed to be “ground truth.” The results showed that the NLDAS-2 products flagged spurious measurements for all 12 stations, suggesting that soil porosity, soil temperature, and fractions of liquid and total (liquid plus frozen) soil moisture were very useful. The QC approach was further applied to 421 NASMD stations covering eight states and seven networks. When including the effects of frozen soils, the locations with the lowest percent of values flagged were Oklahoma and west Texas, both with less than 2.5%. Not surprisingly, the states with the highest percent of values flagged are those in high-elevation and high-latitude regions (Colorado, Michigan, Utah, and Nebraska), all with more than 24% of values flagged. However, when we exclude flags due to frozen soils, the states with the lowest percent of values flagged are Nebraska (0.01%), Oklahoma (0.15%), and Utah (0.91%), while the states with the highest percentage of values flagged are Alabama (6.66%), Michigan (10.18%), and North Carolina (10.32%). When the quality controlled data were used to evaluate the soil moisture simulated from the four NLDAS-2 land surface models (i.e., Noah, Mosaic, SAC, VIC) over Alabama, North Carolina, and west Texas, the Taylor skill scores increased using the QC data as compared to using the raw observations, suggesting that the data flagging procedure had a positive influence on the assessment of model products.
Overall, the QC approach provided satisfactory results, and it should be a useful QC tool for processing in situ soil moisture measurements in the NASMD. The QC procedure developed here will be applied to the NASMD to flag erroneous and spurious data for all NASMD stations, except for the Cosmic-Ray Soil Moisture Observing System (COSMOS) stations listed in Table 1. These stations are excluded because the measurement instrument in the COSMOS network can measure total soil moisture (liquid + frozen) when the soil is frozen. To facilitate proper tracking of the error source, we will use the Coordinated Energy and Water Cycle Observation Project (CEOP) flag definitions (http://www.eol.ucar.edu/projects/ceop/dm/documents/refdata_report/data_flag_definitions.html) and will add subcategories to category D (i.e., D1: exceeding geophysical range; D2: exceeding soil porosity; D3: soil temperature below zero) to describe these individual cases. A quality controlled in situ soil moisture database with flag tracks will be staged on the NASMD website (http://soilmoisture.tamu.edu/) for the public in the near future.
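The flagging scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the operational NASMD code: the geophysical range bounds (0 to 0.6 m3 m-3), the priority order among the three checks, the use of "G" for values that pass all checks and "M" for missing values, and the names `assign_qc_flags` and `percent_flagged` are all hypothetical choices made for the example. Porosity and soil temperature are assumed to have been extracted beforehand from the NLDAS-2 Noah fields for the station location.

```python
import numpy as np

# Assumed bounds for the geophysical range check (m3 m-3); illustrative only.
RANGE_MIN, RANGE_MAX = 0.0, 0.6
FREEZING_C = 0.0  # soil temperature threshold (deg C) for the D3 flag

def assign_qc_flags(soil_moisture, porosity, soil_temp_c):
    """Return one flag per observation: G, D1, D2, D3, or M (missing)."""
    sm = np.asarray(soil_moisture, dtype=float)
    por = np.broadcast_to(np.asarray(porosity, dtype=float), sm.shape)
    temp = np.broadcast_to(np.asarray(soil_temp_c, dtype=float), sm.shape)

    flags = np.full(sm.shape, "G", dtype="<U2")
    # Checks are applied in reverse priority so that, when several apply,
    # D1 overrides D2 and D2 overrides D3 (an assumed ordering).
    flags[temp < FREEZING_C] = "D3"                       # frozen soil
    flags[sm > por] = "D2"                                # exceeds porosity
    flags[(sm < RANGE_MIN) | (sm > RANGE_MAX)] = "D1"     # outside range
    flags[np.isnan(sm)] = "M"                             # no observation
    return flags

def percent_flagged(flags, include_frozen=True):
    """Percentage of non-missing values flagged, with or without D3."""
    flags = np.asarray(flags)
    present = flags != "M"
    bad_codes = ["D1", "D2", "D3"] if include_frozen else ["D1", "D2"]
    bad = np.isin(flags, bad_codes) & present
    return 100.0 * bad.sum() / max(int(present.sum()), 1)

# Hypothetical station record: one porosity value, daily soil temperatures.
sm = [0.25, 0.70, 0.50, 0.30, -0.01, np.nan]
temp = [5.0, 5.0, 5.0, -2.0, 5.0, 5.0]
flags = assign_qc_flags(sm, porosity=0.45, soil_temp_c=temp)
print(flags.tolist())
print(percent_flagged(flags), percent_flagged(flags, include_frozen=False))
```

Keeping the frozen-soil flag separate from the other two makes it straightforward to report the percentage of flagged values both with and without frozen-soil effects.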
We thank the data providers from the following networks: Jeff Andresen (Michigan AWN), Natalie Umphlett (Nebraska AWN), Mike Strobel (SCAN, SNOTEL), John Schroeder (West Texas Mesonet), and Rolf Reichle (QC SCAN). We also appreciate the scientists who worked for the Oklahoma Mesonet and the North Carolina ECONet. Without their efforts and support, this study would not have been possible. Authors SQ and TF were funded by the National Science Foundation (Award AGS-1056796). We also acknowledge Brad Ferrier from EMC, whose edits greatly improved the readability of this manuscript. The data used in this study can be obtained from the NASMD website (http://soilmoisture.tamu.edu/) and the NLDAS-2 website (http://www.emc.ncep.noaa.gov/mmb/nldas).