## Abstract

The validation of satellite surface soil moisture products requires comparisons between point-scale ground observations and footprint-scale (>100 km^{2}) retrievals. In regions containing a limited number of measurement sites per footprint, some of the observed difference between the retrievals and ground observations is attributable to spatial sampling error and not the intrinsic error of the satellite retrievals themselves. Here, a triple collocation (TC) approach is applied to footprint-scale soil moisture products acquired from passive microwave remote sensing, land surface modeling, and a single ground-based station. The goal is to estimate (and correct for) spatial sampling error in footprint-scale soil moisture estimates derived from the ground station. Using these three soil moisture products, the TC approach estimates point-to-footprint soil moisture sampling errors with a root-mean-square error of 0.0059 m^{3} m^{−3}. It also enhances the ability to validate satellite footprint-scale soil moisture products using existing low-density ground networks.

## 1. Introduction

The upcoming National Aeronautics and Space Administration (NASA) Soil Moisture Active Passive (SMAP) mission and the recently launched European Space Agency (ESA) Soil Moisture Ocean Salinity (SMOS) mission are designed to retrieve surface soil moisture at coarse spatial resolutions (100 km^{2} for SMAP and 1600 km^{2} for SMOS). Both missions include ground validation activities to verify that retrievals meet required root-mean-square error (RMSE) accuracy goals. However, these activities are hampered by the scale contrast between satellite-based sensor resolutions and the point-scale nature of ground-based instrumentation used for validation (Crow et al. 2005). Since the majority of the available ground-based soil moisture observations are from low-density networks in which one or two measurements are available per satellite footprint (T. J. Jackson 2010, personal communication), the direct comparison of ground networks to footprint-scale satellite soil moisture retrievals will yield mean-square differences (MSDs), which are a function of the intrinsic accuracy of the remote sensing product as well as the spatial representativeness of the ground observations (Cosh et al. 2008). Given the high levels of spatial variability typically observed in soil moisture fields (Famiglietti et al. 2008), poor representativeness may artificially inflate the measured MSD comparisons above mission accuracy goals.

Recently, Scipal et al. (2008) proposed the application of a triple collocation (TC) procedure (Stoffelen 1998; Caires and Sterl 2003; Janssen et al. 2007) to soil moisture. TC is based on the premise that uncertainty in three parallel estimates of a single variable can be deduced if the estimates possess mutually independent errors. Here, we describe the first application of a TC approach to ground-based soil moisture instrumentation and the estimation of sampling errors associated with the spatial upscaling of their measurements. As described above, the direct comparison of point-scale ground observations with satellite-based soil moisture retrievals yields an MSD that is inflated by the sampling error associated with acquiring footprint-scale means using sparse ground observations. Our goal here is to apply TC to estimate (and correct for) the portion of the total MSD between the ground observations and the retrievals attributable to the spatial sampling error and improve prospects for adequately validating soil moisture retrievals using existing ground-based instrumentation.

## 2. Triple collocation

Our TC approach is based on three separate time series assumed to approximate footprint-scale (>100 km^{2}) surface soil moisture (*θ*): a microwave remote sensing product (*θ*_{RS}), a land surface model product (*θ*_{LSM}), and a ground-based product derived from a single point-scale observation within each footprint (*θ*_{POINT}). All three products contain errors arising from mutually distinct sources. Remotely sensed estimates are impacted by instrument noise and uncertainty in microwave emission modeling. Model-based estimates suffer from a simplified parameterization of soil water loss and forcing data error. Coarse-scale soil moisture estimates obtained from a single point-scale observation are degraded by sensor calibration/measurement errors and representativeness errors due to the inherent spatial heterogeneity of surface soil moisture fields. Given the diversity of these sources, it appears reasonable to assume that the three products contain mutually independent errors.

Prior to the application of TC, each product is decomposed into its climatological mean and anomaly components:

$$\theta_i = \bar{\theta}_{D(i)} + \theta'_i \tag{1}$$

where *θ̄*_{D(i)} is the climatological expectation for soil moisture at the day of year (*D*) associated with time step *i*, and *θ*′_{i} is the actual anomaly relative to this expectation. Values of *θ̄*_{D} are calculated through moving-window averaging of multiyear data within a window of *N* days centered on *D*. The implications of this decomposition are discussed further in section 5. In addition, *θ*′_{RS} and *θ*′_{LSM} are rescaled to match the temporal variance of *θ*′_{POINT}. Unless otherwise noted, *N* = 31 days, and the subscript *i* is dropped in future references to time series variables.
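The decomposition in (1) and the variance rescaling can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' processing code. The function names are hypothetical, and the window is assumed to wrap circularly across the year boundary. Any window of 365 days or more is collapsed to a single fixed mean, which corresponds to the *N* = 365 case considered in section 4.

```python
import numpy as np


def climatology_anomalies(theta, doy, n_days=31):
    """Split a daily series into a day-of-year climatology and anomalies, as in Eq. (1).

    theta  : 1-D array of daily soil moisture [m3 m-3], NaN where missing
    doy    : 1-D integer array, day of year (1..366) of each element
    n_days : moving-window width N, centred on each day of year
    """
    theta = np.asarray(theta, dtype=float)
    doy = np.asarray(doy, dtype=int)
    valid = ~np.isnan(theta)
    if n_days >= 365:
        # A window covering the whole year collapses to one fixed mean (N = 365 case).
        theta_bar = np.full(theta.shape, theta[valid].mean())
        return theta_bar, theta - theta_bar
    half = n_days // 2
    clim = np.full(367, np.nan)  # clim[d] for d = 1..366; index 0 unused
    for d in range(1, 367):
        # Circular day-of-year distance so the window wraps across 31 Dec / 1 Jan
        # (an assumption of this sketch).
        dist = np.abs(doy - d)
        dist = np.minimum(dist, 366 - dist)
        in_window = valid & (dist <= half)
        if in_window.any():
            clim[d] = theta[in_window].mean()
    theta_bar = clim[doy]
    return theta_bar, theta - theta_bar


def rescale_to(anom, ref):
    """Rescale `anom` so its temporal standard deviation matches `ref` on shared valid days."""
    anom = np.asarray(anom, dtype=float)
    ref = np.asarray(ref, dtype=float)
    both = ~(np.isnan(anom) | np.isnan(ref))
    return anom * (ref[both].std() / anom[both].std())
```

In this sketch, `rescale_to(anom_rs, anom_point)` plays the role of matching *θ*′_{RS} (or *θ*′_{LSM}) to the variance of *θ*′_{POINT}. Passing `n_days=365` gives the fixed-mean alternative.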

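Differences in the temporal anomalies estimated by the remote sensor and point-scale ground observations can be written as

$$\theta'_{RS} - \theta'_{POINT} = (\theta'_{RS} - \theta'_{TRUE}) + (\theta'_{TRUE} - \theta'_{POINT}) \tag{2}$$

where *θ*′_{TRUE} represents the true anomaly time series. Assuming mutual independence of error in the remote sensing observations (*θ*′_{RS} − *θ*′_{TRUE}) and point observations (*θ*′_{TRUE} − *θ*′_{POINT}), the mean of the square of both sides is

$$\mathrm{MSD}(\theta'_{RS}, \theta'_{POINT}) = \mathrm{MSD}(\theta'_{RS}, \theta'_{TRUE}) + \mathrm{MSD}(\theta'_{POINT}, \theta'_{TRUE}) \tag{3}$$

or equivalently

$$\mathrm{MSD}(\theta'_{RS}, \theta'_{TRUE}) = \mathrm{MSD}(\theta'_{RS}, \theta'_{POINT}) - \mathrm{MSD}(\theta'_{POINT}, \theta'_{TRUE}) \tag{4}$$

where the measurable quantity MSD(*θ*′_{RS}, *θ*′_{POINT}) differs from the true validation quantity of interest MSD(*θ*′_{RS}, *θ*′_{TRUE}) due to the spatial sampling error MSD(*θ*′_{POINT}, *θ*′_{TRUE}). Our approach applies TC to estimate MSD(*θ*′_{POINT}, *θ*′_{TRUE}) and uses (4) to correct estimates of MSD(*θ*′_{RS}, *θ*′_{TRUE}) based on measured values of MSD(*θ*′_{RS}, *θ*′_{POINT}).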
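A Monte Carlo sketch can check the mean-square decomposition above and the correction in (4) on synthetic data. The error levels (0.03 and 0.02 m^{3} m^{−3}) and the Gaussian, mutually independent errors are illustrative assumptions chosen to satisfy the stated premise, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000  # long synthetic record so sample cross terms are negligible

# Hypothetical anomaly series [m3 m-3]: truth plus independent, zero-mean errors.
theta_true = rng.normal(0.0, 0.04, n)
theta_rs = theta_true + rng.normal(0.0, 0.03, n)     # retrieval error, std 0.03
theta_point = theta_true + rng.normal(0.0, 0.02, n)  # point sampling error, std 0.02


def msd(a, b):
    """Mean-square difference between two series."""
    return float(np.mean((a - b) ** 2))


observed = msd(theta_rs, theta_point)    # measurable in practice
target = msd(theta_rs, theta_true)       # validation quantity of interest
sampling = msd(theta_point, theta_true)  # spatial sampling error

# observed should be close to target + sampling, both near 0.03**2 + 0.02**2 = 0.0013
print(observed, target + sampling)

# Eq. (4): subtracting the sampling error recovers the target
corrected = observed - sampling
print(corrected, target)
```

The naive `observed` value stays near 0.0013 even though the retrieval-only MSD is near 0.0009. This is the inflation that (4) is meant to remove.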

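TC is based on expressing the relationship between temporal anomalies in all three available soil moisture estimates (*θ*′_{RS}, *θ*′_{POINT}, and *θ*′_{LSM}) and true soil moisture anomalies as

$$\theta'_{RS} = \theta'_{TRUE} + \varepsilon_{RS} \tag{5}$$

$$\theta'_{LSM} = \theta'_{TRUE} + \varepsilon_{LSM} \tag{6}$$

$$\theta'_{POINT} = \theta'_{TRUE} + \varepsilon_{POINT} \tag{7}$$

where the *ε* terms denote time series errors relative to *θ*′_{TRUE}. Subtracting (5) and (6) from (7) removes *θ*′_{TRUE}:

$$\theta'_{POINT} - \theta'_{RS} = \varepsilon_{POINT} - \varepsilon_{RS} \tag{8}$$

$$\theta'_{POINT} - \theta'_{LSM} = \varepsilon_{POINT} - \varepsilon_{LSM} \tag{9}$$

Multiplying (8) by (9) and taking the temporal mean gives

$$\langle(\theta'_{POINT} - \theta'_{RS})(\theta'_{POINT} - \theta'_{LSM})\rangle = \langle\varepsilon_{POINT}^2\rangle - \langle\varepsilon_{POINT}\varepsilon_{LSM}\rangle - \langle\varepsilon_{POINT}\varepsilon_{RS}\rangle + \langle\varepsilon_{RS}\varepsilon_{LSM}\rangle \tag{10}$$

Assuming mutually independent errors, (10) collapses to

$$\mathrm{MSD}(\theta'_{POINT}, \theta'_{TRUE}) = \langle\varepsilon_{POINT}^2\rangle = \langle(\theta'_{POINT} - \theta'_{RS})(\theta'_{POINT} - \theta'_{LSM})\rangle \tag{11}$$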

Our goal is to validate TC-based estimates of MSD(*θ*′_{POINT}, *θ*′_{TRUE}) from (11) against independent estimates of the same quantity. These independent estimates come from ground-based soil moisture networks within data-rich watershed sites. TC-based estimates of MSD(*θ*′_{POINT}, *θ*′_{TRUE}) can then be combined with (4) to improve estimates of MSD(*θ*′_{RS}, *θ*′_{TRUE}) without access to extensive ground-based measurements. Note that the success of both inferences hinges on the as yet untested independent-error assumptions underlying (4) and (11).
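The TC estimator in (11) reduces to the mean of a product of two difference series. The sketch below uses a hypothetical helper name and assumes anomaly inputs from (1) that have already been rescaled as described above. The synthetic test series are built with unit gain and independent errors at made-up levels, so no rescaling is applied to them.

```python
import numpy as np


def tc_point_error_variance(point, rs, lsm):
    """TC estimate of MSD(θ'_POINT, θ'_TRUE) = <ε_POINT^2>, following Eq. (11).

    Inputs are anomaly series already rescaled to the variance of `point`;
    days where any of the three series is missing are dropped.
    """
    point, rs, lsm = (np.asarray(x, dtype=float) for x in (point, rs, lsm))
    ok = ~(np.isnan(point) | np.isnan(rs) | np.isnan(lsm))
    return float(np.mean((point[ok] - rs[ok]) * (point[ok] - lsm[ok])))


# Synthetic check with independent errors (hypothetical error levels).
rng = np.random.default_rng(7)
n = 200_000
truth = rng.normal(0.0, 0.04, n)
rs = truth + rng.normal(0.0, 0.03, n)
lsm = truth + rng.normal(0.0, 0.025, n)
point = truth + rng.normal(0.0, 0.02, n)

est = tc_point_error_variance(point, rs, lsm)
print(est, np.sqrt(est))  # expected near 0.02**2 = 0.0004, i.e. an RMSD near 0.02
```

In short records, the sample mean in (11) can come out slightly negative when the neglected covariance terms are not small. The sketch returns the raw value so such cases remain visible.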

## 3. Data and processing

The verification analysis described above is conducted over four U.S. Department of Agriculture (USDA) Agricultural Research Service (ARS) experimental watersheds:

- the Little River (LR), Georgia (Bosch et al. 2007);
- the Little Washita (LW), Oklahoma (Allen and Naney 1991; Cosh et al. 2006);
- the Reynolds Creek (RC), Idaho (Slaughter et al. 2001);
- the Walnut Gulch (WG), Arizona (Renard et al. 2008; Cosh et al. 2008).

As a group, they provide a range of climate, land cover, and topographic conditions under which to evaluate TC. Each watershed also contains a network of about 20 Stevens Water Hydra Probe surface (0–5 cm) soil moisture sensors installed at USDA Micronet sites. See Fig. 1 for watershed/network site locations and Table 1 for a summary of the watershed characteristics and soil moisture instrumentation. Further details are given in Jackson et al. (2010).

To obtain a reference dataset for verifying TC predictions, a Thiessen polygon approach is used to interpolate all available 1330 local solar time (LST) ground-based soil moisture measurements. The result is a single watershed-scale daily time series of *θ*_{NETWORK} (Jackson et al. 2010). *θ*_{NETWORK} represents the best available approximation of *θ*_{TRUE} for a watershed. However, small instrumental and spatial sampling errors in *θ*_{NETWORK} can still artificially inflate its MSD comparisons with other soil moisture products. To correct for this, the estimated error variance in *θ*′_{NETWORK} (〈*ε*^{2}_{NETWORK}〉) is subtracted from MSD comparisons with *θ*′_{NETWORK} to estimate the MSD versus *θ*′_{TRUE}:

$$\mathrm{MSD}(\theta'_{X}, \theta'_{TRUE}) \approx \mathrm{MSD}(\theta'_{X}, \theta'_{NETWORK}) - \langle\varepsilon_{NETWORK}^2\rangle \tag{12}$$

where *θ*′_{X} denotes any of the other anomaly products.
Based on comparisons with gravimetric soil moisture measurements obtained during field campaigns, 〈*ε*^{2}_{NETWORK}〉 is assumed to be on the order of 0.010^{2} m^{6} m^{−6} (Cosh et al. 2006; Cosh et al. 2008). In addition, multiple sets of *θ*_{POINT} time series are acquired by repeatedly selecting different individual sensor locations to represent each watershed. Only locations containing measurements for at least 50% of the days in the study period (2 February 2002 to end dates listed in Table 1) are used to represent *θ*_{POINT}.
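The construction of the benchmark can be sketched as three small helpers: the 50% data-availability screen, a Thiessen-weighted network average, and the correction in (12). This is an illustrative sketch, not the procedure of Jackson et al. (2010). The polygon area fractions are assumed to be precomputed, and handling gaps by renormalizing weights is an assumption. All function names are hypothetical. In use, `x` and `network` would be anomaly series obtained via (1).

```python
import numpy as np


def eligible_sites(obs, min_fraction=0.5):
    """Boolean mask of sites reporting on at least `min_fraction` of days (50% rule)."""
    return np.mean(~np.isnan(obs), axis=0) >= min_fraction


def network_mean(obs, weights):
    """Thiessen-weighted daily network average.

    obs     : (n_days, n_sites) array, NaN where a sensor did not report
    weights : (n_sites,) polygon-area fractions, assumed precomputed
    On days with missing sensors the weights are renormalised over the sensors
    that did report (an assumption; the text does not say how gaps are handled).
    """
    obs = np.asarray(obs, dtype=float)
    w = np.broadcast_to(np.asarray(weights, dtype=float), obs.shape)
    valid = ~np.isnan(obs)
    wsum = np.where(valid, w, 0.0).sum(axis=1)
    num = np.where(valid, obs * w, 0.0).sum(axis=1)
    out = np.full(obs.shape[0], np.nan)
    has = wsum > 0
    out[has] = num[has] / wsum[has]
    return out


def msd_vs_true(x, network, eps_network_var=0.010**2):
    """Eq. (12): MSD(x, true) ≈ MSD(x, network) − <ε²_NETWORK> on shared valid days."""
    x = np.asarray(x, dtype=float)
    network = np.asarray(network, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(network))
    return float(np.mean((x[ok] - network[ok]) ** 2)) - eps_network_var
```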

Our *θ*_{RS} estimates are based on 0.25° single-channel algorithm (Jackson et al. 2010) retrievals. These retrievals are derived from Advanced Microwave Scanning Radiometer for EOS (AMSR-E) 10.6-GHz brightness temperature observations. Only data from the 1330 LST AMSR-E overpass are considered. Time series of *θ*_{RS} for each watershed are extracted from the 0.25° pixel most closely matching that watershed. Validation work has demonstrated that the measurement depth of these retrievals is consistent with that of the ground measurements described above (Jackson et al. 2010).

Our *θ*_{LSM} estimates are based on 0–5-cm surface soil moisture predictions from a 0.125° Noah land surface model (LSM; Mitchell 2009) simulation run on a 30-min time step. The simulation is driven by the North American Land Data Assimilation System (NLDAS) forcing dataset (Cosgrove et al. 2003). It is parameterized using the Food and Agriculture Organization (FAO) world soil classification with Reynolds et al. (2000) sand/clay fractions and the 1-km global land cover classification of Hansen et al. (2000). Soil and vegetation parameter lookup tables are based on the Noah implementation for the NLDAS project (Robock et al. 2003). Soil moisture states are spun up for 18 months prior to the start of the analysis. Surface values for the multiple Noah 0.125° pixels corresponding to each watershed are then spatially averaged to obtain a single *θ*_{LSM} time series for each watershed.

## 4. Results

As stated in section 1, our goal is the estimation of MSD(*θ*′_{POINT}, *θ*′_{TRUE}) based solely on the availability of *θ*′_{RS}, *θ*′_{LSM}, and *θ*′_{POINT}. We compare TC-based estimates of MSD(*θ*′_{POINT}, *θ*′_{TRUE}) from (11) with comparable statistics obtained from extensive ground-based measurements. This comparison allows us to test the assumption of mutually independent errors that underlies the approach.

Figure 2 provides such a comparison by summarizing TC results for the four watersheds described above. Each point represents the use of a single sensor in a given watershed as *θ*′_{POINT}. The plot is expressed in terms of the square root of MSD (RMSD). It compares RMSD(*θ*′_{POINT}, *θ*′_{TRUE}) values calculated using *θ*′_{NETWORK} and (12) with TC-based estimates acquired by taking the square root of (11). The benchmark values of RMSD(*θ*′_{POINT}, *θ*′_{TRUE}) are obtained independently from (12). Relative to these benchmarks, TC estimates of RMSD(*θ*′_{POINT}, *θ*′_{TRUE}) that use only *θ*′_{RS}, *θ*′_{LSM}, and *θ*′_{POINT} data are nearly unbiased and have an RMSE of 0.0059 m^{3} m^{−3}. Despite the contrasting conditions among the four watersheds, the TC approach appears to work equally well in all of them. Problems with the underlying TC assumption of mutually independent errors would manifest themselves as nonzero covariance terms on the right-hand side of (10) and induce bias in (11) relative to (12). The lack of any apparent bias (and/or extensive scatter) in Fig. 2 suggests that these assumptions have been adequately met within our three collocated *θ*′ estimates.

Values of RMSD(*θ*′_{POINT}, *θ*′_{TRUE}) in Fig. 2 are based on 〈*ε*^{2}_{NETWORK}〉 = 0.010^{2} m^{6} m^{−6} in (12). However, results are generally robust to assumptions concerning 〈*ε*^{2}_{NETWORK}〉. Assuming 〈*ε*^{2}_{NETWORK}〉 = 0, for instance, leads to only a slight increase in RMSE (from 0.0059 to 0.0062 m^{3} m^{−3}). In addition, *N* = 31 days in (1) dictates that soil moisture anomalies are calculated relative to a seasonally varying climatology. In contrast, selecting *N* = 365 days means that anomalies are calculated relative to a fixed soil moisture mean across the entire seasonal cycle. Recreating Fig. 2 for *N* = 365 days (Fig. 3) increases the RMSE from 0.0059 to 0.0089 m^{3} m^{−3}, suggesting that TC works better when variations in soil moisture climatology are taken into account.

Another relevant issue is the required accuracy of the LSM. LSM accuracy can decrease through, e.g., excessive coarsening of the spatial resolution or degradation of the model physics. Such a decrease increases *ε*_{LSM* and therefore the sampling uncertainty in the 〈*ε*_{LSM}*ε*_{RS}〉 and 〈*ε*_{LSM}*ε*_{POINT}〉 covariance terms on the right-hand side of (10). Since (11) is based on neglecting these terms, this noise introduces more random error into TC estimates. To examine the magnitude of this impact, the Noah LSM is replaced with a simple antecedent precipitation index, where *θ*_{LSM} is generated via

$$\theta_{LSM,i} = \gamma\,\theta_{LSM,i-1} + P_i \tag{13}$$

where *γ* is a daily loss coefficient (0 < *γ* < 1)
and *P*_{i} is a watershed-scale daily rainfall accumulation based on the Tropical Rainfall Measuring Mission (TRMM) 0.25° 3B42 rainfall product. Relative to Noah, (13) represents a degradation in three respects:

- resolution, from 0.125° to 0.25°;
- forcing data, from gauge-based NLDAS to satellite-based TRMM 3B42 rainfall;
- the quality of the LSM physics.

Nevertheless, duplicating Fig. 2 for *θ*_{LSM} from (13) (not shown) produces only a modest increase in the RMSE of TC-based RMSD(*θ*′_{POINT}, *θ*′_{TRUE}) (from 0.0059 to 0.0082 m^{3} m^{−3}). This implies that the approach is relatively tolerant of variations in LSM performance.
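The antecedent precipitation index recursion in (13) can be sketched as follows. The coefficient value `gamma=0.85` is a placeholder assumption, since the text does not give one. The function name is also hypothetical.

```python
import numpy as np


def antecedent_precipitation_index(precip, gamma=0.85, api0=0.0):
    """API recursion of Eq. (13): API_i = gamma * API_{i-1} + P_i.

    precip : daily watershed-scale rainfall accumulations (e.g., mm/day)
    gamma  : daily loss coefficient, 0 < gamma < 1 (0.85 is a placeholder,
             not a value taken from the text)
    api0   : initial index value
    """
    precip = np.asarray(precip, dtype=float)
    api = np.empty_like(precip)
    prev = api0
    for i, p in enumerate(precip):
        prev = gamma * prev + p  # exponential decay of storage plus new rainfall
        api[i] = prev
    return api


print(antecedent_precipitation_index([10.0, 0.0, 0.0, 5.0], gamma=0.5))
# yields 10, 5, 2.5, 6.25
```

Before TC is applied, *θ*′_{LSM} is rescaled to the temporal variance of *θ*′_{POINT} (section 2). The index's rainfall units therefore do not need to be converted to volumetric soil moisture in this sketch.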

By combining TC-based estimates of MSD(*θ*′_{POINT}, *θ*′_{TRUE}) from Fig. 2 with the observable quantity MSD(*θ*′_{RS}, *θ*′_{POINT}), RMSD(*θ*′_{RS}, *θ*′_{TRUE}) can be estimated as the square root of (4). Figure 4b shows the relationship between such estimates and actual values of RMSD(*θ*′_{RS}, *θ*′_{TRUE}) obtained from (12). For comparison, Fig. 4a shows the same relationship but assumes MSD(*θ*′_{POINT}, *θ*′_{TRUE}) = 0. Due to sub-watershed-scale soil moisture variability, the comparison of a single point-scale ground observation with a footprint-scale AMSR-E retrieval leads to an inflated estimate of RMSD(*θ*′_{RS}, *θ*′_{TRUE}) (Fig. 4a). Estimating MSD(*θ*′_{POINT}, *θ*′_{TRUE}) via (11) and inserting it into (4) improves our ability to recover RMSD(*θ*′_{RS}, *θ*′_{TRUE}) and thus validate *θ*_{RS} (Fig. 4b).
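Taking the square root of (4) is simple arithmetic. The sketch below uses hypothetical numbers rather than values read from Fig. 4. The clamp at zero is an assumption of this sketch. It covers cases where sampling noise makes the estimated sampling error exceed the observed MSD.

```python
import math


def corrected_rmsd_rs(rmsd_rs_point, rmsd_point_true_tc):
    """Square root of Eq. (4): RMSD(θ'_RS, θ'_TRUE) from the observable
    RMSD(θ'_RS, θ'_POINT) and a TC estimate of RMSD(θ'_POINT, θ'_TRUE)."""
    msd = rmsd_rs_point**2 - rmsd_point_true_tc**2
    # With short records the difference can turn negative; clamping to zero is an
    # assumption of this sketch, not a step described in the text.
    return math.sqrt(max(msd, 0.0))


# Hypothetical numbers: a naive point-based RMSD of 0.05 m3 m-3 and a TC sampling
# error of 0.03 m3 m-3 imply a retrieval RMSD of about 0.04 m3 m-3.
print(round(corrected_rmsd_rs(0.05, 0.03), 4))  # 0.04
```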

## 5. Summary and discussion

The NASA SMAP and ESA SMOS missions are faced with RMSE-based accuracy goals for their surface soil moisture products. For most areas, the sole basis for demonstrating such accuracies is comparison with ground-based soil moisture networks that provide a very small number of observations per satellite footprint. Consequently, retrieval error estimates from such comparisons are spuriously inflated. The TC approach described here requires only an additional LSM simulation and offers a viable means of addressing this problem. In particular, results in Figs. 2 and 3 demonstrate that a TC-based procedure can estimate the spatial sampling errors incurred when low-density soil moisture observations are used to obtain footprint-scale soil moisture averages. When combined with (4), the accurate specification of these errors provides a robust basis for removing the positive bias in RMSD(*θ*′_{RS}, *θ*′_{POINT}) associated with ground-based spatial sampling errors (Fig. 4). That is, application of the TC approach enhances our ability to validate remotely sensed soil moisture estimates using existing low-density soil moisture networks.

The presented approach is limited to recovering information about the accuracy of soil moisture temporal anomalies obtained from (1). It therefore cannot predict the long-term bias of a particular point-scale observation site relative to a footprint-scale average. However, such biases can potentially be estimated based on knowledge of local land surface conditions (Grayson and Western 1998) and/or the application of a spatially distributed LSM (Crow et al. 2005). In addition, most data assimilation systems require the rescaling of soil moisture retrieval products into a particular LSM's unique climatology prior to analysis. In such cases, the information content of the remote sensing retrievals is based solely on their representation of anomalies (Koster et al. 2009). Consequently, by validating the representation of soil moisture anomalies in remote sensing products, a TC approach arguably addresses the most critical aspect of the problem.

## Acknowledgments

This research was NASA-supported through Wade Crow’s membership on the NASA SMAP mission science definition team.

## REFERENCES


## Footnotes

* Current affiliation: Department of Hydrology and GeoEnvironmental Sciences, VU University Amsterdam, Amsterdam, Netherlands

*Corresponding author address:* Wade T. Crow, USDA Agricultural Research Service, Hydrology and Remote Sensing Laboratory, Rm. 104, Bldg. 007, BARC-W, Beltsville, MD 20705. Email: wade.crow@ars.usda.gov