## 1. Introduction

Soil moisture plays a unique role in land–atmosphere interactions by coupling the water and energy cycles through its control on evapotranspiration and runoff and consequently influencing boundary layer cloud formation and precipitation (e.g., Schär et al. 1999; Betts 2004; Koster et al. 2004). Although it is well recognized that accurate knowledge of real-time soil moisture conditions will enhance hydrological forecasts and potentially contribute to improving numerical weather forecasting, such data have not been collected at continental scales by ground networks. The large variability in surface soil moisture across different spatial scales presents a challenge for such in situ observation networks, like the Oklahoma Mesonet (Vinnikov et al. 1999; Illston et al. 2004; Reichle et al. 2004; Prigent et al. 2005; Gao et al. 2006). Therefore, in most cases, soil moisture remains a model-estimated variable with its accuracy depending on the quality of the forcing data and model physics.

The influence of soil moisture in weather prediction was highlighted in Beljaars et al. (1996) and Viterbo and Betts (1999), where the sensitivity of short- and medium-range precipitation forecasts to land surface parameterization and modeled soil moisture anomalies was analyzed. Koster and Suarez (2001, 2003) analyzed the role of soil moisture in seasonal forecasts of summertime precipitation and found that the predictability increased in selected regions when accurate initial soil moisture states were used. In the framework of the World Climate Research Program (WCRP) Global Energy and Water Cycle Experiment (GEWEX) Global Land–Atmosphere Coupling Experiment (GLACE), Koster et al. (2004) identified areas of strong coupling between soil moisture and precipitation through a multimodel analysis.

The sensitivity of surface emissivity at microwave frequencies to surface soil wetness provides great potential for observing soil moisture by remote sensing, with lower frequencies having greater sensitivity. Operational spaceborne passive microwave sensors extend this potential to large scales up to global coverage for land areas with suitable land cover (Jackson and Schmugge 1989; Jackson 1993; Owe et al. 1999). Historically, the lowest available frequency was the 6.6-GHz channel (C band) from the Scanning Multichannel Microwave Radiometer (SMMR) that operated from 1978 to 1987. After 1987, the lowest frequency was 19.35 GHz on the Special Sensor Microwave Imager (SSM/I) until the 10.7-GHz channel (X band) on the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) became available in 1997. TMI was followed by the Advanced Microwave Scanning Radiometer on the National Aeronautics and Space Administration (NASA) *Aqua* platform (AMSR-E) (with both C band and X band), which was launched in 2002. The European Space Agency’s (ESA) Soil Moisture and Ocean Salinity mission (SMOS) (Kerr et al. 2001), with a planned launch in 2007, will provide L-band observations. The effective spatial resolution of TMI and AMSR-E is approximately 50 km. However, Vinnikov et al. (1996) and Entin et al. (2000) have shown that about half of the variance in surface soil moisture in the Northern Hemisphere midlatitudes comes from atmospheric processes at scales of ∼500 km, implying that satellite-derived surface soil moisture products should be useful in both hydrological and meteorological applications.

A number of studies have demonstrated the potential of assimilating satellite-derived soil moisture into land surface models (LSMs) (e.g., Walker and Houser 2001; Reichle et al. 2001; Crow et al. 2005). The promising results from these studies were obtained using high-resolution airborne data from field experiments over limited domains or from synthetic observations, which do not fully represent the statistical characteristics of operational retrievals from spaceborne sensors. Recent studies (e.g., Reichle et al. 2004; Drusch et al. 2005; Gao et al. 2006) indicate that large systematic differences exist between satellite-retrieved soil moisture and in situ observations or model simulations. These differences are partly due to a number of factors, such as different effective depths of retrieved and modeled soil moisture (Wilker et al. 2006); the confounding influences of subgrid elements, such as roads, houses, trees, small water bodies, and so forth within the satellite footprint that are ignored by the retrieval algorithms (Gao et al. 2006); and the recognition that each LSM has its own “climatology” (Koster and Milly 1997). However, Reichle and Koster (2005) showed for SMMR retrievals and soil moisture estimates from the NASA Catchment LSM that if the systematic differences between model and satellite-based soil moisture are accounted for, then estimates of soil moisture from assimilation are superior to either the satellite or the model alone when compared to in situ observations.

Observation operators are used in data assimilation systems to relate the quantities observed by satellites to the state variable in the model. If the brightness temperature is assimilated, the observation operator is a (forward) radiative transfer model. If the satellite-derived soil moisture is assimilated, the observation operator is a transfer function that corrects the bias and systematic differences between the retrieved and modeled soil moisture. Reichle and Koster (2005) and Drusch et al. (2005) developed observation operators for the assimilation of soil moisture by matching the two cumulative distribution functions (CDFs) over a comparison period.

One assumption of CDF matching is that the ranked values of the two datasets are uniquely related. In practice this usually does not hold due to estimation uncertainties in both the retrieved and observed/modeled soil moisture, resulting in a *statistical* rather than a *perfect* dependency. An alternative approach is to construct statistical observation operators by fitting a joint probabilistic model to the two datasets, which addresses both the uncertainty within the datasets and the dependency between them.

In this study we use copula-based probability distributions (Nelsen 1999; De Michele and Salvadori 2003; Favre et al. 2004) to develop the observation operators for two soil moisture datasets from satellite retrievals and three LSMs. These data sources are discussed in detail in section 3. Seasonal observation operators are developed for each remote sensing–model combination, which allows an evaluation of the usefulness of satellite data for assimilation on a seasonal basis. Because of seasonal effects from vegetation and seasonal variations in soil moisture profiles and dynamics, one expects the model and retrieval errors to vary within the year, and the results confirm this. This study provides the first comparison of soil moisture observation operators developed from different satellite-based retrievals and land surface models and an evaluation of the usefulness of the satellite information to improve the modeled soil moisture estimates.

Section 2 describes the copula-based probability distributions and their conditional simulations, which are used for the statistical soil moisture operators. Section 3 introduces the satellite and model data used in the study and their comparison to in situ measurements from the Oklahoma Mesonet. In section 4, the resulting soil moisture operators are presented and compared, including a comparison to the approach of CDF matching (Reichle and Koster 2005; Drusch et al. 2005).

## 2. Copula-based joint probability distribution

Consider two random variables $X$ and $Y$, with their joint CDF $F_{XY}(x, y)$ and marginal CDFs $F_X(x)$ and $F_Y(y)$. (Following the convention in the probability theory literature, an uppercase letter denotes a random variable and a lowercase letter denotes the value taken by the corresponding random variable.) We also define two new variables $u$ and $\upsilon$ as follows: $u = F_X(x)$ and $\upsilon = F_Y(y)$. Then according to Sklar’s theorem (Sklar 1959), there exists a unique 2D function $C$, such that

$$F_{XY}(x, y) = C(u, \upsilon), \tag{1}$$

where $C$ is the so-called copula function. From their definition, it is seen that both $u$ and $\upsilon$ are uniformly distributed on the space [0, 1] (Nelsen 1999), and thus the copula function $C$ is indeed a 2D uniform CDF.

In the estimation of $F_{XY}$, $F_X$ and $F_Y$ describe the marginal distributions of $X$ and $Y$, while $C$ represents the dependency structure between $X$ and $Y$. This allows for the estimation of the joint distribution to be divided between estimating the individual marginal CDFs and estimating the copula (dependency) function. There are a large number of parametric univariate distributions available to represent the marginal distributions and abundant choices for the copula function (Nelsen 1999), unlike the situation if the joint distribution $F_{XY}$ was estimated directly from the data. In this latter case, the only available choice would be the multivariate normal distribution. Thus, the proposed copula-based approach extends the probability models to any combination of known univariate distributions and copulas, which greatly enhances our ability and flexibility in constructing a joint distribution from the data.

For copulas of the Archimedean family, the copula function takes the form

$$C(u, \upsilon) = \phi^{-1}[\phi(u) + \phi(\upsilon)], \tag{2}$$

where $\phi$ is called the copula generator. The Gumbel copula generator is $\phi(u) = [-\log(u)]^{\delta+1}$, the Clayton is $\phi(u) = u^{-\delta} - 1$, and the Frank is $\phi(u) = -\log[(e^{-\delta u} - 1)/(e^{-\delta} - 1)]$. All three are one-parameter ($\delta$) copulas, where the strength of dependency between the two variables increases as $\delta$ increases. For a copula in this family, the relationship between the generator $\phi$ and Kendall’s rank correlation coefficient $\tau$ is

$$\tau = 1 + 4\int_0^1 \frac{\phi(u)}{\phi'(u)}\,du, \tag{3}$$

where $\phi'(u)$ is the derivative of $\phi(u)$ with respect to $u$.
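As a worked check (ours, not part of the original derivation), the relationship $\tau = 1 + 4\int_0^1 [\phi(u)/\phi'(u)]\,du$ can be evaluated in closed form for two of the three generators, assuming the generators exactly as written above, so that the Gumbel exponent is $\delta + 1$. The Frank copula has no elementary closed form and is omitted here.

$$
\begin{aligned}
\text{Gumbel:}\quad \frac{\phi(u)}{\phi'(u)} &= \frac{u\log u}{\delta+1}, &
\tau &= 1 + \frac{4}{\delta+1}\int_0^1 u\log u\,du = 1 - \frac{1}{\delta+1} = \frac{\delta}{\delta+1},\\[4pt]
\text{Clayton:}\quad \frac{\phi(u)}{\phi'(u)} &= \frac{u^{\delta+1}-u}{\delta}, &
\tau &= 1 + \frac{4}{\delta}\left(\frac{1}{\delta+2}-\frac{1}{2}\right) = \frac{\delta}{\delta+2}.
\end{aligned}
$$

The Gumbel line uses $\int_0^1 u\log u\,du = -1/4$. In both cases $\tau$ increases monotonically with $\delta$ and approaches 1 as $\delta \to \infty$, consistent with the statement that the strength of dependency increases with $\delta$.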

Kendall’s $\tau$ is a nonparametric measure of dependency between two (matched) random variables and is defined as

$$\tau = \frac{n_c - n_d}{n(n-1)/2}, \tag{4}$$

where $n$ is the sample size, $n_c$ is the number of concordant pairs in the sample, and $n_d$ is the number of discordant pairs. All $n$ sample data are exhaustively pairwise compared, resulting in $n(n-1)/2$ comparisons. A comparison between the $i$th and $j$th sample points is “concordant” if $(x_i - x_j)(y_i - y_j) > 0$, that is, if they vary in the same direction, and “discordant” otherwise. Confidence limits for $\tau$ can be determined (Best and Gipps 1974).
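The pair-counting definition of Kendall’s τ can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study; the function name is hypothetical, and tied pairs are counted as neither concordant nor discordant, which is an assumption that matters only when ties occur.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Sample Kendall's tau from concordant and discordant pair counts."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two matched samples are required")
    n_c = n_d = 0
    # Exhaustive pairwise comparison: n(n - 1)/2 distinct pairs.
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            n_c += 1          # concordant: both vary in the same direction
        elif s < 0:
            n_d += 1          # discordant: they vary in opposite directions
        # s == 0 (a tie) is skipped; this is an assumption of the sketch
    return (n_c - n_d) / (n * (n - 1) / 2)
```

The double loop costs O(n²) comparisons, which is acceptable for seasonal series of a few hundred daily values.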

As stated above, fitting a copula-based joint distribution is done in two steps: (i) fitting marginal (univariate) distributions to each variable, using appropriate models and techniques, which are well described in the statistical literature and (ii) fitting the copula, which requires estimating the copula function *C* in Eq. (2). For an Archimedean copula, the algebraic relationship between Kendall’s *τ* and the generator in Eq. (3) reduces the fitting to computing Kendall’s *τ* from the data and solving Eq. (3) for *δ*.
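Step (ii) can be illustrated for the two generators whose τ–δ relationship has an elementary inverse. The sketch below assumes the generators as written in this section (Gumbel exponent δ + 1, Clayton generator u^(−δ) − 1), for which Eq. (3) gives τ = δ/(δ + 1) and τ = δ/(δ + 2), respectively. The function name is hypothetical, and the Frank copula is left out because its relationship has to be solved numerically.

```python
def delta_from_tau(tau, family):
    """Invert the tau-delta relationship for a one-parameter Archimedean copula.

    Assumes the generator forms used in this section:
      Gumbel : phi(u) = (-log u)**(delta + 1)  ->  tau = delta / (delta + 1)
      Clayton: phi(u) = u**(-delta) - 1        ->  tau = delta / (delta + 2)
    """
    if not 0.0 <= tau < 1.0:
        raise ValueError("this sketch handles 0 <= tau < 1 only")
    if family == "gumbel":
        return tau / (1.0 - tau)
    if family == "clayton":
        return 2.0 * tau / (1.0 - tau)
    raise ValueError("family must be 'gumbel' or 'clayton'")
```

For example, a sample τ of 0.5 corresponds to δ = 1 for the Gumbel copula and δ = 2 for the Clayton copula under these parameterizations.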

Once the copula-based joint distribution is estimated (that is, once $F_X(x)$, $F_Y(y)$, and $C(u, \upsilon)$ are obtained), both unconditional and conditional random samples from this distribution can be generated through Monte Carlo simulations. The unconditional simulation of $(x, y)$ is divided into three steps: (i) generate random samples of $u$ uniformly from [0, 1], remembering that $u = F_X(x)$; (ii) given a sample value of $u$, generate a random sample of $\upsilon \mid u$ using the inverse conditional CDF $C_{V|U}^{-1}(\upsilon \mid u)$, where the conditional CDF is simply $C_{V|U}(\upsilon \mid u) = (\partial/\partial u)\,C(u, \upsilon)$; and (iii) generate the corresponding $x$ and $y$ variates by inverting their marginal CDFs from $u$ and $\upsilon$: $x = F_X^{-1}(u)$ and $y = F_Y^{-1}(\upsilon)$. The conditional simulation of $y \mid x$ is even simpler: (i) compute $u = F_X(x)$; (ii) draw random samples of $\upsilon \mid u$ as described above; and (iii) invert from $\upsilon$ to obtain $y$: $y = F_Y^{-1}(\upsilon)$.
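The conditional simulation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in the study: it uses the Gumbel copula with θ = δ + 1, it inverts the conditional CDF numerically by bisection (the text does not say how the inversion is carried out), and the normal marginals in the usage lines are placeholders rather than the fitted distributions. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cond_cdf(v, u, theta):
    """Gumbel conditional CDF C_{V|U}(v | u) = dC(u, v)/du, theta = delta + 1 >= 1.

    Requires 0 < u < 1 and 0 < v <= 1.
    """
    a = -math.log(u)
    b = -math.log(v)
    s = a ** theta + b ** theta
    c = math.exp(-s ** (1.0 / theta))            # copula value C(u, v)
    return c * s ** (1.0 / theta - 1.0) * a ** (theta - 1.0) / u


def sample_v_given_u(u, theta, rng, iters=60):
    """Draw v | u: pick p ~ U(0, 1) and solve C_{V|U}(v | u) = p by bisection."""
    p = rng.random()
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cond_cdf(mid, u, theta) < p:          # conditional CDF increases with v
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_y_given_x(x, cdf_x, inv_cdf_y, theta, n, seed=0):
    """Conditional simulation of y | x following steps (i)-(iii) of the text."""
    rng = random.Random(seed)
    u = cdf_x(x)                                              # (i)   u = F_X(x)
    vs = [sample_v_given_u(u, theta, rng) for _ in range(n)]  # (ii)  v | u
    return [inv_cdf_y(v) for v in vs]                         # (iii) y = F_Y^{-1}(v)


if __name__ == "__main__":
    # Placeholder marginals with hypothetical parameters (not those of Table 2).
    f_x = NormalDist(mu=20.0, sigma=6.0)   # retrieved soil moisture (%)
    f_y = NormalDist(mu=0.0, sigma=4.0)    # retrieved minus modeled (%)
    ys = simulate_y_given_x(25.0, f_x.cdf, f_y.inv_cdf, theta=2.0, n=1000)
    print(sum(ys) / len(ys))
```

Bisection is chosen here only because the Gumbel conditional CDF has no closed-form inverse and is monotone in υ; any bracketing root finder would serve. For θ = 1 the conditional CDF reduces to υ, which recovers independent sampling.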

The conditional simulation is especially important for this investigation because we need the conditional probability distribution of the modeled soil moisture given a satellite-retrieved value. Such conditional random samples and their statistics are also useful in a Bayesian estimation system, where this conditional distribution would be the posterior distribution of modeled soil moisture given a satellite retrieval.

## 3. Data sources for the analysis

This section briefly describes the data sources used in estimating the copula joint distributions and the resulting statistical soil moisture observation operators. The satellite data are the retrievals from the TMI (Gao et al. 2006) and AMSR-E (Njoku et al. 2003), and the LSM data are from simulations of the Variable Infiltration Capacity (VIC) model (Liang et al. 1994) within the North American Land Data Assimilation System (NLDAS) project (Mitchell et al. 2004), the 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40), and the National Centers for Environmental Prediction (NCEP) North American Regional Reanalysis (NARR). More information on these data and modeling sources is provided in the following sections.

### a. Satellite measurements and retrieved soil moisture

TRMM/TMI measures microwave emissions from the earth between 38°S and 38°N. It flies in a non-sun-synchronous orbit, completing 16 orbits per day (more at http://trmm.gsfc.nasa.gov/). The brightness temperatures from November 1997 to the present are available through the NASA Goddard Space Flight Center (GSFC) Distributed Active Archive Center (DAAC), with the period January 1998 to the end of December 2003 used in this study. The TMI soil moisture is retrieved using the Land Surface Microwave Emission Model (LSMEM) (Drusch et al. 2001; Gao et al. 2004) from horizontally polarized 10.65-GHz (X band) brightness temperatures, and the LSMEM provides the retrieved product at a ⅛° spatial resolution. The retrieval approach is described in Gao et al. (2006), to which the reader is directed for more detailed information. TMI-based soil moisture was also retrieved using an alternative retrieval algorithm based on the NASA AMSR-E algorithm described by Njoku et al. (2003), which was modified for use with the TMI measurements.

The second satellite sensor is AMSR-E. AMSR-E shares the same frequencies as TMI with an additional C-band radiometer, which was not used in this study due to radio frequency interference (RFI). AMSR-E operates in a polar, sun-synchronous orbit with equator crossings at 1:30 a.m./p.m. (Njoku et al. 2003). Level 2B AMSR-E-retrieved soil moisture was obtained through the NASA U.S. Geological Survey (USGS) Land Process Distributed Active Archive Center (LP DAAC) (http://edcimswww.cr.usgs.gov/pub/imswelcome/) at a spatial resolution of ¼° since May 2002. The retrieved AMSR-E soil moisture is based on the retrieval algorithm version B02 originally developed at the NASA Jet Propulsion Laboratory (JPL) (Njoku 2004) that uses multifrequency, multipolarization measurements within a statistical framework and whose coefficients come from regression analysis against simulations. The retrieved values are still considered preliminary and unvalidated.

Table 1 lists the major instrument characteristics of the two sensors. Results presented in section 4 for the two satellite soil moisture products reveal that the standard NASA retrievals have a significantly suppressed dynamic range in comparison to the LSMEM retrievals. The LSMEM retrieval algorithm was also applied to the AMSR-E 10.65-GHz brightness temperature, resulting in both retrieval algorithms being used on both TMI and AMSR-E data. A comparison of the retrieved soil moisture from the two algorithms for the two radiometers is presented in Fig. 1, where the reader can note the different dynamic ranges from the algorithms. The comparisons are based on daily averaged soil moisture values averaged over 30 Oklahoma Mesonet sites. The sites were chosen by Gao et al. (2006) as those that appear to have little or no instrumental error when checked for data quality. Apart from this check for obvious instrumental errors, no other criterion was used for screening out mesonet sites. Gao et al. (2006) compute the correlation between the LSMEM retrievals from TMI and Oklahoma Mesonet measured soil moisture, by season, and the reader is referred to that study for more information on the assessment of the LSMEM retrievals.

On the basis of Fig. 1 and related analyses (not shown), which show that each retrieval algorithm performs similarly regardless of the sensor (TMI or AMSR-E), we decided to pool the retrievals together based on retrieval algorithm. This resulted in a combined TMI/AMSR-E LSMEM-retrieved soil moisture dataset and a corresponding one for the NASA/JPL–based algorithm, referred to later as “LSMEM soil moisture” and “NASA soil moisture,” respectively.

The retrieved soil moisture series are compared to their difference from the daily averaged soil moisture measurements across Oklahoma in Fig. 2 (LSMEM algorithm) and Fig. 3 (NASA algorithm). The field measurements are from the 30 Oklahoma Mesonet (http://www.mesonet.org) monitoring sites described above for Fig. 1 and are averaged spatially to provide an Oklahoma-scale average. Daily averages for TMI were computed from the retrieved values at its overpass times and for AMSR-E from its afternoon overpass time. The mesonet soil moisture is measured at 5-cm depth (perhaps reflecting the average moisture content in roughly the 0–10-cm layer), while TMI and AMSR-E 10.65-GHz brightness temperatures are sensitive to soil moisture variations in the top ∼1 cm. LSMEM-derived soil moisture demonstrates a strong correlation with the soil moisture difference (correlation coefficients for the four seasons are 0.88, 0.71, 0.68, and 0.88), whereas the correlation is weaker for NASA-derived soil moisture (correlation coefficients for the four seasons are 0.32, −0.41, −0.19, and 0.64), especially in summer and fall.

### b. Land surface model–derived soil moisture fields

As stated above, three different model-derived soil moisture fields were utilized in the study, and the fields are briefly described here. The first is surface (10 cm) soil moisture from the VIC model simulations from the NLDAS at an hourly time step and at ⅛° spatial resolution. The second model-derived soil moisture field comes from the top layer of the ERA-40 model (Simmons and Gibson 2000). ERA-40 provides the top 7-cm layer soil moisture globally at a spatial resolution of 120 km with analyses at 0000, 0600, 1200, and 1800 UTC. The third model-derived soil moisture product is from NCEP’s NARR, which utilizes their Noah model (see Mitchell et al. 2004) and provides the top 7-cm layer soil moisture at a 3-hourly time step and a 32-km resolution. All model soil moisture data are averaged over the 30 mesonet sites and aggregated to daily values for the comparisons with TMI. For comparison of VIC to AMSR-E retrievals, its 1800 UTC values are used. Six years (1998–2003) of data from all models were collected except for the ERA-40 output, which is not available for 2003.

The surface layer soil moisture is fairly similar among the models (7–10-cm depth) and the Oklahoma Mesonet upper layer (5-cm depth), so the comparison and preparation of the training data for generating the soil moisture observation operators is the same as in section 3a. In a manner similar to Figs. 2 and 3, Fig. 4 presents a comparison between satellite-retrieved soil moisture and their difference from model-derived soil moisture for the summer [June–August (JJA)] season. The comparisons are for 1998 to 2003, except for ERA-40, which covers 1998–2002. The two satellite retrieval algorithms, LSMEM and NASA, are plotted separately, with the left panels for the LSMEM retrievals and the right panels for the NASA retrievals. The top panels show the comparisons with the VIC land surface model. The most significant feature in the NASA retrievals is the suppressed dynamic range in retrieved soil moisture. The results for ERA-40 in the two bottom panels are similar to those for VIC. The two panels in the middle are for the NARR model and show larger scatter. Again, the suppressed dynamic range in NASA retrievals can be seen. The comparisons for other seasons (not shown) yield similar results.

## 4. Results from fitting the soil moisture operators

This section describes the results from fitting the copula function and joint distributions to the soil moisture series and the generation of the soil moisture observation operators. Following Drusch et al. (2005), the soil moisture operator provides the correction, or difference, between a satellite-retrieved soil moisture value and the modeled value, given a satellite retrieval, and can be estimated as the conditional distribution of this difference, given a satellite retrieval, from the joint distribution. Thus, accurate fitting of the copula and joint distribution to the datasets that consist of the satellite retrievals and their difference from the modeled soil moisture is a critical step. This process is presented in the next subsection for the VIC-derived soil moisture and satellite-retrieved soil moisture from LSMEM and NASA. The study considered six potential operators [the two satellite-based datasets (LSMEM and NASA) and three model-based datasets]. The purpose of considering all six is to evaluate the extent to which satellite observations may potentially provide useful information for assimilation into land surface models. The results for the other combinations are briefly discussed in section 4b. Further discussions about the operators are made in section 5.

### a. Estimating the copula function and joint distribution

Estimating the soil moisture operators consists of fitting marginal distributions to the individual soil moisture series, fitting the copula function and therefore the joint distribution, and generating conditional distributions given a satellite soil moisture retrieval. Proper fitting of each marginal distribution is important because they are used to map the copula-based joint distributions from their CDFs to soil moisture. The fitting of each distribution is done independently, which allows maximum flexibility in choosing the most appropriate distribution. From preliminary analysis of the data, a gamma distribution was chosen to represent the satellite-retrieved soil moisture from both the LSMEM and NASA algorithms. A gamma distribution also fitted well the soil moisture differences between the LSMEM and the LSM estimates (VIC, ERA-40, and NARR). For the differences between the NASA-based algorithm estimates and the LSM models, a normal distribution is selected as being most appropriate. The parameters of the fitted marginal distributions are presented in Table 2. The marginal CDF using the fitted parameters compared very closely to the empirical CDFs (not shown).
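As an illustration of fitting a gamma marginal, the sketch below estimates the shape and scale by the method of moments. The estimation technique is not specified in the text, so this choice is an assumption made for brevity (maximum likelihood would be an equally plausible alternative), and the function name is hypothetical.

```python
import statistics


def gamma_moments(sample):
    """Method-of-moments estimates (shape, scale) for a gamma distribution.

    Uses mean = shape * scale and variance = shape * scale**2.
    This estimator is an assumption of the sketch, not the study's method.
    """
    m = statistics.fmean(sample)
    v = statistics.variance(sample)        # sample variance, needs >= 2 values
    if m <= 0.0 or v <= 0.0:
        raise ValueError("gamma fit needs a positive mean and variance")
    return m * m / v, v / m
```

Because the gamma distribution is defined for positive values only, a series with a nonpositive mean is rejected here rather than fitted.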

Different types of copula represent different dependency structures. Within the bivariate copula function, the scale of dependency is described by its parameter *δ,* which is transformed from Kendall’s *τ* according to Eq. (3). Knowing the parameter *δ* for a particular copula, the joint distribution can be determined and the marginal distributions simulated, as described in section 2. Various copula-based joint distributions should be compared with the empirical joint distributions such that the type of copula whose dependency structure best characterizes the training data will be selected. Figure 5 shows an example of joint distributions of empirical CDFs and the simulated results from three different copulas (the Clayton, Gumbel, and Frank). By comparing each simulated result with the empirical marginals using chi-square goodness-of-fit tests, we choose the Gumbel copula to represent the joint distributions of the retrieved soil moisture and soil moisture differences, which is then used to develop the soil moisture operators. Table 2 also includes for each soil moisture operator, Kendall’s *τ* and the Gumbel copula dependency parameter *δ.*

The resulting copula joint distribution can be checked through simulation using the parameters in Table 2 and comparing the simulated values to the original series. Results for summertime (JJA) satellite-retrieved soil moisture and soil moisture differences are shown in Fig. 6 for the three models and two satellite-retrieval algorithm products. As can be seen, the simulated values from the fitted joint distribution mimic well the observed data.

### b. Observation operators based on conditional simulation

Conditional copula simulation allows the ensemble generation of the soil moisture differences (observed soil moisture minus modeled soil moisture) conditioned on satellite-retrieved soil moisture, and from this a modeled soil moisture for a given retrieved soil moisture. The conditional simulation is largely the same as described in the above section, except with the following modifications.

First, for a range of potential soil moisture retrievals, we calculated their CDFs using the distribution parameters from section 4a. For instance, the range of retrieved soil moisture using the LSMEM algorithm is 2%–37%. Using the marginal distribution and parameters in Table 2, the corresponding CDF was calculated for 100 LSMEM soil moisture values evenly spanning this range with an interval of 0.35%. These CDF values correspond to *u* in Eqs. (1) and (2), and given their value a conditional simulation of *υ* can be generated following the approach of section 2. For each generated *υ*, its marginal distribution can be inverted to estimate the random variable, which in this case is the difference between satellite-retrieved and modeled soil moisture. By generating say 1000 simulations, a soil moisture operator can be estimated for each potential satellite retrieval value. Modeled soil moisture can then be computed from the simulated soil moisture differences and the observed soil moisture.

The above results are used to develop observation operators that are easy to implement in data assimilation applications. Further comments regarding the use of operators in data assimilation are provided in the discussion section. Note here that the important characteristic of an operator is that it offers an explicit relationship between satellite-retrieved soil moisture and a variable useful for merging the satellite information with model estimates, which can either be a correction term [difference between satellite and modeled soil moisture following Drusch et al. (2005)] or the modeled soil moisture, as we now show below.

Using the 1000 conditional simulations from the copula-based joint distribution, the statistical mean and standard deviation of modeled soil moisture are first calculated for each retrieved soil moisture level and then regressed against the retrievals using second-order polynomials. All *R*² values are larger than 0.99 for these regressions. While seasonal operators are estimated in this study, it follows that a single (all season) operator could be estimated based on all satellite retrievals and modeled soil moisture, and comparisons carried out to determine which offers more information.
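The regression step can be sketched as follows: for each retrieval level, the ensemble mean and standard deviation are computed, and each is then fitted with a second-order polynomial by ordinary least squares. This is a minimal illustration with hypothetical function names; the normal-equation solver is our choice for a dependency-free example and is not taken from the study.

```python
import statistics


def fit_quadratic(xs, ys):
    """Least-squares fit y = c0 + c1*x + c2*x**2 via the 3x3 normal equations.

    Needs at least three distinct x values; otherwise the system is singular.
    """
    s = [sum(x ** k for x in xs) for k in range(5)]                 # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^k
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]   # augmented matrix
    for col in range(3):                      # Gauss-Jordan with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [p - f * q for p, q in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def operator_from_ensembles(levels, ensembles):
    """Fit the conditional mean and standard deviation as quadratics in the retrieval.

    levels[i] is a retrieved soil moisture value and ensembles[i] holds the
    conditionally simulated modeled soil moisture for it (>= 2 members each).
    """
    means = [statistics.fmean(e) for e in ensembles]
    stds = [statistics.stdev(e) for e in ensembles]
    return fit_quadratic(levels, means), fit_quadratic(levels, stds)


def evaluate(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2."""
    c0, c1, c2 = coeffs
    return c0 + c1 * x + c2 * x * x
```

With the two coefficient triples in hand, `evaluate` returns the conditional mean and standard deviation for any retrieval value within the fitted range, which is the form in which an operator would be handed to an assimilation scheme.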

A sampling of these derived observation operators is plotted in Figs. 7–9, with the continuous line being the conditional mean and the vertical bars the conditional standard deviations. The major features of these operators are summarized below.

A clear seasonal signal is evident in the operators. During the December–February (DJF) and March–May (MAM) seasons, the thicker upper layers in the models (7–10 cm) have less variability in soil moisture than the surface layer represented by the TMI and AMSR-E retrievals. During the JJA and September–November (SON) periods, there appears to be a stronger coupling due to climatic variables (high evaporative demand and dryness), resulting in a near-linear relationship across the dynamic range of both satellite retrievals.

The comparison between the conditional standard deviation of the operators and the unconditional standard deviation of the modeled soil moisture is an indication of the information contributed by the satellite-retrieved soil moisture. In this sense, the SON season retrievals from both LSMEM and NASA would contribute more information in data assimilation than their retrievals would in the MAM season. LSMEM also would contribute more information to VIC than to ERA-40, because the conditional standard deviations are smaller. For wet conditions in the JJA season, LSMEM retrievals would contribute little information if assimilated into ERA-40 or NARR. The larger conditional standard deviation for NARR for all seasons and retrievals indicates that assimilating soil moisture into this model would contribute the least. This underscores the importance of the infiltration/evaporation parameterization in coupled weather models, including how these processes interact with the overall data assimilation systems, if these models start to assimilate satellite-based soil moisture retrievals.

The conditional standard deviation derived from the copulas, shown to vary with season and with retrieved wetness, offers a flexible error structure that has been ignored by previous data assimilation applications. In past applications, the error structure has usually been assumed constant (Walker and Houser 2004).

### c. Comparison of copula-based operators to CDF-matching operators

Previous work on soil moisture operators has used a technique often referred to as “CDF matching” (Reichle and Koster 2005; Drusch et al. 2005) to remove bias and to rescale satellite-retrieved soil moisture to modeled soil moisture values. Conceptually, the satellite-retrieved values are scaled so that the CDFs of both datasets match. Technically the procedure is as follows: (i) the two datasets (satellite-retrieved and modeled soil moisture) are ranked and the differences between the soil moisture values of equal rank are computed; and (ii) the observation operator, which removes the systematic differences between both datasets, is determined by fitting a polynomial to the ranked retrieved soil moisture values and the corresponding differences (e.g., retrieved minus modeled). This approach corrects all moments of the retrieved soil moisture distribution under the assumption of perfect dependency among the ranks (i.e., the *n*th ranked soil moisture values in each time series occurred at the same time). However, poor model parameterizations and forcings, or poor retrieval algorithms or errors, often lead to low dependency between the datasets, as can be seen in Fig. 4.
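The ranking procedure can be sketched in Python as follows. For brevity this illustration swaps the fitted polynomial of step (ii) for linear interpolation between the ranked retrievals, so it is a simplified stand-in rather than the published procedure; the function name is hypothetical.

```python
from bisect import bisect_left


def cdf_matching_operator(retrieved, modeled):
    """Return f(r): the rank-matched difference (retrieved - modeled) at retrieval r.

    Step (i): sort both series and difference the values of equal rank.
    Step (ii): here, interpolate linearly between ranked retrievals
    (a swapped-in simplification of the polynomial fit), clamped at the ends.
    """
    if len(retrieved) != len(modeled) or not retrieved:
        raise ValueError("need two non-empty series of equal length")
    r_sorted = sorted(retrieved)
    m_sorted = sorted(modeled)
    diffs = [r - m for r, m in zip(r_sorted, m_sorted)]

    def difference(r):
        if r <= r_sorted[0]:
            return diffs[0]
        if r >= r_sorted[-1]:
            return diffs[-1]
        k = bisect_left(r_sorted, r)       # r_sorted[k - 1] < r <= r_sorted[k]
        r0, r1 = r_sorted[k - 1], r_sorted[k]
        w = (r - r0) / (r1 - r0)
        return diffs[k - 1] + w * (diffs[k] - diffs[k - 1])

    return difference
```

A retrieval `r` is rescaled to the model climatology as `r - f(r)`. Note that the two series are sorted independently, so the time pairing between them is discarded; this is exactly the perfect-rank-dependency assumption discussed in the text.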

Using the same training data as shown in Fig. 4, the CDF matching has been performed over all the JJA data for the 6-yr period. An example from the summer of 2002 is shown in Fig. 10, which compares the CDF-matching approach to the copula-based soil moisture operator using the LSMEM and NASA retrievals and the VIC-simulated soil moisture. In Fig. 10a, where the LSMEM retrievals are presented, the two approaches give similar results. However, in Fig. 10b based on the NASA retrievals, they differ significantly, with the CDF-matching result having a larger dynamic range when compared to the modeled soil moisture (i.e., it tends to be too high in wetter periods and too low in drier periods). The result shown in Fig. 10b using CDF matching only occurred for JJA in 2002, while for the complete six JJA seasons the dynamic range is consistent with the VIC-simulated soil moisture range.

The different performance of CDF matching in Figs. 10a,b is attributed to the assumption in the CDF-matching approach that the two underlying uniformly distributed random variables (the CDFs of observed and modeled data) are perfectly correlated. If this assumption holds, then the results from the copula approach and CDF matching will be the same, and the standard deviation from the copula conditional simulations will be zero. If the correlation is zero, the distribution of the copula-simulated model soil moisture, conditional on the satellite-retrieved soil moisture, will be the same as if it were sampled from the marginal distribution of the model soil moisture. In this case, paired soil moisture values (retrieved and modeled) would have different ranks. For the retrieved TMI/AMSR-E soil moisture, the dependency varies seasonally, with retrieval method and with the hydrologic model, so that the copula-based operator performs as well as or better than the CDF-matching approach.

The second advantage of the copula-based observation operators over the CDF-matching operators is that they provide a quantitative estimate of the variance of the model soil moisture conditional on the retrieved soil moisture, which represents the error associated with the remotely sensed soil moisture. In fact, this error accounts for the aggregated effects of sensor error, retrieval error, model parameterization error, and scale differences. The model parameterization differences can be seen from the different magnitudes of the conditional standard deviations for the same retrieval algorithm. This variability is crucial information when merging satellite retrievals with model predictions in a Bayesian framework.

## 5. Discussion and conclusions

The main objective of assimilating remotely sensed soil moisture information into land surface models is to reduce prediction errors in land surface states that arise from poor model parameterizations and forcings (primarily precipitation but also surface meteorology) by merging model predictions with observations. With satellite observations, either the measured brightness temperature is assimilated via a forward model or the retrieved geophysical parameters are assimilated. In either case, each data source is characterized by its own mean value, variability, and dynamic range, as is the land surface model into which the satellite-based information is to be merged.

Merging satellite-based measurements, such as retrieved soil moisture, with land surface model–based soil moisture presents a unique set of challenges. These include differences in scale among satellite resolution, land surface modeling, and “validation” datasets, as well as prediction errors in both modeling and retrievals and the level of correlation between them. In this paper we present a statistical approach for fitting joint distributions based on copulas that offers great flexibility in fitting the marginal distributions and the dependency between datasets. From this joint distribution, soil moisture operators can be developed that arise naturally from the conditional distribution of land model–based soil moisture conditioned on satellite-retrieved soil moisture.

Within a Bayesian framework, copula-based joint distributions offer a natural means of merging model and satellite information. Specifically, if *x* represents the modeled soil moisture at its spatial scale and depth and *y* represents the satellite-retrieved soil moisture at its spatial scale and effective depth, then the copula-based joint distribution provides *p*(*x, y*) based on historical series of model output and satellite retrievals. The merged Bayesian (posterior) distribution is simply *p*(*x|y*) = *p*(*x, y*)/*p*(*y*), where *p*(*y*) is the marginal distribution of the satellite-retrieved soil moisture over the historical period. Sampling from this distribution would provide ensembles that could be used for assessing the uncertainty in related land surface variables that depend on soil moisture, such as evapotranspiration. If the dependency between model and satellite-retrieved soil moisture is close to zero, then *p*(*x, y*) = *p*(*x*) *p*(*y*) and *p*(*x|y*) = *p*(*x*), so the posterior model-based soil moisture distribution is equal to its prior, premerged distribution; that is, the satellite offers no information. This can occur because of deficiencies in either the land surface model predictions or the satellite retrievals. The information (or skill) offered by the satellite retrievals is related to the reduction in the posterior conditional variance relative to the variance of the model-based soil moisture marginal distribution.
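The link between the posterior and the copula can be written out explicitly. In the sketch below, the marginal CDFs $F_X$ and $F_Y$ and the copula density $c$ are notation assumed here for illustration, and the factorization of the joint density follows from Sklar's theorem (Sklar 1959) for continuous marginals.

```latex
\begin{align}
  p(x, y)      &= c\bigl(F_X(x), F_Y(y)\bigr)\, p(x)\, p(y), \\
  p(x \mid y)  &= \frac{p(x, y)}{p(y)} = c\bigl(F_X(x), F_Y(y)\bigr)\, p(x), \\
  \operatorname{Var}(x \mid y) &= \int x^{2}\, p(x \mid y)\, dx
      - \left( \int x\, p(x \mid y)\, dx \right)^{2}.
\end{align}
```

Under independence the copula density is identically 1, so the second line reduces to *p*(*x|y*) = *p*(*x*), and the conditional variance in the third line equals the variance of the model marginal distribution.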

For assimilation approaches such as the ensemble Kalman filter, using operators derived from the satellite-retrieved and model-based soil moisture to generate observation ensembles would violate the assumption of independence between model (state equation) errors and observation errors. If a copula-based joint distribution is instead fitted to satellite-retrieved soil moisture and in situ soil moisture measurements that are assumed to be “truth,” then ensembles representing the satellite retrieval errors can be generated as described in section 2. Whether such in situ data correctly represent true soil moisture at the satellite sensor scale is an open question.

In this study copula-based joint distributions were fitted to soil moisture retrievals based on 10.65-GHz TMI and AMSR-E microwave measurements over the southern Great Plains region of the United States and corresponding model-based soil moisture estimates from three different land surface models. One set of TMI and AMSR-E soil moisture data was retrieved using the LSMEM (Drusch et al. 2001; Gao et al. 2006) and another set using the NASA Level 2B AMSR-E algorithm (Njoku 2004). From the joint distributions, probability-based soil moisture operators were developed for each season and model-retrieval pair. The mean of the conditional distribution gives the expected bias-corrected soil moisture based on the satellite retrievals, while the standard deviation provides a measure of the information added through assimilation. These standard deviations show that LSMEM adds more information than the standard NASA retrieval product over the domain studied, a finding that supports work by other researchers (Crow 2007), although this may not hold in other regions. Similarly, summer (JJA) and fall (SON) retrievals are more useful for assimilation than winter (DJF) or spring (MAM) retrievals, probably because higher precipitation and low evaporation in winter and spring result in more uniform soil moisture in the models’ upper (7–10 cm) layer. Finally, the results seem to indicate that satellite-retrieved soil moisture is most useful to VIC, followed by ERA-40 (ECMWF) and then NARR (NCEP). There is no assurance that satellite retrievals offer useful information to improve model-based predictions, but the approach outlined in this paper allows for a quantitative determination of their potential value.

Further work is needed to better understand the statistical properties of the copula-derived soil moisture operators for other regions outside the study area. As interest increases in assimilating AMSR-E and anticipated European Soil Moisture and Ocean Salinity (SMOS) soil moisture products in regional and global weather models, it will be important to understand the length of the time series (temporal duration) needed to derive robust copula-based operators and the variability of the operators seasonally and for various land regions of the globe.

## Acknowledgments

The research reported in this paper was supported through NASA Grants NAG5-9635 (TMI Soil Moisture Pathfinder Data Study), NAG5-11111 (Land Surface Modeling Studies in Support of AQUA AMSR-E Validation), and NAG5-11610 (Evaluation of Hydrologic Remote Sensing Observations for Improved Weather Prediction). This support is gratefully acknowledged. The authors wish to thank Dr. Eni Njoku for providing the NASA AMSR-E-based retrieval code for use with the TMI 10.65-GHz measurements.

## REFERENCES

Beljaars, A. C. M., P. Viterbo, M. J. Miller, and A. K. Betts, 1996: The anomalous rainfall over the United States during July 1993: Sensitivity to land surface parameterization and soil moisture. *Mon. Wea. Rev.*, **124**, 362–383.

Best, D. J., and P. G. Gipps, 1974: Upper tail probabilities of Kendall’s tau. *Appl. Stat.*, **23**, 98–100.

Betts, A. K., 2004: Understanding hydrometeorology using global models. *Bull. Amer. Meteor. Soc.*, **85**, 1673–1688.

Crow, W. T., 2007: A novel method for quantifying value in spaceborne soil moisture retrievals. *J. Hydrometeor.*, **8**, 56–67.

Crow, W. T., and Coauthors, 2005: An observing system simulation experiment for Hydros radiometer-only soil moisture products. *IEEE Trans. Geosci. Remote Sens.*, **43**, 1289–1303.

De Michele, C., and G. Salvadori, 2003: A generalized Pareto intensity-duration model of storm rainfall exploiting 2-copulas. *J. Geophys. Res.*, **108**, 4067, doi:10.1029/2002JD002534.

Drusch, M., E. F. Wood, and T. J. Jackson, 2001: Vegetation and atmospheric corrections for the soil moisture retrieval from passive microwave remote sensing data: Results from the Southern Great Plains Hydrology Experiment 1997. *J. Hydrometeor.*, **2**, 181–192.

Drusch, M., E. F. Wood, and H. Gao, 2005: Observation operators for the direct assimilation of satellite retrieved soil moisture into land surface models. *Geophys. Res. Lett.*, **32**, L15403, doi:10.1029/2005GL023623.

Entin, J. K., A. Robock, K. Y. Vinnikov, S. E. Hollinger, S. X. Liu, and A. Namkhai, 2000: Temporal and spatial scales of observed soil moisture variations in the extratropics. *J. Geophys. Res.*, **105**, 11865–11877.

Favre, A. C., S. El Adlouni, L. Perreault, N. Thiemonge, and B. Bobee, 2004: Multivariate hydrological frequency analysis using copulas. *Water Resour. Res.*, **40**, W01101, doi:10.1029/2003WR002456.

Gao, H., E. F. Wood, M. Drusch, W. Crow, and T. J. Jackson, 2004: Using a microwave emission model to estimate soil moisture from ESTAR observations during SGP99. *J. Hydrometeor.*, **5**, 49–63.

Gao, H., E. F. Wood, T. J. Jackson, M. Drusch, and R. Bindlish, 2006: Using TRMM/TMI to retrieve surface soil moisture over the southern United States from 1998 to 2002. *J. Hydrometeor.*, **7**, 23–38.

Illston, B. G., J. C. Caldwell, and S. G. Bodnar, 2004: Representativeness of soil moisture conditions in central Oklahoma during the enhanced drying phase. Preprints, *18th Conf. on Hydrology*, Seattle, WA, Amer. Meteor. Soc., CD-ROM, JP4.14.

Jackson, T. J., 1993: Measuring surface soil-moisture using passive microwave remote-sensing. *Hydrol. Processes*, **7**, 139–152.

Jackson, T. J., and T. J. Schmugge, 1989: Passive microwave remote sensing for soil moisture: Some supporting research. *IEEE Trans. Geosci. Remote Sens.*, **27**, 225–235.

Kerr, Y. H., P. Waldteufel, J. P. Wigneron, J. M. Martinuzzi, J. Font, and M. Berger, 2001: Soil moisture retrieval from space: The Soil Moisture and Ocean Salinity (SMOS) mission. *IEEE Trans. Geosci. Remote Sens.*, **39**, 1729–1735.

Kimberling, C. H., 1974: A probabilistic interpretation of complete monotonicity. *Aequationes Math.*, **10**, 152–164.

Koster, R. D., and P. C. D. Milly, 1997: The interplay between transpiration and runoff formulations in land surface schemes used with atmospheric models. *J. Climate*, **10**, 1578–1591.

Koster, R. D., and M. J. Suarez, 2001: Soil moisture memory in climate models. *J. Hydrometeor.*, **2**, 558–570.

Koster, R. D., and M. J. Suarez, 2003: Impact of land surface initialization on seasonal precipitation and temperature prediction. *J. Hydrometeor.*, **4**, 408–423.

Koster, R. D., and GLACE Team, 2004: Regions of strong coupling between soil moisture and precipitation. *Science*, **305**, 1138–1140.

Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges, 1994: A simple hydrologically based model of land surface water and energy fluxes for general circulation models. *J. Geophys. Res.*, **99**, 14415–14428.

Mitchell, K. E., and Coauthors, 2004: The multi-institution North American Land Data Assimilation System (NLDAS): Utilizing multiple GCIP products and partners in a continental distributed hydrological modeling system. *J. Geophys. Res.*, **109**, D07S90, doi:10.1029/2003JD003823.

Nelsen, R. B., 1999: *An Introduction to Copulas.* Lecture Notes in Statistics, Springer-Verlag, 139 pp.

Njoku, E., 2004: AMSR-E/Aqua Daily L3 surface soil moisture. V001. National Snow and Ice Data Center, Boulder, CO, digital media. [Available online at http://nsidc.org/data/docs/daac/ae_land_l2b_soil_moisture.gd.html.]

Njoku, E. G., T. J. Jackson, V. Lakshmi, T. K. Chan, and S. V. Nghiem, 2003: Soil moisture retrieval from AMSR-E. *IEEE Trans. Geosci. Remote Sens.*, **41**, 215–229.

Owe, M., A. A. Van de Griend, R. de Jeu, J. J. de Vries, E. Seyhan, and E. T. Engman, 1999: Estimating soil moisture from satellite microwave observations: Past and ongoing projects, and relevance to GCIP. *J. Geophys. Res.*, **104**, 19735–19742.

Prigent, C., F. Aires, W. B. Rossow, and A. Robock, 2005: Sensitivity of satellite microwave and infrared observations to soil moisture at a global scale: Relationship of satellite observations to in situ soil moisture measurements. *J. Geophys. Res.*, **110**, D07110, doi:10.1029/2004JD005087.

Reichle, R. H., and R. D. Koster, 2005: Global assimilation of satellite surface soil moisture retrievals into the NASA Catchment land surface model. *Geophys. Res. Lett.*, **32**, L02404, doi:10.1029/2004GL021700.

Reichle, R. H., D. B. McLaughlin, and D. Entekhabi, 2001: Variational data assimilation of microwave radiobrightness observations for land surface hydrology applications. *IEEE Trans. Geosci. Remote Sens.*, **39**, 1708–1718.

Reichle, R. H., R. D. Koster, J. Dong, and A. Berg, 2004: Global soil moisture from satellite observations, land surface models, and ground data: Implications for data assimilation. *J. Hydrometeor.*, **5**, 430–442.

Schär, C., D. Lüthi, and U. Beyerle, 1999: The soil–precipitation feedback: A process study with a regional climate model. *J. Climate*, **12**, 722–741.

Simmons, A. J., and J. K. Gibson, 2000: The ERA-40 project plan. ERA-40 Project Report Series 1, ECMWF, Reading, United Kingdom, 62 pp.

Sklar, A., 1959: Fonctions de répartition à n dimensions et leurs marges. Publication of the Institute of Statistics, University of Paris, Vol. 8, 229–231.

Vinnikov, K. Y., A. Robock, N. A. Speranskaya, and A. Schlosser, 1996: Scales of temporal and spatial variability of midlatitude soil moisture. *J. Geophys. Res.*, **101**, 7163–7174.

Vinnikov, K. Y., A. Robock, S. Qiu, J. K. Entin, M. Owe, B. J. Choudhury, S. E. Hollinger, and E. G. Njoku, 1999: Satellite remote sensing of soil moisture in Illinois. *J. Geophys. Res.*, **104**, 4145–4168.

Viterbo, P., and A. K. Betts, 1999: Impact of the ECMWF reanalysis soil water on forecasts of the July 1993 Mississippi flood. *J. Geophys. Res.*, **104** (D16), 19361–19366.

Walker, J. P., and P. R. Houser, 2001: A methodology for initializing soil moisture in a global climate model: Assimilation of near surface soil moisture observations. *J. Geophys. Res.*, **106**, 11761–11774.

Walker, J. P., and P. R. Houser, 2004: Requirements of a global near-surface soil moisture satellite mission: Accuracy, repeat time, and spatial resolution. *Adv. Water Resour.*, **27**, 785–801.

Wilker, H., M. Drusch, G. Seuffert, and C. Simmer, 2006: Effects of the near-surface soil moisture profile on the assimilation of L-band microwave brightness temperature. *J. Hydrometeor.*, **7**, 433–442.

Comparison between soil moisture from the LSMEM retrieval and their differences from Oklahoma Mesonet field-measured data.

Citation: Journal of Hydrometeorology 8, 3; 10.1175/JHM570.1

Comparison between soil moisture from the NASA algorithm retrieval and their differences from Oklahoma Mesonet field-measured data.

Comparison between soil moisture from remote sensing and their differences from land surface models during summer seasons (JJA) from 1998 to 2003 (1998–2002 for ERA-40). The data for the two retrieval algorithms, LSMEM and NASA, are plotted separately.

(upper left) Scatterplot of the empirical CDFs and simulated CDFs (remaining panels) from two time-matched datasets: *x* = LSMEM soil moisture and *y* = LSMEM minus VIC soil moisture, for the spring (MAM) season. (upper left) The empirical CDFs are from original distributions of each dataset, and the simulated CDFs are from three copula joint distributions fitted to the LSMEM and VIC datasets: (upper right) Clayton, (lower left) Gumbel, and (lower right) Frank.

Copula-simulated joint distributions (in red) between retrievals and their biases from model output. The original data are in black (same as in Fig. 4).

Observation operator that predicts the mean and standard deviation of modeled soil moisture (VIC, NARR, and ERA-40) from LSMEM and NASA-retrieved soil moisture during summer seasons (JJA). The original data are in gray.

Observation operator that predicts the mean and standard deviation of VIC model soil moisture from LSMEM retrievals. The original data are in gray.

Observation operator that predicts the mean and standard deviation of VIC model soil moisture from NASA retrievals. The original data are in gray.

Comparisons between observation operators from CDF matching and copula simulations. (a) Results using LSMEM soil moisture and (b) results using NASA soil moisture. The black line is the VIC soil moisture; the gray line is the retrieved soil moisture; the dotted black line is the result from CDF matching; gray dots represent randomly generated soil moisture ensembles using the copula approach; and the thick gray line is the mean value of the dots.

TMI and AMSR instrument characteristics.

Parameters of the training data for the copula simulations.