The Optimality of Potential Rescaling Approaches in Land Data Assimilation

M. Tugrul Yilmaz, Hydrology and Remote Sensing Laboratory, Beltsville, Maryland

and
Wade T. Crow, Hydrology and Remote Sensing Laboratory, Beltsville, Maryland


Abstract

It is well known that systematic differences exist between modeled and observed realizations of hydrological variables like soil moisture. Prior to data assimilation, these differences must be removed in order to obtain an optimal analysis. A number of rescaling approaches have been proposed for this purpose. These methods include rescaling techniques based on matching sampled temporal statistics, minimizing the least squares distance between observations and models, and the application of triple collocation. Here, the authors evaluate the optimality and relative performances of these rescaling methods both analytically and numerically and find that a triple collocation–based rescaling method results in an optimal solution, whereas variance matching and linear least squares regression approaches result in only approximations to this optimal solution.

Corresponding author address: M. Tugrul Yilmaz, Hydrology and Remote Sensing Laboratory, Agricultural Research Service, U.S. Department of Agriculture, 10300 Baltimore Ave., BARC-WEST, Bldg. 007, Room 104, Beltsville, MD 20705. E-mail: tugrul.yilmaz@ars.usda.gov

1. Introduction

Given the availability of multiple approaches (i.e., models, in situ observations, and remote sensing) for estimating many geophysical variables, it is often desirable to merge them to obtain a more accurate product. In data assimilation, the goal is to optimally merge independent datasets with different error characteristics to obtain an analysis product with higher accuracy than all of the parent products.

However, the use of different modeling and/or observational approaches typically leads to predictions with different systematic relationships to the assumed truth. This is particularly true for soil moisture data assimilation given well-known climatological differences in both model-derived (Koster et al. 2009) and remotely sensed (Jackson et al. 2010) soil moisture products. Additionally, absolute values of models and observations differ from ground observations (Reichle and Koster 2004; Reichle et al. 2004). Hence, it is crucial to remove systematic differences between different datasets before using them in a hydrological data assimilation framework (Reichle and Koster 2004). This is commonly achieved by rescaling soil moisture observations to match model-predicted soil moisture (in some statistical sense) during a preprocessing step.

Several potential strategies for such rescaling have been proposed and applied in recent land data assimilation studies. Among them, cumulative distribution function (CDF) matching (Reichle and Koster 2004) and variance matching techniques are perhaps the most common. A handful of studies have applied rescaling based on least squares regression techniques (Crow et al. 2005; Crow and Zhan 2007) but failed to offer any clear rationale for this choice. Additionally, signal variance-based rescaling, typically applied as a preprocessing step in triple collocation analysis (Stoffelen 1998), also provides a means to rescale datasets using three independent estimates of the same variable. However, this approach has not yet been applied in soil moisture data assimilation.

Although there are many existing methods for rescaling hydrological variables, their optimality in terms of analysis errors in an assimilation framework has not yet been assessed. This paper investigates the relative performances of the above-mentioned rescaling methods both analytically and numerically.

The theoretical rationale for rescaling, and the degree to which rescaling techniques discussed above are consistent with this rationale, are discussed in the next section. Section 3 briefly presents the numerical experiment setup, section 4 presents the numerical results, section 5 discusses the implications of the results, and section 6 summarizes our conclusions.

2. Rescaling datasets

a. Analytical solution for the rescaling factor

Assuming that we have representations of a given geophysical variable derived from both a model and observations, we can generalize the model-based estimates x and the observations y in a linear form as
$$ x = \mu_x + \alpha_x t' + \varepsilon_x \tag{1} $$
$$ y = \mu_y + \alpha_y t' + \varepsilon_y \tag{2} $$
where μx and μy are the mean values of x and y, t′ is the true anomaly of the geophysical variable, αx and αy are scaling factors relating the magnitude of the anomaly signals of x and y to t′, and εx and εy are zero-mean random errors in x and y. In hydrological data assimilation, observations y are derived from in situ measurements and/or satellite-based retrievals, εy is commonly assumed to lack autocorrelation, and εx is generally considered to contain autocorrelation owing to the temporal memory of the model. In this setup, μ and αt′ represent the signal component while ε represents the noise component. We emphasize that (1) and (2) are general enough to encompass any unit dimension and dynamic range differences that may exist between x and y. In addition, note that, for the case in which the observations are assumed to capture a linear transformation of t′ (rather than t′ itself), the required transformation can simply be folded into the existing linear form of (2) through a trivial redefinition of αy. As a result, the development below is equally valid for the case of a linear observation operator.

The purpose of data assimilation should be to reduce the magnitude of the noise component while preserving the information obtained from the signal components. Although these products have similarities in the way they realize the truth, they often have characteristic differences as well (i.e., different μ and α). Therefore, without the knowledge of the truth, arguably the best way to ensure the merged product has minimized error variance (assuming the uncertainties of products are characterized accurately) is to match datasets x and y to minimize the systematic differences between them prior to data assimilation. Without knowledge of the truth, matching datasets can be done by selecting one of the datasets as reference and linearly rescaling the other one.

Given the linear model in (1) and (2) and assuming x is the reference dataset, y can be rescaled via the general linear transformation
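$$ y^{*} = \mu_x + c_y\,(y - \mu_y), \tag{3} $$
where cy is a rescaling factor and y* is the rescaled dataset. Combining (2) and (3), we obtain
$$ y^{*} = \mu_x + c_y \alpha_y t' + c_y \varepsilon_y. \tag{4} $$
After this rescaling, the error in y* can be expressed as
$$ \varepsilon_{y^*} = y^{*} - t \tag{5} $$
$$ \varepsilon_{y^*} = (c_y \alpha_y - \alpha_x)\, t' + c_y \varepsilon_y, \tag{6} $$
where the unknown truth is given as t = μx + αxt′ since x is the reference dataset. Our goal here is to identify the functional form of cy that leads to an optimal data assimilation analysis. A key condition for such optimality is that the assimilated observations y* have orthogonal errors, or $E[\varepsilon_{y^*} t] = 0$ (Chui and Chen 1998, p. 33). The orthogonality of $\varepsilon_{y^*}$ can be expressed as
$$ E[\varepsilon_{y^*} t] = E\{[(c_y \alpha_y - \alpha_x)\, t' + c_y \varepsilon_y]\,(\mu_x + \alpha_x t')\} \tag{7} $$
$$ E[\varepsilon_{y^*} t] = \alpha_x (c_y \alpha_y - \alpha_x)\, \sigma_{t'}^2, \tag{8} $$
where the expectation operation E[·] represents long-term temporal averaging, $\sigma_{t'}^2$ is the variance of t′, and $E[\varepsilon_y t'] = 0$ is assumed. Accordingly, in order for $\varepsilon_{y^*}$ to satisfy the orthogonal property, (8) has to vanish. The only nontrivial way to ensure this is defining the optimal value of cy as
$$ c_y = \frac{\alpha_x}{\alpha_y}. \tag{9} $$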
Note that (9) can also be obtained from optimal filtering requirements that errors in y* be uncorrelated in time and/or errors in the analysis must be orthogonal.
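
The orthogonality argument can be checked numerically. The following is a minimal sketch, not the authors' code: it draws a synthetic anomaly t′ and noisy observations following (2), then estimates the covariance between the rescaled-observation error and t′ for a given cy. The function name and all parameter values (αx = 0.8, αy = 2.0, noise level, sample size) are illustrative assumptions.

```python
import numpy as np

def rescaled_error_truth_covariance(c_y, alpha_x=0.8, alpha_y=2.0,
                                    sigma_eps_y=0.5, n=200_000, seed=0):
    """Sample estimate of E[eps_y* . t'] for a given rescaling factor c_y.

    All default values are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    t_anom = rng.standard_normal(n)                # true anomaly t'
    eps_y = sigma_eps_y * rng.standard_normal(n)   # zero-mean observation noise
    y_anom = alpha_y * t_anom + eps_y              # y - mu_y, following (2)
    # Error of the rescaled observation relative to the truth expressed in
    # the reference (x) space, i.e. alpha_x * t'.
    err = c_y * y_anom - alpha_x * t_anom
    return float(np.mean(err * t_anom))

cov_optimal = rescaled_error_truth_covariance(c_y=0.8 / 2.0)   # c_y = alpha_x / alpha_y
cov_unscaled = rescaled_error_truth_covariance(c_y=1.0)        # no rescaling
print(cov_optimal, cov_unscaled)
```

With cy = αx/αy the sample covariance is close to zero, whereas any other value leaves a residual proportional to (cyαy − αx), which is the term that has to vanish for the errors to be orthogonal.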

b. Numerical solutions for the rescaling factor

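Since αx and αy are typically unknown, (9) cannot be calculated directly. Instead, most land data assimilation studies attempt to replicate (9) using data that are available (i.e., x and y). Therefore, it is useful to consider the relationship between functional forms of cy derived from potential empirical rescaling strategies and the optimal form in (9). In appendix A we derive functional forms of cy obtained 1) by using linear least squares techniques to regress y onto x (cR), 2) by scaling y so that it has the same long-term temporal variance as x (cV), and 3) by applying preprocessing techniques commonly used in triple collocation (cT) as
$$ c_R = \frac{\alpha_x \alpha_y \sigma_{t'}^2}{\sigma_y^2} \tag{10} $$
$$ c_V = \sqrt{\frac{\alpha_x^2 \sigma_{t'}^2 + \sigma_{\varepsilon_x}^2}{\alpha_y^2 \sigma_{t'}^2 + \sigma_{\varepsilon_y}^2}} \tag{11} $$
$$ c_T = \frac{\alpha_x}{\alpha_y} \tag{12} $$
where $\sigma_y^2$ is the variance of y, $\sigma_{t'}^2$ is the variance of t′ (also note that the variance of t is $\sigma_t^2 = \alpha_x^2 \sigma_{t'}^2$), and $\sigma_{\varepsilon_x}^2$ and $\sigma_{\varepsilon_y}^2$ are the error variances of x and y, respectively. Defining the signal variance ($\alpha_y^2 \sigma_{t'}^2$ or $\alpha_x^2 \sigma_{t'}^2$) to total variance ratio of y (stry) and x (strx) as
$$ \mathrm{str}_y = \frac{\alpha_y^2 \sigma_{t'}^2}{\sigma_y^2} \tag{13} $$
$$ \mathrm{str}_x = \frac{\alpha_x^2 \sigma_{t'}^2}{\sigma_x^2} \tag{14} $$
and using (9), (10)–(12) can be rewritten as
$$ c_R = c_y\, \mathrm{str}_y \tag{15} $$
$$ c_V = c_y \sqrt{\frac{\mathrm{str}_y}{\mathrm{str}_x}} \tag{16} $$
$$ c_T = c_y \tag{17} $$
where $\sigma_x^2$ is the variance of x. The expressions for cR in (10) and cV in (11) can also be written as
$$ c_R = \rho(x,y)\, \frac{\sigma_x}{\sigma_y} \tag{18} $$
$$ c_V = \frac{\sigma_x}{\sigma_y} \tag{19} $$
where ρ(x,y) is the correlation between x and y (see appendix A). In these forms, (18) and (19), cR and cV can be obtained in any data assimilation study.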

Note that, considering the definition in (4), the variance-matching factor in (19) ensures only that the variances of x and y* match. This is sufficient for linear systems with Gaussian errors. However, a more general form of matching is also common in which the higher-order statistical moments are also matched. These so-called CDF matching approaches will be considered in numerical examples presented below.
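
Because the regression and variance-matching factors depend only on sample statistics of x and y, they can be computed in a few lines. The sketch below assumes the forms described for (18) and (19), namely the ratio of standard deviations with and without the correlation coefficient as a multiplier, and applies the mean-preserving linear rescaling of (3). The helper names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def empirical_rescaling_factors(x, y):
    """Return (c_reg, c_var) from sample statistics of model x and observations y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ratio = x.std() / y.std()          # sigma_x / sigma_y (variance matching)
    rho = np.corrcoef(x, y)[0, 1]      # correlation between x and y
    return rho * ratio, ratio          # regression factor, variance-matching factor

def rescale(y, x, c_y):
    """Shift y to the mean of x and scale its anomalies by c_y."""
    y = np.asarray(y, dtype=float)
    return np.mean(x) + c_y * (y - y.mean())

# Illustrative synthetic data; every parameter value here is an assumption.
rng = np.random.default_rng(0)
t_anom = rng.standard_normal(50_000)
x = 5.0 + 1.0 * t_anom + 0.5 * rng.standard_normal(t_anom.size)
y = 20.0 + 2.0 * t_anom + 1.5 * rng.standard_normal(t_anom.size)

c_reg, c_var = empirical_rescaling_factors(x, y)
y_var = rescale(y, x, c_var)   # matches the mean and variance of x
y_reg = rescale(y, x, c_reg)   # least squares prediction of x from y
print(c_reg, c_var)
```

Since the absolute correlation never exceeds one, the regression-based factor is never larger in magnitude than the variance-matching factor, which is consistent with the two approaches bracketing different amounts of noise in y.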

c. Optimal versus suboptimal solutions

Above we derive the optimal solution for cy in a sequential filtering framework as (9). We also define cy for three different empirical rescaling strategies (cR, cV, and cT). Of the three approaches considered [REG, VAR, and TCA abbreviations will be used to refer to the least squares regression–, variance matching–, and triple collocation analysis–based approaches described above; ORG will be used to refer to the nonrescaled original observations case of cy = 1], only the TCA-based approach results in the optimal solution, whereas the REG- and VAR-based solutions result in approximations to this optimal solution of the form f cy. For the REG- and VAR-based solutions, the f factors are defined as fR = stry in (15) and fV = (stry/strx)^0.5 in (16), respectively. In all cases where fR or fV is not equal to one, these two approaches diverge from the optimal solution given by (9). Therefore, the suboptimal REG-based solution converges to the optimal solution as stry converges to one, while the VAR-based solution converges to the optimal solution only when strx = stry. Given the ubiquity of VAR- and REG-based rescaling approaches in contemporary land data assimilation, this demonstrates that a widely applied element of existing assimilation systems is generally suboptimal. An optimal solution is available from the TCA-based rescaling approach; however, it requires three independent and mutually linear datasets of sufficient temporal length. If these requirements are not met, which is generally the case for most hydrological data assimilation systems, we are limited to the approximate REG- and VAR-based solutions.
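
The relationship between the optimal factor and the two approximations can be illustrated directly from the linear model. The sketch below is only an illustration under assumed parameter values: it builds the population variances of x and y from αx, αy, the variance of t′, and the error variances (taking errors as independent of the truth), and then compares the regression and variance-matching factors with αx/αy. The function name and inputs are hypothetical.

```python
import math

def analytic_factors(alpha_x, alpha_y, var_t, var_eps_x, var_eps_y):
    """Population rescaling factors implied by the linear model (1)-(2).

    Assumes errors are independent of the truth, so total variance is
    signal variance plus error variance.
    """
    var_x = alpha_x**2 * var_t + var_eps_x
    var_y = alpha_y**2 * var_t + var_eps_y
    str_x = alpha_x**2 * var_t / var_x        # signal-to-total ratio of x
    str_y = alpha_y**2 * var_t / var_y        # signal-to-total ratio of y
    return {
        "str_x": str_x,
        "str_y": str_y,
        "c_opt": alpha_x / alpha_y,                   # optimal factor
        "c_reg": alpha_x * alpha_y * var_t / var_y,   # least squares regression
        "c_var": math.sqrt(var_x / var_y),            # variance matching
    }

# Illustrative values only.
res = analytic_factors(alpha_x=1.0, alpha_y=2.5, var_t=1.0,
                       var_eps_x=0.25, var_eps_y=1.0)
f_reg = res["c_reg"] / res["c_opt"]   # equals str_y
f_var = res["c_var"] / res["c_opt"]   # equals sqrt(str_y / str_x)
print(res, f_reg, f_var)
```

In this example the regression factor underestimates the optimal value by the factor stry, while variance matching overestimates it slightly because the observations here have a higher signal-to-total ratio than the model.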

3. Synthetic-twin experiment setup

The failure of REG- and VAR-based preprocessing to generally conform to the optimal rescaling state criteria given in (9) should degrade the subsequent performance of a sequential filter. Here, we conduct a series of synthetic-twin data assimilation experiments to quantify this degradation. Our numerical results are based on a simple antecedent precipitation index (API) model, given as
e20
e21
where d is day of the year, xd is the API model value at d, Pd is the precipitation value at d, and a and b values are selected as 0.85 and 0.10, respectively. The model is run over a single 0.25° pixel (35°N, 98°W) using daily Tropical Rainfall Measuring Mission (TRMM) 3B42 precipitation accumulations acquired between 1998 and 2010.

Using the above API model, we have created daily synthetic ground truth t. Control runs x are obtained from model runs that do not assimilate observations, while API values from d to d + 1 are additively perturbed with random numbers that have mean of zero and standard deviations given in Table 1. Original observations y are created by multiplying the truth with a constant (true observation scaling factors αy) and then adding mean-zero random noise with the same standard deviations as the control run (Table 1). We later rescale y to x by using four different rescaling methods: VAR observations are created using (19), CDF observations are created by using the CDF-matching technique described by Reichle and Koster (2004), REG observations are created using (18), and TCA observations are created using (12). For the TCA-based rescaling, z values are created in an identical way to y but using a different random number sequence.
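
The construction of the synthetic truth, control run, and observations can be sketched as follows. This is an illustration under stated assumptions rather than the experiment itself: the API recursion is assumed to take the common form of a seasonally varying loss coefficient a + b cos(2πd/365) multiplying the previous state plus daily precipitation, the rainfall is a randomly generated stand-in for the TRMM 3B42 record, and the single perturbation standard deviation used here is illustrative rather than taken from Table 1.

```python
import numpy as np

def api_series(precip, a=0.85, b=0.10, noise_std=0.0, rng=None):
    """Run an API-type recursion x_d = gamma_d * x_{d-1} + P_d (+ noise).

    ASSUMPTION: gamma_d = a + b*cos(2*pi*d/365); the exact model form is not
    reproduced in the text, so this is a plausible stand-in only.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = np.zeros(len(precip))
    for d in range(1, len(precip)):
        gamma = a + b * np.cos(2.0 * np.pi * d / 365.0)
        # Additive perturbation applied at each step from d-1 to d.
        x[d] = gamma * x[d - 1] + precip[d] + noise_std * rng.standard_normal()
    return x

rng = np.random.default_rng(42)
n_days = 5 * 365
# Hypothetical rainfall (mm/day): wet on roughly 30% of days.
precip = rng.gamma(shape=0.8, scale=10.0, size=n_days) * (rng.random(n_days) < 0.3)

sigma = 2.0      # illustrative perturbation standard deviation
alpha_y = 2.5    # one of the alpha_y values used in the text

truth = api_series(precip)                                      # synthetic truth t
control = api_series(precip, noise_std=sigma, rng=rng)          # control run x
obs_y = alpha_y * truth + sigma * rng.standard_normal(n_days)   # observations y
obs_z = alpha_y * truth + sigma * rng.standard_normal(n_days)   # third product z
```

The third product z is generated exactly like y but with an independent noise sequence, which is what the triple collocation rescaling requires.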

Table 1. Standard deviation cases for random additive observation and model perturbations σom (20 cases total).

Perturbation standard deviations (Table 1) and αy values [αy = (0.12, 1.00, 2.50)] are selected to result in increasing (or decreasing) cy and/or ρ(x,y). The true rescaling factors are calculated by taking the ratio αx/αy, following (9). Here, the values of αy are given as input in the experiment design and are therefore explicitly known. However, αx is not known and is instead calculated as
e22
where x′ is the control run anomaly. Rescaled observations are later assimilated into (20) using an ensemble Kalman filter (EnKF) of the form
$$ x^{a}_{d,e} = x^{f}_{d,e} + K_d\,(y^{*}_{d,e} - x^{f}_{d,e}), \tag{23} $$
where
$$ K_d = \frac{\sigma^2_{x^f,d}}{\sigma^2_{x^f,d} + \sigma^2_{\varepsilon_{y^*}}}, \tag{24} $$
and e is the ensemble member number (total is 40); $x^{a}_{d,e}$ and $x^{f}_{d,e}$ are analysis and forecast model values, respectively, for e at d; $y^{*}_{d,e}$ is the synthetically created linearly rescaled observation for e at d; and $K_d$ is the Kalman gain at d. Here we note that the methodology is general to any Kalman filter variant and that our choice of the EnKF is arbitrary. Ensembles of observations are created by perturbing the observations at any given time step with statistics consistent with the error variances used for the calculation of K. An ensemble of model replicates at any time step is created by adding mean-zero noise (standard deviations given in Table 1) to model forecasts of x. Values of $\sigma^2_{x^f,d}$ are sampled from the background ensemble of xd at d, while observation error standard deviations are calculated using the rescaled sets as
$$ \sigma_{\varepsilon_{y^*}} = \sqrt{E[(y^{*} - t)^2]}. \tag{25} $$
Hence, we assign perfect error variances to y*.
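
A single analysis step of this kind of filter can be sketched for a scalar state. The code below uses the standard perturbed-observation EnKF update with an identity observation operator, which is an assumption made here because the rescaled observations are already expressed in model space; the function name, ensemble values, and error levels are illustrative.

```python
import numpy as np

def enkf_update(forecast_ens, obs, obs_err_std, rng):
    """One scalar perturbed-observation EnKF analysis step.

    forecast_ens : 1D array of forecast ensemble members
    obs          : rescaled observation y* at this time step
    obs_err_std  : assumed error standard deviation of y*
    """
    forecast_ens = np.asarray(forecast_ens, dtype=float)
    p_f = forecast_ens.var(ddof=1)              # forecast error variance from ensemble
    gain = p_f / (p_f + obs_err_std**2)         # scalar Kalman gain
    # Each member assimilates its own perturbed copy of the observation.
    perturbed_obs = obs + obs_err_std * rng.standard_normal(forecast_ens.size)
    analysis_ens = forecast_ens + gain * (perturbed_obs - forecast_ens)
    return analysis_ens, gain

rng = np.random.default_rng(1)
forecast = 10.0 + 2.0 * rng.standard_normal(40)   # 40 members, as in the text
analysis, gain = enkf_update(forecast, obs=12.0, obs_err_std=1.0, rng=rng)
print(gain, forecast.mean(), analysis.mean())
```

The gain weights the observation by the ratio of forecast variance to total variance, so an incorrectly rescaled observation is weighted as if its error statistics were correct, which is how a rescaling error propagates into the analysis.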

Using this synthetic-twin framework, we investigate the impact of the REG, VAR, CDF, and TCA rescaling strategies on the accuracy of subsequent EnKF predictions by estimating the error standard deviation of EnKF analysis and the EnKF analysis correlation with the truth ρ(m,t). In particular, we investigate these estimates as a function of ρ(x,y) since it differentiates the suboptimal REG- and VAR-based solutions [in (18) and (19)].

4. Results

Based on the synthetic-twin EnKF setup described above, we examined the performance of various rescaling strategies by selecting three cases (corresponding to low, close to one, and high αy) and using all 20 perturbation scenarios summarized in Table 1. Observation error standard deviations for each of these three cases are plotted in separate panels in Fig. 1 (each line in each panel contains results for 20 different model and observation perturbation values). For the same three cases, the EnKF analysis error standard deviations and ρ(m,t) are presented in Figs. 2 and 3, respectively (similar to Fig. 1, different model and observation perturbation values are plotted for each rescaling method).

Fig. 1. Observation error standard deviations (left axis) and str of observations [see (13)] and the model [see (14)] (both on right axis), for (a) very low (0.12), (b) unity (1.00), and (c) very high (2.5) values of αy.

Citation: Journal of Hydrometeorology 14, 2; 10.1175/JHM-D-12-052.1

Fig. 2. EnKF analysis error standard deviations for three different αy = (a) 0.12, (b) 1.00, and (c) 2.5. For clarity, actual str values (Fig. 1) are not drawn; instead, their max/min values are given. There are overlapping lines: green with brown in (b) and blue with green in (c).

Fig. 3. As in Fig. 2, except EnKF analysis correlations with truth are plotted on the left axis. Actual str values are shown in Fig. 1. There are overlapping lines: brown and green in (b) and blue with green in (c).

Confirming the earlier theoretical analysis, TCA-based rescaling results in the smallest analysis error standard deviations for all cases (Figs. 2a–c), except when ρ(x,y) values are very low. VAR-based analysis errors are very similar to the TCA-based analysis errors when (stry/strx)^0.5 values (subscripts x and y refer to the model and observations, respectively) are around one (Fig. 2b). CDF-based analysis errors (not shown) are very similar to VAR-based analysis errors for all cases. Additionally, the impact of rescaling approaches (i.e., the spread in VAR-, REG-, and TCA-based results) is maximized when ρ(x,y) values are minimized, which emphasizes the importance of accurate rescaling for variables having moderate to low model/observation correlations (such as soil moisture).

The limited cases in which REG- and VAR-based rescaling produce smaller analysis errors than TCA-based rescaling are attributed to the lack of reliability of error variance/standard deviation as a performance metric when comparing datasets with different dynamic ranges (Entekhabi et al. 2010; Gupta et al. 2009). In particular, appendix B demonstrates how suboptimal solutions (VAR and REG based) can produce spuriously low error variances when cy ≫ 1 and strx and stry are very low. This problem is especially acute for REG-based rescaling when cy ≫ 1 since it frequently results in rescaled datasets with very small standard deviations due to grossly underestimated rescaling factors when stry ≪ 0.5. Hence, it is necessary to replot Fig. 2 using ρ(m,t) as an alternative error metric.

Results in Fig. 3 demonstrate that TCA-based rescaling results in the highest ρ(m,t) for all examined cases. In addition, confirming earlier theoretical results, REG- and TCA-based EnKF results have comparable ρ(m,t) when stry values are high (Fig. 3c), and VAR-based rescaling converges to TCA-based rescaling when strx and stry are approximately equal (Fig. 3b).

When αy values are very low, TCA-based observation errors are the highest (Fig. 1a), yet these observations result in the most accurate EnKF analysis (Fig. 3a). It can be shown that TCA-based observation errors become higher than the errors of other rescaling approaches when stry < 0.5 and f < 1 [(stry/strx)^0.5 < 1 or stry < 1]. However, this does not imply anything wrong with TCA-based rescaling; on the contrary, it emphasizes the importance of correctly assigning rescaling factors and illustrates that the goal of rescaling is not necessarily to minimize the observation errors plotted in Fig. 1. Another interesting result in Fig. 2b and Fig. 3b is the suggestion that application of suboptimal rescaling methods can degrade the accuracy of the EnKF (relative to the ORG case of no rescaling) for cases in which little or no rescaling is required (i.e., cy ≈ 1). Consequently, blindly applying any suboptimal scaling method entails an element of risk.

5. Discussion

Given that we present two suboptimal solutions (REG- and VAR-based rescaling) that are widely applied in hydrological sciences, it is of interest to generalize which one leads to a more accurate analysis under specific conditions. Theoretically, the relative accuracy of REG- and VAR-based rescaling depends on the relative magnitudes of stry and (stry/strx)^0.5. However, such information is seldom readily available to developers of land data assimilation systems. Hence, it is not straightforward to offer general advice about whether the REG- or VAR-based rescaling method is the better choice.

Nevertheless, it is possible to perform a consistency check to see whether a particular rescaling approach is consistent with statistical assumptions made during the implementation of a data assimilation system. For example, in the implementation of an EnKF, specific assumptions must be made regarding the error covariance of observations and the forecast uncertainty of the model. Based on these assumptions, estimates of strx and stry can be readily obtained [i.e., stry and strx can be found as $(\sigma_y^2 - \sigma_{\varepsilon_y}^2)/\sigma_y^2$ and $(\sigma_x^2 - \sigma_{\varepsilon_x}^2)/\sigma_x^2$, respectively]. Therefore, a consistency check is possible between these str estimates and rescaling methods. Excluding the special cases given in appendix B, if stry ≤ strx is assumed (observations less accurate than the model or equally accurate), then VAR-based rescaling is preferable because the underestimation of the rescaling factor through VAR-based rescaling would be less than through REG-based rescaling. Conversely, if observations are assumed to be more accurate than the model (i.e., stry > strx), particularly when stry is high (≫0.4), then REG-based rescaling is preferable. In general, the choice of REG- or VAR-based rescaling methods is less critical (perhaps negligible) for very high stry and strx values (str > 0.9). However, note that particular str thresholds acquired from Fig. 2 (e.g., 0.4 and 0.9) might be system specific and not generalizable to other assimilation setups using different land models and/or observations. Nevertheless, at a minimum, this consistency check ensures that an applied rescaling approach is not grossly inconsistent with the error assumptions underlying the application of a particular data assimilation approach.
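
One way to make such a consistency check concrete is sketched below. It is a hypothetical helper, not a procedure given by the authors: str values are derived from assumed total and error variances, and the suboptimal method whose multiplicative factor lies closer to one is reported. The str thresholds quoted above and the special cases of appendix B are deliberately not encoded.

```python
import math

def str_from_error_assumption(total_var, error_var):
    """Signal-to-total variance ratio implied by an assumed error variance."""
    if total_var <= 0 or not 0 <= error_var < total_var:
        raise ValueError("need 0 <= error_var < total_var")
    return (total_var - error_var) / total_var

def closer_approximation(str_x, str_y):
    """Return 'VAR' or 'REG', whichever factor f is nearer to one."""
    f_reg = str_y                       # regression factor
    f_var = math.sqrt(str_y / str_x)    # variance-matching factor
    return "VAR" if abs(1.0 - f_var) <= abs(1.0 - f_reg) else "REG"

# Illustrative assumed variances (model x, observations y).
str_x = str_from_error_assumption(total_var=4.0, error_var=1.0)
str_y = str_from_error_assumption(total_var=9.0, error_var=4.5)
choice = closer_approximation(str_x, str_y)
print(str_x, str_y, choice)
```

In this example the observations are assumed to be noisier than the model, and variance matching comes out as the closer approximation, in line with the guidance above for the case of observations that are less accurate than the model.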

Another important issue is the relevance of this analysis for the case of utilizing an observation operator to directly assimilate satellite brightness temperature Tb observations rather than geophysical retrievals based on the inversion of Tb. One interesting implication of applying a forward model to assimilate Tb is that the errors due to the radiance transfer model are effectively moved from the observation side to the model forecast side of the data assimilation system. As a consequence, assimilating Tb rather than soil moisture leads to an effective decrease in model-based str (strx) and increase in observation str (stry). In many cases, stry could be quite close to one, since the accuracy goal of low-frequency (<10 GHz) satellite Tb retrievals used for soil moisture retrieval (often on the order of 1–3 K) tends to be small relative to the observation dynamic range in true Tb (up to 100 K). This suggests that a REG-based rescaling approach is advantageous for rescaling Tb observations prior to their assimilation as it yields smaller analysis errors when stry is high and stry > strx (Fig. 2). However, it should be stressed that, while results presented here can be trivially generalized for the application of a linear observation operator, it is currently unknown how significantly they are impacted by the presence of a strongly nonlinear observation operator. Therefore, additional analysis will be required to fully describe the implications of this analysis for Tb assimilation based on nonlinear forward radiative transfer calculations.

6. Conclusions

In hydrological assimilation studies, the primary goal is to combine different datasets to obtain a more accurate one via reducing the level of noise in the datasets. However, if datasets do not have a similar systematic relationship with the assumed truth, merging methodologies can result in increased errors even if the product uncertainties are specified correctly. As a result, it is critical to have correctly rescaled datasets before a merging methodology is applied.

This paper investigated existing methods that are widely applied in hydrological data assimilation studies to rescale observations prior to their assimilation into models. Specifically, we have evaluated the VAR-, CDF-, REG-, and TCA-based rescaling methods. Among these methods, the REG-based linear regression solution has been recognized by some studies (Gupta et al. 2009; Holmes et al. 2012) and applied by Crow et al. (2005) and Crow and Zhan (2007), whereas the vast majority of the hydrological assimilation studies have applied VAR- and CDF-based rescaling strategies. Although the triple collocation solution of Stoffelen (1998) has been widely applied, it was not particularly emphasized before that its intermediate rescaling step should be applied in hydrological data assimilation studies.

In a hydrological assimilation study, if the errors of the reference and the matched datasets (i.e., hydrological model and the observations) are assumed negligible when compared to the real signal (implying very high str values), then these suboptimal rescaling factor solutions give very close to optimal estimates. However, for many hydrological studies the noise of the datasets cannot be ignored; hence, the rescaling method should also take into account the magnitude of the noise components of both datasets. Among the methods, VAR- and CDF-based rescaling methods match the total variance of observations to the model while neglecting the noise contributions of the datasets (Gao et al. 2007), whereas the REG-based rescaling takes into account these error components via the additional multiplication factor of the correlation coefficient. Nevertheless, the VAR-, CDF-, and REG-based rescaling methods are only suboptimal solutions as they generally violate the orthogonality property of an optimal estimation procedure (section 2a). As a result, they provide only approximations to the optimal estimate with a multiplication factor f [fR = stry for the REG-based solution and fV = (stry/strx)^0.5 for the VAR-based solution]. Hence, the suboptimality of these methods is reduced and their solutions converge to the true solution only when fR or fV converges to one.

This analytical description is confirmed via a set of numerical synthetic-twin experiments using a simple soil moisture model. After rescaling the observations with the VAR-, CDF-, REG-, and TCA-based methods, we find that TCA rescaling leads to the most accurate EnKF analysis, with the exception of the cases clarified in appendix B. Accordingly, it is best to use TCA-based rescaling factors when available as long as the underlying assumptions (independence of errors, mutual linear relation, and long enough datasets) are met. If these conditions cannot be met, which often is the case in hydrological assimilation studies, then suboptimal approximations (VAR-, CDF-, and REG-based rescaling) must be used. However, in such cases, it should be recognized that such rescaling introduces a suboptimal element into the analysis that may degrade subsequent data assimilation results. The relative optimality of these approximations depends on the str of model and observations. Therefore, developers of land data assimilation systems applying a particular suboptimal approach should examine the consistency of their rescaling approach with error assumptions underlying the application of their data assimilation system. While a simple model is sufficient to clarify the underlying rescaling principles described above, follow-on work with more complex land surface models will be required to fully quantify the overall impact of rescaling errors on land data assimilation analysis products.

Acknowledgments

We thank two anonymous reviewers and Bart Forman for their constructive comments, which led to numerous clarifications in the final version of the manuscript. Research was partially supported by Wade Crow’s membership in the NASA Soil Moisture Active/Passive Science Definition Team. The United States Department of Agriculture is an equal opportunity provider and employer.

APPENDIX A

Numerical Solutions for the Rescaling Factor

a. Rescaling factor from linear least squares regression

One potential rescaling strategy is choosing a value of cy by linearly regressing y onto x and obtaining the best linear expression for x in terms of y. This least squares sense solution can be found by minimizing the mean square difference (msd) between x and y*:
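$$ \mathrm{msd} = E[(x - y^{*})^2]. \tag{A1} $$
Using (1) and (3), the above can be written as
$$ \mathrm{msd} = E\{[\mu_x + \alpha_x t' + \varepsilon_x - \mu_x - c_y (y - \mu_y)]^2\} \tag{A2} $$
$$ \mathrm{msd} = E\{[(\alpha_x - c_y \alpha_y)\, t' + \varepsilon_x - c_y \varepsilon_y]^2\} \tag{A3} $$
$$ \mathrm{msd} = (\alpha_x - c_y \alpha_y)^2 \sigma_{t'}^2 + \sigma_{\varepsilon_x}^2 + c_y^2 \sigma_{\varepsilon_y}^2, \tag{A4} $$
where the cross terms between t′, εx, and εy are taken to be zero. By taking the first derivative of (A4) with respect to cy and setting it to zero, we find the regression-based rescaling factor solution
$$ -2 \alpha_y (\alpha_x - c_R \alpha_y)\, \sigma_{t'}^2 + 2 c_R \sigma_{\varepsilon_y}^2 = 0 \tag{A5} $$
$$ c_R = \frac{\alpha_x \alpha_y \sigma_{t'}^2}{\alpha_y^2 \sigma_{t'}^2 + \sigma_{\varepsilon_y}^2} \tag{A6} $$
$$ c_R = \frac{\alpha_x \alpha_y \sigma_{t'}^2}{\sigma_y^2}. \tag{A7} $$
Using the definitions of stry in (13) and strx in (14), cR in (A7) can be written as
$$ c_R = \frac{\alpha_x}{\alpha_y}\, \mathrm{str}_y. \tag{A8} $$
However, the right-hand side of (A8) cannot always be obtained in this form since the calculation of αx, αy, and stry requires additional ground truth or ancillary datasets that are often not available. Consequently, we will rewrite (A8) in terms of readily available variables. To do this, we apply the definition of correlation between model and observation ρ(x,y):
$$ \rho(x,y) = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y} \tag{A9} $$
$$ \rho(x,y) = \frac{E[(\alpha_x t' + \varepsilon_x)(\alpha_y t' + \varepsilon_y)]}{\sigma_x \sigma_y}. \tag{A10} $$
Assuming $E[\varepsilon_x t'] = 0$, $E[\varepsilon_y t'] = 0$, and $E[\varepsilon_x \varepsilon_y] = 0$, (A10) can be written as
$$ \rho(x,y) = \frac{\alpha_x \alpha_y \sigma_{t'}^2}{\sigma_x \sigma_y} \tag{A11} $$
$$ \rho(x,y) = \frac{\alpha_x}{\alpha_y}\, \frac{\alpha_y^2 \sigma_{t'}^2}{\sigma_y^2}\, \frac{\sigma_y}{\sigma_x} \tag{A12} $$
$$ \rho(x,y) = \frac{\alpha_x}{\alpha_y}\, \mathrm{str}_y\, \frac{\sigma_y}{\sigma_x}. \tag{A13} $$
Using (A13), the regression-based rescaling factor solution in (A8) can be rewritten as
$$ c_R = \rho(x,y)\, \frac{\sigma_x}{\sigma_y}. \tag{A14} $$
In this form, cR can be obtained in any data assimilation application.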

b. Rescaling factor from variance matching

A widely applied rescaling strategy is transforming y so that its statistical moments match that of x. Since the form of (3) already ensures a match in means, the simplest viable case of this transformation is based solely on matching variances. Here, the rescaling factor from variance matching is given by
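$$ c_V = \frac{\sigma_x}{\sigma_y}. \tag{A15} $$
Below we rewrite (A15) in a way that resembles (A8) to clarify later the differences between different rescaling factor solutions. Assuming the noise components of the datasets are independent from the truth ($E[\varepsilon_x t'] = 0$ and $E[\varepsilon_y t'] = 0$), $\sigma_x^2$ and $\sigma_y^2$ can be written as
$$ \sigma_x^2 = \alpha_x^2 \sigma_{t'}^2 + \sigma_{\varepsilon_x}^2 \tag{A16} $$
$$ \sigma_y^2 = \alpha_y^2 \sigma_{t'}^2 + \sigma_{\varepsilon_y}^2. \tag{A17} $$
Using these definitions, (A15) becomes
$$ c_V = \sqrt{\frac{\alpha_x^2 \sigma_{t'}^2 + \sigma_{\varepsilon_x}^2}{\alpha_y^2 \sigma_{t'}^2 + \sigma_{\varepsilon_y}^2}} \tag{A18} $$
$$ c_V = \frac{\alpha_x}{\alpha_y} \sqrt{\frac{\alpha_y^2 \sigma_{t'}^2 / \sigma_y^2}{\alpha_x^2 \sigma_{t'}^2 / \sigma_x^2}} \tag{A19} $$
$$ c_V = \frac{\alpha_x}{\alpha_y} \sqrt{\frac{\mathrm{str}_y}{\mathrm{str}_x}}. \tag{A20} $$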

c. Rescaling factor from triple collocation

Triple collocation analysis (TCA) is an error magnitude estimation method that uses three linearly related independent products to obtain the errors of each product separately. It was initially introduced for error magnitude estimation in oceanic studies (Stoffelen 1998; Caires and Sterl 2003), and has recently been applied to large-scale soil moisture error estimation-based studies (Scipal et al. 2008; Parinussa et al. 2011; Hain et al. 2011; Yilmaz et al. 2012; Anderson et al. 2012). These studies are typically based on one model-based soil moisture product and two remotely sensed products derived from contrasting remote sensing retrieval techniques (e.g., passive and active microwave).

As a first step, TCA requires products to be rescaled to each other to match their signal statistics. Following Stoffelen (1998), such rescaling is based on applying TCA-based rescaling factor
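$$ c_T = \frac{E[x' z']}{E[y' z']} \tag{A21} $$
in (3), where x′ = x − μx and y′ = y − μy are the time anomalies of x and y, and z is a third independent product that is similar to x and y in (1) and (2), defined as z = μz + αzt′ + εz with time anomaly z′ = αzt′ + εz. Assuming all product errors are independent from both the truth and each other, (A21) can be rewritten as
$$ c_T = \frac{E[(\alpha_x t' + \varepsilon_x)(\alpha_z t' + \varepsilon_z)]}{E[(\alpha_y t' + \varepsilon_y)(\alpha_z t' + \varepsilon_z)]} \tag{A22} $$
$$ c_T = \frac{\alpha_x \alpha_z \sigma_{t'}^2}{\alpha_y \alpha_z \sigma_{t'}^2} \tag{A23} $$
$$ c_T = \frac{\alpha_x}{\alpha_y}. \tag{A24} $$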
Therefore, by taking advantage of a third independent product, TCA is able to recover the optimal rescaling factor defined in (9).
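
The recovery of the optimal factor from three products can be illustrated with synthetic data. The sketch below assumes the covariance-ratio form of the triple collocation rescaling factor (the covariance of x with z divided by the covariance of y with z) and mutually independent errors; the function name and all parameter values are illustrative assumptions.

```python
import numpy as np

def tca_rescaling_factor(x, y, z):
    """Rescaling factor for y (reference x) using a third product z."""
    xa = x - x.mean()   # anomalies of each product
    ya = y - y.mean()
    za = z - z.mean()
    return float(np.mean(xa * za) / np.mean(ya * za))

# Illustrative synthetic triplet sharing one true anomaly.
rng = np.random.default_rng(7)
n = 100_000
t_anom = rng.standard_normal(n)
alpha_x, alpha_y, alpha_z = 1.0, 2.5, 0.7
x = 3.0 + alpha_x * t_anom + 0.5 * rng.standard_normal(n)
y = 9.0 + alpha_y * t_anom + 0.5 * rng.standard_normal(n)
z = 1.0 + alpha_z * t_anom + 0.5 * rng.standard_normal(n)

c_tca = tca_rescaling_factor(x, y, z)   # close to alpha_x / alpha_y = 0.4
c_var = x.std() / y.std()               # variance matching, for comparison
print(c_tca, c_var)
```

Because the errors of z are independent of those of x and y, the noise terms drop out of both covariances, so the estimate does not depend on the noise levels, unlike the variance-matching factor printed alongside it.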

APPENDIX B

Optimal versus Suboptimal Rescaling Error Variances

The difference between the error variances of the optimal (TCA) rescaling and suboptimal (VAR and REG) rescaling is compared. Assuming , , and , following (6), the error variance of the optimal and the suboptimal analysis are found as
eb1
eb2
where co ( or in section 2) and cs ( or in section 2) are the rescaling factors of the optimal and suboptimal rescaling factors, respectively, and wo and ws are the weights of the rescaled observations associated with the optimal and suboptimal rescaling factors, respectively. Given optimal analysis satisfies αycoαx = 0, (B1) can be written as
eb3
We are interested in the relative magnitudes of the optimal and suboptimal error variances; hence, we investigate their difference using (B2) and (B3):
eb4
eb5
We are particularly interested in this difference when strx, stry, and αy are very low and stry < strx [setup in Fig. 2a for very low ρ(x,y)]. Furthermore, for the scenario in Fig. 2a αy ≪ 1, hence co ≫ 1.
For the VAR-based solution, (considering both strx and stry are very low), hence ws ~ 0.5 [considering ]. Since stry < strx < 1, then f < 1, hence cs < co (recall that cs = co f). Accordingly, (given co ≫ 1), and ws ≈ wo. Under this condition (ws ≈ wo), the first term in (B5) can be approximated to , which is approximately (considering ws ~ 0.5). Since cs < co, the assumption of cs ≈ co overestimates the third term in (B5). Thus, this assumption overall results in a higher number (approximately since ) to be subtracted in (B5). Using these approximations for the first and the third terms, (B5) is rewritten as
eb6
Following above , third term can be dropped and (B6) becomes
eb7
Here cs < co and wo ≈ ws, hence is slightly higher than , which results in the second term in (B6) being negative. However, this term is still much lower than the first term , hence the overall difference remains positive, which implies for the VAR solution.

Similarly, for the REG-based solution, stry ≪ 1, hence csco. It follows that , hence wswo, which also results in the difference being positive, parallel to the VAR-based solution. Accordingly, suboptimal rescaling solutions (VAR and REG based) can result in smaller error variance than optimal solutions (TCA based) when αy ≪ 1, and strx and stry are very low. For these conditions, the REG-based rescaling strategy is particularly prone to spuriously low error variances since tends to be lower than , while both and underestimate cy (given and , if stry < strx ≪ 1 then ).
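
The comparison above can be illustrated numerically with a minimal Python sketch. Because the expressions in (B1)-(B5) are not reproduced here, the sketch relies on its own stated assumptions: the analysis anomaly is (1 − w)x′ + w c y′, the truth in model space is αx t′, all errors are mutually independent, and the weight is w = σ²εx/(σ²εx + c² σ²εy). The value αy = 0.12 follows the very low case of Fig. 1; the remaining variances and all function and variable names are hypothetical.

```python
import math


def analysis_error_variance(c, alpha_x, alpha_y, var_t, var_ex, var_ey):
    """Weight and error variance of a scalar analysis, under assumed forms.

    Assumptions (illustrative only):
      analysis = (1 - w) * x' + w * c * y'
      truth in model space = alpha_x * t'
      w = var_ex / (var_ex + c**2 * var_ey)
    """
    r = c ** 2 * var_ey                      # error variance of rescaled observation
    w = var_ex / (var_ex + r)                # weight given to the rescaled observation
    signal_term = w ** 2 * (c * alpha_y - alpha_x) ** 2 * var_t
    model_term = (1.0 - w) ** 2 * var_ex
    obs_term = w ** 2 * r
    return w, signal_term + model_term + obs_term


# Hypothetical low signal-to-noise setup with a very low alpha_y.
alpha_x, alpha_y = 1.0, 0.12
var_t, var_ex, var_ey = 1.0, 100.0, 2.88

var_x = alpha_x ** 2 * var_t + var_ex        # total variance of x'
var_y = alpha_y ** 2 * var_t + var_ey        # total variance of y'
cov_xy = alpha_x * alpha_y * var_t           # covariance of x' and y'

c_opt = alpha_x / alpha_y                    # optimal factor
c_var = math.sqrt(var_x / var_y)             # variance matching (assumed form)
c_reg = cov_xy / var_y                       # least squares slope (assumed form)

w_opt, var_opt = analysis_error_variance(c_opt, alpha_x, alpha_y, var_t, var_ex, var_ey)
w_var, var_var = analysis_error_variance(c_var, alpha_x, alpha_y, var_t, var_ex, var_ey)
w_reg, var_reg = analysis_error_variance(c_reg, alpha_x, alpha_y, var_t, var_ex, var_ey)

for name, c, w, v in [("TCA", c_opt, w_opt, var_opt),
                      ("VAR", c_var, w_var, var_var),
                      ("REG", c_reg, w_reg, var_reg)]:
    print(f"{name}: c = {c:.3f}, w = {w:.3f}, error variance = {v:.3f}")
```

For these particular values, the VAR weight is close to 0.5 and both suboptimal factors give a lower error variance than the optimal factor, with REG the lowest. This ordering holds only under the assumptions stated above and may differ for other parameter choices.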

REFERENCES

  • Anderson, W. B., Zaitchik B. F., Hain C. R., Anderson M. C., Yilmaz M. T., Mecikalski J., and Schultz L., 2012: Towards an integrated soil moisture drought monitor for East Africa. Hydrol. Earth Syst. Sci., 9, 4587–4631.
  • Caires, S., and Sterl A., 2003: Validation of ocean wind and wave data using triple collocation. J. Geophys. Res., 108, 3098, doi:10.1029/2002JC001491.
  • Chui, C. K., and Chen G., 1998: Kalman Filtering with Real-Time Applications. Springer, 230 pp.
  • Crow, W. T., and Zhan X., 2007: Continental-scale evaluation of remotely sensed soil moisture products. IEEE Geosci. Remote Sens. Lett., 4, 451–455.
  • Crow, W. T., Bindlish R., and Jackson T. J., 2005: The added value of spaceborne passive microwave soil moisture retrievals for forecasting rainfall-runoff partitioning. Geophys. Res. Lett., 32, L18401, doi:10.1029/2005GL023543.
  • Entekhabi, D., and Coauthors, 2010: The Soil Moisture Active Passive (SMAP) Mission. Proc. IEEE, 98, 704–716.
  • Gao, H., Wood E. F., Drusch M., and McCabe M. F., 2007: Copula-derived observation operators for assimilating TMI and AMSR-E retrieved soil moisture into land surface models. J. Hydrometeor., 8, 413–429.
  • Gupta, H. V., Kling H., Yilmaz K. K., and Martinez G. F., 2009: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol., 377 (1–2), 80–91.
  • Hain, C. R., Crow W. T., Mecikalski J. R., Anderson M. C., and Holmes T., 2011: An intercomparison of available soil moisture estimates from thermal infrared and passive microwave remote sensing and land surface modeling. J. Geophys. Res., 116, D15107, doi:10.1029/2011JD015633.
  • Holmes, T. R. H., Jackson T. J., Reichle R. H., and Basara J. B., 2012: An assessment of surface soil temperature products from numerical weather prediction models using ground-based measurements. Water Resour. Res., 48, W02531, doi:10.1029/2011WR010538.
  • Jackson, T. J., and Coauthors, 2010: Validation of Advanced Microwave Scanning Radiometer soil moisture products. IEEE Trans. Geosci. Remote Sens., 48, 4256–4272.
  • Koster, R. D., Guo Z., Yang R., Dirmeyer P. A., Mitchell K., and Puma M. J., 2009: On the nature of soil moisture in land surface models. J. Climate, 22, 4322–4335.
  • Parinussa, R. M., Holmes T. R. H., Yilmaz M. T., and Crow W. T., 2011: The impact of land surface temperature on soil moisture anomaly detection from passive microwave observations. Hydrol. Earth Syst. Sci., 15, 3135–3151.
  • Reichle, R. H., and Koster R. D., 2004: Bias reduction in short records of satellite soil moisture. Geophys. Res. Lett., 31, L19501, doi:10.1029/2004GL020938.
  • Reichle, R. H., Koster R. D., Dong J., and Berg A. A., 2004: Global soil moisture from satellite observations, land surface models, and ground data: Implications for data assimilation. J. Hydrometeor., 5, 430–442.
  • Scipal, K., Holmes T., de Jeu R., Naeimi V., and Wagner W., 2008: A possible solution for the problem of estimating the error structure of global soil moisture data sets. Geophys. Res. Lett., 35, L24403, doi:10.1029/2008GL035599.
  • Stoffelen, A., 1998: Toward the true near-surface wind speed: Error modeling and calibration using triple collocation. J. Geophys. Res., 103 (C4), 7755–7766.
  • Yilmaz, M. T., Crow W. T., Anderson M. C., and Hain C., 2012: An objective methodology for merging satellite- and model-based soil moisture products. Water Resour. Res., 48, W11502, doi:10.1029/2011WR011682.

Save
  • Anderson, W. B., Zaitchik B. F. , Hain C. R. , Anderson M. C. , Yilmaz M. T. , Mecikalski J. , and Schultz L. , 2012: Towards an integrated soil moisture drought monitor for East Africa. Hydrol. Earth Syst. Sci., 9, 45874631.

    • Search Google Scholar
    • Export Citation
  • Caires, S., and Sterl A. , 2003: Validation of ocean wind and wave data using triple collocation. J. Geophys. Res., 108, 3098, doi:10.1029/2002JC001491.

    • Search Google Scholar
    • Export Citation
  • Chui, C. K., and Chen G. , 1998: Kalman Filtering with Real-Time Applications. Springer, 230 pp.

  • Crow, W. T., and Zhan X. , 2007: Continental-scale evaluation of remotely sensed soil moisture products. IEEE Geosci. Remote Sens. Lett., 4, 451455.

    • Search Google Scholar
    • Export Citation
  • Crow, W. T., Bindlish R. , and Jackson T. J. , 2005: The added value of spaceborne passive microwave soil moisture retrievals for forecasting rainfall-runoff partitioning. J. Geophys. Res.,32, L18401, doi:10.1029/2005GL023543.

  • Entekhabi, D., and Coauthors, 2010: The Soil Moisture Active Passive (SMAP) Mission. Proc. IEEE, 98, 704716.

  • Gao, H., Wood E. F. , Drusch M. , and Mccabe M. F. , 2007: Copula-derived observation operators for assimilating TMI and AMSR-E retrieved soil moisture into land surface models. J. Hydrometeor., 8, 413429.

    • Search Google Scholar
    • Export Citation
  • Gupta, H. V., Kling H. , Yilmaz K. K. , and Martinez G. F. , 2009: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol., 377 (1–2), 8091.

    • Search Google Scholar
    • Export Citation
  • Hain, C. R., Crow W. T. , Mecikalski J. R. , Anderson M. C. , and Holmes T. , 2011: An intercomparison of available soil moisture estimates from thermal infrared and passive microwave remote sensing and land surface modeling. J. Geophys. Res.,116, D15107, doi:10.1029/2011JD015633.

  • Holmes, T. R. H., Jackson T. J. , Reichle R. H. , and Basara J. B. , 2012: An assessment of surface soil temperature products from numerical weather prediction models using ground-based measurements. Water Resour. Res.,48, W02531, doi:10.1029/2011WR010538.

  • Jackson, T. J., and Coauthors, 2010: Validation of Advanced Microwave Scanning Radiometer soil moisture products. IEEE Trans. Geosci. Remote Sens., 48, 42564272.

    • Search Google Scholar
    • Export Citation
  • Koster, R. D., Guo Z. , Yang R. , Dirmeyer P. A. , Mitchell K. , and Puma M. J. , 2009: On the nature of soil moisture in land surface models. J. Climate, 22, 43224335.

    • Search Google Scholar
    • Export Citation
  • Parinussa, R. M., Holmes T. R. H. , Yilmaz M. T. , and Crow W. T. , 2011: The impact of land surface temperature on soil moisture anomaly detection from passive microwave observations. Hydrol. Earth Syst. Sci., 15, 31353151.

    • Search Google Scholar
    • Export Citation
  • Reichle, R. H., and Koster R. D. , 2004: Bias reduction in short records of satellite soil moisture. Geophys. Res. Lett.,31, L19501, doi:10.1029/2004GL020938.

  • Reichle, R. H., Koster R. D. , Dong J. , and Berg A. A. , 2004: Global soil moisture from satellite observations, land surface models, and ground data: Implications for data assimilation. J. Hydrometeor., 5, 430442.

    • Search Google Scholar
    • Export Citation
  • Scipal, K., Holmes T. , de Jeu R. , Naeimi V. , and Wagner W. , 2008: A possible solution for the problem of estimating the error structure of global soil moisture data sets. Geophys. Res. Lett.,35, L24403, doi:10.1029/2008GL035599.

  • Stoffelen, A., 1998: Toward the true near-surface wind speed: Error modeling and calibration using triple collocation. J. Geophys. Res., 103 (C4), 77557766.

    • Search Google Scholar
    • Export Citation
  • Yilmaz, M. T., Crow W. T. , Anderson M. C. , and Hain C. , 2012: An objective methodology for merging satellite- and model-based soil moisture products. Water Resour. Res.,48, W11502, doi:10.1029/2011WR011682.

  • Fig. 1.

    Observation error standard deviations (left axis) and str of observations [see (13)] and the model [see (14)] (both on right axis), for (a) very low (0.12), (b) unity (1.00), and (c) very high (2.5) values of αy.

  • Fig. 2.

    EnKF analysis error standard deviations for three different αy = (a) 0.12, (b) 1.00, and (c) 2.5. For clarity, actual str values (Fig. 1) are not drawn; instead, their max/min values are given. There are overlapping lines: green with brown in (b) and blue with green in (c).

  • Fig. 3.

    As in Fig. 2, except EnKF analysis correlations with truth are plotted on the left axis. Actual str values are shown in Fig. 1. There are overlapping lines: brown and green in (b) and blue with green in (c).
