1. Introduction
Given the availability of multiple approaches (i.e., models, in situ observations, and remote sensing) for estimating many geophysical variables, it is often desirable to merge them to obtain a more accurate product. In data assimilation, the goal is to optimally merge independent datasets with different error characteristics to obtain an analysis product with higher accuracy than all of the parent products.
However, the use of different modeling and/or observational approaches typically leads to predictions with different systematic relationships to the assumed truth. This is particularly true for soil moisture data assimilation, given well-known climatological differences in both model-derived (Koster et al. 2009) and remotely sensed (Jackson et al. 2010) soil moisture products. Additionally, the absolute values of both model and satellite estimates differ from ground observations (Reichle and Koster 2004; Reichle et al. 2004). Hence, it is crucial to remove systematic differences between datasets before using them in a hydrological data assimilation framework (Reichle and Koster 2004). This is commonly achieved by rescaling soil moisture observations to match model-predicted soil moisture (in some statistical sense) during a preprocessing step.
Several potential strategies for such rescaling have been proposed and applied in recent land data assimilation studies. Among them, cumulative distribution function (CDF) matching (Reichle and Koster 2004) and variance matching techniques are perhaps the most common. A handful of studies have applied rescaling based on least squares regression techniques (Crow et al. 2005; Crow and Zhan 2007) but failed to offer any clear rationale for this choice. Additionally, signal variance-based rescaling, typically applied as a preprocessing step in triple collocation analysis (Stoffelen 1998), also provides a means to rescale datasets using three independent estimates of the same variable. However, this approach has not yet been applied in soil moisture data assimilation.
Although there are many existing methods for rescaling hydrological variables, their optimality in terms of analysis errors in an assimilation framework has not yet been assessed. This paper investigates the relative performances of the above-mentioned rescaling methods both analytically and numerically.
The theoretical rationale for rescaling, and the degree to which rescaling techniques discussed above are consistent with this rationale, are discussed in the next section. Section 3 briefly presents the numerical experiment setup, section 4 presents the numerical results, section 5 discusses the implications of the results, and section 6 summarizes our conclusions.
2. Rescaling datasets
a. Analytical solution for the rescaling factor









The purpose of data assimilation is to reduce the magnitude of the noise component while preserving the information carried by the signal components. Although the products x and y resemble each other in the way they represent the truth, they often differ systematically as well (i.e., they have different μ and α). Therefore, without knowledge of the truth, arguably the best way to ensure that the merged product has minimum error variance (assuming the product uncertainties are characterized accurately) is to match datasets x and y so that the systematic differences between them are minimized prior to data assimilation. In practice, this matching is done by selecting one of the datasets as the reference and linearly rescaling the other one.
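As a hedged illustration of the linear rescaling described above, suppose that both products are linearly related to the truth t through an additive offset μ and a multiplicative factor α. The zero-mean noise terms ε_x and ε_y, the temporal means (overbars), and the symbol y* for the rescaled observations are notation assumed here for the sketch; c_y denotes the rescaling factor and x is taken as the reference.

```latex
\begin{align*}
x &= \mu_x + \alpha_x t + \varepsilon_x, \\
y &= \mu_y + \alpha_y t + \varepsilon_y, \\
y^{*} &= \bar{x} + c_y\,(y - \bar{y})
       = \mu_x + \alpha_x \bar{t} + c_y \alpha_y\,(t - \bar{t}) + c_y\,\varepsilon_y .
\end{align*}
```

Under these assumptions (and neglecting the sample mean of the noise), y* has the same systematic relationship to t as x only when c_y α_y = α_x. The rescaling also multiplies the noise in y by c_y, so the observation error variance supplied to the assimilation system has to be scaled accordingly.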




















b. Numerical solutions for the rescaling factor



































Note that, considering the definition in (4),
c. Optimal versus suboptimal solutions
Above we derive the optimal solution for cy in a sequential filtering framework as
3. Synthetic-twin experiment setup




Using the above API model, we have created daily synthetic ground truth t. Control runs x are obtained from model runs that do not assimilate observations, while API values from d to d + 1 are additively perturbed with random numbers that have mean of zero and standard deviations given in Table 1. Original observations y are created by multiplying the truth with a constant (true observation scaling factors αy) and then adding mean-zero random noise with the same standard deviations as the control run (Table 1). We later rescale y to x by using four different rescaling methods: VAR observations are created using (19), CDF observations are created by using the CDF-matching technique described by Reichle and Koster (2004), REG observations are created using (18), and TCA observations are created using (12). For the TCA-based rescaling, z values are created in an identical way to y but using a different random number sequence.
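The construction above can be sketched in a few lines of Python. This is a simplified illustration, not the experiment itself: the truth is a stand-in AR(1) series rather than the API model, the control run is approximated as truth plus additive noise, and the function names, sample size, and parameter values are all assumptions. The three factors use the standard variance-ratio, least squares slope, and covariance-ratio forms, which are assumed here to correspond to the VAR, REG, and TCA expressions referenced in the text.

```python
import numpy as np


def make_synthetic_products(n=20000, alpha_y=0.5, noise_sd=0.5, seed=0):
    """Return (t, x, y, z): a stand-in truth and three noisy products."""
    rng = np.random.default_rng(seed)
    # Stand-in truth: a zero-mean AR(1) series (an assumption of this sketch,
    # not the API model used in the experiment).
    shocks = rng.normal(0.0, 1.0, n)
    t = np.zeros(n)
    for i in range(1, n):
        t[i] = 0.85 * t[i - 1] + shocks[i]
    # Simplified control run: truth plus additive noise (alpha_x = 1).
    x = t + rng.normal(0.0, noise_sd, n)
    # Observations: scaled truth plus mean-zero noise; z uses fresh noise.
    y = alpha_y * t + rng.normal(0.0, noise_sd, n)
    z = alpha_y * t + rng.normal(0.0, noise_sd, n)
    return t, x, y, z


def rescaling_factors(x, y, z):
    """Multiplicative factors that rescale y toward the reference x."""
    q = np.cov(np.vstack([x, y, z]))  # 3x3 sample covariance matrix
    return {
        "VAR": np.sqrt(q[0, 0] / q[1, 1]),  # ratio of standard deviations
        "REG": q[0, 1] / q[1, 1],           # least squares slope of x on y
        "TCA": q[0, 2] / q[1, 2],           # ratio of covariances with z
    }


def rescale(y, x, c):
    """Linearly rescale y to the reference x with factor c, matching means."""
    return x.mean() + c * (y - y.mean())


t, x, y, z = make_synthetic_products()
factors = rescaling_factors(x, y, z)
for name, c in factors.items():
    print(f"{name}: c = {c:.3f}")
```

In this configuration the ratio of the true scaling factors is 1/0.5 = 2, so the TCA-based factor should land close to 2, while the REG- and VAR-based factors are pulled away from it by the noise in y and x.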
Table 1. Standard deviation cases for random additive observation and model perturbations σom (20 cases total).

















Using this synthetic-twin framework, we investigate the impact of the REG, VAR, CDF, and TCA rescaling strategies on the accuracy of subsequent EnKF predictions by estimating the error standard deviation of EnKF analysis
4. Results
Based on the synthetic-twin EnKF setup described above, we examined the performance of various rescaling strategies by selecting three

Fig. 1. Observation error standard deviations (left axis) and str of observations [see (13)] and the model [see (14)] (both on right axis), for (a) very low (0.12), (b) unity (1.00), and (c) very high (2.5) values of αy.
Citation: Journal of Hydrometeorology 14, 2; 10.1175/JHM-D-12-052.1


Fig. 2. EnKF analysis error standard deviations for three different αy = (a) 0.12, (b) 1.00, and (c) 2.5. For clarity, actual str values (Fig. 1) are not drawn; instead, their max/min values are given. There are overlapping lines: green with brown in (b) and blue with green in (c).

Fig. 3. As in Fig. 2, except EnKF analysis correlations with truth are plotted on the left axis. Actual str values are shown in Fig. 1. There are overlapping lines: brown and green in (b) and blue with green in (c).
Confirming the earlier theoretical analysis, TCA-based rescaling results in the smallest
Limited cases, where REG- and VAR-based rescaling produces smaller
Results in Fig. 3 demonstrate that TCA-based rescaling results in the highest ρ(m,t) for all examined cases. In addition, confirming earlier theoretical results, REG- and TCA-based EnKF results have comparable ρ(m,t) when stry values are high (Fig. 3c), and VAR-based rescaling converges to TCA-based rescaling when strx and stry are approximately equal (Fig. 3b).
When αy values are very low, TCA-based
5. Discussion
Given that we present two suboptimal solutions (REG- and VAR-based rescaling) that are widely applied in hydrological sciences, it is of interest to generalize which one leads to a more accurate analysis under specific conditions. Theoretically, the relative accuracy of REG- and VAR-based rescaling depends on the relative magnitudes of stry and (stry/strx)^0.5. However, such information is seldom readily available to developers of land data assimilation systems. Hence, it is not straightforward to offer general advice about whether the REG- or VAR-based rescaling method is preferable.
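To make the comparison concrete, the two quantities named above can be evaluated for a few str pairs. The sketch below is only illustrative: the function name and the example values are hypothetical, and it assumes that each str lies between zero and one and that the REG- and VAR-based rescaling factors equal the optimal factor multiplied by stry and (stry/strx)^0.5, respectively, so that a value of one reproduces the optimal rescaling.

```python
import math


def suboptimality_factors(str_x, str_y):
    """Return (f_REG, f_VAR), the factors multiplying the optimal rescaling."""
    # Assumption of this sketch: str values lie in (0, 1].
    if not (0.0 < str_x <= 1.0 and 0.0 < str_y <= 1.0):
        raise ValueError("str values are expected to lie in (0, 1]")
    f_reg = str_y                     # REG-based factor
    f_var = math.sqrt(str_y / str_x)  # VAR-based factor
    return f_reg, f_var


# Hypothetical (str_x, str_y) pairs, chosen only for illustration.
for str_x, str_y in [(0.8, 0.8), (0.5, 0.95), (0.9, 0.4)]:
    f_reg, f_var = suboptimality_factors(str_x, str_y)
    print(f"str_x={str_x}, str_y={str_y}: f_REG={f_reg:.3f}, f_VAR={f_var:.3f}")
```

The first pair illustrates why VAR-based rescaling matches the optimal solution when strx and stry are equal, and the second shows the REG-based factor approaching one as stry becomes high.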
Nevertheless, it is possible to perform a consistency check to see whether a particular rescaling approach is consistent with statistical assumptions made during the implementation of a data assimilation system. For example, in the implementation of an EnKF, specific assumptions must be made regarding the error covariance of observations and the forecast uncertainty of the model. Based on these assumptions, estimates of strx and stry can be readily obtained [i.e., stry and strx can be found as
Another important issue is the relevance of this analysis for the case of utilizing an observation operator to directly assimilate satellite brightness temperature Tb observations rather than geophysical retrievals based on the inversion of Tb. One interesting implication of applying a forward model to assimilate Tb is that the errors due to the radiative transfer model are effectively moved from the observation side to the model forecast side of the data assimilation system. As a consequence, assimilating Tb rather than soil moisture leads to an effective decrease in model-based str (strx) and an increase in observation str (stry). In many cases, stry could be quite close to one, since the accuracy goal of low-frequency (<10 GHz) satellite Tb observations used for soil moisture retrieval (often on the order of 1–3 K) tends to be small relative to the dynamic range of true Tb (up to 100 K). This suggests that a REG-based rescaling approach is advantageous for rescaling Tb observations prior to their assimilation, as it yields smaller analysis errors when stry is high and stry > strx (Fig. 2). However, it should be stressed that, while the results presented here can be trivially generalized to the application of a linear observation operator, it is currently unknown how significantly they are impacted by the presence of a strongly nonlinear observation operator. Therefore, additional analysis will be required to fully describe the implications of this analysis for Tb assimilation based on nonlinear forward radiative transfer calculations.
6. Conclusions
In hydrological assimilation studies, the primary goal is to combine different datasets to obtain a more accurate one by reducing the level of noise in the datasets. However, if the datasets do not have a similar systematic relationship with the assumed truth, merging methodologies can result in increased errors even if the product uncertainties are specified correctly. As a result, it is critical to have correctly rescaled datasets before a merging methodology is applied.
This paper investigated existing methods that are widely applied in hydrological data assimilation studies to rescale observations prior to their assimilation into models. Specifically, we have evaluated the VAR-, CDF-, REG-, and TCA-based rescaling methods. Among these methods, the REG-based linear regression solution has been recognized by some studies (Gupta et al. 2009; Holmes et al. 2012) and applied by Crow et al. (2005) and Crow and Zhan (2007), whereas the vast majority of hydrological assimilation studies have applied VAR- and CDF-based rescaling strategies. Although the triple collocation solution of Stoffelen (1998) has been widely applied, it has not previously been emphasized that its intermediate rescaling step should be applied in hydrological data assimilation studies.
In a hydrological assimilation study, if the errors of the reference and the matched datasets (i.e., hydrological model and the observations) are assumed negligible when compared to the real signal (implying very high str values), then these suboptimal rescaling factor solutions give very close to optimal estimates. However, for many hydrological studies the noise of the datasets cannot be ignored, hence the rescaling method should also take into account the magnitude of the noise components of both datasets. Among the methods, VAR- and CDF-based rescaling methods match the total variance of observations to the model while neglecting the noise contributions of the datasets (Gao et al. 2007), whereas the REG-based rescaling takes into account these error components via the additional multiplication factor of the correlation coefficient. Nevertheless, the VAR-, CDF-, and REG-based rescaling methods are only suboptimal solutions as they generally violate the orthogonality property of an optimal estimation procedure (section 2a). As a result, they provide only approximations to the optimal estimate with a multiplication factor f (fR = stry for the REG-based solution and
This analytical description of
Acknowledgments
We thank two anonymous reviewers and Bart Forman for their constructive comments, which led to numerous clarifications in the final version of the manuscript. Research was partially supported by Wade Crow’s membership in the NASA Soil Moisture Active/Passive Science Definition Team. The United States Department of Agriculture is an equal opportunity provider and employer.
APPENDIX A
Numerical Solutions for the Rescaling Factor
a. Rescaling factor from linear least squares regression


































b. Rescaling factor from variance matching

















c. Rescaling factor from triple collocation
Triple collocation analysis (TCA) is an error magnitude estimation method that uses three linearly related independent products to obtain the errors of each product separately. It was initially introduced for error magnitude estimation in oceanic studies (Stoffelen 1998; Caires and Sterl 2003), and has recently been applied to large-scale soil moisture error estimation-based studies (Scipal et al. 2008; Parinussa et al. 2011; Hain et al. 2011; Yilmaz et al. 2012; Anderson et al. 2012). These studies are typically based on one model-based soil moisture product and two remotely sensed products derived from contrasting remote sensing retrieval techniques (e.g., passive and active microwave).
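The error estimation idea can be sketched with a common covariance-based formulation of triple collocation. This is a minimal illustration rather than the formulation used in the cited studies: it assumes zero-mean errors that are mutually independent and independent of the truth, the function name and the synthetic inputs are hypothetical, and each error variance is expressed in the units of its own product.

```python
import numpy as np


def tca_error_variances(x, y, z):
    """Estimate the error variance of each of three collocated products.

    Assumes linear relationships to a common truth and mutually
    independent, zero-mean errors (assumptions of this sketch).
    """
    q = np.cov(np.vstack([x, y, z]))  # 3x3 sample covariance matrix
    # Each cross-covariance ratio estimates the signal variance of one
    # product, which is then subtracted from that product's total variance.
    var_ex = q[0, 0] - q[0, 1] * q[0, 2] / q[1, 2]
    var_ey = q[1, 1] - q[0, 1] * q[1, 2] / q[0, 2]
    var_ez = q[2, 2] - q[0, 2] * q[1, 2] / q[0, 1]
    return var_ex, var_ey, var_ez


# Synthetic check with known error variances (illustrative values only).
rng = np.random.default_rng(1)
t = rng.normal(0.0, 1.0, 50000)
x = t + rng.normal(0.0, 0.3, t.size)        # true error variance 0.09
y = 0.5 * t + rng.normal(0.0, 0.2, t.size)  # true error variance 0.04
z = 2.0 * t + rng.normal(0.0, 0.4, t.size)  # true error variance 0.16
var_ex, var_ey, var_ez = tca_error_variances(x, y, z)
print(var_ex, var_ey, var_ez)
```

With real data, finite samples and violated independence assumptions can yield negative variance estimates, so the outputs should be checked before use.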











APPENDIX B
Optimal versus Suboptimal Rescaling Error Variances


































Similarly, for the REG-based solution, stry ≪ 1, hence cs ≪ co. It follows that
REFERENCES
Anderson, W. B., Zaitchik B. F., Hain C. R., Anderson M. C., Yilmaz M. T., Mecikalski J., and Schultz L., 2012: Towards an integrated soil moisture drought monitor for East Africa. Hydrol. Earth Syst. Sci., 9, 4587–4631.
Caires, S., and Sterl A., 2003: Validation of ocean wind and wave data using triple collocation. J. Geophys. Res., 108, 3098, doi:10.1029/2002JC001491.
Chui, C. K., and Chen G., 1998: Kalman Filtering with Real-Time Applications. Springer, 230 pp.
Crow, W. T., and Zhan X., 2007: Continental-scale evaluation of remotely sensed soil moisture products. IEEE Geosci. Remote Sens. Lett., 4, 451–455.
Crow, W. T., Bindlish R., and Jackson T. J., 2005: The added value of spaceborne passive microwave soil moisture retrievals for forecasting rainfall-runoff partitioning. Geophys. Res. Lett., 32, L18401, doi:10.1029/2005GL023543.
Entekhabi, D., and Coauthors, 2010: The Soil Moisture Active Passive (SMAP) Mission. Proc. IEEE, 98, 704–716.
Gao, H., Wood E. F., Drusch M., and McCabe M. F., 2007: Copula-derived observation operators for assimilating TMI and AMSR-E retrieved soil moisture into land surface models. J. Hydrometeor., 8, 413–429.
Gupta, H. V., Kling H., Yilmaz K. K., and Martinez G. F., 2009: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol., 377 (1–2), 80–91.
Hain, C. R., Crow W. T., Mecikalski J. R., Anderson M. C., and Holmes T., 2011: An intercomparison of available soil moisture estimates from thermal infrared and passive microwave remote sensing and land surface modeling. J. Geophys. Res., 116, D15107, doi:10.1029/2011JD015633.
Holmes, T. R. H., Jackson T. J., Reichle R. H., and Basara J. B., 2012: An assessment of surface soil temperature products from numerical weather prediction models using ground-based measurements. Water Resour. Res., 48, W02531, doi:10.1029/2011WR010538.
Jackson, T. J., and Coauthors, 2010: Validation of Advanced Microwave Scanning Radiometer soil moisture products. IEEE Trans. Geosci. Remote Sens., 48, 4256–4272.
Koster, R. D., Guo Z., Yang R., Dirmeyer P. A., Mitchell K., and Puma M. J., 2009: On the nature of soil moisture in land surface models. J. Climate, 22, 4322–4335.
Parinussa, R. M., Holmes T. R. H., Yilmaz M. T., and Crow W. T., 2011: The impact of land surface temperature on soil moisture anomaly detection from passive microwave observations. Hydrol. Earth Syst. Sci., 15, 3135–3151.
Reichle, R. H., and Koster R. D., 2004: Bias reduction in short records of satellite soil moisture. Geophys. Res. Lett., 31, L19501, doi:10.1029/2004GL020938.
Reichle, R. H., Koster R. D., Dong J., and Berg A. A., 2004: Global soil moisture from satellite observations, land surface models, and ground data: Implications for data assimilation. J. Hydrometeor., 5, 430–442.
Scipal, K., Holmes T., de Jeu R., Naeimi V., and Wagner W., 2008: A possible solution for the problem of estimating the error structure of global soil moisture data sets. Geophys. Res. Lett., 35, L24403, doi:10.1029/2008GL035599.
Stoffelen, A., 1998: Toward the true near-surface wind speed: Error modeling and calibration using triple collocation. J. Geophys. Res., 103 (C4), 7755–7766.
Yilmaz, M. T., Crow W. T., Anderson M. C., and Hain C., 2012: An objective methodology for merging satellite- and model-based soil moisture products. Water Resour. Res., 48, W11502, doi:10.1029/2011WR011682.