  • Aires, F., 2011: Measure and exploitation of multisensor and multiwavelength synergy for remote sensing: 1. Theoretical considerations. J. Geophys. Res., 116, D02301, doi:10.1029/2010JD014701.
  • Aires, F., and W. Rossow, 2003: Inferring instantaneous, multivariate and nonlinear sensitivities for the analysis of feedback processes in a dynamical system: Lorenz model case study. Quart. J. Roy. Meteor. Soc., 129, 239–275, doi:10.1256/qj.01.174.
  • Aires, F., M. Schmitt, N. Scott, and A. Chédin, 1999: The weight smoothing regularization for MLP for resolving the input contribution’s errors in functional interpolations. IEEE Trans. Neural Networks, 10, 1502–1510, doi:10.1109/72.809096.
  • Aires, F., C. Prigent, and W. B. Rossow, 2004a: Neural network uncertainty assessment using Bayesian statistics with application to remote sensing: 3. Network Jacobians. J. Geophys. Res., 109, D10305, doi:10.1029/2003JD004175.
  • Aires, F., C. Prigent, and W. B. Rossow, 2004b: Neural network uncertainty assessment using Bayesian statistics: A remote sensing application. Neural Comput., 16, 2415–2458, doi:10.1162/0899766041941925.
  • Aires, F., F. Bernardo, H. Brogniez, and C. Prigent, 2010: Calibration for the inversion of satellite observations. J. Appl. Meteor. Climatol., 49, 2458–2473, doi:10.1175/2010JAMC2435.1.
  • Aires, F., M. Paul, C. Prigent, B. Rommen, and M. Bouvet, 2011: Measure and exploitation of multisensor and multiwavelength synergy for remote sensing: 2. Application to the retrieval of atmospheric temperature and water vapor from MetOp. J. Geophys. Res., 116, D02302, doi:10.1029/2010JD014702.
  • Aires, F., O. Aznay, C. Prigent, M. Paul, and F. Bernardo, 2012: Synergistic multi-wavelength remote sensing versus a posteriori combination of retrieved products: Application for the retrieval of atmospheric profiles using MetOp-A. J. Geophys. Res., 117, D18304, doi:10.1029/2011JD017188.
  • Azarderakhsh, M., W. B. Rossow, F. Papa, M. Norouzi, and R. Khanbilvardi, 2011: Diagnosing water variations within the Amazon basin using satellite data. J. Geophys. Res., 116, D24107, doi:10.1029/2011JD015997.
  • Bishop, C., 1996: Neural Networks for Pattern Recognition. Clarendon Press, 482 pp.
  • De Geeter, J. D., H. V. Bruseel, and J. D. Schutter, 1997: A smoothly constrained Kalman filter. IEEE Trans. Pattern Anal. Mach. Intell., 19, 1171–1177, doi:10.1109/34.625129.
  • Fernandez-Prieto, D., P. van Oevelen, Z. Su, and W. Wagner, 2012: Advances in earth observation for water cycle science. Hydrol. Earth Syst. Sci., 16, 543–549, doi:10.5194/hess-16-543-2012.
  • Foley, A. M., and Coauthors, 2013: Evaluation of biospheric components in Earth system models using modern and palaeo-observations: The state-of-the-art. Biogeosciences, 10, 8305–8328, doi:10.5194/bg-10-8305-2013.
  • Hayward, S., 1998: Constrained Kalman filter for least-squares estimation of time-varying beamforming weights. Mathematics in Signal Processing IV, J. McWhirter and I. Proudler, Eds., Oxford University Press, 113–125.
  • Kalman, R., 1960: A new approach to linear filtering and prediction problems. Trans. ASME, 82D, 35–45.
  • Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471, doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.
  • Ko, S., and R. R. Bitmead, 2007: State estimation for linear systems with state equality constraints. Automatica, 43, 1363–1368, doi:10.1016/j.automatica.2007.01.017.
  • Kolassa, J., F. Aires, J. Polcher, C. Prigent, C. Jimenez, and J. Pereira, 2013: Soil moisture retrieval from multi-instrument observations: Information content analysis and retrieval methodology. J. Geophys. Res. Atmos., 118, 4847–4859, doi:10.1029/2012JD018150.
  • Liu, Y., and Coauthors, 2011: Developing an improved soil moisture dataset by blending passive and active microwave satellite-based retrievals. Hydrol. Earth Syst. Sci., 15, 425–436, doi:10.5194/hess-15-425-2011.
  • Pan, M., and E. F. Wood, 2006: Data assimilation for estimating the terrestrial water budget using a constrained ensemble Kalman filter. J. Hydrometeor., 7, 534–547, doi:10.1175/JHM495.1.
  • Pan, M., A. K. Sahoo, T. J. Troy, R. K. Vinukollu, J. Sheffield, and E. F. Wood, 2012: Multisource estimation of long-term terrestrial water budget for major global river basins. J. Climate, 25, 3191–3206, doi:10.1175/JCLI-D-11-00300.1.
  • Porrill, J., 1988: Optimal combination and constraints for geometrical sensor data. Int. J. Robotics Res., 7, 66–77, doi:10.1177/027836498800700606.
  • Prigent, C., F. Papa, F. Aires, W. Rossow, and E. Matthews, 2007: Global inundation dynamics inferred from multiple satellite observations, 1993–2000. J. Geophys. Res., 112, D12107, doi:10.1029/2006JD007847.
  • Rodgers, C., 2000: Inverse Methods for Atmospheric Sounding: Theory and Practice. Series on Atmospheric, Oceanic and Planetary Physics, Vol. 2, World Scientific Publishing, 240 pp.
  • Rumelhart, D., G. Hinton, and R. Williams, 1986: Learning internal representations by error propagation. Foundations, Vol. 1, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, 318–362.
  • Sahoo, A. K., M. P. Pan, T. J. Troy, R. K. Vinukollu, J. Sheffield, and E. F. Wood, 2011: Reconciling the global terrestrial water budget using satellite remote sensing. Remote Sens. Environ., 115, 1850–1865, doi:10.1016/j.rse.2011.03.009.
  • Sheffield, J., C. Ferguson, T. Troy, E. Wood, and M. McCabe, 2009: Closing the terrestrial water budget from satellite remote sensing. Geophys. Res. Lett., 36, L07403, doi:10.1029/2009GL037338.
  • Simon, D., and T. L. Chia, 2002: Kalman filtering with state equality constraints. IEEE Trans. Aerosp. Electron. Syst., 38, 128–136, doi:10.1109/7.993234.
  • Tarantola, A., 1987: Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation. Elsevier, 613 pp.
  • Thome, K., 2004: In-flight intersensor radiometric calibration using vicarious approaches. Post-Launch Calibration of Satellite Sensors, S. A. Morain and A. M. Budge, Eds., ISPRS Book Series, Vol. 2, Taylor and Francis, 93–102.

Fig. 1. The four water cycle components (P, E, Q, and ΔS) over the Mississippi basin for 1984–2006. Dataset is from Pan et al. (2012).

Fig. 2. The values indicate how the weighting is done in the PF filter between observational water components and resulting components with closure.

Fig. 3. Filters for SW, SW + PF, OI2, and OI3 = OI1 + PF. The values in these matrices indicate how the weighting is performed by each filter on each input dataset, in order to obtain the resulting integrated dataset of the four water cycle components.

Fig. 4. NN Jacobians (analog of the linear filters of Fig. 3) for (from top to bottom) Mississippi in January and July and Niger in January and July.

Fig. 5. Time series of the P, E, Q, ΔS, and terrestrial water budget for year 1984 for the Niger basin. Observations are in red, the SW estimates are in green, the NN estimate in blue, and the target is dashed black.

Fig. 6. RMSE of the retrieved water component (P, E, Q, and ΔS) using method SW (red), SW + PF (green), and NN (blue). The other integration methods are similar to one of these curves. The resulting water cycle budget is also represented. Results are for the whole 1984–2006 time series. Mississippi basin is in continuous lines and Niger in dashed lines.

Combining Datasets of Satellite-Retrieved Products. Part I: Methodology and Water Budget Closure

  • 1 Estellus, France, and Laboratoire de l’Etude du Rayonnement et de la Matière en Astrophysique, CNRS, Observatoire de Paris, Paris, France, and Earth Institute and Department of Earth and Environmental Engineering, Columbia University, New York, New York

Abstract

This study addresses in general terms the problem of the optimal combination of multiple observation datasets. Only satellite-retrieved geophysical parameter datasets are considered here (not the raw satellite observations). This study focuses on the terrestrial water cycle and presents methodologies to obtain a coherent dataset of four water cycle key components: precipitation, evapotranspiration, runoff, and terrestrial water storage. Various innovative “integration” methodologies are introduced: simple weighting (SW), constrained linear (CL), optimal interpolation (OI), and neural networks (NN). The term “integration” will be used here, not “assimilation,” as no model will be included in the data fusion process. A simple postprocessing filtering (PF) step can be used to impose the water cycle budget closure after the integration method. It is shown that this constraint actually improves the estimation of the water cycle components. The integration techniques are tested using real observation data over the Mississippi and Niger basins from satellite and in situ measurements. A Monte Carlo experiment with a synthetic uncertainty perturbation model is used to measure the ability of the SW, OI, and NN, with or without the PF step, to retrieve the four water cycle components. Once the PF closure constraint is added, the methodologies have equivalent accuracies. The need for these types of methodologies should increase in the future since multiple observation datasets are now available and the climate community needs to combine them into a unique, optimal, and coherent dataset of multiple parameters. A companion paper will test these methodologies on satellite observation datasets at the basin and global scales.

Corresponding author address: F. Aires, Estellus/LERMA, Observatoire de Paris, 61 avenue de l’Observatoire, 75014 Paris, France. E-mail: filipe.aires@estellus.fr

1. Introduction

The number of satellites observing Earth’s surface and atmosphere has been constantly increasing over the last three decades (Fernandez-Prieto et al. 2012). In addition, the exponential increase of computational power has allowed more and more groups (i.e., operational centers, space agencies, universities, public laboratories, or private companies) to develop their own satellite retrievals from these observations. As a consequence, multiple observation datasets exist for each geophysical variable, and the scientific community is now confronted with the problem of choosing a single dataset for each variable. However, each dataset source has its own advantages and drawbacks, so this choice is difficult. Another difficulty is that the available datasets for different geophysical variables are rarely coherent with each other because each one generally uses different assumptions, auxiliary information, or models. These incoherencies in multivariable datasets complicate their use in studies focusing on variable dependencies, such as process-oriented analysis or sophisticated validation metrics (Aires and Rossow 2003; Foley et al. 2013). In this context, international programs such as the Global Energy and Water Cycle Experiment (GEWEX; www.gewex.org/GDAP.html) are focusing on the integration of multiple datasets with a particular emphasis on coherency constraints.

As a consequence, there is a strong need for methodologies able to optimally produce, in a coherent way, multivariable datasets from the fusion of various observation datasets. These types of methods should be able to assimilate very different sources of information, such as 1) satellite observations, 2) in situ measurements, and 3) physical constraints, when they are available. Variational assimilation (Rodgers 2000; Kalnay et al. 1996) could be a good candidate, but in order to obtain a purely observational dataset, it is important not to use any model (hydrological, surface, or climate). This is important because one of the main objectives for building such a multivariable dataset is to obtain a reference that is as independent as possible from any model, so that it can be used to validate and calibrate models.

In this paper, various methodologies will be presented and tested by studying the terrestrial water cycle budget. The continental water budget is a key diagnostic of the effects of climate change and variability. It is also the most important part of the water cycle for studies of impacts on populations, and its change remains uncertain. The closure of the terrestrial water cycle has been studied using the following water budget equation:
$$\frac{dS}{dt} = P - E - R, \tag{1}$$
where S represents the terrestrial water storage, R is the global runoff (denoted Q in the following sections), P is the precipitation, and E is the evaporation (evapotranspiration).

Experiments have been conducted to close this budget (Sheffield et al. 2009; Sahoo et al. 2011; Pan et al. 2012). Most studies have targeted the runoff (freshwater discharge) as a diagnostic. It is at the interface between land and ocean, and it can be measured by in situ observations (even if there are now attempts to measure it with the use of satellite observations). In Sheffield et al. (2009), precipitation, evapotranspiration, and water storage are quantified using a collection of satellite measurements while the streamflow is used as the closure target and estimated with gauge measurements. Large uncertainties in the individual components make the closure difficult. For 10 basins over 3 years, Sahoo et al. (2011) merge different products for precipitation and evapotranspiration, using weighted values based on their errors, to mitigate errors in the individual satellite products. A constrained ensemble Kalman filter is then used to close the water budget and provide a constrained estimate of the water budget. Pan et al. (2012) adopt a similar strategy to estimate the water budget over 32 basins for the period of 1984–2006. However, they used in situ observations, land surface model simulations, and global reanalyses together with the remote sensing products. In Azarderakhsh et al. (2011), the Amazon basin is studied using water storage, precipitation, and evaporation satellite observations. The water discharge is estimated using the water balance equation and results are compared to gauge-based measurements. This study stresses the need for subbasin studies and the addition of temporal lags in the analysis.

From the previous experiments and literature, it is clear that no individual dataset is accurate enough to directly close the water budget. In this paper, the objective is then to estimate new optimized states $\mathbf{X}_a$ = (P, E, Q, ΔS) from various estimates of the different components, plus additional constraints, in particular, the water budget closure. Pan and Wood (2006), Sahoo et al. (2011), and Pan et al. (2012) have performed a similar task, but they employed a Kalman filtering solution, and a land surface model was used. In this study, the emphasis is to obtain purely observation-driven estimates, limiting the intervention of any model. The qualities and limitations of various approaches will be investigated. The methods should utilize as many sources of information as possible for each water cycle component (e.g., various satellite precipitation estimates), but no model. They should combine these estimates by using a priori information on their respective uncertainties; the combination of sources is intended to reduce these uncertainties. They should use the water budget closure as independent a priori information to help characterize uncertainties and to obtain a new optimized dataset of terrestrial water components. Furthermore, the methodology should be flexible enough to ensure that as much a priori information as possible about these water components can be used. The weighting and closure constraints should be performed simultaneously to obtain an optimal solution. The approach should be flexible enough to ensure that it can evolve with time, when more information on the water cycle can be integrated. A thorough analysis of the results should be performed over space and time (by analyzing the space–time variability of the weighting) to ensure the best quality for a long-term and globally integrated dataset that can be used for climate studies.

Section 2 presents various integration techniques. The uncertainties of these methods will be assessed using Monte Carlo experiments, and section 3 describes the approach used to perform this task. The results are presented and analyzed in section 4. Section 5 provides the main conclusions of this study and presents various perspectives for this work.

2. Integration methods

a. General context

The goal of this study is to introduce different methods able to combine various datasets of multiple variables, in such a way that a priori constraints among the state variables are imposed on the state estimate.

The major developments for the introduction of state constraints have been accomplished in the framework of Kalman filtering. A first approach to using such a state constraint consists of projecting the state variables in a space where the constraint is satisfied. However, this approach has some limitations. For example, the interpretability of the results is more difficult since both the state variables and the physical equations are modified (Simon and Chia 2002). Another solution uses the particular constraint as a “perfect measurement,” meaning that there is no uncertainty on this constraint. The drawback of this method is that the introduction of such “hard” or “strong” constraints into the system increases the problem dimension and can introduce numerical instabilities (Hayward 1998; Porrill 1988; De Geeter et al. 1997; Kalman 1960). Simon and Chia (2002) proposed to postprocess the unconstrained Kalman solution to impose a linear state constraint: at each time step, the unconstrained solution is projected into a constrained state space. This avoids the numerical instabilities associated with the perfect measurement approach. Furthermore, it does not increase the dimension of the problem since the state variables are kept in the same space (this facilitates the interpretation of results). In addition, it is shown that this projected solution is better than the unconstrained one since the elements of the error covariance matrix are smaller in magnitude. This postprocessing can be defined using maximum a posteriori, least squares, or linear projection approaches. An even more advanced solution is proposed by Ko and Bitmead (2007) that projects the full dynamical system into a constrained space. This system-projection approach is shown to be even better than the postprocessing solution.

However, these developments have been made in the framework of Kalman filtering where a state variable dynamical system is considered. This often means that a model is used so that the state variables at time t are dependent on the state variables at earlier times (i.e., the dynamical system). As previously mentioned, the goal of this paper is to produce state estimates that are independent from any model. Therefore, the Kalman filtering cannot be considered here; only direct state estimation will be proposed. However, the technical developments made for the Kalman filtering will be useful for the integration tools proposed in the following.

b. Notations

Let $\mathbf{X}^T$ = (P, E, Q, ΔS) be the state variable quantifying the four water cycle components of interest in this paper (a superscript T is the transpose symbol). The goal of this study is to estimate $\mathbf{X}$. The closure of the water cycle is obtained when $\mathbf{X}^T \cdot \mathbf{G} = 0$, where $\mathbf{G}^T = (1, -1, -1, -1)$, from Eq. (1).
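As a minimal sketch of this closure test, the residual P − E − Q − ΔS can be computed directly from the closure vector. The function name `closure_residual` and the numerical values are illustrative assumptions, not taken from the paper.

```python
# Closure vector from the text: G^T = (1, -1, -1, -1) for the state (P, E, Q, dS).
G = (1.0, -1.0, -1.0, -1.0)

def closure_residual(state):
    """Return the scalar product state . G = P - E - Q - dS."""
    return sum(x * g for x, g in zip(state, G))

# Hypothetical monthly values (mm/month), chosen only for illustration.
closed_state = (90.0, 60.0, 25.0, 5.0)    # 90 - 60 - 25 - 5 = 0: budget closes
unclosed_state = (90.0, 55.0, 25.0, 5.0)  # 90 - 55 - 25 - 5 = 5: imbalance of 5

print(closure_residual(closed_state), closure_residual(unclosed_state))
```

A nonzero residual measures the imbalance that the integration methods described below try to remove.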

Multiple observations can be obtained for each component:

  • (P1, P2, …, Pp), the p precipitation estimates;
  • (E1, E2, …, Eq), the q sources of information for evapotranspiration;
  • (Q1, Q2, …, Qr), the r runoff estimates; and
  • (ΔS1, ΔS2, …, ΔSs), the s sources of information for the groundwater storage change.
Except for the runoff Q that comes from in situ measurements, in the following, “observation” will be referring to a satellite observation. Note also that, in the following, observations will refer to water cycle components (for P, E, Q, or ΔS) and not to actual raw satellite observations such as radiances measured by satellite instruments. The aim of this study is to optimally combine observation products, not to develop a remote sensing algorithm.

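Let
$$\mathbf{Y}_\varepsilon = (P_1, \ldots, P_p,\; E_1, \ldots, E_q,\; Q_1, \ldots, Q_r,\; \Delta S_1, \ldots, \Delta S_s)^T \tag{2}$$
be the vector of dimension n = p + q + r + s gathering all the initial water cycle observations. The observing system can be represented using
$$\mathbf{Y}_\varepsilon = \mathbf{H} \cdot \mathbf{X} + \boldsymbol{\varepsilon}; \tag{3}$$
no bias error is considered here. The n × 4 observation operator $\mathbf{H}$ has the following structure:
$$\mathbf{H} = \begin{pmatrix} \mathbf{1}_p & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_q & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{1}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{1}_s \end{pmatrix}, \tag{4}$$
where $\mathbf{1}_m$ denotes a column of m ones and $\mathbf{0}$ a column of zeros of matching length. In this matrix, row i includes a “1” in the first column if observation i measures P (or E, Q, or ΔS for the second, third, and fourth columns, respectively). The n-vector $\boldsymbol{\varepsilon}$ represents the uncertainties of each observation in $\mathbf{Y}_\varepsilon$. The observation uncertainty is characterized by the covariance matrix $\mathbf{R} = \mathrm{cov}(\boldsymbol{\varepsilon})$, often chosen to be diagonal:
$$\mathbf{R} = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2), \tag{5}$$
where $\sigma_i$ is the standard deviation of the error in observation i. Off-diagonal terms could be introduced if this information was available.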
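The block structure of the observation operator and the diagonal error covariance described above can be sketched with NumPy as follows. The helper names, the counts p, q, r, s, and the error standard deviations are hypothetical choices for illustration only.

```python
import numpy as np

def build_observation_operator(p, q, r, s):
    """Return the n x 4 indicator matrix: row i has a 1 in the column of the
    component (P, E, Q, dS) that observation i measures."""
    counts = [p, q, r, s]
    H = np.zeros((sum(counts), 4))
    row = 0
    for col, count in enumerate(counts):
        H[row:row + count, col] = 1.0
        row += count
    return H

def build_error_covariance(sigmas):
    """Return the diagonal covariance matrix built from error standard deviations."""
    return np.diag(np.asarray(sigmas, dtype=float) ** 2)

# Hypothetical configuration: 3 precipitation, 2 evapotranspiration,
# 1 runoff, and 2 storage-change products.
H = build_observation_operator(p=3, q=2, r=1, s=2)
R = build_error_covariance([10, 12, 15, 8, 9, 3, 6, 7])
```

Each row of `H` contains exactly one 1, so multiplying `H` by a state vector simply repeats each component once per available product.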

In this study, the goal is to estimate a relationship (i.e., filter) that finds an estimate $\mathbf{X}_a$ (a stands for analysis) of $\mathbf{X}$ based on the observations $\mathbf{Y}_\varepsilon$. Therefore, an inverse problem needs to be solved where Eq. (3) is inverted from observation $\mathbf{Y}_\varepsilon$ to $\mathbf{X}_a$ (Tarantola 1987; Rodgers 2000).

A simple inversion of the observation operator $\mathbf{H}$ is not good enough. Since $\mathbf{H}$ is not square, a pseudoinverse or a generalized inversion of matrix $\mathbf{H}$ could be a candidate, but these simple algebra techniques suffer from numerical instabilities. In addition, other sources of information are available, so more sophisticated methods should be considered to exploit them. For example, the information on the observation noise (i.e., the covariance matrix $\mathbf{R}$) is very valuable information that allows for weighting the observations $\mathbf{Y}_\varepsilon$ based on their reliability. Furthermore, in addition to the observations $\mathbf{Y}_\varepsilon$, a priori information $\mathbf{X}_b$ on $\mathbf{X}$ can also be used (b stands for “background,” the terminology used in numerical weather prediction centers). An important question concerns of course the origin of this first guess (FG) information since it is not always easy to obtain such a priori information, especially when models are excluded. The covariance matrix $\mathbf{B}$ of the FG errors needs to be specified based on the source of information for $\mathbf{X}_b$.

c. Simple weighting

Let us first consider the simple case where two observations P1 and P2, with Gaussian uncertainties ε1 ~ N(0, σ1) and ε2 ~ N(0, σ2), are available for the estimation of precipitation P. In this case, the optimal linear estimator (Rodgers 2000) is given by
$$P_a = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, P_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, P_2. \tag{6}$$
This type of equation is valid when there is no bias error in the sources of information P1 and P2. Note that to be an optimal solution, the errors in P1 and P2 must be independent from each other. This type of estimator is more frequently used when merging raw satellite observations because the bias correction is easier in this case (Thome 2004; Aires et al. 2010). However, the variables that are combined here are geophysical products originating from a remote sensing algorithm and, most of the time, biases do exist in this case, unless a calibration procedure has been used to reduce them. When information on the bias is available (e.g., by comparison to in situ measurements), this information can be used to suppress the bias. However, in many cases, bias information is difficult to obtain, and a pragmatic strategy is to estimate the mean state for each variable as the average of each of its observations: $\overline{P} = \tfrac{1}{2}(\overline{P_1} + \overline{P_2})$ (the overbar refers to the mean). Then all the sources of information Pi on this variable are bias corrected toward this average: $P_i' = P_i - \overline{P_i} + \overline{P}$, for i = 1, 2.
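The two-source case can be sketched in a few lines of Python: each series is first shifted toward the common mean, then the two are combined with inverse-variance weights. The function names, the sample series, and the error standard deviations are assumptions made for illustration.

```python
from statistics import mean

def bias_correct(series_list):
    """Shift each series so that its mean equals the average of all series means."""
    means = [mean(series) for series in series_list]
    target = mean(means)
    return [[value - m + target for value in series]
            for series, m in zip(series_list, means)]

def combine_two(p1, p2, sigma1, sigma2):
    """Inverse-variance weighted estimate from two unbiased, independent sources."""
    w1 = sigma2 ** 2 / (sigma1 ** 2 + sigma2 ** 2)
    return w1 * p1 + (1.0 - w1) * p2

# Hypothetical precipitation series (mm/month) from two products.
p1 = [100.0, 80.0, 120.0]  # mean 100
p2 = [90.0, 70.0, 110.0]   # mean 90
c1, c2 = bias_correct([p1, p2])  # both corrected series now have mean 95
estimate = [combine_two(a, b, sigma1=10.0, sigma2=20.0) for a, b in zip(c1, c2)]
```

With σ1 = 10 and σ2 = 20, the more reliable first product receives a weight of 0.8 and the second a weight of 0.2.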
It is possible to extend the result of Eq. (6) to the case where more than two observations are available for a variable and where more than one variable is retrieved. Let us find the optimal linear estimator, such that
$$\mathbf{X}_a = \mathbf{K}_{SW} \cdot \mathbf{Y}_\varepsilon, \tag{7}$$
where $\mathbf{K}_{SW}$ is a 4 × n matrix. To be optimal, $\mathbf{K}_{SW}$ needs to take into account the uncertainties of the observations $\mathbf{Y}_\varepsilon$ described by the covariance matrix $\mathbf{R}$.
Let us consider the first row of $\mathbf{K}_{SW}$, used to retrieve the first water cycle component in $\mathbf{X}_a$, that is, P. As stated earlier, there are p available observations: (P1, P2, …, Pp). The first row of $\mathbf{K}_{SW}$ is composed of zeroes, except for the first p columns that are given by
$$\mathbf{K}_{SW}(1, j) = \frac{1/\sigma_j^2}{\sum_{k=1}^{p} 1/\sigma_k^2} \tag{8}$$
for j = 1, …, p. This expression extends easily to the second (for E), third (for Q), and fourth (for ΔS) rows. Again, this expression is the optimal solution only if all the errors are independent from each other. If a covariance matrix describing the dependencies of the input dataset errors was available, an eigendecomposition of this covariance matrix would create a new input dataset with uncorrelated errors. Unfortunately, this type of covariance matrix for the input dataset errors is generally not available.
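A possible NumPy sketch of the simple weighting filter is given below: each row holds normalized inverse-variance weights for the products of one component and zeros elsewhere. The function name and all numerical values are hypothetical.

```python
import numpy as np

def simple_weighting_filter(sigmas_by_component):
    """Build the 4 x n simple weighting matrix from per-component lists of
    error standard deviations, ordered as (P, E, Q, dS)."""
    n = sum(len(sigmas) for sigmas in sigmas_by_component)
    K = np.zeros((len(sigmas_by_component), n))
    col = 0
    for row, sigmas in enumerate(sigmas_by_component):
        inv_var = 1.0 / np.asarray(sigmas, dtype=float) ** 2
        K[row, col:col + len(sigmas)] = inv_var / inv_var.sum()
        col += len(sigmas)
    return K

# Hypothetical error standard deviations for 2 P, 2 E, 1 Q, and 2 dS products.
sigmas = [[10.0, 20.0], [5.0, 5.0], [3.0], [8.0, 4.0]]
K_sw = simple_weighting_filter(sigmas)

# Hypothetical (bias-corrected) observations in the same order.
y = np.array([100.0, 90.0, 60.0, 62.0, 25.0, 6.0, 4.0])
x_a = K_sw @ y  # weighted estimates of (P, E, Q, dS)
```

For two products, the first row reduces to the weights of the two-observation estimator. Nothing in this filter enforces closure, so `x_a` will in general not satisfy the water budget.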

d. A constrained linear technique

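This approach is based on a simple linear model for the weighting of the satellite observations:
$$\mathbf{X}_a = \mathbf{K}_{CL} \cdot \mathbf{Y}_\varepsilon, \tag{9}$$
with additional constraints on the gain matrix $\mathbf{K}_{CL}$ and on the solution $\mathbf{X}_a$.
We note that $\mathbf{K}_{CL}$ is the (4, n = p + q + r + s) matrix that linearly links the new water cycle components to the satellite estimates:
$$\mathbf{K}_{CL} = \begin{pmatrix} k_{1,1} \cdots k_{1,p} & 0 \cdots 0 & 0 \cdots 0 & 0 \cdots 0 \\ 0 \cdots 0 & k_{2,p+1} \cdots k_{2,p+q} & 0 \cdots 0 & 0 \cdots 0 \\ 0 \cdots 0 & 0 \cdots 0 & k_{3,p+q+1} \cdots k_{3,p+q+r} & 0 \cdots 0 \\ 0 \cdots 0 & 0 \cdots 0 & 0 \cdots 0 & k_{4,p+q+r+1} \cdots k_{4,n} \end{pmatrix}. \tag{10}$$
Note that the simple weighting (SW) filter follows such a structural constraint.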
The coefficients in Eq. (10) are constrained by
$$L_P:\ \sum_{j=1}^{p} k_{1j} = 1, \quad L_E:\ \sum_{j=p+1}^{p+q} k_{2j} = 1, \quad L_Q:\ \sum_{j=p+q+1}^{p+q+r} k_{3j} = 1, \quad L_{\Delta S}:\ \sum_{j=p+q+r+1}^{n} k_{4j} = 1 \tag{11}$$
(i.e., each line L in $\mathbf{K}_{CL}$ adds up to 1). This ensures that estimated water components in $\mathbf{X}_a$ are a weighted sum of the observations for that particular component. However, if the estimation of $\mathbf{K}_{CL}$ is a well-posed problem, it might be possible to use simpler constraints; for example, the coefficients in $\mathbf{K}_{CL}$ can be constrained to be 0 ≤ kij ≤ 1, or their amplitude can be limited. This latter, simple procedure is a Tikhonov regularization often used in regression problems. For instance, in the neural network (NN) theory, it is called “weight decay” (Bishop 1996).
Note that if only constraints of Eqs. (11) were applied to the filter of Eq. (10), the constrained linear (CL) solution would be very close to the SW one. The SW exploits the uncertainty characteristics to obtain good coefficients in the filter $\mathbf{K}_{SW}$. The specificity of the CL approach is that, in addition to the structural constraint in $\mathbf{K}_{CL}$, a water closure constraint is also added. This closure can be written as a constraint on the solution $\mathbf{X}_a$:
$$C(\mathbf{K}_{CL}) = \frac{1}{M} \sum_{i=1}^{M} \frac{\left(\mathbf{G}^T \cdot \mathbf{X}_a^i\right)^2}{\upsilon}, \quad \text{with } \mathbf{X}_a^i = \mathbf{K}_{CL} \cdot \mathbf{Y}_\varepsilon^i, \tag{12}$$
where M is the number of samples (i.e., observations i) used in this criterion. The term υ is a scalar; it represents the variance of closure errors. This number can be defined a priori from previous experiments or as an empirical regularization term. In this case, the smallest value for υ that still ensures water budget closure in the estimate $\mathbf{X}_a$ is chosen.
To find the optimal solution $\mathbf{K}_{CL}$, an optimization algorithm needs to be used to minimize the closure criterion C of Eq. (12) under the constraints of Eqs. (11):
$$\mathbf{K}_{CL} = \arg\min_{\mathbf{K}} C(\mathbf{K}) \quad \text{subject to Eqs. (11)}. \tag{13}$$
This optimization needs to be performed simultaneously on all the N available samples. The samples can be pooled into a general dataset, and the optimization can be used to obtain a general filtering matrix $\mathbf{K}_{CL}$. As previously mentioned, matrix $\mathbf{K}_{CL}$ has a similar structure to the SW solution. As a consequence, SW can be a good first guess solution for CL, and this can speed up the optimization process and improve the quality of the final results.
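One way to sketch this constrained optimization, under assumptions that go beyond the text, is to treat it as an equality-constrained least-squares problem: the mean squared closure residual is minimized over the n free coefficients of the block-structured filter, with each row constrained to sum to 1, and the resulting linear system is solved directly instead of using an iterative optimizer. The function name, the synthetic data, and the noise level are hypothetical; the scalar υ only rescales the criterion and is therefore omitted.

```python
import numpy as np

def constrained_linear_filter(Y, H, G):
    """Return the 4 x n block-structured filter whose rows sum to 1 and that
    minimizes the mean squared closure residual (G . K y)^2 over the samples.

    Y: (M, n) observation samples; H: (n, 4) indicator matrix; G: (4,) closure vector.
    """
    M, n = Y.shape
    D = Y * (H @ G)   # column j scaled by the closure sign of its component
    A = H.T           # row-sum constraints: A @ k = 1
    kkt = np.block([[2.0 * D.T @ D / M, A.T],
                    [A, np.zeros((4, 4))]])
    rhs = np.concatenate([np.zeros(n), np.ones(4)])
    k = np.linalg.lstsq(kkt, rhs, rcond=None)[0][:n]
    return H.T * k    # put each coefficient in the row of its component

# Synthetic samples (illustrative): a truth that closes, plus Gaussian noise.
rng = np.random.default_rng(0)
H = np.repeat(np.eye(4), [2, 2, 1, 2], axis=0)  # 2 P, 2 E, 1 Q, 2 dS products
G = np.array([1.0, -1.0, -1.0, -1.0])
E = rng.uniform(40, 80, 200)
Q = rng.uniform(10, 30, 200)
dS = rng.uniform(-10, 10, 200)
X = np.column_stack([E + Q + dS, E, Q, dS])     # P = E + Q + dS closes the budget
Y = X @ H.T + rng.normal(0.0, 5.0, (200, 7))    # noisy synthetic observations
K_cl = constrained_linear_filter(Y, H, G)
```

This sketch does not bound the coefficients, so some weights may be negative or exceed 1; the box constraint 0 ≤ kij ≤ 1 mentioned above would require a constrained solver. Minimizing closure alone also says nothing about the accuracy of each individual component.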

It should be noted that in this CL solution, the first-guess and observation error covariance matrices $\mathbf{B}$ and $\mathbf{R}$ are not used, though they are also very valuable constraints. It is an intermediate solution between the SW and the following optimal interpolation (OI). This method will not be tested in the following experiments because it is more difficult to implement and the application that will be presented in this paper does not require such complexity. However, it is a very interesting approach, especially for problems where complex physical constraints need to be imposed on the solution.

e. The optimal interpolation method

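The goal of this approach is to simultaneously exploit all the available a priori information and to find the best compromise for $\mathbf{X}_a$. For this purpose, a cost function (i.e., quality criterion) is first defined as
$$J(\mathbf{X}) = (\mathbf{X} - \mathbf{X}_b)^T \mathbf{B}^{-1} (\mathbf{X} - \mathbf{X}_b) + (\mathbf{Y}_\varepsilon - \mathbf{H}\mathbf{X})^T \mathbf{R}^{-1} (\mathbf{Y}_\varepsilon - \mathbf{H}\mathbf{X}) + \mathbf{X}^T \mathbf{G}\, \upsilon^{-1}\, \mathbf{G}^T \mathbf{X}. \tag{14}$$
The first right-hand term is related to the information provided by an FG variable $\mathbf{X}_b$. Since the focus of this paper is to obtain a model-independent estimate of the water cycle components, this FG needs to originate from observations such as available climatology, simple averaging of the observations for each component, or the result of an SW filtering. The second term is related to the observations $\mathbf{Y}_\varepsilon$. The third term represents the water budget closure constraint. This OI filter can be used without the FG information $\mathbf{X}_b$, but, of course, this would be detrimental to the quality of the solution.
The optimal solution $\mathbf{X}_a$ is chosen to minimize the quadratic cost function $J(\mathbf{X})$. This is obtained when equating to zero the gradient, $\nabla J(\mathbf{X}) = 0$. By posing $\mathbf{Y}' = \mathbf{Y}_\varepsilon - \mathbf{H} \cdot \mathbf{X}_b$ and $\mathbf{X}' = \mathbf{X} - \mathbf{X}_b$, and neglecting the constant terms, we search for the gradient of
$$J(\mathbf{X}') = \mathbf{X}'^T \left(\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} + \upsilon^{-1} \mathbf{G}\mathbf{G}^T\right) \mathbf{X}' - 2\, \mathbf{X}'^T \mathbf{H}^T \mathbf{R}^{-1} \mathbf{Y}' + 2\, \upsilon^{-1}\, \mathbf{X}'^T \mathbf{G}\mathbf{G}^T \mathbf{X}_b. \tag{15}$$
This gradient is given by
$$\nabla J(\mathbf{X}') = 2 \left(\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} + \upsilon^{-1} \mathbf{G}\mathbf{G}^T\right) \mathbf{X}' - 2\, \mathbf{H}^T \mathbf{R}^{-1} \mathbf{Y}' + 2\, \upsilon^{-1}\, \mathbf{G}\mathbf{G}^T \mathbf{X}_b. \tag{16}$$
Therefore, the solution follows:
$$\mathbf{X}_a = \mathbf{X}_b + \mathbf{K}_{OI} \cdot (\mathbf{Y}_\varepsilon - \mathbf{H} \cdot \mathbf{X}_b) - \upsilon^{-1} \mathbf{A}^{-1} \mathbf{G}\mathbf{G}^T \mathbf{X}_b, \tag{17}$$
where
$$\mathbf{K}_{OI} = \mathbf{A}^{-1} \mathbf{H}^T \mathbf{R}^{-1}, \quad \mathbf{A} = \mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} + \upsilon^{-1} \mathbf{G}\mathbf{G}^T. \tag{18}$$
The last term of Eq. (17) vanishes when the first guess already closes the budget ($\mathbf{G}^T \mathbf{X}_b = 0$).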

It can be seen that in this approach, the linear matrix is built using both the and matrices, and this is a valuable feature. No constraints of the form LP, LE, LQ, and LΔS or no structural constraint such as in the filter of Eq. (9) are included in this solution, so there is a flexibility in the shape of the filter. Note that this OI filter does not change with the input data, except if the a priori information error statistics in matrix or the observations errors in change from one input datum to another.

In the following experiments, three tests will be conducted. The OI1 and OI2 solutions refer to the OI estimate without and with the closure term in Eq. (14), respectively (i.e., the last term in the equation, with G). The OI3 solution will be the OI1 solution followed by the postprocessing filtering (PF) step, introduced in section 2g, which imposes closure a posteriori.
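To make the role of the closure term concrete, here is a minimal Python sketch of a three-term quadratic cost of the kind described above, minimized in closed form. It is an illustration under stated assumptions, not the paper's Eqs. (14)-(18): the names `A` (observation operator), `B` and `R` (first guess and observation error covariances), `G = [1, −1, −1, −1]` (closure operator on P, E, Q, ΔS), and `sigma_c` (a hypothetical tolerance on the budget residual) are all assumed, as are the numerical values.

```python
import numpy as np

G = np.array([[1.0, -1.0, -1.0, -1.0]])  # assumed closure operator: P - E - Q - dS = 0

def observation_operator(counts=(3, 3, 2, 3)):
    """Assumed 11 x 4 operator: each row selects the component being observed."""
    rows = []
    for k, n in enumerate(counts):
        for _ in range(n):
            row = np.zeros(len(counts))
            row[k] = 1.0
            rows.append(row)
    return np.vstack(rows)

def oi_estimate(y, x_b, A, B, R, sigma_c=None):
    """Minimize (x-x_b)' B^-1 (x-x_b) + (y-Ax)' R^-1 (y-Ax) [+ (Gx)^2 / sigma_c^2].

    Setting the gradient to zero gives a linear system in x.
    With sigma_c=None the closure term is dropped.
    """
    B_inv = np.linalg.inv(B)
    R_inv = np.linalg.inv(R)
    hessian = B_inv + A.T @ R_inv @ A
    if sigma_c is not None:
        hessian = hessian + (G.T @ G) / sigma_c**2
    rhs = B_inv @ x_b + A.T @ R_inv @ y
    return np.linalg.solve(hessian, rhs)

# Hypothetical single time step with a closed budget: 80 - 50 - 20 - 10 = 0
A = observation_operator()
x_true = np.array([80.0, 50.0, 20.0, 10.0])
rng = np.random.default_rng(0)
y = A @ x_true + rng.normal(0.0, 5.0, size=A.shape[0])
x_b = np.array([75.0, 55.0, 18.0, 8.0])   # first guess (does not close the budget)
B = np.diag([100.0] * 4)                   # first guess error covariance
R = np.diag([25.0] * A.shape[0])           # observation error covariance

x_open = oi_estimate(y, x_b, A, B, R)                  # no closure term
x_closed = oi_estimate(y, x_b, A, B, R, sigma_c=0.01)  # strong closure term
```

In this sketch, `G @ x_open` is generally nonzero, while `G @ x_closed` shrinks toward zero as `sigma_c` decreases, which mirrors the distinction between an estimate without and with the closure term.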

f. A nonlinear neural network weighting technique

In this approach, the estimate of the water components is provided by a nonlinear NN model g:
e19
This is an extension of the SW, CL, or OI solutions that use a linear filtering instead of the nonlinear model g. This nonlinear character of g means that the weighting of the information in ε is state dependent; this is generally a useful feature.

The details of the NN methodology will not be discussed here. The interested reader is referred to, for example, Bishop (1996) and Aires et al. (2004b). The NN model g is a parametric model, and a training procedure needs to be used to estimate these parameters. The training of g first requires an optimization procedure. The usual back-propagation algorithm (Rumelhart et al. 1986) will be used here. A training dataset is required too: it gathers a set of input–output couples (ε, ). In this study, these couples will be built with an ensemble of water cycle components that follow a closure constraint from Pan et al. (2012) (see section 3a). Synthetic observations ε are generated using a specified noise model for ε. The goal of the NN model g is therefore to retrieve the target water cycle components from the perturbed observations ε. The noise model for the perturbations in ε is not provided to the NN model; only their implicit characteristics in dataset are available to the NN.

The goal of this study is to retrieve, from the observations ε, an estimation a such that its components close the water cycle budget. How can this closure constraint be imposed on the NN outputs? The first solution is easy in practice; it consists of using the learning dataset outputs that respect this closure. In general, a constraint in the learning dataset is reproduced by the trained NN, but this is not a strong explicit constraint on the model g. This is the solution tested in this paper since the dataset described in section 3a respects this budget closure. Another solution would be to introduce the closure constraint (previously defined as closure) in the loss function that is minimized during the NN training. Such complex constraints have been used, for example, in Aires et al. (1999), but their implementation is complex. In particular, the back-propagation algorithm would need to be redefined in this case. This approach will not be tested here.

In this NN technique, the weights for each water cycle component in ε can vary with the state of the system. However, there is no constraint on the summation of the weights, as in the LP, LE, LQ, and LΔS terms in section 2d. This can potentially be dangerous because compensation phenomena could occur (Aires et al. 2004a): an error in the weight of one observation in ε is compensated by an opposite error in the weight of another observation. This can lead to very high positive and negative weight values in the NN filter. This is a classic symptom of ill-conditioned problems. In the case of such problems, regularization solutions tend to reduce the potential for such high-value weights, for example, the traditional NN weight decay (Bishop 1996).

As previously mentioned, the learning of the NN requires a dataset that includes the four water cycle component targets that respect the closure constraint. As a consequence, the NN learning inference strategy cannot easily be used in real conditions. However, in the experiments of section 4, this technique will be used to test the state dependency of the obtained filters.

g. The postprocessing filtering

This approach has been presented, for example, in Pan and Wood (2006). It is not an integration method, but rather a postprocessing method that imposes the closure constraint on a previously obtained solution b. Thus, an estimate b of the state variables is first obtained by applying some filter to the observations ε. In Pan and Wood (2006), the FG also uses some constraints from the variable infiltration capacity (VIC) macroscale hydrologic model, but other FG solutions (from the SW, CL, OI, or NN estimates) can be used instead. This will be tested in section 4.

A PF process is then applied on the FG solution b in order to introduce the closure constraint:
e20
where . This can be rewritten as
e21
where PF = [Id − GT(GT)−1G], where Id is the identity matrix. This expression shows that this postprocessing is a filtering of b based on G with the goal of obtaining the closure. Note again that this filtering is constant and identical for any situation unless there is enough information to have an a priori error matrix dependent on the situation.

In this paper, this approach is used, for example, with , that is, the SW solution of section 2c. Note that this postprocessing step does not use the observations ε (used for the FG b only), so the observations are not used twice to estimate , which would have been a problem (Rodgers 2000).
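The postprocessing step can be sketched as a covariance-weighted projection onto the closure constraint. The Python sketch below assumes a closure operator `G = [1, −1, −1, −1]` acting on (P, E, Q, ΔS) and a hypothetical diagonal a priori error covariance `B`; whether this matches Eq. (21) term by term is an assumption, and the numbers are illustrative only.

```python
import numpy as np

G = np.array([[1.0, -1.0, -1.0, -1.0]])  # assumed closure operator: P - E - Q - dS = 0

def pf_filter(B):
    """Return the 4 x 4 matrix Id - B G^T (G B G^T)^-1 G.

    Applying it to any first guess x_b yields a state with zero budget
    residual; components with larger a priori variance absorb more of
    the correction.
    """
    gain = B @ G.T @ np.linalg.inv(G @ B @ G.T)  # 4 x 1
    return np.eye(4) - gain @ G

# Hypothetical a priori error standard deviations for P, E, Q, dS
B = np.diag(np.array([10.0, 5.0, 1.0, 8.0]) ** 2)
K_pf = pf_filter(B)

x_b = np.array([82.0, 50.0, 20.0, 10.0])  # first guess with a residual of 2
x_a = K_pf @ x_b                          # closed estimate
```

Because the filter depends only on `B` and `G`, it is the same matrix for every time step unless `B` itself changes with the situation. It is also a projection: applying it twice gives the same result as applying it once.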

3. Dataset and Monte Carlo scheme

a. Hydrological dataset

The numerical experiments of this paper use the terrestrial water cycle dataset produced in Pan et al. (2012). This dataset includes the P, E, Q, and ΔS components at the monthly time scale for 23 years (1984–2006) and over 32 major basins. Therefore, the dataset is a matrix of dimension 276 × 4 for each basin. In this dataset, the terrestrial water budget closure is respected since this was the goal of its construction. The method used to impose this closure was based on the PF method, with an FG provided by a combination of observations and modeling constraints. Figure 1 shows the four water components for the Mississippi basin. The budget is not represented in this figure, but, by construction, it is equal to zero for all time steps. It can be noted that large interannual variations are present, but no clear long-term trend can be identified here.

Fig. 1.

The four water cycle components (P, E, Q, and ΔS) over the Mississippi basin for 1984–2006. Dataset is from Pan et al. (2012).

Citation: Journal of Hydrometeorology 15, 4; 10.1175/JHM-D-13-0148.1

The tests conducted in this paper are performed over two basins only: Mississippi and Niger. These basins are major hydrological regions. The mean and standard deviation (STD) of the four water components for these two basins are presented in Table 1. The average values are close for both basins but the variability can differ: evapotranspiration has a higher STD for Mississippi, but precipitation is the more variable component for Niger. In both cases, the runoff Q has the lowest variability.

Table 1.

Statistics on the Mississippi and Niger water components and a priori information on observation uncertainty characteristics for observation Yε.


From this dataset , synthetic observations are built for testing the ability of the integration methods of section 2 to retrieve . The perturbed observations will be built based on the following structure:
e22
Each water component will be observed three times, except for runoff Q, which is observed twice only. In the following Monte Carlo experiments, each one of these observations will be chosen equal to the true component in plus a random perturbing noise using Eq. (3).

The a priori information on observation uncertainty is given in Table 1 (right columns). These uncertainties are synthetic and have been chosen based on an educated physical guess. There is no given information on potential biases in the observations (bias will be present in the following Monte Carlo experiments). It will be seen in the following section that the actual observation errors used in the Monte Carlo experiments will be different from this a priori information in order to get closer to real conditions. For example, the observation error covariance matrix is well determined for the simulation of the experimental datasets, but our knowledge of is only partial when testing the integration techniques. Tests have been conducted with a perfect knowledge of these observation errors; obviously, the retrieval results are better, but this is not a realistic situation.

b. Perturbation model

To test the integration methodologies presented in section 2, a Monte Carlo approach is applied to the dataset of the previous section. A strategy could be to first perturb the data once to obtain one synthetic observation ε and then test which method retrieves better from ε. However, the results obtained using this strategy would be highly dependent on the chosen perturbation ε. The Monte Carlo approach consists of drawing a large number of perturbations ε in order to obtain reliable statistics on the quality of the integration methods.

The first very important component of a Monte Carlo experiment is therefore the statistical model for the perturbations ε. The perturbation model represents both errors in observation ε and uncertainties of our knowledge of these observation errors. In this study, the goal is to obtain a model that represents a bias and variance error on each one of the observations in ε. The model also needs to represent the uncertainties on these observation errors, since most of the time, the user of the integration models has only a limited knowledge of these observation errors.

Let S(t) be the time series of one water cycle component. The time series are perturbed using the following model:
$$\tilde{S}(t) = S(t) + a(t) + b, \tag{23}$$
with a(t) following a normal distribution, a(t) ~ N(0, σa), where σa ~ U(0, +20), that is, a uniform distribution between 0 and +20, and b ~ U(−10, +10), also a uniform distribution. When the random errors a(t) + b are such that the term S(t) + a(t) + b is negative, the sampling is redone in order to avoid negative P, E, or Q components. The random bias b is independent of the time index t, so it represents a random bias applied to the whole time series for each of the N Monte Carlo members. The term a(t) is a random Gaussian noise with no bias and a standard deviation σa that is drawn uniformly between 0 and +20 for each time series. Different parameters for the two uniform distributions U(0, +20) and U(−10, +10) could have been chosen for each observation ε (from P1 to ΔS3). For simplicity of presentation, they are chosen here to be equal for each observation type. These noise characteristics constitute the synthetic errors of observations that will be used in the Monte Carlo experiment, but our a priori knowledge of these errors is imperfect and is provided in Table 1.

Other distributions of errors could have been chosen, for instance, a lognormal distribution for precipitation. Also, the error model can be chosen to be state dependent, for example, the random term is not the same for a low and a high runoff. Again, for simplicity, we have chosen to use a Gaussian distribution of the random errors because it is a good assumption when no other information on the uncertainties is available, and because often, errors become Gaussian when a first guess is used. A uniform distribution for parameters σa and b was chosen because it is the “no information” distribution in Bayesian statistics and because it represents well the type of a priori knowledge that is generally available about uncertainties.
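The perturbation model described in this section can be sketched in Python as follows. This is a minimal illustration, not the code used for the experiments: the function name `perturb_series` and its parameters are hypothetical, and the resampling rule is one possible reading of "the sampling is redone" (here, only the offending a(t) values are redrawn, and the bias and σa are redrawn if that fails).

```python
import numpy as np

def perturb_series(s, rng, nonnegative=True, max_redraws=100, max_restarts=100):
    """Return s(t) + a(t) + b with b ~ U(-10, 10), sigma_a ~ U(0, 20), a(t) ~ N(0, sigma_a).

    If nonnegative is True (intended for P, E, Q), values of a(t) that make
    the perturbed series negative are redrawn. This resampling rule is an
    assumption about how the redraw is carried out.
    """
    s = np.asarray(s, dtype=float)
    for _ in range(max_restarts):
        b = rng.uniform(-10.0, 10.0)        # one bias for the whole series
        sigma_a = rng.uniform(0.0, 20.0)    # one noise level for the whole series
        a = rng.normal(0.0, sigma_a, size=s.shape)
        if not nonnegative:
            return s + a + b
        for _ in range(max_redraws):
            bad = (s + a + b) < 0.0
            if not bad.any():
                return s + a + b
            a[bad] = rng.normal(0.0, sigma_a, size=int(bad.sum()))
    raise RuntimeError("could not draw a nonnegative perturbed series")

rng = np.random.default_rng(0)
precip = np.full(276, 50.0)                                   # hypothetical monthly series
precip_obs = perturb_series(precip, rng)                      # kept nonnegative
storage_obs = perturb_series(np.zeros(276), rng, nonnegative=False)  # dS may be negative
```

Note that any rejection step of this kind slightly distorts the nominal Gaussian and uniform distributions for small values of S(t); the sketch makes no attempt to correct for that.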

c. Monte Carlo algorithm

The Monte Carlo scheme for the estimation of the integration error statistics is described below to facilitate the understanding of the approach. It uses N = 1000 samples; the stability of the results to this number has been validated. The inputs are the datasets Mississippi and Niger (i.e., two 276 × 4 matrices for Mississippi and Niger) gathering the time series of the state variables for both basins. The Monte Carlo algorithm draws random samples ε using the perturbation model. The outputs of the Monte Carlo are the retrieval statistics of (a) for the various methodologies. We will test only six methodologies: SW, OI, NN, and for each one of them, their PF-filtered versions. No off-diagonal error in for the input datasets is considered in these experiments.

  • Data: the Mississippi and Niger datasets.
  • Result: Retrieval accuracies of the integration methodologies.
  • Define the a priori error information (i.e., a priori information on observation and first guess errors);
  • For i = 1 to N (i.e., number of Monte Carlo samples, 1000) do
    • For j = 1 to J (i.e., number of observations, 11) do
      • Draw bias b from U(−10, +10);
      • Draw σa from U(0, +20);
      • For t = 1 to T (i.e., number of time steps, 276) do
        • Draw a(t) from N(0, σa);
      • End
      • Obtain observation noises εj(t) = a(t) + b for each time step t.
    • End
    • Obtain the synthetic observations Yε by adding the noises ε(t) to the true components [Eq. (3)] for each time step.
    • Estimate the SW solution using Eq. (7);
    • Estimate the OI solution using Eq. (17);
    • Estimate the NN solution using Eq. (19);
    • Apply the PF filtering to each estimate and obtain the SW + PF, OI + PF, and NN + PF solutions;
  • End
  • Compute the root-mean-square (RMS) statistics of the retrieval errors on the pooled (N samples) dataset for the six methods.
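The loop above can be sketched in Python for two of the six methods (SW and SW + PF). This is a simplified illustration under several assumptions: the SW filter is taken to be inverse-variance weighting within each component, the PF filter is the covariance-weighted projection onto the closure constraint, the true dataset is synthetic, the nonnegativity resampling is omitted, and the sample count is reduced. None of the names or values below come from the paper.

```python
import numpy as np

COUNTS = (3, 3, 2, 3)                    # assumed observations per component: P, E, Q, dS
G = np.array([[1.0, -1.0, -1.0, -1.0]])  # assumed closure operator: P - E - Q - dS = 0

def sw_filter(sigma_obs):
    """4 x 11 block filter with inverse-variance weights (assumed SW form)."""
    K = np.zeros((len(COUNTS), sum(COUNTS)))
    start = 0
    for k, n in enumerate(COUNTS):
        w = 1.0 / sigma_obs[start:start + n] ** 2
        K[k, start:start + n] = w / w.sum()
        start += n
    return K

def pf_filter(B):
    """4 x 4 projection enforcing G x = 0, weighted by the covariance B."""
    return np.eye(4) - B @ G.T @ np.linalg.inv(G @ B @ G.T) @ G

def monte_carlo(x_true, sigma_obs, B, n_samples=200, seed=0):
    """Return RMS errors per component and RMS budget residual for SW and SW+PF."""
    rng = np.random.default_rng(seed)
    T, J = x_true.shape[0], sum(COUNTS)
    truth_obs = np.repeat(x_true, COUNTS, axis=1)  # T x 11, one column per observation
    K_sw, K_pf = sw_filter(sigma_obs), pf_filter(B)
    sq_err = {name: np.zeros(4) for name in ("SW", "SW+PF")}
    sq_bud = {name: 0.0 for name in ("SW", "SW+PF")}
    for _ in range(n_samples):
        b = rng.uniform(-10.0, 10.0, size=J)       # one bias per observation series
        sigma_a = rng.uniform(0.0, 20.0, size=J)   # one noise level per observation series
        y = truth_obs + rng.normal(0.0, sigma_a, size=(T, J)) + b
        x_sw = y @ K_sw.T                          # T x 4 SW estimate
        x_pf = x_sw @ K_pf.T                       # closure imposed a posteriori
        for name, est in (("SW", x_sw), ("SW+PF", x_pf)):
            sq_err[name] += ((est - x_true) ** 2).sum(axis=0)
            sq_bud[name] += float(((est @ G.T) ** 2).sum())
    n = n_samples * T
    return {name: {"components": np.sqrt(sq_err[name] / n),
                   "budget": float(np.sqrt(sq_bud[name] / n))}
            for name in sq_err}

# Synthetic closed dataset standing in for one basin (276 months x 4 components)
data_rng = np.random.default_rng(1)
P = data_rng.uniform(40.0, 120.0, 276)
E = data_rng.uniform(20.0, 80.0, 276)
Q = data_rng.uniform(5.0, 30.0, 276)
x_true = np.column_stack([P, E, Q, P - E - Q])

sigma_obs = np.full(11, 12.0)                       # hypothetical a priori observation STDs
B = np.diag([48.0, 48.0, 72.0, 48.0])               # hypothetical SW error variances
results = monte_carlo(x_true, sigma_obs, B)
```

In this sketch the SW budget residual stays well away from zero, whereas the SW + PF residual vanishes to numerical precision, since the projection enforces closure exactly regardless of the noise draw.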

4. Comparison of the integration methods

In this section, the integration methods presented in section 2 are tested on a dataset of real data that is artificially perturbed by random noise, following the Monte Carlo experiment described in the previous section.

a. Filters

The first filter considered here is the PF filter of Eq. (21). Figure 2 represents this 4 × 4 nonsymmetric matrix. It can be seen that this filter is close to the identity matrix, but the two more uncertain variables, ΔS and even more so P, have a lower diagonal amplitude, which means that these variables can be modified more than the two others to reach the closure. The closure is possible thanks to the off-diagonal terms. This closure filter can be applied a posteriori to the SW, CL, OI, and NN filters.

Fig. 2.

The values indicate how the weighting is done in the PF filter between observational water components and resulting components with closure.


In Fig. 3, the 4 × 11 matrix filters of the SW and OI methods are represented without and with the PF. These filters are identical for Mississippi and Niger since they are not dependent on the state variables or observations ε. They would change if our a priori information on observation uncertainty in Table 1 were different for the two basins. It can be seen that the SW filter has no off-diagonal terms: the three observations for precipitation are used only for the precipitation, and similarly for evapotranspiration, runoff, and terrestrial water storage change. This is a structural constraint. Again, the observations with lower uncertainty have a larger weight. The application of the PF filter on the SW solution keeps a structure that is relatively close to the SW filter, but off-diagonal terms are introduced to obtain the water budget closure. The application of the PF filter on the OI solution has a relatively small impact because a closure constraint was already introduced in the OI filter [Eq. (17)]. However, the weights on the precipitation data are particularly low, meaning that the retrieved precipitation is determined almost as much by the other water cycle components and the closure constraint as by the precipitation observations in ε.

Fig. 3.

Filters for SW, SW + PF, OI2, and OI3 = OI1 + PF. The values in these matrices indicate how the weighting is performed by each filter on each input dataset, in order to obtain the results integrated dataset of the four water cycle components.


Figure 4 represents the Jacobians of the neural network integration model. They are the analog of the linear filters SW or OI since these Jacobians represent a linearization of the nonlinear NN function g. As such, they can be directly compared to Fig. 3. The Jacobians are dependent on the actual observations ε, so they are represented here for both Mississippi and Niger, and for January and July means (i.e., monthly averages over the 23 years of data). It can be noted that January and July Jacobians are not identical. Mississippi and Niger Jacobians differ too. This means that the NN is able to adapt its weighting of the observations in ε to the particular location and time. However, the overall structure of the filters is similar and close to the filters of Fig. 3, which is a sign of robustness of the integration methodologies. The closure constraint introduced with PF has almost no impact on the NN filters (not shown) because the closure is already good with the NN integration solution.
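The idea that a Jacobian plays the role of a state-dependent linear filter can be illustrated with a toy example. The Python sketch below uses a small network with random weights, which is purely hypothetical and is not the trained NN of this study, and estimates its 4 × 11 Jacobian by central finite differences at two different input points.

```python
import numpy as np

def numerical_jacobian(g, y, eps=1e-4):
    """Central finite-difference Jacobian of g at y (rows: outputs, columns: inputs)."""
    y = np.asarray(y, dtype=float)
    n_out = np.asarray(g(y)).size
    jac = np.zeros((n_out, y.size))
    for j in range(y.size):
        step = np.zeros_like(y)
        step[j] = eps
        jac[:, j] = (g(y + step) - g(y - step)) / (2.0 * eps)
    return jac

# Toy nonlinear model with random weights: 11 observations in, 4 components out
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 11))
W2 = rng.normal(size=(4, 8))

def g(y):
    return W2 @ np.tanh(W1 @ y / 100.0)

def analytic_jacobian(y):
    hidden = np.tanh(W1 @ y / 100.0)
    return W2 @ np.diag(1.0 - hidden**2) @ W1 / 100.0

y_a = np.full(11, 50.0)   # hypothetical observation vector for one situation
y_b = np.full(11, 10.0)   # hypothetical observation vector for another situation
jac_a = numerical_jacobian(g, y_a)
jac_b = numerical_jacobian(g, y_b)
```

Here `jac_a` and `jac_b` have the same shape as the linear filters but differ from each other, which is the state dependency discussed above. For a linear model the two matrices would be identical.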

Fig. 4.

NN Jacobians (analog of the linear filters of Fig. 3) for (from top to bottom) Mississippi in January and July and Niger in January and July.


b. Integration results

Figure 5 represents the time series of the precipitation, evapotranspiration, runoff, water storage change, and terrestrial water budget for year 1984 for the Niger basin. There are three observations for each water component (red). These observations deviate from the target cycle (dashed black line). The SW solution (green) is a weighting of the observations. It can be seen in the time series of the budget that this weighting of the observations is already a good improvement, even if no closure constraint is used, since the residuals are much lower. The NN estimate (blue) is an even better solution: the residuals are, in this case, negligible. It confirms that the NN is able to obtain the closure even if this constraint was only implicitly imposed in the learning dataset.

Fig. 5.

Time series of the P, E, Q, ΔS, and terrestrial water budget for year 1984 for the Niger basin. Observations are in red, the SW estimates are in green, the NN estimate in blue, and the target is dashed black.


The RMS errors of the retrieval of the four water cycle components and the residuals of the terrestrial water budget are represented in Table 2 for the Mississippi and Niger basins, for the whole 1984–2006 record. These estimates result from the Monte Carlo experiments described in section 3. The STD of each variable is represented first for comparison purposes. Typical errors for the observations are also represented: set 1 is for (P1, E1, Q1, ΔS1) in Eq. (22) and set 2 is for (P2, E2, Q2, ΔS2). The RMS errors for set 1 are the same for both basins since no location dependency has been introduced in the observation noise. However, for each water component, errors from the two sets are different.

Table 2.

RMSE statistics from the Monte Carlo experiments for the retrieval of P, E, Q, ΔS, and the resulting water cycle budget for the Mississippi and Niger basins. The STD of each variable is represented for comparison purposes. Integration methodologies are SW, SW + PF, OI1, OI2, OI1 + PF, NN, and NN + PF. Statistics are performed for the whole 1984–2006 period.


The results of this table are also represented in Fig. 6 for Mississippi (continuous lines) and for Niger (dashed lines). Only the SW, SW + PF, and NN results are presented since the other integration methods are comparable to one or the other of these three methods. Each integration technique is better than the use of a single set of observations. This is expected, of course, since the combination of multiple observations is better than the use of only one. SW is a very good improvement over the use of a single set of observations; the RMS errors can be reduced by a factor of 2. However, the resulting budget is still not acceptable (RMS about 9.25 mm for Mississippi). The introduction of the postprocessing filtering PF to SW solves this problem: the budget reaches closure. In addition, the introduction of the closure constraint has not degraded the retrievals of SW; it has actually improved them significantly (Ko and Bitmead 2007), for example, from 5.38 to 4.36 for precipitation. This means that the closure constraint is good a priori information and that its use in the integration process is very valuable.

Fig. 6.

RMSE of the retrieved water component (P, E, Q, and ΔS) using method SW (red), SW + PF (green), and NN (blue). The other integration methods are similar to one of these curves. The resulting water cycle budget is also represented. Results are for the whole 1984–2006 time series. Mississippi basin is in continuous lines and Niger in dashed lines.


The RMS errors of OI1 are almost identical to SW. This is surprising since more information is used in the OI integration techniques, that is, the observing operator . However, this information is relatively limited in this experiment because of the simplicity of the structure. Its use in the integration might only be useful for more sophisticated models such as when is the linearization of the radiative transfer in numerical weather prediction centers for traditional weather forecasting. The introduction of the closure constraint in the OI2 estimation has a very strong positive impact in the water budget closure that decreases from 9.25 for the OI1 estimate, to 0.01 with OI2, again for the Mississippi case. This demonstrates the ability of the OI technique to utilize external constraints such as the closure. Again, the introduction of the closure constraint also improves the estimate of the water components. Note that results for the SW + PF and the OI2 solution are almost identical. Furthermore, the OI1 + PF solution, which includes the PF filtering, performs as well as the OI2 estimate. This means that the introduction of the closure can be done simultaneously with the observations or as a postprocessing step.

The NN approach appears to be quite a good solution. It outperforms the other methods in terms of retrieval accuracy. Furthermore, it is possible to confirm that the closure constraint imposed only implicitly in the dataset used to train the NN is respected in the NN outputs since the NN budget has an RMS error close to zero. The use of the PF filter does not improve the solution: the retrieval uncertainties are the same and the budget residual is only slightly reduced. The fact that the NN follows the closure well using only an implicit constraint could be surprising. However, the PF processing to explicitly impose the closure (section 2g) is just a 4 × 4 matrix close to identity (Fig. 2), so if the NN “learns” during the training that such postprocessing is useful to obtain optimal retrievals, it is easy for the NN to represent it in its model g. Another remark is that these NN experiments have been performed without any regularization imposing constraints on the weight amplitude (section 2f), so the fact that the results appear robust suggests that the problem is well posed.

5. Conclusions, discussion, and perspectives

A multitude of satellite observations are available to monitor Earth’s surface and atmosphere at various wavelengths, from various orbits (geostationary or polar orbiters), and with different space and time samplings. In general, these numerous observations are used independently to derive one particular geophysical variable. These observations have resulted in a multiplicity of datasets for the same geophysical variable. Each dataset has its own advantages and limitations [e.g., one soil moisture dataset can perform better over semiarid regions, another can be better for vegetated areas (Liu et al. 2011)], so users have difficulties in choosing one dataset among the others.

This situation is close to what happens in the climate modeling community, where many climate models are available and each one is offering valuable information. To solve this issue and exploit optimally all the information available about the future climate, the scientific community has averaged the forecasts from the major climate models to obtain 1) a more reliable estimate of the future climate and 2) a quantification of model dispersion as a surrogate for climate forecast uncertainty. In a similar way, the Earth observation community needs to organize a better framework to combine, as well as possible, all the available satellite observations. International programs such as the GEWEX program or space agencies such as the European Space Agency (ESA) now focus on the production of unified and coherent global satellite datasets.

It has been shown in the literature that, if possible, it is optimal to combine the satellite measurements before the retrieval so that the potential synergy among the observations can be exploited by the retrieval algorithm itself (Aires et al. 2012). However, this is not always possible in practice, so there is a need for methods combining a posteriori multiple geophysical datasets from independent retrievals. The integration solutions presented in this study intend to optimize this a posteriori combination using a priori information on the observation uncertainty estimates. Four methodologies have been introduced in order to obtain optimized terrestrial water cycle components, in particular, estimates of precipitation, evapotranspiration, runoff, and terrestrial water storage change: simple weighting (SW), the constrained linear (CL) method, the optimal interpolation (OI) method, and the neural network (NN) technique. All of them can be combined, if necessary, with a postprocessing filtering (PF) step to impose the water budget closure (Pan and Wood 2006). These integration methodologies are able to exploit the synergy of multiple observational datasets from satellite observations or in situ measurements (Aires 2011; Aires et al. 2011).

The presented methodologies are also able to use a physical constraint that links the four water cycle components, for example, the closure of the terrestrial water cycle budget. This means that, first, multiple water cycle component estimates are optimized at once, and, second, these components become more coherent with each other. The Monte Carlo numerical experiments show that the introduction of this additional physical constraint actually helps the retrieval process. This a priori physical constraint is independent of any model, so the final dataset of optimized water cycle components is model independent and it can be used to validate or calibrate hydrological, surface, or climate models. Among the four methods, the solutions seem to have equivalent performances once the closure constraint is used. The introduction of the closure constraint is very important; it actually improves not only the budget but also the estimation of the four components.

The perspectives for this work are manifold. In a companion paper, various global datasets providing multiple estimates of each terrestrial water cycle component will be gathered. The integration approaches will be used to obtain a global optimized and coherent dataset of the four water cycle components. The weighting of the various initial observation datasets will be investigated. Some of the techniques have their weighting dependent on the state variables of the hydrologic system or on the observation uncertainty. These variations of the weighting will be investigated globally. Furthermore, precipitation, evaporation, and water storage changes can be observed globally, but runoff is only measured in some local stations. To obtain an optimized and global dataset of (P, E, ΔS), the local weights in the runoff measurement locations need to be interpolated to the locations without runoff measurements. Another way to improve the water cycle dataset is to introduce even more constraints. For instance, links to other water cycle components such as soil moisture (Liu et al. 2011; Kolassa et al. 2013) or inundations (Prigent et al. 2007) could be identified. An example of possible links could be the correlation between two water cycle variables. Then, the introduction of these links as new constraints in the integration scheme could be considered. This would require new technical developments.

Another very important topic concerns the estimation of errors for the input datasets, before integration. We have seen in this paper that this information has a strong impact on most integration methods. Unfortunately, error information is very difficult to obtain from data producers. In our experiments, Gaussian and uniform distributions have been used, but more complex distributions might be more adequate (such as a lognormal distribution for precipitation errors). The distribution of errors might also be dependent on the state (e.g., bigger relative errors when the signal to measure is smaller). An important point also concerns the correlation of errors among the input datasets. Satellite datasets often use common features (such as the same radiative transfer model or the same auxiliary dataset used for the retrieval), so correlation of errors might appear. Even if no common feature is used among the various input datasets, the cases with larger errors for one dataset might be the cases with larger errors for another dataset; this would create error correlations. If such complex information on the errors were available, it could be used in the integration method. It might be possible to use Bayesian statistics to estimate “implicitly” the uncertainties of each input dataset and their relationships during the integration step. This will be the subject of a forthcoming study.

Acknowledgments

We thank the ESA (European Space Agency) and in particular Diego Fernandez for funding the project “Watchful: STSE Water Cycle Feasibility” (4000107122/12/I-NB; http://watchful.geo.tuwien.ac.at) and Marcela Doubkova for managing this project. We also thank colleagues Wolfgang Wagner, Richard Kidd, Wouter Dorigo, Patrick Matgen, and Paul Bates from the Watchful project for interesting related discussions and in particular Stefan Schlaffer, who coordinated this project. We are also grateful to Ming Pan and colleagues from Princeton University for providing the dataset used in this study.

REFERENCES

  • Aires, F., 2011: Measure and exploitation of multisensor and multiwavelength synergy for remote sensing: 1. Theoretical considerations. J. Geophys. Res., 116, D02301, doi:10.1029/2010JD014701.

  • Aires, F., and W. Rossow, 2003: Inferring instantaneous, multivariate and nonlinear sensitivities for the analysis of feedback processes in a dynamical system: Lorenz model case study. Quart. J. Roy. Meteor. Soc., 129, 239–275, doi:10.1256/qj.01.174.

  • Aires, F., M. Schmitt, N. Scott, and A. Chédin, 1999: The weight smoothing regularization for MLP for resolving the input contribution’s errors in functional interpolations. IEEE Trans. Neural Networks, 10, 1502–1510, doi:10.1109/72.809096.

  • Aires, F., C. Prigent, and W. B. Rossow, 2004a: Neural network uncertainty assessment using Bayesian statistics with application to remote sensing: 3. Network Jacobians. J. Geophys. Res., 109, D10305, doi:10.1029/2003JD004175.

    • Search Google Scholar
    • Export Citation
  • Aires, F., , Prigent C. , , and Rossow W. B. , 2004b: Neural network uncertainty assessment using Bayesian statistics: A remote sensing application. Neural Comput., 16, 24152458, doi:10.1162/0899766041941925.

    • Search Google Scholar
    • Export Citation
  • Aires, F., , Bernardo F. , , Brogniez H. , , and Prigent C. , 2010: Calibration for the inversion of satellite observations. J. Appl. Meteor. Climatol., 49, 24582473, doi:10.1175/2010JAMC2435.1.

    • Search Google Scholar
    • Export Citation
  • Aires, F., , Paul M. , , Prigent C. , , Rommen B. , , and Bouvet M. , 2011: Measure and exploitation of multisensor and multiwavelength synergy for remote sensing: 2. Application to the retrieval of atmospheric temperature and water vapor from MetOp. J. Geophys. Res., 116, D02302, doi:10.1029/2010JD014702.

    • Search Google Scholar
    • Export Citation
  • Aires, F., , Aznay O. , , Prigent C. , , Paul M. , , and Bernardo F. , 2012: Synergistic multi-wavelength remote sensing versus a posteriori combination of retrieved products: Application for the retrieval of atmospheric profiles using MetOp-A. J. Geophys. Res., 117, D18304, doi:10.1029/2011JD017188.

    • Search Google Scholar
    • Export Citation
  • Azarderakhsh, M., , Rossow W. B. , , Papa F. , , Norouzi M. , , and Khanbilvardi R. , 2011: Diagnosing water variations with the Amazon basin using satellite data. J. Geophys. Res., 116, D24107, doi:10.1029/2011JD015997.

    • Search Google Scholar
    • Export Citation
  • Bishop, C., 1996: Neural Networks for Pattern Recognition.Clarendon Press, 482 pp.

  • De Geeter, J. D., , Bruseel H. V. , , and Schutter J. D. , 1997: A smoothly constrained Kalman filter. IEEE Trans. Pattern Anal. Mach. Intell., 19, 11711177, doi:10.1109/34.625129.

    • Search Google Scholar
    • Export Citation
  • Fernandez-Prieto, D., , van Oevelen P. , , Su Z. , , and Wagner W. , 2012: Advances in earth observation for water cycle science. Hydrol. Earth Syst. Sci., 16, 543549, doi:10.5194/hess-16-543-2012.

    • Search Google Scholar
    • Export Citation
  • Foley, A. M., and Coauthors, 2013: Evaluation of biospheric components in Earth system models using modern and palaeo-observations: The state-of-the-art. Biogeosciences, 10, 83058328, doi:10.5194/bg-10-8305-2013.

    • Search Google Scholar
    • Export Citation
  • Hayward, S., 1998: Constrained Kalman filter for least-squares estimation of time-varying beamforming weights. Mathematics in Signal Processing IV, J. McWhirter and I. Proudler, Eds., Oxford University Press, 113–125.

  • Kalman, R., 1960: A new approach to linear filtering and prediction problems. Trans. ASME,82D, 35–45.

  • Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc.,77, 437–471, doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.

  • Ko, S., , and Bitmead R. R. , 2007: State estimation for linear systems with state equality constraints. Automatica, 43, 13631368, doi:10.1016/j.automatica.2007.01.017.

    • Search Google Scholar
    • Export Citation
  • Kolassa, J., , Aires F. , , Polcher J. , , Prigent C. , , Jimenez C. , , and Pereira J. , 2013: Soil moisture retrieval from multi-instrument observations: Information content analysis and retrieval methodology. J. Geophys. Res. Atmos., 118, 48474859, doi:10.1029/2012JD018150.

    • Search Google Scholar
    • Export Citation
  • Liu, Y., and Coauthors, 2011: Developing an improved soil moisture dataset by blending passive and active microwave satellite-based retrievals. Hydrol. Earth Syst. Sci., 15, 425436, doi:10.5194/hess-15-425-2011.

    • Search Google Scholar
    • Export Citation
  • Pan, M., , and Wood E. F. , 2006: Data assimilation for estimating the terrestrial water budget using a constrained ensemble Kalman filter. J. Hydrometeor., 7, 534547, doi:10.1175/JHM495.1.

    • Search Google Scholar
    • Export Citation
  • Pan, M., , Sahoo A. K. , , Troy T. J. , , Vinukollu R. K. , , Sheffield J. , , and Wood E. F. , 2012: Multisource estimation of long-term terrestrial water budget for major global river basins. J. Climate, 25, 31913206, doi:10.1175/JCLI-D-11-00300.1.

    • Search Google Scholar
    • Export Citation
  • Porrill, J., 1988: Optimal combination and constraints for geometrical sensor data. Int. J. Robotics Res.,7, 66–77, doi:10.1177/027836498800700606.

  • Prigent, C., , Papa F. , , Aires F. , , Rossow W. , , and Matthews E. , 2007: Global inundation dynamics inferred from multiple satellite observations, 1993–2000. J. Geophys. Res., 112, D12107, doi:10.1029/2006JD007847.

    • Search Google Scholar
    • Export Citation
  • Rodgers, C., 2000: Inverse Methods for Atmospheric Sounding: Theory and Practice. Series on Atmospheric, Oceanic and Planetary Physics, Vol. 2, World Scientific Publishing, 240 pp.

  • Rumelhart, D., , Hinton G. , , and Williams R. , 1986: Learning internal representations by error propagation. Foundations, Vol. 1, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, 318–362.

  • Sahoo, A. K., , Pan M. P. , , Troy T. J. , , Vinukollu R. K. , , Sheffield J. , , and Wood E. F. , 2011: Reconciling the global terrestrial water budget using satellite remote sensing. Remote Sens. Environ., 115, 18501865, doi:10.1016/j.rse.2011.03.009.

    • Search Google Scholar
    • Export Citation
  • Sheffield, J., , Ferguson C. , , Troy T. , , Wood E. , , and McCabe M. , 2009: Closing the terrestrial water budget from satellite remote sensing. Geophys. Res. Lett.,36, L07403, doi:10.1029/2009GL037338.

  • Simon, D., , and Chia T. L. , 2002: Kalman filtering with state equality constraints. IEEE Trans. Aerosp. Electron. Syst., 38, 128136, doi:10.1109/7.993234.

    • Search Google Scholar
    • Export Citation
  • Tarantola, A., 1987: Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation.Elsevier, 613 pp.

  • Thome, K., 2004: In-flight intersensor radiometric calibration using vicarious approaches. Post-Launch Calibration of Satellite Sensors, S. A. Morain and A. M. Budge, Eds., ISPRS Book Series, Vol. 2, Taylor and Francis, 93–102.

1. As is often the case, the various solutions are equivalent when the same assumptions are made (i.e., linearity of the operators and Gaussian character of the random variables considered).

2. In this case, the observations are equal to the actual water cycle component plus additive noise. More complex observing systems could be considered here; the linear function A would then be replaced by a more complex relation.

3. Note that the techniques presented here could be used together with an FG estimate from a numerical model (as in variational assimilation) but, in this case, their potential use for model validation would vanish.
