Mixtures of Gaussians for Uncertainty Description in Bivariate Latent Heat Flux Proxies

R. Wójcik Environmental Sciences Group, Hydrology and Quantitative Water Management, Wageningen University, Wageningen, Netherlands

Peter A. Troch Environmental Sciences Group, Hydrology and Quantitative Water Management, Wageningen University, Wageningen, Netherlands

H. Stricker Environmental Sciences Group, Hydrology and Quantitative Water Management, Wageningen University, Wageningen, Netherlands

P. Torfs Environmental Sciences Group, Hydrology and Quantitative Water Management, Wageningen University, Wageningen, Netherlands

E. Wood Department of Civil and Environmental Engineering, Princeton University, Princeton, New Jersey

H. Su Department of Civil and Environmental Engineering, Princeton University, Princeton, New Jersey

Z. Su International Institute for Geo-Information Science and Earth Observation (ITC), Enschede, Netherlands


Abstract

This paper proposes a new probabilistic approach for describing uncertainty in the ensembles of latent heat flux proxies. The proxies are obtained from hourly Bowen ratio and satellite-derived measurements, respectively, at several locations in the southern Great Plains region in the United States. The novelty of the presented approach is that the proxies are not considered separately, but as bivariate samples from an underlying probability density function. To describe the latter, the use of Gaussian mixture density models—a class of nonparametric, data-adaptive probability density functions—is proposed. In this way any subjective assumptions (e.g., Gaussianity) on the form of bivariate latent heat flux ensembles are avoided. This makes the estimated mixtures potentially useful in nonlinear interpolation and nonlinear probabilistic data assimilation of noisy latent heat flux measurements. The results in this study show that both of these applications are feasible through regionalization of estimated mixture densities. The regionalization scheme investigated here utilizes land cover and vegetation fraction as discriminatory variables.

* Current affiliation: Department of Civil and Environmental Engineering, Princeton University, Princeton, New Jersey

Corresponding author address: R. Wójcik, Dept. of Civil and Environmental Engineering, Princeton University, Princeton, NJ 08544. Email: rwojcik@princeton.edu

1. Introduction

Latent heat flux (LE) is the key variable that provides a link between energy and water budgets at the land surface. Since much of our understanding of the complex feedback mechanisms between the earth surface and the atmosphere is focused on quantifying these budgets, there is considerable interest in developing methods that routinely predict this variable. Local- and regional-scale estimates of LE would offer insight into hydroecological processes, aid in improving irrigation efficiency, and would provide a valuable tool for water resource management. Accurate estimation at large scales is required to improve our understanding of the global climate and its spatial and temporal variability (Miller et al. 1995). However, the prediction and validation of LE across all scales remains problematic.

The conventional methods to estimate LE are based on point measurements of energy balance components or turbulent surface fluxes and are representative only for very local scales. Recently, a new class of techniques based on satellite remotely sensed (RS) information has been developed to compute LE at scales from 1 km to a continent. Despite their theoretical attractiveness, especially for regional and global hydrological applications, “satellite derived” LEsat usually does not compare well with “in situ measured” LEis. Both proxies of LE, however, contain information about the true variability of this quantity. The difficulty in inferring this information from data is due to different sources of uncertainty involved (e.g., measurement errors, support scale, heterogeneity of land surface). In this context it is therefore natural to treat LE as a joint probability density function (pdf) over a spatially distributed set of bivariate random variables comprised of both proxies. Although this “true” high-dimensional joint pdf is a purely theoretical object, we are able to observe the bivariate samples from its marginal pdfs at particular locations in space. The purpose of this paper is threefold: to investigate a new nonparametric methodology for fitting these marginal LE pdfs to the experimental data from six sites, to pose the (preliminary) hypothesis of regionalization of the estimated pdfs through land use alone and additional parameters like degree of vegetation cover using again the six datasets, and to show the theoretical option of using these pdfs in spatiotemporal interpolation and data assimilation. Technically, the last objective is accomplished by first estimating the marginal pdf of bivariate LE and then using it to derive a conditional pdf of LEis given LEsat. 
The motivation for the latter asymmetry is that since in situ measurements of LE are derived from observations of physical processes at the land surface that determine natural variability of LE, they provide best estimates of LE at local scale. On the other hand, satellites observe states that are affecting this flux (as, e.g., surface temperature) at coarser pixel scales. Moreover, when RS information is used to derive LE there are extra sources of uncertainty as compared to in situ measurements due to inadequacies in retrieval algorithms, nonlinear measurement error propagation through RS models for LE, influence of cloud cover, and errors in land-use classification (Hipps and Kustas 2000). In this paper we therefore choose to condition local LEis on LEsat. The conditionals are modeled by a nonparametric class of continuous pdfs referred to as mixtures of Gaussians (MGs; see McLachlan and Peel 2000). The attractive property of MGs is that they do not require any arbitrary assumptions on the form of an underlying pdf (like, e.g., Gaussian assumption). This implies that as compared to the classical parametric approaches MGs can adapt to the local geometry of data ensembles (e.g., points distributed in multiple modes or points distributed on a low-dimensional surface in a high-dimensional space) and are able to approximate any continuous density to an arbitrary precision. Moreover, it is easy to simulate ensembles of points from parameterized MGs, which makes them useful in ensemble Kalman filtering.

The marginal MGs can further be regionalized and used for spatiotemporal interpolation of the LE proxies. In this article we investigate a practical method for regionalization of MGs that utilizes land use and vegetation cover as discriminatory variables. The interpolation is performed by deriving the conditional pdfs from a particular regionalized marginal pdf and then calculating their (conditional) expectation. Such nonlinear interpolation is particularly suitable for the LE flux, which does not aggregate/disaggregate linearly (Braud 1998). Another benefit from having the regionalized conditionals parameterized by MGs is that they can be assimilated into land surface models by the recently developed nonlinear ensemble Kalman filter (see Anderson and Anderson 1999; Torfs et al. 2002).

The LEis data in this paper are obtained from energy balance Bowen ratio (EBBR) systems in the southern Great Plains (SGP) region of the United States. The LEsat proxies are estimated with the Surface Energy Balance System (SEBS) developed by Su (2002).

2. Mixtures of Gaussians

a. Definition

To estimate the conditional uncertainty of LEis given LEsat, a pdf first needs to be fitted to a bivariate sample $\{LE_{sat,k}; LE_{is,k}\}_{k=1}^{K}$. In this work the focus is on the use of MGs, which are defined as a linear combination of Gaussian densities (see Fig. 1), called components:
$$p(\mathbf{x}) = \sum_{n=1}^{N_c} w_n\, g(\mathbf{x}; \mathbf{m}_n, \mathbf{C}_n), \tag{1}$$
where $\mathbf{x}$ is a D-dimensional vector of variables, $N_c$ is the number of components, and $g(\mathbf{m}_n, \mathbf{C}_n)$ stands for the Gaussian density with mean $\mathbf{m}_n$ and covariance $\mathbf{C}_n$.
Here D = 2, $\mathbf{x} = [LE_{is}, LE_{sat}]^T$, and the $w_n$ are the component weights, which satisfy $w_n \geq 0$ for all n and $\sum_n w_n = 1$. Densities of the MG type inherit a lot of interesting properties from their Gaussian components: for example, the conditional densities $p(x_1|x_2)$,
$$p(x_1|x_2) = \frac{p(x_1, x_2)}{\int p(x_1, x_2)\, dx_1}, \tag{2}$$
are again MGs and can be calculated analytically (for technical details see Sharma 2000; Torfs and Wójcik 2001), Monte Carlo sampling is easy and fast, and, if needed, the conditional expectation (regression curve),
$$E[x_1|x_2] = \int x_1\, p(x_1|x_2)\, dx_1, \tag{3}$$
can be derived from (1) and again is given by an analytic expression (Torfs and Wójcik 2001).
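
The analytic conditioning described above can be illustrated with a minimal sketch for the bivariate case. It assumes the standard result for conditioning a Gaussian on one of its coordinates; the function names and the two-component example mixture are hypothetical and are not taken from the fitted densities in this paper.

```python
import math

def gauss1d(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def conditional_mg(x2, weights, means, covs):
    """Return weights, means and variances of the 1D mixture p(x1 | x2).

    means[n] = (m1, m2) and covs[n] = ((c11, c12), (c12, c22)) describe
    component n of the bivariate mixture p(x1, x2).
    """
    cond_w, cond_m, cond_v = [], [], []
    for w, (m1, m2), ((c11, c12), (_, c22)) in zip(weights, means, covs):
        cond_w.append(w * gauss1d(x2, m2, c22))    # weight times marginal density of x2
        cond_m.append(m1 + c12 / c22 * (x2 - m2))  # Gaussian conditional mean
        cond_v.append(c11 - c12 * c12 / c22)       # Gaussian conditional variance
    total = sum(cond_w)
    return [w / total for w in cond_w], cond_m, cond_v

def conditional_mean_std(x2, weights, means, covs):
    """Conditional expectation E[x1 | x2] and conditional standard deviation."""
    w, m, v = conditional_mg(x2, weights, means, covs)
    mean = sum(wi * mi for wi, mi in zip(w, m))
    second = sum(wi * (vi + mi * mi) for wi, mi, vi in zip(w, m, v))
    return mean, math.sqrt(max(second - mean * mean, 0.0))

# Hypothetical two-component mixture over x = [LE_is, LE_sat] (W m^-2).
weights = [0.6, 0.4]
means = [(100.0, 120.0), (300.0, 280.0)]
covs = [((900.0, 400.0), (400.0, 1600.0)), ((2500.0, -500.0), (-500.0, 2500.0))]
for le_sat in (100.0, 200.0, 300.0):
    print(le_sat, conditional_mean_std(le_sat, weights, means, covs))
```

Because the conditional weights depend on the conditioning value, the resulting regression curve is in general nonlinear even though each component contributes a linear conditional mean.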

b. Fitting procedure

Fitting (1) to data requires optimizing the weights $w_n$ and the parameters of the components $\theta_n = \{\mathbf{m}_n, \mathbf{C}_n\}$. The most commonly used optimization criterion is maximum likelihood (ML), implemented via the expectation maximization (EM) algorithm (McLachlan and Krishnan 1997). Standard EM for mixtures, however, exhibits some weaknesses: it requires knowledge of $N_c$, and good initialization is essential for reaching a good local optimum. To overcome these difficulties we use the approach of Figueiredo and Jain (2002) based on the minimum message length (MML) criterion. The rationale behind MML is that if one can build a short code describing one's data, then one has a good data generation model (Bishop 1995). Mathematically, the MML criterion for MG pdfs consists of minimizing with respect to θ, where $\theta \equiv \{\theta_1, \ldots, \theta_{N_c}, w_1, \ldots, w_{N_c}\}$, the following cost function:
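$$\mathcal{L}(\theta) = \frac{N}{2}\sum_{n:\, w_n > 0} \log\left(\frac{K w_n}{12}\right) + \frac{N_{nz}}{2}\log\frac{K}{12} + \frac{N_{nz}(N + 1)}{2} - \log p(\mathbf{x}_1, \ldots, \mathbf{x}_K \mid \theta), \tag{4}$$
where $N = \dim(\theta_n)$, K is the number of sample points, and $N_{nz}$ is the number of components with nonzero weight [this is the form of the criterion given by Figueiredo and Jain (2002)]. An attractive property of the algorithm of Figueiredo and Jain (2002) is that it is coupled with a model selection procedure that automatically determines the number of components $N_c$. Thus, an MG can be initialized with a large value of $N_c$, alleviating the need for careful initialization. Because of this, a component-wise version of EM (Celeux et al. 2001) is adopted in Figueiredo and Jain (2002) to minimize (4).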
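
To make the trade-off in the MML criterion concrete, the sketch below evaluates the cost in the form given by Figueiredo and Jain (2002): a penalty that grows with the number of nonzero-weight components, minus the log-likelihood. The function name, argument names, and numbers are hypothetical, and the log-likelihood is assumed to have been computed elsewhere.

```python
import math

def mml_cost(weights, loglik, n_samples, n_params_per_component):
    """MML cost for a mixture; components with zero weight are ignored.

    weights                -- mixture weights w_n
    loglik                 -- log-likelihood of the sample under the mixture
    n_samples              -- number of sample points K
    n_params_per_component -- N = dim(theta_n), e.g. 5 for a bivariate Gaussian
    """
    nonzero = [w for w in weights if w > 0.0]
    k_nz = len(nonzero)
    N, K = n_params_per_component, n_samples
    penalty = 0.5 * N * sum(math.log(K * w / 12.0) for w in nonzero)
    penalty += 0.5 * k_nz * math.log(K / 12.0)
    penalty += 0.5 * k_nz * (N + 1)
    return penalty - loglik

# Hypothetical comparison: same log-likelihood, different numbers of components.
two = mml_cost([0.5, 0.5], loglik=-4200.0, n_samples=1000, n_params_per_component=5)
four = mml_cost([0.25] * 4, loglik=-4200.0, n_samples=1000, n_params_per_component=5)
print(two, four)
```

At equal likelihood the mixture with fewer components has the lower cost, which is how superfluous components are driven out during minimization.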

3. Model structure of SEBS

The estimates of LEsat were computed with SEBS (Su 2002). This model calculates atmospheric turbulent heat fluxes using satellite earth observation data. The SEBS consists of three components—land surface parameters, sensible heat flux estimation, and the energy balance—that are described briefly below [see Su (2002) for additional details].

Required land surface parameters include albedo, emissivity, surface temperature, fractional vegetation coverage, leaf area index, and the height of the vegetation, from which displacement height and roughness height are derived. All this information is usually derived from remote sensing radiance data [e.g., the Moderate Resolution Imaging Spectroradiometer (MODIS)] in conjunction with other surface-related data [e.g., those from the Land Data Assimilation System (LDAS) database; http://ldas.gsfc.nasa.gov/LDAS8th/MAPPED.VEG/LDASmapveg.shtml]. The estimation of the sensible heat flux H is based on the aerodynamic profile method and, because surface temperature is used, on the determination of the two roughness lengths for heat and momentum transfer, as described in Su et al. (2001). Required observations [or fields from a four-dimensional data assimilation (4DDA) analysis] include air pressure, air temperature, humidity, and wind speed at a reference height (the measurement height for local-scale applications). For local-scale applications, SEBS obtains the friction velocity, the sensible heat flux, and the Monin–Obukhov stability length by iteratively solving a system of nonlinear equations. For field measurements performed at a height of a few meters above ground, where the surface fluxes are related to surface variables and variables in the atmospheric surface layer, all calculations involve the Monin–Obukhov similarity (MOS) functions given by Brutsaert (1999). The fluxes are based on the energy balance relations and utilize measured net radiation (or its equivalent derived through satellite-based observations of incoming radiation and surface temperature) and an estimate of the ground heat flux (see Su 2002). Latent heat flux can now be estimated from the energy balance equation:
$$LE = R_n - G - H, \tag{5}$$
where $R_n$ stands for net radiation and G for soil heat flux.
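
As a small illustration of the residual approach, the sketch below computes latent heat flux from the energy balance once net radiation, ground heat flux, and sensible heat flux are known. The function name and the flux values are hypothetical; in SEBS the sensible heat flux itself comes from the iterative solution described above, which is not reproduced here.

```python
def latent_heat_flux(net_radiation, ground_heat_flux, sensible_heat_flux):
    """Latent heat flux as the residual of the surface energy balance (W m^-2)."""
    return net_radiation - ground_heat_flux - sensible_heat_flux

# Hypothetical midday values in W m^-2.
le = latent_heat_flux(net_radiation=500.0, ground_heat_flux=50.0, sensible_heat_flux=150.0)
print(le)
```

Because LE is obtained as a residual, any error in the other three terms propagates directly into it.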

4. MGs in the generalized ensemble Kalman filter

Besides serving as pure uncertainty descriptors, the conditionals in (2) can be assimilated into land surface models [e.g., the variable infiltration capacity (VIC) model in Liang et al. 1996] by the generalized ensemble Kalman filter (GEnKF). This algorithm [originally inspired by Anderson and Anderson (1999) and cast by Torfs et al. (2002) into a broader probability theoretical framework] goes beyond the classical linear ensemble Kalman filter (EnKF; Evensen 1994) in the sense that it does not require any a priori assumptions on the form of pdfs for state and observational noise, nor does it presume linearity of state and/or output equations to get optimal state estimates. Accordingly, when new observations become available, the state updates are not restricted to assimilating Gaussian pdfs of these observations as in the EnKF, but allow any nonparametric pdfs (e.g., MGs) to be incorporated. This new idea extends the work of Torfs et al. (2002) and makes GEnKF competitive with more traditional data assimilation schemes. Since the overall objective of this research direction is the implementation of GEnKF, we find it useful here to give a brief mathematical description of the algorithm with emphasis on state updates given new observations.

Let us denote the state of the system at time n as $s_n$, the state at time n + 1 as $s_{n+1}$, and the observation at time n + 1 as $o_{n+1}$. These three variables can be scalars or vectors. We use $\phi_n$ to denote the joint pdf and $f_n$, $f_{n+1}$, and $h_{n+1}$ their respective marginals. Given a new observation $o_{n+1}$, the solution to the GEnKF problem is given by
[Equation (6)]
Assuming conditional independence of $s_n$ and $o_{n+1}$ given $s_{n+1}$ and making use of the Bayesian approach, the following relations hold (Torfs et al. 2002):
[Equation (7)]
[Equation (8)]
Equation (7) is referred to as the prediction step and (8) as the Kalman gain or analysis step. When the observation is not known deterministically, but only its probability density ϕ is given, this last formula is to be replaced by
[Equation (9)]
With preliminary knowledge of $f_{n,n+1}(s_n, s_{n+1})$ and $\phi_{n+1}(s_{n+1}, o_{n+1})$, the integrals above are calculated recursively until the time of the latest observation.

Equations (6)–(9) describe an abstract setting for Kalman filtering, regardless of how the densities involved are given. In this paper we propose to approximate them by MGs. MGs are particularly well suited for this: because simulating from them is extremely fast, all integrals above can be evaluated by Monte Carlo sampling. Figure 2 illustrates the steps involved in computing the product in (9). First, a number of starting points are simulated from the known observational pdf $\phi_{n+1}$ (Fig. 2a; the marks stand for the simulated points). From this ensemble, we calculate a statistically relevant set of posterior state pdfs $f^g_{n+1}$ (on the lines of Fig. 2b). For this we use our knowledge of $\phi_{n+1}$ (Fig. 2c) at the simulated starting points. Then, the posteriors are again sampled, which is visualized in Fig. 2d. The joint pdf $\phi^*_{n+1}$ is then fitted in Fig. 2f to the sample in Fig. 2e. Finally, $\phi^*_{n+1}$ is marginalized (Fig. 2g), resulting in the analysis pdf $F^g_{n+1}$.
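
The sampling steps described above can be sketched in one dimension. This is only a toy illustration under stated assumptions, not the GEnKF implementation: the joint density of state and observation is taken to be Gaussian (so the posterior given an observation is known in closed form), the observational pdf is a hypothetical two-component mixture, and the final refitting and marginalization are replaced by simple ensemble statistics. All names are hypothetical.

```python
import math
import random

def sample_mixture(rng, weights, means, stds):
    """Draw one value from a 1D mixture of Gaussians."""
    n = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[n], stds[n])

def analysis_ensemble(rng, n_draws, obs_mixture, cond_mean, cond_std):
    """Sample observations, then sample the state posterior at each of them."""
    ensemble = []
    for _ in range(n_draws):
        o = sample_mixture(rng, *obs_mixture)      # starting point from the observational pdf
        s = rng.gauss(cond_mean(o), cond_std(o))   # draw from the posterior of the state given o
        ensemble.append(s)
    return ensemble

# Toy joint: s ~ N(0, 1), o = s + e with e ~ N(0, 0.5^2), so s | o ~ N(0.8 o, 0.2).
rng = random.Random(0)
obs_mixture = ([0.5, 0.5], [1.0, 2.0], [0.1, 0.1])  # uncertain (bimodal) observation
ensemble = analysis_ensemble(rng, 20000, obs_mixture,
                             cond_mean=lambda o: 0.8 * o,
                             cond_std=lambda o: math.sqrt(0.2))
analysis_mean = sum(ensemble) / len(ensemble)
print(analysis_mean)
```

In the full scheme an MG would be fitted to the sampled pairs and then marginalized over the observation; here the retained state values play the role of a sample from that marginal.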

When the dimensionality of the state/observation space is high, as is the case for spatially distributed hydrologic models, the performance of the algorithm above would suffer from the curse of dimensionality (Gershenfeld 1992). That term was coined by Bellman (1961) to describe the problem that occurs when searching in or estimating pdfs on high-dimensional spaces. This problem may become intuitively clearer by looking at the example of fitting multidimensional histograms. Given a fixed number of M grid lines per dimension D, the number of independent cells grows as $M^D$. Furthermore, if the density function is to be estimated based on a set of high-dimensional samples, the number of samples required for accurate histogram estimation also grows as $M^D$. The same is true for MG models: here the components are continuous equivalents of cells used in histogramming, and their weights can be viewed as histogram values at those cells. A pragmatic way to tackle this problem is to regionalize the pdfs. In other words, instead of using high-dimensional joint pdfs representative of the entire spatial domain of a particular model, we propose to use a few lower-dimensional marginal pdfs that are representative only of subdomains. For marginal LE pdfs, identification of these subdomains might be based on two discriminatory variables described in section 7c.

5. Control run and the surrogates

When estimating MG pdfs for bivariate LE ensembles, there are three potential difficulties that should be addressed:

  • Undersampling: For optimization of 2D MGs, six parameters have to be estimated for each component [see Eq. (1)]. The algorithm in Figueiredo and Jain (2002) is only guaranteed to be robust with regard to the initialization procedure if the sample size is “large enough.” As mentioned in section 4, the number of points required for reasonable density estimation in D-dimensional space scales roughly as $M^D$. Gaps in data might therefore result in higher uncertainty in the MG parameters. There are a few common reasons for missing records in satellite-derived data. For example, the performance of the RS retrieval algorithms for surface temperature Ts is influenced by cloud cover. Thus, SEBS predictions are available only for cloud-free or partly cloud-free days. Moreover, satellites provide only a snapshot view of the spatial variability in Ts and, as a consequence, indirectly of the spatial variability in LEsat. Finally, accidental technical failures in in situ or RS instruments are responsible for gaps in data.

  • Instrumental and model errors: Although LEis observations are the closest approximation to the natural variation of LE at local scales, the techniques used to measure LEis, for example, EBBR systems, are themselves not without error. Indeed, in the heterogeneous landscape of the First International Satellite Land Surface Climatology Project (ISLSCP) Field Experiment (FIFE) campaign, errors in LEis were often as high as 20% (Nie et al. 1992). On the other hand, LEsat estimates are prone to errors in the RS inputs to SEBS and to the limitations of SEBS itself in reproducing the complicated physical situation in the surface layer of air.

  • Scaling: The footprint over which an LEsat is determined is rarely at the same scale as LEis, making direct comparison difficult.
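
The parameter count and the scaling mentioned under undersampling above can be checked with a few lines. This is a minimal sketch; the function names are hypothetical, and the count of six per component assumes two mean entries, three distinct entries of the symmetric covariance matrix, and one weight.

```python
def mg_parameters_per_component(n_dims):
    """Parameters per Gaussian component: mean, symmetric covariance, weight."""
    n_mean = n_dims
    n_cov = n_dims * (n_dims + 1) // 2  # distinct entries of a symmetric matrix
    n_weight = 1
    return n_mean + n_cov + n_weight

def histogram_cells(lines_per_dim, n_dims):
    """Number of cells of a histogram with M grid lines in each of D dimensions."""
    return lines_per_dim ** n_dims

print(mg_parameters_per_component(2))  # bivariate LE case
for d in (1, 2, 3, 6):
    print(d, histogram_cells(10, d))
```

The rapid growth with dimension is the practical reason for working with low-dimensional, regionalized pdfs rather than one joint pdf for the whole domain.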

The above-mentioned problems are usually responsible for strong scattering effects that blur the dependency structure underlying the bivariate LE ensembles. Recalling that the second objective of this work is to investigate the robustness of a practical methodology for regionalization of marginal LE pdfs—or, in other words, to investigate whether for classes of visually similar regions particular pdfs can be representative—such a situation needs to be resolved. As a pragmatic approach we propose the following course of action. We first pose the hypothesis that p(LEis, LEsat)’s have a structure that depends on land use and vegetation cover, equivalent to visually similar regions. Next, the hypothesis is verified by creating “idealized” bivariate LE samples for a variety of environmental conditions (in terms of water supply, available energy, saturation deficit, turbulent transport, and vegetation characteristics). These hourly, daylight-based samples are referred to as the control run. The LEis in the control run are taken “as is,” and LEsat proxies are obtained from SEBS forced with Rn estimated from in situ–measured radiation fluxes and Ts derived from the longwave radiation:
$$T_s = \left[\frac{LW_{out} - (1 - \varepsilon)\, LW_{in}}{\varepsilon\, \varsigma}\right]^{1/4}, \tag{10}$$
where $LW_{out}$ denotes the outgoing longwave radiation, $LW_{in}$ refers to the incoming longwave radiation, ε is the emissivity of the surface, and ς stands for the Stefan–Boltzmann constant. Note that in SEBS Rn and H, which specify the total available energy and sensible heat flux, respectively, depend on Ts. Next, the LEsat in the control run is perturbed by Monte Carlo propagation of “satellite” error in Ts through Rn and H in SEBS (see section 7a for technical details of this procedure). In this way, again keeping LEis unchanged, we obtain another bivariate sample referred to as the surrogate data. It is expected that the introduction of error sources blurs the structure in the control run but does not make it disappear. Thus, MG pdfs fitted separately to the control run and surrogate data should be similar. To quantify the strength of this similarity we use the L2 correlation of Scott and Szewczyk (2001):
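$$\mathrm{corr}_{L_2}(p_1, p_2) = \frac{\int p_1(\mathbf{x})\, p_2(\mathbf{x})\, d\mathbf{x}}{\left[\int p_1^2(\mathbf{x})\, d\mathbf{x} \int p_2^2(\mathbf{x})\, d\mathbf{x}\right]^{1/2}}, \tag{11}$$
where $p_1$ and $p_2$ are $L_2$-integrable pdfs for the control run and surrogate data, respectively. This measure is 0 if the two pdfs show no similarity and 1 if they are identical. For MGs, (11) can be calculated analytically.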
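
The analytic evaluation for MGs rests on the fact that the integral of the product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means, with the covariances added. A minimal bivariate sketch follows; the function names and the two example mixtures are hypothetical and do not correspond to the fitted densities in this study.

```python
import math

def gauss2d(x, mean, cov):
    """Bivariate Gaussian density; cov = ((a, b), (b, d)) is symmetric."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the inverse of the symmetric 2x2 covariance matrix.
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def overlap(mg_a, mg_b):
    """Integral of the product of two mixtures, each given as (weights, means, covs)."""
    total = 0.0
    for wa, ma, ca in zip(*mg_a):
        for wb, mb, cb in zip(*mg_b):
            csum = tuple(tuple(ca[i][j] + cb[i][j] for j in range(2)) for i in range(2))
            total += wa * wb * gauss2d(ma, mb, csum)
    return total

def l2_correlation(mg_a, mg_b):
    """Normalized L2 inner product of two mixtures of Gaussians."""
    return overlap(mg_a, mg_b) / math.sqrt(overlap(mg_a, mg_a) * overlap(mg_b, mg_b))

# Hypothetical "control" and "surrogate" mixtures over x = [LE_is, LE_sat];
# the surrogate has a larger variance in the LE_sat direction.
control = ([0.5, 0.5], [(100.0, 100.0), (300.0, 250.0)],
           [((400.0, 100.0), (100.0, 400.0)), ((900.0, -200.0), (-200.0, 900.0))])
surrogate = ([0.5, 0.5], [(100.0, 100.0), (300.0, 250.0)],
             [((400.0, 100.0), (100.0, 1000.0)), ((900.0, -200.0), (-200.0, 1800.0))])
print(l2_correlation(control, surrogate))
```

The cost is quadratic in the number of components, which is negligible for the small mixtures considered here.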

To summarize the above procedure: undersampling in LEsat data is tackled by creating hourly surrogate data; error propagation is controlled by creating a control run of hourly data; and scaling is accounted for automatically by obtaining the conditional pdfs p(LEis|LEsat) in (2) from regionalized marginal pdfs p(LEis, LEsat) and using these conditionals to compute the regression curves in (3).

6. Data

The measurements of LEis used in this study come from six EBBR stations of the U.S. Department of Energy's Atmospheric Radiation Measurement Program Cloud and Radiation Testbed (ARM/CART) (E15, E4, E9, E20, E7, E25) distributed across the SGP region of the United States (see Fig. 3). ARM is aimed at obtaining field measurements and developing models to better understand the processes that control solar and thermal infrared radiative transfer in the atmosphere and at the earth's surface. The SGP CART site was the first field site established by ARM and consists of in situ and remote sensing instrument clusters across north-central Oklahoma and south-central Kansas.

The LEis proxies are based on 30-min averaged observations. The LEsat estimates in the control run were obtained with SEBS forced with the input set displayed in Table 1.

Both types of LE data were obtained at hourly resolution for the period 1 July–30 September 2001. We further restricted the data to an 8-h daytime window (0900–1700 local time), so that only unstable and neutral conditions in the atmospheric surface layer were considered. Figure 4 shows the control run ensembles of the bivariate LE data.

These ensembles can be thought of as discrete samples from the unknown “true” marginal pdfs. Looking at the ensembles in Fig. 4, it is clear that simple parametric families of pdfs such as Gaussians are not flexible enough to capture the geometry of the problem. For this reason, in section 7b we fit MGs to describe the LE data.

7. Results

a. Analysis of surrogate LE data

To obtain the surrogate data, “satellite” errors in Ts were propagated through SEBS by Monte Carlo simulation, with Rn and H recalculated for each realization. We assumed that errors in Ts are additive and follow an $N(0, \sigma^2)$ distribution with σ = 1.5 K. This estimate is derived from studies on MODIS Ts retrieval algorithms reported by Sobrino et al. (2003) and Wan et al. (2004). To perform the Monte Carlo propagation, 40 points were generated at random from the error distribution and added to or subtracted from a particular Ts measurement in the control run. This operation was independently repeated for all available Ts estimates from (10). Because we restricted ourselves to considering only unstable and neutral conditions in the atmospheric surface layer, whenever a realization with Ts < Ta (Ta denotes 2-m air temperature) appeared (sporadically), it was replaced with a regenerated value, the regeneration being repeated until Ts ≥ Ta. All the Ts realizations were then propagated through SEBS to obtain the surrogate LE data. The surrogates are displayed in Fig. 5.
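
The perturbation step can be sketched as follows. The sketch assumes the usual grey-body longwave balance to obtain Ts from the measured longwave fluxes, draws 40 additive Gaussian errors with a standard deviation of 1.5 K, and regenerates any realization that falls below the air temperature. The function names and the input values are hypothetical, and the subsequent propagation through SEBS is not reproduced.

```python
import random

STEFAN_BOLTZMANN = 5.670374419e-8  # W m^-2 K^-4

def surface_temperature(lw_out, lw_in, emissivity):
    """Surface temperature (K) from outgoing and incoming longwave radiation."""
    emitted = lw_out - (1.0 - emissivity) * lw_in  # remove the reflected part
    return (emitted / (emissivity * STEFAN_BOLTZMANN)) ** 0.25

def perturb_ts(rng, ts, ta, sigma=1.5, n_draws=40):
    """Add N(0, sigma^2) errors to ts, regenerating any draw below ta.

    Assumes ts >= ta (unstable or neutral conditions); otherwise the
    regeneration loop could take very long to terminate.
    """
    draws = []
    for _ in range(n_draws):
        value = ts + rng.gauss(0.0, sigma)
        while value < ta:
            value = ts + rng.gauss(0.0, sigma)
        draws.append(value)
    return draws

# Hypothetical hourly record: longwave fluxes in W m^-2, air temperature in K.
ts = surface_temperature(lw_out=450.0, lw_in=380.0, emissivity=0.98)
realizations = perturb_ts(random.Random(42), ts, ta=ts - 1.0)
print(ts, min(realizations), max(realizations))
```

Note that the regeneration truncates the error distribution from below, so the perturbed ensemble is slightly biased toward warmer values whenever Ts is close to Ta.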

There is another subtle point pertinent to the above algorithm. In SEBS, an error in Ts directly contaminates the Rnet and H estimates (see Su 2002) and, in consequence, aggravates the errors in LEsat. However, the question arises of whether the error in Ts is representative of the “satellite” error in Rnet. To investigate that issue, the following analysis was done. First, by comparing in situ–measured Rnet,is with SEBS-estimated Rnet,sat using the dataset in Wood et al. (2003), we found the error in Rnet,sat to be as high as a 15% coefficient of variation ($c_v$). Then, we performed a simple check on the order of magnitude of the absolute error in Rnet,sat originating from the 15% error and from the 1.5-K error for a range of Rnet,sat values. These results are shown in Table 2.

The clear conclusion from the table is that the error in Ts cannot account for the total error in Rnet,sat as estimated by SEBS using the dataset in Wood et al. (2003). In practice there are evidently many more error sources in the computation of Rnet,sat, although we have to realize that part of the 15% $c_v$ arises from the mismatch in spatial scale between Rnet,sat and Rnet,is. Returning to the point of how seriously H and Rnet aggravate the contamination of LEsat by errors in Ts (cf. Figs. 4 and 5), it is mainly through H and only to a small extent through Rnet. This also implies that we may expect in practice substantially more scatter in a figure like Fig. 5 if more error sources were considered. However, here we restrict our exercise to the simplest case of a single error source in Ts to demonstrate the methodology and usefulness of applying MGs.

b. MG density fitting

Bivariate MGs were fitted to both the control run and surrogate data. We initialized the mean vectors $\mathbf{m}_n$ in (1) to 30 randomly chosen data points. The initial covariances were made proportional to the identity matrix, $\mathbf{C}_n = \sigma^2_{init}\mathbf{I}$, with the diagonal entries $\sigma^2_{init}$ equal to 1/10 of the mean of the variances along each dimension of the data:
$$\sigma^2_{init} = \frac{1}{10 D}\,\mathrm{trace}\left[\frac{1}{K}\sum_{i=1}^{K} (\mathbf{x}^{(i)} - \mathbf{m})(\mathbf{x}^{(i)} - \mathbf{m})^T\right], \tag{12}$$
where $\mathbf{m} = (1/K)\sum_{i=1}^{K}\mathbf{x}^{(i)}$ is the global data mean (see Figueiredo and Jain 2002). This step was meant to ensure that the initial density at each data point is reasonably greater than zero. Figure 6 displays the fitted MGs. Comparing these pdfs to the discrete underlying ensembles in Figs. 4 and 5 shows that MGs are a smoothed continuous representation of the underlying points. It is also clear from the figure that MGs can capture the particular local features of the ensemble, whereas standard parametric densities (e.g., Gaussians) are unable to do this.
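
The initialization described above can be sketched as follows: means are drawn from the data, and every component starts with the same isotropic covariance whose variance is one-tenth of the mean per-dimension variance. Equal initial weights are an assumption of this sketch, and the function name and the synthetic data are hypothetical.

```python
import random

def initialise_mg(rng, data, n_components=30):
    """Initial weights, means and covariances for a mixture of Gaussians."""
    K, D = len(data), len(data[0])
    mean = [sum(x[d] for x in data) / K for d in range(D)]
    # The trace of the sample covariance equals the sum of per-dimension variances.
    trace = sum(sum((x[d] - mean[d]) ** 2 for x in data) / K for d in range(D))
    sigma2_init = trace / (10.0 * D)
    means = rng.sample(data, n_components)  # distinct randomly chosen data points
    covs = [[[sigma2_init if i == j else 0.0 for j in range(D)] for i in range(D)]
            for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components  # assumption: equal initial weights
    return weights, means, covs

# Synthetic bivariate sample standing in for [LE_is, LE_sat] pairs.
rng = random.Random(0)
data = [(rng.uniform(0.0, 500.0), rng.uniform(0.0, 500.0)) for _ in range(400)]
weights, means, covs = initialise_mg(rng, data)
print(len(means), covs[0])
```

Starting from many small, identical components leaves it to the fitting procedure to discard those that are not supported by the data.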

To quantify the similarity between pdfs for the control run and those for the surrogate data, the L2 correlation in (11) was calculated for each pair of MGs. These results are given as numbers in Fig. 6. Since the value of the correlation is high (0.84–0.97), the error in Ts did not have much influence on the control run pdfs of the bivariate LE data. It refines the control run pdfs, one may say. Note that the error in Ts has, however, a pronounced impact on LEsat estimates from SEBS, which can be seen by comparing horizontal spread of the ensembles in Figs. 4 and 5.

Calculation of the conditional MGs from the marginal MGs fitted to the surrogate data provides a basis for potential applications in spatiotemporal interpolation and data assimilation. The former can be achieved by making probabilistic predictions of LEis by either resampling from the conditional density p(LEis|LEsat) or calculating the conditional expectation E[LEis|LEsat]. The latter requires the knowledge of p(LEis|LEsat) and implementation of the algorithm described in section 4. For two grassland sites in the SGP region, marginal and conditional pdfs together with corresponding regression curves and standard deviation envelopes are presented in Fig. 7.

It is clear that the form of the conditional MG pdfs changes as LEsat increases and reveals a variety of shapes, from Gaussian to highly non-Gaussian (e.g., multimodal or skewed). This nonstationary behavior influences the geometry of the conditional expectation and standard deviation envelopes, which, for E15, are clearly nonlinear. Interestingly, the conditional pdfs for the E4 site appear to flatten (or, technically, to have higher entropy) with increasing LEsat. The opposite is true for the E15 site. This is in accordance with the scatter patterns for both sites in Fig. 4. Both sites are situated in the grassland area and show a similar range of LEis values. However, it can be seen from Figs. 4 and 5 that the hydrometeorological conditions, as produced by SEBS, suggest that E15 is drier than E4. Therefore, our first conclusion with respect to the objectives of this study is that land-cover type alone cannot be used to regionalize LE pdfs. In the next section we identify an additional control parameter in SEBS that makes the regionalization feasible.

c. Regionalization of MGs

One of the SEBS parameters that characterize vegetation cover over a particular area is the vegetation fraction (fc). This parameter, which takes values from 0 to 1, plays a role in the estimation of soil heat flux and, most importantly, in the determination of the scalar roughness height for heat transfer (Su 2002). The latter is a crucial parameter in the parameterization of momentum and heat transfer. From the physical point of view one may say that the smaller the fc, the earlier the reduction of LEsat with respect to some potential LE will start, and the steeper this reduction will be during a period of dryness. In contrast, after a period of dryness the LE of the (1 − fc) fraction will recover much more quickly than that of the fc fraction.

To test the sensitivity of the bivariate LE data to fc we performed the following exercise. We extracted the < min; max > range of fc for the E4 and E15 sites, which was < 0.45; 0.61 > and < 0.22; 0.32 >, respectively. Then we altered the original values of fc in the control run by subtracting the offsets [0.1; 0.2; 0.3; 0.4] for E4 and by adding the offsets [0.2; 0.4; 0.5; 0.6] for E15. Afterward, SEBS was forced with these eight variants of fc while the other variables in the control run were kept unchanged. Thus, for each of the two sites we obtained four different bivariate LE ensembles. These ensembles are shown in Figs. 8 and 9.
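The offset arithmetic described above can be checked with a few lines of code. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the ranges and offsets are the ones quoted in the preceding paragraph. It confirms that all eight shifted fc ranges stay inside the admissible interval from 0 to 1.

```python
# fc ranges (min, max) and offsets as quoted in the text; names are hypothetical.
fc_range = {"E4": (0.45, 0.61), "E15": (0.22, 0.32)}
offsets = {"E4": [-0.1, -0.2, -0.3, -0.4], "E15": [0.2, 0.4, 0.5, 0.6]}

def shifted_ranges(site):
    """Return the (min, max) fc range of a site after applying each offset."""
    lo, hi = fc_range[site]
    return [(round(lo + d, 2), round(hi + d, 2)) for d in offsets[site]]

for site in ("E4", "E15"):
    for d, (lo, hi) in zip(offsets[site], shifted_ranges(site)):
        # fc is a fraction, so every shifted range must remain within [0, 1].
        assert 0.0 <= lo <= hi <= 1.0
        print(f"{site}: offset {d:+.1f} -> fc range < {lo:.2f}; {hi:.2f} >")
```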

Clearly, fc controls the spread of the presented ensembles around the identity line. In the case of the E15 ensembles one can notice an acceleration in the evapotranspiration process. The opposite effect is visible for E4. Notice that by adjusting the value of fc, the initial geometric pattern of the E15 ensemble (see Fig. 4) can be approximately transformed into the pattern present in the E4 ensemble (cf. lower-right panel of Fig. 9 with the lower-left panel of Fig. 4). We repeated the same exercise for the remaining sites. In all cases we were able to achieve a similar degree of control by shifting the ensembles, depending on the initial geometry of the ensemble in the control run. So, in practice, we may conclude that land-cover type together with land-cover intensity fc are steering factors for regionalization of LE pdfs. From the remaining scatter we do realize, however, that many other variables are responsible for LE dynamics.

Ideally, regionalization of bivariate LE pdfs would require us to collect LEis observations for a range of fc within a land-cover type, which is difficult to achieve in practice, especially for sparse in situ networks. However, as demonstrated above we can control the shape of bivariate ensembles to a reasonable extent by changing the value of fc in SEBS. So the approximate solution here could be as follows:

  • at a given site (a) with a particular land-cover type and with available in situ LE measurements, the bivariate pdf is fitted to the data ensemble;

  • for a different site (b) with the same land-cover type but without in situ latent heat flux measurements, the estimated fc (presumably from remote sensing) is used to transform the existing ensemble at site (a), and the pdf is refitted.

To demonstrate the above approach, the following cross-validation procedure was performed. Given information about bivariate LE fluxes at E4 (the Plevna site, with grassland cover) and known fc, we tried to infer the structure of the MG pdf at E15 (the Ringwood site), which has the same land cover but a different fc. That is, we fitted the MG densities to the data ensembles in Fig. 8 {i.e., we approximated fc at the E15 site by fc at E4 minus the offset [0.1; 0.2; 0.3; 0.4]} and compared these with the MG density for the E15 site fitted to the control run ensemble in the upper-left panel of Fig. 4. The comparison was done using the similarity measure in (11). The results are displayed as numbers (in bold) in Fig. 8. The L2 correlation between pdfs is highest (0.81) when subtracting the 0.4 offset, which yields an fc range of < 0.05; 0.21 >. This range, however, is low compared to the original fc range derived from MODIS for the E15 site: < 0.22; 0.31 >. Therefore, there is some unexplained uncertainty in the regionalization procedure, which can be attributed to the uncertainty in MODIS-derived fc and to differences between the two sites in terms of the soil water balance. The latter source of uncertainty is demonstrated in Fig. 10 as a scatterplot between in situ evaporative fractions for E15 and E4. Clearly E4 is evaporating more than E15. This is related to available energy, soil moisture (thus antecedent precipitation), soil water storage capacity, and fc (assuming vegetation types are the same). So one can conclude that by using land-cover type and fc as regionalization parameters, one gains a reasonable amount of information about the underlying bivariate pdfs at sites where LEis measurements are not available. This information, however, is not sufficient to guarantee full recovery of the underlying MG pdf, due to the aforementioned sources of uncertainty. It is worth mentioning that similar cross-validation results were obtained for the E9 (Ashton) and E20 (Meeker) sites, which have cropland land cover.
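A similarity score between two fitted mixtures can be evaluated without numerical integration, because the integral of a product of two Gaussian densities is itself a Gaussian evaluated at the difference of the means. The sketch below assumes that the measure in (11) is a normalized L2 inner product, that is, the integral of p1 p2 divided by the square root of the product of the integrals of p1 squared and p2 squared; this form is an assumption made here for illustration, as are all function names and parameter values.

```python
import numpy as np

def gauss_overlap(m1, c1, m2, c2):
    """Integral of N(x; m1, c1) N(x; m2, c2) dx, equal to N(m1; m2, c1 + c2)."""
    d = np.asarray(m1, dtype=float) - np.asarray(m2, dtype=float)
    c = np.asarray(c1, dtype=float) + np.asarray(c2, dtype=float)
    k = d.size
    quad = d @ np.linalg.solve(c, d)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** k * np.linalg.det(c))

def mixture_inner(p, q):
    """Integral of p(x) q(x) for mixtures given as (weights, means, covs)."""
    wp, mp, cp = p
    wq, mq, cq = q
    return sum(a * b * gauss_overlap(m1, c1, m2, c2)
               for a, m1, c1 in zip(wp, mp, cp)
               for b, m2, c2 in zip(wq, mq, cq))

def l2_correlation(p, q):
    """Assumed form of the similarity: normalized L2 inner product in (0, 1]."""
    return mixture_inner(p, q) / np.sqrt(mixture_inner(p, p) * mixture_inner(q, q))

# Two hypothetical bivariate mixtures (illustrative values only).
p1 = ([0.5, 0.5],
      [[100.0, 120.0], [250.0, 230.0]],
      [[[900.0, 400.0], [400.0, 1600.0]], [[1600.0, 200.0], [200.0, 2500.0]]])
p2 = ([0.7, 0.3],
      [[120.0, 110.0], [260.0, 200.0]],
      [[[1000.0, 300.0], [300.0, 1500.0]], [[1500.0, 100.0], [100.0, 2000.0]]])

print("similarity(p1, p1) =", l2_correlation(p1, p1))
print("similarity(p1, p2) =", l2_correlation(p1, p2))
```

Under this assumed definition the score equals 1 for identical pdfs and decreases toward 0 as the two mixtures cease to overlap, which matches how the numbers in Fig. 8 are interpreted in the text.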

8. Discussion and outlook

In this paper we have proposed a new procedure for describing bivariate LE ensembles as MG pdfs. This procedure produces nonparametric pdfs that are fully data driven and are more flexible in describing the local geometry of LE ensembles than classical parametric pdfs such as Gaussians. Moreover, the procedure offers a vehicle for analyzing uncertainty due to undersampling, error sources, and scaling problems, which are notorious when comparing LEis to LEsat data. We have shown that the conditional pdfs p(LEis|LEsat) can theoretically be useful in novel data assimilation schemes and in spatiotemporal interpolation of bivariate LE data. The essential prerequisite for these applications is the ability to regionalize MG pdfs. The preliminary results in this work have demonstrated that it is feasible to regionalize the pdfs using land cover and vegetation fraction as discriminatory variables.

There are a number of issues that need to be investigated in order to implement the above methodology in hydrologic practice. Additional research is needed to quantify errors in LEis data for various measuring techniques and to see how they influence the pdfs of bivariate LE ensembles. Progress in this direction, for EBBR/ARM-CART sites, is discussed in Stricker and Wójcik (2005, unpublished manuscript). An attractive option would also be to use scintillometric measurements. The scintillation technique is one of the few techniques that can provide LEis fluxes at scales of several kilometers (up to 10 km), making them more comparable to LEsat fluxes (Meijninger et al. 2002). Additional work is further required to fine-tune the regionalization of LE pdfs by investigating the use of other environmental variables and the influence of various uncertainty sources on the pattern of bivariate LE ensembles. As demonstrated in section 7c, one environmental variable to consider here is soil moisture. Finally, the parallel implementation of GEnKF for high-dimensional hydrologic systems is a challenging research problem. If successful, the assimilation of p(LEis|LEsat) into land surface models promises to enhance significantly the quality of water balance estimates over a range of spatial and temporal scales.

Acknowledgments

The first two authors are grateful for the financial support from WIMEK, the Wageningen Institute for Environmental and Climate studies. We also wish to acknowledge the constructive comments of Prof. S. Margulis and anonymous JHM reviewers whose inputs greatly improved the quality of our presentation.

REFERENCES

  • Anderson, J., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2748.

  • Bellman, R., 1961: Adaptive Control Processes: A Guided Tour. Princeton University Press, 265 pp.

  • Bishop, C. M., 1995: Neural Networks for Pattern Recognition. Oxford University Press, 475 pp.

  • Braud, I., 1998: Spatial variability of surface properties and estimation of surface fluxes of a savannah. Agric. For. Meteor., 89, 15–44.

  • Brutsaert, W., 1999: Aspects of bulk atmospheric boundary layer similarity under free-convective conditions. Rev. Geophys., 37, 439–451.

  • Celeux, G., S. Chretien, F. Forbes, and A. Mkhadri, 2001: A componentwise EM algorithm for mixtures. J. Comput. Graph. Stat., 10, 699–712.

  • Evensen, G., 1994: Sequential data assimilation with non-linear quasi geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 143–162.

  • Figueiredo, M., and A. Jain, 2002: Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell., 24, 381–395.

  • Gershenfeld, N., 1992: Dimension measurement on high-dimensional systems. Physica D, 55, 135–154.

  • Hipps, L., and W. Kustas, 2000: Patterns and organisation in evaporation. Spatial Patterns in Catchment Hydrology: Observations and Modelling, R. Grayson and G. Blöschl, Eds., Cambridge University Press, 105–122.

  • Liang, X., D. P. Lettenmaier, and E. F. Wood, 1996: One-dimensional statistical dynamic representation of subgrid spatial variability of precipitation in the two-layer variable infiltration capacity model. J. Geophys. Res., 101, 21 403–21 422.

  • McLachlan, G., and T. Krishnan, 1997: The EM Algorithm and Extensions. Wiley Interscience, 352 pp.

  • McLachlan, G., and D. A. Peel, 2000: Finite Mixture Models. Wiley Interscience, 419 pp.

  • Meijninger, W., A. Green, O. Hartogensis, W. Kohsiek, J. Hoedjes, R. Zuurbier, and H. De Bruin, 2002: Determination of area-averaged water vapour fluxes with large aperture and radio wave scintillometers over a heterogeneous surface: Flevoland field experiment. Bound.-Layer Meteor., 105, 63–83.

  • Miller, D., J. Washburne, and E. Wood, 1995: Eos workshop on land-surface evaporation and transpiration. Earth Obs., 7, 52–56.

  • Nie, D., and Coauthors, 1992: An intercomparison of surface flux measurement systems used during FIFE 1987. J. Geophys. Res., 97 (D17), 18 715–18 724.

  • Scott, D., and W. Szewczyk, 2001: From kernels to mixtures. Technometrics, 43, 323–335.

  • Sharma, A., 2000: Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 3 – A nonparametric probabilistic forecast model. J. Hydrol., 239, 249–258.

  • Sobrino, J., J. el Kharraz, and Z. Li, 2003: Surface temperature and water vapour retrieval from MODIS data. Int. J. Remote Sens., 24, 5161–5182.

  • Su, Z., 2002: The surface energy balance system (SEBS) for estimation of the turbulent heat fluxes. Hydrol. Earth Syst. Sci., 6, 85–99.

  • Su, Z., T. Schmugge, W. Kustas, and W. Massman, 2001: An evaluation of two models for estimation of the roughness height for heat transfer between the land surface and the atmosphere. J. Appl. Meteor., 40, 1933–1951.

  • Torfs, P., and R. Wójcik, 2001: Local probabilistic neural networks in hydrology. Phys. Chem. Earth, 26B, 9–14.

  • Torfs, P., E. van Loon, R. Wójcik, and P. Troch, 2002: Data assimilation by non-parametric local density estimation. Computational Methods in Water Resources, S. Hassanizadeh, R. Schotting, W. Gray, and G. Pinder, Eds., Elsevier, 1355–1362.

  • Wan, Z., Y. Zhang, Q. Zhang, and Z. Li, 2004: Quality assessment and validation of the MODIS global land surface temperature. Int. J. Remote Sens., 25, 261–274.

  • Wood, E., H. Su, M. McCabe, and Z. Su, 2003: Estimating evaporation from remote sensing. Proc. IGARSS ’03, Vol. 2, Toulouse, France, IEEE, 1163–1165.

Fig. 1. 1D and 2D example of MG (in both cases as a linear combination of three components).

Citation: Journal of Hydrometeorology 7, 3; 10.1175/JHM491.1

Fig. 2. Assimilation of observational pdf (ϕn+1) by GEnKF: Monte Carlo estimation of the analysis step.

Fig. 3. Land-cover classification from MODIS in the International Geosphere–Biosphere Program (IGBP) scheme for the ARM/CART region in Oklahoma. Circles represent the distribution of EBBR stations across the region.

Fig. 4. The control run LE data for six sites in the SGP region for the period of 1 Jul–30 Sep 2001.

Fig. 5. The surrogate data for six sites in the SGP region for the period of 1 Jul 2001–30 Sep 2001.

Fig. 6. MG pdfs p(LEis, LEsat) fitted to the control run and the surrogate data in Figs. 4 and 5, respectively. Numbers between corresponding pairs of pdfs indicate CL2(p1;p2) in (11) estimated for these pdfs.

Fig. 7. (left) MG pdfs p(LEis, LEsat) fitted to the surrogate LE data for two grassland sites in the SGP region. (right) A few examples of conditional MG pdfs p(LEis|LEsat). The solid line in the x–y plane represents the conditional expectation (regression curve) whereas the dashed lines represent the standard deviation envelopes.

Fig. 8. Controlling the geometry of bivariate LE ensemble by subtracting an offset from fc in the control run for the E4 site. Numbers displayed in upper-left corner of each scatterplot indicate CL2(p1;p2) in (11) between the pdf fitted to each of the fc altered ensembles and the pdf fitted to the control run ensemble for site E15 in the upper-leftmost panel of Fig. 4.

Fig. 9. Controlling the geometry of bivariate LE ensemble by adding an offset to fc in the control run for the E15 site.

Fig. 10. Evaporative fraction at E15 vs evaporative fraction at E4 grassland sites for the period of 1 Jul–30 Sep 2001.

Table 1. SEBS input for the control run.

Table 2. Analysis of errors in Rnet,sat.

1 For assimilation of LE into land surface models we propose ϕ = p(LEis|LEsat).

Save
  • Anderson, J., and Anderson S. L. , 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127 , 27412748.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bellman, R., 1961: Adaptive Control Processes: A Guided Tour. Princeton University Press, 265 pp.

  • Bishop, M. C., 1995: Neural Networks for Pattern Recognition. Oxford University Press, 475 pp.

  • Braud, I., 1998: Spatial variability of surface properties and estimation of surface fluxes of a savannah. Agric. For. Meteor., 89 , 1544.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Brutsaert, W., 1999: Aspects of bulk atmospheric boundary layer similarity under free-convective conditions. Geophys. Rev., 37 , 439451.

  • Celeux, G., Chretien S. , Forbes F. , and Mkhadri A. , 2001: A componentwise EM algorithm for mixtures. J. Comput. Graph. Stat., 10 , 699712.

    • Search Google Scholar
    • Export Citation
  • Evensen, G., 1994: Sequential data assimilation with non-linear quasi geostrophic model using monte-carlo methods to forecast error statistics. J. Geophys. Res., 99 , 143162.

    • Search Google Scholar
    • Export Citation
  • Figueiredo, M., and Jain A. , 2002: Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell., 24 , 381395.

  • Gershenfeld, N., 1992: Dimension measurement on high-dimensional systems. Physica D, 55 , 135154.

  • Hipps, L., and Kustas W. , 2000: Patterns and organisation in evaporation. Spatial Patterns in Catchment Hydrology-Observations and Modelling, R. Grayson and G. Bloshl, Eds., Cambridge University Press, 105–122.

    • Search Google Scholar
    • Export Citation
  • Liang, X., Lettenmaier D. P. , and Wood E. F. , 1996: One-dimensional statistical dynamic representation of subgrid spatial variability of precipitation in the two-layer variable infiltration capacity model. J. Geophys. Res., 101 , 2140321422.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • McLachlan, G., and Krishnan T. , 1997: The EM Algorithm and Extensions. Wiley Interscience, 352 pp.

  • McLachlan, G., and Peel D. A. , 2000: Finite Mixture Models. Wiley Interscience, 419 pp.

  • Meijninger, W., Green A. , Hartogensis O. , Kohsiek W. , Hoedjes J. , Zuurbier R. , and De Bruin H. , 2002: Determination of area-averaged water vapour fluxes with large aperture and radio wave scintillometers over a heterogeneous surface flevoland field experiment. Bound.-Layer Meteor., 105 , 6383.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Miller, D., Washburne J. , and Wood E. , 1995: Eos workshop on land-surface evaporation and transpiration. Earth Obs., 7 , 5256.

  • Nie, D., and Coauthors, 1992: An intercomparison of surface flux measurement systems used during FIFE 1987. J. Geophys. Res., 97 , D17. 1871518724.

  • Scott, D., and Szewczyk W. , 2001: From kernels to mixtures. Technometrics, 43 , 323335.

  • Sharma, A., 2000: Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 3—A nonparametric probabilistic forecast model. J. Hydrol., 239 , 249258.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Sobrino, J., el Kharraz J. , and Li Z. , 2003: Surface temperature and water vapour retrival from MODIS data. Int. J. Remote Sens., 24 , 51615182.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Su, Z., 2002: The surface energy balance system (SEBS) for estimation of the turbulent heat fluxes. Hydrol. Earth Syst. Sci., 6 , 8599.

  • Su, Z., Schmugge T. , Kustas W. , and Massman W. , 2001: An evaluation of two models for estimation of the roughness height for heat transfer between the land surface and the atmosphere. J. Appl. Meteor., 40 , 19331951.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Torfs, P., and Wójcik R. , 2001: Local probabilistic neural networks in hydrology. Phys. Chem. Earth, 26B , 914.

  • Torfs, P., van Loon E. , Wójcik R. , and Troch P. , 2002: Data assimilation by non-parametric local density estimation. Computational Methods in Water Resources, S. Hassanizadeh, R. Schotting, W. Gray, and G. Pinder, Eds., Elsevier, 1355– 1362.

    • Search Google Scholar
    • Export Citation
  • Wan, Z., Zhang Y. , Zhang Q. , and Li Z. , 2004: Quality assessment and validation of the MODIS global land surface temperature. Int. J. Remote Sens., 25 , 261274.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Wood, E., Su H. , McCabe M. , and Su Z. , 2003: Estimating evaporation from remote sensing. Proc. IGARSS ’03, Vol. 2, Toulouse, France, IEEE, 1163–1165.

  • Fig. 1.

    1D and 2D example of MG (in both cases as a linear combination of three components).

  • Fig. 2.

    Assimilation of observational pdf (ϕn+1) by GEnKF: Monte Carlo estimation of the analysis step.

  • Fig. 3.

    Land-cover classification from MODIS in the International Geosphere–Biosphere Program (IGBP) scheme for the ARM/CART region in Oklahoma. Circles represent the distribution of EBBR stations across the region.

  • Fig. 4.

    The control run LE data for six sites in the SGP region for the period of 1 Jul–30 Sep 2001.

  • Fig. 5.

    The surrogate data for six sites in the SGP region for the period of 1 Jul 2001–30 Sep 2001.

  • Fig. 6.

    MG pdfs p(LEis, LEsat) fitted to the control run and the surrogate data in Figs. 4 and 5, respectively. Numbers between corresponding pairs of pdfs indicate CL2(p1;p2) in (11) estimated for these pdfs.

  • Fig. 7.

    (left) MG pdfs p(LEis, LEsat) fitted to the surrogate LE data for two grassland sites in the SGP region. (right) A few examples of conditional MG pdfs p(LEis|LEsat). The solid line in the x–y plane represents the conditional expectation (regression curve) whereas the dashed lines represent the standard deviation envelopes.

  • Fig. 8.

    Controlling the geometry of bivariate LE ensemble by subtracting an offset from fc in the control run for the E4 site. Numbers displayed in upper-left corner of each scatterplot indicate CL2(p1;p2) in (11) between the pdf fitted to each of the fc altered ensembles and the pdf fitted to the control run ensemble for site E15 in the upper-leftmost panel of Fig. 4.

  • Fig. 9.

    Controlling the geometry of bivariate LE ensemble by adding an offset to fc in the control run for the E15 site.

  • Fig. 10.

    Evaporative fraction at E15 vs evaporative fraction at E4 grassland sites for the period of 1 Jul–30 Sep 2001.

All Time Past Year Past 30 Days
Abstract Views 0 0 0
Full Text Views 455 316 24
PDF Downloads 72 21 1