## 1. Introduction

The energy balance at the land surface may be written as

*R*_{N} = *G* + *H* + *LE*,     (1)

where *R*_{N} is net radiation (W m^{−2}), *G* ground heat flux (W m^{−2}), *H* sensible heat flux (W m^{−2}), and *LE* latent heat flux (W m^{−2}). The available energy (*R*_{N} − *G*) is relatively easy to estimate, since *R*_{N} may be obtained from the balance of incoming and outgoing shortwave and longwave radiation (e.g., Daughtry et al. 1990) and *G* expressed as a fraction of *R*_{N} (de Bruin and Holtslag 1982), with a typical daytime value of about 0.1 × *R*_{N} or less for grass and crops. Therefore, the solution to Eq. (1) is reduced to assessing either *H* or *LE* and calculating the remaining term as the residual.
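The residual calculation can be sketched as follows; the 0.1 fraction for *G* is the typical daytime value quoted above, not a site calibration, and the numerical inputs are invented for illustration.

```python
def latent_heat_residual(r_n, h, g_fraction=0.1):
    """LE (W m^-2) as the residual of Eq. (1): LE = R_N - G - H, with
    G approximated as a daytime fraction of R_N (about 0.1 for grass
    and crops, as noted above)."""
    g = g_fraction * r_n
    return r_n - g - h

# R_N = 500 W m^-2 and a measured H of 150 W m^-2 (invented values)
print(round(latent_heat_residual(500.0, 150.0), 1))  # 300.0
```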

The First ISLSCP Field Experiment (FIFE) provided a test case for attempts to map surface distributions of source and sink strength for sensible heat, latent heat, and CO_{2} based on airborne flux observations (Desjardins et al. 1992). Airborne fluxes showed reasonable agreement with surface observations (Kelly et al. 1992), and the resulting maps showed good correspondence between spatial patterns of energy and CO_{2} fluxes and maps of surface characteristics, such as the excess of radiometric surface temperature over air temperature and greenness (NDVI or the simple ratio of near-IR to red reflectance) (Desjardins et al. 1992; Schuepp et al. 1992; Desjardins et al. 1995). However, it was not possible to derive magnitudes of sensible heat flux densities from meteorological and radiometric surface observations, so that latent heat flux could not be evaluated as a residual of the energy budget [Eq. (1)].

In the aerodynamic method, sensible heat flux is obtained from the bulk transfer relation

*H* = *ρC*_{p}(*T*_{c} − *T*_{a})/*r*_{a},     (2)

with the aerodynamic resistance

*r*_{a} = [ln((*z* − *d*)/*z*_{0h}) − Ψ_{h}]/(*ku**),  *u** = *kU*/[ln((*z* − *d*)/*z*_{0m}) − Ψ_{m}],     (3)

where *ρC*_{p} is air heat capacity (J m^{−3} K^{−1}), *T*_{c} aerodynamic temperature of the canopy surface (K), *T*_{a} air temperature (K) at a reference level *z* (m), *r*_{a} aerodynamic resistance to heat transfer (s m^{−1}), *k* the von Kármán constant (0.4 ± 0.01; Högström 1988), *u** friction velocity (m s^{−1}), *d* zero displacement (m), *z*_{0m} and *z*_{0h} roughness lengths for momentum and heat (m), *U* horizontal wind speed (m s^{−1}) at the reference level, and Ψ_{h} and Ψ_{m} stability corrections for heat and momentum (Businger et al. 1971; Högström 1988).
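A minimal sketch of the aerodynamic method, assuming neutral stability (Ψ_{h} = Ψ_{m} = 0); the surface parameters *d*, *z*_{0m}, *z*_{0h} and the heat capacity value are illustrative placeholders, not FIFE site values.

```python
import math

def sensible_heat_aerodynamic(t_c, t_a, u, z=2.0, d=0.05, z0m=0.01,
                              z0h=0.001, rho_cp=1200.0, k=0.4):
    """Sensible heat flux H (W m^-2) from Eqs. (2)-(3), assuming neutral
    stability (Psi_h = Psi_m = 0). Surface parameters d, z0m, z0h and
    the heat capacity rho_cp are illustrative, not FIFE values."""
    u_star = k * u / math.log((z - d) / z0m)       # friction velocity
    r_a = math.log((z - d) / z0h) / (k * u_star)   # aerodynamic resistance
    return rho_cp * (t_c - t_a) / r_a

# Canopy 3 K warmer than the air, 3 m/s wind at the 2-m reference level
print(round(sensible_heat_aerodynamic(t_c=303.0, t_a=300.0, u=3.0), 1))
```

The sign convention follows Eq. (2): a canopy warmer than the air gives a positive (upward) flux.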

The aerodynamic method is appropriate for extended fields with short canopies and under conditions that are not highly stable. It has the potential to be used at various scales, and each of its constituent terms may be obtained from a different platform. These characteristics make it an appealing candidate for a multitude of applications, such as water resources management for grasslands and most crops, even though the difficulty in defining an effective aerodynamic surface temperature for canopy (*T*_{c}) is a major obstacle to its practical use. As an intuitively justified alternative, the radiometric temperature of a vegetated surface, *T*_{s}, has been used as a substitute for aerodynamic temperature (e.g., Hatfield et al. 1984; Reginato et al. 1985; Kustas et al. 1989; Kalma and Jupp 1990; Kohsiek et al. 1993; Humes et al. 1994; Cleugh and Dunin 1995). However, comparison of *T*_{c} values derived from sensible heat observations by eddy correlation or Bowen ratio method on the basis of Eq. (2), and observations of *T*_{s} by infrared thermometers (IRTs) often show significant differences, especially for sparse canopies (e.g., Choudhury et al. 1986; Hall et al. 1992; Cleugh and Dunin 1995). This is due to inclusion of a varying fraction of soil within the field of view of the IRT, which may be warmer or colder than the canopy elements coupled thermally and aerodynamically to ambient airflow, depending on soil moisture and degree of shading.

Attempts have been made to resolve the difference between *T*_{s} and *T*_{c} by adding an adjustment term to the aerodynamic resistance *r*_{a} in Eq. (2) (Lhomme et al. 1992), such as the “excess resistance” *B*^{−1}*u**^{−1} (Stewart et al. 1994), where *B*^{−1} quantifies the difference between heat and momentum transfer from canopies, and is defined as 1/*k*[ln(*z*_{0m}/*z*_{0h})] (Chamberlain 1968). The underlying assumption in this method is that canopy and soil are coupled as source or sink, which means that *T*_{s} and *T*_{c} should be both either higher or lower than *T*_{a}. However, occasional observations of *T*_{c} > *T*_{a} > *T*_{s} cast a serious doubt on the validity of this assumption, and suggest that the link between *T*_{s} and *T*_{c} may be too complex to be established solely by adjusting the aerodynamic resistance (Kalma and Jupp 1990).

For these reasons, our study examined the feasibility of developing a neural network model that incorporates the physics of the aerodynamic method, along with other physical constraints, to estimate sensible heat flux, based on routinely measured meteorological variables and variables amenable to remote sensing. Of particular interest among these was the radiometric surface temperature, which was used as a substitute for aerodynamic temperature. This initial study uses tower-based, near-surface observations because they provide full diurnal cycles with their associated variability in magnitude and direction of flux, unlike airborne observations which are restricted to much narrower temporal windows.

## 2. Methodology

### a. Basic principles of neural networks

Neural networks (NNs) are a relatively new generation of information processing systems, inspired by the learning and memorizing mechanisms of the brain, and its distributed, massively parallel processing architecture.

The building block of a NN is a simple processing element (PE), a crude mathematical analog to the biological neuron. The core of a PE is a differentiable, nonlinear function of choice *φ,* called a transfer function, whose argument is the dot product of an input vector **I** and a weight vector **W**, offset by a threshold *θ* (Fig. 1) to partially mimic the thresholding characteristic of a neuron, which means that a neuron is only triggered when the level of chemical signals it receives exceeds a certain level.
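A single PE can be sketched as follows, using a hyperbolic tangent as the transfer function *φ* (the choice made later in this study for the hidden layer); the input, weight, and threshold values are invented for illustration.

```python
import math

def processing_element(inputs, weights, theta):
    """One PE: the transfer function (here tanh) applied to the dot
    product of input vector I and weight vector W, offset by the
    threshold theta."""
    activation = sum(i * w for i, w in zip(inputs, weights)) - theta
    return math.tanh(activation)

y = processing_element([0.5, -1.0, 2.0], [0.3, 0.2, 0.1], theta=0.05)
print(round(y, 4))  # 0.0997
```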

The weight vector is analogous to synaptic gaps, junctions which control the amount of information received by a neuron. Loosely speaking, synaptic gaps in a certain region of the brain reach their individual optimum size when we learn something. We forget what we learned when the synaptic gaps change their size, for example, due to lack of practice. Therefore, one can say that synaptic gaps constitute what we call memory, where knowledge is stored.

A typical NN consists of several interconnected PEs arranged in three layers: 1) the input layer, where the network receives information; 2) the hidden layer, where this information is processed; and 3) the output layer, for the NN’s response to this information (Fig. 2). NNs resemble the brain in that they acquire experiential knowledge through learning, store it in the weights, and recall it later. Here learning is defined as a process to adjust the weights for optimum knowledge acquisition, and is achieved by applying a learning rule of choice to extract enough information from a set of input–output examples (training set) to simulate the unknown function that maps the input vector to the output vector. One of the most popular learning rules that has successfully been used in a variety of fields is the backpropagation method, a first-order approximation of the steepest descent technique applied to multidimensional error surfaces (Haykin 1994).

To train a NN, its weights are initialized randomly. Training examples, consisting of an input vector and the “correct,” corresponding output, are then picked at random from a “training set” of observations. The network processes the input vector and evaluates the error between its output and the correct one. A learning rule of choice, for example, the backpropagation method, is then used to minimize the error by adjusting the weight values, followed by presenting the network with another randomly picked training example. This training cycle is continued until either the error decreases to a desired limit or the network shows no more improvement. It should be mentioned that weight adjustments may be done after a specified number of training examples (epoch) to minimize the cumulative error.
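The training cycle just described might be sketched as follows, with a single linear PE standing in for the full network so that the delta rule suffices; the learning rate, epoch size, and toy dataset are illustrative assumptions.

```python
import random

def train(examples, n_inputs, lr=0.05, epoch_size=15, max_examples=10_000):
    """Training cycle sketch: pick examples at random, accumulate the
    error gradient over one epoch, then adjust the weights (delta rule).
    A single linear PE stands in for the full network."""
    w = [random.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    grad = [0.0] * n_inputs
    for n in range(1, max_examples + 1):
        x, target = random.choice(examples)
        out = sum(wi * xi for wi, xi in zip(w, x))
        err = target - out                      # raw error
        for i in range(n_inputs):
            grad[i] += err * x[i]
        if n % epoch_size == 0:                 # update once per epoch
            w = [wi + lr * g / epoch_size for wi, g in zip(w, grad)]
            grad = [0.0] * n_inputs
    return w

random.seed(0)
# Toy mapping with "true" weights [2.0, -1.0]
data = [((u, t), 2.0 * u - 1.0 * t)
        for u in (0.0, 0.5, 1.0) for t in (0.0, 0.5, 1.0)]
w = train(data, n_inputs=2)
print([round(wi, 2) for wi in w])               # close to [2.0, -1.0]
```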

The main advantages of neural networks are that 1) they are capable of generalization; 2) they are good candidates for solving nonlinear problems, as the PE is a nonlinear processor by definition (Haykin 1994); 3) the parallel, distributed structure of neural networks gives them a certain degree of tolerance for input noise; and 4) they can use almost any input that is relevant to the output (NeuralWare 1993a).

One of the main disadvantages of neural networks is that due to architectural complexity, their theoretical analysis is difficult, so that currently virtually no design standards exist, and configuration parameters must be determined by trial and error (Haykin 1994). Also, the training process is data intensive and may prove computationally expensive for complex problems.

### b. Data

Twenty-six datasets, each consisting of synchronous ground measurements of meteorological variables and sensible heat flux collected in five Intensive Field Campaigns (IFCs) conducted in FIFE were used to develop and validate six neural networks.

The 15 km × 15 km FIFE site, with a midcontinental climate, characterized by strong seasonal climatic forcing (Sellers et al. 1992), has a fairly complex topography, with rms deviations in vertical excursions of the order of 35 m. The main vegetation is tallgrass prairie, with a growing season from mid-March to mid-October (Kanemasu et al. 1992). About one-third of the area is managed as a long-term ecological reserve (the Konza Prairie), with the rest under private management for grazing (Strebel et al. 1994).

Four IFCs in 1987 were conducted to capture the cardinal phenological stages of the vegetation: IFC-1 green-up (26 May–6 June), IFC-2 peak greenness (25 June–11 July), IFC-3 dry-down (6–21 August), and IFC-4 senescence (5–16 October). However, the important transitional dry-down phase was missed, as the first three IFCs were unusually wet, and the fourth completely dry, so that a fifth IFC was conducted in 1989 (24 July–12 August) to fill the gap. Hereafter, the term FIFE-87 will refer to the first four IFCs collectively, and FIFE-89 will represent IFC-5.

Figure 3 shows the surface flux stations (SFSs) and the automated meteorological stations (AMSs) within the FIFE site boundary in 1987 and 1989 that provided the data used in this study. They had to be co-located in the sense that the AMS observations could be considered representative for conditions within the flux footprint area of the tower measurements—that is, the upwind surface area for which the flux measurements can be considered to be representative [see e.g., Schuepp et al. (1990) or Horst and Weil (1994) for footprint estimates]. Both the AMSs and SFSs were to provide half-hourly averages of their respective variables at 15 and 45 min past the hour, with 48 records in 24 h. However, due to down periods, the total number of available records for each station in each IFC was usually less than maximum, except for a few stations where data were obtained beyond the official observation periods due to individual observer initiative.

Since varying cloud cover was considered as a natural source of variability in sensible heat flux, available data from all the days in each IFC were used. Missing records were not replaced by artificial data because most of them were in time blocks of more than a few hours rather than in singularities. All datasets were visually examined for spikes, which was easier for AMS data than for the noisier sensible heat flux records, and the corresponding records excluded from the analysis (⩽3% of records). Sellers (Strebel et al. 1994) used a series of spike detection tests, and flagged dubious records in FIFE-87 data, which were taken into consideration when deciding whether or not a record should be removed. Also, all Bowen ratio data were checked for energy imbalances to help detect spikes.

IFC-5 was chosen for training networks for the following reasons: First, the greater spatial variability in surface flux density of IFC-5 compared to other IFCs made it a better representative of the norm, and more challenging from a modeling perspective. Second, IFC-5 straddled the central part of the growing season, was the longest IFC, and, therefore, provided the most extensive set of training data.

IFC-5 consisted of six datasets: two from station 2133, which was equipped with both Bowen ratio and eddy correlation instrumentation, and one each from stations 4268, 4439, 6912, and 8739 (see Fig. 3). Six networks—one for each dataset—were developed. Each network is referred to by the ID number of the station that furnished the data, followed by the suffix BR (Bowen ratio) or EC (eddy correlation) according to station type. Hence, the six networks are 2133BR, 2133EC, 4268BR, 4439EC, 6912BR, and 8739EC.

The dataset for each network was shuffled and partitioned into a “training set” and a “test set” of roughly equal size. Each network was presented with 1000 training examples randomly picked from the training set. Then it was tested with the test set before being presented with another 1000 examples. The training would terminate if either the rms of the test set did not improve after 100 consecutive tests or the total number of training examples presented to the network reached 10^{6}. This training strategy is one of the recommended methods to reduce the possibility of overtraining, a situation in which a network performs well on its training data, but poorly on independent test data (Haykin 1994; NeuralWare 1993b).
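The stopping strategy can be sketched as a harness around any trainer; `train_fn` and `test_rmse_fn` are hypothetical callables standing in for the network's training and test-set evaluation routines.

```python
def train_with_early_stopping(train_fn, test_rmse_fn, max_examples=1_000_000,
                              block=1000, patience=100):
    """Stopping strategy sketch: present `block` randomly picked training
    examples, evaluate rms error on the test set, and stop when the test
    rms has not improved for `patience` consecutive tests or the total
    number of presented examples reaches `max_examples`."""
    best, stale, presented = float("inf"), 0, 0
    while presented < max_examples:
        train_fn(block)                  # 1000 more training examples
        presented += block
        rmse = test_rmse_fn()
        if rmse < best:
            best, stale = rmse, 0
        else:
            stale += 1
            if stale >= patience:        # 100 tests without improvement
                break
    return best, presented

# Toy check: test rms improves five times, then plateaus
errors = iter([5.0, 4.0, 3.0, 2.0, 1.0] + [1.0] * 200)
best, n = train_with_early_stopping(lambda k: None, lambda: next(errors))
print(best, n)  # 1.0 105000
```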

### c. Network configuration

The NN software package (NeuralWorks Explorer 5.0, i.e., the educational edition of NeuralWorks Professional II/PLUS by NeuralWare, Inc., Pittsburgh, PA) was run on an IBM-compatible PC. The learning rule chosen for our study was a variation of the backpropagation algorithm called the extended delta-bar-delta rule (Minai and Williams 1990; NeuralWare 1993a). The optimum number of hidden PEs, that is, PEs in the hidden layer, was determined by a pruning technique (NeuralWare 1993c) to be nine. A hyperbolic tangent transfer function was used for the PEs in the hidden layer and a linear transfer function for the PE in the output layer. The epoch size, the number of training examples between weight updates (NeuralWare 1993b), was varied between 15 and 25.
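A forward pass through this configuration (nine tanh hidden PEs feeding one linear output PE) might look as follows; the weights are untrained random placeholders and the input values are invented, so the printed output is not a flux estimate.

```python
import math
import random

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass through the configuration described above: nine
    hidden PEs with a hyperbolic tangent transfer function feeding a
    single linear output PE."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

random.seed(1)
n_in, n_hid = 4, 9                  # four inputs (see section 2d), nine hidden PEs
w_hidden = [[random.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
b_hidden = [0.0] * n_hid
w_out = [random.gauss(0.0, 0.5) for _ in range(n_hid)]
x = [2.0, 6.0, 450.0, 12.5]         # illustrative half-hourly input vector
y = forward(x, w_hidden, b_hidden, w_out, b_out=0.0)
print(round(y, 3))
```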

### d. Input vector

Choosing the elements of the input vector is the most crucial step in developing a neural network. In our study it is based on the following arguments.

*H* is a function of aerodynamic temperature *T*_{c}, air temperature *T*_{a}, wind velocity *U,* reference height *z,* zero displacement *d,* momentum and heat roughness lengths *z*_{0m} and *z*_{0h}, and net radiation *R*_{N}. It is also an implicit function of time *t.* A typical sensible heat flux time series shows a well-defined diurnal cycle, with a high autocorrelation at both diurnal and seasonal scales, so that at any time of the day the observed value of *H* might be close to its value on the preceding or following day at the same time, or at least might have the same sign. Also, changes in solar angle are expected to lead to time-dependent differences in the relative heating of soil and canopy. Hence, in general, we would expect

*H* = *f*(*T*_{s}, *T*_{a}, *U, z, d, z*_{0m}, *z*_{0h}, *R*_{N}, *t*),     (5)

where *T*_{c} has been replaced by the easily observed radiometric surface temperature *T*_{s} (as previously discussed), measured by an Everest 4000 IRT mounted at 2 m, with zenith angle = 0 and field of view = 15°.

With a fixed reference height, *z* can be left out of Eq. (5). Furthermore, assuming relative homogeneity at each station, and bearing in mind that the duration of IFC-5 (20 days) was not long enough for significant morphological changes to take place, *d,* *z*_{0m}, and *z*_{0h} may be considered as approximately constant within the training set and left out of Eq. (5), reducing it to

*H* = *f*(*T*_{s}, *T*_{a}, *U, R*_{N}, *t*).     (6)

Results may be further improved by including supplementary physical information in the input vector (Al-Mashouq and Reed 1991). This can be achieved by combining the arguments of *f* in a physically meaningful way, such as in using the temperature difference Δ*T* = *T*_{s} − *T*_{a} as a single input field to represent the thermal forcing on *H.* We also note that, since *u** is proportional to *U,* Eq. (2) essentially reduces to *H* ∝ *U* × Δ*T.* This suggests that combining *U, T*_{s}, and *T*_{a} into a single variable *U* × Δ*T* may be beneficial, which is particularly interesting since Kustas et al. (1989) have linked *U* × Δ*T* to the added resistance to heat transfer, *kB*^{−1}. Based on a least squares regression analysis, they state that, despite the theoretical and experimental evidence suggesting that *kB*^{−1} for vegetative surfaces can be treated as constant, it may be a function of *T*_{s} for partial canopy cover under arid conditions, such that *kB*^{−1} = *cU*Δ*T,* where *c* is an empirically determined coefficient.

A sensitivity analysis was performed to determine the relative advantages of input fields using the combined forms Δ*T* and *U* × Δ*T,* as opposed to the individual forms *U, T*_{s}, and *T*_{a}. The relative importance of each input field based on its contribution to the overall performance of the network was evaluated by a backward-elimination process, in which one starts with an input vector containing all the candidate variables and uses a numerical measure of choice to prune the less important ones. The measure used here was a modified version of the “explain net” feature in NeuralWare Explorer (NeuralWare 1993c), which is based on the magnitude of output variation corresponding to a certain amount of change in an input candidate: the bigger the output variation, the more important that candidate.
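The backward-elimination idea can be sketched as follows; the perturbation size, threshold, and toy model are assumptions for illustration, not the actual internals of the "explain net" feature.

```python
def backward_eliminate(model, x0, names, delta=0.1, threshold=0.05):
    """Backward-elimination sketch: perturb one candidate input at a
    time, record the resulting output variation (the sensitivity
    measure), and prune candidates whose effect falls below a
    threshold. `model` is any callable mapping an input list to a
    scalar output."""
    kept = list(range(len(x0)))
    while len(kept) > 1:
        base = model(x0)
        scores = {}
        for i in kept:
            x = list(x0)
            x[i] += delta
            scores[i] = abs(model(x) - base)   # output variation
        weakest = min(scores, key=scores.get)
        if scores[weakest] >= threshold:
            break                              # all remaining inputs matter
        kept.remove(weakest)
    return [names[i] for i in kept]

# Toy model in which the third input is nearly irrelevant
toy = lambda x: 2.0 * x[0] + x[1] * x[0] + 0.001 * x[2]
print(backward_eliminate(toy, [1.0, 2.0, 3.0], ["dT", "U*dT", "noise"]))
# ['dT', 'U*dT']
```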

The backward-elimination procedure resulted in an input vector that consisted of Δ*T, U* × Δ*T, R*_{N}, and *t.* One might argue that in the presence of *U* × Δ*T,* Δ*T* would be redundant, but its exclusion deteriorated network performance significantly. To investigate this matter further, three models were developed: one with all four input fields, one without *U* × Δ*T,* and one without Δ*T.* These models were tested with all the available data from other IFCs and stations. Overall, the model including both Δ*T* and *U* × Δ*T* outperformed the other two, confirming the backward-elimination result. This suggests that multiplying Δ*T* by *U* creates a new variable that carries physical information separate from—and complementary to—that of the variables *U* or Δ*T* by themselves.
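Assembling the selected input vector is then straightforward; the sample values below are invented for illustration.

```python
def input_vector(t_s, t_a, u, r_n, t):
    """Assemble the four input fields retained by backward elimination:
    dT = T_s - T_a, U*dT, R_N, and time of day t."""
    d_t = t_s - t_a
    return [d_t, u * d_t, r_n, t]

# Invented half-hourly sample: T_s = 305 K, T_a = 302 K, U = 4 m/s
print(input_vector(305.0, 302.0, 4.0, 480.0, 13.25))
# [3.0, 12.0, 480.0, 13.25]
```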

It should be pointed out that for backward-elimination to be meaningful, competition should be only between those input fields whose importance or format of presentation is under question. In our study *R*_{N} and *t* were always included in the input vector, and competition was only between *U, T*_{s}, *T*_{a}, Δ*T,* and *U* × Δ*T.* Otherwise, in spite of the physical evidence supporting *t,* it would have been forced out by variables such as *T*_{s} or Δ*T,* which are more highly correlated to *H.*

### e. Model evaluation method

Apart from the usual quantitative measures used in evaluating the performance of a model, our study used two additional ones: 1) Willmott’s index of agreement and 2) the percentage of systematic error.

Willmott’s index of agreement *q* is defined as (Willmott 1982)

*q* = 1 − Σ(*M*_{i} − *O*_{i})^{2}/Σ(|*M*_{i} − *O̅*| + |*O*_{i} − *O̅*|)^{2},

where *M* is model output, *O* is observation, and the overbar denotes the mean. Higher *q*’s mean better performance. In order to have four significant digits and maintain readability, in this study *q* was replaced by *Q* = *q* × 100 (0 ⩽ *Q* ⩽ 100).

The percentage of systematic error (PSE) is defined as

PSE = [(1/*N*)Σ(*M*^{R}_{i} − *O*_{i})^{2}/mse] × 100,

where *M*^{R}_{i} = *a* + *bO*_{i} is the least squares regression estimate of the model output, *N* is the number of observations, and mse stands for mean square error. Note that the numerator is the systematic error (Willmott 1982).
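Both measures can be computed directly from paired model estimates and observations, as sketched below under the Willmott (1982) definitions; the sample values are invented.

```python
def willmott_q(model, obs):
    """Willmott's (1982) index of agreement, scaled to Q = q * 100."""
    o_bar = sum(obs) / len(obs)
    num = sum((m - o) ** 2 for m, o in zip(model, obs))
    den = sum((abs(m - o_bar) + abs(o - o_bar)) ** 2
              for m, o in zip(model, obs))
    return 100.0 * (1.0 - num / den)

def pse(model, obs):
    """Percentage of systematic error: the systematic mse (from the least
    squares regression of model output on observation) over total mse."""
    n = len(obs)
    o_bar, m_bar = sum(obs) / n, sum(model) / n
    b = (sum(m * o for m, o in zip(model, obs)) - n * m_bar * o_bar) / \
        (sum(o * o for o in obs) - n * o_bar ** 2)
    a = m_bar - b * o_bar
    mse = sum((m - o) ** 2 for m, o in zip(model, obs)) / n
    mse_s = sum((a + b * o - o) ** 2 for o in obs) / n
    return 100.0 * mse_s / mse

obs = [50.0, 120.0, 200.0, 310.0, 400.0]   # invented observations (W m^-2)
mod = [60.0, 115.0, 190.0, 300.0, 420.0]   # invented model estimates
print(round(willmott_q(mod, obs), 1), round(pse(mod, obs), 1))
```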

## 3. Results and discussion

### a. Level I model validation

The model was first evaluated on its test set, that is, on data not seen by a network during training, but from the same space–time coordinates as the training set (Table 1). This stage, hereafter called Level I, can be thought of as network interpolation, since the test set has more or less the same range as the training set.

Figure 4 shows the best and worst networks according to Willmott’s index of agreement *Q.* Even for the worst network (Fig. 4b), estimations are mostly within the confidence bands for sensible heat flux measurements in FIFE. Moreover, the estimations have a fairly random distribution within the confidence bands, indicating a lack of significant bias. This means good performance for the networks in general. The quantitative measures in Table 1 support the graphical interpretation. Besides *Q,* the agreement between the mean and standard deviation of the network estimations and those of the observations reflects the generally good performance of the networks. The relatively low percentage of systematic errors (PSE) confirms a lack of significant bias.

Network performance at Level I showed a mild tendency toward underestimation at the higher, and overestimation at the lower range of the sensible heat flux values; in all cases the least squares regression slope *b* is less than one, and the least squares regression intercept *a* is greater than zero. An attempt was made to correct this problem with “error tables” (NeuralWare 1993c). An error table is a 10 × 10 matrix superimposed on a Cartesian coordinate system, where the *x* axis denotes desired output and the *y* axis actual network output (Fig. 5), so that the principal diagonal corresponds to error-free performance. The difference between an ordinary network and one using an error table is that the latter multiplies the raw error (desired output minus actual output) by a bias factor (matrix element) looked up in the error table. Therefore, at least in theory, one may correct an ordinary network biased toward certain ranges of input signals by applying an error table whose matrix elements counterbalance the bias. The choice of matrix elements is a matter of trial and error. In our study several error tables were tried, of which one is shown in Fig. 5. Their common feature was that the value of an element was proportional to its distance from the principal diagonal to discourage the network from overestimation or underestimation.
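The error-table lookup might be sketched as follows; the output range used for binning and the specific bias factors are illustrative assumptions, not the values of the table actually tried here.

```python
def error_table_factor(desired, actual, table, lo=-100.0, hi=500.0):
    """Look up the bias factor for one training example: the x axis bins
    the desired output, the y axis the actual network output, and the
    raw error is multiplied by the factor in the matching cell."""
    def bin10(v):                       # map a flux value to one of 10 bins
        i = int((v - lo) / (hi - lo) * 10)
        return min(max(i, 0), 9)
    return table[bin10(actual)][bin10(desired)]

# Elements grow with distance from the principal diagonal, penalizing
# over- and underestimation more strongly
table = [[1.0 + 0.5 * abs(r - c) for c in range(10)] for r in range(10)]

desired, actual = 350.0, 200.0
weighted = (desired - actual) * error_table_factor(desired, actual, table)
print(weighted)  # 300.0
```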

The use of error tables in our study yielded neither significant nor systematic improvement in model performance, most likely because training examples were presented in “epochs” (number of training cycles between weight updates) of typically between 15 and 25. Using epochs results in accumulated error, which would prevent the NN from using the error table effectively to link bias factors (matrix elements) to individual input signals, or even a number of similar input signals, since training examples within an epoch are picked randomly. But even NNs that were presented with training examples one at a time (i.e., epoch = 1) and used error tables performed worse than networks presented with training examples at epochs of 15–25 with or without error tables. This suggests that possible gains due to error tables in the form of selective (regional) error reduction must have been outweighed by the loss due to an inappropriate choice of epoch size (i.e., one), which inflated the general (global) error.

### b. Level II model validation

Level II validation assessed how well a network with good interpolation capabilities could extrapolate to conditions other than those it was trained with. The two networks with the best interpolation performance, that is, 4439EC and 6912BR, were tested with available data from other sites and/or IFCs. Since the two networks had more or less identical performance, only the results obtained for network 4439EC are presented (Table 2).

Network performance at Level II varied from very good to very poor, depending on the situation to which the model was applied. Not surprisingly, the amount of scatter and bias was much higher than Level I. Whereas in Level I the percentage of systematic errors was quite low, in 32% of the cases in Level II it exceeded 50%, quite apart from the fact that the overall error size (systematic plus random) in Level II was generally larger than in Level I, as seen from usually lower *Q* values. Apart from the PSE and *Q* values, this drop in performance is also evident in the generally large gap between the mean and standard deviation of the network estimations and those of the observations.

Considering network performance over any one station, there was a gradual transition from overestimation to underestimation, or more precisely a decline in the least squares regression coefficients *a* and *b* as one moves from IFC-1 to IFC-4 (Fig. 6). This pattern indicates, again not surprisingly, that some pertinent parameters that vary from one IFC to the other have not been included in the model (underparameterization). One example would be the morphological aspect of the canopy, parameterized in terms of zero displacement and momentum roughness length. Examination of Fig. 6 suggests soil moisture as another such parameter. This may be inferred from the gradual increase in the upper limit of observed sensible heat flux from IFC-1 (wet) to IFC-4 (dry). Since soil water content is high at the beginning of the growing season (IFC-1), most of the available energy is expended on evaporation so that the sensible heat flux is kept low. As the growing season moves toward senescence (IFC-4), soil water content decreases, reducing evaporation, with an associated boost in sensible heat flux. In other words, the network estimate of sensible heat flux is valid only for soil moisture conditions similar to those of its training set.

Underparameterization is usually imposed by experimental constraints or by inadequate data. Clearly, the necessary though not sufficient conditions for an underparameterized network to perform optimally may be formulated as follows: 1) the network should be trained with data from a space–time segment narrow enough so that the pertinent parameters excluded from the model can be assumed to have constant values, and 2) the network should be applied to conditions in which the values of the excluded parameters are similar to those of the training data. It could be claimed that the model developed in this study complies with condition 1 as far as zero displacement and momentum roughness length are concerned. Whether this claim could be extended to include soil moisture will be discussed further below.

A noteworthy ramification of condition 1 is that as long as a model does not include all the variables needed to differentiate between the various space–times constituting a “universe,” we may not be able to develop a universal network even if we have access to ample training data coming from all the space–times constituting that universe. The reason becomes clear if we bear in mind that in an underparameterized model the correct answer to a given input signal may not be unique since the input–output mapping function that the network tries to simulate is ill-defined. This intrinsic ambiguity confuses the network, thereby hampering the learning process. As a crude verification, all the available data (26 datasets) were combined into a single file, shuffled, and divided into a training set and a test set of roughly equal size in order to train and validate a universal network (Fig. 7a and Table 3). This, of course, is a Level I validation on a large, diverse dataset. The performance of the network is poor, especially when compared to the performance of the previous, “local” networks at the same level (Table 1 and Fig. 4). Even at Level II the local network 4439EC on average has almost similar performance to the universal network at Level I (Fig. 7b and Table 3). This shows that an underparameterized network may not be made universal however large the training set, reiterating the ramification stated at the beginning of this paragraph.

Figure 7b also provides a bird’s-eye view of network performance at Level II that conveys two pieces of noteworthy information: 1) the tendency to underestimate in the higher range and overestimate in the lower range of the sensible heat flux values was present at Level II to an even stronger degree than at Level I, and 2) the amount of scatter increased with increasing flux (presumably with increasing instability during the day). Note that these symptoms are also shared by the universal network (Fig. 7a).

### c. Sources of random error

The FIFE experiment was not designed for the type of modeling conducted in this study. The resulting random errors are difficult to quantify. However, a qualitative grasp of their possible impact may be gained by identifying their main sources.

#### 1) Input–output mismatch

The “footprint” of any sensor, namely the area upwind of a sensor where most of the registered flux originates, is dynamic in that its position varies with wind direction and its size with stability (e.g., Horst and Weil 1994). On the other hand, the position of an AMS, where network inputs are measured, is fixed. Therefore, whenever an AMS happens to lie outside the footprint area of its co-located SFS, network inputs may not correspond to flux measurements. This input–output mismatch is double edged in confusing the network and preventing optimum learning during the training phase, and in leading to erroneous output in the test phase.

Since the predominant wind in the FIFE site was from the southwest (Fig. 3), input–output mismatch could have been minimized if each AMS were located not more than 200 m SW of its co-located SFS, that is, generally within its footprint area. Figure 3 shows that this was not the case. Moreover, the relative position of AMS to SFS varied from station to station, thus amplifying the problem at Level II, where intersite tests were conducted.

It should be mentioned that even in case of perfect co-location of AMS and SFS, it could not necessarily be assumed that the surface properties in the viewing field of the AMS IRT would be identical to the average surface properties within the SFS footprint, but we have no data to quantify such potential discrepancies.

#### 2) Measurement method and sensor type

In Table 2 we see considerable intersite variability in mean and standard deviation of observed sensible heat flux values within an IFC. This is mainly due to 1) topographical and morphological differences and/or 2) measurement method and instrument type differences. While the first is beyond our control, the second could have been minimized by using the same measurement method and instrument type in addition to standard setup and sampling procedures. But one of the objectives in FIFE was to compare the eddy correlation method to the Bowen ratio method and to assess differences between sensors. Therefore, not only was there no uniformity of method, but stations using the same method were equipped with different instrument types, and setup and sampling procedures varied from station to station.

The fact that stations 4439 in FIFE-87 and 2133 in FIFE-89 were equipped with both eddy correlation and Bowen ratio instrumentation (see Fig. 3) provided an opportunity for comparing the two methods and evaluating the impact of using different methods on network performance. Figure 8 shows two representative cases: station 4439 in IFC-3 (Fig. 8a) and station 2133 in IFC-5 (Fig. 8b). The discrepancy between the two methods is significantly higher at station 2133. Ironically, the disagreement between the two methods at station 2133 is even higher than the disagreement between either method and its corresponding network (cf. the *Q* value in Fig. 8b with those of 2133BR and 2133EC in Table 1). The additional discrepancy in Fig. 8b can be explained as follows.

First, eddy correlation measurements at station 4439 were systematically not taken at certain times of day. Therefore, Fig. 8a may not reflect the true extent of the difference between the two methods.

Second, the spatial distribution of soil moisture was relatively homogeneous in IFC-1 through IFC-4, with the first three IFCs quite wet and the fourth one very dry. By contrast, IFC-5 was characterized by significant spatial variation in soil moisture. It has been shown that under such heterogeneous conditions two nearby sensors may register different flux measurements (e.g., see Lloyd 1995) since, depending on the degree of heterogeneity, even the footprints of two nearby sensors may differ enough to emit significantly different fluxes. This implies that soil moisture within a station in IFC-5 could not necessarily be treated as constant, and it may explain some of the drop in performance of the networks when they were tested on other stations in IFC-5 at Level II.

Another factor that might have contributed to the discrepancy in Fig. 8b is that the eddy correlation and Bowen ratio setups at station 4439 were under the supervision of the same investigator, whereas at station 2133 each setup was operated by a different investigator.

Even when two stations use the same measurement method, intersensor differences can still cause significant flux variation. For example, a study by Nie et al. (1992) on FIFE reports average differences of up to 6% for net radiation and 10% for Bowen ratio for sensors from the same manufacturer, increasing to 15% (net radiation) and 30% (Bowen ratio) for sensors from different manufacturers.
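To illustrate how such intersensor differences propagate into the flux estimate, consider the standard Bowen ratio energy balance method, in which *H* = *β*(*R*_{N} − *G*)/(1 + *β*). The numbers in the sketch below are illustrative assumptions, not values from Nie et al. (1992).

```python
def bowen_ratio_H(beta, available_energy):
    """Sensible heat flux from the Bowen ratio energy balance method:
    H = beta * (Rn - G) / (1 + beta), with available_energy = Rn - G."""
    return beta * available_energy / (1.0 + beta)

# A 30% intersensor difference in beta (0.50 vs 0.65) at 400 W/m2 of
# available energy changes H by roughly 18%: sensor differences are
# damped by the 1/(1 + beta) factor but far from removed.
H1 = bowen_ratio_H(0.50, 400.0)
H2 = bowen_ratio_H(0.65, 400.0)
relative_change = (H2 - H1) / H1
```

This shows that the 30% Bowen ratio differences reported above translate into flux differences of comparable order, consistent with the intersite flux variability discussed earlier.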

### d. Cross comparison with other conventional schemes

So far the NN model developed here has been compared with two standard methods of measuring sensible heat flux: eddy correlation and Bowen ratio. It would be interesting to compare the model with methods that use standard meteorological observations to estimate sensible heat flux. However, such methods were not used in FIFE. While comparing models across different experimental frameworks is always problematic, the study by Galinski and Thomson (1995, hereafter GT95) merits some attention in this context.

Galinski and Thomson tested three semiempirical schemes, all based on the resistance method (Monteith and Unsworth 1990) and designed for midlatitude, grass-covered surfaces. Hourly estimates of sensible heat flux given by the schemes were compared with eddy correlation observations. The schemes, each originally calibrated on the basis of a unique set of field observations, were not recalibrated for GT95 space–time conditions. This, and the fact that the test was conducted with data spanning about 2½ years, make GT95 comparable to the combined Level II results. However, GT95 results were presented separately for day and night. Therefore, combined Level II data were partitioned for day and night to allow a meaningful cross comparison, which will be done only with the best GT95 cases.

Perhaps the best quantitative measure of performance appropriate for cross comparison is a normalized, bounded index such as Willmott’s index of agreement *Q,* which unfortunately is not widely used [for pitfalls of using inappropriate measures such as the correlation coefficient, see Willmott (1982)]. The next best measure obtainable from GT95 was rms divided by the mean of observations 〈*O*〉, a normalized, though unbounded, measure, which can still reduce the differences due to seasonal and physical differences between space–times. Based on this measure, the scheme by Berkowicz and Prahm (1982) had the best daytime performance: rms/〈*O*〉 = 38.5/48.9 = 0.79; daytime combined Level II results gave rms/〈*O*〉 = 54.93/85.29 = 0.64. The scheme by Holtslag and Van Ulden (1983) had the best nocturnal performance: rms/〈*O*〉 = 13.1/−10.9 = −1.2; nocturnal combined Level II results gave rms/〈*O*〉 = 20.38/−24.33 = −0.84.
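For readers wishing to reproduce these two measures, the following sketch computes Willmott’s (1982) index of agreement and the rms/〈*O*〉 skill from paired observation and model series (variable names are ours):

```python
import numpy as np

def willmott_q(obs, mod):
    """Willmott's (1982) index of agreement, bounded in [0, 1]:
    Q = 1 - sum((M - O)^2) / sum((|M - <O>| + |O - <O>|)^2)."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    o_mean = obs.mean()
    denom = ((np.abs(mod - o_mean) + np.abs(obs - o_mean)) ** 2).sum()
    return 1.0 - ((mod - obs) ** 2).sum() / denom

def skill(obs, mod):
    """Normalized (unbounded) rms/<O> used in the GT95 cross comparison."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    rms = np.sqrt(((mod - obs) ** 2).mean())
    return rms / obs.mean()
```

Note that *Q* = 1 indicates perfect agreement and *Q* = 0 indicates none, whereas rms/〈*O*〉 is unbounded and changes sign with the sign of 〈*O*〉, as in the nocturnal cases above.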

While, according to the magnitude of rms/〈*O*〉, the NN model seems to outperform the best schemes for day and night, it is not clear if the difference in performance is significant. Therefore, it would be unwise to conclude that the NN model is superior to conventional schemes. Perhaps it is only safe to accept the positive results of this cross comparison as an additional indication that NNs may indeed deserve a second look.

### e. Summary and conclusions

This study presents a simple neural network model for estimating sensible heat flux over a midcontinental prairie grassland. The required input data are horizontal wind speed, air temperature, radiometric surface temperature, net radiation, and time.
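The original model was built with commercial back-propagation software (NeuralWare). As a purely illustrative sketch of the idea, the following trains a small one-hidden-layer network on synthetic data with the same five inputs; the architecture, hyperparameters, and synthetic target (a bulk-transfer-like *H* ∝ *u*(*T*_{s} − *T*_{a})) are our assumptions, not the paper’s configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the five inputs: wind speed u, air temperature Ta,
# radiometric surface temperature Ts, net radiation Rn, and time of day t.
n = 500
X = np.column_stack([
    rng.uniform(1, 10, n),     # u  (m/s)
    rng.uniform(280, 305, n),  # Ta (K)
    rng.uniform(280, 315, n),  # Ts (K)
    rng.uniform(0, 700, n),    # Rn (W/m2)
    rng.uniform(0, 24, n),     # t  (h)
])
# Illustrative target mimicking a bulk-transfer form H ~ u * (Ts - Ta) + noise.
y = 5.0 * X[:, 0] * (X[:, 2] - X[:, 1]) + rng.normal(0, 10, n)

# Standardize inputs and target so one learning rate suits all features.
Xs = (X - X.mean(0)) / X.std(0)
ys = (y - y.mean()) / y.std()

# One hidden layer of tanh units, linear output, plain batch gradient descent.
h = 8
W1 = rng.normal(0, 0.5, (5, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.5, h);      b2 = 0.0
lr = 0.05
for _ in range(2000):
    z = np.tanh(Xs @ W1 + b1)              # hidden activations
    pred = z @ W2 + b2                     # network output
    err = pred - ys
    gW2 = z.T @ err / n; gb2 = err.mean()
    dz = np.outer(err, W2) * (1 - z**2)    # back-propagated hidden error
    gW1 = Xs.T @ dz / n; gb1 = dz.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Final fit on the (standardized) training data; mse = 1 would be no better
# than predicting the mean.
pred = np.tanh(Xs @ W1 + b1) @ W2 + b2
mse = float(((pred - ys) ** 2).mean())
```

A real implementation would of course hold out validation data, as the Level I/Level II testing protocol of this study does.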

In general, networks trained on part of the data from a narrow range of space–time coordinates performed very well over the other part (Level I). Performance over data from other sites and times (season or year) varied from good to poor (Level II). This indicated that the model did not include some relevant input data that varied with space and time (underparameterization), such as those parameterizing canopy morphology and soil moisture. Apart from the lack of data, the basis for excluding these parameters was the assumption that they varied little within the narrow window of space and time that furnished the training data and could be treated as constants. The validity of this assumption is attested to by the relative success of testing at Level I. However, the drop in performance from Level I to Level II showed that an underparameterized network is limited in its scope of applicability because it will perform well only when applied to sites and times where the values of the missing parameters were the same as those of the training set. Moreover, it was shown that this limitation could not be overcome by increasing the space and time range of the training set so that it comprised a number of experimental conditions with different values of the missing parameters. In other words, an underparameterized network could not be made universal.

In conclusion, the good results obtained at Level I indicated the potential of neural networks for linking sensible heat flux to routinely measured meteorological variables and variables amenable to remote sensing. The significance of this potential is appreciated when bearing in mind that the controversial surface radiometric temperature was used directly as a surrogate for the aerodynamic temperature. The results at Level II were poorer on average, but not disappointingly so. They conveyed a clear and important guideline for future work: if a neural network is to be designed for estimating sensible heat flux on a large scale, it should incorporate variables that serve as indices for canopy morphology and soil moisture. Recent advances in remote sensing techniques for land-use classification (e.g., Sellers et al. 1995a) and soil moisture estimation (e.g., Engman and Chauhan 1995), an emerging wave of new and improved remote sensing data from the Boreal Ecosystem Atmosphere Study (BOREAS) (Sellers et al. 1995b), and the rapid pace of progress in the field of neural computing all signal great opportunities in this area.

## Acknowledgments

The support of this study through grants from the Natural Sciences and Engineering Research Council of Canada, the Atmospheric Environment Service of Canada, and Agriculture and Agri-Food Canada; the support of the first author by the Iranian Ministry of Culture and Higher Education; and the infrastructure support of NASA are gratefully acknowledged.

## REFERENCES

Al-Mashouq, K. A., and I. S. Reed, 1991: Including hints in training neural nets. *Neural Comput.,* **3,** 418–427.

Berkowicz, R., and L. P. Prahm, 1982: Sensible heat flux estimated from routine meteorological data by the resistance method. *J. Appl. Meteor.,* **21,** 1845–1864.

Businger, J. A., J. C. Wyngaard, Y. Izumi, and E. F. Bradley, 1971: Flux-profile relationships in the atmospheric surface layer. *J. Atmos. Sci.,* **28,** 181–189.

Chamberlain, A. C., 1968: Transport of gases to and from surfaces with bluff and wave-like roughness elements. *Quart. J. Roy. Meteor. Soc.,* **94,** 318–322.

Choudhury, B. J., R. J. Reginato, and S. B. Idso, 1986: An analysis of infrared temperature observations over wheat and calculation of latent heat flux. *Agric. For. Meteor.,* **37,** 75–88.

Cleugh, H. A., and F. X. Dunin, 1995: Modeling sensible heat fluxes from a wheat canopy: An evaluation of the resistance energy balance model. *J. Hydrol.,* **164,** 127–152.

Daughtry, C. S. T., W. P. Kustas, M. S. Moran, P. J. Pinter Jr., R. D. Jackson, P. W. Brown, W. D. Nichols, and L. W. Gay, 1990: Spectral estimates of net radiation and soil heat flux. *Remote Sens. Environ.,* **32,** 111–124.

de Bruin, H. A. R., and A. A. M. Holtslag, 1982: A simple parameterization of the surface fluxes of sensible and latent heat during daytime compared with the Penman–Monteith concept. *J. Appl. Meteor.,* **21,** 1610–1621.

Desjardins, R. L., P. H. Schuepp, J. I. MacPherson, and D. J. Buckley, 1992: Spatial and temporal variations of the fluxes of carbon dioxide and sensible and latent heat over the FIFE site. *J. Geophys. Res.,* **97** (D17), 18467–18475.

——, R. Pelletier, P. H. Schuepp, J. I. MacPherson, H. Hayhoe, and J. Cihlar, 1995: Measurement and prediction of CO_{2} fluxes from aircraft and satellite-based systems. *J. Geophys. Res.,* **100** (D12), 25549–25558.

Engman, E. T., and N. Chauhan, 1995: Status of microwave soil moisture measurements with remote sensing. *Remote Sens. Environ.,* **51,** 189–195.

Galinski, A. E., and D. J. Thomson, 1995: Comparison of three schemes for predicting surface sensible heat flux. *Bound.-Layer Meteor.,* **72,** 345–370.

Hall, F. G., K. F. Huemmrich, S. J. Goetz, P. J. Sellers, and J. E. Nickeson, 1992: Satellite remote sensing of surface energy balance: Success, failures, and unresolved issues in FIFE. *J. Geophys. Res.,* **97,** 19061–19089.

Hatfield, J. L., R. J. Reginato, and S. B. Idso, 1984: Evaluation of canopy temperature–evapotranspiration models over various crops. *Agric. For. Meteor.,* **32,** 41–53.

Haykin, S., 1994: *Neural Networks: A Comprehensive Foundation.* Macmillan, 696 pp.

Högström, U., 1988: Non-dimensional wind and temperature profiles in the atmospheric surface layer: Re-evaluation. *Bound.-Layer Meteor.,* **42,** 55–78.

Holtslag, A. A. M., and A. P. Van Ulden, 1983: Simple estimates of night-time surface fluxes from routine weather data. *J. Climate Appl. Meteor.,* **22,** 517–529.

Horst, T. W., and J. C. Weil, 1994: How far is far enough? The fetch requirements for micrometeorological measurements of surface fluxes. *J. Atmos. Oceanic Technol.,* **11,** 1018–1025.

Humes, K. S., W. P. Kustas, and M. S. Moran, 1994: Use of remote sensing and reference site measurements to estimate instantaneous surface energy balance components over a semiarid rangeland watershed. *Water Resour. Res.,* **30,** 1363–1373.

Kalma, J. D., and D. L. B. Jupp, 1990: Estimating evaporation from pasture using infrared thermometry: Evaluation of a one-layer resistance model. *Agric. For. Meteor.,* **51,** 223–246.

Kanemasu, E. T., S. B. Verma, E. A. Smith, L. J. Fritschen, M. Wesely, R. T. Field, W. P. Kustas, H. Weaver, J. B. Stewart, R. Gurney, G. Panin, and J. B. Moncrieff, 1992: Surface flux measurements in FIFE: An overview. *J. Geophys. Res.,* **97,** 18547–18555.

Kelly, R. D., E. A. Smith, and J. I. MacPherson, 1992: A comparison of surface sensible and latent heat fluxes from aircraft and surface measurements in FIFE 1987. *J. Geophys. Res.,* **97** (D12), 18445–18454.

Kohsiek, W., H. A. R. de Bruin, H. The, and B. Van den Hurk, 1993: Estimation of the sensible heat flux of a semiarid area using surface radiative temperature measurements. *Bound.-Layer Meteor.,* **63,** 213–230.

Kustas, W. P., B. J. Choudhury, M. S. Moran, R. J. Reginato, R. D. Jackson, L. W. Gay, and H. L. Weaver, 1989: Determination of sensible heat flux over sparse canopy using thermal infrared data. *Agric. For. Meteor.,* **44,** 197–216.

Lhomme, J. P., N. Katerji, and J. M. Bertolini, 1992: Estimating sensible heat flux from radiometric temperature over crop canopy. *Bound.-Layer Meteor.,* **61,** 287–300.

Lloyd, C. R., 1995: The effect of heterogeneous terrain on micrometeorological flux measurements: A case study from HAPEX–SAHEL. *Agric. For. Meteor.,* **73,** 209–216.

Minai, A. A., and R. D. Williams, 1990: Acceleration of back-propagation through learning rate and momentum adaptation. *Proc. Int. Joint Conf. on Neural Networks,* Vol. 1, San Diego, CA, IEEE Neural Networks Council, 676–679.

Monteith, J. L., and M. H. Unsworth, 1990: *Principles of Environmental Physics.* 2d ed. Edward Arnold, 291 pp.

NeuralWare, 1993a: *Neural Computing: A Technology Handbook for Professional II/PLUS and NeuralWorks Explorer.* NeuralWare, 329 pp.

——, 1993b: *Using NeuralWorks: A Tutorial for NeuralWorks Professional II/PLUS and NeuralWorks Explorer.* NeuralWare, 158 pp.

——, 1993c: *Reference Guide for Professional II/PLUS and NeuralWorks Explorer.* NeuralWare, 278 pp.

Nie, D., E. T. Kanemasu, L. J. Fritschen, H. L. Weaver, E. A. Smith, S. B. Verma, R. T. Field, and W. P. Kustas, 1992: An intercomparison of surface energy flux measurement systems used during FIFE 1987. *J. Geophys. Res.,* **97,** 18715–18724.

Reginato, R. J., R. D. Jackson, and P. J. Pinter Jr., 1985: Evapotranspiration calculated from remote multispectral and ground station meteorological data. *Remote Sens. Environ.,* **18,** 75–89.

Schuepp, P. H., M. Y. Leclerc, J. I. MacPherson, and R. L. Desjardins, 1990: Footprint prediction for scalar fluxes by analytical solutions of the diffusion equation. *Bound.-Layer Meteor.,* **50,** 355–373.

——, J. I. MacPherson, and R. L. Desjardins, 1992: Adjustment of footprint correction for airborne flux mapping over the FIFE site. *J. Geophys. Res.,* **97** (D17), 18455–18466.

Sellers, P. J., F. G. Hall, G. Asrar, D. E. Strebel, and R. E. Murphy, 1992: An overview of the First International Satellite Land Surface Climatology Project (ISLSCP) Field Experiment (FIFE). *J. Geophys. Res.,* **97,** 18345–18371.

——, B. W. Meeson, F. G. Hall, G. Asrar, R. E. Murphy, R. A. Schiffer, F. P. Bretherton, R. E. Dickinson, R. G. Ellingson, and C. B. Field, 1995a: Remote sensing of the land surface for studies of global change: Models—algorithms—experiments. *Remote Sens. Environ.,* **51,** 3–26.

——, F. Hall, H. Margolis, B. Kelly, D. Baldocchi, G. Hartog, J. Cihlar, M. G. Ryan, B. Goodison, P. Crill, K. Jon Ranson, D. Lettenmaier, and D. E. Wickland, 1995b: The Boreal Ecosystem–Atmosphere Study (BOREAS): An overview and early results from the 1994 field year. *Bull. Amer. Meteor. Soc.,* **76,** 1549–1577.

Stewart, J. B., W. P. Kustas, K. S. Humes, W. D. Nichols, M. S. Moran, and H. A. R. de Bruin, 1994: Sensible heat flux–radiometric surface temperature relationship for eight semiarid areas. *J. Appl. Meteor.,* **33,** 1110–1117.

Strebel, D. E., D. R. Landis, K. F. Huemmrich, and B. W. Meeson, 1994: *Collected Data of the First ISLSCP Field Experiment.* Vol. 1, *Surface Observations and Non-Image Data Sets.* NASA, CD-ROM.

Willmott, C. J., 1982: Some comments on the evaluation of model performance. *Bull. Amer. Meteor. Soc.,* **63,** 1309–1313.

Table 1. Quantitative measures of performance for six networks at Level I. *N*: number of observations; 〈*O*〉: mean of observations; 〈*M*〉: mean of model output; *s*_{O}: standard deviation of observations; *s*_{M}: standard deviation of model output; *a*: least squares regression intercept; *b*: least squares regression slope; rms: root-mean-square error; PSE: percentage of systematic errors; *Q*: Willmott’s index of agreement; rms/〈*O*〉: skill; *R*: correlation coefficient. 〈*O*〉, 〈*M*〉, *s*_{O}, *s*_{M}, and rms are in watts per square meter; the other terms are dimensionless.

Table 2. Quantitative measures of performance for network 4439EC at Level II. *N*: number of observations; 〈*O*〉: mean of observations; 〈*M*〉: mean of model output; *s*_{O}: standard deviation of observations; *s*_{M}: standard deviation of model output; *a*: least squares regression intercept; *b*: least squares regression slope; rms: root-mean-square error; PSE: percentage of systematic errors; *Q*: Willmott’s index of agreement; rms/〈*O*〉: skill; *R*: correlation coefficient. 〈*O*〉, 〈*M*〉, *s*_{O}, *s*_{M}, and rms are in watts per square meter; the other terms are dimensionless.

Table 3. Quantitative measures of performance for the “universal” network at Level I and the “local” network 4439EC at Level II (combined results). *N*: number of observations; 〈*O*〉: mean of observations; 〈*M*〉: mean of model output; *s*_{O}: standard deviation of observations; *s*_{M}: standard deviation of model output; *a*: least squares regression intercept; *b*: least squares regression slope; rms: root-mean-square error; PSE: percentage of systematic errors; *Q*: Willmott’s index of agreement; rms/〈*O*〉: skill; *R*: correlation coefficient. 〈*O*〉, 〈*M*〉, *s*_{O}, *s*_{M}, and rms are in watts per square meter; the other terms are dimensionless.
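The measures listed in these table captions can be reproduced as follows. This is a sketch with our own variable names; the partition of the mean square error into systematic and unsystematic parts follows Willmott (1982), which we assume is the PSE definition used here.

```python
import numpy as np

def performance_measures(obs, mod):
    """Compute the caption's measures for paired observations and model output.
    PSE is the systematic fraction of the mean square error (Willmott 1982),
    expressed as a percentage (an assumption about this paper's usage)."""
    O, M = np.asarray(obs, float), np.asarray(mod, float)
    b, a = np.polyfit(O, M, 1)          # least squares fit M = a + b*O
    M_hat = a + b * O                   # regression estimate of M
    mse = ((M - O) ** 2).mean()
    mse_s = ((M_hat - O) ** 2).mean()   # systematic part of mse
    denom = ((np.abs(M - O.mean()) + np.abs(O - O.mean())) ** 2).sum()
    return {
        "N": O.size, "<O>": O.mean(), "<M>": M.mean(),
        "s_O": O.std(ddof=1), "s_M": M.std(ddof=1),
        "a": a, "b": b,
        "rms": np.sqrt(mse),
        "PSE": 100.0 * mse_s / mse,
        "Q": 1.0 - ((M - O) ** 2).sum() / denom,   # Willmott's index
        "skill": np.sqrt(mse) / O.mean(),          # rms/<O>
        "R": np.corrcoef(O, M)[0, 1],
    }
```

Standard deviations here use the sample (ddof = 1) convention, another assumption about the original tables.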