Understanding the hydrological cycle is one of the major tasks in atmospheric research. This cycle is responsible for the redistribution of water and (latent) energy in the climate system. In addition clouds—one component of this cycle—significantly affect the earth radiation budget (Harrison et al. 1990). Small variations in cloud amount can alter the radiative forcing at the top of the atmosphere and consequently change the atmospheric temperature. Modeling results further suggest that a radiative perturbation due to a CO2 doubling could be balanced by an increase in liquid water path (LWP) from 0.05 to 0.06–0.068 kg m−2 (Slingo 1990). Thus, a prerequisite for a better understanding of climate is the knowledge of the temporal and spatial behavior of the cloud fields.
The aim of this study is to contribute to the observational part, that is, to apply neural networks (NNs) to the retrieval of LWP from spaceborne microwave measurements over the oceans where ground-based measurements are limited.
The microwave radiometer SSM/I (Special Sensor Microwave/Imager) on board the satellites of the Defense Meteorological Satellite Program (DMSP) measures thermally emitted radiation from the earth’s surface and the atmosphere at 19.35, 37.0, and 85.5 GHz, vertically and horizontally polarized, and at 22.235 GHz, vertically polarized only. These observational data can be used to retrieve surface and atmospheric parameters simultaneously, in particular those that are included in the hydrological cycle. (For further details, see Hollinger et al. 1987.) The first DMSP satellite carrying an SSM/I was launched in June 1987. The long time series provided by the SSM/I makes this radiometer attractive for many research purposes: for example, investigation of seasonal and interannual variability of the hydrological parameters (Ferraro et al. 1996).
Several retrieval algorithms for the LWP can be found in the literature. On the one hand, there exist (semi-) statistical algorithms based on regression techniques (Alishouse et al. 1990b; Karstens et al. 1994; Weng and Grody 1994). On the other hand, there are several algorithms of physical origin; that is, they are based on radiative transfer calculations (Greenwald et al. 1993; Liu and Curry 1993; Prigent et al. 1994). The former have advantages due to their simplicity in operational use, and the latter are assumed to perform better, as validation results for six cases with simultaneous aircraft and SSM/I measurements indicate (Cober et al. 1996). A more detailed description of most of the cited algorithms is given by Cober et al. (1996).
As noted by Petty at the SSM/I Algorithm Symposium (see Colton and Poe 1994), retrieval problems arise due to correlations between the retrieved LWP and high humidity values (“cross talk”) and due to “clear-sky noise,” which is accompanied by negative LWPs. However, setting negative LWPs to zero, as is sometimes proposed, biases a retrieval mean proportional to the level of “noise.” Thus, it is desirable to develop LWP algorithms that combine simplicity in applications with accuracy and that show neither cross talk nor clear-sky noise.
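The bias introduced by setting negative LWPs to zero can be illustrated with a small simulation. The sketch below assumes that the clear-sky retrieval noise is Gaussian with zero mean, which is an assumption made here for illustration only; under it, the mean of the clipped values is σ/√(2π), that is, proportional to the noise level σ. The function name and the noise levels are hypothetical.

```python
import math
import random

def clipped_mean_bias(sigma, n=100_000, seed=0):
    """Mean of max(0, x) for zero-mean Gaussian noise x with std sigma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += max(0.0, rng.gauss(0.0, sigma))
    return total / n

# Hypothetical clear-sky noise levels in kg m^-2 (true LWP is zero).
for sigma in (0.01, 0.02, 0.04):
    simulated = clipped_mean_bias(sigma)
    theory = sigma / math.sqrt(2.0 * math.pi)  # analytic mean of a clipped Gaussian
    print(f"sigma={sigma:.2f}  simulated bias={simulated:.4f}  theory={theory:.4f}")
```

Doubling the noise level doubles the bias of the clipped mean, whereas the unclipped mean stays at zero.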
During the past five years the application of NNs has become a valuable tool in meteorology. Among these applications are predictions, for example, on time series data (Elsner and Tsonis 1992; Tang et al. 1994; Hastenrath et al. 1995) and nonlinear principal component analysis of climate data (Sengupta and Boyle 1995). However, most applications can be found in remote sensing, either to classify properties within an image (for an overview, see Atkinson and Tatnall 1997) or to solve inverse problems: for example, to retrieve vertical profiles of temperature (Churnside et al. 1994; Butler and Meredith 1996) and of relative humidity (Cabrera-Mercader and Staelin 1995) from microwave data. With NNs, improvements were found over regression algorithms (Cabrera-Mercader and Staelin 1995; Butler and Meredith 1996). Retrieval algorithms for SSM/I observations on the basis of NNs are also available to infer wind speed (Stogryn et al. 1994; Krasnopolsky et al. 1995) and snow parameters (Tsang et al. 1992). Krasnopolsky et al. (1995) report significant improvements compared to regression algorithms, in particular under cloudy and very cloudy conditions. To our knowledge, however, NNs have not been applied to the retrieval of LWP from SSM/I measurements.
Neural networks have several properties that make them attractive for solving inverse problems. First, they are able to detect and approximate multiple, nonlinear correlations without a priori information about the problem under consideration. Second, they are able to approximate relationships using only the data presented; that is, unlike regression techniques, they do not depend on prescribed model equations. Third, once training has been finished, they are easy to implement, and the computational cost is relatively low.
In this study, the potential of NNs for the development of LWP retrieval algorithms for SSM/I measurements is investigated. In section 2 the data basis used for the calibration and validation is described. The NN methodology is introduced in section 3. The algorithm development and the results are described in sections 4 and 5, and the summary and conclusions are given in section 6.
Data and radiative transfer model
Three types of data are used in this study. The calibration and validation of the LWP algorithms are carried out with pairs of simulated brightness temperatures (Tb’s) for the SSM/I and parameterized LWPs, both calculated from radiosonde observations (RAOBs). An indirect validation is performed with SSM/I observations under clear-sky conditions from the F-10 instrument. The clear-sky cases were defined with collocated IR-Meteosat data.
About 2060 RAOBs over most of the global oceans with corresponding synoptic observations (SYNOPs) are the main data basis in this study. The sources, number of ascents, locations, and time periods for the different subsets are given in Table 1. RAOBs from two regions are missing: no data are available over the Indian Ocean, and except for one cruise, the subtropical South Pacific is not represented at all in our dataset.
The mean and standard deviation of the estimated LWP for all 2060 RAOBs are 0.13 and 0.35 kg m−2, respectively. For 1408 RAOBs (approximately 70%) LWP is equal to 0.0, for 476 cases LWP is between 0.0 and 0.5 kg m−2, and for 176 cases LWP is greater than 0.5 kg m−2.
Both the RAOBs and the calculated profiles of the LWCmod serve as input for the radiative transfer model to compute the Tb’s. Missing wind speed data in the SYNOPs were filled using equally distributed random values between 0 and 14 m s−1. Missing SSTs (sea surface temperatures) were substituted by the surface air temperature. For calculating the microwave radiances at the top of the atmosphere the radiative transfer model developed by Simmer (1994) was used. This one-dimensional monochromatic model solves the polarized radiative transfer equation by successive order of scattering (e.g., Goody and Yung 1989). The absorption and scattering coefficients were calculated using Mie theory. Rain clouds are not included. For the ocean surface, specular reflection is assumed modified by wind-speed-dependent surface roughness by a formulation of Wisler and Hollinger (1977). The influence of foam (reflection) on the oceanic surface emissivity was allowed for according to Stogryn (1972). The simulated Tb’s refer to an incident angle of 53°. Normally distributed noise has been added to the Tb’s with standard deviations equal to the noise-equivalent temperature differentials given by Hollinger et al. (1987).
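The two data-preparation steps described above, filling missing wind speeds and adding radiometer noise to the simulated Tb’s, can be sketched as follows. This is only an illustration: the function names are hypothetical, and the noise levels in `NEDT_K` are placeholders, not the noise-equivalent temperature differentials given by Hollinger et al. (1987).

```python
import random

# Placeholder noise levels in K, one per SSM/I channel. These are NOT the
# values from Hollinger et al. (1987); substitute the published figures.
NEDT_K = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

def fill_missing_wind(wind_speeds, rng, low=0.0, high=14.0):
    """Replace missing wind speeds (None) by uniform random values in m/s."""
    return [rng.uniform(low, high) if v is None else v for v in wind_speeds]

def add_radiometer_noise(tbs, nedt, rng):
    """Add zero-mean Gaussian noise to each channel's brightness temperature."""
    return [tb + rng.gauss(0.0, sigma) for tb, sigma in zip(tbs, nedt)]

rng = random.Random(42)
winds = fill_missing_wind([7.2, None, 11.0, None], rng)

# Hypothetical simulated Tb's (K) for the seven SSM/I channels.
clean_tbs = [190.0, 120.0, 215.0, 205.0, 150.0, 260.0, 240.0]
noisy_tbs = add_radiometer_noise(clean_tbs, NEDT_K, rng)
```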
The clear-sky dataset of microwave observations has been compiled with independent IR data (10.5–12.5 μm) obtained from Meteosat. This dataset comprises 70 000 SSM/I observations (F-10 instrument) taken over the Atlantic Ocean for June 1992.
Two important differences in the use of NN programs, as compared to other optimization problems, need to be mentioned. Conventionally, a minimization is said to have converged if the changes ΔX and/or ΔE have become sufficiently small. This criterion cannot be applied to our problem for the following reason. Any method minimizing E will in the beginning fit the gross features of the data. This is reflected by a large reduction in E in each step. Once a reasonable E is obtained, the method will start to optimize particular properties of the training events (which may also be a few badly measured events). To prevent this “overfitting,” the usual method is to divide the available data into two parts: a training set that is used for the minimization and a test set for the validation. In each iteration step the cost function for the latter is also calculated. The overfitting phase is signaled if the cost function for the training dataset decreases, but the cost function for the test set increases over several subsequent iterations. Independent of the minimization method, any further minimization is meaningless. Therefore, we restart the minimization using the same initial weights and stop the learning process before the overfitting phase starts.
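The stopping rule described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in this study: the function names, the `patience` parameter (how many consecutive iterations count as "several"), and the cost values are all hypothetical.

```python
def overfitting_onset(train_costs, test_costs, patience=3):
    """First iteration of a run of `patience` consecutive steps in which the
    training cost falls while the test cost rises; None if no such run."""
    run = 0
    for i in range(1, len(train_costs)):
        train_down = train_costs[i] < train_costs[i - 1]
        test_up = test_costs[i] > test_costs[i - 1]
        run = run + 1 if (train_down and test_up) else 0
        if run >= patience:
            return i - patience + 1
    return None

def stopping_iteration(test_costs):
    """Iteration with the lowest test cost, where a restarted run would stop."""
    return min(range(len(test_costs)), key=test_costs.__getitem__)

# Hypothetical cost histories: the test cost turns upward after iteration 3.
train = [1.00, 0.60, 0.40, 0.30, 0.25, 0.22, 0.20, 0.19]
test = [1.10, 0.70, 0.50, 0.45, 0.47, 0.50, 0.54, 0.60]

onset = overfitting_onset(train, test)  # iteration where overfitting begins
stop = stopping_iteration(test)         # last iteration before it begins
```

In a rerun from the same initial weights, training would be halted at `stop`, and the weights at that iteration would be kept.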
The second remark concerns the problem of several minima. The minimization has to be repeated many times with different randomly chosen starting values for X. This is of particular importance for the variable metric method. From the distribution of the achieved cost functions one can estimate whether the best value is purely accidental or representative of the dataset.
Algorithm development—Different neural network architectures
To investigate the architecture of NNs for the development of an LWP algorithm, the 2060 available data pairs have been randomly divided into two parts. The first 1030 data pairs were used as a training set, and the remaining as a test set. We found for all of our results that interchanging the role of the training and the test sets did not produce any significant change in the results.
The following questions have been addressed to investigate the best possible architecture.
NN input: Which SSM/I channels are best used?
Number of hidden neurons: How many do we need?
Application: Are the channels always available?
The first and third criteria are common to other retrieval techniques as well. The number of hidden neurons determines the flexibility of the NN: the more neurons are used, the better the training data are reproduced. Nevertheless, too many neurons may result in greater errors for the independent test set. The last criterion, applicability, is motivated by the fact that, for example, the SSM/I measurements at T85V on board the F-8 satellite are not reliable after 1989 due to high instrumental noise (which hampers a long homogeneous time series) and that corrections for systematic deviations between measured and simulated Tb’s (Simmer 1994) are not available for T85V.
Three different input combinations are investigated: all SSM/I channels except the T85V one (ALL/85V), the low-frequency SSM/I channels (19, 22, 37 GHz) for both polarizations (LOW), and a two-channel algorithm—T22V and T37V (TWO). To estimate the influence of different local minima, 40 random searches from different starting points in weight space were carried out for zero to seven hidden neurons and for each of the three input combinations. In each run a maximum of 2000 iterations was performed and the best solution X for the test set was saved for further investigation. Note that, typically, fewer than 2000 iterations were sufficient to reach convergence.
The rms errors for ALL/85V, LOW, and TWO depend on the number of hidden neurons (Fig. 2). Differences are obvious except between zero and one neuron. With more than five hidden neurons, the improvement shows some kind of saturation, which is obvious from the 10th percentile (dotted in Fig. 2). Note that the 10th percentile separates the four best solutions with respect to the rms error. The spread of the curves in Fig. 2 indicates that the range of the rms errors is wide for ALL/85V and LOW and very narrow for TWO for more than one hidden neuron. This indicates that a few of the runs were stuck in local minima of the cost function, which is true especially for the former two algorithms. This result confirms the importance of multiple random searches. Despite the narrow range of the rms errors in TWO, the accuracies are better for ALL/85V and LOW, which can be seen from the 10th percentile.
The best results of all 40 realizations are summarized in Table 2. In addition, the number of weights is given to show the complexity of the method. The differences between ALL/85V and LOW are relatively small. Note that the best results agree well with the 10th percentile in Fig. 2, which enhances confidence that the best values are not purely accidental.
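The comparison between the best run and the 10th percentile of all random searches can be sketched as follows. The function name is hypothetical, and the rms values are invented for illustration; they are not the results of Table 2 or Fig. 2.

```python
def summarize_restarts(rms_errors, fraction=0.10):
    """Rank the rms errors of repeated random searches and report the best
    value, the boundary of the best `fraction` of runs, and the total spread."""
    ranked = sorted(rms_errors)
    k = max(1, int(round(fraction * len(ranked))))  # 4 runs for 40 searches
    return {
        "best": ranked[0],
        "n_best": k,
        "percentile_bound": ranked[k - 1],  # worst of the k best runs
        "spread": ranked[-1] - ranked[0],
    }

# Invented rms errors (kg m^-2) for 40 random searches.
rms_runs = [0.030 + 0.001 * i for i in range(40)]
summary = summarize_restarts(rms_runs)

# A small gap between the best value and the percentile bound suggests that
# the best run is representative rather than a lucky start.
gap = summary["percentile_bound"] - summary["best"]
```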
The rms errors for TWO with more than four hidden neurons are considerably higher than those for ALL/85V and LOW, indicating that there is additional valuable information in the other channels for the LWP retrieval. It also shows the ability of NNs to make use of small correlations in the data. Note that this result is in contrast with that reported by Karstens et al. (1994) with their regression method. Karstens et al. obtained no further improvement when all SSM/I channels were used as predictors. This might be the consequence of collinearities between the Tb’s, which are problematic for ordinary least squares techniques (Crone et al. 1996).
We conclude that five hidden neurons seem to be a good compromise with respect to accuracy and simplicity. We have also applied Akaike’s Information Criterion (AIC), which is a more objective criterion for model selection in regression and NN applications (e.g., von Storch and Zwiers 1997). AIC takes into account two quantities, model error and complexity (number of weights), and the model with the minimum AIC is preferred. AIC as a function of the number of hidden neurons (not shown) leads essentially to the same conclusions described above but in a more objective manner; that is, five hidden neurons seem to be sufficient. Consequently, further investigations in the remainder of this paper refer to an NN architecture with five neurons in the hidden layer.
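A model selection of this kind can be sketched as follows. The sketch uses one common least-squares form of the criterion, AIC = n ln(rms²) + 2k, which may differ from the exact form applied in this study. The weight count assumes a fully connected network with one hidden layer, one output, and bias terms, and the rms errors are invented for illustration.

```python
import math

def n_weights(n_inputs, n_hidden):
    """Number of weights, including biases, of a one-hidden-layer network
    with a single output (assumed layout). Zero hidden neurons: linear model."""
    if n_hidden == 0:
        return n_inputs + 1
    return (n_inputs + 1) * n_hidden + (n_hidden + 1)

def aic_least_squares(rms_error, n_samples, k):
    """Least-squares AIC up to an additive constant: n * ln(rms^2) + 2k."""
    return n_samples * math.log(rms_error ** 2) + 2 * k

# Invented rms errors (kg m^-2) per number of hidden neurons, six inputs.
rms_by_hidden = {0: 0.060, 1: 0.058, 2: 0.045, 3: 0.038,
                 4: 0.034, 5: 0.031, 6: 0.0309, 7: 0.0308}

aic = {h: aic_least_squares(rms, 1030, n_weights(6, h))
       for h, rms in rms_by_hidden.items()}
best_h = min(aic, key=aic.get)  # hidden-layer size with the lowest AIC
```

Once the rms error saturates, the penalty term 2k dominates and the criterion stops rewarding additional hidden neurons.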
Algorithm intercomparison for simulated data
The results for ALL/85V and REG for low and high LWPs are shown in Fig. 3. Obviously, the scatter is reduced for the NN-retrieved results. This is true in particular for cloud-free cases and LWP larger than 1.0 kg m−2. For the latter, REG shows a similar behavior if a0, a1, and a2 were recalibrated for the whole LWP range. Note that Karstens et al. (1994) calibrated their algorithm only for LWPs less than 1.0 kg m−2. The low accuracy at high LWPs might be due to the fact that the linearization is not valid for optically thick clouds. The results of the NN algorithm ALL/85V agree very well with the parameterized LWP over the whole range, but there is a tendency to underestimate LWPs less than 0.02 kg m−2 (Fig. 3a). This behavior is more strongly pronounced for the two-channel algorithm (not shown). The results for LOW resemble those for ALL/85V (not shown).
The performances of ALL/85V, LOW, TWO, and REG are summarized in Table 3. The bias is not included because it is close to zero for all algorithms. TWO is twice as accurate as REG for all LWP ranges and explains significantly more variance. Using more than two channels (LOW and ALL/85V) provides a significant gain in accuracy and explained variance (Table 3). The differences between LOW and ALL/85V are negligible.
Since, in general, measurements of LWP are not available to validate the algorithm results, a test of the algorithm behavior at LWP = 0.0 is often the only solution to this problem (see, e.g., Karstens et al. 1994). However, algorithms can have problems at the low end of the LWP range. Therefore, we investigate the NN algorithm behavior for LWP = 0.0. The normalized frequency distributions of the results of REG, TWO, LOW, and ALL/85V in clear-sky conditions are shown in Fig. 4. The three NN-based algorithms identify clear-sky cases much more accurately than the regression algorithm that shows a relatively flat distribution. For LOW and ALL/85V, approximately 80% and 90% of all clear-sky cases (n = 691), respectively, were correctly estimated within the range of ±0.006 kg m−2. The corresponding clear-sky statistic is summarized in Table 4. Again, the bias is negligible for all algorithms. The standard deviation for REG is significantly higher than those for TWO, LOW, and ALL/85V. ALL/85V shows the lowest standard deviation of 2.0 × 10−3 kg m−2. A remarkable feature of ALL/85V is that no negative values were retrieved (Table 4). The weights a10 and a11 in (6) were adjusted in such a manner (during the training process) that ALL/85V yields no negative values.
Statistical results show that the large clear-sky standard deviation of REG is due partly to the influence of other geophysical parameters such as surface wind speed (V), total precipitable water (TPW), and SST. The linear correlations and their error estimates between the retrieved LWP under clear-sky conditions and these three geophysical parameters are shown in Table 5. The results of REG are significantly correlated with all three parameters. The explained variances, which contribute to the “clear-sky noise,” range from 10% for TPW to 25% for V and SST. The results of TWO are also significantly correlated with V; however, the correlation is much smaller with SST and TPW. The results of LOW are influenced only by the SST, and those of ALL/85V show no significant correlation with any of the three parameters within the range of uncertainty.
Note, however, that only linear correlations were considered, so that the given dependencies may have to be regarded as lower boundaries. For example, REG systematically overestimates LWP up to 0.1 kg m−2 mainly for dry atmospheres with TPW ⩽ 15 kg m−2, which contributes to the negative sign of the correlation coefficient.
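The cross-talk diagnostic discussed above reduces to a linear correlation coefficient r between the retrieved clear-sky LWP and a geophysical parameter, with r² as the explained variance. The sketch below is a plain implementation of the Pearson coefficient; the function names and the sample values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def explained_variance(r):
    """Fraction of variance explained by a linear relationship: r squared."""
    return r * r

# Hypothetical clear-sky sample: retrieved LWP (kg m^-2) against TPW (kg m^-2).
tpw = [12.0, 18.0, 25.0, 31.0, 40.0, 47.0]
lwp_retrieved = [0.012, 0.004, -0.003, 0.001, -0.006, -0.009]
r_tpw = pearson_r(tpw, lwp_retrieved)
```

An explained variance of 25% corresponds to |r| = 0.5, and one of 10% to |r| ≈ 0.32.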
The validation with the independent test set described above is based on data that were generated (parameterized) in the same manner as the training set. To test the algorithms for “other clouds,” which are more independent, we have changed the LWP parameterization; that is, the relative humidity threshold was set to 90% instead of 95% (section 2). For the 1030 independent RAOBs, the LWP and Tb’s were recalculated using the lower threshold. As a result, the relationship between the temperature–humidity profiles and the cloud properties (height, water content) has changed.
As expected, the behavior of the different algorithms has not changed significantly for clear-sky conditions. To estimate the sensitivity of the algorithms for cloudy conditions, two intervals were investigated: 0 < LWP ⩽ 0.5 kg m−2 and LWP ≥ 0.5 kg m−2. The latter range is more or less of theoretical interest since in the real atmosphere clouds with LWPs higher than 0.5 kg m−2 are commonly believed to rain (e.g., Karstens et al. 1994), and rain is not included in the training set.
The correlation coefficients as well as the biases are not affected by the change in the relative humidity threshold. For 0 < LWP ⩽ 0.5 kg m−2, the increase in rms error ranges from 0.001 (ALL/85V and REG) to 0.004 kg m−2 (TWO).
The increase in rms error for LWP ≥ 0.5 kg m−2 is more pronounced: Δrms ≈ 0.01 kg m−2 (ALL/85V, LOW, and REG). The increase for TWO is three times larger (=0.036 kg m−2). The larger rms errors are mainly caused by a shift in the LWP distribution toward higher values for which the retrieval errors are also larger (not shown). Therefore, the main conclusions given above still hold in the case when the algorithms are validated with clouds that due to a change in the parameterization method are different from those used during the training procedure.
Indirect validation for clear-sky cases
After the test of the algorithms with simulated data, direct measurements of the SSM/I are used in this section. Again no actual observations of cloud liquid water content are available; thus, the test is carried out for the clear-sky cases. The clear-sky noise and cross-talk problems can be investigated with observed data, too.
Before the algorithms could be applied, the Tb’s had to be adjusted for systematic differences between the results of the model used for the algorithm development and the SSM/I observations. Corrections were estimated from comparisons under clear-sky conditions between simulated and measured F-8 Tb’s (Fuhrhop and Simmer 1996). The adjustments are quite similar for the F-10 and F-11 instruments (H. Gäng 1997, personal communication).
The standard deviations of the retrieved LWPs for clear-sky cases (classified with IR-Meteosat data) from measured SSM/I Tb’s for REG, TWO, and ALL/85V are depicted as a function of latitude in Fig. 5. The results of LOW are similar to those of ALL/85V and therefore are not shown. REG shows the highest scatter with a dependency on the geographical latitude. The standard deviation for TWO is lower, but the latitudinal dependency is still evident. The clear-sky noise of ALL/85V is considerably reduced. All algorithms estimate LWPs exceeding 0.1 kg m−2 in a few cases (less than 1%), especially over the Southern Hemisphere and near 40°N. This might be due in part to errors in the clear-sky classification procedure.
The biases of the retrieved LWPs under the same clear-sky conditions (which are equal to the average retrievals) are shown in Fig. 6. The biases for LOW (not shown) and ALL/85V are independent of latitude and therefore independent of the TPW. In contrast, there are significant latitudinal dependencies for TWO and REG. This behavior is somewhat masked by a negative peak near 15°N for REG. Overall, however, the correlations with TPW are obviously positive for both algorithms. The corresponding normalized frequency distributions are presented in Fig. 7. They are like those for the parameterized clear-sky cases (Fig. 4) except for positive biases for TWO and REG.
Clear-sky statistics and correlations between retrieved LWPs and TPWs (Alishouse et al. 1990a), as well as surface wind speeds V calculated with the retrieval algorithms (Goodberlet et al. 1989), are summarized in Table 6. As noted before, there are significant positive biases for REG and TWO, and negligible positive biases for LOW and ALL/85V. The standard deviation for REG is twice as high as for TWO. ALL/85V shows the highest accuracy with a standard deviation of 0.006 kg m−2. The standard deviation of REG agrees well with that for the simulated data, but for the NN-based algorithms the standard deviation is higher, which is mainly caused by outliers that were not present in the simulated data. Note that among the NN-based algorithms only the two-channel NN (TWO) shows a significantly positive bias. This might be due in part to the positive correlations (r = 0.38) with TPW (Table 6) that are not present for LOW and ALL/85V (0.07 and −0.03). The mean TPW for the clear dataset (SSM/I estimate) is 30 kg m−2, which is approximately 4 kg m−2 higher than the mean TPW in the training and test set. Consequently, moist conditions are somewhat overrepresented, so that the positive biases may be due in part to the TPW influence. Note that the sign of the correlation with TPW has reversed for REG in comparison to the parameterized clear-sky cases (Table 5). The negative sign for the latter dataset is due mainly to an overestimation for dry atmospheres, which are underrepresented in the former set. Additionally, the overestimation of REG at high humidity values is much more pronounced for the measured clear-sky cases (not shown). Surface wind speed V explains 1% of the variance of the results of REG and TWO (Table 6). In contrast, LOW and ALL/85V show no significant correlation with V. The positive sign of the correlation coefficients for the two-channel approaches is due mainly to an overestimation of LWP for V higher than 12 m s−1 (not shown).
Summary and conclusions
The potential of NNs for the development of retrieval algorithms for LWP from SSM/I data is studied. The effects of NN architecture and different predictors are investigated, and an intercomparison with a standard regression algorithm (Karstens et al. 1994) is presented.
The NN is trained with a dataset comprising 1030 pairs of simulated Tb’s and parameterized LWPs that are based on globally distributed RAOBs over the oceans.
The choice of the NN architecture has a significant influence on the retrieval accuracy. An important architecture parameter is the number of neurons in the hidden layer (only one layer was employed). Multiple random searches in weight space were necessary in order to get a good minimum of the cost function and to ensure that the solution found during the training procedure was not purely accidental. This might be mainly a problem of variable metric and conjugate gradient methods because standard back propagation is less likely to get stuck in local minima (van der Smagt 1994). Nevertheless, second-order methods are far superior with respect to learning time.
A validation of the algorithm is presented that relies on 1030 independent pairs of Tb’s and LWPs, simulated and parameterized in the same manner as for the training set. The NN algorithms show significantly better retrieval results in comparison to the regression algorithm, especially for clear-sky cases and high LWPs (LWP > 0.5 kg m−2). For the latter, however, the results are of more theoretical interest since clouds contain rainwater for such LWP (Karstens et al. 1994) and rain was not included in the training and test set. A direct intercomparison between regression and NNs was possible since both algorithms use the same Tb’s as predictors (T22V and T37V). Using these two channels reduces but does not fully cancel the problems of clear-sky noise and cross talk with TPW. These effects can be reduced significantly if NNs with more SSM/I channels as predictors (LOW and ALL/85V) are used in order to correct for the influence of other geophysical parameters. The advantage of NNs lies in the fact that they can extract information even if the channels are redundant and the linear correlations with the geophysical parameters are small. This is apparent from clear-sky investigations (clear-sky noise and cross talk) and high LWP cases. For the latter, the linearization of the regression model (12), which is valid only under nonscattering conditions, is not appropriate. With respect to LWP, NNs can approximate the (unknown) Tb–LWP relationship, taking into account nonlinear effects such as scattering. These results agree quite well with those from Stogryn et al. (1994) and Krasnopolsky et al. (1995), who found that the wind speed retrieval for the SSM/I radiometer is also possible under more adverse conditions (high LWPs) when NNs are used instead of regression algorithms.
A direct validation of LWP algorithms is difficult to perform since measurements of LWP over the whole horizontal scale (≈50 km) and vertical column do not exist. To test the algorithms for more independent cases, two additional validations are presented. Changing the relative humidity threshold for the LWP parameterization changes the vertical distribution and total amount of cloud water, and with this the relationship between the vertical structure of temperature as well as humidity and the parameterized clouds. Using this more independent test set of Tb’s and LWPs does not change the conclusions described above. The second indirect validation is based on measured Tb’s (F-10 instrument) for clear-sky cases, classified with independent IR-Meteosat data. Applying this set of noisier Tb’s shows that clear-sky noise and cross talk are also removed when LOW and ALL/85V are applied to SSM/I observations.
To estimate the influence of noisier Tb’s under cloudy conditions we performed Monte Carlo experiments. Normally distributed, uncorrelated, zero-mean noise with standard deviations between 0 and 5 K in steps of 0.5 K was added to the simulated Tb’s (clean apart from the radiometer’s noise) of the test set. For each noise level this procedure was repeated 50 times (i.e., 50 realizations of 11 different noise levels and 1030 pairs). The rms errors as estimates of the scatter as well as the biases were calculated for all algorithms. The rms ratio between the NN (TWO) and the regression algorithm (REG) using the same predictors increases from 0.6 for zero noise to 0.8 for an added noise higher than 3 K. The rms difference, however, remains constant. The increase in bias for higher noise levels is greater for TWO than for REG. These results still hold when noise is added only to one channel. The sensitivity to noise in channels T22V and T37V (the others held fixed) is significantly reduced for LOW and ALL/85V in comparison to the two-channel algorithms because the other noise-free channels stabilize the retrieval. These results enhance our confidence that the NN algorithms give better results than the regression algorithms, even when noisier Tb’s are used as input, that is, in applications to observed data.
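The structure of such a Monte Carlo experiment can be sketched as follows. The noise levels and the number of realizations follow the description above; everything else is a stand-in. In particular, `toy_retrieval` is a hypothetical linear two-channel function, not REG or any of the NN algorithms, and the Tb values are invented.

```python
import math
import random

def monte_carlo_noise(retrieve, tbs, truth, levels, n_real=50, seed=0):
    """For each noise level (K), add zero-mean Gaussian noise to every channel,
    rerun the retrieval n_real times, and return {level: (rms, bias)}."""
    rng = random.Random(seed)
    results = {}
    for sigma in levels:
        sq_sum, err_sum, count = 0.0, 0.0, 0
        for _ in range(n_real):
            for tb, lwp in zip(tbs, truth):
                noisy = [t + rng.gauss(0.0, sigma) for t in tb]
                err = retrieve(noisy) - lwp
                sq_sum += err * err
                err_sum += err
                count += 1
        results[sigma] = (math.sqrt(sq_sum / count), err_sum / count)
    return results

def toy_retrieval(tb):
    """Hypothetical linear two-channel retrieval, for illustration only."""
    return 0.01 * (tb[1] - tb[0])

levels = [0.5 * i for i in range(11)]  # 0, 0.5, ..., 5 K: 11 noise levels
tbs = [[200.0 + i, 210.0 + 2.0 * i] for i in range(20)]  # invented Tb pairs
truth = [toy_retrieval(tb) for tb in tbs]                # noise-free reference
results = monte_carlo_noise(toy_retrieval, tbs, truth, levels)
```

Replacing `toy_retrieval` by two competing algorithms and taking the ratio of their rms errors at each level gives the kind of comparison reported for TWO and REG.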
To estimate the retrieval error for the new algorithms under cloudy conditions (LWP ⩽ 0.5 kg m−2), we assume that the rms ratio between clear-sky and cloudy conditions remains equal when measured instead of simulated data are used (first-order error estimate). Taking the estimates from Tables 3 and 4 (simulated) and Table 6 (measured), we can solve for the unknown rms error. Using this procedure, the rms errors for LWP ⩽ 0.5 kg m−2 become equal for REG and TWO (≈0.031 kg m−2). However, the error estimates for LOW and ALL/85V are much smaller (≈0.023 kg m−2). Note that possible classification errors (IR clear-sky detection) propagate into these error estimates. Another indirect validation can be performed with ground-based observations. Such measurements are currently being carried out by our group, but more data are needed for a sufficient test.
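Written out, the first-order estimate described above is a simple proportion. The notation is introduced here only for illustration: "sim" and "meas" denote simulated and measured Tb’s, and "clear" and "cloudy" denote the clear-sky cases and the range LWP ⩽ 0.5 kg m−2, respectively.

```latex
\frac{\mathrm{rms}^{\mathrm{meas}}_{\mathrm{cloudy}}}{\mathrm{rms}^{\mathrm{meas}}_{\mathrm{clear}}}
  = \frac{\mathrm{rms}^{\mathrm{sim}}_{\mathrm{cloudy}}}{\mathrm{rms}^{\mathrm{sim}}_{\mathrm{clear}}}
\quad\Longrightarrow\quad
\mathrm{rms}^{\mathrm{meas}}_{\mathrm{cloudy}}
  = \mathrm{rms}^{\mathrm{sim}}_{\mathrm{cloudy}}\,
    \frac{\mathrm{rms}^{\mathrm{meas}}_{\mathrm{clear}}}{\mathrm{rms}^{\mathrm{sim}}_{\mathrm{clear}}}
```

The three quantities on the right-hand side are the tabulated values (Tables 3, 4, and 6), and the left-hand side is the unknown error for measured data under cloudy conditions.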
As noted above, the new algorithms are not valid when rain clouds are present. To exclude such cases, we propose a polarization threshold method that relies on the Tb’s at 37 and 85 GHz. If we assume that clouds rain if LWP > 0.5 kg m−2 (Karstens et al. 1994), the new algorithms should not be used if the polarization (vertical minus horizontal polarization) is lower than 40 or 7 K for 37 or 85 GHz, respectively. The LWP algorithms can be applied, however, even when ice clouds are present because these clouds are included in the training set. Problems might arise due to inhomogeneously distributed clouds in the radiometer’s field of view—the “beam-filling problem,” the influence of which is difficult to estimate since no quantitative numbers can be given.
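The proposed screening can be sketched as a simple test on the polarization differences. The sketch reads the criterion as "either threshold violated", which is one possible interpretation of the text; treating the 85-GHz channels as optional is a further assumption made here, and the function name and sample Tb’s are hypothetical.

```python
def rain_suspected(t37v, t37h, t85v=None, t85h=None, thr37=40.0, thr85=7.0):
    """Flag a pixel as possibly raining when the polarization difference
    (vertical minus horizontal Tb, in K) falls below the threshold at 37 GHz
    or, if both 85-GHz channels are supplied, at 85 GHz."""
    if (t37v - t37h) < thr37:
        return True
    if t85v is not None and t85h is not None:
        return (t85v - t85h) < thr85
    return False

# Hypothetical Tb's in K.
flag_a = rain_suspected(210.0, 180.0)                # 37-GHz difference of 30 K
flag_b = rain_suspected(215.0, 160.0, 260.0, 245.0)  # differences of 55 and 15 K
flag_c = rain_suspected(215.0, 160.0, 260.0, 256.0)  # 85-GHz difference of 4 K
```

Pixels for which the function returns True would be excluded before the LWP algorithms are applied.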
New NN algorithms that are also applicable under rainy conditions are currently under development. The theoretical results, that is, the retrievals for LWP > 0.5 kg m−2, presented in this study indicate that NNs are able to map strong nonlinearities. An NN-based LWP algorithm, which is still applicable when rain is present, will be a considerable improvement to the algorithms presented in this study.
We propose to use NNs instead of regression techniques when nonlinearities in the radiative transfer equation become important during the retrieval process because the form of the inverse radiative transfer operator is a priori unknown in such cases. If the linearization of the radiative transfer equation is valid over the whole range, for the retrieval of total precipitable water, for example, the advantage of NNs is comparably small (Jung 1996). We would like to note that the computational cost of the new algorithms is comparable to the regression method, once training has been finished, because only the feed-forward phase has to be processed [(4)–(6)]. This aspect may be important, for example, when long time series of geophysical fields have to be calculated from satellite observations.
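The low cost of the feed-forward phase can be seen from a generic sketch of a network with one hidden layer. The tanh hidden activation, the linear output, and all weight values below are assumptions made for illustration; they are not taken from the study’s Eqs. (4)–(6), and the function name is hypothetical.

```python
import math

def feed_forward(tbs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> tanh hidden layer -> linear output."""
    hidden = [
        math.tanh(sum(w * t for w, t in zip(row, tbs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Illustrative network: two inputs, five hidden neurons, arbitrary weights.
w_hidden = [[0.1, -0.2], [0.3, 0.1], [-0.4, 0.2], [0.05, 0.05], [0.2, -0.1]]
b_hidden = [0.0, 0.1, -0.1, 0.2, 0.0]
w_out = [0.5, -0.3, 0.2, 0.1, -0.4]
b_out = 0.05
output = feed_forward([0.6, -0.4], w_hidden, b_hidden, w_out, b_out)
```

For six inputs and five hidden neurons a pass needs only a few dozen multiply-add operations and five activation evaluations, which is why the cost per observation is close to that of a regression.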
The authors would like to acknowledge Dr. Rolf Fuhrhop for valuable discussions and for providing the radiative transfer model. Dr. Holger Gäng is acknowledged for providing the clear-sky dataset. We also wish to thank three reviewers for constructive comments on this manuscript. This work was supported by the German Research Foundation (DFG) under Contract Ru 375/5-1.
Alishouse, J. C., S. A. Snyder, J. Vongsathorn, and R. R. Ferraro, 1990a: Determination of oceanic total precipitable water from the SSM/I. IEEE Trans. Geosci. Remote Sens., 28, 811–816.
——, J. B. Snider, E. R. Westwater, C. T. Swift, C. S. Ruf, S. A. Snyder, J. Vongsathorn, and R. R. Ferraro, 1990b: Determination of cloud liquid water content using the SSM/I. IEEE Trans. Geosci. Remote Sens., 28, 817–822.
Atkinson, P. M., and A. R. L. Tatnall, 1997: Neural networks in remote sensing. Int. J. Remote Sens., 18, 699–709.
Butler, C. T., and R. v. Z. Meredith, 1996: Retrieving atmospheric temperature parameters from DMSP SSM/T-1 data with a neural network. J. Geophys. Res., 101, 7075–7083.
Cabrera-Mercader, C. R., and D. H. Staelin, 1995: Passive microwave relative humidity retrievals using feedforward neural networks. IEEE Trans. Geosci. Remote Sens., 33, 1324–1328.
Campbell, N. A., 1996: The decorrelation stretch transformation. Int. J. Remote Sens., 17, 1939–1949.
Churnside, J. H., T. A. Stermitz, and J. A. Schroeder, 1994: Temperature profiling with neural network inversion of microwave radiometer data. J. Atmos. Oceanic Technol., 11, 105–109.
Cober, G. C., A. Tremblay, and G. A. Isaac, 1996: Comparison of SSM/I liquid water paths with aircraft measurements. J. Appl. Meteor., 35, 503–519.
Colton, M. C., and G. A. Poe, 1994: Shared Processing Program, Defense Meteorological Satellite Program, Special Sensor Microwave/Imager Algorithm Symposium, 8–10 June 1993. Bull. Amer. Meteor. Soc., 75, 1663–1669.
Crone, L. J., L. M. McMillin, and D. S. Crosby, 1996: Constrained regression in satellite meteorology. J. Appl. Meteor., 35, 2023–2035.
Elsner, J. B., and A. A. Tsonis, 1992: Nonlinear prediction, chaos, and noise. Bull. Amer. Meteor. Soc., 73, 49–60.
Ferraro, R. R., F. Weng, N. C. Grody, and A. Basist, 1996: An eight-year (1987–1994) time series of rainfall, clouds, water vapor, snow cover, and sea ice derived from SSM/I measurements. Bull. Amer. Meteor. Soc., 77, 891–905.
Fletcher, R., and M. J. D. Powell, 1963: A rapidly convergent descent method for minimization. Comput. J., 6, 163–168.
Fuhrhop, R., and C. Simmer, 1996: SSM/I brightness temperature corrections for incident angle variations. J. Atmos. Oceanic Technol., 13, 246–254.
Goodberlet, M. A., C. T. Swift, and J. C. Wilkerson, 1989: Remote sensing of ocean surface winds with the Special Sensor Microwave/Imager. J. Geophys. Res., 94, 14 547–14 555.
Goody, R. M., and Y. L. Yung, 1989: Atmospheric Radiation—Theoretical Basis. Oxford University Press, 519 pp.
Greenwald, T. J., G. L. Stephens, T. H. Vonder Haar, and D. L. Jackson, 1993: A physical retrieval of cloud liquid water over the global oceans using Special Sensor Microwave/Imager (SSM/I) observations. J. Geophys. Res., 98, 18 471–18 488.
Harrison, E. F., P. Minnis, B. R. Barkstrom, V. Ramanathan, R. D. Cess, and G. G. Gibson, 1990: Seasonal variation of cloud radiative forcing derived from the Earth Radiation Budget Experiment. J. Geophys. Res., 95, 18 687–18 703.
Hastenrath, S., L. Greischar, and J. van Heerden, 1995: Prediction of the summer rainfall over South Africa. J. Climate, 8, 1511–1518.
Hollinger, J. P., R. Lo, G. Poe, R. Savage, and J. Peirce, 1987: Special Sensor Microwave/Imager user’s guide. Tech. Rep., Naval Research Laboratory, Washington, DC, 120 pp.
Jung, T., 1996: Bestimmung des Wasserdampf- und Flüssigwassergehaltes über den Ozeanen aus simulierten Special Sensor Microwave/Imager (SSM/I)-Daten mit neuronalen Netzen. M.S. thesis, Diplomarbeit am Institut für Meereskunde an der Christian-Albrechts-Universität Kiel, Kiel, Germany, 87 pp. [Available online at http://www.ifm.uni-kiel.de/me/research/Projekte/RemSens/hypam.html.]
Karstens, U., C. Simmer, and E. Ruprecht, 1994: Remote sensing of cloud liquid water. Meteor. Atmos. Phys., 54, 157–171.
Krasnopolsky, V. M., L. C. Breaker, and W. H. Gemmill, 1995: A neural network as a nonlinear transfer function model for retrieving surface wind speeds from the Special Sensor Microwave/Imager. J. Geophys. Res., 100, 11 033–11 045.
Liu, G., and J. A. Curry, 1993: Determination of characteristic features of cloud liquid water from satellite microwave measurements. J. Geophys. Res., 98, 5069–5092.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, 1992: Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, 963 pp.
Prigent, C., A. Sand, C. Klapisz, and Y. Lemaitre, 1994: Physical retrieval of liquid water contents in a North Atlantic cyclone using SSM/I data. Quart. J. Roy. Meteor. Soc., 120, 1179–1207.
Rojas, R., 1995: Neural Networks—A Systematic Introduction. Springer-Verlag, 467 pp.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams, 1986: Learning representations by back-propagating errors. Nature, 323, 533–536.
Sengupta, S. K., and J. S. Boyle, 1995: Nonlinear principal component analysis of climate data: Program for climate model diagnosis and intercomparison. PCMDI Rep. 29, Lawrence Livermore National Laboratory, Livermore, CA, 21 pp.
Simmer, C., 1994: Satellitenfernerkundung hydrologischer Parameter der Atmosphäre mit Mikrowellen. Verlag, 313 pp.
Slingo, A., 1990: Sensitivity of the earth’s radiation budget to changes in low clouds. Nature, 343, 49–51.
Stogryn, A. P., 1972: The emissivity of sea foam at microwave frequencies. J. Geophys. Res., 77, 1658–1666.
——, C. T. Butler, and T. J. Bartolac, 1994: Ocean surface wind retrievals from Special Sensor Microwave/Imager data with neural networks. J. Geophys. Res., 99, 981–984.
Tang, B., G. M. Flato, and G. Holloway, 1994: A study of arctic sea ice and sea-level pressure using POP and neural network methods. Atmos.–Ocean, 32, 507–529.
Tsang, L., Z. Chen, S. Oh, R. J. Marks, and A. T. C. Chang, 1992: Inversion of snow parameters from passive microwave remote sensing measurements by a neural network trained with a multiple scattering model. IEEE Trans. Geosci. Remote Sens., 30, 1015–1024.
van der Smagt, P. P., 1994: Minimization methods for training feedforward neural networks. Neural Networks, 7, 1–11.
von Storch, H., and F. W. Zwiers, 1997: Statistical Analysis in Climate Research. Cambridge University Press, in press.
Wagner, F., 1996: Manual for Netfit (Version 1.2). Institut für Theoretische Physik an der Christian-Albrechts-Universität Kiel, Kiel, Germany, 11 pp.
——, and C. Lovelace, 1971: Phase shift analysis of Π− p → K1b. Nucl. Phys., 25B, 411–427.
Warner, J., 1955: The water content of cumuliform cloud. Tellus, 7, 449–457.
Weng, F., and N. C. Grody, 1994: Retrieval of cloud liquid water using the Special Sensor Microwave/Imager (SSM/I). J. Geophys. Res., 99, 25 535–25 551.
Wisler, M. M., and J. P. Hollinger, 1977: Estimation of marine environmental parameters using microwave radiometric remote sensing systems. NRL Tech. Rep. 3661, Naval Research Laboratory, Washington, DC, 27 pp.
Summary of radiosonde observations used to parameterize the liquid water paths and simulate the SSM/I brightness temperatures.
The best results in terms of the rms error of all 40 random searches for the independent test sets and different NN architectures. The number of weights (complexity) is given in parentheses. Three different input combinations are considered: all SSM/I channels except T85V (ALL/85V), the low-frequency SSM/I channels (LOW), and T22V, T37V (TWO).
Summary of retrieval results from four different algorithms for the independent test set. The results are subdivided into three different LWP ranges. Here B denotes the explained variance, and ΔB is the corresponding 99% confidence interval for the lower/upper boundary. The rms errors are given in (kg m−2), and B as well as ΔB in (%).
Deviation of the retrieved values for the test set (n = 691) from the expected clear-sky value LWP = 0.0. Given are the minimum, maximum, and average values, as well as the width of the distribution (std dev) of the LWP retrieved by different algorithms.
Dependencies of the (parameterized) clear-sky retrievals of four different algorithms on total precipitable water, surface wind speed, and SST; r is the correlation coefficient between the algorithm results and the three parameters, and Δr is the corresponding 99% confidence interval for the lower/upper boundary. The presented statistics are based on 691 cases.
The summary of retrieval results for clear cases (identified with Meteosat data) of four different algorithms applied to SSM/I measurements (F-10). The correlations between the LWP retrievals and the total precipitable water (Alishouse et al. 1990a) as well as the surface wind speed (Goodberlet et al. 1989) calculated from the same Tb’s are also presented. The presented statistics are based on 70 000 cases.
The new LWP algorithms, described in this paper, are available in the form of FORTRAN functions from the World Wide Web at http://www.ifm.uni-kiel.de/me/research/Projekte/RemSens/hypam.html, or directly from the corresponding author.