Figures

Fig. 1. Spectral diagram of the HIRS channels at left and the associated absorption and window regions to the right. The molecules associated with each absorption region are indicated.

Fig. 2. Same as Fig. 1 but for AMSU-A.

Fig. 3. Training curves for a neural network on a sample of data for the EBP algorithm with four different values of the learning coefficient β and for the LBL algorithm.

Fig. 4. Weighting functions for channels (a) 1–8, (b) 9–12, and (c) 13–19 of the HIRS, and (d) channels 4–14 of the AMSU-A. The HIRS curves are courtesy of M. Chalfant of the National Environmental Satellite, Data, and Information Service (NESDIS); the AMSU-A curves are courtesy of N. Grody of NESDIS.

Fig. 5. Profiles of the correlation coefficient of satellite brightness temperatures with observed temperatures as a function of height for channels (a) 1–8, (b) 9–12, and (c) 13–19 of the HIRS and channels (d) 1–8 and (e) 9–15 of the AMSU-A.

Fig. 6. Profiles of rmse of temperature (solid) and dewpoint (dashed) with height for the trained neural networks. The black lines represent the neural networks trained using all of the data, and the gray lines represent the results from neural networks trained on cloudy and clear cases separately.

Fig. 7. Profiles of rmse of temperature (solid) and dewpoint (dashed) with height for the neural networks trained on the full input set (black lines) and after pruning selected input nodes and synapses (gray lines).

Fig. 8. Scatterplots of neural network estimates of temperature vs observed temperature for the independent data (403 cases) for all nine OPM layers. All temperatures are in degrees Celsius.

Fig. 9. Same as Fig. 8 but for dewpoint.

Fig. 10. A comparison of the vertical temperature and dewpoint profiles at (a) Buffalo, NY, at 0000 UTC 22 Aug 1999 and (b) Dulles Airport, VA, at 1200 UTC 20 May 1999 according to radiosonde observations (thick solid lines) and the neural-network retrievals (thick dot–dashed lines). The left line of each pair represents dewpoint; the right line represents temperature. The thin lines sloping upward and to the right represent temperature; the thin solid lines sloping upward and to the left with an upward concavity represent potential temperature; the thin lines sloping upward and to the left with a downward concavity represent wet-bulb potential temperature. All values are in degrees Celsius on the abscissa (potential temperature = actual temperature at 1000 hPa). The dot–dashed lines sloping upward and to the right represent mixing ratio in grams per kilogram, with values given on the ordinate on the right-hand side of the plot and also along the top of the plot.

Fig. 11. Vertical profiles of (a) rmse and (b) bias for the neural-network retrievals (solid lines), the operational ATOVS retrievals (dashed lines), and the first guess used in the ATOVS retrievals (dotted lines). The dewpoint retrievals are indicated by the use of dots at the actual retrieval levels. The thin vertical line on the bias plot indicates zero bias.


Combined IR–Microwave Satellite Retrieval of Temperature and Dewpoint Profiles Using Artificial Neural Networks

  • a National Environmental Satellite, Data, and Information Service, Camp Springs, Maryland
  • b Division of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts

Abstract

Radiance measurements from satellites offer the opportunity to retrieve atmospheric variables at much higher spatial resolution than is presently afforded by in situ measurements (e.g., radiosondes). However, the accuracy of these retrievals is crucial to their usefulness, and the ill-posed nature of the problem precludes a straightforward solution. A number of retrieval approaches have been investigated, including empirical techniques, coupling with numerical weather prediction models, and data analysis techniques such as regression. In this paper, artificial neural networks are used to retrieve vertical temperature and dewpoint profiles from infrared and microwave brightness temperatures from a polar-orbiting satellite. This approach allows retrievals to be performed even in cloudy conditions—a limitation of infrared-only retrievals. In a direct comparison of this technique with results from the operational Advanced Television and Infrared Observation Satellite Operational Vertical Sounder (ATOVS) retrievals, it was found that the neural-network temperature retrievals had larger errors than the ATOVS retrievals (though generally smaller than the first guess used in the ATOVS retrievals) but that the dewpoint retrievals showed consistent improvement over the comparable ATOVS retrievals.

Corresponding author address: Ana P. Barros, 118 Pierce Hall, 29 Oxford St., Cambridge, MA 02138. barros@deas.harvard.edu


Introduction

A proper depiction of the state of the atmosphere is crucial for accurately predicting future conditions, especially in the case of such highly variable parameters as precipitation amount (Wu et al. 1995). For precipitation processes, two of the most important variables are moisture availability (in both vapor and liquid/solid form) and temperature, which determines the maximum availability of water vapor in the atmosphere. However, the observation of these variables has historically relied largely on a radiosonde network with a spatial resolution that is too coarse to capture adequately the spatial distribution of moisture. This problem is most pronounced in the Tropics (e.g., Krishnamurti et al. 1991) but affects the temperate zones as well. Additional difficulties are presented by radiosonde instrument errors and by the horizontal drift of the balloon as it ascends, which can result in significant geolocation errors (Schmidlin and Ivanov 1998).

Sensors mounted on satellite platforms provide global coverage that is relatively homogeneous in space and is of much higher resolution than the current radiosonde network. However, these instruments only provide a vertically integrated measure of the amount of outgoing radiation (radiance) at the top of the earth's atmosphere. Because these radiances are a function of the vertical distribution of water vapor and temperature in the atmosphere and not simply of their average values, the retrieval of these vertical profiles from the radiances is an ill-posed problem that cannot be solved directly (Isaacs et al. 1986). As a result, numerous approaches have been taken for retrieving geophysical parameters from satellite radiances. The simplest approach is to develop semiempirical relationships between satellite radiances and the parameters of interest (e.g., Turpeinen et al. 1990; Turpeinen 1990; Wright and Golding 1990; Wu et al. 1995). Variables that are more readily estimated from satellite data (such as precipitation amount) can also be used in conjunction with a model initialization scheme to retrieve values that are less readily estimated, such as diabatic heating and vertical motion fields. This approach, referred to as “physical initialization” by Krishnamurti and colleagues, has also been investigated by many researchers (Krishnamurti et al. 1991; 1993; 1994; Krishnamurti and Bedi 1996; Puri and Miller 1990; Puri and Davidson 1992; Heckley et al. 1990). Retrievals can also be performed in conjunction with a numerical weather prediction (NWP) model by using the model forecasts as a first-guess field for satellite retrievals and the satellite retrievals, in turn, to influence the NWP model solution (Lipton and Vonder Haar 1990a,b; Lipton 1993; Lipton et al. 1995). A fourth approach is to use a data analysis technique such as linear regression (Smith et al. 1970) to determine the relationships between satellite radiances and geophysical parameters from a sample set of data.

This paper presents an approach using artificial neural networks as the data analysis technique for developing relationships between collocated satellite brightness temperatures and radiosonde measurements of temperature and dewpoint. Artificial neural networks are described extensively in the literature (e.g., Caudill and Butler 1992; Cheng and Titterington 1994; Hecht-Nielsen 1989; Ripley 1994) and will not be explained in detail here, except to say that they are built from simple data processing units that transform one or more input values to obtain an output value. They consist of nodes (containing the values) and synapses (connections whose numerical weights express the mathematical strength of the relationship between nodes). The use of a "hidden" group of nodes that are topologically located between the inputs and outputs and the application of a nonlinear transfer function to the values in those nodes allow the neural network to solve nonlinear problems by simultaneously considering multiple combinations of the inputs.

Neural networks have been widely applied in the atmospheric sciences, as reviewed by Gardner and Dorling (1998). Specific applications to the satellite retrieval problem include the work of Butler et al. (1996), who trained a single neural network to relate microwave brightness temperatures to atmospheric temperatures at 15 levels in the atmosphere, and of Caberra-Mercader and Staelin (1995), who used artificial neural networks to retrieve relative humidity profiles in the atmosphere. Jung et al. (1998) trained neural networks to retrieve cloud liquid water path from simulated microwave data. A number of authors have used microwave data to retrieve surface winds over the ocean (using buoy data as "ground truth"), including Stogryn et al. (1994) and Krasnopolsky et al. (1995). A much more general application of neural networks to satellite retrieval was performed by Chevallier et al. (1998), who produced a neural-network-based radiative transfer model that achieved accuracy similar to that of a conventional radiative transfer model but with a much shorter computation time.

As one of the most widely used nonlinear adaptive data processing techniques, neural networks have been subjected to intense scrutiny with the objective of establishing a theoretical justification for their robustness (Hassibi et al. 1994; Hassibi and Kailath 1995). In particular, Hassibi and colleagues have shown that, without prior knowledge of the statistical properties of the input data and associated noise (i.e., disturbances), the backpropagation algorithm is an optimal learning algorithm in the sense that it minimizes the maximum energy gain from disturbances to the estimation errors.

Dataset and neural-network specifications

Dataset

Data from the National Oceanic and Atmospheric Administration NOAA-15 Polar-Orbiting Environmental Satellite from 26 October 1998 (the first day archived data were available) through 31 August 1999 were used in this work. This satellite has two instrument platforms: the Advanced Very High Resolution Radiometer, and the Television and Infrared Observation Satellite-Next Generation (TIROS-N) Operational Vertical Sounder, which has three instruments: the High-Resolution Infrared Radiation Sounder (HIRS/3) and the Advanced Microwave Sounding Units A and B (AMSU-A and AMSU-B). Because this satellite passes over the eastern United States at approximately the radiosonde launch times (0000 and 1200 UTC), collocated radiosonde data were available for comparison with the satellite brightness temperatures. Radiosonde data were taken from the seven radiosonde locations described in Table 1.

The HIRS/3 has 20 spectral channels at a spatial resolution of 18.9 km (shortwave IR and visible) or 20.3 km (longwave IR) at nadir. The AMSU-A has a resolution of 48 km at nadir and contains 15 frequencies, and the AMSU-B has an additional 5 frequencies with a horizontal data resolution of 16.3 km at nadir. The channel frequencies of the HIRS/3 and AMSU-A are shown graphically in Figs. 1 and 2, respectively. Only the AMSU-A channels are considered here because of the influence of electromagnetic interference on the AMSU-B readings, a problem that was not resolved satisfactorily until after the chosen test period (R. Ferraro, 2001, personal communication).

The brightness temperatures from the satellite pixel whose center was closest to a particular radiosonde were matched with temperature and dewpoint data from that radiosonde at nine levels: surface and 950, 900, 850, 800, 750, 700, 500, and 250 hPa. This arrangement results in a total of 4340 possible pairings of radiosonde data (for a given level) and collocated satellite brightness temperatures [(310 days) × (2 launches per day) × (7 radiosondes)]. However, pairings in which either the radiosonde data or satellite data from any channel were missing were eliminated from the dataset, resulting in a final total of 2013 sample points available for study.

These nine particular levels were selected because the ultimate objective of this research is to use these retrieved values of temperature and dewpoint to initialize an NWP model. This model, hereinafter called the orographic precipitation model (OPM), was developed for predicting precipitation in regions of complex terrain where orographic forcing mechanisms can exert a significant influence (Barros and Lettenmaier 1993, 1994). The current version of the OPM (Kuligowski and Barros 1999) has been run at 1-km horizontal resolution over the Pocono Mountains of northeastern Pennsylvania using initial and boundary conditions from the Pennsylvania State University–National Center for Atmospheric Research Fifth-Generation Mesoscale Model (MM5). However, experiments have shown that the bias characteristics of the OPM closely follow those of MM5 (Kuligowski and Barros 1999), and the use of satellite data to provide initial conditions is seen as a way to circumvent this dependence.

Neural network architecture and training

Architecture

This work uses a standard, fully connected backpropagation neural network with a single hidden layer. To maximize the accuracy of the retrievals, each temperature and dewpoint retrieval at each atmospheric level is performed by a separate neural network with a single output node, so the retrievals at different levels are independent of one another. Although the number of atmospheric levels used in this work is only nine, the methodology can be applied to any number of levels. The selection of inputs used for each network will be discussed in the next section. An ideal number of hidden-layer nodes for a given number of input and/or output nodes has not yet been firmly established, so the authors have followed Fletcher and Goss (1993), who obtained their best results with hidden layers ranging in size from (2I + 1) to (2√I + K) nodes, where I is the number of input nodes and K is the number of output nodes. Experiments by the authors have shown no significant difference in results within this range (Kuligowski and Barros 1998a,b), so the smallest size, 2√I + K, was chosen in order to minimize the required computation time.

The transfer function used in this work is a variant of the widely used logistic equation (Caudill and Butler 1992; Ripley 1994):

f(y_j,n) = 2/[1 + exp(−y_j,n)] − 1,   (1)

where y_j,n is the value of hidden-layer node j for training case n (one of a total of N training cases).

The initial weights were selected following the procedure of Wessels and Barnard (1992), who found that random initial input-to-hidden-layer weights in the range of ±3/√I resulted in optimal neural network training. For the hidden-to-output layer, they conjectured that setting the initial weights to approximately the same magnitude would result in consistent weight adjustments among different portions of the neural network, and for this work those values were set as the reciprocal of J, the number of hidden-layer nodes.
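
The architecture described above can be summarized in a short sketch. The code below is illustrative only: the bipolar form of the logistic transfer function, the exact initialization constants, and all names are assumptions consistent with this section's description rather than the authors' implementation.

```python
import numpy as np

def transfer(y):
    # Bipolar logistic, bounded by -1 and +1, consistent with the text's
    # description of Eq. (1); the exact published form may differ slightly.
    return 2.0 / (1.0 + np.exp(-y)) - 1.0

def build_network(n_inputs, n_outputs=1, seed=0):
    # Hidden-layer size: the smallest value in the Fletcher and Goss (1993)
    # range, 2*sqrt(I) + K.  Input-to-hidden weights are small random values
    # scaled by the number of inputs (in the spirit of Wessels and Barnard
    # 1992); hidden-to-output weights are set to 1/J.  The constants are
    # illustrative assumptions, not the authors' exact choices.
    rng = np.random.default_rng(seed)
    n_hidden = int(round(2.0 * np.sqrt(n_inputs) + n_outputs))
    w_in = rng.uniform(-3.0, 3.0, size=(n_inputs, n_hidden)) / np.sqrt(n_inputs)
    w_out = np.full((n_hidden, n_outputs), 1.0 / n_hidden)
    return w_in, w_out

def forward(x, w_in, w_out):
    # Forward pass: nonlinear transfer at the hidden layer, weighted sum at
    # the single output node.
    return transfer(x @ w_in) @ w_out

# One independent network per variable and per level, e.g. 850-hPa
# temperature from the 31 retained brightness-temperature predictors:
w_in, w_out = build_network(n_inputs=31)
```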

Training

The most commonly used approach for optimizing the neural-network weights is known as error backpropagation (EBP), among a host of other names (Caudill and Butler 1992; Cheng and Titterington 1994; Hecht-Nielsen 1989). This is a gradient-descent approach in which each weight is adjusted adaptively in a way that incrementally reduces the total error in the neural network output. The reader is referred to the above references for a more detailed explanation of this approach.

Although EBP is widely used, it suffers from a number of weaknesses. The three most significant are 1) training is very slow, requiring many cycles; 2) the neural network tends to become trapped in local minima, even with a momentum term; and 3) the values of the learning (and momentum) coefficients must be determined by trial and error and are not guaranteed to have the same values for different networks. In response to these problems, numerous other approaches have been attempted, with varying levels of computational cost and overall success. Adaptive learning rates (Jacobs 1988; Vogl et al. 1988) can enhance the rate of training but require more calibration than fixed learning rates and remain susceptible to local minima. Modified error functions (Oh 1997; van Ooyen and Nienhuis 1992) allow the neural network to converge toward a solution more quickly and are less prone to becoming trapped in local minima, but they also slow the training process further. Simulated annealing (Porto et al. 1995) is very successful at avoiding local minima but requires enormous amounts of training time as compared with EBP. Quasi-Newton convergence schemes (Watrous 1987) are extremely fast but are also extremely memory intensive.

The approach used here is the layer-by-layer (LBL) method as modified by Oh and Lee (1999), which determines an optimum learning rate at each iteration based on the error from that iteration instead of using a predetermined learning rate. As a result, the LBL algorithm converges more quickly than EBP and does not need a learning parameter determined by trial and error. The learning coefficient β is computed as follows:

[Eq. (2)]

where ∂E_out/∂w_j, the change in output error with respect to output weight j, is the same as in standard backpropagation; that is,

[Eq. (3)]

where z_t,n is the target value of the neural network output for training sample n, z_n is the actual network output, w_j is the weight connecting hidden node j with the output node, and f(y_j,n) is the value of hidden node j after applying the transfer function [Eq. (1)].

The hidden-layer node errors are also estimated differently from the EBP method. Target values for the hidden-layer nodes, f(y^t_j,n), are estimated as follows:

[Eq. (4)]

where ∂E_out/∂f(y_j,n) is computed similarly to ∂E_out/∂w_j from Eq. (3):

[Eq. (5)]

Because the transfer function can only produce values between −1 and 1, truncation may be necessary to bring f(y^t_j,n) within this range. The target value for the pretransfer function is then estimated by inverting the transfer function to obtain y^t_j,n from f(y^t_j,n). The hidden-layer node error E^hid_j is then computed as follows:

[Eq. (6)]

which yields the following first derivative with respect to v_i,j, the weight connecting input node i with hidden node j; this derivative forms the basis for the changes to the hidden-layer weights:

[Eq. (7)]

Figure 3 is an example of the training curves [changes in root-mean-square error (rmse) with training, evaluated on independent data] for a neural network using the EBP approach with four different values of β and the LBL algorithm. The figure illustrates the problems that can occur in EBP training if β is not chosen correctly: the training will become prematurely trapped in a local minimum or will be unstable and will "overshoot" the minimum value. The LBL approach avoids both of these pitfalls. The stable training of the LBL is also valuable when training is to be stopped once the rmse reaches a minimum value, because EBP also produces numerous false minima in the training curves that can halt training prematurely.

Before continuing, a few general limitations of backpropagation networks should be noted. As stated previously, neural networks can emulate nonlinearities in the relationship between datasets—an advantage over other approaches that use linear methods or that fit preselected equations to the data. However, the backpropagation network, like any other method, is limited to finding those relationships that are actually present in the data used to train it. Therefore, the usefulness of the results will be directly related to the appropriateness of the data presented to it. Also, backpropagation networks are more prone than linear methods to overfitting (learning relationships that apply to the training data but do not generalize to independent data), apparently because most nongeneralizing relationships in data subsets are nonlinear. This possibility makes evaluation on independent data very important; in fact, the training of the backpropagation networks in this paper was monitored using independent data rather than the training data in order to detect overfitting. Last, as with all other data analysis methods, considerable amounts of data are required to determine adequately the relevant relationships between predictors and predictands. This necessity can present problems when the historic records are short, as is the case for the newly available NOAA-15 data used in this application.

Variable selection

To optimize the training and performance of the neural network, those predictors with a weak physical relationship to the predictand should be eliminated from consideration. For instance, channel 20 of the HIRS (visible channel) is an obvious candidate for elimination, given that the radiance in the visible band is a function primarily of reflected rather than emitted radiation. The channels in the IR and microwave bands can be evaluated by examining the vertical weighting functions of the HIRS and AMSU-A instruments, as depicted in Figs. 4a–d. These weighting functions, which are vertical derivatives of the atmospheric transmissivity at the frequency for each channel, show the approximate levels of origin (and the relative contributions of each level) of the radiation emerging from the top of the atmosphere that is detected by the sensor. (Note that AMSU-A channels 1–3 and 15 are not shown in the figure; these are surface channels whose weighting functions have very narrow peaks near the earth's surface.) Those channels with weighting functions that do not possess significant contributions at the atmospheric level of interest would appear at first glance to be ideal candidates for elimination.

However, this process is complicated greatly by the high correlation among the brightness temperatures from most channels. For example, in the case of HIRS data, the squared correlation coefficients (r²) for all pairs of channels 4–19 are on the order of 0.6 and often higher. Although this value is consistent with the substantial degree of correlation exhibited by the observed temperature values at different levels of the atmosphere, such high r² values also suggest that there is a large degree of spurious correlation (correlated noise) among channels. Table 2 indicates a high degree of correlation among the tropospheric carbon dioxide channels (4–7, 10, and 14–16), the window channels (8, 13, 18, and 19), and the water vapor channels (11 and 12) for clear-sky conditions. This result is not entirely surprising given that their weighting functions overlap significantly (Figs. 4a–c), but it makes channel selection difficult. Furthermore, as indicated by Table 3, temperatures below the tropopause at the radiosonde mandatory levels are highly correlated with one another (correlation coefficients of 0.79 or greater for this sample), and high correlations also exist between tropospheric temperatures and stratospheric temperatures in the 20–30-hPa range. Strong negative correlations between tropospheric temperatures and temperatures immediately above the tropopause (200–100 hPa) are also observed. These phenomena have also been documented by other authors, including Smith et al. (1970), Hoskins et al. (1985), and Liu and Schuurmans (1990). The negative correlations are associated with Dines' relationship: in the midlatitudes, a cold troposphere is associated with lower pressure in the upper troposphere and a warm stratosphere; the opposite is true for a warm troposphere (Liu and Schuurmans 1990).
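
For readers who wish to reproduce this kind of screening, the following NumPy sketch computes the inter-channel squared correlations and the channel-versus-level correlation profiles discussed above; the array names and shapes are assumptions for illustration, not the authors' code.

```python
import numpy as np

def channel_correlations(tb, temp):
    # tb:   (n_samples, n_channels) collocated brightness temperatures
    # temp: (n_samples, n_levels)   radiosonde temperatures at the levels
    # Returns squared inter-channel correlations (cf. Table 2) and the
    # channel-versus-level correlation profiles (cf. Fig. 5).
    r2_channels = np.corrcoef(tb, rowvar=False) ** 2
    joint = np.corrcoef(np.hstack([tb, temp]), rowvar=False)
    n_ch = tb.shape[1]
    r_channel_level = joint[:n_ch, n_ch:]
    return r2_channels, r_channel_level

# Illustrative usage with synthetic data of the same shape as the study:
rng = np.random.default_rng(0)
r2, r_profile = channel_correlations(rng.normal(size=(2013, 35)),
                                     rng.normal(size=(2013, 9)))
```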

The cumulative effect of these two factors is to produce vertical correlation functions that are different from the observed weighting functions. An example is given in Fig. 5a, which shows the correlation coefficient with observed temperature as a function of height for the first eight HIRS channels. A comparison with the corresponding weighting functions in Fig. 4a shows that the correlation among atmospheric temperatures at different levels dominates the resulting profiles and makes them substantially different from the weighting functions. The correlation structures for different channels (especially 5–8) are also more similar than would be suggested by Fig. 4a.

The correlation plots for the remaining channels are depicted in Figs. 5b–e. These figures can be used to eliminate three additional channels for the surface–500-hPa region: HIRS channel 12 and AMSU-A channels 8 and 9. The elimination of these channels can be justified further by an examination of their respective weighting functions in Figs. 4b,d and the temperature correlations in Table 3, which suggest that the weighting functions of these channels peak in regions in which the temperature has a relatively weak correlation with that of the lower troposphere. After the previous elimination of the HIRS visible channel, this leaves 31 predictor channels for the lowest eight layers. For the 250-hPa layer, none of the channels are obvious candidates for elimination, so 34 channels are used (all except the HIRS visible channel).

Neural network training, pruning, and robustness

Full predictor set

The initial step was to use the predictor set from section 3 (31 predictors for the surface to 500 hPa; 34 predictors for 250 hPa) in a backpropagation network trained using the LBL algorithm described in section 2b. Four-fifths of the 2013 available data points (1610 points) were used for training; the remaining one-fifth (403 points) were used to evaluate training, which was stopped when the rmse on this evaluation set reached a minimum (i.e., began increasing with further training). The evaluation set consisted of every fifth point in sequence from the dataset to ensure representative sampling in time and space. To reduce the effects of instrument noise on the training data, spatial averages over a 3 × 3 pixel area were used rather than the individual pixel values.
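
A sketch of the data split and stopping rule described above is given below; `train_one_cycle` and `predict` are hypothetical placeholders standing in for the LBL training step and the network forward pass, not the authors' code.

```python
import numpy as np

def split_every_fifth(X, y):
    # Hold out every fifth sample in sequence (one-fifth of the data) for
    # evaluating training; the remaining four-fifths are used for training.
    idx = np.arange(len(X))
    held_out = (idx % 5 == 4)
    return X[~held_out], y[~held_out], X[held_out], y[held_out]

def train_with_early_stopping(net, X_tr, y_tr, X_ev, y_ev,
                              train_one_cycle, predict, max_cycles=5000):
    # Stop when the rmse on the independent evaluation set reaches a minimum
    # (i.e., begins to increase with further training).  `train_one_cycle`
    # and `predict` are placeholders for the LBL training step and the
    # network forward pass.
    best_rmse, best_net = np.inf, net
    for _ in range(max_cycles):
        net = train_one_cycle(net, X_tr, y_tr)
        rmse = np.sqrt(np.mean((predict(net, X_ev) - y_ev) ** 2))
        if rmse < best_rmse:
            best_rmse, best_net = rmse, net
        else:
            break
    return best_net, best_rmse
```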

The black lines in Fig. 6 show the vertical profile of rmse for both the temperature and dewpoint retrievals. The estimates of temperature have average errors in the range of 2.5° to 3.5°C, but the dewpoint estimates are poorer, especially between 800 and 500 hPa, which is similar to the results of the operational Geostationary Operational Environmental Satellite GOES-8 sounder retrievals in Fig. 7 of Menzel et al. (1998). This would be expected given that most of the available channels are in absorption bands for carbon dioxide and ozone rather than for water vapor. The only channels with a significant sensitivity to water vapor are HIRS channels 11 and 12 (in the 5–8-μm water vapor band), and AMSU channel 2 (23.8 GHz). Under these circumstances, much of the relationship between the dewpoint and the satellite radiances is probably a product of the natural correlation between temperature and dewpoint.

Cloudy–clear separation

To alleviate the effects of cloud contamination of the IR brightness temperatures, the training data were split into cases in which clouds were present and in which clouds were absent, and separate sets of neural networks were trained on each. A priori separation of cloud-free samples from those with clouds present was accomplished using a threshold difference in brightness temperature between an infrared and microwave channel corresponding to the lower troposphere. To be specific, a threshold difference of 16°C between HIRS channel 13 and AMSU-A channel 4 produced the highest correspondence between the IR brightness temperature and observed temperature for the cloud-free cases. The training data were thus divided into 1117 clear cases and 693 cases in which cloud was present, and the numbers for the validation dataset were 250 and 153, respectively.
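
A minimal sketch of this cloud screening is shown below. The text specifies only the 16°C threshold between HIRS channel 13 and AMSU-A channel 4; the sign convention and the function name are assumptions for illustration.

```python
import numpy as np

def split_clear_cloudy(tb_hirs13, tb_amsua4, threshold=16.0):
    # Clouds depress the IR brightness temperature relative to the
    # lower-tropospheric microwave channel, so a collocation is flagged as
    # clear when the (microwave minus IR) difference stays below the
    # threshold; only the threshold magnitude is given in the text, the
    # sign convention here is an assumption.
    clear = (tb_amsua4 - tb_hirs13) < threshold
    return clear, ~clear
```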

All of the neural networks were trained using the full set of predictor variables. The HIRS data were included in the cloudy retrievals because, for the three layers tested (surface, 900 hPa, and 850 hPa), the rmse values were found to be 0.2°–1.0°C higher for the neural networks trained using only the AMSU channels, indicating that the HIRS channels provided some useful information despite the effects of cloud contamination.

The combined rmse of the resulting clear and cloudy retrievals is shown by the gray lines in Fig. 6. At some levels, the rmse was actually increased by training the clear and cloudy cases separately. This fact appears to indicate deficiencies in the neural network optimization, because performing a similar analysis using linear regression produced slightly (∼0.2°C) lower rmse when the cloudy and clear cases were trained separately. The neural networks may be overfitting, either because they do not have enough training data or because they are too large, or both. Any problems resulting from insufficient training data would certainly be exacerbated by dividing this small training dataset into even smaller clear and cloudy segments prior to training; however, the small degree of improvement even with linear regression (which is much less susceptible to overfitting) suggests that the potential for reducing the rmse by training the clear and cloudy cases separately is limited regardless of the amount of training data. As a consequence, simplifying the neural networks by removing nodes and/or synapses may be a more fruitful approach to improving optimization.

Self-pruning neural network

Removal of input nodes

To reduce the number of input nodes effectively, an appropriate hierarchy of the predictive contribution of each channel must be established. The strong correlations among various channels discussed in section 3 preclude doing so intuitively, so forward screening regression was chosen to perform this task. Although a hierarchy derived for linear relationships may not be applicable when nonlinear relationships are taken into account, it may be adequate given that a similar forward screening approach using neural networks would require a prohibitive amount of computation time.
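
The forward screening regression used to build this hierarchy can be sketched as follows; this is a generic forward-selection implementation under the usual least-squares criterion, not the authors' code.

```python
import numpy as np

def forward_screening(X, y):
    # Rank predictors by forward screening regression: at each step, add the
    # channel that most reduces the residual sum of squares of a linear fit
    # (with an intercept) to the predictand.
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining:
        best_rss, best_j = np.inf, None
        for j in remaining:
            A = np.column_stack([X[:, selected + [j]], np.ones(len(X))])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((A @ coef - y) ** 2)
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.append(best_j)
        remaining.remove(best_j)
    return selected  # channel indices in order of selection (cf. Table 4)
```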

The predictor sets for each layer are shown in order of selection by forward regression in Table 4, and they appear to be reasonably consistent with the discussion of variable selection in section 3. Selecting an appropriate number of predictors would be the next step, but experiments by the authors failed to identify a consistently useful criterion for selecting the number of inputs (e.g., the F test) that would yield optimal results on independent data; this result is consistent with the findings of Klein (1983). These experiments did indicate that a significant number of predictors (>20) were needed to achieve optimal results for both linear regression and neural networks, which suggests that a backward selection scheme (removal of predictors) may yield a more rapid optimization of the number of predictors than forward selection.

In lieu of a time-consuming trial-and-error removal of predictors, the hierarchy in Table 4 was followed as a basis for removal. After optimizing the network on the full set of predictors, the lowest-ranked input was removed and the network was retrained; this elimination process continued until performance on independent data failed to improve. Only a small number of inputs were removed by this approach (an average of 1.2 with a maximum of 8), and the reduction in rmse was also small, with a few notable exceptions (an average of 0.09°C, but a maximum of 1.17°C). It cannot be said definitively whether the small differences meant that little improvement could be achieved by this approach or whether an improved means was needed for selecting which predictors to remove.
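
The backward-removal procedure described above reduces to a simple loop; `train_and_score` is a hypothetical placeholder that trains a network on a given input list and returns its rmse on the independent evaluation data.

```python
def prune_inputs(ranked_inputs, train_and_score):
    # Drop the lowest-ranked input, retrain, and keep going while the rmse
    # on independent data improves.
    inputs = list(ranked_inputs)          # ordered best-ranked first
    best_rmse = train_and_score(inputs)
    while len(inputs) > 1:
        trial = inputs[:-1]               # remove the lowest-ranked input
        rmse = train_and_score(trial)
        if rmse >= best_rmse:
            break                         # no further improvement
        inputs, best_rmse = trial, rmse
    return inputs, best_rmse
```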

Removal of synapses

In addition to removing inputs, removing unnecessary synapses during training can also improve the generalization capacity of the network. To determine which synapses to remove, Karnin (1990) expressed the sensitivity of the network to the weight of a particular synapse as the change in network error that would result if that weight were set to zero. Because determining this sensitivity directly is time-consuming, it is approximated as

[Eq. (8)]

where Δw_ij is the change in weight w_ij from the previous training cycle, N is the total number of training cycles, w^i_ij is the value of the weight at the beginning of training, and ∂E_j/∂w_ij is computed for each cycle. The relative sensitivity of all of the weight connections between a particular pair of node layers is expressed by the local relative sensitivity index (LRSI) of Ponnapalli et al. (1999):

[Eq. (9)]

where M is the total number of weights providing input to hidden (or output) neuron j. All synapses with an LRSI below a predetermined threshold value are removed.

For the application in this paper, once the neural network had reached a minimum rmse on independent data, the LRSI was computed for each synapse, and the weights of those synapses with an LRSI below a threshold value were set to zero and were not permitted to be changed again. Various trials failed to produce a consistent optimum value for the LRSI threshold, so a varying-threshold technique was developed. An initial threshold value of 0.01 was used, and if removing the weights below the threshold value failed to produce a lower rmse after additional training, the threshold was lowered by an order of magnitude (hence reducing the number of weights removed) and the weight removal and retraining were repeated. If no reduction in rmse was achieved even when the threshold had been lowered to 0.000001, then training ended. This procedure is more computationally expensive because of the trial and error involving the threshold LRSI value, but it is also more flexible.
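
The varying-threshold pruning procedure can be sketched as follows. The sensitivity computation itself [Eqs. (8) and (9)] is represented by a placeholder (`compute_lrsi`), as is the retraining step; only the threshold-lowering logic described above is spelled out, and the function names are assumptions.

```python
def prune_synapses(net, compute_lrsi, retrain, start=1e-2, floor=1e-6):
    # Varying-threshold synapse pruning: zero out (and freeze) every weight
    # whose LRSI falls below the threshold, retrain, and keep the result if
    # the evaluation rmse improves; otherwise lower the threshold by an
    # order of magnitude and try again, ending once it drops below 1e-6.
    # `compute_lrsi` [Eqs. (8)-(9)] and `retrain` are placeholders.
    net, best_rmse = retrain(net, frozen_mask=None)
    threshold = start
    while threshold >= floor:
        frozen = compute_lrsi(net) < threshold
        candidate, rmse = retrain(net, frozen_mask=frozen)
        if rmse < best_rmse:
            net, best_rmse = candidate, rmse
        else:
            threshold /= 10.0
    return net, best_rmse
```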

The rmse profiles produced by these self-pruning neural networks are shown in gray in Fig. 7; the results before pruning are shown as black lines for comparison. Note that the changes are not consistent from layer to layer: in some cases, no pruning was performed, but in one case pruning reduced the rmse by nearly 1°C.

Principal component analysis

An attempt was also made to reduce the complexity of the neural network by using principal component analysis to condense the predictor set. To be specific, the first 10 empirical orthogonal function (EOF) coefficients were computed from a matrix consisting of all 35 predictors. Table 5 lists the EOF components for all 35 predictors for the first three EOFs; the results are consistent with the findings of section 3.
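
A sketch of this EOF-based compression using the singular value decomposition is shown below; whether the predictors were standardized first is not stated in the text and is an assumption here.

```python
import numpy as np

def eof_coefficients(X, n_eofs=10):
    # Compute the leading EOFs of the (standardized) 35-channel predictor
    # matrix via the SVD and return the corresponding coefficients, which
    # replace the raw channels as neural-network inputs.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    u, s, vt = np.linalg.svd(Xs, full_matrices=False)
    coeffs = Xs @ vt[:n_eofs].T        # EOF (principal component) coefficients
    frac_var = s[:n_eofs] ** 2 / np.sum(s ** 2)
    return coeffs, vt[:n_eofs], frac_var
```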

The first four EOFs were reconstructed and used as inputs to the neural network, but the resulting rmse values for temperature and dewpoint were 1.5°–3.5°C higher than for the self-pruning neural network using the full predictor set. The use of 10 EOFs resulted in somewhat smaller errors, but they were still 1.0°–1.5°C higher than for the full predictor set.

Results and discussion

This section examines the neural network results in more detail. Figure 8 shows scatterplots of estimated versus observed temperature for the pruned neural-network results for the 403 independent cases for all nine OPM layers; Table 5a provides basic statistics for these same cases. The overall fit of the data to the 1:1 line is very good, with correlation coefficients between the estimates and observations at or above 0.94 for all layers except layer 9 (250 hPa). Furthermore, there is very little bias (a maximum of 0.34°C) and very little systematic bias (the slope of the best-fit line exceeds 0.9 in all layers except layer 9). However, the magnitude of the rmse values suggests room for improvement. The dewpoint scatterplots in Fig. 9 are less encouraging, but, as stated before, this is to be expected. As indicated by the figure and by Table 5b, the scatter of these plots is significantly greater than for temperature (though the correlation coefficient is above 0.8 below 500 hPa and above 0.9 below 800 hPa), and significant systematic biases can be found, especially higher in the atmosphere; the retrievals tend to be too moist for low dewpoints and too dry for high dewpoints.
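
The verification statistics quoted in this section (rmse, additive bias, correlation coefficient, and best-fit slope) can be computed with a few lines of NumPy, for example:

```python
import numpy as np

def verification_stats(estimates, observations):
    # rmse, additive bias (estimate minus observation), correlation
    # coefficient, and slope of the least-squares best-fit line.
    err = estimates - observations
    rmse = np.sqrt(np.mean(err ** 2))
    bias = np.mean(err)
    r = np.corrcoef(estimates, observations)[0, 1]
    slope = np.polyfit(observations, estimates, 1)[0]
    return rmse, bias, r, slope
```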

To illustrate more clearly the performance of the scheme for individual cases and the differences between cases, the retrieved and observed soundings for Buffalo, New York, (72528) at 1200 UTC 22 August 1999 are shown in Fig. 10a and for Dulles Airport, Virginia, (72403) at 0000 UTC 20 May 1999 are shown in Fig. 10b. The 22 August sounding represents a case in which the scheme performs very well at the nine model retrieval levels, with an rmse of only 1.3°C for the 18 temperature and dewpoint values. On the other hand, the retrieval scheme performed very poorly for the 20 May 1999 data. The retrieved dewpoint values were much too high, especially between 900 and 500 hPa. In this case, the rmse for the 18 retrieved temperature and dewpoint values exceeded 13°C. The steep dewpoint gradient between 925 and 800 hPa may have contributed in part to the poor quality of the retrieval in this particular case. The weighting functions cover a relatively large vertical extent, so the presence of sharp gradients of temperature or dewpoint will be masked by the implicit integration in the vertical in the radiance values.

To provide a comparison with an operational satellite sounding technique, radiosonde match files of the Advanced TIROS Operational Vertical Sounder (ATOVS) sounding products (Reale et al. 1999) were obtained. These archives were available only from April 1999 onward, so a direct comparison of 102 independent neural network soundings and operational soundings was made. Figure 11a compares the rmse values of both retrievals and also includes the rmse values of the “first guess” used in the operational retrievals that is generated from a library search. The neural-network temperature retrievals are found to compare very well with the operational retrievals at 800 hPa and below; above that level, the neural-network rmse is as much as 0.75°C larger. However, the neural network also shows significant improvement over the first-guess retrievals except in the middle troposphere; conceivably, the use of the neural-network retrievals in place of the current operational first guess could produce better results than the current operational retrievals. Note also that no limb corrections were performed on the data used in the neural-network retrievals, which is another potential source of improvement. The neural-network dewpoint retrievals are found to be superior to the operational dewpoint retrievals at all levels, especially between 800 and 500 hPa, where the improvements are as great as 1.0°C.

A comparison of the mean additive bias of the two retrievals is shown in Fig. 11b. The neural-network retrievals have a much lower bias than the operational retrievals at 500 hPa and above; elsewhere the magnitude of the bias is similar, though everywhere the signs are opposite (cold bias for the operational retrieval, warm bias for the neural-network retrievals). The dewpoint retrievals show a consistently lower warm bias for the neural-network retrievals when compared with the operational retrievals.

Summary and conclusions

In this paper, a technique has been demonstrated for using artificial neural networks to retrieve vertical profiles of temperature and dewpoint from satellite brightness temperatures. Although the application of neural networks to the satellite retrieval problem is not new, this is the first published neural-network-based retrieval technique of which the authors are aware in which infrared and microwave data were used together to combine their relative strengths: finer spatial resolution (in the horizontal and the vertical) of the infrared instrument and relative transparency to (water) clouds for the microwave. The errors for this technique for clear imagery were compared with the current operational ATOVS retrievals (Reale et al. 1999). The neural-network temperature retrievals exhibited somewhat higher rmses in the middle troposphere but comparable errors above and below; the dewpoint retrievals were found to be improved at all levels in terms of both rmse and bias reduction. The use of the neural-network retrievals as a first guess for the ATOVS retrievals instead of the currently used library search could conceivably lead to even better results.

This method so far has been applied to retrievals for the atmosphere above a particular point in space; however, spatial fields are needed to initialize an NWP model. The use of data from multiple instruments represents a challenge to producing spatially distributed retrievals, because the AMSU-A has a much coarser spatial resolution than the HIRS, and the resolution of both instruments is still coarser than that of many regional mesoscale NWP models. A technique for resolving these issues, and thus generating retrieved spatial fields of temperature and dewpoint that appropriately account for the scaling differences among the instruments and an NWP model, will be presented in a forthcoming paper.

Acknowledgments

The authors thank Dr. Tom Kleespies of the NOAA NESDIS Office of Research and Applications for providing the archived ATOVS radiosonde match files. All of the figures in this manuscript were created using the Grid Analysis and Display System (GrADS) developed at the Center for Ocean–Land–Atmosphere Studies (COLA), and the routines for creating the skew T plots were written by Robert Hart at The Pennsylvania State University. The authors also thank the anonymous reviewers for their comments that led to improvements in the manuscript. This work was supported by NASA under an Earth System Science Fellowship awarded to the first author and under Contract NAG5-6656 with the second author and by the National Science Foundation under Contract CMS 95-01958.

REFERENCES

  • Barros, A. P., and D. P. Lettenmaier. 1993. Dynamic modeling of the spatial distribution of precipitation in remote mountainous areas. Mon. Wea. Rev. 121:1195–1214.
  • Barros, A. P., and D. P. Lettenmaier. 1994. Introduction of a very simple evaporative cooling scheme into a dynamic model of precipitation efficiency in mountainous regions. Mon. Wea. Rev. 122:2777–2783.
  • Butler, C. T., R. V. Z. Meredith, and A. P. Stogryn. 1996. Retrieving atmospheric temperature parameters from DMSP SSM/T-1 data with a neural network. J. Geophys. Res. 101:7075–7083.
  • Cabrera-Mercader, C. R., and D. H. Staelin. 1995. Passive microwave relative humidity retrievals using feedforward neural networks. IEEE Trans. Geosci. Remote Sens. 33:1324–1328.
  • Caudill, M., and C. Butler. 1992. Understanding Neural Networks, Vol. 1: Basic Networks. MIT Press, 309 pp.
  • Cheng, B., and D. M. Titterington. 1994. Neural networks: A review from a statistical perspective. Stat. Sci. 9:2–54.
  • Chevallier, F., F. Chéruy, N. A. Scott, and A. Chédin. 1998. A neural network approach for a fast and accurate computation of a longwave radiative budget. J. Appl. Meteor. 37:1385–1397.
  • Fletcher, D., and E. Goss. 1993. Forecasting with neural networks: An application using bankruptcy data. Inf. Manage. 24:159–167.
  • Gardner, M. W., and S. R. Dorling. 1998. Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmos. Environ. 32:2627–2636.
  • Hassibi, B., and T. Kailath. 1995. H∞ optimal training algorithms and their relation to backpropagation. Advances in Neural Information Processing Systems, G. Tesauro et al., Eds., Vol. 7, MIT Press, 191–199.
  • Hassibi, B., A. H. Sayed, and T. Kailath. 1994. LMS and backpropagation are minimax filters. Theoretical Advances in Neural Computation and Learning, V. Roychowdhury et al., Eds., Kluwer, 424–449.
  • Hecht-Nielsen, R. 1989. Theory of the backpropagation neural network. Proc. Int. Joint Conf. on Neural Networks, 1989: IJCNN-89, Washington, DC, IEEE, I593–I606.
  • Heckley, W. A., G. Kelly, and M. Tiedtke. 1990. On the use of satellite-derived heating rates for data assimilation within the Tropics. Mon. Wea. Rev. 118:1743–1757.
  • Hoskins, B. J., M. E. McIntyre, and A. W. Robertson. 1985. On the use and significance of isentropic potential vorticity maps. Quart. J. Roy. Meteor. Soc. 111:877–946.
  • Isaacs, R. G., R. N. Hoffman, and L. D. Kaplan. 1986. Satellite remote sensing of meteorological parameters for global numerical weather prediction. Rev. Geophys. 24:701–743.
  • Jacobs, R. A. 1988. Increased rates of convergence through learning rate adaptation. Neural Networks 1:295–307.
  • Jung, T., E. Ruprecht, and F. Wagner. 1998. Determination of cloud liquid water path over the oceans from Special Sensor Microwave/Imager (SSM/I) data using neural networks. J. Appl. Meteor. 37:832–844.
  • Karnin, E. D. 1990. A simple procedure for pruning back-propagation trained neural networks. IEEE Trans. Neural Networks 1:239–242.
  • Klein, W. H. 1983. Objective specification of monthly mean surface temperature from mean 700 mb heights in winter. Mon. Wea. Rev. 111:674–691.
  • Krasnopolsky, V. M., L. C. Breaker, and W. H. Gemmill. 1995. A neural network as a nonlinear transfer function model for retrieving surface wind speeds from the Special Sensor Microwave Imager. J. Geophys. Res. 100:11033–11045.
  • Krishnamurti, T. N., and H. S. Bedi. 1996. A brief review of physical initialization. Meteor. Atmos. Phys. 60:137–142.
  • Krishnamurti, T. N., J. Xue, H. S. Bedi, K. Ingles, and D. Oosterhof. 1991. Physical initialization for numerical weather prediction over the Tropics. Tellus 43AB:53–81.
  • Krishnamurti, T. N., H. S. Bedi, and K. Ingles. 1993. Physical initialization using SSM/I rain rates. Tellus 45A:247–269.
  • Krishnamurti, T. N., G. D. Rohaly, and H. S. Bedi. 1994. On the improvement of precipitation forecast skill from physical initialization. Tellus 46A:598–614.
  • Kuligowski, R. J., and A. P. Barros. 1998a. Experiments in short-term precipitation forecasting using artificial neural networks. Mon. Wea. Rev. 126:470–482.
  • Kuligowski, R. J., and A. P. Barros. 1998b. Using artificial neural networks to estimate missing rainfall data. J. Amer. Water Resour. Assoc. 34:1437–1447.
  • Kuligowski, R. J., and A. P. Barros. 1999. High-resolution short-term quantitative precipitation forecasting in mountainous regions using a nested model. J. Geophys. Res. 104:31553–31564.
  • Lipton, A. E. 1993. Cloud shading retrieval and assimilation in a satellite–model coupled mesoscale analysis system. Mon. Wea. Rev. 121:3062–3081.
  • Lipton, A. E., and T. H. Vonder Haar. 1990a. Mesoscale analysis by numerical modeling coupled with sounding retrieval from satellites. Mon. Wea. Rev. 118:1308–1329.
  • Lipton, A. E., and T. H. Vonder Haar. 1990b. Preconvective mesoscale analysis over irregular terrain with a satellite–model coupled system. Mon. Wea. Rev. 118:1330–1358.
  • Lipton, A. E., G. D. Modica, S. T. Heckman, and A. J. Jackson. 1995. Satellite–model coupled analysis of convective potential in Florida with VAS water vapor and surface temperature data. Mon. Wea. Rev. 123:3292–3304.
  • Liu, Q., and C. J. E. Schuurmans. 1990. The correlation of tropospheric and stratospheric temperatures and its effect on the detection of climate changes. Geophys. Res. Lett. 17:1085–1088.
  • Menzel, W. P., F. C. Holt, T. J. Schmit, R. M. Aune, A. J. Schreiner, G. S. Wade, and D. G. Gray. 1998. Application of GOES-8/9 soundings to weather forecasting and nowcasting. Bull. Amer. Meteor. Soc. 79:2059–2077.
  • Oh, S-H. 1997. Improving the error backpropagation algorithm with a modified error function. IEEE Trans. Neural Networks 8:799–803.
  • Oh, S-H., and S-Y. Lee. 1999. A new error function at hidden layers for fast training of multilayer perceptrons. IEEE Trans. Neural Networks 10:960–964.
  • Ponnapalli, P. V. S., K. C. Ho, and M. Thomson. 1999. A formal selection and pruning algorithm for feedforward artificial neural network optimization. IEEE Trans. Neural Networks 10:964–968.
  • Porto, V. W., D. B. Fogel, and L. J. Fogel. 1995. Alternative neural network training methods. IEEE Expert 10:16–22.
  • Puri, K., and M. J. Miller. 1990. The use of satellite data in the specification of convective heating for diabatic initialization and moisture adjustment in numerical weather prediction models. Mon. Wea. Rev. 118:67–93.
  • Puri, K., and N. E. Davidson. 1992. The use of infrared satellite cloud imagery as proxy data for moisture and diabatic heating in data assimilation. Mon. Wea. Rev. 120:2329–2341.
  • Reale, A., M. Chalfant, and L. Wilson. 1999. Scientific status of NOAA Advanced-TOVS sounding products. Tech. Proc. 10th Int. TOVS Study Conf., Boulder, CO, Int. TOVS Working Group, IAMAP, 437–446.
  • Ripley, B. D. 1994. Neural networks and related methods for classification. J. Roy. Stat. Soc. 56:409–456.
  • Schmidlin, F. J., and A. Ivanov. 1998. Radiosonde relative humidity sensor performance: The WMO intercomparison—Sept. 1995. Preprints, 10th Symp. on Meteorological Observations and Instrumentation, Phoenix, AZ, Amer. Meteor. Soc., 68–71.
  • Smith, W. L., H. M. Woolf, and W. J. Jacob. 1970. A regression method for obtaining real-time temperature and geopotential height profiles from satellite spectrometer measurements and its application to Nimbus 3 “SIRS” observations. Mon. Wea. Rev. 98:582–603.
  • Stogryn, A. P., C. T. Butler, and T. J. Bartolac. 1994. Ocean surface wind retrievals from Special Sensor Microwave Imager data with neural networks. J. Geophys. Res. 99:981–984.
  • Turpeinen, O. M. 1990. Diabatic initialization of the Canadian Regional Finite-Element (RFE) model using satellite data. Part II: Sensitivity to humidity enhancement, latent-heating profile and rain rates. Mon. Wea. Rev. 118:1396–1407.
  • Turpeinen, O. M., L. Garand, R. Benoit, and M. Roch. 1990. Diabatic initialization of the Canadian Regional Finite-Element (RFE) model using satellite data. Part I: Methodology and application to a winter storm. Mon. Wea. Rev. 118:1381–1395.
  • van Ooyen, A., and B. Nienhuis. 1992. Improving the convergence of the back-propagation algorithm. Neural Networks 5:465–471.
  • Vogl, T. P., J. K. Mangis, A. K. Rigler, W. T. Zink, and D. L. Alkon. 1988. Accelerating the convergence of the back-propagation method. Biol. Cybern. 59:257–263.
  • Watrous, R. L. 1987. Learning algorithms for connectionist networks: Applied gradient methods of nonlinear optimization. Proc. IEEE First Int. Conf. on Neural Networks, San Diego, CA, IEEE, 619–627.
  • Wessels, L. F. A., and E. Barnard. 1992. Avoiding false local minima by proper initialization of connections. IEEE Trans. Neural Networks 3:899–905.
  • Wright, B. J., and B. W. Golding. 1990. The interactive mesoscale initialization. Meteor. Mag. 119:234–244.
  • Wu, X., G. R. Diak, C. M. Hayden, and J. A. Young. 1995. Short-range precipitation forecasts using assimilation of simulated satellite water vapor profiles and column cloud liquid water amounts. Mon. Wea. Rev. 123:347–365.
Fig. 1. Spectral diagram of the HIRS channels at left and the associated absorption and window regions to the right. The molecules associated with each absorption region are indicated.

Fig. 2. Same as Fig. 1 but for AMSU-A.

Fig. 3. Training curves for a neural network on a sample of data for the EBP algorithm with four different values of the learning coefficient β and for the LBL algorithm.

Fig. 4. Weighting functions for channels (a) 1–8, (b) 9–12, and (c) 13–19 of the HIRS, and (d) channels 4–14 of the AMSU-A. The HIRS curves are courtesy of M. Chalfant of the National Environmental Satellite, Data, and Information Service (NESDIS); the AMSU-A curves are courtesy of N. Grody of NESDIS.

Fig. 5. Profiles of correlation coefficient of satellite brightness temperatures with observed temperatures as a function of height for channels (a) 1–8, (b) 9–12, and (c) 13–19 of the HIRS and channels (d) 1–8 and (e) 9–15 of the AMSU-A.

Fig. 6. Profiles of rmse of temperature (solid) and dewpoint (dashed) with height for the trained neural networks. The black lines represent the neural networks trained using all of the data, and the gray lines represent the results from neural networks trained on cloudy and clear cases separately.

Fig. 7. Profiles of rmse of temperature (solid) and dewpoint (dashed) with height for the neural networks trained on the full input set (black lines) and after pruning selected input nodes and synapses (gray lines).

Fig. 8. Scatterplots of neural-network estimates of temperature vs observed temperature for the independent data (403 cases) for all nine OPM layers. All temperatures are in degrees Celsius.

Fig. 9. Same as Fig. 8 but for dewpoint.

Fig. 10. Comparison of the vertical temperature and dewpoint profiles at (a) Buffalo, NY, at 0000 UTC 22 Aug 1999 and (b) Dulles Airport, VA, at 1200 UTC 20 May 1999 according to radiosonde observations (thick solid lines) and the neural-network retrievals (thick dot–dashed lines). The left line of each pair represents dewpoint; the right line represents temperature. The thin lines sloping upward and to the right represent temperature; the thin solid lines sloping upward and to the left with an upward concavity represent potential temperature; the thin lines sloping upward and to the left with a downward concavity represent wet-bulb potential temperature. All values are in degrees Celsius on the abscissa (potential temperature = actual temperature at 1000 hPa). The dot–dashed lines sloping upward and to the right represent mixing ratio in grams per kilogram, with values given on the ordinate on the right-hand side of the plot and also along the top of the plot.

Fig. 11. Vertical profiles of (a) rmse and (b) bias for the neural-network retrievals (solid lines), the operational ATOVS retrievals (dashed lines), and the first guess used in the ATOVS retrievals (dotted lines). The dewpoint retrievals are indicated by the use of dots at the actual retrieval levels. The thin vertical line on the bias plot indicates zero bias.

Table 1. List of radiosondes that were used to provide collocated sounding data for this study.

Table 2. Correlation tables of observed temperatures vs brightness temperatures from HIRS channels 1–10, HIRS channels 11–20, and AMSU-A channels 1–15 for collocated satellite and radiosonde data for 26 Oct 1998 through 31 Aug 1999. Italics and boldface are used to mark correlation values exceeding 0.5 and 0.7, respectively. Sfc indicates surface.

Table 3. Predictors in neural-network training. The “direct” channels have weighting functions that directly overlap the atmospheric layer of interest, and the “indirect” channels have weighting functions that overlap a region in the stratosphere for which the temperature is strongly correlated with the temperature in the atmospheric layer of interest.

Table 4. Predictors used in neural-network training, in order of selection by a forward screening regression scheme. For brevity, “A” indicates AMSU-A channels and “H” indicates HIRS channels. Sfc is surface.

Table 5. Basic statistics of the neural-network temperature and dewpoint retrievals for 403 samples of independent data.