## Abstract

The authors have investigated the possibility of elaborating a new generation of radiative transfer models for climate studies based on the neural network technique. The authors show that their neural network–based model, NeuroFlux, can be used successfully for accurately deriving the longwave radiative budget from the top of the atmosphere to the surface. The reliable sampling of the earth’s atmospheric situations in the new version of the TIGR (Thermodynamic Initial Guess Retrieval) dataset, developed at the Laboratoire de Météorologie Dynamique, allows for an efficient learning of the neural networks. Two radiative transfer models are applied to the computation of the radiative part of the dataset: a line-by-line model and a band model. These results have been used to infer the parameters of two neural network–based radiative transfer codes. Both of them achieve an accuracy comparable to, if not better than, the current general circulation model radiative transfer codes, and they are much faster. The dramatic saving of computing time based on the neural network technique (22 times faster compared with the band model), 10^{6} times faster compared with the line-by-line model, allows for an improved estimation of the longwave radiative properties of the atmosphere in general circulation model simulations.

## Introduction

Knowledge of the radiative properties of the atmosphere is fundamental for an improved understanding of the processes that influence the earth’s climate and can modulate its variations. Although important progress has been made since satellite-borne instruments, such as the earth radiation budget instruments, and has provided spectrally integrated observations of the radiative budget at the top of the atmosphere (TOA), dramatic uncertainties remain concerning the extent to which clouds, water vapor, and other minor absorbing gases act to modify the greenhouse effect of the earth’s atmosphere and to feedback on the earth’s climate. Further progress toward answering those questions requires the understanding of the couplings between the thermodynamic structure of the atmosphere and the localization of the radiative energy sources and sinks within it. Up to now, the need for rapid codes to deal with global and long-term datasets appeared to be somewhat inconsistent with the accuracy needed for climate studies. As a matter of fact, accurate modeling of the atmospheric radiative processes requires line-by-line models, and so, because they consume so much computing time, cannot be used for climate (global, long term) simulations. For these reasons, numerous, highly parameterized radiative transfer schemes have been developed. The restricted accuracy of these schemes has been demonstrated to have important consequences on the sensitivity studies of climate models to various forcings. This is the case for carbon dioxide doubling experiments in which the scattering of the radiative simulations obtained by several general circulation models prevents a clear understanding of the radiative impact of this greenhouse gas on the earth’s climate (Cess et al. 1993), even though this is not an intrinsic limitation of all such schemes. In spite of these weaknesses in the parameterizations, the radiative computational burden still remains high. As a consequence, the temporal resolution of the radiative fluxes diurnal cycle in the simulations is often degraded. For instance, the longwave radiative code requires 10% of the computing time in the European Centre for Medium-Range Weather Forecasts (ECMWF) general circulation model and 18% in the climate model at Laboratoire de Météorologie Dynamique (LMD) (Sadourny and Laval 1984), although the radiative variables in both models are not initialized at every time step. It is recognized (e.g., Wilson and Mitchell 1986) that failure to resolve the diurnal cycle adequately can lead to a degradation of the general circulation model simulations. Faster and accurate radiative codes are the key to a better account of the diurnal cycle.

Statistical methods draw interesting prospects to solve this contradiction between speed and accuracy. In this paper we present a fast and accurate method for the computation of longwave flux profiles for clear as well as cloudy situations. The scheme, hereafter referred to as “NeuroFlux,” is based on a neural network technique—the multilayer perceptron (MLP), as defined by Rumelhart et al. (1986). The next section is devoted to a brief description of the MLP. In section 3 we review the radiative transfer equations for longwave fluxes in clear and cloudy atmospheres and describe the features of the neural network–based radiative transfer scheme. In section 4 we present the setup of the datasets on which the parameters of the neural networks are inferred. Section 5 presents the results of flux calculations with NeuroFlux.

## The multilayer perceptron

Neural network techniques, among others, allow us to reduce the number of calculations contained in a function *f*: *A* → *B,* where *A* ⊂ **R**^{n} and *B* ⊂ **R**^{m}, by approximating *f* with a parameterized function *g*:

The parameters are gathered in vector ** θ** ∈

**R**

^{p}and are called “synaptic weights.”

These statistical methods have proved their ability to deal with nonlinear problems, in particular the MLP, as defined by Rumelhart et al. (1986). The MLP relies on processors, called “formal neurons” or “neurons” with reference to the biological analogy. A neuron computes a weighted sum of its inputs and transfers this signal through a sigmoidal function (here the hyperbolic tangent). The weights are the parameters in vector ** θ**. The neurons are gathered in layers. One or more “hidden” layers of neurons may be introduced between the input layer and the output layer. Figure 1 shows an example of a single-hidden-layer network. Each neuron of layer

*k*receives information from the previous layer

*k*− 1. Information from the input layer (layer 0) is propagated to the output layer (the last layer) through the hidden layer (if any). All of the neural networks used here have a single hidden layer. The nonlinear response of a neuron to its inputs, due to the sigmoidal function, allows for the nonlinear fitting of the function parameterized by the neural network. The error introduced by the use of

*g*instead of

*f*depends on the distance of

*f*to the allowed functions

*g*on

*B.*Given a norm ‖·‖ on

*B*and given

*ζ,*the maximum error that we tolerate, we want:

The learning phase is devoted to the optimization of the MLP. That is, according to the representativity of the learning set, the algorithm selects a vector ** θ** among all of the possible

**’s. The so-called back-propagation algorithm enables us to derive the parameters in an iterative way. This method minimizes the quadratic error between the function computed by the network**

*θ**g*and the function to be approximated

*f*for a set of patterns for which inputs and ouputs are known—the “learning set.”

## A neural network model for the longwave radiative transfer

### Longwave radiative transfer equations

We restrict ourselves to longwave (LW) radiative transfer modeling. The formal solution of the radiative transfer equation for a stratified cloudless atmosphere in local thermodynamic equilibrium is

and

where *F*^{↑}(*P*) (*F*^{↓}(*P*)) is the upward (downward) radiative flux at the pressure level *P,* integrated over the longwave spectrum; *I*^{↑}_{ν}(*P, μ*) is the monochromatic radiance of frequency *ν* at pressure level *P,* propagating in a direction such that *μ* is the cosine of the zenith angle; *P*_{0} is the pressure at the surface; *ν* is the frequency; *B*_{ν}(*T*_{P}) is the Planck function at temperature *T*_{P} at pressure level *P*; and *τ*_{ν}(*P*′, *P, μ*) is the monochromatic flux transmittance for isotropic radiation between the levels of pressure *P* and *P*′. By introducing the monochromatic surface emissivity *ε*^{s}_{ν}, the surface radiative emission may be expressed by

We divide the atmosphere into *N* plane-parallel layers from the surface to the TOA. The bottom layer is layer 1 and the top layer is layer *N.* Following the graybody approximation (Stephens 1978), we describe the cloudiness in a layer *k* with its horizontal coverage of the grid box *n*_{k} and its longwave emissivity *ε*_{k}. The only assumptions required are that clouds do not reflect in the longwave and fill the model layer in the vertical.

When (*n*_{k}*ε*_{k} = 1) in a layer *k* and (*n*_{l}*ε*_{l} = 0) in the other layers, *l* ≠ *k,* the only cloud in the grid box is a black cloud (*ε*_{k} = 1) in layer *k,* and it completely covers the grid box (*n*_{k} = 1). In this case the vertical upward (downward) fluxes on level *i* above (below) the black cloud in layer *k* can be expressed as

and

where *i* ranges from 0 (the surface level) to *N* (the TOA), and *P*_{k} is the pressure at the top of layer *k.* The *F*^{↑}_{k}’s (*F*^{↓}_{k}’s) below (upon) the black cloud are the clear-sky fluxes given in (3) and (4).

In the general case, in which the effective emissivity *n*_{k}*ε*_{k} is measured in a layer *k* by a number between 0 and 1, we use the probability of a clear line of sight between the different levels of calculation. This model of multilayer clouds leads to the following expressions for the upward and downward fluxes:

and

where *H* (*J*) is the index of the highest cloudy layer below (above) the level of calculation, *C*_{k,i} is the probability of a clear line of sight between the levels *k* and *i,* and *F*^{↑}_{k}(*P*_{i}) *F*^{↓}_{k}(*P*_{i}) is the upward (downward) flux at pressure level *P*_{i} if the only cloud in the atmosphere was a blackbody in layer *k* [(6) and (7)]. With this formalism, *F*^{↑}_{0} and *F*^{↓}_{0} correspond to the fluxes in the absence of clouds, described in (3) and (4). The *C*_{k,i}’s, or cloud fractional coverages, are functions of the *nε*’s and depend on the way the cloudy layers overlap. Three possibilities are taken into account according to the structure of the clouds: the overlap may be random, maximum, or maximum–random (e.g., Harshvardhan et al. 1987).

### The scheme

Equations (8) and (9) mean that the LW fluxes in the presence of multilayer graybodies, *F*^{↑} and *F*^{↓}, can be deduced from the clear-sky fluxes, *F*^{↑}_{0} and *F*^{↓}_{0}, and from the fluxes in the presence of single-layered black clouds, *F*^{↑}_{k} and *F*^{↓}_{k}, *k* > 0. In our approach, the computation of the *F*^{↑}_{k}’s and *F*^{↓}_{k}’s, *k* ≥ 0, is based on the MLP, whose basic principles are described in section 2. A first neural network (NN-Clr) computes the clear-sky part of the longwave fluxes, *F*^{↑}_{0} and *F*^{↓}_{0}. Then a battery of neural networks (the NN-Cld’s) computes the contribution of every cloudy layer, *F*^{↑}_{k} and *F*^{↓}_{k}, with *k* > 0. Each neural network among the NN-Cld’s is dedicated to the calculation of the fluxes, either upward or downward, in the presence of a single black cloud in a specified layer *k.* The overall fluxes *F*^{↑} and *F*^{↓} are then computed according to (8) and (9). Thus, if *M* is the number of allowed cloudy layers in the model, our radiative code relies on (1 + 2 × *M*) neural networks. In the version we use, the temperature, water vapor, and ozone concentrations are defined in the middle of 19 atmospheric layers (see Table 1) from the TOA to 1013 hPa. The cloudiness is described by the effective emissivity profile [(*nε*)_{i}]_{i=1,M}. The surface temperature, the mean CO_{2} concentration, and the mean longwave surface emissivity are also inputs to the model. The concentrations of the minor gases (e.g., N_{2}O and CH_{4}) are set to the mean current level.

Table 2 describes the neural network architecture. All hidden layers are single. NN-Clr uses the 60 inputs described in the previous paragraph. The 40 outputs include 20 upward fluxes and 20 downward fluxes. The number of inputs and outputs of each one of the (1 + 2*M*) cloudy networks NN-Cld depends on the position of the considered cloudy layer and on the kind of flux computed. For instance, the computation of the downward flux under a black cloud that top arises at 1005 hPa uses only the temperature at 1013 hPa [see Eq. (7)]. Thus, the input number varies between 1 (computation of the downward flux under a black cloud that top arises at 1005 hPa) and 58 (computation of the upward flux above the previous black cloud). Similarly, the output number varies between 1 and 19.

Note that every neural network can compute quantities of the same form:

where *P* and *P*_{b} are the pressures at the boundaries of a given atmospheric layer.

One neural network would be sufficient for the computation of all of the *F*^{↑}_{k}’s and the *F*^{↓}_{k}’s, both clear and cloudy, but the specialization of the neural networks in our model in selected fields of the computations improves the accuracy of the neural network simulations because it induces a smaller number of inputs and fewer parameters to set.

## Training neural networks

### The TIGR (Thermodynamic Initial Guess Retrieval) dataset

It appears from the neural network method (described in section 2) that the accuracy of our scheme stems from the statistical characteristics of the training datasets of neural networks NN-Clr and NN-Clds. The essential tool for generating them is the TIGR data bank.

The second version of the data bank, TIGR-2 (Achard 1991; Escobar-Munoz 1993), groups together about 1800 atmospheric situations selected from more than 80 000 radiosonde reports collected over 10 years from all parts of the world.

The selection method is an iterative process. It relies on a distance *D*_{T}, which measures the dissimilarity in the temperature profiles of the atmospheric situations. During the first step of TIGR-2’s sampling, the first atmospheric situation in the 80 000 radiosondes is archived. At step *n* the *n*th atmospheric situation is selected if its minimum distance *D*_{T} to the already archived situations is greater than an empirically chosen number *D*^{min}_{T}. Thus, the unequal distribution of the 80 000 radiosondes over space and time disappeared. TIGR-2 can be assimilated into an almost regular networking of the observable atmospheric temperature profiles associated with their corresponding water vapor and ozone profiles.

The set is classified into quite homogeneous groups of situations determined from statistical methods, the five airmass classes: tropical (trop), midlatitude (ml1), cold temperate and summer polar (ml2), Northern Hemisphere very cold (pol1), and winter polar (pol2) (Achard 1991). TIGR-2 has been used successfully for retrieving the main thermodynamic parameters from satellite observations with a physicostatistical approach, the Improved Initialization Inversion (3I) (Chédin et al. 1985), and with neural networks [Escobar-Munoz et al. (1993), for the retrieval of temperature profiles, and Cabrera-Mercader and Staelin (1995), for the retrieval of relative humidity].

Nevertheless, two limitations in the design of the TIGR-2 dataset appeared in the framework of the present study. First, the profile selection relies only on a temperature criterion, which does not take care of the variability of the water vapor profiles. Second, the dataset does not contain enough situations with high total water vapor contents. These are mostly below 4 g cm^{−2}, whereas it has been observed that the water vapor content can exceed 7 g cm^{−2} in areas such as the warming pool in the western Pacific. These two limitations are particularly critical in the tropical class.

### A new tropical class

TIGR-2’s identified deficiencies mostly impact the tropical class. We undertook to reshape it. The new tropical class has been established according to the following parameters:

the addition of atmospheric situations inferred from satellite data to the initial radiosonde dataset using the 3I algorithm and

a new sampling method that takes the water vapor profiles into account.

The TIGR-2 dataset is based upon radiosonde reports, although they do not cover the globe equally. Instruments such as the TIROS-N (Television Infrared Observation Satellite) Operational Vertical Sounder (TOVS) aboard the National Oceanic and Atmospheric Administration (NOAA) polar-orbiting satellites have far better coverage. Traveling through all latitudes and longitudes, TOVS’s are able to observe the properties of the atmosphere over areas poorly described by radiosondes (sparsely populated lands or oceans). The 3I algorithm is designed to retrieve the temperature profile, layered water vapor contents, surface characteristics, and cloud properties from TOVS-observed radiances. The 3I algorithm has been used for the analysis of a multiyear archive of TOVS satellite data in the framework of the NOAA/National Aeronautical and Space Administration (NASA) Pathfinder program (Scott et al. 1996). The output products for the morning path during July 1987, January 1988, and July 1989 were extracted from our in-house archive. Only the temperature and water vapor information were used from the TOVS 3I-retrieved variables.

To design the new tropical class, we first merged two datasets: the 30 000 atmospheric situations of tropical type among the 80 000 radiosonde reports and the tropical-type 3I retrievals extracted from the archive for the three months mentioned above. More than 500 000 tropical situations were collected. We then developed an iterative method to sample the resulting set (Chevallier 1998). The method allowed us to select an atmospheric situation from the 500 000 tropical situations if its temperature profile, its water vapor profile, or the association of the two, was different enough from the already archived profiles. About 900 tropical situations (a relative majority of the 30 000 radiosonde reports) were finally selected due to their high vertical resolution.

The histograms of the water vapor contents of the new tropical dataset and of the TIGR-2 tropical class are shown in Figs. 2a and 2b. Note the more regular spread of the points of the 900 situations that have a Gaussian-like shape.

We added this tropical set to the four temperate and polar classes of TIGR-2. This new version, “TIGR-3,” contains approximately 2300 situations.

### A better representation of the climatological regimes

As shown by Stephens et al. (1996), the tropical class of TIGR-2 contains few cases of deep, moist ascent profiles: many of the atmospheric situations contained in the data bank have characteristics of the subsidence process. The irregular representation of the vertical climatological regimes is illustrated in Fig. 3a with a Rossby diagram of TIGR-2’s tropical class. The vertical coordinate Θ is the (liquid water) potential temperature, and the horizontal coordinate *q* is the water vapor mixing ratio. Each TIGR-2 tropical situation is represented by 14 points between 300 and 1013 hPa. The highest part of the profiles appears in the top left corner. For higher pressure levels, the points spread in a triangular shape, indicative of the thermodynamic constraints on the atmospheric profiles. According to their location on the diagram, a point from a situation can be attributed to a particular climatological regime: convection; radiative cooling; upper-tropospheric, meridional transport and subsidence; entrainment; and boundary layer transport (Boers and Prata 1996). Figure 3a shows an irregular spread of the points, indicating an irregular repartition of the TIGR-2 tropical situations on the climatological regimes. Figure 3b presents the same type of projection with the TIGR-3 tropical class. The overall density of the points has increased due to the increased number of atmospheric situations. More importantly, the filling is more regular. We, therefore, have improved the distribution of the different vertical climatological regimes with the new tropical class—in particular, the convection process.

### Surface characteristics, CO_{2} concentration, and cloudiness information

The learning sets that correspond to the clear and cloudy neural networks, upon which our radiative transfer scheme is based, were selected from TIGR-3. In order to take into account existing discontinuities between the surface temperature and the surface air temperature, we defined a surface temperature for each of the TIGR-3 situations with a random drawing, allowing temperature discontinuities up to 10 K. The mean longwave surface emissivity and the CO_{2} concentration were also randomly chosen: between 0.9 and 1.0 and between 200 and 900 ppmv, respectively. For the cloudy-sky learning datasets, black clouds in each atmospheric layer among the 19 were alternately introduced, even though the thermodynamic description of a layer sometimes made the presence of a cloud unlikely. Thus, the learning sets contain atmospheric situations that may never be observed in the atmosphere: they are not only compilations of real atmospheric situations but are also devoted to teach to the networks the computation of longwave radiative fluxes from thermodynamic profiles. For this purpose, it is suitable that a physical phenomenon—such as the presence of a cloud in the atmospheric layers—has in the learning datasets, a regular distribution, rather than a more realistic distribution.

### The vertical pressure grid

All atmospheric profiles in the learning datasets have been interpolated from the 40 pressure-level grid on which they are archived in the TIGR dataset, on the grid that NeuroFlux uses. We defined NeuroFlux in such a way that each version of the scheme is dedicated to a specified vertical pressure grid. This is not the case with the usual wideband models, which allow the pressure grid to be changed easily. In the versions of NeuroFlux that we use in this paper, the temperature, water vapor, and ozone profiles are discretized in the middle of 19 layers, as shown in Table 1. The pressure levels of this grid are fixed. Surface pressures less than 1013 hPa can be introduced with a blackbody at the ground pressure:the altitude is treated in the same way as is cloudiness.

If one wants to choose a different vertical grid, one has to retrain the networks on this new grid. The most time-consuming computations are absorbed in the learning phase: once it is done, the method is very fast. The sigma-pressure-level grids used in most GCMs can be introduced easily by adding the surface pressure to the inputs of the scheme.

### Computation of the radiative characteristics of the learning datasets

We used two radiative forward models to obtain the radiative characteristics of these profiles [the LW fluxes from (3), (4), (6), and (7)]: the ECMWF wideband operational scheme (Morcrette 1991; Zhong and Haigh 1995) and the Automatized Atmospheric Absorption Atlas line-by-line model (4A) (Scott and Chédin 1981; Tournier et al. 1995). In the following, the ECMWF wideband model will be referred to as “the WBM.” It uses averages of the absorption parameters over spectral intervals of hundreds of inverse centimeters, whereas 4A takes into account each absorption line of the different atmospheric constituents. In an earlier paper, we also presented some results of the neural network model using a narrowband model (Chéruy et al. 1996a).

The WBM refers to nine spectral intervals, for which absorption by H_{2}O, O_{3}, CO_{2}, CH_{4}, N_{2}O, CFC-11, and CFC-12 is accounted. This model was calibrated against the Geophysical Fluid Dynamics Laboratory (Schwarzkopf and Fels 1991) line-by-line results. The parameterization of the water vapor has been improved recently by Zhong and Haigh (1995). This code is fast enough to be used in operational weather forecasts (at ECMWF) and for climate simulations (in the LMD general circulation model).

The 4A model takes into account all radiatively active atmospheric constituents, the H_{2}O, N_{2}, and O_{2} continua and the CO_{2} line coupling. This line-by-line and layer-by-layer method allows us to quickly restore monochromatic optical depths for each absorbing gas in each atmospheric layer for realistic atmospheric situations: the most time-consuming computations have been absorbed, once and for all, by precomputations, the results of which constitute the 4A atlas. The complete validation of the method for high spectral resolution is based on comparisons with the high-resolution interferometer sounder (Smith et al. 1988) measurements. The 4A model is much more accurate than the WBM, but its lengthy computing time forbids its use for long-term global calculations.

In numerical models of atmospheric circulation, the radiative cooling rates are used instead of the fluxes themselves. The cooling rates, in K day^{−1}, are the divergences of the radiative fluxes. They equal the rate at which energy is lost by the atmosphere from radiation:

where *F*(*P*) = *F*^{↑}(*P*) − *F*^{↓}(*P*) is the net flux, *C*_{p} is the heat capacity at constant pressure, and *g* is the gravitational acceleration.

To compare the two codes, we have computed the mean and standard deviation of their computations of the fluxes and cooling rates for 1032 radiosonde reports (Moulinier 1983). These observations—in the following set “S”—cover a wide range of atmospheric situations. As we did for the 80 000 radiosonde reports, we classified S into the five TIGR airmass classes. For the forthcoming statistics, the two midlatitude classes (ml1 and ml2) are mixed in a single midlatitude class and the two polar classes (pl1 and pl2) are mixed in a single polar class. Thus, the 1032 radiosonde reports are divided into three airmass classes: tropical (265 situations), midlatitude (509 situations), and polar (258 situations). The radiative computations of these atmospheric situations are performed cloudless. The atmosphere is divided into the 19 layers from Table 1. For 4A, the monochromatic optical depths are first computed on 39 layers from the archived optical depths before being interpolated into the 19 layers.

Figure 4 shows the differences between the two codes, 4A and WBM, for the computation of cooling rates, and Table 3 for the outgoing longwave radiances (OLRs) and downward longwave radiance at the surface (DLR). The absolute value of the sytematic differences between the cooling rates from 4A and from WBM reaches 0.4 K day^{−1} at 300 hPa and near the ground in the tropical class (Fig. 4). The sign changes around 500 hPa. Compared with 4A, the WBM tends to cool the high troposphere and heat the boundary layer. The standard deviations are of the order of 0.2 K day^{−1} above 600 hPa and increase below. Overall, the differences increase with the water vapor content. For the OLR, the mean difference is between 2.1 W m^{−2} in the tropical class and 0.7 W m^{−2} in the polar class. The standard deviations are about 0.7 W m^{−2}. For the DLR, the differences are larger, with an rms of the order of 3 W m^{−2}.

## Performances

### NeuroFlux-A: Comparisons with WBM using radiosonde data

The *F*^{↑}_{n}’s and the *F*^{↓}_{n}’s for the training of the neural networks were first computed with WBM. These learning datasets led to a first version of our neural network model, called NeuroFlux-A.

Radiative fluxes obtained by using either NeuroFlux-A or WBM have been compared for the 1032 radiosonde reports from the set S. None of them was used in the learning datasets of the neural networks. For the three airmass classes, we computed the biases and standard deviations of the differences between the radiative calculations of NeuroFlux-A and WBM. The computations are performed cloudless. Results are shown in Fig. 5. The standard deviations are about 0.2 K day^{−1}. The absolute value of the biases are less than 0.05 K day^{−1} in the midlatitude and polar classes. In the tropical class the absolute value is slightly higher but less than 0.1 K day^{−1} except near the ground, where the bias reaches −0.2 K day^{−1}. Compared with Fig. 4, the standard deviations are similar, but the biases are significantly smaller. Because NeuroFlux is based on principles that differ greatly with the principles of WBM, the maximum errors are also interesting characteristics of the validation. For NeuroFlux-A, the maximum error in the troposphere equals 1.8 K day^{−1}, less than the maximum error of the WBM compared with 4A, which reaches 2.9 K day^{−1}.

For the computation of OLR and DLR (Table 4), the standard deviations are similar to those between WBM and 4A: 1.0 W m^{−2} for OLR and 1.6 W m^{−2} for DLR. The biases, whose absolute value is less than 1.0 W m^{−2}, are significantly smaller.

We checked the accuracy of the cloudy neural networks by artificially adding black clouds in the radiosondes and comparing the corresponding computations of the WBM and NeuroFlux-A. The neural network–based model exhibits, with WBM, differences similar to the ones observed in clear-sky situations (results not shown).

### NeuroFlux-A: Comparisons with the WBM using TOVS 3I global data

To study the robustness of the method, we have extended the previous comparisons between the two models, WBM and NeuroFlux-A, to an observation dataset consisting of two years of global data. To do so, we used the geophysical parameters inferred from TOVS data, actually the TOVS 3I–retrieved products morning path, obtained from April 1987 to March 1989 (Scott et al. 1996). The computation of the radiative fluxes from the TOVS 3I products using WBM is explained in Chéruy et al. (1996b). Each month groups together hundreds of thousands of tropical, midlatitude, and polar clear and cloudy atmospheric situations. We computed the global monthly mean error of the instantaneous differences. For upward flux calculations (Fig. 6a), the rms of differences between NeuroFlux-A and WBM remains between 1 and 2 W m^{−2}, which represents less than 0.6% of the upward flux values. For downward fluxes (Fig. 6b), the results are similar with a larger difference around 500 hPa of about 4%. This larger error is due to both a particularity in the TOVS 3I–retrieved profiles and the way NeuroFlux parameters are inferred. The water vapor profiles inferred from TOVS 3I—actually amounts in four coarse layers—sometimes present important discontinuities at 500 hPa. Now, in their computations, the neural networks exploit the correlations that usually exist in the learning datasets between successive water vapor specific humidities in the water vapor profiles. A coarse resolution of the profile degrades both the correlations and the quality of the results.

The difference between outputs of NeuroFlux-A and WBM decreases below 1% near the ground. For the calculation of cooling rates, deviations are less than 0.3 K day^{−1}. We obtained similar results with the evening path data.

The neural network method performs equally for the whole period. Note that the error does not particularly decrease either during July 1987 or during January 1988, although these two months were used for the extension of TIGR-2. The stability of the method enables us to use the code for long-term and global simulations, as is done in numerical models of atmospheric circulation.

### NeuroFlux-B: Comparisons with 4A

For NeuroFlux-B, the longwave fluxes of the learning datasets for the neural networks are computed with 4A. We carried out the validation of the parameterization on the 1032 radiosonde reports from the set S. Figure 7 shows the results for the computation of the cooling rates, and Table 5 shows the results for the computation of the OLR and DLR in clear situations. Although the training model is much more accurate, the differences between NeuroFlux-B and 4A are similar to the ones between NeuroFlux-A and WBM. Compared to the ones between WBM and 4A, the standard deviations are similar, but the biases are significantly smaller. The validation of the NeuroFlux-B cloudy-skies neural networks gave similar results (not shown).

Therefore, the neural network parameterization using line-by-line computations appears to be more accurate than the classic parameterization of WBM.

### NeuroFlux-B: Doubling CO_{2} concentrations experiment

The computing time of NeuroFlux-B reference code (4A) forbids global long-term validations of NeuroFlux-B as was done with NeuroFlux-A, though other kinds of validations are possible: in particular, the classic sensitivity study doubling concentrations of CO_{2}. Indeed, for a climate model it is particularly important to obtain the correct tendency to changes in CO_{2} concentrations. To assess this sensitivity, both 4A and NeuroFlux-B were used to compute the differences in cooling rates for set S resulting from doubling concentrations of CO_{2} from its current level of 355 to 710 ppmv. For brevity, only the results for the tropical class are shown. Figure 8a presents the mean and standard deviation of the 4A computations. They show known features of the atmospheric response to increased CO_{2} (e.g., Schlesinger and Mitchell 1987): the stratosphere cools and the troposphere warms. Figure 8b presents NeuroFlux-B computations and the corresponding error. NeuroFlux-B reproduces the stratospheric tendency well with negligible biases and standard deviations. In the troposphere, where the change in cooling rate is far smaller, the error is characterized by a standard deviation of about 0.03 K day^{−1}, that is, more than half of the rms signal (about 0.05 K day^{−1}). Nevertheless, the error bias is less than 0.01 K day^{−1} when the mean signal is about 0.04 K day^{−1}.

Every paramerization method brings noise into the realizations of a function *f* to be computed. The results of NeuroFlux-B validation show that the noise introduced by the neural networks is small for computations of individual fluxes or cooling rates but is not negligible in sensitivity tests, though the mean tendency is well reproduced by the method. This is very fundamental for climate studies.

### Computing time

The architecture of the neural networks enables fast computations with modern computers in spite of the number of parameters the neural networks include. We compared the computing times of NeuroFlux-A and NeuroFlux-B to the computing times of WBM and 4A, respectively.

The computing times of our schemes vary according to the number of cloud layers since for each additional layer in the atmosphere the algorithm includes two more neural networks to perform the computations. Figure 9 shows the gain in rapidity of NeuroFlux-A compared with the WBM versus the number of cloudy layers. The computations were performed on a CRAY C98. For clear skies, the neural network method saves computing time by a factor of 32, as compared with the initial scheme. For 15 cloudy layers, the computing time is smaller by a factor of 16. For an entire month of TOVS data, with both clear and cloudy situations, NeuroFlux-A is 22 times as fast.

Since the architectures of NeuroFlux-B and NeuroFlux-A neural networks are the same, their computing times are exactly the same, although the code NeuroFlux-B simulates—the 4A line-by-line model—is much more accurate. The neural networks allow us to benefit from the accuracy of the line-by-line model while drastically reducing the required computing time. Compared with 4A, the required computing time is divided by a factor of 10^{6}.

## Conclusions

We showed that neural networks can be used successfully to derive the vertical atmospheric longwave radiative budget from the TOA to the surface. We used a wideband model and a line-by-line model for the longwave radiative transfer—two codes differing widely in their computing time requirements and accuracy—to compute profiles of longwave fluxes and cooling rates from the TOA to the surface. The results of these computations were used for both training and controlling the quality of two neural network models—NeuroFlux-A and NeuroFlux-B.

Global-scale comparisons with the wideband model showed that the error associated with our parameterization stays significantly below the error commonly admitted for a wideband model. By comparison, the discrepancies observed between various wideband models in the 1991 ICRCCM-2 exercise (Ellingson and Ellis 1991) were significantly larger. For example, the discrepancies reach 12% for the downward flux at the tropopause of a midlatitude summer profile, whereas we estimated the error introduced by our parameterization to be less than 5% in similar conditions. Compared with WBM, the use of neural networks trained with the line-by-line scheme leads to a more accurate parameterization of longwave radiative processes: the systematic differences between the parameterized radiative transfer scheme and the line-by-line model are significantly reduced, while the standard deviations are similar. The impact of these differences is currently evaluated in the framework of the LMD GCM. The decrease of the biases should induce a reduction of the systematic undesirable tendencies of the simulations, such as excessive cooling in the high troposphere, as shown in Fig. 4.

The major advantage of NeuroFlux is its reduction in computing time; it is 22 times faster than WBM and 10^{6} times faster than 4A. NeuroFlux enables more frequent calls to the longwave routines and, thus, takes the diurnal cycle into account better.

## Acknowledgments

The authors wish to thank J.-J. Morcrette for making the different versions of the ECMWF radiative code available to them and for fruitful discussions. The most time-consuming computations were performed on IDRIS/CNRS (Institut du Développement et des Ressources en Informatique Scientifique Centre National de la Recherche Scientifique) computers.

## REFERENCES

## Footnotes

* Current affiliation: ECMWF, Shinfield Park, Reading, Berkshire, United Kingdom.

*Corresponding author address:* Dr. F. Chevallier, ECMWF, Shinfield Park, Reading, Berkshire RG2 9AX, United Kingdom.