## 1. Introduction

The optimal spectral sampling (OSS) approach is a fast and accurate method for treating molecular absorption in radiative transfer calculations (Moncet et al. 2008, hereafter Mon08). OSS is a conceptually simple approach for extending the *k* distribution (e.g., Goody and Yung 1989) or exponential sum fitting of transmittance (ESFT; Wiscombe and Evans 1977) techniques to vertically inhomogeneous atmospheres (e.g., West et al. 2010). In practice, OSS reduces the modeling of radiometric measurements over spectral channels to a one-dimensional optimization problem, seeking a minimal set of spectral points (nodes) such that spectrally integrated radiances (or transmittances) are well approximated as the weighted average of monochromatic radiances (or transmittances) at the selected nodes. The required accuracy of the approximation can be set when the optimization is performed. Because the method is physically based, the model depends less on the dataset used for training than statistical methods do, which makes it particularly appropriate for climate monitoring or retrieval of minor pollutants, applications that require the ability to model weak spectral signatures in the most atypical situations. Because OSS is aimed at replacing line-by-line (LBL) models in applications that require processing large datasets, robustness and fidelity to the LBL model are the main design requirements for OSS. The method has been applied to a wide variety of radiative transfer modeling problems, including aircraft (Liu et al. 2003), limb (Eluszkiewicz et al. 2015, manuscript submitted to *J. Quant. Spectrosc. Radiat. Transfer*), and ground-based observations, and covers the entire spectrum from microwave (Lipton et al. 2009) to ultraviolet. It is integrated in the operational Cross-Track Infrared Sounder (CrIS; Han et al. 2013) processing system (Moncet et al. 2005; Divakarla et al. 2014) and is used for retrieval of pollutants from the Tropospheric Emission Spectrometer (TES; Cady-Pereira et al. 2014, 2012; Shephard et al. 2011).

This paper focuses on the modeling of hyperspectral infrared measurements from spaceborne sensors for retrieval applications and assimilation of satellite data in numerical weather prediction (NWP) and climate models. These applications commonly deal with large amounts of data and require very fast models to produce both radiances and Jacobians (derivatives of the radiances with respect to the model inputs). Whether the radiometric observations are provided in spectral space or in a linearly transformed space, their treatment in OSS is identical. A linear transformation of particular relevance is the projection onto a truncated set of empirical orthogonal functions (EOFs) derived from a large ensemble of measured or calculated spectra, reducing the set of channel radiances to a smaller set of principal components (PCs). PCs have been proposed for data compression and for filtering noise from satellite observations (e.g., Huang and Antonelli 2001; Aires et al. 2002; Antonelli et al. 2004). Their direct use in physical retrieval systems greatly speeds up the inversion process (Liu et al. 2006b). The use of PCs is also being considered for NWP assimilation (Matricardi and McNally 2014). In this case, strategies are still sought for dealing with the nonlocal nature of the Jacobians (mixing issue) especially in the treatment of cloudy observations when using the full spectrum (Collard et al. 2010).

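With OSS, the radiance in channel *l* is expressed as

$$\bar{R}_l \simeq \sum_{i=1}^{N_l} w_{l,i}\, R(\nu_{l,i}), \tag{1}$$

where the *N*_{l} OSS wavenumbers *ν*_{l,i} (nodes) and the associated weights *w*_{l,i} are specific to channel *l*. In the vector form of (1), $\bar{R}_l \simeq \mathbf{w}_l^{\mathrm{T}} \mathbf{R}_l$, where **w**_{l} and **R**_{l} collect the weights and the monochromatic radiances at the nodes. The nodes and weights are obtained by minimizing the rms difference between the OSS radiances and those from a reference LBL model over an ensemble of *S* (training) scenes:

$$\varepsilon_l = \left[ \frac{1}{S} \sum_{s=1}^{S} \left( \mathbf{w}_l^{\mathrm{T}} \mathbf{R}_{l,s} - \bar{R}_{l,s}^{\mathrm{LBL}} \right)^{2} \right]^{1/2}. \tag{2}$$

The reference calculations use a spectral grid with a spacing ranging from the order of 10^{−4} cm^{−1} around 700 cm^{−1} to about 2.5 × 10^{−4} cm^{−1} in the shortwave end of the infrared spectrum. The results presented in this paper are based on LBLRTM, version 12.1 (Alvarado et al. 2013). The OSS method can be applied with any alternative reference LBL model.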
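
As a minimal sketch of the weighted-sum approximation described in the introduction, and of its rms misfit to a reference model over a set of training scenes, the following Python fragment may help. The function names and all numerical values are illustrative assumptions and are not taken from any OSS implementation.

```python
import math

def oss_channel_radiance(weights, node_radiances):
    """Weighted sum of monochromatic radiances at the selected nodes."""
    return sum(w * r for w, r in zip(weights, node_radiances))

def rms_error(weights, node_radiances_by_scene, reference_by_scene):
    """RMS difference between OSS and reference channel radiances over scenes."""
    squared = [
        (oss_channel_radiance(weights, r) - ref) ** 2
        for r, ref in zip(node_radiances_by_scene, reference_by_scene)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical values: one channel, two nodes, three training scenes.
weights = [0.25, 0.75]
node_radiances_by_scene = [[10.0, 20.0], [12.0, 16.0], [8.0, 24.0]]
reference_by_scene = [17.5, 15.0, 21.0]   # stand-in for LBL channel radiances
error = rms_error(weights, node_radiances_by_scene, reference_by_scene)
```

In an actual training, the node set and weights would be varied until this misfit meets the prescribed accuracy.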

The OSS method can be coupled with any monochromatic radiative transfer (RT) solver (scattering or nonscattering). The layer molecular optical depths (and their derivatives with respect to constituent concentrations and temperature) input to the RT solver are obtained by interpolation from precomputed absorption coefficients stored for all nodes in a lookup table (LUT) as a function of pressure and temperature and, for water vapor, as a function of the water vapor amount itself (Mon08, their section 5). The structure of the OSS forward model differs from that of models employed with traditional channel transmittance parameterizations (McMillin and Fleming 1976; Fleming and McMillin 1977; McMillin et al. 1979; Eyre and Woolf 1988; Eyre 1991; Strow et al. 2003; Matricardi 2003), which process the channels sequentially. In OSS, the loop over nodes is the main loop. The execution of (1) is embedded in the node loop, after the call to the RT model, to update channel radiances and Jacobians each time a new node has been processed.
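
The node-major loop structure described above can be sketched as follows. This is only an illustration under stated assumptions: the data layout (`optical_depths`, `channel_weights`), the function names, and the toy RT solver are hypothetical and stand in for the LUT interpolation and the actual monochromatic solver.

```python
import math

def oss_forward(nodes, rt_solver, n_chan, n_param):
    """Accumulate channel radiances and Jacobians while looping over nodes."""
    radiance = [0.0] * n_chan
    jacobian = [[0.0] * n_param for _ in range(n_chan)]
    for node in nodes:                                   # main loop is over nodes
        r, dr = rt_solver(node["optical_depths"])        # one RT call per node
        for chan, w in node["channel_weights"].items():  # channels that use this node
            radiance[chan] += w * r
            for p in range(n_param):
                jacobian[chan][p] += w * dr[p]
    return radiance, jacobian

def toy_rt(optical_depths):
    """Toy solver (an assumption): transmittance-like radiance and its derivatives."""
    r = math.exp(-sum(optical_depths))
    return r, [-r] * len(optical_depths)

# Two hypothetical nodes; the first one is shared by channels 0 and 1.
nodes = [
    {"optical_depths": [0.0, 0.0], "channel_weights": {0: 1.0, 1: 0.5}},
    {"optical_depths": [0.5, 0.5], "channel_weights": {1: 0.5}},
]
radiance, jacobian = oss_forward(nodes, toy_rt, n_chan=2, n_param=2)
```

Because a shared node triggers a single RT call, the RT cost in this layout follows the number of distinct nodes rather than the number of channels.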

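The computation time for a set of channels can be written as

$$t_{\mathrm{OSS}} = N_{\mathrm{tot}}\, t_{\mathrm{RT}} + \bar{N} N_{\mathrm{chan}}\, t_{\mathrm{map}}, \tag{3}$$

where *N*_{tot} is the total number of nodes for the set, with each node counted once even if it is used by more than one channel, and $\bar{N}$ is the average number of nodes per channel. In this equation, *t*_{RT} is the RT time per node and *t*_{map} is the time per weight for mapping from node to channel space, so that the first term scales with *N*_{tot}. In the thermal regime for clear-sky calculations, the RT part is quite fast, and, in this case, the Jacobian-mapping operation [second term on the right-hand side of (3)] may represent a significant fraction of *t*_{OSS} if $\bar{N}$ is large.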

The “localized” training method described in Mon08 is primarily aimed at minimizing *N*_{l} needed to model radiances in any given channel *l* within prescribed accuracy. This approach operates on individual channels (one channel at a time) using nodes selected from within the interval Δ*ν*_{l} over which the instrument function [or instrument line shape (ILS)] is nonzero (the entire band for unapodized or weakly apodized ILS). The adaptive preselection scheme employed in Mon08 (their section 2), by depleting the number of candidate nodes available for the final selection stage, favors node reuse in regions where ILSs overlap and typically maintains *N*_{tot} on the order of *N*_{chan} with current operational hyperspectral sounders, such as the Infrared Atmospheric Sounding Interferometer (IASI; Hilton et al. 2012), the Atmospheric Infrared Sounder (AIRS; Aumann et al. 2003), or CrIS (apodized) when fidelity to the reference LBL model [(2)] is required to be well within the sensor noise. This similarity of *N*_{tot} to *N*_{chan} results in computational speeds that are comparable to those of statistical transmittance parameterizations for radiance calculations. This approach has proven to be robust and accurate (e.g., Saunders et al. 2007; Calbet et al. 2011). The computation of derivatives of monochromatic-layer optical depths with respect to atmospheric temperature and constituents’ concentrations from data stored in LUTs is simple and gives the OSS method a speed advantage for the computation of Jacobians. Recent comparisons with the Joint Center for Satellite Data Assimilation (JCSDA) Community Radiative Transfer Model (CRTM; e.g., Liu et al. 2013) and the Radiative Transfer for Television and Infrared Observation Satellite (TIROS) Operational Vertical Sounder, version 11 (RTTOV-11; Hocking et al. 2014), have shown that OSS (trained for 0.05-K accuracy) is faster by a factor of more than 2 for the modeling of IASI radiances and Jacobians. 
The generalization of the OSS node-selection methodology, called “global” training in Mon08, seeks to minimize the total number of nodes required to model an entire instrument spectrum (or individual bands) by simultaneously operating on all channels across that interval. The global solution described here can provide substantial reductions in *t*_{OSS} in certain configurations, an advantage for applications that handle large volumes of data, and an order-of-magnitude reduction in the size of the absorption LUTs.

This paper describes the global training method, which builds on an enhanced localized training method, and assesses the accuracy and computational performance of the global solution. We address modeling of radiances as well as PCs of radiances. The capability of OSS to handle large numbers of variable molecular absorbers, as well as clouds and complex land surface properties over extended spectral regimes (e.g., nonapodized interferometric ILS), is discussed. Performance of localized and global training for application to existing hyperspectral sensors is also discussed, with comparative timing performances for various configurations.

## 2. Enhancements to the localized training approach

### a. Background

The basic approach to finding a minimal number of nodes that approximates radiances in a channel *l* within a prescribed accuracy consists of successively adding nodes selected from an initial ensemble contained in the interval Δ*ν*_{l} such that, with each addition of a new node, improvement in the sum of squared errors (SSE) computed over all scenes in the training ensemble is obtained. The Wiscombe and Evans (W-E) approach discussed in Mon08 looks for maximum SSE improvement at each update and differs from the basic steepest descent method (Snyman 2005) in that determinant control and term dropping procedures provide mechanisms for escaping local minima. The alternative Monte Carlo (MC) approach attempts instead to modify the search path at each step using a statistical rule and is more successful than the W-E search at finding a global minimum when the node count in the final selection exceeds about 10 (e.g., in strong absorption bands) and the initial set of nodes contains highly redundant information (like the initial set produced by LBL models). In other situations, the W-E and MC searches produce similar node counts, but the W-E search is faster.
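
The basic idea of adding one node at a time so that the SSE over all training scenes decreases can be illustrated with a plain greedy forward selection. This sketch is not the W-E or MC search: it has no determinant control, no term dropping, and no statistical rule, the weights are unconstrained least-squares weights, and all names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_sse(columns, target):
    """Least-squares weights for the given node columns and the resulting SSE."""
    k = len(columns)
    gram = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p * t for p, t in zip(columns[i], target)) for i in range(k)]
    w = solve(gram, rhs)
    resid = [t - sum(w[i] * columns[i][s] for i in range(k))
             for s, t in enumerate(target)]
    return w, sum(e * e for e in resid)

def greedy_select(candidates, target, sse_tol):
    """Repeatedly add the candidate node that gives the largest SSE improvement."""
    selected, sse = [], sum(t * t for t in target)
    while sse > sse_tol and len(selected) < len(candidates):
        best = None
        for j in range(len(candidates)):
            if j in selected:
                continue
            _, trial = fit_sse([candidates[i] for i in selected + [j]], target)
            if best is None or trial < best[1]:
                best = (j, trial)
        if best[1] >= sse:          # no candidate improves the fit
            break
        selected.append(best[0])
        sse = best[1]
    weights, sse = fit_sse([candidates[i] for i in selected], target)
    return selected, weights, sse

# Hypothetical node radiances (one list per candidate node, one entry per scene).
candidates = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0], [2.0, 1.0, 0.0, 1.0]]
target = [1.0, 1.0, 2.0, 2.0]       # channel radiances: 0.5 * node 0 + 0.5 * node 1
selected, weights, sse = greedy_select(candidates, target, sse_tol=1e-9)
```

A pure greedy search of this kind can stall in a local minimum, which is the weakness the determinant control and term dropping of the W-E approach, or the MC path modifications, are meant to address.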

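The search is terminated when the rms error over the training scenes meets the accuracy threshold (for each channel *l*):

$$\varepsilon_l \le \varepsilon_{\mathrm{thr}}.$$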

The threshold value is *ε*_{thr} = *α* NEdN, where *α* is a user-specified parameter less than 1 and NEdN is the noise-equivalent delta radiance. As explained in section 2b, validation with independent datasets is an integral part of building and testing our training set. Therefore, nominal performances quoted in this paper are met with both dependent and independent datasets.

The localized node selection process for any given sensor typically starts from a generic set of node selections obtained for boxcar pseudo-instrument functions uniformly spanning the spectral domain of interest (Mon08, their section 3). The width of the boxcars ranges from less than 0.01 cm^{−1} (with which we produce an OSS database called HIRES) to more than 5 cm^{−1}. Training for an actual sensor ILS starts from the solution corresponding to low-resolution boxcars and goes to higher-resolution boxcars only if a solution that meets the accuracy threshold cannot be found. In this way, the method encourages maximum node reuse (and, hence, lowering of *N*_{tot}) in regions where adjacent channels overlap.
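
The coarse-to-fine strategy described above can be sketched as a simple ladder over boxcar solutions. The sketch assumes that, for a given sensor channel, an rms error has already been evaluated for the fit obtained from each generic boxcar solution; the function name, node labels, and numbers are hypothetical.

```python
def pick_coarsest_solution(solutions, eps_thr):
    """Return the first (coarsest) boxcar solution whose rms error meets eps_thr."""
    for width, nodes, rms in solutions:      # ordered from low to high resolution
        if rms <= eps_thr:
            return width, nodes
    return None                              # no generic solution meets the threshold

alpha, nedn = 0.2, 0.5                       # hypothetical: alpha < 1, NEdN in radiance units
eps_thr = alpha * nedn                       # accuracy threshold, alpha * NEdN

# Hypothetical (boxcar width in cm^-1, node labels, rms error) entries, coarsest first.
solutions = [
    (5.0, ["n1"], 0.30),
    (1.0, ["n1", "n2"], 0.12),
    (0.1, ["n1", "n2", "n3"], 0.04),
]
chosen = pick_coarsest_solution(solutions, eps_thr)
```

Stopping at the coarsest acceptable boxcar keeps the node pool small and shared among neighboring channels, which is what encourages node reuse.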

### b. Training profiles and treatment of variable constituents

The robustness of radiance-trained OSS models depends substantially on the properties of the set of atmospheric profiles constructed for training. A property that takes a heightened importance with the global training method is the destruction of profile smoothness by random perturbations (Mon08, their section 4). With profile vertical correlations greatly weakened, the OSS selection process tends to model each channel by selecting nodes that represent the distinct portions of the atmosphere the channel senses (as would be the case if we were training the models to fit Jacobians), rather than relying on fewer nodes, the radiances of which would be correlated by way of profile correlations. If the vertical correlations were not greatly diminished, the number of nodes needed to model a set of channels would be reduced, but the model would be less robust when confronted with anomalous profiles. The adequacy of our training set is monitored by verifying that the accuracy of the OSS model is within the threshold *ε*_{thr} when the model is applied to independent profile sets.

In OSS, molecular constituents are divided into variable and fixed categories (Mon08). The number of optically active variable constituents varies from node to node. One advantage of the OSS formalism over transmittance parameterizations is that adding new variable molecular constituents does not require any algorithm change, either in the training or in the forward model. All that is required is including the new constituent in the training scenes and adding entries for those constituents in the absorption LUTs.

While operational infrared models typically include 6 variable constituents (Hocking et al. 2014; Chen et al. 2012), the infrared models that we use in support of retrieval and spectroscopic validation studies typically include 13 variable constituents (Table 1). Not all those constituents are necessarily being retrieved, but some constituents’ concentrations may be dynamically updated using information from external sources to capture regional and seasonal variability, as well as secular trends. The variable constituents were extended to 20 (Table 1) for application to retrieval of pollutants from TES, with only O_{2}, N_{2}, NO, and NO_{2} as fixed gases.

Table 1. Minor constituents treated as variable (V) and fixed (F) in versions of OSS infrared forward models. Constituents that are treated as fixed in all current versions of OSS are not listed. Most operational models handle the six variable molecules listed in the first column, but the list of fixed molecules may vary from model to model.

The original atmospheric set used for radiance training with H_{2}O and O_{3} as the only variable absorbers has been described in Mon08. Concentration profiles for all new minor constituents are selected by sampling the output of year-long runs of chemistry models. An analysis similar to the one used by Chevallier (2002) is applied to each constituent to reduce the size of the datasets down to a few hundred profiles while ensuring that variability in concentration profiles is adequately represented in each set. Correlations between the minor constituents’ concentrations may result in the mixing of spectral signatures associated with different absorbers, similar to the effect of profile correlations discussed above. These correlations are deliberately excluded by randomly scaling the column amounts and adding the scaled profiles at random to each scene in the original training set. The scaling factors are chosen so that the range of the training set encompasses the concentrations observed locally (from in situ reports) near sources of emission during pollution events and accounts for secular trends. Sources of concentration profiles for the constituents listed in Table 1 include the NASA Global Modeling Initiative (GMI) (Strahan et al. 2007; Duncan et al. 2007) and Harvard University GEOS-Chem (Bey et al. 2001) models. Constituents not treated in current chemical models are obtained from the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) database (http://www.atm.ox.ac.uk/RFM/rfm_downloads.html; Remedios et al. 2007). We verify the adequacy of the constituent profile training set by confirming that OSS model accuracy is maintained for every channel even when the concentration of any one constituent is set to its maximum or minimum value: that is, we verify that there are no spurious effects of extreme concentrations of any constituent.
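
The decorrelation step (random scaling of column amounts and random assignment of the scaled profiles to scenes) might look like the sketch below. The uniform distribution of the scaling factor, the dictionary layout of a scene, and all names and numbers are assumptions made for illustration only.

```python
import random

def decorrelate(scenes, constituent_profiles, scale_range, seed=0):
    """Attach independently drawn, randomly scaled constituent profiles to each scene."""
    rng = random.Random(seed)
    out = []
    for scene in scenes:
        new_scene = dict(scene)                       # leave the input scene untouched
        for name, profiles in constituent_profiles.items():
            profile = rng.choice(profiles)            # drawn independently per constituent
            low, high = scale_range[name]
            factor = rng.uniform(low, high)           # random scaling of the column amount
            new_scene[name] = [factor * x for x in profile]
        out.append(new_scene)
    return out

# Hypothetical two-level scenes and constituent profiles (arbitrary units).
scenes = [{"T": [290.0, 250.0]}, {"T": [280.0, 240.0]}]
profiles = {"NH3": [[1.0, 0.5], [2.0, 1.0]]}
ranges = {"NH3": (0.5, 4.0)}
training = decorrelate(scenes, profiles, ranges, seed=1)
```

Drawing each constituent independently of the others and of the host scene is what removes the interconstituent correlations; the scaling range would be chosen to cover the concentrations described in the text.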

Because minor atmospheric pollutants are active over limited spectral ranges and their impact on radiances is relatively small, going from 6 to 20 constituents only results in a moderate increase in *N*_{tot} (mostly in the longwave and midwave regions of the spectrum) and in the average number of constituents per node (Fig. 1). For IASI, the impact on the size of the LUT and on the forward model execution time for the first two bands is only on the order of a few tens of percent. With the global training (section 3), the most notable impact is on the execution time associated with mapping Jacobians for the additional individual variable absorbers from node to channel space.

### c. Handling of surface and cloud spectral properties

Surface emissivity and cloud properties are evaluated at each OSS node by interpolating input spectra provided on sets of user-defined hinge points and are allowed to vary across the channel bandwidth, unlike in RT models that use channel transmittance parameterizations.
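
Evaluating a surface or cloud property at each node from user-defined hinge points can be sketched with a simple interpolation routine. Linear interpolation and constant extrapolation beyond the end hinge points are assumptions made here; the text only states that the input spectra are interpolated.

```python
from bisect import bisect_right

def interp_to_nodes(hinge_wn, hinge_val, node_wn):
    """Interpolate hinge-point values (ascending wavenumbers) to node wavenumbers."""
    out = []
    for nu in node_wn:
        if nu <= hinge_wn[0]:                 # below the first hinge point: hold value
            out.append(hinge_val[0])
        elif nu >= hinge_wn[-1]:              # above the last hinge point: hold value
            out.append(hinge_val[-1])
        else:
            i = bisect_right(hinge_wn, nu) - 1
            t = (nu - hinge_wn[i]) / (hinge_wn[i + 1] - hinge_wn[i])
            out.append(hinge_val[i] + t * (hinge_val[i + 1] - hinge_val[i]))
    return out

# Hypothetical emissivity hinge points (cm^-1) and node positions.
hinge_wn = [700.0, 710.0, 720.0]
hinge_emis = [0.98, 0.96, 0.99]
node_emis = interp_to_nodes(hinge_wn, hinge_emis, [705.0, 712.5, 690.0])
```

Because each node receives its own value, the property can vary across a channel bandwidth without any change to the channel weights.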

With localized radiance training, spectral variations of the surface properties may be ignored so long as the impact of a change in surface emissivity across the channel bandwidth is within the accuracy threshold. This is typically the case for highly emissive surfaces (e.g., vegetation or snow) or for instruments with spectral response functions that vanish very rapidly outside an interval of a few wavenumbers or less. In this case, it is sufficient to train with a set of scenes built using spectrally uniform emissivities. Similarly, in the case of clouds in the thermal infrared, it has been found that a model trained in clear sky typically meets the accuracy threshold when applied to cloudy atmospheres so long as spectral variations in cloud properties within the channel ILS have negligible impact on channel radiances.

When training across wide spectral domains (several tens of wavenumbers or more, e.g., for broadband channels or for unapodized or weakly apodized interferometer ILS)^{3} or for multiple channels, one must include realistic spectral variations of surface or cloud properties in the training set to avoid introducing internodal correlations that are unrepresentative of many real scenes and that would skew the node selection (Fig. 2).

Fig. 2. Brightness temperature errors in a cloudy atmosphere obtained with models for an IASI-type interferometer trained in clear sky (localized training) to 0.05-K accuracy. (top) The average spectral rms modeling error [computed using the independent AIRS diverse set of 48 clear atmospheric profiles from Strow et al. (2003)] over the longwave portion of the infrared spectrum as a function of cloud optical depth (OD) and effective particle diameter (Deff) for a Gaussian-apodized (solid) and unapodized (dashed) ILS (OD = 0 corresponds to clear-sky conditions). Also shown are error spectra (sampled at 0.25-cm^{−1} interval) for OD = 4.6 and Deff = 50 *μ*m (red) overlaid by clear-sky error spectra (black) for the (middle) Gaussian-apodized and the (bottom) nonapodized cases. The example shown here is for a single-layer ice cloud at 300 hPa. Calculations were performed using the code for high-resolution accelerated radiative transfer with scattering (CHARTS) adding–doubling scheme (Moncet and Clough 1997) with four streams.

Citation: Journal of the Atmospheric Sciences 72, 7; 10.1175/JAS-D-14-0190.1

The training scenes used for extended spectral regimes use realistic water and land surface emissivities and liquid and ice cloud properties. Here we use a set of 188 emissivities from the MODIS UCSB emissivity library (Wan et al. 1999) and the ASTER spectral library (http://speclib.jpl.nasa.gov/; Salisbury and D’Aria 1994, 1992; Salisbury et al. 1994) compiled by E. Borbas and D. Zhou (2012, personal communication) (Fig. 3). Individual emissivity spectra in the original set were filtered to remove measurement noise in the original laboratory data and uniformly resampled at intervals of 10 cm^{−1}. The objective of meeting the accuracy requirement for each emissivity can be achieved without having to replicate our atmospheric training set 188 times (although our current system handles large numbers of scenes well) by carefully selecting representative emissivity spectra and combining each one of them with a few atmospheric profiles. The resulting number of scenes is only a few hundred. For clear and cloudy training, we use separate sets of clear, liquid-only, ice-only, and combined liquid and ice cloudy scenes. Clouds tend to smooth out molecular absorption spectral features in the outgoing radiances so that training in purely cloudy atmospheres would produce fewer nodes than in the clear sky for the same accuracy requirements. We require that the accuracy requirement be met for the clear set and for each cloudy set, separately and simultaneously, during training and thus ensure accurate performance of the model when applied to clear and cloudy atmospheres.

Fig. 3. Sample of land surface emissivities for water, snow/ice, vegetation, rock/soil, and sand used for OSS training over extended spectral domains (E. Borbas and D. Zhou 2012, personal communication).

## 3. Global training

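The OSS model for a set of channels *C* contained in a wide spectral interval can be written as

$$\mathbf{y} = \mathbf{W}\mathbf{r},$$

where **y** is a vector of channel radiances, **r** is the vector of monochromatic radiances at the *N*_{tot} nodes, and **W** is an (*N*_{chan} × *N*_{tot}) matrix, the rows of which contain the transpose of the weight vectors **w**. The search approach described in this section finds the smallest set of nodes and the matrix **W** that model all channels in *C* within the prescribed accuracy.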

With any rigorous search approach, execution time of the training increases with the number of possible node combinations (i.e., both with the initial number of candidate nodes and the number of nodes in the final selection), which makes the direct application of the Mon08 approach (herein called the Mon08-vector search) to complete sets of sounding channels very time consuming. The vector search can be accelerated by dividing the set *C* into vectors of small length (e.g., length 2) obtained by grouping adjacent channels and by applying the search independently to each group of channels. For a given group of channels, the search only operates on nodes that are contained in the spectral interval spanned by those channels. Those nodes are initially taken from the localized selections for the channels in the group. The length of the vectors (the size of the groups) is progressively increased (e.g., doubled) at each subsequent pass. At each new pass, node selection is initialized with the result of the previous pass. The search stops when all channels in the band are included in a single vector. This method has the advantage that the total number of nodes decreases at each pass as the length of the vector increases, thereby significantly reducing the training time compared with the Mon08-vector search approach. Training time with this more efficient implementation is about 30 h on a single CPU for a sensor like IASI, which is still excessive for certain applications, such as sensor trade-off experiments that require retraining for different instrument characteristics, and for OSS testing and development purposes. We use this method for benchmarking the *N*_{tot} and the average number of nodes per channel obtained with faster procedures.
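
The pass schedule of the accelerated vector search (adjacent channels grouped into short vectors whose length doubles at each pass until a single vector spans the band) can be sketched as follows. Only the grouping is shown; the node search applied within each group is omitted, and the function name is hypothetical.

```python
def grouping_passes(n_chan, start=2):
    """List, for each pass, the groups of adjacent channel indices to be searched."""
    passes, size = [], start
    while True:
        groups = [list(range(i, min(i + size, n_chan)))
                  for i in range(0, n_chan, size)]
        passes.append(groups)
        if len(groups) == 1:          # all channels are in a single vector: stop
            return passes
        size *= 2                     # double the vector length for the next pass

schedule = grouping_passes(8)
# Pass 1 uses pairs of channels, pass 2 groups of four, pass 3 the whole band.
```

With doubling, the number of passes grows only logarithmically with the number of channels, while each pass starts from the node selection of the previous one.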

Clustering techniques have been considered as a way to eliminate redundant information from ^{4} that offer significant reduction in the size of

We devised instead a simplified search procedure in which channels are processed sequentially, a single channel at a time, using a variant of the scalar W-E search approach. At each step *N* in this procedure, we start from the result of step *N* − 1, which consists of the ensemble of nodes selected so far and the subset *C*_{N−1} of channels among *C* that this ensemble models within the prescribed accuracy. For the channel *l* addressed in step *N*, we consider the union of the nodes selected so far and the candidate nodes contained in the interval Δ*ν*_{l}. A W-E search is applied to select a minimal number of nodes from this union to model *y*_{l}. The new node ensemble defines the subset *C*_{N}, which includes channel *l* along with any other channels that fall within the accuracy threshold as a by-product of modeling channel *l*; thus, no step in the procedure need be devoted specifically to them. The first step starts with a channel arbitrarily chosen from among *C* (e.g., first channel in the band). In all subsequent steps, the channel with the largest rms error (normalized by *ε*_{thr}) is the channel *l* addressed by the step. When the last channel among *C* has been addressed by this process, all channels fall within the prescribed accuracy threshold. The value of *N*_{tot} obtained with this procedure is only 2%–3% larger (on average) than with our benchmark approach, but the procedure runs an order of magnitude faster: typically less than 1 h for IASI.
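
The control flow of such a sequential, worst-channel-first procedure can be sketched as below. The two callbacks are hypothetical stand-ins: `search_channel` represents the scalar search for one channel and `normalized_error` the rms error divided by the threshold. The toy problem, in which each channel simply "needs" certain nodes, is invented for illustration.

```python
def sequential_global_search(channels, normalized_error, search_channel):
    """Address one channel per step, always the one with the largest normalized error."""
    nodes, order = set(), []
    target = channels[0]                        # arbitrary first channel
    for _ in range(len(channels)):              # at most one step per channel
        nodes = search_channel(target, nodes)   # extend the node set for this channel
        order.append(target)
        errors = {c: normalized_error(c, nodes) for c in channels}
        worst = max(errors, key=errors.get)
        if errors[worst] <= 1.0:                # every channel meets its threshold
            return nodes, order
        target = worst
    raise RuntimeError("search did not converge")

# Toy stand-ins: a channel is within threshold once all nodes it needs are present.
needs = {0: {"a", "b"}, 1: {"b"}, 2: {"c"}, 3: {"a", "c"}}

def toy_error(channel, nodes):
    return 2.0 * len(needs[channel] - nodes)

def toy_search(channel, nodes):
    return nodes | needs[channel]

nodes, order = sequential_global_search([0, 1, 2, 3], toy_error, toy_search)
```

In this toy case only channels 0 and 2 are addressed explicitly; channels 1 and 3 meet the threshold as a by-product, so no step is devoted to them.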

Figure 4 shows an example of the progression in *N*_{tot} and *N*_{tot}. The last point on the curves corresponds to the global solution obtained when all channels in the set are used. The final value of *N*_{tot} for the global solution is less than 0.1 nodes per channel. Modeling the 2260 IASI band-1 channels with this solution requires fewer RT calculations than with the localized solution by a factor of about 10. However, *N*_{tot} node selection) and results in an approximately 60% reduction in

Fig. 4. (left) *N*_{tot} and (right) *N*_{tot} (see text). A gap in the sequence occurs when a single step in the process results in multiple channels meeting the accuracy threshold. In the right panel, the filled and open symbols are used to indicate

We also show the impact of the initial selection on the global solution. In Fig. 4, selection A (triangles) corresponds to our baseline localized approach. Selection B (circles) is obtained by turning off our adaptive preselection scheme in the localized search so that node sharing among channels is no longer maximized. The initial *N*_{tot} value in the sequential search is higher as a result. The paths followed by the two approaches converge in the last 10 steps, and the global solution does not depend significantly on the initial selection.

Global training introduces channel-to-channel correlations in the errors due to the OSS parameterization [(1)] that do not occur with localized training (Fig. 5). When the training is done with a tight threshold for the fit with respect to the reference model, as is typically the case, the modeling errors are negligible, and their correlations are, therefore, inconsequential.
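
Interchannel correlations of the modeling errors can be estimated from the errors of two channels over a common set of scenes. The sketch below uses the ordinary Pearson correlation coefficient, which is an assumption about how such correlations would be computed; the error values are invented.

```python
import math

def error_correlation(err_a, err_b):
    """Pearson correlation between the modeling errors of two channels over scenes."""
    n = len(err_a)
    mean_a, mean_b = sum(err_a) / n, sum(err_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(err_a, err_b))
    var_a = sum((a - mean_a) ** 2 for a in err_a)
    var_b = sum((b - mean_b) ** 2 for b in err_b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for constant errors")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical modeling errors (OSS minus reference) for three scenes.
shared_nodes = error_correlation([0.01, 0.02, 0.03], [0.02, 0.04, 0.06])
opposite = error_correlation([0.01, 0.02, 0.03], [0.03, 0.02, 0.01])
```

Channels that draw on the same nodes can show strongly correlated errors in this measure, even when the errors themselves are far below the noise.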

Fig. 5. OSS modeling-error interchannel correlations (derived from a large number of scenes) between the channel at 894.75 cm^{−1} and all other channels in IASI band 1, with localized and global channel radiance models and PC model. Full correlation matrices are provided as supplemental information.

One might assume that mixing is bound to occur because each channel radiance is reconstructed from nodes in different parts of the spectrum: that is, modeled radiances in a given channel may respond to perturbations in parameters to which the channel is theoretically insensitive. Such occurrences can be diagnosed by inspection of the Jacobians. Figure 6 shows examples of spectral Jacobians for selected constituents. As seen in this figure, the Jacobians obtained with the global and localized solutions are both in excellent agreement with the reference (exact) calculations, even for the most weakly absorbing constituents. In particular, the response to a perturbation in the concentration of any constituent (among all 20 constituents included in the model) vanishes outside of the spectral region where the constituent is absorbing, indicating the absence of significant mixing. The global solutions are noisier in appearance, but the magnitude of the modeling errors remains commensurate with the accuracy threshold imposed in the training.
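
A simple way to automate the inspection described above is to flag channels whose modeled radiance responds to a perturbation even though the perturbed constituent does not absorb there. The function name, the tolerance, and the numbers below are hypothetical; in practice the tolerance would be tied to the accuracy threshold used in training.

```python
def channels_with_mixing(response, absorbs, tol):
    """Indices of channels that respond to a perturbation where the gas does not absorb."""
    return [
        i for i, (delta, active) in enumerate(zip(response, absorbs))
        if not active and abs(delta) > tol
    ]

# Hypothetical radiance responses (perturbed minus unperturbed) in four channels.
response = [0.80, 0.02, 0.00, 0.15]
absorbs = [True, False, False, False]     # where the perturbed constituent absorbs
flagged = channels_with_mixing(response, absorbs, tol=0.05)
```

An empty list for every constituent would correspond to the absence of significant mixing.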

Fig. 6. Radiance response across the 745–2000-cm^{−1} region to perturbations in (top) CH_{4}, (middle) NH_{3}, and (bottom) HNO_{3} concentrations, with localized (black) and global (red) IASI channel radiance models. The reference HIRES calculations are shown in blue. The perturbations are the differences between extreme profiles in our training-profile database. The changes in mass column amounts are ~890–1060 mg m^{−2} for CH_{4}, ~0–55 mg m^{−2} for NH_{3}, and ~3–75 mg m^{−2} for HNO_{3}.

## 4. Application to AIRS, IASI, and CrIS forward modeling

OSS performance is demonstrated with localized and global training for typical hyperspectral infrared sounders: IASI, CrIS, and AIRS. The main sensor characteristics are in Table 2. For the examples in Table 3, the models were trained with six variable constituents in the clear sky, with an accuracy threshold set to 20% of the instrument NEdN (Fig. 7).

Summary of AIRS, IASI, and CrIS channel sets. AIRS and IASI resolutions are given in terms of full width at half maximum (FWHM). CrIS resolution is given in terms of the distance between the center of the ILS and the first zero crossing (FWHM is about 1.21 times larger). All data have units of inverse centimeters. Resolutions provided for IASI and CrIS are for self-apodized spectra. The IASI standard Gaussian-apodized resolution is 0.5 cm^{−1} (FWHM).

Total and average number of nodes for localized and global clear-sky training for IASI, AIRS, and CrIS (unapodized) sensors for individual LW, MW, and SW infrared bands. Numbers in parentheses are *N*_{tot}/*N*_{chan}. Values provided here are for training with six variable constituents and an accuracy threshold set to 20% of NEdN.

NEdN for the AIRS (http://disc.sci.gsfc.nasa.gov/AIRS/documentation/v5_docs/v5_docs_list.shtml#Level_1B_Documents), IASI, and CrIS (P. Antonelli and D. Tobin 2012, personal communication) sensors.

With the global training, the entire set of radiances (across all three bands) for IASI, AIRS, or CrIS can be described with only a few hundred nodes when each band is trained independently (490–668 for the selected model accuracy), which is significantly fewer than the number of channels. For IASI, *N*_{tot} drops by about one-third when training is applied to bands 1 and 2 simultaneously and more than 40% when applied to all bands. By comparison, localized training produced OSS models requiring between about 4360 and 7960 nodes for the same sensors and model accuracy. In accord with the plots of the previous section, *N*_{tot} increases only marginally (<10%) relative to clear-sky training.

Differences in global training performance among instruments are explained by several factors, including spectral coverage and resolution, model accuracy (tied here to sensor noise), and characteristics of the variable gas absorption (number of variable constituents, magnitude, and spectral distribution of their radiometric impact) in the domain covered by the sensor. For a given spectral domain and model accuracy, we have found that *N*_{tot} can be predicted from the area *A* underneath the ILS in the Fourier domain [a function of both optical path difference (OPD) and apodization function for interferometers] and that *N*_{tot} is essentially insensitive to the shape of the ILS (or apodization function) and spectral sampling as long as the spectrum is Nyquist sampled and, hence, aliasing is small. These points are illustrated in Figs. 8 and 9. Figure 8 shows *N*_{tot} versus *A* for interferometers with different OPDs and apodization functions (Gaussian with varying width, Hamming, and Blackman) covering the IASI band 1 range (645–1210 cm^{−1}). Numbers for the AIRS instrument function are also included. With AIRS, the width of the instrument function varies with wavenumber, and the approximate value provided for *A* is an average value over the 649–1136-cm^{−1} interval. The shapes of these curves for bands 2 and 3 (not shown) are similar to the ones in Fig. 8. The impact of the sampling interval on *N*_{tot} versus *A* is illustrated in Fig. 9. The value of *A*, although the data are more scattered. No such trends occur with localized training because of the different nature of the optimization criteria. With localized training, both *N*_{tot} and
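The quantity *A* used above can be illustrated numerically. The following is a minimal sketch, assuming that *A* is the integral of the apodization function over path differences within ±OPD and that it is normalized by the unapodized (boxcar) area; the function names, the Gaussian width parameter `sigma`, and this normalization are illustrative assumptions and are not taken from the training software.

```python
import numpy as np

def apodization(name, x, opd, sigma=None):
    """Apodization function sampled at path differences x (|x| <= opd)."""
    u = np.abs(x) / opd
    if name == "boxcar":          # unapodized interferogram
        return np.ones_like(u)
    if name == "hamming":
        return 0.54 + 0.46 * np.cos(np.pi * u)
    if name == "blackman":
        return 0.42 + 0.50 * np.cos(np.pi * u) + 0.08 * np.cos(2.0 * np.pi * u)
    if name == "gaussian":        # sigma: hypothetical width parameter (same units as opd)
        return np.exp(-0.5 * (x / sigma) ** 2)
    raise ValueError(f"unknown apodization: {name}")

def normalized_area(name, opd, sigma=None, n=20001):
    """Area under the apodization function divided by the boxcar area 2*opd."""
    x = np.linspace(-opd, opd, n)
    y = apodization(name, x, opd, sigma)
    area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))   # trapezoidal rule
    return area / (2.0 * opd)

for name in ("boxcar", "hamming", "blackman"):
    print(name, round(normalized_area(name, opd=1.25), 4))
print("gaussian", round(normalized_area("gaussian", opd=1.25, sigma=0.5), 4))
```

Under these assumptions, stronger apodization (for example, a narrower Gaussian) gives a smaller normalized area, which is the direction in which the node count is reported to decrease.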

Total number of nodes with global training as a function of apodization strength for interferometric ILS corresponding to OPD = 2.0 (black curve), 1.25 (blue curve), and 0.8 (red curve). All the calculations are based on IASI band-1 spectral coverage and instrument noise (Fig. 7) and the same accuracy threshold (20% of IASI NEdN). The training set used here only includes ocean, snow, and vegetated surfaces. The rightmost point on each curve corresponds to unapodized ILS. Other points were obtained by applying Gaussian apodization with increasing strength. The *x* axis represents the normalized area under the instrument function (in Fourier space); the *y* axis is the normalized total number of OSS nodes.

Impact of sampling interval of convolved radiance spectrum on *N*_{tot} vs *A* curve (Fig. 8) for IASI band 1. The blue curve (as in Fig. 8) corresponds to an interferometer with OPD = 1.25 and a sampling interval *I* = 0.4 cm^{−1}. The green and black curves represent undersampling, and the red curve represents oversampling of an unapodized ILS by the Nyquist criterion. All curves converge for low values of *A*/*A** because, as apodization strength is increased, a point is reached where the spectra become oversampled.

The *A* values associated with the CrIS (unapodized), AIRS, and Gaussian-apodized IASI instrument functions are similar in band 1, and, according to Fig. 8, global OSS models trained to the same accuracy over the 645–1210-cm^{−1} range for those three configurations use roughly the same number of nodes. The larger *N*_{tot} for CrIS in Table 3 is due to the lower noise of the CrIS sensor throughout band 1 (Fig. 7). In bands 2 and 3 (not shown), the spectral resolutions of the CrIS and AIRS sensors are lower than that of IASI (Table 2); as a result, *N*_{tot} for the two sensors is reduced by 15%–25% in band 2 and around 30% in band 3, compared to IASI. In addition, the coverage of AIRS and CrIS in band 3 excludes O_{3} and CO absorption bands, accounting for another 40%–50% reduction in *N*_{tot}. However, this decrease is countered by the lower NEdN of the two sensors in those bands (Fig. 7).

The numbers provided above are typical for generic OSS models trained with the robust training set described in section 2b. The accuracy of these models measured with independent datasets (no decorrelation applied) is overall no worse and often better than the accuracy achieved during training with the dependent set (e.g., Fig. 10). As mentioned in section 2, the training set can be tailored to specific applications. In NWP model assimilation for short- and medium-range weather forecasting, the atmospheric profiles that are input to OSS always come from the same NWP source, and the decorrelation applied to the vertical profiles used for OSS training is unnecessary. Similarly, surface emissivity spectra available to the models may not be as complex as those used in our generic training, and variability in minor constituents (outside of water vapor, ozone, and CO_{2}) may be constrained (if not ignored). When the training set is made less rigorous by omitting vertical decorrelation of temperature and ozone profiles, the number of OSS nodes for IASI drops below 80 in each band for the same accuracy threshold.

IASI global radiance model rms fitting errors measured in clear sky with an independent set of temperature, water vapor, and ozone profiles from the Thermodynamic Initial Guess Retrieval (TIGR; Atmospheric Radiation Analysis 2001; Chédin et al. 1985) dataset. The model was trained with six variable constituents. In this example, CO_{2} and CH_{4} concentrations for all 2231 profiles in the TIGR set are set to low (blue) or high (red) values: approximately 370 and 400 ppmv for CO_{2} and 1600 and 1800 ppbv for CH_{4}. Concentrations of CO and N_{2}O are fixed to a global average. The surface emissivity used in those calculations is typical of sand deserts. The rms errors obtained with the dependent training set (with random CO_{2} and CH_{4} concentrations) have been added for reference (black). All rms errors are normalized by 0.2NEdN. These results demonstrate that OSS model performance is practically insensitive to a change in CO_{2} and CH_{4} concentration. Similarly, model performance depends little on surface type (not shown).

## 5. Modeling principal components of channel radiances

Here, **Γ** is a diagonal matrix of radiometric noise estimates and is used as a normalization factor. In practice, only the *M* most significant (first) components are retained:

The choice of *M* is a trade-off between maximizing noise filtering and data compression versus minimizing reconstruction error, and it is application dependent. Typically, 200 EOFs (e.g., Collard et al. 2010) have been used for characterizing the atmospheric signal in the original IASI measurements.
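The noise-normalized projection and truncation described above can be sketched as follows. This is an illustrative example only: it assumes that the EOFs are obtained from a singular value decomposition of mean-removed, noise-scaled training radiances, and the function names (`fit_eofs`, `project`, `reconstruct`) and the removal of the training mean are assumptions rather than a description of the operational processing.

```python
import numpy as np

def fit_eofs(radiances, noise):
    """radiances: (n_samples, n_chan); noise: (n_chan,) diagonal of Gamma."""
    mean = radiances.mean(axis=0)
    scaled = (radiances - mean) / noise            # noise-normalized anomalies
    # Rows of vt are EOFs ordered by decreasing singular value (variance).
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    return mean, vt

def project(radiances, mean, noise, eofs, m):
    """Keep only the first m principal components."""
    return ((radiances - mean) / noise) @ eofs[:m].T

def reconstruct(pcs, mean, noise, eofs):
    """Map truncated PCs back to channel radiances."""
    m = pcs.shape[1]
    return (pcs @ eofs[:m]) * noise + mean

# Synthetic demonstration: radiances driven by three hidden patterns.
rng = np.random.default_rng(0)
rad = 50.0 + rng.normal(size=(60, 3)) @ rng.normal(size=(3, 25))
noise = rng.uniform(0.2, 1.0, size=25)
mean, eofs = fit_eofs(rad, noise)
for m in (1, 3):
    err = reconstruct(project(rad, mean, noise, eofs, m), mean, noise, eofs) - rad
    print(m, float(np.sqrt(np.mean(err ** 2))))
```

In this synthetic case the signal has only three degrees of freedom, so retaining three components reproduces the radiances to machine precision, whereas one component does not; with real measurements, the choice of *M* involves the trade-off discussed above.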

The OSS search methods are directly applicable to the PC domain. So long as the number of retained PCs is sufficient to preserve the information content of the modeled observations, the number of nodes selected from global training in channel and PC domains is similar. An example of *N*_{tot} and *N*_{l} versus PC number, with global training, is shown in Fig. 11 for IASI band 1 (0.2 accuracy, equivalent to 0.2NEdN in channel space; see below). In this example, EOFs were derived from simulated data. For modeling the PCs within the noise threshold, 236 nodes are required (to be compared with 204 with radiance training; Table 3). All 236 nodes are selected for the first 5 PCs. The number of nodes used to model each of the individual PCs is between about 130 and 210 up to PC 160 and starts dropping beyond that point (Fig. 11). Above PC 195, a single node is sufficient per PC.

*N*_{l} (dots) and *N*_{tot} (dashed line) as a function of PC number with global OSS PC training applied to IASI band 1. Following common convention, PCs are ordered according to their corresponding eigenvalues (variance). OSS rms modeling error is shown by the solid line, with the scale on the right axis. Here, the OSS model was trained to achieve an accuracy *α* = 0.2 in PC space.

*N*_{tot} nodes contribute to each PC. It can be shown that when *l* in the channel radiance solution, weights computed using (13) are identical to the optimal weights derived by least squares regression [see (5)–(7)] in EOF space:

*N*_{tot} nodes are used to reconstruct radiances in each channel makes the Jacobian mapping excessively slow.
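The equivalence between projected channel weights and weights regressed directly in EOF space follows from the linearity of least squares. The sketch below illustrates this under explicit assumptions: every channel uses the same full set of nodes, the projection is linear (noise scaling followed by multiplication by orthonormal EOFs, with no mean removal), and all arrays are random stand-ins. It is not an implementation of (13), whose exact form is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_nodes, n_chan, m = 200, 8, 30, 5

# Stand-in training data: monochromatic radiances at the nodes and
# the corresponding channel radiances.
node_rad = rng.normal(size=(n_samples, n_nodes))
chan_rad = node_rad @ rng.normal(size=(n_nodes, n_chan))
chan_rad += 0.01 * rng.normal(size=(n_samples, n_chan))

noise = rng.uniform(0.5, 2.0, size=n_chan)                  # diagonal of Gamma
eofs = np.linalg.qr(rng.normal(size=(n_chan, m)))[0].T      # (m, n_chan), orthonormal rows
to_pc = (eofs / noise).T                                    # Gamma^{-1} E^T, shape (n_chan, m)

# Least squares weights in channel space, then mapped to PC space.
w_chan = np.linalg.lstsq(node_rad, chan_rad, rcond=None)[0]
w_pc_mapped = w_chan @ to_pc

# Least squares weights fitted directly to the PCs.
w_pc_direct = np.linalg.lstsq(node_rad, chan_rad @ to_pc, rcond=None)[0]

print(np.max(np.abs(w_pc_mapped - w_pc_direct)))
```

The two sets of weights agree to rounding error because the least squares solution is a linear function of the target. The agreement would not generally hold if different channels used different subsets of nodes.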

When modeling errors are spectrally uncorrelated, an accuracy of *α*NEdN in channel space translates into an accuracy of *α* for each PC, because of the normalization in (9). Globally trained OSS model error may contain significant spectral correlation (section 3; Fig. 5), so a model trained to achieve *α*NEdN in the channel domain may exceed the *α* threshold when transformed into PC space using (13). This may not be an issue in practice if *α* is chosen to be small enough so the error in the lower-order PCs remains below the sensor noise level; however, for applications that require strict control on model accuracy, it is best to tailor the node selection for models destined to be transformed in the PC domain to meet the *α* threshold in the PC domain. Training global OSS models to the *α* accuracy level in PC space results in better performance both in PC and in radiance domains (Fig. 12) but also typically incurs an increase in *N*_{tot} of a few tens of percent compared to radiance models trained to an accuracy of *α*NEdN (e.g., 236 vs 204 in the example above for IASI band 1).
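The effect of spectral error correlation on PC-space accuracy can be illustrated with a small covariance calculation. This is a sketch under stated assumptions: the projection is taken to be multiplication by orthonormal EOFs after division by NEdN, the first EOF is deliberately chosen to be spectrally flat, and the channel count, noise level, and fully correlated error pattern are hypothetical extremes chosen for clarity.

```python
import numpy as np

def pc_error_std(eofs, noise, chan_err_cov):
    """Standard deviation of the model error in each PC, given the
    channel-space error covariance."""
    proj = eofs / noise                              # E Gamma^{-1}, shape (m, n_chan)
    var = np.diag(proj @ chan_err_cov @ proj.T)
    return np.sqrt(np.clip(var, 0.0, None))          # clip guards against rounding

n_chan, alpha = 100, 0.2
noise = np.full(n_chan, 0.5)                         # hypothetical NEdN
rng = np.random.default_rng(0)
# Orthonormal EOFs whose first member is spectrally flat (up to sign).
raw = np.column_stack([np.ones(n_chan), rng.normal(size=(n_chan, 3))])
eofs = np.linalg.qr(raw)[0].T

sigma = alpha * noise                                # channel error of alpha * NEdN
uncorrelated = np.diag(sigma ** 2)
fully_correlated = np.outer(sigma, sigma)

print(pc_error_std(eofs, noise, uncorrelated))
print(pc_error_std(eofs, noise, fully_correlated))
```

With uncorrelated errors, each PC error equals *α*. In the fully correlated case, the error concentrates in the flat EOF and grows as the square root of the number of channels, which is why a model meeting *α*NEdN in channel space can exceed *α* in PC space.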

As in Fig. 10, but for radiances reconstructed from an OSS PC model trained to 0.2 accuracy. As in Fig. 10, rms errors are normalized by 0.2NEdN. The same truncated set of EOFs derived from the training data is used to reconstruct radiances for both the dependent and independent datasets.

It is of interest to compare the attributes of the OSS global node selection approach to a simple (suboptimal) clustering approach to node selection in the context of the objectives stated in the introduction. For this exercise, we chose to implement the method used in a PC-based radiative transfer model (PCRTM; Liu et al. 2006a), in which a set of nodes is obtained by uniformly sampling an internodal radiance correlation function. Figure 13 compares the modeling errors in IASI band 1 obtained with this clustering approach (with 210, 240, and 270 nodes) and an OSS 179-node global selection providing nominal 0.05-K accuracy. In this experiment, both models were trained with the robust training set described in section 2b. Aside from the fact that the number of nodes produced with the clustering technique is always larger than with OSS for the same strict accuracy tolerance, two undesirable characteristics of this technique are highlighted in Fig. 13: 1) while the error produced with the clustering approach is generally low, it can be significant (far exceeding the target accuracy threshold) locally in important regions of the spectrum, and 2) increasing the number of nodes by sampling the correlation function more finely does not guarantee a uniform improvement in accuracy and may even cause degradations in certain parts of the spectrum. This last feature is explained by the fact that key nodes captured with a given sampling interval may be missed when narrowing the interval. With suboptimal node-selection schemes, it is difficult to control the accuracy of the model. More importantly, the magnitude of the modeling errors tends to be more sensitive to the atmospheric conditions than with OSS, making the reliability of such models in a stressful environment questionable. By comparison, the OSS model error is at or below the threshold everywhere across the spectrum.

(top) Examples of modeling errors in PC space with the PCRTM clustering technique using 210, 240, and 270 nodes vs 0.05-K accuracy (179 nodes) OSS global channel radiance model for IASI band 1. The models were trained over ocean, snow, and vegetated surfaces. The dashed line indicates the *α* = 1 (sensor noise) level. (bottom) As in (top), except projected in spectral space (in brightness temperature units).

## 6. Computational considerations

Table 4 shows examples of IASI bands 1 + 2 clear-sky model timings with localized and global training in channel and PC space. The RT solver used for radiance and analytical Jacobians calculations in this experiment is described in Mon08. The number of nodes (*N*_{tot} and

Estimated OSS (clear sky) forward model timing (ms) per profile for IASI bands 1 + 2 for the training modes and representations listed in Table 5. Estimates have been obtained from (3), using measurements of *t*_{RT} and *t*_{float} performed on a Linux computer equipped with a single Intel 4C Core i7-3770 3.5-GHz processor using a PGI FORTRAN 95 (-O2) compiler. Timing includes analytical Jacobians with respect to input model parameters in original geophysical space (no regularization) and after transformation of the input model parameters to a reduced-dimension space (with regularization). The second column under each category is the acceleration factor relative to the benchmark timing from localized training for channel radiances.

Total and average number of nodes with clear localized and global training (in channel radiance and PC representations) for IASI bands 1 + 2, with two variable constituents (H_{2}O and O_{3}).

The treatment of the state vector **x** of retrieved parameters can have a large impact on the Jacobian timing. In this example, **x** consists of atmospheric temperature, water vapor and ozone profiles (specified on a 101-level grid), and surface temperature (304 parameters), plus the emissivity at each of 38 hinge points for *N*_{par} = 342. The number of multiplications required for mapping derivatives with respect to emissivity is small because emissivity at a node depends only on the emissivity values at the two adjacent hinge points.
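The sparsity of the emissivity mapping can be sketched as follows, assuming linear interpolation between hinge points (the text states only that each node depends on the two adjacent hinge points; the interpolation rule, the hinge-point spacing, and the function names are illustrative assumptions).

```python
import numpy as np

def hinge_weights(node_wn, hinge_wn):
    """For each node wavenumber, return the index k of the hinge point
    below it and the fractional distance f toward hinge point k + 1."""
    k = np.searchsorted(hinge_wn, node_wn, side="right") - 1
    k = np.clip(k, 0, len(hinge_wn) - 2)
    f = (node_wn - hinge_wn[k]) / (hinge_wn[k + 1] - hinge_wn[k])
    return k, f

def map_emissivity_derivatives(d_rad_d_node_emis, k, f, n_hinge):
    """Chain rule: each node contributes to only two hinge points,
    so the cost is two multiplications per node."""
    out = np.zeros(n_hinge)
    np.add.at(out, k, (1.0 - f) * d_rad_d_node_emis)
    np.add.at(out, k + 1, f * d_rad_d_node_emis)
    return out

rng = np.random.default_rng(0)
hinge_wn = np.linspace(600.0, 2800.0, 38)        # 38 hinge points; spacing is hypothetical
node_wn = rng.uniform(600.0, 2800.0, size=20)    # hypothetical node positions
k, f = hinge_weights(node_wn, hinge_wn)
sensitivity = rng.normal(size=20)                # stand-in for dR/d(emissivity) at each node
print(map_emissivity_derivatives(sensitivity, k, f, 38))
```

Because each node touches only two hinge points, the number of multiplications scales with the number of nodes rather than with the product of the number of nodes and the number of hinge points.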

Regularization is used to make the dimension of **x** commensurate with the information content of the observations and avoid ill conditioning in the inversion problem. In practice, regularization is commonly achieved by expressing the departure of the profile variables and spectral surface emissivity composing **x** from their prior values as a linear combination of basis functions:

Here, **x**_{r} is the retrieved state vector in the regularized state space. We choose the basis functions such that **x**_{r} has fewer elements than **x**; the basis may be composed from a subset of the EOFs of **x**. In Table 4, two cases are considered: 1) the model outputs the Jacobians as the derivatives of channel radiances (or PCs) with respect to **x** in the original geophysical space; and 2) regularization is applied, and the Jacobians are taken with respect to **x**_{r}.
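The transformation of Jacobians to the regularized state space amounts to an application of the chain rule. The following minimal sketch assumes a basis matrix, here called `basis` (a hypothetical name), whose columns are leading EOFs of an ensemble of state vectors, so that the state is the prior plus `basis @ x_r`; the dimensions and the linear stand-in forward model are illustrative only.

```python
import numpy as np

def eof_basis(states, n_keep):
    """Leading EOFs of an ensemble of state vectors, as columns (n_par, n_keep)."""
    anomalies = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    return vt[:n_keep].T

def state_from_reduced(x_prior, basis, x_r):
    """Departure from the prior expressed as a combination of basis functions."""
    return x_prior + basis @ x_r

def regularized_jacobian(jac_x, basis):
    """Chain rule: dR/dx_r = (dR/dx) @ basis."""
    return jac_x @ basis

rng = np.random.default_rng(0)
states = rng.normal(size=(40, 12))          # hypothetical ensemble of 12-element states
basis = eof_basis(states, n_keep=4)
jac_x = rng.normal(size=(6, 12))            # stand-in Jacobian for 6 channels
jac_r = regularized_jacobian(jac_x, basis)
print(jac_x.shape, "->", jac_r.shape)
```

The reduced Jacobian has one column per retained basis function, so subsequent mapping and inversion steps operate on a smaller matrix.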

For localized channel radiance training (used here as our benchmark for evaluating other modes), typically *N*_{tot} is of the same order as the number of channels, and *t*_{OSS}. When the dimension of ^{5} the node-to-channel mapping operation is much faster, but the execution of (16) (included in the timing) offsets the gain.

With global training, the time spent on monochromatic RT (*N*_{tot}*t*_{RT}), drops by over an order of magnitude compared to our benchmark, down to about 0.002 s in both channel and PC space. However, the fact that *N*_{tot}*t*_{RT}, even with only two variable absorbers. The timing improvement over locally trained models when regularization is used is because (16) is applied in node space and that significant gain is achieved when *N*_{tot} is reduced with global training. The speed advantage of PC models over channel models is because

The timing ratios shown in Table 4 are typical for clear-sky retrievals in the infrared domain. Gains obtained with the global training are larger in cloudy atmospheres where scattering RT calculations are more time consuming. For a single cloud layer, using four-stream calculations, the radiance model with global training is approximately 5 times as fast as the model with localized training when no regularization is applied. In the shortwave portion of the spectrum (near-IR or visible), where *t*_{OSS} is largely dominated by the RT calculations, model timing ratios for localized and global training are essentially equal to the *N*_{tot} ratios.

## 7. Conclusions

The work presented here describes the general application of OSS to the modeling of observations from high-resolution infrared sounders over extended spectral regimes, with advancements over methods presented previously (Mon08). Although our focus is on the application to radiance inversion and assimilation, the results presented here are useful for general radiative transfer applications for which fast computation is desired. The models discussed in this paper are applicable to land and ocean backgrounds in clear and cloudy (scattering) atmospheres. OSS offers significant numerical accuracy and speed advantages over the transmittance parameterizations used operationally in NWP centers. It also provides more flexibility with respect to the handling of variable constituents and is readily applicable to sinc-function ILS and representation of radiances in terms of PCs. Configuring the approach for a particular application (or type of instrument function or representation) requires no changes to the training software or to the forward model; it only requires the provision of sensor-channel definition data and the selection of configuration options, including the accuracy to which the OSS model must match the reference line-by-line model.

The global search procedure described in this paper offers another order-of-magnitude gain in speed (and reduction in size of the LUTs) over localized training for RT calculations. As with the localized training described in Mon08, the spectral sampling (node selection) approach for the global solution is aimed at ensuring fidelity to the reference line-by-line model under all atmospheric conditions. With the global training, only a few hundred monochromatic calculations (nodes) are necessary to reproduce radiance spectra in all three IASI, AIRS, or CrIS bands within a fraction of the instrument noise. This node count is highly predictable from the information content of the observations (as represented by the area underneath the ILS in the Fourier domain).

PCs have been used in retrieval applications to reduce the size of the observation vector and accelerate the inversion process. The same OSS training approach and same forward model are used to train/model channel radiances and PCs. Unlike the PC approach, the channel radiance representation preserves the integrity of the spectral information under all conditions and is compatible with retrieval and assimilation applications that perform dynamic (at run time) channel selection, such as to avoid channels affected by low-level clouds. Because channel radiances and PCs are linearly related, OSS allows switching the representation at run time, which is useful for applications that benefit from updating the set of EOFs based on newer information or for tailoring to local conditions.

In retrieval applications, the speed advantage of channel radiance models with global training over models with localized training may be hindered by the fact that more nodes participate in the reconstruction of each channel, significantly slowing the node-to-channel Jacobian-mapping operation. The impact of Jacobian mapping on total execution time is more apparent in clear sky. In scattering atmospheres, radiative-transfer calculations are more time consuming, and the relative contribution of the mapping operation is not as large. For equivalent accuracy, channel radiance models with global training and PC models provide similar performance in terms of number of nodes, but the reduction in size of the observation vector in PC space makes the Jacobian mapping more efficient.
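The cost of the node-to-channel Jacobian mapping can be illustrated with a simple operation count. In this sketch, the channel Jacobians are a weighted sum of node Jacobians, and the number of multiplications is the number of nonzero weights times the number of state parameters. The matrix sizes and weight patterns below are hypothetical and chosen only to contrast sparse (localized) and dense (global) weights; only *N*_{par} = 342 is taken from the example in section 6.

```python
import numpy as np

def map_node_jacobians(weights, node_jac):
    """weights: (n_chan, n_nodes); node_jac: (n_nodes, n_par)."""
    return weights @ node_jac

def mapping_multiplications(weights, n_par):
    """Multiplications needed if zero weights are skipped."""
    return int(np.count_nonzero(weights)) * n_par

rng = np.random.default_rng(0)
n_chan, n_par = 50, 342

# Localized-style weights: each channel uses 4 of 60 nodes.
local_w = np.zeros((n_chan, 60))
for chan in range(n_chan):
    local_w[chan, chan:chan + 4] = 0.25

# Global-style weights: every channel uses all 10 nodes.
global_w = rng.uniform(0.1, 1.0, size=(n_chan, 10))

print("localized:", mapping_multiplications(local_w, n_par))
print("global:   ", mapping_multiplications(global_w, n_par))
```

In this toy configuration the global model requires six times fewer monochromatic calculations but more mapping multiplications, which mirrors the trade-off described above.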

The highest achievable gain in computation speed is realized with global training when observations are projected upfront onto nodes, and forward model and inversion algebra operate entirely in node space, thereby avoiding Jacobian mapping altogether. This approach will be the subject of a subsequent publication.

The OSS approach has been applied to models that handle as many as 20 variable gases and can be extended to additional gases. While OSS is used for trace-gas retrieval applications, more evaluation is needed to validate the robustness of the global training approach under stressful conditions in regions of the spectrum where two or more minor constituents are active.

Along with speed, fidelity to a line-by-line model (LBLRTM) is an objective of OSS; hence, OSS has been validated using simulated datasets, in which the modeled situations can easily be made as stressful as required to test the robustness of the fast models. Because the errors due to the fast model parameterization are typically small compared to other sources of error (e.g., sensor noise and spectroscopic errors), differences between modeled and measured data with OSS are essentially the same as those obtained with LBLRTM.

Aspects not discussed here include the handling of nonlocal thermodynamic equilibrium (in the daytime) and accelerated treatment of the radiative transfer in cloudy atmospheres in both the thermal and solar regimes. Application of OSS to the near-IR and visible domains [Orbiting Carbon Observatory (OCO), Global Ozone Monitoring Experiment (GOME), etc.] will be the topic of another subsequent publication.

## Acknowledgments

The work described here was supported in part by the Joint Center for Satellite Data Assimilation through NOAA/NESDIS (Award NA10NES4400009) and through internal AER funding. The integration of the multiple-scattering capability and validation of OSS-based cloudy retrievals was supported by the Air Force Weather Agency. The early development of the PC version of OSS was funded by EUMETSAT. We acknowledge Janusz Eluszkiewicz and Yaping Xiao, formerly with AER, for including the minor constituents into the atmospheric training dataset, Susan Strahan from NASA/GSFC for providing the latest GMI model runs, Vivienne Payne from JPL (formerly with AER) for reviewing the training set and including the MIPAS data, and Eva Borbas from the University of Wisconsin and Daniel Zhou from NASA/LaRC for making their merged land surface emissivity dataset available to us. We also thank Karen Cady-Pereira of AER for her feedback on OSS-based analysis of TES retrievals and Matthew Alvarado of AER for helping with fixing issues, as well as Stephen Tjemkes from EUMETSAT assisted by Paolo Antonelli (from the University of Wisconsin) for performing independent testing and evaluation of the OSS training system and the OSS model in both forward and retrieval modes.

## APPENDIX

### The Physical Nature of the OSS Method

This appendix addresses the physical nature of the OSS solution and the reasons that using a radiance clustering technique to find an optimal set of nodes would require a complex algorithm. Figure A1 shows the spectral location of nodes selected by the localized approach (Mon08) to model a simple example ILS: the average radiance (boxcar) over the 940–950-cm^{−1} range, where H_{2}O and CO_{2} are the dominant absorbers and lines are relatively weak and well separated. The OSS method produces separate sets of nodes for the different absorbers, sampling line spectra for the individual absorbers in such a way that the contribution from regions near and away from the line centers is adequately represented. The rest of the nodes (only node 4 in this example) sample the smooth part of the spectrum dominated by the far wing of the lines and continua. The weights determined by regression (section 2a) associated with each node are positive when the ILS is positive and their sum is very close to 1. Hence, they can be associated with the probability of occurrence of a certain absorption property (represented by the node) across the spectral range (Fig. A1). Equivalently, each node can be thought of as the center of a cluster of monochromatic radiance vectors in an *S*-dimensional space (i.e., each vector component is the radiance for one profile in the training set), and its associated weight is the radius of that cluster. The radius can be measured in terms of the separation angle *α* between the node’s radiance vector and the vectors for all other monochromatic points for which radiances are equally correlated with the node radiance. This angular measure of correlation radii was used in the PCRTM method of finding predictors (nodes) by clustering (Liu et al. 2006a).

Figure A2 illustrates the characteristics of the radiance correlations associated with OSS nodes near and away from line centers. By plotting the OSS weights along the curves in Fig. A2, we can see that nodes in smooth parts of the spectrum (e.g., node 4) have higher weights and represent large portions of the spectrum for which radiances are within a small correlation radius around the node radiance; that is, their radiance vectors are tightly clustered and highly correlated. Nodes near the strongest absorption lines (e.g., node 7) have smaller OSS weights and represent small portions of the spectrum within a broader correlation radius around the node radiance (more broadly clustered in radiance space). An algorithm that selects nodes using a clustering approach with uniform cluster radii would, therefore, not arrive at an optimal solution. Finding optimal nodes by a clustering approach would require an adaptive method to determine the cluster radii.
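The angular measure described above can be sketched in a few lines. This is a minimal illustration, assuming that *α* is the plain angle between two radiance vectors (no mean removal) and that the quantity plotted in Fig. A2 is the fraction of monochromatic points whose vectors lie within *α* of the node vector. The function names and the toy vectors (*S* = 3 profiles, four spectral points) are hypothetical.

```python
import math


def separation_angle_deg(u, v):
    """Angle in degrees between two radiance vectors of equal length S."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding slightly outside acos' domain.
    cos_alpha = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_alpha))


def fraction_within(node_vec, spectrum_vecs, alpha_deg):
    """Fraction of spectral points whose vectors lie within alpha_deg of the node."""
    hits = sum(
        1 for v in spectrum_vecs if separation_angle_deg(node_vec, v) <= alpha_deg
    )
    return hits / len(spectrum_vecs)


# Toy data: radiance at the node for 3 training profiles ...
node_vec = [1.0, 2.0, 3.0]
# ... and radiance vectors at 4 monochromatic points in the interval.
spectrum_vecs = [
    [2.0, 4.0, 6.0],  # proportional to the node vector (alpha near 0)
    [1.0, 2.0, 3.1],  # nearly parallel to the node vector
    [3.0, 2.0, 1.0],  # weakly related
    [1.0, 0.0, 0.0],  # far from the node vector
]

frac_tight = fraction_within(node_vec, spectrum_vecs, 5.0)
frac_loose = fraction_within(node_vec, spectrum_vecs, 90.0)
```

Evaluating `fraction_within` over a range of angles for each node would trace curves of the kind described for Fig. A2; comparing a node's weight with those curves is what suggests that cluster radii differ from node to node.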

Fig. A1. Spectral location of nodes (vertical dashed lines) to model the average radiance over the 940–950-cm^{−1} range, overlaying radiance spectra computed for three atmospheric conditions (A: warm and moist, B: cold and dry, and C: temperate). The H_{2}O and CO_{2} lines are indicated by the arrows. The parts of the spectrum where radiances computed over many scenes are highly correlated with the radiances at a given node are drawn using the same color assigned to that node. The correlation radius around each node is determined by the OSS weight, as shown in Fig. A2.

Citation: Journal of the Atmospheric Sciences 72, 7; 10.1175/JAS-D-14-0190.1

Fig. A2. The fraction of the 940–950-cm^{−1} range for which the radiances are within varying correlation radii of OSS node radiances. Correlations (the horizontal axis) are represented as the angle *α* between two radiance vectors in an *S*-dimensional space, where *α* = 0° indicates that the radiances are 100% correlated. The curve for each node indicates the fraction of this spectral range for which radiance vectors are within an angle *α* of a node, with the nodes numbered as in Fig. A1. Radiances were computed with LBLRTM at an interval of approximately 10^{−3} cm^{−1}. For each curve, the dot represents the node’s OSS weight.

## REFERENCES

Aires, F., W. B. Rossow, N. A. Scott, and A. Chédin, 2002: Remote sensing from the infrared atmospheric sounding interferometer instrument. 1. Compression, denoising, and first-guess retrieval algorithms. *J. Geophys. Res.*, **107**, 4619, doi:10.1029/2001JD000955.

Alvarado, M. J., V. H. Payne, E. J. Mlawer, G. Uymin, M. W. Shephard, K. E. Cady-Pereira, J. Delamere, and J. Moncet, 2013: Performance of the line-by-line radiative transfer model (LBLRTM) for temperature, water vapor, and trace gas retrievals: Recent updates evaluated with IASI case studies. *Atmos. Chem. Phys. Discuss.*, **13**, 79–144, doi:10.5194/acpd-13-79-2013.

Antonelli, P., and Coauthors, 2004: A principal component noise filter for high spectral resolution infrared measurements. *J. Geophys. Res.*, **109**, D23102, doi:10.1029/2004JD004862.

Atmospheric Radiation Analysis, 2001: Thermodynamic Initial Guess Retrieval (TIGR) dataset, version 1.2. Laboratoire de Météorologie Dynamique, accessed 2001. [Available online at http://ara.abct.lmd.polytechnique.fr/index.php?page5tigr.]

Aumann, H. H., and Coauthors, 2003: AIRS/AMSU/HSB on the Aqua mission: Design, science objectives, data products, and processing systems. *IEEE Trans. Geosci. Remote Sens.*, **41**, 253–264, doi:10.1109/TGRS.2002.808356.

Bey, I., and Coauthors, 2001: Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation. *J. Geophys. Res.*, **106**, 23 073–23 096, doi:10.1029/2001JD000807.

Cady-Pereira, K. E., M. W. Shephard, D. B. Millet, M. Luo, K. C. Wells, Y. Xiao, V. H. Payne, and J. Worden, 2012: Methanol from TES global observations: Retrieval algorithm and seasonal and spatial variability. *Atmos. Chem. Phys.*, **12**, 8189–8203, doi:10.5194/acp-12-8189-2012.

Cady-Pereira, K. E., S. Chaliyakunnel, M. W. Shephard, D. B. Millet, M. Luo, and K. C. Wells, 2014: HCOOH measurements from space: TES retrieval algorithm and observed global distribution. *Atmos. Meas. Tech.*, **7**, 2297–2311, doi:10.5194/amt-7-2297-2014.

Calbet, X., R. Kivi, S. Tjemkes, F. Montagner, and R. Stuhlmann, 2011: Matching radiative transfer models and radiosonde data from the EPS/Metop Sodankylä campaign to IASI measurements. *Atmos. Meas. Tech.*, **4**, 1177–1189, doi:10.5194/amt-4-1177-2011.

Chédin, A., N. A. Scott, C. Wahiche, and P. Moulinier, 1985: The improved initialization inversion method: A high resolution physical method for temperature retrievals from satellites of the TIROS-N series. *J. Climate Appl. Meteor.*, **24**, 128–143, doi:10.1175/1520-0450(1985)024<0128:TIIIMA>2.0.CO;2.

Chen, Y., Y. Han, and F. Weng, 2012: Comparison of two transmittance algorithms in the community radiative transfer model: Application to AVHRR. *J. Geophys. Res.*, **117**, D06206, doi:10.1029/2011JD016656.

Chevallier, F., 2002: Sampled databases of 60-level atmospheric profiles from the ECMWF analyses. NWP SAF Rep. NWPSAF-EC-TR-004, 27 pp.

Clough, S. A., M. J. Iacono, and J.-L. Moncet, 1992: Line-by-line calculation of atmospheric fluxes and cooling rates: Application to water vapor. *J. Geophys. Res.*, **97**, 15 761–15 785, doi:10.1029/92JD01419.

Clough, S. A., M. W. Shephard, E. J. Mlawer, J. S. Delamere, M. J. Iacono, K. Cady-Pereira, S. Boukabara, and P. D. Brown, 2005: Atmospheric radiative transfer modeling: A summary of the AER codes. *J. Quant. Spectrosc. Radiat. Transfer*, **91**, 233–244, doi:10.1016/j.jqsrt.2004.05.058.

Collard, A. D., A. P. McNally, F. I. Hilton, S. B. Healy, and N. C. Atkinson, 2010: The use of principal component analysis for the assimilation of high-resolution infrared sounder observations for numerical weather prediction. *Quart. J. Roy. Meteor. Soc.*, **136**, 2038–2050, doi:10.1002/qj.701.

Divakarla, M. G., and Coauthors, 2014: The CrIMSS EDR algorithm: Characterization, optimization, and validation. *J. Geophys. Res. Atmos.*, **119**, 4953–4977, doi:10.1002/2013JD020438.

Duncan, B., S. Strahan, Y. Yoshida, S. Steenrod, and N. Livesey, 2007: Model study of the cross-tropopause transport of biomass burning pollution. *Atmos. Chem. Phys.*, **7**, 3713–3736, doi:10.5194/acp-7-3713-2007.

Eyre, J. R., 1991: A fast radiative transfer model for satellite sounding systems. ECMWF Tech. Memo. 176, 28 pp. [Available online at http://old.ecmwf.int/publications/library/ecpublications/_pdf/tm/001-300/tm176.pdf.]

Eyre, J. R., and H. M. Woolf, 1988: Transmittance of atmospheric gases in the microwave region: A fast model. *Appl. Opt.*, **27**, 3244–3249, doi:10.1364/AO.27.003244.

Fleming, H. E., and L. M. McMillin, 1977: Atmospheric transmittance of an absorbing gas. 2: A computationally fast and accurate transmittance model for slant paths at different zenith angles. *Appl. Opt.*, **16**, 1366–1370, doi:10.1364/AO.16.001366.

Goody, R., and Y. L. Yung, 1989: *Atmospheric Radiation: Theoretical Basis.* 2nd ed. Oxford University Press, 519 pp.

Han, Y., and Coauthors, 2013: Suomi NPP CrIS measurements, sensor data record algorithm, calibration and validation activities, and record data quality. *J. Geophys. Res. Atmos.*, **118**, 12 734–12 748, doi:10.1002/2013JD020344.

Hilton, F. I., and Coauthors, 2012: Hyperspectral Earth Observation from IASI: Five years of accomplishments. *Bull. Amer. Meteor. Soc.*, **93**, 347–370, doi:10.1175/BAMS-D-11-00027.1.

Hocking, J., P. Rayer, D. Rundle, R. Saunders, M. Matricardi, A. Geer, P. Brunel, and J. Vidot, 2014: RTTOV v11 users guide. NWP SAF Tech. Rep. NWPSAF-MO-UD-028, 114 pp. [Available online at https://nwpsaf.eu/deliverables/rtm/docs_rttov11/users_guide_11_v1.3.pdf.]

Huang, H.-L., and P. Antonelli, 2001: Application of principal component analysis to high-resolution infrared measurement compression and retrieval. *J. Appl. Meteor.*, **40**, 365–388, doi:10.1175/1520-0450(2001)040<0365:AOPCAT>2.0.CO;2.

Lipton, A. E., J.-L. Moncet, S.-A. Boukabara, G. Uymin, and K. J. Quinn, 2009: Fast and accurate radiative transfer in the microwave with optimum spectral sampling. *IEEE Trans. Geosci. Remote Sens.*, **47**, 1909–1917, doi:10.1109/TGRS.2008.2010933.

Liu, Q., Y. Xue, and C. Li, 2013: Sensor-based clear and cloud radiance calculations in the community radiative transfer model. *Appl. Opt.*, **52**, 4981–4990, doi:10.1364/AO.52.004981.

Liu, X., J.-L. Moncet, D. K. Zhou, and W. L. Smith, 2003: A fast and accurate forward model for NAST-I instrument. *Extended Abstracts, Optical Remote Sensing,* Quebec City, QC, Canada, Optical Society of America, OMB2, doi:10.1364/ORS.2003.OMB2.

Liu, X., W. L. Smith, D. K. Zhou, and A. Larar, 2006a: Principal component-based radiative transfer model for hyperspectral sensors: Theoretical concept. *Appl. Opt.*, **45**, 201–209, doi:10.1364/AO.45.000201.

Liu, X., D. K. Zhou, A. M. Larar, W. L. Smith, and P. Schluessel, 2006b: A physical retrieval algorithm for IASI using PCRTM. *Multispectral, Hyperspectral, and Ultraspectral Remote Sensing Technology, Techniques, and Applications,* W. L. Smith et al., Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 6405), 64050L, doi:10.1117/12.694175.

Matricardi, M., 2003: RTIASI-4, a new version of the ECMWF fast radiative transfer model for the infrared atmospheric sounding interferometer. ECMWF Tech. Memo. 425, 65 pp. [Available online at http://old.ecmwf.int/publications/library/ecpublications/_pdf/tm/401-500/tm425.pdf.]

Matricardi, M., and A. P. McNally, 2014: The direct assimilation of principal components of IASI spectra in the ECMWF 4D-Var. *Quart. J. Roy. Meteor. Soc.*, **140**, 573–582, doi:10.1002/qj.2156.

McMillin, L. M., and H. E. Fleming, 1976: Atmospheric transmittance of an absorbing gas: A computationally fast and accurate transmittance model for absorbing gases with constant mixing ratios in inhomogeneous atmospheres. *Appl. Opt.*, **15**, 358–363, doi:10.1364/AO.15.000358.

McMillin, L. M., H. E. Fleming, and M. L. Hill, 1979: Atmospheric transmittance of an absorbing gas. 3: A computationally fast and accurate transmittance model for absorbing gases with variable mixing ratios. *Appl. Opt.*, **18**, 1600–1606, doi:10.1364/AO.18.001600.

Moncet, J.-L., and S. A. Clough, 1997: Accelerated monochromatic radiative transfer for scattering atmospheres: Application of a new model to spectral radiance observations. *J. Geophys. Res.*, **102**, 21 853–21 866, doi:10.1029/97JD01551.

Moncet, J.-L., and Coauthors, 2005: Algorithm theoretical basis document for the Cross Track Infrared Sounder (CrIS). Volume II, Environmental Data Records (EDR), version 4.2. AER Tech. Doc. P1187-TR-I-08, 298 pp. [Available online at http://npp.gsfc.nasa.gov/sciencedocuments/2013-01/474-00056_RevABaseline.pdf.]

Moncet, J.-L., G. Uymin, A. E. Lipton, and H. E. Snell, 2008: Infrared radiance modeling by optimal spectral sampling. *J. Atmos. Sci.*, **65**, 3917–3934, doi:10.1175/2008JAS2711.1.

Remedios, J. J., and Coauthors, 2007: MIPAS reference atmospheres and comparisons to V4.61/V4.62 MIPAS level 2 geophysical data sets. *Atmos. Chem. Phys. Discuss.*, **7**, 9973–10 017, doi:10.5194/acpd-7-9973-2007.

Salisbury, J. W., and D. M. D’Aria, 1992: Emissivity of terrestrial materials in the 8–14 μm atmospheric window. *Remote Sens. Environ.*, **42**, 83–106, doi:10.1016/0034-4257(92)90092-X.

Salisbury, J. W., and D. M. D’Aria, 1994: Emissivity of terrestrial materials in the 3–5 μm atmospheric window. *Remote Sens. Environ.*, **47**, 345–361, doi:10.1016/0034-4257(94)90102-3.

Salisbury, J. W., D. M. D’Aria, and A. Wald, 1994: Measurements of thermal infrared spectral reflectance of frost, snow, and ice. *J. Geophys. Res.*, **99**, 24 235–24 240, doi:10.1029/94JB00579.

Saunders, R. W., and Coauthors, 2007: A comparison of radiative transfer models for simulating Atmospheric Infrared Sounder (AIRS) radiances. *J. Geophys. Res.*, **112**, D01S90, doi:10.1029/2006JD007088.

Shephard, M. W., and Coauthors, 2011: TES ammonia retrieval strategy and global observations of the spatial and seasonal variability of ammonia. *Atmos. Chem. Phys.*, **11**, 10 743–10 763, doi:10.5194/acp-11-10743-2011.

Snyman, J. A., 2005: *Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms.* Springer Publishing, 258 pp.

Strahan, S., B. Duncan, and P. Hoor, 2007: Observationally derived transport diagnostics for the lowermost stratosphere and their application to the GMI chemistry and transport model. *Atmos. Chem. Phys.*, **7**, 2435–2445, doi:10.5194/acp-7-2435-2007.

Strow, L. L., S. E. Hannon, S. De Souza-Machado, H. E. Motteler, and D. Tobin, 2003: An overview of the AIRS radiative transfer model. *IEEE Trans. Geosci. Remote Sens.*, **41**, 303–313, doi:10.1109/TGRS.2002.808244.

Tobin, D. C., P. Antonelli, H. E. Revercomb, S. Dutcher, D. D. Turner, J. K. Taylor, R. O. Knuteson, and K. Vinson, 2007: Hyperspectral data noise characterization using principle component analysis: Application to the atmospheric infrared sounder. *J. Appl. Remote Sens.*, **1**, 013515, doi:10.1117/1.2757707.

Wan, Z., and Coauthors, 1999: MODIS UCSB emissivity library. Institute for Computational Earth System Science. [Available online at http://www.icess.ucsb.edu/modis/EMIS/html/em.html.]

West, R., R. Goody, L. Chen, and D. Crisp, 2010: The correlated-k method and related methods for broadband radiation calculations. *J. Quant. Spectrosc. Radiat. Transfer*, **111**, 1672–1673, doi:10.1016/j.jqsrt.2010.01.013.

Wiscombe, W. J., and J. W. Evans, 1977: Exponential-sum fitting of radiative transmission functions. *J. Comput. Phys.*, **24**, 416–444, doi:10.1016/0021-9991(77)90031-6.

^{1} The time taken to perform the mapping from node to channel space is typically small for radiances compared to Jacobians.

^{2} As in Mon08, we have the option to constrain the sum of the weights to be exactly equal to 1. However, the sum of weights in the unconstrained solution always approaches 1 very closely, so the impact on the solution of including this explicit constraint is minimal.

^{3} Although the term “localized training” is a misnomer in the case of sinc functions, the node selection technique described above applies as is to this case.

^{4} Optimal radius varies spectrally with sensitivity of the Planck function to temperature and local absorption characteristics.

^{5} The Jacobian transformation should be performed in the *y* space with the smallest dimension (i.e., for IASI, in node space in all modes).