## Abstract

A moving-window regression technique was developed for obtaining better a priori information for one-dimensional variational (1DVAR) physical retrievals. Using this technique regression coefficients were obtained for a specific geographical 10° × 10° window and for a given season. Then, regionally obtained regression retrievals over East Asia were used as a priori information for physical retrievals. To assess the effect of improved a priori information on the accuracy of the physical retrievals, error statistics of the physical retrievals from clear-sky Atmospheric Infrared Sounder (AIRS) measurements during 4 months of observation (March, June, September, and December of 2010) were compared; the results obtained using new a priori information were compared with those using a priori information from a global set of training data classified into six classes of infrared (IR) window channel brightness temperature. This comparison demonstrated that the moving-window regression method can successfully improve the accuracy of physical retrieval. For temperature, root-mean-square error (RMSE) improvements of 0.1–0.2 and 0.25–0.5 K were achieved over the 150–300- and 900–1000-hPa layers, respectively. For water vapor given as relative humidity, the RMSE was reduced by 1.5%–3.5% above the 300-hPa level and by 0.5%–1% within the 700–950-hPa layer.

## 1. Introduction

With the advent of satellite-based hyperspectral infrared measurement technologies such as the Atmospheric Infrared Sounder (AIRS; Chahine et al. 2006), global pictures of three-dimensional temperature and moisture became available, recently yielding a high vertical resolution of ~1–2 km. One-dimensional variational (1DVAR)-based physical methods have been developed for retrieving relevant parameters in a manner consistent with both satellite measurements and a priori condition (Eyre 1989; Li and Huang 1999; Ma et al. 1999; Li et al. 2000; Susskind et al. 2003; Carissimo et al. 2005; Gambacorta 2013; Masiello and Serio 2013).

The retrievals from the hyperspectral sounder have been widely used for a variety of objectives. One of the major achievements of satellite-based sounding is to extend our understanding of atmospheric phenomena from the weather to the climate (e.g., Dessler et al. 2008; Tian et al. 2010; Kahn et al. 2011; among many others). In particular, these retrievals can bring a better understanding of the role of water vapor in the global climate system. Moreover, the retrievals are utilized for monitoring severe weather (e.g., Weisz et al. 2015) and for improving numerical weather prediction (Le Marshall et al. 2006; Hilton et al. 2012; Zheng et al. 2015; among many others). In line with these, many attempts have been made to improve the physical methods (Li et al. 2007; Kwon et al. 2012; Smith et al. 2012; Bisht et al. 2015), but solving the retrieval problem remains a challenge (Grieco et al. 2011; Serio et al. 2015). Nonetheless, the knowledge obtained from these previous studies should be useful for developing better data assimilation methods for numerical weather forecasting.

It has long been recognized that a priori information for constraining the physical method is essential for solving ill-posed inverse problems (Bouttier and Courtier 2002; Prunet et al. 2001; Zhang et al. 2014). One can use regression-based retrievals from satellite measurements to provide a priori information (Li et al. 2000; Kwon et al. 2012). In this case, regression retrievals are used not only as a priori information but also as a first guess in the physical method. Therefore, improving the regression method is expected to yield more accurate retrievals.

Different from this approach, regression retrievals can be used as first guess only, while a priori information is obtained from climatology or numerical model outputs. For those, better regression is also thought to be an important way to improve the physical retrievals. Recent updates of AIRS and Infrared Atmospheric Sounding Interferometer (IASI; Blumstein et al. 2004) operational products [i.e., version 6 of AIRS Level 2 (L2) retrieval from the National Aeronautics and Space Administration (NASA), and version 6 of IASI L2 retrieval from the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT)] reflect such efforts. In the case of AIRS, version 6, training datasets were classified in terms of surface type and pressure, latitude, season, and day/night (Olsen 2013). This effort was based on the assumption that well-categorized regression coefficients can reduce the nonlinearity between observed radiance and underlying atmospheric state. Similarly, IASI, version 6, introduced a piecewise regression method for generating a better first guess (EUMETSAT 2016a). In the EUMETSAT method for IASI, training segments are stratified into 480 classes, such that each retrieval case is examined to determine the most appropriate class out of 480 classes using scan angle, solar zenith angle (i.e., day or night), surface altitude, radiances from IASI, and collocated radiances from Advanced Microwave Sounding Unit-A (AMSU-A) and Microwave Humidity Sounder (MHS).

If a well-classified training dataset is a key to improving the regression algorithm, then a question may arise about whether regression methods based on regionally and seasonally varying regression coefficients can yield better a priori information or a better first guess compared to globally derived coefficients. This idea is not completely new; in fact, as shown in Thompson et al. (1985) and Chedin et al. (1985), the idea has been tested by partitioning the global ensemble into various classes for improving the first-guess fields. Along this line of reasoning, a method (called “the moving-window technique” in this study) is introduced here to yield better regionally focused a priori information and first guess, for improving the 1DVAR-based physical retrievals of clear-sky temperature and moisture profiles from AIRS infrared hyperspectral measurements.

This study first compares two types of regression retrievals, from 1) regionally and seasonally driven regression coefficients obtained for the moving-window technique, and 2) Cooperative Institute for Meteorological Satellite Studies (CIMSS) regression coefficients derived from a global training dataset (Weisz et al. 2007; Weisz et al. 2013). The CIMSS regression uses the clear-sky training dataset (Borbas et al. 2005) classified with scanning angles and six classes of the window channel brightness temperature TB (referred to as the TB-based classification technique; Weisz et al. 2003). Comparison is made over the East Asian region, for 4 months spanning a year (i.e., March, June, September, and December of 2010). Then, regression retrievals obtained from the two methods are used as a priori information constraining the CIMSS 1DVAR-based physical retrieval model (Li et al. 2000). The effect of different types of a priori information on the accuracy of the physical retrieval is studied by comparing the error statistics of physical retrievals using the two sets of regression data against reference profiles.

The paper is organized as follows. A moving-window technique for the regression algorithm and the associated experimental design are introduced in sections 2 and 3, respectively. Significant features of the moving-window technique with respect to the regression retrieval and the effect of this technique on the physical retrieval are described in sections 4 and 5, respectively. A discussion and summary follow in section 6.

## 2. Methodology and data used

### a. Construction of training data

In developing a statistical regression model, a set of training data is necessary, from which predictand parameters (here temperature and moisture profiles) can be related to predictor variables (here AIRS radiances). In addition, surface pressure is used as a predictor variable for maximizing the retrieval performance (Weisz et al. 2007; Thapliyal et al. 2014; Olsen 2013).

To prepare the training dataset, we simulate expected AIRS radiances using 4-yr-long (2006–09) European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis [ERA-Interim (ERA-I)] data (ECMWF 2009) as inputs to the radiative transfer model; these inputs are the temperature, moisture, and ozone profiles, surface temperature, and surface pressure. Surface emissivity is also used to enhance the simulation accuracy, and the data are obtained from the global infrared surface emissivity database produced by CIMSS (Borbas et al. 2007; Seemann et al. 2008). The regression method is developed over the East Asian analysis domain (15°–55°N, 90°–150°E). One may construct the training data for the regression from satellite measurements collocated with reference data. However, this method is prone to errors related to poorly known sensor characteristics and collocation mismatch. On the other hand, although the simulated brightness temperatures are not exact, owing to inaccurate radiative transfer modeling and specification of measurement errors, the simulation approach has the advantage of avoiding the collocation problem and thus of being readily paired with known reference data, both for constructing a priori information and for validation.

For the clear-sky radiance simulation, the stand-alone radiative transfer algorithm (SARTA; Strow et al. 2003) is used, which ensures high-accuracy fast simulations of AIRS radiance. It was reported that the uncertainty of the SARTA results is ~0.2 K for CO_{2} absorption channels. For H_{2}O absorption channels, the uncertainty is reported to be mostly less than 1 K. All 2378 AIRS channels are simultaneously simulated using the SARTA model, except for the channels that exhibit noisy behavior, for example, owing to significant instrumental noise, non-Gaussian noise distribution, or poor spectral response function (Weisz et al. 2007). After subjectively identifying and removing these noisy channels, 1435 channels are retained for channel simulations. Also, 28 shortwave infrared (SWIR) channels are retained, but they are used only for nighttime retrieval analysis; thus, regression analysis is performed separately for daytime and nighttime. Random instrumental noise is added to the simulated radiances, based on the assumption that instrumental noise is Gaussian distributed, with a standard deviation of noise-equivalent differential temperature (NedT). Then, principal component analysis (Huang and Antonelli 2001) is applied to the simulated radiances for reducing the computational burden associated with the regression using all 1435 channels. The first 40 principal components (i.e., compressed radiances), explaining nearly all of the AIRS spectrum variance, are retained as predictors.
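The noise addition and compression steps can be illustrated with a minimal sketch. It assumes the simulated spectra are held in a NumPy array of brightness temperatures and that the principal components are taken from a plain singular value decomposition of the mean-removed noisy spectra; whether the original processing also normalizes by noise before the decomposition is not stated here, so that choice, together with the function and variable names, is hypothetical.

```python
import numpy as np

def compress_radiances(tb, nedt, n_pc=40, seed=0):
    """Add Gaussian noise to simulated spectra and project onto leading PCs.

    tb   : (n_samples, n_channels) simulated brightness temperatures [K]
    nedt : (n_channels,) noise standard deviation per channel [K]
    Returns the PC scores, the PC basis, the mean spectrum, and the
    fraction of variance explained by the retained components.
    """
    rng = np.random.default_rng(seed)
    # Gaussian noise with a per-channel standard deviation equal to NEdT.
    noisy = tb + rng.normal(0.0, 1.0, size=tb.shape) * nedt
    mean = noisy.mean(axis=0)
    # Rows of vt are the principal directions of the spectral covariance.
    _, s, vt = np.linalg.svd(noisy - mean, full_matrices=False)
    basis = vt[:n_pc]
    scores = (noisy - mean) @ basis.T  # compressed radiances
    explained = float((s[:n_pc] ** 2).sum() / (s ** 2).sum())
    return scores, basis, mean, explained
```

The returned `scores` play the role of the 40 compressed radiances used as predictors; `basis` and `mean` would be stored so that observed spectra can be projected in the same way.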

### b. Moving-window regression method

Adopting a linear regression model (Smith and Woolf 1976; Weisz et al. 2007), we apply the moving-window technique for classifying the training dataset in terms of their region and season. For a given set of predictor variables (here a vector **Y**_{R} of 41 parameters, containing data for 40 compressed radiances plus one surface pressure), the predictand **X**_{R} can be retrieved as follows:

The regression model finds the solution by minimizing the squared difference between the training predictands and their regression estimates, where the regression coefficient **C** is defined as follows:
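As an illustration of the least-squares step, the following minimal sketch fits a linear map from the 41 predictors to the predictand profile and applies it to new predictors. It assumes the regression is carried out on deviations from the training mean, which is a common convention but is an assumption here; the function names are hypothetical.

```python
import numpy as np

def fit_regression(predictors, predictands):
    """Least-squares regression coefficients from training data.

    predictors  : (n_samples, 41) compressed radiances plus surface pressure
    predictands : (n_samples, n_levels) temperature or moisture profiles
    """
    y_mean = predictors.mean(axis=0)
    x_mean = predictands.mean(axis=0)
    # Solve min ||(Y - y_mean) C - (X - x_mean)||^2 for C.
    coef, *_ = np.linalg.lstsq(predictors - y_mean,
                               predictands - x_mean, rcond=None)
    return coef, y_mean, x_mean

def apply_regression(predictors, coef, y_mean, x_mean):
    """Regression retrieval for new predictor vectors."""
    return (predictors - y_mean) @ coef + x_mean
```

Using `np.linalg.lstsq` rather than an explicit matrix inverse avoids forming the normal equations, which is numerically safer when predictors are correlated.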

To obtain regionally based regression coefficients for the temperature and moisture retrievals, we define a moving window as a 10° × 10° box. Figure 1 exemplifies the retrieval procedure by showing the analysis domain, two adjacent windows for the retrieval, and associated two boxes for training. For retrieval for box 1, the regression relationship is trained on a larger 20° × 20° training box 1, concentric with retrieval box 1. After completing the retrieval for box 1, the regression window is moved to retrieval box 2, where retrieval is performed with another regression relationship obtained from the 20° × 20° training box 2. In this way, spatial discontinuity between neighboring windows is minimized. This procedure is repeated until the retrieval is completed over the entire analysis domain. Seasonal variations are incorporated into the regression by training on different seasons (here, the four seasons of March–May, June–August, September–November, and December–February). To reduce the temporal discontinuity between two adjacent seasons, the regression for each season is conducted over a period of 5 months by adding one month before the season and another month after the season in a target; for example, regression coefficients for spring (March–May) are obtained from the February–June period. Thus, regression coefficients relevant to each AIRS pixel are obtained by considering the pixel location and the associated season.
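The selection of a retrieval box, its concentric training box, and the 5-month training period can be sketched as follows. The sketch assumes that the 10° retrieval boxes tile the analysis domain starting from its southwest corner, which is one possible reading of how the window is moved; all function names are hypothetical.

```python
DOMAIN = (15.0, 55.0, 90.0, 150.0)  # lat_min, lat_max, lon_min, lon_max
SEASONS = {"MAM": (3, 4, 5), "JJA": (6, 7, 8),
           "SON": (9, 10, 11), "DJF": (12, 1, 2)}

def retrieval_box(lat, lon, size=10.0):
    """Return (south, north, west, east) of the box containing the pixel."""
    lat_min, lat_max, lon_min, lon_max = DOMAIN
    if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
        raise ValueError("pixel lies outside the analysis domain")
    # Clamp the index so pixels on the north/east edge use the last box.
    i = min(int((lat - lat_min) // size), int((lat_max - lat_min) / size) - 1)
    j = min(int((lon - lon_min) // size), int((lon_max - lon_min) / size) - 1)
    south, west = lat_min + i * size, lon_min + j * size
    return (south, south + size, west, west + size)

def training_box(box, size=20.0):
    """Return the larger training box concentric with a retrieval box."""
    south, north, west, east = box
    clat, clon = 0.5 * (south + north), 0.5 * (west + east)
    half = 0.5 * size
    return (clat - half, clat + half, clon - half, clon + half)

def season_of(month):
    """Return the season label for a calendar month (1-12)."""
    for name, months in SEASONS.items():
        if month in months:
            return name
    raise ValueError("month must be in 1..12")

def training_months(season):
    """Season months padded by one month before and one month after."""
    first, _, last = SEASONS[season]
    before = (first - 2) % 12 + 1
    after = last % 12 + 1
    return (before, *SEASONS[season], after)
```

For example, a pixel at 37.5°N, 127°E falls in the 35°–45°N, 120°–130°E retrieval box, whose training box spans 30°–50°N, 115°–135°E. Training boxes for retrieval boxes at the domain edge extend beyond the analysis domain in this sketch.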

In addition, the effect of the scan angle (*θ*) is also included in the regression model; 11 scanning angles from 0° to 49° are considered, following Weisz et al. (2007),

where Δ = 0.0524 and *j* = 0, 1, 2, …, 10. Applying this method to AIRS measurements, retrieval at a specific *θ* is obtained by linearly interpolating retrievals at two adjacent viewing angles, with respect to the relative air mass associated with the viewing angles. The retrievals obtained using the moving-window regression method are henceforth referred to as the moving-window regression retrievals.
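The angle handling can be illustrated with a short sketch. It assumes the 11 training angles are equally spaced in relative air mass, sec *θ*_j = 1 + *j*Δ, which is an assumed form that reproduces the stated 0°–49° range for Δ = 0.0524 and *j* = 0, …, 10. The function name is hypothetical.

```python
import numpy as np

DELTA = 0.0524
# Relative air mass (secant of the scan angle) at the 11 training angles.
SEC_NODES = 1.0 + DELTA * np.arange(11)
THETA_NODES_DEG = np.degrees(np.arccos(1.0 / SEC_NODES))

def interpolate_in_air_mass(theta_deg, retrievals_at_nodes):
    """Interpolate retrievals linearly in sec(theta).

    retrievals_at_nodes : (11, n_levels) retrievals at the training angles
    """
    sec = 1.0 / np.cos(np.radians(theta_deg))
    # Index of the lower bracketing node, clamped to the valid range.
    j = int(np.clip(np.searchsorted(SEC_NODES, sec, side="right") - 1,
                    0, len(SEC_NODES) - 2))
    w = (sec - SEC_NODES[j]) / (SEC_NODES[j + 1] - SEC_NODES[j])
    return (1.0 - w) * retrievals_at_nodes[j] + w * retrievals_at_nodes[j + 1]
```

Interpolating in sec *θ* rather than in *θ* itself follows the statement that the interpolation is done with respect to relative air mass.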

### c. Use of regression retrieval for physical method

Using regression-based retrievals as an initial field, a physical retrieval is conducted to find a solution that minimizes the cost function (*J*),

where **X**_{P} is an optimal guess (the so-called physical retrieval), **X**_{0} is a priori information, **Y** is an observation, **Y**(**X**_{P}) is an expected observation for the given **X**_{P}, and the two weighting matrices are the error covariance of the a priori information and the observation error covariance. By applying the quasi–Gauss–Newton iteration method (Eyre 1989), the physical retrieval finds a solution that minimizes the combined departure from the a priori information and from the observations. The relative accuracy between the a priori information and observations is accounted for by introducing these two error covariances into the cost function. A detailed description of this method can be found in Rodgers (1976).
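A minimal sketch of this kind of minimization is given below. It assumes the standard quadratic cost, with an observation term weighted by the inverse observation error covariance and an a priori term weighted by the inverse a priori error covariance, and it uses a textbook Gauss–Newton update. The names `sa_inv` and `se_inv`, the placement of the balancing factor `gamma` on the a priori term, and the callable forward model are assumptions for illustration and are not taken from the CIMSS code.

```python
import numpy as np

def cost(x, x0, y, forward, sa_inv, se_inv, gamma=1.0):
    """Quadratic 1DVAR cost: observation term plus weighted a priori term."""
    dy = y - forward(x)
    dx = x - x0
    return float(dy @ se_inv @ dy + gamma * (dx @ sa_inv @ dx))

def gauss_newton_step(x_n, x0, y, forward, jacobian, sa_inv, se_inv, gamma=1.0):
    """One Gauss-Newton update of the state, linearized about x_n."""
    k = jacobian(x_n)  # (n_channels, n_state)
    lhs = gamma * sa_inv + k.T @ se_inv @ k
    rhs = k.T @ se_inv @ (y - forward(x_n) + k @ (x_n - x0))
    return x0 + np.linalg.solve(lhs, rhs)
```

For a linear forward model, a single step from any starting state reaches the minimum of the cost, so repeated steps return the same state; for a nonlinear radiative transfer model, the step is iterated.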

In this study, the regression-based retrieval is used as an initial field for applying the CIMSS physical model (Li et al. 2000), which uses an analytical form of the radiative transfer equation (Li 1994). Some salient features of this CIMSS model include the use of an error balancing factor γ to yield more accurate error covariance and the use of eigenvectors to compress state vectors in Eq. (4). Details can be found in Li et al. (2000).

To use the regression retrievals as a priori information for the physical retrieval, their error covariance should be defined, in order to determine their importance relative to the observation, as expressed in Eq. (4). One may construct this error covariance directly from ERA-I data, but the resulting background covariance would then be affected by the errors in the ERA-I data. Instead of calculating it directly from ERA-I data, the error covariance of the moving-window regression retrieval is assumed to be similar to the one that is used for other regression methods; here, we utilize the predeveloped diagonal error covariance matrix that was used with the CIMSS regression retrieval. Here, the error ratio between the moving-window and CIMSS regression retrievals is assumed to be proportional to the ratio between the diagonal components of their respective error covariance matrices:

where diag(⋅) denotes the diagonal component and **α** is the ratio of the error variance between the moving-window and CIMSS regression retrievals. ERA-I is used as a reference for estimating this ratio. The resulting a priori errors for temperature and moisture are shown in Figs. 2a and 2b, respectively.
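The scaling of the diagonal error covariance can be sketched as below. The sketch assumes that the error variance of each regression retrieval is estimated level by level as the mean squared difference from the ERA-I reference, and that the moving-window variance is the CIMSS variance multiplied by the resulting ratio. The estimator and all names are assumptions for illustration.

```python
import numpy as np

def scale_error_variance(var_cimss, x_mw, x_cimss, x_ref):
    """Scale a predeveloped diagonal error variance by an error ratio.

    var_cimss : (n_levels,) diagonal of the CIMSS regression error covariance
    x_mw      : (n_samples, n_levels) moving-window regression retrievals
    x_cimss   : (n_samples, n_levels) CIMSS regression retrievals
    x_ref     : (n_samples, n_levels) reference (ERA-I) profiles
    """
    # Level-wise mean squared differences; NaN marks below-surface levels.
    err_mw = np.nanmean((x_mw - x_ref) ** 2, axis=0)
    err_cimss = np.nanmean((x_cimss - x_ref) ** 2, axis=0)
    alpha = err_mw / err_cimss
    return alpha * var_cimss, alpha
```

A ratio below 1 at a given level tightens the a priori constraint there, which gives the regression retrieval more weight relative to the observations in the physical retrieval.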

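For a given a priori error covariance (here, that of the moving-window regression retrieval), the kernel explaining the relative importance between the a priori information and the observations can be expressed as follows: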

Here, the kernel also involves the Jacobian matrix of the expected observation with respect to the state. In this study, the observation error is defined by AIRS instrumental noise plus radiative transfer model error (0.2 K). No observation error correlation is assumed. In Eq. (6), the kernel is used to estimate a best state for given observations and a priori information (Rodgers 2000), and the best estimate of **X** (i.e., **X**_{best}) for given observations is defined as follows:

Since the kernel is a matrix, we provide increments of temperature and moisture from **X**_{0} to **X**_{best} by assuming all TB perturbations are 1 K (i.e., the TB departure is set to 1 K for all channels; Fig. 2c). For temperature, relatively larger increments are shown in the 200–300-hPa layer and below the 500-hPa level, while for moisture most of the increments are made below the 500-hPa level. Thus, we expect relatively larger improvements over those layers when 1DVAR is applied.

In this study, to determine whether the physical retrieval is converged to an acceptable solution, the following brightness temperature residual (Res) is used:

where *M* is the number of channels used. During the minimization of the cost function in Eq. (4), Res is calculated and examined to determine whether the convergence criterion for the ideal solution, Res < 0.1 K, is met. However, even after the minimization is completed, not all cases meet the Res < 0.1 K condition. In this study, such cases are also accepted as loosely optimized solutions if they satisfy the quality control criterion, Res < 1 K, as in Kwon et al. (2012).

Here, to facilitate understanding of the algorithm, a flowchart summarizing the 1DVAR-based physical algorithm is provided in Fig. 3. With the given AIRS TBs and a priori information (i.e., regression retrieval), the first updated state (**X**_{2}) is estimated from the guess field (**X**_{1}) using an iterative method of 1DVAR, which starts with a regression retrieval. Then at each iteration step, residuals of both guess and updated fields are calculated. When the residual of the updated field becomes smaller and is less than 0.1 K, the iteration stops and the updated state is set to a final solution (here, the best solution). If Res ≥ 0.1 K, then the algorithm keeps updating from the previous estimate. At this time, convergence is accelerated by reducing the weight of a priori information with the use of the error balancing factor (i.e., the a priori weighting is scaled by γ = 0.8). This iteration process for updating is allowed up to six times. After six times, the last updated state is considered to be the final solution.

During the minimization process, the residual of the updated state may become larger than that of the previous state. If this is the case, then an increased weight of a priori information is used (i.e., the a priori weighting is scaled by γ = 1.8) to allow a more stable iteration in the next step. This stabilizing process is allowed up to three times. If the residual is not further reduced within three attempts, the iteration stops and the state having the lowest residual is set to the final solution. In this algorithm, up to nine iterations are allowed, and the mean iteration number in this study is found to be about 6.4.
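The iteration control described above can be sketched as a small driver loop. This is one possible reading of the flow, not the CIMSS implementation: `step` and `residual` are hypothetical callables standing in for the 1DVAR update and the brightness temperature residual, the balancing factor is assumed to be multiplied by 0.8 after a successful update and by 1.8 after an unsuccessful one, and the two counters are assumed to be kept separately. With this bookkeeping the loop stops as soon as either limit is reached, so it performs at most eight steps, slightly fewer than the nine iterations mentioned above.

```python
def run_physical_retrieval(x_first, step, residual, gamma=1.0, tol=0.1,
                           max_updates=6, max_stabilize=3):
    """Iterate a 1DVAR update with accept/reject logic on the residual.

    step(x, gamma) -> new state; residual(x) -> scalar residual [K].
    Returns the lowest-residual state, its residual, and the step count.
    """
    x_cur, res_cur = x_first, residual(x_first)
    n_updates = n_stabilize = 0
    while n_updates < max_updates and n_stabilize < max_stabilize:
        x_new = step(x_cur, gamma)
        res_new = residual(x_new)
        if res_new < res_cur:
            # Accept the update; x_cur always holds the lowest residual so far.
            x_cur, res_cur = x_new, res_new
            n_updates += 1
            if res_cur < tol:
                break
            gamma *= 0.8  # reduce the a priori weight to speed up convergence
        else:
            # Reject the update and stabilize with a larger a priori weight.
            n_stabilize += 1
            gamma *= 1.8
    return x_cur, res_cur, n_updates + n_stabilize
```

Because rejected updates are never stored, the returned state is the one with the lowest residual encountered, matching the rule for the case in which stabilization fails.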

As shown in Fig. 3, the final solution is one of the following three: 1) the best solution satisfying the preset convergence criterion (Res < 0.1 K), 2) the loosely optimized solution (0.1 K ≤ Res < 1 K), and 3) the not-converged solution (Res ≥ 1 K). In this study, the first two classes (i.e., Res < 1 K), outlined as a dotted box in Fig. 3, are considered to be accepted physical solutions.
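The residual check and the three solution classes can be sketched as follows. The residual is assumed here to be the root-mean-square difference between observed and calculated brightness temperatures over the *M* channels used, which is an assumed form; the function names are hypothetical.

```python
import numpy as np

def tb_residual(tb_obs, tb_calc):
    """Root-mean-square brightness temperature difference over channels [K]."""
    diff = np.asarray(tb_obs, dtype=float) - np.asarray(tb_calc, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def classify_solution(res):
    """Map a residual to one of the three solution classes."""
    if res < 0.1:
        return "best"
    if res < 1.0:
        return "loosely optimized"
    return "not converged"

def is_accepted(res):
    """Accepted physical solutions are the first two classes (Res < 1 K)."""
    return res < 1.0
```

For example, observed values of 250.0 and 260.0 K against calculated values of 250.3 and 259.6 K give a residual of about 0.35 K, which is a loosely optimized but accepted solution.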

## 3. Experimental design

To examine the effect of a priori information on the physical retrieval, we compare two physical retrievals based on their respective regression-based initial fields: 1) the regression retrieval obtained using the moving-window technique and 2) the CIMSS regression retrieval. Before doing so, the accuracies of the two regression retrievals are compared. In these comparisons, all AIRS granules over the East Asian domain for 4 months (March, June, September, and December of 2010) are used as inputs (NASA 2007). As an additional predictor, we use surface pressure from collocated ERA-I data. Retrievals are performed only over clear-sky regions. To identify clear-sky pixels, we use cloud information obtained from the International MODIS/AIRS Processing Package (IMAPP) MODIS–AIRS collocation package (https://cimss.ssec.wisc.edu/imapp/uwairs_utils_v1.0.shtml), which provides cloud amount at each AIRS pixel (Li et al. 2004). The AIRS pixel is considered to be clear when the obtained cloud amount is zero.

Retrieval accuracy is examined in terms of error statistics for ERA-I data (used as reference) and collocated AIRS retrievals averaged over the 0.75° × 0.75° ERA-I grid box. ERA-I data are also linearly interpolated in time to match the AIRS observation time. In examining the retrieval accuracy, however, caution should be exercised because both ERA-I and AIRS retrievals are subject to uncertainties, and thus the difference between the two fields is not a direct measure of AIRS errors. Furthermore, spatial and temporal interpolation may introduce another source of error. Nevertheless, in this study, the collocation-caused error is considered to be negligible and the errors in the ERA-I data are considered to be minor, although the surface layers in the model outputs tend to be more uncertain.

## 4. Regression results

Using the methodology described in section 3, the accuracies of the moving-window and CIMSS regression retrievals are examined, and the results are presented in Fig. 4. The mean reference profiles of temperature and water vapor from the ERA-I data are given in Fig. 4a, while the error statistics of the temperature and moisture retrievals are shown in Figs. 4b and 4c, respectively. In the case of water vapor comparison, the retrieved water vapor mixing ratio is converted into relative humidity using the saturation vapor pressure estimated from the collocated ERA-I temperature. Nearly 150 000 matched samples are used for this comparison, but the number of data used for the comparison is smaller near the surface owing to the surface topography.
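The level-by-level error statistics can be computed along the following lines. The sketch assumes the matched profiles are stored as arrays on common pressure levels, with NaN marking levels that lie below the local surface, so that the sample count decreases toward the surface; the function name is hypothetical.

```python
import numpy as np

def profile_error_stats(retrieved, reference):
    """Mean bias, RMSE, and sample count at each pressure level.

    retrieved, reference : (n_samples, n_levels) matched profiles,
    with NaN where a level is below the surface.
    """
    diff = retrieved - reference
    count = np.sum(~np.isnan(diff), axis=0)
    bias = np.nanmean(diff, axis=0)
    rmse = np.sqrt(np.nanmean(diff ** 2, axis=0))
    return bias, rmse, count
```

The same function applies to temperature in kelvins and to relative humidity in percent, since it only operates on differences from the reference.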

Here we provide mean temperature and moisture profiles to help understand the error statistics. The mean temperature profile in Fig. 4a shows a monotonic decrease from 290 to 210 K, from the surface to the level of 100 hPa. The mean relative humidity exhibits a pattern of reduction from ~70% at the surface to ~35% at 500 hPa, followed by an increase up to the level of 250 hPa. This peak may be caused by a relatively lower ice saturation vapor pressure that was used for calculating the relative humidity. Above the level of 250 hPa, the humidity drops to 20% at 150 hPa. It is also interesting to note that the reduction in the relative humidity is highest in layers below 850 hPa.

Regarding temperature, the mean bias of the moving-window regression retrieval is under 0.6 K and the root-mean-square error (RMSE) is in the 1.5–3-K range. The bias is smaller than that of the CIMSS regression retrieval, except for the layer above the 150-hPa pressure level. On the other hand, the RMSEs of the moving-window and CIMSS regression retrievals show nearly the same patterns and have nearly the same magnitudes, although the moving-window technique yields slightly smaller values near the surface and for the upper troposphere, above the 150–300-hPa layer. It is of interest to note that both retrievals show the largest RMSEs near the surface layer (below ~800 hPa), which may be due to the effects of surface parameters (i.e., skin temperature and surface emissivity) on the temperature channels. The larger RMSEs may also stem from more uncertain features of model outputs (here ERA-I) near the surface. Distinct surface features such as diurnal variations of temperature and associated humidity field variations can also induce collocation errors, causing larger RMSEs in the surface layers.

The major improvement of the current moving-window technique, in comparison to the CIMSS regression retrieval, is with respect to the mean bias, which suggests that the major benefit introduced by the moving-window technique is the removal of the mean bias. This is probably because the current regression method fits the retrievals better to the local climatology; yet, the proposed method seems to have a limited impact on explaining the variance of the parameter about the mean climatology. In particular, midtropospheric temperature around the 350–850-hPa layer shows no improvement in terms of RMSE. In this layer, we expect the regression performance to be relatively independent of the classification method, because a relatively straightforward relationship between temperature and observed radiance can be expected there. One possible reason for this is that midtropospheric temperature channels are nearly independent of surface parameters; that is, radiance observed from midtropospheric temperature channels is a function only of atmospheric temperature. A relatively simple temperature profile found in the midtroposphere can also explain good regression performance. The other possible reason is that the classification approach employed in this study may not be fully optimal. If the training data were classified according to atmospheric situations, then better regression could be expected over the midtroposphere. For example, the recent validation study for IASI, based on the classification with respect to the atmospheric situations (so-called piecewise regression), reported that the precision in the midtroposphere is closer to 1 K (EUMETSAT 2016b).

One interesting feature of the error statistics is the error peak in the upper troposphere (~200–300 hPa), suggesting less confidence for retrievals over that layer. Increasing complexity of the temperature field may be one reason for this: the stratospheric influence starts to appear in this layer, and the lapse rate also changes around it. A detailed discussion of this phenomenon is beyond the scope of the present study, but we expect it to be also related to season and geographical location, because the moving-window technique shows a significant improvement over the 200–300-hPa layer.

The water vapor retrievals based on the moving-window technique are compared with the CIMSS regression results (Fig. 4c). The mean bias ranges from −12% to 2% and the RMSEs are in the 8%–22% range. Compared with the CIMSS regression retrievals, the moving-window technique yields a significantly smaller mean bias, except for the surface layer, in which the mean bias increases toward the negative value. The RMSE becomes smaller over the whole troposphere, but the major improvement appears to occur over the 200–400-hPa atmospheric layer.

Similar to the temperature retrieval, the main improvement for the water vapor retrieval is also in the bias removal, probably because this model more precisely describes the local climatology. In addition, smaller variances of regression coefficients again suggest that regionally and seasonally focused training data can help enhance the regression performance.

## 5. Effect of a priori information on the physical retrievals

Above, we showed that the moving-window regression method yields better temperature and moisture profiles than those obtained using the CIMSS regression method with training on global data. Because, in general, physical methods with better initial data will yield better retrievals, it is of interest to examine to what extent the proposed method improves retrieval, compared with the retrieval results obtained using the CIMSS regression method.

We examine the error statistics of retrievals when two different methods for producing initial data—moving-window-based regression retrieval and CIMSS regression retrieval—are used for running the CIMSS physical model. As described in section 2c, Res < 1 K was used as a criterion for determining the accepted solution.

For temperature, the mean bias of the physical retrieval initialized with the moving-window regression ranges from −0.5 to 0.3 K for the entire troposphere, and the RMSE is in the 1.2–2.5-K range (Fig. 5a). It is noted that these results are significantly improved, in terms of both the mean bias and RMSE, compared with the regression results in Fig. 4b. In particular, much improvement is evident for the lower layer near the surface. The better performance demonstrated by this physical algorithm seems to be largely attributable to the improved radiative consistency between the observed radiance and the radiance calculated from the retrieval. Improved performance of the physical method using CIMSS regression retrievals as inputs is also clear. Although the bias values in the two cases may be similar to each other, the physical retrieval initialized with the moving-window regression seems to yield a smaller RMSE than that initialized with the CIMSS regression, likely owing to a smaller error in the moving-window regression retrieval, as shown in Fig. 4b. It is interesting to note that the RMSEs in the two cases are nearly the same over the 350–700-hPa layer and over the layer above the 150-hPa level, although the RMSE of the moving-window regression retrieval is slightly larger than that of the CIMSS regression retrieval for that layer, suggesting that the CIMSS physical method is less dependent on the input data over the 350–700-hPa layer.

With regard to the moisture retrieval (Fig. 5b), both methods also yield significant improvement compared with regression retrievals, especially over the surface–400-hPa layer. Although better results can be expected using the moving-window regression retrieval, which shows a slightly smaller RMSE from the surface to the 400-hPa layer, the two physical retrievals show nearly the same RMSEs. This suggests that the physical retrieval of water vapor tends to be dominantly determined by radiance rather than by a priori information. In other words, the error associated with the a priori information is relatively larger than that associated with AIRS radiances.

It is worthwhile to note that both water vapor retrievals show consistently dry biases over the upper troposphere and that these dry biases may cause large RMSEs over the 200–300-hPa layer. Considering that the retrievals have residuals smaller than 1 K, and thus the radiance consistency is strong, we suspect a humid bias in the ERA-I reference data in the upper troposphere. Such a humid bias of the ERA-I data in the upper troposphere against radiosonde observations has been reported (Noh et al. 2016).

In this study, retrieval outcomes having Res ≥ 1 K are considered as not-converged solutions. At this point, it is interesting to examine the error characteristics of cases not meeting the quality control criterion. Results showing Res ≥ 1 K for both *T* and *q* exhibit biases and RMSEs similar to those obtained for Res < 1 K, albeit larger in magnitude, in particular over the lower atmosphere below 700 hPa (Figs. 6a and 6b). Considering the similar statistics above the 700-hPa level, it is suggested that the lower atmosphere could cause such a larger residual owing to the imbalance between the observed and calculated radiances. Compared with the physical retrieval initialized with the CIMSS regression, the RMSEs of the physical retrieval initialized with the moving-window regression are smaller, both for temperature and water vapor. For temperature, the RMSEs over the 150–300- and 850–1000-hPa layers are improved, while for water vapor the RMSEs over the 200–400- and 850–1000-hPa layers are improved. In addition, the biases of the physical retrieval initialized with the moving-window regression below the 150-hPa level are smaller than those of the retrieval initialized with the CIMSS regression.

In conclusion, the physical retrieval initialized with the moving-window regression can be considered more accurate than that initialized with the CIMSS regression, in terms of both temperature and water vapor retrievals, as shown in Figs. 5 and 6. It is shown that the regression retrieval with the moving-window technique improves the physical retrievals of temperature and water vapor profiles. If we focus on the comparison of accepted solutions, improved RMSEs of temperature are evident in the 150–350-hPa layer and below the 900-hPa level. On the other hand, the RMSEs of the humidity profiles are clearly improved in the layer above ~300 hPa. The RMSEs in the 750–900-hPa layer indicate reduced magnitudes associated with the physical retrievals. However, it should be noted that the temperature retrieval bias becomes large above the 200-hPa level and that the water vapor retrieval bias is higher at the surface.

## 6. Summary and discussion

To improve the temperature and moisture retrievals from hyperspectral AIRS measurements, we attempted to improve the regression retrievals that can be used as a priori information for the physical retrieval model. In doing so, a moving-window technique was developed based on the assumption that the inclusion of local climate features in the regression procedures will yield better regression results compared with the CIMSS regression method, which does not consider local features (as trained with global data). The moving-window technique performs the regression at a given location (a 10° × 10° grid box) and at a given time (any season). For regional and global applications, the regression box can be continuously moved to the adjacent location (i.e., another 10° × 10° box) for performing regression on that particular box area. The obtained a priori information was then used to constrain the CIMSS physical model, and the results were compared with those obtained using the a priori information obtained from the CIMSS regression method. In this study, the developed moving-window technique was applied to 4 months (March, June, September, and December of 2010) of AIRS clear-sky measurements over East Asia.

Notably, regression retrievals based on the moving-window technique yielded smaller mean biases and RMSEs compared with the CIMSS regression retrievals. Because regional and seasonal climate variations were accounted for by the moving-window technique, the mean bias, which reflects the accuracy of the estimated climatology, was substantially improved. The assumption of linearity between the atmospheric state and radiance variables is expected to hold better when using the moving-window technique, because the atmospheric state in a 10° × 10° regression box is one of many realizations yielding the box climatology. In other words, a relatively smaller deviation from the mean state can be expected compared with the method in which the training dataset is collected over a much larger area. By the same token, regressing over a narrower time window (here, a season) can be expected to give a better linear relationship. As a result, regionally and seasonally varying regression coefficients may perform better than coefficients obtained from training data collected over larger regions and longer times.

The effect of localization of regression coefficients derived from the moving-window technique appears to be more significant in situations in which the relationship between measured radiances and atmospheric state is not clear (e.g., the relationship between the measured radiances and the lower boundary layer states). The moving-window technique is particularly advantageous in this situation, compared with the TB-based classification technique used with the CIMSS regression method. The TB-based classification technique may be useful for reflecting the TB dependency in the statistical relationship between the atmospheric state and radiance. However, the moving-window technique may perform less accurately in extreme cases than the CIMSS TB-based classification technique, because it is much more susceptible to departures from the local climatology.

The use of a priori information from the moving-window regression method improved the physical retrieval. For temperature, RMSE improvements of 0.1–0.2 and 0.25–0.5 K were demonstrated over the 150–300- and 900–1000-hPa layers, respectively. For water vapor expressed as relative humidity, RMSEs were reduced by 1.5%–3.5% above the 300-hPa atmospheric pressure level and by 0.5%–1% over the 750–900-hPa layer compared with the results obtained when CIMSS regression retrievals were used as the initial guess fields. However, no effect was observed for the midtroposphere around 350–650 hPa, and water vapor retrievals were not much improved in the lower boundary layer. These modest results suggest that the improvement achievable by refining the regression-based initial field may be limited, and other methods can be devised. For example, also including the numerical weather prediction (NWP) model background in the regression procedure can improve the a priori information for 1DVAR physical retrieval (Jin et al. 2008; Schmit et al. 2008; Li et al. 2009); adding different types of observations, such as surface observations, to the retrieval procedures, as suggested by Liu et al. (2014), may be another way of improving the retrievals.
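The error statistics quoted above are level-by-level mean biases and RMSEs. As a minimal sketch of how such statistics can be computed, assuming retrieved and reference profiles are stored as arrays of shape (number of profiles, number of pressure levels), one could write the following. The function name `bias_and_rmse` and the array layout are assumptions for illustration, not part of the study's processing code.

```python
import numpy as np

def bias_and_rmse(retrieved, reference):
    """Return per-level mean bias and RMSE of retrieved minus reference.

    Both inputs are assumed to have shape (n_profiles, n_levels).
    """
    diff = np.asarray(retrieved, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean(axis=0)                  # mean error at each level
    rmse = np.sqrt((diff ** 2).mean(axis=0))  # root-mean-square error at each level
    return bias, rmse
```

An RMSE improvement at each level is then the difference between the RMSE of the retrievals using the CIMSS regression as a priori information and the RMSE of those using the moving-window regression, with positive values indicating a gain from the moving-window approach.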

## Acknowledgments

The authors convey their sincere thanks to three anonymous reviewers for their valuable comments, which led to the improved version of the manuscript. We would also like to thank Jinlong Li and Zhenglong Li for their support and discussion of this work. This study was supported by the Korea Meteorological Administration Research and Development Program under Grant KMIPA 2015-1060.

## REFERENCES

*Infrared Spaceborne Remote Sensing XII*, M. Strojnik, Ed., International Society for Optical Engineering (SPIE Proceedings, Vol. 5543), 196–207.

*Proceedings of the Fourteenth International TOVS Study Conference*, ITWG, 763–770. [Available online at http://library.ssec.wisc.edu/research_Resources/publications/pdfs/ITSC14/borbas02_ITSC14_2005.pdf.]

*Joint 2007 EUMETSAT Meteorological Satellite Conf. and the 15th Satellite Meteorology and Oceanography Conf. of the American Meteorological Society*, Amsterdam, Netherlands, EUMETSAT and Amer. Meteor. Soc., EUMETSATP.50, P50_S10_03_BORBAS. [Available online at http://www.eumetsat.int/website/wcm/idc/idcplg?IdcService=GET_FILE&dDocName=PDF_CONF_P50_S10_03_BORBAS_P&RevisionSelectionMethod=LatestReleased&Rendition=Web.]

*Inverse Methods for Atmospheric Sounding: Theory and Practice*. Series on Atmospheric, Oceanic and Planetary Physics, Vol. 2, World Scientific, 256 pp.

*Proceedings of the Thirteenth International TOVS Study Conference*, ITWG, 323–330. [Available online at http://library.ssec.wisc.edu/research_Resources/publications/pdfs/ITSC13/weisz01_ITSC13_2003.pdf.]

## Footnotes

© 2017 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).