Information about the vertical distribution of water vapor is useful in many disciplines. For example, it plays an important role in hydrometeorology, space observations, climatology, atmospheric radiation, and operational meteorology. Thus, high-resolution measurements describing the vertical distribution of water vapor would be valuable. However, the network of radiosondes even in the United States is inadequate for this purpose since it has an average spacing of approximately 250 km and only two releases per day. High-resolution water vapor data over large areas can be obtained using geostationary satellites. The Visible Infrared Spin Scan Radiometer (VISSR) Atmospheric Sounder (VAS) on board the GOES-7 satellite has been employed extensively for retrieving water vapor–related quantities. The recently launched GOES-8 and GOES-9 instruments are now being used for this purpose.
Physical and statistical procedures traditionally have been employed to make retrievals. Smith et al. (1985) and Hayden (1988) described a simultaneous physical algorithm for deriving vertical profiles of temperature and moisture from VAS radiances. Parameters such as precipitable water, lifted index, and potential temperature could then be calculated from these profiles. Major limitations of their algorithm are the necessity of a first guess profile and the fact that the final retrieval tends to retain features of this first guess (Hayden 1988; Fuelberg and Olson 1991). Split-window techniques attempt to exploit the differential absorption of water vapor in two adjacent bandpasses. Guillory et al. (1993) described a physical split-window algorithm for deriving precipitable water from VAS 11- and 12-μm radiances. They found that their technique had excellent potential for depicting mesoscale moisture variations.
Statistical algorithms have also been employed to retrieve moisture parameters from VAS radiances. Lee et al. (1983) used a regression procedure incorporating surface observations with the radiance data to determine water vapor quantities. Their scheme could improve the spatial and temporal continuity of retrieved low-level water vapor parameters. They also noted the importance of selecting an appropriate dependent dataset for each regression scheme. Chesters and Mostek (1985) used VAS radiances in a regression algorithm to calculate parameters such as precipitable water, lifted index, layer thickness, and equivalent potential temperature. They found that level-specific data appeared noisier than integrated parameters, most likely due to the poor vertical resolution of VAS. However, they also concluded that VAS could reveal mesoscale features that could not be detected by any other data source.
Techniques incorporating both the physical and statistical approaches also have been developed. Chesters et al. (1983, 1987) developed a split-window algorithm based on a linearized form of the radiative transfer equation. Two steps were needed to apply this technique: 1) a simple parameterization linking the ratio of the 11- and 12-μm transmittances to the water vapor content and 2) an empirical method for determining an air temperature for the radiating atmosphere. The main advantage of Chesters’s technique is the high spatial and temporal resolution that it provides. However, the necessity of a first-guess air temperature is a disadvantage.
With only 12 infrared channels, the VAS instrument provides poor vertical resolution. The GOES-8 and GOES-9 satellites, launched in April 1994 and May 1995, respectively, yield improved vertical and horizontal resolutions, noise magnitudes, and temporal coverage (Menzel and Purdom 1994). With these improvements, GOES-8 and GOES-9 should provide superior information about the structure of moisture and temperature.
Vertical profiles of equivalent potential temperature (Θe) provide information about the vertical distribution of water vapor. In addition, convective instability, also called potential instability, is based on the variation of Θe with height, that is, ∂Θe/∂z (Carlson 1991), hereafter denoted ΔΘe. A layer is stable (unstable) if the derivative is positive (negative). This stability parameter is useful in forecasting since it denotes areas that may experience severe weather. Satellite-derived ΔΘe would be helpful in thunderstorm prediction since mesoscale temporal and spatial resolution could be achieved.
This paper describes a procedure for determining ΔΘe from GOES-8 radiance data. The data and methodologies are described in section 2. Section 3 discusses how the statistical model was selected. The performance of the model is discussed in section 4. Finally, section 5 contains conclusions about the usefulness of statistical regression for retrieving moisture parameters from GOES-8 radiances.
GOES-9 radiance data could easily be used in a procedure such as ours since the two instruments are nearly identical. However, GOES-8 is examined here because GOES-9 data were not yet available at the time of our calculations; GOES-9 became operational in January 1996.
Data and methodologies
Menzel and Purdom (1994) provide a detailed description of the GOES-8 satellite and its improvements over GOES-7. Briefly stated, GOES-8 contains 22 infrared sensing channels. Table 1 lists these channels along with their central wavelengths, major purposes, and approximate weighting function peaks in a standard atmosphere. (The weighting function is derived from the vertical change of transmittance, i.e., ∂τ/∂ ln p.) Eighteen of the channels are used for sounding purposes, while four are for imaging. Together, the 22 channels include four midtropospheric water vapor channels, five split-window channels, five longwave temperature channels, three shortwave temperature channels, four shortwave window channels, and one ozone channel. With its additional midtropospheric water vapor channels, GOES-8 should provide improved vertical moisture structure compared to GOES-7. The better horizontal resolution in the midtropospheric water vapor channels (about 8 km at nadir for GOES-8 vs 14 km at nadir for GOES-7) will allow moisture gradients to be better defined. The improved noise magnitudes in the water vapor channels (0.3 K at 230 K for GOES-8 vs 1.0 K at 230 K for GOES-7) should increase the accuracy of retrievals. A study using GOES-8 retrievals can be found in Hayden et al. (1996).
The algorithms for determining ΔΘe were developed from a set of training data and then applied to independent data. The training data consisted of National Weather Service radiosonde observations (RAOBs) from 10 cities during 1979, including both 0000 and 1200 UTC releases. A full year of soundings was used to maximize the likelihood that most types of temperature and moisture profiles would be included. Cloudy soundings were omitted because infrared radiances are contaminated by clouds. A sounding was assumed to be cloudy if the dewpoint depression was 1°C or less anywhere in the vertical profile.
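The cloud screening rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `is_cloudy` and the sample soundings are hypothetical, and only the 1°C dewpoint depression threshold comes from the text.

```python
def is_cloudy(temps_c, dewpoints_c, threshold_c=1.0):
    """Return True if the dewpoint depression is <= threshold_c at any level."""
    return any((t - td) <= threshold_c for t, td in zip(temps_c, dewpoints_c))

# Hypothetical soundings (deg C, surface upward); values are illustrative only.
clear_sounding = {"t": [25.0, 18.0, 5.0, -10.0], "td": [18.0, 10.0, -8.0, -30.0]}
cloudy_sounding = {"t": [22.0, 15.0, 4.0, -12.0], "td": [19.0, 14.5, -2.0, -25.0]}

soundings = [clear_sounding, cloudy_sounding]
# Keep only the soundings that pass the cloud screen.
cloud_free = [s for s in soundings if not is_cloudy(s["t"], s["td"])]
```

The second sounding is rejected because its depression at the second level is 0.5°C.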
The independent data consisted of a model run from the Limited Area Mesoscale Prediction System (LAMPS) provided by the National Aeronautics and Space Administration/Marshall Space Flight Center. Doty and Perkey (1993) provide a complete description of LAMPS. LAMPS is a hydrostatic, primitive equation mesoscale model. It has time-dependent boundary conditions obtained from linearly interpolated (in time) National Centers for Environmental Prediction (NCEP) data grids, with a four-gridpoint “sponge” layer on each boundary. Model initial conditions were prepared from the NCEP 381-km mesh global analysis that was interpolated to the LAMPS grid as a first guess field and then modified with sounding data. The version of LAMPS employed here had 34 sigma-height levels between 0 and 16 km, with a horizontal spacing of approximately 35 km. The model domain was 23.8°–49.3°N and 70.5°–111.7°W. The model simulation was initialized at 1200 UTC 17 June 1986 and was run for 24 h.
Our statistical algorithms for ΔΘe were derived from simulated GOES-8 radiances that were calculated for each sounding comprising the training data. This was done using a radiative transfer model in which the vertical profile of temperature and dewpoint was input, with the output being radiation data for the GOES-8 channels. McMillin et al. (1979) provide a more detailed discussion of the model. Emissivities for all wavelengths were assumed to equal 1.0, and skin temperatures were calculated by adding a temperature factor to the shelter values. This factor depended upon time of day and location (latitude and longitude) and is similar to that used by Jedlovec (1990). Radiances were calculated for all 22 imager and sounder channels.
The actual (ground truth) convective instability was also needed. Values of Θe were obtained from the radiosonde soundings using standard thermodynamic equations; ΔΘe was calculated by subtracting Θe at 920 hPa from the value at 620 hPa. These levels were chosen for two reasons: 1) 920 and 620 hPa are usually within the moist and dry layers associated with convective instability and 2) many of the GOES-8 channels are sensitive at or near these levels (Table 1).
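The ground truth calculation can be illustrated as follows. The text states only that standard thermodynamic equations were used, so this sketch assumes the widely used Bolton (1980) approximation for Θe; the function names and the sample temperatures and dewpoints are hypothetical. Only the 920- and 620-hPa levels and the order of subtraction come from the text.

```python
import math

def vapor_pressure_hpa(dewpoint_k):
    """Vapor pressure (hPa) from dewpoint, Bolton (1980) fit (assumed here)."""
    tc = dewpoint_k - 273.15
    return 6.112 * math.exp(17.67 * tc / (tc + 243.5))

def theta_e(p_hpa, temp_k, dewpoint_k):
    """Equivalent potential temperature (K), Bolton (1980) approximation."""
    e = vapor_pressure_hpa(dewpoint_k)
    r = 622.0 * e / (p_hpa - e)  # mixing ratio in g/kg
    # Temperature at the lifting condensation level (K).
    t_lcl = 1.0 / (1.0 / (dewpoint_k - 56.0)
                   + math.log(temp_k / dewpoint_k) / 800.0) + 56.0
    theta = temp_k * (1000.0 / p_hpa) ** (0.2854 * (1.0 - 0.28e-3 * r))
    return theta * math.exp((3.376 / t_lcl - 0.00254) * r * (1.0 + 0.81e-3 * r))

def delta_theta_e(upper, lower):
    """Theta-e at the upper level minus theta-e at the lower level."""
    return theta_e(*upper) - theta_e(*lower)

# Hypothetical (pressure hPa, temperature K, dewpoint K) at the two levels.
level_920 = (920.0, 295.0, 291.0)   # warm, moist low level
level_620 = (620.0, 275.0, 255.0)   # dry midlevel
instability = delta_theta_e(level_620, level_920)  # negative means unstable
```

A moist low level beneath a dry midlevel yields a negative value, consistent with the sign convention used throughout the paper.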
Our goal was to determine convective instability from GOES-8 channel brightness temperatures (TB) using regression. We did not consider sounder channel 9 (ozone), nor sounder channels 16, 17, and 18 and imager channel 2 (the four shortwave window channels). The ozone channel was not used because of limitations in its simulation. The four shortwave window channels (∼3.9 μm) are near the solar end of the infrared spectrum and can be affected by solar reflection (Liou 1980). Since our radiative transfer code did not account for reflection, these channels also were eliminated.
The SAS statistical package (SAS 1990) was used to develop the regression relation between observed ΔΘe and the simulated radiance data. A backward elimination regression procedure was utilized. The initial regression included all independent variables (GOES-8 channels). If a variable did not contribute to the statistical model within a certain threshold, it was eliminated. A p value of 0.10 (defined later) was found to be the optimum threshold. A regression then was run with the remaining variables, using the same elimination procedure. This was repeated until all remaining variables contributed significantly to the model. At this point, a plot of the predicted ΔΘe’s (those found via the model) versus the residuals (actual minus predicted) allowed us to determine whether a bias was present such that a quadratic term should be considered. These plots (not shown) depicted only random scatter, implying that no bias was present.
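The backward elimination loop described above can be sketched without SAS. This is an illustration under stated assumptions rather than a reproduction of the SAS procedure: p values are approximated with a normal distribution in place of the t distribution (reasonable only for large samples), the predictors are synthetic brightness temperature anomalies, and all function and variable names are hypothetical. Only the 0.10 threshold and the drop-and-refit logic come from the text.

```python
import math
import random
from statistics import NormalDist

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_p_values(columns, y):
    """Fit y = b0 + sum(b_k * x_k); return a two-sided p value for each b_k."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    # Normal approximation to the t distribution (an assumption of this sketch).
    return [2.0 * (1.0 - NormalDist().cdf(abs(beta[a]) / math.sqrt(sigma2 * inv[a][a])))
            for a in range(1, k)]

def backward_eliminate(predictors, y, threshold=0.10):
    """Drop the least significant predictor until every p value is <= threshold."""
    kept = dict(predictors)
    while kept:
        names = list(kept)
        p = ols_p_values([kept[name] for name in names], y)
        worst = max(range(len(names)), key=lambda i: p[i])
        if p[worst] <= threshold:
            break
        del kept[names[worst]]
    return list(kept)

# Synthetic TB anomalies (K); ch_c is unrelated to y by construction.
random.seed(0)
n_obs = 200
ch_a = [random.gauss(0.0, 5.0) for _ in range(n_obs)]
ch_b = [random.gauss(0.0, 5.0) for _ in range(n_obs)]
ch_c = [random.gauss(0.0, 5.0) for _ in range(n_obs)]
y = [1.5 * a - 2.0 * b + random.gauss(0.0, 1.0) for a, b in zip(ch_a, ch_b)]
selected = backward_eliminate({"ch_a": ch_a, "ch_b": ch_b, "ch_c": ch_c}, y)
```

The two informative predictors are always retained here. The unrelated one is usually removed, although with a 0.10 threshold it survives by chance about one time in ten.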
Numerous combinations of the radiosonde training data were examined in an effort to produce the best statistical model. Specifically, four basic subsets of data were further stratified into seven different combinations depending on time (0000 and 1200 UTC) and instrument (sounder and imager). The four subsets were 1) 414 cloud-free soundings from Oklahoma City, Oklahoma (OKC), 2) 1188 cloud-free soundings from the southeastern United States (Jackson, Mississippi; Waycross, Georgia; Centerville, Alabama; Apalachicola, Florida; and Nashville, Tennessee), 3) 428 cloud-free soundings from the northeastern United States (Pittsburgh, Pennsylvania; Buffalo, New York; and Albany, New York), and 4) 295 cloud-free soundings from New York City (JFK). These four subsets will be denoted as OKC, SE, NE, and JFK, respectively, for the remainder of this paper. These subsets were utilized for two reasons: 1) to determine whether regional weather characteristics would influence the regression model, and 2) to determine whether using one location as opposed to multiple locations would influence the results (e.g., OKC vs SE).
Table 2 gives values of R2 (explained variance) for each version of training data. All four subsets show similar trends for the different combinations of data. Specifically, the models using either 0000 or 1200 UTC data alone (a, b, c, and g) outperform those using both times together (d, e, and f); R2 values for the former range from 80% to 95%, while those of the latter range from 56% to 88%.
Smaller R2 is expected for models d, e, and f since the combination of 0000 and 1200 UTC soundings adds more variability to the training data. That is, one cause of decreasing R2 occurs when different temperature–dewpoint profiles produce similar convective instability. For example, a profile having an upper-level Θe of 10°C and a lower-level Θe of 30°C would have the same convective instability as a profile having an upper- and lower-level Θe of 25°C and 45°C, respectively. However, these two profiles most likely have different temperature and moisture characteristics and different brightness temperatures. These situations will be more common in large sets of training data. In a similar way, Table 2 also shows that subsets with only one location (OKC and JFK) generally outperform those with multiple stations (SE and NE).
When the imager channels are included with the sounder channels, the greatest increase in explained variance is only 3%—that is, between models a and c from OKC. Thus, the imager channels add little additional information for determining convective instability. Furthermore, relatively small R2 values occur when only the imager channels are employed (model e).
In summary, the models trained on data with relatively small variability show the largest R2 values. Specifically, models b and g using OKC data (Table 2) explain the greatest variance. They contain only 0000 UTC soundings and produce R2 values near 95%.
It is important to determine how the statistical models perform at times other than 0000 and 1200 UTC. Therefore, each model was applied to several times of the LAMPS run constituting the independent data. All points in the LAMPS grid were included. The days 17–18 June 1986 contained large horizontal and vertical gradients of water vapor that made them ideal for examining the performance of our procedure. A synoptic discussion and humidity analysis of the period will be given in a later section. Channel brightness temperatures were calculated at each LAMPS grid point using the same radiative transfer code as for the training data. Other methodologies were also identical to those described previously.
Table 3 lists R2 values for models b and c when applied at 1800 UTC 17 June, and 0000 and 0600 UTC 18 June 1986. All of the models discussed previously (Table 2) were applied to the LAMPS data (not shown), but results either were similar or inferior to those of b or c.
Results for OKC and JFK show that models trained on 0000 UTC data (model b) perform best at 0000 UTC. Similarly, models trained on 1200 UTC data (model c) perform best at 1800 or 0600 UTC; R2 values for OKC range from 63% (model c) to 81% (model b), while values for JFK range from 70% (model c) to 78% (model b).
Results for the SE and NE subsets are different from those for OKC and JFK. Model SE, trained on the five southeastern cities, shows better R2 values at 1800 UTC than at the other times. This occurs because the relatively warm surface temperatures in the 1800 UTC LAMPS data are more similar to the training data (relatively warm southeastern cities) than are the 0000 or 0600 UTC LAMPS data. The opposite holds for the NE subset, which was trained on three northeastern cities. Here, the best results occur at 0600 UTC; R2 values for SE range from 63% (model c at 0000 UTC) to 76% (model b at 1800 UTC), while values for NE range from 63% (model b at 1800 UTC) to 76% (model b at 0600 UTC).
The results indicate that the models tend to perform best on conditions that are similar to those of their training data. Further evidence of this will be given in later sections. In general, for all four subsets, the models trained on 0000 UTC data perform better with the June 1986 LAMPS run than those trained on 1200 UTC. Model b for OKC has the overall best performance at the three LAMPS data times. This is consistent with the previous discussion (Table 2). In addition, those models using both the imager and sounder channels (not shown) are only slightly superior to those using just the sounder channels. Since addition of the imager channels would increase the computational requirements, they were eliminated from further consideration. Finally, models using both 0000 and 1200 UTC data (not shown) are inferior to models using data at just one time.
Based on the results discussed above, model b, with OKC training data, was chosen for further investigation. Figure 1 shows a time series of mean ΔΘe differences and standard deviations of ΔΘe differences between actual and predicted values for this model. The entire LAMPS data domain is considered. At 0000 UTC 18 June, the mean difference is 0.0°C, while the standard deviation is near its minimum (∼4.3°C). Actual values are greater than predicted at times when surface temperatures are warmer than those at 0000 UTC. The opposite holds true at the cooler times. Standard deviations indicate that the variability of the differences increases nearly symmetrically both before and after 0000 UTC. This cyclic trend in both statistics is expected since the model was trained at 0000 UTC but applied to other times.
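The two statistics plotted in Fig. 1 can be computed as sketched below. Differences are taken as actual minus predicted, following the text. The choice of the population standard deviation is an assumption, since the paper does not state which form was used, and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def difference_stats(actual, predicted):
    """Mean and standard deviation of (actual - predicted) differences."""
    diffs = [a - p for a, p in zip(actual, predicted)]
    return mean(diffs), pstdev(diffs)

# Hypothetical values (deg C) at three grid points for one model time.
actual_dte = [2.0, -4.0, 6.0]
predicted_dte = [1.0, -5.0, 8.0]
mean_diff, sd_diff = difference_stats(actual_dte, predicted_dte)
```

Repeating the calculation at each LAMPS output time would give one point per time on each curve.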
Model b with OKC data utilizes eight sounder channels: longwave channels 2, 3, and 4; split-window channel 8; midtropospheric water vapor channels 10, 11, and 12; and shortwave channel 13. These channels are listed in Table 4, along with corresponding parameter estimates and p values of these parameter estimates. The p value of a parameter estimate is a measure of confidence in the independent variable. Loosely, it is the probability of obtaining an estimate at least this large in magnitude if the variable actually contributed nothing to the fit. Each of our independent variables (channels) exhibits a very small p value (less than 0.003). The p value of the entire model (the observed significance level) is also very small (0.001). Nearly all of the statistical models (described earlier) utilized the same or similar channels as model b from OKC. A later section will show that it is beneficial to apply more than one model to a given time. However, we first will investigate the single model in more detail.
Parameter estimates (Table 4) are sometimes referred to as partial slopes. In a multiple first-order regression model such as ours, the parameter estimate for each channel represents the expected change in the dependent variable (ΔΘe) for each unit increase in the independent variable (channel TB), holding all other independent variables constant (Ott 1993). Shortwave temperature channel 13, with the greatest response near the surface (Table 1), has a negative contribution to the model. As the surface temperature increases, the brightness temperature of this channel increases, producing a decrease in ΔΘe (greater instability). Other channels that contribute negatively to the predicted value are midtropospheric water vapor channels 10 and 12, and longwave temperature channel 2. Channel 2, peaking near 250 hPa, is affected by upper-level temperature changes. Radiances for channels 10 and 12 increase as the middle troposphere dries, thereby enhancing the convective instability (ΔΘe decreases). The remaining channels in the algorithm, sounder channels 3, 4, 8, and 11, all provide a positive contribution to the model. Channels 3 and 4 are similar to channel 2 but respond to layers in the middle troposphere. Channel 8, the 11-μm “clean” split-window channel, is relatively insensitive to water vapor absorption but is affected strongly by surface temperature. Channel 11 is a midtropospheric water vapor channel with a weighting function peak in between those of channels 10 and 12, yet its parameter estimate is opposite in sign to theirs. Channel 10’s large negative parameter estimate can be easily explained (discussed later). The opposite signs between the parameter estimates of channels 11 and 12 most likely occur because of a small degree of multicollinearity. For example, any change in channel 11 is slightly negated by a similar change in channel 12.
Multicollinearity is expected when using multiple radiometric channels in a procedure such as ours. Specifically, since the satellite’s channels sense broad layers within the atmosphere, there is significant overlap among some of them. The TB’s of these channels are interrelated. For example, as channel 10’s TB increases, so does the TB from channel 11. There are ways by which this effect can be detected and minimized. By comparing the p value of the entire model with those of the parameter estimates, the degree of multicollinearity can be estimated. For example, if the p value of the entire model indicates statistical significance, but the p value of each parameter estimate is large, it is difficult to determine which channels are important because they might be measuring similar phenomena. A small degree of multicollinearity can be seen in our model by noting the relatively large p values of channels 3 and 12. However, these channels still are significant at the 0.003 level, implying statistical importance. We strove to minimize multicollinearity by using a stepwise regression (a backward elimination procedure is one form of this) to eliminate unwanted independent variables. We believe that multicollinearity exists to only a small degree in our model.
The inclusion of channels 4, 8, and 10–13 in our model can be physically explained quite easily. Here, ΔΘe depends greatly on both temperature and moisture in the middle and lower troposphere, and these channels are designed to measure those parameters. However, the presence of channels 2 and 3 is more difficult to explain. They are most likely included because their very broad weighting functions can extend below 600 hPa. Thus, their TB’s are affected by changes in the 620-hPa temperature. The relatively small parameter estimates and cold TB’s associated with these channels imply that their influence on predicted convective instability is small compared with the other independent variables.
Influence of random errors
It is important to determine the influence of random errors that will occur in observed GOES-8 data. Lee et al. (1983) and Chesters et al. (1983) state that estimates of random noise should be included when developing statistical models trained on small samples. We did not include noise in our procedure, under the assumption that the training dataset was sufficiently large. To test this assumption, we added random noise to the TB’s and then ran the same backward elimination procedure described.
The rms noise estimates (NEDT) for GOES-8 from Menzel and Purdom (1994) were randomly applied to all TB’s calculated from the Oklahoma City soundings. This was done five separate times. Table 5 contains parameter estimates from the original noise-free model b (column A) and the five runs containing errors (columns B–F). The predicted values of ΔΘe for representative stable (s) and unstable (u) soundings, along with the explained variance for all 224 soundings, are also listed. Only slight differences, generally less than 1 unit, occur in the parameter estimates when random error is introduced. It also is encouraging that predicted ΔΘe’s for the stable and unstable soundings show maximum changes of only 0.46° and 1.32°C, respectively. The explained variances for all soundings decrease only 1% when the random errors are included. While random noise does affect the regression models, these results indicate that our set of training data is sufficiently large to keep its influence small.
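The noise experiment can be sketched as below. The text says only that the rms noise estimates were randomly applied, so drawing zero-mean Gaussian noise with standard deviation equal to each channel's NEDT is an assumption. The TB's and NEDT values shown are hypothetical placeholders, not those of Menzel and Purdom (1994).

```python
import random

def add_noise(tb_by_channel, nedt_by_channel, rng):
    """Return a copy of the TB's with zero-mean Gaussian noise of the given rms added."""
    return {
        ch: [tb + rng.gauss(0.0, nedt_by_channel[ch]) for tb in tbs]
        for ch, tbs in tb_by_channel.items()
    }

# Hypothetical TB's (K) for two channels and hypothetical NEDT values (K).
tbs = {8: [290.1, 288.4, 293.0], 10: [255.2, 251.7, 258.9]}
nedt = {8: 0.1, 10: 0.3}

rng = random.Random(42)  # fixed seed so the perturbations are repeatable
# Five independent noisy realizations, mirroring the five runs in the text.
noisy_runs = [add_noise(tbs, nedt, rng) for _ in range(5)]
```

Each realization would then be passed through the same backward elimination procedure, and the resulting parameter estimates compared with the noise-free ones.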
It is informative to examine the model’s performance in controlled tests before applying it to more complex situations. These controlled tests indicate some of the strengths and weaknesses of the model. The tests employed a typical stable 1200 UTC sounding that then was modified to produce progressively greater convective instability. This destabilization was achieved by three different experiments (Fig. 2). The five soundings of experiment A (A1–A5) have the same temperature profile, but the dewpoints are modified (Fig. 2a). In sounding A2, dewpoints at pressures greater than or equal to 850 hPa are increased 1°C, while dewpoints at pressures less than 850 hPa are decreased 3°C. This same modification is repeated for soundings A3 and A4. In sounding A5, the only modification is to decrease dewpoints above 850 hPa by an additional 10°C. All soundings in experiment A are assumed to occur at 1200 UTC—that is, low-level temperatures are not modified.
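The dewpoint modifications of experiment A can be written compactly. The 850-hPa split and the +1°C, −3°C, and −10°C increments come from the text. The function name and the starting profile are hypothetical and do not reproduce the sounding in Fig. 2a.

```python
def modify_dewpoints(pressures_hpa, dewpoints_c, split_hpa=850.0,
                     low_change=1.0, high_change=-3.0):
    """Add low_change at p >= split_hpa and high_change at p < split_hpa."""
    return [td + (low_change if p >= split_hpa else high_change)
            for p, td in zip(pressures_hpa, dewpoints_c)]

# Hypothetical sounding A1 (pressure in hPa, dewpoint in deg C).
pressures = [1000.0, 920.0, 850.0, 700.0, 620.0, 500.0]
a1 = [15.0, 12.0, 8.0, -2.0, -8.0, -20.0]

a2 = modify_dewpoints(pressures, a1)
a3 = modify_dewpoints(pressures, a2)
a4 = modify_dewpoints(pressures, a3)
# A5: only an additional 10 deg C of drying above 850 hPa.
a5 = modify_dewpoints(pressures, a4, low_change=0.0, high_change=-10.0)
```

The temperature profile is left untouched, so any change in simulated TB's across A1 to A5 arises from moisture alone.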
Experiment B (Fig. 2b) considered four soundings (B1–B4) that simulate daytime heating. Here, temperatures near the surface are increased to produce a dry adiabatic lapse rate over progressively deeper layers. Temperatures below 950 hPa are increased in sounding B2. Sounding B3 is warmed below 900 hPa, while B4 is warmed below 850 hPa. Soundings B1 through B4 are assumed to occur near 1200, 1600, 2000, and 0000 UTC, respectively. Although daytime warming probably alters the dewpoint profile near the surface due to mixing, that effect is not considered here.
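One way to impose a dry adiabatic lapse rate over a surface-based layer is sketched below. The construction is an assumption: the text does not say how the adiabat was anchored, so this sketch ties it to the unmodified temperature at the top of the warmed layer. The function name and the profile values are hypothetical.

```python
KAPPA = 0.2857  # R_d / c_p for dry air

def warm_to_dry_adiabat(pressures_hpa, temps_k, layer_top_hpa):
    """Warm levels below layer_top_hpa to a dry adiabat anchored at the layer top."""
    # Assumes layer_top_hpa is one of the reported pressure levels.
    t_top = temps_k[pressures_hpa.index(layer_top_hpa)]
    warmed = []
    for p, t in zip(pressures_hpa, temps_k):
        if p > layer_top_hpa:
            t_adiabat = t_top * (p / layer_top_hpa) ** KAPPA
            warmed.append(max(t, t_adiabat))  # only warm, never cool
        else:
            warmed.append(t)
    return warmed

# Hypothetical morning sounding B1 with a shallow surface inversion (hPa, K).
pressures = [1000.0, 975.0, 950.0, 900.0, 850.0, 700.0]
b1 = [288.0, 290.0, 291.0, 289.0, 286.0, 277.0]

b2 = warm_to_dry_adiabat(pressures, b1, 950.0)
b3 = warm_to_dry_adiabat(pressures, b1, 900.0)
b4 = warm_to_dry_adiabat(pressures, b1, 850.0)
```

With this profile the surface temperature rises progressively from B1 to B4 while levels at and above each layer top are unchanged, which mimics a deepening mixed layer.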
Experiment C (Fig. 2c) consisted of five soundings (C1–C5), which are combinations of experiments A and B. This experiment simulates an extreme case of destabilization—that is, drying at midlevels with both moistening and warming near the surface. Channel TB’s for each experiment were obtained using the same radiative transfer code described earlier.
Table 6 lists simulated channel TB’s for each sounding of each experiment. Also included are actual ΔΘe’s (calculated directly from the sounding), predicted ΔΘe’s (calculated from statistical model b) from OKC, and the assumed time of day. Since only the dewpoints are changed in experiment A, TB’s for thermal channels 13 and 8 vary little among the different soundings (Table 6). Channel 8, the clean 11-μm split-window channel, is relatively insensitive to water vapor absorption but is greatly influenced by the surface temperature, which is held constant here. Similarly, channel 13 at 4.57 μm peaks near the surface, and its main purpose is temperature retrieval. The TB’s for water vapor channels 10, 11, and 12 increase substantially as the middle troposphere becomes drier, while TB’s for thermal channels 2, 3, and 4 show only small changes.
Both predicted and actual ΔΘe’s show the expected decreases during this experiment (Table 6). Actual ΔΘe ranges from 2.3°C in the most stable case (A1) to −11.7°C in the most unstable case (A5), while predicted ΔΘe decreases from 9.0°C to −4.4°C. Predicted values are about 7°C greater (more stable) than actual values in all five soundings. This bias is most likely caused by temperature differences between the sounding of experiment A (at 1200 UTC) and those of the training data (at 0000 UTC). Specifically, the surface temperature used here is cooler than most of those used to develop the statistical model at 0000 UTC. This produces relatively cool TB’s in channel 13, causing the model to underestimate the convective instability—that is, predicted ΔΘe’s are too large. This type of bias was noted in an earlier discussion (Fig. 1).
Experiment B (Fig. 2b) maintains constant dewpoints, but low-level temperatures are increased to simulate diurnal heating. This produces large TB changes in both channels 8 and 13 (Table 6). The TB for channel 8 increases almost 14 K over the four soundings, while the value for channel 13 increases over 10 K. The TB’s for upper-level temperature channels 2, 3, and 4 change little since only the lowest levels are warmed. Values for water vapor channels 10, 11, and 12 also remain nearly constant. It should be noted that although channels 8 and 13 produce opposing changes in ΔΘe (Table 4), the magnitude of channel 13’s parameter estimate is nearly twice that of channel 8. Thus, for these channels to completely negate one another in a changing environment, the TB change for channel 8 would have to be twice that of channel 13. This explains why ΔΘe decreases (Table 6) even though the channel 8 TB increases more than its channel 13 counterpart.
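The offsetting effect of channels 8 and 13 can be checked with simple arithmetic. The coefficients below are hypothetical: only their signs and the roughly 1:2 magnitude ratio follow the text, and the actual estimates are those in Table 4. The TB changes are the approximate B1-to-B4 values quoted above.

```python
# Hypothetical parameter estimates (deg C per K); see Table 4 for the actual values.
B8 = 1.0     # channel 8 contributes positively
B13 = -2.0   # channel 13 contributes negatively, about twice the magnitude

def predicted_change(d_tb8, d_tb13, b8=B8, b13=B13):
    """Change in predicted delta-theta-e due to channels 8 and 13 alone."""
    return b8 * d_tb8 + b13 * d_tb13

# Approximate TB increases over soundings B1-B4: 14 K (channel 8), 10 K (channel 13).
change = predicted_change(14.0, 10.0)
# Exact cancellation would need the channel 8 change to be twice that of channel 13.
cancel = predicted_change(20.0, 10.0)
```

With these illustrative coefficients the net change is negative, so the predicted ΔΘe falls even though channel 8 warms more than channel 13.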
Both actual and predicted ΔΘe decrease from B1 to B4 (Table 6), but their differences do not remain constant. For example, the actual ΔΘe for soundings B1 and B2 is 2.3°C; however, the predicted value decreases from being more stable (9.0°C) to less stable (−1.7°C) than the observed. This relatively large decrease occurs because the low-level warming produces temperatures that are more similar to those of the training data at 0000 UTC. On the other hand, the layer being warmed (surface to 950 hPa) is below the lower level used to calculate the actual ΔΘe (920 hPa). The low-level warming therefore increases TB’s in both channels 8 and 13, thereby decreasing the predicted ΔΘe, although it does not affect actual values. As a result, the model will produce greater than calculated instability between 920 and 620 hPa on days with strong afternoon heating.
Experiment C combines the temperature and moisture modifications used in experiments A and B (Fig. 2c). Actual ΔΘe decreases over 20°C from the most stable case (sounding C1) to the most unstable case (sounding C5) (Table 6), while the predicted value decreases over 24°C. It is interesting to note that the minimum predicted ΔΘe occurs in sounding C3, not C5. This occurs because TB’s for channels 3 and 4 are slightly colder in sounding C3 than in sounding C5. Differences between actual and predicted ΔΘe’s range from less than 6°C (C5) to over 14°C (C3).
To summarize, the controlled experiments verify that the statistical model incorporates variations in both midlevel water vapor and low-level temperature that affect ΔΘe. While predicted ΔΘe’s may differ from the actual, their trends are similar. The experiments also indicate that two related processes affect differences between actual and predicted values: 1) contrasts between actual temperatures and those of the training data, and 2) the levels being used for ground truth calculation. Although not specifically simulated, these processes could be combined to produce even greater discrepancies than those described above. For example, a progressively stronger temperature inversion near the surface should produce variations in experiment A that are opposite those in experiment B. If this were the case, predicted ΔΘe’s would be even larger than observed in experiment A because both processes (levels for ground truth and training–actual temperature contrasts) would produce relatively large predicted values.
As noted earlier, the days 17–18 June 1986 contained large horizontal and vertical humidity gradients. Fuelberg et al. (1991) present a detailed synoptic discussion of the case. The brief description presented here utilizes the LAMPS model run. The strong gradient of moisture from Minnesota to Virginia at 1.5 km (Fig. 3a) denotes the position of a cold front at 1800 UTC. Temperature gradients across the front were very weak (not shown). The analysis of midlevel humidity (Fig. 3b) shows relatively dry conditions over the New England states, Kansas, northwestern South Carolina, and the far West. More humid conditions were located over the Gulf Coast states and near the border of South Dakota and Iowa. Streamlines at 1.5 and 4 km (not shown) for 1800 UTC indicate anticyclonic flow dominating the central United States at both low and midlevels. The jet stream was located in the far northeastern portion of the country (not shown), flowing from northwest to southeast. Most convection on 17 June (not shown) began near 1900 UTC over southeastern Texas and extended from the Florida peninsula through western Tennessee.
Figure 4 contains horizontal analyses of actual and predicted (using model b from OKC) ΔΘe at 1800 UTC. Darker (lighter) shading indicates values less than (greater than) −2°C. Cloudy regions or areas where the surface was above 920 hPa are denoted by the absence of shading outlined by thick solid lines.
The analysis of the predicted ΔΘe (Fig. 4b) contains the major features of the actual version (Fig. 4a), and the relative accuracy is generally good. Both show New England and the Great Lakes to be comparatively stable. Predicted ΔΘe’s are greater than 6°C, while actual values exceed 14°C. No thunderstorms develop in this stable region throughout the period. A strong gradient of ΔΘe stretches from the Dakotas, through Tennessee, and then into Virginia in both analyses. Within the Midwestern portion of the gradient, both actual and predicted ΔΘe’s have similar ranges. Strong instability can be seen over Kansas in both versions; actual and predicted values are near −20°C. The Gulf Coast is very unstable and becomes convectively active later in the day. Predicted ΔΘe’s in this area are considerably less than actual values. This region is bordered on the west by a sharp gradient over Arkansas and Missouri. Soundings in this area (not shown) are very warm and humid at low levels and are similar to those in experiment C (C2 and C3). Most of this area experienced thunderstorms by 2000 UTC.
Figure 5 is a scatterplot of predicted versus actual ΔΘe’s for 1800 UTC 17 June. The R2 value is 0.77. Although the negative bias seen in Fig. 1 is evident, there is a good linear relation. The lobe of points in the lower left quadrant of Fig. 5 corresponds to the Gulf Coast region (Fig. 4b) discussed earlier.
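The scatterplot statistics quoted here can be reproduced from paired values in a few lines. The sketch below is illustrative only: it assumes that R2 denotes the squared linear correlation between actual and predicted ΔΘe (which is why a uniformly biased prediction can still score highly) and that bias is the mean of predicted minus actual. The function names are hypothetical and do not come from the original study.

```python
from statistics import fmean


def r_squared_percent(actual, predicted):
    """Squared Pearson correlation of the two series, in percent.

    Assumption: this is the 'explained variance' of a linear fit,
    so it is insensitive to a constant offset between the series.
    """
    mean_a, mean_p = fmean(actual), fmean(predicted)
    s_ap = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return 100.0 * s_ap * s_ap / (s_aa * s_pp)


def mean_bias(actual, predicted):
    """Mean of (predicted - actual); negative means predictions are too low."""
    return fmean(p - a for a, p in zip(actual, predicted))
```

Under these definitions, a prediction that is offset from the actual values by a constant keeps an R2 of 100% while showing a nonzero bias, which is the kind of behavior described for Fig. 5.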
The bias along the Gulf Coast is not unexpected since the model was trained using only Oklahoma City data and then applied to an area comprising most of the United States. Therefore, we investigated whether utilizing more than one statistical model would improve the results. Noting that the area of greatest bias (Figs. 4 and 5) corresponded to warm, moist conditions near the surface, we developed a new model trained on similar soundings. The same data used for the SE subset (five southeastern cities) were used, except that surface temperatures and dewpoints were required to be at least 295 and 289 K, respectively. The original and warm, moist models were then applied at 1800 UTC. If a grid point contained a surface temperature of at least 295 K and a surface dewpoint of at least 289 K, the warm and moist model was used to calculate ΔΘe. If both conditions were not met, the original model b from Oklahoma City was used at that point.
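The selection rule described above amounts to a simple threshold test at each grid point. A minimal sketch follows, assuming each model is stored as an intercept plus per-channel coefficients; the names `select_model`, `predict_delta_theta_e`, and the dictionary layout are hypothetical, and no coefficients from the paper's tables are reproduced here.

```python
# Surface thresholds quoted in the text (kelvin).
WARM_T_K = 295.0
MOIST_TD_K = 289.0


def select_model(sfc_temp_k, sfc_dewpoint_k):
    """Return the name of the regression model to use at one grid point."""
    if sfc_temp_k >= WARM_T_K and sfc_dewpoint_k >= MOIST_TD_K:
        return "warm_moist"
    return "okc_model_b"


def predict_delta_theta_e(tb, sfc_temp_k, sfc_dewpoint_k, models):
    """Apply the selected linear model to brightness temperatures.

    tb:     dict mapping channel number -> brightness temperature (K)
    models: dict mapping model name -> (intercept, {channel: coefficient})
            (hypothetical storage layout, chosen for illustration)
    """
    intercept, coeffs = models[select_model(sfc_temp_k, sfc_dewpoint_k)]
    return intercept + sum(c * tb[ch] for ch, c in coeffs.items())
```

Both conditions must hold for the warm, moist model to be chosen; a point that is warm but not sufficiently moist falls back to the Oklahoma City model.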
Figure 6 shows the analysis of the predicted ΔΘe for 1800 UTC 17 June, using the two statistical models. Magnitudes and patterns of actual and predicted values are more similar than when only one model was used (Fig. 4). The negative bias along the Gulf Coast has been reduced considerably. The scatterplot of actual versus predicted convective instability (Fig. 7) shows a strong linear fit, with an R2 of 93%, exceeding the earlier value of 77% (Fig. 5). Although a slight overall bias (∼5°C) is still present, the lobe of points corresponding to the Gulf Coast region is no longer present. This improvement is expected since the new model used in the Gulf Coast region was trained on similar data (warm, moist conditions near the surface).
A cross section of specific humidity (g kg−1) at 1800 UTC was prepared to further investigate the model’s performance (Fig. 8). The axis of the cross section (denoted X1) passes through the strong ΔΘe gradient in Fig. 4a. The plot of actual and predicted ΔΘe’s along the cross section (Fig. 9) was prepared using ground truth values and those calculated from the two models (Figs. 4a and 6).
The cross section of specific humidity (Fig. 8) contains several important features. The southwestern (left) portion has a strong vertical humidity gradient—that is, dry air over moist air. Values near the surface exceed 12 g kg−1, while those near 5 km (∼550 hPa) are less than 1 g kg−1. This moisture profile produces large convective instability, and both actual and predicted ΔΘe’s are near −25°C (Fig. 9). In spite of this instability, no significant convection occurs later in the day, due to the anticyclonic circulation (not shown). The strong horizontal gradient of convective instability midway through X1 (Figs. 6 and 9) is due to spatial changes in the vertical moisture distribution (Fig. 8). Specifically, the vertical moisture gradient diminishes toward the north in the lower and middle levels. Predicted and actual ΔΘe’s along X1 (Fig. 9) show good agreement, with a maximum difference of only about 5°C in the northeast portion. This plot indicates that the statistical models successfully resolve the vertical moisture profile as it affects ΔΘe.
Based on the above results for 1800 UTC, the two-model procedure was applied to other model times of the case study. Figure 10 contains actual and predicted ΔΘe’s for 0000 UTC 18 June. Calculations could not be made over a large portion of the area because of extensive cloud cover. Darker (lighter) shading indicates ΔΘe’s less than (greater than) −14°C. There is good agreement between the actual (Fig. 10a) and predicted values (Fig. 10b). Both analyses show the northern and northeastern portions of the domain to have large values with similar patterns, and the gradient stretching from Iowa through Kentucky and Virginia is predicted well. The magnitude and shape of the unstable region over Kansas are similar in both analyses. The scatterplot of actual versus predicted convective instability (Fig. 11) shows a strong linear fit, no discernible bias, and an R2 of 88%.
These analyses (Figs. 4–11) show that the use of multiple models yields high correlations with observed values over a large domain. Each model is applied to locations with weather characteristics similar to those of the training data. Gradients and relative maxima/minima are spatially consistent with ground truth.
Application to observed GOES-8 data
Conclusions about the usefulness of the statistical algorithm cannot be made without using observed GOES-8 data. We used GOES-8 TB’s provided by the National Oceanic and Atmospheric Administration/National Environmental Satellite, Data, and Information Service (NOAA/NESDIS) for this purpose. Specifically, we obtained cloud-free TB’s at 0000 UTC from May to June 1995 that were collocated with National Weather Service radiosonde soundings, which are typically released near 2315 UTC. To minimize discrepancies due to horizontal gradients, these RAOB–retrieval pairs were required to be within 50 km of one another. The collocated data consisted of 440 pairings; ΔΘe was calculated from each of the soundings as before. The 440 pairs then were randomly split into two groups of 220 pairs each, one for training and one for independent testing. The backward elimination regression procedure then was performed on the training data to develop a statistical model. It should be noted that all 18 sounder channels could be used in this procedure since the TB’s were observed, not simulated.
Table 7 gives statistical information for the resulting model. It produces an R2 of 71% and consists of split-window channels 6 and 7, midtropospheric water vapor channels 11 and 12, shortwave temperature channels 13 and 14, and ozone channel 9. This choice of channels is somewhat different from that of the OKC model (Table 4). The OKC version included all three water vapor channels, whereas this model uses only two, omitting channel 10. This channel’s weighting function peaks near 650 hPa but also responds to moisture near the surface. The two 12-μm “dirty” split-window channels (channels 6 and 7) are included now but were not in the original OKC version. These channels are somewhat sensitive to low-level water vapor. Thus, channel 10 might provide redundant information. Low-level temperature channels are utilized in both the OKC and observed data models (channels 13 and 8 for OKC and channels 13 and 14 for the real data model).
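The random split and backward elimination steps can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the original work relied on SAS and p values, whereas this sketch uses NumPy and drops the predictor with the smallest |t| statistic until every remaining |t| reaches a threshold `t_min` (a hypothetical stand-in for a p-value criterion). All function and parameter names are invented for the example.

```python
import numpy as np


def split_pairs(n_pairs, rng):
    """Randomly split indices 0..n_pairs-1 into two equal-sized groups."""
    order = rng.permutation(n_pairs)
    half = n_pairs // 2
    return order[:half], order[half:]


def backward_elimination(X, y, names, t_min=2.0):
    """Drop the weakest predictor until all remaining |t| >= t_min.

    X: (n, k) array of brightness temperatures, one column per channel
    y: (n,) array of sounding-derived values to be predicted
    names: list of k channel labels
    Returns (kept_names, beta), where beta[0] is the intercept.
    """
    keep = list(range(X.shape[1]))
    while keep:
        A = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        dof = len(y) - A.shape[1]
        sigma2 = resid @ resid / dof
        # Coefficient covariance; pinv tolerates near-collinear channels.
        cov = sigma2 * np.linalg.pinv(A.T @ A)
        t = np.abs(beta[1:]) / np.sqrt(np.diag(cov)[1:])
        worst = int(np.argmin(t))
        if t[worst] >= t_min:
            return [names[i] for i in keep], beta
        del keep[worst]
    # No predictor survived: fall back to the mean of y.
    return [], np.array([y.mean()])
```

A typical use would call `split_pairs(440, np.random.default_rng(seed))` to obtain training and independent indices, then fit on the training rows only and evaluate on the remaining 220.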
The ozone channel is also included in the observed data model (Table 7). We did not consider this channel in the simulations discussed earlier. Its weighting function (not shown) typically has a strong peak above the tropopause, where the majority of ozone is concentrated. However, the weighting function also indicates significant absorption throughout the troposphere. We believe this channel was included because it is affected by the atmospheric temperature throughout the troposphere, thereby adding unique information to the regression.
The regression equation for ΔΘe was applied to the remaining 220 independent TB observations. Figure 12 is a scatterplot of actual versus predicted ΔΘe’s. There is a good linear fit, but the R2 is reduced to 67% and there is more scatter than seen previously with the models based on simulated TB’s (Figs. 5, 7, and 11). The smaller R2 is expected since there are now several additional sources of discrepancy, which are as follows. 1) There is inexact spatial and temporal collocation between the radiances and soundings. Although we have tried to minimize this effect by requiring collocation to within 50 km and approximately 45 min, it has not been eliminated. Bruce et al. (1977) and Garand (1993) discuss the collocation problem in more detail. 2) There are errors in the radiosonde data. Problems with equipment, data quality, missing data, and reporting practices have been documented (e.g., Pratt 1985; Schwartz and Doswell 1991). Root-mean-square errors as large as 0.8°C have been estimated for RAOB temperatures, while the errors are usually under 20% for relative humidity. 3) GOES-8 TB’s have been averaged over a 5 × 5 array of FOVs (Menzel and Purdom 1994). These average TB’s will not correspond to sonde-derived conditions at a specific point. 4) There is random error present in the observed GOES-8 data. Although these errors are relatively small (Menzel and Purdom 1994), they still affect the comparison in Fig. 12. 5) Cloudy radiances can be present in the observed GOES-8 data. Even though the radiances were screened for clouds, some cloudy pixels might still be present. This can contaminate the observed values.
In spite of the several sources of discrepancy discussed above, the agreement between actual and predicted values of ΔΘe using the observed GOES-8 model (Fig. 12) suggests that such a model can provide valuable mesoscale patterns of convective instability.
Performance of model using VAS radiances
The GOES-8 instrument was designed to provide improved vertical resolution compared to GOES-7 VAS. We examined the influence of this improved resolution on ΔΘe by performing a final experiment. Specifically, we ran a backward elimination regression using the same soundings as model b, with OKC data. However, this time GOES-7 VAS radiances were simulated and used as the independent variables. All auxiliary calculations (skin temperatures, radiances, cloud checks) were done as before. The same ground truth convective instability used for the GOES-8 model also was used. The 3.95-μm channel (shortwave window) was not considered.
Table 8 gives statistics for the GOES-7 VAS model. VAS channels 3 and 4 are longwave temperature channels, 6 and 11 are shortwave temperature channels, channel 8 is the 11-μm clean split-window channel, and channels 9 and 10 are midtropospheric water vapor channels. Most of these channels are similar to those of the GOES-8 model (Table 4); that is, similar wavelength intervals are used in both. However, one should note that the GOES-8 model utilized all three of its midtropospheric water vapor channels, whereas the VAS model uses only the two that are available. This suggests that the VAS model cannot provide as much information about the vertical distribution of midtropospheric water vapor as its GOES-8 counterpart, and this quantity is important in determining convective instability. This decreased vertical resolution in the moisture channels apparently is the cause of the smaller explained variance for the GOES-7 model compared to the GOES-8 version (90% vs 94%).
The VAS model was applied to the LAMPS data at 1800 UTC 17 June. The scatterplot of actual versus predicted ΔΘe’s (Fig. 13) is similar to that of the GOES-8 model (Fig. 5), including the lobe of biased values in the lower left quadrant. The increased scatter for the VAS plot is reflected in the reduced explained variance (72% vs 77% for GOES-8). Although the performance of the VAS model for ΔΘe is not greatly inferior to that of GOES-8, the smaller number of VAS channels does appear to affect the results.
Summary and conclusions
Statistical algorithms have been developed to diagnose the vertical change in equivalent potential temperature (ΔΘe) between 920 and 620 hPa, using radiances from GOES-8. Training data for these algorithms consisted of National Weather Service radiosonde observations from 10 United States cities in 1979. Simulated GOES-8 brightness temperatures for all 4 imager channels and 18 sounder channels were calculated from these soundings. The independent data consisted of a LAMPS model run for 17–18 June 1986.
The training data were stratified into four subsets (depending on location) and then further categorized into seven groups (depending on time and instrument). Backward elimination regression was used to prepare the statistical algorithm. The statistical models using only 0000 or 1200 UTC soundings were found to explain approximately 7% more of the variance in observed ΔΘe than those models based on both 0000 and 1200 UTC soundings. Information from the imager channels was shown to be redundant in the regression and, thus, unnecessary. The statistical models then were applied to the LAMPS independent data to determine which performed best over diurnally varying conditions. The models containing only 0000 UTC soundings (model b) were consistently superior to those using 1200 UTC soundings. Model b, with data from Oklahoma City, had the highest R2 values (70% and 81% at 0600 and 0000 UTC, respectively). It was investigated further.
This model’s performance was analyzed in three controlled experiments in which a stable sounding was modified to exhibit progressively greater convective instability. This destabilization was achieved three ways: 1) by increasing the low-level moisture and decreasing the mid- and upper-level moisture while holding the temperature constant, 2) by increasing low-level temperatures to a dry adiabatic lapse rate over progressively deeper layers while holding the dewpoint constant, and 3) by combining moisture modifications of experiment A with temperature modifications of experiment B. Although the model exhibited some bias in all three experiments, actual and predicted values of ΔΘe had similar trends. Low-level heating (cooling) caused the model to overestimate (underestimate) the convective instability. Differences between actual temperatures and those of the 0000 UTC training data were also a source of bias in ΔΘe.
The performance of statistical model b, using OKC data, was then analyzed in detail using the LAMPS independent data. Horizontal analyses showed that the model usually gave a faithful representation of observed stability patterns. Most maxima, minima, and strong gradients of ΔΘe were reasonably placed. Statistics indicated that the model overestimated the convective instability at 1800 UTC but predicted both the trends and magnitudes of the instability at 0000 UTC. These findings are consistent with those of the controlled experiments. The model did produce an area of large bias along the Gulf Coast. This occurred because only one model (trained on OKC data) was used and then applied to an area comprising most of the United States. A new model was developed, trained on soundings similar to those found in the biased area (warm and moist near the surface). A simple surface temperature–surface dewpoint threshold was used to determine whether the original or warm, moist model should be applied at a given location. The addition of the new model increased the accuracy of the diagnosis substantially. The R2 values for 1800 UTC increased by approximately 16% when the two models were applied.
These results show the importance of using statistical models that are trained on conditions that are appropriate for the location in question. Chesters et al. (1983, 1987) also noted the importance of local training in their statistical algorithm for GOES-7 VAS. If our statistical approach were to be used operationally, a variety of models might be developed, each based on specific thermal and moisture conditions.
A cross section intersecting the strong gradient of convective instability at 1800 UTC showed that the vertical moisture distribution and surface temperatures greatly influenced ΔΘe. The algorithm for ΔΘe responded correctly to these variations.
The methodologies then were tested with observed GOES-8 radiances. Four hundred forty pairs of TB–RAOB data were used to develop and evaluate the algorithm. Results indicated that several sources of error reduced the accuracy of the model, but relative magnitudes and trends of ΔΘe still could be determined.
As a final experiment, simulated GOES-7 VAS radiances were used to develop an algorithm using the data in model b for OKC. The resulting model then was applied to the 1800 UTC 17 June LAMPS data. The VAS model for ΔΘe did not perform as well as the version using GOES-8 radiances—the explained variance decreased approximately 5%. This reduction is believed to be caused by VAS’s diminished vertical resolution of humidity.
In summary, statistical algorithms employing GOES-8 radiance data can be used to detect major features of convective instability. Gradients and relative maxima and minima can be diagnosed well. When more than one model is applied, accuracy increases substantially, and biases are minimized. Physical algorithms for retrieving satellite-derived products appear to be superior to statistical schemes because atmospheric physics is incorporated (Eyre 1987; Bates 1994). We do not disagree with this assessment. However, two major advantages of simple statistical algorithms such as ours are the speed and simplicity with which calculations can be made. Although modern computers allow complex physical algorithms to be run at high speed, regression-type products can still be run in a fraction of that time. In addition, the simple relations contained in a regression equation allow the user to easily understand the role of the brightness temperatures in producing the convective instability. We believe that a convective instability algorithm using regression models (each using training data with specific weather characteristics) would be a useful tool within the operational community where mesoscale aspects of the stability field are important.
This research was sponsored by NASA Grant NAG8-911 under the auspices of the Marshall Space Flight Center. We would like to thank Dr. Christopher Hayden of NOAA/NESDIS and Tim Schmit of the Cooperative Institute for Meteorological Satellite Studies, both in Madison, Wisconsin, for providing the observed GOES-8 data. We also would like to thank Dr. P. V. Rao of the University of Florida for assistance with the statistics.
Bates, J. J., 1994: An assessment of global water vapor retrievals using satellite infrared and microwave observations: A GEWEX water vapor project (GVAP) update. Preprints, Seventh Conf. on Satellite Meteorology and Oceanography, Monterey, CA, Amer. Meteor. Soc., 75–79.
Bruce, R. E., L. D. Duncan, and J. H. Pierluissi, 1977: Experimental study of the relationships between radiosonde temperatures and satellite-derived temperatures. Mon. Wea. Rev., 105, 493–496.
Carlson, T. N., 1991: Mid-Latitude Weather Systems. Harper Collins Academic, 507 pp.
Chesters, D., and A. Mostek, 1985: High resolution images of atmospheric parameters computed directly from VAS satellite radiances and conventional surface data. Preprints, Int. Conf. on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Los Angeles, CA, Amer. Meteor. Soc., 261–268.
——, L. W. Uccellini, and W. D. Robinson, 1983: Low-level water vapor fields from the VISSR atmospheric sounder (VAS) “split window” channels. J. Climate Appl. Meteor., 22, 725–743.
——, W. D. Robinson, and L. W. Uccellini, 1987: Optimized retrievals of precipitable water from the VAS split window. J. Climate Appl. Meteor., 26, 1059–1066.
Doty, K. G., and D. J. Perkey, 1993: Sensitivity of trajectory calculations to the temporal frequency of wind data. Mon. Wea. Rev., 121, 387–401.
Eyre, J. R., 1987: On systematic errors in satellite sounding products and their climatological mean values. Quart. J. Roy. Meteor. Soc., 113, 279–292.
Fuelberg, H. E., and S. R. Olson, 1991: An assessment of VAS-derived retrievals and parameters used in thunderstorm forecasting. Mon. Wea. Rev., 119, 795–814.
——, R. L. Schudalla, and A. R. Guillory, 1991: Analysis of sudden mesoscale drying at the surface. Mon. Wea. Rev., 119, 1391–1406.
Garand, L., 1993: A pattern recognition technique for retrieving humidity profiles from Meteosat or GOES imagery. J. Appl. Meteor., 32, 1592–1607.
Guillory, A. R., G. J. Jedlovec, and H. E. Fuelberg, 1993: A technique for deriving column-integrated water content using VAS split-window data. J. Appl. Meteor., 32, 1226–1241.
Hayden, C. M., 1988: GOES-VAS simultaneous temperature–moisture retrieval algorithm. J. Appl. Meteor., 27, 705–733.
——, G. S. Wade, and T. J. Schmit, 1996: Derived product imagery from GOES-8. J. Appl. Meteor., 35, 153–162.
Holton, J. R., 1992: An Introduction to Dynamic Meteorology. Academic Press, 497 pp.
Jedlovec, G. J., 1990: Precipitable water estimation from high-resolution split window radiance measurements. J. Appl. Meteor., 29, 863–877.
Lee, T.-H., D. Chesters, and A. Mostek, 1983: The impact of conventional surface data upon VAS regression retrievals in the lower troposphere. J. Climate Appl. Meteor., 22, 1853–1874.
Liou, K.-N., 1980: An Introduction to Atmospheric Radiation. Academic Press, 392 pp.
McMillin, L. M., H. Fleming, and M. L. Hill, 1979: Atmospheric transmittance of an absorbing gas. 3: A computationally fast and accurate transmittance model for absorbing gases with variable mixing ratios. Appl. Opt., 18, 1600–1606.
Menzel, W. P., and J. F. W. Purdom, 1994: Introducing GOES-I: The first of a new generation of geostationary operational environmental satellites. Bull. Amer. Meteor. Soc., 75, 757–781.
Ott, R. L., 1993: An Introduction to Statistical Methods and Data Analysis. Marion Merrell Dow, 1051 pp.
Pratt, R. W., 1985: Review of radiosonde humidity and temperature errors. J. Atmos. Oceanic Technol., 2, 404–407.
SAS Institute Inc., 1990: User’s guide. Version 6, Vols. 1 and 2, 1862 pp. [Available from SAS Institute, Inc., SAS Campus Drive, Cary, NC 27513.]
Schwartz, B. E., and C. A. Doswell III, 1991: North American rawinsonde observations: Problems, concerns, and a call to action. Bull. Amer. Meteor. Soc., 72, 1885–1896.
Smith, W. L., H. M. Woolf, and A. J. Schreiner, 1985: Simultaneous retrieval of surface atmospheric parameters: A physical and analytically direct approach. Advances in Remote Sensing, A. Deepak, H. E. Fleming, and M. T. Chahine, Eds., A. Deepak Publishing, 221–232.
GOES-8 channel properties (after Menzel and Purdom 1994). Response denotes the peak of the weighting function in a standard atmosphere.
Explained variances R2 (%) for all models of each subset of training data. The number of soundings is given in parentheses for each entry. Time and instrument also are included for each model (12–1200 UTC; Im—Imager, Sd—Sounder).
Explained variances R2 (%) of models b and c of the four subsets when applied to 1800 UTC 17 June and 0000 and 0600 UTC 18 June of the LAMPS independent data.
Channels, parameter estimates, and p values of statistical model b, using OKC data. Each channel’s central wavelength is also included.
Parameter estimates for model b, with and without random noise. Column A is the original result for model b, which contained no random noise. Columns B–F are for models containing random noise. Also listed are predicted ΔΘe’s (°C) for a stable (s) and an unstable (u) vertical profile along with explained variance (%) for all 223 soundings.
Channel TB’s with actual and predicted ΔΘe’s for each sounding in experiments A, B, and C. Numbers in parentheses are the assumed time of day (UTC). Sounding profiles used in the experiments are found in Fig. 2.
Channels, parameter estimates, and p values of the statistical model using observed GOES-8 data. Each channel’s central wavelength is also included.
Channels, parameter estimates, and p values for the statistical model using GOES-7 VAS radiances with the Oklahoma City data. Each channel’s central wavelength is also included.