  • Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.

  • Bishop, C. H., B. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436.

  • Dee, D. P., and A. M. da Silva, 2003: The choice of variable for atmospheric moisture analysis. Mon. Wea. Rev., 131, 155–171.

  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10143–10162.

  • Holm, E., E. Anderson, A. Beljaars, P. Lopez, J-F. Mahfouf, A. J. Simmons, and J-N. Thepaut, 2002: Assimilation and modeling of the hydrological cycle: ECMWF’s status and plans. ECMWF Tech. Memo. 383, 55 pp.

  • Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.

  • Hunt, B. R., and Coauthors, 2004: Four-dimensional ensemble Kalman filtering. Tellus, 56A, 273–277.

  • Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126.

  • Liu, J., 2007: Applications of the LETKF to adaptive observations, analysis sensitivity, observation impact and assimilation of moisture. Ph.D. thesis, University of Maryland, College Park, 154 pp.

  • Miyoshi, T., and S. Yamane, 2007: Local ensemble transform Kalman filter with an AGCM at a T159/L48 resolution. Mon. Wea. Rev., 135, 3841–3861.

  • Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.

  • Parrish, D. F., and J. C. Derber, 1992: The National Meteorological Center’s spectral statistical-interpolation analysis system. Mon. Wea. Rev., 120, 1747–1763.

  • Susskind, J., C. Barnet, and J. Blaisdell, 2003: Retrieval of atmospheric and surface parameters from AIRS/AMSU/HSB data in the presence of clouds. IEEE Trans. Geosci. Remote Sens., 41, 390–409.

  • Szunyogh, I., E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation for the NCEP global model. Tellus, 60A, 113–130.

  • Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924.

  • Whitaker, J. S., T. M. Hamill, X. Wei, Y. Song, and Z. Toth, 2008: Ensemble data assimilation with the NCEP Global Forecast System. Mon. Wea. Rev., 136, 463–482.

  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. Academic Press, 611 pp.

  • Fig. 1. (top) The spatial coverage of temperature observations (gray dots are the AIRS temperature retrievals; black dots are the conventional temperature observations) around 500 hPa on 22 Jan 2004. The AIRS specific humidity retrievals have the same spatial coverage as the temperature retrievals. (bottom) Total number of AIRS specific humidity retrievals assimilated in each 6-h interval at 505 hPa.

  • Fig. 2. Time evolution of 500-hPa global average rms errors for the analysis of (top left) the relative humidity, (top right) temperature (K), (bottom left) zonal wind (m s⁻¹), and (bottom right) meridional wind (m s⁻¹). The control run (blue line), passive q (gray line with open circles), univariate q (green line), and multivariate q (black line) are shown. Here and in the following figures, the verification is made against NCEP high-resolution (T256L28) analyses.

  • Fig. 3. The time average zonal mean-square error difference between three humidity experiments and the control run [(top) between the passive-q experiment and the control run, (middle) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run] for (left) specific humidity (g² kg⁻²) and (right) zonal wind (m² s⁻²). Gray shading indicates where the humidity runs have a smaller mean-square error than the control run, the contours indicate where the humidity runs have a larger mean-square error than the control run, and the magnitude is larger than 0.0005 g² kg⁻² or 0.5 m² s⁻².

  • Fig. 4. The 300-hPa time average zonal wind analysis mean-square error difference (m² s⁻²) (top left) between the passive-q experiment and the control run, (top right) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run. All shaded areas and contours are significant at the 95% level (the absolute values that pass the statistical test change with the comparison pair; see details in section 4c). The shaded areas are where the humidity runs perform better than the control run, and the contours are where the humidity runs perform worse than the control run.

  • Fig. 5. The total column precipitable water 6-h forecast mean-square error (kg² m⁻⁴) comparison (top) between the passive-q experiment and the control run, (middle) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run. Absolute differences larger than 6.0 kg² m⁻⁴ are significant at the 95% level in (top), those larger than 5.5 kg² m⁻⁴ are significant at the same level in (middle), and those larger than 2.5 kg² m⁻⁴ are significant in (bottom). Negative values are shaded and mark where the humidity runs have smaller mean-square errors; positive values are contoured and mark where the humidity runs have larger mean-square errors.

  • Fig. 6. The 500-hPa global average forecast rms errors for (top) relative humidity and (bottom) zonal wind (m s⁻¹). Multivariate q (gray line), univariate q (black line with open circles), passive q (black line with closed circles), and control run (gray line with squares) are shown.

  • Fig. 7. The 48-h forecast mean-square error (m² s⁻²) comparisons for zonal wind (top left) between the passive-q experiment and the control run, (top right) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run. Gray shading indicates where the humidity runs have smaller mean-square errors than the control run, the contours indicate where the humidity runs have larger mean-square errors than the control run, and the magnitude is larger than 1.0 m² s⁻². The values between −1.0 and 1.0 m² s⁻² are blank.

  • Fig. 8. The 300-hPa time-average 48-h zonal wind forecast mean-square error difference (m² s⁻²) (top left) between the passive-q experiment and the control run, (top right) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run. Absolute differences larger than 5.0 m² s⁻² are significant at the 95% level in (top left), those larger than 4.0 m² s⁻² are significant in (top right), and those larger than 3.5 m² s⁻² are significant in (bottom). Gray shading indicates where the humidity runs have smaller mean-square errors than the control run, and the contours indicate where the humidity runs have larger mean-square errors than the control run.

Univariate and Multivariate Assimilation of AIRS Humidity Retrievals with the Local Ensemble Transform Kalman Filter

  • 1 University of California, Berkeley, Berkeley, California
  • 2 Shanghai Typhoon Institute of CMA, Shanghai, China
  • 3 University of Maryland, College Park, College Park, Maryland
  • 4 Arizona State University, Tempe, Arizona
  • 5 Texas A&M University, College Station, Texas

Abstract

This study uses the local ensemble transform Kalman filter to assimilate Atmospheric Infrared Sounder (AIRS) specific humidity retrievals with pseudo relative humidity (pseudo-RH) as the observation variable. Three approaches are tested: (i) updating specific humidity with observations other than specific humidity (“passive q”), (ii) updating specific humidity only with humidity observations (“univariate q”), and (iii) assimilating the humidity and the other observations together (“multivariate q”). This is the first time that the performance of the univariate and multivariate assimilation of q is compared within an ensemble Kalman filter framework. The results show that updating the humidity analyses by either AIRS specific humidity retrievals or nonhumidity observations improves both the humidity and wind analyses. The improvement with the multivariate-q experiment is by far the largest for all dynamical variables at both analysis and forecast times, indicating that the interaction between the specific humidity and the other dynamical variables through the background error covariance during the data assimilation process yields more balanced analysis fields. In the univariate assimilation of q, the humidity interacts with the other dynamical variables only through the forecast process. The univariate assimilation produces more accurate humidity analyses than those obtained when no humidity observations are assimilated, but it does not improve the accuracy of the zonal wind analyses. The 6-h total column precipitable water forecast also benefits from the improved humidity analyses, with the multivariate-q experiment showing the largest improvement.

Corresponding author address: Junjie Liu, Earth and Planetary Science Department, University of California, Berkeley, Berkeley, CA 94720. Email: jjliu@atmos.berkeley.edu

This article is included in the Mathematical Advances in Data Assimilation (MADA) special collection.

1. Introduction

Humidity is an important dynamical variable in numerical weather forecast models because it not only determines the occurrence of precipitation, but also changes temperature through evaporation and condensation processes and affects winds by changing the pressure gradient. However, because of the special error characteristics of humidity variables, the poor quality of observations, and the model errors related to moisture parameterizations (Dee and da Silva 2003), humidity data assimilation remains a challenging problem.

Several studies have shown that, with an appropriate choice of humidity variable, assimilating humidity observations within a variational data assimilation framework improves the accuracy of the analysis states (e.g., Dee and da Silva 2003; Holm et al. 2002). Ensemble Kalman filters (EnKF; Evensen 1994; Houtekamer and Mitchell 2001; Anderson 2001; Bishop et al. 2001; Whitaker and Hamill 2002; Ott et al. 2004; Hunt et al. 2004, 2007), a different type of data assimilation scheme, have been used to assimilate real observations with encouraging results (Miyoshi and Yamane 2007; Szunyogh et al. 2008; Whitaker et al. 2008); however, so far there has been no systematic study of how best to assimilate humidity observations within the EnKF framework, or of how different representations of the relationship between humidity and the other variables affect humidity data assimilation. A unique feature of the EnKF is its ability to explicitly estimate the background error covariance among different dynamical variables in each data assimilation cycle, so that it can naturally use one type of observation to update the analyses of the other dynamical variables based on the flow-dependent background error covariance. In this study, we use the local ensemble transform Kalman filter (LETKF; Ott et al. 2004; Hunt et al. 2007), one type of EnKF, to perform both multivariate and univariate assimilation of Atmospheric Infrared Sounder (AIRS) specific humidity retrievals (provided by C. Barnet 2007, personal communication). In multivariate assimilation, the humidity variable interacts with the other dynamical variables through the background error covariance during data assimilation, whereas in univariate assimilation the humidity analyses are updated by humidity observations only (section 3). AIRS is a high-spectral-resolution instrument, and its humidity retrievals have been shown to be of high quality (Susskind et al. 2003). The questions we address here, in addition to investigating the impact of the humidity retrievals on the analyses, include whether the humidity analyses can be improved by coupling the humidity background errors with those of the other variables, and whether the multivariate assimilation of humidity improves the accuracy of the analyses of the other dynamical variables (e.g., winds) compared to the univariate assimilation.

The paper is organized as follows: section 2 briefly describes the LETKF and the data assimilation system, section 3 provides a detailed description of the experimental design and verification methods, section 4 presents the results of the numerical experiments, and section 5 summarizes our main findings.

2. The LETKF and data assimilation system

The LETKF is an efficient type of EnKF derived from both the local ensemble Kalman filter (LEKF; Ott et al. 2004) and the ensemble transform Kalman filter (ETKF; Bishop et al. 2001) algorithms. Hunt et al. (2007) provide a detailed description of the LETKF and explain how it differs from the other formulations of ensemble-based Kalman filters. Here, we discuss only the analysis steps that are essential to explain the humidity assimilation experiments with the LETKF.

In the LETKF, the background perturbations and the interpolation of the background ensemble forecasts to observation space are computed globally, but most of the other steps are performed locally at each grid point, assimilating only the observations within a certain distance of the given grid point.

The global background ensemble perturbation matrix $\mathbf{X}^b$ is the difference between the ensemble forecasts $\mathbf{x}^{b(i)}$ and the ensemble forecast mean state $\bar{\mathbf{x}}^b$ valid at the analysis time; the $i$th column of $\mathbf{X}^b$ is $\mathbf{x}^{b(i)} - \bar{\mathbf{x}}^b$. We use the superscripts $b$ and $a$ to denote the background and analysis state, respectively. The nonlinear observation operator $h(\cdot)$ transforms each ensemble forecast member $\mathbf{x}^{b(i)}$ $\{i = 1, 2, \ldots, k\}$ to observation space to obtain the global background observation ensemble $\mathbf{y}^{b(i)} = h(\mathbf{x}^{b(i)})$, $\{i = 1, 2, \ldots, k\}$, where $k$ is the total number of ensemble members. The difference between the background observation ensemble $\mathbf{y}^{b(i)}$ and the mean of the background observation ensemble $\bar{\mathbf{y}}^b$ is the background observation ensemble perturbation matrix $\mathbf{Y}^b$; its $i$th column is $\mathbf{y}^{b(i)} - \bar{\mathbf{y}}^b$. When calculating the analysis mean state and analysis perturbations, $\mathbf{X}^b$, $\mathbf{Y}^b$, $\bar{\mathbf{y}}^b$, and the observation vector $\mathbf{y}^o$ are all defined on a local region centered at each grid point. We follow the notation of Szunyogh et al. (2008), using the subindex $(l)$ to indicate a quantity defined on a local region. According to Hunt et al. (2007), the analysis mean state in the center of a local region is equal to

$$\bar{\mathbf{x}}^a_{(l)} = \bar{\mathbf{x}}^b_{(l)} + \mathbf{X}^b_{(l)} \bar{\mathbf{w}}^a_{(l)}, \tag{1}$$

where $\bar{\mathbf{w}}^a_{(l)}$ is the mean weighting vector given by

$$\bar{\mathbf{w}}^a_{(l)} = \tilde{\mathbf{P}}^a_{(l)} \mathbf{Y}^{bT}_{(l)} \mathbf{R}^{-1}_{(l)} \left( \mathbf{y}^o_{(l)} - \bar{\mathbf{y}}^b_{(l)} \right). \tag{2}$$

The observation error covariance $\mathbf{R}$ is assumed to be diagonal. Equations (1) and (2) indicate that the mean analysis increment is a linear combination of the background perturbations $\mathbf{X}^b_{(l)}$; the weighting vector is a function of the assimilated observational increments $\mathbf{y}^o_{(l)} - \bar{\mathbf{y}}^b_{(l)}$. The local analysis perturbation matrix is

$$\mathbf{X}^a_{(l)} = \mathbf{X}^b_{(l)} \left[ (k-1)\, \tilde{\mathbf{P}}^a_{(l)} \right]^{1/2}, \tag{3}$$

where $\tilde{\mathbf{P}}^a_{(l)}$ is the analysis error covariance matrix in the subspace of the background ensemble perturbations. The analysis perturbations are thus obtained by a transformation of the background perturbations $\mathbf{X}^b_{(l)}$. The global ensemble analyses are obtained by assembling the local analyses at each grid point.
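
The local analysis step can be illustrated with a minimal NumPy sketch. It assumes the standard form given by Hunt et al. (2007) for the ensemble-space analysis error covariance, $[(k-1)\mathbf{I} + \mathbf{Y}^{bT}\mathbf{R}^{-1}\mathbf{Y}^b]^{-1}$, without covariance inflation, and it uses a symmetric matrix square root; the function name and argument layout are hypothetical and are not taken from the system used in this study.

```python
import numpy as np

def letkf_local_analysis(Xb, xb_mean, Yb, yb_mean, yo, r_diag):
    """One local LETKF analysis step (illustrative sketch, no inflation).

    Xb      : (n, k) background perturbations (members minus ensemble mean)
    xb_mean : (n,)   background ensemble mean
    Yb      : (p, k) background perturbations in observation space
    yb_mean : (p,)   background ensemble mean in observation space
    yo      : (p,)   local observations
    r_diag  : (p,)   observation error variances (diagonal of R)
    """
    k = Xb.shape[1]
    C = Yb.T / r_diag  # (k, p): Yb^T R^{-1} for a diagonal R
    # Assumed form of the analysis error covariance in ensemble space.
    Pa = np.linalg.inv((k - 1) * np.eye(k) + C @ Yb)
    # Mean weighting vector: function of the observational increment.
    w_mean = Pa @ C @ (yo - yb_mean)
    # Symmetric square root of (k - 1) Pa via an eigendecomposition.
    evals, evecs = np.linalg.eigh((k - 1) * Pa)
    W = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    xa_mean = xb_mean + Xb @ w_mean  # analysis mean
    Xa = Xb @ W                      # analysis perturbations
    return xa_mean, Xa
```

For a single directly observed variable, this reduces to the scalar Kalman filter update, which gives a quick sanity check on an implementation.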

Szunyogh et al. (2008) and Whitaker et al. (2008) have implemented the LETKF on the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS; at T62L28 resolution) for the assimilation of nonradiance observations. In data-sparse regions, both studies obtained better performance with the LETKF than with the spectral statistical interpolation (SSI; Parrish and Derber 1992) of NCEP. In the system of Szunyogh et al. (2008), which does not assimilate humidity observations, the specific humidity analyses are obtained by simply copying the background forecasts in each analysis cycle. This experimental setup provides the control run in our experimental design, except that we also assimilate AIRS temperature retrievals (see details in section 3). In section 3, we describe several different ways to update humidity in this system, and we compare the results in section 4.

3. Experimental design and verification methods

We assimilate AIRS specific humidity retrievals with the LETKF. From Dee and da Silva (2003), we know, however, that specific humidity observation errors have a non-Gaussian distribution, with abrupt value changes in both space and time. Thus, specific humidity observations have to be transformed to a new variable that has a more Gaussian error distribution, as required by the data assimilation algorithm.

Relative humidity has a more Gaussian error distribution than specific humidity, but it has the disadvantage of having a strong error correlation with temperature observations; this correlation is usually neglected in data assimilation. The logarithm of specific humidity has a more Gaussian error distribution than specific humidity, and its error has no correlation with temperature, but a small value must be substituted when specific humidity is zero; this introduces a bias. Dee and da Silva (2003) proposed to convert humidity observations to pseudo relative humidity (pseudo-RH), which they defined as the ratio between the observed specific humidity and the saturated specific humidity from the background forecast. Like relative humidity, the newly formulated variable has a more Gaussian error distribution than specific humidity. Also, since the saturated humidity used for normalization comes from the background, it does not have error correlations with temperature observations; nor does it introduce a bias in the case of zero humidity. On the other hand, this approach has the potential disadvantage that the errors in the humidity observations could become correlated with the errors in the background, leading to a different violation of the assumptions made in the formulation of the analysis schemes.

Liu (2007) tested the use of pseudo-RH in the EnKF framework and found that, in practice, it provided more accurate humidity analyses than either specific humidity or relative humidity. Motivated by these results, in the present study we convert the AIRS specific humidity retrievals to pseudo-RH observations, and obtain the observation error variance by numerical experimentation.

In assimilating pseudo-RH with the LETKF, we normalize the local specific humidity observations $\mathbf{q}^o_{(l)}$ by the mean background saturated specific humidity $\bar{q}_s^b$ at the observation locations as follows:

$$\mathbf{y}^o_{(l)} = \mathbf{E}^{-1} \mathbf{q}^o_{(l)}. \tag{4}$$

Here, $\mathbf{E}$ is the diagonal matrix whose entries are the mean background saturated specific humidity values interpolated to the observation locations. The corresponding background pseudo-RH at the observation location is equal to

$$\bar{\mathbf{y}}^b_{(l)} = \frac{1}{k} \sum_{i=1}^{k} h\left( \mathbf{D}^{-1} \mathbf{q}^{b(i)} \right). \tag{5}$$

Here, $\mathbf{D}$ is the diagonal matrix whose entries are the mean background saturated specific humidity values at the model grid points. (Notice that the dimensions of $\mathbf{E}$ and $\mathbf{D}$ are different.) Thus, the humidity components of the ensemble perturbations at observation locations are given by the matrix $\mathbf{Y}^b_{(l)}$ whose $i$th column is

$$h\left( \mathbf{D}^{-1} \mathbf{q}^{b(i)} \right) - \bar{\mathbf{y}}^b_{(l)}. \tag{6}$$

In applying the observation operator in Eqs. (5) and (6), we first normalize and then interpolate spatially. Our motivation is that the spatial variability of the normalized specific humidity perturbations is less than that of the specific humidity perturbations, so that the spatial interpolation of normalized perturbations is more accurate.

The humidity components of the background ensemble perturbation matrix $\mathbf{X}^b$ in Eqs. (1) and (2) are not affected by the change of observed variable and they remain as specific humidity: the $i$th column is equal to $\mathbf{q}^{b(i)} - \bar{\mathbf{q}}^b$. Thus, substituting Eqs. (4), (5), and (6) into Eqs. (1) and (2), we obtain the mean and the perturbed analyses in specific humidity units.
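
The pseudo-RH transformation described above can be sketched as follows. This is an illustration only: it assumes that the spatial interpolation can be written as a linear operator (a matrix called `H` here), and the function and argument names are hypothetical.

```python
import numpy as np

def pseudo_rh_obs(q_obs, qs_bg_mean_at_obs):
    """Normalize observed specific humidity by the mean background saturated
    specific humidity interpolated to the observation locations."""
    return q_obs / qs_bg_mean_at_obs

def pseudo_rh_background(q_members, qs_bg_mean_grid, H):
    """Background pseudo-RH mean and perturbations in observation space.

    q_members       : (n_grid, k) background specific humidity members
    qs_bg_mean_grid : (n_grid,)   mean background saturated specific humidity
    H               : (p, n_grid) linear interpolation weights (assumed)
    """
    # Normalize on the model grid first ...
    normalized = q_members / qs_bg_mean_grid[:, None]
    # ... then interpolate the normalized values to the observation locations.
    y_members = H @ normalized
    y_mean = y_members.mean(axis=1)
    Yb = y_members - y_mean[:, None]
    return y_mean, Yb
```

The state-space perturbations stay in specific humidity units, so passing `q_members` minus their mean as the humidity rows of the background perturbation matrix, together with `y_mean` and `Yb` from this sketch, yields analyses in specific humidity units.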

The goals of our study are to explore the impact of wind, temperature, and surface pressure observations on humidity analyses through the background error covariance and also to examine the differences between univariate and multivariate assimilation of humidity observations. To reach these goals, we design four experiments (Table 1). The first experiment is the control run, in which the specific humidity analyses are copied from the background as in Szunyogh et al. (2008), and the updated state variables include only winds, temperature, and surface pressure. The observations include all observations that were operationally assimilated at NCEP between 0000 UTC 1 January 2004 and 1800 UTC 31 January 2004, with the exception of satellite radiances, but including all satellite-derived wind observations. In addition, we assimilate AIRS temperature retrievals provided by C. Barnet (2007, personal communication), which were not assimilated operationally by NCEP. Figure 1 shows an example of the observation coverage for temperature around 500 hPa on a particular day (22 January 2004).

The observation error standard deviations for the operationally assimilated observations are provided along with the observations by NCEP. The observation error standard deviations for the AIRS temperature retrievals were also provided by C. Barnet (2007, personal communication). To compensate for the neglect of observation error correlations between retrieval values in the same vertical column, which result from overlaps between the weighting functions of the different channels, we increase the magnitude of the estimates of these observation errors by a factor of 2.

The second experiment, “passive q,” assimilates the same observations as the control run. The difference between passive q and the control run is the inclusion of specific humidity as part of the state vector xb in the passive-q experiment. With this change, the specific humidity analyses are not copied from the background ensemble any longer, but are updated during the data assimilation based on the nonhumidity observations through the error covariance term between the specific humidity and the other dynamical variables. A comparison between passive q and the control run shows the impact of winds, temperature, and surface pressure observations on the quality of specific humidity analyses.

The third experiment, “univariate q,” has two parallel assimilation cycles. One is the same as the control run, which creates the updated winds, temperature, and surface pressure state variables. The other is the univariate assimilation of AIRS specific humidity retrievals, which uses the AIRS specific humidity observations to update the specific humidity component of the state vector. The final analysis is the concatenation of the first analysis with the univariate humidity analysis. As explained earlier, in assimilating the AIRS specific humidity retrievals, we convert the specific humidity observations to pseudo-RH as proposed by Dee and da Silva (2003). The AIRS specific humidity retrievals have the same observation coverage as the AIRS temperature observations (gray dots and bottom bar plot in Fig. 1). However, since the quality of the AIRS specific humidity retrievals between 1000 and 700 hPa is relatively poor (C. Barnet 2007, personal communication), we exclude the humidity retrievals between these levels.

The last experiment, “multivariate q,” fully couples winds, temperature, surface pressure, and specific humidity during the data assimilation through the error covariance: the AIRS specific humidity observations are used to simultaneously update winds, temperature, surface pressure, and specific humidity components of the state vector. As in the univariate-q experiment, we convert specific humidity to pseudo-RH. A comparison between the multivariate-q and the univariate-q experiments will show the impact of the specific humidity observations on the analyses of the other state vector components. In our discussion, we will use the phrase “humidity runs” to refer to all three experiments (i.e., passive q, univariate q, and multivariate q) that update the humidity state vector during the analysis process.
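
The four experiments differ only in which observation groups update which part of the state vector. The small sketch below restates that design as data; the group labels `nonhumidity` and `airs_q` and the class layout are hypothetical shorthand introduced for illustration.

```python
from dataclasses import dataclass

# "nonhumidity": operationally assimilated observations (no radiances) plus
# AIRS temperature retrievals. "airs_q": AIRS specific humidity retrievals,
# assimilated as pseudo-RH. Both labels are illustrative names.
NONHUMIDITY = "nonhumidity"
AIRS_Q = "airs_q"

@dataclass(frozen=True)
class Experiment:
    name: str
    q_updated_by: tuple         # groups that update specific humidity
    dynamics_updated_by: tuple  # groups that update winds, temperature, surface pressure

EXPERIMENTS = (
    Experiment("control", (), (NONHUMIDITY,)),  # q copied from background
    Experiment("passive q", (NONHUMIDITY,), (NONHUMIDITY,)),
    Experiment("univariate q", (AIRS_Q,), (NONHUMIDITY,)),
    Experiment("multivariate q", (NONHUMIDITY, AIRS_Q), (NONHUMIDITY, AIRS_Q)),
)

def humidity_runs(experiments=EXPERIMENTS):
    """Names of the experiments that update humidity during the analysis."""
    return [e.name for e in experiments if e.q_updated_by]
```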

We run each experiment for a month from 0000 UTC 1 January 2004 to 1800 UTC 31 January 2004, with the analysis states being updated every 6 h. The analysis states (sections 4a, 4b, and 4c), and short-term forecasts (sections 4d and 4e) are verified against the higher-resolution (T256L28) operational analyses of NCEP, which were obtained by assimilating a large number of radiance observations in addition to the conventional observations. Because of the higher resolution and the assimilation of a much larger number of observations, which include humidity observations (but do not include AIRS data), the verification analyses are much more accurate (Whitaker et al. 2008), and were also used as verification states in Szunyogh et al. (2008) and Whitaker et al. (2008). In addition, unlike conventional observations (black dots in Fig. 1) that are commonly used as verification data, the operational analyses have uniform coverage throughout the globe, which is essential to assess the impact of assimilating the AIRS humidity retrievals that have the highest concentration over the oceans.

Two statistical quantities are used to show the difference in accuracy between the humidity runs and the control run. One is root-mean-square (rms) error (Figs. 2 and 6), which shows the absolute magnitude of the analysis or forecast error of the humidity runs and the control run. The other is mean-square error, which is used in calculating the error difference (Figs. 3 and 7) and testing the significance of the error difference (Figs. 4, 5, and 8) between the humidity runs and the control run. The mean-square error instead of rms error is used in comparing the error differences because it reduces the impact of the possible error present in the verification states, as shown in Szunyogh et al. (2008) and in the appendix.
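
A short synthetic sketch of the two quantities follows. It also illustrates one common argument for preferring mean-square error differences, stated here as an assumption: if the errors of the verification state are uncorrelated with the errors of the compared runs, their variance adds equally to both mean-square errors and cancels in the difference. All names and the synthetic error magnitudes are hypothetical.

```python
import numpy as np

def rms_error(field, verification):
    """Root-mean-square difference between a field and the verification state."""
    return float(np.sqrt(np.mean((field - verification) ** 2)))

def mse_difference(experiment, control, verification):
    """Mean-square error of an experiment minus that of the control run.
    Negative values mean the experiment is closer to the verification."""
    mse_exp = np.mean((experiment - verification) ** 2)
    mse_ctl = np.mean((control - verification) ** 2)
    return float(mse_exp - mse_ctl)

def synthetic_check(n=200_000, seed=0):
    """Compare the error difference measured against the (normally unknown)
    truth with the one measured against an imperfect verification state."""
    rng = np.random.default_rng(seed)
    truth = rng.normal(size=n)
    verification = truth + rng.normal(scale=0.5, size=n)  # independent errors
    experiment = truth + rng.normal(scale=1.0, size=n)
    control = truth + rng.normal(scale=1.5, size=n)
    return (mse_difference(experiment, control, truth),
            mse_difference(experiment, control, verification))
```

In this synthetic setting the two returned differences agree closely, whereas the rms errors measured against the imperfect verification state are inflated for both runs.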

4. Results

We evaluate the performance of the humidity runs by comparing the accuracy of the analyses and forecasts from these runs with those from the control run. These comparisons show the impact of the different humidity assimilation approaches (i.e., passive q, univariate q, and multivariate q) on the analyses and forecasts of the humidity and the other dynamical variables.

a. Global mean analysis accuracy

Figure 2 shows the time evolution of the global average analysis rms error (estimated by comparing with the high-resolution NCEP operational analyses) for the relative humidity (top-left panel), the temperature (top-right panel), and the winds (bottom panels) at the 500-hPa level. The rms errors of the relative humidity analyses from the univariate-q and the multivariate-q experiments are smaller than those from the control run and the passive-q experiment, with the rms error reduced by about 10%–15% from the rms error of the control run. Initially, the rms error of the relative humidity analysis for the univariate-q experiment decreases more sharply than for the multivariate-q experiment. After about 10 days, the relative humidity rms errors are comparable between these two experiments. A comparison between the control run and the passive-q experiment also shows that, after about 10 days, the relative humidity rms error of the passive-q experiment attains the same level of accuracy as the control run and becomes smaller afterward. The large relative humidity rms error peaks (e.g., between 0600 UTC 12 January 2004 and 0600 UTC 14 January 2004) in the multivariate-q and the univariate-q experiments correspond to the periods when the AIRS specific humidity retrievals are missing (bottom panel in Fig. 1).

The rms errors of the temperature analyses (top-right panel in Fig. 2) from the passive-q and multivariate-q experiments are similar to those from the control run. However, the rms error of the temperature analysis from univariate q is somewhat larger than that from the control run. The most important feature of the rms errors of the wind analyses (bottom panels in Fig. 2) is that the errors from multivariate q are smaller than those from the other three experiments beyond a 10-day spinup time, with errors reduced by up to 7% compared to the control run during some periods. This difference is more evident at times when the AIRS specific humidity observations are available, and it is especially large between 21 and 26 January 2004, when the specific humidity is best observed (bottom panel in Fig. 1). The rms errors of the wind analyses are comparable between the univariate-q and passive-q experiments, and are similar to those of the control run.

b. Spatial analysis accuracy comparison

In this section, we analyze the time average of the zonal mean analysis square error difference between the three humidity experiments and the control run. Based on these diagnostics, we discuss the reasons behind the different performance of the humidity runs. Since the results of section 4a showed that the temperature analyses have similar accuracies among the three humidity experiments, we focus here on the wind and humidity analyses. For these diagnostics, we take the time average over the analysis cycles of the last 16 days of the period under investigation. We make this choice to include the period when the analysis accuracy is stable and more AIRS specific humidity retrievals are available (Figs. 1 and 2).
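
The diagnostic described above can be sketched in a few lines of NumPy. The array layout (time, level, latitude, longitude) and the function name are assumptions made for illustration; with 6-hourly analyses, the last 16 days correspond to 64 cycles.

```python
import numpy as np

def zonal_time_mean_mse_difference(experiment, control, verification, n_last_cycles):
    """Time-averaged zonal mean of the square error difference.

    All inputs are arrays shaped (time, level, lat, lon) on the same grid.
    Returns an array shaped (level, lat); negative values mark where the
    experiment has a smaller mean-square error than the control run.
    """
    sq_err_exp = (experiment - verification) ** 2
    sq_err_ctl = (control - verification) ** 2
    # Keep only the last n_last_cycles analysis times.
    diff = sq_err_exp[-n_last_cycles:] - sq_err_ctl[-n_last_cycles:]
    # Average over time (axis 0) and longitude (axis 3).
    return diff.mean(axis=(0, 3))
```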

Figure 3 shows that the specific humidity analyses (left panels) have smaller mean-square errors between 700 and 200 hPa (where AIRS specific humidity retrievals are assimilated) in both the univariate-q (middle-left panel) and the multivariate-q (bottom-left panel) experiments. Although no specific humidity observations are assimilated in the passive-q experiment, the specific humidity analysis (top-left panel) still improves over the control run in the midlatitudes of the Southern Hemisphere (SH). As pointed out in section 3, this improvement comes from the impact of the observations of the other dynamical variables through the background error covariance terms. However, this impact is not always positive, especially in the tropics, which may be due to the strong convective activity and the difficulty of estimating an accurate background error covariance between humidity and the other dynamical variables in that region. One common characteristic of these three comparisons is the degradation of the specific humidity analysis in the higher atmospheric levels over the South Pole. This could be due to the combined effects of poor-quality specific humidity observations and poor-quality covariance estimates between humidity and the other dynamical variables.

The accuracy of the zonal wind analyses (right panels) is also improved in all the humidity runs compared to the control run. The largest improvement occurs in the multivariate-q experiment, which is consistent with the results of the globally averaged zonal wind analysis accuracy comparison at 500 hPa (Fig. 2). The improvements are the largest in the SH middle to high latitudes, where the density of wind observations is the lowest. In the passive-q and univariate-q experiments, the improvement is not as large as in the multivariate-q experiment. Although the accuracy of the specific humidity analyses is better in the univariate-q experiment than in the passive-q experiment, this improvement in the humidity analyses does not yield a corresponding improvement in the accuracy of the zonal wind analyses (middle-right panel in Fig. 3). The difference between the performance of the multivariate-q and univariate-q experiments is due to the fact that, in the multivariate-q experiment, the specific humidity observations affect the wind analyses not only through the forecast process, but also through the background error covariance between winds and humidity during data assimilation. Furthermore, since the specific humidity and wind variables are updated simultaneously in the multivariate-q experiment, the humidity, wind, and temperature components of the resulting analyses are more dynamically consistent.

c. Statistical significance of the analysis mean-square error difference between the humidity runs and the control run

In this section, we investigate whether the difference between the performance of the humidity runs and the control run, which we discussed in section 4b, is statistically significant. We follow Szunyogh et al. (2008), and apply a two-sample t test for correlated data (Wilks 2006) to the square error difference between the humidity runs and the control run.

In the two-sample t test for autocorrelated data, the significance threshold is proportional to the standard deviation of the difference between the two samples, with a coefficient that depends on the autocorrelation in the two samples. In our case, the total sample size is 64 for each comparison (the analysis cycles from the last 16 days), but the effective sample size (which ranges from 12 to 26) is much smaller because of the nonzero autocorrelations. We consider the mean-square error difference between the humidity runs and the control run to be significant when the null hypothesis, that the square error difference between the humidity runs and the control run is zero, can be rejected at the 95% confidence level. We apply the test to the zonal wind square error difference at the 300-hPa level and find that the difference is statistically significant at the 95% level when the absolute difference is larger than 1.6 m² s⁻² for the comparison between the passive-q experiment and the control run and 2.0 m² s⁻² for the other two comparisons.
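To make the role of the effective sample size concrete, the following sketch computes a significance threshold for a series of paired square-error differences. It is an illustration under stated assumptions rather than the procedure actually used here: it assumes the common lag-1 (AR(1)) approximation n' = n(1 − ρ₁)/(1 + ρ₁) and a Gaussian quantile, and the function names are hypothetical. Under that approximation, effective sizes of 12 to 26 out of 64 would correspond to lag-1 autocorrelations of roughly 0.68 to 0.42.

```python
import math
from statistics import NormalDist, mean, variance

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a list of numbers."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series[:-1], series[1:]))
    den = sum((a - m) ** 2 for a in series)
    return num / den

def effective_sample_size(n, rho1):
    """AR(1) approximation (an assumption of this sketch): n' = n(1 - rho1)/(1 + rho1)."""
    return n * (1.0 - rho1) / (1.0 + rho1)

def significance_threshold(diff_series, confidence=0.95):
    """Smallest |mean difference| that rejects a zero-mean null hypothesis.

    diff_series holds the paired square-error differences (experiment minus
    control), one value per analysis cycle. Returns (threshold, n_eff).
    """
    n = len(diff_series)
    # negative autocorrelation is clipped so n_eff never exceeds n
    rho1 = max(lag1_autocorrelation(diff_series), 0.0)
    n_eff = effective_sample_size(n, rho1)
    # two-sided Gaussian quantile; a t quantile would be somewhat larger
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * math.sqrt(variance(diff_series) / n_eff), n_eff
```

The threshold scales with the standard deviation of the differences and grows as the autocorrelation increases, which is why each comparison pair ends up with its own significance threshold.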

In Fig. 4, we use shading and contours to indicate the locations where the mean-square errors differ from those of the control run by a statistically significant amount. The shaded regions are where the humidity runs have smaller mean-square errors, while the contours indicate where the humidity runs have larger mean-square errors. The figure shows that the multivariate-q experiment (bottom panel) has the broadest areas with statistically significant smaller mean-square errors, which cover most of the globe, even in the Northern Hemisphere (NH). The other two comparisons (top two panels) show that there is no significant difference between the errors of the humidity runs (i.e., passive q and univariate q) and the control run in the NH, which is consistent with the results we obtained for the zonal average mean-square error comparison (Fig. 3). Compared to the other areas, the zonal wind analysis is most improved around 60°S, where the coverage of wind observations is sparsest (not shown here), in both the passive-q and the univariate-q experiments. The magnitude of the improvement in the univariate-q experiment is larger than that in the passive-q experiment, especially between 120° and 60°W. In the tropics, the results are mixed.

d. 6-hour forecast accuracy of the total column precipitable water

Humidity is a variable that connects dynamical and physical processes in numerical weather forecasts and is an important factor that affects the precipitation parameterization process. With improved humidity analyses, we expect to have more accurate precipitation forecasts. In this section, we examine the effects of analyzing the humidity on the accuracy of 6-h total column precipitable water forecasts. As in section 4c, we apply a two-sample t test for correlated data to the total column precipitable water forecast square error difference between the humidity runs and the control run. The two-sample t test shows that the threshold for a statistically significant difference at the 95% level is about 6.0 kg² m⁻⁴ for the mean-square error difference between the passive-q experiment and the control run, 5.5 kg² m⁻⁴ between the univariate-q experiment and the control run, and about 2.5 kg² m⁻⁴ between the multivariate-q experiment and the control run. As explained earlier, the different significance thresholds are due to the different autocorrelations of the errors in the experiments. In Fig. 5, only the values that pass the statistical test are plotted. The negative values, where the humidity runs have smaller mean-square errors, are shaded; the positive values, where the humidity runs have larger mean-square errors, are contoured.

Overall, the inclusion of specific humidity in the data assimilation process, either by simply extending the analyzed state vector or by assimilating humidity observations, improves the accuracy of 6-h total column precipitable water forecasts in most areas. The largest positive impact is achieved in the multivariate-q experiment (Fig. 5). Although the magnitude of the impact of analyzing the humidity on the 6-h total column precipitable water forecast accuracy differs among the humidity runs, the general pattern of improvements and degradations is very similar in all cases: the main regions of improvement are in the tropical Atlantic, the east Pacific, Southeast Asia, and the central Indian Ocean, while the main regions of degradation are in the western Pacific and the Amazonian regions. In contrast with the results obtained from the passive-q and the univariate-q experiments, the accuracy of the total column precipitable water forecast from the multivariate-q experiment is significantly better than that of the control run in the SH midlatitudes. These regions coincide with the areas where the wind analyses from the humidity runs are significantly better than those of the control run, which indicates that the improvement in the total column precipitable water forecast is due to a better analysis of both the humidity and the other dynamical fields.

e. The impact of the humidity assimilation on the forecast accuracy

The comparison of the analysis mean-square errors between the humidity runs and the control run shows that all three humidity runs have a positive impact on the accuracy of the analysis states, with the largest positive impact from the multivariate-q experiment. Since it is known that the impact of the assimilation of humidity observations tends to be short lived, we now examine how the forecast accuracy from the humidity runs changes with increasing forecast lead time. We also investigate the spatial characteristics of the forecast error reductions.

We generate twice-daily 96-h forecasts starting from the mean analyses at 0000 and 1200 UTC. Since we prepare forecasts for the last 16 days of the period considered in this study, we have a 32-member sample of forecast errors at each forecast lead time.

A comparison of the global average forecast rms errors at the 500-hPa pressure level (Fig. 6) shows that during the 96-h forecast, both the relative humidity (top panel) and the zonal wind (bottom panel) have the smallest error in the multivariate-q experiment. The relative humidity forecast error from the multivariate-q experiment is comparable to that from the univariate-q experiment, and the error from the passive-q experiment is comparable to that from the control run. The largest difference occurs at the analysis time and becomes smaller as the forecast time increases. In contrast, the differences between the zonal wind forecast errors from the multivariate-q experiment and those from the other three runs, although small, remain roughly constant for the entire 96-h forecast period.

To examine the spatial characteristics of the changes in the forecast accuracy, we compare the 48-h zonal wind forecast mean-square errors from the three humidity runs to those from the control run (Fig. 7). In all three humidity runs, the improvement of the zonal wind accuracy at upper levels in the tropics is larger than at the analysis time. The forecast initialized with the analyses of the multivariate-q experiment shows the largest improvement over the control run. Although the zonal wind analysis from the passive-q experiment has a similar advantage to that from the univariate-q experiment over the control run in the SH midlatitudes (Fig. 3), this advantage almost disappears after a 48-h forecast in the passive-q experiment, but is still significant in the univariate-q experiment (top-right panel in Fig. 7). In the NH midlatitudes, the forecast accuracy is improved in both the passive-q and the univariate-q experiments.

We carry out a two-sample t test on the 48-h zonal wind forecast square error differences between the humidity runs and the control run at the 300-hPa level. The threshold for a statistically significant difference at the 95% level is about 5.0 m² s⁻² for the error difference between the passive-q experiment and the control run, 4.0 m² s⁻² between the univariate-q experiment and the control run, and 3.5 m² s⁻² between the multivariate-q experiment and the control run. The results (Fig. 8) show that the difference is significant over most of the northern Pacific and North America in the passive-q experiment. The forecast accuracy from the univariate-q experiment is better than that from the control run mostly over the ocean, which may be related to the relatively sparse conventional observation coverage over the oceans and the relatively dense AIRS humidity retrieval coverage over that region. In the multivariate-q experiment, the forecast improvements are especially large in the midlatitudes in both the NH and the SH, while the performance is mixed over the tropics.

5. Summary and discussion

The LETKF, like any other implementation of an ensemble-based Kalman filter, is a data assimilation scheme that provides a dynamically evolving, flow-dependent estimate of the background error covariance. In this paper, we have assimilated the AIRS specific humidity retrievals with the LETKF. In particular, we have explored three different ways to obtain an analysis of specific humidity: 1) a passive-q setup in which no humidity observations are assimilated, but the humidity analysis can be affected by nonhumidity observations through the background error covariance; 2) a univariate-q setup in which the humidity analysis is updated only by humidity observations, and the humidity variable interacts with the other dynamical variables only through the forecast process; and 3) a multivariate-q setup in which the humidity observations are assimilated simultaneously with other types of observations with a coupled background error covariance. To the best of our knowledge, this is the first study to compare the performance of univariate-q and multivariate-q setups in an ensemble-based Kalman filter framework.
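The coupling rules that distinguish the setups can be written down as a small lookup. This is only a schematic of the description above, not the LETKF configuration itself: the function name `updates` and the string labels are hypothetical, the variable names follow Table 1 (u, v, T, ps, q), and the control run is encoded as copying humidity from the background with no humidity observations used.

```python
STATE_VARS = ("u", "v", "T", "ps", "q")  # variable names as listed for Table 1

def updates(setup, obs_var, state_var):
    """Return True if an observation of obs_var may change state_var at analysis time.

    Schematic of the four experiments; interactions through the forecast
    model between analysis times are not represented here.
    """
    obs_is_q = obs_var == "q"
    state_is_q = state_var == "q"
    if setup == "control":
        # humidity is copied from the background and q observations are not used
        return not obs_is_q and not state_is_q
    if setup == "passive-q":
        # q is analyzed through the covariance, but q observations are not assimilated
        return not obs_is_q
    if setup == "univariate-q":
        # q observations update only q; other observations update only non-q variables
        return obs_is_q == state_is_q
    if setup == "multivariate-q":
        # fully coupled background error covariance
        return True
    raise ValueError("unknown setup: " + setup)
```

For example, `updates("univariate-q", "q", "u")` is False, whereas `updates("multivariate-q", "q", "u")` is True, which is the difference the wind analysis results are attributed to.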

The results show that performing a humidity analysis provides better results than copying the humidity variables from the background, even when humidity observations are not assimilated (passive q). Further improvement of the analyses can be achieved by using humidity observations in either the analysis of the humidity model variable (univariate q) or in the analysis of all model variables (multivariate q). Either through the dynamical interaction during the forecast step (e.g., univariate q) or through both the dynamical interaction and the covariance interaction during data assimilation (e.g., passive q and multivariate q), the improved humidity analyses improve the analyses of the other dynamical variables, as is especially evident in the wind analyses. By far the largest improvement of the wind analyses is obtained from the multivariate-q experiment, because the correlated errors between specific humidity and the other dynamical variables are accounted for during data assimilation, and also because the analysis fields are dynamically more consistent. Even though the humidity field from the univariate-q experiment is as accurate as that from the multivariate-q experiment, the wind analyses from the univariate-q experiment are only comparable to those from the passive-q experiment. The impact of the assimilation of humidity observations on the temperature analyses is small.

The effect of the humidity assimilation on the 96-h forecasts is qualitatively similar to that on the analyses. In particular, the relative humidity and zonal wind forecasts have the smallest errors in the multivariate-q experiment.

In summary, in this study we have explored three ways to do humidity data assimilation. We have used a relatively low-resolution model and no radiance observations, which may have affected the results. Nevertheless, our results indicate that, compared to univariate q and passive q, multivariate q is a better way to assimilate humidity observations if a realistic state-dependent multivariate background error covariance is available, as is the case with an ensemble-based data assimilation system. The interaction between humidity and the other dynamical variables during the data assimilation process not only benefits the accuracy of the analyses, but also generates dynamically more consistent analyses that, in turn, improve the forecasts.

Acknowledgments

We were supported in this project by NASA Grants NNG04GK29G, NNX08AD40G, and NNX08AD37G, and would like to express our gratitude to the AIRS Science Team, especially Joel Susskind, Chris Barnet, Mous Chahine, and George Aumann for their guidance and helpful discussions. Chris Barnet generously provided us AIRS temperature and moisture retrievals. Gyorgy Gyarmati patiently helped us in building the observation structure. Comments by the two anonymous reviewers were very helpful in improving the manuscript.

REFERENCES

  • Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.

  • Bishop, C. H., B. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436.

  • Dee, D. P., and A. M. da Silva, 2003: The choice of variable for atmospheric moisture analysis. Mon. Wea. Rev., 131, 155–171.

  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10143–10162.

  • Holm, E., E. Anderson, A. Beljaars, P. Lopez, J-F. Mahfouf, A. J. Simmons, and J-N. Thepaut, 2002: Assimilation and modeling of the hydrological cycle: ECMWF’s status and plans. ECMWF Tech. Memo. 383, 55 pp.

  • Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.

  • Hunt, B. R., and Coauthors, 2004: Four-dimensional ensemble Kalman filtering. Tellus, 56A, 273–277.

  • Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126.

  • Liu, J., 2007: Applications of the LETKF to adaptive observations, analysis sensitivity, observation impact and assimilation of moisture. Ph.D. thesis, University of Maryland, College Park, 154 pp.

  • Miyoshi, T., and S. Yamane, 2007: Local ensemble transform Kalman filter with an AGCM at a T159/L48 resolution. Mon. Wea. Rev., 135, 3841–3861.

  • Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.

  • Parrish, D. F., and J. C. Derber, 1992: The National Meteorological Center’s spectral statistical-interpolation analysis system. Mon. Wea. Rev., 120, 1747–1763.

  • Susskind, J., C. Barnet, and J. Blaisdell, 2003: Retrieval of atmospheric and surface parameters from AIRS/AMSU/HSB data in the presence of clouds. IEEE Trans. Geosci. Remote Sens., 41, 390–409.

  • Szunyogh, I., E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation for the NCEP global model. Tellus, 60A, 113–130.

  • Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924.

  • Whitaker, J. S., T. M. Hamill, X. Wei, Y. Song, and Z. Toth, 2008: Ensemble data assimilation with the NCEP Global Forecast System. Mon. Wea. Rev., 136, 463–482.

  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. Academic Press, 611 pp.

APPENDIX

Mean-Square Error Difference and Root-Mean-Square Error Difference

In comparing the error difference between two experiments, the possible error present in the verification state has less impact on the mean-square error difference than on the root-mean-square error difference (Szunyogh et al. 2008).

Expressed in mean-square error, the error difference between the humidity runs and the control run can be written as
$$
\langle (x_h - x_v)^2 \rangle - \langle (x_c - x_v)^2 \rangle = \langle (x_h - x_t)^2 \rangle - \langle (x_c - x_t)^2 \rangle - 2\langle (x_h - x_t)(x_v - x_t) \rangle + 2\langle (x_c - x_t)(x_v - x_t) \rangle . \tag{A1}
$$
Here, $x_h$ represents the ensemble analysis or forecast mean value from the humidity runs, $x_c$ is the value from the control run, $x_v$ is the verification operational analysis state, $x_t$ is the true value, and the angled bracket $\langle \cdot \rangle$ stands for the average of the gridpoint values over space and/or time. When expressed in rms error difference, the accuracy difference between the humidity runs and the control run is
$$
\sqrt{\langle (x_h - x_v)^2 \rangle} - \sqrt{\langle (x_c - x_v)^2 \rangle} = \sqrt{\langle (x_h - x_t)^2 \rangle - 2\langle (x_h - x_t)(x_v - x_t) \rangle + \langle (x_v - x_t)^2 \rangle} - \sqrt{\langle (x_c - x_t)^2 \rangle - 2\langle (x_c - x_t)(x_v - x_t) \rangle + \langle (x_v - x_t)^2 \rangle} . \tag{A2}
$$
With mean-square error difference, as shown in Eq. (A1), the square errors in the verification analyses $\langle (x_v - x_t)^2 \rangle$ are canceled, and the only factor that affects the accuracy of the error differences is the possible error covariance between the verification state and that of either the humidity runs or the control run. With rms error difference [Eq. (A2)], on the other hand, the square error in the verification states cannot be simply canceled, and the error covariance still exists. Therefore, we expect that the possible errors present in the verification analysis will have larger impact on the rms error difference than on the mean-square error difference.
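The cancellation of the verification error in the mean-square error difference can be checked numerically. The sketch below uses synthetic Gaussian data, and the error sizes, sample length, and function names are arbitrary choices for illustration. It compares the difference measured against the verification state with the same difference written in terms of errors relative to the truth plus the two covariance terms.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def _msd(a, b):
    """Mean-square difference between two equally long lists."""
    return _mean([(p - q) ** 2 for p, q in zip(a, b)])

def check_cancellation(n=1000, seed=0):
    """Return both sides of the mean-square error difference identity."""
    rng = random.Random(seed)
    x_t = [rng.gauss(0.0, 1.0) for _ in range(n)]    # synthetic "truth"
    x_v = [t + rng.gauss(0.0, 0.3) for t in x_t]     # verification analysis
    x_h = [t + rng.gauss(0.0, 0.5) for t in x_t]     # humidity run
    x_c = [t + rng.gauss(0.0, 0.7) for t in x_t]     # control run

    def cov_with_verification(x):
        # <(x - x_t)(x_v - x_t)>: error covariance with the verification state
        return _mean([(p - t) * (v - t) for p, v, t in zip(x, x_v, x_t)])

    lhs = _msd(x_h, x_v) - _msd(x_c, x_v)
    rhs = (_msd(x_h, x_t) - _msd(x_c, x_t)
           - 2.0 * cov_with_verification(x_h)
           + 2.0 * cov_with_verification(x_c))
    return lhs, rhs
```

The two returned values agree to rounding error for any choice of inputs, because the term containing only the verification error appears once with each sign. No such term-by-term cancellation is available once the square roots are taken.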

Fig. 1.

(top) The spatial coverage of temperature observations (gray dots are the AIRS temperature retrievals; black dots are the conventional temperature observations) around 500 hPa on 22 Jan 2004. The AIRS specific humidity retrievals have the same spatial coverage as the temperature retrievals. (bottom) Total number of AIRS specific humidity retrievals assimilated in each 6-h interval at 505 hPa.

Citation: Monthly Weather Review 137, 11; 10.1175/2009MWR2791.1

Fig. 2.

Time evolution of 500-hPa global average rms errors for the analysis of (top left) the relative humidity, (top right) temperature (K), (bottom left) zonal wind (m s⁻¹), and (bottom right) meridional wind (m s⁻¹). The control run (blue line), passive q (gray line with open circles), univariate q (green line), and multivariate q (black line) are shown. Here and in the following figures, the verification is made against NCEP high-resolution (T256L28) analyses.

Fig. 3.

The time average zonal mean-square error difference between three humidity experiments and the control run [(top) between the passive-q experiment and the control run, (middle) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run] for (left) specific humidity (g² kg⁻²) and (right) zonal wind (m² s⁻²). Gray shades indicate where the humidity runs have a smaller mean-square error than the control run, the contours indicate where the humidity runs have a larger mean-square error than the control run, and the magnitude is larger than 0.0005 g² kg⁻² or 0.5 m² s⁻².

Fig. 4.

The 300-hPa time average zonal wind analysis mean-square error difference (m² s⁻²) (top left) between the passive-q experiment and the control run, (top right) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run. All shaded areas and contours (the absolute values that pass the statistical test change with the comparison pairs, see details in section 4c) are significant at the 95% level. The shaded areas are where the humidity runs have better performance than the control run, and the contours are where the humidity runs have worse performance than the control run.

Fig. 5.

The total column precipitable water 6-h forecast mean-square error (kg² m⁻⁴) comparison (top) between the passive-q experiment and the control run, (middle) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run. Absolute differences larger than 6.0 kg² m⁻⁴ are significant at the 95% level in (top), those larger than 5.5 kg² m⁻⁴ are significant at the same level in (middle), and those larger than 2.5 kg² m⁻⁴ are significant in (bottom). The negative values, where the humidity runs have smaller mean-square errors, are shaded; the positive values, where the humidity runs have larger mean-square errors, are contoured.

Fig. 6.

500-hPa global average forecast rms errors for (top) relative humidity and (bottom) zonal wind (m s⁻¹). Multivariate q (gray line), univariate q (black line with open circles), passive q (black line with closed circles), and control run (gray line with squares) are shown.

Fig. 7.

48-h forecast mean-square error (m² s⁻²) comparisons for zonal wind (top left) between the passive-q experiment and the control run, (top right) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run. Gray shading indicates where the humidity runs have smaller mean-square errors than the control run, the contours indicate where the humidity runs have larger mean-square errors than the control run, and the magnitude is larger than 1.0 m² s⁻². The values between −1.0 and 1.0 m² s⁻² are blank.

Fig. 8.

The 300-hPa time-average 48-h zonal wind forecast mean-square error difference (m² s⁻²) (top left) between the passive-q experiment and the control run, (top right) between the univariate-q experiment and the control run, and (bottom) between the multivariate-q experiment and the control run. Absolute differences larger than 5.0 m² s⁻² are significant at the 95% level in (top left), those larger than 4.0 m² s⁻² are significant in (top right), and those larger than 3.5 m² s⁻² are significant in (bottom). Gray shading indicates where the humidity runs have smaller mean-square errors than the control run, and the contours indicate where the humidity runs have larger mean-square errors than the control run.

Table 1.

Summary of the experimental design (u = zonal wind, υ = meridional wind, T = temperature, ps = surface pressure, and q = specific humidity).
