1. Introduction
Surface-layer in situ observations compose a rich, accurate, and often dense data source, but they are generally underutilized in current operational numerical weather prediction (NWP) and data assimilation (DA) systems. In the absence of precipitation and strong winds, the utility of radar is limited, and surface (screen-height) observations may be the only reliable data source in the planetary boundary layer (PBL).
The effective use of these data for NWP applications is complicated by the transient coupling and decoupling of the surface and PBL with the free atmosphere aloft. Parameterization schemes currently available in mesoscale and global NWP models may also exhibit significant model error. Here, experiments are performed with a single-column model and simulated observations to test the potential of surface observations for PBL structure estimation and model error mitigation.
Better assimilation of surface observations is likely to have tangible benefits for NWP. Accurate representation of the initial PBL structure in a mesoscale NWP model could lead to improved short-range forecasts of local thermally driven flows such as the sea breeze and slope flows. Forecasts of larger-scale phenomena such as convective outbreaks (Crook 1996; McCaul and Cohen 2002), frontal propagation (Rotunno et al. 1998), and cyclogenesis (Kuo et al. 1991) could also be improved. In addition, when an NWP model is used as a tool to generate 3D datasets for process studies, an accurate representation of the PBL would increase confidence in the results. For example, air pollution studies (e.g., Kumar and Russell 1996; Shafran et al. 2000) are sensitive to modeled PBL properties.
This study uses a perturbed-observation ensemble Kalman filter (EnKF; Burgers et al. 1998; Houtekamer and Mitchell 1998) to assimilate simulated screen-height (2 m) and anemometer-height (10 m) observations in an evolving PBL model. A season of mesoscale forecasts from the Weather Research and Forecasting model (WRF; Skamarock et al. 2005) is examined to understand some summertime climatological characteristics of a parameterized PBL (hereafter PPBL). A 1D PBL model with a similar climatology, given the same forcing, is designed to run offline from its mesoscale model parent. The model consists of the WRF PPBL and land surface algorithms with all other tendencies (advection, radiative forcing, etc.) drawn from the set of WRF forecasts. This approach facilitates large ensembles and many cases, and its similarity to the complete mesoscale model suggests that results from the simplified model may extend to the full modeling system.
The success of the EnKF in estimating the state of a PPBL depends on the accurate representation of growing modes in the PPBL, the statistical variance–covariance characteristics of the PPBL, and the statistical nature of the observations. This study represents one step toward understanding the extent to which surface observations, assimilated via the EnKF, can reconstruct the state of a PPBL. It offers an optimistic view of the positive impact of assimilation because the simulated observations are sampled from a solution of the same (stochastic) model.
Past studies have demonstrated the potential of the EnKF, and its variations, for DA applications. Evensen (1997), Miller et al. (1999), Anderson and Anderson (1999), and Anderson (2001) investigated its performance with low-order models. Quasigeostrophic models were used successfully by Evensen (1994), Houtekamer and Mitchell (1998), Mitchell and Houtekamer (2000), and Hamill and Snyder (2000). Whitaker and Hamill (2002), Mitchell et al. (2002), and Snyder and Zhang (2003) implemented the EnKF with more complex primitive equation models. Anderson (2001) performed some simple parameter estimation experiments, while Mitchell and Houtekamer (2000) parameterized the full model error-covariance matrix and estimated the parameters with the EnKF. This body of research suggests that the EnKF works well for many dynamical systems. But the efficacy of the EnKF in a PPBL, which has unknown error-growth properties and potentially significant model error, has not been explored. Analysis and experimentation are necessary to determine whether a useful DA system is realizable.
The four-dimensional variational (4DVAR) approach has been used successfully with prespecified error covariances to assimilate surface pressure observations (Järvinen et al. 1999). As an integrated quantity, pressure provides information on the state of the troposphere aloft and elicits a deep response that is easily mapped to other variables through balance relationships. Specifying the state of the PPBL is an inherently different problem. A pressure observation cannot distinguish a PPBL feature from larger vertical scales, but directly assimilating surface-layer winds and temperatures to better estimate the local, transient state of the PPBL may lead to better simulations and short-range forecasts near the surface.
One difficulty in using surface observations to specify the state of the PBL is determining the vertical influence of the observations. Nudging and incremental update approaches have been reported to improve the PPBL representation in earlier mesoscale implementations. Stauffer and Seaman (1994) applied nudging to individual model layers and relied on the model to vertically propagate the observation influence. Fast (1995) specified a vertical exponential weighting function (e-folding length 30 m) that allowed the surface observations to influence nearby layers, and relied on the mesoscale model to retain the vertical structure and further spread the observations. Other variants of nudging rely on updating the PBL structure with a parameterized or climatological profile, and matching it to an observation and a background profile aloft (e.g., Stauffer et al. 1991; Leidner et al. 2001). This accounts for some of the bulk properties of the PBL under different stability conditions, but suffers from a sensitivity to the choice of vertical profiles. More sophisticated profile-matching techniques have also shown promise when combined with an incremental update DA system (Ruggiero et al. 1996).
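To make the contrast with covariance-based spreading concrete, the following is a minimal sketch of single-observation nudging with an exponential vertical weight, in the spirit of Fast (1995). It is a simplification and not Fast's actual formulation: the innovation is taken at the lowest model level only, and the function name and nudging coefficient `g` are hypothetical. Only the 30-m e-folding length comes from the text.

```python
import math


def nudge_profile(z, model, obs_value, dt, g=3e-4, efold=30.0):
    """Relax a model profile toward one surface observation (hypothetical helper).

    z         : heights (m) of the model levels above the observation height
    model     : model values at those levels
    obs_value : the surface observation
    dt        : time step (s)
    g         : nudging coefficient (1/s), an illustrative value
    efold     : e-folding length (m) of the vertical weight
    """
    innovation = obs_value - model[0]  # misfit at the lowest model level
    updated = []
    for zk, xk in zip(z, model):
        weight = math.exp(-zk / efold)  # influence decays exponentially with height
        updated.append(xk + dt * g * weight * innovation)
    return updated
```

The shape of the increment is fixed by `efold` regardless of stability or time of day, which is the kind of limitation that flow-dependent error covariances are meant to remove.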
A hypothesis underlying this work is that modern DA techniques can improve on nudging by more effectively, and objectively, spreading surface observations upward with anisotropic and inhomogeneous error-covariance structures. Both the EnKF and the 4DVAR algorithms provide for systematic treatment of observational uncertainty and allow for straightforward assimilation of indirect observations (those other than state variables). The EnKF (Burgers et al. 1998; Houtekamer and Mitchell 1998) is attractive for assimilation of surface observations to specify the state of a PPBL because it is easy to implement for a variety of observation platforms. More importantly, anisotropic and inhomogeneous background error covariances are a natural product of the algorithm.
Another difficulty with simulating and forecasting the state of the PBL is the likelihood that model error is as severe as other sources of error. One way to overcome this is to account for uncertainty in model parameters, which are ubiquitous in PPBL schemes. Acknowledging uncertainty in model parameters is simply admitting that models are not perfect. This reasoning is behind ensemble research with multimodel ensembles (e.g., Hou et al. 2001) and attempts to apply an explicit stochastic term to physical parameterization tendencies (Buizza et al. 1999). While model error in a dynamical model may have unknown characteristics, it is reasonable to expect that in a physical parameterization scheme a large part of the error is related to uncertain parameters that are assigned constant values.
Model error can be partially addressed with the EnKF because statistical distributions of PPBL parameters (“constants” that may or may not be fundamental physical quantities) can be estimated through correlation with observations. This work focuses on estimating a soil moisture parameter because it strongly influences PBL structure by controlling the partition of sensible and latent heat fluxes through the surface.
Error of representativeness, which can be viewed as a component of model error, is not addressed in the present study. The fact that numerical models cannot fully capture the surface orography, for example, increases the difficulty of assimilating surface observations. If the model orography lacks, say, a long narrow valley, the surface conditions in the model will not be representative of actual observations at the valley floor, and assimilation of those observations into the model may yield little benefit. This important issue is not addressed in our experiments, although we note that such errors of representativeness can be expected to decrease as the resolution of NWP and climate models increases. In this study the use of synthesized observations ensures that the variability of the assimilating model and the observations is similar, thereby removing representativeness error from consideration.
A number of previous studies use assimilation algorithms with column models of the atmosphere and soil. Several have focused on estimating soil moisture from near-surface observations and used the atmospheric column model to ensure realistic surface energy transfer. For example, Mahfouf (1991) used screen-height temperature and relative humidity to retrieve soil moisture through both a variational technique and a sequential assimilation approach based on specified covariances between observed variables and soil properties. Bouttier et al. (1993a) and Bouttier et al. (1993b) refined the sequential approach, showing that it provides soil moisture analyses and can improve NWP forecasts that use them. Margulis and Entekhabi (2003) used a variational algorithm to assimilate both screen-height observations and radiometric surface temperature into a land–atmosphere and PBL column model (see Rhodin et al. 1999 for a similar approach). Their results suggest that studies with synthetic data were useful predictors of performance with real data. Reichle et al. (2001a, b) also use variational approaches, and impose all of the atmospheric forcing instead of explicitly including the PBL state in the assimilating model.
Here, we seek to evaluate the extent to which the state of the atmospheric PBL can be estimated from surface observations. Our column model includes both the PBL and soil conditions in the state vector, but we focus on the atmospheric response to near-surface observations, rather than the soil response.
The next section explores the PPBL characteristics of a set of real-time WRF model forecasts, over the Southern Great Plains, in an effort to define the potential benefit of a DA system on a PPBL in a mesoscale model. Section 3 describes the PPBL modeling system used for this study, and compares its variance–covariance structures with those of the WRF. To test how the actual DA system compares with the potential improvement, section 4 documents experiments on constraining the PPBL with the observations. It also explores parameter estimation as a means of reducing model error. Finally, section 5 reviews the conclusions of this work.
2. Variance and correlation structures of PBL forecasts over the Southern Great Plains
In this section we describe the climatology of WRF forecasts, in terms of variance and correlation, in order to guide the development and interpretation of the offline model that will be used in the assimilation experiments. The climatological correlations between the surface layer and the state of the PPBL indicate the potential for the observations to provide information relevant to structures in the PPBL.
a. Climate of real-time forecasts
We examine WRF forecast profiles for July–August 2002 at a single point located in the Southern Great Plains. The 48-h forecasts were run daily at the National Center for Atmospheric Research (NCAR) on a continental U.S. (CONUS) domain with horizontal grid spacing of 22 km. The vertical grid has 28 stretched levels with spacing that varies from approximately 100 m near the surface to 1300 m at the top of the domain, which is near 20 km. Runs began daily with 0000 UTC (1900 LST) analyses from the Eta model (Black 1994) 3D variational (3DVAR) DA system (EDAS; Rogers et al. 1999). The WRF was configured with the National Centers for Environmental Prediction (NCEP) Medium-Range Forecast (MRF) model PPBL scheme (Hong and Pan 1996), which includes a five-layer slab soil model (Dudhia 1996) and a surface layer similarity scheme (Louis 1979), coupled with the PPBL nonlocal diffusion scheme. While containing only a limited number of degrees of freedom, the model does allow for feedback between the land and the atmosphere that is important for use in 3D modeling systems (Margulis and Entekhabi 2001).
A column located near 35.2°N, 97.5°W, in Oklahoma, was chosen for evaluation, and the profiles were interpolated to a uniform vertical grid with Δz = 100 m for comparison with the results presented later. Nothing about this particular location is unique, and the analysis presented here should represent the surrounding region. The flat topography eliminates many mesoscale circulations driven by differential heating. The summer period is weakly forced at large scales and allows the analysis to focus on the remaining local effects, including those from heterogeneous land surfaces.
For the July–August period, 3-hourly output from 48-h WRF forecasts was available for 54 days. From this sample, we calculated the relevant variances and correlations of model variables. We will refer to these quantities as “climatological” and use the notation $\sigma_\gamma^2$ to denote the variance of a variable γ. A simplified PPBL modeling system, designed to reproduce the variability exhibited by this sample of forecasts, is introduced in the next section.
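The climatological variance can be sketched as follows, assuming the forecasts are stored as nested lists indexed by day, output time, and level. The data layout and function name are hypothetical, and the sketch uses the population variance; the text does not state whether the original calculation used population or sample (N − 1) normalization. The square root of each entry gives the standard deviations plotted in Fig. 1.

```python
from statistics import pvariance


def climatological_variance(forecasts):
    """Variance across days at every output time and level (hypothetical helper).

    forecasts[d][t][k] is the value of one variable (e.g., theta) on day d,
    at output time t, and at vertical level k.
    Returns var[t][k], the "climatological" variance sigma_gamma^2.
    """
    n_times = len(forecasts[0])
    n_levels = len(forecasts[0][0])
    return [
        [pvariance([day[t][k] for day in forecasts]) for k in range(n_levels)]
        for t in range(n_times)
    ]
```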
Climatological standard deviations (spread) at forecast lead times of 3 to 48 h (2200 LST one day to 1900 LST two days later) are shown in Fig. 1 for potential temperature θ, U-wind (zonal) component, and specific humidity Q (σθ, σU, and σQ, respectively). The screen-height diagnostic variables at z = 2 m for θ and Q (denoted θs and Qs), and z = 10 m for U (denoted Us), are plotted at z = 0 m. These will be used in assimilation experiments later.
During the forecasts, and particularly during the first 12 h, the variability of θ and Q within the PPBL decreases markedly. This adjustment shows that the climate of the WRF PPBL differs from that found in the EDAS, and that much of the drift from one to the other occurs during the first 12 h. It is not clear whether the EDAS analyses or the WRF forecasts are more representative of the actual PBL climatology, though experience suggests that model solutions typically possess too little variability. For the present work we ignore the adjustment period and focus on the latter 24-h period.
The PPBL in the WRF climatology clearly has greater σθ than the synoptically inactive free atmosphere aloft (Fig. 1a). This behavior can be explained by the time dependence of the initialization and forecasts. Each forecast in the climatology was initialized on a different day, and near-surface daily temperatures can easily vary by several degrees because of varying cloud cover and soil moisture, for example. The maximum at t = 39 h (1000 LST), z = 1000 m, shows large daily variability in the growth of the PPBL with the morning onset of thermal convection. Just before and during sunrise (30–36 h), σθ is lowest near the surface. This may be because the low temperatures are more closely tied to the surface and soil parameters in the model, which are constant, than to heating or cooling caused by forcing external to the PPBL.
Figure 1b shows that, as expected, σU in the PPBL exhibits a strong diurnal dependence. During the day, characterized by a convective regime, levels throughout the PPBL are coupled to the surface and the wind speeds are limited by the negative momentum flux associated with rising parameterized thermals. Thus a σU minimum occurs in the PPBL near 45 h (1600 LST). Conversely, the early-morning PPBL is decoupled from the surface because it is in a neutral or stable regime for most of the night, and features such as low-level jets from the inertial oscillation can contribute to large σU. Figure 1b also shows that at any time of day, the winds at z = 10 m are limited by surface friction.
The plot of σQ shows a persistent maximum between 2000 and 3000 m, near where the sharp Q gradient at the top of the convective PPBL can often be found (Fig. 1c). The other notable maximum is near the surface during the night, when σQ is determined by the downward moisture flux from the mixed layer of the previous day. Above the surface layer at night, σQ is primarily a function of horizontal advection of moisture that was transported upward the previous day and is left in the residual layer. Just before sunrise (36 h), coincident with the maximum in σU, σQ in the residual layer increases slightly. The fact that σQ decreases in the PPBL during the morning and afternoon hours, while the spread at the surface remains high, can result from a strong response to entrainment of more homogeneous air from aloft, an inefficiency in vertical transport of moisture into the modeled mixed layer, or both.
Although the standard deviations discussed above are not directly related to analysis or forecast error, high climatological variance does suggest regions that may be difficult to estimate and forecast. Correcting the state in those regions can be accomplished by directly observing the regions of high spread or by observing regions that are well correlated with the regions of high spread. Next we examine the climatological correlation of the PPBL state with surface observation locations.
b. Correlation with surface observation locations
Spatial correlation coefficients (r) reveal characteristic structures in the flow, which will influence how the PPBL can be inferred from a surface observation and an appropriate DA system. We begin by examining the correlation of θ, U, and Q profiles in the PPBL with their respective screen- and anemometer-height values θs, Us, and Qs, which are the diagnostic forecast variables (Fig. 2).
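The profile of r with height at one output time can be sketched as a Pearson correlation, taken across the forecast days, between the surface diagnostic value and the value at each level. The function names and data layout are hypothetical; the sketch assumes neither series is constant, since the correlation is undefined otherwise.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_with_surface(surface, profiles):
    """r[k] = cor(surface value, value at level k), computed across days.

    surface[d]     : diagnostic value (e.g., theta_s) on day d at one output time
    profiles[d][k] : value at level k on day d at the same output time
    """
    n_levels = len(profiles[0])
    return [pearson(surface, [p[k] for p in profiles]) for k in range(n_levels)]
```

Repeating the calculation at every output time gives a time-height correlation field of the kind shown in Fig. 2; pairing the surface value of one variable with the profile of another gives the cross correlations of Fig. 3.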
Potential temperature is correlated with r = 0.8 to a maximum height of approximately z = 2000 m in the late afternoon, with the depth decreasing to a minimum just before sunrise. This behavior corresponds to the growth and decay of the convective PPBL, but the correlation is surprisingly strong through the night when the residual layer is decoupled from the surface layer. The decoupling does not completely destroy the correlation with the surface, revealing the effect of weak advection and suggesting that θ is horizontally homogeneous in the nearby residual layer. If we suppose that surface nocturnal cooling occurs at approximately the same rate each night, then the temperature at any time of night is strongly tied to the afternoon maximum in θs, which in turn is strongly tied to θ in the mixed layer formed above it. With weak advective forcing, the correlation can be maintained through the night. From this we expect that assimilation of θs under quiet synoptic conditions can be productive nearly any time of day.
The correlation of U with Us shows a similar diurnal structure, but with more rapid decorrelation at night. The maximum extent of the 0.8 correlation is approximately z = 2400 m for a short duration around t = 45 h (1600 LST). The correlation reduces rapidly as the sun goes down but remains above 0.7 for z < 1000 m throughout the night. While the temperature correlation is positive all the way to z = 4000 m, cor(Us, U) ≈ 0 above z = 3000 m at night, and surface wind observations are not likely to help determine winds aloft, which are determined by a larger-scale pressure gradient. Below z = 1000 m just before sunrise (36 h), correlations greater than 0.7 suggest that the maximum in σU (Fig. 1b) should be partially determined by observations of Us, but the possibility of rather large error during that period still exists. During the day, the correlation is at a maximum in the PPBL while the variance is at a minimum, suggesting that surface wind observations could substantially reduce errors there.
Similar to both θ and U, the correlation of the Q profile with Qs is strongest in the PPBL during the afternoon and weakest just before sunrise. The strong correlation in the PPBL suggests that surface observations may be able to influence its structure. But the highly variable region shown in Fig. 1c at the top of the PPBL is coincident with weaker correlations of approximately 0.3, suggesting that the region may be difficult to correct with surface observations when error is high.
c. Cross correlation with surface observation locations
Links between the various permutations of θ, U, and Q are to be expected in the PPBL. For example, the wind speed is important in determining both moisture and temperature flux through the surface layer. Here we consider those links by examining cross correlation in the same manner as the single-variable correlations.
The cross correlations plotted in Fig. 3 are weaker than the single-variable correlations. All of the values are below r = 0.7, and most are below r = 0.5, suggesting tenuous connections in the PPBL between diagnostic surface variables and profiles of different state variables. This is surprising because temperature and humidity fluxes, which largely determine the diagnostic variables, are functions of virtual temperature and wind speed. But the lack of correlation suggests that surface observations of one variable may not be particularly useful for determining other variables in the PPBL through cross correlation. The state of each variable in the PPBL should be better determined by a direct observation.
The analysis presented in this section partially characterizes the error and climatological variability in the WRF real-time implementation over the Southern Great Plains. Whether surface observations can actually determine the state of the PPBL depends directly on the estimated variance and covariance structures, which in turn depend on the assimilating model and the forecast dynamics. Before discussing the assimilation experiments, we turn next to the development and behavior of the single-column model.
3. The single-column model
a. The parameterized 1D PBL
The 1D column model (hereafter “the model”) is constructed around the same PPBL as the WRF model. Note that our goal in developing this 1D model is not to reproduce realistic 3D solutions, but rather to have a simple vehicle to investigate the assimilation of surface observations. Because our focus is on the PPBL response to observations, the model incorporates external atmospheric processes (such as horizontal advection and downward long- and shortwave radiative fluxes) through tendencies drawn from the WRF climatology.
Both the initial state and the tendency term for the model are drawn from the WRF climatology as follows. An initial state is constructed by first randomly selecting two states valid at a lead time of 12 h, then choosing a random weighting factor from a uniform distribution [α ∈
The model is configured to run on 120 vertical grid points with Δz = 100 m and Δt = 120 s. This vertical grid spacing is slightly coarser near the surface than a typical mesoscale model implementation, but much finer through most of the domain. Scalars of real-time WRF model variables are interpolated to this grid for all computations. Each run is integrated for 36 h corresponding to the 12–48-h WRF forecast period.
The effect of surface observations is evaluated by performing a series of data assimilation experiments with the EnKF, which explicitly estimates the required forecast covariances from the ensemble. Simulated observations are computed from a reference or “true” solution of (1), which is initialized and propagated as just described. For these data assimilation experiments, we desire a forecast ensemble whose initial conditions and tendencies f are independent of the true solution. This is accomplished by withholding from the climatology the pair of WRF forecasts used in constructing the true solution, and then, for each member, drawing an additional pair. All the results presented are for N = 100-member ensembles; this is approximately 20% of the size of the state space, but it keeps computation fast. Larger ensemble sizes were tested for a few cases and show only small differences.
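The withholding step can be sketched as index sampling over the 54 available forecasts. This is only an illustration under stated assumptions: the function name is hypothetical, each member's pair is assumed to be drawn independently from the remaining forecasts, and the two forecasts within a pair are assumed to be distinct. How each pair is then combined into initial conditions and tendencies follows the procedure described earlier and is not reproduced here.

```python
import random


def draw_pairs(n_forecasts, n_members, rng):
    """Choose the truth pair, then one pair per member with the truth pair withheld.

    Returns (truth_pair, member_pairs), where each pair is a list of two
    distinct forecast indices (hypothetical helper).
    """
    truth_pair = rng.sample(range(n_forecasts), 2)
    remaining = [i for i in range(n_forecasts) if i not in truth_pair]
    member_pairs = [rng.sample(remaining, 2) for _ in range(n_members)]
    return truth_pair, member_pairs


# Usage: 54 days of WRF forecasts and a 100-member ensemble
truth_pair, member_pairs = draw_pairs(54, 100, random.Random(0))
```

Because the truth pair never appears in any member, the ensemble cannot share initial conditions or forcing with the reference solution.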
b. Variance–covariance structures in the offline model
The model (1) is not a perfect representation of the full WRF model, and neither its forecasts nor its climatology will be exactly the same as those of the WRF. In this section, we show evidence that the structures in the offline model are indeed like those of the WRF model, but that the increased resolution sharpens some of the variance structures. The similarities lead one to expect that some of the results of these experiments will extend to the complete WRF modeling system and perhaps other modeling systems.
Plots of standard deviations σθ, σU, and σQ averaged over 500 random 100-member ensembles are shown in Fig. 4. The large number of cases is intended to provide statistically robust assimilation results (e.g., Morss and Emanuel 2002). The plots are for forecast lead times of 24–48 h, and may be directly compared to the plots in Fig. 1.¹ The column model output is plotted every 1 h for all of our results, whereas the WRF output is available only every 3 h, so the WRF plots appear smoother.
Qualitatively the patterns show similarities, but the overall ensemble spread is smaller because the distributions are narrowed by averaging to produce initial conditions and forcing terms. The patterns also show stronger gradients because the grid spacing is different from that of the WRF. Larger Δz for the lowest model layer weakens the surface layer gradients in the state variables and hence the variability. But the inversion near z = 3000 m is better resolved in this model, leading to greater variance there. Because we are evaluating over many cases, and the truth and ensembles are drawn from the same distributions, absolute error of the ensemble mean is similar to the ensemble spread shown in Fig. 4.
Correlation coefficient structures (Fig. 5) also suggest that the important features of the WRF solutions are being captured (Fig. 2). Differences result from propagating the state vector with the offline model, but overall Fig. 5 and Fig. 2 are qualitatively similar. This similarity provides some confidence that the results of assimilation experiments with the offline model should extend to the complete WRF model when run for similar forcing regimes. The strong correlations also indicate that the PPBL will respond to surface observations, a notion that is tested next.
4. Application of the EnKF
In this section, results of assimilating 2-m θ and specific humidity Q (θs and Qs), and 10-m wind components (Us and Vs) observations are presented.
a. EnKF algorithm
The observation errors are assumed to be uncorrelated and to have variances 1.0 K2, 2.0 m2 s−2, and 1.0 × 10−6 kg2 kg−2, for θs, (Us, Vs), and Qs, respectively, which agree roughly with those cited and used in Crook (1996). An imperfect observation is created by adding a realization of mean-zero Gaussian white noise with the appropriate error variance to the perfect observation (from the truth run). Then an ensemble of N observations, one to assimilate into each member, is generated by adding realizations of mean-zero Gaussian white noise to the imperfect observation.
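The observation-generation step above, and the perturbed-observation update it feeds, can be sketched for a single scalar observation. This is a minimal illustration, not the study's code. It assumes the observation operator simply picks one element of the state vector, whereas in the model θs, Us, Vs, and Qs are diagnosed through surface-layer similarity. It also assumes observations are processed one at a time, which is permissible because their errors are uncorrelated. The function names are hypothetical.

```python
import random


def make_observations(truth_value, obs_var, n_members, rng):
    """Imperfect obs = truth + N(0, R); then one perturbed copy per member."""
    sd = obs_var ** 0.5
    y = truth_value + rng.gauss(0.0, sd)
    return y, [y + rng.gauss(0.0, sd) for _ in range(n_members)]


def enkf_update(ensemble, obs_index, perturbed_obs, obs_var):
    """Perturbed-observation EnKF update for one scalar observation.

    ensemble      : list of N state vectors (lists of floats)
    obs_index     : index of the directly observed state element (simplified H)
    perturbed_obs : list of N perturbed observations, one per member
    obs_var       : observation-error variance R
    """
    n = len(ensemble)
    m = len(ensemble[0])
    hx = [member[obs_index] for member in ensemble]  # predicted observation H x_i
    hx_mean = sum(hx) / n
    x_mean = [sum(member[j] for member in ensemble) / n for j in range(m)]
    # Sample covariance of every state element with the predicted observation
    cov_xy = [
        sum((member[j] - x_mean[j]) * (hx[i] - hx_mean)
            for i, member in enumerate(ensemble)) / (n - 1)
        for j in range(m)
    ]
    var_y = sum((v - hx_mean) ** 2 for v in hx) / (n - 1)
    gain = [c / (var_y + obs_var) for c in cov_xy]  # Kalman gain column
    return [
        [member[j] + gain[j] * (perturbed_obs[i] - hx[i]) for j in range(m)]
        for i, member in enumerate(ensemble)
    ]
```

The gain entry for each level is proportional to that level's ensemble covariance with the observed surface value, so the vertical spreading of the increment changes with the flow rather than following a prescribed profile.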
b. Results when observations are assimilated
In this subsection we demonstrate that the assimilation of simulated observations in this experimental environment leads to substantial error reduction. In the next subsection, we further test the utility of the surface observations by adding a source of simulated model error.
We can say that an ensemble run in an assimilation cycle is constrained by the observations, and an ensemble run without them is unconstrained. Constrained and unconstrained ensembles have ensemble-mean absolute error EC and EU, respectively. Here the constrained ensembles use initial conditions and tendencies that are identical to those for the unconstrained ensembles shown in the last section. The results are averaged over the same 500 cases.
The behavior of the assimilation algorithm can be checked by examining the ratio of ensemble spread to ensemble-mean error (σγ/EC). If the EnKF is properly updating and generating appropriate posterior distributions, then this ratio should be close to one. Because we expect the assimilation to primarily affect the lower portion of the domain, and for brevity, these results are averaged over the lowest 1000 m and shown in Fig. 6. Some variation is evident, but the curves remain close to the expected value of one and indicate a properly functioning EnKF.
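The consistency check can be sketched as the ratio of the case-averaged ensemble spread to the case-averaged absolute error of the ensemble mean. The helper name and data layout are hypothetical; in practice the cases would also be pooled over the levels in the lowest 1000 m and over the 500 experiments.

```python
from statistics import mean, stdev


def spread_error_ratio(cases):
    """Mean ensemble spread divided by mean absolute error of the ensemble mean.

    cases : list of (members, truth) tuples, where members holds the N ensemble
            values of one variable at one level and time, and truth is the
            corresponding value from the reference run.
    """
    spreads = [stdev(members) for members, _ in cases]
    errors = [abs(mean(members) - truth) for members, truth in cases]
    return mean(spreads) / mean(errors)
```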
An easy way to evaluate any benefit of assimilated surface observations is to compare the forecast error of the ensembles with and without them. A reduced error indicates that the assimilated observations are spread upward according to the covariances estimated from the forecast ensemble and that they have had a positive impact on skill. Figure 7 summarizes the results, showing that EC ≤ EU below z = 4000 m at all times. Note that because the truth and the ensemble members are drawn from the same distribution in these experiments, the variances shown in Fig. 4 are a proxy for EU. Therefore, we do not show EU explicitly.
The introduction of observations at t = 30 h (0100 LST) occurs when the atmosphere is stable and weakly coupled to the surface, but immediate error reductions are evident. The effect is strongest in the lowest few hundred meters because the high correlation with the surface through a shallow layer persists at night, as shown in Fig. 5. A weak effect extends higher and small error reductions are evident in θ and Q as high as 3000 m, and in U as high as 1500 m. A maximum in Q error reduction at t = 30 h, z = 2800 m demonstrates the effect of deep covariance structures. In this case, the high-variance region near that level in Fig. 4c is likely associated with a strong vertical gradient at the top of the nocturnal residual layer, which may be correlated with the surface because both its height and the surface layer humidity at t = 30 h depend on the humidity of the soil and the well-mixed PPBL the day before. Later in the forecasts, the magnitude of error reduction is modulated by the subsequent observations and the diurnal cycle.
Comparison of Figs. 7a and 4a shows that error reduction ranges from 0% to 60%. For example, an error maximum in θ just after sunrise at z = 1200 m that is associated with the growth of the PPBL experiences an error reduction of 0.6 K, or approximately 37%. The maximum at t = 39 h, z = 200 m in Fig. 7a represents a 60% reduction. During the late afternoon strong coupling between the surface and the atmosphere results in a deeper response. The reduction of 0.7 K in θ error for z ≤ 2800 m at 48 h is approximately 54% of the unconstrained error, which is EU = 1.3 K.
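The percentages quoted here and below follow from expressing the error reduction as a fraction of the unconstrained error, (EU − EC)/EU. A minimal worked check using the θ and U values given in the text (the helper name is hypothetical):

```python
def percent_reduction(e_unconstrained, e_constrained):
    """Error reduction of the constrained ensemble as a percentage of E_U."""
    return 100.0 * (e_unconstrained - e_constrained) / e_unconstrained


# theta near 48 h: E_U = 1.3 K, reduced by 0.7 K
theta_48h = percent_reduction(1.3, 1.3 - 0.7)  # about 53.8, i.e., roughly 54%
# U before sunrise: E_U = 3.5 m/s, reduced by 2.0 m/s
u_presunrise = percent_reduction(3.5, 3.5 - 2.0)  # about 57.1
```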
Error reduction in U is also substantial and is maximized where EU is greatest. The presunrise reduction of 2.0 m s−1 is 57% of EU = 3.5 m s−1. Eliminating the presunrise error maximum could have implications for larger-scale moisture and pollutant transport associated with inertial low-level jets. During the day, the error reduction is relatively constant at about 30%, leaving an error in the constrained ensemble of EC ≈ 1.0 m s−1. It appears that this value may be an asymptotic lower limit of error for this model, assimilation system, and assigned observation-error variance.
Figure 4 gives the impression that development of the well-mixed Q layer lags that of θ by 3 h. This can also be seen in Fig. 5, as Qs becomes strongly coupled with the profile of Q at t = 42 h rather than t = 39 h. The effect of the observation through a deep layer is therefore realized later. The 48-h reduction in Q for z < 2000 m is approximately 35% of EU = 1.4 g kg−1. Near the top of the PPBL (just above z = 3000 m at 48 h) the error reduction in terms of percentage is lower, ranging from approximately 5% to 30%. This is expected from the weaker correlations, the difficulty of accurately analyzing the depth of the PPBL, and the location of the strong vertical gradients at the PPBL top.
For comparison, time-invariant background-error covariances were derived from a climatology of the offline model and used in place of the ensemble-derived covariances. The results should be similar to what might be expected in an optimal interpolation or 3DVAR scheme with stationary covariance functions. Although the climatological covariances were effective at reducing error, the ensemble-derived dynamic covariances (those shown in this section) consistently reduced the errors by an additional 20%–30%, demonstrating the utility of flow-dependent covariance structures.
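The two approaches differ only in the background-error covariance B used to form the gain. A toy sketch, assuming a hypothetical two-level column whose surface-to-aloft correlation switches between deeply coupled and decoupled cases; here the static B is simply the average of per-case sample covariances, which is an assumption and not necessarily how the climatological covariances in this study were computed:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cases, n_members, obs_var = 100, 200, 0.25


def toy_case_ensemble(coupling):
    """Hypothetical 2-level column (surface, level aloft) with unit variances and
    a surface/aloft correlation equal to `coupling`."""
    s = rng.normal(0.0, 1.0, n_members)
    aloft = coupling * s + np.sqrt(1.0 - coupling**2) * rng.normal(0.0, 1.0, n_members)
    return np.column_stack([s, aloft])


def gain_column(b, obs_idx, r):
    """Kalman gain B H^T (H B H^T + R)^-1 when H selects state element obs_idx."""
    return b[:, obs_idx] / (b[obs_idx, obs_idx] + r)


# "Climatology" mixing deeply coupled (0.9) and decoupled (0.1) cases
couplings = np.where(np.arange(n_cases) % 2 == 0, 0.9, 0.1)
b_static = np.mean([np.cov(toy_case_ensemble(c), rowvar=False) for c in couplings], axis=0)

# Flow-dependent covariance for one well-mixed (deeply coupled) case
b_flow = np.cov(toy_case_ensemble(0.9), rowvar=False)

k_static = gain_column(b_static, 0, obs_var)
k_flow = gain_column(b_flow, 0, obs_var)
print(np.round(k_static, 2), np.round(k_flow, 2))
```

Averaging over regimes dilutes the covariance, so in this toy the static gain underweights the observation aloft on a well-mixed afternoon (and would overweight it on a decoupled night), whereas the ensemble covariance adapts to the case at hand.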
These results appear promising and suggest that the assimilation of surface observations could be useful in real applications. For example, Crook (1996) found that convection is sensitive to the gradients of moisture and temperature between the surface and the mixed layer, and McCaul and Cohen (2002) found that the buoyancy and shear profiles are important determinants of storm intensity. This simplified system, however, carries some caveats. More research is needed to understand 3D assimilation and model error. Model error can lead to inaccurate ensemble-derived covariances and, in turn, to poor performance of the assimilation algorithm. This experimental framework allows us to begin addressing model error: in the next subsection we assess its impact and the potential for mitigating it via assimilation of surface observations.
c. Toward a reduction in model error through parameter estimation
The sensitivity of PPBL schemes to the moisture availability parameter M, which can be thought of as the ratio of actual to potential evaporation at the surface, has been documented in Zhang and Anthes (1982), Troen and Mahrt (1986), and Oncley and Dudhia (1995). Here we simulate model error by specifying incorrect values of M in the forecast model. We then use the EnKF as implemented above to estimate M by including it in the state vector x (so that each ensemble member uses a different value of M) and allowing the correlation between the parameter and the surface observations to update its distribution. All statistics are derived from the same 500 cases used previously.
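Including M in the state vector amounts to state augmentation: the parameter has no dynamics of its own, but it is updated through its ensemble covariance with the observed variables. A minimal sketch with the perturbed-observation EnKF, assuming a hypothetical linear relation between M and Qs in place of the offline PPBL model, a single observed variable, an arbitrary observation-error variance, and an initial lognormal ensemble centred on an incorrect M (the initial sampling used in the experiments is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
n_members, n_cycles = 100, 10
obs_var = 0.05**2        # hypothetical Qs observation-error variance, (g/kg)^2
m_true = 0.6             # hypothetical "correct" moisture availability


def toy_qs(m):
    """Hypothetical forward model: screen-height humidity (g/kg) rising with M.
    Stands in for integrating the PPBL/land-surface model with each member's M."""
    return 10.0 + 4.0 * m


# Initial ensemble of mu = log M, centred on an incorrect value (M = 0.3)
mu = rng.normal(np.log(0.3), 0.5, n_members)
initial_error = abs(np.exp(mu).mean() - m_true)

for _ in range(n_cycles):
    # "Forecast": each member's Qs depends on its own M plus unrelated error
    qs = toy_qs(np.exp(mu)) + rng.normal(0.0, 0.1, n_members)
    y = toy_qs(m_true) + rng.normal(0.0, np.sqrt(obs_var))   # synthetic observation

    # Augmented state [Qs, mu]; the observation operator picks out Qs
    state = np.column_stack([qs, mu])
    anom = state - state.mean(axis=0)
    cov_with_qs = anom.T @ anom[:, 0] / (n_members - 1)
    gain = cov_with_qs / (cov_with_qs[0] + obs_var)

    # Perturbed-observation update: mu moves through its covariance with Qs
    y_pert = y + rng.normal(0.0, np.sqrt(obs_var), n_members)
    state = state + np.outer(y_pert - qs, gain)
    mu = state[:, 1]      # the updated parameter is carried to the next cycle

m_estimate = np.exp(mu).mean()
print(f"initial error {initial_error:.3f}, final estimate {m_estimate:.3f} (true {m_true})")
```

Updating μ = log M rather than M keeps every member's M positive after the analysis, a convenient property of the log transform.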
For each of these 500 cases, an ensemble of initial values for M is constructed as follows. Working with the variable μ = log M (or, equivalently, assuming a lognormal distribution for M), a random value μ̂ is drawn from the Gaussian distribution
Forecasts using these incorrect values of M have increased ensemble-mean forecast error when compared to the model with correct parameters, as shown in Fig. 8. The differences are small for U because momentum fluxes are only weakly dependent on M (Oncley and Dudhia 1995). The error is greater for θ and Q because of their stronger relationship to M. The model error appears to make forecasting the depth of the well-mixed moisture in the PBL particularly difficult.
We next attempt to recover the correct value of M, and assess the benefit of allowing the observations to influence the moisture availability, by examining the ensemble-mean error and the ensemble spread of M. The results are averaged over all 500 cases, normalized by the average initial values, and denoted ÊC and σ̂M for error and spread, respectively. Figure 9a shows that, after a rapid drop at the first assimilation step, both ÊC and σ̂M decrease monotonically. The steady decrease in ÊC shows that, on average, the estimate M̂ approaches the truth.
The solid curve in Fig. 9a, the ratio of the average (unnormalized) spread to the average error, reveals the extent to which σ̂M provides quantitative information about the uncertainty of M. Its initial value depends on the initial sampling strategy and on σ²μ, the assumed variance of log M. At the first assimilation step (t = 30 h) the spread decreases much more than the error, as the normalized curves also show. Thereafter the ratio decreases slowly and appears to level off near 1.05, indicating that the error and spread are decreasing at the same rate. This result indicates that the ensemble spread provides a quantitative estimate of the error in the ensemble-mean M.
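The three curves in Fig. 9a can be computed from per-case time series of the absolute error of the ensemble-mean M and of the ensemble spread. A small sketch with hypothetical input arrays (the function name and the numbers are illustrative only):

```python
import numpy as np


def normalized_curves(abs_err, spread):
    """Case-averaged diagnostics in the style of Fig. 9a.

    abs_err : (n_cases, n_times) absolute error of the ensemble-mean M at each time
    spread  : (n_cases, n_times) ensemble standard deviation of M at each time
    Returns (error normalized by its initial value, spread normalized by its
    initial value, ratio of average spread to average error).
    """
    mean_err = abs_err.mean(axis=0)
    mean_spread = spread.mean(axis=0)
    return mean_err / mean_err[0], mean_spread / mean_spread[0], mean_spread / mean_err


# Two hypothetical cases at two times (illustrative numbers only)
abs_err = np.array([[0.20, 0.10], [0.40, 0.20]])
spread = np.array([[0.30, 0.10], [0.30, 0.11]])
e_hat, s_hat, ratio = normalized_curves(abs_err, spread)
print(e_hat, s_hat, ratio)
```

Note that the ratio is formed from the unnormalized averages, so it is not simply the quotient of the two normalized curves unless the initial spread and error happen to be equal.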
The case-by-case variability in the estimate M̂ is demonstrated in Fig. 9b. Ten randomly chosen cases show that the estimate for a particular case need not approach the true value. This suggests a complicated relationship between M and other error sources that vary with the case.
The correlations of M with the surface observations show that θs and Qs have a greater effect on M than does Us (Fig. 9c). Near-surface water vapor is positively correlated with M because a greater M allows more evaporation from the surface. Near-surface temperature is negatively correlated with M because M controls the partitioning of the total surface heat flux between latent and sensible heat, and more evaporation leads to less sensible heating.
Constrained ensembles show comparable error both with and without estimation of M. Figure 8 shows that the unconstrained error is greater when M is incorrectly specified. A comparison of constrained and unconstrained ensembles, both initialized with an incorrect distribution of M, would therefore show error reductions with structures similar to those in Fig. 7 but of greater magnitude. We omit the figure for brevity, but the largest differences in θ and Q error reduction, of up to roughly 50%, occur near the top of the mixed layer between 42 and 48 h. Overall, these results suggest that parameter estimation via the EnKF is a viable approach for mitigating systematic model error introduced by incorrect parameter specification.
5. Summary and conclusions
This paper documents the effectiveness of simulated surface observations, assimilated via the EnKF, for reducing error in a parameterized PBL. An analysis of variance–covariance structures in the PPBL of WRF model real-time forecasts over the Southern Great Plains (Fig. 1) indicates climatological regions of expected maximum error. The correlation of the PPBL state with the state at near-surface observation heights (Fig. 2) suggests the potential for reducing that error. Guided by the WRF climatology, an offline model was constructed from the MRF PPBL scheme and forcing derived from the WRF forecasts. The model allows efficient experimentation on both data assimilation of synthetic surface observations and parameter estimation.
Results show that the simulated surface observations are effective at constraining the state of the PPBL and reducing ensemble-mean forecast error (Fig. 7). The introduction of observations results in an initial error decrease through a deep layer, with a magnitude that depends on the strength of atmospheric coupling with the surface. Subsequent reductions in error are maximized during the strongest coupling and extend through the deepest layer late in the afternoon when the PPBL is well mixed. A presunrise maximum in U error associated with nocturnal jets is also substantially reduced. For both θ and Q, error associated with the strong vertical gradients at the top of the PPBL is reduced, suggesting an improved forecast of the depth of the PPBL. Extension of these results to real modeling systems could improve forecasts of pollutant transport and storm initiation.
An exploratory attempt to mitigate simulated model error shows the potential of parameter estimation in a data assimilation cycle. The model was initialized with a distribution of the moisture availability parameter centered on an incorrectly specified mean. The distribution was updated with the EnKF and simulated observations; on average it approached the correct value, with the ensemble spread providing a measure of the error in the estimate (Fig. 9).
These results represent a step toward productive utilization of surface mesonet observations in a comprehensive data assimilation system. They also demonstrate the potential utility of the EnKF approach to DA for a system with different error-growth properties than those used in many previously documented experiments. Three directions for future study are immediately obvious: extension to 3D, different climatological regimes, and real-data cases.
Local circulations in mesoscale models are often responses to horizontal differences in the PPBL structure, and it is expected that these results will extend to a 3D PPBL to improve simulation and very short-range forecasts of PBL circulation. This hypothesis should be rigorously tested, and given sufficient resources, an experiment such as that carried out here is one possible approach.
The usefulness of surface observations naturally depends on the depth of coupling with the atmosphere. During quiescent summertime regimes this coupling is deep because horizontal advection acts only slowly to destroy it. During winter, strong advection will often rapidly decouple the surface from the atmosphere aloft, reducing the utility of surface observations. This need not be a problem in practice, because the vertical influence of a surface observation should be limited during these periods; if the observation reduces error within the appropriate shallow layer, it may still improve simulations and forecasts. Initial experiments with a winter regime over the Southern Great Plains show positive results, suggesting that surface observations may be useful under a variety of conditions.
The assimilation of real observations will also present a more formidable challenge. Both perfect and imperfect model experiments performed here suggest that observations can reduce model error when they are used to help estimate model parameters. But the total model error of a PPBL is undoubtedly larger than the simulated error in these experiments. The PPBL has less spatial and temporal variability than do real observations, and may also contain significant biases. Experimentation with more complex PBL models, such as large eddy simulation (LES), may further the development of surface-observation DA methods before attempting to assimilate real observations.
Finally, experimentation with a more sophisticated land surface model coupled to an atmospheric model is necessary. The interaction of atmospheric ensemble data assimilation schemes with more realistic prognostic soil variables may lead to a better representation of surface exchanges. Additional parameter estimation experiments with these models may also help guide future modeling efforts and development of the land surface models themselves.
Acknowledgments
Chris Snyder was supported by the National Science Foundation under Grant 0205655. The authors gratefully thank Wei Wang and the rest of the real-time WRF forecast team at NCAR for access to archived research runs. Jimy Dudhia also provided valuable comments during the finalization of the manuscript. Finally, two anonymous reviewers provided thoughtful and thorough reviews that greatly improved the clarity and readability of the manuscript.
REFERENCES
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.
Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758.
Black, T. L., 1994: The new NMC mesoscale Eta model: Description and forecast examples. Wea. Forecasting, 9, 265–278.
Bouttier, F., J-F. Mahfouf, and J. Noilhan, 1993a: Sequential assimilation of soil moisture from atmospheric low-level parameters. Part I: Sensitivity and calibration. J. Appl. Meteor., 32, 1335–1351.
Bouttier, F., J-F. Mahfouf, and J. Noilhan, 1993b: Sequential assimilation of soil moisture from atmospheric low-level parameters. Part II: Implementation in a mesoscale model. J. Appl. Meteor., 32, 1352–1364.
Buizza, R., M. Miller, and T. N. Palmer, 1999: Stochastic representation for model uncertainties in the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 125, 2887–2908.
Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724.
Crook, N. A., 1996: Sensitivity of moist convection forced by boundary layer processes to low-level thermodynamic fields. Mon. Wea. Rev., 124, 1767–1785.
Dudhia, J., 1996: A multi-layer soil temperature model for MM5. Proc. Sixth Annual PSU/NCAR Mesoscale Model Users’ Workshop, Boulder, CO, NCAR, 49–51.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10143–10162.
Evensen, G., 1997: Advanced data assimilation for strongly nonlinear dynamics. Mon. Wea. Rev., 125, 1342–1354.
Fast, J. D., 1995: Mesoscale modeling and four-dimensional data assimilation in areas of highly complex terrain. J. Appl. Meteor., 34, 2762–2782.
Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter three-dimensional variational analysis scheme. Mon. Wea. Rev., 128, 1835–1851.
Hong, S-Y., and H-L. Pan, 1996: Nonlocal boundary layer vertical diffusion in a medium-range forecast model. Mon. Wea. Rev., 124, 2322–2339.
Hou, D., E. Kalnay, and K. Droegemeier, 2001: Objective verification of the SAMEX ’98 ensemble experiments. Mon. Wea. Rev., 129, 73–91.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811.
Järvinen, H., E. Andersson, and F. Bouttier, 1999: Variational assimilation of time sequences of surface observations with serially correlated errors. Tellus, 51A, 469–488.
Kumar, N., and A. G. Russell, 1996: Comparing prognostic and diagnostic meteorological fields and their impacts on photochemical air quality modeling. Atmos. Environ., 12, 1989–2010.
Kuo, Y-H., R. J. Reed, and S. Low-Nam, 1991: Effects of surface energy fluxes during the early development and rapid intensification stages of seven explosive cyclones in the western Atlantic. Mon. Wea. Rev., 119, 457–476.
Leidner, S. M., D. R. Stauffer, and N. L. Seaman, 2001: Improving short-term numerical weather prediction in the California coastal zone by dynamic initialization of the marine boundary layer. Mon. Wea. Rev., 129, 275–293.
Louis, J-F., 1979: A parametric model of vertical eddy fluxes in the atmosphere. Bound.-Layer Meteor., 17, 187–202.
Mahfouf, J-F., 1991: Analysis of soil moisture from near-surface parameters: A feasibility study. J. Appl. Meteor., 30, 1534–1547.
Margulis, S. A., and D. Entekhabi, 2001: Feedback between the land surface energy balance and atmospheric boundary layer diagnosed through a model and its adjoint. J. Hydrometeor., 2, 599–620.
Margulis, S. A., and D. Entekhabi, 2003: Variational assimilation of radiometric surface temperature and reference-level micrometeorology into a model of the atmospheric boundary layer and land surface. Mon. Wea. Rev., 131, 1272–1288.
McCaul, E. W., and C. Cohen, 2002: The impact on simulated storm structure and intensity of variations in the mixed layer and moist layer depths. Mon. Wea. Rev., 130, 1722–1748.
Miller, R. N., E. F. Carter Jr., and S. T. Blue, 1999: Data assimilation into nonlinear stochastic models. Tellus, 51A, 167–194.
Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. Mon. Wea. Rev., 128, 416–433.
Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.
Morss, R. E., and K. A. Emanuel, 2002: Influence of added observations on analysis and forecast errors: Results from idealized systems. Quart. J. Roy. Meteor. Soc., 128, 285–321.
Oncley, S. P., and J. Dudhia, 1995: Evaluation of surface fluxes from MM5 using observations. Mon. Wea. Rev., 123, 3344–3357.
Reichle, R. H., D. Entekhabi, and D. B. McLaughlin, 2001a: Downscaling of radio brightness measurements for soil moisture estimation: A four-dimensional variational approach. Water Resour. Res., 37, 2353–2364.
Reichle, R. H., D. B. McLaughlin, and D. Entekhabi, 2001b: Variational data assimilation of microwave radiobrightness observations for land surface hydrologic applications. IEEE Trans. Geosci. Remote Sens., 39, 1708–1718.
Rhodin, A., F. Kucharski, U. Callies, D. P. Eppel, and W. Wergen, 1999: Variational analysis of effective soil moisture from screen-level atmospheric parameters: Application to a short-range weather forecast model. Quart. J. Roy. Meteor. Soc., 125, 2427–2448.
Rogers, E., D. Parrish, and G. DiMego, 1999: Changes to the NCEP operational eta analysis. Tech. Rep. 454, National Weather Service Technical Procedures Bulletin, 25 pp.
Rotunno, R., W. C. Skamarock, and C. Snyder, 1998: Effects of surface drag on fronts within numerically simulated baroclinic waves. J. Atmos. Sci., 55, 2119–2129.
Ruggiero, F. H., K. D. Sashegyi, R. V. Madala, and S. Raman, 1996: The use of surface observations in four-dimensional data assimilation using a mesoscale model. Mon. Wea. Rev., 124, 1018–1033.
Shafran, P. C., N. L. Seaman, and G. A. Gayno, 2000: Evaluation of numerical predictions of boundary layer structure during the Lake Michigan Ozone Study. J. Appl. Meteor., 39, 412–426.
Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the advanced research WRF version 2. NCAR Tech. Rep. TN-468, 100 pp.
Snyder, C., and F. Zhang, 2003: Assimilation of simulated Doppler radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 131, 1663–1677.
Stauffer, D. R., and N. L. Seaman, 1994: Multiscale four-dimensional data assimilation. J. Appl. Meteor., 33, 416–434.
Stauffer, D. R., N. L. Seaman, and F. S. Binkowski, 1991: Use of four-dimensional data assimilation in a limited-area mesoscale model. Part II: Effects of data assimilation within the planetary boundary layer. Mon. Wea. Rev., 119, 734–754.
Troen, I., and L. Mahrt, 1986: A simple model of the atmospheric boundary layer: Sensitivity to surface evaporation. Bound.-Layer Meteor., 37, 129–148.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924.
Zhang, D., and R. A. Anthes, 1982: A high-resolution model of the planetary boundary layer—Sensitivity tests and comparisons with SESAME-79 data. J. Appl. Meteor., 21, 1594–1609.
Fig. 1. WRF climatology std dev (spread) as a function of height AGL and forecast lead time. Contours of (a) σθ (0.1 K), (b) σU (0.5 m s−1), and (c) σQ (0.1 g kg−1) are shown up to 4000 m. The vertical line at forecast hour 24 is to aid comparison with later plots.
Citation: Monthly Weather Review 133, 11; 10.1175/MWR3022.1
Fig. 2. Correlation coefficients relating screen-height forecasts to profiles of the same variable, as a function of height AGL and forecast lead time in the WRF forecasts. Contours of (a) cor(θs, θ), (b) cor(Us, U), and (c) cor(Qs, Q) are shown.
Fig. 3. Cross-correlation coefficients relating screen-height forecasts to profiles of other variables, as a function of height AGL and forecast lead time. Contours of (a) cor(θs, U), (b) cor(θ, Us), (c) cor(θs, Q), (d) cor(θ, Qs), (e) cor(Us, Q), and (f) cor(U, Qs) are shown.
Fig. 4. Ensemble std dev (spread), as a function of height AGL and forecast lead time for 100 realizations of the forecast model in Eq. (1). Contours of (a) σθ (0.1 K), (b) σU (0.5 m s−1), and (c) σQ (0.1 g kg−1) are shown up to 4000 m.
Fig. 5. Same as in Fig. 2, but for the 1D offline model.
Fig. 6. Ratio of ensemble spread to ensemble-mean error averaged over z ≤ 1000 m.
Fig. 7. Ensemble-mean error difference (EC − EU), between the EnKF-constrained (EC) and the unconstrained (EU) ensembles, as a function of height AGL and forecast lead time. Contours of (a) potential temperature θ (0.1 K), (b) U wind (0.5 m s−1), and (c) specific humidity Q (0.1 g kg−1) are shown up to 4000 m.
Fig. 8. Same as in Fig. 7, but for ensemble-mean error increase (positive shaded) from model error introduced through an incorrect specification of moisture availability, as a function of height AGL and forecast lead time.
Fig. 9. Estimates of moisture availability M. (a) The ensemble-mean absolute error normalized by the initial error (thick dotted curve), the ensemble spread normalized by the initial spread (thick dashed curve), and the ratio of the spread to error (thick solid curve). All are averages over 500 cases. (b) The ensemble-mean estimate for several different cases, and the correct value MT (thick dashed line). (c) The correlation of M with various surface observations, averaged over all cases.
Because the integration of the offline model begins at 1200 UTC, after the corresponding 0000 UTC WRF forecast, the lead time in the offline model is actually 12–36 h. Here we denote the lead time as 24–48 h to be consistent with the tendency terms extracted from WRF forecasts.