A probabilistic verification and factor-separation analysis (FSA) elucidate skillful nowcasts of planetary boundary layer (PBL) temperature, moisture, and wind profiles with a single-column model (SCM) and ensemble filter (EF) assimilation of surface observations. Recently, an FSA showed the importance of surface assimilation versus advection and radiation on ensemble-mean skill. That work addressed the necessary complexity of the model and assimilation scheme for improving PBL nowcasts, relative to deterministic-mesoscale predictions. Here, probabilistic ensemble-based SCM forecasts are compared to a simple probabilistic postprocessing scheme termed climatological dressing (CD). CD adjusts a deterministic mesoscale forecast using surface-atmosphere 3D-climatological covariances, a 30-min persistence model, and surface-forecast errors. It also dresses the adjusted profile with an in-sample uncertainty distribution (obtained from archives) scaled by the 30-min forecast error. Superior deterministic skill from SCM/EF results during night when flow-dependent covariances are more accurate than climatological covariances. CD is deterministically more skillful for temperature and moisture profiles during daytime because SCM/PBL parameterization yields biased covariances. SCM/EF is most probabilistically skillful because (a) the EF covariances accommodate large seasonal variability, (b) the 30-min error persistence assumption fails during nighttime, and (c) vertical error covariance estimates from archived forecasts are generally poor estimates of actual error covariances. A probabilistic FSA of the SCM/EF shows the relative importance of surface assimilation, radiation parameterization, and advection during night. Results confirm surface assimilation as the most important factor. A factor can be deterministically beneficial and probabilistically detrimental, or vice versa, depending on its role in reducing mean error or improving sharpness. Assimilation results in notable probabilistic improvement for nowcasts of low-level jet structures.
Efficient and accurate nowcasting and forecasting of the planetary boundary layer (PBL) are valuable for a broad spectrum of practical forecasting applications. Forecasted precipitation and convective initiation have shown sensitivity to the PBL structure (Crook 1996; McCaul and Cohen 2002; Martin and Xue 2006). Precise PBL analysis of stability and mixing depth is important for air quality analysis and plume dispersion (e.g., Kumar and Russell 1996; Shafran et al. 2000). Wind resource siting and real-time wind power operations can benefit from accurate wind analyses and forecasts at several heights within the PBL (Wagner et al. 2009). Short-range model forecasts of local thermally driven circulations such as land–sea and mountain breezes can be improved if accurate representation of the initial PBL structure is available (e.g., Leidner et al. 2001).
Forecast error growth can be fast for turbulent scales that determine PBL structure, and an estimate of the forecast uncertainty via a probabilistic procedure is important even for very short-term forecasts. Real-time mesoscale ensemble systems capable of providing skillful high-resolution [O(1) km] probabilistic nowcasts and short-range forecasts of the PBL are beyond the capacity of present computational resources. Augmenting current high-resolution deterministic forecasts with probabilistic information is an alternative.
Rostkier-Edelstein and Hacker (2010, hereafter RH10) proposed one approach for retrieving flow-dependent probabilistic nowcasts of the state of the PBL in columns above existing surface-observation sites, at the cost of a few minutes per site on a desktop computer. It is based on a single-column model (SCM) and ensemble filter (EF) assimilation of surface observations, initialized and forced with profiles derived from timely 3D Weather Research and Forecasting Model (WRF) mesoscale predictions. Its deterministic skill (of the ensemble mean) was explored with factor separation analysis (FSA, following Stein and Alpert 1993) to assess the relative importance of surface assimilation and two additional model components in the SCM: externally imposed horizontal advection and parameterized radiation. The results showed that it is possible to nowcast (30-min forecast) skillful PBL profiles of wind, temperature, and water vapor mixing ratio under various flow scenarios with this relatively simple system. Surface assimilation plays the most significant role in improving deterministic nowcast skill up to several hundreds of meters above ground level (showing little sensitivity to weather conditions), particularly during the night when PBL and surface-layer parameterization schemes often fail.
Several questions arise from RH10 and related studies using SCM and EFs to estimate the state of the PBL (e.g., Rémy and Bergot 2010). What is the necessary complexity or sophistication of the model and assimilation scheme? Would the resulting nowcast PBL profiles be as accurate when assimilating the information in the surface observations into background profiles using simpler schemes than an EF? Would a simpler SCM be sufficient? Do flow-dependent covariances derived from an SCM (including externally imposed horizontal advection) ensemble contain the needed 3D flow information?
Although spatially anisotropic covariances may be critical for data assimilation in the PBL, it is not immediately clear that complete time dependence is needed. During synoptically quiescent periods and in regions lacking local thermal circulations (i.e., away from bodies of water and topography), local energy balances largely determine the PBL structure. To first order, local time of day determines those energy balances. One might then guess that climatological covariances conditioned on time of day could provide most of the information needed by an assimilation system. Such an approach could be much simpler, and even less computationally costly, than an SCM/EF with 10 to 100 members. Although we cannot expect success from this approach during synoptically active periods, much of the southern Great Plains (SGP) during the warm season presents favorable conditions.
We propose a simple probabilistic “dressing” method that takes advantage of the first-order forcing in the PBL and conditional-climatological error statistics, and use it as a reference against which the SCM/EF can be compared. In this reference system a deterministic-mesoscale forecast (3D WRF) is adjusted using surface-atmosphere 3D-climatological covariances (calculated within the 3D WRF sample and conditioned on the local time of day) and surface forecast errors where surface observations are available. We adopt 30-min persistence as an estimate of the error. It can be interpreted as an optimal interpolation technique based on climatological covariances and a 30-min persistence error model. The adjusted profile is dressed with the in-sample uncertainty distribution scaled by the most recent observed error to provide a probabilistic nowcast. We will refer to this method as climatological dressing (CD).
Both the SCM/EF and CD approaches rely on timely 3D WRF forecasts, and both approaches provide probabilistic information not available from deterministic 3D WRF forecasts. Because a current deterministic forecast can be viewed as the baseline that should be beaten by a proposed forecast system, a deterministic skill evaluation is needed to establish that both the SCM/EF and CD methods are an improvement. We can then compare the probabilistic skill of the two probabilistic methods.
The SCM/EF can be run with several levels of complexity, and we extend the deterministic FSA used in RH10 to quantify the contribution of the same model components (parameterized radiation, externally imposed horizontal advection, and surface assimilation) to the probabilistic skill of the SCM/EF for cases in which it improves WRF and CD estimations.
a. SCM predictions and SCM/EF flow-dependent covariances
The SCM/EF system used in this study has been described in detail in RH10 and is briefly summarized here. Horizontal momentum, thermodynamic, and moisture equations form the resolved dynamics. Vertical turbulence, atmospheric surface layer, and land surface parameterizations identical to those in version 2.2.1 of the Advanced Research core of the WRF (ARW; Skamarock et al. 2005) force and close the system. We chose the Mellor–Yamada–Janjić (MYJ)1 PBL scheme (Janjić 1996), as implemented in the WRF, the surface layer parameterization follows Monin–Obukhov similarity theory, extended by Beljaars and Holtstag (1991) to the free-convection regime, and land surface modeling is achieved with the Noah land surface model (Chen and Dudhia 2001; Ek et al. 2003). The Crank–Nicholson scheme is used to integrate resolved dynamics in time with a time step of 20 s. A total of 81 vertical levels are defined on a vertically stretched column with model top at approximately 16 km to properly simulate radiative processes.
As fully described in RH10, ensembles of initial conditions, geostrophic forcing, advective tendencies, and surface radiation (if not explicitly computed by the SCM) are imposed by starting with a WRF forecast column closest to the location of the surface observations for a given day and hour, then perturbing it with the scaled difference between that forecast and another randomly chosen archived forecast from the same season valid at the same hour. The WRF forecasts here were initialized directly from the National Center for Environmental Prediction’s (NCEP) Eta Model (i.e., “cold started”), and may be subject to spinup of moist processes during the first few hours. Neither liquid water nor ice is advected into the SCM domain, and clouds can only result from local vertical velocity. Radiative effects of clouds are underrepresented when radiative processes in the column are parameterized. When surface radiation from the WRF is imposed on the SCM, clouds from the 3D WRF simulation directly affect the surface energy balance.
Horizontal advection to allow for the effects of 3D mesoscale dynamics is implemented following Ghan et al. (1999), which describes an approach for upstream advection of temperature (T), water vapor mixing ratio (Qυ), and wind (U and V components) in SCMs. This scheme acts to relax the SCM conditions toward a prescribed 3D state (e.g., WRF forecasts), on the advective time scale. The advection speed is dynamically tuned by augmenting the model state to allow covariances between predicted observations and advection speed to increment the advection speed in the data assimilation. State augmentation in ensemble filters is more typically adopted to estimate physical parameters (cf. Hacker and Snyder 2005; Aksoy et al. 2006; Tong and Xue 2008). Here, it simulates the effect of assimilating data in three spatial dimensions, and diminishes unrealistically rapid growth in ensemble spread due to large variance in the WRF forecasts from which forcing is drawn. The Rapid Radiative Transfer Model (RRTM) longwave (Mlawer et al. 1997) and Dudhia shortwave (Dudhia 1989) radiation schemes are also options. The longwave scheme is included particularly to improve simulations during the night, when radiative cooling can be important in the PBL.
The SCM is coupled to the Data Assimilation Research Testbed (DART; Anderson et al. 2009) system developed at the National Center for Atmospheric Research (NCAR). We use the default ensemble adjustment Kalman filter (EAKF; Anderson 2001), which is a square root filter, and implemented with serial observation processing (Anderson 2003). Extensive documentation on EF may be found in the literature, including the citations above. The EF is used to assimilate anemometer- and shelter-height observations of U, V, Qυ, and T (at 10 and 2 m AGL, respectively) and to estimate the atmospheric (temperature, moisture, and winds) and ground (temperature and moisture) states of the SCM. We focus on 30-min ensemble forecasts (“nowcasts”) from the EF analyses.
b. A column dressing approach
The CD system, based on climatological covariances from the 3D WRF sample, is in essence a dressing technique whereby a deterministic 3D WRF mesoscale forecast is adjusted and dressed with a normal distribution according to a sample of forecasts and surface-forecast errors. We are exploiting WRF-derived covariances, a recent observation, and observation variance to generate a probabilistic PBL prediction.
Motivated by the fact that a forecaster may have access to a recent deterministic mesoscale forecast, the approach is first to correct the error in the profile using covariances computed from the distributions of archived 3D WRF forecasts, and then dress a timely deterministic forecast with an uncertainty distribution scaled by the most recent measured error. Given a surface observation ok at time k, the error in the corresponding forecast yk is dy,k = ok − yk. Observation ok is unavailable to a forecast. Because we are interested in very short-range forecasting (30 min), we assume that persistence is the best prediction for the error over the 30-min prediction time. Thus, dy,k = dy,k−1.
Sample covariances are formed from a conditional climatology of WRF deterministic forecasts. Conditioning is such that a covariance estimate is valid for the same season, and at the same local time of day, as a current forecast. Samples are simply formed from an archive of forecasts spanning 3 May–15 July 2003. For example, a collection of forecasts with lead times of 5.5 and 17.5 h initialized at 0000 UTC may be used to correct and dress WRF forecasts valid at 0030 and 1230 local daylight time (LDT) (0530 and 1730 UTC), respectively.
The predicted error dy,k, climatological forecast variances , and observation error variances can be used to produce a probabilistic short-range forecast from a recent deterministic forecast. A predicted observation increment can be formed from principles of optimal interpolation:
where is the same value as assigned in the SCM/EF system. We then map the predicted error dy,k to an increment Δx for an arbitrary state variable x via a univariate linear regression using the sample covariances [here, cross covariances are disregarded as they are expected to be significantly smaller than direct covariances (Hacker and Snyder 2005)]. Dropping the time index for clarity, the estimated state increment is
where Δx is the adjustment applied to a WRF-column state variable and is the climatological covariance between the state variable and the WRF-diagnosed value in observation space.
To estimate uncertainty, we dress the adjusted profile by forcing the expected error to produce a reliable system in observation space. This can be accomplished by scaling the climatological error variance:
where 〈·〉 is the expectation operator and α is a derived scaling factor. Because the individual error dk is not useful here (i.e., only the climatological mean is used), all quantities in Eq. (3) are valid at the dressing time of day but have no specific relationship to an individual forecast. This scaling by α is universally applied in state space such that the expected error in a forecast state variable is assumed:
where is the WRF climatological variance of a state variable. Given distributions of c, o, and observation error valid for a single forecast lead time (and corresponding time of day), the scaling does not depend on a particular forecast. In the present application, α < 1 [since ; see Eq.(3)] so the scaling acts to shrink the climatological variance.
3. Experiment design
The set of SCM/EF numerical experiments is identical to that evaluated in RH10. The SCM described in section 2a has 16 levels in the first 1 km AGL and the lowest level at about 20 m AGL (Fig. 1). The experiment period is 3 May–15 July 2003, and the SCM is configured to run over the Atmospheric Radiation Measurement (ARM) SGP Central Facility near Lamont, Oklahoma. The 30-min average surface observations of winds (U and V vector components), T, and Qυ are used for assimilation. Observation-error variances are specified in the same manner as in Hacker and Rostkier-Edelstein (2007), which agree roughly with values estimated in Crook (1996): 1 K2, 10−6 kg2 kg−2, and 1 m2 s−2 for temperature, mixing ratio, and wind, respectively. Archived runs of version 2.1 of the WRF-ARW2, with horizontal grid size ΔX = 4 km, initialized at 1900 LDT (0000 UTC), and coinciding with the experiment period are used for ensemble initialization and to provide advective tendencies, large-scale forcing, and surface radiation. The 30-min nowcasts are verified against the 0030 LDT (0530 UTC) and 1230 LDT (1730 UTC) soundings and surface observations.
The ensemble is initialized 3.5 h prior to verification and spun up for 1 h, followed by three surface-observation assimilation cycles at 30-min intervals. WRF forecasts with lead times of 2–5.5 and 14–17.5 h, for verifications at 0030 and 1230 LDT, respectively, are used for initialization and forcing. WRF-forecast mean absolute error (MAE; not shown here) confirms that adjustment (spinup) is completed by 2200 LDT and the nighttime experiments are not subject to those effects. The experiments are also used to analyze the effects of three model components (surface assimilation, parameterized radiation, and horizontal advection) on the probabilistic skill of the SCM/EF system, following the FSA (see section 4b).
The relationship between the spread (standard deviation squared) of the ensemble and the ensemble-mean error is the canonical first-order assessment of filter performance. To avoid the need for normalization, the relevant quantity is the mean-squared error (MSE) of the ensemble mean. Following Desroziers et al. (2005), the spread includes the observation error variances as assigned in the EF. These reflect typical values but have not been tuned (see, e.g., Desroziers et al. 2005). The spread-error ratio should be unity for a system composed of an infinite ensemble, a perfect model, and a perfect ensemble filter with Gaussian error statistics. Sampling error can lead to spread that is somewhat less than error, while forecast bias will inflate the MSE without affecting the ensemble dispersion. Both factors will lead to spread-error ratios smaller than one.
Figure 2 shows spread-error ratios from 30-min surface forecasts of (a) T, (b) Qυ, (c) U, and (d) V. Circles and squares show the results of our nighttime and daytime experiments, respectively. Analysis cycles numbered 1–4 correspond either to 0400–0530 LDT and 1600–1730 LDT for night or day, respectively. Error bars represent 90% confidence intervals (CIs) derived using the BCa bootstrapping technique (described later).
Ratios from our experiments are in general greater than or equal to one, and show reasonable EF tuning. Spread-error ratios increase with assimilation cycling for T, in particular at night. Investigation of error and spread individually (not shown here) reveals that error decreases faster than spread, indicating that observation variance as assigned in the EF might be too large. Conversely, Qυ ratios are improved along assimilation cycling during both day and night simulations. This is a result of spread increase, error decrease, and a suitable choice of . For U and V, the ratios show less variability along the assimilation cycles. The U ratios show the best ensemble performance, while V and T show a slight overdispersion at night.
CD probabilistic nowcasts are computed for the same experiment period and location. WRF forecast climatological surface-atmosphere covariances are computed from the sample of all available forecasts during the experiment period that are valid at the verification time. Because WRF forecasts were initialized at 1900 LDT, the collection of forecasts with lead times of 5.5 and 17.5 h (0030 and 1230 LDT, respectively) are corrected and dressed with uncertainty. The WRF vertical grid contains six levels in the first 1 km AGL, with the first level at about 60 m AGL (Fig. 1). The results of the CD method should be unrealistically good because the forecast and dressing samples are the same, providing a difficult benchmark for the ensemble system to beat.
4. Analysis methods
a. Verification metrics
We adopt the MAE (Wilks 1995) for the deterministic comparison among the SCM/EF, CD, and 3D WRF profiles. We choose the MAE rather than the root-mean-square error (RMSE) because the former is more resistant to outliers. We also calculate the MSE decomposition into error variance and mean-error squared (squared bias) to assess the effect of bias on the assimilation scheme.
A probabilistic evaluation measures three attributes of the SCM/EF and CD probabilistic forecasts: reliability, resolution, and discrimination. The Brier skill score (BSS; Wilks 1995) and the area under the relative operating characteristic (ROC) curve (AUR; Mason and Graham 1999) complement each other and meet our needs. Both of these scores evaluate the system performance relative to a reference system. The BSS is easily decomposed into both a reliability and a resolution term (Murphy 1973) to understand the trade-offs between different components of probabilistic skill. The ROC curve has been widely used in the field of signal detection to distinguish between two alternative results (Mason 1982); thus, AUR quantifies the ability of the forecast to discriminate between events. We define an event, unique to each variable and verification time, to be a forecast value exceeding the range of the 75th percentile of the observations at the ARM central facility during the experiment period. The in-sample climatology derived from the observations during the experiment period serves as the reference forecast.
The number of realizations is finite: 65 and 57 verification times against soundings (at 0030 and 1230 LDT, respectively) and 71 verification times against surface observations; an estimate of the uncertainty in the verification scores is required to allow meaningful statistical conclusions. We quantify the scores’ uncertainty through the estimation of CIs following a BCa bootstrap technique (Efron and Tibshirani 1993), as described in RH10. CI estimates at 90% for all verification metrics are given from BCa bootstrapping a sample of 1000, and will be presented as either error bars or thin lines bracketing the scores. Since the BCa CIs do not assume a normal distribution of errors around the score, they are generally asymmetric.
Recall that all experiments are run over the same season and verified against the same observations. The different magnitudes of the CIs among the different methods and configurations at a given verification time reflect the sensitivity of the skill to day-to-day flow variability. We seek a consistently skillful system for the range of flow scenarios present during the experiment, which narrow CIs will indicate. Wide CIs indicate that skill is more sensitive to flow variability.
FSA has been applied in several sensitivity studies (Alpert and Sholokhman 2011). RH10 applied it to the evaluation of the SCM/EF ensemble-mean skill and directly quantified the contributions of three model components (surface assimilation, radiation, and advection) to skill. Here, we apply the FSA in a similar manner to assess the relative contribution of those model components (and of their interactions) to the probabilistic skill of the SCM/EF. This implementation of FSA requires skill evaluation for all 23 = 8 model configurations, and Table 1 lists all possible combinations.
Factor separation (FS) equations are derived by a Taylor expansion of the effect of each component. Stein and Alpert (1993) derived the resulting equations for the factors fijk for three model components [Eqs. (17)–(24) in their manuscript, Eqs. (5)–(12) in RH10]:
RH10 discussed the meaning of Eqs. (5)–(12) within the framework of verification, which can be summarized as follows: each eijk represents system skill as measured by a verification metric (defined in section 4a), ijk denotes a binary switch corresponding to each model component, and fijk denotes the separated factor associated with each individual model component or each combination.
Similarly to RH10, we define fall as the sum of all single, dual and triple factors defined above, or, equivalently, as the difference in the studied metric between the base system and the system when all three of the studied components are included:
Equation (13) is a metric comparison more typical when evaluating one or more model improvements.
5. Verification results and interpretation
a. SCM/EF mean, CD, and 3D WRF profiles
In this section we first note the general observations from the analysis and, then, discuss each result in detail. The MAE differences between (a) the ensemble-mean SCM/EF and the 3D WRF profiles and (b) the CD nowcasts and the 3D WRF profiles indicate improvement under various flow scenarios. The SCM/EF appears to provide additional skill over a wider range of situations than CD, especially in the first ~200 m AGL under statically stable nighttime conditions when CD fails. Superiority of the SCM/EF in some cases results from the ability to handle seasonal flow variability not accounted for in static CD covariances, but in other cases it is the result of the failure of either the 30-min surface error persistence assumption in the CD or the WRF climatological covariances to represent error covariances. During the day, characterized by PBL convection, less seasonal variability than at night, slower time variability at the surface, and persistent diurnal structures, CD can outperform the SCM/EF. The SCM T and Qυ profiles show bias during the day. For some variables, at night, the SCM/EF shows poorer skill than the 3D WRF above approximately 200 m AGL. Surface-atmosphere covariances decrease more steeply with height than during the daytime (Hacker and Snyder 2005), leading to spurious covariances at those levels (Hacker et al. 2007).
The 3D WRF forecasts used to drive both the SCM/EF and CD represent the deterministic baseline, and we first analyze the MAE for the 3D WRF profiles at 1230 and 0030 LDT (lead times of 17.5 and 5.5 h, respectively; see Fig. 3). In most cases errors are greater at 0030 than at 1230 LDT. This is expected, as PBL and surface-layer parameterization can fail under nighttime statically stable regimes (see, e.g., RH10, Mauritsen et al. 2007) when Monin–Obukhov theory is invalid. In particular, T profiles show larger MAEs at 0030 LDT than at 1230 LDT at levels very close to the surface. The U MAE at 0030 LDT is greater than at 1230 LDT at most vertical levels, while V at 0030 LDT shows greater MAEs than at 1230 LDT at levels where low-level jets (LLJs) develop.3 This behavior is consistent with the findings of Storm et al. (2009), who showed the limitations of WRF forecasts (at the same horizontal resolution) in reproducing LLJ speeds at some locations over the SGP.
Decomposing MSE into error variance and squared bias (Fig. 4) shows that bias has a nonnegligible contribution to the Qυ error at 1230 LDT (Fig. 4a). At 0030 LDT, bias dominates T error in the first tens of meters AGL (Fig. 4b). In all other instances error variance dominates.
We next analyze MAE differences between the SCM/EF and CD profiles with respect to 3D WRF forecasts. In most cases we limit the discussion to MAE differences for which the SCM/EF and CD show an advantage, exceeding the 90% CIs, over the 3D WRF results between the surface and approximately 1 km AGL.
MAE differences between the SCM/EF in its most complex configuration (cf. section 4b), and 3D WRF profiles, at 1230 and 0030 LDT, show that the SCM/EF skill can exceed 3D WRF skill under various flow scenarios (Fig. 5). Negative values in each panel indicate where the SCM/EF is an improvement. At various levels the skill of the SCM/EF appears to be worse than that of the 3D WRF forecasts, or shows no advantage. The vertical extent of the MAE reduction by the SCM/EF is greater at 1230 LDT than at 0030 LDT, in particular for U and V (Figs. 5e,g). For T and V at 0030 LDT (Figs. 5b,h), the SCM/EF shows more skill than that of the 3D WRF results below 200 m AGL, but is poorer for levels above ~200 m AGL. The poor SCM/EF behavior aloft may result from small spurious covariances that could be corrected through better ensemble sampling or better-tuned vertical localization (e.g., Hacker et al. 2007). Since surface assimilation is not implemented in the 3D WRF forecasts, MAE differences in Fig. 5 suggest under which flow conditions and for which vertical levels surface assimilation is most useful in reducing MAE.
MSE differences (SCM/EF minus WRF) and their decomposition into variance and squared-bias differences reveal that the SCM/EF reduces error variance and that in most cases the bias differences are negligible. Notable exceptions are the larger T and Qυ biases at 1230 LDT in the SCM/EF, as shown in Fig. 6. Bias is the main contributor to SCM/EF-Qυ error at 1230 LDT. Similarly, RH10 showed that assimilation worsened the wet bias at 1230 LDT, consistent with the negative MAE differences (SCM/EF minus 3D WRF) at various levels in Fig. 5c. This illustrates the need for explicit bias removal when assimilating observations into biased background estimates (e.g., Dee 2005; Ancell et al. 2011).
Figure 7 shows MAE differences between the CD and the 3D WRF profiles at 1230 LDT. CD reduces the MAE in the T and Qυ profiles at 1230 LDT, up to approximately 1 km AGL, more effectively than the SCM/EF (Figs. 7a,b and 5a,c, respectively). Decomposition of the MSE differences (CD minus WRF) reveals that unlike the SCM/EF, the CD forecasts have no significant contribution from bias (not shown here). The improvement from CD on U and V profiles at 1230 LDT is limited to the first tens of meters AGL (Figs. 7c,d).
To better understand why CD skill varies among different variables, we analyze the kurtosis of the profiles (Fig. 8). Kurtosis (the fourth moment about the mean) is a measure of the sharpness of the distribution and is related to the variability in a sample distribution (e.g., Jolliffe and Stephenson 2003). It is a reduced variable and can be used to compare between different observation types. Greater kurtosis indicates that the tails of the probability density decrease slowly and that rare events with respect to the standard deviation are more frequent. The seasonal variability of U and V along the profile is similar to that of T and Qυ, as deduced from the observation sample kurtosis (Fig. 8). We would expect static mean-climatological covariances to be as effective for U and V as for T and Qυ and produce similar levels of CD skill.
Further analysis shows that the poor CD forecasts of U and V at 1230 LDT can be attributed to poorly estimated error covariances. Comparison with true error covariances (computed from 3D WRF and observations) indicates the problems. Figure 9 shows the CD adjustment factor in Eq. (2) calculated using either WRF covariances (as in our calculations) or error covariances, for each of the four analyzed variables at 1230 and 0030 LDT. At 1230 LDT, Qυ adjustment factors based on model covariances and error covariances show good agreement up to about 800 m AGL. For T, the factors based on model covariances are close to and follow the general vertical shape of their error-covariance-based counterparts. Conversely, U and V factors calculated using model covariances are significantly different from those based on true error covariances, both in their shapes and values.
CD forecasts show no improvement over the 3D WRF method at 0030 LDT, and several factors may contribute. First, the suitability of the 30-min persistent-error forecast assumption can be questioned. Table 2 summarizes mean-squared surface errors at 0030 and 1230 LDT. The 30-min persistence assumption appears suitable at 1230 LDT, but inaccurate at 0030 LDT. Second, climatological model-derived covariances at 0030 LDT can be biased. Figures 9b,d,f,h show that these results strongly differ from the actual error covariances. We also analyze the variability of the observed flow at 0030 LDT: for U and V the observation distributions (Fig. 8) show larger variability during the night than during the day at several vertical levels, and poor CD performance is likely due in this case to the use of static covariances. Finally, we recall that WRF covariances, as opposed to the SCM/EF, are defined on a relatively coarse vertical grid (Fig. 1) that is unable to resolve shallow structures close to the surface during the night.
b. Probabilistic evaluation of SCM/EF and CD
We are interested in the probabilistic information provided by the SCM/EF and CD, which is not available in 3D WRF deterministic forecasts. Discussion again focuses on forecasts when the mean error from CD and/or SCM/EF is lower than that from 3D WRF, so that the probabilistic information augments a superior deterministic forecast (see Figs. 5 and 7). Verification shows that simple methods using conditional climatological covariances can be effective in estimating forecast uncertainty during the daytime when the seasonal variability is narrow, and the 30-min error persistence model is valid. But the SCM/EF is more probabilistically skillful when flow dependence is needed or the persistent error model breaks down.
The SCM/EF simulations provide useful probabilistic information with respect to observed climatology both at 1230 and 0030 LDT. Each of the scores (BSS, reliability, resolution, and AUR) shows a similar range of values for all variables along the profile, except for Qυ at 1230 LDT. The reliability score indicates bias for Qυ at 1230 LDT, consistent with the deterministic verification results in section 5a and RH10. Further, the probabilistic comparison between the SCM/EF and CD at 1230 LDT shows that CD outperforms SCM/EF scores of T and Qυ profiles, with better reliability, resolution, or/and AUR at several levels. These results are consistent with the deterministic comparison in section 5a.
Since the probabilistic verification along the profile involves multiple levels, we will present detailed profiles of the scores only when vertical variability is notable. Otherwise, we summarize the results in terms of ranges of values along the profile. Figure 10 presents ranges of values of BSS, reliability, resolution, and AUR scores along the profile (up to 1 km AGL) for 30-min nowcasts at 1230 LDT with the SCM/EF in its most complex configuration. Boldface lines denote the ranges of the scores derived from our experiments, and thin lines reflect the range of 90% CIs. The BSS is positively oriented so that higher values indicate more skill; a perfect forecasting system gives BSS = 1, and BSS ≤ 0 indicates no skill or less skill than a climatological forecast. The AUR serves as a metric to assess the discrimination of the forecast; AUR = 1 is perfect discrimination, while 0.5 indicates that the forecast has no discrimination.
Scores are a function of both the forecast system and the reference (here observation climatology) used in the calculation of the verification metric. A narrow observation climatology will, in practice, represent a useful forecast, and the added value of a flow-dependent probabilistic forecast can be small, as very little further resolution or discrimination enhancement is anticipated. Conversely, we can expect a more significant improvement in resolution/discrimination when the observation climatology is widely distributed.
During a majority of flow scenarios the SCM/EF provides skillful 30-min probabilistic forecasts within the 90% CIs, but there are cases that cannot be unambiguously identified as skillful because CIs cross the no-skill level of BSS = 0. In most instances, the values obtained from our experiments show little dependence on height, but the CIs may significantly vary along the profile, as results for Qυ at 1230 LDT and V at 0030 LDT (Figs. 11a,b). The Qυ at 1230 LDT is the only instance for which our experiment results lack skill.
BSS decomposition into reliability and resolution allows further interpretation; BSS = (resolution − reliability)/uncertainty. BSS uncertainty is a function of the distribution of observations, and varies among different variables. Resolution less than or equal to the reliability indicates a prediction with no skill, and in a perfect system the resolution is equal to the sum of the reliability and uncertainty. In general, the BSS is improved by maximizing the resolution (upper bound reliability plus uncertainty) and minimizing the reliability term (lower bound 0). Note again that resolution and reliability are independent because they result from different distributions.
Figure 10 shows that reliability and resolution are within the same range of values at 1230 and 0030 LDT. The Qυ at 1230 LDT is the most obvious exception (Fig. 10a). Decomposition into reliability and resolution terms (Figs. 10c and 10e, respectively) and inspection of the scores along the profile (Figs. 11a,c,e) show that lack of skill in Qυ profiles at 1230 LDT mainly results from poor reliability (higher values of the reliability term) at levels between 300 and 500 m and in the first tens of meters AGL. Reduced resolution (smaller resolution term) negatively contributes to the BSS of Qυ in the lowest hundred meters AGL (Fig. 11c) to some extent too. The largest range of CIs is observed for V at 0030 LDT (Figs. 10b,d,f,h), in particular for resolution. Detailed inspection of the scores along the profile (Figs. 11b,d,f,h) shows that resolution determines the vertical shape of the BSS profile, consistent with the observed kurtosis (Fig. 8). Similar vertical variability is observed for AUR too (Fig. 11h). Greater variability is observed at levels where the LLJ develops. BSS, resolution, and kurtosis are also correlated at night along the profile for U and T, but to a lesser extent (not shown here).
We next compare the SCM/EF and CD probabilistic skill at 1230 LDT (CD is more skillful than WRF profiles only at 1230 LDT), and discuss only T and Qυ profiles, as the improvement of CD over WRF for U and V at larger than 90% CIs is limited to the first ~100 m AGL only. Figure 12 shows the difference between the probabilistic scores resulting from the SCM/EF and the CD for T and Qυ profiles at 1230 LDT. A positive difference in BSS indicates more skillful SCM/EF compared to CD.
Figures 12a,b show the advantage of CD over the SCM/EF in the BSS (negative difference) of T and Qυ nowcasts at 1230 LDT. Figures 12c–f reveal that this is a result of better reliability (positive difference) and resolution (negative difference) of the CD. The advantage of CD in reliability is consistent with the deterministic results showing bias in the SCM/EF forecasts (section 5a). Resolution gained from CD (negative difference) is consistent with the narrow seasonal distribution (Fig. 8). CD shows better discrimination than SCM/EF (negative difference; see Fig. 8), particularly near the surface.
The poorer resolution and discrimination of the SCM/EF result from external forcing on the SCM. Ensemble perturbations used for SCM/EF initialization, forcing, and advection tendencies may lead to too much spread in the ensemble. Resolution is closely related to sharpness. A sharper and likely better resolved ensemble may in certain conditions result from analog methods to choose sample profiles (e.g., Delle Monache et al. 2011) used for ensemble perturbations. Such an analog technique is expected to be more effective at improving resolution when the seasonal distributions from which the ensemble perturbations are sampled show larger variance and kurtosis.
c. Probabilistic evaluation of the SCM/EF system within the framework of factor separation
Probabilistic FSA of SCM/EF nowcasts elucidates some properties not evident from the deterministic FSA in RH10. The basic model configuration in this analysis is the SCM/EF forced by WRF geostrophic winds; additional factors like advection, parameterized radiation, and assimilation are evaluated for their impact on the probabilistic skill. The effect of a given factor on ensemble reliability is in certain instances similar to its effect on ensemble-mean skill, especially for biased forecasts. The FSA shows that the effect of a given factor on ensemble resolution/discrimination can be opposite to its impact on ensemble-mean skill or on ensemble reliability. Metrics quantifying ensemble resolution and discrimination depend on the reference forecast, and improvements are most achievable when climatology (our reference) shows larger variance and kurtosis.
1) FS evaluation in observation space
Figures 13 and 14 show the FS of the BSS, its decomposition into resolution and reliability, and the AUR for 30-min surface forecasts at 0030 LDT of T and Qυ, and U and V, respectively. The variable f000 (filled square) corresponds to the scores of the basic model configuration. When advection, parameterized radiation, and assimilation are included, the resulting scores are represented by the open squares (fall, sum of contributions from all factors to the base system). Other symbols represent the effects of specific (single or synergistic) factors on skill as defined in Eqs. (6)–(12).
The effects of all factors on BSS (Figs. 13a,b and 14a,b) are qualitatively similar to those for MAE (see RH10, but note that the results are reversed in sign because of the score orientation). BSSs show that in most cases the basic configuration (f000 = e000, filled square) has probabilistic skill relative to the climatology of the observations. All three components (fall, open square) improve skill, reaching in some cases the maximum attainable value of BSS = 1 (f000 + fall). Further inspection reveals that surface assimilation (f100, triangle) has the largest effect on probabilistic skill improvement, and in some cases (e.g., Qυ) it leads to BSS ≈ 1 (f000 + f100). Some additional factors show weak effects depending on the variable. Advection slightly improves BSS, with the exception of a minor negative effect on Qυ. Advection–assimilation synergism can also be detrimental, most notably for Qυ.
Figures 13c–f and 14c–f complement the information and show how Brier reliability and resolution terms determine the BSS. BSS improvement may be due to an improvement in reliability, resolution, or both. Assimilation always shows a positive effect (in some cases not exceeding 90% CIs) toward improving both the resolution and reliability, and CIs are generally narrow. Advection improves reliability (in some cases not exceeding 90% CIs) except for a slight negative effect on V-profile reliability. Advection also improves the V-profile resolution exceeding 90% CIs.
Positive effects on reliability often result from reducing bias, which assimilation and advection both do. Assimilation contracts the spread of the system as it increments the ensemble members to the observations to improve the resolution. Advection improves V-profile resolution by contracting the excessive ensemble spread (not shown) that results from climatological geostrophic forcing.
Nonlinear interactions vary, and all are characterized by wide CIs. Their effects are generally negative when the single factors lead to skill improvement, and positive when single factors degrade skill. Assimilation–advection (f101, filled circle) and also the triple advection–radiation–assimilation synergism (f111, open circle) are the only factors leading to effects exceeding 90% CIs.
AUR plots (Figs. 13g,h and 14g,h) show that the base configuration is the main contributor to discrimination, and that the simpler base model configuration can gain most of the skill realized among any experiment here. Addition of assimilation, radiation, or advection does improve the discrimination in some cases. AUR factors are qualitatively similar to resolution factors, but the relative contribution from the base configuration to maximum skill is greater for discrimination. Resolution measures the ability to forecast a particular categorical event. Discrimination measures an ability to forecast above or below a threshold, and is less sensitive than resolution to error magnitude. The AUR factors compared to resolution factors show that greater complexity (e.g., including DA) yields a greater improvement to the resolution of the base system than the same complexity does to the discrimination of the base system.
2) FSA of nowcast profiles: Brier skill scores and their decomposition
Probabilistic skill in the profile realized by the basic system SCM/EF at 1230 LDT is difficult to beat with additional factors (plots not shown here). This differs from the deterministic FS results reported in RH10 and the probabilistic skill in observation space reported in the last section, which both show clear improvement resulting from assimilation and advection. Here we limit the discussion to exceptions found for 0030 LDT nowcasts, and to effects that are different from zero at greater than 90% confidence. At times, effects on reliability and resolution may be smaller than 90% confidence, but the resulting effects on BSS are greater.
(i) Surface assimilation
Surface assimilation is the most important factor in improving probabilistic skill for all variables at 0030 LDT, especially at levels where the LLJ develops (Fig. 15). Its effect is confined to a few hundred meters AGL because surface-atmosphere covariance strength rapidly decreases with height under statically stable nighttime conditions. It is the only factor that improves BSS for all variables (though not at all vertical levels). The decomposition of BSS into reliability and resolution (Fig. 16) shows that the contribution of these terms to the BSS depends on the variable.
Surface assimilation improves the BSS of T nowcasts mainly by improving reliability (reducing bias; see Fig. 16a) and to some extent by increasing resolution (Fig. 16b) at low levels (~100 m AGL; not shown here) where observation variance is wider (Fig. 8d). The Qυ profiles benefit from improved reliability and resolution due to surface assimilation (Figs. 16a,b), but the BSS improvement is not consistently greater than 90% confidence.
The resolution and reliability of U profiles are improved by surface assimilation at several levels (Figs. 16a,b). Improvement is most notable where the LLJ develops (not shown here), at levels characterized by wider observation variance (Fig. 8k). Consistently, wider CIs for BSS U at ~100 m AGL (Fig. 15c) result from the greater variability associated with LLJs.
The effect of surface assimilation on V shows a spike-shaped BSS improvement at ~100 m AGL (Fig. 15d), resulting from improvement in both reliability and resolution (Fig. 16) at levels characterized by greater observed variance (Fig. 8o). Similar to the effect on U profiles, wide CIs for V show the variability attributable to LLJs. These results illustrate again that surface assimilation reduces bias and improves the resolution of LLJ nowcasts.
(ii) Externally imposed horizontal advection
Advection produces the greatest effect besides surface assimilation on the BSS profile of only two of the analyzed variables: T and U (Figs. 17a,b). Its effect is in most cases beneficial, but several smaller detrimental effects appear. Cases of no net BSS improvement result from opposite effects on resolution and reliability.
Resolution improvement in T and U (Figs. 17c,d) is apparent at various levels. We find that U, which is characterized by greater kurtosis (Fig. 8l) improves most. Wide CIs for U profiles show the scenario dependency of this factor, coinciding with the larger seasonal variability of the observed U profiles (Fig. 8l). Reliability in T and V improves at some levels too (not shown here).
(iii) Parameterized radiation
Parameterized radiation is the weakest individual factor. Its effect differs from zero at greater than 90% CIs for T and V BSSs and/or its decomposition (Fig. 18). CIs are significantly narrower than those for assimilation and advection, indicating little sensitivity to the weather scenario. This result is expected because cloud effects are not considered in our simulations. We note that the FS of ensemble-mean MAE in RH10 showed the effect of radiation only on T profiles, illustrating that factors may contribute differently depending on whether their impact is analyzed deterministically or probabilistically. Figure 18a shows that the vertical extent of its benefit for T-profile BSS extends up to ~300 m AGL, resulting from the shallow effect of radiative-flux divergence. Radiation benefits V resolution and harms V reliability at ~150 m AGL (Figs. 18b,c), without a net effect on BSS. Radiative cooling included here can affect the land–atmosphere decoupling that helps develop the LLJ. This level is characterized by increased variance in the profile (Fig. 8o), and additional factors may improve resolution not saturated in the base configuration. The cause of the negative effect on reliability is still unclear.
(iv) Dual and triple factors
The effect of dual factors is shown in Figs. 19–21. In most cases, dual factors show that two factors do not linearly combine and result in no further skill improvements, but some positive synergisms are apparent.
Assimilation–advection degrades the BSSs of T, U, and V (Figs. 19a,e,g), canceling the improvement gained from the individual factors, except for the BSS improvement of V resulting from resolution improvement where the LLJ develops (Fig. 19h). Assimilation and advection individually lead to small improvements at these levels. When both act together, advection is tuned through state augmentation to better fit the SCM, ensemble spread is contracted (see Fig. 1 in RH10), and positive synergism results. CIs are significantly larger for V at levels where the LLJ develops.
Dual factors that include parameterized radiation (assimilation–radiation and advection–radiation; see Figs. 20 and 21) affect BSS and its decomposition for T and V, consistent with the effects of radiation individually. The magnitude is generally smaller than from other dual factors. Similar to advection–assimilation, both dual factors including radiation harm BSSs of T profiles by canceling their individual benefits (Figs. 20a and 21a). Assimilation–radiation synergism improves the V-profile BSS (Fig. 20b), though not always at greater than 90% confidence. It shows a positive spike at levels where the LLJ develops (~150 m AGL).
Advection–radiation synergism is beneficial to the reliability and detrimental to the resolution of V profiles at LLJ levels (Figs. 21b,c). We recall that parameterized radiation alone harms reliability, and the dual synergism with advection cancels it.
The effects of synergistic factors that include radiation are characterized by small CIs, consistent with those associated with radiation alone. No net effect of the triple synergistic factor results.
3) FSA of nowcast profiles: AUR
As in observation space, the base system is the main contributor to AUR in the profile (not shown here), especially for T. Some improvements from additional factors are found for Qυ, U, and V, as summarized in Fig. 22.
The AURs for Qυ and U profiles improve from assimilation and advection individually. Two dual factors show nonzero effects on AUR at greater than 90% CIs: assimilation–advection (on U and V profiles) and assimilation–radiation (on V profiles). Some of the effects on AUR coincide with effects on BSS and/or resolution. But a factor showing no effect on BSS and on its decomposition can affect AUR (e.g., the harm that advection does to Qυ profiles). Finally, U-profile resolution and AUR are opposite in assimilation–advection synergism. The narrowest CIs result from assimilation–radiation. Conversely, assimilation–advection gives the widest CIs for V, showing again large observed V variability.
6. Summary and conclusions
The present study further investigates the beneficial complexity of the model and assimilation scheme used for probabilistic nowcasts of PBL profiles of wind, temperature, and moisture in columns above surface-observation sites. Our method is based on an SCM and assimilation of surface observations with an EF (SCM/EF). It is initialized and forced with 3D WRF forecasts and uses flow-dependent covariances derived from the ensemble of SCM nowcasts. We compare it to a simpler postprocessing scheme that relies on statistics from a sample climatology (CD). The CD adjusts a deterministic-mesoscale forecast using surface-atmosphere 3D-climatological covariances, a 30-min persistence model, and surface forecast errors. It dresses the adjusted profile with the in-sample uncertainty distribution scaled by the 30-min surface error. Mean estimates derived from both methods are compared to the 3D WRF forecasts (over the SGP) that represent the baseline to be improved. Both methods are evaluated probabilistically.
Deterministic evaluation (MAE of mean profiles) shows that both methods improve 3D WRF forecasts under several flow conditions. Compared to 3D forecasts from a few hours earlier, assimilating surface observations improves predictions in the lowest few hundred meters AGL where surface-atmospheric covariances are significant. The SCM/EF improves forecasts under a wider variety of flow scenarios than does the CD. But CD leads to more skillful estimates of T and Qυ profiles during daytime convective regimes when the parameterized PBL within the SCM is biased. The 30-min error persistence assumed in CD implicitly removes much of the bias in observation space, and errors in climatological covariances are not as great when the PBL lacks variability in the middle of the day. The advantage of the SCM/EF is greatest during statically stable nighttime conditions in the first hundreds of meters above the surface. The CD is relatively unskillful during night because flow-dependent surface-atmosphere covariances produced by the SCM/EF better account for seasonal flow variability, climatological 3D WRF covariances are inaccurate estimates of error covariances, and the 30-min surface-error persistence assumption loses validity at night.
Both the CD and SCM/EF provide an estimate of the nowcast uncertainty not available in the 3D WRF deterministic forecasts. Probabilistic skill (relative to the observed climatology) measured by the BSS, its decomposition into resolution and reliability terms, and by the AUR reveals that the SCM/EF is probabilistically skillful during both day and night. An exception is Qυ profiles during the day, when reliability is poor because of bias. The CD is probabilistically more skillful than SCM/EF estimates of T and Qυ during daytime convective PBL regimes. The CD advantage results from smaller bias, greater reliability, and/or better resolution/discrimination. Excessive spread in the SCM/EF ensemble from using a climatological distribution of geostrophic winds to force the SCM appears to be the primary factor reducing resolution. But the SCM/EF is probabilistically more skillful than the CD in all other instances, especially during statically stable conditions. Overall, this pattern of behavior is consistent with deterministic skill.
A factor-separation analysis (FSA) applied to the nighttime SCM/EF forecasts shows the most important factors for model skill. This is the first work to use the FS method within the framework of probabilistic verification. The FSA shows that a given factor can either benefit or degrade a forecast, depending on whether its impact is assessed deterministically or probabilistically. Opposing effects on reliability resolution/discrimination arise occasionally, illustrating the independence of these attributes. The dependence of the probabilistic verification scores on the variability of the reference forecast system (here the observed climatology) is clear. For instance, in many cases the vertical shape of the BSS follows the shape of the resolution term, which in turn is determined by the variability of the observed climatology.
Surface-observation assimilation by the SCM/EF shows a consistent positive effect on probabilistic skill measured by the BSS, its decomposition into reliability and resolution terms, and by the AUR for all the analyzed variables. Horizontal advection contracts the excessive spread resulting from climatological geostrophic relaxation. Parameterized radiation, which is the most computationally expensive among the three SCM/EF analyzed components, exhibits a modest effect and acts on T and V only. The effect of parameterized radiation on V profiles was not seen in the deterministic verification in earlier work (RH10), but here it improves resolution. It appears to improve decoupling of the PBL to allow for the variability introduced by low-level jet (LLJ) formation. Dual interactions tend to cancel the improvement resulting from single factors. Exceptions are assimilation–radiation synergism on resolution, and discrimination of V profiles.
The use of CIs lends confidence to the quantitative differences among the methods and to the effects of SCM/EF components. They also provide a measure of the sensitivity to the flow scenario. For instance, wider CIs at vertical levels where LLJs develop are characterized by wider seasonal variability.
Results from this study suggest paths for further improvements to the SCM/EF. Resolution in the SCM/EF might be improved by using analog techniques to build samples for forcing. The CD method is less sensitive to bias, suggesting that some of its features could be used in a simple bias correction scheme for the SCM/EF.
The authors are grateful to the developers of the NCAR Data Assimilation Research Testbed (DART) for providing a useful platform for experimentation. We acknowledge M. Pocernic and E. Gilleland for guidance on the R packages (“verification” and “bootstrap”; http://www.r-project.org) that were used for our calculations. We acknowledge T. Hopson and Z. Klausner for insightful discussions on the bootstrap technique. Yehuda Alexander is thanked for fruitful discussion. Data were obtained from the Atmospheric Radiation Measurement (ARM) program sponsored by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Environmental Sciences Division.
As detailed in RH10, comparative experiments were run using the Yonsei University (YSU) scheme (Hong and Pan 1996; Noh et al. 2003). The use of the YSU scheme does not lead to poorer performance and it can even slightly improve the results. We choose to present the analysis of the experiments using the MYJ scheme as it offers additional turbulent kinetic energy information, which is of value for further analysis of our model and for its potential use in air quality and plume dispersion studies.
The LLJ over the SGP is dominated by the V-wind component; thus, a significant effect on V leads to a significant effect on wind speed.