An ensemble prediction system featuring subkilometer horizontal grid spacing and high vertical resolution is used to quantify forecast uncertainty in the stable boundary layer (SBL). Diversity in initial conditions and/or planetary boundary layer/surface layer physics within the WRF Model provides ensembles with up to 12 members. WRF explicit ensemble data drive trajectory calculations and the Second-Order Closure Integrated Puff (SCIPUFF) model for hazard prediction. Explicit ensemble SCIPUFF forecasts are compared to single-member SCIPUFF forecasts leveraging WRF ensemble wind field uncertainty statistics. The performance of 1.3- and 0.4-km horizontal-gridlength ensemble configurations is evaluated for two case studies of differing flow regimes with respect to the Nittany Valley in central Pennsylvania where uncertainty in atmospheric transport and dispersion (ATD) is dependent on drainage flows and circulations related to trapped lee-wave activity.
Results demonstrate that a 12-member ensemble provides reasonable spread in ATD forecasts. Additionally, single-member SCIPUFF surface-dosage probability forecasts using the meteorological ensemble statistics generally reflect the pattern while encompassing the hazard area given by the explicit SCIPUFF ensemble, but at a reduced computational cost. Low-level wind and temperature forecasts given by the 12-member, 0.4-km ensemble are improved significantly over corresponding 1.3-km ensemble forecasts. In general, the 12-member, subkilometer-gridlength ensemble configuration reliably captures temperature and wind fluctuations related to drainage flows and trapped lee-wave activity that directly impact ATD. Localized data assimilation positively impacts overall probabilistic forecast skill when trapped lee waves are present, and drainage flow case results appear more dependent on model physics than initialization strategy.
Turbulent mixing has a direct and complex impact on stable boundary layer (SBL) evolution and the atmospheric transport and dispersion (ATD) of contaminants released within the SBL. Because of stable stratification, contaminants and pollutants released within the SBL generally disperse horizontally as meandering plumes (Etling 1990; Mahrt 2007). However, during intermittent turbulent mixing events, dispersion may be enhanced in the vertical, and the SBL may temporarily couple with the residual layer aloft where neutral stratification allows pollutants to disperse equally in the horizontal and vertical (Salmond and McKendry 2005). ATD may be further complicated by flow interaction with complex terrain (Nappo 1991; Hoover et al. 2015) due to shear generation of turbulence related to meso-γ-scale (2–20 km) and submesoscale (submeso) motions. Submeso motions are nonturbulent motions generated by nonstationary shear events with time scales of minutes to tens of minutes and spatial scales from 0.02 to 2 km (e.g., Seaman et al. 2012; Mahrt 2014; Hoover et al. 2015). Common submeso motions within the SBL over complex terrain include wavelike motions and intermittent drainage flows among other more complex structures (Mahrt 2014).
Depending on wind speed and stability, mountain waves may be excited by flow over topography. Boundary layer separation processes related to mountain-wave activity can lead to the development of rotors (e.g., Scorer 1949; Kuettner 1959; Doyle and Durran 2002). Rotors are characterized by moderate-to-severe turbulence related to pressure-gradient-induced circulations that develop beneath wave crests when mountain waves become trapped close to the ground in a ducting region or by severe-to-extreme turbulence and downslope windstorms associated with nonlinear processes such as hydraulic jumps downwind of a mountain crest (Doyle and Durran 2002; Hertenstein and Kuettner 2005; Colman et al. 2013). Regardless of the mechanism, rotors serve as a source of turbulent mixing and play a complex role in ATD (Durran 1990).
Using NWP to forecast the impact of a chemical or biological (chem-bio) release within the SBL is an inherently uncertain task. The initial state of the atmosphere, temporal and spatial model numerics, and physical parameterizations necessary for the prediction of subgrid-scale processes are all sources of NWP uncertainty (e.g., Stauffer 2013). Ensembles are often used to quantify forecast uncertainty as no single deterministic NWP forecast can cover the full range of atmospheric motions (Murphy 1988; Leutbecher and Palmer 2008). With the goal of sampling plausible future atmospheric states, ensemble members can be created by perturbing various aspects of an NWP model (Leith 1974), and the aggregation of each ensemble-member forecast may be used to compute the probability of a particular event. Regardless of the nature of a chem-bio release, implications for human life in the path of a plume may be dire, and it is essential to quantify the uncertainty surrounding ATD forecasts in order to aid emergency response efforts.
No one universal method of creating sufficient spread among ensemble members exists. Methods such as breeding vectors (Toth and Kalnay 1997) or the ensemble transform and rescaling method (ETR; Ma et al. 2014) produce ensemble forecasts based on plausible initial atmospheric states with a focus usually on synoptic-scale features. Conversely, mesoscale forecasts of shorter length (6–48 h) may not allow sufficient time for initial-condition perturbations to grow leading to insufficient ensemble spread. Multimodel and/or multiphysics ensembles have been used to successfully achieve sufficient spread over shorter forecast lengths (e.g., Stensrud et al. 2000). Stensrud and Fritsch (1994) suggest that ensemble diversity in both initial conditions and model physics may be appropriate because of uncertainties in both of these aspects. Ensembles designed for ATD applications commonly include model, physics, or initial-condition diversity or combinations of the three (e.g., Warner et al. 2002; Delle Monache and Stull 2003; Lei et al. 2012), and some achieve diversity by varying a release in space and time (e.g., Bieringer et al. 2014). Utilizing a practical number of ensemble members is a common constraint that contributes to the underdispersive nature of most ensembles (Kolczynski et al. 2011). Fortunately, previous studies have noted improvements in forecast skill using ensembles of only 5–10 members (e.g., Houtekamer and Derome 1995; Du et al. 1997; Buizza and Palmer 1998). Choosing members that represent the true atmospheric uncertainty space is essential to gleaning the best forecast skill from a small ensemble. As there is no set standard for ensemble member selection, a common method, also used here, is to select members based on physical intuition of governing, application-relevant, atmospheric processes.
An ensemble approach is employed to quantify forecast uncertainty within the SBL over the complex terrain of the Nittany Valley in central Pennsylvania. To more accurately resolve the SBL and small-scale motions related to flow interaction with topography, members require very fine horizontal and vertical grid resolution uncommon to current operational ensemble and even deterministic forecast systems. We explore the viability and performance characteristics of 3–12 member multi-initialization, multiphysics WRF ensembles with subkilometer (0.4 km) grid spacing and high vertical resolution. Ensemble output is used to compute trajectories and drive ATD forecasts in a way that reflects uncertainty surrounding the true atmospheric state.
Section 2 describes the WRF Model configuration, ensemble design, and the Second-order Closure Integrated Puff (SCIPUFF) model for hazard prediction (Sykes et al. 2006). Experimental design and statistical evaluation methods are summarized in section 3. Section 4 describes the Nittany Valley, Rock Springs (RS), Pennsylvania, observation network, and SBL case studies that will be evaluated. Results are discussed in section 5, and section 6 provides conclusions and recommendations for future work.
2. WRF configuration, ensemble design, and SCIPUFF
a. WRF configuration
Ensembles are based on initial-condition and physics diversity within a high-resolution configuration of version 3.6 of WRF-ARW. WRF is a 3D, fully compressible, nonhydrostatic primitive equation model with terrain-following vertical coordinates and Arakawa C horizontal gridpoint staggering (Skamarock et al. 2008). The WRF configuration includes four one-way nested domains of 12-, 4-, 1.3-, and 0.4-km horizontal grid spacing with the 1.3- and 0.4-km nests centered over central Pennsylvania and the Nittany Valley (Fig. 1a). The location and topography of the 0.4-km domain with respect to the topography of the 1.3-km domain are shown in Figs. 1b and 1c. Initial and lateral boundary conditions are given by 6-h NCEP 0.5° × 0.5° GFS analyses and forecasts. WRF physics options held constant across ensemble members are similar to those in Seaman et al. (2012) except that atmospheric radiation calculations are updated every 5 min, and the Noah land surface model (LSM) is coupled with MODIS land use (Table 1).
The staggered vertical grid for all domains features 44 full levels with 2-m spacing in the lowest 10 m and 10 half levels in the lowest 50 m with gradually increasing thicknesses up to the 50-hPa model top. This configuration is based on the results of previous examinations of SBL forecast sensitivity to various vertical grid configurations (e.g., Suarez et al. 2011; Seaman et al. 2012; Hoover et al. 2015). For example, in a comparison of both 44 and 35 vertical level configurations, Seaman et al. 2012 show that the use of fine vertical resolution near the surface (10 layers in the lowest 50 m AGL) produces more accurate near-surface nocturnal SBL wind forecasts than a coarser configuration (e.g., a more common two layers in the lowest 50 m AGL). Moreover, because poor representation of the midtropospheric environment can potentially affect the structure of gravity waves and rotor circulations near the surface, a gravity wave case should exhibit greater sensitivity to vertical resolution above the SBL than a nonwave quiescent case in which most of the submeso and turbulence intermittency is driven by shallow processes within the SBL. Suarez et al. (2011) demonstrate that further increasing the number of vertical levels in a wave case from 44 to 64 by doubling the low-level vertical resolution in the lowest 6 km AGL minimally impacts wave evolution and structure, as well as surface forecasts of temperature and vertical velocity.
b. Ensemble design
Given the high computational cost of running a mesoscale ensemble that employs very fine resolutions in both the horizontal and vertical grids, 12 members (see Table 2) are carefully chosen to provide the most relevant spread using the fewest members. Both multi-initialization and multiphysics diversity is included in the creation of ensembles because of uncertainties in initial conditions and model physics over mesoscale forecast lengths less than 48 h (e.g., Stensrud et al. 2000). Clear-sky conditions govern the SBL cases presented here; thus, diversity in convective and microphysical parameterizations is not included. Additionally, earlier sensitivity studies comparing the use of both Noah (Chen and Dudhia 2001) and thermal diffusion (Dudhia 1996) LSMs reveal minimal differences in average nighttime RS temperature and wind forecast errors (Suarez et al. 2011). This is further validated by the results of Lee et al. (2012) that show, following a statistical down-selection process, the most representative members of the true atmospheric state included the Noah LSM. Therefore, LSM diversity is not included, and use of the Noah LSM is held constant for each member (Table 1). To provide the largest relevant spread among members, physics diversity is introduced by varying planetary boundary layer (PBL) and surface layer (SL) physics schemes among ensemble members.
Ensemble initial-condition diversity is given by three WRF initialization strategies. The first strategy, Control (CTRL), is a 12-h free forecast from 0000 to 1200 UTC using GFS initial and lateral boundary conditions. The second strategy, baseline (BSL), uses a 24-h free forecast initialized from GFS 12 h prior to the nighttime period of interest. The third strategy, four-dimensional data assimilation (FDDA; Stauffer and Seaman 1994; Rogers et al. 2013), refers to a 12-h nighttime forecast with a 12-h preforecast that includes analysis nudging over the 12-km domain and observation nudging over all domains until 0000 UTC when the free forecast begins. Each of these strategies offers different but plausible initial conditions for each SBL case.
Subgrid-scale turbulent motions must be accounted for because turbulent mixing, or the lack thereof, directly impact SBL evolution. Therefore, the choice of subgrid-scale parameterization employed in WRF PBL schemes is important for the creation of relevant physics diversity among ensemble members. Four PBL physics schemes and their respective coupled SL schemes are incorporated into the ensemble (Table 2). The local, 1.5-order closure Mellor–Yamada–Janjić (MYJ; Janjić 1994), Mellor–Yamada–Nakanishi–Niino (MYNN; Nakanishi and Niino 2004), and quasi-normal scale elimination (QNSE; Sukoriansky et al. 2005) PBL schemes include a prognostic equation for turbulent kinetic energy (TKE) in contrast to the first-order, K-profile Yonsei University (YSU; Hong et al. 2006; Hong 2010) PBL scheme. MYJ background TKE is reduced from 0.1 to 0.01 m2 s−2 to prevent excessive SBL mixing (Seaman et al. 2012). Unlike the MYJ, the MYNN employs a different mixing length calculation methodology and accounts for TKE advection. To represent turbulent physics within the SBL, the QNSE calculates eddy diffusivities using a spectrally based theory that eliminates perturbation scales from the primitive equations and considers spatially anisotropic flow and the effects of unresolved turbulence and waves simultaneously (Sukoriansky et al. 2005; Sun et al. 2015). Mixed-layer countergradient terms in the YSU are dropped under stable conditions reducing the scheme to a local approach (Hong et al. 2006; Hong 2010). Improvements to the YSU have reduced excessive mixing allowing for more representative SBL forecasts (e.g., Hu et al. 2013). The MYJ, MYNN, QNSE, and YSU are coupled with Eta SL (Janjić 1996), Eta SL, QNSE SL (Sukoriansky 2008), and revised MM5 SL (Jimenez et al. 2012), respectively.
Ensemble utility is extended to ATD forecasting by using each ensemble member to drive trajectories and the SCIPUFF ATD model (Sykes et al. 2006) to quantify forecast uncertainty for simulated chem-bio releases within the Nittany Valley. SCIPUFF uses a Lagrangian approach to simulate the dispersion of 3D Gaussian puffs and models statistics of unresolved turbulence via second-order turbulence closure. As in Warner et al. (2002), Lee et al. (2009), and Kolczynski et al. (2009), SCIPUFF is driven by gridded meteorological data (here, gridded ensemble member data from WRF). The meteorological data driving ATD models, particularly the horizontal winds (Rao 2005), are a large source of uncertainty in dispersion forecasting. In hazard mode, SCIPUFF accounts for horizontal-wind uncertainty by augmenting and orienting a plume based on single-point wind variances derived from the WRF ensemble of gridded meteorological data. Larger values of u- and υ-wind component ensemble variances lead to larger puff areas and lower mean concentrations. The ensemble covariance of u- and υ-wind components alters plume orientation given the sign of the single-point values (e.g., Warner et al. 2002; Kolczynski et al. 2009). Also, a decorrelation scale length estimate (SLE) for dispersion concentration variance must be provided in hazard mode where larger values of SLE increase uncertainty in the concentration field because of decreased small-scale mixing and diffusion of the plume and increased variability in plume location (Warner et al. 2002).
Two SCIPUFF ensemble approaches are considered to quantify ATD uncertainty as in Warner et al. (2002). The explicit-ensemble approach aggregates ATD simulations for each ensemble member, and point-based probabilities of meeting or exceeding a given concentration threshold are calculated. Because running SCIPUFF for each ensemble member may be too costly given a time-sensitive situation, the second approach involves driving a SCIPUFF hazard-mode simulation with a single-member derived from all 12 WRF ensemble members. The ensemble mean, which has been shown to provide a more skillful forecast than any one explicit member on average (e.g., Grimit and Mass 2002) despite dynamic inconsistency, is created by averaging the 12 WRF member forecasts. To avoid smoothing and to retain dynamic consistency, another single member, referred to as a “best member,” is defined as the member whose vector wind difference from the ensemble mean is smallest when averaged over a prescribed atmospheric depth across the domain (set to 300 m here given local terrain and PBL heights). Because SCIPUFF defines a concentration probability density function with concentration mean and variance at a given point, probabilities of meeting or exceeding a threshold concentration are calculated using a clipped-normal distribution for single-member ATD forecasts (Warner et al. 2002; Sykes et al. 2006).
Several considerations are made before employing SCIPUFF for SBL cases. A nonzero minimum value of the horizontal velocity fluctuation is typically used in SCIPUFF to parameterize unresolved horizontal wind meandering for coarser grid models with grid sizes of O(10) km. When model grid spacing is O(1) km or subkilometer, the model can explicitly resolve meandering motions and plumes, and this parameter is reset from 0.25 to 0 m2 s−2. Similarly, the minimum value for tropospheric vertical turbulent fluctuations in SCIPUFF is also reduced from 0.01 to 1 × 10−4 m2 s−2 for these high-resolution model simulations. The SLE length scale is set to 2 km to retain the impacts of submeso and meso-γ motions. To include dispersion related to unresolved small-scale flows, the large-scale variability (LSV) model within SCIPUFF is often activated for SCIPUFF applications driven by coarse-resolution gridded meteorological data. However, we are already using very fine grid lengths to resolve small-scale flows and LSV does not currently take atmospheric stratification or terrain into account. Thus, the additional dispersion caused by LSV would be unrealistic in this application, and it is deactivated here for all SCIPUFF runs.
3. Experimental design and ensemble evaluation metrics
a. Experimental design
Three ensemble configuration types are tested to evaluate full ensemble performance as well as the contributions of initialization and physics diversity. The configurations are as follows: full ensembles that include both initial-condition and physics diversity (all 12 members in Table 2), multi-initialization ensembles that include initial-condition diversity while holding PBL/SL physics constant, and multiphysics ensembles that include PBL/SL physics diversity while holding initial-conditions constant. Multi-initialization ensembles allow testing of four 3-member ensembles (e.g., members 1–3 in Table 2 form the 3-member MYJ multi-initialization ensemble), and multiphysics ensembles permit testing of three 4-member ensembles (e.g., members 3, 6, 9, and 12 in Table 2 form the 4-member FDDA multiphysics ensemble). A 0.4-km horizontal grid spacing nest is included in the WRF Model configuration to better resolve the small-scale phenomena that occur within the SBL. However, Grimit and Mass (2002) suggest diminishing ensemble performance improvement for low-level wind direction forecasts as grid spacing is decreased from 12 to 4 km for a study over the Pacific Northwest. Thus, it is important to investigate the added value of a limited-size ensemble with finer grid spacing (e.g., Wang et al. 2012). Forecast performance of 0.4- and 1.3-km horizontal grid spacing configurations of the full ensemble (referred to as the 0.4-km ensemble and the 1.3-km ensemble) are evaluated and compared.
b. Evaluation metrics
Ideally, the verified future atmospheric state should fall within the spread of the ensemble, and the amount of spread should be correlated to ensemble-mean forecast error. This spread–skill relationship yields information about the dispersive behavior of an ensemble. Here, the performance of various ensemble configurations is evaluated using rank histograms (e.g., Talagrand et al. 1997) and the continuous ranked probability score (CRPS; Hersbach 2000).
Rank histograms assess the reliability and spread of an ensemble prediction system. Reliability is a measure of the calibration of a forecast system. For a set of perfectly reliable ensemble forecasts, realized probabilities equal forecasted probabilities. If ensemble forecasts are reliable, observations are equally likely to fall into any position among corresponding sorted ensemble forecast values where “rank” refers to the position of the observation. If reliable, a uniform rank histogram results when the frequency of ranks over some spatial area and/or temporal interval are examined. Deviations from rank uniformity infer ensemble bias, underdispersive, or overdispersive behavior (e.g., Wilks 2011). Rank histograms are unable to quantify the ability of an ensemble to produce a specific, or sharp, forecast and should be used in conjunction with a resolution assessment metric (Hamill 2001).
The CRPS is a strictly proper ensemble verification metric that provides an assessment of global ensemble skill. The CRPS quantifies the squared difference between the ensemble forecast cumulative distribution function (CDF) and the CDF of the corresponding observation (e.g., Gneiting and Raftery 2007). Practically, CDFs are constructed as cumulative density functions, and the CRPS is calculated and averaged over many ensemble forecast and observation pairs. Optimally, the CRPS is zero and forecast skill decreases with increasing value. The CRPS can be decomposed into three components so that
(Hersbach 2000). Closely related to rank histogram shape, reliability is related to the position of the observation within the ensemble of sorted forecast values. Resolution increases as the ability of an ensemble to produce a specific, unambiguous event forecast increases. Uncertainty is proportional to the standard deviation of the observations. The potential CRPS given by
which represents the improvement of the ensemble forecasts over forecasts based on observational climatology (Hersbach 2000). Because a direct comparison of two ensemble configurations is based on the same observation data, observational uncertainty is equal in both ensembles and an improvement in potential CRPS identifies an improvement in resolution. Significant differences in CRPS, reliability, and potential CRPS between the 1.3- and 0.4-km ensembles are identified at a 95% confidence level using a bias-corrected and accelerated (BCa; DiCiccio and Efron 1996) bootstrapping technique (e.g., Efron 1979; Candille et al. 2007).
4. Nittany Valley, Rock Springs Observation Network, and SBL case study descriptions
The Nittany Valley in central Pennsylvania is approximately 20 km wide and is bordered by the Allegheny Mountains to the northwest and Tussey Ridge to the southeast (both ridge tops ~300–350 m above the valley floor). The long axis of the valley extends from roughly southwest to northeast (Fig. 1c). The surrounding slopes are forested while the valley itself is predominantly covered by croplands and grasslands. The RS observation network located along the base of Tussey Ridge contains instrumented towers as well as sonic detection and ranging (SODAR; Fig. 2) instrumentation (Hoover et al. 2015). Although direct verification of ATD ensemble forecasts is not possible because tracer measurements are unavailable, low-level horizontal winds and PBL depth are two of the most important parameters impacting ATD forecast uncertainty (Lewellen and Sykes 1989; Rao 2005; Reen et al. 2014). Thus, along with SODAR horizontal winds, ensemble forecasts are evaluated against 2-m temperature and wind observations from six RS sites (Fig. 2). The data are used as a proxy for direct ATD measurements. Time series of 12-min-centered average temperature, wind speed, and wind components rotated to an across-valley (perpendicular to Tussey Ridge; positive toward the southeast) and along-valley (parallel to Tussey Ridge; positive toward the northeast) coordinate system are compared against instantaneous 12-min WRF ensemble member outputs. SODAR winds are also used to identify the presence of wave motions and rotor circulations over the RS network.
Ensembles are evaluated over 12-h nighttime periods (0000–1200 UTC; 0700–0700 LST), and results are presented for two case studies during which SBL related submeso- and meso-γ-scale motions lead to ATD forecast uncertainty. Two case studies are chosen to provide examples of SBL flows related to the topography of the Nittany Valley: 13 September 2011 (SEP13) and 16 September 2011 (SEP16). SEP13 features clear skies and weak southwesterly valley-parallel SBL flow (Fig. 3a) associated with a high pressure system centered over the Tennessee Valley. SEP13 ATD uncertainty is dominated by submeso motions such as katabatic drainage flows resulting from strong near-surface thermal stratification. SEP16 exhibits northwesterly flow over the Allegheny Mountains and perpendicular to the Nittany Valley associated with a high pressure system approaching from Ohio. Clear skies and light surface winds allow the development of the SBL. From about 0500 to 0900 UTC, RS SODAR observations (Fig. 3b) suggest the presence of rotor circulations associated with trapped lee waves excited by flow over the Allegheny Mountains. These rotors, leading to flow reversal as indicated by the SODAR wind shift from west-northwesterly to south-southeasterly near the surface, complicate ATD.
5. Ensemble case study results
Using the metrics describe in section 3, ensemble forecasts for each case study are first evaluated against RS network observations. For each case, the ATD outcomes of simulated near-surface releases from RS site 9 (Fig. 2) and a midvalley location are analyzed. Particle trajectories are initialized from 3 m AGL at nine grid cells surrounding site 9 and midvalley to simulate the movement of air parcels. Corresponding SCIPUFF surface dosage plumes are analyzed following 3-m AGL, 12-min continuous releases of the passive tracer C7F14. SCIPUFF single-member surface dosage threshold probabilities are then compared against threshold probabilities derived from the explicit SCIPUFF ensemble.
a. Case 1: Valley-parallel flow
1) SEP13 ensemble performance evaluation results
Low-level temperature and wind forecasts are compared against RS observations to gain insight into ensemble forecast accuracy for SEP13. Over the nighttime window, CRPS, reliability, and potential CRPS values for 2-m temperature, wind speed, across-valley wind, and along-valley wind are calculated (Table 3). Ensemble forecasts are also evaluated against SODAR horizontal wind observations for a 30–115-m AGL layer above the valley floor (Table 3). For each variable at 2-m AGL, significantly improved CRPS and reliability values for the 0.4-km ensemble over the 1.3-km ensemble indicate that finer grid spacing produces more accurate forecasts. A statistically significant degradation in potential CRPS for each wind variable signifies a reduction in resolution (i.e., sharpness) from the 1.3- to the 0.4-km ensemble. This result indicates that the 1.3-km ensemble produces more specific, higher-resolution forecasts for each wind variable, but these forecasts less effectively capture the actual observed wind values. This is evidenced by poor reliability component results of each wind variable for the 1.3-km ensemble. Ensemble wind variable forecast performance in the SODAR-resolved layer above the value floor is generally similar to the performance at 2-m AGL. Significant improvements in 0.4-km ensemble CRPS and reliability are evident for wind speed and across-valley wind. Overall forecast performance between the 1.3- and 0.4-km ensembles is not significantly different for along-valley wind as evidenced by the CRPS using SODAR data. This result is not surprising as flows in the layer above the valley floor are more dependent on larger-scale forcing and are less prone to the effects of shallow katabatic flows along the surface occurring in this case.
Wind components at 2-m AGL demonstrate the most notable statistical improvements from the 1.3- to the 0.4-km ensemble. Improvements to the along-valley wind forecasts, the most relevant variable for downwind ATD in this case, are mainly due to a large bias reduction indicated by the corresponding rank histograms in Fig. 4. The large frequency for rank “1” (Fig. 4a) indicates that for over 90% of instances throughout the night, every 1.3-km ensemble forecast value is greater than the observed along-valley wind. Although still present in the 0.4-km ensemble forecasts, this overforecasting bias is reduced by over 40%, and the rank histogram tends more toward rank uniformity (Fig. 4b).
To obtain an overall sense of initialization- and physics-diversity contributions to the full ensemble, 2-m AGL CRPS results for the 0.4-km multi-initialization and multiphysics ensembles are presented in Fig. 5. The QNSE multi-initialization ensemble appears to produce the best temperature forecast by a relatively large margin while it performs similarly to the MYNN and MYJ configurations for wind speed variables (Fig. 5a). On average, the QNSE members (best) have a positive temperature bias of less than 1.0 K compared to 2.1 K for YSU members (worst). Contrary to previous studies that have found the QNSE to have a cold bias in the SBL (e.g., Suarez et al. 2015; Boadh et al. 2016), the QNSE configuration best simulates low-level fields for this SBL case, characterized by strongly stable conditions and weak (<1 m s−1) drainage flows. These experiments tend to produce on average the strongest low-level stratification, largest negative sensible heat flux, and weakest TKE values or mixing (not shown), which allows the development of strong cold pools with temperatures closer to those observed.
The non-TKE YSU scheme ensemble produces the worst forecasts for each variable, despite recent revisions that have been shown to improve SBL forecasts (Hu et al. 2013). The worst CRPS scores in Fig. 5a from YSU are consistent with its 2-m temperature and wind speed forecasts over RS. Most notably the YSU-BSL and YSU-FDDA produce larger positive 2-m wind speed biases than all other members (0.56 and 0.50 m s−1, respectively, versus an ensemble mean wind speed bias of 0.20 m s−1), and also larger positive biases in surface temperature (2.2 and 2.4 K, respectively, versus an ensemble mean temperature bias of 1.5 K). This may be related to the tendency of the scheme to produce deep, well-mixed layers during the daytime and to overestimate the eddy diffusivities during the nighttime period (e.g., Hu et al. 2013). So in agreement with previous studies (e.g., Shin and Hong 2011), it seems that TKE-based local turbulence closure schemes (e.g., MYJ, QNSE) still have some advantage over the YSU scheme. It should not be totally unexpected if YSU members produce different ATD results compared to the other ensemble members.
The multiphysics ensemble CRPS values (Fig. 5b) show that no one configuration consistently produces the best wind variable forecasts while the CTRL multiphysics configuration produces superior temperature forecasts. This may indicate that low-level ensemble wind forecasts are more dependent on PBL/SL physics diversity than initialization strategy in this weakly forced case. CTRL experiments consistently produce the strongest low-level stratifications and weakest mixing.
2) SEP13 ATD results
ATD predictions for SEP13 are generally dominated by the presence of drainage flows within the Nittany Valley. Figure 6 depicts horizontal trajectories after a 0600–0800 UTC integration for each of the 0.4-km ensemble members along with 3-m horizontal wind vectors at 0800 UTC. The trajectories generally converge at the base of Tussey Ridge before moving northeast. Horizontal wind vectors suggest drainage-flow convergence as the cause of this behavior as flow from the slopes of Tussey Ridge meets weak flow of the opposite direction draining from a small region of topography (denoted by the star in the figure) within the valley. Perpendicular to Tussey Ridge, vertical cross sections of these trajectories along with potential temperature contours at 0800 UTC (see Fig. 1c for the location of the cross section) are plotted in Fig. 7. Cross sections reveal little or no vertical extent for the trajectories as they are caught in a strong surface cold pool indicated by the tightly stacked potential temperature contours. However, the trajectories for the YSU-BSL and YSU-FDDA members are lofted just above the region of strongest thermal stratification and advected farther down the valley than shown for the other members.
The lofting of trajectories for the YSU-BSL and YSU-FDDA members compared to CTRL appears to be the result of differences in atmospheric conditions at 0000 UTC due to the former having overmixed conditions during their previous daytime simulation periods, stronger convergence regions of the subsequent drainage flows and the cold pool, and near ridge-top dynamics enhancing the horizontal flow in the valley. The YSU-CTRL, characterized by trajectories that have limited vertical extent (Figs. 6 and 7), exhibit mostly negative vertical motions as expected whereas all of the lofted trajectories for YSU-BSL, for example, are released within the strongest convergence region or within strong downslope flow. YSU-BSL trajectories show strong, positive vertical motions as large as 8 m s−1, and YSU-BSL and YSU-FDDA produce the strongest vertical velocities compared to all the ensemble members during this 0600–0800 UTC period over site 9.
Nonetheless, other factors such as the low-level stratification and wind speed also appear to play a major role in the horizontal and vertical propagation of the trajectories. For YSU-CTRL, a deep layer of strong stratification beneath 600 m MSL and weak wind speeds are forecasted through the release period while YSU-BSL exhibits a shallow (~30 m AGL), strongly stratified cold pool and strong downslope flows exceeding 5 m s−1 on the windward (north facing) slope of Tussey Ridge. For this experiment, strong wind speeds coupled with large downward vertical motions erode the cold pool and reduce the stratification above 400 m MSL (30 m AGL) resulting in regions of neutral stratification and convective overturning (Fig. 7). Weaker stratification at and above this level does not appear to damp the horizontal or vertical propagation of the trajectories as in YSU-CTRL while that near the surface appears to play a limited role modulating the lofting of the trajectories because low-level (2–9 m AGL) stratification for YSU-BSL and YSU-FDDA is comparable and not smaller than that of all the other members (not shown).
The YSU-BSL and YSU-FDDA experiments are characterized by positive wind speed bias at the surface and exhibit stronger wind speeds beneath 1000 m MSL than the other ensemble members (not shown). The stronger near-surface wind speed for YSU-BSL, for example, may be the result of higher momentum air being forced down toward the surface due to blocking by a region of strong static stability over Stone Valley (downwind of Tussey Ridge). Nine backward-in-time trajectories around site 9 from 0800 to 0600 UTC for this experiment (not shown) reveal that the flow descends into the valley from 650–680 m MSL, where wind speeds are approximately 6 m s−1. Plots of stratification and wind speed for all BSL experiments (not shown) reveal that the YSU tends to produce some of the deepest near-neutral layers by the end of the daytime period (0000 UTC), and through the nighttime period, YSU-BSL continues to produce stronger wind speed and weaker stratification than YSU-CTRL. Thus, a combination of overmixing during the day and stronger wind speeds appear to explain the different results of YSU-BSL and YSU-FDDA to YSU-CTRL and the other ensemble members.
SCIPUFF surface dosage plumes are shown in Fig. 8. The highest surface dosages occur along Tussey Ridge in the region of drainage flow convergence. Strong agreement among ensemble members causes high explicit ensemble surface dosage probabilities of meeting or exceeding a concentration threshold value of 10−9 m3 s m−3 along Tussey Ridge and downwind (Fig. 9a). This threshold value is arbitrary and would change based on exposure risk tolerances for a specific substance. Reasonable agreement exists between patterns of explicit ensemble probabilities and those derived from a SCIPUFF hazard mode simulation using the best member, MYJ-CTRL (Fig. 9b); however, the probability swath shown by the best member is greatly expanded because of plume augmentation and orientation impacts of the wind field uncertainty statistics employed in SCIPUFF hazard mode. Therefore, single-member probabilities demonstrate a spatially larger and probabilistically more diffuse solution. Enhanced probabilities to the south of Tussey Ridge given by the best member (Fig. 9b) are due to artificial plume dispersion linked to the use of SCIPUFF hazard mode essentially forcing the plume through a gap along Tussey Ridge. Although SCIPUFF ensemble mean results are not shown, probability swaths given by both the ensemble mean and ensemble best member (Fig. 9b) approaches are similar.
SCIPUFF ATD results for a midvalley release show reasonable agreement between the pattern of explicit ensemble probabilities (Figs. 9c,d) and that given by the single-member (MYJ-CTRL) approach (best member similar to ensemble mean results, not shown). The single-member hazard area generally encompasses the hazard area that includes the higher surface dosage exceedance probabilities from the explicit ensemble. The region of highest probabilities for both approaches is downwind from the release point as expected. The best-member probability swath is expanded and more diffuse than that of the explicit ensemble that again retains more finescale detail. For example, in the latter, the 50%–60% and 60%–70% contours split the probability swath around a region of topography east of the release point. This behavior appears to be related to katabatic drainage flows in between small regions of varying elevation within the valley for five of the 0.4-km ensemble members. Figure 10 shows this expected drainage-flow effect on the trajectories in more detail for the 0.4-km MYJ-FDDA member. Fine details aside, the single members provide some indication of the overall range of potential ATD outcomes from the explicit ensemble and the spatial area likely to be impacted following a chem-bio release at a reduced computational cost (1 rather than 12 SCIPUFF integrations for each release).
b. Case 2: Northwesterly cross-mountain flow
1) SEP16 ensemble performance evaluation results
Trapped lee waves and associated rotor circulations excited by northwesterly flow over the Allegheny Mountains are a large source of forecast uncertainty for SEP16. CRPS and its component values using observed surface wind and temperature and SODAR wind (as in SEP13 case) are presented in Table 4. CRPS and reliability for 2-m AGL temperature, wind speed, and across-valley wind forecasts significantly improve from the 1.3-km ensemble to the 0.4-km ensemble. Significant CRPS improvements in wind speed and across-valley wind appear for SODAR layer winds as well. Improvements in potential CRPS are significant for 2-m AGL wind speed and across-valley wind while the difference in performance is not significant for wind speed when evaluated against SODAR wind speed. Significant degradations in 2-m AGL and SODAR layer along-valley wind CRPS may be attributed to forecast phase and amplitude errors. Although the 0.4-km ensemble members resolve more fine physical structure in the along-valley wind time series, it appears that numerics associated with the coarser 1.3-km ensemble lead to smoother and less risky forecasts (not shown). Because of susceptibility to phase and amplitude errors, a shortcoming of most traditional verification approaches when evaluating finescale mesoscale model performance (Mass et al. 2002), the finer 0.4-km ensemble is penalized more harshly by the CRPS. Thus, evaluation complexities often arise with finer-resolution model products (Casati et al. 2008), and feature-based and wavelet-based evaluation approaches may provide greater model performance insight (Mittermaier et al. 2016; Suarez et al. 2015).
The largest CRPS improvement occurs for 2-m AGL across-valley wind, the variable most directly impacted by rotor circulation reversal flow over the RS network (Fig. 3b). The improvement is largely due to a reduction in forecast bias and, to a lesser extent, a resolution improvement. The bias reduction is best evidenced in the 2-m AGL across-valley wind rank histogram for the 0.4-km ensemble (Fig. 11b) that tends much closer to rank uniformity after more than a 30% reduction in overforecasting bias when compared to that of the 1.3-km ensemble (Fig. 11a). CRPS reliability values for all 2-m variables of the 0.4-km ensemble approach zero indicating reliable forecasts (Table 4). Overall, these ensemble evaluation results reveal the value of a subkilometer horizontal grid-spacing ensemble approach in better forecasting near-surface wind and temperature fluctuations associated with mountain-wave activity in the Nittany Valley.
Multi-initialization ensemble CRPS values reveal mixed results among configurations for each variable (Fig. 12a). In contrast to SEP13, the QNSE-based ensemble performs markedly worse for temperature because of a persistent cold bias but performs comparably to or better than the MYJ and MYNN multi-initialization configurations for wind variables. The QNSE experiments produce the strongest cold pools and the weakest wind speeds above the surface and between 30 and 120 m AGL (not shown). The strong low-level stratification acts to damp wave motions near the surface and low-level wind speeds. Furthermore, these experiments exhibit the weakest mixing (TKE) at low levels and weak waves and rotors (not shown), preventing or limiting the erosion of the cold pool during rotor events and allowing rapid cooling near the surface.
As in SEP13, the YSU multi-initialization ensemble again performs comparatively poor for each wind speed variable due to larger wind speed overforecasting biases of the YSU members compared to all other members (e.g., YSU-CTRL: 0.59 m s−1, YSU-BSL: 0.63 m s−1, YSU-FDDA: 0.47 m s−1 vs ensemble mean bias: 0.30 m s−1). It is important to note that the overestimation of the wind speeds by the YSU members, compared to SODAR observations and other ensemble members, can still be noted from 30 to 120 m AGL for this case (not shown). The limitations of the YSU accurately reproducing the nighttime wind and temperature structures have been well documented in the literature (e.g., Shin and Hong 2011; Hu et al. 2013). This may again suggest an advantage of TKE-based schemes as opposed to a non-TKE alternative for SBL processes.
Multiphysics ensemble CRPS results show that the FDDA ensemble configuration produces superior forecasts for temperature, wind speed, and along-valley wind while performing comparably to the CTRL multiphysics configuration for across-valley wind (Fig. 12b). Ensemble skill generally appears to depend more on initialization strategy than for SEP13. This is expected as SEP16 is more synoptically forced, and the presence of rotor circulations within the model is highly dependent on the correct representation of upstream conditions.
2) SEP16 ATD results
Near-surface temperature and wind fluctuations for SEP16 are dominated by the effects of trapped lee-wave-induced rotor circulations over the Nittany Valley. These complex SBL flows are the main source of ATD uncertainty for SEP16. Figure 3b presents SODAR observations of horizontal wind from 0500 to 0900 UTC near the base of Tussey Ridge (see Fig. 2 for location). After 0600 UTC, weakening winds at low-levels that spread upward accompany the onset of flow reversal in the vertical. By 0700 UTC, low-level winds become southeasterly while winds aloft remain northwesterly. Persisting until about 0800 UTC, this flow reversal pattern is indicative of a rotor circulation over the RS network with the reversal flow below about ~100 m and perpendicular to Tussey Ridge into the Nittany Valley.
Vertical cross sections of trajectories initialized at site 9 and integrated from 0600 to 0800 UTC show that approximately half of the members produce trajectories that move southeast across Tussey Ridge and into the adjacent valley while trajectories for the remaining members recirculate above the release site (e.g., MYJ-BSL, MYNN-BSL, QNSE-CTRL, QNSE-BSL; Fig. 13). Uncertainty exists when forecasting the amplitude, wavelength, and phase of trapped lee waves that may lead to rotor circulations. Here, the presence of the rotor circulation causes trajectory paths to deviate from the mean northwesterly flow and track along Tussey Ridge and into the Nittany Valley. Trajectory vertical cross sections for the 1.3-km ensemble configuration (not shown) indicate no trajectories showing recirculation for this release location and time. The 1.3-km nested grid provides relatively high horizontal resolution, but it is less capable of resolving the fine structure circulations related to flow interaction with the terrain surrounding the Nittany Valley and tends to severely underestimate the amplitude of the waves. Given the SODAR-observed rotor circulation over the RS network for this case, the discrepancy in 0.4- and 1.3-km ensemble outcomes exemplifies the need for subkilometer grid spacing when investigating small mesoscale features in the SBL.
SCIPUFF surface dosage plumes for a 0600–0800 UTC site 9 release demonstrate a broad range of ensemble member ATD outcomes (Fig. 14). Surface dosage values and the spatial extent of the surface dosage plumes are closely related to the variations in transport associated with the rotor circulations resolved by the model. For example, the MYNN-CTRL surface dosage plume crosses Tussey Ridge and moves southeast in accordance with synoptic-scale northwesterly flow. Conversely, the dosage plume for the QNSE-CTRL member that indicated a rotor circulation in its trajectories moves back into the Nittany Valley before drifting southeast over Tussey Ridge. Explicit ensemble surface dosage probabilities and those probabilities derived from a single-member (ensemble mean) approach are compared in Fig. 15. Again, the patterns are similar, and the area of the single-member ensemble-mean dosage probabilities (Fig. 15b) generally encompasses that of the explicit ensemble (Fig. 15a). This provides an estimate of the overall range of potential dispersion results from the explicit ensemble and the spatial area most likely to be impacted in a chem-bio release at a much reduced cost. Use of the best member (MYJ-CTRL) yields a very similar result (not shown).
To further investigate the impact that wave activity and corresponding rotor circulations have on ATD, a midvalley release is also performed as in SEP13. For the site 9 release in SEP16, trapped lee-wave phase may have been somewhat constrained by proximity to Tussey Ridge. Although interesting for the purposes of SODAR observation comparisons, the site 9 release results may not fully demonstrate ATD variability as a consequence of trapped lee waves and reversal flow due to rotor circulations. Therefore, the midvalley release location where wave phase is less constrained by topography provides another example of the ability of waves and rotors to increase spread in ATD outcomes among ensemble members. Over the same integration period, midvalley release trajectory paths for each member demonstrate enhanced outcome variability than for corresponding site 9 release results. For example, Fig. 16 shows a vertical depiction of the 12-member ensemble trajectories for the midvalley release with potential temperature along a northwest–southeast cross section (an extension of the site 9 release cross section). For the midvalley release, every member demonstrates new outcomes and, in most cases, a large increase in outcome variability when compared to corresponding site 9 results (Fig. 13). Some trajectory paths eventually cross Tussey Ridge and exit the valley, and trajectories for other members also demonstrate enhanced variability in the horizontal (horizontal trajectory figures not shown) both along-valley and across-valley in the forward (due to northwesterly synoptic-scale flow) and reversed direction (due to the rotor circulation).
This increased variability among ensemble members translates into wider and more uncertain SCIPUFF ATD outcomes for both the explicit and single-member approaches. Reducing ATD dependency on Tussey Ridge allows the area of explicit ensemble dosage probabilities (Fig. 15c) to expand more along-valley (especially for higher probability contours) than for the site 9 release. As a result of rotor reversal flows, the increased probabilities also extend northwestward against the synoptic-scale flow. As SCIPUFF hazard mode uses the ensemble wind-variance statistics to represent uncertainty among members, single-member approaches also demonstrate enhanced hazard-area prediction compared to site 9 release results (cf. Figs. 15c,d and 15a,b).
These ATD results demonstrate the importance of wind reversal regions associated with rotor circulations to surface dosage prediction. Predicting the correct location and timing of these motions is especially difficult making an ensemble approach quite valuable. It is noted that results for a third case study (not shown) involving trapped lee waves induced by south-southwesterly flow over the opposite ridge, Tussey Ridge, demonstrate SEP16-like reversal flow patterns along with consistent statistical forecast improvements for the 0.4-km domain over the 1.3-km domain for both surface and SODAR wind observations. Its ATD results also support the general conclusions from the two cases presented here. Given all of these results, a 12-member ensemble with subkilometer grid spacing appears capable of providing reasonable ATD spread critically important for understanding forecast uncertainty for a chem-bio release during conditions induced by mountain-wave activity in the SBL. Proof-of-concept for hazard-prediction area estimates from a single-member SCIPUFF simulation using meteorological-ensemble wind-variance statistics has also been demonstrated, and it warrants further investigation/testing for time-critical emergency response applications.
6. Conclusions and future work
A 12-member ensemble with subkilometer grid spacing, high vertical resolution, and diversity in initial conditions and PBL/SL physics produces reasonable spread in ATD forecasts and generally reliable low-level temperature and wind forecasts for SBL cases involving flows over complex terrain. Subkilometer grid spacing appears necessary for resolving flows most relevant to ATD uncertainty in each SBL case study. The 0.4-km ensemble ATD forecasts resolve the impacts of drainage flow in a valley-parallel flow case and the impacts of reversal flows associated with wave-induced rotor circulations in a trapped lee-wave case for which rotor circulations are not produced by the 1.3-km ensemble. For both case studies, ridgeline and midvalley release experiments demonstrate that single-member ensemble mean and best-member approaches using ensemble wind field uncertainty statistics in SCIPUFF hazard mode generally represent the hazard-area patterns while encompassing the regions of explicit-ensemble dosage probabilities but at a reduced computational cost. Proof-of-concept for the single-member approach was demonstrated, and it warrants further investigation/testing as a valuable tool for a short time-fuse emergency response in the event of a chem-bio release whereas additional details given by the explicit-ensemble approach may be more useful for retrospective studies following an incident.
CRPS and rank histogram results show that subkilometer grid spacing is necessary for providing adequate ensemble spread and for capturing temperature and wind fluctuations directly related to drainage flows and rotor-circulation flows over the RS network. For both case studies, CRPS reliability values for low-level variables from the subkilometer model configuration generally indicate that forecasted ensemble probabilities are close to the true probabilities of the observed values. In most instances, significant improvements in CRPS and its components from the 1.3-km ensemble to the 0.4-km ensemble justify the increased computational resources needed for a subkilometer ensemble. These improved low-level temperature and wind forecasts should lead to improved ATD forecasts. Multi-initialization versus multiphysics ensemble CRPS results indicate that low-level ensemble wind forecasts are generally more dependent on the TKE-based PBL/SL physics than initialization strategy with available observations for the weakly forced parallel-flow case (SEP13), while forecast skill for the synoptically forced rotor circulation case over RS (SEP16) depends more on initialization strategy, and the use of data assimilation generally gives superior results. The combination of initialization and physics diversity in the 12-member ensemble yields the best ensemble performance.
Future work should include evaluation against tracer data and the creation of a large database of ensemble forecasts over different locations and seasons. More extensive verification at and above the surface will allow for additional tuning of ensemble member diversity strategy and calibration of ensemble output. Alternative data assimilation methods should be explored on these very fine scales for the SBL and should leverage additional upwind observations such as wind and stratification profiles. This will allow for a more thorough assessment of the utility of data assimilation and enhance prediction of the effects of different wave and rotor types on SBL ATD. The impact of even higher horizontal and vertical resolution should also be examined.
This project received support from the Defense Threat Reduction Agency (DTRA; Grant HDTRA1-10-1-0033) under the supervision of John Hannan. SODAR funding was provided by the U.S. Army Research Office under DURIP Grant W911NF-10-1-0238. The authors thank AJ Deng, Brian Gaudet, Glenn Hunter, David Stensrud, and George Young for useful discussions and input. The authors appreciate the valuable suggestions put forth by three anonymous reviewers.