## 1. Introduction

Light fog, defined here as fog resulting in visibility of 1–7 mi, occurs nearly globally and is a significant safety concern for many aviation operations. In the Department of Defense, light fog can drive restrictions on certain aircraft types, equipment, and less-experienced pilots such that the ability to conduct operations is severely degraded. Postprocessing numerical forecasts with observations may mitigate the lack of accuracy in predicting light fog. Where observational records are unavailable, standard statistical corrections that may not consider the physical state of the model are impossible. The lack of observations in these locales also rules out robust tools available in data-rich regions, such as the Federal Aviation Administration’s National Ceiling and Visibility Analysis product, which uses a decision-tree framework to assimilate real-time surface and satellite observations with model data to make ceiling and visibility predictions to 12 h (Herzegh et al. 2006).

A statistical link between a numerical forecast and the expected error in fog prediction can be drawn, for example by analogy with model output statistics (MOS), but the predictors cannot be chosen blindly, and the relationships can be highly nonlinear. This motivates the use of nonlinear methods, such as neural networks, in fog prediction (Marzban et al. 2007).

We propose a nonparametric approach to ensemble fog prediction, where the predictand is the probability of light fog. Ryerson and Hacker (2014, hereafter RH14) examined raw predictions from an uncalibrated 4-km, 10-member Weather Research and Forecasting (WRF) Model (Skamarock et al. 2008) ensemble based on varying physical parameterizations. Analysis showed the raw (i.e., uncalibrated) predictions from the ensemble produce a severe shortage of light fog predictions corresponding to visibilities of 1–7 mi, in favor of excessive forecasts of zero cloud water (no fog). The raw predictions had skillful results in a mountainous region (the Sierra Nevada near the California–Nevada border), mixed results in a coastal region (the Pacific coast in northern California), and were not skillful in a valley region (the California Central Valley) dominated by radiation fog. Error in predictions of cloud water mixing ratio *q*_{c} was primarily due to systematic NWP physics error, defined broadly as error resulting from an inaccurate parameterization of subgrid-scale processes in the model ultimately leading to error in resolved variables.

Most previous work in fog prediction using NWP and postprocessing has focused on heavy fog (for simplicity we define heavy fog as conditions that reduce visibility to <1 mi), but error from subgrid-scale parameterizations is a consistent theme. In a thorough review of the literature and a comparison of several fog NWP models, Gultepe et al. (2007a) emphasized several challenges to accurately predicting fog with an NWP model, notably the need for very high horizontal resolution that is “sufficiently less than a few meters” as well as the need to account for horizontal heterogeneities of soil and vegetation.

Geiszler et al. (2000) offered an informative example of attempting fog prediction using direct, uncalibrated NWP output of *q*_{c}. Coupled with a simple visibility parameterization to convert NWP predictions of *q*_{c} to visibility, they tested a 9-km-resolution version of the Coupled Ocean–Atmospheric Mesoscale Prediction System (COAMPS) model over coastal California, finding the results had little skill. The somewhat more skillful results attained by direct, uncalibrated *q*_{c} predictions in RH14 were from stochastic verification of an ensemble; each individual ensemble member was unskillful on its own. Similarly, Zhou et al. (2009) obtained qualitatively better results than Geiszler et al. (2000) when applying the same simple visibility parameterization to NWP *q*_{c} predictions from the 32-km horizontal resolution, 21-member Short Range Ensemble Forecast system produced by the National Centers for Environmental Prediction (NCEP). Yet in a recent prolonged case of widespread fog in the United Kingdom, Price et al. (2015) found that 1.5-km ensemble predictions from a high-resolution version of the U.K. Met Office Unified Model struggled to distinguish low stratus cloud from fog and predicted premature dissipation.

These results support the fundamental advantage of an ensemble, but also suggest that the prospects of attaining consistently skillful fog prediction using direct, uncalibrated NWP predictions of *q*_{c} are limited using a horizontal resolution of a few kilometers or more, currently the approximate limit for most operational NWP modeling centers. Even at a resolution of 333 m, using a version of the Unified Model with a modified cloud parameterization, Boutle et al. (2016) achieved perhaps the most promising results to date but observed little improvement at shorter forecast lengths, and computational expense currently limits a model domain of this resolution to perhaps a single state or province (i.e., 100 km × 100 km) in real-time operations. Given these limitations, a common approach is to rely on a more sophisticated visibility parameterization that leverages NWP output variables beyond just *q*_{c}.

Examples of these more sophisticated visibility parameterizations are plentiful in both operations and research. Until recently, visibility predictions from the U.S. Air Force’s Mesoscale Ensemble Prediction System (MEPS) used a statistical parameterization developed from regression on a 1-yr training dataset of RUC analyses at thousands of U.S. locations (E. Kuchera 2011, personal communication), with predictors of total column precipitable water, 10-m wind speed, and 2-m relative humidity (RH). Zhou and Du (2010) also applied a unique visibility parameterization to output from a 15-km resolution, 10-member ensemble, in order to make a yes/no radiation fog prediction based on *q*_{c}, 10-m wind speed, 2-m RH, and cloud-top and cloud-base heights.

Since these more complex visibility parameterizations are trained using observational predictors, they effectively improve the quality of the parameterization while remaining susceptible to error in the NWP predictions of the input parameters. Yet RH14 showed that error in the NWP predictions of parameters most germane for fog (i.e., *q*_{c}, temperature, and water vapor) is the most important source of error, suggesting a limitation on the capability of parameterization strategies built from observational data that ignore NWP error.

In a unique approach, Zhou and Ferrier (2008) described a physically based computational method for obtaining *q*_{c} values during radiation fog events by explicitly solving the governing equation that diagnoses *q*_{c} as a function of the turbulent exchange coefficient, droplet gravitational settling, condensation rate due to cooling, and depth of the fog layer, all taken from numerical predictions. The abundance of zero or near-zero *q*_{c} predictions from the 10-member ensemble documented in RH14 eliminates fog depth as a useful input for diagnosis. Zhou (2011) suggested that for known negative RH biases, defining the fog depth as the depth where an RH threshold is exceeded in the NWP predictions can help.

The next section will discuss the basis and methodology for the postprocessing strategy used in this work, followed by the process of optimizing the technique in section 3. Cross validation is presented in section 4, and section 5 offers a summary and conclusions.

## 2. Methodology

### a. Data

This work uses the same seven verifying locations and 29-day dataset of Automated Surface Observing System (ASOS) observations and WRF ensemble predictions as in RH14, and the details are not repeated here. Since ASOS estimates visibility by measuring the extinction coefficient *β*_{e}, this work used *β*_{e} as the verifying parameter. The ASOS algorithms for converting *β*_{e} to visibility are provided in Table 1.

Table 1. ASOS daytime and nighttime algorithms for converting the extinction coefficient *β*_{e} to visibility *r*_{t}.

As detailed in Ryerson (2012) and RH14, the observations were also processed to isolate visibility restrictions due to fog from those due to other obscurations (e.g., haze or precipitation), which were recategorized as “no fog” and given values *β*_{e} = 0.10 km^{−1} to indicate unrestricted visibility.

### b. Postprocessing objective

As detailed in RH14, the light-fog prediction skill of the WRF ensemble is limited by the ensemble systematically predicting a bimodal *q*_{c} and excessive zero or near-zero *q*_{c} in the coastal and valley regions. Because the errors are linked to physical attributes in the model, and are persistent across similarly sited observing locations, we postulate that addressing this deficiency is the most immediately impactful step in improving operational WRF-based fog prediction where location-specific calibration is not an option.

To mitigate the errors diagnosed in RH14, the approach exploits joint distributions of resolved variables in the numerical forecasts and links those distributions to errors in light fog. Lookup tables are created, which can be interpolated to provide a forecast for the probability of light fog at any threshold.

The approach centers on identifying instances when individual members predict zero or near-zero *q*_{c}, yet fog is likely (i.e., the missed opportunities), and making upward adjustments to the probability of fog in these cases. The approach does not address the *overforecasts* of *q*_{c} for two reasons. First, it offers less promise for improving skill because overforecast *q*_{c} is far less common than underforecast *q*_{c} (RH14). Second, the raw ensemble output is severely underdispersive, with all ensemble members simultaneously predicting zero *q*_{c} in 62% of all observed fog cases, which results in zero ensemble dispersion and no skill in these cases. Results will show the approach can add significant overnight skill to predictions in valley and coastal regions compared to raw mesoscale ensemble forecasts, with modest skill increases after sunrise.

The challenge in adding skill to the ensemble by statistically adjusting zero or near-zero *q*_{c} predictions from the members is in knowing whether fog is likely. The strategy is intentionally conservative such that the fog prediction is taken directly from the NWP model when fog is predicted. The probability of light fog can only be increased from zero, reducing complexity while adding skill.

Although the numerical *q*_{c} predictions are bimodal in all regions, the mountain region is unique in that the model does not exhibit a surplus of zero or near-zero *q*_{c} predictions. RH14 showed overall *q*_{c} bias to be near neutral or positive for most members in the mountain region, and the raw predictions were shown to produce the greatest skill of any region beyond 10 h. Therefore, adding skill to these predictions is not an objective. However, where indicated, the mountain region is included when optimizing and cross validating the method since it may not be possible to discriminate between regions in an operational setting.

RH14 showed that a negative RH bias at the lowest model level (layer 1) is largely responsible for the lack of predicted *q*_{c} in the coastal and valley regions, which in turn results mostly from a layer-1 warm bias that is greatest overnight. Predictions of 2-m water vapor *q*_{υ} in the valley region have no clear systematic error.

To further reduce complexity, the postprocessing works on *β*_{e} thresholds rather than *q*_{c}. The combined effects of NWP prediction error in *q*_{c} and visibility parameterization error (conversion of *q*_{c} to *β*_{e}) are treated together. Working mainly with heavy fog cases, Gultepe et al. (2006) showed *β*_{e} is strongly influenced by both *q*_{c} and the droplet number concentration. Separately postprocessing *q*_{c} predictions and then diagnosing *β*_{e} is expected to be much more difficult and costly. It would likely require archiving the output of number concentration and droplet size distribution information and greater availability of *q*_{c} observations against which to verify.

### c. Exploiting joint forecast distributions

The proposed approach estimates the probability of observed fog, given the joint probability of two NWP parameters from a single member of the ensemble. For an individual ensemble member, we estimate

$$P(\beta_e \mid x_1 = X_1,\; x_2 = X_2) \approx p,$$

where *P* is the exceedance probability for a given extinction *β*_{e} at given model-predicted values *X*_{1} and *X*_{2} of two particular parameters, and *p* is the probability of exceeding the threshold from a sample of similar historical model forecasts at *X*_{1} and *X*_{2}. The specific parameters used, *x*_{1} and *x*_{2}, are NWP predictions of two physical values related to fog. Choices for those are discussed below. An example, presented next, helps clarify the details of estimating the probability.

The method for estimating the probabilities is described using data from the coastal region plotted in Fig. 1 as an example. In Fig. 1a, each point represents a 2-m RH and 2-m vapor pressure prediction from an individual ensemble member at a single time when *no fog was predicted* by that member. Red points are missed opportunities; fog was not forecast but was observed (using the lowest *β*_{e} threshold). Blue points show predictions coinciding with no observed fog or correct rejections. The first 6 h of each case day are excluded to minimize the effect of model spinup. The points in Fig. 1a are the training data for postprocessing.

The majority of both missed opportunities and correct rejections appear to occur at predicted RH > 0.70. Within a given range of predicted RH values, an ideal predictor would lead to the ratio of observed fog cases to total cases being equal to either 0 or 1. Using only RH does not lead to values near 0 or 1, so RH alone is not a useful predictor of observed fog.

When examined in two-dimensional space, Fig. 1a reveals that the ratio of observed fog cases to total plotted cases is quite low when the members’ 2-m RH predictions are 0.70–0.80 *and* 2-m vapor pressure predictions are 6–8 hPa. The exact ratio within this two-dimensional bin is 12:208, or an incidence of about 0.058. For any member prediction that does not include fog, but has a 2-m RH prediction and 2-m vapor pressure prediction that falls in this two-dimensional bin, we could use this ratio to estimate the probability of fog that should be predicted by the member.
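The bin ratio quoted above is a simple count. The following minimal sketch reproduces it with synthetic points; the function name `bin_incidence` and the tuple layout are hypothetical and chosen only for illustration, and the synthetic points merely match the counts in the text (12 observed fog cases among 208 plotted cases in the bin).

```python
def bin_incidence(points, rh_range, vp_range):
    """Fraction of points inside a 2D bin for which fog was observed.

    points: iterable of (rh, vapor_pressure_hpa, fog_observed) tuples
            (hypothetical layout, for illustration only).
    Returns None when the bin is empty.
    """
    lo_rh, hi_rh = rh_range
    lo_vp, hi_vp = vp_range
    in_bin = [fog for rh, vp, fog in points
              if lo_rh <= rh < hi_rh and lo_vp <= vp < hi_vp]
    if not in_bin:
        return None
    return sum(in_bin) / len(in_bin)


# Synthetic data matching the counts in the text: 12 fog cases out of 208
# in the bin, plus a few points outside the bin that must be ignored.
points = ([(0.75, 7.0, True)] * 12
          + [(0.75, 7.0, False)] * 196
          + [(0.90, 10.0, True)] * 5)

incidence = bin_incidence(points, (0.70, 0.80), (6.0, 8.0))
print(round(incidence, 3))  # 0.058
```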

Instead of binning the points in Fig. 1a, or fitting functions to the distribution, we can choose a more flexible approach to summarizing it. We seek to retain resolution in the training data (e.g., Fig. 1a), while ensuring sufficient samples for each probability estimate. If fixed bins were chosen, variable data density would lead to variable sampling error across the range of forecast values.

Instead, a sample is formed to describe the probability at each point in Fig. 1a by choosing *N −* 1 additional points closest to that point. The distance is Euclidean, and the two axes are normalized by the range of the data of each parameter. These neighborhood sample regions are therefore circular in 2D space, with a unique sample radius at every point such that the sample size *N* is constant. Initially, *N* = 1/12 of all the data in the plot, which in the coastal region (having the smallest dataset of any region) equates to 499 points, or an average of 62 predictions from each of the eight ensemble members.^{1} Since only forecast hours 7–20 are used for each ensemble run (a total of 14 predictions per member, per case day), each sample contains data from at least five separate days. Additionally, each case day is spaced three or four calendar days apart to further reduce the correlation among the data.

At every point on the plot, each with its own sample radius, a probability is estimated as the ratio of missed opportunities in the sample region to the neighborhood sample size (here 499). Contours of this ratio in the joint parameter space, as shown in Fig. 1b, can take on any shape, with the largest gradients in regions of high data density. Conversely, estimated probabilities change little where data are sparse.
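The fixed-size neighborhood estimate can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function name `neighborhood_probabilities`, the argument names, and the brute-force neighbor search are assumptions, and the toy data are invented to make the behavior easy to check.

```python
import math


def neighborhood_probabilities(x1, x2, missed, n):
    """Estimate a fog probability at each training point.

    x1, x2 : predicted values of the two parameters (equal-length lists)
    missed : 1 if fog was observed (missed opportunity), else 0
    n      : constant sample size (the point itself plus its n - 1 nearest)
    """
    # Normalize each axis by the range of its data so that distances are
    # comparable between parameters with different units.
    r1 = (max(x1) - min(x1)) or 1.0
    r2 = (max(x2) - min(x2)) or 1.0
    pts = [(a / r1, b / r2) for a, b in zip(x1, x2)]

    probs = []
    for p in pts:
        # Brute-force nearest-neighbor search by Euclidean distance.
        order = sorted(
            range(len(pts)),
            key=lambda j: math.hypot(p[0] - pts[j][0], p[1] - pts[j][1]),
        )
        sample = order[:n]
        probs.append(sum(missed[j] for j in sample) / n)
    return probs


# Toy data: two well-separated clusters in (RH, vapor pressure) space.
rh = [0.70, 0.72, 0.74, 0.90, 0.92, 0.94]
vp = [6.0, 6.2, 6.4, 10.0, 10.2, 10.4]
missed = [0, 0, 0, 1, 1, 0]

probs = neighborhood_probabilities(rh, vp, missed, n=3)
```

Because the sample size is fixed, the effective radius shrinks where the data are dense and grows where they are sparse. For the dataset sizes described in the text, a spatial index would likely be preferable to the quadratic search used here.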

With greater data density, output probabilities are permitted greater sensitivity to small changes in the predictions, resolving patterns in the data that might otherwise be absent with samples spanning a broader range of data. As an example, in Fig. 1 where the predicted RH > 0.8, the incidence of fog increases rapidly with increasing predicted vapor pressure. Here, the lowest observed fog incidence indicated by the joint distribution, with a value of 0.002, is found at *high* RH predictions and low water vapor pressure predictions. The thermodynamic reasons for this will be discussed in section 3.

To verify the predictions from the training data, it is necessary that values at all points in the space between training data points be explicitly defined, because new predictions subject to postprocessing are not likely to match predictions in the training data. A multiple nonlinear regression technique would be required to properly fit the data to a function. Alternatively, we can interpolate estimated probabilities between the training data points. To interpolate between training data, we use the Delaunay triangulation scheme (Delaunay 1934). It is similar to bilinear interpolation, but references only three surrounding data points instead of four. This scheme is preferable for dealing with irregularly spaced (i.e., nongridded) data.
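Once a triangulation has located the triangle enclosing a new prediction, the interpolation itself reduces to a weighted average of the three vertex probabilities. The sketch below shows only that last step using barycentric weights; the function name and the example triangle are hypothetical, and constructing the triangulation is assumed to be delegated to a library routine (for example, SciPy's Delaunay-based linear interpolator).

```python
def barycentric_interpolate(tri, values, x, y):
    """Linearly interpolate at (x, y) from the three vertices of a triangle.

    tri    : three (x, y) vertex coordinates in the joint parameter space
    values : estimated probabilities at those vertices
    Returns None if (x, y) lies outside the triangle.
    """
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights: each is 1 at its own vertex and 0 at the others.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        return None  # outside the triangle
    return w1 * values[0] + w2 * values[1] + w3 * values[2]


# Hypothetical triangle in normalized parameter space with vertex probabilities.
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vertex_probs = [0.1, 0.5, 0.3]

p_inside = barycentric_interpolate(tri, vertex_probs, 0.25, 0.25)
p_outside = barycentric_interpolate(tri, vertex_probs, 1.0, 1.0)
```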

As described above, the postprocessed probability of the *β*_{e} threshold exceedance is estimated for each member that did not predict exceedance on its own. Ensemble members predicting fog on their own have a probability of 1. Then, the mean of the member probabilities gives the postprocessed probability of exceedance for the ensemble. The process for producing an output probability map is repeated at each of the remaining three *β*_{e} verification thresholds using the same parameter pair for each.
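The combination step described above can be written compactly. In this minimal sketch the function and argument names are hypothetical, and the lookup probabilities are invented values standing in for interpolated output from the probability map.

```python
def ensemble_exceedance_probability(member_predicts_fog, lookup_probs):
    """Mean of per-member exceedance probabilities.

    member_predicts_fog : True where the member's raw prediction exceeds
                          the threshold on its own (probability 1)
    lookup_probs        : postprocessed probability for each member, used
                          only where the member did not predict exceedance
    """
    probs = [1.0 if fog else p
             for fog, p in zip(member_predicts_fog, lookup_probs)]
    return sum(probs) / len(probs)


# Eight members: two predict fog directly, six rely on the lookup table.
predicts_fog = [True, True, False, False, False, False, False, False]
lookup = [0.0, 0.0, 0.25, 0.5, 0.0, 0.25, 0.5, 0.5]

p_ens = ensemble_exceedance_probability(predicts_fog, lookup)  # 0.5
```

Because members that already predict fog keep a probability of 1, the postprocessing in this sketch can only raise the ensemble probability relative to the raw ensemble, never lower it.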

Outside of physical intuition, we lack rigorous guidance on choosing parameter pairs to form joint distributions. Below, we thoroughly explore potential pairs and choose a simple metric to measure the predictive potential of each pair. The variance of estimated probabilities across each plot serves the purpose, that is, the mean squared difference between the output probability at each point and climatological rate of probabilities for that joint distribution. The variance measures the potential for predictive *resolution* in the estimated probabilities, where resolution is one of two components of overall stochastic prediction accuracy and is defined as the degree to which the final probabilistic predictions vary in conjunction with the observed frequency of occurrence without regard to the bias of the predicted probabilities. Here, the variance is used to assist with selecting the most promising parameter pairs, later subject to full cross validation in section 4, where the exact resolution will be computed.
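As a rough illustration of the screening metric, the sketch below computes the mean squared difference between the estimated probabilities and a reference rate. It assumes the reference is the mean of the estimated probabilities unless a climatological rate is supplied; the function name and that default are assumptions made for illustration.

```python
def plot_variance(probs, climatology=None):
    """Mean squared difference between estimated probabilities and a reference.

    probs       : estimated probabilities at every training point in the plot
    climatology : reference rate; defaults to the mean of probs (assumption)
    """
    if climatology is None:
        climatology = sum(probs) / len(probs)
    return sum((p - climatology) ** 2 for p in probs) / len(probs)


# A parameter pair that cannot separate fog from no fog yields a flat map,
# while a pair with sharp contrasts yields a larger variance.
flat_map = [0.25, 0.25, 0.25, 0.25]
sharp_map = [0.0, 0.0, 0.5, 0.5]

var_flat = plot_variance(flat_map)    # 0.0
var_sharp = plot_variance(sharp_map)  # 0.0625
```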

This method for initially screening parameter pairs might seem to oversimplify the process by ignoring the other component of stochastic predictive accuracy, *reliability*, which measures the magnitude of the bias of the probabilistic predictions. During initial screening, the reliability is not easily estimated a priori because it is inherently perfect for the training data used to build the plots. Not until cross validating the most promising parameter pairs will we know the true reliability, and this will depend on the degree of overfitting of the training data. The inability to estimate reliability during the screening process should not favor any particular parameter pair because the use of standardized sample sizes equalizes the potential impact of data overfitting across all the screened parameter pairs. For additional exploration of resolution and reliability as stochastic metrics, see the mathematical definitions provided in Table 2, as well as a detailed discussion in Wilks (1995).

Table 2. Description of metrics used to assess stochastic predictions from the ensemble. Here, *M* is the number of forecast/observation pairs; *I* is the number of probability bins (11); *N* is the number of data pairs in bin *i*; *p*′_{e} is the center of the forecast probability bin (0.025, 0.1, 0.2, 0.3, …, 0.8, 0.9, 0.975) for bin *i*; *ō*_{i} is the observed relative frequency for bin *i*; *ō* is the climatological frequency (total occurrences/total forecasts); and *T* is the number of event thresholds.
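For readers who want a concrete form, the sketch below computes reliability and resolution for a single event threshold using the standard binned decomposition of the Brier score (as in Wilks 1995). It is an assumption that the table's definitions follow this form; assigning each forecast to the nearest bin center and omitting any averaging over the *T* thresholds are likewise simplifications made for illustration.

```python
def reliability_and_resolution(forecast_probs, observed, bin_centers):
    """Binned reliability and resolution terms for one event threshold.

    forecast_probs : forecast probabilities (length M)
    observed       : 1 if the event occurred, else 0 (length M)
    bin_centers    : centers of the I forecast-probability bins
    """
    m = len(forecast_probs)
    clim = sum(observed) / m  # climatological frequency
    bins = {c: [] for c in bin_centers}
    for p, o in zip(forecast_probs, observed):
        nearest = min(bin_centers, key=lambda c: abs(c - p))  # assumed binning
        bins[nearest].append(o)

    rel = 0.0
    res = 0.0
    for center, obs in bins.items():
        if obs:
            n_i = len(obs)
            o_i = sum(obs) / n_i  # observed relative frequency in bin i
            rel += n_i * (center - o_i) ** 2
            res += n_i * (o_i - clim) ** 2
    return rel / m, res / m


# The 11 bin centers listed in the caption.
centers = [0.025] + [i / 10 for i in range(1, 10)] + [0.975]

# Perfectly reliable toy forecasts: 10% and 90% probabilities verify at
# exactly those observed frequencies.
forecasts = [0.1] * 10 + [0.9] * 10
outcomes = [1] + [0] * 9 + [1] * 9 + [0]

reliability, resolution = reliability_and_resolution(forecasts, outcomes, centers)
```

In this convention a smaller reliability term and a larger resolution term both indicate better probabilistic forecasts.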

Based on the systematic NWP model errors identified in Ryerson (2012) and RH14, we limit the parameter candidates to temperature and moisture variables at both model layer 1 (19–21 m above the model’s ground level) and diagnosed 2-m values, and parameters easily derived from them. Those include RH, virtual temperature, and vapor pressure depression (i.e., the difference between the saturation vapor pressure and the vapor pressure). The 2-m predictions of temperature and water vapor are diagnosed in a WRF atmospheric surface-layer scheme based on Monin–Obukhov theory and various flux-profile relationships [cf. Stull (1988) for a survey]. In addition, variable *deficits*, defined as the 2-m diagnosed values minus the layer-1 prediction values, are also included as parameter candidates.

Some of the NWP model deficiencies examined in RH14 exhibited a time dependence, so we include the time rate of change of each parameter, computed by subtracting the prediction at the previous hour from the current prediction, as an alternative parameter candidate in the evaluation.
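The derived candidates described in this subsection are simple differences. The sketch below illustrates them with invented values; the function names are hypothetical, and the sign convention for the rate of change (current hour minus previous hour) is an assumption.

```python
def deficit(value_2m, value_layer1):
    """Deficit: 2-m diagnosed value minus the layer-1 predicted value."""
    return value_2m - value_layer1


def vapor_pressure_depression(e_sat_hpa, e_hpa):
    """Saturation vapor pressure minus vapor pressure (hPa)."""
    return e_sat_hpa - e_hpa


def hourly_rate_of_change(series):
    """1-h change for each hour after the first (current minus previous)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Invented example: a predicted low-level inversion gives a negative
# virtual temperature deficit.
tv_deficit = deficit(281.5, 282.0)                # -0.5 K
depression = vapor_pressure_depression(10.0, 9.5)  # 0.5 hPa
cooling = hourly_rate_of_change([284.0, 283.0, 281.5])  # [-1.0, -1.5]
```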

The complete list of parameter candidates initially evaluated is listed in Table 3. Including the time rate of change of each parameter, a total of 946 joint parameter combinations are screened by computing the spatial variance of estimated probabilities in the parameter space. While some of the parameters in Table 3 may appear redundant (such as temperature and saturation vapor pressure, which are functions only of each other), the relationship between similar variables is nonlinear, and therefore each variable produces unique patterns and predictive potential when plotted in the joint parameter space where the axes are linear.
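The count of 946 is consistent with all unordered pairs drawn from 44 candidates, since 44 × 43 / 2 = 946. The sketch below checks this under the assumption that the candidate list contains 22 parameters plus the rate of change of each; the value 22 is inferred from the total rather than read from the table, and the parameter names are placeholders.

```python
from itertools import combinations

# Assumption: 22 base parameters, inferred from the stated total of 946.
n_base = 22
base = [f"param_{i}" for i in range(n_base)]
rates = [f"d({name})/dt" for name in base]  # 1-h rate of change of each
candidates = base + rates

# Every unordered pair of distinct candidates defines one joint parameter space.
pairs = list(combinations(candidates, 2))
n_pairs = len(pairs)  # 946
```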

Table 3. Predicted parameters considered for use in a parameter pair to define a joint parameter space. In addition, the 1-h time rate of change of each parameter was also considered as its own parameter. The cloud water mass concentration predictions only include values ≤ 8.5 × 10^{−4} g m^{−3} since anything larger than this is not subject to postprocessing and therefore is not in the training dataset. Parameters denoted in boldface, along with the time rate of change of all listed parameters, were considered for use in BT.

## 3. Selection of parameter pairs

This section presents the results of the parameter screening, based on three distinct approaches to constructing and summarizing the joint distributions. The “Best Overall” (BO) experiment is chosen as the parameter pairing producing the largest variance when the data from all three geographic regions are included. The “Sample Size” experiment examines the impact of changes to the neighborhood sample size for the parameter pairing found in BO. Finally, we take a more critical view and examine parameter pairings we believe are less dependent on the local climatology of the sites used in the training data than the pairing found in BO, making them more geographically transferable. We call this experiment “Best Transferable” (BT).

The parameter screening sums the plot variances for all four *β*_{e} thresholds. This has the effect of giving greater weight to the lowest *β*_{e} thresholds because they have more variability among the screened parameter pairs (i.e., the ability to predict fog of any severity is given higher priority than the ability to predict only the heavy fog cases).

### a. Best Overall

The virtual temperature deficit paired with predictions of layer-1 vapor pressure produces the greatest variance when all three geographic regions are included in the training sample. The data and corresponding probability plots for this parameter pair are shown in Fig. 2, with rows corresponding to each of the four *β*_{e} thresholds increasing from top to bottom. The range of probabilities (and plot variance; not shown) is lower at the greater *β*_{e} thresholds. The plots show that heavy fog events (Figs. 2g,h) are not clearly distinguishable. As the fog threshold decreases toward lighter fog (greater visibilities), the gradients in the probabilities increase, indicating a greater ability to resolve predicted fog events.

Vapor pressure predictions exhibit high predictive power in the coastal region (Fig. 1), with a low observed incidence of fog when the 2-m vapor pressure predictions are low. We propose this predictive mechanism exists because, as described in Ryerson (2012), during the overnight hours in this region, predicted low-level vapor pressure is a better predictor of observed 2-m temperature than the low-level temperature predictions themselves. At low temperatures, and therefore low vapor pressure predictions, upward heat flux from the sea surface maintains a weakly turbulent boundary layer that favors low stratus clouds rather than fog.

The predictive power of virtual temperature deficit predictions is also tied to stability, with a particularly strong signal in the valley region (not shown). Negative values (i.e., the predicted 2-m virtual temperature is less than the layer-1 virtual temperature) correspond to predicted low-level temperature inversions. Inversions in the valley region are typically produced by overnight radiational cooling of the ground and are required for radiation fog. To a certain extent, predictions of the virtual temperature *deficit* help mitigate the impact of volatility in the temperature, water vapor, and RH predictions, which RH14 showed had a bias that changes sign after sunrise, making the deficit a viable predictor for fog.

The postsunrise period remains challenging. The challenge is manifest in Fig. 2 by numerous instances of fog events during predictions of a positive virtual temperature deficit. Fog not associated with a predicted inversion is not well resolved by any joint parameter distributions when considering data from all the regions, which limits potential skill increases in the valley region during these hours.

The parameter pairs with the most predictive power are also those where the mountain region predictions exist in a different sector of the space than the rest of the data. Then, the mountain predictions can be assigned appropriately low probabilities, preserving the NWP skill in the mountains as much as possible. This is beneficial for skill in the other regions as well, as probabilities are not lowered by the excessive influence from the mountain region predictions. For predicted layer-1 vapor pressures used in BO, mountain region values are typically <6 hPa, effectively separating much of the mountain data from the coastal and valley predictions, which have values generally >6 hPa.

### b. Sample Size

The degree to which the joint distribution training data are overfit largely depends on the sample size used. Larger samples reduce the risk of overfitting and increase the likelihood of reliability improvement, but potentially reduce resolution as the probability forecasts approach the climatological incidence. Samples that are too small and have overfit the training data capture unresolved high-frequency variations in the predictions, rather than a systematic NWP model behavior, potentially resulting in reliability and resolution decreases.

Predictions are tested using modified versions of the BO joint parameter space map with different sample sizes. The Sample Size–Large experiment uses a sample size increased by 50%, such that each sample includes 1/8 of the total data rather than 1/12 as used elsewhere. The Sample Size–Small experiment uses samples that are 33% smaller than BO or 1/18 of the total data. The resulting postprocessing maps are shown in Fig. 3, with the standard sample size used in BO also included for comparison (center column).
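The sample-size fractions quoted above can be verified with exact rational arithmetic. In this short check the variable names are arbitrary, and the scaled coastal counts at the end are rough extrapolations from the 499-point coastal sample size stated earlier, not values reported in the text.

```python
from fractions import Fraction

base = Fraction(1, 12)          # standard sample fraction used in BO
large = base * Fraction(3, 2)   # increased by 50%
small = base * Fraction(2, 3)   # one-third smaller

change_large = (large - base) / base  # +1/2
change_small = (small - base) / base  # -1/3

# Approximate coastal neighborhood sizes implied by scaling the 499-point
# standard sample (illustrative extrapolation only).
approx_large = float(499 * Fraction(3, 2))  # about 748
approx_small = float(499 * Fraction(2, 3))  # about 333
```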

### c. Best Transferable

The BT experiment examines parameter pairs that might have more worldwide transferability because their predictive power is expected to be less reliant on a particular aspect of the local climatology than the parameter pair identified in BO. For example, BO revealed a range of predicted vapor pressure values favorable for fog, but this range likely depends on local conditions specific to the California coast during the test period (e.g., sea surface temperature). In BT, we still look for plots producing the largest variance, but subjectively restrict candidate predictors to parameters more easily ascribed to universal, as opposed to localized, physical mechanisms. This eliminates several parameters that are a single absolute value (e.g., vapor pressure, temperature, and wind direction), while favoring parameters defined by a ratio (e.g., RH), difference (e.g., vapor pressure deficit), or time rate of change. In Table 3, parameters denoted in boldface, along with the time rate of change of all parameters, are candidates for BT.

The BT experiment consists of four tests, each using a distinct dataset. The first three of these aim to find the best transferable parameter pair for individual regions or region combinations. These are a coastal domain, utilizing data only from the two coastal sites; a valley domain, using the data from the three Central Valley sites; and a combined valley/mountain domain, which uses data from the five sites in the valley and mountain regions. These domain-optimized pairings leverage both the unique traits of the systematic NWP error in each domain and the aspects of the predictions with the most predictive skill. Operationally, they are intended for applications such as small NWP model domains with little geographical variation, or point forecasts for which the domain category can be appropriately defined. Finally, BT identifies the most promising geographically transferable parameter pair found when including data from all three regions. This test is similar to BO, except that here the parameter pairs are restricted as described above.

Evaluation of the BT candidate predictors shows that along the coast the 2-m RH paired with the virtual temperature deficit provides the most accurate fog predictions (Fig. 4). Figure 1 showed that 2-m RH paired with 2-m vapor pressure predicts fog reasonably well in this region, especially by ruling out fog when predicted RH values are low. For predicting inversions, the 2-m virtual temperature deficit appears to be an adequate substitute for the 2-m vapor pressure used in Fig. 1 and is less likely to be location specific.

The mechanism by which the 2-m virtual temperature deficit indicates stability near the coast is fundamentally the same as with a radiation inversion in a valley: the 2-m temperature predictions will have values in between the layer-1 predictions and the surface (soil or sea) temperature in the member, and so negative deficits indicate that the surface temperature is likely colder than the layer-1 temperature in the ensemble member, and a stable lower boundary layer exists. A stable boundary layer alone is not sufficient for fog in the coastal region, but Fig. 4 indicates an incidence > 0.4 at the lowest *β*_{e} threshold if the predicted RH is also >0.8.

The coastal region is heavily influenced by the stability over water, but the sites themselves are on land and are also affected by diurnal radiative forcing. Figure 4 indicates the incidence of fog is very low when the predicted virtual temperature deficits are >0.5 K, which tend to occur during either cold outbreaks (unstable marine boundary layer) or postsunrise radiative heating.

NWP predictions at the coastal sites are bilinearly interpolated from two NWP model grid points over land and two over water and represent some mixture of marine and terrestrial boundary layer structure within a few kilometers of the coast. Whether this is overall beneficial is not known, but it presumably lends some consistency to the error characteristics and postprocessing performance among the stations compared to using the single nearest grid point to each station, which might be over land or water. Another potential approach, not used here, is to use the nearest land grid point to the station in order to isolate the error characteristics from any complicating effects of data from grid points over water.
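For readers unfamiliar with the interpolation step, the following is a minimal sketch of bilinear interpolation from four surrounding grid points to a station location. The unit-cell coordinates, function name, and RH values are hypothetical; the sketch does not reproduce the paper's actual grid geometry or code.

```python
def bilinear_interpolate(f00, f10, f01, f11, x, y):
    """Bilinear interpolation inside a unit grid cell.

    f00, f10, f01, f11 are values at corners (0,0), (1,0), (0,1), (1,1);
    x and y are the station's fractional position in the cell, each in [0, 1].
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie within the grid cell")
    bottom = f00 * (1.0 - x) + f10 * x   # interpolate along x at y = 0
    top = f01 * (1.0 - x) + f11 * x      # interpolate along x at y = 1
    return bottom * (1.0 - y) + top * y  # then interpolate along y


# Hypothetical 2-m RH: two water points (west, x = 0) and two land
# points (east, x = 1), with the station closer to the land points.
rh_station = bilinear_interpolate(0.95, 0.80, 0.93, 0.78, x=0.75, y=0.5)
```

Because the weights vary smoothly with station position, the interpolated value blends marine and terrestrial grid points rather than switching abruptly between them, which is the consistency argument made above.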

The greatest probability variances of all the parameter pairs tested for the valley sites are produced by a pairing expected to be among the most transferable: the saturation vapor pressure deficit and layer-1 vapor pressure depression (Fig. 5; the *y* axis has been inverted so that a smaller vapor pressure depression, which generally corresponds to higher RH, is at the top of the plot). As saturation vapor pressure depends only on temperature, negative values of the saturation vapor pressure deficit correspond to predicted low-level temperature inversions, which are strongly correlated with fog incidence in this region. A large portion of the space associated with predicted inversions shows a fog incidence exceeding 0.8 at the lowest *β*_{e} threshold.

Even when an inversion is predicted, the data show fog is less likely when the layer-1 vapor pressure depression is very small, which corresponds to high RH. To examine this more closely, Fig. 6 plots the mean observed and predicted saturation vapor pressure for the valley sites from NWP model runs when morning fog was not predicted but was observed (Fig. 6a) and when morning fog was neither predicted nor observed (Fig. 6b). The plots do not include cases when fog was predicted. Foggy days are characterized by more rapid cooling during the overnight hours (forecast hours 0–16), which is consistent with a conventional radiation fog scenario in which the cooling rates are higher due to minimal cloud cover and light winds.

The NWP cooling rate is accurate in the observed fog cases, but the saturation vapor pressure is initialized too high by about 3 hPa (or about 2–3 K), and maintains this bias throughout the night, resulting in erroneously low RH predictions (and likewise, erroneously high predictions of vapor pressure depression). In cases without fog (Fig. 6b), the NWP has minimal temperature bias at initialization and throughout the nighttime. RH predictions are reasonably accurate, with just small positive biases attributable to slightly positive vapor pressure biases.
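The link between a warm bias and a low RH bias can be made concrete with a short sketch. It assumes a Magnus-type saturation vapor pressure formula with the widely used Bolton coefficients and takes vapor pressure depression to be saturation vapor pressure minus vapor pressure; the formula choice, function names, and input values are illustrative assumptions, not the authors' stated method or data.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over water (hPa), Magnus-type formula
    with Bolton coefficients; temp_c is air temperature in deg C."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def rh_and_depression(temp_c, vapor_pressure_hpa):
    """Return (RH as a fraction, vapor pressure depression in hPa)."""
    e_s = saturation_vapor_pressure_hpa(temp_c)
    return vapor_pressure_hpa / e_s, e_s - vapor_pressure_hpa


# Hypothetical case: near-saturated air at 8 C, then the same vapor
# pressure evaluated with a 2.5 K warm bias in temperature.
e = 0.98 * saturation_vapor_pressure_hpa(8.0)
rh_true, dep_true = rh_and_depression(8.0, e)
rh_warm, dep_warm = rh_and_depression(10.5, e)
```

With the vapor pressure held fixed, the warmer temperature raises the saturation vapor pressure, so `rh_warm` falls below `rh_true` and `dep_warm` exceeds `dep_true`, matching the direction of the biases described above.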

The NWP model deficiency in predicting RH, or vapor pressure depression, results in an unconventional but effective set of predictors for fog. Among all days when the NWP model does not predict fog, it exhibits a warm bias that preferentially affects runs when fog is likely to form. Since the observed vapor pressure depression during the nighttime shows little difference between the fog and no-fog cases in the plot, this also leads to a positive vapor pressure depression bias in the NWP model that preferentially affects unpredicted fog days. Paired with the saturation vapor pressure deficit, the biased vapor pressure depression produces a useful joint distribution.

These results offer a subtle contrast to the low-level cooling rates suggested by Tardif (2007) for use as a radiation fog predictor. Here, using cooling rates as one of the predictors produces probability plots with variances peaking about 30% lower than those in Fig. 5. Figure 6 suggests cooling rates could be a valuable alternative for identifying radiation fog likelihood, perhaps more so if postprocessed in a way that allows the response in fog probability to lag the predictor (e.g., high cooling rates result in high fog probabilities at a later forecast hour).

At the mountain/valley combined sites, layer-1 RH paired with the virtual temperature deficit is the most promising (Fig. 7) among pairings expected to be transferable. Compared to BO, BT shows significant overlap between mountain and valley predictions. The overlap is particularly notable at high layer-1 RH, where the mountain data show high RH predictions without observed fog. Cross validation in section 4 will show this produces a negative impact for the postprocessed mountain region predictions.

When BT is optimized using the data from all regions (Fig. 8), model predictions of inversions again prove to be important for predicting fog. While BO paired the virtual temperature deficit with the layer-1 vapor pressure, BT pairs the virtual temperature deficit with the more transferable layer-1 RH.

This is the same parameter pair used for BT with valley/mountain optimization (Fig. 7). By including coastal data in the optimization here, the probabilities are lowered when the predicted virtual temperature deficit is >0 compared to probabilities in BT with valley/mountain optimization (i.e., when inversions are not predicted, fog is rarely observed in the coastal region).

In contrast, the range of layer-1 RH predictions most likely for fog aligns well between BT in the valley/mountain domain (Fig. 7) and BT using the data from all sites (Fig. 8). As we have documented, fog in the valley region is most likely with layer-1 RH predictions of 0.7–0.8 due to a warm bias that preferentially affects unpredicted fog days. Layer-1 RH predictions of 0.7–0.8 also correspond to the highest fog probability in the coastal region, which suffers from a layer-1 RH bias of about −0.20 during the overnight hours (RH14).

## 4. Verification

### a. Description

Cross validation of each experiment provides an indication of how well the joint parameter space technique might predict outcomes when employed with new data. We use leave-one-out cross validation, where each case day from the dataset is verified using the postprocessing probability map developed with data from the other 28 case days. Ryerson (2012) also examined parsing the data by member or by location for the purposes of leave-one-out cross validation but found that parsing by case day likely produces the lowest validation scores and is, therefore, the most difficult test.
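The case-day splitting can be sketched as follows. The record layout (dictionaries with `day`, `member`, and `site` keys) and the function name are hypothetical; only the 29-case-day structure, with 28 days used for training in each fold, is taken from the description above.

```python
def leave_one_day_out(samples):
    """Yield (held_out_day, training, validation) for each case day.

    samples is a list of dicts with at least a "day" key; all samples
    that share the held-out day go to validation, the rest to training.
    """
    days = sorted({s["day"] for s in samples})
    for held_out in days:
        training = [s for s in samples if s["day"] != held_out]
        validation = [s for s in samples if s["day"] == held_out]
        yield held_out, training, validation


# Hypothetical records: one sample per (case day, member, site).
samples = [
    {"day": d, "member": m, "site": s}
    for d in range(1, 30)          # 29 case days
    for m in range(2)
    for s in ("A", "B")
]
folds = list(leave_one_day_out(samples))
```

Holding out an entire case day keeps every member and site from that day out of the training set, so the probability map is never verified against data that share the same synoptic situation.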

Verification is performed using traditional stochastic metrics, also employed in RH14 and defined in Table 2. The ranked probability skill score (RPSS), a consolidated metric that combines the Brier skill score (BSS) at each of the four verification thresholds, is shown in Fig. 9 for the coastal sites (top panel), valley sites (middle panel), and mountain sites (bottom panel). Subsequent figures show reliability, resolution, and BSS results at the lowest *β*_{e} threshold (results at higher *β*_{e} thresholds are qualitatively similar but less skillful) for all the experiments at the coastal sites (Fig. 10), valley sites (Fig. 11), and mountain sites (Fig. 12). On all verification plots, results from each experiment are distinguished by the symbols and line types indicated in Fig. 13.
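As a rough guide to how these scores relate, the sketch below uses standard textbook forms (e.g., Wilks 1995): the Brier score as a mean squared probability error, a generic skill score relative to a reference forecast, and a ranked probability score taken as the sum of Brier scores over the exceedance thresholds. The exact formulations in Table 2, the choice of reference forecast, and all sample values here are assumptions.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0/1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill_score(score, reference_score):
    """Generic skill score: 1 is perfect, 0 matches the reference."""
    return 1.0 - score / reference_score


def rpss(probs_by_threshold, outcomes_by_threshold, ref_by_threshold):
    """RPSS from per-threshold exceedance probabilities.

    The ranked probability score is taken as the sum of the Brier scores
    over the thresholds, for both the forecast and the reference.
    """
    rps = sum(brier_score(p, o)
              for p, o in zip(probs_by_threshold, outcomes_by_threshold))
    rps_ref = sum(brier_score(r, o)
                  for r, o in zip(ref_by_threshold, outcomes_by_threshold))
    return skill_score(rps, rps_ref)


# Hypothetical forecasts at two thresholds for four cases.
outcomes = [[1, 0, 1, 0], [1, 0, 0, 0]]
forecast = [[0.8, 0.2, 0.6, 0.1], [0.5, 0.1, 0.2, 0.0]]
reference = [[0.5] * 4, [0.25] * 4]   # e.g., a constant base-rate forecast

bss_first = skill_score(brier_score(forecast[0], outcomes[0]),
                        brier_score(reference[0], outcomes[0]))
rpss_value = rpss(forecast, outcomes, reference)
```

A negative value from `skill_score` means the forecast scores worse than the reference, which is how the RPSS < 0 and BSS < 0 results reported for the mountain region should be read.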

### b. BO, BT, and comparison to Cntrl

Figure 9 indicates both BO and BT add skill to the stochastic ensemble predictions in the coastal (top panel) and valley (middle panel) regions at most hours, regardless of the data used for optimization. Skill improvements result from a combination of reliability and resolution improvements (Figs. 10 and 11), with greater resolution improvements in the coastal region. Reliability improvements are not surprising since RH14 showed the raw NWP predictions have a negative *q*_{c} bias, and postprocessing with the joint distributions can only maintain or increase the probability of *β*_{e} exceedance. The resolution improvement is encouraging, because it suggests the postprocessing is effective at making larger upward probability adjustments to the predictions when fog occurs.
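The reliability and resolution terms discussed here can be illustrated with the standard Murphy decomposition of the Brier score, BS = reliability − resolution + uncertainty. The sketch groups cases by their exact forecast probability, which suits the discrete probabilities of a small ensemble; whether Table 2 uses this exact grouping is an assumption, and the sample data are hypothetical.

```python
from collections import defaultdict


def brier_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty) of the Brier score,
    grouping cases by their exact forecast probability value."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    # Reliability: squared gap between forecast value and observed frequency.
    reliability = sum(len(obs) * (p - sum(obs) / len(obs)) ** 2
                      for p, obs in groups.items()) / n
    # Resolution: squared gap between observed frequency and the base rate.
    resolution = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
                     for obs in groups.values()) / n
    uncertainty = base_rate * (1.0 - base_rate)
    return reliability, resolution, uncertainty


# Hypothetical probabilities and observed outcomes (1 = fog).
probs = [0.1, 0.1, 0.1, 0.6, 0.6, 0.9, 0.9, 0.9]
outcomes = [0, 0, 1, 1, 0, 1, 1, 1]
rel, res, unc = brier_decomposition(probs, outcomes)
```

In this form a smaller reliability term and a larger resolution term both lower the Brier score, so the two kinds of improvement described above contribute to skill independently.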

In the valley region (Figs. 9b and 11), all of the experiments show large skill improvements during the overnight hours (forecast hours 7–15). Improvements are driven mostly by reliability improvements, but also resolution improvements in most experiments. The postsunrise hours (forecast hours 16–20) are characterized by more modest skill improvements, with small improvements in both reliability and resolution.

None of the experiments produce appreciable skill increases in the mountain region, with most experiments producing RPSS < 0 and BSS < 0 until after sunrise (Figs. 9c and 12). Making upward adjustments to zero or near-zero *q*_{c} predictions is not well suited to this region because the raw model predictions do not exhibit a surplus of these predictions, and the overall *q*_{c} bias is near neutral or positive for most members. Despite this drawback, more skillful results are likely if the technique were optimized for the mountain region by itself. This was not tested since the raw model predictions in this region are far more skillful than in the other regions, presumably having less to gain from postprocessing.

Although not presented here, Ryerson (2012) showed that significant skill increases in the coastal and valley regions could be achieved by using 2-m RH as a single probabilistic predictor during the overnight hours. The single predictor performed poorly after sunrise, leading to nominal skill changes in the coastal region, and skill decreases in the valley and mountain regions compared to Cntrl (see description in Fig. 13). The primary advantage of the most skillful BO and BT pairs of predictors, over the single predictor, is the postsunrise performance. It is equal to the single-predictor results in the coastal region and is more skillful in the valley and mountain regions. Two predictors also maintain overnight skill equal to or better than the single-parameter technique in all regions. Increased skill results from the virtual temperature deficit or saturation vapor pressure deficit predictions, which appeared as part of the optimal configuration in every region. The predictors based on stability offer an additional degree of freedom that appropriately adjusts fog probability downward during postsunrise heating.

Despite its use of the most promising overall parameter pair without regard to worldwide transferability, in the coastal and valley regions BO seems to offer no appreciable skill advantage over BT optimized for the all-regions domain, even after sunrise (Fig. 9).

### c. Domain optimization

Figure 9 shows, for the valley (center panel) and mountain (bottom panel) regions, BT with valley/mountain optimization (dotted line) performs similarly to BT with all-regions optimization (solid line). This indicates the addition of coastal region predictions to the joint distribution has little effect on the output probabilities for the valley and mountain regions. For coastal-only applications, Fig. 9a shows that 2-m RH in BT with coastal optimization (dashed line), instead of layer-1 RH used in BT with all-regions optimization (solid line), produces a slight skill advantage after sunrise. Otherwise, the effect is minimal.

In the valley region, Fig. 9b shows BT with all-regions optimization (solid line) is skillful, and Fig. 11 indicates this is mostly via reliability improvements. BT with valley optimization (dashed line) also produces significant reliability gains, in addition to resolution gains during the overnight hours. The parameter pair for BT with valley optimization is clearly the most promising among those tested for valley-only applications.

Because there is no acceptable transferable parameter pair that produces positive skill in the mountains, the joint parameter postprocessing framework developed in this work should not be applied in a mountain region.

### d. Sample Size–Large and Sample Size–Small

Results from the Sample Size–Large and Sample Size–Small experiments show that there are no clear skill differences compared to BO at the lowest *β*_{e} threshold (Fig. 9), and only small differences in reliability and resolution that vary by region and forecast hour (Figs. 10–12). We conclude that the particular joint parameter distribution used for this experiment has low sensitivity to bin size within the range of bin sizes tested using this relatively small dataset.
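To make the role of bin size concrete, the following sketch builds a joint-parameter incidence map by binning two predicted parameters and computing the observed fog frequency in each bin, once with smaller bins and once with larger ones. The bin widths, parameter values, and function name are hypothetical, and the sketch omits any steps the actual technique applies beyond simple binning.

```python
from collections import defaultdict


def incidence_map(x_vals, y_vals, fog_observed, x_width, y_width):
    """Observed fog frequency in each 2-D bin of two predicted parameters.

    Returns {(ix, iy): (incidence, count)} where ix and iy are integer
    bin indices obtained by flooring value / bin width.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for x, y, fog in zip(x_vals, y_vals, fog_observed):
        key = (int(x // x_width), int(y // y_width))
        totals[key] += 1
        hits[key] += 1 if fog else 0
    return {k: (hits[k] / totals[k], totals[k]) for k in totals}


# Hypothetical predictions: virtual temperature deficit (K) and layer-1 RH,
# with the matching fog observations (1 = fog observed).
deficit = [-1.2, -0.8, -0.3, 0.4, 0.9, -0.6]
rh = [0.85, 0.82, 0.75, 0.62, 0.55, 0.88]
fog = [1, 1, 0, 0, 0, 1]

small_bins = incidence_map(deficit, rh, fog, x_width=0.5, y_width=0.1)
large_bins = incidence_map(deficit, rh, fog, x_width=1.0, y_width=0.2)
```

Larger bins pool more samples per bin, which steadies each incidence estimate at the cost of detail in the map; the experiments above suggest that this trade-off matters little over the range tested.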

## 5. Summary and conclusions

The goal of this research was to investigate the viability of improving short-term (<20 h) probabilistic fog predictions using joint parameter distributions to postprocess NWP output in a way suitable for use in sparsely observed regions. Results suggest the method can add significant overnight skill in valley and coastal regions compared to raw mesoscale ensemble forecasts, with modest skill increases after sunrise.

The postprocessing strategy differs from many typical methods in that it is not a site-specific calibration. Instead, it aims to maintain a measure of transferability by correcting for systematic errors in the mesoscale ensemble predictions by leveraging joint parameter pairs that have a close and recognizable physical link to fog. Previous related work (RH14) found the primary systematic error in the mesoscale ensemble to be a low-level warm bias in coastal and valley regions that is greatest overnight, resulting in a negative RH bias, and consequently a lack of predicted *q*_{c}.

The highest-performing parameter pairs invariably included a moisture parameter such as RH or vapor pressure depression and a low-level stability parameter. The stability parameter is crucial for preventing rapid skill decreases postsunrise, when RH bias changes abruptly and, in the valley region, RH error variance increases (RH14).

The success of BT extends the results from two earlier studies. First, it supports the experience of Gultepe et al. (2007b) that NWP predictions of near-surface RH are a useful, but generally inadequate, component of a fog detection or prediction scheme. Second, it supports the finding of Hippi et al. (2010), who found observed temperature differences between the surface and 500 m, and surface RH, to be the two best fog predictors at two surface stations in Finland. Results here extend the useful predictive time and show that the virtual temperature deficit and layer-1 RH model predictions are useful indicators of observed fog.

At the coastal sites the best transferable parameter pair (BT) shows little advantage from using a coastal optimization (which used parameters of the virtual temperature deficit and 2-m RH) compared to an all-regions optimization (which replaces the 2-m RH with layer-1 RH). At valley sites, there was clear skill improvement using BT with all-regions optimization, but even greater skill is possible in valley-only applications by using BT with valley optimization. The valley-optimized probability map uses the saturation vapor pressure deficit and layer-1 vapor pressure depression rather than the virtual temperature deficit and layer-1 RH, giving the largest skill improvement of any technique.

None of the experiments improve the already skillful unaltered NWP model predictions in the mountain region.

For the joint parameter technique developed in this work, conservatively large bins were used to minimize the risk of overfitting the training data. The results of the Sample Size–Large and Sample Size–Small experiments indicate the overall reliability, resolution, and skill have low sensitivity to bin size in the range of bin sizes used.

In addition to using large bins, several other measures were taken in this work to attempt to maintain as much worldwide transferability as possible within the prediction framework. These include 1) restricting the use of predictor candidates to those with a clear thermodynamic linkage to fog and excluding those with a linkage that might be speculative or vary by location; 2) in experiment BT, further restricting the candidate parameters to those believed to possess a high transferable quality; and 3) performing cross validation using the most difficult test.

## Acknowledgments

The authors are grateful for funding from the U.S. Air Force to complete this work and to NPS for funding submission fees. Support from NCAR, particularly Chris Snyder and Kate Smith, is also appreciated. Mary Jordan at NPS was helpful in preparing figures.

## REFERENCES

Boutle, I. A., A. Finnenkoetter, A. P. Lock, and H. Wells, 2016: The London Model: Forecasting fog at 333 m resolution. *Quart. J. Roy. Meteor. Soc.*, **142**, 360–371, https://doi.org/10.1002/qj.2656.

Delaunay, D., 1934: Sur la sphère vide. *Izv. Akad. Nauk SSSR [Khim]*, **7**, 793–800.

Geiszler, D. A., J. Cook, P. Tag, W. Thompson, R. Bankert, and J. Schmidt, 2000: Evaluation of ceiling and visibility prediction: Preliminary results over California using the Navy’s Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS). Preprints, *Ninth Conf. on Aviation, Range, and Aerospace Meteorology*, Orlando, FL, Amer. Meteor. Soc., 334–338.

Gultepe, I., M. D. Müller, and Z. Boybeyi, 2006: A new visibility parameterization for warm-fog applications in numerical weather prediction models. *J. Appl. Meteor. Climatol.*, **45**, 1469–1480, https://doi.org/10.1175/JAM2423.1.

Gultepe, I., and Coauthors, 2007a: Fog research: A review of past achievements and future perspectives. *Pure Appl. Geophys.*, **164**, 1121–1159, https://doi.org/10.1007/s00024-007-0211-x.

Gultepe, I., M. Pagowski, and J. Reid, 2007b: A satellite-based fog detection scheme using screen air temperature. *Wea. Forecasting*, **22**, 444–456, https://doi.org/10.1175/WAF1011.1.

Herzegh, P., and Coauthors, 2006: Development of FAA national ceiling and visibility products: Challenges, strategies and progress. *12th Conf. on Aviation, Range, and Aerospace Meteorology*, Atlanta, GA, Amer. Meteor. Soc., P1.17, https://ams.confex.com/ams/Annual2006/techprogram/paper_103545.htm.

Hippi, M., I. Juga, A. Rossa, F. Zardini, and F. Domenichini, 2010: Road weather and model development—Friction model and fog model. ROADIDEA Project Rep., Finnish Meteorological Institute–Regional Agency for Environmental Prevention and Protection of Veneto, 41 pp., https://cordis.europa.eu/docs/projects/cnect/5/215455/080/deliverables/001-ROADIDEAD34bRoadweathermodeldevelopment1.pdf.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. *Wea. Forecasting*, **22**, 466–479, https://doi.org/10.1175/WAF994.1.

Price, J., A. Porson, and A. Lock, 2015: An observational case study of persistent fog and comparison with an ensemble forecast model. *Bound.-Layer Meteor.*, **155**, 301–327, https://doi.org/10.1007/s10546-014-9995-2.

Ryerson, W. R., 2012: Toward improving short-range fog prediction in data-denied areas using the Air Force Weather Agency mesoscale ensemble. Ph.D. dissertation, Naval Postgraduate School, 225 pp., http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA567345.

Ryerson, W. R., and J. P. Hacker, 2014: The potential for mesoscale visibility predictions with a multimodel ensemble. *Wea. Forecasting*, **29**, 543–562, https://doi.org/10.1175/WAF-D-13-00067.1.

Skamarock, W. C., and Coauthors, 2008: A description of the Advanced Research WRF version 3. NCAR Tech. Note NCAR/TN-475+STR, 113 pp., http://dx.doi.org/10.5065/D68S4MVH.

Stull, R. B., 1988: *An Introduction to Boundary Layer Meteorology*. Kluwer Academic, 666 pp.

Tardif, R., 2007: The impact of vertical resolution in the explicit numerical forecasting of radiation fog: A case study. *Pure Appl. Geophys.*, **164**, 1221–1240, https://doi.org/10.1007/s00024-007-0216-5.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences: An Introduction*. Academic Press, 467 pp.

Zhou, B., 2011: Introduction to a new fog diagnostic scheme. NCEP Office Note 466, 43 pp.

Zhou, B., and B. S. Ferrier, 2008: Asymptotic analysis of equilibrium in radiation fog. *J. Appl. Meteor. Climatol.*, **47**, 1704–1722, https://doi.org/10.1175/2007JAMC1685.1.

Zhou, B., and J. Du, 2010: Fog prediction from a multimodel mesoscale ensemble prediction system. *Wea. Forecasting*, **25**, 303–322, https://doi.org/10.1175/2009WAF2222289.1.

Zhou, B., J. Du, J. McQueen, and G. DiMego, 2009: Ensemble forecast of ceiling, visibility and fog with NCEP Short-Range Ensemble Forecast System (SREF). *Aviation, Range, and Aerospace Meteorology Special Symp. on Weather-Air Traffic Management Integration*, Phoenix, AZ, Amer. Meteor. Soc., 4.5, https://ams.confex.com/ams/89annual/techprogram/paper_142255.htm.

^{1} In a WRF Model update notice dated 21 December 2011, primary model developers at the National Center for Atmospheric Research (NCAR) reported a bug affecting 2-m temperature predictions when the RUC land surface model is used in conjunction with the Yonsei University (YSU) PBL scheme. Members 15 and 17 in this work are configured with these two schemes. Although a new version of WRF was released by NCAR with the bug resolved, it was too late in this work to reproduce the NWP model runs. During development and testing of the *β*_{e} postprocessing technique in this work, these two members were largely excluded when a parameter pair involved 2-m predictions from the NWP model, resulting in an eight-member ensemble in these cases.