Air quality forecasts for the mid-Atlantic region (including the metropolitan areas of Baltimore, Washington, D.C., and Philadelphia) began in 1992. These forecasts have been issued to the public since 1995 and predict daily peak O3 concentrations (1-h average) within each metropolitan area. The forecasts are intended to warn sensitive populations of concentrations in excess of the National Ambient Air Quality Standard (NAAQS) for O3 and to initiate voluntary control programs (“ozone action days”) designed to reduce pollution. Ozone is a photochemical pollutant whose concentrations reach a maximum during the summer months, when days are long and the solar zenith angle is low. Forecasts are issued daily from mid-May to mid-September at approximately 1900 UTC and are valid the following day. The forecasts are based on statistical models that use primarily meteorological predictors. Forecasters treat the statistical model output as guidance and modify it to account for features the model does not resolve. The forecast is issued to the public as a color code, with “code red” indicating unhealthy conditions. The range of peak O3 concentrations in the mid-Atlantic region during a given season is typically 30–180 parts per billion by volume (ppbv). For the period of this study (1995–98) the Baltimore forecast area observed a seasonal mean peak O3 of 85.7 ppbv. Median absolute forecast error for the 1995–98 seasons was 9.0 ppbv and root-mean-square error was 14.8 ppbv, a 40% increase in skill over simple persistence forecasts. Nine of the 10 most severe cases during this period were correctly forecast as code red; the remaining case was forecast as “code orange” (O3 watch). Real-time photochemical models are currently being developed to forecast O3. The results presented here provide a benchmark against which to judge improvements from numerical models or other forecasting techniques.
Air quality forecasts predict concentrations of ozone (O3) near the surface. Ozone is a photochemical pollutant formed by reactions involving oxides of nitrogen (NOx = NO + NO2) and volatile organic compounds (VOCs or hydrocarbons) in the presence of sunlight (Crutzen 1979; Finlayson-Pitts and Pitts 1986). The major sources of NOx are combustion processes such as internal combustion engines and power generation. The major sources of VOCs are automobile emissions, industrial activity, and natural sources (certain types of pines and deciduous trees) (EPA 1997a). At high concentrations, O3 can pose a threat to human health (Lippman 1989; Bascom et al. 1996a,b) as well as to plants and materials (Haagen-Smit et al. 1951; Heck et al. 1982). As a result, O3 has been designated as a criteria pollutant for which the United States Environmental Protection Agency (EPA) has adopted a National Ambient Air Quality Standard (NAAQS). Historically, this standard was set at 0.12 ppmv (parts per million by volume) for a 1-h average (EPA 1986). Recent research showing that O3 may be harmful at lower concentrations when exposure is prolonged led to the adoption of a new NAAQS for O3, an 8-h mean concentration of 0.08 ppmv (EPA 1997c). Many major metropolitan areas are not in compliance with the O3 NAAQS and continue to exceed the standard on a number of days each year (Seinfeld 1991). To inform the public that the air may be unhealthy and to encourage voluntary reductions in precursor emissions (e.g., reduced automobile usage), air quality forecasts are issued. This paper outlines the results from initial forecast efforts in the mid-Atlantic region.
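The two averaging times in the standard can be made concrete with a short sketch. Assuming one day of hourly-average [O3] in ppbv (the values below are hypothetical), the 1-h metric is simply the largest hourly value and the 8-h metric is the largest running 8-h mean. The sketch ignores the regulatory details of the 8-h calculation (windows crossing midnight, data-completeness rules), so it is illustrative only.

```python
def daily_peak_1h(hourly_ppbv):
    """Largest 1-h average [O3] of the day (ppbv)."""
    return max(hourly_ppbv)


def daily_max_8h(hourly_ppbv, window=8):
    """Largest running 8-h mean over windows that lie entirely within the day.

    Simplification: no windows crossing midnight, no completeness checks.
    """
    if len(hourly_ppbv) < window:
        raise ValueError("need at least one full averaging window")
    means = [sum(hourly_ppbv[i:i + window]) / window
             for i in range(len(hourly_ppbv) - window + 1)]
    return max(means)


def ppmv_to_ppbv(value_ppmv):
    """Convert a mixing ratio from ppmv to ppbv."""
    return value_ppmv * 1000.0


# Hypothetical hourly [O3] (ppbv) for a sunny summer day, hours 00-23 local time
hourly = [30, 28, 25, 22, 20, 22, 30, 45, 60, 75, 88, 98,
          106, 112, 115, 113, 105, 92, 78, 62, 50, 42, 38, 34]

print("peak 1-h:", daily_peak_1h(hourly), "ppbv; 1-h standard:", ppmv_to_ppbv(0.12), "ppbv")
print("max 8-h:", daily_max_8h(hourly), "ppbv; 8-h standard:", ppmv_to_ppbv(0.08), "ppbv")
```

On this made-up day the 1-h peak (115 ppbv) stays below 0.12 ppmv while the 8-h maximum (about 104 ppbv) exceeds 0.08 ppmv, which shows how the longer averaging time can make the newer standard the harder one to meet.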
2. Ozone climatology in the mid-Atlantic
Because O3 formation requires ultraviolet (UV) radiation as well as precursors such as NOx and VOCs, O3 concentrations (denoted [O3]) reach a maximum during the summer months when solar zenith angle is low and length of day is long (Fig. 1). In addition to the direct effect of UV radiation on O3 production, emissions of some O3 precursors are temperature dependent and peak during midsummer. For example, biogenic, or natural, emissions of hydrocarbons are particularly sensitive to temperature and follow a seasonal cycle based on leaf growth (Goldstein et al. 1998). As a result, although UV radiation flux reaches a maximum at the summer solstice, peak [O3] typically lag the solstice by approximately 20 days and maximize in early July.
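The sun-angle side of this argument can be checked with a standard approximation for solar declination (Cooper's formula, an outside choice not taken from this paper) and an assumed latitude of 39.2°N for the Baltimore area. The noon zenith angle is smallest at the June solstice, so the roughly 20-day lag of peak [O3] cannot be explained by sun angle alone.

```python
import math

BALTIMORE_LAT_DEG = 39.2  # assumed approximate latitude of the Baltimore area


def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees (Cooper 1969 formula)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def noon_zenith_deg(lat_deg, day_of_year):
    """Solar zenith angle at local solar noon, in degrees."""
    return abs(lat_deg - solar_declination_deg(day_of_year))


# Day of year with the smallest noon zenith angle (highest sun)
sunniest_day = min(range(1, 366), key=lambda d: noon_zenith_deg(BALTIMORE_LAT_DEG, d))
print("minimum noon zenith angle on day", sunniest_day)
for day, label in [(172, "late June"), (192, "about 20 days later"), (355, "late December")]:
    print(label, round(noon_zenith_deg(BALTIMORE_LAT_DEG, day), 1), "deg")
```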
Emissions estimates suggest that air quality should be worse on weekdays than on weekends, but this does not appear to be entirely true. When mean peak 1-h [O3] are segregated by day of the week, there are no significant differences in concentrations between weekdays and weekends. However, extremely high O3 cases are less likely to occur on Sundays. Of the cases at or above the 95th percentile of the distribution ([O3] ≥ 137 ppbv), only 5% (4 of 84) occur on Sundays, compared with about 14% (1 in 7) if such cases were spread evenly across the week.
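One quick way to judge whether 4 Sundays out of 84 high-O3 days is unusual is a one-sided binomial tail probability, assuming each case independently has a 1-in-7 chance of falling on a Sunday. That assumption is optimistic: high-O3 days cluster in multiday episodes, so the cases are not independent and the resulting probability overstates the evidence. The sketch illustrates the arithmetic rather than reproducing any test from the study.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_cases = 84        # days at or above the 95th percentile (>= 137 ppbv)
n_sundays = 4       # of those, days falling on a Sunday
p_sunday = 1 / 7    # chance of a Sunday if day of week had no effect

expected = n_cases * p_sunday
p_tail = binomial_cdf(n_sundays, n_cases, p_sunday)
print(f"expected Sundays: {expected:.1f}, observed: {n_sundays}")
print(f"one-sided P(X <= {n_sundays}) = {p_tail:.4f}")
```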
The diurnal cycle of O3 follows the course of the sun. Concentrations maximize in mid- to late afternoon and reach a minimum just before sunrise. A typical diurnal profile of near-surface O3 on a sunny day is given in Fig. 2. The rapid reduction in O3 in the overnight hours is due primarily to impaction (dry deposition) on the surface, although there are also losses through chemical reactions (Finlayson-Pitts and Pitts 1986). Air parcels containing O3 that remain above the nocturnal inversion layer do not reach the surface and have much longer lifetimes (Liu et al. 1987). These air parcels can be mixed back down to the surface the following day or days, often far from their sources (Harriss et al. 1984; Anderson et al. 1994; Dickerson et al. 1995; Ryan et al. 1998; Knapp et al. 1998). This reservoir effect is the key to understanding regional-scale O3. In Fig. 2, the monitor near Frederick, Maryland, located upwind of the Baltimore area on this day (22 August 1998), shows a rapid increase in O3 in the midmorning hours coincident with the breakdown of the nocturnal inversion (Zhang and Rao 1998).
Because air quality forecasts are intended to protect public health, forecast skill prior to and during extended periods of unhealthful air quality (O3 episodes) is of primary importance. In the Baltimore region, [O3] in excess of the 1-h NAAQS are observed on approximately 13 days per year (1987–98 mean frequency). While emissions of most man-made O3 precursors are relatively constant day to day, weather conditions relevant to O3 formation can change significantly over the same timescale. It is the response of O3 to particular sets of weather variations that allows for forecast skill. In general, O3 concentrations will increase when incoming UV flux reaches a maximum (midsummer), clouds are few, and the volume into which precursors are mixed, and transported, is limited. These factors typically converge in the context of a slow moving anticyclone (Vukovich et al. 1977; Vukovich and Fishman 1986; Ryan 1995; Ryan et al. 1998). Seasonal weather anomalies can have a strong impact on the frequency of occurrence of these conditions (Cox and Chu 1993; Vukovich 1994; Flaum et al. 1996). The extreme heat and drought of 1988 led to 35 days in excess of the NAAQS in Baltimore while the wet and cool summer of 1996 contained only 5 such cases.
While weather conditions modulate [O3] to a considerable degree, it is also true that pollution controls instituted over the past several years have lowered precursor emissions (EPA 1997a,b). For certain large metropolitan areas, reductions in VOC emissions from automobiles may have succeeded in reducing extreme peak [O3] though mean concentrations have shown little change in the Baltimore–Washington region (Rao et al. 1992; Rao and Zurbenko 1994).
3. Preparation of forecasts
a. Description of forecasts
Air quality forecasts are prepared individually for each metropolitan area. This reflects the EPA practice of determining attainment with the NAAQS on a countywide basis within a consolidated metropolitan statistical area. The Baltimore forecast area, and the air quality monitors included in it, are shown in Fig. 3. There are currently 10 O3 monitors operated by the state of Maryland within this forecast area. The Philadelphia and Washington, D.C., forecast areas are of similar areal extent. Ozone concentrations in any city in this region cannot be considered to be a function solely of precursor emissions within that city. Because O3 has a lifetime of several days in the free troposphere, it can be transported significant distances. As a result, forecasters must take into account conditions at a much wider spatial and temporal scale than the forecast area itself.
Forecasts are verified against peak 1-h average [O3] at any monitor within the forecast area. Verifying the forecast at the most local scale (a single monitor), when peak [O3] are, as noted above, influenced by factors operating at much larger scales, creates a mismatch between the scale of the forecast phenomena and that of the verification. However, the air quality forecast is designed to protect public health, and peak single-monitor concentrations are the basis for the applicable NAAQS. Thus, the single-monitor standard is a valid basis for warning and verification, although it can lead to overprediction of [O3] for the region as a whole.
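The single-monitor verification rule amounts to taking a maximum over monitors. A minimal sketch, assuming each monitor reports one daily peak 1-h value; the monitor names and numbers are hypothetical, and skipping missing reports is an assumption rather than a documented procedure.

```python
def area_peak(monitor_peaks):
    """Verifying value for a forecast area: highest daily peak 1-h [O3]
    (ppbv) at any monitor. Monitors with no report are given as None."""
    reported = [v for v in monitor_peaks.values() if v is not None]
    if not reported:
        raise ValueError("no monitor reported a value")
    return max(reported)


def forecast_error(forecast_ppbv, monitor_peaks):
    """Signed error, forecast minus verifying area peak (ppbv)."""
    return forecast_ppbv - area_peak(monitor_peaks)


# Hypothetical monitors in one forecast area
peaks = {"monitor_A": 96, "monitor_B": 118, "monitor_C": None, "monitor_D": 104}
print(area_peak(peaks), forecast_error(110, peaks))
```

Because the verifying value is a maximum over sites, a forecast tuned to it will run high for most individual monitors, consistent with the overprediction for the region as a whole noted above.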
In order to give the public sufficient time to protect themselves or alter their behavior based on the expected air quality, the forecasts are issued daily (May–September) at 1900 UTC and verify the following afternoon. Long-range outlooks (2–3 days) are also issued at this time. The forecasts are posted to the World Wide Web at a variety of sites (appendix A) and are faxed to local television and radio stations. In addition, the forecast is faxed to a network of organizations and employers who institute voluntary pollution controls (ozone action days) based on a forecast of high [O3] (code red).
b. Operational procedure
The forecasts are prepared using output from a multiple linear regression algorithm as forecast guidance, which is then supplemented by expert analysis. The forecast guidance is developed, as discussed in more detail below, using the perfect prognosis approach (Klein et al. 1959), in which a statistical relationship is determined between observed values of the predictand (O3) and observed values of a set of meteorological predictors. A dataset of observed O3 and meteorological variables from preceding years (1987–97 in the current application) is used to develop the algorithm, whereas in operational use each predictor must itself be forecast. Because the forecasts are valid the day following the forecast, this requires a 24-h forecast of meteorological conditions. The forecaster typically selects the value for each predictor from a variety of sources, including direct output from short-range National Centers for Environmental Prediction (NCEP) numerical models, model output statistics (MOS), and guidance from National Weather Service forecasts, including the Coded Cities Forecast. As these models and techniques often disagree, the forecaster applies expert judgment to select the value of each predictor. This introduces a human factor into the forecast at the initial stage. As applied in 1996, forecaster-selected values of the surface predictors reduced the root-mean-square error of the O3 forecast in the Washington, D.C., forecast area by 13% compared with values taken directly from NCEP’s Nested Grid Model (NGM) MOS output (Glahn and Lowry 1972). Skill in prediction of upper-air forecast variables for 1998 is given in Table 1. Upper-air predictors are available from several numerical models, including the NCEP Eta and NGM models. Numerical model predictions of upper-air temperatures are good and are little improved by expert intervention.
Of particular interest is the skill of the Eta Model, relative to the NGM, on warm days, when [O3] are enhanced and forecast skill is critical.
Once the forecasters select the values for the forecast variables, they are entered into a spreadsheet to calculate the regression guidance for an ensemble of regression algorithms. Typically, an updated regression algorithm is tested each year using different sets of predictors and an extended input dataset. All regressions are exercised together to allow a full comparison of new approaches. The output from the statistical model (regression guidance), based on forecasts of predictors, is then evaluated by the forecaster (expert analysis phase) to account for factors not fully resolved by the statistical approach. These factors are discussed in more detail below. The forecasters may modify the regression guidance at this point. Following the expert analysis phase, the forecast is issued to the public.
c. Statistical model
Statistical methods for air quality forecasting have been investigated for some time (Wolff and Lioy 1978; Robeson and Steyn 1990). Techniques include multiple linear regression (Clark and Karl 1982; Feister and Balzer 1990; Ryan 1995), nonlinear regression (Hubbard and Cobourn 1997), neural network techniques (Comrie 1997; Ruiz-Suarez and Mayorra-Ibarra 1995), classification and regression trees (Burrows et al. 1995), and cluster analysis (Ryan 1995). The technique used in this study is a standard multiple linear regression approach modified (as noted below) by expert analysis.
The linear regression approach is selected based, in part, on the strong correlation between [O3] and maximum surface temperature (Fig. 4). While the temperature–[O3] correlation alone provides considerable skill, there is still a wide range of possible [O3] in the upper end of the temperature distribution. For the Baltimore area, warm temperatures usually accompany [O3] in excess of the NAAQS (84% of exceedances occur with maximum temperature ≥ 90°F), but warm temperature alone is not sufficient (only 38% of all days ≥ 90°F observe [O3] above the NAAQS). Thus, for the critical high O3 cases, a wider set of predictors or additional techniques are required.
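The two percentages answer different conditional questions, P(hot | exceedance) and P(exceedance | hot), and are easy to conflate. A sketch with hypothetical daily data, assuming an exceedance means peak 1-h [O3] ≥ 125 ppbv (the code red threshold) and “hot” means Tmax ≥ 90°F:

```python
def conditional_shares(tmax_f, peak_o3, hot_f=90.0, exceed_ppbv=125.0):
    """Return (share of exceedance days that were hot,
               share of hot days that had an exceedance)."""
    hot = [t >= hot_f for t in tmax_f]
    exceed = [c >= exceed_ppbv for c in peak_o3]
    if not any(hot) or not any(exceed):
        raise ValueError("need at least one hot day and one exceedance")
    both = sum(h and e for h, e in zip(hot, exceed))
    return both / sum(exceed), both / sum(hot)


# Hypothetical season fragment
tmax_f = [85, 88, 91, 92, 93, 95, 96, 98, 99, 87]
peak_o3 = [90, 128, 100, 131, 110, 96, 140, 118, 133, 85]
hot_given_exceed, exceed_given_hot = conditional_shares(tmax_f, peak_o3)
print(f"P(hot | exceedance) = {hot_given_exceed:.2f}")
print(f"P(exceedance | hot) = {exceed_given_hot:.2f}")
```

A high value of the first share combined with a modest value of the second is the pattern described above: heat is close to necessary for an exceedance but far from sufficient.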
The regression-based forecast algorithms reported here are prepared using a dataset of observed meteorological and other variables for the period May–September 1987–97. For the Baltimore forecast area, surface observations at Baltimore–Washington International Airport (BWI) and upper-air observations at Dulles International Airport (IAD) are used as input for the regression analysis. Surface observations at IAD are used for the Washington area forecast and Philadelphia International Airport (PHL) for the Philadelphia area. IAD upper-air soundings are used for upper-air predictors for all forecast areas, including Philadelphia. Although use of single point data raises issues related to representativeness, one advantage is the routine availability of surface and upper-air predictors from operational numerical models for specific locations.
Only the summer months (May–September) are used as input for the regression analysis because O3, as a photochemical pollutant, has a strong seasonal cycle and only approaches unhealthy levels during summer months. The period selected for the input dataset (1987–97) is based on several factors. First, changes in emissions of O3 precursors during the period of interest may introduce an interannual trend in [O3] and so undermine the stability of the forecast model. There appears to be little or no long-term trend in [O3] in the Baltimore–Washington region over recent years (Rao and Zurbenko 1994). A shorter time series may be preferred where there is evidence that large changes in emissions of O3 precursors may have occurred. For example, motor vehicle emissions have declined over the years, particularly in the late 1980s, due to controls on the Reid vapor pressure (RVP) of automobile fuels (NESCAUM 1989). For the Philadelphia region, it appears that emissions changes may have affected peak [O3] in recent years. Using a 1987–95 database for forecasts in 1996–98 resulted in a high bias (11.9 ppbv) in the regression guidance. This bias was reduced in 1997–98 to 4.6 ppbv by restricting the database to 1991–96, a period after the imposition of RVP controls. A longer time series (prior to 1987) is not used due to changes in O3 observation techniques and monitor locations during the late 1970s and early 1980s.
The optimum set of predictors, and the best fit of observed data to peak [O3], is determined using a stepwise regression analysis (SPSS 1997). Additional predictors are added one by one and retained in the algorithm if they improve performance based on the developmental dataset. Performance is defined with respect to the percentage of explained variance of observed [O3]. When the stepwise procedure is automated, a large number (>12) of predictors are selected. The predictors are reduced to a more manageable number using two criteria. First, if the predictor increases the explained variance by a negligible amount (<0.4%), it is discarded. This is based on the expectation that the error in forecasting the predictor will be greater than the additional variance explained. Second, the predictability of the weaker predictors is subjectively assessed. Some predictors may be marginally useful to explain variance but are difficult, in practice, to forecast. Wind direction (u, v components), for example, is marginally useful as a predictor but notoriously difficult to forecast in weakly forced summer conditions. The end result of this “weeding out” procedure is a set of 5–10 forecast parameters that are used in the daily forecast.
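The selection rule can be sketched as greedy forward selection that stops once the best remaining candidate adds less than 0.4% explained variance. This is a simplified stand-in: SPSS's stepwise procedure uses F-test entry and removal criteria rather than this R² increment, and the subjective predictability screen described above is not modeled. Predictor names and data are hypothetical; the least-squares fit is written out with the standard library so the block runs without extra packages.

```python
def _solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """Explained variance of an ordinary least-squares fit of y on columns (+ intercept)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = _solve(xtx, xty)
    y_mean = sum(y) / len(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return 1.0 - ss_res / ss_tot


def forward_select(candidates, y, min_gain=0.004):
    """Add predictors one at a time; stop when the best R^2 gain < min_gain."""
    selected, r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        gains = {name: r_squared([candidates[s] for s in selected] + [col], y) - r2
                 for name, col in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        selected.append(best)
        r2 += gains[best]
        del remaining[best]
    return selected, r2


# Hypothetical predictors; the target is built from tmax and sky cover only
tmax = [80, 84, 86, 88, 90, 91, 93, 95, 96, 98]
sky = [0.8, 0.2, 0.5, 0.9, 0.1, 0.4, 0.7, 0.3, 0.6, 0.2]
noise = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
peak_o3 = [3 * t - 40 * s - 150 for t, s in zip(tmax, sky)]

chosen, explained = forward_select({"tmax": tmax, "sky": sky, "noise": noise}, peak_o3)
print(chosen, round(explained, 3))
```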
A list of the most commonly selected predictors is given in Table 2. A subset of these predictors is used in any given algorithm, and the explained variance of observed [O3] is typically 75%–80%. A specific example of a regression algorithm for the Baltimore forecast area is given in appendix B. In practice, a much larger number of predictors is tested at earlier stages. The predictors are generally meteorological rather than chemical parameters. The reliance on meteorological predictors reflects the relatively strong correlation between high-frequency changes in [O3] and weather conditions. O3 precursors are not used as predictors because of uncertainties associated with their measurement and the limited spatial scope of the observation network. Individual VOC species are difficult to measure and, while total VOC concentrations are simpler to measure, they are insufficient to determine O3 production capacity. That is, some VOCs are more reactive (produce more O3 per molecule); therefore, the measured concentration of each individual VOC must be weighted by its O3 production efficiency (Chameides et al. 1988; Carter and Atkinson 1989; Chameides 1992). For example, while isoprene, a highly reactive, naturally occurring VOC, is rarely the most abundant VOC by mass, it often provides the most potential for O3 formation. As a result, the concentrations of individual VOCs (speciation) must be known with some certainty for use as an O3 predictor. In operational use, of course, each key VOC species would also have to be forecast for the following day. The NOx concentrations ([NOx] = [NO] + [NO2]), unlike VOCs, are routinely available. Unfortunately, [NOx] are measured at only a handful of sites, and the standard measurement techniques for [NO2] are subject to interference (Fehsenfeld et al. 1987).
Also, NO2 is a criteria pollutant and measurement sites are selected to observe local peak concentrations—typically near sources of automobile emissions or power plants. As a result, these measurements will be uncertain and subject to very large, local-scale effects, including high-frequency variations that may not be consistent with the wider metropolitan area.
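Reactivity weighting, as described above, is a weighted sum: each species' concentration times a per-unit O3-formation weight. The species list, concentrations, and weights below are hypothetical placeholders chosen only to reproduce the qualitative point about isoprene; they are not values from the reactivity scales cited above.

```python
def ozone_potential(conc, weight):
    """Per-species contribution (concentration x O3-formation weight) and total.

    conc, weight: dicts keyed by species name; units are hypothetical
    (e.g., mass concentration and O3 formed per unit mass of VOC).
    """
    contrib = {s: conc[s] * weight[s] for s in conc}
    return contrib, sum(contrib.values())


# Hypothetical illustration: ethane dominates by mass, isoprene by O3 potential
conc = {"ethane": 10.0, "propane": 6.0, "isoprene": 2.0}
weight = {"ethane": 0.3, "propane": 0.5, "isoprene": 9.0}

contrib, total = ozone_potential(conc, weight)
print("largest by concentration:", max(conc, key=conc.get))
print("largest by O3 potential:", max(contrib, key=contrib.get))
print("total weighted potential:", round(total, 1))
```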
While direct use of chemical measurements, other than O3, is not feasible, the most commonly used forecast variables summarized in Table 2 do reflect physical processes important to ozone production. Temperature is the strongest predictor of peak [O3]. This reflects, in part, the temperature sensitivity of emissions and reaction rates for many precursors of O3 (e.g., Guenther et al. 1994; Geron et al. 1997). The intensity of incoming UV radiation is forecast indirectly, using sky cover and solar zenith angle as proxies. Other measures, such as atmospheric turbidity, can be used (Hubbard and Cobourn 1997). Local [O3] are also strongly affected by mixing of O3 and its precursors. Vertical mixing can be estimated by a variety of predictors, including upper-air temperature and stability indices. Horizontal mixing may be estimated by surface and upper-air wind speed. Because O3 is highly persistent over 1–2 days, the previous day’s peak [O3] is used as a predictor. The autocorrelation coefficient for [O3] (Baltimore data) is 0.62 at a lag of one day and drops to 0.25 or less at lags of three days and beyond. Some aspects important to local O3 production cannot be explicitly included in a regression-based forecast. These include airmass characteristics, regional transport of O3 and its precursors, and timing and extent of deep convection. These effects are considered in the expert analysis step of the forecast.
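The persistence statistics quoted above are lag-k autocorrelations of the daily peak series. A minimal sketch of the usual sample estimator follows; it assumes a single gap-free season, since in practice pairs spanning the break between seasons (September to May) should be excluded. The previous day's peak enters the regression simply as the series shifted by one day. The peak values are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a gap-free daily series at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mu = sum(series) / n
    var = sum((v - mu) ** 2 for v in series)
    cov = sum((series[i] - mu) * (series[i + lag] - mu) for i in range(n - lag))
    return cov / var


def previous_day_predictor(peaks):
    """Pair each day (from the second on) with the previous day's peak."""
    return list(zip(peaks[:-1], peaks[1:]))  # (yesterday, today)


# Hypothetical daily peak [O3], ppbv, showing a short episode
peaks = [62, 75, 98, 121, 130, 104, 80, 66, 70, 88]
for k in (1, 2, 3):
    print(k, round(autocorrelation(peaks, k), 2))
```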
d. Subjective (expert analysis) methods
Output from the regression model is used as preliminary guidance by the forecasters. At this stage, the forecasters must determine what factors expected to affect air quality in the next 0–36 h are not resolved by regression techniques and adjust the forecast accordingly. These factors include airmass characteristics, timing and extent of local- and regional-scale convection, and regional-scale [O3]. Intense convective activity mixes air parcels through the depth of the troposphere and, combined with rainfall, lowers [O3]. As convective activity can occur in conjunction with high temperatures and few clouds, the regression guidance, which is sensitive to both factors, will overpredict [O3] in these cases. An example of convective activity and its effect on [O3] is given in Fig. 5. On 25 June 1998, [O3] in excess of the NAAQS were observed at Fair Hill, Maryland (northeast of Baltimore). Continued warm and sunny conditions were expected, and occurred, the following day and, by noon, [O3] were higher than the previous day. However, deep convective activity later that afternoon lowered [O3] quickly. As seen in Fig. 5, timing of convection is critical. A delay of several hours in thunderstorm activity can result in [O3] increasing by 20–30 ppbv. To the extent that convective activity is anticipated before the usual time of peak O3 (∼1900–2100 UTC), regression guidance will be corrected downward.
Convective activity need not be local to affect O3. Cirrus or cirrostratus formed from remnants of upstream mesoscale convective activity can significantly reduce UV flux and [O3]. While the sky cover predictor can modulate the regression forecast, the weight of this predictor is often not sufficient to introduce an adequate downward correction. Airmass characteristics can also be a factor (Bethan et al. 1998). For example, summer frontal passages often are not characterized by large changes in maximum temperature. However, air masses of relatively recent Canadian origin or maritime air masses are low in O3. Finally, as will be discussed in greater detail below, the history of the air mass entering the region—due to parcel trajectory and upstream concentrations—can also be an important factor not expressly considered by the regression guidance.
The final forecast is developed in a consensus fashion by forecasters from the University of Maryland (UM), Maryland Department of the Environment (MDE), and, for Washington, D.C., the Metropolitan Washington Council of Governments (MWCOG) and the Virginia Department of Environmental Quality (VADEQ). The forecast for Philadelphia is issued directly by UM but often after consultation with forecasters from regions surrounding the forecast area. The forecast is determined by ppbv concentration and then translated to a color code (Table 3) for issuance to the public. In the case of forecasts of code orange or code red, a health advisory is also issued by the appropriate governmental authority.
Over the period 1995–98, air quality forecasts in the mid-Atlantic region have shown adequate performance and a gradual increase in skill. Standard statistics for the regression guidance and consensus forecasts are given in Table 4 for the Baltimore region. Similar skill is seen in Washington, D.C., and Philadelphia (not shown). The trend in regression guidance skill in Baltimore is given in Fig. 6 and shows a gradual improvement. Of particular note is the reduction in overall bias. This improvement is due to a combination of factors. First, the regression algorithms are updated each year with new predictors tested. In addition, the observed dataset used to develop the algorithms is extended each year. For example, the algorithm used in 1995 was based on observations from 1987–94 while the 1998 algorithm used the 1987–97 period. Finally, improvements in NCEP short-range models, particularly the introduction of the Eta Model, and forecaster experience may have led to better forecasts of predictors.
The simplest measure of forecast skill is improvement over the persistence forecast (the current day’s observed [O3] forecast for tomorrow) (Table 5). Persistence is a fair benchmark because O3 concentrations are highly autocorrelated in the 1–2-day range. This comparison is confounded to some degree because the latest (1998) version of the forecast guidance uses the previous day’s peak [O3] as a predictor, although the versions used in 1995–97 did not. For the standard error measures in Table 5, the regression guidance improves on persistence by 27%–29% and the consensus forecast improves on persistence by 40%–48%. The overall improvement occasioned by expert analysis compared to regression guidance is 16% for rms error and 21% for mean absolute error. The distribution of regression guidance and consensus forecast errors is given in Fig. 7 and also shows the extent of skill added at the expert analysis stage of the forecast process.
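The error measures and the skill relative to persistence can be sketched as follows. The sketch assumes that “improvement over persistence” means the percentage reduction of an error score (MAE or RMSE) relative to the same score for persistence; the observed and forecast values are hypothetical.

```python
import math
from statistics import median


def error_stats(forecast, observed):
    """Bias, mean and median absolute error, and RMSE (same units as input)."""
    errors = [f - o for f, o in zip(forecast, observed)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)
    return {
        "bias": sum(errors) / n,
        "mae": sum(abs_errors) / n,
        "median_ae": median(abs_errors),
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


def percent_improvement(score, reference_score):
    """Percentage reduction of an error score relative to a reference forecast."""
    return 100.0 * (reference_score - score) / reference_score


# Hypothetical observed daily peaks (ppbv) and forecasts issued for days 2..6
observed = [70, 88, 104, 121, 96, 80]
forecast = [82, 100, 115, 105, 84]

verifying = observed[1:]       # days being forecast
persistence = observed[:-1]    # today's observation used as tomorrow's forecast

fc = error_stats(forecast, verifying)
pe = error_stats(persistence, verifying)
for key in ("mae", "rmse"):
    print(key, round(fc[key], 1), round(pe[key], 1),
          f"{percent_improvement(fc[key], pe[key]):.0f}% better than persistence")
```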
For the upper end of the [O3] distribution (≥105 ppbv), the regression forecast is approximately equal to the consensus forecast. The regression guidance actually improves on the expert forecasts for cases of observed [O3] in excess of 105 ppbv (Table 5). However, as shown in Table 6, this skill comes with a tendency to overpredict. In cases of forecasts in excess of 105 ppbv, the expert forecast is more consistent. Put another way, in a given season the regression guidance correctly predicts about 2.5 more cases in excess of 105 ppbv than the expert forecast, but at the cost of approximately 7 additional false alarms of high O3. The problem with regression performance in the upper end of the distribution may be corrected in a number of ways. One technique is the use of a modified database including only the extremes of the distribution (“high–low”) (Hubbard and Cobourn 1997).
Because regression guidance overpredicts in the upper end of the distribution, forecasters may tend to discount, or reduce, regression guidance in these cases. This approach, while effective overall, can be counterproductive in specific cases. For example, expert forecast skill is known to be poor in cases of rapid increases in [O3] (“transition” cases). In these cases, [O3] increase by more than 20 ppbv from initially moderate (code yellow) concentrations. Transition cases often occur on the onset days of O3 episodes. In these situations, the regression guidance is accurate and outperforms the expert forecast (Table 7).
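Transition cases can be flagged mechanically from a series of daily peaks. A sketch assuming a transition means a rise of more than 20 ppbv from a day below the code orange threshold (105 ppbv); the lower bound of code yellow (Table 3) is not applied here, and the data are hypothetical.

```python
def transition_days(peaks, jump_ppbv=20.0, orange_ppbv=105.0):
    """Indices of days whose peak [O3] rose by more than jump_ppbv
    from a previous day that was below the code orange threshold."""
    return [i for i in range(1, len(peaks))
            if peaks[i - 1] < orange_ppbv and peaks[i] - peaks[i - 1] > jump_ppbv]


# Hypothetical daily peaks (ppbv)
peaks = [80, 95, 120, 130, 100, 101, 90, 115]
print(transition_days(peaks))
```

Scoring the regression guidance and the consensus forecast only on the flagged days gives a comparison of the kind summarized in Table 7.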
In any forecast program where public decision-making is based on a forecast threshold, a key issue is communicating the probability of success and, therefore, the risk involved in the forecast (Murphy 1985; Murphy and Ehrendorfer 1987). For air quality forecasts, risk–benefit concerns can be contradictory at times. Forecasts of high O3 initiate voluntary pollution control efforts that can be very costly. From this perspective, false alarms (high [O3] forecast but not observed) are to be avoided since no costs are incurred for forecasts of lower concentrations. From the public health perspective, avoiding misses (high [O3] observed but not forecast) is more critical. The forecast program as currently devised tries to respond to both goals by the use of two levels of warning. The code orange forecast (≥105 ppbv) is intended to serve as an ozone watch. This forecast is issued with a health warning that advises the public that conditions are ripe for unhealthy [O3] though no immediate alarm is sounded. The code red forecast (≥125 ppbv) then serves as an ozone warning stating that unhealthy conditions are expected to occur. Ozone action day forecasts are triggered only by forecasts of code red while the affected population may wish to alter their behavior based on the code orange forecast.
For the period 1995–98 in Baltimore there have been three “complete” false alarms of code red (code red forecast and only code yellow observed)—an average of less than one per year (Washington, D.C., and Philadelphia experienced a total of two and one cases, respectively). However, a larger number of code red forecasts resulted in concentrations reaching only the code orange level (roughly two cases per year in Baltimore and Washington, D.C.; one per year in Philadelphia). The complete miss rate (code red observed, code yellow forecast) in Baltimore is similar with three cases in the period 1995–98 (six occurred in Washington, D.C.). However, a large number of cases (roughly five per season) of observed code red occurred with code orange forecasts.
On a more quantitative level, standard skill score measures for the thresholds of 105 ppbv (code orange) and 125 ppbv (code red) are given in Table 8. The conflicting cost–benefit goals noted above are met to some degree by the forecasts. For example, forecasts of code red, which initiate costly O3 control programs, have a good probability of detection (0.73) with a limited false alarm rate (0.27). Forecasts of code orange, which are geared to alerting sensitive individuals, have a very good probability of detection relative to an observed code red threshold (0.92) but at the cost of a very high false alarm rate (0.70). As noted above, the regression guidance achieves a high probability of detection with a correspondingly high false alarm rate.
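The skill scores come from a 2 × 2 contingency table for a chosen forecast threshold and observed threshold. The sketch assumes that probability of detection is hits/(hits + misses) and that the “false alarm rate” quoted here is the share of threshold forecasts that did not verify, false alarms/(hits + false alarms), often called the false alarm ratio. The data are hypothetical.

```python
CODE_ORANGE_PPBV = 105.0
CODE_RED_PPBV = 125.0


def contingency(forecast, observed, forecast_threshold, observed_threshold):
    """Count hits, misses, and false alarms for threshold events."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        forecast_yes = f >= forecast_threshold
        observed_yes = o >= observed_threshold
        if forecast_yes and observed_yes:
            hits += 1
        elif observed_yes:
            misses += 1
        elif forecast_yes:
            false_alarms += 1
    return hits, misses, false_alarms


def pod(hits, misses):
    """Probability of detection."""
    return hits / (hits + misses)


def false_alarm_ratio(hits, false_alarms):
    """Fraction of event forecasts that did not verify."""
    return false_alarms / (hits + false_alarms)


# Hypothetical forecast/observed daily peaks (ppbv)
forecast = [130, 110, 128, 95, 126, 100, 112]
observed = [135, 127, 110, 130, 129, 90, 100]

for f_thr, label in [(CODE_RED_PPBV, "code red forecast"),
                     (CODE_ORANGE_PPBV, "code orange forecast")]:
    h, m, fa = contingency(forecast, observed, f_thr, CODE_RED_PPBV)
    print(f"{label} vs observed code red: POD={pod(h, m):.2f}, "
          f"FAR={false_alarm_ratio(h, fa):.2f}")
```

On these made-up numbers, lowering the forecast threshold from code red to code orange raises both the detection rate and the false alarm ratio, the same trade-off shown by the values quoted above.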
Pollution controls in the coming years are expected to lower [O3]. This may result in reducing the current “minor” NAAQS exceedances (125–135 ppbv) to below the standard. However, during the most severe O3 episodes, [O3] may still remain above the NAAQS even with more stringent controls. The key to future forecast skill is the anticipation of the very high O3 cases. For the current program, forecast skill in highest O3 cases is quite good. Health warnings were issued for all of the 10 most severe cases in the Baltimore area (nine forecasts of code red and one of code orange). The two most severe regional episodes (12–15 July 1995 and 12–17 July 1997) were covered with warnings in the 2–3-day outlooks.
One factor contributing to skill in the extreme of the distribution is the application of pattern analysis during the expert analysis stage. An analysis of high [O3] events in 1983–91 showed certain weather patterns were associated with severe, multiday high [O3] episodes in the mid-Atlantic (Ryan 1995). The most common pattern is characterized by a slow-moving upper-air ridge with its axis west of the region. An upper-air analysis for the highest [O3] case in the 1995–98 period is shown in Fig. 8. A ridge located west (upstream) influences O3 in several ways. The area east of a ridge axis is characterized by subsidence (downward motion). Subsidence results in a stronger low-level inversion and reduced cloud cover. As the center of surface high pressure typically leads the upper-air ridge axis, any surface pressure gradient is weak and horizontal ventilation is limited. These factors (clear skies, little vertical motion, light winds) are conducive to O3 formation. However, the temporal and spatial extent of extraordinarily high O3 cases is also affected by regional-scale factors. Observations of [O3] by aircraft upwind (west in these cases) of the mid-Atlantic during the highest [O3] cases in 1995 (Fig. 9) and 1997 (not shown) show that [O3] entering the region are on the order of 90–110 ppbv. These concentrations are observed just after sunrise so the observed O3 must have been produced the preceding day and, given nonstagnant conditions during this episode, at some distance to the west or northwest.
With the recent availability of near-real-time O3 observations (http://www.epa.gov/airnow), identifying the approximate source and O3 loading of air parcels entering the forecast area can be a key input to the forecast. Trajectory models, such as the HY-SPLIT model (Draxler 1991, 1992), may be used to model the path of air parcels entering the region (www.noaa.arl.gov/READY). Air parcel back trajectories can be computed via the World Wide Web both to analyze historical cases (using NGM or Eta initial fields as input) and in forecast mode (using Eta or NGM forecast fields). For the highest O3 cases in 1995–96 (Fig. 10) and in 1998 (Fig. 11), trajectories cluster along a corridor lying roughly between Buffalo, New York, and Charleston, West Virginia. The same transport orientation is associated with high concentrations of O3 and SO2 at rural locations within Shenandoah National Park in western Virginia (K. Hallock 1999, personal communication). The trajectories shown in Figs. 10 and 11 are for air parcels whose paths terminate at 1500 m above ground level near BWI. Although lower-layer trajectories are often quite similar in orientation, 1500 m is used as the reference height for two reasons. First, layers of high [O3] have been observed at this level (Ryan et al. 1998). Second, airflow is nearly geostrophic at this level, so turbulent and frictional effects, which cannot be resolved at the scale of the trajectory model, are less likely to degrade model output. In operational use, regression guidance in cases with high temperature and low cloud cover will be adjusted upward if an upper-air ridge is located west of the region and the back trajectory follows the “high O3 corridor.”
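The operational rule in the last sentence can be sketched as a simple screening function. This is a minimal sketch under several assumptions that the text does not state. The corridor is treated as the sector, seen from BWI, between the bearings to Charleston and to Buffalo. Only the oldest back-trajectory point is tested. The city coordinates are rounded. The temperature and cloud thresholds (`HOT_TMAX_C`, `CLEAR_CLOUD_FRAC`) and all function names are hypothetical. The function only flags a case for upward adjustment; it does not estimate the size of the adjustment, which the text does not specify.

```python
import math

# Approximate coordinates (deg N, deg E); rounded values assumed for this sketch.
BWI = (39.175, -76.668)
BUFFALO_NY = (42.886, -78.878)
CHARLESTON_WV = (38.350, -81.633)

# Hypothetical thresholds for "high temperature" and "low cloud cover".
HOT_TMAX_C = 32.0
CLEAR_CLOUD_FRAC = 0.3


def bearing_deg(origin, point):
    """Initial great-circle bearing from origin to point, degrees clockwise from north."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, point)
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


# Assumed corridor: the sector seen from BWI between Charleston and Buffalo.
SECTOR_LO = bearing_deg(BWI, CHARLESTON_WV)  # about 260 deg (west)
SECTOR_HI = bearing_deg(BWI, BUFFALO_NY)     # about 336 deg (north-northwest)


def in_corridor(trajectory):
    """True if the oldest back-trajectory point lies inside the BWI-centered sector.

    `trajectory` is a list of (lat, lon) points ordered from arrival backward in time.
    """
    b = bearing_deg(BWI, trajectory[-1])
    return (b - SECTOR_LO) % 360.0 <= (SECTOR_HI - SECTOR_LO) % 360.0


def flag_upward_adjustment(tmax_c, cloud_frac, ridge_west, trajectory):
    """Flag hot, clear cases with a ridge to the west and corridor transport."""
    hot_and_clear = tmax_c >= HOT_TMAX_C and cloud_frac <= CLEAR_CLOUD_FRAC
    return hot_and_clear and ridge_west and in_corridor(trajectory)


if __name__ == "__main__":
    # Hypothetical back trajectory arriving at BWI from western Pennsylvania.
    path = [(39.3, -77.0), (39.9, -78.6), (40.4, -80.0)]
    print(flag_upward_adjustment(34.0, 0.1, True, path))  # True
```

Using an angular sector keeps the rule tied to the two cities named in the text, but a real implementation might instead test the whole trajectory or use a distance-based corridor.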
Although current results are encouraging and there has been incremental improvement over the course of this forecast program, regression-based forecast guidance can be expected to reach a limit in skill. For 1997, regression guidance initialized after the fact with observations improved mean absolute error by only 7% relative to regression guidance initialized with forecast data. Compensating errors may allow the regression guidance to perform well even with poorer input data. Nevertheless, because even observed inputs yield so small a gain, better meteorological forecasts alone are unlikely to improve the regression guidance much further. Higher skill levels may still be achieved in specific portions of the distribution. In particular, forecast algorithms can be developed specifically for the highest range of the distribution. As seen in Fig. 4, a quadratic fit to temperature may improve skill in this range. Other predictors may also be useful at the high end of the distribution. For example, recent research has shown that certain natural hydrocarbons have a distinct seasonal cycle with rapid enhancements during high-temperature cases (Goldstein et al. 1998). Other techniques, such as neural networks, may offer additional improvement at the high end of the distribution and are currently being tested.
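The comparison suggested by Fig. 4 can be illustrated by fitting both a straight line and a quadratic in temperature by least squares, then comparing the residuals. This is a minimal, dependency-free sketch that solves the normal equations. The temperature and O3 values in the example are invented for illustration and are not data from this study. Because the quadratic model contains the linear one, its in-sample residual sum of squares can never be larger, so any real gain would need to be confirmed on independent cases.

```python
def _solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] from the normal equations."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    return _solve(ata, aty)


def predict(coef, x):
    """Evaluate the polynomial c0 + c1*x + c2*x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(coef))


def sse(coef, xs, ys):
    """Sum of squared residuals of a fitted polynomial."""
    return sum((y - predict(coef, x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Invented (Tmax in deg C, peak O3 in ppbv) pairs with upward curvature.
    tmax = [24, 26, 28, 30, 32, 34, 36]
    o3 = [55, 60, 68, 80, 97, 118, 145]
    lin, quad = polyfit(tmax, o3, 1), polyfit(tmax, o3, 2)
    print(sse(lin, tmax, o3) > sse(quad, tmax, o3))  # True
```

In practice, `numpy.polyfit(x, y, 2)` performs the same fit, returning coefficients from the highest power down.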
In the short term, air quality forecasts will shift from peak 1-h concentrations to the new 8-h mean standard. The strong correlation between peak 1- and 8-h [O3] suggests that forecast techniques developed for 1-h peak [O3] will also be effective for 8-h forecasts. In the longer term, coupled chemistry–transport numerical models (CTMs), which solve for O3 concentrations directly, will be more widely applied to operational air quality forecasting. These models are not currently in general use for daily, local-scale operational air quality forecasts, but regional-scale (36-km grid) operational forecasts are being tested (McHenry et al. 1999).
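The two quantities being compared can be computed from hourly data as follows. This is a minimal sketch that assumes each day is a list of hourly [O3] values in ppbv. It computes the daily peak 1-h value and the maximum 8-h running mean, then the Pearson correlation between the two across days. The window handling is a simplification of the regulatory averaging procedure, because windows here never extend into the following day. The function names and the example profiles are hypothetical.

```python
import math


def peak_1h(hourly):
    """Daily peak 1-h concentration."""
    return max(hourly)


def peak_8h_mean(hourly):
    """Maximum 8-h running mean over windows lying entirely within one day's record."""
    if len(hourly) < 8:
        raise ValueError("need at least 8 hourly values")
    window = sum(hourly[:8])
    best = window
    for i in range(8, len(hourly)):
        window += hourly[i] - hourly[i - 8]  # slide the window forward one hour
        best = max(best, window)
    return best / 8


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented hourly profiles (ppbv) for three days, for illustration only.
    days = [[30 + 4 * min(h, 15) for h in range(24)],
            [25 + 3 * min(h, 14) for h in range(24)],
            [35 + 5 * min(h, 16) for h in range(24)]]
    p1 = [peak_1h(d) for d in days]
    p8 = [peak_8h_mean(d) for d in days]
    print(p1, p8, round(pearson_r(p1, p8), 3))
```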
Several problems currently limit the operational use of air pollution models at local scales. First, a very fine grid resolution is required to resolve local variations in [O3]. For example, O3 modeling for pollution control strategy development is typically undertaken using a grid size of 4–12 km (Scheffe and Morris 1993; Hanna et al. 1996). This reflects spatial variations in emission sources (highways, power plants, industrial operations) as well as meteorological factors that are critical to local-scale O3 (nocturnal jets, bay–land breezes) (Pielke and Uliasz 1998; McQueen et al. 1995). Even if the computational requirements of the finer grid can be met, large uncertainties remain in CTMs related to initial conditions, boundary conditions, and chemical inputs such as emissions (Hanna et al. 1998; Chock et al. 1995; Derwent and Hov 1988; Seinfeld 1988). CTMs must be initialized with ambient concentrations of a wide range of trace gases. Currently, only surface measurements of O3 and a few precursors (NO, CO) are available on the time and space scales necessary for initializing the model. Many precursors, particularly VOCs, are measured only intermittently and cannot be analyzed quickly enough for use by a forecast model. In addition, observation sites are clustered in urban areas and may not adequately reflect regional concentrations. With observations limited to surface concentrations, the vertical structure of O3 and its precursors must be parameterized. Efforts are under way to use remote sensing to provide boundary conditions for O3 (R. Hudson 1999, personal communication). The chemistry model must also account for emissions of O3 precursors at fine space scales and timescales, which requires detailed knowledge of emissions from a wide variety of processes.
For example, to determine automobile emissions accurately, the number of cars driven, their location, the distance traveled, and the details of their exhaust products must be known (EPA 1999b). Models have been developed to estimate these factors, but considerable uncertainty remains (Lonneman et al. 1986; Bishop and Stedman 1990; Lawson et al. 1990; Robinson et al. 1996; Gertler and Pierson 1996). Industrial processes, which can vary widely from site to site, must be subjected to the same exhaustive inquiry (EPA 1999a). Finally, even if reaction rates, ambient concentrations, and emissions are accurate, determining how these emissions are mixed requires very fine resolution of advective and turbulent effects (Sistla et al. 1996).
Despite these shortcomings, several avenues for improvement using CTMs in conjunction with other techniques are possible. For example, the finer-scale meteorological models used with CTMs could provide better inputs for current statistical models, such as those reported here, or be used to develop a model output statistics (MOS) technique (Glahn and Lowry 1972) for O3. Model output could also serve as additional predictors of regional-scale O3.
Air quality forecasts of 1-h peak ground-level [O3] in the mid-Atlantic region, made with multiple linear regression techniques abetted by expert analysis, are reasonably accurate. Although there is a low bias in the upper end of the distribution, forecasters have profitably used pattern analysis, supplemented by regional O3 observations, to predict the most severe, multiday O3 events. For the Baltimore forecast area, 9 of the 10 highest [O3] cases were forecast with an ozone warning (code red), and the remaining case carried an ozone watch (code orange). In the future, regression-based forecasts may be replaced by, or used in tandem with, numerical models solving directly for [O3]. The forecast accuracy of the simple statistical model reported here serves as a benchmark against which to judge improvements made by other approaches.
The authors thank their forecasting colleagues for their advice and insight, including Ram Tangirala (MWCOG); Jane Mahinske, Merlin Zook, and Timothy Leon Guerrero (Pennsylvania Department of Environmental Protection); Dan Salkovitz (Virginia Department of Environmental Quality); and Andrew Mikula (New Jersey Department of Environmental Protection). This work was funded by the Maryland Department of the Environment, the Metropolitan Washington Council of Governments, and the Delaware Valley Regional Planning Commission. The authors thank Tad Aburn and Ron Roggenburk for their continued support of the project.
Air Quality Forecasts on the World Wide Web
Washington, D.C.: http://www.mwcog.org/dep/airqual.html
Other air quality forecasts in the United States: http://www.nescaum.org/links.html
Sample Forecast Algorithm for Baltimore (1998)
Listed in Table B1 below are the weighted coefficients for the updated regression algorithm for Baltimore used in 1998. This algorithm was developed from a 1987–97 dataset totaling 1284 cases after excluding those with missing data. The dependent variable in this analysis is the square root of [O3]. The explained variance in predicted [O3] for the period 1987–97 was 79%, with a correlation coefficient (multiple R) of 0.89.
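Because the dependent variable is the square root of [O3], a prediction from the algorithm must be squared to return a concentration. The sketch below shows that back-transform using hypothetical predictor names and placeholder coefficients; it does not reproduce Table B1. A multiple R of 0.89 corresponds to R² = 0.89² ≈ 0.79, which is consistent with the 79% explained variance.

```python
# Hypothetical predictors and coefficients: placeholders, NOT the Table B1 values.
INTERCEPT = 2.0
COEFFS = {
    "tmax_c": 0.2,             # daily maximum temperature (deg C)
    "cloud_frac": -1.5,        # fractional cloud cover (0-1)
    "wind_ms": -0.1,           # mean wind speed (m/s)
    "prev_day_o3_ppbv": 0.02,  # persistence term
}


def predict_peak_o3(predictors):
    """Predict sqrt([O3]) as a linear combination, then square it to get [O3] in ppbv."""
    root = INTERCEPT + sum(COEFFS[name] * predictors[name] for name in COEFFS)
    # Clip at zero so a negative root cannot square to a spurious positive value.
    return max(root, 0.0) ** 2


if __name__ == "__main__":
    day = {"tmax_c": 33.0, "cloud_frac": 0.2, "wind_ms": 3.0, "prev_day_o3_ppbv": 90.0}
    print(round(predict_peak_o3(day), 2))  # 96.04
```

The clip at zero is a design choice of this sketch and not part of the published algorithm.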
Verification of the ozone forecasts against the color code thresholds used a standard contingency table of the type displayed below:
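A standard 2 × 2 layout is sketched below, where an "event" is an exceedance of the color code threshold being verified. The cell labels are an assumed, conventional notation. Here $a$ counts hits (event forecast and observed), $b$ false alarms (event forecast but not observed), $c$ misses (event observed but not forecast), and $d$ correct nulls (event neither forecast nor observed).

$$
\begin{array}{l|cc}
 & \text{Event observed} & \text{Event not observed} \\
\hline
\text{Event forecast} & a & b \\
\text{Event not forecast} & c & d
\end{array}
$$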
Forecast skill can be expressed in a number of ways. The probability of detection (POD) measures the percentage of ozone events that were correctly forecast and is given by
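In conventional notation (assumed here), let $a$ be the number of hits (event forecast and observed) and $c$ the number of misses (event observed but not forecast). The standard form is then as follows, multiplied by 100 when expressed as a percentage:

$$
\mathrm{POD} = \frac{a}{a + c}
$$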
The miss rate (MISS) measures the rate at which ozone events occurred but were not forecast and is given by
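With $a$ hits (event forecast and observed) and $c$ misses (event observed but not forecast), an assumed but conventional notation, the miss rate is the complement of POD:

$$
\mathrm{MISS} = \frac{c}{a + c} = 1 - \mathrm{POD}
$$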
The false alarm rate (FAR) measures the tendency of the ozone forecast to overpredict ozone occurrences and is given by
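This formula assumes the common false-alarm-ratio form, with $a$ hits (event forecast and observed) and $b$ false alarms (event forecast but not observed). In this form, FAR is the fraction of event forecasts that did not verify. The denominator is an assumption, because some authors instead divide by all observed nonevents.

$$
\mathrm{FAR} = \frac{b}{a + b}
$$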
In addition to forecasts of ozone occurrences, an alternative measure is the skill with which nonevents are forecast. This measures the forecast skill at predicting “clean” days and is given by the correct null forecast (CNULL):
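One plausible form treats CNULL as the fraction of observed clean days that were correctly forecast as clean. It uses $d$ correct nulls (event neither forecast nor observed) and $b$ false alarms (event forecast but not observed). The denominator is an assumption, because the fraction of clean forecasts that verified, $d/(c + d)$ with $c$ misses, is an equally common choice.

$$
\mathrm{CNULL} = \frac{d}{b + d}
$$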
There are a number of more detailed skill scores that are useful. The critical success index (CSI or threat score) combines forecast occurrences and observed occurrences without regard to successful null forecasts. This is given by
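In the standard form, correct null forecasts do not appear. The notation is conventional and assumed here: $a$ hits (event forecast and observed), $b$ false alarms (event forecast but not observed), and $c$ misses (event observed but not forecast).

$$
\mathrm{CSI} = \frac{a}{a + b + c}
$$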
The true skill score (TSS or Hanssen–Kuipers skill score) includes the success of null forecasts in the form of a ratio of observed skill to perfect forecast skill. This measure is not dependent on the relative frequency of occurrence and nonoccurrence or the number of trials and can be expressed as
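In the standard form, TSS is the probability of detection minus the probability of false detection. The notation is conventional and assumed here: $a$ hits, $b$ false alarms, $c$ misses, and $d$ correct nulls (event neither forecast nor observed).

$$
\mathrm{TSS} = \frac{a}{a + c} - \frac{b}{b + d} = \frac{ad - bc}{(a + c)(b + d)}
$$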
If all forecasts are correct, TSS = 1; if all forecasts are incorrect, TSS = −1 (Lee and Passner 1993).
For events that are rare, the Heidke skill score (S) is often used. This score is a measure of the skill of a set of forecasts compared to the skill of a random forecast (Doswell et al. 1990):
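All of these scores can be computed together from the four table counts. The sketch below assumes the conventional cell labels $a$ (hits), $b$ (false alarms), $c$ (misses), and $d$ (correct nulls). It uses the standard Heidke form $S = 2(ad - bc)/[(a + c)(c + d) + (a + b)(b + d)]$. The FAR denominator ($a + b$) and the CNULL denominator ($b + d$) are assumed conventions, and the class and function names are hypothetical. Scores with a zero denominator are returned as NaN rather than raising an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Counts in a 2x2 forecast/observation table (letter convention assumed)."""
    a: int  # hits: event forecast and observed
    b: int  # false alarms: event forecast, not observed
    c: int  # misses: event observed, not forecast
    d: int  # correct nulls: event neither forecast nor observed


def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero (score undefined)."""
    return num / den if den else float("nan")


def skill_scores(t):
    """Conventional verification scores; the FAR and CNULL denominators are assumptions."""
    a, b, c, d = t.a, t.b, t.c, t.d
    return {
        "POD": _ratio(a, a + c),
        "MISS": _ratio(c, a + c),
        "FAR": _ratio(b, a + b),
        "CNULL": _ratio(d, b + d),
        "CSI": _ratio(a, a + b + c),
        "TSS": _ratio(a * d - b * c, (a + c) * (b + d)),
        "HSS": _ratio(2 * (a * d - b * c), (a + c) * (c + d) + (a + b) * (b + d)),
    }


if __name__ == "__main__":
    # Hypothetical season: 10 hits, 5 false alarms, 2 misses, 100 correct nulls.
    print(round(skill_scores(Contingency(10, 5, 2, 100))["CSI"], 3))  # 0.588
```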
* Current affiliation: NOAA/NWS Joint Agricultural Weather Facility, Washington, D.C.
Corresponding author address: William F. Ryan, Department of Meteorology, University of Maryland at College Park, College Park, MD 20742.