• Balmor, Y., A. Gutman, and U. Dayan. 1988. Wind and temperature structure of the atmospheric boundary layer in a coastal versus inland site in Israel. Proc. EURASAP Conf: Meteorology and Atmospheric Dispersion in a Coastal Area, Riso, Denmark, EURASAP, 15–24.

  • Benkovitz, C. M. and Coauthors, 1996. Global gridded inventories of anthropogenic emissions of sulfur and nitrogen. J. Geophys. Res. Atmos 101:29239–29253.

  • Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Wadsworth and Brooks, 358 pp.

  • Burrows, W. R. 1997. CART regression models for predicting UV radiation at the ground in the presence of cloud and other environmental factors. J. Appl. Meteor 36:531–544.

  • Burrows, W. R., M. Benjamin, S. Beauchamp, E. R. Lord, D. McCollor, and B. Thomson. 1995. CART decision-tree statistical analysis and prediction of summer season maximum surface ozone for the Vancouver, Montreal, and Atlantic regions of Canada. J. Appl. Meteor 34:1848–1862.

  • Carter, M. M. and J. B. Elsner. 1997. Statistical method for forecasting rainfall over Puerto Rico. Wea. Forecasting 12:515–525.

  • Dayan, U. and J. Koch. 1989. Assessment of the critical conditions for dispersion and transport of plumes from tall stacks in the Haifa area. Proc. Fourth Int. Conf. of the Israel Society for Ecology and Environmental Quality Sciences, Jerusalem, Israel, ISEEQS, 27–36.

  • Dayan, U., R. Shenhav, and M. Graber. 1988. The spatial and temporal behavior of the mixed layer in Israel. J. Appl. Meteor 27:1382–1394.

  • Duncan, B. N., A. W. Stelson, and C. S. Kiang. 1995. Estimated contribution of power plants to ambient nitrogen-oxides measured in Atlanta, Georgia in August 1992. Atmos. Environ 29:3043–3054.

  • Efron, B. and R. J. Tibshirani. 1993. An Introduction to the Bootstrap. Chapman and Hall, 436 pp.

  • Gifford, F. A. 1976. Turbulence diffusion typing schemes: A review. Nucl. Saf 17:68–86.

  • Hopke, P. K. 1985. Receptor Modeling in Environmental Chemistry. John Wiley and Sons, 319 pp.

  • Koch, J. and U. Dayan. 1992. A synoptic analysis of the meteorological conditions affecting dispersion of pollutants emitted from tall stacks in the coastal plain of Israel. Atmos. Environ 26A:2537–2543.

  • Lebret, E. and Coauthors, 2000. Small area variations in ambient NO2 concentrations in four European areas. Atmos. Environ 34:177–185.

  • MathSoft, 1998. S-PLUS 5 for UNIX Guide to Statistics. Data Analysis Division, MathSoft, 1014 pp.

  • Ryan, W. F. 1995. Forecasting severe ozone episodes in the Baltimore metropolitan-area. Atmos. Environ 29:2387–2398.

  • Slade, D. H. 1968. Meteorology and Atomic Energy. Office of Information Services, U.S. Atomic Energy Commission, 444 pp.

  • Swietlicki, E., S. Puri, H-C. Hansson, and H. Edner. 1996. Urban air pollution source apportionment using a combination of aerosol and gas monitoring techniques. Atmos. Environ 30:2795–2809.

  • Venables, W. N. and B. D. Ripley. 1994. Modern Applied Statistics with S-Plus. Springer, 462 pp.

  • Walmsley, J. L., W. R. Burrows, and S. Schemenauer. 1999. The use of routine weather observations to calculate liquid water content in summertime high-elevation fog. J. Appl. Meteor 38:369–384.

  • Watson, J. G., N. F. Robinson, J. C. Chow, R. C. Henry, B. M. Kim, T. J. Pace, E. L. Meyer, and Q. Nguyen. 1990. The USEPA/DRI Chemical Mass Balance Receptor Model, CMB7.0. Environ. Software 5:38–49.

  • Fig. 1. The Ashdod–Hevel Yavne area, depicting the Ashdod air quality monitoring station (star), the power plant (square), the oil refineries (circle), and local industries (triangle).

  • Fig. 2. Distribution of RSN attributable to the power plant for 1987–97. Included events satisfy: SO2 > 20 μg m⁻³, 0 < RSN < 15, wind sector within 280°–340°, and time span 0900–1800 LST on days for which the minimum temperature between 1100 and 1700 LST exceeds 23°C. The dashed line connects the median RSN values, and the solid line depicts annual amounts of SO2 (metric kilotons) emitted by the power plant. The box indicates the interquartile range; the vertical lines extend to the 5th and 95th percentiles (n = 6822).

  • Fig. 3. Mean RSN by wind sector and season: (a) 1100–1500 LST and (b) 1800–2200 LST. Values are presented in a square root scale. The length of the arrow corresponds to RSN = 1.

  • Fig. 4. Reduction in deviance as a function of the number of nodes for the 1996–97 data in original scale. The vertical line indicates the deviance for a 9-node subtree (n = 5868).

  • Fig. 5. A 9-node tree in original scale for 1996–97 ($T_{org}$). At each node, the mean value of RSN appears inside the ellipse (rectangle for terminal nodes) and the within-node deviance $D_i^2$ is beneath it. The split rule appears across the branches.

  • Fig. 6. A 9-node tree in logarithmic scale for 1996–97 ($T_{log}$).

  • Fig. 7. Distribution of RSN by level, 1996–97. The box indicates the interquartile range, the center horizontal line is the median, and the vertical lines extend to 1.5 times the interquartile range. Dots above the staple represent outliers.

  • Fig. 8. Number of exceedances of SO2 beyond one-quarter of the air quality standard (125 μg m⁻³) by pollution source and month, 1996–97 (n = 485).


On the Ratio of Sulfur Dioxide to Nitrogen Oxides as an Indicator of Air Pollution Sources

  • 1 The Hebrew University of Jerusalem, Jerusalem, Israel

Abstract

The ratio of sulfur dioxide to nitrogen oxides (RSN = SO2/NOx) is one indicator of air pollution sources. The role of this ratio in source attribution is illustrated here for the Ashdod area, located in the southern coastal plain of Israel. The main sources of pollution in the area are the tall stacks of the Eshkol power plant, the stacks of oil refineries, and areal sources (stationary and mobile). The factors that affect RSN are studied using four regression models: a binary regression tree in original scale, a tree in logarithmic scale, a data partition produced by a combination of the two trees, and a linear regression model. All models have similar relative prediction error, with the combined partition best highlighting the sources of variability in RSN: (a) very low values (interquartile range of [0.12, 0.48]) are associated with traffic, (b) low values ([0.43, 1.00]) are attributed to the power plant and to daytime emissions of local industry, (c) medium values ([0.74, 1.90]) are associated with local industry emissions during cooler hours of the day and refinery emissions mainly on slow wind episodes, and (d) high values ([1.07, 4.30]) are attributed to refinery emissions during moderate to fast wind episodes. Analysis of the number of episodes of increased concentrations indicates that, during 1996 and 1997, about 42% of SO2 episodes are attributable to the power plant and 33% to the refineries. Increased-NOx episodes are mainly contributed by traffic (91%) and power plant (4.5%) emissions.

Corresponding author address: Dr. Ronit Nirel, Dept. of Statistics, The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem 91905, Israel. E-mail: nirelr@cc.huji.ac.il


Introduction

In many regions, air pollution is caused by different types of sources. One type is point sources, such as the tall stacks of power plants and oil refineries. A second type is areal sources, such as local industry and transportation. The ability to differentiate among various sources can be useful (a) for quantifying the relative contribution of each source, in particular the relative contribution of mobile sources; (b) for evaluating the effectiveness of existing control measures; and (c) for gaining insight into the mechanisms and conditions that affect the level of pollution emitted by different sources.

The level, location, and duration of pollutant concentrations within a region depend on plume height, wind speed, rate of vertical mixing in the atmosphere, and distance from the source. Hence, plumes from different sources can be identified by the following indicators:

  • Structure of the atmospheric boundary layer (ABL). During the day, pollutants emitted from areal sources are well mixed throughout the boundary layer. This mixing is more vigorous and efficient within the free convective layer (FCL), in which buoyancy rather than shear dominates turbulence production. Such convective heating is manifested by a pronounced superadiabatic lapse rate of up to −3° to −3.5°C (100 m)⁻¹, as measured by Dayan et al. (1988) in the southern part of the coast of Israel. Plumes injected above the ABL from tall point sources may be carried downward by convective downdrafts and impinge upon the ground, causing localized high concentrations. In contrast to the FCL, which is driven by thermal turbulence, a stable boundary layer (SBL) may form at night under clear skies because of radiative heat loss from the ground and cooling by conduction. Dayan and Koch (1989) found that the frequency of radiative inversions along the coast of Israel was in the range of 13%–22% for each of the three synoptic categories occurring during the spring and beginning of summer, whereas the frequency for each of the other synoptic categories was less than 7%. Such a stable boundary layer prevents plumes emitted from tall point sources from reaching the surface, so these plumes do not affect surface air quality, whereas emissions from mobile and other ground sources are trapped within the stable layer and increase the level of pollution.

  • Wind direction. Frequently, pollution sources can be identified by the upwind direction from the monitoring station, or receptor. In extracting the actual wind direction from surface measurements, the height of the source, its distance from the receptor, and the measurement timescale should be taken into account. Above the coastal plain of Israel, typically observed veering angles in wind profiles within the ABL (i.e., from the surface up to the frictionless layer) are 5°–10° for neutral atmospheric conditions and 30°–40° for slightly stable conditions (Balmor et al. 1988; Koch and Dayan 1992). Besides the veering of the wind direction with height, two other parameters should be considered: the standard deviation σθ of the horizontal wind direction fluctuation at the surface and the standard deviation σy of the concentration distribution in the lateral direction as a function of the distance from the pollution source.

  • Plume signature: RSN = SO2/NOx. Gas-phase sulfur dioxide (SO2) is emitted during combustion of all sulfur-containing fuels (oil, coal, and diesel), whereas traffic is a prominent source of nitrogen oxides (NOx). The ratio between the two may be useful in identifying pollution sources, because fuels used, say, for electricity generation and for transportation differ in their sulfur content and because the ratio is related to combustion conditions. Typically, electricity production is expected to result in a lower SO2/NOx ratio than emissions caused by low-temperature boilers burning fuel oil with high sulfur content. Benkovitz et al. (1996) mapped the nitrogen-to-sulfur ratios (moles N emitted/moles S emitted) to reflect the overall character of pollution sources around the world. A study of the inverse ratio S/N indicates that, in many European countries, the ratio is typically near 1. It is larger than 3 in some countries in central and eastern Europe, reflecting, most likely, the presence of heavy industry. In industrialized and heavily populated areas of North America, the ratio is generally greater than 1, while in areas located away from large stationary sources, it is around 0.1. Studies in Georgia report a mean SO2/NOx ratio of 0.05 for mobile sources and ratios ranging from 2.7 to 4.6 for power plant plumes (Duncan et al. 1995). These findings indicate that mobile and point sources may be identified by their characteristic RSN.

Meteorological conditions cannot always help in identifying the sources that account for specific pollution episodes. For such cases, a source apportionment approach using receptor models is often used. Models of this kind may be of a chemical orientation, such as the chemical mass balance model, which provides a quantitative assessment of the contribution of emission sources to pollutant concentrations at the receptor (Watson et al. 1990). Another type of receptor model is based on statistical methods such as multiple linear regression, principal component analysis, or cluster analysis (e.g., Swietlicki et al. 1996). Such models assume that, although ambient concentrations result from the superposition of several sources (Hopke 1985), a dominant source often can be identified.

In this paper, the factors that affect the variability of RSN are studied through regression tree models. The results are then used to classify pollution episodes by their probable source. Tree models are useful exploratory tools for generating insight into the nature of a relationship between response and explanatory variables. In comparison with linear models, tree-based models are more adept at capturing nonlinear behavior and allow more general interactions among predictors. We demonstrate the method for the Ashdod area, Israel. The different sources of pollution in the area and the data from an air quality monitoring station within this airshed are described in section 2. Section 3 describes regression tree models and compares them with linear regression. Four regression models are fitted and validated in section 4: a regression tree model in original scale, a tree model in logarithmic scale, a data partition produced by a combination of the two trees, and a linear regression model in log scale. The combined partition defines 20 data subsets, representing four RSN levels: very low (median RSN 0.21), low (0.62), medium (1.29), and high (2.11). The meteorological conditions underlying these levels are analyzed in section 5a. In section 5b, “exceedances” of SO2 and NOx concentrations beyond one-quarter of the respective air quality standards are classified by their probable pollution source. Section 6 summarizes the main findings.

Location, data, and descriptive analysis

We analyze data from the Ashdod air quality monitoring station, located in the southern coastal plain of Israel (Fig. 1). The main pollution sources in the area, within approximately a 20-km range of the receptor, are (a) point sources, including the 150-m stacks of the Eshkol power plant, located about 3 km (1.9 mi) northwest of the receptor, and high stacks (80 m) of the Ashdod oil refineries, located approximately 2.5 km to the north of the monitoring station; (b) mobile sources, including traffic on the Tel Aviv–Ashqelon highway to the east of the receptor, on the road to the Ashdod harbor north of the receptor, and on the road leading to the town west of the station; and (c) areal sources, including mainly small food and pharmaceutical industries. These sources are located mostly to the west of the receptor.

Identifying the pollution sources in this area may prove to be simple under some conditions; for example, during the hot hours of the day in the summer when pollution arriving from the northwesterly sector is attributable to the power plant. However, it is more difficult to distinguish between sources in the cooler hours of the day. At these times, the pollution intercepted by the station from the northern sector, for example, can be attributed to either stationary (refineries) or mobile (traffic) sources.

The data are composed of 30-min averages of SO2 and NOx concentrations, as recorded in the period of 1987–97. Surface meteorological measurements include wind direction, wind speed, and temperature. Monthly fuel data specify the type of fuel (regular, low, and very low sulfur content) and quantities used by the power plant and refineries. Since 1996, the power plant fuel policy requires that fuel containing “low” sulfur content (1%) is to be used between April and October, and fuel containing “regular” sulfur content (2%) is to be used during the rest of the year. “Very low” sulfur content fuel (0.5%) should be used when the Israel Meteorological Service Intermittent Control System issues a pollution alert.

Between 1987 and 1997, the total fuel consumption of the power plant was fairly stable, but the percentage of low-sulfur fuels consumed increased continuously. This fact resulted in a pronounced decrease in sulfur emissions during this period and in a respective decline in RSN values intercepted downwind of the power plant (Fig. 2). The largest median value of RSN was observed in 1988 (about 2.8), when 50% of the values ranged approximately from 2 to 4. Since 1988, there was a continuous decline in the median RSN, and in 1996 and 1997 the levels were near 1.

In view of the high variability of RSN in the power plant wind sector, it seems reasonable to narrow the analysis to 1996 and 1997, which share the same fuel consumption policy. The values of RSN in 1996–97 range from 0 to 36.7. To eliminate the influence of negligible concentrations in the numerator and denominator of RSN, only measurements with SO2 greater than 20 μg m⁻³ (the background concentration in the area) and 0 < RSN < 15 were included in the analysis (number of measurements n = 5868). Twenty-five episodes, clearly attributable to the refineries, had RSN ≥ 15.
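The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record layout, and the sample values are hypothetical, and SO2 and NOx are assumed to be reported in the same concentration units so that the ratio is dimensionless.

```python
from typing import List, Optional, Tuple

SO2_BACKGROUND = 20.0  # ug m^-3, background threshold quoted in the text
RSN_UPPER = 15.0       # upper RSN cutoff quoted in the text


def rsn(so2: float, nox: float) -> Optional[float]:
    """Return SO2/NOx, or None when NOx is not positive (ratio undefined)."""
    if nox <= 0.0:
        return None
    return so2 / nox


def select_records(records: List[Tuple[float, float]]) -> List[Tuple[float, float, float]]:
    """Keep (so2, nox) pairs with SO2 > 20 and 0 < RSN < 15; return (so2, nox, rsn)."""
    kept = []
    for so2, nox in records:
        ratio = rsn(so2, nox)
        if ratio is None:
            continue
        if so2 > SO2_BACKGROUND and 0.0 < ratio < RSN_UPPER:
            kept.append((so2, nox, ratio))
    return kept


# Hypothetical 30-min averages: (SO2, NOx)
sample = [(45.0, 30.0), (12.0, 40.0), (320.0, 16.0), (25.0, 0.0)]
selected = select_records(sample)
```

In this made-up sample only the first record survives: the second falls below the SO2 background, the third has RSN = 20, and the fourth has no usable NOx value.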

The diurnal, seasonal, and directional variation of RSN is summarized by the RSN roses in Fig. 3. Note that RSN values appear in a square root scale for a clearer exposition. Figure 3a illustrates mean RSN values under well-mixed FCL conditions (1100–1500 LST). During the hot hours of the day, pollution is intercepted from the western and northern sectors: in January, mean RSN values range from 0.39 to 1.40, in the 205°–45° wind sector; in April, the means range from 0.76 to 1.82, in the 215°–355° sector; in July, RSN means are between 0.28 and 1.33 in the 225°–335° sector; and in October, the means are in the range of 0.43–3.03 in the 225°–345° direction. Hence, daytime RSN levels are higher and more variable in the transitional seasons than in the winter and summer. Wintertime episodes are contributed by sources to the west (power plant and local industry) and north (refineries) of the receptor; during the rest of the year, pollution episodes are mainly attributable to westerly sources. Figure 3b illustrates the mean RSN values under a stratified SBL (1800–2200 LST). In January, pollution is intercepted from two wind sectors: within the 335°–55° sector, RSN means range from 0.24 to 2.02, while within the 225°–285° sector, they range from 0.48 to 0.60. In April, three pollution sectors are identifiable: the 315°–55° sector contributes RSN levels between 0.88 and 4.42; the mean levels in the 95°–135° sector are in the range of 0.15–0.29; and in the 245°–295° sector, 0.75–1.74. In July, pollution episodes are intercepted in the 255°–355° sector, with mean RSN in the range of 0.49–1.81. October episodes are in the 345°–65° sector, with RSN means ranging from 0.51 to 4.32. Thus, evening episodes in the study area have a higher spatial dispersion as compared with daytime episodes and have more variable RSN levels. 
The highest variability is observed in springtime, with high RSN values intercepted from the north and low values from the east (most likely attributable to transportation). The lowest variability is observed in the summer, when relatively stable RSN values are intercepted from the northwesterly sector. The observed spatial homogeneity in the summer is explained meteorologically by the fact that the summer in Israel is characterized by the predominant barometric trough known as the “Persian trough.” This unique synoptic system generates northwesterly winds almost every day along the coastal region of Israel.

Regression tree models

The sources of variability in the SO2/NOx ratios are investigated in this paper through regression tree models and the data partitions that they induce. The regression tree technique was developed by Breiman et al. (1984) in a wider framework of the Classification and Regression Trees (CART) method. Regression trees, like linear regression, study relationships between a response variable, or predictand, Y and a set of possible p predictors X1, … , Xp. The rules that define the relationship between the predictand and predictors are displayed in the form of a binary tree, hence the name. Tree-based models are descriptive exploratory tools, whereas linear regression models can be used for statistical inference, provided the distributional assumptions are satisfied. CART does not assume linearity and can therefore capture a wider variety of structures than the linear model. Questions of interaction between predictor variables are handled automatically by the tree-building process. In addition, for some applications, the tree structure gives results that are easier to interpret than the classical regression equation. These properties of tree-based models have made them attractive and useful for modeling environmental and meteorological data. Recent applications include Burrows et al. (1995) and Ryan (1995), who utilized CART models for prediction of ozone concentrations in southern Canada and Baltimore, respectively. Burrows (1997) built CART models for predicting UV radiation, Carter and Elsner (1997) forecast rainfall over Puerto Rico, and Walmsley et al. (1999) used CART to investigate the factors that affect liquid water content in fog.

CART builds a binary decision tree by partitioning the data into disjoint subsets, or “bins,” represented by nodes in the tree. The first node of the tree, or the root node, is the entire dataset. The tree is built by splitting the root node into two new nodes. These nodes may also be split, and so forth. A node that is not split is referred to as a terminal node. This building process is often called recursive partitioning. Each successive partition in the tree is defined by a predictor Xi and a split value s, assigning observations for which xi ≤ s to the left child node and those for which xi > s to the right (uppercase letters are used for random variables and lowercase for the observed values). At each partition, the algorithm examines every allowable split value on each of the predictors. The split that minimizes the prediction error is selected. The prediction for all observations in a terminal node i is equal to the predictand mean at this node.
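A single partitioning step on one numeric predictor might look like the following sketch. It is not the S-Plus implementation used in the paper: the function names and data are hypothetical, candidate split values are taken as midpoints between distinct observed values (an assumption), and the prediction error is measured by the summed squared deviation from the child-node means.

```python
from typing import List, Optional, Sequence, Tuple


def sum_of_squares(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (s, error) for the split x <= s versus x > s with the smallest
    summed child error, or None if x has fewer than two distinct values."""
    distinct = sorted(set(x))
    best: Optional[Tuple[float, float]] = None
    for lo, hi in zip(distinct, distinct[1:]):
        s = (lo + hi) / 2.0  # assumed candidate: midpoint between neighbours
        left = [yi for xi, yi in zip(x, y) if xi <= s]
        right = [yi for xi, yi in zip(x, y) if xi > s]
        error = sum_of_squares(left) + sum_of_squares(right)
        if best is None or error < best[1]:
            best = (s, error)
    return best


# Hypothetical data: wind speed (m/s) and RSN
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rsn_values = [0.5, 0.6, 0.4, 2.0, 2.2, 1.8]
split_value, split_error = best_split(wind_speed, rsn_values)
```

A full tree repeats this search over every predictor at every node and recurses into the two children until a stopping rule is met.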

The tree is grown until some stopping rule is satisfied (e.g., minimal size of a terminal node or error threshold). Frequently, the trees tend to grow too big, with too few observations in each terminal node. To overcome this problem, a “bottom-up” process called recursive pruning is applied, which incrementally collapses pairs of terminal nodes. The final tree minimizes the cost complexity measure, which contains a penalty for tree size.
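For orientation, the cost complexity measure is commonly written in the following form (after Breiman et al. 1984). The symbols here are notational assumptions rather than the paper's own: $R(T)$ stands for the prediction error of tree $T$, $|T|$ for its number of terminal nodes, and $\alpha \ge 0$ for the penalty per terminal node.

```latex
\[
  R_{\alpha}(T) = R(T) + \alpha\,|T| , \qquad \alpha \ge 0 .
\]
```

With $\alpha = 0$ the largest tree is always preferred. Increasing $\alpha$ favors progressively smaller subtrees.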

CART is an exploratory technique, and its main tool to assess model fit is the prediction error. Define the deviance in node i by
$$D_i^2 = \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2,$$
where $n_i$ is the number of observations in node i, $y_{ij}$ the predictand value for the jth observation in node i, and $\bar{y}_i$ the observed mean of the predictand in node i. The overall deviance is given by
$$D^2 = \sum_{i=1}^{b} D_i^2,$$
where b is the number of terminal nodes. At the root of the tree, the “naive” prediction is the overall mean $\bar{y}$. Thus, $D_r^2 = \sum_{i=1}^{b} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$ is the “worst-case” deviance. The relative error
$$E^2 = \frac{D^2}{D_r^2}$$
measures the improvement of the current model as compared with the simplest model. A reliable estimate of $E^2$ is obtained by cross validation (cf. Efron and Tibshirani 1993). Typically, the data are divided into $v$ similar parts. Each time, $v − 1$ parts are used as a training set for the tree-building process, and the remaining part serves as a test set for validating the tree. The overall fit is assessed by averaging the $v$ estimates of error, obtaining the cross-validation relative error $E_{cv}^2$.
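The error measures defined above can be illustrated with a short Python sketch. It is not the S-Plus code used in the paper: all names and data are hypothetical, and the cross-validation routine makes two assumptions, namely that folds are assigned by interleaving the observation index and that each fold's error is the ratio of the model's squared error to that of the training-set mean on the held-out part.

```python
from typing import Callable, List, Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def node_deviance(values: Sequence[float]) -> float:
    """Deviance of one node: squared deviations from the node mean."""
    m = mean(values)
    return sum((y - m) ** 2 for y in values)


def overall_deviance(nodes: Sequence[Sequence[float]]) -> float:
    """Overall deviance: sum of the terminal-node deviances."""
    return sum(node_deviance(node) for node in nodes)


def relative_error(nodes: Sequence[Sequence[float]]) -> float:
    """Overall deviance divided by the root ("worst-case") deviance."""
    pooled = [y for node in nodes for y in node]
    return overall_deviance(nodes) / node_deviance(pooled)


def cv_relative_error(x: Sequence[float], y: Sequence[float],
                      fit: Callable, v: int = 5) -> float:
    """Average of v fold-wise relative errors (interleaved folds, assumed).
    `fit(x_train, y_train)` must return a function mapping x to a prediction."""
    n = len(y)
    fold_errors: List[float] = []
    for k in range(v):
        train = [i for i in range(n) if i % v != k]
        test = [i for i in range(n) if i % v == k]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        naive = mean([y[i] for i in train])
        model_sse = sum((y[i] - predict(x[i])) ** 2 for i in test)
        naive_sse = sum((y[i] - naive) ** 2 for i in test)
        fold_errors.append(model_sse / naive_sse)
    return mean(fold_errors)


def fit_mean(x_train, y_train):
    """Baseline 'model' that always predicts the training mean."""
    m = mean(y_train)
    return lambda _x: m


# Hypothetical terminal nodes holding RSN values
nodes = [[0.5, 0.6, 0.4], [2.0, 2.2, 1.8]]
e2 = relative_error(nodes)

xs = list(range(10))
ys = [0.5, 0.6, 0.4, 2.0, 2.2, 1.8, 0.7, 1.9, 0.3, 2.1]
e2_cv_baseline = cv_relative_error(xs, ys, fit_mean, v=5)
```

The baseline fit reproduces the naive prediction, so its cross-validated relative error is 1 by construction. A useful tree should score below that.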

The analysis is based on the proprietary S-Plus implementation of CART (MathSoft 1998) in the tree function and its derivatives. Further description of this implementation can be found in Venables and Ripley (1994).

Fitting and validating the models

Constructing explanatory variables

Potential raw explanatory variables for the models are temperature, wind speed, wind direction, month, and hour of the day. Because the last three variables are cyclical, respective indicators were defined.

  1. Wind sector indicator w. For the eastern wind sector, w = −1; w = 1 for the power plant sector; w = 2 for the refineries sector; and w = 0 for all other sectors. To determine the relevant wind sectors, we start by observing that the power plant is located in the 330° direction from the receptor. Koch and Dayan (1992) showed that, for the most frequent synoptic category (Persian trough with shallow pressure gradient, which persists on about two-thirds of the days in the summer), a 5-yr mean anticyclonic veering of 4° (100 m)⁻¹ is measured in the atmospheric layer in which a plume released from a 300-m stack rises and is transported. The highest stacks in the studied region are 150 m, corresponding to a total veering of approximately 10°–15° in the ABL, where the effective height of plumes emitted from such stacks is expected. The typical 30-min standard deviation of the horizontal wind direction fluctuation (σθ) for summer noontime, corresponding to a B–C Pasquill stability category, is approximately 15°–20° (Gifford 1976). The standard deviation of the concentration distribution in the lateral direction (σy) for a 3-km downwind distance from the power plant under such stability conditions is 300–500 m (Slade 1968). These considerations lead us to conclude that the relevant wind direction indicating plumes originating from the power plant is within the sector 280°–340°. The refineries, located approximately in the 10° direction, have lower stacks than the power plant does and are closer to the receptor. Accordingly, the relevant sector assigned to this source is 350°–20°. The eastern sector is defined by the range 30°–180°.

  2. Time of day indicator h. Between 0900 and 1800 LST, h = 1, and h = 0 otherwise. The daytime range was determined by the observed diurnal distribution of pollution episodes downwind of the power plant, which indicates a sharp decline before 0900 and after 1800, when convection induced by surface heating is suppressed.

  3. Season indicator m. Between May and September, m = 1, and m = 0 otherwise, corresponding to the aforementioned power plant fuel policy.
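The three indicators can be written as small functions, as in the sketch below. The function names are hypothetical, and the treatment of boundary values (all sector, hour, and month limits taken as inclusive) is an assumption, since the text does not say how endpoints are handled. The refineries sector needs a wrap-around test because it crosses north (350° through 0° to 20°).

```python
def wind_sector_indicator(direction_deg: float) -> int:
    """w: 1 = power plant (280-340), 2 = refineries (350-20, wrapping north),
    -1 = eastern (30-180), 0 = all other directions. Limits assumed inclusive."""
    d = direction_deg % 360.0
    if 280.0 <= d <= 340.0:
        return 1
    if d >= 350.0 or d <= 20.0:
        return 2
    if 30.0 <= d <= 180.0:
        return -1
    return 0


def time_of_day_indicator(hour_lst: float) -> int:
    """h: 1 between 0900 and 1800 LST (decimal hours, limits assumed inclusive)."""
    return 1 if 9.0 <= hour_lst <= 18.0 else 0


def season_indicator(month: int) -> int:
    """m: 1 for May through September, 0 otherwise."""
    return 1 if 5 <= month <= 9 else 0


# Hypothetical observation: wind from 355 degrees at 1330 LST in July
w = wind_sector_indicator(355.0)
h = time_of_day_indicator(13.5)
m = season_indicator(7)
```

Coding the cyclical variables this way lets the tree split on a source sector directly instead of having to recover a circular range from several numeric cut points.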

Regression tree in original scale

Our objective is to classify RSN into categories characterizing distinct sources of pollution and to understand the principal sources of variation in RSN. It is therefore sufficient for our purposes to obtain a tree with a relatively small number of terminal nodes, reflecting the main sources of variation in RSN. We begin by restricting the minimal node size for splitting to 20 observations (i.e., growing continues if there are at least 20 observations in a node). The resulting initial tree had 30 terminal nodes with deviance $D^2$ = 4953. Figure 4 explores all subtrees of the initial tree by plotting the deviance against the number of terminal nodes. The figure indicates that the main gain in precision is obtained by the 9-node tree, with $D^2$ = 5344 and relative error $E^2$ = 5344/6953 = 0.76. The next tree for consideration has 16 nodes, and $D^2$ = 5147. The relatively small improvement in deviance does not seem to justify the added complexity (seven additional nodes). Figure 5 displays the resulting tree, and Table 1 characterizes the nodes. The salient features of the 9-node tree are as follows:

  • Level 0. At the root, the overall mean RSN is 0.98.

  • Level 1. The first split partitions the data by the wind sector indicator w, associated with the main stationary sources. On the right branch are observations for which w > 1.5, indicating the refinery direction (n = 493), and on the left branch are all other wind sectors. The mean RSN in the direction of the refineries is 2.00, as compared with 0.89 for all other observations.

  • Level 2. At the second level, both nodes are partitioned by wind speed at thresholds of 3.75 and 2.75 m s⁻¹ on the left- and right-hand sides, respectively. Mean RSN values are larger for faster wind episodes than for slower wind episodes (1.05 as compared with 0.71 on the left branch and 2.80 as compared with 1.05 on the right branch).

  • Level 3, nonrefineries sector. A further split on the left branch reflects diurnal variation of RSN during faster wind episodes, with night and day means of 1.68 and 0.94, respectively. The relative error of the left branch is $E^2$ = 0.91.

  • Levels 3 and higher, refineries sector. For the faster wind speed episodes, on the right-hand side, a further partitioning by temperature and wind speed is applied. The model identifies two small subgroups of observations (n = 15 and 10), with extremely high mean RSN (5.31 and 7.01 in nodes 7 and 9, respectively). It is clear that the RSN does not increase monotonically with temperature. The relative error of this branch is lower than that of the left branch ($E^2$ = 0.73). These findings are consistent with the descriptive analysis of section 2 and with the meteorological mechanism prevailing in the Ashdod airshed. The model adds insight into the pollution process by determining the cutoff points that define the underlying meteorological structure.

This tree illustrates the flexibility of the model in accommodating, for example, nonlinear relationships. It also shows that the same predictor may be used in several levels of the model. The asymmetry of the tree is evident, in that 6 of the 9 terminal nodes explain the structure of high values of RSN (which are only 8% of the observations). Because RSN values attributable to mobile sources are expected to be low, a further investigation of the variability of low values of RSN is carried out through a regression tree in logarithmic scale.
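The trade-off between subtree size and deviance can be made concrete with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, the deviance is assumed to be the within-node sum of squared deviations from the node mean, and the three deviance values are the ones quoted above for the 9-, 16-, and 30-node trees.

```python
def deviance(values):
    """Within-node deviance: sum of squared deviations from the node mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def relative_error(nodes):
    """E2: summed within-node deviance divided by the deviance of the pooled data."""
    pooled = [v for node in nodes for v in node]
    return sum(deviance(node) for node in nodes) / deviance(pooled)


def gain_per_added_node(deviance_by_size):
    """Deviance reduction per additional terminal node between successive subtrees."""
    sizes = sorted(deviance_by_size)
    gains = {}
    for small, large in zip(sizes, sizes[1:]):
        drop = deviance_by_size[small] - deviance_by_size[large]
        gains[(small, large)] = drop / (large - small)
    return gains


# Deviances quoted in the text for the candidate subtrees (original scale).
subtree_deviance = {9: 5344.0, 16: 5147.0, 30: 4953.0}

gains = gain_per_added_node(subtree_deviance)
for (small, large), gain in gains.items():
    print(f"{small} -> {large} nodes: deviance drops {gain:.1f} per added node")
```

With these numbers, moving from 9 to 16 nodes lowers the deviance by 197, which is less than 4% of the 9-node deviance.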

Regression tree in logarithmic scale

Criteria and considerations similar to those used for fitting the previous tree indicated a 9-nodes tree for the logarithm of RSN. The resulting tree is displayed in Fig. 6, and the nodes are defined and characterized in Table 1. The tree has an overall relative error of E2 = 0.73 and is nearly symmetric. The first-level split partitions the data by wind speed, where again slower winds are associated with lower values of RSN. At the second level, spells of easterly winds are separated from those of noneasterly winds, with mean RSN of 0.47 and 0.74, respectively. On the right, the split reflects diurnal variation, with means of 0.97 and 1.74 in nighttime and in daytime, respectively. Note that the cited means are computed in the original scale.

Combined partition

The original- and log-scale trees define two partitions of the data into subsets (nodes). Denote these partitions by Torg and Tlog, respectively, and the ith node of a tree T, by T(i). The partition Tlog focuses on the within-node homogeneity of low RSN values, and Torg focuses on homogeneity of high RSN values. Because both low and high values of RSN are of interest, we combine the two models by forming a new partition. The combined partition Tcom is defined by the pairwise intersections of the subsets {Torg(i)} and {Tlog(j)}. Generally, this partition cannot be described by a binary tree. Nevertheless, its relative error E2(Tcom) can be computed by summing the squared error over all subsets. Because, for each pair of subsets Tcom(i) and Torg(j), Tcom(i) is either included in Torg(j) or the two subsets are disjoint, Tcom is a refinement of Torg (Breiman et al. 1984, section 9.2). In a similar way, Tcom is a refinement of Tlog. Therefore, E2(Tcom) ≤ E2(Torg) and E2(Tcom) ≤ E2(Tlog); that is, Tcom performs at least as well as both its parent trees.
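The refinement argument can be checked numerically. The sketch below is a minimal illustration on made-up data: the RSN values and the node labels are hypothetical, and all errors are computed in a single scale. It forms the combined partition by pairing the two node labels of each observation and confirms that its squared error does not exceed that of either parent partition.

```python
from collections import defaultdict


def partition_sse(y, labels):
    """Sum of squared errors around the subset means of a partition."""
    groups = defaultdict(list)
    for value, label in zip(y, labels):
        groups[label].append(value)
    total = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


# Hypothetical RSN values with hypothetical node labels from two trees.
y = [0.2, 0.4, 0.9, 1.1, 2.0, 5.0, 0.3, 1.5]
t_org = ["o1", "o1", "o1", "o1", "o2", "o2", "o1", "o2"]
t_log = ["l1", "l1", "l2", "l2", "l2", "l2", "l1", "l2"]

# Combined partition: each observation is labeled by its pair of nodes,
# so only nonempty intersections appear.
t_com = list(zip(t_org, t_log))

root_sse = partition_sse(y, ["root"] * len(y))
e_org = partition_sse(y, t_org) / root_sse
e_log = partition_sse(y, t_log) / root_sse
e_com = partition_sse(y, t_com) / root_sse

print(f"nonempty subsets: {len(set(t_com))} of "
      f"{len(set(t_org)) * len(set(t_log))} potential")
print(f"E2(Torg)={e_org:.3f}  E2(Tlog)={e_log:.3f}  E2(Tcom)={e_com:.3f}")
```

As in the analysis that follows, some of the potential intersections are empty, so the combined partition has fewer subsets than the product of the two node counts.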

Analysis of the combined partition indicates that there are 22 nonempty subsets (out of 81 potential sets). The intersections of Torg(2) and Tlog(5) and of Torg(9) and Tlog(6) comprise two and four observations, respectively. These two subsets were omitted from the analysis. Table 2 displays the remaining 20 subsets, sorted by their mean RSN. As expected, the most prominent feature of this partition is that Torg(1), consisting of low RSN values, intersects with essentially six nodes of Tlog, and nodes Torg(5)–Torg(9), consisting of the highest values of RSN, are collapsed into two nodes in Tlog. In summary, the separate analyses based on Torg and Tlog highlight the upper and lower tails of the distribution of RSN, respectively. In contrast, the combined partition, based on the intersection of the two trees, incorporates the sources of variability in the full range of RSN values.

Linear regression model and cross validation

The last model we consider is the parametric multiple linear regression model. To comply with the linearity and variance homogeneity assumptions of the model, we use log(RSN) as the response variable. Because the results of CART indicate that the relationship between RSN and temperature is nonlinear, a second-degree polynomial in temperature was fitted to the data. Because the wind sector indicator receives four values, three dummy variables (w*−1, w*1, and w*2) were defined to represent this factor. In addition, interactions of {w*i} with all other explanatory variables were tested for statistical significance. A stepwise regression algorithm was used to estimate the model for 1996–97. The resulting model, including all coefficients with significance level P ≤ 0.05, in the order in which they entered the model, is given by
[Equation: fitted stepwise regression model for y; coefficients not recoverable from this text]
where y = log(RSN), wds is wind speed, and tem is temperature. The percent of explained variance is R2 = 0.30 (E2 = 0.70). Hence, the regression equation is consistent with CART's findings: wind speed plays a major role in determining RSN levels, with different slopes for the refineries and for the power plant wind sectors. RSN levels also vary with interactions of the time of day and of the time of year with wind sector indicators. Temperature is the least influential factor.
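The coding of the explanatory variables described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' fitted model: the sector codes −1, 0, 1, and 2 (with 0 as the reference level) are inferred from the names of the dummy variables, only the interaction with wind speed is shown, and the function and key names are hypothetical.

```python
def design_row(w, wds, tem):
    """Build one row of regressors for the linear model of log(RSN).

    w   : wind sector code, assumed to take the values -1, 0, 1, 2
    wds : wind speed
    tem : temperature
    """
    # Three dummy variables for a four-valued factor; w == 0 is the reference.
    dummies = {f"w*{level}": 1.0 if w == level else 0.0 for level in (-1, 1, 2)}

    # Second-degree polynomial in temperature, linear term in wind speed.
    row = {"wds": wds, "tem": tem, "tem^2": tem ** 2}
    row.update(dummies)

    # Interactions of the sector dummies with wind speed allow a different
    # wind speed slope in each sector.
    for name, value in dummies.items():
        row[f"{name}:wds"] = value * wds
    return row


example = design_row(w=2, wds=3.0, tem=20.0)
print(example)
```

A stepwise procedure would then select among such columns; the selection step itself is not shown here.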

To obtain a cross-validation estimate of the relative error, separate trees for 1996 and 1997 are constructed. The model constructed for 1996 is applied to 1997 data, and vice versa. The first and last rows in Table 3 display the ordinary estimates of relative error based on the training set. The relative error for the combined partition Tcom was computed in both the original and logarithmic scales. It is seen that the models have similar cross-validation error estimates, ranging from 0.77 to 0.79 in the log scale and between 0.83 and 0.84 in the original scale. The cross-validation estimates are higher, on average, by 8%–10% than the respective ordinary estimates. Note that the linear and tree-based models have roughly the same predictive power, despite the fact that categories of temperature and wind speed are used in the tree-based model as opposed to the more detailed continuous values, used in the linear model. We conclude that, for the classification of meteorological conditions by their effect on RSN, the combined partition is more informative than the regression equation. We therefore use Tcom in the subsequent analysis.
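The year-against-year scheme can be sketched in a few lines. The example below is a simplification under explicit assumptions: the partition is held fixed and only the node means are re-estimated for each year (the study builds a separate tree per year), the relative error is taken as the squared prediction error divided by the squared deviation from the test-year mean, and the data and node labels are invented.

```python
from collections import defaultdict


def fit_node_means(records):
    """Estimate the mean RSN of each node from (node, rsn) records."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for node, rsn in records:
        totals[node] += rsn
        counts[node] += 1
    return {node: totals[node] / counts[node] for node in totals}


def relative_error(records, node_means):
    """Squared prediction error relative to the squared error of the overall mean."""
    values = [rsn for _, rsn in records]
    overall = sum(values) / len(values)
    sse = sum((rsn - node_means[node]) ** 2 for node, rsn in records)
    sst = sum((rsn - overall) ** 2 for rsn in values)
    return sse / sst


# Hypothetical (node, RSN) records for the two years.
data = {
    1996: [("slow", 0.5), ("slow", 0.7), ("fast", 1.8), ("fast", 2.2)],
    1997: [("slow", 0.4), ("slow", 0.8), ("fast", 1.5), ("fast", 2.9)],
}

models = {year: fit_node_means(rows) for year, rows in data.items()}

# Ordinary estimate: model and data from the same year.
ordinary = {year: relative_error(data[year], models[year]) for year in data}

# Cross-validation estimate: model from the other year.
cross = {
    1996: relative_error(data[1996], models[1997]),
    1997: relative_error(data[1997], models[1996]),
}

for year in data:
    print(year, f"ordinary={ordinary[year]:.3f}", f"cross={cross[year]:.3f}")
```

For a fixed partition the ordinary estimate can never exceed the cross-validation estimate on the same test year, because node means fitted to a year minimize the squared error for that year.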

Results

Sources of variability in RSN

The 20 subsets of the combined partition reflect four underlying RSN levels, depicted in Table 2: very low (subsets 1–4), low (5–10), medium (11–15), and high (16–20). Figure 7 displays the distribution of observations in these four levels. The salient meteorological characteristics of observations included in each level are as follows:

  • Very low, interquartile RSN range [0.12, 0.48]. Observations in this level are characterized by easterly slow winds, not in the summer (highest frequency between November and January), and mainly in the cooler hours of the day (0600–1100 and 1500–0000 LST). Toward the end of autumn and most of the winter, when observed RSN in this level is the most frequent, 50%–60% of the winds blow from the easterly sector at a mean speed of 2.5 m s−1. Being a marker for traffic-related pollution and characterized by a stable and persistent spatial pattern (Lebret et al. 2000), NOx is probably trapped within the stable atmospheric surface layer formed by the rapid nocturnal radiative heat losses from the open agricultural area to the east of the receptor (see Fig. 1). Because morning and evening heavy traffic are the main sources of pollution east of the receptor, pollution events in this category are mainly attributable to mobile sources.

  • Low, interquartile range [0.43, 1.00]. Episodes in this category are typically in the western wind sector (43% in the power plant sector and the rest in the southwesterly sector), mainly during the day (0800–1800 LST), throughout the year. This category represents levels of RSN attributable to the power plant, because it includes 95% of the episodes downwind of this source. These are, apparently, indistinguishable from typical RSN values attributable to daytime emissions of stationary areal sources in the southwesterly sector.

  • Medium, interquartile range [0.74, 1.90]. Observations in this level occur either during episodes of southwesterly winds (peak at 260°–280°) or downwind of the refineries, mainly outside daytime hours, and in nonsummer months (peak in March–April). Hence, medium levels of RSN are attributable to local industry during the cool hours of the day and to the refineries when the wind is slow. Surface wind distributions for March and April around 1900 LST (mode of the RSN distribution in this level) indicate that in March the most frequent wind directions are westerlies (11%) and northeasterlies (10%). In April, the prevailing winds are northerlies (21%) and northeasterlies (11%). This result explains both kinds of observed episodes: local industry affecting the receptor in the westerly wind regime and influence of the refineries during north-to-northeast flows.

  • High, interquartile range [1.07, 4.30]. This category comprises episodes of moderate to fast winds (>2.75 m s−1), from the refineries sector, at temperatures exceeding 16.85°C, in late afternoon and early evening (1600–2100 LST), and during the transitional seasons. This may be explained by the high frequency (23%) of north-to-northeasterly winds characterizing the wind flow during the transitional seasons. Several factors act in concert to reduce plume rise from the refineries sector, so that its sulfur-rich plumes disperse less efficiently, which leads to high RSN levels in the downwind sector: (a) slightly to moderately stable conditions, persisting during these seasons and time of day, which resist upward vertical motions; (b) moderate to strong winds, which carry the refinery plumes downwind and suppress their initial rise; and (c) the relatively low temperatures of the emitted gases of these low stacks, which favor a rapid loss of buoyancy.

The next section assesses the contribution of different sources in the Ashdod area to the number of episodes of increased concentrations of SO2 and NOx during 1996–97.

Analysis of episodes of increased concentrations

The findings of the previous section indicate that the SO2/NOx signature, in conjunction with wind direction, is a useful indicator of pollution sources. Analysis of RSN values for episodes with SO2 > 125 μg m−3 or NOx > 235 μg m−3 (one-quarter of the respective half-hourly air quality standards) shows that (a) for all wind sectors except the power plant sector, RSN values either do not exceed 0.37 or are higher than 0.53 and (b) daytime SO2 episodes in the power plant sector, included in the low-level category, have a minimal RSN value of 0.49. However, for NOx exceedances in this sector, RSN values are continuous, precluding a clear-cut distinction between traffic and power plant emissions. These findings yield the following source attribution rules: observations with RSN not exceeding 0.5 (approximately the third quartile of the very low level) were attributed to mobile sources, except for the power plant sector, for which the threshold was conservatively set to 0.1 (approximately the first quartile of the very low level). Observations with RSN larger than 0.5 were attributed to stationary sources except for the power plant, for which a threshold of 0.3 was used.
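The attribution rules just stated can be written as a small decision function. This is a hedged sketch rather than the full classification of Table 4: the wind direction criteria that separate the individual stationary sources are not reproduced, the "undefined" band between 0.1 and 0.3 in the power plant sector is inferred from the thresholds given in the text, and the function name and labels are hypothetical.

```python
def attribute_source(rsn, power_plant_sector):
    """Attribute an exceedance to a source class from its RSN value.

    rsn                : SO2/NOx ratio of the episode
    power_plant_sector : True if the wind blows from the power plant sector
    """
    if power_plant_sector:
        # Conservative thresholds for the power plant sector.
        if rsn <= 0.1:
            return "mobile"
        if rsn > 0.3:
            return "power plant"
        return "undefined"
    # All other wind sectors: a single threshold at 0.5.
    if rsn <= 0.5:
        return "mobile"
    return "stationary"


for rsn, sector in [(0.05, True), (0.2, True), (0.49, True), (0.37, False), (0.53, False)]:
    print(rsn, sector, attribute_source(rsn, sector))
```

In the other wind sectors the stationary class would still need the wind direction to distinguish the refineries from local industry.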

Although pollution events may result from a superposition of sources, episodes of increased SO2 and NOx concentrations can frequently be classified by a dominant source. Table 4 specifies the classification criteria and the estimated number of exceedances by source. Because wind direction is indicative of different stationary sources, intervals of 10°–15° separate the different sectors. However, pollution episodes attributable to mobile sources are identified mainly by their RSN fingerprint, and therefore the wind sector definitions for these sources are exhaustive. The results indicate that the respective numbers of SO2 and NOx exceedances in 1996–97 are 485 and 634. About 42% of the elevated SO2 concentrations are attributed to the power plant, 33% to the refineries, 11% to local industry, and 14% to undefined sources. Most of the 66 SO2 exceedances in the “undefined” group are in the 265°–279° sector, where it is unclear whether the power plant or local industry is accountable. The SO2 1996 emissions inventory in the area indicates that, of the total emitted 36 343 Mg, 27 015 Mg (74.3%) were emitted by the power plant, 8010 Mg (22.0%) by the refineries, and the rest (3.7%) by local industry. Inventories for 1997 had a similar distribution. As expected, the relative contribution of the power plant to the total emissions is higher than its relative contribution to ambient concentrations, and the inverse is true for the refineries. This result is explained by the higher effective height of the power plant stack as compared with the refinery stacks.

NOx exceedances are mainly accounted for by traffic (91%), the power plant (4.5%), and undefined sources (4.5%). Most of the 29 NOx exceedances in the undefined category have RSN between 0.1 and 0.3. No inventories for NOx emissions are available for the study area. Comparison between 1996 and 1997 indicates that the number of elevated NOx episodes slightly decreased (329 and 305 in 1996 and 1997, respectively). For SO2, the number increased by approximately 150% (from 139 in 1996 to 346 in 1997).
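The counts quoted in this section are mutually consistent, as the following arithmetic check shows. It uses only numbers stated in the text; the variable names are arbitrary.

```python
# Yearly exceedance counts quoted in the text.
so2 = {1996: 139, 1997: 346}
nox = {1996: 329, 1997: 305}

so2_total = sum(so2.values())
nox_total = sum(nox.values())

# Relative change from 1996 to 1997, in percent.
so2_change_pct = 100.0 * (so2[1997] - so2[1996]) / so2[1996]
nox_change_pct = 100.0 * (nox[1997] - nox[1996]) / nox[1996]

# Share of exceedances left in the "undefined" group, in percent.
so2_undefined_pct = 100.0 * 66 / so2_total
nox_undefined_pct = 100.0 * 29 / nox_total

print(f"SO2 total {so2_total}, change {so2_change_pct:.0f}%, undefined {so2_undefined_pct:.1f}%")
print(f"NOx total {nox_total}, change {nox_change_pct:.0f}%, undefined {nox_undefined_pct:.1f}%")
```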

Figure 8 shows the seasonal distribution of SO2 exceedances. The lowest frequency is in the midsummer months of July and August. The highest impact of power plant emissions is in April and May and of the refineries is in April and October–November. Areal stationary sources also have the largest contribution in April. Traffic emissions, manifested in high NOx episodes, have the largest effect between November and January (69% of the exceedances) and almost no effect during the summer.

Summary and conclusions

The paper proposes a method for studying the source–receptor relationship when regular emission inventories are missing or incomplete. The usefulness of the SO2/NOx signature in distinguishing between pollution sources was demonstrated for the southern coastal plain of Israel, where diverse pollution sources affect air quality. The sources of variability of RSN were analyzed by tree-based regression models and by a linear model. The main advantage of the tree-based procedures proved to be the insight gained by the unique subset structure that they defined. The combination of two trees, in original and logarithmic scales, enabled a simultaneous investigation of the low and high RSN values. The resulting combined partition highlighted the underlying meteorological conditions, which yield different levels of RSN. The agreement between the combined partition and the inferential findings of the linear regression model strengthened the analysis. Furthermore, although tree-based predictions are constant for all observations in the same subset, the relative prediction error of CART was similar to that of the linear model.

The explanatory power of the variables at hand was low, as suggested by the fact that the relative error of a “near-perfect” tree (912 nodes) is E2 = 0.51. Clearly, more accurate meteorological data based on synoptic conditions, depth of the mixing layer, ventilation rates, and other ABL parameters are expected to improve the model performance. This area will be dealt with in future work.

The results show that RSN identifies mobile sources for all wind sectors except for the power plant sector. For stationary sources, the RSN signature should be supplemented by wind direction.

It is interesting that, in earlier years, RSN values in the power plant sector were indistinguishable from those in the refineries sector. The decrease in the power plant SO2/NOx signature is a direct result of the improvement in fuel quality and of the control policy.

Classification of episodes of increased SO2 and NOx concentrations by their probable source indicates that power plant SO2 emissions are controlled successfully during the summer but not during winter and spring and that refineries' SO2 emissions have a prominent effect on air quality during the transitional seasons. NOx emissions are mostly attributable to mobile sources. It is regrettable that the number of NOx exceedances beyond one-quarter of the air quality standard nearly tripled between 1995 and 1996, indicating that traffic emissions have become a major air quality problem in the study area.

Acknowledgments

The authors thank Doron Lahav of the Ashdod–Hevel Yavne Association of Towns for the Protection of Environment Quality for providing the data and Michal Kidron from the Cartographic Laboratory of The Hebrew University for her kind assistance in the preparation of the figures. The authors also thank the referees for their valuable comments, which led to significant improvement in the paper. This research was partially funded by the Ministry of the Environment, Israel.

REFERENCES

  • Balmor, Y., A. Gutman, and U. Dayan. 1988. Wind and temperature structure of the atmospheric boundary layer in a coastal versus inland site in Israel. Proc. EURASAP Conf: Meteorology and Atmospheric Dispersion in a Coastal Area, Riso, Denmark, EURASAP, 15–24.

  • Benkovitz, C. M. and Coauthors, 1996. Global gridded inventories of anthropogenic emissions of sulfur and nitrogen. J. Geophys. Res. Atmos 101:29239–29253.

  • Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Wadsworth and Brooks, 358 pp.

  • Burrows, W. R. 1997. CART regression models for predicting UV radiation at the ground in the presence of cloud and other environmental factors. J. Appl. Meteor 36:531–544.

  • ——, M. Benjamin, S. Beauchamp, E. R. Lord, D. McCollor, and B. Thomson. 1995. CART decision-tree statistical analysis and prediction of summer season maximum surface ozone for the Vancouver, Montreal, and Atlantic regions of Canada. J. Appl. Meteor 34:1848–1862.

  • Carter, M. M. and J. B. Elsner. 1997. Statistical method for forecasting rainfall over Puerto Rico. Wea. Forecasting 12:515–525.

  • Dayan, U. and J. Koch. 1989. Assessment of the critical conditions for dispersion and transport of plumes from tall stacks in the Haifa area. Proc. Fourth Int. Conf. of the Israel Society for Ecology and Environmental Quality Sciences, Jerusalem, Israel, ISEEQS, 27–36.

  • ——, R. Shenhav, and M. Graber. 1988. The spatial and temporal behavior of the mixed layer in Israel. J. Appl. Meteor 27:1382–1394.

  • Duncan, B. N., A. W. Stelson, and C. S. Kiang. 1995. Estimated contribution of power plants to ambient nitrogen-oxides measured in Atlanta, Georgia in August 1992. Atmos. Environ 29:3043–3054.

  • Efron, B. and R. J. Tibshirani. 1993. An Introduction to the Bootstrap. Chapman and Hall, 436 pp.

  • Gifford, F. A. 1976. Turbulence diffusion typing schemes—A review. Nucl. Saf 17:68–86.

  • Hopke, P. K. 1985. Receptor Modeling in Environmental Chemistry. John Wiley and Sons, 319 pp.

  • Koch, J. and U. Dayan. 1992. A synoptic analysis of the meteorological conditions affecting dispersion of pollutants emitted from tall stacks in the coastal plain of Israel. Atmos. Environ 26A:2537–2543.

  • Lebret, E. and Coauthors, 2000. Small area variations in ambient NO2 concentrations in four European areas. Atmos. Environ 34:177–185.

  • MathSoft, 1998. S-PLUS 5 for UNIX Guide to Statistics. Data Analysis Division, MathSoft, 1014 pp.

  • Ryan, W. F. 1995. Forecasting severe ozone episodes in the Baltimore metropolitan-area. Atmos. Environ 29:2387–2398.

  • Slade, D. H. 1968. Meteorology and Atomic Energy. Office of Information Services, U.S. Atomic Energy Commission, 444 pp.

  • Swietlicki, E., S. Puri, H-C. Hansson, and H. Edner. 1996. Urban air pollution source apportionment using a combination of aerosol and gas monitoring techniques. Atmos. Environ 30:2795–2809.

  • Venables, W. N. and B. D. Ripley. 1994. Modern Applied Statistics with S-Plus. Springer, 462 pp.

  • Walmsley, J. L., W. R. Burrows, and S. Schemenauer. 1999. The use of routine weather observations to calculate liquid water content in summertime high-elevation fog. J. Appl. Meteor 38:369–384.

  • Watson, J. G., N. F. Robinson, J. C. Chow, R. C. Henry, B. M. Kim, T. J. Pace, E. L. Meyer, and Q. Nguyen. 1990. The USEPA/DRI Chemical Mass Balance Receptor Model, CMB7.0. Environ. Software 5:38–49.


Fig. 1.

The Ashdod–Hevel Yavne area, depicting the Ashdod air quality monitoring station (star), the power plant (square), the oil refineries (circle), and local industries (triangle)

Citation: Journal of Applied Meteorology 40, 7; 10.1175/1520-0450(2001)040<1209:OTROSD>2.0.CO;2

Fig. 2.

Distribution of RSN attributable to the power plant for 1987–97. Included events satisfy: SO2 > 20 μg m−3, 0 < RSN < 15, wind sector within 280°–340°, and time span 0900–1800 LST on days for which the minimum temperature between 1100 and 1700 LST exceeds 23°C. The dashed line connects the median RSN values, and the solid line depicts annual amounts of SO2 (metric kilotons) emitted by the power plant. The box indicates the interquartile range; the vertical lines extend to the 5th and 95th percentiles (n = 6822)

Fig. 3.

Mean RSN by wind sector and season: (a) 1100–1500 LST and (b) 1800–2200 LST. Values are presented in a square root scale. The length of the arrow corresponds to RSN = 1

Fig. 4.

Reduction in deviance as a function of the number of nodes for the 1996–97 data in original scale. The vertical line indicates the deviance for a 9-nodes subtree (n = 5868)

Fig. 5.

A 9-nodes tree in original scale for 1996–97 (Torg). At each node, the mean value of RSN appears inside the ellipse (rectangle for terminal nodes) and the within-node deviance D2i is beneath it. The split rule appears across the branches

Fig. 6.

A 9-nodes tree in logarithmic scale for 1996–97 (Tlog)

Fig. 7.

Distribution of RSN by level, 1996–97. The box indicates the interquartile range, the center horizontal line is the median, and the vertical lines extend to 1.5 times the interquartile range. Dots above the staple represent outliers

Fig. 8.

Number of exceedances of SO2 beyond one-quarter of the air quality standard (125 μg m−3) by pollution source and month, 1996–97 (n = 485)

Table 1.

Definition of nodes in Torg (Fig. 5) and Tlog (Fig. 6) and distributional properties of RSN. Day: 0900–1800 LST; summer: May–Sep

Table 2.

Distribution of RSN in the combined partition Tcom defined by the intersection of terminal nodes of Torg and Tlog

Table 3.

Ordinary and cross-validation estimates of relative error for Torg, Tlog, Tcom, and the linear regression model, by year

Table 4.

Characterization of pollution sources and the number of exceedances beyond one-quarter of the respective air quality standards, 1996–97
