
A Fingerprinting Technique for Major Weather Events

  • 1 The Pennsylvania State University, University Park, Pennsylvania
  • 2 National Weather Service, State College, Pennsylvania
  • 3 ZedX, Inc., Bellefonte, Pennsylvania

Abstract

Advances in numerical weather prediction have occurred on numerous fronts, from sophisticated physics packages in the latest mesoscale models to multimodel ensembles of medium-range predictions. Thus, the skill of numerical weather forecasts continues to increase. Statistical techniques have further increased the utility of these predictions. The availability of large atmospheric datasets and faster computers has made pattern recognition of major weather events a feasible means of statistically enhancing the value of numerical forecasts. This paper examines the utility of pattern recognition in assisting the prediction of severe and major weather in the Middle Atlantic region. An important innovation in this work is that the analog technique is applied to NWP forecast maps as a pattern-recognition tool rather than to analysis maps as a forecast tool. A technique is described that employs a new clustering algorithm to objectively identify the anomaly patterns or “fingerprints” associated with past events. Also discussed are the potential refinement and applicability of this method as an operational forecasting tool, in which numerical weather prediction forecasts are compared with fingerprints already identified for major weather events.

Corresponding author address: Paul Knight, The Pennsylvania State University, Department of Meteorology, 503 Walker Bldg., University Park, PA 16802. Email: pgk2@psu.edu


1. Introduction

Major weather events continue to pose a challenge for forecasters. Although there are notable success stories of predicting important storms, such as the 12–14 March 1993 superstorm (Uccellini et al. 1995) and the 6–8 January 1996 mid-Atlantic snowstorm (Hart and Grumm 2001), many major weather events are not recognized in the model output by even experienced forecasters. Pattern recognition and forecast analogs offer a paradigm for meeting this prediction challenge. The premise of pattern recognition and forecast analogs is that major weather events have repeatable and specific atmospheric anomaly fields that can be objectively identified within datasets such as the National Centers for Environmental Prediction (NCEP) reanalysis dataset. The results of that objective identification can be used to detect similar patterns in numerical weather prediction (NWP) forecasts. The fingerprint approach differs from traditional analog forecasting in that it follows a perfect-prognostic (“perfect prog”) method (Klein and Lewis 1970; Vislocky and Young 1989), searching the forecast fields rather than the current analysis for analogs. Thus, the pattern prediction skill depends upon the NWP model while the pattern recognition is handled by a statistical postprocessing system as described below.

This pattern-recognition system is based on the concept of weather types, which are defined by the American Meteorological Society as a series of generalized synoptic situations or patterns usually presented in chart form (Glickman 2000). Weather types are selected to represent typical pressure and frontal patterns and were originally devised as a method for lengthening the effective time range of forecasts. The concept of weather types and patterns implies the existence of analogs. These analog-based methods can be treated either as forecast systems or as pattern-recognition systems, as described below.

Analog forecasting has been shown to be useful but limited. Van den Dool (1994) demonstrated that approximately 10^30 years of data would be required to produce useful natural analogs of global 500-hPa height fields to within instrument error. He showed that an analog forecast method could show skill over using climatological means when relatively small domains are considered. The resulting 500-hPa field could also be used to estimate the surface weather. With a short data library and analogs over limited areas, the mission of diagnosing the sensible weather over a region has been one of the more successful applications of analogs (Van den Dool 1994). That study also concluded that a constructed analog, which is similar to a weighted average of natural analogs, is more skillful than the average of the top 10 unweighted natural analogs. Studies demonstrating the success of analogs over limited forecast domains can be found in Vislocky and Young (1989) and Kruizinga and Murphy (1983).

In contrast, this paper will demonstrate the value of analogs in pattern recognition over a limited domain. The key difference in the current work is that the analog technique is applied to NWP forecast maps as a pattern-recognition tool rather than to analysis maps as a forecast tool. Thus, the forecast skill of the system depends in large measure on the underlying NWP model rather than on the subsequent application of analogs.

In this paper, a method is presented that leverages the existence of limited-area analogs (Van den Dool 1994) and distinct weather patterns or fingerprints associated with major weather events. To accomplish this, a comprehensive database of types of major weather events was developed. The NCEP–National Center for Atmospheric Research (NCAR) reanalysis data were then processed to determine the pattern and potential predictors associated with each event type (see Table 1). Unlike previous analog and weather-pattern studies, the departures of the fields from normal in standard deviations were used, as in Hart and Grumm (2001) and Grumm and Hart (2001). Previous analog studies focused on correlating surface weather to 500- or 700-hPa patterns. In this study, anomalies instead of raw patterns determine analogs, allowing the system to better recognize the complex interaction of spatially distributed elements associated with major weather events of a particular type.

Past analyses have elucidated some of the patterns associated with specific major weather events. For example, synoptic-scale weather patterns associated with East Coast snowstorms have been identified (Uccellini and Kocin 1987). These include the presence of an area of surface convergence over the northwestern Atlantic Ocean, a strong surface anticyclone over New England and a cyclone along the East Coast. In addition, the presence of a strong upper-level jet entrance region and a coupled jet circulation are important parts of the large-scale pattern conducive to major East Coast winter storms. Grumm and Hart (2001) examined the anomalies in mean sea level pressure (MSLP), 500-hPa heights, and 850-hPa temperatures for the storms presented by Uccellini and Kocin (1987), revealing distinct anomaly patterns with these storms. Additional snowstorms were also examined by Grumm and Hart (2001). Their results showed that many record snowstorms were associated with conditions that departed significantly from normal. Fields such as MSLP and 500-hPa heights were typically two standard deviations below normal in the vicinity of these events. Grumm and Hart also demonstrated that numerical weather prediction models can successfully forecast significant departures from normal at least 2–3 days in advance and in some instances as much as 6 days prior to the event. The reliability of predicting extreme anomalies has, however, yet to be validated.

Likewise, specific anomaly patterns have also been observed with flooding in Virginia, which occurs with the combination of anomalous values of precipitable water, a maximum in the 700-hPa thermal field east of the region, and unusually strong negative values of the U component (east) of the 850-hPa winds. The latter is often coupled with a highly positive value of the V component (south) of 200-hPa wind along the East Coast (Knight and Evans 2000).

Namias (1982) described the pattern associated with heat waves in the United States. The key feature was a persistent and strong upper-level ridge. The association of anticyclones with heat waves observed in the United States appears to be a common thread in the United Kingdom as well. The accompanying subsidence produces cloud-free conditions and an inversion (Brugge 1995) that facilitate and maintain the low-level heating. Livezey and Tinker (1996) documented the importance of the strong and persistent anticyclonic conditions over the Midwest during the fatal 1995 Chicago heat wave.

This paper addresses the key elements associated with the major weather events noted in Table 1. The next section details the criteria used to define these events along with the sources of data that describe them. The following section presents an objective method used to estimate the importance of atmospheric data fields for each event. An innovative approach that objectively measures how maximum and minimum values of anomalies are spatially clustered is discussed along with the analysis of the anomalies’ relative magnitudes. A method for using this information to diagnose an event’s presence in the NWP forecast is also detailed. The results section discusses the plans for implementing an early alert tool for major weather events over the mid-Atlantic region. A data flowchart in Fig. 1 also serves to depict the sequences of these stages.

2. Methods and data

a. Major events database

For objective identification of the patterns associated with major weather events, it is paramount to develop criteria to identify particular weather phenomena and to use the criteria to create an extensive database of historical examples of such events. A relational database (DuBois 2005) of significant weather events was generated for a domain similar to the Middle Atlantic River Forecast Center (MARFC) region for the period 1950 to present. Most events correspond with those in the National Climatic Data Center (NCDC) storm-events database, and others have been obtained from additional sources. Fire occurrences were extracted from fire bills obtained from the Department of Forestry in Pennsylvania and, for the other states within the MARFC domain, from the NCDC storm-events database.

b. Event criteria and terminology

Twelve phenomena were selected for this study and are listed in Table 1. Each instance of a phenomenon is referred to as an occurrence. An occurrence describes a specific time and place and is local in scope. Occurrences are spatially or temporally grouped into events that compose a single storm or outbreak. These are regional in scope and are represented by an average location and time. For example, an occurrence of “snow” would be “26 inches of snow at Allentown, PA,” and the event that contains this occurrence would be “blizzard of January 7–8, 1996.” Criteria were needed for deciding both which occurrences to include in the database and how to group occurrences into events. The details of the events, the threshold values, and the number of documented occurrences within the Middle Atlantic region are also displayed in Table 1.

c. Structure of the database

The occurrences are grouped into events and are given event index identifiers. For synoptic events that occur on longer time scales, such as heat waves, occurrences over consecutive dates compose an event. Mesoscale phenomena are more strictly grouped. For example, tornadoes must occur within a 6-h period of each other to be classified as a tornado event.
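To make the grouping step concrete, the following minimal sketch (in Python) clusters occurrence times into events with a simple temporal gap rule. The 6-h tornado window comes from the text; the function name, data structure, and example data are illustrative assumptions rather than the authors' implementation.

```python
from datetime import datetime, timedelta

def group_occurrences(times, max_gap_hours=6):
    """Group occurrence times into events: a new event begins whenever the gap
    to the previous occurrence exceeds max_gap_hours (6 h for tornadoes)."""
    events, current = [], []
    for t in sorted(times):
        if current and (t - current[-1]) > timedelta(hours=max_gap_hours):
            events.append(current)
            current = []
        current.append(t)
    if current:
        events.append(current)
    return events

# Three hypothetical tornado occurrences; the first two fall within 6 h and form one event.
occurrences = [datetime(1998, 6, 2, 18), datetime(1998, 6, 2, 21), datetime(1998, 6, 3, 9)]
print([len(e) for e in group_occurrences(occurrences)])  # -> [2, 1]
```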

Quality assurance included spot checks. For the purposes of the fingerprinting algorithm, an average date–time and latitude–longitude are compiled for each event. The date–time field typically refers to the “peak time” of the event, which is a time between the start and end time at which the event is estimated to be most intense. A Web interface (http://www.climate.psu.edu/data/events/) facilitated access to the data.

d. Historical reanalysis

The meteorological analyses and their anomaly fields are obtained for all events from the NCEP–NCAR global reanalysis dataset (Kalnay et al. 1996; Kistler et al. 2001), using a subset of the domain covering the eastern two-thirds of the United States. The reanalysis dataset is global, with a horizontal grid spacing of 2.5° × 2.5°, 17 vertical pressure levels, and a 6-h temporal resolution beginning at 0000 UTC 1 January 1948.

Because this research focuses on the eastern United States, where the density and quality of the observing network were sufficient throughout the reanalysis period, the reanalysis is considered adequate for this purpose. Future research will use the newly available higher-resolution historical reanalysis developed using the NCEP Eta data assimilation system (Black 1994; Rogers et al. 1996, 1997) applied over the period 1979–present and referred to as the North American Regional Reanalysis (NARR) project (Mesinger et al. 2006). This dataset exceeded the capacity of the available computational resources until recently.

A number of meteorological variables from the NCEP–NCAR reanalysis dataset are used in this pattern recognition. These include geopotential heights, temperatures, specific humidity, meridional and zonal winds, sea level pressure, precipitable water, and 2-m temperature. In addition, several other variables are constructed from these elementary fields: thickness, shear, and wind speed.

Because identifying anomaly patterns is the goal of this research, a technique similar to that developed by Hart and Grumm (2001) is used to generate and standardize the magnitude of field anomalies from the reanalysis. This technique begins with calculating the World Meteorological Organization standard 30-yr (1971–2000) climatological mean X and standard deviation σ from all NCEP reanalysis and derived fields for each 6-h period within a calendar year. Each 6-h series is smoothed with a 21-day centered running mean (0000, 0600, 1200, and 1800 UTC). This process preserves both the annual and diurnal cycles while smoothing the irregularities resulting from undersampling of season-to-season variations. The climatological mean is then subtracted from the observed field X and is divided by that field’s standard deviation (at each 6-h interval) to arrive at a normalized anomaly N (this technique is applied for all 6-h periods from 1948 to 2004):
N = (X - \bar{X}) / \sigma .    (1)

All fields are normalized despite the fact that some fields, such as wind, are not best described by a normal distribution, though they have been demonstrated to be a close approximation (Grumm and Hart 2001). In addition, the normalization process aids the recognition of the extreme events because the magnitude of the standardized anomalies is independent of season and location, which is a characteristic that is important in the recognition of patterns conducive to major weather events. The compiled datasets are then used in an objective statistical model to examine the major weather event and its linked patterns.
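As a rough illustration of Eq. (1) and the smoothed climatology it relies on, the sketch below computes standardized anomalies with NumPy. The array shapes, the synthetic data, and the treatment of a single 6-h series are assumptions for the example only; the operational system works on the full reanalysis archive.

```python
import numpy as np

def smooth_climatology(clim, window=21):
    """21-day centered running mean for one 6-h series (e.g., the 0000 UTC
    climatology), wrapping around the calendar year. Axis 0 is day of year."""
    half = window // 2
    padded = np.concatenate([clim[-half:], clim, clim[:half]], axis=0)
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="valid"), 0, padded)

def normalized_anomaly(field, clim_mean, clim_std):
    """Eq. (1): N = (X - Xbar) / sigma, evaluated at every grid point."""
    return (field - clim_mean) / clim_std

# Synthetic stand-ins for a reanalysis field and its smoothed 30-yr climatology.
rng = np.random.default_rng(0)
clim_mean = smooth_climatology(rng.normal(280.0, 5.0, size=(365, 10, 10)))
clim_std = smooth_climatology(rng.uniform(2.0, 6.0, size=(365, 10, 10)))
todays_field = rng.normal(285.0, 5.0, size=(10, 10))
N = normalized_anomaly(todays_field, clim_mean[196], clim_std[196])  # day 196 ~ mid-July
```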

3. Pattern-recognition method

a. Techniques

Objective pattern recognition begins with the compilation of a training set. For this study, the training set consists of a multivariate group of anomaly fields at the time of each event in the database. Some of these fields, such as precipitable water, represent vertical integrals, whereas other fields such as U and V wind components, temperatures, heights, and specific humidity are computed at specific pressure levels.

The training set is composed of the primary and secondary peaks and valleys of selected anomaly fields. A peak (valley) is defined as the grid point having a standardized anomaly value that is larger (smaller) than or equal to the values of the surrounding grid points. The primary peak (valley) has a value that is larger (smaller) than those of all the other peaks (valleys) in the domain. The secondary peak (valley) has the second largest (smallest) value over the same domain. For each event, the training set contains the location and the standardized anomaly value for the primary and secondary peaks and valleys for select variables and levels. The data fields selected for major weather events are based on previous research by Knight and Evans (2000) and Grumm and Hart (2001). Some are well known, such as deep southerly flow and high precipitable water related to heavy rainfall, and others are culled from the experience of veteran forecasters to expedite the selection of pertinent data fields (N. Junker 2004, personal communication). Not all data fields were examined because the focus of the method is to identify fingerprints of major weather events from only the most important anomaly fields.
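A short sketch of the peak and valley extraction described above: for one standardized-anomaly grid it returns the primary and secondary peaks (and valleys), defined as grid points whose value is greater (less) than or equal to that of the surrounding grid points. The looping implementation is an assumption for clarity, not the authors' code.

```python
import numpy as np

def local_extrema(anom):
    """Return (peaks, valleys) as lists of ((i, j), value); the primary
    extremum is first in each list and the secondary is second."""
    ny, nx = anom.shape
    peaks, valleys = [], []
    for i in range(ny):
        for j in range(nx):
            nbrs = anom[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            if anom[i, j] >= nbrs.max():
                peaks.append(((i, j), anom[i, j]))
            if anom[i, j] <= nbrs.min():
                valleys.append(((i, j), anom[i, j]))
    peaks.sort(key=lambda p: p[1], reverse=True)   # primary, secondary, ...
    valleys.sort(key=lambda v: v[1])               # most negative first
    return peaks[:2], valleys[:2]

peaks, valleys = local_extrema(np.random.default_rng(1).normal(size=(20, 30)))
```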

The training method uses a spatial clustering algorithm to group anomaly fields for each type of event. The purpose of this step is to identify any spatial groupings of recurring peaks and valleys for each of these fields while also removing outlier extreme values. The clustering algorithm is applied to the set of locations for each of a given field’s primary and secondary peaks and valleys. For each field, the algorithm returns only the most significant cluster.

Clusters are identified using a new method called strong-point analysis. This method analyzes the gridpoint density of peaks and valleys to determine which grid points are likely to exhibit such extrema in future events. These “strong points” are identified in one of two ways. First, if the density at a grid point is greater than 1.85 standard deviations above the mean density using a gamma distribution, then that point is strong enough to stand on its own. Second, if no line can be drawn through the point without encountering an adjacent point with a density greater than 0.25 standard deviations below the mean density, then the point is strongly surrounded and is likely an interior point of the cluster. So long as the point itself is strong enough (density greater than 0.25 standard deviations below mean density), it too is designated as a strong point. Neighboring strong points are then connected into a network that composes the skeleton of a cluster. Other grid points embedded within the skeleton (the “weak points within”) are then padded onto the skeleton to complete the final cluster. To qualify for inclusion, a “weak point” must have its density weighted by the number of surrounding strong points greater than 0.25 standard deviations below the mean density. This approach allows for nonelliptical, fenestrated (with openings) clusters, which is an improvement over more conventional clustering techniques. Each peak or valley that fits into a cluster is called a member of that cluster. A cluster represents an array of grid points that, for a particular field, forms one portion of the event type’s multivariate/spatial fingerprint. Each weather parameter, such as temperature, contributes up to four fields to identify the fingerprint: one for each peak and valley (one primary and one secondary).
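The sketch below caricatures the two strong-point tests: a point qualifies outright when its density is far above the mean, or when it is not weak and is surrounded by non-weak neighbors. The gamma-distribution fit is replaced by a plain mean/standard-deviation threshold, and the "no line can be drawn" test is reduced to the four grid-aligned and diagonal lines, so this is a simplification of the published algorithm, not a reproduction of it; the skeleton-building and weak-point padding steps are omitted.

```python
import numpy as np

def strong_points(density, k_strong=1.85, k_weak=-0.25):
    """Simplified strong-point test on a gridpoint density of peak/valley counts.
    A point is strong if (i) its density exceeds the mean by more than k_strong
    standard deviations, or (ii) it is not weak itself and, along each of the
    four grid lines through it, both adjacent points are non-weak."""
    mu, sigma = density.mean(), density.std()
    weak = density < mu + k_weak * sigma          # below (mean - 0.25 std)
    ny, nx = density.shape
    strong = np.zeros(density.shape, dtype=bool)
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for i in range(ny):
        for j in range(nx):
            if density[i, j] > mu + k_strong * sigma:
                strong[i, j] = True
                continue
            if weak[i, j]:
                continue
            strong[i, j] = all(
                0 <= i + s * di < ny and 0 <= j + s * dj < nx and not weak[i + s * di, j + s * dj]
                for di, dj in directions for s in (1, -1)
            )
    return strong

demo = np.zeros((8, 8))
demo[3:5, 3:6] = 5.0
print(strong_points(demo).astype(int))
```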

This new clustering technique was developed because existing clustering techniques generally explicitly favor those clusters that are spatially compact rather than simply spatially contiguous (Wilks 1995; Witten and Frank 2005). Because the topography of the mid-Atlantic region features an elongated mountain range (the Appalachians), the quasi-linear Atlantic coast, and an extended group of Great Lakes, the spatial fingerprints of the appropriate synoptic/terrain interaction are apt to be elongated rather than compact clusters. It was thus necessary to depart from the traditional clustering algorithms such as K-means (Witten and Frank 2005) that seek primarily to minimize the distance between cluster members and their centroid. Instead, the strong-point analysis developed here seeks to include in a cluster all contiguous locations with a high frequency of peak or valley occurrence.

Other, non-cluster-based synoptic typing techniques, such as those using the Kirchhofer sum-of-squares score (Kirchhofer 1973; McKendry 1994) or Lund correlation (Lund 1963; Yarnal and Draves 1993), look for analogs to the entire anomaly field, whereas the hypothesis being examined here is that the most crucial aspects of the analog field are the location and intensity of the greatest anomalies. Thus, the strong-point technique was developed to extract those aspects while ignoring the weak areas of the field that are apt to be far less apropos to the occurrence of a major weather event. Issues related to this decision are discussed in depth in El-Kadi and Smithson (1996).

It is crucial to weigh each field’s significance so that its role in an event’s fingerprint can be objectively determined. To do so, the overall value of a field as an indicator is broken down into two components. The first component α is the measure of the cluster’s spatial significance, and the second component ϕ is the measure of the significance of the cluster members’ anomaly values. All fields have α and ϕ values for that field’s cluster.

The equation for a cluster’s α metric is
[Eq. (2)]
where Nclust,i is the number of members in the cluster and Nevent is the number of events of this type in the database, ρclust,i is the event density over the cluster’s grid points as a count per grid point, and ρi is the average event density of the field over the entire domain (see Fig. 2); Nclust,i/Nevent is called the consistency coefficient because its effect is to diminish a field’s influence on the fingerprint when the cluster contains only a fraction of the original set of events. This equation can be viewed as the comparison between the cluster’s density and the density of a completely random distribution of peaks or valleys. Therefore, a denser cluster will have a larger α value, indicating that the peak and valley locations play a consistent role in the event’s fingerprint. Negative values are prohibited because they would mean that the cluster represented a local minimum in density rather than a local maximum. The values of α can range from 0 (totally random) to 1 (a perfect cluster).
The equation for a cluster’s ϕ metric is
[Eq. (3)]
where σclust,i and μclust,i are the standard deviation and the mean of the cluster members’ standard normalized anomaly values, respectively. This equation scales the consistency coefficient by a quantity that is a function of how small the standard deviations are and how far the absolute value of their mean is from 1.5. Therefore, if the standard normalized anomaly values for a cluster have similar values, the standard deviation will be low, contributing to a larger ϕ value. If the average of the standard normalized anomaly values is close to ±1.5, then this would contribute to a smaller ϕ value. This equation can be interpreted as the comparison between the standard deviation and the mean of the cluster members’ values and the standard deviation and mean of a completely random distribution of standard anomaly values, which were determined to be approximately 1 and 1.5, respectively. Larger values of ϕ indicate that the peaks’ and valleys’ standard normalized anomaly values play a large part in the event type’s fingerprint. The range of values for ϕ is from 0 (random) to ∞ (impossible case).
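Since Eqs. (2) and (3) are not reproduced in this version, the sketch below implements one pair of expressions that is merely consistent with the stated properties: α multiplies the consistency coefficient by a comparison of the cluster density with the domain-average density (clipped at zero, bounded by 1), and ϕ scales the consistency coefficient by how far the members depart from a "random" distribution with standard deviation near 1 and absolute mean near 1.5. These functional forms are assumptions, not the published equations.

```python
import numpy as np

def alpha_metric(n_clust, n_event, rho_clust, rho_bar):
    """Assumed stand-in for Eq. (2): consistency coefficient times a density
    comparison; 0 for a random scatter, approaching 1 for a perfect cluster
    containing every event."""
    consistency = n_clust / n_event
    return consistency * max(0.0, (rho_clust - rho_bar) / rho_clust)

def phi_metric(n_clust, n_event, member_anoms):
    """Assumed stand-in for Eq. (3): consistency coefficient scaled by the
    departure of the members from a random distribution of standardized
    anomaly extrema (std ~ 1, |mean| ~ 1.5); 0 when random, unbounded as the
    members become tightly grouped away from +/-1.5."""
    sigma = max(float(np.std(member_anoms)), 1e-6)
    mu = float(np.mean(member_anoms))
    return (n_clust / n_event) * abs(abs(mu) - 1.5) / sigma

print(alpha_metric(42, 104, 1.4, 0.3), phi_metric(42, 104, [2.1, 2.4, 1.9, 2.6]))
```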

b. Fingerprint application

These two quantities (α, ϕ) are used to evaluate the contribution of each field to an event’s fingerprint. The algorithm is trained using historical cases from the events database and is then ready to be tested on any day’s weather pattern. For the training system to be effective, it must also recognize and distinguish between the various event types (i.e., snow or hail). To do this, it must be able to measure how well the peaks and valleys at a specific time match the fingerprints of various event types. As with the cluster metrics, the measure of an individual peak or valley and its spatial connection to other major event anomalies is separated into two components: γ and χ. The equation for γ measures how well an expected peak or valley matches the spatial locale of the corresponding cluster:
\gamma_i = \sum_{j} \exp( -\mathrm{Dist}_j^2 / R ) .    (4)

This equation sums the Barnes distance (Barnes 1964) between the location of the numerical prediction model forecast peak or valley and the location of each member j of that cluster. Here Distj is the great circle distance in kilometers, and R is the area-of-influence parameter in kilometers squared. It is currently set to 14 792 km2, which is approximately the area of a reanalysis grid box (2.5°). The nearer a forecast peak or valley is to the cluster, the larger the value of γ will be.

The maximum possible value of γ depends on the number of members within a particular cluster and on the spatial configuration of that cluster. Because there are an infinite number of possible configurations, there is no generalized way to find a maximum value for γ. Therefore, the maximum γ value for each cluster has to be approximated by calculating all possible γ values at high grid resolution.
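A sketch of the γ computation in Eq. (4): a forecast extremum is compared with every cluster member through a Barnes-type weight of the great-circle distance. The haversine helper and the exact exponential weight form are assumptions consistent with the description, with R defaulting to the 14 792 km² area-of-influence value given above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometers."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlat, dlon = p2 - p1, np.radians(lon2 - lon1)
    a = np.sin(dlat / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def gamma_metric(forecast_latlon, member_latlons, R=14792.0):
    """Eq. (4) as sketched here: sum of Barnes weights exp(-Dist_j**2 / R)
    over cluster members, with R the area-of-influence parameter (km**2)."""
    lat0, lon0 = forecast_latlon
    dists = np.array([great_circle_km(lat0, lon0, lat, lon) for lat, lon in member_latlons])
    return float(np.sum(np.exp(-dists ** 2 / R)))

print(gamma_metric((40.5, -76.5), [(40.0, -76.0), (41.0, -77.5), (39.5, -75.0)]))
```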

The value of χ measures how well the standard anomaly value of an expected peak or valley matches the set of standard normalized anomaly values of the corresponding cluster members. To calculate χ, a histogram of the cluster members’ standard normalized anomaly values is created. The width of a bin of the histogram is determined using Eqs. (5a) and (5b) (Freedman and Diaconis 1981b; Scott 1979):
W_i = 3.49\, \sigma_{\mathrm{clust},i}\, N_{\mathrm{clust},i}^{-1/3} ,    (5a)
W_i = 2\, \mathrm{IQR}_{\mathrm{clust},i}\, N_{\mathrm{clust},i}^{-1/3} ,    (5b)
where σclust,i is the standard deviation of the cluster members’ standard normalized anomaly values, and Nclust,i is the number of members in the cluster. IQRclust,i is the interquartile range of the cluster members’ standard normalized anomaly values. This equation produces the bin width for a histogram that should be an unbiased approximation of the distribution of the cluster members’ standard normalized anomaly values.
Once the histogram is generated, χ is calculated by ascertaining in which bin the forecast peak’s and valley’s standard normalized anomaly value belongs. Then, a weighted average of the adjacent histogram bins yields χ in Eq. (6):
[Eq. (6)]
where ωk is the cluster’s histogram frequency for bin number k. Larger values of χ indicate that the standard anomaly value of the forecast peak or valley fits better with the fingerprint. The maximum possible χ value for each cluster is calculated and kept with the cluster’s maximum possible γ value.
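A sketch of the χ lookup: the cluster members' anomaly values are binned with a Freedman–Diaconis/Scott-style width, and the forecast value is scored from the frequencies of its bin and the adjacent bins. Taking the smaller of the two bin-width rules and averaging three bins are assumptions, since Eq. (6) is not reproduced here.

```python
import numpy as np

def bin_width(values):
    """Bin width from the Freedman-Diaconis and Scott rules cited in the text;
    the smaller positive value is used here (an assumption)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    iqr = float(np.subtract(*np.percentile(values, [75, 25])))
    candidates = [w for w in (2.0 * iqr / n ** (1 / 3), 3.49 * float(np.std(values)) / n ** (1 / 3)) if w > 0]
    return min(candidates) if candidates else 1.0

def chi_metric(forecast_value, member_values):
    """Assumed stand-in for Eq. (6): locate the forecast anomaly in the cluster
    histogram and average the frequency of its bin with the two adjacent bins."""
    member_values = np.asarray(member_values, dtype=float)
    w = bin_width(member_values)
    lo, hi = float(member_values.min()), float(member_values.max())
    nbins = max(1, int(np.ceil((hi - lo) / w))) if hi > lo else 1
    freq, edges = np.histogram(member_values, bins=nbins, range=(lo, hi if hi > lo else lo + w))
    k = int(np.clip(np.searchsorted(edges, forecast_value, side="right") - 1, 0, len(freq) - 1))
    return float(np.mean(freq[max(k - 1, 0):k + 2]))

print(chi_metric(2.25, [2.1, 2.4, 1.9, 2.6, 2.3, 2.2, 2.8, 2.0]))
```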
These pattern-matching metrics (γ and χ), combined with the cluster importance metrics (α and ϕ), are summed to obtain the weighted match value for each cluster i:
\upsilon_i = \alpha_i (\gamma_i / \gamma_{i,\max}) + \phi_i (\chi_i / \chi_{i,\max}) .    (7)

This combined measure yields maximum values for a forecast peak or valley when it most closely matches the position and standard anomaly values of a dense cluster. For example, consider the case of a cluster that is spread across much of the domain and an anticipated peak or valley located within the area of the cluster. For this case, the normalized γ value would be large, but the α value would be small, since the cluster is not spatially compact. This would diminish the υ value. Consider the same cluster, but with every member of that cluster having the same standard anomaly value, giving the cluster a large ϕ value. If the expected peak or valley had a similar standard normalized anomaly value, then the χ value would be large, increasing υ. The inclusion of αi and ϕi weights the measure based upon how important that field is to that fingerprint of that event type.

The sum of all the fields’ υ values is the event score for that event type. An event score is computed for each event type based on the expected fields for a given forecast. The larger the event score is, the better the expected peaks or valleys match that fingerprint:
\mathrm{event\ score} = \sum_{i} \upsilon_i .    (8)
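Tying the pieces together, the sketch below combines the metrics into υ and the event score. Normalizing γ and χ by their stored cluster maxima and weighting by α and ϕ follows the description of Eq. (7), and Eq. (8) is the plain sum over fields, but the exact published algebra of Eq. (7) is an assumption here.

```python
def weighted_match(alpha, phi, gamma, gamma_max, chi, chi_max):
    """Assumed form of Eq. (7): cluster-importance weights applied to the
    normalized pattern-matching metrics."""
    return alpha * (gamma / gamma_max) + phi * (chi / chi_max)

def event_score(cluster_metrics):
    """Eq. (8): the event score is the sum of the weighted match values over
    all fields (clusters) in the event type's fingerprint."""
    return sum(weighted_match(**m) for m in cluster_metrics)

# Hypothetical two-field fingerprint.
print(event_score([
    dict(alpha=0.40, phi=1.2, gamma=3.1, gamma_max=5.0, chi=0.6, chi_max=1.0),
    dict(alpha=0.20, phi=0.8, gamma=1.4, gamma_max=4.0, chi=0.3, chi_max=1.0),
]))
```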

Each event type has its own distribution of event scores because of the varying number of fields used for its respective pattern fingerprint. Thus, there is no single threshold value for event scores because they will vary by type.

It is important to note that the event score does not take into account the severity of the events. From a mathematical view, larger event scores indicate that the pattern more closely resembles the average pattern of the events of that type in the training set.

For this system to be useful operationally, a target value for the event score must be used as a comparison point. This threshold value is the point where higher values indicate the likely pattern of the event type and lower values indicate the lack thereof. An event type’s threshold score value is tentatively defined as the event score that is at the first quartile of the sample event scores ascertained by hindcasting.

So, how does one know when a forecast’s event score is large enough to indicate a positive identification? The solution is to create sample distributions of the event scores by hindcasting. Recall that the global reanalysis dataset has been used as the best available approximation of the atmosphere to train the system. By following the perfect-prog approach and using this dataset again as a surrogate model, the analysis during each event can be compared with the assembled fingerprints so that event scores can be estimated for each of the event types.
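The thresholding step can then be sketched as follows: hindcast event scores for the training cases supply the sample distribution, and the tentative alert threshold is its first quartile, as stated above. The function names are hypothetical.

```python
import numpy as np

def alert_threshold(hindcast_scores):
    """Tentative threshold: first quartile of the event scores obtained by
    hindcasting the training events against their own fingerprint."""
    return float(np.percentile(hindcast_scores, 25))

def flags_event(forecast_score, hindcast_scores):
    """True when a forecast's event score meets or exceeds the threshold."""
    return forecast_score >= alert_threshold(hindcast_scores)

print(flags_event(3.2, [2.5, 3.0, 3.4, 3.8, 4.1, 4.6]))  # threshold = 3.1 -> True
```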

4. Results

The patterns associated with heat waves and snowstorms were the most clear-cut. The conditions at 0000 UTC 15 July 1995 during the heat wave of July 1995 (Changnon et al. 1996) are shown in Fig. 3. For demonstration purposes, each panel shows a primary peak positive anomaly for four diverse fields covering eastern North America. The 500-hPa height anomaly indicates a widespread positive anomaly across the Midwest and interior Northeast. Stronger anomalies are present in the 850- and 700-hPa temperature fields (Figs. 3b and 3c). Even anomalous values of precipitable water are noted along the perimeter of the height and temperature anomalies.

When all heat waves are considered (more than 100 cases), the locations of all primary peak 850-hPa temperature and 500-hPa height anomalies (see Figs. 4a and 5a) appear remarkably similar to the configuration noted in the 15 July 1995 event. The results of the strong-point analysis of these data are used to determine the locations of clusters. The 850-hPa temperature anomaly peak (Fig. 3b) at 0000 UTC 15 July 1995 occurred over eastern Lake Erie. The strong-point analysis identified this region as seen in Fig. 4b.

The results of the strong-point analysis for heat waves show that 500-hPa height peak anomalies tend to occur near and southeast of the mid-Atlantic region (Fig. 5b) and that 850-hPa temperature (Fig. 4b), 700-hPa temperature, and precipitable water anomalies (not shown) tend to occur north of the mid-Atlantic region.

The consistency coefficient values (Nclust/Nevent) for 850-hPa temperatures and 500-hPa heights are 0.404 and 0.281, respectively. For 850-hPa temperatures, there are positive anomalies in 104 events and 42 of these are preserved in the strong-point analysis. The 500-hPa data had 89 events, of which 25 are preserved in the strong-point analysis. The data in Figs. 4a and 4b imply a dense cluster and an accompanying high value of α for 850-hPa temperatures, which also had the highest consistency coefficient for heat events. The signal was not as strong for 500-hPa heights, where there are fewer primary peaks and fewer points in the strong-point analysis.

A different pattern is recognized with snowstorms. The blizzard of 1996 (Grumm and Hart 2001) produced heavy snow over a large portion of the eastern United States. The large-scale conditions and their associated anomalies are shown in Fig. 6. A deep surface cyclone was located east of the Delmarva Peninsula at 0000 UTC 8 January 1996, and the pressure anomalies were approximately three standard deviations below normal. In a comparison of this event with all other snowstorms within the database, Fig. 7a shows that the surface cyclone and its anomaly are located in proximity to anomalies associated with the other major East Coast snow events. The precipitable water anomaly (Fig. 6b), 850-hPa U-wind anomaly (Fig. 6c), and 850-hPa V-wind anomaly (Fig. 6d) maxima are also located near the locations that the strong-point analysis suggests are favored. The locations of the 850-hPa negative U-wind anomalies and the strong-point analysis of this field are shown in Figs. 8a and 8b. These data include the strong easterly jet found to the north of the surface cyclone in the January 1996 blizzard (Fig. 6c). Though not shown, the positive V anomaly is a slightly stronger pattern indicator than the U-wind anomaly, and the strong-point analysis places this V anomaly over the western Atlantic in the warm sector of the surface cyclone.

The consistency coefficient values (Nclust/Nevent) for precipitable water primary peaks, 850-hPa U-wind primary valleys, 850-hPa V-wind primary peaks, and MSLP primary valleys are 0.55, 0.645, 0.73, and 0.85, respectively. For 850-hPa U-wind valleys, there are negative anomalies in 110 events, and 71 of these are preserved in the strong-point analysis. The MSLP data have 107 events, of which 91 are preserved in the strong-point analysis. The data in Figs. 7a and 7b imply a dense cluster and an accompanying high value of α for MSLP, which had the highest consistency coefficient for snow events. The pattern noted in Fig. 8 shows the grouping of maximum negative U-wind components at 850 hPa when major snowstorms occur. Anticyclones, often associated with major snowstorms, have a significantly weaker signal from the strong-point analysis with the primary location in Oklahoma and Texas (not shown).

To test the skill of the pattern-recognition program, an evaluation is conducted of its ability to distinguish events from nonevents. Random dates are chosen to ascertain the baseline distribution of the event-score metric [Eq. (8)] for each type. This measure is also computed for the event cases by two methods. The first method uses all cases for both development and testing of the fingerprints. The second method applies 10-fold cross validation (Witten and Frank 2005), developing the system 10 times, each time using 90% of the cases to develop the system and the remaining 10% for testing. The testing results are combined into a single pool that thus includes all of the cases while still maintaining the independence of the developmental and testing cases. The difference between the resulting metric histograms for these two results quantifies the degree of overfitting in the method, and is, as one would expect, greatest for those event types with the fewest cases (Fig. 9). The difference between the cross-validation histogram and the baseline histogram quantifies the power of the method to distinguish an event type from other weather conditions. The distribution for actual events was then compared with the baseline distribution (Fig. 9). It can be seen that the pattern method easily distinguishes events from climatological values, most notably for cold, heat, snow, synoptic wind, and tornado events. Some skill is distinguishable even for ice, fog, hail, and fire events. Flooding and thunderstorm wind events have a modicum of skill, but the subgrid-scale size of these events limits the program’s ability to distinguish these types.
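The 10-fold cross-validation pooling used in this evaluation can be sketched as below; fit_fingerprint and score_case are hypothetical stand-ins for the training and event-scoring steps described earlier.

```python
import numpy as np

def cross_validated_scores(cases, fit_fingerprint, score_case, folds=10, seed=0):
    """Train on 90% of the cases, score the held-out 10%, repeat for each of
    the 10 folds, and pool the held-out scores into one 'untrained' distribution."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(cases))
    pooled = []
    for k in range(folds):
        test_idx = order[k::folds]
        train_idx = np.setdiff1d(order, test_idx)
        fingerprint = fit_fingerprint([cases[i] for i in train_idx])
        pooled.extend(score_case(fingerprint, cases[i]) for i in test_idx)
    return pooled

# Trivial self-check with 20 dummy cases.
scores = cross_validated_scores(list(range(20)),
                                fit_fingerprint=lambda train: float(np.mean(train)),
                                score_case=lambda fp, case: abs(case - fp))
print(len(scores))  # -> 20
```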

5. Conclusions

A method is presented to identify the fingerprints of pattern anomalies associated with various types of major weather events in the mid-Atlantic region. The current implementation employs the NCEP–NCAR reanalysis dataset as the main identification tool. For each major weather event type defined herein, the locations of primary and secondary peaks and valleys of anomalies are used to establish a clustering of the most likely area of event occurrence. The anomaly values themselves are also tested for their clustering tendency. The assertion is that the location and intensity of certain anomalies are associated with similar event types. This concept is outlined subjectively by Uccellini and Kocin (1987) for East Coast snowstorms and is presented quantitatively by Hart and Grumm (2001).

The technique described in this paper shows promise for discerning the patterns associated with other major weather events. Future research requires regular updating of the events database and the associated fingerprints. The identification equations will be tested using output from a numerical weather prediction model to assess the feasibility of recognizing the occurrence of each major event type in a real-time forecast environment.

For the next-generation technique, the NARR data will be used to create a new set of clusters. Because of its higher spatial and temporal resolution, this dataset may permit further refinement of the identification algorithms. However, a limitation of this dataset is the number of years of available data, only 1979 to the present.

Acknowledgments

The authors appreciate the early support of the COMET grant for An Interactive Climate System for Predicting Significant Weather Events, UCAR S03-44672. That project formed the foundation of this research. The authors also thank the reviewers of the paper for their insights and very helpful suggestions.

REFERENCES

  • Barnes, S. L., 1964: A technique for maximizing details in numerical weather map analysis. J. Appl. Meteor., 3, 396–409.

  • Black, T. L., 1994: The new NMC mesoscale Eta model: Description and forecast examples. Wea. Forecasting, 9, 265–278.

  • Brugge, R., 1995: Heatwaves and record temperatures in North America. Weather, 50, 20–23.

  • Changnon, S. A., K. E. Kunkel, and B. C. Reinke, 1996: Impacts and responses to the 1995 heat wave: A call to action. Bull. Amer. Meteor. Soc., 77, 1497–1506.

  • DuBois, P., 2005: MySQL. 3d ed. New Riders Publishing, 1320 pp.

  • El-Kadi, A. K., and P. A. Smithson, 1996: An automated classification of pressure patterns over the British Isles. Trans. Inst. Br. Geogr., 21, 141–156.

  • Freedman, D., and P. Diaconis, 1981b: On the histogram as a density estimator: L2 theory. Z. Wahrscheinlichkeitstheor. Verw. Gebiete, 57, 453–476.

  • Glickman, T., Ed., 2000: Glossary of Meteorology. 2d ed. Amer. Meteor. Soc., 855 pp.

  • Grumm, R. H., and R. Hart, 2001: Standardized anomalies applied to significant cold season weather events: Preliminary findings. Wea. Forecasting, 16, 736–754.

  • Hart, R. E., and R. H. Grumm, 2001: Using normalized climatological anomalies to rank synoptic-scale events objectively. Mon. Wea. Rev., 129, 2426–2442.

  • Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.

  • Kirchhofer, W., 1973: Classification of European 500 mb patterns. Swiss Meteorological Institute No. 43, Zurich, Switzerland.

  • Kistler, R., and Coauthors, 2001: The NCEP–NCAR 50-Year Reanalysis: Monthly means CD-ROM and documentation. Bull. Amer. Meteor. Soc., 82, 247–267.

  • Klein, W. H., and F. Lewis, 1970: Computer forecasts of maximum and minimum temperatures. J. Appl. Meteor., 9, 350–359.

  • Knight, P. G., and M. E. Evans, 2000: Prediction of excessive rainfall in the Middle Atlantic region. Preprints, Second Conf. on Extreme Precipitation, Long Beach, CA, Amer. Meteor. Soc., 96–98.

  • Kruizinga, S., and A. H. Murphy, 1983: Use of an analog procedure to formulate objective probabilistic temperature forecasts in the Netherlands. Mon. Wea. Rev., 111, 2244–2254.

  • Livezey, R. E., and R. Tinker, 1996: Some meteorological, climatological, and microclimatological considerations of the severe U.S. heat wave of mid-July 1995. Bull. Amer. Meteor. Soc., 77, 2043–2054.

  • Lund, I. A., 1963: Map-patterns classification by statistical methods. J. Appl. Meteor., 2, 56–65.

  • McKendry, I. G., 1994: Synoptic circulation and summertime ground-level ozone concentrations at Vancouver, British Columbia. J. Appl. Meteor., 33, 627–641.

  • Mesinger, F. G., and Coauthors, 2006: North American Regional Reanalysis. Bull. Amer. Meteor. Soc., 87, 343–360.

  • Namias, J., 1982: Anatomy of Great Plains protracted heat waves (especially the 1980 summer drought). Mon. Wea. Rev., 110, 824–838.

  • Rogers, E., D. Parrish, Y. Lin, and G. DiMego, 1996: The NCEP Eta data assimilation system: Tests with regional 3-D variational analysis and cycling. Preprints, 11th Conf. on Numerical Weather Prediction, Norfolk, VA, Amer. Meteor. Soc., 105–106.

  • Rogers, E., and Coauthors, 1997: Changes to the NCEP operational “early” Eta analysis/forecast system. NOAA/NWS Tech. Procedures Bulletin 447, 16 pp. [Available from Office of Meteorology, National Weather Service, 1325 East–West Highway, Silver Spring, MD 20910.]

  • Scott, D. W., 1979: On optimal and data-based histograms. Biometrika, 66, 605–610.

  • Uccellini, L. W., and P. J. Kocin, 1987: The interaction of jet streak circulations during heavy snow events along the East Coast of the United States. Wea. Forecasting, 2, 289–308.

  • Uccellini, L. W., P. J. Kocin, R. S. Schneider, P. M. Stokols, and R. A. Dorr, 1995: Forecasting the 12–14 March 1993 superstorm. Bull. Amer. Meteor. Soc., 76, 183–199.

  • Van den Dool, H. M., 1994: Searching for analogues, how long must one wait? Tellus, 46A, 314–324.

  • Vislocky, R. L., and G. S. Young, 1989: The use of perfect prog forecasts to improve model output statistics forecasts of precipitation probability. Wea. Forecasting, 4, 202–209.

  • Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.

  • Witten, I. H., and E. Frank, 2005: Data Mining: Practical Machine Learning Tools and Techniques. Elsevier, 525 pp.

  • Yarnal, B., and J. D. Draves, 1993: A synoptic climatology of stream flow and acidity. Climate Res., 2, 193–202.

Fig. 1.

Schematic of fingerprinting technique for detecting major weather events. Boxes represent datasets, and the arrows represent the steps necessary to generate them.

Fig. 2.

The number of clustered events will always be a subset of the total number of major weather events. The degree of clustering depends on the number of grid points within the domain (ρclust).

Fig. 3.

Global reanalysis data valid at 0000 UTC 15 Jul 1995 showing (a) 500-hPa heights (dam), (b) 850-hPa temperatures (°C), (c) 700-hPa temperatures (°C), and (d) precipitable water (mm). Shading shows departures of each field in standard deviations from normal as indicated by the shading bar at the bottom of each panel. Heights are contoured at 6 dam, temperatures are contoured at every 2°C, and precipitable water is contoured at every 5 mm.

Fig. 4.

Primary maximum peak anomalies showing (a) unclustered locations of 850-hPa temperature anomalies for all heat events and (b) spatially clustered location of 850-hPa temperature anomalies. Each circle represents one occurrence; concentric circles denote multiple occurrences at a single point.

Fig. 5.

As in Fig. 4 except showing (a) unclustered locations of 500-hPa height anomalies for all heat events and (b) spatially clustered locations of 500-hPa height anomalies for all heat events.

Fig. 6.

As in Fig. 3 except valid at 0000 UTC 8 Jan 1996 showing (a) mean sea level pressure (hPa) and anomalies, (b) precipitable water (mm) and anomalies, (c) 850-hPa winds and U-wind anomalies, and (d) 850-hPa winds and V-wind anomalies. Mean sea level pressure units are every 4 hPa, precipitable water is every 5 mm, and winds are in knots (1 kt ≈ 0.5 m s−1). Anomalies are in standard deviations from normal.

Fig. 7.

As in Fig. 4 except showing (a) unclustered locations of negative mean sea level pressure anomalies for all snow events and (b) spatially clustered locations of negative MSLP anomalies.

Fig. 8.

As in Fig. 4 except showing (a) unclustered locations of 850-hPa U-wind negative anomalies and (b) spatially clustered location of 850-hPa negative U-wind anomalies for all snow events.

Fig. 9.

Event-score distributions for 12 of the event types. The “climatological” event scores (solid line) are calculated from 2000 random analyses in the NCEP global reanalysis. The “trained” event scores (dashed line) are calculated for events with which the system was trained. An event-score distribution of “untrained” events (dotted line), derived using 10-fold cross validation, approximates the expected distribution of future events of that type. Skill in learning events is demonstrated by the separation between the climatological distribution and the trained distribution. Skill in recognizing events is demonstrated by the separation between the climatological distribution and the untrained distribution.

Table 1.

The major weather event criteria and number of cases.