A high-resolution drought-monitoring tool was developed to assess drought on multiple time scales using the standardized precipitation index (SPI). Daily precipitation data at 4-km resolution are obtained from the Advanced Hydrologic Prediction Service multisensor precipitation estimates (MPE) and are aggregated on several time scales. Daily station precipitation data available from the Cooperative Observer Program (COOP) provide the historical context for the MPE precipitation data. Pearson type-III distribution parameters were interpolated to the 4-km grid on the basis of a regional frequency analysis of the COOP stations and L-moment ratios of the precipitation data. The resulting high-resolution SPI data can be used as guidance for the U.S. Drought Monitor at the subcounty scale in areas where local precipitation is the primary driver of drought. The temporal flexibility and spatial resolution of the drought-monitoring tool are used to illustrate the onset, intensity, and termination of the 2008–09 Texas drought, and the tool is shown to provide better county- and subcounty-scale information than do gauge-based products.
Except in riparian areas, the health and growth of local vegetation and nonirrigated crops are sensitive to precipitation that falls locally. Yet most tools for monitoring drought provide information only at very large spatial scales, such as climate divisions (Svoboda et al. 2002). Drought-monitoring tools are available at the station level, but the effectiveness of station data in monitoring drought is limited by the availability of data and the density of stations in a given area.
The U.S. Drought Monitor (USDM; Svoboda et al. 2002) provides a weekly status of drought intensity across the entire United States. Most decisions on drought are made and the effects are felt at the county and even subcounty levels (Quiring 2009). Dow et al. (2009) indicate that those in charge of local decisions prefer maps, figures, and tables to be restricted to their specific area. Therefore, the USDM is used as an assessment of drought at the local level, although its intent is to depict drought on a regional level.
Attempts have been made to reconstruct past drought at a high resolution. Sheffield et al. (2004) used a soil-moisture-based drought index to quantify drought in the United States at 12-km resolution. Andreadis and Lettenmaier (2006) used a simulated dataset of hydroclimatological variables at 0.5° resolution to analyze drought for the same time period. Land surface model output can be compared with these retrospective model runs to obtain estimates of the current relative drought severity. Kangas and Brown (2007) utilized Parameter-Elevation Regression on Independent Slopes Model (PRISM) monthly precipitation data to characterize both twentieth-century drought and pluviosity at 4-km resolution.
Efforts have also been made to utilize high-resolution remote sensing tools for drought monitoring. Anderson et al. (2011) developed a new remote sensing evaporative stress index on the basis of the ratio of evapotranspiration to potential evapotranspiration that provides drought information at a resolution of 5–10 km without requiring precipitation data. Jain et al. (2009) used several products from the Advanced Very High Resolution Radiometer (AVHRR) to determine drought- and wetness-related stresses on crops and vegetation. The AVHRR was employed by Ji and Peters (2003) to compare the normalized difference vegetation index at 1-km resolution with drought index data at the climate-division level. Brown et al. (2008) developed the vegetation drought response index (VegDRI), which combines traditional drought indicators with satellite-derived metrics to produce a real-time 1-km-resolution map of drought conditions. VegDRI integrates precipitation into its statistical model using station-based standardized precipitation index (SPI) values and Palmer drought severity index computations interpolated to a 1-km grid.
To quantify and categorize drought, it is essential to place accumulated precipitation at various time scales into historical context. Most common drought indices relate current conditions to an expected distribution of dry or wet conditions. For example, the SPI value for a given precipitation amount is defined as the number of standard deviations above or below a mean, given a historical probability distribution that has been transformed into a Gaussian shape. The SPI, originally developed by McKee et al. (1993), is a spatially invariant and objective method for quantifying precipitation on the basis of historical data. The ability of the SPI to quantify drought at different time scales makes it a valuable tool for detecting both short-term and long-term water supply issues (Hayes et al. 1999).
Improvement of current drought monitoring can be accomplished by combining precipitation information at small spatial scales with accurate estimates of historical conditions at the same resolution. The high-resolution national multisensor precipitation estimates (MPE) that have recently become operational (Lawrence et al. 2003; Young and Brunsell 2008) provide an excellent source of information on the primary driver of drought, precipitation, on a 4-km grid, but their relatively short period of record means that the historical context must come from elsewhere.
The purpose of this paper is to describe a drought-monitoring product that combines the MPE precipitation analysis with historical precipitation information from long-term climate stations to create a set of high-resolution SPI maps for drought monitoring. The suite of drought-indicator products generated from MPE analyses is here referred to as the MPE Drought Estimator, or MPEDE. Section 2 describes the method for determining the historical frequency distribution for individual Cooperative Observer Program (COOP) stations for a variety of dates and accumulation periods and for integrating this information with the MPE analyses to produce drought index maps. Section 3 compares the resulting drought index maps with purely station-based drought maps and discusses the strengths and limitations of the high-resolution products. Section 4 discusses the intensity and spatial extent of the 2008–09 Texas drought, which reached its peak in the summer of 2009, using the high-resolution SPI (MPEDE-SPI) maps. A summary is provided in section 5.
a. Regional frequency analysis
Regional frequency analysis is an approach in which data from several stations with similar event frequencies are combined to make conclusions about the event probability distribution across a region (Hosking and Wallis 1997). The first step is to separate stations with sufficient data into clusters, with each cluster containing members expected to have similar event probability characteristics. The observed probability distribution at each station is determined nonparametrically. Weighted averages of the normalized probability distribution characteristics for members within each cluster are then computed. Higher weighting is given to stations with longer precipitation records, with the assumption that the calculated distributions at these stations are likely to be closer to the true climatological distributions.
The probability distribution characteristics are specified as L-moment ratios, which are robust, nonparametric measures of the shape of a distribution independent of its scale of measurement (Hosking and Wallis 1997). The L moments are computed by using weighted averages of the ordered sample members; for example, the L moment describing the width of the probability distribution (analogous to the standard deviation) is based upon the average differences between the sample members. Conventional higher-order moments, in contrast, are computed using higher powers of the sample member departures from the sample mean. The L moments turn out to be more efficient than conventional moments and are less sensitive to outliers, and they perform better than conventional moments at all sample sizes for distributions with high skewness (Sankarasubramanian and Srinivasan 1999). Guttman (1999) concluded that L moments are accurate in describing at-site monthly precipitation distributions for large sample sizes, and they have become the standard method for characterizing regional precipitation frequencies (e.g., Bonnin et al. 2006). For a more comprehensive description of L moments and L-moment ratios, see Hosking and Wallis (1997).
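As a hedged illustration of the quantities described above, the following Python sketch computes sample L moments from the ordered sample through unbiased probability-weighted moments and then forms a record-length-weighted cluster average of an L-moment ratio. The function names and the sample values are hypothetical; this is not the Hosking (1996) code used in the study.

```python
def sample_l_moments(values):
    """Return the first L moments and L-moment ratios of a sample (needs n >= 4)."""
    x = sorted(values)
    n = len(x)
    # Unbiased probability-weighted moments b0..b3; j is the zero-based rank.
    b0 = sum(x) / n
    b1 = sum(j / (n - 1) * x[j] for j in range(n)) / n
    b2 = sum(j * (j - 1) / ((n - 1) * (n - 2)) * x[j] for j in range(n)) / n
    b3 = sum(j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3)) * x[j]
             for j in range(n)) / n
    l1 = b0                                # location (the sample mean)
    l2 = 2 * b1 - b0                       # scale (analogous to standard deviation)
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return {"l1": l1, "l2": l2, "t": l2 / l1, "t3": l3 / l2, "t4": l4 / l2}


def regional_average(ratios, record_lengths):
    """Average at-site L-moment ratios, weighting each station by record length."""
    total_years = sum(record_lengths)
    return sum(n * r for n, r in zip(record_lengths, ratios)) / total_years


# Hypothetical example: a symmetric sample has zero L skewness.
moments = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
# Hypothetical cluster of two stations with 40 and 60 years of data.
cluster_t3 = regional_average([0.2, 0.3], [40, 60])
```

Because the L moments are linear combinations of the ordered values, a single extreme observation shifts them far less than it shifts a conventional third or fourth moment.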
The details of the computation of the L moments for Texas precipitation are described in the appendix. The results of the computation were scale parameters for each available long-term station and shape parameters for each cluster of stations. The parameters describe a Pearson type-III distribution, which was found to match best the observed probability distribution characteristics.
The value of the 1.5th precipitation percentile at each long-term station was compared with the lower tail of the Pearson type-III distribution fit using the L moments to assess how well the lower tail of the precipitation distributions reflected the actual extreme values. In general, the estimated distributions tended to slightly overestimate the frequency of extreme dry events, with the median of the sample 1.5th percentile falling just below the 2nd percentile of the estimated distribution. This provides some confidence that Pearson type III is adequate for study of the lower tail of accumulated rainfall distributions and produces a reasonable but conservative measure of the severity of drought conditions.
b. Interpolation of Pearson type-III parameters to a high-resolution grid
The scale and shape parameters were estimated from long-term station data directly, but the high-resolution location parameter for the distributions was obtained from PRISM grid points. PRISM (Daly et al. 1994) uses station data, a digital elevation model, and other spatial datasets to generate monthly precipitation normals that are on 4-km grid cells. The 1971–2000 PRISM precipitation normals dataset is used to calculate the MPE percent of normal precipitation. Di Luzio et al. (2008) combined the PRISM dataset with station observations to create 1960–2001 daily precipitation grids at 4-km resolution. Implicit in the use of PRISM data is the assumption that the PRISM 1971–2000 normals provide a reasonable expected value for MPE analysis values, just as station data are assumed to provide reasonable estimates of other characteristics of the probability distribution.
To combine the high-resolution PRISM precipitation climatological description with the probability distribution information calculated for each station, duration, and ending time, the normalized scale and shape values σ/μ and γ/μ were interpolated using inverse distance weighting from the four closest stations to each grid point. For each day within a month i, the normal is assumed to be (PRISM_normal_i / days_i), that is, the PRISM monthly normal divided by the number of days in that month. Each duration and ending time thus has a PRISM 1971–2000 precipitation normal μPRISM and the normalized scale and shape values for a Pearson type-III distribution at each point on the 4-km grid.
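The two steps in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the operational code: the distance exponent (here 2), the use of planar distances, the inclusion of the ending date in the accumulation period, and all function and variable names are assumptions that do not appear in the text.

```python
import calendar
import math
from datetime import date, timedelta


def idw_from_nearest(grid_pt, stations, n_nearest=4, power=2.0):
    """Interpolate a station parameter to a grid point by inverse distance weighting.

    stations is a list of (x, y, value); only the n_nearest closest are used.
    The exponent `power` is an assumption; the text does not specify it.
    """
    def dist(s):
        return math.hypot(s[0] - grid_pt[0], s[1] - grid_pt[1])

    weighted_sum = weight_total = 0.0
    for station in sorted(stations, key=dist)[:n_nearest]:
        d = dist(station)
        if d == 0.0:
            return station[2]          # grid point coincides with a station
        w = d ** -power
        weighted_sum += w * station[2]
        weight_total += w
    return weighted_sum / weight_total


def accumulated_normal(monthly_normals, end, n_days):
    """Sum daily normals (monthly normal / days in month) over n_days ending on `end`."""
    total = 0.0
    for k in range(n_days):
        day = end - timedelta(days=k)
        days_in_month = calendar.monthrange(day.year, day.month)[1]
        total += monthly_normals[day.month] / days_in_month
    return total


# Hypothetical normalized scale values (sigma/mu) at five stations.
stations = [(0, 0, 0.30), (2, 0, 0.40), (0, 2, 0.50), (2, 2, 0.60), (50, 50, 9.0)]
cv_at_grid = idw_from_nearest((1, 1), stations)
# Hypothetical monthly normals (mm) for June and July; 30-day period ending 15 July.
mu_prism = accumulated_normal({6: 90.0, 7: 62.0}, date(2009, 7, 15), 30)
```

In this example the distant fifth station is ignored, and the 30-day normal combines 15 days at the June daily rate with 15 days at the July daily rate.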
c. Using MPE precipitation data to create a high-resolution SPI
The National Weather Service creates an up-to-date daily rainfall product (MPE) on the 4-km PRISM grid that is available from the beginning of 2005. The MPE is based on radar estimates of 24-h precipitation totals, satellite estimates in areas of poor radar coverage, and rain gauge values; an example is shown for 14 September 2008 (Fig. 1), just after Hurricane Ike. These daily values are summed each day to create accumulated precipitation data for several different accumulation periods at each grid point (MPE aggregated precipitation, or MPEAP). Each MPEAP value is then normalized by μPRISM, and its placement on the historical cumulative distribution function (CDF) is determined from the normalized scale and shape values. The final step is to apply the inverse of the standard normal (Gaussian) CDF, which has a mean of 0 and a variance of 1, to the resulting cumulative probability. The result is a daily product that contains SPI values for each accumulation period at every grid point in Texas.
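The chain from accumulated precipitation to SPI can be sketched for a single grid point as follows. This is an illustrative Python sketch rather than the MPEDE implementation. It assumes that the normalized variable MPEAP/μPRISM has a mean of 1, a standard deviation equal to the interpolated σ/μ, and a positive skewness supplied directly as `skew`; the function names, the series evaluation of the incomplete gamma function, and the clamping of probabilities are all assumptions made here.

```python
import math
from statistics import NormalDist


def reg_lower_gamma(a, x, tol=1e-14, max_terms=10000):
    """Regularized lower incomplete gamma function P(a, x) by series expansion."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0)))


def pearson3_cdf(x, mean, sd, skew):
    """CDF of a Pearson type-III distribution with positive skewness."""
    alpha = 4.0 / skew ** 2            # gamma shape
    beta = 0.5 * sd * skew             # gamma scale
    xi = mean - 2.0 * sd / skew        # lower bound (location)
    return reg_lower_gamma(alpha, (x - xi) / beta)


def spi(mpeap, mu_prism, cv, skew):
    """SPI for one grid point: normalize, find the CDF value, map to a standard normal."""
    p = pearson3_cdf(mpeap / mu_prism, 1.0, cv, skew)
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep the probability inside (0, 1)
    return NormalDist().inv_cdf(p)


# Hypothetical grid point: 12-month normal of 800 mm, 500 mm observed.
example_spi = spi(500.0, 800.0, cv=0.3, skew=0.6)
```

For a positively skewed distribution the mean lies above the median, so an accumulation exactly equal to the normal gives a slightly positive SPI rather than zero.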
3. MPEDE products
a. Overview of products
The MPEDE-SPI products made using MPEAP data are used for diagnosis of drought severity in Texas. This section provides an overview of the GIS-based MPEDE-SPI maps. Table 1 shows the McKee et al. (1993) classification system for defining drought intensity based on SPI values. Two additional drought and wetness categories are included for extreme SPI values, on the basis of SPI categories used in the High Plains Regional Climate Center Applied Climate Information System (HPRCC-ACIS).
The high-resolution MPEDE-SPI products provide values on the 4-km PRISM grid, which can be used to determine the severity of drought at the subcounty level. This set of maps for various accumulation periods is available on a daily basis because of the daily availability of MPEAP data. The SPI grid data for each accumulation period are converted to a map format using the latitude, longitude, and percentile value for each grid point and a spatial interpolation algorithm in the ArcGIS proprietary software package.
Each radar estimates precipitation independently within its domain, and therefore discontinuities in the estimation of precipitation can occur along borders between radar domains (white lines on the MPEDE-SPI products). An area for which radar coverage is blocked by the Davis Mountains in western Texas is represented as a white area on the SPI maps. Different River Forecast Centers are responsible for the adjustment of the radar-estimated daily precipitation within their areas of responsibilities, and therefore discontinuities in precipitation estimates may also occur along boundaries between areas of responsibility (purple lines on the MPEDE-SPI maps).
The adjustments made by River Forecast Centers are based on observed or known biases in the radar estimates. The bias adjustments are made on a daily basis, a period for which sampling errors can account for a large portion of the differences between radar estimates and gauge measurements. Over time, sampling errors tend to average out while biases persist, affecting the quality and spatial consistency of the drought severity estimates.
b. Known issues with the MPE precipitation data
The MPE precipitation estimates have only been produced for a decade or so. Since their implementation, several investigators have examined the accuracy of MPE relative to rain gauge measurements. Because the objective techniques and subjective skill involved in creating MPE have both evolved over time, these past results do not necessarily apply to current and future MPE analyses. Because the MPE have improved over the years (Habib et al. 2009; Young and Brunsell 2008), the results represent a lower bound on MPE quality and accuracy. Indeed, investigators have started using MPE over gauges as a reference precipitation dataset (Gourley et al. 2010).
Most investigators have found a slight negative bias during the early days of MPE. Westcott et al. (2008) and Westcott (2009) found an overall negative bias of 6% in the upper Midwest, with 65% of monthly MPE values falling within 25% of collocated gauge values. Light precipitation tended to be overestimated, and heavy precipitation tended to be underestimated. Much of the overall negative bias was attributed to ground clutter and beam blockage near the radar and beam overshooting of precipitation far from the radar, the latter primarily in northern regions in wintertime.
Wang et al. (2008) found MPE precipitation to underestimate gauge values in central Texas in 2004. Habib et al. (2009) also found an underestimate in 2004 for MPE relative to a finescale gauge network in Louisiana. They found, however, that by the end of their study period in 2006 the monthly and annual MPE precipitation accumulations were nearly bias free. Habib et al. also concluded that much of the random MPE–gauge difference was due to the point sampling of gauges as compared with the area-mean estimates of MPE, although Westcott et al. (2008) did not find that multiple gauges made much difference in verification statistics for a network in Illinois. On the other hand, for a multimonth period of heavy rainfall in Oklahoma during 2007, Gourley et al. (2010) found that MPE generally estimated more rainfall than was recorded by the gauge network.
In summary, MPE appears to have little consistent bias when compared with gauges, as recently as 2006–07. These regional-average estimates have little direct value, though, when biases are present from individual radars. In Texas, spatial discontinuities in the MPE at some of the radar domain boundaries are evident, and these tend to become more noticeable with longer accumulation periods (e.g., 24 months). For example, the 1 June 2009 24-month MPEDE-SPI values (Fig. 2) tend to be lower in the Dyess Air Force Base (AFB) radar domain, located in west-central Texas, than in the immediately adjoining domains of other radars. Conversely, the 1 June 2009 24-month MPEDE-SPI values within the Brownsville radar domain, in extreme southern Texas, generally are larger than the adjoining SPI values within the Corpus Christi radar domain (Fig. 2) just to the north. The SPI values in the Corpus Christi radar domain blend smoothly with adjoining values across other domain boundaries, suggesting that the issue is with the MPE within the Brownsville domain.
c. Comparison of an MPEDE-SPI product with a station-based SPI product
The high-resolution MPEDE-SPI values for a 12-month accumulation period ending on 1 June 2009 were compared with an ACIS-SPI product obtained from HPRCC-ACIS that uses COOP station data (Fig. 3a). The shading of the high-resolution SPI map is designed to mimic the coloration of the ACIS station values. In this example, most of the COOP stations fall within the same SPI color category or are within one SPI category of collocated grid points. Most of the larger discrepancies are in radar domains that appear to systematically underestimate (Dyess AFB) or overestimate (Brownsville) precipitation.
A major benefit of using the MPEDE-SPI is its capability of providing drought information in areas where spatial coverage of COOP stations is poor. One example is in Maverick County, which has SPI values in the range from −3.0 to −2.0 on the 1 June 2009 12-month MPEDE-SPI map (Fig. 3b). There are no COOP stations plotted with SPI values in Maverick County, and the nearest COOP stations in surrounding counties have SPI values between −2.0 and −1.5. In this example, the severity of drought in Maverick County is underestimated when only COOP data are used. In general, spatial coverage of COOP stations is poor in many parts of Texas, including the region in southern Texas at the core of the severe 2008–09 drought.
d. Validation of MPEDE-SPI for climate divisions
Monthly NCDC-SPI data are available for each climate division from January 1895 to the present (see online at http://www1.ncdc.noaa.gov/pub/data/cirs). The MPEDE-SPI values were aggregated to the climate-division scale for comparison with the climate-division SPI data provided by the National Climatic Data Center (NCDC-SPI). Each monthly NCDC-SPI value was calculated as an equal-weighted average of values from COOP stations within each climate division reporting both precipitation and temperature for that given month. Each climate-division MPEDE-SPI (CDM-SPI) value for a given month is the mean of all of the gridded MPEDE-SPI values within the climate division, with each gridded value weighted equally.
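The equal-weighted aggregation used for the CDM-SPI can be sketched as below. The data layout (parallel lists of grid values and division numbers), the function name, and the choice to skip missing grid cells, such as radar-blocked areas, are assumptions made for illustration.

```python
def division_mean_spi(grid_spi, grid_division, division):
    """Equal-weighted mean of gridded SPI values that fall in one climate division.

    Grid cells with no SPI value (None) are skipped; returns None if no cell remains.
    """
    values = [s for s, d in zip(grid_spi, grid_division)
              if d == division and s is not None]
    if not values:
        return None
    return sum(values) / len(values)


# Hypothetical grid of five cells, four of them in climate division 9.
grid_spi = [-1.0, -2.0, None, 0.5, -3.0]
grid_division = [9, 9, 9, 1, 9]
cdm_spi_9 = division_mean_spi(grid_spi, grid_division, 9)
```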
The climate-division-scale validation of the MPEDE-SPI used monthly SPI data from January 2008 through December 2010 in 9 of the 10 Texas climate divisions. Because of the radar blockage in western Texas (Trans-Pecos region), data from Texas climate division 5 (TX-CD5) were withheld from the validation analysis. Although drought plagued much of Texas during the 3-yr period, there was enough wetness in the period to provide comparisons of the CDM-SPI with the NCDC-SPI over a range of climatic conditions.
The CDM-SPI data are well correlated with the NCDC-SPI data (Fig. 4), with squared correlation coefficients of r^2 = 0.9027 at 6 months, r^2 = 0.8837 at 12 months, and r^2 = 0.7278 at 24 months. In general, CDM-SPI values were slightly lower than the corresponding NCDC-SPI values for both wet and dry conditions. Figure 5 compares the NCDC-SPI and CDM-SPI for 6-month, 12-month, and 24-month accumulation periods in selected Texas climate divisions.
The NCDC-SPI and CDM-SPI show good agreement in the High Plains region (TX-CD1), which is located in the Panhandle of Texas (Fig. 5b). In general, the CDM-SPI 24-month accumulation values are slightly lower than the NCDC-SPI values in the Low Rolling Plains (TX-CD2), particularly during the first half of the analysis period (Fig. 5c). TX-CD2 contains the Dyess AFB radar domain, which tended to be drier over longer accumulation periods than neighboring radar domains (Fig. 2). The underestimation of MPEAP in the Dyess AFB radar domain is a likely cause for at least some of the discrepancy between the NCDC-SPI and CDM-SPI at 24 months in TX-CD2. The difference between the NCDC-SPI and CDM-SPI at the 24-month accumulation period is much larger in Southern Texas (TX-CD9) than in other climate divisions (Fig. 5d). During the analysis period, the spatial coverage of COOP stations in the areas of TX-CD9 affected by drought was poor (Fig. 3). The following section provides an analysis of the drought that plagued Texas throughout much of 2008–09 using the MPEDE-SPI products, including a look into the improvement provided by the MPEDE-SPI in TX-CD9.
4. Overview of the 2008–09 Texas drought
The drought of 2008–09 was one of the most severe in the history of central and southern Texas, although the effects of the drought were statewide. This drought was probably the most severe of the past 100 yr in nine south-central Texas counties (Nielsen-Gammon and McRoberts 2009). By the end of the summer of 2009, the Texas drought had caused about $3.6 billion in losses to agriculture and ranching across the state (Nielsen-Gammon and McRoberts 2009). The genesis of the drought was in late 2007, which followed an extremely wet first nine months of 2007 that ranked among the wettest on record for statewide precipitation in Texas. The MPEDE-SPI products illustrate the spatial evolution of the drought from early 2008 through the end of 2009.
a. Summary of 2008
Below-normal precipitation was widespread across Texas during the period of October 2007–March 2008. Precipitation deficits were particularly unusual in far southern Texas, with the 6-month MPEDE-SPI indicating extreme dryness in several counties and exceptional dryness (Fig. 6a) in some areas of this region. The core of the extreme drought shifted from far southern Texas northward to south-central Texas by the beginning of the summer in 2008 with 6-month SPI values (Fig. 6b) below −2.0 in most of the region. Outside south-central Texas, most of the southern half of the state was moderately dry at the 6-month time scale.
During the summer of 2008, substantial rains fell in western, southern, and southeastern Texas, eliminating short-term drought from these regions. Hurricane Dolly brought torrential rains to extreme southern Texas in late July of 2008, and an active monsoon brought above-normal summer rainfall to western Texas. Hurricane Ike struck southeastern Texas during September of 2008 and eliminated drought conditions in this region. The severity and spatial coverage of the drought lessened across Texas by the end of the summer, leaving a core area of 12-month drought in south-central Texas (Fig. 6c).
b. Summary of 2009
September 2008–February 2009 was among the driest September–February periods on record across much of central and south-central Texas. The lack of precipitation resulted in widespread 6-month MPEDE-SPI values below −2.0 in central and south-central Texas (Fig. 7a), with 6-month MPEDE-SPI values below −3.0 near the Gulf Coast. The drought was most severe where the cool-season precipitation deficits combined with preexisting drought conditions, as shown by the 12-month MPEDE-SPI (Fig. 7b). The most extreme 12-month dryness was in a region that was severely to exceptionally dry during the 2008/09 winter and largely missed out on the precipitation brought by Hurricanes Dolly and Ike during the summer of 2008.
The drought continued to intensify across south-central and southern Texas during the spring and summer months of 2009 (Fig. 8a), with 6-month MPEDE-SPI values of less than −3.0 by the end of August of 2009, the height of the 2008–09 Texas drought in terms of the spatial coverage of extreme and exceptional drought in the state. Values of the 12-month MPEDE-SPI below −2.0 (at least extreme dryness) were widespread across central and southern Texas, covering about 25% of the state (Fig. 8b). By the end of the summer of 2009, south-central and southern Texas had experienced a 2-yr stretch of dryness that rivaled, and in some cases exceeded, the 1950s drought. Values of 24-month MPEDE-SPI of less than −2.5 covered a large swath of the state, from the Rio Grande in western Texas to east-central Texas. Three counties in south-central Texas (Guadalupe, Caldwell, and Bastrop) measured 24-month MPEDE-SPI values of less than −3.0, and this three-county region was likely hit the hardest by the 2008–09 Texas drought (Fig. 8c). Coastal areas of south-central Texas recorded the driest 12-month period in the history of the region from September 2008 through August 2009.
The Texas drought of 2008–09 gradually came to an end during the last four months of 2009, as evidenced by the 6-month MPEDE-SPI values from July through December 2009 (Fig. 8d). Short-term precipitation was near or above normal across the entire state, but significant long-term deficits remained in areas of south-central Texas.
c. Regional verification of MPEDE-SPI during the 2008–09 Texas drought
The overall severity of the drought in the southern half of Texas was clearly indicated by the station-based ACIS-SPI product (Fig. 3a) and the MPEDE-SPI (Fig. 3b). There were, however, regions of Texas for which the ACIS-SPI could not objectively determine the severity of drought because of a lack of precipitation stations in the area. One such region was in southern Texas (Fig. 9a), specifically Uvalde and Zavala Counties, where 12-month MPEDE-SPI values between −3.0 and −2.0 covered most of the region by the end of August of 2009 (Fig. 9b).
Analysis of drought in this region using the ACIS-SPI station plot map (Fig. 9c) is difficult because of the varying degrees of drought severity at surrounding stations. Figures 3a, 9c, and 9d are modifications of images directly downloaded from HPRCC-ACIS, which led to “blobs” representing stations in Fig. 9c and to jagged contours in Fig. 9d because of magnification of the original images. The 1 September 2009 ACIS-SPI contour plot (Fig. 9d), which is based on analysis and interpolation of the station data shown in Fig. 9c, suggested a decline in the 12-month drought severity from east to west across both Uvalde and Zavala Counties. The MPEDE-SPI suggests that the severity of drought in eastern Uvalde County is similar to that throughout all of Uvalde and Zavala Counties, with agreement between the MPEDE-SPI values (Fig. 9b) and collocated ACIS-SPI stations (Fig. 9c), the latter determined to the center of a blob.
To determine whether the MPEDE-SPI improves upon interpolation or analysis of the sparse station data, the authors contacted the Texas AgriLife county extension agent in each county. Agent J. Dalrymple of Uvalde County reported a total loss of dry land crops, and agent M. Valdez of Zavala County indicated a greater-than-95% loss of dry land crops. Both agents indicated that the only crops with significant yields were those that relied completely on irrigation. The little summertime precipitation that fell in these counties quickly evaporated because temperatures were consistently above normal.
Both county agents verified the spatial consistency and severity of the drought impacts as shown by the MPEDE-SPI (Fig. 9b) rather than a waning of impacts in the western parts of their counties as suggested by the ACIS-SPI (Fig. 9d). Agent Dalrymple is a volunteer fire fighter and mentioned being very busy during the summer of 2009, fighting fires that closed in on Kinney County, which borders Uvalde County to the west. Agent Valdez reported that any variations in Zavala drought impacts were isolated, with an overall 300% increase in total irrigation costs and a 25%–40% reduction in cattle numbers.
The MPEDE-SPI products fill an important gap in drought-monitoring capabilities. The purpose of such a high-resolution drought-monitoring product is to portray accurately the status of drought and pluviosity at different time scales with subcounty detail. Such objective information at the subcounty scale is useful input information for U.S. Drought Monitor assessments of drought conditions within climate divisions. The county and subcounty resolution of the MPEDE-SPI values can be combined with information from currently available tools to produce drought depictions with the spatial resolution of MPE calibrated with a broad spectrum of drought-monitoring-tool information.
Further refinement is needed before MPEDE-SPI can be reliably applied across broader segments of the United States. MPE errors due to radar biases accumulate over time, leading to artificial spatial discontinuities in drought depiction. In many parts of the country, radar estimates of precipitation are unreliable because of the shallowness of the precipitation or beam blockage from topography. Nonetheless, the MPEDE-SPI that is presented here represents a unique source of high-resolution drought information that is a useful addition to the existing arsenal of drought-monitoring tools. Future work will use monthly precipitation estimates and known radar error characteristics to provide an additional radar-rainfall calibration step that will reduce artificial biases in the MPEDE-SPI estimates.
A sincere thank you is given to J. R. M. Hosking for the series of computer programs (Hosking 1996) that were written for station clustering, determination of distribution fit, and determination of the gridpoint percentiles and that have been made freely available online. In addition, the regional frequency analysis work done by Hosking and Wallis (1997) was crucial to the development of the high-resolution SPI algorithm.
a. Determining characteristics of COOP stations for clustering
The initial clustering stage determines the station characteristics that can accurately describe a station’s properties. The clustering method grouped stations with similar geographical properties and precipitation characteristics. Hosking and Wallis (1997) make a clear distinction between at-site characteristics and at-site statistics when performing any precipitation cluster analysis. Characteristics of a station include its geographical identity such as latitude, longitude, and elevation and other such quantities that are known for each COOP station prior to any information derived from the daily precipitation variability. The 1971–2000 monthly and annual precipitation normal values (National Climatic Data Center) are also examples of site characteristics used in this study. COOP stations in Texas and adjoining states were required to have monthly and annual 1971–2000 precipitation normals and at least 40 years of precipitation data to be eligible for use in the regional frequency analysis.
Table A1 contains the six at-site characteristics used to define stations in the cluster analysis of the 497 COOP stations with sufficient precipitation records. These site characteristics X were transformed to create a cluster variable Y with range −1 < Y < 1. The final two site characteristics in Table A1 are intended to cluster stations with similar precipitation seasonality.
b. Clustering of COOP stations
Commonly used methods of cluster analysis include single linkage, average linkage, complete linkage, Ward’s method, the centroid technique, and k means (Robeson and Doty 2005). With the exception of k means, each is a hierarchical clustering technique that uses a measure of (dis)similarity or Euclidean “distance” between each pair of objects, whether these are single stations or clusters of stations. The hierarchical techniques all use the following routine, starting with each station as an individual cluster:
1) The Euclidean distance between all stations is calculated using the differences between the six at-site characteristics.
2) The “closest” pair of objects is merged into a cluster.
3) The Euclidean distances are recalculated.
4) Steps 2 and 3 are repeated until only one cluster remains.
Ward’s method joins clusters that minimize the within-cluster sum of squared distances SS (Stooksbury and Michaels 1991) and is strongly biased in favor of producing clusters with roughly the same number of sites (Hosking and Wallis 1997). This method is preferred for the present purposes because of its tendency not to isolate single stations, as occurs more frequently with other hierarchical clustering techniques, such as the average-linkage and single-linkage methods. Ward’s method was implemented using a clustering algorithm that followed Hosking (1996).
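The agglomerative routine and Ward’s merging criterion can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the Hosking (1996) routine used in the study: the function name `ward_clusters` and its arguments are hypothetical, and the merge cost is the standard increase in within-cluster SS, n_a n_b/(n_a + n_b) times the squared distance between the two cluster centroids.

```python
from itertools import combinations

def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    points: list of equal-length tuples (e.g., transformed site characteristics).
    Returns a sorted list of clusters, each a sorted list of indices into points.
    """
    # Start with every station as its own cluster: (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Increase in within-cluster sum of squares if a and b were merged.
            cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        # Size-weighted centroid of the merged cluster.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (sorted(ma + mb), centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(members for members, _ in clusters)
```

Stopping the loop at `n_clusters` rather than at a single cluster corresponds to cutting the hierarchy at a chosen level, which is how a solution with a given number of clusters is read off.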
c. Selection of an ideal clustering solution
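COOP stations were clustered using Ward’s method, and the reliability of the clusters from each iteration was assessed to determine an ideal clustering solution. Hosking and Wallis (1997) define a discordancy measure Di for site i as
$$D_i = \frac{1}{3} N (\mathbf{u}_i - \bar{\mathbf{u}})^{\mathrm{T}} \mathbf{A}^{-1} (\mathbf{u}_i - \bar{\mathbf{u}}), \tag{A1}$$
where ui = $[t^{(i)}, t_3^{(i)}, t_4^{(i)}]^{\mathrm{T}}$ is the vector of sample L-moment ratios for site i, $\bar{\mathbf{u}}$ is the cluster average of all of the ui vectors, N is the number of stations within a cluster, and A is defined as
$$\mathbf{A} = \sum_{i=1}^{N} (\mathbf{u}_i - \bar{\mathbf{u}})(\mathbf{u}_i - \bar{\mathbf{u}})^{\mathrm{T}}. \tag{A2}$$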
The discordancy value was computed for each station within a cluster, and a station was deemed to be an outlier if Di was too large according to the criteria outlined in Hosking and Wallis (1997).
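As an illustration, the discordancy measure of Hosking and Wallis (1997) can be computed as below. This is a sketch that assumes the standard form Di = (N/3)(ui − ū)ᵀA⁻¹(ui − ū), with A the sum of outer products of the deviations; the function names `solve` and `discordancy` are hypothetical, and the small linear solver is included only to keep the example free of external libraries.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x

def discordancy(u):
    """Return D_i for every site; u is a list of [t, t3, t4] vectors."""
    n, k = len(u), len(u[0])
    ubar = [sum(row[j] for row in u) / n for j in range(k)]
    dev = [[row[j] - ubar[j] for j in range(k)] for row in u]
    # A = sum over sites of (u_i - ubar)(u_i - ubar)^T
    A = [[sum(d[r] * d[c] for d in dev) for c in range(k)] for r in range(k)]
    # D_i = (N / 3) * d_i^T A^{-1} d_i, using a solve instead of an explicit inverse.
    return [n / 3.0 * sum(di * xi for di, xi in zip(d, solve(A, d))) for d in dev]
```

With this definition the Di values within a cluster always average to 1, which gives a quick sanity check on any implementation. The matrix A is singular for clusters with too few stations, so the sketch is only meaningful when a cluster has more sites than L-moment ratios.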
At various stages, cluster sets resulting from Ward’s method were tested both for homogeneity and the presence of discordant stations. The homogeneity of each cluster was tested with 500 Monte Carlo simulations, with each simulation being a homogeneous simulated cluster containing the same number of stations. All simulated stations had the same number of years of data as their real-world counterparts, and simulated rainfall data were generated using the Hosking (1996) computer algorithm. A solution with 40 clusters produced the largest clusters that were still small enough to contain few discordant stations. Rearrangement of a few stations based on subjective judgment, including the addition of three clusters and the combination of others, left the 497 stations divided into 41 clusters (Fig. A1).
d. L-moment ratios
For each calendar day of the year, sample L moments were calculated at each station for several durations of accumulated precipitation ending on that day, using all years for which data are available. The calculation of sample L moments at each station, yearday, and accumulation period (SDP) begins with ordering the given precipitation data from smallest to largest. The probability-weighted moments br are calculated using (A3), where r ≥ 0 and j is the rank of the value $x_{j:n}$ within the ordered sample of n years of precipitation data. The br values are used in estimating the sample L moments l1 [(A4)], l2 [(A5)], l3 [(A6)], and l4 [(A7)] of a probability distribution:
$$b_r = n^{-1} \sum_{j=r+1}^{n} \frac{(j-1)(j-2)\cdots(j-r)}{(n-1)(n-2)\cdots(n-r)}\, x_{j:n}, \tag{A3}$$
$$l_1 = b_0, \tag{A4}$$
$$l_2 = 2b_1 - b_0, \tag{A5}$$
$$l_3 = 6b_2 - 6b_1 + b_0, \tag{A6}$$
$$l_4 = 20b_3 - 30b_2 + 12b_1 - b_0. \tag{A7}$$
From the sample L moments lr (r = 1–4), sample L-moment ratios are computed for each SDP. The sample L-moment coefficient of variation (L-CV) is denoted as t [(A8)]. The L skewness (r = 3) and L kurtosis (r = 4) are denoted as t3 and t4, respectively [(A9)]:
$$t = l_2 / l_1, \tag{A8}$$
$$t_r = l_r / l_2, \quad r = 3, 4. \tag{A9}$$
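The calculation of sample L moments and L-moment ratios from one SDP sample can be sketched as follows. The code assumes the standard unbiased probability-weighted-moment estimators of Hosking and Wallis (1997); the function names `sample_l_moments` and `l_moment_ratios` are hypothetical, and at least four values are needed so that the weights for r = 3 are defined.

```python
def sample_l_moments(values):
    """Return (l1, l2, l3, l4) from a sample of at least four values."""
    x = sorted(values)  # ascending order statistics x_{1:n} <= ... <= x_{n:n}
    n = len(x)
    b = []
    for r in range(4):
        total = 0.0
        for j in range(r + 1, n + 1):  # j is the 1-based rank
            w = 1.0
            for k in range(1, r + 1):
                w *= (j - k) / (n - k)
            total += w * x[j - 1]
        b.append(total / n)  # probability-weighted moment b_r
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4

def l_moment_ratios(values):
    """Return (t, t3, t4): L-CV, L skewness, and L kurtosis."""
    l1, l2, l3, l4 = sample_l_moments(values)
    return l2 / l1, l3 / l2, l4 / l2
```

For a symmetric sample such as 1, 2, 3, 4, 5 the L skewness is zero, which is a convenient check on the signs of the coefficients. The ratios are undefined when l1 or l2 is zero, as can happen for short accumulation periods in which every year recorded no precipitation, and the sketch does not handle that case.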
To ensure continuity between days and reduce sampling errors, spectral smoothing was performed on the time series of each L-moment ratio, retaining only the first three annual harmonics. Figure A2 provides an example of this smoothing as applied to L-moment ratios at Camp Mabry in Austin, Texas. The L-moment ratios resulting from these smoothed time series were then used in distribution parameter estimation for each SDP.
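A smoothing step of this kind can be sketched as a truncated Fourier series. The example assumes one value per yearday on an evenly spaced annual grid and reconstructs the series from its mean and first three harmonics; the function name `smooth_annual_cycle` is hypothetical, and the study's actual implementation (for example, its handling of leap days) is not specified in the text.

```python
import math

def smooth_annual_cycle(series, n_harmonics=3):
    """Keep the mean and the first n_harmonics annual harmonics of a yearday series."""
    n = len(series)
    mean = sum(series) / n
    smoothed = [mean] * n
    for k in range(1, n_harmonics + 1):
        angles = [2.0 * math.pi * k * d / n for d in range(n)]
        # Discrete Fourier coefficients of harmonic k (valid while k < n / 2).
        a_k = 2.0 / n * sum(v * math.cos(w) for v, w in zip(series, angles))
        b_k = 2.0 / n * sum(v * math.sin(w) for v, w in zip(series, angles))
        for d, w in enumerate(angles):
            smoothed[d] += a_k * math.cos(w) + b_k * math.sin(w)
    return smoothed
```

Because the reconstructed series is periodic, the smoothed value on the last day of the year joins continuously with the value on the first day, which is the continuity property the smoothing is intended to provide.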
e. Choice of frequency distribution
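Within each cluster, regional sample L-moment ratios tR [(A10)], $t_3^R$ [(A11)], and $t_4^R$ [(A12)] were calculated (Hosking and Wallis 1997) on the basis of the L-moment ratios ($t^{(i)}$, $t_3^{(i)}$, and $t_4^{(i)}$) of each of the N member stations and were weighted by the number of years of data ni at each station:
$$t^R = \frac{\sum_{i=1}^{N} n_i\, t^{(i)}}{\sum_{i=1}^{N} n_i}, \tag{A10}$$
$$t_3^R = \frac{\sum_{i=1}^{N} n_i\, t_3^{(i)}}{\sum_{i=1}^{N} n_i}, \tag{A11}$$
$$t_4^R = \frac{\sum_{i=1}^{N} n_i\, t_4^{(i)}}{\sum_{i=1}^{N} n_i}. \tag{A12}$$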
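The record-length weighting amounts to a weighted mean of each ratio across the member stations. A minimal sketch follows; the function name `regional_ratios` and the numbers in the usage note are hypothetical.

```python
def regional_ratios(station_ratios, years):
    """Record-length-weighted regional averages of (t, t3, t4).

    station_ratios: list of (t, t3, t4) tuples, one per member station.
    years: number of years of data at each station, in the same order.
    """
    total = sum(years)
    return tuple(
        sum(n * r[j] for n, r in zip(years, station_ratios)) / total
        for j in range(3)
    )
```

For two stations with ratios (0.2, 0.1, 0.1) and (0.4, 0.3, 0.2) and record lengths of 40 and 60 years, the regional L-CV is (40 × 0.2 + 60 × 0.4)/100 = 0.32, so the station with the longer record pulls the regional value toward its own.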
The L kurtosis $\tau_4^{\mathrm{DIST}}$ of each of five candidate distributions was found using a regional L moment of l1 = 1 and regional L-moment ratios of tR and $t_3^R$ to determine the best overall fit. The distributions used were the generalized logistic, generalized extreme value, generalized Pareto, generalized normal, and Pearson type-III (P-3) distributions.
Using the sample L moment l1 = 1 and L-moment ratios tR, $t_3^R$, and $t_4^R$ for each cluster, kappa distribution parameters were calculated. For each cluster, NSIM = 500 Monte Carlo simulations were performed using the Hosking (1996) algorithm. Each simulation m was characterized by a regional L kurtosis $t_4^{[m]}$ and was compared with the $\tau_4^{\mathrm{DIST}}$ of all five candidate distributions.
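The average bias B4 was computed for each cluster by comparing its sample regional L kurtosis $t_4^R$ with the regional L kurtosis $t_4^{[m]}$ of each simulation [(A13)]. The standard deviation s4 of $t_4^R$ was calculated [(A14)] to determine the goodness-of-fit measure ZDIST for each distribution [(A15)] given $\tau_4^{\mathrm{DIST}}$ (Hosking and Wallis 1997):
$$B_4 = N_{\mathrm{SIM}}^{-1} \sum_{m=1}^{N_{\mathrm{SIM}}} \left( t_4^{[m]} - t_4^R \right), \tag{A13}$$
$$s_4 = \left\{ (N_{\mathrm{SIM}} - 1)^{-1} \left[ \sum_{m=1}^{N_{\mathrm{SIM}}} \left( t_4^{[m]} - t_4^R \right)^2 - N_{\mathrm{SIM}} B_4^2 \right] \right\}^{1/2}, \tag{A14}$$
$$Z^{\mathrm{DIST}} = \frac{\tau_4^{\mathrm{DIST}} - t_4^R + B_4}{s_4}. \tag{A15}$$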
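Given the simulated regional L-kurtosis values, the goodness-of-fit measure reduces to a few lines. The sketch below assumes the Hosking and Wallis (1997) definitions of the bias, standard deviation, and ZDIST; the function name `z_dist` and its argument names are hypothetical, and the Monte Carlo simulation that produces the simulated values is not shown.

```python
import math

def z_dist(tau4_dist, t4_regional, t4_sims):
    """Goodness-of-fit measure Z for one candidate distribution.

    tau4_dist: L kurtosis of the fitted candidate distribution.
    t4_regional: observed regional average L kurtosis of the cluster.
    t4_sims: regional L kurtosis from each Monte Carlo simulation.
    """
    nsim = len(t4_sims)
    # Average bias of the simulated regional L kurtosis.
    b4 = sum(t - t4_regional for t in t4_sims) / nsim
    # Standard deviation of the simulated regional L kurtosis.
    s4 = math.sqrt(
        (sum((t - t4_regional) ** 2 for t in t4_sims) - nsim * b4 ** 2) / (nsim - 1)
    )
    return (tau4_dist - t4_regional + b4) / s4
```

A candidate whose L kurtosis matches the bias-corrected regional value gives a result of zero, so candidates are compared by the magnitude of the returned value rather than by its sign.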
The ZDIST metric was used to find the distribution that would best represent all the clusters as a whole. The goodness-of-fit tests performed on all clusters for each SDP found the P-3 to minimize |ZDIST|, making it the best fit among the five candidate distributions. For each SDP, the P-3 parameters were transformed from t (individual station L-CV) and $t_3^R$ (regional skewness) into location μ, scale σ, and shape γ parameters, following Hosking (1996). Using t from station data ensures that the individual station scale was preserved, and using $t_3^R$ ensures a regional shape to each SDP distribution.