This study quantifies the spatial distribution of precipitation patterns on an annual basis for southeast Louisiana. To compile a long-term record of 24-h rainfall, rainfall reports collected by National Weather Service (NWS) cooperative observers were gathered from National Climatic Data Center (NCDC) archives, private collections of observational data held at regional and local libraries, NWS offices, and local utility providers. The reports were placed into a digital database in which each station’s record was subjected to an extensive quality control process. This process produced a database of daily rainfall reports for 59 south Louisiana stations for the period 1836–2002, with extensive documentation for each site outlining the differences between the study’s data and the data available from the NCDC Web page. A statistical methodology was developed to determine if the four NCDC climate divisions for southeast Louisiana accurately depict average monthly rainfall for the area. This method employs cluster analysis, using Euclidean distance as the measure of dissimilarity for the clustering technique. To resolve missing rainfall observations, an imputation scheme was developed that uses the two most similar stations (based on Euclidean distance) to determine appropriate values for missing rainfall observations. Results from this testing structure show statistical evidence of precipitation microclimates across south Louisiana at higher spatial scales than those of the NCDC climate zones. Quantifying the spatial extent of daily precipitation and documenting historical trends of precipitation provides critical design information for regional infrastructure within this highly vulnerable area of the central Gulf Coast region.
Human and economic effects due to hydrologic extremes span across every ecosystem dimension. Hydrologic extremes are responsible for our nation’s most catastrophic events in terms of disease, death, and economic loss. Fatality statistics compiled by the National Weather Service (NWS) Office of Climate, Weather, and Water Services and the National Climatic Data Center (NCDC), for the 30-yr period from 1977–2006, indicate floods are responsible for the highest number of U.S. storm-related deaths annually (NOAA 2007). National Oceanic and Atmospheric Administration (NOAA) sources cite annual economic losses due to droughts and floods are each in the billions of dollars (NWS 2007; NOAA 2006); however, concerns have been raised on the calculation of actual economic losses due to floods (NWS 2007; Downton et al. 2005; Mileti 1999). Nationally, economic effects of floods can be documented by the dollar amounts paid to flood insurance clients holding policies administered by the Federal Emergency Management Agency (FEMA) through the National Flood Insurance Program (NFIP). Loss dollars paid by the NFIP for the 10-yr period ending in 2006 showed just over $2.5 billion (U.S. dollars) paid out annually to policyholders. This has resulted in NFIP incurring a debt for almost $20 billion to the U.S. Treasury (ASFPM 2007).
This rainfall study is focused on the southern portions of Louisiana and Mississippi (Fig. 1). The catastrophic damage and loss of life due to hydrologic hazards within this area emphasize the need to accurately assess and monitor hydrologic variables to predict the region’s risk to hydrologic hazards (Townsend 2006). Numerous hydrologic events within the study area have been responsible for national policy change and congressional reforms. Until 2005, the 1927 Mississippi River flood set the benchmark for regional hydrologic disasters. The 1927 event changed U.S. flood control policies and the demographics of the rural south—and eventually the nation (Barry 1997; Trotter et al. 1998). In the late 1990s and early 2000s, catastrophic events within this region continued to create institutional reforms, including the 1997 Mississippi River flooding event (Nossiter 1997), Hurricane Georges (Pasch et al. 2001), Tropical Storm Allison (Beven et al. 2003; NOAA 2001), Tropical Storm Isidore, and Hurricane Lili (Schumacher and Johnson 2006). However, these events pale when compared to the devastation of Hurricanes Katrina and Rita in 2005. In 2005, primarily as a result of Hurricanes Katrina and Rita, more than 200 000 claims were paid out by the NFIP, totaling more than $17 billion, the most claims and losses paid out since 1978 (FEMA 2007). The majority of these NFIP claims were filed within the area of this study (NWS 2006; Graumann et al. 2005; RMS 2005).
This research employs statistical techniques to quantify the spatial distribution of precipitation patterns on a monthly and annual basis for southeast Louisiana. The assessment of precipitation patterns using a long-term record of observations is fundamental to understanding the area’s exposure to risk from hydrologic extremes. The largest number of U.S. flood events occurs within a 5-state area composed of Oklahoma, Texas, Arkansas, Louisiana, and Mississippi (Konrad 2001). Not surprisingly, the largest sum of economic flood losses in the nation (a total of $15.2 billion from 1983 to 1997) occurs within this region (Pielke Jr. and Downton 2000). Excessive rainfall, the effects of which are exacerbated by social factors, is the primary factor for catastrophic economic loss. The highest frequency of extreme 24-h precipitation events in the nation (Konrad 2001) occurs within this 5-state region. Additional research indicates total flood damage is most closely related to rainfall exceeding a 5-yr recurrence interval occurring over two days (Pielke Jr. and Downton 2000), with a secondary factor being the frequency of rain or snow days.
Most importantly, total flood damage has the highest correlation to a recurrence interval determined for a specific station (Pielke Jr. and Downton 2000) and not the absolute rainfall threshold for an area, as provided by NOAA Technical Paper 40 (Hershfield 1961). This finding illustrates the importance of spatial resolution in determining precipitation patterns to assess risk. It is important to understand and quantify the spatial patterns of rainfall extremes within individual watersheds and not regional areas. There are existing products that attempt to describe the rainfall distribution in the study area (e.g., NCDC climate zones). However, the primary question motivating this particular research activity is: Do these existing products have the necessary spatial resolution to create an accurate picture of rainfall on a monthly, annual, and historic basis for use in hydrologic modeling and risk analysis? Specifically, do NCDC climate zones fully capture the spectrum of high-impact, low-frequency precipitation events accurately within the study area? Knowledge of the location, amount, timing, and intensity of rainfall within watersheds in southeast Louisiana is critical to quantifying the complex relationships between precipitation, social factors, and the magnitude of flood damage.
The spatial distribution of rainfall observing sites and the extensive period of rainfall records compiled for this study provide a foundation to examine the spatial and temporal patterns of rainfall in one of the most vulnerable coastal watersheds and endangered ecosystems in the United States. Within southern Louisiana and south Mississippi, as in other coastal areas, significant urbanization has occurred. In this study region, rainfall spans all spatial and temporal scales. Individual thunderstorms produce heavy rainfall that overwhelms urban systems. Seasonal synoptic-scale systems produce storm complexes that affect the region for extended periods, affecting regional river systems such as the Mississippi and the Pearl. The counterpart to flooding, drought, is equally damaging. Drought reduces soil stability and allows saltwater to push inland, which produces catastrophic damage to fragile coastal ecosystems. Droughts have been examined as a possible cause for brown marsh episodes within Louisiana wetlands. The brown marsh event of 2000 damaged or destroyed 100 000 acres of smooth cordgrass (Spartina alterniflora; Brown and Pezeshki 2007; McKee et al. 2004; Stewart et al. 2001), affecting the stability of coastal soils in marshlands. During this same drought event, interior portions of the Maurepas swamp experienced an extended episode of saltwater intrusion, which reduced the annual late-season primary tree productivity for 2000 by roughly 20%–40% when compared to a year of normal precipitation (Shaffer et al. 2003).
The resiliency of communities in the study area is directly linked to the resiliency of its ecosystems. Fundamental to understanding this ability to recover and strategically adapt is an accurate assessment of watershed-based rainfall patterns founded on a long-term record of rainfall observations (Tierney et al. 2001). An accurate rainfall assessment provides the foundation for research, which quantifies the effects of socially based factors—such as population trends, urbanization, and policy mandates—which may exacerbate the effects of floods and droughts (Downton et al. 2005; Smith et al. 2002; McBean and Rovers 1998; Gupta 1995; American Iron and Steel Institute 1995; Linsley and Franzini 1972) and possibly alter the climatology of precipitation (Ntelekos et al. 2007). Assessments of rainfall patterns using a long-term multisource precipitation database provide valuable information on the spatial pattern of rainfall for periods of record spanning active and inactive coastal storm periods, regional growth, and watershed and coastline modification. Information gained from the long-term precipitation database compiled for this study and the statistical analysis presented by this manuscript are critical to determining the spatial extent of local factors affecting the frequency and magnitude of rainfall extremes, which have tremendous consequences on infrastructure planning activities for the future.
2. Purpose of research and overview of research components
The objective of this study was to determine the spatial extent of monthly rainfall patterns across the central Gulf Coast region of the United States to investigate the accuracy of NCDC climate zones in depicting subregional rainfall patterns. For this study, NCDC climate zones 5, 6, 8, and 9 define the southeast Louisiana area (Fig. 2). A weakness of the NCDC climate division system is that the boundaries are not based, in most cases, on climate considerations. The climate divisions were delineated based on geography, political boundaries, drainage basins, crop districts, and/or weather bureau forecast areas (Guttman and Quayle 1996; Fovell and Fovell 1993). These concerns are relevant for the study area, where the northern and eastern boundaries of NCDC climate zone 6 coincide with the Louisiana and Mississippi state lines and the presence of the Mississippi and Pearl River basins. To conduct the assessment, a statistical methodology was developed to identify the existence of precipitation microclimates and then determine the spatial extent of these precipitation areas using a long-term rainfall database compiled from daily rainfall observations recorded at NWS cooperative observing stations.
The initial project effort focused on the development of a high-quality, long-term database of daily precipitation compiled in a digital format. The primary objective of this effort centered on compiling all rainfall observations available from multiple sources (NCDC Web sites, paper documents, handwritten observation logs). Digital spreadsheets with standardized formats were developed to ease data access and digitizing because most New Orleans–area data prior to 1940 were rescued from fragile data sources (dss). This activity rescued data from paper formats in private repositories, which included valuable information on extreme rainfall events. In examining rainfall records compiled using only the individual NCDC Annual Climatological Summary (ACS) sheets or the NCDC Record of Climatological Observations (RCOs), many extreme rainfall events would not have been captured, leading to erroneous assumptions on rainfall extremes and frequency of events. This finding mirrors the conclusion of the National Academies of Science that rescued data have a critical role in augmenting available data to reduce the uncertainties of predicting hydrologic hazards (Water Science and Technology Board 2004). In summary, the extensive effort placed on the data mining and data digitizing accomplished two objectives: 1) the digitizing process compiled the data into a uniform format to facilitate the tests of the statistical methodology; and 2) an information site capturing southeast Louisiana rainfall reports from multiple sources was created with extensive documentation on data quality control and quality assurance procedures. Only 24-h rainfall observations that have a period of record from 1836 to 2002 were included in the database. All data sources and database decisions are available in the electronic supplement of this publication (http://dx.doi.org/10.1175/2009JHM1076.s1).
The second research focus of this study centered on developing a process to characterize the spatial variability of 24-h precipitation across southeast Louisiana. A statistical methodology was developed to determine statistically significant differences in average monthly rainfall between individual stations. As a result of these statistical tests, individual stations were then grouped to depict the spatial extent of monthly precipitation patterns derived from 24-h rainfall reports. This grouping activity, combined with individual observations contained in the long-term database, provides information on the quantity of rainfall and recurrence intervals for rainfall extremes within southern Louisiana. The results and conclusions of the study examine how differences in spatial resolutions, such as those of NCDC climate zones, affect disseminated information regarding monthly rainfall patterns and how this information directly affects assessments of rainfall quantity and variability within coastal watersheds.
3. Database development
a. Data discovery procedures
Initial efforts in assembling the long-term digital database of daily rainfall focused on identifying rainfall observing sites operating in the study area of southeast Louisiana and southern Mississippi prior to 31 December 2002. This search area was right of a line extending from the southeast Louisiana community of New Iberia, Louisiana, northward to Brookhaven, Mississippi, and then east to the Mississippi–Alabama state line near Leakesville, Mississippi, and then south along the state line to the Mississippi coastal community of Pascagoula (Fig. 1). Table 1 lists the sources used, publication medium, and data repository Web site if applicable.
The initial search yielded more than 200 rainfall stations reporting daily rainfall within the defined study area for the period ending 31 December 2002. Extensive effort was placed on verifying and digitizing paper reports. These reports were contained in government documents and handwritten reports on individual station observation sheets, such as those discovered in log books located in the storeroom of the NWS Weather Forecast Office New Orleans/Baton Rouge. The digitizing effort significantly extended the daily rainfall records of many southeast Louisiana and southern Mississippi stations and secured fragile data vulnerable to damage. Most importantly, it provided multiple sources to investigate and verify data values in the event of data discrepancies among data sources.
b. Database quality assurance procedures
To detect data discontinuities due to observer errors, station relocations, and changes in the observation program at a site, an accurate and continuous observational record is needed (Robinson 1990; Wu et al. 2005; Kunkel et al. 2005). To accomplish the task of building long-term continuous rainfall records, a three-step quality assurance process was developed. This process helped document and resolve as many data discrepancies between the Table 1 data sources as possible. In the first step of the quality assurance process, every station’s observational record was examined to identify stations with near-continuous daily rainfall observations of 10 yr or greater. There were 94 Louisiana stations and 42 southern Mississippi stations that met the minimum requirement of a rainfall record longer than 10 yr.
The second step of the quality assurance process identified missing observations and documented all data discrepancies within the observational records of these stations.
The third quality assurance step involved several tasks to resolve as many discrepancies and cases of missing data as possible. The following three tasks were accomplished: 1) examination of each data discrepancy between the Table 1 data sources, with an initial effort to resolve differences using a decision hierarchy; 2) development of mean areal precipitation (MAP) sheets to resolve remaining missing data by examining regional rainfall patterns; and 3) resolution of any remaining missing data by plotting daily precipitation patterns and correlating archived weather data and historical reports. Steps (a)–(e) of Fig. 3 illustrate this process of data synthesis.
1) Resolving data discrepancies
In many cases, the Table 1 data sources (U.S. Department of Commerce 1932, 1956; U.S. Department of Agriculture 2005; U.S. Department of Commerce 2005) reported different daily values or monthly sums. There were primarily four types of data discrepancies:
Type 1: Data from RCOs [data source 2 (ds 2)] have 999.99 for a particular day without the data being flagged with an S to indicate the value is included in the next observation. Scanned paper monthly Louisiana and Mississippi climatological publications (ds 4) have an asterisk for the day to indicate the measurement is included in the next rainfall report.
Type 2: DS 2 and ds 4 each have the same monthly sum. ACS (ds 3) has MM (indicating missing data) for the monthly sum.
Type 3: Pre-1948 ds 4 have daily rainfall reports. DS 3 and ds 2 documents are not available.
Type 4: DS 2 and NCDC CD-ROM monthly sum (ds 7) do not agree with ds 3. DS 4 monthly sums do not agree with ds 3 as a result of different daily data values.
Step 1 of Fig. 3 outlines the procedure used to discover and document data discrepancies. Step 2 of Fig. 3 illustrates the decision hierarchy used to resolve the documented data discrepancies, which developed from discussions between the lead author and senior management of NCDC (R. Vose and T. Karl 2003, personal communication). From these discussions, the following decision hierarchy evolved: 1) for monthly sums, ds 3 data should take precedence over all other sources; and 2) values found in paper documents, including ds 4, should take precedence over other NCDC Internet information sources as a result of digitizing issues.
2) Use of MAP data sheets
MAP data sheets were compiled to provide a daily snapshot of rainfall from 1836 to 2002 to depict daily rainfall across the study’s geographic area and the number of reporting stations. This was necessary to examine station records for additional data gaps. The first task involved the identification of rain and no-rain days across the area. This process was not implemented for reports before 1900 because of a reduced number of observing stations. For missing data after 1900, if the MAP sheet showed no rainfall observed at any observing site, the day was defined as a “no rain” day. For observations from 1995 to 2002, Next Generation Weather Radar (NEXRAD) mosaics from NCDC archives were used to detect the absence or presence of radar echoes to confirm a rain or no-rain day. If a missing observation occurred on a no-rain day, with careful consideration of observing times, then the missing value was changed to a zero. The station’s spreadsheet and the database documentation record this substitution.
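The rain/no-rain rule described above can be illustrated with a minimal sketch. It assumes each station record is a list of daily totals with `None` marking a missing observation; the function name and the example data are hypothetical. The sketch deliberately omits the checks on observing times, the pre-1900 exclusion, and the NEXRAD confirmation that the study applied.

```python
from typing import Dict, List, Optional

def fill_no_rain_days(
    records: Dict[str, List[Optional[float]]]
) -> Dict[str, List[Optional[float]]]:
    """Set a missing value (None) to 0.0 on days when every reporting station saw no rain."""
    n_days = len(next(iter(records.values())))
    filled = {name: list(vals) for name, vals in records.items()}
    for day in range(n_days):
        reported = [vals[day] for vals in records.values() if vals[day] is not None]
        # A "no rain" day needs at least one report, and every report must be zero.
        if reported and all(v == 0.0 for v in reported):
            for vals in filled.values():
                if vals[day] is None:
                    vals[day] = 0.0
    return filled

# Hypothetical three-station, three-day example (values in mm).
example = {
    "A": [0.0, None, 5.1],
    "B": [None, 2.0, None],
    "C": [0.0, 0.0, 0.0],
}
result = fill_no_rain_days(example)
```

In this example only station B's first day is filled, because days 2 and 3 each have at least one nonzero report elsewhere in the network.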
3) Mapping regional precipitation patterns
A final effort to resolve missing rainfall data involved all data sources with information on weather conditions. These information sources included reports found in Monthly Weather Review articles, NEXRAD radar archives, and archived surface and atmospheric weather maps. If a rain report or NEXRAD echo was discovered in the vicinity of the station, then rainfall reports were plotted. To derive a possible value for the missing rainfall observation, a mean daily rainfall for each month was calculated for each station. This mean daily rainfall was compared with the plotted rainfall reports. In fewer than 10 cases, the mean daily rainfall value was a suitable solution. For this small number of cases, the substituted values were less than 0.65 mm, and the substitutions showed no significant effect on station rainfall statistics. For all other cases, the rainfall plots suggested values that exceeded 24-h precipitation values for a 25-yr return period derived from NOAA Technical Paper 40 (Hershfield 1961). In these cases, the missing daily observation remained in the station’s database record and the data gap was retained. In all instances, the station’s spreadsheet and documentation, available from the electronic supplement (http://dx.doi.org/10.1175/2009JHM1076.s1), outline the decision process and the substituted value.
As a result of this three-step quality assurance process, most data discrepancies were resolved, resulting in longer continuous precipitation records. It is important to note two activities that resulted in resolving a majority of the data discrepancies. The first activity was entering daily data from the scanned paper monthly and annual climatological publications (ds 4 and ds 5). For most stations, daily rainfall observations and monthly sums were available for an extensive period of time prior to the period of record on the NCDC ACS Web access page. The second activity involved resolving type 1 data discrepancies in which the monthly climatological data publication indicated an asterisk and the RCO data listed a 999.99 value, often not flagged with an S to indicate the value is included in a subsequent observation. For these cases, 0.00 was entered into the spreadsheet for the asterisk value because the rainfall was included in a subsequent observation. This substitution did not affect the values used for this specific statistical methodology because this investigation seeks to quantify spatial precipitation patterns based on monthly averages and not daily values. However, this substitution would affect the results of any study focused on daily rainfall distribution and frequency. All data discrepancies for the stations listed in Tables 2 and 3 along with all decisions related to what values were entered into each of the individual station spreadsheets have been extensively documented and are provided in the electronic supplement (http://dx.doi.org/10.1175/2009JHM1076.s1).
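A short sketch may help show why the type 1 substitution leaves monthly sums unchanged. It assumes daily values are held in a list and that the accumulated-value marker appears as the number 999.99, as described for the RCO data; the function name and sample values are hypothetical.

```python
ACCUMULATED = 999.99  # marker described in the text for RCO (ds 2) data

def resolve_accumulated(daily):
    """Replace accumulated-value markers with 0.0; the rain is in the next report."""
    return [0.0 if value == ACCUMULATED else value for value in daily]

# Hypothetical four-day record (mm): day 2 was not read, day 3 holds both days' rain.
daily = [0.0, 999.99, 31.5, 0.0]
cleaned = resolve_accumulated(daily)
monthly_total = sum(cleaned)
```

The total is correct, but the 31.5 mm is attributed to a single day, which is why a study of daily rainfall frequency would be affected.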
4) Identification of data discontinuities
Once the quality assurance efforts to resolve data discrepancies and missing data were completed, station history forms were examined to determine when observation program changes occurred at any of the stations. Dates for station equipment changes, station relocations, station observers, and station observation times were superimposed on plots of the station’s observed precipitation. Data for the length of record were analyzed by double mass analysis (Gupta 1995; Kohler 1949). In addition to these tests, data from the five years before and the five years after any alteration to the station observation program were examined, and statistical tests were completed to detect any data shifts (Brooks and Carruthers 1953; Guttman 1998; Peterson et al. 1998; Conrad and Pollak 1950).
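For readers unfamiliar with double mass analysis, the following is a minimal sketch of the general idea rather than the procedure used in this study. Cumulative totals at a station are compared with cumulative totals of a reference series, and a change in slope near a documented station change suggests a discontinuity. The function names and the annual totals are hypothetical.

```python
from itertools import accumulate

def double_mass_curve(station, reference):
    """Pair cumulative reference totals with cumulative station totals."""
    return list(zip(accumulate(reference), accumulate(station)))

def slope_ratio(station, reference, change_index):
    """Ratio of the double-mass slope after a candidate change to the slope before it."""
    slope_before = sum(station[:change_index]) / sum(reference[:change_index])
    slope_after = sum(station[change_index:]) / sum(reference[change_index:])
    return slope_after / slope_before

# Hypothetical annual totals (mm): the station reads 20% low after year 5.
reference = [1500.0] * 10
station = [1500.0] * 5 + [1200.0] * 5
curve = double_mass_curve(station, reference)
ratio = slope_ratio(station, reference, 5)
```

A ratio near 1 indicates a consistent record; here the ratio of about 0.8 reflects the artificial 20% drop built into the example.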
The quality assurance efforts and data discontinuity tests on the database produced several stations that had long-term, near-continuous precipitation records. The spatial distribution of these reference stations was sufficient to document localized precipitation patterns across the study’s geographical area. Data from the reference stations were used to verify the results of the data discontinuity statistical tests to prevent the masking of actual local effects by statistical analysis techniques (Fiebrich and Crawford 2001). Without examination of precipitation patterns observed at these designated reference stations, extreme values in the data record would have been flagged as suspect data by statistical tests and removed from the study erroneously. Tables 4 and 5 present the monthly average rainfall for each of the stations selected for use in the statistical tests.
4. Methodology to define data period for statistical tests
a. Choosing an appropriate observation period
At first glance, the database appears to provide an extensive period of observations beginning in 1836. However, closer inspection reveals periods of missing data as observing programs and stations change over the course of the period of record. To determine the period of record that contains the highest number of observations, two criteria were used: 1) find a period when more than 50% of all stations were reporting, and 2) ensure all available stations are reporting at some point in the period. These screening criteria identified a period consisting of 684 months ending with December 2002. Within the span of 684 months, January 1946–December 2002, there was not a month that had fewer than 33 stations reporting (Fig. 4). One station [Slidell Weather Service Meteorological Observatory (WSMO)] has as few as 170 months of data, whereas Baton Rouge and Oaknolia each have all 684 months of data [month 1321 (January 1946)–month 2004 (December 2002)]. Within these 684 months, most stations have strings of missing data; however, no stations show a regular pattern of missing observations. These strings of missing data appear to be random. This examination did highlight one station, Burrwood, that was removed from the dataset as a result of apparent quality control issues.
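The month numbers quoted above are consistent with counting months from January 1836 as month 1. The sketch below makes that inferred convention explicit and adds a hypothetical helper for counting reporting stations in a given month; the coverage data are illustrative only.

```python
BASE_YEAR = 1836  # inferred: month 1 corresponds to January 1836

def month_index(year, month):
    """Running month number, with January 1836 as month 1."""
    return 12 * (year - BASE_YEAR) + month

def stations_reporting(coverage, index):
    """Count stations whose set of reported month numbers includes `index`."""
    return sum(1 for months in coverage.values() if index in months)

start = month_index(1946, 1)
end = month_index(2002, 12)
n_months = end - start + 1

# Hypothetical coverage: one complete record and one 170-month record.
coverage = {
    "Complete station": set(range(start, end + 1)),
    "Short station": set(range(end - 169, end + 1)),
}
```

With this convention the study window runs from month 1321 to month 2004, a span of 684 months.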
b. Assessing structure in the data
There are two assumptions made concerning structure within these data. The first assumption is that decadal oscillations equally affect all stations as a result of the small spatial extent of the study area. The second assumption is that decadal oscillation effects can be removed by subtracting the ensemble mean from the monthly means of each individual station.
Various methods exist for finding groups in data. Among these are certain eigentechniques [e.g., principal component analysis (PCA; Hotelling 1933) and empirical orthogonal functions (Lorenz 1956)] and various pattern classification schemes [e.g., discriminant analysis (Fisher 1936), support vector machines (Vapnik 1998), neural networks (McCulloch and Pitts 1943), classification and regression trees (Breiman et al. 1984), and cluster analysis (Kaufman and Rousseeuw 1990)]. For this work we chose cluster analysis. Cluster analysis is free of some of the data requirements of the eigentechniques and is somewhat easier to manage given the available data. Among prior works that use clustering to identify stations with similar characteristics are Gong and Richman (1995) and Fovell (1997). Gong and Richman use rotated PCA as a foundation for clustering, whereas Fovell uses a method that includes both precipitation and temperature data. Both works are applied to the entire conterminous United States (CONUS). Here a clustering based only upon precipitation is applied to a small region. All of the caveats noted in Fovell and in Gong and Richman apply here as well.
Cluster analysis requires a dissimilarity metric. The most common choice is the Euclidean distance (the Minkowski L2 norm), and it is also used here. With regard to assembling a dissimilarity metric, there were months within this subset when particular station pairs had missing data. Such nonoverlapping data periods are problematic when computing a dissimilarity metric (such as Euclidean distance), which depends on paired observations. Standard imputation techniques cannot be used because data are not missing at random. This would indicate the need to eliminate about 20% of the stations (approximately 10). This solution was not acceptable because all stations were needed. To remove these particular pairs would minimize the spatial distribution of observations and affect the final conclusions of the spatial extent of potential microclimates.
For some of the stations, data were missing for several years. There are concerns that long absences of data could mask the effects of decadal oscillations in rainfall over the analysis area. Although decadal oscillations are unlikely to create statistically significant rainfall gradients over the small area of the analysis region (200 km² or 124 mi²), if a station’s observations resume in a dry period following a prolonged wet period, the station record will not capture the wet period. The station could be incorrectly grouped with “dry” stations when in fact the station experiences the same rainfall patterns as the wettest stations in the region, but this fact is overlooked only because observations were not taken. To remove any such effects, a mean monthly rainfall for the study area was computed from the 684 months. This ensemble mean monthly rainfall was subtracted from the monthly rainfall reported from each of the stations, yielding rainfall that reflects monthly perturbations.
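The recentering step can be sketched as follows. This is one reading of the description, assuming the ensemble mean is taken across all reporting stations separately for each month, with `None` marking a missing monthly total; the function name and sample values are hypothetical.

```python
def recenter(records):
    """Subtract the per-month ensemble mean (over reporting stations) from each station."""
    n_months = len(next(iter(records.values())))
    ensemble = []
    for m in range(n_months):
        reported = [vals[m] for vals in records.values() if vals[m] is not None]
        ensemble.append(sum(reported) / len(reported) if reported else None)
    return {
        name: [None if v is None else v - ensemble[m] for m, v in enumerate(vals)]
        for name, vals in records.items()
    }

# Hypothetical monthly totals (mm) for two stations over three months.
monthly = {
    "A": [100.0, None, 50.0],
    "B": [140.0, 80.0, 70.0],
}
perturbations = recenter(monthly)
```

Under this reading, a station that reports only during a dry spell is compared with what the rest of the network recorded in those same months, not with its own long-term average.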
Because Euclidean distance is used as the measure of dissimilarity in the clustering technique, imputation that uses the two most similar stations (based on Euclidean distance) is employed. The technique is as follows: First, find the station that suffers the largest number of nonoverlapping pairs and call this station A. Using the Euclidean distance, find the two stations most similar to station A and call these stations B and C. Average the precipitation values for B and C where there is overlap, and insert these values into the missing data of station A. Where there is no overlap, simply use whatever data exists from either B or C and insert it into the appropriate missing values of station A. This is followed by a check to see if there are still station pairs for which there is no overlap. If so, repeat the process for the station that suffers the most nonoverlapping stations. The process continues until all station pairs exhibit some overlap. As a result of this process, missing precipitation values are imputed using the similarity metric that defines the clustering. Data are then recentered (as described above) and fed to the clustering algorithm. This technique has the desirable characteristic of being “blind” to the geographic location of stations used for imputation: physical proximity has no bearing on the stations chosen for imputation and therefore does not explicitly bias the clustering results.
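The imputation loop described above can be sketched in a few lines. This is an illustrative reading, not the authors' code. It assumes missing months are stored as `None`, and it scales the Euclidean distance by the number of shared months so that station pairs with different overlap lengths are comparable; that scaling is an assumption not stated in the text. All names and data are hypothetical.

```python
import math
from itertools import combinations

def shared(a, b):
    """Indices of months in which both stations report."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x is not None and y is not None]

def distance(a, b):
    """Euclidean distance over shared months, scaled by overlap length (assumption)."""
    idx = shared(a, b)
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx) / len(idx))

def impute(records):
    data = {name: list(vals) for name, vals in records.items()}
    while True:
        # Count, for each station, the partners with which it shares no months.
        gaps = {name: 0 for name in data}
        for s, t in combinations(data, 2):
            if not shared(data[s], data[t]):
                gaps[s] += 1
                gaps[t] += 1
        worst = max(gaps, key=gaps.get)  # "station A" in the text
        if gaps[worst] == 0:
            return data  # every station pair now overlaps
        # Rank the stations that do overlap with A and keep the two most similar.
        ranked = sorted(
            (distance(data[worst], data[t]), t)
            for t in data
            if t != worst and shared(data[worst], data[t])
        )
        donors = [t for _, t in ranked[:2]]  # "stations B and C"
        changed = False
        for i, value in enumerate(data[worst]):
            if value is None:
                available = [data[t][i] for t in donors if data[t][i] is not None]
                if available:
                    # Average where both donors report, else use the one that does.
                    data[worst][i] = sum(available) / len(available)
                    changed = True
        if not changed:
            raise ValueError("no donor data available to create overlap")

stations = {
    "A": [1.0, 2.0, None, None],
    "B": [1.0, 2.0, 3.0, None],
    "C": [2.0, 4.0, 5.0, 7.0],
    "D": [None, None, 4.0, 6.0],
}
imputed = impute(stations)
```

In the example, stations A and D initially share no months. A is filled from B and C (the average where both report, C alone otherwise), after which every pair overlaps and the loop stops without touching D.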
Many clustering techniques are available and all have different characteristics. The method chosen for this work is based on the agglomerative coefficient (AC), which is defined as follows: Let d(i) denote the dissimilarity of object i to the first cluster it is merged with, divided by the dissimilarity of the merger in the last step of the algorithm. The AC is defined as the average of all [1 − d(i)]. The clustering method that provides the largest AC is agglomerative nesting (AGNES; Kaufman and Rousseeuw 1990). Using AGNES, the AC for these data is 0.82. Using bootstrap resampling (Efron and Tibshirani 1993), the 95% confidence interval for the AC is 0.80–0.84. To ensure that this AC is not due to random chance, data with no structure are given to the AGNES algorithm. Using 10 000 Monte Carlo trials that each feed 55 data strings of 684 uniformly distributed random values on the interval [0, 1], the 95% confidence interval for the resulting AC is 0.15–0.19. Hence, the structure extracted from the precipitation data is not the result of chance.
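To make the definition of the AC concrete, the following is a small self-contained sketch that runs agglomerative clustering and computes the coefficient as defined above. It assumes average linkage, which is not stated in the text, and it is written for clarity, not efficiency. The function name and the toy data are hypothetical.

```python
import math

def agglomerative_coefficient(points):
    """AC = mean over objects of 1 - (first-merge height / final-merge height)."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = [[i] for i in range(len(points))]
    first_merge = [None] * len(points)
    height = 0.0
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                pairs = [dist(points[i], points[j]) for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        height, a, b = best
        for i in clusters[a] + clusters[b]:
            if first_merge[i] is None:  # only objects merging for the first time
                first_merge[i] = height
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # `height` now holds the dissimilarity of the final merger.
    return sum(1.0 - h / height for h in first_merge) / len(points)

# Two tight, well-separated groups on a line.
ac = agglomerative_coefficient([(0.0,), (1.0,), (10.0,), (11.0,)])
```

In this toy case every point first merges at a height of 1 and the final merger occurs at 10, so the AC is about 0.9. Values near 1 indicate strong cluster structure, which is the sense in which the reported 0.82 is contrasted with the 0.15–0.19 obtained from unstructured random data.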
Although there are some methods by which the number of clusters may be validated, none are generally applicable in all cases. Several methods were investigated and all provided ambiguous guidance. Between two and seven, or possibly eight, clusters appear reasonable based on prior research on precipitation regimes within the analysis area (Kiem et al. 2005), an examination of the dendrogram, the silhouette coefficient, a plot of the number of clusters versus dissimilarity, and finally examination of how cluster structure changes as different numbers of clusters are chosen. However, with more than five clusters, some clusters become too small (with three or fewer members), and the resulting clusters cannot be contained within a convex hull. Hence, five clusters are used in this work (Fig. 5).
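One of the diagnostics named above, the silhouette coefficient, can be sketched as follows. This is an illustrative pure-Python version of the standard average silhouette width, not the routine used in the study; the function names and the toy data are assumptions made here.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_width(rows, labels):
    """Average silhouette width for a given assignment of rows to clusters."""
    n = len(rows)
    total = 0.0
    for i in range(n):
        # Distances from object i to every other object, grouped by cluster.
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(
                    euclid(rows[i], rows[j]))
        own = by_cluster.pop(labels[i], None)
        if own is None or not by_cluster:
            continue  # singleton cluster or a single cluster: s(i) = 0
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())    # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

# Toy data: the natural grouping scores higher than a scrambled one.
rows = [[0.0], [1.0], [10.0], [11.0]]
good = silhouette_width(rows, [0, 0, 1, 1])
bad = silhouette_width(rows, [0, 1, 0, 1])
print(good > bad)  # True
```

Computing this width for each candidate number of clusters gives one of the several curves that can be compared when choosing where to cut the dendrogram. As the paragraph above notes, no single diagnostic settled the choice for these data.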
Figure 6 illustrates the monthly precipitation trends of the five groups. The bar graphs show the variability of monthly rainfall normally observed within a year. With the exception of group 4, the maximum monthly rainfall occurs in July. Group 5 reports the largest July rainfall with 178.44 mm (7.03 in.) observed in a normal year. All station groups show a minimum of rainfall in October, with the lowest average rainfall occurring in group 1 with 78.12 mm (3.08 in.).
Figure 2 shows four NCDC climate zones for southeast Louisiana. Results of the clustering activity indicate that five precipitation microclimates exist within the study area. The statistical process indicates three station groups are present (groups 3–5) within NCDC climate zone 6. Station groups 2 and 4 exist within the boundary of NCDC climate zone 5. Two station groups, 1 and 2, exist within the boundary of NCDC climate zone 9.
The precipitation microclimates identified by the clustering technique appear to have a diagonal orientation, southwest to northeast, rather than the east–west orientation suggested by the NCDC climate zones. Figure 7 shows points along the Mississippi River labeled A, B, C, and D. These locations were chosen as a result of their proximity to group boundaries derived from the clustering technique. The river segment from point A to point B passes through an area of stations comprising group 4, segment B–C passes through group 3, segment C–D passes through group 2, and the segment from point D to the mouth of the river passes through an area of stations comprising group 1. For this same river reach, from point A to the mouth of the river, two NCDC climate zones are considered sufficient to describe the area. For the segment from point A to point C, NCDC climate zone 5 is considered representative. For the river segment from point C to the mouth of the river, NCDC climate zone 9 is considered sufficient. The following section presents several examples of how these differences between the NCDC climate zones and the clusters defined by the statistical methodology influence the depiction of the spatial extent and orientation of precipitation microclimates and how these depictions affect the information conveyed on the annual distribution of rainfall and precipitation extremes.
Failure to accurately depict differences in the distribution of monthly average rainfall and its extremes has a wide spectrum of implications for hydrologic design considerations. Precipitation is a fundamental piece of information for project design criteria and hydraulic structure construction (Henry and Cassidy 1978). The intensity, duration, and frequency of rainfall drive the decision process to define the appropriate criteria for structure capacity and design life. To derive accurate probabilities of average streamflow discharge and discharge extremes, streamflow models must have accurate values for maximum and minimum precipitation observed within a watershed in addition to information on the frequency of these precipitation extremes to determine recurrence intervals. These recurrence intervals and values for extreme streamflows are critical to determining safety criteria for hydraulic structures. Inadequate design of the carrying capacity of culverts and storm water systems, faulty construction processes, and placement of critical structures, including pumping stations in flood-prone areas, increase flood levels and thus a community’s exposure to risk (Ntelekos et al. 2007; Highfield and Brody 2006; Laing 2004; Linsley and Franzini 1972). Construction of hydraulic structures requires accurate high-resolution precipitation information to develop construction schedules and determine optimum placement for effective hazard mitigation projects.
Figure 7 shows the plots of the annual average monthly rainfall for groups 1–4 and their respective river segments to examine the effects of spatial resolution and orientation on the depiction of rainfall extremes and annual distribution. In river segment A–B, Fig. 7 shows that the average monthly rainfall distribution differs markedly from the average monthly rainfall distribution for locations downstream of point D. From northwest to southeast along the river, the range between minimum average monthly rainfall and maximum average monthly rainfall increases. Figure 8 shows the differences in monthly rainfall and maximum monthly rainfall for all five of the station groups. Variability of rainfall is very different between group 5 and group 1. The proximity of the group to the Gulf of Mexico may be one factor. Topography may be another factor. There is significant elevation change, in a relative sense, as the Mississippi River travels to the Gulf of Mexico. In Fig. 7, point A corresponds to the approximate location of the Old River Lock observation station, which sits at 70 feet mean sea level. Traveling downriver, the New Roads observing site, which is noted by point B, sits at 40 feet mean sea level. Point C corresponds to a point just upstream of Carville, whose station elevation is listed in NCDC metadata files as 25 feet mean sea level. Point D is near Reserve, which is listed at 10 feet mean sea level.
The averaging process to recreate the coarse resolution of the NCDC climate zones appears to dampen the rainfall extremes of the individual groups and skews the monthly distribution of rainfall in an average year. This results in significant underestimation of maximum and minimum values and also distorts when these extremes normally occur. For example, group 4 shows maximum rainfall occurs in December and January, with a secondary maximum in March. The neighboring groups 2 and 3 show peak rainfall at the beginning of the hurricane season, in June and July, and increased variability of rainfall on an annual basis.
To quantify the effects of this averaging, the mean monthly rainfall statistics provided by an NCDC climate zone were compared to those of a group defined by the clustering technique. For this comparison, the variability in rainfall from point A to point C was compared to surrogate monthly-mean rainfall values computed for NCDC climate zone 5. The surrogate values were computed using only stations involved in the clustering activity and residing within the boundaries of climate zone 5. The stations used to derive average monthly rainfall for a pseudoclimate zone 5 (PSZ5) included the group 4 stations of Ville Platte, Old River Lock, New Roads, and Melville, the group 2 station of Grand Coteau, and the group 3 station of Cinclare.
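The surrogate-zone comparison amounts to averaging monthly normals across the stations inside a zone and differencing that average against a group mean. A minimal sketch follows, assuming each station is represented by a list of monthly mean rainfall values in millimeters. The function names are introduced here, and the numbers are placeholders chosen for easy arithmetic, not values from the study's database.

```python
def monthly_mean(stations):
    """Average monthly normals across stations.
    stations: dict of station name -> list with one value per month (mm)."""
    return [sum(vals) / len(vals) for vals in zip(*stations.values())]

def underestimate(zone_mean, group_mean):
    """Month-by-month amount by which the zone average falls below the
    group average (positive means the zone underestimates the group)."""
    return [g - z for z, g in zip(zone_mean, group_mean)]

# Placeholder values for three months only (hypothetical, for illustration).
pseudo_zone = {
    "station_1": [100.0, 120.0, 90.0],
    "station_2": [110.0, 140.0, 100.0],
}
group_mean = [120.0, 150.0, 95.0]

zone_mean = monthly_mean(pseudo_zone)
print(zone_mean)                              # [105.0, 130.0, 95.0]
print(underestimate(zone_mean, group_mean))   # [15.0, 20.0, 0.0]
```

With the six PSZ5 stations as the keys of `pseudo_zone` and twelve values per station, the same two calls would produce a month-by-month comparison of the surrogate zone against any one of the cluster groups.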
Figure 9 and Table 6 show the comparison of the monthly means of PSZ5 with the monthly means of groups 2–4. Large differences in the trend and quantity of rainfall exist between PSZ5 and individual stations in group 2 during the hurricane season from June through October. The observing site of Cinclare is defined by NCDC as residing in climate zone 5, although the clustering technique indicated it fell into group 3, whose convex hull predominantly resides in NCDC climate zone 6. For a group 3 station, such as Cinclare, NCDC climate zone 5 underestimates the mean monthly precipitation for all months except November and December. The largest underestimates occur in the tropical season where the surrogate NCDC climate zone 5 underestimates precipitation by 16.57 mm (0.65 in.) in June, 25.51 mm (1.00 in.) in July, 28.25 mm (1.11 in.) in August, 12.61 mm (0.50 in.) in September, and 2.69 mm (0.11 in.) in October.
This analysis shows the effects of the practice of defining NCDC climate zones based on political boundaries and major river systems. The eastern boundary of NCDC climate zone 5 is the Mississippi River. The clustering technique shows evidence that supports redefining the NCDC climate zone boundaries. Cinclare is less than 10 miles from the city center of Baton Rouge and less than 5 miles from the Louisiana State University (LSU) campus. However, it is classified by NCDC as residing in NCDC climate zone 5 because it is on the west bank of the Mississippi River. Baton Rouge is on the east bank of the Mississippi River and therefore falls in NCDC climate zone 6. A classification scheme based on precipitation shows strong evidence that the average precipitation variability in an average year observed at Cinclare would be better represented by NCDC climate zone 6 than NCDC climate zone 5.
Additional evidence for redefining boundaries emerges by examining Grand Coteau, which the clustering technique placed in group 2. To use climate zone 5 values as a proxy for a group 2 station such as Grand Coteau would result in the average monthly rainfall being underestimated by 32.59 mm (1.28 in.) in June, 35.95 mm (1.42 in.) in July, 34.77 mm (1.37 in.) in August, 20.35 mm (0.80 in.) in September, and 5.66 mm (0.22 in.) in October. The southern boundary of NCDC climate zone 5 corresponds to parish lines between St. Landry, St. Martin, Iberville, and West Baton Rouge. The clustering technique shows evidence that supports redefining the NCDC climate zone southern boundary and moving it north to exclude southern portions of St. Landry.
An inaccurate depiction of precipitation patterns has implications for flood flows on major river systems, such as the Mississippi, which are heavily controlled by holding water in reservoirs or diverting flood flows via diversion structures into floodways. If NCDC climate zone 5 precipitation data were considered exclusively for the river segment A–C, a management agency would expect to see a period of maximum precipitation in December and January, a relative minimum in rainfall in February, followed by a secondary maximum in the spring months from March through April in an average year. However, this precipitation information would be flawed. The clustering technique identified two groups, group 3 and group 4, that exist along the river reach from point A to point C. From point B to point C, the maximum rainfall for group 3 occurs in the summer months of July and August, with a secondary maximum in December and January and another upswing in precipitation in March. Additionally, the range between the maximum and minimum monthly precipitation differs considerably between the two groups in both timing and quantity. For group 4, the difference between the monthly maximum precipitation of 146.64 mm (5.77 in.) occurring in January and the minimum monthly precipitation of 92.30 mm (3.63 in.) occurring in October is 54.34 mm (2.14 in.). For group 3, the difference between the monthly maximum precipitation of 165.42 mm (6.51 in.) occurring in July and the minimum monthly precipitation of 90.92 mm (3.58 in.) in October is 74.50 mm (2.93 in.). The distribution of rainfall is also masked because NCDC climate zone 5 underestimates the March and April average rainfall, does not capture the third maximum in March of group 3, and significantly underestimates the group 3 monthly precipitation for July and August by 25.52 mm (1.00 in.) and 28.25 mm (1.11 in.), respectively.
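The annual ranges quoted above follow from simple subtraction and a millimeter-to-inch conversion, which the short check below reproduces from the stated maxima and minima. The helper name `monthly_range` is introduced here for illustration.

```python
MM_PER_INCH = 25.4

def monthly_range(max_mm, min_mm):
    """Return the max-minus-min range as (mm, inches), rounded to 2 decimals."""
    diff = max_mm - min_mm
    return round(diff, 2), round(diff / MM_PER_INCH, 2)

group4 = monthly_range(146.64, 92.30)   # January maximum, October minimum
group3 = monthly_range(165.42, 90.92)   # July maximum, October minimum
print(group4)  # (54.34, 2.14)
print(group3)  # (74.5, 2.93)
```

Both groups share an October minimum of similar size, so the wider range of group 3 comes almost entirely from its larger summer maximum.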
A statistical methodology was developed to detect statistically significant differences in average monthly rainfall observed at stations in southeast Louisiana that collect 24-h rainfall totals to determine if NCDC climate zones accurately captured the spatial extent of rainfall patterns within the region. This methodology employs cluster analysis using Euclidean distance as the measure of dissimilarity for the clustering technique. To resolve missing rainfall observations, an imputation scheme was developed that uses the two most similar stations (based on Euclidean distance) to determine appropriate values for missing rainfall observations. The clustering technique selected identified five distinct clusters (Fig. 5). For the study area, there are four NCDC climate zones that differ in spatial extent and orientation. These differences have large implications on the depiction of rainfall extremes and distribution of rainfall on an annual basis.
To mitigate hydrologic hazards, residents must know what those hazards are, where and when they occur, how often, and the cycles between surplus and deficit. To accurately capture these cycles, long-term precipitation records are required. The long-term precipitation database compiled for this study illustrates the importance of using multiple data sources to discover critical information on precipitation extremes and trends. There is tremendous value in ensuring data archives capture all the historical data available on rainfall extremes in a form that is usable and accessible from a reliable central access point that ensures security of the data from natural hazards. This single activity is paramount to fostering collaboration among researchers and practitioners.
This extensive examination of the spatial characteristics of monthly rainfall derived from daily rainfall observations illustrates two critical needs for hydrologic services and the wide array of practitioners who depend on them. The first critical need is to rescue data and implement quality assurance processes to increase the length and accuracy of data contained in our nation’s precipitation archives. The second critical need is to use these enhanced data archives to increase the spatial resolution of precipitation information to understand precipitation patterns on subregional and watershed scales.
The economic and human losses due to hydrologic extremes within southeast Louisiana and our nation’s coastal zones illustrate the importance of accurate high-resolution precipitation information. This study highlights an existing information gap on hydrologic hazards and precipitation patterns. The comparison of the surrogate values derived for NCDC climate zones 5, 6, and 9 using stations located in the study area shows the need for hydrologic information at resolutions greater than the existing NCDC climate zone system. This is particularly true for coastal watersheds as the nation’s populations migrate to the coastline (U.S. Census Bureau 2007) and are exposed to the risks of freshwater flooding (Rappaport 2000). These new residents need information on the hydrologic extremes of the past. There is tremendous opportunity for new discussions and dialog because these shifting populations will become community leaders and land-use planning chairpersons in the near future within their new environments. Individuals can research the hydrologic events of the past, but they need to know what the effects will be in the future. It is to this end that researchers and practitioners will have to decide if the data provided by NCDC climate zones accurately depict the intensity, duration, and frequency of precipitation within their watershed to support smart sustainable growth within our nation’s coastal zones.
The authors thank the three anonymous reviewers who provided valuable comments on the methodology and content of this paper. We would also like to thank Valliappa Lakshmanan of NOAA’s National Severe Storms Laboratory (NSSL) for his assistance in suggesting appropriate imputation techniques and David Jorgensen and Kevin Kelleher of NOAA’s NSSL for their suggestions on this manuscript.
Corresponding author address: Dr. Suzanne Van Cooten, National Severe Storms Laboratory, 120 David L. Boren Blvd., Norman, OK 73072. Email: email@example.com
* Supplemental information related to this paper is available at the Journals Online Web site: http://dx.doi.org/10.1175/2009JHM1076.s1.