1. Background
Northwestern Mexico, a region strongly affected by the North American monsoon system (NAMS), receives between 60% and 80% of its annual precipitation during the June–September (JJAS) monsoon season (Douglas et al. 1993). Understanding the genesis of warm season rainfall in northwestern Mexico has strong implications for warm season precipitation predictability over the NAMS region, as well as over much of the southern tier of U.S. states (which effectively form the northwestern extremity of the much more pronounced NAMS phenomenon over northwestern Mexico). A key to understanding this predictability is the availability of datasets that support analyses of land–atmosphere interactions. The dataset described in this paper arises from this motivation.
To date, data that would support land–atmosphere feedback studies within the NAMS region, particularly land surface states and fluxes such as soil moisture and turbulent heat fluxes, have been essentially nonexistent. This mostly reflects the absence of direct observations of variables such as soil moisture (a problem that, it must be said, is not specific to the NAMS region). For instance, the existing Climate Prediction Center (CPC) dataset for Mexico (information available online at http://www.cpc.ncep.noaa.gov/products/precip/realtime/retro.shtml) includes only retrospective (1948–2004) and real-time gridded precipitation. It does not include hydrological variables (e.g., soil moisture), which are needed to support land surface feedback analyses (see, e.g., Zhu et al. 2007).
The dataset reported here is patterned on previous work by Maurer et al. (2002) over the North American Land Data Assimilation System (N-LDAS) domain. The Maurer et al. (2002) dataset has been used in over 50 published studies, which speaks to the value of such long-term derived datasets. The Maurer et al. domain includes the conterminous United States and portions of Canada and Mexico, but it extends south only to 25°N. Moreover, the Mexican portion of that domain was largely treated as a "filler" or buffer zone when the dataset was created, because the station data used outside the conterminous United States were not as carefully quality controlled as the U.S. data. Given the importance of the NAMS domain, we describe here a dataset for all of Mexico that is compatible with the Maurer et al. (2002) data.
Generating a long-term daily gridded climatological forcing dataset for land surface models (including precipitation and daily maximum and minimum temperatures) over Mexico has been particularly challenging because of quality control problems and the discontinuity and unavailability of raw station data. Some recently released datasets help to alleviate this problem, however. In 2005, the Servicio Meteorológico Nacional of Mexico (SMN) released a long-term, improved surface station dataset that includes daily precipitation and daily maximum and minimum temperatures covering all of Mexico and extending back to the mid-1920s (a few stations date to the 1900s). In this note, we describe a consistent set of observation-based meteorological surface forcings, together with derived hydrological surface fluxes and state variables, over Mexico from January 1925 to October 2004 at 1/8° spatial resolution and a subdaily (3 h) time step.
2. Gridded dataset development
a. Climate data, quality control, and gridding
Our main source of station data is the recently updated and improved long-term surface station dataset from SMN (obtained courtesy of Ing. A. Gonzalez Serratos 2005, personal communication). It includes daily precipitation and daily maximum and minimum surface air temperatures for around 5000 stations covering all of Mexico from the 1920s to 2004, and it is essentially an updated version of the Extractor Rápido de Información Climatológica (ERIC II) dataset (Quintas 2000). Station density varies considerably over time (Fig. 1a), peaking in the 1970s and 1980s and then decreasing in northern Mexico from the 1990s to the present. Two other data sources partly compensate for the scarcity of precipitation observations in northern Mexico from 1995 onward (Fig. 1b). The first is SMN daily precipitation data for 1995–2003 from around 1000 stations (provided courtesy of Dr. M. Cortez Vázquez, SMN, 2004, personal communication). The second covers northwestern Mexico and comes from the North American Monsoon Experiment (NAME) Event Rain Gage Network (NERN), which provides daily precipitation data for 2002–2005 from 86 stations spanning the Sierra Madre (provided courtesy of D. Gochis, NCAR, 2005, personal communication; Gochis et al. 2003). This network is especially useful because it provides some information about high-elevation precipitation. We are aware of additional station data compiled by S. Aguilar of the Instituto Mexicano de Tecnología del Agua (IMTA) that would fill some of the data gaps in northern Mexico (north of about 25°N), especially before 1960. These data have not yet been formally released; when they are, we intend to use them to augment the gridded data products reported here.
After extracting the raw station data from these three archives, we performed quality control checks to remove implausible values (mostly arising from data entry errors), and statistical analysis to screen out potential outliers and/or erroneous data. The quality control checks included
removal of negative precipitation values;
checking for days on which the maximum temperature was less than or equal to the minimum temperature (in such cases, both temperatures were set to missing);
checking for runs of 10 or more consecutive days with the same value (runs of zero precipitation excepted); in such cases, the repeated values were set to missing; and
checking for daily values that greatly exceeded climatological values, which were removed and replaced with missing values. For example, we removed all values for a given station whose annual mean (temperature) or annual total (precipitation) was more than four standard deviations from the mean, and we applied the same procedure on a monthly basis.
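The four checks above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: it assumes each station record is a Python list of daily values with None marking missing days, and it interprets the four-standard-deviation screen as a comparison of each period's (e.g., year's) mean or total against the distribution of that statistic for the same station. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Series = List[Optional[float]]  # daily values; None marks a missing day


def qc_negative_precip(precip: Series) -> Series:
    """Check 1: negative precipitation is physically impossible, so set it to missing."""
    return [None if (p is not None and p < 0) else p for p in precip]


def qc_tmax_tmin(tmax: Series, tmin: Series) -> Tuple[Series, Series]:
    """Check 2: where Tmax <= Tmin on the same day, set both temperatures to missing."""
    out_max, out_min = list(tmax), list(tmin)
    for i, (hi, lo) in enumerate(zip(tmax, tmin)):
        if hi is not None and lo is not None and hi <= lo:
            out_max[i] = None
            out_min[i] = None
    return out_max, out_min


def qc_repeats(values: Series, min_run: int = 10, is_precip: bool = False) -> Series:
    """Check 3: set runs of min_run or more identical consecutive values to missing.

    Runs of zero precipitation are exempt, because long dry spells are real.
    """
    out = list(values)
    i = 0
    while i < len(values):
        # Find the last index j of the run of identical values starting at i.
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        run_value = values[i]
        exempt = run_value is None or (is_precip and run_value == 0.0)
        if j - i + 1 >= min_run and not exempt:
            for k in range(i, j + 1):
                out[k] = None
        i = j + 1
    return out


def qc_outlier_periods(period_stats: List[float], n_sd: float = 4.0) -> List[bool]:
    """Check 4: flag periods (e.g., years) whose mean or total lies more than
    n_sd standard deviations from the mean of all periods for this station."""
    n = len(period_stats)
    mean = sum(period_stats) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in period_stats) / n)
    return [sd > 0 and abs(x - mean) > n_sd * sd for x in period_stats]
```

A period flagged by `qc_outlier_periods` would then have all of its daily values set to missing; the same screen can be rerun on monthly statistics.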
The quality-controlled station data were then gridded using the Synagraphic Mapping System (SYMAP) method [Shepard (1984); also applied by Maurer et al. (2002)], which estimates the value at each grid cell as an average of all station records in its neighborhood, weighted by the inverse square of the distance from each station to the target grid cell. This produced a long-term gridded daily precipitation and temperature dataset (1925–October 2004) at 1/8° spatial resolution over all of Mexico. For gridding, the domain was roughly divided into three climate regions: northwestern Mexico (north of 22°N, west of 104°W), which is mainly characterized by monsoon precipitation; Mexico east (north of 22°N, east of 104°W); and Mexico south (south of 22°N), which is strongly influenced by tropical cyclones. As with any regional classification, there are inevitably exceptions to these broad categories and mechanisms, and considerable climate variability may be present within each region. The gridding was done region by region so that only stations in the same climate region contributed to a given grid cell, which is especially important for preserving precipitation patterns during periods of sparse gauge coverage (e.g., before 1960). To make maximum use of the station data, given that many stations have discontinuous records, the gridding was done year by year for each region; for each year, only stations with more than 50 days of nonmissing data were used. All other gridding steps are as described in Maurer et al. (2002).
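A much-simplified sketch of the gridding step follows. It assumes (hypothetically) that the "neighborhood" of a grid cell is its `n_nearest` reporting stations in the same climate region, and it uses an equirectangular distance approximation. The full SYMAP algorithm of Shepard (1984) also includes a directional (angular) weighting term, which is omitted here. The region boundaries and the 50-day threshold come from the text above; all names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    lat: float                      # degrees north
    lon: float                      # degrees east (negative over Mexico)
    values: List[Optional[float]]   # one year of daily values; None = missing


def region_of(lat: float, lon: float) -> str:
    """Assign the broad climate region used for gridding (boundaries from the text)."""
    if lat < 22.0:
        return "south"
    return "northwest" if lon < -104.0 else "east"


def usable(station: Station, min_days: int = 50) -> bool:
    """A station is used for a given year only if it has more than min_days valid days."""
    return sum(v is not None for v in station.values) > min_days


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular approximation, adequate at the scale of a station neighborhood."""
    earth_radius_km = 6371.0
    x = math.radians(lon2 - lon1) * math.cos(math.radians(0.5 * (lat1 + lat2)))
    y = math.radians(lat2 - lat1)
    return earth_radius_km * math.hypot(x, y)


def grid_cell_value(cell_lat: float, cell_lon: float, stations: List[Station],
                    day: int, n_nearest: int = 4) -> Optional[float]:
    """Inverse-distance-squared average of the n_nearest usable stations that lie
    in the same climate region as the cell and report a value on `day`."""
    region = region_of(cell_lat, cell_lon)
    candidates = []
    for st in stations:
        if region_of(st.lat, st.lon) != region or not usable(st):
            continue
        value = st.values[day]
        if value is None:
            continue
        d = distance_km(cell_lat, cell_lon, st.lat, st.lon)
        if d == 0.0:
            return value  # station sits exactly on the cell center
        candidates.append((d, value))
    if not candidates:
        return None  # no usable station in this region on this day
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:n_nearest]
    weights = [1.0 / d ** 2 for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```

Restricting candidates by region before weighting is what keeps, for example, a wet eastern station from leaking into a northwestern monsoon cell when nearby northwestern gauges are sparse.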
b. Land surface characteristics
The Variable Infiltration Capacity (VIC) model (Liang et al. 1994, 1996) is a macroscale hydrology model that closes both the surface energy and water balances over a grid mesh. It is designed both for off-line (stand-alone) use, to simulate the water and energy budgets of large areas (e.g., large continental river basins and continents), and for use in coupled land–atmosphere models, to simulate the role of the land surface in partitioning moisture and energy. Land surface characteristics required by the VIC model include soils data, topography, and vegetation characteristics. Soil texture is based on a 5′ Food and Agriculture Organization dataset (FAO 1998). Specific soil characteristics (e.g., field capacity, wilting point, and saturated hydraulic conductivity) were obtained for each soil texture type from the algorithms of Cosby et al. (1984), Rawls et al. (1998), and Reynolds et al. (2000). The VIC model was implemented with a three-layer soil column, and the depths of the second and third layers were adjusted during calibration. The calibration process was as described in Maurer et al. (2002) and specifically targeted 14 long-term stream gauge records, as described in section 3. Parameter adjustments resulting from calibration were applied regionally so that the entire domain was adjusted. Land cover characteristics were derived from the University of Maryland global vegetation classification (Hansen et al. 2000). The leaf area index (LAI) is based on the gridded (1/4°) monthly global LAI database of Myneni et al. (1997). In this study, monthly LAI is specified for each grid cell and each vegetation class, with no interannual variation (Maurer et al. 2002).
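As an illustration of how texture-based soil properties of this kind can be derived, the sketch below uses the multiple-regression equations commonly quoted from Cosby et al. (1984), together with a Campbell-type retention curve evaluated at the conventional 33 kPa (field capacity) and 1500 kPa (wilting point) suctions. The coefficients should be checked against the original paper before use. The text does not specify which of the three cited algorithms supplied which property, so this is one plausible combination, not the procedure used to build the dataset; all names are hypothetical.

```python
from dataclasses import dataclass

CM_PER_KPA = 10.197                     # 1 kPa of suction is about 10.197 cm of water
IN_PER_HR_TO_MM_PER_DAY = 25.4 * 24.0


@dataclass
class SoilHydraulics:
    b: float            # Clapp-Hornberger / Campbell pore-size exponent (-)
    psi_sat_cm: float   # saturated matric suction (cm of water)
    theta_sat: float    # porosity (m3 m-3)
    ksat_mm_day: float  # saturated hydraulic conductivity (mm day-1)


def cosby_1984(sand_pct: float, silt_pct: float, clay_pct: float) -> SoilHydraulics:
    """Multiple-regression pedotransfer equations as commonly quoted from Cosby et al. (1984)."""
    b = 3.10 + 0.157 * clay_pct - 0.003 * sand_pct
    psi_sat_cm = 10 ** (1.54 - 0.0095 * sand_pct + 0.0063 * silt_pct)
    theta_sat = (50.5 - 0.142 * sand_pct - 0.037 * clay_pct) / 100.0
    ksat_in_hr = 10 ** (-0.60 + 0.0126 * sand_pct - 0.0064 * clay_pct)
    return SoilHydraulics(b, psi_sat_cm, theta_sat, ksat_in_hr * IN_PER_HR_TO_MM_PER_DAY)


def water_content_at(soil: SoilHydraulics, suction_kpa: float) -> float:
    """Campbell retention curve: theta = theta_sat * (psi_sat / psi) ** (1 / b)."""
    psi_cm = suction_kpa * CM_PER_KPA
    if psi_cm <= soil.psi_sat_cm:
        return soil.theta_sat  # wetter than air entry: saturated
    return soil.theta_sat * (soil.psi_sat_cm / psi_cm) ** (1.0 / soil.b)


def field_capacity(soil: SoilHydraulics) -> float:
    return water_content_at(soil, 33.0)    # conventional 33 kPa


def wilting_point(soil: SoilHydraulics) -> float:
    return water_content_at(soil, 1500.0)  # conventional 1500 kPa
```

For a loam of 40% sand, 40% silt, and 20% clay, this gives a field capacity near 0.29 and a wilting point near 0.16 m3 m-3, the kind of per-texture values that would then be mapped onto the FAO texture grid.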
The forcings for the VIC model are precipitation, temperature, wind, vapor pressure, downward longwave and shortwave radiation, and air pressure. Of these, only precipitation and temperature are taken directly from observations. Wind fields for 1949–2004 are linearly interpolated from National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis data (Kalnay et al. 1996); before 1949, a daily climatology computed from 1949–2003 is used. The other meteorological and radiative variables are calculated from their relationships with precipitation, daily mean temperature, and the daily temperature range, as described in Maurer et al. (2002).
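The treatment of wind can be sketched as below: a bilinear (two-dimensional linear) interpolation from the four surrounding reanalysis grid points to a 1/8° cell, and, for dates before 1949, a calendar-day climatology averaged over 1949–2003. The sketch assumes a single-cell daily series keyed by `datetime.date`; the function names, the data layout, and the choice of bilinear interpolation are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Tuple


def bilinear(x: float, y: float, x0: float, x1: float, y0: float, y1: float,
             f00: float, f10: float, f01: float, f11: float) -> float:
    """Linearly interpolate in x and y between four surrounding grid values,
    where fij is the value at (x_i, y_j)."""
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    return ((1 - tx) * (1 - ty) * f00 + tx * (1 - ty) * f10
            + (1 - tx) * ty * f01 + tx * ty * f11)


def daily_climatology(wind: Dict[date, float], first_year: int = 1949,
                      last_year: int = 2003) -> Dict[Tuple[int, int], float]:
    """Mean wind for each calendar day (month, day) over first_year..last_year."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for day, value in wind.items():
        if first_year <= day.year <= last_year:
            key = (day.month, day.day)
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def wind_for(day: date, wind: Dict[date, float],
             clim: Dict[Tuple[int, int], float]) -> float:
    """Use reanalysis wind from 1949 onward and the daily climatology before 1949."""
    if day.year >= 1949:
        return wind[day]
    return clim[(day.month, day.day)]
```

Keying the climatology by (month, day) keeps 29 February available for pre-1949 leap years, since the 1949–2003 record contains leap days.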
3. Calibration
Because the VIC model uses a conceptually based representation of runoff generation, several parameters need to be adjusted to produce an acceptable match between model-simulated and observed discharge. This adjustment is important because it also affects the other moisture (evapotranspiration) and energy fluxes. The infiltration parameter (bi) defines the shape of the variable infiltration capacity curve. It describes the available infiltration capacity as a function of the relative saturated area of the grid cell and controls the amount of water that can infiltrate into the soil. In general, a higher value of bi gives lower infiltration and higher surface runoff. The second and third soil layer thicknesses (d2, d3) affect the water available for transpiration and for baseflow; for thicker soil layers, seasonal peak flows (which generally are dominated by baseflow) are reduced and the loss of soil moisture to transpiration increases. Three baseflow parameters determine how quickly the water stored in the third layer is evacuated as baseflow: the maximum baseflow velocity (Dm); the fraction of Dm (Ds) at which nonlinear baseflow begins; and the fraction of the maximum soil moisture content of the third layer (Ws) at which nonlinear baseflow occurs. These five model parameters were targeted for calibration. Matching simulated and observed streamflow through calibration ensures that, over sufficiently long periods, evapotranspiration is realistically estimated: the change in storage is small relative to the accumulated precipitation and runoff, so long-term evapotranspiration is approximately precipitation minus runoff. On this basis, and given the physically based model parameterizations of soil moisture and energy fluxes, the other surface flux and state variables, such as soil moisture, should represent observations reasonably well, at least in the aggregate; Maurer et al. (2002) and Nijssen et al. (2001) have shown this to be the case over the continental United States and Eurasia, respectively.
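The roles of bi and of the three baseflow parameters can be illustrated with the formulations published by Liang et al. (1994). The sketch below implements (i) the saturated fraction of a grid cell implied by the variable infiltration capacity curve, A_s = 1 - (1 - W/W_c)^(bi/(1 + bi)), where W is the upper-layer soil moisture and W_c its maximum, and (ii) the ARNO-type baseflow curve, which is linear in third-layer moisture up to Ws times its maximum and adds a quadratic term above it. It is a didactic sketch, not VIC source code; the units (mm, mm/day) and the example parameter values are hypothetical.

```python
def saturated_fraction(w: float, w_max: float, b_i: float) -> float:
    """Fraction of the grid cell that is saturated, from the variable infiltration
    capacity curve i = i_m [1 - (1 - A)^(1/b_i)] (Liang et al. 1994).
    Larger b_i gives a larger saturated area (more surface runoff) at the same w."""
    rel = min(max(w / w_max, 0.0), 1.0)
    return 1.0 - (1.0 - rel) ** (b_i / (1.0 + b_i))


def arno_baseflow(w3: float, w3_max: float, d_m: float, d_s: float, w_s: float) -> float:
    """ARNO-type baseflow from the third soil layer (same units as d_m, e.g. mm/day).

    Linear up to w_s * w3_max, where it reaches d_s * d_m, with an added quadratic
    term above that threshold so that baseflow equals d_m when the layer is full.
    """
    threshold = w_s * w3_max
    linear = d_s * d_m / threshold * w3
    if w3 <= threshold:
        return linear
    excess = (w3 - threshold) / (w3_max - threshold)
    return linear + (d_m - d_s * d_m / w_s) * excess ** 2


# Hypothetical parameters: Dm = 10 mm/day, Ds = 0.1, Ws = 0.8, W3max = 300 mm.
for w3 in (120.0, 240.0, 270.0, 300.0):
    print(w3, round(arno_baseflow(w3, 300.0, 10.0, 0.1, 0.8), 3))
```

Raising Ds or lowering Ws shifts more of the recession into the slow linear regime, which is why these parameters control how quickly third-layer storage drains.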
The streamflow dataset we used is the Banco Nacional de Datos de Aguas Superficiales (BANDAS), a product of the Comisión Nacional del Agua (CNA) and IMTA. Because nearly all of the larger river basins represented in BANDAS are regulated, we selected 14 comparatively small basins (less than 10 000 km²) with long-term records spanning the 1970s–1990s (a period of relatively high precipitation gauge density). These gauges are distributed fairly evenly across Mexico (Fig. 2 and Table 1). The basins represent different climate zones (according to Thornthwaite's precipitation effectiveness index; information online at http://www.gisdevelopment.net/magazine/years/2005/dec/40_2.htm), with five (basins 2, 3, 4, 5, 6) in the NAMS core region, five (basins 1, 2, 3, 4, 7) in semiarid areas, seven (basins 5, 6, 8, 9, 10, 12, 13) in dry subhumid areas, and two (basins 11 and 14) in humid areas. This distribution provides a basis for transferring the parameters over the entire region. Calibration was performed by trial and error and focused on matching the total annual flow volume and the shape of the mean monthly hydrographs.
Table 2 shows the calibrated parameter values, together with the Nash–Sutcliffe model efficiency (Ef) and bias for each basin at a monthly time step. The model parameters vary widely among basins. There appear to be some empirical relationships between the estimated parameters and climate conditions; for example, the two humid basins (Poza Rica and Las Perlas) have much higher bi than the semiarid and dry subhumid basins. Of the 14 basins, 11 have Ef exceeding 0.5. Three semiarid basins (Casas Grandes, Villalba Conchos, and Cazadero II) have the lowest model efficiency. In general, VIC captured the timing of peaks and the temporal pattern of streamflow well in both arid and wet regions (Fig. 3). The most obvious problem is substantial overestimation or underestimation of peak flows in some years, especially for arid basins. It is not entirely clear whether errors in the input precipitation, deficiencies in the model parameterization, or both are responsible for the simulation bias. However, we argue that the problems are at least in substantial part related to errors in the meteorological forcings, especially in the dry regions of northwestern Mexico, where gauge density is sparse (Fig. 1). The 20-yr mean monthly hydrographs for the 14 calibrated basins (not shown here) show that the mean annual cycle matches observations quite well in most cases, with all biases less than 25% (Table 2).
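For reference, the two skill measures reported in Table 2 can be computed as below. The Nash–Sutcliffe efficiency is the standard definition; the bias is assumed here to be the relative difference in total (equivalently, mean) flow expressed in percent, since the exact bias definition is not stated. Monthly observed and simulated flows are passed as equal-length sequences.

```python
from typing import Sequence


def nash_sutcliffe(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Ef = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    Ef = 1 is a perfect match; Ef <= 0 means the simulation is no better
    than simply using the observed mean.
    """
    if len(obs) != len(sim):
        raise ValueError("obs and sim must have the same length")
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance_term = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / variance_term


def percent_bias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Relative volume error in percent (assumed definition):
    100 * (sum(sim) - sum(obs)) / sum(obs)."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)
```

Because Ef penalizes squared errors, a few badly missed peak-flow years can pull it below 0.5 even when the mean annual cycle and the volume bias look good, which is the pattern described above for the arid basins.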
Parameters for uncalibrated basins were transferred from the nearest calibrated basin. Parameters in basin boundary cells were then smoothed by arithmetic averaging over the four neighboring cells.
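The transfer-and-smoothing step can be sketched on a small grid. Two details are assumptions: "nearest" is measured here from each cell to a representative (row, col) location of each calibrated basin, and the boundary average includes the cell itself together with its four edge-adjacent neighbors. A cell counts as a boundary cell if any of those neighbors received parameters from a different calibrated basin. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # four edge-adjacent cells


def assign_parameters(locations: Dict[str, Tuple[int, int]], params: Dict[str, float],
                      nrows: int, ncols: int) -> Tuple[Grid, List[List[str]]]:
    """Give every cell the calibrated parameter of the nearest calibrated basin,
    and record which basin donated it."""
    grid: Grid = []
    donor: List[List[str]] = []
    for r in range(nrows):
        grid_row: List[float] = []
        donor_row: List[str] = []
        for c in range(ncols):
            basin = min(locations, key=lambda b: (locations[b][0] - r) ** 2
                        + (locations[b][1] - c) ** 2)
            grid_row.append(params[basin])
            donor_row.append(basin)
        grid.append(grid_row)
        donor.append(donor_row)
    return grid, donor


def smooth_boundaries(grid: Grid, donor: List[List[str]]) -> Grid:
    """Replace values in cells on a donor-basin boundary by the arithmetic mean of
    the cell and its in-domain four neighbors (computed from the unsmoothed grid)."""
    nrows, ncols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [(r + dr, c + dc) for dr, dc in NEIGHBORS
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            if any(donor[rr][cc] != donor[r][c] for rr, cc in nbrs):
                values = [grid[r][c]] + [grid[rr][cc] for rr, cc in nbrs]
                out[r][c] = sum(values) / len(values)
    return out
```

Averaging from the unsmoothed grid, rather than updating in place, keeps the result independent of the order in which cells are visited.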
4. Comparison of surface fluxes with the North American Regional Reanalysis
Because there are essentially no energy flux observations over Mexico suitable for evaluating the derived values, we compared monthly gridded fields that were either used to drive VIC or derived from the model with those produced by the North American Regional Reanalysis [NARR; information online at http://dss.ucar.edu/pub/narr/; Mesinger et al. (2006)] for the 21-yr period of overlap (1979–99). In particular, we compared downward shortwave radiation (Fig. 4), net radiation (Fig. 5), latent heat (Fig. 6), and sensible heat (not shown here). Note that the downward solar radiation used in the VIC simulations comes from a relationship with the daily temperature range described by Thornton and Running (1999), whereas in NARR, solar radiation (like all other variables) comes from analysis fields that result from the NCEP Eta Model with a data assimilation system. The Eta Model uses the NCEP–Oregon State University–U.S. Air Force–NWS/Hydrologic Research Lab (Noah) land surface scheme, which is similar in general character to VIC but differs in many specifics.
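A minimal sketch of this comparison for one grid cell: monthly values are averaged by season over 1979–99 and differenced (VIC minus NARR), so negative values mean VIC is lower. The dictionary keyed by (year, month) is a hypothetical convenience; for full fields the same operation would be applied cell by cell.

```python
from typing import Dict, Tuple

Monthly = Dict[Tuple[int, int], float]  # (year, month) -> monthly mean, e.g. W m-2

SEASONS = {"JFM": (1, 2, 3), "AMJ": (4, 5, 6), "JAS": (7, 8, 9), "OND": (10, 11, 12)}


def seasonal_mean(monthly: Monthly, season: str, first_year: int = 1979,
                  last_year: int = 1999) -> float:
    """Mean of all available months of `season` within first_year..last_year."""
    values = [monthly[(y, m)] for y in range(first_year, last_year + 1)
              for m in SEASONS[season] if (y, m) in monthly]
    if not values:
        raise ValueError(f"no data for {season} in {first_year}-{last_year}")
    return sum(values) / len(values)


def seasonal_difference(vic: Monthly, narr: Monthly, season: str) -> float:
    """VIC minus NARR seasonal mean over the 1979-99 overlap period."""
    return seasonal_mean(vic, season) - seasonal_mean(narr, season)
```

Restricting both datasets to the same overlap years avoids mixing interannual variability into the VIC-minus-NARR differences.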
Compared with NARR, the downward solar radiation used to force VIC is typically around 20 W m−2 lower over most of Mexico in January–March (JFM), April–June (AMJ), and October–December (OND), with the largest negative difference (∼60 W m−2) occurring in northwestern Mexico in summer (July–September; JAS). In previous (unpublished) work comparing the Thornton and Running (1999) algorithm with high-quality observations from the Surface Radiation Budget Network (SURFRAD; information online at http://www.srrb.noaa.gov/surfrad/), which monitors surface radiation in the continental United States, we found that the Thornton and Running estimates of daily average radiation in summer were slightly biased low (typical deviations less than 10 W m−2). We suspect that the larger differences shown here result both from some bias in the temperature range algorithm and from underestimation of atmospheric attenuation in the Eta Model, perhaps owing to biases in simulated cloud cover and/or water vapor in the coupled model. Figure 5 shows differences in net radiation (a derived variable in both VIC and NARR). The largest differences between VIC and NARR occur in spring (AMJ) and winter (JFM), mostly over the Sierra Madre, where VIC values are around 40–60 W m−2 larger than those of NARR. In summer (JAS) and autumn (OND), VIC net radiation is larger by around 20 W m−2 over the Sierra Madre but smaller by about the same magnitude over parts of northwestern and southern Mexico. Given the lower downward solar radiation used to drive VIC, the generally higher VIC net radiation is somewhat surprising; however, there are many possible sources of difference, including albedo, lower computed surface temperatures, downward longwave radiation (VIC uses a temperature-based computation, whereas the Eta Model performs a radiative transfer computation), and emissivity.
In addition, although the downward shortwave radiation produced by the Thornton and Running (1999) temperature-index method is lower than the Eta surface solar radiation in NARR, an exploratory investigation (not shown here) indicates that VIC reflected solar radiation is also lower than that of NARR because of generally lower VIC albedo values, so that net solar radiation is larger for VIC than for NARR. This accounts in substantial part for the generally higher net radiation in VIC.
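The argument can be made concrete with the surface radiation budget Rn = (1 - α)S↓ + εL↓ - εσTs⁴. The numbers in the example below are purely hypothetical and are not values from VIC or NARR; they only show that a 20 W m−2 lower S↓ can still yield larger net shortwave radiation when the albedo α is sufficiently lower, and that a cooler surface raises Rn through reduced longwave emission.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W m-2 K-4


def net_shortwave(sw_down: float, albedo: float) -> float:
    """Absorbed solar radiation, (1 - albedo) * SW_down, in W m-2."""
    return (1.0 - albedo) * sw_down


def net_radiation(sw_down: float, albedo: float, lw_down: float,
                  emissivity: float, t_surface_k: float) -> float:
    """Rn = (1 - albedo) SW_down + emissivity * LW_down - emissivity * sigma * Ts^4."""
    return (net_shortwave(sw_down, albedo) + emissivity * lw_down
            - emissivity * STEFAN_BOLTZMANN * t_surface_k ** 4)


# Hypothetical illustration only (not values from either dataset):
narr_like = net_shortwave(sw_down=250.0, albedo=0.25)  # 187.5 W m-2
vic_like = net_shortwave(sw_down=230.0, albedo=0.15)   # about 195.5 W m-2
print(f"net shortwave difference: {vic_like - narr_like:+.1f} W m-2")  # prints +8.0
```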
Among the four surface flux variables evaluated, latent heat exhibits the most consistent pattern between VIC and NARR (Fig. 6). Because VIC-simulated evaporation should be plausible (at least on an annual basis) given the calibration described in section 3, this agreement suggests that NARR evaporation is also reasonably represented. This is consistent with Korolevich et al.'s (2005) comparison of NARR evaporation with observations in Canada. To close the energy budget, the excess VIC net radiation must go to sensible and ground heat. The sensible heat differences (not shown) indicate consistently larger sensible heat for VIC than for NARR, especially over the Sierra Madre in JFM and AMJ. The negative net radiation differences (Fig. 5) are amplified in sensible heat, especially in AMJ and JAS along the west coast and in southern Mexico, growing in magnitude from about 20 W m−2 to 40–60 W m−2.
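In equation form, the energy budget invoked here is the standard surface balance, with R_n the net radiation, λE the latent heat flux, H the sensible heat flux, and G the ground heat flux. Writing Δ for the VIC minus NARR difference of seasonal means, and neglecting storage terms and any analysis residual in the reanalysis budget (a simplifying assumption), gives:

```latex
\begin{aligned}
R_n &= \lambda E + H + G \\
\Delta R_n &= \Delta(\lambda E) + \Delta H + \Delta G \\
\Delta R_n &\approx \Delta H + \Delta G \qquad \text{where } \Delta(\lambda E) \approx 0
\end{aligned}
```

Where latent heat agrees closely between the two datasets, any net radiation difference must therefore appear in the sensible and/or ground heat fluxes.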
5. Data availability and archiving
The forcing datasets and derived surface fluxes and state variables are archived in Network Common Data Form (netCDF) format, comparable to the dataset described by Maurer et al. (2002). The 3-hourly, daily, and monthly summaries of model forcing variables and derived variables (listed in Table 3) are available from the authors via FTP (www.hydro.washington.edu). In the 3-hourly output, precipitation, evaporation, surface runoff, and baseflow are accumulated over the preceding 3 h; soil moisture and snow water equivalent are reported at the end of the time step; and all other variables are averages over the time step. In the daily and monthly summaries, the first four variables are likewise accumulated over the day or month. In the daily summaries, the other fluxes and state variables are averages of the eight 3-hourly values reported during that day.
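The aggregation conventions above can be sketched for a single grid cell. The variable names are hypothetical labels, not the names used in the netCDF files (Table 3 lists the archived variables): the four accumulated variables are summed, and everything else is averaged over the eight 3-hourly values of each day.

```python
from typing import Dict, List

# The four variables that are accumulated rather than averaged (hypothetical names).
ACCUMULATED = {"precipitation", "evaporation", "surface_runoff", "baseflow"}
STEPS_PER_DAY = 8  # 24 h / 3 h


def daily_summary(three_hourly: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Sum accumulated variables and average all other variables over each day."""
    daily: Dict[str, List[float]] = {}
    for name, series in three_hourly.items():
        if len(series) % STEPS_PER_DAY != 0:
            raise ValueError(f"{name}: series must contain whole days")
        days = [series[i:i + STEPS_PER_DAY] for i in range(0, len(series), STEPS_PER_DAY)]
        if name in ACCUMULATED:
            daily[name] = [sum(day) for day in days]
        else:
            daily[name] = [sum(day) / STEPS_PER_DAY for day in days]
    return daily


def monthly_total(daily_values: List[float]) -> float:
    """Monthly value of an accumulated variable: the sum of its daily totals."""
    return sum(daily_values)
```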
6. Conclusions
We have described a long-term derived dataset of land surface states and fluxes over all of Mexico, spanning January 1925–October 2004 at 1/8° spatial resolution and a subdaily (3 h) time step. The dataset is intended to support long-term land–atmosphere feedback analyses that require spatially distributed values of hydrologic variables such as soil moisture. The simulated runoff included in the dataset matches observations reasonably well in most of the 14 small river basins spanning Mexico, which suggests that long-term mean evapotranspiration is realistically reproduced. On this basis, and given the physically based parameterizations in the model, the other components of the surface water balance, such as soil moisture, should also be represented reasonably well over shorter time scales. In general, the VIC-derived surface fluxes show spatial patterns consistent with NARR on a seasonal-mean basis, but with some differences in magnitude, especially for downward shortwave radiation and net radiation. These validation results provide confidence that the dataset can be used for a variety of purposes, such as (a) creating more realistic initial conditions for weather and climate forecasting and simulation models, (b) model diagnostic studies, (c) climate change and trend analyses of simulated hydrological variables, and (d) evaluating the role of land surface–atmosphere interactions in the NAMS region.
Acknowledgments
This publication was funded by the NOAA Office of Global Programs under Cooperative Agreement NA03OAR4310062 with the University of Washington. The authors appreciate the assistance of Tereza Cavazos (CICESE, Mexico), Miguel Cortez Vázquez and Alejandro Gonzalez Serratos (SMN), and David Gochis (NCAR) in providing access to meteorological station data.
REFERENCES
Cosby, B. J., G. M. Hornberger, R. B. Clapp, and T. R. Ginn, 1984: A statistical exploration of the relationships of soil moisture characteristics to the physical properties of soils. Water Resour. Res., 20, 682–690.
Douglas, M. W., R. A. Maddox, and K. Howard, 1993: The Mexican monsoon. J. Climate, 6, 1665–1677.
FAO, 1998: Digital Soil Map of the World and Derived Soil Properties. Land and Water Digital Media Series 1, Food and Agriculture Organization, CD-ROM.
Gochis, D. J., J. C. Leal, W. J. Shuttleworth, C. J. Watts, and J. Garatuza-Payan, 2003: Preliminary diagnostics from a new event-based precipitation monitoring system in support of the North American Monsoon Experiment. J. Hydrometeor., 4, 974–981.
Hansen, M. C., R. S. DeFries, J. R. G. Townshend, and R. Sohlberg, 2000: Global land cover classification at 1 km spatial resolution using a classification tree approach. Int. J. Remote Sens., 21, 1331–1364.
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.
Korolevich, V., R. Fernandes, S. Wang, A. Simic, F. Gong, and P. Zelic, cited 2005: Assessment of reanalysis forcings for use within Canada-wide water budgets: Part 2. [Available online at http://wwwt.emc.ncep.noaa.gov/mmb/rreanl/korolevich2.ppt.]
Liang, X., D. P. Lettenmaier, E. Wood, and S. J. Burges, 1994: A simple hydrologically based model of land surface and energy fluxes for general circulation models. J. Geophys. Res., 99, 14415–14428.
Liang, X., D. P. Lettenmaier, and E. Wood, 1996: One-dimensional statistical dynamic representation of subgrid spatial variability of precipitation in the two-layer variable infiltration capacity model. J. Geophys. Res., 101, 21403–21422.
Maurer, E. P., A. W. Wood, J. C. Adam, D. P. Lettenmaier, and B. Nijssen, 2002: A long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States. J. Climate, 15, 3237–3251.
Mesinger, F., and Coauthors, 2006: North American Regional Reanalysis. Bull. Amer. Meteor. Soc., 87, 343–360.
Myneni, R. B., R. R. Nemani, and S. W. Running, 1997: Estimation of global leaf area index and absorbed PAR using radiative transfer model. IEEE Trans. Geosci. Remote Sens., 35, 1380–1393.
Nijssen, B., R. Schnur, and D. P. Lettenmaier, 2001: Global retrospective estimation of soil moisture using the Variable Infiltration Capacity land surface model, 1980–93. J. Climate, 14, 1790–1808.
Quintas, I., 2000: ERIC II: Documentación de la base de datos climatológica y del programa extractor (ERIC II: Documentation of the climatologic database and data extraction program). Instituto Mexicano de Tecnología del Agua (IMTA), 54 pp.
Rawls, W. J., D. Gimenez, and R. Grossman, 1998: Use of soil texture, bulk density, and slope of water retention curve to predict saturated hydraulic conductivity. Trans. Amer. Soc. Agric. Eng., 41, 983–988.
Reynolds, C. A., T. J. Jackson, and W. J. Rawls, 2000: Estimating soil water-holding capacities by linking the Food and Agriculture Organization soil map of the world with global pedon databases and continuous pedotransfer functions. Water Resour. Res., 36, 3653–3662.
Shepard, D. S., 1984: Computer mapping: The SYMAP interpolation algorithm. Spatial Statistics and Models, G. L. Gaile and C. J. Willmott, Eds., D. Reidel, 133–145.
Thornton, P. E., and S. W. Running, 1999: An improved algorithm for estimating incident daily solar radiation from measurements of temperature, humidity, and precipitation. Agric. For. Meteor., 93, 211–228.
Zhu, C., T. Cavazos, and D. P. Lettenmaier, 2007: Role of antecedent land surface conditions in warm season precipitation over northwestern Mexico. J. Climate, 20, 1774–1791.
Table 1. Selected river basins for calibration and their climate conditions.
Table 2. Calibrated soil parameters and statistics.
Table 3. Variables included in data archive.