Abstract

Global land cover data are widely used in weather, climate, and hydrometeorological models. The Collection 5.1 Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Type (MCD12Q1) product is found to have a substantial amount of interannual variability, with 40% of land pixels showing land cover change one or more times during 2001–10. This affects the global distribution of vegetation if any one year or many years of data are used, for example, to parameterize land processes in regional and global models. In this paper, a value-added global 0.5-km land cover climatology (a single representative map for 2001–10) is developed by weighting each land cover type by its corresponding confidence score for each year and using the highest-weighted land cover type in each pixel in the 2001–10 MODIS data. The climatology is validated by comparing it with the System for Terrestrial Ecosystem Parameterization database as well as additional pixels that are identified from the Google Earth proprietary software database. When compared with the data of any individual year, this climatology does not substantially alter the overall global frequencies of most land cover classes but does affect the global distribution of many land cover classes. In addition, it is validated as well as or better than the MODIS data for individual years. Also, it is based on higher-quality data and is validated better than the Global Land Cover Characteristics database, which is based on 1 year of Advanced Very High Resolution Radiometer data and represents a widely used first-generation global product.

1. Introduction

Global land cover information is important for regional and global models because plant functional types (PFTs) strongly influence the land–atmosphere exchanges of water, energy, and carbon (e.g., Dickinson et al. 1986; Sellers et al. 1996; Bonan 1996). In particular, having an accurate baseline of land cover distribution is essential for modeling land surface processes, which affect many aspects of Earth’s climate system (Bonan et al. 2002). Land cover products are used in land surface models to specify model parameters [e.g., land surface roughness length (DeFries and Townshend 1994) or vegetation root distribution (Zeng 2001)], to derive other quantities that are used directly as model inputs (e.g., fractional vegetation cover; Zeng et al. 2000), and to infer land cover changes that might affect Earth’s climate (Lawrence and Chase 2010).

Land cover data for global models are typically derived from moderate-resolution (0.5–1 km) land cover datasets that are based on satellite imagery. Further, many global land models use vegetation-type data derived from one year of satellite imagery. In particular, the Global Land Cover Characterization (GLCC) dataset (Loveland et al. 2000), which is based on one year of Advanced Very High Resolution Radiometer (AVHRR) land cover data (from April 1992 to March 1993), is used in the Goddard Earth Observing System Model, version 5 (GEOS-5), for the Modern-Era Retrospective Analysis for Research and Applications (MERRA) reanalysis (Rienecker et al. 2011), in the global weather forecasting model at the European Centre for Medium-Range Weather Forecasts (ECMWF) (ECMWF 2013), and in the “Noah” land surface model (Chen and Dudhia 2001; Ek et al. 2003) as implemented in the Global Forecast System (GFS) model and the Weather Research and Forecasting (WRF) regional model. In addition, the PFTs and other land surface properties in the Community Land Model (CLM) are largely determined from land products that use April 1992–March 1993 AVHRR data and 2001 MODIS data (Lawrence and Chase 2007; DeFries et al. 2000; Ramankutty and Foley 1999; Friedl et al. 2002).

The use of a single year of land cover data can be problematic not so much because of actual land cover changes but because of the low accuracies of the land cover datasets. There are significant disagreements between the various moderate-resolution land cover datasets (e.g., Hansen and Reed 2000; Giri et al. 2005; McCallum et al. 2006), and there can even be disagreement within a single dataset spanning multiple years. For example, the Collection 5.1 MODIS Land Cover Type (MCD12Q1) dataset, which includes vegetation-type maps for multiple years, has considerable interannual variability that is far greater than the amount of actual vegetation change (e.g., Friedl et al. 2010; Liang and Gong 2010). This is despite the steps taken to eliminate some of the interannual variability in the product’s creation.

Given these disagreements, it is natural to wonder whether these annual data can be condensed to create a more accurate product. There have been attempts at harmonizing various vegetation products in the past, but these efforts are often hampered because there are many factors that need to be considered when comparing vegetation products from different sources (differences in legends, classification algorithms, sensors, etc.; Herold et al. 2006, 2008). In this study, we take a simpler approach by developing a consistent land cover “climatology” (defined here as a single representative global land cover map for 2001–10) that is based on 10 years of MCD12Q1 data. Since we focus only on these MODIS data, we do not have to handle complexities such as inconsistent legends, spatial resolutions, or classifiers.

To assess the potential benefits of using the land cover climatology we pose two questions: 1) How is the land cover climatology different from land cover data for individual years? 2) Is the land cover climatology any more accurate than the land cover data from individual years? In section 2, we describe the MCD12Q1 product and its interannual variability. We discuss the method for developing our land cover climatology in section 3. Our value-added product is evaluated using two validation datasets and is compared with the original MODIS annual data and the GLCC data in section 4. Concluding remarks are provided in section 5.

2. MODIS data description and data analysis

a. Description of Collection 5.1 MCD12Q1 data

The Collection 5.1 MCD12Q1 dataset contains vegetation-type information for each year from 2001 to 2010 at an ~500-m pixel resolution. The MODIS land cover type (MLCT) classification algorithm uses a supervised decision-tree algorithm (Quinlan 1993), which is distribution independent and contains robust procedures for handling missing data. The algorithm also relies on a technique to improve classification accuracies by combining a group of weaker learners into a single stronger learner by voting on the basis of the accuracies of the learners, known as boosting (Schapire 1990; Freund and Schapire 1997). When boosted, a classification algorithm is applied to reweighted versions of the training data (in which misclassified data are given a higher weight), and then a weighted majority vote is taken of all of the produced classifiers, resulting in dramatic improvements in performance for many classification algorithms. Boosting has been shown to be a form of additive logistic regression, thus giving probabilities of class membership for each class (Friedman et al. 2000). In this study, we use the most likely International Geosphere–Biosphere Programme (IGBP; Townshend 1992) class label at each pixel for each year from 2001 to 2010, as well as the probability of class membership (which we call the confidence score) associated with the class label.

Details of the MCD12Q1 product can be found in Friedl et al. (2010), but a few important aspects are given here. The MLCT algorithm uses a database of 2095 globally distributed sites, called the System for Terrestrial Ecosystem Parameterization (STEP) database (Muchoney et al. 1999). The STEP database is constantly updated as newer data, such as those from the Google Earth proprietary software and database tool, become available and as various weaknesses of the database are identified and addressed (Friedl et al. 2010). In this study, we use a newer version of the STEP database [which contains some modifications to the version-5 (V5.1) STEP database that is used to train the Collection 5.1 MCD12Q1 product] for validation of our new method.

Input features for the MLCT algorithm include spectral and temporal information from MODIS bands 1–7, as well as MODIS-derived estimates of the enhanced vegetation index (EVI; Huete et al. 2002) and land surface temperature (Wan et al. 2002). Reflectance data for bands 1–7, as well as those used to compute EVI, are normalized to a consistent nadir-view geometry on the basis of the Bidirectional Reflectance Distribution Function models of surface anisotropy (MCD43A4; Schaaf et al. 2002). These data are produced on a rolling 8-day interval that is based on 16 days of MODIS data (from both the Terra and Aqua satellites), although the MLCT algorithm aggregates these data into 32-day averages to reduce data volume, and these data are used to generate annual land cover maps. There are several postclassification steps to correct for biases of the decision-tree classification [described in detail in Friedl et al. (2010), their sections 4 and 5]. Furthermore, to reduce interannual variability of the classification, the class label is changed between years only if a new label has a higher confidence score than the previous label for the preceding two years.

b. Interannual variability of MODIS land cover data

Despite efforts to reduce the interannual variability in the 2001–10 MCD12Q1 product, there is still an unreasonable amount of interannual variability relative to actual land cover changes. For example, about 40% of all nonwater pixels undergo at least one change between 2001 and 2010 (Table 1). Changes in land cover type are abundant on all continents and tend to occur frequently in areas that do not have an easily recognizable and homogeneous land cover type (e.g., the Sahel region in Africa or the boreal forests in Eurasia and North America; Fig. 1a). In fact, only the classes labeled as urban, snow/ice, barren, and evergreen broadleaf have a single land cover type for more than 80% of pixels during the 10-yr record. In all of the remaining types, 30% or more of their pixels have two land cover types, and in all but the type labeled as open shrubland, 10% or more of their pixels have three or more land cover types between 2001 and 2010. In the most unstable types (deciduous needleleaf, permanent wetlands, and closed shrublands), about one-half of the pixels have three or more land cover types during the 10-yr period.

Table 1.

IGBP land cover (LC) classes ordered by stability (defined by the percentage of “no change” in the first column) of each nonwater class. Data columns 1–5 show the percentages of each class that undergo a specified amount of change from 2001 to 2010. Data column 6 shows the average confidence scores for each class from the 2001–10 MCD12Q1 data, data column 7 shows the average global abundances of each class (as a percentage of all nonwater pixels) for the 2001–10 MCD12Q1 data, data column 8 shows the average global abundances of each class for our LC climatology, and data column 9 shows the relative difference between column 7 and column 8. The categories in columns 1–5 are determined from our LC type climatology at the native MODIS resolution of 0.5 km, and the sum of columns 2 and 3 is the same as that of columns 4 and 5. Toggles (columns 4 and 5) denote switches between different land cover types in subsequent years. Percent difference (column 9) is computed as 100% × (column 8 − column 7)/column 7.

Fig. 1.

Global distributions of (a) the number of vegetation types and (b) the number of toggles (or switches) between vegetation types from 2001 to 2010. To reduce data volume, the original data on an ~0.5-km sinusoidal grid are projected to geographic coordinates at a 0.1° resolution from 60°S to 80°N.

Areas with two or more land cover types also have a large number of toggles (switches between different land cover types in subsequent years) in the 2001–10 data (Table 1; Fig. 1b). As one might predict, pixels with more vegetation types in the period have a larger number of toggles, but the number of toggles is typically higher than the number of land cover types, suggesting that in many areas the labels repeatedly toggle back and forth between the same two land cover types. The number of toggles, however, can also be fewer than the number of land cover types if there is only one change in the 10-yr record (corresponding to two land cover types). Except for the four relatively stable land cover classes mentioned before (urban, snow/ice, barren, and evergreen broadleaf), more than 30% of pixels in the remaining land cover types have two or more toggles. In the least stable land cover classes (deciduous needleleaf, permanent wetlands, and closed shrublands), 65.3%–80.6% of all of the pixels undergo two or more toggles. Over the globe, about 30% of all land pixels undergo two or more toggles.
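The distinction between the number of land cover types and the number of toggles can be illustrated with a minimal sketch. The function name and the three label series below are hypothetical and are not taken from the MCD12Q1 data; the integers are used as IGBP class numbers for illustration only.

```python
def count_types_and_toggles(labels):
    """Return (number of distinct classes, number of year-to-year switches)."""
    n_types = len(set(labels))
    # A toggle is any pair of consecutive years with different labels.
    n_toggles = sum(1 for prev, curr in zip(labels, labels[1:]) if prev != curr)
    return n_types, n_toggles


# Hypothetical 10-yr label series for three pixels (illustrative values only).
stable = [10] * 10                             # never changes
one_change = [8, 8, 8, 9, 9, 9, 9, 9, 9, 9]    # a single switch
flip_flop = [7, 10, 7, 10, 7, 7, 10, 7, 7, 7]  # repeated back-and-forth

print(count_types_and_toggles(stable))      # (1, 0)
print(count_types_and_toggles(one_change))  # (2, 1)
print(count_types_and_toggles(flip_flop))   # (2, 6)
```

The second series has fewer toggles than land cover types, and the third has many more toggles than types, which is the repeated back-and-forth behavior described above.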

For a majority of land cover classes, most of the switches occur with just a few other land cover classes. Table 2 shows, for each class in any given year from 2001 to 2009, the average percentage of pixels that do not change from year to year as well as the four most common classes to which a pixel with a given land cover type changes (and the percentage of pixels that change to these classes). For example, on average, in any given year from 2001 to 2009, 79.4% of type-1 pixels remained type 1 in the subsequent year and 8.5% of type-1 pixels became type 5 in the subsequent year. In most cases (except for closed shrublands, woody savannas, and permanent wetlands), over one-half of the switches involve just two additional classes. In some cases, these additional classes are biophysically similar to the given land cover class. For example, pixels labeled as evergreen needleleaf and deciduous broadleaf are most likely to switch to mixed forests and pixels labeled as savanna and woody savanna are most likely to switch between each other. There are also many cases in which the switches involve dissimilar types, however. For example, woody savanna is the first or second most likely class to which all of the forest type pixels switch, open shrubland is most likely to switch to grassland, and grassland is most likely to switch to open shrubland.
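A year-to-year transition summary of this kind could be tabulated roughly as follows. This is only a sketch under stated assumptions: the function name and the toy label series are hypothetical, and the counts are pooled over all consecutive-year pairs, which may differ slightly from averaging the percentages computed separately for each year.

```python
from collections import Counter, defaultdict


def transition_percentages(series_list):
    """For each source class, return the percentage of consecutive-year
    pairs that end in each destination class (most common first)."""
    counts = defaultdict(Counter)
    for labels in series_list:
        for src, dst in zip(labels, labels[1:]):
            counts[src][dst] += 1
    table = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        table[src] = {dst: 100.0 * n / total for dst, n in dsts.most_common()}
    return table


# Hypothetical label series for three pixels over four years.
pixels = [
    [1, 1, 5, 1],
    [1, 1, 1, 1],
    [5, 5, 1, 5],
]
table = transition_percentages(pixels)
for src in sorted(table):
    print(src, {dst: round(pct, 1) for dst, pct in table[src].items()})
```

In this toy case, the entry for a class that maps to itself plays the role of the "no change" percentage in Table 2, and the remaining entries are the classes to which it most commonly switches.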

Table 2.

For each land cover type (table column 1), shown are the average percentage of pixels that do not change for any given year from 2001 to 2009 (table column 2), as well as four additional classes to which the pixels most commonly switch (table columns 3, 5, 7, and 9) and the percentage of pixels that change to these additional classes (table columns 4, 6, 8, and 10). The rows are given in decreasing order by the values in column 2.

It is clear that the interannual variability in land cover types shown in Tables 1 and 2 and Fig. 1 mostly reflects the deficiencies of the MLCT product rather than actual vegetation change. It also suggests that if the land cover product for a given year were to be used, for example, to parameterize global models (e.g., Zeng et al. 2002; Lawrence and Chase 2007) then the results would be strongly dependent upon the specific year used. Therefore, the question arises: How can we develop a land cover–type climatology using the strengths (rather than weaknesses) of the MODIS land cover product? This question will be addressed next in section 3.

3. Method to develop the land cover climatology

To develop a value-added and sensible climatology that does not reflect the interannual variability described in section 2, we use the primary IGBP land cover layers as well as the confidence scores of the primary land cover types from each year in the MCD12Q1 product. Our climatology is based on the principle that pixels with higher confidence scores are more likely to represent the correct land cover class. Indeed, confidence scores for pixels that are located where the land cover changes in the 2001–10 data are very different than those for pixels for which land cover type remains consistent (Fig. 2). For example, ~64% of pixels with one land cover type have confidence scores of 80 or above, while only 19% (or 9%) of pixels with two (or more) land cover types have confidence scores of 80 or above. In general, the land cover classification for pixels with lower confidences (i.e., probabilities of class membership) typically has more interannual variability because of the larger spread of the probabilities of class membership between the highest and next highest class (Sulla-Menashe et al. 2011). Therefore, it is logical to consult the confidence scores to identify more likely candidates for the land cover climatology.

Fig. 2.

Histograms of confidence scores for all pixels with a consistent land cover type, for pixels with two land cover types, and for pixels with three or more land cover types from 2001 to 2010. The range for the boosting scores is from 1 to 99 to exclude areas that have 100% confidence (e.g., Greenland ice sheet or Sahara Desert).

To this end, we select the most likely land cover type by weighting each pixel’s land cover label for each year by its corresponding confidence score, adding up the weights, and choosing the land cover type that has the highest overall score:

 
\[
\mathrm{LC}_{\mathrm{clim}} = \underset{j}{\arg\max} \sum_{i=1}^{N} B_j(i), \qquad (1)
\]

where N = 10 is the number of years in the data record and B_j(i) is the boosting score for a particular pixel with land cover type j for a given year i (years in which the pixel carries a different label contribute nothing to the sum for type j). If a pixel is not assigned a boosting score for a particular year, which affects only ~0.5% of land areas on average, a boosting score of 75.7 (the average boosting score for all land pixels) is assumed.
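As an illustration of this selection rule, the following is a minimal per-pixel sketch, not the production code. The function name, the use of `None` to mark a missing score, and the example values are assumptions made here; only the fill value of 75.7 comes from the text.

```python
from collections import defaultdict

MEAN_BOOSTING_SCORE = 75.7  # fill value for missing scores, as stated in the text


def climatology_label(labels, scores):
    """Sum the confidence scores per class over all years and return the
    class with the highest total. A score of None (hypothetical marker for
    a missing value) is replaced by the mean boosting score."""
    totals = defaultdict(float)
    for label, score in zip(labels, scores):
        if score is None:
            score = MEAN_BOOSTING_SCORE
        totals[label] += score
    # Note: max() keeps the first class encountered if totals tie exactly.
    return max(totals, key=totals.get)


# Hypothetical pixel: class 7 is assigned in 6 of 10 years with low
# confidence, and class 10 in 4 years with high confidence.
labels = [7, 7, 10, 7, 10, 7, 10, 7, 10, 7]
scores = [40, 40, 85, 40, None, 40, 85, 40, 85, 40]

print(climatology_label(labels, scores))  # 10
```

Here class 7 accumulates 240 and class 10 accumulates 330.7, so the less frequent but more confidently assigned class is chosen.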

4. Evaluation of the land cover climatology

a. Agreement between the climatology and an individual year of data

Despite the large amount of interannual land cover variability described in section 2, the land cover climatology (Fig. 3) does not substantially alter the global frequencies of most land cover classes relative to the average yearly global distribution. A few of the classes do, however, have substantial differences in their frequencies (Table 1). Most notable is that the classes labeled as deciduous needleleaf and permanent wetland are ~20%–30% more abundant in the climatology than for individual years and that the class labeled as closed shrubland is about one-half as abundant in the climatology. These three classes are less stable (defined by the percentage of “no change” pixels in the first column of Table 1) than other land cover classes. Deciduous needleleaf and permanent wetland have very high confidence scores, and closed shrubland has the lowest confidence scores of any class. The high confidence scores for deciduous needleleaf and permanent wetland can be explained by the fact that these two classes are adjusted postclassification by eliminating their occurrence if they have confidence scores lower than some threshold (Friedl et al. 2010).

Fig. 3.

Global map of our land cover–type climatology. To reduce data volume, the original data on an ~0.5-km sinusoidal grid are projected to geographic coordinates at a 0.1° resolution from 60°S to 80°N.

Although the global frequencies of many climatological land cover classes are similar to the yearly data, their spatial distribution is sometimes substantially different. Figure 4 illustrates the agreement between the climatology and the yearly MCD12Q1 data for each land cover class on a 0.5° grid (to match the grid used in CLM land data). To make a comparison that is analogous to the way PFTs are used in CLM, the number of 0.5-km pixels of each land cover type is counted for each 0.5° grid cell using MCD12Q1 data for each year versus using the climatology and is expressed as a percentage of the total number of land pixels in the 0.5° grid cell. We then calculate the root-mean-square deviation (RMSD) of these numbers between the climatology and the 10-yr data record in each 0.5° grid cell. This metric gives a measure of how close the land cover abundances that are based on the yearly data are to those that are based on the climatology. As a fraction of the pixels in each land cover category, deciduous needleleaf, open shrublands, grasslands, and savannas are the categories with the largest differences between the yearly land cover maps and the climatology: 26.8%, 16.6%, 15.6%, and 14.7% of 0.5° grid cells with these types have RMSD that is ≥0.1. In absolute terms, grasslands, open shrublands, and savannas have 4615, 3795, and 2261 grid cells where the RMSD is ≥0.1. Because of its limited spatial extent, even though a high proportion of grid cells with deciduous needleleaf have RMSD that is ≥0.1, the total number of such grid cells (1133) is lower than for many other classes. In fact, woody savanna is the only other class having >2000 grid cells where RMSD is ≥0.1.
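One plausible reading of this metric can be sketched as follows. The function names and the example numbers are hypothetical, and abundances are expressed as fractions (0 to 1) so that the 0.1 threshold applies directly; the exact form used to produce Fig. 4 is an assumption here.

```python
import math


def class_fraction(pixel_labels, cls):
    """Fraction of the land pixels in one grid cell labeled as class cls."""
    return sum(1 for lab in pixel_labels if lab == cls) / len(pixel_labels)


def rmsd(yearly_fractions, clim_fraction):
    """Root-mean-square deviation of the yearly class fractions from the
    fraction given by the climatology, for one class in one grid cell."""
    squared = [(f - clim_fraction) ** 2 for f in yearly_fractions]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical grid cell: the yearly fraction of one class alternates
# between 0.30 and 0.50, and the climatology gives 0.40.
yearly = [0.30, 0.50] * 5
value = rmsd(yearly, 0.40)
print(round(value, 3))  # 0.1
```

A grid cell like this one would sit at the 0.1 threshold even though the 10-yr mean abundance matches the climatology exactly, which shows that the metric reflects year-to-year scatter as well as mean differences.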

Fig. 4.

RMSD of the fraction of 0.5-km pixels in each 0.5° grid cell for each land cover class between the 2001–10 data and the land cover climatology. Also shown under each panel are the numbers of 0.5° grid cells that fall into each RMSD category (less than 0.1, between 0.1 and 0.2, and greater than 0.2). Grid cells are shaded gray and are not considered in the counts if the land cover type does not make up more than 5% of the area in any of the land cover maps.

b. Validation

One method to validate our climatology is to compare it with the STEP database, because it is one of the most complete and extensive land cover evaluation datasets available. We compare the IGBP class label for each STEP site with the corresponding 500-m MODIS pixel at the center of that site. In this study, we use version 6 (V6) of the STEP database, which includes 2463 sites with labeled IGBP classes (Fig. 5a). The V6 STEP database is slightly different than V5.1 (which is used to train Collection 5.1 of the MODIS data) in that sites have been modified or deleted because of recent changes in land cover type or improved information regarding the site and other sites have been added to fill geographic or ecological gaps in the database. Therefore, it contains some independent sites but is still largely dominated by training sites used in the actual product and so gives unrealistically high estimates of product accuracies (e.g., consistent and correct for 1908 pixels out of 2463 sites, or 77.5% of pixels, in Table 3). For completeness, we present results of this analysis using the entire V6 STEP database, using the subset of V6 STEP sites that are not included in V5.1 (to avoid training data for the Collection 5.1 MCD12Q1 product), and using a subset of V6 STEP sites that have a single land cover classification for all MODIS pixels within a site (to provide additional assurances that the center pixel is representative of the entire STEP site). As we discuss below, the same conclusions can be drawn from all three of these approaches.

Fig. 5.

(a) Global map of the STEP sites and 100 additional validation sites. (b) Example of Google Earth imagery (© Google; image ©2013 DigitalGlobe) used to determine areas that are predominantly cropland or urban. (c) Bar chart of the number of additional validation sites that are categorized correctly as having anthropogenic and natural landscapes in the climatology and in the 2001–10 data (on average). The climatology correctly identifies 73% of these sites, and the 2001–10 data correctly identify an average of 64.3% of the sites.

Table 3.

Number of STEP sites that are correctly and incorrectly classified in the 2001–10 data and in the land cover climatology. Sites with inconsistent and sometimes correct land cover classifications can become correct or incorrect in the climatology or in successive years, and therefore this category is expanded to show data for all years as well as for the climatology.

Because our method does not improve the correctness of pixels that are consistent and correct (i.e., always classified correctly), consistent and incorrect (i.e., always classified with the same, incorrect class), or inconsistent and always incorrect (i.e., not always classified with the same class, but always incorrect), we focus our analysis on those pixels that are inconsistent and sometimes correct, which represent ~12% of the pixels (Table 3). In terms of validation, our climatology performs better than the MCD12Q1 product for each year. Relative to the MCD12Q1 data from 2001 to 2010, our climatology increases the number of correctly classified sites by 3–107 (or 9%–37% of the total number of inconsistent but sometimes correct sites), although the early years (especially 2001) are worse than the later years. In fact, the number of inconsistent/sometimes correct sites for 2001–03 is ≥1.95 standard deviations away from the average number of inconsistent/sometimes correct sites from 2004 to 2010 (defining the 97.5% confidence interval). It is unclear why there are lower confidence scores for the first three years, but it could have to do with lower-quality surface reflectance composites during the Terra-only period before the launch of Aqua and also the fewer years used to stabilize the classifications prior to 2003 (which, itself, uses data from 2001 to 2002).
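The four site categories used in Table 3 can be expressed as a short sketch. The function name and the example label series are hypothetical; the category names follow the text.

```python
def site_category(yearly_labels, truth):
    """Assign a validation site to one of the four categories, given its
    yearly MODIS labels and the reference (STEP) class."""
    consistent = len(set(yearly_labels)) == 1
    n_correct = sum(1 for lab in yearly_labels if lab == truth)
    if consistent:
        return "consistent and correct" if n_correct else "consistent and incorrect"
    if n_correct:
        return "inconsistent and sometimes correct"
    return "inconsistent and always incorrect"


# Hypothetical sites whose reference class is 12.
print(site_category([12] * 10, 12))                  # consistent and correct
print(site_category([14] * 10, 12))                  # consistent and incorrect
print(site_category([12, 14] * 5, 12))               # inconsistent and sometimes correct
print(site_category([14, 10, 14, 10, 9] * 2, 12))    # inconsistent and always incorrect
```

Only sites in the third category can change from incorrect to correct (or vice versa) when the yearly labels are replaced by a single climatological label, which is why the analysis concentrates on them.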

Similar results can be obtained if we only consider the 688 V6 STEP sites that are not included in the V5.1 database or the 1092 V6 STEP sites at which 100% of our climatology pixels are of a single IGBP type. Again, data from the later years are validated better than data from earlier years, and the climatology performs slightly better than data from the later years. Of 143 inconsistent and sometimes correct sites in the subset data that are not included in the V5.1 STEP database, an average of 60 are correctly classified from 2001 to 2003, 76 are correctly classified from 2004 to 2010, and 80 are correctly classified in the climatology. The major difference between using the entire STEP database and this subset is that the agreement of the MODIS land cover data with the assigned classes in the STEP database is much lower (e.g., ~47% instead of ~77% of the sites are consistently and correctly classified with the subset data). Likewise, of 106 inconsistent and sometimes correct sites in the subset data that have 100% homogeneous land cover on the basis of our climatology, an average of 55 are correctly classified from 2001 to 2003, 71 are correctly classified from 2004 to 2010, and 78 are correctly classified in the climatology.

To assess how well the climatology performs against a truly independent sample, we also compare our climatology (as well as the 2001–10 data) with Google Earth imagery at 100 additional 500-m pixels. We assess whether these pixels are correctly identified as having a significant human influence (over 50% cropland or urban) or not having one because it is relatively easy to visually identify cropland and urban from high-resolution satellite imagery. These pixels are chosen randomly from a pool of potential candidates that are determined by selecting from each MODIS tile a random set of pixels that has at least two classes represented as well as at least one occurrence of cropland or urban/built-up land cover classes in the 2001–10 MCD12Q1 data. Further, we only consider pixels for which there is high-resolution imagery in Google Earth to provide an accurate visual inspection of the area (i.e., to identify areas where cropland or urban cover a majority of the area of the pixel). The locations of these pixels and the results of this test are shown in Fig. 5. As we demonstrate for the STEP sites, the climatology performs as well as or slightly better than the 2001–10 MCD12Q1 data. There is a noticeable difference for human-influenced pixels, as 42 out of 57 pixels are correctly identified as cropland or urban for the climatology versus an average of 32.8 out of 57 pixels for the 2001–10 data. For the natural sites, however, the climatology and the 2001–10 data are nearly equally correct (31 out of 43 natural pixels are correctly identified instead of an average of 31.5 out of 43).

In addition to the climatology that is based on Eq. (1) (section 3), we also investigate an alternative climatology that is based on the most frequent land cover type in the 2001–10 MODIS data. We prefer a climatology that is based on Eq. (1) because it is not uncommon to have two or more land cover types that are represented equally often in the 2001–10 data, and, without other information, random chance dictates the winner of these ties. For the STEP database, our climatology that is based on Eq. (1) is better than the alternative climatology: they respectively identify 210 and 201 inconsistent and sometimes correct STEP sites correctly. Their performances are very similar when compared with the Google Earth imagery: they respectively classify 42 and 41 pixels correctly as being dominated by cropland/urban and classify 31 and 33 pixels correctly as being dominated by natural vegetation. Furthermore, they give similar results globally, as there is better than 90% agreement between them for most classes [with the exception of permanent wetlands and deciduous needleleaf, whose confidence scores are artificially inflated (Friedl et al. 2010) after classification].
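The difference between the two climatologies can be illustrated with a minimal Python sketch for a single pixel. It assumes that Eq. (1) amounts to summing, for each land cover type, the confidence scores of the years in which that type is assigned and then selecting the type with the largest sum. The function names, the class labels, and the confidence values below are hypothetical and are not taken from the MCD12Q1 data.

```python
from collections import Counter, defaultdict


def weighted_climatology(labels, confidences):
    """Pick the class with the largest summed confidence (assumed form of Eq. 1)."""
    totals = defaultdict(float)
    for label, confidence in zip(labels, confidences):
        totals[label] += confidence
    return max(totals, key=totals.get)


def most_frequent_classes(labels):
    """Return all classes that share the highest count, sorted by name."""
    counts = Counter(labels)
    top = max(counts.values())
    return sorted(name for name, n in counts.items() if n == top)


# Hypothetical 10-yr record for one pixel: two classes, five years each.
labels = ["savanna"] * 5 + ["grassland"] * 5
confidences = [55, 60, 58, 62, 57, 70, 75, 72, 68, 74]  # made-up scores (%)

tied = most_frequent_classes(labels)                # frequency alone cannot decide
winner = weighted_climatology(labels, confidences)  # confidence breaks the tie
print("tied by frequency:", tied)
print("confidence-weighted choice:", winner)
```

In this made-up record both classes occur five times, so a frequency-based climatology would have to pick between them arbitrarily, whereas the summed confidence scores single out one class.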

c. Comparison with GLCC data

Because version 2 of the GLCC dataset (Loveland et al. 2000) is widely used in regional and global models (e.g., ECMWF, GEOS-5, GFS, and WRF), we also compare it with our climatology. Figure 6 shows that there are substantial differences between the two maps in the distributions of nearly all of the land cover classes. For instance, barren is much more common in central Asia in our MODIS climatology (whereas open shrublands are more common there in the GLCC data) and much less common at high northern latitudes (where the land is predominantly classified as grasslands or open shrublands in our MODIS climatology). There are also large differences between the GLCC data and our MODIS climatology in terms of global frequencies (Table 4). For example, the percentage of urban area in the GLCC data is less than one-half of that in our MODIS climatology, and closed shrublands are 18 times as abundant in the GLCC data. Permanent wetlands, grasslands, savannas, and woody savannas are also at least 20% less common in the GLCC data, and evergreen needleleaf, deciduous broadleaf, mixed forests, croplands, and cropland/natural vegetation mosaics are at least 20% more common in the GLCC data.

Fig. 6.

Difference of fractional abundance of each land cover class in the 0.5° grid cells between our MODIS climatology and the V2 GLCC data. The GLCC data (http://edc2.usgs.gov/glcc/) have a resolution of 30 arc s. Also shown under each panel are the numbers of 0.5° grid cells that fall into each category (less than −0.2, between −0.2 and 0.2, and greater than 0.2). Grid cells are shaded gray and are not considered in the counts if the land cover type does not make up more than 5% of the area in any of the land cover maps.

Table 4.

Global abundances and validation for the GLCC product vs our MODIS climatology, showing (for both datasets) the global frequencies (as a percentage of all nonwater pixels) of each class, the number of pixels that are incorrectly identified in each class, and the percentage of total pixels that are incorrectly identified in each class as based on a subset of the STEP database (V6) that is not used in the training dataset for the MODIS data.

The GLCC data also are substantially more poorly validated than the MODIS product using the STEP sites and the additional pixels from the Google Earth imagery. For a fair comparison, we only use the subset of V6 STEP sites that are not used in the training dataset for the V5.1 MCD12Q1 data and only consider pixels that have a classification in the GLCC data. There are substantially more discrepancies between the STEP sites and the GLCC data than the MODIS data (Table 4). Overall, 393 of 669 pixels (58.7%) are classified differently in the GLCC data than in the STEP database. For comparison, 277 of 669 pixels (41.4%) are classified differently on the basis of our MODIS climatology. In fact, only the class labeled as closed shrubland has more errors in our MODIS climatology than in the GLCC data. If we further only consider STEP sites that have a homogenous land cover type, on the basis of our climatology, 151 of 284 pixels (53.2%) are classified differently in the GLCC data than in the STEP database, and 118 of 284 pixels (41.5%) are classified differently on the basis of our MODIS climatology.

In a similar way, in our second validation test that is based on the Google Earth imagery, 20 human-influenced pixels and 25 natural pixels are correctly identified, whereas 35 human-influenced pixels and 10 natural pixels are incorrectly identified in the GLCC data (again, neglecting pixels that have no classification in the GLCC data). This gives a success rate of 36% for human-influenced pixels and 71% for natural pixels. For comparison, the success rate for our MODIS climatology is 74% for human-influenced pixels and 72% for natural pixels.
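The success rates quoted above follow directly from the pixel counts. The short Python check below recomputes them. The helper name `success_rate` is introduced here only for illustration, and the counts are those given in the text (the GLCC counts exclude pixels that have no classification in the GLCC data).

```python
def success_rate(correct, incorrect):
    """Percentage of pixels that are correctly identified."""
    return 100.0 * correct / (correct + incorrect)


# GLCC data: counts of correctly and incorrectly identified pixels.
glcc_human = success_rate(20, 35)
glcc_natural = success_rate(25, 10)

# MODIS climatology: 42 of 57 human-influenced and 31 of 43 natural pixels.
clim_human = success_rate(42, 57 - 42)
clim_natural = success_rate(31, 43 - 31)

for name, value in [
    ("GLCC, human influenced", glcc_human),
    ("GLCC, natural", glcc_natural),
    ("climatology, human influenced", clim_human),
    ("climatology, natural", clim_natural),
]:
    print(f"{name}: {value:.0f}%")
```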

5. Discussion

There is substantial interannual variability in the MODIS (Collection 5.1) Land Cover Type (MCD12Q1) data, despite the steps taken to reduce this interannual variability in the product’s creation (Friedl et al. 2010). This fact makes it difficult to get consistent land cover distributions, especially at finer spatial scales. In fact, 40% of nonwater pixels undergo at least one change from 2001 to 2010, and of these pixels, most undergo more changes than are physically reasonable, suggesting that a majority of changes in the 2001–10 data are probably spurious. Only a few classes (urban, snow/ice, barren, and evergreen broadleaf) have more than 80% of pixels experiencing no classification changes in the 2001–10 data. With the exception of the urban class, which is based on an independent product (Schneider et al. 2009, 2010) and is kept constant throughout the time period, these stable classes have larger land areas, especially as large homogenous expanses (e.g., Sahara and Middle Eastern deserts, tropical jungles, and Greenland and Antarctic ice sheets) where the confidences are relatively high.

The reason for this high degree of interannual variability is difficult to determine from the data themselves, but it is commonly attributed to the fact that many landscapes include mixtures of vegetation classes (e.g., Latifovic and Olthof 2004; Smith et al. 2002, 2003; Herold et al. 2008). These mixtures exist at nearly all spatial scales (DeFries et al. 1999). This situation can make it difficult to determine the dominant vegetation type from satellite sensors because different vegetation types might have different spectral characteristics and phenologies. In addition, class definitions for such land cover classifications do not necessarily correspond to classes that are consistently and accurately identified by satellite sensors. As a result, there can be poor spectral–temporal separability between vegetation classes. For example, it is difficult to separate savannas from grasslands on the basis of spectral characteristics alone (Friedl et al. 2010). Also, specific land cover classes can exist in a variety of climatic regions with a variety of different land-use practices, making it difficult to identify those classes in all regions. For example, shrublands exist in multiple climate zones, and there can be spectral confusion between shrublands and different classes in these different climate zones (Sulla-Menashe et al. 2011). Another example is that croplands can range from industrial-scale corn and wheat production in the North American central plains to highly fragmented agriculture in tropical regions of Africa.

Because there are so many spurious toggles (i.e., switches between different land cover types) between adjacent years, it makes sense to develop a land cover climatology that does not reflect this variability and that, it is hoped, resolves some of the conflicts. To develop this climatology we use confidence scores included with the product because they reflect uncertainty in the original land cover classification. Overall, we find that our climatology does not significantly affect the global distribution of most land cover classes from the 2001–10 data. The exceptions are deciduous needleleaf, permanent wetland, and closed shrubland, presumably because these classes are often confused with other classes yet their confidence scores are considerably different from the surrounding classes. Because of the high level of interannual variability, however, there are differences involving the distributions of many of the classes between using data from just one year and using the climatology. For example, when aggregated to 0.5° climate-model grid boxes, the RMSD of the abundances of certain land cover types between the climatology and the 2001–10 data can be greater than 0.2. The climatology also performs as well as or better than the yearly land cover data at the validation sites.
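As an illustration of the aggregated comparison, the following minimal Python sketch computes the fractional abundance of one class within a coarse grid box for each year and then the RMSD between the yearly abundances and the abundance in the climatology. It assumes that the RMSD is taken over the years for a given grid box, which is only one plausible reading of the calculation. The function names, the tiny 2 × 2 "grid box," and the class labels are hypothetical.

```python
import math


def fractional_abundance(pixels, land_class):
    """Fraction of the pixels in a grid box that carry the given class label."""
    flat = [label for row in pixels for label in row]
    return sum(1 for label in flat if label == land_class) / len(flat)


def rmsd(yearly_values, reference):
    """Root-mean-square difference between yearly values and a reference value."""
    squared = [(value - reference) ** 2 for value in yearly_values]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical grid box of 2 x 2 pixels (a real 0.5-degree box holds far more).
climatology_box = [["crop", "crop"], ["grass", "crop"]]
yearly_boxes = [
    [["crop", "grass"], ["grass", "crop"]],
    [["crop", "crop"], ["grass", "crop"]],
    [["grass", "grass"], ["grass", "crop"]],
    [["crop", "crop"], ["crop", "crop"]],
]

reference = fractional_abundance(climatology_box, "crop")
yearly = [fractional_abundance(box, "crop") for box in yearly_boxes]
value = rmsd(yearly, reference)
print(f"climatology abundance: {reference:.2f}, RMSD over years: {value:.3f}")
```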

In addition, our climatology, which is based on Collection 5.1 MCD12Q1 data, is substantially different from the GLCC product, which is derived from AVHRR data from April 1992 to March 1993. Many of these differences are probably related to differences between the MODIS and the AVHRR instruments (with the former having higher spatial and radiometric resolutions) as well as to a high-resolution database used to train the MODIS classifiers (which now includes imagery from Google Earth). They might also be related to the fact that the GLCC data use a single year of reflectance data from 1992 to 1993, although it is difficult (without more than one year of AVHRR data) to determine how different the land cover distribution would be if multiple years of AVHRR data were used. Nevertheless, we show that the MODIS data are validated substantially better than the GLCC data as based on the STEP database and our additional validation sites identified in Google Earth.

6. Conclusions

Our value-added MODIS-based land surface climatology represents a conceptual improvement over the multiyear data, which are subject to significant interannual variability that is not real, while not adversely affecting (and actually slightly improving) the validation of the dataset. We demonstrate that this interannual variability can lead to substantially different land cover distributions in some areas of the world, even at a typical climate-model resolution, suggesting that global modeling efforts should base their land cover data on a climatology rather than on data from individual years. In addition, our land cover climatology represents an improvement over the GLCC data (which represent a widely used first-generation global land cover product) because our climatology is based on 10 years of data rather than on only one year of data and because MODIS data are of higher quality than AVHRR data.

The change of land cover type is expected to modify the land surface roughness length, albedo, and vegetation root distribution. We intend to work with modeling groups to assess the impacts of the new land cover climatology on weather, climate, and hydrometeorological processes in the future. The data are freely available from the authors.

Acknowledgments

This work was supported by NASA (NNX09A021G) and the U.S. Department of Energy (DE-SC0006773). The MODIS data used in this study were obtained from the USGS’s Land Processes Distributed Active Archive Center (https://lpdaac.usgs.gov/products/modis_products_table/mcd12q1), and the GLCC data were obtained from the USGS’s Global Land Cover Characterization website (http://edc2.usgs.gov/glcc/globe_int.php). We also thank the Boston University Land Cover and Surface Climate Group for making available the STEP data for validation.

REFERENCES

Bonan, G. B., 1996: A land surface model (LSM ver. 1.0) for ecological, hydrological, and atmospheric studies: Technical description and user’s guide. NCAR Tech. Note NCAR/TN-417+STR, 155 pp.
Bonan, G. B., K. W. Oleson, M. Vertenstein, S. Levis, X. Zeng, Y. Dai, R. E. Dickinson, and Z.-L. Yang, 2002: The land surface climatology of the Community Land Model coupled to the NCAR Community Climate Model. J. Climate, 15, 3123–3149.
Chen, F., and J. Dudhia, 2001: Coupling an advanced land surface–hydrology model with the Penn State–NCAR MM5 modeling system. Part I: Model implementation and sensitivity. Mon. Wea. Rev., 129, 569–585.
DeFries, R. S., and J. R. G. Townshend, 1994: NDVI-derived land cover classifications at a global scale. Int. J. Remote Sens., 15, 3567–3586.
DeFries, R. S., J. R. G. Townshend, and M. C. Hansen, 1999: Continuous fields of vegetation characteristics at the global scale at 1-km resolution. J. Geophys. Res., 104, 16 911–16 923.
DeFries, R. S., M. C. Hansen, J. R. G. Townshend, A. C. Janetos, and T. R. Loveland, 2000: A new global 1-km dataset of percentage tree cover derived from remote sensing. Global Change Biol., 6, 247–254.
Dickinson, R. E., A. Henderson-Sellers, P. J. Kennedy, and M. F. Wilson, 1986: Biosphere-Atmosphere Transfer Scheme (BATS) for the NCAR Community Climate Model. NCAR Tech. Note NCAR/TN-275-+STR, 82 pp.
ECMWF, 2013: IFS documentation—Cy38r1, operational implementation 19 June 2012, Part IV: Physical processes. ECMWF Tech. Doc., 189 pp. [Available online at http://old.ecmwf.int/research/ifsdocs/CY38r1/IFSPart4.pdf.]
Ek, M. B., K. E. Mitchell, Y. Lin, E. Rogers, P. Grunmann, V. Koren, G. Gayno, and J. D. Tarpley, 2003: Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta Model. J. Geophys. Res., 108, 8851.
Freund, Y., and R. E. Schapire, 1997: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55, 119–139.
Friedl, M. A., and Coauthors, 2002: Global land cover mapping from MODIS: Algorithms and early results. Remote Sens. Environ., 83, 287–302.
Friedl, M. A., D. Sulla-Menashe, B. Tan, A. Schneider, N. Ramankutty, A. Sibley, and X. Huang, 2010: MODIS collection 5 global land cover: Algorithm refinements and characterization of new datasets. Remote Sens. Environ., 114, 168–182.
Friedman, J., T. Hastie, and R. Tibshirani, 2000: Additive logistic regression: A statistical view of boosting. Ann. Stat., 28, 337–407.
Giri, C., Z. Zhu, and B. Reed, 2005: A comparative analysis of Global Land Cover 2000 and MODIS land cover data sets. Remote Sens. Environ., 94, 123–132.
Hansen, M. C., and B. Reed, 2000: A comparison of the IGBP DISCover and University of Maryland 1 km global land cover products. Int. J. Remote Sens., 21, 1365–1373.
Herold, M., C. E. Woodcock, A. di Gregorio, P. Mayaux, A. S. Belward, J. Latham, and C. C. Schmullius, 2006: A joint initiative for harmonization and validation of land cover datasets. IEEE Trans. Geosci. Remote Sens., 44, 1719–1727.
Herold, M., P. Mayaux, C. E. Woodcock, A. Baccini, and C. Schmullius, 2008: Some challenges in global land cover mapping: An assessment of agreement and accuracy in existing 1 km datasets. Remote Sens. Environ., 112, 2538–2556.
Huete, A., K. Didan, K. Miura, E. P. Rodriguez, X. Gao, and L. G. Ferreira, 2002: Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ., 83, 195–213.
Latifovic, R., and I. Olthof, 2004: Accuracy assessment using sub-pixel fractional error matrices of global land cover products derived from satellite data. Remote Sens. Environ., 90, 153–165.
Lawrence, P. J., and T. N. Chase, 2007: Representing a new MODIS consistent land surface in the Community Land Model (CLM 3.0). J. Geophys. Res., 112, G01023.
Lawrence, P. J., and T. N. Chase, 2010: Investigating the climate impacts of global land cover change in the Community Climate System Model. Int. J. Climatol., 30, 2066–2087.
Liang, L., and P. Gong, 2010: An assessment of MODIS Collection 5 global land cover product for biological conservation studies. Proc. 18th Int. Conf. on Geoinformatics, 2010 Beijing, China, IEEE, 1–6. [Available online at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5567991.]
Loveland, T. R., B. C. Reed, J. F. Brown, D. O. Ohlen, Z. Zhu, L. Yang, and J. W. Merchant, 2000: Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data. Int. J. Remote Sens., 21, 1303–1330.
McCallum, I., M. Obersteiner, S. Nilsson, and A. Shvidenko, 2006: A spatial comparison of four satellite derived 1 km global land cover datasets. Int. J. Appl. Earth Obs. Geoinf., 8, 246–255.
Muchoney, D., A. Strahler, J. Hodges, and J. LoCastro, 1999: The IGBP DISCover confidence sites and the system for terrestrial ecosystem parameterization: Tools for validating global land-cover data. Photogramm. Eng. Remote Sens., 65, 1061–1067.
Quinlan, J. R., 1993: C4.5 Programs for Machine Learning. Morgan Kaufmann, 302 pp.
Ramankutty, N., and J. A. Foley, 1999: Estimating historical changes in global land cover: Croplands from 1700–1992. Global Biogeochem. Cycles, 13, 997–1027.
Rienecker, M. M., and Coauthors, 2011: MERRA: NASA’s Modern-Era Retrospective Analysis for Research and Applications. J. Climate, 24, 3624–3648.
Schaaf, C. B., and Coauthors, 2002: First operational BRDF, albedo nadir reflectance products from MODIS. Remote Sens. Environ., 83, 135–148.
Schapire, R. E., 1990: The strength of weak learnability. Mach. Learn., 5, 197–227.
Schneider, A., M. A. Friedl, and D. Potere, 2009: A new map of global urban extent from MODIS satellite data. Environ. Res. Lett., 4, 044003.
Schneider, A., M. A. Friedl, and D. Potere, 2010: Mapping global urban areas using MODIS 500-m data: New methods and datasets based on ‘urban ecoregions.’ Remote Sens. Environ., 114, 1733–1746.
Sellers, P. J., C. J. Tucker, G. J. Collatz, S. O. Los, C. O. Justice, D. A. Dazlich, and D. A. Randall, 1996: A revised land surface parameterization (SiB2) for atmospheric GCMS. Part II: The generation of global fields of terrestrial biophysical parameters from satellite data. J. Climate, 9, 706–737.
Smith, J. H., J. D. Wickham, S. V. Stehman, and L. Yang, 2002: Impacts of patch size and land-cover heterogeneity on thematic image classification accuracy. Photogramm. Eng. Remote Sens., 68, 65–70.
Smith, J. H., S. V. Stehman, J. D. Wickham, and L. Yang, 2003: Effects of landscape characteristics on land-cover class accuracy. Remote Sens. Environ., 84, 342–349.
Sulla-Menashe, D., and Coauthors, 2011: Hierarchical mapping of northern Eurasian land cover using MODIS data. Remote Sens. Environ., 115, 392–403.
Townshend, J. R. G., 1992: Improved global data for land applications: A proposal for a new high resolution data set. Royal Swedish Academy of Sciences IGBP Rep. 20, 87 pp.
Wan, Z., Y. Zhang, Q. Zhang, and Z.-L. Li, 2002: Validation of the land-surface temperature products retrieved from Terra Moderate Resolution Imaging Spectroradiometer data. Remote Sens. Environ., 83, 163–180.
Zeng, X., 2001: Global vegetation root distribution for land modeling. J. Hydrometeor., 2, 525–530.
Zeng, X., R. E. Dickinson, A. Walker, M. Shaikh, R. DeFries, and J. Qi, 2000: Derivation and evaluation of global 1-km fractional vegetation cover data for land modeling. J. Appl. Meteor., 39, 826–839.
Zeng, X., M. Shaikh, Y. Dai, R. E. Dickinson, and R. Myneni, 2002: Coupling of the Common Land Model to the NCAR Community Climate Model. J. Climate, 15, 1832–1854.