The green vegetation fraction Fg, which represents the horizontal density of live vegetation, is an important parameter for the study of global energy, carbon, hydrological, and biogeochemical cycling. A common method of calculating Fg is to build a simple linear mixing model between two NDVI endmembers: bare soil NDVI, NDVIsoil, and full vegetation NDVI, NDVIveg. However, determining these parameters at large scales involves considerable uncertainty. The present study investigates how NDVIveg and NDVIsoil determination affects Fg calculations for all of China, based on different land-cover datasets, hyperspectral data, and soil type classification maps. The results show the following: 1) The regional ChinaCover dataset, with higher accuracy and more detailed classification, is preferable to the most commonly used MCD12Q1 dataset for calculating Fg in China, although the choice of dataset does not lead to large differences in NDVIveg values. 2) Soil NDVI derived from Hyperion data is highly variable (0.006–0.2), and 79.36% of the study area has a soil NDVI much larger than the commonly used value of 0.05. Therefore, dynamic NDVIsoil values that vary with soil type are much better for Fg calculation than the invariant value (0.05), which yields a significant overestimation of Fg, especially in areas with low vegetation coverage. 3) A high-quality Fg dataset for China from 2000 to 2010 was established with NDVIveg and NDVIsoil parameters based on MOD13Q1 250-m NDVI data.
1. Introduction
Terrestrial vegetation plays a key role in global energy, carbon, hydrological, and biogeochemical cycling (Potter et al. 2008). The green vegetation fraction Fg, which represents the horizontal density of live vegetation, is of particular importance for regional and global carbon modeling, ecological assessment, and agricultural monitoring (Asner and Lobell 2000; Lucht et al. 2002; Parmesan and Yohe 2003). At the ecosystem level, the normalized difference vegetation index (NDVI) calculated from coarse spatial resolution satellite data has been widely used to estimate Fg by exploiting the difference between visible and near-infrared (NIR) reflectance that results from the presence of chlorophyll (Reed 2006; Tucker 1979; Xiao and Moody 2005).
Models used to derive Fg from NDVI are generally simple linear (Gutman and Ignatov 1998) or quadratic (Carlson and Ripley 1997) combinations of two endmembers: the NDVI of dense (LAI > 3) live vegetation and the NDVI of soil. The simple linear model developed by Gutman and Ignatov (1998), hereafter referred to as the G–I approach, has been widely applied because it is easy to implement, partly because NDVI values for the soil and plant endmembers can be preselected (Montandon and Small 2008). However, selecting NDVI values for the two endmembers is complicated by variations in the spectral signal due to differences in vegetation type, plant health, leaf water content, and other factors (Elmore et al. 2000). The spectral signature of soil also varies, depending on mineralogy, moisture, and grain size (Baumgardner et al. 1985). Because the spatial variability of the live vegetation and soil endmembers is difficult to address over large areas, both the linear and quadratic models are normally parameterized with single estimated NDVI values for the live vegetation and soil endmembers. The most common technique for estimating the two endmembers is to infer them from the data themselves. NDVIveg can be defined as the highest NDVI value within the scene (Gallo et al. 2001; Li et al. 2002), and NDVIsoil is commonly inferred from the lowest historical NDVI values within the scene (e.g., the G–I approach). However, many authors opt for popular published NDVIsoil values of 0.05 or less (Gebremichael and Barros 2006; Sridhar et al. 2003; Zeng et al. 2000). Since single NDVIveg or NDVIsoil values are clearly not valid over large areas, alternative methods have been developed for determining more appropriate values. For NDVIveg, the NDVI data can be split into biomes and the maximum value selected from each (Matsui et al. 2005; Oleson et al. 2000).
For NDVIsoil, combining soil spectral databases with temporal NDVI information for each pixel can yield better Fg estimates than using globally invariant values estimated from whole scenes (Montandon and Small 2008). Nevertheless, the accuracy of land-cover datasets and the lack of sufficiently detailed soil reflectance data limit the accuracy of Fg calculations.
Here, taking all of China as the study region, we investigate how NDVIveg and NDVIsoil determination affects Fg calculations using MODIS 16-day NDVI imagery. We limit this study to the problems that arise from invariant assumptions about NDVIveg and NDVIsoil in the linear G–I model and to how these assumptions affect Fg estimates. First, because different land-cover types have different NDVIveg values, we evaluate the influence of NDVIveg determination on Fg calculations by comparing two land-cover datasets: the MODIS land cover based on the International Geosphere–Biosphere Programme (IGBP) classification system and the regional land-cover dataset ChinaCover developed by the Chinese Academy of Sciences. Second, we evaluate the error in Fg introduced by NDVIsoil selection by comparing a popular single value (0.05) with values derived from a soil type dataset and hyperspectral data. Finally, we present a green vegetation fraction dataset for all of China from 2000 to 2010 based on the optimal combination of NDVIveg and NDVIsoil.
2. Materials and methodology
2.1.1. NDVI data
In this study, we used the MOD13Q1 vegetation index product from the National Aeronautics and Space Administration (NASA) Earth Observing System (EOS), which has a spatial resolution of 250 m. It is calculated from two-way atmospherically corrected surface reflectance, which reduces the effects of water, clouds, and heavy aerosols, and it includes a cloud shadow mask. The 16-day compositing further improves data quality. NDVI values are stored as integers ranging from −2000 to 10 000 and are converted to NDVI by dividing by the scale factor of 10 000. The dataset spans 2000 to 2010, giving 250 distinct 16-day composites for China over the entire period. The data were mosaicked and clipped from 19 MODIS tiles covering all of mainland China and Taiwan.
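A minimal sketch of the two bookkeeping steps described above: converting stored integers to NDVI and counting the 16-day composites from 2000 to 2010. It assumes, beyond what the text states, that −3000 is the MOD13Q1 fill value and that the Terra record starts with the composite beginning on day of year 49 of 2000; the function names are illustrative.

```python
"""Sketch: scale raw MOD13Q1 NDVI integers and count 16-day composites.

Assumptions (not from the paper): the MOD13Q1 fill value is -3000 and the
Terra record starts with the composite beginning on day-of-year 49 of 2000.
"""

SCALE = 10_000                     # raw integer / 10 000 -> NDVI
VALID_MIN, VALID_MAX = -2000, 10_000
FILL_VALUE = -3000                 # assumed MOD13Q1 fill value


def scale_ndvi(raw):
    """Convert a raw MOD13Q1 NDVI integer to a float, or None if invalid."""
    if raw == FILL_VALUE or not VALID_MIN <= raw <= VALID_MAX:
        return None
    return raw / SCALE


def composite_start_days():
    """16-day composite start days: DOY 1, 17, ..., 353 (23 per year)."""
    return list(range(1, 366, 16))


def count_composites(first_year=2000, last_year=2010, first_doy=49):
    """Count composites from the first Terra composite through last_year."""
    total = 0
    for year in range(first_year, last_year + 1):
        days = composite_start_days()
        if year == first_year:
            days = [d for d in days if d >= first_doy]
        total += len(days)
    return total


print(scale_ndvi(7512))      # 0.7512
print(count_composites())    # 250
```

Under these assumptions, 2000 contributes 20 composites and each of 2001–2010 contributes 23, which matches the 250 time phases used in the study.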
First, the data were reprojected from the sinusoidal projection used in MODIS products to the Albers equal-area projection. Then, the Savitzky–Golay filter-based method proposed by Chen et al. (2004), which iteratively adjusts the data toward the upper NDVI envelope while preserving changes in NDVI patterns, was applied annually to the NDVI time series to further smooth out noise, specifically the depressed NDVI values caused primarily by cloud contamination and atmospheric variability. These preprocessing steps yielded 250 cloud-free NDVI images for China with noise minimized.
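The upper-envelope idea behind the Chen et al. (2004) filter can be sketched as follows: smooth the series, lift every observation that falls below the fitted curve up to the fit (treating it as cloud-depressed), and repeat. This is a simplified illustration, not the authors' implementation: it assumes a fixed 5-point quadratic Savitzky–Golay window, a fixed number of iterations, and mirrored ends, and it omits the weighting and fitting-effect index of the original method. The NDVI values are hypothetical.

```python
# 5-point, 2nd-order Savitzky-Golay smoothing coefficients (divide by 35).
SG_COEFFS = (-3, 12, 17, 12, -3)
SG_NORM = 35


def savgol5(values):
    """Smooth a series with the 5-point quadratic Savitzky-Golay filter.

    Ends are handled by mirroring the series about its first and last
    samples, so the series needs at least 3 values.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    half = len(SG_COEFFS) // 2

    def at(i):
        if i < 0:
            i = -i
        elif i >= n:
            i = 2 * (n - 1) - i
        return values[i]

    return [sum(c * at(i + k - half) for k, c in enumerate(SG_COEFFS)) / SG_NORM
            for i in range(n)]


def upper_envelope(ndvi, iterations=5):
    """Simplified upper-envelope fit in the spirit of Chen et al. (2004)."""
    original = list(ndvi)
    series = list(original)
    for _ in range(iterations):
        fitted = savgol5(series)
        # Keep observations above the fit; replace depressed values by the fit.
        series = [max(obs, fit) for obs, fit in zip(original, fitted)]
    return savgol5(series)


# Hypothetical 16-day NDVI values with one cloud-contaminated dip.
season = [0.30, 0.42, 0.55, 0.66, 0.25, 0.72, 0.70, 0.61, 0.48]
print([round(v, 3) for v in upper_envelope(season)])
```

Because each pass starts again from the original observations, genuine high values are never pulled down, while the isolated low value is progressively raised toward the envelope.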
2.1.2. Land-cover data
Two land-cover datasets were compared to investigate their influence on Fg calculations. The first was the Collection 5.1 MODIS land-cover product (MCD12Q1), which corrects significant errors detected in Collection 5. It is an annual product covering 2000 to 2011 with a spatial resolution of 500 m; this study used its IGBP classification scheme, which includes 11 natural vegetation classes, 3 developed and mosaicked land classes, and 3 nonvegetated land classes. The second was ChinaCover, a new 30-m land-cover product developed by the Chinese Academy of Sciences specifically for analyzing ecological change in China, with a total of 38 classes. The most striking characteristic of ChinaCover is that it distinguishes more vegetation types: 24, compared with 14 in MCD12Q1. The 2010 land cover from both datasets was used to determine NDVIveg. Land-cover changes were not considered because land-cover data matching each NDVI acquisition were not available.
2.1.3. Soil data
The soil type data were from China’s 1:4 000 000 soil spatial database digital maps, published by the Institute of Soil Science, Chinese Academy of Sciences (ISSCAS) in 1996. The maps were digitized from China’s soil map published by ISSCAS in 1978. They use the Albers equal-area conic projection, and polygons with different identifiers (codes) represent different soil types. This study used the first level of the soil type classification system, which comprises a total of 57 soil types.
2.1.4. Hyperspectral data
To obtain sufficient soil reflectance data for retrieving NDVIsoil, 670 scenes of Earth Observing-1 (EO-1) Hyperion data for China (Figure 1) with a 30-m spatial resolution, acquired from 2000 to 2010, were downloaded. The Hyperion data were preprocessed by fixing bad and outlier pixels, local destriping, atmospheric correction, and minimum noise fraction smoothing, which ensures a consistent and standardized time series compatible with field-scale and airborne measured indices (Datt et al. 2003). To match the MODIS NDVI data, the Hyperion spectra were first resampled using the MODIS spectral response functions, and NDVI was then calculated.
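The resampling step can be sketched as a spectral-response-weighted average of Hyperion band reflectances, R_MODIS = sum(R(λ)·S(λ)) / sum(S(λ)), followed by NDVI. This is a minimal illustration under stated assumptions: the band centres and soil reflectances are hypothetical, and simple boxcar weights over the nominal MODIS band 1 (620–670 nm) and band 2 (841–876 nm) ranges stand in for the tabulated MODIS spectral response functions used in the study.

```python
# Hypothetical Hyperion band centres (nm) and bare-soil reflectance values.
HYPERION_NM = [630, 640, 650, 660, 670, 840, 850, 860, 870, 880]
SOIL_REFL = [0.20, 0.21, 0.21, 0.22, 0.22, 0.27, 0.27, 0.28, 0.28, 0.28]

# Nominal MODIS band 1 (red) and band 2 (NIR) limits in nm.
MODIS_RED = (620, 670)
MODIS_NIR = (841, 876)


def boxcar_srf(centres, band):
    """Toy SRF: weight 1 inside the band limits, 0 outside.

    Real MODIS SRFs are tabulated curves; this is only a stand-in.
    """
    lo, hi = band
    return [1.0 if lo <= c <= hi else 0.0 for c in centres]


def band_equivalent(reflectance, srf):
    """SRF-weighted mean reflectance: sum(R * S) / sum(S)."""
    weight = sum(srf)
    if weight == 0:
        raise ValueError("SRF does not overlap any Hyperion band")
    return sum(r * s for r, s in zip(reflectance, srf)) / weight


def ndvi(red, nir):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


red = band_equivalent(SOIL_REFL, boxcar_srf(HYPERION_NM, MODIS_RED))
nir = band_equivalent(SOIL_REFL, boxcar_srf(HYPERION_NM, MODIS_NIR))
print(round(ndvi(red, nir), 3))  # 0.132
```

Swapping the boxcar weights for tabulated SRF values at each Hyperion band centre leaves band_equivalent unchanged.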
2.2.1. Fg calculation model
The G–I approach (Gutman and Ignatov 1998) was used to calculate Fg based on the dimidiate pixel model, which assumes that each pixel is composed of only two components, vegetation and nonvegetation, and that the pixel’s spectral signal is a linear mixture of the two, each weighted by its proportional area in the pixel. The proportional area of vegetation is the Fg of the pixel:

$$F_g = \frac{NDVI - NDVI_{soil}}{NDVI_{veg} - NDVI_{soil}}$$

where Fg is the green vegetation fraction of the mixed pixel, NDVI is the NDVI of the mixed pixel, NDVIsoil is the NDVI of bare soil, and NDVIveg is the NDVI of each land-cover type at 100% green vegetation cover. This dimidiate pixel model is thus a linear mixing model for NDVI, and the accuracy of its results depends on the values of NDVIsoil and NDVIveg.
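As a minimal sketch, the G–I model Fg = (NDVI − NDVIsoil)/(NDVIveg − NDVIsoil), where NDVIsoil is the bare-soil endmember and NDVIveg the full-vegetation endmember, reduces to a one-line function. The clipping to [0, 1] mirrors the post-processing used in this study (negative values set to zero, values above one set to one); the function name and example inputs are hypothetical.

```python
def green_fraction(ndvi, ndvi_soil, ndvi_veg):
    """G-I linear mixing: (NDVI - NDVIsoil) / (NDVIveg - NDVIsoil), clipped to [0, 1]."""
    if ndvi_veg <= ndvi_soil:
        raise ValueError("ndvi_veg must be larger than ndvi_soil")
    fg = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    # Negative values are set to 0 and values above 1 are set to 1.
    return min(1.0, max(0.0, fg))


print(green_fraction(0.45, 0.05, 0.85))
print(green_fraction(0.03, 0.05, 0.85))  # 0.0
print(green_fraction(0.90, 0.05, 0.85))  # 1.0
```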
2.2.2. NDVIveg determination
We used the method described by Zeng et al. (2000) to calculate NDVIveg for each IGBP and ChinaCover land-cover type, since it was validated against Fg estimates from 1- and 2-m satellite data and has been widely applied for global Fg mapping (Wu et al. 2014; Broxton et al. 2014). First, we calculated the annual maximum NDVI image for each year from 2000 to 2010, pixel by pixel. Second, histograms of the annual maximum images were calculated for each land-cover type in the corresponding years. Third, NDVIveg was taken as the 90th percentile for closed shrublands and for urban and built-up lands, and as the 75th percentile for other vegetation types; NDVIveg for open shrublands and for barren or sparsely vegetated land was set equal to that of closed shrublands. In this way, NDVIveg for each IGBP and ChinaCover category was determined from the land cover–specific statistics of the annual maximum NDVI distribution.
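The percentile procedure for NDVIveg (the full-vegetation NDVI endmember) can be sketched as below: take each pixel's annual maximum NDVI, pool the maxima by land-cover class, and take the class-specific percentile, with open shrubland and barren land borrowing the closed-shrubland value. This is an illustrative sketch only; the class labels, the linear-interpolation percentile convention, and the toy NDVI values are assumptions, not the study's data or code.

```python
from collections import defaultdict

# Percentile per class following the rules described above; labels are hypothetical.
PERCENTILE_BY_CLASS = {"closed_shrubland": 90, "urban_builtup": 90}
DEFAULT_PERCENTILE = 75
USE_CLOSED_SHRUBLAND = ("open_shrubland", "barren_sparse")


def percentile(values, q):
    """q-th percentile with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def annual_max_by_pixel(ndvi_by_pixel):
    """Pixel-by-pixel maximum over one year's composites."""
    return {pixel: max(series) for pixel, series in ndvi_by_pixel.items()}


def ndvi_veg_by_class(pairs):
    """pairs: (land_cover_class, annual_max_ndvi), pooled over all years."""
    pooled = defaultdict(list)
    for cls, value in pairs:
        pooled[cls].append(value)
    veg = {cls: percentile(vals, PERCENTILE_BY_CLASS.get(cls, DEFAULT_PERCENTILE))
           for cls, vals in pooled.items()}
    if "closed_shrubland" in veg:
        for cls in USE_CLOSED_SHRUBLAND:
            veg[cls] = veg["closed_shrubland"]
    return veg


# Hypothetical example: three pixels, one year of composites.
ndvi_2010 = {"p1": [0.30, 0.80, 0.60], "p2": [0.20, 0.70, 0.50], "p3": [0.10, 0.40, 0.30]}
land_cover = {"p1": "cropland", "p2": "cropland", "p3": "closed_shrubland"}
maxima = annual_max_by_pixel(ndvi_2010)
veg = ndvi_veg_by_class((land_cover[p], m) for p, m in maxima.items())
print(veg)
```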
2.2.3. NDVIsoil determination
Two NDVIsoil settings were compared. The first was the popular value of 0.05 proposed by Zeng et al. (2000) and widely applied in Fg calculations (hereafter referred to as constant NDVIsoil). The second was derived for each soil group from statistics of bare-land Hyperion NDVI combined with the soil type map. Bare-land pixels in the Hyperion data were identified in two steps: first, the bare land class of ChinaCover 2010 was extracted as a background mask; then, an NDVI threshold (0.001–0.25) was applied within this mask to retain pure bare-land pixels. NDVIsoil for each soil group was then defined as the average Hyperion NDVI of the bare-land pixels in that group (hereafter referred to as dynamic NDVIsoil).
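The dynamic NDVIsoil (bare-soil NDVI endmember) rule above can be sketched as a filter and a per-group mean. A minimal sketch, assuming each Hyperion pixel has already been tagged with whether it lies in the ChinaCover bare-land mask and with its soil group; the group labels and NDVI values are hypothetical. The fallback for groups without Hyperion coverage (the mean of all retrieved values) follows the handling described in the uncertainty discussion.

```python
from collections import defaultdict
from statistics import mean

# NDVI window used to keep pure bare-land pixels inside the bare-land mask.
NDVI_MIN, NDVI_MAX = 0.001, 0.25


def ndvi_soil_by_group(pixels):
    """pixels: (in_bare_land_mask, soil_group, hyperion_ndvi) tuples."""
    groups = defaultdict(list)
    for in_mask, soil_group, value in pixels:
        if in_mask and NDVI_MIN <= value <= NDVI_MAX:
            groups[soil_group].append(value)
    return {group: mean(values) for group, values in groups.items()}


def fill_missing(ndvi_soil, all_groups):
    """Give soil groups without Hyperion coverage the mean of the retrieved values."""
    fallback = mean(list(ndvi_soil.values()))
    return {group: ndvi_soil.get(group, fallback) for group in all_groups}


# Hypothetical pixels: (inside bare-land mask?, soil group, Hyperion NDVI).
pixels = [(True, "group_A", 0.08), (True, "group_A", 0.12),
          (False, "group_A", 0.05), (True, "group_A", 0.40),
          (True, "group_B", 0.05), (True, "group_B", 0.0)]
soil = fill_missing(ndvi_soil_by_group(pixels), ["group_A", "group_B", "group_C"])
print(soil)
```

In the example, the pixel outside the mask, the pixel above 0.25, and the pixel at 0.0 are all discarded before averaging.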
2.2.4. Fg validation
The Fg product was validated against high-resolution imagery from Google Earth Pro (Google, Inc.) at 400 locations with different land-cover types (Figure 1). The high-resolution images were selected according to the following criteria: 1) the acquisition time was between May and September; 2) the image passed our initial quick-look visual assessment of Fg; and 3) the surrounding landscape was essentially homogeneous over approximately a 2 × 2 window of MODIS 250-m pixels. The third criterion was included to reduce the effects of coregistration errors between the MODIS and Google Earth high-resolution images.
The Fg pixel whose centroid was closest to each location selected in Google Earth was identified, and the boundary of each pixel, geolocated from its four corners, was overlaid on the Google Earth imagery. The pixel boundaries were converted to keyhole markup language (KML) files and displayed in Google Earth, and the high-resolution image corresponding to each validation pixel was saved from Google Earth Pro at a nominal spatial resolution of 2.5 m. The acquisition time of each image was also recorded.
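Writing a pixel boundary to KML is a small formatting task; the sketch below builds a minimal KML 2.2 document with one Placemark polygon from four corner coordinates. It assumes the corners have already been converted from the Albers grid to longitude/latitude (that projection step is not shown), and the corner values and names are hypothetical, chosen to span roughly 250 m.

```python
from xml.sax.saxutils import escape


def ring_coordinates(corners):
    """Format four (lon, lat) corners as a closed KML coordinate string."""
    if len(corners) != 4:
        raise ValueError("expected four corners")
    ring = list(corners) + [corners[0]]  # KML LinearRings repeat the first vertex
    return " ".join(f"{lon:.6f},{lat:.6f},0" for lon, lat in ring)


def pixel_boundary_kml(name, corners):
    """Return a minimal KML document with one Placemark outlining a pixel."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Placemark>\n"
        f"    <name>{escape(name)}</name>\n"
        "    <Polygon><outerBoundaryIs><LinearRing>\n"
        f"      <coordinates>{ring_coordinates(corners)}</coordinates>\n"
        "    </LinearRing></outerBoundaryIs></Polygon>\n"
        "  </Placemark>\n"
        "</kml>\n"
    )


# Hypothetical corner coordinates (lon, lat) of one ~250-m validation pixel.
corners = [(116.0000, 40.0000), (116.0029, 40.0000),
           (116.0029, 39.9977), (116.0000, 39.9977)]
print(pixel_boundary_kml("validation pixel 001", corners))
```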
First, the land-cover type of each image was determined by careful visual interpretation of the high spatial resolution image. Then, a segmentation algorithm in ENVI 5 (Exelis Visual Information Solutions, Inc.) was applied to divide each image into homogeneous objects. We visually merged the objects corresponding to green vegetation and calculated the green vegetation fraction for each validation location as the proportion of the pixel area covered by these objects.
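Once the segments inside a validation pixel are labeled, the reference Fg is an area ratio. A minimal sketch, assuming the segments have been clipped to the pixel boundary and labeled green or not by visual interpretation; the segment areas are hypothetical.

```python
def reference_fg(segments, pixel_area_m2):
    """Fraction of a validation pixel covered by green-vegetation segments.

    segments: (area_m2, is_green_vegetation) pairs for segments clipped to
    the pixel boundary; the labels come from visual interpretation.
    """
    total = sum(area for area, _ in segments)
    if total > pixel_area_m2 * (1 + 1e-6):
        raise ValueError("segment areas exceed the pixel area")
    green = sum(area for area, is_green in segments if is_green)
    return green / pixel_area_m2


# Hypothetical segments inside one 250 m x 250 m pixel (62 500 m^2).
segments = [(20_000, True), (15_000, False), (18_750, True), (8_750, False)]
print(reference_fg(segments, 250 * 250))  # 0.62
```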
2.2.5. Evaluation of the influence of NDVIveg and NDVIsoil determination
With either NDVIsoil or NDVIveg held fixed, Fg was calculated with each alternative for the other parameter and compared with the 400 validation samples. The root-mean-square errors (RMSEs) for all samples and for individual classes were then analyzed. The best combination of NDVIveg and NDVIsoil was applied to estimate Fg for all of China from 2000 to 2010. Pixels with negative values were set to zero, and pixels with values larger than one were set to one. Permanent snow, water, and unclassified pixels were masked.
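The accuracy comparisons in this study rest on two small formulas: the RMSE between estimated and reference Fg, and the relative RMSE reduction, 100 × (before − after) / before, which is how figures such as the drop from 0.2050 to 0.1584 (22.73%) are expressed. A minimal sketch; the function names are illustrative.

```python
from math import sqrt


def rmse(estimated, reference):
    """Root-mean-square error between estimated and reference Fg values."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(estimated))


def relative_reduction(rmse_before, rmse_after):
    """Relative RMSE reduction in percent: 100 * (before - after) / before."""
    return 100 * (rmse_before - rmse_after) / rmse_before


print(round(relative_reduction(0.2050, 0.1584), 2))  # 22.73
print(round(relative_reduction(0.19, 0.16), 2))      # 15.79
```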
3. Results
3.1. Values of NDVIveg and their influence on Fg calculation
The NDVIveg values are given in Table 1, and their spatial distribution is shown in Figure 2. There is little difference between IGBP and ChinaCover in the NDVIveg values determined. Deciduous broadleaf forests have the largest NDVIveg values, 0.90 and 0.89 for IGBP and ChinaCover, respectively, whereas grasslands have the smallest, 0.74 for IGBP and 0.74–0.76 for ChinaCover. Because ChinaCover distinguishes more vegetation types, one IGBP category corresponds to several ChinaCover categories, so NDVIveg values are assigned in more detail in ChinaCover. For example, NDVIveg for croplands is 0.84 in IGBP, whereas in ChinaCover it is 0.84 for paddy fields and 0.87 for dry land; NDVIveg for shrublands is 0.86 in IGBP, whereas in ChinaCover it is 0.89 for evergreen shrublands and 0.86 for deciduous shrublands. Thus, despite similar distribution characteristics, the NDVIveg values are more detailed in ChinaCover than in IGBP.
Assuming a constant NDVIsoil of 0.05, the NDVIveg values based on IGBP and ChinaCover were used to calculate Fg for China, and the two Fg results were validated against the Fg data of the 400 samples across China (Figure 3). Overall, Fg accuracy was slightly better with NDVIveg values from ChinaCover than from IGBP, with RMSE values of 0.1978 and 0.2079, respectively. However, a significant improvement was observed for samples with high vegetation coverage, owing to reduced overestimation. For samples with low vegetation coverage (Fg < 0.5), Fg calculations did not differ clearly between ChinaCover and IGBP land-cover data.
Table 2 shows the validation results by land-cover type. For both IGBP and ChinaCover, Fg accuracy is highest for forest, followed by cropland, grassland, and shrubland. Fg calculated with NDVIveg values from ChinaCover was more accurate for forest, grassland, and cropland, but not noticeably different for shrubland. The largest RMSE decrease occurred for grassland (a relative reduction of 6.90%), followed by forest (6.67%) and cropland (3.70%).
Higher RMSE values resulted when IGBP-based NDVIveg values were used to calculate Fg because of the inherent classification inaccuracies of the global IGBP land cover. Compared with the land-cover type of each validation sample, 94 of 200 forest samples were misclassified (mainly between different forest types), 16 of 31 shrubland samples were misclassified (mainly as forest or cropland), 13 of 19 grassland samples were misclassified (mainly as cropland), and 16 of 67 cropland samples were misclassified (mainly as grassland). In contrast, ChinaCover was fully consistent with the validation samples, which can be attributed in part to its higher accuracy where high spatial resolution imagery exists, since large amounts of ground truth data, including field samples and high spatial resolution images, were incorporated when producing ChinaCover 2010 (Zhang et al. 2014). Fg accuracy was also evaluated for each category using only the samples misclassified by IGBP. When ChinaCover was used instead of IGBP, the RMSE decreased from 0.19 to 0.16 (a relative reduction of 15.79%) for forest, from 0.23 to 0.21 (8.70%) for cropland, and from 0.29 to 0.28 (3.45%) for grassland, with no obvious difference for shrubland. Thus, land-cover accuracy was a key factor affecting the Fg calculation. For example, misclassifying grassland as cropland produces an obvious underestimation of Fg in the IGBP grassland category (Figure 4): because NDVIveg for grassland is significantly smaller than for cropland, the misassigned cropland NDVIveg enlarges the denominator of the G–I model.
3.2. Values of NDVIsoil and their influence on Fg calculation
The NDVIsoil values of different soil types are shown in Figure 2. NDVIsoil ranges from 0.006 to 0.2, with an average of 0.1, and most values exceed the constant value of 0.05. With ChinaCover-based NDVIveg values, constant and dynamic NDVIsoil were used to compare their influence on Fg calculations; the validation results of the two Fg estimates are shown in Figure 3. Fg errors were lower with dynamic NDVIsoil than with the constant value of 0.05, with RMSE values of 0.1918 and 0.1978, respectively.
The Fg validation samples covered 22 soil types, and the RMSE for each soil type is shown in Table 3. The RMSE was lower for 18 of the 22 soil types when dynamic NDVIsoil values were used; accuracy improved the most for castanozem (an RMSE decrease of 0.253), followed by purplish soil (0.051) and felty soil (0.045). For the remaining four soil types, Huanglu soil, Mian soil, fluvo-aquic soil, and cinnamon soil, the RMSE increased by 0.033, 0.008, 0.013, and 0.037, respectively.
Because Fg in areas of low vegetation cover is more sensitive to NDVIsoil determination, we validated the low vegetation cover samples (Fg < 0.5) separately. Figure 5 shows that the overestimation produced by a constant NDVIsoil is dramatically reduced in low vegetation cover areas when a dynamic NDVIsoil is adopted, and the RMSE decreases significantly from 0.2050 to 0.1584 (a relative reduction of 22.73%), showing that dynamic NDVIsoil performs better than constant NDVIsoil, especially in low vegetation cover areas.
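Why low-cover pixels are more sensitive can be made explicit by differentiating the G–I model, Fg = (NDVI − NDVIsoil)/(NDVIveg − NDVIsoil), with respect to the bare-soil endmember NDVIsoil. The worked numbers are illustrative choices, not results of this study: NDVIveg = 0.76 (the upper ChinaCover grassland value), and NDVIsoil = 0.05 versus 0.10 (the constant value and the mean dynamic value).

$$
\frac{\partial F_g}{\partial NDVI_{soil}} = \frac{NDVI - NDVI_{veg}}{\left(NDVI_{veg} - NDVI_{soil}\right)^{2}} = -\frac{1 - F_g}{NDVI_{veg} - NDVI_{soil}}
$$

$$
NDVI = 0.20:\quad \frac{0.20 - 0.05}{0.76 - 0.05} \approx 0.211 \quad \text{vs.} \quad \frac{0.20 - 0.10}{0.76 - 0.10} \approx 0.152
$$

$$
NDVI = 0.70:\quad \frac{0.70 - 0.05}{0.76 - 0.05} \approx 0.915 \quad \text{vs.} \quad \frac{0.70 - 0.10}{0.76 - 0.10} \approx 0.909
$$

The derivative is negative, so an underestimated NDVIsoil overestimates Fg, and its magnitude grows as Fg falls: in this example, lowering NDVIsoil from 0.10 to 0.05 raises Fg by about 0.06 at NDVI = 0.20 but by less than 0.01 at NDVI = 0.70.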
3.3. Fg distributions
The 16-day Fg for China from 2000 to 2010 was computed from MOD13Q1 250-m NDVI data, with NDVIveg calculated from ChinaCover and NDVIsoil calculated from Hyperion NDVI for different soil types. From the 16-day Fg images, we computed the annual maximum Fg image pixel by pixel for 2000 to 2010 (Figure 6). Fg clearly increases from west to east. Moreover, statistics show that 59.6% of all pixels in China have a high Fg (>60%), and 40.4% have a low Fg (≤40%). Apart from water bodies and settlements, most low-Fg areas are located in western China.
4. Discussion
4.1. NDVIveg determination and uncertainty
The most common method for NDVIveg determination is to calculate land-cover-specific percentiles. In this study, we adopted the same percentiles for each land-cover type as Zeng et al. (2000), who derived them from a global AVHRR 10-day NDVI composite by interpreting, from high-resolution images, the approximate percentage of fully vegetated area for each land cover in the conterminous United States. Because the study areas and NDVI datasets differ, these percentiles are unlikely to be optimal for China. However, we did not attempt to optimize them, since our main objective was to evaluate the effect of a more accurate land-cover dataset on the Fg calculation.
The NDVIveg values determined in this study are consistent with those found by Montandon and Small (2008) for the conterminous United States using the same NDVI dataset and the method of Zeng et al. (2000). Deciduous broadleaf forests have the largest NDVIveg (0.89) in both the conterminous United States and China, whereas grasslands have the smallest, 0.67 for the conterminous United States and 0.74–0.76 for China. This consistency indicates that our NDVIveg values are reasonable and comparable with other national-scale Fg studies. Although different land-cover datasets do not lead to large differences in NDVIveg values because of the large-scale statistics involved, the accuracy of land-cover data influences the Fg calculation significantly. For example, when grassland was misclassified as cropland, Fg was clearly underestimated, since NDVIveg for grassland is significantly smaller than that for cropland. Because ChinaCover has a higher spatial resolution (30 m) and is supported by several hundred thousand field samples, the Fg results using ChinaCover were better, especially for grassland misclassified in MCD12Q1. Therefore, a regional land-cover dataset such as ChinaCover makes the Fg calculation for China more reasonable than global land-cover datasets such as MCD12Q1, which has been criticized for its low accuracy in China (Ran et al. 2010).
4.2. NDVIsoil determination and uncertainty
The soil reflectance data from Hyperion show that soil NDVI is highly variable (0.006–0.2), and 79.36% of the study area has a soil NDVI much larger than the value commonly used in Fg models (0.05). When the constant value (0.05) is adopted, the resulting underestimation of NDVIsoil yields a significant overestimation of Fg, especially in areas of low vegetation coverage (Figure 5). Given this variability, a single NDVIsoil value for all of China is not appropriate, particularly for large-scale studies. Therefore, dynamic NDVIsoil values computed from Hyperion data for different soil types are much better for Fg calculation than the invariant value (0.05). Montandon and Small (2008) reached a similar conclusion by analyzing the impact of NDVIsoil determination on NDVI-based Fg calculation.
When no soil information is available, the most popular method of estimating NDVIsoil is to use the lowest historical NDVI values within the study area, assuming that the pixel with the lowest NDVI is free of vegetation (bare soil). However, this assumption is likely to be incorrect given the coarse resolution (250 m) of the MODIS NDVI dataset and the high vegetation cover in southeastern China. Therefore, using a high-resolution soil spectral dataset (e.g., the 30-m Hyperion data used in this study) to determine NDVIsoil is a more realistic approach, since the relatively high spatial resolution facilitates the extraction of pure soil pixels. Furthermore, this study incorporates a soil type classification map to better constrain NDVIsoil, making the localized NDVIsoil values more targeted and reasonable when no detailed soil spectral library exists.
The data and method used for NDVIsoil determination in this study involve several sources of uncertainty. Because of the limited coverage of Hyperion data, NDVIsoil could not be retrieved for three soil types, which were assigned the average of all NDVIsoil values (0.1). Temporal differences between Hyperion acquisitions are also likely to reduce comparability to some extent, because bare soil spectra change with soil moisture, which is highly variable seasonally, especially in eastern and southern China. Although bare-land pixels were carefully identified by combining ChinaCover information with an NDVI threshold, the defined “bare soil” inevitably included some other surface features; using the average NDVI value as NDVIsoil for each soil type reduced these potential misclassification effects to a certain extent. Finally, artificial surfaces were not treated separately in NDVIsoil determination, since they do not belong to any soil type in the existing classification system. However, the large spectral difference between artificial surfaces and natural soils would introduce additional Fg errors in urban areas.
4.3. Fg validation and its uncertainty
Because the validation samples are widely distributed across China, field investigations were impractical. High spatial resolution imagery from Google Earth circumvents this limitation and provides a means to “ground truth” MODIS-derived Fg values over large spatial extents. Admittedly, this evaluation carries some uncertainty. To maximize accuracy, we applied an object-oriented classification technique to each small patch (500 m × 500 m); this technique has proven effective and accurate for high spatial resolution images (Mathieu et al. 2007).
Additionally, the validation samples were not selected randomly; they focused on the main vegetation types, such as woodlands, shrublands, grasslands, and croplands, and depended on the availability of high spatial resolution data in Google Earth. Their distribution therefore involves some subjectivity. However, on the whole, the validation samples were representative, because they were evenly distributed across China and covered all the major vegetation types in different ecoclimatic regions. Since nonvegetated areas were not included in the validation samples, the validation results do not represent the overall precision of the Fg calculation, which is likely higher, because our approach produced very precise estimates for nonvegetated land covers such as bare land and water bodies.
5. Conclusions
Based on the MODIS MOD13Q1 NDVI product, we improved the NDVIveg and NDVIsoil estimates in the G–I model to make the Fg calculation more reasonable. Our results show that different land-cover datasets do not lead to large differences in NDVIveg values, but the accuracy of land-cover data influences the Fg calculation significantly. The regional ChinaCover dataset is preferred over the more commonly used MCD12Q1 dataset for Fg calculations in China. Furthermore, soil reflectance data from Hyperion show that soil NDVI is highly variable (0.006–0.2), and 79.36% of the study area has a soil NDVI much larger than the value commonly used in Fg models (0.05). When the constant value (0.05) is adopted, the underestimation of NDVIsoil yields a significant overestimation of Fg, especially in areas with low vegetation coverage. Therefore, dynamic NDVIsoil values that vary with soil type are much better for Fg calculation than the invariant value (0.05). Finally, a high-quality Fg dataset for China from 2000 to 2010 was produced using the optimal NDVIveg and NDVIsoil parameters and MOD13Q1 250-m NDVI data; it can better support regional carbon modeling, ecological assessment, and agricultural monitoring in China.
Acknowledgments
We acknowledge financial support from the Forestry Public Interest Research Program (201404422) and the National Natural Science Foundation of China (41361091). We thank all team members for providing the valuable ChinaCover product supported by the “National Ecological Environment Dynamic assessment based on Remote Sensing from 2000 to 2010.” Finally, we thank the editor and two anonymous reviewers whose comments helped to improve the quality of this paper.