Urban environments are characterized by high spectral and spatial heterogeneity and, as a consequence, most urban pixels in moderate-resolution imagery contain multiple land-cover materials. Despite these complexities, virtually all urban land cover can be generalized as a combination of vegetation, impervious surfaces, and soil (V–I–S components), in addition to water. Previous work has demonstrated the potential of multiple endmember spectral mixture analysis (MESMA) to model the subpixel abundance of V–I–S components. Here, the authors test whether the technique is sufficiently robust to map V–I–S components for a diverse set of cities, selecting 10 urban centers in the state of Rondônia, Brazil, to represent a range of populations, development histories, and economic activities. For each urban sample, a 20 km × 20 km region centered over the built-up area was subset from Landsat Enhanced Thematic Mapper Plus (ETM+) imagery. MESMA was applied to all subscenes using the same spectral library, model constraints, and selection rules. Accuracy of the modeled V–I–S fractions was assessed using high-resolution images mosaicked from digital aerial videography. Modeled fractions and reference fractions were highly correlated, with R2 values exceeding 0.75 for all materials in multiple cities across a region. Model complexity, or the number of endmembers required to accurately model each pixel, was correlated with the degree of human impact on the landscape. Built-up areas, as delineated by model complexity, exhibited a strong fit to the well-established relationship between the built-up area of a settlement and its population. Finally, this work demonstrates that the V–I–S components as modeled by MESMA can capture both inter- and intraurban variability, suggesting that these data products could contribute to comparative studies of urbanizing areas through time and across regions.
Urban land cover represents one of the most extreme forms of human impact on the natural environment, and urban areas are among the most rapidly changing forms of land cover on the planet (Douglas 1994; Ridd 1995; Collins et al. 2000; Sánchez-Rodriguez et al. 2005). This trend is expected to continue as more than half of the world’s population currently lives in urban areas, and the vast majority of future population growth is expected to be concentrated in cities (United Nations 1998; Sánchez-Rodriguez et al. 2005). Spatial patterns of urban growth result from the interplay of social systems and the physical environment (Douglas 1994; Weber 2001; Rashed et al. 2005). A first step in developing insights concerning the interaction between such systems is to monitor change in the distribution and composition of urban land cover; yet, currently few global datasets exist concerning the extent, shape, and/or composition of built-up land cover (Lepers et al. 2005).
Remote sensing imagery offers a fast, cost-effective, and consistent source of information about the evolution of urban land cover from intraurban to global scales. However, deriving meaningful and accurate measures to quantitatively characterize urban land cover remains a challenge in remote sensing applications because an urban area represents a complex landscape of built structures and human-modified land cover (Forster 1985; Zipperer et al. 2000). Globally, urban land cover varies in terms of density of habitation, structure of buildings, type of construction materials, abundance of vegetation, and number of open spaces, among other factors. As a result, there is no globally consistent spectral signature for urban land cover, and there is no straightforward means of comparing urban environments from different regions (Ridd 1995; Small 2005). Even within a single urban area, image analysis is complicated because the high spectral and spatial variability of built-up materials limit the utility of traditional classification approaches that assume the landscape is composed of discrete, homogeneous classes (Forster 1985; Small 2002).
Effectively monitoring urbanization on a landscape requires consistent measures to describe urban land cover. These measures should be specific enough to capture the spectral diversity of local materials, yet flexible enough to provide standardized measures of physical properties that can be compared through time and across regions. Ridd (Ridd 1995) proposed a theoretical framework to address these issues: the urban physical environment can be characterized in terms of three primary components—vegetation, impervious surfaces, and soil (V–I–S components)—in addition to water. Previous work has quantified the abundance of these components on a subpixel basis for moderate spectral and spatial resolution imagery using a technique known as multiple endmember spectral mixture analysis (MESMA). Rashed et al. (Rashed et al. 2003) successfully applied MESMA to Landsat Thematic Mapper (TM) data over Los Angeles, and Powell et al. (Powell et al. 2007) to Landsat Enhanced Mapper Plus (ETM+) data over Manaus, Brazil. The successful use of MESMA to map V–I–S components of two cities representing different levels of economic development and located in drastically different environments indicates the potential of the technique to generalize the physical components of urban land cover.
However, each of the two case studies optimized the application of MESMA for a single city on a single date, and thus a number of uncertainties remain in terms of the feasibility of applying the same approach and models to different regions or to different dates. First among these uncertainties is how well the same techniques for spectral library selection and MESMA implementation developed in either test case would perform in another area. In other words, to what degree is the strategy for implementing MESMA specific to the spectral and spatial properties of each city? A second uncertainty concerns the specificity required of the spectral library; that is, can a spectral library be derived for and successfully applied to urban areas across a large region, or must a library be developed independently for each urban area to be mapped? A third uncertainty is how well the application of V–I–S mapping can discriminate between built-up and nonbuilt-up land cover, especially in a landscape that has been highly impacted by human activity. Finally, we are ultimately interested in describing how urban landscapes evolve through time and in comparing the physical environments of urban areas from different regions of the world. Accomplishing this will require characterizing both the internal diversity of a single urban area and the interregional diversity between urban areas. The final goal of this paper, therefore, is to assess how well subpixel V–I–S mapping using moderate-resolution imagery can capture those differences.
To investigate these questions, we focused on the state of Rondônia, southwestern Brazil. Portuguese settlement of the region began in the mid-eighteenth century; however, the most important wave of modern settlement began in the early 1970s, with the opening of the Cuiabá-Porto Velho (BR-364) highway (Browder and Godfrey 1997), which bisects the state. Large tracts of land on both sides of the highway were appropriated for government- and corporate-sponsored colonization projects, the first initiated in 1971 (Browder and Godfrey 1997; Pedlowski et al. 1997). These events coincided with a period of agricultural mechanization and massive land consolidation in the south and southeastern regions of Brazil, and unprecedented numbers of displaced families migrated to Rondônia from those regions (Browder and Godfrey 1997; Perz 2002). The population of the state increased tenfold between 1970 and 1991, from just over one hundred thousand to well over a million (Perz 2002), and the population continued to increase through the 1990s by another quarter of a million inhabitants. Since 1980, the urban growth rate has been significantly higher than the rural growth rate (Perz 2002), and by the year 2000, over 64% of the population lived in urban areas, as defined by the national census (IBGE 2005). Along with the massive influx of colonists and subsequent population growth, the region experienced one of the highest deforestation rates in the Amazon, remaining high through the 1990s (Alves 2002).
The state of Rondônia provides an interesting environment to test the issues posed above for several reasons. First, the period of rapid settlement and urbanization in the region coincides with the 30+ year Landsat data archive (1970s to present); as a result, the trajectory of urbanization in the region has been in large part documented by satellite imagery, a situation that exists in few places in the world. Second, because the history of urban development has been so explosive, we propose that the population of an urban center can serve as a proxy for settlement age, and therefore inferences can be drawn about the trajectory of urban development in the region using a single date for analysis. Finally, while much effort has been devoted to studying rural land-cover change in Rondônia over the same time period, very little work has considered the relationship between urban areas and regional land-cover change (Browder and Godfrey 1997; Boucek and Moran 2004). Characterizing the physical environment of urban centers and their relationship to surrounding land cover is a first step in building this connection. For example, Roberts et al. (Roberts et al. 2002) noted that the landscapes immediately surrounding urban centers in Rondônia can follow quite different land-use trajectories, and they suggested that such differences may be linked to differences in the economies and physical infrastructures of urban areas.
2.1. Study sites and data
Ten urban centers in the state of Rondônia were included in this study (Figure 1); they were selected to represent a range of populations, development histories, and economic activities (Table 1). Broadly, they fall into four categories. The first category includes only the region’s primate city, Porto Velho, the original state capitol and the only city on a major riverine thoroughfare. The second category includes urban centers that emerged from telegraph posts established in 1909 during the first government-sponsored survey of the state—Ariquemes, Ji-Paraná, and Pimenta Bueno. The road built to place the telegraph lines became what is now the federal highway BR-364. The third category is represented by the only major commercial center in the state not located on the BR-364, Rolim de Moura, formerly a center of the mahogany extraction industry (Browder and Godfrey 1997). Finally, the remaining towns are located on small feeder roads of the BR-364; most were established as private or government agricultural colonies (de Lima 1990). The distribution of the 2000 urban population for all 52 county seats (sedes de município) in Rondônia is presented in Figure 2, with samples used in this study identified with numbers corresponding to Table 1. The sample was somewhat biased to cities with larger populations; however, this was deemed necessary because constructing a regional library of impervious spectra required a sufficiently large number of pixels dominated by built-up land cover.
Two sets of five Landsat Enhanced Thematic Mapper Plus (ETM+) scenes were used in this study (Figure 1; Table 1). The first set was used to retrieve apparent surface reflectance, and the second was selected to be as close as possible to the date of reference data collection for accuracy assessment. Surface reflectance was derived from the calibrated multispectral data using a radiative transfer model as implemented in the Atmospheric Correction Now (ACORN) software (ImSpec LLC 2002). All scenes were coregistered to 1998 or 1999 digital base maps provided by Brazil’s Instituto Nacional de Pesquisas Espaciais (INPE 2000). The five 1999/2000 scenes all had some degree of smoke contamination; therefore, a haze-correction algorithm was applied to bands 1–3 (Carlotto 1999). The haze-corrected images were then intercalibrated to the reflectance images using a relative radiometric calibration approach (Roberts et al. 1998a; Furby and Campbell 2001). For each urban sample, a 400-km2 region (670 × 670 pixels) centered over the built-up area was subset from the 1999/2000 reflectance images, and these subscenes were the basis of all further analysis.
Reference data were produced from digital aerial videography collected over the region on 22 June 1999, as part of the Validation Overflights for Amazon Mosaics (VOAM) surveys (Hess et al. 2002). The flight lines transected three of the urban areas in the sample: Ji-Paraná (number 2 in Figure 1), Rolim de Moura (number 4 in Figure 1), and Santa Luzia d’Oeste (number 8 in Figure 1). Wide-angle and zoom videography were collected simultaneously. Average swath width was approximately 0.9 km for the wide-angle videography and 65 m for the zoom; average ground instantaneous field of view (GIFOV) was approximately 1.5 m for the wide-angle and 0.3 m for the zoom. Typically, global positioning system (GPS) location and time code, aircraft altitude, and aircraft height data were encoded on each videography frame and used to automatically generate geocoded videography mosaics; however, on 22 June, the GPS instrument on board the aircraft was not functioning properly, and so georeferencing was not recorded as the data were collected. Nonetheless, it was still possible to estimate aircraft height and to generate freestanding mosaics, which were in turn manually georeferenced to the Landsat images.
2.2. Multiple endmember spectral mixture analysis
The basic assumption of spectral mixture analysis (SMA) is that the measured reflectance P′ for any pixel i can be modeled as the linear sum of N endmembers, or spectrally “pure” materials, weighted by the fraction fki of area within the pixel composed of endmember k (e.g., Adams et al. 1993; Roberts et al. 1998a). That is, for a given wavelength λ,
where eiλ is a residual term indicating the disagreement between the measured and modeled spectra. The modeled fractions are typically constrained to sum to correspond to physical reality:
Endmember spectra can be collected in the field or laboratory (reference endmembers) or extracted from an image (image endmembers). Constraints for selecting appropriate models for each pixel can be specified in terms of the range of endmember fractions, residuals for each wavelength, and the RMSE (Roberts et al. 1998b).
In a standard application of SMA, a fixed number of representative endmembers is selected and the entire image is modeled in terms of those spectral components. However, the success of this procedure can be limited because the selected endmember spectra may not effectively model all elements in the image, or a pixel may be modeled by endmembers that do not correspond to the materials located in its field of view. Both cases result in decreased accuracy of the estimated fractions (Sabol et al. 1992). These limitations of simple SMA are especially problematic in urban environments, which exhibit high degrees of spectral heterogeneity on fine spatial scales. A technique that addresses these limitations is MESMA, which allows the number and type of endmembers to vary on a per-pixel basis (Roberts et al. 1998b).
In this study, MESMA was implemented following the procedure detailed in Powell et al. (Powell et al. 2007). In summary, an endmember library was constructed from candidate image and reference endmembers. Then, a series of simple SMA models based on different combinations of library endmembers was applied to every pixel in the image, and the “best fit” model was selected for each pixel. The models were generalized into the land-cover components of interest (i.e., vegetation, impervious surface, soil), and an image of fractional coverage per pixel was generated for each component. How well the fraction images represented the actual fractions of each land-cover component was assessed by comparing the modeled fractions to fractional cover measured from classified aerial videography. Agreement between modeled fraction cover and reference fraction cover was used to refine the combinations of endmembers allowed for SMA modeling, as well as to select the most appropriate constraints and model selection rules.
2.3. Library construction
Arguably the most critical step in implementing MESMA is the selection of the endmember library (Tompkins et al. 1997; Dennison and Roberts 2003). In this case, we followed an adapted version of the steps applied in the Manaus case study (Powell et al. 2007) and summarized in Figure 3. First, a collection of candidate reference and image endmembers was compiled, and endmembers were organized into four groups of materials: green vegetation (GV), nonphotosynthetic vegetation (NPV)—that is, dry or senesced vegetation—soil, and impervious surfaces. Previous applications of SMA in the region had identified that the vegetation class needed to be divided into two spectrally distinct groups—green and senesced vegetation—to adequately capture the variability of the landscape (Adams et al. 1995; Roberts et al. 2002). Second, for each class of materials, a subset of endmembers was selected that best represented the class in the library collection. Third, from this subset, endmembers were selected that best represented materials within each scene. Each step is discussed in more detail below.
2.3.1. Endmember collection
Reference endmembers included in the Rondônia library were identical to those used in the Manaus endmember library collection; see Powell et al. (Powell et al. 2007) for details concerning the collection of field and laboratory spectra. Image endmembers were collected by applying a pixel purity index (PPI) to all 10 subscenes (Boardman et al. 1995). Pixels identified as extreme by the PPI were visually inspected, and the spectra of those that could be confidently classified as one of the four materials of interest were collected. Water pixels, identified by applying a threshold to Landsat band 7, and pixels that were saturated in any band were excluded from the PPI evaluation. The resulting endmember collection consisted of 627 endmembers, including 301 image and 326 reference. The number of endmembers in the library collection grouped by material class was as follows: 228 GV, 87 NPV, 136 impervious, and 176 soil (Figure 3a).
2.3.2. Endmembers representative of library
The next step in library construction was to identify which spectra were most representative of their material class. We followed a procedure developed by Dennison and Roberts (Dennison and Roberts 2003) in which each spectrum in the library—plus photometric shade—is used to unmix every other spectrum in the library. This resulted in 627 unique two-endmember models for each spectrum, and the RMSE for all models was recorded. To rank spectra in terms of how well they represented each material class, endmember average RMSE (i.e., EAR) was calculated (Dennison and Roberts 2003). For each spectrum, EAR is the average RMSE for that spectrum unmixing all other spectra within its class. For each class, the spectrum with the lowest EAR was considered the endmember that best represented that class.
A preliminary endmember library was constructed on the basis of EAR alone. For each material class, the spectrum associated with the lowest EAR was selected. That spectrum and all other spectra that were modeled by it were removed from the library collection and EAR was recalculated. The spectrum most representative of the remaining spectra in its class was added to the preliminary library. This was repeated until three endmembers per class had been selected (Figure 3b). This preliminary library of 12 endmembers was applied to model all of the subscenes. Based on visual inspection of the results, the GV and NPV material classes appeared to be well represented by the library; that is, areas dominated by GV and NPV such as forest and pasture were well modeled. However, areas expected to have high soil and impervious fractions were poorly modeled or not modeled at all.
2.3.3. Endmembers representative of image materials
The spectrum that best represents its class within the library does not necessarily account for the spectral diversity of materials within the image (Song 2005). In the Manaus case study (Powell et al. 2007), a link was established between endmembers representative of the library and endmembers representative of materials on the ground by running a series of two-endmember spectral mixture models for each class of materials. Criteria were established based on the number of pixels in the image successfully modeled by each two-endmember model. However, when applied to Rondônia subimages, very few image pixels were modeled by two-endmember soil or two-endmember impervious models and, as a result, the same strategy could not be implemented in this environment.
To gain insight as to why the previously successful endmember selection strategy failed in this case, we visually inspected the high-resolution reference data constructed from zoom videography. A 30-m grid was overlaid on each mosaic, and it was noted that almost every grid cell within the built-up areas included surface materials from all three classes of interest: vegetation (GV or NPV), soil, and impervious. A large portion of the 30-m grid cells had no majority land cover, and almost no cell contained more than 50% impervious cover. Clearly, a new strategy for linking library spectra to image materials was needed. The strategy developed to address this issue depended on two general steps. The first step was to rank library spectra in terms of their importance in modeling materials in the subscenes (Figure 3c). The second step was to determine how many spectra of each material type were required to adequately capture the diversity of materials on the ground. This was determined by finding the number of endmembers that maximized accuracy (Figure 3d).
The basic library of the top three EAR endmembers per material class was used as a starting point for ranking the endmembers in terms of their importance for capturing the spectral diversity of subscenes. The endmembers for GV and NPV were deemed acceptable based on preliminary MESMA application to the study area, so the goal was to identify more appropriate soil and impervious endmembers. For each class independent of the other, the next iteration of EAR calculations was applied, the selected spectrum was added to the library, and MESMA was reapplied to the three subscenes associated with the reference data. Because soil and impervious spectra generally occur at the pixel scale in four-endmember models, MESMA was applied to the image using only four-endmember models, listed in Table 2. At each iteration, the correlation between modeled fractions and reference fractions was assessed (Figure 3c). This procedure was repeated until accuracy statistics started to decline. For the last version of the library, the number of times each soil or each impervious spectrum was included in a successful mixing model was tabulated (Figure 4). The spectrum most frequently included in the four-endmember models was assumed to be the most important in terms of representing materials in the study areas.
The next step was to determine how many spectra from each class were needed to sufficiently represent materials in the scene, while minimizing confusion between impervious and soil (Figure 3d). This was accomplished by creating a new version of the base library: the “best” spectra for GV and NPV were based on EAR alone, while the most representative spectra for soil and impervious were determined based on the frequency they were used by four-endmember models. For example, for the soil class, endmembers 6, 5, and 2 were deemed most representative because they were the spectra most frequently included in four-endmember models (Figure 4), and endmembers were similarly selected for the impervious class. Next, MESMA was applied using this revised endmember library to the three subscenes with reference data, and accuracy statistics computed (see section 2.6 below). These served as the new baseline for accuracy assessment. To identify impervious candidates, spectra were sequentially added to the base library in the order of importance as determined by their inclusion in the four-endmember models. Each time a new endmember was added to the library, MESMA was applied to the subscenes, and accuracy statistics recorded. The same procedure was applied to soil spectra, independent of the impervious spectra. Finally, the procedure was applied again by simultaneously adding one impervious and one soil at each step. The accuracy statistics were compared for each iteration, and the most appropriate number of soil and impervious spectra was selected to correspond with the highest accuracy MESMA output.
2.4. SMA models
Two-, three-, and four-endmember spectral mixture models were applied to all of the subimages and compared to determine the best-fit model for each pixel in terms of number and type of endmembers. Allowed combinations of material classes were those listed in Table 2; the four-endmember model combinations were based on the analysis of reference data described above. All models included an endmember representing photometric shade to effectively account for variations in brightness of surfaces due to viewing angle, topography, and other forms of shading (Okin et al. 1999; Dennison and Roberts 2003). Dark surfaces were therefore modeled as a small bright endmember fraction and high shade fraction. All possible permutations of the final endmember library spectra for each model in Table 2 were tested for every pixel. Candidate models were selected based on three constraints. First, a minimum RMSE threshold of 2.5% reflectance was set for all models, the value most commonly accepted in the literature (e.g., Roberts et al. 1998b; Okin et al. 2001; Dennison and Roberts 2003). Second, the bright fraction values were constrained between 5% and 100%; and third, the shade fraction values were constrained between 0% and 55%. These fraction constraints were empirically determined based on the accuracy of MESMA output, and were optimized to eliminate spurious bright endmember fractions, following the work of Okin et al. (Okin et al. 1999).
For each pixel, the best model for each degree of model complexity (i.e., two endmember, three endmember, or four endmember) was identified by comparing all models that met the constraints above and selecting the model with the lowest RMSE, that is, the assumed best fit (Painter et al. 1998; Okin et al. 1999). If no model met all constraints, the pixel was left unmodeled. At this stage, each pixel was associated with up to three potential models, and each equally valid based on the criteria developed thus far. The next step, therefore, was to determine the most appropriate model in terms of model complexity. Several methods of comparing models of different degree were tested in this study, which provided an interesting framework to investigate the best method for model selection. Because all four-endmember models included a material type—impervious—that was a priori expected to be spatially constrained, the effectiveness of model selection rules could be initially assessed based on visual inspection alone and more rigorously evaluated by comparing modeled fractions to fractions derived from the reference data for three of the study areas.
The first method tested for selecting the best degree model was to rank models by RMSE, that is, to select the model with lowest RMSE (e.g., Painter et al. 1998). While this is a logical way to compare models of the same degree, it becomes problematic when comparing models of different degrees, because adding endmembers to the SMA model will almost always result in a reduction of error (Okin et al. 1999; Dennison and Roberts 2003), and the model with highest complexity is favored. The second method tested simply selected the model with lowest complexity; that is, the model with the lowest number of endmembers was given priority. This selection rule was effectively applied in the case study of Manaus (Powell et al. 2007), but when applied to the Rondônia samples the result did not adequately represent areas with impervious surfaces, because such areas tended to be highly heterogeneous at a subpixel scale.
Finally, we tested a combination of the two previous methods, giving priority to the lowest degree model, but considering a higher degree model if the RMSE was significantly lower, a strategy that has been successfully implemented for hyperspectral data (Halligan 2002; Dennison and Roberts 2003; Roberts et al. 2003). To determine the value of a “significant difference” in RMSE, a range of differences was tested, from Δ0.1% to Δ1.0% reflectance, while the maximum allowed RMSE remained constant at 2.5% reflectance. The fractions resulting from models selected in each test run were compared in terms of accuracy statistics and visual inspection to assess whether the spatial distribution of four-endmember models was reasonable. The RMSE difference which best captured urban extent and minimized “speckle” of impervious fractions in the nonbuilt-up areas was determined to be 0.25%.
In summary, model selection followed a three-step strategy. 1) Two types of fraction constraints were imposed on all models to minimize spurious nonshade fractions: (i) a minimum cutoff for nonshade fractions and (ii) a maximum cutoff for the shade fraction. 2) For a given model complexity, the model with lowest RMSE was selected. 3) Models of different degrees were compared in terms of RMSE, and models of higher complexity were selected only if the RMSE was significantly lower than the model(s) of lower complexity. In this study, a difference of more than 0.25% (or 10% of the maximum allowed RMSE) was required to meet this criterion.
The final product of this analysis was a set of fractional abundance maps for each class of materials (i.e., vegetation, impervious surfaces, soil, and water). Because shade was not considered a land-cover component but rather a variant on endmember brightness, the fractions of each pixel were shade normalized (Adams et al. 1993). For two-endmember models, the resulting shade-normalized fraction was 100%. After shade normalization, the endmembers were relabeled as their corresponding generalized class, and the GV and NPV fractions were combined into a single vegetation class. These shade-normalized, class-generalized fractions were combined to generate an image of each land-cover material, with values representing the physical abundance of that material. Pixels included in the water mask were assigned a fraction value of 100% water.
2.6. Accuracy assessment
Accuracy assessment was complicated by the lack of georeferencing typically associated with the videography due to malfunctioning GPS equipment the day of the flight over the study area. However, freestanding mosaics were generated for both the wide-angle and zoom videography using frame-matching software to build the mosaics (Hess et al. 2002). In total, five wide-angle mosaics were generated, two each for the larger urban areas (Ji-Paraná and Rolim de Moura) and one for Santa Luzia d’Oeste. The corresponding zoom data were mosaicked into 15 ten-second mosaics. All data that included built-up land cover were retained for analysis. Wide-angle mosaics were manually georeferenced to the Landsat subscenes; because of the grid structure of city streets, selecting ground control points (GCPs) was a relatively straightforward task. Zoom mosaics, in turn, were manually georeferenced to the wide-angle mosaics, using street intersections and corners of structures as GCPs.
A second factor complicating accuracy assessment was that the spectral contrast of the wide-angle data was not sufficient to confidently distinguish between classes. In particular, it was extremely difficult to differentiate between soil and NPV and some types of impervious surfaces; as a result, the zoom mosaics were considered the highest-quality data for accuracy assessment. The goal of our accuracy assessment was to compare the fractional abundance of each material class as predicted by spectral mixture analysis with the fractional abundance of each material class as measured from the videography. Because of the very high spatial resolution of the videography, the most efficient means of measuring the fractional abundance of materials was to segment the videography mosaics using a multiresolution segmentation algorithm as implemented in eCognition software (Baatz et al. 2003). Supervised classification was then applied to classify each segment as vegetation, impervious, soil, or water. Classified segments were visually inspected for accuracy, and errors were manually corrected.
Average width of the zoom mosaics was equivalent to 2.5 Landsat pixels, resulting in a very limited sample of the built-up areas, and sharp bends caused by aircraft tilt and pitch further limited the amount of usable information from each mosaic. To maximize the usable reference data, regions of interest (ROIs) were drawn along straight portions of the zoom mosaics, enclosing as large a contiguous area as possible. The vertex coordinates for the ROI polygons were extracted, and corresponding ROIs were located on the ETM+ subscenes. Because Landsat pixels are of a fixed resolution, the areas delineated on the image subscenes and on the video mosaics were not exactly the same; the image ROI always included more area than that delineated by the corresponding videography ROI, but effort was taken to minimize the bounding area on the Landsat imagery. A total of 41 ROIs were selected. The mean ROI size on the Landsat image was equivalent to 17.3 pixels, with a maximum area of 38 pixels and a minimum of 6 pixels. The average sample size was therefore approximately equivalent in area to a 4 × 4 pixel window. For each ROI, the fractions of materials present were extracted from the classified videography and from the corresponding area on the modeled Landsat image. Unmodeled pixels were excluded from average fraction calculations.
Accuracy measures were based on the correlation between modeled fractions and reference fractions from the videography. The accuracy of modeled fractions was assessed by the slope, intercept, and R2 values associated with the relationship. In an ideal case, when the modeled data perfectly represent the reference data, the slope of the relationship would equal one, the intercept zero, and the R2 value would approach one. Two types of error measurement were also used to evaluate the range of disagreement between fraction estimates: mean absolute error and bias. Mean absolute error (MAE) is the average absolute value of the difference between modeled and measured fraction values, while bias is the average of the error, indicating general trends in over or underestimation (Schwarz and Zimmermann 2005). Those equations are given as
where Z′ki is the is the modeled fractional value of land-cover component k measured at pixel i, Zki is the reference fractional value, and n is the number of samples.
3. Results and discussions
3.1. MESMA library and models
The final endmember library consisted of 14 endmembers in addition to photometric shade (Figure 5). The GV, NPV, and soil material classes each contained three endmembers, while the impervious class contained five endmembers. Two of the NPV endmembers were reference spectra; all other endmembers were image spectra. The final library included image spectra that had been derived from all five Landsat scenes. Allowed MESMA models consisted of all possible permutations of the generalized combinations listed in Table 2. This resulted in 14 two-endmember models, 72 three-endmember models, and 90 four-endmember models.
When the results for the 10 study sites were assessed in aggregate, 99.5% of the land pixels were successfully modeled (Table 3). A total of 10.7% of nonwater pixels in the study area were modeled by two-endmember models, 85.8% were modeled by three-endmember models, and 2.97% were modeled by four-endmember models. Additionally, per-pixel model complexity was mapped for each subscene, and this resulted in several interesting patterns. Pixels modeled by four-endmember models were generally concentrated in the urban centers, and as a result could be used to delineate the urban extent, as discussed below. Four-endmember models were also successful at highlighting the major roadways. Areas dominated by two-endmember models tended to be intact portions of the original landscape, that is, primary forest or natural savanna. Finally, human-altered portions of the landscape that were not built up (i.e., not human settlements) tended to be dominated by three-endmember models. Such areas included pastures, cropland, and some second-growth forest.
Maps of model complexity are shown for two of the samples: Ji-Paraná (Figure 6a), the second-largest city in the state, with a year 2000 population of approximately 91 000; and Seringueiras (Figure 7a), ranked 30th in the state based on a year 2000 population of approximately 4000. In both cases, four-endmember models—displayed in white—clearly delineated the built-up area of the urban center. The landscape around Ji-Paraná consisted primarily of pasture, with some second-growth forest, and these areas were mapped as three-endmember models—displayed in green. The few patches of primary or very old second-growth forest that remained were highlighted by two-endmember models—displayed as dark purple. The landscape around Seringueiras—a much younger settlement located on the fringes of development linked to the BR-364 (Figure 1)—still retained some sizable tracts of primary forest, and these areas were highlighted by a high concentration of two-endmember model pixels.
Summaries of model complexity for each subscene can serve as rough proxies for the degree of human impact on the landscape, as well as the type of impact (i.e., urban versus nonurban). Figure 8 displays the percent of each 400-km2 sample area modeled by each degree of model complexity, in order of ascending urban population. For all samples, three-endmember models dominated the landscape, indicating that the majority of the originally forested landscape around these urban centers had been cleared for crops or pasture. The graph representing two-endmember model abundance indicates the relative amount of natural land cover that was present in each subscene, with Ji-Paraná (number 2) and Espigão d’Oeste (number 7) having the least amount of unaltered land cover and Buritis (number 6) and Ariquemes (number 3) having the largest intact areas of primary forest. Similarly, the graph of four-endmember model abundance is roughly correlated with the size of the urban center. The number of four-endmember model pixels increased with population size, with the exception of Buritis and Ariquemes. A point of future research would be to investigate why these urban centers do not follow the general trend, either because of social and/or economic factors driving their development, or due to nonrepresentative impervious surface spectra in the endmember library.
To gain a general sense of how well V–I–S characterization of urban environments using MESMA captured the extent of built-up land cover, a well-established relationship between the built-up area of a city and its corresponding population was used. Nordbeck (Nordbeck 1965) and Tobler (Tobler 1969) demonstrated that the built-up area (A) of a settlement can be estimated with a high degree of accuracy if its population (P) is known, given the following allometric relationship:
The constants a and b are empirically determined, and relate to the nature of urbanization in a given region. For each sample in this study, the area with the highest concentration of four-endmember models was manually delineated, and this area was considered the “urban extent.” The log-linear relationship between the measured urban area and the urban population reported in the 2000 Brazilian census is presented in Figure 9. The strength of the relationship (R2 = 0.989) indicates that the four-endmember models sufficiently captured the built-up area of the urban environments sampled. Furthermore, the slope and intercept values associated with the relationship—1.108 and 7.390, respectively—are very similar to the relationship calculated by Sutton et al. (Sutton et al. 2001) for the entire country of Brazil. The latter study compared 85 urban clusters with 1997 population estimates, and obtained slope and intercept values of 1.186 and 7.446, respectively.
3.2. Fraction maps
Maps of the generalized fractions are shown for the Ji-Paraná and Seringueiras samples in Figures 6b–d and 7b–d. Bright areas represent higher fractions and dark areas lower fractions, while black pixels indicate absence of the material. In both sets of images, the impervious component is concentrated in the built-up urban center, and is also high along the major roads. The vegetation component is almost the inverse of the impervious component, with very low fractions in the urban centers and along roadways. Low vegetation fractions also occur in some pasture areas, especially in the upper-left corner of the Seringueiras sample, where there is a large tract of pasture that is dominated by soil. The soil fraction is an important component of urban land cover in this region, as indicated by the high soil fractions in the urban centers, even in the older and much larger city of Ji-Paraná. The soil component is also high in pasture areas, and along roadways, and remains quite low in the tracts of primary forest.
3.3. Accuracy assessment
The correlation between reference fractions and modeled fractions can be described in terms of slope, intercept, coefficient of determination (i.e., R2 value), MAE, and bias (Figure 10). The modeled vegetation fraction for each pixel is the sum of the shade-normalized GV and NPV (i.e., senesced vegetation) fractions. Correlation between reference and modeled fractions was highest for vegetation, with a slope of 0.868 and an R2 of 0.882, and lowest for impervious fraction, with a slope of 0.747 and R2 of 0.769. The soil fraction fell in between, with a slope of 0.805 and an R2 of 0.760. For all classes, both the intercept and the mean absolute error were close to zero, corresponding to less than 8% fractional cover (Table 4); these results fell well within expected error because the precision of estimating per-pixel land cover is limited by the modulation transfer function (MTF) of the Landsat sensor (Forster 1985; Townshend et al. 2000; Huang et al. 2002).
We should note here that agreement between reference and image fractions is relatively high in part because the process of endmember selection for the MESMA spectral library was circuitous; that is, one criterion for selecting endmembers for the spectral library was optimization of accuracy. This strategy is similar to the process Roberts et al. (Roberts et al. 1998a) employed to select image and reference endmembers to map another region of the Amazon; they based endmember selection on minimizing fraction and RMSEs and maximizing agreement with anticipated fraction combinations. While ideally an independent set of reference data would be used to report the accuracy statistics of the final products, in this case, we were limited by the availability of reference data. Though the final statistics reported are based on the same reference data used to prioritize selection of endmembers, the final results produced reasonable fractions for other subscenes not used for endmember selection, providing independent—albeit qualitative—support for the high accuracy reported.
Analysis of the residuals in terms of the bias can indicate whether there is a trend of over- or underestimation of modeled fractions for a given class. In this study, the bias for all classes was below 5% (Table 4). The impervious and vegetation fractions had slight positive biases, indicating general overestimation of the modeled fractions relative to the reference fractions, while soil had a negative bias, indicating consistent underestimation of the modeled fractions. The overestimation of impervious and vegetation fractions was approximately equal to the underestimation of the soil fractions, a result of the SMA constraint that forces the fractions for any pixel to sum to 100%.
A graph of the bias for each sample as a function of the reference fraction revealed patterns of over- and underestimation (Figure 10); for all materials, small fractions were somewhat overestimated while large fractions were somewhat underestimated. This pattern explains why the slope for all materials was less than 1.0, and the intercept greater than zero. Specifically, the vegetation residuals were fairly evenly distributed, though they were more consistently overestimated for fractions less than 50%. The impervious residuals were the most tightly clustered around zero, and were generally overestimated for fractions less than 30% and underestimated for fractions greater than 30%. There were no impervious fractions—reference or modeled—greater than 50%. The modeled soil fractions produced the largest residuals, and were generally underestimated for fractions greater than 40%.
Visual inspection of the MESMA output for all of the samples revealed several sources of error. First, unmodeled pixels could be placed into three general categories: 1) edge pixels between land and water, 2) recently burned (i.e., charred) pastures or fields, and 3) impervious pixels in the central business districts of Porto Velho and Ji-Paraná that appeared to correspond to blacktop roads. All three cases probably resulted from the MESMA constraint, which limited the maximum shade fraction and thereby limited the ability of the MESMA models to adequately represent dark surfaces. Second, it appeared that in some areas impervious fractions were confused with soil (e.g., portions of the Porto Velho and Ariquemes urban centers), and that NPV fractions were confused with impervious (e.g., outlying pastures sometimes contain impervious fractions).
A potential source of this confusion was the use of image endmembers in the MESMA library; as previously noted there were no pixel-sized areas in the reference data that contained more than 50% impervious cover. Therefore, there was a high probability that the impervious endmember spectra were actually mixed with soil and/or senesced vegetation. Because accuracy assessment presented in terms of correlations provides no specific information concerning the confusion between classes of materials, there is no way to unravel where the specific disagreement lies. To address these sources of confusion would require more reference data, perhaps supplemented with fieldwork, to clarify whether and where impervious surfaces were being over- and undermapped.
3.4. Characterizing urban environments
To consider how well V–I–S mapping using MESMA characterized variations between urban environments, we more closely examined the distributions of the impervious class within and between the study sites. As expected, the number of pixels with impervious cover increased for settlements of increasing population. However, we hypothesized that investigating the internal structure of settlements could provide insights to the trajectory of urban development in rapidly growing frontier such as Rondônia. Though the present study only considered a single snapshot in time, the study areas represent a range of populations, which could be used as a proxy for stages of development. To summarize structural changes, we compared the nonzero distribution of impervious surface fractions for pixels within the “urban extent” of each study site. To compare variations in distribution of percent cover, the impervious distribution for each settlement area was normalized by the total number of pixels within the urban extent (as defined in section 3.1. above) containing impervious fractions. The distributions were compared in terms of their central tendency, here taken as the median, and in terms of their variability, here taken as the interquartile range, or the difference between values at the 75th percentile and 25th percentile (Clark and Hosking 1986).
The normalized distributions for the 10 samples were grouped into three generalized categories (Figure 11). With one exception (Buritis, number 6), the groups included settlements of similar populations. Group A included Porto Velho (number 1) and Ji-Paraná (number 2) (and the one outlier, discussed separately below), the largest and oldest urban centers within the state. For these cities, the median value of the impervious distribution was approximately 23% cover, with an interquartile range of 17% cover. The distributions were positively skewed, indicating a large number of pixels with percent impervious cover greater than the median, especially when compared to the distribution of the two other groups. Group B contained the next three samples in order of population: Ariquemes (number 3), Rolim de Moura (number 4), and Pimenta Bueno (number 5). All three cities are central commercial hubs, located on the BR-364 or its major crossroads. The median impervious value for this group was around 17% cover, and the interquartile range was approximately 10%. The distributions were more symmetrical than group A, and there were few pixels with impervious cover greater than 40%. Group C contained the smallest of the samples in terms of population: Esigão d’Oeste (number 7), Santa Luzia d’Oeste (number 8), Seringuerias (number 9), and Theobroma (number 10). All of these towns are located on smaller feeder roads to the BR-364. The median of these distributions shifted to approximately 13% impervious cover, the interquartile range was narrower, approximately 7%, and there were virtually no pixels with impervious cover greater than 30%.
Based on this analysis of impervious surface composition, the general trajectory of urban growth in this frontier region might be summarized as follows. As a settlement grows in population and/or commercial importance, the number of pixels containing impervious surfaces increases in a predictable fashion, as demonstrated in Figure 9. The distribution of percent impervious cover evolves as well. As the population increases, the median percent impervious cover within the settlement increases (roughly the peak of the distribution), and the percent of pixels with an impervious cover greater than the peak also increases. Both measures indicate that impervious cover becomes denser as the settlement grows; however, there is also a corresponding increase in the variability of percent impervious cover, as indicated by the increase in the interquartile range.
The one outlier in the current study is the settlement of Buritis, an unusual settlement in the context of the region for several reasons. First, though Buritis is located on a feeder road quite far from the BR-364, it supported a relatively large population, approximately 15 000 in the year 2000. Second, Buritis is the youngest town in the present sample, and one of the youngest settlements in the entire state, as it did not even appear in Landsat imagery until after 1995. Finally, the streets of Buritis were laid out in a radial pattern, whereas all other settlements in the study were laid out in grid patterns. Given the settlement’s age and population, its impervious distribution was expected to be similar to those of group C. However, the median of Buritis’s impervious distribution was 20%, between that of group A and group B, and the interquartile range and general shape of the impervious distribution were closest to group A. The physical evolution of this settlement therefore appears to deviate from the otherwise systematic trajectory of morphological development—at least in terms of the summary measures for impervious surfaces presented above. This result leads to an obvious next research question: What factors influencing urban development in Buritis caused the morphology of this very young, very rapidly growing city to resemble the oldest cities of the region?
3.5. Test case comparisons
Modeled fraction accuracy was similar between Manaus (Powell et al. 2007) and Rondônia case studies, though the accuracy statistics for the sites in Rondônia were generally better than the Manaus study. Since the average size of the Rondônia accuracy samples was approximately equivalent to a window size of 4 × 4 Landsat pixels, the accuracy statistics for the Rondônia region were compared to those derived for Manaus using a 5 × 5 pixel window (Table 4). For all surfaces types, the MAE was lower for the Rondônia samples, especially for the impervious and soil fractions. While the slope for the impervious fractions was somewhat lower for the Rondônia samples, indicating a poorer correlation between modeled and reference fractions, there was a substantial improvement in modeling soil fractions. In the Manaus case, the relationship between reference and modeled soil fractions was poor, with a slope of 0.42 and an R2 of 0.28, compared to a slope of 0.81 and R2 of 0.76 for the Rondônia soil fractions. The source of this improvement could be due to a combination of factors, including better choices regarding constraints and model selection rules and/or higher resolution, implying higher quality, reference data.
The most striking difference between the Manaus and Rondônia studies is the degree of model complexity required to sufficiently characterize the built-up land cover in each region. In Rondônia, four-endmember models were required to capture impervious surfaces because, as discussed above, almost all urban land cover at the scale of a Landsat pixel consisted of three material types plus shade. However, in Manaus, including four-endmember models did not appreciably increase the accuracy of V–I–S fractions in the built-up area, nor did it significantly increase the number of pixels modeled. Additionally, including four-endmember models introduced error in the nonbuilt-up portions in the Manaus study area. As a result of these factors, only two- and three-endmember models were needed to model V–I–S fractions in Manaus (Powell et al. 2007). In contrast, the Manaus spectral library included substantially more endmembers (26) than the Rondônia spectral library (14). In particular, the Manaus library included 11 impervious spectra, while the Rondônia library included only 5 impervious spectra.
These differences—the degree of model complexity and the number of spectra required to capture built-up land cover—can be explained in part by the characteristics of the cities being modeled in each region. Manaus is significantly larger than any of the Rondônia cities in terms of population—with a year 2000 population that is a factor of 5 times greater than that of Porto Velho (IBGE 2005)—and significantly older in terms of development—having been an established town by the late eighteenth century, while Porto Velho did not emerge in its present location until the early twentieth century (Browder and Godfrey 1997). Both factors result in differences in the relative area of built-up land cover, the density of impervious cover, and the diversity of surface materials. Because Manaus is larger and older, its land cover is composed of a greater variety of materials than the younger, smaller cities in Rondônia, and as a result, more endmember spectra are needed to capture the spectral variability of surface. On the other hand, because the cities of Rondônia are at an earlier stage of development, they are not as densely built up. The lower density of impervious surfaces results in a more complex combination of materials at the subpixel scale, and four-endmember models are necessary to model the built-up surface.
While we have thus far only applied this methodology to two regions (each for a single image acquisition date), we purport that the basic spectral components of the landscape are similar for most of the Amazon basin (e.g., Adams et al. 1995; Roberts et al. 1998a; Roberts et al. 2002). In addition, the two test cases represent quite different trajectories of urban and regional development in terms of driving forces and the time scale of urbanization. We therefore propose a conceptual model for the evolution of land-use change on the Brazilian Amazon frontier in terms of the spectral complexity and the spectral diversity of the landscape. First, land cover that has been minimally impacted by humans can be modeled predominantly as two-endmember models and requires very few spectra (i.e., a small endmember library) to represent the dominant components of the landscape. Second, land cover that has been cleared for use by humans is spectrally more complex, and generally requires three-endmember models to represent the landscape. Third, settlements or other built-up areas that emerge have higher spectral complexity because of increased spatial and spectral heterogeneity. These areas require four-endmember models to adequately represent the spatial variability of land-cover components, and the endmember library must also include a more diverse set of spectra to capture the increased diversity of materials. Finally, as an urban area ages, the spatial diversity of the built-up area decreases (i.e., larger continuous areas are covered with impervious surfaces), while the spectral diversity of the urban land cover increases (i.e., there is a greater variety of materials in the built environment). As a result, lower complexity mixture models (e.g., two- or three-endmember models) adequately represent the spatial heterogeneity, but a more diverse spectral library is necessary to capture the variety of impervious materials present.
The trajectory described above supports the principal conclusion of Small (Small 2005) that spectral complexity is the defining characteristic of urban land cover and that built-up areas can be separated from undeveloped land cover based on differences in spectral complexity. Here, we expand this conclusion to suggest that the source of urban spectral complexity evolves through time. Initially, the spectral complexity of settlements is dominated by spatial heterogeneity of the built-up land cover. However, through time, the spectral complexity of urban areas becomes dominated by the great variability of urban materials, that is, spectral heterogeneity, and spatial heterogeneity may be less important as the urban space is increasingly filled in with built-up surfaces. A direction of future research is to investigate whether this conceptual model describing the evolution of the spectral complexity of built-up environments generally holds for newly urbanizing landscapes in other regions of the world.
The comparison of the Manaus and Rondônia case studies indicates that the V–I–S components of an urbanizing landscape cannot be mapped by generic application of MESMA. Several factors in MESMA implementation may need to be customized to adequately capture the diversity of ground materials and optimally represent the spatial complexity of built-up land cover. First, construction of the endmember library depends on linking the best representative library spectra to materials in the scene, and the details of how this link is established will depend on the spatial complexity of the built-up surface. Second, the parameters of MESMA implementation should be allowed to vary depending on the characteristics of the urban land cover and peri-urban landscape being mapped. These details include fraction constraints, allowed endmember combinations, and rules to select between models of different complexity. Determining the most appropriate context for endmember selection and selecting the most effective parameters for MESMA implementation depend on the existence of high-quality reference data; without available reference data, the procedures detailed herein would not be possible.
However, the results also demonstrate that such customization is not required on a per-settlement basis. That is, if settlements within the region are reasonably similar in terms of material composition and physical structure, the same spectral library, fraction constraints, and model selection rules may be simultaneously applied to all settlements.
Additionally, the high correlation between the V–I–S fraction maps and the reference data indicates that the output of the MESMA application was comparable in both study regions tested. We propose that the flexibility of MESMA implementation, which allows adjustments to the modeling procedure so that local conditions are optimally represented, is a major strength of the technical procedure. Because the output can be generalized within the V–I–S framework, maps of the fractional abundance of urban materials represent physical quantities that can be compared across and between regions. Thus, the techniques presented here can be tailored to capture the regionally specific properties of urban land cover yet simultaneously provide measures that are physically meaningful on a global scale.
The MESMA procedures we have outlined are not limited to the datasets we have used in this paper, but can be similarly applied to other moderate spatial and spectral resolution sensors with a sufficient number of spectral bands to resolve four-endmember mixing models. To simultaneously solve for SMA fractions and model RMSE, the number of spectral bands must at least be equal to the number of endmembers in the model (Adams and Gillespie 2006). Examples of moderate-resolution sensors currently in operation that collect data that might be substituted for Landsat include the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), the fifth satellite in the Système Pour l’Observation de la Terre (SPOT) series (SPOT-5), and China–Brazil Earth Resources Satellite (CBERS). Additionally, MESMA could be applied to datasets collected by sensors with higher spectral resolution, such as Hyperion or Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). Analysis of data with higher spectral dimensionality has the potential to improve discrimination of materials that are spectrally similar in Landsat mixing space, such as soil and impervious, or soil and NPV (e.g., Herold et al. 2003). Finally, other sources of high spatial resolution data, such as aerial photography, IKONOS, and/or QuickBird imagery, could be used to facilitate the accuracy assessment process using the methodology outlined above.
This work has also demonstrated that model complexity—how many endmembers are required to accurately model each pixel—is related to nature of human impact on the landscape. For the Rondônia sites, urban areas and roadways were dominated by four-endmember models; nonbuilt-up portions of the landscape that had been heavily impacted by human activity, such as pasture or second-growth forest, were modeled by three-endmember models; and intact portions of the natural landscape, such as primary forest or savanna, were dominated by two-endmember models. Thus, summaries of model complexity can serve as a rough proxy for degree and type of human impact on the environment. The success of delineating the built-up extent of a settlement based on model complexity suggests the possibility of developing future applications to automate the mapping of urban extent.
Finally, we have demonstrated that V–I–S components characterized in this manner can capture inter- and intraurban variability. This result suggests that these data products can contribute to parallel and comparative studies of urbanizing areas through time and across regions. By comparing the composition of the V–I–S components through time, the trajectory of physical land-cover change as an urban area develops can be characterized; in the case of Rondônia, the timeline of this trajectory is almost entirely captured by satellite imagery. Comparing urban areas within and between regions can serve as a valuable first step in building connections between social systems—be they economic, political, or cultural—and the physical environment of the urban landscape. Comparing the development trajectory of regional systems of urban centers can provide insights into the role of urbanization in regional land-cover transformation and environmental change.
This research was funded by a NASA Earth System Science Graduate Student Fellowship, as well as NASA Grant NCC5-282 as part of LBA-Ecology. Digital videography was acquired as part of LBA-Ecology investigation LC-07. The Landsat ETM+ imagery was acquired from the Tropical Rain Forest Information Center (TRFIC). We also thank the reviewers for their helpful comments.
* Corresponding author address: Rebecca L. Powell, University of Denver, 2050 E. Iliff Ave., Denver, CO 80208. email@example.com