Visible and infrared data obtained from instruments onboard geostationary satellites have been extensively used for monitoring clouds and their evolution. The Advanced Baseline Imager (ABI) that will be launched onboard the Geostationary Operational Environmental Satellite-R (GOES-R) series in the near future will offer a larger range of spectral bands; hence, it will provide observations of cloud and rain systems at even finer spatial, temporal, and spectral resolutions than are possible with the current GOES. In this paper, a new method called Precipitation Estimation from Remotely Sensed information using Artificial Neural Networks–Multispectral Analysis (PERSIANN-MSA) is proposed to evaluate the effect of using multispectral imagery on precipitation estimation. The proposed approach uses a self-organizing feature map (SOFM) to classify multidimensional input information, extracted from each grid box and corresponding textural features of multispectral bands. In addition, principal component analysis (PCA) is used to reduce the dimensionality to a few independent input features while preserving most of the variations of all input information. The above method is applied to estimate rainfall using multiple channels of the Spinning Enhanced Visible and Infrared Imager (SEVIRI) onboard the Meteosat Second Generation (MSG) satellite. In comparison to the use of a single thermal infrared channel, the analysis shows that using multispectral data has the potential to improve rain detection and estimation skills with an average of more than 50% gain in equitable threat score for rain/no-rain detection, and more than 20% gain in correlation coefficient associated with rain-rate estimation.
The accurate estimation of the amount, and temporal and spatial distribution of precipitation is critical to a wide range of applications from global climate modeling to local weather and flood forecasting. Precipitation is a key component of the earth’s hydrological cycle and it has great effects on human lives and property. At regional to global scales, the existing ground-based precipitation observation networks are insufficient; thus, satellites provide a viable and attractive alternative. Clearly, the continuing development of satellite-based precipitation retrieval algorithms that provide progressively better estimates of precipitation has been a growing area of research because of the opportunities and the challenges it entails.
On the basis of the assumption that colder cloud tops are correlated with higher rain rate, cloud-top infrared (IR; ∼11 μm) data from geosynchronous earth-orbiting (GEO) satellites have been frequently used to provide high spatial and temporal resolution rain retrievals. In general, these indirect approaches give good results when rain estimates are integrated over larger time and space scales, but they tend to provide high uncertainty for instantaneous rainfall estimates at small scales (Arkin and Meisner 1987). In contrast, passive microwave (PMW) data, particularly over oceans, capture hydrometeor-relevant information and facilitate more accurate instantaneous rainfall estimates because they are more directly related to rain rates. To date, the main disadvantage of PMW sensors has been that they are carried onboard low Earth-orbiting (LEO) satellites; therefore, they are limited in their temporal resolution. The Global Precipitation Measurement (GPM) mission will coordinate the collection of higher-quality PMW-based global rainfall observations at ∼3-h average revisit time and ∼10-km pixel resolution (Hou et al. 2008). Although this represents a significant improvement, the current and future network of PMW sensors will continue to lack the spatial and temporal resolutions that are required by some applications. Among such applications are flash floods caused by extreme convective storms, where the life of the storm from initiation to dissipation can occur within an hour or less and is confined to a highly localized area.
Many recent algorithms show encouraging results by combining data from IR and PMW sensors (Adler et al. 1993; Bellerby et al. 2000, 2009; Hsu et al. 1997, 2009; Huffman et al. 2001, 2007; Joyce et al. 2004; Kidd et al. 2003; Kuligowski 2002; Kummerow and Giglio 1995; Levizzani et al. 1996; Miller et al. 2001; Sorooshian et al. 2000; Todd et al. 2001; Turk et al. 2000, 2003; Xu et al. 1999). Arguably, GEO satellite observations will continue to play an important role in rain estimation (Huffman et al. 2007), and the challenge is to provide more accurate estimates. One possible direction for improving the utility of GEO satellites is to explore the benefit of multispectral imagers for rain detection and estimation. This is consistent with the increasing number of spectral bands on recent and future GEO-based instruments, along with higher time and space resolution. For example, the operational Spinning Enhanced Visible and Infrared Imager (SEVIRI) instrument, onboard Meteosat Second Generation (MSG), has 12 spectral bands and currently scans the earth’s surface every 15 min, with a pixel size of 3 km at the subsatellite point (Schmetz et al. 2002). Similarly, the future Geostationary Operational Environmental Satellite-R (GOES-R) series, planned to be launched in early 2015, will carry the Advanced Baseline Imager (ABI), which will provide images in 16 spectral bands ranging from 0.47 to 13.3 μm, with improved temporal and spatial resolutions (Schmit et al. 2005).
A great deal of research has been devoted to precipitation-relevant multispectral studies, either using multispectral bands to characterize clouds and understand some of the near-cloud-top microphysical processes associated with precipitation processes (Arking and Childs 1985; Levizzani and Setvák 1996; Pilewskie and Twomey 1987; Rosenfeld and Gutman 1994; Turk and Miller 2005) or to delineate the areal extent of precipitation (Behrangi et al. 2009; Cheng et al. 1993; Inoue and Aonashi 2000; Lensky and Rosenfeld 2003; Lovejoy and Austin 1979; Tsonis 1984). Efforts to estimate rain rate from a combination of channels have also been investigated, mainly limited to using only a single visible and a single infrared band together (Bellerby et al. 2000; Griffith et al. 1978; Hsu et al. 1999; King et al. 1995; Negri et al. 1984; O’Sullivan et al. 1990). A few studies have used multispectral bands (more than two channels) for rain-rate estimation. Kurino (1997) argued that areas with brightness temperature difference (Tb11μm − Tb12μm) equal to or greater than 3 K correspond to cirrus clouds with no rain, and areas with (Tb11μm − Tb6.7μm) < 0 correspond to deep convective cloud with heavy rain. Using these three channels and composite digital radar data, he developed a three-dimensional (3D) lookup table of rain probability and mean rain rate (MRR) to estimate both deep and shallow precipitation. In another effort, Ba and Gruber (2001) developed the GOES Multispectral Rainfall Algorithm (GMSRA), which uses five spectral bands (0.65, 3.9, 6.7, 11, and 12 μm) to estimate rainfall. GMSRA builds on a series of previously developed techniques: raining clouds are identified using spatial temperature gradients (Adler and Negri 1988) and time changes (Vicente et al. 1998), and during the daytime, effective radii of cloud particles (Rosenfeld and Gutman 1994).
Rainfall rates are derived from 11-μm brightness temperatures and are also adjusted for subcloud moisture in a manner similar to Vicente et al. (1998). The self-calibrating multivariate precipitation retrieval (SCaMPR; Kuligowski 2002) is another multispectral approach, but it adds the dimension of being calibrated in real time against rain rates from PMW sensors: regression techniques are used to select the optimal predictors (selected from GOES bands 3–6 plus some derived quantities) and coefficients for rain/no-rain discrimination and rainfall rate estimation.
In this paper we develop a multidimensional rain estimation technique called Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks–Multispectral Analysis (PERSIANN-MSA) to estimate rain rate using multispectral data plus textural features. Textural features represent various aspects of each channel’s value across several neighboring grid boxes in a satellite image. The results are also compared to evaluate the value of such multidimensional data. PERSIANN-MSA employs the ANN-based self-organizing feature map (SOFM; Kohonen 1982) to cluster multidimensional input information in a manner that facilitates the assessment of their individual and combined utility in rain estimation. By clustering input features into localized maps, SOFM has the advantage of facilitating analysis capabilities, and by extension, the ability to interpret the nonlinear output resulting from ANN models (Behrangi et al. 2009; Hsu et al. 1997, 1999; Tapiador et al. 2004) and to shed light on the value of ANN in rain estimation. In section 2, we describe the datasets used in the study and the study location. Algorithm development and scenarios of combined channel selection are described in section 3. The results of these combination scenarios are then compared in section 4 using both statistical scores and visual analyses. Lastly, the conclusions of this study are presented in section 5.
2. Description of the dataset
As part of the algorithm development efforts for GOES-R, the GOES-R Algorithm Working Group (AWG) has archived datasets for development and validation work, including SEVIRI data for the first nine days of each of January, April, July, and October 2005 at the full temporal (15 min) and spatial (3 km at subpoint) resolution of the instrument. Of the 12 spectral channels in each SEVIRI image, 10 were used in the present study, centered at 0.635 (hereafter referred to as VIS0.6), 0.81 (VIS0.8), 1.64 (NIR1.6), 6.25 (WV6.2), 7.35 (WV7.3), 8.70 (IR8.7), 9.70 (IR9.7), 10.80 (IR10.8), 12.0 (IR12.0), and 13.4 μm (IR13.4).
Notice that despite its demonstrated value, the 3.9-μm channel is not considered in this study. Several studies support using the 3.7-μm (and by extension 3.9-μm) channel to retrieve cloud particle properties and rainfall (Arking and Childs 1985; Rosenfeld and Gutman 1994; Rosenfeld and Lensky 1998; Ba and Gruber 2001), to retrieve cirrus cloud parameters (Rao et al. 1995), and to observe convective storm tops and plumes (Levizzani and Setvák 1996; Setvák et al. 2003), among others. During the daytime, the 3.7/3.9-μm channel radiance contains both solar reflection and thermal emission. To consider the effect of solar zenith angle (SZA) on the reflection component of this channel, the solar component must be quantified and uncoupled, which in itself has been a subject of several investigations in the past (Rao et al. 1995; Rosenfeld and Lensky 1998; Setvák and Doswell 1991). However, in practice, most of these studies are subject to some necessary simplifications and assumptions, such as zero transmissivity of clouds, which is only acceptable for optically thick clouds. Some other simplifications have been addressed by Setvák et al. (2003). Therefore, despite its importance, the SEVIRI 3.9-μm channel is not considered in the present comparative study, preventing any misinterpretation due to the necessary simplifications.
In this study, the first five days of each month were allocated for algorithm training and calibration, with the remaining four days set aside for verification and testing. A study region covering longitudes 30°W–0° and latitudes 15°S–15°N was selected because 1) it only includes the ocean as background, where the PMW rain estimates used for training and calibration purposes are more accurate; 2) it includes the equatorial region, where a large number of major precipitation events is expected; 3) it minimizes the effects of parallax on the cloud locations because the SEVIRI images are near nadir; and 4) it maximizes the sample size for daylight-only bands by avoiding higher latitudes, where the length of daylight is relatively short in January (Northern Hemisphere) or July (Southern Hemisphere) and SZA becomes significant.
The first three channels (VIS0.6, VIS0.8, and NIR1.6) are highly affected by SZA and must be normalized. We tested two different normalization methods suggested in the literature (e.g., Cheng et al. 1993; King et al. 1995; Minnis and Harrison 1984; Tsonis and Isaac 1985): 1) the inverse of cos(SZA) and 2) the inverse square root of cos(SZA). Our experiments show that, for SZA < 60°, using 1/cos(SZA) results in a more reasonable correction (Behrangi et al. 2009), and thus the former method was chosen and applied to grid boxes with SZA < 60°.
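As an illustration only, the chosen normalization can be sketched in Python as follows. The function name, the use of NaN to flag grid boxes outside the daytime limit, and the array-based interface are assumptions of this sketch and are not taken from the PERSIANN-MSA code.

```python
import numpy as np

def normalize_reflectance(reflectance, sza_deg, max_sza_deg=60.0):
    """Scale reflectance by 1/cos(SZA) where SZA < max_sza_deg.

    Grid boxes at or beyond the SZA limit are set to NaN (a choice made
    for this sketch) so that they can be excluded downstream.
    """
    reflectance = np.asarray(reflectance, dtype=float)
    sza_deg = np.asarray(sza_deg, dtype=float)
    valid = sza_deg < max_sza_deg
    normalized = np.full(reflectance.shape, np.nan)
    normalized[valid] = reflectance[valid] / np.cos(np.deg2rad(sza_deg[valid]))
    return normalized
```

The same call applies unchanged to each of the three solar channels, since the correction depends only on the SZA of the grid box.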
Because of the limited number of available ground observations of rainfall in the study area, a set of combined intercalibrated normalized PMW-derived rain-rate estimates, provided by the National Oceanic and Atmospheric Administration’s Climate Prediction Center (NOAA/CPC; Joyce et al. 2004), was used for training, calibration, and validation purposes. The combined PMW data include rain-rate estimates from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI), the Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E), the Special Sensor Microwave Imager (SSM/I), and Advanced Microwave Sounding Unit/Microwave Humidity Sounder (AMSU/MHS) sensors. When sensors overlap within a 30-min period, the precedence order TMI, AMSR-E, SSM/I, and AMSU/MHS is used (Joyce et al. 2004). Both the combined PMW rain estimates (with a grid size of about 0.07° × 0.07°) and the SEVIRI multispectral images (with the nominal grid size of 3 km) were remapped onto common 0.08° latitude–longitude grids and approximately coincident pairs were used. Because of the scan time of the SEVIRI images (15 min) and the temporal precision of the combined PMW rain maps (30 min), a 30-min time lag (at worst) between the pairs can occur, which to some extent could introduce uncertainties in both training and evaluation processes.
a. Scenario development
To assess the role of multispectral data in improving satellite rainfall estimation, we selected 12 combination scenarios (Table 1). A fair comparison is ensured by distinguishing “anytime” from “daytime” scenarios: “anytime” scenarios are those excluding both visible and near-infrared bands (VIS06, VIS08, and NIR1.6). This results in five scenarios (scenarios 1–5, hereafter group 1) that are applicable to both daytime and nighttime hours, whereas the remainder (scenarios 6–12, hereafter group 2) are only applicable during the daytime. As described in section 2, the daytime hours consist of grid boxes with SZA < 60°, allowing more reliable normalization of visible channels during daytime. Although “anytime” scenarios can be used during daytime and nighttime, the study is restricted to only daytime images to provide a direct comparison of the various approaches. Notice that scenarios 3, 4, 10 and 11 contain GOES 8–11 imager-like channels, and thus their results are arguably extendable to current GOES as well.
b. Input feature extraction and compression
Local textural features around each satellite grid box are useful in rainfall estimation (e.g., Wu et al. 1985). In this study, we extract five features similar to those used in the PERSIANN (Sorooshian et al. 2000) algorithm. These include the grid-box value itself along with the means and standard deviations of 3 × 3 and 5 × 5 windows of pixels centered on each grid box. Clearly, using both multispectral images and textural features significantly increases the number of inputs (see Table 1) and imposes significant computational demand on the algorithm. However, the high interband correlations, which reflect redundancy among the spectral bands, can be employed to reduce the dimensionality of the problem. One common way to obtain such reduction is by using principal component analysis (PCA). PCA uses an orthogonal linear transformation of the data to a new coordinate system, such that the first coordinate (principal component) contains the greatest data variance, the second coordinate contains the second largest data variance, and so on (Jolliffe 2002). Depending on the field of application, PCA is also known as the Karhunen–Loève transform (KLT), empirical orthogonal functions (EOFs), or the Hotelling transformation (Anderson 2003; Jolliffe 2002). Briefly, let X be the data matrix containing N variables. The covariance matrix (𝗖) of X is calculated as follows:
$$\mathbf{C} = E\left[(X - \bar{X})(X - \bar{X})^{T}\right],$$

where $\bar{X}$ is the mean value of X. The linear transformation of X to orthogonal matrix 𝗬 is

$$\mathbf{Y} = \mathbf{V}^{T} X,$$

where 𝗬 is the matrix of principal components and 𝗩 is the eigenvector matrix of the covariance matrix 𝗖. The transformed components in 𝗬 are uncorrelated with each other, and the covariance matrix of principal components is

$$\mathbf{C}_{Y} = \mathbf{V}^{T}\,\mathbf{C}\,\mathbf{V} = \mathbf{D},$$

where 𝗗 is the diagonal matrix of eigenvalues of 𝗖. The total variance of the data matrix can be represented as $\sum_{i=1}^{N} \lambda_i$, where the λi values are the eigenvalues ranked from the largest to the smallest variance. For relatively high-dimensional scenarios in our study, as shown in the last column in Table 1, the number of principal components selected for the developmental data preserves 99% of the total variance.
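The feature extraction and compression steps of this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, image borders are handled by edge padding (the treatment of borders is not specified in the text), and the sample covariance from `numpy.cov` stands in for the expectation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def texture_features(channel):
    """Return a (rows, cols, 5) stack for one channel:
    grid-box value, 3x3 mean, 3x3 std, 5x5 mean, 5x5 std."""
    channel = np.asarray(channel, dtype=float)
    features = [channel]
    for size in (3, 5):
        # Edge padding at the image border is an assumption of this sketch.
        padded = np.pad(channel, size // 2, mode="edge")
        windows = sliding_window_view(padded, (size, size))
        features.append(windows.mean(axis=(-2, -1)))
        features.append(windows.std(axis=(-2, -1)))
    return np.stack(features, axis=-1)

def pca_compress(samples, variance_kept=0.99):
    """Project (n_samples, n_features) data onto the fewest leading
    principal components that retain `variance_kept` of total variance."""
    samples = np.asarray(samples, dtype=float)
    centered = samples - samples.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # rank from largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    n_keep = min(int(np.searchsorted(explained, variance_kept)) + 1, len(eigvals))
    scores = centered @ eigvecs[:, :n_keep]
    return scores, eigvals, n_keep
```

In a multichannel scenario, the five-feature stacks of all channels would be concatenated along the last axis and reshaped to (n_samples, n_features) before calling `pca_compress`.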
c. Classification of the input features
In this study, the ANN-based SOFM technique (Kohonen 1982) is used to classify input vectors into a number of classes (hereafter clusters), each presenting a unique combination of input features. Rain observations are not used to train the SOFM because experience shows that uncertainties in the assumed-true rain rate tend to degrade the result. The cluster centers are arranged into a two-dimensional discrete map, randomly initialized near the center of the feature space, and trained by introducing input vectors one by one. After training, the clusters are spread out in the input feature space, and each cluster center represents neighboring input vectors with similar properties (Behrangi et al. 2009). By using the SOFM technique, features within each cluster retain the same order in which they were introduced to the network. This property helps to visualize the input features as described in section 3d.
The process of training the SOFM, which is further described in Kohonen (1982) and Hsu et al. (1999), consists of introducing input vectors one by one from the training dataset to the network. All input features must be standardized so that they become comparable to one another in magnitude. The distance d between each standardized input vector (xi, i = 1, … n0) and the center of each SOFM cluster is calculated as follows:

$$d_j = \left[\sum_{i=1}^{n_0} (x_i - w_{ij})^{2}\right]^{1/2}, \quad j = 1, \ldots, n_1,$$

where n1 is the total number of clusters and wij is the cluster-center vector (weight vector) of the SOFM that connects input feature i to the specified cluster j.
The best-matching SOFM cluster c (winning node) is the one corresponding with the minimum distance (dc) between the input feature vector and the SOFM connection weights wij as follows:
$$d_c = \min_{j}\,(d_j), \quad j = 1, \ldots, n_1.$$

Through a recursive process of competitive cluster selection and weight adjustment, the locations of the cluster centers are incrementally shifted in the N-dimensional vector space until they become stable. Thereafter, the trained SOFM has the ability to assign any arbitrary input feature vector xi to an SOFM cluster (with fixed weights) according to its minimum distance.
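A generic SOFM training loop of this kind can be sketched in Python as follows. It illustrates the competitive selection and weight adjustment steps only; the Gaussian neighborhood, the linearly decaying learning rate and neighborhood width, and all function and parameter names are assumptions of this sketch, not the configuration used in this study.

```python
import numpy as np

def train_sofm(inputs, grid=(20, 20), epochs=5, lr0=0.5, seed=0):
    """Train a 2D self-organizing map on standardized input vectors.

    inputs: (n_samples, n_features) array. Returns the cluster-center
    (weight) matrix of shape (rows * cols, n_features).
    """
    inputs = np.asarray(inputs, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Random initialization near the center of the feature space.
    weights = inputs.mean(axis=0) + 0.01 * rng.standard_normal(
        (rows * cols, inputs.shape[1]))
    # Position of every cluster on the 2D map.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)],
                      dtype=float)
    sigma0 = max(rows, cols) / 2.0
    n_steps = epochs * len(inputs)
    step = 0
    for _ in range(epochs):
        for x in inputs[rng.permutation(len(inputs))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighborhood
            # Winning node: minimum Euclidean distance to the input vector.
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            map_d2 = np.sum((coords - coords[winner]) ** 2, axis=1)
            influence = np.exp(-map_d2 / (2.0 * sigma ** 2))
            weights += lr * influence[:, None] * (x - weights)
            step += 1
    return weights

def assign_cluster(weights, x):
    """Index of the cluster whose center is closest to feature vector x."""
    x = np.asarray(x, dtype=float)
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))
```

Once trained, `assign_cluster` uses the fixed weights, so the same input vector always maps to the same cluster.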
d. Calculating mean rain rate for each cluster
Given the classification described in section 3c, each SEVIRI grid box is assigned to one of the SOFM clusters, bringing with it the corresponding coincident PMW rain estimate. By processing the entire algorithm development dataset through the SOFM network, the MRR for each cluster is calculated as follows:
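$$\mathrm{MRR}_{c} = \frac{1}{N_{rc} + N_{nc}} \sum_{k=1}^{N_{rc} + N_{nc}} \mathrm{RR}_{c}(k),$$

where $\mathrm{MRR}_c$ is the mean rain rate for cluster c, $\mathrm{RR}_c$ is the rain rate (including zero rain rate) of every single grid box of the cluster c, and $N_{rc}$ and $N_{nc}$ are the numbers of rain and no-rain grid boxes within the cluster c, respectively.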
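A short Python sketch of this per-cluster average is given below. The function name and the convention of returning NaN for clusters that received no samples are assumptions of this illustration.

```python
import numpy as np

def mean_rain_rate(cluster_ids, rain_rates, n_clusters):
    """Mean rain rate per cluster, counting zero-rain grid boxes.

    cluster_ids: integer cluster index of each grid box.
    rain_rates:  coincident rain rate of each grid box.
    Returns (mrr, counts); empty clusters get NaN (sketch convention).
    """
    cluster_ids = np.asarray(cluster_ids)
    rain_rates = np.asarray(rain_rates, dtype=float)
    counts = np.bincount(cluster_ids, minlength=n_clusters)
    totals = np.bincount(cluster_ids, weights=rain_rates, minlength=n_clusters)
    mrr = np.full(n_clusters, np.nan)
    filled = counts > 0
    mrr[filled] = totals[filled] / counts[filled]
    return mrr, counts
```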
Because the SOFM preserves the topological order of the input features (Hsu et al. 1999; Kohonen 1982), a distinct map of input features and MRR can be displayed for each cluster. Figures 1a and 1b show these input maps for scenario 8, which has only two input features. The grids in Fig. 1 represent clusters arranged in a 2D network with a size of 20 × 20. Comparison of albedo (Fig. 1a), brightness temperature (Fig. 1b), and MRR (Fig. 1c) maps demonstrates that brighter and colder cloud (similar to convective clouds) is associated with higher rain rates (zone B), and almost no precipitation is expected from low-reflectance regions, either with high temperature (zone D, similar to clear-sky condition) or low temperature (zone C, similar to thin cirrus clouds).
e. Precipitation estimation
The power-law regression (e.g., Martin et al. 1990; Vicente et al. 1998; Kuligowski 2002) and the histogram matching (e.g., Huffman et al. 2007; Kidd et al. 2003; Todd et al. 2001; Turk et al. 2003; Hong et al. 2004) techniques have been commonly used in estimating rain rate using brightness temperature of clouds obtained from a single IR band. Although the latter techniques may reduce systematic errors from spatially mismatched cloud features and surface rainfall (Kidd et al. 2003), they are based on the assumption that rainfall rate monotonically increases as cloud-top temperature decreases (Scherer and Hudlow 1971; Scofield 1987). However, as illustrated in Fig. 1, this assumption does not always hold: a relatively warm and thick cloud (high albedo) can produce substantial rainfall (zone A), whereas in some cases, almost no precipitation is observed when cold clouds have low reflectance (e.g., cirrus cloud, zone C).
The classification of multispectral data to discrete clusters and the subsequent calculation of MRR for each cluster (Fig. 1c) provide the basis to populate the rain histogram using clusters (which lumps many input features together) as opposed to the IR Tb alone. In other words, using multispectral (or textural feature) classification and statistics, the notion that “more intense rainfall belongs to colder temperature group” is modified to “more intense rainfall belongs to a group of higher MRR.” This view suggests applying the histogram matching technique to redistribute the data to the clusters based on the MRR of each cluster.
As such, the MRR for each cluster is first calculated. The cluster (containing N1 samples) that presents the highest MRR is given the first rank and the cluster (containing N2 samples) that demonstrates the second highest MRR is labeled the second rank. This continues for all clusters, and the number of samples in each cluster is recorded. Afterward, the first-ranked cluster will be assigned the N1 highest rain-rate values, the second highest MRR cluster will be given the next N2 highest rain-rate values, and so on for all of the clusters in the map, and then a new mean rate (MRR2) is calculated for each cluster. A comparison of MRR2 (Fig. 1d) with MRR (Fig. 1c) shows that MRR2 is increased in clusters having high MRR and decreased in clusters containing low MRR.
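The ranking and redistribution steps described above can be sketched in Python as follows. The function name is hypothetical, ties in MRR are broken by cluster index, and empty clusters are returned as NaN; these are choices made for this sketch.

```python
import numpy as np

def redistribute_by_rank(cluster_ids, rain_rates, n_clusters):
    """Compute MRR2: rank clusters by MRR, hand the N1 highest rain rates
    to the first-ranked cluster, the next N2 to the second, and so on,
    then average the rates assigned to each cluster."""
    cluster_ids = np.asarray(cluster_ids)
    rain_rates = np.asarray(rain_rates, dtype=float)
    counts = np.bincount(cluster_ids, minlength=n_clusters)
    totals = np.bincount(cluster_ids, weights=rain_rates, minlength=n_clusters)
    # Empty clusters get -inf so they sort to the end of the ranking.
    mrr = np.where(counts > 0, totals / np.maximum(counts, 1), -np.inf)
    ranked_clusters = np.argsort(-mrr, kind="stable")
    sorted_rates = np.sort(rain_rates)[::-1]    # highest rain rate first
    mrr2 = np.full(n_clusters, np.nan)
    start = 0
    for c in ranked_clusters:
        n = counts[c]
        if n == 0:
            continue
        mrr2[c] = sorted_rates[start:start + n].mean()
        start += n
    return mrr2
```

For example, with two populated clusters whose MRR values are 2 and 1 mm h−1, the higher-ranked cluster receives the largest rain rates and its MRR2 rises, while the other cluster's MRR2 falls, which is the behavior noted for Figs. 1c and 1d.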
4. Evaluation and comparison of results
The PMW rain estimates are taken to be the “observation” and are used to build the contingency table with concurrent SEVIRI-based rain estimates, which represent “model predictions.” The construction of the contingency table is based on identifying binary (0/1 or yes/no) events. This is accomplished by selecting a threshold (0.1 mm h−1) above which a rain event would be considered to have occurred. This approach yields information on the algorithm’s ability to delineate rain/no-rain areas and enables us to compute evaluation statistics that include the following:
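probability of detection (POD), false-alarm ratio (FAR), areal bias (BIASa), and equitable threat score (ETS), which in their standard forms are

$$\mathrm{POD} = \frac{H}{H + M}, \qquad \mathrm{FAR} = \frac{F}{H + F}, \qquad \mathrm{BIAS}_{a} = \frac{H + F}{H + M},$$

$$\mathrm{ETS} = \frac{H - H_{\mathrm{random}}}{H + M + F - H_{\mathrm{random}}}, \qquad H_{\mathrm{random}} = \frac{(H + M)(H + F)}{H + M + F + Z},$$

where
H (hits) = number of grid boxes correctly classified as rain,
M (misses) = number of grid boxes incorrectly classified as no rain,
F (false alarm) = number of grid boxes incorrectly classified as rain, and
Z (correct no rain) = number of grid boxes correctly classified as no rain.
The ETS allows the scores to be compared “equitably” across different regimes (Schaefer 1990) and is insensitive to systematic over- or underforecasting.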
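For illustration, the contingency table and the categorical scores can be computed with the short Python sketch below. It uses the standard definitions of POD, FAR, areal bias, and ETS; the function name, the dictionary output, and the NaN returned for an undefined ratio are assumptions of this sketch.

```python
def contingency_scores(estimates, observations, threshold=0.1):
    """Rain/no-rain scores from paired rain rates (mm/h).

    A grid box counts as rain when its rate exceeds `threshold`.
    Ratios with a zero denominator are returned as NaN.
    """
    h = m = f = z = 0
    for est, obs in zip(estimates, observations):
        est_rain, obs_rain = est > threshold, obs > threshold
        if est_rain and obs_rain:
            h += 1      # hit
        elif obs_rain:
            m += 1      # miss
        elif est_rain:
            f += 1      # false alarm
        else:
            z += 1      # correct no rain

    def ratio(num, den):
        return num / den if den else float("nan")

    # Hits expected by chance, used by the equitable threat score.
    hits_random = ratio((h + m) * (h + f), h + m + f + z)
    return {
        "H": h, "M": m, "F": f, "Z": z,
        "POD": ratio(h, h + m),
        "FAR": ratio(f, h + f),
        "BIASa": ratio(h + f, h + m),
        "ETS": ratio(h - hits_random, h + m + f - hits_random),
    }
```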
In addition to the categorical evaluation statistics, the following continuous evaluation statistics were computed: 1) correlation coefficient (CORR), 2) root-mean-square error (RMSE) and 3) volume bias (BIASυ). These statistics are used to evaluate the skill of each scenario in estimating rain intensity. Unlike the bias in areal coverage (BIASa), which is computed using the contingency table, BIASυ represents the ratio of the average estimate to the average observation:
$$\mathrm{BIAS}_{\upsilon} = \frac{\frac{1}{N}\sum_{k=1}^{N} \mathrm{RR}_{\mathrm{est}}(k)}{\frac{1}{N}\sum_{k=1}^{N} \mathrm{RR}_{\mathrm{obs}}(k)},$$

where N is the total number of grid boxes, RRest(k) is the estimated rain-rate value for grid box k using PERSIANN-MSA, and RRobs(k) is the PMW rain-rate value for grid box k, treated as a rain-rate observation in this study. As a side note, both area and volume bias give only an overall comparison of rain magnitude and its areal extent; they do not measure the magnitude of the errors. For fair evaluation, all statistics should be considered simultaneously, which in some cases is not an easy task, particularly when they show improvements in some aspects but degradation in others.
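The three continuous statistics can be sketched in Python as follows. The function name is hypothetical, and the Pearson correlation from `numpy.corrcoef` is assumed to be the intended correlation coefficient.

```python
import numpy as np

def continuous_scores(estimates, observations):
    """Return (CORR, RMSE, BIASv) for paired rain-rate arrays."""
    est = np.asarray(estimates, dtype=float)
    obs = np.asarray(observations, dtype=float)
    corr = np.corrcoef(est, obs)[0, 1]              # Pearson correlation
    rmse = np.sqrt(np.mean((est - obs) ** 2))
    bias_v = est.mean() / obs.mean()                # ratio of mean rain rates
    return corr, rmse, bias_v
```

A volume bias of 1 indicates that the total estimated rain matches the total observed rain, even though individual grid boxes may still differ, which is why RMSE and CORR are reported alongside it.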
Table 2 shows the overall statistics for all 12 scenarios listed in Table 1. To facilitate cross comparisons between the scenarios, the statistics in Table 2 are plotted in Fig. 2. Notice that these statistics are computed from all coincident PMW SEVIRI images for which at least two percent of the total grid boxes in the image contain rain.
Comparing group 1 with group 2, significant improvement in skill is evident whenever the VIS06 channel is used in combination. Similar results are achieved by replacing the VIS06 channel with VIS08 (not shown here). Scenario 6 (only VIS06 without any textural features) results in substantially better rain/no-rain statistical scores than any of the scenarios in group 1, in which no visible channel is included. However, for rain-rate estimation, using albedo alone does not produce significant improvement over the remaining combinations, including the single IR channel (scenario 1). The role of textural features is highlighted by comparing scenario 7 with scenario 6. Scenario 7 adds extracted textural features to the albedo channel in scenario 6 and shows significant improvements over the IR-only scenario 1, particularly for rain/no-rain detection. This implies that textural features extracted from VIS06 add information about cloud type, and thus improve the classification and result in more robust estimation of rain intensity. Arguably, the important role of the visible channel for precipitation retrieval can be linked to its utility in removing thin cirrus clouds. In addition, using IR-only data may result in screening out precipitation from relatively warm but dense raining clouds, which is an error that can be avoided by adding the VIS channel. Our assessment of the value of visible channels for rain retrieval is consistent with some previous studies (e.g., Ba and Gruber 2001; Tsonis and Isaac 1985).
Comparison of statistics within group 1 shows that scenarios containing more spectral channels generally produce better skill for both rain detection and estimation. However, the improvement is more pronounced when comparisons are made for the rain/no-rain delineation problem. This observation is also valid for the scenarios in group 2. Note that the improvements within each group (groups 1 and 2) are less significant than between groups, highlighting the important role of the visible channel. Although including more spectral bands was found to be effective, the statistics are systematically better when textural features are added to gridbox values for each spectral image (see scenarios 2, 4, 6, 8, and 10 as compared to scenarios 1, 3, 5, 7, 9, and 11 in Table 2).
Comparison of scenarios 4 and 5 (group 1) and scenarios 11 and 12 (group 2) with other scenarios demonstrates that PCA is a useful technique that compresses input features while preserving precipitation-relevant information (see Tables 1 and 2). The correlated features were compressed to only four independent variables from the original 20 and 35 correlated features in scenarios 4 and 5, respectively, while maintaining 99% of the variance in the data. Similarly, the 25 and 50 correlated features (in scenarios 11 and 12) were compressed into six and eight independent features, respectively. By compressing the original set of input features into a smaller number of independent features, PCA reduces the computational burden and speeds up the rain estimation process. This is particularly important when the algorithm is used for operational purposes.
The comparison of all scenarios, except 5 and 12, with scenario 1 (IR only) indicates that using the common multispectral channels (visible, water vapor, and thermal IR channels) mostly available from current GOES series satellites can lead to various levels of improvement, particularly for precipitation detection. Although only one water vapor channel exists in most of the current GOES series, experiments have shown that even a single water vapor channel adds information relevant to precipitation (Ba and Gruber 2001; Behrangi et al. 2009; Desbois et al. 1982; Kurino 1997; Shenk et al. 1976). Following Behrangi et al. (2009), we define a relative gain/loss metric by referencing all scenarios to the performance associated with scenario 1—the commonly used IR (∼11 μm) channel. Each element of the gain/loss column shown in Table 2 is calculated as follows:
$$\mathrm{Gain}_{S,i} = \frac{S_i - S_1}{S_1} \times 100\%, \qquad (12)$$

where S and i represent the performance metric (ETS, POD, FAR, CORR, or RMSE) and scenario index (i = 1, … 12), as shown in Tables 1 and 2. For example, in Table 2, the gain in ETS for scenario 2 is computed as follows:

$$\mathrm{Gain}_{\mathrm{ETS},2} = \frac{\mathrm{ETS}_2 - \mathrm{ETS}_1}{\mathrm{ETS}_1} \times 100\%.$$

Whether this index counts as a gain or a loss depends on whether a higher or a lower value of the performance measure is better. For example, considering FAR and POD, the former is said to have gained if Eq. (12) yields a negative number, whereas the latter gains when Eq. (12) produces a positive value.
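The relative change and its sign convention can be written as a small Python sketch. The function names and the lookup table of metric directions are assumptions of this illustration, and the numbers in the usage comment are hypothetical, not values from Table 2.

```python
# Direction in which each metric improves (higher or lower is better).
HIGHER_IS_BETTER = {"ETS": True, "POD": True, "CORR": True,
                    "FAR": False, "RMSE": False}

def relative_change(score_i, score_ref):
    """Percent change of scenario i relative to the reference scenario."""
    return 100.0 * (score_i - score_ref) / score_ref

def is_gain(metric, score_i, score_ref):
    """True when the change counts as a gain for the given metric."""
    change = relative_change(score_i, score_ref)
    return change > 0 if HIGHER_IS_BETTER[metric] else change < 0

# Hypothetical example: an ETS of 0.30 against a reference of 0.20 is a
# 50% gain, whereas a FAR of 0.40 against 0.50 is a gain although the
# computed change is negative.
```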
Employing the additional SEVIRI channels (i.e., NIR1.6, IR8.7, IR9.7, and IR13.4) results in only marginal improvements. Although more detailed study is needed to assess the role of each of these spectral bands, the complementary role of NIR1.6 for rain detection has been reported in previous studies (e.g., Capacci and Conway 2005; Inoue and Aonashi 2000). Within group 1, the all-infrared channel combination along with individual channel textural features (scenario 5) results in 12.7%, 9.8%, 7.7%, 4.6%, and 4.8% gain in ETS, POD, FAR, CORR, and RMSE, respectively. Including the remaining “daytime spectral bands” (scenario 12) leads to further skill improvements of 53.4%, 30.9%, 36.6%, 21.4%, and 12.5%, respectively.
To further examine gains and losses due to using multispectral data and textural features, we selected 77 precipitation events from the subset of coincident SEVIRI PMW overpass images, containing at least 3000 rainy grid boxes. This criterion was set to narrow the comparison to relatively extensive rainfall events. For each event, Eq. (12) was applied to the evaluation statistics of scenarios 5 and 12. Figure 3 summarizes the gain/loss values associated with all 77 events. Except for a few events, both scenarios 12 and 5 consistently outperform scenario 1 in terms of rain/no-rain detection skill, as indicated by the ETS (Fig. 3a). With respect to rain intensity, the correlation coefficient (Fig. 3b) and root-mean-square error (Fig. 3c) also demonstrate the superior skill of scenario 12. The skill of scenario 5, which employs all “anytime” spectral bands with their textural features, was not as consistent as scenario 12, and in some cases it was below that of scenario 1. Similar results are observed for scenario 4 (not shown), indicating that infrared multispectral data are not as useful for rain-rate estimation as for rain detection.
A visual event–scale exploration of a precipitation event at 1157 UTC 9 July 2005 is shown in Fig. 4. Brightness temperature (IR10.8), normalized albedo (from VIS06) and PMW rain-rate maps are shown in Figs. 4a–4c, respectively. The rain-rate estimates from scenarios 1, 2, 5, 6, 7, 8, 9, and 12 are shown in Figs. 4d–4k. The ellipses in the scenario figures highlight noticeable regions of false detection (black) and misses (red). The performance measures for these eight scenarios are shown in Table 3.
From Figs. 4d–4f it is clear that scenarios 1, 2, and 5, which do not contain daytime-only channels, have relatively low skill in screening cold and thin cloud, and detecting rainfall from warm and relatively thick clouds. Introducing the daytime-only channels (Figs. 4g–4k) results in significant improvement in evaluation statistics, and even using albedo without any textural features (Fig. 4g) results in satisfactory performance. However, scenario 6 does not show good skill for estimation of intense rain rates, resulting in substantial underestimation of total rain amount. Again, the addition of textural features improves rain retrieval performance, as shown in Table 3 and discerned by comparing Figs. 4e, 4h, and 4j with Figs. 4d, 4g, and 4i, respectively. Consequently, from Fig. 4 and Table 3, one can also infer that using additional spectral bands can lead to important improvements, particularly in delineation of the areal extent of precipitation.
For the above rainfall event and using scenario 8 (bispectral scenario: Tb and albedo), Fig. 5 illustrates a classified cloud map and PMW rain coverage (Fig. 5a) as well as the corresponding map of the SOFM clusters (Fig. 5b). The figure geographically maps the SOFM clusters associated with scenario 8 and is constructed by assigning the corresponding cluster ID to each image pixel. The resulting cluster image (Fig. 5a) is then colored using the 2D color bar in Fig. 5b, which reflects the cluster position in the SOFM map described in Fig. 1. In Fig. 5b, as in Fig. 1, the rain rate fades in all directions from its maximum value in the upper-right corner (zone B) toward no rain in the three remaining corners (zones A, C, and D), and the rain/no-rain boundary denoted by the white line is identical to the rain/no-rain boundary in Fig. 1d. Within the rain area delineated by the PMW rain estimates (area inside the black line in Fig. 5a), the SOFM clusters capture the areal extent of rainfall reasonably well. The colors red and yellow are related to the image pixels of low IR temperature and high albedo (also see Fig. 1), which correspond well to the intense PMW rain area. The shades of purple are associated with low IR temperature and medium albedo. Clouds with such properties are thin-layer cold clouds or cirrus and are mainly consistent with observed no rain in Fig. 5a. Similarly, the clusters in green shades are warm clouds with medium reflectance. Clusters with dark greens appear to be thick clouds with a potential to give warm rain. Lastly, image pixels associated with the clusters near the bottom-left corner of Fig. 5b are warm and have low albedo, where no rain is observed. Although the figure constructed here is based on VIS06 and IR10.8, similar figures can be constructed for any spectral combination regardless of the number of bands.
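The construction of such a cluster image, in which each pixel receives the ID of its SOFM cluster, can be sketched as below. The sketch assumes that a trained SOFM is available as a codebook of node weight vectors and that a pixel belongs to the node with the smallest Euclidean distance in (already normalized) feature space; the function name and array layout are hypothetical.

```python
import numpy as np


def cluster_id_map(features, codebook):
    """Assign each pixel the index of its nearest SOFM node.

    features: array (rows, cols, n_features), e.g. Tb and albedo per pixel.
    codebook: array (n_nodes, n_features) of trained SOFM node weights.
    Returns an integer array (rows, cols) of cluster IDs.
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    rows, cols, n_features = features.shape
    flat = features.reshape(-1, n_features)
    # Squared Euclidean distance from every pixel to every node.
    dist2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist2.argmin(axis=1).reshape(rows, cols)
```

Because the distance is computed over the full feature vector, the same function applies to any number of spectral bands or textural features. If the nodes are stored row by row, the position of a node on the 2D SOFM grid, which selects its color, can be recovered with `np.unravel_index`.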
The evaluation against the PMW performed so far in this section provides a measure of how well PERSIANN-MSA is able to “fit” the target PMW data. However, it does not provide a measure of the absolute skill of the algorithm, and it does not provide an independent validation dataset. Consequently, validation was also performed on a selected set of VIS/IR scenarios using NASA’s TRMM level 2A near-surface rain-rate estimate (TRMM-PR 2A25) as the “truth” rainfall rate. The precipitation radar (PR) onboard TRMM is generally considered to be the most accurate source of rain information for the study area, which is over the open tropical ocean. The original horizontal resolution of the PR rain estimate product is 5 km, and it was remapped to 30-min images of 0.08° × 0.08° (latitude × longitude) grid boxes prior to evaluation.
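The text does not specify how the 5-km PR product was remapped to the 0.08° grid. One simple possibility, shown here only as an assumed illustration, is to average all PR samples whose centers fall inside each grid box; the function name, the grid origin arguments, and the use of NaN for empty boxes are hypothetical choices.

```python
import numpy as np


def box_average(lat, lon, values, lat0, lon0, n_lat, n_lon, cell=0.08):
    """Average point samples onto a regular lat/lon grid of `cell` degrees.

    (lat0, lon0) is the lower-left corner of the grid; empty boxes are NaN.
    """
    lat = np.asarray(lat, dtype=float).ravel()
    lon = np.asarray(lon, dtype=float).ravel()
    values = np.asarray(values, dtype=float).ravel()
    i = np.floor((lat - lat0) / cell).astype(int)
    j = np.floor((lon - lon0) / cell).astype(int)
    inside = (i >= 0) & (i < n_lat) & (j >= 0) & (j < n_lon)
    flat = i[inside] * n_lon + j[inside]
    # Sum and count of samples per grid box, then their ratio.
    sums = np.bincount(flat, weights=values[inside], minlength=n_lat * n_lon)
    counts = np.bincount(flat, minlength=n_lat * n_lon)
    grid = np.full(n_lat * n_lon, np.nan)
    np.divide(sums, counts, out=grid, where=counts > 0)
    return grid.reshape(n_lat, n_lon)
```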
Table 4 presents the overall PR-based evaluation statistics for the three most distinct scenarios: IR only (scenario 2), all IR-based channels (scenario 5), and all studied channels (scenario 12). In addition, the table includes the same PR-based statistics to evaluate PMW rain estimates, which are exclusively from TMI, given the requirement for coincident PR data. The evaluation statistics are computed for the entire study period, including both calibration and validation periods. The statistics support the previous results, indicating that including additional spectral channels helps to improve the rain estimation. For visual demonstration, Fig. 6 and Table 5 document a precipitation event at 1342 UTC 1 July 2005. Maps of brightness temperature (from IR10.8), normalized albedo (from VIS06), and PR rain estimate are shown in the top row of Fig. 6. Scenarios 2, 5, and 12, and the PMW rain estimate are displayed in the bottom row. PMW (Fig. 6g) performs the best, followed by scenarios 12 (Fig. 6f), 5 (Fig. 6e), and 2 (Fig. 6d). The superior performance of the PMW estimates relative to the PERSIANN-MSA scenarios is not surprising; given that PERSIANN-MSA was trained against PMW, the best that could be hoped for would be a statistical tie. Although the TRMM PR is expected to provide the most reliable rain-rate estimates over the study area, it has poor temporal sampling, and thus its use requires a much longer record of SEVIRI data to reach any statistically robust conclusion. Even pooling the dependent and independent days, as we did here, cannot provide as many samples as the PMW does during the independent days. Consequently, more definitive results will only come from a much longer record of SEVIRI data.
In this paper, an algorithm called PERSIANN-MSA was developed to estimate precipitation rate from multidimensional inputs. The proposed algorithm was tested over the equatorial Atlantic Ocean west of Africa, using 10 SEVIRI spectral bands. An unsupervised feature classification technique, SOFM, was used to classify input spectral features into a predetermined number of clusters. The mean rain rate (MRR) for each SOFM cluster was calculated based on time/space-matched PMW rain rate and SEVIRI data. Finally, a probability matching method was used to assign grid boxes with higher rainfall values to clusters with higher MRR. This technique extends the IR-only histogram matching concept to multiple dimensions.
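The last two steps summarized above, the per-cluster MRR and the probability matching, can be sketched as follows. This is one plausible reading under stated assumptions rather than the exact procedure: clusters are ranked by MRR, the matched reference rain rates are sorted, and each cluster receives the mean of the next block of sorted rates, with the block size equal to the number of pixels in that cluster. The function name and this particular matching rule are hypothetical.

```python
import numpy as np


def probability_matched_rates(cluster_ids, rain_rates, n_clusters):
    """Return (mrr, matched) rain rates per cluster.

    cluster_ids: integer SOFM cluster ID of every matched pixel.
    rain_rates:  reference (e.g. PMW) rain rate of the same pixels.
    """
    cluster_ids = np.asarray(cluster_ids).ravel().astype(np.intp)
    rain_rates = np.asarray(rain_rates, dtype=float).ravel()
    counts = np.bincount(cluster_ids, minlength=n_clusters)
    sums = np.bincount(cluster_ids, weights=rain_rates, minlength=n_clusters)
    # Mean rain rate (MRR) per cluster; empty clusters keep a value of 0.
    mrr = np.divide(sums, counts, out=np.zeros(n_clusters), where=counts > 0)
    sorted_rates = np.sort(rain_rates)
    matched = np.zeros(n_clusters)
    start = 0
    # Walk clusters from lowest to highest MRR and hand out sorted rates.
    for k in np.argsort(mrr, kind="stable"):
        stop = start + counts[k]
        if counts[k] > 0:
            matched[k] = sorted_rates[start:stop].mean()
        start = stop
    return mrr, matched
```

With this rule, clusters with higher MRR receive higher matched rain rates, and the total rain amount of the matched sample is preserved.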
The role of multispectral data and textural features in improving rain estimation, and rain and no-rain detection skills was investigated by defining 12 input scenarios and calculating performance metrics using PMW precipitation estimate as the reference precipitation. Our results indicate the following:
Visible channels add significant information, mainly regarding cloud thickness. Including at least one visible channel (either VIS06 or VIS08) can significantly improve both rain-rate estimation and rain area detection.
Other spectral channels were also found useful for improving the algorithm’s skill for both rain detection and estimation. However, these improvements were not as significant when VIS information is excluded, as is the case at night. For anytime (day and night) scenarios, adding the water vapor channel was found to be effective. Further studies are needed to determine the effectiveness of each single channel.
The textural features, as defined herein, provide information about the gridbox neighborhood and can improve statistics. Therefore, extraction of textural features in conjunction with multispectral bands is once more demonstrated as a valuable source of information for precipitation retrieval.
PCA is an effective tool for the compression of high-dimensional multispectral data. Our analysis demonstrates that the first few principal components are sufficient to extract the majority of the independent information, with the benefit of a significant reduction of the input dimensionality as well as the computational cost of multispectral rain estimation. However, PCA can lead to a loss of information; one must proceed cautiously, because there is no guarantee that the selected directions of maximum variance include the best features for rain retrieval. In addition, a low-order truncation might fail to adequately capture specific rare but meteorologically important situations.
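The PCA-based reduction referred to in the findings above can be sketched in a few lines. The sketch assumes that the input features have already been scaled to comparable ranges and that components are retained until a chosen fraction of the total variance is explained; the 0.95 default and the function name are illustrative assumptions, not values from this study.

```python
import numpy as np


def pca_reduce(features, variance_to_keep=0.95):
    """Project samples onto the leading principal components.

    features: array (n_samples, n_features) of spectral and textural inputs.
    Returns (scores, explained), where `explained` holds the variance
    fraction of each retained component.
    """
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    # Smallest number of components whose cumulative variance reaches the target.
    n_keep = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    n_keep = min(n_keep, len(explained))
    return centered @ vt[:n_keep].T, explained[:n_keep]
```

The caution raised in the text applies directly to `variance_to_keep`: a component that carries little variance, and is therefore dropped, may still be informative for rare rain situations.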
Global observations in the visible, water vapor, and thermal IR bands are currently available with relatively high temporal and spatial resolution from existing GEO satellites. As more spectral bands become available from recently launched (e.g., SEVIRI) and future (e.g., ABI) sensors, more emphasis on the analysis and development of multispectral precipitation retrieval algorithms is expected. In this paper, we investigated the value of spectral bands in improving precipitation retrieval using the proposed method. More detailed studies are needed, but we are hopeful that these efforts in conjunction with the anticipated launch and operation of NASA’s Global Precipitation Measurement (GPM) mission will help to provide more consistent, higher quality global rainfall observations.
Partial financial support is made available from NASA Earth and Space Science Fellowship (NESSF; Award NNX08AU78H), NASA-PMM (Grant NNG04GC74G), NOAA/NESDIS GOES-R Program Office (GPO) via the GOES-R Algorithm Working Group (AWG), and NASA NEWS (Grant NNX06AF934) programs. The authors thank Mr. Dan Braithwaite for his technical assistance on processing the satellite data for this experiment. The contents of this paper are solely the opinion of the authors and do not constitute a statement of policy, decision, or position on behalf of the GOES-R Program Office, NOAA, or the U.S. government.
Corresponding author address: Ali Behrangi, The Henry Samueli School of Engineering, Department of Civil and Environmental Engineering, E.4130, Engineering Gateway, Irvine, CA 92697-2175.