1. Introduction
Many societal applications that use precipitation information require high-quality data with fine spatiotemporal resolution (Kirschbaum et al. 2017). Satellite observations of high-quality precipitation data are typically derived from passive microwave (PMW) sensors on board low-Earth-orbiting satellites because microwave radiation interacts directly with precipitation-sized particles (Kidd and Huffman 2011; Kidd and Levizzani 2011; Tapiador et al. 2012). However, even with a constellation of low-Earth-orbiting satellites, PMW observations cover only part of the Earth at any point in time. To deliver complete observations, the Global Precipitation Measurement (GPM) mission U.S. gridded product, the Integrated Multisatellite Retrievals for GPM (IMERG), relies on morphing, in which motion vectors are used to perform quasi-Lagrangian interpolation between successive PMW observations of precipitation (Huffman et al. 2020). The propagated precipitation fields, one from a propagation forward in time and one from a propagation backward in time, are supplemented by infrared (IR) precipitation estimates from geosynchronous satellites. While IR precipitation estimates are of lower quality due to the indirect nature of the retrieval, they have a high spatiotemporal resolution with nearly complete coverage in the latitude band of 60°N/S.
In IMERG, three precipitation fields (the forward propagated precipitation field, the backward propagated precipitation field, and the IR precipitation field) are combined using a Kalman filter following Joyce and Xie (2011). Essentially, at each grid box the three precipitation estimates are averaged with relative weights that are empirically derived. However, while such averaging preserves the mean value of precipitation, the probability distribution function (PDF) of averaged values differs from the PDF of the original instantaneous retrievals. Averaging pushes estimates toward the overall mean, expanding the areal coverage of precipitation, increasing the occurrence at the low end, and decreasing the occurrence at the high end. This was borne out clearly by Rajagopal et al. (2021), who showed that the IMERG precipitation derived from averaging multiple fields has PDFs that are distinct from the PMW precipitation PDFs. They demonstrated how, in two case studies, the time series of the maximum precipitation rate within a mesoscale convective system “sagged” in between PMW observations and the area of another system tripled in just one half-hour immediately after a PMW observation. More generally, the distortion to the PDF of the averaged precipitation increases in severity the longer the precipitation is propagated, suggesting a potential contribution from errors in the motion vectors to this problem. Such a distortion of the PDF manifests in ground validation as higher false alarms and a greater underestimation of intense precipitation in morphed estimates compared to PMW estimates (Tan et al. 2016; Maranan et al. 2020).
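The effect of averaging on the distribution can be reproduced with a toy example. The sketch below is purely illustrative and is not IMERG code: the one-dimensional fields, the lognormal rain rates, the 8-box displacement, and the equal weights are all assumptions chosen to mimic two propagated fields that disagree slightly on the position of a rain cell.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical parent fields: one skewed rain cell, seen at two slightly
# different positions (the shift stands in for motion-vector error).
base = np.zeros(n)
base[80:100] = rng.lognormal(mean=1.0, sigma=1.0, size=20)
forward = base
backward = np.roll(base, 8)

# Equal-weight average of the two parents (an assumption for simplicity).
average = 0.5 * (forward + backward)

def wet_fraction(field):
    """Fraction of grid boxes with nonzero precipitation."""
    return np.count_nonzero(field > 0) / field.size

print("mean      :", forward.mean(), average.mean())
print("wet frac. :", wet_fraction(forward), wet_fraction(average))
print("maximum   :", forward.max(), average.max())
```

In this toy setting the mean is unchanged by averaging, while the wet fraction grows and the peak rate drops, which is the qualitative behavior described above.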
To resolve this issue, we introduce a new algorithm, called the Scheme for Histogram Adjustment with Ranked Precipitation Estimates in the Neighborhood (SHARPEN), that restores the PDF of the averaged field to the PDFs of the individual fields. Inspired by quantile mapping, this scheme is implemented as a filter on the averaged precipitation estimates, mapping the original Kalman filter estimates to instantaneous retrieval values drawn from the three parent precipitation fields in their local environment, thereby modifying the distribution of averaged precipitation rates to the distributions of the parent fields. While this study is concerned with the implementation of this algorithm in IMERG, the underlying concept can more generally be applied to any precipitation field whose PDF is modified due to averaging.
2. Data and methods
a. IMERG
IMERG is the gridded U.S. merged-satellite precipitation product from the GPM mission that combines observations from a network of partner satellites in the GPM constellation (Huffman et al. 2019a,b,c, 2020). In IMERG V06, precipitation estimates are provided on 0.1° grids every half-hour globally. IMERG has three runs—Early, Late, and Final—to accommodate different user requirements for latency and accuracy. The Early Run, available at a 4-h latency, is suitable for real-time applications such as in the prediction of flash floods. The Late Run, with a 12-h latency, can be used for purposes such as water resource management. The Final Run is at a 3.5-month latency and is intended for research applications.
IMERG relies on precipitation estimates derived from passive microwave sensors using the Goddard Profiling algorithm (Kummerow et al. 2001, 2011, 2015; Kummerow 2017; Randel et al. 2020) and the Precipitation Retrieval and Profiling Scheme (Kidd 2018, 2019). The PMW estimates are gridded to 0.1° in half-hour intervals, with conical-scanning sensors prioritized over cross-track scanning sensors, followed by priority for an observation time closest to the center of the half-hour, should there be multiple observations within the same grid box. The gridded PMW estimates are then intercalibrated to the Ku-band combined radar and radiometer product (Olson 2018; Grecu and Olson 2020). To fill in gaps in the PMW field, the precipitation estimates are propagated forward and backward in time over the half-hour time steps using motion vectors computed from total precipitable water vapor in numerical models (Tan et al. 2019), a process first introduced in the CPC morphing method using IR brightness temperature (Joyce et al. 2004; Climate Prediction Center 2011; Xie et al. 2017). The propagated precipitation is supplemented with microwave-calibrated IR precipitation estimates computed by the Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks–Cloud Classification System (PERSIANN-CCS) algorithm (Hong et al. 2004; Nguyen et al. 2018). All three estimates are merged via a Kalman filter approach following Joyce and Xie (2011), creating the “best” satellite-only estimates from IMERG, which in the Final Run is then calibrated with gauge analyses from the Global Precipitation Climatology Centre Full and Monitoring products (Schneider et al. 2014, 2015). Due to the need for timely data delivery, a more limited selection of data is employed in the near-real-time Early and Late Runs.
b. Kalman filter
The propagated precipitation is a crucial component in IMERG. However, since the propagation does not capture the evolution of the precipitation system, the quality of the estimates will degrade as the estimates are propagated further in time. In addition, any systematic or persistent error in the motion vectors will build up with propagation time, further contributing to the reduced performance. This decline in skill means that precipitation retrievals from IR, while being of lower quality compared to instantaneous PMW retrievals, become competitive compared to the same PMW precipitation that has been propagated several hours (Joyce and Xie 2011).
The Kalman filter is a technique that allows IMERG to account for such changes in skill. In a grid box without a PMW retrieval in that half-hour, a combination of forward propagated precipitation, backward propagated precipitation, and IR precipitation is used. As implemented in V06, the combination is a weighted average, with the weights proportional to the square of the positive correlation between the “type” of estimate and a reference estimate (simply referred to as “Kalman correlation”). The estimate type is based on two factors: 1) whether the estimate originated from a conical-scanning PMW sensor, a cross-track-scanning PMW sensor, or the geosynchronous IR observation and 2) the number of half-hours a PMW observation has been propagated. The reference used in IMERG is the gridded and calibrated estimates from the GPM Microwave Imager (after May 2014) and the TRMM Microwave Imager (prior to May 2014). These correlations are computed at 20° × 20° regions (ocean) or 10° latitude bands (land) every month over a 3-month period that is either centered on (Final) or trailing (Late and Early) that month. The spatial and temporal coarseness is needed to ensure sufficient sample sizes, with additional postprocessing to reduce fluctuations due to noise. Therefore, the Kalman correlations give us a quantitative measure at every location of the skill of, say, an estimate from Special Sensor Microwave Imager/Sounder (SSMIS) that is propagated forward for three half-hours compared to an estimate from Microwave Humidity Sounder (MHS) propagated backward for two half-hours compared to an estimate from IR. This construct allows us to take the weighted average of all three estimates for a final satellite-only estimate from the Kalman filter (which we refer to as “Kalman estimate”).
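As a minimal sketch of the weighted average described above, the following function combines the parent estimates at a single grid box with weights proportional to the squared Kalman correlation. The function name, the use of NaN to flag a missing parent, and the choice to give non-positive correlations zero weight are assumptions made for illustration; the operational lookup of correlations by sensor type and propagation time is not shown.

```python
import numpy as np

def kalman_weighted_average(estimates, correlations):
    """Weighted average of parent estimates at one grid box.

    estimates    : rates (mm/h), e.g. [forward, backward, IR]; NaN = missing
    correlations : Kalman correlation of each estimate type with the reference
    Returns the averaged rate and the normalized weights.
    """
    est = np.asarray(estimates, dtype=float)
    corr = np.asarray(correlations, dtype=float)

    # Weight is the square of the positive correlation; treating
    # non-positive correlations as zero weight is an assumption here.
    weights = np.clip(corr, 0.0, None) ** 2
    weights = np.where(np.isnan(est), 0.0, weights)

    total = weights.sum()
    if total == 0.0:
        return np.nan, weights
    weights = weights / total
    # nansum ignores the (zero-weighted) missing parents.
    return float(np.nansum(weights * est)), weights

value, w = kalman_weighted_average([4.0, 1.0, 0.0], [0.8, 0.6, 0.3])
print(value, w)
```

The returned weights are the same quantities that SHARPEN later reuses to decide how likely each reference value is to be selected.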
c. SHARPEN
While the weighted average gives a better estimate compared to any one of the three parent estimates, the PDF of averaged estimates is different from the PDF of the original instantaneous retrievals, which most notably increases the precipitating area and decreases the peak precipitation. One way to restore a distribution is through quantile mapping. Given a set of target precipitation rates and reference precipitation rates, quantile mapping changes the target precipitation rates such that their distribution follows the reference precipitation rates. In our case, the target precipitation is the Kalman estimate and the reference precipitation is the collection from three fields: forward propagated precipitation, backward propagated precipitation, and IR precipitation.
In general, quantile mapping utilizes the cumulative distribution functions of the reference and target precipitation rates to derive an intensity-dependent multiplier that is applied to the target precipitation rates. SHARPEN is a new scheme inspired by quantile mapping, but has its own adaptations. While quantile mapping computes an intensity-based multiplier that is applied to the target values, SHARPEN replaces the target value with a value from the reference. This replacement, or mapping, is performed within a local environment at the same half-hour to capture the precipitation systems within the vicinity. Specifically, we define an n × n latitude–longitude template within which we map the target precipitation rates to the reference precipitation rates. Since we are dealing with a limited set of values (up to n² target values and up to 3 × n² reference values), we can map the values directly instead of using a multiplier. That is, we replace the highest value in the target precipitation with the highest value in the reference precipitation, and so on. Therefore, in a way, SHARPEN performs a discrete version of quantile mapping, in which the target values are replaced by reference values in a sorted order. In section 3c, we will examine how n affects the output of SHARPEN.
Since we have about 3 times as many reference values as target values (though not precisely 3 times, as there may be missing values in the reference fields), not all the reference values will be selected. One simple way is to use every third reference value. A more sophisticated approach is to select the reference values based on their weights derived from the Kalman correlations. The higher the weight, the more likely the reference value will be selected for replacement. Mathematically speaking, this means that the PDF contribution of each reference value is proportional to its weight from the Kalman filter.
With a mapping technique that is local (using the n × n template), discrete (mapping of values instead of a multiplication factor) and weighted (probability is scaled by its Kalman correlations), we could use the single set of PDFs for the template to adjust all of the original Kalman estimates in the entire area covered by the template. However, doing so would very likely create artificial data jumps at the boundaries between templates. Therefore, one key design choice in SHARPEN is to replace only the center grid box of the template. That is, for a particular grid box (i, j), extract the n × n template centered on (i, j), compute the mapping for all the n² target values, but replace only the target value in (i, j). This is done independently for each grid box, so adjacent templates overlap to a large degree, ensuring smooth variations in the statistics. An illustration of this procedure is given in Fig. S1 in the online supplemental material.
d. Algorithm
The SHARPEN algorithm, implemented as a filter to the Kalman estimates, is as follows. For each grid box, if the target value is 1) derived from PMW observations, 2) missing, or 3) over frozen surface, it will be skipped. Otherwise, the following steps will be applied to each grid box:
1) Extract the target values, reference values, and reference weights in the n × n template centered on the grid box.
2) Sort the reference values (with attached reference weights) based on the ascending order of the reference values.
3) Construct the cumulative distribution of the sorted reference values, with the probability of each value proportional to its weight, scaled so that this “cumulative distribution function” ranges from 0 to 1.
4) Split the cumulative distribution of reference values into n² equal intervals (bands), where n² is the number of target values in the template.
5) Determine the rank r of the target value in the center grid box based on the sorted list of target values for the entire template. Identify the reference value where the cumulative distribution function intersects with the center of the rth band and use it to replace the target value in the grid box.
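The steps above can be sketched for a single template as follows. This is a minimal illustration under stated assumptions rather than the operational implementation: the function name `sharpen_center`, the array layout, and the use of NaN for missing values are hypothetical, and the caller is assumed to have already skipped grid boxes that are PMW-derived, missing, or over frozen surface. Repeated target values are assigned the mean of their lowest and highest ranks.

```python
import numpy as np

def sharpen_center(target, reference, weights):
    """Replacement value for the center grid box of one n x n template.

    target    : (n, n) Kalman estimates; the center value must be valid
    reference : (m, n, n) parent fields (e.g. forward, backward, IR); NaN = missing
    weights   : (m, n, n) Kalman weights attached to the reference values
    """
    n = target.shape[0]
    center = target[n // 2, n // 2]

    # Step 1: gather valid target and reference values in the template.
    tgt = target[~np.isnan(target)]
    valid = ~np.isnan(reference) & (weights > 0)
    ref, wgt = reference[valid], weights[valid]
    if ref.size == 0:
        return center  # nothing to map to; leave the estimate unchanged

    # Step 2: sort reference values, carrying the weights along.
    order = np.argsort(ref, kind="stable")
    ref, wgt = ref[order], wgt[order]

    # Step 3: weighted cumulative distribution, scaled to end at 1.
    cdf = np.cumsum(wgt) / wgt.sum()

    # Step 4: one band per target value; the center of band r is (r - 0.5) / N.
    # Step 5: rank of the center value (ties -> mean of lowest and highest rank).
    low = np.count_nonzero(tgt < center) + 1
    high = np.count_nonzero(tgt <= center)
    rank = 0.5 * (low + high)
    q = (rank - 0.5) / tgt.size

    # First reference value whose cumulative probability reaches q.
    idx = np.searchsorted(cdf, q, side="left")
    return ref[min(idx, ref.size - 1)]
```

A full filter would call this function once per eligible grid box, each time with the template centered on that box, so the returned value is always an instantaneous value drawn from one of the parent fields.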
Figure 1 illustrates the algorithm using the simplified example of a 3 × 3 hypothetical template. Considering only forward and backward propagated fields as reference (with no missing values or IR values), this means that there are 18 reference values and nine target values (step 1). The 18 reference values are sorted in ascending order (step 2), and their weights are used to construct the cumulative distribution function of the values (blue line; step 3). That is, the vertical steps of the blue curve are proportional to the weights of reference values x1, x2, …, x18. This 0–1 range is then divided into nine bands, corresponding to the nine target values (step 4). Suppose the target value in the center grid box is y7 (ranked seventh within the nine values); the center of the seventh band intersects the cumulative distribution function at x12 (red line), which replaces y7 (step 5). This entire process is then repeated for the next grid box.
Fig. 1. Illustration of the SHARPEN algorithm using a simplified example of a 3 × 3 template giving 18 reference values (x1, x2, …, x18) with varying weights and nine target values (y1, y2, …, y9). The cumulative distribution of the sorted reference values is given by the blue curve, whose vertical increments are proportional to the varying weights of the reference values. The cumulative distribution curve is used to map a target value to a reference value; the example here shows y7 being mapped to x12 (red line). The horizontal dotted red lines demarcate the nine bands corresponding to the nine target values.
Citation: Journal of Hydrometeorology 22, 8; 10.1175/JHM-D-20-0225.1
When determining the rank, a common complication that arises is multiple instances of the same precipitation value. Most of the time, this repeated value is zero, but repeating nonzero values may occur due to discretization in the input from upstream modules. As such, the target estimate to be replaced may correspond to a range of ranks within the template. In this case, we choose the mean of the lowest and highest ranks before mapping it to a reference value. Continuing the example above, suppose y6 = y7 = y8, in which case the lowest rank is 6 and the highest rank is 8. This averages out to 7, so y7 is used to perform the mapping. Note that the rank used to perform the mapping does not have to be an integer (see Fig. 1). Although the effect is slight, the alternative of first mapping each of the identical targets to its reference value and then averaging all of the adjusted values will lead to the undesirable effect of providing an averaged value, not an instantaneous value.
Finally, due to the highly skewed distribution of instantaneous precipitation values, the process above can lead to a slight bias. This is best illustrated using the extreme case of a 1 × 1 template, in which we only have one target value (y1) and three reference values from the forward propagated field, the backward propagated field, and the IR precipitation field (x1, x2, x3). With only one target value, there will be only one band that spans the entire y axis and thus SHARPEN will always select the middle of the y axis in Fig. 1, in which case it will usually be mapped to x2. (The only situation in which it will be mapped to x1 or x3 is when the weight of x1 or x3 exceeds the sum of the other two.) Since precipitation values have a long-tail distribution, x2 is generally less than the mean of all three reference values, resulting in an underestimation. This underestimation is small (within a few percent) and—consistent with expectations—reduces with larger templates. Therefore, the global precipitation field from SHARPEN is multiplied by a single scaling factor at each half-hour such that its mean value is equal to that of the original field of Kalman estimates prior to SHARPEN.
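The bias and its correction can be illustrated with a small numerical experiment. The sketch below assumes lognormally distributed reference values and equal weights, so that a 1 × 1 template always selects the middle of the three values; both assumptions, and the function name `rescale_to_mean`, are for illustration only.

```python
import numpy as np

def rescale_to_mean(sharpened, kalman):
    """Scale the SHARPEN field so its mean equals that of the Kalman field."""
    valid = ~np.isnan(sharpened) & ~np.isnan(kalman)
    s_mean = sharpened[valid].mean()
    if s_mean == 0:
        return sharpened.copy()  # nothing to scale
    return sharpened * (kalman[valid].mean() / s_mean)

rng = np.random.default_rng(1)

# Hypothetical (x1, x2, x3) triples at many grid boxes, long-tailed.
triples = rng.lognormal(mean=0.0, sigma=1.0, size=(100_000, 3))

kalman = triples.mean(axis=1)            # equal-weight Kalman estimate
sharpened = np.median(triples, axis=1)   # what a 1 x 1 template selects (x2)

print("bias before scaling:", sharpened.mean() / kalman.mean() - 1.0)
rescaled = rescale_to_mean(sharpened, kalman)
print("bias after scaling :", rescaled.mean() / kalman.mean() - 1.0)
```

The 1 × 1 case exaggerates the underestimation on purpose; with realistic template sizes the text above indicates it is within a few percent.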
3. Evaluation
a. Two case studies
Figure 2 shows a snapshot of a precipitation scene for 0000 UTC 1 July 2018 over the Solomon Islands in the tropical western Pacific. While the forward and backward propagated precipitation are broadly similar—suggesting that the motion vectors accurately captured the large-scale motion of the systems—the evolution of the precipitation systems and intersensor differences resulted in differences in the details. For example, the forward propagated precipitation field shows some finescale structure with narrow “strands” of high precipitation rates, while the backward propagated precipitation field shows a more circular structure in intense precipitation. The Kalman estimate, a weighted average of the forward and backward propagated precipitation and IR precipitation (a minor contribution in this case), smoothed out the detailed differences but showed an increase in the area of precipitation with reduced peak values. Passing the Kalman estimates through SHARPEN with a template size of 25 × 25 mitigated this shortcoming. In fact, a close inspection of the precipitation patterns revealed that the SHARPEN output captured spatial features from both the forward and backward propagated precipitation fields, possessing both the strands and circular structure of the parent fields. Note that, in this particular case, the backward propagated precipitation field has been propagated longer than the forward propagated precipitation field, so it has a correspondingly lower weight, leading the SHARPEN output to more closely resemble the forward propagated precipitation field.
Fig. 2. (a)–(e) Snapshots of the precipitation fields over the Solomon Islands at 0000–0030 UTC 1 Jul 2018, showing the forward and backward propagated precipitation, the IR precipitation, the original Kalman precipitation, and the SHARPEN precipitation at the same half-hour. The blue square in (e) indicates the size of a 25 × 25 template.
Figure 3 shows another snapshot over tropical western Africa at the same time, where a series of westward-propagating mesoscale convective systems was present. This example occurred during a period in the active storm season associated with African easterly waves. In this case, the positions of the precipitation systems in the forward propagated precipitation field, the backward propagated precipitation field, and IR precipitation do not match precisely. The precipitation systems in the backward propagated field are displaced south and/or west compared to the forward propagated field. This may be a result of motion vectors that are inaccurate or a fast evolution of the precipitation systems. As a result, the original Kalman estimate field, being the average of the three precipitation fields, leads to a “smearing” of the precipitation systems, even though the mean precipitation rate remains similar. Applying the SHARPEN filter mitigated the issue and produced a precipitation field that is visually more consistent with the parent fields.
Fig. 3. As in Fig. 2, but for 0000–0030 UTC 1 Jul 2018 over tropical equatorial Africa.
b. Precipitation statistics
Figure 4 shows the fractional contribution to total precipitation (i.e., the PDF multiplied by the bin value) and cumulative contribution to the total precipitation from different intensities of the forward and backward propagated precipitation, the original Kalman estimates, and the output from SHARPEN using nearly 10 billion values from July 2018 globally. Here, a template of 25 × 25 is used; using different template sizes does not lead to discernibly different distributions. As expected, the original Kalman estimates have more low values and fewer high values compared to the propagated precipitation. It should be noted that the propagated precipitation fields have a distribution that is similar to the instantaneous PMW retrievals, as the propagation procedure does not involve averaging precipitation pixels except when they converge, which occurs in the minority of cases. The SHARPEN output, on the other hand, was clearly able to “restore” the distribution of the Kalman estimates through the combined effect of mapping and scaling.
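A fractional-contribution curve of this kind can be computed along the following lines. This is a sketch, not the code used for the figure: the bin range and number of bins are assumptions, and the precipitation in each bin is summed directly (a histogram weighted by rate), which is the exact counterpart of multiplying the PDF by the bin value.

```python
import numpy as np

def contribution_by_bin(rates, bin_edges):
    """Fractional and cumulative contribution of each bin to total precipitation."""
    rates = np.asarray(rates, dtype=float)
    rates = rates[np.isfinite(rates) & (rates > 0)]
    # Weighting by the rate sums precipitation amount instead of counting pixels.
    amount, _ = np.histogram(rates, bins=bin_edges, weights=rates)
    fraction = amount / rates.sum()
    return fraction, np.cumsum(fraction)

# Logarithmically spaced bins in mm/h; range and count are illustrative.
edges = np.logspace(-2, 2.5, 46)

sample = np.array([0.0, 0.1, 1.0, 10.0, 10.0])
fraction, cumulative = contribution_by_bin(sample, edges)
print(fraction.max(), cumulative[-1])
```

Rates that fall outside the chosen bin range are not counted in any bin, so the cumulative curve reaches 1 only if the edges span all nonzero values.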
Fig. 4. The (a) fractional contribution and (b) cumulative contribution of different precipitation bins to total precipitation in forward and backward propagated precipitation, the original Kalman precipitation, and SHARPEN precipitation globally for July 2018. The bins are logarithmically spaced, and each field has nearly 10 billion pixels.
Figure 5 illustrates two other aspects of the precipitation statistics: the mean precipitation rates (including zero values) and the fraction of nonzero precipitation. Despite the shift in the distribution of precipitation values in the Kalman estimates, the mean precipitation is comparable to the forward and backward propagated precipitation. The effect of more light precipitation balances out the reduced intense precipitation. Due to the bias adjustment in the SHARPEN filter, the mean precipitation rate is nearly equal to that of the Kalman estimates. Without the bias adjustment, the SHARPEN estimates would have underestimated the precipitation by 1.5% compared to the Kalman estimates (not shown). Meanwhile, SHARPEN clearly reduced the precipitating area to that of the propagated precipitation fields, while at the same time improving the representation of extreme precipitation.
Fig. 5. The mean precipitation rate (including zero values), percent of nonzero precipitation rates, and percent of intense precipitation rates (defined as ≥20 mm h−1) from forward and backward propagated precipitation, the original Kalman precipitation, and SHARPEN precipitation globally for July 2018.
c. Ground validation
While the statistics of the SHARPEN output are clearly preferable to those of the Kalman estimates, a more stringent evaluation is to compare against ground observations. To this end, we evaluate the Kalman estimates from July 2018 against the Multi-Radar Multi-Sensor (MRMS) product over the contiguous United States processed in support of GPM ground validation (Kirstetter et al. 2012, 2014, 2015). This product aggregates the MRMS precipitation rates to half-hourly accumulated precipitation rates over 20°–55°N, 130°–60°W with a high spatial resolution of 0.01°, which we average to 0.1°. Each MRMS grid box is also associated with a radar quality index that reflects sampling and estimation uncertainty, which allows us to select only the most reliable grid boxes. Furthermore, MRMS is adjusted with the Hydrometeorological Automated Data System and regional rain gauge networks, which allows us to exclude grid boxes with gauge correction factors outside the range of 0.5 to 2—indicating potential unreliability hinted by large discrepancies in the surface data. We select two metrics to evaluate the precipitation estimates: Heidke skill score (HSS) at 0.2 mm h−1, which quantifies the detection skill, and the Pearson correlation coefficient of the hits, which characterizes the random error of the intensities. See Wilks (2011) for more information on these metrics. While the evaluation is performed only over one month, the large sample sizes (>30 million for HSS and >400 000 for correlation) assure that any differences will be significant, though the degree to which SHARPEN improves the skill may be affected by the predominance of different precipitation systems in other months.
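For reference, the two metrics can be computed as sketched below, using the standard 2 × 2 contingency-table form of the HSS. The function names are hypothetical, the inputs are assumed to be matched arrays with unreliable or missing grid boxes already removed, and treating a rate equal to the threshold as precipitating is an assumption.

```python
import numpy as np

def heidke_skill_score(estimate, reference, threshold=0.2):
    """HSS for detection of rates at or above `threshold` (mm/h)."""
    est = np.asarray(estimate, dtype=float) >= threshold
    ref = np.asarray(reference, dtype=float) >= threshold
    a = np.count_nonzero(est & ref)     # hits
    b = np.count_nonzero(est & ~ref)    # false alarms
    c = np.count_nonzero(~est & ref)    # misses
    d = np.count_nonzero(~est & ~ref)   # correct negatives
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2.0 * (a * d - b * c) / denom if denom else float("nan")

def correlation_of_hits(estimate, reference, threshold=0.2):
    """Pearson correlation restricted to grid boxes where both detect precipitation."""
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    hits = (est >= threshold) & (ref >= threshold)
    return float(np.corrcoef(est[hits], ref[hits])[0, 1])

satellite = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.5])
radar = np.array([2.0, 4.0, 6.0, 0.0, 0.3, 0.0])
print(heidke_skill_score(satellite, radar), correlation_of_hits(satellite, radar))
```

Because the correlation is computed only on hits, it isolates intensity errors from detection errors, which the HSS captures separately.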
Figure 6 shows the HSS and correlation of the original Kalman estimate and SHARPEN over the contiguous United States for July 2018 with a range of template sizes. Consistent with the reduction in precipitating area, the HSSs of SHARPEN at all template sizes are higher than the HSS of the original Kalman estimate. Furthermore, the improvement increases with template size, albeit with diminishing returns. On the other hand, while the correlations increase with template size, again with diminishing returns, they all remain below that of the original Kalman estimate. This is not a surprising result, since a “smeared out” field by itself will lead to higher correlations. This is confirmed by applying a two-dimensional Gaussian filter to smooth the precipitation field at each half-hour (Fig. 7). With smoothing, the correlation of the output of SHARPEN increases, even exceeding that of the original Kalman estimates (Fig. 6) with moderate smoothing, though the correlation starts dropping once the smoothing becomes sufficiently heavy. Even the HSS increases slightly with a light smoothing. This behavior can be explained by the fact that smoothing leverages the spatial correlation structure of the precipitation system to reduce random noise in the data. For our purpose, it serves to illustrate that the decrease in correlation of SHARPEN can be attributed in large part to the sharper precipitation field it produces compared to the original Kalman estimates.
Fig. 6. HSS, correlation, and run time of the original Kalman precipitation and the SHARPEN precipitation for various template sizes, computed over the entire month of July 2018 over the contiguous United States. More than 30 million pixels and 400 000 pixels are used in the computation of HSS and correlation, respectively, for each template size.
Fig. 7. Tests of the effect of smoothing on the HSS and correlation between the SHARPEN and MRMS precipitation fields. (a)–(f) The top and middle panels illustrate the effects of smoothing, showing snapshots of a precipitation field from MRMS and SHARPEN, as well as SHARPEN output with a Gaussian filter of various standard deviations σ (0.1°) over the eastern United States. (g)–(h) The bottom panels show the HSS and correlation as a function of the degree of smoothing, computed over the entire month of July 2018 over the contiguous United States.
Computational expediency is an important practical aspect of IMERG, especially for the near-real-time runs. Figure 6 also includes the average run time of the module producing the Kalman estimates. Each run of the original Kalman estimates, producing half-hour fields, takes 80 s on average. Adding the SHARPEN filter increases the computation time, which escalates as the template size gets larger, reaching more than 15 min for a template of 45 × 45. The most computationally intensive part of SHARPEN is the sorting, which is performed twice: once for the reference values and once for the target values. We employed the quicksort algorithm (Hoare 1961), which has an average time complexity of O(N log N), where N is the input size. Coupled with the fact that, for a template of n × n, there are up to n² target values and 3n² reference values, the rapid increase in computation time is not surprising.
4. Using stride to reduce computation time
While a larger template size leads to an improved accuracy with diminishing returns, it also results in increased computation time that escalates with template size. Since the computationally intensive portion of SHARPEN lies with the sorting, one way to ease the computational burden is to replace multiple grid boxes for each round of sorting. We call this approach “stride.” In the n × n template, instead of replacing the target value in only the center grid box, we replace the target values in the center k × k grid boxes, where—intuitively—k ≪ n to minimize boundary artifacts. For example, in a 25 × 25 template, we may apply the same sorted lists to replace the target values in the center 3 × 3 grid boxes, which would result in a reduction in time spent sorting by a factor of 9 compared to replacing just the target value in the center 1 × 1 grid box (see Fig. S1 for an illustration). Obviously, valid values of k, which must be odd for purposes of symmetry, are determined by whether the global 0.1° grid (3600 × 1800) is divisible by k; hence, k can take only the following values: 1, 3, 5, 9, 15, 25, 45, 75, 225, and 1800.
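The divisibility constraint and the saving in sorting can be checked with a few lines. The helper names below are hypothetical; the sketch simply enumerates the odd values of k that divide both dimensions of the 3600 × 1800 grid and counts how many sorting rounds one global field requires for a given k. Because it filters for odd k, the even value 1800 listed above is not returned.

```python
from math import gcd

def odd_strides(nx=3600, ny=1800):
    """Odd values of k that divide both grid dimensions."""
    g = gcd(nx, ny)
    return [k for k in range(1, g + 1, 2) if g % k == 0]

def sorts_per_field(k, nx=3600, ny=1800):
    """Number of templates (sorting rounds) needed to cover the grid with stride k."""
    return (nx // k) * (ny // k)

print(odd_strides())
print(sorts_per_field(1) / sorts_per_field(3))  # reduction factor for a 3 x 3 stride
```

The number of sorting rounds falls as k², which matches the factor of 9 quoted for a 3 × 3 stride; the measured run time need not fall by the same factor, since only the sorting is shared across the k × k block.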
Figure 8 shows the HSS, correlation, and run times of SHARPEN with a template size of 25 × 25 and various strides compared to MRMS over the contiguous United States in July 2018. Reassuringly, the HSSs and correlations are only modestly affected by stride; in fact, the differences between various strides are much less than the differences between template sizes. On the other hand, stride substantially reduces the run times, even for a stride of just 3 × 3. In fact, the further reductions in run time at larger strides compared to 3 × 3 are relatively marginal. These conclusions of nearly constant HSS and correlations and reduced run times hold for other template sizes (not shown).
Fig. 8. HSS, correlation, and run time of the original Kalman precipitation and the SHARPEN precipitation with various strides and a template of 25 × 25 over the contiguous United States during July 2018.
While the effect of stride on the performance of SHARPEN estimates is small, a chief concern with stride is the introduction of artificial boundaries. Here, we examine the possibility of this artifact by computing the mean precipitation fields for the month of July 2018. Figure 9 shows the mean precipitation rate over the Bay of Bengal and Southeast Asia for the SHARPEN output with a 25 × 25 template and no stride (Fig. 9a), as well as the anomalies relative to Fig. 9a when using a 3 × 3 stride (Fig. 9b) and a 15 × 15 stride (Fig. 9c). The use of stride does give rise to blocks of size equal to the stride, but the magnitudes of the anomalies are substantially smaller than the precipitation rates themselves. These blocks are most prominent in the vicinity of coasts, with higher values on the ocean side of the blocks, likely reflecting the difference in distributions of precipitation over land and over ocean (with stride causing SHARPEN to use the wrong distribution as reference). The magnitudes of the anomalies increase with stride, confirming our earlier intuition that k ≪ n is needed to avoid boundary artifacts. Incidentally, this also supports our choice of not replacing all the values in the template (section 2c), which is equivalent to setting k = n and thus introducing block artifacts that are likely discernible upon detailed inspection of the precipitation accumulation.
Fig. 9. Mean precipitation rate for July 2018 over the Bay of Bengal and Southeast Asia from SHARPEN with a template of 25 × 25 and (a) no stride, (b) the percent anomaly to (a) of the same template but with a 3 × 3 stride, and (c) the percent anomaly to (a) of the same template but with a 15 × 15 stride. Percent anomaly is defined as the difference between the precipitation field with stride and the precipitation field with no stride, divided by the precipitation field with no stride.
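The percent anomaly defined in the caption can be written as a short function. The sketch below is illustrative only: the function name is hypothetical, and masking grid boxes where the no-stride mean is zero (to avoid division by zero) is an assumption, since the treatment of such boxes is not stated here.

```python
import numpy as np

def percent_anomaly(with_stride, no_stride):
    """Percent anomaly of a strided mean field relative to the no-stride field."""
    with_stride = np.asarray(with_stride, dtype=float)
    no_stride = np.asarray(no_stride, dtype=float)
    out = np.full(no_stride.shape, np.nan)
    # Assumption: grid boxes with zero no-stride precipitation are left as NaN.
    valid = no_stride > 0
    out[valid] = 100.0 * (with_stride[valid] - no_stride[valid]) / no_stride[valid]
    return out
```

For example, a grid box with a no-stride mean of 4.0 mm h⁻¹ and a strided mean of 3.0 mm h⁻¹ has a percent anomaly of −25%.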
5. Factors to consider for template size and stride
SHARPEN has the ability to restore the distributions of the Kalman estimates, leading to improvements in HSS, but at the expense of lower correlation due to a sharper image and increased run time. While the operationally optimal template size and stride depend on the available computational resources, our analysis suggests the following guidelines:
The larger the template, the better the skill, but with diminishing returns.
The larger the template, the longer the run time, and with an escalating increase.
The larger the stride, the shorter the run time, but with diminishing returns.
The larger the stride, the stronger the block artifacts.
It is possible that the improvement in skill with larger template size depends on the type of precipitation system, with convective systems potentially needing a smaller template size compared to stratiform systems. It thus follows that this improvement may be regionally and seasonally dependent due to varying climatology. The selection of the appropriate template size and stride needs to consider these four guidelines alongside the available computational resources.
Furthermore, one should consider factors that are not as easily quantified. For example, a very large template size means that the quantile mapping is no longer local, so the reference values used in SHARPEN may be of another precipitation regime or system. Figure 3 exemplifies this issue: the 25 × 25 template (blue box) is approximately the size of the mesoscale convective systems, so the reference values used in SHARPEN come roughly from the same system. A larger template will use reference values from other systems; it is unclear what effect this may have on the overall precipitation field. In the development of SHARPEN, we also investigated the simpler approach of a traditional quantile mapping based on the global distributions of precipitation rates (not shown), but the improvement is less than that of SHARPEN, suggesting that it is beneficial to restrict the computation to the local environment.
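To make the roles of the template size n and the stride k concrete, the following is a deliberately simplified sketch of local rank-based mapping. It is not the SHARPEN implementation: it uses a single reference field instead of the three correlation-weighted references, omits any rescaling of the mean, resolves ties by assigning the lowest rank, and assumes no missing values. The function name and the way templates are clipped at the grid edges are also assumptions.

```python
import numpy as np

def local_rank_mapping(kalman, reference, n=25, k=3):
    """Replace each k x k block of `kalman` with reference values of equal rank.

    Ranks are computed within an n x n template centered on the block.
    Simplified illustration only; see the assumptions stated in the text.
    """
    if k > n:
        raise ValueError("stride k must not exceed template size n")
    kalman = np.asarray(kalman, dtype=float)
    reference = np.asarray(reference, dtype=float)
    ny, nx = kalman.shape
    half = n // 2
    out = np.empty_like(kalman)
    for y0 in range(0, ny, k):
        for x0 in range(0, nx, k):
            y1, x1 = min(y0 + k, ny), min(x0 + k, nx)
            # Template: n x n window centered on the block, clipped at the edges.
            cy, cx = (y0 + y1 - 1) // 2, (x0 + x1 - 1) // 2
            ty0, ty1 = max(cy - half, 0), min(cy + half + 1, ny)
            tx0, tx1 = max(cx - half, 0), min(cx + half + 1, nx)
            kal_sorted = np.sort(kalman[ty0:ty1, tx0:tx1], axis=None)
            ref_sorted = np.sort(reference[ty0:ty1, tx0:tx1], axis=None)
            # Rank of each block value within the template (lowest rank on ties).
            ranks = np.searchsorted(kal_sorted, kalman[y0:y1, x0:x1], side="left")
            out[y0:y1, x0:x1] = ref_sorted[ranks]
    return out
```

In this sketch, the number of templates that must be sorted falls roughly as 1/k², which is why even a small stride shortens the run time considerably, while every grid box in a block shares one template, which is the origin of the block artifacts. Setting n larger than the grid reduces the procedure to a global mapping.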
6. CPEX case studies
In Rajagopal et al. (2021), two mesoscale convective systems (MCSs) observed during the Convective Processes Experiment (CPEX) field campaign were investigated in detail using an object-oriented approach. The authors found that for these MCSs, defined as objects in IMERG V06 precipitation, the time series of maximum precipitation rate within one MCS “sagged” between successive PMW observations and the area of another MCS increased threefold in the half-hour following a PMW observation. These artifacts can be traced to the averaging of multiple precipitation fields in the IMERG V06 Kalman filter. Here, we revisit these two CPEX case studies by rerunning the IMERG V06 Kalman filter with the SHARPEN output (with a 25 × 25 template size and a 3 × 3 stride). These two case studies differ from those considered in section 3a by focusing on a Lagrangian perspective of the precipitation properties.
In the first case study on 6 June 2017 (Fig. 10a), the observed MCS had an increasing volumetric rain rate based on IMERG V06, largely driven by the increase in area. Except for a steep rise at the start, the maximum rain rates observed by the PMW satellites, separated by roughly 3 h, were consistently around 80 mm h⁻¹. However, in the half-hours between the PMW overpasses, the maximum rain rates would regularly dip down to about 30–40 mm h⁻¹. When SHARPEN is applied, the maximum rain rates recovered to a consistent ~80 mm h⁻¹ throughout, except for a brief drop to about 60 mm h⁻¹ for one hour. Such sustained maximum rain rates are consistent with our expectation of an MCS that is developing, as indicated by the increasing volumetric rain rate. In addition, the changes in rain area are more gradual, with no sharp jumps before or after a PMW overpass.
Fig. 10. Time series of volumetric rain rate, maximum rain rate, and rain area from IMERG V06 (solid lines) and SHARPEN (dashed lines) for the MCSs observed during CPEX on (a) 6 Jun 2017 and (b) 10 Jun 2017. Note that the rain area for the 6 Jun 2017 case study is multiplied by two in the figure to increase its visibility on the shared axis.
In the second case study on 10 June 2017 (Fig. 10b), the observed MCS, based on the IMERG V06 volumetric rain rate, grew rapidly and, after a brief drop, gradually increased to a maximum several hours later before decaying. However, in the half-hour immediately after the MHS overpass in the 1930–2000 UTC window, the rain area tripled from 500 to 1500 km², maintaining a plateau for 6 h through an SSMIS overpass, before dropping down to about 500 km² upon another MHS overpass. Flight observations at 2030 UTC indicate that at least part of the enlarged area is spurious (Rajagopal et al. 2021). This artificial inflation of the rain area is the result of a contrast between the small areas observed by MHS and a large area observed by SSMIS; an average of the two observations would produce the plateau of large rain area. When SHARPEN is applied, the change in the rain area became more natural, appearing to approximate a linear interpolation between the large area from SSMIS and the small areas from MHS. When compared to the CPEX aircraft observation at 2030 UTC, the coverage of spurious precipitation in the flight path is reduced considerably (not shown). On the other hand, the maximum rain rate changed from a gradual decrease to a stepwise decrease, which is to be expected because these steps likely reflect the discrete precipitation values from the reference fields.
It should be noted that, in both cases, SHARPEN does not alter the volumetric rain rate appreciably. Volumetric rain rate depends on the mean precipitation rate, which is preserved by averaging in IMERG V06 and scaled to match the references in SHARPEN. Indeed, Rajagopal et al. (2021) suggested that volumetric rain rate is a better indicator of the MCS life cycle, so the consistency in volumetric rain rate is reassuring. We should also point out that SHARPEN does not—and is not designed to—handle potential shortcomings in the PMW observations. For example, in the second case study, the substantial difference in rain area observed by MHS and SSMIS (Fig. 10b) may suggest limitations in the PMW retrievals from either sensor, but SHARPEN accepted them as truth and attempted to bridge the rain areas as they were reported. Since it is not clear how such discrepancies should be resolved—in this case, it is unclear if MHS or SSMIS was at fault—artifacts in PMW observations are best handled through improvements in the parent PMW algorithms.
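The behavior seen in both case studies, in which averaging preserves the volumetric rain rate but enlarges the rain area and lowers the peak, can be reproduced with a toy example. The numbers below are invented for illustration, the two fields are averaged with equal weights rather than the Kalman weights, and the grid-box area of 100 km² is an assumed nominal value.

```python
import numpy as np

def object_stats(rate, cell_area_km2=100.0):
    """Return (rain area, volumetric rain rate, maximum rain rate) of a field.

    cell_area_km2 is an assumed nominal grid-box area, not a value from the paper.
    """
    rate = np.asarray(rate, dtype=float)
    raining = rate > 0
    area = raining.sum() * cell_area_km2           # km^2
    volume = rate[raining].sum() * cell_area_km2   # (mm/h) km^2
    return area, volume, rate.max()

# Two hypothetical propagated fields of the same system, slightly displaced,
# as might result from errors in the motion vectors.
forward = np.array([0, 0, 40, 80, 40, 0, 0, 0], dtype=float)
backward = np.array([0, 0, 0, 0, 40, 80, 40, 0], dtype=float)
average = 0.5 * (forward + backward)

for name, field in [("forward", forward), ("backward", backward), ("average", average)]:
    area, volume, peak = object_stats(field)
    print(f"{name:8s} area={area:6.0f} km2  volume={volume:7.0f}  max={peak:4.0f} mm/h")
```

In this toy case each input field covers 300 km² with a peak of 80 mm h⁻¹, whereas their average covers 500 km² with a peak of 40 mm h⁻¹, yet all three have the same volumetric rain rate.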
7. Conclusions
Inspired by the concept of quantile mapping, we devised a new scheme named Scheme for Histogram Adjustment with Ranked Precipitation Estimates in the Neighborhood (SHARPEN). SHARPEN is implemented as a filter that can adjust the distribution of the Kalman precipitation estimates to the local instantaneous estimates for forward propagated precipitation, backward propagated precipitation, and IR precipitation. The scheme maps Kalman estimates to the reference precipitation rates based on their ranks within local templates, weighted by the Kalman correlations. We also introduced a computational shortcut in the form of stride to reduce run time associated with large templates. Note that the conceptual idea underpinning SHARPEN is applicable not just to the Kalman filter or morphing, but to any precipitation field derived by averaging, which changes the PDF of the precipitation values.
Evaluation of SHARPEN confirmed its ability to recover the distribution of the precipitation rates, with a desired reduction in precipitating area and increase in peak precipitation compared to the original Kalman precipitation. Comparison with MRMS revealed an improved HSS but at the expense of slightly reduced correlation, though the lower correlation is possibly also due to the sharper image of precipitation systems. While run time increases substantially with template size, stride was able to mitigate much of the effect with minimal artifacts, especially if stride is much smaller than the template size. A revisit of the CPEX case studies in Rajagopal et al. (2021) demonstrated that SHARPEN was largely able to reduce the artifacts present in IMERG V06 due to the averaging of precipitation fields in the Kalman filter. SHARPEN is currently being considered for implementation in the next version of IMERG, V07.
While SHARPEN represents a step forward compared to the Kalman filter averaging in the original IMERG morphing scheme, the ultimate goal of a Lagrangian interpolation scheme is to implement “true” morphing, i.e., a smooth transition from one frame to another, such as through pixel-to-pixel tracing or feature tracking. However, this endeavor poses a formidable set of challenges. One difficulty in developing a true morphing scheme is the considerable intersensor differences; in other words, two sensors can view the same precipitation scene but produce appreciably different retrievals. Furthermore, true morphing alone may still lack physical evolution of the precipitation system, which is perhaps most prominent when a precipitation system emerges or disappears between successive satellite overpasses. Above all, there is no assurance that a true morphing scheme will produce superior precipitation rate estimates that outperform SHARPEN or even the original morphing scheme, on top of the practical requirement that the algorithm remains computationally lean. These challenges need to be overcome in the development of a true morphing scheme; until then, SHARPEN enforces one critical condition of physical reasonableness by restoring the distribution of the averaged precipitation field.
Acknowledgments
The idea behind SHARPEN originated from discussions with researchers at the University of Utah, including Edward Zipser and James Russell, whom the authors thank for their insightful results that first demonstrated shortcomings in the PDF of the Kalman estimates in IMERG V06. The authors are also grateful to the Precipitation Processing System and, in particular, Patty McCaughey, who performed benchmark tests on the computational performance of SHARPEN. The authors thank three anonymous reviewers for their detailed comments that improved this study. All authors are supported by the NASA Precipitation Measurement Missions funding (program manager Gail Skofronick-Jackson); MR is also supported by NASA Grant NNX17AG74G (program managers Ramesh Kakar and Gail Skofronick-Jackson).
Data availability statement
IMERG is provided by the NASA Goddard Space Flight Center’s IMERG team and PPS, which develop and compute IMERG as a contribution to the GPM mission, and archived at the NASA GES DISC. IMERG can be downloaded at https://gpm.nasa.gov/data/directory; https://doi.org/10.5067/GPM/IMERG/3B-HH/05.
REFERENCES
Climate Prediction Center, 2011: NOAA CPC Morphing Technique (CMORPH) global precipitation analyses. National Center for Atmospheric Research, Computational and Information Systems Laboratory, accessed 27 June 2019, https://doi.org/10.5065/d6cz356w.
Grecu, M., and W. S. Olson, 2020: Precipitation retrievals from satellite combined radar and radiometer observations. Satellite Precipitation Measurement, V. Levizzani et al., Eds., Advances in Global Change Research, Vol. 67, Springer, 231–248, https://doi.org/10.1007/978-3-030-24568-9_14.
Hoare, C. A. R., 1961: Algorithm 64: Quicksort. Commun. ACM, 4, 321.
Hong, Y., K.-L. Hsu, S. Sorooshian, and X. Gao, 2004: Precipitation estimation from remotely sensed imagery using an artificial neural network cloud classification system. J. Appl. Meteor., 43, 1834–1853, https://doi.org/10.1175/JAM2173.1.
Huffman, G. J., and Coauthors, 2019a: NASA Global Precipitation Measurement (GPM) Integrated Multi-satellitE Retrievals for GPM (IMERG). Algorithm Theoretical Basis Doc., version 6, 34 pp., https://gpm.nasa.gov/sites/default/files/document_files/IMERG_ATBD_V06.pdf.
Huffman, G. J., D. T. Bolvin, E. J. Nelkin, and J. Tan, 2019b: Integrated Multi-satellitE Retrievals for GPM (IMERG) technical documentation. NASA Tech. Doc., 77 pp., https://gpm.nasa.gov/sites/default/files/document_files/IMERG_doc_190909.pdf.
Huffman, G. J., E. F. Stocker, D. T. Bolvin, E. J. Nelkin, and J. Tan, 2019c: GPM IMERG final precipitation L3 half hourly 0.1 degree × 0.1 degree V06. Goddard Earth Sciences Data and Information Services Center, accessed 27 June 2019, https://doi.org/10.5067/gpm/imerg/3b-hh/06.
Huffman, G. J., and Coauthors, 2020: Integrated Multi-satellite Retrievals for the Global Precipitation Measurement (GPM) Mission (IMERG). Satellite Precipitation Measurement, V. Levizzani et al., Eds., Advances in Global Change Research, Vol. 67, Springer, 343–353, https://doi.org/10.1007/978-3-030-24568-9_19.
Joyce, R. J., and P. Xie, 2011: Kalman filter–based CMORPH. J. Hydrometeor., 12, 1547–1563, https://doi.org/10.1175/JHM-D-11-022.1.
Joyce, R. J., J. E. Janowiak, P. A. Arkin, and P. Xie, 2004: CMORPH: A method that produces global precipitation estimates from passive microwave and infrared data at high spatial and temporal resolution. J. Hydrometeor., 5, 487–503, https://doi.org/10.1175/1525-7541(2004)005<0487:CAMTPG>2.0.CO;2.
Kidd, C., 2018: NASA Global Precipitation Measurement (GPM) Precipitation Retrieval and Profiling Scheme (PRPS). Algorithm Theoretical Basis Doc., Version 01-02, 17 pp., https://pps.gsfc.nasa.gov/Documents/20180203_SAPHIR-ATBD.pdf.
Kidd, C., and G. Huffman, 2011: Global precipitation measurement. Meteor. Appl., 18, 334–353, https://doi.org/10.1002/met.284.
Kidd, C., and V. Levizzani, 2011: Status of satellite precipitation retrievals. Hydrol. Earth Syst. Sci., 15, 1109–1116, https://doi.org/10.5194/hess-15-1109-2011.
Kidd, C., 2019: GPM SAPHIR on MT1 (PRPS) radiometer precipitation profiling L2 1.5 hours 10 km V06. Goddard Earth Sciences Data and Information Services Center, accessed 27 June 2019, https://doi.org/10.5067/gpm/saphir/mt1/prps/2a/06.
Kirschbaum, D. B., and Coauthors, 2017: NASA’s remotely sensed precipitation: A reservoir for applications users. Bull. Amer. Meteor. Soc., 98, 1169–1184, https://doi.org/10.1175/BAMS-D-15-00296.1.
Kirstetter, P.-E., and Coauthors, 2012: Toward a framework for systematic error modeling of spaceborne precipitation radar with NOAA/NSSL ground radar–based national mosaic QPE. J. Hydrometeor., 13, 1285–1300, https://doi.org/10.1175/JHM-D-11-0139.1.
Kirstetter, P.-E., Y. Hong, J. J. Gourley, Q. Cao, M. Schwaller, and W. Petersen, 2014: Research framework to bridge from the global precipitation measurement mission core satellite to the constellation sensors using ground-radar-based national mosaic QPE. Remote Sensing of the Terrestrial Water Cycle, Geophys. Monogr., Vol. 206, Amer. Geophys. Union, 61–79.
Kirstetter, P.-E., J. J. Gourley, Y. Hong, J. Zhang, S. Moazamigoodarzi, C. Langston, and A. Arthur, 2015: Probabilistic precipitation rate estimates with ground-based radar networks. Water Resour. Res., 51, 1422–1442, https://doi.org/10.1002/2014WR015672.
Kummerow, C. D., 2017: GPM GMI (GPROF) radiometer precipitation profiling L2A 1.5 hours 13 km V05. Goddard Earth Sciences Data and Information Services Center, accessed 27 June 2019, https://doi.org/10.5067/gpm/gmi/gpm/gprof/2a/05.
Kummerow, C. D., and Coauthors, 2001: The evolution of the Goddard Profiling Algorithm (GPROF) for rainfall estimation from passive microwave sensors. J. Appl. Meteor., 40, 1801–1820, https://doi.org/10.1175/1520-0450(2001)040<1801:TEOTGP>2.0.CO;2.
Kummerow, C. D., S. Ringerud, J. Crook, D. Randel, and W. Berg, 2011: An observationally generated a priori database for microwave rainfall retrievals. J. Atmos. Oceanic Technol., 28, 113–130, https://doi.org/10.1175/2010JTECHA1468.1.
Kummerow, C. D., D. L. Randel, M. Kulie, N.-Y. Wang, R. Ferraro, S. Joseph Munchak, and V. Petkovic, 2015: The evolution of the Goddard profiling algorithm to a fully parametric scheme. J. Atmos. Oceanic Technol., 32, 2265–2280, https://doi.org/10.1175/JTECH-D-15-0039.1.
Maranan, M., A. H. Fink, P. Knippertz, L. K. Amekudzi, W. A. Atiah, and M. Stengel, 2020: A process-based validation of GPM IMERG and its sources using a mesoscale rain gauge network in the West African forest zone. J. Hydrometeor., 21, 729–749, https://doi.org/10.1175/JHM-D-19-0257.1.
Nguyen, P., M. Ombadi, S. Sorooshian, K. Hsu, A. AghaKouchak, D. Braithwaite, H. Ashouri, and A. R. Thorstensen, 2018: The PERSIANN family of global satellite precipitation data: A review and evaluation of products. Hydrol. Earth Syst. Sci., 22, 5801–5816, https://doi.org/10.5194/hess-22-5801-2018.
Olson, W., 2018: GPM DPR and GMI combined precipitation L2B 1.5 hours 5 km V06. Goddard Earth Sciences Data and Information Services Center, accessed 27 June 2019, https://doi.org/10.5067/gpm/dprgmi/cmb/2b/06.
Rajagopal, M., E. Zipser, G. Huffman, J. Russell, and J. Tan, 2021: Comparisons of IMERG version 6 precipitation at and between passive microwave overpasses in the tropics. J. Hydrometeor., 22, 2117–2130, https://doi.org/10.1175/JHM-D-20-0226.1.
Randel, D. L., C. D. Kummerow, and S. Ringerud, 2020: The Goddard Profiling (GPROF) precipitation retrieval algorithm. Satellite Precipitation Measurement, V. Levizzani et al., Eds., Advances in Global Change Research, Vol. 67, Springer, 141–152, https://doi.org/10.1007/978-3-030-24568-9_8.
Schneider, U., A. Becker, P. Finger, A. Meyer-Christoffer, M. Ziese, and B. Rudolf, 2014: GPCC’s new land surface precipitation climatology based on quality-controlled in situ data and its role in quantifying the global water cycle. Theor. Appl. Climatol., 115, 15–40, https://doi.org/10.1007/s00704-013-0860-x.
Schneider, U., A. Becker, P. Finger, A. Meyer-Christoffer, B. Rudolf, and M. Ziese, 2015: GPCC full data reanalysis version 7.0 at 1.0°: Monthly land-surface precipitation from rain-gauges built on GTS-based and historic data: Gridded monthly totals. Global Precipitation Climatology Centre, accessed 27 June 2019, https://doi.org/10.5676/dwd_gpcc/fd_m_v7_100.
Tan, J., W. A. Petersen, and A. Tokay, 2016: A novel approach to identify sources of errors in IMERG for GPM ground validation. J. Hydrometeor., 17, 2477–2491, https://doi.org/10.1175/JHM-D-16-0079.1.
Tan, J., G. J. Huffman, D. T. Bolvin, and E. J. Nelkin, 2019: IMERG V06: Changes to the morphing algorithm. J. Atmos. Oceanic Technol., 36, 2471–2482, https://doi.org/10.1175/JTECH-D-19-0114.1.
Tapiador, F. J., and Coauthors, 2012: Global precipitation measurement: Methods, datasets and applications. Atmos. Res., 104–105, 70–97, https://doi.org/10.1016/j.atmosres.2011.10.021.
Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Elsevier, 676 pp.
Xie, P., R. Joyce, S. Wu, S.-H. Yoo, Y. Yarosh, F. Sun, and R. Lin, 2017: Reprocessed, bias-corrected CMORPH global high-resolution precipitation estimates from 1998. J. Hydrometeor., 18, 1617–1641, https://doi.org/10.1175/JHM-D-16-0168.1.