1. Introduction
The primary purpose of the Pacific Rainfall (PACRAIN) database (Greene et al. 2008) is to serve as an expanding repository of rainfall records for the tropical Pacific Ocean. Of particular interest are rainfall measurements obtained at locations where the host landmass lacks sufficient size and topographic relief to affect the local rainfall climate via orographic enhancement, or by convective motion induced by land–sea temperature gradients; that is, at locations approximating the spatial homogeneity of open-ocean conditions. Rainfall data obtained at open-ocean proxy locations are crucial for understanding the error structure of remote sensing rainfall estimates and for validation of climate models. It was postulated by Lavoie (1963) that rainfall observations collected at atoll locations are representative of rainfall occurring over the surrounding open ocean, a position adopted by Morrissey and Greene (1993) and Cook and Greene (2019) and broadly in the literature. Unfortunately, atoll rainfall records are scarce in space and time and have become more so in the last 20 years, as stations have been abandoned or decommissioned and various nations’ meteorological services have limited the scientific community’s access to their data.
If atoll-based rainfall data are used as proxies for rainfall over the surrounding open ocean, then rainfall observations taken on oceanographic buoys could also serve that purpose. Self-syphoning, capacitance-type rain gauges were deployed on TAO array buoys (McPhaden et al. 1998; Serra et al. 2001) for varying periods starting in 1997 and continuing to the present, although deployments were largely curtailed after 2012. Rainfall rates derived from these gauges’ raw volume data are publicly available through the Pacific Marine Environmental Laboratory (PMEL) website, courtesy of the Global Tropical Moored Buoy Array (GTMBA) Project office of NOAA/PMEL (see GTMBA 2020) and through the National Data Buoy Center (NDBC) website. Daily accumulations derived from the rainfall dataset collected at TAO array locations represent a valuable addition to the PACRAIN database if the generating algorithm yields estimates that are insensitive to the noise and artifacts inherent to the self-syphoning capacitance gauges.
Because derived rates are already available, it would be convenient to instead incorporate them into the PACRAIN database. Indeed, publishing a newly processed version of a preexisting product (although in terms of accumulation, rather than rate) begs for an explanation. The two products will differ—perhaps enough to change conclusions about statistical metrics sensitive to infrequent and small accumulation errors.
An example of such a metric is the accumulation at 1% probability of exceedance for daily dry-season rainfall for some notoriously dry location in the equatorial eastern Pacific. Where sampling is sparse, and rainfall accumulations are small and sporadic, changes in data quality assurance (QA) and derivation techniques could give the appearance of (relatively, in a geographic–climatologic sense) high-end rain events where there are none or vice versa.
Ultimately, if a new method can be contrived that exhibits better performance in terms of standard error of daily estimates, and/or aggregates thereof, than that of previously published estimates, then that method and its results should be documented and published, if possible.
To facilitate the comparison of a PACRAIN-proposed daily rain-rate estimation method with the technique used by PMEL and NDBC, archived 1-min TAO volume data were obtained through the courtesy of GTMBA personnel and corresponding daily averaged TAO rainfall rates were obtained from the PMEL website. Figure 1 shows the locations of all atoll stations with rainfall records included in the PACRAIN database as well as all locations of TAO array rain gauges with available 1-min volume data and corresponding daily rainfall-rate estimates.
The acquired data do not include rainfall observations from the Triangle Trans-Ocean Buoy Network (TRITON), and, as of publication, we have been unable to obtain similar 1-min volume data for rain gauges deployed in the TRITON network. Furthermore, high-resolution TAO data are not available on an ongoing basis. Improvements in direct, in situ rainfall sampling over the tropical Pacific, due to the addition of TAO data, will be historical in nature for the foreseeable future.
Raw 1-min TAO rainfall data are gathered using R. M. Young Model 50202/50203 self-syphoning rain gauges. These gauges are of the capacitance type, in which a capacitance probe and collected rain serve as the “plates” of a variable capacitor, capacitance being determined by the portion of the probe’s mylar dielectric coating covered by contiguous collected water. As described by Serra et al. (2001), deployed gauges were fitted with PMEL-designed circuits that generated a digitized signal (“counts”) every second, based on the probe’s capacitance, which was converted to average collected water volume in milliliters for 1-min periods via a calibration equation.
Prior to 2005, raw 1-min rain volume data were retrieved after each gauge deployment, subjected to manual QA by PMEL personnel and processed in “delayed mode” into the various averaged rate products and archived. These replaced “real time” transmitted daily rate estimates where available. Moorings have since been upgraded to provide data transmission, including 1-min rainfall data, in real time. Currently, raw rainfall data are processed automatically at the NDBC into averaged rainfall-rate products (Serra 2018).
The daily TAO rainfall rates obtained from PMEL were first recomputed by applying the estimation procedure described in Serra et al. (2001) (the duplicate method) to the raw 1-min volume data. This was done to verify an understanding of the basic derivation technique (the PMEL method) applied by PMEL and NDBC, ensuring that differences between estimates derived via the duplicate method and those obtained via the alternate method described in this paper (the PACRAIN method) can be correctly attributed to the PMEL method as well. The duplicate method is found to yield daily rainfall rates consistent with published PMEL daily estimates where automated QA actions meet criteria set forth in section 2.
Next, the duplicate and PACRAIN methods were applied to synthetic 1-min accumulation data, and the corresponding daily accumulations were compared with the underlying, simulated accumulation signal (without syphon, evaporation, noise, or temperature signals) to establish the nature of their error distributions. The PACRAIN method is shown to produce more reliable daily accumulation estimates than those derived via the duplicate method, most notably for synthetic data with greater noise magnitude and shorter noise decorrelation times. Adopting the PACRAIN method to compute daily TAO accumulation values for incorporation into the PACRAIN database will lead to more robust rainfall estimates and improve understanding of patterns in open-ocean, tropical Pacific rainfall.
Accumulations and rates estimated from self-syphon rain gauge data (via the PACRAIN method or the duplicate method) are vulnerable to some errors that differ from those to which rainfall data collected using manual-read or tipping-bucket rain gauges are subject. Error sources not mentioned or not fully addressed in the literature are described.
2. Duplication of PMEL rainfall-rate estimates
The method for estimating published rainfall rates from raw 1-min volume data is described in general terms by Serra et al. (2001), Serra (2018), and PMEL (2021). The methods are straightforward in the general sense, but multiple possible implementations exist satisfying the given descriptions. For the purposes of duplicating the PMEL methodology, the published descriptions are construed in the ways we judge to be the most intuitive and defensible, while remaining consistent with the available parameters and clearly stated definitions.
Prior to estimating rain rates for comparison with published daily rates, automated QA algorithms were applied to the raw volume data as set forth in Table 1.
Table 1. Quality assurance categories, tests, and actions.
The out-of-range criterion is based upon the range of values commonly seen in the data. There are cases of huge excursions in the lead-ups to syphon events. In manual examinations of the data, no excursions near the “full” reading exceeding ∼10 mm appeared to be random variations about a preceding accumulation trend. Also, it appears that in some cases “empty” does not correspond to volume readings that average to zero. No clear cases of empty gauge readings of less than approximately −2 mm were noted.
Volume spikes are positive spikes, sometimes singular and periodic, at other times clustered and irregular, that lie above the bulk of the data in which they are imbedded. The 4-standard-deviation threshold is used to identify the anomaly; omission of the four largest values in a data point’s neighborhood allows for other spikes in the neighborhood.
Spurious 1-min rates were determined through manual examination of high-rain-rate periods. No rates in excess of 3.5 mm min^{−1} imbedded in monotonic accumulations were identified. Compensating, spurious rates were also excluded.
General noise of up to 1 mm is assumed to be possible immediately following a syphon, so values of <10 ml that are greater than and prior to the minimum value following the syphon are not flagged; otherwise, it is concluded that such data points do not reflect true postsyphon volume and are part of a “syphon tail” feature.
Spurious losses occur between filtered values and have no obvious explanation. They are characterized by persistent losses occurring on the order of minutes and were identified through manual review of data associated with preliminary PACRAIN method negative daily estimates. None were noted with a magnitude less than ∼2 mm. Both estimation methods are directly affected by such losses and broad data flagging is necessary. Even if such losses and all compensating gains occurred in the same day, and no data were excluded, the duplicate method will not necessarily provide a daily rate estimate consistent with that derived for the same 24-h data series with the spurious gains, losses, and intervening data omitted. A 21-point Hann filter applied every 10 min would be an elegant remedy for this result, applying equal total weights to all data not flagged for omission by QA algorithms, if the sum of all sensor-reading excursions from the true collected volume was zero (or very close to zero) for a particular 24-h period. Such is not generally the case: sensor readings may rise or fall erroneously on one day and exhibit compensating changes on a different day. Moreover, spurious gains, whether they follow or precede spurious losses, may be wholly indistinguishable from changes due to real catchment. In these cases, syphon events serve as “landmarks” in the data. Given the design and geometry of the R. M. Young self-syphon gauge, prototypical syphons, with sensor readings beginning and ending at volumes consistent with those seen throughout the particular gauge deployment, are not the plausible result of electronic anomalies or “nonstandard” configurations of gauge and/or collected water (e.g., sloshing or partial syphons of a highly tilted gauge). With this in mind, all data between windows containing syphon events were omitted from the duplication study if they contained spurious losses.
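The equal-total-weight property attributed above to a 21-point Hann filter applied every 10 min can be checked numerically. The following is a minimal sketch, assuming a symmetric Hann window with zero endpoints (as returned by NumPy's `np.hanning`) and a hop of 10 one-minute samples; the function name `total_weights` is illustrative and is not part of the PMEL or PACRAIN processing.

```python
import numpy as np

def total_weights(n_points, window_len=21, hop=10):
    """Sum the Hann weights that each 1-min sample receives from all windows."""
    total = np.zeros(n_points)
    w = np.hanning(window_len)  # symmetric Hann window, zero at both ends
    for start in range(0, n_points - window_len + 1, hop):
        total[start:start + window_len] += w
    return total

weights = total_weights(201)
# Away from the ends of the record, every sample carries the same total weight.
interior = weights[10:-10]
print(interior.min(), interior.max())
```

Under these assumptions, overlapping windows sum to a constant weight for every interior sample, which is why such a filter would treat all unflagged data equally.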
The QA standards adopted resulted in the majority (about 75%) of the raw data being excluded from the duplication procedure. This is not to suggest that the PMEL published rates based on data excluded here are defective in any sense. Rather, the intent is to show replication of the PMEL method itself, so that performance differences seen subsequently, between the duplicate and PACRAIN methods, can be defensibly attributed to methodological differences and not to QA actions. When computing values for publication in the PACRAIN database, data affected by spurious losses were retained and the affected daily estimates were supplemented in the database with a flag value of 64, indicating “consecutive values with this flag may contain offsetting errors.”
In summary, the duplicate method proceeds as follows:

1) Automated QA algorithms are applied to the raw 1-min data.
2) One-minute differences of accumulation equivalents of the 1-min TAO volumes are computed.
3) A 16-point Hann filter (see von Hann 1903) is applied to the 1-min differences at 10-min intervals and the resulting filtered differences are converted into hourly rates.
4) Daily rates are computed as simple averages of the filtered rates.
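The steps listed above can be sketched as follows. This is an illustrative reading of the published description, not the PMEL or NDBC code. It assumes that QA-flagged values are marked as NaN, that 10 ml of collected water corresponds to 1 mm of rain (consistent with the 1 mm and 10 ml figures quoted in the QA discussion), and that the Hann taper excludes its zero endpoints; the name `duplicate_daily_rate` and its parameters are hypothetical.

```python
import numpy as np

ML_PER_MM = 10.0  # assumption: 10 ml of collected water ~ 1 mm of rain

def duplicate_daily_rate(volume_ml, window=16, step=10):
    """Mean rate (mm/h) from a series of 1-min volumes; NaN marks flagged data."""
    depth_mm = np.asarray(volume_ml, dtype=float) / ML_PER_MM
    diffs = np.diff(depth_mm)                 # 1-min differences (mm per min)
    taper = np.hanning(window + 2)[1:-1]      # Hann weights without zero endpoints
    rates = []
    for start in range(0, diffs.size - window + 1, step):
        seg = diffs[start:start + window]
        ok = ~np.isnan(seg)
        if not ok.any():
            rates.append(np.nan)              # no usable data in this window
            continue
        w = taper[ok]
        # Weighted mean difference (mm/min) converted to an hourly rate.
        rates.append(60.0 * np.sum(w * seg[ok]) / np.sum(w))
    return np.nanmean(rates)                  # simple average of filtered rates

# Steady collection of 1 ml per minute corresponds to 6 mm/h.
print(duplicate_daily_rate(np.arange(1457, dtype=float)))
```

The sketch averages every 10-min filtered rate in the supplied series; selecting exactly which windows belong to a given day is a separate choice, discussed in the text that follows.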
The “readme” files associated with published rainfall rates of GTMBA (2020) state that time stamps designate averaging-period centers, so the rate for day d in the duplicate daily estimates corresponds to the rate for day d − 1 in the published daily files time stamped at 1200 UTC. Furthermore, time stamps are at whole multiples of 10 min for the published 10-min rates; therefore, the duplicate-method daily rainfall-rate formula includes a filter window centered on 0000 UTC beginning the daily accumulation period but does not include one centered on 0000 UTC 24 h later. Thus, the daily averaging period is offset by one-half of the Hann window width. Such asymmetry may seem like an unlikely definitional choice because it will introduce some random error into daily rate estimates if estimates are ascribed to symmetric periods centered on 1200 UTC, but testing of other plausible arrangements showed it to be the most consistent with published rates.
Syphon events are initially identified by scanning the data for losses of at least 300 ml volume, as judged by the difference between two filtered volumes obtained from nonoverlapping, 10-min Hann windows (separated by three 1-min data points centered on the time being tested). Where such a loss coincides with a loss of at least 200 ml between consecutive 1-min raw volumes ending with the time being tested, that time is tentatively marked as a syphon event. It is possible for two consecutive times to be marked in this way, in which case the last is taken to mark the syphon.
The duplicate method just described, as well as the previously described QA procedures, were applied to the 1-min delayed-mode TAO volume data to produce daily rainfall-rate estimates, against which published PMEL daily rainfall rates are plotted in Fig. 2.
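The syphon-identification rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: each 10-min Hann window is taken as 10 consecutive 1-min samples, the three-point gap is the tested minute and its two neighbors, and the input contains no missing values. The function name `find_syphons` and its argument names are hypothetical.

```python
import numpy as np

def find_syphons(volume_ml, half=10, gap=3, filt_loss=300.0, raw_loss=200.0):
    """Return indices of 1-min samples tentatively marked as syphon events."""
    v = np.asarray(volume_ml, dtype=float)
    w = np.hanning(half + 2)[1:-1]      # 10 positive Hann weights
    w = w / w.sum()
    g = gap // 2                        # gap points on each side of the tested time
    marks = []
    for t in range(half + g, v.size - half - g):
        before = np.dot(w, v[t - g - half:t - g])          # filtered volume before
        after = np.dot(w, v[t + g + 1:t + g + 1 + half])   # filtered volume after
        if before - after >= filt_loss and v[t - 1] - v[t] >= raw_loss:
            marks.append(t)
    # Where two consecutive times are marked, keep only the last one.
    return [t for i, t in enumerate(marks)
            if i + 1 == len(marks) or marks[i + 1] != t + 1]

# A gauge filling to 450 ml and draining to 20 ml at sample 100.
series = np.concatenate([np.linspace(0.0, 450.0, 100), np.full(50, 20.0)])
print(find_syphons(series))
```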
In five cases (plotted as open circles), four of which occurred for days lacking syphon events, the duplicate method yielded a daily rate differing from the published value by more than 0.2 mm h^{−1}—visibly outside the cluster of values along the regression line and more consistent with rates estimated in our manual review of the relevant data. In one case (plotted as a plus sign), a difference of more than 0.2 mm h^{−1} was associated with better agreement of the published daily rate with the manually estimated rate. Collected volume is “path independent,” so, to the extent sensor readings are representative of collected volume and neighborhoods can be defined such that filtered or regressed values remove noncollection parts of the raw data, data between those neighborhoods, about 0000 UTC and syphon events, are not relevant to the computation of daily rates. Thus, we estimated rates manually based on sensor levels about 0000 UTC and either side of syphon events. Table 2 summarizes the data conditions affecting the estimation of these values and proposes reasons for the observed differences. Gauge locations (left column) are shown as blue open circles in Fig. 1.
Table 2. Manual estimation notes where duplicate and PMEL estimates differ by more than 0.2 mm h^{−1}.
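The path-independence argument behind the manual estimates can be written as a short calculation. The sketch below is one reading of that argument, not the authors' procedure: it assumes representative sensor levels (for example, filtered or regressed values, in millimeters) are available about the two bounding 0000 UTC times and on either side of each syphon. The function and argument names are hypothetical.

```python
def daily_rate_from_levels(level_start_mm, level_end_mm, syphons_mm, hours=24.0):
    """Daily mean rate (mm/h) from levels at 0000 UTC and around each syphon.

    syphons_mm: list of (level_before, level_after) pairs, one per syphon.
    """
    # Net change in retained water over the day ...
    accumulation = level_end_mm - level_start_mm
    # ... plus the water removed by each syphon, which was also collected.
    accumulation += sum(before - after for before, after in syphons_mm)
    return accumulation / hours

# Level rises from 5 to 12 mm over a day that includes one syphon from 50 to 1 mm.
print(daily_rate_from_levels(5.0, 12.0, [(50.0, 1.0)]))
```

Data between these neighborhoods do not enter the calculation, which is the sense in which they are "not relevant" to the daily rate.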
The slopes and coefficients of regression of Fig. 2 do not exclude the possibility that published values are derived using a technique other than the “duplicate” method. An examination of the distribution of differences between duplicate method and published rates, as seen in Fig. 3, is more telling.
For days without syphon events, the majority (>68%) of the estimates (with the duplicate method’s estimates also rounded to the nearest 0.01 mm h^{−1} and not including cases where either estimate is zero) are identical. Two-thirds of the remainder of the differences can be accounted for by the omission, via QA action(s), of sufficient data to change a single filtered 10-min rate estimate by ±1.44 mm h^{−1}. Only two of the “outlying” differences found were inexplicable in terms of plausible (and subjectively minor) QA actions taken prior to computation of the published rate; those two cases defy the simplest reading of the raw data and cannot be indicative of the underlying methodology used elsewhere. For days where syphons are present, differences were not as concentrated at 0 ± 0.01 mm h^{−1} but are still explainable in terms of QA actions. This behavior was expected, as it would be consistent with QA actions affecting the published rates but not the duplicate method rates for 10-min windows containing syphon events. The often-noisy data associated with rapid accumulations (syphons are more likely to occur if it is raining and it is more likely to be raining heavily if it is raining at all) immediately preceding syphon initiation are ripe for QA flags. Sampling at these points is critical because it is often raining, and possibly raining heavily, when a syphon occurs, so we limited QA near syphon events to the out-of-range and spurious 1-min rate tests; flagging by either of these tests outside the standard 3-min syphon “blackout” caused omission of the day from the comparison.
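One way to read the ±1.44 mm h^{−1} figure is as the change in a single 10-min rate that shifts the daily mean by one rounding increment. This assumes the daily rate is the simple average of the 144 filtered 10-min rates in a day (0000 to 2350 UTC); the symbols below are introduced only for this illustration.

```latex
\Delta \bar{R}_{\mathrm{day}}
  = \frac{\Delta R_{10\,\mathrm{min}}}{144}
  = \frac{1.44\ \mathrm{mm\,h^{-1}}}{144}
  = 0.01\ \mathrm{mm\,h^{-1}}
```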
Again, we do not have a complete accounting of QA standards and actions employed in preparing daily rates published on the PMEL and NDBC websites. They are not provided in the literature. In general, from PMEL (2021): “Next, remaining data are checked against a narrower range of error specifications, and those that fall outside this range generate an error alert message. However, questionable data are not automatically removed. Rather, for each error alert, the suspect data are checked for validity by experienced data analysts.”
In summary, we find the rates derived with the duplicate method are consistent with corresponding published rates available through the PMEL and NDBC websites, and it is reasonable to draw conclusions about the performance of the actual PMEL derivation method, based on the performance of the duplicate method. This finding is supported by the following:

The fraction of days included in the comparison in which both the duplicate method rate and published rate were identical is suggestive of common methodology.

The range of rate differences, excluding “visible outliers,” for days with and without syphon events is explicable in terms of plausibly different QA actions.

Four of six visibly outlying rates are explicable in terms of plausibly different QA actions.

For the two rate differences inexplicable in terms of QA differences, the corresponding published rates are not the result of conceivable QA actions followed by a filter/difference method akin, in its known details, to those described in the literature.

While the method descriptions in the literature could be construed in different ways (e.g., differencing filtered values separated by 10 min, or averaging 10-min rates from 0000 to 0000 UTC, rather than from 0000 to 2350 UTC), these methods have shown less agreement with published values. In the case of differencing across 10-min periods, the masking needed about syphons results in big rate fluctuations; the chosen method is, in principle, “blind” to the syphon as long as the syphon loss is omitted.
3. The PACRAIN estimation method
QA actions for the PACRAIN method are identical to those described in section 2, prior to the duplicate method. The criteria for retaining a daily value are different: no more than three flagged values are allowed in each 11-point window centered on 0000 UTC; no more than one flagged value is allowed in each five-point window preceding and following a 3-min syphon blackout period. These criteria result in estimates whose agreement with the published values is almost indistinguishable from that obtained using the duplicate method, as seen in Fig. 4, and is actually slightly better in terms of frequency of identical values on nonsyphon days, as depicted in Fig. 5.
The latter result is not unexpected. If sufficient reliable data exist at the critical times (0000 UTC and syphons), only minimal data are needed for the rest of a 24-h period (to rule in or rule out syphon events). This obviates many of the QA actions otherwise taken. The downside is, of course, that insufficient data in those crucial periods can hardly be mitigated, particularly in the case of syphon events. The relative frequency histograms do not show which estimator is more reliable in any sense, only that the PACRAIN estimator reproduces a larger fraction of the published results for the restricted dataset where there are no syphons. To determine relative reliability, the duplicate and PACRAIN methods will be applied to synthetic data and their estimates compared with the corresponding simulated rainfall rates in section 5.
The outlying point in the left plot in Fig. 4 stems from a published PMEL rate that excludes a syphon event occurring during very high (>2 mm min^{−1}) rain rates.
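The retention criteria stated above can be expressed as a simple check on QA flags. This is a minimal sketch, assuming a Boolean array of per-minute flags, that the 3-min blackout consists of the syphon minute and its two neighbors, and that all supplied indices lie at least seven samples from the ends of the array. The function name `retain_daily_value` and its arguments are hypothetical.

```python
import numpy as np

def retain_daily_value(flagged, midnight_idx, syphon_idx):
    """Return True if a day's estimate meets the flag-count criteria."""
    flagged = np.asarray(flagged, dtype=bool)
    # No more than three flagged values in each 11-point window at 0000 UTC.
    for m in midnight_idx:
        if flagged[m - 5:m + 6].sum() > 3:
            return False
    # No more than one flagged value in the five points before and after
    # the assumed 3-min blackout (s - 1, s, s + 1) around each syphon.
    for s in syphon_idx:
        before = flagged[s - 6:s - 1]
        after = flagged[s + 2:s + 7]
        if before.sum() > 1 or after.sum() > 1:
            return False
    return True

flags = np.zeros(3000, dtype=bool)
flags[799:802] = True   # flags inside the blackout do not count
print(retain_daily_value(flags, midnight_idx=[100, 1540], syphon_idx=[800]))
```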
4. Generation of synthetic 1-min volume time series
The simulation of rainfall time series is widely addressed in the literature, with a broad emphasis on reproducing the statistical properties of observed time series to make inferences about the impacts of extreme events and/or the statistical nature of rainfall at unsampled locations. Brief explanations of Poisson process models, commonly used for simulating rainfall time series, with Bartlett–Lewis and Neyman–Scott clustering processes are given by Rodriguez-Iturbe et al. (1987) and an improved implementation of the Bartlett–Lewis process is tested in Rodriguez-Iturbe et al. (1988). A more generalized simulation procedure is presented by Cowpertwait (1994) in which storms (clusters) are initiated in a Poisson process and a variety of cells, in terms of intensity and duration, may appear within a particular storm. Cell type properties and their distributions are fit from observed data. Cowpertwait (2010) extends the Poisson point process of storm and cell locations to two dimensions, where storms (and cells) are discs, with the appearance of individual cells within a storm being determined by a Neyman–Scott process.
A broad review of rainfall time series simulation methods is given by Sharma and Mehrotra (2010), which covers historically common Markov-chain simulation techniques and includes a section highlighting generation of higher temporal resolution (subdaily) series. Thayakaran and Ramesh (2013) apply three Markov-modulated Poisson process (MMPP) models (see Fischer and Meier-Hellstern 1993) to generate synthetic rainfall time series, where rain gauge tip times are the result of an MMPP informed by a continuous-time Markov chain with three possible intensity states. These models captured the behavior of aggregated rates for individual stations but not extreme rate values.
A comparison of rainfall time series models (the rectangular pulse Poisson model, the Bartlett–Lewis rectangular pulse model, the Bartlett–Lewis model with two cell types, the Bartlett–Lewis rectangular pulse model with cell depth distribution dependent on duration, and the Neyman–Scott rectangular pulse model) is made by Lu and Qin (2012) using rainfall records from the Changi Airport in Singapore. Variations of Bartlett–Lewis models were shown to outperform the Neyman–Scott models. Using a rectangular pulse model for storms with Bartlett–Lewis grouping, after Kaczmarska et al. (2014), and rearrangement/shuffling over two timescale ranges, Kim and Onof (2020) were able to reproduce, at 5-min resolution, the first three moments of the historical rainfall record for Bochum, Germany (1931–99), as well as extreme rainfall for accumulation periods from 5 min to 3 days.
Oriani et al. (2014) take a completely different route, employing an iterative, direct sampling of observed time series. Data values for uninformed random times
Our aims in generating a synthetic time series differ in purpose and requirements from those of the literature cited above. Strictly speaking, our task is not to simulate a rainfall time series; rather, it is to generate a synthetic time series of rain gauge sensor values to be used for testing daily rain-rate estimation methods. While collected volume is intended as a proxy for rain accumulation (after calibration, conversion to depth, wind corrections, etc.), we do not set out to reproduce the statistics of one or more specific rainfall climatologies. On the contrary, for the purpose of this study, simulated sensor output should reflect a range of conceivable collection rates and collection timing, not the marginal rate distributions or correlation structures of particular climates. We wish to gauge the general performance of daily rainfall estimation methods, not their performance relative to a particular rainfall climate. Minimally, self-syphoning rain gauge sensor output also reflects syphon losses, evaporation, effects of temperature change (the sensor circuit’s sensitivity to temperature), and “random” noise. Other features, such as spikes and longer-duration excursions in indicated level, and syphon “tails” can also be incorporated, where the effects of QA algorithms need to be examined.
Our 1-min sensor output simulation procedure entails generating 1.44 × 10^{7}-element (10 000 days), 1-min time series of 1) rain-free period logical mask, 2) collected volume, 3) evaporative loss, 4) random noise, 5) syphon losses, and 6) temperature-induced changes. It should be emphasized that the synthesis methods described below are to illustrate the efficacy of the daily rainfall estimation technique over a broad range of basic properties, not to simulate particular rainfall climates.
a. Rain-free period masks
One-third of the period of record for which synthetic data are produced is masked as rain free. The rain mask M_{rain} is initially set to “true.” A pseudorandom, uniformly distributed time index t_{r} is generated such that 1 ≤ t_{r} ≤ n_{rec} − 1439, and t_{r} corresponds to 0000 UTC. A 1440-min contiguous period of the rain mask beginning at t_{r} is then set to “false.” The process repeats until the one-third threshold is reached. The main parameters of this routine (the size of the random false mask, its timing, and the total fraction masked) are motivated by the desire to have a subjectively large number of full days (corresponding to periods beginning and ending at 0000 UTC, where crucial computations are made) with no accumulation, to provide a control set. Neither the basic PACRAIN method nor the duplicate method will necessarily render rain-rate estimates of zero for any calendar-day period without simulated rainfall, because of noncollection features in the synthetic signal and because estimation windows reach into adjacent days. It is important to know the magnitude of such errors relative to errors on days with rain and whether they can be easily mitigated.
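The masking routine described above can be sketched as follows. This is an illustration under stated assumptions: indices are zero based, the record starts at 0000 UTC so that drawing a random day is equivalent to drawing a time index t_{r} that falls on 0000 UTC, and the random generator and seed are arbitrary. The function name `rain_free_mask` is hypothetical.

```python
import numpy as np

def rain_free_mask(n_days, dry_fraction=1.0 / 3.0, seed=0):
    """Boolean 1-min mask: True where rain is allowed, False on rain-free days."""
    rng = np.random.default_rng(seed)   # assumption: any uniform generator will do
    n_rec = n_days * 1440
    mask = np.ones(n_rec, dtype=bool)   # rain mask initially set to "true"
    while (~mask).sum() < dry_fraction * n_rec:
        day = rng.integers(0, n_days)   # random day, so t_r falls on 0000 UTC
        mask[day * 1440:(day + 1) * 1440] = False  # 1440-min contiguous dry period
    return mask

mask = rain_free_mask(300)
print((~mask).sum() / mask.size)        # fraction of the record masked as rain free
```

Because a day may be drawn more than once, the loop tests the masked fraction rather than counting draws.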
b. Collected volume
c. Evaporative loss
For the purpose of simulation, evaporation is deemed to occur whenever it is not raining and there is water retained in the gauge. In reality, evaporation rates are determined by the temperature of the gauge reservoir and collected water it contains, the ambient air temperature, ambient vapor pressure, and wind. During extended periods of a day or more where dry, quiescent conditions are evident from subjectively low levels of noise in sensor readings, and continuous trends in sensor readings are not positive, as judged by local regression, the 1-min volume data indicate losses of up to 2–3 ml day^{−1}, consistent with Serra et al. (2001). Evaporation of this magnitude is not expected to occur during short (on the order of an hour or less) rain-free periods separating periods of accumulation in longer, otherwise “showery,” periods. At such times, ambient vapor pressures are expected to be close to equilibrium, and gauge/collected water temperatures are expected to be closer to ambient temperatures, due to cloudiness, than they would be during extended stretches of sunny weather. However, the exact nature of evaporation cannot be deduced from the data on rainy days, so the selection of evaporation behavior and the magnitude of its rate is made to maximize its plausible contribution to daily estimate errors.
Note also that syphon event times are related (both in simulation and reality) to retained water volume in the gauge, and daily rate estimation errors for both PACRAIN and duplicate methods are tied algorithmically to periods about syphons, more so than to other periods, due to masking and to the greater likelihood of rain at those times and, therefore, of variations in rain rate. Thus, to ignore evaporation could affect the timing of estimation errors related to syphon events.
d. Noise
Syphon gauge readings are subject to variations that cannot be explained in terms of accumulation, evaporation, or temperature changes, which we will designate as noise in the aggregate. Quantifying (or even fully enumerating) the physical mechanisms underlying noise is beyond the scope of this paper, but obvious sources include the following:

Sloshing of retained water in the reservoir due to rapid changes in tilt, resulting directly in changes in probe surface area covered by a contiguous water mass, also in contact with the jacket—This is mentioned in passing by Yang et al. (2015) and alluded to by Yuter and Parker (2001).

Water adhering to the capacitive probe after syphon drain or slosh and contiguous with the main, “settled” volume of collected water. For example, it can be deduced from the known syphon duration of approximately 30 s (see R. M. Young 2021) and from the sequence of volume measurements following some syphon events, where “noise” is subjectively absent, that water can adhere to the capacitance probe above the level of coverage given the most consolidated configuration of the retained water, sometimes resulting in transient volume readings tens of milliliters more than the true amount. This issue is transient in nature and can be mitigated by omitting from one to three volume data points from the postsyphon regression.

Water adhering to interior of reservoir and/or probe jacket after splash/slosh and subsequent drainage and consolidation

Movement of water in the syphon tube (not syphoning) due to slosh or wind-induced air pressure differential between reservoir and syphon-tube exit

Electronic interference—mentioned in passing by Serra et al. (2001).

Noise due to finite sensor resolution.
Of the sources listed above, the direct effects of slosh (and its secondary effects) are the most prominent for buoy-mounted gauges. Electronic noise is expected to affect small, episodic time domains. Water moving in and out of the syphon tube due to air-pressure differentials is limited by tilt and wind speed. For wind speeds on the order of meters per second, air pressure differentials of a few hectopascals between reservoir and syphon tube can be simulated using known geometry of the self-syphon gauges. A static difference of 1 hPa results in a height difference of approximately 10 mm between water in the syphon tube and reservoir, resulting in an indicated gain/loss of approximately 0.3 ml in volume. Horizontal accelerations and rapid excursions in tilt can be envisioned to result in apparent level changes in the reservoir on the order of centimeters, or apparent volume changes on the order of tens of milliliters.
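The 10-mm height difference quoted above follows from hydrostatic balance. The worked figure below assumes freshwater density ρ = 1000 kg m^{−3} and g = 9.81 m s^{−2}; these values and the symbols are introduced here for illustration.

```latex
\Delta h = \frac{\Delta p}{\rho g}
  = \frac{100\ \mathrm{Pa}}{(1000\ \mathrm{kg\,m^{-3}})(9.81\ \mathrm{m\,s^{-2}})}
  \approx 0.0102\ \mathrm{m} \approx 10\ \mathrm{mm}
```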
Noise vectors are produced for combinations of low, medium, and high values of maximum noise magnitude, and short, medium, and long decorrelation lags. Maximum noise magnitude is modulated by normalizing a vector by the magnitude of its maximum absolute value element and scaling appropriately by η_{max}. The coefficient c_{η} is adjusted to yield a desired decorrelation lag.
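The normalization and scaling step described above can be sketched as follows. The recursion used in the paper to generate the underlying vector is not reproduced in this section, so the sketch substitutes a first-order autoregressive process, with c_{η} playing the role of the lag-1 coefficient. That choice, the Gaussian innovations, and the function name `noise_vector` are assumptions made only for illustration.

```python
import numpy as np

def noise_vector(n, eta_max, c_eta, seed=0):
    """Correlated noise scaled so its largest absolute value equals eta_max."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0]
    for i in range(1, n):
        # Assumed AR(1) recursion: larger c_eta gives a longer decorrelation lag.
        x[i] = c_eta * x[i - 1] + eps[i]
    peak = np.max(np.abs(x))
    # Normalize by the maximum absolute element, then scale by eta_max.
    return eta_max * x / peak if peak > 0 else x

noise = noise_vector(1440, eta_max=1.0, c_eta=0.8)
print(np.max(np.abs(noise)))
```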
e. Syphon losses
f. Temperature-induced changes
Clearly, temperature dependence of the magnitude documented for TAO gauges would represent a potentially bigger problem for nonoceanic and/or midlatitude locations, but in principle, and given well-modeled sensor behavior and availability of temperature readings, the temperature signal could be subtracted from the 1-min volume data. We take (4.9) as a “worst case” scenario.
5. Results: Application of PACRAIN and duplicate methods to synthetic 1-min accumulations
Daily rainfall-rate estimates derived via the PACRAIN and duplicate methods were obtained for combinations of four values of C_{n} from 0.0 to 0.8, and five values of η_{max} from 0.0 to 1.0 mm. A sampling of PACRAIN and duplicate method estimates is plotted against corresponding simulated (true) collection-only rates, for days with and without syphons, in Figs. 6–8.
In all cases examined, coefficients of determination are larger for the PACRAIN method, although the difference is small for small noise magnitude and longer decorrelation times. In general, differences in r^{2} increase with increasing noise maximum and with decreasing decorrelation times, resulting from smaller C_{n}. For a given noise sample, maximum errors of the PACRAIN method on days without syphon events are smaller than those of the duplicate method; moreover, in many cases they are smaller than the visible changes in maximum error for the duplicate method between different combinations of η_{max} and C_{n} (e.g., between Figs. 6b and 7b and between Figs. 7a and 7b). This fact amounts to an important practical difference between the methods since there are no syphon events during the majority of days at the TAO buoy locations.
For days with syphons, the improvement in r^{2} for the PACRAIN method is most evident for η_{max} = 1.0 mm and for shorter decorrelation times, as seen most clearly in Figs. 7a–c. However, in Fig. 8, the PACRAIN method also exhibits larger r^{2} when applied to simulated 1-min data with no noise.
In most cases (depending on sensor level) volume correlation at 1-min lag is less than 0.5, and where it is greater, it is much closer to 0.5 than to unity and correlation does trend to zero. Thus, for this specific (and obvious) case, (5.2) is much less than (5.3). This may seem an absurd example: comparing error statistics of a difference across 24 h with those of the scaled average of two 1-min differences, yet identical arguments can be made for periods smaller than a day (with a similar procedure applied when aggregated to a full day). This means that averaging noncontiguous subinterval differences will always yield less reliable rate estimates than a simple difference across the full interval. Also, beginning and ending volumes used in the PACRAIN method are regressed from the raw data and can be expected to yield better error variance than the data themselves (or filtered averages thereof) where the regression assumptions hold.
6. Summary and comparison of rainfall-rate error sources
Daily rainfall estimates have been derived from 1-min TAO volume data by differencing regressed values obtained at 0000 UTC and about syphon events. This method is shown to be statistically advantageous relative to the method used to derive the rates already published on the PMEL and NDBC websites. These daily rainfall accumulation estimates are published on the PACRAIN website (https://www.pacrain.ou.edu).
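The size of the gap in this example can be illustrated with a back-of-the-envelope calculation. The sketch below is not a reproduction of (5.2) and (5.3); it assumes that each volume reading carries a noise standard deviation `sigma`, that readings a given lag apart have correlation `rho`, and that the two 1-min differences are independent of each other. All numerical values are illustrative.

```python
import math


def rate_se_full_interval(sigma, rho_full, minutes):
    """SE (mm/min) of a rate from one volume difference across `minutes`."""
    return math.sqrt(2.0 * sigma**2 * (1.0 - rho_full)) / minutes


def rate_se_avg_subintervals(sigma, rho_lag1, k):
    """SE (mm/min) of the mean of k independent 1-min difference rates."""
    return math.sqrt(2.0 * sigma**2 * (1.0 - rho_lag1) / k)


sigma = 0.3  # mm, assumed noise standard deviation of a volume reading

# One difference across 24 h, with correlation assumed to have decayed to zero.
full = rate_se_full_interval(sigma, rho_full=0.0, minutes=1440) * 60.0  # mm/h

# Average of two 1-min differences, with lag-1 correlation of 0.5.
sub = rate_se_avg_subintervals(sigma, rho_lag1=0.5, k=2) * 60.0  # mm/h

print(f"full-interval SE: {full:.4f} mm/h, subinterval-average SE: {sub:.2f} mm/h")
```

Under these assumptions the full-interval difference benefits from division by the whole interval length, whereas the averaged short differences only gain a factor of the square root of the number of differences.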
The derived daily accumulations are subject to errors arising from the PACRAIN method (including the QA algorithms and the regression assumptions) as well as from systematic errors in the data such as evaporative losses. As seen in the probability density functions (PDFs) and corresponding differences of Figs. 9 and 10, there is a visible shift of error probabilities toward, and from both sides of, a mode very near the PACRAIN bias at approximately −0.015 mm h^{−1}, from the duplicate method to the PACRAIN method. Both methods have broader error PDFs for smaller C_{n} and larger η_{max}, but the PACRAIN method gives better results in all cases, including for the no-noise case in Fig. 10.
The majority of the bias exhibited by the PACRAIN estimator (approximately −0.016 mm h^{−1} for the cases shown) is explained by simulated evaporation. Removing evaporation from the simulated data results in bias shifts from −0.016 63 to −0.003 91 mm h^{−1} and from −0.016 48 to −0.003 75 mm h^{−1} for the PACRAIN and duplicate methods, respectively, for the no-noise case. When comparing Figs. 10 and 11, a clear shift in error probabilities is observed toward less negative bias for both estimators.
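The arithmetic behind "the majority of the bias" can be made explicit. The sketch below uses only the four bias values quoted above; the function name and the notion of an "evaporative fraction" are introduced here for illustration.

```python
# Biases in mm/h for the no-noise case: (with evaporation, without evaporation).
biases = {
    "PACRAIN": (-0.01663, -0.00391),
    "duplicate": (-0.01648, -0.00375),
}


def bias_budget(with_evap, without_evap):
    """Return the bias shift attributable to evaporation and its share of the total."""
    shift = with_evap - without_evap
    fraction = shift / with_evap
    return shift, fraction


for method, (with_evap, without_evap) in biases.items():
    shift, fraction = bias_budget(with_evap, without_evap)
    print(f"{method}: shift {shift:.5f} mm/h, {100 * fraction:.0f}% of total bias")
```

For both methods the shift is close to −0.0127 mm h^{−1}, roughly three quarters of the total bias in the no-noise case.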
While the change in bias observed in both estimators, when evaporation is removed, is consistent in magnitude, as expected, with simulated evaporation (−0.0125 mm h^{−1} during nonraining periods), evaporative bias does not always explain the majority of bias for the duplicate method; it does for the PACRAIN method. For example, in the case of C_{n} = 0.2 and η_{max} = 1.0 mm, duplicate method bias changes from 0.0269 to 0.0415 mm h^{−1} when evaporation is excluded. The removal of values via QA actions leaves the duplicate method open to uneven sampling of large, random excursions throughout each 24-h period. The nonevaporative bias of daily rates obtained using the PACRAIN method ranges from approximately 0.002 to 0.004 mm h^{−1} for the simulated cases examined.
The derived TAO estimates that are available through the PACRAIN website include negative daily rates, as do those published on the PMEL and NDBC websites. The expected magnitude of evaporation is not more than (and typically less than) that simulated here, since evaporation on cloudy, rainy, and/or cooler days is expected to be much less rapid than it is during long, dry, presumably sunny periods when it is most evident in the data. Thus, we suggest that individual, negative daily values published on the PACRAIN website should be corrected when comparing them with other daily accumulations in the PACRAIN database, obtained from traditional (manual-read or tipping-bucket) gauges, by setting them to zero. Since errors in regressed 0000 UTC gauge level estimates (in the PACRAIN method) result in accumulation errors of opposite sign for consecutive days, negative daily estimates should not be set to zero prior to aggregation into larger accumulation periods. Negative aggregated values obtained from PACRAIN daily accumulations derived from TAO data, e.g., monthly accumulations, should be set to zero.
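The order of operations recommended above matters, as a small sketch shows. The function names and the sample values are hypothetical; the point is only that clipping happens once, at the level at which the values are used.

```python
def daily_for_comparison(daily_mm):
    """Clip individual daily values at zero for day-to-day comparison."""
    return [max(0.0, d) for d in daily_mm]


def aggregate(daily_mm):
    """Sum the signed daily values first, then clip the total at zero."""
    return max(0.0, sum(daily_mm))


# Hypothetical daily accumulations (mm). The first two days carry errors of
# opposite sign from a shared 0000 UTC level estimate.
days = [-0.4, 0.5, 3.0, -0.2]

correct_total = aggregate(days)                    # signed errors cancel
inflated_total = sum(daily_for_comparison(days))   # clipping first adds a positive bias

print(f"aggregate then clip: {correct_total:.1f} mm")
print(f"clip then aggregate: {inflated_total:.1f} mm")
```

Clipping before aggregation discards the negative half of each pair of opposite-sign errors, so the aggregated total is biased high.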
Maximum standard errors in daily rain rate can be deduced from algorithm performance as applied to the synthetic data. Figure 12 shows errors for the PACRAIN and duplicate methods in boxplot form for combinations of C_{n} and η_{max} as in Fig. 9.
The 1-min TAO data contain noise with decorrelation times of at least two minutes, analogous to synthetic noise where C_{n} > 0.2, and magnitudes less than 1.0 mm (10 ml). Standard error for PMEL real-time daily estimates is given by Serra et al. (2001) as 0.03 mm h^{−1}, or less than 1 mm daily accumulation, based on rainy-period laboratory data. Delayed-mode daily estimates (corresponding to the duplicate method) should show some improvement in accuracy over real-time estimates. This standard error estimate is consistent with that of the duplicate method for the case parameters used in the upper-left chart of Fig. 12, in which PACRAIN method standard errors were ∼33% smaller. Duplicate method errors increased for greater noise magnitudes and shorter decorrelation times. While a noise magnitude of 0.3 mm in these simulations implies rate noise standard deviations greater than the 1.3 mm h^{−1} given for the laboratory data used in Serra et al. (2001), observed noise magnitudes are even larger at times, approaching 1 mm. Shorter decorrelation times are also observed.
In contrast, the PACRAIN method yields daily estimates with a maximum overall standard error of 0.03 mm h^{−1} for all simulated datasets examined. For a given noise level, accumulation standard error for the PACRAIN method grows with aggregation over contiguous periods only if those periods contain syphon events. In such cases, accumulation standard error scales as (1 + n_{s})^{1/2}, assuming that volume estimate errors on either side of a syphon event (and a minimum of four minutes apart) are uncorrelated.
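The (1 + n_{s})^{1/2} scaling can be tabulated directly. In the sketch below the syphon-free standard error of 0.4 mm is a hypothetical value chosen for illustration, and the function name is introduced here; the scaling itself relies on the stated assumption of uncorrelated volume estimate errors on either side of each syphon event.

```python
import math


def accumulation_se(se_no_syphon, n_syphons):
    """Accumulation standard error for a period containing n_syphons syphon events.

    Each syphon event adds one more independent volume difference to the
    accumulation, so the variance grows in proportion to (1 + n_syphons).
    """
    return se_no_syphon * math.sqrt(1 + n_syphons)


base_se = 0.4  # mm, hypothetical standard error for a syphon-free period

for n_s in range(4):
    print(f"n_s = {n_s}: SE = {accumulation_se(base_se, n_s):.2f} mm")
```

A period with three syphon events therefore carries twice the accumulation standard error of a syphon-free period of any length, under these assumptions.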
Figure 13 depicts PACRAIN and duplicate method errors for the no-noise case, which approximates the data during very quiescent, rain-free conditions. For the no-noise case the PACRAIN method yields an overall standard error of approximately 0.016 mm h^{−1} (approximately 0.38 mm day^{−1}). PACRAIN and duplicate method biases for the no-noise synthetic data differ by 0.0002 mm h^{−1}, and are explained within 0.004 mm h^{−1} by simulated evaporation.
Both PACRAIN and duplicate method standard errors are functions of sensor noise, the magnitude of which varies from essentially zero during rain-free quiescent periods, up to 1 mm during heavy rain. Serra et al. (2001) estimate sensor resolution as 0.002–0.005 mm and calibration slope repeatability of 0.2% (0.1 mm per syphon event). Sensor resolution can lead to sensor noise of similar magnitude and mask lesser-magnitude noise. The limits of repeatability result in systematic errors of similar magnitude unique to each deployment; such errors are not removed by correcting for changes in pre- and postdeployment calibrations. Observed noise on the order of 0.1 mm clearly exceeds the inherent uncertainty of the gauge output.
Duplicate method standard errors exceed those of the PACRAIN method by at least 50% for the no-noise condition and up to 400% for the case of uncorrelated noise with the maximum magnitude. Overall, simulations demonstrated PACRAIN method standard errors of no more than 0.03 mm h^{−1} or 0.72 mm day^{−1}. Duplicate method standard errors were greater than the laboratory-based error estimate (0.03 mm h^{−1}) for daily values, given by Serra et al. (2001), except for the case of C_{n} = 0.8 and η_{max} = 0.3 mm. For comparison, errors associated with naïve differencing of raw 1-min data across 24 h, during which there are no syphon events or QA actions, are on the order of 0.04 mm h^{−1} for general noise magnitudes of 1 mm. Against that figure, the PACRAIN method standard error may seem to represent a trivial improvement, yet it is an improvement, and it is derived using simulations that include syphon events. Using naïve differencing in the presence of syphons, or where data are otherwise excluded due to QA actions, is untenable.
The performance limit for methods based on averaging Hann-filtered rates taken at 10-min intervals would be obtained for a 21-point filter. For that case, with no syphons and no QA actions, daily estimates would reduce to the rate obtained by differencing filtered values (not rates) obtained at 0000 UTC on consecutive days. The PACRAIN method would still produce superior results for that idealized case to the extent that the simple linear regression assumptions hold.
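One way to see why the 21-point case is special is that 21-point Hann windows spaced 10 min apart overlap so that every interior minute receives a total weight of one. The sketch below checks that property numerically with NumPy's `hanning`; reading this as the reason for the stated limit is our interpretation, offered as an illustration rather than as the derivation used here.

```python
import numpy as np

window = np.hanning(21)  # 21-point Hann window; both end points are zero
step = 10                # filtered rates are sampled every 10 min
n_minutes = 1440         # one day of 1-min data

# Accumulate the weight each minute receives from all overlapping windows.
coverage = np.zeros(n_minutes + len(window))
for start in range(0, n_minutes, step):
    coverage[start:start + len(window)] += window

# Away from the first few minutes, two windows overlap at every sample.
interior = coverage[step:n_minutes]

print("window sum:", window.sum())
print("interior weight range:", interior.min(), interior.max())
```

With every interior 1-min rate weighted equally, the sum of 1-min differences telescopes, leaving only the tapered contributions near the two ends of the day, which is consistent with a difference of filtered values at consecutive 0000 UTC times.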
The PACRAIN method daily estimates are subject to wind-induced undercatch errors. Serra et al. (2001) approximate, from the empirical results of Koschmieder (1934) and World Meteorological Organization (1962), that wind-induced undercatch typically exceeds 10% for wind speeds of 5 m s^{−1}. Additional information about wind effects on catchment can be found in Nešpor and Sevruk (1999). We have not corrected daily estimates for wind-induced undercatch, as it is a task to which the work discussed here is prerequisite. For example, wind corrections are not properly applied to raw 1-min data, as the correction values would be functions of both volume and noise rates. An appropriate undercatch correction scheme for the PACRAIN method might involve determining baseline local rain rates via regression (possibly using a nonlinear model) and computing corrections based on high-resolution wind data, before applying the PACRAIN method. Corresponding corrections prior to implementing the duplicate method would likely take a different form. Such a scheme also requires, at the very least, rudimentary numerical modeling of flow about the R. M. Young gauge and its effects on trajectories over a spectrum of drop sizes, or field/laboratory testing. Undercatch correction is beyond the scope of this research, although it is potentially the largest source of error.
Follow-on research will begin with implementation of an updated algorithm for generating the PACRAIN historical monthly gridded rainfall estimates. The daily TAO estimates have already been incorporated into the currently available historical grids. Both the derived daily TAO estimates and the monthly gridded product are freely available online (https://www.pacrain.ou.edu).
Acknowledgments.
This work is supported by a research grant from the National Oceanic and Atmospheric Administration (NOAA) Climate Program Office—Global Ocean Monitoring and Observing. Raw data were provided courtesy of the Global Tropical Moored Buoy Array (GTMBA) Project Office of NOAA/Pacific Marine Environmental Laboratory.
Data availability statement.
The daily rainfall estimates derived via the method presented in this research are freely available through the PACRAIN database query page (http://pacrain.ou.edu/rain_query.php).
REFERENCES
Cook, W. E., and J. S. Greene, 2019: Gridded monthly rainfall estimates derived from historical atoll observations. J. Atmos. Oceanic Technol., 36, 671–687, https://doi.org/10.1175/JTECHD180140.1.
Cowpertwait, P. S. P., 1994: A generalized point process model for rainfall. Proc. Roy. Soc. London, 447, 23–37, https://doi.org/10.1098/rspa.1994.0126.
Cowpertwait, P. S. P., 2010: A spatial-temporal point process model with a continuous distribution of storm types. Water Resour. Res., 46, W12507, https://doi.org/10.1029/2010WR009728.
Fischer, W., and K. Meier-Hellstern, 1993: The Markov modulated Poisson process (MMPP) cookbook. Perform. Eval., 18, 149–171, https://doi.org/10.1016/01665316(93)90035S.
Greene, J. S., M. Klatt, M. Morrissey, and S. Postawko, 2008: The comprehensive Pacific Rainfall Database. J. Atmos. Oceanic Technol., 25, 71–82, https://doi.org/10.1175/2007JTECHA904.1.
GTMBA, 2020: High resolution rain data. NOAA/PMEL, accessed 19 February 2020, https://www.pmel.noaa.gov/tao/drupal/disdel/.
Kaczmarska, J., V. Isham, and C. Onof, 2014: Point process models for fine-resolution rainfall. Hydrol. Sci. J., 59, 1972–1991, https://doi.org/10.1080/02626667.2014.925558.
Kim, D., and C. Onof, 2020: A stochastic rainfall model that can reproduce important rainfall properties across the timescales from several minutes to a decade. J. Hydrol., 589, 125150, https://doi.org/10.1016/j.jhydrol.2020.125150.
Koschmieder, H., 1934: Methods and results of definite rain measurements. Mon. Wea. Rev., 62, 5–7, https://doi.org/10.1175/15200493(1934)62<5:MARODR>2.0.CO;2.
Lavoie, R. L., 1963: Some aspects of the meteorology of the tropical Pacific viewed from an atoll. Atoll Research Bulletin, No. 96, Pacific Science Board, Washington, DC, 80 pp., https://doi.org/10.5479/si.00775630.96.1.
Lu, Y., and X. S. Qin, 2012: Comparison of stochastic point process models of rainfall in Singapore. Int. J. Environ. Ecol. Eng., 6, 529–533, https://doi.org/10.5281/zenodo.1079672.
McPhaden, M. J., and Coauthors, 1998: The Tropical Ocean-Global Atmosphere observing system: A decade of progress. J. Geophys. Res., 103, 14 169–14 240, https://doi.org/10.1029/97JC02906.
Morrissey, M. L., and J. S. Greene, 1993: Comparison of two satellite-based rainfall algorithms using Pacific atoll raingage data. J. Appl. Meteor. Climatol., 32, 411–425, https://doi.org/10.1175/15200450(1993)032<0411:COTSBR>2.0.CO;2.
Nešpor, V., and B. Sevruk, 1999: Estimation of wind-induced error of rainfall gauge measurements using a numerical simulation. J. Atmos. Oceanic Technol., 16, 450–464, https://doi.org/10.1175/15200426(1999)016<0450:EOWIEO>2.0.CO;2.
Oriani, F., J. Straubhaar, P. Renard, and G. Mariethoz, 2014: Simulation of rainfall time series from different climatic regions using the direct sampling technique. Hydrol. Earth Syst. Sci., 18, 3015–3031, https://doi.org/10.5194/hess1830152014.
PMEL, 2021: Data quality control. Accessed 18 February 2021, https://www.pmel.noaa.gov/gtmba/dataqualitycontrol.
R. M. Young, 2021: Model 50202/50203 precipitation gage. R. M. Young Doc., 8 pp.
Rodriguez-Iturbe, I., D. R. Cox, and V. Isham, 1987: Some models for rainfall based on stochastic point processes. Proc. Roy. Soc. London, 410, 269–288, https://doi.org/10.1098/rspa.1987.0039.
Rodriguez-Iturbe, I., D. R. Cox, and V. Isham, 1988: A point process model for rainfall: Further developments. Proc. Roy. Soc. London, 417, 283–298, https://doi.org/10.1098/rspa.1988.0061.
Serra, Y. L., 2018: Precipitation measurements from the Tropical Moored Array: A review and look ahead. Quart. J. Roy. Meteor. Soc., 144, 221–234, https://doi.org/10.1002/qj.3287.
Serra, Y. L., P. A’Hearn, H. P. Freitag, and M. J. McPhaden, 2001: ATLAS self-syphoning rain gauge error estimates. J. Atmos. Oceanic Technol., 18, 1989–2002, https://doi.org/10.1175/15200426(2001)018<1989:ASSRGE>2.0.CO;2.
Sharma, A., and R. Mehrotra, 2010: Rainfall generation. Rainfall: State of the Science, Geophys. Monogr., Vol. 191, Amer. Geophys. Union, 215–246.
Thayakaran, R., and N. I. Ramesh, 2013: Multivariate models for rainfall based on Markov modulated Poisson processes. Hydrol. Res., 44, 631–643, https://doi.org/10.2166/nh.2013.180.
Tsoukalas, I., C. Makropoulos, and D. Koutsoyiannis, 2018: Simulation of stochastic processes exhibiting any‐range dependence and arbitrary marginal distributions. Water Resour. Res., 54, 9484–9513, https://doi.org/10.1029/2017WR022462.
von Hann, J., 1903: Handbook of Climatology. MacMillan, 465 pp.
World Meteorological Organization, 1962: Precipitation measurements at sea. WMO Tech. Note 47, 18 pp., https://library.wmo.int/doc_num.php?explnum_id=1742.
Yang, J., S. C. Riser, J. A. Nystuen, W. E. Asher, and A. T. Jessup, 2015: Regional rainfall measurements using the passive aquatic listener during the SPURS field campaign. Oceanography, 28 (1), 124–133, https://doi.org/10.5670/oceanog.2015.10.
Yuter, S. E., and W. S. Parker, 2001: Rainfall measurement on ship revisited: The 1997 PACS TEPPS cruise. J. Appl. Meteor. Climatol., 40, 1003–1018, https://doi.org/10.1175/15200450(2001)040<1003:RMOSRT>2.0.CO;2.