This study introduces a new daily high-resolution land-only observational gridded dataset, called SA-OBS, for precipitation and minimum, mean, and maximum temperature covering Southeast Asia. This dataset improves upon existing observational products in terms of the number of contributing stations, in the use of an interpolation technique appropriate for daily climate observations, and in making estimates of the uncertainty of the gridded data. The dataset is delivered on a 0.25° × 0.25° and a 0.5° × 0.5° regular latitude–longitude grid for the period 1981–2014. The dataset aims to provide best estimates of grid square averages rather than point values to enable direct comparisons with regional climate models. Next to the best estimates, daily uncertainties are quantified. The underlying daily station time series are collected in cooperation between meteorological services in the region: the Southeast Asian Climate Assessment and Dataset (SACA&D). Comparisons are made with station observations and other gridded station or satellite-based datasets (APHRODITE, CMORPH, TRMM). The comparisons show that vast differences exist in the average daily precipitation, the number of rainy days, and the average precipitation on a wet day between these datasets. SA-OBS closely resembles the station observations in terms of dry/wet frequency, the timing of precipitation events, and the reproduction of extreme precipitation. New versions of SA-OBS will be released when the station network in SACA&D has grown further.
The collection and organization of sufficient meteorological data into comprehensive datasets is fundamental in making climate research possible. For many parts of the world, this process is only just beginning. In Southeast Asia, a collaborative effort between meteorological services in the region to collect station observations with a daily resolution has led to a new dataset available for scientific research, called the Southeast Asian Climate Assessment and Dataset (SACA&D; http://sacad.database.bmkg.go.id).
Meteorological stations are irregularly spaced with low station densities in vast areas of Southeast Asia. The interpolation of these observational datasets to a regular grid makes it much easier to handle climatic information captured in the station observations, and this interpolation is therefore important for climate research. These gridded datasets allow best estimates of climate variables at locations away from observing stations and therefore allow studying climate in data-sparse regions.
Next to providing easily accessible data for research, gridded observational datasets can be used to validate or compare with weather forecasting models, regional climate models, and impact models, for example within the Coordinated Regional Climate Downscaling Experiment (CORDEX) community (Giorgi et al. 2009). Recently, Ngo-Duc et al. (2017) used station observations in the evaluation of regional climate models over the CORDEX–Southeast Asia region, and high-resolution gridded datasets with almost complete spatial coverage over this domain facilitate such comparisons.
Supari et al. (2017) have demonstrated a significant trend in temperature extremes across most parts of Indonesia over the last 30 years, with weaker trends evident in extreme rainfall events over that time period. Their analysis was conducted on a station-by-station basis; gridded data allow for further analysis on the spatial structure of such events.
In this paper we present a high-resolution land-only gridded dataset for the Southeast Asian region for daily precipitation amount and daily minimum, mean, and maximum temperature. The gridded dataset will be referred to as SA-OBS. The development of SA-OBS principally aims to provide an observational basis against which model results can be compared. A direct comparison between model output and interpolated data assumes that the observations and the model are indicative of processes at the same scale. This is especially important for the hydrological cycle, which is of clear societal importance and expected to change with global warming (Held and Soden 2006). A gridded dataset where each grid value is a best estimate of the average of the grid square observations is more appropriate for validating model output than a direct comparison with observational station data. The optimal methodology for interpolating point observations to a regular grid has been the focus of an effort leading to the realization of the European counterpart of SA-OBS (E-OBS; Haylock et al. 2008; Hofstra et al. 2008; van den Besselaar et al. 2011).
Existing gridded observational datasets for Southeast Asia that are comparable to SA-OBS are either based on a less dense network of stations over the area covered by SA-OBS or based on satellite information where the interpretation of the infrared and microwave signals into precipitation has a limited accuracy. None of the existing datasets provides uncertainty estimates.
The Asian Precipitation–Highly-Resolved Observational Data Integration toward Evaluation of Water Resources (APHRODITE) dataset for precipitation (Yatagai et al. 2012) and temperature (Yasutomi et al. 2011) is comparable to SA-OBS in the use of a network of rain gauges as their basis. This dataset covers a large region of Asia including eastern Europe and the Middle East, and covers a large part of our region as well in their “Monsoon Asia” version. Other observational datasets used for the (Southeast) Asian region are derived from satellites such as the NOAA CPC Morphing Technique (CMORPH) dataset (Joyce et al. 2004) and the Tropical Rainfall Measuring Mission (TRMM) dataset (Huffman et al. 2007) for precipitation. These datasets use a combination of infrared and passive microwave observations to derive precipitation. We will show a comparison with these datasets in more detail in section 4.
2. Data collection and quality
SACA&D contains daily station observations for Southeast Asia, covering WMO Regional Association V north of 20°S and a few countries on the southern flank of WMO Regional Association II (see Fig. 1 for the region). The densest coverage is found for daily precipitation amount. The web tools and database infrastructure are based on the European Climate Assessment and Dataset (ECA&D; Klein Tank et al. 2002; Klok and Klein Tank 2008) and part of the International Climate Assessment and Dataset (ICA&D; van den Besselaar et al. 2015) umbrella concept. SACA&D aims to serve a wide user group with meteorological data available for scientific research and some specialized derived products targeted at specific user groups.
The climate data and information offered by SACA&D are brought together by a cooperation of the meteorological services in the region. To fully explore all sources of data and to provide a more extensive historical perspective, data rescue initiatives also contribute to SACA&D. One of these is the Digitisasi Data Historis (Didah) project. This project focused on the digitization and use of high-resolution historical climate data from Indonesia. Didah was a joint project between the national meteorological services of Indonesia and the Netherlands. Also, daily series digitized by the Japanese data rescue initiative of JAMSTEC (Hamada et al. 2002) are included for Indonesia and Timor-Leste.
SACA&D collected daily time series from the meteorological institutes of Australia, Indonesia, Malaysia, the Philippines, Singapore, Thailand, and Vietnam. A dense network of Indonesian rain gauges operated by the Indonesian knowledge and research institute for water management (Pusair) is included in SACA&D. Finally, temperature and precipitation series from the Global Historical Climate Network (GHCN-Daily 2012) for American Samoa, Fiji, Kiribati, the Federated States of Micronesia, Papua New Guinea, Samoa, and the Solomon Islands are added. Efforts are ongoing to increase the number of participating data providers, to increase the number of available stations, and to regularly update the series.
For the creation of the gridded dataset presented in this study, we selected those stations that have daily precipitation and/or temperature series. This resulted in 1393 precipitation stations, 365 stations with minimum and maximum temperature, and 274 stations with daily mean temperature (Fig. 1). The stations are unevenly distributed over Southeast Asia and the number of stations varies over time (Fig. 2). Also large gaps in series exist due to missing values. The coordinates and time period for each included station are available from separate files distributed along with the gridded dataset.
The station series used for the gridding are the so-called blended time series from SACA&D. These are the series that are made as long as possible by combining time series from different sources per station. In this step, series from stations that are within 25 km distance and that have a height difference of less than 50 m are used to extend the series. This blending process is described in detail by SACA&D Project Team (2010). The requirement that a rather short distance should exist between stations when blending is applied means that the climatology of the donating station is likely to be similar to that of the receiving station. An earlier analysis (Marjuki et al. 2016) indicates that 14.4% of precipitation stations have no days of blended data, but it also shows that many records have more than half their data filled in from nearby stations. This points to the fact that many records in SACA&D are actually fairly short. These short records are blended with each other, which is only possible in areas with a sufficiently high station density, like Java and Sumatra. The median value for interstation distance in the cases where blending is applied is just below 10 km.
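The blending criteria stated above can be illustrated with a minimal sketch. It only tests whether a donor station is eligible to extend a receiving station, assuming great-circle (haversine) distance on a spherical Earth; the function names and the dictionary layout are hypothetical, and the actual SACA&D procedure is the one described by SACA&D Project Team (2010).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def can_blend(receiver, donor, max_km=25.0, max_dz_m=50.0):
    """True if the donor lies within 25 km and differs by less than 50 m in height."""
    dist = haversine_km(receiver["lat"], receiver["lon"], donor["lat"], donor["lon"])
    dz = abs(receiver["elev_m"] - donor["elev_m"])
    return dist <= max_km and dz < max_dz_m


# Hypothetical stations, for illustration only
a = {"lat": -6.20, "lon": 106.80, "elev_m": 10.0}
b = {"lat": -6.25, "lon": 106.85, "elev_m": 40.0}
print(can_blend(a, b))
```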
If no other data are available, synoptic messages from the Global Telecommunication System (GTS) are currently used for the most recent time period [for a maximum of 10 years; see SACA&D Project Team (2010) for more details], subject to the same restrictions on interstation distance and elevation difference as apply to nearby stations (van den Besselaar et al. 2012).
The nonblended and the blended time series all underwent a basic quality control before the gridding process started. The QC consists of, for example, checking that the values are within reasonable bounds and not repetitive for a certain number of days (depending on the value). For temperature, additional tests are made to check whether, for example, the daily minimum temperature is below the daily mean and maximum temperature. See SACA&D Project Team (2010) for the specific details. Only data that have passed the quality control are used in the gridding.
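As an illustration of the kind of checks described above, the following sketch flags precipitation values outside plausible bounds or repeated too often, and tests the internal consistency of the three daily temperatures. The bounds, the repetition limit, and the function names are hypothetical placeholders; the operational thresholds (which depend on the value) are given by SACA&D Project Team (2010).

```python
def qc_precip(values, max_mm=500.0, max_repeat=5):
    """Return one flag per day (True = passes). Limits are illustrative assumptions."""
    # Range check: missing values (None) and out-of-bounds values fail.
    ok = [v is not None and 0.0 <= v <= max_mm for v in values]
    # Repetition check: flag runs of identical nonzero values longer than max_repeat.
    run_start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[run_start]:
            v = values[run_start]
            if v is not None and v > 0.0 and i - run_start > max_repeat:
                for j in range(run_start, i):
                    ok[j] = False
            run_start = i
    return ok


def qc_temperature(tmin, tmean, tmax):
    """True if the daily minimum does not exceed the mean, and the mean the maximum."""
    return tmin <= tmean <= tmax
```

Dry spells (runs of zeros) are deliberately not treated as suspicious repetitions in this sketch, since long dry periods are physically plausible.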
b. Other datasets
The APHRODITE dataset is a daily precipitation (Yatagai et al. 2012) and temperature (Yasutomi et al. 2011) dataset for Asia. The dataset is created primarily with data obtained from station observations, and the interpolation of station observations to a grid is done by angular distance weighting. This dataset is released at 0.25° and 0.5° resolution for the period 1951–2007. In our study, we use the 0.25° version.
Other gridded observational datasets are derived from microwave and infrared observations from satellites. One of these is CMORPH (Joyce et al. 2004), which is also available at daily resolution on a 0.25° grid for the period 1998 until about the present time. We have used the daily fields from CMORPH V1.0 (raw; 0000–2400 UTC). Another satellite-derived precipitation dataset is TRMM (Huffman et al. 2007). In our study daily fields of TRMM 3B42 version 7 (2100–2100 UTC) are used, which are available for the period 1 January 1998 until about the present.
These three datasets are used for comparison with the new SA-OBS dataset.
c. Data issues
An issue that needs some attention is the assigning of a date to the rain gauge measurement. Usually, the daily amount of rain is gauged in the morning, but the exact timing differs from country to country. Also, the date attached to this measurement is not always clear, in that sometimes the start of the accumulating interval is used and sometimes the end in setting the date of the measurement. As a result, it is unclear to which date the 24-h precipitation amount actually relates. For example, the guideline of the Australian Bureau of Meteorology (BOM) is to measure 24-h accumulated precipitation at 0900 local time and assign the date of the day on which the measurement is made to this measurement. The consequence is that the 0900–0900 local time interval over which the precipitation is accumulated largely overlaps with the day preceding the day related to the date of the measurement. To account for this, the precipitation series of the BOM are shifted one day backward in SACA&D (so the measurement originally assigned to 2 January 1984 is shifted to 1 January 1984). For other countries in Southeast Asia, either the metadata of the precipitation series do not indicate that a shift is appropriate, or the metadata are insufficiently detailed for us to be certain whether a day shift is appropriate or not.
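The one-day backward shift applied to the BOM series amounts to relabeling each measurement with the preceding date. A minimal sketch, assuming a series stored as a hypothetical mapping from date to precipitation amount, is:

```python
from datetime import date, timedelta


def shift_one_day_back(series):
    """Relabel each daily value with the preceding calendar date."""
    return {d - timedelta(days=1): value for d, value in series.items()}


# The example from the text: 2 January 1984 becomes 1 January 1984.
bom = {date(1984, 1, 2): 12.4}  # amount is an illustrative value
print(shift_one_day_back(bom))
```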
For the Indonesian series a further problem is identified. The guideline of the Indonesian Meteorological Service (BMKG) is to measure accumulated precipitation at 0700 local time and assign the date of the day on which the measurement is made to this measurement. However, a comparison between daily precipitation sourced directly from BMKG with precipitation data from other sources casts doubt on how strictly the guideline is followed. Synoptic data from the GTS contain 6-h precipitation totals measured at 0600, 1200, 1800, and 2400 UTC. With a time difference of 7 h between western Indonesian time and UTC, the 24-h totals over the 0000–2400 UTC time period should match accumulated precipitation over the 0700–0700 local time period, albeit with a day shift (where the GTS data should lag the BMKG data by one day). A correlation between 24-h precipitation sums sourced from the GTS and those from the BMKG shows (for most stations) maximum correlation at lag 0, indicating that the BMKG measurements already relate to the day that overlaps most with the measurement interval. However, the GTS data are very incomplete, making them a less than ideal reference. Comparison with other sources, such as JAMSTEC data, also gives no clear indication that a day shift is needed for the BMKG series.
A further check has been made using the field correlation of SA-OBS and the other gridded datasets APHRODITE, CMORPH, and TRMM. This was done without a day shift and with a 1 day lag or −1 day lag for SA-OBS. The comparison with APHRODITE is shown in Fig. 3, which shows that the highest correlations are reached at zero lag. The same holds for the comparisons with CMORPH and TRMM (not shown). Because of the aggregation in space, these analyses obscure whether series from one data provider require a day shift. A map showing the correlation between SA-OBS and APHRODITE (Fig. 7, discussed in section 4) indicates that no specific country stands out with uniformly low correlation values. Except for the ones mentioned above, the time series are not shifted prior to gridding.
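The lag tests described above can be sketched for a single pair of daily series. This is a simplification: it correlates two time series rather than full fields, and it assumes complete series of equal length. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(a, b, lag):
    """Correlate a[t] with b[t + lag] over the overlapping part of the series."""
    if lag >= 0:
        x, y = a[:len(a) - lag], b[lag:]
    else:
        x, y = a[-lag:], b[:len(b) + lag]
    return pearson(x, y)


def best_lag(a, b, lags=(-1, 0, 1)):
    """Lag (in days) at which the correlation between the two series is highest."""
    return max(lags, key=lambda k: lagged_correlation(a, b, k))
```

A maximum at lag 0 suggests that both series assign the accumulation to the same calendar day, whereas a maximum at lag +1 or −1 would point to a one-day offset in the date convention.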
3. Gridding methodology
The gridding method used for SA-OBS is based on the one for the European gridded dataset E-OBS as described by Haylock et al. (2008). It involves kriging using a geographically independent variogram. The full procedures are described by Haylock et al. (2008), so only a summary is given here.
Kriging involves solving a set of linear equations to minimize the variance of the observations around the interpolating surface. This least squares problem therefore assumes that the station data being interpolated are homogeneous in space, which is not the case here.
This problem is addressed by adopting a three-step methodology of interpolating the daily data. First, the monthly mean is interpolated with thin-plate splines using elevation to define the underlying spatial structure of the data. Second, the daily anomalies with regard to the monthly mean are interpolated. Third, the interpolated daily anomaly is applied to the interpolated monthly mean to create the final result. This approach is very similar to universal kriging (Journel and Huijbregts 1978), where a polynomial is fit to the underlying spatial trend. In such a large and complex region as Southeast Asia, thin-plate splines are a more appropriate method for trend estimation than polynomials. A minimum of 4 stations and a maximum of 25 stations within 450 km of a specific grid square were used in the interpolation of precipitation. For temperature, a distance of 500 km was used. If the minimum station limit was not met, the grid square value was set to missing. Figure 4 shows decorrelation lengths as determined from the variogram calculations. For precipitation, the value levels off around 350–400 km, so a maximum search radius of 450 km is reasonable. For temperature, the values level off at distances greater than 500 km, which means that our maximum search radius of 500 km is on the conservative side.
A second-order trivariate thin-plate spline is used in the gridding of the monthly temperature and precipitation values following the example of Haylock et al. (2008) and as used in the E-OBS dataset. In this spline the latitude and longitude values for each station are scaled in the unit of degrees and the altitude values are scaled in kilometers; this follows the recommendations of Hutchinson (1995, 1998). In the case of temperature this trivariate spline implicitly contains the empirically derived lapse rate, as determined at the station locations. For precipitation such a trivariate spline is likely to be a simplification of the complex spatial variation of the precipitation field, but the error from trivariate splines as used in this interpolation has been shown to be comparable to that of splines that incorporate more detailed environmental parameters (Hutchinson 1998). The altitude values used in the spline model were obtained as metadata from the station files, and where these were missing they were estimated from the global 30 arc-second digital elevation model (GTOPO30; https://lta.cr.usgs.gov/GTOPO30) data. Since erroneous altitude values can have a large effect on the gridding output (Sharples et al. 2005), the altitude values provided in the station files were first checked against values interpolated from the GTOPO30 data.
Since the daily values gridded using kriging are anomalies from the respective monthly mean or total, ordinary kriging was used without external drift values based on altitude. In the final stage of the gridding procedure, when the gridded anomalies are converted back to absolute values, the altitude information from the gridded monthly values is incorporated into the daily gridded fields.
For precipitation, the rainfall was first transformed to a binary distribution depending on being above or below a threshold. A first threshold of 0.5 mm was used to define a rainy day. Adopting thresholds lower than this has been shown to be sensitive to data quality, such as underreporting of small rainfall amounts due to bad observer practice (Hennessy et al. 1999). These binary values were then interpolated to produce values of the probability of observing a rainfall event above 0.5 mm. These probabilities were transformed back to real precipitation. This 0.5 mm threshold is only used for distinguishing between a wet day and a dry day, and it is not used as a real limit on the precipitation itself.
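The station-selection rule for a single grid square can be sketched as follows. The text specifies the search radius and the minimum and maximum number of stations; that the nearest stations are retained when more than 25 are in range is an assumption of this sketch, and the function name is hypothetical.

```python
def select_stations(distances_km, radius_km=450.0, min_n=4, max_n=25):
    """Return indices of the stations to use for one grid square, or None if too few.

    distances_km holds the distance from the grid square to every station.
    Use radius_km=500.0 for temperature, as stated in the text.
    """
    in_range = sorted((d, i) for i, d in enumerate(distances_km) if d <= radius_km)
    if len(in_range) < min_n:
        return None  # grid square value is set to missing
    # Assumption: keep the nearest stations when more than max_n are in range.
    return [i for _, i in in_range[:max_n]]
```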
The gridding is first performed on a regular high-resolution latitude–longitude grid of 0.1° × 0.1°. This master grid is then averaged to produce grid square averages on 0.25° × 0.25° and 0.5° × 0.5° regular grids. The reason for this two-step approach is that the interpolation methods were tuned to reproduce a point observation as accurately as possible, whereas the aim of the gridding was to produce grid square averages. We aimed to have grid squares where the distribution of precipitation occurrence is more comparable to that of a climate model (which is like an area average) than to a point observation, which generally has fewer rainy days. Averaging the high-resolution master grid, where each grid square has a dry/wet distribution similar to that of a point observation, to a coarser grid will achieve this. Grid squares without valid data and sea grid squares are indicated with missing values.
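The effect of averaging the master grid can be illustrated for the 0.5° product, where each coarse square contains exactly 5 × 5 master cells. The sketch below assumes missing cells are stored as None and that the average is taken over the valid cells only; how missing cells are weighted, and how the 0.25° grid (not an integer multiple of 0.1°) is derived, is not specified in the text and is not covered here.

```python
def block_average(fine, factor=5):
    """Average a 2D grid (list of rows) over non-overlapping factor x factor blocks."""
    n_rows, n_cols = len(fine), len(fine[0])
    coarse = []
    for r0 in range(0, n_rows - factor + 1, factor):
        row = []
        for c0 in range(0, n_cols - factor + 1, factor):
            cells = [fine[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)
                     if fine[r][c] is not None]
            # Assumption: a block with no valid cells (e.g., sea) stays missing.
            row.append(sum(cells) / len(cells) if cells else None)
        coarse.append(row)
    return coarse
```

A single wet master cell makes the whole coarse square nonzero, which is why the area average has more wet days than any individual point within it.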
An estimate of the interpolation uncertainty is included in the dataset. The uncertainty determined by interpolating the monthly climatology from all available years was applied to all years because of computational constraints. The method of addressing uncertainty is based on the premise that we would expect higher uncertainty at an interpolated point when the neighbors are more variable. When neighbors are similar, less uncertainty is expected. We applied the method of Yamamoto (2000) to every grid square for every day to arrive at the standard error for the daily anomaly. The final uncertainty at a grid square was calculated by combining the uncertainties from the monthly climatology and the daily anomaly in quadrature (i.e., the square root of the sum of the squares of the two uncertainties). More information on the uncertainty calculation is described by Haylock et al. (2008).
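The combination in quadrature stated above is a one-line calculation. In this sketch the argument names are hypothetical, and both uncertainties are assumed to be expressed as standard errors in the same units.

```python
import math


def combined_uncertainty(sigma_monthly, sigma_daily_anomaly):
    """Square root of the sum of squares of the two standard errors."""
    return math.hypot(sigma_monthly, sigma_daily_anomaly)


print(combined_uncertainty(3.0, 4.0))
```

Because the terms add as squares, the larger of the two uncertainties dominates the total.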
SA-OBS covers the period 1981–2014 whereas APHRODITE covers 1951–2007. Since the CMORPH and TRMM datasets start later (in 1998), the overlapping period 1998–2007 with 0.25° resolution is used in the comparison between the datasets. Using the same grid coverage when comparing the datasets, we calculated the daily precipitation differences between SA-OBS and APHRODITE. From the daily differences, we determined the mean over the whole 1998–2007 period and divided this by the mean daily precipitation in SA-OBS over the same period. Figure 5a shows this ratio for grid squares where at least 75% of the days have valid data. This figure shows that the mean of the differences is positive over the whole area, except for a small part of southern Myanmar at the border with Thailand and southern Philippines, indicating that the averaged daily precipitation amounts in SA-OBS are higher than in APHRODITE over almost the complete domain. The differences are smallest over northern Australia, Thailand, and the Philippines, and highest over Papua, New Guinea.
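The ratio mapped in Fig. 5 can be sketched for one grid square as the mean of the daily differences divided by the mean daily precipitation in SA-OBS. The sketch assumes that a day counts as valid only when both datasets have a value, with missing values stored as None; the function name is hypothetical.

```python
def relative_mean_difference(sa_obs, other, min_valid_fraction=0.75):
    """Mean of (SA-OBS minus other) divided by mean SA-OBS, or None if too few valid days."""
    pairs = [(a, b) for a, b in zip(sa_obs, other) if a is not None and b is not None]
    if not sa_obs or len(pairs) < min_valid_fraction * len(sa_obs):
        return None
    mean_diff = sum(a - b for a, b in pairs) / len(pairs)
    mean_sa = sum(a for a, _ in pairs) / len(pairs)
    if mean_sa == 0:
        return None  # ratio undefined for a grid square that is dry throughout
    return mean_diff / mean_sa
```

A value of 0.5 means that the other dataset gives, on average, half the daily precipitation of SA-OBS at that grid square.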
Figure 5b shows the same as Fig. 5a but for the comparison with CMORPH. In this case, the situation is similar to the APHRODITE comparison with a positive mean of the differences over the whole area. The fraction is 0.5 or higher over the whole domain, except a small part of southern Myanmar at the border with Thailand.
Figure 5c shows the mean for the comparison between SA-OBS and the TRMM satellite-derived precipitation data. This comparison also shows that the mean of the differences is positive, with TRMM underestimating the amount of precipitation compared to SA-OBS. The situation is almost the same for TRMM and CMORPH.
The online supplemental material shows scatterplots between observed precipitation at six selected stations in the region against gridded values of the corresponding grid squares in the gridded datasets (Fig. S1). These scatterplots confirm the results of Fig. 5 and show that the gridded datasets underestimate the observed precipitation, with SA-OBS closest to the observed station values. To make sure that any issues related to variations in the start and end of the 24-h period over which precipitation is measured are reduced, scatterplots of 10-day accumulated values are shown in the supplemental material as well (Fig. S2).
One observation made from Fig. 5 is that the topography is not reflected in the patterns showing the differences between the datasets. Data coverage seems to be more critical for these differences. This is most obvious for northern Australia, for which station data are freely accessible, and Thailand and the Philippines where the National Meteorological Services have shared data with both SACA&D and APHRODITE.
Recently, Herold et al. (2016) showed that various datasets give very different amounts of precipitation, making it difficult to assess how much it actually rains. Such large discrepancies were also noticed in the mean precipitation shown in Fig. 5. They also studied the simple daily intensity index [SDII; total precipitation on wet days (≥1 mm) divided by the number of wet days] and the number of wet days (RR1). Here we calculated annual values of these indices for the years in the common period 1998–2007 and averaged these in space using the same grid coverage for all datasets. These index time series are shown in Fig. 6, where differences between datasets are observed as well. While the mean precipitation in SA-OBS is higher over the whole period in Fig. 5, it is interesting to see that SDII is highest for TRMM and lowest for APHRODITE, with comparable values for SA-OBS and CMORPH in between. In contrast, APHRODITE shows the highest number of wet days, while the values for SA-OBS and the satellite datasets are considerably lower (except for 1999). SA-OBS shows consistently more wet days than the satellite datasets, but the difference between SA-OBS and either TRMM or CMORPH is much smaller than that between APHRODITE and the satellite datasets. These differences most likely come from different input data (stations or satellite), interpolation techniques (kriging or angular distance weighting), and station density (SA-OBS and APHRODITE).
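The two indices can be computed from one year of daily values as in the following sketch. It assumes the conventional 1 mm wet-day threshold for RR1 and SDII and treats missing days (None) as not wet; the function name is hypothetical.

```python
def rr1_and_sdii(daily_mm, wet_threshold_mm=1.0):
    """Return (RR1, SDII): number of wet days and mean precipitation on wet days."""
    wet = [p for p in daily_mm if p is not None and p >= wet_threshold_mm]
    rr1 = len(wet)
    sdii = sum(wet) / rr1 if rr1 else None  # undefined when there are no wet days
    return rr1, sdii


print(rr1_and_sdii([0.0, 0.4, 1.0, 9.0, 20.0]))
```

The two indices respond in opposite directions to spurious light rain: adding many days just above the threshold raises RR1 and lowers SDII, which is consistent with a dataset showing the highest wet-day count and the lowest intensity.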
The correlation between daily fields of SA-OBS and the three other datasets for each grid square is shown in Fig. 7 for the overlapping period. The overall correlation with APHRODITE (Fig. 7a) is highest (average correlation of 0.48), compared to the other two datasets. The correlation with CMORPH (Fig. 7b with average correlation of 0.30) is very low, with slightly better values for Australia and northern Philippines. TRMM (Fig. 7c with average correlation of 0.34) shows about the same correlations as CMORPH.
The relatively strong correlations over northern Australia, the Philippines, Thailand, and Vietnam between SA-OBS and APHRODITE must be related to the availability of rain gauge data for these countries, which are used in both datasets. It is interesting that the correlation map shows the actual locations of the rain gauges in Thailand and Vietnam as high correlation islands, whereas correlations in the areas between stations drop to about 0.6. The rain gauge data of these stations are most likely shared by these two datasets while different gridding approaches relate to the diverging results in grid squares away from rain gauge stations.
The good temporal relation between SA-OBS and TRMM over northern Australia and the Philippines presumably relates to the calibration of the satellite precipitation estimates with rain gauge data (Huffman et al. 2007).
Finally, the supplemental material shows the comparison between SA-OBS and GPCC Full Data Reanalysis Version 7 (Schneider et al. 2011) for average January and July conditions, where SA-OBS is aggregated to the coarser 0.5° resolution. The comparison covers the 1998–2007 period (Fig. S3) and shows generally good agreement between the two datasets.
1) Comparison against station data
Since no station list is available for APHRODITE, we determined for each contributing SA-OBS station with data in the overlapping time period if the corresponding APHRODITE 0.25° grid square had one or more contributing stations. This resulted in 323 stations that are used for a comparison of the gridded datasets against station time series. For each station and each gridded dataset, the time series of the grid square in which the coordinates are located is taken in this comparison.
Table 1 shows a contingency table (Wilks 1995) with the fraction of observed and gridded dry and wet days over the common period 1998–2007 and for all stations combined. The threshold for a dry day is set at 0.5 mm. The probability of detection (the fraction of the days when a dry day is observed both at the station and in the corresponding grid square compared to the total number of days when a dry day is observed at the station) is 0.88 for SA-OBS while the other datasets show values of 0.66, 0.75, and 0.79, for APHRODITE, CMORPH, and TRMM, respectively. The false-alarm rate (the proportion of dry days in the corresponding grid square of the gridded data that are not observed at the station) is 0.04 for SA-OBS and 0.15, 0.30, and 0.31 for APHRODITE, CMORPH, and TRMM, respectively. This comparison shows that SA-OBS accurately reconstructs a dry day when a dry day is observed at the station, while the fraction of days that are reconstructed in the gridded field as dry but are observed at the station to be wet is low.
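The two scores defined above can be sketched from paired station and grid square series, with the dry day treated as the event to be detected. The sketch assumes that a dry day is a day with less than 0.5 mm and that days missing in either series are skipped; the function name is hypothetical.

```python
def dry_day_scores(station_mm, grid_mm, threshold_mm=0.5):
    """Return (probability of detection, false-alarm rate) for dry days."""
    hits = misses = false_alarms = 0
    for obs, est in zip(station_mm, grid_mm):
        if obs is None or est is None:
            continue
        obs_dry, est_dry = obs < threshold_mm, est < threshold_mm
        if obs_dry and est_dry:
            hits += 1            # dry at the station and in the grid
        elif obs_dry:
            misses += 1          # dry at the station, wet in the grid
        elif est_dry:
            false_alarms += 1    # dry in the grid, wet at the station
    pod = hits / (hits + misses) if hits + misses else None
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else None
    return pod, far
```

Following the definitions in the text, the probability of detection is normalized by the number of dry days at the station, whereas the false-alarm rate is normalized by the number of dry days in the grid.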
To quantify the skill of the gridded datasets in reproducing extreme precipitation events, we focus on days when the accumulated precipitation equals or exceeds 20 mm. Table 2 shows the relative frequencies with which days with and without observed extreme precipitation at the station are reproduced in the various grids.
The bias in the gridded dataset (the ratio of the relative frequency of extreme precipitation reproduced in the grid to the relative frequency of extreme precipitation observed at the station; Wilks 1995) varies between the datasets. For SA-OBS, APHRODITE, CMORPH, and TRMM, these values are 0.85, 0.50, 0.70, and 0.96, respectively. All datasets have a bias less than one, meaning that days with accumulated precipitation of 20 mm or more occur less often in the grid than observed at the station. This is to be expected for a gridded dataset with a 0.25° resolution, in which rare events observed in the rain gauge network are underestimated. The APHRODITE dataset stands out with a bias very different from one, indicating that the number of days with extreme precipitation is underestimated most strongly in this dataset, which is surprising since only grid squares in APHRODITE that have a station close by are used in this analysis.
The probabilities of detection of a day with ≥20 mm of precipitation for the SA-OBS, APHRODITE, CMORPH, and TRMM datasets are 0.68, 0.30, 0.29, and 0.37, respectively. The fact that the probability of detection of the APHRODITE, CMORPH, and TRMM datasets is not near one, while the bias is closer to one, indicates that many of the days with accumulated precipitation ≥20 mm in these datasets are days on which extreme precipitation is not observed at the station.
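The argument above can be made explicit with a short derivation. With a hits, b false alarms, and c misses, the bias is (a + b)/(a + c) and the probability of detection is a/(a + c), so the fraction of extreme days in the grid that are not matched by an extreme day at the station is b/(a + b) = 1 − POD/bias. The sketch below applies this to the rounded values quoted in the text; the results are indicative only, since the inputs are rounded and pooled over all stations, and the function name is hypothetical.

```python
def unmatched_fraction(bias, pod):
    """Fraction of gridded extreme days with no extreme observed at the station."""
    return 1.0 - pod / bias


# (bias, probability of detection) as quoted in the text
reported = {
    "SA-OBS": (0.85, 0.68),
    "APHRODITE": (0.50, 0.30),
    "CMORPH": (0.70, 0.29),
    "TRMM": (0.96, 0.37),
}
for name, (bias, pod) in reported.items():
    print(f"{name}: {unmatched_fraction(bias, pod):.2f}")
```

Under these assumptions roughly one in five extreme days in SA-OBS has no counterpart at the station, against roughly three in five for the two satellite products.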
Skill scores used in the comparison of wet and dry days, like the base rate, are unsuitable to compare more extreme events since they tend to degenerate to trivial values for extreme events (Wilks 1995). A recently introduced skill score that gives a more reliable verification of rare events is the Symmetric Extremal Dependency Index (SEDI; Ferro and Stephenson 2011). Although this measure is recommended for use in recalibrated situations only, where the bias in reproducing the extreme event in the gridded dataset is removed, we apply this measure to the uncorrected gridded datasets.
The SEDI varies in the interval [−1, 1], where perfect forecasts receive a score of one, random forecasts receive a score of zero, and forecasts inferior to random receive negative scores. For SA-OBS, APHRODITE, CMORPH, and TRMM, these values are 0.86, 0.56, 0.47, and 0.51, respectively. All scores are positive and the SA-OBS score is closest to one, a good result that will be partly related to using the verifying stations as input to the gridding of SA-OBS, although APHRODITE also has at least one station in the 0.25° grid square used in the comparison.
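For reference, the SEDI of Ferro and Stephenson (2011) can be written in terms of the hit rate H (hits divided by observed events) and the false alarm rate F (false alarms divided by observed nonevents). The sketch below is a direct transcription of that formula and assumes both rates lie strictly between 0 and 1, since the logarithms are undefined at the end points.

```python
import math


def sedi(hit_rate, false_alarm_rate):
    """Symmetric Extremal Dependency Index for 0 < H < 1 and 0 < F < 1."""
    h, f = hit_rate, false_alarm_rate
    if not (0.0 < h < 1.0 and 0.0 < f < 1.0):
        raise ValueError("hit rate and false alarm rate must lie strictly in (0, 1)")
    num = math.log(f) - math.log(h) - math.log(1.0 - f) + math.log(1.0 - h)
    den = math.log(f) + math.log(h) + math.log(1.0 - f) + math.log(1.0 - h)
    return num / den
```

When H equals F the numerator vanishes and the score is zero, which corresponds to the random forecast; swapping H and F changes the sign of the score.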
2) Climate indices
To investigate the climatology of precipitation extremes between the datasets in more detail over the area, two climate indices are calculated using the gridded datasets. The number of days with very heavy precipitation ( 20 mm) for each year is calculated for each grid square. Figure 8 shows the difference in number of days with precipitation 20 mm between SA-OBS and APHRODITE, CMORPH, and TRMM averaged over the period 1998–2007. It is seen that SA-OBS has more heavy precipitation days than APHRODITE over almost the whole area, with the exception of the border near Myanmar and Thailand and some isolated spots in Malaysia, the Philippines, and New Guinea. CMORPH has more of these days than SA-OBS, especially in New Guinea. Finally, TRMM has many more heavy precipitation days over the area than SA-OBS, where Malaysia, parts of Sumatra (Indonesia), Kalimantan (Indonesia), and New Guinea stand out.
Recurrent droughts are a serious problem for Southeast Asia, where low crop yields are usually related to a failure of the wet season (Marjuki et al. 2016; Moron et al. 2009). To analyze the behavior of SA-OBS in reproducing droughts, the longest period with consecutive dry days for each year is calculated and compared with the three alternative gridded datasets. Figure 9 shows the difference in the maximum number of consecutive dry days between SA-OBS and APHRODITE, CMORPH, and TRMM averaged over the period 1998–2007. SA-OBS almost uniformly has longer dry periods than APHRODITE, with the exception of Cambodia and Papua (Indonesia), where the reverse is the case. CMORPH has slightly longer dry periods than SA-OBS for most of the northern part of the domain. TRMM has overall shorter dry periods than SA-OBS, with the exception of northern Vietnam and parts of Cambodia and Papua (Indonesia).
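The dry-spell index amounts to finding the longest run of dry days within a year. The sketch below assumes a dry day is one with less than 1 mm of precipitation, which is a common convention but an assumption here, as is the absence of missing values; the function name is hypothetical.

```python
def longest_dry_spell(daily_precip, wet_threshold=1.0):
    """Length of the longest run of consecutive days below wet_threshold (mm)."""
    longest = current = 0
    for p in daily_precip:
        if p < wet_threshold:
            current += 1          # extend the running dry spell
            longest = max(longest, current)
        else:
            current = 0           # a wet day ends the spell
    return longest
```

Because the result depends on the wet-day threshold, the same threshold would need to be applied to all datasets before their dry-spell lengths are compared.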
The SA-OBS daily mean temperature dataset is compared to the APHRODITE mean temperature gridded dataset, available for the “Monsoon Asia” region over the period 1961–2007 (Yasutomi et al. 2011). Figure 10a shows the mean of the differences for grid squares where at least 75% of the days have valid data (common period is 1981–2007). The coverage over Southeast Asia in SA-OBS is worse for temperature than for precipitation. This relates directly to the sparseness of the weather station network available to SACA&D. We expect coverage to improve in future versions of SA-OBS when additional series become available. In most areas SA-OBS has a higher mean temperature than APHRODITE, except for a large part of Sumatra (Indonesia). The correlation between SA-OBS and APHRODITE for mean temperature is shown in Fig. 10b, which indicates a high correlation between the two datasets over the whole domain.
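For a single grid square, the mean difference and correlation with a data-availability screen could be computed along the following lines. This is only a sketch: it assumes that a day counts as valid when both datasets have a value, that missing days are stored as `None`, and that the Pearson correlation is the measure used; none of these details, nor the function name, is specified in the text.

```python
import math

def compare_series(a, b, min_valid_fraction=0.75):
    """Return (mean difference a - b, Pearson correlation), or None.

    a and b are equal-length daily series for one grid square. The grid
    square is skipped (None is returned) when fewer than
    min_valid_fraction of the days are valid in both series.
    """
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not a or len(pairs) < min_valid_fraction * len(a):
        return None
    n = len(pairs)
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a > 0 and var_b > 0:
        corr = cov / math.sqrt(var_a * var_b)
    else:
        corr = float("nan")   # correlation undefined for a constant series
    return mean_a - mean_b, corr
```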
Comparisons in the form of scatterplots where observed values from six stations in the region are compared to the corresponding grid squares of SA-OBS and APHRODITE confirm the results of Fig. 10 and are shown in the supplemental material (Fig. S4).
The supplemental material also shows the comparison between SA-OBS and CRU TS 3.24 (Harris et al. 2014) for average January and July conditions, where SA-OBS is aggregated to the coarser 0.5° resolution (Fig. S5). The comparison covers the 1981–2007 period and shows generally good agreement between the two datasets.
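One simple way to bring a 0.25° field onto a 0.5° grid is to average non-overlapping 2 × 2 blocks, as sketched below. The text does not state how the aggregation was done, so the unweighted block mean, the rule that a coarse cell is valid when at least one fine cell is valid, and the function name are all assumptions made for illustration.

```python
def aggregate_2x2(field):
    """Average non-overlapping 2 x 2 blocks of a 0.25-degree field.

    field is a list of rows (lists) with None for missing cells; the
    numbers of rows and columns must both be even.
    """
    n_rows, n_cols = len(field), len(field[0])
    if n_rows % 2 or n_cols % 2:
        raise ValueError("field dimensions must be even")
    coarse = []
    for i in range(0, n_rows, 2):
        row = []
        for j in range(0, n_cols, 2):
            block = [field[i + di][j + dj] for di in (0, 1) for dj in (0, 1)]
            valid = [v for v in block if v is not None]
            # Assumption: one valid fine cell is enough for a coarse value.
            row.append(sum(valid) / len(valid) if valid else None)
        coarse.append(row)
    return coarse
```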
We are not aware of a daily gridded dataset for daily minimum temperature or daily maximum temperature for Southeast Asia. Therefore no comparisons with other datasets can be made for these variables.
c. Observed event
Jakarta experienced a large flood in February 2007 (Aldrian 2008), which was a very localized event. Maps of the Jakarta area for 1 and 2 February 2007 as available in SA-OBS, APHRODITE, CMORPH, and TRMM are shown in Fig. 11. These maps show that an extreme precipitation event is captured in SA-OBS (blue grid squares; 185.9 and 152.1 mm for 1 and 2 February, respectively), while precipitation values in APHRODITE are less extreme (43.0 and 24.9 mm, respectively). The most likely reason for this is that the event was very localized and that APHRODITE does not have a station in the area of the event, preventing the inclusion of this flood in the APHRODITE dataset. In the satellite-derived datasets, the flooding is only weakly observed in CMORPH (37.2 and 9.9 mm, respectively) and TRMM (97.4 and 24.9 mm, respectively). For comparison, the values observed at station Kemayoran-Jakarta were 234.7 mm and 76.7 mm on 1 and 2 February 2007, respectively.
The SA-OBS dataset includes uncertainty fields next to best estimates of the daily precipitation amounts and daily values of minimum, mean, and maximum temperature. The uncertainty estimate relates to variability among neighboring stations, with higher uncertainty at grid squares where the signal from surrounding stations is less homogeneous. Figure 12 shows the average daily uncertainty field for precipitation over the period 1981–2010. This is simply the average of the uncertainties per grid square, independent of the data availability of that grid square. Kalimantan (Indonesia) shows the highest uncertainties, which is related to a low station density. Australia, Thailand, the Philippines, and Java (Indonesia), areas with a higher station density, show the lowest average uncertainties. The same figure for mean temperature is shown in Fig. 13. In this case, the largest uncertainties are found around the northern edge of the domain. Note that a simple daily average is calculated, not involving a division by the square root of the number of samples, as would be appropriate when actual time-averaged uncertainties need to be estimated.
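The distinction in the final note can be made concrete with a small sketch. The first function is the simple average of daily uncertainties described above; the second is one possible estimate of the uncertainty of a time mean, under the assumption (not made in the text) that daily errors are independent, in which case equal daily uncertainties u over n days reduce to u divided by the square root of n. Both function names are hypothetical.

```python
import math

def simple_mean_uncertainty(daily_uncertainties):
    """Plain average of the daily uncertainties for one grid square."""
    return sum(daily_uncertainties) / len(daily_uncertainties)

def time_averaged_uncertainty(daily_uncertainties):
    """Uncertainty of the time mean, assuming independent daily errors.

    Computed as sqrt(sum of u_i squared) / n, which equals u / sqrt(n)
    when every daily uncertainty has the same value u.
    """
    n = len(daily_uncertainties)
    return math.sqrt(sum(u * u for u in daily_uncertainties)) / n
```

For four days with an uncertainty of 2 each, the simple average stays at 2, whereas the independent-error estimate for the four-day mean is 1.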
In this paper a newly developed daily gridded dataset for Southeast Asia, called SA-OBS, has been presented. The available parameters are daily precipitation amount, daily minimum temperature, daily mean temperature, and daily maximum temperature for the period 1981–2014. The dataset is available on regular latitude–longitude grids with resolutions of 0.25° × 0.25° and 0.50° × 0.50°. The underlying station series come from SACA&D, which is a collaboration of the meteorological services in the region and a platform to share daily meteorological observations. When more and longer series become available in SACA&D, SA-OBS will be updated accordingly.
The number of rain gauges used in the gridding of precipitation in SA-OBS is 1393, with the highest density found over Indonesia (particularly Java and Sumatra) and Australia. For mean temperature, 274 stations are used in SA-OBS, and for minimum and maximum temperature, 365 stations. Most of the data over Java and Sumatra have not been available for scientific research before.
The comparisons for precipitation between SA-OBS, another rain gauge-based observational gridded dataset (APHRODITE), two satellite-based gridded datasets (CMORPH, TRMM), and station observations show that the correlation between station observations and the corresponding grid squares is highest for SA-OBS, and that the reproduction in SA-OBS of very heavy precipitation days and dry days matches the station observations best. SA-OBS does have a slight underestimation for higher precipitation amounts as the grid square values represent grid square averages instead of point observations, but the comparisons show that, in terms of the timing of extreme events and the amplitude, SA-OBS is closest to the station observations.
The reproduction of extreme events is further demonstrated by comparing gridded maps for a well-documented extreme precipitation event in the area around Jakarta (Indonesia), which is reproduced in SA-OBS but fails to be significant in the other datasets.
In a comparison between SA-OBS and the three other gridded datasets, it is shown that the longest period with consecutive dry days is generally longer over Southeast Asia in SA-OBS than in the other datasets. For the number of heavy precipitation days, on which at least 20 mm is observed, SA-OBS generally has more of these days than the other rain gauge-based dataset, while the satellite-based datasets overestimate this number in comparison to SA-OBS.
As there are fewer daily gridded observational temperature datasets available than precipitation datasets, only a comparison for mean temperature with APHRODITE has been made. For minimum and maximum temperature this was not possible at all.
Not all Southeast Asian countries in the domain of SA-OBS have contributed to the station dataset SACA&D, and some have contributed only a small part of their rain gauge or weather station network. The quality of any gridded dataset, and SA-OBS is no exception, improves with higher station density. Because of the high station density in SACA&D for many parts of Southeast Asia, SA-OBS is currently the best available daily gridded observational dataset for the region, but there is room for improvement in terms of spatial and temporal coverage. Participation in SACA&D is open to everyone with daily station observations.
SA-OBS will be available from http://sacad.database.bmkg.go.id/download/grid/register.php.
We acknowledge the data providers in the SACA&D project (http://sacad.database.bmkg.go.id). The research leading to these results has received funding from the Didah project, the European Union, Seventh Framework Programme (FP7/2007-2013 and SPA.2013.1.1-02) under Grant Agreements 242093 (EURO4M) and 607193 (UERRA). The authors also acknowledge the support of the Royal Netherlands Embassy in Jakarta, Indonesia, through a Joint Cooperation Programme between Dutch and Indonesian research institutes.
Supplemental information related to this paper is available at the Journals Online website: http://dx.doi.org/10.1175/JCLI-D-16-0575.s1.