The Performance of ECMWF Subseasonal Forecasts to Predict the Rainy Season Onset Dates in Vietnam

The onset of the rainy season is an important date for the mostly rain-fed agricultural practices in Vietnam. Subseasonal to seasonal (S2S) ensemble hindcasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) are used to evaluate the predictability of the rainy season onset dates (RSODs) over five climatic subregions of Vietnam. The results show that the ECMWF model reproduces well the observed interannual variability of RSODs, with a high correlation ranging from 0.60 to 0.99 over all subregions at all lead times (up to 40 days) using five different RSOD definitions. For increasing lead times, forecasted RSODs tend to be earlier than the observed ones. Positive skill score values for almost all cases examined in all subregions indicate that the model outperforms the observed climatology in predicting the RSOD at subseasonal lead times (∼28–35 days). However, the model is overall more skillful at shorter lead times. The choice of the RSOD criterion should be considered because it can significantly influence the model performance. The result of analyzing the highest skill score for each subregion at each lead time shows that criteria with higher 5-day rainfall thresholds tend to be more suitable for the forecasts at long lead times. However, the values of mean absolute error are approximately the same as the absolute values of the mean error, indicating that the prediction could be improved by a simple bias correction. The present study shows a large potential to use S2S forecasts to provide meaningful predictions of RSODs for farmers.


Introduction
Located in the eastern part of the Indochina Peninsula, Vietnam is a developing country with an economy that substantially depends on agriculture. Rainfall in particular can either encourage growth or devastate crops, so it has direct effects on agricultural production and the income of farmers (Lambert 2014). Agriculture in Vietnam seems to be vulnerable to anomalous variations in rainfall characteristics, as reported in previous studies (e.g., Endo et al. 2009; Ngo-Duc 2014; Trinh-Tuan et al. 2019; Pham-Thanh et al. 2020).
Temporal and spatial variability of rainfall characteristics in Vietnam is quite complex. The rainy seasons in the north, the south, and in the Central Highlands coincide with the Asian summer monsoon (May-October), while it is shifted to boreal autumn-winter (August-December) in the central part (Phan- Nguyen-Le et al. 2014). In the context of global warming, trends in annual rainfall totals over the country fluctuate substantially from station to station and from the north to the south. As an example, the total rainfall has increased in the south but decreased in northern Vietnam in current decades (e.g., Endo et al. 2009;Pham-Thanh et al. 2020). Among rainfall characteristics, the rainy season onset date (hereafter denoted as RSOD) has received major attention because of its important implications not only for agriculture but also for society due to its impact on electricity generation, water management, and human health (e.g., Bombardi et al. 2020). Furthermore, changes in the starting dates of the rainy season or monsoon were documented for Vietnam (e.g., Kajikawa et al. 2012;Ngo-Thanh et al. 2018;Pham-Thanh et al. 2019). Therefore, a reliable prediction of RSODs is one aspect to improve decision-making across several social sectors.
For different regions of the world, many studies have been conducted to estimate the predictability of RSODs using both statistical and dynamical methods to meet the strong demand of society. Statistical predictions of RSODs commonly depend on the influence of the large-scale circulation on local-scale rainfall characteristics (e.g., Omotosho 1992;Omotosho et al. 2000;Lala et al. 2020). For the south and the Central Highlands of Vietnam, previous studies suggested that multivariate linear regression approaches, which use large-scale circulation features such as pressure, moist static energy gradients, outgoing longwave radiation, and wind fields as predictors, provide skillful predictions of RSODs on seasonal time scales (Pham et al. 2010;Pham-Thanh et al. 2019).
For the dynamical approaches, a local-scale onset of rainy seasons or monsoons can only be predictable if it is heavily constrained by the large-scale circulation (Bombardi et al. 2020). Recent studies showed that the onset of monsoons and rainy seasons could be predicted on lead times of weeks to months using dynamical models (e.g., Vellinga et al. 2013; Alessandri et al. 2015; Bombardi et al. 2017; Chevuturi et al. 2019, 2021). Among the different time scales for predicting the future state of the atmosphere, ranging from nowcasting (from 0 to 2 h) to long-range forecasting (from 30 days up to a year), subseasonal to seasonal (from 2 weeks to 2 months, hereafter referred to as S2S) forecasts are of pivotal importance in management decisions for agriculture, water resource management, disaster risk reduction, health, and others (Vitart et al. 2012; White et al. 2017). However, the S2S time scale has long been considered a "predictability desert" (Vitart et al. 2012). Recently, several subseasonal climate prediction datasets from state-of-the-art forecast models have become freely available, for example, the multimodel S2S prediction database hosted by the European Centre for Medium-Range Weather Forecasts (ECMWF), which enables studies to examine the predictive skill of S2S forecasts for different purposes across the world. Several studies demonstrated predictive skill of S2S forecasts at lead times of several weeks for raw output variables such as rainfall, wind speed, and temperature (Tompkins and Feudale 2010; Lynch et al. 2014; Phakula et al. 2020) and their applications to many socioeconomic sectors, especially early warnings of malaria (Tompkins and Di Giuseppe 2015), heat waves (Lowe et al. 2016), and tropical cyclones (Lee et al. 2020).
Recent studies pointed out the ability of dynamical models in capturing the RSOD at subseasonal time scales. In many monsoon regions such as the North and South American, East Asian, and Northern Australian monsoon areas, the onset and demise dates of the rainy season could be forecasted at subseasonal (∼30 days) lead times (Bombardi et al. 2017). Recently, Kumi et al. (2020) conducted a study focusing on the West African monsoon. Their results showed that the dynamical model was able to reproduce the three main phases of the monsoon (i.e., onset, peak, and southward retreat of rainfall) and other dynamics at six lead times (10, 20, 30, 40, 50, and 60 days), but especially in the 20-60-day forecasts. Thus, previous research suggests that the RSODs over a typical monsoon region like Vietnam could be predicted at subseasonal time scales using dynamical models.
For Vietnam, where the rainy season is strongly related to the activity of monsoon systems (Zhang et al. 2002), hitherto no study has investigated the skill of dynamical forecast models in predicting RSODs. Therefore, the present study selects five climatic subregions over Vietnam where the rainy season is mostly dominated by the monsoon, namely, the Northwest (R1), Northeast (R2), Red River Delta (R3), Central Highlands (R6), and Southern Plain (R7) regions (Nguyen and Nguyen 2013;Thanh Nga et al. 2021), to answer the question: Can the raw S2S rainfall prediction be used for skillful subseasonal forecasts of RSODs in the abovementioned subregions of Vietnam? Skillful forecasts would be an essential step to establish an operational S2S forecast system with lead times of more than one week for decision support across the country. The rest of this work is organized as follows: In section 2, detailed descriptions of the data and methods are provided. Section 3 presents the obtained results and analysis. Discussions and conclusions are given in the final section.

a. Data
In this study, two data sources for the 20 years from 2000 to 2019 have been used to evaluate the subseasonal predictability of the RSOD, namely, observed daily rainfall and precipitation hindcasts from the S2S database of the ECMWF. Only forecasts from the ECMWF model (in the following ECMWF-S2S) were used. The ECMWF-S2S hindcasts provide forecasts for 11 members, and the forecasts have a length of 46 days with a spatial resolution of 1.5° × 1.5°. To better select grid points that fall into the different subregions, interpolated data with the resolution of 0.125° × 0.125° were retrieved. The higher resolution data were provided by ECMWF using a linear interpolation on a triangular grid. Note that the higher resolution will not remediate the smoothness imposed by the original resolution, thus the lack of differences between and within subregions, especially in the smaller northern regions, remains. The ECMWF hindcasts are produced for two starting days each week (Mondays and Thursdays) for the last 20 years once a new operational model version is available. In our study, hindcasts corresponding to model version dates of the year 2020 were used. In 2020, two model versions have been in use, namely, CY46R1 until 29 June 2020 and CY47R1 from 30 June 2020 onward. Altogether 23 100 (i.e., 20 years × 105 start days × 11 members) 46-day forecasts were utilized. Time resolution of the forecasts is daily, and the data are freely available online (https://apps.ecmwf.int/datasets/data/s2s/).
To evaluate the ability of ECMWF-S2S in capturing the RSODs, daily rainfall data obtained from meteorological stations of the Vietnam Meteorological and Hydrological Administration were used as reference. To ensure sufficient data quality, only stations that do not have more than three consecutive days with missing values during the periods close to the mean start dates of the rainy season (March-September) were selected. Consequently, data from 93 stations (cf. Fig. 3 for their locations) covering the study period were used.

b. Criteria for determination of RSODs
RSODs can be defined by using a wide range of criteria. In general, the techniques for identifying onset dates can be classified into two main categories: those based on rainfall distribution and those related to the changes in the large-scale atmospheric circulation. However, to answer the question about directly using the raw ECMWF-S2S rainfall forecasts to determine RSODs, this study selects a rainfall-based approach to detect RSODs. This approach was modified and applied in many previous studies for both monitoring RSODs from observed rainfall data and detecting RSODs from rainfall data of numerical weather prediction models (e.g., Stern et al. 1981;Sivakumar 1988;Omotosho 1992;Moron et al. 2009, Kumi et al. 2020).
Here, the RSOD is defined as the first wet day of a rainy period (i.e., lasting for at least N days), which received a given rainfall amount (i.e., larger than P mm) and is not followed by a long dry spell during the following weeks (i.e., the following D days after the RSOD). Typically, the thresholds of P, N, and D are empirically defined and rely on the climatic rainfall conditions in the study region. Thus, these fixed empirical thresholds are not useful for application outside of the study region without adjustments. Previous studies focusing on tropical Africa suggested that the condition representing the rainfall intensity in an initial wet spell (i.e., the P threshold) can be considered the crucial parameter to determine RSODs (Marteau et al. 2009; Boyard-Micheau et al. 2013). Therefore, five different criteria that use five different values for this threshold, namely, P = 20, 25, 30, 40, or 50 mm, were applied to investigate the RSOD for both the observed and hindcast data in this study. In the following these criteria will be referred to as P20, P25, P30, P40, and P50.
In detail, the observed station and forecast gridpoint RSODs are determined as the first day of the year that meets the following conditions:
1) The total rainfall amount of five consecutive days is larger than P mm.
2) The RSOD and at least 3 days in five consecutive days have rainfall amounts larger than 1 mm day⁻¹.
3) No more than seven consecutive dry days in the following 20 days after the RSOD.
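The three conditions can be written compactly in code. The following Python sketch is illustrative only and is not the implementation used in this study. The function names, the convention that the 5-day window starts on the candidate onset day, and the treatment of days with at most 1 mm as dry days are assumptions. Setting `follow_days=None` checks the dry-spell condition over all remaining days of the series instead of the following 20 days.

```python
import numpy as np

def longest_dry_run(rain, wet_threshold=1.0):
    """Length of the longest run of consecutive dry days (rain <= wet_threshold)."""
    longest = current = 0
    for r in rain:
        current = current + 1 if r <= wet_threshold else 0
        longest = max(longest, current)
    return longest

def find_rsod(rain, p_mm=20.0, wet_threshold=1.0, window=5,
              min_wet_days=3, max_dry_run=7, follow_days=20):
    """Return the index of the first day meeting conditions 1-3, or None.

    rain: daily rainfall (mm) for one station or grid point, index 0 = first day.
    Assumption: the `window`-day period starts on the candidate onset day.
    """
    rain = np.asarray(rain, dtype=float)
    for d in range(rain.size - window + 1):
        spell = rain[d:d + window]
        # Condition 1: 5-day rainfall total larger than P mm.
        if spell.sum() <= p_mm:
            continue
        # Condition 2: the onset day is wet and at least 3 of the 5 days are wet.
        if rain[d] <= wet_threshold or (spell > wet_threshold).sum() < min_wet_days:
            continue
        # Condition 3: no more than 7 consecutive dry days after the onset day
        # (within `follow_days` days, or until the end of the series if None).
        after = rain[d + 1:] if follow_days is None else rain[d + 1:d + 1 + follow_days]
        if longest_dry_run(after, wet_threshold) > max_dry_run:
            continue
        return d
    return None
```

Under these assumptions, an early wet spell that is followed by a long dry period is rejected as a false onset, and the search continues to the next candidate day.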
Because the forecast length is only 46 days, the third condition of the abovementioned criteria needs to be modified for the forecasts as follows: 3′) No more than seven consecutive dry days after the RSOD. Figure 1 illustrates the process of determining RSODs using observations and forecasts. The RSODs using station data are determined by directly applying criteria 1-3 to rainfall observations at an individual station within the subregions. To determine forecast RSODs from ECMWF-S2S, the median of all observed RSODs within one subregion (i.e., the subregional RSOD) needs to be known, since lead times of ECMWF-S2S are defined relative to this value. For example, if the observed RSOD is 20 April, a forecast lead time of 7 days refers to the 46-day forecasts starting on 13 April. Using the 46-day forecasts for a particular lead time, RSODs are then determined using the criteria 1, 2, and 3′. Because criterion 3′ requires at least seven forecast days, the longest possible lead time is 40 days. For each forecast member, the subregional RSOD is the median of RSODs determined at all grid points within the subregion. Note that if a RSOD could be determined for less than 50% of the grid points, no RSOD was assigned to the region. Since ECMWF-S2S forecasts are only available on Mondays and Thursdays, RSODs cannot be determined for all lead times. Using the example from above, RSODs cannot be determined, for example, for a lead time of 8 days since ECMWF-S2S forecasts do not start on Sundays. Figure S1 shows the availability of forecasts for each lead time, where 100% would mean that a RSOD could be determined for this lead time in all 20 years. In all subregions, the frequency does not exceed 50%. Therefore, to evaluate the model performance for all lead times, the RSODs of lead times that are not available are defined using the nonweighted mean of the nearest available RSODs on both sides.
Thus, in the case of unavailable lead times the results do not reflect the performance of these particular lead times but are more representative for a period of ±3-4 days around it. A particular date is only considered to be the RSOD if at least half of the grid points within one subregion satisfy all conditions. This condition reduces the number of false alarm onsets, which may occur locally.
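The bookkeeping described above (forecast start date for a given lead time, start-day availability, subregional median, and filling of unavailable lead times) can be sketched in Python as follows. This is a hedged illustration, not the code used in the study. All function names are hypothetical, RSODs are assumed to be stored as day-of-year numbers with NaN or None for undetermined values, and hindcast start dates are assumed to share the calendar day of the Monday and Thursday model-version dates of 2020 (an interpretation that reproduces the 105 start days quoted in the data description). Using a single neighbor when only one side is available is also an assumption.

```python
from datetime import date, timedelta
import numpy as np

# Assumption: hindcast start dates share the (month, day) of the Monday and
# Thursday model-version dates of 2020.
START_DAYS_2020 = {
    (d.month, d.day)
    for d in (date(2020, 1, 1) + timedelta(days=k) for k in range(366))
    if d.weekday() in (0, 3)  # Monday = 0, Thursday = 3
}

def forecast_start(observed_rsod, lead_days):
    """Start date of the forecast with the given lead time to the observed onset."""
    return observed_rsod - timedelta(days=lead_days)

def start_available(start):
    """True if a hindcast is assumed to start on this calendar day."""
    return (start.month, start.day) in START_DAYS_2020

def regional_rsod(gridpoint_rsods, min_fraction=0.5):
    """Median RSOD over grid points; NaN if fewer than 50% of them have an RSOD."""
    vals = np.asarray(gridpoint_rsods, dtype=float)
    valid = vals[~np.isnan(vals)]
    if vals.size == 0 or valid.size / vals.size < min_fraction:
        return np.nan
    return float(np.median(valid))

def fill_missing_lead_times(rsod_by_lead):
    """Fill unavailable lead times (value None) with the unweighted mean of the
    nearest available RSODs on both sides (one side only if the other is absent)."""
    leads = sorted(rsod_by_lead)
    available = [l for l in leads if rsod_by_lead[l] is not None]
    filled = {}
    for l in leads:
        if rsod_by_lead[l] is not None:
            filled[l] = float(rsod_by_lead[l])
            continue
        before = [a for a in available if a < l]
        after = [a for a in available if a > l]
        neighbours = []
        if before:
            neighbours.append(rsod_by_lead[before[-1]])
        if after:
            neighbours.append(rsod_by_lead[after[0]])
        filled[l] = float(np.mean(neighbours)) if neighbours else None
    return filled
```

For the example in the text, `forecast_start(date(2015, 4, 20), 7)` gives 13 April, for which a start is available under the stated assumption, whereas the 8-day lead time would require a start on 12 April, for which none is available.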

c. Distinction of forecasted RSODs between ECMWF-S2S members
The distinction of RSODs between the 11 members of ECMWF-S2S is estimated by the sensitivity index (SI). This index is calculated for each lead time and for each subregion. The SI is defined as where L is the number of years (20 years), M is the number of members (11 members), $x_{kj}$ is the onset date computed for year k by the hindcast from member j, and $x_{ke}$ is the onset date computed for year k by the hindcast from member e. A higher (lower) SI value means that the detected RSODs at a given lead time and region are more (less) distinct between the members. Therefore, this information can highlight regions and lead times that should be treated carefully when calculating the ensemble median values to determine RSODs.
To evaluate the performance of single members against the ensemble median for each lead time and for each subregion, the added value (AV) index is used. This index is defined as where Here n is the number of years; $F_{\mathrm{mod},i}$, $F_{\mathrm{ens},i}$, and $O_i$ are RSODs from single members, RSODs from the ensemble median, and observed RSODs for the ith year, respectively. The AV thus can be used to measure the improvement of the ensemble median over a single member in forecasting RSODs. A positive value of the AV means that using the ensemble median to determine RSODs is better than using a single member.
Besides that, to evaluate the reliability of the ensemble predictions, a simple reliability index (RI) is introduced. The RI is defined in two steps as follows: Step 1: For each of the 20 years, the forecasted RSODs from the 11 members are denoted as $X_1, X_2, \ldots, X_{11}$ and are sorted in ascending order, resulting in $Z_1, Z_2, \ldots, Z_{11}$, where $Z_1$ is the earliest forecasted RSOD and $Z_{11}$ is the latest one. Then, the interval $I = [Z_1, Z_{11}]$ is the range of predictions from all ensemble members.
Step 2: Calculate RI as follows:
$$\mathrm{RI} = \frac{C}{L} \times 100\%,$$
where L is the number of years (20 years), and C is the number of years in which the observed RSOD is within the interval of the ensemble members.
Therefore, if the ensemble is calibrated (i.e., if the ensemble is a random sample from the true distribution of the RSODs), the probability that the observed RSOD is contained in $[Z_1, Z_{11}]$ should be 10/12 = 83.3%, because the event of the observation falling into any of the 12 intervals $(-\infty, Z_1], (Z_1, Z_2], \ldots, (Z_{10}, Z_{11}], (Z_{11}, +\infty)$ should have equal probability. If the RI value is smaller than the nominal coverage of 83.3%, too few observations fall into the ensemble range. This reflects a bias in the ensemble forecasts and/or an ensemble range that is too small (i.e., the ensemble underestimates the true uncertainty). Vice versa, if the RI is larger than the nominal coverage of 83.3%, the ensemble range is too large, and the ensemble overestimates the true uncertainty.
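The two steps and the nominal coverage can be illustrated with a short Python sketch. It assumes that RI is the percentage of years in which the observation lies inside the closed ensemble range, as described in the text; the function names and the array layout (years by members) are hypothetical.

```python
import numpy as np

def reliability_index(member_rsods, observed_rsods):
    """RI in percent: share of years whose observed RSOD lies within [Z_1, Z_M].

    member_rsods: array of shape (L years, M members); observed_rsods: length L.
    """
    members = np.asarray(member_rsods, dtype=float)
    obs = np.asarray(observed_rsods, dtype=float)
    z_first = members.min(axis=1)   # earliest forecasted RSOD per year
    z_last = members.max(axis=1)    # latest forecasted RSOD per year
    inside = (obs >= z_first) & (obs <= z_last)
    return 100.0 * inside.sum() / obs.size

def nominal_coverage(n_members):
    """Expected RI (percent) for a calibrated ensemble: (M - 1) / (M + 1)."""
    return 100.0 * (n_members - 1) / (n_members + 1)
```

With 11 members, `nominal_coverage(11)` evaluates to 10/12, i.e., about 83.3%, the reference value against which the RI is compared.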

d. Forecast verification
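The model performance was evaluated using mean error (ME), mean absolute error (MAE), and MAE skill score ($\mathrm{SS}_{\mathrm{MAE}}$), which are defined as
$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(F_i - O_i\right),$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|F_i - O_i\right|,$$
$$\mathrm{SS}_{\mathrm{MAE}} = 1 - \frac{\mathrm{MAE}}{\mathrm{MAE}_{\mathrm{obs}}},$$
with
$$\mathrm{MAE}_{\mathrm{obs}} = \frac{1}{n}\sum_{i=1}^{n}\left|\bar{O} - O_i\right|,$$
where n is the number of years, $F_i$ and $O_i$ are modeled and observed RSODs for the ith year, $\bar{O}$ is the mean of the RSODs based on observations, and $\mathrm{MAE}_{\mathrm{obs}}$ is the mean absolute error of climatological forecasts ($\bar{O}$) used as reference model. In addition, the correlation (CORR) is also used for verification.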
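A minimal Python sketch of these scores is given below. It assumes the standard forms ME = mean(F − O), MAE = mean(|F − O|), and SS_MAE = 1 − MAE/MAE_obs, with the observed climatological mean as reference forecast; the function name and the use of the Pearson correlation for CORR are assumptions.

```python
import numpy as np

def verification_scores(forecast, observed):
    """ME, MAE, MAE skill score against climatology, and Pearson correlation."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    me = np.mean(f - o)                      # negative: forecast onset too early
    mae = np.mean(np.abs(f - o))
    mae_obs = np.mean(np.abs(o.mean() - o))  # MAE of the climatological forecast
    ss_mae = 1.0 - mae / mae_obs             # positive: better than climatology
    corr = np.corrcoef(f, o)[0, 1]
    return {"ME": me, "MAE": mae, "SS_MAE": ss_mae, "CORR": corr}
```

For example, forecasts that are uniformly 5 days too early relative to observed onsets on days 110, 120, 130, and 140 give ME = −5, MAE = 5, MAE_obs = 10, and hence SS_MAE = 0.5.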

a. Observed RSODs
As a preparatory step, the coherence of observed RSODs across each subregion is estimated. In particular, the spatial variability is determined by taking the standard deviation of all mean onset dates at all stations in a given subregion for each criterion (P20, P25, P30, P40, and P50; Table 1). Overall, the spatial variability in each subregion depends only slightly on the criterion used. However, the spatial heterogeneity of mean RSODs differs substantially between subregions, ranging from 7 days in R7 to 23 days in R6. Spatial variability is lower in R1 and R7 with 7-13 days and higher in R2, R3, and R6 with 15-23 days. The higher spatial heterogeneity of RSODs reflects local-scale factors that play different roles in rainfall characteristics at each station, even in the same subregion. Therefore, for a given subregion, to reduce the effect of local-scale outliers of RSODs the median of RSODs from all stations is used in the following as the representative of the regional RSODs. Figure 2 shows the boxplots of the median forecast RSODs in the five subregions identified by five criteria for the period 2000-19. Overall, the stronger criteria lead to later RSODs. However, the sensitivity of the RSODs to the chosen criterion is different in each subregion. For the northern parts (R1, R2, R3), changes in the chosen thresholds result in much higher variations of detected onset dates than for the southern parts (R6, R7). The difference in the climatological regional RSOD caused by using different criteria is up to 19 days for the three northern regions, while this value is only about 7 days for the southern regions. These results suggest that finding robust RSOD thresholds is more challenging and more necessary for the northern regions than for the others.
However, results from different criteria consistently show that the earliest RSODs occur in subregion R1 around late April. For the R2, R6, and R7 subregions, the RSODs occur mainly between late April and early May, while the rainy season starts latest in subregion R3 (mid-May). In general, these spatial heterogeneities and chronological order of RSODs are consistent with previous studies that used different criteria and data to determine rainy season or summer monsoon onset dates (Nguyen-Le et al. 2014; Ngo-Thanh et al. 2018; Pham-Thanh et al. 2019; Acharya and Bennett 2021).
The interquartile ranges of the RSODs obtained by different criteria in each subregion are similar. However, irrespective of the method to determine the RSODs, the interannual variability tends to be higher in the northern parts (R1, R2, R3) than in the southern parts (R6, R7). These heterogeneities could be related to the different causes of rain in each subregion. The higher fluctuation of rainfall events in the early seasonal transition periods in the northern part might be associated with diverse causes, such as the activity of mei-yu fronts, cold surges, and summer monsoon, or an interaction of different forcings (e.g., Yokoi and Matsumoto 2008;Chen et al. 2012;Vu et al. 2015;van der Linden et al. 2016). For the southern part, the interannual variability is higher in R6, which could be related to the mountainous terrain, which favors local afternoon instability showers (Pham-Thanh et al. 2019).
Although the results from the five criteria are somewhat different, they are still consistent in capturing the spatial and interannual variability in each subregion, especially the heterogeneity of RSODs between subregions. After comparing these results with other studies (Nguyen-Le et al. 2014;Ngo-Thanh et al. 2018;Pham-Thanh et al. 2019;Acharya and Bennett 2021), the P20 criterion seems to be more reasonable than other criteria (i.e., P25, P30, P40, and P50) in determining the RSODs. Figure 3 shows the mean RSODs determined using the P20 criterion at the 93 stations. It can be seen that the RSODs vary  significantly between stations, not only in the different subregions but also in the same subregion (Fig. 3, Table 1). Using the P20 criterion, the mean RSODs for all subregions vary between late April and mid-May. The earliest start date of the rainy season occurs in the R1 subregion, with the mean RSOD on 24 April and a standard deviation of 18 days. For the R2 and R6 subregions, the mean onsets are comparable with a mean RSOD around 29 and 30 April, but the interannual variability in R2 (21 days) is higher than in R6 (12 days). For the R7 subregion, the mean onset is on 8 May with a standard deviation of 11 days. In comparison to other subregions, the mean RSOD in R3 is latest (approximately 17 May). This region also has the highest interannual variability of about 28 days. To evaluate the performance of ECMWF-S2S hindcasts in capturing the transition between the dry and rainy seasons, the median of RSODs identified with the P20 criterion in each subregion will be used as reference in the remainder of this study.
b. Performance of ECMWF subseasonal forecasts to predict RSODs
Figure 4 shows the SI of RSODs computed for 7-40-day lead times over the five subregions using five criteria. Overall, the SI values range from 1 to 12 days and depend on both criteria used and forecast lead times. Not surprisingly, irrespective of the criteria to determine the RSOD, the SI tends to get higher for longer lead times, which can be seen clearly for most subregions, especially R7. An exception is subregion R3 where the SI rather fluctuates around a certain value irrespective of the lead time. For most lead times and all subregions, the SI values are smallest when using criteria with smaller P thresholds.
For the criteria using higher values of P (P40 and P50), the SI values only differ slightly between lead times, with high values ranging from 5 to 12 days. Since a high SI value means that the detected RSODs at given lead times and subregions are more distinct among members, the ensemble median will be used instead of individual members to reduce the influence of outliers. Particularly, for each subregion and each forecast member, the RSOD is the median of RSODs determined at all grid points. The ensemble forecast RSOD is the median of forecast RSODs of all members.
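The robustness argument for the ensemble median can be illustrated with a small Python sketch. The function name is hypothetical, RSODs are assumed to be day-of-year numbers, and skipping members without a determined RSOD (NaN) is an assumption about how undetermined members are handled.

```python
import numpy as np

def ensemble_rsod(member_rsods):
    """Ensemble forecast RSOD: median over members, ignoring undetermined (NaN) ones."""
    vals = np.asarray(member_rsods, dtype=float)
    valid = vals[~np.isnan(vals)]
    return float(np.median(valid)) if valid.size else float("nan")

# One member with a very late onset barely shifts the median,
# whereas it would pull the ensemble mean far away from the other members.
members = [110, 112, float("nan"), 118, 300]
median_rsod = ensemble_rsod(members)      # 115.0
mean_rsod = float(np.nanmean(members))    # 160.0
```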
The RSOD is a diagnostic variable, which is different from the prognostic variables of numerical models such as temperature and wind. Therefore, for a specific forecast, the forecasted RSODs depend strongly on both the used criteria and the model performance. In some cases, a RSOD cannot be determined for a given member, especially for forecasts with longer lead times (above approximately 28 days) and using criteria P40 and P50 (Fig. S2). Using the ensemble median, for more than 18 out of 20 years the RSODs could be determined in all subregions and for all lead times, while a RSOD cannot be determined for many years using individual members.
To further motivate the use of the ensemble median instead of individual members to determine the RSODs, Fig. 5 presents the AV for the 11 members for 7-40-day lead times in the five subregions. As already mentioned above, the AV values reflect the improvement of the ensemble median over a single member in determining RSODs. Overall, the AV calculated for a specific member strongly fluctuates, depending on the lead time and the criterion used to determine the RSOD. However, especially for shorter lead times positive AVs prevail for almost all subregions and for all criteria. The highest AVs tend to occur for forecasts with short lead times and when using the higher 5-day rainfall criteria to identify the RSODs. These results suggest that in terms of predicting the RSODs, using the ensemble median might be better than using individual members. Thus, in the remainder of this study, the ensemble median will be used to evaluate the performance of using the raw ECMWF-S2S rainfall in predicting the RSODs.
The top row of Fig. 6 shows that using the ensemble median, ECMWF-S2S can capture well the interannual variability of the RSODs at all lead times. The correlation is rather high, with values ranging from 0.60 to 0.99 that are significant above the 95% level based on the Student's t test in all subregions and for all criteria. As expected, the correlations tend to be slightly weaker for increasing lead times. However, the choice of RSOD criterion clearly influences the ability of ECMWF-S2S forecasts in capturing the observed RSOD fluctuations, which is reflected more specifically in subregions R1 and R2. For a fixed lead time, higher correlations tend to appear when using criteria with smaller P thresholds (P20 and P30) in all subregions. In general, the correlation coefficients are lowest when using the P50 criterion for almost all lead times. However, in some forecasts with a lead time of more than 30 days in the R6 and R7 subregions, the higher values of P (P40, P50) give higher correlation values than other criteria. The observed RSODs in the R6 and R7 subregions are stable with a small interannual variation of about 12 days (cf. section 3a).

WEATHER AND FORECASTING, VOLUME 37

With the weaker criteria, a wet spell is more easily identified as the RSOD, so the forecasted RSOD can occur in the early period of the 46-day forecast. Thus, for the long lead-time forecasts, forecasted RSODs that fall in this early period tend to be far from the observed RSODs. Such errors could lead to an increase in the interannual variation of the forecasted RSODs or to forecasted RSOD anomalies of opposite sign to the observed ones. As a consequence, the correlation between the forecasted and observed RSODs decreases. Figure 6 also shows the verification scores calculated using the RSODs determined using the ensemble median. Overall, the MEs range from −35 (forecasted RSODs earlier than observed) to +11 days (forecasted RSODs later than observed) and strongly depend on the criterion used to determine the RSODs and on the lead times. For P20, P25, and P30, the RSODs are predominantly earlier than observed, while for P40 and P50 the MEs change from positive to negative between approximately 14 and 21 days. For all subregions, at a given lead time, the differences of MEs between forecasts using different criteria range from 5 to 15 days. The RSOD criteria with higher 5-day rainfall thresholds (P40, P50) tend to give later RSODs than other criteria.
In all subregions, the RSOD forecasts tend to be earlier than the observations for increasing lead times. For lead times shorter than 20 days, the absolute values of ME are less than seven days, while these values can reach up to more than 20 days earlier for lead times of more than 30 days.
The model performance seems to be strongly influenced by lead time. Overall, in all subregions, the model performance tends to drop when increasing the lead time, also reflected in higher MAEs, which range from 0 to 30 days (Fig. 6). Interestingly, for almost all cases, the MAEs are approximately the absolute values of MEs. These results combined with the high correlation between the forecasts and the observations suggest the potential improvement of forecasting RSODs by applying a simple bias correction.
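The reasoning behind this suggestion is that when all forecast errors have the same sign, MAE equals |ME| exactly, so MAE ≈ |ME| points to errors dominated by a systematic offset. The following Python sketch removes such an offset by subtracting the mean error. It is a hedged illustration only: the function name is hypothetical, and the leave-one-out estimate of the bias (so that a year is not corrected with its own error) is an assumed design choice, not a method described in this study.

```python
import numpy as np

def bias_correct_loo(forecast, observed):
    """Subtract the mean error estimated from all other years (leave-one-out)."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    corrected = np.empty_like(f)
    for i in range(f.size):
        others = np.arange(f.size) != i
        bias = np.mean(f[others] - o[others])  # mean error without year i
        corrected[i] = f[i] - bias
    return corrected

# Forecasts that are about 10 days too early in every year (illustrative values).
obs = np.array([110.0, 120.0, 130.0, 140.0])
fcst = np.array([101.0, 109.0, 121.0, 129.0])
mae_before = np.mean(np.abs(fcst - obs))
mae_after = np.mean(np.abs(bias_correct_loo(fcst, obs) - obs))
```

In this synthetic case the MAE before correction equals the absolute mean error (10 days), and the correction leaves only the year-to-year scatter around the offset.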
In comparison to the reference forecasts using observed RSOD climatology (i.e., climatological forecast), the skill scores of MAE decrease with increasing lead times (Fig. 6, bottom row). This means that the forecast performance strongly depends on the forecast lead times. The skill scores are positive (negative) if the forecast skill is higher (lower) than the climatological forecast. Thus, it can be said that the model outperforms the climatological forecast in predicting the RSOD in all subregions for the lead times up to about
c. Lead-time-dependent performance of S2S forecasts
The above results showed that the ECMWF-S2S hindcasts can capture well the interannual variability of observed RSODs in all subregions with lead times between 7 and 40 days. However, the forecast skill strongly depends on the forecast lead times and the criteria used to determine the RSODs. The ECMWF-S2S hindcasts generally outperform the observed climatology in predicting the RSOD at the lead times between 1 and 6 weeks over all subregions. In this section, six lead times of 7, 14, 21, 28, 35, and 40 days are selected for a quantitative forecast verification. As before, RSODs determined using the ensemble median are also used here. Table 2 shows the criteria with which the forecasts have highest skill scores for each region and each lead time. It can be seen that the criteria using smaller P thresholds (P20 and P30) are more suitable for the forecasts at the shorter lead times, while the criteria using higher P thresholds (P40 and P50) are more suitable for longer lead times. Figure 7 shows the interannual variability of RSODs based on observations (OBS) and corresponding forecasts at different lead times (7, 14, 21, 28, 35, and 40 days) for each subregion using the criteria listed in Table 2. Overall, the interannual variability of the observed RSODs in all subregions is reproduced reasonably well by ECMWF-S2S at all forecast lead times with correlation coefficients between OBS and forecasts ranging between 0.76 and 0.99 with significance above 95% based on Student's t test (Table 3). However, the correlation coefficients tend to decrease for increasing lead times. The ECMWF-S2S RSODs tend to occur predominantly earlier than the observation, shown by MEs ranging from −19 to +2 days. However, for lead times of 21 days and less, the MEs are quite small, varying from −3 to +2 days.
The MAEs range from 3 to 19 days and strongly depend on the lead times (Table 3). Longer lead times result in an increase of MAEs: 3-7 days for lead times of up to 21 days, 5-10 days for lead times of 28 days, and 7-19 days for lead times of more than 35 days. When considering subregions individually, the MAEs in R6 and R7 are more stable between each lead time up to 28 days, ranging from 3 to 5 days, suggesting that the accuracy of RSOD forecasts for these subregions is higher. Table 3 also shows skill scores of MAE. These values range from −0.74 to 0.86 and strongly depend on the subregions and lead times. Irrespective of the subregion, forecasts are more
Finally, Table 3 also shows the reliability of the ensemble forecasts through the RI introduced in section 2c. For lead times below 21 days, RI values above 80% indicate a too large spread in the ensemble that could be reduced by postprocessing. For larger lead times at and above 21 days, RI values drop and can be as low as 0% at a lead time of 40 days in some regions (R1, R3, and R6). This suggests that the ensemble range tends to be smaller than the uncertainty of the observed RSODs at longer lead times. While the reasons for the different RI values with lead times and between regions are hard to disentangle, partly also due to the small sample of 20 years and ensuing uncertainties in the RI estimates, the low RI values for lead times of above 35 days could be related to the ECMWF-S2S RSODs occurring predominantly earlier than the observation, as indicated by MEs ranging from −19 to −7 days (Table 3).

Conclusions
As a part of activities toward operational subseasonal forecasts of RSODs for Vietnam, this study has evaluated the performance of ECMWF-S2S forecasts in predicting RSODs in five climatic subregions. Five different criteria were used to quantify observation uncertainties and RSOD characteristics, and to evaluate the model's performance. The results can be summarized as follows.
1) Although rainfall in all subregions is influenced substantially by the monsoon, local-scale factors still play an essential role in rainfall characteristics at each station. This is reflected in a large spatial heterogeneity of RSODs among stations, even within one specific subregion. This suggests that, for each subregion, the median of RSODs from all stations should be used as the representative regional RSOD to reduce the effect of local-scale outliers.
2) Uncertainties in the determination of RSODs associated with the chosen thresholds are an essential aspect that should be considered. The sensitivity of the determined RSOD to the choice of criterion differs between subregions. Although the RSODs detected with the five criteria are somewhat different, they are still consistent in capturing the spatial and interannual variability of each subregion, especially the heterogeneity of RSOD between subregions. Among the five selected criteria, the P20 criterion seems to be the most reasonable for determining the regional RSOD, which is consistent with previous studies (Nguyen-Le et al. 2014; Ngo-Thanh et al. 2018; Pham-Thanh et al. 2019; Acharya and Bennett 2021). Following this criterion, the observed RSODs vary from late April to mid-May in the selected subregions. The earliest rainy season is observed in the R1 subregion. The mean onsets in the R2 and R6 subregions are somewhat similar to that in R1, but the interannual variability is higher in R2 than in R6. Among subregions, the RSOD in R7 is most stable, while in R3 the RSOD is latest and the interannual variability of RSOD is highest.
3) In terms of subseasonal forecasts using ECMWF-S2S hindcasts, the RSODs determined from the individual forecast members differ from one another. The SI depends on both the criteria used and the forecast lead times. When criteria with smaller P thresholds, such as P20, are used to determine the RSOD, SI values are smallest at most forecast lead times and in all subregions. Thus, to reduce the influence of outliers, especially in the case of long lead times, the RSODs determined from the S2S ensemble forecasts were calculated as the median instead of the mean of the 11 forecast members.
4) When compared to observations, the ECMWF-S2S ensemble hindcasts can capture the interannual variability of RSODs in all subregions at lead times of 7-40 days. The correlations between forecasts and observations are high, but tend to be slightly weaker for increasing lead times. Overall, the RSODs determined from ECMWF-S2S hindcasts tend to be earlier than those from observations, especially for increasing forecast lead times. The analysis of the MAE and the skill score of the MAE showed that the model is more skillful at shorter forecast lead times.
5) For all subregions, the RSOD criteria with higher P thresholds tend to be more suitable for capturing the RSOD in forecasts at longer lead times. The P20, P25, or P30 criteria are more suitable for a forecast lead time of 7 days, while the P40 or P50 criteria are more suitable for lead times of 14 days or more.
6) Using the most suitable criteria, the ECMWF-S2S ensemble hindcasts outperform the observed RSOD climatology for lead times of up to 4 or 5 weeks, depending on the subregion considered. However, irrespective of the subregion, the skill scores decrease for increasing lead times. For lead times of less than 28 days, the skill scores of the MAEs are mostly above 0.3 in all subregions.
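The choice of the ensemble median in point 3 can be illustrated with a minimal sketch. The 11 member values below are hypothetical onset days (day of year) for a single year, constructed so that one member produces a very late onset; they are not taken from the hindcasts.

```python
from statistics import mean, median

# Hypothetical RSODs (day of year) from 11 ensemble members for one year.
# The last member is an outlier with a very late onset.
member_rsods = [118, 120, 121, 121, 122, 123, 123, 124, 125, 126, 160]

ensemble_mean = mean(member_rsods)
ensemble_median = median(member_rsods)

# Same ensemble with the outlier replaced by a value close to the others.
without_outlier = member_rsods[:-1] + [127]

print(f"mean   with outlier: {ensemble_mean:.1f}, without: {mean(without_outlier):.1f}")
print(f"median with outlier: {ensemble_median}, without: {median(without_outlier)}")
```

In this example the single late member shifts the ensemble mean by 3 days, whereas the median is unchanged.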
These results contribute evidence that the RSODs can be predicted at lead times of weeks to months in selected climatic subregions of Vietnam, where the rainy season is mainly dominated by the summer monsoon. The results qualitatively agree with previous studies on the potential of predicting RSODs at subseasonal time scales for several monsoon regions using only raw rainfall from numerical models (e.g., Bombardi et al. 2017; Kumi et al. 2020). The similarity between MEs and MAEs, as well as the large values of the RI at shorter lead times, suggests that the quality of subseasonal RSOD predictions could be enhanced using ensemble postprocessing to realize the full potential of ensemble forecasts (e.g., Vogel et al. 2018). This should be a focus of future research.
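The remark on the similarity between MEs and MAEs can be made concrete. When all forecast errors share one sign, the MAE equals the absolute value of the ME, and subtracting an estimate of the mean error removes most of the error. The following is a minimal sketch of such a correction, assuming a leave-one-out estimate of the bias so that each year is corrected without using its own observation. This is one simple choice for illustration; it is not the postprocessing method of Vogel et al. (2018), and it was not applied in this study. The onset days are made-up values.

```python
from statistics import mean


def mean_absolute_error(forecast, observed):
    """Mean of the absolute forecast errors."""
    return mean(abs(f - o) for f, o in zip(forecast, observed))


def loo_bias_corrected(forecast, observed):
    """Subtract from each forecast the mean error of all other years."""
    errors = [f - o for f, o in zip(forecast, observed)]
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two years for a leave-one-out bias")
    total = sum(errors)
    # (total - e) / (n - 1) is the mean error with the current year left out.
    return [f - (total - e) / (n - 1) for f, e in zip(forecast, errors)]


# Hypothetical onset dates (day of year); forecasts are consistently too early.
observed = [120, 128, 115, 135, 122, 130]
forecast = [110, 119, 107, 124, 113, 119]

corrected = loo_bias_corrected(forecast, observed)
mae_raw = mean_absolute_error(forecast, observed)
mae_corrected = mean_absolute_error(corrected, observed)
print(f"MAE raw: {mae_raw:.2f} days, MAE after bias correction: {mae_corrected:.2f} days")
```

A constant shift of this kind addresses only the systematic early bias; it does not change the ensemble spread, so the reliability issues indicated by the RI would need a separate treatment.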