  • Clarke, A. J., 2008: An Introduction to the Dynamics of El Niño & the Southern Oscillation. Elsevier, 324 pp.

  • Clarke, A. J., and S. Van Gorder, 2003: Improving El Niño prediction using a space-time integration of Indo-Pacific winds and equatorial Pacific upper ocean heat content. Geophys. Res. Lett., 30, 1399, doi:10.1029/2002GL016673.

  • Everingham, Y. L., R. C. Muchow, R. C. Stone, N. G. Inman-Bamber, A. Singels, and C. N. Bezuidenhout, 2002: Enhanced risk management and decision-making capability across the sugar industry value chain based on seasonal climate forecasts. Agric. Syst., 74, 459–477.

  • Everingham, Y. L., A. J. Clarke, C. C. M. Chen, S. Van Gorder, and P. McGuire, 2007: Exploring the capabilities of a long lead forecasting system for the NSW sugar industry. Proc. Aust. Soc. Sugar Cane Technol., 29, 9–17.

  • Everingham, Y. L., A. J. Clarke, and S. Van Gorder, 2008: Long lead rainfall forecasts for the Australian sugar industry. Int. J. Climatol., 28, 111–117.

  • Jeffrey, S. J., J. O. Carter, K. M. Moodie, and A. R. Beswick, 2001: Using spatial interpolation to construct a comprehensive archive of Australian climate data. Environ. Model. Softw., 16 (4), 309–330.

  • Power, S., M. Haylock, R. Colman, and X. Wang, 2006: The predictability of interdecadal changes in ENSO activity and ENSO teleconnections. J. Climate, 19, 4755–4771.

  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.

    Fig. 1.

    Map of sugarcane-growing regions (gray areas) in northeastern Australia together with the locations of three of the sugar mills at Tully, Plane Creek, and Harwood. These mills are representative of the northern, central, and southern regions, respectively.

    Fig. 2.

    (a) SON rainfall at Tully plotted against the JJA averaged value of the Niño-3.4 index for the years 1950–2005. The bilinear fit to the data is the best fit in a least squares sense (for details, see the second to last paragraph of section 2). (b) As in (a), but for Plane Creek and the MJJ averaged value of the Niño-3.4 index. (c) As in (a), but for Harwood and the July–September (JAS) averaged value of the Niño-3.4 index and the years 1950–2006.

    Fig. 3.

    Histogram of JJA Niño-3.4 “forecasts” using data up to the end of January minus the observed JJA Niño-3.4 values for the Clarke and Van Gorder (2003) ENSO prediction model. The “predictions” are cross-verified hindcasts for the period from January 1981 to December 2001 and operational predictions from 2002 to 2006. The Gaussian curve with the same mean (−0.13) and standard deviation (0.51) as the histogram is also plotted. Calculations based on a Lilliefors test (see, e.g., Wilks 2006) reveal that the hypothesis that the histogram is Gaussian cannot be rejected at the 10% probability level.

    Fig. 4.

    (a) Operationally predicted, one million–sample histogram pdf of 2007 SON rainfall at Tully using climate data available up to the end of January 2007. The predicted mean SON rainfall is 500.5 mm, the predicted median is 423.9 mm, and the observed SON rainfall is 304.4 mm. The solid curved line is the fitted gamma pdf. (b) Cumulative distribution functions corresponding to the histogram (black) and gamma pdf (gray).

    Fig. 5.

    (a) SON median rainfall for Tully “predicted” (solid gray line) from January and the corresponding observed rainfall (solid black line). These two time series are correlated at r = 0.57. The dashed lines denote the 16⅔ and 83⅓ percentiles, so theoretically there is a 66⅔% = ⅔ probability that the observed rainfall will lie in the envelope formed by the dashed lines. In fact, during the analyzed period 1981–2005, 19 out of 25 (76%) of the observed rainfalls were within the ⅔ probability envelope. The average absolute difference between predicted median rainfall and observed rainfall was 156 mm, while that between observed rainfall and the long-term median rainfall (270.2 mm) was 205 mm. (b) As in (a), but for Plane Creek. In this case r = 0.30, 11 out of 25 (44%) of the observed rainfalls were within the ⅔ probability envelope, the average absolute difference between predicted median and observed rainfall was 116 mm, and the average absolute difference between the long-term median (159 mm) and observed rainfall was 113 mm. (c) As in (a), but for Harwood, with corresponding values r = 0.51, 24/26 = 92%, an average absolute difference between the predicted median and observed SON rainfall of 46 mm, and an average absolute difference between the long-term median (187 mm) and observed SON rainfall of 57 mm.

    Fig. 6.

    (a) CRPS (solid line) of the Tully SON rainfall (mm) pdf “predicted” from January for 1981–2005. The “predictions” are based on cross-verified results for 1981–2001 and predictions for 2002–05. Also shown is the absolute value of the difference (mm) between the long-term median (270 mm) and observed rainfalls (dashed line) and the long-term mean (337 mm) and observed rainfalls (gray line). (b) As in (a), but for Plane Creek with long-term median = 159 mm and long-term mean = 180 mm. (c) As in (a), but for Harwood with long-term median = 187 mm and long-term mean = 211 mm. At Harwood, data for 2006 were available and the extra year has been included.

    Fig. A1.

    Bootstrap histogram (see appendix text) of the difference between “predicted” (from January) and observed JJA Niño-3.4. Also shown is the corresponding Gaussian pdf (solid curve; mean = −0.18; standard deviation = 0.55) and the Gaussian pdf estimated directly from the original data (dashed curve; see also Fig. 3; mean = −0.13 and standard deviation = 0.51).


Forecasting Long-Lead Rainfall Probability with Application to Australia’s Northeastern Coast

Allan J. Clarke, Department of Earth, Ocean, and Atmospheric Science, The Florida State University, Tallahassee, Florida

Stephen Van Gorder, Department of Earth, Ocean, and Atmospheric Science, The Florida State University, Tallahassee, Florida

Yvette Everingham, School of Engineering and Physical Sciences, James Cook University, Townsville, Queensland, Australia


Abstract

The authors develop a method for the long-lead forecasting of El Niño–influenced rainfall probability and illustrate it using the economically important prediction, from the beginning of the year, of September–November (SON) rainfall in the sugarcane-producing region along Australia’s northeastern coast. The method is based on two probability distributions. One is the Gaussian error distribution of the long-lead prediction of the El Niño index Niño-3.4 by the Clarke and Van Gorder forecast method. The other is the relationship of the rainfall distribution to the Niño-3.4 index: the rainfall distribution can be approximated by a gamma distribution whose two parameters depend on Niño-3.4. To predict the rainfall at, say, the Tully Sugar, Ltd., mill on the north Queensland coast in SON 2009, the June–August (JJA) value of Niño-3.4 is predicted and then 1000 possible “observed” JJA Niño-3.4 values are calculated from the error distribution. Each of these observed Niño-3.4 values is then used, with the Niño-3.4-dependent gamma distribution for that location, to calculate 1000 possible SON rainfall totals. The result is one million possible SON rainfalls. A histogram of these rainfalls is the required probability distribution for the rainfall at that location predicted from the beginning of the year. Cross-validated predictions suggest that the method is successful.

Corresponding author address: Professor Allan J. Clarke, Dept. of Earth, Ocean, and Atmospheric Science 4320, The Florida State University, Tallahassee, FL 32306-4320. Email: aclarke@fsu.edu


1. Introduction

Like most agricultural industries, the Australian sugar industry depends heavily on climate for its success. Most of Australia’s sugarcane is grown along the narrow coastal strip between latitudes 15° and 30°S (Fig. 1). This region experiences wet summers and dry winters with considerable rainfall variability from one year to the next, much of it due to the El Niño–Southern Oscillation (ENSO) phenomenon. The ability to forecast ENSO conditions and to understand how they bear on industry planning is of vital practical and financial importance to the Australian sugar industry.

Many industry practitioners vividly recall the La Niña events that dominated the years from 1998 to 2000. The 1998/99 La Niña in particular was reported to cost the Australian sugar industry in excess of $175 million (Australian dollars; Everingham et al. 2007). For an industry that generates between $1 and $2 billion annually, this is a significant loss of revenue. Many factors contributed to these costs, but most of the blame can be placed on a wet end to the harvest season and the inability to forecast and plan for it with sufficient lead time.

The harvest season runs from approximately the middle (May–June) to near the end of the year (November–December). The objective is to complete the harvest before the onset of the summer rains, when conditions are too boggy for machinery to operate. Harvesting in wet conditions damages the soil by compaction and limits the regrowth of the crop in future seasons. This is particularly harmful for sugarcane because the plant is typically allowed to regrow for five successive seasons before being ploughed out and replanted. Failure to harvest all of the cane can mean lost profit. Harvesting in wet conditions also creates problems at the mill, because more dirt must be removed from the cane during milling. At the marketing level, harvest disruption contributes to the logistical nightmare of delivering sugar that has been forward sold on the world sugar market. Wet harvest seasons are clearly undesirable for the industry.

During the harvest season, most of the rainfall occurs in the Southern Hemisphere spring months of September–November (SON); the winter months of June–August are much drier. However, owing to natural climate variability, spring rainfall varies considerably from year to year. Advance knowledge of spring conditions could help decide the start date of the harvest season. If the risk of spring rainfall were higher, industry decision makers could consider starting the harvest earlier. Conversely, if the risk were lower, the industry could consider starting the harvest at the usual time or later. To allow time for the mill maintenance needed before harvest begins, the industry would need to know the chance of spring rainfall as early as January of the same year and no later than March.

Spring rainfall in Australian sugarcane-growing regions is known to be influenced by ENSO (e.g., Everingham et al. 2002, 2007), so the ability to forecast ENSO conditions would help in deciding when to harvest. Knowledge of the ENSO conditions that will emerge during the harvest is needed early in the year so that the industry has sufficient lead time to implement preharvest preparations. This requires forecasting across the austral autumn, the most difficult time of the year to forecast ENSO. Clarke and Van Gorder (2003) developed a statistical model to predict the commonly used Niño-3.4 index, defined as the sea surface temperature anomaly averaged over the equatorial Pacific region 5°S–5°N, 170°–120°W. Besides its simplicity, a distinct advantage of the Clarke and Van Gorder model for the Australian sugar industry is its ability to forecast post-autumn Niño-3.4 conditions before autumn (i.e., to forecast across the so-called Southern Hemisphere autumn predictability barrier).

Predicting Niño-3.4 is necessary for a rainfall forecast, but it is useful only if the ENSO forecasts are linked to rainfall. Everingham et al. (2007, 2008) tested the relationship between austral spring (SON) rainfall and Clarke and Van Gorder (2003) Niño-3.4 predictions made using data up to the end of January, February, and March. Across all regions considered in these studies, there was a higher risk of SON rainfall above the median when the statistical model predicted La Niña during SON. For selected northern regions, this risk was reduced when the model predicted El Niño during SON. While these studies showed that under certain conditions the probability of a wetter (or drier) spring differed from climatology, they did not attempt to quantify the precise risk of above- (or below-) median rainfall, nor did they consider the risk of receiving rainfall amounts other than the median.

A more general approach is to forecast the probability distribution of rainfall. Knowing the relevant rainfall probability density functions (pdfs) enables risk to be quantified more accurately: the probability of rainfall greater than or less than any given amount can be calculated. The goal of this paper is to report and test a method for predicting, early in the calendar year, austral spring (SON) rainfall pdfs for the sugar-growing regions along the eastern coast of Australia.

There are two major uncertainties contributing to the prediction of the SON rainfall pdf at a given location. Firstly, there is not a one-to-one relationship between a given ENSO state and SON rainfall at that location; the same ENSO state, by some measure, can result in two very different SON rainfall totals. Secondly, the future ENSO state relevant to the SON rainfall is not known at the beginning of the year; it must be predicted, and such predictions have errors. In this paper we shall examine these uncertainties and show how they can be combined to calculate the required pdf.

The structure of the paper is as follows. In the next section we discuss the (nonlinear) relationship between the El Niño index Niño-3.4 and northeastern coastal Australian rainfall and show how, for each cane-growing region, SON rainfall can be approximately described by a gamma distribution with parameters dependent on Niño-3.4. In section 3 we estimate the approximately Gaussian distribution of the errors made by the Clarke and Van Gorder method when it predicts Niño-3.4. The results of sections 2 and 3 are then used in section 4 to estimate SON rainfall probability distributions for the Tully, Plane Creek, and Harwood sugar mills, which are representative of the northern, central, and southern sugarcane regions. Section 5 presents and discusses the cross-verified results, and section 6 concludes the main text with a summary.

2. SON northeastern Australian rainfall and Niño-3.4

Daily rainfall data from 33 locations on Australia’s northeastern coast were obtained (see http://www.longpaddock.qld.gov.au/silo/). The original daily data from the Australian Bureau of Meteorology were in-filled using the techniques outlined in Jeffrey et al. (2001). We used the daily data to calculate SON rainfalls at the 33 locations.

Table 1 shows the results of lag correlating the SON rainfalls at the 33 locations with 3-month averages of Niño-3.4. Maximum (in magnitude) correlations occur when Niño-3.4 leads the SON rainfall by a few months. Slightly better correlations are obtained with the equatorial index Niño-4, which, being defined as the sea surface temperature anomaly averaged over the region 5°S–5°N, 150°W–160°E, is based on an equatorial region closer to the northeastern Australian coast than the Niño-3.4 region. However, the correlation differences are small, and since we will be using the Clarke and Van Gorder (2003) model that is set up to predict Niño-3.4, we will use this ENSO index here.
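
The lag correlation behind Table 1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes monthly Niño-3.4 anomalies stored as a hypothetical (years × 12) array, considers only 3-month windows lying in the same calendar year as SON, and uses synthetic data in which rainfall is built to depend on the JJA mean. The function name lag_correlations is illustrative.

```python
import numpy as np

MONTH_LETTERS = "JFMAMJJASOND"

def lag_correlations(nino34, son_rain):
    """Pearson r between SON rainfall and every 3-month Niño-3.4 mean of the same year.

    nino34   : array (n_years, 12) of monthly Niño-3.4 anomalies, January..December
    son_rain : array (n_years,) of SON rainfall totals (mm) for the same years
    Returns a dict such as {"JFM": r, ..., "JJA": r, ..., "OND": r}.
    """
    result = {}
    for start in range(10):  # JFM, FMA, ..., OND: windows that stay inside one calendar year
        label = MONTH_LETTERS[start:start + 3]
        index = nino34[:, start:start + 3].mean(axis=1)
        result[label] = np.corrcoef(index, son_rain)[0, 1]
    return result

# Synthetic example (hypothetical data): ENSO amplitude grows through the year,
# and rainfall is built to decrease as the JJA mean increases.
rng = np.random.default_rng(1)
n_years = 56                                   # e.g. 1950-2005
amplitude = rng.normal(size=n_years)
nino34 = amplitude[:, None] * np.linspace(0.2, 1.0, 12) + 0.3 * rng.normal(size=(n_years, 12))
son_rain = 300.0 - 120.0 * nino34[:, 5:8].mean(axis=1) + 60.0 * rng.normal(size=n_years)

r = lag_correlations(nino34, son_rain)
best = max(r, key=lambda k: abs(r[k]))
print(best, round(r[best], 2))
```

With real station data the best-correlated window differs by station, which is why Fig. 2 uses JJA at Tully but MJJ at Plane Creek.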

Correlation magnitudes in Table 1 are similar over large areas and tend to decrease southward. For simplicity in this paper we will focus on three stations—Tully, Plane Creek, and Harwood. These stations are representative of the northern, central, and southern coastal regions.

Figure 2 shows SON rainfall at these representative stations plotted against the relevant best-correlated Niño-3.4 lead index for the years 1950–2005 (Tully and Plane Creek) and 1950–2006 (Harwood). The plots indicate that SON rainfall, particularly at Tully and Plane Creek, depends approximately bilinearly on Niño-3.4. Specifically, small and large El Niños produce about the same dryness; in other words, past a given point it does not matter how big the El Niño is, because the rainfall typically does not decrease further. By contrast, SON rainfall during La Niña is on average more variable and higher as Niño-3.4 becomes more negative. This nonlinear rainfall response to ENSO is typical of stations in the northern and central regions. It is also seen in all-Australia rainfall, as originally pointed out by Power et al. (2006).

Figure 2 shows that, as noted earlier, while there is a link between SON rainfall and Niño-3.4, it is far from deterministic even if we take into account the nonlinearity discussed above. We therefore adopt a probabilistic approach, each Niño-3.4 value corresponding to a rainfall probability density function. If we had enough data, we could determine this pdf as a function of Niño-3.4 by constructing a histogram for each value of Niño-3.4. However, Fig. 2 shows that we do not have nearly enough data to do this. We will therefore proceed as follows:

As is often done (see, e.g., Wilks 2006), we will approximate the rainfall distribution by a gamma distribution with, in our case, its two parameters dependent on Niño-3.4. The gamma probability density function is of the form
$$f(x) = \frac{(x/\beta)^{\alpha-1}\exp(-x/\beta)}{\beta\,\Gamma(\alpha)}, \qquad x, \alpha, \beta > 0, \tag{2.1}$$
where Γ(α) is the gamma function, α is a shape parameter, and β is a scale parameter. Note that f(x) is fully specified by only these two parameters. As summarized by Wilks (2006), α and β can be estimated from the data [see (2.4) and (2.5) below] using an approximation to the maximum likelihood solution. We checked that this approximation is an excellent one for the parameters we used. The approximation first calculates
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{2.2}$$
and
$$D = \ln\bar{x} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i, \tag{2.3}$$
where, in our case, $x_i$ is the SON rainfall (mm) in the $i$th year and $n$ is the number of years for which data are available. The parameters α and β are then estimated as (see Wilks 2006)
$$\alpha = \frac{1 + \sqrt{1 + 4D/3}}{4D} \tag{2.4}$$
and
$$\beta = \frac{\bar{x}}{\alpha}. \tag{2.5}$$
Note that if we decompose the seasonal rainfall into a long-term mean and a seasonal anomaly via
$$x_i = \bar{x} + x_i' \tag{2.6}$$
and substitute this into (2.3), then
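$$D = \ln\bar{x} - \frac{1}{n}\sum_{i=1}^{n}\ln\left(\bar{x} + x_i'\right) = -\frac{1}{n}\sum_{i=1}^{n}\ln\left(1 + \frac{x_i'}{\bar{x}}\right). \tag{2.7}$$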
For the rainfall data of interest here, $|x_i'/\bar{x}|$ is typically small. For small $|x_i'/\bar{x}|$, and since $\sum_{i=1}^{n} x_i' = 0$ by construction, it follows from Taylor’s theorem that
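$$\frac{1}{n}\sum_{i=1}^{n}\ln\left(1 + \frac{x_i'}{\bar{x}}\right) \approx \frac{1}{n}\sum_{i=1}^{n}\left[\frac{x_i'}{\bar{x}} - \frac{1}{2}\left(\frac{x_i'}{\bar{x}}\right)^2\right] \approx -\frac{s^2}{2\bar{x}^2}, \tag{2.8}$$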
where $s^2$ is the sample variance. Thus, from (2.7) and (2.8),
$$D \approx \frac{s^2}{2\bar{x}^2}. \tag{2.9}$$
Hence, once $s$ and $\bar{x}$ are known, $D$ is known and, by (2.4) and (2.5), so are α, β, and the gamma distribution (2.1).
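
As a numerical check of (2.2)–(2.5) and of the small-anomaly shortcut (2.9), the sketch below computes α and β both ways for synthetic rainfall totals. The function names are hypothetical, the sample is drawn from a gamma distribution purely for illustration, and all totals must be positive because (2.3) takes logarithms.

```python
import numpy as np

def gamma_params_thom(x):
    """Approximate maximum likelihood gamma parameters, following (2.2)-(2.5).

    Returns (alpha, beta), where beta is a scale parameter, so mean = alpha * beta.
    """
    x = np.asarray(x, dtype=float)
    xbar = x.mean()                                            # (2.2)
    D = np.log(xbar) - np.mean(np.log(x))                      # (2.3)
    alpha = (1.0 + np.sqrt(1.0 + 4.0 * D / 3.0)) / (4.0 * D)   # (2.4)
    return alpha, xbar / alpha                                 # (2.5)

def gamma_params_from_moments(xbar, s):
    """Same estimator with D replaced by the small-anomaly approximation (2.9)."""
    D = s**2 / (2.0 * xbar**2)
    alpha = (1.0 + np.sqrt(1.0 + 4.0 * D / 3.0)) / (4.0 * D)
    return alpha, xbar / alpha

# Synthetic "SON rainfall" with modest relative anomalies (hypothetical data).
rng = np.random.default_rng(0)
rain = rng.gamma(shape=20.0, scale=15.0, size=2000)

a_exact, b_exact = gamma_params_thom(rain)
# ddof=0 gives the 1/n variance, matching the sum in (2.8)
a_approx, b_approx = gamma_params_from_moments(rain.mean(), rain.std())
print(a_exact, a_approx)
```

For relative anomalies of roughly 20%, as here, the two estimates of α agree to within a few percent; the agreement degrades as |x_i′/x̄| grows.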

From the above, we can determine how the gamma pdf depends on Niño-3.4 if we can determine how $\bar{x}$ and $s$ depend on Niño-3.4. The plots in Fig. 2 suggest that $\bar{x}$ depends bilinearly on Niño-3.4 [i.e., for Niño-3.4 greater than some value N*, $\bar{x}$ = P* (constant), and for Niño-3.4 ≤ N*, $\bar{x}$ increases linearly as Niño-3.4 becomes more negative]. We determined the bilinear fits for $\bar{x}$ in Fig. 2 by least squares, varying N*, P*, and the constant slope for Niño-3.4 ≤ N* to obtain the best fit.

The Fig. 2 plots also suggest that the variance about the $\bar{x}$ bilinear fit is essentially constant for Niño-3.4 > N* and increases for Niño-3.4 ≤ N*. For simplicity, we therefore adopted a bilinear fit for $s$ similar to that for $\bar{x}$. Specifically, we estimated the constant part of the fit, s*, from all data with Niño-3.4 > N*. We also estimated s_, the value of $s$ calculated from all data with Niño-3.4 ≤ N*. If N_ is the average value of Niño-3.4 for the data with Niño-3.4 ≤ N*, these calculations give two points, (N_, s_) and (N*, s*), in the (N, s) plane. These two points define a straight line for Niño-3.4 ≤ N* and thus complete the specification of the bilinear function s(Niño-3.4). Bilinear parameter values for Tully, Plane Creek, and Harwood are given in Table 2.
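
The bilinear fits for x̄ and s described above might be implemented as follows. This is a sketch under stated assumptions: the hinge point N* is found by scanning a hypothetical grid of trial values (for each trial N*, P* and the slope follow from ordinary least squares), s* and s_ are taken to be standard deviations of the residuals about the x̄ fit on either side of N*, and the data are synthetic.

```python
import numpy as np

def hinge(n, n_star, level, slope):
    """Bilinear ("hinge") function: constant level for n > n_star, linear in n below it."""
    return level + slope * np.minimum(n - n_star, 0.0)

def fit_mean(nino, rain, candidates):
    """Least squares fit of xbar(N) = P* + m * min(N - N*, 0) by scanning trial N* values.

    For a fixed N* the model is linear in (P*, m), so np.linalg.lstsq solves it exactly;
    the N* with the smallest residual sum of squares is kept.
    """
    best = None
    for n_star in candidates:
        A = np.column_stack([np.ones_like(nino), np.minimum(nino - n_star, 0.0)])
        coef, *_ = np.linalg.lstsq(A, rain, rcond=None)
        sse = float(np.sum((rain - A @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, n_star, coef[0], coef[1])
    return best[1:]  # (N*, P*, slope)

def fit_spread(nino, rain, n_star, p_star, slope):
    """Bilinear s(N): s* above N*, and the line through (N_, s_) and (N*, s*) below N*."""
    resid = rain - hinge(nino, n_star, p_star, slope)
    upper = nino > n_star
    s_star = resid[upper].std(ddof=1)
    s_low = resid[~upper].std(ddof=1)
    n_low = nino[~upper].mean()
    s_slope = (s_star - s_low) / (n_star - n_low)
    return s_star, s_slope

# Hypothetical data: a hinge at N* = -0.3 with spread that grows during La Niña.
rng = np.random.default_rng(3)
nino = rng.uniform(-2.0, 2.0, size=56)
deficit = np.maximum(-(nino + 0.3), 0.0)
rain = 150.0 + 150.0 * deficit + (25.0 + 30.0 * deficit) * rng.standard_normal(56)

n_star, p_star, slope = fit_mean(nino, rain, np.linspace(-1.0, 0.5, 31))
s_star, s_slope = fit_spread(nino, rain, n_star, p_star, slope)
print(n_star, p_star, slope, s_star, s_slope)
```

Because the slope is negative, min(N − N*, 0) ≤ 0 makes both x̄ and s grow as Niño-3.4 becomes more negative, matching the La Niña behavior in Fig. 2.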

3. Prediction of Niño-3.4

The Clarke and Van Gorder (2003) statistical model uses the predictor
$$S(t) = a\,\text{Niño-3.4}(t) + b\,\tau(t) + c\,h(t) \tag{3.1}$$
to predict Niño-3.4(t + Δt) for various lead times Δt. In (3.1), τ(t) is an Indo-Pacific zonal wind anomaly index and h(t), the anomalous depth of the 20°C isotherm averaged across the equatorial Pacific from 5°S to 5°N, is a measure of upper-ocean heat content. Both τ(t) and h(t) are useful predictors of ENSO across the Southern Hemisphere autumn: January, February, and March values of either index are correlated with September, October, and November values of Niño-3.4 later that year at 0.6 or more. In addition, since Niño-3.4 is strongly persistent from June through March of the following year, S(t) is, with an appropriate choice of the coefficients a, b, and c in (3.1), an excellent ENSO predictor throughout the year. The coefficients are determined by a least squares fit of S(t) to Niño-3.4(t + Δt) for each calendar month and each lead time Δt. Cross-validated calculations indicate that the model performs as well as or better than other statistical and dynamical ENSO prediction models (see Figs. 10.1 and 11.1 of Clarke 2008).
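
A minimal sketch of fitting the coefficients a, b, and c for one calendar month and one lead time, assuming that (3.1) is the no-intercept linear combination S(t) = a Niño-3.4(t) + b τ(t) + c h(t) of anomaly series. The synthetic data, the 21-year training length, and the function names are hypothetical.

```python
import numpy as np

def fit_coefficients(nino_t, tau_t, h_t, nino_target):
    """Least squares (a, b, c) for S(t) = a*Nino34(t) + b*tau(t) + c*h(t).

    Each input holds one value per year for a fixed calendar month t; nino_target holds
    Niño-3.4 at t + Δt (e.g. the JJA mean when t is January). No intercept is fitted,
    on the assumption that all series are anomalies.
    """
    X = np.column_stack([nino_t, tau_t, h_t])
    coef, *_ = np.linalg.lstsq(X, nino_target, rcond=None)
    return coef

def predictor(coef, nino_t, tau_t, h_t):
    """Evaluate S(t) for given predictor values."""
    a, b, c = coef
    return a * nino_t + b * tau_t + c * h_t

# Hypothetical training set: 21 Januaries (1981-2001) of synthetic anomalies.
rng = np.random.default_rng(42)
nino_jan, tau_jan, h_jan = rng.standard_normal((3, 21))
nino_jja = 0.3 * nino_jan + 0.5 * tau_jan + 0.4 * h_jan + 0.1 * rng.standard_normal(21)

coef = fit_coefficients(nino_jan, tau_jan, h_jan, nino_jja)
print(np.round(coef, 2), predictor(coef, 0.5, -1.0, 0.8))
```

A separate fit of this kind would be made for every forecast month and lead time, giving one (a, b, c) triple per combination.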

Our focus is on predicting 3-month averages of Niño-3.4 for May–July (MJJ), June–August (JJA), and August–October (ASO) since, as was shown in section 2, these Niño-3.4 indexes are related to rainfall probabilities. Forecasts of these indexes are needed from January, February, and March.

A histogram of “predicted” minus observed JJA Niño-3.4 is shown in Fig. 3 for model “predictions” given data up to the end of January. The predictions are cross-verified hindcasts for the period from January 1981 to December 2001 and operational predictions from 2002 to 2006. Because the data are few, we fitted them with a Gaussian distribution (mean −0.13, standard deviation 0.51); a Lilliefors test cannot reject the Gaussian hypothesis at the 10% level (Fig. 3). In the appendix we describe a bootstrap approach for estimating the predicted minus observed histogram. The bootstrap histogram is so similar to the simple Gaussian estimate that the resulting rainfall probability estimates differ negligibly. In what follows we therefore use the simple Gaussian fit for the predicted minus observed pdf.
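
The Lilliefors test cited in the Fig. 3 caption is a Kolmogorov–Smirnov test in which the Gaussian mean and standard deviation are estimated from the data themselves, so the usual Kolmogorov–Smirnov critical values no longer apply. The sketch below obtains the p-value by Monte Carlo simulation rather than from tabulated critical values; it is an illustration, not the calculation behind Fig. 3, and the 26 error values are synthetic stand-ins.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def lilliefors_stat(x):
    """Kolmogorov-Smirnov distance between the sample cdf and a normal cdf whose
    mean and standard deviation are estimated from the same sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)
    F = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - F), np.max(F - (i - 1) / n)))

def lilliefors_pvalue(x, n_sim=1000, seed=0):
    """Monte Carlo p-value: the fraction of truly normal samples of the same size whose
    statistic is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    d_obs = lilliefors_stat(x)
    d_sim = np.array([lilliefors_stat(rng.standard_normal(len(x))) for _ in range(n_sim)])
    return d_obs, (1 + np.sum(d_sim >= d_obs)) / (n_sim + 1)

# Hypothetical stand-in for 26 January-issued JJA errors (predicted minus observed).
errors = np.random.default_rng(5).normal(-0.13, 0.51, size=26)
mu, sd = errors.mean(), errors.std(ddof=1)   # parameters of the Gaussian error pdf
d, p = lilliefors_pvalue(errors)
print(f"mean={mu:.2f} sd={sd:.2f} D={d:.3f} p={p:.2f}")
```

Because the statistic standardizes with the sample's own mean and standard deviation, it is unchanged by any shift or positive rescaling of the errors, so simulating from a standard normal is sufficient.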

The preceding discussion focused on the prediction of JJA Niño-3.4; similar results are obtained for MJJ and ASO Niño-3.4. In the next section we show how the error distributions of this section and the link between SON rainfall and Niño-3.4 of section 2 can be used to predict the pdf of SON rainfall at a given location.

4. Prediction of SON rainfall probability

We illustrate our method using our prediction of the pdf for the 2007 SON Tully rainfall from January 2007. At the beginning of February 2007 we know the January 2007 values of Niño-3.4, τ, and h. These values, together with the known coefficients a, b, and c for Δt = 6 months, are used in (3.1) to calculate S(t), our prediction of JJA 2007 Niño-3.4. In this prediction the known a, b, and c coefficients are “frozen,” having been obtained from a least squares analysis of data from January 1981 to December 2001.

We first randomly sample the predicted minus observed Gaussian error pdf for the January prediction of JJA Niño-3.4 (see Fig. 3). Since the 2007 JJA Niño-3.4 prediction is known, subtracting each sampled error from it gives a sample observed value of JJA Niño-3.4. Using section 2, we then calculate from this observed JJA Niño-3.4 value the gamma distribution parameters α and β for Tully, so that the gamma distribution is known. From this gamma distribution we draw 1000 rainfall samples. Repeating the whole process 999 more times yields one million possible SON rainfalls, from which a histogram can be constructed (Fig. 4a). This histogram is an estimate of the 2007 SON rainfall pdf for Tully given climate data up to the end of January 2007. To be consistent with our use of the gamma distribution for rainfall, we fit a gamma pdf to this histogram (see Fig. 4a) and use it as our final predicted 2007 rainfall pdf. Figure 4b shows the cumulative distribution functions corresponding to the histogram and the gamma pdf.
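
The sampling procedure of this section can be sketched in Python with NumPy. The error-pdf mean and standard deviation (−0.13 and 0.51) are the Fig. 3 values; everything else is a labeled assumption: the bilinear station parameters are hypothetical placeholders for the Table 2 values, the JJA forecast of −0.8 is invented, and the gamma parameters come from the moment shortcut (2.9) with (2.4) and (2.5). Because each error is defined as predicted minus observed, a sampled observed value is the prediction minus a sampled error.

```python
import numpy as np

rng = np.random.default_rng(2007)

# Gaussian error pdf (predicted minus observed) for January forecasts of JJA Niño-3.4 (Fig. 3).
ERR_MEAN, ERR_SD = -0.13, 0.51

# Hypothetical bilinear parameters for one station; the real ones are in Table 2.
N_STAR, P_STAR, MEAN_SLOPE = -0.3, 250.0, -200.0   # xbar(N) in mm
S_STAR, S_SLOPE = 120.0, -60.0                      # s(N) in mm

def bilinear(n, level, slope):
    return level + slope * np.minimum(n - N_STAR, 0.0)

def thom(xbar, D):
    """Gamma (shape, scale) from the mean and D, via (2.4) and (2.5)."""
    alpha = (1.0 + np.sqrt(1.0 + 4.0 * D / 3.0)) / (4.0 * D)
    return alpha, xbar / alpha

def rainfall_samples(predicted_jja, n_nino=1000, n_rain=1000):
    errors = rng.normal(ERR_MEAN, ERR_SD, size=n_nino)
    observed = predicted_jja - errors                   # error = predicted - observed
    xbar = bilinear(observed, P_STAR, MEAN_SLOPE)
    s = bilinear(observed, S_STAR, S_SLOPE)
    alpha, beta = thom(xbar, s**2 / (2.0 * xbar**2))    # D from (2.9)
    # n_rain gamma draws for each sampled Niño-3.4 value -> n_nino * n_rain rainfalls
    draws = rng.gamma(alpha[:, None], beta[:, None], size=(n_nino, n_rain))
    return draws.ravel()

def fit_gamma(samples):
    """Final gamma pdf fitted to the Monte Carlo sample with (2.2)-(2.5)."""
    xbar = samples.mean()
    return thom(xbar, np.log(xbar) - np.mean(np.log(samples)))

samples = rainfall_samples(predicted_jja=-0.8)   # a hypothetical La Niña-leaning forecast
alpha_fit, beta_fit = fit_gamma(samples)
print(samples.size, round(float(np.median(samples)), 1), round(float(alpha_fit * beta_fit), 1))
```

Broadcasting the 1000 shape and scale values against a (1000, 1000) output draws all one million rainfalls in a single call, and fitting the final gamma pdf with (2.2)–(2.5) keeps its mean equal to the sample mean.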

Complete datasets for all stations are available only up to the end of 2005, so, because the a, b, and c model coefficients were frozen, we have only four verified SON rainfall forecasts (2002–05). To test the model predictions over more years, we also carried out cross-verified “forecasts” for 1981–2001, when data for the predictors Niño-3.4, τ, and h were available. In these calculations, data for the year to be predicted (say 1998) were removed, and the model coefficients a, b, and c were found by least squares fitting to the remaining years (1981–97 and 1999–2001). The SON rainfall–Niño-3.4 bilinear relationship (see Fig. 2) was also recalculated with the 1998 data removed. The pdf of 1998 SON rainfall at a given location was then predicted in the same way as for the 2007 case described above. For each station we thus have 25 predicted pdfs (21 cross verified, 1981–2001, and 4 operational, 2002–05) for testing the method.
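
The leave-one-out structure of the cross-verified hindcasts can be sketched as follows for the a, b, and c coefficients. In the procedure described above, the bilinear rainfall relationship is refitted without the target year in the same way, which this sketch omits. The predictor matrix and target series are synthetic and hypothetical.

```python
import numpy as np

def leave_one_out(X, y):
    """Cross-verified hindcasts: for each year k, refit least squares coefficients on all
    other years and predict year k with them."""
    n = len(y)
    hindcasts = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k                        # drop the target year
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        hindcasts[k] = X[k] @ coef
    return hindcasts

# Hypothetical predictors [Nino34, tau, h] for 21 Januaries and a synthetic JJA target.
rng = np.random.default_rng(7)
X = rng.standard_normal((21, 3))
y = X @ np.array([0.3, 0.5, 0.4]) + 0.2 * rng.standard_normal(21)

loo = leave_one_out(X, y)
coef_all, *_ = np.linalg.lstsq(X, y, rcond=None)
in_sample = X @ coef_all
print("in-sample rmse:", np.sqrt(np.mean((y - in_sample) ** 2)))
print("cross-verified rmse:", np.sqrt(np.mean((y - loo) ** 2)))
```

For ordinary least squares, each cross-verified residual is at least as large in magnitude as the corresponding in-sample residual, which is why withholding the target year gives a more honest estimate of forecast skill.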

5. Testing the accuracy of the SON rainfall probability predictions

Figure 5 shows the predicted median SON rainfalls at Tully, Plane Creek, and Harwood (solid gray lines) and the corresponding observed SON rainfalls (solid black lines). The predicted median and observed rainfalls are correlated at 0.57, 0.30, and 0.51, respectively. Each plot in Fig. 5 also shows the 16⅔ and 83⅓ percentiles (dashed lines), so that, if the predicted pdfs are reliable, 83⅓% − 16⅔% = 66⅔% = ⅔ of the observed rainfalls should lie between the dashed lines. At Tully, 76% of the observed SON rainfalls lie within the dashed lines, close to the ⅔ value, but the discrepancies are larger at Plane Creek (44%) and Harwood (92%). The better agreement at Tully is expected since the correlation between predicted median and observed rainfall is highest there. The average percentage for the three stations is 71%, close to the expected ⅔.
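
The envelope check can be written as a short function. It assumes each year's predicted pdf is represented by a sample of rainfalls (for example, a Monte Carlo sample as in section 4); the toy inputs are hypothetical, and the last lines only reproduce the arithmetic behind the 71% average.

```python
import numpy as np

LO, HI = 100.0 / 6.0, 500.0 / 6.0    # the 16 2/3 and 83 1/3 percentiles

def envelope_coverage(samples_by_year, observed):
    """Fraction of observed SON totals that fall inside each year's predicted envelope."""
    hits = 0
    for samples, obs in zip(samples_by_year, observed):
        low, high = np.percentile(samples, [LO, HI])
        hits += int(low <= obs <= high)
    return hits / len(observed)

# Toy check with hypothetical numbers: each year's forecast sample is 1..100 (mm).
toy = envelope_coverage([np.arange(1, 101)] * 3, [50.0, 10.0, 90.0])
print(toy)   # only the 50 mm year lies inside the envelope, so the coverage is 1/3

# Station percentages quoted above; their simple average rounds to 71%.
station_pct = {"Tully": 76, "Plane Creek": 44, "Harwood": 92}
print(round(sum(station_pct.values()) / len(station_pct)))
```

With only about 25 years per station, coverage fractions such as 44% or 92% carry substantial sampling uncertainty around the nominal ⅔.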

Another way (see, e.g., Wilks 2006) to test the predicted rainfall pdfs is to calculate the continuous ranked probability score (CRPS)
$$\text{CRPS} = \int_{-\infty}^{\infty}\left[R(x) - R_o(x)\right]^2\,dx, \tag{5.1}$$
where $x$ is the rainfall in millimeters, $R(x)$ is the predicted cumulative distribution function corresponding to the predicted pdf, and $R_o(x)$ is the cumulative distribution function corresponding to the observed rainfall, which is zero when $x$ is less than the observed rainfall and unity when $x$ is greater than or equal to it. The CRPS measures the error in the predicted pdf and behaves as an error measure should. For example, the deterministic forecast $x = x_{\text{predicted}}$ corresponds to a Dirac δ-function pdf at $x = x_{\text{predicted}}$ and hence to a unit step function $R(x)$ at $x = x_{\text{predicted}}$, giving
$$\text{CRPS} = \left|x_{\text{predicted}} - x_{\text{observed}}\right|. \tag{5.2}$$
In other words, the CRPS reduces to the absolute error when the forecast is deterministic. When the predicted pdf is not deterministic but, say, Gaussian with mean $x_{\text{predicted}}$ and standard deviation σ, the CRPS increases with increasing σ when $x_{\text{predicted}} \approx x_{\text{observed}}$, because the predicted pdf then differs increasingly from the observed δ-function pdf as σ increases. On the other hand, when $x_{\text{predicted}}$ and $x_{\text{observed}}$ are well separated, increasing σ moves predicted probability toward the observed value, and the CRPS decreases. Specifically, numerical calculations show that when $x_{\text{predicted}} = x_{\text{observed}}$,
$$\text{CRPS} \approx 0.23\,\sigma, \tag{5.3}$$
and when $|x_{\text{predicted}} - x_{\text{observed}}| > 2\sigma$,
$$\text{CRPS} \approx \left|x_{\text{predicted}} - x_{\text{observed}}\right| - 0.56\,\sigma. \tag{5.4}$$
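
The limiting results (5.2)–(5.4) can be checked numerically. The sketch evaluates (5.1) with the trapezoid rule and compares it with the standard closed-form CRPS of a Gaussian forecast, σ{z[2Φ(z) − 1] + 2φ(z) − 1/√π} with z = (x_observed − μ)/σ, where Φ and φ are the standard normal cdf and pdf. The grid limits and test values are arbitrary choices.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def crps_numeric(cdf, x_obs, grid):
    """Evaluate (5.1) with the trapezoid rule on a grid that covers both cdfs."""
    diff_sq = (cdf(grid) - (grid >= x_obs).astype(float)) ** 2   # [R(x) - Ro(x)]^2
    return float(np.sum(0.5 * (diff_sq[1:] + diff_sq[:-1]) * np.diff(grid)))

def crps_gaussian(mu, sigma, x_obs):
    """Standard closed form of the CRPS for a Gaussian forecast N(mu, sigma^2)."""
    z = (x_obs - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def gaussian_cdf(mu, sigma):
    return lambda x: 0.5 * (1.0 + _erf((x - mu) / (sigma * math.sqrt(2.0))))

grid = np.linspace(-15.0, 15.0, 200_001)

# (5.2): a deterministic forecast (unit-step cdf at 2) scored against an observation of 5.
step = crps_numeric(lambda x: (x >= 2.0).astype(float), 5.0, grid)

# (5.3): Gaussian forecast centred on the observation; the CRPS is about 0.23 sigma.
centred_numeric = crps_numeric(gaussian_cdf(0.0, 1.0), 0.0, grid)
centred_exact = crps_gaussian(0.0, 1.0, 0.0)

# (5.4): forecast 3 sigma from the observation; the CRPS is about |difference| - 0.56 sigma.
far = crps_gaussian(0.0, 1.0, 3.0)

print(round(step, 3), round(centred_numeric, 3), round(centred_exact, 3), round(far, 3))
```

The centred value is exactly (√2 − 1)σ/√π ≈ 0.234σ, and for large separations the closed form approaches |x_predicted − x_observed| − σ/√π, with 1/√π ≈ 0.564.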

Figure 6 shows the CRPS (solid black lines) for Tully, Plane Creek, and Harwood predicted from January for 1981–2005 (1981–2006 at Harwood). The predictions are cross-verified results for 1981–2001 and operational predictions for 2002–05 (2002–06 at Harwood). The dashed curve in Fig. 6 is the CRPS error of a deterministic prediction [see (5.2)] equal to the long-term median rainfall, and the gray curve is that of a deterministic prediction equal to the long-term mean. The curves show that the CRPS error is usually smaller for the black line (i.e., our model’s pdf predictions are usually better than predicting the rainfall with the long-term median or mean). The biggest improvement is in the north at Tully, where Niño-3.4 has its strongest influence on rainfall and the model’s skill can therefore be exploited most fully.

Table 3 shows the average (1981–2005) CRPS errors for predictions given data up to the end of January, February, and March. It confirms that the model improves SON rainfall predictions, with the improvement decreasing southward. Although predictive skill varies from station to station, at each station the skill is similar whether predictions are made from the end of January, February, or March.

6. Conclusions

We have developed a method for the long-lead forecasting of ENSO-influenced rainfall probability, specifically applying it to Australia’s sugarcane region along its northeastern coast. For planning purposes and risk assessment in the highly variable climate, forecasts of SON rainfall pdfs in specific locations are needed at the beginning of the calendar year. The predicted pdfs can be used to assess the likelihood of extreme SON rainfall, which can be very damaging to the industry. The likelihood of extremely heavy SON rainfall increases as the magnitude of a La Niña increases.
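
Given a predicted gamma pdf, the probability of exceeding any rainfall threshold follows from the gamma survival function. Below is a sketch using SciPy’s regularized upper incomplete gamma function, scipy.special.gammaincc; the shape and scale values are hypothetical, not a forecast from the paper.

```python
import math

from scipy.special import gammaincc

def prob_exceed(threshold_mm, alpha, beta):
    """P(SON rainfall > threshold) for a gamma pdf with shape alpha and scale beta.

    gammaincc(a, x) is the regularized upper incomplete gamma function Q(a, x), so
    Q(alpha, threshold / beta) is exactly the gamma survival function.
    """
    return gammaincc(alpha, threshold_mm / beta)

# Hypothetical predicted pdf with mean alpha * beta = 500 mm.
alpha, beta = 2.5, 200.0
for threshold in (300.0, 600.0, 1000.0):
    print(threshold, round(float(prob_exceed(threshold, alpha, beta)), 3))

# Sanity check: for alpha = 1 the gamma pdf is exponential, so P = exp(-threshold / beta).
print(prob_exceed(300.0, 1.0, 100.0), math.exp(-3.0))
```

The probability of rainfall below a threshold is one minus this value.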

The prediction scheme involves forecasting Niño-3.4 several months in advance across the austral autumn, the most difficult time of the year to forecast ENSO, and then using the known relationship between Niño-3.4 and SON rainfall based on historical data. The latter relationship is nonlinear in the sense that while bigger La Niñas typically correspond to higher rainfall, bigger El Niños typically do not result in drier conditions. The method takes into account both the uncertainty associated with the Niño-3.4 forecast and the uncertain connection between rainfall and Niño-3.4 using two pdfs—a Gaussian pdf associated with the error in predicting Niño-3.4 using the Clarke and Van Gorder (2003) forecast method and a gamma pdf whose two parameters depend on observed Niño-3.4.
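The bilinear (hinge) relationship between Niño-3.4 and SON rainfall, a sloping segment for N ≤ N* that joins the constant level P = P* (see Table 2 and Fig. 2), can be fitted by least squares as sketched below. For a fixed hinge N*, the model P = P* + (dP/dN)·min(N − N*, 0) is linear in P* and dP/dN, so those two are solved exactly and N* is scanned over a grid. The grid search is an assumption made for illustration (the paper states only that N*, P*, and dP/dN were varied to obtain the best least squares fit), and the data in the example are synthetic, not observed rainfall.

```python
import random


def fit_hinge(nino, rain, n_grid=201):
    """Least-squares fit of P = P_star + slope * min(N - N_star, 0).

    For each trial hinge N_star on a grid spanning the data, the model is linear in
    (P_star, slope), so those two are found by ordinary least squares; the trial with
    the smallest residual sum of squares is returned as (N_star, P_star, slope, sse).
    """
    lo, hi = min(nino), max(nino)
    count = len(nino)
    mean_p = sum(rain) / count
    best = None
    for k in range(n_grid):
        n_star = lo + (hi - lo) * k / (n_grid - 1)
        # Regressor is N - N_star on the sloping (La Nina) side and zero beyond the hinge.
        u = [min(x - n_star, 0.0) for x in nino]
        mean_u = sum(u) / count
        s_uu = sum((ui - mean_u) ** 2 for ui in u)
        if s_uu == 0.0:
            continue  # hinge at the smallest N: no sloping segment to fit
        s_up = sum((ui - mean_u) * (p - mean_p) for ui, p in zip(u, rain))
        slope = s_up / s_uu
        p_star = mean_p - slope * mean_u
        sse = sum((p - (p_star + slope * ui)) ** 2 for ui, p in zip(u, rain))
        if best is None or sse < best[3]:
            best = (n_star, p_star, slope, sse)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # SYNTHETIC data: hinge at N* = -0.3, P* = 280 mm, slope -200 mm/degC, noise 60 mm.
    nino = [rng.uniform(-2.0, 2.5) for _ in range(56)]
    rain = [280.0 - 200.0 * min(x + 0.3, 0.0) + rng.gauss(0.0, 60.0) for x in nino]
    n_star, p_star, slope, sse = fit_hinge(nino, rain)
    print(f"N* = {n_star:.2f}, P* = {p_star:.0f} mm, dP/dN = {slope:.0f} mm/degC")
```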

Specifically, suppose it is, say, January 2010 and we wish to predict the 2010 SON rainfall pdf at Tully. The method’s first step is to predict the 2010 value of JJA Niño-3.4, since JJA Niño-3.4 is the 3-month Niño-3.4 index most strongly related to the SON Tully rainfall. One thousand sample values of observed JJA Niño-3.4 are then obtained from this prediction and the Gaussian predicted minus observed error distribution. Since the gamma distribution of the SON Tully rainfall is a known function of observed JJA Niño-3.4 through its alpha and beta parameters, the 1000 sample observed Niño-3.4 values determine 1000 gamma Tully rainfall distributions. Each of these is then sampled 1000 times to obtain a million SON rainfalls from which a histogram is constructed. This histogram is then an estimate of the SON 2010 rainfall pdf at Tully. To be consistent with using the gamma distribution to represent rainfall, in practice we fit the histogram to a final gamma distribution and make it our estimate of the SON 2010 rainfall pdf at Tully.
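The Monte Carlo procedure just described can be sketched in a few lines of Python. The error statistics (mean −0.13, standard deviation 0.51) are those of the January forecast errors in Fig. 3; everything else is an assumption. In particular, `gamma_params` is a hypothetical stand-in for the Table 2 relationships (an illustrative hinge for the mean and a fixed coefficient of variation, expressed as gamma shape and scale rather than the paper's α and β), and the final gamma distribution is fitted by the method of moments, which may differ from the fitting method used in the paper.

```python
import math
import random

# Mean and standard deviation of (predicted - observed) JJA Nino-3.4 from January (Fig. 3).
ERR_MEAN, ERR_SD = -0.13, 0.51


def gamma_params(nino_obs):
    """HYPOTHETICAL stand-in for the Table 2 relationships: Nino-3.4 -> gamma (shape, scale).

    The mean follows an illustrative hinge, mean = P* + slope * min(N - N*, 0), and the
    coefficient of variation is held fixed; all four numbers are placeholders.
    """
    p_star, n_star, slope, cv = 300.0, 0.0, -150.0, 0.5
    mean = p_star + slope * min(nino_obs - n_star, 0.0)
    sd = cv * mean
    shape = (mean / sd) ** 2  # gamma mean = shape * scale
    scale = sd ** 2 / mean    # gamma variance = shape * scale**2
    return shape, scale


def forecast_rain_pdf(nino_pred, n_nino=1000, n_rain=1000, seed=0):
    """Monte Carlo SON rainfall pdf from a predicted JJA Nino-3.4 value.

    Returns (shape, scale, samples): a gamma distribution fitted by the method of
    moments to the pooled n_nino * n_rain rainfall samples, plus the samples themselves.
    """
    rng = random.Random(seed)
    rain = []
    for _ in range(n_nino):
        # error = predicted - observed, so observed = predicted - error
        nino_obs = nino_pred - rng.gauss(ERR_MEAN, ERR_SD)
        shape, scale = gamma_params(nino_obs)
        rain.extend(rng.gammavariate(shape, scale) for _ in range(n_rain))
    mean = math.fsum(rain) / len(rain)
    var = math.fsum((r - mean) ** 2 for r in rain) / len(rain)
    return mean * mean / var, var / mean, rain


if __name__ == "__main__":
    shape, scale, rain = forecast_rain_pdf(nino_pred=-1.0)  # a La Nina-like prediction
    print(f"fitted gamma: shape = {shape:.2f}, scale = {scale:.1f} mm")
    print("estimated P(SON rain > 600 mm):", sum(r > 600.0 for r in rain) / len(rain))
```

Counting the fraction of the pooled sample above a threshold gives a rough estimate of the probability of extreme SON rainfall, the quantity of most interest for risk assessment.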

Cross-verified tests of the forecast skill were carried out at the Tully, Plane Creek, and Harwood mills, representing, respectively, the northern, central, and southern regions of the approximately 1700-km-long cane-growing coastal strip. The tests showed that there is some skill in the pdf forecasts, particularly in the northern region where the SON rainfall–ENSO connection is strongest.

Acknowledgments

The authors gratefully acknowledge financial support from the Australian Government through the Sugar Research and Development Corporation and from National Science Foundation Grants ATM-0623402 and OCE-0850749.

REFERENCES

  • Clarke, A. J., 2008: An Introduction to the Dynamics of El Niño & the Southern Oscillation. Elsevier, 324 pp.

  • Clarke, A. J., and S. Van Gorder, 2003: Improving El Niño prediction using a space-time integration of Indo-Pacific winds and equatorial Pacific upper ocean heat content. Geophys. Res. Lett., 30, 1399, doi:10.1029/2002GL016673.

  • Everingham, Y. L., R. C. Muchow, R. C. Stone, N. G. Inman-Bamber, A. Singels, and C. N. Bezuidenhout, 2002: Enhanced risk management and decision-making capability across the sugar industry value chain based on seasonal climate forecasts. Agric. Syst., 74, 459–477.

  • Everingham, Y. L., A. J. Clarke, C. C. M. Chen, S. Van Gorder, and P. McGuire, 2007: Exploring the capabilities of a long lead forecasting system for the NSW sugar industry. Proc. Aust. Soc. Sugar Cane Technol., 29, 9–17.

  • Everingham, Y. L., A. J. Clarke, and S. Van Gorder, 2008: Long lead rainfall forecasts for the Australian sugar industry. Int. J. Climatol., 28, 111–117.

  • Jeffrey, S. J., J. O. Carter, K. M. Moodie, and A. R. Beswick, 2001: Using spatial interpolation to construct a comprehensive archive of Australian climate data. Environ. Model. Softw., 16 (4), 309–330.

  • Power, S., M. Haylock, R. Colman, and X. Wang, 2006: The predictability of interdecadal changes in ENSO activity and ENSO teleconnections. J. Climate, 19, 4755–4771.

  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.

APPENDIX

Bootstrap Approach for Estimating the Niño-3.4 Prediction Error

Figure 3 shows a histogram of predicted JJA Niño-3.4 forecasts from January minus the observed JJA Niño-3.4 values using the Clarke and Van Gorder (2003) ENSO prediction model. Another way to estimate this histogram is to use a bootstrap method as follows:

For each of the 21 yr from 1981 to 2001 we have a set of predictors, Niño-3.4(t), τ(t), and h(t), and the corresponding observed predictand, Niño-3.4(t + Δt). We omit one year (say 1998) and sample the remaining 1981–97 and 1999–2001 data 21 times with replacement to obtain a set of 21 predictors and their corresponding predictands. The coefficients a, b, and c in (3.1) are then found by a least squares fit, and the prediction S(t) is made for 1998 using these coefficients and the 1998 predictors Niño-3.4(t), τ(t), and h(t). The difference S(t) − Niño-3.4(t + Δt) for 1998 is then calculated. This process is repeated 999 times so that for the year 1998 we have 1000 S(t) − Niño-3.4(t + Δt) differences. Similar “omit one year” calculations are then repeated for the other 20 yr in the 1981–2001 interval. Bootstrap calculations are also performed for the 5 yr 2002–06, but in those cases no year of the 1981–2001 set of predictors and predictands needs to be omitted. The result of all these calculations is a set of 26 000 realizations of S(t) − Niño-3.4(t + Δt), based, as much as possible, on the coefficients a, b, and c found during the 1981–2001 training interval. From these 26 000 realizations we construct the histogram and Gaussian pdf fit shown in Fig. A1.
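The bootstrap steps above can be sketched as follows. Because (3.1) is not reproduced in this section, the sketch assumes it is a linear combination S(t) = a·Niño-3.4(t) + b·τ(t) + c·h(t) with no intercept, fitted by ordinary least squares through the normal equations. The helper names are hypothetical and the demonstration data are synthetic random numbers, so the printed mean and standard deviation will not match Fig. A1.

```python
import random


def solve3(a, y):
    """Solve the 3x3 system a @ x = y by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, y)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_abc(rows):
    """Least-squares (a, b, c) for target = a*nino + b*tau + c*h via the normal equations.

    Each row is ((nino, tau, h), target). ASSUMED form of (3.1): no intercept term.
    """
    xtx = [[sum(p[i] * p[j] for p, _ in rows) for j in range(3)] for i in range(3)]
    xty = [sum(p[i] * t for p, t in rows) for i in range(3)]
    return solve3(xtx, xty)


def bootstrap_errors(training, target_year, target_row, n_draw=21, n_boot=1000, seed=0):
    """Return n_boot bootstrap values of S(t) - observed Nino-3.4(t + dt) for target_year.

    training maps year -> row for the 1981-2001 interval; the target year is omitted
    from the resampling pool when it lies inside that interval.
    """
    rng = random.Random(seed)
    pool = [row for year, row in training.items() if year != target_year]
    (nino, tau, h), observed = target_row
    errors = []
    for _ in range(n_boot):
        sample = [rng.choice(pool) for _ in range(n_draw)]  # resample with replacement
        a, b, c = fit_abc(sample)
        errors.append(a * nino + b * tau + c * h - observed)
    return errors


if __name__ == "__main__":
    gen = random.Random(42)
    # SYNTHETIC predictors and predictands for 1981-2006, for illustration only.
    rows = {}
    for year in range(1981, 2007):
        nino, tau, h = gen.gauss(0, 1), gen.gauss(0, 1), gen.gauss(0, 1)
        rows[year] = ((nino, tau, h), 0.5 * nino + 0.3 * tau + 0.4 * h + gen.gauss(0, 0.3))
    training = {year: rows[year] for year in range(1981, 2002)}
    errors = []
    for year in range(1981, 2007):
        errors += bootstrap_errors(training, year, rows[year], seed=year)
    mean = sum(errors) / len(errors)
    sd = (sum((e - mean) ** 2 for e in errors) / len(errors)) ** 0.5
    print(len(errors), round(mean, 3), round(sd, 3))  # 26000 realizations
```

A Gaussian with the printed mean and standard deviation is the analog of the solid curve in Fig. A1.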

Fig. 1.

Map of sugarcane-growing regions (gray areas) in northeastern Australia together with the locations of three of the sugar mills at Tully, Plane Creek, and Harwood. These mills are representative of the northern, central, and southern regions, respectively.

Citation: Journal of Applied Meteorology and Climatology 49, 7; 10.1175/2010JAMC2373.1

Fig. 2.

(a) SON rainfall at Tully plotted against the JJA averaged value of the Niño-3.4 index for the years 1950–2005. The bilinear fit to the data is the best fit in a least squares sense (for details, see the second-to-last paragraph of section 2). (b) As in (a), but for Plane Creek and the MJJ averaged value of the Niño-3.4 index. (c) As in (a), but for Harwood and the July–September (JAS) averaged value of the Niño-3.4 index and the years 1950–2006.

Fig. 3.

Histogram of JJA Niño-3.4 “forecasts” using data up to the end of January minus the observed JJA Niño-3.4 values for the Clarke and Van Gorder (2003) ENSO prediction model. The “predictions” are cross-verified hindcasts for the period from January 1981 to December 2001 and operational predictions from 2002 to 2006. The Gaussian curve with the same mean (−0.13) and standard deviation (0.51) as the histogram is also plotted. Calculations based on a Lilliefors test (see, e.g., Wilks 2006) reveal that the hypothesis that the histogram is Gaussian cannot be rejected at the 10% probability level.

Fig. 4.

(a) Operationally predicted, one million–sample histogram pdf of 2007 SON rainfall at Tully using climate data available up to the end of January 2007. The predicted mean SON rainfall is 500.5 mm, the predicted median is 423.9 mm, and the observed SON rainfall is 304.4 mm. The solid curved line is the fitted gamma pdf. (b) Cumulative distribution functions corresponding to the histogram (black) and gamma pdf (gray).

Fig. 5.

(a) SON median rainfall for Tully “predicted” (solid gray line) from January and the corresponding observed rainfall (solid black line). These two time series are correlated at r = 0.57. The dashed lines denote the 16⅔ and 83⅓ percentiles, so theoretically there is a 66⅔% = ⅔ probability that the observed rainfall will lie in the envelope formed by the dashed lines. In fact, during the analyzed period 1981–2005, 19 out of 25 (76%) of the observed rainfalls were within the ⅔ probability envelope. The average absolute value of the difference between predicted median rainfall and observed rainfall was 156 mm, while that between observed rainfall and the long-term median rainfall (270.2 mm) was 205 mm. (b) As in (a), but for Plane Creek. In this case r = 0.30, 11 out of 25 (44%) of the observed rainfalls were within the ⅔ probability envelope, the average absolute value of the difference between predicted median and observed rainfall was 116 mm, and the average absolute value of the difference between the long-term median (159 mm) and observed rainfall was 113 mm. (c) As in (a), but for Harwood with corresponding values r = 0.51, 24/26 = 92%, the average absolute value of the difference between the predicted median and observed SON rainfall = 46 mm, and the average absolute value of the difference between the long-term median (187 mm) and observed SON rainfall = 57 mm.

Fig. 6.

(a) CRPS (solid line) of the Tully SON rainfall (mm) pdf “predicted” from January for 1981–2005. The “predictions” are based on cross-verified results for 1981–2001 and predictions for 2002–05. Also shown is the absolute value of the difference (mm) between the long-term median (270 mm) and observed rainfalls (dashed line) and the long-term mean (337 mm) and observed rainfalls (gray line). (b) As in (a), but for Plane Creek with long-term median = 159 mm and long-term mean = 180 mm. (c) As in (a), but for Harwood with long-term median = 187 mm and long-term mean = 211 mm. At Harwood, data for 2006 were available and the extra year has been included.

Fig. A1.

Bootstrap histogram (see appendix text) of the difference between “predicted” (from January) and observed JJA Niño-3.4. Also shown is the corresponding Gaussian pdf (solid curve; mean = −0.18; standard deviation = 0.55) and the Gaussian pdf estimated directly from the original data (dashed curve; see also Fig. 3; mean = −0.13 and standard deviation = 0.51).

Table 1.

Pearson (ordinary) correlation (column 6) of a 3-month average of the El Niño index Niño-3.4 with SON rainfall at 33 stations (column 1) located (see columns 2 and 3) along the northeastern Australian coast for the years from 1950 to the (recent) end year in column 8. The 3-month average Niño-3.4 index (column 4) leads the SON rainfall by the number of months shown in column 5. For example, the correlation at Mossman is between the SON rainfall at Mossman and the MJJ Niño-3.4 index 4 months earlier, so for Mossman MJJ is in column 4 and 4 is in column 5. The lead was chosen to give a maximum (in magnitude) correlation. The stations have been arranged so that latitude south increases down the list. Harwood and all stations north of Isis have correlation coefficients significantly different from zero, since |r_crit(95%)| = 0.22. Column 7 is the ratio of the explained variance of the least squares bilinear fit (see Fig. 2) to the explained variance of the standard linear regression fit. The more this ratio exceeds unity, the better the bilinear fit is relative to the standard regression fit. Note that the column 7 calculations were done with the lead months of column 5 corresponding to standard linear correlation; in a few cases slightly different leads may have resulted if we had optimized the lead for the bilinear fit. Here, MAM indicates March–May and DJF indicates December–February. The other 3-month acronyms are defined in the text.

Table 2.

Mean and median rainfall as well as bilinear fit (see Fig. 2) SON rainfall parameters for the Tully, Plane Creek, and Harwood mills for the years 1950–2005, 1950–2005, and 1950–2006, respectively. The quantities (N*, P*) correspond to the hinge point in Fig. 2 where the sloping line segment of slope dP/dN for N = Niño-3.4 ≤ N* joins the constant SON precipitation line P = P*; N*, P*, and dP/dN were varied to obtain the best fit to the data in the least squares sense, with the tabulated values shown. Under this bilinear fit, s*, the sample standard deviation for Niño-3.4 > N*, and s₋, the sample standard deviation for Niño-3.4 ≤ N*, were calculated and are listed in columns 7 and 8. The parameter N₋, the average value of Niño-3.4 for Niño-3.4 ≤ N*, was used to estimate the bilinear dependence of standard deviation on Niño-3.4 (see the last paragraph of section 2).

Table 3.

Average (1981–2005) CRPS errors in mm of SON rainfall for predicted pdfs at Tully, Plane Creek, and Harwood using our prediction method and predictions based on the long-term SON median rainfall. Predictions are made given data up to the end of January, February, and March. Positive values in the next-to-last column indicate a lower error for our prediction method. The skill in the last column is equal to (our method CRPS − long-term median CRPS) divided by (perfect prediction CRPS − long-term median CRPS). Since the perfect prediction CRPS is zero, this ratio reduces to 1 − (our method CRPS)/(long-term median CRPS). The skill is thus 100% when our method is perfect, between 0% and 100% when it beats the climatological long-term median forecast, and negative when it loses to climatology.
