  • Agresti, A., 1996: An Introduction to Categorical Data Analysis. Wiley Series in Probability and Statistics, Wiley, 290 pp.

  • Aksu, C., and S. Gunter, 1992: An empirical analysis of the accuracy of SA, OLS, ERLS and NRLS combination forecasts. Int. J. Forecasting, 8, 27–43.

  • Armstrong, J. S., 1989: Combining forecasts: The end of the beginning or the beginning of the end? Int. J. Forecasting, 5, 585–588.

  • Armstrong, J. S., 2001: Combining forecasts. Principles of Forecasting: A Handbook for Researchers and Practitioners, J. S. Armstrong, Ed., Kluwer Academic, 417–439.

  • Barnston, A. G., S. J. Mason, L. Goddard, D. G. Dewitt, and S. E. Zebiak, 2003: Multimodel ensembling in seasonal climate forecasting at IRI. Bull. Amer. Meteor. Soc., 84, 1783–1796.

  • Bates, J. M., and C. W. J. Granger, 1969: The combination of forecasts. Oper. Res. Quart., 20, 451–468.

  • Bottomley, M., C. K. Folland, J. Hsiung, R. E. Newell, and D. E. Parker, 1990: Global Ocean Surface Temperature Atlas. Met Office and Massachusetts Institute of Technology, 20 pp. and 313 plates.

  • Carrasco, J. A., and J. D. Ortuzar, 2002: Review and assessment of the nested logit model. Transp. Rev., 22, 197–218.

  • Chambers, J. M., 1992: Linear models. Statistical Models in S, J. M. Chambers and T. Hastie, Eds., Wadsworth & Brooks, 95–144.

  • Chandler, R. E., 2005: On the use of generalized linear models for interpreting climate variability. Environmetrics, 16, 699–715.

  • Clemen, R. T., 1989: Combining forecasts: A review and annotated bibliography. Int. J. Forecasting, 5, 559–583.

  • Cleveland, W. S., 1979: Robust locally weighted regression and smoothing scatterplots. J. Amer. Stat. Assoc., 74, 829–836.

  • Cleveland, W. S., S. J. Devlin, and E. Grosse, 1988: Regression by local fitting: Methods, properties, and computational algorithms. J. Econometrics, 37, 87–114.

  • Coelho, C. A. S., S. Pezzulli, M. Balmaseda, F. J. Doblas-Reyes, and D. B. Stephenson, 2004: Forecast calibration and combination: A simple Bayesian approach for ENSO. J. Climate, 17, 1504–1516.

  • Colman, A. W., and M. K. Davey, 2003: Statistical prediction of global sea-surface temperature anomalies. Int. J. Climatol., 23, 1677–1697.

  • Dawes, R., R. Fildes, M. Lawrence, and K. Ord, 1994: The past and the future of forecasting research. Int. J. Forecasting, 10, 151–159.

  • de Gooijer, J. G., and R. J. Hyndman, 2006: 25 years of time series forecasting. Int. J. Forecasting, 22, 443–473.

  • de Menezes, L. M., D. W. Bunn, and J. W. Taylor, 2000: Review of guidelines for the use of combined forecasts. Eur. J. Oper. Res., 120, 190–204.

  • Deutsch, M., C. W. J. Granger, and T. Terasvirta, 1994: The combination of forecasts using changing weights. Int. J. Forecasting, 10, 47–57.

  • Doblas-Reyes, F. J., R. Hagedorn, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting: II. Calibration and combination. Tellus, 57A, 234–252.

  • Dunteman, G. H., and M. R. Ho, 2006: An Introduction to Generalized Linear Models. Quantitative Applications in the Social Sciences, Vol. 145, Sage Publications, 72 pp.

  • Ferrari, S. L. P., and F. Cribari-Neto, 2004: Beta regression for modelling rates and proportions. J. Appl. Stat., 31, 799–815.

  • Fraedrich, K., and N. Smith, 1989: Combining predictive scheme in long-range forecasting. J. Climate, 2, 291–294.

  • Fritsch, J. M., J. Hilliker, and J. Ross, 2000: Model consensus. Wea. Forecasting, 15, 571–582.

  • Gelman, A., and J. Hill, 2006: Data Analysis Using Regression and Multilevel/Hierarchical Models. 1st ed. Cambridge University Press, 625 pp.

  • Granger, C. W. J., and P. Newbold, 1977: Forecasting Economic Time Series. Academic Press, 333 pp.

  • Greene, A. M., L. Goddard, and U. Lall, 2006: Probabilistic multimodel regional temperature change projections. J. Climate, 19, 4326–4343.

  • Hastie, T., and R. Tibshirani, 1986: Generalized additive models. Stat. Sci., 1, 297–310.

  • Hastie, T., and D. Pregibon, 1992: Generalised linear models. Statistical Models in S, J. M. Chambers and T. Hastie, Eds., Wadsworth & Brooks, 195–248.

  • Hastie, T., R. Tibshirani, and J. Friedman, 2000: The Elements of Statistical Learning, Data Mining, Inference and Prediction. Springer Series in Statistics, Springer, 533 pp.

  • Hibon, M., and T. Evgeniou, 2005: To combine or not to combine: Selecting among forecasts and their combinations. Int. J. Forecasting, 21, 15–24.

  • Hoeting, J. A., D. Madigan, A. E. Raftery, and C. T. Volinsky, 1999: Bayesian model averaging: A tutorial (with discussion). Stat. Sci., 14, 382–417.

  • Kamstra, M., and P. Kennedy, 1998: Combining qualitative forecasts using logit. Int. J. Forecasting, 14, 83–93.

  • Kim, Y. O., D. Jeong, and I. H. Ko, 2006: Combining rainfall-runoff model outputs for improving ensemble streamflow prediction. J. Hydrol. Eng., 11, 578–588.

  • Kondrashov, D., S. Kravtsov, A. W. Robertson, and M. Ghil, 2005: A hierarchy of data-based ENSO models. J. Climate, 18, 4425–4444.

  • Lall, U., and A. Sharma, 1996: A nearest neighbor bootstrap for time series resampling. Water Resour. Res., 32, 679–693.

  • Lall, U., Y-I. Moon, H-H. Kwon, and K. Bosworth, 2006: Locally weighted polynomial regression: Parameter choice and application to forecasts of the Great Salt Lake. Water Resour. Res., 42, W05422. doi:10.1029/2004WR003782.

  • Larrick, R. P., and J. B. Soll, 2006: Intuitions about combining opinions: Misappreciation of the averaging principle. Manage. Sci., 52, 111–127.

  • Lundberg, S., T. Terasvirta, and D. van Dijk, 2003: Time-varying smooth transition autoregressive models. J. Business Econ. Stat., 21, 104–112.

  • Marshall, L. A., 2006: Bayesian analysis of rainfall-runoff models: Insights to parameter estimation, model comparison and hierarchical model development. Ph.D. thesis, School of Civil and Environmental Engineering, University of New South Wales, 222 pp.

  • Marshall, L. A., D. Nott, and A. Sharma, 2007: Towards dynamic catchment modelling: A Bayesian hierarchical modelling framework. Hydrol. Processes, 21, 847–861.

  • McCullagh, P., and J. A. Nelder, 1989: Generalized Linear Models. 2nd ed. Chapman and Hall, 511 pp.

  • McLeod, A. I., D. J. Noakes, K. W. Hipel, and R. M. Thompstone, 1987: Combining hydrologic forecast. J. Water Resour. Plann. Manage., 113, 29–41.

  • Mehrotra, R., and A. Sharma, 2006: Conditional resampling of hydrologic time series using multiple predictor variables: A K-nearest neighbour approach. Adv. Water Resour., 29, 987–999.

  • Palmer, T. N., and Coauthors, 2004: Development of a European multimodel ensemble system for seasonal-to-interannual prediction (DEMETER). Bull. Amer. Meteor. Soc., 85, 853–872.

  • Pavan, V., and F. J. Doblas-Reyes, 2000: Multi-model seasonal hindcasts over the Euro-Atlantic: Skill scores and dynamic features. Climate Dyn., 16, 611–625.

  • Peng, P., A. Kumar, H. Van den Dool, and A. G. Barnston, 2002: An analysis of multimodel ensemble predictions for seasonal climate anomalies. J. Geophys. Res., 107, 4710. doi:10.1029/2002JD002712.

  • Phillips-Wren, G. E., E. D. Hahn, and G. A. Forgionne, 2004: A multiple-criteria framework for evaluation of decision support systems. Omega, 32, 323.

  • Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174.

  • Ragonda, S. K., B. Rajagopalan, M. Clark, and E. Zagona, 2006: A multimodel ensemble forecast framework: Application to spring seasonal flows in the Gunnison River Basin. Water Resour. Res., 42, W09404. doi:10.1029/2005WR004653.

  • Rajagopalan, B., U. Lall, and S. E. Zebiak, 2002: Categorical climate forecasts through regularization and optimal combination of multiple GCM ensembles. Mon. Wea. Rev., 130, 1792–1811.

  • Robertson, A. W., U. Lall, S. E. Zebiak, and L. Goddard, 2004: Improved combination of multiple atmospheric GCM ensembles for seasonal prediction. Mon. Wea. Rev., 132, 2732–2744.

  • Sanders, F., 1963: On subjective probability forecasting. J. Appl. Meteor., 2, 191–201.

  • See, L., and R. J. Abrahart, 2001: Multi-model data fusion for hydrological forecasting. Comput. Geosci., 27, 987–994.

  • Shamseldin, A. Y., K. M. O’Connor, and G. C. Liang, 1997: Methods for combining the outputs of different rainfall-runoff models. J. Hydrol., 197, 203–229.

  • Sharma, A., 2000: Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1—A strategy for system predictor identification. J. Hydrol., 239, 232–239.

  • Sharma, A., and U. Lall, 1999: A nonparametric approach for daily rainfall simulation. Math. Comput. Simul., 48, 361–371.

  • Sharma, A., and U. Lall, 2004: Model averaging and its use in probabilistic forecasting of hydrologic variables. Hydrology, Science and Practice for the 21st Century, Imperial College, London, British Hydrological Society, 372–378.

  • Shephard, N., 1995: Generalized linear autoregressions. Economics working paper 8, Nuffield College, University of Oxford, 13 pp.

  • Smith, T. M., and R. W. Reynolds, 2003: Extended reconstruction of global sea surface temperatures based on COADS data (1854–1997). J. Climate, 16, 1495–1510.

  • Terui, N., and H. K. van Dijk, 2002: Combined forecasts from linear and nonlinear time series models. Int. J. Forecasting, 18, 421–438.

  • Thompson, P. D., 1977: How to improve accuracy by combining independent forecast. Mon. Wea. Rev., 105, 228–229.

  • Van den Dool, H., 2000: Constructed analogue prediction of the east central tropical Pacific SST and the entire world ocean for 2001. Exp. Long-Lead Forecast Bull., 9, 38–41.

  • Xiong, L., A. Y. Shamseldin, and K. M. O’Connor, 2001: A non-linear combination of the forecasts of rainfall-runoff models by the first-order Takagi-Sugeno fuzzy system. J. Hydrol., 245, 196–217.

  • Yang, C., R. E. Chandler, V. S. Isham, and H. S. Wheater, 2005: Spatial-temporal rainfall simulation using generalized linear models. Water Resour. Res., 41, W11415. doi:10.1029/2004WR003739.

  • Yu, L., S. Wang, and K. K. Lai, 2005: A novel nonlinear ensemble forecasting model incorporating GLAR and ANN for foreign exchange rates. Comput. Oper. Res., 32, 2523–2541.

  • Zou, H., and Y. Yang, 2004: Combining time series models for forecasting. Int. J. Forecasting, 20, 69–84.

    (a) The classification method of the bias direction variable. The time series of the ratio of two model residuals (e2/e1) is grouped into three zones, and the residuals are classified into a three-category response variable {mix, zero, one} as shown. (b) The simple ordered logistic model for a three-category response variable. The regression lines divide the probability space. For a given value of the predictor x, the dashed line shows P(b = mix) = 0.26, P(b = zero) = 0.70 − 0.26 = 0.44, and P(b = one) = 1 − 0.70 = 0.30.

    The tree showing pairwise hierarchical mixing of four component models.

    The paired plot of the residuals of the three models. The values on the diagonal show the error variance of each individual model, and the numbers in the upper panels are the pairwise covariances. A lower covariance indicates a higher potential for improvement through combination.

    Pairwise hierarchical mixing tree of UCLA (U), CPC (C), and ECMWF (E). The pair UCLA + ECMWF is formed first because it has the lowest covariance.

    The overall dynamic weights for the UCLA model. The predicted weights in validation are drawn as solid lines and the optimum weights as black dots. The broken line, a four-month moving average of the optimum weights, is included for clarity. The nearly horizontal line at 0.55 is the static weight.

    The overall dynamic weights for the CPC model. The predicted weights in validation are drawn as solid lines and the optimum weights as black dots. The broken line, a four-month moving average of the optimum weights, is included for clarity. The nearly horizontal line at 0.35 is the static weight.

    The error variance of the combined prediction plotted against the parameters of Eqs. (5) and (6). The predictive error variance of the static weight combination for 1992–2001 is 0.192; a larger h or γ = 1 collapses the model to the static weight case. Neither method shows any improvement (i.e., an error variance below 0.192) from localized estimation of the weights (smaller h or γ > 1).

Long-Range Niño-3.4 Predictions Using Pairwise Dynamic Combinations of Multiple Models

  • 1 School of Civil and Environmental Engineering, University of New South Wales, Sydney, Australia

Abstract

Growing interest in climate prediction has led to an increase in the number of modeling alternatives in recent years. One way to reduce the predictive uncertainty of any such modeling procedure is to combine or average the model outputs. Multiple model results can be combined using weights that are either static or vary over time. This research develops a methodology for combining forecasts from multiple models in a dynamic setting. The authors mix models on a pairwise basis using importance weights that vary in time, reflecting the persistence of individual model skill. This approach, referred to here as a dynamic pairwise combination tree, is presented as an improvement over the case where the importance weights are static, or constant over time. The pairwise importance weight is modeled as the product of a “mixture ratio” and a “bias direction”: the former represents the fraction of the absolute residual error associated with each of the paired models, and the latter is an indicator of the signs of the two residual errors. The mixture ratio is modeled using a generalized linear autoregressive model and the bias direction using ordered logistic regression.

The method is applied to combine three climate models, the variable of interest being the monthly sea surface temperature anomalies averaged over the Niño-3.4 region from 1956 to 2001. The authors test the combined model skill using a “leave ±6 months out cross-validation” approach along with validation in 10-yr blocks. The dynamically combined models show a small but consistent improvement in predictive skill over the existing practice of static weight combination.

Corresponding author address: A/Prof. Ashish Sharma, School of Civil and Environmental Engineering, University of New South Wales, Sydney, NSW 2052, Australia. Email: a.sharma@unsw.edu.au

1. Introduction

Climate models vary in complexity from simplistic conceptualizations of the underlying physics, to statistical or empirically based methods, to detailed physical representations of the processes involved. The relative strengths and weaknesses of climate prediction models vary with the assumed model structure, the data quality, the calibration period, and the method of validation. Increases in computing power and the availability of more accurate input data have resulted in significant improvements in climate predictions. However, each type of model is often able to capture some aspects of the underlying behavior better than the others, and the incremental improvement achieved by refining a single modeling approach in isolation diminishes asymptotically. Climate modelers are therefore combining models to exploit the strengths of individual approaches and to reduce predictive uncertainty (Barnston et al. 2003; Colman and Davey 2003; Greene et al. 2006; Peng et al. 2002; Raftery et al. 2005; Robertson et al. 2004; Sharma and Lall 2004). The combination parameters are usually estimated from the overall performance of the component models. This paper introduces the persistence of component model skill as a consideration when building a combination of predictive models. The method is referred to as a pairwise dynamic model combination; the term dynamic denotes that the mode of combination varies with time, with the combination weights (referred to later in the paper as importance weights) modeled on the basis of the persistence they exhibit in a local time window. The aim is to improve upon existing methods of combining model predictions, which overlook persistence in individual model skill. The remainder of this section reviews the developments that form the background to the present research.

The rationale behind model combination is the statistical principle that a suitably weighted mean of two zero-centered symmetric error distributions can have a lower variance than either distribution alone. This principle raises the possibility that two or more inaccurate but independent predictions of the same future event can be combined to yield a prediction that is, on average, more accurate than either of them taken individually (Bates and Granger 1969; Fraedrich and Smith 1989; Granger and Newbold 1977; Sanders 1963; Thompson 1977). The advantages of combining predictions by various methods have been studied in the fields of biometrics, econometrics, and decision sciences (Dawes et al. 1994; de Menezes et al. 2000; Larrick and Soll 2006; Phillips-Wren et al. 2004). Based on an extensive literature review, Clemen (1989) concluded that combining forecasts leads to increased forecast accuracy. Accordingly, ensembles of multiple models are now routinely used to issue predictions in various disciplines (Armstrong 2001; Hoeting et al. 1999).
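A minimal numeric illustration of this principle, assuming two zero-mean forecast errors e1 and e2 with variances var1 and var2 and covariance cov (the function names are ours, for illustration only): the error variance of the weighted mean w·e1 + (1 − w)·e2 is w²·var1 + (1 − w)²·var2 + 2w(1 − w)·cov, and for independent errors the weight w = var2/(var1 + var2) brings it below both var1 and var2.

```python
def combined_error_variance(var1: float, var2: float, w: float, cov: float = 0.0) -> float:
    """Variance of w*e1 + (1 - w)*e2 for zero-mean errors e1 and e2."""
    return w**2 * var1 + (1 - w) ** 2 * var2 + 2 * w * (1 - w) * cov


def inverse_variance_weight(var1: float, var2: float) -> float:
    """Weight on forecast 1 that minimizes the combined variance when cov = 0."""
    return var2 / (var1 + var2)


if __name__ == "__main__":
    var1, var2 = 1.0, 2.0
    w = inverse_variance_weight(var1, var2)     # 2/3: the more precise forecast gets more weight
    v = combined_error_variance(var1, var2, w)  # 2/3, below both 1.0 and 2.0
    print(f"w = {w:.3f}, combined variance = {v:.3f}")
```

With perfectly correlated errors (cov equal to both variances) the combined variance stays at the individual level, which is why low residual covariance matters for the gain from combination.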

The methods of model combination can be classed into two broad categories: static combination and dynamic combination. Static combination, as the name implies, produces a weighted average output in which the weights are time invariant; hence, it does not consider any temporal variation in component model skill. Variations of such weighted average combination (including Bayesian model averaging) have been used in a variety of applications such as rainfall–runoff modeling (Granger and Newbold 1977; Kim et al. 2006; Marshall 2006; McLeod et al. 1987; Ragonda et al. 2006; Shamseldin et al. 1997; Xiong et al. 2001) and climate modeling (Coelho et al. 2004; Fritsch et al. 2000; Peng et al. 2002; Raftery et al. 2005; Rajagopalan et al. 2002; Robertson et al. 2004). The weighted average combination forms the benchmark against which we compare the performance of the proposed pairwise dynamic combination approach. The weights can be estimated using equal weighting for all models, a weighting that reflects the accuracy of individual models, or optimization-based approaches that maximize the performance of the weighted combination output (Coelho et al. 2004; Doblas-Reyes et al. 2005; Kondrashov et al. 2005; Pavan and Doblas-Reyes 2000; See and Abrahart 2001; Xiong et al. 2001). However, none of these approaches allows the weights to vary dynamically so that the combined output more closely resembles the output of the better-performing models at any given point in time.

Dynamic combination allows the combination weights to vary over time. This lets the combined output take into account local nonstationarities and inhomogeneities in individual model outputs, resulting in a forecast that is less susceptible to sudden and unexplained variations. An early attempt at dynamic combination was the switching regression model (Deutsch et al. 1994), which later evolved into more complex nonlinear combinations (Lundberg et al. 2003; Terui and van Dijk 2002; Zou and Yang 2004). These studies concern econometric models, with the relevant papers appearing in the econometric and applied statistics literature; no similar work has been reported in climate science. We next present an approach that dynamically mixes climate prediction models in a pairwise hierarchical tree structure, offering significant advantages over model combinations of a more static form.

2. Model combination

a. Static combination

We first introduce a static combination of m = 1,2,…,M models over a period t = 1,2,…,tmax. Let the component predictions be {ûm,t; m = 1,2,…,M; t = 1,2,…,tmax}, with residual errors {em,t; m = 1,2,…,M; t = 1,2,…,tmax}, so that
$$e_{m,t} = y_t - \hat{u}_{m,t} \tag{1}$$
where yt is the observed response at time t. The combined prediction ŷt(s) is then obtained as
$$\hat{y}_t^{(s)} = \sum_{m=1}^{M} w_m^{(s)} \hat{u}_{m,t} \tag{2}$$
where wm(s) are the static weights, subject to wm(s) > 0 and ∑m wm(s) = 1.
The parameter vector w(s) = {wm(s); m = 1,2,…,M} is estimated through a constrained minimization of the error variance of ŷ(s) = {ŷt(s); t = 1,2,…,tmax}, under the constraint that the parameters lie in the range 0 to 1 and sum to unity. This constraint reflects the assumption that each component model is unbiased. For the two-model case, the maximum likelihood estimate of w(s) under a bivariate normal error distribution can be derived as
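$$w_1^{(s)} = \frac{\sum_t e_{2,t}^2 - \sum_t e_{1,t} e_{2,t}}{\sum_t e_{1,t}^2 + \sum_t e_{2,t}^2 - 2 \sum_t e_{1,t} e_{2,t}}, \qquad w_2^{(s)} = 1 - w_1^{(s)} \tag{3}$$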
where e1,t and e2,t are residual errors of model 1 and model 2, respectively.
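For two models, the static weight can be sketched from the residual series, assuming the standard minimum-variance (Bates and Granger style) solution computed from sample second moments; the function name static_pair_weight is hypothetical, and clipping to [0, 1] is our way of enforcing the stated weight range rather than necessarily the paper's procedure.

```python
import numpy as np


def static_pair_weight(e1, e2) -> float:
    """Static weight on model 1 for a two-model combination.

    Minimizes mean((w*e1 + (1 - w)*e2)**2) over w in closed form, then clips
    the result to [0, 1] so the weights stay in the assumed range.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    s11 = np.mean(e1 * e1)  # second moment of model-1 residuals
    s22 = np.mean(e2 * e2)  # second moment of model-2 residuals
    s12 = np.mean(e1 * e2)  # cross moment (covariance for zero-mean residuals)
    denom = s11 + s22 - 2.0 * s12  # equals mean((e1 - e2)**2) >= 0
    if denom == 0.0:
        return 0.5  # identical residual series: every weight gives the same result
    return float(np.clip((s22 - s12) / denom, 0.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    e1 = rng.normal(0.0, 1.0, size=1000)  # synthetic residuals, model 1
    e2 = rng.normal(0.0, 2.0, size=1000)  # synthetic residuals, model 2 (noisier)
    w1 = static_pair_weight(e1, e2)
    print(f"w1 = {w1:.2f}, w2 = {1 - w1:.2f}")  # w1 should be close to 4/5
```

When the residuals are uncorrelated the expression reduces to the inverse-variance weight, and strongly correlated residuals can push the unconstrained optimum outside [0, 1], which is where the clipping takes effect.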

Unless stated otherwise, this paper uses italic type for scalars (e.g., ûm,t, wm(s)), bold roman type for vectors (e.g., um, w(s)), and bold sans serif type for higher-dimensional matrices (e.g., 𝘂, 𝗪). Vector series are enclosed in braces. Functions are denoted by names followed by parentheses; for example, min(···) denotes the minimum.

This weighted average combination method is hereafter referred to as static combination because its weight w(s) is time invariant. This paper instead proposes a dynamic weight {ωt}, which incorporates the persistence of component model skill as described in the rest of this section. The error variance of the static combination prediction ŷ(s) serves as the performance benchmark for the proposed method.

b. Paired dynamic combination

Consider combining the predictions of a pair consisting of the ith and jth component models {m = i, j} using dynamic weights. If the component predictions at time t are ûi,t and ûj,t, the two models can be combined as follows:
$$y_t = \omega_t \hat{u}_{i,t} + (1 - \omega_t)\,\hat{u}_{j,t} + \ddot{e}_t \tag{4}$$

Here ët is the residual of the combination when the true weight ωt is known. We retain the assumption of the static weight formulation that the component predictions are unbiased and hence restrict the weights to the range 0 to 1. This constraint reduces serial correlation in the combined forecast error (Aksu and Gunter 1992).

Early research on model combination by Bates and Granger (1969) acknowledged the possible nonstationarity of Eq. (3) and, hence, the need to estimate the weights dynamically. The approach adopted in this and earlier studies is to introduce a dynamic structure into the formulation of the weights. Two possible ways of specifying the autoregressive structure in the dynamic weights are (Granger and Newbold 1977; McLeod et al. 1987)
i1520-0442-22-3-793-e5
i1520-0442-22-3-793-e6
where t is the current time, h is a time bandwidth representing a local window centered on time t, and 1 ≥ λ > 0 and γ ≥ 1 are parameters that control the degree of autocorrelation. Note that these methods are based primarily on the precision (inverse of the prediction error variance) of the component predictions. Setting (λ = 0, h = t − 1) in Eq. (5) and γ = 1 in Eq. (6) collapses them to the static weight estimate of Eq. (3) when combining two independent predictions.
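One common precision-based form of these two ideas can be sketched as follows; it is an assumption, not a transcription of Eqs. (5) and (6). The first function uses squared residuals in a window of half-width h centered on t (excluding t itself, our choice); the second discounts past residuals so that the residual at time s carries a factor proportional to gamma**s with gamma ≥ 1. Both ignore the error covariance, matching the independent-predictions case, and gamma = 1 gives equal weight to all past errors. The function names are hypothetical.

```python
import numpy as np


def windowed_precision_weight(e1, e2, t: int, h: int) -> float:
    """Weight on model 1 at time t from squared residuals within t-h..t+h.

    Time t itself is excluded (an assumption of this sketch), and the two
    residual series are treated as independent, so no covariance term is used.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    idx = np.arange(max(0, t - h), min(len(e1), t + h + 1))
    idx = idx[idx != t]
    s1 = np.sum(e1[idx] ** 2)
    s2 = np.sum(e2[idx] ** 2)
    return 0.5 if s1 + s2 == 0 else float(s2 / (s1 + s2))


def discounted_precision_weight(e1, e2, t: int, gamma: float) -> float:
    """Weight on model 1 at time t from residuals at times 0..t-1.

    The residual at time s gets a factor proportional to gamma**s (gamma >= 1),
    so recent errors count more; gamma = 1 weights all past errors equally.
    """
    gamma = float(gamma)  # float base so that negative exponents are allowed
    if gamma < 1.0:
        raise ValueError("gamma must be >= 1")
    e1 = np.asarray(e1, dtype=float)[:t]
    e2 = np.asarray(e2, dtype=float)[:t]
    # Exponents s - (t - 1) <= 0 keep the factors in (0, 1] and avoid overflow.
    g = gamma ** (np.arange(t) - (t - 1))
    s1 = np.sum(g * e1**2)
    s2 = np.sum(g * e2**2)
    return 0.5 if s1 + s2 == 0 else float(s2 / (s1 + s2))
```

Rescaling the discount factors by a common constant leaves the ratio unchanged, so normalizing them to the most recent time step only serves numerical stability.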

Our proposed method starts by computing a time series of target weights {ωt ∈ [0,1]} that would produce a perfect combined hindcast from the component hindcast pairs. The target weights are predicted using generalized linear models (Chandler 2005; Dunteman and Ho 2006; McCullagh and Nelder 1989; Yang et al. 2005). Generalized linear regression requires the response variable to belong to the exponential family of distributions, whereas the target weights follow a beta distribution (Bates and Granger 1969). This requirement is met by formulating separate generalized linear models in two steps.

The first step uses a generalized linear autoregressive (GLAR) model (Shephard 1995; Yu et al. 2005) to predict the mixture ratio rt:
$$r_t = \frac{|e_{j,t}|}{|e_{i,t}| + |e_{j,t}|} \tag{7}$$

GLAR is a special case of the generalized linear model that includes both autoregressive and exogenous covariates. Exogenous covariates are predictors external to those used in the component models and thus potentially add further predictive information.

GLAR estimates the predicted mixture ratio as follows:
i1520-0442-22-3-793-e8
where
  • rt-.: {1, rt-h(1), rt-h(2), …}, stepwise autoregressive covariates at lags of h(1), h(2), …, representing the persistence that is exhibited in rt;
  • Zt-.: {z1,t-., z2,t-., …}, exogenous covariates at earlier times (subscript t-.<t);
  • θt: seasonally varying intercept, which changes from one season to the next but not across years;
  • φ, ψ: {φ0, φ1, φ2, …; ψ1, ψ2, …}T, the time-invariant model parameters, including the intercept φ0; and
  • g(.): the link function of the generalized linear model, which transforms the response variable.
The link function g(.) is chosen so that it maps the bounded mixture ratio, which lies between 0 and 1, to unbounded values between −∞ and +∞. This research applies the following logit(.) link function, which has been used in studies of forecast probabilities (Carrasco and Ortuzar 2002; Kamstra and Kennedy 1998):
$$g(r_t) = \operatorname{logit}(r_t) = \ln\!\left(\frac{r_t}{1 - r_t}\right) \tag{9}$$
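A sketch of the logit link and its inverse, together with a GLAR-style prediction of the mixture ratio. The additive linear predictor (seasonal intercept θt plus φ0, lagged mixture ratios weighted by the remaining φ, exogenous covariates weighted by ψ) is our assumption based on the covariate list above; the function names and argument layout are hypothetical, and parameter estimation is not shown.

```python
import numpy as np


def logit(r):
    """Logit link: maps values in (0, 1) to the real line."""
    r = np.asarray(r, dtype=float)
    return np.log(r / (1.0 - r))


def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(eta, dtype=float)))


def glar_predict(r_lagged, z_lagged, theta_season, phi, psi) -> float:
    """Hypothetical GLAR-style prediction of the mixture ratio.

    Linear predictor (an assumed additive form):
        eta = theta_season + phi[0] + phi[1:] . r_lagged + psi . z_lagged
    where r_lagged holds r at lags h(1), h(2), ... and z_lagged holds the
    exogenous covariates. The prediction inv_logit(eta) is bounded by 0 and 1.
    """
    phi = np.asarray(phi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    eta = (theta_season + phi[0]
           + np.dot(phi[1:], np.asarray(r_lagged, dtype=float))
           + np.dot(psi, np.asarray(z_lagged, dtype=float)))
    return float(inv_logit(eta))


if __name__ == "__main__":
    # Made-up parameter values, for illustration only.
    r_hat = glar_predict(r_lagged=[0.6, 0.55], z_lagged=[0.3],
                         theta_season=-0.1, phi=[0.05, 1.2, 0.4], psi=[-0.5])
    print(round(r_hat, 3))
```

Working on the logit scale lets the linear predictor take any real value while the back-transformed mixture ratio stays inside its admissible range.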

The GLAR parameters are estimated by maximum likelihood under a beta-binomial distribution (Gelman and Hill 2006; Yang et al. 2005) for the response rt, which was found suitable given the overdispersion of r described in later sections.

The mixture ratio of Eq. (7) alone is not sufficient to ensure ët² ≤ min(ei,t², ej,t²). The second step therefore introduces additional criteria that identify the direction of the bias of each model. The models are combined on the basis of rt only when ei,t and ej,t have opposite signs, that is, when the two predictions bracket the true value. When both predictions are biased in the same direction, the better prediction is chosen and rt is ignored. The bias direction {bt; t = 1,2,…,tmax} is mapped into three categories (see Fig. 1a), where bt is a categorical variable defined as follows:
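$$b_t = \begin{cases} \text{mix}, & e_{j,t}/e_{i,t} < 0 \\ \text{zero}, & 0 \le e_{j,t}/e_{i,t} < 1 \\ \text{one}, & e_{j,t}/e_{i,t} \ge 1 \end{cases} \tag{10}$$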
The optimum value of ωt+1 is defined as follows:
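$$\omega_{t+1} = \begin{cases} r_{t+1}, & b_{t+1} = \text{mix} \\ 0, & b_{t+1} = \text{zero} \\ 1, & b_{t+1} = \text{one} \end{cases} \tag{11}$$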
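A sketch of how the target quantities can be computed from a pair of residual series, under our reading of the text and Fig. 1a: ωt is the weight on model i (so 1 − ωt multiplies model j), the mixture ratio is rt = |ej,t|/(|ei,t| + |ej,t|), and the categories mix, zero, and one correspond to ωt = rt, 0, and 1, respectively. Under these assumptions the combined residual ωt·ei,t + (1 − ωt)·ej,t is zero whenever the residuals have opposite signs and equals the smaller residual otherwise. Function and constant names are hypothetical.

```python
import numpy as np

MIX, ZERO, ONE = 0, 1, 2  # ordered categories of the bias direction b_t


def mixture_ratio(e_i, e_j):
    """r_t = |e_j| / (|e_i| + |e_j|); 0.5 where both residuals are zero."""
    a_i = np.abs(np.asarray(e_i, dtype=float))
    a_j = np.abs(np.asarray(e_j, dtype=float))
    total = a_i + a_j
    safe_total = np.where(total > 0, total, 1.0)  # avoid division by zero
    return np.where(total > 0, a_j / safe_total, 0.5)


def bias_direction(e_i, e_j):
    """MIX if the residuals have opposite signs, else ZERO if model j is better, else ONE."""
    e_i = np.asarray(e_i, dtype=float)
    e_j = np.asarray(e_j, dtype=float)
    opposite = e_i * e_j < 0              # the two predictions bracket the observation
    j_better = np.abs(e_j) < np.abs(e_i)  # only consulted where signs are not opposite
    return np.where(opposite, MIX, np.where(j_better, ZERO, ONE))


def optimum_weight(e_i, e_j):
    """Target weight on model i: r_t for MIX, 0 for ZERO, 1 for ONE."""
    r = mixture_ratio(e_i, e_j)
    b = bias_direction(e_i, e_j)
    return np.select([b == MIX, b == ZERO], [r, 0.0], default=1.0)


if __name__ == "__main__":
    e_i = np.array([1.0, 2.0, 1.0])
    e_j = np.array([-3.0, 1.0, 2.0])
    w = optimum_weight(e_i, e_j)
    print(w, w * e_i + (1 - w) * e_j)  # combined residuals: 0, 1, 1
```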
The prediction of ωt+1 is a two-step process. The first step predicts the mixture ratio rt+1 using the GLAR model. The second step predicts the bias direction b ∈ {mix, zero, one} using an ordered logistic regression (OLR) model (Agresti 1996). In OLR, the cumulative probability of b is estimated as
i1520-0442-22-3-793-e12
where α = {α1, α2} are intercepts, xt = {x0,t, x1,t, x2,t, …} is the predictor vector, comprising a periodic intercept (x0,t) and the autoregressive and any exogenous covariates, and β = {1, β1, β2, …}T are the model parameters. No third equation is necessary, because P(b = one) = 1 − P(b = mix or b = zero). The logic for generating this three-category ordered response variable is illustrated in Fig. 1b.
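The cumulative probabilities translate into category probabilities by differencing. The sketch below assumes the common cumulative-logit convention logit P(b ≤ k) = αk − ηt, with ηt = xt·β the linear predictor; some texts use αk + ηt instead, which only flips the sign of β. The usage example reproduces the illustrative values in the Fig. 1b caption (cumulative probabilities 0.26 and 0.70). Function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, the inverse of the logit."""
    return 1.0 / (1.0 + np.exp(-x))


def olr_category_probs(alpha1: float, alpha2: float, eta: float) -> np.ndarray:
    """Return [P(b = mix), P(b = zero), P(b = one)] from a cumulative-logit model.

    Assumes logit P(b <= k) = alpha_k - eta with alpha1 < alpha2, where eta is
    the linear predictor x_t . beta (the sign convention is an assumption).
    """
    if not alpha1 < alpha2:
        raise ValueError("intercepts must satisfy alpha1 < alpha2")
    p_mix = sigmoid(alpha1 - eta)          # P(b = mix)
    p_mix_or_zero = sigmoid(alpha2 - eta)  # P(b = mix or b = zero)
    return np.array([p_mix, p_mix_or_zero - p_mix, 1.0 - p_mix_or_zero])


if __name__ == "__main__":
    # Intercepts chosen so that the cumulative probabilities are 0.26 and 0.70
    # at eta = 0, matching the illustrative values in the Fig. 1b caption.
    a1 = np.log(0.26 / 0.74)
    a2 = np.log(0.70 / 0.30)
    print(np.round(olr_category_probs(a1, a2, eta=0.0), 2))  # about 0.26, 0.44, 0.30
```

Because both cumulative curves share the same slope, a larger ηt shifts probability mass toward the higher categories without the curves ever crossing.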
If the predicted mixture ratio is r̂t and the predicted bias direction is b̂t, the fitted dynamic weight ω̂t of the pairwise model combination can be estimated from Eq. (11). The dynamically combined prediction of the response variable, ŷt(d), is then as follows:
$$\hat{y}_t^{(d)} = \hat{\omega}_t \hat{u}_{i,t} + (1 - \hat{\omega}_t)\,\hat{u}_{j,t} \tag{13}$$
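A sketch of the final pairwise step, assuming the mapping mix → r̂t, zero → 0, one → 1 for the fitted weight on model i, and assuming b̂t is taken as the most probable OLR category (the text does not say how b̂t is selected from the predicted probabilities). Names and inputs are hypothetical.

```python
import numpy as np

CATEGORIES = ("mix", "zero", "one")  # ordered as in the OLR model


def dynamic_weight(r_hat: float, b_hat: str) -> float:
    """Fitted weight on model i: r_hat if mixing, otherwise 0 or 1."""
    if b_hat == "mix":
        return r_hat
    if b_hat == "zero":
        return 0.0
    if b_hat == "one":
        return 1.0
    raise ValueError(f"unknown bias direction: {b_hat!r}")


def combine_pair(u_i: float, u_j: float, r_hat: float, category_probs) -> float:
    """Dynamically combined prediction w*u_i + (1 - w)*u_j.

    b_hat is taken as the most probable OLR category (an assumption).
    """
    b_hat = CATEGORIES[int(np.argmax(category_probs))]
    w = dynamic_weight(r_hat, b_hat)
    return w * u_i + (1.0 - w) * u_j


if __name__ == "__main__":
    # Hypothetical inputs: two anomaly predictions and OLR output for one month.
    print(combine_pair(u_i=1.0, u_j=3.0, r_hat=0.75, category_probs=[0.6, 0.3, 0.1]))  # prints 1.5
```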

c. Multiple model combination

The previous section presented the basis for a pairwise combination of models. The approach is now extended to M component models, where M > 2. We propose a hierarchical tree of paired combinations, shown for a four-model case in Fig. 2.

Denoting the prediction errors of the component models as 𝗲 = {em,t; m = 1,2,…,M; t = 1,2,…,tmax}, the variance–covariance matrix of the residuals, cov(𝗲), can be estimated as {cij; i = 1,2,…,M; j = 1,2,…,M}. A model pair with smaller covariance has a higher potential for improvement after combination. Hence, the models are paired by first sorting them in order of their individual residual variance and then, starting from the lowest-variance model, pairing it with the model with which it has the lowest covariance. This process is repeated for the remaining models until all models are exhausted. If the number of component models is even, all models are paired; otherwise, one component model remains on its own. In the notation of Fig. 2, the model indices have been altered to reflect the pairs as (1,2) and (3,4); this notation is followed in the remainder of this paper.
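The greedy pairing rule above can be sketched as follows. This is an illustrative reading of the procedure, assuming residuals are stored column-wise (one column per model) and that "lowest covariance" means the smallest signed covariance, so a negatively correlated partner is preferred; `pair_models` is a hypothetical name.

```python
import numpy as np

def pair_models(resid):
    """Greedy pairing of component models by residual covariance.

    resid : array of shape (t_max, M) with residuals e_{m,t}.
    Returns a list of index pairs, plus a singleton tuple if M is odd.
    """
    c = np.cov(resid, rowvar=False)              # M x M variance-covariance matrix
    remaining = list(np.argsort(np.diag(c)))     # models in order of increasing variance
    pairs = []
    while len(remaining) > 1:
        first = remaining.pop(0)                 # lowest-variance unpaired model
        partner = min(remaining, key=lambda m: c[first, m])  # lowest covariance with it
        remaining.remove(partner)
        pairs.append((int(first), int(partner)))
    if remaining:
        pairs.append((int(remaining[0]),))       # odd M: one model stays on its own
    return pairs
```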

The hierarchical combination tree has multiple levels, depending on the number of component models present. The tree contains l levels, which satisfy the constraints $2^l \ge M$ and $2^{l-1} < M$; that is, l is the smallest integer for which $2^l \ge M$. This hierarchical tree uses the same number (M − 1) of weight parameters as the static combination method described in section 2a. If $w_{t,i}^{(k)}$, which comprises the pairwise weights ωt, represents the ith weight time series vector at the kth level of the mixing tree, then the weight matrix 𝗪 can be written as follows:
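$$\mathsf{W} = \begin{bmatrix} w_{t,1}^{(1)} & & & \\ w_{t,1}^{(2)} & w_{t,2}^{(2)} & & \\ \vdots & \vdots & \ddots & \\ w_{t,1}^{(l)} & w_{t,2}^{(l)} & \cdots & w_{t,n_l}^{(l)} \end{bmatrix} \qquad (14)$$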
where $n_k \le 2^{k-1}$.
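The level constraints pin down l uniquely, and the tree always needs M − 1 weight series. A small sketch (the function name is hypothetical) uses integer bit length to avoid floating-point logarithms:

```python
def tree_levels(M):
    """Number of levels l with 2**(l - 1) < M <= 2**l, and the number of
    weight series (M - 1) needed by the pairwise combination tree."""
    if M < 2:
        raise ValueError("at least two component models are needed")
    l = (M - 1).bit_length()   # smallest l such that 2**l >= M
    return l, M - 1
```

Since level k holds at most 2^(k−1) weights, the tree has room for at most 2^l − 1 ≥ M − 1 weights, consistent with Eq. (14).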
The predicted value of 𝗪 consists of (M − 1)tmax elements ŵt, and the full set of component predictions is {ûm,t; m = 1,2,…,M; t = 1,2,…,tmax}. The hierarchical extension of Eq. (13) for the tree shown in Fig. 2, where M = 4, is as follows:
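$$\hat{y}_t^{(d)} = \hat{w}_{t,1}^{(1)}\left[\hat{w}_{t,1}^{(2)}\hat{u}_{1,t} + \left(1 - \hat{w}_{t,1}^{(2)}\right)\hat{u}_{2,t}\right] + \left(1 - \hat{w}_{t,1}^{(1)}\right)\left[\hat{w}_{t,2}^{(2)}\hat{u}_{3,t} + \left(1 - \hat{w}_{t,2}^{(2)}\right)\hat{u}_{4,t}\right] \qquad (15)$$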
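For the four-model tree, the hierarchical combination nests two pairwise blends inside a top-level blend. The sketch below assumes, as in the pairing notation, that pairs (1,2) and (3,4) are combined at level 2 and that the level-1 weight applies to the (1,2) branch; all names are illustrative.

```python
import numpy as np

def combine_four(u, w_top, w_left, w_right):
    """Hierarchical dynamic combination for M = 4 models paired as (1,2) and (3,4).

    u       : array (t_max, 4) of component predictions u_{m,t}
    w_top   : level-1 weights on the (1,2) branch
    w_left  : level-2 weights on model 1 within pair (1,2)
    w_right : level-2 weights on model 3 within pair (3,4)
    """
    u = np.asarray(u, dtype=float)
    w_top, w_left, w_right = (np.asarray(w, dtype=float) for w in (w_top, w_left, w_right))
    left = w_left * u[:, 0] + (1.0 - w_left) * u[:, 1]      # level-2 blend of models 1, 2
    right = w_right * u[:, 2] + (1.0 - w_right) * u[:, 3]   # level-2 blend of models 3, 4
    return w_top * left + (1.0 - w_top) * right             # level-1 blend of the branches
```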

d. Model combination algorithm

The algorithm for combining M models using the pairwise dynamic procedure described above is as follows:

  • (i) Index the models so that the pairs satisfy the logic in section 2c.
  • (ii) Choose the pair {û1,t, û2,t; t = 1,2,…,tmax} and compute the target mixture ratio and bias direction {rt, bt; t = 1,2,…,tc} for the calibration period tc using Eqs. (7) and (10).
  • (iii) Identify any autoregressive structure in {r, b} and plausible exogenous predictors using a model selection criterion such as the Akaike information criterion (AIC) (Chambers 1992; Hastie and Pregibon 1992; Hastie et al. 2000). Estimate the parameters φ, ψ, α, and β of the selected model.
  • (iv) Apply the fitted models to obtain estimates of the combination weights for the forecast period tc+, {ŵt; t ∈ tc+}.
  • (v) Repeat steps (ii)–(iv) for all M/2 pairs if M is even; otherwise, repeat for the (M − 1)/2 pairs and add the remaining component model at a lower tree level.
  • (vi) Repeat steps (i)–(v) for all l levels of the hierarchical tree and thus estimate the weight matrix 𝗪.

Compute the final estimate using Eq. (15). The error variance of ŷ(d) is expected to be smaller than that of ŷ(s).

3. Application

The pairwise dynamic combination approach is applied to three component models selected from the pool of models available for predicting globally gridded sea surface temperature anomalies (SSTA). The anomalies were computed relative to the Global Ocean Surface Temperature Atlas (GOSTA) climatology of 1951–80 (Bottomley et al. 1990). The extended reconstructed SST dataset of the U.S. National Climatic Data Center (Smith and Reynolds 2003) was used as the observed SST. The component models predict monthly SSTA in the Niño-3.4 region three months in advance. For example, the SSTA values for April, May, and June 1980 correspond to the forecasts of those months made in January, February, and March 1980, respectively. The first of the three models was developed at the University of California, Los Angeles (hereafter the UCLA model; Kondrashov et al. 2005). It is a multilevel quadratic inverse stochastic model formulated using global sea surface temperature data from 1950 to 2003, with an emphasis on ENSO variability. The second model was developed at the Climate Prediction Center (CPC) of the National Oceanic and Atmospheric Administration; it uses the constructed analog statistical method and is referred to as the CPC model (Van den Dool 2000). The third model was produced by the Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER) project of the European Centre for Medium-Range Weather Forecasts and is referred to as the ECMWF model (Palmer et al. 2004). The concurrent hindcasts of these component models from January 1956 to December 2001 are used to evaluate the accuracy of the model combination procedure. This study used only one available set of hindcast realizations, which may not correspond to the most up-to-date version of each component model.
All SSTA time series except UCLA were downloaded from the data library of the International Research Institute for Climate and Society, New York (accessed February 2006). The UCLA data were obtained from D. Kondrashov (2007, personal communication). The relative performance of the hindcast dataset (expressed as residuals from the observations) is illustrated in Fig. 3. Note that a component model pair with low residual covariance offers potential for improvement through the pairwise dynamic combination approach.

This research briefly trialed the two existing dynamic weight estimation methods presented in Eqs. (5) and (6), and tested the proposed GLAR plus OLR method in greater detail. The first step in the proposed approach is to identify relevant predictors; the second is to evaluate the resulting model in a predictive sense. Details of each are presented next.

a. Predictor selection

The predictors for the mixture ratio (rt) model in Eq. (8) are selected from lagged values of the response (rt) over the past 12 time steps (months). Predictors for the categorical bias direction (bt) are selected from lagged values of the ratio ej,t/ei,t. This ratio (termed the residual ratio) is constrained to lie within [−1, 2] to avoid numerical instability when ei,t ≈ 0. Adding hydrologic variables to the pool of candidate exogenous predictors would require detailed knowledge of the component model pairs, so no exogenous variables are included at this stage of the research.
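The candidate predictors can be assembled with two small helpers: one computing the clipped residual ratio and one building a matrix of lagged values. Both are illustrative sketches with hypothetical names; the handling of 0/0 (left as nan) is an assumption the text does not address.

```python
import numpy as np

def residual_ratio(e_i, e_j, lo=-1.0, hi=2.0):
    """Residual ratio e_j / e_i clipped to [lo, hi] = [-1, 2].

    Division by e_i = 0 gives +/-inf, which the clip maps to hi or lo;
    0/0 gives nan, which is left for the caller to handle.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.asarray(e_j, dtype=float) / np.asarray(e_i, dtype=float)
    return np.clip(ratio, lo, hi)

def lag_matrix(x, max_lag=12):
    """Lagged design matrix: column k-1 holds x_{t-k}, k = 1..max_lag.

    Rows without a complete set of lags are dropped; y is the aligned x_t.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.column_stack([x[max_lag - k : n - k] for k in range(1, max_lag + 1)])
    return X, x[max_lag:]
```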

The final predictor vector is chosen using standard statistical model selection procedures: partial autocorrelation with the response, backward stepwise model selection using the partial F test (Chambers 1992; Hastie and Pregibon 1992; Hastie et al. 2000), and partial mutual information (Sharma 2000). This analysis selected various autoregressive lags and a periodic intercept consisting of 12 three-month means; for example, the February intercept is the mean of the January–March values of the mixture ratio (or residual ratio) over the entire calibration period. The lags of the autoregressive covariates are listed in Table 1. Fewer predictors were preferred to avoid overparameterization, which is why only a limited number of (autoregressive) covariates were considered in the regression formulations.
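The periodic intercept described above can be computed as a centred three-month mean for each calendar month, wrapping across the year end (so January uses December–February). A minimal sketch, assuming the calendar month of each value is available as an integer 1–12; the function name is hypothetical.

```python
import numpy as np

def periodic_intercept(values, months):
    """Twelve centred three-month means of a calibration series.

    values : series such as the mixture ratio or residual ratio
    months : calendar month (1..12) of each value
    The intercept for month m is the mean of all values in months m-1, m, m+1
    (wrapping around the year), e.g. February uses January-March.
    """
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    out = {}
    for m in range(1, 13):
        window = [(m - 2) % 12 + 1, m, m % 12 + 1]   # previous, current, next month
        out[m] = float(values[np.isin(months, window)].mean())
    return out
```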

b. Results

The results presented next evaluate the performance of the modeling framework described above in forecasting combination weights, as a basis for reducing the predictive uncertainty of Niño-3.4 forecasts.

1) Bias direction forecasts

We next consider the predictive accuracy of the ordered logistic regression for the categorical bias direction, Eq. (12). The expected bias direction at any time step is the category with the highest predicted probability. The calibration results for the classification obtained using the model structure illustrated in Fig. 4 are presented in Table 2. As the diagonal values in the table show, the correct classification rates are 52% and 44% for the level 2 and level 1 combinations, respectively. Misclassifying one as zero, or vice versa, is potentially more damaging to the combination than misclassifying the mix category, because it selects the less accurate model outright. Such misclassifications (zero as one, one as zero) were rare in this study: Table 2 shows only 5 of a possible 30 instances in the level 2 combination, and 3 of 20 instances in the level 1 combination, in which zero was misclassified as one or vice versa.
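The classification summary in Table 2 can be reproduced in outline from observed and predicted categories: the diagonal of the confusion matrix gives the correct classification rate, and the two off-diagonal zero/one cells give the costlier swaps. A sketch assuming the coding 0 = mix, 1 = zero, 2 = one; the function name is hypothetical.

```python
import numpy as np

def classification_summary(observed, predicted, n_cat=3):
    """Confusion matrix (rows = observed, columns = predicted), correct-
    classification rate, and number of zero<->one confusions.
    Categories are coded 0 = mix, 1 = zero, 2 = one (assumed coding)."""
    cm = np.zeros((n_cat, n_cat), dtype=int)
    np.add.at(cm, (np.asarray(observed), np.asarray(predicted)), 1)
    rate = float(np.trace(cm) / cm.sum())
    zero_one_swaps = int(cm[1, 2] + cm[2, 1])
    return cm, rate, zero_one_swaps
```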

2) Mixture ratio forecasts

Dynamic weights are predicted from the mixture ratio forecast conditional on the bias direction forecast [Eq. (11)]. In the results that follow, a leave ± 6 months out cross-validation is performed to assess predictive accuracy: data blocks of 6 months on either side of each validation month are excluded when the model is formulated. For example, the July 1970 validation is based on a calibration period of January 1956 to December 1969 and February 1971 to December 2001. In addition, the model is validated in four 10-yr blocks; for example, the 1992–2001 validation block uses a calibration period of 1956–91. The mean squared error (MSE) of the predictions is used as the measure of forecast accuracy.
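The leave ± 6 months out scheme amounts to dropping a 13-month block centred on each validation month. The sketch below assumes a monthly series indexed from January 1956 (index 0); with that indexing, the July 1970 example in the text corresponds to index 174. Function names are hypothetical.

```python
import numpy as np

def leave_block_out_indices(n, t, half_width=6):
    """Calibration indices when month t is validated: months
    t - half_width ... t + half_width are excluded."""
    idx = np.arange(n)
    return idx[np.abs(idx - t) > half_width]

def month_index(year, month, start_year=1956):
    """Zero-based index of a calendar month in a series starting in January of start_year."""
    return (year - start_year) * 12 + (month - 1)
```

Validating July 1970 in a January 1956 to December 2001 record (552 months) then excludes January 1970 to January 1971, matching the calibration periods quoted above.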

The static combination forecast [Eq. (2)] is used as the benchmark for evaluating the performance of the dynamic combination method. We index {UCLA, ECMWF, CPC} as models {1, 2, 3}, in order from lower to higher paired covariance. The static weights of the UCLA and ECMWF models are w1(s) and w2(s). These weights can be compared with the overall weight assigned to each individual model at level 1 of the hierarchical tree in Fig. 4. Using the notation of Eq. (14) for the dynamic weight {wt,m(k)}, where m is the pair number at the kth tree level and wt,m(k) is the weight for the first model in the pair (the other model receiving 1 − wt,m(k)), the overall weights associated with the individual models can be derived as shown in Table 3.
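Overall model weights follow by multiplying the weights along each branch of the tree. The sketch below assumes that UCLA and ECMWF form the level 2 pair (weight on UCLA) and that the level 1 weight applies to the UCLA–ECMWF branch, with CPC receiving the remainder; this assignment is an assumption about Fig. 4 and Table 3, not taken from them.

```python
import numpy as np

def overall_weights_three(w_level1, w_level2):
    """Overall weights of (UCLA, ECMWF, CPC) in the assumed three-model tree.

    w_level1 : weight on the (UCLA, ECMWF) branch at level 1
    w_level2 : weight on UCLA within the level-2 pair
    Works elementwise on weight time series; the three weights sum to one.
    """
    w1 = np.asarray(w_level1, dtype=float)
    w2 = np.asarray(w_level2, dtype=float)
    return w1 * w2, w1 * (1.0 - w2), 1.0 - w1
```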

The dynamic and static weights described above, from the ± 6 months out cross-validation, are presented in Figs. 5 and 6. The static weights do not change over time; their slight fluctuations arise only because each cross-validation fold is calibrated on a different sample. As expected, the static weights lie near the centroid of the observed dynamic weights. Closer scrutiny of the predicted weights revealed that the UCLA model dominated during El Niño periods, reflecting the primary calibration intent of the UCLA model (Kondrashov et al. 2005). During La Niña phases, the weights slightly favored the CPC model. The ECMWF contribution was minor overall, and especially so during La Niña and neutral years.

3) Niño-3.4 prediction from the combined models

This subsection presents the improvements in predicting Niño-3.4 achieved by model combination with dynamic weights. We start by assessing the performance of the existing precision-ratio-based combination methods [Eqs. (5) and (6)], using the last 10 years (1991–2001) of hindcasts. None of the tested combinations of the parameters γ, h, and λ of Eqs. (5) and (6) reduced the predictive error variance below that of the static-weight combination (0.192; Fig. 7), confirming the need for a more flexible dynamic weight formulation.

Recall that the proposed model consists of two stages: the first estimates the magnitude of the mixture ratio between the two models, and the second the direction of their respective errors. These stages are referred to as the mixture ratio and bias direction models. Table 4 presents the MSE for calibration and for the various validation cases; improvements are seen in all validation cases presented.

The MSE of the dynamic combination (the same as the second column of Table 4), of the static combination, and of the component models over the same period for the Niño-3.4 response variable are listed in Table 5. The overall reduction of MSE (Table 5) and the improved prediction skill of both combination methods reconfirm the earlier finding that model combination improves prediction accuracy (Armstrong 1989; Clemen 1989). The reduction in MSE of the dynamic combination relative to the static combination is minor; however, it is consistent across all cases. A one-tailed t test indicates that this reduction in MSE is statistically significant (p = 0.0117). Notably, these improvements are based only on persistence of various orders; no exogenous predictors were considered, for simplicity of presentation. In addition to MSE, the results were analyzed using alternative measures such as mean absolute error, mean error in probability space, and the Nash–Sutcliffe efficiency (R2). These measures led to similar conclusions and are not presented here.
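For reference, the deterministic scores named above can be computed as follows. This is a generic sketch of the standard definitions of MSE, mean absolute error, and Nash–Sutcliffe efficiency, not the authors' code; the function name is hypothetical.

```python
import numpy as np

def skill_scores(obs, pred):
    """Mean squared error, mean absolute error and Nash-Sutcliffe efficiency."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    # NSE = 1 - (sum of squared errors) / (sum of squared deviations of obs from its mean)
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    return {"mse": float(mse), "mae": float(mae), "nse": float(nse)}
```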

4. Discussion

Although the results presented in the previous section point to the utility of the proposed dynamic combination approach, a number of issues need to be discussed. Foremost among these are the predictive models used to estimate the dynamic combination weights.

Including the bias direction model reduces the chance of the combined prediction being inferior to an individual model. There is, however, no strong and clear basis for ordering the categories of the bias direction variable. Alternatives to the ordered logistic regression used for the bias direction model include linear discriminant analysis, quadratic discriminant analysis, and multinomial models. An evaluation of these alternatives did not yield better predictions in the case study presented here. In addition, OLR was preferred because it needs fewer parameters than the multinomial model. It is worth flagging that the combination method is formulated on the basic assumption that each component model is unbiased, yet this study did not remove the minor bias from the component predictions prior to combination. Our view is that any apparent small bias is local to the analysis time window only; prewhitening the predictions without detailed knowledge of their calibration may therefore be risky.

The class of linear predictive models, here the GLAR, does not represent the probability distribution of the weights well, because it has limited ability to reach their extreme limits of 0 and 1; it therefore underestimates the confidence interval of the combined prediction. Localized regression models (Cleveland 1979; Lall et al. 2006), such as loess (Cleveland et al. 1988), and generalized additive models (Hastie and Tibshirani 1986) yielded better predictions in some trials. Our findings concur with Bates and Granger (1969) that the weights (mixture ratio) follow a beta distribution; improvements can be expected from a predictive model that represents this better, such as the beta regression model (Ferrari and Cribari-Neto 2004). This was not attempted here, to keep the presentation simple and concise.

One way to avoid specifying a distribution altogether is to use nonparametric predictive models (Lall and Sharma 1996; Mehrotra and Sharma 2006; Sharma and Lall 1999). Details of alternative regression options and nonparametric methods are not included here, to keep the focus of the paper on the rationale for combining models dynamically.

The weights ωt can be interpreted as the probability that one component model exhibits a lower error than its partner in each pairwise combination. The estimation of these weights, alternatively referred to as gating functions (Hastie et al. 2000; Marshall et al. 2007), assumes a linear form in the logit-transformed space. This assumption is often made for simplicity and could be improved upon by using other gating functions (instead of the logit transform used here) or nonlinear/nonparametric models instead of the linear one used. A notable departure of this research from existing combination weight formulations is the use of the absolute residual ratio [Eq. (7)] instead of the precision ratio (inverse of variance). Precision-ratio-based dynamic weights [variants of Eqs. (3), (5), or (6)] require a minimum bandwidth (time window) to obtain a stable estimate of the variance. In contrast, the absolute ratio, which is the analytical solution for the weights when the predictions bracket the true value, can be computed at every time step.
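The contrast between the two weight types can be illustrated with generic versions of each. The precision-ratio weight below uses the mean squared error over the previous h steps; it is a plain illustration of the idea, not a reproduction of Eqs. (3), (5), or (6), whose parameters (γ, h, λ) are not shown here. The absolute-ratio weight uses the assumed form |ej,t| / (|ei,t| + |ej,t|). Function names are hypothetical.

```python
import numpy as np

def precision_ratio_weight(e_i, e_j, h):
    """Generic window-based precision-ratio weight on model i.

    Precision = 1 / (mean squared error over the previous h steps). The first
    h entries are nan because a full window of past errors is needed.
    """
    e_i = np.asarray(e_i, dtype=float)
    e_j = np.asarray(e_j, dtype=float)
    w = np.full(e_i.shape, np.nan)
    for t in range(h, len(e_i)):
        prec_i = 1.0 / np.mean(e_i[t - h:t] ** 2)
        prec_j = 1.0 / np.mean(e_j[t - h:t] ** 2)
        w[t] = prec_i / (prec_i + prec_j)
    return w

def absolute_ratio_weight(e_i, e_j):
    """Per-step absolute residual ratio (assumed form): available at every t."""
    a_i = np.abs(np.asarray(e_i, dtype=float))
    a_j = np.abs(np.asarray(e_j, dtype=float))
    return a_j / (a_i + a_j)
```

The precision-ratio weight is undefined until h past errors exist, while the absolute ratio needs only the current pair of errors.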

Note also that dynamic combination weights have been explored in earlier studies, although not for combining climate model forecasts. Marshall et al. (2007) applied Bayesian model averaging (Hoeting et al. 1999) dynamically; their method assumes knowledge of exogenous predictors (with embedded persistence) and full structural details of all component models. In contrast, this study treats each component model as a black box: no structural knowledge is needed, which allows a wide range of off-the-shelf (statistical or dynamical) predictive models to be mixed, and models are considered on a pairwise basis. Another similar application is found in Robertson et al. (2004), where all the component models were paired against the climatological forecast as a way to stabilize the multivariate weight computation. This resulted in a multivariate extension of the static weight combination presented in section 2a. Kim et al. (2006) applied artificial neural networks to dynamically combine hydrological models. However, the predictors of the weights were identified in a full multivariate setting, which may add predictive variance given the complexity of the neural network models used.

While this paper advocates model combination using dynamic weights, the robustness of simple static combination should not be underestimated. The simple method often gives satisfactory results compared with more computationally intensive approaches (De Gooijer and Hyndman 2006), especially when the component predictions are precise. The complexity of combination increases with the number of component models. Increasing the number of hyperparameters (parameters external to the component models) reduces the degrees of freedom, which may eventually compromise the strength of the combination. However, using cross-validation to evaluate model performance largely addresses this concern, since the cross-validation mean square error represents the predictive error the models may have. The number of component models can be reduced during the design of the combination tree by precombining (with static weights) model partners that show high residual covariance. Although empirical studies in the economic forecasting literature recommend at most 6–8 components (Armstrong 2001, p. 420; Hibon and Evgeniou 2005), it is unclear whether that finding holds for the type of models considered here, especially if such combinations were formulated for a multivariate response such as gridded sea surface temperature anomalies over the whole globe. Nevertheless, the bias direction model partly offsets the loss of parsimony of the combined prediction when the number of models M is high. In our opinion, the optimum size of M depends largely on the extent of the uncertainty and the degree of independence of the component predictions. This empirical study is based on the ensemble mean only; nevertheless, the predicted weights could also be applied to combine the full set of component realizations to obtain the full probability range.
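Precombining a highly covarying pair with a static weight can be sketched with the familiar minimum-variance pairwise weight of Bates and Granger (1969), w = (σj² − cij)/(σi² + σj² − 2cij). This is a stand-in under that assumption: the static weights of Eq. (2) are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def static_pair_weight(e_i, e_j):
    """Minimum-variance static weight on model i for a pair of residual series:
    w = (var_j - cov_ij) / (var_i + var_j - 2 cov_ij)."""
    c = np.cov(np.asarray(e_i, dtype=float), np.asarray(e_j, dtype=float))
    return float((c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2.0 * c[0, 1]))

def precombine(u_i, u_j, w):
    """Merge two component predictions into one with a fixed weight w on model i."""
    return w * np.asarray(u_i, dtype=float) + (1.0 - w) * np.asarray(u_j, dtype=float)
```

The merged series can then enter the combination tree as a single component, reducing M by one.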
Our future work aims to expand this method, now applied to a univariate response, to a multivariate spatially distributed response vector. In a multivariate extension, we envisage the challenge will be to maintain spatial dependence in the predicted responses with minimal loss of degrees of freedom while consistently exhibiting improvement.

5. Conclusions

This paper presented a methodology for combining forecasts from multiple models in a dynamic manner. Models were mixed in pairs using importance weights that were allowed to vary in time, reflecting the persistence of individual model skill and of any relevant exogenous variables. The model pairs were first matched based on the sample error covariance, and each pair was then combined by estimating a weight for each time step. The weights were structured in a hierarchical pairwise combination tree. This process provided a low-dimensional setting for investigating any predictive structure in the relative model strengths. A two-step regression model was used to predict the weights, the steps being the mixture ratio model and the bias direction model. The mixture ratio was predicted by a generalized linear autoregressive model and the bias direction by an ordered logistic regression.

The method was applied to combine two statistical climate models and one dynamical climate model. The variables of interest were the monthly sea surface temperature anomalies averaged over the Niño-3.4 region from 1956 to 2001, predicted three months in advance. The combined model skill was tested using a “leave ± 6 months out cross-validation” along with validation in individual 10-yr blocks. First, this empirical study reconfirmed that predictions from a static-weight combination (or weighted model average) of multiple models improve skill compared with any single model prediction. Second, predictions using existing precision-ratio-based dynamic weights did not improve on predictions using static-weight combination. Third, the proposed dynamic weight computation method improved on the existing precision-ratio-based dynamic weights. The proposed method showed a very small but consistent increase in prediction skill over the static-weight method in all six validation scenarios, with no case of worsening results. These consistent results suggest that the improvement obtained by combining multiple predictions with the proposed dynamic weights is real.

Acknowledgments

We acknowledge the assistance of Upmanu Lall, Andrew Robertson, and Lisa Goddard of the Lamont Doherty Earth Observatory Centre of Columbia University, New York, and thank them for access to their component model prediction database. This research was funded by the Australian Research Council and the Sydney Catchment Authority. The computations were performed using the freely available R statistical computing platform (R Development Core Team 2006, available online at http://www.r-project.org/). The helpful comments of the anonymous Journal of Climate reviewers are gratefully acknowledged.

REFERENCES

  • Agresti, A., 1996: An Introduction to Categorical Data Analysis. Wiley Series in Probability and Statistics, Wiley, 290 pp.

  • Aksu, C., , and S. Gunter, 1992: An empirical analysis of the accuracy of SA, OLS, ERLS and NRLS combination forecasts. Int. J. Forecasting, 8 , 2743.

    • Search Google Scholar
    • Export Citation
  • Armstrong, J. S., 1989: Combining forecasts: The end of the beginning or the beginning of the end? Int. J. Forecasting, 5 , 585588.

  • Armstrong, J. S., , and J. S. Armstrong, 2001: Combining forecasts. Principles of Forecasting: A Handbook for Researchers and Practitioners, J. S. Armstrong, Ed., Kluwer Academic, 417–439.

    • Search Google Scholar
    • Export Citation
  • Barnston, A. G., , S. J. Mason, , L. Goddard, , D. G. Dewitt, , and S. E. Zebiak, 2003: Multimodel ensembling in seasonal climate forecasting at IRI. Bull. Amer. Meteor. Soc., 84 , 17831796.

    • Search Google Scholar
    • Export Citation
  • Bates, J. M., , and C. W. J. Granger, 1969: The combination of forecasts. Oper. Res. Quart., 20 , 451468.

  • Bottomley, M., , C. K. Folland, , J. Hsiung, , R. E. Newell, , and D. E. Parker, 1990: Global Ocean Surface Temperature Atlas. Met Office, and Massachusetts Institute of Technology, 20 pp. and 313 plates.

    • Search Google Scholar
    • Export Citation
  • Carrasco, J. A., , and J. D. Ortuzar, 2002: Review and assessment of the nested logit model. Transp. Rev., 22 , 197218.

  • Chambers, J. M., 1992: Linear models. Statistical Models in S, J. M. Chambers and T. Hastie, Eds., Wadsworth & Brooks, 95–144.

  • Chandler, R. E., 2005: On the use of generalized linear models for interpreting climate variability. Environmetrics, 16 , 699715.

  • Clemen, R. T., 1989: Combining forecasts: A review and annotated bibliography. Int. J. Forecasting, 5 , 559583.

  • Cleveland, W. S., 1979: Robust locally weighted regression and smoothing scatterplots. J. Amer. Stat. Assoc., 74 , 829836.

  • Cleveland, W. S., , S. J. Devlin, , and E. Grosse, 1988: Regression by local fitting: Methods, properties, and computational algorithms. J. Econometrics, 37 , 87114.

    • Search Google Scholar
    • Export Citation
  • Coelho, C. A. S., , S. Pezzulli, , M. Balmaseda, , F. J. Doblas-Reyes, , and D. B. Stephenson, 2004: Forecast calibration and combination: A simple Bayesian approach for ENSO. J. Climate, 17 , 15041516.

    • Search Google Scholar
    • Export Citation
  • Colman, A. W., , and M. K. Davey, 2003: Statistical prediction of global sea-surface temperature anomalies. Int. J. Climatol., 23 , 16771697.

    • Search Google Scholar
    • Export Citation
  • Dawes, R., , R. Fildes, , M. Lawrence, , and K. Ord, 1994: The past and the future of forecasting research. Int. J. Forecasting, 10 , 151159.

    • Search Google Scholar
    • Export Citation
  • de Gooijer, J. G., , and R. J. Hyndman, 2006: 25 years of time series forecasting. Int. J. Forecasting, 22 , 443473.

  • de Menezes, L. M., , D. W. Bunn, , and J. W. Taylor, 2000: Review of guidelines for the use of combined forecasts. Eur. J. Oper. Res., 120 , 190204.

    • Search Google Scholar
    • Export Citation
  • Deutsch, M., , C. W. J. Granger, , and T. Terasvirta, 1994: The combination of forecasts using changing weights. Int. J. Forecasting, 10 , 4757.

    • Search Google Scholar
    • Export Citation
  • Doblas-Reyes, F. J., , R. Hagedorn, , and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting: II. Calibration and combination. Tellus, 57A , 234252.

    • Search Google Scholar
    • Export Citation
  • Dunteman, G. H., , and M. R. Ho, 2006: An Introduction to Generalized Linear Models. Quantitative Applications in the Social Sciences, Vol. 145, Sage Publications, 72 pp.

    • Search Google Scholar
    • Export Citation
  • Ferrari, S. L. P., , and F. Cribari-Neto, 2004: Beta regression for modelling rates and proportions. J. Appl. Stat., 31 , 799815.

  • Fraedrich, K., , and N. Smith, 1989: Combining predictive scheme in long-range forecasting. J. Climate, 2 , 291294.

  • Fritsch, J. M., , J. Hilliker, , and J. Ross, 2000: Model consensus. Wea. Forecasting, 15 , 571582.

  • Gelman, A., , and J. Hill, 2006: Data Analysis Using Regression and Multilevel/Hierarchical Models. 1st ed. Cambridge University Press, 625 pp.

    • Search Google Scholar
    • Export Citation
  • Granger, C. W. J., , and P. Newbold, 1977: Forecasting Economic Time Series. Academic Press, 333 pp.

  • Greene, A. M., , L. Goddard, , and U. Lall, 2006: Probabilistic multimodel regional temperature change projections. J. Climate, 19 , 43264343.

    • Search Google Scholar
    • Export Citation
  • Hastie, T., , and R. Tibshirani, 1986: Generalized additive models. Stat. Sci., 1 , 297310.

  • Hastie, T., , and D. Pregibon, 1992: Generalised linear models. Statistical Models in S, J. M. Chambers and T. Hastie, Eds., Wadsworth & Brooks, 195–248.

    • Search Google Scholar
    • Export Citation
  • Hastie, T., , R. Tibshirani, , and J. Friedman, 2000: The Elements of Statistical Learning, Data Mining, Inference and Prediction. Springer Series in Statistics, Springer, 533 pp.

    • Search Google Scholar
    • Export Citation
  • Hibon, M., , and T. Evgeniou, 2005: To combine or not to combine: Selecting among forecasts and their combinations. Int. J. Forecasting, 21 , 1524.

    • Search Google Scholar
    • Export Citation
  • Hoeting, J. A., , D. Madigan, , A. E. Raftery, , and C. T. Volinsky, 1999: Bayesian model averaging: A tutorial (with discussion). Stat. Sci., 14 , 382417.

    • Search Google Scholar
    • Export Citation
  • Kamstra, M., , and P. Kennedy, 1998: Combining qualitative forecasts using logit. Int. J. Forecasting, 14 , 8393.

  • Kim, Y. O., , D. Jeong, , and I. H. Ko, 2006: Combining rainfall-runoff model outputs for improving ensemble streamflow prediction. J. Hydrol. Eng., 11 , 578588.

    • Search Google Scholar
    • Export Citation
  • Kondrashov, D., , S. Kravtsov, , A. W. Robertson, , and M. Ghil, 2005: A hierarchy of data-based ENSO models. J. Climate, 18 , 44254444.

  • Lall, U., , and A. Sharma, 1996: A nearest neighbor bootstrap for time series resampling. Water Resour. Res., 32 , 679693.

  • Lall, U., , Y-I. Moon, , H-H. Kwon, , and K. Bosworth, 2006: Locally weighted polynomial regression: Parameter choice and application to forecasts of the Great Salt Lake. Water Resour. Res., 42 , W05422. doi:10.1029/2004WR003782.

    • Search Google Scholar
    • Export Citation
  • Larrick, R. P., , and J. B. Soll, 2006: Intuitions about combining opinions: Misappreciation of the averaging principle. Manage. Sci., 52 , 111127.

    • Search Google Scholar
    • Export Citation
  • Lundberg, S., , T. Terasvirta, , and D. van Dijk, 2003: Time-varying smooth transition autoregressive models. J. Business Econ. Stat., 21 , 104112.

    • Search Google Scholar
    • Export Citation
  • Marshall, L. A., 2006: Bayesian analysis of rainfall-runoff models: Insights to parameter estimation, model comparison and hierarchical model development. Ph.D. thesis, School of Civil and Environmental Engineering, University of New South Wales, 222 pp.

  • Marshall, L. A., , D. Nott, , and A. Sharma, 2007: Towards dynamic catchment modelling: A Bayesian hierarchical modelling framework. Hydrol. Processes, 21 , 847861.

    • Search Google Scholar
    • Export Citation
  • McCullagh, P., , and J. A. Nelder, 1989: Generalized Linear Models. 2nd ed. Chapman and Hall, 511 pp.

  • McLeod, A. I., , D. J. Noakes, , K. W. Hipel, , and R. M. Thompstone, 1987: Combining hydrologic forecast. J. Water Resour. Plann. Manage., 113 , 2941.

    • Search Google Scholar
    • Export Citation
  • Mehrotra, R., , and A. Sharma, 2006: Conditional resampling of hydrologic time series using multiple predictor variables: A K-nearest neighbour approach. Adv. Water Resour., 29 , 987999.

    • Search Google Scholar
    • Export Citation
  • Palmer, T. N., and Coauthors, 2004: Development of a European multimodel ensemble system for seasonal-to-interannual prediction (DEMETER). Bull. Amer. Meteor. Soc., 85 , 853872.

    • Search Google Scholar
    • Export Citation
  • Pavan, V., , and F. J. Doblas-Reyes, 2000: Multi-model seasonal hindcasts over the Euro-Atlantic: Skill scores and dynamic features. Climate Dyn., 16 , 611625.

    • Search Google Scholar
    • Export Citation
  • Peng, P., , A. Kumar, , H. Van den Dool, , and A. G. Barnston, 2002: An analysis of multimodel ensemble predictions for seasonal climate anomalies. J. Geophys. Res., 107 , 4710. doi:10.1029/2002JD002712.

  • Phillips-Wren, G. E., E. D. Hahn, and G. A. Forgionne, 2004: A multiple-criteria framework for evaluation of decision support systems. Omega, 32, 323.

  • Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174.

  • Ragonda, S. K., B. Rajagopalan, M. Clark, and E. Zagona, 2006: A multimodel ensemble forecast framework: Application to spring seasonal flows in the Gunnison River Basin. Water Resour. Res., 42, W09404. doi:10.1029/2005WR004653.

  • Rajagopalan, B., U. Lall, and S. E. Zebiak, 2002: Categorical climate forecasts through regularization and optimal combination of multiple GCM ensembles. Mon. Wea. Rev., 130, 1792–1811.

  • Robertson, A. W., U. Lall, S. E. Zebiak, and L. Goddard, 2004: Improved combination of multiple atmospheric GCM ensembles for seasonal prediction. Mon. Wea. Rev., 132, 2732–2744.

  • Sanders, F., 1963: On subjective probability forecasting. J. Appl. Meteor., 2, 191–201.

  • See, L., and R. J. Abrahart, 2001: Multi-model data fusion for hydrological forecasting. Comput. Geosci., 27, 987–994.

  • Shamseldin, A. Y., K. M. O’Connor, and G. C. Liang, 1997: Methods for combining the outputs of different rainfall-runoff models. J. Hydrol., 197, 203–229.

  • Sharma, A., 2000: Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1—A strategy for system predictor identification. J. Hydrol., 239, 232–239.

  • Sharma, A., and U. Lall, 1999: A nonparametric approach for daily rainfall simulation. Math. Comput. Simul., 48, 361–371.

  • Sharma, A., and U. Lall, 2004: Model averaging and its use in probabilistic forecasting of hydrologic variables. Hydrology, Science and Practice for the 21st Century, Imperial College, London, British Hydrological Society, 372–378.

  • Shephard, N., 1995: Generalized linear autoregressions. Economics working paper 8, Nuffield College, University of Oxford, 13 pp.

  • Smith, T. M., and R. W. Reynolds, 2003: Extended reconstruction of global sea surface temperatures based on COADS data (1854–1997). J. Climate, 16, 1495–1510.

  • Terui, N., and H. K. van Dijk, 2002: Combined forecasts from linear and nonlinear time series models. Int. J. Forecasting, 18, 421–438.

  • Thompson, P. D., 1977: How to improve accuracy by combining independent forecasts. Mon. Wea. Rev., 105, 228–229.

  • Van den Dool, H., 2000: Constructed analogue prediction of the east central tropical Pacific SST and the entire world ocean for 2001. Exp. Long-Lead Forecast Bull., 9, 38–41.

  • Xiong, L., A. Y. Shamseldin, and K. M. O’Connor, 2001: A non-linear combination of the forecasts of rainfall-runoff models by the first-order Takagi-Sugeno fuzzy system. J. Hydrol., 245, 196–217.

  • Yang, C., R. E. Chandler, V. S. Isham, and H. S. Wheater, 2005: Spatial-temporal rainfall simulation using generalized linear models. Water Resour. Res., 41, W11415. doi:10.1029/2004WR003739.

  • Yu, L., S. Wang, and K. K. Lai, 2005: A novel nonlinear ensemble forecasting model incorporating GLAR and ANN for foreign exchange rates. Comput. Oper. Res., 32, 2523–2541.

  • Zou, H., and Y. Yang, 2004: Combining time series models for forecasting. Int. J. Forecasting, 20, 69–84.

Fig. 1.

(a) Classification of the bias direction variable. The time series of the ratio of the two model residuals (e2/e1) is divided into three zones, and the residuals are classified into a three-category response variable {mix, zero, one} as shown. (b) The simple ordered logistic model for a three-category response variable; the regression lines divide the probability space. For a given value of the predictor x, the dashed line shows P(b = mix) = 0.26, P(b = zero) = 0.70 − 0.26 = 0.44, and P(b = one) = 1 − 0.70 = 0.30.

Citation: Journal of Climate 22, 3; 10.1175/2008JCLI2210.1
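The probabilities in panel (b) follow from differencing cumulative probabilities. The Python sketch below illustrates this under an assumed proportional-odds (cumulative logit) parameterization, P(b ≤ k | x) = logistic(θ_k − βx), with the categories ordered mix < zero < one. The function names, the cut points θ_k, and the slope β are hypothetical: the cut points are chosen only so that the cumulative probabilities at x = 0 equal the 0.26 and 0.70 read off the dashed line, and they are not fitted values from this study.

```python
import math


def logistic(z):
    """Standard logistic function, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the logistic function for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def ordered_logit_probs(x, cutpoints, beta):
    """Category probabilities under an assumed proportional-odds model.

    P(b <= k | x) = logistic(cutpoints[k] - beta * x), with categories
    ordered mix < zero < one, so len(cutpoints) = number of categories - 1.
    """
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [logistic(c - beta * x) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for cum in cumulative:
        probs.append(cum - previous)  # P(b = k) = P(b <= k) - P(b <= k - 1)
        previous = cum
    return probs


# Hypothetical cut points giving cumulative probabilities 0.26 and 0.70 at x = 0.
cutpoints = [logit(0.26), logit(0.70)]
p_mix, p_zero, p_one = ordered_logit_probs(0.0, cutpoints, beta=1.5)
print(f"P(mix)={p_mix:.2f}, P(zero)={p_zero:.2f}, P(one)={p_one:.2f}")
# P(mix)=0.26, P(zero)=0.44, P(one)=0.30
```

Because the cut points are increasing, the cumulative curves never cross, so every category probability stays non-negative for any x.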

Fig. 2.

Tree showing the pairwise hierarchical mixing of four component models.

Fig. 3.

Pairs plot of the three model residuals. The values on the diagonal show the variance of each individual model's error, and the numbers in the upper panels are the pairwise covariances. A lower covariance indicates a greater potential for improvement through combination.
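Why a lower error covariance leaves more room for improvement can be seen from the classical minimum-variance weight for combining two unbiased forecasts (Bates and Granger 1969). For a combination w f1 + (1 − w) f2 with error variances σ1², σ2² and error covariance σ12, the optimal weight is w* = (σ2² − σ12)/(σ1² + σ2² − 2σ12), and the resulting error variance is (σ1²σ2² − σ12²)/(σ1² + σ2² − 2σ12). The Python sketch below evaluates these expressions; it illustrates the general principle and is not necessarily the weighting scheme used in this paper, and the variances in the example are hypothetical.

```python
def min_variance_combination(var1, var2, cov12):
    """Optimal weight on model 1 and error variance of w*f1 + (1 - w)*f2.

    Assumes both forecasts are unbiased; var1, var2 are the error variances
    and cov12 is the error covariance of the two models.
    """
    denom = var1 + var2 - 2.0 * cov12  # variance of (e1 - e2), never negative
    if denom <= 0:
        raise ValueError("residual difference has zero variance; weight undefined")
    w = (var2 - cov12) / denom
    combined_var = (var1 * var2 - cov12 ** 2) / denom
    return w, combined_var


# Two hypothetical models with unit error variance and different covariances.
for cov in (0.8, 0.2):
    w, v = min_variance_combination(1.0, 1.0, cov)
    print(f"cov={cov}: w={w:.2f}, combined variance={v:.2f}")
# cov=0.8: w=0.50, combined variance=0.90
# cov=0.2: w=0.50, combined variance=0.60
```

With equal error variances the weight stays at 0.5, but lowering the covariance from 0.8 to 0.2 reduces the combined error variance from 0.90 to 0.60.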

Fig. 4.

Pairwise hierarchical mixing tree of UCLA (U), CPC (C), and ECMWF (E). The pair UCLA + ECMWF is combined first because its residuals have the lowest covariance.
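Choosing the first pair of the tree can be sketched as a search over all model pairs for the lowest sample covariance of the residuals. In this Python sketch the residual series and helper names are hypothetical, and "lowest" is assumed to mean the smallest signed covariance.

```python
from itertools import combinations


def sample_cov(a, b):
    """Sample covariance of two equal-length residual series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def lowest_covariance_pair(residuals):
    """Return the (name, name) pair whose residual covariance is smallest."""
    pairs = combinations(sorted(residuals), 2)
    return min(pairs, key=lambda p: sample_cov(residuals[p[0]], residuals[p[1]]))


# Hypothetical residual series for the three models.
residuals = {
    "U": [1.0, -1.0, 2.0, -2.0],
    "C": [1.0, -1.0, 2.0, -1.0],
    "E": [-1.0, 1.0, 0.0, 1.0],
}
print(lowest_covariance_pair(residuals))  # ('E', 'U'), i.e., UCLA + ECMWF
```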

Fig. 5.

The overall dynamic weights for the UCLA model. The weights predicted in validation are drawn as solid lines and the optimum weights as black dots. The broken line, included for clarity, is a 4-month moving average of the optimum weights. The near-horizontal line at 0.55 is the static weight.
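A 4-month moving average like the broken line can be computed as below. The caption does not say whether the window is trailing or centered, so this Python sketch assumes a trailing window; the weight series and function name are hypothetical.

```python
def trailing_moving_average(values, window=4):
    """Mean of each value and the window - 1 values preceding it.

    Positions without a complete window are returned as None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


# Hypothetical monthly optimum weights for one model.
weights = [0.2, 0.8, 0.6, 0.4, 1.0, 0.0]
print(trailing_moving_average(weights))
```

A centered window would shift each average by about two months; the choice only affects where the smoothed curve sits relative to the dots, not its shape.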

Fig. 6.

The overall dynamic weights for the CPC model. The weights predicted in validation are drawn as solid lines and the optimum weights as black dots. The broken line, included for clarity, is a 4-month moving average of the optimum weights. The near-horizontal line at 0.35 is the static weight.

Fig. 7.

The error variance of the combined prediction plotted against the parameters of Eqs. (5) and (6). The predictive error variance of the static-weight combination for 1992–2001 is 0.192; a large h or γ = 1 collapses the model to the static-weight case. Neither method shows any improvement (i.e., an error variance < 0.192) from localized estimation of the weights (smaller h or γ > 1).

Table 1.

Predictor variables for the pairwise model combinations. U: UCLA, C: CPC, and E: ECMWF. The notation t − 3 denotes a lag of 3 months; see Eqs. (8) and (12) for further description of the notation.

Table 2.

Contingency table (in percent) of the predictions from the ordered logistic model. The diagonal gives the correct classification rate. (a) Level-2 tree, in which the UCLA and ECMWF models are combined; (b) level-1 tree, in which (UCLA + ECMWF) is combined with CPC.

Table 3.

Notation and comparative association of the dynamic weight (46 yr of monthly values) with the static weight.

Table 4.

Mean square errors of Niño-3.4 predictions before and after inclusion of the bias direction model. The first row gives calibration on the full dataset (1951–2001). The second row gives leave-±6-months-out cross-validation (CV) skill. The remaining rows give validation in 10-yr blocks. In all cases the MSE is lower after inclusion of the bias model.
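The leave-±6-months-out cross-validation can be sketched as follows: each month is predicted by a model fitted to all months except that month and the six months on either side of it, which presumably limits leakage through serial correlation in the monthly series. In this Python sketch the series, the training-mean stand-in model, and the function names are hypothetical, and the series is assumed to be consecutive monthly values without gaps.

```python
def leave_window_out_splits(n_months, half_width=6):
    """Yield (test_month, train_months) for leave-±half_width-months-out CV."""
    for t in range(n_months):
        # Exclude month t and the half_width months on either side of it.
        train = [i for i in range(n_months) if abs(i - t) > half_width]
        if not train:
            raise ValueError("series too short for this exclusion window")
        yield t, train


def cv_mse(observed, fit_and_predict, half_width=6):
    """Mean square error of the out-of-sample prediction for each month."""
    errors = []
    for t, train in leave_window_out_splits(len(observed), half_width):
        prediction = fit_and_predict(train, t)
        errors.append((observed[t] - prediction) ** 2)
    return sum(errors) / len(errors)


# Hypothetical monthly anomaly series (20 months).
obs = [0.5, -0.2, 1.1, 0.3, -0.7, 0.0, 0.9, -1.2, 0.4, 0.6,
       -0.3, 0.8, -0.5, 0.2, 1.0, -0.9, 0.1, 0.7, -0.4, 0.3]


def mean_model(train, t):
    """Stand-in model: predict month t by the mean of the training months."""
    return sum(obs[i] for i in train) / len(train)


print(f"CV MSE: {cv_mse(obs, mean_model):.3f}")
```

Replacing `mean_model` with a function that refits the combination model on `train` and predicts month `t` gives the same cross-validation loop for any of the compared models.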

Table 5.

MSE of Niño-3.4 predictions from the various models over a common period. The first row gives calibration on the full dataset (1951–2001). The second row gives leave-±6-months-out cross-validation skill. The remaining rows give validation in 10-yr blocks.
