The Subseasonal Experiment (SubX) is a multimodel subseasonal prediction experiment designed around operational requirements with the goal of improving subseasonal forecasts. Seven global models have produced 17 years of retrospective (re)forecasts and more than a year of weekly real-time forecasts. The reforecasts and forecasts are archived at the Data Library of the International Research Institute for Climate and Society, Columbia University, providing a comprehensive database for research on subseasonal to seasonal predictability and predictions. The SubX models show skill for temperature and precipitation 3 weeks ahead of time in specific regions. The SubX multimodel ensemble mean is more skillful than any individual model overall. Skill in simulating the Madden–Julian oscillation (MJO) and the North Atlantic Oscillation (NAO), two sources of subseasonal predictability, is also evaluated, with skillful predictions of the MJO 4 weeks in advance and of the NAO 2 weeks in advance. SubX is also able to make useful contributions to operational forecast guidance at the Climate Prediction Center. Additionally, SubX provides information on the potential for extreme precipitation associated with tropical cyclones, which can help emergency management and aid organizations to plan for disasters.
SubX is a research-to-operations project in service of developing better operational subseasonal forecasts.
Early warning of heat waves, extreme cold, flooding rains, flash drought, or other weather hazards as far as 4 weeks into the future could allow for risk reduction and disaster preparedness, potentially preserving life and resources. Less extreme, but no less important, reliable probabilistic forecasts about the potential for warmer, colder, wetter, or drier conditions at a few weeks lead are valuable for routine planning and resource management. Many sectors would benefit from these predictions, including emergency management, public health, energy, water management, agriculture, and marine fisheries [see White et al. (2017) for a review of potential applications]. However, a well-known “gap” exists in our current prediction systems at this subseasonal time scale of 2 weeks to 1 month. This gap falls between the prediction of weather, where atmospheric initial conditions contribute to skillful forecasts, and seasonal prediction, which is guided by slowly evolving surface boundary conditions such as sea surface temperatures and soil moisture (National Research Council 2010; Brunet et al. 2010; National Academies of Sciences, Engineering and Medicine 2017; Mariotti et al. 2018; Black et al. 2017; DelSole et al. 2017).
The potential for successful prediction at the subseasonal time scale has been established for some regions and seasons (e.g., Pegion and Sardeshmukh 2011; DelSole et al. 2017; Li et al. 2015), but it is not clear whether the full potential predictability has been realized. Additionally, many questions remain regarding our fundamental understanding of the physical processes giving rise to predictability, as well as how best to design, build, postprocess, and verify a subseasonal prediction system. Amidst these questions, the U.S. National Oceanic and Atmospheric Administration (NOAA) was mandated to begin issuing week 3–4 outlooks for temperature and precipitation. NOAA has for many years released official outlooks for 1-week, 2-week, 1-month, and 3-month averages; week 3–4 prediction is a new area with many unique research and development concerns.
The Subseasonal Experiment (SubX), a research-to-operations project, was launched to fulfill both the immediate need for real-time subseasonal prediction guidance and to allow for the exploration of relevant research questions, in order to develop more skillful and useful subseasonal predictions in the future. SubX takes a multimodel ensemble approach and includes global climate prediction models from both operational and research centers. As a research database designed around operational standards, SubX improves our ability to directly answer research questions in the service of developing better operational forecasts.
Combining models together into multimodel ensembles has been a successful technique to improve forecast quality for weather and seasonal predictions (e.g., Hagedorn et al. 2005; Weigel et al. 2008; Kirtman et al. 2014; Krishnamurti et al. 2000, 1999). The skill improvement comes from two sources: first, the collection of a larger ensemble of model predictions than that available from any individual forecast system, which allows for a better estimation of forecast uncertainty, probability distribution, and signal-to-noise ratio; equally advantageous is so-called “complementary skill,” or the additive skill from the different models. Also, as new versions of constituent models are introduced to the ensemble, a multimodel system can evolve faster than the typical improvement cycle for a single model. Examples of current multimodel systems include the North American Multimodel Ensemble (NMME; Kirtman et al. 2014) and European Seasonal to Interannual Prediction (EUROSIP; Mishra et al. 2019), both seasonal forecast systems, and the North American Ensemble Forecast System (NAEFS; Candille 2009; Candille et al. 2010), which produces forecasts out to 14 days.
THE SUBX DATABASE.
SubX provides a publicly available database of 17 years of historical reforecasts (1999–2015), plus more than 18 months of real-time forecasts from seven U.S. and Canadian modeling groups. All forecasts include daily values for at least 32 days beyond the initialization date. See Table 1 for model descriptions and appendix A for protocol details.
SubX has two unique aspects that distinguish it from other subseasonal forecast databases, such as the World Weather Research Programme (WWRP)/World Climate Research Programme (WCRP) Subseasonal to Seasonal (S2S) Prediction Project (Robertson et al. 2015; Vitart et al. 2017). The first of these is the inclusion of research models alongside operational models from NOAA and Environment and Climate Change Canada, facilitating feedback between research and operations on model development. A second distinction is the almost immediate availability of forecasts, allowing for use in real-time applications, including the NOAA Climate Prediction Center’s week 3–4 outlooks. This aspect of SubX has provided forecasters with additional forecast guidance, and allows for a research experiment to assess and guide best practices and priorities for real-time predictions.
HOW SKILLFUL ARE SUBSEASONAL PREDICTIONS WITH THE SUBX MODELS?
In addition to physical scientific questions, the design of a subseasonal multimodel ensemble mean (MME) presents practical complications beyond those of a weather or seasonal system. For example, a common challenge for subseasonal reforecast databases is that different models are initialized on different days, making it difficult to produce a traditional multimodel ensemble, typically made by averaging all forecasts from the same start date (Vitart et al. 2017). The implications of this practical consideration are explored in the SubX project, wherein forecasts from different start dates over the course of 1 week are combined and verified for the same verification period. This methodology, called a lagged average ensemble, has been used in weather and seasonal forecasting with single models (e.g., Hoffman and Kalnay 1983; Kalnay and Dalcher 1987; Trenary et al. 2018; DelSole et al. 2017).
Here, we evaluate the skill of the week 3 averages (average of days 15–21 of the forecast period) over all seasons from the individual SubX models' ensemble means, as well as the MME, for anomalous temperature and precipitation over land. Skill is assessed using the anomaly correlation coefficient (ACC; Wilks 2006). The ACC provides information about how well the variability of the forecasted anomalies matches the observed variability, and is calculated as the temporal correlation of temporal anomalies at each grid point (Becker et al. 2014), shown as maps in Figs. 1 and 2. Details of the observational datasets used for verification are provided in the sidebar “Verification datasets” and details of the methodology used for making climatology and anomalies are provided in appendix B.
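The week 3 averaging and the grid-point ACC described above can be illustrated with a minimal sketch. It assumes the anomalies have already been computed, that forecast day 1 is stored at list index 0, and that the correlation is the centered (Pearson) form; the function names are hypothetical and are not part of any SubX software.

```python
import math


def week3_mean(daily_values):
    """Average of forecast days 15-21 (1-based), i.e. list indices 14..20."""
    window = daily_values[14:21]
    if len(window) != 7:
        raise ValueError("forecast must cover at least 21 days")
    return sum(window) / 7


def anomaly_correlation(forecast_anoms, observed_anoms):
    """Temporal correlation of forecast and observed anomalies at one grid point.

    Assumption: centered Pearson correlation over all verification times.
    """
    n = len(forecast_anoms)
    if n != len(observed_anoms) or n in (0, 1):
        raise ValueError("need two equal-length series with at least 2 values")
    f_mean = sum(forecast_anoms) / n
    o_mean = sum(observed_anoms) / n
    cov = sum((f - f_mean) * (o - o_mean)
              for f, o in zip(forecast_anoms, observed_anoms))
    f_var = sum((f - f_mean) ** 2 for f in forecast_anoms)
    o_var = sum((o - o_mean) ** 2 for o in observed_anoms)
    if f_var == 0 or o_var == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(f_var * o_var)
```

Applying `anomaly_correlation` independently to the week 3 series at every grid point would yield a map of the kind shown in Figs. 1 and 2.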
Calculation of skill requires a verifying observational dataset. Where applicable, the datasets used correspond to those used by CPC for verification of their forecasts. For 2-m temperature over land, the CPC daily temperature dataset with horizontal resolution of 0.5° × 0.5° is used (see note SB3). These data are provided as maximum (Tmax) and minimum (Tmin) daily temperatures; thus, the average daily temperature is calculated as the average of Tmax and Tmin (Fan and van den Dool 2008). For precipitation over land, the CPC Global Daily Precipitation dataset (0.5° × 0.5°) is used (Xie et al. 2007; Chen et al. 2008). Verification datasets are regridded to the coarser SubX model resolution of 1° × 1° prior to performing model evaluation. The years 1999–2014 are used for evaluation of the 2-m temperature and precipitation skill.
We also evaluate the skill of indices representing two subseasonal phenomena that are known sources of S2S predictability: the MJO and the NAO. The MJO skill is evaluated using the RMM index without interannual variability removed (Wheeler and Hendon 2004). The observed index is calculated using the NCEP–NCAR reanalysis (Kalnay et al. 1996) and NOAA Interpolated OLR (Liebmann and Smith 1996). The NAO is defined as the projection of the December–February geopotential height at 500 hPa (Z500) onto the leading North Atlantic EOF spatial pattern of Z500 (0°–90°N, 93°W–47°E). The observed NAO index is calculated using 500-hPa geopotential height from the NCEP–NCAR reanalysis (Kalnay et al. 1996). The years 1999–2014 are used for the evaluation of MJO and NAO skill. Both indices are calculated daily and then averaged to weekly values for skill calculations.
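As an illustration of how a daily pattern-projection index such as the NAO index might be formed and then reduced to weekly values, the following is a rough sketch. The cosine-latitude weighting and the normalization by the weighted pattern norm are assumptions made here for illustration; the text does not specify them, and the function names are hypothetical.

```python
import math


def project_onto_pattern(field, pattern, lats_deg):
    """Latitude-weighted projection coefficient of an anomaly field onto a pattern.

    field, pattern: 2D lists indexed [lat][lon]; lats_deg: latitude of each row.
    Assumption: cos(latitude) area weights, normalized by the pattern's weighted norm.
    """
    num = 0.0
    den = 0.0
    for row_f, row_p, lat in zip(field, pattern, lats_deg):
        w = math.cos(math.radians(lat))
        for f, p in zip(row_f, row_p):
            num += w * f * p
            den += w * p * p
    return num / den


def weekly_means(daily_index):
    """Average a daily index into consecutive 7-day means, dropping any remainder."""
    n_weeks = len(daily_index) // 7
    return [sum(daily_index[7 * k:7 * k + 7]) / 7 for k in range(n_weeks)]
```

With this normalization, a field equal to twice the pattern returns an index value of 2.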
Note SB3: The original data can be found at ftp://ftp.cpc.ncep.noaa.gov/precip/PEOPLE/wd52ws/global_temp/.
The skill of the individual models and MME are also compared to a forecast based on the persistence of the initial conditions, where the anomaly at the initial forecast time is predicted to continue throughout the forecast. Week 3 is beyond weather time scales, and predictability due to atmospheric initial conditions is largely absent (Lorenz 1965, 1969). However, predictability due to slower varying components of the climate system, such as the global warming trend or the El Niño–Southern Oscillation, present in the initial anomaly will change little over a 3-week forecast. Therefore, skill due to these mechanisms would be present in a persistence forecast. Comparison of forecast skill with the skill of a persistence forecast provides insights into whether forecast skill can be attributed to any of these slowly varying components.
Over all months, positive ACC for temperature forecasts is present over much of the land for most models and the MME, with substantial regional variations (Fig. 1). The ACC of the individual models and the MME are higher than the skill of a persistence forecast, indicating that there is skill from sources other than the trend and/or ENSO (Fig. 1). While skill here is shown for the 15–21-day average forecasts for the individual models, the MME is produced from lagged averaged forecasts, and contains older model initializations (see appendix C for details). Nevertheless, the MME shows skill improvement over the individual models. For precipitation, anomaly correlation maps for week 3 indicate that the only region of statistically significant skill when calculated over all months is in Brazil (Fig. 2). This region of precipitation skill is consistent across the individual models and has higher skill than a persistence forecast; again, the MME has higher ACC than individual models, despite the inclusion of older model initializations.
While the multimodel ensemble mean methodology improves skill over the individual models, on average, skill at subseasonal time scales is low. However, there is evidence that skill varies over time. For example, there is seasonal dependence of skill for North America, with winter being more skillful than summer (e.g., DelSole et al. 2017). Skill also varies from year to year. This is evident in the SubX MME skill of spatial pattern correlations of North America temperature and precipitation anomalies for January initial conditions, which exhibits substantial variation with time (Fig. 3). At times, the ACC exceeds 0.5, a common threshold for “useful” skill (Murphy and Epstein 1989; Barnston and van den Dool 1994; Jones et al. 2000), while at other times, the ACC is zero or even negative. This indicates there may be potential for higher skill forecasts at certain times, called “forecasts of opportunity.” While a thorough diagnosis of these higher skill periods is outside the current scope of this paper, in the next section we examine some potential sources of subseasonal prediction skill.
SUBSEASONAL SOURCES OF PREDICTABILITY.
Subseasonal predictability is likely influenced by a number of modes of climate variability that vary on time scales of weeks, including the Madden–Julian oscillation (MJO; Madden and Julian 1971, 1972) and the North Atlantic Oscillation (NAO; Hurrell et al. 2010). Several studies have suggested these modes may be predictable on subseasonal time scales, and present potential sources of predictability, allowing for the identification of “forecasts of opportunity” (National Research Council 2010; National Academies of Sciences, Engineering and Medicine 2017). That is, due to known impacts from the subseasonal modes, model forecasts may be more skillful when these modes are active, allowing for more confidence in their output. Correctly simulating and predicting these processes and their impacts is key to successful subseasonal prediction.
The Madden–Julian oscillation.
The Madden–Julian oscillation, a dominant mode of tropical variability on subseasonal time scales, is a system of large-scale convective anomalies and associated circulation anomalies that propagates eastward from the tropical Indian Ocean and affects global weather [e.g., Cassou 2008; Lin et al. 2009; Guan et al. 2012; Mundhenk et al. 2018; Zhang 2013; see Stan et al. (2017) for a review of MJO teleconnections].
Therefore, accurate simulation and prediction of the MJO and its propagation is crucial to extend global subseasonal forecast skill. Observed convective anomalies associated with the MJO, as indicated by outgoing longwave radiation (OLR) anomalies, propagate eastward from the Indian Ocean (60°E) to the date line (Fig. 4, top). Most of the SubX models can reproduce the observed propagation of the OLR anomalies in week 3 forecasts, although some appear to have difficulty propagating them across the Maritime Continent, approximately 120°E—a well-known challenge for global climate models (Kim et al. 2018).
A common measurement of the MJO uses two Real-time Multivariate MJO (RMM) indices that combine OLR with winds at 200 and 850 hPa and measure the strength and phase of the MJO (Wheeler and Hendon 2004). A model’s ability to predict the combination of both RMM indices in terms of the bivariate correlation of the two indices provides insight into its overall capability to simulate and predict the MJO (Rashid et al. 2011). Most of the individual SubX models have ACC for these indices >0.5 out to week 4 (Fig. 5). This range of prediction skill is similar to the MJO skill of the WWRP/WCRP S2S models, with the exception of the skill of the European Centre for Medium-Range Weather Forecasts (ECMWF) model, which far exceeds that of any other S2S or SubX model (Vitart 2017). The SubX MME has similar skill to the best individual models for weeks 1–3 and higher skill at week 4 (Fig. 5). The MME is consistent with the ECMWF model from the S2S database, which has ACC for RMM indices of 0.6 out to 28 days (i.e., the end of the 4-week period; Vitart 2017).
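For reference, the bivariate correlation between observed and forecast RMM indices is commonly written as the sum of products of the two index pairs, normalized by the amplitudes of the observed and forecast series. The following is a minimal sketch under that standard definition; the function and argument names are hypothetical, and each list is assumed to hold values for the same set of verification times at one fixed lead.

```python
import math


def bivariate_correlation(obs_rmm1, obs_rmm2, fcst_rmm1, fcst_rmm2):
    """Bivariate correlation of (RMM1, RMM2) between observations and forecasts."""
    pairs = list(zip(obs_rmm1, obs_rmm2, fcst_rmm1, fcst_rmm2))
    if not pairs:
        raise ValueError("need at least one verification time")
    numerator = sum(a1 * b1 + a2 * b2 for a1, a2, b1, b2 in pairs)
    obs_norm = math.sqrt(sum(a1 * a1 + a2 * a2 for a1, a2, _, _ in pairs))
    fcst_norm = math.sqrt(sum(b1 * b1 + b2 * b2 for _, _, b1, b2 in pairs))
    return numerator / (obs_norm * fcst_norm)
```

A forecast identical to the observations scores 1, while a forecast whose MJO phase is displaced by a quarter cycle at every time scores 0, regardless of amplitude.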
It is of interest that the two most skillful SubX models at weeks 3 and 4 have very different configurations. The GMAO-GEOS model is a fully coupled atmosphere–ocean–land–sea ice model that has contributed to the monthly and seasonal NMME; this model contributes 4 ensemble members in SubX (see sidebar “SubX models”). In contrast, the base model of the EMC-GEFS is a numerical weather prediction atmosphere–land model forced with prescribed sea surface temperatures (SSTs) and contributes 11 ensemble members to the SubX reforecasts. The comparable MJO prediction skill from these two models illustrates an open question of S2S ensemble prediction, as the varying contributions of model configuration, ensemble size, and the role of a fully interactive ocean model remain to be clarified.
Seven modeling groups participate in SubX:
National Centers for Environmental Prediction (NCEP) Climate Forecast System, version 2 (NCEP-CFSv2);
NCEP Environmental Modeling Center, Global Ensemble Forecast System (EMC-GEFS);
Environment and Climate Change Canada Global Ensemble Prediction System, Global Environmental Multi-scale Model (ECCC-GEM);
National Aeronautics and Space Administration, Global Modeling and Assimilation Office, Goddard Earth Observing System (GMAO-GEOS);
Navy Earth System Prediction Capability (NAVY-ESPC; see note SB1);
National Center for Atmospheric Research Community Climate System Model, version 4, run at the University of Miami Rosenstiel School for Marine and Atmospheric Science (RSMAS-CCSM4);
National Oceanic and Atmospheric Administration, Earth System Research Laboratory, Flow-Following Icosahedral Model (ESRL-FIM).
(For additional details, see Table 1.)
All groups have provided reforecasts for the 1999–2015 period with the exception of ECCC-GEM (1999–2014; see note SB2), and most have provided additional reforecasts to fill the gap between the end of the SubX reforecast period and the beginning of the real-time forecasts in July 2017. Five of the groups use fully coupled atmosphere–ocean–land–sea ice models (NCEP-CFSv2, GMAO-GEOS, NAVY-ESPC, RSMAS-CCSM4, ESRL-FIM), while two groups use models with atmosphere and land components forced with prescribed SSTs (EMC-GEFS, ECCC-GEM). In the EMC-GEFS forecast system, SSTs are specified by relaxing the SST analysis to a combination of climatological SST and bias-corrected SST from operational NCEP-CFSv2 forecasts. The longer the lead time, the more weight is given to the bias-corrected NCEP-CFSv2 forecast SST. In the ECCC-GEM forecast system, the SST anomaly averaged from the previous 30 days is persisted in the forecast. The sea ice cover is adjusted in order to be consistent with the SST change [see Gagnon et al. (2013) for details]. Most groups provide four ensemble members for the reforecasts (NCEP-CFSv2, ECCC-GEM, GMAO-GEOS, NAVY-ESPC, ESRL-FIM), with some groups creating ensembles by combining different start times and others using their own ensemble generation systems to produce initial conditions. Some groups provide additional ensemble members in real time (e.g., RSMAS-CCSM4, EMC-GEFS).
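The anomaly-persistence treatment of SST described for ECCC-GEM can be sketched as follows. This is only an illustration of the general idea: it assumes the persisted anomaly is added to a climatology that evolves with the forecast date, and the function and argument names are hypothetical rather than taken from the ECCC system.

```python
def persisted_sst_forecast(recent_sst, recent_clim, target_clim):
    """Prescribed SST at one grid point from a persisted 30-day mean anomaly.

    recent_sst, recent_clim: analyzed SST and climatology for the previous 30 days.
    target_clim: climatological SST for each forecast day.
    Assumption: the mean anomaly is held fixed and added to the evolving climatology.
    """
    if len(recent_sst) != 30 or len(recent_clim) != 30:
        raise ValueError("expected exactly 30 days of recent SST and climatology")
    mean_anomaly = sum(s - c for s, c in zip(recent_sst, recent_clim)) / 30
    return [clim + mean_anomaly for clim in target_clim]
```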
Note SB1: The NAVY-ESPC model is referred to as NRL-NESM in the SubX database and the change of name to NAVY-ESPC in the database is currently in progress. NRL-NESM and NAVY-ESPC refer to the same model.
Note SB2: ECCC-GEM runs its reforecasts on the fly as part of its operational practice and will fill in 2015 at a later date.
The North Atlantic Oscillation.
The NAO, indicated by an oscillation in surface pressure and geopotential height between the Iceland low and the Azores high, is a key source of extratropical subseasonal variability (Hurrell et al. 2010). The NAO has been linked to periods of extreme winter weather on subseasonal time scales in eastern North America and Europe (e.g., Hurrell et al. 2010). Until recently, there was little evidence that the NAO could be skillfully predicted beyond weather time scales (e.g., Johansson 2007; Kim et al. 2012); however, recent studies have found that the Met Office seasonal prediction system can produce skillful monthly predictions of the NAO up to 1 year into the future using large ensembles (>20 members) and long reforecasts (∼40 years; Scaife et al. 2014; Dunstone et al. 2016).
Given both this newly discovered predictability of the NAO and its potential impacts on extreme weather at S2S time scales, we evaluate the skill of NAO prediction by the SubX models using a daily index representing the NAO (see “Verification datasets” sidebar for details of the index calculation). All individual models, as well as the MME, exhibit ACC > 0.5 when forecasting this NAO index through week 2 (average of days 8–14), using initialization dates from the Northern Hemisphere winter (Fig. 6). Although ACC drops for forecasts of week 3 and week 4, one individual model has ACC = 0.5 and all models have significant skill at week 3. Only for forecasts of week 4 does the ACC of the MME clearly exceed that of any individual model.
The SubX participating modeling centers have produced new forecasts each week since July 2017. These are provided to the NOAA Climate Prediction Center (CPC) as dynamical guidance for their official week 3–4 temperature outlook and experimental week 3–4 precipitation outlook, issued every Friday. The CPC outlooks show regions of increased probability of above-normal or below-normal temperature and precipitation, and regions where above- and below-normal conditions are equally likely (i.e., 50/50 chance). Using guidance from the real-time SubX forecasts for 2-m temperature, precipitation, and 500-hPa geopotential heights as well as other tools, CPC forecasters produce the official maps for week 3–4 outlooks. For example, the maps for 6 July 2018 temperature and precipitation show above- and below-normal areas consistent with the corresponding probabilities and anomalies from the SubX multimodel ensemble mean, demonstrating the use of SubX in the CPC official outlooks (Fig. 7).
We also evaluate the skill of the SubX real-time 2-m temperature forecasts produced from July 2017 to December 2018. Overall, the real-time forecasts have similar skill to the reforecasts (Figs. 1 and 8). Over the continental United States, however, the real-time forecasts are substantially more skillful than the reforecasts. Skill is expected to vary from year to year, depending on the presence or absence of major modes of climate variation, land surface conditions, and other factors. The sources of the higher skill over the continental United States during this period remain to be identified, but could come from the trend, ENSO, or other sources.
REAL-TIME PREDICTION OF HAZARDOUS AND EXTREME EVENTS.
Disaster preparedness and emergency management is one sector for which prediction of hazardous and extreme weather on S2S time scales is of particular interest (e.g., White et al. 2017). As an example of how SubX real-time forecasts can potentially provide information useful to this sector, Fig. 9 shows precipitation forecasts associated with Hurricane Michael for the SubX real-time forecasts. These forecasts were issued on 20 September 2018, prior to the formation of Michael, and were valid for the 2-week period of 6–19 October. All SubX models indicated the potential for precipitation anomalies in excess of 50 mm over this 2-week period along a line stretching from southwest to northeast across Florida at 3 weeks lead time. Tropical Storm Michael formed on 7 October and made landfall as a hurricane along the Florida panhandle on 10 October. The storm tracked across the panhandle and through the southeastern United States, delivering heavy rainfall. Although the actual track is not accurately predicted at this lead time, the forecast for a potential tropical cyclone and associated enhanced precipitation during this period is useful information, potentially helping emergency managers to plan and aid organizations to stage supplies in anticipation of a disaster. A similar early picture was provided by SubX for Hurricane Harvey. SubX models forecasted anomalously high precipitation over the week spanning 24–31 August in Texas and Louisiana at 3–4-week lead times (not shown). Case studies such as these add to our understanding of the prediction and predictability of extreme events, especially in the context of a database designed for operational considerations.
SubX provides a comprehensive, publicly available research infrastructure in the service of developing better S2S forecasts. It consists of a database of seven global models that have produced a suite of 17 years of historical reforecasts and also have provided weekly real-time forecasts since July 2017. The inclusion of research and operational models and the availability of both real-time and retrospective forecasts in SubX provide a unique contribution to community efforts in subseasonal predictability and prediction.
With the availability of subseasonal reforecast databases such as SubX and WWRP/WCRP S2S, it is now possible for the research community to extensively explore the full range of subseasonal predictability, and to develop methodologies for S2S postprocessing including forecast calibration and multimodel ensemble weighting (e.g., Vigaud et al. 2017a,b). Additionally, the contribution of individual models to an MME can be explored comprehensively. The inclusion of research models in SubX makes it possible for this research to feed back directly into model development. The availability of real-time subseasonal forecasts in SubX also enables the development of real-time forecast demonstration prototypes for applications in various socioeconomic sectors. We hope that the community will use the SubX database to provide input into pressing questions in S2S predictability and prediction, design tools relevant to decision making on the S2S time scale, and test and compare model developments for better S2S predictions.
Some important questions regarding S2S predictions remain unanswerable with the current datasets, including SubX. For example, in a second phase of SubX, with a stricter protocol aligning model initialization dates, it would be easier to combine models into an MME and we could better untangle questions about the contributions of individual models. Another improvement for a second phase would be to produce a longer reforecast period and a larger ensemble to evaluate the number of years and ensemble members needed to robustly quantify S2S skill and identify forecasts of opportunity.
The SubX project is funded and was initiated by NOAA’s Climate Program Office’s Modeling, Analysis, Predictions, and Projections program (MAPP) in partnership with the NASA Modeling, Analysis, and Prediction program (MAP); the Office of Naval Research; and NOAA’s NWS Office of Science and Technology Integration. Relevant NOAA award numbers are NA16OAR4310149, NA16OAR4310151, NA16OAR4310150, NA16OAR4310143, NA16OAR4310141, NA16OAR4310146, NA16OAR4310145, NA16OAR4310148. S. Sun and B. W. Green are supported by funding from NOAA Award NA17OAR4320101. N. Barton and E.J. Metzger were funded by the “Navy ESPC in the North-American Multi Model Ensemble” project sponsored by the Office of Naval Research. Computer time for Navy-ESPC was provided by the Department of Defense High Performance Computing Modernization Program. This is NRL contribution NRL/JA/7320-18-4121. Global Modeling and Assimilation Office, NASA Goddard Space Flight Center, Greenbelt, Maryland, and Universities Space Research Association, Columbia, Maryland. CPC precipitation and temperature, NCEP/NCAR Reanalysis, and NOAA Interpolated OLR data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, from their website at www.esrl.noaa.gov/psd/. The Center for Ocean–Land–Atmosphere studies (COLA) provided extensive disk space for performing the model evaluations and also hosts the SubX Website. COLA support for SubX is provided by grants from NSF (1338427) and NASA (NNX14AM19G) and a Cooperative Agreement with NOAA (NA14OAR4310160).
APPENDIX A: SUBX PROTOCOL.
The SubX protocol required that each modeling group adhere to a rigid scope of retrospective and real-time forecasts. The groups agreed to produce 17 years of reforecasts out to a minimum of 32 days for the years 1999–2015. Initialization was required at least weekly, and a minimum of three ensemble members was required, although more were encouraged. Since the land surface (e.g., soil moisture) is an important source of subseasonal predictability (Koster et al. 2010, 2011), all models were required to include a land surface model and initialize both the atmosphere and land. Coupled ocean–atmosphere models were additionally required to initialize the ocean. The SubX project has also performed more than 1 year of real-time forecasts. During this demonstration period, forecasts were required to be made available to CPC by 1800 local time every Wednesday. This requirement was relaxed to 0800 local time Thursday partway through the real-time demonstration period. All data were provided on a uniform 1° × 1° longitude–latitude grid as full fields to both CPC for their internal use and the International Research Institute for Climate and Society Data Library (IRIDL) for public dissemination (Kirtman et al. 2017).
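The reforecast requirements listed above lend themselves to a simple checklist. The sketch below encodes them as a small validation routine; the `ModelSubmission` class, its field names, and the `protocol_violations` function are hypothetical constructs introduced only to illustrate the protocol, not part of any SubX tooling.

```python
from dataclasses import dataclass


@dataclass
class ModelSubmission:
    """Hypothetical summary of one group's reforecast configuration."""
    name: str
    reforecast_years: range
    forecast_length_days: int
    max_days_between_inits: int
    ensemble_members: int
    initializes_atmosphere: bool
    initializes_land: bool
    is_coupled: bool
    initializes_ocean: bool = False


def protocol_violations(sub):
    """Return a list of human-readable departures from the reforecast protocol."""
    problems = []
    missing = [y for y in range(1999, 2016) if y not in sub.reforecast_years]
    if missing:
        problems.append(f"missing reforecast years: {missing}")
    if 32 > sub.forecast_length_days:
        problems.append("forecasts must extend at least 32 days")
    if sub.max_days_between_inits > 7:
        problems.append("initialization must be at least weekly")
    if 3 > sub.ensemble_members:
        problems.append("at least three ensemble members are required")
    if not (sub.initializes_atmosphere and sub.initializes_land):
        problems.append("atmosphere and land must both be initialized")
    if sub.is_coupled and not sub.initializes_ocean:
        problems.append("coupled models must also initialize the ocean")
    return problems
```

An empty list indicates that the described configuration meets every listed requirement.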
APPENDIX B: CLIMATOLOGY AND BIAS CORRECTION.
A forecast is typically initialized with an analysis in which observations have been assimilated, thereby constraining the initial state to represent the observed state as closely as possible. As the forecast time increases, the model state on average moves from the observed climate toward a model-intrinsic climate, which is typically biased. Therefore, it is common practice in S2S predictions to estimate and remove the mean forecast bias using a set of reforecasts (e.g., Zhu et al. 2014). Additionally, the skill of forecasts at S2S time scales is typically evaluated in terms of anomalies or differences from the mean climate, thus requiring a climatology based on reforecasts. Both of these needs are met by determining the model climatology as a function of lead time and initialization date. For seasonal predictions using monthly data, it is typical to calculate the model climatology as a multiyear average for each forecast start month and lead or target time (Tippett et al. 2018). However, calculation of the climatology is not trivial due to differences in initialization day and frequency among models. For example, some forecast models are initialized on the same Julian days every year while others are initialized on a day-of-the-week schedule, meaning that the Julian initialization dates shift from year to year. In the first case, the 17-yr reforecast period yields 17 model runs on some calendar dates and none on the rest. In the second case, only 2–3 model runs are available for each day of the year from which to determine the climatology. An additional challenge for the SubX project was that a climatology was needed to produce bias-corrected forecast anomalies in real time for CPC prior to the completion of the reforecasts at some centers. The need to compute model climatology adaptively will recur because some models will likely change during the forecast phase due to routine model improvements.
Additionally, many operational models used by the CPC only provide reforecasts “on the fly” (e.g., ECMWF and Environment and Climate Change Canada ensembles generate reforecasts for a single day of the year with each real-time forecast initialization).
To compute the climatology, the first step is to calculate ensemble means for individual days of each forecast run. For most groups, ensembles are produced by averaging initialization dates from different hours of the same initialization day; these are averaged to yield ensemble means for the 24-h period spanning each forecast day. In the case of the NAVY-ESPC, which produces ensemble means over runs started on four consecutive days because ocean data assimilation is based on a 24-h data cycle, the ensemble mean consists of a single member for each day. Next, for each day of the year (1–366), a multiyear average of the ensemble means is calculated. Depending on how model runs are scheduled, this may not produce a climatology for each day of the year for some models. Finally, a triangular window is applied to the (fairly noisy as well as sparse in some cases) climatology, meaning that weight decreases linearly with distance from the center point. A smoothing window of 31 days (±15 days) is applied in a periodic fashion such that December smoothing includes January values and vice versa. This approach means that the forecast climatology can be computed from a partial reforecast database whereby only reforecasts with nearby initializations are required. Due to drift from the initial quasi-observed state to the model’s own internal mean state, the climatology for a given calendar day is expected to be different for different lead times. Therefore, the above procedure is performed for each lead time and each model individually. Removal of this climatology from the corresponding full fields produces anomalies and effectively performs a mean bias correction (Becker et al. 2014). Climatologies have been computed for many variables following this procedure and are available from the IRIDL.
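The multiyear averaging and periodic triangular smoothing steps can be sketched for a single grid point and a single lead time as follows. The exact window weights are not given in the text; this sketch assumes weights of 1 − |offset|/16 within ±15 days and renormalizes over the days for which a raw climatology exists, so that sparse initialization schedules are handled. The function names are hypothetical.

```python
def day_of_year_means(runs):
    """Multiyear average of ensemble means for each initialization day of year.

    runs: iterable of (day_of_year, ensemble_mean_value), day_of_year in 1..366.
    Days with no runs are simply absent from the returned dict.
    """
    sums, counts = {}, {}
    for doy, value in runs:
        sums[doy] = sums.get(doy, 0.0) + value
        counts[doy] = counts.get(doy, 0) + 1
    return {doy: sums[doy] / counts[doy] for doy in sums}


def triangular_smooth(raw, half_width=15, n_days=366):
    """Periodic triangular smoothing of a possibly sparse day-of-year climatology.

    Assumption: weight = 1 - |offset| / (half_width + 1), renormalized over the
    days that are actually present in `raw`.
    """
    smoothed = {}
    for center in range(1, n_days + 1):
        weight_sum = 0.0
        total = 0.0
        for offset in range(-half_width, half_width + 1):
            doy = (center - 1 + offset) % n_days + 1  # wrap December into January
            if doy in raw:
                weight = 1.0 - abs(offset) / (half_width + 1)
                weight_sum += weight
                total += weight * raw[doy]
        if weight_sum > 0:
            smoothed[center] = total / weight_sum
    return smoothed


def anomaly(full_field_value, climatology, day_of_year):
    """Remove the smoothed climatology, which also removes the mean bias."""
    return full_field_value - climatology[day_of_year]
```

Because the window spans 31 days, a model initialized only once per week still yields a climatology value for every day of the year; the procedure would be repeated separately for each lead time and each model.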
Another common methodology is to fit harmonics to the data (Saha et al. 2014; Tippett et al. 2018). Both our smoothing methodology and the fitting of harmonics can be viewed as special cases of local linear regression [Tippett and DelSole 2013; see Hastie et al. (2009) for a review]. Mahlstein et al. (2015) previously proposed using local linear regression to compute climatologies of daily data. Local linear regression estimates a simple function of the predictors using data close to the desired climatology target in such a way as to yield a smooth function of the predictors. Figure A1 demonstrates that, with synthetic data and a known climatology, the methodology used in SubX (green line) produces a climatology very close to the one obtained with a harmonic fit (red line) when using a similar number of years (16 years) and initial condition sampling (every 7 days) as SubX.
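For comparison, a harmonic climatology can be sketched as an ordinary least-squares fit of a mean plus a few annual harmonics. The function name `harmonic_climatology`, the choice of three harmonics, and the 365.25-day period are assumptions for illustration and are not taken from the cited studies.

```python
import numpy as np


def harmonic_climatology(days, values, n_harmonics=3, period=365.25):
    """Fit a mean plus annual harmonics and evaluate on days 1..366.

    days: day-of-year of each sample (e.g., one sample every 7 days).
    values: multiyear-mean forecast values on those days.
    n_harmonics and period are illustrative choices, not SubX settings.
    """
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)

    def design(d):
        # Columns: constant, then cosine/sine pairs for each harmonic.
        cols = [np.ones_like(d)]
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * np.pi * k * d / period
            cols += [np.cos(angle), np.sin(angle)]
        return np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(design(days), values, rcond=None)
    all_days = np.arange(1, 367, dtype=float)
    return design(all_days) @ coef
```

Because the fit uses all samples at once, it fills days without initializations automatically, whereas the triangular window relies only on samples within 15 days of each target day.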
APPENDIX C: MULTIMODEL ENSEMBLE MEAN.
Since the SubX models are initialized on different days, producing an MME is challenging (e.g., Vitart et al. 2017). In SubX, we choose to align the verification dates of each model to produce an MME so that skill can be assessed for the same verification period in observations. Additionally, this choice closely reproduces the setup for weekly real-time forecasting. Following the same procedure used by CPC for producing real-time forecasts, Saturday is defined as the first day of a given week. All reforecasts for all models that are produced during the prior week (previous Saturday–Thursday) are used to produce an MME forecast for weeks 1–4 individually, where week 1 is defined as the first Saturday–Friday interval. Friday initializations are not included in an attempt to mimic real-time forecast procedures. In real time, forecasts provided after 0800 local time Thursday cannot be processed in time to be used by the forecasters because forecasters must review forecast guidance on Thursday and issue the forecast on Friday. This procedure, which also involves forming averages of daily forecasts over the appropriate week, is repeated for weeks 2–4. Weeks 3 and 4 are then averaged together to produce week 3–4 forecasts. Using this procedure, a multimodel ensemble mean, equally weighted by model, can be produced by averaging the ensemble means of each of the models for their week 3–4 forecasts. There are some potential drawbacks to this procedure. For example, some models will contribute older forecasts to the MME than others, depending on their initialization date. The extent to which decreased skill with longer lead time is balanced by increased ensemble size and model diversity in such an ensemble remains an open question for future research.
Additionally, since the period over which forecasts are obtained is Saturday–Thursday (a 6-day period, used to mimic the 6-day period of real-time forecast initializations) and some of the models initialize once every 7 days, there are times when a model will not be included in the MME, depending on how the reforecast dates fall. For example, this occurs with the ECCC-GEM model in approximately 13% of the weekly forecasts. Finally, in rare cases, it is not possible to produce a week 3–4 forecast for the ECCC-GEM model since part of week 4 is not available due to the reforecast initialization day and 32-day reforecast length.
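The date alignment described in this appendix can be sketched as follows. This is an illustrative outline rather than the code used for SubX: the function names, the input layout (one initialization date and one array of daily ensemble means per model), and the convention that the first forecast day is valid one day after initialization are all assumptions.

```python
from datetime import date, timedelta

import numpy as np


def init_window(week1_start):
    """Saturday-Thursday initialization dates preceding a week-1 Saturday."""
    if week1_start.weekday() != 5:  # Monday is 0, so Saturday is 5
        raise ValueError("week 1 must start on a Saturday")
    # 7 days back is the previous Saturday; 2 days back is Thursday.
    # The Friday immediately before week 1 is excluded.
    return [week1_start - timedelta(days=k) for k in range(7, 1, -1)]


def week34_mean(init_date, daily, week1_start):
    """Average one model's daily ensemble means over weeks 3-4.

    Assumes daily[0] is valid on init_date + 1 day (hypothetical convention).
    Returns None when the run is too short to cover all of week 4.
    """
    week3_start = week1_start + timedelta(days=14)
    first = (week3_start - init_date).days - 1
    last = first + 14  # two full Saturday-Friday weeks
    if last > len(daily):
        return None
    return float(np.mean(daily[first:last]))


def mme_week34(models, week1_start):
    """Equally weighted multimodel mean of week 3-4 forecasts.

    models: dict mapping model name -> (init_date, daily ensemble means).
    Models without an initialization in the window are skipped.
    """
    window = set(init_window(week1_start))
    means = []
    for init_date, daily in models.values():
        if init_date not in window:
            continue
        m = week34_mean(init_date, np.asarray(daily, dtype=float), week1_start)
        if m is not None:
            means.append(m)
    return float(np.mean(means)) if means else None
```

Under these assumptions, a 32-day run initialized on the Saturday of the window ends before week 4 is complete and is dropped, while the same run initialized on Thursday is retained. This mirrors the rare missing week 3–4 forecasts noted for ECCC-GEM.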