This workshop1 focused on the predictability of the weather–climate interface, specifically aiming to determine how the operational community can deliver useful products with calibrated uncertainty at intraseasonal to interannual (ISI) time scales using multimodel global Earth system ensembles. At weather time scales (less than 2 weeks), we deliver information that is readily evaluated using the current observation network, is robust, and can easily be presented in calibrated probabilistic terms. At climate time scales of decades or more, we deliver assessments based on a model's ability to reproduce broad climate characteristics, but with little ability to quantify uncertainty. In between, a question remains: How do we provide calibrated, meaningful products that meet current and emerging needs in energy, agriculture, disaster mitigation, infrastructure, and other sectors at subseasonal to interannual time scales?
What: Scientists representing a broad spectrum of disciplines met to discuss emerging user needs and the current state of the science for seasonal to interannual prediction of the Earth system with a focus on achieving predictability limits with multimodel ensembles.
When: 29 July–1 August 2013
Where: La Jolla, California
The workshop participants first addressed verification across space and time scales and asked whether skill can be demonstrated at longer time scales such as seasonal and interannual. They questioned whether seamless verification is practical: greater aggregation leads to reduced specificity. It was noted that problems and decisions differ across time scales; the same evaluation approaches may not be appropriate for 2 weeks, 2 months, and 2 years, which may require distinct but overlapping verification methods. The need to identify the most meaningful metrics for different time scales was discussed, along with the need to determine how to provide information that represents the needs of a range of users. Both inherent predictability and limits on the availability of information were believed to restrict skill at longer time scales. It was suggested that skill verification for patterns of phenomena, that is, drought regions or regions of high sea surface temperatures (SSTs) rather than mean temperatures or specific values at a location, should be pursued. Also identified was a need for greater specificity in forecasts at longer time scales, which in turn would allow more objective and specific verification (e.g., knowing not only that it will be colder/warmer than normal but also how much colder/warmer and the variability over both time and space). Participants also asked whether decision-relevant climate validation can be achieved. From this discussion came a suggestion to develop an aggregate map of user needs and decision-maker requirements, which would likely require a sector-by-sector review of the needs of different user communities, such as specific industries. Relatedly, visualizations of what can be predicted at various time scales are needed, and users need to be better educated about predictability.
A suggestion from this discussion was to develop a global visual depiction of predictive skill for different forecast values or phenomena (a spatial skill chart). As an alternative to the user/sector approach, the challenges of insufficient resolution and fidelity in long-range ensembles might be met with a two-step conditional probability: long-range ensemble predictions would first be validated against known correlation indices at resolved scales from the observational record, and those indices would then be linked to expected “sensible weather” conditions. In other words, predictability in forecasts of El Niño–Southern Oscillation (ENSO), the Pacific decadal oscillation (PDO), the North Atlantic Oscillation (NAO), the Madden–Julian oscillation (MJO), midlatitude blocking, etc., could be assessed through global zonal indices or principal component analysis, which could then be linked to sector-relevant or high-impact observed conditions in the reanalysis record.
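The two-step approach can be sketched with synthetic data (a minimal illustration, not an operational method; the index, hindcast, and station series below are randomly generated stand-ins for, e.g., a Niño-3.4 hindcast archive and a reanalysis-based station record):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): in practice these would be ensemble
# hindcasts of a climate index and a reanalysis/station record.
n_years = 40
obs_index = rng.standard_normal(n_years)                    # observed index
ens = obs_index + 0.5 * rng.standard_normal((20, n_years))  # 20-member hindcast

# Step 1: validate the large-scale index forecast (anomaly correlation
# of the ensemble mean against the observed index).
ens_mean = ens.mean(axis=0)
acc = np.corrcoef(ens_mean, obs_index)[0, 1]

# Step 2: link the index to local "sensible weather" through the observed
# record (here a synthetic station temperature tied to the index).
station_t = 0.8 * obs_index + 0.6 * rng.standard_normal(n_years)
link_r = np.corrcoef(obs_index, station_t)[0, 1]

# Conditional statement: given an index in its upper tercile, the
# historical frequency of a warm local anomaly.
warm_idx = obs_index > np.quantile(obs_index, 2 / 3)
p_warm_given_warm_index = (station_t[warm_idx] > 0).mean()

print(acc, link_r, p_warm_given_warm_index)
```

The point of the split is that step 1 evaluates only what the ensemble can resolve (the large-scale index), while step 2 leans entirely on the observational record, so neither step asks the model for local detail it cannot provide.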
The workshop then addressed the complexity of coupled Earth system models. Participants felt that many complex phenomena, such as the global teleconnections associated with the MJO, are poorly handled by current numerical models. It was noted that these connections involve interactions through the stratosphere, and increased vertical resolution in the models would likely help improve skill; emphasis to date has been on increasing horizontal rather than vertical resolution, and vertical resolution should now be addressed. A question was posed as to how bias and drift can be traced back to specific model developments (so that development can target the bias) in complex coupled systems. For example, in verifying multimodel coupled systems, how do you trace errors, and what metric do you use? If the SST is too high, how do you discover that the cirrus clouds in the model are wrong? Do you look at the ocean mixed layer depth when trying to improve tropical cyclone genesis locations? A comment was made that we are becoming good at developing statistical methods for measuring skill but not at developing methods for measuring processes. Additionally, we need to improve our ability to capture relationships between different variables when tracing model error. Further discussion of tracing model error considered how error and bias propagate through the ocean–air interface in coupled systems and which physical processes should be examined to investigate errors or biases in forecasting a particular process. The answers would likely differ between models and would be particularly complex for a multimodel system; depending on scale, local influences may also need to be considered. A question was posed as to whether models within a multimodel ensemble should be ranked by relative skill for a particular parameter or process (e.g., permafrost or precipitation amount).
The group thought it would be useful to develop spatial skill maps for particular models at different lead times.
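A spatial skill map of this kind can be sketched as a pointwise anomaly correlation over a hindcast archive. The grid dimensions, years, and lead-dependent noise below are synthetic assumptions standing in for a real model's reforecasts:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic hindcast archive (assumption): observed and forecast anomalies
# on a coarse lat-lon grid for several start years and three lead times.
n_years, n_leads, n_lat, n_lon = 30, 3, 18, 36
obs = rng.standard_normal((n_years, n_lat, n_lon))
# Skill degrades with lead: forecast = signal + lead-dependent noise.
noise = np.array([0.5, 1.0, 2.0])
fcst = obs[None] + noise[:, None, None, None] * rng.standard_normal(
    (n_leads, n_years, n_lat, n_lon))

def skill_map(f, o):
    """Pointwise anomaly correlation over the hindcast years."""
    fa = f - f.mean(axis=0)
    oa = o - o.mean(axis=0)
    num = (fa * oa).sum(axis=0)
    den = np.sqrt((fa**2).sum(axis=0) * (oa**2).sum(axis=0))
    return num / den

maps = [skill_map(fcst[lead], obs) for lead in range(n_leads)]
for lead, m in enumerate(maps):
    print(f"lead {lead}: median ACC = {np.median(m):.2f}")
```

Each entry of `maps` is a lat-lon field of correlations that could be plotted directly as the proposed chart, one panel per model and lead time.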
One highlight of the workshop was a presentation by a meteorologist from a company providing support to energy traders. He stated a need to put actual forecasts in front of people and let them interact with the data: track which data they use and how they use it. This approach builds both trust and utility. He also presented a trader case study covering the daily challenges of a natural gas trader (how traders leverage weather information, along with other market information, to evaluate trades). He summarized that weather remains a core component of any fundamental trading strategy. From his perspective, sophisticated analytics and cutting-edge research i) are valued more than in the past but in many cases cannot be leveraged by the end user; ii) should provide probabilistic forecasts (full distribution) with skill metrics targeted for the user; iii) should provide model diagnostics that the operational forecaster can leverage in real time; and iv) should allow user-defined forecasts.
The workshop then asked the following: How do we adjust/calibrate 30-day to annual predictions to enhance predictive skill and customer value? Can this question be answered by scientists in isolation, or do we need to define customers to proceed? On calibration, it was suggested that data provided to the general public should not be calibrated; instead, operational centers could interact with users who wish to calibrate, to understand their needs and issues. A suggestion was to provide supporting databases that allow intelligent calibration efforts, in agreement with a recent National Research Council (2012) report recommendation that the National Oceanic and Atmospheric Administration (NOAA) make all data available for applications. The bottom line of this discussion was that customer interaction is absolutely necessary, while recognizing that there are restrictions on federal operational centers tailoring data or information for specific customers.
Final recommendations from the workshop were as follows:
Ensemble skill metrics have been shown to be dependent on the selected target parameter. Identify user-driven metrics that will stress the coupled model parameter space and the ISI temporal space. (“Users” are defined as atmosphere/ocean/ice forecasters that support decision makers.)
Develop a user “scorecard” of five or six targets.
Develop a table of user requirements by area, parameter, threshold, and time.
Sponsor a workshop for ISI users.
Put forecasts in front of users and let them interact.
The scientific hypothesis in support of the multimodel approach is that the diversity of models on average resolves the probability distribution better than an individual model. One test of this hypothesis is to diagnostically evaluate the complementary information provided in multimodel ensembles. For example, given the National Multi-Model Ensemble (NMME) retrospective dataset, it is possible to rigorously evaluate the complementary skill provided by each of the forecast systems. Encourage/support/enable the analysis of multimodel experiments [e.g., phase 5 of the Coupled Model Intercomparison Project (CMIP5), The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE), NMME] to determine the complementary information/skill.
Investigate improved means of providing data in easily accessible format/locations that make it easy for customers to pull data and format them in their own tailored/desired fashion (consider providing the supportive databases to allow for intelligent calibration efforts).
Have data in a digital interactive database so users can set thresholds, etc. Do not calibrate or postprocess, but assist users if necessary.
Develop metrics appropriate for each time/space scale. Metrics need to show geographic or temporal variability; single-number summaries have their uses but are inadequate for customers.
Use reforecasts to determine skill at larger spatial and temporal scales.
Develop a global map of forecast skill or predictability.
Define comprehensive diagnostics for model improvement.
Initiate studies to identify the degree to which multimodel ensembles provide improved forecasts due to independence versus due to offsetting errors.
Develop products to show forecast uncertainty, in addition to quality.
Develop coupled model data assimilation methods to trace errors in complex coupled ensemble systems.
Reconceptualize how we parameterize models for ensemble prediction at ISI time scales. Do we adopt stochastic parameterizations? Does this change the number of ensemble members required?
Define and provide dataset protocols and tools as well as repository, support, and funding mechanisms to involve the broader community in NMME and in improving ISI prediction and prediction systems.
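The complementary-information hypothesis behind several of these recommendations can be illustrated with a toy decomposition: if models share part of their error but the rest is independent, the multimodel mean beats the best single model, and the inter-model error correlation indicates how much error is shared rather than offsetting. All series below are synthetic assumptions, not NMME data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic hindcasts (assumption): three models predicting the same
# observed series, each with its own bias, an error component shared by
# all models, and a partially independent error component.
n = 200
obs = rng.standard_normal(n)
shared = 0.4 * rng.standard_normal(n)           # error common to all models
models = np.stack([
    obs + shared + 0.6 * rng.standard_normal(n) + b
    for b in (0.3, -0.2, 0.1)                   # per-model biases
])

mse = ((models - obs) ** 2).mean(axis=1)        # per-model error
mme = models.mean(axis=0)                       # multimodel ensemble mean
mse_mme = ((mme - obs) ** 2).mean()

# Averaging cancels the independent error and offsets the biases, but the
# shared error survives; the inter-model error correlation matrix shows
# how much of the error is common and hence cannot be averaged away.
err = models - obs
err_corr = np.corrcoef(err)

print(mse.min(), mse_mme)
```

In this construction the multimodel mean outperforms even the best individual model precisely because part of each model's error is independent, which is the diagnostic signature the recommended independence-versus-offsetting-error studies would look for.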
1The workshop was sponsored by the National Earth System Prediction Capability (ESPC) project, a multiagency [Department of Defense, NOAA, National Aeronautics and Space Administration (NASA), Department of Energy, and National Science Foundation] effort to advance Earth system predictability at all time scales.