Simultaneous Optimization of 20 Key Parameters of the Integrated Forecasting System of ECMWF Using OpenIFS. Part I: Effect on Deterministic Forecasts

Lauri Tuppi,a Madeleine Ekblom,a Pirkka Ollinaho,b and Heikki Järvinena

a Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, Helsinki, Finland
b Finnish Meteorological Institute, Helsinki, Finland

ORCID: Lauri Tuppi, https://orcid.org/0000-0002-4673-382X; Madeleine Ekblom, https://orcid.org/0000-0003-1133-2361; Pirkka Ollinaho, https://orcid.org/0000-0003-1547-4949; Heikki Järvinen, https://orcid.org/0000-0003-1879-6804

Open access

Abstract

Numerical weather prediction models contain parameters that are inherently uncertain and cannot be determined exactly. It is thus desirable to have reliable objective approaches for estimating the optimal values and uncertainties of these parameters. Traditionally, parameter tuning has been done manually, which can lead to the tuning process being a maze of subjective choices. In this paper we present how to optimize 20 key physical parameters in the atmospheric model Open Integrated Forecasting System (OpenIFS) that have a strong impact on forecast quality. The results show that simultaneous optimization of O(20) parameters is possible with O(100) algorithm steps using an ensemble of O(20) members, and that the optimized parameters lead to substantial enhancement of predictive skill. The enhanced predictive skill can be attributed to reduced biases in low-level winds and upper-tropospheric humidity in the optimized model. We find that the optimization process depends on the starting values of the parameters that are optimized (starting from better-suited values results in a better model). The results also show that the applicability of the tuned parameter values across different model resolutions is somewhat limited because of resolution-dependent model biases, and that the parameter covariances provided by the tuning algorithm seem to be uninformative.

Significance Statement

The purpose of this work is to show how to use algorithmic methods to optimize a weather model in a computationally efficient manner. Traditional manual model tuning is an extremely laborious and time-consuming process, so algorithmic methods have strong potential for saving the model developers’ time and accelerating development. This paper shows that algorithmic optimization is possible and that weather forecasts can be improved. However, potential issues related to the use of the optimized parameter values across different model resolutions are discussed as well as other shortcomings related to the tuning process.

© 2023 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Lauri Tuppi, lauri.tuppi@helsinki.fi

1. Introduction

Numerical weather prediction (NWP) models solve differential equations in discrete form on a grid or equivalent representation, meaning that small-scale processes are not resolved explicitly. Processes acting on scales smaller than the grid size are parameterized with approximations containing adjustable constants called closure, or tuning, parameters. A major part of the forecast uncertainty related to the model itself can be attributed to the uncertainty of these parameters (Leutbecher and Palmer 2008; Ruiz et al. 2013). Values of these parameters are difficult to determine analytically and are usually adjusted in a trial-and-error process involving expert knowledge to maximize the forecast skill of the model (see, e.g., Mauritsen et al. 2012). This is by no means an easy task, as several subgrid-scale processes interact simultaneously with the resolved-scale dynamics and a “balance” needs to be found among all of these processes. From the viewpoint of a single physical process, this balance may not be the technically correct one but instead the one leading to the best overall predictive skill (Franzke et al. 2015; Schmidt et al. 2017). It is also possible that changes in some part of the model, or even just a change of a parameter value, can cause unexpected consequences elsewhere in the model due to the presence of underlying compensating biases and nonlinear interactions (Hourdin et al. 2017).

Simultaneous optimization of many parameters, say 20, is challenging because the space spanned by model parameters is highly nonlinear. It is generally known that part-optimization is not a viable option, since interactions between parameters can affect the model state more than varying the parameters individually (e.g., Qian et al. 2015; Wang et al. 2014). Simultaneous optimization of many parameters in geophysical applications has been attempted by several authors. For instance, Duan et al. (2017) optimized nine parameters in the WRF Model for summer-season precipitation events in the Beijing, China, area using adaptive surrogate-modeling-based optimization. They were able to gain substantial improvements in precipitation and temperature. Yang et al. (2013) optimized nine parameters of the convection scheme of the CAM5 model using a simulated-annealing-based method. They were able to improve wind and precipitation climatology. More examples can be found in other Earth system component models: e.g., Sumata et al. (2019) optimized 15 parameters of a sea ice model using a genetic algorithm, Fennel et al. (2001) used data assimilation for optimization of 14 parameters in a marine ecosystem model, Gong et al. (2015) optimized 40 parameters of the Common Land Model using adaptive surrogate modeling, and Wang et al. (2014) optimized 14 parameters of a hydrological model with surrogate modeling. Houtekamer et al. (2021) provide a different perspective, as they optimized model parameters with the focus on improving ensemble forecasting capability instead of deterministic skill. Williamson et al. (2017) and Hourdin et al. (2021) focused on ruling out bad parameter values instead of searching for optimal values.

In this study, we will concentrate on a key set of 20 parameters of the atmospheric prediction model of the ECMWF Integrated Forecasting System (IFS). This parameter set is included in the so-called stochastically perturbed parameters (SPP) scheme that is one option for representing model uncertainty in the ensemble prediction system of ECMWF (Ollinaho et al. 2017). Here, we follow the implementation that is available in the IFS, version cy43r3. The underlying hypothesis in this paper is that the potential prediction skill improvement, which is achievable via parameter estimation, is mostly contained in this set, and the computational efforts of algorithmic optimization should thus be focused there.

The background for this study is provided by Tuppi et al. (2020), who studied model optimization practices with a full-fledged NWP model to find ways to speed up parameter convergence while saving computing resources. The main result boils down to the following “recipe”:

  • use ensemble initial states but not model stochastic physics,

  • use a comprehensive measure as a cost function,

  • use a relatively short forecast range (e.g., 24 h), and

  • use a relatively small ensemble (e.g., 20 members).

The rationale of the first item is to use as diverse a sample of weather states as possible in order to make the outcome of the tuning more robust. Using stochastic model physics in addition to initial state perturbations was shown not to bring any benefits. A comprehensive cost function (the target for optimization) is used so that the model state is gauged thoroughly. Tuppi et al. (2020) used the so-called moist total energy norm, which is a multivariate integral over the entire model atmosphere. The forecast range should be kept short so that nonlinearity of the state evolution does not blur the signal coming from the parameter perturbations. Using an ensemble size of ∼20 is a compromise between stability and efficiency: very small ensembles tend to result in unstable parameter convergence, whereas very large ensembles enhance the convergence little relative to the additional computational cost.

The aim here is to demonstrate the practicality of tuning an NWP model while obtaining notable gain in the forecast skill. The research question is thus: Is it possible to simultaneously estimate a large [O(20)] number of model parameters and achieve significant prediction skill improvement? This study tests the practicality of the knowledge mentioned above and also shows how the performance of the model changes with tuning.

This study will show that the recipe of Tuppi et al. (2020) works remarkably well considering that optimizing 20 parameters at once is a tough multidimensional problem. An optimal model is found in on the order of 100 steps. However, because of 1) the internal stochastic variability of the algorithm and 2) the nonlinearity of the problem, different optima are reached with successive optimizations, each of which outperforms the default model. These model versions can, in fact, be considered as proposals that experts need to evaluate further and possibly fine-tune. Closer inspection of the optimized models can also hint at possible structural shortcomings. An example is the well-known zonal bias of the low-level winds in the IFS (Sandu et al. 2020). This apparent lack of low-level convergence near the intertropical convergence zone (ITCZ) is present in our results, too, and likely cannot be fixed with optimization.

The outline of the paper is as follows. Section 2 presents the algorithmic tuning method EPPES, the numerical weather prediction model OpenIFS, the tools for ensemble forecasting, and the verification method for the optimized models. Section 3 explains the parameters used and the experimental setup. Section 4 shows the results from the different experiments. Sections 5 and 6 discuss and conclude the study.

2. Methods

This section gives an overview of the methods. The optimization algorithm EPPES will be presented briefly followed by a short description of OpenIFS, the parameters, and the verification method.

a. Ensemble prediction and parameter estimation system (EPPES)

In this paper, the ensemble prediction and parameter estimation system (EPPES; Laine et al. 2012; Järvinen et al. 2012) is used for optimizing the model parameters. EPPES is a fully nonlinear parameter estimation method. It is a hierarchical statistical method that uses Gaussian proposal distributions, importance sampling, and sequential modeling for optimizing model parameters and their uncertainties while running ensembles of short, initialized forecasts.

EPPES is summarized here very briefly; more technical details can be found in appendix A. The basic idea of EPPES is the following: The algorithm samples an ensemble of parameter vectors, one for each ensemble member, from a prior distribution based on the hyperparameters of the method. Then, an ensemble of forecasts is run with the perturbed parameters. For each parameter vector, a cost function is evaluated using the forecast and a reference (e.g., analysis or observation). From the cost function values, the algorithm calculates importance weights, which are used to update the hyperparameters for subsequent resampling. The process continues until a stopping criterion (e.g., a number of steps) is reached. A detailed application example can be found in, e.g., Ollinaho et al. (2013).
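As a concrete illustration of this loop, the following minimal sketch shows one EPPES-like iteration in Python. The function names, the simple rank-based exponential weighting, and the direct weighted-moment update of the proposal are simplifying assumptions made here for illustration; the actual hierarchical update used by EPPES is more elaborate (see appendix A and Laine et al. 2012).

```python
import numpy as np

def eppes_step(mu, sigma, run_forecast, cost_fn, rng, n_members=20):
    """One illustrative EPPES-like iteration (simplified sketch).

    mu, sigma    : current mean vector and covariance matrix of the proposal
    run_forecast : maps a parameter vector to a forecast (one ensemble member)
    cost_fn      : maps a forecast to a scalar cost (e.g., moist total energy norm)
    """
    # 1. Sample one parameter vector per ensemble member from the proposal.
    theta = rng.multivariate_normal(mu, sigma, size=n_members)
    # 2. Run the ensemble of short forecasts and evaluate the cost function.
    costs = np.array([cost_fn(run_forecast(t)) for t in theta])
    # 3. Convert costs to importance weights; ranking means only the ordering
    #    of the members matters (illustrative weighting, not the actual scheme).
    ranks = costs.argsort().argsort()            # 0 = best (smallest cost)
    weights = np.exp(-ranks.astype(float))
    weights /= weights.sum()
    # 4. Update the proposal (hyperparameters) from the weighted sample moments.
    mu_new = weights @ theta
    diff = theta - mu_new
    sigma_new = (weights[:, None] * diff).T @ diff
    return mu_new, sigma_new
```

In the experiments of this paper, such a step is repeated once per launched ensemble, for example 122 times in the 1-yr experiments.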

Based on Tuppi et al. (2020), the following setup is used: ensembles with 20 members are run with perturbed initial states and a forecast length of 36 h, and the weights are calculated from ranked cost function values. Using ranking means that the cost function values themselves are only used to sort the ensemble members in ascending order, so the importance weights depend on the ranks only. Because of the ranking, EPPES always assigns large weights to the best ensemble members (with the smallest ranks) regardless of how small the differences in the cost function values are. Tuppi et al. (2020) found this to be a practical way to enhance the convergence rate of the parameter mean values.

The moist total energy norm (see e.g., Ehrendorfer et al. 1999) is used as a cost function. It is calculated as
$$\Delta E_m = \frac{1}{2M_a}\int_\eta\int_D\left(u'^2+\upsilon'^2+\frac{c_p}{T_r}T'^2+\frac{c_q L^2}{c_p T_r}q'^2\right)dD\,d\eta+\frac{1}{2A}\int_D R\,T_r\,p_r\left(\log p_s'\right)^2 dD, \tag{1}$$
where u′, υ′, T′, q′, and p′s are the differences between the forecast and the reference state for the u and υ wind components, temperature, specific humidity, and surface pressure; cp is the specific heat capacity of air at constant pressure; L is the latent heat of vaporization of water; R is the gas constant of dry air; Ma is the total mass of the atmosphere; A is the surface area of Earth; Tr and pr are reference values for temperature and pressure, with Tr set to 280 K and pr set to 1000 hPa; and cq is a scaling constant set to unity. The integrals are over the horizontal D and vertical η domains (Ollinaho et al. 2014). This formulation of the moist total energy norm essentially tells how far two atmospheric states are from one another in terms of joules per kilogram.
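A minimal discretized sketch of Eq. (1) is given below, assuming the difference fields are available on a regular latitude–longitude grid with precomputed horizontal area weights and vertical mass weights. The variable names, the weight normalization, and the simplified surface-pressure term (the reference-pressure factor is folded into the weights) are illustrative assumptions, not the actual OpenIFS postprocessing.

```python
import numpy as np

CP = 1004.0   # specific heat of air at constant pressure (J kg-1 K-1)
LV = 2.5e6    # latent heat of vaporization of water (J kg-1)
RD = 287.0    # gas constant of dry air (J kg-1 K-1)
TR = 280.0    # reference temperature (K)
CQ = 1.0      # humidity scaling constant

def moist_total_energy_norm(du, dv, dT, dq, dlnps, area_w, mass_w):
    """Approximate moist total energy norm between two model states (J kg-1).

    du, dv, dT, dq : 3-D difference fields with shape (level, lat, lon)
    dlnps          : 2-D difference of log surface pressure, shape (lat, lon)
    area_w         : horizontal area weights (lat, lon), normalized to sum to 1
    mass_w         : vertical mass weights per level, normalized to sum to 1
    """
    col = du**2 + dv**2 + (CP / TR) * dT**2 + (CQ * LV**2 / (CP * TR)) * dq**2
    vol_term = 0.5 * np.sum(mass_w[:, None, None] * area_w[None, :, :] * col)
    sfc_term = 0.5 * np.sum(area_w * RD * TR * dlnps**2)
    return vol_term + sfc_term
```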

In our optimization experiments, we compare the forecasts with perturbed parameters against analyses (based on model version 43r3). The aim of the optimization is to find a parameter vector leading to the smallest moist total energy norm at the 36-h forecast lead time when compared with ECMWF analyses. The interpretation is that a faithful, or optimal, atmospheric model stays in the vicinity of observations if measured with an energy-relevant yardstick. This formulation of the cost function assumes that the analyses represent real-world observations well (see, e.g., Virman et al. 2021). In reality, however, the analyses can have biases as well, and the grid-form representation can treat small-scale features poorly. The set of analysis states is created from the initial state dataset provided by Ollinaho et al. (2021).

b. OpenIFS and tools for ensemble forecasting

OpenIFS is the atmospheric forecasting model of ECMWF’s Integrated Forecasting System (IFS). OpenIFS has the same hydrostatic dynamical core, physical parameterization schemes, and land surface and wave models as the full IFS. OpenIFS does not include any data assimilation routines. This study uses OpenIFS, version 43r3, which is the model version of the IFS that was operational between 11 July 2017 and 5 June 2018. OpenIFS with model resolution TL399 and 91 vertical levels is mainly used in the optimization experiments. The resolution TL399 corresponds to a horizontal grid spacing of approximately 50 km. In addition, resolutions TL639 (∼32 km) and TL159 (∼120 km) are used in the verification of the tuned models. For a detailed description of OpenIFS 43r3, see the IFS documentation (ECMWF 2017a,b).

OpenEPS (Ollinaho et al. 2021) is a lightweight workflow manager for running ensemble forecasts for research purposes. Within OpenEPS it is possible to run ensembles of different numerical models (in our case, OpenIFS) and to set up postprocessing methods and optimization algorithms, such as EPPES, as part of the workflow. Because OpenEPS manages the workflow, the user only needs to define the experimental setting, such as the model configuration, the ensemble size, and how often the ensembles are launched. When running an optimization algorithm, a definition of the cost function and instructions for how it should be calculated are required. For a more detailed description of OpenEPS and how to set experiments up, see Ollinaho et al. (2021).

The ensemble forecasts used in the optimization experiments have initial state perturbations activated and model perturbations deactivated. The initial state perturbations contain a mix of ensemble of data assimilations (EDA) and singular vector (SV) perturbations, as in the ECMWF operational suite [for more details, see, e.g., Ollinaho et al. (2021)]. The ensemble size is set to 20 and the forecast range to 36 h. The forecast range is based on initial sensitivity testing, which showed a stronger response from the radiation scheme parameters with slightly longer forecasts, while the sensitivity to the other parameters stayed strong up to 36 h and started decreasing thereafter.

Two different initial state datasets are used here. The training material for optimization (a.k.a. the dependent dataset) consists of ECMWF analyses and initial state perturbations at 0000 and 1200 UTC from December 2016 to November 2017, used for launching the ensembles. The verification dataset (a.k.a. the independent dataset) consists of the operational deterministic ECMWF analyses at 0000 and 1200 UTC from December 2017 to November 2018. These analyses are generated at ECMWF with the IFS model, versions 43r3 (December 2017–4 June 2018) and 45r1 (5 June 2018–November 2018), using the default parameter values.

c. Verification methods

To verify the performance of the parameters suggested by EPPES, deterministic OpenIFS forecasts are run both with the default and optimized parameters using the independent dataset (December 2017 to November 2018) once a week. The root-mean-square error is calculated against those analyses as
$$\mathrm{RMSE}=\sqrt{\frac{1}{D}\int_D\left(x-y\right)^2\,dD}, \tag{2}$$
where x is the forecast, y is the corresponding analysis, and D refers to integration over the horizontal model domain. RMSE is calculated for geopotential, horizontal wind components, temperature, and specific humidity at pressure levels of 100, 250, 500, 700, 850, and 1000 hPa, as well as for surface variables: 2-m temperature (t2m), mean sea level pressure (mslp), 10-m horizontal wind components (u10m, v10m), total cloud cover (tcc), low cloud cover (lcc), medium cloud cover (mcc), high cloud cover (hcc), and total column water vapor (tcwv). The 850- and 1000-hPa levels intersect the ground in various locations; therefore, points that are more than 30 hPa below the surface are ignored. We consider slight extrapolation below the surface to be acceptable, since the model and analysis use the same extrapolation method, meaning that points slightly below the surface can still contain useful information about the lower part of the model domain. The results are presented as global RMSE scorecards showing the relative difference to the default model. If the tuned parameters improve the model relative to the analysis, the RMSE is smaller than that of the default model, hence a negative change, shown as blue in the scorecard. Degradation of the model instead leads to an RMSE larger than that of the default model, shown as red. If the change is statistically significant at the 95% level with the two-sample t test, a black dot is added to the scorecard for that specific forecast length and variable.
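For illustration, a discrete, area-weighted version of Eq. (2) with the below-surface masking described above could look as follows; the cosine-latitude weighting and the variable names are assumptions for this sketch.

```python
import numpy as np

def global_rmse(forecast, analysis, lat, below_sfc_mask=None):
    """Area-weighted global RMSE between a forecast field and an analysis field.

    forecast, analysis : 2-D fields with shape (lat, lon)
    lat                : 1-D latitudes in degrees
    below_sfc_mask     : optional boolean mask, True where the pressure level is
                         more than 30 hPa below the surface (points are ignored)
    """
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(forecast)  # area weights
    if below_sfc_mask is not None:
        w = np.where(below_sfc_mask, 0.0, w)
    return np.sqrt(np.sum(w * (forecast - analysis) ** 2) / np.sum(w))

# Scorecard entry (negative = improvement over the default model):
# rel_change = 100.0 * (rmse_tuned - rmse_default) / rmse_default
```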

The tuned model versions are verified with 10-day forecasts at resolutions TL159, TL399, and TL639. The focus is on the resolution TL399, which is used in the tuning, and on the higher resolution TL639. The possibility of using a lower resolution in tuning is of interest, since successful tuning at a lower resolution would improve the prospects of using ensemble-based tuning methods for operationally used model resolutions.

3. OpenIFS parameters and experimental setup

The set of key parameters belongs to the following subgrid-scale physical parameterization schemes: turbulent diffusion and subgrid orography, convection, cloud and large-scale precipitation, and radiation. Table 1 gives an overview of the parameterization schemes and the model parameters therein. These parameters are supposed to be used universally across all model resolutions: scale-aware features have already been taken into account in the design of the parameterization schemes, and the parameter values visible to the outside should, in theory, work for all available OpenIFS resolutions (ECMWF 2017b). For the application of the parameters in our experiments, we note that 1) the CFM parameter is defined separately for ocean and land areas in SPP, but we treat it as one parameter, except for one experiment (see details below), and 2) we treat convective momentum transport caused by zonal wind (CUDU) and meridional wind (CUDV) as one parameter (CUDUDV) because it is not physically meaningful to treat the directions separately.

Table 1. Short explanations of the parameters used in the optimization. Parameter names, physical meanings, and code-level forms or values are explained. Parameters are grouped by parameterization schemes: turbulent diffusion and subgrid orography, convection, cloud and large-scale precipitation, and radiation [from Ollinaho et al. (2017) and ECMWF (2017b)].

To facilitate the application of algorithmic tuning, the parameter values are normalized multiplicatively:
$$X_n = X/X_0, \tag{3}$$
where X is the actual parameter value and X0 is the parameter default value. There are two practical reasons for using normalized parameter values instead of the actual values. First, during their convergence testing, Tuppi et al. (2020) found that EPPES does not work very well if the parameter values differ from each other by several orders of magnitude, and second, some of the parameters are actually two-dimensional climatological fields or other two- or three-dimensional quantities inside the model.
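As a small illustration of the multiplicative normalization, the snippet below shows the forward and inverse transformations; the parameter names and default values used here are placeholders, not the actual OpenIFS defaults.

```python
# Illustrative multiplicative normalization; the default values are placeholders.
defaults = {"ENTRORG": 1.75e-4, "RPRCON": 1.4e-3, "RTAU": 7200.0}

def normalize(name, value):
    return value / defaults[name]       # X_n = X / X_0

def denormalize(name, value_n):
    return value_n * defaults[name]     # X = X_n * X_0

# EPPES then operates on values close to 1.0 for every parameter, e.g.,
# denormalize("ENTRORG", 1.1) returns 1.925e-4.
```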

The aim of this paper is to tune the deterministic values of the set of 20 (or 19; see below) OpenIFS parameters. The first experiment Edef mimics an optimization experiment spanning three years, but, since we have ensemble initial states for one year only, the same year is used three times, selecting different initialization dates each time. Thus, the “first” year starts from 0000 UTC 1 December 2016, and ensembles are launched every 3 days until the end of the dataset. We then continue with the “second” year starting from 0000 UTC 2 December 2016, and again with the “third” year starting from 0000 UTC 3 December 2016. Of course, the variability of atmospheric states in Edef is not as rich as in a real 3-yr dataset. The other experiments (E1, E2, E3, and ECFM) are 1-yr experiments starting from 0000 UTC 1 December 2016, and they use initial states every 3 days (totaling 122 ensembles). The experimental setup for the algorithm is based on the results of Tuppi et al. (2020) and consists of 20-member, 36-h-long ensemble forecasts with perturbed initial states and model parameters. The initial setup of the parameter distributions in the five experiments is as follows:

  1. Default Experiment (Edef)—the initial values of the parameters are the default values of OpenIFS,

  2. Experiment 1 (E1)—as in Edef because this experiment is just the first year of Edef,

  3. Experiment 2 (E2)—half of the parameters have 5%-higher values than the default while the other half have 5%-lower values than the default,

  4. Experiment 3 (E3)—as in E2, but with opposite signs, and

  5. Experiment CFM (ECFM)—as in E1, but with the CFM parameter treated separately for ocean and land.

For convenience, parameter values are normalized to unity in this presentation. For all five experiments, the initial variance of the parameters is set to 0.1, corresponding to roughly a 32% standard deviation (see Tables A1 and A2 in appendix A for details). Our main objective is to search for the optimal mean value of these parameters, but the optimization algorithm needs an initial estimate of how certain (or uncertain) the initial parameter value is. The initial uncertainty is set relatively high to allow EPPES to explore the parameter space generously while still limiting the potentially detrimental effects (Tuppi et al. 2020).
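For illustration, the initial proposal distributions of the five experiments could be constructed as in the sketch below; the parameter ordering and the alternating-sign pattern used to split the parameters into two halves for E2 and E3 are simplifying assumptions.

```python
import numpy as np

n_par = 20                                   # 19 parameters in Edef/E1/E2/E3, 20 in ECFM
mu_default = np.ones(n_par)                  # Edef, E1, ECFM: start from the defaults (=1.0)

signs = np.where(np.arange(n_par) % 2 == 0, 1.0, -1.0)  # assumed split into two halves
mu_e2 = 1.0 + 0.05 * signs                   # E2: half of the parameters +5%, half -5%
mu_e3 = 1.0 - 0.05 * signs                   # E3: as E2 but with opposite signs

sigma0 = 0.1 * np.eye(n_par)                 # initial variance 0.1 for every parameter
# sqrt(0.1) is approximately 0.32, i.e., roughly a 32% initial standard deviation
```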

Besides the parameter mean values, EPPES also provides a full covariance matrix for the parameters. Therefore, the parameter covariances are briefly investigated with Pearson correlations, both empirically from the final parameter mean values of a number of tuning experiments and from the final covariance matrices of EPPES. For the empirical correlation search, 10 additional short tuning experiments are run besides the four experiments E1–E3 and ECFM. The short experiments are initialized as Edef, but the initial parameter mean values are sampled randomly from a uniform distribution [0.95, 1.05]. Ensembles are launched every 7 days (0000 UTC 1 December 2016, 0000 UTC 8 December 2016, …, 0000 UTC 30 November 2017), leading to 53 algorithm steps. Experiment E1 is also rerun without cost function ranking (called Enorank hereinafter) for the assessment of correlations based on the covariance matrices of EPPES. Using the cost function ranking is an aggressive method to scale up the discrimination of parameter vectors, because the differences in the cost function values become smaller as tuning progresses. Comparing the empirical correlations and the correlations obtained from the final covariance matrices of E1 and Enorank can provide an answer as to whether there are simple ways to extract information about the shape of the parameter space and how good EPPES is at identifying the parameter covariances.
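A minimal sketch of the empirical correlation estimate is given below: the final parameter mean vectors of the 14 experiments are stacked, and Pearson correlations are computed across experiments for every parameter pair. The array shapes and the significance threshold are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def empirical_correlations(final_means):
    """Pearson correlations between parameters across tuning experiments.

    final_means : array of shape (n_experiments, n_parameters) holding the
                  final parameter mean values (here 14 experiments in total)
    """
    n_exp, n_par = final_means.shape
    corr = np.ones((n_par, n_par))
    pval = np.zeros((n_par, n_par))
    for i in range(n_par):
        for j in range(n_par):
            corr[i, j], pval[i, j] = stats.pearsonr(final_means[:, i],
                                                    final_means[:, j])
    return corr, pval < 0.05   # correlations and 95%-level significance mask

# For comparison, the correlations implied by an EPPES covariance matrix cov:
# d = np.sqrt(np.diag(cov)); corr_eppes = cov / np.outer(d, d)
```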

4. Results

This section is divided into five parts: 1) evolution of parameter values during optimization, 2) forecast skill using global scorecards, 3) investigation into changes in the model behavior, 4) application to different model resolutions, and 5) investigation of parameter correlations.

a. Evolution of optimization

Figure 1 shows the convergence of 19 parameters during the “3 yr” baseline optimization experiment consisting of 365 steps (Edef). In general, almost all parameters show at least some degree of convergence during the experiment. Figure 1 shows that, first, there are parameters that are well identifiable and likely correct. They are sensitive to the cost function, their uncertainty becomes small, and the convergence takes place very close to the default parameter value. This means that the need for optimization of these parameters is very small. Examples of this type of parameter include ENTRORG, ENTSHALP, and CUDUDV. Second, there are parameters that are well identifiable but converge away from the default value. These parameters are also sensitive to the cost function and are likely to benefit from optimization. Examples of this type include CFM, RCLCRITSNOW, and DELTA_AERO. Third, there are parameters that are poorly identifiable but likely correct. In this case the parameter uncertainty remains substantial, but the mean value stays close to the default value. These parameters are probably already close to their optimal values, but their uncertainty remains large due to their relative insensitivity to the selected cost function. Examples of this type include ZDECORR and ZHS_VDAERO. The fourth class contains poorly identifiable and likely incorrect parameters. The values of these parameters are poorly constrained in the context of the moist total energy cost function [see Eq. (1)]; however, there is still some indication that the optimal value may differ from the default value. Examples of this type include DETRPEN and ZSIGQCW. It is justified to question whether the insensitive parameters should be adjusted at all or whether the cost function should instead be adjusted to better accommodate them.

Fig. 1. Parameter convergence during the 365-step-long optimization experiment Edef. Year 2017 is used three times but with different initialization dates being used on each round. The x axis shows the running number of steps of EPPES, and the y axis shows the relative parameter values. Parameter values assigned to individual ensemble members are shown with black dots, the parameter mean value is shown with a red line, the parameter mean value ±2 standard deviations is shown with blue lines, and the original parameter value is shown with a green line. The vertical magenta lines point out the iteration number 122 that is the last iteration of the first year. Only every third parameter ensemble is displayed.

Figure 1 is also indicative of how many steps are needed for parameter convergence. If the intention is only to search for optimal deterministic parameter values and ignore the uncertainty, there is quite little fluctuation in the mean values after about 50 steps. This seems a good stopping criterion, for example, when searching for potential correlations among the parameter values. Additional steps nevertheless add confidence to the results as the uncertainties decrease. The decrease of uncertainty for most parameters slows down considerably after about 100 steps. Thus, continuing the optimization further brings little additional value considering the amount of computational resources needed. This justifies using only one year (122 steps) in the main optimization experiments E1–ECFM. However, an interesting detail is that the convergence of RCLDIFF and RCLCRITSNOW suddenly accelerates again after about 200 steps.

The parameter convergence figures are complemented here with figures showing the evolution of the cost function during the estimation process. Figure 2 shows the evolution of the moist total energy norm ΔEm during the experiments. In all cases the ΔEm values evolve toward smaller values with a growing number of steps, implying that the algorithm converges toward parameter vectors that minimize the chosen target function. All the experiments are able to find parameter combinations that result in ΔEm being slightly lower than in the corresponding ensemble with the default parameter values (Fig. 2e). The differences in the cost function are less than 1% at the final steps. It is surprising how small the differences between the experiments are, given their substantially different parameter values (Fig. 3). This indicates that there are multiple local optima that minimize the cost function almost equally well.

Fig. 2. Evolution of the moist total energy norm (ΔEm) in the four tuning experiments (a) E1, (b) E2, (c) E3, and (d) ECFM. The x axis shows the iteration count during the experiments, and the y axis shows the value of ΔEm. In (a)–(d), green dots show ΔEm for individual ensemble members and red crosses show the ensemble mean. (e) A zoom-in to the last 10 steps in (a)–(d), along with control ensembles created with the default model (cyan). In (e), ensemble members are shown with dots, and ensemble mean values are shown as crosses.

Fig. 3. Initial and final parameter values and remaining standard deviation of the parameter values of the default experiment Edef and the four main experiments E1, E2, E3, and ECFM. The parameter names are shown between the initial and final values, and short references to the names of the parameterization schemes are shown on the left y axis. The parameter values are shown with respect to the default values used in OpenIFS, thus being 1.0 for all parameters. Experiments Edef, E1, and ECFM begin from the default values, whereas in case of E2 and E3 the initial values are either 1.05 or 0.95. In experiment ECFM, the CFM parameter is treated as two parameters separately for ocean and land areas. Initial variance of the parameters is 0.1, corresponding to a standard deviation of about 0.32.

Figure 2 provides additional evidence that one year with 122 steps is sufficient for convergence. One must bear in mind, however, that the training material lacks interannual variations and the uncertainties are underestimated. It is thus fair to talk about the optima only within the limitations of the training material. The ΔEm value plateaus after about 80 steps, and only the day-to-day fluctuations in the predictability of the weather remain.

Figure 2 also shows that the initial parameter uncertainty was likely too large and included grossly nonphysical proposals; there are even ΔEm values outside the plot. However, Fig. 2 shows that EPPES is able to deal with bad parameter proposals, and it manages to rule out all bad options in fewer than 30 steps.

The outcome of the tuning experiments Edef, E1, E2, E3, and ECFM is summarized in Fig. 3. The final parameter values in Fig. 3 show that the outcomes of the tuning experiments are surprisingly variable, and in some cases the parameter values experience large changes: up to 30%. However, for several parameters the experiments agree on the direction of change.

As already discussed in relation to Fig. 1, the parameter identifiability and the need for adjustment indicated by EPPES seem to differ between parameterization schemes. There are several parameters, mainly in the turbulence and convection schemes, that give a strong indication of the direction in which they should be adjusted. This can be seen from both the final parameter values and the final standard deviations. The main tuning experiments E1, E2, E3, and ECFM agree relatively well on the direction of change for these parameter values, and the final standard deviations are also lower than for the parameters in the large-scale precipitation and radiation schemes (with a few exceptions). In the default model, CUDUDV and ENTRORG very likely have optimal values already. The EPPES algorithm proposes that the values of CFM, DETRPEN, RPRCON, and ZSIGQCW should be increased and the values of VDEXC_LEN, RTAU, RAMID, and DELTA_AERO should be decreased. However, the uncertainties of DETRPEN and ZSIGQCW remain large in the experiments.

b. Impacts on forecast skill

Figure 4 shows global RMSE verification scorecards for the optimized model versions using the independent dataset. The scorecards show RMSE values as a relative change with respect to the default model. A negative value implies a gain in forecast skill relative to the default model, whereas a positive value implies a loss in forecast skill. The verification reveals two types of optimal models. The first type, represented by E1, E2, and E3, is characterized by a strong gain in forecast performance, especially in the wind components and specific humidity. However, the gains come at the cost of degrading geopotential. The other type, represented by ECFM, is characterized by slight gains in skill in most aspects, except for specific humidity at 850 hPa near the forecast range of 36 h, and by a weaker improvement of the wind components throughout the forecast range. The maximum gain at 36 h coincides with the forecast range of 36 h used in the tuning, implying that in ECFM, EPPES did exactly what it was configured to do, i.e., find a parameter vector that minimizes ΔEm at the 36-h forecast range. This also means that the optimal parameter values depend on the forecast range used in the optimization.

Fig. 4. Verification of the four main tuning experiments using global RMSE scorecards with respect to the default model. The x axis of each panel shows the forecast range (h), and the y axis shows the atmospheric variables used: Z is geopotential; U and V are the zonal and meridional components of wind, respectively; T is temperature; and Q is specific humidity. The Z, U, V, T, and Q are verified at six pressure levels from the top down: 100, 250, 500, 700, 850, and 1000 hPa. The two-dimensional variables at the bottom of the panels are 2-m temperature (t2m), mean sea level pressure (mslp), 10-m zonal and meridional wind (u10m; v10m), total cloud cover (tcc), low cloud cover (lcc), medium cloud cover (mcc), high cloud cover (hcc), and total column of water vapor (tcwv). Blue color means that the optimized model outperforms the default model. Black dots indicate that the difference between the optimized and default models is statistically significant at the 95% level according to a two-sample t test. Geopotential data at 100 and 250 hPa are missing because of missing operational analysis.

Besides the tuning experiments presented in this paper, we have run other tuning experiments without land–sea separation in the turbulent momentum transport coefficient CFM. The results of those experiments (not shown) suggest that ECFM is different from E1–E3 because OpenIFS responds relatively nonlinearly to the value of CFM. The numerical value of CFM is between 1.144 and 1.186 in E1–E3, but an additional experiment obtained a value of 1.094, and the scorecard of that experiment is very similar to the scorecard of ECFM, which has CFM values of 1.099 for the sea and 0.907 for the land. CFM seems to have a tipping point between 1.094 and 1.144 that changes the nature of OpenIFS markedly. However, possible parameter interactions are not considered here.

Experiments E1, E2, and E3 are considered to be more useful in the sense that the footprint of the forecast range used in tuning appears weaker and the forecast skill improvements cover the entire forecast range used in the verification. E1 appears to be the best of these four optimized model versions. It has few deteriorating fields, and the skill degradation is not statistically significant except for the 10-m u wind at 12 h. The average overall RMSE reduction is −0.65%. The respective changes for E2, E3, and ECFM are −0.36%, −0.32%, and −0.44%. We also verified the outcome of the default experiment Edef (i.e., E1 extended for another 243 steps) of Fig. 1. The result is very similar to that of E1, with an average RMSE reduction of −0.66%. Thus, the parameters do not improve much during the last 243 steps. This supports the conclusion that O(100) steps is sufficient for parameter convergence, unless new training material with more atmospheric variability is introduced.

To gain further insight into which parameters contribute the most to the changes in forecast skill, an array of single-parameter sensitivity tests using the parameter values of E1 was devised. These sensitivity tests were carried out in the same way as the verification of the model versions, setting one parameter at a time to its optimized value and keeping the other parameters at their defaults. It turned out that the majority of the single-parameter impact is due to four parameters: CFM (turbulent momentum transport to the surface), DETRPEN (detrainment of air from deep convective clouds), RPRCON (conversion rate of cloud water/ice to rain/snow in convection), and RAMID (relative humidity threshold for stratiform cloud formation). These four examples are shown in appendix B. The parameters ENTRORG (entrainment of environmental air into convective clouds) and CUDUDV (vertical momentum transport caused by convection) are also sensitive, but the changes in their values are so small that their contribution remains low. The array of sensitivity tests also reveals that most of the changes related to the wind components and geopotential are attributed to the CFM parameter. Most of the contribution to the improvement of specific humidity, especially in the upper troposphere, comes from DETRPEN, RPRCON, and RAMID. At the same time, these three parameters act to balance the degradation of geopotential caused by CFM. The improvement of upper-tropospheric temperature can be mainly attributed to DETRPEN and RPRCON. Also, ZSIGQCW (standard deviation of the horizontal distribution of cloud water) has a weak positive effect on most of the verified variables, especially at short and long (7–10 day) forecast ranges. However, this kind of analysis obscures any potential compound effects caused by changing the values of multiple parameters simultaneously, which may be why DETRPEN appears to be an important parameter in the sensitivity tests but insensitive during the actual optimization in Edef and E1.
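A minimal sketch of how such one-at-a-time sensitivity runs can be organized is given below; the normalized parameter values and the run_forecast_set function are placeholders, not the actual E1 values or verification machinery.

```python
# Illustrative one-at-a-time sensitivity setup; values are placeholders (normalized units).
default_values = {"CFM": 1.0, "DETRPEN": 1.0, "RPRCON": 1.0, "RAMID": 1.0}
optimized_values = {"CFM": 1.14, "DETRPEN": 1.2, "RPRCON": 1.1, "RAMID": 0.95}

def single_parameter_experiments(run_forecast_set):
    """Run one verification forecast set per parameter, changing only that parameter."""
    scores = {}
    for name, opt_value in optimized_values.items():
        params = dict(default_values)             # start from the default model
        params[name] = opt_value                  # set one parameter to its optimized value
        scores[name] = run_forecast_set(params)   # e.g., a global RMSE scorecard
    return scores
```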

c. A closer inspection of the model behavior

This section takes a closer look at the most pronounced changes taking place in the optimized model E1, based on the independent verification dataset. In E1, the improvements are strong in specific humidity at 250 hPa and in wind at 1000 hPa. Figure 5 shows the specific humidity bias of the default model at 250 hPa averaged over the entire forecast range (Fig. 5a) and how the specific humidity changes in E1 with respect to the default model (Fig. 5b). The default model is moister than the operational analysis, hence the generally positive bias. However, there are also regions with a negative bias, indicating that besides the global positive bias, displacement errors are present as well. In E1, the 250-hPa level becomes drier almost everywhere between 50°N and 50°S. Closer to the polar regions there are no significant changes. Drying of the 250-hPa level does not seem to affect the high cloud cover (hcc) (see Fig. 4), as the RMSE for hcc stays neutral. Figure 5b also shows that algorithmic tuning can reduce the moist bias, but it cannot fix the displacement errors. This means that besides the somewhat suboptimal tuning, the default model may have some structural shortcomings, too.

Fig. 5. (a) Relative bias of specific humidity at 250 hPa in the default model with respect to operational analyses, and (b) how the E1 model version differs from the default model. In (a), data from 53 ten-day forecasts and their respective operational analyses are used. The analyses are subtracted from the forecasts, and then the difference fields are averaged over the 53 forecasts. Last, the differences are averaged over the forecast range. The plot in (b) is produced the same way except that the default forecasts are subtracted from the forecasts produced with the E1 model version. In (a), the red color means that the default model is too moist and blue means that the default model is too dry. In (b), the red color means that the E1 model version is moister than the default model and blue means that the E1 model version is drier than the default model. To facilitate the interpretation, the panels have been smoothed with a 5 × 5 gridpoint moving average, and areas where the default model has bias larger than ±8% are highlighted with contours.

Previously, it was noted that DETRPEN, RPRCON, and RAMID have the strongest impact on the specific humidity RMSE through drying of the troposphere. DETRPEN controls how much air is detrained from deep convective clouds. Increasing DETRPEN leads to more air detraining from the clouds to the surroundings. This means that especially the weaker convective updrafts detrain already at lower levels and less humidity is transported to the upper troposphere. RPRCON controls how fast cloud water and ice are converted to rain and snow. Faster conversion means there is less water to spread and evaporate into the surroundings through detrainment. RAMID controls the relative humidity threshold above which clouds begin to form. RAMID is lowered slightly in E1, implying that clouds can form in drier air, which counteracts the decrease of cloud cover due to drying.

Decreasing specific humidity has a direct effect on temperature as well. The default model has a slight warm bias in the mid- to upper troposphere, and E1 leads to cooling due to less water vapor absorbing outgoing longwave radiation (not shown). Initially, this cooling decreases the bias, which improves the verification scores, as can be seen in Fig. 4a. However, the cooling continues, an equally large cold bias develops, and the scores return to neutral. The cooling is especially strong at 250 hPa, where the drying is largest, and the cold bias becomes larger in magnitude than the original warm bias.

Figures 6 and 7 show the biases of the zonal and meridional wind components at the 1000-hPa pressure level in Figs. 6a and 7a and how E1 changes the default model in Figs. 6b and 7b. Figure 6a shows that the default model tends to have a too-strong westerly component in the storm tracks of both hemispheres and a too-strong easterly component in many areas near the poles and in the tropics. Figure 6b shows that E1 generally tends to slow down the low-level zonal winds. In most areas, the biases in the default model and the improvement in E1 coincide very well, and, for example, in the storm tracks algorithmic tuning is able to reduce the westerly bias by up to 20%. However, in the tropics there are areas where the easterly wind component is already too weak in the default model and becomes even weaker in E1. The most notable examples include the eastern parts of the tropical Pacific and tropical Atlantic Ocean regions.

Fig. 6. (a) Bias of zonal wind at 1000 hPa in the default model with respect to operational analyses and (b) difference of E1 model version to the default model. The arrows show the average wind speed and direction in the default model in the verification forecast dataset. Note that the color scale of (b) is one-fifth that of (a). In (a), the red color means too-strong westerly or too-weak easterly wind and blue means too-weak westerly or too-strong easterly wind in the default model depending on the prevailing wind direction. Biases stronger than ±0.5 m s−1 are highlighted with contours. For example, midlatitude westerlies are too strong and tropical easterlies are too strong. In (b), the red color means that westerlies become weaker or easterlies become stronger in E1 with respect to the default model. Blue means that westerlies become stronger or easterlies become weaker in E1 with respect to the default model. For example, midlatitude westerlies become weaker in E1 as well as tropical easterlies except for the north Indian Ocean. Areas that are, on average, more than 30 hPa below the surface are cut out.

Fig. 7. As in Fig. 6, but for meridional wind at 1000 hPa. In (a), the red color now means too-strong southerly or too-weak northerly wind component and blue means too-weak southerly or too-strong northerly wind component in the default model. For example, the default model has too-strong southerly wind component in the Norwegian Sea, south of Alaska, and west of Chile, too-weak northerly component north of the intertropical convergence zone (ITCZ), and too-weak southerly component south of the ITCZ. In (b), the red color means that E1 has stronger southerly component or weaker northerly component than the default model. Blue means that E1 has weaker southerly component or stronger northerly component than the default model.

The low-level meridional wind component also tends to be too strong in the extratropics. However, the already known issue of too-zonal wind (Sandu et al. 2020) manifests itself as a too-weak meridional component toward the ITCZ in the trade wind zone (Fig. 7a). The optimized parameter values lead to a global slowdown of the meridional wind as well. In the extratropics, this is strongly beneficial, as a comparison of the panels in Fig. 7 shows that the changes oppose the biases in most areas. However, the convergence toward the ITCZ decreases as well, which is a strongly detrimental effect. The fact that tuning increases this particular wind error (the lack of convergence toward the ITCZ) even though the wind improves overall suggests that the lack of low-level convergence is another structural shortcoming of OpenIFS.

Most of the improvement of the low-level wind can be traced back to the CFM parameter (see Fig. B1). CFM controls how efficient the turbulent momentum transfer to the surface is. Therefore, an increased CFM leads to more efficient momentum transfer and, hence, a slowdown of the low-level wind.

d. Performance across resolutions

The performance of the optimized parameters with other OpenIFS resolutions is tested with a focus on the higher resolution of TL639 (∼32 km). The parameter values obtained from the four main experiments (E1, E2, E3, and ECFM) are verified using a set of 53 independent 10-day-long forecasts initialized every 7 days between December 2017 and November 2018. The control model and the tuned models are compared with operational analyses using RMSE [Eq. (2)], and the results are again collected into scorecards. Figure 8 shows the RMSE scorecards for the four main experiments. In general, the scorecards show that the tuned parameters have a slight positive impact on several variables in the short range but a slight negative impact in the medium to long range, and the averages over the scorecards are neutral (E1 = −0.01%, E2 = +0.06%, E3 = +0.13%, and ECFM = −0.13%).

Fig. 8. Verification of the four main tuning experiments (a) E1, (b) E2, (c) E3, and (d) ECFM with global RMSE using resolution of TL639. The blue color denotes improvement, red indicates deterioration relative to the default model, and black dots mark statistical significance at the 95% level. The atmospheric variables and pressure levels are as in Fig. 4.

A couple of atmospheric fields are studied further to see the similarities and differences between the TL399 and TL639 resolutions. Specific humidity at 250 hPa (Q250) shows strong improvement throughout the forecast range in all four experiments also at TL639. Closer inspection of the Q250 biases in E1 shows that both the TL399 and TL639 model resolutions have a similar structure consisting of large areas of moist bias and smaller areas of dry bias near the equator (not shown). In E1 the near-tropopause level becomes drier between 50°N and 50°S, improving the scores of Q250. However, the drying leads to strong cooling of the atmosphere and the development of a widespread cold bias, leading to decreased scores for temperature in the medium and long range.

In contrast to TL399, the 1000-hPa wind components do not improve much for TL639 (Fig. 8). This is also seen in a closer inspection of E1 in Fig. 9, which shows the zonal and meridional wind biases in the TL639 default model and the changes in the wind components. In this case the TL639 default model has clearly smaller and more localized biases than the TL399 default model in Figs. 6 and 7. However, tuning affects the fields roughly in the same way as in the TL399 case, meaning that there are large areas where the tuned parameters induce new biases as the wind speed generally decreases. The slight improvement of the 1000-hPa U wind seen in the scorecards seems to originate from a few areas in the tropical and subtropical oceans and the Northern Hemisphere storm tracks. The slight improvement of the V wind originates from a few localized areas in the extratropical oceans.

Fig. 9. As in Figs. 6 and 7, but for resolution TL639, showing (a),(c) average wind speed and direction in the default model (arrows), the biases of zonal and meridional wind (color shading), and highlighted areas of biases larger than ±0.5 m s−1 (contours). Also shown are (b),(d) E1 changes relative to the default model.

The parameter values of E1 were also tested with a lower resolution of TL159, but, due to the lack of independent initial states, the dependent year 2017 was used. In this test, all variables improve at all levels, except for 2-m temperature and low cloud cover at all forecast ranges and 10-m wind in short forecasts (not shown).

e. Parameter correlations

Based on the experiments shown so far, it seems that most of the parameter convergence occurs in the early part of the optimization. Here, we use this information and run short tuning experiments [O(50) steps] to examine possible correlations between the parameters in an empirical manner. We then compare these results with those given by EPPES. Figure 10a shows Pearson correlations computed from the four main experiments and the 10 additional short tuning experiments. In general, the empirical correlations between the parameters appear to be relatively weak, but for a number of parameter pairs the correlation is statistically significant at the 95% level. Many of the correlations are difficult to explain, but there are some physically intuitive examples as well. A couple of examples are explained here with the help of the documentation of physical processes in the IFS (ECMWF 2017b). ENTRORG and DETRPEN have a statistically significant negative correlation. Both parameters are related to how efficiently the convective updraft and environmental air mix together. An increase of either parameter leads to more efficient mixing, which inhibits deep convection and also leads to the convection being shallower. Thus, they can compensate for each other naturally, and the correlation should be negative. ENTSHALP and RAMID also have a significant negative correlation. ENTSHALP controls how moisture is distributed in the lowest 200-hPa layer of the atmosphere. Increased ENTSHALP leads to a more even distribution of low-level moisture, meaning that the formation of stratus clouds near the surface decreases unless the relative humidity threshold for stratiform cloud formation (RAMID) is lowered. Thus, these two parameters can also compensate for each other naturally. Both parameter pairs do have statistically significant negative correlations in Fig. 10a. On the code level, ENTRORG and ENTSHALP should be negatively correlated, since the entrainment of shallow convection is proportional to ENTSHALP multiplied by ENTRORG. Exactly this is seen in Fig. 10a, even though the correlation is not quite statistically significant.

Fig. 10. Pearson correlations between parameter pairs: (a) correlations computed on the basis of the final parameter mean values of experiments E1, E2, E3, and ECFM along with 10 additional short tuning experiments (the CFM parameter has the land–sea separation in ECFM, so for this purpose a global average of the two components has been taken), and correlations computed (b) from the EPPES-given final parameter covariance matrix of experiment E1 and (c) for an experiment Enorank that does not use ranking of the cost function.

The empirically obtained correlations are compared with correlations computed from the final covariance matrices of the two tuning experiments E1 and Enorank, where Enorank differs from E1 only in that ranking of the cost function is inactive. The comparison (Fig. 10) shows that, first, EPPES tends to give substantially stronger correlations for the parameters and, second, the correlations change substantially from one experiment to another. The additional experiment without ranking of the cost function also shows that the overestimation of the correlations is not caused by the aggressive settings of EPPES used in the other experiments. Comparison of Figs. 10b and 10c shows that the correlations appear to be unrealistically strong regardless of the use of cost function ranking. Similar figures were also created from the 10 short tuning experiments, and in those figures the correlations seem to vary almost randomly between the experiments (not shown). It is difficult to see any physical reason for the unrealistically strong and random nature of the parameter correlations; rather, this points to an inability of EPPES to model the parameter covariances correctly in the case of a fully realistic NWP model. The degeneracy of the covariance matrix is not caused by the settings used in tuning but by something more fundamental in the design of EPPES. The only useful information in the covariance matrix is how fast the uncertainties of the parameters are shrinking.

5. Discussion

This study of algorithmic model optimization with the EPPES algorithm has shown that the optimization recipe of Tuppi et al. (2020) works well in fully realistic cases with about 20 parameters. The recipe suggested that most of the information about the parameter space can be extracted already with relatively short optimization experiments. This turned out to be true for the deterministic parameter mean values: after about 50 steps, the parameter mean values change very little even though the uncertainty keeps decreasing for hundreds of steps. On one hand, this means that algorithmic optimization is flexible, as the number of steps can be chosen based on the user's needs. On the other hand, it can be seen as an overfitting problem: the variability in the training material is limited, and the uncertainty reduction would probably not occur if material with richer interannual variability were included. Nevertheless, we chose 122 steps (forecasts initialized every third day for one year) for E1, E2, E3, and ECFM as a reasonable compromise between estimation uncertainty and efficient use of computational resources. With an ensemble size of 20 and a forecast range of 36 h, the continuous simulation time in our 1-yr experiments amounts to 10 years, with 2440 model simulations in total, which we still consider acceptable. We used a larger number of model simulations than, for example, Dunbar et al. (2021), who argue that O(100) model simulations is sufficient for model optimization and uncertainty quantification with their method; for capturing the impacts of intra- and interannual variability on parameter values, this may be on the low side.

The cost function was evaluated at one forecast range only (36 h), and the cost function formulation was fixed to Eq. (1). These choices affect parameter identifiability and convergence in tuning. The forecast range used in tuning is clearly visible in Figs. 4 and 8, indicating that the parameters are flow and lead-time dependent, which in turn suggests that OpenIFS might contain evolving systematic errors as well. The lead-time dependency hints that the cost function might need to be extended to evaluate forecast skill at multiple forecast ranges. Ollinaho et al. (2013) already tried combining different forecast ranges, with limited success; however, the cost function they used was quite primitive (mean-square error of 500-hPa geopotential). It is also possible that these evolving errors are structural and cannot, therefore, be fixed with tuning. Trying to specify different forecast ranges and cost function formulations for different parameters to improve identifiability is an endless road, and it can easily lead to other problems, such as partial optimization that ignores parameter interactions. In our view, it is best to be objective and transparent in the choices made in the algorithmic tuning process. Even the use of the total energy norm as a cost function is a subjective choice; the best option would be a directly observation-based formulation, for example, the filter likelihood score (Ekblom et al. 2023).

Another unfortunate result, discovered when testing the tuned parameters at higher model resolution, is that some model biases seem to be strongly resolution dependent, indicating that the optimal parameters may be resolution specific even though the parameter values in OpenIFS are supposed to be universal across resolutions. For example, the low-level wind speed is too strong over most of the oceans in the TL399 default model, but this bias is largely nonexistent in the TL639 default model. Resolution-dependent biases can mean that tuning at lower resolution works only for some limited aspects of the model and that the tuned parameter values might not be optimal for the operational-resolution model. The resolutions used here (50 and 32 km) are still far from the IFS operational resolution of 9 km, so there is still a lot to explore in this sense, and we plan to study this further in the future.

We would also like to emphasize that tuning at 50-km resolution implies that certain small-scale high-impact features are almost completely unconstrained in our experiments, meaning that they may respond in unexpected ways to the tuned parameter values. Good examples of poorly constrained high-impact features are tropical cyclones, which are poorly represented even at the higher resolution of 32 km (see Figs. 6 and 7 of Ollinaho et al. 2021). Other poorly constrained features include local circulation patterns near coastlines and complex orography. Tuning a model for operational use requires balancing a large number of competing requirements, and high-impact features should not be forgotten.

We suspected that the parameter covariances in EPPES may not be quite right and that this could be caused by the use of cost function ranking in the main tuning experiments. This was studied by comparing parameter correlations calculated empirically from the outcomes of several tuning experiments with those computed from the final covariance matrices provided by EPPES. It turned out that the overly strong and randomly behaving correlations in EPPES have a more fundamental origin than the cost function ranking. The degeneration of the parameter covariances may be at least a partial explanation for the lack of consistency in the tuning outcomes: too fast degeneration may prevent convergence to the global optimum and lead to convergence to a random local optimum instead. However, studying what exactly causes the covariance matrix of EPPES to degenerate in fully realistic tuning is beyond the scope of this paper and is left for future work.

Tuppi et al. (2020) concluded that initial state perturbations do not hinder the parameter convergence; i.e., the perturbations can be used to increase the sample diversity and thereby the likelihood that the unknown true atmospheric state is sufficiently accommodated. However, the consequences of using initial state perturbations when searching for optimal parameter values that improve the forecast skill have not been assessed thoroughly. Tuppi et al. (2020) speculated that using initial state perturbations during tuning could affect the ensemble forecasting system’s ability to generate spread. We have a new paper in preparation (D. Köhler et al. 2023, unpublished manuscript) in which we hope to explore this issue fully. One of the early results from that study shows that tuning with perturbed initial states significantly decreases the spread of ensemble forecasts even though the deterministic model improves.

Two aspects of the optimization experiments were surprising. First, the changes in the parameter values were very large, both between the tuned models and the default model and between the different tuned model versions. Our assumption was that OpenIFS is a well-optimized model and that the largest changes would be below 10%, whereas they turned out to be roughly 30%. This could be caused by the resolution-dependent biases discussed above, by compensation between processes in OpenIFS, or by the limited training material not representing a sufficiently rich selection of weather types. Second, we were surprised that the parameter sensitivity can vary substantially between the experiments. A good example is DETRPEN, which shows almost no sensitivity in Edef and E1 but is at least moderately sensitive in the other experiments. Both the varying sensitivity and the varying outcome may be manifestations of compensating processes, or they could be linked to the shortcomings of the tuning algorithm discussed above. A way to control these two features could be useful, since it would enable more precise tailoring of the cost function; however, as noted above, this can increase the subjectivity of the optimization.

We advocate algorithmic optimization both per se and as a tool for obtaining parameter vector proposals for closer inspection by experts. A test was carried out to see whether it is possible to manually enhance the results of algorithmic optimization with a modest amount of effort. We used the array of parameter sensitivity tests and information from all algorithmic optimization experiments and combined their best features, leading to a notable gain in skill: in this way, the average of the RMSE scorecard of E1 improved from −0.65% to −0.99% (not shown). EPPES outcomes can hence also be interpreted as guidance on the direction in which to move, which is in line with the conclusions of Houtekamer et al. (2021): an expert can further optimize a model by interpreting the results of algorithmic optimization.

6. Conclusions

We studied how to efficiently optimize numerical weather prediction model parameters using the Ensemble prediction and parameter estimation system (EPPES; Järvinen et al. 2012; Laine et al. 2012). Our research question was: Is it possible to simultaneously estimate a large [O(20)] number of model parameters and achieve a significant improvement in prediction skill? We focused on a key set of important OpenIFS model parameters and configured the EPPES algorithm to find a model parameter vector that leads to the slowest forecast error growth as measured with the moist total energy norm. The key set of parameters is the one included at ECMWF in the SPP scheme (IFS cy43r3 implementation; Ollinaho et al. 2017), totaling 20 parameters (19 if surface drag is considered the same for ocean and land). The algorithmic tuning was demonstrated with OpenIFS at TL399 resolution (50-km grid spacing). EPPES was able to find parameter values that, first, produced consistently lower cost function values than the respective control ensembles and, second, led to a significant improvement in prediction skill when verified against an independent dataset. Thus, the recipe of Tuppi et al. (2020) turned out to be a very cost-effective guide for setting up algorithmic optimization experiments. However, our method is by no means perfect yet, because it also tends to degrade some aspects of the model.

Our experiments provided additional evidence that optimization experiments do not need to be very long and expensive. Most of the information about the optimal parameter values could be extracted with experiments of O(100) steps using a moderate ensemble of O(20) members; the main optimization experiments of this paper used 122 steps. Even shorter experiments already contain useful information about the parameter values, but then the accuracy may suffer because the decrease in uncertainty has not yet consolidated the parameters at their optimal values. EPPES is extremely efficient at ruling out bad parameter values, but fine-tuning is slower.

Running a small number of optimization experiments revealed that the optimized parameter values can differ substantially from the default values and between the experiments. However, verification against analyses with global RMSE revealed that all optimization experiments led to improved model performance. This clearly illustrates that there is no single correct solution to the optimization problem but a multitude of locally optimal parameter values, each of which has a characteristic impact on model behavior. Hence, running a number of optimization experiments is useful for generating parameter vector proposals from which to choose the most suitable one. As EPPES always tends to converge to some optimum, the output of multiple tuning experiments can be used to probe the shape of the parameter space; in our examples, a number of physically reasonable correlations were found.

We noted the following after tuning OpenIFS with EPPES: the wind components U and V at 1000 hPa improved significantly because increased surface drag slowed down the low-level wind, and specific humidity q at 250 hPa improved because the optimization reduced the upper-tropospheric moisture and the associated bias in the tropics and subtropics. In relative terms, EPPES could decrease these biases by up to about 20%. However, even these fields contain features that cannot be fixed with algorithmic optimization: in OpenIFS, there is a lack of low-level wind convergence near the ITCZ that actually becomes worse in the optimized model, and the specific humidity bias at 250 hPa is not positive everywhere, yet the optimization decreases specific humidity almost everywhere at that level.

We still foresee room for further research on algorithmic model tuning with EPPES. EPPES seems to have degeneracy issues with the parameter covariances that had not surfaced before in simpler test settings. The applicability of the tuned parameter values across different model resolutions deserves more attention than the verification tests shown here. Ensemble-forecasting-based tuning of high-resolution models is expensive, so any means of saving computational resources is welcome. The initial results with low-resolution tuning are partly promising, but resolution-dependent biases could cause problems, together with poorly constrained small-scale high-impact features. Also, the scope of this paper was limited to improving the deterministic forecast skill, so considerations of how the design of tuning experiments affects ensemble forecasts were left for Part II of this article series (D. Köhler et al. 2023, unpublished manuscript).

Acknowledgments.

We thank Marcus Köhler for creating the independent set of OpenIFS initial states used in verification of the optimized model versions. Climate Data Operators (Schulzweida 2018) was used for postprocessing OpenIFS output, and Python Matplotlib (Hunter 2007) was used for producing the figures. The authors acknowledge CSC–IT Center for Science, Finland, for generous computational resources and Juha Lento for user support of CSC infrastructure. The authors also thank Marko Laine for technical help with EPPES and discussions on how to use EPPES in tuning, as well as Heikki Haario and Olle Räty for insightful general discussions about algorithmic optimization. We are thankful for funding from multiple sources: 1) Doctoral Programme in Atmospheric Sciences of University of Helsinki, Finland; 2) Academy of Finland [Grants 333034 (author Järvinen), 1333034 (authors Tuppi and Järvinen), and 316939 (author Ollinaho)]; and 3) European Union’s Horizon 2020 research and innovation program under Grants 101003826 CRiceS (author Ollinaho) and 101003470, the NextGEMS project (authors Tuppi, Ekblom, and Järvinen). We acknowledge the comments from two anonymous reviewers and Editor Ron McTaggart-Cowan, which helped to improve the paper further. The authors have no competing interests to declare.

Data availability statement.

The OpenIFS model requires a license for usage; the details can be found online (https://confluence.ecmwf.int/display/OIFS/OpenIFS+Licensing). OpenEPS is freely available under the Apache 2.0 license (https://github.com/pirkkao/OpenEPS, last access: 27 June 2022). EPPES is available under the MIT license on Zenodo (https://doi.org/10.5281/zenodo.3757580). OpenIFS ensemble initialization states are available online (https://a3s.fi/oifs-t399/YYYYMMDDHH.tgz, where YYYYMMDDHH is the initialization time of the forecast in order of year, month, day, and hour; last access: 30 November 2022; ∼7.4 GB per file). Verifying analyses were downloaded from ECMWF’s Meteorological Archival and Retrieval System (MARS).

APPENDIX A

EPPES Details and Settings

The details presented here are based on the original description of EPPES (Laine et al. 2012). EPPES is a hierarchical statistical algorithm that uses a set of hyperparameters (μ, W, Σ, n) for estimating the model parameters θ = (θ_1, …, θ_M), where M is the total number of parameters to be estimated. The hyperparameters μ and Σ define how much information we have about the unknown parameter vector θ in the form of a distribution N(μ, Σ), and the other two hyperparameters, W and n, describe our confidence in μ and Σ. The hyperparameters evolve iteratively and are connected as follows:
θ_i ∼ N(μ_i, Σ_i),
μ_i ∼ N(μ_{i−1}, W_{i−1}), and
Σ_i ∼ iWish(Σ_{i−1}, n_{i−1}), (A1)
where i is the time window (or algorithm step) and iWish is the inverse Wishart distribution. The hyperparameters are initialized on the basis of prior knowledge of the parameters and are updated during the run of the algorithm. In a perturbed ensemble of size N, each member is assigned its own parameter vector θ sampled from N(μ_i, Σ_i). After the ensemble is evaluated, the sample of parameters is weighted and resampled according to a chosen cost function or the rank of the cost function, depending on whether the ranking option is used. Denoting the resampled parameter vectors by θ_i^j, j = 1, …, N, at the ith iteration step, the update equations for the hyperparameters are given by
W_i^j = (W_{i−1}^{−1} + Σ_{i−1}^{−1})^{−1},
μ_i^j = W_i^j (W_{i−1}^{−1} μ_{i−1} + Σ_{i−1}^{−1} θ_i^j),
n_i = n_{i−1} + 1, and
Σ_i^j = [n_{i−1} Σ_{i−1} + (θ_i^j − μ_i^j)^T (θ_i^j − μ_i^j)] / n_i, (A2)
followed by calculating the mean of each hyperparameter, for example μ_i = (1/N) ∑_{j=1}^{N} μ_i^j, with j denoting the ensemble member. For a detailed background of the update equations, we refer interested readers to Laine et al. (2012).
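A minimal sketch of the update in Eq. (A2) is given below, assuming the resampled parameter vectors are stored row-wise in an N × M array. It is an illustrative reimplementation for the reader, not the released EPPES code.

```python
# Illustrative reimplementation of the hyperparameter update in Eq. (A2);
# theta holds the resampled parameter vectors row-wise (N x M).
import numpy as np

def eppes_update(mu, W, Sigma, n, theta):
    """One EPPES hyperparameter update from resampled parameter vectors."""
    W_inv = np.linalg.inv(W)
    Sigma_inv = np.linalg.inv(Sigma)
    W_new = np.linalg.inv(W_inv + Sigma_inv)   # W_i
    n_new = n + 1                              # n_i

    N, M = theta.shape
    mu_j = np.empty((N, M))
    Sigma_j = np.empty((N, M, M))
    for j in range(N):
        mu_j[j] = W_new @ (W_inv @ mu + Sigma_inv @ theta[j])   # mu_i^j
        d = (theta[j] - mu_j[j])[:, None]                       # column vector
        Sigma_j[j] = (n * Sigma + d @ d.T) / n_new              # Sigma_i^j

    # Member-wise hyperparameters are averaged over the ensemble members.
    return mu_j.mean(axis=0), W_new, Sigma_j.mean(axis=0), n_new
```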

The idea of the algorithm is as follows:

  1. Initialize the hyperparameters (μ_0, W_0, Σ_0, n_0); the proposal distribution for time window i = 0 is N(μ_0, Σ_0).

  2. Sample N parameter vectors θ^j (j = 1, …, N) from the proposal distribution.

  3. Run an ensemble forecast of N members using the perturbed parameters proposed in the previous step.

  4. Evaluate the performance of the parameters by computing a cost function; if cost function ranking is used, sort the cost function values, and save the ranks.

  5. On the basis of the cost function values or the ranks, calculate importance weights.

  6. Use the importance weights to resample the parameter vectors.

  7. Update the hyperparameters using their previous values and the resampled parameter vectors.

  8. Move forward in time and go back to step 2 unless the stopping criteria are fulfilled.

In Tuppi et al. (2020), we concluded that an ensemble of 20 members (N = 20), a forecast length of 24 h, and perturbed initial states provide the best setup for tuning OpenIFS parameters with EPPES. In this study, we optimize parameters originating from four different parameterization schemes: turbulent diffusion and subgrid orography, convection, cloud and large-scale precipitation, and radiation. Because these schemes are not equally sensitive at the same forecast ranges, we decided to use 36-h forecasts instead of the 24 h suggested in Tuppi et al. (2020). For example, the radiation scheme shows better sensitivity at 48 h than at 24 h, whereas the convection scheme strongly prefers 24 h, so 36-h forecasts were a good compromise. We use the moist total energy norm of Eq. (1) as the cost function, and the importance weights are calculated by ranking the cost function values, with the smallest cost function value ranked as number one. Using the importance weights, a resample is drawn from the proposed parameter vectors, and the resampled parameter vectors are then used to update the hyperparameters with the update equations given in Eq. (A2). As a stopping criterion for EPPES, we used a maximum number of iterations. Figure A1 gives a visual overview of the evolution of EPPES, and the detailed technical settings needed to recreate optimization experiment E1 are provided in Tables A1 and A2.
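To make the workflow concrete, one iteration of the tuning loop as configured here can be sketched as follows. The function run_ensemble_forecast (standing in for the actual 36-h OpenIFS/OpenEPS runs and the cost function of Eq. (1)) and the 1/rank weighting are assumptions made for this illustration and are not necessarily the exact choices in EPPES; eppes_update refers to the sketch given after Eq. (A2).

```python
# Schematic sketch of one tuning iteration as configured in this study:
# sample N parameter vectors, run the ensemble, rank the cost function
# values, resample, and update the hyperparameters.
# run_ensemble_forecast is a placeholder for the actual model runs and
# Eq. (1); the 1/rank weighting is one simple choice and not necessarily
# the exact weighting used by EPPES; eppes_update is the sketch above.
import numpy as np

rng = np.random.default_rng(0)

def rank_weights(costs):
    """Importance weights from cost function ranks (rank 1 = smallest cost)."""
    ranks = np.argsort(np.argsort(costs)) + 1
    w = 1.0 / ranks
    return w / w.sum()

def eppes_iteration(mu, W, Sigma, n, n_members, run_ensemble_forecast):
    theta = rng.multivariate_normal(mu, Sigma, size=n_members)  # step 2: proposals
    costs = run_ensemble_forecast(theta)                        # steps 3-4: forecasts + cost
    weights = rank_weights(np.asarray(costs))                   # step 5: rank-based weights
    idx = rng.choice(n_members, size=n_members, p=weights)      # step 6: resampling
    return eppes_update(mu, W, Sigma, n, theta[idx])            # step 7: Eq. (A2)
```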

Fig. A1.

Visual overview of the principle of EPPES.


Table A1.

Initial values of the parameters for EPPES in experiment E1. The values are relative to the original parameter values. The lower and upper boundaries are hard limits beyond which EPPES does not propose parameter values.

Table A2.

Namelist settings of EPPES [from Tuppi et al. (2020)].


APPENDIX B

Single-Parameter Sensitivity Tests

Figure B1 shows single-parameter sensitivity tests for the most influential parameters CFM, DETRPEN, RPRCON, and RAMID using the values obtained in E1. These sensitivity experiments have been constructed in the same way as the verification of the main experiments E1, E2, E3, and ECFM, except that only the parameter in question has been modified. Figure B1 suggests that a large part of the impact on 250-hPa specific humidity in E1 originates from DETRPEN, RPRCON, and RAMID, whereas almost all of the contribution to low-level wind and midlevel geopotential comes from CFM.

Fig. B1.

As in Fig. 4, but for single-parameter sensitivity tests in which only the parameter in question has been adjusted according to the final parameter mean values proposed by EPPES in experiment E1 (CFM = 1.144, DETRPEN = 1.154, RPRCON = 1.128, and RAMID = 0.928).


REFERENCES

  • Duan, Q., and Coauthors, 2017: Automatic model calibration: A new way to improve numerical weather forecasting. Bull. Amer. Meteor. Soc., 98, 959–970, https://doi.org/10.1175/BAMS-D-15-00104.1.

  • Dunbar, O. R. A., A. Garbuno-Inigo, T. Schneider, and A. M. Stuart, 2021: Calibration and uncertainty quantification of convective parameters in an idealized GCM. J. Adv. Model. Earth Syst., 13, e2020MS002454, https://doi.org/10.1029/2020MS002454.

  • ECMWF, 2017a: IFS Documentation CY43R3—Part III: Dynamics and numerical procedures. ECMWF IFS Doc. 3, 31 pp., https://doi.org/10.21957/8l7miod5m.

  • ECMWF, 2017b: IFS Documentation CY43R3—Part IV: Physical processes. ECMWF IFS Doc. 4, 221 pp., https://doi.org/10.21957/efyk72kl.

  • Ehrendorfer, M., R. M. Errico, and K. D. Raeder, 1999: Singular-vector perturbation growth in a primitive equation model with moist physics. J. Atmos. Sci., 56, 1627–1648, https://doi.org/10.1175/1520-0469(1999)056<1627:SVPGIA>2.0.CO;2.

  • Ekblom, M., L. Tuppi, O. Räty, P. Ollinaho, M. Laine, and H. Järvinen, 2023: Filter likelihood as an observation-based verification metric in ensemble forecasting. Tellus, 75A, 69–87, https://doi.org/10.16993/tellusa.96.

  • Fennel, K., M. Losch, J. Schröter, and M. Wenzel, 2001: Testing a marine ecosystem model: Sensitivity analysis and parameter optimization. J. Mar. Syst., 28, 45–63, https://doi.org/10.1016/S0924-7963(00)00083-X.

  • Franzke, C. L. E., T. J. O’Kane, J. Berner, P. D. Williams, and V. Lucarini, 2015: Stochastic climate theory and modeling. Wiley Interdiscip. Rev.: Climate Change, 6, 63–78, https://doi.org/10.1002/wcc.318.

  • Gong, W., Q. Duan, J. Li, C. Wang, Z. Di, Y. Dai, A. Ye, and C. Miao, 2015: Multi-objective parameter optimization of common land model using adaptive surrogate modeling. Hydrol. Earth Syst. Sci., 19, 2409–2425, https://doi.org/10.5194/hess-19-2409-2015.

  • Hourdin, F., and Coauthors, 2017: The art and science of climate model tuning. Bull. Amer. Meteor. Soc., 98, 589–602, https://doi.org/10.1175/BAMS-D-15-00135.1.

  • Hourdin, F., and Coauthors, 2021: Process-based climate model development harnessing machine learning: II. Model calibration from single column to global. J. Adv. Model. Earth Syst., 13, e2020MS002225, https://doi.org/10.1029/2020MS002225.

  • Houtekamer, P. L., B. He, D. Jacques, R. McTaggart-Cowan, L. Separovic, P. A. Vaillancourt, A. Zadra, and X. Deng, 2021: Use of a genetic algorithm to optimize a numerical weather prediction system. Mon. Wea. Rev., 149, 1089–1104, https://doi.org/10.1175/MWR-D-20-0238.1.

  • Hunter, J. D., 2007: Matplotlib: A 2D graphics environment. Comput. Sci. Eng., 9, 90–95, https://doi.org/10.1109/MCSE.2007.55.

  • Järvinen, H., M. Laine, A. Solonen, and H. Haario, 2012: Ensemble prediction and parameter estimation system: The concept. Quart. J. Roy. Meteor. Soc., 138, 281–288, https://doi.org/10.1002/qj.923.

  • Laine, M., A. Solonen, H. Haario, and H. Järvinen, 2012: Ensemble prediction and parameter estimation system: The method. Quart. J. Roy. Meteor. Soc., 138, 289–297, https://doi.org/10.1002/qj.922.

  • Leutbecher, M., and T. N. Palmer, 2008: Ensemble forecasting. J. Comput. Phys., 227, 3515–3539, https://doi.org/10.1016/j.jcp.2007.02.014.

  • Mauritsen, T., and Coauthors, 2012: Tuning the climate of a global model. J. Adv. Model. Earth Syst., 4, M00A01, https://doi.org/10.1029/2012MS000154.

  • Ollinaho, P., M. Laine, A. Solonen, H. Haario, and H. Järvinen, 2013: NWP model forecast skill optimization via closure parameter variations. Quart. J. Roy. Meteor. Soc., 139, 1520–1532, https://doi.org/10.1002/qj.2044.

  • Ollinaho, P., H. Järvinen, P. Bauer, M. Laine, P. Bechtold, J. Susiluoto, and H. Haario, 2014: Optimization of NWP model closure parameters using total energy norm of forecast error as a target. Geosci. Model Dev., 7, 1889–1900, https://doi.org/10.5194/gmd-7-1889-2014.

  • Ollinaho, P., and Coauthors, 2017: Towards process-level representation of model uncertainties: Stochastically perturbed parametrizations in the ECMWF ensemble. Quart. J. Roy. Meteor. Soc., 143, 408–422, https://doi.org/10.1002/qj.2931.

  • Ollinaho, P., G. D. Carver, S. T. Lang, L. Tuppi, M. Ekblom, and H. Järvinen, 2021: Ensemble prediction using a new dataset of ECMWF initial states—OpenEnsemble 1.0. Geosci. Model Dev., 14, 2143–2160, https://doi.org/10.5194/gmd-14-2143-2021.

  • Qian, Y., and Coauthors, 2015: Parametric sensitivity analysis of precipitation at global and local scales in the community atmosphere model CAM5. J. Adv. Model. Earth Syst., 7, 382–411, https://doi.org/10.1002/2014MS000354.

  • Ruiz, J. J., M. Pulido, and T. Miyoshi, 2013: Estimating model parameters with ensemble-based data assimilation: A review. J. Meteor. Soc. Japan, 91, 79–99, https://doi.org/10.2151/jmsj.2013-201.

  • Sandu, I., P. Bechtold, L. Nuijens, A. Beljaars, and A. Brown, 2020: On the causes of systematic forecast biases in near-surface wind direction over the oceans. ECMWF Tech. Memo. 866, 23 pp., https://doi.org/10.21957/wggbl43u.

  • Schmidt, G. A., and Coauthors, 2017: Practice and philosophy of climate model tuning across six U.S. modeling centers. Geosci. Model Dev., 10, 3207–3223, https://doi.org/10.5194/gmd-10-3207-2017.

  • Schulzweida, U., 2018: CDO user guide: Climate Data Operators, version 1.9.5. MPI for Meteorology Rep., 215 pp., https://doi.org/10.5281/zenodo.2558193.

  • Sumata, H., F. Kauker, M. Karcher, and R. Gerdes, 2019: Simultaneous parameter optimization of an Arctic sea ice–ocean model by a genetic algorithm. Mon. Wea. Rev., 147, 1899–1926, https://doi.org/10.1175/MWR-D-18-0360.1.

  • Tuppi, L., P. Ollinaho, M. Ekblom, V. Shemyakin, and H. Järvinen, 2020: Necessary conditions for algorithmic tuning of weather prediction models using OpenIFS as an example. Geosci. Model Dev., 13, 5799–5812, https://doi.org/10.5194/gmd-13-5799-2020.

  • Virman, M., M. Bister, J. Räisänen, V. A. Sinclair, and H. Järvinen, 2021: Radiosonde comparison of ERA5 and ERA-Interim reanalysis datasets over tropical oceans. Tellus, 73A, 1929752, https://doi.org/10.1080/16000870.2021.1929752.

  • Wang, C., Q. Duan, W. Gong, A. Ye, Z. Di, and C. Miao, 2014: An evaluation of adaptive surrogate modeling based optimization with two benchmark problems. Environ. Modell. Software, 60, 167–179, https://doi.org/10.1016/j.envsoft.2014.05.026.

  • Williamson, D. B., A. T. Blaker, and B. Sinha, 2017: Tuning without over-tuning: Parametric uncertainty quantification for the NEMO ocean model. Geosci. Model Dev., 10, 1789–1816, https://doi.org/10.5194/gmd-10-1789-2017.

  • Yang, B., and Coauthors, 2013: Uncertainty quantification and parameter tuning in the CAM5 Zhang–McFarlane convection scheme and impact of improved convection on the global circulation and climate. J. Geophys. Res. Atmos., 118, 395–415, https://doi.org/10.1029/2012JD018213.