## 1. Introduction

A multimodel ensemble aims to cope with model imperfections in numerical weather prediction (NWP) and has been studied extensively in recent years. In operational NWP, ensemble predictions with perturbed initial conditions have been widely used to evaluate the forecast uncertainties due to initial condition uncertainties. This type of ensemble prediction system (EPS) was developed based on the theory of error growth due to the chaotic nature of the atmosphere (e.g., Yoden 2007). However, it ignores model imperfections. A multimodel ensemble consists of different model implementations and may account for the forecast uncertainties due to model imperfections in dynamical cores and physics parameterizations.

There are a variety of approaches to constructing a multimodel ensemble. For example, Stensrud et al. (2000) examined uncertainties arising from the initial condition and parameterization schemes separately, and concluded that multiphysics ensemble simulations are more skillful than initial condition ensemble simulations when the large-scale forcing is weak. In contrast, there are several ways to consider uncertainties in the initial condition and model imperfections simultaneously. Evans et al. (2000) employed two models and two objective analyses using the same 16 initial condition perturbations [multimodel and multianalysis (MMMA) ensembles], and reported that the MMMA ensemble forecasts significantly outperformed single-model single-analysis ensemble forecasts. Krishnamurti et al. (1999) constructed a multimodel superensemble using forecasts from different operational weather centers, performing a multiple regression of the forecasts at each grid point. The superensemble outperformed both deterministic forecasts by each model and equally weighted ensemble-mean forecasts in terms of errors in the south–north component of winds at 850 hPa. Furthermore, ensembles of ensemble forecasts from different weather centers (multicenter grand ensembles) have been studied in recent years. For example, Matsueda et al. (2006) reported that ensemble-mean forecasts of multicenter grand ensembles outperformed single-center ensemble-mean forecasts.

Utilizing a multimodel ensemble in ensemble-based data assimilation methods has also been studied (e.g., Fujita et al. 2007; Meng and Zhang 2007; Houtekamer et al. 2009). Fujita et al. (2007) showed that an experiment with an ensemble Kalman filter (EnKF) using different initial and boundary conditions combined with different model physics schemes outperformed experiments with different initial and boundary conditions only or with different model physics schemes only. In their experiments, the initial condition ensemble produced a larger spread in the locations of synoptic systems, whereas the model physics ensemble produced a larger spread in temperature and moisture fields. Combining these two represented the uncertainties better.

Meng and Zhang (2007) conducted observing system simulation experiments with the multimodel EnKF under perfect and imperfect model assumptions by changing parameterization schemes of cumulus, cloud microphysics, and planetary boundary layer. They tested an ensemble of different combinations of parameterizations, and the ensemble sizes for each combination were nearly uniform. They showed that combining different model schemes largely improved the performance compared to a single imperfect model.

Houtekamer et al. (2009) compared four different representations of model error: additive isotropic model error perturbations, multiphysics ensemble, stochastic perturbations to physical tendencies, and stochastic kinetic energy backscatter. They found that the former two had a positive impact in the Canadian operational global EnKF, whereas the latter two did not show further improvements.

When combining a multimodel ensemble with an initial condition ensemble, multiple initial conditions need to be assigned to each model. For example, Koyama and Watanabe (2010) developed a sophisticated way to combine an initial-condition ensemble with a model ensemble in an EnKF framework. The choice of the ensemble sizes of each model may affect the accuracy of ensemble forecasts and ensemble data assimilation.

In the previous studies on multimodel EnKF, the ensemble size for each model was often prescribed subjectively (e.g., Fujita et al. 2007; Meng and Zhang 2007; Houtekamer et al. 2009). In most cases, ensemble members are uniformly distributed to each model. However, Ebert (2001) reported that adding poor-skill models to multimodel ensemble forecasts may degrade the performance of the ensemble in several aspects such as precipitation scores. The success of weighted-mean approaches (e.g., Krishnamurti et al. 1999) also suggests that contributions from less-skilled models must be reduced. Thus, by optimizing the ensemble size for each model, the state of the system may be determined more accurately. When optimizing the ensemble sizes, observations may provide information on how well each model agrees with them. It would also be possible to adaptively change the ensemble sizes depending on the flow.

Previous studies investigated the estimation of model parameters in data assimilation (e.g., Annan and Hargreaves 2004; Hacker and Angevine 2013; Ruiz et al. 2013; Bellsky et al. 2014). However, selecting discrete states such as different models requires a different approach from estimating real-number parameters. In this study, we adopt Bayes’s rule to find the optimal combination of ensemble sizes for each model in the multimodel EnKF. As the first step, we test the proposed approach with the Lorenz-96 40-variable model.

This paper is organized as follows. Section 2 introduces the methodology, and section 3 describes the experimental setup. In section 4, results are presented for five typical cases. Section 5 discusses sensitivities of the system to tuning parameters, and conclusions are provided in section 6.

## 2. Method

### a. Multimodel EnKF

In the multimodel EnKF, *n* is the number of models, and each model has its own ensemble size. The state of the *j*th member in the *i*th model at time *t* carries a superscript for the forecast (*f*) or analysis (*a*), and the *i*th nonlinear model operator advances the state from time *t* to the next analysis time; hereafter, the time index *t* is mostly omitted for convenience. Next, the forecast error covariance is estimated from the perturbations of all ensemble members about the mean of the pooled multimodel ensemble.
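To make the pooled-ensemble idea concrete, the following is a minimal sketch (not the paper's implementation; all function and variable names are illustrative) of advancing a multimodel ensemble and estimating the forecast error covariance from the pooled perturbations:

```python
import numpy as np

def multimodel_forecast(models, ensembles):
    """Advance each model's members one step and keep them grouped by model.

    models    : list of n functions, models[i](x) -> state at the next step
    ensembles : list of n arrays, each of shape (N_i, state_dim)
    """
    return [np.array([m(x) for x in ens]) for m, ens in zip(models, ensembles)]

def forecast_covariance(ensembles):
    """Sample error covariance of the pooled multimodel ensemble."""
    pooled = np.vstack(ensembles)        # all members of all models
    dX = pooled - pooled.mean(axis=0)    # perturbations about the pooled mean
    return dX.T @ dX / (len(pooled) - 1)
```

The pooled covariance treats members from different models on an equal footing, which is the simplest way a multimodel ensemble can enter the EnKF analysis step.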

### b. Optimization of ensemble sizes

As noted in the introduction, we apply a discrete Bayesian filter to a multimodel EnKF to estimate the optimal ensemble sizes for each model (Fig. 1). Here, we assume that increasing the ensemble sizes for “better” models in a multimodel ensemble improves the performance of the multimodel EnKF.

The procedure of the optimization is as follows:

1. Receive observations and extended multimodel ensemble forecasts.
2. Compute distances between the observations and the ensemble means of each model in the extended forecasts [Eq. (13)].
3. Compute the likelihood based on the distances [Eq. (15)].
4. Update the model probability density function (PDF) by a discrete Bayesian filter [Eq. (11)].
5. Inflate the model PDF based on the degree of model imperfections [Eq. (17)].
6. Apply temporal smoothing to the model PDF by a Kalman filter [Eq. (18)].
7. Change the ensemble sizes in the next forecast–analysis cycle based on the updated model PDF [Eq. (22)].
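The steps above can be sketched in code. This is a simplified stand-in for Eqs. (11)–(22), not the authors' implementation: the uniform-mixture inflation, the scalar Kalman-filter smoothing (with an assumed error variance `r_obs`), and the largest-remainder integer allocation are illustrative assumptions.

```python
import numpy as np

def bayes_update_sizes(prob, var, ext_means, obs, n_total,
                       beta=0.6, kappa=1.08, r_obs=0.01, gamma=1.0,
                       min_size=1):
    """One cycle of the ensemble-size optimization (steps 1-7 above).

    prob      : model PDF from the previous cycle, shape (n,)
    var       : error variance of the PDF carried by the KF smoother
    ext_means : ensemble means of each model's extended forecast mapped
                to observation space, shape (n, p)
    obs       : observations, shape (p,)
    n_total   : total ensemble size to distribute
    """
    n = len(prob)
    # Steps 2-3: distance of each model's extended-forecast mean from the
    # observations; its inverse serves as the likelihood.
    dist = np.linalg.norm(ext_means - obs, axis=1)
    like = 1.0 / np.maximum(dist, 1e-12)
    # Step 4: discrete Bayesian update.
    post = prob * like
    post /= post.sum()
    # Step 5: inflate (flatten) the PDF toward uniform; gamma measures the
    # degree of model imperfection and beta the inflation strength.
    w = min(1.0, beta * gamma)
    post = (1.0 - w) * post + w / n
    # Step 6: temporal smoothing by a scalar Kalman filter, with kappa
    # inflating the forecast error variance of the model PDF.
    var_f = kappa * var
    gain = var_f / (var_f + r_obs)
    smoothed = prob + gain * (post - prob)
    smoothed = np.clip(smoothed, 1e-12, None)
    smoothed /= smoothed.sum()
    var_a = (1.0 - gain) * var_f
    # Step 7: convert the PDF to integer ensemble sizes (largest remainder),
    # keeping at least `min_size` member per model.
    raw = smoothed * (n_total - min_size * n)
    sizes = np.floor(raw).astype(int) + min_size
    for i in np.argsort(raw - np.floor(raw))[::-1][: n_total - sizes.sum()]:
        sizes[i] += 1
    return smoothed, var_a, sizes
```

Under this sketch, a model whose extended forecast stays closest to the observations accumulates probability and therefore members over successive cycles.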

Here, the model PDF assigns a probability to each model: the probability of the *i*th model determines the ensemble size of the *i*th model. The probability is updated by the discrete Bayesian filter whenever new observations become available.

By definition, “better” models give model states closer to the true state. In practice, we may use observations as a proxy for the true state and compare the model states with the observations. Since observations are assumed to be randomly perturbed around the true state, the differences among the state estimates of different models in the multimodel ensemble must be statistically larger than the random observation errors to be detectable. It is natural to assume that extended forecasts have larger impacts from model errors than the analysis states; therefore, we use extended forecasts to evaluate the distances from the observations. The advantage of using extended forecasts instead of analyses will be shown in section 5b. In an operational ensemble analysis–forecast cycle, extended forecasts might already be available without additional cost; most of the additional cost of the discrete Bayesian filter then comes from computing the distances between the model states and the observations, as shown below.

The inverse of the distance between the ensemble mean of the extended forecasts of the *i*th model and the observations is used as a measure of the likelihood of that model [Eqs. (13)–(15)]. The extended forecast is obtained by integrating the *j*th member of the *i*th model from time *t* with forecast length FT; an observation operator maps the ensemble mean into observation space before the distance is computed.

Schematic showing extended multimodel ensemble forecasts of model 1 (solid) and model 2 (dashed), and observation (thick crisscross mark) in a phase space. Open cross marks denote the ensemble means of each model, and thick gray arrows indicate distances from the observation.

Citation: Monthly Weather Review 143, 6; 10.1175/MWR-D-14-00148.1


The Bayesian update [Eq. (11)] is performed after every EnKF step, and the discrete probability function of the models is updated accordingly.

Here, *β* is a tuning parameter to control the magnitude of the inflation. This operation flattens the PDF toward the uniform distribution.

If the performance of the multimodel ensemble is close to that of the perfect model, then *γ* becomes smaller. If no perfect model is available and if the most probable model is going to obtain all the ensemble members, then the inflation parameter *γ* becomes larger. In this case, Eqs. (16) and (17) decrease the ensemble size of the most probable model and increase the ensemble sizes of the other models.

Note that in some cases assigning at least one ensemble member to each model improves the analysis performance when combined with the PDF modification described above; this works well when the model closest to the true system varies in time. This will be shown in section 4e.

In the temporal smoothing [Eqs. (18)–(21)], a Kalman filter is applied to the probability of the *i*th model at time *t*; *κ* is an inflation factor for the forecast error variance of the model PDF; superscripts denote the observation (*o*), forecast (*f*), or analysis (*a*); and the smoothed model PDF determines the ensemble sizes in the next cycle [Eq. (22)].

### c. Summary of the algorithm

The overall flowchart of the system is shown in Fig. 3. First, a multimodel EnKF is performed (Figs. 3a1,a2). The ensemble size is optimized by the proposed method (Fig. 3b) just after the EnKF analysis step. For the ensemble size optimization, an extended forecast (Fig. 3c) is used. In the discrete Bayesian filter [Eq. (11)], the distances between the means of extended ensemble forecasts of each model and the observations are used to obtain the likelihood [Eqs. (13)–(15)]. The posterior distribution is then inflated depending on the degree of model imperfections measured by the adaptive inflation parameter in the state estimate [Eqs. (16) and (17)]. The time filter for the ensemble size suppresses noise due to the random observation errors [Eqs. (18)–(21)]. Finally, the ensemble sizes of each model are updated using the model PDF [Eq. (22)]. These processes are repeated for the next time step.

Flowchart of the multimodel EnKF system with ensemble size optimization. The subscript *i* denotes the *i*th time step. The extended ensemble forecast has a forecast length of *k* steps.


## 3. Experimental setup

For the first assessment of the proposed adaptive multimodel ensemble method, experiments are performed with the Lorenz-96 40-variable model (Lorenz 1996; Lorenz and Emanuel 1998). The model has terms that represent advection, dissipation, and external forcing, and mimics an atmospheric variable at equally spaced points along a latitude circle. The fourth-order Runge–Kutta scheme is used, and the time interval is chosen to be 0.01. Lorenz and Emanuel (1998) showed that 0.2 time units are equivalent to one Earth day in terms of the error-doubling time of synoptic weather if the forcing parameter *F* is 8.

Different values of the model parameter *F* are used to mimic a multimodel ensemble (Table 1). Representing model imperfections by different *F* values was also adopted in previous studies (e.g., Anderson 2007, 2009). Anderson (2007) described behaviors of the models with different *F* values, and compared performances of assimilation experiments when the model parameter *F* was different from the true *F*. Although the current approach does not have errors in the advection or dissipation terms, the essence of the multimodel ensemble is reasonably represented by the *F* ensemble, and conclusions derived from the experiments below do not lose generality. Note that the aim of the current experiments is not to estimate *F* in the real domain, but to find the optimal combination of different models as a discrete state. Hereafter, a model with forcing *F* = *X* is denoted FX (e.g., F8 for *F* = 8). The true *F* is first fixed at 8, and the observations are generated every 0.05 time units (or 6 h) by adding independent Gaussian random noise with unit variance to the true time series.
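For reference, the Lorenz-96 model with forcing parameter *F* and the fourth-order Runge–Kutta scheme at the stated time interval of 0.01 can be written as follows; this is the standard formulation, with variable names of our choosing:

```python
import numpy as np

def lorenz96_tendency(x, F):
    """dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F on a cyclic domain."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, F, dt=0.01):
    """One fourth-order Runge-Kutta step with the time interval of 0.01."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
```

Each imperfect model in the *F* ensemble is then simply `rk4_step` with a different value of `F`, while the nature run uses the true *F*.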

Experimental setup.

The simulated observation data are assimilated using the serial ensemble square root filter (SEnSRF) with 20 ensemble members for 110 years. The initial ensemble is created with a 1-yr free run of each model initialized from random states. Observations are given at all 40 grid points every 0.05 time units, that is, every 6 h; hence, each experiment has 160 600 assimilation steps. Although the observation network in this paper is unrealistically perfect for simplicity, an experiment with observations at 20 contiguous grid points also works and leads to the same conclusion (not shown). Statistics are computed for the latter 100 years after a 10-yr spinup. The covariance localization parameter for the EnKF, defined as the one-standard-deviation radius of the Gaussian function, is fixed at 6.0 grid points throughout this study after manual tuning.
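The observation generation described above might be sketched as follows, assuming a nature-run time series sampled every five model steps (0.05 time units at a step of 0.01, i.e., 6 h) with unit-variance Gaussian noise; the function name and layout are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

def make_observations(truth_series, obs_every=5, obs_err_std=1.0):
    """Sample the nature run every `obs_every` model steps and perturb it
    with independent Gaussian noise of standard deviation `obs_err_std`.

    truth_series : array of shape (n_steps, n_grid) from the nature run
    """
    sampled = truth_series[::obs_every]
    return sampled + rng.normal(0.0, obs_err_std, sampled.shape)
```

With all 40 grid points observed, the observation operator reduces to the identity, which keeps the distance computations in the Bayesian filter trivial.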

Covariance inflation is adaptively computed (Li et al. 2009; Miyoshi 2011), following Miyoshi (2011) with a modification to the treatment of the inflation factor.

In the following experiments, unless otherwise noted, the extended forecast length FT used in the ensemble size optimization is chosen to be 2 days (or 0.4 time units). The parameters *β* [Eq. (16)] and *κ* [Eq. (21)] are listed in Table 1 after manual tuning. The 100-yr-mean inflation factor for a perfect model experiment is also computed for reference.

## 4. Results

### a. Case 1: Multimodel ensemble including the true model

First, the adaptive ensemble size optimization is tested with a multimodel ensemble that includes the true model F8. The multimodel ensemble consists of F6, F7, F8, F9, and F10 (four members each at the initial time). Figure 4 shows the time series of the ensemble sizes of each model in the multimodel ensemble. As soon as the assimilation cycle starts, the multimodel ensemble quickly converges to the true model (F8, the green line in Fig. 4) within 600 cycles (about five months). After each analysis step, the root-mean-square error (RMSE) is calculated using the 40 grid points, and we will use time-mean RMSEs over the 100 years. The RMSE of the adaptive ensemble size optimization is compared with those of two additional experiments. One is the manually tuned run, in which the ensemble size is fixed at the manually optimized combination. In case 1, all the ensemble members are assigned to the true model F8: equivalent to the perfect model experiment. The other is the uniformly distributed run, in which all the models have the same ensemble size, and the ensemble size is fixed in time.

(a) Time series of ensemble sizes of each model in case 1. The black, red, green, blue, and purple lines show F6, F7, F8, F9, and F10, respectively. (b) Magnifies the first two years of (a).


The RMSE for the proposed method is 0.189, much smaller than the 0.307 of the uniform multimodel ensemble. This value is almost identical to the RMSE of 0.189 obtained by the perfect model experiment (Fig. 5). Note that the error bars in Fig. 5 represent 99.99% confidence intervals, in which effective sample sizes are computed using a lag-1 autocorrelation (e.g., Wilks 2006). The error bars are so narrow that they are obscured in the figure, showing that the RMSE difference of the uniformly distributed run from the other two runs is statistically significant.
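The effective-sample-size adjustment used for the confidence intervals follows the standard lag-1 autocorrelation formula for serially correlated time series (e.g., Wilks 2006); a minimal sketch:

```python
import numpy as np

def effective_sample_size(series):
    """N_eff = N (1 - rho1) / (1 + rho1), where rho1 is the lag-1
    autocorrelation; used to widen confidence intervals of time means
    computed from autocorrelated data."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    rho1 = (x[:-1] @ x[1:]) / (x @ x)   # sample lag-1 autocorrelation
    return len(x) * (1.0 - rho1) / (1.0 + rho1)
```

For strongly autocorrelated RMSE time series, the effective sample size can be far smaller than the number of assimilation steps, which is why it matters for the significance test.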

The analysis RMSEs averaged for 100 years for cases 1–5 after a 10-yr spinup. Tuned, uniform, and adaptive denote runs with manually tuned, uniformly distributed, and adaptively optimized ensemble sizes, respectively. The manually tuned run in case 1 is equivalent to the perfect model run. The error bars (obscured) represent 99.99% confidence intervals. The black dots denote the time-mean analysis multimodel ensemble spreads for the same period as the RMSE.


The analysis multimodel ensemble spreads (black dots in Fig. 5) are compared with the RMSEs. The spreads averaged over the 100 years are 0.173, 0.259, and 0.173 for the manually tuned run, the uniformly distributed run, and the proposed method, respectively. These are smaller than the RMSEs by 0.016, 0.048, and 0.016, respectively. The analysis multimodel ensemble spread systematically underestimates the analysis multimodel ensemble-mean error; this is consistent with Fig. 5 in Miyoshi (2011).

### b. Case 2: Multimodel ensemble consisting of similarly imperfect models

A multimodel ensemble without the true model is tested. Here, the multimodel ensemble consists of F6, F7, F9, and F10. If the ensemble sizes are uniform—that is, five members each—then the mean analysis RMSE becomes 0.316 (Fig. 5).

Before running an adaptive multimodel ensemble, a manual tuning is performed to find the optimal combination of the models (Fig. 6). In the current test case, two models (F7 and F9) have similar degrees of imperfection: the difference of *F* from the true value of 8 is 1 in both cases.

The analysis RMSEs for different combinations of the ensemble sizes of F7 and F9. The crisscross mark shows the smallest RMSE.


The proposed Bayesian filter approach to the multimodel ensemble successfully selects F7 and F9. The analysis RMSE in the proposed system becomes 0.274 (Fig. 5), almost identical to the 0.273 of the manual tuning.

Figure 7a shows the time series of the ensemble sizes. The multimodel ensemble converges to the two models around the optimal combination (F7:F9 = 17:3). It takes about two years to stabilize the model PDF (Fig. 7b). The time-mean ensemble sizes of the models are shown in Fig. 7c: about 16 members for F7 and about 3 members for F9, consistent with the optimal combination F7:F9 = 17:3.

(a) Time series of ensemble sizes of each model in case 2. The black, red, green, and blue lines show F6, F7, F9, and F10, respectively. (b) Magnifies the first 2 years of (a). (c) Time-mean ensemble sizes for each model.


Around the assimilation cycles of 30 000, 53 000, and 107 000, the ensemble size of F9 becomes larger than that of F7 (Fig. 7a). During the periods around 30 000 and 107 000, F6 and F10 slightly increase, too. This indicates that the estimation of the optimal model combination depends on the flow of the day. However, the RMSEs during those periods do not change much compared to the mean RMSE.

If we apply no inflation to the model PDF, then the PDF eventually collapses onto a single imperfect model and the analysis accuracy is degraded.

### c. Case 3: Multimodel ensemble consisting of models with different degrees of imperfections

In this test case, F5.5, F6.5, F9, and F10 are used, so that different models have different degrees of imperfections. F9 is the closest to the true model F8, and F6.5 is the second closest. The results show that F9 obtains about 18 members out of 20, and F6.5 obtains the rest (Fig. 8c). That is, the adaptive method finds the same ensemble sizes as those of the manual tuning; the best combination is F6.5:F9 = 2:18. Compared to the previous case (case 2), the present case (case 3) shows smaller time variations of the ensemble sizes (Fig. 8a). A possible reason is that the degree of imperfection of F9 is clearly the smallest among the four, and it would be easier to find F9. It takes about 600 cycles to stabilize the model PDF, shorter than the previous case (case 2).

As in Fig. 7, but for case 3. The black, red, green, and blue lines show F5.5, F6.5, F9, and F10, respectively.


The RMSEs of manual tuning, uniform distribution, and the proposed method are 0.279, 0.328, and 0.280, respectively (Fig. 5).

### d. Case 4: Multimodel ensemble consisting of biased imperfect models

In this test case, F5.5, F6, F6.5, and F7 are used, so that all the models are biased in the same direction; all the model *F* values are smaller than the true value of 8.

The RMSEs of manual tuning, uniform distribution, and the proposed method are 0.331, 0.374, and 0.335, respectively (Fig. 5).

### e. Case 5: An experiment with time-varying true F

Next, we investigate how well the proposed system can follow a true forcing *F* that varies in time much more slowly than the frequency of the observations. In this test case, the true *F* is chosen to be a sinusoidal function of time with a period of one year. To allow the system to follow the time-varying true *F* smoothly, at least one member is assigned to every model.
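A sinusoidal true forcing with a one-year period could be defined as below; the amplitude of 1 around the mean *F* = 8 is an assumed illustrative value, not a number taken from the experiments:

```python
import numpy as np

def true_forcing(t_days, period=365.0, amplitude=1.0, mean=8.0):
    """Hypothetical time-varying true forcing: a sinusoid about F = 8.
    The one-year period matches the mean annual cycle in Fig. 9; the
    amplitude here is an assumed illustrative value."""
    return mean + amplitude * np.sin(2.0 * np.pi * t_days / period)
```

The nature run would then call the model integrator with `true_forcing(t)` at each step, while the candidate models keep their fixed *F* values.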

The proposed system obtains the optimal combinations of ensemble sizes that change in time: when the true *F* is above 8, F9 has more than 16 members, whereas F7 has more than 16 members when the true *F* is below 8 (Fig. 9a). The transition between F7 and F9 is slightly delayed compared to the change of the true forcing *F*, depending on the strength of temporal smoothing *κ*: the smaller *κ*, the more delay. We can also find the clear delay in the mean annual cycle (Fig. 9b): the delay is less than a month. The mean PDF of the ensemble sizes shows almost the same probabilities (about 0.5) for F7 and F9 (Fig. 9c).

(a) Time series of ensemble sizes for the last three years in case 5. The black, red, green, and blue lines show F6, F7, F9, and F10, respectively (left axis), and the dashed line shows the true *F* (right axis). (b) Mean annual cycle of the ensemble sizes (left axis) and the true *F* for 100 years after a 10-yr spinup (right axis). (c) Time-mean ensemble sizes for the 100 years.


The current system outperforms the manually tuned multimodel EnKF with a fixed combination; the RMSE is 0.273, less than the 0.285 of the manually tuned case (Fig. 5). A fixed combination of ensemble sizes cannot be optimal at all times when the true *F* has time variations; the time-varying ensemble sizes in the current method provide the best performance.

In the experiment mentioned above, at least one member was assigned to each model. If we allow the ensemble size to be zero, then the transition from one model to another becomes significantly worse (not shown). In this case, the system in the transition period needs to wait first until the variance of the model PDF increases to a certain value, causing a large delay in the transition.

## 5. Parameter sensitivities

In this section, we discuss sensitivities of the system to several parameters.

### a. Sensitivity to the inflation and temporal smoothing parameters of the model PDF

Additional experiments are performed for cases 1, 2, and 5 to investigate the sensitivities of the proposed approach to two parameters: the inflation parameter *β* and the temporal smoothing parameter *κ* of the model PDF. A larger *β* corresponds to stronger forcing toward the uniform distribution (equal ensemble sizes), and a larger *κ* corresponds to a larger adjustment of the model PDF at each analysis and hence a quicker, noisier time series. Figure 10 shows the improvement rates for various combinations of *β* and *κ*.

The improvement rate as a function of *β* and *κ* for (a) case 1, (b) case 2, and (c) case 5. See section 4a for the definitions; a larger *β* forces the model PDF toward the uniform distribution. Note that the horizontal and vertical axes are shown in logarithmic scales. The crisscross mark in (b) shows the best score in case 2.


Figure 10a shows the results for case 1, in which the true model is included in the multimodel ensemble. When *β* and *κ* are small, the system converges to the true model with the highest accuracy (the white regions in Fig. 10a). When *β* is large, the system is forced toward the uniform distribution and keeps the other models alive because of the stronger inflation of the model PDF, and the analysis accuracy is degraded. However, there is no significant degradation compared with the uniformly distributed runs, because the multimodel ensemble approaches the uniformly distributed ensemble as *β* becomes large.

Figure 10b shows the results for case 2, in which the true model is not included in the multimodel ensemble. The optimal performance is achieved at around *β* = 0.6. When *β* becomes larger than the optimal 0.6, the improvement rate gradually decreases; when *β* becomes smaller than 0.6, the model PDF collapses onto a single model, so it is safer to choose a slightly larger *β*. As the temporal smoothing parameter *κ* deviates from the optimal 1.08, the improvement rate also decreases.

Figure 10c shows the results for case 5, in which the forcing *F* for the true system changes in time. The improvement rate is largest when *κ* is large and *β* is small. That is, the adaptive ensemble approach outperforms the manually tuned but fixed combinations. A larger time-smoothing parameter *κ* enables quicker transitions from one model to another, although the temporal change of the model PDF then includes higher-frequency noise. A smaller *β* enables the system to converge to the most probable model at each moment. Note that we limit the smallest ensemble size of each model to 1 in case 5, so that the model PDF never collapses.

In all three cases, a larger *β* makes the system approach the uniformly distributed runs; the system is then stable, and moderate improvements are expected. A smaller *β* may make the model PDF collapse. With an imperfect multimodel ensemble, we need to avoid the collapse by allocating ensemble members to at least two different models. Note that case 3 also shows results similar to case 2, with the optimal combination of *κ* and *β* slightly different from that of case 2 (not shown).

### b. Sensitivity to other parameters

In this study, extended 2-day forecasts are used to find the observed PDF of each model, so that the distance between the forecasts and the observations reflects mainly the model errors with minimal impact from the observation errors. It is therefore important to investigate the sensitivity of the proposed approach to the length of the extended forecasts [forecast time (FT)].

In the inflation method of the model PDF [Eq. (16)], a reference inflation parameter is used. If this reference value is changed, the combination of *β* and *κ* that gives the best performance, as well as the region in which the model PDF collapses in Fig. 10, will also change.

The sensitivity to the ensemble size is also investigated. All experiments so far used 20 members, but experiments with 10 members also showed stable performance and led to the same conclusions. Case 5 with 10 members also followed the temporal variations of *F*.

## 6. Conclusions

In this study, we have developed an optimization algorithm for the ensemble size of each model in a multimodel ensemble Kalman filter using a discrete Bayesian filter. The inverse of the distance between the ensemble mean of the extended forecasts of each model and the observations was used as the likelihood of that model in the discrete Bayesian filter. We developed an effective inflation method to make the Bayesian filter work without converging to a single imperfect model.
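One cycle of such a discrete Bayesian filter multiplies the prior model PDF by each model's likelihood and renormalizes; without inflation the PDF tends to collapse onto a single model. The relaxation-to-uniform mixing below is a hedged stand-in for the inflation of Eq. (16), which is not reproduced here:

```python
def update_model_pdf(prior, likelihoods, beta):
    """One cycle of a discrete Bayesian filter over candidate models.
    prior: current model PDF; likelihoods: per-model likelihoods;
    beta: inflation parameter in [0, 1]. Mixing the posterior with a
    uniform PDF is an assumed form of the inflation, not Eq. (16) itself."""
    n = len(prior)
    # Bayes: posterior proportional to prior times likelihood.
    post = [p * l for p, l in zip(prior, likelihoods)]
    s = sum(post)
    post = [p / s for p in post]
    # Inflation: relax toward the uniform PDF to prevent collapse.
    return [(1.0 - beta) * p + beta / n for p in post]
```

With `beta = 0` the filter is a pure Bayesian update and eventually collapses onto one model; with `beta = 1` the PDF stays uniform, mirroring the two limits discussed in the sensitivity experiments.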

As a first step, we tested the proposed approach with the 40-variable Lorenz-96 model. Different values of the model parameter *F* were used to mimic the multimodel ensemble. The true *F* was first chosen to be *F* = 8; when the true model was included in the multimodel ensemble, it was selected successfully (case 1), and when it was not included, the closest models (*F* = 7 and *F* = 9) were selected successfully (case 2). Multimodel ensembles consisting of models with different degrees of imperfection were also examined and showed stable performance (cases 3 and 4).

When the true *F* had temporal variations, the model PDF followed the variations well: when the true *F* was above 8, F9 obtained more than 16 members, and when the true *F* was below 8, F7 obtained more than 16 members. In terms of the RMSE, the proposed system outperformed the multimodel EnKF with the time-invariant optimal combination.
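The 40-variable Lorenz-96 model with forcing *F* used in these experiments is standard; for reference, a minimal sketch with fourth-order Runge–Kutta time stepping (the step size `dt = 0.05` is the conventional choice, an assumption here):

```python
def lorenz96_tendency(x, F):
    """Lorenz-96 tendency: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F,
    with cyclic boundary conditions (Python's negative indexing handles
    the wrap-around for i-1 and i-2)."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + F
            for i in range(n)]

def rk4_step(x, F, dt=0.05):
    """One fourth-order Runge-Kutta step of the Lorenz-96 model."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], F)
    k3 = lorenz96_tendency([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], F)
    k4 = lorenz96_tendency([xi + dt * ki for xi, ki in zip(x, k3)], F)
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
```

Candidate models in the multimodel ensemble differ only in the value of `F` passed to the integrator, which is how different degrees of model imperfection are mimicked.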

We have also investigated sensitivities to several parameters in the proposed system. If the inflation parameter of the model PDF was too small, then the model PDF collapsed and the performance of the system was significantly degraded. If the inflation parameter was too large, then the model PDF became uniform, and no improvement was obtained. We have also found that the proposed system showed improvements over a wide range of parameter values.

Although we have seen promising results, the model used in this study was a simple toy model. Further investigation of the proposed approach with realistic geophysical models is necessary and remains a subject of future research.

## Acknowledgments

The authors thank three anonymous reviewers for their helpful comments. The authors thank the members of the Data Assimilation Research Team of RIKEN Advanced Institute for Computational Science for the fruitful discussions. The figures were produced by the GFD-DENNOU Library. This research was partially supported by JST, CREST.

## REFERENCES

Anderson, J. L., 2007: An adaptive covariance inflation error correction algorithm for ensemble filters. *Tellus*, **59A**, 210–224, doi:10.1111/j.1600-0870.2006.00216.x.

Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. *Tellus*, **61A**, 72–83, doi:10.1111/j.1600-0870.2008.00361.x.

Annan, J. D., and J. C. Hargreaves, 2004: Efficient parameter estimation for a highly chaotic system. *Tellus*, **56A**, 520–526, doi:10.1111/j.1600-0870.2004.00073.x.

Bellsky, T., J. Berwald, and L. Mitchell, 2014: Nonglobal parameter estimation using local ensemble Kalman filtering. *Mon. Wea. Rev.*, **142**, 2150–2164, doi:10.1175/MWR-D-13-00200.1.

Ebert, E. E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. *Mon. Wea. Rev.*, **129**, 2461–2480, doi:10.1175/1520-0493(2001)129<2461:AOAPMS>2.0.CO;2.

Evans, R. E., M. S. J. Harrison, R. J. Graham, and K. R. Mylne, 2000: Joint medium-range ensembles from the Met. Office and ECMWF systems. *Mon. Wea. Rev.*, **128**, 3104–3127, doi:10.1175/1520-0493(2000)128<3104:JMREFT>2.0.CO;2.

Fujita, T., D. J. Stensrud, and D. C. Dowell, 2007: Surface data assimilation using an ensemble Kalman filter approach with initial condition and model physics uncertainties. *Mon. Wea. Rev.*, **135**, 1846–1868, doi:10.1175/MWR3391.1.

Hacker, J. P., and W. M. Angevine, 2013: Ensemble data assimilation to characterize surface-layer errors in numerical weather prediction models. *Mon. Wea. Rev.*, **141**, 1804–1821, doi:10.1175/MWR-D-12-00280.1.

Houtekamer, P. L., H. L. Mitchell, and X. Deng, 2009: Model error representation in an operational ensemble Kalman filter. *Mon. Wea. Rev.*, **137**, 2126–2143, doi:10.1175/2008MWR2737.1.

Koyama, H., and M. Watanabe, 2010: Reducing forecast errors due to model imperfections using ensemble Kalman filtering. *Mon. Wea. Rev.*, **138**, 3316–3332, doi:10.1175/2010MWR3067.1.

Krishnamurti, T. N., C. M. Kishtawal, T. E. Larow, D. R. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. *Science*, **285**, 1548–1550, doi:10.1126/science.285.5433.1548.

Li, H., E. Kalnay, and T. Miyoshi, 2009: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter. *Quart. J. Roy. Meteor. Soc.*, **135**, 523–533, doi:10.1002/qj.371.

Lorenz, E. N., 1996: Predictability: A problem partly solved. *Proceedings of the Seminar on Predictability*, Vol. 1, T. Palmer, Ed., ECMWF, 1–18.

Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. *J. Atmos. Sci.*, **55**, 399–414, doi:10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.

Matsueda, M., M. Kyouda, H. L. Tanaka, and T. Tsuyuki, 2006: Multi-center grand ensemble using three operational ensemble forecasts. *SOLA*, **2**, 33–36, doi:10.2151/sola.2006-009.

Meng, Z., and F. Zhang, 2007: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part II: Imperfect model experiments. *Mon. Wea. Rev.*, **135**, 1403–1423, doi:10.1175/MWR3352.1.

Miyoshi, T., 2011: The Gaussian approach to adaptive covariance inflation and its implementation with the local ensemble transform Kalman filter. *Mon. Wea. Rev.*, **139**, 1519–1535, doi:10.1175/2010MWR3570.1.

Ruiz, J. J., M. Pulido, and T. Miyoshi, 2013: Estimating model parameters with ensemble-based data assimilation: A review. *J. Meteor. Soc. Japan*, **91**, 79–99, doi:10.2151/jmsj.2013-201.

Stensrud, D. J., J.-W. Bao, and T. T. Warner, 2000: Using initial condition and model physics perturbations in short-range ensemble simulations of mesoscale convective systems. *Mon. Wea. Rev.*, **128**, 2077–2107, doi:10.1175/1520-0493(2000)128<2077:UICAMP>2.0.CO;2.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924, doi:10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.

Wilks, D. S., 2006: *Statistical Methods in the Atmospheric Sciences*. 2nd ed. Academic Press, 627 pp.

Yoden, S., 2007: Atmospheric predictability. *J. Meteor. Soc. Japan*, **85B**, 77–102, doi:10.2151/jmsj.85B.77.