1. Introduction
In high-dimensional applications, ensemble Kalman filters are usually implemented with a small number of ensemble members because of the high cost of integrating the forecast model. This occurs, for instance, in operational numerical weather prediction, where forecast models have
While parametric localization functions are useful in practice, they can be expensive to tune in large-scale applications. For instance, the optimal half-width of the GC localization function tends to vary with observation type (Houtekamer and Mitchell 2005), with model variable (Anderson 2007b, 2012), and even with time (Anderson 2012; Chen and Oliver 2010). Recently, De La Chevrotière and Harlim (2017) proposed a data-driven localization technique that can capture nonuniform localization bandwidths using a single parameter. The technique uses archived ensemble products from which time series of sampled and undersampled correlations are computed. A supervised learning algorithm analyzes the two training correlation datasets to infer a localization function, named the localization map. The localization map is then used in verification mode to transform the poorly estimated sample correlation into an improved correlation. In a series of observing system simulation experiments (OSSEs) using the 40-variable Lorenz-96 model (Lorenz 1996) and a range of linear and nonlinear observation models, the localization maps were found to improve the filter estimates, most notably in the case of nonlinear indirect observations (De La Chevrotière and Harlim 2017).
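To make the training step concrete, the following Python sketch fits a scalar localization map by least squares on synthetic data. It is only an illustration under stated assumptions, not the implementation of De La Chevrotière and Harlim (2017): the toy Gaussian ensemble, the exponentially decaying reference correlation, the ensemble sizes, and the function names `sample_corr` and `fit_scalar_map` are all hypothetical, and the small ensemble is simply taken as a subset of the large one.

```python
import numpy as np

def sample_corr(x, y):
    """Sample correlation between each row of x (n_state, n_members) and y (n_members,)."""
    xa = x - x.mean(axis=1, keepdims=True)
    ya = y - y.mean()
    return (xa @ ya) / (np.linalg.norm(xa, axis=1) * np.linalg.norm(ya))

def fit_scalar_map(r_small, r_large):
    """For each coordinate i, return the scalar l_i that minimizes
    sum_t (l_i * r_small[t, i] - r_large[t, i])**2 over the training cycles t."""
    num = np.sum(r_small * r_large, axis=0)
    den = np.sum(r_small ** 2, axis=0)
    return num / den

rng = np.random.default_rng(0)
n_state, n_large, n_small, n_cycles = 20, 500, 10, 200

# Hypothetical reference correlation between coordinate i and a scalar
# observation; it decays with the index distance from coordinate 0.
true_corr = np.exp(-np.arange(n_state) / 3.0)

r_large = np.empty((n_cycles, n_state))  # well-sampled training correlations
r_small = np.empty((n_cycles, n_state))  # undersampled training correlations
for t in range(n_cycles):
    y = rng.standard_normal(n_large)
    noise = rng.standard_normal((n_state, n_large))
    x = true_corr[:, None] * y + np.sqrt(1.0 - true_corr[:, None] ** 2) * noise
    r_large[t] = sample_corr(x, y)
    # Assumption: the small ensemble is the first n_small members of the large one.
    r_small[t] = sample_corr(x[:, :n_small], y[:n_small])

loc_map = fit_scalar_map(r_small, r_large)

# Verification-mode use: rescale a poorly sampled correlation with the map.
improved = loc_map * r_small[-1]
```

In this toy setting one would expect the fitted weights to stay close to one where the reference correlation is strong and to shrink toward zero where the small-ensemble correlation is mostly sampling noise.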
In light of these promising results obtained using a low-order model, the performance of the localization maps is further explored in a data assimilation system of intermediate complexity. Here, the serial least squares ensemble Kalman filter (LS-EnKF) of Anderson (2003) is implemented in the monsoon–Hadley multicloud model (De La Chevrotière and Khouider 2017; De La Chevrotière 2015), a zonally symmetric model for the meridional Hadley circulation and monsoonal flow. The model’s free-tropospheric synoptic-scale wave dynamics are given by nonlinear equations for the barotropic and first two baroclinic modes of vertical structure, while the physical processes of convection and precipitation are represented by a stochastic model for clouds. Although the monsoon–Hadley multicloud model is an idealized atmospheric circulation model, it features nonlinear multiscale wave dynamics with several thousand model coordinates, which makes it an ideal test bed for the localization maps. The vertical basis function representation of the model is exploited to recreate satellite-like observations using an idealized radiative transfer model. Brightness temperature-like measurements from six satellite channels are assimilated into the model using a sparse observational network. The filter skill of the localization maps is tested with this nonlinear indirect observation model in a series of perfect-model OSSEs as well as in the presence of model error.
The structure of this paper is as follows. In section 2, we review the general framework of the monsoon–Hadley multicloud model and look at numerical simulations of the model in two different regimes. In section 3, the localization mapping technique is explained in the context of the LS-EnKF, followed by a description of the idealized radiative transfer model and the general experimental design. In section 4, we present the results of OSSEs realized in the perfect- and imperfect-model scenarios. We close the paper with a brief summary and conclusions in section 5.
2. The monsoon–Hadley multicloud model
The monsoon–Hadley multicloud model is a zonally symmetric model for the large-scale Hadley circulation, ambient winds, and precipitation associated with the summer monsoon season (De La Chevrotière and Khouider 2017; De La Chevrotière 2015). The model is based on the Galerkin projection of the primitive equations of atmospheric synoptic dynamics onto the first few modes of vertical structure in the free troposphere, and is coupled to a bulk atmospheric boundary layer (ABL) model. The prognostic variables of this vertical projection are the barotropic and baroclinic horizontal velocities,

(a) Vertical profiles of the leading modes of horizontal velocity
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

Altogether, the governing equations for the large-scale variables
The computational domain is reduced to a meridional slice of the troposphere between 40°S and 40°N (roughly 9000 km) with a mesoscale grid resolution of about 37 km. The system is integrated as an initial value problem with the initial condition set to a radiative convective equilibrium (RCE; De La Chevrotière 2015). This is a spatially homogeneous steady-state solution in which the convective heating is balanced by the radiative cooling. Details on how to construct an RCE solution for the coupled system can be found in De La Chevrotière (2015). In section 2b we present numerical simulations in an idealized boreal summer setting. A detailed description of the model can be found in the original works (De La Chevrotière and Khouider 2017; De La Chevrotière 2015).
a. The stochastic multicloud parameterization
The multicloud model highlights the role of three heating rates,
The parameterization scheme overlays on top of each grid box a Markov chain square lattice of size
Cloud transition probability rates




















b. Idealized boreal summer monsoon simulations
The monsoon–Hadley multicloud model is tested in an idealized summer monsoon setting on an aquaplanet with constant but nonuniform sea surface temperature (SST) mimicking the Indian and Pacific Oceans’ warm pool (WP). The prescribed SST follows a Gaussian meridional profile centered at 15°N, as shown in Fig. 2. This is meant to replicate the warm SSTs observed in the intertropical convergence zone (ITCZ) during the boreal summer. We use a multicloud stochastic lattice of size

Imposed SST meridional profile. The surface temperature gradient at RCE follows a normal distribution centered at 15°N with a standard deviation of 7.2°. Here
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

The model is integrated for roughly 1350 days with a time step of 3 min. The solutions reach a statistical steady state after a short transient period of 100–200 days (De La Chevrotière 2015). The first 1000 days are discarded as burn-in and the last 350 days are used for training purposes. Here we present the simulation results for two parameter regimes, A and B, which differ only in the cloud time scales of their convective parameterization: regime A uses the Bayes’s mean time-scale estimates of Table 1, while the time scales of regime B are obtained by adding one Bayes’s standard deviation to the mean. We should point out that most of the model error is due to the misspecification of the deep cloud decay rate, which has a large standard deviation. We will use these two regimes to set up numerical experiments with model error.
The mean meridional circulation resulting from taking the time average of the solution over the training interval is plotted in Fig. 3 for regime A. The height–latitude contour plots are shown for the horizontal wind components u and υ, vertical velocity w, potential temperature θ, pressure p, and total heating H, each obtained from its respective Galerkin expansion as detailed in De La Chevrotière and Khouider (2017). The cross sections show the dominant deep tropospheric overturning of the Hadley circulation, with an ascending branch over the WP at 15°N resulting from low-level convergence, and subsidence near 10°S. The upward motion branch of the Hadley cell is associated with a strong deep barotropic heating mode and a stratiform second baroclinic potential temperature mode. The sea level pressure drops significantly moving northward through the ITCZ, a characteristic of the monsoon trough. The low-level wind displays the turning of the equatorial easterlies to westerlies south of the pressure trough and then back to easterlies, similar to the mean monsoonal flow of the boreal summer season.

Mean meridional circulation averaged over the training interval for regime A. The top of the ABL (solid black line) is located at height 0 km. The contours represent the indicated fields, and the arrows are the velocity vector field (υ, w). (clockwise) Zonal and meridional winds, potential temperature, total heating, vertical velocity, and pressure.
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

The Hovmöller diagrams of the wave fluctuations about the mean solutions are shown for regime A in Fig. 4. The latitude–time contours of

The 25-day Hovmöller plots for regime A. (clockwise) ABL equivalent potential temperature, free-tropospheric moisture, meridional barotropic wind; and stratiform, deep, and congestus cloud area fractions.
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

The mean meridional circulation and Hovmöller plots for regime B are shown in Figs. 5 and 6, respectively. As mentioned before, the cloud transition time scales of regime B are larger than those of regime A by one standard deviation. This positive bias in the transition time scales has an impact on the wave disturbances: cloud systems are now larger and persist over several days, while their period of oscillation is on the order of 10 days. The mean meridional circulation shows reduced low-level convergence and damped upward motion over the WP region. The convective total heating also appears diminished throughout the domain.

As in Fig. 3, but for regime B.
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1


As in Fig. 4, but for regime B.
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

3. Data assimilation methodology and experimental design






In theory, for linear models with Gaussian errors, the EnKF converges to the (exact) KF solution in the limit of large ensemble size,
In this paper, we address the spurious correlation issue with the data-driven technique proposed by De La Chevrotière and Harlim (2017). Instead of tuning a specified parametric localization function (such as the half-width parameter of the Gaspari–Cohn or exponentially decaying functions), this method uses a pair of labeled training datasets to estimate a linear map, called the localization map, that transforms the poorly estimated sample correlation into an improved correlation. This training methodology is an example of what the machine learning community calls supervised learning (see, e.g., Hastie et al. 2009). Here, the localization map is implemented within the sequential least squares framework of Anderson (LS-EnKF; Anderson 2003), a serial variant of the EnKF that allows observations with independent measurement errors to be assimilated sequentially. Anderson’s scheme breaks down the filtering problem into a sequence of linear regressions of a scalar observation onto the state vector. In this scalar context, the covariances, or correlations, between a single observation and the model state variables appear explicitly and can be easily localized.
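The scalar regression step described above can be sketched as follows. This is a generic illustration in Python, not the authors' code: it uses a deterministic (adjustment-type) update of the observed quantity, and the function names, array shapes, direct observation of a single coordinate, and numerical values are assumptions made for the example. The Gaspari–Cohn weights stand in for the localization; a trained localization map would act on the sample correlation at the same place in the update.

```python
import numpy as np

def gaspari_cohn(dist, half_width):
    """Fifth-order piecewise rational function of Gaspari and Cohn (1999).
    Equals 1 at zero distance and vanishes beyond twice the half-width."""
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / half_width
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    w[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - (5.0 / 3.0) * ri**2 + 1.0
    w[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3 + (5.0 / 3.0) * ro**2
                - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w

def serial_update(ens, obs_ens, y_obs, obs_var, loc):
    """Assimilate one scalar observation into the ensemble.

    ens     : (n_state, n_members) prior state ensemble
    obs_ens : (n_members,) prior ensemble mapped to observation space
    y_obs   : observed value; obs_var : observation error variance
    loc     : (n_state,) localization weights applied to the regression
    """
    n_members = obs_ens.size
    y_mean = obs_ens.mean()
    y_var = obs_ens.var(ddof=1)
    # Scalar Kalman update of the observed quantity (mean shift plus contraction).
    post_var = 1.0 / (1.0 / y_var + 1.0 / obs_var)
    post_mean = post_var * (y_mean / y_var + y_obs / obs_var)
    obs_post = post_mean + np.sqrt(post_var / y_var) * (obs_ens - y_mean)
    dy = obs_post - obs_ens
    # Linear regression of the observation increments onto each state variable.
    anomalies = ens - ens.mean(axis=1, keepdims=True)
    cov_xy = anomalies @ (obs_ens - y_mean) / (n_members - 1)
    return ens + (loc * cov_xy / y_var)[:, None] * dy[None, :]

# Illustrative use: 40 coordinates, 20 members, coordinate 10 observed directly.
rng = np.random.default_rng(1)
n_state, n_members = 40, 20
ens = rng.standard_normal((n_state, n_members))
obs_ens = ens[10]
loc = gaspari_cohn(np.arange(n_state) - 10, half_width=3.0)
post = serial_update(ens, obs_ens, y_obs=0.5, obs_var=0.25, loc=loc)
```

Observations with independent errors would be processed one at a time by calling `serial_update` repeatedly, with the observation-space ensemble recomputed from the updated state before each call.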
In the remainder of this section, we provide a brief review of the LS-EnKF in section 3a and of the localization map technique in section 3b. In section 3c, we introduce the idealized radiative transfer model for synthetic satellite observations and conclude with the experimental design in section 3d.
a. Least squares EnKF algorithm






















b. Localization mappings and modified LS-EnKF










The idea behind the localization mappings is to obtain a map

























One way to approximate the solution of the minimization problem in (7) is to discretize the cost function using samples from





























c. An idealized radiative transfer model

















We wish to simulate channels whose weighting function peaks at a specific height

(left) Optical thickness
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1










Nondimensional brightness temperature
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

d. Experimental design
The monsoon–Hadley multicloud model described in section 2, coupled with the modified LS-EnKF given by (4) and (9), forms our assimilation scheme. We conduct OSSEs in both the perfect- and imperfect-model scenarios. In the perfect-model scenario, both the nature and forecast states are generated using the same model configuration (regime A; identical twin experiments). In the imperfect-model scenario, the nature run is generated using the reference parameters of regime A while the forecast model uses the parameters of regime B, as discussed in section 2b.
The analysis is performed over the 256 internal grid points of the meridional numerical domain, on the 14 large-scale fields
The observations are generated from the nature run according to (2b), where the observation model G is the idealized radiative transfer model described in the previous section, and the observation error covariance matrix
The correlation data used to train the localization maps are obtained by running an OSSE using

Contour plot of the vector localization map for the correlations between the model coordinates i of the field
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

First, notice from Fig. 9 that the structure of the map is not symmetric with respect to the observation location. Second, the function value of the map for larger
Although the focus will be on the vector maps obtained with

The GC function and localization map for the correlation between the observed brightness temperature of channel 1 observed at three different locations (roughly 14°S, 0°, and 12°N) and the model variables (a)
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1







4. Numerical results
We now present the numerical results for OSSEs realized in the perfect- and imperfect-model scenarios. In the perfect-model experiments, both the truth and forecast states are simulated with regime A. In the imperfect-model experiments, the forecast model and the truth are simulated using regimes B and A, respectively (see Table 2 for details). Recall that regime B differs from regime A by its slightly larger convective parameterization time scales. The goal of the imperfect-model experiments is to test the robustness of the localization maps in the presence of model error.
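The figures in this section report the time mean of the analysis RMSE for each analyzed field. One common convention, assumed here for illustration only, is to compute the RMSE between the analysis ensemble mean and the truth over the grid points at each analysis cycle and then to average over cycles. The Python sketch below follows that convention; the array layout, the synthetic data, and the function name `time_mean_rmse` are hypothetical.

```python
import numpy as np

def time_mean_rmse(analysis_mean, truth):
    """Time-averaged spatial RMSE per field.

    Both inputs have shape (n_cycles, n_fields, n_grid).
    """
    err2 = (analysis_mean - truth) ** 2
    rmse_per_cycle = np.sqrt(err2.mean(axis=2))  # (n_cycles, n_fields)
    return rmse_per_cycle.mean(axis=0)           # (n_fields,)

# Synthetic check: 14 fields on 256 grid points with a constant offset per field,
# so the score of each field should equal its offset.
rng = np.random.default_rng(2)
truth = rng.standard_normal((50, 14, 256))
offsets = np.linspace(0.1, 1.4, 14)
analysis_mean = truth + offsets[None, :, None]
scores = time_mean_rmse(analysis_mean, truth)
```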











Average relative degradation as a function of the parameter ρ for the perfect-model and model error experiments.
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

a. Perfect-model experiments
We first realize a perfect-model experiment using 1000 members (TD_PM), calculating the background correlation at each analysis cycle. Using this dataset we obtain two different maps: 1) a scalar map with

Perfect-model experiments. Log-linear plot of the time mean of analysis RMSE as a function of ensemble size is shown for the 14 analyzed fields, for the verification experiments using the localization maps PM with
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

Comparing the scalar map PM
b. Model error experiments
We next investigate the performance of the localization maps in the presence of model error. We first run a model error experiment using 1000 members, producing the training dataset TD_ME. We train two different maps on TD_ME’s background correlations to be used in verification model error experiments (labeled ME2): 1) A scalar map (
As a reference, we show the model error experiment ME1 using a “perfectly tuned” vector map (
The results of the model error verification experiments using the maps ME1 and ME2 as well as a GC localization (with a half-width equal to 6) are shown in Fig. 13, along with the RMSE of the two reference training experiments TD_PM and TD_ME. The perfect-model experiment PM

Model error experiments. Log-linear plot of the time mean of analysis RMSE as a function of ensemble size is shown for the 14 analyzed fields, for the verification experiments using the localization maps ME1 and ME2. The training experiments TD_ME and TD_PM using 1000 members are reported as baselines (solid blue).
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

While the results from using the vector maps ME2
If a longer dataset is not available, we found numerically that one can also overcome this issue by choosing a different ρ (results not shown). We should also mention that for a given
Alternatively, one can use the maps optimized for smaller ensemble sizes. As a supporting argument, we test the robustness of the localization vector maps (

Model error experiments. Time mean of analysis RMSE as a function of training and verification ensemble sizes for experiment ME2 (
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

The analysis and truth for the experiment ME2

Snapshot of the meridional circulation of (a) the truth and (b) the analysis for the verification experiment ME2 (
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1


25-day Hovmöller plots of (a) the truth and (b) the analysis for the verification experiment ME2 (
Citation: Monthly Weather Review 146, 4; 10.1175/MWR-D-17-0381.1

5. Summary and conclusions
In this paper, we demonstrated the efficacy of the localization maps introduced in De La Chevrotière and Harlim (2017) in a series of OSSEs realized with the monsoon–Hadley multicloud model, an idealized model with roughly 3600 model coordinates for the synoptic-scale Hadley circulation and monsoonal flow. The model features a stochastic parameterization for clouds to represent the subgrid-scale processes of convection and precipitation and a bulk boundary layer dynamical model. We implemented the localization maps in a serial EnKF to assimilate satellite-like nonlinear indirect observations using an idealized radiative transfer model. We took vertically integrated brightness temperature measurements on six different channels over a sparse observational network (the total number of observations is close to 400).
From the perfect-model configuration, we learn that the data-driven localization map with small ensemble sizes of order
We also tested the proposed localization mapping in the presence of model error arising from misspecification of the convective time scales, which impacts the stochastic dynamics of the cloud area fractions and in turn affects the large scales through the convective closure of the model. In this scenario, we found that the filter performance using the localization maps obtained from imperfect-model training data is almost identical to that using the localization maps obtained from perfect-model training data. In some variables (
Closer inspection reveals that the proposed least squares fitting in (8) is an ill-conditioned problem with condition numbers as large as
The numerical results from the scalar map
The encouraging results in this paper suggest that this data-driven localization mapping is scalable to high-dimensional applications, where it could replace the usual distance-based parametric localization functions that are designed for local spatial correlations. The nonparametric nature of this approach allows the data to flexibly determine the appropriate nontrivial shape of the localization maps, including various nonlocal correlation dependences, which are usually ignored by standard localization. One potential issue is the availability of a high-quality training dataset, since generating training data without any localization (as done in this paper) is not possible for atmospheric global circulation models at this point. However, one can train the localization maps using empirical correlations obtained from large-ensemble data assimilation simulations with a very broad localization range, such as those demonstrated in Miyoshi et al. (2014). Another issue is the availability of a high-quality training dataset in the presence of more severe model error, beyond the parameter misspecification considered in this paper. In this situation, one may need more advanced model error estimation techniques (Harlim 2017) to generate a reliable training dataset. Another challenge in the operational setting is that the atmospheric dynamics are intermittent and seasonal. In addition, various types of observations are usually assimilated. It remains to be seen whether the idea in this paper can be used to train localization maps for observations that have nontrivial nonlocal correlation structures, and whether the resulting improvement is substantial enough to offset the cost of the training procedure.
Acknowledgments
The authors thank Dr. Peter Houtekamer for his careful reading of the manuscript and insightful comments. The research of J. H. is partially supported by the ONR Grant N00014-16-1-2888 and the NSF Grants DMS-1317919 and DMS-1619661.
REFERENCES
Abhik, S., M. Halder, P. Mukhopadhyay, X. Jiang, and B. N. Goswami, 2013: A possible new mechanism for northward propagation of boreal summer intraseasonal oscillations based on TRMM and MERRA reanalysis. Climate Dyn., 40, 1611–1624, https://doi.org/10.1007/s00382-012-1425-x.
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903, https://doi.org/10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.
Anderson, J. L., 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131, 634–642, https://doi.org/10.1175/1520-0493(2003)131<0634:ALLSFF>2.0.CO;2.
Anderson, J. L., 2007a: An adaptive covariance inflation error correction algorithm for ensemble filters. Tellus, 59A, 210–224, https://doi.org/10.1111/j.1600-0870.2006.00216.x.
Anderson, J. L., 2007b: Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. Physica D, 230, 99–111, https://doi.org/10.1016/j.physd.2006.02.011.
Anderson, J. L., 2012: Localization and sampling error correction in ensemble Kalman filter data assimilation. Mon. Wea. Rev., 140, 2359–2371, https://doi.org/10.1175/MWR-D-11-00013.1.
Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758, https://doi.org/10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.
Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.
Chen, Y., and D. S. Oliver, 2010: Cross-covariances and localization for EnKF in multiphase flow data assimilation. Comput. Geosci., 14, 579–601, https://doi.org/10.1007/s10596-009-9174-6.
De La Chevrotière, M., 2015: Stochastic and numerical models for tropical convection and Hadley-monsoon dynamics. Ph.D. thesis, University of Victoria, 233 pp.
De La Chevrotière, M., and J. Harlim, 2017: A data-driven method for improving the correlation estimation in serial ensemble Kalman filters. Mon. Wea. Rev., 145, 985–1001, https://doi.org/10.1175/MWR-D-16-0109.1.
De La Chevrotière, M., and B. Khouider, 2017: A zonally symmetric model for the monsoon-Hadley circulation with stochastic convective forcing. Theor. Comput. Fluid Dyn., 31, 89–110, https://doi.org/10.1007/s00162-016-0407-8.
De La Chevrotière, M., B. Khouider, and A. J. Majda, 2016: Stochasticity of convection in Giga-LES data. Climate Dyn., 47, 1845–1861, https://doi.org/10.1007/s00382-015-2936-z.
Demmel, J. W., 1997: Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, xi + 416 pp., https://doi.org/10.1137/1.9781611971446.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143–10 162, https://doi.org/10.1029/94JC00572.
Furrer, R., and T. Bengtsson, 2007: Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivar. Anal., 98, 227–255, https://doi.org/10.1016/j.jmva.2006.08.003.
Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, https://doi.org/10.1002/qj.49712555417.
Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790, https://doi.org/10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.
Harlim, J., 2017: Model error in data assimilation. Nonlinear and Stochastic Climate Dynamics, C. L. E. Franzke and T. J. O’Kane, Eds., Cambridge University Press, 276–317, https://doi.org/10.1017/9781316339251.011.
Hastie, T., R. Tibshirani, and J. Friedman, 2009: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Vol. 1, Springer Series in Statistics, Springer, 745 pp., https://doi.org/10.1007/978-0-387-84858-7.
Hotelling, H., 1953: New light on the correlation coefficient and its transforms. J. Roy. Stat. Soc., 15B (2), 193–232.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811, https://doi.org/10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.
Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 3269–3289, https://doi.org/10.1256/qj.05.135.
Houtekamer, P. L., and F. Zhang, 2016: Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 144, 4489–4532, https://doi.org/10.1175/MWR-D-15-0440.1.
Janjić, T., D. McLaughlin, S. E. Cohn, and M. Verlaan, 2014: Conservation of mass and preservation of positivity with ensemble-type Kalman filter algorithms. Mon. Wea. Rev., 142, 755–773, https://doi.org/10.1175/MWR-D-13-00056.1.
Johnson, R. H., T. M. Rickenbach, S. A. Rutledge, P. E. Ciesielski, and W. H. Schubert, 1999: Trimodal characteristics of tropical convection. J. Climate, 12, 2397–2418, https://doi.org/10.1175/1520-0442(1999)012<2397:TCOTC>2.0.CO;2.
Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. J. Basic Eng., 82, 35–45, https://doi.org/10.1115/1.3662552.
Katsoulakis, M. A., A. J. Majda, and D. G. Vlachos, 2003: Coarse-grained stochastic processes and Monte Carlo simulations in lattice systems. J. Comput. Phys., 186, 250–278, https://doi.org/10.1016/S0021-9991(03)00051-2.
Khouider, B., and A. J. Majda, 2006a: Model multi-cloud parameterizations for convectively coupled waves: Detailed nonlinear wave evolution. Dyn. Atmos. Oceans, 42, 59–80, https://doi.org/10.1016/j.dynatmoce.2005.12.001.
Khouider, B., and A. J. Majda, 2006b: Multicloud convective parametrizations with crude vertical structure. Theor. Comput. Fluid Dyn., 20, 351–375, https://doi.org/10.1007/s00162-006-0013-2.
Khouider, B., and A. J. Majda, 2006c: A simple multicloud parameterization for convectively coupled tropical waves. Part I: Linear analysis. J. Atmos. Sci., 63, 1308–1323, https://doi.org/10.1175/JAS3677.1.
Khouider, B., and A. J. Majda, 2007: A simple multicloud parameterization for convectively coupled tropical waves. Part II: Nonlinear simulations. J. Atmos. Sci., 64, 381–400, https://doi.org/10.1175/JAS3833.1.
Khouider, B., and A. J. Majda, 2008: Multicloud models for organized tropical convection: Enhanced congestus heating. J. Atmos. Sci., 65, 895–914, https://doi.org/10.1175/2007JAS2408.1.
Khouider, B., A. J. Majda, and M. A. Katsoulakis, 2003: Coarse-grained stochastic models for tropical convection and climate. Proc. Natl. Acad. Sci. USA, 100, 11 941–11 946, https://doi.org/10.1073/pnas.1634951100.
Khouider, B., J. Biello, and A. J. Majda, 2010: A stochastic multicloud model for tropical convection. Commun. Math. Sci., 8, 187–216, https://doi.org/10.4310/CMS.2010.v8.n1.a10.
Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP—A comparison with 4D-Var. Quart. J. Roy. Meteor. Soc., 129, 3183–3203, https://doi.org/10.1256/qj.02.132.
Lorenz, E. N., 1996: Predictability: A problem partly solved. Proc. Seminar on Predictability, Vol. 1, Shinfield Park, Reading, United Kingdom, 18 pp., https://www.ecmwf.int/sites/default/files/elibrary/1995/10829-predictability-problem-partly-solved.pdf.
Mapes, B., S. Tulich, J. Lin, and P. Zuidema, 2006: The mesoscale convection life cycle: Building block or prototype for large-scale tropical waves? Dyn. Atmos. Oceans, 42, 3–29, https://doi.org/10.1016/j.dynatmoce.2006.03.003.
Miyoshi, T., K. Kondo, and T. Imamura, 2014: The 10,240-member ensemble Kalman filtering with an intermediate AGCM. Geophys. Res. Lett., 41, 5264–5271, https://doi.org/10.1002/2014GL060863.
Oke, P. R., P. Sakov, and S. P. Corney, 2007: Impacts of localisation in the EnKF and EnOI: Experiments with a small model. Ocean Dyn., 57, 32–45, https://doi.org/10.1007/s10236-006-0088-8.
Strang, G., 1968: On the construction and comparison of difference schemes. SIAM J. Numer. Anal., 5, 506–517, https://doi.org/10.1137/0705041.
Tardif, R., G. J. Hakim, and C. Snyder, 2014: Coupled atmosphere–ocean data assimilation experiments with a low-order climate model. Climate Dyn., 43, 1631–1643, https://doi.org/10.1007/s00382-013-1989-0.
Waite, M. L., and B. Khouider, 2009: Boundary layer dynamics in a simple model for convectively coupled gravity waves. J. Atmos. Sci., 66, 2780–2795, https://doi.org/10.1175/2009JAS2871.1.
Waller, J. A., S. L. Dance, A. S. Lawless, and N. K. Nichols, 2014: Estimating correlated observation error statistics using an ensemble transform Kalman filter. Tellus, 66A, 23294, https://doi.org/10.3402/tellusa.v66.23294.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924, https://doi.org/10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.