A one-step blended soundproof-compressible model with balanced data assimilation: theory and idealised tests



Abstract
A challenge arising from the local Bayesian assimilation of data in an atmospheric flow simulation is the imbalances it may introduce. Acoustic fast-mode imbalances of the order of the slower dynamics can be negated by employing a blended numerical model with seamless access to the compressible and the soundproof pseudo-incompressible dynamics. Here, the blended modelling strategy by Benacchio et al., MWR, vol. 142 (2014) is upgraded in an advanced numerical framework and extended with a Bayesian local ensemble data assimilation method. Upon assimilation of data, the model configuration is switched to the pseudo-incompressible regime for one time-step. After that, the model configuration is switched back to the compressible model for the duration of the assimilation window. The switching between model regimes is repeated for each subsequent assimilation window. An improved blending strategy for the numerical model ensures that a single time-step in the pseudo-incompressible regime is sufficient to suppress imbalances coming from the initialisation and data assimilation. This improvement is based on three innovations: (i) the association of pressure fields computed at different stages of the numerical integration with actual time levels; (ii) a conversion of pressure-related variables between the model regimes derived from low Mach number asymptotics; and (iii) a judicious selection of the pressure variables used in converting numerical model states when a switch of models occurs. Idealised two-dimensional travelling vortex and buoyancy-driven bubble convection experiments show that acoustic imbalances arising from data assimilation can be eliminated by using this blended model, thereby achieving balanced analysis fields.
Significance statement
Weather forecasting models use a combination of physics-based algorithms and meteorological measurements. A problem with combining outputs from the model with measurements of the atmosphere is that insignificant signals may generate noise and compromise the physical soundness of weather-relevant processes. By selecting atmospheric processes through the toggling of parameters in a mixed model, we propose to suppress the undesirable signals in an efficient way and retain the physical features of solutions produced by the model. The approach is validated here for acoustic imbalances using a compressible/pseudo-incompressible model pair. This development has the potential to improve the techniques used to bring observations into models and with them the quality of atmospheric model output.

Motivation
Dynamical processes in the atmosphere evolve on a range of spatiotemporal scales, most comprehensively expressed by the full compressible flow equations. Limit regimes, derived from the full compressible flow equations by scale analysis and asymptotics, describe reduced dynamics, examples being the soundproof anelastic and pseudo-incompressible models traditionally used at small-to-mesoscales, and the hydrostatic primitive equations at large to planetary scales [Pedlosky, 2013, Vallis, 2017, Klein, 2010].
To access the dynamics of the full compressible flow equations and of their limit regimes, separate numerical schemes can be developed for each of the limiting models. From a computational perspective, however, the discrepancies between numerical solutions of different equation sets obtained by essentially the same numerical scheme can be substantially smaller than the discrepancies associated with the solution of one and the same equation set by different numerical schemes [Smolarkiewicz and Dörnbrack, 2008, Klein, 2009]. Benacchio et al. [2014] and, separately, Smolarkiewicz et al. [2014] developed discretisation schemes for the compressible equations that allow access to the pseudo-incompressible model within a single numerical framework, showing equivalent results of both configurations in small-to-mesoscale tests involving acoustically balanced flows. The blended analytical and numerical framework in Benacchio et al. [2014], within which the compressible to pseudo-incompressible transition is realised as a continuum of models controlled by an appropriate blending parameter, was conceptually extended in Klein and Benacchio [2016] to include access to hydrostatic models. Benacchio and Klein [2019] then proposed a numerical implementation and achieved equivalence of hydrostatic and nonhydrostatic model solutions on large scales in the absence of vertically propagating acoustic modes.
Balanced data assimilation provides a key motivation for blended numerical models. A problem with local data assimilation is the imbalance that it may induce [Lorenc, 2003]. As the assimilation procedure does not take heed of specific characteristics of a flow, such as conservation of mass, momentum, and energy, or of particular smoothness properties, the initial balance of a flow state may be destroyed by the assimilation procedure, see Neef et al. [2006] and more specifically Greybush et al. [2011], Bannister [2015] on the effects of localisation on balanced analysis fields.
Physically, local data assimilation in a compressible framework can introduce imbalances through fast acoustic modes with velocity amplitudes that may be of the same order of magnitude as the velocities found in the slowly evolving balanced dynamics of interest, with potentially destructive effects on overall solution quality [Hohenegger and Schär, 2007]. A blended soundproof-compressible model can be judiciously employed to counteract this effect. Imbalances inherent in the initial pressure fields can be effectively reduced by solving the initial time-steps of a simulation in the pseudo-incompressible regime so that, upon the subsequent transition to the compressible regime over several further time-steps, the pressure field is balanced with respect to the initial velocity and potential temperature fields [Benacchio et al., 2014]. More specifically, the blending method leverages a discrete orthogonal projection onto the space of pseudo-incompressible solutions. Therefore, the blending scheme used in an ensemble data assimilation framework yields the ensemble of balanced solutions closest to the analysis ensemble, measured in a norm weighted by the mass-weighted potential temperature. By extension of this insight, when mounting data assimilation on the numerics, a projection of the solution onto the soundproof pseudo-incompressible model can suppress the fast acoustic modes arising from the assimilation procedure. After suppression of the fast modes, the remaining time-steps until the next assimilation procedure are solved with the compressible model. As this method makes use of the different dynamics modelled by the compressible and soundproof equation sets, it fundamentally deviates from existing methods to handle initialisation problems such as the post-analysis digital filter [DFI, e.g., Lynch and Huang, 1992] and the incremental analysis update [IAU, Bloom et al., 1996].
These techniques act as low-pass filters, and repeated application of the filter may have undesirable effects on long-term dynamics [Houtekamer and Zhang, 2016, Polavarapu et al., 2004].
Balance was also shown to improve with the choice of localisation space [Kepert, 2009] and by allowing observations outside of a localisation radius to relax to a climatological mean [Flowerdew, 2015]. Hastermann et al. [2021] compared the effects of the blending approach with those of the post-analysis penalty method in achieving balanced analysis fields for highly oscillatory systems and found comparable improvements for both methods in the case of nonlinear balance relations. See also Zupanski [2009] and Houtekamer and Zhang [2016] for reviews of balanced atmospheric data assimilation.

Contributions
This paper proposes a dynamics-driven method to achieve balanced data assimilation using a blended numerical framework with the following advances:
• One-step blending of the pseudo-incompressible and compressible models by instantaneous switching. This is achieved by (a) accounting for the fact that Exner pressure fields computed at comparable stages within a time-step correspond to different time levels in the compressible and soundproof model; (b) judiciously converting the thermodynamic variables between the compressible and soundproof models, motivated by low Mach number asymptotic arguments; and (c) carefully selecting, based on (a) and (b), the pressure variables used in converting numerical model states at the blending time interfaces. One-step blending is a sizeable improvement over Benacchio et al. [2014], who needed several intermediate time-steps for the blending procedure.
• Exploitation of the blended framework for balanced ensemble data assimilation. We employ an untuned data assimilation scheme that is known to introduce imbalances. After each assimilation of data, a single time-step in the pseudo-incompressible model configuration is used to suppress the fast acoustic imbalances. The model configuration is then switched back to the compressible model. In the reported idealised experiments, balanced analysis fields are obtained by combining data assimilation and blending, thus verifying the ability of the blended model to handle imbalances consistently with the underlying compressible and soundproof dynamics.
The effects of data assimilation and blending on balanced solutions are investigated in the two-dimensional numerical experiments of a travelling vortex and of a rising thermal in a vertical slice [see Kadioglu et al., 2008, Mendez-Nunez and Carroll, 1994, Klein, 2009]. For these tests, unbalanced and untuned data assimilation is shown here to destroy solution quality, while the use of blending effectively recovers the structure of the solution as evaluated by comparison with runs without data assimilation. Moreover, with the balanced data assimilation procedure, the solution quality of the observed quantities is maintained or improved independently of the size of the localisation region, which is an important tunable parameter of many sequential data assimilation procedures. The imbalances introduced by data assimilation in these idealised test cases can be quantified by scale analysis.
The paper is structured as follows. Section 2 contains a brief introduction to data assimilation and the Kalman filters considered here. Section 3 reviews the blended numerical framework. Section 4 proposes the new blending scheme and section 5 details the results of numerical experiments. The effectiveness of the one-step blended soundproof-compressible scheme is investigated for balanced data initialisation in section 5.1 and its application towards balanced data assimilation in section 5.3. Section 6 contains the discussion and conclusions.
2 Data assimilation: a quick primer

Data assimilation is used in numerical weather prediction to improve forecasting. Existing approaches include 4D-Var, which optimises model states over a finite time horizon in the past before launching a new prediction, and sequential assimilation procedures, which assimilate the available observations at specific points in time. Here we focus on the latter, which are more susceptible to the problem of imbalances addressed in this paper due to the local nature of these methods, especially when the localisation is severe [Cohn et al., 1998, Mitchell et al., 2002].
Modern weather forecasting techniques aim to represent the uncertainty of a forecast by generating an ensemble of likely candidates of model states. Such an ensemble can be understood as an approximate representation of a probability distribution over model states. The task of sequential data assimilation is then as follows. Suppose we are given the probabilistic weight of each ensemble member at a previous instance in time, i.e., at the beginning of the current simulation window, together with the forward simulation states of all ensemble members at the current time, i.e., at the end of the simulation window. Then the prior probability distribution pdf prior is represented by the model states at the new time level together with their probabilistic weights inherited from the beginning of the simulation window. Now we are to readjust the current states or the probabilistic weights of the ensemble members, at fixed time, such that the resulting posterior probability distribution pdf post best reflects the observations that have arrived during the simulation window.
The connection between pdf_prior and pdf_post can be established in a Bayesian framework. For this purpose we assume no model error and take x_truth to be a perfectly resolved, true model state. We denote the observation operator by H. Then, observations that have arrived during the simulation window are subject to Gaussian distributed noise and satisfy y_obs = H(x_truth) + η with Gaussian noise η. Bayes' theorem now gives

pdf_post(x) = pdf(x | y_obs) = [pdf(y_obs | x) / pdf(y_obs)] pdf_prior(x).
Here pdf(x|y obs ) is the conditional probability of state x given the observations y obs and pdf(y obs |x) is the probability of observation y obs given the state x. The right-hand side of this equation is computable given the information before the data assimilation step, noting that the best available estimate of pdf(y obs ) is the expectation of pdf(y obs |x) with respect to x under the prior probability distribution. See Wikle and Berliner [2007], Reich and Cotter [2013] for more details on Bayesian data assimilation.
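For concreteness, the Bayesian reweighting just described can be sketched for a small ensemble with a scalar Gaussian likelihood (an illustrative sketch only; the function and its arguments are not the paper's implementation):

```python
import numpy as np

def bayes_reweight(ens_obs, weights_prior, y_obs, r_var):
    """Posterior weights via Bayes' theorem: proportional to likelihood times
    prior; pdf(y_obs) is estimated as the prior expectation of the likelihood."""
    lik = np.exp(-0.5 * (ens_obs - y_obs) ** 2 / r_var)  # Gaussian pdf(y_obs | x_k)
    evidence = np.sum(weights_prior * lik)               # estimate of pdf(y_obs)
    return weights_prior * lik / evidence                # posterior weights

# three members seen through the observation operator, uniform prior weights
w_post = bayes_reweight(np.array([0.0, 1.0, 2.0]),
                        np.array([1 / 3, 1 / 3, 1 / 3]),
                        y_obs=1.0, r_var=0.5)
```

The posterior weights sum to one and concentrate on the members closest to the observation.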

The Kalman filters
Kalman filters are a family of popular Bayesian data assimilation methods [Kalman, 1960] that assume a Gaussian shape for all probability densities, so that they can be fully characterised by their means and covariance matrices. Identifying the prior with the term forecast (f) and the posterior with the term analysis (a), the Kalman filter reads

x^a = x^f + K (y_obs − H x^f), (4a)
P^a = (I − K H) B, (4b)
K = B H^T (H B H^T + R)^{−1}, (4c)

where B ∈ R^{m×m} and R ∈ R^{l×l} are the covariance matrices associated with the forecast and the observations, respectively. K is the Kalman gain, which rewards the forecast if B ≪ R and penalises it if R ≪ B. A class of Monte Carlo-based Kalman filters, the ensemble Kalman filters, avoids the problem of high dimensionality by approximating the underlying probability density functions through the empirical distributions given by an ensemble of individual simulation states [Reich and Cotter, 2015]. As a consequence, ensemble-based methods are often computationally more efficient than any scheme that aims to explicitly describe entire probability density functions.
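The gain, mean update, and covariance update can be written in a few lines of NumPy (a minimal dense-matrix sketch for illustration; operational systems never form these matrices explicitly):

```python
import numpy as np

def kalman_analysis(x_f, B, y_obs, H, R):
    """One Kalman analysis step: gain, mean update, covariance update."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)  # Kalman gain
    x_a = x_f + K @ (y_obs - H @ x_f)             # forecast corrected by innovation
    B_a = (np.eye(len(x_f)) - K @ H) @ B          # analysis covariance
    return x_a, B_a

# two-component state, only the first component observed
x_a, B_a = kalman_analysis(x_f=np.zeros(2), B=np.eye(2),
                           y_obs=np.array([1.0]),
                           H=np.array([[1.0, 0.0]]),
                           R=np.array([[1.0]]))
```

With equal forecast and observation covariances, the analysis mean lands half-way between forecast and observation in the observed component, while the unobserved component is untouched.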
Specifically, for an ensemble of size K, the ensemble forecast is {x^f_1, . . . , x^f_K} and the ensemble's parametric information specifying its probability distribution is updated analogously to (4), where x̄^{a/f} is the ensemble mean and P^{a/f}_K ∈ R^{K×K} is the covariance associated with the ensemble.
A drawback to the ensemble Kalman filter is that the covariance is determined by the spread of the ensemble and is therefore typically underestimated. To mitigate this, ensemble inflation can be applied by multiplying the ensemble covariance by a constant factor larger than 1. This increases the covariance in the direction of the ensemble spread [Anderson, 2007, Van Leeuwen et al., 2015].
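Multiplicative inflation amounts to rescaling the ensemble anomalies about their mean (a small sketch; scaling the anomalies by the square root of the factor is what multiplies the covariance by the factor itself):

```python
import numpy as np

def inflate(ensemble, factor):
    """Multiply the ensemble covariance by `factor` (> 1) by scaling the
    anomalies about the ensemble mean with sqrt(factor)."""
    mean = ensemble.mean(axis=0)
    return mean + np.sqrt(factor) * (ensemble - mean)

ens = np.array([[0.0], [2.0]])   # two members, scalar state
ens_inf = inflate(ens, 4.0)      # spread doubles, covariance quadruples
```

The ensemble mean is preserved exactly; only the spread grows.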
This paper uses the local ensemble transform Kalman filter (LETKF) data assimilation method [Hunt et al., 2007] based on the ensemble square-root filter (ESRF). The LETKF localises the observation covariance in such a way that observations farther away from the grid point under analysis have less influence, tapering off to zero influence for observations outside of a prescribed observation radius. The algorithm for the LETKF is provided in Appendix A.
Localisation prevents spurious correlations of faraway observations while potentially reducing the complexity of the problem by making the observation covariance matrix closer to diagonal [Hamill et al., 2001, Houtekamer and Mitchell, 1998]. After localisation, the analysis is only performed on a smaller local region, and the global analysis ensemble comprises different linear combinations of the ensemble members in each of these local regions. This allows the ensemble to represent a higher-dimensional space than one constrained by the ensemble size [Fukumori, 2002, Mitchell et al., 2002]. A smaller ensemble size may necessitate more severe localisation.
When applying the LETKF, there are two potential sources for imbalances. In the case of a nonlinear balance relation, the LETKF fails to recover the desired balance due to its local linear construction. Even without localisation and for a given observation, the analysis ensemble of the ESRF is obtained as a linear combination of the forecast ensemble. In the case of linear balances, the situation is more subtle. On one hand the ESRF is capable of resolving linear balances due to its linear construction. On the other hand the LETKF, utilising localisation, does not act as a linear map on the global fields and therefore does not necessarily preserve the balance relation. Numerical experiments in this paper investigate imbalances arising from both these sources. A smooth localisation function, such as the truncated Gaussian function or the Gaspari and Cohn [1999] function, may be used to keep the resulting field sufficiently smooth.
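A widely used smooth, compactly supported taper is the fifth-order piecewise rational function of Gaspari and Cohn [1999]; a sketch of that standard form follows (the support half-width c is a tunable parameter; whether this exact form matches the experiments' localisation is an assumption):

```python
def gaspari_cohn(z, c):
    """Gaspari-Cohn localisation taper: 1 at distance 0, smoothly
    decreasing, and exactly 0 beyond distance 2c."""
    r = abs(z) / c
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - 5.0 / 3.0 * r**2 + 1.0
    if r <= 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + 5.0 / 3.0 * r**2
                - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0
```

Observations are weighted by this factor as a function of their distance from the grid point under analysis, tapering smoothly to zero influence at twice the half-width.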
3 The blended numerical model

3.1 Governing equations
In a rotating three-dimensional Cartesian domain, the adiabatic, dry compressible fluid flow equations for an ideal gas under gravity are given by system (5), where ρ is the density, u = (u, v) the vector of horizontal velocities and w the vertical velocity, P is the mass-weighted potential temperature and π is the Exner pressure. f is the Coriolis parameter on the horizontal (x, y)-plane, k a unit vector in the vertical direction and × the cross product. g is the acceleration of gravity acting in the direction of k. ◦ denotes the tensor product, ∇ denotes the horizontal gradient, while the subscripts t and z denote the partial derivatives with respect to time t and the vertical coordinate z. π and P are related to the thermodynamic pressure p by the equation of state, (6) and (7), where p_ref is a reference pressure, c_p and c_v are the heat capacities at constant pressure and constant volume, R = c_p − c_v is the ideal gas constant, and Θ is the potential temperature. The parameter α_P tunes between the compressible and the pseudo-incompressible model [Durran, 1989, Klein et al., 2010]. Identifying χ with the inverse potential temperature, the Exner pressure and inverse potential temperature can be decomposed as π = π̄ + π′ (8a) and χ = χ̄ + χ′ (8b), where the bar denotes a hydrostatic background state, which depends only on the vertical coordinate, and the prime denotes the perturbation. Rewriting (5) with (7) yields (9). Using the notation of Smolarkiewicz et al. [2014] and Benacchio and Klein [2019], with Ψ = (χ, χu, χw, χ′), (9) can be written compactly as (11), where v = (u, v, w) subsumes the three-dimensional velocity fields, A(Ψ; Pv) denotes the advection of the quantity Ψ given the advective fluxes Pv, and Q(Ψ; P) describes the effect of the right-hand side of (9) on Ψ given P. From (6), P is a function of π only,

P = (p_ref/R) π^{1/(γ−1)}, (12)

where γ = c_p/c_v is the isentropic exponent.
With (12), (11b) becomes (13).

3.2 Summary of the numerical scheme

Equation (9d) is discretised in time with an implicit midpoint method, (14). In order to obtain the advective fluxes at the half time-level, the time update for equations (11) is split into advective and non-advective terms. The advection terms on the left are updated by (15), where ∇ is the discrete divergence and A_1st is an advection scheme corresponding to the half time-level update. The terms on the right are then advanced using an implicit Euler method, (16). Expressions (15) and (16) yield the advective fluxes at the half time-level. Subsequently, the quantities Ψ are updated to the full time-level with an explicit Euler half step followed by a full advection step and a final implicit Euler half step, (17), yielding a second-order accurate one-step method [Benacchio and Klein, 2019, Smolarkiewicz, 1991, Smolarkiewicz and Margolin, 1993]. A first-order Runge-Kutta method is used for the advection operator A^{∆t/2}_{1st} in (15a), while second-order Strang splitting is used for A^{∆t}_{2nd} in (17b). The former is necessary for the time-level analysis in section 4 to hold and the latter guarantees second order in time of the overall scheme. The spatial discretisation of the numerical scheme is based on a finite volume framework; for more details see section 4 in Benacchio and Klein [2019].
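As a toy illustration of the implicit midpoint rule underlying this discretisation, consider the linear ODE y' = λy, for which the implicit step can be solved in closed form (a sketch unrelated to the actual flow solver, shown only to illustrate the second-order, one-step character of the method):

```python
import math

def implicit_midpoint_step(y, lam, dt):
    """Implicit midpoint for y' = lam*y:
    y_new = y + dt*lam*(y + y_new)/2, solved for y_new."""
    return y * (1.0 + 0.5 * dt * lam) / (1.0 - 0.5 * dt * lam)

y1 = implicit_midpoint_step(1.0, lam=-1.0, dt=0.1)  # one step towards e^{-t}
```

The one-step error is O(∆t³), so the scheme is globally second-order accurate, matching the order claimed for the full scheme.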

4 Single time-step soundproof-compressible transition
In the following, a conversion of pressure-related quantities, motivated by low Mach number asymptotics and applied prior to the model transitions, is proposed which allows for model switching within a single time-step.

4.1 Time-level of the pressure-related variables
In the simpler non-rotating case without gravity (g = 0, f = 0), the updates for the momentum equation multiplied by the potential temperature Θ in (16a) and (17c) read as (18) and (19), where the superscript adv denotes the quantity that becomes available after the advection substeps (15a) and (17b). Applying an implicit Euler discretisation to (19), we find (20), where the superscript in denotes the quantities at the time-level corresponding to the start of the time-step and out those at the end. δt ≤ ∆t is an arbitrary time-step.

The compressible equations
For the case α_P = 1, using (12), a discretisation of the left-hand side of (13) yields, at the half time-level, (21), where (∂P/∂π)^# is obtained from P after the advection step at the half time-level, (15). Substituting (21) into (16b), (20) is the solution of the advection terms in (15) with δt = ∆t/2. Identifying out with n + 1/2 and rearranging, (22) becomes (23), which fixes the time-level of π after the half time-step of (15) and (16) at n + 1/2. For the full time-stepping of (17), a similar procedure yields (24) and (25). From (24), π is at time-level n + 1/2 after the half time-stepping (16), while (25) starts with π at time-level n for the full time-stepping (17). Therefore, the time-level of π has to be reset from n + 1/2 to n after the half time-step (16) and before the full time-step (17). Furthermore, the time-level of π after the full time-step (17) is n + 1, as intended.

The pseudo-incompressible equations
For α_P = 0, the coupling between P and π in (13) no longer holds and the two variables decouple, leading to the divergence constraint (26). Enforcing this divergence constraint for the left-hand side of (20), we obtain (27). At the half time-level, (Pv)^in is the solution of (15) comprising the half time-step advection. Therefore, (28) holds, where ∂_t denotes the discrete partial time derivative. Assuming that the divergence constraint (26) has been satisfied at the end of time-step (n − 1), the first term on the right-hand side vanishes. As the second term is generated by (15) starting at time-level n, i.e., by an explicit advection step associated with the left-hand side of (9b) and (9c) multiplied by Θ, cf. (29), equation (28) becomes (30). Inserting (30) back into (27), with δt = ∆t/2 and adv as #, yields (31), where the right-hand side has fixed the time-level of π^out at n. For the full time-stepping, (Pv)^in is the solution of (17b), and so (27) becomes (32) with (33), where the second term in the square brackets is a correction of (Pv) from the implicit substep at the half time-level (16) and the third term is the solution of the advection substep at the full time-level. Assuming again that the divergence constraint (26) has been fulfilled at the onset, ∇·(Pv)^n = 0.
Substituting (31) into (33), and noting that the advection substep (17b) solves the left-hand side of (9b) and (9c) multiplied by Θ, yields (35) and (36), where the half time-level of the second term emerges from the solution of substeps (17a) and (17b) under the advecting fluxes (Pv)^{n+1/2}. Putting (35) and (36) together yields, after some rearrangement, (37).

Figure 1: Summary of the time-levels of π for the pseudo-incompressible (psinc) and the compressible (comp) solutions in the numerical scheme. ∆t indicates the time-step size. The dashed lines relate the π's at the same time-level between the two models. The two valid choices of π in the pseudo-incompressible-to-compressible blending, at times t^n and t^{n+1}, are depicted with arrows.
Inserting (37) back into (32) gives (38), fixing the time-level of π^out at n + 1, since the right-hand side is a half time-step advancement from the n + 1/2 time-level.
In contrast to the compressible case, expressions (31) and (38) imply that the Exner pressure π after the half-step (15) and (16) is at time-level n, and could be used as the input to (17) as an alternative to the Exner pressure obtained at the end of time-step n − 1. With this option, π does not have to be reset to time-level n after the half-time predictor in the pseudo-incompressible solve. Figure 1 summarises the time-level analysis of π.

Conversion of the pressure-related variables
Expression (8a) separates the background Exner pressure from its perturbation. For low Mach number flows, Ma ≪ 1, such a separation is naturally induced by the asymptotic expansion, based on which we can blend the pseudo-incompressible (α_P = 0) and compressible (α_P = 1) models. Using (12), the compressible P_comp is obtained from the pseudo-incompressible model variables as (41). By inverting (41), the pseudo-incompressible P_psinc is obtained as (42). Therefore, at the blending time interfaces between the compressible and the pseudo-incompressible configurations, expression (41) or (42) is applied depending on the direction of the transition.

Association of perturbation variables between the compressible and soundproof models
The time-level analysis of π in section 4.1 demonstrated that, in a pseudo-incompressible solve, both the Exner pressure solution after the full time-step from t^n to t^{n+1} and that obtained after the subsequent half time-step are associated with the same time-level t^{n+1}. Consider then the compressible-to-pseudo-incompressible transition at time n + 1. P^{n+1}_psinc is obtained by inserting π^{n+1}_comp into the right-hand side of (42). Moreover, there are two valid choices for π in a pseudo-incompressible-to-compressible transition (Figure 1): (1) π_full, i.e., π obtained after the full n-to-(n + 1) time-step; or (2) π_half, i.e., π obtained after the (n + 1)-to-(n + 3/2) half time-step. π_half is obtained from the solution of (16) with the solution of (15) as its input. The inputs to (15) are Ψ^n and (Pv)^n. This means that π_half is recovered from the other quantities and is independent of π at the previous time-level, so errors in the initialisation of π are not propagated. By contrast, π_full is obtained from the solution of (17). The explicit step (17a) has π as an input to the right-hand side Q(Ψ^n; P^n). Therefore, π_full propagates errors in the initialisation of π. Note that choice (2) entails solving an additional time-step in the pseudo-incompressible regime to obtain π_half at the pseudo-incompressible-to-compressible blending time interfaces.
In addition, choice (2) offers a conceptual advantage. The Exner pressure field in the pseudo-incompressible model is not controlled by an evolution equation but rather acts as a Lagrange multiplier ensuring compliance of the velocity field with the divergence constraint at some fixed time. Thus, a direct dependence of the pressure on its previous time-level data, as occurs under option (1), is a numerical artefact that should be avoided.

Data assimilation and blending
A data assimilation engine is used to insert observations in the fully compressible configuration of the blended numerical framework. Prior to the assimilation procedure at time t^n, the forecast ensemble state vector {x^f_k}^n for k = 1, . . . , K and a set of observations y^n_obs are available. K is the ensemble size. For vertical slice simulations with the full compressible flow equations, the ensemble state vector is given by (43). Here, the two-dimensional spatial grid has (N_x × N_z) cells and m = (5 × N_x × N_z). The observations of the momentum fields are given by (44), where the subscript obs indicates that the data is obtained externally and is noisy and sparse, and l = (2 × N_obs(n)), with N_obs(n) the time-dependent dimension of the sparse observation space. The observation covariance R^n is determined by the observation noise. The forward observation operator H selects for each {x^f_k}^n in (43) the momenta (ρu)^n and (ρw)^n on the grid points corresponding to the sparse observations, thereby projecting x^{f,n}_k from the state space R^m into the observation space R^l, i.e. (45).
The ensemble mean in observation space is computed as in (46), and a similar ensemble averaging is applied to obtain x̄^{f,n}. As observation localisation is used in the LETKF algorithm [Hunt et al., 2007], only observations in a local region surrounding a given grid point are involved in its update. A localisation function is furthermore applied to the observations in the local region; see section 5.2 for more details on the setup. A Kalman gain K^n similar to (4c) is obtained from the observation operator H, the observation covariance R^n, and an ensemble inflation factor b. As in the right-hand side of (4a), the distance of the forecast ensemble mean from the observations is computed with (44) and (46). From these, a set of K weight vectors {w^a_k}^n is obtained, applied to (43), and added to x̄^{f,n}, updating the forecast ensemble to the analysis ensemble. Further details are given in Appendix A.

Figure 2: Schematic of data assimilation with blending for data assimilated at time t^n. Blending time interfaces are in red, and the time-step spent in the pseudo-incompressible regime is shaded. See main text for full description. Numbers in brackets refer to equations, and (§) denotes the section.
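For illustration, the selection operator H and the ensemble averaging in observation space can be sketched as follows (dimensions and indices are illustrative, not the experiments' configuration):

```python
import numpy as np

def make_H(state_dim, obs_indices):
    """Forward observation operator as a sparse selection matrix: picks the
    observed entries (standing in for momenta at observed grid points)."""
    H = np.zeros((len(obs_indices), state_dim))
    H[np.arange(len(obs_indices)), obs_indices] = 1.0
    return H

# toy ensemble of three states of dimension 6; observe entries 1 and 4
ens = np.stack([np.arange(6.0) + k for k in range(3)])
H = make_H(6, [1, 4])
obs_space_ens = ens @ H.T               # each member mapped to observation space
obs_space_mean = obs_space_ens.mean(axis=0)
```

In practice H is never stored densely; indexing into the flattened state achieves the same projection.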
Once the assimilation procedure is completed, the model switches to the pseudo-incompressible limit regime and then back again to fully compressible until the next assimilation time. The process of switching back and forth between the model configurations exploits the blended numerical model to achieve balanced data assimilation and is termed blended data assimilation.
In particular, if data are assimilated into the compressible flow equations at time n, then compressible-to-pseudo-incompressible blending entails setting the switch α_P to 0 and converting the quantity P_comp with (42). The solution is then propagated in the pseudo-incompressible regime for one time-step, after which α_P is set back to 1, switching to the compressible flow equations. The quantity P_psinc is reconverted by (41) using either π_half or π_full. Figure 2 summarises the procedure. Following the analysis from section 4.1, the perturbation variable π is reset after the half time-stepping in the solution of the full model, but not in the solution of the limit model. The blended data assimilation workflow is displayed in Figure 3.
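Schematically, one cycle of blended data assimilation can be written as follows (a sketch; `assimilate`, `convert_P` and `model_step` are hypothetical interfaces standing in for the LETKF, the conversions (41)-(42), and the blended solver):

```python
def blended_cycle(state, windows, assimilate, convert_P, model_step,
                  steps_per_window):
    """For each assimilation window: analyse, take one pseudo-incompressible
    step (alpha_P = 0), then continue compressibly (alpha_P = 1)."""
    log = []                                      # trace of alpha_P per step
    for y_obs in windows:
        state = assimilate(state, y_obs)          # analysis step
        state = convert_P(state, to="psinc")      # compressible -> psinc
        state = model_step(state, alpha_P=0)      # one balancing step
        state = convert_P(state, to="comp")       # psinc -> compressible
        log.append(0)
        for _ in range(steps_per_window - 1):
            state = model_step(state, alpha_P=1)  # compressible steps
            log.append(1)
    return state, log

# stub interfaces just to trace the regime switching over two windows
_, regimes = blended_cycle(state=None, windows=[1, 2],
                           assimilate=lambda s, y: s,
                           convert_P=lambda s, to: s,
                           model_step=lambda s, alpha_P: s,
                           steps_per_window=3)
```

The trace shows exactly one pseudo-incompressible step immediately after each analysis, with all remaining steps of the window taken compressibly.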
As our principal strategy is to split measures of balancing the flow state from those of assimilating the data, we have not tuned the data assimilation procedures themselves in any way. Tuning their parameters may further improve balance, but as our balancing strategy is rather successful without this, the degrees of freedom of parameter tuning might be used more efficiently to achieve additional goals aside from the elimination of unphysical acoustic noise.

Figure 3: Blended data assimilation workflow with the sections (§) of this paper describing the algorithmic components: data assimilation (§2 and §5b), observations / data (§5b), blended numerical model (§3 and §5a), initial conditions (§5a), analysis (§5c), forecast (§5c), balancing (§4 and §5c). The initial condition (dashed outline) is used only once to start the simulation. For each assimilation window, externally obtained observations / data are assimilated into the forecast and the algorithm loops through the components following the direction of the two-headed arrows.

Numerical results
The idealised test cases of a travelling vortex and a rising warm air bubble are used to validate model performance in this section. To evaluate the effectiveness of the single time-step blended soundproof-compressible scheme, unbalanced states are initialised in the compressible flow equations for both test cases and the blended scheme is applied. The balance of the compressible solution with unbalanced initial states is evaluated by "probe measurements", i.e., by time series of the flow variables at selected points in the domain, and compared against analogous data extracted from the soundproof solution [Benacchio et al., 2014].
For blended ensemble data assimilation, an ensemble is generated by perturbing the initial conditions. Then, the blended scheme is applied after the assimilation of observations into the compressible flow equations and repeated after each assimilation procedure. The quality of balanced data assimilation is evaluated by root mean square errors with respect to a reference solution. In order to gauge the performance of the improved blended model, probe measurements of the full pressure increments δp are taken, defined at time level n as δp^n = p^n − p^{n−1}, at the center (0.0 km, 0.0 km). The first increment δp^0 corresponds to a spin-up adjustment and is therefore omitted in the plots, as done in Benacchio et al. [2014]. The distance of the pressure perturbation result of a given run from the reference pseudo-incompressible run is quantified by the relative error E_ν = ‖δp_ν − δp_psinc‖₂ / ‖δp_psinc‖₂, where ν = b for the blended run, ν = c for the imbalanced compressible run, and ‖·‖₂ is the 2-norm taken over the probe-measured time series of δp.
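These two diagnostics are straightforward to compute from a probe time series. The sketch below follows the definitions just given; the convention of dropping the spin-up increment δp^0 before plotting is taken from the text, while the function names are ours.

```python
import numpy as np

def pressure_increments(p_series, drop_first=True):
    """Full pressure increments delta_p^n = p^n - p^{n-1} at a probe point.

    The first increment delta_p^0 corresponds to the spin-up adjustment and
    is dropped by default, as in the plots of the paper.
    """
    dp = np.diff(np.asarray(p_series, dtype=float))
    return dp[1:] if drop_first else dp

def relative_error(dp_run, dp_ref):
    """Relative error E_nu = ||dp_run - dp_ref||_2 / ||dp_ref||_2
    of a probe time series against the pseudo-incompressible reference."""
    dp_run, dp_ref = np.asarray(dp_run), np.asarray(dp_ref)
    return np.linalg.norm(dp_run - dp_ref) / np.linalg.norm(dp_ref)
```

E_b and E_c are then obtained by passing the blended and the imbalanced compressible probe series, respectively, against the same reference series.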
An imbalanced initial state is created by setting P = 10^5 Pa and π = 0.0 over the whole domain for the full compressible flow equations (5) with α_P = 1. This state is propagated for one time-step in the limit pseudo-incompressible regime, followed by the rest of the time-steps in the fully compressible model. The blending scheme of section 4 is used to transition between the model regimes.
For these imbalanced initial states, a compressible run with blending is compared with a compressible run without blending and with a pseudo-incompressible run (top panel in Figure 5). Fast acoustic modes are filtered from the blended solution and the result is indistinguishable from the limit pseudo-incompressible reference solution, save for an initial adjustment in the first time-step. Blending is able to recover the dynamics of the balanced state.
A close-up (bottom panel of Figure 5) compares the blended runs with choices of π half and π full from section 4.3 against a run with the balanced initial state obtained from the known exact compressible vortex solution. The blended runs are as good as, and the π half run slightly closer to, the balanced compressible run. The relative error of the blended run with respect to the reference balanced run is 0.0285 using π half and 0.0393 using π full . This corroborates the insight from section 4.3 that π half is a better choice. The choice of π half is used from here on.

The rising bubble experiment
The second test consists of a gravity-driven thermal flow with f = 0.0 s⁻¹, initialised as a bubble-shaped positive potential temperature perturbation δΘ on a constant isentropic background with Θ₀ = 300 K [Mendez-Nunez and Carroll, 1994, Klein, 2009, Benacchio et al., 2014]. The dimensionless perturbation δΘ is defined by (49), where r₀ = 2.0 km is the initial radius of the bubble and z₀ = 2.0 km the initial vertical displacement of the bubble. The choice of reference units yields Ma ≈ 0.0341. The models are run on a grid with (160 × 80) cells to a final simulation time of 1000.0 s. The initial pressure fields are set to reflect a horizontally homogeneous hydrostatic pressure field p(z) based on Θ₀ and initial condition p(0) = 10^5 Pa, with π ≡ 0.0. These pressure data are imbalanced, however, with respect to the perturbed initial potential temperature Θ₀ + δΘ, see (49). Potential temperature at the initial and final time are depicted in Figure 6.

The initial stages of the bubble evolution are compared for the compressible, pseudo-incompressible, and one-step blended runs in Figure 7. As the initial state is not hydrostatically balanced, pressure waves propagate in the compressible configuration (top left panel), as seen in a time series of the pressure perturbation increment at (x, z) = (−7.5, 5) km (orange cross in the top left panel and blue line in the top right panel). The acoustics are absent in the soundproof configuration (top right panel, black line) and in the single time-step blended soundproof-compressible configuration (orange dots).
Next, the blended run and the pseudo-incompressible run are compared in more detail (Figure 7, middle and bottom panels) with probe measurements of the pressure perturbation increment δp′, where p′ = p − p̄. The probes are located at (x, z) = (−7.5, 5) km (middle panels) and at (x, z) = (0, 5) km (red cross in the top left panel and bottom panels in Figure 7), both with a constant small time-step ∆t = 1.9 s (top, middle left, and bottom left panels) and for larger, advective CFL-constrained time-steps (middle right and bottom right panels; CFL = 0.5 and ∆t = 21.69 s for the first two time-steps). Away from the bubble trajectory (middle panels), the pressure perturbation increment due to the rising bubble and the remnants of the background acoustics from blending are comparable in amplitude. Larger amplitudes are observed with the blended model and the larger time-step (middle right panel), but they are still very small compared to the fully compressible run (note the different range on the vertical axes between the top right and middle right panels). On the bubble trajectory (bottom panels), the pressure perturbation increment due to the rising bubble dominates and the solutions are almost identical. Throughout the runs, a single time-step spent in the soundproof pseudo-incompressible regime largely filters out the fast acoustic imbalances of the compressible run (not shown in the middle and bottom panels of Figure 7).

Table 1: Errors E_c and E_b (see text for definitions) of the time series of δp in [0, 1000] s relative to the reference pseudo-incompressible run (middle and bottom panels of Figure 7). ∆t_AC = 1.9 s; ∆t_ADV is determined by advective CFL = 0.5, with ∆t_ADV = 21.69 s for the first two time-steps. Probe location (−7.5, 5) km corresponds to the orange marker and orange lines in Figure 7, and (0, 5) km to the red markers and red lines.
This is quantified by comparing the relative errors with respect to the reference pseudo-incompressible run for the compressible run, E c , and for the blended run, E b , defined in (48) and shown in Table 1. E b is more than 25 times smaller than E c for the large time-step case, and more than two orders of magnitude smaller for the small time-step case.
We also remark that a probe measurement of the full pressure time increment δp differs slightly between the reference pseudo-incompressible run and the one-step blended run (not shown). The difference is due to the time dependence of the hydrostatically balanced background pressure p̄ in the blended run. However, the computed values of the pressure perturbation time increment δp′ are remarkably similar in the two runs (black line and orange dots in the top right panel of Figure 7). We can thus conclude that blending recovers balanced dynamics irrespective of small compressibility-induced variations of the background pressure p̄.
In view of these results, blending can be employed as an effective means to achieve the balanced initialisation of data within a fully compressible model. The single time-step balancing capability of the model presented here substantially improves on the performance of earlier blended models [Benacchio et al., 2014], which achieved smaller reductions in amplitude compared to the fully compressible case and needed several time-steps in the limit regime.

Travelling vortex setup
To combine blending with data assimilation as described in section 4d, an ensemble is generated by perturbing the initial vortex center position (x_c, z_c) within the half-open interval [−1.0 km, 1.0 km) for both x_c and z_c. The vortex is then generated around this center position such that the full vortex structure is translated. Ten such samples are drawn, and they constitute the ensemble members. An additional sample is drawn and solved with the full model from the balanced initial condition. This run, denoted by obs, is used to generate the artificial observations. Another run identical to this additional obs sample is made; this time, however, blending is applied for the first time-step, and this run is considered the truth in the sequel. This corrects for any errors in the initialisation of π, as discussed in section 4.3.
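The ensemble generation step amounts to sampling centre offsets from a half-open interval. A minimal sketch, in which the function name and the random seed are our own choices:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility (assumption)

def draw_vortex_centres(n_members, half_width_km=1.0):
    """Draw perturbed vortex centres (x_c, z_c), each coordinate sampled
    uniformly from the half-open interval [-1.0 km, 1.0 km)."""
    return rng.uniform(-half_width_km, half_width_km, size=(n_members, 2))

centres = draw_vortex_centres(10)        # the ten ensemble members
extra = draw_vortex_centres(1)           # additional sample for obs / truth
```

The same pattern, with the interval replaced by [2.0 K, 12.0 K) for the perturbation amplitude, covers the rising bubble setup of section 5.1.2.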
This choice of generating the truth and obs through a perturbation of the initial condition is such that the ensemble mean does not coincide with the truth. Otherwise, ensemble deflation alone would be sufficient to make the ensemble converge towards the truth, see also Lang et al. [2017].
The observations are taken from the obs run every 25 s, and only a tenth of the grid points are observed; these are drawn randomly as follows. A Boolean mask selecting a tenth of the grid points is generated, where, if necessary, a ceiling function is applied to obtain an integer number of selected grid points. The entries of the mask are then shuffled using the algorithm by Fisher and Yates [1953] and the Boolean mask is applied to the obs array to obtain the sparse observations. This deviates from a more realistic situation where observations and grid points do not coincide. To simulate measurement noise, Gaussian noise with standard deviation equal to 5% of the peak-to-peak amplitude of the observed quantity at the given time is added independently at each of the observed grid points. A similar method of generating artificial observations was used in, for example, Bocquet [2011] and Harlim and Hunt [2005] for the Lorenz-63 and Lorenz-96 models.
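The mask-and-noise procedure can be sketched directly with numpy, whose `Generator.shuffle` performs a Fisher-Yates shuffle internally. The function name, seed, and default fractions below are our assumptions; the 10% observation density and 5% peak-to-peak noise level follow the text.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (assumption)

def sparse_noisy_observations(field, fraction=0.1, noise_frac=0.05):
    """Generate artificial sparse observations from a 2D 'obs' field.

    A Boolean mask selecting ceil(fraction * N) grid points is built and
    shuffled (Fisher-Yates), then Gaussian noise with standard deviation
    equal to noise_frac of the field's peak-to-peak amplitude is added
    independently at the observed points.
    """
    n_total = field.size
    n_obs = math.ceil(fraction * n_total)   # ceiling to an integer count
    mask = np.zeros(n_total, dtype=bool)
    mask[:n_obs] = True
    rng.shuffle(mask)                       # Fisher-Yates shuffle of the mask
    mask = mask.reshape(field.shape)
    sigma = noise_frac * np.ptp(field)      # 5% of peak-to-peak amplitude
    values = field[mask] + rng.normal(0.0, sigma, size=n_obs)
    return mask, values
```

The returned mask doubles as the (linear) observation operator selecting the observed grid points.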
The regions for localised data assimilation are of size (11×11) grid points and only observations within such a patch are considered for analysis operations at the respective central grid point. A localisation function corresponding to a truncated Gaussian function is applied such that observations farther from the grid point under analysis have less influence, and that the influence decays smoothly towards the edges of the localisation subdomain, where it is abruptly truncated to zero. No ensemble inflation is applied in this case. Examples of the observations and truths used in the generation and evaluation of the experiments with data assimilation are displayed in Figure 8. Notice that we run one test with observations of the momentum fields only, and another test with observations of the full set of variables.
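The localisation weighting described above — a Gaussian decay with distance, abruptly truncated at the edge of the (11 × 11) patch — can be sketched as follows. The length scale is our assumption, as the text does not specify the Gaussian width.

```python
import numpy as np

def truncated_gaussian_weights(half_width=5, length_scale=2.5):
    """Localisation weights on a (2*half_width+1)^2 patch: a Gaussian in
    grid-point distance from the patch centre, truncated to zero outside
    a radius of half_width grid points (the edge of the local region)."""
    idx = np.arange(-half_width, half_width + 1)
    dx, dz = np.meshgrid(idx, idx)
    r = np.hypot(dx, dz)                      # distance from patch centre
    w = np.exp(-0.5 * (r / length_scale) ** 2)
    w[r > half_width] = 0.0                   # abrupt truncation at the edge
    return w
```

In the LETKF, these weights enter by inflating the observation error variances of distant observations, i.e. dividing R⁻¹ by w pointwise.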
The ten ensemble members in each of these tests are initialised with balanced states, and blending is applied for the first time-step when the model runs in the pseudo-incompressible configuration. The ensemble is then solved forward in time with the fully compressible model. Data from the generated observations are assimilated every 25 s. The immediate time-step after the assimilation procedure is solved in the pseudo-incompressible limit regime while the rest of the time-steps in the assimilation window are solved using the fully compressible model. Conversions according to the blending scheme in section 4 are employed when switching back and forth between the full and limit models. Furthermore, the choice of π half is used (cf. the discussion in section 4.3.). The ensemble solved with both data assimilation and blending is abbreviated as EnDAB.
The setup is repeated for two additional ensembles and each observation scenario, one where data are still assimilated but no blending is performed (EnDA), and another where neither data assimilation nor blending are performed (EnNoDA). EnNoDA and EnDA constitute an identical twin experiment Cotter, 2015, Lang et al., 2017], through which the effects of data assimilation can be evaluated. EnDA along with EnDAB constitute yet another identical twin experiment, which evaluates the performance of blending.

Rising bubble setup
The rising bubble ensemble spread is generated by randomly modifying the maximum of the potential temperature perturbation δΘ in the open half interval [2.0 K, 12.0 K). The ensemble comprises ten members. While the relative spread of the temperature perturbation is large with this setup, the ensemble spread of the bubble position at the final time of the simulation, t fin = 1000.0 s, is only moderate.
An additional sample is drawn for the obs and the truth, which are identical in this setup. Blending is applied to the first time-step of the obs and the truth, obtaining a balanced solution. As the rising bubble flow fields evolve rather slowly in the beginning, data are only assimilated from t = 500.0 s onwards. Observations of the momentum field are then assimilated every 50.0 s. As with the vortex experiments, only a tenth of the grid points are observed, noise with standard deviation equal to 5% of the peak-to-peak amplitude is added, and localisation within an (11 × 11) grid point region is applied. A localisation function corresponding to a truncated Gaussian is applied and the ensemble is not inflated. Examples of the observation and truth are given in Figure 9. Three ensembles corresponding to the EnNoDA, EnDA, and EnDAB settings, with 10 members each, are generated, but only one set of experiments, involving assimilation of the momentum field only, is pursued.
Note that, as the ensembles and the observations are generated with balanced initial conditions, any noise present in the simulation results is the result of the data assimilation procedure. Table 2 summarises the details of the data assimilation-related experimental setup for both test cases.

Evaluation of data assimilation
The quality of data assimilation is evaluated by a spatially and ensemble-averaged root mean square error (RMSE) from the truth. For each quantity ψ in the set {ρ, ρu, ρw, P, π}, this is given by RMSE(ψ) = sqrt( (1 / (K N_x N_z)) Σ_k Σ_i Σ_j (ψ_{k,i,j} − ψ^truth_{i,j})² ), where k = 1, …, K indexes the ensemble members and i = 1, …, N_x and j = 1, …, N_z the grid points in the (x, z) coordinates.

Table 2: Assimilation-related experimental parameters. K is the ensemble size, b the ensemble inflation factor, t_first the first assimilation time, ∆t_obs the observation interval, ψ_assimilated the set of quantities assimilated, (N × N)_local the size of the local region, f_local the type of localisation function, η_obs the observation noise, σ the standard deviation of the Gaussian noise, A the peak-to-peak amplitude of the quantity observed, obs_sparse the sparsity of the observations, and ∆t_blending the number of initial time-steps spent in the limit model regime. The π choice, used in the initialisation of ∆t_blending, is either π_half or π_full; more details in section 4.3.

Figure 10 depicts the ensemble snapshots for the vortex case with all quantities observed and assimilated. EnNoDA acts as the control ensemble, and the top row depicts its solutions for the travelling vortex without data assimilation and blending. While the center position of the vortex for each ensemble member is perturbed, the ensemble mean vortex position (fourth column) is centered around the origin. This is in line with the conditions used to generate the initial ensemble. With data assimilation, EnDA (middle row), the balance is lost and at the final time the vortex structure is not preserved. Data assimilation and blending, EnDAB (bottom row), recovers the balanced solution and the vortex structure is preserved after three periods of revolution. Moreover, the EnDAB ensemble mean is in the lower right quadrant, closer to that of the observation and the truth.
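The RMSE diagnostic translates directly into numpy; a minimal sketch for one quantity, with the function name our own:

```python
import numpy as np

def ensemble_rmse(ensemble, truth):
    """Spatially and ensemble-averaged RMSE from the truth for one quantity.

    ensemble : (K, Nx, Nz) array of ensemble member fields
    truth    : (Nx, Nz) truth field

    Averages the squared deviation over all members and grid points,
    i.e. RMSE = sqrt( (1/(K*Nx*Nz)) * sum_k sum_i sum_j (psi_kij - truth_ij)^2 ).
    """
    diff = ensemble - truth[None, :, :]
    return np.sqrt(np.mean(diff ** 2))
```

Evaluating this for each ψ in {ρ, ρu, ρw, P, π} at every output time produces the RMSE curves shown in Figures 11 and 13.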

Test case
Data assimilation without blending (EnDA, orange lines in Figure  11) leads to a jump in the RMSE in the thermodynamic P variable upon the first assimilation at t = 25 s. After that, the error stays relatively constant. Appendix B shows that the error jump quantifies the imbalance introduced by the data assimilation procedure.
Assimilating the momentum fields alone is insufficient, and the RMSE in the solution (solid lines in Figure 11) is larger than in the reference EnNoDA run. As expected, EnDAB provides a smoother solution over time as the error does not oscillate, yet ensemble spread and RMSE are comparable in these runs (not shown). This test includes a strong axisymmetric potential temperature variation (Figure 4), and the potential temperature is an advected quantity not corrected by momentum data assimilation. Therefore, the initially tight correlation of the velocity and potential temperature variations is destroyed in the course of data assimilation. Since the potential temperature is fluid-dynamically active through the generation of baroclinic torque, the flow fields of the ensemble members increasingly deviate from their reference as a consequence.

Assimilating all the quantities yields an improvement (dashed lines in Figure 11). While the initial assimilation reduces the error substantially for ρ, ρu and ρw of the EnDA run, the error increases over time and surpasses the error of the control EnNoDA run at approximately t = 225 s. This increase in the error is due to the imbalances introduced by the chosen (11 × 11) grid point size of the localisation regions (more details are provided in section 5.3.3 and Appendix B). For the EnDAB run, the imbalances are suppressed and the RMSEs are lower than those of the control EnNoDA run for all quantities over the entire simulation period.

Figure 12 displays snapshots of the pressure perturbation for the bubble case. In the EnNoDA run (first row), the bubbles in the ensemble attain different heights at the end of the simulation time and the ensemble mean is diffused, in line with the spread in the initial conditions used in generating the ensemble. Ensemble members with larger potential temperature perturbation rise faster.
In the EnDA ensemble (second row), large-amplitude fast-mode imbalances are present, while the ensemble mean of the bubble rotor positions at the end time better approximates the true positions of the rotors. For EnDAB (third row), the individual ensemble members are close to one another, as reflected in the ensemble mean. The ensemble better approximates the truth and the fast-mode imbalances are suppressed. Moreover, the pressure footprints of the bubble rotors are not visible in plots of the pressure differences between the EnDA and EnDAB ensembles (fourth row), showing that the difference is predominantly due to the presence of the imbalances only, and suggesting (last column) that data assimilation is comparably effective in nudging the bubble towards the truth in both cases. Blending suppresses the imbalances while leaving the dynamics of the rising bubble largely unaffected.

RMSE plots for data assimilation of the momentum fields in the rising bubble experiment are shown in Figure 13. The momentum fields are assimilated every 50.0 s after 500.0 s. This is visible in the momenta RMSE plots, where each downward step corresponds to one application of the assimilation procedure. For EnDA, an error is introduced in the density ρ and mass-weighted potential temperature P. Blending negates this, and the EnDAB curves show a smooth profile, with RMSE lower than the control EnNoDA. As in the travelling vortex case, a jump is visible in the RMSE of P at the first assimilation time for EnDA, and this corresponds to the imbalances introduced; see the scale analysis in Appendix B for more details. The ensemble spread and RMSE are again comparable in these runs (not shown).

Localisation region and imbalances
In this section, results of the EnDA and EnDAB ensembles are investigated for varying localisation radii. Here the aim is not to obtain the optimal choice of the localisation radius but to illustrate its effect on the imbalances. All the quantities are assimilated for the travelling vortex test case and localisation regions of (5 × 5), (21 × 21) and (41 × 41) grid points are used. Otherwise, the setup follows the parameters laid out in section 5.1.1 and Table 2.
Increasing the size of the localisation region reduces the error for the EnDA and EnDAB experiments. The error of the (41 × 41) travelling vortex EnDA experiment (cyan solid line in Figure 14) is consistently lower than that of the control EnNoDA ensemble for all variables except for the pressure-related mass-weighted potential temperature P, for which the error jump is nevertheless small. This result suggests that the error in Figure 11 for the EnDA ensemble with all quantities assimilated (orange dashed line) arises from the imbalances introduced through localisation. The amount of imbalance introduced can be reduced with large enough localisation regions; nevertheless, imbalances arise from localisation even when a large proportion of the total grid points is taken into account, e.g., (41 × 41) regions in a (64 × 64) mesh. Similar results are obtained for the rising bubble test (not shown).
For the localisation radii investigated in Figure 14, the best performing EnDA solution performs worse than the worst performing EnDAB solution (compare the solid cyan and dashed magenta lines). In the case of the EnDA solutions (solid lines), the fast-mode imbalances introduced are a significant source of error, see section 5.1.1 for more details. As a consequence, any reduction in the imbalances introduced will lead to a corresponding decrease in the error. This corroborates the decrease in the RMSE for increasing localisation sizes. For the EnDAB solutions (dashed lines), the balanced structure is preserved and there is an optimal localisation length scale. Too small a localisation region will lead to an under-sampling of the vortex dynamics, while too large a localisation region may distort the vortex dynamics by oversampling the background dynamics. This is observed in the larger RMSE for the (41 × 41) solution (dashed cyan) when compared to the (21 × 21) solution (dashed yellow). These results indicate that, for blended data assimilation of the travelling vortex experiment, a moderate localisation region of approximately a third of the grid-size, i.e. (21 × 21) cells in a (64 × 64) grid, yields the optimal analysis fields.

Discussion and conclusion
This paper has presented a new conceptual framework for balanced data assimilation based on blended numerical models. Using a discrete time-level numerical analysis for the Exner pressure field and a careful choice of pressure perturbation variables, the blended soundproof-compressible modelling framework of Benacchio et al. [2014] has been substantially upgraded with a functionality to switch between equation sets in a single time-step. In idealised numerical experiments with a travelling vortex and a gravity-driven warm air bubble, a single time-step in the pseudo-incompressible limit regime was sufficient to recover a balanced state starting from imbalanced initial data. Moreover, the blended model yielded leftover acoustics with amplitude more than one order of magnitude smaller than the ones generated at the onset with the fully compressible model. The amplitude reduction is a sizeable improvement over the scores of Benacchio et al. [2014] who, in addition, needed several time-steps in a hybrid soundproof-compressible configuration with non-integer values of the blending parameter α_P to achieve their best level of noise reduction.

The upgraded blended model has then been combined with a data assimilation engine and deployed as a tool to reduce imbalances introduced by regular assimilation of data within model runs. Numerical results on ensemble data assimilation with and without blending showed that, while data assimilation alone produced imbalances that effectively destroyed important qualitative features of the solution in one of the test cases, data assimilation together with blending strongly reduced those imbalances and led to the recovery of accurate results. Moreover, blended data assimilation was effective despite the untuned data assimilation parameters used in the investigations.
Throughout our study, a single time-step spent in the pseudo-incompressible limit regime after the assimilation of data was sufficient to restore the balanced state, as documented by strongly reduced RMSEs with the blended model.
For ensemble data assimilation experiments with the travelling vortex, assimilation of the momentum fields alone was found to be insufficient. Over longer simulations, the ensemble with balanced data assimilation carried larger errors than the control ensemble without data assimilation. See the green solid curves in Figure 11. We have traced the origin of this result back to an issue of controllability [Jazwinski, 2007]: This test case involves large, dynamically relevant potential temperature variations whose deviation from the truth cannot be controlled at all when only the momentum field is assimilated. In fact, a test with an analogous vortex that has constant entropy initial data yields results (not shown) close in quality to those of the rising thermal test when only momentum is assimilated. The issue could then be solved by assimilation of all variables. Further investigation is warranted on how the effectiveness of data assimilation can be improved under such circumstances without the need to observe all state variables. A scale analysis (Appendix B) corroborated the insight that the RMSE increase introduced by the assimilation of data corresponds to the fast-mode imbalances seen in the plots of the individual ensemble members and the ensemble mean. In this sense, our experiments make a case for investigations involving relatively simple idealised test cases, as we were able to gain some analytical understanding of the sources and consequences of errors and imbalances. Nevertheless, further studies based on more realistic scenarios will be required to demonstrate that the presented approach and its extensions will actually enable quantifiable improvements of numerical weather prediction skill scores.
In the experiments involving ensemble data assimilation with different localisation radii, blended data assimilation yielded, for all localisation sizes, substantial improvements to the RMSE relative to the plain data assimilation without a balancing procedure. In fact, the best-performing data assimilation-only run still produced worse results than the worst-performing run with blending. Furthermore, the recovery of a balanced vortex structure by blended data assimilation turned out to be sensitive to the choice of localisation radius, with best results obtained at some intermediate size of the localization domains. In contrast, increasing the localisation size for an ensemble with data assimilation without blending was sufficient to decrease its error. Yet, since the imbalances by far dominate the overall error in this case, this is just a reflection of the expected noise reduction resulting from increased smoothing of the assimilated information.
In numerical weather prediction, methods to damp or remove acoustic imbalances have long been employed [e.g. Daley, 1988, Skamarock and Klemp, 1992, Dudhia, 1995, Klemp et al., 2018]. Moreover, practical application of sequential data assimilation procedures will generally excite all rapidly oscillatory modes of the compressible system, and filtering techniques are used to negate these unphysical imbalances [Ha et al., 2017]. In this context, the results presented in this paper are encouraging in that blended data assimilation was able to suppress acoustic noise and recover balanced analysis fields, albeit for idealised test cases. To the best of the authors' knowledge, this is the first study of a dynamics-driven method to suppress acoustic noise arising from the sequential assimilation of data.
In addition, the results presented in this paper prepare the ground for future work in a number of areas. In general, the performance of a data assimilation method can be improved by tuning its adjustable parameters. Here, however, we consciously employed an untuned data assimilation scheme known to produce unphysical imbalances to test the efficacy of our dynamics-driven method in removing them. Consequently, a comprehensive study similar to Popov and Sandu [2019] on multivariate tuning of the LETKF and localisation parameters for the blended numerical model will be an avenue for future improvements of our approach. The study could also compare our method with existing balancing strategies, e.g., the IAU and the DFI, following Polavarapu et al. [2004]. To ensure a fair comparison, optimisations of the IAU along the lines of Lei and Whitaker [2016] and He et al. [2020] may have to be carried out. A comparison of the effects of our dynamics-driven method on the slower dynamics against those of the DFI and IAU, which act as low-pass filters [Zhang, 2016, Polavarapu et al., 2004], will be particularly insightful.
Despite the untuned data assimilation scheme used, the blended model has given promising results, although thus far only for idealised test cases. Another natural evolution will hence involve assessing model performance on more realistic three-dimensional moist dynamics scenarios with bottom topography [Klein, 2014, Duarte et al., 2015] and on benchmarks at larger scales [Klemp, 1994, Benacchio and Klein, 2019].
Although presented and refined here for the blending between the compressible Euler equations and the pseudo-incompressible model only, the methodology translates to other scenarios as long as one can formulate the according projection onto appropriate reduced dynamics via the elliptic pressure correction. Models imposing a divergence constraint on the weighted velocity field as well as frameworks blending between nonhydrostatic and hydrostatic dynamics will naturally fit into the present approach.
Specifically, the numerical scheme proposed by Benacchio and Klein [2019] enables solution of the hydrostatic system in the largescale limit in addition to the small-scale low Mach number limit considered in this paper. Therefore, a blended data assimilation framework such as the one presented here could be enhanced with hydrostatic blending and used in a two-way blended pseudo-incompressible / hydrostatic / compressible model [Klein and Benacchio, 2016] exploiting the different dynamics in the equation sets.
Moreover, the theoretical framework developed in that paper also included the unified model by Arakawa and Konor [2009] as one of the reduced models. Thus, after an appropriate extension of the present numerical scheme, yet another framework for blended data assimilation can be developed. In fact, a variant of the fully compressible/Arakawa-Konor model pair has recently been presented by Qaddouri et al. [2021], and a related blending approach will allow for the filtering of smaller-scale acoustic noise while leaving the Lamb wave components dynamically unaffected. Investigations similar to the ones in this paper can then be made on balancing initial states and data assimilation for small-to planetary-scale dynamics using the resulting doubly blended model framework. Internal waves play an important role for atmospheric dynamics and they should not be removed indiscriminately after a data assimilation step. Therefore, the identification and removal of unwanted internal wave noise while keeping the physically meaningful wave spectrum is an additional challenge that will require further theoretical developments beyond the scope of this paper.
More generally, semi-implicit compressible models feature in several dynamical cores used by weather centres worldwide. Notable examples include the currently operational hydrostatic IFS spectral transform model in use at the European Centre for Medium-Range Weather Forecasts [ECMWF, Wedi et al., 2013] and the Met Office's Unified Model [Davies et al., 2005, Wood et al., 2014], which has a hydrostatic-nonhydrostatic switch. ECMWF's next-generation nonhydrostatic compressible dynamical core, IFS-FVM, uses a numerical discretisation akin to the one considered in this paper and would therefore be an ideal candidate for a first implementation of the blended tools in a semi-operational model. In addition, our approach will bear particular relevance to fully compressible operational models featuring the option of selectively employing the dynamics of a limit model [Wood et al., 2014, Melvin et al., 2019, Voitus et al., 2019, Qaddouri et al., 2021].
In this context, multimodel numerics with seamless switching could contribute to creating a level playing field on which to evaluate accuracy and performance with different equation sets in the same dynamical core. The positive evidence provided here in balancing data assimilation demonstrates, in the authors' view, the considerable potential impact of deploying the blended model framework across the whole forecast model chain.
Quantities without a subscript are defined in the local spaces, and the subscript [g] represents the global state space, i.e. localisation has not been applied.
1. Apply the forward operator $H$ to obtain the state vectors in the observation space, $\mathbf{y}^f_{k,[g]} = H\big(\mathbf{x}^f_{k,[g]}\big)$ for $k = 1, \dots, K$.
2. Stack the anomalies of the state and observation vectors to form the matrices $\mathbf{X}^f_{[g]} = \big[\mathbf{x}^f_{1,[g]} - \bar{\mathbf{x}}_{[g]}, \dots, \mathbf{x}^f_{K,[g]} - \bar{\mathbf{x}}_{[g]}\big]$ and $\mathbf{Y}^f_{[g]} = \big[\mathbf{y}^f_{1,[g]} - \bar{\mathbf{y}}_{[g]}, \dots, \mathbf{y}^f_{K,[g]} - \bar{\mathbf{y}}_{[g]}\big]$, where $\bar{\mathbf{x}}_{[g]}$ ($\bar{\mathbf{y}}_{[g]}$) is the mean of the state vectors (in observation space) over the ensemble, e.g. $\bar{\mathbf{x}}_{[g]} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{x}^f_{k,[g]}$.
3. From $\mathbf{X}^f_{[g]}$ and $\mathbf{Y}^f_{[g]}$, select the local $\mathbf{X}^f$ and $\mathbf{Y}^f$.
4. From the global observations $\mathbf{y}_{\mathrm{obs},[g]}$ and observation covariance $\mathbf{R}_{[g]}$, select the corresponding local counterparts $\mathbf{y}_{\mathrm{obs}}$ and $\mathbf{R}$.
Notice that the subscript [g] is dropped when representing the local counterparts.
5. Solve the linear system $\mathbf{R}\,\mathbf{C}^T = \mathbf{Y}^f$ for $\mathbf{C} \in \mathbb{R}^{K \times l}$.
6. Optionally, apply a localisation function to $\mathbf{C}$ to modify the influence of the surrounding observations.
7. Compute the $K \times K$ gain matrix $\mathbf{K} = \big[(K-1)\,\mathbf{I}/b + \mathbf{C}\,\mathbf{Y}^f\big]^{-1}$, where $b > 1$ is the ensemble inflation factor.
8. Compute the $K \times K$ analysis weight matrix $\mathbf{W}^a = \big[(K-1)\,\mathbf{K}\big]^{1/2}$.
9. Compute the $K$-dimensional vector encoding the distance of the observations from the forecast ensemble, $\bar{\mathbf{w}}^a = \mathbf{K}\,\mathbf{C}\,\big(\mathbf{y}_{\mathrm{obs}} - \bar{\mathbf{y}}^f\big)$, and add $\bar{\mathbf{w}}^a$ to each column of $\mathbf{W}^a$ to obtain a set of $K$ weight vectors $\{\mathbf{w}^a_k\}$ with $k = 1, \dots, K$.
10. From the set of weight vectors, compute the analysis for each ensemble member, $\mathbf{x}^a_k = \mathbf{X}^f \mathbf{w}^a_k + \bar{\mathbf{x}}^f$, for $k = 1, \dots, K$.
11. Finally, recover the global analysis ensemble $\{\mathbf{x}^a_{k,[g]}\}$, $k = 1, \dots, K$. This recovery depends on how the local regions were selected in (53) and (54). For local regions surrounding the grid point under analysis, the global analysis ensemble is recovered by reassembling the analysed grid points into the global grid.
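As an illustration, the local analysis of steps 1–10 can be sketched in NumPy following the LETKF formulation of Hunt et al. [2007]. This is a hedged sketch, not the paper's implementation: the function and variable names (`letkf_analysis`, `Xf`, `Yf`), the multiplicative inflation convention, and the use of a symmetric square root via an eigendecomposition in step 8 are our assumptions.

```python
import numpy as np

def letkf_analysis(Xf, Yf, y_obs, R, b=1.0):
    """One local LETKF analysis step (sketch after Hunt et al., 2007).

    Xf    : (n, K) local forecast ensemble, one state vector per column
    Yf    : (l, K) forecast ensemble mapped to observation space, H(x_k)
    y_obs : (l,)   local observations
    R     : (l, l) local observation-error covariance
    b     : multiplicative ensemble inflation factor (b > 1 inflates)
    """
    n, K = Xf.shape
    xbar = Xf.mean(axis=1, keepdims=True)     # ensemble mean state
    ybar = Yf.mean(axis=1, keepdims=True)     # ensemble mean in obs space
    X = Xf - xbar                             # state anomalies (step 2)
    Y = Yf - ybar                             # observation anomalies

    # Step 5: solve R C^T = Y^f for C in R^{K x l}
    C = np.linalg.solve(R, Y).T

    # Step 7: K x K gain matrix with inflation factor b
    G = np.linalg.inv((K - 1) / b * np.eye(K) + C @ Y)

    # Step 8: symmetric square root gives the analysis weight matrix
    evals, evecs = np.linalg.eigh((K - 1) * G)
    Wa = evecs @ np.diag(np.sqrt(np.maximum(evals, 0.0))) @ evecs.T

    # Step 9: mean weight vector, added to each column of Wa
    wbar = G @ C @ (y_obs - ybar.ravel())
    W = Wa + wbar[:, None]

    # Step 10: analysis ensemble x_k^a = X^f w_k^a + xbar^f
    return X @ W + xbar
```

The symmetric square root in step 8 is one common choice: it keeps the analysis ensemble mean equal to $\bar{\mathbf{x}}^f + \mathbf{X}^f \bar{\mathbf{w}}^a$, since the anomaly columns of $\mathbf{W}^a$ then sum to a constant multiple of the ones vector.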
Appendix B Scale analysis for the data assimilation error in the pressure-related fields
Figures 11 and 13 show that the assimilation of only the momentum fields leads to a jump in RMSE in the non-momentum fields, and the assimilation of all quantities in Figure 11 leads to a jump in RMSE in the pressure-related P field. This increase in the error occurs after the first assimilation time and remains of the same order of magnitude for the duration of the simulation, quantifying the imbalance introduced by data assimilation. The imbalance can be characterised by a scale analysis [Klein et al., 2001].
The assimilation of the momentum fields leads to a change in the divergence of the velocity field,
$$\nabla \cdot \delta\mathbf{v} = \frac{\partial (\delta u)}{\partial x} + \frac{\partial (\delta w)}{\partial z}, \qquad (60)$$
where $(\delta u, \delta w)$ are the changes in the velocity fields due to the assimilation of momenta in the vertical slice experiments. (60) has units of [s$^{-1}$]. Observe from Figures 10 and 12 that the imbalances introduced by data assimilation are fast-mode acoustic waves. This effect is modelled as a wave oscillating with peak amplitude right after the assimilation of data at the grid point under analysis. Therefore, for an oscillating wave excited at grid point $(x_i, z_j)$, the maximum amplitude of the imbalances is
$$(\nabla \cdot \delta\mathbf{v})_{(i,j)} \int_0^{t_{ac}} \cos\left(\frac{\pi}{2}\frac{t}{t_{ac}}\right) dt = \frac{2 t_{ac}}{\pi}\,(\nabla \cdot \delta\mathbf{v})_{(i,j)} \int_0^{\pi/2} \cos(\xi)\, d\xi = \frac{2 t_{ac}}{\pi}\,(\nabla \cdot \delta\mathbf{v})_{(i,j)}.$$
The acoustic timescale $t_{ac}$ is chosen as the timescale of the largest perturbations introduced. This is the time a wave takes to traverse from the analysis grid point to the edge of the $(11 \times 11)$ grid points local region. Therefore,
$$\hat{p}_{(i,j)} = \bar{\rho}\,\bar{c}^2\,\frac{2 t_{ac}}{\pi}\,(\nabla \cdot \delta\mathbf{v})_{(i,j)},$$
with $\bar{\rho}$ and $\bar{c}$ the background density and sound speed, which has units of [Pa]. The hat $\hat{\cdot}$ signifies that the quantity is obtained from scale analysis. Finally, use the equation of state (6) to obtain an estimate for $\hat{P}$.
Figure B1: Scale analysis of the contribution to the mass-weighted potential temperature P from the divergence of the velocity fields for the travelling vortex ensemble (left) and the rising bubble ensemble (right). In the legend, (all) represents the travelling vortex ensembles with all quantities, $\rho$, $\rho u$, $\rho w$, $P$, $\pi$, assimilated. The first assimilation time is marked with a vertical solid black line.
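The amplitude estimate above can be checked numerically. The sketch below uses entirely illustrative values for the grid spacing, sound speed, background density, and divergence change (none are taken from the experiments in this paper), and the conversion of the divergence integral to a pressure scale via $\bar{\rho}\,\bar{c}^2$ reflects the linearised acoustic pressure equation.

```python
import numpy as np

# Illustrative scale estimate for the acoustic pressure imbalance at one
# analysis grid point. All numerical values below are assumptions made
# for demonstration only.
dx = 100.0       # grid spacing [m] (assumed)
c_bar = 340.0    # background sound speed [m/s] (assumed)
rho_bar = 1.0    # background density [kg/m^3] (assumed)
div_dv = 1e-4    # change in velocity divergence at (i, j) [1/s] (assumed)

# Acoustic timescale: time for a wave to reach the edge of the 11 x 11
# local region, i.e. 5 grid spacings from the analysis grid point.
t_ac = 5 * dx / c_bar

# Peak amplitude of the excited oscillation: the cosine integral over
# [0, t_ac] evaluates analytically to (2 t_ac / pi) * div_dv.
amplitude = 2.0 * t_ac / np.pi * div_dv

# Pressure-scale estimate p_hat [Pa] via rho_bar * c_bar^2.
p_hat = rho_bar * c_bar**2 * amplitude
```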
For comparison with the RMSE, a norm is taken for $\hat{P}$, given by
$$\|\hat{P}\| = \frac{1}{K} \sum_{k=1}^{K} \left( \frac{1}{N_x N_z} \sum_{i=1}^{N_x} \sum_{j=1}^{N_z} \hat{P}_{k,(i,j)}^2 \right)^{1/2},$$
where $k$ indexes the $K$ ensemble members and $N_x$ and $N_z$ are the number of grid points in the $x$ and $z$ coordinates. Figure B1 shows the results of the scale analysis for the two test cases; results at assimilation times are omitted. The scale analysis yields EnDA results for $P$ that are of the same order of magnitude as the jumps in the RMSE plots (Figures 11 and 13), with a similar profile over time. The scale analysis thus characterises the error jump in the thermodynamic RMSE plots as fast-mode imbalances introduced through data assimilation.
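A minimal sketch of this ensemble norm, assuming illustrative array shapes and synthetic per-member $\hat{P}$ fields rather than output from the actual experiments:

```python
import numpy as np

# Ensemble norm of a scale-analysis estimate P_hat, for comparison with
# the RMSE. Ensemble size, grid dimensions, and field values are
# illustrative assumptions.
K, Nx, Nz = 10, 64, 32
rng = np.random.default_rng(0)
P_hat = rng.normal(size=(K, Nx, Nz))   # synthetic per-member estimate fields

# For each member k: root of the grid-averaged squared field;
# then average the K member norms.
member_norms = np.sqrt(np.sum(P_hat**2, axis=(1, 2)) / (Nx * Nz))
norm = member_norms.mean()
```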