## 1. Introduction

In recent years, there has been a rapid growth in the development and use of so-called convective-scale numerical weather prediction (NWP) systems using “convection-permitting models” (CPMs) (Clark et al. 2016). In building quantitative NWP systems, it is widely recognized that a number of aspects of the system are only known with some degree of uncertainty. Ensemble-prediction systems (hereafter ensembles) are designed to translate our understanding of uncertainty into reliable probabilistic forecasts, in particular of hazardous weather events. The uncertainty can come from many areas including model uncertainty and initial and boundary condition uncertainty, all of which have been studied at the convective scale (e.g., Leoncini et al. 2010, 2013; Keil et al. 2014; Kühnlein et al. 2014). It is essential that these uncertainties are represented accurately.

All forms of uncertainty may have an important role in a forecast as errors, particularly at the convective scale, can, and do, generally grow quickly within forecasts (e.g., Lorenz 1969; Hohenegger and Schär 2007). The consensus in convective-scale studies is that errors grow primarily as a result of the processes occurring within convection (Zhang et al. 2003; Hohenegger et al. 2006). Initial studies considered the growth from initial conditions and indicated that not only was error growth faster than in larger-scale models that use parameterized convection (Hohenegger and Schär 2007) but that forecasts could be improved in certain situations through better specification of the initial conditions (e.g., Melhauser and Zhang 2012). As expected, the predominant impact of the error growth from initial condition uncertainty occurred at the start of the forecast (Keil et al. 2014; Kühnlein et al. 2014), though of course this uncertainty will have some impact downstream. On the other hand, due to their regular refreshing, the boundary condition uncertainty led to consistent error growth throughout the forecasts (Keil et al. 2014; Kühnlein et al. 2014). Uncertainty also arises, for many reasons, from the representation of physical processes within the model. Investigations of error growth from simulated model physics variations in CPMs suggest that they have the greatest impact when initiated within the boundary layer (Lean 2006) and during the initiation and development of convective events (Leoncini et al. 2010; Keil et al. 2014). However, it is clear that all forms of uncertainty need to be represented in the model to allow for some form of variability on all scales from the start of the run, despite the different interactions and impacts of the error growth in the different convective situations (e.g., Raynaud and Bouttier 2016).

Uncertainty in the boundary layer can, itself, arise from a number of sources. The first is “model error” due to the fact that model parameterizations are not perfect. Put more precisely, it is important to recognize that, in the case of the boundary layer, boundary layer schemes (including shallow convection) are designed to predict the ensemble mean of realizations of a turbulent boundary layer in quasi-equilibrium. (It must be emphasized that the ensemble here is purely conceptual and nothing to do with ensemble-based NWP.) With given boundary conditions and forcing, we assume that there is a unique ensemble-mean solution, but no scheme always reproduces this perfectly. Schemes tend to have systematic errors; in an ideal world these would be correctable via a bias-correction scheme, leaving an unbiased error, but in many cases this has only been done through a form of assessment and manual tuning. Taking account of this has led to the use of “multiphysics” or “poor-man’s” ensembles, e.g., Ebert (2001). While undoubtedly important, this source of error is conceptually difficult as it is not based upon physics, but rather the inadequacies of our modeling of physics. As Ebert (2001) shows, it is essential that the performance of each model is properly taken into account before an optimal ensemble is designed. If not approached very carefully, it can lead to the artificial enhancement of ensemble spread to meet quantitatively desirable criteria by deliberately introducing less accurate parameterizations.

Physically based uncertainty arises from the heterogeneity of the boundary layer. In CPMs we want a space–time mean over a finite domain, not the ensemble mean, but we have no direct methods to establish how to parameterize this. Even a boundary layer with horizontally homogeneous forcing results in a highly inhomogeneous instantaneous turbulent flow. We assume that the stationary system is ergodic, i.e., that sufficiently long (wide) time (space) averages tend to the ensemble mean. However, as we look at increasingly small space and time periods, our ability to predict flux divergences is limited by the inherent variability of the turbulent flow.

Heterogeneity can also be introduced by the surface-exchange process. Many surface schemes, including that in the Met Office Unified Model (MetUM), account for surface inhomogeneity using techniques based upon the concept of blending height (e.g., Mason 1988) which accounts for inhomogeneity on scales of a few hundred meters; theory tells us that the boundary layer then “sees” an effective surface. The key point is that this heterogeneity is on a smaller scale than the convective-overturning scale considered in our scheme, so as a source of variability this should be less important. Heterogeneity at the model grid scale and larger is explicit in the model (though, of course, many questions surround whether the surface forcing should be smoothed for numerical reasons). Nevertheless, there is probably a range of scales in between where surface heterogeneity, if it exists, contributes to boundary layer variability.

In addition, there is uncertainty associated with surface characteristics. These may be essentially fixed or slowly varying (e.g., exactly how tall the trees are, and hence their roughness length, or their leaf area index). Such uncertainty may be of considerable practical importance, but it is a problem not of physics but of information that can, in principle and, increasingly, in practice, be acquired. However, some variables, notably soil moisture, may have uncertainty that is impractical to reduce by measurement.

A number of studies exist concerning the impact of boundary layer uncertainty. One technique that has been used in atmospheric models when examining the predictability of convective events lumps small-scale stochastic variations in the boundary layer due to all unresolved turbulent processes into relatively arbitrary stochastic perturbation fields in the form of Gaussian “bumps” (e.g., Done et al. 2012; Leoncini et al. 2013; Flack et al. 2018). When used as the only form of uncertainty represented in the model, it has not produced large spread at the kilometer scale (Flack et al. 2018). However, the representation of these processes has been useful for determining different positions and magnitudes of scattered showers and convection in general. Here we go beyond this to develop a stochastic boundary layer perturbation scheme that specifically takes into account variation due to sampling the turbulent processes. The ideas below take this as a starting point, assuming that our ensemble-mean parameterization of the system is correct and that our task is to estimate the “sampling error,” or noise, that arises from averaging the system over a smaller time period and area than would be required for the average to be sufficiently converged to equal the ensemble mean. We do not take account of heterogeneity of the surface or model error, though the scheme is designed to be compatible with any boundary layer scheme.

It should be noted that the objective is to gain reasonable, physically based estimates of the sampling uncertainty as a function of model resolution: the uncertainty must increase as the averaging area decreases, i.e., as resolution becomes finer. At current CPM scales, the studies cited above lead us to expect that mesoscale uncertainty in initial and boundary conditions is the primary contributor to ensemble spread.

The methodology discussed in this paper is inspired by Plant and Craig (2008) but can be traced back to the idea of shot-noise decomposition of turbulence, which is briefly reviewed in section 2. The new scheme is described in section 3. We first introduce a simple bulk parameterization of the convective boundary layer (CBL), both to introduce the relevant scaling and as a means to illustrate the approach in a simple framework. Section 4 outlines the implementation of the scheme in the MetUM. Similar schemes have been implemented by others (e.g., A. Lock 2016, personal communication; Kober and Craig 2016, hereafter KC) and we compare our scheme with these in section 5.

We then test the scheme. This testing evaluates the impact of the scheme on its own; it is not expected that this source of uncertainty can account for the whole of the forecast error, so evaluation in Part I of this paper is restricted to the impact on ensemble spread and not on overall forecast reliability. The latter is addressed in Flack et al. (2021, hereafter Part II). The testing (in Part II) compares forecast uncertainty due to initial and boundary condition uncertainty with the uncertainty resulting from the scheme running continuously. It relies on the existence of realistic initial and boundary condition perturbations from the Met Office Global and Regional Ensemble Prediction System (MOGREPS); this system has only one boundary layer scheme in operational use, so at present it is not feasible to test cleanly with another boundary layer scheme (though the design of our scheme would make this straightforward in a model with multiple schemes). To test the scheme, diagnostics designed to test convective-scale ensemble rainfall forecasts have been used; these are described in section 6 and applied to case studies in section 7 and in Part II. Finally, we summarize our conclusions regarding the scheme’s characteristic behavior in section 8.

## 2. Shot-noise decomposition

The primary objective is to represent the variability that arises on space and time scales somewhat larger than the characteristic scales of boundary layer eddies. We do not wish to describe the detailed behavior of these eddies, but, rather, to capture the consequences of their intermittent and chaotic occurrence. Our starting point is to assume that eddies occur randomly and independently. This is a restriction that shall be discussed below. The idea of modeling a turbulent process as a superposition of random events was first proposed by Lumley (1967), and a particular methodology was described by Moin and Moser (1989). By analogy with the electronic noise that occurs due to the carrying of current by independent, random electrons, this approach is known as “shot-noise” decomposition.

We assume that eddies occur as independent, random events. Considering just temperature for the moment, the warming provided by a “unit eddy” is characterized by a function *ϕ*(*z*, **x**, *t*), with **x** and *t* relative to some reference point in the eddy. Thus, *ϕ* includes the length and time scales over which this warming is spread. Note that we have included a *z* dependence, but this, at least when averaged horizontally, must equal that of the mean field.

Suppose the eddies occur at positions and times {**x**_{i}, *t*_{i}} with amplitudes {*a*_{i}}, where *a*_{i} is an independent random variable with a given pdf. We can think of a function *F* made up of delta functions with these amplitudes and locations,

$$F(\mathbf{x},t)=\sum_{i} a_{i}\,\delta(\mathbf{x}-\mathbf{x}_{i})\,\delta(t-t_{i}).$$

The probability that an event occurs in an infinitesimal volume *δxδyδt* is *pδxδyδt*, so the number of events in any finite region follows Poisson statistics. Then the field of increments applied to the model state, denoted *S*, can be constructed by convolving this with *ϕ*:

$$S(z,\mathbf{x},t)=(F\ast\phi)(z,\mathbf{x},t)=\sum_{i} a_{i}\,\phi(z,\mathbf{x}-\mathbf{x}_{i},t-t_{i}).$$

It is then straightforward to show that the autocorrelation function of *S* (in time and/or space) is just that of *ϕ*. An important parameter of the autocorrelation function, the integral time scale for the process, *τ*_{ϕ}, is obtained by integrating the autocorrelation function over time. Equivalent results exist for spatial autocorrelation functions and integral length scales.

We assume, without proof, that successive time intervals become uncorrelated if we average over a period longer than the integral time scale and, similarly, that successive regions in space become uncorrelated if we average over areas greater than the product of the integral length scales in *x* and *y*. We can then think of *F* as the source of a quantity such as temperature provided by the stochastic process, with each event “delivering” *a*_{i} over an average time scale *τ*_{ϕ} and a corresponding integral area, so that the variability of the space–time average depends on the pdf of *a*_{i} and the number of events. This may be illustrated by a simple example using a bulk parameterization of the CBL. This will help motivate the final scheme but also facilitate a discussion of relevant space and time scales compared with model resolution and time steps.
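The shot-noise construction can be sketched numerically. The following is a minimal one-dimensional illustration (not from the paper): event times follow a Poisson process, the amplitudes *a*_{i} are drawn from an arbitrary exponential pdf, and the unit-eddy response is an assumed exponential pulse.

```python
import numpy as np

rng = np.random.default_rng(42)

# Domain: a 1D time axis, for simplicity.
T, dt = 2000.0, 0.1                    # total time (s) and sampling interval (s)
t = np.arange(0.0, T, dt)

# Poisson events: rate p per unit time, random amplitudes a_i.
p = 0.5                                # mean event rate (events per second)
n_events = rng.poisson(p * T)
t_i = rng.uniform(0.0, T, n_events)    # event times
a_i = rng.exponential(1.0, n_events)   # arbitrary amplitude pdf, mean 1

# F: delta functions binned onto the grid.
F = np.zeros_like(t)
np.add.at(F, (t_i / dt).astype(int), a_i)

# Unit-eddy response phi: an assumed smooth pulse with time scale tau_phi.
tau_phi = 5.0
kern_t = np.arange(0.0, 5 * tau_phi, dt)
phi = np.exp(-kern_t / tau_phi)
phi /= phi.sum() * dt                  # normalize so each event delivers a_i in total

# S = F * phi: the stochastic source field.
S = np.convolve(F, phi)[: t.size]

# The mean of S tends to p * E[a] = 0.5; the fluctuations inherit the
# autocorrelation of the pulse shape phi.
print(S.mean())
```

The mean of *S* tends to the event rate times the mean amplitude, while the fluctuations carry the autocorrelation of the assumed pulse.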

## 3. A bulk parameterization

### a. Ensemble-mean parameterization

The key parameters of the bulk model are the boundary layer depth *h* and the surface buoyancy flux (expressed as a heat flux), given by

$$H=\rho C_{p}\langle w^{\prime}\theta^{\prime}\rangle_{0},$$

where *ρ* is the density of air, *C*_{p} is the specific heat capacity of air at constant pressure, and ⟨*w*′*θ*′⟩_{0} is the ensemble mean of the covariance of fluctuations of vertical velocity and potential temperature at the surface. The corresponding buoyancy flux is (*g*/*θ*)⟨*w*′*θ*′⟩_{0}, where *θ* is a reference potential temperature and *g* is the acceleration due to gravity. Note that the boundary layer literature often uses the opposite sign convention, but it is conceptually easier here to drop this sign convention as we are primarily considering CBLs. In a well-mixed box model with an impermeable lid the heat flux profile decreases linearly from *H* at the surface to zero at *z* = *h*, so the mean potential temperature increases at the height-independent rate

$$\frac{\partial\overline{\theta}}{\partial t}=\frac{H}{\rho C_{p}h}.$$

As a numerical example, if *H* = 400 W m^{−2} and *h* = 1000 m, the heating rate is about 3.3 × 10^{−4} K s^{−1}, or roughly 1.2 K h^{−1}.
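As a quick check of the numerical example, assuming typical near-surface values for density and specific heat:

```python
# Mean CBL heating rate dtheta/dt = H / (rho * Cp * h) for the
# well-mixed box model with an impermeable lid.
rho = 1.2      # air density (kg m^-3), assumed typical near-surface value
Cp = 1005.0    # specific heat at constant pressure (J kg^-1 K^-1)
H = 400.0      # surface heat flux (W m^-2)
h = 1000.0     # boundary layer depth (m)

rate = H / (rho * Cp * h)   # K s^-1
print(rate * 3600.0)        # roughly 1.2 K per hour
```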

### b. A simple “shot-noise” version of the bulk CBL

We adopt a very similar approach to that of Plant and Craig (2008) in dealing with deep convection. We assume the following:

- The mean heating is actually delivered to the boundary layer through a set of discrete thermals.
- Our space–time averaging is on a scale at least the size of one thermal. (Spatial correlation will be discussed in a little more depth in section 5.)
- The key source of variability is the *number* of thermals in the area and time averaged over; we do not need to consider, in detail, the structure of a thermal.

Each thermal “event” delivers heating such that an area *A*_{t} is warmed (throughout the depth of the boundary layer) by *θ*_{t} by the thermal, in time scale *τ*_{t}; the total heat delivered per thermal is thus *Q*_{t} = *ρC*_{p}*θ*_{t}*A*_{t}*h*. Note that *A*_{t} is the average area occupied by one thermal; it includes the upward motion (e.g., a convective thermal) *and* downward motion (i.e., the whole region influenced by the thermal). It makes sense to assume that *A*_{t} ∝ *h*^{2}; the result is that we can express the relative variability in boundary layer heating entirely in terms of *h* and the averaging area and time.

Suppose thermals occur randomly at a rate *p* per unit area and time. The probability that a thermal occurs in an infinitesimal time *dt* in a given (small) area *dA* is *pdAdt*. The number of thermals that occur in finite time Δ*t* and finite area Δ*A* is thus given by a Poisson distribution. The mean number is

$$\lambda=\overline{n}=p\,\Delta A\,\Delta t,$$

and the actual number (*n*) has the distribution

$$P(n)=\frac{\lambda^{n}e^{-\lambda}}{n!}.$$

The average total heat delivered is *Q* = *p*Δ*A*Δ*tQ*_{t}, so the average rate of change of mean potential temperature is this divided by Δ*Ah*Δ*tρC*_{p}; i.e.,

$$\frac{\partial\overline{\theta}}{\partial t}=\frac{pQ_{t}}{\rho C_{p}h}=p\,A_{t}\,\theta_{t}.$$

Since the standard deviation of a Poisson-distributed count is *λ*^{1/2}, the standard deviation of the heat delivered to the volume Δ*Ah* in Δ*t* is *λ*^{1/2}*Q*_{t}, and so the *relative* standard deviation of the heating is *λ*^{−1/2}; i.e., it is proportional to the inverse square root of the averaging area and time.

When *λ* = 1 the *average* number of thermals in area Δ*A* and time Δ*t* is 1. The *actual* heating rate over this area is proportional to the random number of thermals *n*; if we had included a distribution of thermal characteristics, that would appear as an additional factor here.

Consider the increment of *θ* delivered in one model time step. Over time step *δt* the stochastically perturbed increment can be written as

$$\Delta\theta_{\mathrm{stochastic}}=(1+\varepsilon_{p})\,\Delta\theta_{\mathrm{average}},$$

where *ε*_{p} = *n*/*λ* − 1 is a random number and the subscript is a reminder that this originates from a Poisson process, so a new value of *ε*_{p} arises only at intervals Δ*t* (assumed, for simplicity, an integral multiple of *δt*). It has properties

$$E(\varepsilon_{p})=0,\qquad\operatorname{var}(\varepsilon_{p})=\lambda^{-1}.$$

As *λ* increases, the Poisson distribution tends to a normal distribution, so *ε*_{p} tends to a normal variate with mean zero and variance *λ*^{−1}.
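The moments of *ε*_{p} can be verified by direct sampling (an illustrative sketch; the value of *λ* is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

lam = 0.5                           # mean number of thermals per (Delta A, Delta t)
n = rng.poisson(lam, size=200_000)  # actual thermal counts
eps_p = n / lam - 1.0               # Poisson-derived random factor

# E(eps_p) = 0 and var(eps_p) = 1/lambda (here 2.0).
print(eps_p.mean(), eps_p.var())
```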

The stochastic part of the increment, Δ*θ*_{inc} = *ε*_{p}Δ*θ*_{average}, thus has standard deviation *λ*^{−1/2}Δ*θ*_{average}. Since *n* is the number of thermals occurring in time Δ*t* (and area Δ*A*), Eqs. (21) and (24) can be applied in two different ways. If Δ*t* = *δt*, so that we apply all the heating from a thermal in a model time step, the standard deviation of the stochastic increment gets smaller with smaller model time step, but only as the square root of the model time step, so the relative sampling noise increases as the time step is reduced.

Alternatively, we can consider the number of thermals present in the area Δ*A* at any one time. The convective boundary layer depth, *h*, is typically 1 km, so a model with a horizontal averaging length scale of around 1 km will have Δ*A* ≈ *A*_{t}, and the standard deviation in the temperature increment (added over the thermal lifetime *τ*_{t}) is then comparable with the mean increment itself.

In practice, the scales actually represented well by a model are several (at least five or six) grid lengths (Skamarock et al. 2014). Thus, Δ*A* ≈ *m*^{2}Δ*x*^{2} > *A*_{t}, where *m* is about 5 or 6. Thus, in practice, the averaging area represents several thermals in models with horizontal grid length 1 km or larger, and the assumption of lack of horizontal correlation seems very reasonable. Hence, for models with grid length greater than around 100–200 m, there seems *no need to impose a horizontal correlation or structure* to the stochastic increments.

On the other hand, the time step of numerical models depends on factors such as the Courant number (*u*Δ*t*/Δ*x*, where *u* is the maximum horizontal wind speed). The MetUM has a semi-implicit, semi-Lagrangian dynamical core that is not restricted by the Courant number in terms of stability, but still uses a time step with maximum Courant number a small integer (typically 5) to maintain accuracy. The operational 1.5-km UKV uses a 60-s time step. Many other models, such as WRF (Skamarock et al. 2008), are more restricted in time step. In the typical example in section 3a, the eddy turnover time is considerably longer than the model time step, so we *do need to take into account the correlation of time perturbations*. Nevertheless, since we are averaging over the horizontal structure, there seems little justification in treating the temporal correlation in detail. We have chosen to include time correlation simply via a first-order autoregressive process with autocorrelation time scale determined by the characteristic eddy turnover time.

### c. Temporally correlated Poisson perturbations as a first-order autoregressive process

Consider the heating rate *h*_{i}, taken to be the rate of change of the state variables in time step *i*. For simplicity, as above, we shall just consider *θ*. We model *h*_{i} as a (stationary) first-order autoregressive stochastic process,

$$h_{i}=\mu h_{i-1}+n_{i}q,$$

in which we assume that a portion *μh*_{i−1} of the heating comes from thermals that “fired” in previous time steps, and a portion *n*_{i}*q* comes from *n*_{i} new thermals in time step *i*, each delivering a heating rate *q*. Here *μ* ∈ [0, 1), and the number of new thermals (*n*_{i}) is assumed to follow the Poisson process.

The time correlation of the process is controlled by *μ*, which we need to parameterize. The “integral time scale” for this process is obtained by summing the autocorrelation function over lags, and we set it equal to *ατ*, where *τ* is the eddy turnover time and *α* is a tuning parameter of order one. Clearly, negative *μ* makes no physical sense, so we limit *μ* to be no less than zero. When *μ* = 0 perturbations are uncorrelated in time and are assumed to occur entirely in one time step *δt*. The option to assume uncorrelated perturbations everywhere has been implemented but, in the following, time-correlated perturbations are always used.

An equivalent measure of the memory of the process is the e-folding time scale *τ*_{e} = −*δt*/ln*μ*. For small *μ* the process approaches the uncorrelated limit, and its stationary variance can again be expressed in terms of *λ*^{−1} as in the uncorrelated case.
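The autoregressive construction can be sketched as follows (illustrative parameter values only; this is not the MetUM implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

mu = 0.8          # memory: fraction of heating from thermals fired earlier
lam = 2.0         # mean number of new thermals per time step
q = 1.0           # heating rate delivered by each new thermal
nsteps = 400_000

n = rng.poisson(lam, nsteps)   # new thermals firing in each time step
h = np.empty(nsteps)
h[0] = lam * q / (1.0 - mu)    # start at the stationary mean
for i in range(1, nsteps):
    h[i] = mu * h[i - 1] + n[i] * q   # AR1 recursion with Poisson forcing

# The lag-1 autocorrelation of the stationary AR1 process is mu.
x = h - h.mean()
r1 = (x[:-1] * x[1:]).mean() / x.var()
print(r1)
```

The sample lag-1 autocorrelation recovers *μ*, confirming that the Poisson forcing does not alter the AR1 correlation structure.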

## 4. Implementation in the MetUM

### a. Procedure

We take Δ*A* to be the area covered by *n*_{g} × *n*_{g} grid boxes and so assume that each area Δ*A* is spatially uncorrelated. The integer *n*_{g} is an adjustable parameter of the scheme. Over each area we obtain an average *λ* [using Eq. (16)] and *μ* [using Eq. (41)], with Δ*t* the model time step and *A*_{t} = *h*^{2}. The correlation time scale is based on the eddy turnover time using the turbulent velocity scale (*w*_{m}) from the boundary layer scheme, which becomes small in stable conditions, though the minimum value of 0.4 m s^{−1}, used for consistency with the boundary layer scheme, may override this in practice. This minimum has little impact in practice, as the scheme produces very small perturbations in stable or even very weakly convective conditions, but it does serve to ensure that extremely short correlation time scales do not occur.

Note that it would probably be more consistent to average the surface buoyancy flux, friction velocity and boundary layer depth over the *n*_{g} × *n*_{g} grid boxes to derive *λ* and *μ* but our approach was adopted to be more efficient on a parallel computer and has little practical impact in relatively homogeneous terrain. Furthermore, because of differences in land use, the surface buoyancy flux can vary substantially from grid box to grid box, while *λ* tends to vary less. As discussed above, this heterogeneity due to differences in land use is an explicit and deterministic source of small-scale variability. The multiplicative nature of the scheme means that this heterogeneity carries through to the stochastic forcing.

For each of the grid boxes on the reduced two-dimensional grid (i.e., each box is *n*_{g} × *n*_{g} model grid boxes) a random integer *n*_{i}, drawn from a Poisson distribution, is generated. Note that with *λ* substantially less than 1, as is often the case, this distribution is highly skewed, with 0 occurring frequently. The random factor *f*_{i} is then generated using Eqs. (34) and (33). The latter reduces to *f*_{i} = *ε*_{i} if *μ* = 0, i.e., uncorrelated perturbations. Finally, a perturbation *f*_{i} times the boundary layer increment is added to the model physics forcing. The option is implemented to apply this selectively to the potential temperature, moisture, or wind fields; in practice we have applied it to all. The scheme has been implemented outside the boundary layer scheme as an additional, stochastic forcing scheme. Ideally, it should be moved inside the scheme and the stochastic increments applied to the surface exchange as well, in order to conserve energy. Currently, energy is only conserved to the extent that the ensemble average of the stochastic increments is zero.
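The procedure can be summarized in a schematic sketch (not the MetUM code; the grid sizes, *λ*, *μ*, and the exact AR1 weighting used here are placeholder assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

nx = ny = 32          # model grid (placeholder size)
n_g = 8               # supergrid factor: each Delta A covers n_g x n_g boxes
lam = 0.5             # assumed mean thermal count per supergrid box per Delta t
mu = 0.7              # assumed AR1 coefficient from the eddy turnover time

sx, sy = nx // n_g, ny // n_g
f = np.zeros((sx, sy))            # AR1 random factor, one value per supergrid box

def step(f):
    """Advance the random factor one time step on the reduced grid."""
    n = rng.poisson(lam, size=f.shape)   # thermals fired in each Delta A
    eps = n / lam - 1.0                  # zero-mean Poisson-related forcing
    return mu * f + (1.0 - mu) * eps     # assumed form of the AR1 update

for _ in range(100):
    f = step(f)

# Expand to the model grid and scale the (here dummy) BL increments.
f_full = np.kron(f, np.ones((n_g, n_g)))
bl_increment = np.full((nx, ny), 1.0e-4)   # dummy d(theta) from the BL scheme (K)
stochastic_increment = f_full * bl_increment
print(stochastic_increment.shape)
```

Each supergrid box carries its own skewed, temporally correlated random factor; the “checkerboard” structure on the model grid comes from the `np.kron` expansion.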

In summary, the boundary layer increment of each variable *ϕ*, where *ϕ* represents the model *u*, *υ*, *θ*, and *q*_{υ}, is multiplied by a random factor *f*_{i}. Here *f*_{i} is a first-order autoregressive stochastic process, randomly forced by the Poisson-related variable *ε*_{i} [Eq. (34)] and with autocorrelation time scale set by the eddy turnover time. Given that *ε*_{i} has zero mean and variance *λ*^{−1}, *f*_{i} has zero mean and variance given by Eq. (37). Thus, inclusion of time correlation greatly reduces the dependency of the noise on time step, especially when the grid length is *O*(1) km.

The net stochastic temperature increment Δ*θ*_{inc} = Δ*θ*_{stochastic} − Δ*θ*_{average} thus has standard deviation proportional to (*A*_{t}/Δ*A*)^{1/2} times the mean increment. The ratio *A*_{t}/Δ*A* in Eq. (49) can be written (*h*/*n*_{g}Δ*x*)^{2}, so the standard deviation of the noise will depend on *h*/(*n*_{g}Δ*x*).

### b. Key features

The scheme outlined has the following physically sensible features:

- The key parameter is $\overline{n}$ [Eq. (16)], the average number of thermals triggered in the selected area Δ*A* in time Δ*t*.
- The stochastic increments have zero mean on average (in a steady system).
- The stochastic increments have an amplitude proportional to the mean increments times *λ*^{−1/2}. Thus, more thermals means smaller stochastic increments.
- The stochastic increments have time correlation (via an AR1 process) related to the turnover time. If this is less than the time step, stochastic increments are uncorrelated from time step to time step.

## 5. Comparison with other schemes

The implemented scheme uses the parameterized increments [using the boundary layer parameterization described by Lock et al. (2000)]. However, in a CBL and cumulus-capped CBL, the results broadly resemble the bulk scheme described above, so we shall use this as a basis for comparison.

In the Lock scheme (A. Lock 2016, personal communication), the boundary layer increments are multiplied by a random factor *g*_{i}, which can be either a uniformly distributed variable *ε*_{U} in the range [−1, 1] or a temporally correlated variable generated by a first-order autoregressive process with coefficient *μ*_{L} given by

$$\mu_{L}=\exp(-\delta t/\tau),$$

with *τ* a fixed autocorrelation time scale, usually taken to be 10 min. Thus, the temporal correlation is very similar to our scheme, apart from the fixed choice of time scale instead of the time scale varying with conditions. In practice, the fixed time scale of 10 min is quite representative of the values produced by our scheme in convective conditions. A factor *δt*/*δt*_{0} scales the increment with time step, with *δt*_{0} = 60 s the time step of the UKV model in which the scheme was first implemented. Both *g*_{i} and *ε*_{U} have variance 1/3, so the variance of the *θ* increment is 1/3 of the square of the mean increment. A single *g*_{i} is used for each box of a “supergrid” of *n*_{g} × *n*_{g} points, similar to our scheme, but the dependence on the ratio of eddy area to averaging area is absent in the Lock scheme, meaning that the same amplitude of noise is applied irrespective of the averaging area. As discussed in section 4a, our scheme reduces the standard deviation by a factor roughly *h*/(*n*_{g}Δ*x*). The time step dependence is the same, but normalized by *δt*_{0} rather than the eddy turnover time; with *h* ≈ Δ*x*, the Lock scheme therefore applies noise with amplitude roughly a factor *n*_{g} larger than our scheme.

In the KC scheme, perturbations of the form *α*_{sh}*η*⟨Φ^{2}⟩^{1/2} are added to the physics tendencies, where *α*_{sh} is a constant, *η* is a Gaussian random number field, and ⟨Φ^{2}⟩^{1/2} is the standard deviation of the variable given by the turbulence scheme in their model. The scheme is applied to the variables temperature *T*, vertical velocity *w*, and specific humidity *q*. The Gaussian random number field contains spatial and temporal correlations. The temporal correlation arises from holding the field fixed for 10 min, then generating a new field; the 10 min represents the eddy turnover time. Spatial correlation in *η* is provided by convolving a set of random numbers drawn from a Gaussian distribution with a horizontal two-dimensional Gaussian with standard deviation 2.5 grid points, representing a correlation length scale of 5Δ*x* (in our notation). The resulting field is normalized to have mean 0 and variance *E*(*η*^{2}) = 1.
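The spatial-correlation step of the KC scheme can be sketched via FFT-based Gaussian smoothing (an illustration only; the periodic boundaries are an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(3)

nx = ny = 128
sigma = 2.5   # smoothing std dev in grid points (correlation length ~5 dx)

# White Gaussian noise, convolved with a 2D Gaussian via the FFT
# (periodic boundaries assumed for simplicity).
white = rng.standard_normal((nx, ny))
kx = np.fft.fftfreq(nx)
ky = np.fft.fftfreq(ny)
KX, KY = np.meshgrid(kx, ky, indexing="ij")
# The Fourier transform of a Gaussian kernel is a Gaussian in wavenumber.
filt = np.exp(-2.0 * (np.pi * sigma) ** 2 * (KX**2 + KY**2))
eta = np.fft.ifft2(np.fft.fft2(white) * filt).real

# Normalize to mean 0 and unit variance, as in KC.
eta = (eta - eta.mean()) / eta.std()
print(eta.mean(), eta.var())
```

The resulting field is smooth on scales of a few grid points, mimicking the “Gaussian bump” structure described above.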

This has similarities with our scheme; in our scheme, we choose a fixed area in space, generally larger than the expected horizontal correlation length scale, on the grounds that the horizontal scales well represented by the model are much larger than the spatial correlation scale. However, we explicitly represent the temporal correlation, on the grounds that the time step is shorter than the temporal correlation scale. Thus, the spatial pattern is stepped (or “checkerboarded”) but the temporal pattern varies relatively smoothly from time step to time step. The KC scheme does the reverse, representing smooth correlations explicitly in space on scales roughly the smallest that are well represented by the model, but with a stepped time profile: the field is held fixed for each 10-min period, with no correlation between successive periods. Thus, the physical idea behind the temporal correlation is consistent with our scheme but the implementation does not adapt to different conditions in the same way.

Note that a revised version of the KC scheme has been published more recently (after first submission of this paper) in Hirt et al. (2019); this, among other changes, implements an autoregressive temporal correlation scheme similar to that of the Lock scheme discussed above. The scheme is implemented as in Eq. (33), with *ε* a spatially correlated “Gaussian bump” field and the same fixed correlation time as used in KC. Though the end result is very similar, we regard a key difference in our scheme as the link to the Poisson process expressed in the starting-point equation, Eq. (27), which provides a self-consistent scale adaptivity, as well as the physically based correlation.

The “checkerboard” in space in our scheme is necessary as the amplitude and correlation time scale vary in space (and time). No doubt some smoothing could be applied to the spatial pattern of parameters, if necessary, but in practice the increments in a time step are so small that this structure does not seem to have any significant impact. Both schemes assume vertically coherent perturbations.

In KC, *α*_{sh} is given by Eq. (56), in which *l*_{∞} is the asymptotic mixing length describing the average size of an eddy assumed in the turbulence parameterization, *dt* is equal to the “temporal resolution of the model,” 25 s, and *α*_{sh,Φ} is a tunable parameter set equal to 2 for all variables. This does not seem to make physical sense. The stochastic increment applied *in each time step δt*, over the 10 min that a choice of *η* applies, is then independent of *δt* if *δt* ≡ *dt*. This means that if one were to halve the model time step, twice the total increment would be applied in the 10-min period for which *η* applies. Given the time step of 25 s quoted, this would also lead to a very large increment. Equation (56) only makes physical sense if *dt* is a fixed time scale independent of the time step. Indeed, the revised scheme described in the appendix of Rasp et al. (2018) replaces it with the eddy turnover time *τ*_{eddy}. The tuning parameter was also changed (to 7.2 from 2). This partly offsets the change in time scale from 25 to 600 s, but clearly a change from 2 to 48 would be required to completely offset the change in time scale. The authors state that the “tuning factor was chosen so that the effects of the PSP scheme were noticeable but reasonable”; we understand this to mean that the same methodology as KC was applied, optimizing the amount and onset time of domain-integrated precipitation for a strongly nonequilibrium case, followed by a check that the same choice of tuning constant gave a similar improvement in other nonequilibrium cases while leaving the total precipitation largely unchanged in equilibrium cases. Whatever it means, clearly the size of perturbations applied in the revised scheme should be a factor of around 7 smaller than in the KC scheme. The scheme reported by Hirt et al. (2019) also uses *τ*_{eddy}.

For temperature *T*, the increment Δ*T*_{KCR} in a time step *δt* in the revised scheme depends, given *E*(*η*^{2}) = 1, on three factors: the averaging scale (the equivalent of our *n*_{g}), the length scale of turbulent eddies, and the variability in temperature due to turbulent eddies. The averaging scale is based upon a very similar premise and differs only in being more (hopefully realistically) physically flexible in our scheme (i.e., if we choose a larger averaging scale, we get smaller variability). A key difference in formulation is that we take the length scale as the length scale occupied, on average, by one eddy, rather than the mixing length in a diffusion approach, which is assumed to be the scale of the eddy itself. Thus *l*_{∞}, typically 150 m, is much smaller than *h* (and, indeed, in some schemes is taken to be 0.15*h*). However, including the factor of *α*_{sh,T} = 7.2 in this term means *α*_{sh,T}*l*_{∞}/*h* ≈ 1.

Garratt (1994) suggests that both the turbulence length scale and the temperature variance vary strongly with height in the CBL: near the surface the length scale is smaller and ⟨*θ*′^{2}⟩ is larger. Thus, if the local variance from the turbulence scheme is used at a given height, it would seem more appropriate to use a height-varying length scale. The effect would be to at least partially cancel the variability in each.

We have focused primarily on amplitude of perturbations, and particularly on temperature. The recent enhancements of the KC scheme by Hirt et al. (2019) focus on a nondivergent wind perturbation which is found to force the vertical velocity more effectively. This seems to be a useful innovation; Hirt et al. (2019) emphasize that this enhances coupling with the resolved scales as less of the perturbation projects onto sound waves, so comparing magnitudes may not give a true reflection of impact. Whether this is physically important at the scales we are applying perturbations at (several km) may be questionable, but the approach may become more important as models move more into the subkilometer “turbulence gray zone.” We did not pursue this in this initial implementation, in part, because the MetUM already has a kinetic energy backscatter scheme and, in part, to maintain simplicity. It is certainly likely that small-scale vertical motions at the top of the boundary layer play a crucial role in coupling to shallow cumulus, but this is likely to be at a scale much smaller than those we are considering, and future work will focus on extending the scheme to include parameterized coupling with unresolved vertical motion in the cumulus scheme along with the backscatter scheme.

In summary, at least when considering the magnitude of temperature fluctuations, the schemes are broadly similar in both magnitude and dependencies. The two schemes will produce different vertical profiles of perturbation; ours is simple and follows the parameterization profile. The KC scheme has a profile dependent on the profile of variability, but seems slightly flawed in not including the compensating profile of turbulence length scale. Our scheme seems better justified when considering the relative sizes of grid box and time step compared with spatial and, in particular, temporal correlation scales. Furthermore, the Poisson-based stochastic process is properly scale adaptive (at least as far as the assumptions go), so it naturally gives higher amplitudes when run at smaller scales, which average to consistent amplitudes at larger scales. In practice, however, the manner in which correlations are applied probably makes little difference (as discussed further below); the main differences are that temporal correlation in our scheme depends explicitly on the local boundary layer properties, and that our random function is likely to be substantially more skewed.
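The scale adaptivity can be illustrated with a toy calculation (a sketch only, not the scheme itself): if the number of thermals in an averaging area is Poisson distributed with a mean proportional to the area, the relative fluctuation in that count, and hence the perturbation amplitude, falls as the inverse square root of the area.

```python
import numpy as np

rng = np.random.default_rng(42)

def relative_fluctuation(lam, n_samples, rng):
    """Relative fluctuation (N - lam)/lam of a Poisson-distributed
    thermal count with mean lam."""
    counts = rng.poisson(lam, size=n_samples)
    return (counts - lam) / lam

# Quadrupling the averaging area quadruples the mean thermal count,
# which halves the relative fluctuation (1/sqrt(lam) scaling).
small_area = relative_fluctuation(10.0, 200_000, rng)
large_area = relative_fluctuation(40.0, 200_000, rng)

print(round(small_area.std(), 2))  # ≈ 1/sqrt(10) ≈ 0.32
print(round(large_area.std(), 2))  # ≈ 1/sqrt(40) ≈ 0.16
```

The same construction also yields a more skewed distribution at small mean counts, consistent with the skewness remark above.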

Overall, despite their apparent differences in formulation, the schemes are notable more for their similarity than their differences, the key difference being the nondivergent increments in the later Hirt et al. (2019) variant.

## 6. Diagnostics

### a. Introduction to diagnostics

Diagnostics for testing the scheme have been chosen to allow examination of both the changes in magnitude induced by the scheme and the spatial differences. As the scheme is written for convective-scale models, careful interpretation of more “traditional” techniques is required. To account for the differences in interpreting diagnostics at convection-permitting rather than convection-parameterizing scales, either the metrics have been adapted (as discussed below) or cell statistics have been used. All of the diagnostics have been calculated over the analysis boxes shown in Fig. 1.

### b. Mean square difference

Here *P*_{p} is the hourly accumulation of precipitation in the perturbed forecast, and *P*_{c} is the hourly accumulation of precipitation in the control forecast. The normalization factor implies that the MSD will not be altered by the total amount of precipitation in the forecast, and so multiple cases with different precipitation amounts can be compared fairly.

The MSD is kept in its square form so that, when considering precipitation forecasts, it can be broken down into the MSD arising from points precipitating only in the control forecast, points precipitating only in the perturbed forecast, and points that have precipitation in the same location in both the perturbed and the control forecasts (common points). Considering only the common points (MSD_{common}) between both forecasts removes the “double penalty” problem faced by all gridpoint diagnostics at the convective scale. Removing this problem restores the traditional interpretation of the diagnostic at the convective scale: it measures magnitude differences rather than a mixture of magnitude and spatial differences. The threshold for the precipitation used within this study is an hourly accumulation of 1 mm.
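The decomposition into control-only, perturbed-only, and common contributions can be sketched as below. The normalization used here (by the mean precipitation of the two fields) is illustrative only; the paper's exact normalization factor is not reproduced.

```python
import numpy as np

def msd_decomposition(p_pert, p_ctrl, thresh=1.0):
    """Decompose a normalized mean square difference of hourly precipitation
    accumulations into contributions from common, perturbed-only, and
    control-only precipitating points (>= thresh mm)."""
    diff2 = (p_pert - p_ctrl) ** 2
    wet_p = p_pert >= thresh
    wet_c = p_ctrl >= thresh
    regions = {"common": wet_p & wet_c,
               "perturbed_only": wet_p & ~wet_c,
               "control_only": wet_c & ~wet_p}
    # Illustrative normalization so the result is insensitive to the total
    # rainfall amount (the paper's exact factor is not reproduced here).
    norm = diff2.size * (0.5 * (p_pert.mean() + p_ctrl.mean())) ** 2
    return {name: diff2[mask].sum() / norm for name, mask in regions.items()}

p_ctrl = np.array([[2.0, 0.0], [1.5, 0.2]])
p_pert = np.array([[1.2, 0.0], [0.0, 3.0]])
parts = msd_decomposition(p_pert, p_ctrl)
# `parts` holds the three nonnegative contributions; all are zero when the
# two fields are identical.
```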

### c. Fraction of common points

The fraction of common points (*F*_{common}) is a useful concept alongside MSD_{common} as it helps determine the fraction of the precipitating points used within the calculation of MSD_{common}. The fraction is defined by the total number of precipitating points in common between two forecasts (above an arbitrary threshold: in this case hourly precipitation accumulations exceeding 1 mm) divided by the worst-case number of precipitating points (i.e., when both forecasts have all the rainfall in different locations); i.e.,

*F*_{common} = *N*_{p,c}/(*N*_{p} + *N*_{c} − *N*_{p,c}),

where *N* is the number of precipitating points above the threshold, the subscripts *p* and *c* refer to the forecasts being compared (perturbed and control forecast, respectively), and *N*_{p,c} is the number of points common to both. As with the MSD, this diagnostic has been used in Leoncini et al. (2010), and here we take the normalization of Flack et al. (2018) so that *F*_{common} ranges between 0 and unity, where unity represents a forecast where all of the precipitation is in the same location, and 0 represents a forecast where none of the precipitation is in the same location. Had the subtraction of *N*_{p,c} not occurred in the denominator then the fraction would range between 0.0 and 0.5.

As in Flack et al. (2018), the *F*_{common} values are compared to the fraction of common points that would be yielded by a random relocation of precipitating cells, based upon the number of cells in the control forecast (*F*_{chance}). A value of *F*_{common} smaller than or close to *F*_{chance} implies that the spatial forecasts of the events are close to random.
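A minimal sketch of both quantities follows. The relocation in *F*_{chance} here moves individual precipitating points rather than whole cells, a simplification of the cell-based relocation described above.

```python
import numpy as np

def f_common(p_pert, p_ctrl, thresh=1.0):
    """Fraction of precipitating points (>= thresh) common to both forecasts,
    normalized so complete overlap gives 1 and no overlap gives 0."""
    wet_p = p_pert >= thresh
    wet_c = p_ctrl >= thresh
    n_common = np.count_nonzero(wet_p & wet_c)
    denom = np.count_nonzero(wet_p) + np.count_nonzero(wet_c) - n_common
    return n_common / denom if denom else np.nan

def f_chance(p_pert, p_ctrl, thresh=1.0, n_trials=1000, seed=0):
    """Monte Carlo estimate of the overlap expected by chance: randomly
    relocate the perturbed forecast's wet points and recompute the fraction.
    (A simplification: the paper relocates whole cells, not single points.)"""
    rng = np.random.default_rng(seed)
    wet_c = (p_ctrl >= thresh).ravel()
    n_wet_p = np.count_nonzero(p_pert >= thresh)
    scores = []
    for _ in range(n_trials):
        idx = rng.choice(wet_c.size, size=n_wet_p, replace=False)
        wet_rand = np.zeros(wet_c.size, dtype=bool)
        wet_rand[idx] = True
        n_common = np.count_nonzero(wet_rand & wet_c)
        denom = n_wet_p + np.count_nonzero(wet_c) - n_common
        scores.append(n_common / denom if denom else np.nan)
    return float(np.nanmean(scores))

a = np.array([[2.0, 0.0], [0.0, 3.0]])
b = np.array([[0.0, 2.0], [3.0, 0.0]])
print(f_common(a, a))  # 1.0 (perfect overlap)
print(f_common(a, b))  # 0.0 (no overlap)
```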

### d. Cell statistics

The precipitation field from each forecast comprises discrete areas, or cells, of rain. To determine the distribution of size and strength of these cells, the cells are first identified using a tracking algorithm described in Hanley et al. (2015). Each identified cell is then described by an area-equivalent diameter and a mean rain rate; the area-equivalent diameter is the diameter of a circle with the same area as the cell. Area and rain-rate thresholds are applied to the precipitation fields so that only cells with a specified minimum size and rain rate are considered in the evaluation. In this study, a minimum area of 2Δ*x* × 2Δ*x* is used and rain-rate thresholds of 1 and 4 mm h^{−1} are imposed, representing light and heavy rain, respectively. After the thresholds have been imposed, distributions of cell size and magnitude can be produced in the form of histograms.
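The paper identifies cells with the Hanley et al. (2015) tracking algorithm; the sketch below substitutes a simple 4-connected labeling for that algorithm (with an assumed grid spacing `dx`) and illustrates only the thresholding and area-equivalent diameter steps.

```python
import numpy as np
from collections import deque

def label_cells(rain, rate_thresh=1.0):
    """4-connected labeling of contiguous rain cells above rate_thresh (mm/h)."""
    wet = rain >= rate_thresh
    labels = np.zeros(rain.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(wet)):
        if labels[start]:
            continue
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if (0 <= ni < rain.shape[0] and 0 <= nj < rain.shape[1]
                        and wet[ni, nj] and not labels[ni, nj]):
                    labels[ni, nj] = current
                    queue.append((ni, nj))
    return labels, current

def cell_stats(rain, dx=1.5, rate_thresh=1.0, min_boxes=4):
    """(area-equivalent diameter [km], mean rate) for each cell covering at
    least min_boxes grid boxes (2dx x 2dx by default); dx in km."""
    labels, n = label_cells(rain, rate_thresh)
    stats = []
    for k in range(1, n + 1):
        mask = labels == k
        n_boxes = np.count_nonzero(mask)
        if n_boxes < min_boxes:
            continue
        area = n_boxes * dx * dx                # km^2
        d_eq = 2.0 * np.sqrt(area / np.pi)      # diameter of equal-area circle
        stats.append((d_eq, rain[mask].mean()))
    return stats

rain = np.zeros((6, 6))
rain[1:3, 1:3] = 5.0    # a single 2 x 2 cell at 5 mm/h
stats = cell_stats(rain, dx=1.5)
# One cell of area 9 km^2: area-equivalent diameter 2*sqrt(9/pi) ≈ 3.39 km
```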

## 7. Testing the scheme

### a. Test cases

While the scheme has been applied to a number of convective cases during its development, two test cases from summer 2017 are presented here. These cases exhibit very different convective behavior and have been chosen to show how the scheme behaves in a situation that is more stochastic in nature (scattered showers) compared to one that is not [a mesoscale convective system (MCS)]. The convective adjustment time scale (e.g., Done et al. 2006) is used to show how different these cases are by considering their placement in the spectrum of convective regimes (e.g., Flack et al. 2018). The cases have been named after the areas that either experienced flooding or were the most dynamically active.
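For orientation, the convective adjustment time scale can be estimated from CAPE and the precipitation rate. The sketch below uses the commonly quoted form with a factor of 1/2 and typical reference constants; both the prefactor and the constants are assumptions here, not taken from this paper.

```python
# Illustrative estimate of the convective adjustment time scale following the
# form used in the literature cited above (e.g., Done et al. 2006); the
# constants below are assumed typical values, not values from this paper.
C_P = 1005.0     # specific heat of dry air, J kg^-1 K^-1
RHO = 1.2        # reference air density, kg m^-3
T0 = 273.15      # reference temperature, K
L_V = 2.5e6      # latent heat of vaporization, J kg^-1
G = 9.81         # gravitational acceleration, m s^-2

def adjustment_timescale_hours(cape, precip_rate_mm_per_h):
    """CAPE in J kg^-1; precipitation rate converted to kg m^-2 s^-1."""
    p = precip_rate_mm_per_h / 3600.0   # 1 mm h^-1 == 1 kg m^-2 h^-1
    tau = 0.5 * (C_P * RHO * T0) / (L_V * G) * cape / p   # seconds
    return tau / 3600.0

# The same rain rate consuming more CAPE implies a longer time scale,
# i.e., convection further from equilibrium.
print(round(adjustment_timescale_hours(1000.0, 1.0), 1))  # → 6.7
```

Low values (rain rapidly consuming the available CAPE) indicate convective quasi-equilibrium; high values indicate nonequilibrium, locally triggered convection.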

#### 1) Coverack case: 18 July 2017

This case is highlighted by a flash flood that swept through the village of Coverack in southern Cornwall (Cornwall Council 2017). The convective event formed off the coast of Brittany at approximately 1200 UTC and progressed northward, reaching Coverack at around 1400 UTC. Part of the event anchored over Coverack and produced intense rainfall in the area for approximately 3 h, while the main part of the system continued past Coverack and into the surrounding moorland. The event formed as a result of a surface trough (Fig. 1a), and this forcing continued with the event, which eventually developed into an MCS. This rapidly decayed and later in the day another MCS formed over France, which then moved into southeast England causing surface-water flooding in Reading (Davies 2017). The convective adjustment time scale for this case begins at 4.2 h and reduces to 0.4 h by the end of the convection; it therefore sits somewhere in the middle of the regimes for the United Kingdom (cf. Flack et al. 2016). Combining the time scale's value with the knowledge that local forcing existed to keep the storm anchored places this case more toward the nonequilibrium end of the spectrum (despite the marginal time scale).

#### 2) Kent case: 5 August 2017

A high pressure center was located over continental Europe, and the United Kingdom was dominated by westerly flow (Fig. 1b). This led to showers forming in the lee of the Welsh mountains, which were then advected by the flow across England. As the showers formed, they intensified as a result of localized forcing from a surface trough. This surface trough then tracked with the showers, and they eventually organized into squall lines over central England. These lines then progressed eastward into East Anglia, the Thames estuary, and North Kent. During the event, one of the authors (D. L. A. Flack) was in North Kent. Along with the intense precipitation associated with this convective line, at the southern end of the line (over the North Kent coast) he observed a mesocyclone and three funnel clouds between 1502 and 1534 UTC. This case has a low convective adjustment time scale throughout the convective life cycle, varying between 1.1 and 0.1 h. These low values imply that the case is closer to the convective quasi-equilibrium end of the spectrum of convective regimes.

### b. Model setup

The scheme has been tested in the 1.5-km-resolution UKV configuration of the MetUM at version 10.6 with the Even Newer Dynamics for General Atmospheric Modeling of the Environment (ENDGAME) dynamical core (Wood et al. 2014). The model domain is shown in Fig. 1. This configuration uses the Lock et al. (2000) boundary layer scheme, the Wilson and Ballard (1999) microphysics scheme, the Edwards and Slingo (1996) radiation scheme, and the Porson et al. (2010) surface-exchange scheme. The two cases considered here are run such that the most intense convection of the case studies occurs at approximately 24 h into the simulation: the Coverack case is initiated at 1500 UTC 17 July 2017 and the Kent case at 1500 UTC 4 August 2017. These times are chosen to allow time for the perturbations to spin up so that they can influence the forecast of the convective events.

### c. Effect of the scheme

This section illustrates how the introduction of the stochastic scheme described in sections 3 and 4 influences the forecast in the MetUM, using the diagnostics introduced in section 6. As described in section 4, the area Δ*A* over which we consider the thermals is covered by *n*_{g} × *n*_{g} grid boxes. The perturbations to the field are calculated on a reduced two-dimensional grid (with each horizontal dimension reduced by a factor *n*_{g}), with every original grid box within an *n*_{g} × *n*_{g} gridbox area using the identical random perturbation. For each grid box in the original grid, the associated perturbation from the reduced grid is then multiplied by the perturbation to the field due to the boundary layer scheme in that grid box to obtain the resulting perturbation used in the scheme. This method is applied to potential temperature, moisture, and horizontal wind fields as described in Eq. (48). A value of *n*_{g} = 8 has been chosen so as to be close to the 5 or 6 Δ*x* required to resolve features on a grid, yet not so large that the stochastic perturbations become negligible. Sensitivity to this choice is tested in section 7d.
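The reduced-grid construction can be sketched as follows. The standard-normal draw stands in for the scheme's Poisson-based random function, and the field names and amplitudes are illustrative.

```python
import numpy as np

def apply_block_perturbation(field, bl_increment, n_g=8, rng=None):
    """Sketch of the reduced-grid construction: one random number per
    n_g x n_g block, replicated to the full grid, then multiplied by the
    boundary layer increment for each grid box and added to the field.
    The standard-normal draw is a placeholder, not the scheme's
    Poisson-based distribution."""
    if rng is None:
        rng = np.random.default_rng()
    ny, nx = field.shape
    # Reduced grid: each dimension smaller by n_g (assumes divisibility).
    reduced = rng.standard_normal((ny // n_g, nx // n_g))
    full = np.repeat(np.repeat(reduced, n_g, axis=0), n_g, axis=1)
    return field + full * bl_increment

theta = np.full((32, 32), 290.0)   # potential temperature, K (illustrative)
incr = np.full((32, 32), 0.01)     # boundary layer increment, K (illustrative)
out = apply_block_perturbation(theta, incr, n_g=8,
                               rng=np.random.default_rng(1))
# Every 8 x 8 block of `out` shares a single perturbation value.
```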

For each case, the MetUM was initially run without the addition of any stochastic perturbations to produce a control run. The model run was then repeated five times with the scheme included, each time with a different random seed, to produce five ensemble members. This gives an indication of the spread of the forecasts when using the scheme. Figure 2 shows that in both cases the overall domain-averaged rainfall was quite well forecast, though certainly not perfectly. Initiation was (perhaps unusually) more gradual in the model than in the radar rainfall in the Coverack case. The initiation time of the convection is not significantly changed between the control run and any of the ensemble members. Furthermore, during the first 6 h after the convection is initiated, the domain-averaged rain rate is almost identical. The largest differences appear later in the run; although the peaks and troughs in the precipitation are similar, the magnitude varies by up to 10% or 20%. There is a qualitative impression that the growth of differences between members has saturated well before the end of the forecast; this is discussed more quantitatively below.

This variation is also evident in instantaneous precipitation fields in Fig. 3 for Coverack and Fig. 4 for Kent. In both cases the forecasts, while having larger-scale structure in common with the radar rainfall, have substantial errors at the cloud-cluster scale. The ensemble members show similar larger regions of precipitation (cloud clusters) but differ in the finer detail (cloud scale).

The fraction of common points between the control forecast and the ensemble member forecasts (using an hourly accumulation threshold of 1 mm) can be seen in Fig. 5, with the mean of all ensemble members shown as the solid line and the dashed lines indicating ±1 standard deviation from the mean. When the convection is fully developed in the domain, at a lead time of around 18 h, the fraction of common points generally has a value between 0.4 and 0.8, with a decreasing trend toward the lower end of the range. This supports the observation that, while the locations of the main regions of precipitation are superficially the same, the details in location and/or timing of the heaviest rain differ between forecasts.

The increases in *F*_{common} are associated with (i) the formation of new cells collocated in the ensemble members, (ii) enlargement/intensification of cells to encompass other smaller cells in the other members, or (iii) cells coming into the analysis domain from the boundaries, as the boundary conditions remain unchanged throughout each ensemble member. Subsequent falls in the diagnostic are expected as the forecasts begin to diverge from one another.

In the Coverack case, the entry of the MCS into the domain is preceded by the formation of small and very scattered shower cells, which develop in random locations in each ensemble member roughly 13 h into the forecast. The large area of organized precipitation then moves into the domain accompanied by smaller cells (again, in different locations depending on the ensemble member) and an increase in *F*_{common}. It is fully in the domain by 1500 UTC (*T* + 24 h) when *F*_{common} starts to increase again. By 1700 UTC, there are fewer small cells and then the organized system moves out of the domain. It would seem that the increases in *F*_{common} coincide with the decrease in the number of smaller cells.

The sharp drop at *T* + 26 h in Fig. 5 in the Kent case is associated with the squall lines leaving the analysis domain and so reflects different timings alongside a drop in the number of precipitating cells, as indicated by the drop in *F*_{chance}.

The difference in magnitude for the common points between the control and the ensemble forecasts is shown in Fig. 6, where the mean square difference of common points between the control and each ensemble member is less than 0.5 for all runs. For the Coverack case, the ensemble-averaged MSD_{common} remains fairly constant over time (though with a peak at *T* + 18 h), probably because the forecast was dominated by the MCS. The Kent case shows more temporal variation, which can be associated with the development of the showers and their merger into squall lines.

Cell statistics in Fig. 7 (for Coverack) and Fig. 8 (for Kent) show that the distributions of cell size and magnitude are largely similar for each ensemble member. Overall, the total number of precipitation cells (>1 mm h^{−1} threshold) appears to increase when the scheme is used, and there is a hint of more larger cells in the Coverack case, although in general the distributions of size and magnitude of precipitation cells are similar for the control and each ensemble member. Together with the MSD and *F*_{common} plots, we can conclude that the cells have similar characteristics for the control and each ensemble member but appear in different locations at any particular time (with evidence that there may be more of the larger cells at the 1 mm h^{−1} threshold). The Kent case, on the other hand, shows a general reduction in the total number of precipitation cells, which is likely linked to the slightly stronger upper-level forcing in this case compared to the Coverack case.

### d. Sensitivity to number of grid boxes, n_{g}

As described in section 4, the area Δ*A* in the scheme is defined using *n*_{g} grid boxes. This adjustable parameter determines how the stochastic perturbations vary spatially. A value of 8 is chosen by default for reasons described previously. However, it is still possible to use a smaller (or larger) value of *n*_{g} and this section explores the sensitivity of the scheme to the value of *n*_{g} in model runs for the cases described above. Specifically, values for *n*_{g} of 2 and 4 are compared with the *n*_{g} = 8 runs already described.

The key features in section 4 highlight that a smaller area Δ*A* would hold fewer thermals and thus the stochastic increments would be larger in a physically consistent way. This is confirmed in the distributions of the instantaneous perturbations shown in Fig. 9 for both cases. Note that the instantaneous perturbations are those added in a time step; as discussed in section 7c, the total perturbation amplitude delivered by temporally correlated perturbations is of order

There are some differences in the distributions between the cases though. Specifically, in the Kent case with increased *n*_{g}, the distribution shows greater frequency of points with a perturbation value of zero, and there is also a narrowing of the distribution (cf. *n*_{g} = 8 and *n*_{g} = 2; Fig. 9b), whereas for the Coverack case (cf. *n*_{g} = 8 and *n*_{g} = 4; Fig. 9a), the distribution only becomes narrower and there is no change in the frequency of points with a perturbation size of zero. These changes in the distribution are linked to the more numerous convection, and hence perturbation occurrence, in the Kent case, compared to the more locationally restricted perturbations (mainly around the MCS) in the Coverack case.

The sensitivity of the forecast to a change in the value of *n*_{g} is shown by the time series of average rain rate in Figs. 10a and 10b for the Coverack and Kent cases, respectively. Using a lower value of *n*_{g} does not significantly change the precipitation amount; the rainfall rates using these lower values of *n*_{g} actually fall within the range of values from the ensemble member forecasts (not shown here). Instantaneous rain rate plots over the Coverack area at the time when the precipitation was anchored over the village (third row of Fig. 3, and the equivalent Fig. 4 for Kent) also show that the main regions of heavy precipitation are very similar in all ensemble members and in runs with lower values of *n*_{g}, although, again, the finer detail is different. Cell statistics (see Figs. 13a and 13b for Coverack and Kent, respectively) show that the distributions of cell size and intensity are not sensitive to the value of *n*_{g}.

The numbers of common points between the control (no scheme) forecast and forecasts using the scheme with varying values of *n*_{g} are largely similar for both cases. Small changes in *F*_{common} are observed with smaller perturbation areas (see Figs. 11a and 11b for Coverack and Kent, respectively). This is echoed in the mean square difference of the common points (see Figs. 12a and 12b for Coverack and Kent, respectively). This may reflect the idea that it is the variability at resolvable scales that matters; the scheme is designed to produce similar variability on these scales even if a smaller scale is used within the scheme.

### e. Sensitivity to amplitude

The magnitude of the perturbations produced by this scheme at each time step is of order 20 times smaller than that of the scheme (A. Lock 2016, personal communication) described in section 5. It was demonstrated in the previous section that changing the seed in the calculation of the perturbations, or the area over which the perturbations change, did not significantly change the overall forecast compared with a forecast without the scheme (though substantial differences develop at small scales). This section considers the effect of the magnitude of the perturbation on the results. This is achieved by multiplying the second term on the right-hand side of Eq. (48) by a factor (hereafter Ampfactor), thus increasing the perturbations by that factor. We use Ampfactor values of 2 and 10. These values are used purely for sensitivity tests; they have no physical basis, so the results would be less scientifically justified for use within a forecast. Distributions of instantaneous perturbations for the experiments are shown in Fig. 9.

Once again, there are no obvious differences in the larger-scale structure of precipitation fields or distribution of cell size with the larger perturbation. These larger perturbations do not appear to significantly affect the average rain rate (see Fig. 10) and the area of precipitation is largely similar with differences only in the detail (see bottom row in Figs. 3 and 4).

The detail can be studied through the other diagnostics. Differences between Ampfactor 1 and Ampfactor 2 simulations, in comparison with the control and ensemble spread, are minimal. Larger differences occur when multiplying the perturbations by an Ampfactor of 10. There is a consistent decrease in *F*_{common}, particularly for the Kent case, during the mature stages of the convective events (Fig. 11b). The ensemble-averaged MSD_{common} (Fig. 12) shows similar behavior for the Ampfactor 1 and 2 simulations, and even the Ampfactor 10 results show a similar spread in the ensemble, though again with some suggestion of more impact in the Kent case. Similarly, the size and magnitude of the cells, shown in Figs. 13a and 13b (for Coverack and Kent, respectively), do not change significantly with perturbation size.

## 8. Conclusions

A physically consistent but simple stochastic boundary layer scheme has been developed that takes a form most naturally expressed as multiplicative noise. The noise is Poisson in character, and consideration of magnitudes shows that the distribution can be highly skewed at convection-permitting scales. The scheme has been derived via a very different route but has been shown to have much in common with the KC scheme, though with the advantage that some parameters are more physically based.

Quantitative investigation of the scheme shows that with horizontal grid lengths around 1 km temporal correlation is far more important than spatial. Above all, however, it is notable that the size of perturbations is very small (overall heating, for example, generally less than 0.01 K in temperature delivered over a correlation time of around 10 min); at least an order of magnitude smaller than in some more ad hoc schemes. As a result, they have very little impact on the energetics of the forecast, so overall domain-averaged precipitation, for example, is essentially unchanged.

Differences between ensemble members initialized from the control do grow to produce significant differences between ensemble members at the scale of individual convective cells. Thus, they can plausibly account for lack of predictability at this scale. After around 12 h they appear to be roughly saturated. However, it must be emphasized that perturbations are continuously added to the system, so this represents the time scale to achieve a balance between addition of new perturbations, perturbation growth and dissipation, not just saturation of initial perturbations. On these time scales there is no strong evidence for substantial growth of ensemble spread at larger scales. However, it is likely that, in addition to this being limited by the domain size, the cases chosen have a finite lifetime of very active convection.

The results of the scheme are not sensitive to the choice of averaging area the perturbations are chosen to represent, at least where this remains within an order of magnitude of the grid scale. This was the intention behind the design of the scheme and is consistent with the idea that the greater amplitude of perturbations applied closer to the grid scale does not couple with the dynamics, and only perturbations at well-resolved scales are effective. However, we have not proven this by looking at the initial perturbation growth, as it seems clear that the scheme has its impact through the longer-term saturation of the continuously applied perturbations.

Some sensitivity to magnitude in difference diagnostics compared to no scheme is evident, which suggests that the amplitude of perturbations at resolved scales does have some effect, but the sensitivity is not strong. A factor-of-2 increase in magnitude produces results that are within the ensemble spread produced by the scheme, but an order-of-magnitude increase does lead to somewhat more rapid growth of ensemble spread, and generally marginally more spatial spread in the Kent case.

The scheme represents only the variability due to boundary layer turbulence in a horizontally homogeneous system. Other sources of boundary layer variability undoubtedly exist, such as small-scale heterogeneity in land surface and soil moisture. More ad hoc schemes may well represent actual boundary layer variability more quantitatively, but at the cost of having less basis in physical processes. Nevertheless, the scheme has been shown to be sufficient to produce substantial differences in forecasts at the scale of convective cells; in Part II of this paper, the differences between ensemble members are compared with those resulting from mesoscale differences in the initial and boundary conditions.

## ACKNOWLEDGMENTS

The authors acknowledge the use of the MONSooN system, a collaborative facility supplied under the Joint Weather and Climate Research Programme, which is a strategic partnership between the Met Office and the Natural Environment Research Council (NERC). We would also like to acknowledge the three reviewers, two anonymous and George Craig, for their input that has undoubtedly improved the clarity of this paper.

This work has been funded under the work program Towards end-to-end flood forecasting and a tool for real-time catchment susceptibility (TENDERLY) as part of the Flooding From Intense Rainfall (FFIR) project by NERC under Grant NE/K00896X/1. The data used are available by contacting D. Flack at david.flack1@metoffice.gov.uk and are subject to licensing.

## REFERENCES

Arakawa, A., and W. Schubert, 1974: Interaction of a cumulus cloud ensemble with large-scale environment, part I. *J. Atmos. Sci.*, **31**, 674–701, https://doi.org/10.1175/1520-0469(1974)031<0674:IOACCE>2.0.CO;2.

Beare, S., W. Tennant, and C. Sanchez, 2019: Stochastic physics code in the UM. Met Office Documentation Paper 081, 22 pp., https://code.metoffice.gov.uk/doc/um/latest/papers/umdp_081.pdf.

Carson, D. J., 1973: The development of a dry inversion-capped convectively unstable boundary layer. *Quart. J. Roy. Meteor. Soc.*, **99**, 450–467, https://doi.org/10.1002/qj.49709942105.

Clark, A. J., W. A. Gallus Jr., M. Xue, and F. Kong, 2009: A comparison of precipitation forecast skill between small convection-allowing and large convection-parameterizing ensembles. *Wea. Forecasting*, **24**, 1121–1140, https://doi.org/10.1175/2009WAF2222222.1.

Clark, P., N. Roberts, H. Lean, S. P. Ballard, and C. Charlton-Perez, 2016: Convection-permitting models: A step-change in rainfall forecasting. *Meteor. Appl.*, **23**, 165–181, https://doi.org/10.1002/met.1538.

Cornwall Council, 2017: Coverack flood. Accessed March 2018, http://www.cornwall.gov.uk/coverack.

Davies, R., 2017: UK—Emergency rescues after flash floods in southern England. FloodList, accessed March 2018, http://floodlist.com/europe/united-kingdom/flash-floods-south-england-july-2017.

Done, J. M., G. C. Craig, S. L. Gray, P. A. Clark, and M. E. B. Gray, 2006: Mesoscale simulations of organized convection: Importance of convective equilibrium. *Quart. J. Roy. Meteor. Soc.*, **132**, 737–756, https://doi.org/10.1256/qj.04.84.

Done, J. M., G. C. Craig, S. L. Gray, and P. A. Clark, 2012: Case-to-case variability of predictability of deep convection in a mesoscale model. *Quart. J. Roy. Meteor. Soc.*, **138**, 638–648, https://doi.org/10.1002/qj.943.

Ebert, E. E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. *Mon. Wea. Rev.*, **129**, 2461–2480, https://doi.org/10.1175/1520-0493(2001)129<2461:AOAPMS>2.0.CO;2.

Edwards, J., and A. Slingo, 1996: Studies with a flexible new radiation code. I: Choosing a configuration for a large-scale model. *Quart. J. Roy. Meteor. Soc.*, **122**, 689–719, https://doi.org/10.1002/qj.49712253107.

Flack, D. L. A., R. S. Plant, S. L. Gray, H. W. Lean, C. Keil, and G. C. Craig, 2016: Characterisation of convective regimes over the British Isles. *Quart. J. Roy. Meteor. Soc.*, **142**, 1541–1553, https://doi.org/10.1002/qj.2758.

Flack, D. L. A., S. L. Gray, R. S. Plant, H. W. Lean, and G. C. Craig, 2018: Convective-scale perturbation growth across the spectrum of convective regimes. *Mon. Wea. Rev.*, **146**, 387–405, https://doi.org/10.1175/MWR-D-17-0024.1.

Flack, D. L. A., P. A. Clark, C. E. Halliwell, N. M. Roberts, S. L. Gray, R. S. Plant, and H. W. Lean, 2021: A physically based stochastic boundary layer perturbation scheme. Part II: Perturbation growth within a superensemble framework. *J. Atmos. Sci.*, **78**, 747–761, https://doi.org/10.1175/JAS-D-19-0292.1.

Garratt, J. R., 1994: *The Atmospheric Boundary Layer*. Cambridge University Press, 316 pp.

Hanley, K. E., R. S. Plant, T. H. M. Stein, R. J. Hogan, J. C. Nicol, H. W. Lean, C. Halliwell, and P. A. Clark, 2015: Mixing-length controls on high-resolution simulations of convective storms. *Quart. J. Roy. Meteor. Soc.*, **141**, 272–284, https://doi.org/10.1002/qj.2356.

Hirt, M., S. Rasp, U. Blahak, and G. C. Craig, 2019: Stochastic parameterization of processes leading to convective initiation in kilometer-scale models. *Mon. Wea. Rev.*, **147**, 3917–3934, https://doi.org/10.1175/MWR-D-19-0060.1.

Hohenegger, C., and C. Schär, 2007: Predictability and error growth dynamics in cloud-resolving models. *J. Atmos. Sci.*, **64**, 4467–4478, https://doi.org/10.1175/2007JAS2143.1.

Hohenegger, C., D. Lüthi, and C. Schär, 2006: Predictability mysteries in cloud-resolving models. *Mon. Wea. Rev.*, **134**, 2095–2107, https://doi.org/10.1175/MWR3176.1.

Holtslag, A. A. M., and B. A. Boville, 1993: Local versus nonlocal boundary-layer diffusion in a global climate model. *J. Climate*, **6**, 1825–1842, https://doi.org/10.1175/1520-0442(1993)006<1825:LVNBLD>2.0.CO;2.

Johnson, A., and Coauthors, 2014: Multiscale characteristics and evolution of perturbations for warm season convection-allowing precipitation forecasts: Dependence on background flow and method of perturbation. *Mon. Wea. Rev.*, **142**, 1053–1073, https://doi.org/10.1175/MWR-D-13-00204.1.

Kaimal, J. C., J. C. Wyngaard, D. A. Haugen, O. R. Coté, Y. Izumi, S. J. Caughey, and C. J. Readings, 1976: Turbulence structure in the convective boundary layer. *J. Atmos. Sci.*, **33**, 2152–2169, https://doi.org/10.1175/1520-0469(1976)033<2152:TSITCB>2.0.CO;2.

Keil, C., F. Heinlein, and G. Craig, 2014: The convective adjustment time-scale as indicator of predictability of convective precipitation. *Quart. J. Roy. Meteor. Soc.*, **140**, 480–490, https://doi.org/10.1002/qj.2143.

Kober, K., and G. C. Craig, 2016: Physically based stochastic perturbations (PSP) in the boundary layer to represent uncertainty in convective initiation. *J. Atmos. Sci.*, **73**, 2893–2911, https://doi.org/10.1175/JAS-D-15-0144.1.

Kühnlein, C., C. Keil, G. Craig, and C. Gebhardt, 2014: The impact of downscaled initial condition perturbations on convective-scale ensemble forecasts of precipitation. *Quart. J. Roy. Meteor. Soc.*, **140**, 1552–1562, https://doi.org/10.1002/qj.2238.

Lean, P. W., 2006: The predictability of convective storms over the ocean. Ph.D. thesis, University of Reading, 179 pp.

Leoncini, G., R. Plant, S. Gray, and P. Clark, 2010: Perturbation growth at the convective scale for CSIP IOP 8. *Quart. J. Roy. Meteor. Soc.*, **136**, 653–670, https://doi.org/10.1002/qj.587.

Leoncini, G., R. Plant, S. Gray, and P. Clark, 2013: Ensemble forecasts of a flood-producing storm: Comparison of the influence of model-state perturbations and parameter modifications. *Quart. J. Roy. Meteor. Soc.*, **139**, 198–211, https://doi.org/10.1002/qj.1951.

Lock, A. P., A. R. Brown, M. R. Bush, G. M. Martin, and R. N. B. Smith, 2000: A new boundary layer mixing scheme. Part I: Scheme description and single-column model tests. *Mon. Wea. Rev.*, **128**, 3187–3199, https://doi.org/10.1175/1520-0493(2000)128<3187:ANBLMS>2.0.CO;2.

Lorenz, E. N., 1969: The predictability of a flow which possesses many scales of motion. *Tellus*, **21**, 289–307, https://doi.org/10.3402/tellusa.v21i3.10086.

Lumley, J. L., 1967: The structure of inhomogeneous turbulent flows. *Atmospheric Turbulence and Radio Wave Propagation*, A. M. Yaglom and V. I. Tatarsky, Eds., Nauka, 166–176.

Mason, P. J., 1988: The formation of areally-averaged roughness lengths. *Quart. J. Roy. Meteor. Soc.*, **114**, 399–420, https://doi.org/10.1002/qj.49711448007.

Melhauser, C., and F. Zhang, 2012: Practical and intrinsic predictability of severe and convective weather at the mesoscales. *J. Atmos. Sci.*, **69**, 3350–3371, https://doi.org/10.1175/JAS-D-11-0315.1.

Moin, P., and R. D. Moser, 1989: Characteristic-eddy decomposition of turbulence in a channel. *J. Fluid Mech.*, **200**, 471–509, https://doi.org/10.1017/S0022112089000741.

Panofsky, H. A., H. Tennekes, D. H. Lenschow, and J. C. Wyngaard, 1977: The characteristics of turbulent velocity components in the surface layer under convective conditions. *Bound.-Layer Meteor.*, **11**, 355–361, https://doi.org/10.1007/BF02186086.

Plant, R., and G. Craig, 2008: A stochastic parameterization for deep convection based on equilibrium statistics.

,*J. Atmos. Sci.***65**, 87–105, https://doi.org/10.1175/2007JAS2263.1.Porson, A., P. A. Clark, I. N. Harman, M. J. Best, and S. E. Belcher, 2010: Implementation of a new urban energy budget scheme in the MetUM. Part I: Description and idealized simulations.

,*Quart. J. Roy. Meteor. Soc.***136**, 1514–1529, https://doi.org/10.1002/qj.668.Rasp, S., T. Selz, and G. C. Craig, 2018: Variability and clustering of midlatitude summertime convection: Testing the Craig and Cohen theory in a convection-permitting ensemble with stochastic boundary layer perturbations.

,*J. Atmos. Sci.***75**, 691–706, https://doi.org/10.1175/JAS-D-17-0258.1.Raynaud, L., and F. Bouttier, 2016: Comparison of initial perturbation methods for ensemble prediction at convective scale.

,*Quart. J. Roy. Meteor. Soc.***142**, 854–866, https://doi.org/10.1002/qj.2686.Skamarock, W. C., and Coauthors, 2008: A description of the Advanced Research WRF version 3. NCAR Tech. Note NCAR/TN-475+STR, 113 pp., https://doi.org/10.5065/D68S4MVH.

Skamarock, W. C., S.-H. Park, J. B. Klemp, and C. Snyder, 2014: Atmospheric kinetic energy spectra from global high-resolution nonhydrostatic simulations.

,*J. Atmos. Sci.***71**, 4369–4381, https://doi.org/10.1175/JAS-D-14-0114.1.Wilson, D. R., and S. P. Ballard, 1999: A microphysically based precipitation scheme for the UK Meteorological Office Unified Model.

,*Quart. J. Roy. Meteor. Soc.***125**, 1607–1636, https://doi.org/10.1002/qj.49712555707.Wood, N., and Coauthors, 2014: An inherently mass-conserving semi-implicit semi-Lagrangian discretisation of the deep-atmosphere global nonhydrostatic equations.

,*Quart. J. Roy. Meteor. Soc.***140**, 1505–1520, https://doi.org/10.1002/qj.2235.Zhang, F., C. Snyder, and R. Rotunno, 2003: Effects of moist convection on mesoscale predictability.

,*J. Atmos. Sci.***60**, 1173–1185, https://doi.org/10.1175/1520-0469(2003)060<1173:EOMCOM>2.0.CO;2.