A physically-based stochastic boundary-layer perturbation scheme. Part I: formulation and evaluation in a convection-permitting model

Abstract: We present a simple, physically consistent stochastic boundary layer scheme implemented in the Met Office's Unified Model. It is expressed as temporally correlated multiplicative Poisson noise with a distribution that depends on physical scales. The distribution can be highly skewed at convection-permitting scales (horizontal grid lengths around 1 km), where temporal correlation is far more important than spatial. The scheme is evaluated using small ensemble forecasts of two case studies of severe convective storms over the United Kingdom. Perturbations are temporally correlated over an eddy-turnover time scale and may be similar in magnitude to, or larger than, the mean boundary layer forcing. However, their mean is zero and hence, in practice, they have very little impact on the energetics of the forecast, so overall domain-averaged precipitation, for example, is essentially unchanged. Differences between ensemble members grow; after around 12 h they appear to be roughly saturated. This represents the time scale to achieve a balance between the addition of new perturbations, perturbation growth, and dissipation, not just saturation of initial perturbations. The scheme takes into account the area chosen to average over, and results are insensitive to this area, at least where it remains within an order of magnitude of the grid scale.


Introduction
In recent years, there has been a rapid growth in the development and use of so-called convective-scale numerical weather prediction (NWP) systems using ''convection-permitting models'' (CPMs) (Clark et al. 2016). In building quantitative NWP systems, it is widely recognized that a number of aspects of the system are only known with some degree of uncertainty. Ensemble-prediction systems (hereafter ensembles) are designed to translate our understanding of uncertainty into reliable probabilistic forecasts, in particular of hazardous weather events. The uncertainty can come from many areas, including model uncertainty and initial and boundary condition uncertainty, all of which have been studied at the convective scale (e.g., Leoncini et al. 2010, 2013; Keil et al. 2014; Kühnlein et al. 2014). It is essential that these uncertainties are represented accurately.
All forms of uncertainty may have an important role in a forecast as errors, particularly at the convective scale, can, and do, generally grow quickly within forecasts (e.g., Lorenz 1969; Hohenegger and Schär 2007). The consensus in convective-scale studies is that errors grow primarily as a result of the processes occurring within convection (Zhang et al. 2003; Hohenegger et al. 2006). Initial studies considered the growth from initial conditions and indicated not only that error growth was faster than in larger-scale models that use parameterized convection (Hohenegger and Schär 2007) but that forecasts could be improved in certain situations through better specification of the initial conditions (e.g., Melhauser and Zhang 2012). As expected, the predominant impact of the error growth from initial condition uncertainty occurred at the start of the forecast (Keil et al. 2014; Kühnlein et al. 2014), though of course this uncertainty will have some impact downstream. On the other hand, due to their regular refreshing, the boundary condition uncertainty led to consistent error growth throughout the forecasts (Keil et al. 2014; Kühnlein et al. 2014). Uncertainty also arises, for many reasons, from the representation of physical processes within the model. Investigations of error growth from simulated model physics variations in CPMs suggest that they have the greatest impact when initiated within the boundary layer (Lean 2006) and during the initiation and development of convective events (Leoncini et al. 2010; Keil et al. 2014). However, it is clear that all forms of uncertainty need to be represented in the model to allow for some form of variability on all scales from the start of the run, despite the different interactions and impacts of the error growth in the different convective situations (e.g., Raynaud and Bouttier 2016).
Uncertainty in the boundary layer can, itself, arise from a number of sources. The first is ''model error'' due to the fact that model parameterizations are not perfect. Put more precisely, it is important to recognize that, in the case of the boundary layer, boundary layer schemes (including shallow convection) are designed to predict the ensemble mean of realizations of a turbulent boundary layer in quasi-equilibrium.
(It must be emphasized that the ensemble here is purely conceptual and nothing to do with ensemble-based NWP.) With given boundary conditions and forcing, we assume that there is a unique ensemble-mean solution, but no scheme always reproduces this perfectly. Schemes tend to have systematic errors; in an ideal world these would be correctable via a bias-correction scheme, leaving an unbiased error, but in many cases this has only been done through a form of assessment and manual tuning. Taking account of this has led to the use of ''multiphysics'' or ''poor-man's'' ensembles, e.g., Ebert (2001). While undoubtedly important, this source of error is conceptually difficult as it is not based upon physics, but rather the inadequacies of our modeling of physics. As Ebert (2001) shows, it is essential that the performance of each model is properly taken into account before an optimal ensemble is designed. If not approached very carefully, it can lead to the artificial enhancement of ensemble spread to meet quantitatively desirable criteria by deliberately introducing less accurate parameterizations.
Physically based uncertainty arises from the heterogeneity of the boundary layer. In CPMs we want a space-time mean over a finite domain, not the ensemble mean, but we have no direct methods to establish how to parameterize this. Even a boundary layer with horizontally homogeneous forcing results in a highly inhomogeneous instantaneous turbulent flow. We assume that the stationary system is ergodic, i.e., that sufficiently long (wide) time (space) averages tend to the ensemble mean. However, as we look at increasingly small space and time periods, our ability to predict flux divergences is limited by the inherent variability of the turbulent flow.
Heterogeneity can also be introduced by the surface-exchange process. Many surface schemes, including that in the Met Office Unified Model (MetUM), account for surface inhomogeneity using techniques based upon the concept of blending height (e.g., Mason 1988), which accounts for inhomogeneity on scales of a few hundred meters; theory tells us that the boundary layer then ''sees'' an effective surface. The key point is that this heterogeneity is on a smaller scale than the convective-overturning scale considered in our scheme, so as a source of variability this should be less important. Heterogeneity at the model grid scale and larger is explicit in the model (though, of course, many questions surround whether the surface forcing should be smoothed for numerical reasons). Nevertheless, there is probably a range of scales in between where surface heterogeneity, if it exists, contributes to boundary layer variability.
In addition, there is uncertainty associated with surface characteristics. This may be essentially fixed or slowly varying (e.g., exactly how tall are the trees, and hence what is their roughness length, or what is their leaf area index). This may be of very practical importance but is not a problem of physics but of information that can, in principle and, increasingly, in practice, be acquired. However, some variables, notably soil moisture, may have uncertainty that is impractical to reduce by measurement.
A number of studies exist concerning the impact of boundary layer uncertainty. One technique that has been used in atmospheric models when examining the predictability of convective events lumps small-scale stochastic variations in the boundary layer due to all unresolved turbulent processes into relatively arbitrary stochastic perturbation fields in the form of Gaussian ''bumps'' (e.g., Done et al. 2012; Leoncini et al. 2013; Flack et al. 2018). When used as the only form of uncertainty represented in the model, it has not produced large spread at the kilometer scale (Flack et al. 2018). However, the representation of the processes has been useful for determining different positions and magnitudes of scattered showers and convection in general. Here we go beyond this to develop a stochastic boundary layer perturbation scheme that specifically takes into account variation due to sampling the turbulent processes. The ideas below take this as a starting point, assuming that our ensemble-mean parameterization of the system is correct and that our task is to estimate the ''sampling error'' or noise that arises from averaging the system over a smaller time period and area than would be required for the average to be sufficiently converged to equal the ensemble mean. We do not take account of heterogeneity of the surface or model error, though the scheme is designed to be compatible with any boundary layer scheme.
It should be noted that the objective is to gain reasonable physically based estimates of the sampling uncertainty as a function of model resolution, as it must increase as the grid length (and hence the averaging area) decreases. At current CPM scales, the studies cited above lead us to expect that mesoscale uncertainty in initial and boundary conditions is the primary contributor to ensemble spread.
The methodology discussed in this paper is inspired by Plant and Craig (2008) but can be traced back to the idea of shot-noise decomposition of turbulence, which is briefly reviewed in section 2. The new scheme is described in section 3. We first introduce a simple bulk parameterization of the convective boundary layer (CBL) in order to introduce scaling, but also as a means to illustrate the approach in a simple framework. Section 4 outlines implementation of the scheme in the MetUM. Similar schemes have been implemented by others (e.g., A. Lock 2016, personal communication; Kober and Craig 2016, hereafter KC) and we compare our scheme with these in section 5.
We then test the scheme. This testing evaluates the impact of the scheme on its own; it is not expected that this source of uncertainty can account for the whole of forecast error, so evaluation is restricted in Part I of this paper to the impact on ensemble spread and not on overall forecast reliability. The latter is addressed in Flack et al. (2021, hereafter Part II). The testing (in Part II) compares forecast uncertainty due to initial and boundary condition uncertainty with the uncertainty resulting from the scheme running continuously. It relies on the existence of realistic initial and boundary condition perturbations from the Met Office Global and Regional Ensemble Prediction System (MOGREPS) scheme; this only has one boundary layer scheme in operational use. At present it is therefore not feasible to test cleanly with another boundary layer scheme (though its design would make this straightforward in a model with multiple schemes). To test the scheme, diagnostics designed to test convective-scale ensemble rainfall forecasts have been used; these are described in section 6 and applied to case studies in section 7 and in Part II. Finally, we summarize our conclusions regarding the scheme's characteristic behavior in section 8.

Shot-noise decomposition
The primary objective is to represent the variability that arises on space and time scales somewhat larger than the characteristic scales of boundary layer eddies. We do not wish to describe the detailed behavior of these eddies, but, rather, to capture the consequences of their intermittent and chaotic occurrence. Our starting point is to assume that eddies occur randomly and independently. This is a restriction that shall be discussed below. The idea of modeling a turbulent process as a superposition of random events was first proposed by Lumley (1967), and a particular methodology was described by Moin and Moser (1989). By analogy with the electronic noise that occurs due to the carrying of current by independent, random electrons, this approach is known as ''shot noise'' decomposition.
We assume that the turbulent flow is built from independent, random events, which one might term eddies. Suppose we consider just temperature; the warming provided by a ''unit eddy'' is characterized by a function f(z, x, t), with x and t relative to some reference point in the eddy. Thus, f includes length and time scales over which this warming is spread. Note that we have included a z dependence, but this, at least when averaged horizontally, must equal that of the mean field.
Eddies occur randomly in space and time, at a set of points {x_i, t_i} with amplitudes {a_i}, where a_i is an independent random variable with a given pdf. We can think of a function F made up of delta functions with these amplitudes and locations,

F(\mathbf{x}, t) = \sum_i a_i\, \delta(\mathbf{x} - \mathbf{x}_i)\, \delta(t - t_i) .  (1)

The probability of an event occurring in the infinitesimal interval dx dy dt is p dx dy dt, so the number of events follows Poisson statistics. Then the field of increments applied to the model state, denoted S, can be constructed by convolving this with f:

S = F * f .

It is then straightforward to show that the autocorrelation function of S (in time and/or space) is just that of f. An important parameter of the autocorrelation function, the integral time scale for the process, t_f, is obtained by integrating the autocorrelation function over time. Equivalent results exist for spatial autocorrelation functions and integral length scales, ℓ_xf and ℓ_yf.
We assume, without proof, that successive periods in time become uncorrelated if we average over a time longer than the integral time scale. Similarly, if we average over areas greater than the product of the integral length scales in x and y, successive regions in space become uncorrelated. We can then think of F as the source of a quantity such as temperature provided by the stochastic process. We assume that each event ''delivers'' a_i over an average time scale t_f and area A_t = ℓ_xf ℓ_yf. Under these circumstances, the variability of the space/time-averaged field is solely determined by the variability in a_i and the number of events. This may be illustrated by a simple example using a bulk parameterization of the CBL. This will help motivate the final scheme but also facilitate a discussion of relevant space and time scales compared with model resolution and time steps.
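The decomposition can be illustrated numerically. The following sketch (our own construction, with illustrative parameter values, not taken from the paper) builds a one-dimensional shot-noise field by convolving a Poisson event train with an exponential ''unit eddy'' kernel; the long-run mean of the field is the event rate times the kernel's time integral.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shot-noise model: events occur at Poisson-distributed times with rate p
# (events per second); each delivers unit amplitude spread over an
# exponential kernel with integral time scale tau_f.  Values illustrative.
p, tau_f = 2.0, 1.0
T, dt = 1000.0, 0.01
t = np.arange(0.0, T, dt)

# Poisson number of events in each step (amplitude a_i = 1 for simplicity)
events = rng.poisson(p * dt, size=t.size).astype(float)

# Exponential "unit eddy" kernel f(t); its time integral is ~tau_f
kernel = np.exp(-np.arange(0.0, 10.0 * tau_f, dt) / tau_f)

# The field S is the convolution of the event train with the kernel
S = np.convolve(events, kernel)[: t.size]

# Long-run mean of S should be close to p * tau_f = 2.0
print(S[int(10 * tau_f / dt):].mean())
```

With these values S fluctuates strongly about its mean, and its autocorrelation time is set by the kernel, as in the text.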

A bulk parameterization

a. Ensemble-mean parameterization
We take as a starting point a bulk model of the CBL (e.g., Carson 1973). To illustrate our approach, we shall ignore moisture and assume that the top of the boundary layer is a rigid, impermeable lid. Including these is straightforward but adds nothing to the insights gained with our simplified model. The key parameters are thus the boundary layer depth h and the surface buoyancy flux (expressed as a heat flux), given by

H = \rho C_p \langle w'\theta' \rangle_0 ,

where ρ is the density of air, C_p is the specific heat capacity of air at constant pressure, and ⟨w'θ'⟩₀ is the ensemble mean of the covariance of fluctuations of vertical velocity and potential temperature at the surface. From these we can derive the convective velocity scale w_*, and thence the scale for temperature fluctuations, θ_* [see, e.g., Garratt (1994), though our notation differs slightly]:

w_* = \left( \frac{g h \langle w'\theta' \rangle_0}{\langle\theta\rangle} \right)^{1/3} ,

and, requiring ⟨w'θ'⟩₀ = w_*θ_*,

\theta_* = \frac{\langle w'\theta' \rangle_0}{w_*} .

Here, ⟨θ⟩ is the vertical average of the ensemble-mean θ and g is the acceleration due to gravity. Note that the boundary layer literature often uses the opposite sign convention, but it is conceptually easier here to drop this sign convention as we are primarily considering CBLs. In a well-mixed box model with an impermeable lid the heat flux profile is linear:

\langle w'\theta' \rangle(z) = \langle w'\theta' \rangle_0 \left( 1 - \frac{z}{h} \right) .

The rate of change of potential temperature is given by

\frac{d\langle\theta\rangle}{dt} = \frac{\langle w'\theta' \rangle_0}{h} ,

which implies (ignoring the change of boundary layer depth)

\frac{d\langle\theta\rangle}{dt} = \frac{w_*\theta_*}{h} = \frac{\theta_*}{t_*} ,  (8)

where we have introduced an ''eddy-turnover time scale,'' t_* = h/w_*. As a numerical example, if H = 400 W m⁻² and h = 1000 m, then w_* ≈ 2.2 m s⁻¹ and θ_* ≈ 0.2 K. The eddy-turnover time scale t_* is about 460 s, or nearly 8 min.
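As a check on the numerical example, the scales can be computed directly; the air density, heat capacity, and mean potential temperature below are plausible assumed values, not taken from the text.

```python
# Bulk CBL scales for the numerical example above.  rho, C_p, and the
# mean potential temperature are plausible assumed values, not from the text.
rho, c_p = 1.2, 1004.0        # air density (kg m^-3), heat capacity (J kg^-1 K^-1)
g, theta_mean = 9.81, 300.0   # gravity (m s^-2), mean potential temperature (K)
H, h = 400.0, 1000.0          # surface heat flux (W m^-2), BL depth (m)

wtheta_0 = H / (rho * c_p)                            # kinematic flux (K m s^-1)
w_star = (g * h * wtheta_0 / theta_mean) ** (1 / 3)   # convective velocity scale
theta_star = wtheta_0 / w_star                        # temperature scale
t_star = h / w_star                                   # eddy-turnover time scale

print(w_star, theta_star, t_star)   # roughly 2.2 m/s, 0.15 K, 450 s
```

This recovers w_* ≈ 2.2 m s⁻¹ and t_* in the 450-460 s range; θ_* comes out near 0.15 K, consistent with the rounded value quoted above to within the assumed ρ, C_p, and ⟨θ⟩.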

b. A simple ''shot-noise'' version of the bulk CBL
We adopt a very similar approach to that of Plant and Craig (2008) in dealing with deep convection. We assume the following:
1) The mean heating is actually delivered to the boundary layer through a set of discrete thermals.
2) Our space-time averaging is on a scale at least the size of one thermal. (Spatial correlation will be discussed in a little more depth in section 5.)
3) The key source of variability is the number of thermals averaged over in a given area; we do not need to consider the structure of a thermal in detail.
A difference from Plant and Craig (2008) is that they also assume a distribution of thermal (i.e., cloud) properties based on statistical-mechanical arguments. We could, no doubt, extend our argument similarly, but for the present we shall concentrate on the number variability. It should be noted that this approach is the natural counterpart of the idea that the problem with the ''gray zone'' is that we are no longer averaging over a ''large'' number of clouds (as assumed by the pioneers of ''mass flux''-based convection schemes; Arakawa and Schubert 1974). Each thermal ''event'' delivers heating such that an area A_t is warmed (throughout the depth of the boundary layer) by θ_t, in a time scale t_t; the total heat delivered per thermal is thus Q_t = ρC_pθ_tA_th. Note that A_t is the average area occupied by one thermal; it includes both the upward motion (e.g., a convective thermal) and the downward motion (i.e., the whole region influenced by the thermal). It makes sense to assume that A_t ∝ h²; the result is that we can express the relative variability in boundary layer heating entirely in terms of h and w_* or, equivalently, h and t_*.
We assume that thermals occur entirely at random (independently) with probability p per unit area and time. The probability that a thermal occurs in an infinitesimal time dt in a given (small) area dA is p dA dt. The number of thermals that occur in a finite time Δt and finite area ΔA is thus given by a Poisson distribution. The mean number is n̄(ΔA, Δt) = pΔAΔt, so the number of thermals n has the distribution

P(n) = \frac{\lambda^n e^{-\lambda}}{n!} ,  (9)

with n̄ = λ = pΔAΔt. The total heat delivered is Q = pΔAΔtQ_t, so the average rate of change of mean potential temperature is this divided by ΔAhΔtρC_p; i.e.,

\frac{d\langle\theta\rangle}{dt} = p\,\theta_t A_t ,  (10)

which implies

p = \frac{1}{\theta_t A_t}\,\frac{d\langle\theta\rangle}{dt} .  (11)

Note that, in a Poisson distribution, the variance in the number of events equals the mean, so the variance in the heat supplied to volume ΔAh in Δt is

\sigma_Q^2 = \lambda\, Q_t^2 ,

and the standard deviation in the heating rate is given by

\sigma_{\dot\theta} = \frac{1}{\sqrt{\lambda}}\,\frac{d\langle\theta\rangle}{dt} .

If we apply this to the box model above, from Eqs. (8) and (11), we can write

p = \frac{1}{\theta_t A_t}\,\frac{\theta_*}{t_*} .

If we assume θ_t = θ_* (which makes sense to within a constant, which we shall assume is subsumed into the definition of the thermal area), then

p = \frac{1}{A_t t_*} ,

and the Poisson distribution [Eq. (9)] has

\lambda = \frac{\Delta A}{A_t}\,\frac{\Delta t}{t_*} .  (16)

The standard deviation of the heating can be written as

\sigma_{\dot\theta} = \left(\frac{A_t}{\Delta A}\,\frac{t_*}{\Delta t}\right)^{1/2}\frac{\theta_*}{t_*} .

Thus, the relative standard deviation of the heating is proportional to the inverse square root of the averaging area and time. This makes clear the intuitive simplicity of this model. If λ = 1, the average number of thermals in area ΔA and time Δt is 1. The actual heating rate over this area is

\frac{d\theta}{dt} = \frac{n}{\lambda}\,\frac{d\langle\theta\rangle}{dt} .

As stated above, the variability in heating is purely due to variability in n; if we had included a distribution of thermal characteristics, that would appear as an additional factor here. Thus, the stochastic forcing is best considered as multiplicative noise. It can, however, be written as additive, which helps comparison with other schemes. We can write

\frac{d\theta}{dt} = \frac{d\langle\theta\rangle}{dt} + \left(\frac{n}{\lambda} - 1\right)\frac{d\langle\theta\rangle}{dt} ,

or, in terms of the boundary layer scales,

\frac{d\theta}{dt} = \frac{\theta_*}{t_*} + \left(\frac{n}{\lambda} - 1\right)\frac{\theta_*}{t_*} .  (20)

In practice, numerical models work with discrete time steps, and model parameterization schemes produce an increment per time step Δθ. Over time step δt the
stochastically perturbed increment can be written as

\Delta\theta_{\rm stochastic} = (1 + \varepsilon_p)\,\Delta\theta ,  (21)

or, in terms of boundary layer scales,

\Delta\theta_{\rm stochastic} = (1 + \varepsilon_p)\,\frac{\delta t}{t_*}\,\theta_* ,

where ε_p = n/λ − 1 is a random number and the subscript is a reminder that this originates from a Poisson process, so a new value of ε_p arises only at intervals Δt (assumed, for simplicity, to be an integral multiple of δt). It has the properties

E(\varepsilon_p) = 0 ,  (22)

E(\varepsilon_p^2) = \frac{1}{\lambda} .  (23)

Furthermore, as λ increases, the Poisson distribution tends to a normal distribution, so ε_p tends to a normal variate with mean zero and variance λ⁻¹. The factor (δt/t_*)θ_* is just the average increment per model time step, Δθ_average. The factor δt/t_* is the average number of thermals per model time step at a point or, alternatively, since in practice δt < t_*, the average fraction of a thermal in a model time step at a point, as t_*/δt is the average number of model time steps a thermal lasts at a point.
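The skewness of this noise at convection-permitting scales is easy to demonstrate by sampling. In this sketch (with an illustrative value of λ), ε_p = n/λ − 1 has mean zero and variance 1/λ, but for λ well below 1 the most common outcome is ''no thermal'' (ε_p = −1):

```python
import numpy as np

rng = np.random.default_rng(1)

# Multiplicative Poisson noise eps_p = n/lam - 1 with n ~ Poisson(lam).
# lam = 0.2 is an illustrative small value, of the kind that arises for a
# kilometer-scale grid box over one correlation period.
lam = 0.2
n = rng.poisson(lam, size=1_000_000)
eps_p = n / lam - 1.0

print(eps_p.mean())        # close to 0
print(eps_p.var())         # close to 1/lam = 5
print((n == 0).mean())     # close to exp(-0.2): zero thermals most of the time
```

The heavy weight on n = 0, balanced by rare large positive values, is what makes the distribution highly skewed at these scales.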
The variance of the stochastic increment, Δθ_inc = ε_pΔθ_average, is thus

E(\Delta\theta_{\rm inc}^2) = \frac{1}{\lambda}\left(\frac{\delta t}{t_*}\right)^2\theta_*^2 .  (24)

Recalling that n is the number of thermals occurring in time Δt (and area ΔA), Eqs. (21) and (24) can be applied in two different ways. If Δt = δt, so we apply all the heating from a thermal in a model time step, the standard deviation of the stochastic increment gets smaller with smaller model time step, but only as the square root of the model time step, as the relative sampling noise increases.
If, instead of the model time step, we choose an averaging time Δt = t_*, i.e., apply the same increment over a time period equal to the correlation time scale, then this becomes

E(\Delta\theta_{\rm inc}^2) = \frac{A_t}{\Delta A}\left(\frac{\delta t}{t_*}\right)^2\theta_*^2 ,

so it is evident that we are merely spreading the overall increment over the number of time steps in t_*, and the remaining factor depends just on the number of thermals in area ΔA at any one time. The convective boundary layer depth h is typically 1 km, so a model with a horizontal averaging length scale of around 1 km will have ΔA ≈ A_t and a standard deviation in temperature increment (added over t_*) of θ_*.
In practice, the scales actually represented well by a model are several (at least five or six) grid lengths (Skamarock et al. 2014). Thus, ΔA ≈ m²Δx² > A_t, where m is about 5 or 6. Thus, in practice, the averaging area represents several thermals in models with horizontal grid length 1 km or larger, and the assumption of a lack of horizontal correlation seems very reasonable. Hence, for models with grid length greater than around 100-200 m, there seems no need to impose a horizontal correlation or structure on the stochastic increments.
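For orientation, the magnitudes involved can be checked with Eq. (16) for a UKV-like configuration; all inputs are example values, not a statement about the operational model.

```python
# Example magnitude check of lambda [Eq. (16)] for a UKV-like setup.
dx = 1500.0      # horizontal grid length (m)
m = 5            # effective-resolution factor: averaging length ~ m*dx
h = 1000.0       # boundary layer depth (m); thermal area A_t ~ h**2
t_star = 460.0   # eddy-turnover time scale (s)
dt_step = 60.0   # model time step (s)

A_t = h ** 2
dA = (m * dx) ** 2
lam = (dA / A_t) * (dt_step / t_star)   # mean thermals per averaging area/step
rel_sd = lam ** -0.5                    # relative sd of the heating

print(lam)      # ~7.3: several thermals per averaging area and time step
print(rel_sd)   # ~0.37: appreciable but not dominant sampling noise
```

With several thermals in the averaging area, the lack of imposed horizontal correlation is indeed a mild assumption at these grid lengths.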
On the other hand, the time step of numerical models depends on factors such as the Courant number (uΔt/Δx, where u is the maximum horizontal wind speed). The MetUM has a semi-implicit, semi-Lagrangian dynamical core that is not restricted by the Courant number in terms of stability, but still uses a time step with a maximum Courant number of a small integer (typically 5) to maintain accuracy. The operational 1.5 km UKV uses a 60 s time step. Many other models, such as WRF (Skamarock et al. 2008), are more restricted in time step. In the typical example in section 3a, t_* = 460 s. This tends to vary less than the surface buoyancy flux and boundary layer depth, as a large surface buoyancy flux (and hence large w_*) tends to produce a deep boundary layer. Thus, Δt/t_* is often less than, or even much less than, 1 in many convection-permitting models with resolution of a few km. This suggests that we do need to take into account the temporal correlation of perturbations. Nevertheless, since we are averaging over the horizontal structure, there seems little justification in treating the temporal correlation in detail. We have chosen to include time correlation simply via a first-order autoregressive process with autocorrelation time scale determined by the characteristic eddy-turnover time, t_*.

c. Temporally correlated Poisson perturbations as a first-order autoregressive process
Let us introduce a more general stochastic process η_i, taken to be the rate of change of state variables in time step i. For simplicity, as above, we shall just consider θ, so

\eta_i = \left(\frac{d\theta}{dt}\right)_i .

We model η_i as a (stationary) first-order autoregressive stochastic process, in which we assume that a portion μη_{i−1} of the heating comes from thermals that ''fired'' in previous time steps, and a portion n_iq comes from n_i new thermals in time step i, each delivering a heating rate q. Thus,

\eta_i = \mu\,\eta_{i-1} + n_i q .  (27)

Here μ ∈ [0, 1), and the number of new thermals n_i is assumed to follow the Poisson process.
Taking expected values (i.e., ensemble mean) of this first-order autoregressive stochastic process leads to

E(\eta) = \mu\,E(\eta) + \lambda q ,

with λ = n̄; hence,

E(\eta) = \frac{\lambda q}{1 - \mu} .  (29)

As above, this may be rearranged as an additive stochastic process; thus, if we write

\eta_i = E(\eta)\,(1 + f_i) ,

this is the same stochastic model as Eq. (20), except instead of simple multiplicative noise we have a multiplicative stochastic process with

f_i = \mu\,f_{i-1} + (1 - \mu)\,\varepsilon_i ,  (33)

where

\varepsilon_i = \frac{n_i}{\lambda} - 1 .  (34)

This has the properties

E(f_i) = 0

and

E(f_i^2) = \frac{1 - \mu}{1 + \mu}\,\frac{1}{\lambda} .  (37)

Note that the variance of the correlated factors is less than that of the uncorrelated noise because the same heating is delivered over a longer time determined by μ. We need to parameterize μ. The ''integral time scale'' for this process is given by

\delta t \sum_{m=0}^{\infty} \mu^m = \frac{\delta t}{1 - \mu} .

We assume that this is proportional to t_*; i.e.,

\frac{\delta t}{1 - \mu} = \alpha t_* ,

where α is a tuning parameter of order one. Clearly, negative μ makes no physical sense, so we can write

\mu = \max\left(0,\; 1 - \frac{\delta t}{\alpha t_*}\right) .  (41)

If μ = 0, perturbations are uncorrelated in time and are assumed to occur entirely in one time step δt. The option to assume uncorrelated perturbations everywhere has been implemented but, in the following, time-correlated perturbations are always used. From Eq. (29),

q = \frac{(1 - \mu)\,E(\eta)}{\lambda} .

Note that a single thermal delivers total heating (summed over all time steps) given by

q\,\delta t \sum_{m=0}^{\infty} \mu^m = \frac{q\,\delta t}{1 - \mu} = \frac{E(\eta)\,\delta t}{\lambda} ,

i.e., the mean heating per time step divided by the mean number of thermals per time step, as expected.
We can write the autocorrelation function in exponential form:

\mu^{|m|} = e^{-|m|\,\delta t/t_e} ,  (44)

so the exponential correlation time scale is t_e = −δt/ln μ. For small δt/αt_*, ln μ ≈ −δt/αt_*, so t_e ≈ αt_*. Thus, for small δt/αt_* the scheme approximates an exponential decay of the heating from each thermal. However, the parameterization of μ in terms of t_*, Eq. (41), gives us a natural cutoff. If δt ≥ αt_*, then we revert to an ''uncorrelated'' formulation without any ''memory,'' with E(f_i²) naturally tending to λ⁻¹ as in the uncorrelated case.
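The AR1 construction is easily verified by simulation. The sketch below (our own, with illustrative parameters) steps Eq. (33) forward with Poisson-driven forcing and checks that the variance approaches Eq. (37), well below the uncorrelated value 1/λ:

```python
import numpy as np

rng = np.random.default_rng(2)

# Temporally correlated Poisson noise: f_i = mu*f_{i-1} + (1-mu)*eps_i,
# with eps_i = n_i/lam - 1 and n_i ~ Poisson(lam).  Illustrative values.
dt, t_star, alpha = 60.0, 460.0, 1.0
mu = max(0.0, 1.0 - dt / (alpha * t_star))      # Eq. (41)
lam = 0.5                                       # mean thermals per step

n_steps = 500_000
eps = rng.poisson(lam, size=n_steps) / lam - 1.0
f = np.empty(n_steps)
f[0] = 0.0
for i in range(1, n_steps):
    f[i] = mu * f[i - 1] + (1.0 - mu) * eps[i]

target = (1 - mu) / (1 + mu) / lam              # Eq. (37)
print(f.var(), target)   # both ~0.14, vs 1/lam = 2 for uncorrelated noise
```

The reduction from 1/λ to the Eq. (37) value reflects the same heating being delivered over the memory time set by μ.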

Implementation in the MetUM

a. Procedure
As noted above, in practice, the scales actually represented well by a model are several (at least five or six) grid lengths. It is thus unlikely that variability on the grid scale will couple efficiently with the dynamics of the model. We take the area ΔA to be the area covered by n_g × n_g grid boxes and so assume that each area ΔA is spatially uncorrelated. The integer n_g is an adjustable parameter of the scheme. Over each area we obtain an average λ [using Eq. (16)] and μ [using Eq. (41)] with ΔA = n_g²Δx², Δt the model time step, A_t = h², t_* = h/w_m, and

w_m = \left(u_*^3 + 0.25\,w_*^3\right)^{1/3} .

This is taken directly from the MetUM boundary layer scheme (Lock et al. 2000) and represents a generalized boundary layer velocity scale that includes the effect of wind shear, through the friction velocity u_*, as well as the convective velocity scale. The latter is given by

w_*^3 = h\,\langle w'b'\rangle_0 ,

with ⟨w'b'⟩₀ the surface buoyancy flux, essentially Eq. (4) including the effect of moisture. The form of this essentially comes from surface-layer similarity and was first proposed by Panofsky et al. (1977) and is discussed in Holtslag and Boville (1993), though they use a value of 0.6 rather than 0.25; the choice of the constant 0.25 is as in the Lock et al.
(2000) scheme, though its origin is not entirely clear. This extends the use of the scheme to both neutral and (noting that a negative surface buoyancy flux reduces w_m) stable conditions, though the minimum value of 0.4 m s⁻¹, used for consistency with the boundary layer scheme, may override this in practice. This minimum has little impact in practice, as the scheme produces very small perturbations in stable or even very weakly convective conditions, but it does serve to ensure that extremely long correlation time scales do not occur. Note that it would probably be more consistent to average the surface buoyancy flux, friction velocity, and boundary layer depth over the n_g × n_g grid boxes to derive λ and μ, but our approach was adopted to be more efficient on a parallel computer and has little practical impact in relatively homogeneous terrain. Furthermore, because of differences in land use, the surface buoyancy flux can vary substantially from grid box to grid box, while λ tends to vary less. As discussed above, this heterogeneity due to differences in land use is an explicit and deterministic source of small-scale variability. The multiplicative nature of the scheme means that this heterogeneity carries through to the stochastic forcing.
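A minimal sketch of the velocity scale as we have described it (the 0.25 constant and the 0.4 m s⁻¹ floor follow the text; the function name and interface are ours):

```python
def w_m(u_star: float, w_star_cubed: float) -> float:
    """Generalized BL velocity scale combining shear and buoyant production.

    w_star_cubed is h * <w'b'>_0 and may be negative (stable conditions,
    negative surface buoyancy flux), which reduces w_m.
    """
    wm_cubed = u_star ** 3 + 0.25 * w_star_cubed
    wm = max(wm_cubed, 0.0) ** (1.0 / 3.0)
    return max(wm, 0.4)  # floor (m/s), for consistency with the BL scheme

print(w_m(0.3, 2.2 ** 3))  # convective case: well above the floor
print(w_m(0.3, -0.05))     # stable case: the 0.4 m/s floor applies
```

The floor caps t_* = h/w_m from above, which is what prevents unphysically long correlation time scales in near-calm stable conditions.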
For each of the grid boxes on the reduced two-dimensional grid (i.e., each box is n_g × n_g model grid boxes) a random integer n_i, drawn from a Poisson distribution, is generated. Note that with λ substantially less than 1, as is often the case, this distribution is highly skewed, with 0 occurring frequently. The random factor f_i is then generated using Eqs. (34) and (33). The latter reduces to f_i = ε_i if μ = 0, i.e., uncorrelated perturbations. Finally, a perturbation f_i times the boundary layer increment is added to the model physics forcing. The option is implemented to apply this selectively to the potential temperature, moisture, or wind fields; in practice we have applied it to all. The scheme has been implemented outside the boundary layer scheme as an additional, stochastic forcing scheme. Ideally, it should be moved inside the scheme and the stochastic increments applied to the surface exchange as well, in order to conserve energy. Currently, energy is only conserved to the extent that the ensemble average of the stochastic increments is zero.
The scheme can be summarized thus:

\Delta\phi_{\rm stochastic} = (1 + f_i)\,\Delta\phi ,

where φ represents the model u, v, θ, and q_v. If Δt < αt_*, f_i is a first-order autoregressive stochastic process, randomly forced by the Poisson-related variable ε_i [Eq. (34)] and with autocorrelation time scale αt_*. Since ε_i has zero mean and variance λ⁻¹, f_i has zero mean and variance given by Eq. (37). Thus, inclusion of time correlation greatly reduces the dependency of the noise on time step, especially when Δt ≪ αt_*, as it often is at resolutions O(1) km.
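Putting the pieces together, a single-column sketch of the perturbation step might look as follows. This is our own illustration, not the MetUM code: the reduced n_g × n_g supergrid, the application to all of u, v, θ, and q_v, and the coupling to a real boundary layer scheme are all omitted, and the function interface is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def stochastic_increment(incr, f_prev, h, wm, dx, n_g, dt, alpha=1.0):
    """Return (perturbed increment, updated noise state) for one supergrid box.

    incr   : deterministic boundary layer increment this time step
    f_prev : AR1 multiplicative-noise state from the previous step
    """
    t_star = h / wm                                    # eddy-turnover time
    lam = ((n_g * dx) ** 2 / h ** 2) * (dt / t_star)   # Eq. (16), A_t = h^2
    mu = max(0.0, 1.0 - dt / (alpha * t_star))         # Eq. (41)

    eps = rng.poisson(lam) / lam - 1.0                 # Poisson forcing, mean 0
    f = mu * f_prev + (1.0 - mu) * eps                 # AR1 update, Eq. (33)
    return (1.0 + f) * incr, f

# Step one box through an hour of 60 s time steps:
f = 0.0
for _ in range(60):
    d_theta, f = stochastic_increment(0.01, f, h=1000.0, wm=2.2,
                                      dx=1500.0, n_g=1, dt=60.0)
```

On average the perturbed increment equals the deterministic one; individual steps can be much larger, or of opposite sign, when λ is small.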
A time step dependence is, of course, present, as we have multiplied the increment in a time step by this noise factor. In a bulk model of the CBL our scheme reduces to

\Delta\theta_{\rm stochastic} = (1 + f_i)\,\frac{\delta t}{t_*}\,\theta_* .

The variance of the stochastic increment Δθ_inc = Δθ_stochastic − Δθ_average is thus

E(\Delta\theta_{\rm inc}^2) = \frac{1 - \mu}{1 + \mu}\,\frac{A_t}{\Delta A}\,\frac{t_*}{\Delta t}\left(\frac{\delta t}{t_*}\right)^2\theta_*^2 .  (49)

For Δt = δt ≪ αt_*, the leading factor (1 − μ)/(1 + μ) ≈ δt/(2αt_*), so Eq. (49) is approximately (1/2α)(A_t/ΔA)(δt/t_*)²θ_*². Note that the area factor A_t/ΔA in Eq. (49) can be written (h/n_gΔx)², so the standard deviation of the noise will depend on h/(n_gΔx).

b. Key features
The scheme outlined has the following physically sensible features:
1) The key parameter is λ [Eq. (16)], the average number of thermals triggered in the selected area ΔA in time Δt.
2) The stochastic increments have zero mean on average (in a steady system).
3) The stochastic increments have an amplitude proportional to the mean increments times λ^(−1/2); thus, more thermals means smaller stochastic increments.
4) The stochastic increments have a time correlation (via an AR1 process) related to the turnover time. If this is less than the time step, stochastic increments are uncorrelated from time step to time step.

Comparison with other schemes
The implemented scheme uses the parameterized increments [from the boundary layer parameterization described by Lock et al. (2000)]. However, in a CBL and cumulus-capped CBL, the results broadly resemble the bulk scheme described above, so we shall use this as a basis for comparison.
A relatively simple stochastic scheme has already been implemented by A. Lock (2016, personal communication), with more recent details in Beare et al. (2019). This assumes a stochastically perturbed increment of the form

\Delta\theta_{\rm stochastic} = \left(1 + g_i\,\frac{\delta t}{\delta t_0}\right)\Delta\theta .

Here g_i can be either a uniformly distributed variable ε_U in the range [−1, 1] or a temporally correlated variable generated from ε_U by a variance-preserving first-order autoregressive process with memory factor μ_L set by a fixed autocorrelation time scale τ, usually taken to be 10 min. Thus, the temporal correlation is very similar to our scheme, apart from the fixed choice of time scale instead of the time scale varying with conditions. In practice, the fixed time scale of 10 min is quite representative of the values of t_* found in the CBL. The factor δt/δt_0 scales the increment with time step, with δt_0 = 60 s the time step of the UKV model in which the scheme was first implemented. Both g_i and ε_U have variance 1/3, so the variance of the θ increment is

E(\Delta\theta_{\rm inc}^2) = \frac{1}{3}\left(\frac{\delta t}{\delta t_0}\right)^2\Delta\theta^2 .  (53)

This may be compared with the variance of the stochastic increment in our scheme applied to a bulk CBL [Eq. (49)]. The dependence on θ_* is essentially the same. The scheme is implemented with a single value g_i for each box of a ''supergrid'' of n_g × n_g points, similar to our scheme, but the dependence on the ratio of eddy area to averaging area is absent in the Lock scheme, meaning that the same amplitude of noise is applied irrespective of the averaging area. As discussed in section 4a, our scheme reduces the standard deviation by a factor of roughly h/(n_gΔx). The time step dependence is the same, but normalized by δt_0 rather than t_*; these differ by an order of magnitude. The remaining factors, about 1/2 [Eq. (49)] compared with 1/3 [Eq. (53)], are a consequence of the different assumed probability density functions. Overall, in conditions with h ≈ Δx and t_* ≈ 10 min, the Lock scheme will produce stochastic increments about a factor of 6n_g larger than our scheme. KC describe a scheme designed to address similar issues. A stochastic field is added to the model physics
tendencies [KC Eq. (1)]; thus the tendency ∂F/∂t of each variable F is augmented by a stochastic term proportional to a_sh η⟨F′²⟩^{1/2}, where a_sh is a constant, η is a Gaussian random number field, and ⟨F′²⟩^{1/2} is derived from the variance of the variables given by the turbulence scheme in their model. The scheme is applied to the variables temperature T, vertical velocity w, and specific humidity q. The Gaussian random number field contains spatial and temporal correlations. The temporal correlation arises from holding the field fixed for 10 min and then generating a new field; the 10 min represents the eddy turnover time. Spatial correlation in η is provided by convolving a set of random numbers drawn from a Gaussian distribution with a horizontal two-dimensional Gaussian with standard deviation 2.5 grid points, representing a correlation length scale of 5Δx (in our notation). The resulting field is normalized to have mean 0 and variance E(η²) = 1. This has similarities with our scheme; in our scheme, we choose a fixed area in space, generally larger than the expected horizontal correlation length scale, on the grounds that the horizontal scales well represented by the model are much larger than the spatial correlation scale. However, we explicitly represent the temporal correlation, on the grounds that the time step is shorter than the temporal correlation scale. Thus, the spatial pattern is stepped (or ''checkerboarded'') but the temporal pattern varies relatively smoothly from time step to time step. The KC scheme does the reverse, representing smooth correlations explicitly in space on scales roughly the smallest that are well represented by the model, but with a stepped time profile, each fixed period lasting an equal 10 min with no correlation between 10-min periods. Thus, the physical idea behind temporal correlation is consistent with our scheme, but the implementation does not adapt to different conditions in the same way.
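The KC spatial-correlation construction described above can be sketched in a few lines. This is an illustrative reconstruction from the description (white Gaussian noise convolved with a two-dimensional Gaussian of standard deviation 2.5 grid points, then renormalized to mean 0 and variance 1), not the KC code; the periodic boundary handling (`mode="wrap"`) is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correlated_field(ny, nx, sigma_gp=2.5, rng=None):
    """Spatially correlated Gaussian random field in the style of KC.

    White Gaussian noise is smoothed with a 2D Gaussian of standard
    deviation sigma_gp grid points (correlation length ~ 2*sigma_gp
    grid lengths), then renormalized to mean 0 and variance 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    white = rng.standard_normal((ny, nx))
    smooth = gaussian_filter(white, sigma=sigma_gp, mode="wrap")
    return (smooth - smooth.mean()) / smooth.std()
```

In the KC scheme such a field would be held fixed for 10 min (the eddy turnover time) and then regenerated with no correlation to the previous field.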
Note that a revised version of the KC scheme has been published more recently (after first submission of this paper) in Hirt et al. (2019); this, among other changes, implements an autoregressive temporal correlation scheme similar to that of the Lock scheme discussed above. The scheme is implemented as in Eq. (33), with ε a spatially correlated ''Gaussian bump'' field and the same fixed correlation time as used in KC. Though the end result is very similar, we regard a key difference in our scheme as the link to the Poisson process expressed in the starting-point equation, Eq. (27), which provides a self-consistent scale adaptivity, as well as the physically based correlation.
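The autoregressive temporal correlation used by the Lock and Hirt et al. (2019) schemes can be sketched as a first-order autoregressive [AR(1)] update. The exponential memory factor and the uniform innovation on [−1, 1] (variance 1/3, as quoted above for ε_U) are assumptions for illustration, not the exact MetUM formulation.

```python
import math
import random

def ar1_update(eta, dt, tau):
    """One AR(1) step for a temporally correlated random factor.

    eta : previous value of the correlated variable
    dt  : model time step (s)
    tau : fixed autocorrelation time scale (s), e.g. 600 s
    Innovations are uniform on [-1, 1] (variance 1/3); the
    sqrt(1 - mu**2) weighting keeps the stationary variance at 1/3.
    """
    mu = math.exp(-dt / tau)          # memory factor over one step
    eps = random.uniform(-1.0, 1.0)   # uncorrelated innovation
    return mu * eta + math.sqrt(1.0 - mu * mu) * eps
```

Iterating this update gives a sequence whose autocorrelation decays over the fixed time scale tau; in our scheme the analogous time scale t_* would instead vary with local boundary layer conditions.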
The ''checkerboard'' in space in our scheme is necessary because the amplitude and correlation time scale vary in space (and time). No doubt some smoothing could be applied to the spatial pattern of parameters, if necessary, but in practice the increments in a time step are so small that this structure does not seem to have any significant impact. Both schemes assume vertically coherent perturbations.
According to KC, the parameter a_sh is given by an expression involving l_∞, the asymptotic mixing length describing the average size of an eddy assumed in the turbulence parameterization; dt, equal to the ''temporal resolution of the model,'' 25 s; and a_sh,F, a tunable parameter set equal to 2 for all variables. This does not seem to make physical sense. The stochastic increment applied in each model time step over the 10 min that a choice of η applies is then independent of the length of the time step. This means that if one were to halve the model time step, twice the total increment would be applied in the 10-min period for which η applies. Given the time step of 25 s quoted, this would also lead to a very large increment. Equation (56) only makes physical sense if dt is a fixed time scale independent of the time step. Indeed, the revised scheme described in the appendix of Rasp et al. (2018) replaces dt with t_eddy. The tuning parameter was also changed (to 7.2 from 2). This partly offsets the change in time scale from 25 to 600 s, but clearly a change from 2 to 48 would be required to offset the change in time scale completely. The authors state that the ''tuning factor was chosen so that the effects of the PSP scheme were noticeable but reasonable''; we understand this to mean that the same methodology as KC was applied to optimize the amount and onset time of domain-integrated precipitation for a strongly nonequilibrium case, followed by a check that the same choice of tuning constant gave a similar improvement in other nonequilibrium cases while leaving the total precipitation largely unchanged in equilibrium cases. Whatever it means, clearly the size of perturbations applied in the revised scheme should be a factor of around 7 smaller than in the KC scheme. The scheme reported by Hirt et al. (2019) takes a corresponding form and so has a corresponding variance, with E(η²) = 1. This may be compared with Eq. (25). Overall, the ratio of the KC scheme standard deviation in temperature increment per time step to that in our scheme is a product of a few simple factors. Clearly t_* ~ t_eddy, the only difference being that in our scheme the local value is used rather than a global constant. The first three terms highlight the key physics in the scheme: the horizontal-averaging length scale (n_g), the length scale of turbulent eddies, and the variability in temperature due to turbulent eddies. The averaging scale is based upon a very similar premise and differs only in being more (hopefully realistically) physically flexible in our scheme (i.e., if we choose a larger averaging scale, we get smaller variability).
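The time-step pathology discussed above can be made concrete with a few lines of arithmetic: if the per-step increment does not scale with the model time step, the total increment accumulated over the 10-min window for which a given η applies grows in inverse proportion to the time step. The per-step increment of 0.01 K used below is purely illustrative.

```python
def total_increment(per_step_increment, delta_t, window=600.0):
    """Total perturbation accumulated over a 10-min window when the
    per-step increment does not scale with the model time step.

    per_step_increment : increment applied every step (e.g. K)
    delta_t            : model time step (s)
    window             : period over which eta is held fixed (s)
    """
    n_steps = window / delta_t
    return per_step_increment * n_steps

# Halving the time step doubles the accumulated increment:
# total_increment(0.01, 25.0) gives ~0.24 K per 10 min,
# total_increment(0.01, 12.5) gives ~0.48 K per 10 min.
```

Replacing the fixed dt in the prefactor with the actual model time step (or, equivalently, with t_eddy as in Rasp et al. 2018) removes this dependence.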
A key difference in formulation is that we take the length scale as the length scale occupied, on average, by one eddy, rather than the mixing length in a diffusion approach, which is assumed to be the scale of the eddy itself. Thus l_∞, typically 150 m, is much smaller than h (and, indeed, in some schemes is taken to be 0.15h). However, including the factor a_sh,T = 7.2 in this term means a_sh,T l_∞/h ≈ 1. Garratt (1994) suggests ⟨T′²⟩/θ_*² ranging from about 1.85 in the middle of the boundary layer to 10 or more near the top and bottom. Thus, in practice, the KCR scheme applies perhaps 3 or 4 times larger perturbations near the top and bottom of the boundary layer than our scheme. While perhaps of practical value, we would argue that this is physically inconsistent, as the turbulent length scale clearly decreases in these regions. For example, Kaimal et al. (1976, their Fig. 5) show smaller length scales lower in the boundary layer, where ⟨θ′²⟩ is larger. Thus, if the local variance from the turbulence scheme is used at a given height, it would seem more appropriate to use a height-varying length scale. The effect would be to at least partially cancel the variability in each.
We have focused primarily on the amplitude of perturbations, and particularly on temperature. The recent enhancements of the KC scheme by Hirt et al. (2019) focus on a nondivergent wind perturbation, which is found to force the vertical velocity more effectively. This seems to be a useful innovation; Hirt et al. (2019) emphasize that it enhances coupling with the resolved scales because less of the perturbation projects onto sound waves, so comparing magnitudes may not give a true reflection of impact. Whether this is physically important at the scales we are applying perturbations at (several kilometers) may be questionable, but the approach may become more important as models move further into the subkilometer ''turbulence gray zone.'' We did not pursue this in this initial implementation, in part because the MetUM already has a kinetic energy backscatter scheme and in part to maintain simplicity. It is certainly likely that small-scale vertical motions at the top of the boundary layer play a crucial role in coupling to shallow cumulus, but this is likely to be at a scale much smaller than those we are considering, and future work will focus on extending the scheme to include parameterized coupling with unresolved vertical motion in the cumulus scheme along with the backscatter scheme.
In summary, at least when considering the magnitude of temperature fluctuations, the schemes are broadly similar in both magnitude and dependencies. The two schemes will produce different vertical profiles of perturbation; ours is simple and follows the parameterization profile. The KC scheme has a profile dependent on the profile of variability, but seems slightly flawed in not including the compensating profile of turbulence length scale. Our scheme seems better justified when considering the relative sizes of grid box and time step compared with spatial and, in particular, temporal correlation scales, respectively. Furthermore, the Poisson-based stochastic process is properly scale adaptive (at least as far as the assumptions go), so it naturally gives higher amplitudes when run at smaller scales that average to consistent amplitudes at larger scales. However, in practice, the manner in which correlations are applied probably makes little difference (as discussed further below), and the main difference is that the temporal correlation in our scheme depends explicitly on the local boundary layer properties, and our random function is likely to be substantially more skewed.
Overall, despite their apparent differences in formulation, the schemes are notable more for their similarity than their differences, the key remaining difference being the nondivergent increments in the later Hirt et al. (2019) variant.

6. Diagnostics

a. Introduction to diagnostics
Diagnostics for testing the scheme have been chosen to allow examination of both the changes in magnitude induced by the scheme and the spatial differences. As the scheme is written for convective-scale models, careful interpretation of more ''traditional'' techniques is required. To account for the differences in interpretation of diagnostics on convection-permitting compared to convection-parameterizing scales, either the metrics have been adapted (as discussed below) or cell statistics have been used. All of the diagnostics have been calculated over the analysis boxes shown in Fig. 1.

b. Mean square difference
The mean square difference (MSD) is a simple measure of the difference in magnitude between two forecasts, and is more frequently used in its square-root form (e.g., Hohenegger and Schär 2007; Clark et al. 2009; Johnson et al. 2014). Here, however, we use it in the squared form, as in Leoncini et al. (2010, 2013), and, like Flack et al. (2018), we define the MSD as a normalized mean of the squared differences between P_p, the hourly accumulation of precipitation in the perturbed forecast, and P_c, the hourly accumulation of precipitation in the control forecast. The normalization factor ensures that the MSD is not altered by the total amount of precipitation in the forecast, so that multiple cases with different precipitation amounts can be compared fairly.
The MSD is kept in its squared form so that, when considering precipitation forecasts, it can be broken down into the MSD arising from points precipitating only in the control forecast, points precipitating only in the perturbed forecast, and points that have precipitation in the same location in both the perturbed and the control forecasts (common points). Considering only the common points (MSD_common) between both forecasts removes the ''double penalty'' problem faced by all gridpoint diagnostics at the convective scale. By removing the ''double penalty'' problem, the interpretation of this diagnostic at the convective scale reverts to its traditional interpretation of magnitude differences, as opposed to a mixture of magnitude and spatial differences. The threshold for precipitation used within this study is an hourly accumulation of 1 mm.
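The decomposition described above can be sketched as follows. This is a minimal illustration assuming a simple gridpoint mean of squared differences; the normalization factor used in the paper is not reproduced in this excerpt, so it is omitted here.

```python
import numpy as np

def msd_components(p_pert, p_ctrl, threshold=1.0):
    """Decompose an (unnormalized) mean square difference of hourly
    precipitation accumulations into contributions from points that
    precipitate only in the control forecast, only in the perturbed
    forecast, or in both (common points).

    p_pert, p_ctrl : hourly accumulations (mm) on the same grid
    threshold      : accumulation defining a precipitating point (mm)
    """
    p_pert = np.asarray(p_pert, dtype=float)
    p_ctrl = np.asarray(p_ctrl, dtype=float)
    sq = (p_pert - p_ctrl) ** 2
    wet_p = p_pert >= threshold
    wet_c = p_ctrl >= threshold
    n = sq.size
    return {
        "common": sq[wet_p & wet_c].sum() / n,     # basis of MSD_common
        "pert_only": sq[wet_p & ~wet_c].sum() / n,
        "ctrl_only": sq[~wet_p & wet_c].sum() / n,
    }
```

The "common" component is the one interpreted as a pure magnitude difference, free of the double-penalty effect.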

c. Fraction of common points
The fraction of common points (F_common) is a useful concept alongside MSD_common as it determines the fraction of the precipitating points used within the calculation of MSD_common. The fraction is defined as the total number of precipitating points in common between two forecasts (above an arbitrary threshold: in this case hourly precipitation accumulations exceeding 1 mm) divided by the worst-case number of precipitating points (i.e., when the two forecasts have all their rainfall in different locations); i.e., F_common = N_{p,c}/(N_p + N_c − N_{p,c}). Here, N is the number of precipitating points above the threshold, the subscripts p and c refer to the forecasts being compared (perturbed and control, respectively), and N_{p,c} is the number of points above the threshold in both. As with the MSD, this diagnostic has been used in Leoncini et al. (2010), and here we take the normalization of Flack et al. (2018) so that F_common ranges between 0 and unity, where unity represents a forecast in which all of the precipitation is in the same location and 0 a forecast in which none of the precipitation is in the same location. Had N_{p,c} not been subtracted in the denominator, the fraction would range between 0.0 and 0.5. As in Flack et al. (2018), the F_common values are compared to the fraction of common points that would be yielded by a random relocation of precipitating cells, based upon the number of cells in the control forecast (F_chance). A value of F_common smaller than or close to F_chance implies that the spatial forecasts of the events are close to a random forecast.

d. Cell statistics
The precipitation field from each forecast comprises discrete areas, or cells, of rain. To determine the distribution of size and strength of these cells, the cells are first identified using the tracking algorithm described in Hanley et al. (2015). Each identified cell is then characterized by an area-equivalent diameter and a mean rain rate; the area-equivalent diameter is defined as the diameter of a circle with the same area as the cell. Area and rain-rate thresholds are applied to the precipitation fields so that only cells with a specified minimum size and rain rate are considered in the evaluation. In this study, a minimum area of 2Δx × 2Δx is used and rain-rate thresholds of 1 and 4 mm h⁻¹ are imposed, representing light and heavy rain, respectively. After the thresholds have been imposed, distributions of cell size and magnitude can be produced in the form of histograms.
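A simplified sketch of the cell-identification step is given below: connected regions above a rain-rate threshold are treated as cells, cells below a minimum area are discarded, and each surviving cell is summarized by its area-equivalent diameter and mean rain rate. This stands in for, and is much simpler than, the Hanley et al. (2015) tracking algorithm, which also tracks cells in time.

```python
import numpy as np
from scipy import ndimage

def cell_statistics(rain_rate, rate_thresh=1.0, min_cells=2, dx=1.5):
    """Identify contiguous precipitation cells and return a list of
    (area-equivalent diameter in km, mean rain rate) pairs.

    rain_rate   : 2D rain-rate field (mm/h)
    rate_thresh : rain-rate threshold defining a cell (mm/h)
    min_cells   : minimum cell dimension in grid lengths (cf. the
                  2dx x 2dx minimum area used in the paper)
    dx          : grid length (km)
    """
    labels, n = ndimage.label(rain_rate >= rate_thresh)
    min_area = (min_cells * dx) ** 2
    stats = []
    for i in range(1, n + 1):
        mask = labels == i
        area = mask.sum() * dx * dx              # cell area in km^2
        if area < min_area:
            continue                             # too small; discard
        diameter = 2.0 * np.sqrt(area / np.pi)   # area-equivalent diameter
        stats.append((diameter, rain_rate[mask].mean()))
    return stats
```

Histograms of the returned diameters and mean rates then give the cell-size and cell-magnitude distributions discussed in the results.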

7. Testing the scheme

a. Test cases
While the scheme has been applied to a number of convective cases during its development, two test cases from summer 2017 are presented here. These cases exhibit very different convective behavior and have been chosen to show how the scheme behaves in a situation that is more stochastic in nature (scattered showers) compared to one that is not [a mesoscale convective system (MCS)]. The convective adjustment time scale (e.g., Done et al. 2006) is used to show how different these cases are by considering their placement in the spectrum of convective regimes (e.g., Flack et al. 2018). The cases have been named after the areas in which there was either flooding or the most dynamically active weather.

1) COVERACK CASE: 18 JULY 2017
This case is highlighted by a flash flood that swept through the village of Coverack in southern Cornwall (Cornwall Council 2017). The convective event formed off the coast of Brittany at approximately 1200 UTC and progressed northward, reaching Coverack at around 1400 UTC. Part of the event anchored over Coverack and produced intense rainfall in the area for approximately 3 h, while the main part of the system continued past Coverack and into the surrounding moorland.
The event formed as a result of a surface trough (Fig. 1a), and this forcing continued with the event, which eventually developed into an MCS. This rapidly decayed, and later in the day another MCS formed over France, which then moved into southeast England, causing surface-water flooding in Reading (Davies 2017). The convective adjustment time scale for this case begins at 4.2 h and reduces to 0.4 h by the end of the convection. Therefore, it sits somewhere in the middle of the regimes for the United Kingdom (cf. Flack et al. 2016).
Combining the time scale's value with the knowledge that local forcing existed to keep the storm anchored places this case more toward the nonequilibrium end of the spectrum (despite the marginal time scale).
2) KENT CASE: 5 AUGUST 2017

A high pressure center was located over continental Europe, and the United Kingdom was dominated by westerly flow (Fig. 1b). This led to showers forming in the lee of the Welsh mountains, which were then advected by the flow across England. As the showers formed, they intensified as a result of localized forcing from a surface trough. This surface trough then tracked with the showers, and they eventually organized into squall lines over central England. These lines then progressed eastward into East Anglia, the Thames estuary, and North Kent. During the event, one of the authors (D. L. A. Flack) was in North Kent. Along with the intense precipitation associated with this convective line, at the southern end of the line (over the North Kent coast) he observed a mesocyclone and three funnel clouds from 1502 to 1534 UTC. This case has a low convective adjustment time scale throughout the convective life cycle, varying between 1.1 and 0.1 h. These low values imply that the case is closer to the convective quasi-equilibrium end of the spectrum of convective regimes.

b. Model setup
The scheme has been tested in the 1.5-km-resolution UKV configuration of the MetUM at version 10.6 with the Even Newer Dynamics for General Atmospheric Modeling of the Environment (ENDGame) dynamical core (Wood et al. 2014). The model domain is shown in Fig. 1. This configuration uses the Lock et al. (2000) boundary layer scheme, the Wilson and Ballard (1999) microphysics scheme, the Edwards and Slingo (1996) radiation scheme, and the Porson et al. (2010) surface-exchange scheme. The two cases considered here are run such that the most intense convection of the case studies occurs approximately 24 h into the simulation. The Coverack case is initialized at 1500 UTC 17 July 2017 and the Kent case at 1500 UTC 4 August 2017. These times are chosen to allow time for the perturbations to spin up so that they can influence the forecast of the convective events.

c. Effect of the scheme
This section illustrates how the introduction of the stochastic scheme described in sections 3 and 4 influences the forecast in the MetUM, using the diagnostics introduced in section 6. As described in section 4, the area ΔA over which we are considering the thermals is covered by n_g × n_g grid boxes. The perturbations to the field are calculated on a reduced two-dimensional grid (with each horizontal dimension reduced by a factor n_g), with each original grid box within an n_g × n_g gridbox area using identical random perturbations. For each grid box in the original grid, the associated random perturbation from the reduced grid is then multiplied by the increment to the field from the boundary layer scheme in that grid box to obtain the resulting perturbation used in the scheme. This method is applied to the potential temperature, moisture, and horizontal wind fields as described in Eq. (48). A value of n_g = 8 has been chosen so as to be close to the 5Δx or 6Δx required to resolve features on a grid, and yet not so large that the stochastic perturbations become negligible. Sensitivity to this choice is tested in section 7d.
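The supergrid construction described above can be sketched as follows. The precise multiplicative form of Eq. (48) is not reproduced in this excerpt, so the (1 + η) factor applied to the boundary layer increment is an illustrative assumption.

```python
import numpy as np

def apply_supergrid_perturbation(increment, eta_coarse, n_g=8):
    """Apply one multiplicative stochastic perturbation step.

    increment  : 2D field of boundary layer scheme increments on the
                 full model grid (ny, nx), with ny, nx multiples of n_g
    eta_coarse : random factors on the reduced (ny//n_g, nx//n_g) grid
    Every grid box within an n_g x n_g block shares the same random
    factor, producing the 'checkerboard' spatial pattern described in
    the text.
    """
    # Expand each coarse-grid value into an n_g x n_g block.
    eta_full = np.kron(eta_coarse, np.ones((n_g, n_g)))
    # Illustrative multiplicative form standing in for Eq. (48).
    return increment * (1.0 + eta_full)
```

In the actual scheme the random factors would be drawn from the temporally correlated distribution of section 4, with amplitude depending on the local boundary layer state.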
For each case, the MetUM was initially run without the addition of any stochastic perturbations, to produce a control run. The model run was then repeated five times with the scheme included, each time with a different random seed, to produce five ensemble members. This gives an indication of the spread of the forecasts when using the scheme. Figure 2 shows that in both cases the overall domain-averaged rainfall was quite well forecast, but certainly not perfectly. Initiation was (perhaps unusually) more gradual in the model than in the radar rainfall in the Coverack case. The initiation time of the convection is not significantly changed between the control run and any of the ensemble members. Furthermore, during the first 6 h after the convection is initiated, the domain-averaged rain rate is almost identical. The largest differences appear later in the run; although the peaks and troughs in the precipitation are similar, the magnitude varies by up to 10% or 20%. There is a qualitative impression that the growth of differences between members has saturated well before the end of the forecast; this is discussed more quantitatively below.
This variation is also evident in the instantaneous precipitation fields in Fig. 3 for Coverack and Fig. 4 for Kent. In both cases the forecasts, while having larger-scale structure in common with the radar rainfall, have substantial errors at the cloud-cluster scale. The ensemble members show similar larger regions of precipitation (cloud clusters) but differ in the finer detail (cloud scale). The fraction of common points between the control forecast and the ensemble member forecasts (using an hourly accumulation threshold of 1 mm) can be seen in Fig. 5, with the mean of all ensemble members shown as the solid line and the dashed lines indicating ±1 standard deviation from the mean. When the convection is fully developed in the domain, at a lead time of around 18 h, the fraction of common points generally has a value between 0.4 and 0.8, with a decreasing trend toward the lower end of the range. This supports the observation that, while the locations of the main regions of precipitation are superficially the same, the details in location and/or timing of the heaviest rain differ between forecasts.
The increases in F common are associated with (i) the formation of new cells collocated in the ensemble members, (ii) enlargement/intensification of cells to encompass other smaller cells in the other members, or (iii) cells coming into the analysis domain from the boundaries, as the boundary conditions remain unchanged throughout each ensemble member.Subsequent falls in the diagnostic are expected as the forecasts begin to diverge from one another.
In the Coverack case, the entry of the MCS into the domain is preceded by the formation of small and very scattered shower cells, which develop in random locations in each ensemble member roughly 13 h into the forecast.The large area of organized precipitation then moves into the domain accompanied by smaller cells (again, in different locations depending on the ensemble member) and an increase in F common .It is fully in the domain by 1500 UTC (T 1 24 h) when F common starts to increase again.By 1700 UTC, there are fewer small cells and then the organized system moves out of the domain.It would seem that the increases in F common coincide with the decrease in the number of smaller cells.
The sharp drop at T + 26 h in Fig. 5 in the Kent case is associated with the squall lines leaving the analysis domain, and so reflects different timings alongside a drop in the number of precipitating cells, as indicated by the drop in F_chance.
The difference in magnitude for the common points between the control and the ensemble forecasts is shown in Fig. 6; the mean square difference of the common points between the control and each ensemble member is less than 0.5 for all runs. For the Coverack case, the ensemble-averaged MSD_common remains fairly constant over time (though with a peak at T + 18 h). This is probably because the forecast was dominated by the MCS. The Kent case shows more temporal variation, which can be associated with the development of the showers and their merger into squall lines. Cell statistics in Fig. 7 (for Coverack) and Fig. 8 (for Kent) show that the distribution of cell size and magnitude is largely similar for each ensemble member. Overall, the total number of precipitation cells (>1 mm h⁻¹ threshold) appears to increase when the scheme is used, and there is a hint that there are more larger cells in the Coverack case, although in general the distributions of size and magnitude of precipitation cells are similar for the control and each ensemble member. Together with the MSD and F_common plots, we can conclude that the cells have similar characteristics for the control and each ensemble member but appear in different locations at any particular time (with evidence that there may be more of the larger cells at the 1 mm h⁻¹ threshold). On the other hand, the Kent case shows a general reduction in the total number of precipitation cells, which is likely to be linked to the slightly stronger upper-level forcing in this case compared to the Coverack case.

d. Sensitivity to number of grid boxes, n g
As described in section 4, the area ΔA in the scheme is defined using n_g grid boxes. This adjustable parameter determines how the stochastic perturbations vary spatially. A value of 8 is chosen by default for the reasons described previously. However, it is still possible to use a smaller (or larger) value of n_g, and this section explores the sensitivity of the scheme to the value of n_g in model runs for the cases described above. Specifically, values for n_g of 2 and 4 are compared with the n_g = 8 runs already described.
The key arguments in section 4 highlight that a smaller area ΔA would hold fewer thermals and thus the stochastic increments would be larger, in a physically consistent way. This is confirmed in the distributions of the instantaneous perturbations shown in Fig. 9 for both cases. Note that the instantaneous perturbations are those added in a single time step; as discussed in section 7c, the total perturbation amplitude delivered by temporally correlated perturbations is of order t_*/Δt (typically an order of magnitude) larger.
There are some differences in the distributions between the cases, though. Specifically, in the Kent case with increased n_g, the distribution shows a greater frequency of points with a perturbation value of zero, and there is also a narrowing of the distribution (cf. n_g = 8 and n_g = 2; Fig. 9b), whereas for the Coverack case (cf. n_g = 8 and n_g = 4; Fig. 9a), the distribution only becomes narrower and there is no change in the frequency of points with a perturbation size of zero. These changes in the distribution are linked to the more numerous convection, and hence perturbation occurrence, in the Kent case, compared to the more locationally restricted perturbations (mainly around the MCS) in the Coverack case.

[Fig. 7. Coverack: cell statistics for radar, the control run, and all ensemble members, using the stochastic BL scheme with n_g = 8 (as described in section 4a) and different seeds. Rain-rate and area thresholds are indicated in the title of each subplot.]
The sensitivity of the forecast to a change in the value of n_g is shown by the time series of average rain rate in Figs. 10a and 10b for the Coverack and Kent cases, respectively. These show that using a lower value of n_g does not significantly change the precipitation amount; the rainfall rates using these lower values of n_g in fact fall within the range of values from the ensemble member forecasts (not shown here). Instantaneous rain-rate plots over the Coverack area at the time when the precipitation was anchored over the village (third row of Fig. 3, and equivalently Fig. 4 for Kent) also show that the main regions of heavy precipitation are very similar in all ensemble members and in runs with lower values of n_g, although, again, the finer detail differs. Cell statistics (see Figs. 13a and 13b for Coverack and Kent, respectively) show that the distributions of cell size and intensity are not sensitive to the value of n_g.

[Fig. 8. Kent: cell statistics for radar, the control run, and all ensemble members, using the stochastic BL scheme with n_g = 8 (as described in section 4a) and different seeds. Rain-rate and area thresholds are indicated in the title of each subplot.]

[Fig. 9. Distribution of amplitude of perturbations for the model with different values of n_g (as described in section 4a) and amplitude factor (as described in section 7e). The black line is the baseline run (n_g = 8, Ampfactor = 1).]
The numbers of common points between the control (no scheme) forecast and the forecasts with the scheme but varying values of n_g are largely similar for both cases. Small changes in F_common are observed with smaller perturbation areas (see Figs. 11a and 11b for Coverack and Kent, respectively). This is echoed in the mean square difference of the common points (see Figs. 12a and 12b for Coverack and Kent, respectively). This may reflect the idea that it is the variability at resolvable scales that matters; the scheme is designed to produce similar variability on this scale even if a smaller scale is used in the scheme.

e. Sensitivity to amplitude
The magnitude of the perturbations at each time step produced by this scheme is of order 20 times smaller than that from the A. Lock (2016, personal communication) scheme described in section 5. It was demonstrated in the previous section that changing the seed in the calculation of the perturbations, or the area over which the perturbations change, did not significantly alter the overall forecast compared with a forecast without the scheme (though substantial differences develop at small scales). This section considers the effect of the magnitude of the perturbation on the results. This is achieved by multiplying the second term on the right-hand side of Eq. (48) by a factor (hereafter referred to as Ampfactor), thus increasing the perturbations by that factor. We use Ampfactor values of 2 and 10. These values of Ampfactor are used purely for these sensitivity tests; they do not have any physical basis, which implies that the results would be less physically meaningful. Once again, there are no obvious differences in the larger-scale structure of the precipitation fields or the distribution of cell size with the larger perturbations. These larger perturbations do not appear to significantly affect the average rain rate (see Fig. 10), and the area of precipitation is largely similar, with differences only in the detail (see bottom row in Figs. 3 and 4).

[Fig. 10. Average rain rate for radar and model with different values of n_g (as described in section 4a) and amplitude factor (as described in section 7e). The black line is the baseline run (n_g = 8, Ampfactor = 1). (a) Coverack and (b) Kent.]

[Fig. 11. F_common for the model with different values of n_g (as described in section 4a) and Ampfactor (as described in section 7e), with the associated F_chance in the lower lines. The black line is the baseline run (n_g = 8, Ampfactor = 1).]
The detail can be studied through the other diagnostics. Differences between the Ampfactor 1 and Ampfactor 2 simulations, in comparison with the control and the ensemble spread, are minimal. Larger differences occur when multiplying the perturbations by an Ampfactor of 10. There is a consistent decrease in F_common, particularly for the Kent case, during the mature stages of the convective events (Fig. 11b). The ensemble-averaged MSD_common (Fig. 12) shows similar behavior for the Ampfactor 1 and 2 simulations, and even the Ampfactor 10 results show a similar spread in the ensemble, though again with some suggestion of more impact in the Kent case. Similarly, the size and magnitude of the cells, shown in Figs. 13a and 13b (for Coverack and Kent, respectively), do not change significantly with perturbation size.

Conclusions
A physically consistent but simple stochastic boundary layer scheme has been developed that takes a form most naturally expressed as multiplicative noise. The noise is Poisson in character, and consideration of magnitudes shows that the distribution can be highly skewed at convection-permitting scales. The scheme has been derived via a very different route but has been shown to have much in common with the KC scheme, though with the advantage that some parameters are more physically based.
Quantitative investigation of the scheme shows that, with horizontal grid lengths around 1 km, temporal correlation is far more important than spatial correlation. Above all, however, it is notable that the size of the perturbations is very small (the overall heating, for example, is generally less than 0.01 K in temperature delivered over a correlation time of around 10 min), at least an order of magnitude smaller than in some more ad hoc schemes. As a result, they have very little impact on the energetics of the forecast, so overall domain-averaged precipitation, for example, is essentially unchanged.
Differences between ensemble members initialized from the control do grow to produce significant differences between ensemble members at the scale of individual convective cells. Thus, they can plausibly account for lack of predictability at this scale. After around 12 h they appear to be roughly saturated. However, it must be emphasized that perturbations are continuously added to the system, so this represents the time scale to achieve a balance between addition of new perturbations, perturbation growth, and dissipation, not just saturation of initial perturbations. On these time scales there is no strong evidence for substantial growth of ensemble spread at larger scales. However, it is likely that, in addition to this being limited by the domain size, the cases chosen have a finite lifetime of very active convection.
The results of the scheme are not sensitive to the choice of averaging area the perturbations are chosen to represent, at least where this remains within an order of magnitude of the grid scale. This was the intention behind the design of the scheme and is consistent with the idea that the greater-amplitude perturbations applied closer to the grid scale do not couple with the dynamics, and only those at well-resolved scales are effective. However, we have not proven this by looking at the initial perturbation growth, as it seems clear that the scheme has impact through the longer-term saturation of the continuously applied perturbations. Some sensitivity to magnitude in difference diagnostics compared to no scheme is evident, which suggests that the amplitude of perturbations at resolved scales does have some effect, but the sensitivity is not strong. A factor-of-2 increase in magnitude produces results that are within the ensemble spread produced by the scheme, but an order-of-magnitude increase does lead to ensemble spread that grows somewhat more rapidly, and generally marginally more spatial spread in the Kent case.

The scheme represents only the variability due to boundary layer turbulence in a horizontally homogeneous system. Other sources of boundary layer variability undoubtedly exist, such as small-scale heterogeneity in the land surface and soil moisture. More ad hoc schemes may well represent actual boundary layer variability more quantitatively, but at the cost of having less basis in physical processes. Nevertheless, the scheme has been shown to be sufficient to produce substantial differences in forecasts at the scale of convective cells; in Part II of this paper, the differences between ensemble members are compared with those resulting from mesoscale differences in the initial and boundary conditions.

FIG. 3. Coverack: (top left) radar, and precipitation fields at 1400 UTC from (top right) the control model run, (second row) ensemble members with different seeds, (third row) ensemble member 1 with different n_g as described in section 4a, and (bottom) different values of Ampfactor, as described in section 7e.

FIG. 5. F_common for (a) Coverack and (b) Kent cases showing mean (solid line), ±1 standard deviation (dashed lines), and F_chance (dotted line) for all ensemble members.
FIG. 9. Distribution of amplitude of perturbations for model with different values of n_g (as described in section 4a) and amplitude factor (as described in section 7e). The black line is the baseline run (n_g = 8, Ampfactor = 1). (a) Coverack and (b) Kent cases.
FIG. 11. F_common for model with different values of n_g (as described in section 4a) and Ampfactor (as described in section 7e), with associated F_chance in the lower lines. The black line is the baseline run (n_g = 8, Ampfactor = 1). (a) Coverack and (b) Kent.

FIG. 12. Ensemble-averaged MSD_common for model with different values of n_g (as described in section 4a) and Ampfactor (as described in section 7e). The black line is the baseline run (n_g = 8, Ampfactor = 1). (a) Coverack and (b) Kent.
If we denote the revised scheme of Rasp et al. (2018) as KCR then, considering temperature T, the increment ΔT_KCR in a time step δt is given by