Model Uncertainty Representation in a Convection-Permitting Ensemble—SPP and SPPT in HarmonEPS

Abstract: The stochastically perturbed parameterizations scheme (SPP) is here implemented and tested in HarmonEPS, the convection-permitting limited area ensemble prediction system developed by the international research program High Resolution Limited Area Model (HIRLAM) group. SPP introduces stochastic perturbations to values of chosen closure parameters representing efficiencies or rates of change in parameterized atmospheric (sub)processes. The impact of SPP is compared to that of the stochastically perturbed parameterization tendencies scheme (SPPT). SPP in this first version in HarmonEPS perturbs 11 parameters, active in different atmospheric processes and under various weather conditions. The main motivation for this study is the lack of variability seen in cloud products in HarmonEPS, as reported by duty forecasters. SPP in this first version is able to increase variability in a range of weather variables, including the cloud products. However, for some weather variables the root-mean-squared error of the ensemble mean is increased and the mean bias is impacted, especially in winter. This indicates that (some) parameter perturbation distributions are not optimal in the current configuration, and a further sensitivity analysis is required. SPPT resulted in a smaller increase in variability in the ensemble, but the impact was almost completely masked out when combined with perturbations of the initial state, lateral boundaries, and surface properties. An in-depth investigation into this lack of impact from SPPT is presented here through examining, among other things, accumulated tendencies from the model physics.

Significance statement. Small inaccuracies, simplifications or errors in any part of a complex and nonlinear system like a weather model can amplify and in a short time become significant. We wanted to introduce a physically consistent way of representing these uncertainties in a model that is used in several European countries. To do this we introduce variations in a few parameters that are used in the model description, and that we know are uncertain. By doing this we were able to increase the variability of the cloud products as desired. We see this as a promising approach for capturing whether or not fog will occur in this model. Further refinements are needed before it can be used in operational weather forecasts.

Introduction
Ensemble prediction systems (EPSs) are the commonly used framework in numerical weather prediction (NWP) to provide information on the possible future states of the atmosphere, taking into account uncertainties that exist in different parts of the forecasting system. The main sources of uncertainty in NWP models originate from (i) incomplete reconstruction of the current atmospheric state (due to lack of observations, limitations in data assimilation, etc.), and (ii) errors in model construction (arising from the need to approximate and discretize the atmospheric governing equations, which then results in parameterization of unresolved processes). These are referred to as initial state uncertainty and model uncertainty, respectively. A third source of uncertainty arises from how interactions are handled between the atmosphere and other Earth system components (oceans, glaciers, etc.). In Limited Area Modeling (LAM) an additional uncertainty source comes from how lateral boundary conditions from the host model are handled (see e.g. Frogner et al. 2019).
HarmonEPS (Frogner et al. 2019) is a convection-permitting LAM EPS developed by the international research program High Resolution Limited Area Model (HIRLAM) group. The EPS configuration used here includes initial state, surface and lateral boundary uncertainty representations. This study is motivated by feedback from duty forecasters related to insufficient spread characteristics in HarmonEPS cloud products. From a forecaster's perspective, the uncertainty regarding clouds, and especially low clouds, is of special interest. One reason is that it also affects other prognostic parameters such as temperature near the ground. Currently, the spread of the ensemble is considerably lower than the forecast error (e.g. root mean square error, RMSE, of the ensemble mean). It is often seen that all members have the same misplacement of the clouds, or the same over- or under-prediction. Thus, the issue of overconfident ensembles in difficult-to-predict weather situations (related especially to cloud products) is often brought up by duty forecasters. On top of this, forecasting clouds in the correct place is important for generating products for atmospheric icing for wind energy, power lines and aviation (see e.g. Bernabò et al. 2015; Kraj and Bibeau 2010; Nygaard et al. 2016).
Some operational HarmonEPS configurations have opted to use multi-physics or multi-model approaches, but there is currently no universal approach to handling model uncertainties in HarmonEPS. Stochastically Perturbed Parameterization Tendencies (SPPT) (see e.g. Bouttier et al. 2012) is available as an option, but is not used operationally in any suites due to the small effect seen in previous studies (see Frogner et al. 2019). Finding an efficient model uncertainty representation is therefore the most obvious pathway that could result in a more realistic variability in cloud products.
Representing model uncertainties in weather forecasts is challenging, and it continues to be an active area of research (see e.g. Leutbecher et al. 2017). Model errors are complex and arise from a multitude of sources; therefore, several different approaches for representing them in EPS settings have been developed over the years (see e.g. Ollinaho et al. 2017). In this paper, the focus is on model uncertainty representations accounting for errors in (i) total tendency contributions from physical parameterizations of the model, and in (ii) chosen values of closure parameters controlling the efficiencies or rates of change of parameterized (sub)processes. For (i), SPPT is used. For (ii), the recently developed Stochastically Perturbed Parameterizations (SPP) methodology is applied following the implementation from the European Centre for Medium-Range Weather Forecasts (ECMWF) (Ollinaho et al. 2017; Lang et al. 2021). SPP has been applied in several LAM EPSs: (i) Wastl et al. (2019a) applied SPP in a hybrid setup in Convection-Permitting Limited Area Ensemble Forecasting (C-LAEF), (ii) Jankov et al. (2019) use SPP in the High-Resolution Rapid Refresh (HRRR) ensemble, and (iii) Thompson et al. (2021) have implemented SPP in the Weather Research and Forecasting (WRF) model. The latter two implementations use a slightly different SPP configuration to that used by ECMWF and in this study (described in Section 5). In this first implementation of SPP in HarmonEPS, 12 key parameters have been chosen and tested with an emphasis on parameters related to clouds and microphysics (7 of the 12), with the aim of overcoming the lack of variability in clouds in particular. Based on initial testing, 11 of the 12 parameters were accepted for further study. Details about the excluded parameter are however provided in Section 5.
The ensemble system is described in section 2, the experimental setup in section 3 and the verification methodology used in section 4. Section 5 describes the SPPT and SPP implementations and how the two model uncertainty schemes perform in HarmonEPS. Section 6 is devoted to an in-depth investigation of the different perturbations in HarmonEPS and how they interact. Finally, a general discussion and conclusions are presented in section 7.

HarmonEPS -The HARMONIE-AROME ensemble prediction system
The convection-permitting HARMONIE-AROME model (Bengtsson et al. 2017) is developed within the ALADIN-HIRLAM NWP system (Termonia et al. 2018), and the system serves as the operational forecasting tool in a number of countries in Europe. The forecast model is run with a 2.5 km horizontal grid spacing and 65 levels in the vertical. The upper air data assimilation system is based on Three-Dimensional Variational Data Assimilation (3DVAR; Brousseau et al. 2011) with 3-hourly cycling. At the surface, two-meter temperature (T2m) and relative humidity (RH2m) as well as snow cover are assimilated using optimal interpolation (Giard and Bazile 2000). The ensemble prediction part of the system, HarmonEPS (Frogner et al. 2019), is used in this study, and it supports a wide range of perturbation methodologies dealing with initial, model and boundary uncertainty.
The following perturbations described in Frogner et al. (2019) are used in this study: (i) the initial condition (IC) perturbations, created by applying the difference between an ECMWF ENS member and the control to the HarmonEPS control member analysis. (ii) The lateral boundary perturbations (LBC). It must be noted that the LBCs are not actual perturbations, but rather balanced states from the corresponding ensemble member of the ECMWF operational EPS (ECMWF ENS) (Sleigh et al. 2019). (iii) At the surface, perturbations are applied to each member's surface analysis following Bouttier et al. (2016), with perturbations added to model fields kept constant during the forecasts (such as sea surface temperature, vegetation and leaf area index) as well as to fields evolving during the forecasts (temperature and moisture in the soil). For more details on IC, LBC and surface perturbations in HarmonEPS the reader is referred to Frogner et al. (2019).

Experimental setup
HarmonEPS has been run in three different setups in this study, all with the operational horizontal and vertical resolution of HARMONIE-AROME, and with 3-hourly cycling with 3DVAR for the control: (i) each member uses the control member's upper air analysis with initial perturbations (IC) added as described above. All members run their own surface analysis using optimal interpolation, and surface perturbations are applied. The handling of the lateral boundaries is as described above (LBC). Since we are initializing the model surface fields from an ECMWF ENS model state, a two-week spin-up period is run prior to the start of the experiment periods to allow the slow soil variables to adapt to the HarmonEPS model climate. This is the reference setup. All the experiments are run with 6 ensemble members plus a control member. Initial tests were conducted with more ensemble members, but in order to make testing computationally affordable there was a need to reduce the ensemble member count. Six members were found to be sufficient to maintain the signal of how the ensemble skill changed between the different experiments. This is also in line with the results of Clark et al. (2011), who showed that their relatively small ensembles (3-9 members) had statistically indistinguishable average ROC areas relative to their full 17-member ensemble, and Keil et al. (2019), who tested the effect of different ensemble sizes by applying a re-sampling method with replacement and obtained qualitatively similar results.

Verification methodology
The verification of the different experiments is done against point observations and against satellite-observed cloud masks. For the point observations, near-surface parameters are verified against SYNOP stations, while upper air parameters are verified against radiosondes. Bi-linear interpolation is used to interpolate the forecast values to the observation points. For 2 m temperature a correction is applied to account for the elevation difference between model and observations, using the standard adiabatic lapse rate of 6.5 K km⁻¹. A gross error check is performed, where unrealistic values are removed. Then a further check is performed, where observations that are more than 6 standard deviations away from the forecast values are removed.
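The height correction and quality-control steps above can be sketched as follows. This is a minimal illustration, not HarmonEPS verification code; the function names and the gross-error limits are hypothetical.

```python
import numpy as np

LAPSE_RATE = 6.5e-3  # K m^-1, i.e. the standard adiabatic lapse rate of 6.5 K km^-1

def height_correct_t2m(t2m_fc, z_model, z_station):
    """Adjust forecast 2 m temperature from model elevation to station elevation."""
    return t2m_fc + LAPSE_RATE * (z_model - z_station)

def quality_control(obs, fc, gross_limits, n_sd=6.0):
    """Drop unrealistic observations (gross error check), then drop observations
    more than n_sd standard deviations away from the forecast values."""
    obs = np.asarray(obs, float)
    fc = np.asarray(fc, float)
    keep = (obs >= gross_limits[0]) & (obs <= gross_limits[1])
    resid = obs - fc
    sd = np.std(resid[keep])
    keep &= np.abs(resid) <= n_sd * sd
    return obs[keep], fc[keep]
```

A station 500 m below the model orography, for example, would have the forecast warmed by 3.25 K before comparison.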
For the point verification the following validation metrics are used to show the relative performance of the different experiments. All ensemble members, including the control, are used in the calculations:

• RMSE: the root-mean-square error of the ensemble mean of the forecast compared with observations.

• Ensemble spread, or variability: the standard deviation of the ensemble members around the ensemble mean. The ensemble spread should be equal to the RMSE for a well-calibrated ensemble.

• Mean bias: the ensemble mean minus the observation, averaged over all cases.

For the spatial verification the following validation metric is used:

• FSS: Fractions Skill Score (Roberts and Lean 2008; Roberts 2008). FSS is a measure of model forecast skill as a function of spatial scale.
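The point-verification metrics listed above can be sketched for an ensemble array of shape (members, cases). This is illustrative only; in particular, the exact averaging order for the spread (here: member standard deviation averaged over cases) is an assumption, as verification packages differ on this detail.

```python
import numpy as np

def ensemble_metrics(ens, obs):
    """ens: array (n_members, n_cases); obs: array (n_cases,).
    Returns RMSE of the ensemble mean, ensemble spread, and mean bias."""
    mean = ens.mean(axis=0)
    rmse = np.sqrt(np.mean((mean - obs) ** 2))   # RMSE of the ensemble mean
    spread = np.mean(ens.std(axis=0, ddof=1))    # std of members around the mean
    bias = np.mean(mean - obs)                   # ensemble mean minus observation
    return rmse, spread, bias
```

For a well-calibrated ensemble the first two numbers should be close to each other.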
FSS is used here to evaluate the model forecast skill for clouds, where the forecast total cloud cover is assessed against a satellite-observed cloud mask. Crocker and Mittermaier (2013) employed a cloud mask to assess spatial model bias using contingency table metrics and object-based methods. That work concluded that using a cloud mask can give a quick assessment of the forecast model's tendency to contaminate clear sky with low cloud fractions when low thresholds are used.
In order to undertake the model evaluation, a forecast cloud mask is extracted from the predicted total cloud cover by defining a threshold t in the following way:

M_x = 1 if c_x < t, otherwise M_x = 0, (1)

where M is the masked field, c is the cloud fraction and the subscript x is o for observations and f for forecast fields.
More than 50% of the model's domain is covered by clouds on all dates covered by the satellite data used in this study. Thus, the model performance is assessed by forecasting clear areas instead of clouds (Crocker and Mittermaier 2013). For this reason the mask is set to 1 where the cloud fraction is less than a given threshold, so that the event of being cloud-free is defined at each model grid cell.
The satellite-observed cloud masks used in this study are a product of the Polar Platform System (PPS) of the EUMETSAT Satellite Application Facilities for Nowcasting and very short range forecasting (SAFNWC) (Thoss 2014a,b). Since the resolution of the satellite-observed cloud masks (1 km for AVHRR and 750 m for VIIRS; Thoss 2014b) is higher than the model resolution (2.5 km), the fraction of cloudy pixels is computed for each model grid cell. This process results in observed cloud fractions with values between 0 and 1. The resulting field is converted to a binary mask by Eq. 1 after defining a threshold.
In this study, the threshold is chosen to be 0.2. A low threshold means more clouds and fewer cloud-free grid cells. Using a lower threshold mimics the cloud mask generation algorithm, which describes a cell as cloudy even when only thin cirrus clouds are present. Only dates when the satellite data cover more than 80% of the model domain are considered. The maximum satellite time deviation from the forecast valid time is chosen to be 5 minutes.
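The aggregation of satellite pixels to model-grid cloud fractions and the thresholding into a binary cloud-free event can be sketched as below. This assumes, purely for illustration, that satellite pixels align in integer blocks with the model grid; the real regridding of AVHRR/VIIRS pixels onto the 2.5 km grid is more involved.

```python
import numpy as np

def cloud_fraction(sat_mask, block):
    """Fraction of cloudy satellite pixels per model grid cell.
    sat_mask: binary (0/1) array whose dimensions are multiples of `block`."""
    ny, nx = sat_mask.shape
    return sat_mask.reshape(ny // block, block, nx // block, block).mean(axis=(1, 3))

def cloud_free_event(frac, threshold=0.2):
    """Binary event field: 1 where the cloud fraction is below the threshold
    (i.e. the grid cell is considered cloud-free), as in Eq. 1."""
    return (frac < threshold).astype(int)
```

With the threshold of 0.2 used in the study, a cell with even 20% cloudy pixels is counted as cloudy, mimicking the conservative satellite cloud-mask algorithm.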
To compute the FSS, firstly the fraction of event occurrences in the grid cell neighborhood is calculated. A grid cell neighborhood of scale n is defined as the square centered on that grid cell covering (2n + 1)^2 grid cells, where the scale n is 0, 1, 2, etc. Since the model grid spacing is 2.5 km, the scale number n corresponds to a spatial scale length of 2.5(2n + 1) km.
As the experiments in this study have one control member and six perturbed members, seven event-fraction values are defined at each grid cell for a given lead time when the control member is included, and six values when it is excluded.
FSS is defined by the formula below:

FSS = 1 − [ Σ_i (P_f,i − P_o,i)^2 ] / [ Σ_i P_f,i^2 + Σ_i P_o,i^2 ],

where P_f,i and P_o,i are the forecast and observed event fractions at grid cell i, respectively. Following Schwartz et al. (2010), the forecast event fraction is averaged over the ensemble, P_f,i = (1/M) Σ_m P_f,i,m, where P_f,i,m is the forecast event fraction (here, being cloud-free) for grid cell i and member m, and M is the number of members.
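The neighborhood fractions and the FSS computation above can be sketched as follows. This is a naive, loop-based illustration; the edge handling (replicating boundary values) is an assumption, and the actual implementation may treat domain edges differently.

```python
import numpy as np

def neighborhood_fraction(event, n):
    """Mean of a binary event field over (2n+1) x (2n+1) neighborhoods."""
    pad = np.pad(event.astype(float), n, mode="edge")  # edge handling: assumption
    out = np.zeros(event.shape, dtype=float)
    size = 2 * n + 1
    for i in range(event.shape[0]):
        for j in range(event.shape[1]):
            out[i, j] = pad[i:i + size, j:j + size].mean()
    return out

def fss(fc_events, obs_event, n):
    """fc_events: (n_members, ny, nx) binary fields; obs_event: (ny, nx) binary.
    Member neighborhood fractions are averaged first (Schwartz et al. 2010)."""
    p_f = np.mean([neighborhood_fraction(m, n) for m in fc_events], axis=0)
    p_o = neighborhood_fraction(obs_event, n)
    mse = np.mean((p_f - p_o) ** 2)
    mse_ref = np.mean(p_f ** 2) + np.mean(p_o ** 2)
    return 1.0 - mse / mse_ref
```

A perfect forecast gives FSS = 1 at every scale, and FSS generally increases with the neighborhood scale n.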
As this study was motivated by the insufficient spread characteristics in HarmonEPS cloud products, there is naturally a focus on the spread when evaluating the experiments. Also included are the FSS for clouds, the RMSE of the ensemble mean, the mean bias and the fCRPS for a range of weather variables, to see how the perturbations introduced may or may not alter the mean behavior of the ensemble. Other metrics were also looked at, but were found not to add any extra insight and are therefore not included.

[…] based on the work of Wastl et al. (2019a). In this approach, the partial tendencies of the physics parameterization schemes are perturbed separately, in contrast to the traditional SPPT approach implemented in HarmonEPS. This allows the boundary layer tapering to be switched off, and thus tendency-style perturbations can play an enhanced role (Wastl et al. 2019a).
In the HarmonEPS implementation of SPPT the stochastic pattern generator (SPG; Tsyrulnikov and Gayfulin 2017) is employed to generate the random perturbation fields. This pattern generator has the advantage of accounting for 'proportionality of scales', meaning that larger spatial scales persist longer in time than smaller spatial scales, which decay more quickly, a widespread feature in geophysics. In SPG, the perturbations vary spatially and temporally, and are correlated through a third-order-in-time stochastic differential equation with a pseudodifferential spatial operator defined on a limited area. The implementation in HarmonEPS interfaces the code provided by Tsyrulnikov and Gayfulin (2017) and is solely defined by the spatial (XLCOR) and temporal (TAU) correlation length scales, and the standard deviation, SDEV. Furthermore, SPG provides an initialization that ensures stationary statistics from the start of the integration.

1) SPPT
A number of sensitivity tests were carried out to investigate the optimum settings for XLCOR and TAU for the SPPT perturbations. Sensitivity tests were also designed to look into the influence of the clipping ratio (XCLIP) and the size of the perturbations (controlled by the standard deviation of the perturbations, SDEV). Various ranges were used to test each of these SPPT control parameters: for XLCOR, lengths of 100 to 2000 km; for TAU, time scales of 6 to 24 hours; and for SDEV and XCLIP, values of 0.1-1.0 and 10.0 to 1.0, respectively. The ranges used for SDEV and the clipping ratio XCLIP ensured the perturbation coefficients were clipped at -1 and 1.
All sensitivity tests were carried out over the domain shown in Fig. 1. The three tests shown for the SDEV and XCLIP control parameters represent the low, middle and high end of the tuning ranges. These sensitivity experiments give a much larger response than those for XLCOR and TAU: increased standard deviations and reduced clipping ratios lead to statistically significant improvements in the spread-skill relationships for almost all variables at all lead times (not shown).
However, it was discovered that values of SDEV above 0.3 result in the undesirable effect of a non-Gaussian distribution of perturbation values. This non-Gaussian characteristic arises when clipping is performed, as all values outside the clipping range are added to the tails of the interval.
In all further experiments presented here TAU is set to 8 h, XLCOR to 200 km and SDEV and XCLIP to 0.3 and 3.33, respectively.
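An SPPT-style multiplicative tendency perturbation with the clipping behavior described above can be sketched as follows. This is a toy: a plain Gaussian random field stands in for the SPG pattern, and `sppt_perturb` is a hypothetical name, not HarmonEPS code.

```python
import numpy as np

def sppt_perturb(total_tendency, rng, sdev=0.3, clip=1.0):
    """Multiply the total physics tendency by (1 + psi), where psi is a
    random field of standard deviation `sdev`, clipped at +/-`clip`.
    Clipping piles the excess probability mass onto the interval ends,
    which is the non-Gaussian effect noted above for large SDEV."""
    psi = rng.normal(0.0, sdev, size=np.shape(total_tendency))
    psi = np.clip(psi, -clip, clip)
    return (1.0 + psi) * np.asarray(total_tendency)
```

With SDEV = 0.3, the clip at +/-1 sits 3.33 standard deviations out (the chosen XCLIP), so clipping is rare and the distribution stays close to Gaussian; with larger SDEV the clipped mass at the bounds becomes noticeable.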

2) SPPT EPS
Although SPPT demonstrated a clear impact when tested separately (see Fig. 2), its effect was largely masked when combined with the other perturbation types; this is examined in detail in Section 6.

1) Spatial and temporal scales
All parameters are perturbed using the same spatial and temporal scales, but with a unique random seed for each parameter. The spatial and temporal length scales were tested in an early implementation of SPP which included 9 of the 11 parameters listed in Table 2. Two different spatial length scales were tested, 200 km and 1000 km. An example of the effect of changing the spatial scale is shown in Fig. 3. In another test (not shown) a spatial scale of 100 km was tested.
This did not give any significant differences from using 200 km. In the following, 200 km is used for all SPP results shown. A range of temporal scales were tested, from 1 hour to infinity, with very little sensitivity seen (not shown). In the following a temporal length scale of 12 hours is used.
2) Parameter distributions

Experts were consulted on the different parameterizations, and in particular about the range of values the (originally deterministic) parameters could take. The STD#1 value in Table 2 […] temperatures near −5°C together with high concentrations of cloud liquid and of graupel. Therefore, a very long tail for ICENU has physical relevance, and the median was chosen as it resulted in a longer tail than using the mean.
The sensitivity to the width of the distribution (the standard deviation) was examined for each parameter separately. These tests included a summer and a winter period. For the parameters KGN_SBGR, RADGR and RADSN (see Table 2 for a description of the different parameters), a clipping function was introduced to ensure the parameters were kept within physical bounds. The resulting parameter densities are shown in Fig. 4. As mentioned above, ICENU is the only parameter that uses the median. Its density distribution is quite different from the other parameters, hence it is also plotted separately in the right panel of Fig. 4.
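The skewed, long-tailed parameter densities described above can be sketched with a lognormal-style draw, assuming an ECMWF-style SPP formulation in which a parameter is multiplied by exp(sigma * psi) of a Gaussian pattern value psi. The mean-preserving correction shown here is one common choice and is an assumption, not necessarily the exact HarmonEPS formulation; all numbers are illustrative.

```python
import numpy as np

def spp_draw(default, sigma, psi, preserve="mean"):
    """Perturb a closure parameter multiplicatively: default * exp(sigma*psi).
    preserve="mean":   divide out E[exp(sigma*psi)] = exp(sigma^2/2) so the
                       mean of the perturbed parameter equals the default.
    preserve="median": no correction; the median of exp(sigma*psi) is already 1,
                       which leaves a longer upper tail (the choice for ICENU)."""
    if preserve == "mean":
        return default * np.exp(sigma * psi - 0.5 * sigma ** 2)
    return default * np.exp(sigma * psi)

def clip_physical(value, lo, hi):
    """Clipping used to keep parameters such as KGN_SBGR, RADGR and RADSN
    within physical bounds (bounds here are placeholders)."""
    return np.clip(value, lo, hi)
```

Preserving the median rather than the mean is exactly what gives ICENU its distinctly longer tail relative to the other parameter densities in Fig. 4.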

3) Impact of individual parameter perturbations
As expected, perturbing some parameters has a bigger impact than perturbing others. Due to the nature of the perturbed parameters, there is a clear difference in impact between the summer and winter testing periods for some parameters. In Fig. 6 the spread and RMSE from individually perturbing the parameters in Table 2 are shown for two forecast lead times for the summer and winter testing periods. The +15 h and +27 h forecast lead times represent the maximum and minimum responses seen in the verification for some of the parameters and are connected to the diurnal cycle (all forecasts start at 00 UTC). These two forecast lead times are shown for T2m and fraction of low cloud cover (CClow) in Fig. 6. For 12 h accumulated precipitation, the +18 h and +30 h forecast lead times are used, as the 12 h accumulation is not available at +15 h and +27 h. The spread and RMSE shown here are with the final parameter distributions (STD#2). The stable condition length scale (RZC_H), the saturation limit sensitivity for condensation (VSIGQSAT, especially in summer), the threshold cloud thickness used in the shallow/deep convection decision (CLDDPTHDP) and the asymptotic free atmospheric length scale (RZL_INF) are clearly the most effective parameters for increasing the spread. It is also quite evident that in winter KGN_ACON, KGN_SBGR, RADGR and RADSN do so, though to a lesser extent. We can also observe that the spread depends much more on which parameter is perturbed than the ensemble mean RMSE does.

4) SPP EPS
After the individual adjustment of the parameter pdfs, SPP was added to the reference setup. […] The increased RMSE is more evident in winter than in summer. However, the RMSE for the cloud variables (fraction of total cloud cover (CCtot), cloud base height (Cbase) and fraction of low clouds (CClow)) is mainly improved by adding SPP in summer. For winter the impact is more mixed. fCRPS also shows a mixed effect from including SPP, but there is a significant improvement from SPP for CCtot and Cbase for the summer period.
Recalling what was seen in Fig. 6, the parameter RZC_H was the most influential among this set of parameters, and as expected reducing the width of the pdf for RZC_H has a large impact on the ensemble skill (not shown). The parameter also has a clear impact on the mean bias, as seen in Fig. 9. It is naturally undesirable that the perturbations change the mean bias of the system (here making the ensemble colder). Reducing the STD#2 value of RZC_H helps, to a large extent, to alleviate this effect. Interestingly, it is also seen that for CCtot the behaviour is exactly opposite to that for T2m, with the mean bias becoming closer to the reference with increasing STD#2. Note that other parameter perturbations are active in this test, so this bias change for CCtot might be due to interaction with other perturbations. Another possible explanation is compensating errors in the forecast model. This will be looked into in a future study in connection with revised pdfs for the parameters.
A case with poorly predicted fog has been selected to illustrate the low cloud/fog related forecast response to the SPP perturbations. Fig. 10 shows a satellite image from 16 February 2019, where widespread areas of fog cover e.g. southern Sweden and Denmark, and some areas of southwestern Norway and northeastern Finland are covered with scattered fog. In the reference setup (REF), all the perturbed members (Fig. 11) represent the scattered fog quite well, but the larger fog-covered areas in Sweden and Denmark are not present in the forecasts at all. In the SPP experiment (Fig. 12), a larger variability between the ensemble members and a tendency for more fog can be seen.
The fog predicted in REF is still present, but in addition larger areas of fog in better agreement with the satellite imagery can be found. The larger variability seen in this case is in line with what is seen in the average scores for the cloud parameters in Fig. 7. SPP also increases the average cloud cover for the period (not shown).
For the convective summer cases investigated (not shown), the ensemble sensitivity to the SPP perturbations is less pronounced. In these cases, precipitation areas are redistributed, but without any significant changes in ensemble skill.

According to Roberts and Lean (2008), the forecast is considered skillful at scales for which FSS exceeds FSS_uniform = 0.5 + f0/2, where f0 is the fraction of cloud-free grid cells. Figure 14 illustrates […]. The FSS skill is also affected by the chosen masking threshold. For example, a threshold of 0.7 slightly reduces the model forecast skill (not shown). This is expected, because a higher threshold corresponds to more cloud-free grid cells, while the satellite products used in this study tend to underestimate the number of cloud-free grid cells, as mentioned in section 4. Thus, by increasing the threshold, the forecast cloud mask is expected to be less similar to the satellite-observed cloud mask and therefore the FSS is expected to be smaller. In this case, the FSS medians for SPP become slightly lower than for REF, and lower in February than in June. This means that in general SPP produces more cloudy areas, especially in winter. Changing the threshold to 0.7 does not change the statistical confidence of the results considerably compared to the threshold of 0.2. These changes are also reflected in the differences from FSS_uniform; however, the changes are small and do not change the overall assessment of the forecast skill.

Interactions between perturbation types
As discussed in Section 5, SPPT did not produce any significant impact on the ensemble when it was combined with the other perturbation types in HarmonEPS, whereas SPP had a clear positive impact on the ensemble spread (Fig. 15, top panel). A series of experiments was conducted to understand the cause of this lack of impact from SPPT, using experiment setup (ii) as described in Section 3. To keep the computations affordable, a subset of the full one-month testing period was used. This allowed 8 additional experiments to be run in order to get qualitative answers regarding the lack of impact from SPPT. A one-week period in February 2019 was chosen for this purpose.
Obviously, the perturbations in a non-linear system like an NWP model are not additive, but given the distinct nature of the perturbations, which act on different aspects and time ranges of the forecast, it is reasonable to assume they will all contribute to some extent to the variability of the ensemble, although the geographical location of the effect will likely be closely linked to where sensitivity/instability is present in the modelled atmosphere. Looking at the effect of all the perturbations individually (Fig. 15) […]

In contrast to the initial and surface perturbations, which are applied only at the analysis time over the grid, and the lateral boundary perturbations, which are applied in a narrow lateral boundary zone, the model uncertainty representations add perturbations at every time step over the grid. They act on the physics parameterizations by perturbing the total physical tendencies (SPPT) or by stochastically perturbing selected closure parameters in the physical parameterizations (SPP). SPPT and SPP are therefore quite different in nature from the other perturbations, and it is natural to expect that both would add variability on top of them. This is however not the case for SPPT. In Section 5 a clear impact on the spread was seen from SPPT when it was the only perturbation method applied, and the ensemble was sensitive to an increase in the standard deviation of the SPPT perturbations (see Fig. 2). This is confirmed in Fig. 16. […] Again, this is in line with the results in Fig. 16.
The cause of this behavior seems to be that the spread added by SPPT is located in areas where the other perturbations have already accounted for the variability. It is therefore interesting to see which of the other perturbations masks the effect of SPPT. In the following, the effect that SPPT has on top of the initial, lateral boundary and surface perturbations is looked at separately. Fig. 19 shows the spread and skill for these perturbation type combinations. The curves with/without SPPT show […] SPP, on the other hand, shows active areas where the other perturbations do not, e.g. along the southwest coast of Norway, along the southwestern part of Finland and in areas of southern Sweden. SPP therefore seems to be capable of adding variability that is not captured by the other perturbations.

Discussion and conclusions
SPP in this first HarmonEPS configuration is a promising scheme for including a model uncertainty representation, as it increases HarmonEPS' variability much more than SPPT. SPP is also in line with the objective of physically consistent perturbations, as it does not violate local conservation properties of energy and moisture.
Although SPP was able to improve the variability of the ensemble, in some cases a degradation of the ensemble mean RMSE was observed. Moreover, perturbations of some parameters affected the mean bias of the model, especially during the cold season. In particular, three parameters were found to be more active in winter (VSIGQSAT, RZC_H and RZL_INF). While these parameters are also active in summer, the relative impact of all parameters is more even in summer (see Fig. 6). Reducing the standard deviation for RZC_H perturbations was seen to help in reducing the bias change with respect to T2m, but had the opposite effect on the cloud variables. Some of the perturbed parameters are involved in the same processes and are thus likely influenced by each other. So far this has not been taken into account, as each perturbed parameter has its own realisation of the random perturbation pattern. In theory, this could result in perturbations of one parameter working against perturbations of another parameter. This in turn could lead to requiring too large perturbations of the individual parameters to get the desired effect on the variability, as well as to perturbations simply cancelling each other out. The impact of correlating the perturbation patterns is currently being studied, starting with RZC_H and RZL_INF. If this proves successful, it could further improve the SPP scheme and might also help in reducing the above-mentioned ensemble mean RMSE/bias change. Although taking possible parameter correlations into account further adds to the maintenance of the scheme, it might be possible to utilize algorithmic parameter estimation methods (see e.g. Ollinaho et al. 2013, 2014) to inform about these correlations.

[…] A similar behavior has been seen when SPPT was tested on a domain over Ireland (not shown).
One possible explanation for this reduced impact from SPPT compared to other studies could be the relative magnitude of the SPPT perturbations compared to the initial and lateral boundary perturbations (see e.g. Fig. 15, lower panel). HarmonEPS is known to have a larger initial spread than e.g. ECMWF ENS, as reported in Frogner et al. (2019). Looking at the tendencies, it was seen that SPPT was only able to create variability in the same geographical areas as the other perturbations, at least for the cases investigated in this study. Disentangling the different perturbations in HarmonEPS showed that all of the other perturbations take part in masking the effect of SPPT, although the surface perturbations do so to a lesser extent than the initial and boundary perturbations.
The time periods used in this investigation may also have an impact on the response from SPPT.
Previous studies have reported that the impact of physics perturbations is quite case dependent, with a particular dependence on synoptic forcing (a greater response to physics perturbations is seen in weakly forced cases).

One option for trying to improve SPPT in HarmonEPS is pSPPT (Wastl et al. 2019b), where the partial tendencies of the physics parameterization schemes are perturbed sequentially. This was also shown to improve the numerical stability of SPPT, making it possible to switch off the tapering in the boundary layer for all parameterizations except the turbulence scheme. Applying pSPPT has the potential to create more variability near the surface than the standard SPPT. pSPPT also results in a more physically consistent scheme, as the interaction between the uncertainties of the different physics parameterization schemes is sustained.
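The structural difference between standard SPPT and pSPPT can be sketched as follows. This is an illustrative toy, not the HarmonEPS implementation: the grid size, pattern statistics, and number of schemes are arbitrary assumptions, and the real schemes apply spatially and temporally correlated patterns with tapering near the surface.

```python
import numpy as np

rng = np.random.default_rng(0)

def sppt(partial_tendencies, pattern, mu=1.0):
    """Standard SPPT: a single multiplicative pattern perturbs the
    summed physics tendency, T' = (1 + mu * r) * sum_i T_i."""
    total = sum(partial_tendencies)
    return (1.0 + mu * pattern) * total

def psppt(partial_tendencies, patterns, mu=1.0):
    """pSPPT: each parameterization's partial tendency gets its own
    pattern, T' = sum_i (1 + mu * r_i) * T_i, so the interaction
    between the uncertainties of the schemes is retained."""
    return sum((1.0 + mu * r) * t
               for t, r in zip(partial_tendencies, patterns))

# Toy partial tendencies for three schemes (e.g. radiation,
# turbulence, microphysics) on a small grid; values illustrative.
shape = (4, 4)
tendencies = [rng.normal(0.0, 1.0, shape) for _ in range(3)]
patterns = [np.clip(rng.normal(0.0, 0.3, shape), -1.0, 1.0)
            for _ in range(3)]

t_sppt = sppt(tendencies, patterns[0])
t_psppt = psppt(tendencies, patterns)
```

With all patterns set to zero both forms reduce to the unperturbed total tendency; they differ only in whether the perturbation is applied before or after the partial tendencies are summed.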
Model error is complex and originates from many sources, e.g. unresolved subgrid-scale processes, simplified process descriptions, incomplete knowledge of processes, and uncertain closure parameters in the parameterizations. Debate exists over whether a single model uncertainty scheme is sufficient, or whether a combination of schemes is needed to account for the full model uncertainty. For example, Jankov et al. (2019) argue that SPP on its own in their 3-km ensemble is not sufficient, and that combining it with SPPT is necessary. This is in contrast to the results presented in this paper, where SPP is seen to contribute considerably to the variability of the ensemble, while activating SPPT adds very little. Jankov et al. (2019) apply SPP only to the planetary boundary layer (PBL) scheme and Thompson et al. (2021) only to a few microphysics parameters, which could partly explain why a reduced effect from SPP is seen in those studies.
Interestingly, Lang et al. (2021) show a revised version of SPP in ECMWF ENS that is as skillful as SPPT, while the first version was not (Ollinaho et al. 2017). Clearly, the additional probabilistic skill provided by SPP is tied, among other things, to the quantity and quality of the parameters perturbed.
The impact of SPP reported in this study, in Jankov et al. (2019), and in the two versions of ECMWF SPP (Ollinaho et al. 2017; Lang et al. 2021) highlights that the actual setup of the SPP scheme is important. This includes targeting influential parameters important in different weather situations, the shape of the parameter pdfs, the choice of spatial and temporal scales, and possible correlations between the parameters. An obvious improvement to the current SPP implementation in HarmonEPS is to increase the number of perturbed parameters, and to extend the perturbations to better cover the different parts of the model physics. Perturbations to the semi-Lagrangian horizontal diffusion will be added and investigated in a future study. The choice of spatial and temporal length scales will also be revisited in a future study, possibly with different scales for different parameters. The SPG pattern generator as used here can be extended to three dimensions, which might be important for some processes. For this to be computationally affordable in an operational setting, further work is needed to decrease the cost of generating the random fields and to optimize how often they are applied. Currently, perturbing all 11 parameters used in this study at the same time gives rise to a 5% increase in computational cost compared to a run without SPP when the pattern is updated every time step (in 2D), and 0.1% when updated every hour. Carefully choosing parameters that are proven to be influential will also be important for the affordability of SPP. As seen in Fig. 6, some of the parameters have little impact in both seasons, and further studies will show whether these parameters can be excluded or whether they prove important in certain situations.
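The two SPP design choices discussed above — the shape of the parameter pdfs and possible correlations between parameters — can be illustrated schematically. In the sketch below, a multiplicative (log-normal) perturbation is applied to a parameter default and clipped to physical bounds, and two pointwise-correlated Gaussian patterns stand in for the patterns of two related parameters. All distributions, defaults, and bounds here are illustrative assumptions, not the actual HarmonEPS configuration, and the stand-in values for the two parameters are hypothetical, not the real RZC_H and RZL_INF settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def perturb_parameter(p_default, psi, sigma, p_min, p_max):
    """Multiplicative SPP-style perturbation: p = p_default *
    exp(sigma * psi), clipped to physical bounds. Illustrative
    form only; actual per-parameter distributions differ."""
    return np.clip(p_default * np.exp(sigma * psi), p_min, p_max)

def correlated_patterns(shape, rho, rng):
    """Two standard-normal patterns with pointwise correlation rho,
    e.g. for two parameters involved in the same process."""
    a = rng.standard_normal(shape)
    b = rho * a + np.sqrt(1.0 - rho**2) * rng.standard_normal(shape)
    return a, b

shape = (64, 64)
psi1, psi2 = correlated_patterns(shape, rho=0.8, rng=rng)

# Hypothetical defaults and bounds for two related parameters
# (illustrative stand-ins, not the real HarmonEPS values).
p1 = perturb_parameter(1.0, psi1, sigma=0.5, p_min=0.2, p_max=5.0)
p2 = perturb_parameter(0.1, psi2, sigma=0.5, p_min=0.02, p_max=0.5)
```

With rho = 0 this reduces to the current setup, where each parameter has an independent pattern realisation; increasing rho toward 1 forces the two parameters to be perturbed in the same direction at each grid point, which is the behaviour being studied for RZC_H and RZL_INF.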
One drawback of SPP compared to SPPT is the issue of maintenance. While SPPT requires very little maintenance, SPP needs reassessment and adjustments when new physics are developed.
However, Lang et al. (2021) argue that the conservation properties of SPP nonetheless make it an attractive option over SPPT. Currently, physics development is based mainly on deterministic experimentation. For optimal use of resources, as well as optimal uncertainty representation, it will be important that stochasticity is taken into account at an early stage of future physics development.
SPP works well for the convection-permitting ensemble tested here, even though only a few parameters, involved in a rather limited set of physical parameterizations and processes within them, are perturbed. An ensemble size of 6+1 members as used here, and experiment periods of two weeks for the sensitivity of the SPP parameters and two months for testing SPP against REF, are obviously not enough to adequately sample the full probability distribution of atmospheric states. Longer experiment periods that include a wider variety of differently forced atmospheric situations, and larger ensemble sizes, would be desirable to confirm the results presented in this paper. However, based on tests comparing our 6+1 member ensemble with a 20+1 member ensemble and on the use of the fair CRPS, we are confident that our results are a good foundation for further development of SPP in HarmonEPS. Further work will first focus on getting SPP ready for operational implementation in HarmonEPS suites (Frogner et al. 2019) by adjusting the pdfs for the already implemented parameters, while also taking into account the correlations of some of the parameters and paying special attention to possible undesirable bias changes.
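The fair CRPS used to compare ensembles of different sizes corrects the standard ensemble CRPS estimator for its dependence on member count (Ferro 2014), which is what allows a 6+1 member ensemble to be scored against a 20+1 member one. A minimal sketch of the computation for a single forecast-observation pair:

```python
import numpy as np

def fair_crps(ens, obs):
    """Fair CRPS for a finite M-member ensemble:
    (1/M) sum_i |x_i - y| - 1/(2 M (M-1)) sum_{i,j} |x_i - x_j|.
    The (M-1) in the second term, rather than M, makes the score
    unbiased with respect to ensemble size."""
    x = np.asarray(ens, dtype=float)
    m = x.size
    term1 = np.abs(x - obs).mean()
    term2 = np.abs(x[:, None] - x[None, :]).sum() / (2.0 * m * (m - 1))
    return term1 - term2

# Example: a 3-member ensemble and a verifying observation.
score = fair_crps([1.0, 2.0, 4.0], obs=2.5)  # ≈ 0.167
```

In practice the score is averaged over many cases, stations, and lead times; the sketch shows only the single-sample kernel.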