A high-resolution data assimilation system has been implemented and tested within a 4-km grid length version of the Met Office Unified Model (UM). A variational analysis scheme is used to correct larger scales using conventional observation types. The system uses two nudging procedures to assimilate high-resolution information: radar-derived surface precipitation rates are assimilated via latent heat nudging (LHN), while cloud nudging (CN) is used to assimilate moisture fields derived from satellite, radar, and surface observations. The data assimilation scheme was tested on five convection-dominated case studies from the Convective Storm Initiation Project (CSIP). Model skill was assessed statistically using radar-derived surface-precipitation hourly accumulations via a scale-dependent verification scheme. Data assimilation is shown to have a dramatic impact on skill during both the assimilation and subsequent forecast periods on nowcasting time scales. The resulting forecasts are also shown to be much more skillful than those produced using either a 12-km grid length version of the UM with data assimilation in place, or a 4-km grid-length UM version run using a 12-km state as initial conditions. The individual contribution to overall skill attributable to each data-assimilation component is investigated. Up until T + 3 h, LHN has the greatest impact on skill. For later times, VAR, LHN, and CN contribute equally to the skill in predicting the spatial distribution of the heaviest rainfall; while VAR alone accounts for the skill in depicting distributions corresponding to lower accumulation thresholds. The 4-km forecasts tend to overpredict both the intensity and areal coverage of storms. While it is likely that the intensity bias is partially attributable to model error, both VAR and LHN clearly contribute to the overestimation of the areal extent. Future developments that may mitigate this problem are suggested.
Owing to increases in computer power it is now viable to run convective-scale numerical weather prediction (NWP) models in an operational context. Indeed, since 2006, the Met Office has been running an operational 4-km grid-length version of the Unified Model (UM; Davies et al. 2005) with a domain that encompasses the entire United Kingdom, and in the near future it is anticipated that 1.5–1-km grid-length versions of the UM will become operational. A central advantage of such high-resolution models is that they are capable of explicitly resolving convection. This, in addition to other factors, such as improved representation of orography and microphysical processes, should lead to more realistic simulations of both convective- and mesoscale phenomena. This expectation has been borne out by a number of studies (see, e.g., Weisman et al. 1997; Romero et al. 2001; Spear and Leslie 2002; Done et al. 2004; Roberts and Lean 2008).
The Met Office is developing high-resolution versions of the UM for the purposes of short-range forecasting, with an emphasis on the prediction of hazardous weather, such as that associated with severe convective storms. It has been shown that 4- and 1-km grid-length versions of the UM are capable of producing realistic representations of such events (Lean et al. 2008; Roberts and Lean 2008). However, as the authors point out, realism is not synonymous with accuracy; to achieve accurate forecasts, especially over nowcasting time scales of O(1–6 h), the introduction of data assimilation (DA) is vital.
In recent years, 4D-variational assimilation (4D-VAR) has become the preferred method of assimilating data within large-scale operational models: Met Office (Rawlins et al. 2007), European Centre for Medium-Range Weather Forecasts (ECMWF; Rabier et al. 2000; Mahfouf and Rabier 2000; Klinker et al. 2000), Météo-France (Janisková et al. 1999; Gauthier and Thépaut 2001), Environment Canada (Gauthier et al. 2007), and the Japan Meteorological Agency (Kadowaki 2005). Methods based on ensemble Kalman filtering (EnKF) are viable alternatives for assimilating data at these scales (Lorenc 2003). At the convective scale, both 4D-VAR and EnKF face a number of difficult additional challenges. Both techniques are capable of making only weakly nonlinear corrections to the a priori estimate of the true atmospheric state, the so-called background state. However, convective-scale trajectories can be highly nonlinear, owing to the dynamics and microphysical processes, and particularly to the strong interaction between them. In such a regime, 4D-VAR or EnKF will work only if observations are sufficiently close to the a priori. This condition requires accurate specification of the background covariance matrix, which, in turn, is made difficult by the high degree of nonlinearity and the associated lack of balance at these scales. Notwithstanding this, both techniques have been applied with encouraging success to the problem of assimilating organized convective storms (Sun and Crook 1997, 1998; Snyder and Zhang 2003). However, the majority of studies have considered semi-idealized events and it is clear that designing an operational system will require a large research effort.
Nudging offers an attractive alternative for the assimilation of convective-scale information. While the Met Office plans ultimately to develop an operational convective-scale data assimilation system based on 4D-VAR, the initial strategy has been to use nudging for a number of reasons. First, the infrastructure is preexisting as nudging has been used within the operational 12-km grid-length version of the UM for around 10 years; second, it is relatively computationally cheap. Unlike 4D-VAR or EnKF, nudging requires no multiple runs of either the full or linearized version of the UM, nudging increments being calculated and added as the forecast trajectory passes through its assimilation period. The third, and perhaps most compelling, reason is that since nudging is a feedback process involving the nonlinear model, it is able to make relatively large finite, nonlinear corrections to the model trajectory.
Initially used at large scale with success (Jones and Macpherson 1997; Macpherson et al. 1996; Krishnamurti et al. 1991), nudging has recently been applied to the convective scale (Leuenberger 2005; Leuenberger and Rossa 2007; Klink and Stephan 2004). These studies utilized latent heat nudging (LHN), which involves an adjustment of model latent heat release according to the ratio of observed-to-model surface precipitation rate. The applicability of LHN at the convective scale is by no means guaranteed. In low-resolution models, the success of LHN is arguably due to changes in the large-scale vertical motion produced by the LHN increments, which, in turn, lead to a beneficial change in resolved and/or parameterized precipitation rates. By contrast, in a model that explicitly resolves convective storms, if LHN is to usefully alter convective precipitation then it must, in part, do so by modifying the small-scale structure associated with the convection; that is, by making buoyancy modifications in the up- and downdrafts of convective storms. Leuenberger (2005) and Leuenberger and Rossa (2007) demonstrated the success of LHN in cases where precipitation can be treated diagnostically. However, in general, the increased dynamical importance of the prognostic nature of precipitation at these scales endangers the validity of a central assumption of LHN, namely that the surface rain rate at a given horizontal point is proportional to the vertically integrated latent-heat release in the column above the point. While this is a good approximation at large scales, it begins to break down at the convective scale, where a significant horizontal displacement of convective precipitation can occur before it reaches the surface. Hence, it is possible that LHN, which adjusts the model latent-heat release on a column-by-column basis, may fail in the case of storms where there is a significant horizontal displacement, as evidenced in Klink and Stephan (2004).
An effective high-resolution DA scheme must be able to correct errors in both convective-scale structures and the large-scale environment in which those structures are embedded. This constitutes a major challenge in the context of VAR and ensemble-based methodologies as it is unclear how to simultaneously analyze dynamically distinct scales (see, e.g., Zhang et al. 2006). A possible solution to this multiscale problem involves analyzing only the convective scales directly in the high-resolution model, while blending in a large-scale correction produced by a low-resolution model analysis. Such an approach was adopted by Guidard and Fischer (2008) in the context of the mesoscale limited-area model, Aire Limitée Adaptation Dynamique Développement International (ALADIN). Zhang et al. (2009) propose an EnKF-based solution wherein successively smaller scales are analyzed by reducing the localization radius of influence. In the present study, the multiscale problem is dealt with by correcting the convective scales via nudging and the large scales via VAR. Nevertheless, a scale-separation technique (Z. Li 2007, personal communication) is employed in the VAR system, wherein the longwave component of a low-resolution model analysis is incorporated into the high-resolution run of VAR, the goal of which is to include information from observations located outside the boundaries of the high-resolution model domain.
This article describes the performance of a DA system implemented within a 4-km grid-length version of the UM. The system is based on a combination of a 3D-VAR scheme, used for the large-scale analysis of conventional observations, and nudging systems for the use of high-resolution moisture and surface precipitation data. A cloud nudging (CN) procedure is used to nudge humidity increments produced using the Moisture Observation Preprocessing System (MOPS; Macpherson et al. 1996), while surface precipitation rates, derived from radar measurements, are assimilated via LHN (Jones and Macpherson 1997).
The performance of the system is assessed in terms of the prediction of convective precipitation on nowcasting time scales of 1–7 h, via a statistical analysis of five case studies from the Convective Storm Initiation Project (CSIP) observational period (Browning et al. 2007). Forecast skill was measured using the scale-dependent method introduced by Roberts and Lean (2008), applied to hourly accumulations of precipitation.
Owing to the computational expense of running high-resolution models, it is imperative that they produce forecasts of added benefit in comparison with lower-resolution model counterparts. In light of this, the statistical skill achieved by forecasts produced using a 12-km grid-length version of the UM is compared with that obtained by forecasts produced using the 4-km model with DA. Lean et al. (2008), in the absence of a proper DA system, adopted the procedure of running a 4-km grid-length version of the UM using interpolated T + 1 12-km model states as initial conditions. This procedure is statistically assessed in the present study in order to show the benefits of including DA within the 4-km model. The contribution to overall forecast skill arising from each of the DA components (i.e., VAR, CN, and LHN) is also assessed, along with the effect on skill of a number of modifications to both the CN and LHN configurations.
The structure of the paper is as follows. In section 2 the model and data-assimilation systems are described. Section 3 summarizes key aspects of the verification procedure. In section 4 the experimental results are presented. Section 5 discusses the implications of this study for future DA development within the context of high-resolution models.
2. Description of model and DA systems
a. Model configurations
The models used in this study were 12- and 4-km grid-length versions of the nonhydrostatic Unified Model (Davies et al. 2005). The 12-km grid-length version was run in the U.K. Mesoscale Model configuration (MES), which was operational at the time, but has subsequently been retired.1 It includes a nonlocal first-order boundary layer scheme (Lock et al. 2000), a mass-flux convection scheme with convectively available potential energy (CAPE) closure (Gregory and Rowntree 1990), and a mixed-phase precipitation scheme (Wilson and Ballard 1999). The 12-km domain, shown in Fig. 1, consists of 146 × 182 horizontal grid points with 38 vertical levels. The model has a 5-min time step and is driven by lateral boundary conditions (LBCs) produced by the global version of the UM, updated every hour.
The 4-km model domain (Fig. 1) consists of 190 × 190 horizontal grid points and 38 vertical levels. A 100-s time step was used and LBCs, updated every 15 min, were supplied by the 12-km model. Differences in configuration with respect to the 12-km model included a fourth-order (del-4) horizontal diffusion with a 16-time-step damping time scale, a prognostic rain variable, and a modified version of the convection scheme wherein the CAPE closure time scale is made CAPE-dependent (Roberts 2003). The motivation for the latter is that deep convection can be reasonably well resolved explicitly by the dynamics, leaving shallow convection to be dealt with by the scheme.
b. Data assimilation system
A schematic illustration of the 3-hourly DA cycle used in both the 12- and 4-km models is shown in Fig. 2. In each cycle, the model was run from T − 2 h with respect to the cycle validity time, using the T + 1 h background model state from the previous cycle as initial conditions. During the DA period, the model trajectory was corrected at each time step by the addition of increments produced by both 3D-VAR and the nudging procedures. The 3D-VAR increments were added in from T − 1 h to T + 1 h in the case of both models. In the 12-km model, LHN and CN increments were added in from T − 2 h to T + 2 h and T − 2 h to T + 0.5 h, respectively. In the case of the 4-km model, the LHN period spanned [T − 2 h, T + 1 h 15 min] and two different CN periods were tested, namely [T − 2 h, T + 1 h 5 min] and [T − 2 h, T + 1 h 10 min].
1) Latent heat nudging
The basic method of LHN (Jones and Macpherson 1997) involves scaling the vertical latent heat profile at each grid point according to the ratio of observed and model precipitation rates at the surface, the intention being that this will lead to a subsequent change in the model vertical velocity field, which will, in turn, lead to a closer match between observed and model precipitation rates. The hope is that this more realistic precipitation rate is achieved via the establishment of a dynamical storm structure that is closer to reality, and that this structure will subsequently lead to a more accurate forecast trajectory.
Let PRmod(i, t) denote the model surface precipitation rate at grid point i and time step t, and the corresponding observational analysis of the surface precipitation rate be denoted by PRanal(i, t). In the absence of any form of data assimilation, the increment to the (3D) potential-temperature field at horizontal grid point i, on vertical level k caused by latent heating at each time step t is given by the following:

Δθ(i, k, t) = ΔθLS(i, k, t) + Δθconv(i, k, t),   (1)
where the two terms on the right denote increments due to large-scale condensation and the convection scheme, respectively. In the case of the 4-km grid-length model, Δθconv makes only a small contribution to Δθ because the convection scheme is mass-flux-limited.
The rescaled latent heat field is given by

Δθscaled(i, k, t) = Δθ(i, k, t)[PRanal(i, t)]/[PRmod(i, t)].   (2)
As described later, a modification of the LHN scheme was also tested in the 4-km model, in which no scaling was performed on negative regions of latent heat change; that is, those associated with subcloud evaporation of precipitation.
Following Jones and Macpherson (1997), Δθ is prevented from being scaled (either up or down) by a factor greater than 2. These conditions are enforced to avoid the dangers of overheating a small number of model layers and of causing gridpoint storms, respectively. If [PRanal(i, t)]/[PRmod(i, t)] > 2 or [PRanal(i, t)]/[PRmod(i, t)] < 0.5, then a search over neighboring grid points is performed until a grid point j is found such that 2 > [PRanal(i, t)]/[PRmod(j, t)] > 0.5. In that case, Δθscaled(i, k, t) = Δθ(j, k, t)[PRanal(i, t)]/[PRmod(j, t)].2 If the gridpoint search fails, then, providing that PRmod ≠ 0, Δθ is scaled by the maximum factor of 2, while if PRmod = 0, no scaling occurs.
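The capping and fallback-search logic can be sketched for a single column as follows (a minimal numpy sketch; the function and argument names are illustrative, and, as noted in the comments, the neighbor-profile substitution of the full scheme is simplified here):

```python
import numpy as np

def scale_latent_heat(dtheta_col, pr_mod, pr_anal, neighbor_pr_mod=()):
    """Rescale one column's latent-heating profile following the capped
    scaling of Jones and Macpherson (1997). Illustrative sketch only."""
    if pr_mod <= 0.0:
        return dtheta_col  # PR_mod = 0: no scaling is performed
    ratio = pr_anal / pr_mod
    if 0.5 <= ratio <= 2.0:
        return dtheta_col * ratio
    # Out-of-range ratio: search neighboring points for a model rate that
    # brings the ratio into (0.5, 2).  In the full scheme the *neighbor's*
    # profile, dtheta(j, k, t), would be scaled; for brevity the local
    # profile stands in for it here.
    for pr_j in neighbor_pr_mod:
        if pr_j > 0.0 and 0.5 < pr_anal / pr_j < 2.0:
            return dtheta_col * (pr_anal / pr_j)
    # Search failed: fall back to the capped scaling factor
    return dtheta_col * (2.0 if ratio > 2.0 else 0.5)
```

A usage note: with a model rate of 1 mm h−1 and an observed rate of 5 mm h−1, the direct ratio (5) is out of range, so the factor is either borrowed from a suitable neighbor or capped at 2.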
A set of nudging increments can then be defined as

Δθ′(i, k, t) = FLH R{Δθscaled(i, k, t) − Δθ(i, k, t)}.   (3)
Here, R represents the application of a recursive filter with an effective Gaussian half-width of 21 km, which spatially smooths the increments on each horizontal level. The scalar FLH is a tunable parameter that controls the nudging strength. Finally, Δθ(i, k, t) is replaced with Δθnew(i, k, t) = Δθ(i, k, t) + Δθ′(i, k, t) and passed as a source term to the prognostic thermodynamic equation. The model relative humidity is preserved during this process (i.e., mixing ratio increments are added where required).
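The increment formation described above might be sketched as follows (a numpy-only sketch for one model level; a truncated, renormalized Gaussian convolution stands in for the operational recursive filter, and the half-width-to-sigma conversion is an assumption):

```python
import numpy as np

def _gaussian_smooth(field, sigma):
    """Separable Gaussian smoothing (numpy-only stand-in for the
    recursive filter R); edge effects are removed by renormalizing."""
    r = int(3 * sigma) + 1
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    ones = np.ones(field.shape)
    for axis in (0, 1):
        field = np.apply_along_axis(np.convolve, axis, field, k, mode='same')
        ones = np.apply_along_axis(np.convolve, axis, ones, k, mode='same')
    return field / ones  # constants are preserved exactly at the edges

def lhn_increments(dtheta, dtheta_scaled, f_lh=0.5,
                   half_width_km=21.0, dx_km=4.0):
    """LHN nudging increments on one level: smoothed (scaled minus raw)
    latent heating times the nudging coefficient F_LH (0.5 as an example)."""
    sigma = half_width_km / (dx_km * np.sqrt(2.0 * np.log(2.0)))
    return f_lh * _gaussian_smooth(dtheta_scaled - dtheta, sigma)
```

Because the filter only redistributes the heating difference spatially, a spatially uniform difference is passed through unchanged apart from the factor FLH.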
The precipitation observations used in this study were derived from radar reflectivities processed by the Met Office nowcasting system, Nimrod (Golding 1998), with spurious artifacts removed as described in Harrison et al. (2000). The resulting Nimrod analyses of surface precipitation are available at a maximum of every 15 min. In the 12-km model DA configuration, hourly Nimrod precipitation analyses were used, while in the 4-km model, the temporal frequency was increased to 15 min in order to depict the faster evolving convective structures. For a given time step, PRanal(i, t) is calculated by first linearly interpolating between 15-km resolution Nimrod analyses and then forming a weighted sum of the resulting field and PRmod(i, t) according to a set of weights that measure radar quality and depend upon distance from the radar and height above the model freezing level (Gibson 2000; Gibson et al. 2000). Radar quality is assumed to be perfect for pixels below the freezing level and when the radar distance is less than 150 km. Beyond these limits, the weights decrease linearly with distance, reaching zero when the radar distance is 250 km and when the height above freezing level is 4 km.
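The distance and freezing-level ramps described above can be sketched as follows (illustrative only; combining the two ramps multiplicatively is an assumption on our part, and the exact form is given by Gibson 2000):

```python
def radar_weight(range_km, height_above_freezing_km):
    """Illustrative radar-quality weight: 1 for pixels below the freezing
    level and within 150 km of the radar, falling linearly to 0 at 250 km
    range and at 4 km above the freezing level."""
    w_range = min(1.0, max(0.0, (250.0 - range_km) / 100.0))
    w_height = min(1.0, max(0.0, (4.0 - height_above_freezing_km) / 4.0))
    return w_range * w_height

def blended_rate(pr_radar, pr_model, weight):
    """Weighted blend of the time-interpolated radar analysis with the
    model rate, as used to form PR_anal at each grid point (sketch)."""
    return weight * pr_radar + (1.0 - weight) * pr_model
```

At full weight the analysis is pure radar; as quality degrades with range or height, PRanal relaxes toward the model's own rate, so that poor-quality radar pixels generate no nudging forcing.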
2) Cloud nudging via the MOPS system
The MOPS nudging system (Macpherson et al. 1996) utilizes 3D cloud-fraction observational analyses produced by the Nimrod cloud analysis scheme via a combination of satellite, radar, and surface observations. Each analysis, valid at time tob, is converted into a 15-km horizontal resolution relative-humidity (RH) field, rhob(i, k, tob), via the same relation used in the model cloud scheme. We can then define a set of RH nudging increments at time t:

Δrh(i, k, t) = α(t)[rhob(i, k, tob) − rh(i, k, t)],   (4)
where rh(i, k, t) is the model RH field before nudging and α is a specified weighting function. These increments are added to the model state by converting to vapor-mixing ratio, Δq(i, k, t) = Δrh(i, k, t)qsat(T, p), and including the result as an extra term in the prognostic equation for q; that is, Dq(i, k, t)/Dt = Sq(i, k, t) + Δq(i, k, t), where Sq is the normal model source term. This procedure constitutes the CN process, the goal of which is to relax the model RH toward that of the observational value on an effective time scale τeff(t) = α−1. It should be remembered, however, that the evolution of rh is determined by the nonlinear interaction between the equations of the UM plus the nudging term.
The general form of α(t) used in the present study is illustrated in Fig. 3. The choice of a triangular function is intended to minimize shocks to the UM trajectory. In the case of the 12-km model, one MOPS observation field, valid at the analysis time, tob = T + 0, was nudged in during each DA cycle, with an insertion period defined by [tmin, tmax] = [tob − 120 min, tob + 30 min], and τeff = 2.5 h. In the case of the 4-km model, where we aim to depict faster-evolving convective events, three analyses were used during each DA cycle valid at T − 1 h, T + 0 h, and T + 1 h. As discussed in the experimental section, two different insertion periods for each analysis have been tested: [T − 40 min, T + 10 min] and [T − 20 min, T + 5 min]. The integrated forcing per observational analysis was set to the value used in the 12-km configuration.
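The triangular weighting can be sketched as follows (times in hours; the peak value of 1/τeff at the observation time and the exact normalization are assumptions made for illustration):

```python
def alpha(t, t_ob, t_min, t_max, tau_eff=2.5):
    """Triangular nudging weight (cf. Fig. 3): zero outside the insertion
    period [t_min, t_max], rising linearly to an assumed peak of
    1/tau_eff at the observation time t_ob, then falling back to zero."""
    if t < t_min or t > t_max:
        return 0.0
    peak = 1.0 / tau_eff
    if t <= t_ob:
        return peak * (t - t_min) / (t_ob - t_min)
    return peak * (t_max - t) / (t_max - t_ob)
```

With the 12-km settings (tob = 0, insertion period [−2 h, +0.5 h], τeff = 2.5 h) the weight ramps up over two hours, peaks at the analysis time, and decays over the final half hour.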
Conventional observations were assimilated via incremental 3D-VAR (Lorenc et al. 2000), the object of which is to produce an analysis valid at time T + 0 h given by xa = xb + 𝗦−1 δw, where the background, xb, is a 3-h forecast state produced by the previous DA cycle. The incrementing operator 𝗦−1 is the generalized inverse of the simplification operator 𝗦, which decreases horizontal resolution and transforms multiple moisture and cloud variables into a single variable for the purposes of VAR. The increments δw are found by minimizing the following cost function:

J(δw) = (1/2)δwᵀ𝗕−1δw + (1/2)(y − yo)ᵀ(𝗘 + 𝗙)−1(y − yo),   (5)
where 𝗕 is the forecast error covariance matrix, and 𝗘 and 𝗙 are the instrumental- and representativeness-error covariances, respectively. It is assumed that the initial guess state used in the minimization is identical to xb. The model estimate of the observations yo is given by y = H(xb + 𝗦−1 δw), where H is the forward model operator.
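For concreteness, the cost function can be evaluated as follows (a sketch under the definitions just given, with the forward-model output supplied precomputed; this is not the Met Office VAR code, and all names are illustrative):

```python
import numpy as np

def j_cost(dw, b_inv, y_model, y_obs, r_inv):
    """Incremental 3D-VAR cost: background term (1/2) dw^T B^-1 dw plus an
    observation term with combined error covariance R = E + F.  y_model
    would be H(x_b + S^-1 dw) from the forward operator."""
    dw = np.asarray(dw)
    d = np.asarray(y_model) - np.asarray(y_obs)  # innovation y - y_o
    return 0.5 * dw @ b_inv @ dw + 0.5 * d @ r_inv @ d
```

The cost is zero only for a zero increment that already fits the observations; any departure in either term raises J, and the minimizer balances the two.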
The above approach was adopted in the case of the 12-km model,3 wherein VAR increments, δw12km, were derived with respect to a 12-km T + 3 h background, xb,12km, leading to an analysis xa,12km = xb,12km + 𝗦−1(δw12km). The 4-km model4 utilized a scale-separation technique developed for the Met Office high-resolution system (Z. Li 2007, personal communication), with the 4-km analysis given by

xa,4km = xb,4km + 𝗦−1(δwLW + δwSW),   (6)

where xb,4km is a T + 3 h 4-km forecast state; δwLW corresponds to a filtered version of δw12km wherein all modes below a critical wavelength, λc = 120 km, have been set to zero. The set of increments δwSW, consisting of modes below the wavelength of 120 km, is obtained by minimizing a modified cost function wherein the first term of Eq. (5) spans short wavelengths only and the longwave component is held fixed at δwLW. The advantage of constraining the longwave component in this way is that the 4-km analysis includes information pertaining to observations outside of the boundaries of the 4-km domain and is thereby made more accurate. The choice of λc = 120 km was found to optimize the increase in accuracy (Z. Li 2007, personal communication).
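The scale separation at λc = 120 km can be illustrated with a hard spectral cutoff (the operational filter is not specified in the text, so the sharp FFT cutoff below is purely illustrative):

```python
import numpy as np

def split_scales(dw, dx_km=4.0, lambda_c_km=120.0):
    """Split a 2-D increment field into wavelengths longer and shorter
    than lambda_c via an FFT low-pass.  Sketch only: the real scheme's
    filter shape is an assumption here."""
    ny, nx = dw.shape
    kx = np.fft.fftfreq(nx, d=dx_km)   # horizontal wavenumbers, cycles/km
    ky = np.fft.fftfreq(ny, d=dx_km)
    wavenumber = np.sqrt(kx[None, :] ** 2 + ky[:, None] ** 2)
    spectrum = np.fft.fft2(dw)
    low = np.where(wavenumber <= 1.0 / lambda_c_km, spectrum, 0.0)
    longwave = np.real(np.fft.ifft2(low))
    return longwave, dw - longwave
```

By construction the two parts sum back to the original field, so holding the longwave part fixed while re-analyzing the shortwave part leaves the decomposition consistent.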
In both the 12- and 4-km grid-length models, the relevant VAR increments were introduced to the model trajectory as a series of subincrements over the interval [T − 1 h, T + 1 h] via the incremental analysis update (IAU) procedure (Bloom et al. 1996), which serves to reduce any spurious gravity wave activity. The correlation length scales used in the formulation of 𝗕 are set to 80–120 km, depending on model variable, so that the VAR increments represent relatively large-scale corrections to the background.
3. Verification procedure
The method of objective verification adopted in this study is that of Roberts and Lean (2008), applied to hourly rainfall accumulations. In this section we briefly introduce the method and discuss some of its salient features.
Verification was performed within the area shown in Fig. 1. This domain, which is smaller than that of the 4-km model, was chosen both to avoid regions of less accurate or missing radar coverage and to minimize the effect of adjustment of the flow close to the lateral boundaries. The latter effect leads to poor prediction near the boundaries and such areas would normally be ignored in an operational forecast. Hourly surface precipitation accumulations were calculated from model fields and the 15-min Nimrod observational analyses. Both fields were then interpolated onto a 5-km grid-length mesh. The resulting model [Mr(i, j)] and radar-observed [Or(i, j)] hourly accumulation fields were next converted into binary fields, IX:

IX(i, j) = 1 if Xr(i, j) ≥ q, and IX(i, j) = 0 otherwise,   (7)
where X = M or O, and q is a threshold, expressed either as an absolute rain rate or as a percentile of grid points. Fractions were then generated for every grid point by computing the fraction of surrounding points that exceed q within a square of width n:

O(n)(i, j) = (1/n²) Σ(k=1..n) Σ(l=1..n) IO[i + k − 1 − (n − 1)/2, j + l − 1 − (n − 1)/2],   (8)

M(n)(i, j) = (1/n²) Σ(k=1..n) Σ(l=1..n) IM[i + k − 1 − (n − 1)/2, j + l − 1 − (n − 1)/2].   (9)
These fields were then used to calculate the mean square error (MSE) between forecast and observations, for a box of side n:

MSE(n) = (1/(Nx Ny)) Σ(i=1..Nx) Σ(j=1..Ny) [O(n)(i, j) − M(n)(i, j)]²,   (10)

where Nx and Ny denote the numbers of verification-area grid points in each horizontal direction.
A related reference MSE, corresponding to the case where there is no overlap between observed and model fractions, is defined as follows:

MSE(n)ref = (1/(Nx Ny)) [Σ(i=1..Nx) Σ(j=1..Ny) O(n)(i, j)² + Σ(i=1..Nx) Σ(j=1..Ny) M(n)(i, j)²].   (11)
The fraction skill score (FSS) is given by the following:

FSS(n) = 1 − MSE(n)/MSE(n)ref,   (12)
where FSS ranges from 0 (zero skill) to 1 (perfect skill).
The target level of skill is defined as

FSStarget = 0.5 + f0/2,   (13)
where f0 is the fraction of observed points exceeding the threshold over the domain. Here, FSStarget is the FSS obtained for n = 1 by setting the fraction at every grid point equal to f0, and is equal to a skill level exactly halfway between a perfect forecast and a random forecast (where f0 is randomly distributed over points). Following Roberts and Lean (2008), we choose FSStarget as the minimum level of skill we require our forecast to attain. We define ΔFSS = FSS − FSStarget. Typically, FSS increases with scale, motivating the use of the quantity Lmin, the length scale at which FSStarget is first reached. In what follows, the Lmin corresponding to a given DA configuration y is denoted by Lmin(y).
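The verification pipeline described above — thresholding, neighborhood fractions, MSE, FSS, and the target skill — can be sketched in a few lines (a numpy-only sketch, assuming an odd box width n and reading "exceed" as ≥):

```python
import numpy as np

def fractions(binary, n):
    """Mean of an n x n square centered on each point (points outside the
    domain counted as zero), via a padded integral image; n assumed odd."""
    half = (n - 1) // 2
    b = np.pad(np.asarray(binary, float), half)
    s = np.zeros((b.shape[0] + 1, b.shape[1] + 1))
    s[1:, 1:] = b.cumsum(axis=0).cumsum(axis=1)
    ny, nx = np.shape(binary)
    i = np.arange(ny)[:, None]
    j = np.arange(nx)[None, :]
    window = s[i + n, j + n] - s[i, j + n] - s[i + n, j] + s[i, j]
    return window / n ** 2

def fss(model, obs, q, n):
    """Fraction skill score at threshold q and box width n (grid squares)."""
    fm = fractions(model >= q, n)
    fo = fractions(obs >= q, n)
    mse = np.mean((fo - fm) ** 2)
    mse_ref = np.mean(fo ** 2) + np.mean(fm ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

def fss_target(obs, q):
    """Halfway between a random forecast (FSS = f0) and a perfect one."""
    f0 = np.mean(np.asarray(obs) >= q)
    return 0.5 + 0.5 * f0
```

A perfect match gives FSS = 1 at every scale, while completely disjoint rain areas give FSS = 0 at the grid scale; increasing n lets displaced but realistic storms begin to overlap, raising the score.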
In this study, we concentrate on a number of particular thresholds q, each of which serves to highlight various aspects of the accumulations. The absolute thresholds considered are 0.2 mm h−1 (corresponding to light rain rates), 1.0 mm h−1 (medium rates), and 4.0 mm h−1 (high rates). The FSS for a threshold of 0.2 mm h−1 measures the skill of the spatial distribution of all but the lightest rain rates and, hence, is sensitive to errors over a relatively wide range of rain intensities. In principle, the spatial distribution of more intense rain can be assessed by use of the higher absolute thresholds. However, as will become evident in later sections, if there is a substantial bias in model rain intensities, then the FSS will be dominated by this bias rather than the spatial accuracy. Such cases motivate the use of percentile thresholds, denoted here by q = Xth where X ranges from 0 to 100. The 90th percentile threshold, for example, selects the 10% of grid points with the highest associated accumulations in both the radar and model fields. As the percentile threshold is decreased, an increasingly large proportion of the rainfall distribution is assessed. In the present study we concentrate on q = 90th, as this tends to isolate a large proportion of the rainfall associated with the high-impact storms considered.
4. Experimental investigations
The experimental investigations are organized into three main sections. In section 4a, forecasts produced using DA within the 4-km model are presented, first via an illustrative example, followed by a statistical analysis based on the five CSIP intensive observation periods (IOPs) summarized in Table 1. Section 4b compares the skill of the 4-km DA system with those of the 12-km DA system and the 4-km model run from interpolated 12-km initial states. Section 4c considers the relative impact of each of the component parts of the DA system. Table 3 summarizes all of the DA configurations tested within this study.
a. Data assimilation within the 4-km model
In this section the performance of the 4-km model with DA in place is considered. As shown later, highest forecast skill was obtained, within statistical error, via two DA configurations, DA1 and DA2. In both configurations, a CN insertion period of [T − 20 min, T + 5 min] was used; no subcloud scaling was performed within LHN (i.e., only positive regions of latent heat release were scaled); and no spatial filtering was applied to the LHN increments. The two configurations differed only in terms of the strength of LHN applied, with FLH = 0.5 in DA1 and FLH = 0.25 in DA2.
1) Illustrative example: IOP18
Before discussing the statistical analysis of forecasts, we consider an illustrative example from CSIP IOP18, with the aim of showing how the objective verification procedure described in section 3 relates to a subjective assessment of rainfall accumulations. All skill scores referred to in this section are summarized in Table 2.
Figure 4b shows the radar-derived precipitation accumulation spanning the hour [1100, 1200 UTC] on 25 August 2005. Three main areas of precipitation are identified (R1, R2, and R3). During this period the flow was predominantly westerly, and each of the identified areas resulted from convective storms that propagated in an approximately eastward direction. Region R1 was formed from a mesoscale convective system, which consisted of essentially two squall lines; R2 originated from a band of showers oriented approximately southwest–northeast (SW–NE); and R3 from a small band oriented north–south (N–S).
Figure 4a shows the [T + 2 h, T + 3 h] forecast accumulation produced by the DA1 configuration for this hour. The forecast produced explicitly resolved storms in the three areas identified in the radar accumulation. Although the model and observed storms differed in detail, the resulting accumulations (labeled A1 to A3 in Fig. 4a) are in relatively good agreement. This subjective assessment is confirmed by the objective verification method, which for q = 0.2 mm h−1 yields a score of ΔFSS ∼ 0.07 using a box width of L = 55 km, and has an associated minimum length scale, Lmin ∼ 28 km.
In general, the model accumulations in Fig. 4a attain higher magnitudes than those of the observations. This is indicative of the model propensity to overpredict both peak rainfall rates and the areal extent of convective cells. This bias is detected by the verification procedure in the form of decreasing skill with increasing absolute threshold. Hence, for example, for q = 1 mm h−1 at L = 55 km, ΔFSS ∼ 0.03, and Lmin ∼ 44 km; while for q = 4 mm h−1 at L = 55 km, ΔFSS ∼ −0.42, and Lmin was not reached even at the maximum scale of 180 km considered in this study. As stated in section 3, such a bias can be taken into account by considering percentile thresholds. The fraction fields corresponding to q = 90th using L = 55 km for the DA1 forecast and the radar data are plotted in Figs. 4d and 4e, respectively. In this case ΔFSS ∼ 0.07 and Lmin ∼ 33 km, indicating that notwithstanding the large bias, the forecast has represented the positions of the convective storms relatively well.
For this particular example, the configuration DA2 also produced realistic storms and associated accumulations (Fig. 4c), but the realization differed notably from that of DA1. The pattern of overall rainfall distribution is slightly worse as determined using q = 0.2 mm h−1, with ΔFSS ∼ 0.03 for L = 55 km, and Lmin ∼ 44 km. However, for moderate accumulations (q = 1 mm h−1), DA2 performed better with ΔFSS ∼ 0.24 for L = 55 km, and Lmin ∼ 36 km. The higher accumulations (q = 4 mm h−1) suffered from bias with ΔFSS ∼ −0.48 for L = 55 km and Lmin once again undefined below 180 km. By contrast, the spatial distribution of the most intense rainfall produced by the model coincides well with that of the observations, and is better than DA1 as can be seen subjectively by comparing Figs. 4d,e,f. Hence, for q = 90th, Lmin ∼ 24 km, and ΔFSS ∼ 0.24 when L = 55 km.
Table 2 includes the FSSs for both DA1 and DA2 at the observational grid length (i.e., L = 5 km), all of which are relatively poor compared to the scores at L = 55 km. Also shown in the same table are the equitable threat scores (ETS) calculated using 5-km grid-length data. The ETS has a range of (−1/3, 1) with 1 representing a perfect forecast; hence, according to this traditional form of scoring, both DA1 and DA2 produced poor forecasts. These low scores at the grid length highlight the need for a scale-dependent verification procedure.
2) Statistical verification via skill scores
In this section we consider the statistical results obtained by applying configuration DA1 to the five CSIP cases. In each case, four forecast cycles (as depicted in Fig. 2) were performed, leading to a total of 20 members at each forecast time. Shown in Figs. 5 and 6 are box plots of the resulting ΔFSS distributions as a function of forecast time for three absolute and three percentile thresholds, respectively, at L = 55 km. At all thresholds, mean skill (shown by the solid lines) is relatively high during the DA period (spanning [T − 2 h, T + 1.25 h]), and subsequently decreases during the free-forecast period. Large variation in skill is evident at each forecast time, reflecting the difficulty in determining skill on the basis of individual forecasts. For this reason, subsequent analysis will concentrate on properties associated with the mean value of skill, ⟨ΔFSS⟩. In general, the estimated standard error in ⟨ΔFSS⟩, Se, is small enough for significant conclusions to be drawn regarding model skill. A simple bootstrap resampling technique involving 30 000 resamples was used to produce the Se corresponding to each ⟨ΔFSS⟩.
The dependence of the mean ΔFSS on horizontal length scale is shown in Fig. 7 for the two accumulation periods [T + 2, T + 3] and [T + 6, T + 7]. With the exception of q = 4 mm h−1 during the period [T + 2, T + 3], which is dominated by a large bias, all of the curves display a relatively rapid initial increase in skill. This is evidence of the model successfully producing storms that are realistic, but spatially displaced with respect to the observed storms. The rapid increase is indicative of the fractions associated with a substantial number of storms beginning to overlap [see Eq. (9)]; once this overlap is achieved, the rate of change in skill progresses at a lower rate.
As in Figs. 5 and 6, a decrease in skill with time at all thresholds is apparent by comparing relevant curves at [T + 2, T + 3] and [T + 6, T + 7] in Fig. 7. In the analysis presented in later sections of this paper, skill is assessed in terms of the time-dependent behavior of the mean Lmin. Figure 8 shows the mean Lmin, where the upper and lower limits of the error bars correspond to values of Lmin calculated from the mean ΔFSS + Se and the mean ΔFSS − Se, respectively (see Fig. 7).5 Note that the smaller the mean Lmin is, the better the associated forecast (as is evident from Fig. 7). During the first hour without any DA (i.e., [T + 2 h, T + 3 h]), an acceptable level of skill for q = 0.2 mm h−1 is first reached at ∼35 km with an error of ±10 km, increasing to ∼90 (−20, +30) km at [T + 6 h, T + 7 h] (see Fig. 8a, which also includes data pertaining to other model runs, discussed later).
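The FSS underlying these scores compares forecast and observed fractional coverages within square neighbourhoods of increasing side length L, and Lmin is the smallest L at which the score first exceeds the level of acceptable skill. A sketch, assuming a simple constant skill target (the study's actual target is defined elsewhere and may depend on the observed rain fraction) and a 4-km grid:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(forecast, observed, q, n):
    """Fractions skill score for threshold q and an n x n neighbourhood
    of grid points (length scale L = n * dx). Ranges from 0 to 1."""
    pf = uniform_filter((forecast >= q).astype(float), size=n, mode="constant")
    po = uniform_filter((observed >= q).astype(float), size=n, mode="constant")
    mse = np.mean((pf - po) ** 2)
    ref = np.mean(pf ** 2) + np.mean(po ** 2)  # worst-case (no-overlap) MSE
    return 1.0 - mse / ref if ref > 0 else 0.0

def l_min(forecast, observed, q, dx=4.0, target=0.5, n_max=45):
    """Smallest length scale (km) at which the FSS first exceeds `target`;
    returns None if that never happens below n_max * dx (here 180 km)."""
    for n in range(1, n_max + 1, 2):  # odd neighbourhood sizes only
        if fss(forecast, observed, q, n) > target:
            return n * dx
    return None
```

A single displaced cell illustrates the behaviour described in the text: the score is zero at the grid scale and rises once the neighbourhood fractions begin to overlap.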
The model overpredicts precipitation, as can be seen in Fig. 9, which shows R, the mean ratio of model to radar-derived precipitation rates over the verification area. To understand this effect more clearly, we consider the model-to-radar ratio of the areas where rates exceed a given threshold of x mm h−1, denoted Rarea>x. During the DA1 forecast period spanning [T + 1 h 15 min, T + 3 h], the areas covered by significant precipitation are overestimated, as can be seen from Rarea>0.2 (see Fig. 10). Additionally, the rates associated with those areas are too high, as can be seen from Rarea>1 (moderate rates) and Rarea>4 (high rates) in Figs. 11 and 12, respectively. This latter overprediction of rates explains, in part, the decrease in mean skill with increasing absolute threshold that is evident by comparing different thresholds in Figs. 5, 7a, and 8a,b. Note that the mean Lmin is not shown for q = 4 mm h−1 because, owing to bias, it failed to reach a value below the maximum considered scale of 180 km.
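The area-ratio diagnostic is straightforward to compute from the two rate fields; a minimal sketch (names are illustrative):

```python
import numpy as np

def area_ratio(model_rates, radar_rates, x):
    """Ratio of model to radar areas where the rain rate exceeds x mm/h
    (the Rarea>x diagnostic). Values above 1 indicate that the model
    overpredicts the areal coverage of rain heavier than x."""
    model_area = np.count_nonzero(model_rates > x)
    radar_area = np.count_nonzero(radar_rates > x)
    return model_area / radar_area if radar_area else np.inf
```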
It is difficult to determine the exact cause of the anomalously high rain rates depicted in Figs. 11 and 12, but it is likely that they arise from a combination of model error and the imperfect nature of the DA system. As already noted, the use of percentile thresholds effectively takes into account the error associated with anomalously high precipitation rates. For example, in the case of q = 90th, the mean Lmin attains relatively low values (Fig. 8c), demonstrating that the DA1 forecasts are actually more skillful in predicting the spatial distribution of the relatively heavy rainfall than the overall rainfall (as measured using q = 0.2 mm h−1). This difference is indicative of the model tendency to produce cells that are too large in spatial extent. For higher percentile thresholds, the skill drops significantly (see Figs. 6a–c and 7), reflecting less skill in predicting the position of more intense rain.
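Percentile thresholds can be implemented by thresholding each field at its own percentile, so that comparable wet-area fractions are compared and any overall intensity bias is discounted. A sketch, assuming the percentile is taken over nonzero (wet) points, which is our assumption rather than a stated detail of the study:

```python
import numpy as np

def percentile_exceedance(field, pct):
    """Binary exceedance mask using the field's own pct-th percentile as
    the threshold (computed over wet points only, an assumption here).
    Applying this to model and radar fields separately compares spatial
    patterns while discounting differences in absolute intensity."""
    wet = field[field > 0.0]
    if wet.size == 0:
        return np.zeros_like(field, dtype=bool)
    return field >= np.percentile(wet, pct)
```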
From T + 3 h to T + 7 h, the areal coverage (as measured using Rarea>0.2) is increasingly underestimated.6 Examination of individual forecasts indicates that this is due to the model predicting fewer storms than are present in the observations. One reason that a real storm may be missed is that it is located outside of the model domain during the DA period; that is, it either initiates inside the domain or moves into it only after the free forecast has begun. It is also likely that some real storms are missed because their formation is suppressed by overly intense preexisting model storms.
b. Comparison with alternative forecast methods
1) Comparison with 12-km model
As stated in the introduction, owing to the computational expense of running high-resolution models, it is imperative that they produce forecasts of added benefit in comparison with lower-resolution model counterparts. In this section we compare the statistical skill of DA1 with that of the 12-km grid-length model in forecasting the five CSIP cases.
The 12-km grid-length model forecasts are less skillful than those of DA1 at all thresholds throughout the entire time period (see Fig. 8). At this resolution the model is unable to dynamically resolve convective structures, and hence convection is dealt with via the parameterization scheme described in section 2a. The scheme has no memory and, therefore, cannot represent storm propagation. One consequence of this is a tendency to produce too much widespread light-to-moderate rain (Lean et al. 2008), evidenced in the present study by the very large values of Rarea>0.2 and Rarea>1 (Figs. 10 and 11) exhibited during both the DA and forecast periods. This overprediction is a large contributing factor to the relatively poor skill exhibited at q = 0.2 mm h−1 (Fig. 8a) and q = 1 mm h−1 (Fig. 8b); in the latter case, an acceptable level of skill is not obtained even at the maximum scale of 180 km considered in this study (hence there are no data points pertaining to the 12-km runs in Fig. 8b). Evidence of this effect is also present in the 12-km forecast pertaining to the illustrative case introduced in section 4a. Figure 13 shows the relevant [T + 2 h, T + 3 h] accumulation produced by the 12-km forecast, in which overprediction of low-intensity rain can be seen throughout the verification area. In a similar fashion to the DA1 forecasts, the values of Rarea>0.2 and Rarea>1 associated with the 12-km runs decrease with time. Rather than implying an increase in skill, this is due to the model failing to depict storms that occur inside the domain after the DA period has finished.
Another consequence of the model’s failure to properly depict convection is that peak rain rates (i.e., those associated with the cores of storms) tend to be significantly underestimated (Lean et al. 2008). Evidence of this bias can be seen in the illustrative example (Figs. 4 and 13). In the statistical analysis (Fig. 12), this leads to values of Rarea>4 below unity between [T − 2 h, T − 0.5 h] and from T + 3 h onward (with the exception of the period around T + 5 h, which is associated with the outlier mentioned in section 4b). In the intervening period [T − 0.5 h, T + 3 h], Rarea>4 ∼ 1, indicative of DA successfully introducing storms of realistic intensity. Even during this period, however, the spatial location of rain above this threshold is poor, and hence the mean Lmin fails to reach a value below the maximum considered scale of 180 km for q = 4.0 mm h−1 throughout the 9-h period.
Failure to depict organized convection also leads to relatively low skill in the spatial distribution of convective rain as measured using percentile thresholds. Statistically, the 12-km model forecasts are shown to be worse than those of DA1 in predicting the location of heavy rainfall (Fig. 8c). Evidence of this poor representation is provided by the illustrative case, in which the 12-km model failed to produce coherent propagating storms corresponding to the regions R1 and R2 in Fig. 4. Instead, the convection scheme produced three quasi-stationary areas of rainfall, which led to the accumulation areas marked C1a, C1b, and C2 in Fig. 13. This particular example also includes another frequently encountered problem, namely spurious rainfall caused by the triggering of convection along coastlines, resulting in this case in the feature labeled C3. It is clear from these results that the 4-km model with DA in place leads to far better forecasts than are obtainable via the 12-km model on nowcasting time scales.
2) Comparison with 4-km spinup forecasts
In Lean et al. (2008), a 4-km grid-length version of the UM was run using interpolated T + 1 h 12-km model states as initial conditions. The same procedure, denoted as SPIN in what follows, has been adopted here for the five CSIP cases; that is, at every 3-h DA cycle, the 12-km T + 1 h state was interpolated to the 4-km grid and the resulting initial conditions were used to run the 4-km model out to T + 7 h. As convection is parameterized in the 12-km model, explicitly resolved convective structures are not contained in either the initial T + 1 h state or the driving LBCs, although such structures can develop, or spin up, during the subsequent forecast. The accuracy of the resulting precipitation fields may be relatively low, however, both because of the unphysical nature of the spinup process and because the initial large-scale information is insufficient to predict smaller-scale convective structures. The theoretical advantage of running with a DA system in place is therefore twofold, in that it avoids the unphysical spinup period and introduces high-resolution observational information.
In the present study, spinup is evident in the time-dependent behavior of the Rarea>x values corresponding to the SPIN forecasts (Figs. 10–12). Initially, Rarea>0.2 is very small, corresponding to a lack of resolved storms. A relatively rapid spinup period then occurs, ending at ∼T + 2 h, reflecting a growth in both the number of storms depicted and their areal extent. Rarea>0.2 then remains approximately constant, with a value comparable to that exhibited by DA1 at T + 7 h. Hence, after an initial adjustment process, both DA1 and SPIN forecasts similarly underpredict the number of storms, owing to the mechanisms already discussed in section 4a.
Initial values of Rarea>1 and Rarea>4 are also small (Figs. 11 and 12). This is partly because there are too few storms and partly because those storms that are present are too weak. Strengthening occurs on a longer time scale than that associated with Rarea>0.2. Skill scores relating to absolute thresholds are very poor throughout the forecast period, with the mean Lmin failing to reach a value below the maximum considered scale of 180 km in the case of q = 1 mm h−1 (Fig. 8b), and doing so only after T + 6 h in the case of q = 0.2 mm h−1 (Fig. 8a). At early stages, low skill is associated with the underprediction of both areal coverage and precipitation strength. Hence, for example, the [T + 2 h, T + 3 h] SPIN forecast pertaining to the illustrative case of section 4a produced no rainfall during that hour. At later times the main source of error is a poor spatial distribution of rain and, in particular, too few storms.
By contrast, SPIN forecasts are modestly skillful in predicting the spatial distribution of the intense rain (Fig. 8c). The time-dependent behavior of the mean Lmin for q = 90th can be understood as arising from two competing effects: the spinup process and error growth. During the first 4 h, skill increases as storms develop and strengthen; error growth then begins to reduce the skill in the final hour. From T + 3 h, the skill obtained surpasses that associated with the 12-km model. However, it is always less than that obtained using DA1. In summary, these results show that DA is vital in obtaining skillful 4-km grid-length model forecasts even at T + 7 h.
c. Relative impact of DA components
This section considers the impact on forecast skill of each component of the DA system. In the previous sections it was shown that the verification scores corresponding to relatively high absolute thresholds are significantly affected by rainfall-rate bias. For this reason we assess performance by concentrating on the two thresholds q = 0.2 mm h−1 and q = 90th. In the following, a number of DA configurations are considered, in which specified changes are made with respect to the control configuration, DA1.
1) Impact of LHN
The impact of LHN was assessed via configuration NOLHN, in which LHN was switched off (Fig. 14). LHN is essential in depicting storms over the period [T − 2 h, T + 3 h], as can be seen both from the impact on skill in Fig. 14 and from the increase in Rarea>0.2 (Fig. 10). While this increase is indicative of a greater number of real storms being depicted in the model, the areal extent of these storms is nevertheless too large. A possible cause of this is the breakdown of the central assumption of LHN mentioned in the introduction. In storms where a significant spatial separation between the surface rain rate and the latent heat release aloft occurs, it is likely that positive LHN increments will be introduced over too wide an area, resulting in an eventual overprediction of precipitation coverage. While beyond the scope of the present study, this effect could possibly be mitigated by building a phase correction into the LHN procedure.
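The LHN mechanism discussed here amounts to rescaling the model's latent heating by the observed-to-model rain-rate ratio. The sketch below illustrates the idea only: the cap on the scaling ratio (`max_scale`), the exact role of the tuning factor FLN, and the function signature are all assumptions, not the Met Office formulation:

```python
import numpy as np

def lhn_increment(lh_profile, rain_obs, rain_model,
                  f_ln=0.5, max_scale=3.0, positive_only=True):
    """Illustrative latent heat nudging increment for one model column.

    Core LHN assumption: latent heating aloft is proportional to the
    surface rain rate, so the model heating profile is rescaled by the
    observed/model rain-rate ratio. The ratio cap and the way f_ln
    scales the increment are assumptions made for this sketch.
    """
    if positive_only:
        # Scale regions of positive heating only (as in configuration DA1).
        lh_profile = np.maximum(lh_profile, 0.0)
    ratio = np.clip(rain_obs / max(rain_model, 1e-6), 0.0, max_scale)
    return f_ln * (ratio - 1.0) * lh_profile
```

With this form, doubling the rain-rate ratio doubles the heating perturbation, which is consistent with the text's observation that FLN = 1.0 produced increments large enough to destabilize the model.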
From T + 3 h onward, LHN has a neutral impact on the mean Lmin for a threshold of q = 0.2 mm h−1, but its positive impact on depicting the 90th percentile persists until T + 6 h. In configuration DA1 the LHN parameter, FLN, was set to 0.5. The dependency of model skill on this parameter was investigated by testing three additional configurations with FLN = 1.0, 0.25, and 0.125, respectively. The configuration using FLN = 1.0 resulted in LHN increments that were large enough to cause instability and eventual model failure. Decreasing FLN from 0.5 to 0.25 (labeled as DA2 in section 4a) had a negligible impact on the mean Lmin during the forecast period (Figs. 14c,d), while a further halving to 0.125 (configuration DA3) led to a significant reduction in skill (Figs. 14e,f). Later trial studies performed within the context of the Met Office 4-km domain have shown that reducing FLN from 0.5 to 0.25 can avoid spuriously large rain rates in winter frontal-dominated cases. Hence, the latter setting has been chosen for operational use.
The version of LHN used in the Met Office 12-km grid-length model scales regions of both positive and negative latent heat release. However, idealized high-resolution studies (Leuenberger and Rossa 2007) showed an improvement in skill when LHN was used to scale positive regions only. As noted above, this restriction was imposed in configuration DA1. A further configuration, P&N, was tested in which scaling of both positive and negative latent heat was permitted. Comparing the resulting time dependence of the mean Lmin for P&N with that of DA1 in Fig. 15 shows that scaling negative regions slightly reduces overall skill during the forecast period.
As discussed in section 2b, Jones and Macpherson (1997) found that in the context of the 12-km grid-length version of the UM, it was beneficial to horizontally filter the LHN increments before adding them to the model state. To assess the effect of filtering in the context of the 4-km model, the configuration FILT was tested, which used the same filter settings as employed in the 12-km model. The behavior of the mean Lmin is shown in Fig. 16. Switching off the filter leads to a clear improvement in skill at both thresholds until the accumulation hour [T + 3 h, T + 4 h]. This is in agreement with the expectation that, while the 12-km model benefits from the use of spatially smooth LHN increments, greater small-scale structure should benefit the 4-km model because most convection is explicitly resolved.
2) Impact of cloud nudging
The impact of CN on skill was assessed via the configuration NOCN (Figs. 17a,b) in which CN was switched off. In the case of q = 0.2 mm h−1, the effect of CN is neutral during the DA period and small and positive in the free forecast (Fig. 17a). However, including CN has a significant positive impact from T + 3 h onward on the skill of depicting the position of heaviest rainfall, as measured using q = 90th (Fig. 17b).
For a threshold of q = 90th, CN has a negative impact during [T − 2 h, T + 3 h]. This relative decrease in skill followed by an increase at later times may be indicative of a suboptimal interaction between CN, on the one hand, and LHN, VAR, and the updated LBCs, on the other, arising from mutual inconsistencies in their formulations. Alternatively, it may be that in the absence of CN, LHN leads to a more accurate precipitation field during the DA period by altering the underlying dynamics in a relatively inaccurate manner (i.e., it overfits the model), and subsequent error growth leads to a worse depiction of the precipitation field than would have been obtained if CN increments had been included.
To avoid the introduction of timing errors, the insertion period for a cloud analysis should be shorter than the time scale over which the observed cloud is evolving; that is, less than an hour in the case of the 4-km grid-length UM. However, given that analyses are only available hourly, this implies intermittent nudging, which may lead to the introduction of imbalance in the model. To determine whether greater skill is achieved via a longer or shorter insertion period, a further configuration (LONG) was tested using a relatively long insertion period of [T − 40 min, T + 10 min]. Figures 17c,d show that use of the longer insertion period has a neutral impact for q = 0.2 mm h−1, but leads to a substantial negative impact for q = 90th from T + 3 h.
3) Impact of 3D-VAR
The impact of 3D-VAR was assessed by running with a configuration (NOVAR) in which the VAR component was switched off. The resulting values of the mean Lmin are shown in Fig. 18. VAR has a large positive impact from T + 2 h onward, both on the overall rainfall (q = 0.2 mm h−1) and on the spatial distribution of the heaviest rain (q = 90th). These results show that VAR has a positive effect at scales smaller than those at which it explicitly operates (see section 2b), indicating that skillful prediction of small-scale features requires an accurate depiction of the large-scale flow. Evidence of the suboptimality and/or overfitting discussed in the previous section is also present in the case of VAR (see [T + 0 h, T + 1 h] in Fig. 18b). That this effect appears to be weaker than in the case of CN is consistent with the expectation that convective storms will be affected more directly by the high-resolution moisture information contained in the CN increments.
It is clear from Fig. 10 that the inclusion of VAR contributes significantly to the large values of Rarea>0.2 associated with DA1 throughout the DA period and until ∼T + 3 h.7 This areal overprediction is also the cause of the large values of Rarea>1 that occur during the same period. The effect of VAR on higher rates (i.e., Rarea>4) is small and neutral.
It is possible that the areal overprediction associated with VAR could be reduced, while maintaining skill, by altering the way in which the VAR increments are added to the model state. For example, a planned replacement of the IAU with a digital filter procedure (Rawlins et al. 2007) would involve a balanced set of VAR increments being added instantaneously to the model rather than over a 2-h period, with the aim of reducing harmful interactions between the large-scale information present in the increments and the small-scale convective modes in the model, as well as reducing suboptimal interactions between VAR and both the nudging systems and the LBCs.
This study has shown that use of a DA system based on nudging and VAR in a 4-km grid-length version of the UM leads to convective precipitation forecasts that are skillful on nowcasting time scales of 1–7 h. More specifically, it has been demonstrated that for the spatial distribution of hourly rainfall, a mean level of acceptable skill for the 90th percentile is obtained at length scales of ∼30 km at [T + 2, T + 3], increasing to ∼60 km at [T + 6, T + 7]. For a threshold of 0.2 mm h−1, acceptable skill is obtained only at larger scales: ∼35 km at [T + 2, T + 3], increasing to ∼90 km at [T + 6, T + 7]. The reduced level of skill for this threshold primarily arises because the 4-km forecasts tend to produce cells that are larger than in reality. It has also been shown that the 4-km forecasts suffer from an overprediction in mean rates, thereby reducing the skill at higher absolute thresholds. However, from the point of view of an end user, if this bias is borne in mind, the forecasts can be regarded as a skillful tool for predicting the location of convective rain. The forecasts produced using the DA system display much higher skill than those obtained using either a 12-km grid-length model with DA or a 4-km grid-length model run from 12-km initial states.
Up until T + 3 h, LHN has the greatest impact on skill. At later times, VAR, LHN, and CN contribute equally to the skill measured using a threshold of q = 90th, while for a threshold of q = 0.2 mm h−1 VAR makes a large contribution and LHN and CN have a negligible impact.
Both VAR and LHN lead to an overestimation of the areal coverage of precipitation. It is likely that this could be mitigated by adding in VAR increments via a digital-filter method and incorporating a phase correction into the LHN procedure, respectively. Furthermore, greater improvements could be made if 4D-VAR were to be used to analyze large scales. Following Rawlins et al. (2007), the resulting increments would be added instantaneously at the start of the DA period and, hence, lessen any suboptimality associated with subsequent nudging.
There are clear theoretical advantages of assimilating both precipitation rates and MOPS data via 4D-VAR instead of the nudging systems used here. Chief among them is that the basic assumption of LHN, which can break down at the convective scale, would no longer be relied upon; instead, the link between model dynamics and precipitation would be achieved via the linear perturbation forecast model. A second advantage is that including all observation types within a single 4D-VAR system would avoid any mutual inconsistencies in the formulations of the nudging systems, hence reducing suboptimal interactions. While the Met Office is currently developing 4D-VAR of both MOPS cloud and precipitation data, it is not clear how quickly a successful implementation within high-resolution models will be achieved. Success depends on the differences between the background state and observations being weakly nonlinear at most, which, in turn, relies on an accurate representation of the background error covariance matrix. Such a representation must include flow-dependent balance constraints that are valid at high resolution, and a method of analyzing large- and small-scale information separately needs to be determined. Furthermore, the assimilation of radar-derived winds and refractivities (currently being developed within the Met Office) will be needed in order to accurately constrain convective dynamics. Until these developments are completed, nudging procedures are likely to remain an essential component of the operational forecasting system.
We thank the following people for helpful discussions: Bruce Macpherson, Richard Renshaw, and Olaf Stiller. We are grateful to Brian Golding for helpful comments on an earlier version of this manuscript. We also thank the two anonymous reviewers for their valuable suggestions.
Corresponding author address: Mark Dixon, Met Office Joint Centre for Mesoscale Meteorology, Reading, Berkshire RG6 6BB, United Kingdom. Email: email@example.com
The current operational 12-km grid-length model is the North Atlantic European (NAE), which is driven by the global version of the UM.
In the present study, the search was limited to a square of side 109 grid points, centered on grid point i. In practice, most successful searches terminated at much shorter distances.
The DA system adopted here in the case of the 12-km model corresponds to that used in the retired MES model. The current 12-km operational model—the NAE—uses 4D-VAR.
The current operational 4-km grid-length model, the domain of which encompasses the United Kingdom, does not employ the scale-separation technique.
If the mean ΔFSS + Se leads to a value of Lmin that exceeds the maximum length of 180 km considered in this analysis, then only the lower error-bar limit is shown.
The peaks that occur during the period [T + 5 h, T + 6.5 h] are associated with an extreme outlier.