A new frontier in weather forecasting is emerging as operational forecast models are now run at convection-permitting resolutions at many national weather services. However, this is not a panacea; significant systematic errors remain in the character of convective storms and rainfall distributions. The Dynamical and Microphysical Evolution of Convective Storms (DYMECS) project is taking a fundamentally new approach to evaluate and improve such models: rather than relying on a limited number of cases, which may not be representative, the authors have gathered a large database of 3D storm structures on 40 convective days using the Chilbolton radar in southern England. They have related these structures to storm life cycles derived by tracking features in the rainfall from the U.K. radar network and compared them statistically to storm structures in the Met Office model, which they ran at horizontal grid lengths between 1.5 km and 100 m, including simulations with different subgrid mixing lengths. The authors also evaluated the scale and intensity of convective updrafts using a new radar technique. They find that the horizontal size of simulated convective storms and the updrafts within them is much too large at 1.5-km resolution, such that the convective mass flux of individual updrafts can be too large by an order of magnitude. The scale of precipitation cores and updrafts decreases steadily with decreasing grid length, as does the typical storm lifetime. The 200-m grid-length simulation with standard mixing length performs best over all diagnostics, although a greater mixing length improves the representation of deep convective storms.
The 3D structures of over 1,000 convective storms observed by the Chilbolton radar are used to constrain storm dynamics and microphysics in models with resolutions between 100 and 1,500 m.
Convective storms are a frequent cause of flash floods in many midlatitude countries and can be accompanied by other threats such as hail, lightning, and severe winds. These events can have wide-ranging impacts on livelihoods and infrastructure, so the timing and location of convective storms, as well as their evolution, are important to forecast accurately. Numerical weather prediction (NWP) models are now run at convection-permitting resolutions at several operational forecasting centers. For instance, the Met Office runs its forecast model at a 1.5-km grid length and the German weather service runs its model at a 2.8-km grid length (Baldauf et al. 2011). At these resolutions, models are run without a convection parameterization, which improves the representation of mesoscale convective systems (Done et al. 2004) and the diurnal cycle of convection (Pearson et al. 2014). In practice, at least 4–5 grid lengths are required to represent a cloud (Lean et al. 2008), so clouds that ought to be at smaller scales are either represented at larger scales or are not represented at all. This is at least partly to blame for some of the known problems in kilometer-scale modeling of convection: the timing of storm initiation and onset of rainfall can be delayed or advanced compared to observations (Kain et al. 2008; Lean et al. 2008), and individual storms can fail to organize into larger systems (Pearson et al. 2014). Nevertheless, we perceive a belief in the wider community that resolutions on the order of 1 km are sufficient for accurate simulations of convection, and any remaining issues will disappear simply by cranking up the resolution.
To test this belief, it is paramount to provide NWP models with benchmark observational datasets of convective storms and suitable diagnostic tools, especially as model development turns its focus to even higher resolution and more elaborate microphysics schemes (including the representation of graupel and hail). A number of recent field campaigns have targeted convective systems with radar and had a major focus on evaluating convective-scale models: for instance, the Convective Storm Initiation Project (Browning et al. 2007), the Convective and Orographically induced Precipitation Study (COPS; Wulfmeyer et al. 2008), the Midlatitude Continental Convective Clouds Experiment (Tao et al. 2013), the Tropical Warm Pool–International Cloud Experiment (May et al. 2008), and the 2011–12 MJO field campaign (Yoneyama et al. 2013). However, few studies have evaluated NWP models statistically against systematic observations of the 3D dynamical and microphysical structure of storms to answer key questions for convective-scale NWP. How does the distribution of storm size, lifetime, and total rainfall compare between model and observations? What is the strength and horizontal scale of updrafts in convective clouds and how accurately do models of different resolution capture them? Do models “converge” on the correct behavior when resolution is increased beyond a certain point?
These were exactly the questions that we targeted in the Dynamical and Microphysical Evolution of Convective Storms (DYMECS) project, in which we used automated radar scanning to track the evolution of the 3D structure of more than 1,000 convective storms. The storms were observed over 40 nonconsecutive days between July 2011 and August 2012 under various synoptic conditions, incorporating the majority of days with convective weather during this period. Through the development of specialized diagnostic tools, these observations are providing a powerful constraint on simulations by the Met Office model at resolutions between 1.5 km and 100 m, guiding the specification of physical processes. The large number of storms analyzed allows us to confidently judge whether model changes lead to real improvements in the statistical representation of convective storms. Future analysis of model predictive skill for individual convective events can thus be carried out with an improved understanding of a model’s representation of convective storms. The DYMECS observational data are publicly available at the British Atmospheric Data Centre, but much of the approach we propose would be applicable to data from other field campaigns with a substantial radar presence, such as those mentioned above.
During the DYMECS project, we scanned more than 1,000 storms with the 3-GHz Chilbolton Advanced Meteorological Radar in southern England. With a 25-m dish, it is the largest fully steerable meteorological radar in the world. The narrow beamwidth (0.28°) allows for measurements separated by only a few hundred meters, approximately 440 m out to 100 km from the radar, making the instrument ideal for evaluating high-resolution models. The dish size limits the azimuthal scanning velocity to 2° s–1, so an innovative automated scanning procedure was developed to observe individual storms in real time through their life cycle, as explained in the sidebar “A radar scanning strategy for convection.”
We have developed an automated radar scanning procedure that allows us to target individual convective storms and track their development. This strategy is essential for scanning convective storms with the Chilbolton radar because of its low azimuthal scanning velocity, but the short time scales over which convective storms evolve make this strategy of use to faster scanning weather radars too. The procedure is composed of two algorithms, the first of which identifies and tracks storm features in the 1-km Met Office rainfall-radar data, using a rainfall-rate threshold of typically 4 mm h–1. We use the autocorrelation between consecutive rainfall scenes to calculate velocity vectors for individual storms (Stein et al. 2014). For each rainfall image, this algorithm produces a list of storms recording their size, lifetime, mean rainfall rate, velocity vector, and location relative to Chilbolton, as well as the locations and values of local rainfall maxima (i.e., the convective “core”). The second algorithm, the “scan scheduler,” scores each storm in the list based on its size and rainfall rate, while scores are reduced based on storm location relative to the current scanning position of the radar. The scan scheduler then prioritizes three storms to be scanned. Although the automated scanning procedure works well without interference, the user is able to monitor and adjust the list of prioritized storms.
Accounting for the time that has passed since the rainfall-radar data were recorded, the scan scheduler uses the velocity vector to update the storm and core locations (shown in Fig. SB1a). The scan scheduler issues commands to the radar to automatically scan the storms. Sets of four RHIs target the three most intense cores in the currently prioritized storms (Fig. SB1b). These commands are issued first as the core locations will be more accurately predicted closer to the time of the new rainfall data, while the cores are also expected to evolve more quickly. The RHIs are followed by stacks of 6–13 PPIs, a minimum of 0.5° apart, that target one or more storm(s) until all prioritized storms have been scanned (Fig. SB1c). The full cycle can last up to 15 min.
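The scheduling logic described in this sidebar can be sketched in a few lines of Python. This is an illustrative sketch only, not the DYMECS implementation: the `Storm` fields, the linear form of the score, and the weights `w_area`, `w_rain`, and `w_slew` are all assumptions, since the text states only that scores rise with storm size and rainfall rate and fall with distance from the current scanning position (approximated here by the azimuth difference the dish would have to slew).

```python
from dataclasses import dataclass
import math


@dataclass
class Storm:
    ident: int
    x_km: float           # east of the radar when the rainfall image was recorded
    y_km: float           # north of the radar
    u_kmh: float          # eastward storm velocity (km/h)
    v_kmh: float          # northward storm velocity (km/h)
    area_km2: float
    mean_rain_mmh: float


def advect(storm, elapsed_min):
    """Move the storm along its velocity vector for the time elapsed."""
    dt_h = elapsed_min / 60.0
    return storm.x_km + storm.u_kmh * dt_h, storm.y_km + storm.v_kmh * dt_h


def azimuth_deg(x_km, y_km):
    """Compass bearing from the radar (0 = north, increasing clockwise)."""
    return math.degrees(math.atan2(x_km, y_km)) % 360.0


def score(storm, elapsed_min, radar_azimuth_deg,
          w_area=1.0, w_rain=1.0, w_slew=0.1):
    """Hypothetical score: reward size and rain rate, penalize slewing."""
    x, y = advect(storm, elapsed_min)
    diff = abs(azimuth_deg(x, y) - radar_azimuth_deg) % 360.0
    slew = min(diff, 360.0 - diff)  # shortest angular distance in degrees
    return w_area * storm.area_km2 + w_rain * storm.mean_rain_mmh - w_slew * slew


def prioritize(storms, elapsed_min, radar_azimuth_deg, n=3):
    """Return the identifiers of the n highest-scoring storms."""
    ranked = sorted(storms,
                    key=lambda s: score(s, elapsed_min, radar_azimuth_deg),
                    reverse=True)
    return [s.ident for s in ranked[:n]]
```

With a 2° s–1 azimuthal limit, a slew penalty of this kind is one plausible way to keep the dish from spending most of a cycle travelling between targets; the weights would need tuning against real scan times.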
We tracked storms in the Met Office rainfall radar data, which provide radar-derived surface-rainfall rates at 1-km resolution every 5 min (Harrison et al. 2012). Using a rain-rate threshold of 4 mm h–1 to isolate convective storms, storm features with an area of at least 4 km2 were given a unique identifier and tracked throughout their life cycle, so that life-cycle statistics could be derived for model evaluation.
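The identification step can be illustrated with a simple connected-component search on a gridded rainfall field. The sketch below is an assumption-laden illustration rather than the tracking code used in the project: the function name `label_storms`, the use of four-neighbor connectivity, and the choice of an inclusive threshold (rates equal to 4 mm h–1 count as storm pixels) are not specified in the text.

```python
from collections import deque


def label_storms(rain, threshold=4.0, min_area_km2=4.0, pixel_area_km2=1.0):
    """Label contiguous regions with rain rate >= threshold (mm/h).

    `rain` is a list of rows on a regular grid. Returns a dict mapping a
    storm label to its list of (row, col) pixels. Regions smaller than
    min_area_km2 are discarded.
    """
    n_rows, n_cols = len(rain), len(rain[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    storms = {}
    next_label = 1
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0] or rain[r0][c0] < threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbors (assumed).
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and not seen[rr][cc]
                            and rain[rr][cc] >= threshold):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if len(pixels) * pixel_area_km2 >= min_area_km2:
                storms[next_label] = pixels
                next_label += 1
    return storms
```

On the 1-km rainfall grid each pixel covers 1 km2, so the 4 km2 minimum area corresponds to four connected pixels.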
Using the Chilbolton radar observations, we derived 3D storm volumes from stacks of plan-position indicator scans (PPIs). Following Stein et al. (2014), the storm volumes were regridded to a regular Cartesian grid of 333 m × 333 m × 500 m, comparable to the horizontal resolution of the radar data. Doppler winds from the range height indicator (RHI) scans were interpolated onto a Cartesian grid with 500-m range by 250-m height resolution and were assumed to be equivalent to the horizontal wind parallel to the plane for scan elevations below 10°. We retrieved vertical velocities in the storm cores from these horizontal winds, using the mass-continuity equation, assuming zero divergence across the plane. The advantage of the mass-continuity method is that its performance can be evaluated using the model vertical wind field: namely, by retrieving an updraft velocity from the model horizontal wind fields.
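A minimal sketch of a single-plane continuity retrieval is given below. It assumes zero divergence across the plane, as stated above, and additionally neglects the decrease of air density with height, which is a simplification made here for brevity and not a statement about the retrieval actually used. The function names, the centered-difference stencil, and the trapezoidal upward integration from w = 0 at the lowest level are all illustrative choices.

```python
def horizontal_convergence(u_row, dx_m):
    """Return -du/dx (s^-1) along one height level.

    Centered differences are used in the interior and one-sided
    differences at the two ends of the row.
    """
    n = len(u_row)
    conv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        conv.append(-(u_row[hi] - u_row[lo]) / ((hi - lo) * dx_m))
    return conv


def retrieve_w(u, dx_m=500.0, dz_m=250.0):
    """Integrate du/dx + dw/dz = 0 upward from w = 0 at the lowest level.

    u[k][i] is the in-plane horizontal wind (m/s) at height level k and
    range gate i. Returns w[k][i] (m/s) on the same grid.
    """
    conv = [horizontal_convergence(row, dx_m) for row in u]
    n_gates = len(u[0])
    w = [[0.0] * n_gates]
    for k in range(1, len(u)):
        # Trapezoidal rule between adjacent height levels.
        w.append([w[k - 1][i] + 0.5 * (conv[k - 1][i] + conv[k][i]) * dz_m
                  for i in range(n_gates)])
    return w
```

Because the same function can be applied to a slice of model horizontal wind, its output can be compared directly with the model vertical wind on that slice, which is the evaluation route described above.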
CLOUD-RESOLVING MODEL CONFIGURATIONS.
All DYMECS simulations were run with the Met Office Unified Model (UM), which operates with a nonhydrostatic, deep-atmosphere dynamical core (Davies et al. 2005). The baseline 1,500-m simulations were reruns of the Met Office operational forecast version of the UM (UKV), which has 70 vertical levels and is run without a convection parameterization scheme. In these simulations, subgrid mixing is treated using the Lock et al. (2000) nonlocal boundary layer scheme in the vertical and a Smagorinsky–Lilly scheme in the horizontal with a mixing length of 0.2 times the grid length. Our simulations at 500-, 200-, and 100-m grid lengths (all with 140 vertical levels) use a local Smagorinsky–Lilly eddy-diffusion scheme, solved explicitly horizontally and implicitly in the vertical direction using the same solution code as the 1D boundary layer scheme.
The subgrid turbulent mixing scheme is an essential component of NWP models with grid lengths on the order of 100–1,000 m, since such models will partially resolve the inertial subrange. For a Smagorinsky–Lilly type of scheme, a ratio of 0.2 between mixing length and grid length was shown by Mason (1994) to best resolve turbulent eddies in large-eddy simulations (LES); a smaller ratio would lead to grid-scale noise, whereas a larger ratio would lead to overly smoothed flow (e.g., Mason and Callen 1986). Hanley et al. (2014) showed that the size distribution and intensity of simulated convective storms are sensitive to the mixing-length configuration. We therefore follow their model configurations and test mixing lengths of 300, 100, and 40 m, which are the default values for the 1,500-, 500-, and 200-m grid-length simulations, respectively.
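The arithmetic behind these configurations is simple enough to check directly. The sketch below uses the neutral form of the Smagorinsky eddy viscosity, the mixing length squared times the magnitude of the resolved strain rate; stability functions used in a full scheme are omitted, and the function names are hypothetical.

```python
def mixing_length(grid_length_m, ratio=0.2):
    """Default mixing length for a given grid length."""
    return ratio * grid_length_m


def mixing_ratio(mixing_length_m, grid_length_m):
    """Ratio of mixing length to grid length."""
    return mixing_length_m / grid_length_m


def smagorinsky_viscosity(mixing_length_m, strain_rate_per_s):
    """Neutral Smagorinsky eddy viscosity (m^2/s): lambda^2 * |S|."""
    return mixing_length_m ** 2 * strain_rate_per_s


# Defaults for the 1,500-, 500-, and 200-m grids: 300, 100, and 40 m.
defaults = {dx: mixing_length(dx) for dx in (1500.0, 500.0, 200.0)}

# Ratios when all three mixing lengths are applied at 200-m grid length.
ratios_at_200m = {lam: mixing_ratio(lam, 200.0) for lam in (300.0, 100.0, 40.0)}
```

At 200-m grid length the three tested mixing lengths therefore correspond to ratios of 1.5, 0.5, and 0.2. For the same resolved strain, raising the mixing length from 40 to 300 m multiplies the neutral eddy viscosity by (300/40)², that is, by about 56.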
In this paper, we primarily focus on convective storms observed on 20 April and 25 August 2012. The April case will be referred to as the “shower” case as storms did not develop beyond heights of 5 km, while the August case had a large proportion of storms reaching heights above 8 km and will be referred to as the “deep” case; both cases are representative of other DYMECS cases (Hanley et al. 2014). The 1,500-m grid-length simulation was initialized from the 0400 UTC operational UKV analysis, with lateral boundary conditions provided by the Met Office global model. The 500-m simulation was one-way nested in the 1,500-m simulation (initialized at 0400 UTC), the 200-m simulation was one-way nested in the 500-m simulation (0700 UTC), and the 100-m simulation was one-way nested in the 200-m simulation (0900 UTC). The UM operates a single-moment microphysics scheme based on Wilson and Ballard (1999), with prognostic treatment of liquid, ice, and rain. Our simulations were run with a diagnostic split between ice crystals and ice aggregates based on cloud-top temperature. Stein et al. (2014) showed that changes to the UM ice microphysics affect the distribution of water contents in the convective storms but do not greatly affect the overall storm morphology.
To evaluate the simulations against the observed radar reflectivities, a “forward model” was used to calculate reflectivities from the UM hydrometeor fields using the UM microphysics assumptions on the particle size distribution [see appendix A of Stein et al. (2014)]. Examples of the 3D structures thus obtained, as well as cross sections of reflectivity, are shown in Fig. 1: note that these examples are representative of storm structures observed and simulated but do not depict a one-to-one correspondence between observed and simulated storms. It is clear from this figure that the 1,500-m simulation produces smooth storm structures that vary on scales of several kilometers. The 200- and 100-m simulations on the other hand show numerous individual convective towers and variations on the subkilometer scale, which appear smaller than those in the radar observations, but we need rigorous analysis of the mean morphology and dynamical structures to provide a quantitative evaluation.
STORM LIFE CYCLES.
The analysis of surface-rainfall features, including tracking their life cycles, has previously been used to understand model errors in representing convection (e.g., Weusthoff and Hauf 2008; Varble et al. 2011; Caine et al. 2013; Clark et al. 2014). In their study of multiple DYMECS cases, Hanley et al. (2014) jointly analyzed storm-averaged rainfall rate and area and noted that although 500- and 200-m grid-length simulations generate a similar number of small storms to radar observations, these small storms tend to have rainfall rates a factor of 2 too high. The 1,500-m grid-length simulation on the other hand produces storms that are larger and fewer in number than observed.
We build on the Hanley et al. (2014) study by analyzing the evolution of surface-rainfall rate and area over storm lifetimes. To isolate “dominant” storms, we apply the following rules: 1) For a storm that breaks up into fragments, only the largest fragment maintains the original storm identifier while all other fragments are given new unique identifiers. 2) For a merging event, the new storm will maintain the identifier of the largest original storm while all other original fragments are terminated. Thus, we can study the entire storm life cycle from initiation to dissipation, only counting those storms that remain inside a 200 km × 200 km domain centered on Chilbolton. We performed this analysis on the Met Office rainfall radar data and on the 5-min surface-rainfall output from each of the simulations, between 1000 and 1800 UTC, with rainfall rates aggregated onto a 1-km grid.
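The two identifier rules can be written down compactly. In the sketch below, “largest” is taken to mean largest area and ties go to the first candidate; both are assumptions, as are the function names and the use of a simple counter to generate new identifiers.

```python
import itertools


def resolve_split(parent_id, fragment_areas, new_ids):
    """Rule 1: the largest fragment keeps the parent identifier.

    `fragment_areas` lists the area of each fragment; `new_ids` is an
    iterator that yields unused identifiers. Returns one identifier per
    fragment, in the same order as `fragment_areas`.
    """
    largest = max(range(len(fragment_areas)), key=fragment_areas.__getitem__)
    return [parent_id if i == largest else next(new_ids)
            for i in range(len(fragment_areas))]


def resolve_merge(parent_areas):
    """Rule 2: the merged storm keeps the identifier of the largest parent.

    `parent_areas` maps identifier -> area just before the merger.
    Returns (surviving identifier, sorted list of terminated identifiers).
    """
    survivor = max(parent_areas, key=parent_areas.get)
    terminated = sorted(i for i in parent_areas if i != survivor)
    return survivor, terminated


new_ids = itertools.count(100)
split_ids = resolve_split(7, [3.0, 12.0, 5.0], new_ids)
merge_result = resolve_merge({7: 20.0, 9: 35.0, 12: 4.0})
```

Under these rules a frequently splitting storm population produces many short tracks with new identifiers, which is relevant when interpreting lifetime statistics from the finest grids.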
The number of small showers and short-lived events increases with decreasing grid length, yet these showers may not contribute much to total rainfall. We therefore show in Figs. 2a,b the cumulative fraction of the total rainfall from dominant storms by storm duration. In the 1,500-m simulation, approximately 20% of rainfall in the shower cases comes from storms lasting less than 2 h, compared to 60% in the observations; for the deep cases, these values are 10% and 30%, respectively. Thus, short-lived storms contribute significantly to the observed total rainfall in these cases, yet they are not represented well in the 1,500-m simulation. These results agree with previous analyses of storm duration in the UM. Barthlott et al. (2011) have shown that for a COPS case the storms last too long in the UM and that the model overestimates precipitation, while McBeath et al. (2014) showed for a convective cold-air outbreak that the UM storms last 20% longer than those observed. Clearly, the operational forecast model will overestimate storm duration and does not adequately represent short-lived storm life cycles.
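The diagnostic plotted in Figs. 2a,b can be reproduced from a list of tracked storms as sketched below. The input format, a sequence of (duration, total rainfall) pairs per dominant storm, and the function name are assumptions made for illustration.

```python
def cumulative_rain_fraction(storms):
    """Cumulative fraction of total rainfall as a function of storm duration.

    `storms` is a sequence of (duration_min, total_rain) pairs, one per
    dominant storm. Returns a list of (duration_min, fraction) pairs, where
    fraction is the share of rainfall from storms lasting no longer than
    duration_min.
    """
    total = sum(rain for _, rain in storms)
    by_duration = {}
    for duration, rain in storms:
        by_duration[duration] = by_duration.get(duration, 0.0) + rain
    curve, running = [], 0.0
    for duration in sorted(by_duration):
        running += by_duration[duration]
        curve.append((duration, running / total))
    return curve
```

Reading the curve at a duration of 2 h gives the kind of figure quoted above, for example roughly 20% for the 1,500-m simulation against 60% observed in the shower case.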
The 500-m simulation compares well with observations though it generally underestimates the contribution from short-lived storms in the shower case. The 200- and 100-m simulations show similar behavior to one another, both underestimating the contribution from storms lasting longer than 2 h. We note, however, that breakups and mergers occur more frequently in the 100- and 200-m simulations than at coarser grid lengths, which may lead to fewer long-lived storms using our metric. Apart from model grid length, storm duration also depends on the treatment of subgrid turbulence. Increasing the turbulent mixing length at 200-m grid length from its default value of 40 m to 100 or 300 m increases the rainfall contribution from long-lived storms in both cases.
To study the evolution of storm size and intensity, we combine the two variables into a storm area-integrated rainfall (AIR) amount. We center storm life cycles on the time at which they reach their maximum AIR and weight each storm by its lifetime-integrated AIR. From Figs. 2a,b, we note that in the observations long-lived storms dominate total rainfall, while short-lived storms are more numerous (not shown) and will be representative of the “typical” behavior. We therefore study separately the AIR cycle from the largest AIR contributors in Figs. 2c,d and the cycle from the smallest AIR contributors in Figs. 2e,f; each weighted-average AIR cycle shown represents 50% of total rainfall from convective storms. Figures 2c,d show that, for large-AIR storms, the 1,500-m simulation compares well with observations around the time of maximum AIR, although this is due to compensating errors of larger but less intense storms, as shown by Hanley et al. (2014). For the shower case, the 500-m simulation behaves comparably to the 1,500-m simulation, while for the deep case it generally underestimates AIR. The 200- and 100-m simulations again behave similarly to one another, both vastly underestimating AIR, likely because they miss the largest storms, which tend to be long lived. The weighted AIR increases with mixing length, reflecting the increase in rainfall contribution from long-lived storms. The 200-m grid-length simulation with 100-m mixing length compares very well with the observations for the shower case, while the 300-m mixing-length simulation performs best in the deep case.
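The compositing step, centering each life cycle on its AIR maximum and weighting by lifetime-integrated AIR, might be sketched as follows. The treatment of storms that do not exist at a given lag is not specified in the text; here they contribute zero AIR and the sum is normalized by the total weight of all storms, which is an assumption. The function name and input layout are likewise illustrative.

```python
def composite_air_cycle(cycles):
    """Weighted-mean AIR life cycle centered on each storm's peak.

    `cycles` is a list of AIR time series, one list per storm, all on the
    same time step. Each storm is weighted by the sum of its AIR values
    (its lifetime-integrated AIR up to the constant time step). Returns a
    dict mapping lag (in time steps, 0 = peak) to weighted-mean AIR.
    """
    weighted_sum, weight_total = {}, 0.0
    for air in cycles:
        peak = air.index(max(air))
        weight = sum(air)
        weight_total += weight
        for step, value in enumerate(air):
            lag = step - peak
            weighted_sum[lag] = weighted_sum.get(lag, 0.0) + weight * value
    return {lag: total / weight_total
            for lag, total in sorted(weighted_sum.items())}
```

Splitting the storm list at the point where the cumulative weight reaches half of the total, and applying the function to each half, would give the large-AIR and small-AIR composites that each represent 50% of the rainfall.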
The weighted mean from small-AIR storms, shown in Figs. 2e,f, clearly indicates that the 1,500-m simulation does not represent such storms well, as its peak AIR is a factor of 1.5 greater than observed in the shower case and a factor of 4 greater in the deep case. The 500-m simulation also has AIR a factor of 1.5 greater than observed in the shower case but compares very well in the deep case. Even for these smaller storms, the 200- and 100-m simulations underestimate AIR, by about a factor of 2 for both cases. This likely relates to these simulations producing numerous storms that are smaller than observed but more intense, as shown for the 200-m simulation by Hanley et al. (2014). For small-AIR storms, the AIR also increases with mixing length, so that the 200-m grid-length simulation with 100-m mixing length compares exceptionally well with the observations in both cases. The 200-m grid-length simulation with 300-m mixing length, however, overestimates AIR by factors of 1.5 and 2 for the shower case and deep case, respectively.
The subkilometer grid-length simulations clearly outperform the 1,500-m grid-length simulation in terms of life-cycle statistics, but we have shown that the results vary with the type of convective storms that we try to simulate, while they are also sensitive to subgrid turbulent mixing length. Furthermore, the 100- and 200-m simulations have too much of their rainfall contributed by small but intense short-lived storms, possibly as storms fail to merge into larger and longer-lived storms. To improve these simulations, we will need to better understand and evaluate the microphysical and dynamical structures that are generated.
STORM 3D MORPHOLOGY.
To evaluate 3D storm structures statistically, we describe them by an area-equivalent diameter at each height, which we then composite over multiple storms to obtain the median storm morphology. Thus, a typical storm structure can be represented by the median diameters (shown in Fig. 3). Storms are categorized by their maximum height of the 0-dBZ contour (cloud-top height) into three groups with similar numbers of observed storms. Stein et al. (2014) showed how storm structures decrease in width for each individual category as the model grid length is reduced, concluding that the 200-m simulation performs best overall. Since Hanley et al. (2014) showed that the statistics of surface-rainfall areas in high-resolution simulations are sensitive to the turbulent mixing length, we analyze the 3D morphology in the 200-m grid-length simulations for three different mixing lengths and compare these with the 1,500-m grid-length simulation and the Chilbolton observations.
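The area-equivalent diameter is the diameter of a circle with the same area as the region enclosed by a reflectivity contour at a given height, D = 2(A/π)^(1/2). A minimal sketch of the compositing is shown below; the input layout and the decision to take the median only over storms that reach a given height are assumptions made for illustration.

```python
import math
from statistics import median


def area_equivalent_diameter(area_km2):
    """Diameter (km) of the circle with the same area as the storm slice."""
    return 2.0 * math.sqrt(area_km2 / math.pi)


def median_morphology(storm_areas):
    """Median area-equivalent diameter at each height.

    `storm_areas` is a list with one dict per storm, mapping height (km)
    to the area (km^2) enclosed by a chosen reflectivity contour. Storms
    that do not reach a height are left out of the median at that height.
    """
    diameters = {}
    for areas in storm_areas:
        for height_km, area_km2 in areas.items():
            diameters.setdefault(height_km, []).append(
                area_equivalent_diameter(area_km2))
    return {h: median(d) for h, d in sorted(diameters.items())}
```

Running the function separately for the 0- and 30-dBZ contours, and for each cloud-top-height category, would yield profiles of the kind compared in Fig. 3.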
The smallest scales represented by a model tend to be 4–6 times the grid length (Lean et al. 2008), so it is not surprising that features observed at scales smaller than 6–9 km are wider in the 1,500-m simulation. For instance, the 30-dBZ contour is 7–12 km wide over the three storm categories, about twice as wide as observed (Figs. 3a–f). A similar conclusion can be drawn studying the 0-dBZ contour, though it is only about 1.5 times as wide as observed. The 200-m grid-length simulations clearly generate more realistic 3D structures than the 1,500-m simulation, while both deep and shallow structures tend to reach greater median sizes when mixing length is increased. For broad structures, such as the 0-dBZ contours in the intermediate and deep storms, simulations with 100- and 300-m mixing lengths compare better with observations. Smaller structures, such as shallow storms and 30-dBZ cores, appear better simulated by the smaller (default) 40-m mixing length. As we increase the mixing length, the total number of storms decreases, which is largely due to a decrease in the number of small storms (Hanley et al. 2014). Thus, it is plausible that the 40-m mixing-length simulation produces median storm structures that are too narrow because too many narrow features develop into tall structures, as we can see for instance in the volume reconstruction in Fig. 1c.
We have performed the same analysis for the case of 20 April 2012 (not shown) and note that, although the observed storm widths are about 10 km, the 1,500-m grid-length simulation produces storm widths still a factor of 1.2 too wide. The 200-m grid-length simulations produce storms comparable to the observations, and again simulations with larger mixing lengths perform best. Both the deep case and the shower case suggest that a smaller grid length produces more realistic storm structures but that there is no single mixing-length formulation that is satisfactory for all storm classes.
Since the DYMECS simulations were run, the operational UM ice microphysics has changed to the Field et al. (2007) double-moment scheme to determine ice-particle size distribution, and changes to the ice-particle fall speed are currently being tested. The effect of changes to model ice microphysics on storm width is small compared to the effect of changes to the model grid length (Bryan and Morrison 2012; Stein et al. 2014). However, these changes may affect the internal reflectivity structure of storms (Stein et al. 2014), so the DYMECS data and methodology provide a useful benchmark for testing such changes.
The estimation of 3D wind fields typically requires coincident observations from two or more Doppler radars; such techniques are well established in the literature (e.g., Chong and Testud 1983). Previous attempts to estimate updrafts from single-Doppler RHI observations have required the assumption (e.g., Chapman and Browning 2001) that all convergence occurs in the line of sight, enabling the estimation of vertical velocity based on the mass-continuity equation. When this technique is applied to convective rain cells in particular, updrafts will tend to be underestimated because of the undetected convergence perpendicular to the plane of the RHI. We have developed a new approach to account for this problem statistically, illustrated in Fig. 4 using data from the 500-m grid-length simulation. In our method, vertical velocities are retrieved both from the cloud top down and from the surface upward, assuming 0 m s–1 at the surface and the top; the final estimate shown in Fig. 4c is a weighted average between the two retrievals. We found the distribution of radar-derived vertical-velocity estimates to be comparable to the 500-m grid-length simulation, which has the same horizontal resolution as the interpolated radar data. However, the single-component retrieval underestimates the most intense updrafts of the true model vertical velocities. Therefore, we developed a rescaling function derived from the 500-m simulation statistics, which associates the single-component and true cumulative probability functions based on all the storms represented in the 500-m simulations on 25 August 2012. We see in Figs. 4c,d that, for this example, the strongest vertical velocities are slightly increased and the rescaled velocities are more comparable to the true velocities in Fig. 4e; we stress, however, that the method is not intended to provide the best estimate for an individual slice but to provide the best estimate of the overall velocity statistics.
For the 500-m model, the retrieved peak updraft velocity has a root-mean-square error of 3.6 m s–1 with a standard error of 0.3 m s–1. In this section, radar estimates of vertical velocities are derived and rescaled following this method and are used to statistically evaluate the true model vertical velocities. For a full description of the method, see Nicol et al. (2015).
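One common way to associate two cumulative probability functions is quantile mapping, in which a value is replaced by the value at the same quantile of the target distribution. The sketch below illustrates that general technique under stated assumptions; it is not taken from Nicol et al. (2015), and the function names, the number of quantiles, and the clamping of values outside the training range are all choices made here.

```python
from bisect import bisect_left


def quantiles(sample, n=101):
    """Return n evenly spaced quantiles of `sample` by linear interpolation."""
    s = sorted(sample)
    out = []
    for j in range(n):
        pos = j / (n - 1) * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out


def build_rescaling(retrieved_model_w, true_model_w, n=101):
    """Pair equal quantiles of retrieved and true model vertical velocity."""
    return quantiles(retrieved_model_w, n), quantiles(true_model_w, n)


def rescale(value, mapping):
    """Map a retrieved velocity to the true-velocity value at equal quantile."""
    xs, ys = mapping
    if value <= xs[0]:
        return ys[0]
    if value >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, value)  # guarantees xs[i - 1] < value <= xs[i]
    frac = (value - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])
```

In this picture the mapping would be built once from the model (retrieved versus true velocities on the same slices) and then applied to radar retrievals. It preserves the ranking of values within a sample, which is consistent with a method aimed at the overall velocity statistics and not at individual slices.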
In the radar observations, for each RHI scan we consider only the storm with the highest reflectivity observed within 90 km of the radar for inclusion in the velocity statistics presented below. For the target storm, at each height we identify the location of the maximum vertical velocity and calculate the mean vertical velocity as a function of distance from this peak velocity up to the point where the vertical velocity either falls below 1 m s–1 or no longer decreases monotonically away from the peak to avoid any broadening associated with adjacent updrafts. This same methodology was applied to vertical slices through the surface-rainfall maxima of each storm extracted from the simulations. We compared the AIR statistics of the storms targeted with RHIs to the storm population in the Met Office rainfall radar data; the storms targeted were distributed almost uniformly among the population with above-median AIR. Therefore, only the top 50% of storms according to AIR were included in the statistics for both the observations and simulations. The mean vertical velocity as a function of distance from the updraft core for these storms is shown in Figs. 5a,b for 20 April (shower case) and 25 August 2012 (deep case), respectively.
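The stopping rule for the updraft profile can be made concrete with a short sketch. The function names are hypothetical, the monotonic decrease is taken to be strict, and the width is reported as the number of contiguous points times the grid spacing; none of these details is fixed by the text.

```python
def updraft_extent(w, w_min=1.0):
    """Indices bounding the updraft around the peak of a 1D profile.

    `w` is vertical velocity (m/s) along a horizontal line at one height.
    Starting from the maximum, each side is extended while the velocity
    stays at or above w_min and keeps decreasing away from the peak.
    Returns (left, right) inclusive indices.
    """
    peak = w.index(max(w))

    def walk(step):
        i = peak
        while 0 <= i + step < len(w) and w_min <= w[i + step] < w[i]:
            i += step
        return i

    return walk(-1), walk(+1)


def updraft_width_m(w, spacing_m=500.0, w_min=1.0):
    """Width of the updraft; zero if even the peak is below w_min."""
    if max(w) < w_min:
        return 0.0
    left, right = updraft_extent(w, w_min)
    return (right - left + 1) * spacing_m
```

In the profile `[0.2, 1.5, 3.0, 6.0, 4.0, 2.0, 2.5, 5.0, 0.5]` the walk to the right stops where the velocity starts rising again toward a neighboring updraft, so the second peak does not broaden the first.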
For the shower case in Fig. 5a, the peak of the mean updraft profile in the 1,500-m grid-length simulation is about 1.5 m s–1, weaker than observed by a factor of 4. The 500-, 200-, and 100-m grid-length simulations have their mean updraft peak a factor of 2–3 weaker than observed. The mean profile width increases progressively with grid length, from about 1 km in the 100-m to about 3.5 km in the 1,500-m grid-length simulation. In the 200-m simulations, this width also increases with increasing mixing length, from 1.5 km with the default 40-m mixing length to about 3 km for the 100-m and 300-m mixing-length simulations. For the deep case in Fig. 5b, the trends with grid length and mixing length are similar to the shower case, but the 200-m grid-length simulation with 40-m mixing length is an excellent match to the observations.
The differences in mean updraft profile between the simulations and the observations have implications for the mass flux of individual updrafts. Assuming that storm width is about the same in both horizontal dimensions, we can estimate that updrafts in the 1,500-m grid-length simulation have a mass flux at least an order of magnitude too large. To obtain the same averaged mass flux over all storms, fewer storms will be required in the 1,500-m grid-length simulation than in the subkilometer simulations and in the observations. Hanley et al. (2014) found that domain-averaged rainfall rates were largely insensitive to changes in grid length and mixing length for the cases considered here. This suggests that the misrepresentation of updraft size and strength is closely tied to an inaccurate representation of the number of storms in the simulations.
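The scaling argument can be made explicit: if an updraft is roughly circular, its mass flux is proportional to the mean vertical velocity times the square of its width, so width errors dominate. The sketch below illustrates this with hypothetical function names and a placeholder air density; the inputs are for illustration only and are not values reported for the observations.

```python
import math


def updraft_mass_flux(mean_w_ms, width_m, air_density_kgm3=1.0):
    """Mass flux (kg/s) through a circular updraft of the given diameter."""
    return air_density_kgm3 * mean_w_ms * math.pi * (width_m / 2.0) ** 2


def mass_flux_ratio(width_ratio, velocity_ratio=1.0):
    """Ratio of mass fluxes for given ratios of width and mean velocity."""
    return velocity_ratio * width_ratio ** 2
```

For example, the mean profile widths quoted above for the shower case, about 3.5 km at 1,500-m grid length and about 1 km at 100-m grid length, differ by a factor of 3.5; at equal mean velocity that alone corresponds to a factor of about 12 in mass flux per updraft.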
A direct comparison between updraft and cloud morphology is obtained by comparing the width of the updraft profiles (as defined above) to the width of reflectivity profiles (determined in the same manner with a 20-dBZ threshold) in the same vertical slice or RHI scan. From Figs. 5c,d, it is clear that both the mean updraft and reflectivity widths decrease as model grid length is reduced, with the latter result expected from Fig. 3. The simulations exhibit a strong correlation between the two widths. The joint comparison of updraft width and storm reflectivity structure shows that the 200-m grid-length simulations perform better than the simulations at other grid lengths. Both the updraft and reflectivity widths increase with increasing mixing length, and the 200-m grid-length simulations with greater mixing length behave similarly to the 500-m grid-length simulation by this comparison. We note that, when we relax the monotonicity condition on the updraft and reflectivity widths, the updraft widths remain essentially unchanged (not shown), but the reflectivity widths increase by up to a factor of 3. This change due to “multipeaked” updrafts is most pronounced in the 100-m grid-length simulations and in the radar observations. As a result, the 500-m grid-length simulation and the 200-m simulations with 300-m mixing length perform best in terms of reflectivity width when multipeaked profiles are considered, while the 200-m grid-length simulations with 40- and 100-m mixing length consistently perform best in terms of updraft width, regardless of this condition.
IS 200-M RESOLUTION GOOD ENOUGH?
An accurate representation of the size spectrum of storms is an essential prerequisite for simulating convective storms, whether achieved by parameterization or by explicit representation. A systematic overforecast of large storms leads to excessive false alarms. Predictability is a separate issue and can be addressed either through assimilation of cloud-scale data such as radar (or nowcasting applications) or ensemble techniques or, ideally, a combination of the two. However, either approach requires a faithful representation of storm sizes.
Our diagnostics for convective-storm evaluation show improvements with decreasing grid length, with the 200-m grid-length simulation with 40-m mixing length performing best overall. Nonetheless, there are two aspects of this simulation that have scope for improvement. First, storms are predominantly small and short lived but intensely precipitating, which points to the need for further refinements in the microphysical assumptions and process rates. Second, deep convective clouds tend to be too narrow. Figures 2 and 3 indicate that both issues can be addressed by increasing the mixing length in the subgrid mixing scheme, but the fact that the default mixing length produces shallow convective clouds of around the right width supports the suggestion of Canuto and Cheng (1997) that the optimum mixing length is flow dependent.
An interesting finding is that storms and updrafts in the 100-m grid-length simulations are by most metrics less realistic than those in the 200-m simulations, being systematically too short lived and too narrow [see also Stein et al. (2014)]. When used in large-eddy models, the mixing length in the Smagorinsky–Lilly scheme relates to the filter scale of eddies much smaller than the large, energy-containing eddies. If the filter scale lies in the inertial subrange, its precise value is a matter of choice (provided we resolve the resulting flow), and it is usual to reduce the filter scale with decreasing grid length as more of the flow can be explicitly resolved, though it would be equally valid to hold it fixed to demonstrate numerical convergence. However, the finding that updrafts continue to decrease in size down to 100-m grid length demonstrates that the largest energy-containing eddies are not properly resolved or, at least, that the impact of the unresolved eddies on the resolved flow has substantial deficiencies.
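For reference, the link between mixing length and filter scale can be written in the textbook form of the Smagorinsky–Lilly closure. This is a sketch of the standard formulation only; the scheme as implemented in the model may include stability-dependent factors and near-surface adjustments that are omitted here. The symbols $\nu_t$ (eddy viscosity), $\lambda$ (mixing length), $C_s$ (Smagorinsky constant), $\Delta$ (grid length), and $S_{ij}$ (resolved rate-of-strain tensor) are introduced for illustration and do not appear in the original text.

```latex
\nu_t = \lambda^2 \, |S|, \qquad
|S| = \sqrt{2\, S_{ij} S_{ij}}, \qquad
S_{ij} = \frac{1}{2}\left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right), \qquad
\lambda = C_s \, \Delta
```

If the mixing length is assumed to scale linearly with grid length in this way, the pairing of a 200-m grid length with a default 40-m mixing length would correspond to $C_s = 0.2$. That value is inferred from the pairing and is not stated explicitly here.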
Previous studies differ on the question of the resolution at which convergent behavior occurs: Bryan et al. (2003) reported that statistical properties of squall lines simulated at 250- and 125-m grid lengths had not converged, and it has been reported that grid lengths smaller than 100 m are required for convergent behavior in simulations of cumulus (Petch et al. 2002) and thermals (Craig and Dörnbrack 2008). Matheou et al. (2011) claimed that a grid length of 20 m was needed to obtain convergence of cloud variables. Conversely, Khairoutdinov and Randall (2006) concluded that their idealized simulations of convection over Amazonia at 100- and 250-m grid length already showed similar behavior, although differences in interpretation will have arisen because these papers did not use the same definition of “convergence.”
A key difference between previous studies and this one is that we have observational evidence to show which model resolution produces more realistic storms. However, our finding that the 200-m model performs better than the 100-m model may be partially because the vertical resolution of these simulations is constant for all horizontal grid lengths of 500 m or less (at around 100 m at 1-km altitude and 300 m at 8-km altitude), so the 100-m horizontal grid-length model may not be any better able to resolve large eddies than the 200-m model. Therefore, further work is needed to investigate the characteristics of storms when vertical resolution is improved as well. From a practical forecasting point of view, it is also important to further diagnose whether behavior similar to that of the 200-m model can be obtained by running at a resolution of around 500 m but reducing the mixing length from the default value at this resolution (Hanley et al. 2014).
Subkilometer-scale models are emerging from being run experimentally to actually aiding forecasts: for instance, the London and Weymouth 333-m models (Golding et al. 2013; Wang et al. 2013), model development for the High Definition Clouds and Precipitation for Climate Prediction project, and Environment Canada’s high-resolution model development for outdoor venues at the 2015 Toronto Pan-Am Games. We therefore require novel observational strategies and diagnostic tools for model evaluation. In DYMECS, we have developed a modern test bed for model evaluation with statistics of over 1,000 storms observed with the Chilbolton radar and developed new diagnostics to be used in tandem to highlight model strengths and weaknesses in storm dynamics and microphysics. We have found that, for the Met Office Unified Model (UM), a grid length of 200 m performs best in all diagnostics—life-cycle statistics, storm morphology, and convective updrafts—but the results are sensitive to the choice of mixing length in the subgrid turbulence scheme. For instance, we found that shallow storm structures are better represented by a smaller mixing length, whereas deep storm structures are better represented with a larger mixing length. Also, updraft cores respond differently to changes in mixing length compared to the cores of high reflectivity.
The DYMECS approach could be applied to other radar datasets and models, provided that these include a component at high temporal resolution: for example, 5-min intervals for storm tracking and a narrow radar beamwidth to resolve storm structures. A key innovation of the DYMECS project has been to show how updraft width and intensity can be estimated from RHIs measured by a single Doppler radar and used to test these crucial aspects in cloud-resolving models. The next challenge is to establish whether better simulations of the character of convective storms lead to more accurate forecasts: in particular, of the timing and location of the most intense flood-producing storms.
The Chilbolton radar is operated and maintained by the Rutherford Appleton Laboratory. We are especially grateful to Darcy Ladd, Mal Clarke, and Alan Doo at the Chilbolton Observatory for their invaluable assistance with gathering the radar data. We acknowledge use of the Met Office Unified Model; the Met Office rainfall radar data; and the MONSooN system, which is a collaborative facility supplied under the Joint Weather and Climate Research Programme, which is a strategic partnership between the Met Office and the Natural Environment Research Council. The DYMECS project is funded by NERC (Grant NE/I009965/1).