In recent years, a growing partnership has emerged between the Met Office and the designated U.S. national centers for expertise in severe weather research and forecasting, that is, the National Oceanic and Atmospheric Administration (NOAA) National Severe Storms Laboratory (NSSL) and the NOAA Storm Prediction Center (SPC). The driving force behind this partnership is a compelling set of mutual interests related to predicting and understanding high-impact weather and using high-resolution numerical weather prediction models as foundational tools to explore these interests.
The forum for this collaborative activity is the NOAA Hazardous Weather Testbed, where annual Spring Forecasting Experiments (SFEs) are conducted by NSSL and SPC. For the last decade, NSSL and SPC have used these experiments to find ways that high-resolution models can help achieve greater success in the prediction of tornadoes, large hail, and damaging winds. Beginning in 2012, the Met Office became a contributing partner in annual SFEs, bringing complementary expertise in the use of convection-allowing models, derived in their case from a parallel decadelong effort to use these models to advance prediction of flash floods associated with heavy thunderstorms.
The collaboration between NSSL, SPC, and the Met Office has been enthusiastic and productive, driven by strong mutual interests at a grassroots level and generous institutional support from the parent government agencies. In this article, a historical background is provided, motivations for collaborative activities are emphasized, and preliminary results are highlighted.
The National Oceanic and Atmospheric Administration (NOAA) conducts atmospheric research and development (R&D) primarily within the Office of Oceanic and Atmospheric Research (OAR), with much of the R&D effort aimed at improving weather forecasts issued from a separate NOAA line office, the National Weather Service (NWS). Although they fall in different NOAA branches, the designated national centers for both research and forecasting of severe thunderstorms—OAR’s National Severe Storms Laboratory (NSSL) and NWS’s Storm Prediction Center (SPC)—are located side by side within the National Weather Center in Norman, Oklahoma. Scientists and forecasters from NSSL and SPC regularly engage in collaborative R&D as well as experimental forecasting activities to make continuous improvements in the products and services provided to the public and the larger weather enterprise (e.g., other government agencies, emergency managers, and broadcast media).
The Met Office functions much like the atmospheric component of NOAA, conducting R&D in the atmospheric sciences and providing operational forecasting services for citizens of the United Kingdom. As with NSSL and SPC, organized collaborations between the research and forecasting divisions of the Met Office have been critically important for informing researchers where new R&D is needed and for moving the fruits of that R&D into forecasting operations. As with their American counterparts, this collaborative strategy results in considerable benefits for the United Kingdom.
In recent years, severe weather experts from NOAA and the Met Office have found compelling reasons to combine efforts in improving forecasts of high-impact weather. The impetus for this enhanced collaboration has been a growing recognition on both sides of the Atlantic of the potential value of high-resolution numerical prediction systems as providers of guidance for forecasting high-impact weather events. Over the last decade the Met Office has very successfully introduced progressively higher-resolution operational numerical weather prediction (NWP) systems into forecasting operations, focusing on a unified modeling framework, while NOAA has used somewhat coarser resolution but simultaneously developed and implemented multiple high-resolution NWP dynamic cores. Both organizations have developed innovative postprocessing strategies and unique tools for visualization and verification of model output. Both have also worked with frontline forecasters to tailor model output for specialized forecast problems. Significantly, much of the development in the two organizations has been complementary rather than duplicative, providing extra incentive to combine efforts for mutual benefit.
The primary venue for recent NSSL, SPC, and Met Office interactions has been the annual Spring Forecasting Experiment (SFE), conducted in the NOAA Hazardous Weather Testbed (HWT). Specific objectives of the SFE change each year, but the general framework for this experiment is to conduct real-time forecasting exercises in a simulated operational forecasting environment, with experimental forecast products prepared jointly by operational forecasters and “bench” scientists. These exercises differ from forecasting operations in that the normal set of observations and model guidance used to inform key decisions is supplemented with new scientific concepts and guidance products that are under development within the research community (including SPC’s Science Support Branch). The utility of emerging techniques and guidance products is documented during this process and further assessment of these products is conducted during postforecasting evaluation exercises. This process accelerates the transfer of promising new ideas and technologies into operations and offers key insights for further development of others.
NSSL and SPC have been conducting the annual SFE since 2000 (Kain et al. 2003b; Kain et al. 2006; Clark et al. 2012), and the Met Office became a contributing partner beginning with the 2012 experiment. The benefits of this partnership have been substantial for all organizations and for the SFE. These benefits are discussed herein, beginning with a brief historical overview, followed by a summary of positive outcomes from assessing high-resolution model guidance products from NSSL and the Met Office, and a summary of the benefits of the strategic partnership between NSSL, the Met Office, and SPC.
A BRIEF HISTORY.
Collaborations between NSSL and Met Office scientists on operationally relevant research topics date back to at least the early 1990s, when short-range ensemble forecasting was a topic of mutual interest. In 1994, NSSL and the U.S. National Meteorological Center cohosted a workshop on this topic (Brooks et al. 1995) at which Met Office scientists David Richardson and Mike Harrison participated. Gil Ross from the Met Office also visited NSSL around this time to collaborate on research interests related to model verification and ensemble forecast systems (Harold Brooks, NSSL, 2015, personal communication).
NSSL also had ongoing collaborations with SPC in the late 1980s and early 1990s (e.g., Lewis et al. 1989; Johns and Doswell 1992), and the breadth of these interactions increased dramatically in 1997 when SPC relocated from Kansas City, Missouri, to a specifically designed workspace within NSSL facilities in Norman, Oklahoma. The proximity of SPC operations and support scientists invigorated NSSL research and that burst of activity included studies evaluating the use of ensemble forecast systems to improve the prediction of severe weather (Stensrud et al. 1999, 2000; Stensrud 2001; Stensrud and Weiss 2002; Bright et al. 2004), which provided a scientific foundation for the implementation of an operational short-range ensemble forecasting (SREF) system (Tracton et al. 1998; Du and Tracton 2001) at the Environmental Modeling Center (EMC), which, like SPC, is part of the NWS’s National Centers for Environmental Prediction (NCEP).
The Met Office started developing a nonhydrostatic version of the Unified Model (MetUM) in the mid-1990s (Davies et al. 2005), with the first operational forecasts from this model produced in 2003. In 2005, the first convection-allowing (also called convection-permitting) operational forecast was made using a 4-km configuration of the MetUM over the United Kingdom. This was followed in 2009 by a U.K.-wide model with variable resolution (UKV; Tang et al. 2013), having 1.5-km grid length in the interior stretching to 4 km at the boundaries; the UKV currently operates with a 3-h three-dimensional variational data assimilation (3D-Var) analysis cycle, which at the time of writing is about to include a 1-h four-dimensional variational data assimilation (4D-Var) analysis cycle for short-range forecasts (out to T + 12 h). Along with the deterministic UKV, the Met Office also uses the Met Office Global and Regional Ensemble Prediction System (MOGREPS; Bowler et al. 2008) to drive a 2.2-km downscaled ensemble over the United Kingdom called MOGREPS-UK (Golding et al. 2016). A more detailed review of the development and use of convection-allowing models (CAMs) within the Met Office can be found in Clark et al. (2016).
Within their jointly occupied facility, NSSL and SPC conducted the inaugural SFE in 2000. Mesoscale deterministic NWP provided the foundation for the annual experiment through 2002 (e.g., Kain et al. 2003a), but in 2003 SFE leaders shifted gears to focus on mesoscale ensemble forecast systems, such as EMC’s SREF (Bright et al. 2004; Homar et al. 2006). A key participant in the 2003 SFE was Ken Mylne, who had worked for 6 years as an operational forecaster in the Met Office before assuming responsibility for the Met Office’s ensemble forecasting team in 1999. Ken contributed his insights for a full week during the SFE (Fig. 1).
The SFE continued during 2004 and 2005, with a primary focus on experimental CAM forecasts that were generated by collaborators at EMC and the National Center for Atmospheric Research. After witnessing firsthand the potential impact of CAMs in the 2005 SFE, EMC scientists began generating daily contiguous United States (CONUS)-scale CAM forecasts for the SPC on an experimental basis: the first such effort in the United States and a harbinger of things to come. There was a 1-yr hiatus of the SFE in 2006 owing to the relocation of both NSSL and SPC to the National Weather Center building. During this period there was diminished formal interaction between NSSL/SPC and the Met Office, but the organizations conducted parallel research initiatives focusing on applications of CAMs for the prediction of localized high-impact weather. Innovative Met Office concepts in representing uncertainty and in verifying high-resolution model output (e.g., Roberts 2005; Roberts and Lean 2008) informed ongoing research at NSSL (e.g., Schwartz et al. 2010). Meanwhile, seemingly contradictory results related to resolution sensitivities (Kain et al. 2008; Schwartz et al. 2009; Roberts and Lean 2008; Lean et al. 2008) begged further investigation on both sides.
The 2007 SFE featured the first demonstration and evaluation of CAM-based ensembles, generated for the experiment by the University of Oklahoma Center for Analysis and Prediction of Storms (CAPS; see Xue et al. 2007). Ken Mylne returned for another week of participation to exchange knowledge about ensemble forecasting, both the convection-allowing and convection-parameterizing varieties, sharing his perspective on how ensemble systems should be configured, interpreted, and evaluated. His immersion in the SFE’s application of convection-allowing ensembles for severe convection forecasting influenced his decision to give added weight to the preexisting plans for the development of CAM-based ensemble systems for the Met Office. In 2009, another Met Office scientist, Nigel Roberts, visited the SFE for 1 week to explore further development of both deterministic and ensemble configurations of CAMs.
Over the next couple of years, Met Office, NSSL, CAPS, SPC, and EMC scientists continued to explore new forecast applications using their CAMs and they worked on the development and optimization of CAM-based ensemble systems. This resulted in the first use of a real-time year-round CAM ensemble by SPC forecasters in 2011 (Jirak et al. 2012), with membership derived from operational forecasts at EMC and semi-operational runs at NSSL. In addition, NSSL and SPC collaborators developed new diagnostic strategies and postprocessing routines that were specifically designed for CAMs and the severe weather forecasting challenges faced by the SPC (e.g., Sobash et al. 2011; Kain et al. 2010; Clark et al. 2012). These tools were migrated to SPC operations and were soon embraced by the SPC forecast staff.
Complementary knowledge bases and skill sets were being developed by NSSL/SPC and the Met Office during 2010/11, yet formal interactions between NOAA and the Met Office again waned. However, in 2012 the Met Office made a bold move to provide real synergy to the collaboration by assigning two top severe weather forecasters—Steve Willington and Dan Suri—to participate for a week in that year’s SFE and for the full 5 weeks of the experiment in 2013 (Fig. 2). In addition, beginning in 2013 the Met Office increased its investment by generating real-time CAM forecasts for the SFE, using operational versions of the MetUM, and executing the model forecasts on high-priority, continuously supported operational computing systems. Furthermore, they began assigning a broader group of scientists and forecasters to participate in the SFE, contributing personnel for every day of the experiment in 2014 and 2015. The MetUM CAMs have provided an important dataset for comparison with CAMs that have been run by NOAA for the SFE, and Met Office personnel have provided unique and valuable perspectives on forecasting and high-resolution NWP, no doubt elevating the positive impact of the experiment on the broader meteorological community.
POSITIVE EARLY OUTCOMES.
New diagnostic tools, visualization strategies, and related initiatives.
In their efforts to help operational forecasters extract the most useful severe weather guidance from high-resolution NWP models, NSSL and SPC have developed numerous visualization and postprocessing tools (e.g., Karstens et al. 2015). One positive outcome of Met Office interactions with NSSL and SPC has been the migration of diagnostic tools and visualization strategies into the MetUM diagnostic toolkit. For example, simulated radar reflectivity was one of the first—and most revealing—diagnostic output fields to emerge from early CAM testing in the United States (e.g., Koch et al. 2005), and it was quickly adopted by the operational forecasting community in this country. However, when Met Office participation in the SFE ramped up in 2013, simulated reflectivity was not available in the MetUM’s operational diagnostic package (although offline code existed so it could be generated from other MetUM output fields; e.g., see Stein et al. 2014). The pervasive use and obvious value of simulated reflectivity as a diagnostic tool in the SFE provided motivation for the Met Office to incorporate this field as a standard diagnostic in the MetUM, facilitating direct comparison with the U.S. models. These comparisons revealed previously undetected, higher-than-expected graupel concentrations associated with the operational MetUM microphysical parameterization. As a result, additional testing and research was conducted, eventually leading to important adjustments and improvements to the representation of microphysics in the MetUM. The Met Office is now routinely outputting the reflectivity fields from its U.K. NWP suite. Several MetUM users outside of the United Kingdom have also started to make use of the simulated reflectivity for verification and forecasting purposes.
Another widely used CAM diagnostic field, updraft helicity (Kain et al. 2008), has also become part of the output diagnostic suite in the MetUM. This field has proven to be valuable for identifying and tracking supercell-like features in high-resolution model forecasts, which is significant because, among all thunderstorms, supercells are associated with a disproportionate share of severe weather reports (e.g., Duda and Gallus 2010; Smith et al. 2012).
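As a rough illustration of how such a diagnostic can be computed, updraft helicity is commonly defined (following Kain et al. 2008) as the vertical integral of the product of vertical velocity and vertical vorticity over a midlevel layer. The sketch below assumes a 2–5-km layer above ground level and a single model column; the function name, argument names, and the trapezoidal integration are illustrative choices and are not taken from the MetUM or NSSL-WRF code.

```python
def updraft_helicity(z, w, zeta, z_bottom=2000.0, z_top=5000.0):
    """Integrate w * zeta over [z_bottom, z_top] for one model column.

    z    : level heights above ground (m), strictly increasing
    w    : vertical velocity at each level (m s-1)
    zeta : vertical vorticity at each level (s-1)
    The 2-5-km default layer is an assumption of this sketch.
    Returns updraft helicity in m2 s-2.
    """
    def interp(zq, z0, z1, f0, f1):
        # Linear interpolation of the integrand between two levels.
        return f0 + (f1 - f0) * (zq - z0) / (z1 - z0)

    total = 0.0
    for k in range(len(z) - 1):
        # Clip each model layer to the integration bounds.
        lo = max(z[k], z_bottom)
        hi = min(z[k + 1], z_top)
        if hi <= lo:
            continue  # layer lies entirely outside the bounds
        f0 = w[k] * zeta[k]
        f1 = w[k + 1] * zeta[k + 1]
        f_lo = interp(lo, z[k], z[k + 1], f0, f1)
        f_hi = interp(hi, z[k], z[k + 1], f0, f1)
        total += 0.5 * (f_lo + f_hi) * (hi - lo)  # trapezoidal rule
    return total


# Hypothetical column: a steady 10 m s-1 updraft with 0.01 s-1 vorticity.
heights = [0.0, 1500.0, 3500.0, 5500.0]
uh = updraft_helicity(heights, [10.0] * 4, [0.01] * 4)
```

In practice the calculation would be repeated at every grid column, and the hourly maximum of the resulting field is what forecasters typically inspect to track rotating storms.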
The HWT experience has also inspired complementary research initiatives within the Met Office. For example, Met Office scientists have simulated several high-profile tornado events from the 2013 season using a domain with 100-m grid spacing nested within the 2.2-km grid that they used in real time during the 2013 SFE. At 100-m grid length the MetUM is able to produce realistic supercells with tornado-like vortices (Hanley et al. 2016).
Comparison of model performance 2014–15.
CAM forecasts from numerous contributors were considered in the annual SFE during 2014 and 2015 (e.g., Jirak et al. 2014), but the focus here is on comparisons of those generated by NSSL and the Met Office. Scientists from both organizations initialized their CAMs daily at 0000 UTC, deriving initial and lateral boundary conditions by simply downscaling from coarser-resolution operational models. Specifically, the NSSL CAM was initialized using EMC’s 12-km North American Mesoscale Model (NAM) (Rogers et al. 2009), while initial conditions for MetUM CAM configurations were derived from operational global versions of this model, which had 17-km grid spacing during this time period.
The NSSL CAM was a 4-km configuration of the Advanced Research version of the Weather Research and Forecasting (WRF) Model (WRF-ARW; Skamarock et al. 2008) with 35 vertical levels (NSSL-WRF). It has been run year-round by NSSL in support of SPC operations since late 2006 (Kain et al. 2010), and the configuration was held constant during 2014 and 2015. The Met Office used multiple CAM configurations over these 2 years, with each configuration being a high-resolution, slightly modified version of the operational MetUM, including 70 vertical levels. Two different configurations were used in 2014. The first had 4.4-km grid spacing and was downscaled directly from the global MetUM, while the second had 2.2-km spacing and was nested within the coarser-resolution (4.4 km) domain. In 2015, three different configurations were used. Specifically, there were two 2.2-km versions with identical forecast domains and identical initial conditions derived from the global model, but one differed by using an experimental parameterization of partial cloudiness (labeled PC2 in graphics shown here). The third configuration used in 2015 had 1.1-km grid spacing and non-PC2 physics and was nested within the non-PC2 2.2-km grid, from which it drew its initial and lateral boundary conditions. The forecast domains are shown in Fig. 3. A pair of 24-h forecasts from the 2.2-km MetUM and 4-km NSSL-WRF models is shown in Fig. 4, indicating that both CAMs are capable of generating rainfall fields with a great deal of realism.
Subjective evaluations of radar reflectivity forecasts.
A staple of the SFE is the systematic subjective assessment of weather forecasts, both those from NWP models and those from human forecast teams (Kain et al. 2003a). In the 2014 SFE, this included a direct comparison of daily NSSL-WRF and 4.4-km MetUM forecasts of high-impact weather. Specifically, SFE forecast teams were asked to indicate (retrospectively) which CAM provided better guidance for the high-impact convective weather that occurred, that is, which model forecasts of convective storms corresponded better with observed radar. As shown in Fig. 5, the MetUM forecasts were judged to be superior 50% of the time, compared to 30% for the NSSL-WRF. Neither one was considered better than the other on 20% of the days. In addition to this simple either/or assessment, the teams were asked to note distinguishing characteristics of and differences in CAM forecasts. This information has proven to be very useful over the years for identifying and documenting systematic biases, strengths, and weaknesses of different modeling systems.
Objective verification of quantitative precipitation forecasts.
Precipitation forecasts from the CAMs were examined during the SFE on a daily basis, and after the experiment they were objectively verified in an aggregate sense, using precipitation observations from the Multi-Radar Multi-Sensor system developed by NSSL (Zhang et al. 2016; Smith et al. 2016). Results from three different metrics are shown here.
The first metric is the equitable threat score (ETS; also called the Gilbert skill score), which has been used in the United States as a bellwether metric for precipitation forecasts for decades (e.g., Olson et al. 1995; Baldwin and Kain 2006). It provides a measure of the degree of spatial overlap between forecasted and observed precipitation events. It is commonly agreed that, for CAM forecasts, it is useful to compute this score on a “neighborhood” basis, where exact gridpoint correspondence is relaxed, owing to inherently low predictability on the small scales of individual forecast grid points (Roberts and Lean 2008; Ebert 2009). Here, ETSs are measured by comparing CAM forecasts and observations at the grid point [radius of influence (ROI) of zero] and within a radius of 24 km, focusing on the 2014 results. Not surprisingly, the scores are consistently higher with the nonzero radius (Figs. 6a–c), regardless of precipitation threshold. Another obvious systematic difference is higher ETSs for the MetUM during the first 12 h of the 36-h forecasts, both for the 4.4- and 2.2-km configurations. The ETSs are quite similar for all CAMs during the 12–36-h period of the forecasts.
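For readers unfamiliar with the score, the gridpoint (ROI of zero) form of the ETS can be sketched from a standard 2 × 2 contingency table of hits, misses, false alarms, and correct negatives. The example below is a minimal illustration only: the function names and sample values are hypothetical, the fields are flattened to simple lists, and the neighborhood variant used in the SFE verification is not reproduced here.

```python
def contingency_counts(forecast, observed, threshold):
    """Count hits, misses, false alarms, and correct negatives
    for paired forecast/observed values at one threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_event = f >= threshold
        o_event = o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def equitable_threat_score(forecast, observed, threshold):
    """Gridpoint ETS (Gilbert skill score): threat score adjusted
    for the number of hits expected by chance."""
    hits, misses, false_alarms, correct_negatives = contingency_counts(
        forecast, observed, threshold
    )
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        return float("nan")
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return float("nan")  # score is undefined, e.g. no events at all
    return (hits - hits_random) / denominator


# Hypothetical precipitation amounts (mm) at ten grid points.
fcst = [12, 15, 11, 0, 0, 0, 0, 0, 0, 0]
obs = [13, 20, 0, 14, 0, 0, 0, 0, 0, 0]
ets = equitable_threat_score(fcst, obs, threshold=10)
```

A perfect forecast scores 1, and a forecast with no more hits than expected by chance scores 0, which is why small displacement errors in CAM output are penalized so heavily when no neighborhood is applied.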
The second metric is frequency bias (commonly referred to as simply the bias), which is often used in combination with ETS. The bias is a ratio of total areal forecast coverage to observed coverage, without consideration of location. With a perfect bias being 1, it is evident that both of the MetUM CAMs tend to overestimate the areal coverage of precipitation for the entire forecast period, but the high bias appears to be more pronounced at higher precipitation thresholds and particularly during the first 12 h (Figs. 6d–f). These scores are from 2014, but the same tendencies were noted in 2015 (bias scores from 2015 are not shown).
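Because the bias ignores location, it reduces to a ratio of two counts. A minimal sketch on a regular grid, where equal grid-cell areas allow coverage to be expressed as a number of grid points, might look as follows; the function name and sample values are hypothetical.

```python
def frequency_bias(forecast, observed, threshold):
    """Ratio of forecast to observed event coverage at one threshold.

    Values above 1 indicate overforecasting of areal coverage,
    values below 1 indicate underforecasting.
    """
    n_forecast = sum(1 for f in forecast if f >= threshold)
    n_observed = sum(1 for o in observed if o >= threshold)
    if n_observed == 0:
        return float("nan")  # undefined when the event was never observed
    return n_forecast / n_observed


# Hypothetical precipitation amounts (mm) at six grid points.
fcst = [0, 5, 12, 30, 18, 0]
obs = [0, 0, 14, 25, 0, 0]
bias = frequency_bias(fcst, obs, threshold=10)
```

Here three forecast points and two observed points exceed the threshold, so the forecast covers 1.5 times the observed area.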
The third metric used here is the fractions skill score (FSS; Roberts and Lean 2008), which provides information about the degree of correspondence between forecast and observed features as a function of spatial scale. Focusing now on 2015 results, Fig. 7 (left) shows that FSSs measured using a 16 mm (24 h)−1 accumulation threshold improve when assessed over increasingly larger scales. The MetUM scores are somewhat higher than the NSSL-WRF on all scales. For a higher threshold of 64 mm (24 h)−1, the NSSL-WRF outscores the MetUM at all but the smallest scales. The relatively low FSS from the MetUM for larger scales and higher accumulation thresholds may reflect the model’s high bias at these thresholds, as high bias tends to have a negative impact on FSSs, especially on larger scales (Mittermaier and Roberts 2010).
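The scale dependence of the FSS can be illustrated with a short sketch. Following the general form given by Roberts and Lean (2008), the forecast and observed fields are converted to binary exceedance fields, the fraction of exceedance points is computed within a square neighborhood around each grid point, and the score is one minus the ratio of the mean squared difference of the fractions to its largest possible value. The function names, the brute-force neighborhood loop, and the treatment of points outside the domain as zero are assumptions of this sketch rather than details of the SFE verification code.

```python
def neighborhood_fractions(binary, n):
    """Fraction of event points in the n x n square centred on each
    grid point (n odd). Points outside the domain count as non-events."""
    rows, cols = len(binary), len(binary[0])
    half = n // 2
    fractions = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        total += binary[ii][jj]
            fractions[i][j] = total / (n * n)
    return fractions


def fractions_skill_score(forecast, observed, threshold, n):
    """FSS for 2D fields at one threshold and neighborhood width n."""
    f_bin = [[1 if v >= threshold else 0 for v in row] for row in forecast]
    o_bin = [[1 if v >= threshold else 0 for v in row] for row in observed]
    pf = neighborhood_fractions(f_bin, n)
    po = neighborhood_fractions(o_bin, n)
    mse = sum((a - b) ** 2 for rf, ro in zip(pf, po) for a, b in zip(rf, ro))
    mse_ref = sum(a * a for row in pf for a in row) + sum(
        b * b for row in po for b in row
    )
    if mse_ref == 0:
        return float("nan")  # no events in either field
    return 1.0 - mse / mse_ref


# Hypothetical 5 x 5 fields (mm per 24 h): one heavy-rain point,
# forecast one grid length east of where it was observed.
obs = [[0.0] * 5 for _ in range(5)]
fcst = [[0.0] * 5 for _ in range(5)]
obs[2][2] = 20.0
fcst[2][3] = 20.0
fss_gridpoint = fractions_skill_score(fcst, obs, threshold=16.0, n=1)
fss_3x3 = fractions_skill_score(fcst, obs, threshold=16.0, n=3)
```

In this toy case the displaced rain cell earns no credit at the grid scale, whereas the 3 × 3 neighborhood score is well above zero, mirroring the improvement with spatial scale seen in Fig. 7.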
In combination, these ETS, FSS, and bias scores reveal important systematic differences in model performance. For example, both of the MetUM configurations appear to “spin up” precipitation at the start of the forecast period much more quickly than the NSSL-WRF does. Yet, this more timely adjustment appears to come at a cost: overprediction of precipitation coverage. It seems likely that these systematic biases result from some combination of deficiencies in model physics and initial conditions. Whatever the source, it is important to understand the physical reasons for these systematic differences and find an optimal balance between rapid spinup and appropriate coverage/intensity.
Further objective information can be obtained by examining the characteristics of the rainfall cells (Fig. 8). The MetUM 2.2- and 1.1-km CAMs and NSSL-WRF 4-km CAM all produce too many small cells, and the NSSL-WRF tends to produce too few larger cells (diameter > 250 km). The NSSL-WRF 4-km model produces too many cells of all intensities, whereas the MetUM models are closer in number for the very light showers but have noticeably too many very heavy rain cells even beyond values measured by radar. Once again this provides more insight into the differing behavior of the models, tying in with the findings from the verification scores, and is useful when delving deeper into the physical processes responsible.
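The cell-identification method behind Fig. 8 is not described here, but one simple way to obtain comparable statistics is to label contiguous regions of rainfall above a threshold and record the size and peak intensity of each. The sketch below takes that approach under stated assumptions: four-point connectivity, a uniform grid length, and an equivalent-circle diameter are all illustrative choices, and the function name and sample field are hypothetical.

```python
import math
from collections import deque


def find_cells(rain, threshold, dx_km):
    """Label contiguous regions with rain >= threshold (4-point
    connectivity) and return area, equivalent diameter, and peak."""
    rows, cols = len(rain), len(rain[0])
    seen = [[False] * cols for _ in range(rows)]
    cells = []
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or rain[i][j] < threshold:
                continue
            # Breadth-first flood fill from an unvisited rainy point.
            queue = deque([(i, j)])
            seen[i][j] = True
            count = 0
            peak = rain[i][j]
            while queue:
                a, b = queue.popleft()
                count += 1
                peak = max(peak, rain[a][b])
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    na, nb = a + da, b + db
                    if (
                        0 <= na < rows
                        and 0 <= nb < cols
                        and not seen[na][nb]
                        and rain[na][nb] >= threshold
                    ):
                        seen[na][nb] = True
                        queue.append((na, nb))
            area = count * dx_km * dx_km
            cells.append(
                {
                    "area_km2": area,
                    # Diameter of a circle with the same area.
                    "diameter_km": 2.0 * math.sqrt(area / math.pi),
                    "peak": peak,
                }
            )
    return cells


# Hypothetical rain-rate field on a 4-km grid containing two cells.
field = [
    [0, 0, 0, 0, 0],
    [0, 5, 6, 7, 0],
    [0, 0, 0, 0, 9],
    [0, 0, 0, 0, 8],
]
cells = find_cells(field, threshold=1, dx_km=4.0)
```

Histograms of the resulting diameters and peak values, built separately for each model and for the radar observations, would give the kind of comparison summarized above.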
Severe weather forecasters rely heavily on soundings from raobs and from model forecast soundings [point forecast soundings (PFCs)] to help them diagnose and predict the likelihood of initiation, potential intensity, mode of organization, and likelihood of hazards such as tornadoes, large hail, high winds, and/or heavy rain associated with deep convection (e.g., Baldwin et al. 2002; Coniglio et al. 2013). Consequently, corresponding PFCs from the NSSL-WRF and MetUM were compared during the 2014 SFE. A glaring difference was revealed in this comparison, particularly in environments with a convective boundary layer topped by a sharp, stable transition to an elevated mixed layer (commonly referred to as a capping inversion). The NSSL-WRF tended to provide an overly smooth depiction of this capping inversion or convective inhibition (CIN) layer (e.g., Fig. 9a), while the MetUM often represented this with noticeably greater fidelity (e.g., Fig. 9b). This difference is especially important because the structure of the CIN layer is believed to play an important role in modulating the timing, location, and incidence of convective initiation. The undesirable tendency for the NSSL-WRF to smooth out the vertical structure is the subject of ongoing investigation. Early testing suggests that this tendency cannot be avoided by simply increasing the vertical resolution of the model.
The NOAA HWT provides a stimulating environment that cultivates collaborative activities between meteorological researchers and practitioners. For more than a decade this test bed has provided an environment in which individuals from these groups of professionals have learned to appreciate the insights, skills, and efficacy of the other, enhancing collaborative relationships and resulting in operational forecasting practices that more quickly adapt to emerging technologies and scientific advances while better informing operationally relevant research initiatives. The enhancement of complementary research-to-operations (R2O) and operations-to-research (O2R) interactions is a core mission of NOAA test beds (Ralph et al. 2013).
The underlying enthusiasm for activities in the HWT is fueled by a compelling mutual interest in applied research related to prediction of high-impact weather. In the HWT’s annual SFE, this grassroots enthusiasm has its foundation in interactions between NOAA’s national centers for expertise in severe weather research (NSSL) and forecasting (SPC), with additional contributions from other domestic agencies. In recent years, the SFE has been invigorated by contributions from the Met Office, which in 2013 initiated a strong commitment to the SFE that significantly expanded international participation and provided complementary insight and technical skills in both forecasting and research. Specifically, the Met Office brought expertise gained from its efforts using CAMs to better represent the convective storms that bring flash flooding in the United Kingdom. The infusion of Met Office models and perspectives dovetailed exceptionally well with the rapidly growing NSSL and SPC proficiency in using CAMs to help predict tornadoes, large hail, and damaging winds.
The challenges associated with prediction of these and other high-impact weather phenomena are too great to be addressed entirely by any single institution. The successful collaborative efforts of the HWT, NSSL, SPC, and Met Office are demonstrating that international collaboration can provide synergy, efficiency, and important scientific advances when it is strongly supported at both grassroots and institutional levels.
The annual SFE, conducted in the NOAA HWT, has benefited for many years from the generous support of the NSSL, the SPC, and the University of Oklahoma Cooperative Institute for Mesoscale Meteorological Studies (CIMMS), including the technical support and administrative support staffs from these organizations. In addition, the NOAA/OAR/Office of Weather and Air Quality has dramatically increased its support of the HWT in recent years, with considerable benefit to the organization, facilities, and experimental activities. Authors Knopfmeier and Karstens were funded by NOAA/Office of Oceanic and Atmospheric Research under NOAA–University of Oklahoma Cooperative Agreement NA11OAR4320072, U.S. Department of Commerce. The statements, findings, conclusions, and recommendations are those of the authors and do not necessarily reflect the views of NOAA or the U.S. Department of Commerce.
Since 2012 the SFE has received additional benefits from the generous support of the following Met Office teams: Forecasting and Service Delivery, with particular thanks to Iain Forsyth (executive head), Paul Davies (executive head forecasting), and Martin Cumper (strategic resources manager); Public Weather Service, with particular thanks to Derrick Ryall (executive head) and Becky Moore (international defense); Weather Science, with particular thanks to Dale Barker (deputy director), Ken Mylne (head, numerical modelling), Clive Wilson (manager mesoscale model development), Adrian Semple (forecast monitoring project leader), and Tom Blackmore (satellite applications research scientist); Foundation Science, with particular thanks to Douglas Boyd (senior scientist, science partnerships); Simon Vosper and Adrian Lock (atmospheric processes and parameterization); and Technology and Information Services, with particular thanks to Dave Robinson (operational NWP suite team leader) and Siân Collins (project manager).
Collaborations between the Met Office and NSSL are facilitated by a formal memorandum of agreement between these two organizations, dated 28 February 2014.
Many thanks to Harold Brooks of NSSL and four anonymous reviewers for their helpful and constructive comments on earlier versions of this manuscript.