Abstract

A wide range of numerical weather prediction (NWP) innovations are under development in the research community that have the potential to positively impact operational models. The Developmental Testbed Center (DTC) helps facilitate the transition of these innovations from research to operations (R2O). With the large number of innovations available in the research community, it is critical to clearly define a testing protocol to streamline the R2O process. The DTC has defined such a process that relies on shared responsibilities of the researchers, the DTC, and operational centers to test promising new NWP advancements. As part of the first stage of this process, the DTC instituted the mesoscale model evaluation testbed (MMET), which established a common testing framework to assist the research community in demonstrating the merits of developments. The ability to compare performance across innovations for critical cases provides a mechanism for selecting the most promising capabilities for further testing. If the researcher demonstrates improved results using MMET, then the innovation may be considered for the second stage of comprehensive testing and evaluation (T&E) prior to entering the final stage of preimplementation T&E.

MMET provides initialization and observation datasets for several case studies and multiday periods. In addition, the DTC provides baseline results for select operational configurations that use the Advanced Research version of the Weather Research and Forecasting Model (ARW) or the National Oceanic and Atmospheric Administration (NOAA) Environmental Modeling System Nonhydrostatic Multiscale Model on the B grid (NEMS-NMMB). These baselines can be used for testing sensitivities to different model versions or configurations in order to improve forecast performance.

The MMET helps streamline the path to operational implementation for promising NWP innovations developed in the research community.

The “valley of death” is a term frequently used to describe the seemingly arduous process of transferring technology from research to operations (R2O). The core mission of the Developmental Testbed Center (DTC; Bernardet et al. 2008) is to bridge the valley by facilitating interaction between the research and operational sectors of the numerical weather prediction (NWP) community in pursuit of common goals. To accomplish this mission, focus is placed on two key aspects: 1) advancing science research by providing access to, and community support for, software components of operational NWP systems to the research community and 2) extensive testing and evaluation (T&E) of promising innovations emerging from the research community in a common framework to demonstrate the potential benefits for operational systems. The ultimate objective of these specific activities is to facilitate the interaction and transition of NWP technology between research and operations in order to accelerate the rate at which innovations are infused into operational NWP, promoting an overall improvement in forecast skill.

Known shortfalls within operational models are often targeted by developers as focus areas with an end goal of improving operational forecasts. A wide range of NWP innovations are under development in the research community [a small subset of examples include Pleim (2007), Iacono et al. (2008), Morrison et al. (2009), Angevine et al. (2010), Mansell et al. (2010), Niu et al. (2011), Jiménez and Dudhia (2012, 2013), Grell and Freitas (2014), and Thompson and Eidhammer (2014)] that have the potential to positively impact operational NWP models. With this large number of new techniques, a testing protocol detailing the procedures necessary to advance innovations through the R2O process is imperative. To address this need, a three-stage process to test promising new NWP techniques was first discussed at the “NWP Workshop on Model Physics with an Emphasis on Short-Range Prediction” held in Camp Springs, Maryland, during the summer of 2011 (Wolff et al. 2012). As described in the workshop summary, the three-stage process of R2O transition starts with 1) a proving ground for the research community to demonstrate promising results with a new technique on a limited number of cases, then moves into 2) extensive testing and evaluation performed at the DTC to further quantify impact on a broad range of cases, and ends with 3) preimplementation testing at an operational center prior to being fully integrated into the operational system. Ultimately, engaging the research community in the transition of research capabilities into operations needs to be streamlined to make it not only efficient but also effective. In addition, it is critical to foster an environment of active development and testing with open communication of results among the three participating partners of the transition process: researchers, the DTC, and operational centers.

The 2011 workshop also highlighted the need for establishing and maintaining a common dataset and testing framework to be used by the research community and model developers for testing innovations. The mesoscale model evaluation testbed (MMET; www.dtcenter.org/eval/meso_mod/mmet) addresses this need as part of the first stage of the R2O transition. This paper describes the MMET facility and, to promote broader participation, illustrates how one particular innovation progressed through the R2O process. The research community is encouraged to take advantage of the framework provided by MMET during their testing to efficiently and effectively demonstrate the merits of a new development that could positively impact future NWP operations. In addition, MMET provides the opportunity to perform diagnostic investigations into specific cases to advance the understanding of why a particular configuration did or did not perform well.

MESOSCALE MODEL EVALUATION TESTBED.

Background.

During development of new techniques or components in NWP systems, researchers generally perform case studies in order to delve further into the complex interactions of nonlinear systems. Detailed case studies allow the researcher to leverage existing familiarity with a particular event to assist in the early stages of development and advancement of methods and techniques. MMET was established by the DTC to assist the research community with testing conducted in this initial stage of the R2O transition process by providing a common framework and data for several case studies of interest.

A number of routine and high-impact cases are available in MMET. The current set of cases was selected through collaboration with operational centers [i.e., Environmental Modeling Center (EMC) and Storm Prediction Center (SPC)] and R2O organizations [i.e., Hydrometeorology Testbed (HMT)] to ensure the cases provided to the community are of high interest and address a specific operational need. Currently, there are 14 MMET cases (Table 1) available to the community and recommendations for new cases are readily accepted for consideration (www.dtcenter.org/eval/meso_mod/mmet/cases/form_submission.php). In addition to single-case events, longer time periods to further investigate a new technique are also included. It is recommended that researchers run several case studies spanning multiple weather regimes to illustrate the versatility of the innovation for operational use.

Table 1.

MMET case list.

In general, the end-to-end system utilized for testing innovations includes model initialization, either directly from an operational analysis by running fields through the appropriate preprocessor or using a data assimilation system, the model itself, postprocessing, verification, and graphics generation. Specifically, the software components used within MMET include either the Weather Research and Forecasting (WRF) Preprocessing System (WPS) or the Nonhydrostatic Multiscale Model on the B grid (NMMB) Preprocessing System (NPS), the Gridpoint Statistical Interpolation (GSI) data assimilation system, either the Advanced Research version of WRF (ARW) or the National Oceanic and Atmospheric Administration (NOAA) Environmental Modeling System NMMB (NEMS-NMMB), the Unified Post Processor (UPP), the Model Evaluation Tools (MET) for verification, and the National Center for Atmospheric Research (NCAR) Command Language (NCL) for model output visualization. The code base for each of the components used to generate the DTC-provided baselines in MMET is freely distributed and supported to the user community (www.dtcenter.org/code). In general, computational resources are not provided to run MMET cases; however, proposals for projects utilizing cases available through MMET are encouraged for the DTC Visitor Program (www.dtcenter.org/visitors/), which provides researchers with financial and computational resources to test new forecasting and verification techniques, model components for NWP systems, and data assimilation approaches.

Datasets of opportunity.

Data available in MMET from each step in the end-to-end process make up the so-called datasets of opportunity, which are housed at the DTC and are accessible to the community at large via a web interface (http://dtcenter.org/repository). Figure 1 is a graphical representation of MMET-provided model initialization datasets, configuration files, scripts, observation datasets, and output datasets. The Repository for Archiving, Managing, and Accessing Diverse Data (RAMADDA), which was developed by Unidata (Sherretz and Fulker 1988; Fulker et al. 1997), is used as the foundation for hosting and distributing the data. The datasets can be utilized by the research community to test and evaluate newly developed innovations that address forecast needs in areas that may include but are not limited to hydrology, severe weather, aviation, energy, ground transportation, air quality, and fire weather. Through the common framework provided by MMET, researchers have the ability to perform direct comparisons between multiple innovations tested by the research community and/or against the baseline results established by the DTC. Researchers are encouraged to collaborate with the DTC by communicating and sharing their results with the community at large via the MMET data repository.

Fig. 1.

Diagram illustrating the MMET datasets of opportunity available to the user community from each step of the end-to-end process, including configuration files (orange), scripts (gray), input datasets (green), and output datasets (blue).

The MMET data repository provides four options for initial conditions (ICs) and lateral boundary conditions (LBCs). The baseline forecasts established by the DTC currently use ICs/LBCs derived from the North American Mesoscale (NAM) model. The other available IC/LBC options include the Global Forecast System (GFS), the Rapid Refresh (RAP), and the High-Resolution Rapid Refresh (HRRR) models. Data are available for the initialization time(s) for each of the case studies and include the analysis and forecast files (3-h increments for NAM and GFS and hourly for the RAP and HRRR); RAP and HRRR are available for all MMET cases after the respective operational implementation of those systems. In addition, WPS and NPS configuration files and output from the final step of the preprocessing component are distributed.

Another component of the initialization datasets available on the data server includes the necessary files to run the GSI data assimilation system. A script to run GSI, which includes the autogeneration of the GSI namelist, is available along with select fixed files needed to properly run the system (e.g., background error covariance files, observation data control files for conventional and satellite observations, and satellite bias correction files). Through MMET, the user community has access to the North American Data Assimilation System (NDAS) observation files in Binary Universal Form for the Representation of Meteorological Data (BUFR) format, which include conventional data as well as satellite radiance data; both the ARW and NEMS-NMMB baselines will be configured to assimilate the NDAS files as MMET evolves. In addition, RAP observation files in BUFR format are available for all cases that occurred after its operational implementation.

Baseline results for each of the MMET cases have been established by the DTC using select model configurations based on operational settings (e.g., physics and dynamics options) within the ARW and NEMS-NMMB NWP systems (Table 2). Both models are configurable and offer a variety of options for testing and tuning; the configuration files used for the MMET baselines are available for download. All MMET baseline forecasts were run on a 12-km grid (15-km grid for versions prior to 3.7 for ARW and 0.9 for NMMB) that covers the contiguous United States (CONUS; Fig. 2) with select cases including a 3-km nested grid over the main area of interest for that particular event. The CONUS domain covers complex terrain, plains, and coastal regions, which represent diverse regional effects for worldwide comparability. For MMET cases that consist of a single initialization, forecasts were run out to 84 h; for multiday case studies, forecasts were initialized twice per day at 0000 and 1200 UTC and run out to 48 h.

Table 2.

Physics suites based on operational configurations used for baseline configurations.

Fig. 2.

Map depicting the standard 12-km parent domain (d01) and an example of a DTC-defined 3-km nested domain (d02; used for the 3 Feb 2012 snowstorm case).

For postprocessing, software components available in UPP are used to destagger fields from the native model grid to an unstaggered Arakawa A grid, generate derived meteorological variables, and vertically interpolate fields to isobaric levels. The postprocessed files created include two- and three-dimensional fields on constant pressure levels, both of which are required by the plotting and verification programs. Sample configuration files and an example run script are provided on the MMET data server to assist with the postprocessing step.

Output from UPP is used to create graphics of model fields for a number of surface and upper-air variables (e.g., surface temperature and 500-hPa wind speed and direction) as well as model-derived quantities [e.g., convective available potential energy (CAPE) and simulated radar reflectivity]. Graphics are generated for every forecast in 3-h increments for the duration of the forecast period; all plots, as well as the associated NCL scripts used by the DTC, are disseminated in the data repository. In addition to the standard NCL scripts available in MMET, new diagnostic tools are readily accepted for contribution and distribution to the community at large; example contributions to date include plots of vertical cross sections, integrated water vapor transport, and radiation fields. While objective verification statistics are a vital component of the evaluation, the graphics are also an excellent diagnostic tool to monitor model performance throughout an event and subjectively compare DTC baselines to an experimental run that includes an innovation.

The research community is strongly encouraged to use MET (Fowler et al. 2010; code and documentation are available online at www.dtcenter.org/met/users/metoverview/index.php) when performing objective verification to be consistent with both the DTC and other MMET users; the use of a common verification package further streamlines the R2O transition. The DTC advocates the communication of objective verification results produced by the research community in a way that is concise and allows for the value of innovations to be readily assessed. A major component to aid this effort is providing a common set of observation data for all community members to use in the verification steps. For grid-to-point verification, which is utilized to compare gridded surface and upper-air model data to point observations, NDAS conventional observation files are provided on the MMET data server. National Centers for Environmental Prediction (NCEP) stage II precipitation datasets are available for the 3-h precipitation accumulation grid-to-grid verification, while either the Climate Prediction Center (CPC) or stage II analyses are available for 24-h accumulations, depending on the date of the case. The DTC uses these observation datasets to compute objective verification for the DTC-generated baselines that are made available to the community, including output files from the statistical analysis stage of MET and plots of the aggregated verification statistics. For ease of use, scripts to run MET, along with template configuration files, are provided for each of the MMET cases for efficient generation of objective traditional and scale-appropriate spatial verification statistics for tests conducted by the researchers.

The results from DTC-established baseline forecasts can be used by the research community to compare sensitivities to different physical and dynamical configurations in order to promote improved forecast performance. The end-to-end environment in operational systems is more complex than what is used in MMET; however, a direct comparison between MMET baseline results and user-contributed results is meaningful if conducted in an equivalent manner. Testing innovations by initializing forecasts using operational model analyses that include advanced data assimilation methods is often beneficial in model development. It is also recognized that the addition of data assimilation within a test is a critical component to improve forecast accuracy. MMET will continue to evolve, and additional baselines using data assimilation (GSI) will be established for each case so that sensitivities to this aspect may also be investigated. While many of these software components are used in operational systems, at this stage MMET baselines are not required to run in a functionally similar operational environment (FSOE)—an environment that attempts to replicate operational systems in order to reproduce operational forecasts (e.g., GSI is not required).

Sharing objective verification output for innovation tests with the DTC provides an avenue for direct comparisons with other established results. Demonstrated improvement of forecast accuracy through objective verification during the first stage of testing may lead to the innovation being considered for the second stage of testing, where the natural progression leads to longer test periods in an FSOE.

Stages of the R2O process.

To ensure that promising innovations transition effectively through the R2O process, multiple stages of testing must be conducted to assess the merits of a particular advancement. Recent testing performed at all three stages is illustrated in the following section to highlight the progression of one innovation through the R2O process.

Stage I: Initial testing utilizing MMET

Identifying operational model shortfalls and conducting initial case study investigations with innovations targeted at addressing those deficiencies is a critical first step in the path to improving operational forecasts. One such long-standing, persistent problem seen in several operational models (Guan et al. 2015) is the daytime surface temperature bias: a consistent warm bias is seen in the summer and a cold bias in the winter, especially over areas of snow cover. MMET provides an excellent platform to conduct initial testing to investigate this operational model shortfall.

The Thompson microphysics scheme (Thompson et al. 2008) was recently ported into the NEMS-NMMB code base, allowing for the opportunity to replace the NAM operational scheme (Ferrier–Aligo microphysics; Aligo et al. 2014) and assess the impact on, among other things, surface temperature forecasts. When using the Ferrier–Aligo microphysics with the updated Rapid Radiative Transfer Model for General Circulation Models (RRTMG; Iacono et al. 2008), the internally calculated effective radius of cloud water in RRTMG is consistent with Ferrier–Aligo microphysics values, while a revised effective radius is defined for a combined ice category. In contrast, a more direct coupling between the Thompson microphysics scheme and RRTMG is employed, where the Thompson microphysics passes explicitly computed cloud water, cloud ice, and snow effective radii to the RRTMG radiation scheme. Through this explicit direct coupling, the potential for improved representation of cloudy areas, along with associated radiative properties (e.g., absorption, scattering), exists and may produce different physical responses in surface temperature. Select results from testing the NAM operational physics suite (NAMps) and the Thompson microphysics (THOMps) configurations (Table 2) for two MMET cases [one summer (0000 UTC 11 September 2013) and one winter (0000 UTC 3 February 2012)] are shown to highlight the types of objective baseline verification statistics available in MMET and how they can be used to assess the impact of an innovation.

Traditional verification statistics (Jolliffe and Stephenson 2011; Wilks 2011) are computed for surface (including precipitation) and upper-air variables for both the 12-km CONUS and 3-km nested domains, aggregated over several verification regions, and provided through MMET. Metrics calculated for both surface and upper-air temperature, dewpoint temperature, and wind speed include mean error (bias) and bias-corrected root-mean-square error (BCRMSE). For upper-air results, verification statistics are computed for times valid at 0000 and 1200 UTC for the mandatory levels (not shown), and surface verification results start at the 3-h lead time and go out the length of the forecast in 3-h increments. Typically, the focus of case studies is on improving the forecast(s) for a particular event or parameter (in this case, surface temperature); however, the baseline results available through MMET also provide a way of ensuring that the overall forecast performance is not degraded when testing an innovation. While the surface dewpoint temperature and wind speed biases differ little between the two configurations, the THOMps configuration does impact the overall surface temperature bias for both the summer (Fig. 3a) and winter (Fig. 3b) cases across the 12-km parent domain. Overall, the THOMps configuration is colder than the NAMps configuration, leading to an improvement in the surface temperature bias in the summer and degradation in the winter.

In addition to the time series plots of temperature, dewpoint temperature, and wind speed, biases were calculated at each observation station over the CONUS domain as a means to spatially assess the forecast performance of the configuration. An example of this type of plot is provided to highlight the spatially coherent areas of significant warm and cold biases for both the NAMps (Fig. 4a) and THOMps (Fig. 4b) configurations for the February 2012 case. The THOMps configuration is generally colder CONUS-wide for the winter case; however, a large difference between the two configurations is immediately evident across Texas. Looking further at the differences in downward shortwave (SW) radiation at the surface (Fig. 4c), there is a distinct signal that the THOMps configuration had more extensive cloudiness across Texas (i.e., NAMps had larger values of downward shortwave radiation at the surface compared to THOMps), accounting for the significant cold bias in that region at that particular time of the event.
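
As a minimal sketch of the two continuous metrics named above, the following Python computes mean error (bias) and BCRMSE from matched forecast-observation pairs. It assumes the common definition of BCRMSE as the square root of the mean-square error minus the squared bias; the function names and the sample temperature values are hypothetical and are not taken from MMET or MET.

```python
import math


def bias(forecasts, observations):
    """Mean error: average of (forecast - observation) over matched pairs."""
    errors = [f - o for f, o in zip(forecasts, observations)]
    return sum(errors) / len(errors)


def bcrmse(forecasts, observations):
    """Bias-corrected RMSE, assumed here to be sqrt(MSE - bias**2)."""
    errors = [f - o for f, o in zip(forecasts, observations)]
    mean_error = sum(errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mse - mean_error ** 2, 0.0))


# Hypothetical surface temperature pairs (degC) at four stations, one lead time.
fcst = [1.0, 3.0, 5.0, 7.0]
obs = [2.0, 3.0, 7.0, 8.0]

temperature_bias = bias(fcst, obs)      # negative value: forecast is too cold
temperature_bcrmse = bcrmse(fcst, obs)  # error spread once the bias is removed
```

Separating the two quantities in this way lets a systematic offset, such as the cold shift discussed above, be assessed independently of the remaining random error.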

Fig. 3.

Time series plot of mean error (bias) for surface temperature (°C; red), dewpoint temperature (°C; blue), and wind speed (m s−1; black) aggregated over the 12-km parent domain for the NAMps (solid) and THOMps (dashed) for a single initialization from the (a) Sep 2013 flood and (b) Feb 2012 snowstorm cases.

Fig. 4.

Spatial plot of surface temperature (°C) mean error (bias) by observation station for the (a) NAMps and (b) THOMps, along with the (c) difference (NAMps minus THOMps) in shortwave radiation reaching the surface between the two configurations. All plots are for the 12-km parent domain at the 18-h forecast lead time from the 0000 UTC 3 Feb 2012 case.

For precipitation verification, accumulation periods of 3 and 24 h are evaluated for a number of precipitation thresholds and include the traditional quantitative precipitation forecast (QPF) verification metrics of frequency bias and Gilbert skill score (GSS). Assessing 3-h accumulated precipitation for the February 2012 case using GSS, the NAMps and THOMps configurations tend to perform similarly for the 3-km nested domain (Fig. 5). The largest differences are seen at the smallest thresholds for the 48-h forecast lead time, where the NAMps configuration has higher GSS values. In addition to the traditional verification metrics provided, the fractions skill score (FSS; Roberts and Lean 2008) is calculated for accumulated precipitation and composite radar reflectivity over appropriate thresholds and neighborhood sizes; this neighborhood verification approach provides an assessment of the scale at which the forecast becomes skillful. In the example shown here, an increase in skill with larger spatial scale is noted for reflectivity values greater than or equal to 20 dBZ for the winter case, with the THOMps configuration typically having higher FSS values (Fig. 6). The inclusion of scale-appropriate spatial verification is necessary to properly assess forecast skill at finer scales, where the so-called double penalty becomes significant (e.g., Wolff et al. 2014): when a feature is forecast with a small offset in placement, traditional verification methods penalize a high-resolution model twice, once for missing the event and once for a false alarm.
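
The two categorical QPF metrics can be sketched from a 2 x 2 contingency table, as in the Python below. The formulas are the standard textbook definitions (frequency bias as forecast events over observed events; GSS as the threat score adjusted for hits expected by chance); the function names and the sample precipitation amounts are hypothetical and do not come from MMET.

```python
def contingency_counts(forecasts, observations, threshold):
    """Count hits, misses, false alarms, and correct negatives at a threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecasts, observations):
        forecast_yes = f >= threshold
        observed_yes = o >= threshold
        if forecast_yes and observed_yes:
            hits += 1
        elif observed_yes:
            misses += 1
        elif forecast_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def frequency_bias(hits, misses, false_alarms):
    """Forecast event count divided by observed event count (1 is unbiased)."""
    return (hits + false_alarms) / (hits + misses)


def gilbert_skill_score(hits, misses, false_alarms, correct_negatives):
    """Threat score adjusted for the number of hits expected by chance."""
    total = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)


# Hypothetical 3-h accumulations (in.) at eight matched grid points.
fcst = [0.30, 0.05, 0.40, 0.00, 0.20, 0.00, 0.00, 0.50]
obs = [0.35, 0.30, 0.10, 0.00, 0.25, 0.00, 0.05, 0.60]

counts = contingency_counts(fcst, obs, threshold=0.25)
fbias = frequency_bias(*counts[:3])
gss = gilbert_skill_score(*counts)
```

Both scores are undefined when no events are observed at a threshold, which is one reason the base rate is worth inspecting alongside the GSS at the largest thresholds.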

Fig. 5.

Threshold series plot of GSS (lines) and relative frequency of occurrence of precipitation exceeding each threshold (base rate; bars) for 3-h accumulated precipitation (in.; 1 in. = 2.54 cm) for the NAMps (solid) and THOMps (dashed) over the 3-km nested domain for forecast hours 12–48 in 12-h increments for the 0000 UTC 3 Feb 2012 case.

Fig. 6.

Time series plot of FSS for composite radar reflectivity using a threshold of ≥20 dBZ for the NAMps (solid) and THOMps (dashed) over the 3-km nested domain for the 0000 UTC 3 Feb 2012 case. Neighborhood sizes for n = 3 (9-km spatial scale; blue), 15 (45-km spatial scale; red), and 39 (117-km spatial scale; black) are shown.
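
The neighborhood idea behind the FSS can be illustrated with a small pure-Python sketch. It follows the usual form of the score (one minus the ratio of the squared difference between forecast and observed neighborhood fractions to its worst-case value) and assumes an odd neighborhood width n and that points outside the domain count as non-events; the grid, the values, and the function names are hypothetical. On a 3-km grid, n = 3 corresponds to the 9-km scale quoted in the caption.

```python
def neighborhood_fractions(field, threshold, n):
    """Fraction of points >= threshold in the n x n box centred on each point.

    Assumes n is odd; points outside the domain are treated as non-events.
    """
    rows, cols = len(field), len(field[0])
    half = n // 2
    fractions = []
    for i in range(rows):
        row = []
        for j in range(cols):
            count = 0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        if field[ii][jj] >= threshold:
                            count += 1
            row.append(count / (n * n))
        fractions.append(row)
    return fractions


def fractions_skill_score(forecast, observed, threshold, n):
    """FSS = 1 - sum((Pf - Po)^2) / (sum(Pf^2) + sum(Po^2))."""
    pf = neighborhood_fractions(forecast, threshold, n)
    po = neighborhood_fractions(observed, threshold, n)
    numerator = sum((a - b) ** 2 for rf, ro in zip(pf, po) for a, b in zip(rf, ro))
    denominator = sum(a * a for r in pf for a in r) + sum(b * b for r in po for b in r)
    return 1.0 - numerator / denominator if denominator > 0 else float("nan")


# Hypothetical 5 x 5 reflectivity grids (dBZ): one cell, displaced by one point.
forecast = [[0.0] * 5 for _ in range(5)]
observed = [[0.0] * 5 for _ in range(5)]
forecast[2][1] = 30.0
observed[2][2] = 30.0

fss_n1 = fractions_skill_score(forecast, observed, threshold=20.0, n=1)
fss_n3 = fractions_skill_score(forecast, observed, threshold=20.0, n=3)
fss_n5 = fractions_skill_score(forecast, observed, threshold=20.0, n=5)
```

In this toy case the point-by-point comparison (n = 1) scores zero because the displaced cell counts as both a miss and a false alarm, while the score rises as the neighborhood widens enough to contain both cells.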

Utilizing two MMET cases to investigate the impact of the THOMps configuration on the persistent problem with surface temperature bias revealed several key considerations during the first stage of the R2O process. Impacts were noted for surface temperature bias, but the results were mixed. The THOMps configuration improved upon warm surface temperature biases in the summer, but the magnitude of the cold surface temperature bias in the winter was increased. Overall, neutral impacts were noted for additional forecast variables assessed. Given this, an extended test was deemed appropriate to quantify the robustness of the results and further assess whether the innovation might be suitable for operational consideration.

While MMET provides baseline verification results for all cases, users are not limited to this analysis and should investigate additional diagnostics for the purpose of identifying strengths and weaknesses of a new capability and improving forecast skill. One such example is provided in the above discussion related to surface shortwave radiation differences between the two configurations, the results of which are not included in the objective baseline verification through MMET, but scripts are available to be used as a diagnostic tool. Beyond investigating model-based diagnostics, the user community is urged to use advanced spatial verification techniques (e.g., object-based and neighborhood methods). A number of methods available in MET are beneficial for obtaining objective measures of how forecast skill varies with spatial scale as well as diagnosing model performance in terms of coverage, displacement, and orientation of features of interest. Researchers are strongly encouraged to not only use these types of independently employed model development and verification diagnostics but also to contribute the capabilities and results to the MMET data repository for use by the community at large.

Stage II: Extended testing at the DTC

While a positive impact may be shown for one or several case studies, it is important to extensively test an innovation to determine whether the performance is robust over a broad range of weather regimes to ensure operational applicability. To further assess the representativeness of the differences found in the initial results using MMET cases related to shortwave radiation and surface temperature biases, the DTC conducted an extended T&E activity across a broad range of cases using the NAMps and THOMps coupled to RRTMG as part of the second stage of the R2O process. A 12-km North American parent domain with 3-km one-way CONUS and Alaska nests was used in the test to mimic aspects of the operational NAM configuration. Forecasts were initialized every 36 h for 1 month per season and run out to 48 h. The full suite of verification results and final report are available on the project web page (www.dtcenter.org/eval/meso_mod/nmmb_test/nems_v0.9).
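
A short sketch shows what a 36-h initialization interval implies for the sample of forecasts. The start and end dates below are hypothetical placeholders, not the actual test periods, and the function name is illustrative only.

```python
from datetime import datetime, timedelta


def initialization_times(start, end, interval_hours=36):
    """List forecast initialization times from start to end (inclusive)."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += timedelta(hours=interval_hours)
    return times


# Hypothetical one-month window; the real test periods are in the final report.
inits = initialization_times(datetime(2014, 1, 1, 0), datetime(2014, 1, 31, 12))
init_hours = [t.hour for t in inits]

# Each forecast runs out to 48 h, so the last valid time is two days later.
last_valid_time = inits[-1] + timedelta(hours=48)
```

Stepping by 36 h alternates between 0000 and 1200 UTC cycles, so roughly half of the forecasts in each month fall in either cycle; results can therefore be aggregated separately by initialization hour.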

In short, the extended test confirmed the results from the two MMET cases are representative of the results aggregated over the summer and winter seasons in terms of surface temperature bias statistics. The largest differences in surface temperature between the two configurations were typically seen during the daytime hours, which also corresponded to the time of the largest difference of each configuration from the observations (bias; Fig. 7). Overall, the THOMps had lower median bias values compared to the NAMps, leading to improved performance during the summer, when both configurations exhibited a warm bias, and degraded performance in the winter, when both configurations had a cold bias. Again, it is noted that the surface temperature biases are strongly related to the downward shortwave radiation reaching the surface, and the THOMps configuration generally had lower values regardless of season (Figs. 8a,b). Downward longwave (LW) radiation is strongly influenced by water vapor and aerosols in the atmosphere, where increased longwave values are generally indicative of increased cloudiness. It is not surprising that large differences between the two configurations are seen in the upper Midwest and parts of the northern plains, where cloudy regimes are typical during the winter season. In these areas, THOMps has larger downward longwave radiation values than NAMps, indicating increased cloudiness in the THOMps (Fig. 8c). In the summer aggregation, while there are small, coherent structures in the difference field, they are typically isolated and not regionwide (not shown).

Fig. 7.

Time series plots of the median surface temperature (°C) mean error (bias) aggregated for the NAMps (solid) and THOMps (dashed) for all 0000 UTC initializations included in the stage II DTC extended testing during (a) summer and (b) winter. The vertical bars attached to the median represent the 99% confidence intervals.

Fig. 8.

Mean difference (NAMps minus THOMps) in downward (a),(b) shortwave and (c) longwave radiation (W m−2) reaching the surface for all 0000 UTC initializations during (a) summer and (b),(c) winter at the 18-h forecast lead time.

An additional metric to note before operational consideration is the compute time required for each configuration. Because of the increased sophistication of the Thompson microphysics scheme, the THOMps configuration took, on average, 54% longer to run to completion than the operational NAM configuration using the Ferrier–Aligo microphysics scheme. While the higher computational cost of the Thompson microphysics scheme does not allow this innovation to be implemented in the current operational NAM computational time slot, information from this test can be leveraged to improve existing operational configurations.

Stage III: Preimplementation testing by operational centers

Upon completion of extensive T&E conducted in the second stage of the R2O process, results are shared with interested operational entities and the research community through a variety of forums, including reports, conference presentations, and web pages. For each innovation that operational centers judge promising enough to progress into the third stage of testing, well-defined and rigorous preimplementation testing procedures are performed with the innovation included in the target operational application. In the example discussed here, the results obtained during the stage II extended DTC testing were utilized during the operational decision-making process, along with internal results produced at EMC from ongoing parallel runs. While the innovation (THOMps configuration) was not included in its entirety in the target operational system, the results motivated an informed decision to modify the operational NAM physics suite. In particular, based on the results, EMC removed the lower limit for the cloud droplet effective radius in RRTMG with the Ferrier–Aligo microphysics. This modification is expected to reduce incoming surface shortwave radiation fluxes under liquid clouds and, in turn, reduce warm surface temperature biases. In addition, a partial cloudiness scheme to better represent subgrid-scale clouds is being implemented to further improve the surface temperature forecasts. Provided positive results are seen during additional testing in the preimplementation phase of the R2O process at EMC, both of these modifications will be fully implemented in the next NAM operational bundle upgrade.

Through a well-defined R2O transition process, three stages of T&E were carried out for one particular innovation, which will ultimately impact an operational NWP system. The process began by investigating a persistent model bias using MMET case studies in the first stage and transitioned to extended DTC T&E in the second stage. The results produced in stage II were leveraged in stage III through work with EMC to facilitate changes in the operational NAM model physics to improve the coupling of clouds and radiation.

SUMMARY.

Through a coordinated effort between the research and operational NWP communities, a sturdy bridge across the R2O valley of death can be built. By defining and implementing a testing protocol for transitioning science innovations from research to operations, the DTC is working to ensure that efforts are streamlined. It is imperative that the process be efficient and effective. Ultimately, the R2O transition is a shared responsibility among researchers, the DTC, and operational centers throughout the three stages of the process. To assist researchers with the initial stage of case study testing, the DTC established MMET. Through this common testing framework, the merits of new developments that could potentially impact operational configurations in the future may be readily assessed. MMET will continue to be enhanced over time and will include hurricane cases with baseline results provided from the Hurricane WRF (HWRF) system (Bernardet et al. 2014). The NWP community is encouraged to engage in MMET during the initial stage of the testing process using techniques or capabilities believed to be ready for stage II, extensive testing and evaluation with the DTC. As improvement in forecast skill is realized, a new technique may progress through the R2O process by undergoing additional preimplementation testing at stage III in collaboration with EMC. In the end, major benefits toward improving operational NWP systems can be realized with a strong, well-defined R2O process and a well-engaged research and operational community.

ACKNOWLEDGMENTS

The authors express gratitude to Cliff Mass (University of Washington) for his assistance with establishing the preliminary foundation of transitioning NWP technologies through the R2O process in the testing protocol document. We would like to sincerely thank Pedro Jiménez (NCAR/RAL), Gary Lackmann (North Carolina State University), Kelly Mahoney (Cooperative Institute for Research in Environmental Sciences), and Anthony Torres (Significant Opportunity in Atmospheric Research and Science protégé) for using MMET and providing valuable feedback on how to improve the information provided to the NWP community. The T&E activity described in the R2O example provided here would not have been possible without Greg Thompson and his work to include his microphysics scheme in NEMS-NMMB. We extend gratitude to Cody Phillips and Tara Jensen for their contributions to the MMET facility along the way. We thank Ligia Bernardet, Joshua Hacker, and Mrinal Biswas for the time they donated to provide insightful suggestions for improvement of an earlier version of this manuscript. The constructive comments from three anonymous reviewers helped improve the quality of the final submission. The Developmental Testbed Center (DTC) is funded by the National Oceanic and Atmospheric Administration (NOAA), the U.S. Air Force, the National Center for Atmospheric Research (NCAR), and the National Science Foundation (NSF).

REFERENCES

Aligo, E., B. Ferrier, J. Carley, E. Rogers, M. Pyle, S. J. Weiss, and I. L. Jirak, 2014: Modified microphysics for use in high-resolution NAM forecasts. 27th Conf. on Severe Local Storms, Madison, WI, Amer. Meteor. Soc., 16A.1. [Available online at https://ams.confex.com/ams/27SLS/webprogram/Paper255732.html.]
Angevine, W. M., H. Jiang, and T. Mauritsen, 2010: Performance of an eddy diffusivity–mass flux scheme for shallow cumulus boundary layers. Mon. Wea. Rev., 138, 2895–2912.
Benjamin, S. G., G. A. Grell, J. M. Brown, and T. G. Smirnova, 2004: Mesoscale weather prediction with the RUC hybrid isentropic–terrain-following coordinate model. Mon. Wea. Rev., 132, 473–494.
Bernardet, L., and Coauthors, 2008: The Developmental Testbed Center and its Winter Forecasting Experiment. Bull. Amer. Meteor. Soc., 89, 611–627.
Bernardet, L., and Coauthors, 2014: Community support and transition of research to operations for the Hurricane Weather Research and Forecast (HWRF) model. Bull. Amer. Meteor. Soc., 96, 953–960.
Dudhia, J., 1989: Numerical study of convection observed during the winter monsoon experiment using a mesoscale two-dimensional model. J. Atmos. Sci., 46, 3077–3107.
Fowler, T. L., T. Jensen, E. I. Tollerud, J. Halley Gotway, P. Oldenburg, and R. Bullock, 2010: New Model Evaluation Tools (MET) software capabilities for QPF verification. Preprints, Third Int. Conf. on QPE/QPF and Hydrology, Nanjing, China, World Weather Research Programme, 189–193.
Fulker, D., S. Bates, and C. Jacobs, 1997: Unidata: A virtual community sharing resources via technological infrastructure. Bull. Amer. Meteor. Soc., 78, 457–468.
Grell, G. A., and S. R. Freitas, 2014: A scale and aerosol aware stochastic convective parameterization for weather and air quality modeling. Atmos. Chem. Phys., 14, 5233–5250.
Guan, H., B. Cui, and Y. Zhu, 2015: Improvement of statistical postprocessing using GEFS reforecast information. Wea. Forecasting, 30, 841–854.
Hong, S.-Y., J. Dudhia, and S.-H. Chen, 2004: A revised approach to ice microphysical processes for the bulk parameterization of cloud and precipitation. Mon. Wea. Rev., 132, 103–120.
Hong, S.-Y., Y. Noh, and J. Dudhia, 2006: A new vertical diffusion package with an explicit treatment of entrainment processes. Mon. Wea. Rev., 134, 2318–2341.
Iacono, M. J., J. S. Delamere, E. J. Mlawer, M. W. Shephard, S. A. Clough, and W. D. Collins, 2008: Radiative forcing by long-lived greenhouse gases: Calculations with the AER radiative transfer models. J. Geophys. Res., 113, D13103.
Janjic, Z. I., 1994: The step–mountain eta coordinate model: Further developments of the convection, viscous sublayer, and turbulence closure schemes. Mon. Wea. Rev., 122, 927–945.
Janjic, Z. I., 1996: The surface layer in the NCEP Eta model. Preprints, 11th Conf. on Numerical Weather Prediction, Norfolk, VA, Amer. Meteor. Soc., 354–355.
Jiménez, P. A., and J. Dudhia, 2012: Improving the representation of resolved and unresolved topographic effects on surface wind in the WRF Model. J. Appl. Meteor. Climatol., 51, 300–316.
Jiménez, P. A., and J. Dudhia, 2013: On the ability of the WRF Model to reproduce the surface wind direction over complex terrain. J. Appl. Meteor. Climatol., 52, 1610–1617.
Jolliffe, I. T., and D. B. Stephenson, 2011: Forecast Verification: A Practitioner’s Guide in Atmospheric Science. 2nd ed. Wiley, 292 pp.
Kain, J. S., 2004: The Kain–Fritsch convective parameterization: An update. J. Appl. Meteor., 43, 170–181.
Mansell, E. R., C. L. Ziegler, and E. C. Bruning, 2010: Simulated electrification of a small thunderstorm with two-moment bulk microphysics. J. Atmos. Sci., 67, 171–194.
Mlawer, E. J., S. J. Taubman, P. D. Brown, M. J. Iacono, and S. A. Clough, 1997: Radiative transfer for inhomogeneous atmospheres: RRTM, a validated correlated-k model for the longwave. J. Geophys. Res., 102, 16 663–16 682.
Morrison, H., G. Thompson, and V. Tatarskii, 2009: Impact of cloud microphysics on the development of trailing stratiform precipitation in a simulated squall line: Comparison of one- and two-moment schemes. Mon. Wea. Rev., 137, 991–1007.
Nakanishi, M., and H. Niino, 2006: An improved Mellor–Yamada level 3 model: Its numerical stability and application to a regional prediction of advecting fog. Bound.-Layer Meteor., 119, 397–407.
Niu, G.-Y., and Coauthors, 2011: The community Noah land surface model with multiparameterization options (Noah-MP): 1. Model description and evaluation with local-scale measurements. J. Geophys. Res., 116, D12109.
Paulson, C. A., 1970: The mathematical representation of wind speed and temperature profiles in the unstable atmospheric surface layer. J. Appl. Meteor., 9, 857–861.
Pleim, J. E., 2007: A combined local and nonlocal closure model for the atmospheric boundary layer. Part I: Model description and testing. J. Appl. Meteor. Climatol., 46, 1383–1395.
Roberts, N. M., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Wea. Rev., 136, 78–97.
Sherretz, L. A., and D. W. Fulker, 1988: Unidata: Enabling universities to acquire and analyze scientific data. Bull. Amer. Meteor. Soc., 69, 373–376.
Tewari, M., and Coauthors, 2004: Implementation and verification of the unified Noah land surface model in the WRF model. 20th Conf. on Weather Analysis and Forecasting/16th Conf. on Numerical Weather Prediction, Seattle, WA, Amer. Meteor. Soc., 14.2a. [Available online at https://ams.confex.com/ams/84Annual/techprogram/paper_69061.htm.]
Thompson, G., and T. Eidhammer, 2014: A study of aerosol impacts on clouds and precipitation development in a large winter cyclone. J. Atmos. Sci., 71, 3636–3658.
Thompson, G., P. R. Field, R. M. Rasmussen, and W. D. Hall, 2008: Explicit forecasts of winter precipitation using an improved bulk microphysics scheme. Part II: Implementation of a new snow parameterization. Mon. Wea. Rev., 136, 5095–5115.
Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Elsevier, 676 pp.
Wolff, J. K., B. S. Ferrier, and C. F. Mass, 2012: Establishing closer collaboration to improve model physics for short-range forecasts. Bull. Amer. Meteor. Soc., 93, ES51–ES53.
Wolff, J. K., M. Harrold, T. Fowler, J. Halley Gotway, L. Nance, and B. G. Brown, 2014: Beyond the basics: Evaluating model-based precipitation forecasts using traditional, spatial, and object-based methods. Wea. Forecasting, 29, 1451–1472.

Footnotes

*

The National Center for Atmospheric Research is sponsored by the National Science Foundation.

1

Average difference between the forecast and observation or mean error. Perfect forecast = 0; low bias < 0 and high bias > 0.

2

The root-mean-square error that is not associated with the bias. Perfect forecast = 0; ranges are from 0 to ∞.

3

Ratio of the frequency of forecast events to that of observed events. Perfect forecast = 1; underforecast < 1, and overforecast > 1.

4

Fraction of observed events that were correctly predicted, adjusted for the expected number of hits associated with random chance. Perfect forecast = 1, and no skill forecast = 0; ranges are from -1/3 to 1.
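
The four verification measures defined in these footnotes can be written out as a short Python sketch. The function names and sample values are hypothetical, and the formulas are the standard textbook forms (mean error, bias-corrected root-mean-square error, frequency bias, and the Gilbert skill score, also known as the equitable threat score), which match the descriptions above but are not taken from the verification software itself.

```python
import math

def mean_error(forecasts, observations):
    """Footnote 1: average forecast-minus-observation difference (bias)."""
    errors = [f - o for f, o in zip(forecasts, observations)]
    return sum(errors) / len(errors)

def bias_corrected_rmse(forecasts, observations):
    """Footnote 2: part of the RMSE not associated with the bias."""
    errors = [f - o for f, o in zip(forecasts, observations)]
    mse = sum(e * e for e in errors) / len(errors)
    me = sum(errors) / len(errors)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mse - me * me, 0.0))

def frequency_bias(hits, false_alarms, misses):
    """Footnote 3: forecast event frequency divided by observed frequency."""
    return (hits + false_alarms) / (hits + misses)

def gilbert_skill_score(hits, false_alarms, misses, correct_negatives):
    """Footnote 4: hits adjusted for those expected by random chance."""
    total = hits + false_alarms + misses + correct_negatives
    hits_random = (hits + false_alarms) * (hits + misses) / total
    return (hits - hits_random) / (hits + false_alarms + misses - hits_random)

# Hypothetical continuous forecasts and observations.
fcst = [2.0, 4.0, 6.0]
obs = [1.0, 3.0, 8.0]
me = mean_error(fcst, obs)
bcrmse = bias_corrected_rmse(fcst, obs)

# Hypothetical 2x2 contingency table counts for a precipitation threshold.
fbias = frequency_bias(hits=20, false_alarms=10, misses=5)
gss = gilbert_skill_score(hits=20, false_alarms=10, misses=5, correct_negatives=65)
```

In this example the forecast errors cancel, so the mean error is zero even though the bias-corrected RMSE is not, which is why the two measures are reported together.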