With a goal of improving operational numerical weather prediction (NWP), the Developmental Testbed Center (DTC) has been working with operational centers, including, among others, the National Centers for Environmental Prediction (NCEP), National Oceanic and Atmospheric Administration (NOAA), National Aeronautics and Space Administration (NASA), and the U.S. Air Force, to support numerical models/systems and their research, perform objective testing and evaluation of NWP methods, and facilitate research-to-operations transitions. This article introduces the first attempt of the DTC in the data assimilation area to help achieve this goal. Since 2009, the DTC, NCEP’s Environmental Modeling Center (EMC), and other developers have made significant progress in transitioning the operational Gridpoint Statistical Interpolation (GSI) data assimilation system into a community-based code management framework. Currently, GSI is provided to the public with user support and is open for contributions from internal developers as well as the broader research community, following the same code transition procedures. This article introduces measures and steps taken during this community GSI effort followed by discussions of encountered challenges and issues. The purpose of this article is to promote contributions from the research community to operational data assimilation capabilities and, furthermore, to seek potential solutions to stimulate such a transition and, eventually, improve the NWP capabilities in the United States.
An operational data assimilation system was transitioned into a community-based code management framework with open code access and user support, opening a new pathway between research and operations.
Numerical weather prediction (NWP) is based on computer models, which describe the state of the atmosphere using mathematical equations in order to predict the evolution of weather conditions. Though NWP was attempted in the early 1900s, it was not until 1950 that the first successful numerical weather forecast was recorded (Charney et al. 1950). This result showed that NWP was feasible and could produce realistic weather forecasts. The United States started to perform NWP operationally in mid-1955. The payoff came in 1958 when skillful, timely numerical predictions were delivered to forecasters to provide guidance for the then-manually prepared prognostic charts (Shuman 1989). Currently, operational NWP centers around the globe run a myriad of models. These models, some of which were developed and are run by the National Oceanic and Atmospheric Administration (NOAA)/National Weather Service (NWS), produce a wide variety of products and services for atmospheric and oceanic parameters, hurricanes, severe weather, aviation weather, air quality, and so on. Advancement of modern NWP is due to revolutionary improvements in a number of key areas, including developments in the theory of meteorology, innovations of observational instrumentation and technology, and advancements in modern computers and their massive computing capabilities. Similarly, NWP in the United States has progressed consistently through the years and its products are being used widely around the world. However, in recent years, concerns have arisen from the research community regarding the NWP improvement rate in the United States (Mass 2012). The lack of advanced data assimilation techniques and insufficient use of observations in operational data assimilation systems were recognized as some of the key contributing factors.
The purpose of data assimilation in an NWP system is to provide an initial set of conditions with an optimized combination of background (e.g., the forecast from the previous cycle of the NWP model) and observation information obtained through weather stations, ships, satellites, and other observational instruments. Many previously published articles introduce the concepts and applications of data assimilation (e.g., Daley 1991; Navon et al. 1992; Houtekamer and Mitchell 1998; Wu et al. 2002; Kalnay 2003; Barker et al. 2004; Whitaker et al. 2011). Prior to 2012, the operational data assimilation system at NOAA was dominated by the three-dimensional variational data assimilation (3DVAR) technique, while other operational centers in the world had transitioned to more advanced techniques [e.g., 4DVAR for the European Centre for Medium-Range Weather Forecasts (ECMWF; Mahfouf and Rabier 2000); hybrid ensemble-variational (EnVar) technique for the Met Office (Clayton et al. 2013)]. Transitioning advanced data assimilation techniques from the research community to operations and taking maximum advantage of current and future observations, especially satellite data, had become urgent and critical for improving the quality of NWP in the United States.
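As a point of reference for the 3DVAR technique mentioned above, the variational problem is commonly written in textbook form (e.g., Kalnay 2003) as the minimization of a cost function that balances departure from the background against departure from the observations. The notation below is the generic textbook one and is an assumption of this sketch; it is not necessarily the exact formulation minimized by any particular operational system, which may include additional terms.

```latex
% Generic 3DVAR cost function (textbook form; symbols are illustrative)
%   x    : analysis state to be found
%   x_b  : background state (e.g., forecast from the previous cycle)
%   y    : vector of observations
%   H    : observation operator mapping model state to observation space
%   B, R : background and observation error covariance matrices
J(\mathbf{x}) =
  \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}-\mathbf{x}_b)
  + \tfrac{1}{2}\,\bigl(\mathbf{y}-H(\mathbf{x})\bigr)^{\mathrm{T}}\,\mathbf{R}^{-1}\,\bigl(\mathbf{y}-H(\mathbf{x})\bigr)
```

The first term penalizes departures from the background and the second penalizes departures from the observations, so the relative sizes of B and R determine how far the analysis is drawn toward the observations.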
Over the past few decades, many efforts have been made in the research community to improve data assimilation techniques and NWP systems. Serafin et al. (2002) discussed potential avenues that would facilitate the transition of new scientific research and technology to the NWS. Among these, community-based modeling efforts were considered an important route for research to operations (R2O) transitions. The source code for these community models/systems became publicly accessible and could be updated with contributions from a broader community, including universities, operational agencies, and the private sector. Such examples include the Advanced Research version of the Weather Research and Forecasting (WRF) Model (ARW; Skamarock et al. 2008), the Community Radiative Transfer Model (CRTM; Han et al. 2006), the Data Assimilation Research Testbed (DART) system (Anderson et al. 2009), and the WRF Data Assimilation (WRFDA) system (Barker et al. 2012). This article provides an overview of an effort centered at the Developmental Testbed Center (DTC) to provide a “new” data assimilation system to the community. Unlike many current community models/systems originating from the research community, this community data assimilation system was transitioned from an existing operational system and continues to be run for daily weather forecasts, while remaining open to the research community. Through this effort, the DTC works closely with the developers to explore the potential to bridge the research and operational data assimilation communities and help accelerate R2O transitions. Experiences and lessons gained via this effort are discussed in this article.
HISTORY OF THE DTC AND COMMUNITY GSI EFFORTS.
The DTC (Bernardet et al. 2008; Ralph et al. 2013; www.dtcenter.org/) is a distributed facility, residing at the National Center for Atmospheric Research (NCAR) and NOAA’s Earth System Research Laboratory (ESRL). The DTC collaborates with operational centers and the research community, supporting numerical models and their research, developing verification tools, and performing objective tests and evaluation of NWP methods. The WRF-Nonhydrostatic Mesoscale Model (WRF-NMM) is the first operational model for which the DTC provided support to the research community, in partnership with its development team at the National Centers for Environmental Prediction (NCEP). Since then, the DTC has further explored promoting usage of operational systems in the research community and enhancing the collaboration between the operational and research communities. For example, the DTC is currently providing full user support for the Hurricane WRF (HWRF; Bernardet et al. 2015). This article introduces the initial effort of the DTC in the area of data assimilation, providing the Gridpoint Statistical Interpolation analysis system (GSI) to the research community and building the pathway for data assimilation R2O transitions.
GSI is a state-of-the-art analysis system, initially developed by NCEP’s Environmental Modeling Center (EMC). It was designed as a traditional 3DVAR system applied in gridpoint space to facilitate the implementation of anisotropic inhomogeneous covariances (Wu et al. 2002; Purser et al. 2003a,b). This 3DVAR system replaced NCEP’s operational grid-space regional analysis system for the North American Mesoscale Forecast System (NAM) in 2006 and the global Spectral Statistical Interpolation (SSI) analysis system for the Global Forecast System (GFS) in 2007 (Kleist et al. 2009). In the past few years, along with the community framework being built for both internal developers and the rest of the research community, GSI has evolved to include various data assimilation techniques for multiple operational applications, including 2DVAR [e.g., the Real-Time Mesoscale Analysis (RTMA) system; De Pondeca et al. (2011)], the hybrid EnVar technique [e.g., the data assimilation systems for GFS, the Rapid Refresh system (RAP), NAM, HWRF, etc.], and 4DVAR [e.g., the data assimilation system for the National Aeronautics and Space Administration’s (NASA) Goddard Earth Observing System, version 5 (GEOS-5); Zhu and Gelaro 2008; Todling and Tremolet 2008]. Currently, GSI is under development to extend its 4D data assimilation capability through inclusion of the hybrid 4D–EnVar approach (Kleist and Ide 2012) with plans to apply this technique to the upcoming GFS implementation scheduled for 2016 (Tallapragada 2015).
As for other operational models and systems, a sustained development effort is granted to GSI within the research teams that support operational applications (therefore mostly considered to be “internal” teams to operational centers). High priority is given to incorporating new observational instrument measurements, especially satellite data. A complete list of data types assimilated by the latest community release code, GSI v3.4 (released in July 2015), can be found in Tables 1 and 2. The latest information can be accessed through the community GSI user’s web page (available online at www.dtcenter.org/com-GSI/users/). Such GSI development benefits from a close relationship between data providers [e.g., the National Environmental Satellite, Data, and Information Service (NESDIS)] and operational centers. For example, it only took seven months after the Suomi National Polar-Orbiting Partnership (Suomi-NPP) satellite mission was launched for NCEP to assimilate the Advanced Technology Microwave Sounder (ATMS) data into real-time GFS operations (Collard et al. 2013). Running efficiency is another focus area of the development team for operational applications. Though the amount of ingested data is rapidly increasing, the GSI code continues to be optimized to fit into limited operational windows. In addition, GSI operational applications have created solid reference configurations and benchmarks. All these aspects provide advantages when using such an operational system for research.
While gaining popularity for operational weather forecasts as well as climate studies (e.g., GFS reanalysis), GSI was not well recognized as a research system prior to 2009 when the community GSI effort was initiated. The code was developed within an operationally driven working environment for specific computing resources (e.g., NOAA computers). Therefore, portability was one of the biggest issues when running on other computing platforms. Documentation was neither well developed nor publicly available. Individual communication among developers was often the only way to gain access to the latest code and to coordinate development. Even within the same operational center, GSI was prone to code divergence because it has been continuously under development for different scales and implemented along varying timelines among applications.
At the same time, data assimilation techniques were rapidly advancing in the research community, some of which were developed within the context of other available data assimilation systems and computing environments. An example is the hybrid data assimilation technique, which incorporates ensemble-based flow-dependent background error information into a variational data assimilation system (Lorenc 2003; Buehner 2005; Wang et al. 2008; Barker et al. 2012). Transferring these research advancements into the operational GSI was typically tedious and costly. Developers who were not working closely with the GSI development team found it difficult to stay abreast of the latest GSI capabilities and/or test their new advances within the operational GSI environment. These challenges led to gaps in the process of transitioning research from this distributed development effort into a single system.
Learning from other community system efforts, the DTC recognized it is critical to build a close partnership with development teams from both the research and operational communities and provide a pathway for both sides to communicate and collaborate. Meanwhile, it was also recognized that an organized effort should be sustained to provide assistance with code development and research, as well as to support real-time operational implementations at multiple operational centers. The following section describes the measures and steps taken by the DTC to unify the cross-development teams (including those internal to operational centers) and promote the usage and development of the operational data assimilation system in the general research community.
COMMUNITY GSI FRAMEWORK AND SUPPORT.
Learning from other community systems and models, both the DTC and EMC recognized that a common code repository is an effective way to provide a traceable history of the code and open code access to different types of developers, either from a research facility, the private sector, or an operational center. In 2009, EMC created a code repository using Subversion (https://subversion.apache.org/), a versioning and revision control system, to serve the purpose of in-house code management and meet the implementation requirements within NCEP. While this operational repository has made code development and sharing much more efficient between the internal developers and operational teams, this repository unfortunately resides inside the NOAA security firewall and does not meet the open-access requirement to serve external users. The DTC’s strategy for addressing this access issue was to create a parallel GSI community repository. The community repository mirrors all components residing within EMC’s GSI operational repository, while also containing files not necessarily required by internal EMC users, for example, supplemental libraries required for running GSI, multiple-platform compilation tools, simplified run scripts, community-shared diagnostic utilities, and so on. This approach provides the least intrusive option for the established operational framework. This community repository is open to all users and developers, with an application procedure in place guided by the laws of the U.S. government. The DTC provides the aforementioned additional files and online support to assist users in compiling, configuring, and running GSI using their own computing resources (usually non-NOAA computers).
Users of the GSI repository (either the operational or community repository) can check out the latest code from the repository “trunk” for further development and/or releases (e.g., GFS operational implementations and community GSI releases). All repository users can create their own branches attached to the repository trunk for active development, code testing, and bug fixes. Code divergence among developers (and branches) can be sufficiently avoided through developers committing incremental changes to the trunk and synchronizing branches with the trunk in a timely manner. The code transition from branches back to the trunk, as well as the synchronization of the community and operational repositories, are managed by a procedure developed and monitored by the GSI Review Committee (GRC). The following section introduces the code transition procedure and its connection to the code repository(ies) (Fig. 1).
The GRC is the core of the GSI code management structure. It was formed in 2010 with a goal of incorporating all major GSI development teams in the United States within a unified community framework. It was expanded in 2011 and currently includes members from EMC, the Global Modeling and Assimilation Office (GMAO), ESRL, NCAR, the U.S. Air Force (USAF), NESDIS, and the DTC (chair). As the only organization focusing on user support, the DTC takes on the role of connecting the GRC with developers whose organizations are not represented. Any developer may apply for GRC membership. The committee members are responsible for proposing and shepherding new code advances, coordinating ongoing and future development, and providing advisory guidance to community GSI efforts. Two sets of formal meetings are in place to facilitate communications among developers: quarterly GRC meetings, hosted by the DTC, which coordinate development among the major development teams, and biweekly GSI developer meetings, hosted by EMC, which are open to all individual developers for discussion of ongoing research and code updates.
Another important function of the GRC is to review code updates/advances to be committed to the code repository. The GRC members review new code developments using their own testing suites, usually associated with operational configurations. Once the GRC reaches unanimous approval for the code changes, EMC and the DTC perform final software sanity tests and commit the code changes to the GSI repository trunk (the operational and community repositories are synchronized for each code commit). Since most of the development work is originally designed for a particular application, this rigorous test–review–test mechanism ensures the GSI system is stable and robust and prevents unexpected changes to all operational applications involved in this community GSI effort.
Community code access and support.
In addition to providing active developers with the developmental GSI system, in 2010 the DTC began providing the general research community with code access through the community GSI user’s website (www.dtcenter.org/com-GSI/users/index.php). This website provides an annual released GSI package, including supplemental libraries, fixed input files, reference configurations, the multiple-platform compilation tool, and sample run scripts, as well as diagnostic utilities. The DTC composed the first GSI user’s guide in 2009, in collaboration with developers, and has since provided updated documentation along with annual releases. The website also provides online exercises, test cases, and other GSI information.
To assist GSI users and developers, the DTC provides training and online support, following each code release. Both fundamental and advanced GSI topics are covered during the tutorials to meet the various needs of GSI users. Users can also practice GSI by completing the hands-on tutorial sessions. The DTC also periodically hosts GSI workshops to promote data assimilation research, which enhances the connections between the operational centers and community researchers. Past GSI community events are listed in Table 3. All of the presentations from these events can be accessed through the GSI community website.
The DTC Visitor Program is another important mechanism the DTC offers to promote the use of operational capabilities in the research community and to assist with transferring research advances to operations. This program provides financial and computing resources for projects that are usually associated with the operational systems supported by the DTC and have been through a rigorous review process. A list of GSI and associated data assimilation visitor projects (including the final reports) can be found online (www.dtcenter.org/visitors/data_assim/).
GSI code tests.
During the transfer of the operational GSI to a unified community system shared with distributed developers and users, it was recognized that performing standard and centralized code tests is essential to avoiding intrusive damage to the incorporated GSI operations and maintaining the integrity and robustness of GSI. The DTC works closely with the GRC members to build a solid testing and evaluation procedure for GSI. Currently, three types of regular tests are in place for GSI maintenance: repository regression tests, preimplementation tests, and the DTC community code tests.
Running regression tests is an essential part of the code review procedure and repository maintenance. The suite of regression tests contains a set of preconfigured cases to be run prior to and after new code is committed. These cases are selected to test certain components or configurations of GSI (e.g., running GSI in the global domain or for a tropical storm case). The size of the cases is usually small so that they can be run within a short time frame. The regression tests are performed for each update to the code repository trunk and the results provide information on whether, and how much, the computational cost and scientific performance have changed because of the particular update. Current regression tests, managed by both the DTC and EMC, are designed to run multiple reference configurations associated with operational applications (GFS, HWRF, RTMA, etc.) on multiple platforms. Running these regression tests has proven to be sufficient in preventing most system crashes stemming from new development. Many of the code issues, especially those related to portability and compatibility, are tackled through regression tests during the code review procedure before changes are added to the GSI trunk. The regression tests are updated periodically based on developers’ input.
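The comparison at the heart of such a regression test can be sketched in a few lines of Python. This is a minimal illustration only, not the actual GSI regression suite: the `RunSummary` fields, the `compare_runs` function, and the tolerance values are all hypothetical and chosen for this example. It compares a control run (code before the update) with an update run (code after the update) for the same case and reports whether the scientific result or the computational cost has changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Summary numbers from one regression case (hypothetical fields)."""
    case: str            # name of the preconfigured case, e.g. "global"
    final_cost: float    # value of the minimized cost function at the last iteration
    wall_seconds: float  # elapsed run time in seconds


def compare_runs(control, update, cost_rtol=1e-10, time_factor=1.1):
    """Return a list of findings; an empty list means nothing changed beyond tolerance.

    cost_rtol and time_factor are illustrative thresholds, not values
    taken from any operational test suite.
    """
    if control.case != update.case:
        raise ValueError("control and update must describe the same case")

    findings = []

    # Scientific check: relative change in the final cost function value.
    scale = max(abs(control.final_cost), 1e-300)  # guard against division by zero
    rel_change = abs(update.final_cost - control.final_cost) / scale
    if rel_change > cost_rtol:
        findings.append(
            f"{control.case}: final cost changed by {rel_change:.3e} (relative)"
        )

    # Computational check: flag runs that are noticeably slower than the control.
    if update.wall_seconds > time_factor * control.wall_seconds:
        findings.append(
            f"{control.case}: run time grew from "
            f"{control.wall_seconds:.1f} s to {update.wall_seconds:.1f} s"
        )

    return findings


if __name__ == "__main__":
    control = RunSummary("global", 1000.0, 60.0)
    update = RunSummary("global", 1000.5, 70.0)
    for line in compare_runs(control, update):
        print(line)
```

A real suite would compare many more quantities across several platforms; the point here is only that each case yields a pass or a list of differences that reviewers can inspect before a change reaches the trunk.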
Preimplementation tests refer to those tests performed inside operational centers prior to a particular operational implementation. A general practice is for an operational center to conduct an extensive period of real-time parallel runs using updated GSI capabilities, compare the generated results with the then-operational products, and evaluate the code robustness and impacts of the new add-ons. Though this type of testing may sound irrelevant to general researchers, the preimplementation tests play an important role in ensuring that the GSI code remains solid and robust. Since parallel runs are usually performed for multiple months, seasons, or even years depending on the requirement of the particular application implementation, GSI is tested continuously and thoroughly for those operational configurations.
Community test bed.
Preimplementation tests are essential for operational implementations to evaluate research and code advances before they are transitioned to operations. However, they are usually not available to external users and developers. Therefore, the DTC has worked to build a community GSI test bed. Through such a test bed, researchers can evaluate the impacts of new developments in a near-operational environment, so that the testing results are more relevant to the implementation decision-making processes at operational centers. This test bed is an end-to-end system that includes preprocessing, GSI, the forecast model (e.g., ARW, NMM, HWRF, etc.), postprocessing, and verification, as well as archived operational datasets and other input files. In consultation with operational agencies, this test bed can be set up to be functionally similar to a particular operational configuration. The DTC testing capabilities are open to researchers through the DTC Visitor Program. Internally, the DTC uses this test bed for community release tests and tutorial practical sessions. The test bed is thus used not only for GSI code tests but also for testing libraries and run scripts. This test bed framework is also used by the DTC to perform independent testing and evaluation of GSI and provide a rational basis for research studies as well as operational applications. All tests conducted by the DTC are defined in consultation with sponsors or based on community interest. They usually complement operational preimplementation tests in varying aspects.
EXPERIENCE AND LESSONS.
Over the past few years, this community effort has transitioned GSI into a unified community-based framework. The direct result is the rapid advancement of GSI. Since 2009, GSI has evolved into a data assimilation system containing advanced data assimilation techniques (e.g., hybrid EnVar), with better usage of observational measurements (e.g., cloudy radiance). The GSI code itself is more modular and modernized, as well as more portable and easier for developers to edit. Though many factors have contributed to the GSI evolution (e.g., strong support of operational centers), this community GSI effort has helped stimulate more coordinated development and closer communication inside the development teams and among distributed developers. For example, before the code management procedure was implemented, the initial cloud analysis capability, currently included in GSI, developed by ESRL and the University of Oklahoma (Hu and Xue 2007; Hu et al. 2008), took more than a year to be accepted into the operational GSI for many reasons (e.g., code divergence, inconsistent coding standard, lack of development coordination). Transferring this research capability to operations was the first working case to which the GRC applied the code management procedure. The functions of the GRC and the code management framework were finalized during this process. Over the past five years, the GRC has received about 100 code review requests, each with multiple code changes. One such request usually takes approximately five business days for a code review and one business day for the code to be committed to the GSI trunk. Currently, all of the GSI implementations in the U.S. operational centers, as well as the DTC community releases, come from the GSI repository managed within this community GSI framework.
Usage of GSI in the general research community has also significantly increased since 2009. Currently, there are over 1,000 community users registered through the DTC GSI website (in addition to the users using the code repositories), with over 300 individuals from the United States and the international community in attendance for the previous GSI tutorials. Over 50% of the current GSI registered users come from the university community. Incorporating the research community with operational center developers has broadened the scope of GSI development and research. In 2012, GSI implemented the hybrid DA technique, resulting in significant improvement in the GFS forecast score. This implementation resulted from a close collaboration among developers from multiple groups, including EMC, ESRL, and the University of Oklahoma. Research in this area was conducted independently by Whitaker et al. (2011), Kleist and Ide (2012), and Wang et al. (2013). Through the developer meetings, working areas were identified among these researchers and the hybrid capability was implemented into GSI through merging the code contributions from each contributor. Another community contribution example is the addition of the aerosol optical depth (AOD) assimilation capability. This capability was initially developed by Liu et al. (2011) at NCAR and transitioned to the operational GSI code with the assistance of the DTC. This capability was made available through the 2011 annual GSI release. Currently, this capability is being further developed by GMAO and ESRL.
DTC code tests for operations.
A majority of the past DTC GSI testing and evaluation activities have been conducted for regions outside the North American domain, where most of the operational GSI tests in the United States are performed. To help with operational implementations, the DTC tests alternative data types or configurations (system setup, parameter tuning, etc.) and provides suggestions and feedback for the preimplementation parallel tests performed at the operational centers. Such testing components can be either developmental capabilities from the research community or existing capabilities, which are not yet adopted by a particular application.
To help explain how the DTC tests assist in the operational implementation of GSI, including system tuning and testing, an example of DTC code tests performed for the USAF mesoscale applications is shown in Fig. 2. This test was performed to evaluate three different prescribed static background error (BE) statistics for one of the USAF’s regional domains. The motivation of this test comes from the requirement of USAF operations to run GSI in many regional domains throughout the world with a strict time constraint. Given the domain locations, the dimensions and resolutions may be altered as necessary, making the background errors a priori critical. The DTC was tasked with helping select one of the three prescribed BE files generated originally by NOAA for GFS, NAM, and RAP. The GFS and NAM BE files are also included in the annual GSI release packages. The testing was performed across a Northern Hemisphere domain. The BE forecast impact was monitored in a series of real-time and retrospective runs. Figure 2 shows the general operations (GO) indices for the GSI runs using different BE statistics during one of the retrospective testing periods. The GO index number is a ratio used for decision-making purposes by the USAF. It is composed of a series of skill scores, weighted by lead time, for wind speed, temperature, dewpoint temperature, heights at various levels and the surface, and mean sea level pressure. Given this definition, values of the GO index that are less than one indicate that the control configuration has higher forecast skill, and values greater than one indicate that the test configuration has higher forecast skill. Results in Fig. 2 show that the most positive impact on the forecast skill comes from the NAM BE, for which the GO index is greater than one for most of the testing period. Note that the sensitivity of analyses and forecasts to BE is variable dependent; therefore, decisions should be made based on the application specifics.
For this study, the GO index is set up with higher weighting on USAF-selected variables (e.g., wind). Figure 3 shows the root-mean-square errors (RMSEs) of wind and temperature at the analysis time for each of the BE runs. Using the NAM BE significantly reduced the wind analysis error between 700 and 200 hPa. However, it generated larger temperature analysis errors compared with the run using the GFS BE. So the higher forecast skill results from NAM shown in Fig. 2 benefit at least in part from the improved wind field analyses. In addition to the three available operational BE files, a domain and model-specific BE file can also be generated using a background error generation tool developed at NCAR (Descombes et al. 2015). The DTC tested this community capability for the USAF as well. However, the GSI analyses and forecasts generated using this particular BE set were not superior to those of the NAM BE run. Based on results from these short-term experiments performed by the DTC, the tested configurations were fed into the USAF real-time parallel experiments, which were then compared to the production runs. Following the DTC’s recommendation, the GSI runs continuously outperformed the production runs in a month-long test, as shown by Martinelli (2013).
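The two verification quantities used in this example, the RMSE and a weighted skill-score index, can be sketched as follows. This article does not give the formula for the GO index, so the functional form below (a weighted average of skill scores built from RMSE ratios, converted to an index that equals one when the test and control runs perform equally) is an assumption made for illustration, and the function names, dictionary keys, and weights are hypothetical. The sketch reproduces the interpretation described above: values greater than one favor the test configuration and values less than one favor the control.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error between two equal-length sequences of values."""
    if len(forecast) != len(observed) or len(forecast) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def skill_index(test_rmse, control_rmse, weights):
    """Weighted skill-score index (assumed form, not the official GO index).

    test_rmse, control_rmse: dicts mapping a key (e.g. a variable and lead
    time) to the RMSE of the test and control runs; control values must be
    nonzero. weights: dict with the same keys giving illustrative weights.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Skill score per key: 1 - (RMSE_test / RMSE_control)^2, then weighted mean.
    weighted_skill = 0.0
    for key, weight in weights.items():
        ratio = test_rmse[key] / control_rmse[key]
        weighted_skill += weight * (1.0 - ratio ** 2)
    weighted_skill /= total_weight

    # A perfect test run (all RMSEs zero) gives an unbounded index.
    if weighted_skill >= 1.0:
        return math.inf
    return math.sqrt(1.0 / (1.0 - weighted_skill))


if __name__ == "__main__":
    # Hypothetical RMSEs keyed by (variable, lead time in hours).
    control = {("wind", 24): 4.0, ("temperature", 24): 2.0}
    test = {("wind", 24): 3.5, ("temperature", 24): 2.1}
    weights = {("wind", 24): 2.0, ("temperature", 24): 1.0}  # wind weighted higher
    print(skill_index(test, control, weights))
```

Because wind carries the larger weight in this made-up example, a modest wind improvement can outweigh a small temperature degradation, which mirrors the trade-off reported for the NAM BE run.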
Data sensitivity studies are another common area of work that utilizes the DTC test bed. One of the tests the DTC conducted in 2014 was to evaluate the impact of the Solar Backscatter Ultraviolet/2 (SBUV/2) profile ozone data in the USAF GSI and ARW systems. The DTC performed this test to help the USAF determine whether the SBUV/2 data, as an additional data type for assimilation, might improve the weather forecasts. Figure 4 shows the time series of temperature RMSEs at 50 and 500 hPa with and without the SBUV/2 ozone assimilated for 1–31 August 2014 across an eastern North Pacific domain. Note that ozone is not a prognostic variable in ARW and, therefore, the impact of ozone data assimilation was expected to diminish with time. However, it is clear that the positive impacts on the temperature forecasts are significant throughout the first 48 h. Similar positive impacts were also present for the wind forecasts. This outcome suggests a promising application of ozone data assimilation for regional weather forecasts. The configured GSI system and the testing results were reported to the USAF and will be considered for operations.
The previous two examples demonstrate the types of tests the DTC performs for its operational sponsors. Many of these tests were performed in an environment functionally similar to operations, using real-time or retrospective operational cases. Therefore, the testing results were directly adopted by the respective operational centers in their implementation decision process. The operational centers then combined the suggested configurations (tuned parameters, selected observation types, etc.) with other updates and performed longer-term preimplementation tests for a final decision. More diagnostics and analyses are performed as part of these DTC tests and the results are included in the DTC reports for the sponsors. The DTC posts these reports on the DTC testing and evaluation website (www.dtcenter.org/eval/data_assim/) and also presents the results to the research community through DTC community outreach events, conferences, and meetings.
Lessons and potential directions.
Though the community GSI effort has made significant progress in the past few years, improvements upon current efforts are still needed (e.g., merging operational and community repositories), and a number of challenges remain. Most, if not all, of the GSI development comes from the major development teams already incorporated in the GRC. Contributions from the rest of the research community are limited, and many reasons may contribute to this issue. First, compared with the modeling community, the data assimilation research community is relatively small and the number of GSI developers is even smaller. Applications to the DTC Visitor Program in the data assimilation area are also limited in comparison with other areas (e.g., model physics or verification). However, it is evident that there is more organized GSI usage in the United States and throughout the international community, with increasing numbers of GSI-related presentations and papers appearing at conferences and workshops. This implies that the promotion of GSI in the research community is working and that many users have gone through the learning curve and have begun real development and research efforts. Therefore, it is essential to continue with the GSI outreach events and community support to sustain community interest. Second, the research community lacks incentives to contribute back to the operational systems. Currently, it is up to the researchers to come forward with feedback to GSI development. In addition, even in cases where community researchers agreed to contribute back to the operational code, performing objective and independent code tests was not always feasible. Sometimes the developmental code, which was evolving continuously, was not available for the DTC to perform testing over an extended period, or researchers were not willing to share their research code.
Now that the pathway from research to operations has been laid out and proven to work, it is time to seek more effective measures to encourage the involvement of the general research community in operational data assimilation development. The success of the NOAA Hurricane Forecast Improvement Program (HFIP; www.hfip.org/) and the latest Next Generation Global Prediction System program (NGGPS; www.nws.noaa.gov/ost/nggps/) might provide ideas for the community GSI effort and similar community modeling efforts. These two programs were initiated within the operational community, enticing the research community to directly contribute to the development of operational models/systems. However, only when an appropriate code management framework and code transition procedures (including tests and reviews) are incorporated into the projects of such a program will the transition from research to operations be efficient and smooth. It seems reasonable that a close collaboration between the DTC and such a program could help direct community efforts toward developers who are more motivated to pursue research-to-operations (R2O) transitions.
The second challenge is that maintaining GSI capabilities or adding new capabilities might become difficult when the associated development efforts are not sustained for some reason. The DTC is not a development center and, therefore, it is not straightforward for it to gain expertise in research development without direct involvement from the developers. For example, the DTC was working with NCAR to transition the ARW-based GSI 4DVAR capabilities (Zhang and Huang 2013) to the community code. However, this work could not be completed because there were no additional resources for NCAR to update the adjoint model for each release of GSI and ARW, an update that is mandatory for releasing the 4DVAR capabilities to the public.
The third issue is also associated with the rapid development of GSI. GSI has interfaces with both global and regional models and incorporates many different types of observational instruments (some of which might have been discontinued). The GSI code is on a path to becoming a “giant” if no constraint is put in place for distributed contributions. This is also a common issue that many community models may eventually confront. An oversized system is not desirable from an operational viewpoint, since its operational efficiency might be in jeopardy and its maintenance may become difficult over time. Meanwhile, the research community prefers flexibility and more run-time options (other data and background formats, different parameter tuning and configuring, etc.) with which to perform research and make improvements. How to meet this twofold need is a question for the community GSI effort and many other similar community efforts.
An intermediate solution to the last two issues is to modernize the existing system, GSI in this case. Recently, there have been discussions within the GRC and among other collaborators about the possibility of refactoring GSI. The GSI code may be decomposed into multiple libraries/modules/components that can be plugged in and out as needed. By doing so, obsolete capabilities can be safely removed and new capabilities or updates can be implemented more easily. The interface of data and background files to GSI may be handled externally to save memory and computer time. The observation operators, which transform model state variables into observation space, can also become relatively independent of the GSI solver and therefore be more easily adopted by other data assimilation systems (e.g., by an ensemble-based data assimilation system) or for verification purposes (e.g., verification against nontraditional satellite observations). Tools that provide such a modeling framework are already available to the modeling community, for example, the Earth System Modeling Framework (ESMF; www.earthsystemcog.org/projects/esmf). The NOAA Environmental Modeling System (NEMS) is based on ESMF in an effort to streamline the components of operational modeling suites at NCEP. GSI, or similar community efforts, would certainly benefit from a similar effort within the code management framework. Another possibility is to invest in next-generation data assimilation. The code management and transition framework should be considered from the beginning, while designing such a system. Developers would need training in building such a system to certain coding standards and requirements so that the code would be more modularized and modernized. Desired capabilities should be included, as well as the preferred interface for portability and flexibility.
Last, but not least, the community GSI effort actually extends beyond the GSI system itself. A data assimilation system is linked to data preprocessing, forecast models, postprocessing, and verification. Currently, the DTC provides support for all of these components except data preprocessing. The observations available to the public are sparse and their formats are not unified; therefore, they require additional processing before being fed into GSI. Moreover, NCEP feeds GSI with quality-controlled conventional data produced by a preprocessing step. This step is not available through the existing community framework. Without access to near-operational datasets and appropriate quality control, research efforts using GSI might not be relevant enough to operations. Providing support for data preprocessing might be the next step in helping to complete this community GSI effort.
SUMMARY AND FUTURE PLANS.
Starting in 2009, a joint effort between the DTC and NCEP/EMC was initiated to expand the operational GSI data assimilation system to the research community, with the sponsorship of NOAA, the USAF, and NCAR [supported by National Science Foundation (NSF)]. The objectives of this effort are to provide operational capabilities to the research community, open up pathways for the research community to contribute directly to daily operations, and, eventually, accelerate transitions from research to operations, which is in line with the mission of the sponsors and the DTC.
This effort has produced a code management framework capable of unifying the distributed development and operational applications. Major GSI development teams across the United States are members of the GSI Review Committee, which is tasked with coordinating and reviewing code development. The GSI system and its supplemental libraries and auxiliary files are managed in the GSI Community Repository under version control (using Subversion). Targeted code tests are organized to maintain code robustness and integrity. General GSI community support is provided through the DTC, including code access, documentation, tutorials, a helpdesk, and assistance with code transitions and tests. Community researchers and users are encouraged to collaborate with the DTC and/or any of the GSI developers to further advance GSI and associated data assimilation techniques, following the same code management procedures as internal developers.
The close collaboration between the DTC and other primary GSI development teams (including EMC, GMAO, etc.) is critical to this community effort. This helps the DTC to better understand the needs of both the research and operational communities. It also promotes active communication about GSI development among distributed teams and enables the unified code management framework to function as expected. The framework set up during this effort, including the code management and code transition procedures, was also shared with other community efforts at the DTC.
However, through this community effort, the DTC and its collaborators also recognized additional challenges, including issues related to discontinued development, lack of incentives for community contributions, lack of access to data handling, and so on. The DTC continues to work with its partners to seek solutions to these issues. It might be necessary to expand the current community GSI effort to refactor this data assimilation system or get involved with the development of a next-generation data assimilation system, as well as to provide support for data preprocessing. It is also necessary for the DTC to continue to expand its expertise in data assimilation and build an effective mechanism (with proper incentives) for motivating contributions from the research community (e.g., developers considered to be “external” to the GRC members). All these potential solutions will require even closer collaborations with operational centers and funding agencies. The DTC also welcomes comments and feedback from the research community.
Meanwhile, the DTC will continue to provide operational data assimilation capabilities to the research community. In addition to GSI, the DTC is working to provide the research community with an ensemble-based data assimilation system, the ensemble Kalman filter (EnKF) system originally developed by NOAA. This system will complement the GSI-based hybrid capabilities through a continuous update of the ensemble pieces. By the time this article is published, this EnKF system will have been released to the research community together with GSI. The DTC will continue its efforts to facilitate research community contributions back to operations and, eventually, improve numerical weather prediction through data assimilation. It is also important for the DTC to stay abreast of the new NWP initiative and help accelerate the development of the next generation of data assimilation for operations.
This work has been performed under the auspices of the DTC. The DTC is funded by NOAA, the USAF, NCAR, and NSF. The authors also acknowledge ESRL, NCAR, NCEP, the Joint Center for Satellite Data Assimilation (JCSDA), and the NCEP Central Operations (NCO) for facilitating some of the community GSI services.