Data assimilation (DA) integrates computer simulations and real-world observations based on statistical mathematics and dynamical systems theory, and plays a central role in numerical weather prediction (NWP). As computing and sensing technologies advance, DA will deal with “big simulations” and “big data.” Here we focus on rapidly changing convective weather and explore a future direction of two orders of magnitude more rapid weather forecasting by innovating what we call “big data assimilation” (BDA) technology. Tremendous efforts have been devoted to convective-scale NWP and radar DA, including the U.S. effort on the “Warn-on-Forecast” project (Stensrud et al. 2009; 2013), which has been pioneering rapidly updated NWP to be used for warnings about convective-scale hazards. Sun et al. (2014) provided a comprehensive review on this subject with a rich body of literature. Extending a wealth of previous studies, this article presents the concept of BDA research and the first proof-of-concept results of a real high-impact weather case, exploring 30-min forecasts at 100-m grid spacing refreshed every 30 s—120 times more rapidly than hourly updated systems. This revolutionary NWP is only possible by taking advantage of the fortunate combination of Japan’s most advanced technological developments: the 10-petaflops (floating-point operations per second) “K computer” and Phased Array Weather Radar (PAWR; Ushio et al. 2014; Yoshikawa et al. 2013). The science and analytics of big data, typically characterized by four “big V’s” (volume, variety, velocity, and veracity), are growing rapidly, and BDA is one of the first two projects awarded by the Japanese government strategic funding program started in 2013 on general big data applications.1
In contemporary weather forecasting, radar observations and NWP play an essential role in real-time monitoring and short-term prediction of severe weather. The widely used parabolic-antenna radar observes rain intensity along a curvilinear beam track. The radar is rotated, and changes the azimuth and elevation angles to capture the whole sky typically in 5 min for 15 elevation angles. Also, typical convective-scale NWP updates forecasts every hour for the next O(10) hours at O(1)-km grid spacing. However, convective weather systems evolve quickly in 5 min and undertake a nonlinear evolution. The current NWP systems that could possibly use all 5-min radar data at the highest frequency may still be far from sufficient to precisely represent individual convective activities.
Here we explore what the highest-end, next-generation supercomputing and sensing technologies can do at their full capacity, pioneering the future of weather forecasting for the next 10 years. The cutting-edge PAWR implemented in Osaka, Japan, in summer 2012 is capable of observing 100 elevation angles within only 30 s and produces about 100 times more data than the widely used parabolic-antenna radar. Unlike the every-5-min data, the every-30-s data show the continuous evolution of convective weather systems, and we may reasonably assume a linear evolution in 30 s at convective scales (Fig. 1). Also, the highest-end supercomputers to date with O(10)-petaflops capability enable high-precision weather simulations at O(10)-m grid spacing, or even smaller.
To fully take advantage of the large-volume and high-velocity big data from the most advanced simulations and sensors, we propose BDA innovation. BDA aims to refresh 100-m-mesh weather forecasts every 30 s, orders of magnitude faster and more precise than the typical operational hourly updated systems at O(1)-km grid spacing, leading to an O(10)-minute-lead early warning for sudden local severe weather. BDA is also a leap by an order of magnitude over the cutting-edge research which refreshes about 1-km-mesh forecasts every several minutes (e.g., Aksoy et al. 2009; Stensrud et al. 2013; Yussouf et al. 2013).
We developed a prototype BDA system, the workflow starting with 100 parallel simulations or ensemble runs of 100-m-mesh NWP (“100M”) for 30 s using the Japan Meteorological Agency (JMA) nonhydrostatic mesoscale model (NHM; Saito et al. 2006). JMA runs the NHM operationally at 5-km and 2-km grid spacing—50 and 20 times coarser than the BDA system, respectively. The simulation domain is chosen to be a 120-km-by-120-km square centered at the Osaka PAWR, covering the 60-km range of the PAWR (Fig. 2c). The 100 simulated states are input to the DA system based on the local ensemble transform Kalman filter (LETKF; Hunt et al. 2007; Miyoshi and Kunii 2012). The LETKF takes the PAWR data—both reflectivity and Doppler velocity—and updates the 100 states to initialize the next 30-s parallel simulations. The 30-s forecast-DA cycle is repeated by taking PAWR data every 30 s. The most probable state, given by the arithmetic average of the 100 states or simply the ensemble mean, is used to initialize a 30-min simulation. Here, the PAWR data are preprocessed by the newly designed quality control algorithm that takes advantage of the PAWR’s unique, high vertical and temporal resolution (Ruiz et al. 2015). Attenuated echoes are identified and rejected by quality control. For comparison, we also developed a reduced-resolution system at 1-km grid spacing (simply “1KM”).
We have run the prototype BDA system with up to 3,072 nodes of the Japanese 10-petaflops K computer and measured the computational amount as shown in Table 1 (see appendix A in the electronic supplement). With the full capacity of 88,128 nodes (705,024 cores) of the K computer at a typical 5% computational efficiency (i.e., 5% of the 10-petaflops peak performance, effectively 5 1014 flops), it takes about 15 s for the total 7.1 1015 floating-point operations of the 100 parallel NHM runs and LETKF, feasible for the 30-s cycling processes considering the additional time for data transfers between the NHM and LETKF, although the full-capacity experiment is a subject of future research. Compared with 1KM, BDA takes O(103) and O(102) more computations for the forecasts and LETKF, respectively. These are consistent with an estimate that O(10) times more resolution requires O(102) more computations due to an increased number of grid points both for forecasts and LETKF, and O(10) times more computations for forecasts due to a shorter time step.
Computational amount (floating-point operations) measured by the K computer.
The prototype BDA system was tested in a retrospective high-impact weather case that occurred on 13 July 2013, when Kyoto, Japan, had floods and a number of power outages due to severe lightning strikes. The convective rain system was well captured by the Osaka PAWR (see animated three-dimensional visualization in the electronic supplement). First, we have run the “NODA” experiment, in which the BDA system was initialized at 1500 Japanese Standard Time (JST: UTC + 9 hours) and was run for 10 min without observation data input. The 100-m-mesh initial conditions are generated by downscaling an independent 15-km-resolution NHM-LETKF system (see appendix B in the electronic supplement for details). Therefore, NODA contains information only on larger-scale convective instability but not on individual convective cells, which are typically resolved at O(1)-km grid spacing or smaller. Next, we ran the 100M experiment, exactly the same as NODA but with all 30-s PAWR data used. We also ran the 1KM experiment with both model and observations reduced to 1-km resolution. We assimilated weak echoes as “zero precipitation” (Aksoy et al. 2009; Yussouf et al. 2013).
We can see that both 100M and 1KM quickly become closer to the PAWR observations (Fig. 3). Figure 3 shows sawtooth patterns that are typical for cycling DA, for 100M and 1KM in the initial 10 min (white background), indicating that the errors are reduced by assimilating the PAWR data every 30 s. After the eighth DA step, or 1504 JST, the root-mean-square errors (RMSE) relative to the PAWR’s observed reflectivity reach the asymptotic level of about 7 dBZ for 100M (Fig. 3a). 1KM shows about twice as large RMSE for reflectivity, mainly because the 1KM model data cannot fit the higher-resolution PAWR data at about 100 m. For Doppler velocity, the RMSE drops more quickly for 100M than for 1KM, but at 1510 JST the RMSE are similar around 2.3 m s−1; that is, the observed radial velocity field is smoother, and 1KM model data can fit as well as 100M. We have run longer DA cycles and found that the 100M RMSE did not keep increasing (not shown).
The three-dimensional rain distribution captured by PAWR reflectivity at 1510 JST (Fig. 4a–h) shows very good agreement between 100M and the actual PAWR observation, while NODA shows only a broad precipitation pattern partly because of the averaging of widely dispersed ensemble runs. We find that 100M presents the three-dimensional structures of individual convective cells almost identically to the actual observation (Fig. 4g,h). 1KM agrees generally well, but has smoother patterns and missing details (Fig.4 b,f). While PAWR observations provide only reflectivity and Doppler velocity, the model produces physically consistent meteorological fields (Fig. 4j,k). Namely, strong heating in the convective center at around 3–6 km due to latent heat release from water and ice condensation is associated with a strong upward motion of about 10 m s−1 from 2 km to the tropopause. Cooling occurs near the ground due to rainfall behind the heating convective center. Again, 1KM shows only smooth patterns. 100M’s higher resolution can resolve smaller-scale details that may be relevant to local severe phenomena. For example, 100M simulates more detailed structures of hydrometeors with higher peak values of snow and graupel contents (Fig. 5, blue and red shades) accompanied by stronger updrafts (Fig. 5, black contours), suggesting a higher potential of lightning strikes. Also, Figs. 5e and 5f show significant differences, so that large rainwater content (green shades) reaches the ground in 100M but not in 1KM.
Representing these complete meteorological fields beyond what is observed is essential to improve forecasts for the next 30 min (Fig. 3, after 1510 with yellow background). Compared with NODA, both 100M and 1KM indicate strong reflectivity, more similar to the observed reflectivity in the 10, 20, and 30-min forecasts, although the forecast skill drops quickly (Fig. 6). 1KM and 100M are generally similar, and if we focus on larger-scale structures that the 1-km NWP can resolve (typically larger than 5- to 10-km scales), 1KM is useful as suggested by the previous studies (e.g., Aksoy et al. 2009; Stensrud et al. 2013; Yussouf et al. 2013). 100M includes small-scale structures, though not necessarily close to the observations. 1KM has about twice as large RMSE for reflectivity at the initial time, but the RMSE grows slowly and becomes similar to 100M around 1520 JST (Fig. 3a). Namely, the 100M’s small-scale structures are close to the observations at the initial time but evolve rapidly and lose predictability after 10 min. The small-scale structures may be essential in the development of hazardous phenomena, and the skill in the initial 10 min may be critical for successful evacuation of those at risk.
Also, 100M potentially resolves local peak quantities. If we verify the 30-min accumulated rainfall amount from 1510 to 1540 JST in Kyotanabe, Japan (Fig. 6, cross marks), 100M shows 23.95 mm, very close to the JMA gauge observation (20.5 mm), while 1KM indicates only 0.95 mm and none for NODA. No other station from the JMA gauge network reported a significant rainfall greater than 10 mm for this 30-min period due to this event, and a thorough verification with more cases is beyond the scope of this article.
Both 100M and 1KM tend to overproduce the strong reflectivity regions east of the actual observed echoes and may lead to a false alert (Fig. 6). The simulated reflectivity shows an increasing bias trend: at 1540 JST, 100M (1KM) shows the bias of 17.24 (21.29) dBZ relative to the PAWR observations. Also, Fig. 3b shows a rapid growth of the forecast error for Doppler velocity in the first 5 min. 100M shows a more rapid error growth than 1KM, and loses the skill completely after 10 min, even worse than NODA. These results may be related to systematic model errors, particularly in the microphysics schemes, and a dynamical imbalance caused by the 30-s update cycles. The current NWP systems have not been designed or tested with such rapid updates and dense PAWR observations, and NWP system developments for frequent updates and O(100)-m grid spacing or smaller would be essential. Previous studies (e.g., Aksoy et al. 2009; Stensrud et al. 2013; Sun et al. 2014) also suggested that we can provide very accurate analyses, but the forecasts resulting from these analyses tend to lose predictability very rapidly. It is critically important to improve the forecasts for practical applications, and this remains a subject of future research.
Here we showed the general concept of BDA and the first proof-of-concept case study. Future work includes improving the forecast skill with more case studies and enhancing the computational performance with the full capacity of the K computer with nearly a million CPU cores. Improving microphysics and other components of the NWP model for severe weather prediction is a grand challenge. Also, it is important to explore what benefits we can get from the 100-m-mesh 30-s-update BDA for different types of high-impact weather cases at the cost of O(102) more computations than a 1-km-mesh counterpart. Moreover, we have another big data source for BDA: the JMA’s new geostationary meteorological satellite Himawari-8 that started full operations in July 2015 with full-disk observations every 10 min and simultaneous limited area observations every 2.5 min and even every 30 s. Radars capture raindrops, but not smaller cloud droplets, which satellite imagery can capture at an earlier stage of convective development. Exploring an effective use of the high-frequency geostationary imagery data from Himawari-8 for further improvements of convective predictability is an important focus of BDA research.
ACKNOWLEDGMENTS
The authors thank reviewer Altug Alsoy and the other three anonymous reviewers for constructive comments. The PAWR data are archived in the NICT Science Cloud and made available to the public through http://pawr.nict.go.jp/. The numerical experiment results are archived locally at RIKEN Advanced Institute for Computational Science, Kobe, Japan. The LETKF FORTRAN90 code is based on the publicly available code at https://github.com/takemasa-miyoshi/letkf. The NHM model code was provided by JMA under the cooperative research contract between RIKEN and MRI. This study was supported by CREST, JST. T.M. is the PI and leads and coordinates the BDA project. M.K., J.R., and G.Y.L. developed the codes, ran the numerical experiments, and visualized the results. S.S. and T.U. made the PAWR observations, provided the data, and visualized the observation data. S.S., T.U., K.B., H.S., H.T., and Y.I. are the co-PIs and provided fundamental ideas. The authors thank the other members of the BDA project for fruitful discussions.
FOR FURTHER READING
Aksoy, A., D. C. Dowell, and C. Snyder, 2009: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part I: Storm-scale analyses. Mon. Wea. Rev., 137, 1805–1824, doi:10.1175/2008MWR2691.1.
Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126, doi:10.1016/j.physd.2006.11.008.
Miyoshi, T., and M. Kunii, 2012: The local ensemble transform Kalman filter with the Weather Research and Forecasting model: Experiments with real observations. Pure Appl. Geophys., 169, 321–333, doi:10.1007/s00024-011-0373-4.
Ruiz, J. J., T. Miyoshi, S. Satoh, and T. Ushio, 2015: A quality control algorithm for the Osaka phased array weather radar. SOLA, 11, 48–52, doi:10.2151/sola.2015-011.
Saito, K., and Coauthors, 2006: The operational JMA nonhydrostatic mesoscale model. Mon. Wea. Rev., 134, 1266–1298, doi:10.1175/MWR3120.1.
Stensrud, D. J., and Coauthors, 2009: Convective-scale warn-on-forecast system: A vision for 2020. Bull. Amer. Meteor. Soc., 90, 1487–1499, doi:10.1175/2009BAMS2795.1.
Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.
Sun, J., and Coauthors, 2014: Use of NWP for nowcasting convective precipitation. Bull. Amer. Meteor. Soc., 95, 409–426, doi:10.1175/BAMS-D-11-00263.1.
Ushio, T., T. Wu, and S. Yoshida, 2014: Review of recent progress in lightning and thunderstorm detection techniques in Asia. Atmos. Res., 154, 89–102.
Yoshikawa, E., T. Ushio, Z. Kawasaki, S. Yoshida, T. Morimoto, F. Mizutani, and W. Wada, 2013: MMSE beam forming on fast-scanning phased array weather radar. IEEE Trans. Geosci. Remote Sens., 51, 3077–3088, doi:10.1109/TGRS.2012.2211607.
Yussouf, N., E. R. Mansell, L. J. Wicker, D. M. Wheatley, and D. J. Stensrud, 2013: The ensemble Kalman filter analyses and forecasts of the 8 May 2003 Oklahoma City tornadic supercell storm using single- and double-moment microphysics schemes. Mon. Wea. Rev., 141, 3388–3412, doi:10.1175/MWR-D-12-00237.1.
The other project is on pharmaceutical science, focusing on drug discovery and production.