Speeding Up the Computation of WRF Double-Moment 6-Class Microphysics Scheme with GPU

J. Mielikainen Space Science and Engineering Center, University of Wisconsin–Madison, Madison, Wisconsin

B. Huang Space Science and Engineering Center, University of Wisconsin–Madison, Madison, Wisconsin

H.-L. A. Huang Space Science and Engineering Center, University of Wisconsin–Madison, Madison, Wisconsin

M. D. Goldberg NOAA/NESDIS/Center for Satellite Applications and Research, College Park, Maryland

A. Mehta NOAA/NESDIS/Center for Satellite Applications and Research, College Park, Maryland


Abstract

The Weather Research and Forecasting (WRF) model double-moment 6-class microphysics scheme (WDM6) implements a double-moment bulk microphysical parameterization of clouds and precipitation and is applicable in mesoscale and general circulation models. WDM6 extends the WRF single-moment 6-class microphysics scheme (WSM6) by incorporating the number concentrations for cloud and rainwater along with a prognostic variable of cloud condensation nuclei (CCN) number concentration. Like WSM6, it also predicts the mixing ratios of six water species (water vapor, cloud droplets, cloud ice, snow, rain, and graupel). This paper describes how the computational performance of WDM6 was improved by exploiting its inherent fine-grained parallelism on NVIDIA graphics processing units (GPUs). Compared to a single-threaded CPU implementation, a single-GPU implementation of WDM6 obtains a speedup of 150× with the input/output (I/O) transfer and 206× without the I/O transfer. Using four GPUs, the speedup reaches 347× with the I/O transfer and 715× without it.
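The reported speedups can be related to one another with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes every speedup is measured against the same single-threaded CPU time, and the function names `scaling_efficiency` and `transfer_fraction` are hypothetical helpers introduced here.

```python
# Speedups quoted in the abstract, relative to a single-threaded CPU run.
# Keys are the number of GPUs used.
SPEEDUPS = {
    "with_io": {1: 150.0, 4: 347.0},
    "without_io": {1: 206.0, 4: 715.0},
}


def scaling_efficiency(speedup_one: float, speedup_n: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved going from 1 GPU to n_gpus."""
    return (speedup_n / speedup_one) / n_gpus


def transfer_fraction(speedup_with_io: float, speedup_without_io: float) -> float:
    """Share of GPU wall time attributable to I/O transfer.

    Assumption: both speedups share one CPU baseline time T, so the GPU run
    takes T / speedup_with_io including transfers and T / speedup_without_io
    excluding them.
    """
    return 1.0 - speedup_with_io / speedup_without_io


if __name__ == "__main__":
    for label, values in SPEEDUPS.items():
        eff = scaling_efficiency(values[1], values[4], 4)
        print(f"{label}: 4-GPU scaling efficiency = {eff:.1%}")
    for n_gpus in (1, 4):
        frac = transfer_fraction(
            SPEEDUPS["with_io"][n_gpus], SPEEDUPS["without_io"][n_gpus]
        )
        print(f"{n_gpus} GPU(s): I/O share of GPU run time = {frac:.1%}")
```

Under these assumptions, four GPUs reach roughly 58% of ideal linear scaling with the I/O transfer and roughly 87% without it, and the transfer accounts for about 27% of the run time on one GPU and about 51% on four. This suggests that data transfer, rather than computation, limits multi-GPU scaling.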

Corresponding author address: Jarno Mielikainen, Space Science and Engineering Center, University of Wisconsin–Madison, 1225 W. Dayton St., Madison, WI 53706. E-mail: mielikai@gmail.com
