  • Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758, https://doi.org/10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.
  • Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. Tellus, 61A, 84–96, https://doi.org/10.1111/j.1600-0870.2008.00371.x.
  • Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. Tellus, 61A, 97–111, https://doi.org/10.1111/j.1600-0870.2008.00372.x.
  • Bishop, C. H., B. Etherton, and S. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.
  • Bishop, C. H., J. S. Whitaker, and L. Lei, 2017: Gain form of the ensemble transform Kalman filter and its relevance to satellite data assimilation with model space ensemble covariance localization. Mon. Wea. Rev., 145, 4575–4592, https://doi.org/10.1175/MWR-D-17-0102.1.
  • Campbell, W. F., C. H. Bishop, and D. Hodyss, 2010: Vertical covariance localization for satellite radiances in ensemble Kalman filters. Mon. Wea. Rev., 138, 282–290, https://doi.org/10.1175/2009MWR3017.1.
  • Carrió, D. S., C. H. Bishop, and S. Kotsuki, 2021: Empirical determination of the covariance of forecast errors: An empirical justification and reformulation of hybrid covariance models. Quart. J. Roy. Meteor. Soc., 147, 2033–2052, https://doi.org/10.1002/qj.4008.
  • Clayton, A. M., A. C. Lorenc, and D. M. Barker, 2013: Operational implementation of a hybrid ensemble/4D‐Var global data assimilation system at the Met Office. Quart. J. Roy. Meteor. Soc., 139, 1445–1461, https://doi.org/10.1002/qj.2054.
  • Desroziers, G., L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of observation, background and analysis-error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, 3385–3396, https://doi.org/10.1256/qj.05.108.
  • Etherton, B. J., and C. H. Bishop, 2004: Resilience of hybrid ensemble/3DVAR analysis schemes to model error and ensemble covariance error. Mon. Wea. Rev., 132, 1065–1080, https://doi.org/10.1175/1520-0493(2004)132<1065:ROHDAS>2.0.CO;2.
  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143–10 162, https://doi.org/10.1029/94JC00572.
  • Greybush, S. J., E. Kalnay, T. Miyoshi, K. Ide, and B. R. Hunt, 2011: Balance and ensemble Kalman filter localization techniques. Mon. Wea. Rev., 139, 511–522, https://doi.org/10.1175/2010MWR3328.1.
  • Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919, https://doi.org/10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.
  • Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790, https://doi.org/10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.
  • Hamrud, M., M. Bonavita, and L. Isaksen, 2015: EnKF and hybrid gain ensemble data assimilation. Part I: EnKF implementation. Mon. Wea. Rev., 143, 4847–4864, https://doi.org/10.1175/MWR-D-14-00333.1.
  • Hotta, D., and Y. Ota, 2021: Why does EnKF suffer from analysis overconfidence? An insight into exploiting the ever‐increasing volume of observations. Quart. J. Roy. Meteor. Soc., 147, 1258–1277, https://doi.org/10.1002/qj.3970.
  • Houtekamer, P. L., and H. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137, https://doi.org/10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.
  • Houtekamer, P. L., and F. Zhang, 2016: Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 144, 4489–4532, https://doi.org/10.1175/MWR-D-15-0440.1.
  • Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126, https://doi.org/10.1016/j.physd.2006.11.008.
  • Kondo, K., and T. Miyoshi, 2016: Impact of removing covariance localization in an ensemble Kalman filter: Experiments with 10 240 members using an intermediate AGCM. Mon. Wea. Rev., 144, 4849–4865, https://doi.org/10.1175/MWR-D-15-0388.1.
  • Kondo, K., and T. Miyoshi, 2019: Non-Gaussian statistics in global atmospheric dynamics: A study with a 10 240-member ensemble Kalman filter using an intermediate atmospheric general circulation model. Nonlinear Processes Geophys., 26, 211–225, https://doi.org/10.5194/npg-26-211-2019.
  • Kondo, K., T. Miyoshi, and H. L. Tanaka, 2013: Parameter sensitivities of the dual-localization approach in the local ensemble transform Kalman filter. SOLA, 9, 174–178, https://doi.org/10.2151/sola.2013-039.
  • Kotsuki, S., A. Pensoneault, A. Okazaki, and T. Miyoshi, 2020: Weight structure of the local ensemble transform Kalman filter: A case with an intermediate atmospheric general circulation model. Quart. J. Roy. Meteor. Soc., 146, 3399–3415, https://doi.org/10.1002/qj.3852.
  • Kretschmer, M., B. R. Hunt, and E. Ott, 2015: Data assimilation using a climatologically augmented local ensemble transform Kalman filter. Tellus, 67A, 26617, https://doi.org/10.3402/tellusa.v67.26617.
  • Kuhl, D. D., T. E. Rosmond, C. H. Bishop, J. McLay, and N. L. Baker, 2013: Comparison of hybrid ensemble/4DVar and 4DVar within the NAVDAS-AR data assimilation framework. Mon. Wea. Rev., 141, 2740–2758, https://doi.org/10.1175/MWR-D-12-00182.1.
  • Lei, L., J. S. Whitaker, and C. Bishop, 2018: Improving assimilation of radiance observations by implementing model space localization in an ensemble Kalman filter. J. Adv. Model. Earth Syst., 10, 3221–3232, https://doi.org/10.1029/2018MS001468.
  • Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP—A comparison with 4D‐Var. Quart. J. Roy. Meteor. Soc., 129, 3183–3203, https://doi.org/10.1256/qj.02.132.
  • Lu, X., X. Wang, Y. Li, M. Tong, and X. Ma, 2017: GSI‐based ensemble‐variational hybrid data assimilation for HWRF for hurricane initialization and prediction: Impact of various error covariances for airborne radar observation assimilation. Quart. J. Roy. Meteor. Soc., 143, 223–239, https://doi.org/10.1002/qj.2914.
  • Miyoshi, T., 2005: Ensemble Kalman filter experiments with a primitive-equation global model. Ph.D. dissertation, University of Maryland, College Park, 197 pp., https://github.com/takemasa-miyoshi/letkf.
  • Miyoshi, T., 2011: The Gaussian approach to adaptive covariance inflation and its implementation with the local ensemble transform Kalman filter. Mon. Wea. Rev., 139, 1519–1535, https://doi.org/10.1175/2010MWR3570.1.
  • Miyoshi, T., and S. Yamane, 2007: Local ensemble transform Kalman filtering with an AGCM at a T159/L48 resolution. Mon. Wea. Rev., 135, 3841–3861, https://doi.org/10.1175/2007MWR1873.1.
  • Miyoshi, T., K. Kondo, and T. Imamura, 2014: The 10 240‐member ensemble Kalman filtering with an intermediate AGCM. Geophys. Res. Lett., 41, 5264–5271, https://doi.org/10.1002/2014GL060863.
  • Molteni, F., 2003: Atmospheric simulations using a GCM with simplified physical parametrizations. I: Model climatology and variability in multi-decadal experiments. Climate Dyn., 20, 175–191, https://doi.org/10.1007/s00382-002-0268-2.
  • Piccolo, C., M. J. Cullen, W. J. Tennant, and A. T. Semple, 2019: Comparison of different representations of model error in ensemble forecasts. Quart. J. Roy. Meteor. Soc., 145, 15–27, https://doi.org/10.1002/qj.3348.
  • Posselt, D. J., and C. H. Bishop, 2012: Nonlinear parameter estimation: Comparison of an ensemble Kalman smoother with a Markov chain Monte Carlo algorithm. Mon. Wea. Rev., 140, 1957–1974, https://doi.org/10.1175/MWR-D-11-00242.1; Corrigendum. Mon. Wea. Rev., 142, 1382, https://doi.org/10.1175/MWR-D-13-00342.1.
  • Schraff, C., H. Reich, A. Rhodin, A. Schomburg, K. Stephan, A. Periáñez, and R. Potthast, 2016: Kilometre‐scale ensemble data assimilation for the COSMO model (KENDA). Quart. J. Roy. Meteor. Soc., 142, 1453–1472, https://doi.org/10.1002/qj.2748.
  • Wang, X., and T. Lei, 2014: GSI-based four-dimensional ensemble–variational (4DEnsVar) data assimilation: Formulation and single-resolution experiments with real data for NCEP Global Forecast System. Mon. Wea. Rev., 142, 3303–3325, https://doi.org/10.1175/MWR-D-13-00303.1.
  • Wang, X., C. H. Bishop, and S. J. Julier, 2004: Which is better, an ensemble of positive–negative pairs or a centered spherical simplex ensemble? Mon. Wea. Rev., 132, 1590–1605, https://doi.org/10.1175/1520-0493(2004)132<1590:WIBAEO>2.0.CO;2.
  • Yang, S.-C., M. Corazza, A. Carrassi, E. Kalnay, and T. Miyoshi, 2009: Comparison of local ensemble transform Kalman filter, 3DVAR, and 4DVAR in a quasigeostrophic model. Mon. Wea. Rev., 137, 693–709, https://doi.org/10.1175/2008MWR2396.1.
    Schematic illustration of (a) the SPEEDY model combined with the LETKF and (b) SPEEDY with the hybrid LETKF. The x axis represents the DA cycle, and squiggly lines represent the accumulation of DA cycles. For the hybrid LETKF, no update is applied to the climatological component, in contrast to the ensemble component (cf. section 2c). The purple line represents the collection of ensemble perturbations [$\delta\mathbf{X}^{b}_{\mathrm{ens}}(1)$]. The observation operator (obsope) yields ensemble forecasts in observation space ($\mathbf{H}\mathbf{X}^b$) using the observation ($\mathbf{y}^o$) and ensemble forecasts ($\mathbf{X}^b$).

    (a) Observation network. Black dots and red crosses represent SPEEDY model grid points and observation points, respectively. (b) Numbers of assimilated observations at fourth-level SPEEDY grid points at a horizontal localization scale (Le) of 900 km.

    First-guess (FG; i.e., 6-h forecast) root-mean-square errors (RMSEs; solid lines) and ensemble spreads (dashed lines) for temperature (K) at 500 hPa averaged over 2 years from January 1986 to December 1987. Black and red lines represent R- and Z-localizations, respectively. The abscissa shows the horizontal localization scale (Le). The LETKF experiments used (a) 10, (b) 20, and (c) 40 members.

    Spatial patterns of analysis increments for temperature (K) at 500 hPa with climatological error covariance $\mathbf{P}^{b}_{\mathrm{clm}}$ without localization. Results are shown for (a) c = 20, (b) c = 100, (c) c = 240, (d) c = 365, (e) c = 730, and (f) c = 1095. Cross marks represent the observation point.

    As in Fig. 4, but also showing differences between cases of (left) no localization and (right) localization with Le = Lc = 800 km. The ensemble and climatological sizes were m = 20 and c = 730, respectively. The hybrid coefficients were (a),(b) α = 0.999; (c),(d) α = 0.500; and (e),(f) α = 0.001. Black lines indicate the localization cutoff radii, determined as $2\sqrt{10/3}\,L_e$. The “no loc” in (a), (c), and (e) means no localization. Cross marks represent the observation point.

    As in Fig. 4, but also showing cases of (a),(c),(e) no localization and (b),(d),(f) localization with Le = 800 km and Lc = 2400 km. Black solid and dashed lines indicate the localization cutoff radii for ensemble and climatological perturbations, calculated as $2\sqrt{10/3}\,L_e$ and $2\sqrt{10/3}\,L_c$, respectively. Panels (a), (c), and (e) are the same as Figs. 5a, 5c, and 5e. The “no loc” in (a), (c), and (e) means no localization. Cross marks represent the observation point.

    First-guess (FG; i.e., 6-h forecast) RMSEs from 20-member experiments for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986 as a function of the hybrid coefficient α. The dashed line and solid lines indicate the RMSEs for the LETKF and hybrid LETKF, respectively, for c = 20 (green), c = 100 (blue), and c = 365 (red).

    Spatial patterns of first-guess (i.e., 6-h forecast) RMSEs for the 20-ensemble LETKF and hybrid LETKF for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986. EXP1 employed the LETKF, and EXP2, EXP3, and EXP4 employed the hybrid LETKF, with climatological sizes of 20, 100, and 365, respectively. The hybrid coefficient was α = 0.700. (a) RMSEs for EXP1. RMSE differences between experiments (b) EXP2 and EXP1, (c) EXP3 and EXP1, and (d) EXP4 and EXP1. The localization scales were Le = Lc = 900 km. Warmer colors indicate better RMSE results compared with EXP1.

    First-guess (FG; i.e., 6-h forecast) RMSEs of 20-member experiments for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986 as a function of the hybrid coefficient α. Experiments for which (a) c = 20, (b) c = 100, and (c) c = 365 using different localization scales (black, Le = Lc = 900 km; dark green, Le = Lc = 1000 km; green, Le = Lc = 1100 km; blue, Le = Lc = 1200 km; purple, Le = Lc = 1300 km; red, Le = Lc = 1400 km, and magenta, Le = Lc = 1500 km). Dashed lines indicate RMSEs for the LETKF, and gray shading indicates the ± 10% range.

    Globally averaged first-guess (i.e., 6-h forecast) RMSEs from 20-member experiments for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986. The x and y axes show the localization scales for ensemble and climatological perturbations (Le and Lc), respectively. The climatological sizes were (top) c = 100 and (bottom) c = 365, and the hybrid coefficients were (a),(d) α = 0.600; (b),(e) α = 0.700; and (c),(f) α = 0.800.

    Spatial patterns of first-guess (i.e., 6-h forecast) RMSEs for the hybrid LETKF with m = 20 and c = 365 for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986. The hybrid coefficient was α = 0.700. (a) RMSEs for EXP1. Differences in RMSE between (b) EXP2 and EXP1, (c) EXP3 and EXP1, and (d) EXP4 and EXP1. The localization scales were Le = Lc = 900 km for EXP1, Le = 1300 km and Lc = 900 km for EXP2, Le = 900 km and Lc = 1300 km for EXP3, and Le = Lc = 1300 km for EXP4. Warmer colors indicate better RMSE results compared with EXP1.

    First-guess (FG; i.e., 6-h forecast) RMSEs from (a),(b) 10-member; (c),(d) 20-member; and (e),(f) 40-member experiments for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986, as a function of the hybrid coefficient α. Climatological sizes were set as (left) c = 20 and (right) c = 365. Dashed lines indicate the RMSEs for the LETKF, and gray shading indicates the ± 10% range.

    Computational times (s) for 20-member data assimilation (DA) averaged over 40 DA cycles as a function of ensemble size plus climatological size (m + c). (a),(b) The normal scale and log-scale elapsed times, respectively. Black lines show the standard ETKF, which always employs the eigenvalue decomposition of $\mathbf{I} + (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{H}\mathbf{Z}^b$ {$O[(m + c)^3]$}, whereas red lines show the optimal eigendecomposition (OED) ETKF, which employs the eigenvalue decomposition of $(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{H}\mathbf{Z}^b$ {$O[(m + c)^3]$} or $\mathbf{R}^{-1/2}\mathbf{H}\mathbf{Z}^b(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}}\mathbf{R}^{-1/2}$ [$O(p^3)$] adaptively, depending on the number of local observations (p) and the ensemble plus climatological size (m + c). Blue and green dashed lines in (b) indicate the functions $[(m + c)/20]^2$ and $[(m + c)/20]^3$, respectively.

    As in Fig. 9, but for experiments with imperfect SPEEDY model using different localization scales (black, Le = Lc = 700 km; dark green, Le = Lc = 800 km; green, Le = Lc = 900 km; blue, Le = Lc = 1000 km; purple, Le = Lc = 1100 km; red, Le = Lc = 1200 km, and magenta, Le = Lc = 1300 km). Dashed lines indicate RMSEs for the LETKF (0.999 K), and gray shading indicates the ± 10% range.

    Spatial patterns of the number of assimilated observations at fourth-level SPEEDY grid points at horizontal localization scale (Le) of (a) 500 km, (b) 1000 km, (c) 1500 km and (d) 2000 km.

Implementing Hybrid Background Error Covariance into the LETKF with Attenuation-Based Localization: Experiments with a Simplified AGCM

  • a Center for Environmental Remote Sensing, Chiba University, Chiba, Japan
  • b RIKEN Center for Computational Science, Kobe, Japan
  • c PRESTO, Japan Science and Technology Agency, Chiba, Japan
  • d School of Geography, Earth and Atmospheric Sciences, The University of Melbourne, Parkville, Victoria, Australia
  • e ARC Centre of Excellence for Climate Extremes, Parkville, Victoria, Australia
Open access

Abstract

Recent numerical weather prediction systems have significantly improved medium-range forecasts by implementing hybrid background error covariance, in which climatological (static) and ensemble-based (flow-dependent) error covariances are combined. While the hybrid approach has been investigated mainly in variational systems, this study explores methods for implementing the hybrid approach in the local ensemble transform Kalman filter (LETKF). Following Kretschmer et al., the present study constructs hybrid background error covariance by adding collections of climatological perturbations to the forecast ensemble. In addition, this study proposes a new localization method that attenuates the ensemble perturbations (Z-localization) instead of inflating the observation error variance (R-localization). A series of experiments with a simplified global atmospheric model revealed that the hybrid LETKF resulted in smaller forecast errors than the LETKF, especially in sparsely observed regions. Because of the larger ensemble enabled by the hybrid approach, the optimal localization length scales for the hybrid LETKF were larger than those for the LETKF. With the LETKF, Z-localization resulted in forecast errors similar to those of R-localization. However, Z-localization has the advantage of allowing different localization scales for flow-dependent and climatological (static) perturbations in the hybrid LETKF. The optimal localization scale for climatological perturbations was slightly larger than that for flow-dependent perturbations. This study also proposes an optimal eigendecomposition (OED) ETKF formulation to reduce computational costs. The computational expense of the OED ETKF formulation became significantly smaller than that of standard ETKF formulations as the number of climatological perturbations was increased beyond a few hundred.

© 2022 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Shunji Kotsuki, shunji.kotsuki@chiba-u.jp

1. Introduction

In numerical weather prediction (NWP), data assimilation (DA) combines information from NWP forecasts initialized from recent initial conditions with millions of observations to create a new set of initial conditions. The optimality of the combination critically depends on the accuracy of the estimates of the background and observation error covariances used in the DA procedure. Recent operational NWP systems have significantly improved medium-range forecasts by implementing hybrid background error covariance models, in which climatological (i.e., static) and ensemble-based (i.e., flow-dependent) error covariances are combined (e.g., Clayton et al. 2013; Kuhl et al. 2013; Wang and Lei 2014). This approach was first investigated by Hamill and Snyder (2000) and Lorenc (2003), who estimated the error covariance as a linear combination of climatological and flow-dependent error covariances. Carrió et al. (2021) compared the true forecast error covariance with static and flow-dependent error covariances using an intermediate atmospheric general circulation model and found that the hybrid approach provided better estimates of the forecast error covariance than either the static or the flow-dependent method alone. The hybrid approach has been used mainly in variational DA systems to incorporate flow-dependent error covariances in addition to static error covariance. By contrast, hybrid DA has not been thoroughly investigated for ensemble Kalman filter (EnKF; Evensen 1994) approaches, which constitute another major class of DA methods for NWP.

The EnKF has been explored intensively over the past two decades as a practical advanced DA option for NWP. Its advantages include computationally affordable flow-dependent error covariance estimates, ease of implementation, and representation of the analysis error by the analysis ensemble (Houtekamer and Zhang 2016). Among the various types of EnKFs, the local ensemble transform Kalman filter (LETKF; Hunt et al. 2007) is used operationally at NWP centers such as the Deutscher Wetterdienst and the Japan Meteorological Agency. Ensemble-based error covariance estimates are typically restricted to a much lower rank than the error covariances used in variational methods; in practice, this rank deficiency is compensated for by localization (Houtekamer and Mitchell 2001; Hamill et al. 2001). Kretschmer et al. (2015) proposed an alternative method to ameliorate this rank deficiency. They introduced a hybrid background error covariance by augmenting the current flow-dependent perturbations with a climatological sample of perturbations. This combined ensemble was then used in the ETKF (Bishop et al. 2001). Kretschmer et al. (2015) tested this augmented approach using a low-dimensional Lorenz II model and reported that it stabilized an LETKF-based DA system with small ensembles. However, the hybrid approach of Kretschmer et al. (2015) has not been tested for high-dimensional models such as NWP models.

The objective of the present study is to explore methods for implementing the hybrid approach for NWP models. Following Kretschmer et al. (2015), we construct hybrid background error covariances by adding random samples of ensemble perturbations, drawn from a climatology of previously made ensemble forecasts, to the current ensemble forecast. This procedure enables the ensemble DA scheme to use a hybrid mixture of an approximation to the climatological mean of the forecast error covariance matrix and today’s flow-dependent ensemble. In addition, it increases the number of perturbations used by the DA scheme and hence boosts the rank of the background error covariance model. For the hybrid LETKF, we propose a new localization method that attenuates the ensemble perturbations instead of inflating the observation error variance. The new localization method can apply different localization scales to flow-dependent and climatological perturbations. Finally, this study introduces an optimal eigendecomposition (OED) ETKF formulation to reduce computational cost. We implement the hybrid LETKF, the new localization method, and the OED ETKF formulation using the Simplified Parameterizations, Primitive Equation Dynamics (SPEEDY) model, based on an available SPEEDY-LETKF system (Miyoshi 2005; https://github.com/takemasa-miyoshi/letkf).
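The rank boost described above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the toy sizes, the random draws standing in for real forecasts and archived perturbations, the helper `normalized_perturbations`, and in particular the weights `sqrt(alpha)` and `sqrt(1 - alpha)`, which are one common way to build a weighted sum of two covariances and are not taken from the paper (its own definition is given in section 2c).

```python
import numpy as np

def normalized_perturbations(X):
    """Return Z = (X - ensemble mean) / sqrt(m - 1) for an (n, m) array X."""
    m = X.shape[1]
    return (X - X.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)

rng = np.random.default_rng(0)
n, m, c = 50, 5, 12   # toy state, ensemble, and climatological sizes (hypothetical)
alpha = 0.7           # hypothetical weight on the flow-dependent part

# Stand-ins: random draws replace real forecasts and archived perturbations.
Z_ens = normalized_perturbations(rng.standard_normal((n, m)))
Z_clm = normalized_perturbations(rng.standard_normal((n, c)))

# Augmented perturbation matrix: its outer product Z_hyb @ Z_hyb.T equals
# alpha * Z_ens @ Z_ens.T + (1 - alpha) * Z_clm @ Z_clm.T.
Z_hyb = np.hstack([np.sqrt(alpha) * Z_ens, np.sqrt(1.0 - alpha) * Z_clm])

rank_ens = np.linalg.matrix_rank(Z_ens)   # at most m - 1
rank_hyb = np.linalg.matrix_rank(Z_hyb)   # at most (m - 1) + (c - 1)
print(rank_ens, rank_hyb)
```

Because each set of perturbations has zero column mean, each contributes at most its size minus one to the rank, which is why the augmented matrix can span many more directions than the flow-dependent ensemble alone.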

This paper is organized as follows: section 2 describes the DA and localization methods, section 3 explains the experimental setting, sections 4 and 5 present and discuss the results, respectively, and section 6 provides a summary. In this study, we updated the SPEEDY-LETKF code for three purposes. The first update is the implementation of the OED ETKF formulation, which reduces the computational cost of the LETKF, a scheme that solves the ETKF at every analysis grid point. The OED ETKF is a simple modification of the ETKF solvers in LETKF DA systems. It is described in section 2a, and its computational cost is compared with that of the original LETKF solver in section 5c. The second update is the new attenuation-based localization, named Z-localization, which is defined in section 2b and compared with the classical R-localization in section 4a. The impacts of applying different localization scales with Z-localization are discussed in section 5a. The third update is the augmentation with climatological perturbations for the hybrid LETKF, which is defined in section 2c. Section 3b describes the implementation of the hybrid LETKF in the existing LETKF system, and sections 4b, 5b, and 5d compare the forecast accuracy of the hybrid LETKF with that of the original LETKF.

2. Methodology

a. Ensemble transform Kalman filter (ETKF)

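The ETKF (Bishop et al. 2001) is an ensemble square root filter that is widely used for DA in the geosciences. We assume an ensemble $\mathbf{X}$ with an ensemble mean vector $\bar{\mathbf{x}}$ $(n)$ and a perturbation matrix $\delta\mathbf{X} \equiv \{\mathbf{x}^{(1)} - \bar{\mathbf{x}}, \ldots, \mathbf{x}^{(m)} - \bar{\mathbf{x}}\}$ $(n \times m)$, where $n$ is the model dimension, $m$ is the ensemble size, and the overbar indicates an ensemble mean. The error covariance matrix in EnKF is estimated using the ensemble perturbations as follows:
$$\mathbf{P} \approx \mathbf{Z}\mathbf{Z}^{\mathrm{T}} \quad (n \times n), \tag{1}$$
where $\mathbf{Z} \equiv \delta\mathbf{X}/\sqrt{m-1}$ $(n \times m)$. With this approximation, the rank of $\mathbf{P}$ is equal to that of $\mathbf{Z}$, which is at most $m - 1$ (Hunt et al. 2007). Bishop et al. (2001) proposed an ETKF for an $(m - 1)$-dimensional subspace (ensemble space) spanned by $\mathbf{Z}^b$. The ETKF transforms the background and analysis error covariances from the ensemble space to the model space as follows:
$$\mathbf{P}^b = \mathbf{Z}^b\tilde{\mathbf{P}}^b(\mathbf{Z}^b)^{\mathrm{T}}, \tag{2}$$
$$\mathbf{P}^a = \mathbf{Z}^b\tilde{\mathbf{P}}^a(\mathbf{Z}^b)^{\mathrm{T}}, \tag{3}$$
where $\tilde{\mathbf{P}}$ $(m \times m)$ is the error covariance matrix in the ensemble space, the tilde indicates the ensemble space, and the superscripts $b$ and $a$ represent the background (prior) and analysis (posterior), respectively. Equations (1) and (2) yield the background error covariance in the ensemble space as $\tilde{\mathbf{P}}^b = \mathbf{I}$. Assuming Gaussian forecast and observation error covariances, Hunt et al. (2007) showed that
$$(\tilde{\mathbf{P}}^a)^{-1} = (\tilde{\mathbf{P}}^b)^{-1} + (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{H}\mathbf{Z}^b, \tag{4}$$
$$\tilde{\mathbf{P}}^a = [\mathbf{I} + (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{H}\mathbf{Z}^b]^{-1}, \tag{5}$$
where $\mathbf{R}$ $(p \times p)$ is the observation error covariance, $\mathbf{H}$ $(p \times n)$ is the linear observation operator, and $p$ is the number of observations. For the nonlinear observation operator $H$, we compute $\mathbf{H}\mathbf{Z}^b$ as follows:
$$\mathbf{H}\mathbf{Z}^b \approx \{H(\mathbf{X}^b) - \overline{H(\mathbf{X}^b)}\,\mathbf{1}\}/\sqrt{m-1}, \tag{6}$$
where $\mathbf{1}$ is the row vector in which all components are 1 (i.e., $\mathbf{1} \equiv [1, 1, \ldots, 1]$).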
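As a rough illustration of Eqs. (1), (5), and (6), the following NumPy sketch builds the normalized perturbations, applies a nonlinear observation operator member by member, and forms the ensemble-space analysis covariance. The toy sizes, the diagonal $\mathbf{R}$, and the function `obs_operator` (which simply squares the first `p` state variables) are hypothetical choices made only so that the example runs.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 8, 4, 3                       # toy dimensions (hypothetical)

Xb = rng.standard_normal((n, m))        # background ensemble, one member per column
xb_mean = Xb.mean(axis=1, keepdims=True)
Zb = (Xb - xb_mean) / np.sqrt(m - 1)    # Z = dX / sqrt(m - 1)
Pb = Zb @ Zb.T                          # Eq. (1): sample covariance, rank <= m - 1

def obs_operator(X):
    """Hypothetical nonlinear H: squares the first p state variables."""
    return X[:p, :] ** 2

# Eq. (6): perturbations in observation space from the nonlinear operator.
HXb = obs_operator(Xb)
HZb = (HXb - HXb.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)

# Eq. (5): analysis error covariance in ensemble space (an m x m matrix).
R = np.diag(np.full(p, 0.5))            # assumed diagonal observation error covariance
Pa_tilde = np.linalg.inv(np.eye(m) + HZb.T @ np.linalg.inv(R) @ HZb)
print(Pb.shape, HZb.shape, Pa_tilde.shape)
```

Only the small $m \times m$ matrix is inverted here; the $n \times n$ matrix `Pb` is formed purely for illustration and is never needed in the filter itself.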
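Earlier, starting from the Kalman filter equations, which deliver the best linear unbiased estimate (BLUE) of the posterior state regardless of the degree of Gaussianity of the uncertainty distribution, Bishop et al. (2001) noted that the computational expense of solving the equations could be linked to the number of nonzero singular values in the decomposition $\mathbf{R}^{-1/2}\mathbf{H}\mathbf{Z}^b = \mathbf{E}\boldsymbol{\Gamma}^{1/2}\mathbf{C}^{\mathrm{T}}$. For typical applications, the ensemble size $m$ and the observation size $p$ are much smaller than the model dimension $n$, and $m$ is usually smaller than $p$. In the case that $m < p$, the maximum number of nonzero eigenvalues in $\boldsymbol{\Gamma}$ is $m - 1$, and hence the most efficient way of obtaining $\boldsymbol{\Gamma}$ and $\mathbf{C}$ is to solve the $m \times m$ eigenvalue problem given by
$$(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{H}\mathbf{Z}^b = \mathbf{C}\boldsymbol{\Gamma}\mathbf{C}^{\mathrm{T}} \quad (m \times m), \tag{7}$$
where $\mathbf{C}$ ($m \times g$) represents the eigenvectors and $\boldsymbol{\Gamma}$ ($g \times g$) is the diagonal matrix whose components are the eigenvalues. The rank of $(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{H}\mathbf{Z}^b$ is $g = \min(n, p, m - 1)$. When $m < p$, solving (7) has a similar cost to the matrix inversion [Eq. (5)] and involves $O(m^3)$ operations. If $p > m$, then forming $(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{H}\mathbf{Z}^b$ requires $O(m^2 p)$ operations, which is often larger than the cost of the eigenvalue decomposition. The analysis increment is given by
$$\bar{\mathbf{x}}^a - \bar{\mathbf{x}}^b = \mathbf{P}^a \mathbf{H}^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{d}^{o-b}, \tag{8}$$
$$= \mathbf{Z}^b [\mathbf{I} + (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{H}\mathbf{Z}^b]^{-1} (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{d}^{o-b}, \tag{9}$$
$$= \mathbf{Z}^b \mathbf{C} (\mathbf{I} + \boldsymbol{\Gamma})^{-1} \mathbf{C}^{\mathrm{T}} (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{d}^{o-b}, \tag{10}$$
where $\mathbf{d}^{o-b}$ (length $p$) is the observation departure [$\mathbf{d}^{o-b} \equiv \mathbf{y}^o - \overline{H(\mathbf{X}^b)}$] and $\mathbf{y}^o$ (length $p$) is the observation. The superscript $o$ indicates the observation. The analysis perturbation is given by
$$\mathbf{Z}^a = \mathbf{Z}^b (\tilde{\mathbf{P}}^a)^{1/2} = \mathbf{Z}^b \mathbf{C} (\mathbf{I} + \boldsymbol{\Gamma})^{-1/2} \mathbf{C}^{\mathrm{T}}. \tag{11}$$
Consequently, the analysis ensemble is given by
$$\mathbf{X}^a = \bar{\mathbf{x}}^a \mathbf{1} + \sqrt{m-1}\,\mathbf{Z}^a. \tag{12}$$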
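The update in Eqs. (7) and (10)–(12) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions rather than the authors' implementation: the function name and toy data are hypothetical, and the sketch keeps all $m$ eigenvectors returned by `numpy.linalg.eigh` (including those with zero eigenvalue) so that $\mathbf{C}(\mathbf{I} + \boldsymbol{\Gamma})^{-1/2}\mathbf{C}^{\mathrm{T}}$ is the full symmetric square root.

```python
import numpy as np

def etkf_update_mxm(xb_mean, Zb, HZb, R, d):
    """ETKF mean and perturbation update via the m x m eigenproblem."""
    m = Zb.shape[1]
    S = HZb.T @ np.linalg.solve(R, HZb)           # (HZb)^T R^-1 HZb, m x m
    gam, C = np.linalg.eigh(S)                    # S = C diag(gam) C^T, Eq. (7)
    gam = np.clip(gam, 0.0, None)                 # remove round-off negatives
    rhs = C.T @ (HZb.T @ np.linalg.solve(R, d))   # C^T (HZb)^T R^-1 d
    xa_mean = xb_mean + Zb @ (C @ (rhs / (1.0 + gam)))   # Eq. (10)
    T = (C / np.sqrt(1.0 + gam)) @ C.T            # C (I + Gamma)^-1/2 C^T
    Za = Zb @ T                                   # Eq. (11)
    Xa = xa_mean[:, None] + np.sqrt(m - 1) * Za   # Eq. (12)
    return xa_mean, Za, Xa

# Toy problem with m < p (sizes and values are illustrative assumptions)
rng = np.random.default_rng(1)
n, m, p = 6, 4, 9
X = rng.normal(size=(n, m))
xb_mean = X.mean(axis=1)
Zb = (X - xb_mean[:, None]) / np.sqrt(m - 1)
H = rng.normal(size=(p, n))                       # hypothetical linear operator
R = np.diag(rng.uniform(0.5, 2.0, size=p))
d = rng.normal(size=p)
xa_mean, Za, Xa = etkf_update_mxm(xb_mean, Zb, H @ Zb, R, d)
```

With a linear operator, the increment agrees with the model-space Kalman gain applied to $\mathbf{d}^{o-b}$, which gives a convenient consistency check for small problems.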
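However, as noted in Posselt and Bishop (2012), if $m$ is larger than $p$, then it is computationally less expensive to obtain $\boldsymbol{\Gamma}$ from
$$\mathbf{R}^{-1/2} \mathbf{H}\mathbf{Z}^b (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1/2} = \mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^{\mathrm{T}} \quad (p \times p), \tag{13}$$
where $\mathbf{E}$ ($p \times p$) represents the eigenvectors. (Note that with $m > p$, all of the $p$ eigenvalues in $\boldsymbol{\Gamma}$ are greater than zero, and hence this diagonal matrix is trivial to invert.) The number of operations to achieve this eigenvalue decomposition is $O(p^3)$, which is smaller than the $O(m^3)$ operations for Eq. (7), under the condition that the number of assimilated observations $p$ is smaller than the ensemble size $m$. The eigenvectors $\mathbf{C}$ and $\mathbf{E}$ satisfy
$$\mathbf{R}^{-1/2} \mathbf{H}\mathbf{Z}^b = \mathbf{E}\boldsymbol{\Gamma}^{1/2}\mathbf{C}^{\mathrm{T}}. \tag{14}$$
Consequently, the concise $m \times p$ matrix $\mathbf{C}$ is given by
$$\mathbf{C} = (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1/2} \mathbf{E}\boldsymbol{\Gamma}^{-1/2}. \tag{15}$$
Using (15) in (11) then provides a less computationally expensive update of the ensemble mean. In this case, as noted in Posselt and Bishop (2012), the appropriate update for the analysis perturbations is
$$\mathbf{Z}^a = \mathbf{Z}^b \{\mathbf{I} - \mathbf{C}[\mathbf{I} - (\mathbf{I} + \boldsymbol{\Gamma})^{-1/2}]\mathbf{C}^{\mathrm{T}}\}. \tag{16}$$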

In passing, note that Eq. (16) is also the basis of the gain form of the ETKF (Bishop et al. 2017; Lei et al. 2018), which is particularly useful when modulated ensembles are used to achieve model space localization (Bishop and Hodyss 2009a,b).

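Consequently, to minimize computational costs, in this study, the analysis update equations depend on the number of local observations ($p$) and the ensemble size ($m$). Specifically, we use
$$\bar{\mathbf{x}}^a - \bar{\mathbf{x}}^b = \mathbf{Z}^b \mathbf{C} (\mathbf{I} + \boldsymbol{\Gamma})^{-1} \mathbf{C}^{\mathrm{T}} (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{d}^{o-b}, \tag{17}$$
$$\mathbf{Z}^a = \mathbf{Z}^b (\tilde{\mathbf{P}}^a)^{1/2} = \begin{cases} \mathbf{Z}^b \mathbf{C} (\mathbf{I} + \boldsymbol{\Gamma})^{-1/2} \mathbf{C}^{\mathrm{T}}, & m < p \\ \mathbf{Z}^b \{\mathbf{I} - \mathbf{C}[\mathbf{I} - (\mathbf{I} + \boldsymbol{\Gamma})^{-1/2}]\mathbf{C}^{\mathrm{T}}\}, & m \ge p, \end{cases} \tag{18}$$
where the eigenvalue decomposition is solved by
$$\begin{cases} (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{H}\mathbf{Z}^b = \mathbf{C}\boldsymbol{\Gamma}\mathbf{C}^{\mathrm{T}}, & m < p \\ \mathbf{R}^{-1/2} \mathbf{H}\mathbf{Z}^b (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1/2} = \mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^{\mathrm{T}}, & m \ge p. \end{cases} \tag{19}$$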

This study denotes this optimal selection of the eigenvalue decomposition as the OED ETKF formulation. The extent to which the OED ETKF formulation [Eqs. (17)–(19)] achieves computational savings over the Hunt et al. (2007) approach [Eqs. (10), (11), and (5)] with our experimental settings is assessed later in the paper (section 5c and Fig. 13).
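The branch selection can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code used in the study: the function names are hypothetical, $\mathbf{R}$ is assumed diagonal and passed as its diagonal `R_diag`, and a small tolerance is added (an assumption of this sketch) to discard numerically zero eigenvalues in the $p \times p$ branch.

```python
import numpy as np

def oed_etkf_weights(HZb, R_diag, d):
    """Return (w, T) with xa_mean = xb_mean + Zb @ w and Za = Zb @ T.
    The eigenproblem is m x m if m < p and p x p otherwise."""
    p, m = HZb.shape
    Y = HZb / np.sqrt(R_diag)[:, None]           # R^-1/2 HZb
    if m < p:
        gam, C = np.linalg.eigh(Y.T @ Y)         # m x m problem
        gam = np.clip(gam, 0.0, None)            # remove round-off negatives
    else:
        gam, E = np.linalg.eigh(Y @ Y.T)         # p x p problem
        keep = gam > 1e-12 * gam.max()           # guard against zero eigenvalues
        gam, E = gam[keep], E[:, keep]
        C = Y.T @ (E / np.sqrt(gam))             # (HZb)^T R^-1/2 E Gamma^-1/2
    rhs = C.T @ (Y.T @ (d / np.sqrt(R_diag)))    # C^T (HZb)^T R^-1 d
    w = C @ (rhs / (1.0 + gam))
    # I - C[I - (I+Gamma)^-1/2]C^T; equals C (I+Gamma)^-1/2 C^T when C is square
    T = np.eye(m) - (C * (1.0 - 1.0 / np.sqrt(1.0 + gam))) @ C.T
    return w, T

def reference_weights(HZb, R_diag, d):
    """Direct m x m solve, used only to check the sketch."""
    m = HZb.shape[1]
    A = np.eye(m) + HZb.T @ (HZb / R_diag[:, None])
    return np.linalg.solve(A, HZb.T @ (d / R_diag)), np.linalg.inv(A)

rng = np.random.default_rng(1)
results = {}
for m, p in [(4, 9), (10, 3)]:                   # one case per branch
    HZb = rng.normal(size=(p, m))
    HZb -= HZb.mean(axis=1, keepdims=True)       # perturbations sum to zero
    R_diag = rng.uniform(0.5, 2.0, size=p)
    d = rng.normal(size=p)
    w, T = oed_etkf_weights(HZb, R_diag, d)
    w_ref, Pa_ref = reference_weights(HZb, R_diag, d)
    results[(m, p)] = (np.allclose(w, w_ref), np.allclose(T @ T, Pa_ref))
```

Both branches return the same weights and the same symmetric square root; only the size of the eigenproblem differs, which is the source of the cost saving.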

The EnKF usually underestimates the analysis error covariance, mainly due to the limited ensemble size, nonlinear model dynamics, and model error. To mitigate the risk of an underdispersive ensemble, covariance inflation is practically important. In the present study, we apply multiplicative inflation (Anderson and Anderson 1999), which inflates the background ensemble perturbation by the following:
$$\mathbf{Z}^b \leftarrow \beta \mathbf{Z}^b, \tag{20}$$
where $\beta$ is the inflation coefficient.

b. Localization

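In practical EnKF applications, covariance localization is necessary to increase the rank of the background error covariance $\mathbf{P}^b$ and to remove erroneous long-range correlations (Houtekamer and Mitchell 2001; Hamill et al. 2001). Hunt et al. (2007) proposed the LETKF, which solves the ETKF equations at all model grid points by assimilating a subset of observations surrounding the analysis grid point. Localization of the LETKF is usually achieved by inflating the observation error variance $\mathbf{R}$ for observations that are distant from the analysis model grid point (Hunt et al. 2007; Miyoshi and Yamane 2007; hereafter, R-localization) as follows:
$$R^{*}_{ii} \leftarrow f_i^{-1} R_{ii}, \tag{21}$$
where $\mathbf{R}$ is assumed to be diagonal (i.e., uncorrelated observation errors), which is a common assumption in meteorological applications, and $f_i$ is the localization function, with values in [0, 1], between the $i$th observation and the analysis model grid point. Thus, the analysis equation is given by
$$\bar{\mathbf{x}}^a - \bar{\mathbf{x}}^b = \mathbf{Z}^b \mathbf{C} (\mathbf{I} + \boldsymbol{\Gamma})^{-1} \mathbf{C}^{\mathrm{T}} (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} (\mathbf{R}^{*})^{-1} \mathbf{d}^{o-b}, \tag{22}$$
and Eq. (18), where the eigenvalue decomposition is solved by
$$\begin{cases} (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} (\mathbf{R}^{*})^{-1} \mathbf{H}\mathbf{Z}^b = \mathbf{C}\boldsymbol{\Gamma}\mathbf{C}^{\mathrm{T}}, & m < p \\ (\mathbf{R}^{*})^{-1/2} \mathbf{H}\mathbf{Z}^b (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} (\mathbf{R}^{*})^{-1/2} = \mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^{\mathrm{T}}, & m \ge p. \end{cases} \tag{23}$$
In this study, we propose the following alternative localization that attenuates the observation space ensemble perturbation $\mathbf{H}\mathbf{Z}^b$ for distant observations (hereafter, Z-localization):
$$(\mathbf{H}\mathbf{Z}^b)^{*}_{i,j} \leftarrow \sqrt{f_i}\,(\mathbf{H}\mathbf{Z}^b)_{i,j} \quad (j = 1, \ldots, m), \tag{24}$$
$$(\mathbf{H}\mathbf{Z}^b)^{\dagger}_{i,j} \leftarrow f_i\,(\mathbf{H}\mathbf{Z}^b)_{i,j} \quad (j = 1, \ldots, m). \tag{25}$$
The analysis increment and perturbation are computed by
$$\bar{\mathbf{x}}^a - \bar{\mathbf{x}}^b = \mathbf{Z}^b \mathbf{C} (\mathbf{I} + \boldsymbol{\Gamma})^{-1} \mathbf{C}^{\mathrm{T}} (\mathbf{H}\mathbf{Z}^b)^{\dagger\mathrm{T}} \mathbf{R}^{-1} \mathbf{d}^{o-b}, \tag{26}$$
and Eq. (18), where the eigenvalue decomposition is solved by
$$\begin{cases} (\mathbf{H}\mathbf{Z}^b)^{*\mathrm{T}} \mathbf{R}^{-1} (\mathbf{H}\mathbf{Z}^b)^{*} = \mathbf{C}\boldsymbol{\Gamma}\mathbf{C}^{\mathrm{T}}, & m < p \\ \mathbf{R}^{-1/2} (\mathbf{H}\mathbf{Z}^b)^{*} (\mathbf{H}\mathbf{Z}^b)^{*\mathrm{T}} \mathbf{R}^{-1/2} = \mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^{\mathrm{T}}, & m \ge p. \end{cases} \tag{27}$$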

Localization using Eqs. (21) and (24) has exactly the same effect on $(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{H}\mathbf{Z}^b$ and $\mathbf{R}^{-1/2} \mathbf{H}\mathbf{Z}^b (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1/2}$. However, Eq. (25) represents a more severe localization function, chosen so that the attenuation-based Z-localization is equivalent to R-localization for computing the analysis increment [i.e., so that $(\mathbf{H}\mathbf{Z}^b)^{\dagger\mathrm{T}} \mathbf{R}^{-1}$ is equivalent to $(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} (\mathbf{R}^{*})^{-1}$]. Consequently, the Z-localization results in the same analysis ensemble as the R-localization if the observation operator is linear. However, the Z-localization makes it easy to apply differing localizations to differing ensemble members. Here, we shall explore the effect of applying localization length scales to the flow-dependent perturbations that differ from the localization length scales applied to the climatological static perturbations. In contrast, it is not easy to see how one could allow the localization to depend on the ensemble member with R-localization (cf. section 2c).
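The equivalence can be checked numerically. The sketch below assumes, consistent with the statements above, that R-localization divides each $R_{ii}$ by $f_i$, that the starred perturbations are scaled by $\sqrt{f_i}$, and that the daggered perturbations are scaled by $f_i$; the array names and random toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 7, 4
HZb = rng.normal(size=(p, m))
HZb -= HZb.mean(axis=1, keepdims=True)     # perturbations sum to zero
R_diag = rng.uniform(0.5, 2.0, size=p)     # diagonal of R
f = rng.uniform(0.05, 1.0, size=p)         # localization weights in (0, 1]
d = rng.normal(size=p)                     # observation departure

# R-localization: inflate the observation error variance
R_star = R_diag / f
S_R = HZb.T @ (HZb / R_star[:, None])      # (HZb)^T (R*)^-1 HZb
g_R = HZb.T @ (d / R_star)                 # (HZb)^T (R*)^-1 d

# Z-localization: attenuate the observation-space perturbations
HZ_star = np.sqrt(f)[:, None] * HZb        # used in the eigenproblem
HZ_dag = f[:, None] * HZb                  # used in the analysis increment
S_Z = HZ_star.T @ (HZ_star / R_diag[:, None])
g_Z = HZ_dag.T @ (d / R_diag)
```

Here `S_R` and `S_Z` agree, as do `g_R` and `g_Z`, so the two localizations yield the same eigenproblem and the same increment when the operator is linear.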

Greybush et al. (2011) categorized covariance localization into two groups, depending on whether localization was applied to the observation error covariance matrix $\mathbf{R}$ (R-localization) or to the background error covariance matrix (B-localization). B-localization can be further divided into “observation space B-localization,” which localizes $\mathbf{P}^b\mathbf{H}^{\mathrm{T}}$ or $\mathbf{H}\mathbf{P}^b\mathbf{H}^{\mathrm{T}}$ to compute the Kalman gain $\mathbf{K}$, and “model-space B-localization,” which localizes $\mathbf{P}^b$ directly (Hotta and Ota 2021). The model-space localization does not specify locations of observations; therefore, the advantage of the model-space localization is most likely to be seen when nonlocal observation operators are used, such as for satellite microwave radiances (Campbell et al. 2010). In addition, the model-space Z-localization might give contrasting results when the observation operator is a nonlinear function of model variables. However, implementing the model-space localization in the EnKF is generally difficult since EnKFs do not construct $\mathbf{P}^b$ explicitly.

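Bishop and Hodyss (2009a,b) proposed an ensemble modulation approach without constructing $\mathbf{P}^b$ explicitly such that
$$\boldsymbol{\phi} \circ \mathbf{P}^b = \boldsymbol{\phi} \circ [\mathbf{Z}^b (\mathbf{Z}^b)^{\mathrm{T}}] = \mathbf{Z}^b_{\mathrm{mod}} (\mathbf{Z}^b_{\mathrm{mod}})^{\mathrm{T}}. \tag{28}$$
Here, $\boldsymbol{\phi}$ is the localization matrix ($n \times n$), $\circ$ is the Schur product (i.e., elementwise multiplication), and $\mathbf{Z}^b_{\mathrm{mod}}$ is the modulated ensemble perturbation ($n \times s$), where $s = m \times q$ and $q$ is the rank of $\boldsymbol{\phi}$. The subscript mod represents the modulated perturbation. Using the square root matrix $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_q]$ ($n \times q$) that satisfies $\boldsymbol{\phi} = \mathbf{W}\mathbf{W}^{\mathrm{T}}$, $\mathbf{Z}^b_{\mathrm{mod}}$ is computed by
$$\mathbf{Z}^b_{\mathrm{mod}} = \frac{1}{\sqrt{m-1}} \{[\mathbf{w}_1 \circ \delta\mathbf{x}^{b(1)}, \ldots, \mathbf{w}_q \circ \delta\mathbf{x}^{b(1)}], \ldots, [\mathbf{w}_1 \circ \delta\mathbf{x}^{b(m)}, \ldots, \mathbf{w}_q \circ \delta\mathbf{x}^{b(m)}]\}. \tag{29}$$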

The high-rank modulated perturbation Zmodb is used for solving the ETKF equations. The increase in the number of ensemble members associated with the increase in rank increases the computational cost.
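The modulation identity can be verified on a small example. In the sketch below, the one-dimensional grid, the Gaussian form of the localization matrix `phi`, and its length scale are assumptions chosen only for illustration; the square root $\mathbf{W}$ is obtained from an eigendecomposition, which is one of several valid choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 3
dX = rng.normal(size=(n, m))
dX -= dX.mean(axis=1, keepdims=True)
Z = dX / np.sqrt(m - 1)                        # ensemble perturbations, n x m

# Hypothetical Gaussian localization matrix on a 1-D grid of n points
idx = np.arange(n)
phi = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2)

# Square root W with phi = W W^T (here q = n columns)
lam, V = np.linalg.eigh(phi)
W = V * np.sqrt(np.clip(lam, 0.0, None))

# Modulated ensemble: every member multiplied elementwise by every column of W
Zmod = np.column_stack(
    [W[:, k] * Z[:, j] for j in range(m) for k in range(W.shape[1])]
)
```

The product `Zmod @ Zmod.T` reproduces `phi * (Z @ Z.T)`, while the number of columns grows from $m$ to $m \times q$, which is the cost increase noted above.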

The attenuation of $\mathbf{H}\mathbf{Z}^b$ examined here is very similar to R-localization and can be implemented easily without requiring the explicit linear observation operator $\mathbf{H}$ because of Eq. (6). Here, we introduce a new, computationally inexpensive model-space-type localization via the following steps:

  (i) Localize the model-space ensemble perturbations $\mathbf{Z}^b$ such that $(\mathbf{Z}^b)^{*}_{i,j} \leftarrow \sqrt{f_i}\,(\mathbf{Z}^b)_{i,j}$ and $(\mathbf{Z}^b)^{\dagger}_{i,j} \leftarrow f_i\,(\mathbf{Z}^b)_{i,j}$ ($j = 1, \ldots, m$), where $f_i$ is the localization function, with values in [0, 1], between the $i$th model grid point and the analysis model grid point. At every analysis grid point, this procedure is applied to the grid points within the local ETKF region. Since $\mathbf{Z}^b$ has dimension $n \times m$, the localization function $f_i$ is applied to the $m$ components in the $i$th row of $\mathbf{Z}^b$. Note that the attenuation only needs to be done on the subset of variables that $\mathbf{H}$ operates on (i.e., the union of model variables for which at least one of the corresponding columns of $\mathbf{H}$ has a nonzero element);
  (ii) Add the localized model space perturbations $(\mathbf{Z}^b)^{*}$ and $(\mathbf{Z}^b)^{\dagger}$ back onto the ensemble mean $\bar{\mathbf{x}}^b$ to create new modified “full” ensemble members $(\mathbf{X}^b)^{*}$ and $(\mathbf{X}^b)^{\dagger}$;
  (iii) Apply the nonlinear observation operator $H(\cdot)$ to each of the “full” members of both modified ensembles (i.e., twice per original member) and then derive the mean and perturbations in observation space to compute $(\mathbf{H}\mathbf{Z}^b)^{*}$ and $(\mathbf{H}\mathbf{Z}^b)^{\dagger}$.

If the linear observation operator $\mathbf{H}$ were available, one could skip steps ii and iii and obtain $(\mathbf{H}\mathbf{Z}^b)^{*}$ and $(\mathbf{H}\mathbf{Z}^b)^{\dagger}$ by multiplying $\mathbf{H}$ into $(\mathbf{Z}^b)^{*}$ and $(\mathbf{Z}^b)^{\dagger}$. As seen in steps i–iii, attenuating $\mathbf{Z}^b$ directly needs no specification of observed locations. This fact makes it an attractive choice for the assimilation of observations that are integrals of the state, such as microwave radiances. While not tested in this paper, it will be interesting to see whether the new model space localization introduced here leads to similar DA improvements relative to observation space localization as those seen in the experiments of Campbell et al. (2010) and Lei et al. (2018).
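Steps i–iii can be sketched as follows for one local region. This is a minimal illustration under stated assumptions: the function name is hypothetical, the weights are assumed to be $\sqrt{f_i}$ and $f_i$ for the two attenuated ensembles, and the factor $\sqrt{m-1}$ is assumed when converting attenuated perturbations back into full members.

```python
import numpy as np

def attenuated_obs_perturbations(Xb, f, obs_op):
    """Return the two attenuated observation-space perturbation matrices.
    Xb: n x m ensemble, f: length-n weights in [0, 1], obs_op: state -> obs."""
    m = Xb.shape[1]
    x_mean = Xb.mean(axis=1, keepdims=True)
    Zb = (Xb - x_mean) / np.sqrt(m - 1)
    out = []
    for weight in (np.sqrt(f), f):
        Z_loc = weight[:, None] * Zb                         # step (i)
        X_mod = x_mean + np.sqrt(m - 1) * Z_loc              # step (ii)
        HX = np.column_stack([obs_op(X_mod[:, j]) for j in range(m)])
        out.append((HX - HX.mean(axis=1, keepdims=True)) / np.sqrt(m - 1))  # step (iii)
    return out[0], out[1]

# Toy check with a hypothetical linear operator and analysis point at index 0
rng = np.random.default_rng(4)
n, m, p = 6, 4, 3
Xb = rng.normal(size=(n, m))
f = np.exp(-0.5 * (np.arange(n) / 2.0) ** 2)
H_lin = rng.normal(size=(p, n))
HZ_star, HZ_dag = attenuated_obs_perturbations(Xb, f, lambda x: H_lin @ x)
Zb = (Xb - Xb.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)
```

For a linear operator the outputs equal `H_lin` applied to the attenuated perturbations, so steps ii and iii are only needed when the operator is nonlinear or its Jacobian is unavailable.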

If, for some reason, one cannot access the Jacobian $\mathbf{H}$ of the observation operator, the nonlinear $H(\cdot)$ needs to be applied to each modulated ensemble member within each local ETKF region because the members with attenuated perturbations are different for each local region. Furthermore, the nonlinear observation operator needs to be applied to both $(\mathbf{X}^b)^{*}$ and $(\mathbf{X}^b)^{\dagger}$. Nevertheless, since the analysis in each local region is computationally independent of the analysis in all other regions, this additional computational work can be done in parallel and is perfectly scalable.

c. Hybrid LETKF

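The standard EnKF constructs $\mathbf{Z}^b$ using only flow-dependent perturbations from ensemble forecasts. Kretschmer et al. (2015) proposed a climatologically augmented approach that constructs $\mathbf{Z}^b$ as follows:
$$\mathbf{Z}^b_{\mathrm{hyb}} = \left[\sqrt{\alpha}\,\frac{\delta\mathbf{X}^b_{\mathrm{ens}}}{\sqrt{m-1}},\ \sqrt{1-\alpha}\,\frac{\delta\mathbf{X}^b_{\mathrm{clm}}}{\sqrt{c-1}}\right] = [\sqrt{\alpha}\,\mathbf{Z}^b_{\mathrm{ens}},\ \sqrt{1-\alpha}\,\mathbf{Z}^b_{\mathrm{clm}}] \quad (n \times (m + c)), \tag{30}$$
where $c$ is the number of perturbations in the climatology of forecast perturbations. The subscripts hyb, ens, and clm stand for hybrid, ensemble, and climatological, respectively. This augmentation is equivalent to
$$\mathbf{P}^b = \alpha \mathbf{P}^b_{\mathrm{ens}} + (1 - \alpha) \mathbf{P}^b_{\mathrm{clm}}. \tag{31}$$

Therefore, we have termed this approach the hybrid LETKF, where $\alpha$ is the tunable hybrid coefficient, with values in [0, 1], that determines the weight between the flow-dependent $\mathbf{P}^b_{\mathrm{ens}}$ and the static $\mathbf{P}^b_{\mathrm{clm}}$. The augmented $\mathbf{Z}^b_{\mathrm{hyb}}$ is used to solve the ETKF equations described in sections 2a and 2b.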
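The equivalence between the augmented perturbation matrix and the weighted sum of covariances can be confirmed with a short NumPy sketch. The sizes, the value of `alpha`, and the random perturbations are illustrative assumptions; the square-root weights $\sqrt{\alpha}$ and $\sqrt{1-\alpha}$ are those required for the covariances to combine with weights $\alpha$ and $1 - \alpha$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, c = 5, 4, 12
alpha = 0.7                                    # hybrid coefficient (assumed value)

def centered(k):
    """Random n x k perturbations with zero mean across members."""
    A = rng.normal(size=(n, k))
    return A - A.mean(axis=1, keepdims=True)

Z_ens = centered(m) / np.sqrt(m - 1)           # flow-dependent perturbations
Z_clm = centered(c) / np.sqrt(c - 1)           # climatological perturbations

# Augmented hybrid perturbation matrix, n x (m + c)
Z_hyb = np.hstack([np.sqrt(alpha) * Z_ens, np.sqrt(1.0 - alpha) * Z_clm])

P_hyb = Z_hyb @ Z_hyb.T
P_sum = alpha * (Z_ens @ Z_ens.T) + (1.0 - alpha) * (Z_clm @ Z_clm.T)
```

`P_hyb` and `P_sum` agree to round-off, so the hybrid covariance never needs to be formed explicitly.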

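For the hybrid LETKF, the ensemble perturbations in observation space are defined by
$$\mathbf{H}\mathbf{Z}^b_{\mathrm{hyb}} = [\sqrt{\alpha}\,\mathbf{H}\mathbf{Z}^b_{\mathrm{ens}},\ \sqrt{1-\alpha}\,\mathbf{H}\mathbf{Z}^b_{\mathrm{clm}}] \quad (p \times (m + c)). \tag{32}$$
Thus, we can employ different localization functions $f_{\mathrm{ens}}$ and $f_{\mathrm{clm}}$ for the ensemble and climatological perturbations ($\mathbf{Z}^b_{\mathrm{ens}}$ and $\mathbf{Z}^b_{\mathrm{clm}}$) using Z-localization:
$$(\mathbf{H}\mathbf{Z}^b_{\mathrm{ens}})^{*}_{i,j} \leftarrow \sqrt{f_{\mathrm{ens},i}}\,(\mathbf{H}\mathbf{Z}^b_{\mathrm{ens}})_{i,j}, \quad (\mathbf{H}\mathbf{Z}^b_{\mathrm{ens}})^{\dagger}_{i,j} \leftarrow f_{\mathrm{ens},i}\,(\mathbf{H}\mathbf{Z}^b_{\mathrm{ens}})_{i,j} \quad (j = 1, \ldots, m), \tag{33}$$
$$(\mathbf{H}\mathbf{Z}^b_{\mathrm{clm}})^{*}_{i,j} \leftarrow \sqrt{f_{\mathrm{clm},i}}\,(\mathbf{H}\mathbf{Z}^b_{\mathrm{clm}})_{i,j}, \quad (\mathbf{H}\mathbf{Z}^b_{\mathrm{clm}})^{\dagger}_{i,j} \leftarrow f_{\mathrm{clm},i}\,(\mathbf{H}\mathbf{Z}^b_{\mathrm{clm}})_{i,j} \quad (j = 1, \ldots, c), \tag{34}$$
where $f_{\mathrm{ens}}$ and $f_{\mathrm{clm}}$ are the localization functions for ensemble and climatological perturbations, respectively.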
Since the LETKF generates analysis ensemble perturbations as $\mathbf{Z}^a = \mathbf{Z}^b (\tilde{\mathbf{P}}^a)^{1/2}$ at all model grid points independently, a smooth transition of $(\tilde{\mathbf{P}}^a)^{1/2}$ in space is essential to avoid producing an imbalanced analysis ensemble. Therefore, a unique symmetric square root matrix is used in the LETKF (cf. appendix C of Wang et al. 2004). The symmetry of $(\tilde{\mathbf{P}}^a)^{1/2}$ ensures a spatially smooth transition of $(\tilde{\mathbf{P}}^a)^{1/2}$ from one grid point to the next (Hunt et al. 2007). The symmetric square root matrix also ensures that the analysis ensemble perturbations are consistent with the background ensemble perturbations because it minimizes the mean square distance between $(\tilde{\mathbf{P}}^a)^{1/2}$ and $\mathbf{I}$. This characteristic is very helpful in the hybrid approach because the first $m$ columns $\mathbf{Z}^a_{\mathrm{ens}}$ of $\mathbf{Z}^a_{\mathrm{hyb}}$ correspond to the ensemble for subsequent forecasts. The analysis ensemble perturbation is given by
$$\mathbf{Z}^a_{\mathrm{hyb}} = \mathbf{Z}^b_{\mathrm{hyb}} (\tilde{\mathbf{P}}^a)^{1/2}, \tag{35}$$
$$[\sqrt{\alpha}\,\mathbf{Z}^a_{\mathrm{ens}},\ \sqrt{1-\alpha}\,\mathbf{Z}^a_{\mathrm{clm}}] = [\sqrt{\alpha}\,\mathbf{Z}^b_{\mathrm{ens}},\ \sqrt{1-\alpha}\,\mathbf{Z}^b_{\mathrm{clm}}] (\tilde{\mathbf{P}}^a)^{1/2}, \tag{36}$$
where the analysis error covariance $\tilde{\mathbf{P}}^a$ is computed by Eq. (18). This gives $m + c$ perturbations in $\mathbf{Z}^a_{\mathrm{hyb}} = [\sqrt{\alpha}\,\mathbf{Z}^a_{\mathrm{ens}},\ \sqrt{1-\alpha}\,\mathbf{Z}^a_{\mathrm{clm}}]$; however, to control computational costs, only $m$ of them are required to reinitialize the ensemble for the next ensemble forecast. Here, we will not address the question of whether ensemble initialization could be improved by including some aspect of $\mathbf{Z}^a_{\mathrm{clm}}$ within the $m$ perturbations used to reinitialize the ensemble. In Lei et al. (2018), the modulated ensemble was used to update the entire vertical column and the gain form of the ETKF provided a convenient method for producing $m$ analysis perturbations. Here, the observations in each local region only update the variables associated with the grid point at the center of the region. Unlike in Lei et al. (2018), with the method used here, the modulated ensemble members are identical to the unmodulated members at the location of the grid point being updated. Hence, noting the correspondence between prior and posterior perturbations given in Eq. (36), we simply use
$$\mathbf{X}^a_{\mathrm{ens}} = \bar{\mathbf{x}}^a \mathbf{1} + \sqrt{m-1}\,\mathbf{Z}^a_{\mathrm{ens}} \tag{37}$$
to reinitialize the $m$-member ensemble forecast. No update is needed for the members of the climatological ensemble perturbations because these are not propagated by the nonlinear model.
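The selection of the first $m$ columns can be illustrated with a small NumPy sketch. It is a sketch under stated assumptions, not the hybrid LETKF code: the function name is hypothetical, $\mathbf{R}$ is assumed diagonal, the hybrid coefficient is assumed to be strictly positive, localization is omitted for brevity, and the symmetric square root is computed with a single $(m + c) \times (m + c)$ eigendecomposition.

```python
import numpy as np

def hybrid_analysis_ensemble(xa_mean, Z_ens, Z_clm, HZ_ens, HZ_clm, R_diag, alpha):
    """Update the augmented perturbations and keep the first m for the forecast."""
    m = Z_ens.shape[1]
    Z_hyb = np.hstack([np.sqrt(alpha) * Z_ens, np.sqrt(1.0 - alpha) * Z_clm])
    HZ_hyb = np.hstack([np.sqrt(alpha) * HZ_ens, np.sqrt(1.0 - alpha) * HZ_clm])
    S = HZ_hyb.T @ (HZ_hyb / R_diag[:, None])
    gam, C = np.linalg.eigh(S)
    gam = np.clip(gam, 0.0, None)
    T = (C / np.sqrt(1.0 + gam)) @ C.T            # symmetric square root of P~a
    Za_hyb = Z_hyb @ T                            # all m + c analysis perturbations
    Za_ens = Za_hyb[:, :m] / np.sqrt(alpha)       # first m columns, rescaled
    Xa_ens = xa_mean[:, None] + np.sqrt(m - 1) * Za_ens
    return Xa_ens, Za_hyb

# Toy problem (sizes, alpha, and the linear operator are assumptions)
rng = np.random.default_rng(6)
n, m, c, p, alpha = 5, 3, 8, 4, 0.7

def centered(k):
    A = rng.normal(size=(n, k))
    return A - A.mean(axis=1, keepdims=True)

Z_ens = centered(m) / np.sqrt(m - 1)
Z_clm = centered(c) / np.sqrt(c - 1)
H_lin = rng.normal(size=(p, n))
R_diag = np.ones(p)
xa_mean = rng.normal(size=n)
Xa_ens, Za_hyb = hybrid_analysis_ensemble(
    xa_mean, Z_ens, Z_clm, H_lin @ Z_ens, H_lin @ Z_clm, R_diag, alpha
)
```

When the observations carry no information (zero observation-space perturbations), the transform reduces to the identity and the retained members equal the background members shifted to the analysis mean, which reflects the consistency property of the symmetric square root noted above.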

3. Experimental settings

a. SPEEDY

The SPEEDY model is an intermediate hydrostatic model for global climate simulations developed at the International Centre for Theoretical Physics (Molteni 2003). SPEEDY has seven vertical layers in a sigma-coordinate system and 96 × 48 horizontal grid points. The prognostic variables are zonal wind, meridional wind, temperature, and specific humidity at each of the seven layers, as well as surface pressure. While the SPEEDY model is computationally inexpensive, it contains fundamental physical parameterization schemes, including those for surface flux, radiation, convection, clouds, and condensation. This study develops the hybrid DA system based on the available SPEEDY-LETKF system (Miyoshi 2005).

b. Implementation of the hybrid LETKF

The existing SPEEDY–LETKF comprises three components for one DA cycle: ensemble forecasts with SPEEDY, the observation operator, and the LETKF (Fig. 1a). SPEEDY provides ensemble forecasts ($\mathbf{X}^b_{\mathrm{ens}}$), which are input into the observation operator with the observation ($\mathbf{y}^o$) to yield ensemble forecasts in observation space ($H\mathbf{X}^b_{\mathrm{ens}}$). The LETKF reads three input data ($\mathbf{X}^b_{\mathrm{ens}}$, $H\mathbf{X}^b_{\mathrm{ens}}$, and $\mathbf{y}^o$) and produces the analysis ensemble ($\mathbf{X}^a_{\mathrm{ens}}$), followed by the next SPEEDY ensemble forecasts. For the hybrid LETKF (Fig. 1b), we first perform a spinup SPEEDY–LETKF cycle and save the ensemble perturbations of the first member of the ensemble [denoted $\delta\mathbf{X}^{b(1)}_{\mathrm{ens}}$] over a sufficient period to collect a range of ensemble forecast perturbations whose covariance will approximate the mean of all true forecast error covariance matrices. To automate this procedure, a preprocessor of the hybrid LETKF simply collects the latest $c$ climatological perturbations at the UTC time of the analysis ($\delta\mathbf{X}^b_{\mathrm{clm}}$). In other words, the climatological perturbations are taken from the previous $c$ dates of the DA system. The preprocessor adjusts the climatological perturbations so that their mean is 0. The preprocessor then adds the adjusted perturbations ($\delta\mathbf{X}^b_{\mathrm{clm}}$) to the ensemble mean of the ensemble forecast ($\bar{\mathbf{x}}_{\mathrm{ens}}$) to produce the climatological ensemble ($\mathbf{X}^b_{\mathrm{clm}} = \bar{\mathbf{x}}_{\mathrm{ens}}\mathbf{1} + \delta\mathbf{X}^b_{\mathrm{clm}}$). Then, the forecasted and climatological ensembles ($\mathbf{X}^b_{\mathrm{ens}}$, $\mathbf{X}^b_{\mathrm{clm}}$) are input into the observation operator together with the observation ($\mathbf{y}^o$). The hybrid LETKF reads the necessary input data and produces the analysis ensemble ($\mathbf{X}^a_{\mathrm{ens}}$), followed by the SPEEDY ensemble forecasts.
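The preprocessor step can be sketched as below. The function name and the `history` array are hypothetical; the sketch assumes that the saved first-member perturbations have already been filtered to the UTC time of the analysis and are stored as columns with the newest last.

```python
import numpy as np

def climatological_ensemble(history, x_ens_mean, c):
    """Build the climatological ensemble from saved perturbations.
    history: n x T array of saved perturbations (newest column last),
    x_ens_mean: length-n ensemble-mean forecast, c: climatological size."""
    dX_clm = history[:, -c:]                                # latest c perturbations
    dX_clm = dX_clm - dX_clm.mean(axis=1, keepdims=True)    # enforce zero mean
    X_clm = x_ens_mean[:, None] + dX_clm                    # recenter on the mean
    return X_clm, dX_clm

# Toy data (sizes and values are illustrative assumptions)
rng = np.random.default_rng(7)
n, T, c = 6, 40, 20
history = rng.normal(size=(n, T))
x_ens_mean = rng.normal(size=n)
X_clm, dX_clm = climatological_ensemble(history, x_ens_mean, c)
```

Because the perturbations are recentered, the mean of the climatological ensemble coincides with the mean of the flow-dependent ensemble, so both sets of members share one background mean in the observation operator.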

Fig. 1.

Schematic illustration of the (a) SPEEDY model combined with the LETKF and (b) SPEEDY with the hybrid LETKF. The x axis represents the DA cycle, and squiggly lines represent the accumulation of DA cycles. For the hybrid LETKF, no update is applied to the climatological components, in contrast to the ensemble component (cf. section 2c). The purple line represents the collection of ensemble perturbations [$\delta\mathbf{X}^{b(1)}_{\mathrm{ens}}$]. The observation operator (obsope) yields ensemble forecasts in observation space ($H\mathbf{X}^b$) using the observation ($\mathbf{y}^o$) and ensemble forecasts ($\mathbf{X}^b$).

Citation: Monthly Weather Review 150, 1; 10.1175/MWR-D-21-0174.1

The original SPEEDY–LETKF solves eigenvalue decompositions not for $(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{H}\mathbf{Z}^b$ but for $(\tilde{\mathbf{P}}^a)^{-1} = \mathbf{I} + (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{H}\mathbf{Z}^b$, which requires about $O[(m + c)^3]$ operations. This implementation is computationally inefficient when the climatological size increases beyond the number of local observations. Therefore, we modify the code to solve the eigenvalue decomposition by Eqs. (18) and (19), depending on the number of local observations ($p$) and the ensemble plus climatological sizes ($m + c$). The computational time of the OED ETKF formulation is compared with that of the standard LETKF in section 5c.

c. Experimental design

The LETKF and hybrid LETKF experiments use the adaptive multiplicative inflation method of Miyoshi (2011), in which the inflation factor $\beta$ is estimated adaptively based on the observation space statistics of Desroziers et al. (2005). A Gaussian-based function is used for horizontal and vertical localization, given by
$$f = \exp\{-0.5[(d_h/L)^2 + (d_\upsilon/V)^2]\}, \tag{38}$$
where $f$ is the localization function, and $d_h$ and $d_\upsilon$ are the horizontal distance (km) and vertical difference [log(Pa)] between the analysis model grid point and the observation, respectively. The terms $L$ and $V$ are the tunable parameters of the horizontal localization scale (km) and vertical localization scale [log(Pa)], respectively. The localization function is replaced by 0 beyond $2\sqrt{10/3}\,L$ and $2\sqrt{10/3}\,V$ (Miyoshi and Yamane 2007). The vertical localization scale $V$ is fixed at 0.1 log(Pa), such that an observation at one of the seven model layers has negligible impact on the other model layers (Greybush et al. 2011; Kondo et al. 2013). We conduct several sensitivity experiments for the horizontal localization scale.
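A minimal sketch of this localization function follows. It assumes the Gaussian form $f = \exp\{-0.5[(d_h/L)^2 + (d_\upsilon/V)^2]\}$ with the cutoff factor $2\sqrt{10/3}$ attributed above to Miyoshi and Yamane (2007); the function name and the example distances are hypothetical.

```python
import numpy as np

def gaussian_localization(d_h, d_v, L, V):
    """Localization weight for horizontal distance d_h (km) and vertical
    difference d_v [log(Pa)], set to zero beyond 2*sqrt(10/3) times L or V."""
    d_h = np.asarray(d_h, dtype=float)
    d_v = np.asarray(d_v, dtype=float)
    f = np.exp(-0.5 * ((d_h / L) ** 2 + (d_v / V) ** 2))
    cutoff = 2.0 * np.sqrt(10.0 / 3.0)
    outside = (np.abs(d_h) > cutoff * L) | (np.abs(d_v) > cutoff * V)
    return np.where(outside, 0.0, f)

# Example: weights at several horizontal distances for L = 900 km, V = 0.1
weights = gaussian_localization([0.0, 900.0, 2000.0, 4000.0], 0.0, 900.0, 0.1)
```

For $L$ = 900 km the horizontal cutoff is roughly 3286 km, so the last entry of `weights` is exactly zero while the others follow the Gaussian.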

Because the historical ensemble perturbations for constructing the climatological ensemble impose a lower real-time computational load than the flow-dependent perturbations, we could allow the climatological ensemble to be considerably larger than the flow-dependent ensemble. Generally, broader localization functions are supported by larger ensembles; therefore, a large climatological ensemble could be localized less tightly than the flow-dependent ensemble. As discussed earlier, different localization scales can be applied to ensemble and climatological perturbations ($L_e$ and $L_c$, respectively) in the hybrid LETKF. We investigate the difference in optimal localization scales for ensemble and climatological perturbations.

This study considered a radiosonde-like network (hereafter, raob; Fig. 2a). At the observing stations, we produced 6-hourly observation data by adding uncorrelated Gaussian random numbers to the nature run. The observation error standard deviations were 1.0 m s−1 for zonal and meridional winds, 1.0 K for temperature, 0.1 g kg−1 for specific humidity, and 1.0 hPa for surface pressure. Temperature and wind speed were observed at all seven layers, and specific humidity was observed at layers 1–4. These experimental settings were determined according to previous SPEEDY–LETKF experiments (Greybush et al. 2011; Miyoshi et al. 2014; Kondo and Miyoshi 2016, 2019; Kotsuki et al. 2020). The number of assimilated observations at an observing station is 7 (temperature) + 7 (zonal wind) + 7 (meridional wind) + 4 (specific humidity) + 1 (surface pressure) = 26. The total number of assimilated observations is 415 × 26 = 10 790. The number of assimilated observations at the fourth model layer at L = 900 km is shown in Fig. 2b. Sensitivity of the number of local observations to horizontal localization scales (L = 500, 1000, 1500 and 2000 km) is presented in the appendix (Fig. A1). We leave the consideration of nonlocal observation operators such as those associated with microwave radiances to future work.

Fig. 2.

(a) Observation network. Black dots and red crosses represent SPEEDY model grid points and observation points, respectively. (b) Numbers of assimilated observations at fourth-level SPEEDY grid points at a horizontal localization scale (Le) of 900 km.

We performed a series of observing system simulation experiments using SPEEDY. First, we performed a nature run initialized at 0000 UTC 1 January 1981 from a standard atmosphere at rest. The first year was considered to be a spinup period, followed by 4 years of spinup LETKF cycles. We started 1-yr hybrid DA experiments at 0000 UTC 1 January 1986, continuing from the spinup LETKF experiments. The results for the final 10 months (March–December) were used to verify the hybrid LETKF. We evaluated the 6-h forecast (i.e., background) root-mean-square errors (RMSEs) for T at the fourth model level (∼500 hPa) because this is one of the most important variables for medium-range NWP. Similar results are observed for zonal and meridional winds (not shown).

The results of the DA cycle experiments performed in this study are summarized in Table 1. We performed LETKF experiments using R-localization and Z-localization to confirm that there were no significant differences in RMSE, followed by hybrid LETKF experiments using only Z-localization. We first discuss experiments under the perfect model assumption (sections 4a, 4b, 5a–5c), followed by experiments with model errors (section 5d). The imperfect SPEEDY model was developed by perturbing model parameters following Table 3 of Miyoshi (2011). The imperfect-model experiments used the same nature run and observations used in the perfect-model experiments.

Table 1

List of data assimilation (DA) cycle experiments with the local ensemble transform Kalman filter (LETKF) and hybrid LETKF. Here m is ensemble size, c is climatological size, α is the hybrid parameter, Le is the localization scale for ensemble perturbation, and Lc is the localization scale for climatological perturbation.

4. Results

a. LETKF experiments with R- and Z-localization

First, we compared the R- and Z-localization obtained with the LETKF (i.e., no augmentation of climatological members). To obtain optimal localization scales, we employed 6-yr LETKF experiments from January 1982 to December 1987, and the final 2 years were used for the verification. Figure 3 compares the first-guess (i.e., 6-h forecast) RMSEs and ensemble spreads for T at 500 hPa for 10-, 20-, and 40-member LETKF experiments. The RMSEs and ensemble spreads were not significantly different between R- and Z-localization runs, as expected from the localization functions. Therefore, only Z-localization was applied in subsequent LETKF and hybrid LETKF experiments. For the hybrid LETKF experiments, we used the initial conditions of 0000 UTC 1 January 1986, with Le = 500 km for 10-member, Le = 900 km for 20-member, and Le = 1200 km for 40-member LETKF experiments. We also saved the first-member ensemble perturbations during spinup for use as climatological perturbations in the hybrid LETKF experiments.

Fig. 3.

First-guess (FG; i.e., 6-h forecast) root-mean-square errors (RMSEs; solid lines) and ensemble spreads (dashed lines) for temperature (K) at 500 hPa averaged over 2 years from January 1986 to December 1987. Black and red lines represent R- and Z-localizations, respectively. The abscissa shows the horizontal localization scale ($L_e$). The LETKF experiments are conducted with (a) 10, (b) 20, and (c) 40 members.

b. Hybrid LETKF for single-observation DA

We compared the analysis increments of the hybrid LETKF assimilating a single temperature observation at 500 hPa over the Pacific Ocean (27.833°N, 198.75°E). The analysis increment of the single-observation DA visualizes the structure of the background error covariance centered at the observing point as follows:
$$\bar{\mathbf{x}}^a - \bar{\mathbf{x}}^b = \mathbf{P}^b \mathbf{h}^{\mathrm{T}} (\mathbf{h}\mathbf{P}^b\mathbf{h}^{\mathrm{T}} + r)^{-1} d^{o-b}. \tag{39}$$
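The following NumPy sketch illustrates why a single-observation increment traces out one column of the background error covariance. The state size, the observed index `k`, and the values of the scalar error variance `r` and departure `d` are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 10, 6
dX = rng.normal(size=(n, m))
dX -= dX.mean(axis=1, keepdims=True)
Z = dX / np.sqrt(m - 1)
Pb = Z @ Z.T                       # sample background error covariance

k = 4                              # index of the observed state variable (assumed)
h = np.zeros(n)
h[k] = 1.0                         # single-observation operator as a row vector
r = 1.0                            # observation error variance (assumed)
d = 0.5                            # observation departure (assumed)

increment = Pb @ h * d / (h @ Pb @ h + r)
```

The increment equals the `k`th column of `Pb` scaled by `d / (Pb[k, k] + r)`, so its spatial pattern is that of the covariance between the observed variable and every other state variable.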

Figure 4 compares analysis increments based on the climatological error covariance $\mathbf{P}^b_{\mathrm{clm}}$ across climatological sizes of 20, 100, 240, 365, 730, and 1095 members. Here, we show only a part of the global analysis increments (40°S–80°N) to enlarge the region centered at the observing point. At c = 20, the analysis increment was spatially noisy due to sampling error in the error covariance estimates (Fig. 4a). Increases in c result in less noisy increments. However, long-range structures exist even for c = 1095 members (Fig. 4f), suggesting that even 1000 climatological members are insufficient to completely remove long-range erroneous structures.

Fig. 4.

Spatial patterns of analysis increments for temperature (K) at 500 hPa with climatological error covariance $\mathbf{P}^b_{\mathrm{clm}}$ without localization. Results are shown for (a) c = 20, (b) c = 100, (c) c = 240, (d) c = 365, (e) c = 730, and (f) c = 1095. Cross marks represent the observing point.

Figure 5 compares analysis increments of the hybrid LETKF for m = 20 and c = 730. The first DA applied the hybrid LETKF for α = 0.999. Here we applied α = 0.999 instead of α = 1.0 to confirm that its analysis increment was almost equivalent to that of the LETKF (not shown). This confirmation was conducted to see whether our hybrid LETKF implementation worked as designed. Without localization, the analysis increment was spatially noisy (Fig. 5a), as seen in the $\mathbf{P}^b_{\mathrm{clm}}$-based DA (Fig. 4a). With localization (L = 800 km), noisy patterns distant from the observation point disappeared, and a dipole analysis increment was produced (Fig. 5b). By contrast, the hybrid LETKF for α = 0.001 produced analysis increments mainly with the climatological-based error covariance $\mathbf{P}^b_{\mathrm{clm}}$ (Fig. 5e), resulting in less noisy patterns than the $\mathbf{P}^b_{\mathrm{ens}}$-based increment without localization (Fig. 5a) because the sampling noise was mitigated to a greater extent by the 730 climatological members. Again, we applied α = 0.001 instead of α = 0.0 to confirm that its analysis increment was almost equivalent to that with $\mathbf{P}^b_{\mathrm{clm}}$-based DA (Fig. 4e). This static $\mathbf{P}^b_{\mathrm{clm}}$-based DA with localization did not produce the dipole covariance pattern of $\mathbf{P}^b_{\mathrm{ens}}$. Instead, it produced isotropic negative increments (Fig. 5f). Figures 5c and 5d show analysis increments for α = 0.500 using the ensemble-based $\mathbf{P}^b_{\mathrm{ens}}$ and climatological-based $\mathbf{P}^b_{\mathrm{clm}}$ with equal weight. These analysis increments produced intermediate patterns between those for α = 0.999 and α = 0.001. These simple examples suggest that the hybrid LETKF was implemented in SPEEDY as designed.

Fig. 5.

As in Fig. 4, but also showing differences between cases of (left) no localization and (right) localization with $L_e = L_c = 800$ km. The ensemble and climatological sizes were m = 20 and c = 730, respectively. The hybrid coefficients were (a),(b) α = 0.999; (c),(d) α = 0.500; and (e),(f) α = 0.001. Black lines indicate the localization cutoff radii, determined as $2\sqrt{10/3}\,L_e$. The “no loc” in (a), (c), and (e) means no localization. Cross marks represent the observing point.

Figure 6 compares analysis increments for different localization scales, where $L_e$ = 800 km was applied for ensemble-based perturbations and $L_c$ = 2400 km was applied for climatological perturbations. The analysis increments for α = 0.999 (Fig. 6b) were almost identical to those of the previous example (Fig. 5b) because the same localization scale was applied for ensemble-based perturbations. By contrast, the analysis increments for α = 0.001 (Fig. 6f) exhibited wider spatial patterns due to the increased localization scale for climatological perturbations. Consequently, the analysis increments for α = 0.500 exhibited a dipole structure and extended increments beyond the localization cutoff radius for $L_e$ (solid line, Fig. 6d). These examples demonstrated that the analysis increments produced by the hybrid LETKF result in an intermediate pattern between those of $\mathbf{P}^b_{\mathrm{ens}}$-based and $\mathbf{P}^b_{\mathrm{clm}}$-based DAs. In addition, the different localization scales were successfully considered with Z-localization.

Fig. 6.

As in Fig. 4, but also showing cases of (a),(c),(e) no localization and (b),(d),(f) localizations with $L_e$ = 800 km and $L_c$ = 2400 km. Black solid and dashed lines indicate the localization cutoff radii for ensemble and climatological perturbations, calculated as $2\sqrt{10/3}\,L_e$ and $2\sqrt{10/3}\,L_c$, respectively. Panels (a), (c), and (e) are the same as Figs. 5a, 5c, and 5e. The “no loc” in (a), (c), and (e) means no localization. Cross marks represent the observing point.

Note that the climatological-perturbation-based error covariance $\mathbf{P}^b_{\mathrm{clm}}$ (e.g., Fig. 5f), which we use here, differs from the static background error covariance $\mathbf{B}$ normally used in three- and four-dimensional variational methods (hereafter 3DVAR and 4DVAR). Qualitatively, one can say that the 3DVAR/4DVAR static $\mathbf{B}$ has much higher rank than the climatological error covariance $\mathbf{P}^b_{\mathrm{clm}}$; furthermore, some 3DVAR/4DVAR implementations of the static $\mathbf{B}$ impose sophisticated balance constraints, whereas the localization used in our climatological $\mathbf{P}^b_{\mathrm{clm}}$ creates spurious imbalances to some extent. On the other hand, the 3DVAR/4DVAR static $\mathbf{B}$ normally assumes a high degree of horizontal isotropy and does not account for variations in climatological error covariances likely to be associated with land–sea boundaries and topography. In contrast, our climatological $\mathbf{P}^b_{\mathrm{clm}}$ partially accounts for the effect of such physical boundaries on climatological error covariances since it is based on historical flow-dependent perturbations. One could hypothesize that a superior static covariance $\mathbf{B}$ for 3DVAR/4DVAR could be created by combining aspects of the $\mathbf{P}^b_{\mathrm{clm}}$ we have used in our study with the traditional quasi-isotropic $\mathbf{B}$. However, a deeper investigation of that is beyond the scope of this paper.

c. Hybrid LETKF for DA cycle experiments

Next, we investigated the impacts of the hybrid LETKF on the first-guess RMSEs. Figure 7 compares the first-guess RMSEs of 20-member ensemble experiments averaged over the 10-month period from March to December of 1986. The localization scales were set at $L_e = L_c = 900$ km, which was the optimal localization scale for the LETKF (Fig. 3b). The dashed line indicates the RMSE of the LETKF over the same period (0.6047 K). For α = 0.999, the hybrid LETKF resulted in RMSEs that were close to those of the LETKF because the background error covariance $\mathbf{P}^b_{\mathrm{hyb}}$ was very close to the ensemble-based $\mathbf{P}^b_{\mathrm{ens}}$. As α decreased, the RMSEs from the hybrid LETKF decreased until α = 0.8 for c = 20 and 365, and until α = 0.7 for c = 100. Increasing the number of climatological members generally resulted in smaller RMSEs; however, extremely low α values resulted in larger RMSEs than those of the LETKF (α < 0.6 for c = 20; α < 0.5 for c = 100; α < 0.4 for c = 365). Comparisons between the EnKF and three-dimensional variational DA have shown that the use of flow-dependent error covariance provides more accurate analyses than the use of static error covariance (e.g., Yang et al. 2009). For extremely low α, the RMSEs became larger because the static component has more impact than the flow-dependent component on the hybrid background error covariance.

Fig. 7.

First-guess (FG; i.e., 6-h forecast) RMSEs from 20-member experiments for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986 as a function of the hybrid coefficient α. The dashed line and solid lines indicate the RMSEs for the LETKF and hybrid LETKF, respectively, for c = 20 (green), c = 100 (blue), and c = 365 (red).

Figure 8 compares the spatial patterns of the RMSEs. For the LETKF, larger RMSEs were observed in sparsely observed regions, e.g., over the eastern and southern Pacific Ocean and the southern Indian Ocean (Fig. 8a). For c = 20 and c = 100, the hybrid LETKF exhibited improvements mainly in sparsely observed regions such as the southern Indian Ocean and northern Atlantic Ocean, but produced unclear results for densely observed regions such as Eurasia. These improvements were observed globally at c = 365. Since large improvements are seen in sparsely observed regions, the improvements observed in the hybrid LETKF results are attributable to an improved ability to propagate observational information to poorly observed regions rather than an improved ability to “fit” higher density observations. Since only high-rank ensembles can fit large numbers of accurate observations, this result suggests that, for this experiment, it was the improved background error covariance estimates obtained using the hybrid approach, rather than the boosted rank, that led to this improvement.

Fig. 8.

Spatial patterns of first-guess (i.e., 6-h forecast) RMSEs for the 20-ensemble LETKF and hybrid LETKF for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986. EXP1 employed the LETKF, and EXP2, EXP3, and EXP4 employed the hybrid LETKF, with climatological sizes of 20, 100, and 365, respectively. The hybrid coefficient was α = 0.700. (a) RMSEs for EXP1. RMSE differences between experiments (b) EXP2 and EXP1, (c) EXP3 and EXP1, and (d) EXP4 and EXP1. The localization scales were Le = Lc = 900 km. Warmer colors indicate better RMSE results compared with EXP1.

Next, we compared the impacts of increasing the localization scale on RMSEs (Fig. 9). At c = 20, the optimal localization scale was Le = Lc = 1000 km, which is slightly larger than the optimal localization scale for the LETKF (Le = 900 km). The optimal localization scales were Le = Lc = 1300 km for c = 100 and c = 365, which suggests that the hybrid LETKF can benefit from the assimilation of a larger number of observations than the LETKF. This improvement is likely because the hybrid covariance model has a higher rank, allowing it to fit more observations, and because it has more accurate background error covariances between widely separated locations, as empirically demonstrated with SPEEDY (cf. Fig. 9 of Carrió et al. 2021). As the localization scale increased, the optimal hybrid parameter α shifted to smaller values to mitigate the impacts of sampling noise from the ensemble-based error covariance for observations that were distant from the analysis grid points. When an appropriate hybrid coefficient was applied, the hybrid LETKF outperformed the LETKF even for a small value of c (i.e., 20).

Fig. 9.

First-guess (FG; i.e., 6-h forecast) RMSEs of 20-member experiments for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986 as a function of the hybrid coefficient α. Experiments for which (a) c = 20, (b) c = 100, and (c) c = 365 using different localization scales (black, Le = Lc = 900 km; dark green, Le = Lc = 1000 km; green, Le = Lc = 1100 km; blue, Le = Lc = 1200 km; purple, Le = Lc = 1300 km; red, Le = Lc = 1400 km, and magenta, Le = Lc = 1500 km). Dashed lines indicate RMSEs for the LETKF, and gray shading indicates the ± 10% range.

5. Discussion

a. Impacts of applying different localization scales

Figure 10 compares the first-guess RMSEs from 20-member ensemble DA for c = 100 and c = 365. Two-dimensional maps show RMSEs obtained by independently sweeping the localization scales of Le and Lc from 900 to 1600 km in 100-km increments. For optimal combinations of Le and Lc, Lc tended to exceed Le except at c = 100 and α = 0.600 (Fig. 10a). This suggests that the optimal localization for climatological perturbations was slightly larger than that for ensemble perturbations in our experiments.

Fig. 10.

Globally averaged first-guess (i.e., 6-h forecast) RMSEs from 20-member experiments for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986. The x and y axes show the localization scales for ensemble and climatological perturbations (Le and Lc), respectively. The climatological sizes were (top) c = 100 and (bottom) c = 365, and the hybrid coefficients were (a),(d) α = 0.600; (b),(e) α = 0.700; and (c),(f) α = 0.800.

As seen in Figs. 5b and 5f, the flow-dependent error covariance has spatially inhomogeneous patterns whereas the static error covariance has isotropic patterns. On average, the ensemble perturbations would have the same scale lengths as the climatological perturbations. Therefore, differences in the localization scale between the climatological and flow-dependent perturbations were dependent on the number of perturbations used for constructing the background error covariance. A large increase in the number of perturbations led to the optimal localization for climatological error covariance becoming slightly larger than that for flow-dependent error covariance.

One might consider dividing both ensemble and climatological perturbations into different scales (i.e., broader and finer scales) and localizing each scale-based group through Z-localization. In such cases, Z-localization could be used as a multiscale localization tool. Investigating the impacts of multiscale Z-localization is an important future direction of our research.
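How separate scales for the two groups of perturbations might enter can be sketched as follows. This is only one plausible reading: it assumes a Gaussian localization function, diagonal R, and attenuation applied to each observation row of the observation-space perturbations by the square root of the localization weight. All function and variable names are hypothetical.

```python
import numpy as np

def gaussian_weight(dist_km, scale_km):
    """Assumed Gaussian localization weight for observations at distance dist_km."""
    return np.exp(-0.5 * (dist_km / scale_km) ** 2)

def attenuate(Y_ens, Y_clm, dist_km, L_e, L_c):
    """Attenuate observation-space perturbations with separate scales.

    Y_ens: (p, m) ensemble perturbations in observation space.
    Y_clm: (p, c) climatological perturbations in observation space.
    Each observation row is scaled by the square root of its weight.
    """
    w_e = np.sqrt(gaussian_weight(dist_km, L_e))[:, None]
    w_c = np.sqrt(gaussian_weight(dist_km, L_c))[:, None]
    return np.hstack([w_e * Y_ens, w_c * Y_clm])

# Synthetic local problem (hypothetical sizes and distances).
rng = np.random.default_rng(2)
p, m, c = 30, 20, 100
dist_km = rng.uniform(0.0, 3000.0, size=p)
Y_ens = rng.standard_normal((p, m))
Y_clm = rng.standard_normal((p, c))
r_inv = 1.0 / rng.uniform(0.5, 2.0, size=p)  # diagonal of R^{-1}

# With a single scale, attenuating the perturbations gives the same
# (HZ)^T R^{-1} HZ as multiplying R^{-1} by the localization weight.
Y_all = np.hstack([Y_ens, Y_clm])
Y_loc = attenuate(Y_ens, Y_clm, dist_km, 900.0, 900.0)
lhs = Y_loc.T @ (r_inv[:, None] * Y_loc)
rho = gaussian_weight(dist_km, 900.0)
rhs = Y_all.T @ ((rho * r_inv)[:, None] * Y_all)

# Different scales for the two groups only require a different L_c.
Y_two_scale = attenuate(Y_ens, Y_clm, dist_km, 900.0, 1300.0)
```

In this sketch a multiscale variant would simply add more column groups, each with its own scale, which is the flexibility that weighting R alone does not offer.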

Figure 11 shows the spatial patterns of RMSEs for T at the fourth model level for four experiments with m = 20, c = 365, and α = 0.700. These experiments were labeled EXP1 (Le = Lc = 900 km), EXP2 (Le = 1300 km, Lc = 900 km), EXP3 (Le = 900 km, Lc = 1300 km), and EXP4 (Le = Lc = 1300 km). Figure 11a shows an RMSE spatial pattern for the hybrid LETKF in EXP1. Increasing the localization scale for the ensemble-based perturbation improved the RMSEs over the southern Indian Ocean and southern Pacific Ocean (Fig. 11b), which was consistent with the findings of Kotsuki et al. (2020) that the optimal localization scale was larger in sparsely observed regions compared to densely observed regions. By contrast, increasing only Le degraded the RMSEs over the tropics and in the Northern Hemisphere (Fig. 11b), perhaps because the optimal localization scale for densely observed regions is smaller than that for sparsely observed regions (Kotsuki et al. 2020). In addition, error covariance tends to be smaller over the tropics because spatially finer convection is more dominant in the tropics than in the extratropical midlatitudes.

Fig. 11.

Spatial patterns of first-guess (i.e., 6-h forecast) RMSEs for the hybrid LETKF with m = 20 and c = 365 for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986. The hybrid coefficient was α = 0.700. (a) RMSEs for EXP1. Differences in RMSE between (b) EXP2 and EXP1, (c) EXP3 and EXP1, and (d) EXP4 and EXP1. The localization scales were Le = Lc = 900 km for EXP1, Le = 1300 km and Lc = 900 km for EXP2, Le = 900 km and Lc = 1300 km for EXP3, and Le = Lc = 1300 km for EXP4. Warmer colors indicate better RMSE results compared with EXP1.

Increasing only Lc had detrimental effects on RMSEs, especially in sparsely observed regions of the southern Pacific Ocean (Fig. 11c). However, simultaneously increasing Le and Lc had beneficial effects worldwide (Fig. 11d). The signals of improvement shown in Fig. 11b appear to have intensified in Fig. 11d, and the detrimental effects on RMSEs in the tropics (Fig. 11b) almost disappeared when Lc was also increased. Therefore, the improvement in Fig. 11d is probably due mainly to the improved background error covariance rather than the increased rank. Hybrid background error covariance provides more accurate estimates than does ensemble-based error covariance (Carrió et al. 2021). The hybrid LETKF therefore allowed larger localization scales, so that more distant observations could be assimilated. The assimilation of distant observations had a greater impact in sparsely observed regions than in densely observed regions.

One might think we could use broader localization for climatological perturbations because c = 100 or 365 is much larger than the ensemble size m = 20. However, as seen in Fig. 10, the optimal localization for the climatological perturbations was not so different from the one for the ensemble perturbations. In general, the mean of two independent guesses provides a more accurate estimate than each of them. Even with c = 365, spurious long-scale correlations still existed (Fig. 4d). Therefore, simultaneously increasing the localization scales for ensemble and climatological perturbations would be necessary for reducing sampling noise of the erroneous error correlations between distant variables.

b. Sensitivity to the ensemble size

Figure 12 compares the first-guess RMSEs for different combinations of m (10, 20, or 40) and c (20 or 365). The results showed that improvements due to the hybrid LETKF were more significant when smaller ensembles were used. The improvement ratio exceeded 20% at an ensemble size of m = 10 but reached only about 10% at m = 40. At a smaller climatological size (c = 20), the optimal hybrid coefficient exhibited a positive relationship with ensemble size. For example, the optimal hybrid coefficients were 0.6–0.8 for m = 10, 0.7–0.8 for m = 20, and 0.8–0.9 for m = 40 (Figs. 12a,c,e). By contrast, the optimal hybrid coefficients were relatively constant when c = 365, as shown in Figs. 12b,d,f, possibly due to more accurate estimates of the climatological error covariance $\mathbf{P}^b_{\mathrm{clm}}$ with larger numbers of climatological members.

Fig. 12.

First-guess (FG; i.e., 6-h forecast) RMSEs from (a),(b) 10-member; (c),(d) 20-member; and (e),(f) 40-member experiments for temperature (K) at 500 hPa averaged over the 10-month period from March to December of 1986, as a function of the hybrid coefficient α. Climatological sizes were set as (left) c = 20 and (right) c = 365. Dashed lines indicate the RMSEs for the LETKF, and gray shading indicates the ± 10% range.

Ignoring the cost of running an additional 20 ensemble members, the computational cost of a single DA step is almost equivalent for a 40-member LETKF (m = 40) and a hybrid LETKF with 20 ensemble and 20 climatological members (m + c = 40). In this configuration, the RMSEs of the hybrid LETKF were much larger (>0.54 K; Fig. 12c) than those of the LETKF (0.425 K; dashed lines in Figs. 12e,f). Hence, at these small flow-dependent ensemble sizes, there is still great value in running larger flow-dependent ensembles for DA. For larger ensembles, the marginal gain of further increasing the ensemble size may be less clear. In general, to maximize forecast accuracy for a given computational cost using the hybrid LETKF, one needs to compare the cost-to-benefit ratio of running larger flow-dependent ensembles with that of using previously generated ensemble perturbations in a hybrid DA scheme like that discussed here. When significant model error with unknown error characteristics is present, the DA may fail to benefit much from increases in ensemble size beyond a certain limit (e.g., Etherton and Bishop 2004). For these reasons, the optimal flow-dependent and climatological ensemble sizes for the hybrid LETKF will inevitably be highly system dependent. Recent operational NWP models have greatly increased spatial resolution, and massive computational costs are required to generate each member of the flow-dependent ensemble forecast. Because the historical ensemble perturbations used to construct the climatological ensemble require almost no real-time computation, implementation of the hybrid LETKF may be particularly useful in these very high-resolution systems.

c. Computational costs as m + c becomes large

Here, we performed 20-member DA cycles using 20 cores of a single processor with an A64FX microprocessor and calculated the average computational time required to solve the LETKF over 40 DA cycles. The horizontal localization scale was fixed at Le = Lc = 900 km so that the number of local observations is the same for all experiments in this section.

Figure 13 shows the computational time for the 20-member DA averaged over 40 DA cycles as a function of m + c. Only the computational time of the DA step is discussed here. For the original LETKF implementation in SPEEDY based on Hunt et al. (2007), the computational cost increased as $O(m^3)$ due to the eigenvalue decomposition of $(\tilde{\mathbf{P}}^a)^{-1} = \mathbf{I} + (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{H}\mathbf{Z}^b$ (black lines of Fig. 13), which was also observed in previous SPEEDY–LETKF experiments (cf. Table 2 of Kotsuki et al. 2020). Figure 13 is consistent with our expectation that for large (m + c), the cost of the OED ETKF formulation would scale as $O[(m + c)^2]$ and not $O[(m + c)^3]$. This increase is reasonable because $O[(m + c)^2]$ computations are unavoidable for the LETKF. For example, multiplying $(\tilde{\mathbf{P}}^a)^{1/2}$ into $\mathbf{Z}^b$ requires $O[n \times (m + c)^2]$ operations.

Fig. 13.

Computational times (s) for 20-member data assimilation (DA) averaged over 40 DA cycles as a function of ensemble size plus climatological size (m + c). (a),(b) The normal-scale and log-scale elapsed times, respectively. Black lines show the standard ETKF, which always employs the eigenvalue decomposition of $\mathbf{I} + (\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{H}\mathbf{Z}^b$ [$O((m + c)^3)$], whereas red lines show the optimal eigendecomposition (OED) ETKF, which employs the eigenvalue decomposition of $(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{H}\mathbf{Z}^b$ [$O((m + c)^3)$] or $\mathbf{R}^{-1/2}\mathbf{H}\mathbf{Z}^b(\mathbf{H}\mathbf{Z}^b)^{\mathrm{T}}\mathbf{R}^{-1/2}$ [$O(p^3)$] adaptively, depending on the number of local observations (p) and the ensemble plus climatological size (m + c). Blue and green dashed lines in (b) indicate the functions $[(m + c)/20]^2$ and $[(m + c)/20]^3$, respectively.

Between the Hunt et al. (2007) and the OED ETKF formulations, significant differences are seen when m + c = 320 and 640. Specifically, the Hunt et al. (2007) and Bishop et al. (2001) formulations are about 20% and 100% slower than the OED ETKF formulation when m + c = 320 and 640, respectively. We can expect that the difference will be even more significant if we increase the ensemble plus climatological sizes (m + c) beyond 1000. In contrast, no significant difference is seen when (m + c) ≤ 80 since (m + c) does not exceed the number of local observations by much (Fig. 2b).
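The adaptive choice of eigendecomposition can be illustrated with a short NumPy sketch. It is not the SPEEDY-LETKF implementation: it assumes A = R^{-1/2} H Z^b is available as a p × (m + c) array and computes the symmetric factor (I + A^T A)^{-1/2} from whichever of A^T A or A A^T is smaller. The function names and the tolerance are hypothetical.

```python
import numpy as np

def transform_inverse_sqrt(A, tol=1e-12):
    """Return (I + A^T A)^(-1/2) for A of shape (p, k), with k = m + c.

    The eigendecomposition is taken of the smaller Gram matrix:
    A^T A (k x k) when k <= p, otherwise A A^T (p x p).
    """
    p, k = A.shape
    if k <= p:
        # Few perturbations: decompose the k x k matrix directly.
        gam, C = np.linalg.eigh(A.T @ A)
        gam = np.clip(gam, 0.0, None)  # guard against tiny negative round-off
        return (C * (1.0 + gam) ** -0.5) @ C.T
    # Few local observations: decompose the p x p matrix instead.
    gam, E = np.linalg.eigh(A @ A.T)
    keep = gam > tol * max(gam.max(), 1.0)
    gam, E = gam[keep], E[:, keep]
    # Orthonormal eigenvectors of A^T A that have nonzero eigenvalues.
    C = (A.T @ E) / np.sqrt(gam)
    # Directions outside span(C) have eigenvalue 0 and are left unchanged.
    return np.eye(k) + (C * ((1.0 + gam) ** -0.5 - 1.0)) @ C.T

def reference(A):
    """Direct k x k computation, used only to check the result."""
    w, V = np.linalg.eigh(np.eye(A.shape[1]) + A.T @ A)
    return (V * w ** -0.5) @ V.T

rng = np.random.default_rng(1)
A_many_obs = rng.standard_normal((50, 8))  # p > m + c
A_few_obs = rng.standard_normal((8, 50))   # p < m + c
T_many = transform_inverse_sqrt(A_many_obs)
T_few = transform_inverse_sqrt(A_few_obs)
```

In the second branch the eigendecomposition costs O(p^3) and the remaining products scale with (m + c)^2, which is in line with the quadratic growth described above when m + c exceeds the number of local observations.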

d. Experiments with model errors

Finally, we performed 20-member DA experiments with model errors (cf. section 3c). Prior to the hybrid-LETKF experiments, manual tuning as in section 4a found the optimal Z-localization scale for the LETKF to be 700 km. Thus, the optimal localization scale for the imperfect-model LETKF (700 km) is slightly smaller than that for the perfect-model LETKF (900 km). The initial ensemble and climatological perturbations were also prepared as in section 4a from the imperfect-model LETKF experiment. We emphasize that the system used to create the climatological perturbations was the hybrid-LETKF with model error and that the ensemble did not include explicit attempts to account for model error structure as would be obtained from a perturbed physics ensemble. In addition, neither historical innovations nor historical analysis corrections were used to enhance the representation of model error in the climatological perturbations. Consequently, for this experiment, little attempt was made to give the climatological perturbations characteristics of model error.

Figure 14 compares the RMSEs for different combinations of localization scales and hybrid parameter α with the imperfect SPEEDY model. As in the perfect-model experiments, the optimal localization scale became larger with increased climatological size. The optimal localization scales were Le = Lc = 800 or 1000 km at c = 20, Le = Lc = 900 or 1000 km at c = 100, and Le = Lc = 1300 km at c = 365. These optimal localization scales for the imperfect-model experiments were also smaller than those for the perfect-model experiments. The hybrid LETKF with model errors achieved a forecast RMSE reduction similar to that without model errors. For example, at c = 100, the largest forecast RMSE reductions with respect to the LETKF are 0.0859 K for perfect-model experiments (Fig. 9b) and 0.0889 K for imperfect-model experiments (Fig. 14b). The hybrid LETKF might be improved further for imperfect-model experiments by accounting for model errors appropriately, such as through perturbed physics or the use of historical innovations and historical analysis corrections (e.g., Piccolo et al. 2019).

Fig. 14.

As in Fig. 9, but for experiments with imperfect SPEEDY model using different localization scales (black, Le = Lc = 700 km; dark green, Le = Lc = 800 km; green, Le = Lc = 900 km; blue, Le = Lc = 1000 km; purple, Le = Lc = 1100 km; red, Le = Lc = 1200 km, and magenta, Le = Lc = 1300 km). Dashed lines indicate RMSEs for the LETKF (0.999 K), and gray shading indicates the ± 10% range.

6. Summary

The objective of this study was to explore the application of hybrid background error covariance for NWP based on the LETKF. We implemented the hybrid LETKF by augmenting climatological and ensemble perturbations to solve the ETKF, following Kretschmer et al. (2015). To reduce the computational cost of incorporating large numbers of climatological perturbations, we utilized the optimal eigendecomposition (OED) form of the ETKF that is derived from the Bishop et al. (2001) and Posselt and Bishop (2012) formulations. We conducted a series of experiments using the intermediate global atmospheric model SPEEDY and reached the following conclusions.

First, we proposed a new localization method that attenuates ensemble perturbations (Z-localization) instead of inflating the observation error variance (R-localization). Using the LETKF, Z- and R-localization resulted in RMSEs that did not differ significantly. Z-localization can incorporate different localization scales for ensemble-derived flow-dependent perturbations and climatological static perturbations.

We found that the hybrid LETKF resulted in smaller RMSEs than the LETKF. Due to the improved background error covariance estimates obtained using the hybrid approach, the optimal localization scale of the hybrid LETKF was larger than that of the LETKF. Significant improvements due to the hybrid LETKF were observed in sparsely observed regions. As would be expected, the optimal weight for the flow-dependent ensemble covariance matrix decreased as the number of climatological members increased and the number of flow-dependent ensemble members decreased. The hybrid LETKF for imperfect-model experiments also achieved a forecast RMSE reduction similar to that for perfect-model experiments. We found that the OED formulation of the ETKF became significantly faster than the formulations given in Bishop et al. (2001) and Hunt et al. (2007) as the number of climatological forecast error perturbations became large. It was more than twice as fast when the total number of perturbations reached 640.

Finally, the optimal localization for climatological perturbations was slightly larger than that for ensemble-based perturbations. Since the climatological perturbations had the same average scale as the flow-dependent perturbations, the optimal localization scales were similar for climatological and flow-dependent perturbations.

Multiscale localization is an important future direction of hybrid LETKF research and application. If climatological and ensemble-based perturbations can be classified into differing scales, then multiscale localization can be implemented easily through Z-localization. Another remaining issue is the adaptive determination of the hybrid coefficient. Because manual tuning of the hybrid coefficient requires massive computational resources, future studies should explore methods for its adaptive determination.

For convenience, the present study simply collected c climatological perturbations from the previous c dates. A drawback of this simplistic approach is that the climatological ensemble contains perturbations from seasons other than the current one. To improve the climatological background error covariance where seasonality exists, it would be helpful to collect the climatological perturbations over the target season (e.g., spring, summer, autumn, and winter) or regime (e.g., El Niño and La Niña). The optimal collection of climatological perturbations would depend on the climate of the target regions. Investigating season-dependent or regime-dependent collection of climatological perturbations would help to improve the hybrid LETKF further.

Another drawback of our method for creating climatological perturbations was that we made no effort to account for model error structure and bias within them. Innovations, analysis corrections, and perturbed physics ensembles could all be used to try to improve this aspect of the climatological ensemble.

During the last decade, the hybrid background error covariance model has led to significant forecast improvements for four-dimensional variational DA (e.g., Clayton et al. 2013; Kuhl et al. 2013), three-dimensional ensemble–variational DA (e.g., Lu et al. 2017), and four-dimensional ensemble–variational DA (e.g., Wang and Lei 2014). It would be useful to compare our hybrid LETKF with these existing hybrid approaches based on variational and ensemble–variational methods. Such comparisons are also an important direction for our future work.

Recently, the number of observations available for NWP has increased dramatically, driven by advanced remote sensing technology. However, operational implementations of the LETKF have shown that assimilating fewer observations can produce more accurate analyses or forecasts when very large observation datasets are available (Hamrud et al. 2015; Schraff et al. 2016). This behavior is partly caused by R-localization, which is the most commonly used localization approach for the LETKF (Hotta and Ota 2021). This issue may be overcome through the model space localization by ensemble modulation approach (Bishop and Hodyss 2009a,b). Our attenuation-based localization offers an alternative method for developing model space B-localization by attenuating Z, not HZ. Important directions for our future research include investigating the potential advantages of this new localization method, as well as applying Z-localization for multiscale localization.

Acknowledgments.

S. Kotsuki developed the experimental system and conducted experiments. C.H. Bishop conceptualized the research, and wrote this paper with SK. The hybrid SPEEDY-LETKF system was developed based on the public code at https://github.com/takemasa-miyoshi/letkf. All of the data used in this study are stored for 5 years in Chiba University. Due to the large volume of data and limited disk space, data will be shared online upon request (shunji.kotsuki@chiba-u.jp). SK thanks members of Data Assimilation Research Team, RIKEN Center for Computational Science (R-CCS), and Drs. Daisuke Hotta and Keiichi Kondo of Meteorological Research Institute for insightful discussions. This study was partly supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grants JP18H01549 and JP21H04571, JST PRESTO MJPR1924, and Ministry of Education, Culture, Sports, Science and Technology of Japan (JPMXP1020200305) as “Program for Promoting Researches on the Supercomputer Fugaku” (Large Ensemble Atmospheric and Environmental Prediction for Disaster Prevention and Mitigation), and JAXA Precipitation Measuring Mission (PMM). C.H. Bishop was supported by the ARC Centre of Excellence for Climate Extremes (CE170100023). The authors also declare that they do not have any conflicts of interest.

APPENDIX

Sensitivity of the Number of Assimilated Observations to Horizontal Localization Scale

Figure A1 shows the sensitivity of the number of local observations at fourth-level SPEEDY grid points to horizontal localization scales (L = 500, 1000, 1500, and 2000 km).

Fig. A1.

Spatial patterns of the number of assimilated observations at fourth-level SPEEDY grid points at horizontal localization scale (Le) of (a) 500 km, (b) 1000 km, (c) 1500 km and (d) 2000 km.

REFERENCES

  • Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758, https://doi.org/10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.
  • Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. Tellus, 61A, 84–96, https://doi.org/10.1111/j.1600-0870.2008.00371.x.
  • Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. Tellus, 61A, 97–111, https://doi.org/10.1111/j.1600-0870.2008.00372.x.
  • Bishop, C. H., B. Etherton, and S. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.
  • Bishop, C. H., J. S. Whitaker, and L. Lei, 2017: Gain form of the ensemble transform Kalman filter and its relevance to satellite data assimilation with model space ensemble covariance localization. Mon. Wea. Rev., 145, 4575–4592, https://doi.org/10.1175/MWR-D-17-0102.1.
  • Campbell, W. F., C. H. Bishop, and D. Hodyss, 2010: Vertical covariance localization for satellite radiances in ensemble Kalman filters. Mon. Wea. Rev., 138, 282–290, https://doi.org/10.1175/2009MWR3017.1.
  • Carrió, D. S., C. H. Bishop, and S. Kotsuki, 2021: Empirical determination of the covariance of forecast errors: An empirical justification and reformulation of hybrid covariance models. Quart. J. Roy. Meteor. Soc., 147, 2033–2052, https://doi.org/10.1002/qj.4008.
  • Clayton, A. M., A. C. Lorenc, and D. M. Barker, 2013: Operational implementation of a hybrid ensemble/4D‐Var global data assimilation system at the Met Office. Quart. J. Roy. Meteor. Soc., 139, 1445–1461, https://doi.org/10.1002/qj.2054.
  • Desroziers, G., L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of observation, background and analysis-error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, 3385–3396, https://doi.org/10.1256/qj.05.108.
  • Etherton, B. J., and C. H. Bishop, 2004: Resilience of hybrid ensemble/3DVAR analysis schemes to model error and ensemble covariance error. Mon. Wea. Rev., 132, 1065–1080, https://doi.org/10.1175/1520-0493(2004)132<1065:ROHDAS>2.0.CO;2.
  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143–10 162, https://doi.org/10.1029/94JC00572.
  • Greybush, S. J., E. Kalnay, T. Miyoshi, K. Ide, and B. R. Hunt, 2011: Balance and ensemble Kalman filter localization techniques. Mon. Wea. Rev., 139, 511–522, https://doi.org/10.1175/2010MWR3328.1.
  • Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919, https://doi.org/10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.
  • Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790, https://doi.org/10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.
  • Hamrud, M., M. Bonavita, and L. Isaksen, 2015: EnKF and hybrid gain ensemble data assimilation. Part I: EnKF implementation. Mon. Wea. Rev., 143, 4847–4864, https://doi.org/10.1175/MWR-D-14-00333.1.
  • Hotta, D., and Y. Ota, 2021: Why does EnKF suffer from analysis overconfidence? An insight into exploiting the ever‐increasing volume of observations. Quart. J. Roy. Meteor. Soc., 147, 1258–1277, https://doi.org/10.1002/qj.3970.
  • Houtekamer, P. L., and H. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137, https://doi.org/10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.
  • Houtekamer, P. L., and F. Zhang, 2016: Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 144, 4489–4532, https://doi.org/10.1175/MWR-D-15-0440.1.
  • Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126, https://doi.org/10.1016/j.physd.2006.11.008.
  • Kondo, K., and T. Miyoshi, 2016: Impact of removing covariance localization in an ensemble Kalman filter: Experiments with 10 240 members using an intermediate AGCM. Mon. Wea. Rev., 144, 4849–4865, https://doi.org/10.1175/MWR-D-15-0388.1.
  • Kondo, K., and T. Miyoshi, 2019: Non-Gaussian statistics in global atmospheric dynamics: A study with a 10 240-member ensemble Kalman filter using an intermediate atmospheric general circulation model. Nonlinear Processes Geophys., 26, 211–225, https://doi.org/10.5194/npg-26-211-2019.
  • Kondo, K., T. Miyoshi, and H. L. Tanaka, 2013: Parameter sensitivities of the dual-localization approach in the local ensemble transform Kalman filter. SOLA, 9, 174–178, https://doi.org/10.2151/sola.2013-039.
  • Kotsuki, S., A. Pensoneault, A. Okazaki, and T. Miyoshi, 2020: Weight structure of the local ensemble transform Kalman filter: A case with an intermediate atmospheric general circulation model. Quart. J. Roy. Meteor. Soc., 146, 3399–3415, https://doi.org/10.1002/qj.3852.
  • Kretschmer, M., B. R. Hunt, and E. Ott, 2015: Data assimilation using a climatologically augmented local ensemble transform Kalman filter. Tellus, 67A, 26617, https://doi.org/10.3402/tellusa.v67.26617.
  • Kuhl, D. D., T. E. Rosmond, C. H. Bishop, J. McLay, and N. L. Baker, 2013: Comparison of hybrid ensemble/4DVar and 4DVar within the NAVDAS-AR data assimilation framework. Mon. Wea. Rev., 141, 2740–2758, https://doi.org/10.1175/MWR-D-12-00182.1.
  • Lei, L., J. S. Whitaker, and C. Bishop, 2018: Improving assimilation of radiance observations by implementing model space localization in an ensemble Kalman filter. J. Adv. Model. Earth Syst., 10, 3221–3232, https://doi.org/10.1029/2018MS001468.
  • Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP—A comparison with 4D‐Var. Quart. J. Roy. Meteor. Soc., 129, 3183–3203, https://doi.org/10.1256/qj.02.132.
  • Lu, X., X. Wang, Y. Li, M. Tong, and X. Ma, 2017: GSI‐based ensemble‐variational hybrid data assimilation for HWRF for hurricane initialization and prediction: Impact of various error covariances for airborne radar observation assimilation. Quart. J. Roy. Meteor. Soc., 143, 223–239, https://doi.org/10.1002/qj.2914.
  • Miyoshi, T., 2005: Ensemble Kalman filter experiments with a primitive-equation global model. Ph.D. dissertation, University of Maryland, College Park, 197 pp., https://github.com/takemasa-miyoshi/letkf.

    • Search Google Scholar
    • Export Citation
  • Miyoshi, T., 2011: The Gaussian approach to adaptive covariance inflation and its implementation with the local ensemble transform Kalman filter. Mon. Wea. Rev., 139, 15191535, https://doi.org/10.1175/2010MWR3570.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Miyoshi, T., and S. Yamane, 2007: Local ensemble transform Kalman filtering with an AGCM at a T159/L48 resolution. Mon. Wea. Rev., 135, 38413861, https://doi.org/10.1175/2007MWR1873.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Miyoshi, T., K. Kondo, and T. Imamura, 2014: The 10 240‐member ensemble Kalman filtering with an intermediate AGCM. Geophys. Res. Lett., 41, 52645271, https://doi.org/10.1002/2014GL060863.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Molteni, F., 2003: Atmospheric simulations using a GCM with simplified physical parametrizations. I: Model climatology and variability in multi-decadal experiments. Climate Dyn., 20, 175191, https://doi.org/10.1007/s00382-002-0268-2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Piccolo, C., M. J. Cullen, W. J. Tennant, and A. T. Semple, 2019: Comparison of different representations of model error in ensemble forecasts. Quart. J. Roy. Meteor. Soc., 145, 1527, https://doi.org/10.1002/qj.3348.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Posselt, D. J., and C. H. Bishop, 2012: Nonlinear parameter estimation: Comparison of an ensemble Kalman smoother with a Markov chain Monte Carlo algorithm. Mon. Wea. Rev., 140, 19571974, https://doi.org/10.1175/MWR-D-11-00242.1; Corrigendum. Mon. Wea. Rev., 142, 1382, https://doi.org/10.1175/MWR-D-13-00342.1

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Schraff, C., H. Reich, A. Rhodin, A. Schomburg, K. Stephan, A. Periáñez, and R. Potthast, 2016: Kilometre‐scale ensemble data assimilation for the COSMO model (KENDA). Quart. J. Roy. Meteor. Soc., 142, 14531472, https://doi.org/10.1002/qj.2748.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Wang, X., and T. Lei, 2014: GSI-based four-dimensional ensemble–variational (4DEnsVar) data assimilation: Formulation and single-resolution experiments with real data for NCEP Global Forecast System. Mon. Wea. Rev., 142, 33033325, https://doi.org/10.1175/MWR-D-13-00303.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Wang, X., C. H. Bishop, and S. J. Julier, 2004: Which is better, an ensemble of positive–negative pairs or a centered spherical simplex ensemble? Mon. Wea. Rev., 132, 15901605, https://doi.org/10.1175/1520-0493(2004)132<1590:WIBAEO>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Yang, S.-C., M. Corazza, A. Carrassi, E. Kalnay, and T. Miyoshi, 2009: Comparison of local ensemble transform Kalman filter, 3DVAR, and 4DVAR in a quasigeostrophic model. Mon. Wea. Rev., 137, 693709, https://doi.org/10.1175/2008MWR2396.1.

    • Crossref
    • Search Google Scholar
    • Export Citation