Ensemble-based atmospheric data assimilation (DA) systems are sometimes afflicted with an underestimation of the ensemble spread near the surface caused by the use of identical boundary conditions for all ensemble members and the lack of atmosphere–ocean interaction. To overcome these problems, a new DA system has been developed by replacing an atmospheric GCM with a coupled atmosphere–ocean GCM, in which atmospheric observational data are assimilated every 6 h to update the atmospheric variables, whereas the oceanic variables are subject to no direct DA. Although SST suffers from biases common to many coupled GCMs, a two-month retrospective analysis–forecast cycle reveals that the ensemble spreads of air temperature and specific humidity in the surface boundary layer are slightly increased and the forecast skill in the midtroposphere is somewhat improved by using the coupled DA system in comparison with the atmospheric DA system. In addition, surface atmospheric variables over the tropical Pacific show basinwide horizontal correlation in ensemble space in the coupled DA system but not in the atmospheric DA system. This suggests the potential benefit of using a coupled GCM rather than an atmospheric GCM even for atmospheric reanalysis with an ensemble-based DA system.
Ensemble-based data assimilation (DA) techniques such as the ensemble Kalman filter (Evensen 1994, 2003) have been rapidly growing because of their advantages of the flow-dependent estimation of analysis and forecast errors, relative ease of implementation, and efficiency with parallel computers.
Miyoshi and Yamane (2007) applied the local ensemble transform Kalman filter (LETKF; Hunt et al. 2007) to the Atmospheric General Circulation Model for the Earth Simulator (AFES; Ohfuchi et al. 2004; Enomoto et al. 2008) to construct the AFES–LETKF ensemble DA system. Miyoshi et al. (2007a) performed one and a half years of the AFES–LETKF experimental ensemble reanalysis using the observational dataset of the Japan Meteorological Agency operational system. Enomoto et al. (2013) constructed the second generation of the DA system (ALEDAS2) using the latest version of AFES with an improved cloud scheme for better representation of low-level clouds (Kuwano-Yoshida et al. 2010) and LETKF employing physical distances for localization instead of using local patches (Miyoshi et al. 2007b), and performed five years of an experimental ensemble reanalysis (ALERA2) assimilating observational data of the NCEP global DA system (PREPBUFR) archived at UCAR. The ALERA2 has been used as the reference dataset for a series of observing system experiments using the ALEDAS2 (e.g., Yamazaki et al. 2015; Hattori et al. 2016, 2017; Kawai et al. 2017).
In many ensemble DA systems based on AGCMs including the ALEDAS2, surface boundary conditions such as SST and sea ice distribution are identical among all ensemble members. This leads to an underestimated ensemble spread near the surface, or equivalently an overestimate of the accuracy of the first-guess fields, and may eventually degrade the resulting analyses. Kunii and Miyoshi (2012) showed that introducing SST perturbations in the LETKF for a regional atmospheric DA system improves the analyses and subsequent forecasts. In addition, air–sea coupled phenomena, such as the lead–lag relationship between SST and precipitation over the tropics, are not well reproduced in AGCM-based systems (Arakawa and Kitoh 2004; Saha et al. 2010). By using a coupled atmosphere–ocean GCM (CGCM) instead of an AGCM in an ensemble DA system, it is expected that the effects of perturbed surface boundary conditions and atmosphere–ocean interaction are naturally introduced into the system.
Data assimilation into CGCMs has progressed in the last decade. Zhang et al. (2007) conducted a series of perfect model experiments, assimilating pseudo-observations made from a CGCM simulation with an ensemble filter to reconstruct climate variability and trends. Sugiura et al. (2008) assimilated 10-day-averaged atmospheric and oceanic observational data using a four-dimensional variational method and controlled surface fluxes by introducing adjustment factors for the coupled state estimation from 1996 to 1998. Several operational centers have constructed their global coupled DA systems mainly for seasonal to interannual prediction or climate reanalysis based on their existing operational atmospheric and oceanic DA systems. In these systems, CGCMs are used in the forecast step to construct the first-guess fields but atmospheric and oceanic DA are conducted separately in the analysis step (e.g., Saha et al. 2010; Lea et al. 2015), or atmospheric and oceanic systems are integrated only in a limited portion of the DA process (e.g., Laloyaux et al. 2016). The former methodology is called weakly coupled DA and the latter quasi-strongly coupled DA by the definition of Penny et al. (2017). Recently, some studies show the effectiveness of strongly coupled DA, in which atmospheric and oceanic DA are conducted integrally in the whole DA process and observational data in one component are directly used to update the state of the other, in an idealized or simplified framework (e.g., Smith et al. 2015; Lu et al. 2015a,b; Sluka et al. 2016). On the other hand, Frolov et al. (2016) proposed an “interface solver” for strongly coupled DA in a realistic situation and demonstrated its usefulness based on their regional operational system. Additionally, modern techniques such as a particle filter are also applied to CGCMs to deal with the intrinsic nonlinearity of the coupled DA (e.g., Browne and van Leeuwen 2015).
In this study, to enhance the capability of the ALEDAS2, a new system has been developed by replacing AFES with the Coupled Atmosphere–Ocean General Circulation Model for the Earth Simulator (CFES; Komori et al. 2008). As a first step toward a fully coupled version of the CFES–LETKF ensemble DA system (CLEDAS), the new system assimilates only atmospheric observational data to update the atmospheric variables and is referred to as CLEDAS-A. Using this system, two months of experimental ensemble analysis has been conducted, and two-month-averaged fields are compared with those of ALERA2. This approach is categorized as quasi-weakly coupled DA (Penny et al. 2017) and could be considered as the atmospheric counterpart of the attempt by Fujii et al. (2009), in which ocean DA constrains the ocean component of their CGCM to construct a coupled reanalysis dataset.
2. Ensemble data assimilation system
a. Forecast model
CFES is used as the forecast model in CLEDAS-A (Fig. 1), and its configuration is the same as used in the previous studies (Richter et al. 2010; Taguchi et al. 2012; Bajish et al. 2013; Sasaki et al. 2013; Kuwano-Yoshida et al. 2013; Miyasaka et al. 2014; Taguchi and Schneider 2014). CFES consists of AFES as an atmospheric component including a land surface process and OFES (Masumoto et al. 2004) as an oceanic component including a sea ice process. Surface variables such as SST, sea surface and sea ice velocities, sea ice concentration, sea ice thickness, and snow depth over sea ice are transferred from OFES to AFES, while the surface fluxes and sea level pressure are passed from AFES to OFES. AFES is coupled with OFES every hour in CLEDAS-A (Fig. 1b), whereas it is forced with prescribed surface boundary conditions (BC) of the NOAA 1/4° daily OISST (Reynolds et al. 2007; Banzon et al. 2016) in ALEDAS2 (Fig. 1a).
AFES is an AGCM and solves the primitive equations using the spectral transform method and Eulerian advection. The resolution is T119 (triangular truncation at wavenumber 119, about 100 km) in the horizontal and 48 σ layers in the vertical with the top level placed at about 3 hPa, the same as used in ALEDAS2. In AFES, the broadband radiative transfer model (MstrnX; Sekiguchi and Nakajima 2008) and the land surface model, the Minimal Advanced Treatments of Surface Interaction and RunOff (MATSIRO; Takata et al. 2003), are incorporated. The cumulus convection is represented by the Emanuel scheme (Emanuel 1991; Emanuel and Živković-Rothman 1999; Peng et al. 2004), and the gridscale cloud scheme uses statistical partial condensation based on joint-Gaussian probability distribution functions of the liquid water potential temperature and total water content (Kuwano-Yoshida et al. 2010).
OFES is a z-coordinate OGCM based on the GFDL’s Modular Ocean Model, version 3 (MOM3; Pacanowski and Griffies 1999), and contains a dynamic–thermodynamic sea ice model (Komori et al. 2005). It has a resolution of 1/2° (50 km at the equator) in the directions of both longitude and latitude and 54 levels in the vertical with varying cell thicknesses from 5 m at the surface to 330 m at the maximum depth of 6065 m. Isoneutral diffusion (Redi 1982) and thickness diffusion (Gent and McWilliams 1990) are adopted for tracers, and the Smagorinsky scheme (Smagorinsky 1963) is adopted for horizontal friction. Vertical mixing is parameterized by the Noh model (Noh and Kim 1999; Noh et al. 2005), in which mixing depends on both the Richardson and Prandtl numbers. In addition, the shortwave penetration scheme is improved especially for the use with the free surface (Komori et al. 2012).
It should be noted that the treatment of sea ice is different between AFES and CFES, besides the difference of whether sea ice concentration and thickness are the boundary conditions (AFES) or the prognostic variables (CFES). In AFES, sea ice concentration is treated as one (full ice) if observed sea ice concentration is greater than 0.1 and otherwise treated as zero (no ice), and sea ice thickness is parameterized as a linear function of observed concentration with the maximum of 0.5 m. In CFES, sea ice concentration varies between zero and one, and sea ice thickness has no upper limit. Such a difference may affect the accuracy of atmospheric (re)analyses over the marginal ice zone (Inoue et al. 2011).
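The AFES-side rule described above can be sketched as follows. This is a minimal illustration rather than the model code: the function name is hypothetical, and the slope of the linear thickness relation (0.5 m per unit concentration, so that the 0.5-m maximum is reached at full concentration) is an assumption, since the text gives only the 0.1 threshold and the 0.5-m maximum.

```python
def afes_sea_ice(observed_concentration):
    """Toy version of the AFES boundary-condition rule for sea ice.

    Returns (model_concentration, thickness_m) for one grid cell.
    """
    if not 0.0 <= observed_concentration <= 1.0:
        raise ValueError("concentration must lie in [0, 1]")
    # Binary treatment: no ice unless the observed concentration exceeds 0.1.
    if observed_concentration <= 0.1:
        return 0.0, 0.0
    # ASSUMPTION: thickness = 0.5 m * observed concentration, capped at 0.5 m.
    thickness_m = min(0.5, 0.5 * observed_concentration)
    return 1.0, thickness_m


print(afes_sea_ice(0.05))  # treated as open water
print(afes_sea_ice(0.60))  # treated as fully ice covered
```

In CFES, by contrast, both quantities are prognostic, so no such mapping is applied.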
b. Analysis scheme and analysis–forecast cycle
CLEDAS-A uses the LETKF for analysis as in ALEDAS2 (Fig. 1), and the analysis is made only for atmospheric variables every 6 h with a 6-h DA window. In the forecast step, each ensemble member is integrated in time with CFES for 9 h from the initial conditions (IC) to produce the restart file, which contains hourly forecasts at 3–9 h (±3 h from analysis time t) and is used as input to the LETKF. In the analysis step, atmospheric observations (obs) from t − 3 h to t + 3 h are assimilated using the 4D-EnKF technique (Hunt et al. 2004) to produce the analysis at time t of the atmospheric prognostic variables (temperature, specific humidity, zonal and meridional winds, cloud water, and surface pressure). The guess file represents the forecast at time t of these variables. Finally, the analyses replace the prognostic variables in restart at time t to produce the next IC for the atmospheric component. Note that prognostic variables of the land surface model (soil temperature, soil moisture, snow amount, and so on) in restart, as well as diagnostic variables such as surface turbulent heat fluxes and precipitation, are not updated by the LETKF. The oceanic variables are also kept unchanged throughout the assimilation procedure, and restart is simply used as the next IC for the oceanic component.
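The bookkeeping of one cycle can be sketched as follows. This is a schematic under stated assumptions, not the CLEDAS-A implementation: the function names and the dictionary keys ("atmos", "land", "ocean") are hypothetical, and the model state is represented by placeholder strings. It shows only the timing of the hourly first-guess slots (forecast hours 3 to 9 of a run started one cycle before the analysis time) and the fact that the analysis overwrites the atmospheric prognostic variables alone.

```python
from datetime import datetime, timedelta


def guess_slot_times(analysis_time, first_hour=3, last_hour=9, cycle_hours=6):
    """Valid times of the hourly forecasts that feed the analysis at analysis_time."""
    initial_time = analysis_time - timedelta(hours=cycle_hours)
    return [initial_time + timedelta(hours=h)
            for h in range(first_hour, last_hour + 1)]


def next_initial_conditions(restart, atmos_analysis):
    """Build the next IC from the restart state valid at the analysis time.

    Only the atmospheric prognostic variables are replaced; land and ocean
    entries are carried over from the forecast unchanged.
    """
    next_ic = dict(restart)            # shallow copy, restart itself is untouched
    next_ic["atmos"] = atmos_analysis  # LETKF analysis replaces the first guess
    return next_ic


t = datetime(2008, 8, 1, 6)
print([s.strftime("%H:%M") for s in guess_slot_times(t)])
print(next_initial_conditions(
    {"atmos": "first guess", "land": "forecast", "ocean": "forecast"},
    "analysis"))
```

The seven hourly slots span the 6-h window centered on the analysis time, which is what allows observations to be compared with the forecast at (approximately) their own valid time.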
Parameters for the LETKF used in CLEDAS-A are the same as those used in ALEDAS2. The ensemble size is 63. The error covariance is localized by physical distance (Miyoshi et al. 2007b), and the localization length, defined as the standard deviation of the Gaussian weighting function, is 400 km in the horizontal and 0.4 ln p in the vertical, where p is pressure. A constant inflation parameter of 10% spread inflation (equivalent to 21% covariance inflation) is used (Miyoshi and Yamane 2007).
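The two numerical statements in this paragraph can be checked with a short sketch. The function names are hypothetical, and combining the horizontal and vertical Gaussians as a simple product is an assumption; how the weight is applied inside the LETKF (for example, to the observation error, or with a cutoff radius) is not modeled here.

```python
import math


def gaussian_localization(distance_km, dlnp, sigma_h_km=400.0, sigma_v=0.4):
    """Gaussian weight with standard deviations of 400 km and 0.4 in ln p.

    ASSUMPTION: horizontal and vertical factors are multiplied together.
    """
    return math.exp(-0.5 * ((distance_km / sigma_h_km) ** 2
                            + (dlnp / sigma_v) ** 2))


def covariance_inflation(spread_inflation):
    """Covariance scales with the square of the spread."""
    return (1.0 + spread_inflation) ** 2 - 1.0


print(gaussian_localization(400.0, 0.0))   # weight one localization length away
print(covariance_inflation(0.10))          # 10% in spread expressed for covariance
```

Because covariance is quadratic in the perturbations, multiplying the spread by 1.1 multiplies the covariance by 1.21, which is the 21% figure quoted in the text.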
3. Experimental ensemble reanalysis
An experimental retrospective analysis–forecast cycle with CLEDAS-A is conducted from 1 August to 30 September 2008, assimilating the atmospheric observations (NCEP PREPBUFR archived at UCAR) every 6 h. The result is referred to as CLERA-A.
The 63-member initial conditions of the atmospheric component at 0000 UTC 1 August are taken from ALERA2. The initial conditions of the oceanic component are made by stand-alone ocean simulations using OFES as follows. First, the surface boundary conditions (surface air temperature, specific humidity, zonal and meridional winds, downward shortwave and longwave radiations, rainfall, snowfall, and sea level pressure) are taken from the Common Ocean-ice Reference Experiments (CORE; Large and Yeager 2004, 2009) Corrected Interannual Forcing, version 2, and the river runoff data are from the CORE Corrected Normal Year Forcing, version 2. OFES is integrated for 60 years from the beginning of 1948 to the end of 2007 from the initial conditions of the climatological temperature and salinity fields for January in the World Ocean Atlas 2005 (WOA05; Locarnini et al. 2006; Antonov et al. 2006) with no ocean currents and sea ice, while sea surface salinity is restored to the climatological monthly value of WOA05 with a restoration time scale of 30 days. Then the surface boundary conditions (except for river runoff) are switched to the ensemble-mean fields of 63-member outputs of ALERA2, and the model is integrated for another 5 months from 1 January to 2 June 2008. Finally, 63-member ensemble ocean simulations are carried out from 2 June to 1 August 2008, in which the model is forced by each member of ALERA2. The ensemble ocean simulations, hereafter referred to as EnOFES, are extended to the end of the experiment (30 September 2008) to be compared with the oceanic component of CLERA-A.
Temporal evolution of the ensemble spread of SST and a snapshot at the beginning of the experiment (1 August 2008) are shown in Figs. 2 and 3, respectively. Two months of ensemble simulations successfully create the perturbed initial conditions of the oceanic component, and the amplitude of the ensemble spread of SST in the extratropical Northern Hemisphere (NH, blue line in Fig. 2) is about 0.1 K on 1 August 2008, which is comparable to that of the previous experimental study (about 0.2 K; Kunii and Miyoshi 2012) in which SST is artificially perturbed to investigate a typhoon in the North Pacific.
It is worth mentioning that the ensemble spread of SST in the NH is much larger than that in the extratropical Southern Hemisphere (SH, purple line), even though the ensemble spreads of surface variables in ALERA2 are larger in the SH, especially over the ice-covered regions, than those in the tropics and NH (not shown). The ensemble spread of SST in the NH changes from a rapid increase to a decrease in late July, whereas that in the SH continues to increase. Their amplitudes are finally reversed in early November when EnOFES is further extended (not shown). When EnOFES is started from 3 December instead of 2 June, the situation becomes entirely opposite: the ensemble spread of SST in the SH is much larger than that in the NH at the beginning, and their amplitudes are reversed in early May 2009 (not shown). Thus, the ensemble spread of SST shows clear seasonality: it is larger in the tropics (red line) and the summer hemisphere than in the winter hemisphere. We attribute this to the following factors. 1) Incident solar radiation is the primary source of surface heating, and the response of the ocean to the thermal forcing potentially generates the ensemble spread of SST. 2) A thick mixed layer is formed at the ocean surface in winter, and the thermal inertia of the surface ocean becomes larger than in summer. In other words, the surface ocean is less (more) sensitive to the atmospheric disturbances in winter (summer). 3) Ensemble members with higher (lower) SST than the ensemble mean tend to lose more (less) heat through surface cooling by sensible and latent heat fluxes. This mechanism acts to make ensemble members converge, or equivalently, to reduce the ensemble spread. 4) SST under sea ice is very close to the freezing temperature in our model. This also acts to reduce the ensemble spread in the ice-covered regions.
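The thermal-inertia argument in point 2 can be illustrated with a slab mixed-layer estimate, in which a surface heat flux anomaly Q warms a layer of depth h at the rate Q/(ρ c_p h). This is a back-of-the-envelope sketch, not a diagnostic from the experiment: the seawater density, the specific heat, the 10 W m⁻² flux anomaly, and the two mixed-layer depths (20 m and 100 m) are all assumed illustrative values.

```python
RHO_SEAWATER = 1025.0     # kg m-3 (ASSUMED typical value)
CP_SEAWATER = 3990.0      # J kg-1 K-1 (ASSUMED typical value)
SECONDS_PER_DAY = 86400.0


def sst_change_k(flux_anomaly_wm2, mixed_layer_depth_m, days=1.0):
    """SST change (K) of a well-mixed slab subject to a constant flux anomaly."""
    heat_capacity = RHO_SEAWATER * CP_SEAWATER * mixed_layer_depth_m  # J m-2 K-1
    return flux_anomaly_wm2 * days * SECONDS_PER_DAY / heat_capacity


# ASSUMED depths: a shallow summer layer and a deep winter layer.
summer = sst_change_k(10.0, 20.0)
winter = sst_change_k(10.0, 100.0)
print(summer, winter, summer / winter)
```

For the same flux perturbation, the response scales inversely with the mixed-layer depth, so member-to-member differences in atmospheric forcing translate into a larger SST spread where the mixed layer is shallow.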
In addition to the large-scale contrast between the summer and winter hemispheres, we find some noticeable local maxima of the ensemble spread (Fig. 3). The local maximum around 40°N in the North Pacific is probably induced by the atmospheric disturbances, and that around 10°N in the eastern tropical Pacific corresponds to the ITCZ. On the other hand, the local maxima in the eastern equatorial Pacific and Atlantic are likely caused by tropical instability waves (Legeckis 1977) in the ocean. Thus, both the atmospheric and oceanic variations may contribute to increasing the SST ensemble spread.
a. Comparison between ALERA2 and CLERA-A
Figure 4a shows the difference of the ensemble mean of forecast surface temperature (land, ocean, or sea ice surface temperature) between ALERA2 and CLERA-A averaged over the entire experimental period (from 0600 UTC 1 August to 0000 UTC 30 September 2008). SST in ALERA2 is a given boundary condition (NOAA daily OISST) rather than a prognostic variable, and the difference represents the model bias in CLERA-A. Regions of negative SST bias are found in the mid- and high latitudes in the North Pacific and Atlantic, and those of positive SST bias are found in the eastern side of the subtropical Pacific (off California and Peru) and South Atlantic (off Namibia). These are common features even among the state-of-the-art CGCMs (e.g., Richter 2015). In addition, the surface temperature in CLERA-A over the ice-covered regions is much higher than that in ALERA2. This is partly because the sea ice concentration in the Southern Ocean is underestimated in CLERA-A compared with observations, and partly because the effective surface temperature in a model grid cell is an average of the ocean and sea ice surface temperatures (the former is much warmer than the latter) weighted by sea ice concentration in CLERA-A over these regions, whereas it is purely the sea ice surface temperature in ALERA2 (see section 2a for the difference in sea ice treatment between AFES and CFES).
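The concentration-weighted average mentioned above can be written out as a one-line formula. The sketch below is illustrative only: the function name is hypothetical, and the temperatures and the 0.8 concentration are assumed values chosen to show the direction of the effect, not numbers from the experiment.

```python
def effective_surface_temperature(ice_concentration, t_ice, t_ocean):
    """Grid-cell surface temperature as a concentration-weighted mean."""
    if not 0.0 <= ice_concentration <= 1.0:
        raise ValueError("ice concentration must lie in [0, 1]")
    return ice_concentration * t_ice + (1.0 - ice_concentration) * t_ocean


# ASSUMED illustrative values in degrees Celsius.
t_ice, t_ocean = -20.0, -1.8
binary_treatment = effective_surface_temperature(1.0, t_ice, t_ocean)
fractional_treatment = effective_surface_temperature(0.8, t_ice, t_ocean)
print(binary_treatment, fractional_treatment)
```

With a binary (full ice) treatment the cell takes the ice temperature, whereas any open-water fraction pulls the cell mean toward the much warmer ocean temperature.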
Figure 4b shows the difference of the ensemble spread of forecast surface temperature between ALERA2 and CLERA-A. As mentioned above, the boundary condition of SST is identical among all the ensemble members in ALERA2, and the ensemble spread of SST is exactly zero. Therefore, the difference in the ensemble spread of SST originates in CLERA-A. The ensemble spread of SST is 0.1–0.3 K in the tropics and summer hemisphere and is maintained (or slightly increased) during the experiment in the coupled system, but is much smaller in the winter hemisphere where a thick mixed layer is formed in the ocean (see also Fig. 2). Remarkably, the introduction of atmosphere–ocean coupling increases the ensemble spread of land surface temperature, particularly over Africa and southern Eurasia, although the ensemble mean of land surface temperature is almost the same between ALERA2 and CLERA-A. This might be related to perturbed monsoonal winds from the ocean and the associated precipitation in this season. The ensemble spread of surface temperature over the ice-covered regions, particularly in the Southern Ocean, in CLERA-A is much smaller than that in ALERA2. Because the surface ocean heat content is much larger than the sea ice heat content, the fractional treatment of sea ice concentration makes the effective surface heat content in CLERA-A much larger than that in ALERA2. This implies that surface temperature over the ice-covered regions in CLERA-A is less sensitive to the atmospheric disturbances than that in ALERA2, which may lead to the decrease of the ensemble spread of surface temperature in these regions.
The difference in the ensemble mean and spread of analyzed surface pressure between ALERA2 and CLERA-A is shown in Fig. 5. The spatial patterns of the difference in ensemble mean, especially over the ocean, correspond well to those of surface temperature (Fig. 4a): higher (lower) surface temperature causes lower (higher) surface pressure in CLERA-A than in ALERA2. The magnitude of the difference is, however, less than 1 hPa except for the ice-covered regions in the Southern Ocean, which implies that CLERA-A reproduces a similar atmospheric circulation to ALERA2 because of the assimilation of atmospheric observations. The ensemble spread of surface pressure in CLERA-A is slightly larger almost everywhere, particularly over the eastern tropical Pacific.
Figure 6 compares the ensemble spread of forecast air temperature and specific humidity in the surface boundary layer (at a model level that roughly corresponds to the 970-hPa surface) between ALERA2 and CLERA-A. The overall patterns of the ensemble spread in CLERA-A are very similar to those in ALERA2. Meanwhile, the ensemble spread in CLERA-A is larger than that in ALERA2, particularly in the regions where the ensemble spread in ALERA2 is relatively small, and over the tropics and summer hemisphere, where the ensemble spread of SST is increased more by the atmosphere–ocean coupling (Fig. 4b).
A similar comparison is made for the lower troposphere (at a model level that roughly corresponds to the 860-hPa surface) in Fig. 7. The ensemble spread of forecast air temperature and specific humidity has local maxima over the eastern side of the subtropical Pacific and South Atlantic, as well as over the equatorial eastern Pacific and Atlantic, for both ALERA2 and CLERA-A. Similar to the boundary layer situation (Fig. 6), the ensemble spread in CLERA-A is larger than that in ALERA2 at this height except for the eastern side of the subtropical Pacific and South Atlantic, where low-level clouds are underestimated, probably because of the significant warm bias in surface temperature in CLERA-A (Fig. 4a). It should be noted that the difference in the ensemble spread of air temperature between ALERA2 and CLERA-A is larger in the lower troposphere (Fig. 7c) than in the surface boundary layer (Fig. 6c).
Figure 8 shows the spread–skill diagram of forecast air temperature over the ocean for ALERA2 (Figs. 8a–c) and CLERA-A (Figs. 8d–f). We define the skill as the RMS difference from the corresponding analysis of the ERA-Interim (Dee et al. 2011), and both the ensemble spread and RMS difference are averaged over the entire experimental period. Table 1 summarizes the spatiotemporally averaged values of the ensemble spread, RMS difference, and their ratio. At the 975-hPa surface, the ensemble spread in ALERA2 (Fig. 8a) is very small (0.25 K on average) in comparison with the RMS difference (1.60 K on average); that is, the system is underdispersive in the surface boundary layer. In CLERA-A (Fig. 8d), the ensemble spread is increased, especially in the region where the ensemble spread in ALERA2 is smaller than 0.2 K as shown in Fig. 6. However, the RMS difference is also increased (1.87 K), possibly because of the model bias. At the 850-hPa surface, the ratio of the ensemble spread to the RMS difference is improved in both ALERA2 (Fig. 8b) and CLERA-A (Fig. 8e). No significant differences are found between them. Thus, the spread–skill relationship is not necessarily improved by simply using a CGCM, and a better representation of model error, for example, would be needed in the future. At the 500-hPa surface, the ensemble spread is almost comparable to the RMS difference in both ALERA2 (Fig. 8c; the average ratio is 0.70) and CLERA-A (Fig. 8f; 0.75). Interestingly, the RMS difference in CLERA-A (1.11 K) is smaller than that in ALERA2 (1.16 K) at this height. This might suggest the potential benefits of using a CGCM for an atmospheric analysis instead of an AGCM.
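The three quantities summarized in Table 1 can be sketched as follows. The function names are hypothetical, and two details are assumptions because the text does not specify them: the spread is taken as the sample standard deviation (N − 1 denominator), and the ratio is formed from already-averaged spread and RMS difference. Only the 0.25 K and 1.60 K values are taken from the text.

```python
import math
import statistics


def ensemble_spread(members):
    """Ensemble standard deviation at one point (ASSUMED N - 1 denominator)."""
    return statistics.stdev(members)


def rms_difference(forecast_series, reference_series):
    """RMS difference between ensemble-mean forecasts and a reference analysis."""
    squared = [(f - r) ** 2 for f, r in zip(forecast_series, reference_series)]
    return math.sqrt(sum(squared) / len(squared))


def spread_skill_ratio(mean_spread, mean_rms_difference):
    """Ratio near 1 indicates a well-dispersed ensemble; well below 1, underdispersion."""
    return mean_spread / mean_rms_difference


# Averages quoted in the text for ALERA2 at the 975-hPa surface.
print(spread_skill_ratio(0.25, 1.60))
```

A ratio far below one, as obtained here for the surface boundary layer in ALERA2, is what is meant by an underdispersive system.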
b. Ensemble covariance and correlation among variables
Ensemble-based reanalysis has the advantage that one can calculate the statistics from the ensemble members and can estimate the relationship among the variables from relatively short time series.
Figure 9 shows the lag covariance, estimated from the ensemble, between forecast SST and precipitation induced by cumulus convection (Figs. 9a–c) and large-scale condensation (Figs. 9d–f) in CLERA-A averaged over the experimental period. When SST leads cumulus precipitation by 1 day (Fig. 9a), the ensemble-based covariance is mostly positive over the regions with heavy precipitation such as the warm water pools in the western tropical Pacific, Atlantic, and Indian Oceans and the ITCZs in the eastern tropical Pacific and Atlantic. This positive covariance implies that the ensemble members with higher SST than the ensemble mean have more cumulus precipitation on the next day possibly because of the enhanced convective activity. The ensemble-based covariance without lag between SST and cumulus precipitation (Fig. 9b) is still positive over the ITCZ regions, but it turns negative over the warm water pools. When cumulus precipitation leads SST by 1 day (Fig. 9c), the ensemble-based covariance is unclear over the ITCZ regions, whereas it is strongly negative over the warm water pools. This negative covariance implies that the ensemble members with more cumulus precipitation than the ensemble mean have lower SST on the next day possibly because of the decreased solar irradiance, a feature that is never seen when using an AGCM.
The relationship between SST and large-scale precipitation is totally different from that between SST and cumulus precipitation. When SST leads large-scale precipitation by 1 day (Fig. 9d), the ensemble-based covariance is slightly negative over the ITCZ regions, but the signal is not robust. The ensemble-based covariance without a lag between SST and large-scale precipitation (Fig. 9e) is strongly negative over the ITCZ regions in the eastern tropical Pacific and Atlantic, and the negative ensemble-based covariance becomes stronger in these regions when large-scale precipitation leads SST by 1 day (Fig. 9f).
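An ensemble-based lag covariance of the kind shown in Fig. 9 can be sketched as follows for a single grid point. This is a minimal illustration under stated assumptions, not the authors' diagnostic code: the function names and the time-by-member list layout are hypothetical, the covariance uses an N − 1 denominator, and the sign convention (positive lag means SST leads) is chosen here for convenience.

```python
def ensemble_covariance(x_members, y_members):
    """Covariance across ensemble members at one time (ASSUMED N - 1 denominator)."""
    n = len(x_members)
    mean_x = sum(x_members) / n
    mean_y = sum(y_members) / n
    return sum((x - mean_x) * (y - mean_y)
               for x, y in zip(x_members, y_members)) / (n - 1)


def lag_covariance(sst, precip, lag_steps):
    """Time-mean ensemble covariance between sst[t] and precip[t + lag_steps].

    sst and precip are lists indexed as [time][member]; member m of SST is
    paired with the same member m of precipitation. Positive lag_steps means
    SST leads precipitation.
    """
    n_times = len(sst)
    start = max(0, -lag_steps)
    stop = n_times - max(0, lag_steps)
    covs = [ensemble_covariance(sst[t], precip[t + lag_steps])
            for t in range(start, stop)]
    return sum(covs) / len(covs)


# Synthetic 3-time, 3-member example in which precipitation follows SST by one step.
sst = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
precip = [[0.0, 0.0, 0.0], [0.0, 2.0, 4.0], [2.0, 0.0, 4.0]]
print(lag_covariance(sst, precip, 1))
```

Because the statistics are taken across members at each time and only then averaged in time, a meaningful estimate is available from a record as short as the two-month experiment.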
For the future extension of our system to assimilate oceanic observational data, Fig. 10 shows the ensemble-based vertical correlation between forecast SST and tropospheric air temperature in CLERA-A. The correlation is calculated every 6 h and then averaged over the entire experimental period to reduce noise. SST has a positive correlation with air temperature in the lower troposphere (Fig. 10a) over the tropics and subtropics in the summer hemisphere, and the correlation coefficient almost reaches 0.3 in some regions. (A correlation coefficient greater than 0.250 is significant at the 95% confidence level for 60 degrees of freedom.) Such a relatively high correlation is also seen even in the midtroposphere (at a model level that roughly corresponds to the 510-hPa surface; Fig. 10b) over the warm water pool in the western tropical Pacific.
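The quoted threshold of 0.250 is consistent with a two-sided Student's t test on a correlation coefficient, for which the critical value is r = t / sqrt(dof + t²). Whether the authors used exactly this test is an assumption; the sketch below simply checks the arithmetic, taking t = 2.000 as the standard tabulated two-sided 95% value for 60 degrees of freedom. The function name is hypothetical.

```python
import math


def critical_correlation(t_critical, degrees_of_freedom):
    """Smallest |r| that is significant for a given critical t value."""
    return t_critical / math.sqrt(degrees_of_freedom + t_critical ** 2)


# t = 2.000: tabulated two-sided 95% value for 60 degrees of freedom.
print(critical_correlation(2.0, 60))
```

Correlations approaching 0.3, as reported for the lower troposphere, therefore sit above this threshold.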
Finally, we compare the remote effects of the tropical Pacific in CLERA-A with those in ALERA2. Figure 11 shows the ensemble-based horizontal correlation of forecast 2-m air temperature (color), 10-m winds (green arrows), and surface pressure (gray contour) with respect to 2-m air temperature averaged within the Niño-3.4 region (5°S–5°N, 170°–120°W). In ALERA2 (Fig. 11a), the area of high ensemble-based correlation among these variables is primarily confined to the Niño-3.4 region. The 10-m winds tend to converge into the region, and surface pressure is positively correlated with 2-m air temperature there. In CLERA-A (Fig. 11b), 2-m air temperature has a much broader and larger ensemble-based horizontal correlation over the tropics and subtropics in the summer hemisphere. The 10-m winds in CLERA-A also tend to converge into the Niño-3.4 region but from a much broader area over the tropical Pacific. In addition, 10-m winds over the Arabian Sea, Bay of Bengal, South China Sea, and Philippine Sea are correlated with 2-m air temperature within the Niño-3.4 region, and their directions are opposite to the typical monsoonal winds in this season. Surface pressure over the tropical Pacific is negatively correlated with 2-m air temperature within the Niño-3.4 region. In the ensemble ocean simulation driven by each member of ALERA2 (EnOFES), tropical SST exhibits similar (but weaker) ensemble-based correlation with respect to SST averaged within the Niño-3.4 region (not shown). Therefore, this large-scale structure of ensemble-based horizontal correlation over the tropics in CLERA-A might be induced by the ocean and amplified through some coupled atmosphere–ocean processes, although a detailed analysis is beyond the scope of this paper.
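The type of diagnostic shown in Fig. 11 can be sketched as the correlation, across ensemble members, between a box-averaged index and the value at each grid point. The sketch below is illustrative and not the authors' code: the function names and the [latitude][longitude] list layout are hypothetical, and the cos(latitude) area weighting of the box average is an assumption. The box limits are those given in the text.

```python
import math


def in_nino34(lat, lon):
    """True inside 5S-5N, 170W-120W (190E-240E); lon may be given east or west."""
    lon_east = lon % 360.0
    return -5.0 <= lat <= 5.0 and 190.0 <= lon_east <= 240.0


def nino34_mean(field, lats, lons):
    """Box average of field[j][i] (ASSUMED cos(latitude) area weighting)."""
    total = weight_sum = 0.0
    for j, lat in enumerate(lats):
        weight = math.cos(math.radians(lat))
        for i, lon in enumerate(lons):
            if in_nino34(lat, lon):
                total += weight * field[j][i]
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no grid points inside the Nino-3.4 box")
    return total / weight_sum


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_with_nino34(member_fields, lats, lons, j, i):
    """Correlation across members between the box index and grid point (j, i)."""
    index = [nino34_mean(field, lats, lons) for field in member_fields]
    point = [field[j][i] for field in member_fields]
    return pearson(index, point)
```

In practice this would be evaluated at each analysis time for every grid point and then averaged in time; the wind and surface pressure panels follow by replacing the point values with those variables.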
5. Summary and conclusions
In this study, to enhance the capability of the local ensemble transform Kalman filter (LETKF) with the Atmospheric General Circulation Model for the Earth Simulator (AFES), a new system has been developed by replacing the AFES with the Coupled Atmosphere–Ocean General Circulation Model for the Earth Simulator (CFES). Using the coupled system, two months of experimental retrospective analysis, CLERA-A, have been conducted from 1 August 2008 by assimilating atmospheric observational data (NCEP PREPBUFR) every 6 h to update only the atmospheric variables.
Comparison of the CLERA-A with the AFES–LETKF experimental ensemble reanalysis, version 2 (ALERA2), revealed that underestimation of the ensemble spread near the surface, which may arise from the use of identical surface boundary conditions such as SST and sea ice distribution for all ensemble members and the lack of atmosphere–ocean interaction in ALERA2, was mitigated by using CFES instead of AFES. Moreover, the forecast skill in the midtroposphere is somewhat improved in CLERA-A. In addition, surface atmospheric variables over the tropical Pacific show basinwide horizontal correlation in ensemble space in CLERA-A but not in ALERA2 (Fig. 11), despite the fact that the same observational data are assimilated using the same method. This suggests that the use of a coupled GCM rather than an atmospheric GCM could be beneficial in some aspects even for atmospheric reanalysis with an ensemble-based data assimilation system.
On the other hand, oceanic observational data are not assimilated in the current system, and significant SST biases remain in CLERA-A, which may degrade the atmospheric analyses. Therefore, it is desirable to restore SST to some analysis products (e.g., Laloyaux et al. 2016) or to assimilate oceanic data into the coupled system (e.g., Saha et al. 2010; Lea et al. 2015) as the next step of assimilation system development. The relatively high ensemble-based correlation between SST and the tropospheric air temperature (Fig. 10) suggests that the assimilation of oceanic data would help further improve the atmospheric analyses as well as the oceanic analyses by using cross covariances between the atmospheric and oceanic variables. Meanwhile, the following issues remain for future work toward a fully coupled ensemble DA system based on CFES: localization scales in both the horizontal and vertical directions across the domains (e.g., Frolov et al. 2016), possibly with their spatial and seasonal variations taken into account; other techniques for reducing sampling noise to retain well-conditioned cross-domain covariances (e.g., Smith et al. 2018); treatment of the model errors/biases in the coupled system (e.g., Fowler and Lawless 2016); and the difference in appropriate assimilation windows between the atmosphere and ocean arising from the difference in their typical time scales and the amounts of available observations.
The ALERA2 dataset is available from http://www.jamstec.go.jp/alera/alera2.html. To access the CLERA-A dataset, contact the corresponding author (email@example.com). PREPBUFR, which is compiled by NCEP and archived at UCAR, is used as the observations (http://rda.ucar.edu). The authors are grateful to three anonymous reviewers for their valuable comments. This work was supported by MEXT/JSPS KAKENHI (22106008, 22244057, 22740319, 25400474, and 17K05663). Numerical simulations were carried out on the Earth Simulator with the support of JAMSTEC.