The present study examines the correspondence between short- and long-term systematic errors in five atmospheric models by comparing 16 five-day hindcast ensembles from the Transpose Atmospheric Model Intercomparison Project II (Transpose-AMIP II) for July–August 2009 (short term) to climate simulations from phase 5 of the Coupled Model Intercomparison Project (CMIP5) and AMIP for the June–August mean conditions of 1979–2008 (long term). Because the short-term hindcasts were conducted with the identical climate models used in the CMIP5/AMIP simulations, one can diagnose the time scale over which systematic errors in these climate simulations develop, yielding insights into their origin through a seamless modeling approach.
The analysis suggests that most systematic errors of precipitation, clouds, and radiation processes in the long-term climate runs are present by day 5 in the ensemble-average hindcasts of all models. Errors typically saturate after a few days of hindcasts with amplitudes comparable to the climate errors, and the impacts of initial conditions on the simulated ensemble mean errors are relatively small. This robust bias correspondence suggests that these systematic errors across different models are likely initiated by model parameterizations, since the atmospheric large-scale states remain close to observations in the first 2–3 days. However, biases associated with model physics can affect large-scale states such as zonal winds, 2-m temperature, and sea level pressure by day 5, and the analysis further indicates a good correspondence between short- and long-term biases for these large-scale states. Therefore, improving individual model parameterizations in the hindcast mode could help most climate models improve their simulated climate mean state and potentially their future projections.
Despite the significant efforts made in climate modeling since phase 3 of the World Climate Research Programme (WCRP) Coupled Model Intercomparison Project (CMIP3), large systematic errors in precipitation, clouds, and water vapor remain in the simulated climate mean state for the majority of climate models in the recent CMIP5 (Jiang et al. 2012; Li et al. 2012; Klein et al. 2013). Understanding the origin of these systematic errors is challenging because nonlinear feedback processes in the climate system make it difficult to unambiguously identify causal relationships.
Nevertheless, recent studies from individual climate models demonstrate that a numerical weather prediction (NWP) technique (Phillips et al. 2004; Williams et al. 2013) is useful for understanding climate model errors and facilitates model parameterization improvements (Xie et al. 2004; Boyle et al. 2005; Williamson et al. 2005; Klein et al. 2006; Rodwell and Palmer 2007; Boyle et al. 2008; Williams and Brooks 2008; Xie et al. 2008; Hannay et al. 2009; Martin et al. 2010; Barton et al. 2012; Kamae and Watanabe 2012; Lin et al. 2012; Medeiros et al. 2012). Some studies (Brown et al. 2012; Xie et al. 2012; Ma et al. 2013) further indicate that many errors present in short-term weather forecasts or hindcasts bear a strong resemblance to the systematic errors in the corresponding Atmospheric Model Intercomparison Project (AMIP)-type climate simulations. For example, Xie et al. (2012) examined this correspondence in the National Science Foundation (NSF)–Department of Energy (DOE) Community Atmosphere Model, versions 4 and 5.1, and found that most climate errors in the annual mean fields, particularly those associated with moist processes, are apparent in the mean day-2 hindcast and grow steadily with hindcast lead time. Examples are the excessive precipitation in much of the tropics and the overestimate of net shortwave absorbed radiation in the stratocumulus cloud decks over the eastern subtropical oceans and the Southern Ocean at about 60°S.
Exploring such correspondence between short- and long-term systematic errors is an important step toward understanding the origin of climate errors. A good correspondence may indicate that the errors are likely the result of parameterization errors associated with fast physics, since the large-scale state remains close to observations in the first few days of hindcasts. However, there can also be a strong local interaction between the parameterized physics and the dynamics, which can exacerbate errors; an extreme example of such an interaction is discussed in Williamson (2013). Conversely, a poor correspondence may suggest that the errors are the result of more slowly evolving feedbacks. For errors that do exhibit a correspondence, therefore, reducing errors in the hindcast mode is likely to lead to improved climate simulations. Indeed, there are indications that improvements in fast physics processes such as clouds and precipitation can enhance forecast skill and lead to better performance in the mean state of climate simulations (Hurrell et al. 2009; Martin et al. 2010).
In this study, we extended the work of Xie et al. (2012) to further examine the correspondence between short- and long-term systematic errors using hindcasts from the Transpose-AMIP II (TAMIP; Williams et al. 2013; http://www.transpose-amip.info) and AMIP simulations from the WCRP CMIP5 archive (CMIP5/AMIP). TAMIP is an international model intercomparison project endorsed by the World Meteorological Organization (WMO) Working Group on Numerical Experimentation (WGNE) and Working Group on Coupled Modeling (WGCM). Its goal is to better understand the causes of CMIP5 model errors. Using the identical climate model to that used in the CMIP5 AMIP experiment, each participating group performed 64 five-day hindcasts during the Year of Tropical Convection (YOTC; May 2008–April 2010; Waliser et al. 2012). Details of the experiment design are described in section 2.
Our goal is to identify the systematic errors in the free-running climate simulations and explore whether the correspondence between short- and long-term systematic errors previously reported in individual models (Brown et al. 2012; Xie et al. 2012; Ma et al. 2013) is robust across other climate models, based on simulations from the TAMIP and CMIP5/AMIP. We selected the northern summer season for analysis as an example to demonstrate the correspondence, although the bias correspondence is also present in other seasonal and annual means (e.g., Xie et al. 2012). The remainder of this manuscript is organized into three sections: Section 2 introduces the validation datasets, as well as the TAMIP and CMIP5/AMIP models. Section 3 examines the correspondence between short- and long-term biases from the TAMIP and CMIP5/AMIP simulations. Section 4 summarizes our findings and draws conclusions.
2. Validation datasets and models
a. Validation datasets
To evaluate model-simulated mean biases for both hindcasts and climate runs, we obtained observational fields from several sources (Table 1). Rainfall observations are taken from the Tropical Rainfall Measuring Mission (TRMM; Huffman et al. 2007) 3B42 version 6 (adjusted merged-infrared precipitation estimate), which has 3-hourly temporal resolution and a horizontal resolution of 0.25° latitude by 0.25° longitude. Both daily and monthly observed net shortwave flux at the top of the atmosphere (SWAbs) and outgoing longwave radiation (OLR) are obtained from the Clouds and the Earth's Radiant Energy System (CERES) Energy Balanced and Filled (EBAF) observations. These fields are derived from the SYN1deg product with a horizontal resolution of 1° latitude by 1° longitude (Doelling et al. 2013; http://ceres.larc.nasa.gov/products.php?product=SYN1deg). For total cloud fraction, 3-hourly and monthly fields are obtained from the International Satellite Cloud Climatology Project (ISCCP) D1 and D2 datasets, respectively (Rossow and Schiffer 1999; http://isccp.giss.nasa.gov/docs/docSoftware.html).
Sea level pressure, 2-m temperature, and other state variables from the hindcasts are compared to the operational analysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) provided to the YOTC project (Waliser et al. 2012; available at http://data-portal.ecmwf.int/data/d/yotc). The analysis is available at a horizontal resolution of 0.125° latitude by 0.125° longitude. We also compare the latest ECMWF global reanalysis, the ECMWF Interim Re-Analysis (ERA-Interim; Dee et al. 2011), to the CMIP5/AMIP simulations. This dataset has a horizontal resolution of 1.5° latitude by 1.5° longitude and 37 pressure levels in the vertical. More details on model cycle 31R1 used in ERA-Interim can be found online (at http://www.ecmwf.int/research/ifsdocs/CY31r1/index.html).
b. Transpose-AMIP II and CMIP5/AMIP
At the time this paper was written, six sets of climate model hindcast experiments were available in the TAMIP archive (Table 2; see table for expanded model names): CCSM4, CNRM-CM5, HadGEM2-Ai1, HadGEM2-Ai3, IPSL-CM5A-LR, and MIROC5. As mentioned in the introduction, we selected the northern summer hindcasts (July–August 2009) for analysis. A schematic diagram of the hindcast procedure and our analysis method is presented in Fig. 1. For each modeling group, a 5-day hindcast was performed for each of the specified starting times (Table 3), with the atmospheric state variables (velocity, temperature, humidity, and optionally surface pressure) initialized from the ECMWF YOTC analysis. The method of generating initial conditions for the land models is based on Boyle et al. (2005). We note that the starting times represent different phases of the diurnal cycle; however, our analysis is of 24-h averages and thus always spans one full diurnal cycle. More details regarding the impact of different hindcast starting hours are discussed in appendix A. With 16 hindcasts available from each model in the northern summer, we first calculate the simulated biases against the available observations for each hindcast and then form hindcast bias ensembles by lead time: the day-1 ensemble covers hindcast hours 0–24, the day-2 ensemble covers hours 24–48, and so on for the day-3–5 ensembles. As mentioned above, a second set of hindcasts from HadGEM2-A is also available (HadGEM2-Ai3), for which the Met Office analysis was used as the initial condition, allowing us to assess the dependence on initial conditions. A detailed experiment setup for TAMIP is presented in Williams et al. (2013) and online at http://www.transpose-amip.info. We present evidence in appendix B that the ensemble size of 16 for the hindcasts is adequate to identify the primary systematic errors in a model's climate.
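The grouping of hindcast output into lead-time bias ensembles described above can be sketched as follows. This is a minimal illustration with synthetic arrays; the function and variable names are ours, not part of the TAMIP specification, and real processing would operate on the archived model output files.

```python
import numpy as np

def lead_time_bias_ensembles(hindcasts, obs, hours_per_day=24, n_days=5):
    """Group hindcast biases into day-1 ... day-5 lead-time ensembles.

    hindcasts : array (n_starts, n_hours, nlat, nlon) of hourly model fields
    obs       : array of the same shape with matching observations
    Returns a dict mapping lead day -> ensemble-mean bias map (nlat, nlon),
    averaging first over each 24-h window and then over the start times.
    """
    bias = hindcasts - obs                       # bias at every start time and hour
    ensembles = {}
    for day in range(1, n_days + 1):
        h0, h1 = (day - 1) * hours_per_day, day * hours_per_day
        ensembles[day] = bias[:, h0:h1].mean(axis=(0, 1))
    return ensembles
```

With 16 July–August 2009 start times per model, `n_starts` would be 16, and the day-1 ensemble (hours 0–24) would be discarded later because of spinup.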
To obtain long-time-scale climate mean biases, we also used the corresponding CMIP5/AMIP simulations, in which the atmosphere model is subject only to the boundary conditions of observed sea surface temperatures, sea ice distributions, and other variables usually prescribed in atmosphere-only simulations. We selected the June–August (JJA) mean over the entire 1979–2008 period of the AMIP simulations. The detailed experiment design for CMIP5/AMIP is described in Taylor et al. (2012). All the validation datasets and model simulations are linearly interpolated to a common grid of 2° longitude by 2° latitude.
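The regridding to a common grid can be sketched with bilinear interpolation in SciPy. This is a simplified example under our own assumptions (the helper `regrid_linear` is hypothetical, and handling of longitude wraparound and missing data is omitted):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def regrid_linear(field, lat, lon, new_lat, new_lon):
    """Bilinearly interpolate a (lat, lon) field onto a new rectilinear grid."""
    interp = RegularGridInterpolator((lat, lon), field,
                                     bounds_error=False, fill_value=None)
    glat, glon = np.meshgrid(new_lat, new_lon, indexing="ij")
    pts = np.column_stack([glat.ravel(), glon.ravel()])
    return interp(pts).reshape(glat.shape)

# Example: regrid a synthetic 1-degree field to a 2-degree analysis grid.
lat1 = np.arange(-89.5, 90.0, 1.0)
lon1 = np.arange(0.5, 360.0, 1.0)
field = np.cos(np.deg2rad(lat1))[:, None] * np.ones_like(lon1)
lat2 = np.arange(-89.0, 90.0, 2.0)
lon2 = np.arange(1.0, 360.0, 2.0)
coarse = regrid_linear(field, lat1, lon1, lat2, lon2)
```

A production regridding would typically use conservative remapping for fluxes; linear interpolation is shown here only because the text states the fields were linearly interpolated.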
c. Issues of initial spinup
Since all the models examined here except HadGEM2-Ai3 are initialized with a "foreign" analysis (the ECMWF analysis), it is worthwhile to examine how model spinup affects our analysis. Figure 2 shows the ensemble and global mean precipitation and total cloud fraction (from the ISCCP simulator) from all the hindcasts for each model. For HadGEM2-Ai1, HadGEM2-Ai3, IPSL-CM5A-LR, and MIROC5, the global mean precipitation and cloud fraction reach a relative equilibrium state within a few hours of the start of the hindcast. CNRM-CM5 and CCSM4 take longer but reach a relative equilibrium state after about 24 h. Even when initialized with a different analysis, HadGEM2-A produces very similar global mean precipitation and cloud fraction after about 24 h. This suggests that initial spinup affects precipitation and cloud fraction in the day-1 hindcast ensembles but has minimal impact on day-2 or later hindcasts. We also note that the spread among ensemble members is relatively small (plus/minus one standard deviation; gray shading), especially during the first 24 h. In our analyses, therefore, we only examine the hindcast ensembles from day 2 to day 5 to avoid initial spinup impacts.
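Global means such as those in Fig. 2 are conventionally area weighted. A minimal sketch, assuming a regular latitude–longitude grid (the paper does not specify its exact averaging code), weights each latitude band by the cosine of latitude:

```python
import numpy as np

def global_mean(field, lat):
    """Area-weighted global mean of a (lat, lon) field.

    Each latitude band is weighted by cos(latitude), proportional to the
    area of grid cells on a regular latitude-longitude grid.
    """
    weights = np.cos(np.deg2rad(lat))
    return np.average(field.mean(axis=-1), weights=weights)
```

Applying this to each output time of a hindcast yields spinup time series of the kind shown in Fig. 2.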
3. Correspondence between short- and long-term systematic errors
a. How well do the models simulate the large-scale states and the fields associated with fast physical processes?
We first examine the departure of the simulated mean large-scale states from observations. Figure 3 shows the performance of the large-scale state variables (zonal wind, temperature, and specific humidity) in both the upper (250 hPa) and lower troposphere (850 hPa) from the TAMIP hindcasts and CMIP5/AMIP in a Taylor diagram (Taylor 2001). Nearly all the large-scale state variables in the day-2 hindcasts show normalized spatial standard deviations close to one and very high correlations (~0.99) with their corresponding observations. This very small departure indicates that the large-scale states in the hindcasts remain close to observations. The performance of the day-5 hindcasts and AMIP simulations is still very good, with correlations of approximately 0.95 or larger. Some day-5 hindcast statistics are close to the AMIP statistics, implying a bias correspondence in the large-scale state between the short-term hindcasts and the long-term climate; we demonstrate this in the next sections.
Figure 3 also shows the performance of precipitation and total cloud fraction (produced by the ISCCP simulator; Bodas-Salcedo et al. 2011), which are closely linked to the model physical parameterizations. Because we require cloud fraction output from the ISCCP simulator, only four models are available for this analysis: CCSM4, CNRM-CM5, HadGEM2-Ai1, and MIROC5. For all the models, the hindcast performance for these two fields is relatively poor compared to that for the large-scale states. This suggests that the precipitation and cloud biases in the first few days of the hindcasts are most likely the result of model parameterizations, since the large-scale states remain close to observations.
b. Bias correspondence in fields associated with fast physics
To demonstrate the bias correspondence, we plot the pattern statistics (spatial correlations and standard deviations) between the biases in the TAMIP and CMIP5/AMIP experiments on a Taylor diagram (Fig. 4). Unlike the canonical Taylor diagram, which uses the observed field as the reference, the reference field for each model here is the mean bias of its corresponding AMIP simulation with respect to observations. In this way, we display the similarity between the systematic errors in the hindcast and climate experiments. Because of the limited number of hindcast ensemble members and the limited availability of high-temporal-frequency observations with global coverage, we only examine four two-dimensional fields from the TAMIP hindcasts: biases in precipitation, total cloud fraction, SWAbs, and OLR. These variables are among the most important geophysical fields that a climate model is required to simulate accurately. To reduce the impact of synoptic variability, only grid points with biases that are statistically significant at the 95% confidence level (two-tailed t tests) in both the hindcasts and the AMIP runs are used in the pattern statistics for the Taylor diagrams.
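The pattern statistics behind Fig. 4 can be sketched as follows. This is our own minimal reconstruction, not the paper's code; area weighting of grid points, which a full calculation would include, is omitted for brevity:

```python
import numpy as np
from scipy import stats

def significant_mask(bias_samples, alpha=0.05):
    """Two-tailed one-sample t test at each grid point: is the mean bias
    (over ensemble members or years) significantly different from zero?"""
    _, p = stats.ttest_1samp(bias_samples, 0.0, axis=0)
    return p < alpha

def pattern_stats(hindcast_bias, amip_bias, mask):
    """Centered pattern correlation and standard-deviation ratio of the
    hindcast bias against the AMIP bias (the Taylor-diagram reference),
    using only grid points flagged True in `mask`."""
    x = hindcast_bias[mask] - hindcast_bias[mask].mean()
    y = amip_bias[mask] - amip_bias[mask].mean()
    corr = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return corr, x.std() / y.std()
```

Here `mask` would be the logical AND of `significant_mask` applied to the 16-member hindcast ensemble and to the 30 AMIP JJA means, matching the joint-significance criterion described above.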
The spatial standard deviations of the hindcast biases for precipitation and SWAbs are generally larger than their AMIP counterparts (>1), and the values increase with hindcast lead time. Larger standard deviations in the short-term hindcasts are expected because highly transient synoptic variability dominates, given the small size of the hindcast ensemble (appendix B); the AMIP mean biases, by contrast, are calculated from 30 yr of monthly mean fields. For the bias correlation, the day-5 hindcast biases in nearly all the selected fields from all five models correlate at greater than 0.5 with their corresponding AMIP biases; for some fields, such as cloud fraction and SWAbs, the bias correlations are close to or larger than 0.8. It is interesting that the bias correlations of all the fields from CCSM4 increase with hindcast lead time, suggesting that its hindcast biases gradually evolve toward the AMIP biases. For the other models, the correlations change little from day 2 to day 5, suggesting that these biases saturate quickly, by the day-2 hindcast ensemble. The particular behavior of a model likely depends largely on the parameterization suite it uses and on how fast the biases from parameterizations interact with the large-scale states in the hindcasts. The similar correlations in most fields between HadGEM2-Ai1 and HadGEM2-Ai3 suggest that bias patterns in the short-term hindcasts are not strongly affected by different initial conditions, as long as the initial conditions are produced by a well-established modern NWP analysis system. Nevertheless, the day-5 hindcast biases in all the fields do not converge fully to their corresponding AMIP biases, possibly because some feedback processes or compensating errors require longer than 5 days to develop.
Furthermore, differences in initial conditions and in ensemble sizes between the hindcasts (16) and the AMIP free runs (30 JJAs) also affect the correlations. We also note that the hindcasts cover July–August 2009 while the climate runs cover June–August of 1979–2008, so interannual variability and the different sampling periods can also introduce differences. Given these differences, the bias pattern correlations of 0.5–0.9 shown in Fig. 4 are quite robust. Bearing these sampling issues in mind, the results suggest that errors in short-term hindcasts resemble those in climate simulations.
We next focus on identifying regional systematic biases in the long-term climate simulations and short-term hindcasts, as well as on their bias correspondence. An effective way to illustrate systematic biases of climate models is to examine the multimodel mean (MMM) bias of the simulated fields with statistical significance t tests. For a given field at a certain region, if most models produce biases of the same sign (systematic/robust bias), the MMM bias stands out as statistically significant. If most models do not show the same sign of bias or the bias is small, the mean bias is usually not statistically significant because of the large standard deviation (spread) or the small mean bias of sampled models.
Figures 5a and 5b show the CMIP5/AMIP MMM JJA precipitation biases from the five climate models (HadGEM2-Ai3 from the Met Office is not included here or in the later similar figures). Only regions where biases are statistically significant at the 95% confidence level are color shaded. Although only five models are examined here, the bias pattern in Fig. 5 is very similar to the JJA precipitation biases from a much larger set of models (appendix C; see Fig. 1a in Ma et al. 2013). We find systematic wet biases over the central and subtropical Pacific, the western Indian Ocean, the western Atlantic, the subtropical Atlantic, the Southern Ocean, most of the Asian continent, eastern tropical Africa, and the southern Arabian Peninsula. There are systematic dry biases over the tropical western and eastern Pacific, East and Southeast Asia, western tropical Africa, tropical and southeastern South America, and central North America. Many of these systematic biases, such as the wet biases in the central Pacific and the dry biases over the western Pacific, have been previously reported for CMIP3 (Lin 2007) and in individual model studies (Xie et al. 2012). Also note that the overall precipitation bias pattern does not change much when a different observational reference is used, such as data from the Global Precipitation Climatology Project.
To further examine whether these systematic biases can be identified through short-term hindcasts, we also calculated the mean precipitation biases for the day-2 (Fig. 5a) and day-5 (Fig. 5b) hindcasts for the individual models and compared them with the MMM biases from the AMIP runs. For any given grid point, if at least four of the five climate models (HadGEM2-Ai3 is not included) have the same sign in the hindcast bias as in the MMM AMIP bias, the grid point is stippled in Fig. 5 to indicate that the bias can be identified in the short-term (2- or 5-day) hindcasts. Although the stippled regions are sensitive to the number of ensemble members in the hindcasts, Fig. 5 still shows that many systematic biases in the climate runs can be identified from the day-2 and day-5 hindcasts; an exception is the dry bias over the tropical western Pacific. The very similar stippling for the day-2 and day-5 hindcasts also indicates that most systematic precipitation biases develop by day 2. These features also hold for the other fields analyzed in this study; for simplicity, we show only the day-5 hindcast results in the later MMM bias plots. We also examined the annual MMM biases from the AMIP runs (figure not shown) and found that the annual and JJA MMM bias patterns are similar, with a pattern correlation of approximately 0.81, further suggesting that the systematic precipitation biases have weak seasonal dependence.
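The stippling criterion can be expressed compactly. This is an illustrative sketch with hypothetical array names; `min_agree=4` encodes the four-of-five-models threshold used in Fig. 5:

```python
import numpy as np

def stipple_mask(model_hindcast_biases, mmm_amip_bias, min_agree=4):
    """True where at least `min_agree` models' hindcast biases share the
    sign of the multimodel-mean AMIP bias at that grid point.

    model_hindcast_biases : array (n_models, nlat, nlon)
    mmm_amip_bias         : array (nlat, nlon)
    """
    same_sign = np.sign(model_hindcast_biases) == np.sign(mmm_amip_bias)
    return same_sign.sum(axis=0) >= min_agree
```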
Figure 6 displays the day-2 and day-5 ensemble mean precipitation biases from the individual models, along with the JJA mean precipitation biases from the corresponding CMIP5/AMIP simulations. As expected, the day-2 and day-5 hindcast ensembles from each model are noisier than its AMIP biases. Nevertheless, the day-5 ensemble precipitation biases from the individual models do share a similar pattern with the AMIP biases.
Figure 7 shows the MMM JJA cloud fraction biases with reference to the ISCCP observations. These climate models produce too little cloud fraction over almost the entire globe, except over the eastern Pacific and Atlantic. The low bias is largest in magnitude over the Northern Hemisphere continents but is also present over the marine subtropical stratocumulus regions, such as the Peruvian, Namibian, and Australian stratocumulus decks, as has been extensively reported (Wood 2012). Day-5 hindcasts from most models identify the low cloud fraction bias in most locations over land and over the Peruvian, Namibian, and Australian marine stratocumulus regions, suggesting that these biases are robust across models and are most likely the result of model parameterizations (Klein et al. 2013).
SWAbs biases are negative (smaller shortwave fluxes into the atmosphere) over most tropical and subtropical oceans, as well as over Western Australia, eastern and southern Africa, the Tibetan Plateau, and Arctic regions (Fig. 8). Positive SWAbs biases (larger shortwave fluxes into the atmosphere) occur over most Northern Hemisphere continents and Antarctica, as well as over the northwestern tropical Pacific, the Southern Ocean, and the Californian, Peruvian, and Namibian marine stratocumulus regions. Day-5 hindcasts show similar bias patterns, except over the northwestern tropical Pacific, owing to the highly transient variability in the hindcasts. Comparing the SWAbs biases with the cloud fraction biases, it is interesting that in some regions, such as most tropical and subtropical oceans, too little SWAbs occurs despite the total cloud fraction being correct or too small. This points to biases in the simulated cloud condensate (cloud liquid and ice), since cloud optical depth is important to the net radiation budget (Kay et al. 2012; Nam et al. 2012). We note that the negative SWAbs bias regions in the Pacific and Indian Oceans coincide with maxima in the precipitation biases, suggesting that too little SWAbs in these regions results from the mislocation of precipitation and the accompanying highly reflective clouds.
Figure 9 shows the MMM OLR biases, which are closely linked to the performance of clouds and column temperatures. There are negative biases over the western Indian Ocean, the central Pacific, the western Atlantic, the Southern Ocean, and the Australian continent, and positive biases over most continental regions, the marine stratocumulus regions, Southeast Asia, the Maritime Continent, the western Pacific, and the eastern Atlantic. In the tropics especially, the OLR biases correlate strongly with the precipitation/convection biases (Fig. 5). The positive biases over midlatitude continents are likely linked to the reduced cloud fraction and higher surface temperatures (shown later in Figs. 11 and 12). Most of these biases are present in the day-5 hindcasts, except for the biases over the subtropical south and southeast Pacific.
Although we cannot examine all the fields associated with moist processes, primarily because of the lack of high-temporal-frequency observations with global coverage, both our analysis above and previous studies (Brown et al. 2012; Xie et al. 2012; Ma et al. 2013) suggest that biases in moist processes (precipitation and cloud fraction) and in associated processes such as radiation ("fast processes") can be identified in short-term hindcasts for most climate models.
c. Bias correspondence in the large-scale states
In this subsection, we further demonstrate that errors in the modeled moist and associated processes can quickly feed back onto large-scale fields that are strongly influenced by these fast physics, and that these large-scale states also show a correspondence between short- and long-term systematic biases.
Figure 10 shows the bias correspondence for 2-m temperature T2m and sea level pressure (SLP) between TAMIP hindcasts and CMIP5/AMIP climate simulations (reference fields) in a Taylor diagram. Both T2m and SLP hindcasts are found to have strong bias correspondence with their corresponding climate runs. The bias correlations for most day-5 hindcasts are close to 0.8 with their standard deviations comparable to the climate runs.
Possible reasons for such a high correspondence are further examined in the MMM CMIP5/AMIP biases (Figs. 11, 12). Large T2m biases (Fig. 11) are found only over land, since SSTs are prescribed in these simulations. Cold biases occur over subtropical land regions, the North Pacific, the Tibetan Plateau, and both poles; warm biases occur over most of Eurasia, tropical Africa, North America, and tropical South America. The strong systematic warm and dry (see Fig. 5) biases over North America are consistent with a previous study by Klein et al. (2006) using hindcasts from the Geophysical Fluid Dynamics Laboratory Atmospheric Model, version 2. Their results suggest that the underestimate of precipitation and clouds in that region results in a soil moisture deficit and a corresponding high bias in surface shortwave flux, both of which amplify the warm temperature bias. It is also interesting that most large climate biases are present in the day-5 hindcasts. One possible explanation for the T2m biases is that initial biases in precipitation or clouds affect the surface energy budget and in turn modify boundary layer properties such as T2m; land–atmosphere interactions can then enhance these biases through feedback processes in the long-term climate runs (Klein et al. 2006). Another possible explanation is that the methods suggested by TAMIP for generating initial conditions for the land models very likely introduce initial-condition biases in the land simulations. For example, some of these initial-condition errors may result from errors in the precipitation and radiation forcing of the land model during the nudging process used to generate its initial condition (Boyle et al. 2005). These biases are later amplified in the hindcasts through land–atmosphere interactions.
High SLP biases occur mostly over high latitudes and the North Pacific and Atlantic; low SLP biases occur mostly over land, such as Eurasia, the Arabian Peninsula, eastern Africa, the Maritime Continent, North America, and tropical South America (Fig. 12). Interestingly, most SLP biases are also present in the hindcasts. The biases over land could be connected to the warm temperature biases seen in Fig. 11 via thermal lows. Previous studies have suggested that land surface processes can affect the surface fluxes, which in turn modify the near-surface temperature and lead to changes in surface pressure (Ma et al. 2010; Xue et al. 2012). The high SLP bias correspondence over mid- and high-latitude oceans also indicates a correspondence in the large-scale states.
Figure 13 further shows the bias correspondence for zonal winds at selected pressure levels between the TAMIP hindcasts and the CMIP5/AMIP climate simulations in a Taylor diagram. There are large correlations between the hindcast biases and AMIP biases for all the models at most tropospheric levels, with most correlations between 0.4 and 0.8. The spatial standard deviations of the hindcast biases at tropospheric levels are also comparable to those of the AMIP biases. For levels above the tropopause, smaller standard deviations indicate that the bias magnitude is small in the hindcasts, except for CNRM-CM5. These features are similar for the specific humidity and temperature fields (not shown).
Figure 14 displays the day-2 and day-5 ensemble mean biases of the zonal mean zonal velocity from the individual TAMIP models, as well as the biases from the corresponding CMIP5/AMIP simulations. The hindcast bias magnitude at day 5 in the troposphere is comparable to AMIP for all the models. However, the biases above the tropopause are much smaller than the AMIP biases, except for CNRM-CM5 and HadGEM2-Ai3. CNRM-CM5 is not expected to be realistic in the lower stratosphere since it is a low-top version (31 vertical levels with the top level at 10 hPa); this model version also has no spontaneous quasi-biennial oscillation (QBO), so the QBO signal included in the initial conditions vanishes rapidly, which appears consistent with the positive zonal wind bias in the lower stratosphere in its TAMIP runs. The large day-2 biases in the tropics in HadGEM2-Ai3 are mostly a result of the differences between the ECMWF and Met Office analyses, since we use the ECMWF analysis as our observational reference while HadGEM2-Ai3 is initialized with the Met Office analysis. Nevertheless, most bias patterns in the zonal velocity from the hindcasts are similar to those in the AMIP simulations, consistent with the bias correspondence analysis in Fig. 13.
The reasons for the bias correspondence in zonal velocity (and in temperature or specific humidity) may be linked to interactions between convection and the large-scale states, which we diagnose through biases in the midtropospheric vertical velocity. Figure 15 shows the MMM biases of the CMIP5/AMIP 500-hPa vertical velocity. The larger biases in the tropics are mostly well correlated with the precipitation biases (Fig. 5), except for some large biases over steep terrain that likely arise from the model numerics. Most of these biases are also present in the day-5 hindcasts for most models, suggesting a quick interaction between the convection/precipitation processes and the large-scale dynamics, which can in turn affect the model parameterized physics by day 5.
4. Summary and discussion
In this study, we examined the robustness of whether systematic errors in the long-term mean state of climate models are present in just a few days of model integration when initialized with realistic operational analysis data (NWP mode). Our approach is to examine the correspondence between short- and long-time-scale systematic errors in the simulations from the TAMIP and CMIP5/AMIP.
We examined the bias correspondence (pattern statistics in a Taylor diagram) between hindcast and climate simulations of identical models. Even though the large-scale states (zonal velocity, temperature, and specific humidity) remain close to observations in the first 2 days of the hindcasts, the global bias patterns of selected fields in the day-5 hindcasts bear a strong resemblance to those in the long-term AMIP runs, with bias correlations ranging from 0.5 to 0.9. These fields (precipitation, cloud fraction, SWAbs, and OLR) are among the most important geophysical quantities that a climate model needs to simulate accurately. The simulated errors typically saturate after a few days of hindcasts, with amplitudes comparable to the climate errors. The impact of initial conditions on the simulated mean errors is relatively small, as suggested by additional HadGEM2-Ai3 hindcasts initialized with the Met Office operational analysis, although this issue was not thoroughly studied here.
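The pattern statistics summarized in the Taylor diagrams (centered spatial correlation and the ratio of spatial standard deviations between a hindcast bias field and a climate bias field) can be sketched as follows. This is a minimal illustration on synthetic fields, not the actual model output; the function name, grid, and cosine-latitude weighting are our assumptions about a typical implementation.

```python
import numpy as np

def taylor_stats(test, ref, lat):
    """Centered pattern correlation and standard-deviation ratio between a
    test bias field and a reference bias field on a regular lat-lon grid,
    area weighted by cos(latitude). Arrays are 2D (nlat, nlon), lat in degrees."""
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(test)
    w = w / w.sum()                       # normalized area weights
    ta = test - (w * test).sum()          # remove weighted spatial means
    ra = ref - (w * ref).sum()
    std_t = np.sqrt((w * ta**2).sum())    # weighted spatial std deviations
    std_r = np.sqrt((w * ra**2).sum())
    corr = (w * ta * ra).sum() / (std_t * std_r)
    return corr, std_t / std_r

# toy example: a "hindcast" bias that is a noisy version of the "climate" bias
rng = np.random.default_rng(0)
lat = np.linspace(-89, 89, 90)
ref = rng.standard_normal((90, 180))
test = ref + 0.3 * rng.standard_normal((90, 180))
r, ratio = taylor_stats(test, ref, lat)
```

A bias correspondence such as the 0.5–0.9 correlations quoted above would appear in a Taylor diagram as points with those correlation values and standard-deviation ratios near one.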
We further examined regional systematic errors based on the CMIP5/AMIP MMM biases. Although we evaluated only five climate models here, the global bias patterns are very similar to those obtained using all CMIP5/AMIP models (appendix C). Most systematic errors apparent in the long-term climate mean state for moist and associated processes also appear in the day-5 ensemble hindcasts for most of the climate models examined here. Examples are excessive precipitation in much of the tropics and overestimation of SWAbs in the stratocumulus cloud decks over the eastern subtropical oceans and the Southern Ocean. The results suggest that these systematic errors across different climate models are likely initiated by model parameterizations, since the large-scale flows remain close to observations in the first few days of the hindcasts. Nevertheless, they could be exacerbated later by local interactions with the flow field, which is tightly coupled to the parameterized heating.
By day 5, the errors in precipitation and clouds can affect large-scale state variables that are closely linked to the fast physics. For example, there is a strong bias correspondence between short- and long-term simulations of T2m and SLP, such as the systematic warm biases over midlatitude continents. This correspondence may result from interactions among initial biases in precipitation, clouds, and surface fluxes, or from biases in the initial conditions of the land models. There is also a strong bias correspondence between short- and long-term simulations for other large-scale states, such as zonal velocity, temperature, and specific humidity. By day 5, the hindcast biases in these fields in the tropospheric layers have large spatial correlations with, and standard deviations of similar magnitude to, the climate biases. For the layers above the tropopause, the correlations are large but the standard deviations are smaller, suggesting that the bias magnitude requires a longer time to develop. The fast development of the tropospheric biases is likely a result of biases in convection.
The robustness of the bias correspondence between hindcasts and climate simulations in the CMIP5 models allows for more in-depth analyses of these climate errors using the NWP approach. Improving the physical parameterizations of individual models in the NWP mode could therefore lead to improvements in most climate models' simulated climate mean state and potentially in their future projections.
Acknowledgments. We are grateful to ECMWF for making their operational analyses available. The efforts of H.-Y. Ma, S. Xie, S. A. Klein, and J. S. Boyle were funded by the Regional and Global Climate Modeling and Atmospheric System Research programs of the U.S. Department of Energy as part of the Cloud-Associated Parameterizations Testbed. This work was performed under the auspices of the U.S. Department of Energy by LLNL under Contract DE-AC52-07NA27344. The TAMIP work by S. Bony and S. Fermepin was supported by the FP7-ENV-2009-1 European project EUCLIPSE (#244067). The efforts of B. Medeiros and D. Williamson were partially supported by the Office of Science (BER), U.S. Department of Energy, Cooperative Agreement DE-FC02-97ER62402.
Appendix A: Impact of Different Starting Hours on the Hindcast Biases
For each TAMIP experiment (tamip200907 in this study), the 16 hindcast ensembles have four different starting hours (0000, 0600, 1200, and 1800 UTC; see Table 3). Here we examine whether the starting hour has a significant impact on the hindcast biases. Figure A1 shows the pattern statistics of the precipitation bias for the day-5 hindcast ensembles with different starting hours in a Taylor diagram. The reference fields are the mean precipitation biases of the 16 ensembles from the individual models. We find that the impact of the starting hour on the 24-h average biases is small: the bias correlations are large for all the models, and there is no evidence that any particular starting hour produces especially high precipitation biases.
Appendix B: Discussion of Hindcast Ensemble Size for the Transpose-AMIP II
Unlike the analyses in Xie et al. (2012) and Ma et al. (2013), in which the hindcast ensemble sizes are 365 and 184 days, respectively, the TAMIP provides only 16 hindcasts for July–August 2009. To examine whether the bias patterns in the hindcasts obtained here are representative, we plotted in Fig. B1 the bias pattern statistics (spatial correlation and standard deviation) of 6-day precipitation hindcasts with different sampling sizes from the NSF–DOE Community Atmosphere Model, version 4, in a Taylor diagram. These 6-day hindcasts are identical to those in Xie et al. (2012) and Ma et al. (2013). We selected two reference fields: 1) the hindcast bias ensemble for June–August 2009, which includes 92 hindcasts of 6 days, and 2) the June–August mean bias (1979–2008) of CMIP5/AMIP for CCSM4. We then selected 5, 10, 15, 30, 45, 60, and 90 hindcasts evenly out of the 92 June–August hindcasts and calculated the ensemble mean pattern statistics. Figure B1a shows that the spatial correlation increases with ensemble size while the spatial standard deviation decreases. Once the ensemble size exceeds 15, the correlations are larger than 0.9 and the standard deviations are close to those of the mean biases of the full 92 hindcasts for all 6-day hindcasts. In Fig. B1b, for ensemble sizes larger than 15, the day-5 hindcasts have similar bias correlations (approximately 0.5–0.6) and spatial standard deviations. This suggests that, with ensemble sizes larger than 15 for June–August 2009, the overall bias pattern in the simulations is very representative of the full June–August 2009 ensemble.
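The subsampling procedure described above can be mimicked with synthetic data to show why the pattern statistics converge with ensemble size. The common-pattern-plus-noise model, the noise amplitude, and the grid dimensions below are illustrative assumptions, not the actual CAM4 hindcast data; only the even-subsampling logic follows the procedure in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
nlat, nlon, nhind = 45, 90, 92

# synthetic stand-in: each 6-day hindcast bias = a common systematic
# pattern plus case-to-case noise (amplitudes are assumptions)
pattern = rng.standard_normal((nlat, nlon))
biases = pattern + 0.8 * rng.standard_normal((nhind, nlat, nlon))
full_mean = biases.mean(axis=0)   # reference: mean bias of all 92 hindcasts

def pattern_corr(a, b):
    """Centered (unweighted) spatial correlation between two 2D fields."""
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum())

# subsample n hindcasts evenly out of the 92, as in the appendix
corrs = []
for n in (5, 15, 45, 90):
    idx = np.linspace(0, nhind - 1, n).round().astype(int)
    sub_mean = biases[idx].mean(axis=0)
    corrs.append(pattern_corr(sub_mean, full_mean))
```

Because averaging n cases suppresses the noise by a factor of sqrt(n) while leaving the common pattern intact, the correlation with the full-ensemble mean rises toward one as n grows, consistent with the convergence beyond n = 15 seen in Fig. B1.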
Appendix C: Representativeness of Climate Mean Biases for the Transpose-AMIP II Models
To examine whether the AMIP MMM biases from the five TAMIP models are representative of the full set of CMIP5/AMIP MMM biases, Fig. C1 displays the pattern statistics of the June–August MMM biases of selected fields from TAMIP/AMIP in a Taylor diagram. The MMM biases from the full set of CMIP5/AMIP simulations (28 members) are the reference fields. All the fields from the TAMIP/AMIP have bias correlations larger than 0.9, with slightly larger spatial standard deviations, suggesting that the MMM biases of the TAMIP/AMIP subset are highly representative of those of the full set of CMIP5/AMIP models.
The National Center for Atmospheric Research is sponsored by the National Science Foundation.