The diurnal cycles of near-surface variables and turbulent heat fluxes are evaluated in 16 models from phase 5 of CMIP (CMIP5) and compared with observations from 26 flux tower sites. The diurnal cycle of 2-m temperature generally agrees well with observations. The amplitude of the diurnal cycle of wind speed shows a large intermodel spread and is often underestimated at midlatitude grassland sites and overestimated at midlatitude forest sites. There is a substantial systematic negative bias in the nighttime net surface radiative flux, which is partly compensated for by the turbulent heat fluxes. Four models (CESM1, BCC_CSM1.1, HadGEM2-A, and IPSL-CM5A) are evaluated in more detail, including the vertical structure of the atmospheric boundary layer, at the ARM Southern Great Plains site in Oklahoma. At that site, all models frequently overestimate the boundary layer depth, and the wind turning in the boundary layer reveals large intermodel differences. In summer, these models exhibit a substantial warm bias with particularly high daytime temperatures. These high temperatures are associated with very small latent heat fluxes, indicating that the soil is too dry, which is likely to impact climate change scenarios.
Global climate models (GCMs) are expected to reliably represent all important processes and aspects of the climate system. Although many evaluations of the land models in GCMs have been performed using various observational datasets (e.g., Kumar and Merwade 2011; Schwalm et al. 2010; Bonan et al. 2012), evaluations of near-surface meteorological parameters other than the global mean 2-m temperature are fewer. Several model intercomparison studies that aim to isolate the performance of boundary layer parameterizations have been carried out, such as the GEWEX Atmospheric Boundary Layer Studies (GABLS). These studies show a large intermodel spread that is not easily explained or reduced (Holtslag et al. 2013).
New components are added to models in the transition from global climate models to the more complex Earth system models (ESMs), and many of them depend on near-surface variables and planetary boundary layer (PBL) processes, through both their mean values and their variations. Such processes include, for example, emissions of mineral dust aerosols from land sources, which depend strongly and nonlinearly on the surface friction velocity (e.g., Shao et al. 2011), and the dependence of the net ecosystem exchange of carbon dioxide on diurnal variations in temperature (e.g., Yi et al. 2010).
Previous studies of the diurnal cycle in GCMs include a comparison with a few months of detailed PBL observations at six locations (Garratt et al. 2002), showing a generally good consistency between model and observations. Dai and Trenberth (2004) evaluated an earlier version of the NCAR Community Climate System Model (CCSM2) with synoptic station data in which the model captured the amplitude of the diurnal cycle of temperature rather well over land. Lindvall et al. (2013) examined the two most recent versions of the NCAR Community Atmosphere Model (CAM4 and CAM5) in comparison with multiyear observations from 35 flux tower sites with high-frequency measurements in a range of different climate zones. They found that both model versions capture the timing of the diurnal cycle but considerably overestimate the diurnal amplitude of net radiation, temperature, wind, and turbulent heat fluxes. The most striking difference between the two CAM versions is, however, the low-level wind speed over land, which in CAM5 is about half as strong as in CAM4. The reason is the turbulent mountain stress parameterization, only applied in CAM5, which acts to increase the surface stress and thereby reduce the wind speed. The influence of the extra surface stress in CAM5 on the general atmospheric circulation, in terms of improved mean surface pressure fields, is not easily understood (Lindvall 2014).
A correct near-surface temperature climate does not depend solely on how well the PBL and surface exchange are modeled. The local temperature climate also depends on the general circulation, radiation, cloudiness, and precipitation, and biases or deficiencies in any of these will also manifest themselves as biases in near-surface temperature. Over land at lower latitudes, and at midlatitudes during summer, the turbulent surface heat fluxes must in the mean balance the net incoming radiative energy, apart from what is stored in the ground. The challenge then lies in the partitioning of energy between sensible and latent heat (i.e., the Bowen ratio), which in turn strongly affects the PBL structure and cloud formation (e.g., Gentine et al. 2013). During most of the winter at mid- and high latitudes, however, no such energy constraint exists, since the net turbulent heat fluxes are generally directed downward. At those locations and times, larger variability between GCMs is expected, as well as larger deviations from observations. Although the absolute biases might be smaller in those regions, they may have a considerable impact on, for example, the thawing of permafrost (Koven et al. 2013) and the extent and length of the snow season (Mortin et al. 2014).
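The energy constraint above can be written, for daytime means, as Rn − G ≈ SH + LH, where G is the ground heat flux. The short Python sketch below is only an illustration with made-up flux values; it shows how a given Bowen ratio B = SH/LH splits the available energy. The function name is hypothetical, and the sign convention (Rn positive toward the surface, G positive into the ground, SH and LH positive upward) is an assumption stated in the code.

```python
def partition_available_energy(rn, g, bowen):
    """Split available energy Rn - G (W m-2) into SH and LH for a Bowen ratio B = SH/LH.

    Sign convention assumed here: Rn positive toward the surface, G positive
    into the ground, SH and LH positive upward (surface to atmosphere).
    """
    if bowen == -1.0:
        raise ValueError("a Bowen ratio of -1 leaves LH undefined")
    available = rn - g
    lh = available / (1.0 + bowen)
    sh = available - lh  # equivalently available * bowen / (1 + bowen)
    return sh, lh


# Illustrative midday values, not taken from the observations in this study.
for b in (0.5, 1.0, 2.0):
    sh, lh = partition_available_energy(rn=500.0, g=50.0, bowen=b)
    print(f"B = {b:.1f}: SH = {sh:.0f} W m-2, LH = {lh:.0f} W m-2")
```

With B = 0.5 two-thirds of the available energy goes into LH, whereas with B = 2 two-thirds goes into SH, so the same Rn can support very different near-surface heating and PBL growth.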
The stably stratified atmospheric boundary layer can be very shallow, typically from a few meters up to a few hundred meters. The formulation of the surface exchange (i.e., the boundary condition for the PBL scheme and the coupling at the land–atmosphere interface) is commonly based on Monin–Obukhov theory, which is applicable in the lowest 10% of the PBL. Thus, today's GCMs, which commonly have one to three layers below 200 m, lack sufficient vertical resolution for most stably stratified cases. Even with much higher resolution, models struggle to represent the structure of the PBL, especially for strongly stable cases (e.g., Holtslag et al. 2013), and thereby also the diurnal cycle in temperature and wind (Cuxart et al. 2006; Svensson and Holtslag 2009; Svensson et al. 2011).
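As a rough illustration of why a first model level tens of meters above the ground is problematic in a shallow stable PBL, the Monin–Obukhov wind profile can be sketched in Python. The function names are hypothetical, and the stability corrections (a linear Businger–Dyer-type form with coefficient 5 for stable and the commonly used integrated form for unstable conditions), the von Kármán constant of 0.4, and the example numbers are assumptions for illustration; individual GCMs use their own variants.

```python
import math

KAPPA = 0.4  # von Karman constant (assumed value)
GRAV = 9.81  # gravitational acceleration, m s-2


def obukhov_length(ustar, theta, w_theta):
    """Obukhov length L (m); L > 0 for stable, L < 0 for unstable stratification.

    ustar: friction velocity (m s-1); theta: mean potential temperature (K);
    w_theta: kinematic surface heat flux w'theta' (K m s-1), positive upward.
    """
    if w_theta == 0.0:
        return math.inf  # neutral limit
    return -ustar ** 3 * theta / (KAPPA * GRAV * w_theta)


def psi_m(zeta):
    """Integrated stability correction for momentum (assumed textbook forms)."""
    if zeta >= 0.0:
        return -5.0 * zeta  # linear form for stable conditions
    x = (1.0 - 16.0 * zeta) ** 0.25  # form commonly used for unstable conditions
    return (2.0 * math.log((1.0 + x) / 2.0) + math.log((1.0 + x * x) / 2.0)
            - 2.0 * math.atan(x) + math.pi / 2.0)


def wind_speed(z, z0, ustar, obukhov):
    """Mean wind speed (m s-1) at height z over roughness length z0."""
    # z / inf evaluates to 0.0, recovering the neutral logarithmic profile.
    return ustar / KAPPA * (math.log(z / z0) - psi_m(z / obukhov) + psi_m(z0 / obukhov))


def within_surface_layer(z, pbl_depth, fraction=0.1):
    """True if z lies in the lowest `fraction` of the PBL, where the theory applies."""
    return z <= fraction * pbl_depth


L = obukhov_length(ustar=0.1, theta=280.0, w_theta=-0.01)  # stable night
print(f"L = {L:.1f} m, z/L at 60 m = {60.0 / L:.1f}")
print("60-m level inside the surface layer of a 50-m PBL:", within_surface_layer(60.0, 50.0))
```

For example, with u* = 0.1 m s−1, θ = 280 K, and a downward kinematic heat flux of −0.01 K m s−1, L is about 7 m, so a first model level at 60 m corresponds to z/L ≈ 8, far outside the range where such surface-layer relations are usually considered valid.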
There is a tight coupling between the near-surface gradients of the mean variables and the surface turbulent energy fluxes. It is thus informative to evaluate diurnal cycles of near-surface gradients along with the surface fluxes. For this to be possible, long-term datasets of concurrent observations are needed. There are no global observational datasets for turbulent fluxes suitable for evaluation of the diurnal cycle, but a number of flux sites exist and are utilized in this study (FLUXNET; http://fluxnet.ornl.gov/; Baldocchi et al. 2001). The FLUXNET network was established during the 1990s and consists of over 500 sites. A selection of these sites, chosen to cover a range of climate zones and ecosystems, has previously been utilized to evaluate CESM1 (Lindvall et al. 2013).
In this study, results from 16 models participating in phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012) are analyzed and compared with observations. The focus is on the diurnal cycle of near-surface and surface parameters and we utilize the flux tower dataset discussed in Lindvall et al. (2013). Furthermore, we take advantage of the experiments from the second phase of the Cloud Feedback Model Intercomparison Project (CFMIP-2), which provide more detailed high-frequency data on model levels for certain locations. Here, we choose to concentrate on a handful of models in a more detailed examination of the model results in comparison with data collected at a midlatitude grassland site, the Atmospheric Radiation Measurement Program (ARM) Southern Great Plains (SGP) site in Oklahoma.
1) Flux tower sites
High-frequency observations are needed to examine the diurnal cycle of the near-surface variables temperature, specific humidity, and wind speed, and of the turbulent and radiative fluxes. There is no such global dataset, so in this study we examine the diurnal cycle at 26 selected sites. The sites provide long-term measurements of standard meteorological variables as well as turbulence measurements using the eddy correlation technique (Aubinet et al. 1999; Baldocchi et al. 1988). Ameriflux (http://ameriflux.lbl.gov), AsiaFlux (http://asiaflux.net), CarboEurope (www.carboeurope.org), Fluxnet Canada (www.fluxnet-canada.ca), and OzFlux (www.ozflux.org.au), all participating in FLUXNET, as well as the Coordinated Energy and Water Cycle Observation Project (CEOP) archived by the NCAR Earth Observing Laboratory (EOL; http://data.eol.ucar.edu), have provided the flux tower data. The sites constitute a subset of the dataset used in Lindvall et al. (2013); their locations (see Fig. 1) and the dominant surface type in their surroundings are listed in Table 1. We refer to Lindvall et al. (2013) for a more detailed description of the sites.
In the CMIP5 database, instantaneous 3-hourly model results of 2-m temperature (T2m), 2-m specific humidity (q2m), and 10-m wind speeds (U10m) are provided, while the radiative and turbulent heat fluxes are 3-h means. The fluxes at the observational sites are measured in 30- or 60-min intervals. To make a fair comparison between models and observations, we have chosen to make 3-h averages of the observed fluxes. The diurnal cycles are evaluated by treating the model and observational data as “climatological data” and computing median diurnal cycles instead of comparing specific years [for more details, see Lindvall et al. (2013)].
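The averaging and compositing steps described above can be sketched as follows. This Python sketch assumes half-hourly fluxes with gaps marked as NaN and a series starting at 0000 LT, and, as a simplification chosen here, discards any 3-h block containing a missing value; the function names are hypothetical, and the actual processing in Lindvall et al. (2013) may treat gaps and time stamps differently.

```python
import math
import statistics


def block_means(values, block=6):
    """Average consecutive values in blocks (6 half-hourly values = 3 h).

    Blocks containing a missing value (NaN) are returned as NaN, a simple
    gap policy assumed for this sketch. A trailing incomplete block is dropped.
    """
    out = []
    for start in range(0, len(values) - block + 1, block):
        chunk = values[start:start + block]
        if any(math.isnan(v) for v in chunk):
            out.append(math.nan)
        else:
            out.append(sum(chunk) / block)
    return out


def median_diurnal_cycle(three_hourly, per_day=8):
    """Median over all days for each 3-h slot; the series must start at 00 LT."""
    cycle = []
    for slot in range(per_day):
        valid = [v for v in three_hourly[slot::per_day] if not math.isnan(v)]
        cycle.append(statistics.median(valid) if valid else math.nan)
    return cycle
```

For sites reporting hourly fluxes, `block=3` gives the same 3-h means. Using the median rather than the mean per slot makes the composite less sensitive to occasional outliers in individual days.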
2) ARM Southern Great Plains site
Simulated atmospheric vertical profiles are compared with observations at the ARM Southern Great Plains site in Oklahoma, located at 36.6°N, 97.5°W, and 318 m above sea level. The ARM Climate Modeling Best Estimate (CMBE) dataset (Xie et al. 2010) provides a range of measured atmospheric variables. In this study we mainly use the four times daily radiosonde observations, providing measurements of temperature, moisture, and wind velocities. The analyzed period is 2002–07 as the friction velocity needed to calculate the boundary layer height is provided by the Ameriflux dataset, which overlaps with the CMBE dataset for those years.
b. Model data
The 16 CMIP5 models that are analyzed in this study are listed in Table 2 along with the country of origin and the resolution of their atmospheric components. The simulated diurnal cycles are compared with observations from flux tower observation sites. All sites are land-based and therefore AMIP simulations with prescribed sea surface temperatures (SSTs) and sea ice extent are used.
For some of the models (BCC_CSM1.1, HadGEM2-A, and IPSL-CM5A), data from the CFMIP-2 experiments were also available, although for IPSL the horizontal resolution differs slightly (see Table 2). We use AMIP simulations with high-frequency output on model levels at one of the CFMIP-2 sites. In addition, we have performed a simulation with a slightly updated version of the atmospheric component of CESM1 (CAM5.3). This experiment is of AMIP type since we use prescribed aerosols, representative of the year 2000, and climatological SST. This simulation is used only in the analysis at the CFMIP site. The CAM version used in the CMIP5 experiments is CAM5.1; here we denote both CAM experiments CESM1.
The selection of models, from CMIP5 and CFMIP-2 suites of experiments, is solely based on the availability of model data at the time of the analysis and the output frequency needed. The data presented are 10-yr averages over 1999–2008, except for CESM1 where the CMIP5 simulation is for 1996–2005. When several similar versions of a GCM were available, we chose to only include one.
The comparison with observations is done by choosing the nearest model grid point. While the flux sites are point observations, generally located in relatively homogeneous surroundings, each GCM grid box often consists of several surface types. Choosing only sites located in grid boxes that are homogeneous in all models is not possible. Moreover, for most models the distribution of surface types is not available in the CMIP5 database. However, several of the models use the NCAR Community Land Model (CLM4; Oleson et al. 2010; Lawrence et al. 2011), and it is likely that the other models have similar vegetation. We have therefore investigated the homogeneity of the grid boxes in CLM4 (grid boxes of approximately 1°). For the boreal forest we include only the two sites where the nearest CLM4 grid box consists of approximately 90% forest. The five tropical forest sites are located in grid boxes consisting of 71%–96% forest. For the midlatitude sites we had to be less strict to be able to include European sites, since large areas dominated by a single surface type are rare in Europe. Although two of the seven midlatitude grassland/cropland sites are located in heterogeneous CLM4 grid boxes, the other five are found in grid boxes with 60%–90% grassland (and with grassland in over 90% of the vegetated part of the grid box). Of the midlatitude forest sites, four are located in mixed grid boxes, while the other half have at least 70% and up to 90% forest. Despite these efforts, the grassland and forest sites are likely to show some similarities in the model results, since a grid box always represents a mixture of both; this will be more problematic in low-resolution models. The more detailed comparisons with the four CFMIP-2 models are done for one of the midlatitude grassland sites, the ARM Southern Great Plains (see Table 1), where 83% of the corresponding CLM4 grid box is defined as grassland.
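Selecting the nearest model grid point can be done, for example, by minimizing the great-circle distance over all grid nodes. The Python sketch below assumes a regular latitude–longitude grid given as 1-D coordinate vectors and uses a brute-force search with the haversine formula; the function names are hypothetical, it is not necessarily how the comparison in this study was implemented, and it ignores the grid-box heterogeneity discussed above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance (km); longitudes may be in 0..360 or -180..180."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_grid_point(site_lat, site_lon, grid_lats, grid_lons):
    """Return (i, j, distance_km) of the grid node closest to the site.

    grid_lats and grid_lons are the 1-D coordinate vectors of a regular
    latitude-longitude grid; a brute-force search keeps the sketch simple.
    """
    best = None
    for i, glat in enumerate(grid_lats):
        for j, glon in enumerate(grid_lons):
            d = great_circle_km(site_lat, site_lon, glat, glon)
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


# ARM SGP (36.6 N, 97.5 W) on a toy 2.5-degree grid with 0..360 longitudes.
print(nearest_grid_point(36.6, -97.5, [35.0, 37.5], [260.0, 262.5, 265.0]))
```

Because the haversine term depends on the longitude difference only through sin²(Δλ/2), the search works whether the model longitudes run from 0° to 360° or from −180° to 180°.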
The horizontal and vertical resolutions vary among the models (see Table 2), which influences the analysis in several ways. The height of the first model level also varies, but all models have a rather deep first layer except EC-EARTH, whose first mean level is at 10 m. Even though some of the models share the same land model, there are likely slight differences in how it is coupled to the atmosphere. However, all models rely on Monin–Obukhov surface layer theory and the presence of a constant flux layer. The selected models use quite different parameterizations of boundary layer turbulence and vertical grid resolutions. Ten of the models apply first-order closure schemes and have their first mean model level either below 50 m (CanAM4, EC-EARTH, HadGEM2-A, and IPSL-CM5A) or higher (BCC_CSM1.1, BNU-ESM, CCSM4, GISS-E2-R, INM-CM4.0, and NorESM1-M). Six of the models have some sort of turbulent kinetic energy (TKE) scheme (CNRM-CM5, MIROC5, MRI-AGCM3.2H, MRI-CGCM3, CESM1, and FGOALS-s2.0), most of which have a first level below 50 m; the exceptions are CESM1 and FGOALS-s2.0. The CFMIP-2 models include both first-order schemes (BCC_CSM1.1, HadGEM2-A, and IPSL-CM5A) and a TKE scheme (CESM1), with different heights of the first model level (20–60 m).
a. Diurnal cycles
The diurnal cycles of CMIP5 model results for T2m, q2m, U10m, latent heat flux (LH), sensible heat flux (SH), and net radiation (Rn) at the surface are presented in Figs. 2–4 (as well as Fig. 6) along with observations from the sites listed in Table 1. The figures show seasonal median summer and winter results for several types of surface vegetation cover, with two or more sites in each category (see Table 1). Note that for T2m, q2m, and U10m, the diurnal mean is subtracted so that only the diurnal cycle is shown in the figures. Table 3 presents the range of the subtracted model means as well as the observed mean values. The individual model results are shown as well as the model ensemble median. Only the four models analyzed further in section 3b are identified separately by color in the figures. These models cover much of the spread in the ensemble, although U10m is unfortunately not available for two of the models contributing to CFMIP-2 (BCC_CSM1.1 and CESM1).
1) Midlatitude grassland and forest sites
Figures 2a and 2b show summer results for midlatitude grassland and forest sites, respectively. All models behave similarly for T2m over grasslands and compare well with the observations except for a small phase shift with the morning minimum and afternoon maximum earlier in the day. At the forest sites, the T2m amplitude is overestimated by all models and the morning minimum occurs earlier in most models. The diurnal cycle in q2m is quite small in the observations with peaks in morning and afternoon. Most of the models show a similar behavior but with variable amplitude; the ensemble model median underestimates (overestimates) the amplitude at the grassland (forest) sites. One model (CanAM4) has an unrealistically large diurnal cycle in q2m. The wind speed diurnal cycle is underestimated by the ensemble model median at the grassland sites and somewhat overestimated at the forested sites. One of the TKE models with a deep first model layer (FGOALS-s2.0) has a diurnal cycle that is out of phase, which could be a sign of the first model layer being above the PBL during night.
The model ensemble-mean diurnal bias (not shown) for T2m is a few degrees at both types of sites. Most individual models are biased high, some by as much as 10°C (see Table 3). The ensemble mean bias in U10m is small for the grassland sites but exceeds 1 m s−1 at midday over forest. The ensemble mean q2m is biased high by about 2 g kg−1 over forest but has a very small bias over grassland.
It can be noted that the ensemble model median diurnal cycles of wind speed and temperature are very similar at the grassland (Fig. 2a) and forest (Fig. 2b) sites, which is not true for the observations. This similarity in the models may be explained by the fact that the surface categories represented in the models consist of a mixture of grassland and forest. This illustrates some of the difficulty in comparing local observations with large-scale models. The effect of having a small fraction of rougher surface (e.g., forest) in a grassland-dominated area is larger than vice versa. This tends to make the results from these two categories more similar.
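One way to see why a small fraction of forest affects a grassland-dominated grid box more than the reverse is to average the neutral drag coefficients of the surface tiles at a blending height, a common approach to aggregating roughness over heterogeneous surfaces. In the Python sketch below, the function name, the blending height (100 m), and the roughness lengths (0.03 m for grass, 1 m for forest) are illustrative assumptions, and no claim is made that any of the CMIP5 models aggregates roughness this way.

```python
import math


def effective_z0(fractions, z0s, blending_height=100.0):
    """Effective roughness length (m) of a mixed surface.

    The neutral drag coefficients of the tiles are averaged at a blending height:
        1 / ln^2(lb / z0_eff) = sum_i f_i / ln^2(lb / z0_i)
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("surface fractions must sum to 1")
    inv = sum(f / math.log(blending_height / z0) ** 2 for f, z0 in zip(fractions, z0s))
    return blending_height * math.exp(-1.0 / math.sqrt(inv))


Z0_GRASS, Z0_FOREST = 0.03, 1.0  # illustrative roughness lengths (m)
grass_box = effective_z0([0.9, 0.1], [Z0_GRASS, Z0_FOREST])
forest_box = effective_z0([0.1, 0.9], [Z0_GRASS, Z0_FOREST])
print(f"90% grass + 10% forest: z0_eff = {grass_box:.3f} m")
print(f"90% forest + 10% grass: z0_eff = {forest_box:.3f} m")
```

With these numbers, 10% forest roughly doubles the effective roughness length of a grassland box (about 0.06 m instead of 0.03 m), whereas 10% grass reduces that of a forest box by only about 15% (about 0.85 m instead of 1 m).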
When examining the heat fluxes some common diurnal biases emerge. The total available energy for the surface through the radiative fluxes (Rn) is too large during the day in all models at both types of sites. This extra energy is returned to the atmosphere through the turbulent fluxes of sensible and latent heat, which together are consequently overestimated in all models at all these sites. Some individual models underestimate either SH or LH, but the ensemble means are larger than the observations in both turbulent heat fluxes at all times. The model median Bowen ratio is close to the observed value although there is a large intermodel variation.
The GCMs overestimate the nighttime net radiative loss at the surface (i.e., Rn is too negative), particularly at the grassland sites, which would lead to surface cooling. This is partly compensated for by an overestimated downward SH, which brings heat from the atmosphere to the surface. The overestimation of SH can have a number of causes, including too much mixing due to long-tail formulations that give extra turbulence at strong stability, too sharp near-surface temperature gradients, and/or too strong winds. Because of the coarse vertical resolution in the models, this heat is extracted from a large volume of air, so the consequence for T2m is not that large. The observed LH is close to zero at night but is overestimated by all models.
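The difference between long-tail and short-tail formulations can be illustrated with stability functions f(Ri) that scale a neutral eddy diffusivity in first-order closures. The two forms and their constants in the Python sketch below (a cutoff at Ri = 0.2 and a decay factor of 10) are illustrative textbook-style choices with hypothetical function names, not those of any particular CMIP5 model.

```python
def f_short_tail(ri, ri_crit=0.2):
    """Short-tail (cutoff) stability function: mixing vanishes for Ri >= ri_crit."""
    if ri <= 0.0:
        return 1.0  # only the stable branch is illustrated
    return (1.0 - ri / ri_crit) ** 2 if ri < ri_crit else 0.0


def f_long_tail(ri):
    """Long-tail stability function: mixing decays with Ri but never shuts off."""
    if ri <= 0.0:
        return 1.0  # only the stable branch is illustrated
    return 1.0 / (1.0 + 10.0 * ri)


# Both factors multiply a neutral eddy diffusivity, e.g. K = l**2 * |dU/dz| * f(Ri).
for ri in (0.05, 0.1, 0.2, 0.5, 1.0):
    print(f"Ri = {ri:4.2f}: short tail {f_short_tail(ri):.3f}, long tail {f_long_tail(ri):.3f}")
```

Beyond Ri = 0.2 the short-tail form gives no turbulent exchange at all, while the long-tail form still yields, for example, about 17% of the neutral value at Ri = 0.5; such residual mixing can sustain a larger downward SH in very stable conditions.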
In winter, the available energy at these midlatitude sites is lower and the T2m amplitude is weaker (see Figs. 3a and 3b). When the observed Rn is positive, its magnitude is relatively well captured by the models, but during night, there is a strong negative bias in all models except BCC_CSM1.1 due to too little downwelling longwave radiation (Rlwd; not shown). Again, this is partly compensated for by a stronger downward SH in the models during night. Part of the reason for this strong negative SH could be that the model ensemble wind speed bias is >1 m s−1 during night with a big range in mean wind speeds for the individual models (see Table 3). Several models also have biases of 2–4 m s−1 throughout most of the day at all midlatitude sites. During the day, however, the SH is mostly underestimated and the LH overestimated, especially at the forest sites.
It is interesting to note that the observed LH at the midlatitude forested sites shows a very slow increase in the morning after Rn has become positive in both seasons. This behavior is not seen over grassland (or boreal forest; see below) and it is not seen in the GCMs where, in winter, LH starts increasing even before Rn is positive.
2) Boreal forest and tundra/wetland sites
Next we turn to the sites at higher latitudes, the boreal forest (Fig. 4) and tundra/wetland (not shown). At these high-latitude locations there are large differences between summer and winter in both the available radiative energy and the mean temperature (see Table 3). The boreal forest sites have short (about 6 h) summer nights and winter days. Note that Rn is rather well captured by the model ensemble median during summer (Fig. 4a), except for too negative values during night and too large peak values, both signs of too little cloud impact in the GCMs. The intermodel spread, however, is large. SH is slightly too low during the whole diurnal cycle, while LH is overestimated by basically all models; consequently, the model ensemble Bowen ratio is consistently too small. The observed diurnal cycle of q2m is poorly captured by the models, with the early-afternoon humidity deficit missing in most GCMs in summer. The diurnal variation in U10m is well captured by the model ensemble median but, just as for the midlatitude forest sites (Fig. 2b), some models have a far too strong diurnal cycle, overestimated by about 1 m s−1. The intermodel differences in mean U10m are large in both summer and winter, and the models are biased high compared with observations (see Table 3).
The time period when Rn is positive during winter at the boreal sites is a couple of hours shorter in the GCMs than in the observations (Fig. 4b). Nevertheless, the ensemble model median diurnal cycles in temperature and wind speed are rather well captured in winter. The SH is underestimated during the entire cycle and the ensemble model mean bias is about −10 W m−2 throughout the day. The observed LH has very small absolute values during winter at these sites and some models give negative values during night, contrary to what is observed, while others overestimate the flux at all times.
The model median biases in temperature are 1°–2°C during summer but about −4°C in winter, with a large intermodel spread (see Table 3). The amplitude of wintertime q2m is very small in both the observations and the models. Observations of humidity at these temperatures (mostly below freezing) can be problematic, but the shape of the observed diurnal cycle is similar to that in summer, with a morning peak. All GCMs, however, have very similar diurnal cycles in winter, peaking in the early afternoon instead. Most models are biased low by about 1 g kg−1 (see Table 3), which is not surprising considering the negative bias in temperature.
Moving farther north to the tundra sites, the difference between summer and winter is even more pronounced. During summer, the observed median net radiation does not turn negative, while that of the model ensemble does (not shown). The model median diurnal cycles of T2m and U10m are too weak, although the amplitude of Rn is larger in the models than observed. The amplitudes of SH and LH are well captured by some of the models; these have TKE schemes and also the best T2m amplitude. At night, as well as during winter, there is a recurring problem with too negative Rn in the models: the ensemble model median is about −35 W m−2, while the observed median values are between −3 and −5 W m−2. Simultaneously, the magnitude of the downward SH is too large (not shown).
3) Nighttime radiative and turbulent sensible heat fluxes
The analysis of the diurnal cycles indicates that for all groups of sites, and especially at high and midlatitudes, the models simulate too negative Rn as well as too negative SH during nighttime, whereas T2m does not show a consistent bias. This is most obvious during winter, as seen in Fig. 5, which shows the simulated versus the observed T2m, Rn, and downward component of the longwave surface radiative flux at night in December–February (DJF) for the 10 midlatitude and Arctic sites that report the longwave radiative surface components separately. There is no mean bias in T2m over the entire temperature range (Fig. 5a), although the model ensemble mean tends to be a bit too warm at the coldest sites. Individual models are up to 10°C warmer in the observed range from −20° to −25°C. The mean bias in Rn is as large as −19 W m−2 (Fig. 5b). While the observations never fall below −45 W m−2, the models simulate values as low as −75 W m−2. This is not compensated for by the turbulent heat fluxes, which both have too large absolute values but opposite signs (not shown). The bias in Rn is partly explained by too little Rlwd (see Fig. 5c), which is underestimated by 12 W m−2 on average.
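The partial explanation of the Rn bias by Rlwd can be made explicit with a little arithmetic. At night there is no shortwave radiation, so Rn = Rlwd − Rlwu, and for biases computed over the same samples bias(Rn) = bias(Rlwd) − bias(Rlwu). The Python sketch below uses hypothetical helper names and applies this identity to the two values quoted above purely as an illustration, assuming that both refer to the same set of nights; the implied upwelling bias is not a result reported in this study.

```python
def mean_bias(model, obs):
    """Mean of (model - observation) over paired samples."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and equally long")
    return sum(m - o for m, o in zip(model, obs)) / len(model)


def implied_lw_up_bias(rn_bias, lw_down_bias):
    """With no shortwave at night, Rn = LWdown - LWup, so for biases over the
    same samples bias(Rn) = bias(LWdown) - bias(LWup); solve for bias(LWup)."""
    return lw_down_bias - rn_bias


rn_bias, lwd_bias = -19.0, -12.0  # W m-2, values quoted in the text
print(f"share of Rn bias matched by LWdown bias: {lwd_bias / rn_bias:.2f}")
print(f"implied LWup bias: {implied_lw_up_bias(rn_bias, lwd_bias):+.0f} W m-2")
```

Under that assumption, the Rlwd deficit accounts for roughly 60% of the Rn bias, and the remainder would correspond to slightly too large upwelling longwave radiation.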
4) Tropical sites
At low latitudes, the annual cycle is less pronounced; therefore, we examine the annual median diurnal cycle (Fig. 6). Rn is reasonably well captured by the ensemble model median during night and in the morning but becomes too large around noon compared with the observations. This deficiency is also seen in the diurnal cycle of T2m, where the models are warmer than the observations around midday. This overestimation is likely due to a poor representation of clouds and of the timing of precipitation, both of which may cool the surface. The models also tend to cool too much during the night, and the modeled time of minimum temperature differs by a few hours from that observed. The U10m amplitude is rather well described by the ensemble model median, while a group of models with first-order closure and a deep first model layer overestimate the amplitude. The observed and ensemble model median q2m have about the same diurnal amplitude but are 12 h out of phase. The ensemble model median SH agrees well with observations, while LH is overestimated at all times. These differences in the turbulent heat fluxes also give rise to large variations in Bowen ratios. The model median bias in T2m is on the order of 1°C, and the modeled mean wind speeds range from 1.7 to 6.0 m s−1, so the wind speeds are overestimated in all models (see Table 3).
b. ARM Southern Great Plains
In this section, a subset of the models is examined more closely at the ARM SGP main site. This site is also included in the grassland category discussed above, and we first examine the performance of all models at this midlatitude continental site. As seen in Fig. 7, the diurnal cycles from the four selected models are distributed around the model ensemble median. The seasonal means of the model ensemble span 0.6°–6.3°C in winter and 27°–36°C in summer, while the observed seasonal mean values are 2.8° and 26.3°C, respectively. The model spread is narrowed substantially when comparing only the four models that are examined more closely. BCC_CSM1.1, CESM1, HadGEM2-A, and IPSL-CM5A-LR have annual mean T2m of 16.1°, 17.6°, 18.7°, and 17.9°C (model ensemble 14.9°–20.6°C), a bit high compared with the observed 15.1°C. This subset of models is, however, also far too warm during summer, and the reasons for this will be analyzed further below. The diurnal cycle in T2m is well represented in three of the models, while BCC_CSM1.1 has too weak a cycle in winter.
The diurnal cycle in wind speed is very variable in the models, as previously discussed for the group of grassland sites. The rather sudden increase in observed morning wind speed, which is more pronounced in summer, occurs in the models (if it happens at all) spread over a number of hours, especially in winter, similar to what was found in Svensson et al. (2011). The annual mean wind speeds vary between 3 and 8 m s−1 in the model ensemble, while the observed value is 5 m s−1. The near-surface humidity diurnal cycles are weak and in winter the model ensemble is close to the observations. Note that we have omitted CanAM4 in this figure due to unrealistic values. In summer, the observations show an earlier evening increase in q2m than the models.
At this latitude, the maximum net radiation is about twice as high in summer as in winter, which the models capture. The model median for the whole ensemble compares well with observations during daytime; however, almost all models lose too much energy by radiation during the night, as discussed above. The daytime turbulent latent heat flux is generally underestimated in both seasons. The sensible heat fluxes are larger in magnitude both night and day compared with the observations. The observed Bowen ratio is less than one, whereas the models have much larger values, with consequences for the near-surface parameters and the PBL development.
1) Boundary layer height
The ARM SGP soundings, launched four times a day (indicated by solid vertical lines in Fig. 7), are used to estimate the observed PBL height. The method, outlined in Vogelezang and Holtslag (1996), is based on finding the height where a bulk Richardson number (which also takes into account the effect of surface friction) exceeds 0.3. The same method is used to diagnose the PBL height from the model results, but this analysis can only be done for the CFMIP models, as they provide data on model levels. Because only hourly data are available for CESM1, the model times (dashed vertical lines in Fig. 7) differ by half an hour from the observations. The estimated PBL depths are shown in Fig. 8.
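The bulk Richardson number method can be sketched as follows. This Python sketch follows the general form of Vogelezang and Holtslag (1996) as we understand it, with a b·u*² term representing surface-friction-generated shear; the value b = 100, the use of virtual potential temperature, and the choice of the lowest level as the reference level are assumptions, the critical value 0.3 is the one stated above, and the function names are hypothetical.

```python
GRAV = 9.81  # m s-2


def bulk_richardson(z, theta_v, u, v, ustar, b=100.0):
    """Bulk Richardson number of each level relative to the lowest level.

    The b * ustar**2 term adds surface-friction-generated shear; b = 100 is
    an assumed value. Using the lowest level as reference is a simplification.
    """
    zs, ths, us, vs = z[0], theta_v[0], u[0], v[0]
    rib = [0.0]
    for k in range(1, len(z)):
        shear2 = (u[k] - us) ** 2 + (v[k] - vs) ** 2 + b * ustar ** 2
        shear2 = max(shear2, 1e-6)  # avoid division by zero for calm, shear-free profiles
        rib.append(GRAV / ths * (theta_v[k] - ths) * (z[k] - zs) / shear2)
    return rib


def pbl_height(z, theta_v, u, v, ustar, ri_crit=0.3, b=100.0):
    """Lowest height where Rib exceeds ri_crit, linearly interpolated between
    levels; None if the threshold is never exceeded within the profile."""
    rib = bulk_richardson(z, theta_v, u, v, ustar, b)
    for k in range(1, len(z)):
        if rib[k] > ri_crit:
            frac = (ri_crit - rib[k - 1]) / (rib[k] - rib[k - 1])
            return z[k - 1] + frac * (z[k] - z[k - 1])
    return None


# Idealized daytime sounding: well mixed up to 800 m, capped by an inversion.
z = [10.0, 200.0, 400.0, 600.0, 800.0, 1000.0, 1200.0]
theta_v = [300.0, 300.0, 300.0, 300.0, 300.0, 303.0, 306.0]
u = [3.0, 5.0, 5.0, 5.0, 5.0, 6.0, 7.0]
v = [0.0] * len(z)
print(f"PBL height: {pbl_height(z, theta_v, u, v, ustar=0.3):.0f} m")
```

Applying the same function to sounding profiles and to model-level profiles, as done here for the CFMIP models, keeps the diagnosed heights comparable, although the result is sensitive to the vertical resolution of each profile.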
At midnight, most observed PBL heights fall in the 150–400-m category, while the models tend to have deeper PBLs. HadGEM2-A has a larger share of shallow (<150 m) PBLs than observed, while CESM1 has about 15% of the data in the 900–1600-m range compared to a few percent in the observations. The tendency to overestimate PBL depths is present at the other times as well. At noon and 1800 LT, most data fall in the 400–900-m range, and the models and observations agree fairly well. Both the observations and the models show the widest PBL depth distributions at 1800 LT. The model distributions are skewed toward deeper PBLs, and in particular, occasions with deep PBLs (>1600 m) are too frequent in the models.
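The category frequencies discussed above can be obtained by binning PBL heights at the boundaries named in the text (150, 400, 900, and 1600 m). The Python sketch below is a hypothetical helper; how heights exactly on a boundary were assigned is not stated, so assigning them to the deeper category is an assumption.

```python
import bisect

EDGES = [150.0, 400.0, 900.0, 1600.0]  # m, category boundaries used in the text
LABELS = ["<150", "150-400", "400-900", "900-1600", ">1600"]


def category_shares(heights):
    """Fraction of PBL heights in each depth category; a height equal to a
    boundary is assigned to the deeper category (an assumed convention)."""
    if not heights:
        raise ValueError("no PBL heights given")
    counts = [0] * len(LABELS)
    for h in heights:
        counts[bisect.bisect_right(EDGES, h)] += 1
    return {label: c / len(heights) for label, c in zip(LABELS, counts)}


print(category_shares([100.0, 200.0, 300.0, 500.0, 1000.0, 2000.0]))
```

Computing these shares separately for each launch time (0000, 0600, 1200, and 1800 LT) gives distributions of the kind compared between models and observations above.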
2) Large-scale characteristics
The properties of the PBL and the near-surface characteristics depend on the large-scale circulation in combination with local processes. Using the derived PBL heights, the properties of the free atmosphere just above the PBL can be analyzed. Figure 9 shows the joint probability density functions (PDFs) of temperature and wind direction above the PBL. Both models and observations show that the climatology at this location has two preferred wind directions, which also divide the temperature distribution into two overlapping modes. Three of the models agree very well with the observations in terms of wind direction. The peak in IPSL-CM5A-MR is shifted about 10° toward the north, while the peak in BCC_CSM1.1 is less pronounced, with about equally likely values over the range 230°–340°. The modeled temperature distributions tend to be flatter, extending toward higher temperatures in all models and, in BCC_CSM1.1, also toward colder occasions. The overall impression is nevertheless that the models capture these large-scale parameters quite well.
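Joint PDFs of this kind require the wind direction to be derived from the wind components first. The Python sketch below uses the meteorological convention (the direction the wind blows from, clockwise from north) and counts relative frequencies in 2°C × 10° bins; the function names and bin widths are illustrative assumptions, not necessarily those used for Fig. 9.

```python
import math
from collections import Counter


def wind_direction(u, v):
    """Meteorological wind direction (deg): where the wind blows FROM,
    measured clockwise from north (0 = northerly, 90 = easterly)."""
    return math.degrees(math.atan2(-u, -v)) % 360.0


def joint_frequencies(temps, dirs, t_bin=2.0, d_bin=10.0):
    """Relative frequency of (temperature, direction) bins keyed by lower bin edges."""
    if len(temps) != len(dirs) or not temps:
        raise ValueError("temps and dirs must be non-empty and equally long")
    counts = Counter((math.floor(t / t_bin) * t_bin, math.floor(d / d_bin) * d_bin)
                     for t, d in zip(temps, dirs))
    return {key: n / len(temps) for key, n in counts.items()}


print(wind_direction(0.0, 5.0))   # southerly wind (from the south), about 180
print(wind_direction(5.0, 0.0))   # westerly wind (from the west), about 270
print(joint_frequencies([1.0, 1.5, 3.0], [185.0, 189.0, 275.0]))
```

Dividing the bin frequencies by the bin area (here 20 °C deg) would turn them into density estimates if the PDFs are to integrate to one.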
3) Summer temperature bias
As discussed above, all the models have strong positive biases in T2m in June–August (JJA) at the ARM SGP site. Figure 10 shows the joint PDF of noon T2m and LH at this site. The observations peak around 30°C, with most records in the range 20°–40°C. The corresponding LH values vary between about 100 and 600 W m−2, with no clear dependence on temperature. The models all show a shift in temperature, with peaks at 5°–10°C higher values. The most striking difference, though, is the very low LH in these hot conditions, which is most extreme in BCC_CSM1.1 and IPSL-CM5A-MR, with the majority of values below 100 W m−2. HadGEM2-A shows a slightly broader LH distribution, and CESM1 seems to have two regimes: one very hot with very limited latent heat fluxes, and one at around 30°C with LH values that agree better with observations and increase with increasing temperature. As expected, the models show deeper PBLs for warmer surfaces (not shown). The PDF of PBL heights (Fig. 10) compares well with observations for CESM1, while BCC_CSM1.1 is on the low side for this season and time, and HadGEM2-A and IPSL-CM5A-MR are biased high. HadGEM2-A in particular produces many occasions with unrealistically high PBL heights.
It appears that the models tend to enter a summer regime that is too warm, with a very dry surface, resulting in higher SH and lower LH than observed (see Figs. 7 and 10). This also gives rise to PBLs that are too deep in HadGEM2-A and IPSL-CM5A-MR, and part of the time in CESM1. It is somewhat puzzling that BCC_CSM1.1 shows less deepening of the PBL while still displaying a large underestimation of LH. CESM1 seems to spend some of the time in this too warm and dry regime, while at other times it is much closer to observations, with more reasonable values of both SH and LH.
4) Near-surface stability and vertical profiles
The vertical structures of the median profiles of potential temperature and wind speed for the summer and winter seasons, at four times of day, are presented in Fig. 11. The profiles are scaled by the PBL height and interpolated to z/h levels, where z is the height above the ground and h is the PBL height; the z/h levels are chosen as the seasonal mean z/h of each model level and time of day. Observed variability is indicated in the figure by the 25th and 75th percentiles. For potential temperature, the 2-m value is subtracted from the profiles. The profiles are also sorted into three stability classes based on a calculated near-surface flux Richardson number:

$$R_f = -\frac{(g/\theta)\,\overline{w'\theta'}}{u_*^2\,\partial U/\partial z},$$

where θ is the potential temperature, $\overline{w'\theta'}$ the vertical heat flux, $u_*$ the friction velocity, U the wind speed, z the height, and g the gravitational acceleration. The three stability classes are unstable (upward sensible heat flux, thus $R_f < 0$), near neutral on the stable side (weakly stable, small positive $R_f$), and stable (larger positive $R_f$). Median profiles are only presented for categories that contain more than 10% of the data. About 41% of the observed cases are unstable, compared with 36%–42% in the models, and 22% of the observed cases are weakly stable, compared with 11%–23% in the models. All models have more cases in the stable regime than the observed 37%, by 3–11 percentage points.
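The stability classification can be sketched as below. This is a minimal illustration, not the study's code: it assumes $R_f = -(g/\theta)\,\overline{w'\theta'}/(u_*^2\,\partial U/\partial z)$ with a kinematic heat flux that is positive upward, a wind shear $\partial U/\partial z$ estimated beforehand (for example, by a finite difference between two near-surface levels), and a hypothetical threshold `rf_crit` separating the weakly stable and stable classes.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m s-2)


def flux_richardson(theta, heat_flux, u_star, dU_dz):
    """Flux Richardson number R_f = -(g/theta) * w'theta' / (u_*^2 * dU/dz).

    theta: potential temperature (K); heat_flux: kinematic w'theta' (K m s-1),
    positive upward; u_star: friction velocity (m s-1); dU_dz: wind shear (s-1),
    assumed nonzero. An upward heat flux gives R_f < 0.
    """
    theta, heat_flux, u_star, dU_dz = (
        np.asarray(a, dtype=float) for a in (theta, heat_flux, u_star, dU_dz)
    )
    return -(G / theta) * heat_flux / (u_star**2 * dU_dz)


def classify(rf, rf_crit):
    """Label cases 'unstable' (R_f < 0), 'weakly stable' (0 <= R_f < rf_crit) or 'stable'.

    rf_crit is a HYPOTHETICAL threshold; the value used in the study is not given here.
    """
    rf = np.asarray(rf, dtype=float)
    return np.where(rf < 0, "unstable", np.where(rf < rf_crit, "weakly stable", "stable"))


def class_percentages(labels):
    """Percentage of cases in each stability class."""
    labels = np.asarray(labels)
    return {c: 100.0 * np.mean(labels == c) for c in ("unstable", "weakly stable", "stable")}
```

With class percentages computed in the same way for models and observations, the comparison in the text (for example, an observed stable fraction of 37% versus model fractions 3–11 percentage points higher) is a simple difference of these numbers.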
Table 4 provides more detailed statistics on the stability regimes found in the models at different times and seasons. At noon, unstable conditions dominate completely in both models and observations. At midnight in spring and summer, models and observations agree that fewer than about 5% of the cases are unstable, whereas there are larger discrepancies in the division between the two stable classes; these could partly be due to the methods used to derive the flux Richardson number. At midnight in fall and winter, the observations show 9% and 14% unstable conditions, respectively, which the models capture rather well. At the remaining times (0600 and 1800 LT) in all seasons, the differences among the models and relative to the observations are larger. These transition times between day and night pose extra challenges for models (e.g., Svensson et al. 2011; Bosveld et al. 2014), and the slightly different timing of the model output and the observations could also matter more at these times.
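Statistics like those in Table 4 amount to counting stability classes within each season and output time. A minimal sketch, assuming each case already carries a month, a local hour, and a class label; the function names are illustrative, not taken from the study.

```python
from collections import Counter, defaultdict

SEASONS = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def regime_table(months, local_hours, labels):
    """Percentage of each stability class for every (season, local hour) pair."""
    counts = defaultdict(Counter)
    for month, hour, label in zip(months, local_hours, labels):
        counts[(SEASONS[month], hour)][label] += 1
    table = {}
    for key, counter in counts.items():
        n = sum(counter.values())
        table[key] = {label: 100.0 * k / n for label, k in counter.items()}
    return table
```

Classes that never occur in a given (season, hour) group are simply absent from that entry rather than listed as 0%.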
Figure 11 shows that the weakly stable category contains the cases with the strongest winds and has a high observed variability. Most models overestimate the mean PBL wind speed in these conditions, as well as its variability (not shown), particularly at midnight. The observations show a much stronger potential temperature increase across the weakly stable layer, but because of the stronger wind shear near the surface, these cases are still classified as near neutral. In winter, the weakly stable cases actually show a stronger temperature inversion over the entire PBL than the stable class, for which less wind shear is observed.
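The compositing behind Fig. 11, in which profiles are scaled by the PBL height, interpolated to z/h levels taken as the mean z/h of each model level, and summarized by medians and 25th/75th percentiles, can be sketched as follows. This is illustrative only: it assumes the profiles passed in have already been grouped by season, time of day, and stability class, and it uses linear interpolation, which may differ from the study's choice.

```python
import numpy as np


def composite_profiles(z, profiles, h, zh_levels=None, surface_values=None):
    """Median and 25th/75th percentile profiles on PBL-height-scaled levels z/h.

    z: (n_levels,) heights above ground (m), increasing
    profiles: (n_cases, n_levels), e.g. potential temperature or wind speed
    h: (n_cases,) PBL height of each case (m)
    zh_levels: target z/h levels; by default the mean z/h of each level over all cases
    surface_values: optional (n_cases,) values (e.g. 2-m temperature) subtracted per case
    """
    z = np.asarray(z, dtype=float)
    h = np.asarray(h, dtype=float)
    profiles = np.asarray(profiles, dtype=float)
    zh = z[None, :] / h[:, None]  # (n_cases, n_levels) scaled heights
    if zh_levels is None:
        zh_levels = zh.mean(axis=0)
    if surface_values is not None:
        profiles = profiles - np.asarray(surface_values, dtype=float)[:, None]
    # np.interp holds the end values constant outside each case's z/h range
    scaled = np.array([np.interp(zh_levels, zh_i, p_i) for zh_i, p_i in zip(zh, profiles)])
    p25, median, p75 = np.percentile(scaled, [25, 50, 75], axis=0)
    return median, p25, p75
```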
Overall, CESM1 has the PBL structure closest to the observations except near the surface, which could be explained by its deep first model layer (about 60 m). The PBL closure in CESM1 is a diagnostic TKE scheme, which could potentially capture more vertical structure than K-profile schemes. However, at 1800 LT in summer (Fig. 11b), the CESM1 profile in the stable category, which contains 25% of the data (compared with only 8% in the observations; see Table 4), shows a very slow increase in wind speed with height and little structure in temperature. HadGEM2-A shows the largest near-surface temperature gradients for nearly all times and stability categories, and it has the coldest temperatures in the bulk of the PBL at noon in both summer and winter. This is likely due to the too-warm near-surface temperatures that produce the too-deep PBLs discussed above. IPSL-CM5A-MR is often close to HadGEM2-A, and both show too strong PBL winds at noon in summer, likely because the too-deep PBL mixes down more momentum. The models generally show less variability than the observations; however, the variability is often larger in the models at the transition times (0600 and 1800 LT), most often for wind speed (especially in winter; not shown).
5) Wind turning over the PBL
The PBL is important for the large-scale circulation, in particular the development of cyclones, because of the cross-isobaric flow. One way of estimating the magnitude of this flow, which has a direct effect on the pressure field as well as secondary effects through the spindown of circulations (e.g., Holton and Hakim 2012; Beare 2007), is to examine the shift in wind direction between the surface and the top of the PBL. Figure 12 presents this wind turning for all cases as well as for each of the stability classes used in the previous section. In the calculations, we use the first model level as an approximation of the surface and the level just above the PBL as the free flow. Most of the turning between these levels is in the Ekman direction (positive values), and both observations and models show a clear preference for this direction. Two of the models, BCC_CSM1.1 and HadGEM2-A, show ranges of values very similar to the observations, although they have larger peaks at small turning angles than observed. The PDF for IPSL-CM5A-MR is much narrower than the observed one, especially for the stable categories. CESM1 is the model that diverges the most, with many cases of turning angles of 50° or more. A likely reason is the large subgrid orographic surface drag applied in this model through a turbulent mountain stress (TMS) parameterization (Richter et al. 2010). The TMS parameterization was added to remove enough momentum from the atmosphere to improve the general circulation, but it has consequences for the near-surface treatment. This is discussed further in Lindvall et al. (2013) and Lindvall (2014).
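The turning angle can be computed from the wind directions at the first model level and at the level just above the PBL. A minimal sketch, assuming meteorological wind directions (degrees clockwise from north, the direction the wind blows from) and counting veering with height as positive, which is the Ekman sense in the Northern Hemisphere; the function names are illustrative.

```python
import numpy as np


def wind_direction(u, v):
    """Meteorological direction (deg): where the wind blows FROM, clockwise from north."""
    return np.mod(np.degrees(np.arctan2(-np.asarray(u, float), -np.asarray(v, float))), 360.0)


def turning_angle(dir_surface, dir_free):
    """Signed turning (deg) from the first model level to the level above the PBL.

    The result lies in [-180, 180); positive means veering (clockwise) with height.
    """
    diff = np.asarray(dir_free, float) - np.asarray(dir_surface, float)
    return np.mod(diff + 180.0, 360.0) - 180.0
```

Wrapping the difference into [−180°, 180°) avoids spurious turning of about 360° when the two directions straddle north, e.g., 350° at the surface and 10° aloft is a turning of +20°.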
The net mass flux depends not only on the wind turning over the PBL but also on the depth of the layer. Following the procedure outlined in Svensson and Holtslag (2009), the magnitude of the mass flux is proportional to the component of the surface stress directly opposing the free-atmosphere wind (i.e., the wind above the PBL). Table 5 presents the mean values of this quantity. The model values are surprisingly close to each other, around 0.2 m2 s−2, which is twice the observed value of 0.1 m2 s−2. The mean observed values for the unstable and weakly stable categories are similar, while the stable category has much lower values, as expected since the stable PBL is generally much shallower. In the observations, the value for the stable category is only 20% of that for the unstable category, whereas in the models the stable category has larger values, between 25% and 50% of the respective unstable value. The relative difference between the models and observations is largest for the stably stratified cases, but the convective cases dominate the total magnitude since they are twice as common as the weakly stable cases; see section 3b(4).
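Under the idealized assumptions of a steady, horizontally homogeneous PBL in which the wind above the PBL stands in for the geostrophic wind, integrating the momentum equations over the PBL gives a cross-isobaric mass transport of $\rho\,u_*^2\cos\alpha/f$, where α is the turning angle and f the Coriolis parameter. The sketch below evaluates this; it assumes the surface stress acts directly against the first-level wind and uses a constant air density, so it illustrates the principle rather than reproducing the exact procedure of Svensson and Holtslag (2009). Function names are illustrative.

```python
import numpy as np

OMEGA = 7.292e-5  # Earth's rotation rate (rad s-1)


def opposing_stress(u_star, turning_deg):
    """Kinematic surface stress component (m2 s-2) directly opposing the free wind.

    Assumes the stress u_*^2 acts against the first-level wind, which is turned by
    turning_deg relative to the wind above the PBL, so the projection is u_*^2 cos(alpha).
    """
    return np.asarray(u_star, float) ** 2 * np.cos(np.radians(turning_deg))


def cross_isobaric_mass_flux(u_star, turning_deg, lat_deg, rho=1.2):
    """Vertically integrated cross-isobaric mass flux (kg m-1 s-1), steady-state estimate."""
    f = 2.0 * OMEGA * np.sin(np.radians(lat_deg))  # Coriolis parameter (s-1)
    return rho * opposing_stress(u_star, turning_deg) / f
```

For example, `opposing_stress(0.4, 60.0)` gives 0.08 m2 s−2, half the value for zero turning, which shows how a larger turning angle reduces the stress component opposing the free flow for a given friction velocity.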
An evaluation of the diurnal cycle in 16 CMIP5 models for 2-m temperature and specific humidity, 10-m wind speed, turbulent fluxes of latent and sensible heat, and net radiation at the surface is presented. Ten years of AMIP model experiments are compared with flux station data (FLUXNET; Baldocchi et al. 2001) following the procedure in Lindvall et al. (2013). From the analysis we draw the following conclusions:
The amplitude of the ensemble model median diurnal cycle of 2-m temperature compares well with observations over grassland in summer, midlatitude forest in winter, and boreal forest and tropical sites in both seasons, but it is too weak over midlatitude forest sites in summer and grassland in winter. A large intermodel spread and substantial biases are, however, found at all locations and times.
The diurnal cycle of 10-m wind speed is underestimated at midlatitude grassland sites and slightly overestimated at midlatitude forest sites, whereas the model ensemble median captures the amplitude quite well at the boreal forest and tropical sites. There is, however, a large model spread at all sites and some models even have diurnal cycles out of phase with the observations.
The evaluation of turbulent heat fluxes and net radiation shows substantial differences in the diurnal cycle. Further analysis of the nighttime energy fluxes reveals some systematic biases common to all models, including a too negative net radiation, which is not entirely compensated for by the turbulent heat fluxes. Both turbulent fluxes are too large in magnitude but of opposite sign, with the sensible heat flux too negative and the latent heat flux too large and positive. The downwelling longwave radiation in winter shows an ensemble model median bias of about −19 W m−2 at night, which points to interlinked problems with the boundary layer turbulence and radiation.
A subset of the models (CESM1, BCC_CSM1.1, HadGEM2-A, and IPSL-CM5A) is evaluated in more detail using the more extensive dataset available from the Atmospheric Radiation Measurement Program (ARM) for the Southern Great Plains in Oklahoma. At this grassland site, the diurnal cycles of the four evaluated models scatter around those of the full ensemble, so these four models appear to be fairly representative of it. From this part of the study, we conclude the following:
The diurnal cycle of 2-m temperature is well represented in three of the models, while it is too weak in winter in BCC_CSM1.1. The simulated diurnal cycles of wind speed are quite variable and the mean wind speeds in the models vary between 3 and 8 m s−1 compared with an observed value of 5 m s−1.
All four models generally overestimate the boundary layer depth in all seasons. The overestimation is most pronounced in summer.
The models show a substantial daytime mean warm bias in summer of up to 10°C. The very high daytime temperatures are combined with very low turbulent latent heat fluxes, which indicates either that the soil is much drier than observed or that the simulated plant transpiration is too limited during warm conditions. These biases likely have consequences for climate change scenarios.
Diagnosing the wind turning over the boundary layer and the magnitude of the surface stress opposing the wind just above the boundary layer provides an estimate of the cross-isobaric mass flux. This analysis shows large differences among the models and between the models and the observations. The wind turning in BCC_CSM1.1 and HadGEM2-A is similar to the observations, while IPSL-CM5A exhibits a much narrower distribution and CESM1 a much larger wind turning. The modeled mean surface stress opposing the free-flow wind is twice as large as the observed value. The absolute difference between the models and the observations is largest for convective conditions, but the percentage difference is largest for the stable cases.
We acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table 2 of this paper) for producing and making available their model output. For CMIP the U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. We would also like to acknowledge high performance computing support from Yellowstone (http://www2.cisl.ucar.edu/resources/yellowstone) provided by NCAR’s Computational and Information Systems Laboratory, sponsored by the National Science Foundation. The CESM project is supported by the National Science Foundation and the Office of Science (BER) of the U.S. Department of Energy. We also thank the individual PIs at the flux sites and their teams for the data collection and preparation. We acknowledge the Ameriflux, AsiaFlux, CarboEurope IP, Fluxnet Canada, LBA, and OzFlux projects, all parts of FLUXNET, for coordinating and providing data. Data from the Coordinated Energy and Water Cycle Observation Project (CEOP) was provided by NCAR/EOL (http://data.eol.ucar.edu/) under the sponsorship of the National Science Foundation.
Current affiliation: Department of Meteorology, Stockholm University, Stockholm, Sweden.