The First International Urban Land Surface Model Comparison was designed to identify three aspects of the urban surface–atmosphere interactions: 1) the dominant physical processes, 2) the level of complexity required to model these, and 3) the parameter requirements for such a model. Offline simulations from 32 land surface schemes, with varying complexity, contributed to the comparison. Model results were analyzed within a framework of physical classifications and over four stages. The results show that the following are important urban processes: i) multiple reflections of shortwave radiation within street canyons; ii) reduction in the amount of visible sky from within the canyon, which impacts the net longwave radiation; iii) the contrast in surface temperatures between building roofs and street canyons; and iv) evaporation from vegetation. Models that use an appropriate bulk albedo based on multiple solar reflections, represent building roof surfaces separately from street canyons and include a representation of vegetation demonstrate more skill, but require parameter information on the albedo, height of the buildings relative to the width of the streets (height to width ratio), the fraction of building roofs compared to street canyons from a plan view (plan area fraction), and the fraction of the surface that is vegetated. These results, while based on a single site and less than 18 months of data, have implications for the future design of urban land surface models, the data that need to be measured in urban observational campaigns, and what needs to be included in initiatives for regional and global parameter databases.
The conclusions from the First International Urban Land Surface Model Comparison Project have implications for future models, observations, and parameter databases that extend beyond the urban modeling community.
Urban areas are often warmer than their surrounding rural environments, a condition that is referred to as the urban heat island (UHI). This urban warming has numerous effects, including the initiation of convective storms (e.g., Bornstein and Lin 2000), alterations to pollution dispersion patterns by adapting mixing through changes to atmospheric boundary layer structure (e.g., Sarrat et al. 2006; Luhar et al. 2014), impacts on the production and mixing of ozone (e.g., Chaxel and Chollet 2009; Ryu et al. 2013), enhanced energy demand for summertime cooling through air conditioning (e.g., Radhi and Sharples 2013; Li et al. 2014), impacts on urban ecology (e.g., Pickett et al. 2008; Francis and Chadwick 2013), and increased mortality rates during heat waves (e.g., Laaidi et al. 2011; Herbst et al. 2014; Saha et al. 2014). As such, it is important to be able to accurately forecast urban warming and other meteorological variables for cities where the majority of the world’s population now lives.
Predictions of future climate suggest additional warming in urban environments (McCarthy et al. 2010; Oleson et al. 2011). Indeed, the Intergovernmental Panel on Climate Change’s (IPCC) Working Group 1 Fifth Assessment Report (Stocker et al. 2013) included at least one model that explicitly included an urban representation, and this number is likely to increase in the future as the resolution of these climate models increases to the extent that some urban areas are resolved. For future design of buildings and planning of cities, it is important that the dominant processes that lead to urban warming effects are considered. This requires the development of models that can represent the most important features of the urban heat island for use in making reliable predictions.
The urban heat island results from differences in surface energy exchanges between the urban environment and its surrounding rural area. Thus, an understanding of these differences is needed to interpret the urban heat island. The differences in urban surface energy exchanges arise through a number of processes. The geometry of a street canyon will increase the incoming solar radiation and longwave radiation that are absorbed, due to multiple reflections, and reradiated from the three-dimensional structures. The orientation of street canyons and the elevation of the sun will impact the reflected solar radiation, as a consequence of the depth to which the direct sunshine can penetrate into the canyon. The reduced availability of water at the urban surface, compared to natural vegetated or bare soil surfaces, means more of the incoming solar radiation is transformed into heat rather than a flux of moisture into the atmosphere. However, a larger proportion of this energy for heating is held within the fabric of the buildings, given the large thermal inertia of the materials, resulting in changes in the diurnal cycle of urban temperatures. Moreover, an additional source of heating within the urban areas comes from human activities such as transport, the internal heating of the buildings, and the metabolic rates of the people themselves (e.g., Sailor and Lu 2004).
All of these processes contribute to the differences in the energy balance between urban and rural surfaces, but it is difficult to identify which are the dominant processes just from observations as the processes cannot be separated because of the complex nature of the environment. As such, the best way to study these processes individually is by using urban land surface models (ULSMs) that have been developed for weather and climate applications (i.e., exchange surface fluxes with an atmospheric model). There are a number of such ULSMs that vary considerably in their complexity (e.g., Kusaka et al. 2001; Fortuniak 2003; Krayenhoff and Voogt 2007; Hamdi and Masson 2008; Lee and Park 2008; Oleson et al. 2008a). Although newer models often include more complex features than previous versions, without knowing the dominant processes and controls, it is difficult to quantify the impact of each new feature.
The first urban land surface model comparison was designed to objectively assess and compare the performance of a range of ULSMs for a single observational site. It attempted to identify the dominant physical processes that need to be represented in ULSMs by comparing models of varying complexity (Table 1). These models ranged from simple bulk representations of the surface that have been applied to atmospheric models for over a decade, representations of the facets of a street canyon (i.e., roofs, walls, and roads) that have been used in weather and climate models, through to more recently developed schemes that consider a complete energy balance at various levels within the urban canyon that have been applied to stand-alone single-point studies. Figure 1 shows a conceptual representation of the surface energy balance for these models of varying complexity. While the scale that these models typically represent is larger than the size of the elements within a street canyon, a common feature is the ability to predict the exchange of fluxes between the urban surface and the atmosphere above it, that is, the net all-wave radiation (Q*) and the turbulent sensible (QH) and latent heat (QE) fluxes, as measured from flux towers in numerous urban observational campaigns.
The aim of the urban model comparison was to consider the following questions:
1) What are the dominant physical processes in the urban environment?
2) What is the level of complexity required for a ULSM to be fit for purpose?
3) What are the parameter requirements for such a model?
Here, we present an analysis of the model comparison results to address these questions.
MODEL COMPARISON DESIGN.
The criteria for selecting the evaluation dataset were as follows. First, it had not been used to evaluate any ULSMs previously, and second, it needed to cover an annual cycle to allow assessment for different seasons. Model evaluation studies often result in the development and optimization of a model in order to obtain better representation of the assessed metrics. Hence, using a dataset previously used by one or a subset of the models to be evaluated would not enable an independent, objective assessment for all of the models.
The dataset for a suburb of Melbourne (Preston), Victoria, Australia (Coutts et al. 2007a,b), that had observations from 13 August 2003 to 13 November 2004 was selected. The moderately developed, low-density housing area is classified by Coutts et al. (2007b) as an urban climate zone (UCZ) 5 (Oke 2006), a local climate zone (LCZ) 6 (Stewart and Oke 2012), or by Loridan and Grimmond (2012) as an urban zone for energy exchange (UZE) medium density. The description of UCZ 5 is “medium development, low density suburban with 1 or 2 storey houses, e.g., suburban housing” (Oke 2006), and as such the site is typical of suburban areas found in North America, Europe, and Australasia. The area has a mean building height-to-width ratio of 0.42 and a mean wall-to-plan ratio of 0.4 (Coutts et al. 2007b). The surface is dominated by impervious cover (44.5% buildings, 4.5% concrete, and 13% roads), with a pervious cover of 38% (15% grass, 22.5% other vegetation, and 0.5% bare ground or pools) (Coutts et al. 2007a).
The methods used to obtain the observed fluxes applied to our current analysis are given in Table 2, with details (e.g., data processing) presented in the original observation papers (Coutts et al. 2007a,b). In addition, the initial model comparison results papers (Grimmond et al. 2011; Best and Grimmond 2013, 2014) provide the site parameters. A continuous gap-filled atmospheric forcing dataset (474 days) to run the models was created for this study (see Grimmond et al. 2011). To evaluate the modeled fluxes [sensible heat flux, latent heat flux, net all-wave radiative flux, and net storage heat flux (ΔQS)], 30-min periods are used when no observed fluxes are missing, to allow consistent analysis between the fluxes (N = 8865 or 38.9% of the full period).
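The selection of 30-min periods in which no observed flux is missing can be sketched as a complete-case mask. This is a minimal illustration, not the processing used for the dataset: the function name, the use of NaN to mark gaps, and the sample values are all assumptions.

```python
import math


def complete_case_mask(*flux_series):
    """Return True for each time step at which every flux series has a value.

    Missing observations are assumed to be stored as NaN (an assumption made
    for this sketch; the original data format is not specified here).
    """
    lengths = {len(series) for series in flux_series}
    if len(lengths) != 1:
        raise ValueError("all flux series must have the same length")
    return [
        all(not math.isnan(value) for value in values)
        for values in zip(*flux_series)
    ]


# Hypothetical 30-min observations in W m-2 (illustrative values only).
nan = float("nan")
q_star = [420.0, nan, 15.0, -40.0]   # net all-wave radiation
q_h = [150.0, 120.0, nan, -5.0]      # sensible heat flux
q_e = [90.0, 80.0, 10.0, 2.0]        # latent heat flux
dq_s = [180.0, nan, nan, -37.0]      # net storage heat flux

mask = complete_case_mask(q_star, q_h, q_e, dq_s)
n_valid = sum(mask)
percent_valid = 100.0 * n_valid / len(mask)
```

Using one shared mask for all fluxes keeps the sample identical across Q*, QH, QE, and ΔQS, which is what allows the fluxes to be compared consistently.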
To permit the research questions posed above to be considered, information about the observational site was released to the modeling groups in stages. This enabled analysis of the importance of the different types of information to model performance through the assessment of the change in model skill between the stages. The stages (Table 3; Grimmond et al. 2011), designed to correlate with ease of access to information for all cities globally, involved the release of the following information:
stage 1—atmospheric forcing data (Table 3), typically provided by an atmospheric model;
stage 2—vegetation and built fraction, the two-dimensional plan area characteristics of the site; these characteristics can be determined from land cover datasets derived from satellite data;
stage 3—morphology, the three-dimensional characteristics of the site (Table 3); these traits can be interpreted from lidar (e.g., Goodwin et al. 2009; Lindberg and Grimmond 2011), aerial photographs (e.g., Ellefsen 1991), detailed satellite imagery (e.g., Brunner et al. 2010), or simple empirical relations (e.g., Bohnenstengel et al. 2011);
stage 4—building material parameters (Table 3), which are only obtainable from local knowledge of the materials used in the construction of the buildings; and
stage 5—observed fluxes, which allow parameter optimization studies. Only a few groups completed this stage, so these results are not presented here.
The results from 24 modeling groups are analyzed, involving 21 independent models (Table 1). Alternative versions of the same model were run by the same or independent modeling groups, which resulted in 32 sets of model simulations being submitted for all of the four stages (see full list in Grimmond et al. 2011). Each group completed a survey indicating the level of complexity used for various physical processes within their models. From the latter, categories of physical processes were established, with classes that cover the range of complexities (Grimmond et al. 2010, 2011). These categories were chosen to investigate the importance of various physical processes that could contribute to differences in the surface energy balance between the urban and rural environments. Thus, every model is assigned to a class in each category based on the survey information. In this study, the complexity category (Grimmond et al. 2011) is not considered, as the focus is to separate the specific physical processes. The categories, with the number of models in each class, are shown in Table 4.
Comparing the mean behavior of the models in each of the classes as a reference provides a method for determining the level of complexity that gives the best performance for each category. These data are analyzed to address the second research question, where “fit for purpose” in this study is defined as being able to accurately represent the energy exchange between the urban surface and the atmosphere (i.e., the net all-wave radiation and the turbulent sensible and latent heat fluxes).
Furthermore, by assessing the performance of the models across the categories for all classes, it is possible to identify the physical processes that have the largest impact on the performance of the models, hence identifying the dominant physical processes and addressing the first research question.
Initial results from the urban model comparison (Grimmond et al. 2011) ranked the models and assessed the performance of the various classes within the categories using standard statistical measures. Here, an alternative approach for assessing the models’ performance is used, which considers the percentage of the models’ data values that are within observational error (Eobs). This gives a measure between zero (no values within observational errors) and 100% (all values within observational errors, i.e., a “perfect” model). Although this type of analysis is not strictly benchmarking, as each model is not being compared to an a priori metric, it could be considered as being closer to the benchmarking ethos as having all data points within observational errors would be a stringent metric.
The observational error estimates used in this analysis are for daytime fluxes based on a percentage of the observed fluxes, as suggested by Hollinger and Richardson (2005): net all-wave radiation flux, 5%; turbulent sensible heat flux, 10%; latent heat flux, 8%; and upward components of both shortwave and longwave radiation fluxes, 10%. As the net storage heat flux in the observational dataset is determined to be the residual of the surface energy balance, its observational error is assumed to be the sum of the errors for the other terms (i.e., Q*, QH, and QE), giving 23%. The nighttime error estimates are assumed to be double the daytime error estimates for each of the fluxes. The absolute magnitude of the fluxes during the nighttime is typically small [O(10) W m−2]; hence, changes in the percentage of the observed flux used as the error estimates are likely to be within the reporting resolution [e.g., O(1) W m−2] of the observations (especially the turbulent fluxes). While these error estimates are indicative rather than exact, using different values would not substantially change the analysis presented.
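The error estimates above can be written as a small lookup, sketched here under stated assumptions: the flux keys and function names are hypothetical, and the error is treated as a symmetric band around the observed value, which the text does not state explicitly.

```python
# Daytime observational errors as a percentage of the observed flux,
# taken from the values quoted in the text.
DAYTIME_ERROR_PERCENT = {
    "Qstar": 5.0,   # net all-wave radiation
    "QH": 10.0,     # turbulent sensible heat flux
    "QE": 8.0,      # latent heat flux
    "SWup": 10.0,   # upward shortwave radiation
    "LWup": 10.0,   # upward longwave radiation
}
# Net storage heat flux is a residual, so its error is the sum of the
# errors of Q*, QH, and QE.
DAYTIME_ERROR_PERCENT["dQS"] = sum(
    DAYTIME_ERROR_PERCENT[name] for name in ("Qstar", "QH", "QE")
)


def error_band(flux_name, observed, is_daytime):
    """Half-width of the observational error band in W m-2."""
    percent = DAYTIME_ERROR_PERCENT[flux_name]
    if not is_daytime:
        percent *= 2.0  # nighttime errors are assumed to be double
    return abs(observed) * percent / 100.0


def within_observational_error(flux_name, modelled, observed, is_daytime):
    """Assumption: a symmetric band centered on the observed flux."""
    return abs(modelled - observed) <= error_band(flux_name, observed, is_daytime)
```

For example, a daytime observed QH of 200 W m−2 gives a band of ±20 W m−2, while a nighttime QH of −10 W m−2 gives ±2 W m−2, which is close to the reporting resolution noted above.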
The analysis was undertaken for each model k in each class j within each category i (Table 4), for each flux, over each stage within the comparison, and separately for daytime and nighttime. From this, the percentage of data within observational error ($E^{\mathrm{obs}}_{i,j}$) was determined:

$$E^{\mathrm{obs}}_{i,j} = \frac{100}{n\,T} \sum_{k=1}^{n} M_k \qquad (1)$$

where $M_k$ is the number of points within observational error for model k, n is the number of models in class j of category i, and T is the number of daytime or nighttime points in the time series, as appropriate.
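As a minimal sketch of this measure, the function below pools the per-model counts of points within observational error for one class and expresses them as a percentage of all model data points. The function name and the boolean "hit" representation are assumptions made for illustration.

```python
def percent_within_error(hits_per_model):
    """Percentage of model data points within observational error for a class.

    hits_per_model: one list of booleans per model in the class, each of
    length T, where True marks a time step within observational error.
    Returns None when the class has no member models.
    """
    n_models = len(hits_per_model)
    if n_models == 0:
        return None
    n_points = len(hits_per_model[0])
    if n_points == 0 or any(len(hits) != n_points for hits in hits_per_model):
        raise ValueError("every model needs the same non-zero number of points")
    # Sum over models k of M_k, the number of points within error.
    total_hits = sum(sum(hits) for hits in hits_per_model)
    return 100.0 * total_hits / (n_models * n_points)


# Hypothetical class with two models and four daytime points each.
example_class = [
    [True, True, False, False],   # M = 2
    [True, False, False, False],  # M = 1
]
e_obs = percent_within_error(example_class)
```

Here three of the eight model data points fall within the error band, so the class scores 37.5%.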
Application of Eq. (1) to the sensible, latent, and net storage heat fluxes, for each class and category, at stages 1 and 4 (Table 3) is shown in Fig. 2. The results could range between 0% (i.e., no model data points within observational errors) and 100% (i.e., all model data points within observational errors). The relative changes between the stages are also shown in Fig. 2, that is, for stage s the change relative to the previous stage (s − 1) given by

$$\Delta E^{\mathrm{obs}}_{i,j}(s) = 50 + \frac{E^{\mathrm{obs}}_{i,j}(s) - E^{\mathrm{obs}}_{i,j}(s-1)}{2} \qquad (2)$$
Assessment of “between stages performance” allows an emphasis on the common results across all of the classes and categories. It is scaled between 0% and 100%, with 50% corresponding to no change between the stages (Fig. 2).
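The between-stage measure described above can be sketched as follows. The linear rescaling is an assumption: it is the simplest mapping that is bounded by 0% and 100% and returns 50% when there is no change, and the function name is hypothetical.

```python
def between_stage_change(e_obs_stage, e_obs_previous):
    """Change in the percentage within error, rescaled to the range 0-100.

    Both inputs are percentages in [0, 100]. Their difference lies in
    [-100, 100]; halving it and adding 50 maps no change to 50.
    The linear form is assumed for this sketch.
    """
    for value in (e_obs_stage, e_obs_previous):
        if not 0.0 <= value <= 100.0:
            raise ValueError("percentages must lie between 0 and 100")
    return 50.0 + (e_obs_stage - e_obs_previous) / 2.0
```

Values above 50% then indicate that more model data points fall within observational error than at the previous stage, and values below 50% indicate fewer.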
Generally, the results of the analysis, consistent with Grimmond et al. (2011), show that the skill to model latent heat fluxes is improved between stages 1 and 2. Knowing the plan area vegetation fraction (provided in stage 2) is important for modeling the latent heat flux. No other stages show a general increase in model performance across the classes and categories for the fluxes shown in Fig. 2. For the radiation fluxes (Fig. 3), the largest changes evident between stages 3 and 4 are for the reflected shortwave radiation flux and are due to the specification of the bulk albedo at the site (i.e., the ratio of the reflected outgoing shortwave radiation flux from the whole urban surface to the incoming shortwave radiation flux, information released during stage 4). This is also consistent with the conclusions from Grimmond et al. (2011).
Model performance for the outgoing longwave radiation flux has its largest changes during the nighttime between stages 2 and 3 [when the 3D site morphological information (Table 3) was made available; see Fig. 3]. This enhanced performance at night could be related to improved estimates of the sky-view factor, which influences radiative trapping, and/or from improved estimates of the difference in nocturnal surface temperatures between building roofs and those of the roads and walls of the urban canyons. Improved performance is not detected in the daytime outgoing longwave radiation flux (Fig. 3), probably because of the dominance of shortwave radiation at this time. These results were not identified in Grimmond et al. (2011) as there was no separate analysis for daytime and nighttime.
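To illustrate how the morphology information constrains the sky-view factor, the sketch below uses the standard analytical expressions for an infinitely long street canyon. This idealized geometry is an assumption of the example and is not necessarily what any of the compared models implements; the function names are hypothetical, and the height-to-width ratio is the site mean of 0.42.

```python
import math


def road_sky_view_factor(h_over_w):
    """Sky-view factor of the road in an infinitely long canyon."""
    return math.sqrt(h_over_w ** 2 + 1.0) - h_over_w


def wall_sky_view_factor(h_over_w):
    """Sky-view factor of one wall in an infinitely long canyon."""
    if h_over_w <= 0.0:
        raise ValueError("height-to-width ratio must be positive")
    return 0.5 * (h_over_w + 1.0 - math.sqrt(h_over_w ** 2 + 1.0)) / h_over_w


site_h_over_w = 0.42  # mean height-to-width ratio reported for the site
psi_road = road_sky_view_factor(site_h_over_w)
psi_wall = wall_sky_view_factor(site_h_over_w)
```

With this geometry the road sees roughly two-thirds of the sky, so about one-third of its longwave emission is intercepted by the walls rather than escaping directly, which is the trapping effect referred to above.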
It is evident from Figs. 2 and 3 that the performance of the models for each of the fluxes does not improve consistently for each stage, as might be expected. This suggests that the models are not able to correctly make use of all of the information that is provided at each of the stages and hence the design of the models, and the use of their specific parameters, is not necessarily correct. This is discussed further in Grimmond et al. (2011).
Each model is assigned to one class for every category (Table 4). This means that a model with particularly good (or poor) performance will influence the results for its class in each of the categories. The implication is that it is not possible to ensure that the good performance of a particular class within one category does not actually result from a class in a different category. This potential contamination of results by categories inhibits the analysis of the dominant physical processes and the suitability of the models. Both the analysis presented in Grimmond et al. (2011) and that in Figs. 2 and 3 have this limitation; hence, we will not consider further any results in Figs. 2 and 3 for any specific class or category. Instead, to address this issue of cross-contamination, we repeat the complete analysis using Eq. (1) separately for each category c, but only considering the subset of models from class a. Hence, for each class j in category i for the analysis of Eq. (1), the models used are those that are in both class a of category c and class j of category i, of which there are $n^{c,a}_{i,j}$; thus,

$$E^{\mathrm{obs},c,a}_{i,j} = \frac{100}{n^{c,a}_{i,j}\,T} \sum_{k=1}^{n^{c,a}_{i,j}} M_k \qquad (3)$$
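The subsetting step can be sketched as a filter on class membership. The model names and class assignments below are hypothetical; only the category and class labels (R, V, i, m, s, n) follow those used in the text.

```python
def models_in_both(memberships, category_c, class_a, category_i, class_j):
    """Names of models in class_a of category_c and class_j of category_i.

    memberships maps a model name to a dict of {category: class}.
    """
    return sorted(
        name
        for name, classes in memberships.items()
        if classes.get(category_c) == class_a
        and classes.get(category_i) == class_j
    )


# Hypothetical survey results (not the assignments used in the comparison).
memberships = {
    "model_A": {"R": "i", "V": "s"},
    "model_B": {"R": "i", "V": "n"},
    "model_C": {"R": "m", "V": "s"},
    "model_D": {"R": "i", "V": "s"},
}

subset = models_in_both(memberships, "R", "i", "V", "s")
empty_subset = models_in_both(memberships, "R", "m", "V", "n")
```

An empty subset corresponds to a class with no data in the restricted analysis, which has to be excluded when counting how many classes improve.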
This gives the equivalent of 26 versions of Figs. 2 and 3 (one for each class in each category), although for a given subset of models it is inevitable that some classes will not have any members and hence have no data. We then apply the following equation for each of the stages to determine which of the original class of models has the best performance:

$$P_{ca} = 100\,\frac{N^{\mathrm{imp}}_{ca}}{N_{\mathrm{tot}} - N^{\mathrm{nd}}_{ca}} \qquad (4)$$

where $P_{ca}$ is the percentage of classes in the analysis that are improved from just the subset of models (compared to the analysis with the full set of models), $N^{\mathrm{imp}}_{ca}$ is the number of classes that are improved in the analysis, $N_{\mathrm{tot}}$ is the total number of classes, and $N^{\mathrm{nd}}_{ca}$ is the number of classes with no data.
Hence, values of Pca close to 100% relate to nearly all classes in all categories being improved from the physical process represented in class a of category c. This indicates that this process and its representation are important to model performance, whereas values close to 0% relate to almost all classes in all categories being degraded, suggesting that the representation of the physical process is detrimental to model performance. Values around 50% have a similar number of classes that are improved and degraded, suggesting that the representation of the physical process has little impact on model performance. Hence, the conclusions that can be drawn from this analysis are more robust than those of Figs. 2 and 3 and from the previous study of Grimmond et al. (2011).
For example, with models that have an infinite number of reflections (category R, class i), the median of the results over the stages gives a value of 88% for the nighttime net storage heat flux (Fig. 4). This results from 14 of the 16 possible classes containing data that has shown improvement when considering only these models, demonstrating that this is important for predicting this flux. However, models that have multiple reflections (category R, class m) have a value of 12.5% for the nighttime net storage heat flux (Fig. 4). This results from only two of the possible 16 classes containing data that has been improved, hence, showing that this is detrimental to predicting the flux.
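The percentages quoted in this example can be reproduced with a short sketch. The function name is hypothetical, a class is assumed to count as improved only when its subset score is strictly higher than its full-set score, and the scores themselves are invented so that 16 of 26 classes contain data.

```python
def percent_classes_improved(subset_scores, full_scores):
    """Percentage of classes with data that improve for a subset of models.

    Both arguments map a class key to a percentage within observational
    error, or to None when the class has no data.
    Assumption: "improved" means strictly greater than the full-set score.
    """
    n_total = len(full_scores)
    n_no_data = 0
    n_improved = 0
    for key, full_value in full_scores.items():
        subset_value = subset_scores.get(key)
        if subset_value is None or full_value is None:
            n_no_data += 1
        elif subset_value > full_value:
            n_improved += 1
    n_with_data = n_total - n_no_data
    if n_with_data == 0:
        return None
    return 100.0 * n_improved / n_with_data


# Invented scores: 26 classes, 10 without data, 14 improved, 2 degraded.
full_scores = {index: 50.0 for index in range(26)}
subset_scores = {}
for index in range(26):
    if index < 14:
        subset_scores[index] = 60.0   # improved
    elif index < 16:
        subset_scores[index] = 40.0   # degraded
    else:
        subset_scores[index] = None   # no models in this class

p_improved = percent_classes_improved(subset_scores, full_scores)
```

With 14 of 16 classes improved the result is 87.5%, which rounds to the 88% quoted for infinite reflections, and 2 of 16 gives the 12.5% quoted for multiple reflections.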
The results in Fig. 4 show that for some classes (e.g., infinite reflections; category R, class i; Table 4), there are some demonstrated improvements to a flux (e.g., LWup) that are not obviously explained by the physics (e.g., how do infinite reflections of shortwave radiation improve the outgoing longwave radiation but not the reflected shortwave?). Also, there are some classes that improve one particular flux, but not other fluxes. For example, models that represent the net storage heat flux as the residual of the surface energy balance (category S, class r; Table 4) demonstrate a clear improvement for the daytime sensible heat flux, but not for the latent or the net storage heat fluxes. This could be because with such models the sensible heat flux is not constrained by the energy balance giving them the freedom to enable better predictions of the sensible heat flux, while moisture availability is still the main control for the latent heat flux.
There are many such conclusions that can be drawn from Fig. 4. Here, the focus is on results that are consistent between the fluxes or consistent for a particular flux between the day- and nighttime.
Models with a bulk representation of the albedo and emissivity (category AE, class 1; Table 4), and a bulk representation of facets and orientation (category FO, class 1; the models in these two classes were identical), demonstrate an improvement in skill during the daytime for nearly all fluxes, with the exceptions of the outgoing longwave radiation, which shows little change in skill, and net all-wave radiation fluxes, with only small improvements (Fig. 4). This class of models also shows an improvement in the nighttime sensible and latent heat fluxes, but degradation in the radiative fluxes during the night. These improved results are most likely due to the ability to utilize the observed bulk albedo directly. This class of models clearly delivers the largest benefits across the fluxes and indicates the most significant physical process to represent is the bulk albedo for the urban surface, because the net shortwave radiation dominates the surface energy balance.
Improvements to the outgoing longwave radiation flux and the net all-wave radiation flux during both daytime and nighttime are obtained from models that have a single layer for each element of the urban environment (i.e., roofs and either urban canyons or walls and roads separately) in the morphology category (category L, class 2; Table 4 and Fig. 4). Improvements to the nighttime sensible heat flux and net storage heat flux are also obtained from this class of models, but there is no improvement to these fluxes during the daytime. This neutral daytime result in the sensible and net storage heat fluxes may be explained by the negative impact on the outgoing shortwave radiation flux, which dominates over the longwave radiation flux during the daytime. However, these results demonstrate the importance of representing the difference in radiative surface temperatures between the roofs and the urban canyon, due to the nonlinear relationship between the upward longwave radiation and the radiative temperature.
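The effect of this nonlinearity can be illustrated with the Stefan–Boltzmann law. The sketch below compares the emitted longwave flux from separate roof and canyon temperatures with the flux from a single area-weighted mean temperature. The temperatures are hypothetical, the emissivity is set to 1 so that reflected longwave is ignored, and the roof fraction uses the 44.5% building cover reported for the site.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m-2 K-4


def emitted_longwave(temperature_k, emissivity=1.0):
    """Longwave emission from a surface at the given radiative temperature."""
    return emissivity * SIGMA * temperature_k ** 4


def tiled_and_bulk_longwave(roof_fraction, t_roof_k, t_canyon_k):
    """Compare separate-surface emission with emission at the mean temperature.

    Returns (tiled, bulk): tiled weights the two emitted fluxes by plan area,
    bulk applies the law once to the area-weighted mean temperature.
    """
    canyon_fraction = 1.0 - roof_fraction
    tiled = (roof_fraction * emitted_longwave(t_roof_k)
             + canyon_fraction * emitted_longwave(t_canyon_k))
    mean_temperature = roof_fraction * t_roof_k + canyon_fraction * t_canyon_k
    bulk = emitted_longwave(mean_temperature)
    return tiled, bulk


# Hypothetical nocturnal temperatures: cool roofs, warmer canyon.
tiled_flux, bulk_flux = tiled_and_bulk_longwave(0.445, 278.0, 288.0)
flux_difference = tiled_flux - bulk_flux
```

Because emission scales with the fourth power of temperature, the area-weighted flux from two distinct surfaces always exceeds the flux computed from their mean temperature, so a model that carries a single surface temperature cannot reproduce both the mean temperature and the upward longwave flux when roofs and canyons differ.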
When considering the way in which the models represent vegetation (category V; Table 4), we find that although including vegetation (classes s and i; Table 4) does generally lead to an improvement for the fluxes, these improvements are not as obvious as those from the bulk albedo or the single-layer urban morphology. Hence, although these results confirm the finding of earlier studies of the comparison (Grimmond et al. 2011; Best and Grimmond 2013, 2014) that representing vegetation gives improved results, the more robust analysis presented here shows that this is not the most important physical process, as was concluded in these earlier studies. Getting the radiative fluxes correct, from the shortwave radiation via the bulk albedo and the longwave radiation through the urban morphology, is required before the vegetation can influence the partitioning of energy between the sensible and latent heat fluxes.
Previous studies of the urban comparison data have also concluded that models that neglect the anthropogenic heat flux (QF) do at least as well as the models that include this flux, although explaining this result has been difficult (Grimmond et al. 2011; Best and Grimmond 2013, 2014). However, the results in Fig. 4 show that although the class of models that neglects the anthropogenic heat flux (category AN, class n; Table 4) does improve some of the fluxes, the improvements are not consistent over all of the fluxes. Moreover, this class of models within the anthropogenic heat flux category is not always the one that delivers the best results. Hence, we can conclude that although the models that neglect the anthropogenic heat flux do show some improved results, we cannot make any significant statements about the classes within this category.
Prior conclusions from the ULSM comparison with daily (24 h) and seasonal analysis include that the representation of vegetation is critical to model performance (Grimmond et al. 2011; Best and Grimmond 2013), along with the associated initial soil moisture (Best and Grimmond 2014), and that the bulk albedo is also important (Grimmond et al. 2011). Notably, neglecting the distinctive urban anthropogenic heat flux was not found to penalize performance (albeit in the suburban area the value is small) (Best and Grimmond 2013). However, this new analysis considering diurnal performance (day, night) enables us to conclude that nocturnal radiative processes also benefit from accounting for the enhanced longwave trapping that occurs within urban areas. Separating the radiative processes of roofs and urban canyons is beneficial.
More critically, the more robust analysis presented here enables a reprioritization of the key physical processes: first, ensuring the use of the correct bulk albedo for the urban surface; second, the outgoing longwave radiative fluxes with their representation of morphology separated into roofs and urban canyons; and third, the inclusion of vegetation. The importance of the bulk albedo has implications for observations, as the temporal resolution of satellite estimates means they will not capture the variations with time of day that are observed (e.g., Christen and Voogt 2004; Grimmond et al. 2004; Kotthaus and Grimmond 2014).
The current results for anthropogenic heat flux are consistent with those of earlier studies, which show that neglecting the relatively small magnitude flux at this site (study period mean = ∼17 W m–2) is reasonable. This conclusion could well be different for urban environments where this is a more significant term in the surface energy balance. The flux is expected to be larger in other areas of Melbourne [e.g., as suggested from analysis using the model of Lindberg et al. (2013)] and for urban areas elsewhere. We therefore recommend that future model comparisons ideally include areas of cities with larger anthropogenic heat fluxes.
Thus, to answer the three overarching research questions of the urban model comparison, we present the following:
1) The dominant physical processes in the urban environment that models need to be able to simulate, in order, are changes to the bulk albedo of the surface that result from building materials and also shortwave trapping from the canyon geometry, the reduction in outgoing longwave radiation from the street canyon due to a reduced sky-view factor and the contrast between this and the roofs that see a full sky view, and the evaporation from vegetation.
2) For the current generation of ULSMs, the ability to utilize a bulk surface albedo (category AE, class 1; Table 4) and to be able to distinguish between the roofs of buildings and the urban canyons (category L, class 2), and to have a representation of vegetation (category V, classes s, i), results in the best performance.
3) The key parameters for ULSMs are the bulk surface albedo (information given for stage 4 influencing the upward shortwave radiation flux), the height-to-width ratio of the urban canyons and the fraction of building roofs to the urban canyons (information given for stage 3 influencing the upward longwave radiation flux), and the vegetation fraction (information given for stage 2 influencing the sensible and latent heat fluxes).
The results, from this and the previous studies on the ULSM comparison, all suggest that a simple representation for most of the physical categories is sufficient for this type of application, that is, determination of local-scale fluxes (e.g., for use in the coupling to an atmospheric model). The prior categorization of the models (Grimmond et al. 2011; Best and Grimmond 2013) into simple, medium, and complex classes, based upon the number of physical categories treated as simple by a model, demonstrated that the simple models performed best. This relative success of simple models suggests that for simulating local-scale fluxes, more complex schemes deliver little additional benefit. Furthermore, the reduced parameter requirements for simple schemes are advantageous for large-scale applications, such as global- or regional-scale modeling. However, it cannot be expected that this conclusion would also hold for other applications (e.g., atmospheric dispersion within street canyons of a specific city), as the simple models do not represent some of the basic physical requirements for such applications. Thus, the requirement for the development of more complex ULSMs does remain.
The implications of this study go beyond the urban environment. In general, we need to balance the requirement for complexity within models against what is actually required for a model to be fit for purpose. Hence, new and more complex processes should not be included in models unless it can be demonstrated that they are required. In addition, consideration needs to be given to the availability of information to specify parameters within complex models and if such complexity can be justified given the uncertainty range for the parameters. Also, the type of analysis used here could be applied to any comparison study to ensure that the results are robust and not contaminated by physical processes not being directly considered.
These key conclusions are based on the single-site observational dataset of less than 18 months. This suburban site of low-density housing is typical of extensive areas in North America, Europe, and Australasia. Hence, we might expect the results from this study to be valid over a reasonable range of cities. However, most urban environments have a range of zones (e.g., Ellefsen 1991; Grimmond and Souch 1994; Stewart and Oke 2012) with very different characteristics. So to test if the results presented here are robust for other cities, similar “experiments” are required for additional sites with differing climates and urban characteristics. Hence, we recommend that further model comparison projects be developed for the urban community.
Despite these limitations, the results have implications for future development of ULSMs and for the types of data that need to be collected in future urban measurement campaigns (e.g., soil moisture, given its role in limiting transpiration and the long time scales required for model spinup, along with the conclusion that the fraction of vegetation is important for urban areas) and/or the parameters that should be collated systematically for cities around the world (e.g., Ching et al. 2009; Loridan and Grimmond 2012; Stewart and Oke 2012; Ching 2013; Faroux et al. 2013).
Funds to support the comparison project were provided by the Met Office (P001550). MJB was supported by the Joint DECC/Defra Met Office Hadley Centre Climate Programme (CA01101). We thank Andrew Coutts and Jason Beringer for allowing their data to be used for the comparison. We would also like to thank Mathew Blackett for all of his efforts in coordinating the model data collection, and everyone who contributed results to the comparison from their models: J.-J. Baik, S. E. Belcher, J. Beringer, S. I. Bohnenstengel, I. Calmet, F. Chen, A. Dandou, K. Fortuniak, M. L. Gouvea, R. Hamdi, M. Hendry, M. Kanda, T. Kawai, Y. Kawamoto, H. Kondo, E. S. Krayenhoff, S.-H. Lee, T. Loridan, A. Martilli, V. Masson, S. Miao, K. Oleson, R. Ooka, G. Pigeon, A. Porson, Y.-H. Ryu, F. Salamanca, G. J. Steeneveld, M. Tombrou, J. A. Voogt, D. T. Young, and N. Zhang.