Search Results
You are looking at 11 - 20 of 29 items for
- Author or Editor: A. J. Pitman
- Refine by Access: All Content
Abstract
In addition to model validation techniques and intermodel comparison projects, the authors propose the use of software engineering metrics as an additional tool for the enhancement of “quality” in climate models. By discriminating between internal, directly measurable characteristics of structural complexity, and external characteristics, such as maintainability and comprehensibility, a way to benefit climate modeling by the use of easily derivable metrics is explored. As a small illustration, the results of a pilot project are presented. This is a subproject of the Project for Intercomparison of Landsurface Parameterization Schemes in which the authors use some typical structural complexity metrics, namely, for control flow, size, and coupling. Finally, and purely indicatively, the authors compare the results obtained from these metrics with scientists’ subjective views of the psychological complexity of the programs.
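As a rough illustration of what an easily derivable structural metric looks like, the sketch below counts decision points (a McCabe-style control-flow measure) and non-blank source lines (a size measure). It is an assumption-laden stand-in: the abstract does not name the exact metrics or tools used, the land-surface schemes themselves are not Python, and the function names and sample routine here are hypothetical.

```python
import ast

# Node types treated as decision points in this sketch.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

def cyclomatic_complexity(source):
    """McCabe-style count: one plus the number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches.
            decisions += len(node.values) - 1
    return decisions + 1

def source_lines(source):
    """Size metric: lines that are neither blank nor pure comments."""
    return sum(
        1 for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

# Hypothetical routine, used only to exercise the two metrics.
sample = '''
def flux(t_surf, t_air, wet):
    if wet and t_surf > t_air:
        return 2.0 * (t_surf - t_air)
    for _ in range(3):
        t_air += 0.1
    return 0.0
'''

complexity = cyclomatic_complexity(sample)
size = source_lines(sample)
```

Coupling metrics, which the abstract also mentions, would need information about calls and shared data between modules and are left out of this sketch.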
Abstract
The role of land–atmosphere coupling in modulating the impact of land-use change (LUC) on regional climate extremes remains uncertain. Using the Weather Research and Forecasting Model, this study combines the Global Land–Atmosphere Coupling Experiment with regional LUC to assess the combined impact of land–atmosphere coupling and LUC on simulated temperature extremes. The experiment is applied to an ensemble of planetary boundary layer (PBL) and cumulus parameterizations to determine the sensitivity of the results to model physics. Results show a consistent weakening in the soil moisture–maximum temperature coupling strength with LUC irrespective of the model physics. In contrast, temperature extremes show an asymmetric response to LUC dependent on the choice of PBL scheme, which is linked to differences in the parameterization of vertical transport. This influences convective precipitation, contributing a positive feedback on soil moisture and consequently on the partitioning of the surface turbulent fluxes. The results suggest that the impact of LUC on temperature extremes depends on the land–atmosphere coupling that in turn depends on the choice of PBL. Indeed, the sign of the temperature change in hot extremes resulting from LUC can be changed simply by altering the choice of PBL. The authors also note concerns over the metrics used to measure coupling strength that reflect changes in variance but may not respond to LUC-type perturbations.
Abstract
The coupled climate models used in the Fourth Assessment Report of the Intergovernmental Panel on Climate Change are evaluated. The evaluation is focused on 12 regions of Australia for the daily simulation of precipitation, minimum temperature, and maximum temperature. The evaluation is based on probability density functions, and a simple quantitative measure of how well each climate model can capture the observed probability density functions for each variable and each region is introduced. Across all three variables, the coupled climate models perform better than expected. Precipitation is simulated reasonably by most and very well by a small number of models, although the problem with excessive drizzle is apparent in most models. Averaged over Australia, 3 of the 14 climate models capture more than 80% of the observed probability density functions for precipitation. Minimum temperature is simulated well, with 10 of the 13 climate models capturing more than 80% of the observed probability density functions. Maximum temperature is also reasonably simulated, with 6 of 10 climate models capturing more than 80% of the observed probability density functions. An overall ranking of the climate models, for each of precipitation, maximum, and minimum temperatures, and averaged over these three variables, is presented. Those climate models that are skillful over Australia are identified, providing guidance on those climate models that should be used in impacts assessments where those impacts are based on precipitation or temperature. These results have no bearing on how well these models work elsewhere, but the methodology is potentially useful in assessing which of the many climate models should be used by impacts groups.
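One way to express how much of an observed probability density function a model captures is to bin both daily series on the same edges and sum the smaller of the two relative frequencies in each bin. The sketch below assumes that overlap definition; the abstract does not spell out the measure, and the function name, bin width, and synthetic data are illustrative only.

```python
import numpy as np

def pdf_overlap_score(model, obs, bin_edges):
    """Shared area (0 to 1) of two empirical PDFs binned on the same edges."""
    model_freq, _ = np.histogram(model, bins=bin_edges)
    obs_freq, _ = np.histogram(obs, bins=bin_edges)
    # Convert counts to relative frequencies so each PDF sums to one.
    model_rel = model_freq / model_freq.sum()
    obs_rel = obs_freq / obs_freq.sum()
    # The overlap is the bin-wise minimum, summed over all bins.
    return float(np.minimum(model_rel, obs_rel).sum())

# Synthetic daily maximum temperatures (deg C), purely for illustration.
rng = np.random.default_rng(0)
obs = rng.normal(25.0, 5.0, 5000)
model = rng.normal(26.0, 5.0, 5000)   # model with a 1 degree warm bias
edges = np.arange(-10.0, 60.5, 0.5)

score = pdf_overlap_score(model, obs, edges)
```

Under this definition a score of 1 means the two binned distributions are identical and 0 means they share no bins, so "capturing more than 80%" would correspond to a score above 0.8.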
The World Climate Research Programme Project for Intercomparison of Land Surface Parameterization Schemes (PILPS) is moving into its second and third phases, which will exploit observational data and consider the performance of land surface schemes when coupled to their host climate models. The first stage of phase 2 will focus on an attempt to understand the large differences found during phase 1. The first site from which observations will be drawn for phase 2 intercomparisons is Cabauw, the Netherlands (51°58′N, 4°56′E), selected specifically to try to reduce one of the causes of the divergence among the phase 1 results: the initialization of the deep soil moisture. Cabauw's deep soil is saturated throughout the year. It also offers a quality-controlled set of meteorological forcing and 160 days of flux measurements. PILPS phase 2 follows the form of the phase 1 intercomparisons: simple off-line integrations and comparisons, but in phase 2 participating schemes' results will be compared against observed fluxes. Preliminary results indicate that between-model variability persists (i) in better specified experiments and (ii) in comparison with data. Although median values are consistent with observations, there is a large range among models. Phase 3, in which PILPS schemes are intercompared as components of global atmospheric circulation models, is being conducted jointly with the Atmospheric Model Intercomparison Project (AMIP) as diagnostic subproject number 12. Preliminary results suggest that the schemes' results differ by about the same range as in the off-line experiments in phases 1 and 2. Incomplete diagnostics suggest that bucket and canopy models differ and that variability among models can be traced to the soil moisture parameterization. This paper offers a review of the PILPS project to date and an invitation to participate in PILPS' current and future activities.
Abstract
By coupling a multimode land surface scheme with a regional climate model, three scientific issues are addressed in this paper: (i) the regional model's sensitivity to the different levels of complexity presented by the land surface parameterization, (ii) relative model sensitivity to the land surface parameterization as compared with that to other model physical representations, and (iii) following offline calibration, whether different complexity in the land surface representation leads to different model performance in the coupled experiments. In this study, a version of a regional model [Division of Atmospheric Research Limited Area Model (DARLAM)] is coupled with the Chameleon Surface Model (CHASM). Three sets of experiments are analyzed in this paper, employing six different complexity modes of CHASM. Model results from these coupled experiments show that the regional model is sensitive overall to different complexities represented in the CHASM modes. Moreover, these model sensitivities are larger than the model's intrinsic sensitivity to the perturbation of its initial conditions. The sensitivity is retained in a series of model configurations employing different vertical resolutions and convection schemes. Different complexities in the land surface representation lead to 10–30 W m⁻² changes in surface evaporation and 0.5–2.5-K changes in surface temperature. In comparing different sets of coupled experiments, it is noted that, because of the complex feedbacks involved in air–land interactions, land surface parameterizations can induce quantitatively similar model sensitivity to that from changing other model aspects such as vertical resolution and convection parameterization. Although different CHASM modes can be calibrated to show similar offline results, when coupled with DARLAM these similarities between different complexity modes are significantly reduced.
The sensitivity revealed in the coupled model simulations underlines the importance of understanding the feedbacks between model land surface parameterization and other physical components. More important, these results show that complexity in land surface representation cannot be substituted by tuning of parameters such as the surface or stomatal resistance, because offline agreement is not maintained in coupled simulations.
Abstract
The relative importance of atmospheric advection and local land–atmosphere coupling to Australian precipitation is uncertain. Identifying the evaporative source regions and level of precipitation recycling can help quantify the importance of local and remote marine and terrestrial moisture to precipitation within the different hydroclimates across Australia. Using a three-dimensional Lagrangian back-trajectory approach, moisture from precipitation events across Australia during 1979–2013 was tracked to determine the source of moisture (the evaporative origin) and level of precipitation recycling. We show that source regions vary markedly for precipitation falling in different regions. Advected marine moisture was relatively more important than terrestrial contributions for precipitation in all regions and seasons. For Australia as a whole, contributions from precipitation recycling varied from ~11% in winter up to ~21% in summer. The strongest land–atmosphere coupling was in the northwest and southeast where recycled local land evapotranspiration accounted for an average of 9% of warm-season precipitation. Marine contributions to precipitation in the northwest of Australia increased in spring and, coupled with positive evaporation trends in the key source regions, suggest that the observed precipitation increase is the result of intensified evaporation in the Maritime Continent and Indian and Pacific Oceans. Less clear were the processes behind an observed shift in moisture contribution from winter to summer in southeastern Australia. Establishing the climatological source regions and the magnitude of moisture recycling enables future investigation of anomalous precipitation during extreme periods and provides further insight into the processes driving Australia’s variable precipitation.
Abstract
Global climate models play an important role in quantifying past and projecting future changes in drought. Previous studies have pointed to shortcomings in these models for simulating droughts, but systematic evaluation of their level of agreement has been limited. Here, historical simulations (1950–2004) for 20 models from the latest Coupled Model Intercomparison Project (CMIP5) were analyzed for a variety of drought metrics and thresholds using a standardized drought index. Model agreement was investigated for different types of drought (precipitation, runoff, and soil moisture) and how this varied with drought severity and duration. At the global scale, climate models were shown to agree well on most precipitation drought metrics, but systematically underestimated precipitation drought intensity compared to observations. Conversely, simulated runoff and soil moisture droughts varied significantly across models, particularly for intensity. Differences in precipitation simulations were found to explain model differences in runoff and soil moisture drought metrics over some regions, but predominantly with respect to drought intensity. This suggests it is insufficient to evaluate models for precipitation droughts to increase confidence in model performance for other types of drought. This study shows large but metric-dependent discrepancies in CMIP5 for modeling different types of droughts that relate strongly to the component models (i.e., atmospheric or land surface scheme) used in the coupled modeling systems. Our results point to a need to consider multiple models in drought impact studies to account for high model uncertainties.
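To make the notions of drought threshold, duration, and intensity concrete, the sketch below standardizes a series and extracts runs that fall below a threshold. It is a simplified illustration under stated assumptions: a plain z-score stands in for the standardized drought index (the abstract does not say which index or distribution fit was used), and the function names, the threshold of -1, and the intensity definition (mean shortfall below the threshold) are hypothetical choices.

```python
import numpy as np

def standardized_index(series):
    """Z-score standardization; a stand-in for a fitted-distribution index."""
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / series.std()

def drought_events(index, threshold=-1.0):
    """Return (start, duration, mean_intensity) for each run below threshold."""
    index = np.asarray(index, dtype=float)
    events = []
    start = None
    for i, value in enumerate(index):
        if value < threshold and start is None:
            start = i                      # a drought begins
        elif value >= threshold and start is not None:
            run = index[start:i]           # the drought ended at step i - 1
            events.append((start, i - start, float(np.mean(threshold - run))))
            start = None
    if start is not None:                  # series ends while still in drought
        run = index[start:]
        events.append((start, len(index) - start, float(np.mean(threshold - run))))
    return events

# Tiny hand-made index series, for illustration only.
example_index = np.array([0.0, -1.5, -2.0, 0.5, -1.2])
events = drought_events(example_index, threshold=-1.0)
```

Applying the same two functions to precipitation, runoff, and soil moisture from each model would give comparable event statistics across drought types, which is the kind of comparison the abstract describes.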
Abstract
The multicriteria methodology, which provides a means to estimate optimal ranges for land surface model parameter values via calibration, is evaluated. Following calibration, differences between schemes resulting from effective parameter values can be isolated from differences resulting from scheme structure or scheme parameterizations. The method is applied to the Project for the Intercomparison of Land Surface Parameterization Schemes (PILPS) phase-2a data from the Cabauw site in the Netherlands using the Chameleon Surface Model (CHASM) as the surrogate for a range of land surface schemes. Simulations are performed calibrating six modes of CHASM, representing a range of land surface complexity, against observed net radiation and latent and sensible heat fluxes. The six modes range from a simple bucket model to a complex mosaic-type structure with separate energy balances for each mosaic tile and explicit treatment of transpiration, canopy interception, and bare-ground evaporation. Results demonstrate that the performance of CHASM depends on the complexity of the representation of the surface energy balance. If the multicriteria method is used with two observed variables, the performance of the model improves little with incremental increases in complexity until the most complex version of the model is reached. If the multicriteria method is used with three observed variables, the most complex mode is shown to calibrate more accurately and more precisely than the simple modes. In all cases, every calibrated mode performs better than simulations using the default PILPS phase-2a parameters. The performance of the most complex mode of CHASM suggests that more complex representations of the surface energy balance generally improve the calibrated performance of land surface schemes. However, all modes, when calibrated, retain a residual error that most likely is due to parameterization errors included in the scheme. 
Most error is contained in the simulation of the latent heat flux, which suggests that, to improve CHASM further, the representation of the surface hydrological processes should be developed. Thus, the multicriteria method provides a means to assess the performance of a single model or group of land surface models and provides guidance as to the directions scheme development should take.
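Calibrating against several observed variables at once generally produces a set of trade-off solutions rather than a single best parameter set. The sketch below assumes that the "optimal ranges" correspond to the span of parameter values in the nondominated (Pareto) set; the abstract does not describe the algorithm, and the error values, parameter values, and function name are invented for illustration.

```python
import numpy as np

def pareto_mask(errors):
    """errors: (n_candidates, n_criteria), lower is better. True = nondominated."""
    errors = np.asarray(errors, dtype=float)
    n = errors.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Candidate j dominates i if it is no worse on every criterion
        # and strictly better on at least one.
        no_worse = np.all(errors <= errors[i], axis=1)
        better = np.any(errors < errors[i], axis=1)
        if np.any(no_worse & better):
            mask[i] = False
    return mask

# Hypothetical RMSEs (W m^-2) for net radiation, latent heat, sensible heat.
errors = np.array([
    [10.0, 30.0, 25.0],
    [12.0, 28.0, 25.0],
    [13.0, 31.0, 26.0],   # worse than the first row on every criterion
    [20.0, 20.0, 20.0],
])
# Hypothetical value of one calibrated parameter for each candidate.
params = np.array([0.2, 0.3, 0.9, 0.5])

mask = pareto_mask(errors)
param_range = (float(params[mask].min()), float(params[mask].max()))
```

In this toy case the third candidate is discarded, and the remaining candidates bound the parameter between 0.2 and 0.5. Narrower bounds would indicate a better-constrained calibration.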
Abstract
China is several decades into large-scale afforestation programs to help address significant ecological and environmental degradation, with further afforestation planned for the future. However, the biophysical impact of afforestation on local surface temperature remains poorly understood, particularly in midlatitude regions where the importance of the radiative effect driven by albedo and the nonradiative effect driven by energy partitioning is uncertain. To examine this issue, we investigated the local impact of afforestation by comparing adjacent forest and open land pixels using satellite observations between 2001 and 2012. We attributed local surface temperature change between adjacent forest and open land to radiative and nonradiative effects over China based on the Intrinsic Biophysical Mechanism (IBM) method. Our results reveal that forest causes warming of 0.23°C (±0.21°C) through the radiative effect and cooling of −0.74°C (±0.50°C) through the nonradiative effect on local surface temperature compared with open land. The nonradiative effect explains about 79% (±16%) of local surface temperature change between adjacent forest and open land. The contribution of the nonradiative effect varies with forest and open land types. The largest cooling is achieved by replacing grasslands or rain-fed croplands with evergreen tree types. Conversely, converting irrigated croplands to deciduous broadleaf forest leads to warming. This provides new guidance on afforestation strategies, including how these should be informed by local conditions to avoid amplifying climate-related warming.
Abstract
The Protocol for the Analysis of Land Surface Models (PALS) Land Surface Model Benchmarking Evaluation Project (PLUMBER) was designed to be a land surface model (LSM) benchmarking intercomparison. Unlike the traditional methods of LSM evaluation or comparison, benchmarking uses a fundamentally different approach in that it sets expectations of performance in a range of metrics a priori—before model simulations are performed. This can lead to very different conclusions about LSM performance. For this study, both simple physically based models and empirical relationships were used as the benchmarks. Simulations were performed with 13 LSMs using atmospheric forcing for 20 sites, and then model performance relative to these benchmarks was examined. Results show that even for commonly used statistical metrics, the LSMs’ performance varies considerably when compared to the different benchmarks. All models outperform the simple physically based benchmarks, but for sensible heat flux the LSMs are themselves outperformed by an out-of-sample linear regression against downward shortwave radiation. While moisture information is clearly central to latent heat flux prediction, the LSMs are still outperformed by a three-variable nonlinear regression that uses instantaneous atmospheric humidity and temperature in addition to downward shortwave radiation. These results highlight the limitations of the prevailing paradigm of LSM evaluation that simply compares an LSM to observations and to other LSMs without a mechanism to objectively quantify the expectations of performance. The authors conclude that their results challenge the conceptual view of energy partitioning at the land surface.
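The idea of an out-of-sample empirical benchmark can be sketched as a leave-one-site-out regression: fit sensible heat flux against downward shortwave radiation using every site except the one being evaluated, then predict at the held-out site. The code below is a minimal illustration on synthetic data; the site names, dictionary keys, and coefficients are assumptions and do not reproduce the PLUMBER benchmarks.

```python
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error between two equal-length series."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

def out_of_sample_benchmark(sites, held_out):
    """Train on all sites except held_out, then predict flux there from SWdown."""
    x_train = np.concatenate(
        [s["swdown"] for name, s in sites.items() if name != held_out]
    )
    y_train = np.concatenate(
        [s["qh"] for name, s in sites.items() if name != held_out]
    )
    # Degree-1 polynomial fit returns (slope, intercept).
    slope, intercept = np.polyfit(x_train, y_train, 1)
    return slope * sites[held_out]["swdown"] + intercept

# Synthetic sites: flux is a noisy linear function of shortwave radiation.
rng = np.random.default_rng(1)
sites = {}
for name in ("site_a", "site_b", "site_c"):
    swdown = rng.uniform(0.0, 900.0, 500)
    qh = 0.3 * swdown - 20.0 + rng.normal(0.0, 15.0, 500)
    sites[name] = {"swdown": swdown, "qh": qh}

benchmark = out_of_sample_benchmark(sites, "site_c")
benchmark_rmse = rmse(benchmark, sites["site_c"]["qh"])
```

Because the benchmark never sees the held-out site, its error is fixed before any land surface model is run. A model would then be judged by whether its own RMSE at that site falls below `benchmark_rmse`.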