Physics-Constrained Deep Learning Postprocessing of Temperature and Humidity

Francesco Zanetta,a,b Daniele Nerini,b Tom Beucler,c and Mark A. Linigerb

a Institute for Atmospheric and Climate Science, ETH Zürich, Zürich, Switzerland
b Federal Office of Meteorology and Climatology, MeteoSwiss, Locarno, Switzerland
c Institute of Earth Surface Dynamics, University of Lausanne, Lausanne, Switzerland

Abstract

Weather forecasting centers currently rely on statistical postprocessing methods to minimize forecast error. This improves skill but can lead to predictions that violate physical principles or disregard dependencies between variables, which can be problematic for downstream applications and for the trustworthiness of postprocessing models, especially when they are based on new machine learning approaches. Building on recent advances in physics-informed machine learning, we propose to achieve physical consistency in deep learning–based postprocessing models by integrating meteorological expertise in the form of analytic equations. Applied to the postprocessing of surface weather in Switzerland, we find that constraining a neural network to enforce thermodynamic state equations yields physically consistent predictions of temperature and humidity without compromising performance. Our approach is especially advantageous when data are scarce, and our findings suggest that incorporating domain expertise into postprocessing models allows the optimization of weather forecast information while satisfying application-specific requirements.

Significance Statement

Postprocessing is a widely used approach to reduce forecast error using statistics, but it may lead to physical inconsistencies. This outcome can be problematic for trustworthiness and downstream applications. We present the first machine learning–based postprocessing method intentionally designed to strictly enforce physical laws. Our framework improves physical consistency without sacrificing performance and suggests that human expertise can be incorporated into postprocessing models via analytic equations.

© 2023 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Francesco Zanetta, zanettaf@ethz.ch


1. Introduction

Weather forecasting centers heavily rely on statistical methods to correct and refine numerical weather prediction (NWP) outputs, which improves skill at low computational cost (Hemri et al. 2014; Vannitsem et al. 2021). While the fundamental approach has remained the same for decades—statistically relating past NWP model outputs and additional data, such as topographic descriptors or seasonality, to observations—the traditional divide between physical and statistical modeling is narrowing as increasingly sophisticated models emerge to harness the growing volume of available data (Vannitsem et al. 2021).

Current research focuses particularly on machine learning (ML) techniques, with deep learning (DL) and artificial neural networks (ANNs) emerging as a modern class of postprocessing methods with the potential to outperform traditional approaches in several aspects. For example, Rasp and Lerch (2018) found that simple feedforward ANNs could significantly outperform traditional regression-based postprocessing techniques while being less computationally demanding at inference time. The authors highlighted that ANNs could better incorporate nonlinear relationships in a data-driven fashion and were more suited to handle the increasing volumes of model and observation data thanks to their flexibility. ANNs have also been combined with other statistical techniques such as Bernstein polynomials (Bremnes 2020) for nonparametric probabilistic predictions.

Furthermore, more sophisticated ANNs such as convolutional neural networks (CNNs) have the ability to incorporate spatial and temporal data with unprecedented flexibility. Grönquist et al. (2021) used CNNs to improve forecasts of global weather. Höhlein et al. (2020) and Veldkamp et al. (2021) used CNNs for spatial downscaling of surface wind fields. A process-specific application was proposed by Chapman et al. (2019, 2022), with the goal of improving the prediction of atmospheric rivers, which are filaments of intense horizontal water vapor transport (Ralph et al. 2018). Dai and Hemri (2021) implemented a generative adversarial network based on CNNs to produce physically realistic postprocessed forecasts of cloud cover. Thus, first attempts at using DL-based approaches have shown promising improvements over traditional approaches, as they better capture nonlinear dependencies and often require less feature engineering. Still, a number of challenges remain in applying ML approaches to the postprocessing world (Vannitsem et al. 2021; Haupt et al. 2021), and they cannot be considered a panacea for all problems. This stresses the need to include more domain expertise into data-driven approaches in a hybrid manner, which is facilitated by the availability of custom losses and architectures in standard machine learning libraries (Ebert-Uphoff et al. 2021).

When traditional postprocessing methods are applied, the goal is to minimize the forecast error. This often leads to predictions that do not exhibit the typical spatial and temporal correlation structure that emerges from common patterns of atmospheric phenomena, or predictions that violate physical principles and dependencies between variables. However, for various applications, such as animated maps of meteorological parameters commonly disseminated to the public, or in the context of hydrological forecasting (Cloke and Pappenberger 2009) and renewable energy (Pinson and Messner 2018), it is important to provide forecast scenarios that not only have a smaller error but also exhibit realistic spatiotemporal structures (e.g., Schefzik 2017, for related work). Furthermore, consistency across variables should be ensured in various applications; for hydrological modeling, for example, temperature, radiation, and precipitation should be consistent at all times. The issue of consistency is particularly relevant in the context of probabilistic postprocessing, where sampling from marginal predictive distributions is an additional step that further breaks the spatiotemporal and intervariable consistency.

Existing approaches try to model dependencies from a statistical perspective rather than a physical one. We believe the two are complementary, closely related yet different, as noted by Möhrlen et al. (2023). Furthermore, existing approaches were developed in the context of probabilistic forecasting, and they rely on the existence of a finite ensemble. Conversely, the methodology proposed here works in the deterministic setting, while the extension to probabilistic forecasts is left for future work.

In the postprocessing field, limited research is available on the issue of physical consistency, in the sense of respecting physical principles or variable dependencies based on analytic relationships. However, this question has recently gained a lot of attention in the wider ML community, and some applications in weather modeling are reviewed in Kashinath et al. (2021) and Willard et al. (2022). In general, it has been shown that physical consistency can be pursued by applying constraints to DL models in order to prescribe specific physical processes. These constraints can take many forms. The most widely used approach is to incorporate physics via soft constraints, by defining physics-based losses in addition to common performance metrics such as mean absolute error (Daw et al. 2021). Another popular approach is to design custom model architectures such that the physical constraints are strictly enforced (e.g., Beucler et al. 2021; Ling et al. 2016).

In this paper we explore ways to incorporate domain knowledge in DL-based postprocessing models of temperature and humidity, and the related state variables. Specifically, we evaluate the effect of imposing constraints based on the ideal gas law and an empirical approximation of a thermodynamic state equation, and we identify benefits and disadvantages of different approaches. The goal of this paper is not to develop a highly optimized model for operational use, but rather to provide some technical guidelines and insights about incorporating meteorological expertise, in the form of analytic equations, in postprocessing models of NWP. For this reason, we simplified the problem in several respects, as explained in the next section. Most important, we focus here on a deterministic setting, although we hope to extend this framework to probabilistic predictions in the future.

2. Data and methods

a. Datasets

In this study, we make predictions at 131 weather station locations covering Switzerland, as shown in Fig. 1. We consider forecast data from COSMO-E (Klasa et al. 2018) for the postprocessing task, spanning January 2017–December 2022. COSMO-E is a limited-area weather forecasting model used for the operational weather forecasts for Switzerland by MeteoSwiss. It is operated as a 21-member ensemble with a 2.2-km horizontal resolution. It runs two cycles per day (0000 and 1200 UTC) with a time horizon of 5 days. We assign to each station the values of the nearest model grid cell, found with a k-d tree search. We retain model runs initialized at 0000 UTC and consider lead times between +3 and +24 h with hourly steps. We use observational data from the SwissMetNet network (MeteoSwiss 2022) to define our “ground truth.” The predictors and predictands used in this study are summarized in Table 1; they represent instantaneous measurements with hourly granularity. We note that while using both dewpoint temperature and dewpoint deficit is redundant, this is necessary to guarantee a fair comparison between the different model architectures in section 3. In contrast to prior postprocessing studies, we train a single model for all lead times and use the lead time as a predictor, thereby increasing the variance in the training set. This choice was motivated by the ease of implementation, allowing us to focus on the main aspect of the proposed methodology, namely, the physical constraints.
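For illustration, the nearest-grid-cell lookup can be sketched with a k-d tree as below. This is a minimal example, assuming 2D latitude/longitude arrays for the model grid and treating Euclidean distance in degrees as an acceptable approximation over a small domain; all names are ours, not taken from the study's code.

```python
# Minimal sketch of nearest-grid-cell selection with a k-d tree (SciPy).
import numpy as np
from scipy.spatial import cKDTree

def nearest_grid_indices(grid_lat, grid_lon, station_lat, station_lon):
    """Return flat indices of the nearest model grid cell for each station."""
    grid_points = np.column_stack([grid_lat.ravel(), grid_lon.ravel()])
    tree = cKDTree(grid_points)                 # build the tree once
    _, idx = tree.query(np.column_stack([station_lat, station_lon]))
    return idx                                  # e.g., field.ravel()[idx]
```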

Fig. 1. The location of the 131 weather stations across Switzerland considered in our study. Stations are colored on the basis of their elevation above sea level. We show the topography of Switzerland and its surroundings in the background. Map tiles are by Stamen Design, under CC BY 3.0. Data are by OpenStreetMap, under an Open Database License (ODbL).

Table 1. List of predictors and predictands considered in this study.

b. Cross validation and random seed

To account for the variability due to the data used during optimization, each model configuration was trained on multiple cross-validation splits. When dealing with time series, it is important to design the cross-validation strategy so that the sets are independent. At the same time, in our setup it is desirable that samples of the different sets are roughly equally distributed throughout the year. We therefore opted for a simple fourfold cross validation with a holdout set for testing. Specifically, of our five years of data we used 20% for testing, 60% for training, and 20% for validation, and we removed five days of data between sets to ensure independence. A total of four years was considered for training and validation, meaning that each of the four cross-validation folds considered a different year of the dataset. The dataset partitioning is shown in Fig. 1 of the online supplemental material. For each cross-validation split, we applied a standard normalization to the inputs of each set based on the training set’s mean and standard deviation. Moreover, to account for the stochasticity of neural network (NN) optimization (due to weight initialization and gradient descent), each model configuration was trained with three different random seeds. In total, 12 trials were conducted for any given approach, and all trained models were evaluated on the same holdout test dataset.
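As an illustration of the fold construction, the sketch below builds training and validation masks separated by a five-day buffer around a held-out block. It assumes a pandas DatetimeIndex of sample times and a contiguous validation year; names are hypothetical and this is not the study's code.

```python
# Illustrative fold construction with a 5-day gap between sets.
import pandas as pd

def split_with_gap(times: pd.DatetimeIndex, val_year: int, gap_days: int = 5):
    """Boolean masks for training and validation, separated by a buffer."""
    val_mask = times.year == val_year
    gap = pd.Timedelta(days=gap_days)
    val_start, val_end = times[val_mask].min(), times[val_mask].max()
    # drop training samples falling within `gap_days` of the validation block
    train_mask = ~val_mask & ((times < val_start - gap) | (times > val_end + gap))
    return train_mask, val_mask
```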

c. Multitask neural networks for postprocessing

To keep our DL framework general, the basic building block used for all models in this study is a fully connected neural network (see Fig. 2) that takes as inputs a vector containing the predictors in Table 1 and a vector with station identifiers (IDs). Station IDs are mapped into a real-valued vector $z \in \mathbb{R}^6$ via an embedding layer and concatenated with the predictors. This approach, here referred to as the “unconstrained setting,” is the same as that proposed by Rasp and Lerch (2018) and may be regarded as state-of-the-art for local postprocessing of surface variables (Schulz and Lerch 2022). It is convenient because it allows a single model to be trained for local postprocessing at many stations, instead of one model per station, which eases operational implementation. We note that our $\mathbb{R}^6$ embedding has more dimensions than the $\mathbb{R}^2$ embedding of Rasp and Lerch (2018): this is likely due to the fact that we target five variables simultaneously and to the complex topography of the Alps (as compared with German stations).

The forecasts are deterministic, and we train a model to predict multiple target variables simultaneously. As such, we are dealing with a case of multitask learning (see Crawshaw 2020, for a survey on the subject), where the individual errors of each task contribute to the objective loss function. Kendall et al. (2017) observed that the relative weighting of each task’s loss has a strong influence on the overall performance of such models. This is fairly intuitive in our application because the tasks have different scales (due to different units) and uncertainties. The authors proposed a weighting scheme based on the homoscedastic uncertainty of each task, where the weights are learned during optimization. To the best of our knowledge, this approach to multitask learning is new in the postprocessing field; if jointly postprocessing multiple meteorological parameters becomes more common, it will be important to design optimal weighting schemes. We use the mean-square error (MSE) for the loss function $\mathcal{L}_k$ of each task $k$. For a predicted value $\hat{y}_{k,i}$ in physical units (we will use that notation for predicted values throughout the rest of the paper) and an observed value $y_{k,i}$, the task loss $\mathcal{L}_k$ is hence defined as
$$\mathcal{L}_k \stackrel{\text{def}}{=} \frac{\sum_{i=1}^{N_{\text{samples}}} (y_{k,i} - \hat{y}_{k,i})^2}{N_{\text{samples}}}, \tag{1}$$

where $N_{\text{samples}}$ is the number of samples. For $p$ tasks, we define the combined loss $\mathcal{L}$ as

$$\mathcal{L} \stackrel{\text{def}}{=} \sum_{k=1}^{p} \left[\frac{1}{2}\frac{\mathcal{L}_k}{\sigma_k^2} + \log \sigma_k\right]. \tag{2}$$

Each task’s loss $\mathcal{L}_k$ is scaled by the homoscedastic uncertainty represented by $\sigma_k^2$, and a regularizing term $\log \sigma_k$ is added to prevent degenerating toward a zero-weighted loss. In practice, for improved numerical stability, we learn the log variance $\eta_k \stackrel{\text{def}}{=} \log \sigma_k^2$ for each task $k$, and Eq. (2) becomes

$$\mathcal{L} = \sum_{k=1}^{p} \frac{\mathcal{L}_k \exp(-\eta_k) + \eta_k}{2}, \tag{3}$$
where the division by 2 can be ignored for optimization purposes because it does not influence the minimization objective. To avoid having to learn large biases, we initialize the bias vector in the model’s output layer using the training set-averaged output vector, which facilitates optimization. The optimization is insensitive to the initial values of this bias vector as long as it has the same order of magnitude as the mean output.
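A minimal PyTorch sketch of the uncertainty-weighted multitask objective of Eq. (3) is given below; the class name and the (batch × tasks) tensor layout are our assumptions, not the study's code.

```python
import torch
import torch.nn as nn

class MultitaskMSELoss(nn.Module):
    """Uncertainty-weighted multitask MSE, Eq. (3), with learnable eta_k."""
    def __init__(self, n_tasks: int):
        super().__init__()
        # eta_k = log(sigma_k^2), learned jointly with the network weights
        self.eta = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
        task_mse = ((y_true - y_pred) ** 2).mean(dim=0)   # L_k, one per task
        # sum_k [ L_k * exp(-eta_k) + eta_k ]; the factor 1/2 is dropped
        return (task_mse * torch.exp(-self.eta) + self.eta).sum()
```

Initializing all $\eta_k$ to zero corresponds to equal unit weights, which the optimizer then adapts during training.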
Fig. 2. Summary of the models used in this study. (a) The basic building block of all models, a fully connected network preceded by an embedding layer. (b) The unconstrained setting used as a baseline, where all target variables are predicted directly. (c) The architecture-constrained setting, including a physical constraints layer that takes a subset of the target variables as inputs and returns the complete prediction. (d) The loss-constrained neural network, in which physical consistency is enforced by adding a physics-based penalty $\mathcal{P}$ to the conventional loss $\mathcal{L}$. (e) An offline-constrained neural network, where constraints are only applied after training using the constraints layer.

d. Enforcing analytic constraints in neural networks

The methodology used here follows Beucler et al. (2021) and was first applied to neural networks emulating subgrid-scale parameterizations for climate modeling. Conservation laws are enforced during optimization via constraints in the architecture or the loss function. In this study, we aim to enforce dependencies between variables using the ideal gas law and an approximate integral of the Clausius–Clapeyron equation used operationally at MeteoSwiss. Specifically, we postprocess air temperature T (°C), dewpoint temperature Td (°C), surface air pressure P (hPa), relative humidity (RH; %), and water vapor mixing ratio r (g kg−1). We then aim to enforce the following constraints:
$$\mathrm{RH} = f(T, T_d) = 100 \exp\!\left(\frac{a T_d}{b + T_d} - \frac{a T}{b + T}\right) \quad\text{and}\quad r = g(P, T_d) = 622.0\,\frac{c \exp\!\left(\frac{a T_d}{b + T_d}\right)}{P - c \exp\!\left(\frac{a T_d}{b + T_d}\right)}, \tag{4}$$
where a, b, and c are empirical coefficients, as explained below. The system of interest includes five variables and two constraints, which leaves us with three degrees of freedom. The constraint functions f and g are derived from the following equations:
$$e = c \exp\!\left(\frac{a T_d}{b + T_d}\right) \quad\text{and}\quad e_s = c \exp\!\left(\frac{a T}{b + T}\right), \tag{5}$$

$$\mathrm{RH} = \frac{e}{e_s} \times 100, \tag{6}$$

and

$$r = 1000 \times \frac{0.622\, e}{P - e}. \tag{7}$$
We note from Eq. (5) that the parameters of interest are linked by two additional physical quantities: the water vapor pressure $e$ (hPa) and the saturation water vapor pressure $e_s$ (hPa). Equation (5) is structurally identical to the August–Roche–Magnus equation, an approximate integral of the Clausius–Clapeyron relation accurate for standard weather conditions. We use a = 17.368, b = 238.83, and c = 6.107 hPa for T ≥ 0°C, and a = 17.856, b = 245.52, and c = 6.108 hPa otherwise. We made this choice to ensure consistency with MeteoSwiss’ internal processing of meteorological variables, but other values can be found in the literature (e.g., Lawrence 2005). Equation (7) is a formula for the water vapor mixing ratio derived from the ideal gas law for dry air and water vapor and can be found in many common textbooks (e.g., Emanuel 1994); we multiply by 1000 to express r in grams per kilogram rather than grams per gram.
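The constraint functions translate directly into differentiable code. The sketch below implements Eqs. (4)–(7) with the coefficients quoted above; function and argument names are ours, not the study's code.

```python
import torch

def _coef(t: torch.Tensor, warm: float, cold: float) -> torch.Tensor:
    # select the empirical coefficient according to the sign of t (degC)
    return torch.where(t >= 0, torch.full_like(t, warm), torch.full_like(t, cold))

def vapor_pressure(t: torch.Tensor) -> torch.Tensor:
    """Water vapor pressure e (hPa) at temperature t (degC), Eq. (5)."""
    a = _coef(t, 17.368, 17.856)
    b = _coef(t, 238.83, 245.52)
    c = _coef(t, 6.107, 6.108)
    return c * torch.exp(a * t / (b + t))

def f_rh(t: torch.Tensor, t_d: torch.Tensor) -> torch.Tensor:
    """RH (%) from temperature and dewpoint, Eqs. (4)-(6)."""
    return 100.0 * vapor_pressure(t_d) / vapor_pressure(t)

def g_mixing_ratio(p: torch.Tensor, t_d: torch.Tensor) -> torch.Tensor:
    """Mixing ratio r (g/kg) from pressure (hPa) and dewpoint, Eqs. (4) and (7)."""
    e = vapor_pressure(t_d)
    return 1000.0 * 0.622 * e / (p - e)
```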

We proceed to implement and compare two approaches to enforcing physical constraints in our networks (a code sketch of both settings follows this list):

  1. In the architecture-constrained setting, the constraints are enforced by a final layer that applies Eqs. (4) to derive RH and r from T, $T_\text{def}$, and P; $T_\text{def}$ is the dewpoint deficit, to which we apply a rectified linear unit (ReLU) activation function to ensure positivity before computing the dewpoint temperature as $T_d \stackrel{\text{def}}{=} T - T_\text{def}$. This additional step allows us to enforce that $T \geq T_d$ and RH ∈ [0, 100], two desirable properties in this case. We show the constrained architecture in Fig. 2c. The trainable part of the model has the same number of layers and units as the unconstrained architecture but directly predicts only a subset of variables. In our case, which has five outputs and two constraints, we directly predict three variables and derive the last two via a custom-defined layer encoding Eqs. (4). An important point is that the choice of which variables are predicted directly and which are derived analytically is arbitrary: given n = 5 variables and q = 2 constraints, the total number of possibilities in our case is n!/[q!(n − q)!] = 10. Nevertheless, there are differences in the actual implementation that favor some configurations; for example, one may want the analytic constraints arranged such that numerical stability is not a concern, avoiding division by zero or the asymptotes of logarithmic functions.

  2. In the loss-constrained setting, the constraints are enforced through an additional physics-based loss term: a penalty $\mathcal{P}$ based on residuals from our set of analytic equations. As in the architecture-constrained approach, the choice of variables used to compute the residuals is arbitrary. Based on Eq. (4), we define the following constraints:

$$\begin{cases} \mathcal{P}_{\mathrm{RH}} \stackrel{\text{def}}{=} \widehat{\mathrm{RH}} - f(\hat{T}, \hat{T}_d) = 0 \\ \mathcal{P}_{r} \stackrel{\text{def}}{=} \hat{r} - g(\hat{P}, \hat{T}_d) = 0, \end{cases} \tag{8}$$

    where physical violations result in nonzero residuals. Using the L2 norm for consistency with our MSE loss, we formulate the penalty term $\mathcal{P}$ used in the loss function as

$$\mathcal{P} \stackrel{\text{def}}{=} \frac{(\mathcal{P}_{\mathrm{RH}})^2}{\sigma_{\mathrm{RH}}^2} + \frac{(\mathcal{P}_{r})^2}{\sigma_{r}^2}, \tag{9}$$

    where we square the residuals $\mathcal{P}_{\mathrm{RH}}$ and $\mathcal{P}_{r}$ to penalize larger violations more and then scale by the variances of the observed values, $\sigma_{\mathrm{RH}}^2$ and $\sigma_{r}^2$, to normalize the contribution of the two terms. Last, the physical penalty term is added to the conventional loss, and our training objective becomes minimizing the physically constrained loss function $\mathcal{L}_{\mathcal{P}}$:

$$\mathcal{L}_{\mathcal{P}} \stackrel{\text{def}}{=} (1 - \alpha)\,\mathcal{L} + \alpha\,\mathcal{P}, \tag{10}$$

    where α ∈ [0, 1] is a hyperparameter used to scale the contribution of the physical penalty term. Note that, in contrast with the hard constraints in the architecture, the soft constraints in the loss give no guarantee that $\mathcal{P} = 0$ because stochastic gradient descent does not generally lead to a zero loss.
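As referenced above, here is a sketch of both constrained settings, reusing f_rh and g_mixing_ratio from the previous snippet. The direct-output split (T, Tdef, P) and the output ordering are one configuration among the ten discussed; class and variable names are assumptions.

```python
import torch
import torch.nn as nn

class PhysicalConstraintsLayer(nn.Module):
    """Architecture constraint: derive Td, RH, and r from (T, Tdef, P)."""
    def forward(self, direct: torch.Tensor) -> torch.Tensor:
        t, t_def, p = direct.unbind(dim=-1)
        t_def = torch.relu(t_def)        # Tdef >= 0, hence T >= Td
        t_d = t - t_def                  # Td = T - Tdef
        rh = f_rh(t, t_d)                # in [0, 100] by construction
        r = g_mixing_ratio(p, t_d)
        return torch.stack([t, t_d, p, rh, r], dim=-1)

def physics_penalty(y_hat: torch.Tensor, var_rh: float, var_r: float) -> torch.Tensor:
    """Loss constraint: squared residuals of Eq. (8), scaled as in Eq. (9)."""
    t, t_d, p, rh, r = y_hat.unbind(dim=-1)
    res_rh = rh - f_rh(t, t_d)
    res_r = r - g_mixing_ratio(p, t_d)
    return (res_rh ** 2).mean() / var_rh + (res_r ** 2).mean() / var_r

# Loss-constrained objective, Eq. (10):
# loss = (1 - alpha) * multitask_loss + alpha * physics_penalty(y_hat, var_rh, var_r)
```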

Last, to assess whether enforcing constraints during optimization is advantageous, we additionally introduce an “offline-constrained” setting (see Fig. 2e) in which constraints are enforced only after training. Specifically, we train a model to minimize the MSE of T, Td (derived from T and Tdef), and P; RH and r are then calculated after training so as to exactly enforce our physical constraints.

e. Libraries, hyperparameters, and training

We use the PyTorch deep learning library (Paszke et al. 2019) to implement our models, the Ray Tune library (Liaw et al. 2018) for hyperparameter tuning, and Snakemake (Mölder et al. 2021) to manage our workflow (the code is available online: https://www.github.com/frazane/pcpp-workflow). For training we use the Adam optimizer (Kingma and Ba 2014) with the exponential decay rates set to β1 = 0.99 and β2 = 0.999 for the first and second moments, implementing an early stopping rule based on a validation loss to avoid overfitting. We use the normalized mean absolute error (NMAE), aggregated over all five outputs, as our validation loss:
$$\mathrm{NMAE} \stackrel{\text{def}}{=} \frac{1}{5} \sum_{k=1}^{5} \frac{1}{N_{\text{samples}}} \sum_{i=1}^{N_{\text{samples}}} \frac{|y_{i,k} - \hat{y}_{i,k}|}{\sigma_k}, \tag{11}$$
and halt training after five epochs without improvement in the validation loss. This metric was chosen for validation and early stopping because it proved more robust to sudden model changes during training than the training loss. We save the model state with the lowest validation loss across all training epochs. The hyperparameters used to produce the main results of this study are shown in Table 2. We chose them by running a hyperparameter tuning algorithm for the unconstrained model that considered the aggregated loss of all cross-validation splits. The best-performing hyperparameter configuration was then applied to all models. The loss-constrained model also required a hyperparameter to scale the influence of the physics-based penalty term; after testing different values, α was set to 0.995. We discuss this choice in the next section.
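For reference, the NMAE validation metric of Eq. (11) could be sketched as follows, with sigma holding the per-variable standard deviations (an assumed input):

```python
import torch

def nmae(y_pred: torch.Tensor, y_true: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Normalized MAE, Eq. (11): per-variable MAE scaled by sigma, then averaged."""
    return ((y_true - y_pred).abs().mean(dim=0) / sigma).mean()
```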
Table 2. Hyperparameters used to train the models, along with their optimal values and the search space used for tuning the unconstrained model, represented as the range or set of possible values followed by the sampling method. After selecting the five best configurations automatically, we chose the one with the lowest number of trainable parameters for parsimony.

3. Results and discussion

In this section, we present the results of our models when evaluated on unseen data. We will first compare the performance and physical consistency of different architectures (section 3a) before discussing data efficiency (section 3b) and generalization ability (section 3c).

a. Predictive performance and physical consistency

We use two metrics to evaluate the overall performance of our models: the mean absolute error (MAE) and the mean-square skill score (MSSS) calculated with respect to the raw NWP forecast, defined as
$$\mathrm{MSSS} = 1 - \frac{\mathrm{MSE}_{\mathrm{PP}}}{\mathrm{MSE}_{\mathrm{NWP}}}, \tag{12}$$

where $\mathrm{MSE}_{\mathrm{PP}}$ and $\mathrm{MSE}_{\mathrm{NWP}}$ represent the MSE of our postprocessed forecasts and of the raw NWP forecast, respectively. The MSSS was first computed for each station individually and then averaged. Values closer to 1 are better, and negative values indicate a decrease in performance. The overall results are presented in Table 3 and Fig. 3a for each variable. For both MAE and MSSS, these results show that performance is comparable across all NN architectures. There are small differences in the performance of each setting on the different tasks, which we hypothesize are related to the influence of the constraints coupled with the multitask loss weighting. Overall, we observe slightly better results for the loss- and architecture-constrained approaches.
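The station-wise skill score of Eq. (12) can be sketched as below, for arrays indexed (station, sample); names are illustrative.

```python
import numpy as np

def msss(y_pp: np.ndarray, y_nwp: np.ndarray, y_obs: np.ndarray) -> float:
    """MSSS of Eq. (12), computed per station and then averaged."""
    mse_pp = ((y_pp - y_obs) ** 2).mean(axis=1)    # postprocessed MSE per station
    mse_nwp = ((y_nwp - y_obs) ** 2).mean(axis=1)  # raw NWP MSE per station
    return float((1.0 - mse_pp / mse_nwp).mean())
```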
Table 3. Two performance metrics, the mean absolute error (MAE; lower values are better) and the mean-square skill score (MSSS; higher values are better), for each considered NN architecture (rows) and target variable (columns), averaged over the test set. Boldface type indicates the best performance for each metric and variable.
Fig. 3. (a) MAE for each target variable and approach, where the boxplot distribution represents the nine trials using different cross-validation splits and random seeds. Note that the ranges of these distributions are relatively small in comparison with the absolute values of the error metric. (b) Scatterplot representing the distribution of physical violations $\mathcal{P}_{\mathrm{RH}}$ in RH units, as a function of RH, using all samples of all trials, where points are color coded by density. Physical violations are deviations from the zero line.

Table 4 displays Diebold–Mariano predictive performance test results for the MAE, applied individually to each station and lead time following the implementation of Schulz and Lerch (2022). The reported values show the percentage of tests in which an approach significantly outperformed another. Overall, these results align with Table 3, without a clear winner, although the offline-constrained approach appears generally worse, except for pressure. Our models achieved temperature MAEs of approximately 1.35°C, whereas an altitude-corrected NWP forecast, often used as a reference, gave 1.63°C (using a fixed lapse rate of 6°C km−1).
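For completeness, the altitude-corrected reference mentioned above amounts to shifting the raw NWP temperature by the elevation mismatch between the model grid cell and the station, using the fixed lapse rate; a sketch with assumed names:

```python
def lapse_rate_corrected(t_nwp: float, z_model: float, z_station: float,
                         lapse: float = 6.0e-3) -> float:
    """Shift NWP temperature (degC) from grid elevation to station elevation (m),
    using a fixed lapse rate of 6 degC per km."""
    return t_nwp + lapse * (z_model - z_station)
```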

Table 4. For each target variable, the percentage of tests for which an approach (rows) produces significantly better forecasts than another (columns), according to Diebold–Mariano statistical tests performed with the MAE. Also reported is the average percentage of “wins” or “losses” for each approach. Tests are applied to each station and lead time individually.

We note that the MSSS values are surprisingly high, especially for pressure. These high MSSS values are due to the large errors in the NWP model. For variables that are strongly tied to elevation, such as pressure and temperature, the differences in the NWP model elevation and the true station elevation result in consistently large biases. These elevation differences can be larger than 100 m. Taking the example of pressure, the mean bias at certain stations is almost 90 hPa, which is reduced to almost 0 hPa by the postprocessing models, explaining the high MSSS values.

Figure 3b depicts the physical consistency of the predictions. The vertical axis shows $\mathcal{P}_{\mathrm{RH}}$, the difference between the predicted RH and its physically consistent counterpart derived from the constraint function f(T, Td) [Eq. (4)], while the horizontal axis shows the predicted values. Deviations from zero are therefore physical violations, as are values outside [0, 100]. Relative to the unconstrained approach, we observe a noticeable decrease in violations with the loss-constrained approach, although large violations persist at the ends of the RH distribution, where values above 100% still occur. The architecture-constrained models bring physical violations to zero to within machine precision. These results are consistent with Beucler et al. (2021). As a side remark, we note that in the unconstrained approach larger violations tend to occur at the tails of the distribution, which could indicate that it is more difficult to converge to a physically consistent solution where samples are scarce.

To choose an optimal value for the α hyperparameter, we tested several values and compared both the NMAE and the physical consistency of the predictions. The results are shown in Fig. 4 for α ∈ {0.0, 0.2, 0.5, 0.8, 0.9, 0.95, 0.99, 0.995, 0.999, 0.9999, 0.99999}. We observe that the trade-off between physical consistency and performance is nonlinear: up to α = 0.995, there is little to no drawback in terms of performance, whereas for higher values of α the NMAE starts to increase. We note two things: first, α can be chosen based on how much one wishes to prioritize physical consistency over error reduction; second, the choice of the learning rate has a significant influence on α’s impact (and vice versa) and thus on this trade-off, although this was not investigated further in this study. We limit ourselves to observing that, from a practical standpoint, this interdependence is inconvenient because it makes model selection harder, and we consider it a drawback of the loss-constrained approach.

Fig. 4. Effect of the hyperparameter α on both the overall physical violation $\mathcal{P}$ (in red) and the NMAE (in blue) for the test dataset.

b. Robustness to data scarcity

One potential advantage of constraining neural networks with physical knowledge is improved robustness to data scarcity. The rationale is that, because the constraints reduce the hypothesis space of the model to a subset of physically consistent solutions, we could expect physics-constrained models to learn from fewer training samples, or to require fewer parameters. We therefore designed an experiment in which we retrained all models (with fewer parameters in order to reduce the chance of overfitting; see Table 2 in the online supplemental material) on increasingly reduced training datasets, namely, 20%, 5%, and 1% of the full dataset. The reduction is applied by station to ensure that all stations remain equally represented in the dataset. For instance, with the 1% reduction, we trained with roughly 200 samples per station. The results, shown in Fig. 5a, seem to indicate a relatively smaller decrease in performance for the architecture-constrained approach when data are scarce, although this difference is rather small. Importantly, the added value of enforcing physical consistency in data-scarce situations is emphasized by Fig. 5b: for the unconstrained model in particular, the physical inconsistencies increase as we reduce the number of training samples, whereas for the architecture-constrained approach they are always zero by construction.

Fig. 5. (a) NMAE for each reduction and approach, where the boxplot distribution represents the nine trials using different cross-validation splits and random seeds. As the size of the training dataset is reduced, constrained models perform relatively better. (b) Boxplot showing the distribution of the physical penalty term $\mathcal{P}$ as a function of training data size for the unconstrained and loss-constrained settings. The architecture- and offline-constrained approaches have zero penalty by construction.

c. Generalization ability

A common finding of physics-informed ML is that physically constraining models could help them generalize to unseen conditions (Willard et al. 2022). To test the ability of our models to generalize to unseen weather situations, we designed an experiment in which models are trained on a dataset that excludes the warm season (June–August) and then tested on the warm season only. This choice was motivated by the increasing relevance of record-shattering heat extremes in a warming climate. In such situations, the robustness of postprocessing models is put to the test, as they have to process and predict values never seen during training. We present our results in Fig. 6 and observe that physical constraints do not seem to impact the generalization capabilities of the model to unseen temperature extremes. This result, consistent with Beucler et al. (2020), suggests that the constraints of Eq. (4) are insufficient to guarantee generalization capability for our mapping.

Fig. 6. NMAE for the test dataset containing samples from June–August, conditioned on different quantiles of the univariate temperature distribution. As expected, the error increases as temperatures become more extreme, but the relative performance of the considered architectures does not change significantly.

4. Conclusions and outlook

In this study, we have adapted a physically constrained deep learning framework to postprocess weather forecasts, which is new to our knowledge. More generally, we demonstrated simple ways to integrate scientific expertise, in the form of analytic equations, into a DL-based postprocessing model. In comparison with unconstrained or loss-constrained models, architecture-constrained models enforce physical consistency to within machine precision without degrading performance on the considered variables. The architecture-constrained models were also easier to implement in our case, and we therefore recommend them over their loss-constrained counterpart. We have also shown that physical constraints yield better predictions when data are scarce, because of the increased value of physical consistency. However, we did not observe a significant advantage in terms of generalization capabilities. To interpret these results, it is useful to distinguish the data-efficiency and generalization experiments by their underlying challenge, that is, interpolation and extrapolation, respectively. Physically constraining outputs can help the model better interpolate data but cannot mitigate the well-known limitations of neural networks when it comes to out-of-distribution inputs (extrapolation).

We believe that a significant value of the proposed methodology lies in its simplicity: any kind of equation, as long as it is differentiable, can be included in DL-based postprocessing models. Importantly, this extends beyond the context of meteorology and physics-based constraints, as we could easily imagine a similar methodology used to satisfy a diverse set of constraints defined by the end users. For future research on this topic, we foresee (i) an extension to probabilistic forecasting, for example, by adopting a generative approach for the creation of physically consistent ensembles; and (ii) an extension to a global postprocessing setup, where the model generalizes in space. An open question is whether physical constraints have a stronger effect in more challenging tasks, for example, with higher-dimensional mappings or more marked nonlinearities.

Acknowledgments.

We thank the members of the APPP team at MeteoSwiss for helpful comments and feedback that significantly helped the project. Author Zanetta is supported by MeteoSwiss and ETH Zürich, authors Nerini and Liniger are supported by MeteoSwiss, and author Beucler is supported by the Canton of Vaud in Switzerland. We also thank the Swiss National Supercomputing Centre (CSCS) for computing infrastructure.

Data availability statement.

The project’s GitHub repository is accessible online (https://github.com/frazane/pcpp-workflow). The raw data used to train the models are free for research and education purposes and can be accessed via the IDA web portal (https://www.meteoswiss.admin.ch/services-and-publications/service/weather-and-climate-products/data-portal-for-teaching-and-research.html).

REFERENCES

  • Beucler, T., M. Pritchard, P. Gentine, and S. Rasp, 2020: Towards physically-consistent, data-driven models of convection. IGARSS 2020—2020 IEEE Int. Geoscience and Remote Sensing Symp., Waikoloa, HI, Institute of Electrical and Electronics Engineers, 3987–3990, https://doi.org/10.1109/IGARSS39084.2020.9324569.

  • Beucler, T., M. Pritchard, S. Rasp, J. Ott, P. Baldi, and P. Gentine, 2021: Enforcing analytic constraints in neural networks emulating physical systems. Phys. Rev. Lett., 126, 098302, https://doi.org/10.1103/PhysRevLett.126.098302.

  • Bremnes, J. B., 2020: Ensemble postprocessing using quantile function regression based on neural networks and Bernstein polynomials. Mon. Wea. Rev., 148, 403–414, https://doi.org/10.1175/MWR-D-19-0227.1.

  • Chapman, W. E., A. C. Subramanian, L. Delle Monache, S. P. Xie, and F. M. Ralph, 2019: Improving atmospheric river forecasts with machine learning. Geophys. Res. Lett., 46, 10 627–10 635, https://doi.org/10.1029/2019GL083662.

  • Chapman, W. E., L. D. Monache, S. Alessandrini, A. C. Subramanian, F. M. Ralph, S.-P. Xie, S. Lerch, and N. Hayatbini, 2022: Probabilistic predictions from deterministic atmospheric river forecasts with deep learning. Mon. Wea. Rev., 150, 215–234, https://doi.org/10.1175/MWR-D-21-0106.1.

  • Cloke, H. L., and F. Pappenberger, 2009: Ensemble flood forecasting: A review. J. Hydrol., 375, 613–626, https://doi.org/10.1016/j.jhydrol.2009.06.005.

  • Crawshaw, M., 2020: Multi-task learning with deep neural networks: A survey. arXiv, 2009.09796v1, https://doi.org/10.48550/ARXIV.2009.09796.

  • Dai, Y., and S. Hemri, 2021: Spatially coherent postprocessing of cloud cover ensemble forecasts. Mon. Wea. Rev., 149, 3923–3937, https://doi.org/10.1175/MWR-D-21-0046.1.

  • Daw, A., A. Karpatne, W. Watkins, J. Read, and V. Kumar, 2021: Physics-Guided Neural Networks (PGNN): An application in lake temperature modeling. arXiv, 1710.11431v3, http://arxiv.org/abs/1710.11431.

  • Ebert-Uphoff, I., R. Lagerquist, K. Hilburn, Y. Lee, K. Haynes, J. Stock, C. Kumler, and J. Q. Stewart, 2021: CIRA guide to custom loss functions for neural networks in environmental sciences—Version 1. arXiv, 2106.09757v1, https://doi.org/10.48550/ARXIV.2106.09757.

  • Emanuel, K. A., 1994: Atmospheric Convection. Oxford University Press, 580 pp.

  • Grönquist, P., C. Yao, T. Ben-Nun, N. Dryden, P. Dueben, S. Li, and T. Hoefler, 2021: Deep learning for post-processing ensemble weather forecasts. Philos. Trans. Roy. Soc., A379, 20200092, https://doi.org/10.1098/rsta.2020.0092.

  • Haupt, S. E., W. Chapman, S. V. Adams, C. Kirkwood, J. S. Hosking, N. H. Robinson, S. Lerch, and A. C. Subramanian, 2021: Towards implementing artificial intelligence post-processing in weather and climate: Proposed actions from the Oxford 2019 workshop. Philos. Trans. Roy. Soc., A379, 20200091, https://doi.org/10.1098/rsta.2020.0091.

  • Hemri, S., M. Scheuerer, F. Pappenberger, K. Bogner, and T. Haiden, 2014: Trends in the predictive performance of raw ensemble weather forecasts. Geophys. Res. Lett., 41, 9197–9205, https://doi.org/10.1002/2014GL062472.

  • Höhlein, K., M. Kern, T. Hewson, and R. Westermann, 2020: A comparative study of convolutional neural network models for wind field downscaling. Meteor. Appl., 27, e1961, https://doi.org/10.1002/met.1961.

  • Kashinath, K., and Coauthors, 2021: Physics-informed machine learning: Case studies for weather and climate modelling. Philos. Trans. Roy. Soc., A379, 20200093, https://doi.org/10.1098/rsta.2020.0093.

  • Kendall, A., Y. Gal, and R. Cipolla, 2017: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. arXiv, 1705.07115v3, https://doi.org/10.48550/ARXIV.1705.07115.

  • Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/ARXIV.1412.6980.

  • Klasa, C., M. Arpagaus, A. Walser, and H. Wernli, 2018: An evaluation of the convection-permitting ensemble COSMO-E for three contrasting precipitation events in Switzerland. Quart. J. Roy. Meteor. Soc., 144, 744–764, https://doi.org/10.1002/qj.3245.

  • Lawrence, M. G., 2005: The relationship between relative humidity and the dewpoint temperature in moist air: A simple conversion and applications. Bull. Amer. Meteor. Soc., 86, 225–234, https://doi.org/10.1175/BAMS-86-2-225.

  • Liaw, R., E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, and I. Stoica, 2018: Tune: A research platform for distributed model selection and training. arXiv, 1807.05118v1, https://doi.org/10.48550/ARXIV.1807.05118.

  • Ling, J., A. Kurzawski, and J. Templeton, 2016: Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech., 807, 155–166, https://doi.org/10.1017/jfm.2016.615.

  • MeteoSwiss, 2022: Automatic monitoring network. MeteoSwiss, accessed 23 August 2022, https://www.meteoswiss.admin.ch/home/measurement-and-forecasting-systems/land-based-stations/automatisches-messnetz.html.

  • Möhrlen, C., J. W. Zack, and G. Giebel, 2023: Best practice recommendations for forecast evaluation. IEA Wind Recommended Practice for the Implementation of Renewable Energy Forecasting Solutions, C. Möhrlen, J. W. Zack, and G. Giebel, Eds., Wind Energy Engineering, Academic Press, 147–184, https://doi.org/10.1016/B978-0-44-318681-3.00027-1.

  • Mölder, F., and Coauthors, 2021: Sustainable data analysis with Snakemake. F1000Research, 10:33, https://f1000research.com/articles/10-33.

  • Paszke, A., and Coauthors, 2019: PyTorch: An imperative style, high-performance deep learning library. arXiv, 1912.01703v1, https://doi.org/10.48550/ARXIV.1912.01703.

  • Pinson, P., and J. W. Messner, 2018: Application of postprocessing for renewable energy. Statistical Postprocessing of Ensemble Forecasts, S. Vannitsem, D. S. Wilks, and J. W. Messner, Eds., Elsevier, 241–266, https://doi.org/10.1016/B978-0-12-812372-0.00009-1.

  • Ralph, F. M., M. D. Dettinger, M. M. Cairns, T. J. Galarneau, and J. Eylander, 2018: Defining “atmospheric river”: How the Glossary of Meteorology helped resolve a debate. Bull. Amer. Meteor. Soc., 99, 837–839, https://doi.org/10.1175/BAMS-D-17-0157.1.

  • Rasp, S., and S. Lerch, 2018: Neural networks for postprocessing ensemble weather forecasts. Mon. Wea. Rev., 146, 3885–3900, https://doi.org/10.1175/MWR-D-18-0187.1.

  • Schefzik, R., 2017: Ensemble calibration with preserved correlations: Unifying and comparing ensemble copula coupling and member-by-member postprocessing. Quart. J. Roy. Meteor. Soc., 143, 999–1008, https://doi.org/10.1002/qj.2984.

  • Schulz, B., and S. Lerch, 2022: Machine learning methods for postprocessing ensemble forecasts of wind gusts: A systematic comparison. Mon. Wea. Rev., 150, 235–257, https://doi.org/10.1175/MWR-D-21-0150.1.

  • Vannitsem, S., and Coauthors, 2021: Statistical postprocessing for weather forecasts: Review, challenges, and avenues in a big data world. Bull. Amer. Meteor. Soc., 102, E681–E699, https://doi.org/10.1175/BAMS-D-19-0308.1.

  • Veldkamp, S., K. Whan, S. Dirksen, and M. Schmeits, 2021: Statistical postprocessing of wind speed forecasts using convolutional neural networks. Mon. Wea. Rev., 149, 1141–1152, https://doi.org/10.1175/MWR-D-20-0219.1.

  • Willard, J., X. Jia, S. Xu, M. Steinbach, and V. Kumar, 2022: Integrating scientific knowledge with machine learning for engineering and environmental systems. ACM Comput. Surv., 55, 1–37, https://doi.org/10.1145/3514228.
