A New Approach to Predict Tributary Phosphorus Loads Using Machine Learning– and Physics-Based Modeling Systems

Christina Feng Chang (a), Marina Astitha (a; https://orcid.org/0000-0002-3892-6672), Yongping Yuan (b), Chunling Tang (b), Penny Vlahos (c), Valerie Garcia (a), and Ummul Khaira (a)

(a) Department of Civil and Environmental Engineering, University of Connecticut, Storrs, Connecticut
(b) Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina
(c) Department of Marine Sciences, University of Connecticut, Groton, Connecticut

Open access

Abstract

Tributary phosphorus (P) loads are one of the main drivers of eutrophication problems in freshwater lakes. Being able to predict P loads can aid in understanding subsequent load patterns and elucidate potential degraded water quality conditions in downstream surface waters. We demonstrate the development and performance of an integrated multimedia modeling system that uses machine learning (ML) to assess and predict monthly total P (TP) and dissolved reactive P (DRP) loads. Meteorological variables from the Weather Research and Forecasting (WRF) Model, hydrologic variables from the Variable Infiltration Capacity model, and agricultural management practice variables from the Environmental Policy Integrated Climate agroecosystem model are utilized to train the ML models to predict P loads. Our study presents a new modeling methodology using as testbeds the Maumee, Sandusky, Portage, and Raisin watersheds, which discharge into Lake Erie and contribute to significant P loads to the lake. Two models were built, one for TP loads using 10 environmental variables and one for DRP loads using nine environmental variables. Both models ranked streamflow as the most important predictive variable. In comparison with observations, TP and DRP loads were predicted very well temporally and spatially. Modeling results of TP loads are within the ranges of those obtained from other studies and on some occasions more accurate. Modeling results of DRP loads exceed performance measures from other studies. We explore the ability of both ML-based models to further improve as more data become available over time. This integrated multimedia approach is recommended for studying other freshwater systems and water quality variables using available decadal data from physics-based model simulations.

© 2023 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Marina Astitha, marina.astitha@uconn.edu

1. Introduction

Eutrophication problems resulting from excessive nutrient inputs, particularly phosphorus (P) and nitrogen (N), plague freshwater and coastal ecosystems globally (Paerl and Scott 2010; Smith 2003). Synergistic effects between anthropogenic nutrient sources and climate change are expected to continue to worsen water quality conditions (Moore et al. 2008; Paerl and Scott 2010; Visser et al. 2016). One susceptible water body is Lake Erie, the shallowest and most biologically productive of the Great Lakes, which has suffered from eutrophication effects such as harmful algal blooms and hypoxia since the 1960s (Ho and Michalak 2015; Koslow et al. 2013; Michalak et al. 2013; Scavia et al. 2014). During that time, point sources (e.g., wastewater discharge) were important contributors to excessive nutrient pollution in Lake Erie. Following the Great Lakes Water Quality Agreement (GLWQA) between the United States and Canada in the 1970s, which focused on establishing reduced target P loads, point source nutrient inputs to the Great Lakes declined and substantially reduced algal bloom events (DePinto et al. 1986; International Joint Commission 1978). However, since the late 1990s, Lake Erie has experienced significant resurgences of algal blooms, with contributing factors coming from nonpoint sources (i.e., agricultural runoff) of nutrient pollution exacerbated by heavy rainfall events that have led to increased river discharge into the lake (Gatz 2019; Kane et al. 2014; Koslow et al. 2013; Michalak et al. 2013; Scavia et al. 2014; Stow et al. 2015; Watson et al. 2016).

The main driver of eutrophication conditions in Lake Erie is elevated P loads from manure application and other forms of crop fertilization. Excess nutrients from these nonpoint sources drain into the lake from the surrounding watersheds, accounting for 71% of the total P (TP) load (Maccoux et al. 2016; Obenour et al. 2014; Scavia et al. 2016b). Other sources of P include point source discharge into tributaries (5%), point source discharge directly into the lake (14%), atmospheric deposition (6%), and input from upstream Great Lakes (4%) (Maccoux et al. 2016). Several tributaries carry P into Lake Erie, but the Maumee River is the largest tributary in the area and contributes the largest TP and orthophosphate, or dissolved reactive P (DRP), loads discharging into the western Lake Erie basin (WLEB), the shallowest basin (Koslow et al. 2013; Maccoux et al. 2016; Obenour et al. 2014; Stow et al. 2015). TP can be found in organic or inorganic particulate or dissolved forms; DRP is the inorganic, soluble, and highly bioavailable fraction of TP, which makes it a ready fuel for plant and algal growth. To date, eutrophication problems continue to be managed with binational TP and DRP load reduction targets through the GLWQA Annex 4 of 2016 (International Joint Commission 2012, 2014; Interagency Working Group on the Harmful Algal Bloom and Hypoxia Research and Control Act 2017). These recurring eutrophication problems in Lake Erie have led to the development of numerous modeling efforts over the years.

The relationship between nutrient concentrations in receiving waters and upstream nutrient loadings has long been identified (Sawyer 1947; Vollenweider 1968). Studies have adopted hybrid statistical, mechanistic, or empirical models to assess excessive nutrient loadings in the environment (e.g., Johnes 1996; Reckhow and Chapra 1999). Measured tributary nutrient loads have been vital to model algal biomass and cyanobacterial blooms in the WLEB (Bertani et al. 2016; Leon et al. 2011; Obenour et al. 2014; Stumpf et al. 2012, 2016; Verhamme et al. 2016; Wynne et al. 2018), assess hypoxia in the central basin (Bocaniov and Scavia 2016; Bocaniov et al. 2016; Scavia et al. 2014), simulate Cladophora growth in the eastern basin (Valipour et al. 2016), and study other eutrophication problems through combined statistical, satellite-based, or process-based water quality models. Investigations of long-term and seasonal nutrient inputs in combination with changing meteorological patterns have provided a better understanding of the impacts of changing agricultural land use and practices (e.g., fertilizer application timing, tillage practices, increases in corn cropland) on eutrophication problems (Han et al. 2012; Michalak et al. 2013; Williams and King 2020). Higher intensity precipitation due to shifting climate and hydrologic patterns coupled with increases in agricultural nonpoint sources have been tied to higher P loads in the WLEB (Michalak et al. 2013; Stow et al. 2015; Williams and King 2020). Heavy rainfall events lead to increases in tributary loads due to fertilizer runoff from agricultural lands and increases in discharge volumes, which have consequently led to problematic water quality events in Lake Erie (Koslow et al. 2013; Michalak et al. 2013; Stow et al. 2015). Due to concerns about worsening water quality conditions in the future and the need to evaluate best agricultural management practices, it has become increasingly important to have modeling frameworks capable of predicting tributary nutrient loads that consider agricultural nonpoint sources.

Common watershed models that have been used to study nutrient transport in the Laurentian Great Lakes include the Spatially Referenced Regressions on Watershed Attributes (SPARROW) model and the Soil and Water Assessment Tool (SWAT). The SPARROW model is a hybrid empirical and process-based mass balance model developed by the U.S. Geological Survey (USGS) that has been used to study the U.S. side of the Great Lakes region. The SPARROW model can be used to estimate nutrient loads through a series of hydrologically linked catchments and simulate long-term mean annual nutrient transport from the upper Midwest to the Great Lakes by using input data from various stream, nutrient, and environmental databases (Robertson and Saad 2011; Robertson et al. 2009; Schwarz et al. 2006). The SWAT model, developed by the U.S. Department of Agriculture (USDA), is a process-based watershed model that uses inputs of climate, soil, slope, land-use, and land management information to estimate hydrology, water quality, and plant growth (Arnold et al. 1998). Researchers have used these models to inform agricultural land management scenarios, guide nutrient reduction strategies, and investigate potential climate change impacts on discharge and nutrient loads in the Lake Erie area (Yuan and Koropeckyj 2022).

Efforts to capture the entirety of the complex interconnections between air–land–stream processes continue through modeling systems. An example is the Fertilizer Emission Scenario Tool for Community Multiscale Air Quality (CMAQ) (FEST-C) interface, an integrated process-based modeling system developed by the U.S. Environmental Protection Agency (EPA) that includes atmospheric N deposition, meteorology, and fertilization interactions (Cooter et al. 2012; Ran et al. 2010). FEST-C was then integrated with the SWAT model to simulate streamflow and the fate and transport of dissolved N loadings from the Mississippi River basin to the Gulf of Mexico (Ran et al. 2019; Yuan et al. 2018). Machine learning (ML) strategies are also increasingly used to study hydrology and water quality indicators such as watershed nutrient loading (Kim et al. 2012; Shen et al. 2020), lake nutrient dynamics (Hanson et al. 2020), algal blooms (Wei et al. 2001), trophic states (Hollister et al. 2016), and water temperature (Read et al. 2019).

An ML-based framework that integrates various numerical modeling outputs encompassing weather, hydrology, and agroecosystems has also been developed to assess and predict dynamic chlorophyll a (chl a) concentrations in Lake Erie (Feng Chang et al. 2021), referred to herein as the chl a model. The chl a model uses the random forest (RF) algorithm (Breiman 2001), a well-known ML algorithm, for its ability to reduce bias and provide good predictions while handling limitations due to distribution assumptions, correlations between variables, and sensitivity to outliers. The benefits of using ML algorithms include their capability to continue to learn and improve over time as more data become available, while the dynamic physics-based model outputs used as predictors can capture the change of environmental variables across time and space, allow for agricultural management and climate change scenario evaluations, and predict future outcomes. The result of combining ML- and various physics-based model outputs is a versatile tool that can be modified to perform countless environmental studies in numerous locations, if sufficient data are given and the most influential explanatory variables are selected.

In this paper, we build upon our previous work (Feng Chang et al. 2021) by demonstrating how this ML-based framework can be applied to predict TP and DRP loads for various tributary watersheds that discharge into the WLEB, thereby shifting our focus from in-lake chl a concentrations to nutrient loads from different tributaries (change in location and response variables). TP and DRP tributary loads in the Lake Erie watershed were selected for testing this new modeling approach due to their known importance in eutrophication problems and the availability of historical measurements. Our contribution is an alternative ML-based modeling approach that complements current state-of-the-science nutrient load models and continues to support the improvement and identification of tributary P loads. The uniqueness of our methodology lies in the combined usage of available decadal simulations from three numerical prediction models that involve weather from the Weather Research and Forecasting (WRF) Model (Skamarock et al. 2019), hydrology variables from the Variable Infiltration Capacity (VIC) model (Liang et al. 1994), and agroecosystem variables from the Environmental Policy Integrated Climate (EPIC) model (Williams et al. 2006) and their integration with ML to predict monthly TP and DRP loads. This integration of ML- and physics-based models allows efficient utilization of large available datasets from federal agencies and addresses new environmental problems, which the physical models themselves are not equipped to address. Other watershed process-based models directly rely on explanatory variables from observed datasets, or a combination of observed datasets and numerical prediction outputs, to calibrate, validate, and simulate nutrient transport. Our modeling approach feeds explanatory variables (outputs) from numerical prediction models into the RF algorithm and relies on it to learn about these complicated interconnected environmental relationships to predict P loads.

The main objectives of this study are to 1) demonstrate the transferability of the integrated multimedia modeling system framework to predict monthly tributary TP and DRP loads, 2) rank the importance of environmental variables that are known to affect high TP and DRP loads, and 3) explore the models’ capability to improve over time with more data availability. TP and DRP predictions are examined temporally and spatially. The performance of our models is compared with other investigations conducted for the Maumee area. We summarize and discuss our findings from sensitivity tests to identify ways to transfer and apply multimedia modeling and ML strategies to study a range of watershed and surface water quality problems. Our approach provides insights into the interconnections between nutrient loads and weather–land–hydrology in various watersheds near the WLEB.

2. Data and methods

a. Observed data

Stream and river water quality data were downloaded from the National Center for Water Quality Research (NCWQR) Heidelberg Tributary Loading Program (HTLP) data portal (https://ncwqr-data.org/HTLP/Portal; NCWQR 2022). The HTLP is the most detailed tributary monitoring program that has supplied data for Lake Erie calculations since 1975 (LaBeau et al. 2013). Nutrient samples at monitoring stations were collected three times a day using automated refrigerated ISCO brand samplers that pulled water from a receiving basin using a submersible pump in the river. A minimum of one sample per day was analyzed during low-flow periods, but during periods of high flow, all samples were analyzed. More information about sampling and analytical methods is available in the HTLP project study plan (Roerdink 2017). Streamflow measurements were provided by the USGS gauging stations with which the HTLP monitoring stations are collocated (yellow triangles in Fig. 1).

Fig. 1. Map of study area indicating 1) locations of the NCWQR HTLP monitoring stations (yellow triangles), 2) watersheds of interest [Maumee (green), Sandusky (purple), Portage (pink), and Raisin (orange)], and 3) 12-km grid points from the WRF, VIC, and EPIC numerical prediction models (black circles). The inset panel depicts the western, central, and eastern Lake Erie basins.

Citation: Artificial Intelligence for the Earth Systems 2, 3; 10.1175/AIES-D-22-0049.1

Our study focused on four monitoring stations located at the following rivers, which are primarily impacted by agricultural activities: Maumee River at Waterville, Ohio; Sandusky River near Fremont, Ohio; Portage River at Woodville, Ohio; and Raisin River at Monroe, Michigan (Fig. 1). Monitoring stations capture the quantity of pollutant loads entering the WLEB, which also represent the pollutant exports from upstream watersheds. Monthly loads of TP and DRP [in metric tons (MT)] from January 2002 to December 2017 (16 years) for each monitoring station served as the observed data (response variables) for this work. To calculate monthly loads, daily loads were estimated by multiplying streamflow by the daily TP or DRP concentration and the sample time window (to account for multiple measurements that may be available for a day) (NCWQR 2005). Daily loads were then summed monthly if data were available for at least half of the month (15-day cutoff) for a particular month and station; otherwise, that data point was discarded because of insufficient information. A 15-day cutoff was a safe choice that allowed the model to capture at least half of the monthly P loads without sacrificing too much data availability. The 15-day cutoff has also been used in prior studies (e.g., Zeng et al. 2019). We observed that some data were missing from Raisin, while data from Portage were only available starting in late 2010. A total of 595 monthly data points of TP loads and 594 monthly data points of DRP loads were obtained from the four monitoring stations. An additional point in DRP was excluded because it did not satisfy the 15-day cutoff threshold. All TP and DRP data from the four monitoring stations were used to develop a model for TP and a model for DRP that can predict P loads at the four tributaries at the same time.
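The load calculation described above can be sketched as follows. This is a minimal illustration, not the NCWQR procedure itself: the function names, the input layout, and the units (streamflow in cubic meters per second, concentration in milligrams per liter, sample window in days) are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

SECONDS_PER_DAY = 86_400
GRAMS_PER_METRIC_TON = 1_000_000


def sample_load_mt(flow_m3_s, conc_mg_l, window_days):
    """Load carried during one sample window, in metric tons (MT).

    1 mg/L equals 1 g/m^3, so flow * concentration * seconds yields grams.
    Units are assumed for this sketch.
    """
    grams = flow_m3_s * conc_mg_l * window_days * SECONDS_PER_DAY
    return grams / GRAMS_PER_METRIC_TON


def monthly_loads(samples, min_days=15):
    """Sum sample loads per (year, month) and apply the 15-day cutoff.

    `samples` is an iterable of (date, flow_m3_s, conc_mg_l, window_days)
    tuples (hypothetical layout). Months with fewer than `min_days`
    distinct sampled days are discarded.
    """
    totals = defaultdict(float)
    days_seen = defaultdict(set)
    for day, flow, conc, window in samples:
        key = (day.year, day.month)
        totals[key] += sample_load_mt(flow, conc, window)
        days_seen[key].add(day)
    return {k: totals[k] for k in totals if len(days_seen[k]) >= min_days}


# Synthetic example: 20 sampled days in January, only 10 in February.
jan = [(date(2010, 1, d), 100.0, 0.2, 1.0) for d in range(1, 21)]
feb = [(date(2010, 2, d), 100.0, 0.2, 1.0) for d in range(1, 11)]
loads = monthly_loads(jan + feb)  # February is dropped by the cutoff
```

In this synthetic case each day contributes 1.728 MT, so January sums to about 34.56 MT, while February is discarded because only 10 days are available.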

Data for TP and DRP are right skewed, with a minority of the data being high P load values (Figs. 2a,b). The highest monthly TP load of greater than 1000 MT was observed during winter; however, high TP loads were more frequent during spring (500–900 MT) (Fig. 2c). TP loads were often much lower during summer and fall months than in spring and winter months. Similarly, high DRP loads were more frequent during spring (100–200 MT), with DRP loads observed to be the highest during winter and summer months (>200 MT) (Fig. 2d). Winter consists of December, January, and February; spring consists of March, April, and May; summer consists of June, July, and August; and fall consists of September, October, and November.

Fig. 2. Histograms showing the frequency distribution of the response variables TP and DRP in MT: (a) The TP model consists of 595 monthly summed TP observations, and (b) the DRP model consists of 594 monthly summed DRP observations. Also shown are winter (blue), spring (orange), summer (gray), and fall (yellow) frequency distributions for (c) TP observations and (d) DRP observations.

b. Model data

The three dynamic physics-based modeling systems contained over 100 explanatory variables encompassing weather, hydrology, and agroecosystem environmental factors. Literature review was the fundamental first step that was used to identify meaningful explanatory variables that were known to provide a physical understanding of tributary P loads and also helped exclude most of the unnecessary variables that were unrelated to P. Sensitivity tests were then used to further narrow down highly correlated P soil concentration variables (i.e., six different P soil concentration variables representing different soil layers) from the agroecosystem model. The sensitivity tests involved rerunning the ML-based models with different combinations of P soil concentration variables through a trial-and-error process until the best predictions were obtained. A selection of 10 variables for TP and nine variables for DRP served as the final set of explanatory variables used in the ML models to examine and predict monthly TP and DRP loads discharging into Lake Erie from 2002 to 2017 (Table 1). The WRF, VIC, and EPIC simulations used in this study were conducted by the EPA (Bash et al. 2013; Garcia et al. 2016; Tang and Dennis 2014; Feng Chang et al. 2021). We followed a modeling framework similar to that established in Feng Chang et al. (2021); however, our selection of variables, temporal scale, and pairing of P loads with explanatory values varies significantly due to the nature of the P load problem we are addressing.

Table 1. List and definitions of explanatory variables used to build the ML models. All variables are used in the TP model, but SM is not included in the DRP model. Here, WS is watershed.

The WRF, VIC, and EPIC models were run at a 12-km horizontal grid spacing for the entire contiguous United States. WRF (version 4.2.1) dynamically predicts weather variables, and its output drives VIC and EPIC, which simulate the hydrologic cycle and the agroecosystem responses, respectively. WRF generates hourly output of air temperature, precipitation, wind speed, and solar radiation, among other weather variables. The only WRF variable we used in our ML models was precipitation.

Hydrology information was provided by VIC (version 4), a macroscale hydrologic model that solves full water and energy balances (Liang et al. 1994). VIC provides output at a daily frequency and calculates soil moisture (SM) at three layers [layer 1 (0–10 cm), layer 2 (10–40 cm), and layer 3 (40–150 cm)], evapotranspiration, surface flow, base flow, water temperature, and heat fluxes. Our ML models used streamflow and soil moisture from VIC. The VIC model uses the digital elevation model to delineate the hydrologic unit code 8 (HUC-8) watershed boundaries (USGS 2020) and identify the drainage area associated with the four HTLP monitoring locations. Streamflow was generated by routing the flow to the closest grid points near the USGS gauge stations, which are associated with each HTLP station. The Maumee watershed consists of various aggregated HUC-8 watersheds (St. Mary’s, Auglaize, Tiffin, Blanchard, Upper Maumee, Lower Maumee, and St. Joseph), as seen in green in Fig. 1. Sandusky, Portage, and Raisin are individual HUC-8 watersheds.

Agroecosystem information was provided at a daily frequency by the USDA’s EPIC model (version 0509), which simulates plant demand–driven fertilizer applications (time, rate, and composition) for 42 crops under different agricultural management practices (Bash et al. 2013; Cooter et al. 2012; Ran et al. 2019; Williams et al. 2006). EPIC provides output fertilizer application and nutrient soil concentrations of carbon, N, and P species across various layers, among other information. Additionally, EPIC calculates N and P losses from the field through the movement of water, including losses due to surface runoff, subsurface flow (SSF), percolation (PRK), and sediment loss. The TP and DRP ML models used P loss variables and soil layer concentrations for mineralized and organic P species. Soil concentrations were available as layer 1 (upper 1 cm of the soil), layer 2 (1–15 cm), and total soil layer concentration [from 0 to full Baumer soil profile depth, which can be more than 1 m (up to 2.28 m in Lake Erie)]. The layer-1 soil concentration captures the more recent surface-applied fertilizers. “Injected” or plow-incorporated fertilizers are captured in layer-2 soil concentrations, which are also affected by percolation of the nutrients from above. We excluded from EPIC the following: nitrogen variables, variables relevant to crop growth, runoff (because it is highly correlated with streamflow), and fertilizer application variables (because P loads are connected to the P concentration in the soil rather than to what is sprayed from fertilizers at the surface).

The ML-based P models used the seven crops most widely grown in the U.S. Lake Erie region, which account for over 98% of the maximum crop coverage. These include alfalfa, corn grain, corn silage, other crops, hay, soybeans, and winter wheat (Feng Chang et al. 2021). EPIC P soil concentrations and loss outputs were converted from kilograms per hectare to MT by multiplying the per-hectare value at each grid point by the associated crop acreage supplied by the Biogenic Emissions Landuse Dataset (version 4.1) file (EPA 2019) and converting from kilograms to MT.
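The unit conversion above amounts to simple arithmetic, sketched here under stated assumptions: crop area is taken to be in hectares (an area reported in acres would first need converting), and the data layout and function names are hypothetical.

```python
KG_PER_METRIC_TON = 1000.0


def gridpoint_mass_mt(value_kg_per_ha, crop_area_ha):
    """Convert a per-hectare EPIC-style value to metric tons for one crop.

    Assumes the crop area is already expressed in hectares.
    """
    return value_kg_per_ha * crop_area_ha / KG_PER_METRIC_TON


def watershed_mass_mt(cells):
    """Sum the converted mass over all grid cells and crops.

    `cells` is a list of dicts mapping crop name -> (kg_per_ha, area_ha);
    this layout is an assumption for illustration only.
    """
    return sum(
        gridpoint_mass_mt(value, area)
        for cell in cells
        for value, area in cell.values()
    )


# Synthetic example with one grid cell and two crops:
# 2.0 kg/ha * 5000 ha = 10 MT and 1.5 kg/ha * 4000 ha = 6 MT.
example_cells = [{"corn grain": (2.0, 5000.0), "soybeans": (1.5, 4000.0)}]
total_mt = watershed_mass_mt(example_cells)
```

The numbers are synthetic and chosen only so that the arithmetic is easy to follow.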

WRF, VIC, and EPIC variables were previously evaluated across the entire Lake Erie watershed for the chl a model, showing good agreement between modeled and observed data (Feng Chang et al. 2021); the exception was WRF precipitation, which is a known challenge for numerical weather prediction models. Nevertheless, sensitivity tests showed that uncertainties in WRF precipitation did not significantly impact the performance of the chl a model (Feng Chang et al. 2021). Additionally, VIC soil moisture variables are difficult to evaluate, given the sparse availability of observations, but they provide information on how wet or dry the environmental conditions are. On the other hand, streamflow from VIC, which was shown by Feng Chang et al. (2021) to be accurately predicted (R² = 0.91), is an important variable that influences tributary nutrient loads more directly than precipitation does, thus mitigating the effect of the uncertainties inherent in the modeled WRF precipitation.

Nutrient soil concentrations and loss variables at the field were difficult to evaluate since they were not measured. However, EPIC fertilizer application outputs of N and P for the U.S. side of Lake Erie from 2002 to 2016 have previously been evaluated with fertilizer and livestock manure inventories and sales data and have shown a strong correlation (Feng Chang et al. 2021), providing confidence in the EPIC model predictions. Researchers have also demonstrated the reliability of the EPIC model by comparing simulated crop yield distributions with actual crop yields (Bryant et al. 1992; Easterling et al. 1997; Izaurralde et al. 2006; Ko et al. 2009; Niu et al. 2009). Additionally, researchers have extensively used the EPIC model to assess and manage crop water use and production (Bryant et al. 1992; Easterling et al. 1997; Izaurralde et al. 2006; Ko et al. 2009; Niu et al. 2009), which demonstrates the usefulness of the EPIC model outputs in real-world applications.

c. Aggregation of explanatory variables

Outputs from WRF, VIC, and EPIC were temporally aggregated to coincide with the monthly focus of our study. Increased rainfall and river discharge are the major transport mechanisms of nutrients from nonpoint sources into surface water. To capture rainfall and wet conditions throughout each month, precipitation was summed, and streamflow was averaged per month (Koslow et al. 2013; Michalak et al. 2013). The three individual soil moisture layers were averaged into a single soil moisture variable that represented the monthly average from 0 to 150 cm. Our study revealed that the ML model performed better with one soil moisture layer that described the entire depth than with three individual soil moisture layers, as it simplified the model and provided similar information. That is, the three soil moisture layers shared a similar distribution and fluctuation throughout the months, and the average soil moisture variable indicated an accurate representation of the soil moisture conditions at each layer.
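The temporal aggregation described above can be sketched as follows. The sketch assumes that hourly WRF precipitation has already been totaled per day, that the three soil moisture layers are combined with an unweighted mean (the text does not say whether layer thickness was used as a weight), and that the input layout and key names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_predictors(daily_rows):
    """Aggregate daily model output to monthly explanatory variables.

    Each row is a dict with keys 'date', 'precip', 'streamflow', and
    'sm_layers' (three layer values); this layout is assumed.
    Precipitation is summed, streamflow is averaged, and the three soil
    moisture layers are averaged into a single monthly value.
    """
    by_month = defaultdict(list)
    for row in daily_rows:
        by_month[(row["date"].year, row["date"].month)].append(row)

    result = {}
    for key, rows in by_month.items():
        result[key] = {
            "precip_total": sum(r["precip"] for r in rows),
            "streamflow_mean": mean(r["streamflow"] for r in rows),
            # Unweighted layer mean per day, then mean over the month.
            "soil_moisture_mean": mean(mean(r["sm_layers"]) for r in rows),
        }
    return result


# Synthetic two-day example for January 2010.
rows = [
    {"date": date(2010, 1, 1), "precip": 2.0, "streamflow": 10.0,
     "sm_layers": (0.2, 0.3, 0.4)},
    {"date": date(2010, 1, 2), "precip": 3.0, "streamflow": 20.0,
     "sm_layers": (0.3, 0.3, 0.3)},
]
monthly = monthly_predictors(rows)
```

A thickness-weighted mean over the 0–10, 10–40, and 40–150 cm layers would be an equally plausible reading of the text; the unweighted form is used here only for simplicity.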

Unlike in the seasonal approach used in the chl a model (Feng Chang et al. 2021), including fertilizer application, which takes place from March through July (with a few days of application at the beginning of October), did not substantially improve the prediction of P loads. Because high P loads were the most challenging to predict owing to low data availability, the various layers of P soil concentrations from EPIC were explored instead of the fertilizer applications. Layer-1 P soil concentrations were found to be more influential in predicting TP and DRP loads than individual or combined layer-2 soil concentrations, overall total soil concentrations, or combinations of the various soil concentrations. This finding is reasonable since layer-1 soils are more exposed to the environmental impacts of rainfall than are deeper soil layers. To capture the potential amounts of nutrients that are available in the field, can be lost, and can enter the tributaries, individual layer-1 P soil concentrations and P loss variables were each summed for each month. Explanatory variables were a summation of all daily gridpoint values in the watershed associated with the HTLP monitoring location as determined above, except for soil moisture, which was interpolated to the exact HTLP monitoring location. The complete list of variables used in the final ML models and their definitions can be found in Table 1.

We acknowledge that N loads also play an important role in water quality problems, as supported by many other studies (e.g., Gobler et al. 2016; Monchamp et al. 2014; Paerl and Scott 2010). However, we concentrate on P loads in this paper. The preliminary analysis showed that the modeling approach can be used to assess N loads with a different set of explanatory variables, and we will present the results in a separate paper.

d. Development of the P load ML models

The RF algorithm was selected because of its capability to identify nonlinear interactions between the target variable and explanatory variables and to better handle correlated explanatory variables without decreasing prediction accuracy when compared with other traditional linear statistical methods (Feng Chang et al. 2021). The RF algorithm is a supervised ML model that aggregates multiple decision trees to solve both classification and regression problems (Breiman 2001). Regression decision trees are used in our study to predict monthly TP and DRP loads. An individual regression decision tree continuously partitions the data starting from the parent node, which splits into two or more child nodes, which further split and become parent nodes, until a decision is made based on the weighted mean-square error (MSE) (Breiman et al. 1984). Decision trees calculate the MSE at each node and stop extending when a low MSE is achieved, which indicates the most informative split in a node. Each decision tree in the RF is trained using different data samples, since sampling is done with replacement (bootstrap aggregation) (Breiman 2001).
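To make the node-splitting criterion concrete, the sketch below searches one explanatory variable for the threshold that minimizes the weighted MSE of the two resulting child nodes. It assumes binary splits, as in CART-style regression trees, and the function names are illustrative; an RF implementation repeats this search over a random subset of variables at every node.

```python
def mse(values):
    """Mean-square error of values around their own mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(x, y):
    """Return (threshold, weighted_mse) of the best binary split on one feature.

    If no split improves on the unsplit node, the threshold is None.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, mse(list(y)))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        # Child MSEs are weighted by the share of samples in each child
        weighted = (len(left) * mse(left) + len(right) * mse(right)) / n
        if weighted < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, weighted)
    return best
```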

The randomForest package in R (Liaw and Wiener 2002) was used to build a predictive model for TP and a separate predictive model for DRP using the explanatory variables. At each split, a random subset of the explanatory variables was considered, with the subset size set by mtry, and each forest was grown to a predetermined number of decision trees (ntree); both mtry and ntree are RF hyperparameters.

e. Cross-validation approaches and performance metrics

The K-fold (K = 10) cross-validation (CV) out-of-sample technique serves as the backbone for all our RF models in this study and is used for both their training and validation. K-fold CV splits the dataset into 10 folds and iteratively fits the model 10 times, with each fold taking a turn as the validation dataset while the remaining nine folds form the training dataset (Kohavi 1995). K-fold CV maximizes the amount of data available to train the models, which helps them predict challenging thresholds.
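A minimal sketch of how the folds can be formed is given below. The shuffling, the seed, and the way leftover samples are distributed are assumptions for illustration; the caret package used in the study handles fold construction internally.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Striding by k gives k folds whose sizes differ by at most one
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val
```

Each sample appears in exactly one validation fold, so every observation is used for both training and validation across the 10 fits.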

K-fold CV was used to build an RF model for TP loads and an RF model for DRP loads, referred to as the “original TP model” and the “original DRP model,” respectively. K-fold CV was also used to tune the hyperparameters mtry and ntree to optimize the RF models’ performance through the caret package in R (Kuhn 2019). The mtry (ranging from 1 to 10 for the TP model and from 1 to 9 for the DRP model) and ntree (ranging from 500 to 2500 with increments of 500) were tuned through the grid search feature in the package. The final RF models for TP and DRP use mtry = 6 and ntree = 1500 as the optimal hyperparameters. The ranking of variables was evaluated through variable importance plots obtained from the original TP/DRP models. Variable importance plots show the impact a variable has on the overall RF model using the percent increase in MSE (%IncMSE), where a variable with a higher %IncMSE is a variable of more importance.
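The idea behind a permutation-based importance measure such as %IncMSE can be illustrated with the simplified sketch below: each explanatory variable is shuffled in turn, and the resulting increase in MSE is expressed relative to the unshuffled MSE. This is a stand-in only. The randomForest package computes its measure on out-of-bag samples for each tree, so its values will differ; `predict` here is a hypothetical callable representing any fitted model.

```python
import random


def mean_square_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def percent_increase_mse(predict, X, y, seed=0):
    """Percent increase in MSE when each column of X is permuted in turn.

    `predict` maps one row (a list of floats) to a prediction.
    The baseline MSE must be nonzero.
    """
    rng = random.Random(seed)
    base = mean_square_error(y, [predict(row) for row in X])
    increases = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between variable j and the target
        X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
        permuted = mean_square_error(y, [predict(row) for row in X_perm])
        increases.append(100.0 * (permuted - base) / base)
    return increases
```

A variable that the model does not rely on produces an increase of zero, whereas shuffling an influential variable such as streamflow would inflate the error substantially.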

The performance metrics used in this study include the coefficient of determination (R2); root-mean-square-error (RMSE); centered root-mean-square-error (CRMSE); bias; Nash–Sutcliffe efficiency (NSE) (Nash and Sutcliffe 1970), which ranges from −∞ to 1, with values ≤0 indicating unacceptable performance and 1 indicating the optimal value; and percent bias (PBIAS), which has an optimal value of 0%. NSE explains how well the model simulates the trends of P loads, and PBIAS explains how well the model simulates the average magnitudes of P loads by identifying whether there is overprediction or underprediction. We opted to evaluate the model performance through the descriptive model performance ratings established by Moriasi et al. (2007, 2015), which have been widely used to evaluate and rate hydrologic and water quality models (e.g., Gildow et al. 2016; Kalcic et al. 2016; Scavia et al. 2016a; Yuan and Koropeckyj 2022) into ranges of “unsatisfactory” to “very good.” For P, model performance is considered “satisfactory” if R2 ranges from 0.40 to 0.65, NSE ranges from 0.35 to 0.50, and PBIAS ranges from ±20% to ±30%; “good” if R2 ranges from 0.65 to 0.80, NSE ranges from 0.50 to 0.65, and PBIAS ranges from ±15% to ±20%; and “very good” if R2 > 0.80, NSE > 0.65, and PBIAS < ±15% (Moriasi et al. 2015). In situations of unbalanced performance ratings, the overall performance is described conservatively based on the lowest performance rating obtained (Moriasi et al. 2007, 2015). For example, if a model obtained a “satisfactory” R2 with a “very good” NSE and an “unsatisfactory” PBIAS, the overall model performance is classified as “unsatisfactory.”
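The rating scheme described above can be sketched as follows. The NSE and PBIAS formulas follow their standard definitions, with PBIAS computed as observed minus predicted so that negative values indicate overprediction. The treatment of values that fall exactly on a class boundary is an assumption, chosen so that an R2 of exactly 0.80 is rated “good” rather than “very good.”

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1 is optimal."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pbias(obs, sim):
    """Percent bias; negative values indicate overprediction."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


RANKS = ["unsatisfactory", "satisfactory", "good", "very good"]


def rate(r2, nse_value, pbias_value):
    """Overall rating for P loads: the lowest of the three individual ratings."""
    # Count how many lower class limits each metric strictly exceeds
    r2_rank = sum(r2 > c for c in (0.40, 0.65, 0.80))
    nse_rank = sum(nse_value > c for c in (0.35, 0.50, 0.65))
    # For PBIAS a smaller magnitude is better
    pb_rank = sum(abs(pbias_value) < c for c in (30.0, 20.0, 15.0))
    return RANKS[min(r2_rank, nse_rank, pb_rank)]
```

For instance, `rate(0.50, 0.70, -45.84)` combines a “satisfactory” R2, a “very good” NSE, and an “unsatisfactory” PBIAS, and therefore returns "unsatisfactory".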

The leave-1-year-out CV approach was used as a testing phase to assess the original TP/DRP models’ predictive power by predicting TP and DRP data that were held out from the training dataset, which mimics the real-world application of the RF models. Sixteen separate RF models were built using 10-fold CV, where each individual year from 2002 to 2017 was excluded once to serve as an independent holdout sample (testing dataset), while the remaining 15 years of data were used for training and validation. Predictions for each individual year obtained from the 16 separate RF models for each P species were consolidated and are referred to as the “TP testing model” and the “DRP testing model.” In addition to the performance metrics mentioned above, contingency statistics [percent of correct detections (PC) and probability of detection (POD)] were calculated for the TP/DRP testing models. Since a majority of the TP loads are below 100 MT (Fig. 2a), contingency statistics were calculated for TP loads > 100 MT (POD1), which constitute 20% of the samples, and for TP loads ≤ 100 MT (POD2). Similarly, contingency statistics were calculated for DRP loads > 50 MT (POD1), which constitute only 10% of the samples (Fig. 2b), and for DRP loads ≤ 50 MT (POD2). More information on contingency tables and calculations of PC and POD is provided in Feng Chang et al. (2021).
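The leave-1-year-out partitioning can be sketched as below. The function name and the flat list of sample years are illustrative assumptions; in the study, each training partition was additionally fitted with 10-fold CV.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) for each distinct year.

    `years` holds the calendar year of every monthly sample.
    """
    for held_out in sorted(set(years)):
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        yield held_out, train_idx, test_idx
```

With samples from 2002 to 2017, this yields 16 partitions, and concatenating the 16 sets of held-out predictions gives one out-of-sample prediction per observation.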

Predictions obtained in the TP and DRP testing models were evaluated temporally (seasonally and monthly) and spatially (by watershed) to identify any model inconsistencies, weaknesses, and successes. The performance statistics of the TP and DRP testing models were compared with the performance statistics (validation statistics, not calibration statistics) from SWAT investigations in Yuan and Koropeckyj (2022), who performed a comprehensive synthesis of the findings of 28 SWAT modeling studies within the WLEB and recorded the R2, NSE, and PBIAS values for each study. The SWAT studies reviewed by Yuan and Koropeckyj (2022) varied significantly in the collection of watershed areas investigated. To compare relatable hydrologic simulations from SWAT, comparison was limited to the Maumee watershed statistics listed in Yuan and Koropeckyj (2022). Of the 28 SWAT studies, only seven studies investigated TP loads and four studies investigated DRP loads, which were used for comparison with the Maumee statistics obtained in our testing models. The comparison of performance measures was conducted to better understand the performance of the ML models relative to a state-of-the-science model like SWAT.

We used learning curves (Perlich 2011) to investigate whether increasing the training data would improve the performance of the RF model. In this manner, learning curves were set up to investigate how adding an additional year of training data would change the performance metrics of the RF model. The RF models were trained through 10-fold CV by adding an additional year one at a time to imitate the real functionality of the models, as more data points are included over time. Improvements were tracked through performance metrics individually plotted in the y axis against the number of years used for training the RF model (x axis) at each time step of the learning curve. To evaluate the uncertainty of the performance metrics and determine whether the improvement in performance was significant, a 95% confidence interval was established by bootstrapping the RF model predictions at each time step by resampling with replacement a total of 10 000 times.
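A percentile bootstrap of a performance metric can be sketched as follows. Resampling observed and predicted values jointly as pairs, and the simple index-based percentile lookup, are assumptions made for illustration; `rmse` is used only as an example metric.

```python
import math
import random


def rmse(obs, sim):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def bootstrap_ci(obs, sim, metric, n_boot=10_000, seed=0):
    """95% percentile-bootstrap interval of metric(obs, sim)."""
    rng = random.Random(seed)
    n = len(obs)
    estimates = []
    for _ in range(n_boot):
        # Resample (observed, predicted) pairs with replacement
        sample = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            metric([obs[i] for i in sample], [sim[i] for i in sample])
        )
    estimates.sort()
    lower = estimates[int(0.025 * (n_boot - 1))]
    upper = estimates[int(0.975 * (n_boot - 1))]
    return lower, upper
```

Two learning-curve steps whose intervals overlap cannot be distinguished at this confidence level, which is the criterion used to judge whether an additional year of training data brings a significant improvement.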

3. Results and discussion

a. Importance of explanatory variables

Both RF models ranked streamflow (Q) as the most important variable for the prediction of both TP and DRP (near 80% IncMSE) (Figs. 3a,b). The rest of the explanatory variables for TP and DRP varied in ranking and had less than 20% IncMSE but were nevertheless important in predicting the response variables in combination with streamflow. Differences between the ranking of the rest of the predictor variables for TP and DRP are expected because of their differences in chemistry and transport in the environment. It is important to consider the combined synergistic effects that these predictor variables have on P loads rather than solely focusing on their exact ranking. The significance of these variables is credible, as high streamflow is associated with high precipitation, which leads to heightened soil moisture and nutrient loss transport mechanisms (e.g., runoff, subsurface flow, percolation) that can flush nutrients from agricultural lands into rivers and tributaries and eventually into the lake (Paerl and Scott 2010; Stumpf et al. 2012). The timing of fertilizers in the spring and fall in conjunction with heavy precipitation events (Koslow et al. 2013; Michalak et al. 2013) further explains the vulnerability of upper-layer organic and mineralized P soil concentrations (L1_OP, L1_P) to P losses in percolation (PRKP), sediment (YP), and runoff (QAP). While the availability of upper-layer nutrient soil concentrations does not affect tributary loads immediately and is not a high risk by itself, rainfall events can change this synergy. P loss variables, which are more related to rainfall and streamflow effects, connect nutrients available in the upper soil layer with drainage of these nutrients to the tributary. The importance of these explanatory variables in predicting TP and DRP loads is seen in the resulting variable importance plots, indicating they have a high impact in reducing MSE.

Fig. 3.

Variable importance plots for the (a) TP and (b) DRP response variables. Percent increase in mean-square error (%IncMSE) is shown on the x axis. Higher values of %IncMSE indicate higher importance. The definitions for each variable can be found in Table 1.

Citation: Artificial Intelligence for the Earth Systems 2, 3; 10.1175/AIES-D-22-0049.1

b. ML predictions

1) Original TP/DRP model prediction

The RF model for TP explained 81% of the variance in TP loads, with 67.60% of the model’s predictions being within 2 and 0.5 times the observations (red dotted lines in Fig. 4a). The model performed very well overall, with an R2 of 0.81, and most of the error was represented by the random component of the RMSE (Fig. 4a). Similarly, the RF model for DRP also performed very well overall, with an R2 of 0.80 (Fig. 4b). Most of the error in the DRP model was represented by the random component of the RMSE. The ability to remove bias (systematic error) almost entirely is a known functionality of ML algorithms.
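The two diagnostics used in this paragraph can be sketched as below: the fraction of predictions within a factor of 2 of the observations, and the split of RMSE into a systematic part (bias) and a random part (CRMSE), which satisfy RMSE² = bias² + CRMSE². The inclusive factor-of-2 bounds and the sign convention for bias (predicted minus observed) are assumptions.

```python
import math


def fraction_within_factor_of_two(obs, sim):
    """Share of predictions between 0.5 and 2 times the observation."""
    hits = sum(0.5 * o <= s <= 2.0 * o for o, s in zip(obs, sim))
    return hits / len(obs)


def error_decomposition(obs, sim):
    """Return (bias, rmse, crmse), where rmse**2 == bias**2 + crmse**2."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    bias = mean_s - mean_o  # systematic error
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)
    # Centered RMSE: error remaining after both series are mean-centered
    crmse = math.sqrt(
        sum(((s - mean_s) - (o - mean_o)) ** 2 for o, s in zip(obs, sim)) / n
    )
    return bias, rmse, crmse
```

When the bias is close to zero, CRMSE is nearly equal to RMSE, which is the situation described above where most of the error is random.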

Fig. 4.

Original TP/DRP model prediction of monthly (a) TP and (b) DRP loads through 10-fold CV, and TP/DRP testing model prediction of monthly (c) TP and (d) DRP loads through leave-1-year-out CV. The dashed black line is the one-to-one line; the black thick solid line is the linear regression line; the dotted red lines represent the fraction of predictions that are within 2 and 0.5 times the observations.

Based on Moriasi et al. (2015), the models’ performances are significantly above the “satisfactory” performance criteria. The original TP model is considered “very good,” and the original DRP model is considered “good,” even though two of the three metrics are within the “very good” limits and the third one is exactly at the threshold (R2 of 0.80) (Table 2).

Table 2.

Summary of the statistical metrics and performance ratings for TP and DRP by model [original TP/DRP model (original); TP/DRP testing model (test)], watershed (Maumee only, Sandusky only, Portage only, and Raisin only), and season (winter, spring, summer, and fall). Watershed and seasonal information summarized are for the TP/DRP testing models. The performance rating is based on the R2, NSE, and PBIAS thresholds developed by Moriasi et al. (2015).

2) Testing phase: TP/DRP testing model

Ideally, the TP/DRP testing models’ results are as good as those of the original TP/DRP models. Observed and predicted TP loads indicated a similar “very good” performance as the original TP model (Fig. 4c and Table 2). Like the original TP model, there was underprediction of observed TP loads higher than 444 MT, which we hypothesize is due to the low number of TP loads at this extreme for training the RF model (444 MT was the 96th percentile of the observed TP). The DRP testing model also indicated a similar “good” performance as the original DRP model (Fig. 4d and Table 2). Like the original DRP model, there was underprediction of observed DRP loads higher than 107 MT, which is the 97th percentile of our data.

TP loads higher than 100 MT were identified 83.20% of the time (POD1 for the TP test column in Table 3). On the other hand, TP loads ≤ 100 MT were identified 95.40% of the time (POD2). Overall, the model correctly classified TP loads as above or below the 100 MT threshold 92.90% of the time (PC). DRP loads higher than 50 MT were identified 73.80% of the time (POD1 for the DRP test column in Table 3). DRP loads ≤ 50 MT were identified 98.50% of the time (POD2). Overall, the model correctly classified DRP loads as above or below the 50 MT threshold 96.00% of the time (PC). This analysis showed that the RF model performs very well in predicting and classifying high and problematic TP and DRP loads that are not as abundant in our dataset.
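The contingency statistics can be sketched from a 2 × 2 table of observed versus predicted threshold exceedances. The formulas below are the standard forms of these scores and are an assumption here; the exact definitions used in the study are those given in Feng Chang et al. (2021).

```python
def contingency_stats(obs, sim, threshold):
    """Return (PC, POD1, POD2) in percent for the event 'load > threshold'.

    Assumes at least one observation on each side of the threshold.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for o, s in zip(obs, sim):
        if o > threshold and s > threshold:
            hits += 1
        elif o > threshold:
            misses += 1
        elif s > threshold:
            false_alarms += 1
        else:
            correct_negatives += 1
    total = hits + misses + false_alarms + correct_negatives
    pc = 100.0 * (hits + correct_negatives) / total
    pod1 = 100.0 * hits / (hits + misses)  # observed loads above threshold
    pod2 = 100.0 * correct_negatives / (correct_negatives + false_alarms)
    return pc, pod1, pod2
```

For TP the threshold would be 100 MT and for DRP 50 MT.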

Table 3.

Contingency statistics considering the event that TP > 100 MT and DRP > 50 MT: TP testing model (TP test) and DRP testing model (DRP test). Here, PC is percent of correct detections, POD1 is probability of detection when TP > 100 MT or DRP > 50 MT, and POD2 is probability of detection when TP ≤ 100 MT or DRP ≤ 50 MT.

The findings indicate that the TP and DRP ML models are working as intended, with consistency among training, validation, and testing data. The testing phase provides promising results for the potential of this method to be used to predict TP and DRP loads in near-real time, based on weather, hydrology, and agroecosystem variables.

3) Testing phase: Seasonal and monthly predictions

TP predictions in the testing model were evaluated seasonally by combining the monthly data points into winter, spring, summer, and fall seasons. Predictions for spring, summer, and fall performed very well (Table 2 and Figs. 5b–d). Predictions for winter showed lower performance (“good”), which was expected, as the highest TP data point in the dataset was observed in February (1019 MT) (Fig. 5a). The underprediction of this data point was expected, as there were no additional TP load data of this high magnitude available to train the RF model. High TP loads are more challenging to predict, as they are considered outliers with limited data availability for training (Fig. 6a).

Fig. 5.

Seasonal performance of the TP testing model predictions for (a) winter (December–February), (b) spring (March–May), (c) summer (June–August), and (d) fall (September–November). The line definitions are as in Fig. 4.

Fig. 6.

TP/DRP testing models: distribution of the observed (gray) vs predicted (blue) (a) seasonal TP loads and (b) seasonal DRP loads for winter, spring, summer, and fall. The number of data points available for each season is indicated in parentheses. The x mark inside the box is the mean of the season. Data points above the upper whisker indicate the outliers.

Monthly TP predictions were also evaluated by isolating the data for each month and calculating the performance statistics. All winter months indicated “good” performance ratings, spring and summer months ranged from “good” to “very good,” and fall months were “satisfactory” for September, “unsatisfactory” for October, and “very good” for November (not shown). The “unsatisfactory” performance rating for October was because PBIAS (−45.84%) was above the thresholds, even though both R2 and NSE were within “good” limits. This high magnitude in PBIAS was associated with an October 2011 data point in the Maumee watershed, which was predicted to be over 2 times higher than the observations (103 MT observed; 262 MT predicted). By removing only this data point, PBIAS falls within the “satisfactory” thresholds. Monthly analyses indicated that PBIAS is an overly sensitive metric to predictions that are outside 2 and 0.5 times the observations. This was particularly true for a low TP load month like October, which had an average observed TP load of 14 MT and a maximum of 247 MT throughout the entire study period as compared with other months with higher TP loads. While PBIAS is a useful metric for seasonal evaluations, usage of PBIAS for monthly evaluations needs to be analyzed carefully due to low data availability at this temporal scale (41–54 data points per month). Because PBIAS alone can give deceptive performance ratings, Moriasi et al. (2015) recommend that other statistical and graphical performance metrics be used in unison with PBIAS, which we already do by using scatterplots, time series, and other statistical metrics. The overall seasonal statistics for the TP testing model are considered above “satisfactory,” with “very good” predictions for the spring, summer, and fall seasons and “good” predictions for the winter season (Table 2).

Seasonal DRP predictions for the testing model indicated “very good” predictions for the summer and fall and “good” predictions for the winter and spring seasons (Table 2 and Fig. 7). Again, winter was the hardest to predict due to the large variability in DRP loads. Overall, the model performed well in predicting high DRP loads throughout the seasons but can be improved with the addition of high DRP data points, which are considered outliers in the DRP dataset (Fig. 6b). Both DRP and TP boxplots indicate that the predicted P loads follow the distribution of the observed P loads well and that the deviation lies in the prediction of the extreme P loads, which tend to be underpredicted. As additional high P load data become available for training the model, the model’s prediction capability for high P loads will increase. Discussions on this statement are presented in the walk-forward sensitivity test section. DRP loads for all months were predicted “satisfactorily” and above. Winter months indicated “good” performance ratings, spring months ranged from “good” to “very good,” summer months were “very good,” and fall was “satisfactory” for September and “very good” for October and November (not shown). These results indicate that the DRP testing model predicts seasonal and monthly DRP loads very well.

Fig. 7.

Seasonal performance of the DRP testing model predictions for (a) winter, (b) spring, (c) summer, and (d) fall. The line definitions are as in Fig. 4.

4) Testing phase: Watershed predictions

TP and DRP predictions in the testing models for the four individual watersheds were evaluated. This section focuses on results for the Maumee watershed since it is the most problematic watershed, as it produced the highest TP and DRP loads in the area and exhibited the largest variability in P magnitudes. Monthly loads of Maumee ranged from 2.37 to 1019.41 MT for TP and from 0.11 to 237.88 MT for DRP, while monthly loads of Sandusky, Portage, and Raisin were lower than 293 MT for TP and lower than 59 MT for DRP throughout the study period. A closer look at the performance metrics for the Maumee watershed prediction (isolating the Maumee watershed points from the rest of the predicted values) indicates “good” statistics for TP and DRP. Time series plots for observed and predicted TP and DRP loads (Fig. 8) show that the RF models’ predictions align well with the peaks and valleys of the observed data. The underprediction of the highest values is expected, because those compose the minority of the available data (observed loads ≥ 164.81 MT for TP and ≥ 40.66 MT for DRP are considered to be outliers in the entire dataset), and ML algorithms struggle to capture the upper tail of a right-skewed distribution.

Fig. 8.

(top) TP and (bottom) DRP time series plot for the Maumee watershed from 2002 to 2017. The blue line indicates observed data, and the orange line indicates predictions.

The RF-based models for both TP and DRP can remove the bias as expected, and the majority of the error left is represented by the random component of the RMSE (CRMSE). The results indicate that predictions for the individual watersheds are rated “very good” or “good,” with only the Raisin watershed having a “satisfactory” performance due to R2, even though both NSE and PBIAS are within “good” ranges (Table 2). The “satisfactory” performance for the Raisin watershed stems from a single data point in April 2009, which was overpredicted in the testing models. However, observed versus predicted time series plots showed that the April 2009 TP and DRP predictions aligned well with the previous month’s observations, even more so than the March 2009 predictions themselves. This analysis highlighted that the Raisin watershed was predicted properly; however, there was a 1-month time shift, which may have been a result of the strict monthly temporal aggregation of the data. Future improvements in the TP and DRP models will investigate incorporating a longer legacy or time component for nutrient transport processes. Finally, the performance rating for the Raisin watershed increased to “good” with the exclusion of the April 2009 data point.

The Maumee watershed comparison between the ML-based approach and SWAT models indicated promising results. The comparison showed that the statistical metrics obtained from the TP/DRP testing models are within the ranges of those obtained from past SWAT modeling studies and on some occasions are even more accurate (Table 4). The testing models consistently performed well for the Maumee watershed, as seen from the R2, NSE, and PBIAS statistics, while other SWAT modeling studies struggled with at least one of the statistical metrics. PBIAS was the lowest in both the TP/DRP testing models, which again emphasizes the ML models’ ability to reduce bias relative to SWAT models. A particular highlight is that the DRP testing model achieved the best statistical measures in comparison with the rest of the SWAT studies. This ML-based approach is working very well, with the immense potential to work alongside SWAT models and provide additional support for P load modeling and assessments.

Table 4.

Comparison of the statistical metrics for TP and DRP loads obtained from the TP/DRP testing model (test) and the SWAT modeling studies as listed in Yuan and Koropeckyj (2022) for the Maumee watershed only. NR means that the value was not reported. The performance rating is based on the R2, NSE, and PBIAS thresholds developed by Moriasi et al. (2015). The asterisk indicates that performance ratings of models with missing metrics (NR) are not characterized.

5) Learning curves and recommendations for future work

As mentioned previously, theoretically the ML models’ prediction capabilities would improve if more instances of high P loads were available in our dataset. Keeping in mind the uncertainty in performance metrics as described by Clark et al. (2021), we tested for the statistical significance of their improvement through bootstrapping (Figs. 9 and 10). We consecutively increased the training size by 1 year (starting at 1 and ending with 15 years of training data) to predict all 16 years left out of training. For example, the first box plot with 1 year in the training (x axis) consists of all 16 years that have been individually predicted with 1 year of training data.

Fig. 9.

Learning curves tracking R2, RMSE, CRMSE, bias, NSE, and PBIAS for TP predictions as the number of training years increases. The x axis indicates the number of training years that were used to train the model at a particular time step. The boxplots indicate the average of the performance metrics and the 95% confidence interval using the 2.5th and 97.5th percentiles of the bootstrap distribution of estimates as the lower and upper bounds of the interval.

Fig. 10.

As in Fig. 9, but for DRP predictions.

The R2, RMSE, CRMSE, bias, NSE, and PBIAS learning curves have similar behaviors for both TP and DRP RF models. Overall, the average of all the performance metrics indicated an improvement in the predictions for both TP and DRP, as more years of data were included for the training of the models (Figs. 9 and 10). However, as the 95% confidence interval indicates, the improvements in the predictions for both TP and DRP were only significant between having 1 and 7 years of training data. That improvement appears to affect the entire distribution, perhaps slightly favoring low P load values, as the high loads remain much scarcer no matter how many years we add in the training (right-skewed distribution). By the eighth year of training data, a clear overlap in upper and lower tails can be observed, which indicates that the improvement in performance is not significant. These results indicate that the performance of the prediction of TP and DRP loads has mostly leveled off with 7 years of training data. In addition, the 95% confidence intervals for both TP and DRP are quite wide, a signal that the sample size is too small.

The learning curves highlighted the challenge with data availability in our study. To get a narrow confidence interval, and thus, sample closer to the actual population, we would need more granularity in spatial and temporal availability of observed P loads (we have roughly 47 data points for 1 year). The uncertainty in the evolution of statistical error metrics as we add more data in the training does not allow us to infer potential improvements with the current dataset.

Changes that may lead to improvements in the ML-based P load models include the addition of missing P sources such as point sources, even though the majority of P originates from agricultural nonpoint sources, and the inclusion of legacy nutrient loads for P. Accounting for legacy P nutrients is a challenging task for the ML models because of how the predictors and explanatory variables are paired in space and time. We are indirectly accounting for some portions of legacy P, as we use monthly values for the explanatory and target variables. To capture the processes that contribute to legacy nutrients entering the tributary, we tested the standardized precipitation index (SPI) (3 months and 6 months) as one of the predictors to account for dry versus wet time periods longer than a month. Additionally, we tested lagging effects of streamflow by taking antecedent averages and standard deviations of streamflow (from 1 to 4 months). The ML-based models did not benefit from any of the SPIs or lagging effects, which is why they were not included in the final version of the ML models. Future development of this work should continue to explore the impact of legacy nutrients and identify ways to incorporate it, as have other studies, which have identified the impact of legacy information on controlling nutrient transport (Knapp et al. 2020; Minaudo et al. 2019; Ombadi and Varadharajan 2022).
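The antecedent streamflow predictors that were tested can be sketched as below. The feature names are hypothetical, and two details are assumptions because the text does not specify them: the window excludes the current month, and the population standard deviation is used.

```python
from statistics import mean, pstdev


def antecedent_features(streamflow, max_lag=4):
    """For each month, mean and std of the preceding 1..max_lag months.

    Months without enough history get None for that lag.
    """
    features = []
    for t in range(len(streamflow)):
        row = {}
        for lag in range(1, max_lag + 1):
            window = streamflow[max(0, t - lag):t]  # excludes month t itself
            if len(window) < lag:
                row[f"q_mean_{lag}"] = None
                row[f"q_sd_{lag}"] = None
            else:
                row[f"q_mean_{lag}"] = mean(window)
                row[f"q_sd_{lag}"] = pstdev(window)
        features.append(row)
    return features
```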

4. Summary

Our results demonstrate the transferability of the integrated multimedia modeling approach—previously used to predict seasonal lake chl a concentrations—to predict P loads, with the necessary alterations of explanatory variables. Through RF, we explored the top 10 environmental variables for TP and the top nine environmental variables for DRP, which delineated the known effects these environmental variables exert on P loads. Streamflow was identified as the most influential variable, followed by upper-layer organic and mineralized P soil concentrations, P losses transported with sediment, and percolation. The prediction of TP is “very good,” according to the thresholds developed by Moriasi et al. (2015), with a good ability to categorize TP loads > 100 MT and a strong prediction of TP up to the 96th percentile. “Good” performance is seen in the prediction of DRP, with good characterization of DRP loads > 50 MT and strong prediction of DRP up to the 97th percentile. Overall, temporal evaluations indicated that the models’ seasonal performance ratings were “good” to “very good,” and the models’ monthly performance ratings were “satisfactory” to “very good,” except for 1 month in the TP testing model. Predictions of individual watersheds and their time series plots align well with the peaks and valleys of the observed dataset. Last, the RF models’ capability to “learn” and improve as more data become available was explored, and the uncertainty in the evolution of statistical error metrics as we add more data in the training does not allow us to infer potential improvements with the current dataset. When compared with SWAT models, current TP and DRP predictions show competitive and promising results, especially for DRP. Future improvements and updates to model simulations and increases in training data availability may strengthen the RF model predictions of TP and DRP loads in the future.

This ML and multimedia integrated framework can be used to predict a wide variety of water quality target variables (e.g., dissolved oxygen, total N, nitrate-N, chl a, microcystins) in various ecosystems (across the Great Lakes, other lakes, rivers, and coastal waters), provided that sufficient data are available to train the model. Furthermore, the potential for this modeling framework to be used for scenario assessments is particularly exciting. The numerical modeling systems provide the capability to conduct future scenario modeling and assess changes in agriculture practices (e.g., till vs no till, differing irrigation strategies) and climate change impacts that affect tributary P loads. In turn, such scenarios can inform management strategies to support future healthy tributaries in a changing climate. The broad applicability and scenario assessment capability are important differentiators of this modeling framework from current state-of-the-science modeling approaches.

Acknowledgments.

This research was partially funded by the U.S. Department of Education Graduate Assistantships in Areas of National Need (GAANN) project “Environmental Engineering at the Forefront of Water Science, Policy and Education” Award P200A150311 (2016–19). The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the U.S. EPA. The authors declare no competing interests.

Data availability statement.

Outputs from the ML-based models for tributary P loads and comparison with observations are publicly available from the Open Science Framework (https://doi.org/10.17605/OSF.IO/C4HZ7). Outputs from WRF, VIC, and EPIC from the EPA will be made available at the EPA’s Environmental Dataset Gateway (https://edg.epa.gov/metadata/catalog/main/home.page) once the data are reviewed and approved.

REFERENCES

  • Arnold, J. G., R. Srinivasan, R. S. Muttiah, and J. R. Williams, 1998: Large area hydrologic modelling and assessment. Part 1: Model development. J. Amer. Water Resour. Assoc., 34, 73–89, https://doi.org/10.1111/j.1752-1688.1998.tb05961.x.

  • Bash, J. O., E. J. Cooter, R. L. Dennis, J. T. Walker, and J. E. Pleim, 2013: Evaluation of a regional air-quality model with bidirectional NH3 exchange coupled to an agroecosystem model. Biogeosciences, 10, 1635–1645, https://doi.org/10.5194/bg-10-1635-2013.

  • Bertani, I., D. R. Obenour, C. E. Steger, C. A. Stow, A. D. Gronewold, and D. Scavia, 2016: Probabilistically assessing the role of nutrient loading in harmful algal bloom formation in western Lake Erie. J. Great Lakes Res., 42, 1184–1192, https://doi.org/10.1016/j.jglr.2016.04.002.

  • Bocaniov, S. A., and D. Scavia, 2016: Temporal and spatial dynamics of large lake hypoxia: Integrating statistical and three-dimensional dynamic models to enhance lake management criteria. Water Resour. Res., 52, 4247–4263, https://doi.org/10.1002/2015WR018170.

  • Bocaniov, S. A., L. F. Leon, Y. R. Rao, D. J. Schwab, and D. Scavia, 2016: Simulating the effect of nutrient reduction on hypoxia in a large lake (Lake Erie, USA–Canada) with a three-dimensional lake model. J. Great Lakes Res., 42, 1228–1240, https://doi.org/10.1016/j.jglr.2016.06.001.

  • Bosch, N. S., J. D. Allan, D. M. Dolan, H. Han, and R. P. Richards, 2011: Application of the soil and water assessment tool for six watersheds of Lake Erie: Model parameterization and calibration. J. Great Lakes Res., 37, 263–271, https://doi.org/10.1016/j.jglr.2011.03.004.

  • Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, https://doi.org/10.1023/A:1010933404324.

  • Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. Wadsworth & Brooks/Cole Advanced Books & Software, 358 pp.

  • Bryant, K. J., V. W. Benson, J. R. Kiniry, J. R. Williams, and R. D. Lacewell, 1992: Simulating corn yield response to irrigation timings: Validation of the EPIC model. J. Prod. Agric., 5, 237–242, https://doi.org/10.2134/jpa1992.0237.

  • Clark, M. P., and Coauthors, 2021: The abuse of popular performance metrics in hydrologic modeling. Water Resour. Res., 57, e2020WR029001, https://doi.org/10.1029/2020WR029001.

  • Cooter, E. J., J. O. Bash, V. Benson, and L. Ran, 2012: Linking agricultural crop management and air quality models for regional to national-scale nitrogen assessments. Biogeosciences, 9, 4023–4035, https://doi.org/10.5194/bg-9-4023-2012.

  • Culbertson, A. M., J. F. Martin, N. Aloysius, and S. A. Ludsin, 2016: Anticipated impacts of climate change on 21st century Maumee River discharge and nutrient loads. J. Great Lakes Res., 42, 1332–1342, https://doi.org/10.1016/j.jglr.2016.08.008.

  • De Pinto, J. V., T. C. Young, and L. M. McIlroy, 1986: Great Lakes water quality improvement. Environ. Sci. Technol., 20, 752–759, https://doi.org/10.1021/es00150a001.

  • Easterling, W. E., C. J. Hays, M. M. Easterling, and J. R. Brandle, 1997: Modelling the effect of shelterbelts on maize productivity under climate change: An application of the EPIC model. Agric. Ecosyst. Environ., 61, 163–176, https://doi.org/10.1016/S0167-8809(96)01098-5.

  • EPA, 2019: Air emissions modeling: Biogenic emission sources. Environmental Protection Agency, accessed 15 January 2021, https://www.epa.gov/air-emissions-modeling/biogenic-emission-sources.

  • Feng Chang, C., V. Garcia, C. Tang, P. Vlahos, D. Wanik, J. Yan, J. O. Bash, and M. Astitha, 2021: Linking multi-media modeling with machine learning to assess and predict lake chlorophyll a concentrations. J. Great Lakes Res., 47, 1656–1670, https://doi.org/10.1016/j.jglr.2021.09.011.

  • Garcia, V., E. Cooter, J. Crooks, B. Hayes, B. Hinckley, M. Murphy, T. Wade, and X. Xing, 2016: Using a coupled modelling system to examine the impacts of increased corn production on groundwater quality and human health. Air Pollution Modeling and Its Application XXIV, D. Steyn and N. Chaumerliac, Eds., Springer Proceedings in Complexity, Springer, 113–117.

  • Gatz, L., 2019: Freshwater harmful algal blooms: Causes, challenges, and policy considerations. Congressional Research Service Rep. R44871, 34 pp., https://www.everycrsreport.com/files/20190905_R44871_9e6b4b24aa165002e40914140c8aee92cfc04212.pdf.

  • Gildow, M., N. Aloysius, S. Gebremariam, and J. Martin, 2016: Fertilizer placement and application timing as strategies to reduce phosphorus loading to Lake Erie. J. Great Lakes Res., 42, 1281–1288, https://doi.org/10.1016/j.jglr.2016.07.002.

  • Gobler, C. J., J. M. Burkholder, T. W. Davis, M. J. Harke, T. Johengen, C. A. Stow, and D. B. Van de Waal, 2016: The dual role of nitrogen supply in controlling the growth and toxicity of cyanobacterial blooms. Harmful Algae, 54, 87–97, https://doi.org/10.1016/j.hal.2016.01.010.

  • Han, H., J. D. Allan, and N. S. Bosch, 2012: Historical pattern of phosphorus loading to Lake Erie watersheds. J. Great Lakes Res., 38, 289–298, https://doi.org/10.1016/j.jglr.2012.03.004.

  • Hanson, P. C., and Coauthors, 2020: Predicting lake surface water phosphorus dynamics using process-guided machine learning. Ecol. Modell., 430, 109136, https://doi.org/10.1016/j.ecolmodel.2020.109136.

  • Ho, J. C., and A. M. Michalak, 2015: Challenges in tracking harmful algal blooms: A synthesis of evidence from Lake Erie. J. Great Lakes Res., 41, 317–325, https://doi.org/10.1016/j.jglr.2015.01.001.

  • Hollister, J. W., W. B. Milstead, and B. J. Kreakie, 2016: Modeling lake trophic state: A random forest approach. Ecosphere, 7, e01321, https://doi.org/10.1002/ecs2.1321.

  • Interagency Working Group on the Harmful Algal Bloom and Hypoxia Research and Control Act, 2017: Harmful algal blooms and hypoxia comprehensive research plan and action strategy: An interagency report. Subcommittee on Ocean Science and Technology, National Science and Technology Council, 103 pp., https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/NSTC/final_habs_hypoxia_research_plan_and_action.pdf.

  • International Joint Commission, 1978: Agreement on Great Lakes water quality, 1978 (with annexes). United Nations Treaty Series, 18177, 36 pp., https://treaties.un.org/doc/publication/unts/volume%201153/volume-1153-i-18177-english.pdf.

  • International Joint Commission, 2012: Great Lakes Water Quality Agreement: Protocol amending the agreement between Canada and the United States of America on Great Lakes water quality, 1978, as amended on October 16, 1983, and on November 18, 1987. IJC, 74 pp., https://www.ijc.org/sites/default/files/Great%20Lakes%20Water%20Quality%20Agreement%20-%202012_1.pdf.

  • International Joint Commission, 2014: A balanced diet for Lake Erie: Reducing phosphorus loadings and harmful algal blooms. Lake Erie Ecosystem Priority Rep., 100 pp., https://legacyfiles.ijc.org/publications/2014%20IJC%20LEEP%20REPORT.pdf.

  • Izaurralde, R. C., J. R. Williams, W. B. McGill, N. J. Rosenberg, and M. C. Quiroga Jakas, 2006: Simulating soil C dynamics with EPIC: Model description and testing against long-term data. Ecol. Modell., 192, 362–384, https://doi.org/10.1016/j.ecolmodel.2005.07.010.

  • Johnes, P. J., 1996: Evaluation and management of the impact of land use change on the nitrogen and phosphorus load delivered to surface waters: The export coefficient modelling approach. J. Hydrol., 183, 323–349, https://doi.org/10.1016/0022-1694(95)02951-6.

  • Kalcic, M. M., C. Kirchhoff, N. Bosch, R. L. Muenich, M. Murray, J. G. Gardner, and D. Scavia, 2016: Engaging stakeholders to define feasible and desirable agricultural conservation in western Lake Erie watersheds. Environ. Sci. Technol., 50, 8135–8145, https://doi.org/10.1021/acs.est.6b01420.

  • Kalcic, M. M., R. L. Muenich, S. Basile, A. L. Steiner, C. Kirchhoff, and D. Scavia, 2019: Climate change and nutrient loading in the western Lake Erie basin: Warming can counteract a wetter future. Environ. Sci. Technol., 53, 7543–7550, https://doi.org/10.1021/acs.est.9b01274.

  • Kane, D. D., J. D. Conroy, R. P. Richards, D. B. Baker, and D. A. Culver, 2014: Re-eutrophication of Lake Erie: Correlations between tributary nutrient loads and phytoplankton biomass. J. Great Lakes Res., 40, 496–501, https://doi.org/10.1016/j.jglr.2014.04.004.

  • Keitzer, S. C., and Coauthors, 2016: Thinking outside of the lake: Can controls on nutrient inputs into Lake Erie benefit stream conservation in its watershed? J. Great Lakes Res., 42, 1322–1331, https://doi.org/10.1016/j.jglr.2016.05.012.

  • Kim, R. J., D. P. Loucks, and J. R. Stedinger, 2012: Artificial neural network models of watershed nutrient loading. Water Resour. Manage., 26, 2781–2797, https://doi.org/10.1007/s11269-012-0045-x.

  • Knapp, J. L. A., J. V. Freyberg, B. Studer, L. Kiewiet, and J. W. Kirchner, 2020: Concentration–discharge relationships vary among hydrological events, reflecting differences in event characteristics. Hydrol. Earth Syst. Sci., 24, 2561–2576, https://doi.org/10.5194/hess-24-2561-2020.

  • Ko, J., G. Piccinni, and E. Steglich, 2009: Using EPIC model to manage irrigated cotton and maize. Agric. Water Manage., 96, 1323–1331, https://doi.org/10.1016/j.agwat.2009.03.021.

  • Kohavi, R., 1995: A study of cross-validation and bootstrap for accuracy estimation and model selection. Proc. 14th Int. Joint Conf. on Artificial Intelligence, Montreal, QC, Canada, Association for Computing Machinery, 1137–1143, https://dl.acm.org/doi/10.5555/1643031.1643047.

  • Koslow, M., E. Lillard, and V. Benka, 2013: Taken by storm: How heavy rain is worsening algal blooms in Lake Erie with a focus on the Maumee River in Ohio. National Wildlife Federation, 25 pp., https://www.nwf.org/~/media/pdfs/water/taken_by_storm_nwf_2013.ashx.

  • Kuhn, M., 2019: Caret: Classification and regression training, version 6.0-94. Caret Package, https://github.com/topepo/caret/.

  • LaBeau, M. B., H. Gorman, A. Mayer, D. Dempsey, and A. Sherrin, 2013: Tributary phosphorus monitoring in the U.S. portion of the Laurentian Great Lake basin: Drivers and challenges. J. Great Lakes Res., 39, 569–577, https://doi.org/10.1016/j.jglr.2013.09.014.

  • Leon, L. F., and Coauthors, 2011: Application of a 3D hydrodynamic–biological model for seasonal and spatial dynamics of water quality and phytoplankton in Lake Erie. J. Great Lakes Res., 37, 41–53, https://doi.org/10.1016/j.jglr.2010.12.007.

  • Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges, 1994: A simple hydrologically based model of land surface and energy fluxes for general circulation models. J. Geophys. Res., 99, 14 415–14 428, https://doi.org/10.1029/94JD00483.

  • Liaw, A., and M. Wiener, 2002: Classification and regression by randomForest. R News, No. 2 (3), R Foundation, Vienna, Austria, 18–22, http://CRAN.R-project.org/doc/Rnews/.

  • Maccoux, M. J., A. Dove, S. M. Backus, and D. M. Dolan, 2016: Total and soluble reactive phosphorus loadings to Lake Erie: A detailed accounting by year, basin, country, and tributary. J. Great Lakes Res., 42, 1151–1165, https://doi.org/10.1016/j.jglr.2016.08.005.

  • Michalak, A. M., and Coauthors, 2013: Record-setting algal bloom in Lake Erie caused by agricultural and meteorological trends consistent with expected future conditions. Proc. Natl. Acad. Sci. USA, 110, 6448–6452, https://doi.org/10.1073/pnas.1216006110.

  • Minaudo, C., R. Dupas, C. Gascuel-Odoux, V. Roubeix, P.-A. Danis, and F. Moatar, 2019: Seasonal and event-based concentration–discharge relationships to identify catchment controls on nutrient export regimes. Adv. Water Resour., 131, 103379, https://doi.org/10.1016/j.advwatres.2019.103379.

  • Monchamp, M.-E., F. R. Pick, B. E. Beisner, and R. Maranger, 2014: Nitrogen forms influence microcystin concentration and composition via changes in cyanobacterial community structure. PLOS ONE, 9, e85573, https://doi.org/10.1371/journal.pone.0085573.

  • Moore, S. K., V. L. Trainer, N. J. Mantua, M. S. Parker, E. A. Laws, L. C. Backer, and L. E. Fleming, 2008: Impacts of climate variability and future climate change on harmful algal blooms and human health. Environ. Health, 7 (Suppl. 2), S4, https://doi.org/10.1186/1476-069X-7-S2-S4.

  • Moriasi, D. N., J. G. Arnold, M. W. Van Liew, R. L. Bingner, R. D. Harmel, and T. L. Veith, 2007: Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE, 50, 885–900, https://doi.org/10.13031/2013.23153.

  • Moriasi, D. N., M. W. Gitau, N. Pai, and P. Daggupati, 2015: Hydrologic and water quality models: Performance measures and evaluation criteria. Trans. ASABE, 58, 1763–1785, https://doi.org/10.13031/trans.58.10715.

  • Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through conceptual models. Part I—A discussion of principles. J. Hydrol., 10, 282–290, https://doi.org/10.1016/0022-1694(70)90255-6.

  • NCWQR, 2005: Loading calculations, annual loads and unit area loads. Water Quality Laboratory, Heidelberg College, 5 pp., https://ncwqr.files.wordpress.com/2017/06/b-loading-calculations-annual-loads-and-unit-area-loads.pdf.

  • NCWQR, 2022: HTLP Monitoring Data Portal. Accessed 12 May 2022, https://ncwqr-data.org/HTLP/Portal.

  • Niu, X., W. Easterling, C. J. Hays, A. Jacobs, and L. Mearns, 2009: Reliability and input-data induced uncertainty of the EPIC model to estimate climate change impact on sorghum yields in the U.S. Great Plains. Agric. Ecosyst. Environ., 129, 268–276, https://doi.org/10.1016/j.agee.2008.09.012.

  • Obenour, D. R., A. D. Gronewold, C. A. Stow, and D. Scavia, 2014: Using a Bayesian hierarchical model to improve Lake Erie cyanobacteria bloom forecasts. Water Resour. Res., 50, 7847–7860, https://doi.org/10.1002/2014WR015616.

  • Ombadi, M., and C. Varadharajan, 2022: Urbanization and aridity mediate distinct salinity response to floods in rivers and streams across the contiguous United States. Water Res., 220, 118664, https://doi.org/10.1016/j.watres.2022.118664.

  • Paerl, H. W., and J. T. Scott, 2010: Throwing fuel on the fire: Synergistic effects of excessive nitrogen inputs and global warming on harmful algal blooms. Environ. Sci. Technol., 44, 7756–7758, https://doi.org/10.1021/es102665e.

  • Perlich, C., 2011: Learning curves in machine learning. Encyclopedia of Machine Learning, C. Sammut and G. I. Webb, Eds., Springer, 577–580.

  • Ran, L., Q. He, E. Cooter, and V. Benson, 2010: Development of an agricultural fertilizer modeling system for bi-directional ammonia fluxes in the Community Multiscale Air Quality (CMAQ) model. NATO/ITM Air Pollution Modeling and its Applications XXI, D. G. Steyn and S. Trini Castelli, Eds., NATO Science for Peace and Security Series C: Environmental Security, Springer, 213–219.

  • Ran, L., Y. Yuan, E. Cooter, V. Benson, D. Yang, J. Pleim, R. Wang, and J. Williams, 2019: An integrated agriculture, atmosphere, and hydrology modeling system for ecosystem assessments. J. Adv. Model. Earth Syst., 11, 4645–4668, https://doi.org/10.1029/2019MS001708.

  • Read, J. S., and Coauthors, 2019: Process-guided deep learning predictions of lake water temperature. Water Resour. Res., 55, 9173–9190, https://doi.org/10.1029/2019WR024922.

  • Reckhow, K. H., and S. C. Chapra, 1999: Modeling excessive nutrient loading in the environment. Environ. Pollut., 100, 197–207, https://doi.org/10.1016/S0269-7491(99)00092-5.

  • Robertson, D. M., and D. A. Saad, 2011: Nutrient inputs to the Laurentian Great Lakes by source watershed estimated using SPARROW watershed models. J. Amer. Water Resour. Assoc., 47, 1011–1033, https://doi.org/10.1111/j.1752-1688.2011.00574.x.

  • Robertson, D. M., G. E. Schwarz, D. A. Saad, and R. B. Alexander, 2009: Incorporating uncertainty into the ranking of SPARROW model nutrient yields from Mississippi/Atchafalaya River basin watersheds. J. Amer. Water Resour. Assoc., 45, 534–549, https://doi.org/10.1111/j.1752-1688.2009.00310.x.

  • Roerdink, A., 2017: Water quality in Ohio Rivers and streams: Project study plan. National Center for Water Quality Research, Heidelberg University, 35 pp., https://ncwqr.files.wordpress.com/2019/04/ncwqr-study-plan-20170127-with-addendums.pdf.

  • Sawyer, C. N., 1947: Fertilization of lakes by agricultural and urban drainage. J. N. Engl. Water Works Assoc., 61, 109–127.

  • Scavia, D., and Coauthors, 2014: Assessing and addressing the re-eutrophication of Lake Erie: Central basin hypoxia. J. Great Lakes Res., 40, 226–246, https://doi.org/10.1016/j.jglr.2014.02.004.

  • Scavia, D., and Coauthors, 2016a: Informing Lake Erie agriculture nutrient management via scenario evaluation. Water Center, University of Michigan, 83 pp., https://graham.umich.edu/media/pubs/InformingLakeErieAgricultureNutrientManagementviaScenarioEvaluation.pdf.

  • Scavia, D., J. V. DePinto, and I. Bertani, 2016b: A multi-model approach to evaluating target phosphorus loads for Lake Erie. J. Great Lakes Res., 42, 1139–1150, https://doi.org/10.1016/j.jglr.2016.09.007.

  • Schwarz, G. E., A. B. Hoos, R. B. Alexander, and R. A. Smith, 2006: The SPARROW surface water-quality model: Theory, application and user documentation. USGS Series Rep. 6–B3, 248 pp., https://pubs.er.usgs.gov/publication/tm6B3.

  • Shen, L. Q., G. Amatulli, T. Sethi, P. Raymond, and S. Domisch, 2020: Estimating nitrogen and phosphorus concentrations in streams and rivers, within a machine learning framework. Sci. Data, 7, 161, https://doi.org/10.1038/s41597-020-0478-7.

  • Skamarock, W. C., and Coauthors, 2019: A description of the Advanced Research WRF Model version 4. NCAR Tech. Note NCAR/TN-556+STR, 145 pp., https://doi.org/10.5065/1dfh-6p97.

  • Smith, V. H., 2003: Eutrophication of freshwater and coastal marine ecosystems: A global problem. Environ. Sci. Pollut. Res. Int., 10, 126–139, https://doi.org/10.1065/espr2002.12.142.

  • Stow, C. A., Y. Cha, L. T. Johnson, R. Confesor, and R. P. Richards, 2015: Long-term and seasonal trend decomposition of Maumee River nutrient inputs to western Lake Erie. Environ. Sci. Technol., 49, 3392–3400, https://doi.org/10.1021/es5062648.

  • Stumpf, R. P., T. T. Wynne, D. B. Baker, and G. L. Fahnenstiel, 2012: Interannual variability of cyanobacterial blooms in Lake Erie. PLOS ONE, 7, e42444, https://doi.org/10.1371/journal.pone.0042444.

  • Stumpf, R. P., L. T. Johnson, T. T. Wynne, and D. B. Baker, 2016: Forecasting annual cyanobacterial bloom biomass to inform management decisions in Lake Erie. J. Great Lakes Res., 42, 1174–1183, https://doi.org/10.1016/j.jglr.2016.08.006.

  • Tang, C., and R. L. Dennis, 2014: How reliable is the offline linkage of Weather Research & Forecasting Model (WRF) and Variable Infiltration Capacity (VIC) model? Global Planet. Change, 116, 1–9, https://doi.org/10.1016/j.gloplacha.2014.01.014.

  • USGS, 2020: Hydrologic unit maps. Water Resources of the United States, accessed 8 March 2021, https://water.usgs.gov/GIS/huc.html.

  • Valipour, R., L. F. Leon, D. Depew, A. Dove, and Y. R. Rao, 2016: High-resolution modeling for development of nearshore ecosystem objectives in eastern Lake Erie. J. Great Lakes Res., 42, 1241–1251, https://doi.org/10.1016/j.jglr.2016.08.011.

  • Verhamme, E. M., T. M. Redder, D. A. Schlea, J. Grush, J. F. Bratton, and J. V. DePinto, 2016: Development of the Western Lake Erie Ecosystem Model (WLEEM): Application to connect phosphorus loads to cyanobacteria biomass. J. Great Lakes Res., 42, 1193–1205, https://doi.org/10.1016/j.jglr.2016.09.006.

  • Verma, S., R. Bhattarai, N. S. Bosch, R. C. Cooke, P. K. Kalita, and M. Markus, 2015: Climate change impacts on flow, sediment and nutrient export in a Great Lakes watershed using SWAT. Clean Soil Air Water, 43, 1464–1474, https://doi.org/10.1002/clen.201400724.

  • Visser, P. M., J. M. H. Verspagen, G. Sandrini, L. J. Stal, H. C. P. Matthijs, T. W. Davis, H. W. Paerl, and J. Huisman, 2016: How rising CO2 and global warming may stimulate harmful cyanobacterial blooms. Harmful Algae, 54, 145–159, https://doi.org/10.1016/j.hal.2015.12.006.

  • Vollenweider, R. A., 1968: Scientific fundamentals of the eutrophication of lakes and flowing waters, with particular reference to nitrogen and phosphorus as factors in eutrophication. Organisation for Economic Co-operation and Development Tech. Rep. DAS/CSI/62.27, 61 pp., https://hero.epa.gov/hero/index.cfm/reference/details/reference_id/37262.

  • Watson, S. B., and Coauthors, 2016: The re-eutrophication of Lake Erie: Harmful algal blooms and hypoxia. Harmful Algae, 56, 44–66, https://doi.org/10.1016/j.hal.2016.04.010.

  • Wei, B., N. Sugiura, and T. Maekawa, 2001: Use of artificial neural network in the prediction of algal blooms. Water Res., 35, 2022–2028, https://doi.org/10.1016/S0043-1354(00)00464-4.

  • Williams, J. R., E. Wang, A. Meinardus, W. L. Harman, M. Siemers, and J. D. Atwood, 2006: EPIC users guide v. 0509. Texas A & M University, 64 pp., https://epicapex.tamu.edu/media/xz5nqfk2/epic0509usermanualupdated.pdf.

  • Williams, M. R., and K. W. King, 2020: Changing rainfall patterns over the western Lake Erie basin (1975–2017): Effects on tributary discharge and phosphorus load. Water Resour. Res., 56, e2019WR025985, https://doi.org/10.1029/2019WR025985.

  • Wynne, T., A. Meredith, T. Briggs, W. Litaker, and R. Stumpf, 2018: Harmful algal bloom forecasting branch ocean color satellite imagery processing guidelines. NOAA Tech. Memo. NOS NCCOS 252, 48 pp., https://coastalscience.noaa.gov/data_reports/harmful-algal-bloom-forecasting-branch-ocean-color-satellite-imagery-processing-guidelines/.

  • Yen, H., and Coauthors, 2016: Western Lake Erie basin: Soft-data-constrained, NHDPlus resolution watershed modeling and exploration of applicable conservation scenarios. Sci. Total Environ., 569–570, 1265–1281, https://doi.org/10.1016/j.scitotenv.2016.06.202.

  • Yuan, Y., and L. Koropeckyj, 2022: SWAT model application for evaluating agricultural conservation practice effectiveness in reducing phosphorus loss from the western Lake Erie basin. J. Environ. Manage., 302, 114000, https://doi.org/10.1016/j.jenvman.2021.114000.

  • Yuan, Y., R. Wang, E. Cooter, L. Ran, P. Daggupati, D. Yang, R. Srinivasan, and A. Jalowska, 2018: Integrating multimedia models to assess nitrogen losses from the Mississippi River basin to the Gulf of Mexico. Biogeosciences, 15, 7059–7076, https://doi.org/10.5194/bg-15-7059-2018.

  • Zeng, Z., and Coauthors, 2019: A reversal in global terrestrial stilling and its implications for wind energy production. Nat. Climate Change, 9, 979–985, https://doi.org/10.1038/s41558-019-0622-6.

    • Search Google Scholar
    • Export Citation
Save
  • Arnold, J. G., R. Srinivasan, R. S. Muttiah, and J. R. Williams, 1998: Large area hydrologic modelling and assessment. Part 1: Model development. J. Amer. Water Resour. Assoc., 34, 7389, https://doi.org/10.1111/j.1752-1688.1998.tb05961.x.

    • Search Google Scholar
    • Export Citation
  • Bash, J. O., E. J. Cooter, R. L. Dennis, J. T. Walker, and J. E. Pleim, 2013: Evaluation of a regional air-quality model with bidirectional NH3 exchange coupled to an agroecosystem model. Biogeosciences, 10, 16351645, https://doi.org/10.5194/bg-10-1635-2013.

    • Search Google Scholar
    • Export Citation
  • Bertani, I., D. R. Obenour, C. E. Steger, C. A. Stow, A. D. Gronewold, and D. Scavia, 2016: Probabilistically assessing the role of nutrient loading in harmful algal bloom formation in western Lake Erie. J. Great Lakes Res., 42, 11841192, https://doi.org/10.1016/j.jglr.2016.04.002.

    • Search Google Scholar
    • Export Citation
  • Bocaniov, S. A., and D. Scavia, 2016: Temporal and spatial dynamics of large lake hypoxia: Integrating statistical and three-dimensional dynamic models to enhance lake management criteria. Water Resour. Res., 52, 42474263, https://doi.org/10.1002/2015WR018170.

    • Search Google Scholar
    • Export Citation
  • Bocaniov, S. A., L. F. Leon, Y. R. Rao, D. J. Schwab, and D. Scavia, 2016: Simulating the effect of nutrient reduction on hypoxia in a large lake (Lake Erie, USA–Canada) with a three-dimensional lake model. J. Great Lakes Res., 42, 12281240, https://doi.org/10.1016/j.jglr.2016.06.001.

    • Search Google Scholar
    • Export Citation
  • Bosch, N. S.,