1. Introduction
Recent events, from the emergence of new viral pathogens to heatwaves and wildfires, have re-emphasized how vulnerable the security, sustainability, and well-being of our society are to climate dynamics. Despite the increasing number of initiatives in the artificial intelligence (AI) community to develop more efficient methods for tracking climate change and quantifying its impact on society, the integration of state-of-the-art AI methods into climate studies, and into environmental justice in particular, remains relatively limited, especially compared with their expansive applications in other disciplines such as social science, bioinformatics, and the analysis of cyber-physical systems (Benami et al. 2021; McGovern et al. 2022; Zhang et al. 2021; Krupnova et al. 2022).
Motivated by the pressing question of environmental justice in light of the COVID-19 pandemic, we harness the power of graph neural networks (GNNs) to investigate whether air pollution disproportionally contributes to COVID-19 clinical severity in the states of Texas and Pennsylvania. (These U.S. states have been selected as case studies because of the availability of COVID-19 hospitalization records; however, the presented methodology is sufficiently general for the analysis of other regions.) In particular, armed with new data from NASA satellite observations of aerosol optical depth (AOD), NASAdat, we evaluate the predictive utility of air pollution for forecasting COVID-19 clinical severity while accounting for heterogeneous sociodemographic factors within the GNN framework. Note that long-term observations of AOD contribute to estimating ground-level particulate matter pollutants having an aerodynamic diameter smaller than 2.5 μm (PM2.5), which are proven to be associated with a broad range of negative health outcomes, such as triggering respiratory and cardiovascular disease symptoms. The AOD observations are also used to provide estimates of air pollution exposure in many epidemiological studies, for example, Franklin et al. (2017) and Meng et al. (2018). Indeed, as recent studies indicate, GNNs and other deep learning (DL) models that are adapted to data in non-Euclidean spaces such as graphs and manifolds, often referred to as geometric deep learning (GDL; Bronstein et al. 2017), can noticeably outperform more traditional statistical and machine learning (ML) tools in forecasting tasks, especially for nonstationary heterogeneous spatiotemporal data (e.g., air pollution transport of varying concentrations; W. Cao et al. 2020; Yu 2021; Jiang and Luo 2022).
In turn, given the irregular lattice structure of the available COVID-19 epidemiological reports, the wide range of uncertainties due to the delayed, incomplete, and noisy official biosurveillance records, and a high level of sociodemographic heterogeneity, GDL appears to be one of the most promising forecasting approaches for tracking the hidden mechanisms behind spatiotemporal COVID-19 dynamics and the potentially disproportionate environmental burden on certain communities.
The novelty and contribution of our work can be summarized as follows:
- This is the first paper to introduce GNNs, and DL in general, for satellite-based quantification of the disproportional environmental harm caused by air pollution during the COVID-19 pandemic.
- Our findings suggest that COVID-19 clinical severity in counties of northeastern Texas that are also characterized by higher socioeconomic vulnerability is likely to be more impacted by poor air quality, thereby raising a concern of environmental injustice in these socioeconomically disadvantaged communities.
- We introduce a unique, publicly available, easy-to-use, and regularly maintained NASA satellite dataset of AOD, temperature, and relative humidity over the entire surface of Earth (NASAdat). NASAdat can be used to address a broad range of AI tasks for social good, that is, AI-assisted solutions tackling the world’s critical challenges and yielding positive social impact. Such tasks range from assessing health outcomes of climate change and detecting potential environmental health disparities (i.e., the primary focus of this project) to developing fairness algorithms for home insurance accessibility and precision farming. NASAdat constitutes one of the first steps toward developing a new platform integrating NASA’s satellite observations with AI tools for social good.
2. Related work
a. COVID-19, climate, and environmental justice
The COVID-19 pandemic provided a unique opportunity to study environmental (in)justice from both global and local perspectives, including novel dimensions of social problems that keep arising as new data and historical information come out. Political discourses keep prioritizing the economy over social issues, thus keeping the environment, in all its forms, as the last pillar to take care of (Rodrigues and Lowan-Trudeau 2021). Meanwhile, the COVID-19 pandemic has produced a social catharsis by showing the racial disparities in health care access and health outcomes and teaching us that environmental (in)justice goes beyond local hazards and exposure to pollutants (Cooper and Nagel 2021). The complex interrelationships between the global infectious disease and socioecological systems uncover the structural inequalities contributing to higher mortality in communities of color (Powers et al. 2021). Part of the problem is that people of color and poor communities are disproportionally impacted by pollution since they have historically been pushed toward circumstances/elements that other people avoid (e.g., land usage, facilities; Wilson et al. 2020). Sometimes the gentrification phenomenon displaces low-income families and socially marginalized residents, hence forcing them to live in crowded or inadequate conditions and affecting their healthcare access (Cole et al. 2021). Furthermore, environmental injustice affects most of the lives of children from lower socioeconomic backgrounds and, as such, maintains intergenerational environmental injustice in society (Rios et al. 2021). We recognize that a more comprehensive environment-centered education is a potential way to tackle such complex interlinks between infectious diseases, the environment, and social justice.
In addition, modern tools from artificial intelligence can potentially reveal novel elements that lead to previously overlooked analyses of the role of climate in contemporary environmental (in)justice.
b. Related climate datasets
There are few openly available datasets that provide climate data for both research and application purposes. The National Oceanic and Atmospheric Administration (NOAA; NOAA 2022), through the National Centers for Environmental Information (NCEI), provides datasets that include weather variables such as temperature, precipitation, dewpoint, and visibility. However, most of the observations on land rely on ground-based stations, which limits the resolution over covered areas across the United States; for example, many counties are far away from land-based stations. In comparison with existing data, our daily satellite-based measurements of temperature, relative humidity, and AOD provide annual cycles in these three variables for each county, indexed by the Federal Information Processing Standard Publication 6–4 (FIPS 6–4) code. This makes them easier to match with other datasets of the same granularity, for example, county-level data on COVID-19, population, health and socioeconomic indicators, and mobility. The original satellite datasets provide those three variables on regular Gaussian grids; we calculated spatially averaged values that correspond to county geographical locations. Last, long-term AOD observations from a single instrument over the entire CONUS are only available from satellites. Temperature and relative humidity data for the entire globe, including those over the ocean, are another benefit of using satellite observations when running ML models for spatial domains other than the United States. Hence, our proposed dataset, NASAdat, is the first dataset that the broader community can easily use to take advantage of NASA’s up-to-date satellite observations.
c. Geometric deep learning for spatiotemporal forecasting
Recently, GNNs and other GDL tools have emerged as a powerful alternative for modeling dependencies in multivariate spatiotemporal processes (W. Cao et al. 2020; Jiang and Luo 2022). In particular, Li et al. (2018), Yu et al. (2018), and D. Cao et al. (2020) introduce graph convolutional networks (GCNs) for multivariate time series forecasting, which allow us to capture nonstationary inter- and intradependencies among entities more accurately and to handle data heterogeneity, especially in nonseparable cases. Nonseparability pertains to scenarios in which spatial dependency varies over time and temporal dependency changes with geographical location. Since epidemiological data are always reported over the irregular polygons of census units (for example, counties, provinces, and states) and also tend to exhibit a highly nontrivial structure of spatiotemporal dependencies due to the complexity of socioenvironmental and pathogen interactions, GDL on manifolds and graphs is a promising new direction for infectious disease mapping. Few previous studies use climatological data, from NASA and other sources, along with DL models to explore connections with clinical severity (Segovia-Dominguez et al. 2021a). However, the utility of GDL for biosurveillance and environmental justice remains largely unexplored.
3. Data
a. The NASAdat dataset
Instruments on NASA Earth Observing System (EOS) satellites, especially the Moderate Resolution Imaging Spectroradiometer (MODIS; Levy et al. 2013), have provided high-accuracy measurements of AOD over both land and ocean for the last two decades. Furthermore, since September 2002, the Atmospheric Infrared Sounder (AIRS; Aumann et al. 2003) aboard NASA’s Aqua satellite has also provided vertical profiles of air temperature and moisture. Owing to their broad spatial coverage, the AIRS observations have advanced our understanding of annual cycles in near-surface temperature and moisture all over the globe. Figure 1 shows spatial distributions of AOD, surface air temperature, and relative humidity. The original datasets are publicly available through NASA Distributed Active Archive Centers (DAAC) servers. The AIRS3STD product provides the daily temperature and relative humidity datasets from 31 August 2002 to the present. The daily datasets are provided on regular Gaussian grids with 1° × 1° resolution. For each day, there is a file containing multiple variables, including air temperature and relative humidity at the surface. To prepare our daily climatology data, we downloaded 6209 files for the 6209 days between 1 January 2003 and 31 December 2019. The Atmosphere Daily Global Product from MODIS on Terra, MOD08_D3, contains about 80 variables, including AOD at 550-nm wavelength, in each file of daily observations. AOD from MODIS on board Terra is available from March 2000. We calculate daily AOD climatology using AOD for the 6939 days between 1 January 2001 and 31 December 2019. To utilize annual cycles in temperature, relative humidity, and AOD in each U.S. county, users of our data need to download only three files from our repository.
Fig. 1. (a) Aerosol optical depth (AOD) from the Moderate Resolution Imaging Spectroradiometer (MODIS) averaged for the 19 years between 2001 and 2019. (b) Surface air temperature (K) and (c) relative humidity (%) from the Atmospheric Infrared Sounder (AIRS) averaged for the 17 years between 2003 and 2019.
Citation: Artificial Intelligence for the Earth Systems 3, 1; 10.1175/AIES-D-23-0040.1
The county-level environmental variables in our preprocessed datasets, which are rarely available in other resources, make these datasets unique. They make it possible to discover relationships between disease agents and their local environment, so that surveillance of and responses to infectious disease threats can be conducted more efficiently and effectively.
b. The sociodemographic and COVID-19 data
We use five socioeconomic indices provided by the CDC/Agency for Toxic Substances and Disease Registry (ATSDR) social vulnerability index (SVI) project (CDC-ATSDR 2022): socioeconomic status, household composition and disability, minority status and language, housing type and transportation, and the overall vulnerability index. We use the sociodemographic data based on the 2020 survey. The modifiable areal unit is the county. Additional details on the distinction between variables and SVI subindices can be found in Flanagan et al. (2011). We show the socioeconomic status based on percentile ranking values, from 0 to 1, with higher values indicating greater socioeconomic vulnerability (CDC-ATSDR 2022). To include the healthcare coverage dimension in our model, we also add four variables from the COVID-19 vaccine coverage index (CVAC; CVAC 2022): historic undervaccination, sociodemographic barriers, resource-constrained healthcare system, and healthcare accessibility barriers. For daily records on COVID-19 and hospitalizations, we use the curated datasets available from CovidActNow (2022).
4. COVID-19 and environmental (in)justice: What can geometric deep learning reveal?
a. Problem statement
Armed with the proposed benchmark dataset NASAdat, our goal here is to investigate the existence of predictive relationships (if any) between air quality (measured via satellite observations of AOD) and COVID-19 clinical severity (measured via COVID-19 hospitalizations).
Formally, let Yt be records on COVID-19 clinical severity and Xt be records on atmospheric variables, t = 1, 2, …. Let a positive integer h be the forecasting horizon. Under the concepts of Granger causality, our objective is to assess how different the conditional distribution of Yt+h, given the past records of both Y and X up to time t, is from the conditional distribution of Yt+h given the past records of Y alone.
To train spatiotemporal graph convolutional networks, we represent the connections between adjacent counties as a weighted undirected graph in which each county is a node.
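As a minimal sketch of how a county graph can be assembled, the following Python snippet builds a symmetric weighted adjacency matrix from a list of bordering county pairs. The function name `build_adjacency`, the toy county labels, and the uniform edge weight of 1.0 are assumptions made for illustration; the weighting scheme actually used in this study is not reproduced here.

```python
def build_adjacency(border_pairs, weight=1.0):
    """Build a symmetric adjacency matrix from pairs of bordering counties.

    border_pairs: iterable of (county_a, county_b) identifiers (hypothetical input).
    weight: edge weight assigned to every shared border (assumed uniform).
    """
    pairs = list(border_pairs)
    # Each distinct county becomes one node; sort for a reproducible ordering.
    nodes = sorted({county for pair in pairs for county in pair})
    index = {county: i for i, county in enumerate(nodes)}
    n = len(nodes)
    adjacency = [[0.0] * n for _ in range(n)]
    for a, b in pairs:
        i, j = index[a], index[b]
        # Undirected graph: write the weight in both directions.
        adjacency[i][j] = weight
        adjacency[j][i] = weight
    return nodes, adjacency


# Toy example with three hypothetical counties; c1 borders both c2 and c3.
nodes, adjacency = build_adjacency([("c1", "c2"), ("c1", "c3")])
```

In practice the pairs would come from a county adjacency file keyed by FIPS code, and the resulting matrix (or the equivalent edge list) is what a graph convolutional model consumes as the network structure at each time stamp.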
b. Remark 1
Hospitalization and mortality due to COVID-19 are often shown to be closely linked to a prior medical history of lung and other respiratory diseases (Núñez-Delgado et al. 2021). Some epidemiological studies have also associated exposure to particulate matter (PM) air pollution having an aerodynamic diameter smaller than 2.5 μm (PM2.5) with increased risk of respiratory diseases (Schraufnagel et al. 2019). As such, ambient PM pollution, which has been estimated with AOD observations, may shed important light on assessing and predicting the severity of the COVID-19 burden and associated survival rates.
c. Remark 2
Note that connectivity among county-level geographical locations, for example, shared borders, provides a natural transmission network to track the disease spread. To account for temporal and spatial dependencies simultaneously, we perform experiments using a wide variety of recurrent graph neural networks (see Fig. 2 for the employed architecture).
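For readers less familiar with graph convolutions, the spatial step shared by many such models can be written in the widely used GCN form below. This is a generic illustration, not the specific update rule of any one benchmarked model. The symbols are assumptions introduced here: A is the county adjacency matrix, I the identity matrix, H^(l) the node feature matrix at layer l, W^(l) a learnable weight matrix, and sigma a nonlinearity.

```latex
\tilde{A} = A + I, \qquad
\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}, \qquad
H^{(\ell+1)} = \sigma\!\left( \tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}\, H^{(\ell)}\, W^{(\ell)} \right)
```

A recurrent graph neural network then applies a step of this kind at every time stamp and passes the resulting node embeddings through a recurrent unit, so that each county's forecast depends on both its neighbors and its own history.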
Fig. 2. Architecture overview: (I) Spatiotemporal graphs at different time stamps for California, where each county denotes a node. (II) Climatological features for each node including RH, temperature, AOD, precipitation, dewpoint, and so on. (III) The input of spatiotemporal graphs to geometric deep learning (GDL) models, i.e., node feature matrix (green box) and network structure at each time stamp. (IV) Conventional deep learning (DL) models, i.e., within IV, we apply RNNs, with or without a gated mechanism, to node features for multistep-ahead predictions. (V) Illustration of generating spatiotemporal node embeddings from
d. Experimental settings
We analyze COVID-19 clinical severity in two U.S. states: Pennsylvania and Texas. We have selected these states on the basis of the availability of COVID-19 hospitalization data in the early stages of the pandemic. We use state-based information on sociodemographic indices. The methods are trained on a computer with an AMD Ryzen 7 5800X v8 CPU at 4.7 GHz, 32 gigabytes of RAM, and an NVIDIA RTX 3090 graphics card. We use daily data from 1 February to 31 December 2020 and split the graph signals chronologically into a training set (the first 80% of days) and a test set (the last 20% of days). Each training sample uses five lags of daily reported values to produce a 15-day-ahead forecast. We use the AMSGrad optimizer with the same learning-rate decay strategy to train all methods, with a learning rate of 0.02 and without weight decay. In addition, the dropout rate is 0.5, the number of epochs is 500, and each model uses one layer. For all methods, we train the model with the same hidden layer dimension (hidden_dim1 = 128) and the same output dimension (hidden_dim2 ∈ {60, 251} for Pennsylvania and Texas, respectively). We use the same partition, run each algorithm 10 times with 10 different seeds, and report the average RMSE along with its standard deviation. Python implementations use the PyTorch Geometric Temporal library (Rozemberczki et al. 2021). Appendix B provides further details on the experimental settings.
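The data preparation described above can be illustrated with a short Python sketch. It assumes one value per day for a single county, reads "15-day-ahead" as a single target 15 days after the last lagged day, and uses hypothetical function names (`chronological_split`, `make_windows`); the actual pipeline operates on graph signals and may differ in these details.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into the first 80% (train) and last 20% (test)."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, n_lags=5, horizon=15):
    """Pair each run of n_lags consecutive days with the value observed
    `horizon` days after the last day of that run (assumed reading of h = 15)."""
    samples = []
    for end in range(n_lags, len(series) - horizon + 1):
        window = series[end - n_lags:end]      # days end-n_lags .. end-1
        target = series[end - 1 + horizon]     # horizon days after day end-1
        samples.append((window, target))
    return samples


# 1 February to 31 December 2020 spans 335 days; here each value is its day index.
days = list(range(335))
train, test = chronological_split(days)
samples = make_windows(train)
```

With this reading, the 268 training days yield 249 (window, target) pairs, and no test-period value leaks into training because the split is made before the windows are formed.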
e. Benchmark models
We benchmark two broad classes of neural networks: (i) spatiotemporal GCNs that exploit GCN and temporal convolution to capture dynamic spatial and temporal patterns and correlations; and (ii) recurrent neural network (RNN)-based approaches including long short-term memory (LSTM). We report performances using eight types of state-of-the-art benchmark GDL models (see Table 1). Our consensus analysis produces one vote for each county whenever a GDL model improves its forecasting results (by adding the information on AOD) with respect to the baseline (i.e., without the information on AOD). To the best of our knowledge, this is the first attempt to utilize GDL and, particularly, the model consensus among multiple GDL models for understanding the potential impact of air quality on COVID-19 hospitalizations and the associated implications of disproportionate environmental burden.
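The consensus analysis can be sketched as a simple vote count. The snippet below assumes that "improves" means a strictly lower county-level RMSE once AOD is added. The function names, the nested-dictionary layout, and the numbers are hypothetical and serve only to illustrate the voting rule.

```python
import math


def rmse(predictions, observations):
    """Root-mean-square error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predictions, observations)]
    return math.sqrt(sum(squared) / len(squared))


def consensus_votes(baseline_rmse, aod_rmse):
    """Count, per county, how many models improve when AOD is added.

    Both arguments map model name -> {county -> RMSE}. A model casts one
    vote for a county when its RMSE with AOD is strictly lower (assumption).
    """
    votes = {}
    for model, per_county in baseline_rmse.items():
        for county, base_error in per_county.items():
            improved = aod_rmse[model][county] < base_error
            votes[county] = votes.get(county, 0) + int(improved)
    return votes


# Made-up errors for two models and two counties.
baseline = {"model_a": {"c1": 4.0, "c2": 3.0}, "model_b": {"c1": 5.0, "c2": 2.0}}
with_aod = {"model_a": {"c1": 3.5, "c2": 3.2}, "model_b": {"c1": 4.1, "c2": 1.9}}
votes = consensus_votes(baseline, with_aod)
```

Mapping the resulting vote counts by county gives the kind of consensus map described in the results, where counties with more votes are those for which more models find AOD useful.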
Table 1. Performance for 15-day-ahead forecasts of COVID-19-related hospitalizations, based on DL and GDL models in two U.S. states: Pennsylvania and Texas, averaged over each state. Results (RMSE ± std dev) are averaged over 10 runs with different seeds. Boldface type indicates the best AOD-added performance. Statistically significant differences between Baseline and Baseline + AOD are marked with an asterisk (based on a one-sided two-sample t test). The best performance overall for each state is marked with a dagger symbol.
5. Experimental results
Table 1 shows the performance of all benchmark models for the two U.S. states (Pennsylvania and Texas). For the consensus analysis of each state, we compare Baseline versus Baseline + air quality (AOD), where the Baseline represents each model without satellite-based data (see Table 1). Table 1 indicates that adding AOD leads to a statistically significant improvement in predictive performance for five of eight models in Pennsylvania and four of eight models in Texas, thereby suggesting that air quality is likely to be an important predictor of COVID-19 clinical severity.
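To make the significance comparison concrete, the following sketch computes a two-sample t statistic for the per-seed RMSE values of one model with and without AOD. It uses the Welch (unequal variance) form, which is an assumption, since the exact variant of the two-sample test is not specified here. The function name and the sample values are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return the Welch t statistic and its approximate degrees of freedom
    for the difference mean(sample_a) - mean(sample_b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the unbiased sample variance (divides by n - 1).
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Made-up per-seed RMSE values: with AOD versus baseline.
rmse_with_aod = [1.0, 1.1, 0.9, 1.0]
rmse_baseline = [2.0, 2.1, 1.9, 2.0]
t_stat, dof = welch_t(rmse_with_aod, rmse_baseline)
```

A strongly negative statistic supports the one-sided alternative that adding AOD lowers the error. Turning the statistic into a p value requires the t distribution, which is not in the Python standard library; SciPy's `scipy.stats.ttest_ind` offers this through its `alternative` and `equal_var` arguments.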
Figure 3 depicts the spatial distribution of GDL model consensus, measured as the number of GDL model votes for declaring AOD a useful predictor of COVID-19 clinical severity, on a county level for Pennsylvania and Texas. We find that a very high proportion of the DL models agree that COVID-19 hospitalizations in northeastern Texas, greater Houston (Texas), and southeastern Pennsylvania are impacted by higher AOD rates. This phenomenon can be attributed to a higher-than-average number of unhealthy air days in these counties (Air North Texas 2022). For instance, Houston experiences 9 times as much on-road air pollution as its metropolitan area counterparts (Environmental Defense Fund 2022), while Philadelphia (Pennsylvania) ranks as one of the 25 worst U.S. metropolitan areas for ozone and year-round particle pollution (American Lung Association 2022). In turn, as the left panel of Fig. 3 indicates, the counties that tend to exhibit the highest impact of air quality on COVID-19 hospitalizations in Texas are also the counties with the higher (i.e., indicating greater vulnerability) socioeconomic status index (CDC-ATSDR 2022), thereby implying the inequitable burden of environmental harm in these socioeconomically disadvantaged communities.
Fig. 3. (a),(d) AOD maps; (b),(e) spatial distribution of the consensus among DL models, which suggest that AOD exhibits predictive utility for COVID-19 clinical severity (i.e., number of model votes agreeing that AOD is a useful predictor for COVID-19 hospitalizations); (c),(f) socioeconomic status map at the county level.
6. Discussion and path to deployment
The biggest advantage of using satellite observations is their wide coverage over the entire globe. We have already prepared the same datasets of temperature, relative humidity, and AOD averaged for each country but have not yet applied them to model COVID-19 clinical severity outside the United States. Such COVID-19 biosurveillance analysis using GDL and other DL models applied to the worldwide dataset will be our future work. Furthermore, we have experimented with multiresolution pattern matching using topological ML in application to the worldwide dataset (Ofori-Boateng et al. 2021), and we think that such multiresolution pattern matching will be of interest not only in environmental sciences but also in broader problems of image processing and computer vision. Our software stack consists of a variety of tools for GDL and data visualization that are written in various languages but made accessible through Python bindings. This gives data scientists the flexibility to build our model-voting system in a standalone environment while allowing for onboard processor deployment of the trained models. We will expand the deployment possibilities to global coverage, using observations from polar-orbiting satellites, via the parallel processing capabilities and elastic scalability of the Advanced Data Analytics Platform (ADAPT) science cloud at the NASA Center for Climate Simulation. We will routinely publish codes and updates through the Open NASA Earth eXchange (OpenNEX) platform, thereby making a further impact on the scientific community and beyond.
7. Conclusions
The goal of this project has been to investigate the existence of predictive relationships (if any) between air quality (measured via satellite observations of AOD) and COVID-19 clinical severity (measured via COVID-19 hospitalizations), while accounting for the complex nonstationary heterogeneous spatiotemporal dependency structure of environmental, biosurveillance, and socioeconomic data. That is, while we do not perform formal hypothesis testing, our null hypothesis could be stated as air quality having no impact on COVID-19 hospitalizations, with the alternative hypothesis being that poorer air quality leads to higher hospitalizations. From our analysis, we tend to conclude that AOD indeed contains important information for predicting future COVID-19 hospitalizations, and in some cases the relationship between AOD and COVID-19 clinical severity is particularly strong in socioeconomically vulnerable communities. Why might this happen? First, it is likely that socioeconomically vulnerable communities have poorer air quality, and our analysis does capture this adverse relationship between such negative exposure and health outcomes (in application to COVID-19, but it is likely to be applicable to many other respiratory and cardiovascular diseases). The obtained results then further underline the environmental (in)justice with respect to the socioeconomically disadvantaged communities (Wang et al. 2023; Van Horne et al. 2023; Conley et al. 2023), which has also recently received attention in the Justice40 initiative of the U.S. government. Second, it may be that such socioeconomically vulnerable communities have limited access to health care, so that our results capture an association between poor air quality and COVID-19 severity; further analysis in this case would require a standalone study of public health policies and healthcare outcomes in such areas.
Third, it is important to note that poor air quality and its associative relationship with COVID-19 hospitalizations are observed not only in socioeconomically vulnerable communities, although the impact there is most substantial. This reconfirms earlier studies showing that poor air quality is harmful to health (Alvarez 2023; Byrwa-Hill et al. 2023; Josey et al. 2023); in the case of COVID-19 in particular, it indicates that healthcare professionals should be prepared for increased emergency room visits of COVID-19 patients during projected increases of AOD. Furthermore, it appears that all models (with and without AOD) tend to perform better in Texas. We believe that such differences can be largely attributed to a richer topological structure of landscape factors and higher heterogeneity of land surface in Pennsylvania. In addition, variations in COVID-19 regulations and policies at the state and local levels could contribute to the difference between the two states. Stricter enforcement in Pennsylvania than in Texas may affect the models’ performance because our models did not reflect the difference in regulatory policies. To summarize, our focus here has been on predictive information yielded by AOD, where the complex interrelationships between air quality, COVID-19 hospitalizations, and sociodemographic information are incorporated into the learning representations. This can be viewed as one of the very first steps toward improving our understanding of the key latent mechanisms behind potential healthcare disparities among communities and the related impacts of excessive exposure to poor air quality, in general. Furthermore, a more extended analysis of potential confounding factors can shed more light on the causal effects of excessive hospitalization rates and environmental health outcomes.
In addition, despite the considerable impacts of aerosols on Earth’s radiation budget and air quality, aerosol response to the changing climate and even its mean state in the current climate are poorly represented in climate models according to the Intergovernmental Panel on Climate Change (IPCC 2021). Hence, NASA’s role as a provider of significant satellite data for the scientific community is crucial both to improve the current modeling tools and to support environmental justice. This project, in turn, complements the ongoing and planned missions funded by NASA for a better understanding of aerosols in Earth’s climate system. Aerosols and their effects on air quality and human health meet the outlined observation strategy of the 2017–2027 Decadal Survey for Earth Science and Applications from Space (ESAS; National Academies of Sciences, Engineering, and Medicine 2018) with the highest priority of “designated.” Nevertheless, NASA’s data presented in this paper remain largely unavailable to the broader AI community. This project opens a two-way street in which NASA’s data are more widely used in various AI initiatives for social good, such as biosurveillance, spatiotemporal forecasting, and pattern matching, and more state-of-the-art ML algorithms are brought into climate studies. Last, we are confident that NASAdat and the associated platform integrating NASA’s satellite observations with DL tools will further enhance the utility of various NASA instruments for addressing present and future problems of social good.
Further details on the data (i.e., preprocessing, format, metadata description, etc.) and source codes can be found online (https://github.com/ZhiweiZhen/COVID-19_NASA).
Acknowledgments.
The project has been supported by NASA Grant 20-RRNES20-0021 under the Rapid Response and Novel Research in Earth Science, NASA AIST Grants 21-AIST21_2-0059 and NASA 21-AIST21_2-0020, the UTSystem-CONACYT ConTex program, and NSF ATD Grant DMS 1925346. A portion of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).
Data availability statement.
The NASAdat dataset is publicly available via the managed JPL research data repository at DataCite Commons (https://commons.datacite.org/repositories/vl6c4o9). Each variable can be downloaded: temperature (https://commons.datacite.org/doi.org/10.48577/jpl.z31y-2r10), relative humidity (https://commons.datacite.org/doi.org/10.48577/jpl.ws86-1q81), and aerosol optical depth (https://commons.datacite.org/doi.org/10.48577/jpl.k37v-y751). Raw data of temperature and relative humidity are from the AIRS3STD product (https://disc.gsfc.nasa.gov/datasets/AIRS3STD_006/summary), and raw data of AOD are from the MODIS product (https://doi.org/10.5067/MODIS/MOD08_M3.006). Additional details on the quality-controlled observations and county-level computations are included in appendix A.
APPENDIX A
Further Details of the Collected NASAdat Dataset
a. Data preprocessing and format
AOD is a measure of the amount of light that atmospheric aerosols scatter and absorb and a monotonic function of air quality related to particulate matter near the ground. We generated daily climatology of AOD using the 19-yr observations between 1 January 2001 and 31 December 2019 (Fig. 1a) and used the climatological AOD in the team’s previous studies (Segovia-Dominguez et al. 2021a,b). To calculate a climatological mean for each day of the year, we averaged the 19 observations available for that calendar day. For example, the climatological AOD on 1 January is an average of 19 New Year’s Days from 2001 through 2019.
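The day-of-year averaging can be sketched in a few lines of Python. The function name `daily_climatology` and the input layout (a list of date and value pairs for one county) are assumptions for illustration, and the values are synthetic. Note that keying on calendar month and day means that 29 February is averaged over leap years only; how leap days are treated in NASAdat is not specified here.

```python
import math
from collections import defaultdict
from datetime import date


def daily_climatology(observations):
    """Average values that share the same calendar day across years.

    observations: iterable of (datetime.date, value) pairs; missing values
    given as None or NaN are skipped rather than counted as zero.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in observations:
        if value is None or math.isnan(value):
            continue
        key = (day.month, day.day)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Synthetic 1 January values for 2001 through 2019 (19 years): 0.0, 1.0, ..., 18.0.
new_years = [(date(year, 1, 1), float(year - 2001)) for year in range(2001, 2020)]
climatology = daily_climatology(new_years)
```

Here `climatology[(1, 1)]` is the mean of the 19 New Year's Day values, which is the quantity the text calls the climatological value for 1 January.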
We also provide data on the daily climatology of surface air temperature and RH from the Atmospheric Infrared Sounder (Aumann et al. 2003) as shown in Figs. 1b and 1c. To fully take advantage of its high spatial resolution, we use surface air temperature and relative humidity from AIRS and Cross-Track Infrared Sounder (CrIS) in 2020 and 2021. For example, GDL models can use topological summaries of the Community Long-Term Infrared Microwave Coupled Atmospheric Product System (CLIMCAPS) products as input. The underlying hypothesis to be tested over the next three years is that surface air temperature and RH may affect COVID-19 hospitalization and death indirectly.
The collected dataset includes a unique identifier for each county and is saved in the netCDF format. NASAdat can be accessed as follows: temperature (https://doi.org/10.48577/jpl.z31y-2r10; https://commons.datacite.org/doi.org/10.48577/jpl.z31y-2r10), relative humidity (https://doi.org/10.48577/jpl.ws86-1q81; https://commons.datacite.org/doi.org/10.48577/jpl.ws86-1q81), and AOD (https://doi.org/10.48577/jpl.k37v-y751; https://commons.datacite.org/doi.org/10.48577/jpl.k37v-y751). By including the FIPS code of each county, NASA’s atmospheric data in NASAdat are easily matched with county-level datasets from other public and private entities.
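Matching by FIPS code can be illustrated with a small join. County FIPS codes have five digits (two for the state, three for the county), and a common pitfall is that leading zeros are lost when codes are stored as integers. The sketch below normalizes codes before joining; the function names, record fields, and values are hypothetical, and reading the netCDF files themselves is not shown.

```python
def normalize_fips(code):
    """Return a five-character county FIPS string, restoring leading zeros.

    Accepts integers or strings (e.g., 1001 or "01001"); other types are
    outside the scope of this sketch.
    """
    return str(code).strip().zfill(5)


def join_by_fips(left, right):
    """Inner join of two {fips: record} dictionaries on the normalized code."""
    left_norm = {normalize_fips(k): v for k, v in left.items()}
    right_norm = {normalize_fips(k): v for k, v in right.items()}
    return {
        fips: {**left_norm[fips], **right_norm[fips]}
        for fips in left_norm
        if fips in right_norm
    }


# Made-up values: one source stores integer codes, the other strings.
climate = {48201: {"aod": 0.21}, 1001: {"aod": 0.15}}
health = {"48201": {"hospitalizations": 120}, "42101": {"hospitalizations": 95}}
merged = join_by_fips(climate, health)
```

Only counties present in both sources survive the join, so the result contains a single record for code 48201 with both the AOD and hospitalization fields.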
b. Uniqueness
The collected NASAdat dataset is unique in multiple aspects. First, long-term AOD observations from a single instrument over the entire CONUS, such as those in NASAdat, are only available from satellites. While AOD observations are also available from NASA remote sensing Aerosol Robotic Network (AERONET) stations, AERONET coverage is noticeably sparser. In turn, many previous studies that compare AOD observations from MODIS with those from AERONET report reasonable agreement between the two, which can also serve as an additional measure of data quality control. Second, NOAA through NCEI provides data on such weather variables as temperature, precipitation, dewpoint, and visibility. Almost all of NOAA’s records rely on ground-based stations. As a result, in contrast to NASAdat, the NOAA data are limited in resolution over covered areas across the United States, and many counties are far away from land-based stations, which further increases uncertainty in applications requiring better resolution, such as biosurveillance. Third, in comparison with all other existing data, our daily climatologies of temperature and relative humidity provide annual cycles in these variables for each county with the FIPS 6–4 code, thereby making it easier to match NASAdat with various key biosurveillance, socioeconomic, and sociodemographic information of the best available granularity (i.e., at a county level) such as COVID-19 hospitalizations, cancer rates, and number of houses with solar panels. Fourth, temperature and relative humidity data for the entire globe, including those over the ocean, are another benefit of using satellite observations when running ML models for spatial domains other than the United States. Fifth, climatology datasets such as NASAdat can be used to study the impacts of the nation’s climate change on various sectors, from digital agriculture to resilience of critical infrastructures to adverse climate events.
Moreover, given multiple types of ground truth instances associated with these data, for example, dust storms and teleconnection patterns, the presented benchmark NASAdat can serve as a test bed for a very broad range of ML tasks such as spatiotemporal forecasting with graph neural networks, transfer learning of climatic scenarios, dynamic clustering, anomaly detection, and multiresolution pattern matching.
c. Quality of the dataset
NASAdat undergoes standard data quality control checks under NASA guidelines. The original datasets were generated by averaging quality-controlled observations. As a part of the retrieval algorithms, a quality flag is automatically assigned to each retrieved value of temperature, relative humidity, and AOD. The algorithms assign a quality flag to each pixel by comparing the observed values with predefined ranges of valid observations. A quality flag is thus a kind of automated machine annotation that is already taken into account in the original datasets. As such, we are confident about the quality of our newly generated datasets. Because of low-quality retrievals, there exists a small fraction of missing values in the original datasets. As per the standard statistical practice, these missing values are stripped when calculating a spatial and temporal average for each county.
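The handling of missing values described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `mean_ignoring_missing` and the pixel values are hypothetical, NaN is assumed to mark a retrieval rejected by the quality flag, and the order of averaging (over pixels first, then over days) is an assumption rather than a documented detail of the processing.

```python
import math


def mean_ignoring_missing(values):
    """Average the valid entries; return NaN when every entry is missing."""
    valid = [v for v in values if v is not None and not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


nan = float("nan")

# Hypothetical AOD retrievals for the pixels inside one county on three days.
daily_pixels = [
    [0.10, nan, 0.30],   # one pixel rejected
    [nan, nan, nan],     # no valid retrieval on this day
    [0.20, 0.40, nan],
]

# Spatial average per day, skipping missing pixels.
daily_means = [mean_ignoring_missing(day) for day in daily_pixels]

# Temporal average over days, skipping days with no valid retrieval.
county_mean = mean_ignoring_missing(daily_means)
```

A day with no valid pixel yields NaN and is then skipped in the temporal average, so it neither biases the result toward zero nor raises an error.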
Both the MODIS (https://atmosphere-imager.gsfc.nasa.gov/sites/default/files/ModAtmo/documents/QA_Plan_C61_Master_2021_09_22.pdf) and AIRS (https://docserver.gesdisc.eosdis.nasa.gov//public/project/AIRS/V7_L2_Quality_Control_and_Error_Estimation.pdf) missions provide more detailed information on the quality flag.
Both AIRS and MODIS datasets cover the entire globe. The total sizes of 6209 AIRS and 6939 MODIS files are about 2.5 and 4.2 gigabytes, respectively. In our processed data, each file for temperature, relative humidity, or AOD has a size of 95 megabytes.
d. Maintenance plan
Our previous work (Lee et al. 2018) indicates that even 19 years (2001–19) may not be long enough to define a statistically stable AOD climatology. We also recognize that continuous updates are key to the utility of these data, especially for biosurveillance and other time-sensitive applications. JPL NASA/California Institute of Technology will update our datasets twice per year and also whenever new versions of the NASA products are released through NASA’s DAACs.
In our maintenance plan, we take advantage of the fact that these benchmark data are among the first projects within a recent, broader NASA JPL initiative on hosting datasets such as these, assigning DOIs so that references in papers persist, and capturing both the raw data and any derived results. As such, JPL will continue updating and maintaining these benchmark data under this broader NASA initiative, with external access to a hub under the subdomain of jpl.nasa.gov. Our team will keep producing daily temperature, relative humidity, and AOD datasets from AIRS/CrIS and MODIS/VIIRS in netCDF format that can serve as input for multiple projects across the ML and atmospheric sciences communities. To take full advantage of the highest spatial resolution, we plan to expand the datasets and use level-2 surface air temperature and relative humidity from AIRS and CrIS for the following years. With the combination of NASA front-end servers, NVIDIA DGX clusters at the NASA Center for Climate Simulation, and the parallel processing capabilities and elastic scalability of the ADAPT science cloud, we expect to have no issue maintaining our data for years to come, as these services will provide all the necessary resources at no cost to NASAdat end users.
e. Original datasets
The raw/original datasets are publicly available through NASA DAAC servers. The AIRS3STD product provides the daily temperature and relative humidity datasets from 31 August 2002 to the present (https://doi.org/10.5067/Aqua/AIRS/DATA303; https://disc.gsfc.nasa.gov/datasets/AIRS3STD_006/summary). The Atmosphere Daily Global Product from MODIS on Terra (MOD08_D3) contains about 80 variables, including AOD at 550-nm wavelength, in each file for daily data (https://doi.org/10.5067/MODIS/MOD08_M3.006).
APPENDIX B
Further Details on the Experimental Setup
a. Benchmarking neural network models
We benchmark two broad classes of neural networks: (i) RNNs: LSTM (Hochreiter and Schmidhuber 1997), which forecasts univariate time series with LSTM hidden units; and (ii) spatiotemporal graph convolutional networks: spatiotemporal models within the GCN framework, which exploit GCN and temporal convolution to capture dynamic spatial and temporal patterns and correlations. We report performances for eight types of state-of-the-art methods on our benchmark datasets, including 1) diffusion convolutional recurrent neural network (DCRNN) (Li et al. 2018): a model that captures both spatial and temporal dependencies through random walks on the graph and an encoder–decoder architecture for multistep forecasting; 2) LSTM R-GCN (LRGCN) (Li et al. 2019): a time-evolving neural network that integrates relational GCN (R-GCN) into the LSTM to investigate both intratime and intertime relations; 3) attention temporal graph convolutional network (A3T-GCN) (Bai et al. 2021): a model that combines GCNs and gated recurrent units (GRUs) with an attention mechanism to capture both spatiotemporal dependencies and global variation trends; 4) message passing neural networks with LSTM (MPNN+LSTM) (Panagopoulos et al. 2021): a time series version of message passing neural networks that consists of a series of neighborhood aggregation layers to model in detail the dynamics of the spreading process; 5) evolving graph convolutional networks (EvolveGCNO and EvolveGCNH) (Pareja et al. 2019): a model that uses a recurrent model to update the trainable parameters of the GCN for understanding and forecasting graph structure dynamics; 6) graph convolutional recurrent network (GconvLSTM) (Seo et al. 2018): a recurrent model that replaces convolution by graph convolution to extract spatial-temporal information; and 7) gated graph neural networks for dynamic graphs (DyGrEncoder) (Taheri et al. 2019): a gated graph neural network equipped with a standard LSTM for dynamic graph classification.
b. Further details on DL performance
Here we provide more details on the DL models, that is, DCRNN, LRGCN, A3T-GCN, MPNN+LSTM, EvolveGCNO, and GconvLSTM, and discuss potential reasons behind differences in their performance on the Pennsylvania and Texas datasets.
- DCRNN (Li et al. 2018) combines bidirectional random walks on the graph for spatial dependency and an encoder–decoder architecture for temporal dependency.
- LRGCN (Li et al. 2019) models the temporal dependency between consecutive graph snapshots as a distinct relationship imbued with memory and then employs relational GCN (without regularization terms) to simultaneously handle both intratime and intertime relationships.
- A3T-GCN (Bai et al. 2021) models the short-term trend by using gated recurrent units and learns the spatial dependence based on the topology of the dynamic networks through the graph convolutional network.
- MPNN+LSTM (Panagopoulos et al. 2021) has a single graph convolutional layer that incorporates an LSTM network.
- EvolveGCNO (Pareja et al. 2019) adapts a GCN model (without regularization terms) along the temporal dimension without resorting to node embeddings.
- Chebyshev GconvLSTM (Seo et al. 2018) applies one Chebyshev graph convolutional layer on graphs to identify spatial structures and an RNN to find dynamic patterns.
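To make the notion of a single graph convolutional layer concrete, the following is a minimal sketch of one propagation step. It assumes the commonly used symmetric-normalized rule ReLU(D^-1/2 (A + I) D^-1/2 X W), which is not necessarily the exact operator in any of the models above (DCRNN uses diffusion convolution and GconvLSTM uses Chebyshev filters, for example). The function name `gcn_layer` and the toy graph are hypothetical.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    n_in, n_out = len(weights), len(weights[0])
    # Add self-loops so that every node keeps its own signal.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features: norm @ features.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    # Linear map followed by ReLU: max(0, agg @ weights).
    return [[max(0.0, sum(agg[i][f] * weights[f][h] for f in range(n_in)))
             for h in range(n_out)] for i in range(n)]


# Hypothetical toy graph: two counties joined by one edge, one feature each.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
weights = [[1.0]]
hidden = gcn_layer(adjacency, features, weights)
```

With one layer, each node only sees its immediate neighbors, which is one way to read the later remark that single-layer models may struggle on larger networks.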
Table 1 suggests that some models (i.e., LRGCN and EvolveGCNO) tend to deliver consistently better results without AOD. One plausible reason for such performance degradation when introducing additional AOD features could be overfitting, especially since models like LRGCN and EvolveGCNO lack regularization terms. Moreover, the A3T-GCN model exhibits a lower RMSE without AOD for Pennsylvania and the opposite for Texas. This might be attributed to A3T-GCN’s limited capability in discerning complex relationships between nodes in sparse networks (i.e., Pennsylvania data). Note that the average node degrees of the Pennsylvania and Texas networks are 18.9 and 25.3, respectively. However, MPNN+LSTM and GconvLSTM yield the reverse findings (i.e., AOD is found to be helpful for Pennsylvania and not helpful for Texas). The reasons are twofold: (i) in the case of MPNN+LSTM and GconvLSTM, differences among the model performances are less than one standard deviation and as such may be attributed to random fluctuations, and (ii) there is only one GCN layer in the MPNN+LSTM and GconvLSTM models, which can learn smaller networks well (i.e., Pennsylvania data) but fails to describe complex relations among nodes in larger networks (i.e., Texas data).
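The one-standard-deviation argument in point (i) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the RMSE values are hypothetical, only three runs are shown for brevity, and comparing the gap in means against the larger of the two sample standard deviations is an assumed reading of "less than one standard deviation."

```python
from statistics import mean, stdev


def summarize(rmse_runs):
    """Return (mean, sample standard deviation) of RMSE over repeated runs."""
    return mean(rmse_runs), stdev(rmse_runs)


def within_one_std(runs_a, runs_b):
    """True when the gap between mean RMSEs is below the larger std dev."""
    mean_a, std_a = summarize(runs_a)
    mean_b, std_b = summarize(runs_b)
    return abs(mean_a - mean_b) < max(std_a, std_b)


# Hypothetical RMSE values from runs with different random seeds.
with_aod = [50.0, 52.0, 54.0]
without_aod = [51.0, 53.0, 55.0]

mean_with, std_with = summarize(with_aod)
indistinguishable = within_one_std(with_aod, without_aod)
```

When the check returns True, the difference between the two configurations is of the same order as the seed-to-seed variability and should not be read as evidence for or against the added feature.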
c. Why RMSE?
In our experiments we use the RMSE metric rather than R² since RMSE is the standard metric for validation of predictive models in space–time forecasting (Brockwell et al. 1991). Despite statistical criticism, R² is still used in epidemiology. As such, we present a summary of results for R². While we find that R² values for actual observations and hospitalization forecasts with and without AOD are generally similar in California, in Texas and Pennsylvania R² for GCNs with AOD tends to be 0.05 to 0.25 higher than R² for the same GCN without AOD, with ranges from 0.6 to 0.88 in Pennsylvania and from 0.71 to 0.93 in Texas. These findings echo our conclusions on the contributions of AOD to COVID-19 clinical severity, based on predictive RMSE.
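For readers who want to reproduce both metrics, the following is a minimal sketch. It assumes that R² denotes the coefficient of determination, 1 minus the ratio of residual to total sum of squares; the text does not state which definition was used, and the squared correlation is another common choice. The observed and forecast values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and forecasts."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical hospitalization counts and forecasts for one county.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 37.0]

error = rmse(observed, forecast)
fit = r_squared(observed, forecast)
```

RMSE is expressed in the units of the target (here, hospitalizations), whereas R² is unitless and depends on the variance of the observations, which is one reason the two metrics can rank models differently across regions.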
d. Why not regression models?
Simpler models, such as regression, ARIMA, and other models of the Box–Jenkins class, tend to focus only on linear relationships between variables and, as a result, cannot capture the nonlinear, nonseparable spatiotemporal dependencies of COVID-19 dynamics (and of many other infectious diseases with high virulence). In turn, our analysis includes a broad range of DL architectures that allow us to address such nonlinear dependencies. Furthermore, the model consensus analysis presented in our paper enables us to address such pressing questions as whether the relative risk of being affected by COVID-19 is higher for some areas because of their higher exposure to poor air quality.
Nevertheless, here we also present experiments using simpler models such as random forest (RF), logistic regression, and ARIMA(2, 1, 1) (selected using the Akaike information criterion). Based on the results in Table B1, we find that adding AOD to the backbone model can improve the model and reduce the RMSE. However, in comparison with GDL, these simpler models are less competitive. For example, in Texas, the best results without AOD are delivered by ARIMA and EvolveGCNO with RMSE of 89.2 and 54.6, respectively. In turn, the best results with AOD in Texas are yielded by RF (RMSE of 82.3) versus DCRNN (RMSE of 49.2).
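For reference, the ARIMA(2, 1, 1) specification and the selection criterion mentioned above can be written in textbook form as below. This is not taken from the authors' code: the symbols are introduced here for illustration, and the sign convention for the moving-average term differs across references and software.

```latex
\begin{align}
(1 - \phi_1 B - \phi_2 B^2)\,(1 - B)\, y_t &= (1 + \theta_1 B)\, \varepsilon_t, \\
\mathrm{AIC} &= 2k - 2 \ln \hat{L},
\end{align}
```

Here $y_t$ is the series being forecast, $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi_1$ and $\phi_2$ are the autoregressive coefficients, $\theta_1$ is the moving-average coefficient, $\varepsilon_t$ is white noise, $k$ is the number of estimated parameters, and $\hat{L}$ is the maximized likelihood. Among candidate orders, the one with the lowest AIC is preferred.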
Table B1. Performance for 15-day-ahead forecasts of COVID-19-related hospitalizations, based on simpler models in two U.S. states, Pennsylvania and Texas, averaged over each state. Results (RMSE ± std dev) are averaged over 10 runs with different seeds. Boldface type indicates the best AOD-added performance. The best performance overall for each state is marked with a dagger symbol.
e. Source codes
The source codes and other material, including this appendix, can be found online (https://github.com/safe-temp/covid19).
REFERENCES
Air North Texas, 2022: Ozone. https://www.airnorthtexas.org/ozone.
Alvarez, C. H., 2023: Structural racism as an environmental justice issue: A multilevel analysis of the state racism index and environmental health risk from air toxics. J. Racial Ethn. Health Disparities, 10, 244–258, https://doi.org/10.1007/s40615-021-01215-0.
American Lung Association, 2022: Air quality in Philadelphia metro area again worsened for ozone smog, finds 2019 ‘State of the Air’ Report, had best ever results for year-round particle pollution. ALA Press Release, https://www.lung.org/media/press-releases/air-quality-in-philadelphia.
Aumann, H., and Coauthors, 2003: AIRS/AMSU/HSB on the Aqua mission: Design, science objectives, data products, and processing systems. IEEE Trans. Geosci. Remote Sens., 41, 253–264, https://doi.org/10.1109/TGRS.2002.808356.
Bai, J., J. Zhu, Y. Song, L. Zhao, Z. Hou, R. Du, and H. Li, 2021: A3T-GCN: Attention temporal graph convolutional network for traffic forecasting. ISPRS Int. J. Geo-Inf., 10, 485, https://doi.org/10.3390/ijgi10070485.
Benami, E., R. Whitaker, V. La, H. Lin, B. R. Anderson, and D. E. Ho, 2021: The distributive effects of risk prediction in environmental compliance: Algorithmic design, environmental justice, and public policy. Proc. 2021 ACM Conf. on Fairness, Accountability, and Transparency, Online, Association for Computing Machinery, 90–105, https://doi.org/10.1145/3442188.3445873.
Brockwell, P., R. Davis, S. Fienberg, J. Berger, J. Gani, K. Krickeberg, I. Olkin, and B. Singer, 1991: Time Series: Theory and Methods. Springer, 580 pp.
Bronstein, M. M., J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, 2017: Geometric deep learning: Going beyond Euclidean data. IEEE Signal Process. Mag., 34, 18–42, https://doi.org/10.1109/MSP.2017.2693418.
Byrwa-Hill, B. M., T. L. Morphew, A. A. Presto, J. P. Fabisiak, and S. E. Wenzel, 2023: Living in environmental justice areas worsens asthma severity and control: Differential interactions with disease duration, age at onset, and pollution. J. Allergy Clin. Immunol., 152, 1321–1329.e5, https://doi.org/10.1016/j.jaci.2023.04.015.
Cao, D., and Coauthors, 2020: Spectral temporal graph neural network for multivariate time-series forecasting. NIPS’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, H. Larochelle et al., Eds., Curran Associates Inc., 17 766–17 778, https://dl.acm.org/doi/10.5555/3495724.3497215.
Cao, W., Z. Yan, Z. He, and Z. He, 2020: A comprehensive survey on geometric deep learning. IEEE Access, 8, 35 929–35 949, https://doi.org/10.1109/ACCESS.2020.2975067.
CDC-ATSDR, 2022: CDC/ATSDR social vulnerability index. CDC SVI 2018 Doc., 29 pp., https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI2018Documentation-H.pdf.
Cole, H. V. S., I. Anguelovski, F. Baró, M. García-Lamarca, P. Kotsila, C. P. del Pulgar, G. Shokry, and M. Triguero-Mas, 2021: The COVID-19 pandemic: Power and privilege, gentrification, and urban environmental justice in the global north. Cities Health, 5 (sup1), S71–S75, https://doi.org/10.1080/23748834.2020.1785176.
Conley, S., D. M. Konisky, and M. Mullin, 2023: Delivering on environmental justice? U.S. state implementation of the Justice40 initiative. Publius, 53, 349–377, https://doi.org/10.1093/publius/pjad018.
Cooper, D., and J. Nagel, 2021: Lessons from the pandemic: Climate change and COVID-19. Int. J. Sociol. Soc. Policy, 42, 332–347, https://doi.org/10.1108/IJSSP-07-2020-0360.
CovidActNow, 2022: U.S. COVID tracker. Accessed 1 March 2022, https://covidactnow.org.
CVAC, 2022: The COVID-19 vaccine coverage index. Accessed 1 March 2022, https://vaccine.precisionforcovid.org/.
Environmental Defense Fund, 2022: New tools reveal Houston’s pollution. https://www.edf.org/airqualitymaps/houston.
Flanagan, B. E., E. W. Gregory, E. J. Hallisey, J. L. Heitgerd, and B. Lewis, 2011: A social vulnerability index for disaster management. J. Homeland Secur. Emerg. Manage., 8, 3, https://doi.org/10.2202/1547-7355.1792.
Franklin, M., O. V. Kalashnikova, and M. J. Garay, 2017: Size-resolved particulate matter concentrations derived from 4.4 km-resolution size-fractionated multi-angle imaging spectroradiometer MISR aerosol optical depth over Southern California. Remote Sens. Environ., 196, 312–323, https://doi.org/10.1016/j.rse.2017.05.002.
Hochreiter, S., and J. Schmidhuber, 1997: Long short-term memory. Neural Comput., 9, 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735.
IPCC, 2021: Climate change widespread, rapid, and intensifying. IPCC Press Release, https://www.ipcc.ch/2021/08/09/ar6-wg1-20210809-pr/.
Jiang, W., and J. Luo, 2022: Graph neural network for traffic forecasting: A survey. Expert Syst. Appl., 207, 117921, https://doi.org/10.1016/j.eswa.2022.117921.
Josey, K. P., S. W. Delaney, X. Wu, R. C. Nethery, P. DeSouza, D. Braun, and F. Dominici, 2023: Air pollution and mortality at the intersection of race and social class. N. Engl. J. Med., 388, 1396–1404, https://doi.org/10.1056/NEJMsa2300523.
Krupnova, T. G., O. V. Rakova, K. A. Bondarenko, and V. D. Tretyakova, 2022: Environmental justice and the use of artificial intelligence in urban air pollution monitoring. Big Data Cognit. Comput., 6, 75, https://doi.org/10.3390/bdcc6030075.
Lee, H., M. Garay, O. Kalashnikova, Y. Yu, and P. B. Gibson, 2018: How long should the MISR record be when evaluating aerosol optical depth climatology in climate models? Remote Sens., 10, 1326, https://doi.org/10.3390/rs10091326.
Levy, R., S. Mattoo, L. Munchak, L. Remer, A. Sayer, F. Patadia, and N. Hsu, 2013: The collection 6 MODIS aerosol products over land and ocean. Atmos. Meas. Tech., 6, 2989–3034, https://doi.org/10.5194/amt-6-2989-2013.
Li, J., Z. Han, H. Cheng, J. Su, P. Wang, J. Zhang, and L. Pan, 2019: Predicting path failure in time-evolving graphs. Proc. 25th ACM SIGKDD Int. Conf. on Knowledge Discovery & Data Mining, Anchorage, AK, Association for Computing Machinery, 1279–1289, https://doi.org/10.1145/3292500.3330847.
Li, Y., R. Yu, C. Shahabi, and Y. Liu, 2018: Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. Sixth Int. Conf. on Learning Representations, Vancouver, BC, Canada, ICLR, https://openreview.net/pdf?id=SJiHXGWAZ.
McGovern, A., I. Ebert-Uphoff, D. J. Gagne, and A. Bostrom, 2022: Why we need to focus on developing ethical, responsible, and trustworthy artificial intelligence approaches for environmental science. Environ. Data Sci., 1, e6, https://doi.org/10.1017/eds.2022.5.
Meng, X., M. J. Garay, D. J. Diner, O. V. Kalashnikova, J. Xu, and Y. Liu, 2018: Estimating PM2.5 speciation concentrations using prototype 4.4 km-resolution MISR aerosol properties over Southern California. Atmos. Environ., 181, 70–81, https://doi.org/10.1016/j.atmosenv.2018.03.019.
National Academies of Sciences, Engineering, and Medicine, 2018: A Midterm Assessment of Implementation of the Decadal Survey on Life and Physical Sciences Research at NASA. National Academies Press, 144 pp., https://doi.org/10.17226/24966.
NOAA, 2022: Climate data online. Accessed 1 March 2022, https://www.ncdc.noaa.gov/cdo-web/datasets.
Núñez-Delgado, A., Y. Zhou, and J. L. Domingo, 2021: Editorial of the VSI “Environmental, ecological and public health considerations regarding coronaviruses, other viruses, and other microorganisms potentially causing pandemic diseases.” Environ. Res., 192, 110322, https://doi.org/10.1016/j.envres.2020.110322.
Ofori-Boateng, D., H. Lee, K. M. Gorski, M. J. Garay, and Y. R. Gel, 2021: Application of topological data analysis to multi-resolution matching of aerosol optical depth maps. Front. Environ. Sci., 9, 684716, https://doi.org/10.3389/fenvs.2021.684716.
Panagopoulos, G., G. Nikolentzos, and M. Vazirgiannis, 2021: Transfer graph neural networks for pandemic forecasting. Proc. AAAI Conf. on Artificial Intelligence, Online, Association for the Advancement of Artificial Intelligence, 4838–4845, https://doi.org/10.1609/aaai.v35i6.16616.
Pareja, A., and Coauthors, 2019: EvolveGCN: Evolving graph convolutional networks for dynamic graphs. arXiv, 1902.10191v3, https://doi.org/10.48550/arXiv.1902.10191.
Powers, M., P. Brown, G. Poudrier, J. L. Ohayon, A. Cordner, C. Alder, and M. G. Atlas, 2021: COVID-19 as eco-pandemic injustice: Opportunities for collective and antiracist approaches to environmental health. J. Health Soc. Behav., 62, 222–229, https://doi.org/10.1177/00221465211005704.
Rios, C., A. L. Neilson, and I. Menezes, 2021: COVID-19 and the desire of children to return to nature: Emotions in the face of environmental and intergenerational injustices. J. Environ. Educ., 52, 335–346, https://doi.org/10.1080/00958964.2021.1981207.
Rodrigues, C., and G. Lowan-Trudeau, 2021: Global politics of the COVID-19 pandemic, and other current issues of environmental justice. J. Environ. Educ., 52, 293–302, https://doi.org/10.1080/00958964.2021.1983504.
Rozemberczki, B., and Coauthors, 2021: PyTorch Geometric Temporal: Spatiotemporal signal processing with neural machine learning models. Proc. 30th ACM Int. Conf. on Information & Knowledge Management, Online, Association for Computing Machinery, 4564–4573, https://doi.org/10.1145/3459637.3482014.
Schraufnagel, D. E., and Coauthors, 2019: Air pollution and noncommunicable diseases. Chest, 155, 417–426, https://doi.org/10.1016/j.chest.2018.10.041.
Segovia-Dominguez, I., H. Lee, Y. Chen, M. Garay, K. M. Gorski, and Y. R. Gel, 2021a: Does air quality really impact COVID-19 clinical severity: Coupling NASA satellite datasets with geometric deep learning. Proc. 27th ACM SIGKDD Conf. on Knowledge Discovery & Data Mining, Online, Association for Computing Machinery, 3540–3548, https://doi.org/10.1145/3447548.3467207.
Segovia-Dominguez, I., Z. Zhen, R. Wagh, H. Lee, and Y. R. Gel, 2021b: TLife-LSTM: Forecasting future COVID-19 progression with topological signatures of atmospheric conditions. Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, Vol. 12712, Springer, 201–212, https://doi.org/10.1007/978-3-030-75762-5_17.
Seo, Y., M. Defferrard, P. Vandergheynst, and X. Bresson, 2018: Structured sequence modeling with graph convolutional recurrent networks. Neural Information Processing, L. Cheng, A. Leung, and S. Ozawa, Eds., Lecture Notes in Computer Science, Vol. 11301, Springer, 362–373, https://doi.org/10.1007/978-3-030-04167-0_33.
Taheri, A., K. Gimpel, and T. Y. Berger-Wolf, 2019: Learning to represent the evolution of dynamic graphs with recurrent models. WWW’19: Companion Proc. 2019 World Wide Web Conf., San Francisco, CA, Association for Computing Machinery, 301–307, https://doi.org/10.1145/3308560.3316581.
Van Horne, Y. O., and Coauthors, 2023: An applied environmental justice framework for exposure science. J. Exposure Sci. Environ. Epidemiol., 33, 1–11, https://doi.org/10.1038/s41370-022-00422-z.
Wang, Y., and Coauthors, 2023: Air quality policy should quantify effects on disparities. Science, 381, 272–274, https://doi.org/10.1126/science.adg9931.
White, H., K. Chalak, and X. Lu, 2011: Linking Granger causality and the Pearl causal model with settable systems. Proc. Neural Information Processing Systems Mini-Symp. on Causality in Time Series, Vancouver, BC, Canada, PMLR, 1–29, https://proceedings.mlr.press/v12/white11/white11.pdf.
Wilson, S. M., R. Bullard, J. Patterson, and S. B. Thomas, 2020: Roundtable on the pandemics of racism, environmental injustice, and COVID-19 in America. Environ. Justice, 13, 56–64, https://doi.org/10.1089/env.2020.0019.
Yu, B., H. Yin, and Z. Zhu, 2018: Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. IJCAI’18: Proceedings of the 27th International Joint Conference on Artificial Intelligence, J. Lang, Ed., AAAI Press, 3634–3640, https://dl.acm.org/doi/10.5555/3304222.3304273.
Yu, J. J., 2021: Citywide traffic speed prediction: A geometric deep learning approach. Knowl.-Based Syst., 212, 106592, https://doi.org/10.1016/j.knosys.2020.106592.
Zhang, R., H. Kim, E. Lien, D. Zheng, L. Band, and V. Lakshmi, 2021: Deep learning approach to predict peak floods and evaluate socioeconomic vulnerability to flood events: A case study in Baltimore, MD, USA. 2021 Systems and Information Engineering Design Symp., Charlottesville, VA, Institute of Electrical and Electronics Engineers, 1–6, https://doi.org/10.1109/SIEDS52267.2021.9483782.