1. Introduction
Coastal populations along the East Coast are facing increased flood exposure. Examples include sunny day flooding from high tides, flooding from precipitation events, and storm surge flooding from storms and hurricanes, most of which are exacerbated by sea level rise and coastal erosion (Bates et al. 2021; Thompson et al. 2021). These impacts are further amplified by climate change and are related to an increase in the severity of hurricanes, with more intense and widespread storm surges, extreme rainfall, and accelerated sea level rise (e.g., Bhatia et al. 2018; Ju et al. 2019; Knutson et al. 2013, 2020; Li et al. 2020). As compound flood impacts increasingly cause property damage, accessibility issues, and disruptions to people's livelihoods (Tate et al. 2021), relocation from high-risk areas is becoming a more viable adaptation strategy (King et al. 2014; López-Carr and Marter-Kenyon 2015; Mach and Siders 2021). Whether as a long-term solution or an immediate response, population relocation has already been observed in many coastal U.S. communities following major disasters, for example after Hurricanes Katrina (2005), Sandy (2012), and Maria (2017) (Acosta et al. 2020; Binder et al. 2015; Spence et al. 2007). While relocation may be an effective adaptation strategy, it can have substantial socioeconomic costs and result in adverse outcomes if the nuanced political, socioeconomic, and cultural factors in sending and receiving locations are not considered (Bukvic et al. 2015).
Understanding the considerations that affect an individual’s willingness to relocate due to coastal flooding is vital for developing relocation planning and policy frameworks aligned with socioeconomic, cultural, and political contexts. Local governments generally plan for short-term time horizons shaped by political and budgeting cycles (McCarney et al. 2011). However, to prepare for a large-scale population migration in response to permanent coastal flooding, local policy makers must embrace a more holistic discourse on all possible flood-related outcomes, such as incremental or sudden outflux of residents from the affected area. It is also important for the federal government to develop policies supporting state and local officials to assist those who will be forced to relocate due to frequent or permanent inundation (Bukvic and Borate 2021). At the same time, little is known about the circumstances and scenarios under which households would collectively decide to relocate and the potential relocation triggers. Understanding of these factors will help inform policy and planning decisions about housing needs, support services, and incentives to accommodate anticipated population movement. For example, receiving communities can begin to prepare for population growth by evaluating how to best support a sudden influx of a large number of people with specific socioeconomic and cultural backgrounds and adapt housing, infrastructure, and services to accommodate this outcome.
Household surveys have been commonly used as a research tool to study population displacement and migration. Following Hurricane Sandy along the northeast U.S. coast, Bukvic et al. (2015) collected 125 household surveys in New Jersey’s communities significantly affected by the disaster to identify factors influencing willingness to consider permanent relocation. Further, data from telephone surveys were used to analyze factors driving the migration of individuals at increased risk of flooding along the Mississippi River delta (Correll et al. 2021).
Most of these primary data collection efforts rely on traditional statistical approaches for analysis and rarely integrate such data with more advanced analytical techniques, in part due to limitations related to the data quality, format (e.g., open-ended questions), small sample size, and data scarcity (Grimmer et al. 2021). However, there is a growing interest in overcoming some of these barriers and finding innovative ways to integrate social science survey data with machine learning models that will produce novel results and advance survey design (Buskirk and Kirchner 2020). Data-driven machine learning models, including artificial neural networks, recurrent neural networks, support vector machines, and random forests, have been extensively used for flood modeling and prediction (Le et al. 2019; Mosavi et al. 2018; Ramos-Valle et al. 2021; Zhang et al. 2018). Researchers have recently begun using different machine learning models to study population migration (Micevska 2021; Robinson and Dilkina 2018; Yi and Kim 2018). Such models are not only used for predictive purposes, but their analysis and visualization can help interpret and understand the relationship between the model's input and output variables (McGovern et al. 2019; Ramos-Valle et al. 2021). Leveraging these capabilities of machine learning models, this study combines survey data with machine learning methods to identify the determinants of relocation along the East Coast. In the context of this study and the coastal resilience field, relocation is often defined as a permanent voluntary movement from a location affected by natural hazards or climate stressors to another location with reduced or no such risk. It can take place at different distances, from neighborhood-level relocation (e.g., moving from an oceanfront property to a location a few blocks from the shoreline) to community, county, regional, or statewide movement. To our knowledge, this is the first study focused on the United States that integrates household survey data with machine learning.
Specifically, we design an artificial neural network (ANN) and a random forest (RF) to predict relocation and to identify the factors driving it from patterns in the survey data. The goal of this study in applying machine learning models is twofold. First, we seek to train the models on the survey results to make predictions of whether an individual will relocate based on their socioeconomic attributes, past exposure and experiences with flooding, and concerns with flood impacts. Second, we aim to analyze the models to identify key predictors of relocation. This step is accomplished by identifying the variables the machine learning models consider most important. With these goals in mind, this study seeks to answer the following questions: (i) Can two distinct machine learning models be trained to predict an individual's decision to relocate due to coastal flooding? (ii) Which model achieves the highest accuracy? (iii) What factors are the key determinants of one's willingness to relocate in response to coastal flooding?
An overview of the survey data collected and used as input to the RF and ANN models, along with the details of the two models, is presented in section 2. Results on the optimal architecture required to develop the RF and ANN models with the given data, the model evaluation, and the analysis of the input features are presented in section 3. A discussion of the results and the conclusions of the study are provided in section 4.
2. Data and methods
Two machine learning classification models, a random forest and an artificial neural network, are developed to evaluate results from a survey conducted on population migration due to potential flooding. We leverage the predictive capabilities of these models to understand whether individuals would choose to relocate from their current areas of residence. More importantly, we use the model results to understand the factors most important for making these predictions. In this section, we introduce the survey details, the dataset description, and the design and implementation of the machine learning models.
a. Survey data description
The data used in this study originate from a household survey conducted May–June 2021 in flood-prone urban coastal areas along the East Coast, from Delaware to Florida, described in detail in Bukvic and Barnett (2023). Purposive nonprobability sampling was used to collect inputs from respondents living in flood-prone coastal urban areas that may face the risk of displacement and relocation. The recruitment was primarily driven by flood risk, although income brackets were used to ensure a socioeconomically diverse representation. The survey recruitment areas were located in postal "zip codes" with 25% or more land area within the FEMA flood zone determined from the National Flood Hazard Layer (NFHL; FEMA 2021). The survey responses included information from 1450 participants from nine states (Fig. 1) and across 155 unique zip code areas. The survey was administered using the Qualtrics online platform and customized research panels across extensive urban coastal areas to capture diverse perspectives on the possibility of flood-induced relocation (Bukvic and Barnett 2023).
The first set of questions collected sociodemographic information (age, race, family structure, education, employment status, residence time, household income, and ownership status) from the participants. The distribution of the sociodemographic variables considered in this study is shown in Fig. 2 (see also Table A2 in the appendix). Most respondents are within the 30–39 age category (24%) and are employed full time (51%). The majority of respondents (51%) own their current residence. About 8% of respondents have lived in their current residences for more than 30 years, while about 46% have only been in residence for up to 5 years. Only 34% of residents have flood insurance, 54% do not have coverage, and 12% do not know whether their residence would be insured against a flooding event.
The survey had a total of 20 quantitative questions with multiple-choice and Likert-scale responses. Nine questions collected sociodemographic and economic information, of which eight were used in the current study. They were followed by questions about a person's attachment to their place of residence (hereinafter referred to as place attachment), past exposure to flooding and its impacts (hereinafter referred to as exposure), and drivers or factors affecting willingness to relocate due to flooding (hereinafter referred to as relocation drivers). The place attachment questions measured a person's sense of belonging to their community using a five-point Likert scale. The exposure questions focused on determining the types of flood events respondents experienced and their frequency, for example flooding from rainfall events or from storm surges. These questions also gauged the experienced impacts of these events, such as canceled school and medical appointments and property damage. The questions about relocation drivers used five-point Likert-scale responses to assess various concerns that may prompt respondents to relocate, such as deteriorated school systems, worsening crime and economy, and increased taxes on their coastal property. Notably, the survey also asked whether respondents would consider permanent relocation if the flooding in their community becomes more frequent and severe and where they would prefer to move. This question serves as the dependent variable. Ultimately, the machine learning models in this project are trained to predict whether an individual would report that they would relocate.
b. Machine learning models
RFs and ANNs are supervised machine learning algorithms in which both input and output are provided. As mentioned in the previous section, the model input is represented by a subset of survey questions (Table A1 in the appendix), and the output is given by the response to the consideration of permanent relocation. We present the application of RF and ANN to a classification problem using responses from a survey. Both models were implemented through the Scikit-Learn software in Python (Pedregosa et al. 2011).
For the machine learning application, we did not consider all the questions and responses in the survey. The data were filtered to remove questions and entries that allowed for write-in responses or that did not provide relevant information (e.g., household income entries with a "prefer not to answer" response). The survey questions considered represent the input variables to the machine learning models, while the output or model target is defined by the overarching question of permanent relocation due to increased flooding frequency and severity. Respondents had the option to select from "yes," "no," or "maybe in the future" with regard to their willingness to relocate permanently. For the purposes of our application, we do not consider entries that responded with "maybe in the future," to remove a level of uncertainty in the models. This filtering reduced the dataset to 67 input variables and 883 samples. The responses comprised 660 "yes" and 173 "no" answers to the question on considering relocation.
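A minimal sketch of this filtering step is shown below; the file name and column labels ("relocate," "household_income") are hypothetical placeholders rather than the survey's actual variable names.

```python
# Hypothetical sketch of the filtering described above; file and column
# names are illustrative placeholders, not the survey's actual labels.
import pandas as pd

df = pd.read_csv("survey_responses.csv")  # raw survey export (hypothetical file)

# Keep only definitive answers to the relocation question, dropping the
# "maybe in the future" responses to reduce label uncertainty.
df = df[df["relocate"].isin(["yes", "no"])]

# Drop uninformative entries, e.g., "prefer not to answer" for household income.
df = df[df["household_income"] != "prefer not to answer"]
```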
The input data are encoded to convert categorical data into a numerical representation before model training. We implement Scikit-Learn's one-hot encoder for variables that do not have a natural order or ranking (e.g., race or employment status) and an ordinal encoder for variables with a natural ranking associated with them (e.g., age or income). After one-hot encoding, the number of input variables increased to 90. A label encoder is applied to the target variable, where a value of 0/1 represents the binary "no"/"yes" responses. For training purposes, the dataset is split into 75% for training and 25% for testing, and the split is consistent for both models. The model training is performed with 624 samples. Similar to the entire dataset, the training dataset is unbalanced, with 492 "yes" responses and 132 "no" responses. This issue is addressed differently for each model. The RF implementation uses the imbalanced dataset as input but handles the skewed distribution internally so that each tree in the forest sees a balanced subset. For the ANN implementation, the minority class is oversampled to have a balanced dataset before model training. Further discussion follows in the next sections.
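The encoding and splitting steps could be sketched as follows, assuming the filtered table df from the previous sketch; the column groupings and category orderings are illustrative and do not reproduce the study's exact variable lists.

```python
# Sketch of the encoding and train/test split described above; column lists
# and category orderings are hypothetical examples.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, LabelEncoder
from sklearn.model_selection import train_test_split

nominal_cols = ["race", "employment_status"]    # no natural order -> one-hot
ordinal_cols = ["age_group", "income_bracket"]  # natural order -> ordinal
age_order = ["18-29", "30-39", "40-49", "50-59", "60+"]   # hypothetical levels
income_order = ["<25k", "25k-50k", "50k-100k", ">100k"]   # hypothetical levels

encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), nominal_cols),
     ("ordinal", OrdinalEncoder(categories=[age_order, income_order]), ordinal_cols)],
    remainder="passthrough",  # Likert-scale items are assumed already numeric
    sparse_threshold=0.0)     # return a dense array for the downstream models

X = encoder.fit_transform(df.drop(columns=["relocate"]))
y = LabelEncoder().fit_transform(df["relocate"])  # "no" -> 0, "yes" -> 1

# 75%/25% train/test split with a fixed seed so both models see the same partition.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
```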
1) Random forest model architecture
Random forests (Breiman 2001) are defined as a collection of individual uncorrelated decision trees that can be used for classification and regression problems. An example of a decision tree used in this study is shown in Fig. 3. The root node is the topmost decision node representing the entire population for a predictor. During training, the root node poses an initial true or false question to begin the classification process. The RF algorithm splits the training data recursively into subsets by identifying the most relevant questions and maximizing the subsets' homogeneity (Ahijevych et al. 2016; Jergensen et al. 2020; McGovern et al. 2017). This iterative process is repeated, creating multiple branch nodes, until the nodes cannot split further. These end nodes are denoted leaf or terminal nodes and represent the tree classification. In an RF, each tree produces a solution or class probabilities, and the most common solution among the decision trees becomes the prediction of the RF. Similar to applications of climate models, the idea behind the RF is that an ensemble of models will outperform a single model solution. This is achieved by having multiple uncorrelated trees predicting a response. In our application of the RF, we ensure this criterion by using bootstrap aggregation (i.e., only a subset of the sample data is used in each tree) and feature randomness, in which a random subset of features or inputs is evaluated in each tree.
To determine the optimal hyperparameters, we performed a tenfold cross validation for the number of estimators or decision trees in the RF, the maximum number of features each tree considers, the maximum depth of the tree, and the minimum number of samples required to split a node (Table 1). We also tested the initial weights assigned to each class in the dataset. The training data imbalance is addressed in the implementation of the RF model via this "class weight" parameter, where weights are assigned according to the frequency of the output class. Weights can be assigned as a function of the class frequency of the entire input dataset or as a function of the bootstrapped samples in each tree.
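A sketch of this search is shown below; the candidate values are illustrative stand-ins and do not reproduce the full grid in Table 1.

```python
# Sketch of the tenfold cross-validated hyperparameter search for the RF.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [20, 40, 60, 100],   # number of decision trees
    "max_features": [5, 10, 20],         # features considered at each split
    "max_depth": [4, 8, 12],             # maximum tree depth
    "min_samples_split": [2, 5, 10],     # minimum samples to split a node
    # "balanced" weights classes by frequency over the whole training set;
    # "balanced_subsample" reweights within each tree's bootstrap sample.
    "class_weight": ["balanced", "balanced_subsample"],
}

rf_search = GridSearchCV(
    RandomForestClassifier(bootstrap=True, random_state=42),
    param_grid, cv=10, scoring="accuracy")
rf_search.fit(X_train, y_train)

rf = rf_search.best_estimator_
print(rf_search.best_params_, rf.score(X_test, y_test))
```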
Tested RF parameters. Parameters selected are marked with an asterisk.
2) Artificial neural network architecture
ANNs are composed of multiple layers: an input layer, a variable number of hidden or intermediate layers, and an output layer. The input layer consists of a set of neurons representing the input variables, while computations and ANN learning are carried out in the neurons of the hidden layers. The ANN model seeks to minimize the binary cross-entropy loss function between the model prediction and the observed values. As such, the training process is an optimization problem to determine the weights that best describe the relationship between the input and output variables.
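For reference, the binary cross-entropy loss minimized during training takes the standard form

\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\right],

where $y_i \in \{0, 1\}$ is the observed relocation response, $\hat{p}_i$ is the predicted probability of relocation for sample $i$, and $N$ is the number of training samples.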
To determine the optimal hyperparameters, we also performed a tenfold cross validation for the number of hidden layers, the number of neurons in each layer, the type of learning rate, and the activation function (Table 2). The "adam" solver (Kingma and Ba 2014) was implemented in the model throughout the study. We also assess the use of a balanced and an imbalanced training dataset by evaluating the data with and without resampling, respectively. The resampling performed on the training data consists of oversampling the minority class. Samples of the "no" class are randomly selected and added to the training set. The resampling is performed with replacement, meaning that a specific minority class sample can be selected for the training set multiple times.
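A sketch of the oversampling and configuration search is given below, assuming the encoded arrays X_train and y_train from the preprocessing sketch; the candidate layer sizes and activations are illustrative and do not reproduce Table 2.

```python
# Sketch of minority-class oversampling and the tenfold cross-validated ANN search.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.utils import resample

# Randomly oversample the minority ("no" = 0) class with replacement until
# both classes are equally represented in the training set.
n_extra = int((y_train == 1).sum() - (y_train == 0).sum())
extra = resample(X_train[y_train == 0], replace=True,
                 n_samples=n_extra, random_state=42)
X_bal = np.vstack([X_train, extra])
y_bal = np.concatenate([y_train, np.zeros(n_extra, dtype=int)])

param_grid = {
    "hidden_layer_sizes": [(100,), (140, 200, 180), (200, 200, 200)],
    "activation": ["relu", "tanh", "logistic"],
}
ann_search = GridSearchCV(
    MLPClassifier(solver="adam", learning_rate_init=0.001,
                  max_iter=2000, random_state=42),
    param_grid, cv=10, scoring="accuracy")
ann_search.fit(X_bal, y_bal)

ann = ann_search.best_estimator_
print(ann_search.best_params_, ann.score(X_test, y_test))
```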
Tested ANN parameters. The parameters selected are marked with an asterisk. The ANN model selected used 140, 200 and 180 neurons in each layer, respectively.
c. Model interpretation and feature selection
One of the purposes of this study is to evaluate and determine the relationship between the inputs and output of the machine learning models. In this application, we evaluate the importance of the input variables with respect to the question of permanent relocation. We use multiple methods for model interpretation, including permutation importance, chi-squared score, sequential forward selection, and mutual information analysis. Using multiple methods allows us to gain confidence in the machine learning models and to better understand how the models make decisions and which variables are important. These tools for model interpretation are employed using the Python Scikit-Learn package (Pedregosa et al. 2011) and the sequential forward selection implementation from the mlxtend package (Raschka 2018).
The mutual information feature selection method is used to identify the k best features by calculating the dependence between variables. It measures the reduction in uncertainty of one variable given a known value of another (here, the input features). Scores are equal to or larger than 0, where 0 indicates independence between the variables and larger values indicate stronger dependency, thus enabling its use for feature selection.
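For two discrete variables, the mutual information can be written in its standard form as

I(X; Y) = \sum_{x}\sum_{y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)},

which equals zero when $X$ and $Y$ are independent and grows with the strength of their dependence.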
Sequential forward selection (SFS) is also used to determine variable importance. More specifically, it identifies a subset of the most relevant predictors for the problem the model is addressing. The method recursively adds one predictor at a time to the model. The criterion for adding a predictor is that it must minimize the model error (i.e., SFS identifies the predictor that, when added to the model, yields the best results). Using all these methods allows us to address the question of variable importance from various perspectives. The combination of evaluation methods provides confidence in the model results and helps ensure that the patterns and relationships identified between the input and output variables are robust.
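A sketch of how these methods can be applied with the tools named above is shown below, assuming the fitted RF from the earlier search sketch and the encoded arrays from the preprocessing sketch; the same calls would be applied to the ANN.

```python
# Sketch of the four interpretation methods on the fitted RF (rf) and the
# encoded arrays X_train, y_train, X_test, y_test from the earlier sketches.
from sklearn.inspection import permutation_importance
from sklearn.feature_selection import chi2, mutual_info_classif
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

# Permutation importance: accuracy drop when a single input is shuffled.
perm = permutation_importance(rf, X_test, y_test, n_repeats=30, random_state=42)

# Chi-squared scores (the encoded inputs are non-negative) and mutual
# information between each input variable and the relocation response.
chi2_scores, p_values = chi2(X_train, y_train)
mi_scores = mutual_info_classif(X_train, y_train, random_state=42)

# Sequential forward selection: add one predictor at a time, keeping the
# subset that maximizes cross-validated accuracy.
sfs = SFS(rf, k_features=10, forward=True, scoring="accuracy", cv=10)
sfs = sfs.fit(X_train, y_train)

print(perm.importances_mean.argsort()[::-1][:10])  # top 10 by permutation importance
print(sfs.k_feature_idx_)                          # indices of the SFS-selected inputs
```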
3. Results
a. Model performance comparison between a random forest and an artificial neural network
A tenfold cross validation was performed on the RF to determine the combination of hyperparameters most suited for the model. Results indicate that the highest accuracy is achieved with 40 trees, each of which considers a balanced subsample between the two classes and a maximum of 10 features. The optimal minimum number of samples in a split is five, and the optimal maximum tree depth is eight levels. With this configuration, the RF model achieves an accuracy of 83%.
Similarly, a tenfold cross validation was performed to optimize the ANN. The highest model accuracy was achieved using a balanced training dataset, where the minority class (i.e., no relocation) in the training set is oversampled. That is, the ANN trains on a sample with an equal number (492) of "yes" and "no" responses to the question on permanent relocation. The model used three hidden layers consisting of 140, 200, and 180 neurons, respectively. It used the ReLU activation function and a constant learning rate of 0.001. This ANN configuration results in a model accuracy of 77%. A significance test between the two models indicated that the difference in model performance was not statistically significant.
b. Feature selection
Results of the RF and ANN models were used to identify the most important features. In our application, we are interested in identifying a subset of variables and categories most relevant for predicting relocation rather than identifying the single most important variable in the models. In the applied sense to survey research, we attempt to identify the most relevant questions that can be used to assess the willingness of an individual to consider relocation due to flooding. We employ various analysis methods to address this inquiry.
The initial assessment of feature selection used the permutation feature importance method. A guiding question in this assessment is: How does the RF and ANN model accuracy change if we shuffle the values of a certain input variable? Results highlight the importance of questions about relocation drivers and place attachment in both models (Fig. 6). Permuting the data for the relocation scenario in which crime worsens in the community results in the largest decrease in RF and ANN model accuracy (0.024 and 0.025, respectively), indicating its high importance. Interestingly, the top 10 variables for the RF model belong to the survey instrument exploring relocation drivers (e.g., neighbors move out or you experience one or more floods per year). The top three place attachment variables are proximity to the ocean, community traditions, and the lack of new development or construction in the area. While not identified as the main variables of importance, the key demographic variables are age, time of residence in the community, and lack of flood insurance (not shown in Fig. 6a).
The ANN model permutation importance scores show more variability, and a higher number of place attachment variables emerge as important predictors (Fig. 6b). Additionally, an exposure variable, an individual's perceived vulnerability, appears among the top five variables. Despite these differences, the models rank many of the same variables as important predictors, including worsening crime, not having flood insurance coverage, being closer to places of sentimental importance, and neighbors potentially moving out.
The chi-squared score and the mutual information feature importance are also used to assess variable importance in the models. The top 20 variables identified in each case are shown in Fig. 7. The variables were shown to be significant at the 95% confidence level. Relocation driver-based questions dominate the top 10 positions for both models, with several variables common to the chi-squared and mutual information assessments. These include an increase in crime in the community, the prospect of being offered comparable housing elsewhere, and the possibility of experiencing one or more floods per year. Additionally, some variables are common to both models and both assessments, including limited access to services and amenities, business closures, and deteriorated school systems. The methods differ in the ranking and consideration of questions gauging place attachment and demographics.
Results from the sequential forward selection analysis in Table 3 show the model accuracy when features are individually added to the model. An accuracy of 83% is reached after including the 10 variables identified in Table 3 in the RF model. The majority of the top 10 variables belong to questions about relocation drivers (variables 1, 2, 3, 7) or place attachment (variables 5, 6, 9, 10). Only a slight increase in accuracy (close to 84%) is obtained with 20 variables. This result suggests that a subset of as few as 10 variables could be used to achieve high accuracy with the RF model, possibly owing to the reduced model complexity. An accuracy of 89% is achieved when adding the 10 variables identified in Table 3 to the ANN model. Some of the top 10 variables in the ANN model are similar to those identified for the RF, including worsening crime, experiencing one more flood per year, and few abandoned properties in the neighborhood.
Top 10 variables resulting from the sequential forward selection analysis.
The four model interpretation and feature selection methods used here consistently identify relocation-related questions as important predictors for accurately determining an individual's consideration of permanent relocation.
4. Discussion and conclusions
In this study, we use household survey data to train random forest and artificial neural network classifiers to determine an individual’s preference for relocation due to coastal flooding. The results are leveraged to understand how each input variable relating to the survey questions contributed to the prediction of willingness to consider relocation.
We performed a cross-validation analysis to determine the optimal configuration and hyperparameters for each model. The highest model accuracy obtained was 83% and 77% for the RF and ANN models, respectively. While the purpose of this study was not to provide an in-depth model comparison, our results indicate that the differences between the two models were not statistically significant and that both were suitable for this application.
Four methods were used to assess feature selection, including permutation importance, chi-squared scores, mutual information importance and sequential forward selection. All four assessments produced consistent results, highlighting the importance of relocation-based questions for accurately predicting relocation behavior. Some of the factors that are common among all the analyses are (i) concerns with crime increase in the community, (ii) the possibility of experiencing one more flood per year in the future, and (iii) the possibility of experiencing business closures due to flooding. Additional common variables resulting from the analyses are (iv) limited access to services and amenities and (v) being offered comparable housing elsewhere. Surprisingly, the variables gauging past exposure to coastal flooding and its impacts were not identified as important determinants of the decision to relocate.
The first two considerations, the increase in crime and the exposure to a flood event in the next year, also served as noteworthy predictors of willingness to relocate due to coastal flooding in a multinomial logistic regression analysis (Bukvic and Barnett 2023). This finding aligns with the literature showing that increased crime contributes to the decision to migrate. For example, the migration gravity model shows that increased crime leads to out-migration from urban neighborhoods to other urban areas perceived as safer (de Sousa 2014). Even though crime is a largely underexplored variable in migration studies (Koster and Reinke 2017), crime, violence, insecurity, and fear of crime have also been found to serve as critical drivers of international migration (Cutrona et al. 2022). The possibility of one more flood per year also emerged as an important driver of relocation, similar to the results in Bukvic and Barnett (2023), where this concern served as a predictor of respondents saying yes to relocation. In both the statistical and machine learning analyses, the perceived potential of experiencing two or more floods also emerged as important but was less prominent for the random forest. Another flood impact that emerged as an important predictor of willingness to relocate was business closures due to flooding. In Bukvic and Barnett (2023), this flood outcome was not significantly correlated with willingness to relocate and was thus not included in the multinomial logistic regression.
In general, the application of machine learning models, specifically the RF, yielded results similar to those of the logistic regression in Bukvic and Barnett (2023), identifying relocation-based variables or drivers as the most important predictor category. We demonstrate consistency between the different approaches and validate the use and utility of machine learning models to evaluate survey data. Additionally, our results indicate the minimum number of predictors that could be used to achieve high accuracy in predicting relocation. The application of machine learning models, as well as model interpretation, has some limitations. The accuracy of the machine learning models is intrinsically related to the quality of the dataset used for training, namely the sample size, data collection approach, and survey design. In addition, we point out that, for imbalanced datasets, accuracy alone is not a sufficient measure of model performance. Other metrics, such as the classification error and model specificity, should be considered to thoroughly evaluate the model performance. Furthermore, interpreting model results can be challenging, particularly as there is often a trade-off between model accuracy and interpretability. Confidence in the top predictors identified by the feature importance metrics shown here builds when evaluating across the multiple methods and through extensive comparison with the literature, as discussed above. The feature selection analyses coincided in identifying relocation-related variables as the most important category of predictors for accurately determining willingness to relocate. However, this result does not imply that the other variables and categories are unimportant. Machine learning models aim to find the solutions that provide the best predictive skill (McGovern et al. 2019), and the variable importance determined here is based on optimizing the model accuracy.
In summary, the findings of this study are as follows:
- Both a random forest and an artificial neural network model were successfully trained on household survey data addressing coastal relocation. This work demonstrates that the cross-pollination of social science inputs with ML approaches can be achieved and should be advanced using different survey modalities.
- Results indicate that relocation-based questions are important predictors in determining willingness to relocate. Specific predictors include concern with increased crime in the community, experiencing one or more floods per year, having limited access to services and amenities, business closures, and the offer of comparable housing elsewhere.
To our knowledge, this is the first study focused on the United States that integrates household survey data with machine learning to study an individual’s willingness to relocate due to coastal flooding. Machine learning models can outperform standard regression models (Hindman 2015), as well as discern complex patterns and nonlinear interactions among datasets with a large number of variables (Kreatsoulas and Subramanian 2018), such as the dataset in this study. The results presented here help to better understand the risk tolerance of flood-prone communities. The development of analytical methods to identify whether relocation is expected, as well as the establishment of policies informed by anticipatory planning have been identified as necessary in developing robust pathways to coastal retreat (Haasnoot et al. 2021). The conclusions from this study can be used to identify priority concerns driving flood-induced relocation and inform the development of local interventions managing the scale and extent of flood-induced relocation.
Acknowledgments.
This material is based upon work supported by the National Center for Atmospheric Research, which is a major facility sponsored by the National Science Foundation under Cooperative Agreement 1852977, and by the Early Career Faculty Innovator Program at the National Center for Atmospheric Research, a program sponsored by the National Science Foundation, under the Cooperative Agreement 1755088. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The map in Fig. 1 was generated through MapChart.
Data availability statement.
The data used in this research involve human subjects. Because of privacy and ethical concerns protected by the Institutional Review Board, neither data nor the source of the data can be made available.
APPENDIX
Additional Supporting Material
a. Household survey description
Table A1 shows the survey questions used to train the machine learning models and the responses available to participants from which to select. The third column in the table quantifies the number of variables that each question represents. This depends on the format in which the question was used in the survey. For example, questions for which participants could select more than one response were represented as individual inputs to the model.
Survey questions and their representation in the machine learning models.
b. Distribution of demographic variables
Table A2 reports the distributions of participants' sociodemographic data shown in Fig. 2.
Participant sociodemographic distributions.
REFERENCES
Acosta, R. J., N. Kishore, R. A. Irizarry, and C. O. Buckee, 2020: Quantifying the dynamics of migration after Hurricane Maria in Puerto Rico. Proc. Natl. Acad. Sci. USA, 117, 32 772–32 778, https://doi.org/10.1073/pnas.2001671117.
Ahijevych, D., J. O. Pinto, J. K. Williams, and M. Steiner, 2016: Probabilistic forecasts of mesoscale convective system initiation using the random forest data mining technique. Wea. Forecasting, 31, 581–599, https://doi.org/10.1175/WAF-D-15-0113.1.
Bates, P. D., and Coauthors, 2021: Combined modeling of US fluvial, pluvial, and coastal flood hazard under current and future climates. Water Resour. Res., 57, e2020WR028673, https://doi.org/10.1029/2020WR028673.
Bhatia, K., G. Vecchi, H. Murakami, S. Underwood, and J. Kossin, 2018: Projected response of tropical cyclone intensity and intensification in a global climate model. J. Climate, 31, 8281–8303, https://doi.org/10.1175/JCLI-D-17-0898.1.
Binder, S. B., C. K. Baker, and J. P. Barile, 2015: Rebuild or relocate? Resilience and postdisaster decision-making after Hurricane Sandy. Amer. J. Community Psychol., 56, 180–196, https://doi.org/10.1007/s10464-015-9727-x.
Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, https://doi.org/10.1023/A:1010933404324.
Bukvic, A., and A. Borate, 2021: Developing coastal relocation policy: Lessons learned from the FEMA hazard mitigation grant program. Environ. Hazards, 20, 279–299, https://doi.org/10.1080/17477891.2020.1804819.
Bukvic, A., and S. Barnett, 2023: Drivers of flood-induced relocation among coastal urban residents: Insight from the US East Coast. J. Environ. Manage., 325A, 116429, https://doi.org/10.1016/j.jenvman.2022.116429.
Bukvic, A., A. Smith, and A. Zhang, 2015: Evaluating drivers of coastal relocation in Hurricane Sandy affected communities. Int. J. Disaster Risk Reduct., 13, 215–228, https://doi.org/10.1016/j.ijdrr.2015.06.008.
Buskirk, T. D., and A. Kirchner, 2020: Why machines matter for survey and social science researchers: Exploring applications of machine learning methods for design, data collection, and analysis. Big Data Meets Survey Science: A Collection of Innovative Methods, C. A. Hill et al., Eds., Wiley, 9–62.
Correll, R. M., N. S. N. Lam, V. V. Mihunov, L. Zou, and H. Cai, 2021: Economics over risk: Flooding is not the only driving factor of migration considerations on a vulnerable coast. Ann. Amer. Assoc. Geogr., 111, 300–315, https://doi.org/10.1080/24694452.2020.1766409.
Cutrona, S. A., J. D. Rosen, and K. A. Lindquist, 2022: Not just money. How organized crime, violence, and insecurity are shaping emigration in Mexico, El Salvador, and Guatemala. Int. J. Comp. Appl. Crim. Justice, https://doi.org/10.1080/01924036.2022.2052125.
de Sousa, F. L., 2014: Does crime affect migration flows? Pap. Reg. Sci., 93 (Suppl.), S99–S111, https://doi.org/10.1111/pirs.12047.
FEMA, 2021: National Flood Hazard Layer (NFHL). FEMA, accessed 24 February 2023, https://www.fema.gov/flood-maps/national-flood-hazard-layer.
Grimmer, J., M. E. Roberts, and B. M. Stewart, 2021: Machine learning for social science: An agnostic approach. Annu. Rev. Political Sci., 24, 395–419, https://doi.org/10.1146/annurev-polisci-053119-015921.
Haasnoot, M., J. Lawrence, and A. K. Magnan, 2021: Pathways to coastal retreat. Science, 372, 1287–1290, https://doi.org/10.1126/science.abi6594.
Hindman, M., 2015: Building better models: Prediction, replication, and machine learning in the social sciences. Ann. Amer. Acad. Political Soc. Sci., 659, 48–62, https://doi.org/10.1177/0002716215570279.
Jergensen, G. E., A. McGovern, R. Lagerquist, and T. Smith, 2020: Classifying convective storms using machine learning. Wea. Forecasting, 35, 537–559, https://doi.org/10.1175/WAF-D-19-0170.1.
Ju, Y., S. Lindbergh, Y. He, and J. D. Radke, 2019: Climate-related uncertainties in urban exposure to sea level rise and storm surge flooding: A multi-temporal and multi-scenario analysis. Cities, 92, 230–246, https://doi.org/10.1016/j.cities.2019.04.002.
King, D., and Coauthors, 2014: Voluntary relocation as an adaptation strategy to extreme weather events. Int. J. Disaster Risk Reduct., 8, 83–90, https://doi.org/10.1016/j.ijdrr.2014.02.006.
Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.
Knutson, T. R., and Coauthors, 2013: Dynamical downscaling projections of twenty-first-century Atlantic hurricane activity: CMIP3 and CMIP5 model-based scenarios. J. Climate, 26, 6591–6617, https://doi.org/10.1175/JCLI-D-12-00539.1.
Knutson, T. R., and Coauthors, 2020: Tropical cyclones and climate change assessment: Part II: Projected response to anthropogenic warming. Bull. Amer. Meteor. Soc., 101, E303–E322, https://doi.org/10.1175/BAMS-D-18-0194.1.
Koster, M. D., and H. Reinke, 2017: Migration as crime, migration and crime. Crime Hist. Soc., 21, 63–76, https://doi.org/10.4000/chs.1793.
Kreatsoulas, C., and S. V. Subramanian, 2018: Machine learning in social epidemiology: Learning from experience. SSM Popul. Health, 4, 347–349, https://doi.org/10.1016/j.ssmph.2018.03.007.
Le, X.-H., H. V. Ho, G. Lee, and S. Jung, 2019: Application of long short-term memory (LSTM) neural network for flood forecasting. Water, 11, 1387, https://doi.org/10.3390/w11071387.
Li, M., F. Zhang, S. Barnes, and X. Wang, 2020: Assessing storm surge impacts on coastal inundation due to climate change: Case studies of Baltimore and Dorchester County in Maryland. Nat. Hazards, 103, 2561–2588, https://doi.org/10.1007/s11069-020-04096-4.
López-Carr, D., and J. Marter-Kenyon, 2015: Human adaptation: Manage climate-induced resettlement. Nature, 517, 265–267, https://doi.org/10.1038/517265a.
Mach, K. J., and A. R. Siders, 2021: Reframing strategic, managed retreat for transformative climate adaptation. Science, 372, 1294–1299, https://doi.org/10.1126/science.abh1894.
McCarney, P., H. Blanco, J. Carmin, and M. Colley, 2011: Cities and climate change. Climate Change and Cities: First Assessment Report of the Urban Climate Change Research Network, C. Rosenzweig et al., Eds., Cambridge University Press, 249–270.
McGovern, A., K. L. Elmore, D. J. Gagne, S. E. Haupt, C. D. Karstens, R. Lagerquist, T. Smith, and J. K. Williams, 2017: Using artificial intelligence to improve real-time decision-making for high-impact weather. Bull. Amer. Meteor. Soc., 98, 2073–2090, https://doi.org/10.1175/BAMS-D-16-0123.1.
McGovern, A., R. Lagerquist, D. J. Gagne, G. E. Jergensen, K. L. Elmore, C. R. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.
Micevska, M., 2021: Revisiting forced migration: A machine learning perspective. Eur. J. Political Econ., 70, 102044, https://doi.org/10.1016/j.ejpoleco.2021.102044.
Mosavi, A., P. Ozturk, and K. Chau, 2018: Flood prediction using machine learning models: Literature review. Water, 10, 1536, https://doi.org/10.3390/w10111536.
Pedregosa, F., and Coauthors, 2011: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.
Ramos-Valle, A. N., E. N. Curchitser, C. L. Bruyère, and S. McOwen, 2021: Implementation of an artificial neural network for storm surge forecasting. J. Geophys. Res. Atmos., 126, e2020JD033266, https://doi.org/10.1029/2020JD033266.
Raschka, S., 2018: MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. J. Open Source Software, 3, 638, https://doi.org/10.21105/joss.00638.
Robinson, C., and B. Dilkina, 2018: A machine learning approach to modeling human migration. Proc. First ACM SIGCAS Conf. on Computing and Sustainable Societies, San Jose, CA, Association for Computing Machinery, 30, https://dl.acm.org/doi/10.1145/3209811.3209868.
Spence, P. R., K. A. Lachlan, and J. M. Burke, 2007: Adjusting to uncertainty: Coping strategies among the displaced after Hurricane Katrina. Sociol. Spectrum, 27, 653–678, https://doi.org/10.1080/02732170701533939.
Tate, E., M. A. Rahman, C. T. Emrich, and C. C. Sampson, 2021: Flood exposure and social vulnerability in the United States. Nat. Hazards, 106, 435–457, https://doi.org/10.1007/s11069-020-04470-2.
Thompson, P. R., M. J. Widlansky, B. D. Hamlington, M. A. Merrifield, J. J. Marra, G. T. Mitchum, and W. Sweet, 2021: Rapid increases and extreme months in projections of United States high-tide flooding. Nat. Climate Change, 11, 584–590, https://doi.org/10.1038/s41558-021-01077-8.
Yi, C., and K. Kim, 2018: A machine learning approach to the residential relocation distance of households in the Seoul metropolitan region. Sustainability, 10, 2996, https://doi.org/10.3390/su10092996.
Zhang, J., A. A. Taflanidis, N. C. Nadal-Caraballo, J. A. Melby, and F. Diop, 2018: Advances in surrogate modeling for storm surge prediction: Storm selection and addressing characteristics related to climate change. Nat. Hazards, 94, 1225–1253, https://doi.org/10.1007/s11069-018-3470-1.