Predicting Flood Property Insurance Claims over CONUS, Fusing Big Earth Observation Data

Each year throughout the contiguous United States (CONUS), flood hazards cause damage amounting to billions of dollars in homeowner insurance claims. As climate change threatens to raise the frequency and severity of flooding in vulnerable areas, the ability to predict the number of property insurance claims resulting from flood events becomes increasingly important to flood resilience. Based on random forest, we develop a flood property Insurance Claims model (iClaim) by fusing records from the National Flood Insurance Program (NFIP), including building locations, topography, basin morphometry, and land cover, with data from multiple sources of hydrometeorological variables, including flood extent, precipitation, and operational river-stage and oceanic water-level measurements. The model utilizes two steps (damage level classification and claim number regression) and subsampling strategies designed accordingly to reduce the overfitting and underfitting caused by the flood claim samples, which are unevenly distributed and wide-ranging. We evaluate the model using 446,446 grid samples identified from 589 flood events occurring from 2016 to 2019 over CONUS, overlapping 258,159 claims out of a total of 287,439 NFIP records of the same period. Our rigorous validation yields acceptable performance at the grid/event, county/event, and event-accumulative levels, with R² over 0.5, 0.9, and 0.95, respectively. We conclude that the iClaim model can be used in many application scenarios, including assessing flood impact and improving flood resilience.

The distribution of the response variable, the flood claims, is highly unbalanced (that is, events with high losses are significantly rarer than events with low losses); this is, unfortunately, common in natural disasters (Yang et al.). For this study, we propose a hybrid ML scheme, iClaim, to predict in NRT the grid-total count of property insurance claims caused by every flood event over CONUS. The datasets and data processing are described in the second section. For a given flood event, we first locate its potential flood zone (PFZ) by using a low-cost flood locator developed in a CONUS inundation archive (Yang et al. 2021), which eliminates the need for a post-flood surveyed location. Then, within the PFZ, we extract predictors and the response variable from hydrometeorological, property, land use, and topographical data and data on effective insurance policies. We develop a subsampling strategy to cope with the data imbalance issue in training, as explained in the third section. The fourth section demonstrates the validation results through rigorous leave-one-out strategies. Finally, we examine possible applications and discuss the limitations of iClaim.

Data processing

Event-based variable formation
The occurrence of flooding is low across both time and space. It is, therefore, inadvisable to train a flood damage model using full time-series data covering all of CONUS because the model will be overwhelmed by zero-claim samples. To focus on the potentially flooded areas of each event in our study, we only take samples and run the model within the spatiotemporal extent of the PFZ, which can be rapidly generated by the flood locator proposed by Yang et al. (2021). For this study, we upgrade the flood locator by including flood zones caused by storm surges, using hourly tidal station water-level measurements from the National Oceanic and Atmospheric Administration (NOAA). We delineate the surge- and tidal-water-affected areas with a reverse routing algorithm (Lehner et al. 2013) by (1) interpolating the daily maximal water level at the coast at 1 km resolution and (2) routing the coastal water in reverse along the flow direction to overland pixels whose elevation does not exceed 10 feet above the coastal water level. The resultant daily dynamic PFZs include fluvial, pluvial, and coastal flood zones (Fig. 1a). Finally, we calculate the daily PFZ by merging spatially proximate PFZs for consecutive days, using the algorithm detailed by Yang et al. (2021). We then use the time series of PFZs for each flood event to sample the claim records as well as the predictors (Fig. 1b). Note that a PFZ provides the maximum potential spatial extent of a flood event but not the depth. [...] or census tract. Since it is more convenient to unify multisourced data into regular grids, we choose 0.1° × 0.1° grids as our spatial unit.
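The two-step coastal delineation above can be sketched as a breadth-first walk over a flow network. The function, its arguments, and the dictionary-based rasters below are hypothetical simplifications of the gridded implementation, not the paper's code:

```python
from collections import deque

FT_TO_M = 0.3048  # the 10 ft threshold from the text, converted to meters

def coastal_pfz(elev, upstream, seeds, coast_level):
    """Sketch of the reverse-routing step: start from coastal seed
    pixels, walk upstream along the flow network, and flag pixels whose
    elevation does not exceed the local coastal water level plus 10 ft.
    `upstream[p]` lists the pixels that drain into p (a hypothetical
    adjacency built from a flow-direction raster); elevations and water
    levels are in meters."""
    limit = {p: coast_level[p] + 10 * FT_TO_M for p in seeds}
    flooded = set()
    queue = deque(seeds)
    while queue:
        p = queue.popleft()
        if p in flooded:
            continue
        flooded.add(p)
        for q in upstream.get(p, []):
            # Propagate the coastal water level inland; stop where the
            # terrain rises more than 10 ft above it.
            if q not in flooded and elev[q] <= limit[p]:
                limit[q] = limit[p]
                queue.append(q)
    return flooded
```

In the real system the same idea runs on 1 km rasters, with the daily maximal coastal water level interpolated to the seed pixels beforehand.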

Predictor selection
We choose eight categories of predictors, including satellite-based flood extent, precipitation, coastal water level, building location, land use, topography, geomorphology, and effective policy count, as described below and listed in Table 1. It is worth mentioning that, based on the unique spatial and dynamic temporal coverage of each sample (i.e., 0.1° grid per event), we use grid statistics, such as count, mean, or fraction, for most features. All the data sources listed in Table 1 are freely accessible to the public. Fig. ES1 in the online supplemental material demonstrates an example of a 0.1° × 0.1° grid with some visualizable predictors. Note that, as high-resolution inundation extents are seldom available for short-lasting flood events due to the acquisition gaps of the Sentinel-1 satellites, NaN (empty) values prevail in the samples associated with those events. Furthermore, we set the predictors related to coastal flood severity and exposure (predictors 12, 20, 21, and 22 in Table 1) to a fixed value for inland grids, i.e., those whose average hydrological distance to the shoreline is longer than 50 km. Specifically, we set predictors 12 and 20 to zero and predictors 21 and 22 to 999999.
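The grid statistics (count, mean, fraction) can be illustrated with a small pandas sketch; the column names and values are made up for illustration and are not the paper's actual schema:

```python
import pandas as pd

# Hypothetical per-pixel records sampled inside one event's PFZ.
pix = pd.DataFrame({
    "event_id":  [1, 1, 1, 1],
    "grid_id":   ["g1", "g1", "g2", "g2"],
    "precip_mm": [80.0, 120.0, 10.0, 30.0],
    "inundated": [1, 0, 0, 0],   # illustrative Sentinel-1 flood flag
    "buildings": [5, 3, 2, 0],
})

# One sample per 0.1° grid per event, summarized with grid statistics
# (a count, a mean, and a fraction) as described in the text.
feat = pix.groupby(["event_id", "grid_id"]).agg(
    n_buildings=("buildings", "sum"),
    p_mean=("precip_mm", "mean"),
    inund_frac=("inundated", "mean"),
).reset_index()
```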

The hybrid machine-learning scheme

To reduce the underfitting of high-claim samples caused by the skewed sample distribution, we propose a hybrid machine-learning scheme consisting of two steps, damage level classification and claim number regression, as depicted in Fig. 1c. We first classify a given sample to a damage level, then predict its final claim number by using the regressor associated with the predicted damage level. In other words, we train a regressor for each damage level, which helps each regressor focus on a certain range of damage. We define five damage levels: level 0 refers to no claim and level n (0 < n ≤ 4) to a claim number between 10^(n-1) and 10^n. Fig. 1c shows the overview of the model scheme.
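The two-step scheme can be sketched with scikit-learn random forests. The class name, hyperparameters, and the exact level boundary convention below are assumptions for illustration, not the authors' code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def damage_level(claims):
    """Level 0 = no claim; level n (1..4) covers roughly 10**(n-1)
    to 10**n claims (the boundary convention is assumed here)."""
    claims = np.asarray(claims, dtype=float)
    lvl = np.zeros(claims.shape, dtype=int)
    pos = claims > 0
    lvl[pos] = np.clip(np.floor(np.log10(claims[pos])).astype(int) + 1, 1, 4)
    return lvl

class HybridClaimModel:
    """Minimal sketch of the two-step scheme: a classifier predicts the
    damage level, then a per-level regressor predicts the claim count."""

    def __init__(self):
        self.cls = RandomForestClassifier(n_estimators=100, random_state=0)
        self.regs = {n: RandomForestRegressor(n_estimators=100, random_state=0)
                     for n in range(1, 5)}
        self.fitted = set()

    def fit(self, X, claims):
        lvl = damage_level(claims)
        self.cls.fit(X, lvl)                 # step 1: damage level classifier
        for n, reg in self.regs.items():     # step 2: one regressor per level
            mask = lvl == n
            if mask.any():
                reg.fit(X[mask], claims[mask])
                self.fitted.add(n)
        return self

    def predict(self, X):
        lvl = self.cls.predict(X)
        out = np.zeros(len(X))               # predicted level 0 -> zero claims
        for n in self.fitted:
            mask = lvl == n
            if mask.any():
                out[mask] = self.regs[n].predict(X[mask])
        return out
```

Routing each sample to a level-specific regressor is what lets every regressor specialize on a narrow range of claim counts instead of the full skewed distribution.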

Subsampling strategy
To work with unbalanced data, we develop a subsampling method, as shown in Fig. ES2. After splitting the input data into training and testing datasets, we randomly shuffle the negative samples (i.e., the zero-claim samples) of the training data into K (K > 1) splits and make each split roughly match the size of the samples with positive claims. We form a basic training unit (BTU) by combining one negative split with all the positive samples. Since K > 1, we finally have more than one model output to form an ensembled result, similar to that produced by the EasyEnsemble method (Liu et al. 2009). [...] The samples were highly unbalanced across all damage levels, with over 93% having no claims reported and less than 1.5% (i.e., levels 3-4) contributing over 70% of the claims (Fig. 3b-c). Moreover, the sample size decreases exponentially with the claim count (Fig. 3d), which also makes the distribution of samples within each damage level uneven.
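The BTU construction can be sketched as follows. `btu_splits` is a hypothetical helper, and the rule for choosing K is inferred from the "roughly match the size" criterion in the text:

```python
import numpy as np

def btu_splits(y, seed=0):
    """Shuffle the zero-claim (negative) training samples into K splits
    and pair each split with ALL positive samples, yielding K basic
    training units (BTUs). One model is trained per BTU and the
    predictions are ensembled, similar to EasyEnsemble."""
    rng = np.random.default_rng(seed)
    neg = np.flatnonzero(y == 0)
    pos = np.flatnonzero(y > 0)
    # Choose K so each negative split roughly matches the positive count.
    K = max(len(neg) // max(len(pos), 1), 2)
    rng.shuffle(neg)
    for part in np.array_split(neg, K):
        yield np.concatenate([part, pos])
```

Each yielded index array is a near-balanced training set; averaging the K per-BTU model outputs gives the ensembled result.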

Prediction of damage level
Both iClaim's classification model and the simple RF achieve seemingly high accuracy on the damage level prediction, as indicated by their K-score values. But the simple RF only performs well in predicting level 0 (Fig. 4b), whereas the prediction by iClaim yields high accuracy across all damage levels (Fig. 4a). Note that stakeholders and policymakers have a more urgent need for the capability to predict the damage of major flood events, whose damage level is usually high (i.e., levels 3-4). We also observe that, in iClaim, most of the confusion occurs between adjacent levels, which allows the CLS in the regression step to significantly reduce the error caused by misclassification.

Prediction of claim count

Fig. 5 shows the density scatter plots at the grid-by-event level using the four EnLOO scenarios, based on both the iClaim model and the simple RF. For iClaim, the R² of all validation scenarios was close to or above 0.5. The percentage of bias (i.e., Pbias) was within ±20%, and the RMSE of most samples was below 20 houses, indicating an acceptable accuracy over the CONUS area. As shown in Fig. 5f-h, the simple RF significantly underperforms iClaim. In the LEO scenario, its performance is unacceptable.

Among all the EnLOO scenarios, the LSO (Fig. 5a and e) gains the highest accuracy for both iClaim and RF, with R² being 0.628 and 0.532, respectively. In this scenario, the EnLOO takes advantage of both spatial and intra-event correlations hidden in the data. The performance of both models decreases slightly in the LGO and LCO scenarios (Fig. 5b-c and f-g), indicating that the claim prediction does not have strong spatial dependence (Ploton et al. 2020). As demonstrated by the significantly reduced R² and increased RMSE in the LEO scenario (Fig. 5d and h), the model readily exploits intra-event correlation. The reduced performance in the LEO scenario could also be attributed to the claim distribution, which contains only a handful of major flood events (see Fig. 5a). Leaving those high-claim events out means testing the model performance on unobserved extremes. Even in this most challenging scenario, the iClaim prediction still shows an acceptable agreement with observations. In contrast, the simple RF model severely underestimates the high-claim samples and overestimates the low-claim samples. The primary reason may be the integration of the balanced subsampling strategies (i.e., CLS and BTL) in iClaim, which gives a less biased estimation, even with some extreme samples permuted.

In practice, the prediction result at the county-by-event level is more important to stakeholders and policymakers. We obtain the county-level predictions and observations by aggregating from the grid level. The overall agreement between model prediction and observation is significantly higher at the county level than at the grid level (R² increases from 0.5-0.6 to 0.9), indicating that much of the random error at the grid level is neutralized in the aggregation process (Fig. 6). As in the grid-level analysis, the LEO scenario yields the lowest performance (Fig. 6d), with R² and RMSE at 0.894 and 132.218 houses, respectively, which is acceptable, suggesting that iClaim can characterize unobserved flood events.

To demonstrate further the predictability of unobserved major flood events, we extract [...] Mississippi River region (Fig. 7e), which could be attributed to the lack of sufficient Sentinel-1 coverage over these areas during the event (refer to Fig. 6a of Yang et al. (2021)).
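The agreement scores reported in this validation (R², Pbias, RMSE) can be computed with their standard definitions; the paper does not spell out its exact formulas, so the conventional ones are used in this sketch:

```python
import numpy as np

def scores(obs, pred):
    """Standard R-squared, percent bias (Pbias), and RMSE between
    observed and predicted claim counts."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    pbias = 100.0 * np.sum(pred - obs) / np.sum(obs)  # percent bias
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    return r2, pbias, rmse
```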
To obtain a better understanding of the performance of iClaim across different events, we compute the R² at both the grid and county levels for each flood event from the LEO scenario, as shown in Fig. 8. The result indicates that the accuracy increases with the total claim number of each event, and better performance can be expected at the county level than at the grid level. Lower performance is also observed for the events with fewer claims (e.g., the rank 71-589 events). Among those events, 190 were Fluvial Events without Sufficient Predictors (FESP; events induced by fluvial floods but with no Sentinel-1 SAR image coverage). Most of these are found to have R² less than 0.1 and 0.2 at the grid and county levels, respectively. This reveals that iClaim suffers from the lack of sufficient predictors in low-damage fluvial events. Fig. ES3 shows the prediction result at the event-accumulative level from the LEO scenario. The R² of 0.952 reveals overall high consistency with the observed claim numbers. Similar to the grid/event and county/event levels, however, the accuracy drops significantly when the total claims per event are fewer than 300.

To verify iClaim's predictability for unseen locations, we compute the spatial distribution of R² at the county level using the results of the LCO scenario, as depicted in Fig. 9. Overall, the mean and the median of the local R² are 0.614 and 0.803, respectively. In particular, the local R² tends to be higher for areas vulnerable to flooding, where the average of county-total claims per event is greater than five (Fig. 9a and b).
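The county/event aggregation used for these comparisons is a straightforward sum of grid-level counts within each county; a pandas sketch with made-up values:

```python
import pandas as pd

# Hypothetical grid/event observations and predictions; summing within
# each (event, county) pair gives the county/event level used in Fig. 6.
grid = pd.DataFrame({
    "event":  [1, 1, 1, 1],
    "county": ["A", "A", "B", "B"],
    "obs":    [10, 5, 0, 2],
    "pred":   [8, 6, 1, 1],
})
county = grid.groupby(["event", "county"])[["obs", "pred"]].sum().reset_index()
```

Random over- and under-predictions of opposite sign within a county cancel in the sum, which is why the county-level R² is substantially higher than the grid-level R².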


Feature importance assessment
Since the LEO is the benchmark validation scenario for iClaim, we choose it to compute the relative importance of features, as shown in Fig. 10. For clarity, we normalize the relative importance so that the sum over all features is 100%. All finally selected features of iClaim are, overall, important. At the low damage levels (0 and 1), Psummax5, Emean, Pmax, and NP are the most important features (Fig. 10a). At the high levels (2-4), [...]

The contributions of some features are contrasting for low and high damage levels (Fig. 10b). The satellite-based flood inundation extent, for instance, is significantly more important at the high levels than at the low. This result is intuitive, because a short-lasting flood event is significantly less likely to be captured by Sentinel-1 than a long-lasting event. In fact, [...]

The capability of blind prediction at a fine spatiotemporal scale, i.e., per grid per event, has not been reported by previous studies. Technically, iClaim works for both single-source flood events (e.g., the 2019 Midwestern Great Flood) and compound ones (e.g., hurricane-induced events) to which heavy precipitation and storm surge contribute simultaneously. By constructing a two-step workflow (damage level classification followed by claim number estimation) with a specially designed subsampling strategy, iClaim can overcome the overfitting and underfitting issues that occur when dealing with flood claim samples that show a strongly skewed distribution in both space and time. By fusing information from multisource satellite products, river and coastal water level monitoring systems, and flood insurance inventories, the model demonstrates acceptable predictability at the grid/event, county/event, and event-accumulative levels, respectively, including the predictability for unseen locations and events.
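The normalization used in the feature importance assessment, rescaling importances so they sum to 100% after averaging over the ensemble members, can be sketched as follows, assuming scikit-learn-style fitted tree models:

```python
import numpy as np

def normalized_importance(models, names):
    """Average `feature_importances_` over an ensemble of fitted tree
    models and rescale so the values sum to 100%, as in Fig. 10
    (a sketch; the attribute name follows scikit-learn)."""
    imp = np.mean([m.feature_importances_ for m in models], axis=0)
    imp = 100.0 * imp / imp.sum()
    return dict(zip(names, imp))
```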
Its integration of a low-cost flood locator makes iClaim the only blind-prediction model that eliminates the dependence on post-flood surveyed location information. It utilizes, for the first time, the NRT remotely sensed flood extent as a predictor in a novel way, comparing the elevation distribution of inundated pixels with that of properties.

The following lessons are learned from the feature importance assessment:

(1) As the primary flood driver, precipitation shows notable importance for all levels of flood events, which is consistent with Wang et al. (2015).

(2) It is important to increase flood insurance penetration ([...]

[...]

(1) It cannot predict the claim status of each individual house, as it is limited by the truncated location information of the publicly accessible NFIP records.

(2) We only tested it using four years of data, as we are limited by property locations unknown to us for earlier times and by the scarcity of SAR-based flood extent products (e.g., Sentinel-1) in early years (e.g., before 2016).

In the future, we will revisit these limitations by reaching out to authorities and vendors to access additional datasets under research-oriented and confidential agreements. These will include FEMA for the original NFIP data records, Zillow for the housing records, and ICEYE for the sub-daily SAR data. In addition, we will extend the model to include longer historical records to enable [...]